Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization


Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen+, Hsin-Min Wang, Hsin-Hsi Chen#
Academia Sinica, Taiwan    #National Taiwan University, Taiwan    +National Taiwan Normal University, Taiwan
{kychen, journey, whm}@iis.sinica.edu.tw, +berlin@csie.ntnu.edu.tw, #hhchen@csie.ntu.edu.tw

ABSTRACT
Word embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words, sentences and documents in context. Celebrated methods can be categorized as prediction-based and count-based methods according to their training objectives and model architectures. Their pros and cons have been extensively analyzed and evaluated in recent studies, but there is relatively less work continuing this line of research to develop an enhanced learning method that brings together the advantages of the two model families. In addition, the interpretation of the learned word representations still remains somewhat opaque. Motivated by these observations and the pressing need, this paper presents a novel method for learning word representations, which not only inherits the advantages of classic word embedding methods but also offers a clearer and more rigorous interpretation of the learned word representations. Built upon the proposed word embedding method, we further formulate a translation-based language modeling framework for the extractive speech summarization task. A series of empirical evaluations demonstrate the effectiveness of the proposed word representation learning and language modeling techniques in extractive speech summarization.

Keywords
Word embedding, representation, interpretation, language model, speech summarization

1. INTRODUCTION
Owing to the popularity of various Internet applications, rapidly growing multimedia content, such as music videos, broadcast news programs and lecture recordings, has been continuously filling our everyday life [1, 2]. Obviously, speech is one of the most important sources of information about multimedia. By virtue of speech summarization, one can efficiently browse multimedia content by digesting the summarized audio/video snippets and associated transcripts. Extractive speech summarization manages to select a set of salient sentences from a spoken document according to a target summarization ratio and subsequently concatenate them together to form a summary [3]. The wide spectrum of summarization methods developed so far may be roughly divided into three categories: 1) methods simply based on sentence position or structure information [4], 2) methods based on unsupervised sentence ranking [5], and 3) methods based on supervised sentence classification [5]. Interested readers may refer to [5, 7, 8] for comprehensive reviews and new insights into the major methods that have been developed and applied with good success to a wide variety of text and speech summarization tasks.

Orthogonal to the existing commonly-used methods, we explore in this paper the use of various word embedding methods [9-11] in extractive speech summarization, which have recently demonstrated excellent performance in many natural language processing (NLP) related tasks, such as machine translation [12], sentiment analysis [13] and sentence completion [14]. The central idea of these methods is to learn continuous, distributed vector representations of words using neural networks, which can probe latent semantic and/or syntactic cues, and in turn be employed to induce similarity measures among words, sentences and documents.
According to the variety of training objectives and model architectures, the classic methods can be roughly classified into prediction-based and count-based methods [15]. Recent studies in the literature have evaluated these methods in several NLP-related tasks and analyzed their strengths and deficiencies [11, 16]. However, there are only a few studies that continue this line of research to crystallize an enhanced word embedding method that brings together the merits of these two major families. In addition, the interpretation of the learned value of each dimension in a word representation is somewhat opaque. To satisfy the pressing need and remedy this defect, we propose a novel modeling method, which not only inherits the advantages of the classic word embedding methods but also offers a clearer and more rigorous interpretation. Beyond the efforts to improve the representation of words, we also present a novel and efficient translation-based language modeling framework on top of the proposed word embedding method for extractive speech summarization. The common thread of leveraging word embedding methods in speech/text summarization tasks is to represent a document/sentence by averaging the corresponding word embeddings over all words in the document/sentence and then to estimate the cosine similarity of any given document-sentence pair. Unlike this, the proposed framework can authentically capture the finer-grained (i.e., word-to-word) semantic relationships to be effectively used in extractive speech summarization. In a nutshell, the major contributions of the paper are twofold:
- A novel word representation learning technique, which not only inherits the advantages of the classic word embedding methods but also offers a clearer and more rigorous interpretation of word representations, is proposed.
- A translation-based language modeling framework on top of the proposed word embedding method, which can also be integrated with classic word embedding methods, is introduced to the extractive speech summarization task.

2. CLASSIC WORD EMBEDDING METHODS
Perhaps one of the most well-known seminal studies on developing word embedding methods was presented by Bengio et al. [9]. It estimated a statistical n-gram language model, formalized as a feedforward neural network, for predicting future words in context while inducing word embeddings as a by-product. Such an attempt has motivated many follow-up extensions that develop effective methods for probing the latent semantic and syntactic regularities manifested in the representations of words. Representative methods can be categorized as prediction-based and count-based methods. The skip-gram model (SG) [10] and the global vector model (GloVe) [11] are well-studied examples of the two categories, respectively.

Rather than seeking to learn a statistical language model, the SG model is intended to obtain a dense vector representation of each word directly. The structure of SG is similar to a feed-forward neural network, with the exception that the non-linear hidden layer is removed. The model can thus be trained on a large corpus efficiently, getting around the heavy computational burden incurred by the non-linear hidden layer, while still retaining good performance. Formally, given a word sequence w_1, w_2, ..., w_T, the objective function of SG is to maximize the log-probability

$$\sum_{t=1}^{T} \sum_{-c \le k \le c,\, k \ne 0} \log P(w_{t+k} \mid w_t), \qquad (1)$$

where c is the window size of the contextual words for the central word w_t, and the conditional probability is computed by

$$P(w_{t+k} \mid w_t) = \frac{\exp\!\left(v_{w_{t+k}}^{\top} v_{w_t}\right)}{\sum_{i=1}^{V} \exp\!\left(v_{w_i}^{\top} v_{w_t}\right)}, \qquad (2)$$

where v_{w_{t+k}} and v_{w_t} denote the representations of the words at positions t+k and t, respectively; w_i denotes the i-th word in the vocabulary; and V is the vocabulary size.

The GloVe model suggests that an appropriate starting point for word representation learning should be associated with the ratios of co-occurrence probabilities rather than the prediction probabilities. More precisely, GloVe makes use of a weighted least squares regression, with the aim of learning word representations that can characterize the co-occurrence statistics between each pair of words:

$$\sum_{i=1}^{V} \sum_{j=1}^{V} f(X_{ij}) \left( v_{w_i}^{\top} v_{w_j} + b_i + b_j - \log X_{ij} \right)^{2}, \qquad (3)$$

where w_i and w_j are any two distinct words in the vocabulary; X_{ij} denotes the number of times words w_i and w_j co-occur in a predefined sliding context window; f(·) is a monotonic smoothing function used to modulate the impact of each pair of words involved in model training; and v_{w_i} and b_i denote the word representation and the bias term of word w_i, respectively. Interested readers may refer to [15, 16] for a more thorough and entertaining discussion.
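To make the two training objectives concrete, the following is a minimal NumPy sketch (not from the paper) of the skip-gram softmax in Eq. (2) and the GloVe weighted least-squares loss in Eq. (3). The toy sizes, the single shared embedding matrix (standing in for separate word and context vectors), and the x_max/alpha weighting constants are illustrative assumptions.

```python
import numpy as np

V, H = 200, 50                                # toy vocabulary size and embedding dimension
rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(V, H))      # one vector v_w per word

def sg_prob(center, context, emb):
    """Eq. (2): softmax over inner products of the center word with every vocabulary word."""
    scores = emb @ emb[center]                # v_{w_i}^T v_{w_t} for all i
    scores -= scores.max()                    # numerical stability
    p = np.exp(scores)
    return p[context] / p.sum()

def glove_loss(emb, bias, X, x_max=100.0, alpha=0.75):
    """Eq. (3): weighted least-squares fit of inner products to log co-occurrence counts."""
    rows, cols = np.nonzero(X)                # f(0) = 0, so zero counts contribute nothing
    loss = 0.0
    for i, j in zip(rows, cols):
        f = min((X[i, j] / x_max) ** alpha, 1.0)                   # smoothing function f(.)
        err = emb[i] @ emb[j] + bias[i] + bias[j] - np.log(X[i, j])
        loss += f * err ** 2
    return loss

# usage: probability that word 7 occurs in the context of word 3,
# and the GloVe loss on a random toy count matrix
X = rng.integers(0, 5, size=(V, V)).astype(float)
print(sg_prob(3, 7, emb), glove_loss(emb, np.zeros(V), X))
```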
3. METHODOLOGY
3.1 The Proposed Word Embedding Method
Although the prediction-based methods have shown remarkable performance in several NLP-related tasks, they do not sufficiently utilize the statistics of the entire corpus, since the models are usually trained on local context windows in a separate manner [11]. By contrast, the count-based methods leverage the holistic statistics of the corpus efficiently, but a few studies have indicated their relatively poor performance in some tasks [16]. Moreover, for all the existing methods (both prediction-based and count-based), the interpretation of the learned value of each dimension in the representation is not intuitively clear. Motivated by these observations, we propose a novel modeling approach that naturally brings together the advantages of the two major model families and results in interpretable word representations.

We begin with the definition of terminology and notation. As in most classic embedding methods, we introduce two sets of word representations: one is the set of desired word representations, denoted by M; the other is the set of separate context word representations, denoted by W. W and M are H×V matrices, where the j-th columns of W and M, denoted by W_{w_j} ∈ R^H and M_{w_j} ∈ R^H, correspond to the j-th word w_j in the vocabulary, and H is a pre-defined dimension of the word embedding. To make the learned representation interpretable, we assume that each word embedding (i.e., each column of M) is a multinomial distribution. Furthermore, to make the computation more efficient, we assume that each row vector of matrix W follows a multinomial distribution as well. To inherit the advantages of the prediction-based methods, the training objective is to obtain an appropriate word representation by considering the predictive ability of a given word occurring at an arbitrary position t of the training corpus (denoted w_t) to predict its surrounding context words:

$$P\!\left(\{w_{t+k}\}_{-c \le k \le c,\, k \ne 0} \mid w_t\right) = \frac{\prod_{-c \le k \le c,\, k \ne 0} W_{w_{t+k}}^{\top} M_{w_t}}{\prod_{-c \le k \le c,\, k \ne 0} \sum_{i=1}^{V} W_{w_i}^{\top} M_{w_t}}. \qquad (4)$$

The denominator can be omitted because it always equals 1. In order to characterize the whole-corpus statistics well, we train the model parameters in a batch mode instead of using a sequential learning strategy. Therefore, the objective function becomes

$$\sum_{i=1}^{V} \sum_{j=1}^{V} n(w_i, w_j) \log\!\left(W_{w_j}^{\top} M_{w_i}\right), \qquad (5)$$

where n(w_i, w_j) denotes the number of times words w_i and w_j co-occur in a pre-defined sliding context window. Such a model not only bears a close resemblance to the prediction-based methods (e.g., SG) but also capitalizes on the statistics gathered from the entire corpus, like the count-based methods (e.g., GloVe), in a probabilistic framework. The component distributions (i.e., W and M) can be estimated using the expectation-maximization (EM) algorithm; advanced algorithms, such as the triple jump EM algorithm [17], can be leveraged to accelerate the training process. Since the training objective of the proposed method is similar to that of the SG model, and it results in a set of distributional word representations, we term the proposed model the distributional skip-gram model (DSG). The interpretability of DSG will be discussed in detail in Section 4.3.
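The paper only states that W and M can be estimated with EM. Because the objective in Eq. (5) under the multinomial constraints has the same functional form as an aspect-model likelihood, the generic EM updates can be written down directly; the following toy, dense NumPy sketch of that recursion is an illustrative derivation under this assumption, not code from the paper, and it materializes an H×V×V responsibility tensor, so it only suits tiny vocabularies.

```python
import numpy as np

def train_dsg(counts, H=16, iters=50, seed=0, eps=1e-12):
    """Toy EM for the DSG objective: maximize sum_ij n(i, j) * log(W[:, j] . M[:, i]),
    where each column of M is a multinomial over the H dimensions and each row of W
    is a multinomial over the vocabulary (cf. Section 3.1)."""
    V = counts.shape[0]
    rng = np.random.default_rng(seed)
    M = rng.random((H, V)); M /= M.sum(axis=0, keepdims=True)   # P(dimension h | center word)
    W = rng.random((H, V)); W /= W.sum(axis=1, keepdims=True)   # P(context word | dimension h)
    for _ in range(iters):
        # E-step: responsibility of each hidden dimension for every (center, context) pair
        joint = M[:, :, None] * W[:, None, :]                   # joint[h, i, j] = M[h, i] * W[h, j]
        r = joint / (joint.sum(axis=0, keepdims=True) + eps)
        # M-step: re-estimate the two multinomial families from expected counts
        expected = r * counts[None, :, :]                       # n(i, j) * r[h, i, j]
        M = expected.sum(axis=2); M /= M.sum(axis=0, keepdims=True) + eps
        W = expected.sum(axis=1); W /= W.sum(axis=1, keepdims=True) + eps
    return W, M

# usage on a tiny random co-occurrence matrix
counts = np.random.default_rng(1).integers(0, 5, size=(30, 30)).astype(float)
W, M = train_dsg(counts, H=8, iters=20)
print((W[:, 0] * M[:, 1]).sum())   # P(word 0 | word 1), cf. Eqs. (4) and (8)
```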

3.2 Translation-based Language Modeling for Summarization
Language modeling (LM) has proven its broad utility in many NLP-related tasks. In the context of using LM for extractive speech summarization, each sentence S of a spoken document D to be summarized can be formulated as a probabilistic generative model for generating the document, and sentences are selected based on the corresponding generative probability P(D | S): the higher the probability, the more representative S is likely to be of D [18]. The major challenge facing the LM-based framework is how to accurately estimate the model parameters for each sentence. The simplest way is to estimate a unigram language model (ULM) on the basis of the frequency of each distinct word w occurring in the sentence S, with the maximum likelihood criterion:

$$P(w \mid S) = \frac{n(w, S)}{|S|}, \qquad (6)$$

where n(w, S) is the number of times that word w occurs in sentence S, and |S| is the length of the sentence. There is a general consensus that merely matching terms between a candidate sentence and the document to be summarized may not always select summary sentences that capture the important semantic intent of the document. Thus, in order to more precisely assess the representativeness of a sentence for the document, we suggest inferring the probability that the document would be generated as a translation of the sentence. That is, the generative probability is calculated based on a translation model of the form P(w | w'), which is the probability that a sentence word w' is semantically translated to a document word w:

$$P(D \mid S) = \prod_{w \in D} \Bigl( \sum_{w' \in S} P(w \mid w')\, P(w' \mid S) \Bigr)^{n(w, D)}. \qquad (7)$$

Accordingly, the translation-based language modeling approach allows us to score a sentence by computing the degree of match between a sentence word and the semantically related words in the document. If P(w | w') only allows a word to be translated into itself, Eq. (7) reduces to the ULM approach (cf. Eq. (6)). However, P(w | w') would in general allow us to translate w' into semantically related words with non-zero probabilities, thereby achieving semantic matching between the document and its component sentences. Based on the proposed DSG method, the translation probability P(w | w') can be naturally computed by

$$P(w \mid w') = W_{w}^{\top} M_{w'}. \qquad (8)$$

Consequently, the sentences offering the highest generative probability (cf. Eqs. (7) and (8)), and dissimilar to the already selected sentences (for an already selected sentence S', by computing P(S' | S) using Eq. (7)), are selected and sequenced to form the final summary according to a desired summarization ratio. The proposed translation-based language modeling method is denoted by TBLM hereafter.
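The scoring function defined by Eqs. (6)-(8) is cheap to compute once the DSG matrices are available. A minimal sketch, assuming W and M are the H×V NumPy arrays from a trained DSG model, word2id maps tokens to column indices, and all tokens are in-vocabulary (OOV handling and the redundancy check against already selected sentences are omitted); log-probabilities are used to avoid underflow:

```python
import math
from collections import Counter

def translation_prob(w, w_prime, W, M, word2id):
    """Eq. (8): P(w | w') = W_w . M_{w'} (inner product of two H-dimensional columns)."""
    return float(W[:, word2id[w]] @ M[:, word2id[w_prime]])

def tblm_log_score(document_words, sentence_words, W, M, word2id, eps=1e-12):
    """Eq. (7) in log space:
    log P(D | S) = sum_{w in D} n(w, D) * log( sum_{w' in S} P(w | w') * P(w' | S) ),
    with P(w' | S) = n(w', S) / |S| from Eq. (6)."""
    sent_counts = Counter(sentence_words)
    sent_len = len(sentence_words)
    score = 0.0
    for w, n_wD in Counter(document_words).items():
        inner = sum(
            translation_prob(w, w_prime, W, M, word2id) * (n_wS / sent_len)
            for w_prime, n_wS in sent_counts.items()
        )
        score += n_wD * math.log(inner + eps)
    return score

# usage: rank the candidate sentences of a (tokenized) document by log P(D | S)
# ranked = sorted(sentences, key=lambda s: tblm_log_score(doc_words, s, W, M, word2id),
#                 reverse=True)
```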
4. EXPERIMENTS
4.1 Dataset & Setup
We conduct a series of experiments on the Mandarin benchmark broadcast news corpus MATBN [19]. The MATBN dataset is publicly available and has been widely used to evaluate several NLP-related tasks, including speech recognition [20], information retrieval [21] and summarization [18]. As such, we follow the experimental setting used by previous studies on speech summarization. The average word error rate of the automatic transcripts of these broadcast news documents is about 38%. The reference summaries were generated by ranking the sentences in the manual transcript of a broadcast news document by importance, without assigning a score to each sentence. Each document has three reference summaries annotated by three subjects. For the assessment of summarization performance, we adopted the widely-used ROUGE metrics (in F-scores) [22]. The summarization ratio was set to 10%. In addition, a corpus of 14,000 text news documents, compiled during the same period as the broadcast news documents, was used to estimate the parameters of the models compared in this paper.

4.2 Experimental Results
A common thread of leveraging word embedding methods in a summarization task is to represent a document/sentence by averaging the corresponding word embeddings over all words in the document/sentence. After that, the cosine similarity measure, as a straightforward choice, can be readily applied to determine the degree of relevance between any pair of representations [29, 30].
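This averaging-plus-cosine pipeline can be sketched as follows (an illustrative sketch, not the authors' implementation; emb is assumed to be a V×H embedding matrix from any of the compared methods, word2id maps tokens to rows, and the inputs are tokenized transcripts):

```python
import numpy as np

def avg_embedding(words, emb, word2id):
    """Represent a sentence/document by the average of its word embeddings."""
    vecs = [emb[word2id[w]] for w in words if w in word2id]
    return np.mean(vecs, axis=0) if vecs else np.zeros(emb.shape[1])

def cosine(a, b, eps=1e-12):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def summarize(document_words, sentences, emb, word2id, ratio=0.10):
    """Rank candidate sentences by cosine similarity to the document vector and keep
    the top fraction (cf. the 10% summarization ratio used in Section 4.1)."""
    doc_vec = avg_embedding(document_words, emb, word2id)
    ranked = sorted(sentences,
                    key=lambda s: cosine(avg_embedding(s, emb, word2id), doc_vec),
                    reverse=True)
    return ranked[:max(1, round(len(sentences) * ratio))]
```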

In the first place, we investigate the effectiveness of two state-of-the-art word embedding methods (SG and GloVe) and the proposed method (DSG), in conjunction with the cosine similarity measure, for speech summarization. The experimental results are shown in Table 1; for each method, results are reported both for the manual transcripts of the spoken documents and for the speech recognition transcripts, which may contain recognition errors.

Table 1. Summarization results (ROUGE F-scores) achieved by various word embedding methods in conjunction with the cosine similarity measure.
Method   Manual transcripts   Recognition transcripts
SG       0.239  0.311         0.215  0.311
GloVe    0.244  0.310         0.214  0.310
DSG      0.281  0.351         0.234  0.330
VSM      0.228  0.290         0.189  0.287
LSA      0.233  0.316         0.201  0.301
MMR      0.248  0.322         0.215  0.315

Table 2. Summarization results achieved by various word embedding methods in conjunction with the translation-based language modeling method (TBLM).
Method   Manual transcripts   Recognition transcripts
SG       0.320  0.385         0.225  0.322
GloVe    0.309  0.372         0.239  0.332
DSG      0.333  0.389         0.244  0.331
ULM      0.298  0.362         0.210  0.307

Table 3. Summarization results achieved by a few well-studied and/or state-of-the-art unsupervised methods.
Method      Manual transcripts   Recognition transcripts
MRW         0.282  0.332         0.191  0.291
LexRank     0.309  0.305         0.146  0.254
SM          0.286  0.332         0.204  0.303
ILP         0.337  0.348         0.209  0.306
DSG(TBLM)   0.333  0.389         0.244  0.331

Several observations can be made from these results. First, the two classic word embedding methods, though based on disparate model structures and learning strategies, achieve results competitive with each other on both the manual and the recognition transcripts. Second, the proposed DSG method, which naturally brings together the advantages of the two major model families (i.e., prediction-based and count-based), outperforms SG and GloVe (representatives of the two families, respectively) by a significant margin in both cases. Third, since the relevance degree of a document-sentence pair is computed by the cosine similarity measure, vector-space methods, such as the vector space model (VSM) [23], LSA [23] and MMR [24], can be treated as the principled baseline systems. Albeit the classic word embedding methods (GloVe and SG) outperform the conventional VSM model, they achieve almost the same level of performance as LSA and MMR, which are considered enhanced versions of VSM. It should be noted that the proposed DSG method not only outperforms VSM but is also superior to LSA and MMR in both cases.

Next, we evaluate the proposed TBLM method. It is worth mentioning that TBLM is well suited to pairing with the DSG method, since the translation probability can be obtained by a lightweight calculation (cf. Eq. (8)). It can also be integrated with the classic word embedding methods (e.g., SG and GloVe), but with a heavier computational burden (cf. Eq. (2), for example). The experimental results are summarized in Table 2. Three observations can be made from the results. First, since the proposed DSG method inherits the advantages of the prediction- and count-based methods, it outperforms both SG and GloVe when all three models are paired with the proposed TBLM method. Second, when integrated with TBLM, all three word embedding methods outperform the baseline ULM method [18, 31] (cf. Section 3.2) by a remarkable margin in both cases. Third, comparing the results in Tables 1 and 2, it is evident that TBLM is a preferable vehicle for making use of powerful word embedding methods in speech summarization.

In the last set of experiments, we assess the performance levels of several well-practiced unsupervised summarization methods, including the graph-based methods MRW [25] and LexRank [26], the submodularity (SM) method [27], and the integer linear programming (ILP) method [28]. The corresponding results are shown in Table 3. The performance trends of these state-of-the-art methods in our study are quite in line with observations made by previous studies on different extractive summarization tasks. A noticeable observation is that speech recognition errors may lead to inaccurate similarity measures between a pair of sentences or a document-sentence pair. Probably for this reason, the two graph-based methods (MRW and LexRank) cannot perform on par with the vector-space methods (VSM, LSA, and MMR; cf. Table 1) on the recognition transcripts, while the situation is reversed on the manual transcripts. Moreover, SM and ILP achieve the best results on the manual transcripts, but offer only mediocre performance among all methods on the recognition transcripts. To sum up, the proposed DSG method, which inherits the advantages of both the prediction- and count-based methods, indeed outperforms classic word embedding methods when paired with different summarization strategies (i.e., the cosine similarity measure and TBLM). The proposed TBLM method further enhances the DSG method, since it can authentically capture the finer-grained (i.e., word-to-word) semantic relationships to be effectively used in extractive speech summarization.

4.3 Further Analysis
SG, GloVe and the proposed DSG method can be analyzed from several critical perspectives. First, SG and DSG aim at maximizing the collection likelihood in training, while GloVe concentrates on discovering useful information from the co-occurrence statistics between each pair of words. It is worth noting that GloVe has a close relation to the classic weighted matrix factorization approach; the major difference is that the former operates on the word-by-word co-occurrence matrix while the latter decomposes the word-by-document matrix [23, 32, 33]. Second, since the parameters (i.e., word representations) of SG are trained sequentially (the so-called on-line learning strategy), the sequential order of the training corpus may make the resulting models unstable. By contrast, GloVe and DSG accumulate the statistics over the entire training corpus in the first place; the model parameters are then updated based on these aggregate counts at once (the so-called batch-mode learning strategy).
Finally, the word vectors learned by DSG are distributional representations, while SG and GloVe express each word by a distributed representation. The classic embedding methods usually output two sets of word representations, but there is no particular use for the context word representations. Since DSG assumes that each row of the context word matrix W follows a multinomial distribution (i.e., a multinomial distribution over words), we can naturally interpret the semantic meaning of each dimension of a word embedding by referring to the words with the highest probabilities in the corresponding row vector of W. Since each word embedding in the desired word representations M is also a multinomial distribution, we can interpret a learned word embedding by first identifying the dimensions with the highest probabilities and then identifying the context words with the highest probabilities in the corresponding row vectors of W.

[Figure 1. A running example for interpreting the word embeddings learned by DSG: word clouds for two dimensions of the embedding of "apple", labeled "Daily Life" (0.55) and "Tech. Company" (0.25).]

Figure 1 shows a running example for the word "apple" learned by DSG on the LDC Gigaword corpus. A word cloud can be plotted with respect to the probabilities of individual context words for a selected dimension. It is obvious that "apple" is not only an important object in our daily life, but also a famous technology company. The example shows that the word embeddings learned by DSG can be interpreted in a reasonable and systematic manner.
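The interpretation procedure just described is mechanical enough to sketch in a few lines; a minimal sketch, assuming a trained DSG pair (W, M) such as the EM example in Section 3.1 would produce, plus an id2word lookup:

```python
import numpy as np

def interpret_word(word_id, W, M, id2word, top_dims=2, top_words=5):
    """Interpret a DSG embedding: pick the highest-probability dimensions of the word's
    column in M, then list the highest-probability context words in the corresponding
    rows of W (cf. the "apple" example in Figure 1)."""
    dims = np.argsort(M[:, word_id])[::-1][:top_dims]
    readout = {}
    for h in dims:
        top = np.argsort(W[h])[::-1][:top_words]
        readout[f"dim {h} (p={M[h, word_id]:.2f})"] = [id2word[i] for i in top]
    return readout

# usage (with W, M, word2id, id2word from a trained DSG model):
# print(interpret_word(word2id["apple"], W, M, id2word))
```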

5. CONCLUSIONS
A novel distributional word embedding method and a translation-based language modeling method have been proposed and introduced for extractive speech summarization in this paper. Empirical results have demonstrated their respective and joint effectiveness and efficiency compared with several state-of-the-art summarization methods. In the future, we plan to extend and apply the proposed framework to a wider range of summarization and NLP-related tasks. We will also concentrate on integrating a variety of prior knowledge into the learning of word representations.

6. REFERENCES
[1] Mari Ostendorf. 2008. Speech technology and information access. IEEE Signal Processing Magazine, 25(3): 150-152.
[2] Sadaoki Furui, Li Deng, Mark Gales, Hermann Ney, and Keiichi Tokuda. 2012. Fundamental technologies in modern speech recognition. IEEE Signal Processing Magazine, 29(6): 16-17.
[3] Inderjeet Mani and Mark T. Maybury (Eds.). 1999. Advances in Automatic Text Summarization. Cambridge, MA: MIT Press.
[4] P. B. Baxendale. 1958. Machine-made index for technical literature - an experiment. IBM Journal, 2(4): 354-361.
[5] Yang Liu and Dilek Hakkani-Tur. 2011. Speech summarization. Chapter 13 in Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, G. Tur and R. D. Mori (Eds.), New York: Wiley.
[6] Ziqiang Cao, Furu Wei, Sujian Li, Wenjie Li, Ming Zhou, and Houfeng Wang. 2015. Learning summary prior representation for extractive summarization. In Proc. ACL, pages 829-833.
[7] Ani Nenkova and Kathleen McKeown. 2011. Automatic summarization. Foundations and Trends in Information Retrieval, 5(2-3): 103-233.
[8] Gerald Penn and Xiaodan Zhu. 2008. A critical reassessment of evaluation baselines for speech summarization. In Proc. ACL, pages 470-478.
[9] Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3: 1137-1155.
[10] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proc. ICLR, pages 1-12.
[11] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proc. EMNLP, pages 1532-1543.
[12] Will Y. Zou, Richard Socher, Daniel Cer, and Christopher D. Manning. 2013. Bilingual word embeddings for phrase-based machine translation. In Proc. ACL, pages 1393-1398.
[13] Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning sentiment-specific word embedding for Twitter sentiment classification. In Proc. ACL, pages 1555-1565.
[14] Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proc. ICML, pages 160-167.
[15] Marco Baroni, Georgiana Dinu, and German Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proc. ACL, pages 238-247.
[16] Omer Levy, Yoav Goldberg, and Ido Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3: 211-225.
[17] Han-Shen Huang, Bou-Ho Yang, and Chun-Nan Hsu. 2005. Triple jump acceleration for the EM algorithm. In Proc. ICDM, pages 1-4.
[18] Shih-Hung Liu, Kuan-Yu Chen, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, and Wen-Lian Hsu. 2015. Combining relevance language modeling and clarity measure for extractive speech summarization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(6): 957-969.
[19] Hsin-Min Wang, Berlin Chen, Jen-Wei Kuo, and Shih-Sian Cheng. 2005. MATBN: A Mandarin Chinese broadcast news corpus. International Journal of Computational Linguistics and Chinese Language Processing, 10(2): 219-236.
[20] Jen-Tzung Chien. 2015. Hierarchical Pitman-Yor-Dirichlet language model. IEEE/ACM Transactions on Audio, Speech and Language Processing, 23(8): 1259-1272.
[21] Chien-Lin Huang and Chung-Hsien Wu. 2007. Spoken document retrieval using multi-level knowledge and semantic verification. IEEE Transactions on Audio, Speech, and Language Processing, 15(8): 2551-2560.
[22] Chin-Yew Lin. 2003. ROUGE: Recall-oriented understudy for gisting evaluation. http://haydn.isi.edu/rouge/.
[23] Yihong Gong and Xin Liu. 2001. Generic text summarization using relevance measure and latent semantic analysis. In Proc. SIGIR, pages 19-25.
[24] Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proc. SIGIR, pages 335-336.
[25] Xiaojun Wan and Jianwu Yang. 2008. Multi-document summarization using cluster-based link analysis. In Proc. SIGIR, pages 299-306.
[26] Gunes Erkan and Dragomir R. Radev. 2004. LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22(1): 457-479.
[27] Hui Lin and Jeff Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. In Proc. NAACL HLT, pages 912-920.
[28] Korbinian Riedhammer, Benoit Favre, and Dilek Hakkani-Tur. 2010. Long story short - Global unsupervised models for keyphrase based meeting summarization. Speech Communication, 52(10): 801-815.
[29] Mikael Kageback, Olof Mogren, Nina Tahmasebi, and Devdatt Dubhashi. 2014. Extractive summarization using continuous vector space models. In Proc. CVSC, pages 31-39.
[30] Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, and Hsin-Hsi Chen. 2015. Leveraging word embeddings for spoken document summarization. In Proc. INTERSPEECH, pages 1383-1387.
[31] Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Ea-Ee Jan, Wen-Lian Hsu, and Hsin-Hsi Chen. 2015. Extractive broadcast news summarization leveraging recurrent neural network language modeling techniques. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(8): 1322-1334.
[32] Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen, and Hsin-Hsi Chen. 2013. Weighted matrix factorization for spoken document retrieval. In Proc. ICASSP, pages 8530-8534.
[33] Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Proc. NIPS, pages 1-9.