Knowledge Guided Attention and Inference for Describing Images Containing Unseen Objects


Aditya Mogadala¹, Umanga Bista², Lexing Xie², Achim Rettinger¹

¹ Institute of Applied Informatics and Formal Description Methods (AIFB), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany {aditya.mogadala,rettinger}@kit.edu
² Computational Media Lab, Australian National University (ANU), Canberra, Australia {umanga.bista,lexing.xie}@anu.edu

Abstract. Images on the Web encapsulate diverse knowledge about varied abstract concepts. They cannot be sufficiently described with models learned from image-caption pairs that mention only a small number of visual object categories. In contrast, large-scale knowledge graphs contain many more concepts that can be detected by image recognition models. Hence, to assist description generation for those images which contain visual objects unseen in image-caption pairs, we propose a two-step process that leverages large-scale knowledge graphs. In the first step, a multi-entity recognition model is built to annotate images with concepts not mentioned in any caption. In the second step, those annotations are leveraged as external semantic attention and constrained inference in the image description generation model. Evaluations show that our models outperform most of the prior work on out-of-domain MSCOCO image description generation and also scale better to broad domains with more unseen objects.

1 Introduction

Content on the Web is highly heterogeneous and consists mostly of visual and textual information. In most cases, these different modalities complement each other, which complicates the capturing of the full meaning by automated knowledge extraction techniques. An approach for making information in all modalities accessible to automated processing is linking the information represented in the different modalities (e.g., images and text) into a shared conceptualization, like entities in a Knowledge Graph (KG). However, obtaining an expressive formal representation of textual and visual content has remained a research challenge for many years.

Recently, a different approach has shown impressive results, namely the transformation of one unstructured representation into another. Specifically, the task of generating natural language descriptions of images or videos [16] has gained much attention. While such approaches do not rely on formal conceptualizations of the domain to cover, the systems that have been proposed so far

are limited by a very small number of objects that they can describe (less than 100). Obviously, such methods, as they need to be trained on manually crafted image-caption parallel data, do not scale to real-world applications and cannot be applied to cross-domain web-scale content. In contrast, visual object classification techniques have improved considerably and they are now scaling to thousands of objects, more than the ones covered by caption training data [3]. Also, KGs have grown to cover all of those objects plus millions more, accompanied by billions of facts describing relations between those objects. Thus, it appears that those information sources are the missing link to make existing image captioning models scale to a larger number of objects without having to create additional image-caption training pairs with those missing objects.

In this paper, we investigate the hypothesis that conceptual relations of entities as represented in KGs can provide information to enable caption generation models to generalize to objects that they haven't seen during training in the image-caption parallel data. While there are existing methods that are tackling this task, none of them has exploited any form of conceptual knowledge so far. In our model, we use KG entity embeddings to guide the attention of the caption generator to the correct (unseen) object that is depicted in the image. Our main contributions presented in this paper are summarized as follows:

- We designed a novel approach, called Knowledge Guided Attention (KGA), to improve the task of generating captions for images which contain objects that are not in the training data. To achieve it, we created a multi-entity-label image classifier for linking the depicted visual objects to KG entities. Based on that, we introduce the first mechanism that exploits the relational structure of entities in KGs for guiding the attention of a caption generator towards picking the correct KG entity to mention in its descriptions.
- We conducted an extensive experimental evaluation showing the effectiveness of our KGA method, both in terms of generating effectual captions and in scaling it to more than 600 visual objects.

The contribution of this work on a broader scope is its progress towards the integration of the visual and textual information available on the Web with KGs.

2 Previous Work on Describing Images with Unseen Objects

Existing methods such as Deep Compositional Captioning (DCC) [4], Novel Object Captioner (NOC) [15], Constrained Beam Search (CBS) [2] and LSTM-C [17] address the challenge by transferring information between seen and unseen objects either before inference (i.e., before testing) or by keeping constraints on the generation of caption words during inference (i.e., during testing). Figure 1 provides a broad overview of those approaches.

[Figure 1: for an image containing the unseen object pizza, the approaches compare as follows. No attention + no transfer (Base CNN-LSTM): "A man is making a sandwich in a restaurant." No attention + transfer before inference (DCC, NOC) or during inference (CBS, LSTM-C): "A man standing next to a table with a pizza in front of it." Knowledge assisted attention + transfer before and during inference (KGA, ours): "A man is holding a pizza in his hands."]

Fig. 1. KGA's goal is to describe images containing unseen objects by building on the existing methods, i.e., DCC [4], NOC [15], CBS [2] and LSTM-C [17], and going beyond them by adding semantic knowledge assistance. Base refers to our base description generation model built with CNN [13]-LSTM [5].

In DCC, an approach which performs information transfer only before inference, the training of the caption generation model is solely dependent on the corpus constituting words which may appear in a similar context as the unseen objects. Hence, explicit transfer of learned parameters is required between seen and unseen object categories before inference, which limits DCC from scaling to a wide variety of unseen objects. NOC tries to overcome such issues by adopting an end-to-end trainable framework which incorporates auxiliary training objectives during training, detaching the need for explicit transfer of parameters between seen and unseen objects before inference. However, NOC training can result in sub-optimal solutions, as the additional training attempts to optimize three different loss functions simultaneously. CBS leverages an approximate search algorithm to guarantee the inclusion of selected words during inference of a caption generation model. These words are, however, only constrained on the image tags produced by an image classifier. Moreover, the vocabulary used to find similar words as candidates for replacement during inference is usually kept very large, adding extra computational complexity. LSTM-C avoids the limitation of finding similar words during inference by adding a copying mechanism into caption training. This assists the model during inference in deciding whether a word is to be generated or copied from a dictionary. However, LSTM-C suffers from confusion problems, since probabilities during word generation tend to get very low.

In general, the aforementioned approaches also have the following limitations: (1) The image classifiers used cannot predict abstract meaning, like hope, as observed in many web images. (2) Visual features extracted from images are confined to the probability of occurrence of a fixed set of labels (i.e., nouns, verbs and adjectives) observed in a restricted dataset and cannot be easily extended to varied categories for large-scale experiments. (3) Since an attention mechanism is missing, important regions in an image are never attended, while the attention mechanism in our model helps to scale down all possible identified concepts to

the relevant concepts during caption generation. For large-scale applications, this plays a crucial role.

We introduce a new model called Knowledge Guided Assistance (KGA) that exploits conceptual knowledge provided by a knowledge graph (KG) [6] as external semantic attention throughout training and also as a dynamic constraint before and during inference. Hence, it augments an auxiliary view as done in multi-view learning scenarios. Usage of KGs has already shown improvements in other tasks, such as question answering over structured data, language modeling [1], and generation of factoid questions [12].

3 Describing Images with Unseen Objects Using Knowledge Guided Assistance (KGA)

In this section, we present our caption generation model to generate captions for unseen visual object categories with knowledge assistance. KGA's core goal is to introduce external semantic attention (ESA) into the learning and also to work as a constraint before and during inference for transferring information between seen words and unseen visual object categories.

3.1 Caption Generation Model

Our image caption generation model (henceforth, KGA-CGM) combines three important components: a language model pre-trained on unpaired textual corpora, external semantic attention (ESA) and image features, with a textual (T), semantic (S) and visual (V) layer (i.e., TSV layer) for predicting the next word in the sequence when learned using image-caption pairs. In the following, we present each of these components separately, while Figure 2 presents the overall architecture of KGA-CGM.

Language Model. This component is crucial to transfer the sentence structure for unseen visual object categories. The language model is implemented with two long short-term memory (LSTM) [5] layers to predict the next word given previous words in a sentence. Let $w_{1:L}$ represent the input to the forward LSTM of layer-1 for capturing forward input sequences into hidden sequence vectors ($h^1_{1:L} \in \mathbb{R}^H$), where $L$ is the final time step. The encoding of input word sequences into hidden layer-1 and then into layer-2 at each time step $t$ is achieved as follows:

$$h^1_t = \text{LSTM}_{\text{L1-F}}(w_t; \Theta) \qquad (1)$$
$$h^2_t = \text{LSTM}_{\text{L2-F}}(h^1_t; \Theta) \qquad (2)$$

where $\Theta$ represents the hidden layer parameters. The encoded final hidden sequence ($h^2_t \in \mathbb{R}^H$) at time step $t$ is then used for predicting the probability distribution of the next word, given by $p_{t+1} = \mathrm{softmax}(h^2_t)$. The softmax layer is only used while training with unpaired textual corpora and not when learning with image captions.
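
For concreteness, here is a minimal sketch of this two-layer forward-LSTM language model in Keras (the paper's own framework, per Section 5.1). The 256-dimensional embeddings and 512-dimensional hidden states follow Section 5.1; `vocab_size`, `max_len` and the layer names are illustrative assumptions.

```python
# Minimal sketch of the two-layer forward LSTM language model (Eqs. 1-2).
# 256-d word embeddings and 512-d hidden states follow Section 5.1;
# vocab_size (~70k+ words, Section 4.1) and max_len are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim, hidden_dim, max_len = 70000, 256, 512, 20

words = keras.Input(shape=(max_len,), dtype="int32")
emb = layers.Embedding(vocab_size, embed_dim, name="glove_init")(words)
h1 = layers.LSTM(hidden_dim, return_sequences=True, name="L1_F")(emb)  # Eq. 1
h2 = layers.LSTM(hidden_dim, return_sequences=True, name="L2_F")(h1)   # Eq. 2
# p_{t+1} = softmax(h2_t); this head is used only for unpaired-text training.
p_next = layers.Dense(vocab_size, activation="softmax")(h2)

lm = keras.Model(words, p_next)
lm.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```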

[Figure 2 shows the KGA-CGM architecture: the two-layer forward LSTM language model, the multi-word-label classifier producing visual features, and the multi-entity-label classifier producing KG-linked entity-labels (e.g., Restaurant, Pizza, Chef) that feed the external semantic attention.]

Fig. 2. KGA-CGM is built with three components: a language model built with a two-layer forward LSTM (L1-F and L2-F), a multi-word-label classifier to generate image visual features, and a multi-entity-label classifier that generates entity-labels linked to a KG, serving as a partial image-specific scene graph. This information is further leveraged to acquire entity vectors for supporting ESA. $w_t$ represents the input caption word, $c_t$ the semantic attention, $p_t$ the output probability distribution over all words and $y_t$ the predicted word at each time step $t$. BOS and EOS represent the special tokens.

External Semantic Attention (ESA). Our objective in ESA is to extract semantic attention from an image by leveraging semantic knowledge in the KG in the form of entity-labels obtained using a multi-entity-label image classifier (discussed in Section 4.2). Here, entity-labels are analogous to patches or attributes of an image. In formal terms, let $ea_i$ be an entity-label and $e_i \in \mathbb{R}^E$ the entity-label vector among the set of entity-label vectors ($i = 1, \ldots, L$), and let $\beta_i$ be the attention weight of $e_i$. Then $\beta_i$ is calculated at each time step $t$ using Equation 3:

$$\beta_i^t = \frac{\exp(O_i^t)}{\sum_{j=1}^{L} \exp(O_j^t)} \qquad (3)$$

where $O_i^t = f(e_i, h^2_t)$ represents a scoring function which conditions on the layer-2 hidden state ($h^2_t$) of the caption language model. It can be observed that the scoring function $f(e_i, h^2_t)$ is crucial for deciding attention weights. The relevance of the hidden state with each entity-label is calculated using Equation 4:

$$f(e_i, h^2_t) = \tanh\big((h^2_t)^T W_{he}\, e_i\big) \qquad (4)$$

where $W_{he} \in \mathbb{R}^{H \times E}$ is a bilinear parameter matrix. Once the attention weights are calculated, the soft attention weighted vector of the context $c_t$, which is a dynamic representation of the caption at time step $t$, is given by Equation 5:

$$c_t = \sum_{i=1}^{L} \beta_i^t e_i \qquad (5)$$
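
To make the attention computation concrete, the following is a small NumPy sketch of Equations 3-5. The 500-dimensional entity vectors match Section 4.3; the function name, shapes and random inputs are illustrative assumptions.

```python
# NumPy sketch of external semantic attention (Eqs. 3-5).
# e: L x E matrix of entity-label vectors, h2: H-dim layer-2 hidden state,
# W_he: H x E bilinear parameter matrix. All names are illustrative.
import numpy as np

def esa_context(e, h2, W_he):
    # Eq. 4: relevance score of each entity-label w.r.t. the hidden state.
    scores = np.tanh(e @ W_he.T @ h2)             # shape (L,)
    # Eq. 3: softmax over entity-labels gives attention weights beta.
    beta = np.exp(scores - scores.max())
    beta /= beta.sum()
    # Eq. 5: soft attention weighted context vector c_t.
    return beta @ e                               # shape (E,)

rng = np.random.default_rng(0)
L, E, H = 5, 500, 512                             # 500-d RDF2Vec entity vectors
c_t = esa_context(rng.normal(size=(L, E)), rng.normal(size=H),
                  rng.normal(size=(H, E)))
```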

Here, $c_t \in \mathbb{R}^E$ and $L$ represents the cardinality of entity-labels per image-caption pair instance.

Image Features, TSV Layer & Next Word Prediction. Visual features for an image are extracted using the multi-word-label image classifier (discussed in Section 4.2). To be consistent with other approaches [4, 15] and for a fair comparison, our visual features ($I$) also cover the objects that we aim to describe outside of the caption datasets, besides the word-labels observed in paired image-caption data. Once the output from all components is acquired, the TSV layer is employed to integrate their features, i.e., the textual ($T$), semantic ($S$) and visual ($V$) features yielded by the language model, ESA and images respectively. Thus, TSV acts as a transformation layer for molding three different feature spaces into a single common space for prediction of the next word in the sequence. Let $h^2_t \in \mathbb{R}^H$, $c_t \in \mathbb{R}^E$ and $I \in \mathbb{R}^I$ represent the vectors acquired at each time step $t$ from the language model, ESA and images respectively. The integration at the TSV layer of KGA-CGM is given by Equation 6:

$$TSV_t = W_{h^2} h^2_t + W_c c_t + W_I I \qquad (6)$$

where $W_{h^2} \in \mathbb{R}^{vs \times H}$, $W_c \in \mathbb{R}^{vs \times E}$ and $W_I \in \mathbb{R}^{vs \times I}$ are linear conversion matrices and $vs$ is the image-caption pair training dataset vocabulary size. The output from the TSV layer at each time step $t$ is further used for predicting the next word in the sequence using a softmax layer, given by $p_{t+1} = \mathrm{softmax}(TSV_t)$.

3.2 KGA-CGM Training

To learn the parameters of KGA-CGM, we first freeze the parameters of the language model trained using unpaired textual corpora. Thus, only those parameters emerging from the ESA and TSV layers, i.e., $W_{he}$, $W_{h^2}$, $W_c$ and $W_I$, are learned with image-caption pairs. KGA-CGM is then trained to optimize the cost function that minimizes the sum of the negative log likelihood of the appropriate word at each time step, given by Equation 7:

$$\min_{\theta}\; -\frac{1}{N} \sum_{n=1}^{N} \sum_{t=0}^{L^{(n)}} \log p\big(y_t^{(n)}\big) \qquad (7)$$

where $L^{(n)}$ represents the length of the sentence (i.e., caption), including the beginning-of-sentence (BOS) and end-of-sentence (EOS) tokens, for the $n$-th training sample, and $N$ is the number of samples used for training.

3.3 KGA-CGM Constrained Inference

Inference in KGA-CGM refers to the generation of descriptions for test images. Here, inference is not straightforward as in standard image caption generation approaches [16], because unseen visual object categories have no parallel
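
A small NumPy sketch of the TSV fusion and next-word prediction may help. The shapes follow the text ($H=512$, $E=500$, $I=471$ word-label probabilities, $vs$ = caption vocabulary size); the function name and random weights are illustrative assumptions.

```python
# NumPy sketch of the TSV fusion layer and next-word prediction (Eq. 6).
import numpy as np

def tsv_next_word(h2, c, I, W_h2, W_c, W_I):
    # Eq. 6: map textual, semantic and visual features into one vs-dim space.
    tsv = W_h2 @ h2 + W_c @ c + W_I @ I
    # p_{t+1} = softmax(TSV_t): distribution over the caption vocabulary.
    z = np.exp(tsv - tsv.max())
    return z / z.sum()

vs, H, E, V = 8802, 512, 500, 471
rng = np.random.default_rng(1)
p_next = tsv_next_word(rng.normal(size=H), rng.normal(size=E),
                       rng.normal(size=V),
                       rng.normal(size=(vs, H)) * 0.01,
                       rng.normal(size=(vs, E)) * 0.01,
                       rng.normal(size=(vs, V)) * 0.01)
```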

captions throughout training; hence they would never be generated in a caption. Thus, unseen visual object categories require guidance either before or during inference from similar seen words that appear in the paired image-caption dataset, and likely also from image labels. In our case, we achieve the guidance both before and during inference with varied techniques.

Guidance before Inference. We first identify the seen words in the paired image-caption dataset that are similar to the visual object categories unseen in the image-caption dataset by estimating the semantic similarity using their GloVe embeddings [9] learned on unpaired textual corpora (more details in Section 4.1). We then utilize this information to perform a dynamic transfer between the seen words' visual feature ($W_I$), language model ($W_{h^2}$) and external semantic attention ($W_c$) weights and the unseen visual object categories. To illustrate, let $(v_{unseen}, i_{unseen})$ and $(v_{closest}, i_{closest})$ denote the indexes of the unseen visual object category zebra and its semantically similar known word giraffe in the vocabulary ($v_s$) and visual features ($i_s$) respectively. To describe images with zebra in the same manner as giraffe, the transfer of weights is performed between them by assigning $W_c[v_{closest},:]$, $W_{h^2}[v_{closest},:]$ and $W_I[v_{closest},:]$ to $W_c[v_{unseen},:]$, $W_{h^2}[v_{unseen},:]$ and $W_I[v_{unseen},:]$ respectively. Furthermore, $W_I[i_{unseen}, i_{closest}]$ and $W_I[i_{closest}, i_{unseen}]$ are set to zero to remove mutual dependencies of seen and unseen words' presence in an image. The aforementioned procedure updates the trained KGA-CGM model before inference to assist the generation of unseen visual object categories during inference, as given by Algorithm 1.

Input: M = {W_he, W_h2, W_c, W_I}
Output: M_new
1:  Initialize List(closest) = cosine_distance(List(unseen), vocabulary)
2:  Initialize W_c[v_unseen,:], W_h2[v_unseen,:], W_I[v_unseen,:] = 0
3:  function Before_Inference
4:    forall items T in closest and Z in unseen do
5:      if T and Z in vocabulary then
6:        W_c[v_Z,:] = W_c[v_T,:]
7:        W_h2[v_Z,:] = W_h2[v_T,:]
8:        W_I[v_Z,:] = W_I[v_T,:]
9:      end
10:     if i_T and i_Z in visual features then
11:       W_I[i_Z, i_T] = 0
12:       W_I[i_T, i_Z] = 0
13:     end
14:   end
15:   M_new = M
16:   return M_new
17: end

Algorithm 1: Constrained Inference Overview (Before)
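
For readers who prefer runnable code, the following Python sketch mirrors Algorithm 1; the GloVe lookup, index maps and matrix names are assumptions, not the released implementation.

```python
# Runnable sketch of Algorithm 1 (weight transfer before inference),
# assuming GloVe vectors in `glove`, vocab/visual-label index maps, and the
# trained KGA-CGM matrices W_c, W_h2, W_I (all illustrative names).
import numpy as np

def transfer_before_inference(unseen, glove, vocab_idx, vis_idx,
                              W_c, W_h2, W_I):
    seen = [w for w in vocab_idx if w not in unseen]
    for z in unseen:
        # Closest seen word by cosine similarity of GloVe embeddings.
        sims = {t: glove[z] @ glove[t] /
                   (np.linalg.norm(glove[z]) * np.linalg.norm(glove[t]))
                for t in seen if t in glove}
        t = max(sims, key=sims.get)
        # Copy the closest seen word's rows to the unseen word (lines 6-8).
        for W in (W_c, W_h2, W_I):
            W[vocab_idx[z], :] = W[vocab_idx[t], :]
        # Remove mutual dependencies in the visual-feature columns (11-12).
        if z in vis_idx and t in vis_idx:
            W_I[vocab_idx[z], vis_idx[t]] = 0.0
            W_I[vocab_idx[t], vis_idx[z]] = 0.0
    return W_c, W_h2, W_I
```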

Guidance during Inference. The updated KGA-CGM model is used for generating descriptions of unseen visual object categories. However, in the before-inference procedure, the closest words to unseen visual object categories are identified using embeddings that are learned only on textual corpora and are never constrained on images. This obstructs the view from an image, leading to spurious results. We resolve such nuances during inference by constraining the beam search used for description generation with the image entity-labels ($ea$). In general, beam search is used to consider the best $k$ sentences at time $t$ to identify the sentence at the next time step. Our modification of beam search adds an extra constraint that checks whether a generated unseen visual object category is part of the entity-labels. If it is not, seen words are never replaced with their closest unseen visual object categories. Algorithm 2 presents the overview of KGA-CGM guidance during inference.

Input: M_new, Im_labels, beam-size k, word w
Output: best k successors
1:  Initialize Im_labels = Top-5(ea)
2:  Initialize beam-size k
3:  Initialize word w = null
4:  function During_Inference
5:    forall states s of k words do
6:      w = s
7:      if closest[w] in ea then
8:        s = closest[w]
9:      else
10:       s = w
11:     end
12:   end
13:   return best k successors
14: end

Algorithm 2: Constrained Inference Overview (During)

4 Experimental Setup

4.1 Resources and Datasets

Our approach depends on several resources and datasets.

Knowledge Graphs (KGs) and Unpaired Textual Corpora. There are several openly available KGs, such as DBpedia, Wikidata, and YAGO, which provide semantic knowledge encapsulated in entities and their relationships. We
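
The constraint itself reduces to a small check per beam state, as in this sketch; the `closest` map (seen word to unseen category) and the top-5 entity-label set are illustrative assumptions.

```python
# Sketch of the during-inference constraint (Algorithm 2): a seen word in a
# beam state is swapped for its unseen counterpart only when that unseen
# category appears among the image's top-5 predicted entity-labels.

def constrain_beam(states, entity_labels, closest, k=1):
    """states: current best-k words; entity_labels: top-5 image entities."""
    constrained = []
    for w in states:
        if closest.get(w) in entity_labels:
            constrained.append(closest[w])   # emit the unseen category
        else:
            constrained.append(w)            # keep the original word
    return constrained[:k]

# Example: 'giraffe' becomes 'zebra' only if Zebra was detected in the image.
print(constrain_beam(["giraffe"], {"zebra", "zoo"}, {"giraffe": "zebra"}))
```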

choose DBpedia as our KG for entity annotation, as it is one of the most extensively used resources for semantic annotation and disambiguation [6]. For learning the weights of the language model and also the GloVe word embeddings, we have explored different unpaired textual corpora from out-of-domain sources (i.e., outside the image-caption parallel corpora) such as the British National Corpus (BNC)³, Wikipedia (Wiki) and a subset of the SBU1M⁴ caption text containing the 947 categories of the ILSVRC12 dataset [11]. The NLTK⁵ sentence tokenizer is used to extract tokenizations, and a vocabulary of around 70k+ words is extracted with GloVe embeddings.

Unseen Objects Description (Out-of-Domain MSCOCO & ImageNet). To evaluate KGA-CGM, we use the subset of the MSCOCO dataset [7] proposed by Hendricks et al. [4]. The dataset is obtained by clustering 80 image object category labels into 8 clusters and then selecting one object from each cluster to be held out from the training set. The training set thus contains no images or sentences of those 8 objects, namely bottle, bus, couch, microwave, pizza, racket, suitcase and zebra, leaving an MSCOCO training dataset of 70,194 image-caption pairs. The MSCOCO validation set of image-caption pairs is divided in half for testing and validation. The goal of KGA-CGM is then to generate captions for those test images which contain these 8 unseen object categories. Henceforth, we refer to this dataset as out-of-domain MSCOCO.

To evaluate KGA-CGM on a more challenging task, we attempt to describe images that contain the wide variety of objects observed on the web. To imitate such a scenario, we collected images from collections containing a wide variety of objects. We used the same set of images as earlier approaches [15, 17]: a subset of ImageNet [3] constituting the 642 object categories used in Hendricks et al. [4] which do not occur in MSCOCO. However, 120 out of those 642 object categories are part of ILSVRC12.

³ http://
⁴ http://vision.cs.stonybrook.edu/~vicente/sbucaptions/
⁵ http://

4.2 Multi-Label Image Classifiers

The important constituents that influence KGA-CGM are the image entity-labels and visual features. Identified objects, actions etc. in an image are embodied in the visual features, while entity-labels capture the semantic knowledge in an image grounded in the KG. In this section, we present the approach to extract both visual features and entity-labels.

Multi-Word-Label Image Classifier. To extract visual features of out-of-domain MSCOCO images, emulating Hendricks et al. [4], a multi-word-label classifier is built using the captions aligned to an image, by extracting part-of-speech (POS) tags such as nouns, verbs and adjectives attained for each word

in the entire MSCOCO dataset. For example, the caption "A young child brushes his teeth at the sink" contains word-labels such as young (JJ), child (NN), teeth (NN) etc., that represent concepts in an image. An image classifier is then trained with 471 word-labels using a sigmoid cross-entropy loss by fine-tuning VGG-16 [13] pre-trained on the training part of ILSVRC12. The visual features extracted for a new image represent the probabilities of the 471 image labels observed in that image. For extracting visual features from ImageNet images, we replace the multi-word-label classifier with the lexical classifier [4] learned with 642 ImageNet object categories.

Multi-Entity-Label Image Classifier. To extract semantic knowledge for out-of-domain MSCOCO images analogous to the word-labels, a multi-entity-label classifier is built with entity-labels attained from a knowledge graph annotation tool, DBpedia Spotlight⁶, applied to the training set of MSCOCO constituting 82,783 training image-caption pairs. In total, around 812 unique labels are extracted, with an average of 3.2 labels annotated per image. To illustrate, for the caption presented in the previous section, the entity-labels extracted are Brush⁷ and Tooth⁸. An image classifier is then trained with multiple entity-labels using a sigmoid cross-entropy loss by fine-tuning VGG-16 [13] pre-trained on the training part of ILSVRC12. For extracting entity-labels from ImageNet images, we again leveraged the lexical classifier [4] learned with 642 ImageNet object categories. However, as all 642 categories denote WordNet synsets, we build a connection between these categories and DBpedia by leveraging BabelNet [8] for the multi-entity-label classifier. To illustrate, the visual object category wombat (WordNet id: n…) in ImageNet can be linked to the DBpedia entity Wombat⁹. This makes our method very modular for building new image classifiers that incorporate semantic knowledge.

4.3 Entity-Label Embeddings

We described earlier that the entity-labels for training the multi-entity-label classifier were obtained using the DBpedia Spotlight entity annotation and disambiguation tool. Hence, entity-labels are expected to encapsulate semantic knowledge grounded in the KG. Further, entities in a KG can be represented with embeddings that capture their relational information. In our work, we examine the efficacy of these embeddings for caption generation. Thus, we leverage entity-label embeddings for computing the semantic attention observed in an image with respect to the caption, as observed from the KG. To obtain entity-label embeddings, we adopted the RDF2Vec [10] approach and generated 500-dimensional vector representations for the 812 and 642 entity-labels used to describe out-of-domain MSCOCO and ImageNet images respectively.

⁶ https://github.com/dbpedia-spotlight/
⁷ http://dbpedia.org/resource/Brush
⁸ http://dbpedia.org/resource/Tooth
⁹ http://dbpedia.org/page/Wombat
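
A minimal Keras sketch of this multi-label fine-tuning setup follows, assuming the modern `keras.applications` VGG-16 rather than the original Caffe-era pipeline; the head layer and optimizer settings are illustrative.

```python
# Sketch of the multi-word-label classifier: VGG-16 pre-trained on ILSVRC12,
# fine-tuned with a sigmoid cross-entropy loss over 471 word-labels. The
# multi-entity-label classifier is identical except for its 812 entity-labels.
from tensorflow import keras
from tensorflow.keras import layers

num_labels = 471                      # 812 for the entity-label classifier
base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                input_shape=(224, 224, 3), pooling="avg")
# Independent sigmoids: an image can carry several labels at once.
out = layers.Dense(num_labels, activation="sigmoid")(base.output)
clf = keras.Model(base.input, out)
clf.compile(optimizer=keras.optimizers.Adam(1e-5),
            loss="binary_crossentropy")
# clf.predict(images) yields per-label probabilities, used either as the
# visual features I (word-labels) or to pick the top-5 entity-labels.
```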

4.4 Evaluation Measures

To evaluate the generated descriptions for the unseen MSCOCO visual object categories, we use evaluation metrics similar to earlier approaches [4, 15, 17], namely METEOR and also SPICE [2]. However, the CIDEr [14] metric is not used, as it requires calculating the inverse document frequency across the entire test set and not just the unseen object subsets. The F1 score is also calculated to measure the presence of unseen objects in the generated captions when compared against reference captions. Furthermore, to evaluate ImageNet object category description generation, we leveraged F1 and also other metrics such as the Unseen and Accuracy scores [15, 17]. The Unseen score measures the percentage of all novel objects mentioned in generated descriptions, while Accuracy measures the percentage of image descriptions that correctly address the unseen objects.

5 Experiments

The experiments are conducted to evaluate the efficacy of the KGA-CGM model for describing out-of-domain MSCOCO and ImageNet images.

5.1 Implementation

The KGA-CGM model constitutes three important components, i.e., the language model, visual features and entity-labels. Before learning the KGA-CGM model with image-caption pairs, we first learn the weights of the language model and keep them fixed during the training of the KGA-CGM model. To learn the language model, we leverage unpaired textual corpora (e.g., the entire MSCOCO set, Wiki, BNC etc.) and provide input word embeddings of 256 dimensions, pre-trained with GloVe [9] default settings on the same unpaired textual corpora. The hidden layer dimensions of the language model are set to 512. The KGA-CGM model is then trained using image-caption pairs with the Adam optimizer with gradient clipping having a maximum norm of 1.0. Validation data is used for model selection, and experiments are implemented with Keras with Theano backend¹⁰.

¹⁰ https://github.com/adityamogadala/kga
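
As an illustration of this training regime, the stand-in model below freezes a pre-trained LSTM and compiles with Adam and gradient clipping at max norm 1.0; it is a toy sketch, not the full KGA-CGM.

```python
# Illustrative optimizer setup for Section 5.1: frozen language-model layers,
# Adam with clipnorm=1.0, so only the ESA/TSV parameters would be learned.
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(20,), dtype="int32")
emb = layers.Embedding(8802, 256)(inp)
lstm = layers.LSTM(512, return_sequences=True, name="pretrained_lm")
h = lstm(emb)
out = layers.Dense(8802, activation="softmax")(h)   # stands in for TSV layer

model = keras.Model(inp, out)
lstm.trainable = False                               # freeze the language model
model.compile(optimizer=keras.optimizers.Adam(clipnorm=1.0),
              loss="sparse_categorical_crossentropy")
```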

5.2 Describing Out-of-Domain MSCOCO Images

In this section, we evaluate KGA-CGM using the out-of-domain MSCOCO dataset.

Quantitative Analysis. We compared our complete KGA-CGM model with the other existing models that generate image descriptions on out-of-domain MSCOCO. For a fair comparison, only those results are compared which used VGG-16 to generate image features. Table 1 shows the comparison of individual and average scores based on METEOR, SPICE and F1 on all 8 unseen visual object categories with beam size 1.

[Table 1 compares F1, METEOR and SPICE for each of the 8 unseen objects (microwave, racket, bottle, zebra, pizza, couch, bus, suitcase) and on average, for DCC [4], NOC [15], CBS(T4) [2], LSTM-C [17] and KGA-CGM at beam sizes 1 and >1; the numeric scores are not recoverable from the source.]

Table 1. Measures for all 8 unseen objects. Underline shows the second best.

It can be noticed that KGA-CGM with beam size 1 was comparable to other approaches even though it used a fixed vocabulary from image-caption pairs. For example, CBS [2] used an expanded vocabulary of 21,689 words, compared to our 8,802. Also, our word-labels per image are fixed, while CBS uses a varying number of predicted image tags (T1-4). This makes it non-deterministic and can increase uncertainty, as varying tags will either increase or decrease performance. Furthermore, we also evaluated KGA-CGM on the rest of the seen visual object categories (Table 2). It can be observed that our KGA-CGM outperforms existing approaches, as it did not undermine in-domain description generation although it was tuned for out-of-domain description generation.

[Table 2 reports average METEOR, SPICE and F1 on the MSCOCO seen objects for DCC [4], CBS(T4) [2] and KGA-CGM at beam sizes 1 and >1; the numeric scores are not recoverable from the source.]

Table 2. Average measures of MSCOCO seen objects.

Qualitative Analysis. In Figure 3, sample predictions of our best KGA-CGM model are presented. It can be observed that the entity-labels influence caption generation: since entities used as image labels are already disambiguated, they attain high similarity in the prediction of a word, thus adding useful semantics. Figure 3 presents example descriptions for unseen visual objects.

Unseen Object: Bottle. Predicted Entity-Labels (Top-3): Wine_glass, Wine_bottle, Bottle. Base: "A vase with a flower in it sitting on a table." NOC: "A wine bottle sitting on a table next to a wine bottle." KGA-CGM: "A bottle of wine sitting on top of a table."

Unseen Object: Couch. Predicted Entity-Labels (Top-3): Cake, Couch, Glass. Base: "A person is laying down on a bed." NOC: "A woman sitting on a chair with a large piece of cake on her arm." KGA-CGM: "A woman sitting on a couch with a remote."

Unseen Object: Pizza. Predicted Entity-Labels (Top-3): Pizza, Restaurant, Hat. Base: "A man is making a sandwich in a restaurant." NOC: "A man standing next to a table with a pizza in front of it." KGA-CGM: "A man is holding a pizza in his hands."

Unseen Object: Suitcase. Predicted Entity-Labels (Top-3): Cat, Baggage, Black_Cat. Base: "A cat laying on top of a pile of books." NOC: "A cat laying on a suitcase on a bed." KGA-CGM: "A cat laying inside of a suitcase on a bed."

Unseen Object: Bus. Predicted Entity-Labels (Top-3): Bus, Public_Transport, Transit_Bus. Base: "A car is parked on the side of the street." NOC: "Bus driving down a street next to a bus stop." KGA-CGM: "A white bus is parked on the street."

Unseen Object: Microwave. Predicted Entity-Labels (Top-3): Refrigerator, Oven, Microwave_Oven. Base: "A wooden table with a refrigerator and a brown cabinet." NOC: "A kitchen with a refrigerator, refrigerator, and refrigerator." KGA-CGM: "A kitchen with a microwave, oven and a refrigerator."

Unseen Object: Racket. Predicted Entity-Labels (Top-3): Tennis, Racket_(sports_equipment), Court. Base: "A tennis player getting ready to serve the ball." NOC: "A woman court holding a tennis racket on a court." KGA-CGM: "A woman playing tennis on a tennis court with a racket."

Unseen Object: Zebra. Predicted Entity-Labels (Top-3): Zebra, Enclosure, Zoo. Base: "A couple of animals that are standing in a field." NOC: "Zebras standing together in a field with zebras." KGA-CGM: "A group of zebras standing in a line."

Fig. 3. Sample predictions of KGA-CGM on out-of-domain MSCOCO images with beam size 1, compared against the base model and NOC [15].

5.3 Describing ImageNet Images

ImageNet images do not contain any ground-truth captions and contain exactly one unseen visual object category per image. We first retrain different language models using unpaired textual data (Section 4.1) and also the entire MSCOCO training set, and the KGA-CGM model is rebuilt for each of them separately. To describe ImageNet images, the image classifiers presented in Section 4.2 are leveraged. Table 3 summarizes the experimental results attained on 634 categories (i.e., not all 642) to allow a fair comparison with other approaches. By adopting only MSCOCO training data for the language model, our KGA-CGM achieves a relative improvement over NOC and LSTM-C in all measures, i.e., Unseen, F1 and Accuracy. Figure 4 shows a few sample descriptions.

[Table 3 reports Unseen, F1 and Accuracy for NOC [15] and LSTM-C [17] with language models trained on MSCOCO and on BNC & Wiki, and for KGA-CGM with language models trained on MSCOCO, on BNC & Wiki, and on BNC & Wiki & SBU1M; the numeric scores are not recoverable from the source.]

Table 3. Describing ImageNet images with beam size 1. Results of NOC and LSTM-C (with GloVe) are adopted from Yao et al. [17].

Unseen Object: Truffle. Guidance before inference: food → truffle. Base: "A person holding a piece of paper." KGA-CGM: "A close up of a person holding truffle."

Unseen Object: Papaya. Guidance before inference: banana → papaya. Base: "A woman standing in a garden." KGA-CGM: "These are ripe papaya hanging on a tree."

Unseen Object: Mammoth. Guidance before inference: elephant → mammoth. Base: "A baby elephant standing in water." KGA-CGM: "A herd of mammoth standing on top of a green field."

Unseen Object: Blackbird. Guidance before inference: bird → blackbird. Base: "A bird standing in a field of green grass." KGA-CGM: "A blackbird standing in the grass."

Fig. 4. ImageNet images described with the best KGA-CGM model from Table 3. "Guidance before inference" shows which words are used for transfer between seen and unseen.

6 Key Findings

The key observations of our research are: (1) The ablation study conducted to understand the influence of different components in KGA-CGM has shown that using external semantic attention and constrained inference together has superior performance compared to using only either of them. Also, increasing the beam size during inference leads to a drop in all measures; this is basically attributable to the influence of multiple words on unseen objects. (2) The performance advantage becomes clearer as the domain of unseen objects is broadened. In other words, KGA-CGM specifically improves over the state of the art in settings that are larger and less controlled; it scales to one order of magnitude more unseen objects with only moderate performance decreases. (3) The closest seen words (i.e., those observed in image-caption pairs) played a prominent role in generating descriptions of the unseen visual object categories. For example, in out-of-domain MSCOCO, words such as suitcase/bag, bottle/glass and bus/truck are semantically similar and are used in a similar manner in a sentence, which added excellent value. However, some words that usually co-occur, such as racket/court and pizza/plate, play different roles in sentences and led to a few grammatical errors. (4) The decrease in performance correlates highly with the discrepancy between the domains the seen and unseen objects come from.

7 Conclusion and Future Work

In this paper, we presented an approach to generate captions for images that lack parallel captions during training, with assistance from semantic knowledge encapsulated in KGs. In the future, we plan to expand our models to build multimedia knowledge graphs along with image descriptions, which can be used for finding related images or can be searched with long textual queries.

8 Acknowledgements

The first author is grateful to KHYS at KIT for their research travel grant and to the Computational Media Lab at ANU for providing access to their K40x GPUs.

References

1. Ahn, S., Choi, H., Pärnamaa, T., Bengio, Y.: A neural knowledge language model. arXiv preprint (2016)
2. Anderson, P., Fernando, B., Johnson, M., Gould, S.: Guided open vocabulary image captioning with constrained beam search. In: EMNLP (2017)
3. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2009)
4. Hendricks, L.A., Venugopalan, S., Rohrbach, M., Mooney, R., Saenko, K., Darrell, T.: Deep compositional captioning: Describing novel object categories without paired training data. In: CVPR (2016)
5. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8) (1997)
6. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., et al.: DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web (2015)
7. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: ECCV. Springer (2014)
8. Navigli, R., Ponzetto, S.P.: BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence 193 (2012)
9. Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: EMNLP (2014)
10. Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: International Semantic Web Conference. Springer (2016)
11. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3) (2015)
12. Serban, I.V., García-Durán, A., Gulcehre, C., Ahn, S., Chandar, S., Courville, A., Bengio, Y.: Generating factoid questions with recurrent neural networks: The 30M factoid question-answer corpus. arXiv preprint (2016)
13. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint (2014)
14. Vedantam, R., Zitnick, C.L., Parikh, D.: CIDEr: Consensus-based image description evaluation. In: CVPR (2015)
15. Venugopalan, S., Hendricks, L.A., Rohrbach, M., Mooney, R., Darrell, T., Saenko, K.: Captioning images with diverse objects. In: CVPR (2017)
16. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(4) (2017)
17. Yao, T., Pan, Y., Li, Y., Mei, T.: Incorporating copying mechanism in image captioning for learning novel objects. In: CVPR (2017)


More information

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

ACTIVITY: Comparing Combination Locks

ACTIVITY: Comparing Combination Locks 5.4 Compound Events outcomes of one or more events? ow can you find the number of possible ACIVIY: Comparing Combination Locks Work with a partner. You are buying a combination lock. You have three choices.

More information

Глубокие рекуррентные нейронные сети для аспектно-ориентированного анализа тональности отзывов пользователей на различных языках

Глубокие рекуррентные нейронные сети для аспектно-ориентированного анализа тональности отзывов пользователей на различных языках Глубокие рекуррентные нейронные сети для аспектно-ориентированного анализа тональности отзывов пользователей на различных языках Тарасов Д. С. (dtarasov3@gmail.com) Интернет-портал reviewdot.ru, Казань,

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

Model Ensemble for Click Prediction in Bing Search Ads

Model Ensemble for Click Prediction in Bing Search Ads Model Ensemble for Click Prediction in Bing Search Ads Xiaoliang Ling Microsoft Bing xiaoling@microsoft.com Hucheng Zhou Microsoft Research huzho@microsoft.com Weiwei Deng Microsoft Bing dedeng@microsoft.com

More information

A DISTRIBUTIONAL STRUCTURED SEMANTIC SPACE FOR QUERYING RDF GRAPH DATA

A DISTRIBUTIONAL STRUCTURED SEMANTIC SPACE FOR QUERYING RDF GRAPH DATA International Journal of Semantic Computing Vol. 5, No. 4 (2011) 433 462 c World Scientific Publishing Company DOI: 10.1142/S1793351X1100133X A DISTRIBUTIONAL STRUCTURED SEMANTIC SPACE FOR QUERYING RDF

More information

Concepts and Properties in Word Spaces

Concepts and Properties in Word Spaces Concepts and Properties in Word Spaces Marco Baroni 1 and Alessandro Lenci 2 1 University of Trento, CIMeC 2 University of Pisa, Department of Linguistics Abstract Properties play a central role in most

More information

J j W w. Write. Name. Max Takes the Train. Handwriting Letters Jj, Ww: Words with j, w 321

J j W w. Write. Name. Max Takes the Train. Handwriting Letters Jj, Ww: Words with j, w 321 Write J j W w Jen Will Directions Have children write a row of each letter and then write the words. Home Activity Ask your child to write each letter and tell you how to make the letter. Handwriting Letters

More information

A Neural Network GUI Tested on Text-To-Phoneme Mapping

A Neural Network GUI Tested on Text-To-Phoneme Mapping A Neural Network GUI Tested on Text-To-Phoneme Mapping MAARTEN TROMPPER Universiteit Utrecht m.f.a.trompper@students.uu.nl Abstract Text-to-phoneme (T2P) mapping is a necessary step in any speech synthesis

More information

Georgetown University at TREC 2017 Dynamic Domain Track

Georgetown University at TREC 2017 Dynamic Domain Track Georgetown University at TREC 2017 Dynamic Domain Track Zhiwen Tang Georgetown University zt79@georgetown.edu Grace Hui Yang Georgetown University huiyang@cs.georgetown.edu Abstract TREC Dynamic Domain

More information

T2Ts, revised. Foundations

T2Ts, revised. Foundations T2Ts, revised Foundations LT, SC, Agenda LT: As a litterateur, I can utilize active reading strategies to support my reading comprehension and I can explain the expectations of the first Embedded Assessment

More information

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz

More information

Fourth Grade. Reporting Student Progress. Libertyville School District 70. Fourth Grade

Fourth Grade. Reporting Student Progress. Libertyville School District 70. Fourth Grade Fourth Grade Libertyville School District 70 Reporting Student Progress Fourth Grade A Message to Parents/Guardians: Libertyville Elementary District 70 teachers of students in kindergarten-5 utilize a

More information

WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web

WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web Hang Su Queen Mary University of London hang.su@qmul.ac.uk Shaogang Gong Queen Mary University of London s.gong@qmul.ac.uk Xiatian Zhu

More information

Switchboard Language Model Improvement with Conversational Data from Gigaword

Switchboard Language Model Improvement with Conversational Data from Gigaword Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword

More information

WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web

WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web Hang Su Queen Mary University of London hang.su@qmul.ac.uk Shaogang Gong Queen Mary University of London s.gong@qmul.ac.uk Xiatian Zhu

More information

Advanced Grammar in Use

Advanced Grammar in Use Advanced Grammar in Use A self-study reference and practice book for advanced learners of English Third Edition with answers and CD-ROM cambridge university press cambridge, new york, melbourne, madrid,

More information

Review in ICAME Journal, Volume 38, 2014, DOI: /icame

Review in ICAME Journal, Volume 38, 2014, DOI: /icame Review in ICAME Journal, Volume 38, 2014, DOI: 10.2478/icame-2014-0012 Gaëtanelle Gilquin and Sylvie De Cock (eds.). Errors and disfluencies in spoken corpora. Amsterdam: John Benjamins. 2013. 172 pp.

More information

BUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING

BUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING BUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING Gábor Gosztolya 1, Tamás Grósz 1, László Tóth 1, David Imseng 2 1 MTA-SZTE Research Group on Artificial

More information

arxiv: v1 [cs.lg] 15 Jun 2015

arxiv: v1 [cs.lg] 15 Jun 2015 Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy arxiv:1506.04477v1 [cs.lg] 15 Jun 2015 Sang-Woo Lee Min-Oh Heo School of Computer Science and

More information

Using Web Searches on Important Words to Create Background Sets for LSI Classification

Using Web Searches on Important Words to Create Background Sets for LSI Classification Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract

More information

SARDNET: A Self-Organizing Feature Map for Sequences

SARDNET: A Self-Organizing Feature Map for Sequences SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu

More information

Corpus Linguistics (L615)

Corpus Linguistics (L615) (L615) Basics of Markus Dickinson Department of, Indiana University Spring 2013 1 / 23 : the extent to which a sample includes the full range of variability in a population distinguishes corpora from archives

More information

CAFE ESSENTIAL ELEMENTS O S E P P C E A. 1 Framework 2 CAFE Menu. 3 Classroom Design 4 Materials 5 Record Keeping

CAFE ESSENTIAL ELEMENTS O S E P P C E A. 1 Framework 2 CAFE Menu. 3 Classroom Design 4 Materials 5 Record Keeping CAFE RE P SU C 3 Classroom Design 4 Materials 5 Record Keeping P H ND 1 Framework 2 CAFE Menu R E P 6 Assessment 7 Choice 8 Whole-Group Instruction 9 Small-Group Instruction 10 One-on-one Instruction 11

More information

AP Chemistry

AP Chemistry AP Chemistry 2016-2017 Welcome to AP Chemistry! I am so excited to have you in this course next year! To get geared up for the class, there are some things that you need to do this summer. None of it is

More information

Deep search. Enhancing a search bar using machine learning. Ilgün Ilgün & Cedric Reichenbach

Deep search. Enhancing a search bar using machine learning. Ilgün Ilgün & Cedric Reichenbach #BaselOne7 Deep search Enhancing a search bar using machine learning Ilgün Ilgün & Cedric Reichenbach We are not researchers Outline I. Periscope: A search tool II. Goals III. Deep learning IV. Applying

More information

Books Effective Literacy Y5-8 Learning Through Talk Y4-8 Switch onto Spelling Spelling Under Scrutiny

Books Effective Literacy Y5-8 Learning Through Talk Y4-8 Switch onto Spelling Spelling Under Scrutiny By the End of Year 8 All Essential words lists 1-7 290 words Commonly Misspelt Words-55 working out more complex, irregular, and/or ambiguous words by using strategies such as inferring the unknown from

More information

Dear Teacher: Welcome to Reading Rods! Reading Rods offer many outstanding features! Read on to discover how to put Reading Rods to work today!

Dear Teacher: Welcome to Reading Rods! Reading Rods offer many outstanding features! Read on to discover how to put Reading Rods to work today! Dear Teacher: Welcome to Reading Rods! Your Sentence Building Reading Rod Set contains 156 interlocking plastic Rods printed with words representing different parts of speech and punctuation marks. Students

More information