Knowledge Guided Attention and Inference for Describing Images Containing Unseen Objects
Aditya Mogadala 1, Umanga Bista 2, Lexing Xie 2, Achim Rettinger 1
1 Institute of Applied Informatics and Formal Description Methods (AIFB), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany {aditya.mogadala,rettinger}@kit.edu
2 Computational Media Lab, Australian National University (ANU), Canberra, Australia {umanga.bista,lexing.xie}@anu.edu

Abstract. Images on the Web encapsulate diverse knowledge about varied abstract concepts. They cannot be sufficiently described with models learned from image-caption pairs that mention only a small number of visual object categories. In contrast, large-scale knowledge graphs contain many more concepts that can be detected by image recognition models. Hence, to assist description generation for those images which contain visual objects unseen in image-caption pairs, we propose a two-step process by leveraging large-scale knowledge graphs. In the first step, a multi-entity recognition model is built to annotate images with concepts not mentioned in any caption. In the second step, those annotations are leveraged as external semantic attention and constrained inference in the image description generation model. Evaluations show that our models outperform most of the prior work on out-of-domain MSCOCO image description generation and also scale better to broad domains with more unseen objects.

1 Introduction

Content on the Web is highly heterogeneous and consists mostly of visual and textual information. In most cases, these different modalities complement each other, which complicates the capturing of the full meaning by automated knowledge extraction techniques. An approach for making information in all modalities accessible to automated processing is linking the information represented in the different modalities (e.g., images and text) into a shared conceptualization, like entities in a Knowledge Graph (KG). However, obtaining an expressive formal representation of textual and visual content has remained a research challenge for many years.
Recently, a different approach has shown impressive results, namely the transformation of one unstructured representation into another. Specifically, the task of generating natural language descriptions of images or videos [16] has gained much attention. While such approaches do not rely on formal conceptualizations of the domain to cover, the systems that have been proposed so far are limited by a very small number of objects that they can describe (less than 100). Obviously, such methods, as they need to be trained on manually crafted image-caption parallel data, do not scale to real-world applications and cannot be applied to cross-domain web-scale content. In contrast, visual object classification techniques have improved considerably and they are now scaling to thousands of objects, more than the ones covered by caption training data [3]. Also, KGs have grown to cover all of those objects plus millions more, accompanied by billions of facts describing relations between those objects. Thus, it appears that those information sources are the missing link to make existing image captioning models scale to a larger number of objects without having to create additional image-caption training pairs with those missing objects.

In this paper, we investigate the hypothesis that conceptual relations of entities as represented in KGs can provide information to enable caption generation models to generalize to objects that they haven't seen during training in the image-caption parallel data. While there are existing methods that are tackling this task, none of them has exploited any form of conceptual knowledge so far. In our model, we use KG entity embeddings to guide the attention of the caption generator to the correct (unseen) object that is depicted in the image. Our main contributions presented in this paper are summarized as follows:

- We designed a novel approach, called Knowledge Guided Attention (KGA), to improve the task of generating captions for images which contain objects that are not in the training data. To achieve it, we created a multi-entity-label image classifier for linking the depicted visual objects to KG entities. Based on that, we introduce the first mechanism that exploits the relational structure of entities in KGs for guiding the attention of a caption generator towards picking the correct KG entity to mention in its descriptions.
- We conducted an extensive experimental evaluation showing the effectiveness of our KGA method, both in terms of generating effectual captions and also scaling it to more than 600 visual objects.

The contribution of this work on a broader scope is its progress towards the integration of the visual and textual information available on the Web with KGs.

2 Previous Work on Describing Images with Unseen Objects

Existing methods such as Deep Compositional Captioning (DCC) [4], Novel Object Captioner (NOC) [15], Constrained Beam Search (CBS) [2] and LSTM-C [17] address the challenge by transferring information between seen and unseen objects either before inference (i.e. before testing) or by keeping constraints on the generation of caption words during inference (i.e. during testing). Figure 1 provides a broad overview of those approaches.
[Fig. 1 compares outputs for an image containing the unseen object "pizza": no attention + no transfer (Base CNN-LSTM: "A man is making a sandwich in a restaurant."); no attention + transfer before inference (DCC, NOC) or during inference (CBS, LSTM-C: "A man standing next to a table with a pizza in front of it."); knowledge-assisted attention + transfer before and during inference (KGA, ours: "A man is holding a pizza in his hands.").]

Fig. 1. KGA's goal is to describe images containing unseen objects by building on the existing methods, i.e. DCC [4], NOC [15], CBS [2] and LSTM-C [17], and going beyond them by adding semantic knowledge assistance. Base refers to our base description generation model built with CNN [13] - LSTM [5].

In DCC, an approach which performs information transfer only before inference, the training of the caption generation model is solely dependent on the corpus-constituting words which may appear in a similar context as the unseen objects. Hence, explicit transfer of learned parameters is required between seen and unseen object categories before inference, which limits DCC from scaling to a wide variety of unseen objects. NOC tries to overcome such issues by adopting an end-to-end trainable framework which incorporates auxiliary training objectives during training, detaching the need for explicit transfer of parameters between seen and unseen objects before inference. However, NOC training can result in sub-optimal solutions, as the additional training attempts to optimize three different loss functions simultaneously. CBS leverages an approximate search algorithm to guarantee the inclusion of selected words during inference of a caption generation model. These words are, however, only constrained on the image tags produced by an image classifier. Moreover, the vocabulary used to find similar words as candidates for replacement during inference is usually kept very large, hence adding extra computational complexity. LSTM-C avoids the limitation of finding similar words during inference by adding a copying mechanism into caption training. This assists the model during inference to decide whether a word is to be generated or copied from a dictionary. However, LSTM-C suffers from confusion problems, since probabilities during word generation tend to get very low.

In general, the aforementioned approaches also have the following limitations: (1) The image classifiers used cannot predict abstract meaning, like "hope", as observed in many web images. (2) Visual features extracted from images are confined to the probability of occurrence of a fixed set of labels (i.e. nouns, verbs and adjectives) observed in a restricted dataset and cannot be easily extended to varied categories for large-scale experiments. (3) Since an attention mechanism is missing, important regions in an image are never attended. In contrast, the attention mechanism in our model helps to scale down all possible identified concepts to
the relevant concepts during caption generation. For large-scale applications, this plays a crucial role.

We introduce a new model called Knowledge Guided Assistance (KGA) that exploits conceptual knowledge provided by a knowledge graph (KG) [6] as external semantic attention throughout training and also to aid as a dynamic constraint before and during inference. Hence, it augments an auxiliary view as done in multi-view learning scenarios. Usage of KGs has already shown improvements in other tasks, such as in question answering over structured data, language modeling [1], and generation of factoid questions [12].

3 Describing Images with Unseen Objects Using Knowledge Guided Assistance (KGA)

In this section, we present our caption generation model to generate captions for unseen visual object categories with knowledge assistance. KGA's core goal is to introduce external semantic attention (ESA) into the learning and also to work as a constraint before and during inference for transferring information between seen words and unseen visual object categories.

3.1 Caption Generation Model

Our image caption generation model (henceforth, KGA-CGM) combines three important components: a language model pre-trained on unpaired textual corpora, external semantic attention (ESA) and image features, with a textual (T), semantic (S) and visual (V) layer (i.e. TSV layer) for predicting the next word in the sequence when learned using image-caption pairs. In the following, we present each of these components separately, while Figure 2 presents the overall architecture of KGA-CGM.

Language Model. This component is crucial to transfer the sentence structure for unseen visual object categories. The language model is implemented with two long short-term memory (LSTM) [5] layers to predict the next word given previous words in a sentence. If w_{1:L} represents the input to the forward LSTM of layer-1 for capturing forward input sequences into hidden sequence vectors (h^1_{1:L} \in R^H), where L is the final time step, then the encoding of input word sequences into hidden layer-1 and then into layer-2 at each time step t is achieved as follows:

h^1_t = L1-F(w_t; \Theta)   (1)
h^2_t = L2-F(h^1_t; \Theta)   (2)

where \Theta represents the hidden layer parameters. The encoded final hidden sequence (h^2_t \in R^H) at time step t is then used for predicting the probability distribution of the next word, given by p_{t+1} = softmax(h^2_t). The softmax layer is only used while training with unpaired textual corpora and is not used when learning with image captions.
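The two-layer forward language model of Equations 1-2 can be sketched as follows. This is a minimal NumPy illustration of the computation, not the authors' Keras/Theano implementation; the weight values and the toy vocabulary size are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps [x; h_prev] to the four gate pre-activations."""
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.shape[0]
    i, f, o, g = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H]), np.tanh(z[3*H:])
    c = f * c_prev + i * g
    return o * np.tanh(c), c

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
E, H, V = 256, 512, 1000                 # embedding dim, hidden dim, toy vocab
W1, b1 = rng.normal(0, 0.1, (4*H, E+H)), np.zeros(4*H)   # layer-1 (L1-F)
W2, b2 = rng.normal(0, 0.1, (4*H, H+H)), np.zeros(4*H)   # layer-2 (L2-F)
W_out = rng.normal(0, 0.1, (V, H))                       # softmax projection

h1 = c1 = h2 = c2 = np.zeros(H)
for t in range(5):                        # a 5-word dummy sequence
    w_t = rng.normal(size=E)              # stand-in for a GloVe word embedding
    h1, c1 = lstm_step(w_t, h1, c1, W1, b1)   # Eq. (1): h1_t = L1-F(w_t)
    h2, c2 = lstm_step(h1, h2, c2, W2, b2)    # Eq. (2): h2_t = L2-F(h1_t)
    p_next = softmax(W_out @ h2)              # p_{t+1} = softmax(h2_t)
```

During pre-training on unpaired text the softmax head is used; when the model is later coupled with image-caption pairs, the paper replaces it with the TSV layer described below.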
[Figure 2 sketches the KGA-CGM architecture: a two-layer forward LSTM over the caption words, a multi-word-label classifier for visual features, and a multi-entity-label classifier producing KG-linked entity nodes (e.g. Restaurant, Pizza, Chef) for ESA.]

Fig. 2. KGA-CGM is built with three components: a language model built with a two-layer forward LSTM (L1-F and L2-F), a multi-word-label classifier to generate image visual features, and a multi-entity-label classifier that generates entity-labels linked to a KG, serving as a partial image-specific scene graph. This information is further leveraged to acquire entity vectors for supporting ESA. w_t represents the input caption word, c_t the semantic attention, p_t the output probability distribution over all words and y_t the predicted word at each time step t. BOS and EOS represent the special tokens.

External Semantic Attention (ESA). Our objective in ESA is to extract semantic attention from an image by leveraging semantic knowledge in the KG as entity-labels, obtained using a multi-entity-label image classifier (discussed in Section 4.2). Here, entity-labels are analogous to patches or attributes of an image. In formal terms, if ea_i is an entity-label, e_i \in R^E the entity-label vector among the set of entity-label vectors (i = 1, .., L) and \beta^t_i the attention weight of e_i, then \beta^t_i is calculated at each time step t using Equation 3:

\beta^t_i = exp(O^t_i) / \sum^L_{j=1} exp(O^t_j)   (3)

where O^t_i = f(e_i, h^2_t) represents a scoring function which conditions on the layer-2 hidden state (h^2_t) of the caption language model. It can be observed that the scoring function f(e_i, h^2_t) is crucial for deciding the attention weights. The relevance of the hidden state with each entity-label is calculated using Equation 4:

f(e_i, h^2_t) = tanh((h^2_t)^T W_{he} e_i)   (4)

where W_{he} \in R^{H x E} is a bilinear parameter matrix. Once the attention weights are calculated, the soft attention-weighted vector of the context c_t, which is a dynamic representation of the caption at time step t, is given by Equation 5:

c_t = \sum^L_{i=1} \beta^t_i e_i   (5)
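Equations 3-5 amount to a soft bilinear attention over the entity-label vectors. A minimal NumPy sketch, with dimensions matching the paper (500-d RDF2Vec entity vectors, 512-d hidden state) and random placeholder weights:

```python
import numpy as np

def esa_context(entity_vecs, h2, W_he):
    """External semantic attention: score each entity-label vector against the
    layer-2 hidden state (Eq. 4), normalise with a softmax (Eq. 3), and return
    the attention-weighted context vector c_t (Eq. 5)."""
    scores = np.tanh(entity_vecs @ (W_he.T @ h2))    # O_i = tanh(h2^T W_he e_i)
    beta = np.exp(scores - scores.max())
    beta /= beta.sum()                               # attention weights beta_i
    return beta @ entity_vecs, beta                  # c_t in R^E

rng = np.random.default_rng(1)
L, E, H = 3, 500, 512              # 3 entity-labels, 500-d embeddings, 512-d h
entity_vecs = rng.normal(size=(L, E))   # e.g. vectors for Pizza, Restaurant, Chef
h2 = rng.normal(size=H)                 # layer-2 LSTM hidden state
W_he = rng.normal(0, 0.01, (H, E))      # bilinear parameter matrix

c_t, beta = esa_context(entity_vecs, h2, W_he)
```

The context c_t changes at every time step because h^2_t does, so the generator attends to different entity-labels as the sentence unfolds.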
Here, c_t \in R^E and L represents the cardinality of entity-labels per image-caption pair instance.

Image Features, TSV Layer & Next Word Prediction. Visual features for an image are extracted using the multi-word-label image classifier (discussed in Section 4.2). To be consistent with other approaches [4, 15] and for a fair comparison, our visual features (I) also cover objects that we aim to describe outside of the caption datasets, besides the word-labels observed in paired image-caption data. Once the output from all components is acquired, the TSV layer is employed to integrate their features, i.e. textual (T), semantic (S) and visual (V), yielded by the language model, ESA and images respectively. Thus, TSV acts as a transformation layer for molding three different feature spaces into a single common space for prediction of the next word in the sequence. If h^2_t \in R^H, c_t \in R^E and I \in R^I represent the vectors acquired at each time step t from the language model, ESA and images respectively, then the integration at the TSV layer of KGA-CGM is given by Equation 6:

TSV_t = W_{h^2} h^2_t + W_c c_t + W_I I   (6)

where W_{h^2} \in R^{vs x H}, W_c \in R^{vs x E} and W_I \in R^{vs x I} are linear conversion matrices and vs is the image-caption pair training dataset vocabulary size. The output from the TSV layer at each time step t is further used for predicting the next word in the sequence using a softmax layer, given by p_{t+1} = softmax(TSV_t).

3.2 KGA-CGM Training

To learn the parameters of KGA-CGM, we first freeze the parameters of the language model trained using unpaired textual corpora, thus enabling only those parameters emerging from the ESA and TSV layers, i.e. W_{he}, W_{h^2}, W_c and W_I, to be learned with image-caption pairs. KGA-CGM is then trained to optimize the cost function that minimizes the sum of the negative log likelihood of the appropriate word at each time step, given by Equation 7:

min_\theta -1/N \sum^N_{n=1} \sum^{L^{(n)}}_{t=0} log(p(y^{(n)}_t))   (7)

where L^{(n)} represents the length of the sentence (i.e. caption), with beginning of sentence (BOS) and end of sentence (EOS) tokens, of the n-th training sample, and N is the number of samples used for training.
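Equation 6 is a sum of three linear projections into the vocabulary space followed by a softmax. A toy NumPy sketch with dimensions reported elsewhere in the paper (471 word-labels, an 8,802-word vocabulary); all matrices here are random stand-ins, not learned weights:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
H, E, I, vs = 512, 500, 471, 8802    # hidden, entity, visual dims; vocab size
W_h = rng.normal(0, 0.01, (vs, H))   # textual (T) projection W_{h^2}
W_c = rng.normal(0, 0.01, (vs, E))   # semantic (S) projection
W_I = rng.normal(0, 0.01, (vs, I))   # visual (V) projection

h2  = rng.normal(size=H)    # language-model hidden state at step t
c_t = rng.normal(size=E)    # ESA context vector at step t
img = rng.random(I)         # multi-word-label classifier probabilities

tsv = W_h @ h2 + W_c @ c_t + W_I @ img   # Eq. (6): TSV_t
p_next = softmax(tsv)                    # p_{t+1} over the caption vocabulary
```

Because each projection lands in the same vs-dimensional space, the three modalities contribute additively to every word's logit before normalisation.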
3.3 KGA-CGM Constrained Inference

Inference in KGA-CGM refers to the generation of descriptions for test images. Here, inference is not as straightforward as in standard image caption generation approaches [16], because unseen visual object categories have no parallel captions throughout training. Hence they would never be generated in a caption. Thus, unseen visual object categories require guidance either before or during inference from similar seen words that appear in the paired image-caption dataset, and likely also from image labels. In our case, we achieve the guidance both before and during inference with varied techniques.

Guidance before Inference. We first identify the seen words in the paired image-caption dataset that are similar to the visual object categories unseen in the image-caption dataset, by estimating semantic similarity using their GloVe embeddings [9] learned on unpaired textual corpora (more details in Section 4.1). Furthermore, we utilize this information to perform a dynamic transfer between the seen words' visual-feature (W_I), language-model (W_{h^2}) and external-semantic-attention (W_c) weights and the unseen visual object categories. To illustrate, let (v_unseen, i_unseen) and (v_closest, i_closest) denote the indexes of the unseen visual object category "zebra" and its semantically similar known word "giraffe" in the vocabulary (vs) and the visual features respectively. Then, to describe images with "zebra" in the same manner as "giraffe", the transfer of weights is performed between them by assigning W_c[v_unseen,:], W_{h^2}[v_unseen,:] and W_I[v_unseen,:] to W_c[v_closest,:], W_{h^2}[v_closest,:] and W_I[v_closest,:] respectively. Furthermore, W_I[i_unseen, i_closest] and W_I[i_closest, i_unseen] are set to zero for removing mutual dependencies of seen and unseen words' presence in an image. The aforementioned procedure updates the trained KGA-CGM model before inference to assist the generation of unseen visual object categories during inference, as given by Algorithm 1.
Algorithm 1: Constrained Inference Overview (Before)

Input: M = {W_{he}, W_{h^2}, W_c, W_I}
Output: M_new
1  Initialize List(closest) = cosine_distance(List(unseen), vocabulary);
2  Initialize W_c[v_unseen,:], W_{h^2}[v_unseen,:], W_I[v_unseen,:] = 0;
3  Function Before_Inference
4    forall items T in closest and Z in unseen do
5      if T and Z in vocabulary then
6        W_c[v_Z,:] = W_c[v_T,:];
7        W_{h^2}[v_Z,:] = W_{h^2}[v_T,:];
8        W_I[v_Z,:] = W_I[v_T,:];
9      end
10     if i_T and i_Z in visual features then
11       W_I[i_Z, i_T] = 0;
12       W_I[i_T, i_Z] = 0;
13     end
14   end
15   M_new = M;
16   return M_new;
17 end
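Algorithm 1 can be paraphrased in Python as follows. This is an illustrative sketch, not the released code: `vocab`, `feat_idx` and `pairs` are hypothetical structures, and the toy matrices conflate the vocabulary and visual-feature index spaces for brevity.

```python
import numpy as np

def transfer_before_inference(W_c, W_h2, W_I, vocab, feat_idx, pairs):
    """Copy the TSV-layer rows of each closest seen word into the rows of the
    corresponding unseen word, and zero the mutual visual-feature entries.
    `pairs` maps an unseen word to its closest seen word (cosine similarity
    of GloVe embeddings in the paper)."""
    for unseen, seen in pairs.items():
        if unseen in vocab and seen in vocab:
            vz, vt = vocab[unseen], vocab[seen]
            for W in (W_c, W_h2, W_I):
                W[vz, :] = W[vt, :]          # lines 6-8 of Algorithm 1
        if unseen in feat_idx and seen in feat_idx:
            iz, it = feat_idx[unseen], feat_idx[seen]
            W_I[iz, it] = 0.0                # lines 11-12: decouple the
            W_I[it, iz] = 0.0                # seen/unseen co-occurrence
    return W_c, W_h2, W_I

# Tiny worked example: transfer "giraffe" weights to the unseen word "zebra".
vocab = {"giraffe": 0, "zebra": 1}
feat_idx = {"giraffe": 0, "zebra": 1}
rng = np.random.default_rng(3)
W_c, W_h2, W_I = [rng.normal(size=(2, 4)) for _ in range(3)]
W_c, W_h2, W_I = transfer_before_inference(W_c, W_h2, W_I, vocab, feat_idx,
                                           {"zebra": "giraffe"})
```

After the transfer, the model scores "zebra" the way it scored "giraffe", while the zeroed entries prevent the two from reinforcing each other's presence in an image.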
Guidance during Inference. The updated KGA-CGM model is used for generating descriptions of unseen visual object categories. However, in the before-inference procedure, the closest words to unseen visual object categories are identified using embeddings that are learned only on textual corpora and are never constrained on images. This obstructs the view from an image, leading to spurious results. We resolve such nuances during inference by constraining the beam search used for description generation with image entity-labels (ea). In general, beam search is used to consider the best k sentences at time t to identify the sentence at the next time step. Our modification of beam search adds an extra constraint that checks whether a generated unseen visual object category is part of the entity-labels. If it is not, the unseen visual object category is never substituted for its closest seen word. Algorithm 2 presents the overview of KGA-CGM guidance during inference.

Algorithm 2: Constrained Inference Overview (During)

Input: M_new, Im_labels, beam size k, word w
Output: best k successors
1  Initialize Im_labels = Top-5(ea);
2  Initialize beam size k;
3  Initialize word w = null;
4  Function During_Inference
5    forall states s of k words do
6      w = s;
7      if closest[w] in ea then
8        s = closest[w];
9      end
10     else
11       s = w;
12     end
13   end
14   return best k successors;
15 end

4 Experimental Setup

4.1 Resources and Datasets

Our approach depends on several resources and datasets.

Knowledge Graphs (KGs) and Unpaired Textual Corpora. There are several openly available KGs, such as DBpedia, Wikidata, and YAGO, which provide semantic knowledge encapsulated in entities and their relationships. We choose DBpedia as our KG for entity annotation, as it is one of the most extensively used resources for semantic annotation and disambiguation [6]. For learning the weights of the language model and also GloVe word embeddings, we have explored different unpaired textual corpora from out-of-domain sources (i.e. outside the image-caption parallel corpora), such as the British National Corpus (BNC)^3, Wikipedia (Wiki) and a subset of the SBU1M^4 caption text containing 947 categories of the ILSVRC12 dataset [11]. The NLTK^5 sentence tokenizer is used to extract tokenizations, and a vocabulary of around 70k+ words is extracted with GloVe embeddings.

Unseen Objects Description (Out-of-Domain MSCOCO & ImageNet). To evaluate KGA-CGM, we use the subset of the MSCOCO dataset [7] proposed by Hendricks et al. [4]. The dataset is obtained by clustering 80 image object category labels into 8 clusters and then selecting one object from each cluster to be held out from the training set. The training set thus does not contain the images and sentences of those 8 objects, represented by bottle, bus, couch, microwave, pizza, racket, suitcase and zebra, making the MSCOCO training dataset constitute 70,194 image-caption pairs. The validation set of image-caption pairs is again divided into separate subsets for testing and validation. The goal of KGA-CGM is then to generate captions for those test images which contain these 8 unseen object categories. Henceforth, we refer to this dataset as out-of-domain MSCOCO.

To evaluate KGA-CGM on a more challenging task, we attempt to describe images that contain the wide variety of objects observed on the web. To imitate such a scenario, we collected images from collections containing a wide variety of objects. First, we used the same set of images as earlier approaches [15, 17], which are a subset of ImageNet [3] constituting the 642 object categories used in Hendricks et al. [4] that do not occur in MSCOCO. However, 120 out of those 642 object categories are part of ILSVRC12.

4.2 Multi-Label Image Classifiers

The important constituents that influence KGA-CGM are the image entity-labels and visual features. Identified objects/actions etc. in an image are embodied in visual features, while entity-labels capture the semantic knowledge in an image grounded in the KG. In this section, we present the approach to extract both visual features and entity-labels.

Multi-Word-label Image Classifier. To extract visual features of out-of-domain MSCOCO images, emulating Hendricks et al. [4], a multi-word-label classifier is built using the captions aligned to an image, by extracting part-of-speech (POS) tags such as nouns, verbs and adjectives attained for each word in the entire MSCOCO dataset. For example, the caption "A young child brushes his teeth at the sink" contains word-labels such as "young" (JJ), "child" (NN), "teeth" (NN) etc., that represent concepts in an image. An image classifier is trained with 471 word-labels using a sigmoid cross-entropy loss by fine-tuning VGG-16 [13] pre-trained on the training part of ILSVRC12. The visual features extracted for a new image represent the probabilities of the 471 image labels observed in that image. For extracting visual features from ImageNet images, we replace the multi-word-label classifier with the lexical classifier [4] learned with 642 ImageNet object categories.

Multi-Entity-label Image Classifier. To extract semantic knowledge for out-of-domain MSCOCO images analogous to the word-labels, a multi-entity-label classifier is built with entity-labels attained from a knowledge graph annotation tool, DBpedia Spotlight^6, on the training set of MSCOCO constituting 82,783 training image-caption pairs. In total, around 812 unique labels are extracted, with an average of 3.2 labels annotated per image. To illustrate, considering the caption presented in the aforementioned section, the entity-labels extracted are Brush^7 and Tooth^8. An image classifier is then trained with multiple entity-labels using a sigmoid cross-entropy loss by fine-tuning VGG-16 [13] pre-trained on the training part of ILSVRC12. For extracting entity-labels from ImageNet images, we again leveraged the lexical classifier [4] learned with 642 ImageNet object categories. However, as all 642 categories denote WordNet synsets, we build a connection between these categories and DBpedia by leveraging BabelNet [8] for the multi-entity-label classifier. To illustrate, the visual object category "wombat" in ImageNet can be linked to DBpedia Wombat^9. Hence, this makes our method very modular for building new image classifiers to incorporate semantic knowledge.

^3 http://
^4 http://vision.cs.stonybrook.edu/~vicente/sbucaptions/
^5 http://

4.3 Entity-Label Embeddings

We presented earlier that the acquisition of entity-labels for training the multi-entity-label classifiers was performed with the DBpedia Spotlight entity annotation and disambiguation tool.
Hence, entity-labels are expected to encapsulate semantic knowledge grounded in the KG. Further, entities in a KG can be represented with embeddings that capture their relational information. In our work, we test the efficacy of these embeddings for caption generation. Thus, we leverage entity-label embeddings for computing the semantic attention observed in an image with respect to the caption, as observed from the KG. To obtain entity-label embeddings, we adopted the RDF2Vec [10] approach and generated 500-dimensional vector representations for the 812 and 642 entity-labels used to describe out-of-domain MSCOCO and ImageNet images respectively.

^6 https://github.com/dbpedia-spotlight/
^7 http://dbpedia.org/resource/Brush
^8 http://dbpedia.org/resource/Tooth
^9 http://dbpedia.org/page/Wombat
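The RDF2Vec recipe referenced above first extracts random graph walks over the KG and then feeds them, as if they were sentences, to a skip-gram model (e.g. gensim's Word2Vec) to obtain the 500-dimensional entity vectors. A sketch of the walk-extraction step, using hypothetical DBpedia-style triples for illustration (the embeddings in the paper come from the RDF2Vec approach itself, not from this snippet):

```python
import random
from collections import defaultdict

def rdf2vec_walks(triples, depth=4, walks_per_entity=2, seed=0):
    """Generate random walks (entity -> predicate -> entity -> ...) over an
    RDF graph; a skip-gram model trained on these walks yields entity-label
    embeddings in the spirit of RDF2Vec."""
    rng = random.Random(seed)
    out = defaultdict(list)
    for s, p, o in triples:                 # index outgoing edges per subject
        out[s].append((p, o))
    walks = []
    for entity in list(out):
        for _ in range(walks_per_entity):
            node, walk = entity, [entity]
            for _ in range(depth):
                if node not in out:         # dead end: stop the walk
                    break
                p, o = rng.choice(out[node])
                walk += [p, o]
                node = o
            walks.append(walk)
    return walks

# Toy DBpedia-like triples (hypothetical, for illustration only).
triples = [("dbr:Pizza", "dbo:ingredient", "dbr:Cheese"),
           ("dbr:Cheese", "dbo:source", "dbr:Milk")]
walks = rdf2vec_walks(triples)
```

Entities that share graph neighbourhoods end up in similar walk contexts, which is why the resulting vectors carry the relational information that ESA exploits.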
4.4 Evaluation Measures

To evaluate the generated descriptions for the unseen MSCOCO visual object categories, we use similar evaluation metrics as earlier approaches [4, 15, 17], namely METEOR, and also SPICE [2]. However, the CIDEr [14] metric is not used, as it requires calculating the inverse document frequency across the entire test set and not just the unseen object subsets. The F1 score is also calculated to measure the presence of unseen objects in the generated captions when compared against reference captions. Furthermore, to evaluate ImageNet object category description generation, we leveraged F1 and also other metrics such as the Unseen and Accuracy scores [15, 17]. The Unseen score measures the percentage of all novel objects mentioned in generated descriptions, while Accuracy measures the percentage of image descriptions that correctly address the unseen objects.

5 Experiments

The experiments are conducted to evaluate the efficacy of the KGA-CGM model for describing out-of-domain MSCOCO and ImageNet images.

5.1 Implementation

The KGA-CGM model constitutes three important components, i.e. the language model, visual features and entity-labels. Before learning the KGA-CGM model with image-caption pairs, we first learn the weights of the language model and keep them fixed during the training of the KGA-CGM model. To learn the language model, we leverage unpaired textual corpora (e.g. the entire MSCOCO set, Wiki, BNC etc.) and provide input word embeddings of 256 dimensions pre-trained with GloVe [9] default settings on the same unpaired textual corpora. Hidden layer dimensions of the language model are set to 512. The KGA-CGM model is then trained on image-caption pairs with the Adam optimizer with gradient clipping (maximum norm 1.0). Validation data is used for model selection, and experiments are implemented with Keras with a Theano backend^10.

5.2 Describing Out-of-Domain MSCOCO Images

In this section, we evaluate KGA-CGM using the out-of-domain MSCOCO dataset.

Quantitative Analysis. We compared our complete KGA-CGM model with the other existing models that generate image descriptions on out-of-domain MSCOCO.
For a fair comparison, only those results are compared which used VGG-16 to generate image features. Table 1 shows the comparison of individual and average scores based on METEOR, SPICE and F1 on all 8 unseen visual object categories with beam size 1.

Table 1. F1, METEOR and SPICE measures for all 8 unseen objects (microwave, racket, bottle, zebra, pizza, couch, bus, suitcase, plus the average), comparing DCC [4], NOC [15], CBS(T4) [2], LSTM-C [17] and KGA-CGM at beam sizes 1 and >1; the numeric scores were not preserved in this transcription. Underline shows the second best.

It can be noticed that KGA-CGM^10 with beam size 1 was comparable to the other approaches, even though it used a fixed vocabulary from image-caption pairs. For example, CBS [2] used an expanded vocabulary of 21,689 words, compared to 8,802 in ours. Also, our word-labels per image are fixed, while CBS uses a varying number of predicted image tags (T1-4). This makes it non-deterministic and can increase uncertainty, as varying tags will either increase or decrease the performance. Furthermore, we also evaluated KGA-CGM on the rest of the seen visual object categories, in Table 2. It can be observed that our KGA-CGM outperforms existing approaches, as it did not undermine in-domain description generation, although it was tuned for out-of-domain description generation.

Table 2. Average METEOR, SPICE and F1 measures on the MSCOCO seen objects for DCC [4], CBS(T4) [2] and KGA-CGM (beam sizes 1 and >1); the numeric scores were not preserved in this transcription.

^10 https://github.com/adityamogadala/kga
Qualitative Analysis. In Figure 3, sample predictions of our best KGA-CGM model are presented. It can be observed that entity-labels have an influence on caption generation. Since entities used as image labels are already disambiguated, high similarity is attained in the prediction of a word, thus adding useful semantics. Figure 3 presents example descriptions of unseen visual objects.

[Fig. 3 examples, comparing the Base model, NOC [15] and KGA-CGM (beam size 1):]
- Unseen object: Bottle. Predicted entity-labels (top-3): Wine_glass, Wine_bottle, Bottle. Base: "A vase with a flower in it sitting on a table." NOC: "A wine bottle sitting on a table next to a wine bottle." KGA-CGM: "A bottle of wine sitting on top of a table."
- Unseen object: Couch. Predicted entity-labels (top-3): Cake, Couch, Glass. Base: "A person is laying down on a bed." NOC: "A woman sitting on a chair with a large piece of cake on her arm." KGA-CGM: "A woman sitting on a couch with a remote."
- Unseen object: Pizza. Predicted entity-labels (top-3): Pizza, Restaurant, Hat. Base: "A man is making a sandwich in a restaurant." NOC: "A man standing next to a table with a pizza in front of it." KGA-CGM: "A man is holding a pizza in his hands."
- Unseen object: Suitcase. Predicted entity-labels (top-3): Cat, Baggage, Black_Cat. Base: "A cat laying on top of a pile of books." NOC: "A cat laying on a suitcase on a bed." KGA-CGM: "A cat laying inside of a suitcase on a bed."
- Unseen object: Bus. Predicted entity-labels (top-3): Bus, Public_Transport, Transit_Bus. Base: "A car is parked on the side of the street." NOC: "Bus driving down a street next to a bus stop." KGA-CGM: "A white bus is parked on the street."
- Unseen object: Microwave. Predicted entity-labels (top-3): Refrigerator, Oven, Microwave_Oven. Base: "A wooden table with a refrigerator and a brown cabinet." NOC: "A kitchen with a refrigerator, refrigerator, and refrigerator." KGA-CGM: "A kitchen with a microwave, oven and a refrigerator."
- Unseen object: Racket. Predicted entity-labels (top-3): Tennis, Racket_(sports_equipment), Court. Base: "A tennis player getting ready to serve the ball." NOC: "A woman court holding a tennis racket on a court." KGA-CGM: "A woman playing tennis on a tennis court with a racket."
- Unseen object: Zebra. Predicted entity-labels (top-3): Zebra, Enclosure, Zoo. Base: "A couple of animals that are standing in a field." NOC: "Zebras standing together in a field with zebras." KGA-CGM: "A group of zebras standing in a line."

Fig. 3. Sample predictions of KGA-CGM on out-of-domain MSCOCO images with beam size 1, compared against the base model and NOC [15].

5.3 Describing ImageNet Images

ImageNet images do not contain any ground-truth captions and contain exactly one unseen visual object category per image. Initially, we retrain different language models using unpaired textual data (Section 4.1) and also the entire MSCOCO training set. Furthermore, the KGA-CGM model is rebuilt for each one of them separately. To describe ImageNet images, the image classifiers presented in Section 4.2 are leveraged. Table 3 summarizes the experimental results attained on 634 categories (i.e. not all 642), for a fair comparison with other approaches. By adopting only MSCOCO training data for the language model, our KGA-CGM makes a relative improvement over NOC and LSTM-C in all measures, i.e. Unseen, F1 and Accuracy. Figure 4 shows a few sample descriptions.

6 Key Findings

The key observations of our research are: (1) The ablation study conducted to understand the influence of different components in KGA-CGM has shown that using external semantic attention and constrained inference together has superior performance compared to using only either of them. Also, increasing the beam size during inference showed a drop in all measures. This is basically
due to the influence of multiple words on unseen objects. (2) The performance advantage becomes clearer as the domain of unseen objects is broadened. In other words, KGA-CGM specifically improves over the state of the art in settings that are larger and less controlled. Hereby, KGA-CGM scales to one order of magnitude more unseen objects with moderate performance decreases. (3) The closest seen words (i.e. those observed in image-caption pairs) played a prominent role in generating descriptions of the unseen visual object categories. For example, in out-of-domain MSCOCO, word pairs such as suitcase/bag, bottle/glass and bus/truck are semantically similar and are also used in a similar manner in a sentence, which added excellent value. However, some words that merely co-occur, such as racket/court and pizza/plate, play different roles in sentences and led to a few grammatical errors. (4) The decrease in performance has a high correlation with the discrepancy between the domains the seen and unseen objects come from.

Table 3. Describing ImageNet images with beam size 1: Unseen, F1 and Accuracy for NOC [15], LSTM-C [17] and KGA-CGM with different unpaired-text sources (MSCOCO; BNC&Wiki; BNC&Wiki&SBU1M); the numeric scores were not preserved in this transcription. Results of NOC and LSTM-C (with GloVe) are adopted from Yao et al. [17].

[Fig. 4 examples (guidance before inference shows which seen word is transferred to the unseen one):]
- Unseen object: Truffle. Guidance before inference: food -> truffle. Base: "A person holding a piece of paper." KGA-CGM: "A close up of a person holding truffle."
- Unseen object: Papaya. Guidance before inference: banana -> papaya. Base: "A woman standing in a garden." KGA-CGM: "These are ripe papaya hanging on a tree."
- Unseen object: Mammoth. Guidance before inference: elephant -> mammoth. Base: "A baby elephant standing in water." KGA-CGM: "A herd of mammoth standing on top of a green field."
- Unseen object: Blackbird. Guidance before inference: bird -> blackbird. Base: "A bird standing in a field of green grass." KGA-CGM: "A blackbird standing in the grass."

Fig. 4. ImageNet images described with the best KGA-CGM model from Table 3. Guidance before inference shows which words are used for transfer between seen and unseen.
7 Conclusion and Future Work

In this paper, we presented an approach to generate captions for images that lack parallel captions during training, with assistance from semantic knowledge encapsulated in KGs. In the future, we plan to expand our models to build multimedia knowledge graphs along with image descriptions, which can be used for finding related images or can be searched with long textual queries.
8 Acknowledgements

The first author is grateful to KHYS at KIT for their research travel grant and to the Computational Media Lab at ANU for providing access to their K40x GPUs.

References

1. Ahn, S., Choi, H., Pärnamaa, T., Bengio, Y.: A neural knowledge language model. arXiv preprint (2016)
2. Anderson, P., Fernando, B., Johnson, M., Gould, S.: Guided open vocabulary image captioning with constrained beam search. In: EMNLP (2017)
3. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: CVPR. IEEE (2009)
4. Hendricks, L.A., Venugopalan, S., Rohrbach, M., Mooney, R., Saenko, K., Darrell, T.: Deep compositional captioning: Describing novel object categories without paired training data. In: CVPR (2016)
5. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8) (1997)
6. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., et al.: DBpedia: A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web (2015)
7. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: ECCV. Springer (2014)
8. Navigli, R., Ponzetto, S.P.: BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence 193 (2012)
9. Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: EMNLP (2014)
10. Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: International Semantic Web Conference. Springer (2016)
11. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3) (2015)
12. Serban, I.V., García-Durán, A., Gulcehre, C., Ahn, S., Chandar, S., Courville, A., Bengio, Y.: Generating factoid questions with recurrent neural networks: The 30M factoid question-answer corpus. arXiv preprint (2016)
13. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint (2014)
14. Vedantam, R., Zitnick, C.L., Parikh, D.: CIDEr: Consensus-based image description evaluation. In: CVPR (2015)
15. Venugopalan, S., Hendricks, L.A., Rohrbach, M., Mooney, R., Darrell, T., Saenko, K.: Captioning images with diverse objects. In: CVPR (2017)
16. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(4) (2017)
17. Yao, T., Pan, Y., Li, Y., Mei, T.: Incorporating copying mechanism in image captioning for learning novel objects. In: CVPR (2017)