More Accurate Question Answering on Freebase


More Accurate Question Answering on Freebase

Hannah Bast, Elmar Haussmann
Department of Computer Science, University of Freiburg
Freiburg, Germany
{bast,

ABSTRACT

Real-world factoid or list questions often have a simple structure, yet are hard to match to facts in a given knowledge base due to high representational and linguistic variability. For example, to answer "who is the ceo of apple" on Freebase requires a match to an abstract "leadership" entity with three relations "role", "organization" and "person", and two other entities "apple inc" and "managing director". Recent years have seen a surge of research activity on learning-based solutions for this problem. We further advance the state of the art by adopting learning-to-rank methodology and by fully addressing the inherent entity recognition problem, which was neglected in recent works.

We evaluate our system, called Aqqu, on two standard benchmarks, Free917 and WebQuestions, improving the previous best result for each benchmark considerably. These two benchmarks exhibit quite different challenges, and many of the existing approaches were evaluated (and work well) only for one of them. We also consider efficiency aspects and take care that all questions can be answered interactively (that is, within a second). Materials for full reproducibility are available on our website: http://ad.informatik.uni-freiburg.de/publications.

1. INTRODUCTION

Knowledge bases like Freebase have reached an impressive coverage of general knowledge. The data is stored in a clean and structured manner, and can be queried unambiguously via structured languages like SPARQL. However, given the enormous amount of information (2.9 billion triples for Freebase), mapping a search desire to the right query can be an extremely hard task even for an expert user. For example, consider the (seemingly) simple question "who is the ceo of apple". The answer is indeed contained in Freebase, and the corresponding SPARQL query(1) is:

1 For the sake of readability, prefixes are omitted from the entity and relation names.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
CIKM'15, October 19-23, 2015, Melbourne, Australia.
(c) 2015 ACM. ISBN /15/10 ...$ DOI: http://dx.doi.org/ /

  select ?name where {
    Managing Director  job title.people with this title  ?0 .
    ?0  employment tenure.company  Apple Inc .
    ?0  employment tenure.person  ?name
  }

It would clearly be preferable if we could just ask the question in natural language, and the machine automatically computes the corresponding SPARQL query. This is the problem we consider in this paper.

We focus on structurally simple questions, like the one above. They involve k entities (typically two or three; in the example above: ceo and apple and the result entity), which are linked via a single k-ary relation in the knowledge base. For languages like SPARQL, k-ary relations for k > 2 can be represented by a special entity (one for each k-tuple in the relation) and k binary relations (in the example above: the three binary relations in the where clause, all connected to the ?0 entity).

The challenge for these questions is to find the matching entities and relations in the given knowledge base. The entity-matching problem is hard, because the question may use a variant of the name used in the knowledge base (synonymy), and the knowledge base may contain many entities with the same name (polysemy). For example, there are 218 entities with the name apple in Freebase, but the right match for the question is actually Apple Inc. The relation-matching problem suffers from the same difficulties, which are even more severe for k-ary relations with k > 2.
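The mediator encoding of a k-ary relation can be illustrated with a small sketch. This is our own minimal example, not the paper's code: the entity and relation names are simplified placeholders rather than exact Freebase identifiers, and the person named is merely illustrative.

```python
# One ternary "leadership" fact, stored as a mediator entity plus one
# binary triple per argument (names are illustrative placeholders).
triples = [
    ("leadership#0", "role", "Managing Director"),
    ("leadership#0", "organization", "Apple Inc"),
    ("leadership#0", "person", "Tim Cook"),
]

def answer(triples, role, organization):
    """Resolve a ceo-style question by joining on the mediator entity."""
    mediators = {s for s, r, o in triples if r == "role" and o == role} & \
                {s for s, r, o in triples if r == "organization" and o == organization}
    return [o for s, r, o in triples if s in mediators and r == "person"]
```

The join over the mediator variable is exactly what the `?0` variable does in the SPARQL query above.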
As a further complication, questions like the above do not contain any word that matches the relations from the sought-for query.(2) Note how these problems exacerbate for very large knowledge bases. If we restrict to lexical matches, we will often miss the correct query. If we allow weaker matches, the number of possibilities becomes very large. This will become clearer in Section 4.

1.1 Contributions

We consider the following as our main contributions:

- A new end-to-end system that automatically translates a given natural-language question to the matching SPARQL query on a given knowledge base. Several previous systems factor out part of the problem, for example, by assuming the right entities for the query to be given by an oracle. See Section 3 for an overview of our system.

- An evaluation of our system on two standard benchmarks, Free917 and WebQuestions, where it outperforms all previous approaches significantly. These two benchmarks exhibit quite different challenges, and many of the existing approaches were evaluated (and work well) only for one of them. See Section 2 for an overview of the existing approaches, and Section 5 for the details of our evaluation.

- Integration of entity recognition in a learning-based approach. Previous learning-based approaches treated this sub-problem in a simplistic manner, or even factored it out by assuming the right entities to be given as part of the problem.

- Using learning-to-rank techniques to learn pair-wise comparison of query candidates. Previous approaches often use parser-inspired log-linear models for ranking.

- We also consider efficiency aspects and take care that all questions can be answered interactively, that is, within one second. Many of the previous systems do not consider this aspect, and take at least several seconds and longer to answer a single query. Again, see Section 5 for some details.

We make the code of our system publicly available under http://ad.informatik.uni-freiburg.de/publications. In particular, this allows reproducing our results. The website also provides various additional useful materials; in particular, a list of mistakes and inconsistencies in the Free917 and WebQuestions benchmarks.

Throughout this paper, we focus on Freebase as the currently largest general-purpose knowledge base. However, there is nothing in our approach specific to Freebase. It works for any knowledge base with entities and (possibly k-ary) relations between them.

2 This is typical when the verb "to be" is used in the question.

2. RELATED WORK

Much recent work on natural-language queries on knowledge bases has focused on two recent benchmarks, both based on Freebase: Free917 and WebQuestions. Section 2.1 gives an overview over this body of work, introducing the two benchmarks on the way. In Section 5, we compare our new method against all methods from this section. Section 2.2 briefly discusses work using other benchmarks.
2.1 Work on Free917 and WebQuestions

We consider the works in chronological order, briefly highlighting the relative innovations over previous works and the corresponding gain in result quality. A more technical description of each of the methods is provided in Section 5.3.

In [7], the Free917 benchmark was first introduced. The benchmark consists of 917 questions along with the correct(3) knowledge-base query. All queries have exactly one (possibly k-ary) relation. The basic approach of [7] is to extend an existing semantic parser with correspondences between natural-language phrases and relation names in the knowledge base. The correspondences are learned using weak supervision techniques and from the training portion of the benchmark (70% = 641 questions).

In [15], query candidates are derived by transforming an underspecified logical form of a CCG [21] parse. This form is grounded to Freebase using a set of collapsing and expansion operators that preserve the type of the expression. This has the advantage that it leverages grammatical structure in the question and can adjust to knowledge-base mismatches, and the disadvantage that it relies on well-formed questions. A linear model is learned to score derivations, which are built using a dynamic-programming-based parser.

3 Actually, a small portion of the queries are incorrect, but this is not a deliberate feature of the benchmark.

In [2], the WebQuestions (WQ) benchmark was introduced. This benchmark is much larger (5,810 questions) but only provides the result set for each question, not the knowledge-base query. This allows gathering more training data more easily (the results were obtained via crowdsourcing). The WQ questions are also more realistic (they were obtained via the Google Suggest API) and language-wise more diverse than the Free917 questions, and hence also harder (e.g. "who runs china in 2011" asking for the former Chinese Premier). The basic approach of [2] is to generate query candidates by recursively generating logical forms.
The generation is guided by a mapping of phrases to knowledge-base predicates and a small set of composition rules. Candidate scores are learned with a log-linear model.

In a follow-up work [3], the process from [2] is turned on its head by generating a natural-language question from each query candidate. Scores are then learned (again with a log-linear model) based on the similarity between the question representing the query candidate and the original question. This allows leveraging text-similarity information (paraphrases) from large text corpora (unrelated to the queried knowledge base).

In [25], the authors go another step further by not even generating query candidates. Instead, their approach tries to identify the central entity of the question, and then iterates over each entity connected (via a single relation) to that central entity in the knowledge base. It is then decided (via a learned model) separately for each such entity whether it becomes part of the result set. In principle, this allows correct answers even when no single relation from the knowledge base matches the question (e.g., asking for a brother of someone, when the knowledge base only knows about siblings). On the downside, this adds a lot of additional features to the learning process (the attributes of the result entities). Quality-wise, the approach does not improve over [2] and [3].

In [19], the authors go yet a step further by not even using the training data. Instead, weak supervision is used to generate learning examples from natural-language sentences. The parsing step itself is conceptualized as a graph-matching problem between the graph of a CCG parse and graphs grounded in Freebase entities and relations. However, their approach was evaluated only on small (and topically narrow) subsets of the two benchmarks.

In [4], the authors try to solve the problem without any natural-language processing (not even POS-tagging). They match the results from [3] but do not improve on them.
2.2 Other benchmarks

Another recent notable effort in open-domain question answering is the QALD (Question Answering over Linked Data) series of evaluation campaigns; see [22] for the latest report. So far, five benchmarks have been issued, one per year. The challenges behind these benchmarks are somewhat different than those behind the Free917 and WebQuestions benchmarks from Section 2.1:

- The biggest and most diverse knowledge base used is DBpedia, which is more than an order of magnitude smaller than Freebase (about 4M vs. about 40M entities).

- A significant fraction of the questions involves more than one relation or non-trivial comparatives. For example, "what are the capitals of all countries that the himalayas run through" or "which actor was cast in the most movies".

- The training sets are relatively small for QALD 1-3. This is mainly due to the fact, discussed in Section 2.1 above, that the ground truth provides not just the correct result sets but also the corresponding SPARQL queries, which requires expensive human expert work. The benchmarks thus give relatively little opportunity for supervised learning. Indeed, most of the participating systems are unsupervised. It is one of the insights from our evaluation in Section 5 that supervised learning is key for results of the quality we achieve.

- QALD 3 and 4 contain multi-lingual versions of the datasets and questions. For QALD 5, the dataset is a combination of RDF data and free text.

For these reasons, and because there is such a substantial body of very recent work on Free917 and WebQuestions with a series of better and better results, we did not include QALD in our evaluation. We consider it a very worthwhile endeavor for future work, though, to extend our approach to the QALD benchmarks.

3. SYSTEM OVERVIEW

We first describe our overall process of answering a natural-language question from a knowledge base (KB). In the next sections we describe each of the steps in detail. Assume we are trying to answer the following question (from the WebQuestions benchmark): "what character does ellen play in finding nemo?"

Entity identification. We begin by identifying entities from the KB that are mentioned in the question. In our example, "ellen" refers to the TV host Ellen DeGeneres and "finding nemo" refers to the movie Finding Nemo. However, like for the example in the introduction, this is not obvious: "ellen" could also refer to the actor Ellen Page and "finding nemo" to the video game with the same name (besides others). Instead of fixing a decision on which entities are mentioned, we delay this decision and jointly disambiguate the mentioned entities via the next steps.
Hence, the result of this step is a set of (possibly overlapping) entity mentions with attached confidence scores.

Template matching. Next, we match a set of query templates to the question. Figure 1 shows our templates. Each template consists of entity and relation placeholders. A matched template corresponds to a query candidate which can be executed against the KB to obtain an answer. Our simplest template consists of a single entity and an answer relation (template 1 in Figure 1). One of the query candidates for our example is generated by matching the entity for the TV host Ellen DeGeneres and the relation "parents":(4)

  <Ellen DeGeneres> <parents> <T>

This has the (wrong) interpretation of asking for her parents. A slightly more complex template contains two relations connected to the entity via a mediator object (template 2 in Figure 1). In our example, this matches a query candidate connecting Ellen Page to abstract film performance objects, via a "performance" relation, and from there to all the films she acted in via a "film" relation:

  <Ellen Page> <performance> <M>
  <M> <film> <T>

This asks for all films Ellen Page acted in. Yet another template combines two entities via relations and a mediator entity (m in template 3 in Figure 1). In our example, Ellen DeGeneres and Finding Nemo are connected via two relations and a film-performance mediator:

  <Ellen DeGeneres> <performance> <M>
  <M> <film> <Finding Nemo>
  <M> <character> <T>

We find this connection using an efficient inverted index (see Section 4.2) and continue matching from the mediator. In particular, we create query candidates asking for the character (Dory) and performance type (Voice) of Ellen DeGeneres in Finding Nemo. The final result of this step is a set of all the matched query candidates.

4 We use SPARQL-like triple (subject, predicate, object) notation, where uppercase characters indicate variables.

Relation matching. The query candidates still miss the fundamental information about which relations were actually mentioned and asked for in the question.
We distinguish three ways of matching relations of the query candidate to words in the question: 1) via the name or description of the relation in the KB, 2) via words learned for each relation using distant supervision, 3) via supervised learning on a training set. Each match has a confidence score attached. In our example, a word learned for the relations "performance" and "film" connecting an actor to the film she acted in is "play". This matches in the query candidates asking for all films of Ellen Page and for the performance type or character of Ellen DeGeneres in Finding Nemo. Furthermore, the word "character" matches the relation with the same name, whereas the relation "performance type" doesn't match. Continuing this way, all relations in all query candidates are enriched with information about what words were matched in which way.

Ranking. We now have a set of query candidates, where each candidate is enriched with information about which of its entities and relations match which parts of the question how well. It remains to rank the candidates in order to find the best matching candidate. Note that performing ranking at this final step has the strong benefit of jointly disambiguating entities and relations. A candidate can have a weak match for an entity, but a strong match for a relation, and vice versa. By deciding this at the final stage we can identify these combinations as correct, even when one of the matches seems unlikely when considered separately. Intuitively, for our example, the candidate covering most words of the question is best. Matching "ellen" to Ellen Page no longer allows matching Finding Nemo because these aren't actually related in the KB. On the other hand, asking for the performance type of Ellen DeGeneres in Finding Nemo doesn't match the word "character". This leaves us with the correct interpretation of asking for her character in the movie.

4. SYSTEM DETAILS

In this section, we describe the details of our system, called Aqqu. Aqqu works by generating query candidates for each question.

Figure 1: Query templates and example candidates with corresponding questions. A query template can consist of entity placeholders e, relation placeholders r, an intermediate object m and the answer node t.

  #1  e1 --r1--> t
      Example: Scrabble --inventor--> t  ("who invented scrabble?")
  #2  e1 --r1--> m --r2--> t
      Example: Henry Ford --employment--> m --company--> t  ("what company did henry ford work for?")
  #3  e1 --r1--> m --r2--> e2,  m --r3--> t
      Example: Ellen DeGeneres --performance--> m --film--> Finding Nemo,  m --character--> t  ("what character does ellen play in finding nemo?")

These query candidates are then ranked using a learned model. The top-ranked query is then returned (or "no answer" in case the set of candidates was empty). The previous section explained the process by an example. The following subsections describe the candidate generation and ranking in detail.

4.1 Entity matching

The goal of the entity-matching phase is to identify all entities from the knowledge base that match a part of the question. The match can be literal, or via an alias of the entity name.

POS-tagging. We POS-tag the question using the Stanford tagger [17]. For entity matching (this subsection), we make use of the tags NN (noun) and NNP (proper noun). For relation matching (Section 4.3), we also make use of the tags VB (verb) and JJ (adjective).

Subsequence generation. We generate the set S of all subsequences of words from the question, with the following two restrictions. First, a subsequence consisting of a single word must be tagged NN. Second, a subsequence must not split a sequence of words tagged NNP; that is, when it starts (ends) with a word tagged NNP, it must not be preceded (succeeded) by a word tagged NNP.

Find matching entities. For each s in S, we compute the list of all entities from the knowledge base that have s as their name or alias. We use a map from phrases (the aliases) to lists of entities (the entities with the respective alias) obtained from the CrossWikis dataset [20]. CrossWikis was built by mining the anchor text of links to Wikipedia entities (articles) from various large web-crawls. CrossWikis covers around 4 million entities from Wikipedia.
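The subsequence-generation step above can be sketched as follows. This is our own illustration, assuming parallel token/tag lists from the POS tagger; as a simplification, we accept any noun tag starting with NN for single-word subsequences.

```python
def subsequences(tokens, tags):
    """Enumerate word subsequences that may mention an entity, with
    the two restrictions from the text: a single-word subsequence
    must be a noun, and a subsequence must not split an NNP run."""
    n = len(tokens)
    result = []
    for i in range(n):
        for j in range(i + 1, n + 1):
            if j - i == 1 and not tags[i].startswith("NN"):
                continue  # single words must be nouns (simplification: NN*)
            if tags[i] == "NNP" and i > 0 and tags[i - 1] == "NNP":
                continue  # would split an NNP run on the left
            if tags[j - 1] == "NNP" and j < n and tags[j] == "NNP":
                continue  # would split an NNP run on the right
            result.append(" ".join(tokens[i:j]))
    return result
```

For "finding nemo movie" tagged NNP NNP NN, this keeps "finding nemo" but discards "finding" and "nemo", which would split the proper-noun run.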
Almost all of these entities also exist in Freebase, together with a link to the respective Wikipedia entity. For the remaining Freebase entities, we only consider the literal name match. Overall, we are able to recognize around 44 million entities with about 60 million aliases. We have also experimented with the aliases provided by Freebase, but they tend to be much more noisy (wrong aliases) and less complete (important aliases missing).

Scores for the entity matches. We compute a score for each match (s, e) computed in the previous step, where s is a subsequence of words from the question and e is an entity from Freebase with alias s. Consider a fixed alias s. CrossWikis also provides us with a probability distribution p_cross(e|s) over the Wikipedia entities e with alias s. Let e' be a Freebase entity that is not contained in CrossWikis. Let e_max be the CrossWikis entity with the highest p_cross(e|s). That is, e_max is the most likely Wikipedia entity for alias s. Let p_free(e'|s) = p_cross(e_max|s) * pop(e') / pop(e_max), where pop is the (alias-independent) popularity score of an entity, as described in the next subsection. Intuitively, p_free(e'|s) estimates the probability that e' has alias s via its popularity relative to the most likely Wikipedia entity for s. We merge p_cross(e|s) and p_free(e'|s) into one probability distribution by simply normalizing the probabilities to sum 1.

Popularity scores for each entity. For each entity, we also compute a (match-independent) popularity score. We simply take the number of times the entity is mentioned in the ClueWeb12 dataset [9], according to the annotations provided by Google [13]. The popularity scores are used for the entity match scores above. They also yield two features used in ranking each candidate; see Section 4.5.

4.2 Candidate generation

Based on the entity matches, we compute a set of query candidates as follows. We generate the query candidates in three (disjoint) subsets, one for each of the three templates shown in Figure 1. Each template stands for a query with a particular kind of structure.
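The entity-score merging described in Section 4.1 can be sketched as follows. Function and argument names are ours; `pop` stands for the ClueWeb-based popularity scores.

```python
def entity_scores(cross_probs, extra_entities, pop):
    """Merge CrossWikis probabilities with popularity-based estimates
    for a fixed alias s.

    cross_probs: {entity: p_cross(e|s)} for entities CrossWikis knows.
    extra_entities: Freebase entities with alias s not in CrossWikis.
    pop: popularity score per entity (e.g. ClueWeb12 mention counts).
    Returns one normalized distribution over all candidates.
    """
    e_max = max(cross_probs, key=cross_probs.get)
    scores = dict(cross_probs)
    for e in extra_entities:
        # p_free(e|s) = p_cross(e_max|s) * pop(e) / pop(e_max)
        scores[e] = cross_probs[e_max] * pop[e] / pop[e_max]
    total = sum(scores.values())
    return {e: p / total for e, p in scores.items()}
```

The normalization at the end implements the final merging step: one probability distribution over CrossWikis and Freebase-only candidates.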
These three templates cover almost all of the questions in the Free917 and WebQuestions benchmarks. Let E be the set of all entities matched to a subsequence of the question, as described in the previous section.

Template 1. For each e in E, find all relations r such that there is some triple (e, r, t) in the knowledge base. We obtain these via a single SPARQL query for each e.

Template 2. For each e in E, find all r1, r2, m such that there are two triples (e, r1, m) and (m, r2, t) in the knowledge base, where r1 and r2 are relations and m is a mediator entity. We obtain these as follows. For each e, we use a single SPARQL query to obtain all matching r1. For each e, r1, we then use another SPARQL query to obtain all matching r2. Note that m remains a variable in the query candidate.

Template 3. For all pairs of entities e1, e2 in E such that the two subsequences matched in the question do not overlap, find all r1, r2, r3 such that there are three triples (e1, r1, m), (m, r2, e2), and (m, r3, t) in the knowledge base, where r1, r2, r3 are relations and m is a mediator entity. We obtain these as follows. For each entity e, we precompute the list of all (r, m) such that m is a mediator entity and the triple (e, r, m) exists in the knowledge base. The list is sorted by the ids of the mediator entities. For given e1, e2 as above, we then intersect the lists for e1 and e2. For each mediator m in the intersection, we then obtain all r3 via a simple SPARQL query. In the query candidate, m remains a variable.

4.3 Relation matching

Let C be the set of query candidates computed in the previous subsection. For each query candidate c in C, let RW_c be the set of lemmatized(5) words from the relations from c (there can be one, two, or three relations, depending on the template from which c was generated). We compute how well the words from RW_c match the subset QW of lemmatized words from the question that are not already matched by the entities from c. We consider four kinds of matches, described in the following: literal, derivation, synonym, context. For each of these four kinds of matches, we compute a non-negative score (which is zero if there is no match at all). It can happen that all four of these scores are zero. In the basic version of our system, we keep such candidates; in a variant, we prune them; see Section 4.7.

Literal matches. This score is simply the number of pairs (w, q), where w in RW_c and q in QW and w = q. Almost all questions have no repeated words; in that case, this score is just the number of relation words that occur in the question (and are not already matched by an entity).

Derivation matches. This score is the number of pairs (w, q), where w in RW_c and q in QW and w is derivationally related to q. Here we also consider the POS-tag of w in the question. We precompute a map from POS-tagged words to derivations using WordNet [11]. We extract derivation links for verbs and nouns (e.g. "produce.vb" - "producer.nn" and vice versa). We also extract attribute links between adjectives and their describing attribute (e.g., "high.jj" - "height.nn"). We extend these links with synonyms of the noun in WordNet (e.g. "high.jj" - "elevation.nn").

Synonym matches. For each w in RW_c and q in QW, add s to this score if w is a synonym of q with similarity s. We compute the similarity between two words by computing the cosine similarity between the associated word vectors. We use 300-dimensional word vectors that were computed with Google's word2vec(6) on a news text corpus of size around 100 billion words.
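The synonym-match score between two such word vectors is plain cosine similarity; a minimal sketch:

```python
import math

def synonym_score(vec_w, vec_q):
    """Cosine similarity between the word vectors of a relation word w
    and a question word q (the per-pair synonym-match score)."""
    dot = sum(a * b for a, b in zip(vec_w, vec_q))
    norm = math.sqrt(sum(a * a for a in vec_w)) * \
           math.sqrt(sum(b * b for b in vec_q))
    return dot / norm if norm else 0.0
```

In practice the 300-dimensional word2vec vectors would be looked up from the pretrained model; the lists here just stand in for them.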
We consider only synonyms where the score is at least 0.4. This threshold is based on observation, but chosen very liberally: many word pairs with a score above that threshold are not what humans would call real synonyms, but almost all such real synonyms have a score above that threshold.

Context matches. For this score, we precompute weighted indicator words for each relation from our knowledge base. These are words which are not necessarily synonyms of words in the relation name, but are used in text to express that relation; see below for an example. The score is then the sum of the weights of all words in QW that are indicators for one of the relations from the query candidate. For templates 2 and 3, we consider r1.r2 as one relation.

We learn indicator words using distant supervision [18] as follows. First, we identify entity mentions in Wikipedia using Wiki markup and a set of simple heuristics for coreference resolution, as described in [1]. We also identify dates and values using SUTime [8]. For the 23 million sentences that contain at least two entities (including dates or values), we compute a dependency parse using [17]. For each pair e1, e2 of entities occurring in a sentence, we look up all relations r in the knowledge base that connect them. We also treat relations r1.r2 that connect the entities via a mediator as a single binary relation r. If the shortest path between e1 and e2 in the dependency parse has length at most four, we consider all words along that path as indicator words for r. We also experimented with considering all words in the sentence, or words along longer paths, but these gave considerably worse results. We find about 4.7 million sentences that match at least one relation this way. For example, we can thus learn that "born" is an indicator word for the relation "place of birth" from the following sentence (assuming that our knowledge base contains the respective fact): "Andy Warhol was born on August 6, 1928 in Pittsburgh."

5 For example, "founded" → "found" and "was" → "be".
6 https://code.google.com/p/word2vec/
Note that from the same sentence, we can also learn that "born" is an indicator word for the relation "date of birth". To distinguish between the two, we need some kind of answer-type matching; this is described in Section 4.4.

We compute the weights for the indicator words in the following IR-style fashion. Consider each relation as a document consisting of the words extracted for that relation. Then compute tf.idf scores for all the words in these (relation) documents in the usual way. For each relation, then only consider the top-1000 words and sum up their tf.idf scores. The weight for each word in a (relation) document is then its tf.idf score divided by this sum. This could also be interpreted as a probability distribution p(w|r) over words w given a relation r.

4.4 Answer type matching

For each candidate, we perform a simple but effective binary check based on the relation leading to the answer (r1, r2, r3 for templates 1, 2 and 3, respectively). We precompute a list of target types for each relation r by counting the types of objects o in all triples (·, r, o), keeping only the top ten percent of most frequent types. For questions starting with "who", we check whether the computed target types contain the type person, character, or organization. For questions starting with "where", we check whether the relation leads to a location or an event. For questions starting with "when" or "since when", we check whether the type is a date; for all other questions, the check for target objects of type date is negative.

As our evaluation and error analysis show, these simple heuristics work reasonably well for the Free917 and WebQuestions benchmarks. The reason is that our entity and relation matching already provide ample information for discriminating between candidates. However, as explained in Section 4.3, a question word like "born" alone does not permit discrimination between the two relations "place of birth" and "date of birth". It is exactly those cases that can be easily discriminated with the simple answer-type check from above. We leave elaborate answer-type detection (which has been addressed by many QA systems) to future work.
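These heuristics can be sketched as follows. The type names are illustrative labels, not exact Freebase type identifiers.

```python
def answer_type_ok(question, target_types):
    """Binary answer-type check: does the answer relation's set of
    frequent target types fit the question word?"""
    q = question.lower()
    types = set(target_types)
    if q.startswith("who"):
        return bool(types & {"person", "character", "organization"})
    if q.startswith("where"):
        return bool(types & {"location", "event"})
    if q.startswith("when") or q.startswith("since when"):
        return "date" in types
    # all other questions: a date-typed answer fails the check
    return "date" not in types
```

This is exactly the check that separates "place of birth" from "date of birth" when the only matched question word is "born".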
4.5 Candidate features

The previous subsections have shown two things. First, how we generate query candidates for a given question. Second, how we compute various scores for each candidate that measure how well the entities and relations from the candidate match which parts of the question. In this subsection, we show how we generate a feature vector from each candidate. Most of these features are based on the scores just mentioned. Another important feature, described below, serves to learn the correspondence between n-grams from the question and relations from query candidates.

Table 1 provides an overview over all our features. In the description below, we refer to the features by their ID (first column in the table). In Section 4.6, we show how we rank candidates based on these feature vectors.

Table 1: Features used by our ranking approaches. Top/middle/bottom: features for entity matches / features for relation matches / combined or other features.

  ID     Description
  1      number of entities in the query candidate
  2      number of entities that matched exactly with their name, or with a high probability (> 0.8)
  3      number of tokens of all entities that matched literally as per the previous feature
  4-5    average (4) and sum (5) of entity match probabilities
  6-7    average (6) and sum (7) of entity match popularities
  8      number of relations in the matched template
  9      number of relations that were matched literally via their name
  10-13  number of tokens that matched a relation of kind: literal (10), derivation (11), synonym (12), context (13)
  14     sum of synonym match scores
  15     sum of relation context match scores
  16     number of times the answer relation (r1, r2, r3 for templates 1, 2 and 3, respectively) occurs in the KB
  17     a value between 0 and 1 indicating how well the relation matches according to n-gram features (Section 4.5)
  18     sum of features 3 and 10; that is, the number of tokens matching a relation or entity literally
  19     number of tokens that match an entity or relation divided by the total number of tokens in the question
  20-22  whether the result size is 0 (feature 20), 1-20 (feature 21), or larger than 20 (feature 22); all binary
  23     binary result of the answer-type check (Section 4.4)

Entity/relation matching features. Features 1-7 are based on the results from the entity matching described in Section 4.1.
Features 8-16 are based on the results from the relation-matching step described in Section 4.3. Features 18 and 19 quantify the number of words in the question covered by entity or relation matches (feature 18 = literally, feature 19 = in any way). Features 20-22 quantify the result size. This is important, because some candidates produce huge result sets or empty result sets, which are both rare. Feature 23 is the binary output of the simple answer-type check from Section 4.4.

N-gram relation matching feature. This feature considers correspondences between words (unigrams) or two-word phrases (bigrams) in the question and the relation in the query candidate. For example, in the WebQuestions benchmark, the question "who is..." almost always asks for the profession of a person. Such a correspondence cannot be learned by any of the mechanisms described in Section 4.3.

We learn this feature as follows. For each query candidate, we generate all unigrams and bigrams of the lemmatized words of the question. The matched entities (Section 4.1) are replaced with a special word "entity". For each n-gram, we then create an indicator feature by appending the n-gram to the relation names of the candidate. For example, for template 2 from Figure 1, one of the features would be "employment.company+work" for the unigram "work" and the relations "employment.company". We then train an L2-regularized logistic regression classifier with all correct candidates as positive examples and all others as negative examples. The value of feature 17 is simply the probability output by this classifier.

This feature will be part of a subsequent step to learn a ranking that uses the same training data. To provide realistic feature values (that aren't overfit) we proceed as follows. Split the training data into six folds. In turn, leave out one fold and train the n-gram feature classifier on the remaining folds. Then, for each example in the left-out fold, compute the n-gram feature value. Use this computed value as part of the training data for subsequent learning.
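The out-of-fold scheme for feature 17 can be sketched as follows. Here `train_fn` stands in for training the L2-regularized logistic regression, and the round-robin fold assignment is one of several reasonable choices; both are our assumptions, not details from the paper.

```python
def out_of_fold_values(examples, labels, train_fn, n_folds=6):
    """Compute the n-gram feature without overfitting: split the
    training data into folds, train the n-gram classifier on all but
    one fold, and score the held-out fold with that classifier.

    train_fn(X, y) must return a function mapping one example to a
    probability (a stand-in for the logistic regression model)."""
    n = len(examples)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    values = [None] * n
    for held_out in folds:
        held = set(held_out)
        X = [examples[i] for i in range(n) if i not in held]
        y = [labels[i] for i in range(n) if i not in held]
        model = train_fn(X, y)
        for i in held_out:
            values[i] = model(examples[i])
    return values
```

Each training example thus receives a feature value from a model that never saw it, which is what makes feature 17 realistic for the subsequent ranking step.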
4.6 Ranking

For each question, we finally rank the query candidates using the feature vectors described in the previous subsection. The top-ranked query candidate is then used to provide the answer. We say "no answer" only when the set of candidates is empty; this is discussed in Section 4.7 below.(7)

We have experimented with state-of-the-art techniques for the learning-to-rank approach from IR [14] [16], including: RankSVM [14], RankBoost [12], LambdaRank [6] and AdaRank [23]. These only lead to moderate results and were outperformed by our approaches described below. We presume that this is because our ranking problem is degenerate. In particular, each query is only associated with a single relevant answer. This is different from a typical IR scenario where a query usually has several answers, sometimes with varying degrees of relevance.

We investigate two variants to obtain a ranking: pointwise ranking and pairwise ranking. These approaches are inspired by the learning-to-rank approaches from IR.

7 Both benchmarks contain a considerable number of questions starting with "how many...", asking for a count. We simply replace "how many" by "what" in these questions, and count the size of the result set (unless the answer already is a count).

Pointwise ranking
In the pointwise ranking approach we compute a score for each candidate. Candidates are sorted by this score to infer a ranking. The score is computed by a classifier learned on the candidate features (see Section 4.5) and training data. We create training data by using the correct candidate of each question as a positive example and all other candidates as negative examples. A drawback of the pointwise approach is that the model compares question-independent examples. That is, correct (incorrect) query candidates of questions of different type and difficulty are in the same correct (incorrect) class, when in practice it is not necessary to compare or discriminate between them.

Pairwise ranking
In the pairwise ranking approach, we transform the ranking problem into a binary classification problem. The idea is to learn a classifier that can predict, for a given pair of candidates, whether one should be ranked before the other. To infer a ranking, we sort the list of candidates using the learned preference relation. This works very well in practice, although our learning does not guarantee that the learned relation is transitive or anti-symmetric. We have experimented with two alternatives to sorting. Simply computing the maximum turned out to perform badly. This makes sense, because the maximum has to survive a larger number of comparisons. Following [10], we have also sorted the candidates by their number of won comparisons against all other candidates. The results were identical to those for sorting, but this method requires Θ(n²) comparisons for n candidates. To train the classifiers we create training examples in the following way. For a question with n query candidates, randomly select n/2, but at least 200, candidates (or n if n/2 < 200). This is to guarantee that we have enough training examples for questions with few candidates and to avoid putting too much emphasis on questions that have more than 200 candidates.8 Then, for each randomly selected candidate r_i and the correct candidate c, where r_i ≠ c, create a positive example pair (c, r_i) and a negative example pair (r_i, c).
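A sketch of the pairwise machinery (function names are ours; the sampling rule is read as min(n, max(n/2, 200)); the pair (a, b) is represented by concatenating the feature difference with both feature vectors; `prefers` stands in for the trained pair classifier):

```python
import random
from functools import cmp_to_key

def pairwise_examples(candidates, correct, seed=0):
    """Training pairs for one question: sample min(n, max(n // 2, 200))
    candidates (our reading of the sampling rule), then pair each sampled
    r != correct with the correct candidate c, once in each order."""
    rng = random.Random(seed)
    n = len(candidates)
    sampled = rng.sample(candidates, min(n, max(n // 2, 200)))
    pairs = []
    for r in sampled:
        if r != correct:
            pairs.append(((correct, r), 1))  # (c, r_i): positive
            pairs.append(((r, correct), 0))  # (r_i, c): negative
    return pairs

def phi_pair(fa, fb):
    """Pair representation: phi(a) - phi(b), phi(a), phi(b), concatenated."""
    return [x - y for x, y in zip(fa, fb)] + list(fa) + list(fb)

def rank(candidates, phi, prefers):
    """Sort candidates by the learned preference relation.  `prefers`
    stands in for the trained pair classifier: it takes phi_pair(a, b)
    and returns True iff a should rank before b (the learned relation is
    not guaranteed to be transitive, as noted in the text)."""
    def cmp(a, b):
        return -1 if prefers(phi_pair(phi[a], phi[b])) else 1
    return sorted(candidates, key=cmp_to_key(cmp))
```

With a toy single-feature preference that ranks larger values first, `rank(["b", "c", "a"], {"a": [3.0], "b": [1.0], "c": [2.0]}, lambda v: v[0] > 0)` orders the candidates accordingly.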
The feature representation for a pair (a, b) is a tuple of the individual feature vectors and their difference: φ_pair(a, b) = (φ(a) − φ(b), φ(a), φ(b)), where φ is a function extracting the features in Table 1. Both ranking approaches, pointwise and pairwise, require a classifier. Here, we consider two different options.

Linear A logistic regression classifier. In initial experiments, other linear models, such as linear SVMs, have shown similar performance. Logistic regression is also known to output well calibrated probabilities and performs well in high-dimensional feature spaces. We train the model using L-BFGS-B [26]. To avoid over-fitting we apply L2-regularization, choosing the regularization strength using 6-fold cross-validation on the training set.

Random forest We learn a forest of decision trees [5]. Random forests are able to learn non-linear decision boundaries, require few hyperparameters, are simple to train, and are known to perform very well on a variety of tasks.

8 Our system generates around 200 candidates on average for a random question, but the exact value had little effect on performance in our evaluation.

4.7 Candidate pruning
Some questions may have no answer in the knowledge base. Our system, as described so far, returns no answer only when the set of query candidates is empty. However, as also described, this would rarely happen, since there are matching entities for every question, and we do not require that the relations match any of the words in the question.9 We consider two variants of our system to deal with this problem: (1) omitting the n-gram feature, and using hard pruning; and (2) keeping the n-gram feature, and using a pruning classifier. Note that a nice side-effect of pruning is that it speeds up the ranking process because it needs to consider fewer candidates.

Without n-grams, with hard pruning When omitting the n-gram feature, there is no reason to keep candidates with the wrong answer type or where features 9-15 are all zero. The natural approach is then to prune such candidates before we do the ranking; this is what we call hard pruning.
Hard pruning naturally leads to empty candidate sets for some queries. Indeed, on the Free917 benchmark, 10 questions have no answer, and our hard pruning yields an empty candidate set for 7 of them.

With n-grams, with a pruning classifier When keeping the n-gram feature, hard pruning as just described would be counterproductive. As explained in Section 4.5, the answers for the who is... questions from the WebQuestions benchmark are professions. They would be eliminated when hard pruning by answer type. Also, the profession relation matches no words from these questions. They would hence also be eliminated when hard pruning if features 9-15 are all zero. The goal of the pruning classifier is to weed out only the obviously bad candidates. For example, candidates that do not match the answer type, have bad relation matches, and a weak n-gram feature. We train the pruning classifier in the same way as the pointwise classifiers (see above) with the features from Table 1 using logistic regression. To optimize the classifier for recall we adjust example weights so that positive candidates have twice the weight of negative candidates. Before the ranking step, we apply the classifier to each candidate and only keep candidates classified positively.

5. EVALUATION
We perform an extensive evaluation of our system. In Section 5.1, we provide more details on our two benchmarks. In Section 5.2, we describe the evaluation measures used. In Section 5.3, we describe the systems we evaluate and compare to. In Section 5.4, we provide our main results, followed by a detailed analysis in Section 5.5.

5.1 Data
We use all of Freebase as our knowledge base (2.9 billion facts on 44 million entities). Note that our approach is not tailored to Freebase and could easily be adapted to another knowledge base, e.g., WikiData.10

Datasets We evaluate our system on two established benchmarks: Free917 and WebQuestions. Each benchmark consists of a set of questions and their answers from Freebase.

9 In that case, features 9-15 are all zero; however, the n-gram features could still be positive.
10 http://
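Returning to the pruning classifier of Section 4.7: the recall-oriented example weighting and the filtering step can be sketched as follows (the classifier itself is the pointwise logistic regression described earlier; the function names and the 0.5 threshold are our assumptions):

```python
def example_weights(labels):
    """Positive candidates get twice the weight of negative ones, biasing
    the trained classifier towards recall."""
    return [2.0 if y == 1 else 1.0 for y in labels]

def prune(candidates, positive_probs, threshold=0.5):
    """Keep only the candidates the pruning classifier labels positive;
    positive_probs are the classifier's probabilities per candidate."""
    return [c for c, p in zip(candidates, positive_probs) if p >= threshold]
```

Only candidates surviving `prune` are passed on to the ranking step, which also speeds up ranking.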

The benchmarks differ substantially in the types of questions and their complexity. Free917 contains 917 manually generated natural language questions [7]. The questions cover a wide range of domains (81 in total). Two examples are what fuel does an internal combustion engine use and how many floors does the white house have. The most common domains, film and business, only make up 6% of the questions [7]. All questions are grammatical and tend to be tailored to Freebase. The dataset provides a translation of each question into a SPARQL-equivalent form. We execute the SPARQL queries to obtain a gold answer for each question. [7] also provide an entity lexicon: a mapping from exact text to the mentioned entity for all entities appearing in the questions. This lexicon consists of 1014 different entities. It was used for identifying entities by all systems reporting results on the dataset so far. We only make use of this lexicon where explicitly stated. To report results, we use the original split of the questions by [7] into 70% (641) questions to train and 30% (276) questions to test.

WebQuestions consists of 5,810 questions that were selected by crawling the Google suggest API [2]. Contrary to Free917, questions are not necessarily grammatical and are more colloquial. For example: where did jackie kennedy go to college and what is spoken in czech republic. Due to how they were selected, the questions are biased towards topics that are frequently asked from Google. According to [19], the people domain alone makes up about 7% of questions. Furthermore, the structure of questions tends to be simpler. Most questions only require a single entity with an answer relation [2]. Answers to the questions were obtained by using crowdsourcing. This introduces additional noise; in particular, for some questions only a subset of the correct answer is provided as gold answer. We use the original train-test split of the questions by [2] into 70% (3,778 questions) to train and 30% (2,032 questions) to test.
5.2 Evaluation measures
Given a benchmark and a system, denote the questions by q_1...q_n, the gold answers by g_1...g_n, and the answers from the system by a_1...a_n. Note that an answer can consist of a single value (in particular, a date or a literal) or a list of values. We consider the following two evaluation measures.

Accuracy The fraction of queries answered with the exact gold answer:

  accuracy = (1/n) · Σ_{i=1..n} I(g_i = a_i)

where I(e) is an indicator function returning one if expression e is true and zero else. This is reasonable on Free917, which provides perfect gold answers.

Average F1 The average F1 across all questions:

  average F1 = (1/n) · Σ_{i=1..n} F1(g_i, a_i)

where the function F1 computes F1 in the regular way. This accounts for partially correct results, which is reasonable for WebQuestions, where gold answers are sometimes incomplete.

In our evaluation we focus on accuracy for Free917 and average F1 for WebQuestions. These are the most reported and most intuitive measures for these datasets. We also performed the evaluation with other measures that were used in previous work, e.g., variants of F1 as defined in [15] and [25]. These provided no new insights and strongly correlated with the measures above.

              Free917                 WebQuestions
Method        Accuracy+   Accuracy   Average F1
Cai+Yates     59 %        -          -
Jacana        -           -          35.4 %
Sempre        62 %        52 %       35.7 %
Kwiat. et al  68 %        -          -
Bordes et al  -           -          39.2 %
ParaSempre    68.5 %      46 %       39.9 %
Aqqu          76.4 %      65.9 %     49.4 %

Table 2: Results on the Free917 (276 questions) and WebQuestions (2032 questions) test set. For the results in the second column (Accuracy+) a manually crafted entity lexicon was used.

5.3 Systems evaluated
We evaluate and compare the following systems. See Section 2 for a brief description of the systems from previous work. If we (re-)produced results, we explicitly state so. Otherwise, we report existing results.

Cai+Yates The semantic parser developed by [7].
Kwiat. et al The semantic parser by [15].
Sempre The semantic parser by [2]. We produced results for Free917 without an entity lexicon using the provided code.11
ParaSempre The semantic parser suggested by [3].
We used the code provided by the authors11 to produce results on Free917 without an entity lexicon.
GraphParser The semantic parser developed by [19]. We report results obtained from the code provided by the authors.12 The results from their code deviate slightly from the results reported in their paper.
Jacana The information extraction based approach by [25]. We report updated results from [24].
Bordes et al The embedding-based model by [4].
Aqqu Our system, as described in Section 4. We want to stress that we use the exact same system on both benchmarks. As shown in Section 5.5 below, results can be further improved by adapting the feature set to the benchmark. However, we consider this overfitting.

Note that all of the systems above, except Sempre and ParaSempre, were only evaluated on one of the two benchmarks.

5.4 Main results
Table 2 shows the results on the test sets for Free917 and WebQuestions for all the systems from Section 5.3. GraphParser is discussed separately below, because it was evaluated only on a subset of questions.

On Free917, Aqqu improves in accuracy over the best previous systems by 8% with an entity lexicon, and by 14% without entity lexicon. Performance drops considerably for all systems when not using an entity lexicon. This shows that addressing entity recognition is an integral part of the problem that cannot be ignored. Overall, we achieve an oracle accuracy (percentage of questions where at least one produced query candidate is correct) of 89.1% and 85.5%, with and without entity lexicon respectively. This indicates that there is still room for improvement for better matching and ranking.

On WebQuestions our system improves the state of the art by almost 10% in average F1. No system uses an entity lexicon. Note that the WebQuestions benchmark is much harder and contains a considerable amount of imperfect or wrong answers. Out of a random sample of 55 questions we found 9 questions that had a wrong answer, and 10 further questions that had only a partially correct answer. This suggests that the upper bound for average F1 is roughly around 80%. Our oracle average F1 is at 68.5%. [2] and [3] report 48% and 63% respectively. Hence, we successfully identify most of the entities and relations. However, there is still much room for improvement in ranking and matching.

GraphParser was evaluated only on a subset of Freebase relations. The authors provide a train-test split of questions for WebQuestions. Note that we didn't restrict our system to the specific relations and that GraphParser requires an entity lexicon also on WebQuestions. Our system (without an entity lexicon) scores an average F1 of 66.1% compared to 40.5% reported for GraphParser.

11 http://github.com/percyliang/sempre
12 http://github.com/sivareddyg/graph-parser

              Top-2    Top-3    Top-5    Top-10
Free917                77.2 %   79.3 %   83.7 %
WebQuestions  67.1 %   72.7 %   77.5 %   82.3 %

Table 3: Top-k results on Free917 (top) and WebQuestions (bottom). Percentage of questions with the best answer in the top-k candidates.

                  Free917             WebQuestions
Method            Acc+      Acc       Avg F1
Aqqu-point-lin    73.6 %    63.4 %    46.9 %
Aqqu-point-tree   74.3 %    63.0 %    47.9 %
Aqqu-pair-lin     76.4 %    65.2 %    48.3 %
Aqqu-pair-tree    76.4 %    65.9 %    49.4 %

Table 4: Results for different ranking variants on the test sets for Free917 and WebQuestions. For the results in the second column (Acc+) a manually crafted entity lexicon was used.
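The evaluation measures from Section 5.2 and the top-k/oracle numbers above can be computed directly (a sketch; treating answer lists as order-independent sets, and the boolean-flags input of the last helper, are our assumptions):

```python
def accuracy(gold, answers):
    """Fraction of questions answered with exactly the gold answer."""
    return sum(set(g) == set(a) for g, a in zip(gold, answers)) / len(gold)

def average_f1(gold, answers):
    """Mean F1 over questions; credits partially correct answer lists."""
    total = 0.0
    for g, a in zip(gold, answers):
        g, a = set(g), set(a)
        if not g or not a:
            total += float(g == a)  # empty vs. empty counts as correct
            continue
        p = len(g & a) / len(a)  # precision
        r = len(g & a) / len(g)  # recall
        total += 2 * p * r / (p + r) if p + r > 0 else 0.0
    return total / len(gold)

def topk_hit_rate(ranked_correctness, k):
    """Fraction of questions whose correct candidate is among the top-k.
    ranked_correctness: per question, booleans in ranked order, True where
    the candidate matches the gold answer.  With k at least the maximum
    list length this is the oracle accuracy."""
    return (sum(any(flags[:k]) for flags in ranked_correctness)
            / len(ranked_correctness))
```

For instance, a system that answers one of two questions exactly and the other only partially scores accuracy 0.5 but a higher average F1.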
The selected subset of relations, and thus of questions, seems to be considerably easier to answer for our system.

5.5 Detailed analysis
Top-k results Table 3 shows the top-k results on the two datasets. A large majority of questions can be answered from the top two or three candidates. By providing these interpretations and results (in addition to the top-ranked candidate) to a user, many questions can be answered correctly. Note that on WebQuestions some questions only have an imperfect gold answer with an F1 score smaller than one. Therefore, the percentage of best answers in the top candidates can be slightly larger than the resulting average F1.

Ranking variants As described in Section 4.6, we consider two possible ranking methods: pointwise (point) and pairwise (pair), each with two different ranking classifiers: logistic regression (lin) and random forests (tree). This gives a total of four different combinations. Table 4 shows results of all four ranking variants with the full features of Table 1. On both benchmarks, pairwise ranking is more effective than pointwise ranking. This is consistent with our intuition that learning a pairwise comparator is better (see Section 4.6). Furthermore, random forests are slightly more effective than a weighted linear combination. We therefore use pairwise ranking with random forests as the standard choice.

Feature analysis To gain insight into which features are helpful we evaluate our system with different combinations. Table 5 shows the results.

                           Free917   WebQuestions
best previous              52.0 %    39.9 %
best now                   69.2 %    49.4 %
no n-grams, all other      69.2 %    39.6 %
no n-grams, no lit-match   65.2 %    39.6 %
no n-grams, no synonyms    61.6 %    28.2 %
n-grams, all other         65.9 %    49.4 %
n-grams, no pruning        64.4 %    49.3 %
n-grams, no synonyms       62.0 %    48.0 %
n-grams, nothing else      18.1 %    43.8 %

Table 5: Feature analysis for Free917 and WebQuestions. No synonyms disables the synonym features and no lit-match disables features 2, 3, 9, 10 and 18. When not using the n-gram feature a different type of candidate pruning is performed (see text).
Note that, as described in Section 4.7, without the n-gram feature hard pruning is applied. The following main observations can be made.

The n-gram feature is extremely helpful on WebQuestions but slightly detrimental on Free917. The WebQuestions benchmark contains many questions that are hard to answer without this kind of supervision, e.g., where is reggie bush from? (asking for the place of birth) or what to do downtown san francisco? (asking for tourist attractions). Our system is able to successfully learn important features for these from the training set. On the other hand, the small Free917 benchmark covers a wide range of domains and relations with only few repetitions. N-gram features aren't helpful on this dataset, which is shown by the low performance when only using the n-gram feature (18.1%). Note that the ranking and learning problem is inherently more difficult when the number of possible candidates increases. This is the case when not using hard pruning, which goes along with using the n-gram feature (see Section 4.7). This disadvantage cannot be fully compensated by the weak n-gram feature and leaner pruning and, as a result, the score drops by about 3% for Free917. Still, we consider it more important to have a single approach that performs well on different kinds of datasets than to optimize for a single dataset.

Literal features provide a small benefit for Free917 but no benefit on WebQuestions. This is an artefact of the way Free917 was built. Free917 questions are tailored to Freebase, often using words from the relation name as part of the question. Synonym features are important for both datasets. They give a huge benefit on WebQuestions without the n-gram feature, but only a small benefit on top of it. Finally, the pruning classifier used with the n-gram feature helps on Free917 because it allows the system to return no answer for some questions that have no answer in the knowledge base. The difference on WebQuestions (which always has an answer in the knowledge base) is not significant, and shows that the pruning classifier doesn't negatively affect performance.

Manual error analysis We manually inspected the errors our system makes. Many errors are due to mistakes in the benchmarks (partially or completely wrong gold answers) and inconsistencies in the knowledge base (different relations with contradicting answers on the same piece of information). We provide a list on our website, see the link in Section 1.1. On that website, we also provide a list of errors due to our system. There is no single large class of errors worth pointing out, though.

Efficiency We also evaluated the performance of our system. The average response time for a question is 644 ms for Free917 and 900 ms for WebQuestions.13 None of the other systems from Section 5.3 comes with an efficiency evaluation. For systems that provide code and for which we reproduced results, runtimes are (at least) several seconds per query. Training our system on the large WebQuestions benchmark takes about 90 minutes in total.

6. CONCLUSION
We have presented Aqqu, a new end-to-end system that automatically translates a given natural-language question to the matching SPARQL query on a knowledge base. The system integrates entity recognition and utilizes distant supervision and learning-to-rank techniques. We showed that our system outperforms previous state-of-the-art systems on two very different benchmarks by 8% and more. Aqqu answers questions interactively, that is, within one second. For around 80% of the queries, the correct answer is among the top-5 candidates.
This suggests that a more interactive approach, which asks for the user's feedback on critical decisions (e.g., between two relations), could achieve a significantly further improved accuracy.

13 Answer times are averaged over three runs on a server with Intel E5649 CPUs, 90GB of RAM and warm SPARQL caches.

7. REFERENCES
[1] H. Bast, F. Bäurle, B. Buchhold, and E. Haussmann. Broccoli: Semantic full-text search at your fingertips. CoRR.
[2] J. Berant, A. Chou, R. Frostig, and P. Liang. Semantic Parsing on Freebase from Question-Answer Pairs. In EMNLP.
[3] J. Berant and P. Liang. Semantic Parsing via Paraphrasing. In ACL.
[4] A. Bordes, S. Chopra, and J. Weston. Question Answering with Subgraph Embeddings. CoRR.
[5] L. Breiman. Random forests. Machine Learning, 45(1):5-32.
[6] C. J. C. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In NIPS.
[7] Q. Cai and A. Yates. Large-scale Semantic Parsing via Schema Matching and Lexicon Extension. In ACL.
[8] A. X. Chang and C. D. Manning. SUTime: A library for recognizing and normalizing time expressions. In LREC.
[9] ClueWeb, The Lemur Project.
[10] W. W. Cohen, R. E. Schapire, and Y. Singer. Learning to order things. JAIR, 10.
[11] C. Fellbaum. WordNet. Wiley Online Library.
[12] Y. Freund, R. D. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. JMLR, 4.
[13] E. Gabrilovich, M. Ringgaard, and A. Subramanya. FACC1: Freebase annotation of ClueWeb corpora, Version 1.
[14] T. Joachims. Optimizing search engines using clickthrough data. In KDD.
[15] T. Kwiatkowski, E. Choi, Y. Artzi, and L. S. Zettlemoyer. Scaling Semantic Parsers with On-the-Fly Ontology Matching. In EMNLP.
[16] T. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3).
[17] C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In ACL, pages 55-60.
[18] M.
Mintz, S. Bills, R. Snow, and D. Jurafsky. Distant supervision for relation extraction without labeled data. In ACL.
[19] S. Reddy, M. Lapata, and M. Steedman. Large-scale Semantic Parsing without Question-Answer Pairs. TACL, 2.
[20] V. I. Spitkovsky and A. X. Chang. A Cross-Lingual Dictionary for English Wikipedia Concepts. In LREC.
[21] M. Steedman. The syntactic process, volume 35. MIT Press.
[22] C. Unger, C. Forascu, V. Lopez, A. N. Ngomo, E. Cabrio, P. Cimiano, and S. Walter. Question answering over linked data (QALD-4). In CLEF 2014.
[23] J. Xu and H. Li. AdaRank: a boosting algorithm for information retrieval. In SIGIR.
[24] X. Yao, J. Berant, and B. V. Durme. Freebase QA: Information Extraction or Semantic Parsing? In ACL, Workshop on Semantic Parsing.
[25] X. Yao and B. V. Durme. Information Extraction over Structured Data: Question Answering with Freebase. In ACL.
[26] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw., 23(4), 1997.


RANKING AND UNRANKING LEFT SZILARD LANGUAGES. Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A ER E P S I M S N S ER E P S I M TA S UN A I S I T VER RANKING AND UNRANKING LEFT SZILARD LANGUAGES Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A-1997-2 UNIVERSITY OF TAMPERE DEPARTMENT OF

More information

Active Learning. Yingyu Liang Computer Sciences 760 Fall

Active Learning. Yingyu Liang Computer Sciences 760 Fall Active Learning Yingyu Liang Computer Sciences 760 Fall 2017 http://pages.cs.wisc.edu/~yliang/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed by Mark Craven,

More information

Prediction of Maximal Projection for Semantic Role Labeling

Prediction of Maximal Projection for Semantic Role Labeling Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba

More information

CONCEPT MAPS AS A DEVICE FOR LEARNING DATABASE CONCEPTS

CONCEPT MAPS AS A DEVICE FOR LEARNING DATABASE CONCEPTS CONCEPT MAPS AS A DEVICE FOR LEARNING DATABASE CONCEPTS Pirjo Moen Department of Computer Science P.O. Box 68 FI-00014 University of Helsinki pirjo.moen@cs.helsinki.fi http://www.cs.helsinki.fi/pirjo.moen

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information

Extracting Verb Expressions Implying Negative Opinions

Extracting Verb Expressions Implying Negative Opinions Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Extracting Verb Expressions Implying Negative Opinions Huayi Li, Arjun Mukherjee, Jianfeng Si, Bing Liu Department of Computer

More information

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.

More information

New Features & Functionality in Q Release Version 3.1 January 2016

New Features & Functionality in Q Release Version 3.1 January 2016 in Q Release Version 3.1 January 2016 Contents Release Highlights 2 New Features & Functionality 3 Multiple Applications 3 Analysis 3 Student Pulse 3 Attendance 4 Class Attendance 4 Student Attendance

More information

Outline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt

Outline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt Outline Using Web Data for Linguistic Purposes NCLT, Dublin City University Outline Outline 1 Corpora as linguistic tools 2 Limitations of web data Strategies to enhance web data 3 Corpora as linguistic

More information

Term Weighting based on Document Revision History

Term Weighting based on Document Revision History Term Weighting based on Document Revision History Sérgio Nunes, Cristina Ribeiro, and Gabriel David INESC Porto, DEI, Faculdade de Engenharia, Universidade do Porto. Rua Dr. Roberto Frias, s/n. 4200-465

More information

The Role of the Head in the Interpretation of English Deverbal Compounds

The Role of the Head in the Interpretation of English Deverbal Compounds The Role of the Head in the Interpretation of English Deverbal Compounds Gianina Iordăchioaia i, Lonneke van der Plas ii, Glorianna Jagfeld i (Universität Stuttgart i, University of Malta ii ) Wen wurmt

More information

Short Text Understanding Through Lexical-Semantic Analysis

Short Text Understanding Through Lexical-Semantic Analysis Short Text Understanding Through Lexical-Semantic Analysis Wen Hua #1, Zhongyuan Wang 2, Haixun Wang 3, Kai Zheng #4, Xiaofang Zhou #5 School of Information, Renmin University of China, Beijing, China

More information

Online Updating of Word Representations for Part-of-Speech Tagging

Online Updating of Word Representations for Part-of-Speech Tagging Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org

More information

Natural Language Processing. George Konidaris

Natural Language Processing. George Konidaris Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

More information

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science

More information

Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design

Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design Paper #3 Five Q-to-survey approaches: did they work? Job van Exel

More information

Objectives. Chapter 2: The Representation of Knowledge. Expert Systems: Principles and Programming, Fourth Edition

Objectives. Chapter 2: The Representation of Knowledge. Expert Systems: Principles and Programming, Fourth Edition Chapter 2: The Representation of Knowledge Expert Systems: Principles and Programming, Fourth Edition Objectives Introduce the study of logic Learn the difference between formal logic and informal logic

More information

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February

More information

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz

More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

Aspectual Classes of Verb Phrases

Aspectual Classes of Verb Phrases Aspectual Classes of Verb Phrases Current understanding of verb meanings (from Predicate Logic): verbs combine with their arguments to yield the truth conditions of a sentence. With such an understanding

More information

Language Independent Passage Retrieval for Question Answering

Language Independent Passage Retrieval for Question Answering Language Independent Passage Retrieval for Question Answering José Manuel Gómez-Soriano 1, Manuel Montes-y-Gómez 2, Emilio Sanchis-Arnal 1, Luis Villaseñor-Pineda 2, Paolo Rosso 1 1 Polytechnic University

More information

Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2

Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Ted Pedersen Department of Computer Science University of Minnesota Duluth, MN, 55812 USA tpederse@d.umn.edu

More information

(Sub)Gradient Descent

(Sub)Gradient Descent (Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include

More information

Assignment 1: Predicting Amazon Review Ratings

Assignment 1: Predicting Amazon Review Ratings Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for

More information

Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures

Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Ulrike Baldewein (ulrike@coli.uni-sb.de) Computational Psycholinguistics, Saarland University D-66041 Saarbrücken,

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

More information

On-the-Fly Customization of Automated Essay Scoring

On-the-Fly Customization of Automated Essay Scoring Research Report On-the-Fly Customization of Automated Essay Scoring Yigal Attali Research & Development December 2007 RR-07-42 On-the-Fly Customization of Automated Essay Scoring Yigal Attali ETS, Princeton,

More information

An Interactive Intelligent Language Tutor Over The Internet

An Interactive Intelligent Language Tutor Over The Internet An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

Constructing Parallel Corpus from Movie Subtitles

Constructing Parallel Corpus from Movie Subtitles Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing

More information

Welcome to the Purdue OWL. Where do I begin? General Strategies. Personalizing Proofreading

Welcome to the Purdue OWL. Where do I begin? General Strategies. Personalizing Proofreading Welcome to the Purdue OWL This page is brought to you by the OWL at Purdue (http://owl.english.purdue.edu/). When printing this page, you must include the entire legal notice at bottom. Where do I begin?

More information

*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN

*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN From: AAAI Technical Report WS-98-08. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Recommender Systems: A GroupLens Perspective Joseph A. Konstan *t, John Riedl *t, AI Borchers,

More information

Copyright Corwin 2015

Copyright Corwin 2015 2 Defining Essential Learnings How do I find clarity in a sea of standards? For students truly to be able to take responsibility for their learning, both teacher and students need to be very clear about

More information

IN THIS UNIT YOU LEARN HOW TO: SPEAKING 1 Work in pairs. Discuss the questions. 2 Work with a new partner. Discuss the questions.

IN THIS UNIT YOU LEARN HOW TO: SPEAKING 1 Work in pairs. Discuss the questions. 2 Work with a new partner. Discuss the questions. 6 1 IN THIS UNIT YOU LEARN HOW TO: ask and answer common questions about jobs talk about what you re doing at work at the moment talk about arrangements and appointments recognise and use collocations

More information

Writing a composition

Writing a composition A good composition has three elements: Writing a composition an introduction: A topic sentence which contains the main idea of the paragraph. a body : Supporting sentences that develop the main idea. a

More information

University of Waterloo School of Accountancy. AFM 102: Introductory Management Accounting. Fall Term 2004: Section 4

University of Waterloo School of Accountancy. AFM 102: Introductory Management Accounting. Fall Term 2004: Section 4 University of Waterloo School of Accountancy AFM 102: Introductory Management Accounting Fall Term 2004: Section 4 Instructor: Alan Webb Office: HH 289A / BFG 2120 B (after October 1) Phone: 888-4567 ext.

More information

WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT

WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT PRACTICAL APPLICATIONS OF RANDOM SAMPLING IN ediscovery By Matthew Verga, J.D. INTRODUCTION Anyone who spends ample time working

More information

Learning a Cross-Lingual Semantic Representation of Relations Expressed in Text

Learning a Cross-Lingual Semantic Representation of Relations Expressed in Text Learning a Cross-Lingual Semantic Representation of Relations Expressed in Text Achim Rettinger, Artem Schumilin, Steffen Thoma, and Basil Ell Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

The Ups and Downs of Preposition Error Detection in ESL Writing

The Ups and Downs of Preposition Error Detection in ESL Writing The Ups and Downs of Preposition Error Detection in ESL Writing Joel R. Tetreault Educational Testing Service 660 Rosedale Road Princeton, NJ, USA JTetreault@ets.org Martin Chodorow Hunter College of CUNY

More information

Mining Topic-level Opinion Influence in Microblog

Mining Topic-level Opinion Influence in Microblog Mining Topic-level Opinion Influence in Microblog Daifeng Li Dept. of Computer Science and Technology Tsinghua University ldf3824@yahoo.com.cn Jie Tang Dept. of Computer Science and Technology Tsinghua

More information

Life and career planning

Life and career planning Paper 30-1 PAPER 30 Life and career planning Bob Dick (1983) Life and career planning: a workbook exercise. Brisbane: Department of Psychology, University of Queensland. A workbook for class use. Introduction

More information

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1 Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial

More information

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer

More information

Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews

Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews Kang Liu, Liheng Xu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy

More information

Proof Theory for Syntacticians

Proof Theory for Syntacticians Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax

More information

PIRLS. International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries

PIRLS. International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries Ina V.S. Mullis Michael O. Martin Eugenio J. Gonzalez PIRLS International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries International Study Center International

More information

University of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma

University of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of

More information

Handling Sparsity for Verb Noun MWE Token Classification

Handling Sparsity for Verb Noun MWE Token Classification Handling Sparsity for Verb Noun MWE Token Classification Mona T. Diab Center for Computational Learning Systems Columbia University mdiab@ccls.columbia.edu Madhav Krishna Computer Science Department Columbia

More information

The Good Judgment Project: A large scale test of different methods of combining expert predictions

The Good Judgment Project: A large scale test of different methods of combining expert predictions The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania

More information

How to Judge the Quality of an Objective Classroom Test

How to Judge the Quality of an Objective Classroom Test How to Judge the Quality of an Objective Classroom Test Technical Bulletin #6 Evaluation and Examination Service The University of Iowa (319) 335-0356 HOW TO JUDGE THE QUALITY OF AN OBJECTIVE CLASSROOM

More information

In Workflow. Viewing: Last edit: 10/27/15 1:51 pm. Approval Path. Date Submi ed: 10/09/15 2:47 pm. 6. Coordinator Curriculum Management

In Workflow. Viewing: Last edit: 10/27/15 1:51 pm. Approval Path. Date Submi ed: 10/09/15 2:47 pm. 6. Coordinator Curriculum Management 1 of 5 11/19/2015 8:10 AM Date Submi ed: 10/09/15 2:47 pm Viewing: Last edit: 10/27/15 1:51 pm Changes proposed by: GODWINH In Workflow 1. BUSI Editor 2. BUSI Chair 3. BU Associate Dean 4. Biggio Center

More information

A Study of Video Effects on English Listening Comprehension

A Study of Video Effects on English Listening Comprehension Studies in Literature and Language Vol. 8, No. 2, 2014, pp. 53-58 DOI:10.3968/4348 ISSN 1923-1555[Print] ISSN 1923-1563[Online] www.cscanada.net www.cscanada.org Study of Video Effects on English Listening

More information

LEARNING A SEMANTIC PARSER FROM SPOKEN UTTERANCES. Judith Gaspers and Philipp Cimiano

LEARNING A SEMANTIC PARSER FROM SPOKEN UTTERANCES. Judith Gaspers and Philipp Cimiano LEARNING A SEMANTIC PARSER FROM SPOKEN UTTERANCES Judith Gaspers and Philipp Cimiano Semantic Computing Group, CITEC, Bielefeld University {jgaspers cimiano}@cit-ec.uni-bielefeld.de ABSTRACT Semantic parsers

More information

The Strong Minimalist Thesis and Bounded Optimality

The Strong Minimalist Thesis and Bounded Optimality The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this

More information

Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models

Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),

More information

21st CENTURY SKILLS IN 21-MINUTE LESSONS. Using Technology, Information, and Media

21st CENTURY SKILLS IN 21-MINUTE LESSONS. Using Technology, Information, and Media 21st CENTURY SKILLS IN 21-MINUTE LESSONS Using Technology, Information, and Media T Copyright 2011 by Saddleback Educational Publishing. All rights reserved. No part of this book may be reproduced in any

More information

Learning to Rank with Selection Bias in Personal Search

Learning to Rank with Selection Bias in Personal Search Learning to Rank with Selection Bias in Personal Search Xuanhui Wang, Michael Bendersky, Donald Metzler, Marc Najork Google Inc. Mountain View, CA 94043 {xuanhui, bemike, metzler, najork}@google.com ABSTRACT

More information

Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks

Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks Rajarshi Das Manzil Zaheer Siva Reddy and Andrew McCallum College of Information and Computer Sciences, University

More information

Foundations of Knowledge Representation in Cyc

Foundations of Knowledge Representation in Cyc Foundations of Knowledge Representation in Cyc Why use logic? CycL Syntax Collections and Individuals (#$isa and #$genls) Microtheories This is an introduction to the foundations of knowledge representation

More information

Derivational and Inflectional Morphemes in Pak-Pak Language

Derivational and Inflectional Morphemes in Pak-Pak Language Derivational and Inflectional Morphemes in Pak-Pak Language Agustina Situmorang and Tima Mariany Arifin ABSTRACT The objectives of this study are to find out the derivational and inflectional morphemes

More information