Sentence Simplification for Question Generation


Feras Al Tarouti and Jugal Kalita
Department of Computer Science
University of Colorado at Colorado Springs
Colorado Springs, Colorado 80918, USA
{faltarou & jkalita}@uccs.edu

Conor McGrory
Department of Computer Science
Princeton University
Princeton, New Jersey 08544
cmcgrory@princeton.edu

Abstract - Automatic generation of basic, factual questions from a single sentence is a problem that has received a considerable amount of attention. Some studies have suggested splitting this problem into two parts: first, decomposing the source sentence into a set of smaller, simple sentences, and then transforming each of these sentences into a question. This paper outlines a novel method for the first part, combining two techniques recently developed for related NLP problems. Our method uses a trained classifier to determine which phrases of the source sentence are potential answers to questions, and then creates different compressions of the sentence for each one.

Index Terms - Sentence Simplification, Question Generation.

I. INTRODUCTION

The ability of a speaker to form a grammatical question to request a specific piece of information from another party is indispensable in most practical situations involving basic communication. Recently, there has been a significant amount of research towards developing systems that can automatically generate basic questions from input text. This is called the problem of Question Generation (QG). Although some studies in the past have tried to generate questions based on whole blocks of text [1], the majority of recent work on QG has focused on the problem of generating factual questions from a single sentence. Early attempts to solve this problem used complicated sets of grammatical rules to transform the input sentence directly into a question [2]. However, Heilman and Smith [3] suggested separating the problem into two steps: first, simplifying the source sentence, and then transforming it into a question. The advantage of this approach is that grammatical rules are much better at transforming simple sentences than they are at transforming complex ones. Our paper outlines a method for performing the first step, which we refer to as the problem of Simplified Statement Extraction (SSE).

II. PRIOR WORK

Two problems in NLP that are related to QG are cloze question generation and sentence compression. In a cloze question, the student is asked, after reading a text, to complete a given sentence by filling in a blank with the correct word. One example could be the question "A ___ is a conceptual device used in computer science as a universal model of computing processes." In this case, the answer would be "Turing machine". However, selecting which phrase(s) in the sentence to delete is somewhat difficult. A question like "A Turing Machine ___ a conceptual device used in computer science as a universal model of computing processes." with the verb "is" as the answer would be completely useless to a student interested in testing knowledge of computer science. An automatic cloze question generator needs to distinguish informative questions from extraneous ones. Because the quality of a cloze question can depend on relationships among a large number of factors, Becker et al. [4] train a logistic regression classifier on a corpus of questions paired with human judgments of their quality in order to generate high-quality questions. A toy illustration of the cloze form is given below.
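As a minimal illustration of the cloze form (a toy helper invented here for illustration, not a component of any system discussed), the following blanks out a chosen answer phrase to produce a cloze question-answer pair:

```python
def make_cloze(sentence: str, answer: str) -> tuple:
    """Blank out `answer` in `sentence`, yielding a (cloze, key) pair.

    Real systems must also decide *which* phrase is informative enough
    to blank out, which is the hard part discussed above.
    """
    assert answer in sentence, "answer phrase must occur in the sentence"
    return sentence.replace(answer, "_____", 1), answer

cloze, key = make_cloze(
    "A Turing machine is a conceptual device used in computer science "
    "as a universal model of computing processes.",
    "Turing machine",
)
print(cloze)  # A _____ is a conceptual device used in computer science ...
print(key)    # Turing machine
```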
Sentence compression is the problem of transforming an input sentence into a shorter version that is grammatical and retains the most important semantic elements of the original. Knight and Marcu [5] used a statistical language model where the input sentence is treated as a noisy channel and the compression is the signal, while Clarke and Lapata [6] used a large set of constituency parse tree manipulation rules to generate compressions.

Heilman and Smith [7] developed a rule-based algorithm, called the Simplified Factual Statement Extractor (SFSE), that extracts multiple simple sentences from a source sentence. While traditional sentence compression algorithms usually compress a long sentence into a single short sentence, SFSE extracts one or more simple sentences from a long sentence. By doing so, the algorithm ensures that important information, which can be used to generate questions, is preserved. Each simple sentence produced by the algorithm can be easily converted into a question. The SFSE algorithm uses textual entailment recognition to split a complex sentence into a set of simple sentences that are true given the original sentence. The algorithm exploits two linguistic phenomena: semantic entailment and presupposition. By extracting multiple simplified statements from the source sentence, Heilman and Smith increased the number of possible questions that could be generated.

Kalady et al. [8] presented a rule-based algorithm for generating definitional and factoid questions from a multi-sentence source. To generate definitional questions, keywords from the source document are selected using a summarization system [9]. These keywords are called Up-Keys. The Up-Keys are then mapped to simple question

templates. For instance, if the word "Ebola" is selected as a keyword, it is mapped to the template "<Question-word> is <Up-Key>?" to generate the question "What is Ebola?". To generate factoid questions, the source sentence is preprocessed to produce simple clauses by splitting the independent clauses within the sentence and replacing pronouns. Then, using a tree regular expression language, the algorithm tries to identify named entities, subject-auxiliaries, appositives, subject-verb-object structures, prepositional phrases, and adverbials. Finally, for each of these patterns, a procedure is applied to generate a question. The authors evaluated the system by comparing the questions generated by the system with manually generated questions. The system scored an average precision of 0.46 and an average recall of 0.68. The authors reported that the overall quality of the generated questions decreases as the length of the source sentence increases.

Filippova and Strube [10] developed a method where a compressed sentence is generated by pruning the dependency parse tree of the input sentence. Using the Tipster corpus, they calculated the conditional probabilities of specific dependencies occurring after a given head word. These were used, in combination with data on the frequencies of the words themselves, to calculate a score for each dependency in the tree. They then formulated the problem of compressing the sentence as an integer linear program. Each variable corresponded to a dependency in the tree: a value of 1 meant the dependent word of that dependency would be preserved in the compression, and a value of 0 meant that it would be deleted. Constraints were added to restrict the structure and length of the compression, and the objective function set to be maximized was the sum of the scores of the preserved dependencies. The central assumption made by Filippova and Strube's method is that the frequency with which a particular dependency occurs after a given word is a good indicator of its grammatical necessity.

III. SIMPLIFIED SENTENCE EXTRACTION

A. Problem Statement

We divide the process of QG into three major steps: answer selection, sentence simplification, and question generation. Figure 1 shows the QG process applied to the sentence "John performed Yoga, which is a Hindu spiritual discipline, to reduce his stress." In this work we focus on the answer selection and sentence simplification steps, which together we refer to as simplified statement extraction (SSE). We define the problem of SSE as follows: for a source sentence S, create a set of simplified statements {s1...sn} that are semantic entailments of S. A sentence is considered a simplified statement if it is a declarative sentence (a statement) that can be directly transformed into a question-answer pair without any compression.

Fig. 1 The process of question generation applied to the sentence "John performed Yoga, which is a Hindu spiritual discipline, to reduce his stress."

B. Solution Steps

As Becker et al. [4] showed, there are certain phrases in S that make sense as answers to questions and others that do not. The idea behind our SSE system is that knowledge of which phrases in S are good answers can inform the compression process, preventing us from missing important information and thereby maximizing coverage. We solve the SSE problem in two parts: first identifying potential answers, and then generating for each of these answers a compression of S in which that answer is preserved. These compressions form the set {si} of simplified statements. A sketch of this two-part decomposition follows.
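The sketch below illustrates the two-part pipeline; select_answers and compress_preserving are hypothetical stand-ins for the answer-selection classifier of Section IV and the answer-preserving compressor of Section V, not actual components of the described system:

```python
from typing import Callable, List, Tuple

def extract_simplified_statements(
    sentence: str,
    select_answers: Callable[[str], List[str]],
    compress_preserving: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Two-part SSE: pick answer phrases, then compress once per answer.

    Returns (simplified statement, answer) pairs, each ready to be handed
    to a downstream question generator.
    """
    pairs = []
    for answer in select_answers(sentence):                # part 1: answer selection
        statement = compress_preserving(sentence, answer)  # part 2: compression
        pairs.append((statement, answer))
    return pairs
```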
Our goal when compressing for a particular answer is to find the shortest grammatical compression of S that contains the given answer. To select potential answers from the input sentence, we use a slightly modified version of Becker et al.'s cloze question generation system. Once we have the set of possible answers, we use a more substantially modified version of Filippova and Strube's [10] dependency tree pruning method to generate the set of shortest grammatical compressions of S that contain each of the answers.

IV. ANSWER SELECTION

We implemented the answer selection system using the Stanford NLP Toolkit [11] and the Weka machine learning software [12]. It uses the corpus of sentences, QA pairs, and human judgments from Becker et al. [4] to train a classifier to find the nodes in the parse tree of the input sentence that are most likely viable answers to questions.

A. Feature Set

The dependency relations identified by the Stanford NLP Toolkit are a set of grammatical relations between governor and dependent words in a sentence [11]. Some examples include verb-subject, verb-indirect object, noun-modifier, and noun-determiner. For our purposes, we used the 56 basic relations defined in the Stanford library to categorize all of our dependencies. Our features can be divided into three basic categories: token count features, syntactic features, and semantic features. The token count category contained 5 features concerning the length of the answer in comparison to the length of the sentence, such as the raw lengths of both and the length of the answer as a percentage of the length of the question. Examples of the syntactic features we use are the Penn POS tag [13] of the word that comes immediately before the answer, the tag of the word that comes immediately after, and the set of tags of words contained in the answer phrase. The semantic features use the Stanford dependencies system and are completely different from the semantic features used by [4]. These include the dependency relation between the head of the answer phrase and its governor in the sentence, the set of relations between governors in the answer and dependents not in the answer, the set of relations with both governors and dependents in the answer, and the distance in the constituency tree between the answer node and its maximal projection.

B. Classifier

The classifier used in our system is the Weka Logistic classifier [14]. This is a binary logistic regression classifier, similar to the one used by Becker et al. [4].

C. Human Judgments

The corpus provided by Becker et al. [4] consists of slightly over 2,000 sentences, each with a selected answer phrase and four human judgments of the quality of the answer. Our program used the four judgments to calculate a score for each answer, which we then used to determine how to classify it in the data set. This score is then compared to a threshold value (a pre-set constant in the program). If the score is greater than or equal to this value, the answer is classified in the data set as "Good". Otherwise, it is classified as "Bad".

D. Results

We used the program to produce a data set from the Becker et al. corpus. This data set was created using a threshold value of 1.0 (all four human judges have to rate the sentence as "Good"). A random sample of the sentences was drawn from this data to produce a subset with comparable numbers of "Good" and "Bad" sentences. This set contained a total of 582 instances, 278 of which were "Good" and 304 of which were "Bad". We tested both the Weka Logistic classifier and the Weka SimpleLogistic classifier on the data using 10-fold cross-validation. For the Logistic classifier, the correct classification rate was 72.3%, the true positive rate was 78.4%, and the false positive rate was 33.2%. A sketch of this answer-selection setup appears below.
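As a rough sketch of this training setup (scikit-learn's LogisticRegression stands in for the Weka Logistic classifier, and the features and labels are randomly generated stand-ins, not the paper's data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in feature matrix: one row per candidate answer, with toy
# token-count-style features (answer length, sentence length, their ratio).
# The real system uses the token count, syntactic, and semantic features
# described above, extracted with the Stanford NLP Toolkit.
n = 582  # same size as the balanced data set described in IV.D
answer_len = rng.integers(1, 8, n)
sentence_len = rng.integers(10, 40, n)
X = np.column_stack([answer_len, sentence_len, answer_len / sentence_len])

# Stand-in labels: 1 = "Good" (judgment score >= threshold), 0 = "Bad".
y = rng.integers(0, 2, n)

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
print(f"mean classification accuracy: {scores.mean():.3f}")
```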
V. SENTENCE COMPRESSION

Filippova and Strube [10] developed an unsupervised sentence compression approach that compresses sentences by pruning unnecessary subtrees from the dependency tree of the sentence. Three processes are applied to the dependency tree to compress a sentence: transformation, compression, and linearization.

The tree transformation process is carried out in four steps: ROOT, VERB, PREP, and CONJ. In the ROOT step, a root node is inserted into the tree. Then, in the VERB step, the root node is connected to all the inflected verbs in the tree with edges labeled "s". After that, all auxiliary verb edges are deleted from the tree, and the grammatical properties of the verbs are stored to be recovered later. In the next step (PREP), all prepositional nodes are replaced with labels on the edges that connect the head to the respective noun. Finally, in the CONJ step, every chain of conjoined non-verb words is split, and each conjunct in the chain is connected directly to the head of the first conjunct, using edges whose labels match the label of the edge connecting the first conjunct to that head. Figure 2 shows the transformation of the dependency tree of the sentence "She mentioned that she worked in Apple and Microsoft." A toy rendition of these steps is sketched below.

Fig. 2 Transformation of the dependency tree of "She mentioned that she worked in Apple and Microsoft."
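The following is a toy rendition of the transformation steps under simplifying assumptions (the tree is a list of (head, label, dependent) edges over unique word strings, and the morphological bookkeeping for verbs is skipped); the edge list and helper signature are invented for illustration, not the authors' or Filippova and Strube's implementation:

```python
def transform(edges, inflected_verbs, prepositions):
    out = [e for e in edges if e[1] != "aux"]           # drop auxiliary edges
    out += [("ROOT", "s", v) for v in inflected_verbs]  # ROOT and VERB steps

    # PREP: fold each preposition node into an edge label: head --prep--> noun.
    for head, label, prep in [e for e in out if e[2] in prepositions]:
        out.remove((head, label, prep))
        for h, l, noun in [e for e in out if e[0] == prep]:
            out.remove((h, l, noun))
            out.append((head, prep, noun))

    # CONJ: attach each later conjunct directly to the first conjunct's head,
    # reusing the label of the edge that connects the first conjunct.
    for first, _, conj in [e for e in out if e[1] == "conj"]:
        out.remove((first, "conj", conj))
        for h, l, d in list(out):
            if d == first:
                out.append((h, l, conj))
    return out

# Dependency edges for "She mentioned that she worked in Apple and Microsoft."
edges = [
    ("mentioned", "nsubj", "She"), ("mentioned", "ccomp", "worked"),
    ("worked", "nsubj", "she"), ("worked", "prep", "in"),
    ("in", "pobj", "Apple"), ("Apple", "conj", "Microsoft"),
]
print(transform(edges, inflected_verbs=["mentioned", "worked"],
                prepositions=["in"]))
# yields, among others: ("worked", "in", "Apple"), ("worked", "in", "Microsoft")
```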

The tree compression process is performed by removing edges from the dependency graph produced by the transformation process. To select which edge should be removed from the graph, a score is computed for the subtree connected by each edge. We first calculate the probabilities of dependencies occurring after head words and use these as an estimate of the grammatical necessity of different dependencies given the presence of a head word. Along with all of the constraints placed on the ILP in the original model by Filippova and Strube [10], we add an extra constraint that ensures the preservation of the answer phrase in the compression. We then use a linear program solver to solve the ILP for all length values between 0 and the length of S, generating a set of compressions of S with all possible lengths. From these compressions, we use a 3-gram model to calculate the Mean First Quartile (MFQ) grammaticality metric described by Clark et al. [15]. Compressions with an MFQ value lower than a threshold are deemed grammatical, and the shortest of these is selected as the final compression of S for the given answer. Finally, in the tree linearization process, the selected words are presented in the order in which they appear in the original sentence.

A. Dependency Probabilities

In addition to the feature set used in the selection part of the system, we included additional ones, such as collapsed dependencies [11], which are created when closed-class words like "and", "of", or "by" are made part of the grammatical relations. To calculate the frequencies of dependencies after certain head words, we use a pre-parsed section of the Open American National Corpus [16]. To prevent rounding errors, we used a smoothing function when calculating the probabilities from the frequency data. Finally, to avoid the problems that come with probability values of zero, our system linearly maps the smoothed probability values P(l|h) from [0, 1] to [10^-4, 1].

B. Integer Linear Program

We formulate the compression problem as an ILP. For each dependency of Stanford type l holding between head word h and dependent word w, we create a variable x_{h,w}^l. These variables must each take on a value of 0 or 1 in the solution: dependencies whose variables are equal to 1 are preserved in the resulting compression, and dependencies whose variables are equal to 0 are deleted, along with their dependent words. The ILP maximizes the objective function

f(x) = \sum_{l,h,w} x_{h,w}^{l} \cdot t(l) \cdot P(l|h) \cdot P(w|h)

where t is the tweak function, which corrects discrepancies between frequency and grammatical necessity that occur with some specific types of dependencies. Filippova and Strube used two constraints in their model to preserve tree structure and connectedness in the compression. To ensure that all of the words in the pre-selected answer A are also preserved, we include in our model the extra constraint

\forall w \in A: \sum_{l,h} x_{h,w}^{l} \geq 1.

We solved these integer linear programs using lp_solve¹, an open-source LP and ILP solver. A minimal sketch of this formulation is given below.
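Below is a minimal sketch of this ILP using the open-source PuLP library in place of lp_solve; the edge scores, length limit, and answer set are invented toy values, and Filippova and Strube's tree-structure and connectedness constraints are omitted for brevity:

```python
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

# Toy transformed-tree edges for "She worked in Apple and Microsoft":
# (head, label, dependent) mapped to an invented score t(l)*P(l|h)*P(w|h).
edges = {
    ("worked", "nsubj", "she"): 0.9,
    ("worked", "in", "Apple"): 0.6,
    ("worked", "in", "Microsoft"): 0.6,
}
answer = {"Apple", "Microsoft"}  # pre-selected answer words to preserve
alpha = 4                        # maximum compression length, in words

prob = LpProblem("compression", LpMaximize)
x = {e: LpVariable(f"x{i}", cat=LpBinary) for i, e in enumerate(edges)}

# Objective: total score of the preserved dependencies.
prob += lpSum(score * x[e] for e, score in edges.items())

# Length constraint: each kept edge keeps its dependent word (+1 root verb).
prob += lpSum(x.values()) + 1 <= alpha

# Extra constraint: every answer word keeps at least one incoming edge.
for w in answer:
    prob += lpSum(x[e] for e in edges if e[2] == w) >= 1

prob.solve()
print([e for e in edges if x[e].value() == 1])  # preserved dependencies
```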
C. Shortest Grammatical Compression

In order to find the shortest grammatical compression of S, our system first finds a solution to the ILP for S and A for every value of α (the maximum-length constraint parameter) between the length of S and the length of A. Because the constraints also specify that every word in A is preserved in the compression, any model where α is less than the length of A would have no solution. To determine the grammaticality of the compressions, we use the MFQ metric [15], which is built using the Berkeley Language Model Toolkit [17] and trained on the OANC text. It considers the log-probabilities of all of the n-grams in the given sentence, selects the first quartile (the 25% with the lowest values), and calculates the mean of the ratios of each n-gram's log-probability over the unigram log-probability of that n-gram's head word. The larger the MFQ value is, the less likely the sentence is to be grammatical. Our system looks through the list of different-length compressions and selects the shortest compression with an MFQ value less than a specified threshold (we used a threshold of 1.14). This compression is returned as the simplified statement extracted from S for the answer A. Table I shows the MFQ values of some simplified sentences along with their evaluations; a sketch of the MFQ computation is given at the end of this section.

Table I Samples of simplified sentences along with their MFQ values and evaluations

D. Results

The functionality of the compression system can be demonstrated with sample outputs from the compressor. For example, given the sentence "She mentioned that she worked in Apple and Microsoft", the simplified sentence extractor can determine that "she", "Apple", and "Microsoft" are potential answers for which a question generator can ask questions. For the answers "Apple" and "Microsoft", the system generates the compression "She worked in Apple and Microsoft", which is a compression of the original sentence with the pre-identified answer preserved in it. This statement can now be passed to a question generator as a simple sentence that can potentially generate the question "Where did she work in?" or something similar.
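A small sketch of the MFQ computation as described above; the toy log-probability function stands in for the trained Berkeley LM, and treating the final word of an n-gram as its head word is an assumption made for illustration:

```python
def mfq(tokens, logprob, n=3):
    """Mean First Quartile metric, after Clark et al. [15] (sketch).

    Scores every n-gram with `logprob`, keeps the 25% with the lowest
    values, and averages each kept n-gram's log-probability divided by
    the unigram log-probability of its head word. Larger values suggest
    a less grammatical sentence.
    """
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    lowest = sorted(grams, key=logprob)[: max(1, len(grams) // 4)]
    return sum(logprob(g) / logprob((g[-1],)) for g in lowest) / len(lowest)

# Toy LM: fixed per-length log-probabilities, purely for demonstration.
def toy_logprob(ngram):
    return -2.0 * len(ngram)

print(mfq("she worked in Apple and Microsoft".split(), toy_logprob))  # 3.0
```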

VI. EVALUATION AND DISCUSSION

To evaluate our algorithm for SSE, we compare it with the SFSE algorithm presented by Heilman and Smith [7]. The source sentences we use are complex sentences from the Simple-Complex Sentence Pairs produced by [18]. The Simple-Complex Sentence Pairs were collected from the English Wikipedia² and Simple English Wikipedia³. Simple Wikipedia targets children and non-native English speakers. Authors of Simple Wikipedia use short sentences composed of easy words to write articles. The collected dataset includes 65,133 articles paired from Simple Wikipedia and Wikipedia using dump files downloaded from Wikimedia⁴. We randomly selected a sample of 85 complex sentences from the corpus. Our algorithm produced 215 compressed sentences, while the SFSE algorithm produced 119 compressed sentences. To measure the performance of the algorithms, we compute the percentage of correct compressed sentences produced by both methods. We asked independent human evaluators to evaluate the compressed sentences through a web application. The evaluators were asked whether the algorithm produced a new, shorter sentence and whether the new sentence is correct. As Figure 3 shows, our SSE algorithm produced new compressed sentences in 84.4% of the cases, while the SFSE algorithm produced new compressed sentences in 73.38% of the cases. Moreover, our SSE algorithm generated 43.3% correct sentences and 41.1% incorrect sentences, whereas the SFSE algorithm generated 46.77% correct sentences and 26.77% incorrect sentences. We notice here that our method produced more compressed sentences, but with lower grammatical accuracy compared with the rule-based approach presented by [7]. We believe this is to be expected, since we are using a statistical method for shortening the source sentences. When using rule-based methods, one has the advantage of controlling the output. However, one major disadvantage of rule-based methods is that they are limited to the implemented set of rules. Our results clearly show that the rule-based method produced fewer sentences compared with the statistical method we use. Another disadvantage of a rule-based method is that it is also limited to a single language, whereas statistical methods can be adapted for use with additional languages.

Figure 3 The ratings of sentences produced by our SSE algorithm and the SFSE algorithm presented by [7]

VII. CONCLUSION

The key principle on which our system is built is that selecting the answer at the beginning of the QG process and using it to guide SSE can improve the coverage of the system. We implemented a machine learning-based approach for answer selection and developed a way to compress a sentence while leaving a specified answer phrase intact. Although we have not yet been able to perform large-scale tests on this system where the output is rated by human judges, we have generated some good output sentences. In the near future, this system will be integrated with a direct declarative-to-interrogative transformation system to produce a full, functional QG system.

¹ http://sourceforge.net/projects/lpsolve
² http://en.wikipedia.org
³ http://simple.wikipedia.org
⁴ http://download.wikimedia.org

REFERENCES

[1] H. Kunichika, T. Katayama, T. Hirashima, and A. Takeuchi, "Automated question generation methods for intelligent English learning systems and its evaluation," in Proceedings of ICCE 2004, 2003, pp. 2-5.
[2] J. H. Wolfe, "Automatic question generation from text - an aid to independent study," in ACM SIGCUE Outlook, vol. 10. ACM, 1976, pp. 104-112. [Online]. Available: http://dl.acm.org/citation.cfm?id=803459
[3] M. Heilman and N. A. Smith, "Good question! Statistical ranking for question generation," in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2010, pp. 609-617.
[4] L. Becker, S. Basu, and L. Vanderwende, "Mind the gap: learning to choose gaps for question generation," in Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2012, pp. 742-751.
[5] K. Knight and D. Marcu, "Statistics-based summarization - step one: Sentence compression," in AAAI/IAAI, 2000, pp. 703-710.
[6] J. Clarke and M. Lapata, "Modelling compression with discourse constraints," in EMNLP-CoNLL, 2007, pp. 1-11.
[7] M. Heilman and N. A. Smith, "Extracting simplified statements for factual question generation," in Proceedings of QG2010: The Third Workshop on Question Generation, 2010.
[8] S. Kalady, A. Elikkottil, and R. Das, "Natural language question generation using syntax and keywords," in Proceedings of QG2010: The Third Workshop on Question Generation, 2010, pp. 1-10.
[9] R. Das and A. Elikkottil, "Automatic summarizer to aid a Q/A system," International Journal of Computer Applications, vol. 1, no. 1, pp. 108-112, 2010.
[10] K. Filippova and M. Strube, "Dependency tree based sentence compression," in Proceedings of the Fifth International Natural Language Generation Conference. Association for Computational Linguistics, 2008, pp. 25-32.

[11] M.-C. De Marneffe and C. D. Manning, "The Stanford typed dependencies representation," in Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation. Association for Computational Linguistics, 2008, pp. 1-8.
[12] G. Holmes, A. Donkin, and I. H. Witten, "WEKA: a machine learning workbench," in Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, Nov. 1994, pp. 357-361.
[13] M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, "Building a large annotated corpus of English: The Penn Treebank," Computational Linguistics, vol. 19, no. 2, pp. 313-330, Jun. 1993.
[14] S. Le Cessie and J. C. Van Houwelingen, "Ridge estimators in logistic regression," Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 41, no. 1, pp. 191-201, Jan. 1992.
[15] A. Clark, G. Giorgolo, and S. Lappin, "Statistical representation of grammaticality judgements: the limits of n-gram models," CMCL 2013, p. 28, 2013.
[16] N. Ide and C. Macleod, "The American National Corpus: A standardized resource of American English," in Proceedings of Corpus Linguistics 2001, vol. 3, 2001.
[17] A. Pauls and D. Klein, "Faster and smaller n-gram language models," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, ser. HLT '11. Stroudsburg, PA, USA: Association for Computational Linguistics, 2011, pp. 258-267.
[18] Z. Zhu, D. Bernhard, and I. Gurevych, "A monolingual tree-based translation model for sentence simplification," in Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 2010, pp. 1353-1361.