Sentence Simplification for Question Generation


Sentence Simplification for Question Generation

Feras Al Tarouti and Jugal Kalita
Department of Computer Science
University of Colorado at Colorado Springs
Colorado Springs, Colorado 80918, USA
{faltarou & jkalita}@uccs.edu

Conor McGrory
Department of Computer Science
Princeton University
Princeton, New Jersey
cmcgrory@princeton.edu

Abstract - Automatic generation of basic, factual questions from a single sentence is a problem that has received a considerable amount of attention. Some studies have suggested splitting this problem into two parts: first, decomposing the source sentence into a set of smaller, simple sentences, and then transforming each of these sentences into a question. This paper outlines a novel method for the first part, combining two techniques recently developed for related NLP problems. Our method uses a trained classifier to determine which phrases of the source sentence are potential answers to questions, and then creates different compressions of the sentence for each one.

Index Terms: Sentence Simplification, Question Generation.

I. INTRODUCTION

The ability of a speaker to form a grammatical question to request a specific piece of information from another party is indispensable in most practical situations involving basic communication. Recently, there has been a significant amount of research towards developing systems that can automatically generate basic questions from input text. This is called the problem of Question Generation (QG). Although some studies in the past have tried to generate questions based on whole blocks of text [1], the majority of recent work on QG has focused on the problem of generating factual questions from a single sentence. Early attempts to solve this problem used complicated sets of grammatical rules to transform the input sentence directly into a question [2]. However, Heilman and Smith [3] suggested separating the problem into two steps: first simplifying the source sentence, and then transforming it into a question.
The advantage of this approach is that grammatical rules are much better at transforming simple sentences than complex ones. Our paper outlines a method for performing the first step, which we refer to as the problem of Simplified Statement Extraction (SSE).

II. PRIOR WORK

Two problems in NLP that are related to QG are cloze question generation and sentence compression. In a cloze question, the student is asked, after reading the text, to complete a given sentence by filling in a blank with the correct word. One example could be the question "A ___ is a conceptual device used in computer science as a universal model of computing processes." In this case, the answer would be "Turing machine". However, selecting which phrase(s) in the sentence to delete is somewhat difficult. A question like "A Turing Machine ___ a conceptual device used in computer science as a universal model of computing processes." with the verb "is" as the answer would be completely useless to a student interested in testing knowledge of computer science. An automatic cloze question generator needs to distinguish informative questions from extraneous ones. Because the quality of a cloze question can depend on relationships between a large number of factors, Becker et al. [4] generate high-quality questions by training a logistic regression classifier on a corpus of questions paired with human judgments of their quality.

Sentence compression is the problem of transforming an input sentence into a shorter version that is grammatical and retains the most important semantic elements of the original. Knight and Marcu [5] used a statistical noisy-channel model, in which the compression is treated as the source signal and the input sentence as its corrupted transmission, while Clarke and Lapata [6] used a large set of constituency parse tree manipulation rules to generate compressions. Heilman and Smith [7] developed a rule-based algorithm, called Simplified Factual Statement Extractor (SFSE), that extracts multiple simple sentences from a source sentence.
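The cloze construction described above can be sketched in a few lines. This is only an illustrative fragment, not part of Becker et al.'s system; the function and its blank marker are invented for the example.

```python
def make_cloze(sentence: str, answer: str, blank: str = "___") -> tuple[str, str]:
    """Build a cloze question by blanking out a candidate answer phrase.

    Returns the question (the sentence with the answer replaced by a blank)
    and the answer itself. Raises if the phrase is not in the sentence.
    """
    if answer not in sentence:
        raise ValueError(f"answer phrase {answer!r} not found in sentence")
    return sentence.replace(answer, blank, 1), answer

# The example from the text: blanking "Turing machine" yields an
# informative question, while blanking the verb "is" would not.
sentence = ("A Turing machine is a conceptual device used in computer "
            "science as a universal model of computing processes.")
question, answer = make_cloze(sentence, "Turing machine")
print(question)
# A ___ is a conceptual device used in computer science as a universal
# model of computing processes.
```

The hard part, as the text notes, is not producing the blank but deciding which phrase is worth blanking; that is what the learned classifier is for.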
While traditional sentence compression algorithms usually compress a long sentence into a single short sentence, SFSE extracts one or more simple sentences from a long sentence. By doing so, the algorithm ensures that important information, which can be used to generate questions, is preserved. Each simple sentence produced by the algorithm can be easily converted into a question. The SFSE algorithm uses textual entailment recognition to split a complex sentence into a set of simple sentences that are true given the original sentence. There are two linguistic phenomena that the SFSE algorithm works on: semantic entailment and presupposition. By extracting multiple simplified statements from the source sentence, they increased the number of possible questions that could be generated.

Kalady et al. [8] presented a rule-based algorithm for generating definitional and factoid questions from a multi-sentence source. Here, to generate definitional questions, keywords from the source document are selected using a summarization system [9]. These keywords are called Up-Keys. Then, the Up-Keys are mapped to simple question

templates. For instance, if the word "Ebola" is selected as a keyword, then it would be mapped to the template "<Question-word> is <Up-Key>?" to generate the question "What is Ebola?". To generate factoid questions, the source sentence is preprocessed to produce simple clauses by splitting the independent clauses within the sentence and replacing pronouns. Then, using a tree regular expression language, the algorithm tries to identify named entities, subject-auxiliaries, appositives, subject-verb-object structures, prepositional phrases and adverbials. Finally, for each case of these patterns, a procedure is applied to generate a question. The authors evaluated the system by comparing the questions generated by the system with manually generated questions. The system scored an average precision of 0.46 and an average recall of [...]. The authors reported that the overall quality of the generated questions decreases as the length of the source sentence increases.

Filippova and Strube [10] developed a method where a compressed sentence is generated by pruning the dependency parse tree of the input sentence. Using the Tipster corpus, they calculated the conditional probabilities of specific dependencies occurring after a given head word. These were used, in combination with data on the frequencies of the words themselves, to calculate a score for each dependency in the tree. They then formulated the problem of compressing the sentence as an integer linear program. Each variable corresponded to a dependency in the tree. A value of 1 meant the dependent word of that dependency would be preserved in the compression, and a value of 0 meant that it would be deleted. Constraints were added to restrict the structure and length of the compression, and the objective function set to be maximized was the sum of the scores of the preserved dependencies.
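Filippova and Strube's dependency scores rest on the conditional probability of a dependency label given a head word, estimated from corpus counts. A minimal sketch of that estimation follows; the toy counts are invented, and add-one smoothing stands in for whatever smoothing scheme the original work used.

```python
from collections import Counter

def dependency_probability(pairs, num_labels, label, head):
    """Estimate P(label | head) from (head, label) observations.

    `pairs` is an iterable of (head_word, dependency_label) tuples
    gathered from a parsed corpus. Add-one smoothing keeps unseen
    (head, label) combinations from getting probability zero.
    """
    pair_counts = Counter(pairs)
    head_counts = Counter(head for head, _ in pairs)
    return (pair_counts[(head, label)] + 1) / (head_counts[head] + num_labels)

# Toy counts: after the head "worked", a subject dependency is far
# more frequent than, say, an appositive, so it scores higher.
observations = [("worked", "nsubj")] * 8 + [("worked", "prep_in")] * 4
p_subj = dependency_probability(observations, num_labels=56, label="nsubj", head="worked")
p_appos = dependency_probability(observations, num_labels=56, label="appos", head="worked")
assert p_subj > p_appos
```

Under the central assumption discussed next, a high P(label | head) marks a dependency as grammatically necessary and therefore costly to prune.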
The central assumption made by Filippova and Strube's method is that the frequency with which a particular dependency occurs after a given word is a good indicator of its grammatical necessity.

III. SIMPLIFIED SENTENCE EXTRACTION

A. Problem Statement

We divide the process of QG into three major steps: answer selection, sentence simplification and question generation. Figure 1 shows the QG process applied to the sentence "John performed Yoga, which is a Hindu spiritual discipline, to reduce his stress." In this work we focus on the answer selection and sentence simplification steps, which we refer to together as simplified statement extraction (SSE). We define the problem of SSE as follows: for a source sentence S, create a set of simplified statements {s1, ..., sn} that are semantic entailments of S. A sentence is considered a simplified statement if it is a declarative sentence (a statement) that can be directly transformed into a question-answer pair without any compression.

Fig. 1 The process of question generation applied to the sentence "John performed Yoga, which is a Hindu spiritual discipline, to reduce his stress."

B. Solution Steps

As Becker et al. [4] showed, there are certain phrases in S that make sense as answers to questions and others that do not. The idea behind our SSE system is that knowledge of which phrases in S are good answers can inform the compression process, preventing us from missing important information and thereby maximizing coverage. We solve the SSE problem in two parts: first identifying potential answers, and then generating for each of these answers a compression of S in which that answer is preserved. These compressions form the set {si} of simplified statements. Our goal when compressing for a particular answer is to find the shortest grammatical compression of S that contains the given answer. To select potential answers from the input sentence, we use a slightly modified version of Becker et al.'s cloze question generation system.
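The two-part decomposition above can be expressed as a skeleton pipeline. Everything here is schematic: `select_answers` stands in for the trained classifier and `compress_keeping` for the ILP-based compressor described later, and the stub logic (treating capitalized tokens as candidates, returning the sentence unpruned) exists only to show how the answer set drives the set of simplified statements.

```python
def select_answers(sentence: str) -> list[str]:
    """Stand-in for the classifier: pick candidate answer phrases.
    Here we simply treat capitalized tokens as candidates."""
    return [tok.strip(".,") for tok in sentence.split() if tok[0].isupper()]

def compress_keeping(sentence: str, answer: str) -> str:
    """Stand-in for the ILP compressor: return a (here, trivial)
    compression of the sentence guaranteed to contain the answer."""
    assert answer in sentence
    return sentence  # a real compressor would prune the dependency tree

def extract_simplified_statements(sentence: str) -> list[str]:
    """SSE as two steps: answer selection, then per-answer compression."""
    return [compress_keeping(sentence, a) for a in select_answers(sentence)]
```

The point of the structure is that each selected answer yields its own compression, so no answer-bearing material can be pruned away before a question is asked about it.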
Once we have the set of possible answers, we use a more substantially modified version of Filippova and Strube's [10] dependency tree pruning method to generate the set of shortest grammatical compressions of S that contain each of the answers.

IV. ANSWER SELECTION

We implemented the answer selection system using the Stanford NLP Toolkit [11] and the Weka machine learning software [12]. It uses the corpus of sentences, QA pairs, and human judgments from Becker et al. [4] to train a classifier to find the nodes in the parse tree of the input sentence that are most likely viable answers to questions.

Fig. 2 Transformation for the dependency tree of "She mentioned that she worked in Apple and Microsoft."

A. Feature Set

The dependency relations identified by the Stanford NLP Toolkit are a set of grammatical relations between governor

and dependent words in a sentence [11]. Some examples include verb-subject, verb-indirect object, noun-modifier, and noun-determiner. For our purposes, we used the 56 basic relations defined in the Stanford library to categorize all of our dependencies. Our features can be divided into three basic categories: token count features, syntactic features, and semantic features. The token count features comprise 5 features that concern the length of the answer in comparison to the length of the sentence, such as the raw lengths of both and the length of the answer as a percentage of the length of the question. Examples of syntactic features we use are the Penn POS tag [13] of the word that comes immediately before the answer, the tag of the word that comes immediately after, and the set of tags of words contained in the answer phrase. The semantic features use the Stanford dependencies system and are completely different from the semantic features used by [4]. These include the dependency relation between the head of the answer phrase and its governor in the sentence, the set of relations between governors in the answer and dependents not in the answer, the set of relations with both governors and dependents in the answer, and the distance in the constituency tree between the answer node and its maximal projection.

B. Classifier

The classifier used in our system is the Weka Logistic classifier [14]. This is a binary logistic regression classifier, similar to the one used by Becker et al. [4].

C. Human Judgments

The corpus provided by Becker et al. [4] consists of slightly over 2,000 sentences, each with a selected answer phrase and four human judgments of the quality of the answer. Our program used the four judgments to calculate a score for each answer, which we then used to determine how to classify it in the data set. This score is then compared to a threshold value (a pre-set constant in the program). If the score is greater than or equal to this value, the answer is classified in the data set as Good. Otherwise, it is classified as Bad.

D. Results

We used the program to produce a data set from the Becker et al. corpus. This data set was created using a threshold value of 1.0 (all four human judges have to rate the sentence as Good). A random sample of the sentences was drawn from this data to produce a subset with comparable numbers of Good and Bad sentences. This set contained a total of 582 instances, 278 of which were Good and 304 of which were Bad. We tested both the Weka Logistic classifier and the Weka Simple Logistic classifier on the data using 10-fold cross-validation. For the Logistic classifier, the correct classification rate was 72.3%, the true positive rate was 78.4%, and the false positive rate was 33.2%.

V. SENTENCE COMPRESSION

Filippova and Strube [10] developed an unsupervised sentence compression approach that compresses sentences by pruning unnecessary subtrees from the dependency tree of the sentence. Three processes are applied to the dependency tree to compress a sentence: transformation, compression and linearization. The tree transformation process is carried out in four steps: ROOT, VERB, PREP and CONJ. In the ROOT step, a root node is inserted into the tree. Then, in the VERB step, the root node is connected to all the inflected verbs in the tree with edges labeled "s". After that, all auxiliary verb edges are deleted from the tree and the grammatical properties of the verbs are stored to be recovered later. In the next step (PREP), all prepositional nodes are replaced with labels on the edges which connect a head to the respective noun. Finally, in the CONJ step, for every chain of conjoined non-verb words, the chain is split and each conjunct in it is connected directly to the head of the first element of the chain using edges whose labels are similar to the label of the edge connecting the first conjunct to the head. Figure 2 shows the transformation for the dependency tree of the sentence "She mentioned that she worked in Apple and Microsoft."

Table I SAMPLES OF SIMPLIFIED SENTENCES ALONG WITH THEIR MFQ VALUES AND EVALUATIONS

The tree compression process is performed by removing edges from the dependency graph produced by the transformation process. To select which edge should be removed from the graph, a score is computed for the subtree

connected by each edge. We first calculate the probabilities of dependencies occurring after head words and use these as an estimate of the grammatical necessity of different dependencies given the presence of a head word. Along with all of the constraints placed on the ILP in the original model by Filippova and Strube [10], we add an extra constraint that ensures the preservation of the answer phrase in the compression. We then use a linear program solver to solve the ILP for all length values between 0 and the length of S, generating a set of compressions of S with all possible lengths. For each of these compressions, we use a 3-gram model to calculate the Mean First Quartile (MFQ) grammaticality metric described by Clark et al. [15]. Compressions with an MFQ value lower than a threshold are deemed grammatical, and the shortest of these is selected as the final compression of S for the given answer. Finally, in the tree linearization process, the selected words are presented in the order they appear in the original sentence.

A. Dependency Probabilities

In addition to the feature set used in the selection part of the system, we included additional ones such as collapsed dependencies [11], which are created when closed-class words like "and", "of", or "by" are made part of the grammatical relations. To calculate the frequencies of dependencies after certain head words, we use a pre-parsed section of the Open American National Corpus [16]. To prevent rounding errors, we used a smoothing function when calculating the probabilities from the frequency data. Finally, to avoid problems that come with probability values of zero, our system linearly maps the smoothed probability P(τ|h) values from [0, 1] to [10^-4, 1].

B. Integer Linear Program

We formulate the compression problem as an ILP. For each dependency of Stanford type τ holding between head word h and dependent word w, we create a variable x_{h,w}^τ. These variables must each take on a value of 0 or 1 in the solution, where dependencies whose variables are equal to 1 are preserved in the resulting compression and dependencies whose variables are equal to 0 are deleted, along with their dependent words. The ILP maximizes the objective function

f(x) = Σ_{h,w,τ} x_{h,w}^τ · t(τ) · P(τ|h)

where t is the tweak function, which corrects discrepancies between frequency and grammatical necessity that occur with some specific types of dependencies. Filippova and Strube used two constraints in their model to preserve tree structure and connectedness in the compression. To ensure that all of the words in the pre-selected answer A are also preserved, we include in our model the extra constraint

∀w ∈ A: Σ_{h,τ} x_{h,w}^τ ≥ 1.

We solved these integer linear programs using lp_solve, an open-source LP and ILP solver.

C. Shortest Grammatical Compression

In order to find the shortest grammatical compression of S, our system first finds a solution to the ILP for S and A for every value of α (the maximum length constraint parameter) between the length of S and the length of A. Because the constraints also specify that every word in A is preserved in the compression, any model where α is less than the length of A would have no solution. To determine the grammaticality of the compressions, we use the MFQ metric [15], which is computed using the Berkeley Language Model Toolkit [17] and trained on the OANC text. It considers the log-probabilities of all of the n-grams in the given sentence, selects the first quartile (the 25% with the lowest values), and calculates the mean of the ratios of each n-gram log-probability over the unigram log-probability of that n-gram's head word. The larger the MFQ value is, the less likely the sentence is to be grammatical. Our system looks through the list of different-length compressions and selects the shortest compression with an MFQ value less than a specified threshold (we used a threshold of 1.14).
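The selection step in subsection C can be sketched as below. The MFQ computation follows the description in the text (mean of the first-quartile n-gram log-probability ratios); everything else is illustrative: the log-probability pairs would in practice come from the Berkeley LM, and the candidate list is whatever the ILP produced at each length value.

```python
def mfq(ngram_logprobs):
    """Mean First Quartile metric over a sentence's n-gram scores.

    `ngram_logprobs` is a list of (ngram_logprob, head_unigram_logprob)
    pairs. We take the quarter of n-grams with the lowest log-probability
    and average the ratio of each n-gram's log-probability to the unigram
    log-probability of its head word. Larger values suggest the sentence
    is less likely to be grammatical.
    """
    ranked = sorted(ngram_logprobs, key=lambda pair: pair[0])
    quartile = ranked[: max(1, len(ranked) // 4)]
    return sum(lp / head_lp for lp, head_lp in quartile) / len(quartile)

def shortest_grammatical(candidates, score_fn, threshold=1.14):
    """Return the shortest candidate compression whose MFQ score
    (computed by `score_fn`) is below the threshold, or None."""
    grammatical = [c for c in candidates if score_fn(c) < threshold]
    return min(grammatical, key=len, default=None)
```

Focusing on the lowest quartile makes the metric sensitive to a single badly formed fragment: one improbable n-gram dominates the average even when the rest of the compression is fluent.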
This compression is returned as the simplified statement extracted from S for the answer A. Table I shows MFQ values of some simplified sentences along with their evaluation.

D. Results

The functionality of the compression system can be demonstrated with sample outputs from the compressor. For example, given the sentence "She mentioned that she worked in Apple and Microsoft", the simplified sentence extractor can determine that "she", "Apple", and "Microsoft" are potential answers about which a question generator can ask questions. For the answers "Apple" and "Microsoft", the system generates the compression "She worked in Apple and Microsoft", which is a compression of the original sentence with the pre-identified answer preserved in it. This statement can now be passed to a question generator as a simple sentence that can potentially generate the question "Where did she work?" or something similar.

VI. EVALUATION AND DISCUSSION

To evaluate our algorithm for SSE, we compare it with the SFSE algorithm presented by Heilman and Smith [7]. The source sentences we use are complex sentences from the Simple-Complex Sentence Pairs produced by [18]. The Simple-Complex Sentence Pairs were collected from the English Wikipedia and Simple English Wikipedia. Simple Wikipedia targets children and non-native English speakers

Authors of Simple Wikipedia use short sentences composed of easy words to write articles. The collected dataset includes 65,133 article pairs from Simple Wikipedia and Wikipedia, built using dump files downloaded from Wikimedia. We randomly selected a sample of 85 complex sentences from the corpus. Our algorithm produced 215 compressed sentences, while the SFSE algorithm produced 119 compressed sentences. To measure the performance of the algorithms, we compute the percentage of correct compressed sentences produced by both methods. We asked independent human evaluators to evaluate the compressed sentences through a web application. The evaluators were asked whether the algorithm produced a new shorter sentence and whether the new sentence is correct or not. As Figure 3 shows, our SSE algorithm produced new compressed sentences in 84.4% of the cases while the SFSE algorithm produced new compressed sentences in 73.38% of the cases. Moreover, our SSE algorithm generated 43.3% correct sentences and 41.1% incorrect sentences, whereas the SFSE algorithm generated 46.77% correct sentences and 26.77% incorrect sentences. We notice here that our method produced more compressed sentences but with lower grammatical accuracy compared with the rule-based approach presented by [7]. We believe that this is expected, since we are using a statistical method for shortening the source sentences. When using rule-based methods, one has the advantage of controlling the output. However, one major disadvantage of rule-based methods is that they are limited to the implemented set of rules. Our results clearly show that the rule-based method produced fewer sentences compared with the statistical method we use. Another disadvantage of a rule-based method is that it is also limited to a single language, whereas statistical methods can be adapted for use with additional languages.

VII.
CONCLUSION

The key principle on which our system is built is that selecting the answer at the beginning of the QG process and using it to guide SSE can improve the coverage of the system. We implemented a machine learning-based approach for answer selection and developed a way to compress a sentence while leaving a specified answer phrase intact. Although we have not yet been able to perform large-scale tests on this system where the output is rated by human judges, we have generated some good output sentences. In the near future, this system will be integrated with a direct declarative-to-interrogative transformation system to produce a full, functional QG system.

Figure 3 The ratings of sentences produced by our SSE algorithm and the SFSE algorithm presented by [7].

REFERENCES

[1] H. Kunichika, T. Katayama, T. Hirashima, and A. Takeuchi, "Automated question generation methods for intelligent English learning systems and its evaluation," in Proceedings of ICCE2004, 2003.
[2] J. H. Wolfe, "Automatic question generation from text: an aid to independent study," in ACM SIGCUE Outlook, vol. 10. ACM, 1976.
[3] M. Heilman and N. A. Smith, "Good question! Statistical ranking for question generation," in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2010.
[4] L. Becker, S. Basu, and L. Vanderwende, "Mind the gap: learning to choose gaps for question generation," in Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2012.
[5] K. Knight and D. Marcu, "Statistics-based summarization - step one: Sentence compression," in AAAI/IAAI, 2000.
[6] J. Clarke and M. Lapata, "Modelling Compression with Discourse Constraints," in EMNLP-CoNLL, 2007.
[7] M. Heilman and N. A.
Smith, "Extracting simplified statements for factual question generation," in Proceedings of QG2010: The Third Workshop on Question Generation, 2010.
[8] S. Kalady, A. Elikkottil, and R. Das, "Natural language question generation using syntax and keywords," in Proceedings of QG2010: The Third Workshop on Question Generation, 2010.
[9] R. Das and A. Elikkottil, "Automatic Summarizer to aid a Q/A system," International Journal of Computer Applications, vol. 1, no. 1.
[10] K. Filippova and M. Strube, "Dependency tree based sentence compression," in Proceedings of the Fifth International Natural Language

Generation Conference. Association for Computational Linguistics, 2008.
[11] M.-C. De Marneffe and C. D. Manning, "The Stanford typed dependencies representation," in Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation. Association for Computational Linguistics, 2008.
[12] G. Holmes, A. Donkin, and I. H. Witten, "WEKA: a machine learning workbench," in Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, Nov. 1994.
[13] M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, "Building a Large Annotated Corpus of English: The Penn Treebank," Computational Linguistics, vol. 19, no. 2, Jun. 1993.
[14] S. Le Cessie and J. C. Van Houwelingen, "Ridge Estimators in Logistic Regression," Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 41, no. 1, Jan. 1992.
[15] A. Clark, G. Giorgolo, and S. Lappin, "Statistical representation of grammaticality judgements: the limits of n-gram models," CMCL 2013, p. 28, 2013.
[16] N. Ide and C. Macleod, "The American National Corpus: A standardized resource of American English," in Proceedings of Corpus Linguistics 2001, vol. 3, 2001.
[17] A. Pauls and D. Klein, "Faster and Smaller N-gram Language Models," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, ser. HLT '11. Stroudsburg, PA, USA: Association for Computational Linguistics, 2011.
[18] Z. Zhu, D. Bernhard, and I. Gurevych, "A monolingual tree-based translation model for sentence simplification," in Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 2010.

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

Introduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions.

Introduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions. to as a linguistic theory to to a member of the family of linguistic frameworks that are called generative grammars a grammar which is formalized to a high degree and thus makes exact predictions about

More information

A heuristic framework for pivot-based bilingual dictionary induction

A heuristic framework for pivot-based bilingual dictionary induction 2013 International Conference on Culture and Computing A heuristic framework for pivot-based bilingual dictionary induction Mairidan Wushouer, Toru Ishida, Donghui Lin Department of Social Informatics,

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Parsing of part-of-speech tagged Assamese Texts

Parsing of part-of-speech tagged Assamese Texts IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal

More information

Training and evaluation of POS taggers on the French MULTITAG corpus

Training and evaluation of POS taggers on the French MULTITAG corpus Training and evaluation of POS taggers on the French MULTITAG corpus A. Allauzen, H. Bonneau-Maynard LIMSI/CNRS; Univ Paris-Sud, Orsay, F-91405 {allauzen,maynard}@limsi.fr Abstract The explicit introduction

More information

TextGraphs: Graph-based algorithms for Natural Language Processing

TextGraphs: Graph-based algorithms for Natural Language Processing HLT-NAACL 06 TextGraphs: Graph-based algorithms for Natural Language Processing Proceedings of the Workshop Production and Manufacturing by Omnipress Inc. 2600 Anderson Street Madison, WI 53704 c 2006

More information

A Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books

A Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books A Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books Yoav Goldberg Bar Ilan University yoav.goldberg@gmail.com Jon Orwant Google Inc. orwant@google.com Abstract We created

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

Compositional Semantics

Compositional Semantics Compositional Semantics CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu Words, bag of words Sequences Trees Meaning Representing Meaning An important goal of NLP/AI: convert natural language

More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet

More information

Linking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report

Linking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report Linking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report Contact Information All correspondence and mailings should be addressed to: CaMLA

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

LTAG-spinal and the Treebank

LTAG-spinal and the Treebank LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing Libin Shen (lshen@bbn.com) BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA Lucas Champollion (champoll@ling.upenn.edu)

More information

Exploiting Wikipedia as External Knowledge for Named Entity Recognition

Exploiting Wikipedia as External Knowledge for Named Entity Recognition Exploiting Wikipedia as External Knowledge for Named Entity Recognition Jun ichi Kazama and Kentaro Torisawa Japan Advanced Institute of Science and Technology (JAIST) Asahidai 1-1, Nomi, Ishikawa, 923-1292

More information

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore, Alistair Conkie AT&T abs - Research 180 Park Avenue, Florham Park,

More information

Switchboard Language Model Improvement with Conversational Data from Gigaword

Switchboard Language Model Improvement with Conversational Data from Gigaword Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword

More information

A Graph Based Authorship Identification Approach

A Graph Based Authorship Identification Approach A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

The Effect of Multiple Grammatical Errors on Processing Non-Native Writing

The Effect of Multiple Grammatical Errors on Processing Non-Native Writing The Effect of Multiple Grammatical Errors on Processing Non-Native Writing Courtney Napoles Johns Hopkins University courtneyn@jhu.edu Aoife Cahill Nitin Madnani Educational Testing Service {acahill,nmadnani}@ets.org

More information

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de

More information

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.

More information

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com

More information

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important

More information

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.

More information

Search right and thou shalt find... Using Web Queries for Learner Error Detection

Search right and thou shalt find... Using Web Queries for Learner Error Detection Search right and thou shalt find... Using Web Queries for Learner Error Detection Michael Gamon Claudia Leacock Microsoft Research Butler Hill Group One Microsoft Way P.O. Box 935 Redmond, WA 981052, USA

More information

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general

More information

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &

More information

Developing a TT-MCTAG for German with an RCG-based Parser

Developing a TT-MCTAG for German with an RCG-based Parser Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,

More information

Annotation Projection for Discourse Connectives

Annotation Projection for Discourse Connectives SFB 833 / Univ. Tübingen Penn Discourse Treebank Workshop Annotation projection Basic idea: Given a bitext E/F and annotation for F, how would the annotation look for E? Examples: Word Sense Disambiguation

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,

More information

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

More information

Learning From the Past with Experiment Databases

Learning From the Past with Experiment Databases Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University

More information

Natural Language Processing. George Konidaris

Natural Language Processing. George Konidaris Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans

More information

Vocabulary Usage and Intelligibility in Learner Language

Vocabulary Usage and Intelligibility in Learner Language Vocabulary Usage and Intelligibility in Learner Language Emi Izumi, 1 Kiyotaka Uchimoto 1 and Hitoshi Isahara 1 1. Introduction In verbal communication, the primary purpose of which is to convey and understand

More information

On-the-Fly Customization of Automated Essay Scoring

On-the-Fly Customization of Automated Essay Scoring Research Report On-the-Fly Customization of Automated Essay Scoring Yigal Attali Research & Development December 2007 RR-07-42 On-the-Fly Customization of Automated Essay Scoring Yigal Attali ETS, Princeton,

More information

Multi-Lingual Text Leveling

Multi-Lingual Text Leveling Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency

More information

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

ScienceDirect. Malayalam question answering system

ScienceDirect. Malayalam question answering system Available online at www.sciencedirect.com ScienceDirect Procedia Technology 24 (2016 ) 1388 1392 International Conference on Emerging Trends in Engineering, Science and Technology (ICETEST - 2015) Malayalam

More information

Netpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models

Netpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models 1 Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models James B.

More information

Calibration of Confidence Measures in Speech Recognition

Calibration of Confidence Measures in Speech Recognition Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE

More information

The Ups and Downs of Preposition Error Detection in ESL Writing

The Ups and Downs of Preposition Error Detection in ESL Writing The Ups and Downs of Preposition Error Detection in ESL Writing Joel R. Tetreault Educational Testing Service 660 Rosedale Road Princeton, NJ, USA JTetreault@ets.org Martin Chodorow Hunter College of CUNY

More information

Learning Computational Grammars

Learning Computational Grammars Learning Computational Grammars John Nerbonne, Anja Belz, Nicola Cancedda, Hervé Déjean, James Hammerton, Rob Koeling, Stasinos Konstantopoulos, Miles Osborne, Franck Thollard and Erik Tjong Kim Sang Abstract

More information

Specifying a shallow grammatical for parsing purposes

Specifying a shallow grammatical for parsing purposes Specifying a shallow grammatical for parsing purposes representation Atro Voutilainen and Timo J~irvinen Research Unit for Multilingual Language Technology P.O. Box 4 FIN-0004 University of Helsinki Finland

More information

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

A deep architecture for non-projective dependency parsing

A deep architecture for non-projective dependency parsing Universidade de São Paulo Biblioteca Digital da Produção Intelectual - BDPI Departamento de Ciências de Computação - ICMC/SCC Comunicações em Eventos - ICMC/SCC 2015-06 A deep architecture for non-projective

More information

Survey on parsing three dependency representations for English

Survey on parsing three dependency representations for English Survey on parsing three dependency representations for English Angelina Ivanova Stephan Oepen Lilja Øvrelid University of Oslo, Department of Informatics { angelii oe liljao }@ifi.uio.no Abstract In this

More information

The Strong Minimalist Thesis and Bounded Optimality

The Strong Minimalist Thesis and Bounded Optimality The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this

More information

THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS

THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS ELIZABETH ANNE SOMERS Spring 2011 A thesis submitted in partial

More information

Grammars & Parsing, Part 1:

Grammars & Parsing, Part 1: Grammars & Parsing, Part 1: Rules, representations, and transformations- oh my! Sentence VP The teacher Verb gave the lecture 2015-02-12 CS 562/662: Natural Language Processing Game plan for today: Review

More information

The stages of event extraction

The stages of event extraction The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks

More information

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,

More information

Radius STEM Readiness TM

Radius STEM Readiness TM Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and

More information

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best

More information

Mining Association Rules in Student s Assessment Data

Mining Association Rules in Student s Assessment Data www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama

More information

Proof Theory for Syntacticians

Proof Theory for Syntacticians Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax

More information

Matching Similarity for Keyword-Based Clustering

Matching Similarity for Keyword-Based Clustering Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web

More information

TINE: A Metric to Assess MT Adequacy

TINE: A Metric to Assess MT Adequacy TINE: A Metric to Assess MT Adequacy Miguel Rios, Wilker Aziz and Lucia Specia Research Group in Computational Linguistics University of Wolverhampton Stafford Street, Wolverhampton, WV1 1SB, UK {m.rios,

More information

Rule Learning With Negation: Issues Regarding Effectiveness

Rule Learning With Negation: Issues Regarding Effectiveness Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United

More information

Mandarin Lexical Tone Recognition: The Gating Paradigm

Mandarin Lexical Tone Recognition: The Gating Paradigm Kansas Working Papers in Linguistics, Vol. 0 (008), p. 8 Abstract Mandarin Lexical Tone Recognition: The Gating Paradigm Yuwen Lai and Jie Zhang University of Kansas Research on spoken word recognition

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

Chapter 4: Valence & Agreement CSLI Publications

Chapter 4: Valence & Agreement CSLI Publications Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).

More information

UNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen

UNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen UNIVERSITY OF OSLO Department of Informatics Dialog Act Recognition using Dependency Features Master s thesis Sindre Wetjen November 15, 2013 Acknowledgments First I want to thank my supervisors Lilja

More information

Online Updating of Word Representations for Part-of-Speech Tagging

Online Updating of Word Representations for Part-of-Speech Tagging Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org

More information

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za

More information

A Comparison of Two Text Representations for Sentiment Analysis

A Comparison of Two Text Representations for Sentiment Analysis 010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational

More information