Sentence Simplification for Question Generation
Feras Al Tarouti and Jugal Kalita
Department of Computer Science
University of Colorado at Colorado Springs
Colorado Springs, Colorado 80918, USA
{faltarou & jkalita}@uccs.edu

Conor McGrory
Department of Computer Science
Princeton University
Princeton, New Jersey
cmcgrory@princeton.edu

Abstract - Automatic generation of basic, factual questions from a single sentence is a problem that has received a considerable amount of attention. Some studies have suggested splitting this problem into two parts: first, decomposing the source sentence into a set of smaller, simple sentences, and then transforming each of these sentences into a question. This paper outlines a novel method for the first part, combining two techniques recently developed for related NLP problems. Our method uses a trained classifier to determine which phrases of the source sentence are potential answers to questions, and then creates different compressions of the sentence for each one.

Index Terms - Sentence Simplification, Question Generation.

I. INTRODUCTION

The ability of a speaker to form a grammatical question to request a specific piece of information from another party is indispensable in most practical situations involving basic communication. Recently, there has been a significant amount of research towards developing systems that can automatically generate basic questions from input text. This is called the problem of Question Generation (QG). Although some studies in the past have tried to generate questions based on whole blocks of text [1], the majority of recent work on QG has focused on the problem of generating factual questions from a single sentence. Early attempts to solve this problem used complicated sets of grammatical rules to transform the input sentence directly into a question [2]. However, Heilman and Smith [3] suggested separating the problem into two steps: first simplifying the source sentence, and then transforming it into a question.
The advantage of this approach is that grammatical rules are much better at transforming simple sentences than they are at transforming complex ones. Our paper outlines a method for performing the first step, which we refer to as the problem of Simplified Statement Extraction (SSE).

II. PRIOR WORK

Two problems in NLP that are related to QG are cloze question generation and sentence compression. In a cloze question, the student is asked, after reading the text, to complete a given sentence by filling in a blank with the correct word. One example could be the question "A _____ is a conceptual device used in computer science as a universal model of computing processes." In this case, the answer would be "Turing machine". However, selecting which phrase(s) in the sentence to delete is somewhat difficult. A question like "A Turing Machine _____ a conceptual device used in computer science as a universal model of computing processes." with the verb "is" as the answer would be completely useless to a student interested in testing knowledge of computer science. An automatic cloze question generator needs to distinguish informative questions from extraneous ones. Because the quality of a cloze question can depend on relationships between a large number of factors, to generate high-quality questions, Becker et al. [4] train a logistic regression classifier on a corpus of questions paired with human judgments of their quality.

Sentence compression is the problem of transforming an input sentence into a shorter version that is grammatical and retains the most important semantic elements of the original. Knight and Marcu [5] used a statistical language model where the input sentence is treated as a noisy channel and the compression is the signal, while Clarke and Lapata [6] used a large set of constituency parse tree manipulation rules to generate compressions. Heilman and Smith [7] developed a rule-based algorithm, called the Simplified Factual Statement Extractor (SFSE), that extracts multiple simple sentences from a source sentence.
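The cloze construction itself is mechanical; a minimal sketch (ours, not Becker et al.'s implementation — the function name and blank marker are illustrative assumptions) simply blanks out a chosen answer phrase:

```python
def make_cloze(sentence: str, answer: str, blank: str = "_____") -> str:
    """Form a cloze question by blanking the first occurrence of the answer phrase."""
    if answer not in sentence:
        raise ValueError("answer phrase not found in sentence")
    return sentence.replace(answer, blank, 1)

question = make_cloze(
    "A Turing machine is a conceptual device used in computer science "
    "as a universal model of computing processes.",
    "Turing machine",
)
print(question)
```

The hard part, as noted above, is not the blanking but deciding which phrase to blank; that selection is what Becker et al.'s classifier learns.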
While traditional sentence compression algorithms usually compress a long sentence into a single short sentence, SFSE extracts one or more simple sentences from a long sentence. By doing so, the algorithm ensures that important information, which can be used to generate questions, is preserved. Each simple sentence produced by the algorithm can be easily converted into a question. The SFSE algorithm uses textual entailment recognition to split a complex sentence into a set of simple sentences that are true given the original sentence. The algorithm handles two linguistic phenomena: semantic entailment and presupposition. By extracting multiple simplified statements from the source sentence, Heilman and Smith increased the number of possible questions that could be generated.

Kalady et al. [8] presented a rule-based algorithm for generating definitional and factoid questions from a multi-sentence source. Here, to generate definitional questions, keywords from the source document are selected using a summarization system [9]. These keywords are called Up-Keys. Then, the Up-Keys are mapped to simple question templates. For instance, if the word "Ebola" is selected as a keyword, it would be mapped to the template "<Question-word> is <Up-Key>?" to generate the question "What is Ebola?". To generate factoid questions, the source sentence is preprocessed to produce simple clauses by splitting the independent clauses within the sentence and replacing pronouns. Then, using a tree regular expression language, the algorithm tries to identify named entities, subject-auxiliaries, appositives, subject-verb-object structures, prepositional phrases and adverbials. Finally, for each of these patterns, a procedure is applied to generate a question. The authors evaluated the system by comparing the questions generated by the system with manually generated questions. The system scored an average precision of 0.46. The authors reported that the overall quality of the generated questions decreases as the length of the source sentence increases.

Filippova and Strube [10] developed a method where a compressed sentence is generated by pruning the dependency parse tree of the input sentence. Using the Tipster corpus, they calculated the conditional probabilities of specific dependencies occurring after a given head word. These were used, in combination with data on the frequencies of the words themselves, to calculate a score for each dependency in the tree. They then formulated the problem of compressing the sentence as an integer linear program. Each variable corresponded to a dependency in the tree. A value of 1 meant the dependent word of that dependency would be preserved in the compression, and a value of 0 meant that it would be deleted. Constraints were added to restrict the structure and length of the compression, and the objective function set to be maximized was the sum of the scores of the preserved dependencies.
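The formulation just described can be illustrated with a toy brute-force version (our sketch, not Filippova and Strube's code): one 0/1 variable per dependency edge, a tree-connectivity constraint, a length cap, and made-up scores standing in for the corpus-estimated conditional probabilities.

```python
from itertools import product

# Toy dependency tree of "She mentioned that she worked in Apple".
# Each edge is (head, dependent, score); scores are invented stand-ins
# for the learned P(dependency | head) values.
edges = [
    ("mentioned", "She", 0.8),
    ("mentioned", "worked", 0.9),
    ("worked", "she", 0.6),
    ("worked", "Apple", 0.7),
]
root = "mentioned"
max_words = 3  # length constraint: root word plus kept dependents

best, best_keep = -1.0, None
for keep in product([0, 1], repeat=len(edges)):
    kept_words = {root} | {e[1] for e, k in zip(edges, keep) if k}
    # connectivity: a kept edge's head word must itself be kept
    if any(k and e[0] not in kept_words for e, k in zip(edges, keep)):
        continue
    if len(kept_words) > max_words:
        continue
    score = sum(e[2] for e, k in zip(edges, keep) if k)
    if score > best:
        best, best_keep = score, kept_words

print(sorted(best_keep), best)
```

An ILP solver replaces this exhaustive search in practice; with these toy scores the optimum keeps "She mentioned" material rather than the embedded clause, because the two highest-scoring edges fit within the length cap.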
The central assumption made by Filippova and Strube's method is that the frequency with which a particular dependency occurs after a given word is a good indicator of its grammatical necessity.

III. SIMPLIFIED SENTENCE EXTRACTION

A. Problem Statement

We divide the process of QG into three major steps: answer selection, sentence simplification and question generation. Figure 1 shows the QG process applied to the sentence "John performed Yoga, which is a Hindu spiritual discipline, to reduce his stress." In this work we focus on the answer selection and sentence simplification steps, which together we refer to as simplified statement extraction (SSE). We define the problem of SSE as follows: for a source sentence S, create a set of simplified statements {s1, ..., sn} that are semantic entailments of S. A sentence is considered a simplified statement if it is a declarative sentence (a statement) that can be directly transformed into a question-answer pair without any compression.

Fig. 1 The process of question generation applied to the sentence "John performed Yoga, which is a Hindu spiritual discipline, to reduce his stress."

B. Solution Steps

As Becker et al. [4] showed, there are certain phrases in S that make sense as answers to questions and others that do not. The idea behind our SSE system is that knowledge of which phrases in S are good answers can inform the compression process, preventing us from missing important information and thereby maximizing coverage. We solve the SSE problem in two parts: first identifying potential answers, and then generating for each of these answers a compression of S in which that answer is preserved. These compressions form the set {si} of simplified statements. Our goal when compressing for a particular answer is to find the shortest grammatical compression of S that contains the given answer. To select potential answers from the input sentence, we use a slightly modified version of Becker et al.'s cloze question generation system.
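Structurally, the two-part solution amounts to the following driver (a sketch of the decomposition only; `select_answers` and `compress_for` are placeholders for the classifier and ILP stages described in the next sections):

```python
from typing import Callable, List

def extract_simplified_statements(
    sentence: str,
    select_answers: Callable[[str], List[str]],
    compress_for: Callable[[str, str], str],
) -> List[str]:
    """SSE in two parts: pick answer phrases, then produce one
    answer-preserving compression of the sentence per answer."""
    return [compress_for(sentence, a) for a in select_answers(sentence)]

# Toy stand-ins, just to show the data flow:
statements = extract_simplified_statements(
    "John performed Yoga, which is a Hindu spiritual discipline, "
    "to reduce his stress.",
    select_answers=lambda s: ["Yoga", "John"],
    compress_for=lambda s, a: f"<shortest grammatical compression keeping '{a}'>",
)
print(statements)
```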
Once we have the set of possible answers, we use a more substantially modified version of Filippova and Strube's [10] dependency tree pruning method to generate the set of shortest grammatical compressions of S that contain each of the answers.

IV. ANSWER SELECTION

We implemented the answer selection system using the Stanford NLP Toolkit [11] and the Weka machine learning software [12]. It uses the corpus of sentences, QA pairs, and human judgments from Becker et al. [4] to train a classifier to find the nodes in the parse tree of the input sentence that are most likely viable answers to questions.

A. Feature Set

The dependency relations identified by the Stanford NLP Toolkit are a set of grammatical relations between governor and dependent words in a sentence [11]. Some examples include verb-subject, verb-indirect object, noun-modifier, and noun-determiner. For our purposes, we used the 56 basic relations defined in the Stanford library to categorize all of our dependencies. Our features can be divided into three basic categories: token count features, syntactic features, and semantic features. The token count features comprise 5 features concerning the length of the answer in comparison to the length of the sentence, such as the raw lengths of both and the length of the answer as a percentage of the length of the question. Examples of syntactic features we use are the Penn POS tag [13] of the word that comes immediately before the answer, the tag of the word that comes immediately after, and the set of tags of words contained in the answer phrase. The semantic features use the Stanford dependencies system and are completely different from the semantic features used by [4]. These include the dependency relation between the head of the answer phrase and its governor in the sentence, the set of relations between governors in the answer and dependents not in the answer, the set of relations with both governors and dependents in the answer, and the distance in the constituency tree between the answer node and its maximal projection.

Fig. 2 Transformation for the dependency tree of "She mentioned that she worked in Apple and Microsoft."

B. Classifier

The classifier used in our system is the Weka Logistic classifier [14]. This is a binary logistic regression classifier, similar to the one used by Becker et al. [4].

C. Human Judgments

The corpus provided by Becker et al. [4] consists of slightly over 2,000 sentences, each with a selected answer phrase and four human judgments of the quality of the answer. Our program used the four judgments to calculate a score for each answer, which we then used to determine how to classify it in the data set. This score is then compared to the threshold value (a pre-set constant in the program). If the score is greater than or equal to this value, the answer is classified in the data set as Good. Otherwise, it is classified as Bad.

D. Results

We used the program to produce a data set from the Becker et al. corpus. This data set was created using a threshold value of 1.0 (all four human judges have to rate the sentence as Good). A random sample of the sentences was drawn from this data to produce a subset with a comparable amount of Good and Bad sentences. This set contained a total of 582 instances, 278 of which were Good and 304 of which were Bad. We tested both the Weka Logistic classifier and the Weka Simple Logistic classifier on the data using 10-fold cross-validation. For the Logistic classifier, the correct classification rate was 72.3%, the true positive rate was 78.4%, and the false positive rate was 33.2%.

V. SENTENCE COMPRESSION

Filippova and Strube [10] developed an unsupervised sentence compression approach that compresses sentences by pruning unnecessary subtrees from the dependency tree of the sentence. Three processes are applied to the dependency tree to compress a sentence: transformation, compression and linearization. The tree transformation process is carried out in four steps: ROOT, VERB, PREP and CONJ. In the ROOT step, a root node is inserted into the tree. Then, in the VERB step, the root node is connected to all the inflected verbs in the tree with edges labeled s. After that, all auxiliary verb edges are deleted from the tree and the grammatical properties of the verbs are stored to be recovered later. In the next step (PREP), all prepositional nodes are replaced with labels on the edges which connect a head to the respective noun. Finally, in the CONJ step, every chain of conjoined non-verb words is split, and each conjunct is connected directly to the head of the first element of the chain using an edge whose label matches the edge connecting that first element to the head. Figure 2 shows the transformation for the dependency tree of the sentence "She mentioned that she worked in Apple and Microsoft."

Table I SAMPLES OF SIMPLIFIED SENTENCES ALONG WITH THEIR MFQ VALUES AND EVALUATIONS

The tree compression process is performed by removing edges from the dependency graph produced by the transformation process. To select which edge should be removed from the graph, a score is computed for the subtree connected by each edge. We first calculate probabilities of dependencies occurring after head words and use these as an estimate of the grammatical necessity of different dependencies given the presence of a head word. Along with all of the constraints placed on the ILP in the original model by Filippova and Strube [10], we add an extra constraint that ensures the preservation of the answer phrase in the compression. We then use a linear program solver to solve the ILP for all length values between 0 and the length of S, generating a set of compressions of S with all possible lengths. For these compressions, we use a 3-gram model to calculate the Mean First Quartile (MFQ) grammaticality metric described by Clark et al. [15]. Compressions with an MFQ value lower than a threshold are deemed grammatical, and the shortest of these is selected as the final compression of S for the given answer. Finally, in the tree linearization process, the selected words are presented in the order they appear in the original sentence.

A. Dependency Probabilities

In addition to the feature set used in the selection part of the system, we included additional features such as collapsed dependencies [11], which are created when closed-class words like "and", "of", or "by" are made part of the grammatical relations. To calculate the frequencies of dependencies after certain head words, we use a pre-parsed section of the Open American National Corpus [16]. To prevent rounding errors, we used a smoothing function when calculating the probabilities from the frequency data. Finally, to avoid problems that come with probability values of zero, our system linearly maps the smoothed probability values P(l|h) from [0,1] to [10^-4, 1].

B. Integer Linear Program

We formulate the compression problem as an ILP. For each dependency of Stanford type l holding between head word h and dependent word w, we create a variable x_{h,w}^l.
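The zero-probability fix just described can be sketched as follows. The Laplace smoothing is our assumption (the text says only "a smoothing function" without naming one); the linear map onto [10^-4, 1] is as stated:

```python
def smoothed_cond_prob(count_label_head: int, count_head: int, n_labels: int) -> float:
    """P(l | h) with add-one (Laplace) smoothing -- an assumed choice of
    smoothing function, estimated from corpus frequency counts."""
    return (count_label_head + 1) / (count_head + n_labels)

def map_probability(p: float, floor: float = 1e-4) -> float:
    """Linearly map a probability from [0, 1] onto [floor, 1] so that a
    zero-probability dependency cannot zero out its ILP objective term."""
    return floor + (1.0 - floor) * p

# A dependency type never seen after this head still gets a small,
# nonzero, usable score:
p = smoothed_cond_prob(count_label_head=0, count_head=120, n_labels=56)
print(map_probability(p))
```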
These variables must each take on a value of 0 or 1 in the solution, where dependencies whose variables equal 1 are preserved in the resulting compression and dependencies whose variables equal 0 are deleted, along with their dependent words. The ILP maximizes the objective function

f(x) = Σ_{h,w,l} x_{h,w}^l · t(l) · P(l|h)

where t is the tweak function, which corrects discrepancies between frequency and grammatical necessity that occur with some specific types of dependencies. Filippova and Strube used two constraints in their model to preserve tree structure and connectedness in the compression. To ensure that all of the words in the pre-selected answer A are also preserved, we include in our model the extra constraint

∀ w ∈ A: Σ_h x_{h,w} ≥ 1.

We solved these integer linear programs using lp_solve, an open-source LP and ILP solver.

C. Shortest Grammatical Compression

In order to find the shortest grammatical compression of S, our system first finds a solution to the ILP for S and A for every value of α (the maximum length constraint parameter) between the length of S and the length of A. Because the constraints also specify that every word in A is preserved in the compression, any model where α is less than the length of A would have no solution. To determine the grammaticality of the compressions, we use the MFQ metric [15], which is computed using the Berkeley Language Model Toolkit [17] trained on the OANC text. It considers the log-probabilities of all of the n-grams in the given sentence, selects the first quartile (the 25% with the lowest values), and calculates the mean of the ratios of each n-gram log-probability over the unigram log-probability of that n-gram's head word. The larger the MFQ value is, the less likely the sentence is to be grammatical. Our system looks through the list of different-length compressions and selects the shortest compression with an MFQ value less than a specified threshold (we used a threshold of 1.14).
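The MFQ computation can be sketched directly from the description above (our reading of Clark et al.'s metric; the quartile rounding and tie-breaking are assumptions):

```python
def mfq(ngrams):
    """Mean First Quartile score.

    `ngrams` is a list of (ngram_logprob, head_unigram_logprob) pairs,
    one per n-gram in the sentence.  Take the 25% of n-grams with the
    lowest log-probabilities and average the ratio of each n-gram's
    log-probability to the unigram log-probability of its head word.
    Larger MFQ means the sentence is less likely to be grammatical.
    """
    first_quartile = sorted(ngrams)[: max(1, len(ngrams) // 4)]
    ratios = [lp / head_lp for lp, head_lp in first_quartile]
    return sum(ratios) / len(ratios)

score = mfq([(-8.0, -4.0), (-6.0, -3.0), (-4.0, -4.0), (-2.0, -2.0)])
print(score)
```

With these toy numbers the first quartile is the single lowest pair (-8.0, -4.0), giving MFQ = 2.0; under a threshold of 1.14 such a compression would be rejected as ungrammatical, while the shortest compression scoring below the threshold would be kept.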
This compression is returned as the simplified statement extracted from S for the answer A. Table I shows MFQ values of some simplified sentences along with their evaluations.

D. Results

The functionality of the compression system can be demonstrated with sample outputs from the compressor. For example, given the sentence "She mentioned that she worked in Apple and Microsoft", the simplified sentence extractor can determine that "she", "Apple", and "Microsoft" are potential answers about which a question generator can ask questions. For the answers "Apple" and "Microsoft", the system generates the compression "She worked in Apple and Microsoft", which is a compression of the original sentence with the pre-identified answer preserved in it. This statement can now be passed to a question generator as a simple sentence that can potentially generate the question "Where did she work?" or something similar.

VI. EVALUATION AND DISCUSSION

To evaluate our algorithm for SSE, we compare it with the SFSE algorithm presented by Heilman and Smith [7]. The source sentences we use are complex sentences from the Simple-Complex Sentence Pairs produced by [18]. The Simple-Complex Sentence Pairs were collected from the English Wikipedia and Simple English Wikipedia. Simple Wikipedia targets children and non-native English speakers.
Authors of Simple Wikipedia use short sentences composed of easy words to write articles. The collected dataset includes 65,133 articles paired from Simple Wikipedia and Wikipedia using dump files downloaded from Wikimedia. We randomly selected a sample of 85 complex sentences from the corpus. Our algorithm produced 215 compressed sentences, while the SFSE algorithm produced 119 compressed sentences. To measure the performance of the algorithms, we computed the percentage of correct compressed sentences produced by each method. We asked independent human evaluators to evaluate the compressed sentences through a web application. The evaluators were asked whether the algorithm produced a new, shorter sentence and whether the new sentence is correct. As Figure 3 shows, our SSE algorithm produced new compressed sentences in 84.4% of the cases, while the SFSE algorithm produced new compressed sentences in 73.38% of the cases. Moreover, our SSE algorithm generated 43.3% correct sentences and 41.1% incorrect sentences, whereas the SFSE algorithm generated 46.77% correct sentences and 26.77% incorrect sentences. We notice here that our method produced more compressed sentences but with lower grammatical accuracy compared with the rule-based approach presented by [7]. We believe this is expected, since we are using a statistical method for shortening the source sentences. When using rule-based methods, one has the advantage of controlling the output. However, one major disadvantage of rule-based methods is that they are limited to the implemented set of rules. Our results clearly show that the rule-based method produced fewer sentences compared with the statistical method we use. Another disadvantage of a rule-based method is that it is also limited to a single language, whereas statistical methods can be adapted for use with additional languages.

VII.
CONCLUSION

The key principle on which our system is built is that selecting the answer at the beginning of the QG process and using it to guide SSE can improve the coverage of the system. We implemented a machine learning-based approach for answer selection and developed a way to compress a sentence while leaving a specified answer phrase intact. Although we have not yet been able to perform large-scale tests on this system where the output is rated by human judges, we have generated some good output sentences. In the near future, this system will be integrated with a direct declarative-to-interrogative transformation system to produce a full, functional QG system.

Figure 3 The ratings of sentences produced by our SSE algorithm and the SFSE algorithm presented by [7].

REFERENCES

[1] H. Kunichika, T. Katayama, T. Hirashima, and A. Takeuchi, "Automated question generation methods for intelligent English learning systems and its evaluation," in Proceedings of ICCE2004, 2003.
[2] J. H. Wolfe, "Automatic question generation from text - an aid to independent study," in ACM SIGCUE Outlook, vol. 10. ACM, 1976.
[3] M. Heilman and N. A. Smith, "Good question! Statistical ranking for question generation," in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2010.
[4] L. Becker, S. Basu, and L. Vanderwende, "Mind the gap: learning to choose gaps for question generation," in Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2012.
[5] K. Knight and D. Marcu, "Statistics-based summarization - step one: Sentence compression," in AAAI/IAAI, 2000.
[6] J. Clarke and M. Lapata, "Modelling Compression with Discourse Constraints," in EMNLP-CoNLL, 2007.
[7] M. Heilman and N. A. Smith, "Extracting simplified statements for factual question generation," in Proceedings of QG2010: The Third Workshop on Question Generation, 2010.
[8] S. Kalady, A. Elikkottil, and R. Das, "Natural language question generation using syntax and keywords," in Proceedings of QG2010: The Third Workshop on Question Generation, 2010.
[9] R. Das and A. Elikkottil, "Automatic Summarizer to aid a Q/A system," International Journal of Computer Applications, vol. 1, no. 1.
[10] K. Filippova and M. Strube, "Dependency tree based sentence compression," in Proceedings of the Fifth International Natural Language Generation Conference. Association for Computational Linguistics, 2008.
[11] M.-C. De Marneffe and C. D. Manning, "The Stanford typed dependencies representation," in Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation. Association for Computational Linguistics, 2008.
[12] G. Holmes, A. Donkin, and I. H. Witten, "WEKA: a machine learning workbench," in Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, Nov. 1994.
[13] M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, "Building a Large Annotated Corpus of English: The Penn Treebank," Computational Linguistics, vol. 19, no. 2, Jun. 1993.
[14] S. le Cessie and J. C. van Houwelingen, "Ridge Estimators in Logistic Regression," Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 41, no. 1, Jan. 1992.
[15] A. Clark, G. Giorgolo, and S. Lappin, "Statistical representation of grammaticality judgements: the limits of n-gram models," CMCL 2013, p. 28.
[16] N. Ide and C. Macleod, "The American National Corpus: A standardized resource of American English," in Proceedings of Corpus Linguistics 2001, vol. 3.
[17] A. Pauls and D. Klein, "Faster and Smaller N-gram Language Models," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, ser. HLT '11. Stroudsburg, PA, USA: Association for Computational Linguistics, 2011.
[18] Z. Zhu, D. Bernhard, and I. Gurevych, "A monolingual tree-based translation model for sentence simplification," in Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 2010.
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationParsing of part-of-speech tagged Assamese Texts
IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal
More informationTraining and evaluation of POS taggers on the French MULTITAG corpus
Training and evaluation of POS taggers on the French MULTITAG corpus A. Allauzen, H. Bonneau-Maynard LIMSI/CNRS; Univ Paris-Sud, Orsay, F-91405 {allauzen,maynard}@limsi.fr Abstract The explicit introduction
More informationTextGraphs: Graph-based algorithms for Natural Language Processing
HLT-NAACL 06 TextGraphs: Graph-based algorithms for Natural Language Processing Proceedings of the Workshop Production and Manufacturing by Omnipress Inc. 2600 Anderson Street Madison, WI 53704 c 2006
More informationA Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books
A Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books Yoav Goldberg Bar Ilan University yoav.goldberg@gmail.com Jon Orwant Google Inc. orwant@google.com Abstract We created
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationSyntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm
Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together
More informationAQUA: An Ontology-Driven Question Answering System
AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.
More informationCompositional Semantics
Compositional Semantics CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu Words, bag of words Sequences Trees Meaning Representing Meaning An important goal of NLP/AI: convert natural language
More informationLecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist
Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet
More informationLinking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report
Linking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report Contact Information All correspondence and mailings should be addressed to: CaMLA
More informationWeb as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics
(L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes
More informationLTAG-spinal and the Treebank
LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing Libin Shen (lshen@bbn.com) BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA Lucas Champollion (champoll@ling.upenn.edu)
More informationExploiting Wikipedia as External Knowledge for Named Entity Recognition
Exploiting Wikipedia as External Knowledge for Named Entity Recognition Jun ichi Kazama and Kentaro Torisawa Japan Advanced Institute of Science and Technology (JAIST) Asahidai 1-1, Nomi, Ishikawa, 923-1292
More informationRole of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation
Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore, Alistair Conkie AT&T abs - Research 180 Park Avenue, Florham Park,
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationA Graph Based Authorship Identification Approach
A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationThe Effect of Multiple Grammatical Errors on Processing Non-Native Writing
The Effect of Multiple Grammatical Errors on Processing Non-Native Writing Courtney Napoles Johns Hopkins University courtneyn@jhu.edu Aoife Cahill Nitin Madnani Educational Testing Service {acahill,nmadnani}@ets.org
More informationRule discovery in Web-based educational systems using Grammar-Based Genetic Programming
Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de
More informationUnsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model
Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
More informationPredicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks
Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com
More informationPOS tagging of Chinese Buddhist texts using Recurrent Neural Networks
POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important
More informationBANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS
Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.
More informationSearch right and thou shalt find... Using Web Queries for Learner Error Detection
Search right and thou shalt find... Using Web Queries for Learner Error Detection Michael Gamon Claudia Leacock Microsoft Research Butler Hill Group One Microsoft Way P.O. Box 935 Redmond, WA 981052, USA
More informationTowards a MWE-driven A* parsing with LTAGs [WG2,WG3]
Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general
More informationProduct Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments
Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &
More informationDeveloping a TT-MCTAG for German with an RCG-based Parser
Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,
More informationAnnotation Projection for Discourse Connectives
SFB 833 / Univ. Tübingen Penn Discourse Treebank Workshop Annotation projection Basic idea: Given a bitext E/F and annotation for F, how would the annotation look for E? Examples: Word Sense Disambiguation
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationEnhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities
Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion
More informationTHE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING
SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationNatural Language Processing. George Konidaris
Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans
More informationVocabulary Usage and Intelligibility in Learner Language
Vocabulary Usage and Intelligibility in Learner Language Emi Izumi, 1 Kiyotaka Uchimoto 1 and Hitoshi Isahara 1 1. Introduction In verbal communication, the primary purpose of which is to convey and understand
More informationOn-the-Fly Customization of Automated Essay Scoring
Research Report On-the-Fly Customization of Automated Essay Scoring Yigal Attali Research & Development December 2007 RR-07-42 On-the-Fly Customization of Automated Essay Scoring Yigal Attali ETS, Princeton,
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More information2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases
POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationScienceDirect. Malayalam question answering system
Available online at www.sciencedirect.com ScienceDirect Procedia Technology 24 (2016 ) 1388 1392 International Conference on Emerging Trends in Engineering, Science and Technology (ICETEST - 2015) Malayalam
More informationNetpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models
Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models 1 Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models James B.
More informationCalibration of Confidence Measures in Speech Recognition
Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE
More informationThe Ups and Downs of Preposition Error Detection in ESL Writing
The Ups and Downs of Preposition Error Detection in ESL Writing Joel R. Tetreault Educational Testing Service 660 Rosedale Road Princeton, NJ, USA JTetreault@ets.org Martin Chodorow Hunter College of CUNY
More informationLearning Computational Grammars
Learning Computational Grammars John Nerbonne, Anja Belz, Nicola Cancedda, Hervé Déjean, James Hammerton, Rob Koeling, Stasinos Konstantopoulos, Miles Osborne, Franck Thollard and Erik Tjong Kim Sang Abstract
More informationSpecifying a shallow grammatical for parsing purposes
Specifying a shallow grammatical for parsing purposes representation Atro Voutilainen and Timo J~irvinen Research Unit for Multilingual Language Technology P.O. Box 4 FIN-0004 University of Helsinki Finland
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More informationA deep architecture for non-projective dependency parsing
Universidade de São Paulo Biblioteca Digital da Produção Intelectual - BDPI Departamento de Ciências de Computação - ICMC/SCC Comunicações em Eventos - ICMC/SCC 2015-06 A deep architecture for non-projective
More informationSurvey on parsing three dependency representations for English
Survey on parsing three dependency representations for English Angelina Ivanova Stephan Oepen Lilja Øvrelid University of Oslo, Department of Informatics { angelii oe liljao }@ifi.uio.no Abstract In this
More informationThe Strong Minimalist Thesis and Bounded Optimality
The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this
More informationTHE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS
THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS ELIZABETH ANNE SOMERS Spring 2011 A thesis submitted in partial
More informationGrammars & Parsing, Part 1:
Grammars & Parsing, Part 1: Rules, representations, and transformations- oh my! Sentence VP The teacher Verb gave the lecture 2015-02-12 CS 562/662: Natural Language Processing Game plan for today: Review
More informationThe stages of event extraction
The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationRadius STEM Readiness TM
Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and
More informationAlgebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview
Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best
More informationMining Association Rules in Student s Assessment Data
www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama
More informationProof Theory for Syntacticians
Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax
More informationMatching Similarity for Keyword-Based Clustering
Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web
More informationTINE: A Metric to Assess MT Adequacy
TINE: A Metric to Assess MT Adequacy Miguel Rios, Wilker Aziz and Lucia Specia Research Group in Computational Linguistics University of Wolverhampton Stafford Street, Wolverhampton, WV1 1SB, UK {m.rios,
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationMandarin Lexical Tone Recognition: The Gating Paradigm
Kansas Working Papers in Linguistics, Vol. 0 (008), p. 8 Abstract Mandarin Lexical Tone Recognition: The Gating Paradigm Yuwen Lai and Jie Zhang University of Kansas Research on spoken word recognition
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationChapter 4: Valence & Agreement CSLI Publications
Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).
More informationUNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen
UNIVERSITY OF OSLO Department of Informatics Dialog Act Recognition using Dependency Features Master s thesis Sindre Wetjen November 15, 2013 Acknowledgments First I want to thank my supervisors Lilja
More informationOnline Updating of Word Representations for Part-of-Speech Tagging
Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org
More informationDEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS
DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More information