THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING

SISOM & ACOUSTICS 2015, Bucharest, May

Marilena LAZĂR 1, Diana MILITARU 2
1 Military Equipment and Technologies Research Agency, Bucharest, mnvlazar@gmail.com
2 University Politehnica of Bucharest, Bucharest

Speech recognition and understanding systems require various kinds of linguistic knowledge to improve their performance. This requires capturing knowledge from different linguistic levels (morphological, syntactic, semantic, etc.) and processing it with different techniques. Decision trees are among the most frequently used techniques in natural language processing because they can model the grammatical structure of sentences and phrases well. In this paper we present the role of decision trees in modeling the possession relation between two semantically related nouns.

Keywords: natural language processing, decision trees, linguistic knowledge, morphosyntactic description, syntactic level, possession relation, Romanian language.

1. INTRODUCTION

Automatic speech recognition and understanding requires knowledge from different domains, such as signal and pattern recognition, natural language processing, mathematics and linguistics. Recognizing and understanding natural speech needs both acoustic pattern matching and linguistic knowledge. The linguistic knowledge that humans use when they communicate is hard to model, and it is almost impossible to obtain a single language model that solves all problems. Different theories and methods have been developed to model the different kinds of linguistic knowledge, so that they can be introduced into natural language applications (automatic speech recognition and understanding systems, machine translation, natural language generation, word sense disambiguation, part-of-speech tagging, etc.).

Each of these methods uses various kinds of linguistic knowledge so that the resulting models improve the accuracy of automatic speech recognition and understanding. Decision trees are among the most used methods in natural language processing for resolving ambiguities at the phonetic, morphological, syntactic, semantic and pragmatic levels, because they can model language structure well.

2. LINGUISTIC KNOWLEDGE

Because natural language is complex, explaining its behavior is very hard. For this reason the knowledge of language has been divided into several levels, each containing certain linguistic features that allow specific analysis methods to be created for that level. The division is made so that information from the lower linguistic levels supports the analysis at the higher levels. Linguistic knowledge is divided into levels as follows:
- the way linguistic sounds are produced is studied at the phonetics and phonology levels;
- the way words are formed from meaningful components is examined at the morphology level;
- the way words are ordered and grouped together is studied at the syntax level;
- the meaning of words is examined at the semantic level;
- the kinds of actions that speakers intend by using certain sentences are studied at the pragmatic or dialogue level.

In this paper we study word relations only at the syntax level, in order to define the possession relation between two nouns using morphosyntactic descriptions.

3. DECISION TREES

A decision tree is a tool that supports the decision-making process. Decision trees belong to the family of machine learning methods and provide a hierarchical partitioning of a dataset. Given a training dataset, a decision tree algorithm generates knowledge in the form of a hierarchical tree structure that can be used to classify instances. An example of a decision tree used in natural language processing is the syntactic tree generated with a grammar (figure 1).

Legend: S = sentence, NP = noun phrase, VP = verb phrase, PP = prepositional phrase, N = noun, V = verb, P = preposition
Figure 1. The decision tree for the Romanian sentence "Maria pune cartea pe masă." (Eng.: Mary puts the book on the table.)

An important part of a decision tree algorithm is the method used for selecting the attribute at each node of the tree. Each algorithm uses its own method for splitting the set of items. The C4.5 algorithm [6] bases its splits on information gain (normalized as the gain ratio), a notion derived from the concept of entropy. Another well-known algorithm, CART [10], uses the Gini impurity, which measures how often a randomly chosen item from the dataset would be incorrectly labelled if it were labelled at random according to the distribution of labels in the subset.

Decision trees can be used to classify unseen instances: a tree is induced from a training dataset, and rules describing the dataset can easily be extracted from it. Another advantage of decision trees is that they can handle both categorical and numerical data. They can also classify large datasets in a short time.

4. DECISION TREES IN NATURAL LANGUAGE PROCESSING

The challenge in natural language processing is to select the "best" linguistic knowledge to use when trying to solve a problem. There are many situations in which an ambiguous case (e.g. in part-of-speech tagging) must be resolved by making a decision, and decision trees are one of the best methods for decision making. They can be used to resolve ambiguities at every linguistic level, from ambiguities in phonetics to the understanding of a dialog. We therefore present some uses of decision trees in natural language processing, organized by the type of linguistic knowledge involved.

Parts of speech are very important in morphology because they give a large amount of information about a word, its neighbors and the way the word is pronounced. The problem of assigning parts of speech to words (part-of-speech tagging) is therefore very important in speech and language processing. Morphological parsing plays a crucial role in part-of-speech tagging for morphologically complex languages [7]. The structures that morphological parsers produce can take many forms: strings, trees, or networks. Thus, in the 1990s, algorithms based on decision trees were developed [9], [11].
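To make the splitting criteria of Section 3 concrete, the sketch below computes entropy, information gain, gain ratio and Gini impurity on a toy part-of-speech disambiguation set. It is a minimal illustration and not the C4.5 or CART implementation: the four training rows, the attribute names `prev_tag` and `capitalized`, and the NOUN/VERB labels are all invented for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: chance of mislabelling an item drawn and labelled at random."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attr):
    """Information gain divided by the entropy of the attribute's own values."""
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        return 0.0
    return information_gain(rows, labels, attr) / split_info


# Hypothetical training data: contexts of one ambiguous word (noun or verb).
ROWS = [
    {"prev_tag": "DET", "capitalized": "no"},
    {"prev_tag": "DET", "capitalized": "yes"},
    {"prev_tag": "PRON", "capitalized": "no"},
    {"prev_tag": "PRON", "capitalized": "yes"},
]
LABELS = ["NOUN", "NOUN", "VERB", "VERB"]

if __name__ == "__main__":
    for attr in ("prev_tag", "capitalized"):
        print(attr, information_gain(ROWS, LABELS, attr), gain_ratio(ROWS, LABELS, attr))
    print("gini of the full set:", gini(LABELS))
```

On this toy set `prev_tag` separates the two tags perfectly while `capitalized` carries no information, so a tree learner using either criterion would split on `prev_tag` first.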

Initially, hand-written rules were used for morphological parsing (the process of finding the constituent morphemes of a word); later, supervised machine learning methods such as decision trees were applied to this task [8]. Recent research has focused on unsupervised ways of automatically inducing morphological structure without labeled training data [2], [3].

The next kind of linguistic knowledge needed for understanding a statement comes from the syntactic level. The way words are arranged together helps us understand which sequences of words make sense and which do not. All linguistic knowledge about which words can be grouped together, and how, belongs to the syntax level. Ambiguity occurs at this level because the grammar sometimes assigns two or more possible parse trees to a sentence. Syntactic parsing resolves this structural ambiguity by searching the space of all possible parse trees to find the correct parse tree for each sentence. Today there are many parsing algorithms that use context-free grammars to produce syntactic trees, which are later used for semantic analysis in applications such as machine translation, question answering, information extraction and grammar checking.

Decision trees, together with other methods, were used to develop probabilistic grammars [9], which were later used both for disambiguation in syntactic parsing and for word prediction in language modeling. In the 1990s, research was carried out on adding lexical dependencies to probabilistic context-free grammars, so that these grammars became more sensitive to syntactic structure. The lexicalized probabilistic approach has been used to resolve prepositional-phrase attachment with decision trees that employ the semantic distance between heads [12]. Also, because decision trees can model language structure well, they have been used to develop statistical models for parsing [1].
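The structural ambiguity described above can be illustrated by counting how many parse trees a context-free grammar assigns to a sentence. The sketch below uses a CKY-style chart over a tiny grammar in Chomsky normal form; the grammar, the lexicon and the English example sentence are hypothetical and chosen only to show a prepositional-phrase attachment ambiguity. It counts trees but does not choose between them, which is the step where a decision tree or a probabilistic model would come in.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form:
# (left child, right child) -> list of possible parent symbols.
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attaches to the verb phrase
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],   # PP attaches to the noun phrase
    ("P", "NP"): ["PP"],
}

# Hypothetical lexicon: word -> list of possible categories.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(words, start="S"):
    """Return the number of distinct parse trees rooted in `start`."""
    n = len(words)
    # chart[(i, j)][symbol] = number of trees for words[i:j] rooted in symbol
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for symbol in LEXICON[word]:
            chart[(i, i + 1)][symbol] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for left, left_count in list(chart[(i, k)].items()):
                    for right, right_count in list(chart[(k, j)].items()):
                        for parent in BINARY_RULES.get((left, right), []):
                            chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)][start]


if __name__ == "__main__":
    sentence = "I saw the man with the telescope".split()
    print(count_parses(sentence))
```

For the example sentence the grammar yields two trees: one where "with the telescope" modifies the seeing, and one where it modifies "the man".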
Analysis at the syntactic level plays an important role in natural language processing, because knowledge from this level supports the analysis at the next higher level (semantic analysis). The meaning of linguistic word sequences can be captured in formal structures, called meaning representations, which link the word sequences (linguistic knowledge) to non-linguistic knowledge in order to perform tasks that involve meaning. Before sentences can be understood, however, the word sense disambiguation problem must be solved. Examining each word in a text and determining which of its senses is used in that context is not an easy task, because many words have more than one meaning.

One of the methods used in sense disambiguation is supervised learning. In the supervised approach, a corpus that was hand-labeled with the correct word senses is used to extract a set of features that help predict particular senses. These features are then used to train the system's classifier (naive Bayes classifier, decision list classifier, decision tree classifier, etc.). Considerable research on sense disambiguation has used methods such as semantic networks, naive Bayes and decision list classifiers. With the increasing interest in supervised machine learning approaches to sense disambiguation, decision tree learning began to be used for this task [13]. Decision tree learning combined with other methods has also been used for detecting part-whole relations [14], noun compound relations [15], [16], noun-modifier relations [17], and semantic roles [18], [19].

Another problem addressed with decision trees is the modeling of long-distance dependencies in sentences, which n-gram models cannot capture [20]. However, the decision tree-based language model obtained in this way showed only a slight improvement in perplexity over the standard n-gram model.
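Perplexity, the measure used in this comparison, is the exponentiated average negative log-probability that a model assigns to each word of a test sequence; lower is better. The sketch below is a minimal illustration with invented per-word probabilities for two hypothetical models. The linear interpolation with weight `lam` is one common way of combining two language models and is an assumption here, not a description of the procedure in [20].

```python
import math


def perplexity(word_probs):
    """Perplexity of a sequence, given the probability assigned to each word."""
    log_sum = sum(math.log2(p) for p in word_probs)
    return 2 ** (-log_sum / len(word_probs))


def interpolate(probs_a, probs_b, lam=0.5):
    """Per-word linear interpolation of two models' probabilities."""
    return [lam * a + (1 - lam) * b for a, b in zip(probs_a, probs_b)]


# Invented probabilities that two hypothetical models assign to the same
# four-word test sequence (not taken from any experiment).
TREE_MODEL = [0.10, 0.40, 0.05, 0.20]
NGRAM_MODEL = [0.30, 0.10, 0.20, 0.05]

if __name__ == "__main__":
    combined = interpolate(TREE_MODEL, NGRAM_MODEL)
    print("tree model :", perplexity(TREE_MODEL))
    print("n-gram     :", perplexity(NGRAM_MODEL))
    print("combined   :", perplexity(combined))
```

In this toy data each model is confident where the other is not, so the interpolated model has lower perplexity than either one alone. That outcome depends on the numbers chosen and is not guaranteed in general.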
The authors therefore suggested that the two models (the decision tree-based language model and the n-gram model) be used together in order to obtain a much lower perplexity.

Decision trees can also be used in pragmatics. Interpreting dialog acts assumes that the system must decide whether a given input is a statement, a question, a directive, or an acknowledgement. Decision trees were one of the methods used to train the prosodic predictor that solves this task [21]. Decision trees have also been used for supervised discourse segmentation [22].

5. EXPERIMENTS AND RESULTS

As mentioned before, decision trees are used in natural language processing for detecting different kinds of syntactic and semantic relations. In this paper we focus only on the possession relation between two nouns in Romanian, determined using semantic criteria, and we try to detect this relation as encoded in lexico-syntactic patterns. We use the following definition of the possession relation between two nouns: an entity (A) is possessed/owned by an animate entity (B) [23]. For example, in Romanian the possession relation between two nouns, the possessed object and the possessor, is expressed by the genitive case (figure 2).

Noun-noun example       MSD                    English translation
părul Juliei            Ncmsry Npfsoy          (Julia's hair)
membrii unei familii    Ncmpry Tifso Ncfson    (members of a family)
ochii lui Winston       Ncmpry Tf-so Np        (Winston's eyes)

Figure 2. Some of the lexico-syntactic patterns that encode the possession relation, extracted from the Romanian translation of Orwell's novel "Nineteen Eighty-Four" (from the MULTEXT-East corpus [4])

To discover the possession relations between two nouns we used C4.5 decision tree learning [6] as implemented in WEKA [5]. To detect this relation, we chose for the learning algorithm a set of linguistic factors extracted from the linguistically annotated corpus consisting of the Romanian translation of Orwell's novel "Nineteen Eighty-Four" [4]. We had 179 relation classes and a total of 410 instances. The obtained tree had a size of 134, with 88 leaves. The selected features were:
- articulation type, number, gender, case and instances for the nouns;
- morphosyntactic descriptions and instances for the link word.

The decision tree obtained for the possession relation encoded in lexico-syntactic patterns extracted from the Romanian translation of Orwell's novel "Nineteen Eighty-Four" is presented in figure 3.

Figure 3. The decision tree obtained for the possession relation encoded in lexico-syntactic patterns extracted from the Romanian translation of Orwell's novel "Nineteen Eighty-Four"

Table 1. Ranked attributes for the possession relation between two nouns for information gain (a) and gain ratio (b)

Ranked attributes for information gain (a)    Ranked attributes for gain ratio (b)
gender_possessednoun                          instancepossessednoun 1
exist_possessivearticle                       instancepossessornoun 1
no_possessornoun                              MSD_PossessiveArticle 1
case_possessednoun                            instancefirstlinkword 1
gender_possessornoun                          case_possessednoun 1
no_possessednoun                              case_possessornoun 1
case_possessornoun                            exist_possessivearticle 1
articulate_possessornoun                      gender_possessornoun 1
articulate_possessednoun                      no_possessednoun
instancepossessornoun                         gender_possessednoun
instancepossessednoun                         no_possessornoun
MSD_PossessiveArticle                         articulate_possessornoun
instancefirstlinkword                         articulate_possessednoun

Each instance has 13 attributes, but some of them are more relevant than others. To decide which attributes are the most relevant in our decision trees, we used information gain and gain ratio. Table 1 lists the ranked attributes for the selected possession relation between two nouns for information gain (a) and gain ratio (b).

To estimate the performance of the experiment we use the following evaluation measures [5], where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives for a given class:

a. Kappa statistic measures the difference between the observed agreement P_o and the expected (chance) agreement P_e on a dataset, standardized to lie between 1 (perfect agreement) and -1 (perfect disagreement), with 0 meaning exactly chance-level agreement:
$$\kappa = \frac{P_o - P_e}{1 - P_e} \qquad (1)$$

b. True positive rate (also called sensitivity or, in our case, recall) is the percentage of instances of a given class that are correctly classified as that class:
$$TPR = \frac{TP}{TP + FN} \qquad (2)$$

c. False positive rate is the percentage of instances of other classes that are incorrectly classified as the given class:
$$FPR = \frac{FP}{FP + TN} \qquad (3)$$

d. Precision is the positive predictive value, a measure of a classifier's exactness. It is the percentage of instances correctly classified as positive out of all instances the algorithm classified as positive:
$$Precision = \frac{TP}{TP + FP} \qquad (4)$$

e. F-measure is a combined measure of precision and recall:
$$F = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \qquad (5)$$

f. ROC area (area under the receiver operating characteristic curve) is the area under the curve obtained by plotting the true positive rate against the false positive rate. A good classifier has a ROC area close to 1.

g. Accuracy is the percentage of correctly classified instances:
$$A = \frac{N_{correct}}{N_{total}} \cdot 100 \qquad (6)$$

For the selected features described above we obtained an accuracy of % when using the training set (table 2) and an accuracy of % when using cross-validation with 10 folds (table 3).

Table 2. The weighted average values and accuracy of the two-noun relation for the training set
Weighted average value for all classes: Kappa statistic, TP Rate (Recall), FP Rate, Precision, F-Measure, ROC Area
Accuracy (%): Correctly classified instances

Table 3. The weighted average values and accuracy of the two-noun relation for cross-validation (10 folds)
Weighted average value for all classes: Kappa statistic, TP Rate (Recall), FP Rate, Precision, F-Measure, ROC Area
Accuracy (%): Correctly classified instances

6. CONCLUSIONS

As presented above, many tasks in natural language processing can be solved using decision trees. They can be used for modeling different kinds of linguistic data, from phonetic to pragmatic knowledge, and combined with other methods they are a powerful tool in natural language processing.

In our case, we extracted 410 possession relations between two nouns, determined using semantic criteria, from the linguistically annotated corpus consisting of the Romanian translation of Orwell's novel "Nineteen Eighty-Four" [4]. To detect those possession relations between nouns we used C4.5 decision tree learning [6] as implemented in WEKA [5] and this set of 13 morphosyntactic factors:
- articulation type, number, gender, case and instances for the nouns;
- morphosyntactic descriptions and instances for the link word.

Using these features, the relations were grouped into 179 classes. The decision tree had 88 leaves and its size was 134. The results were:
- 90% accuracy and ROC curve for the training set;
- % accuracy and ROC curve using cross-validation with 10 folds.
In conclusion, modeling the possession relation between two nouns using decision trees seems promising and useful for Romanian language modeling. To improve the results we could use more possessive relations between two or more nouns, more or different features, a semantic interpretation of the possession relation, etc.

REFERENCES

1. Magerman, D. M., Statistical decision-tree models for parsing, ACL-95, ACL.
2. Monson, C., ParaMor: From Paradigm Structure to Natural Language Morphology Induction, PhD Thesis, Carnegie Mellon University.
3. Creutz, M., Lagus, K., Unsupervised models for morpheme segmentation and morphology learning, ACM Transactions on Speech and Language Processing, Volume 4, Issue 1, January 2007.
4. Erjavec, T., MULTEXT-East Version 4: Multilingual Morphosyntactic Specifications, Lexicons and Corpora, Proceedings of the Seventh International Conference on Language Resources and Evaluation, LREC'10, ELRA, Paris.
5. Witten, I. H., Frank, E., Hall, M. A., Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, Morgan Kaufmann Publishers.
6. Quinlan, J. R., Induction of decision trees, Machine Learning, 1, 1986.
7. Jurafsky, D., Martin, J. H., Speech and Language Processing: Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd Ed., Prentice Hall, New Jersey.
8. Van den Bosch, A., Learning to Pronounce Written Words: A Study in Inductive Language Learning, Ph.D. thesis, University of Maastricht, Maastricht, The Netherlands.
9. Jelinek, F., Lafferty, J. D., Magerman, D. M., Mercer, R. L., Ratnaparkhi, A., Roukos, S., Decision tree parsing using a hidden derivation model, ARPA Human Language Technologies Workshop, Plainsboro, Morgan Kaufmann.
10. Breiman, L., et al., Classification and Regression Trees, Wadsworth, Pacific Grove, CA.
11. Heeman, P. A., POS tags and decision trees for language modeling, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-99).
12. Stetina, J., Nagao, M., Corpus based PP attachment ambiguity resolution with a semantic dictionary, in Zhou, J. and Church, K. W. (Eds.), Proceedings of the Fifth Workshop on Very Large Corpora, Beijing, China, ACL.
13. Black, E., An experiment in computational discrimination of English word senses, IBM Journal of Research and Development, 3(2).
14. Girju, R., Badulescu, A., Moldovan, D., Learning semantic constraints for the automatic discovery of part-whole relations, Proceedings of the Human Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada.
15. Rosario, B., Hearst, M., Classifying the Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy, Proceedings of the Conference on EMNLP.
16. Rosario, B., Hearst, M., Fillmore, C., The Descent of Hierarchy, and Selection in Relational Semantics, Proceedings of ACL.
17. Nastase, V., Szpakowicz, S., Exploring Noun-Modifier Semantic Relations, International Workshop on Computational Semantics, Tilburg, Netherlands, January.
18. Surdeanu, M., Harabagiu, S., Williams, J., Aarseth, P., Using predicate-argument structures for information extraction, Proceedings of the 41st Annual Conference of the Association for Computational Linguistics (ACL-03), pages 8-15.
19. Chen, J., Rambow, O., Use of Deep Linguistic Features for the Recognition and Labeling of Semantic Arguments, Proceedings of EMNLP-2003, Sapporo, Japan.
20. Bahl, L. R., Brown, P. F., de Souza, P. V., Mercer, R. L., A tree-based statistical language model for natural language speech recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 7.
21. Shriberg, E., Bates, R., Taylor, P., Stolcke, A., Jurafsky, D., Ries, K., Coccaro, N., Martin, R., Meteer, M., Van Ess-Dykema, C., Can prosody aid the automatic classification of dialog acts in conversational speech?, Language and Speech (Special Issue on Prosody and Conversation), 41(3-4).
22. McCarthy, J. F., Lehnert, W. G., Using decision trees for coreference resolution, IJCAI-95, Montreal, Canada.
23. Moldovan, D., Badulescu, A., Tatu, M., Antohe, D., Girju, R., Models for the Semantic Classification of Noun Phrases, Computational Lexical Semantics Workshop, Human Language Technology Conference (HLT-NAACL), Boston, USA, May 2004.

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

cmp-lg/ Jan 1998

cmp-lg/ Jan 1998 Identifying Discourse Markers in Spoken Dialog Peter A. Heeman and Donna Byron and James F. Allen Computer Science and Engineering Department of Computer Science Oregon Graduate Institute University of

More information

Applications of memory-based natural language processing

Applications of memory-based natural language processing Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal

More information

Parsing of part-of-speech tagged Assamese Texts

Parsing of part-of-speech tagged Assamese Texts IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal

More information

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

Rule Learning With Negation: Issues Regarding Effectiveness

Rule Learning With Negation: Issues Regarding Effectiveness Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

Disambiguation of Thai Personal Name from Online News Articles

Disambiguation of Thai Personal Name from Online News Articles Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

Memory-based grammatical error correction

Memory-based grammatical error correction Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures

Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Ulrike Baldewein (ulrike@coli.uni-sb.de) Computational Psycholinguistics, Saarland University D-66041 Saarbrücken,

More information

Dialog Act Classification Using N-Gram Algorithms

Dialog Act Classification Using N-Gram Algorithms Dialog Act Classification Using N-Gram Algorithms Max Louwerse and Scott Crossley Institute for Intelligent Systems University of Memphis {max, scrossley } @ mail.psyc.memphis.edu Abstract Speech act classification

More information

LING 329 : MORPHOLOGY

LING 329 : MORPHOLOGY LING 329 : MORPHOLOGY TTh 10:30 11:50 AM, Physics 121 Course Syllabus Spring 2013 Matt Pearson Office: Vollum 313 Email: pearsonm@reed.edu Phone: 7618 (off campus: 503-517-7618) Office hrs: Mon 1:30 2:30,

More information

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information

Rule Learning with Negation: Issues Regarding Effectiveness

Rule Learning with Negation: Issues Regarding Effectiveness Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX

More information

Prediction of Maximal Projection for Semantic Role Labeling

Prediction of Maximal Projection for Semantic Role Labeling Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each

More information

The stages of event extraction
