CIS 630 Class Project
Szu-ting Yi and Susan Converse
18 December 2000

I. Introduction

Compound nouns, or noun-noun compounds, are prevalent in both English and Chinese. Handling them properly poses several challenges for NLP systems. First there is the problem of simply identifying the compounds: recognizing what their boundaries are and what the components are. For both English and Chinese, determining the components is related to the problem of deciding what unit size should be stored in the lexicon, and in the Chinese case it is part of the problem (or solution) of segmenting the text into "words." Second, there is the problem of determining the syntactic structure of the compound itself (assuming a binary-branching parse structure). As with the prepositional phrase attachment problem, when there are three or more nouns in a compound, there is a choice of which components bind more closely to one another. That is, given N1 N2 N3, should the modification relations be left-bracketed as [[N1 N2] N3] or right-bracketed as [N1 [N2 N3]]? Finally, there is the challenge of determining the semantic relation(s) between or among the components and interpreting the whole compound according to the choices made while partitioning and parsing the compound.

In a 1995 ACL paper, Mark Lauer (1) proposed some statistical methods for the syntactic analysis of compound nouns, addressing the second of the three problems just outlined. Although Lauer was working on English compounds, his methods are not in principle language-specific, and in this paper we have attempted to apply a similar approach to Chinese noun compounds.

II. Goal and Methods

The goal of our system is to decide the syntactic structure of a three-noun sequence. That is, given a noun triple, our system determines whether it is left-branching or right-branching. To achieve this goal, we tested two different models using four different statistical methods. The two models, called the _adjacency model_ and the _dependency model_, are taken from Lauer's paper. In each model, the strength of the association between two nouns in the triple (e.g., n1 and n2) is compared to the strength of the other possible pairing (e.g., n2 with n3). For the _adjacency model_ (the model most often found in the literature) the pairings are the two adjacent pairs (n1,n2) and (n2,n3).

If the (n1,n2) bond is stronger than the (n2,n3) bond (where the meaning of "stronger" will be discussed presently), then the compound is left-branching; otherwise the compound is assumed to be right-branching. Lauer argues that the _dependency model_ that he describes in his paper is a better predictor of branching behavior. The _dependency model_ compares the strength of the (n1,n2) bond against the strength of the (n1,n3) bond instead of that of (n2,n3).

Figure 1: In the ADJACENCY model, a left-branching analysis corresponds to the (NN1,NN2) bond and a right-branching analysis to the (NN2,NN3) bond; in the DEPENDENCY model, left corresponds to the (NN1,NN2) bond and right to the (NN1,NN3) bond.

Lauer uses the example "calcium ion exchange" to illustrate the motivation for the dependency model. While there appears to be no a priori reason that either "calcium ion" or "ion exchange" would be more frequent than the other, the general understanding of the compound is that calcium is characterizing the ions, not the exchange (we don't say "calcium exchange," except in the rare nutritional setting), and we would therefore bracket the compound as follows:

[[calcium ion] exchange]

On the flip side, given the compound "IRCS Friday colloquium," the association between "IRCS" and "colloquium" is far greater than the association between "IRCS" and "Friday," and we would bracket the compound

[IRCS [Friday colloquium]].

Given these two models, we measured the strength of the association between the two nouns in a pair using four different statistical functions. For example, in the adjacency model, given a noun triple (n1 n2 n3), we calculate function(n1,n2) and function(n2,n3) and compare the values returned for the two pairs. If the former value is greater than the latter, we say that (n1,n2) are more likely to be grouped together than (n2,n3), and thus (n1 n2 n3) should have a left-branching structure; otherwise we say the likelihood of (n2,n3) being grouped together is larger, and we assign the noun triple a right-branching structure.
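This decision procedure can be summarized with a short sketch (Python; the function names are illustrative, not the project's actual code). Here assoc stands in for whichever of the four statistical functions described below is being used:

from typing import Callable, Tuple

Assoc = Callable[[str, str], float]

def bracket(triple: Tuple[str, str, str], assoc: Assoc, model: str = "adjacency") -> str:
    """Return 'left' for [[n1 n2] n3] or 'right' for [n1 [n2 n3]]."""
    n1, n2, n3 = triple
    left_score = assoc(n1, n2)
    # The adjacency model competes (n1,n2) against (n2,n3);
    # the dependency model competes (n1,n2) against (n1,n3).
    right_score = assoc(n2, n3) if model == "adjacency" else assoc(n1, n3)
    return "left" if left_score > right_score else "right"

For example, bracket(("calcium", "ion", "exchange"), assoc, model="dependency") would return "left" whenever assoc("calcium", "ion") exceeds assoc("calcium", "exchange").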

We have tried four statistical methods for the function mentioned above: a pure frequency method, mutual information, the t test, and the chi-square test. The details are as follows.

a) pure word frequency method:

function(n1,n2) simply returns the frequency with which the noun pair (n1,n2) appears in the corpus, so the frequencies of (n1,n2) and (n2,n3) determine the final decision. This is, of course, the simplest method.

The following three methods are all well known in the statistical natural language processing literature for identifying collocations. The goals and the logic of applying these statistical tests are quite similar; only the motivations behind what counts as evidence of a dependency differ. When computing statistical-function(n1,n2), we first assume the null hypothesis -- that if two things occur together it is a coincidence and not the result of some dependency between the two events -- and then see whether the value returned by the statistical test is large enough to reject the null hypothesis. If it is, we can state that the two things are dependent. The null hypothesis used in our experiments is that the probability of seeing the pair (n1,n2) is the probability of seeing n1 times the probability of seeing n2, i.e., that the two nouns occur independently.

b) mutual information method:

The measure we use is pointwise mutual information, I(n1,n2) = log( P(n1,n2) / ( P(n1) P(n2) ) ). Mutual information is an information-theoretically motivated measure for discovering interesting collocations; roughly, it measures how much one word tells us about the other. The mutual information score depends heavily on the frequency of the individual words, however, so instead of calling it a good measure of dependency, we would rather say it is a good measure of _independence_. In addition, our data do not have good distributions. We therefore also tried the t test and the chi-square test.

c) t test

Another test that has been widely used for determining the degree of dependence between two things is the t test. The t test looks at the difference between the observed and expected means, scaled by the variance of the data, and tells us how likely it is to get a sample of that mean and variance assuming that the sample is drawn from a normal distribution with that expected mean.

d) chi-square test

The t test has been criticized because it assumes that probabilities are approximately normally distributed, which is not true in general. An alternative test for dependence that does not assume normally distributed probabilities is the chi-square test. Just as application of the t test is problematic because of the underlying normality assumption, application of the chi-square test is problematic in cases where the sample size is too small.
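As a rough illustration, the four measures might be computed from simple unigram and pair counts as in the following sketch. It treats both kinds of counts as drawn from a single sample of n_obs observations, as in the standard collocation setting; all names here are ours, not the project's.

import math
from collections import Counter

def measures(pair_counts: Counter, noun_counts: Counter, n_obs: int):
    """Return a dict mapping measure names to association functions over noun pairs."""

    def freq(n1, n2):
        return pair_counts[(n1, n2)]

    def pmi(n1, n2):
        # pointwise mutual information: log P(n1,n2) / (P(n1) P(n2))
        p12 = pair_counts[(n1, n2)] / n_obs
        p1 = noun_counts[n1] / n_obs
        p2 = noun_counts[n2] / n_obs
        return math.log2(p12 / (p1 * p2)) if p12 > 0 else float("-inf")

    def t_test(n1, n2):
        # (observed mean - expected mean) / sqrt(s^2 / N),
        # approximating the sample variance s^2 by the observed proportion p12
        p12 = pair_counts[(n1, n2)] / n_obs
        mu = (noun_counts[n1] / n_obs) * (noun_counts[n2] / n_obs)
        return (p12 - mu) / math.sqrt(p12 / n_obs) if p12 > 0 else 0.0

    def chi_square(n1, n2):
        # 2x2 contingency table: pair together vs. each noun occurring without the other
        o11 = pair_counts[(n1, n2)]
        o12 = noun_counts[n1] - o11
        o21 = noun_counts[n2] - o11
        o22 = n_obs - o11 - o12 - o21
        num = n_obs * (o11 * o22 - o12 * o21) ** 2
        den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
        return num / den if den else 0.0

    return {"frequency": freq, "mutual information": pmi,
            "t test": t_test, "chi-square": chi_square}

Any one of these can then be handed to the decision procedure sketched above, e.g. assoc = measures(pair_counts, noun_counts, n_obs)["t test"].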

III. The Data

The Corpus

We used the Chinese Treebank, a parsed corpus of newswire articles from Xinhuashe (New China News Agency). There are 325 separate articles, with a total of 99,720 words. "Word" in this case is anything tagged with a part-of-speech tag, including punctuation (see the Treebank website for the tagging guidelines). A (non-punctuation) word is usually two characters in length, with single- and triple-character words the next most common. Proper names may contain many characters. In this corpus there are 4,453 different noun types (unique nouns). Of these, 2,191 (just about half) occur only once, and 650 appear only twice. These statistics include all tagged text, including headlines and bylines. Because of the nature of the corpus, a few words have artificially high frequencies, such as _dian_ (equivalent to "wire" in a byline; the 7th most frequent noun) and Xinhuashe (New China News Agency, which also appears in bylines and is the 24th most frequent noun). There are approximately 3,200 sentences, excluding headlines and bylines.

The Training Data

Two versions of the training data were extracted from the sentences (but not the headlines or bylines) of the Chinese Treebank: a syntax-blind version and an enhanced version. We used only part-of-speech tags in extracting the blind version of the training data. For the enhanced version, we took the bracketing information from the Treebank parses into account. We call the latter training set the "enhanced" version not only because we are taking advantage of more linguistic knowledge, but also because we hope to get better results from it. Take the following sentence as an example (the terminal nodes are shown here in pinyin, with English glosses to the right):

((IP (LCP-TMP (NP (NT jin1nian2))                      -- this year
              (LC lai2))                               -- coming
     (PU )
     (NP-TPC (DNP (LCP (NP (NP-PN-APP (NR zhong1)      -- China
                                      (NR han2))       -- Korea
                           (QP (CD liang3))            -- two
                           (NP (NN guo2)))             -- country
                       (LC zhi1jian4))                 -- between
                  (DEG de0))                           -- particle
             (NP (NN jing1mao4)                        -- economics & trade
                 (NN wang3lai2)))                      -- contacts
     (NP-SBJ (NN fa1zhan3))                            -- expansion
     (VP (VA xun4su4))                                 -- rapid
     (PU )))

In the syntax-blind version, any adjacent pair of nouns was counted, ignoring syntactic bracketing. This would be equivalent to only POS-tagging a corpus and then taking any adjacent pairing of nouns. In the above sentence, both the pair (jing1mao4 wang3lai2), where the two nouns appear in the same phrase, and the pair (wang3lai2 fa1zhan3), in which the nouns are in different phrases, would be extracted. (Note that in addition to producing pairs that are not motivated by constituent structure, this clearly biases against the dependency calculations, since no (n1,n3) pair would ever be counted.) In the "enhanced" training data, only the pair (jing1mao4 wang3lai2) would be counted, since it appears within a single NP. In addition to not crossing constituent boundaries, the "enhanced" version of the noun pair extraction did not extract pairs from noun compounds of three or more nouns. For example, no noun pair or triple was extracted from the following four-noun expression:

(IP (NP-SBJ (NN wai4zi1)       -- foreign capital/investment
            (NN qi3ye4)        -- enterprise/business
            (NN chu1kou3)      -- export
            (NN chan3pin3))    -- products/produce
    (VP (VV tui4shui4)))       -- drawback ...

The blind version of the pairs file, however, would contain three pairs from this expression.

In each set of training data we have two files. One is the single-noun file, with one noun and its corpus frequency on each line. The other is the noun-pair file, in which each line contains a noun pair and its corresponding frequency in the corpus.

Examining the training data, we found two serious limitations. First, because the data are so small in number, we have to face the sparse data problem. Second, the distribution of our training data is not as representative as that of a much larger data set would be. For example, in the single-noun file of the enhanced version we have 4,453 entries in total; among them, 2,191 nouns appear only once, 650 nouns appear twice, and about 72.3% of nouns appear three or fewer times. The situation is worse in the noun-pair file, where more than three fourths of the noun pairs appear only once.

The Test Data

The test data consist of a file of 1,186 noun triples, extracted syntactically using the same kinds of bracketing information that were used in extracting the "enhanced" pairs. Triples were not extracted from noun compounds of four or more nouns, nor were they extracted across constituent boundaries.
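The two extraction schemes can be sketched roughly as follows (Python, reading the bracketed parses with nltk's Tree class; the choice of nltk, and exactly which POS tags to treat as nouns, are our assumptions rather than a description of the scripts actually used):

from collections import Counter
from nltk.tree import Tree

NOUN_TAGS = {"NN", "NR", "NT"}   # assumed set of noun tags

def blind_pairs(parse: Tree) -> Counter:
    """Syntax-blind: count every adjacent pair of noun tokens in the POS sequence."""
    tokens = parse.pos()  # [(word, tag), ...]
    pairs = Counter()
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        if t1 in NOUN_TAGS and t2 in NOUN_TAGS:
            pairs[(w1, w2)] += 1
    return pairs

def enhanced_pairs(parse: Tree) -> Counter:
    """Enhanced: count a pair only inside a base NP containing exactly two nouns."""
    pairs = Counter()
    for np in parse.subtrees(lambda t: t.label().startswith("NP")):
        children = [c for c in np if isinstance(c, Tree)]
        if children and all(c.height() == 2 for c in children):   # base NP: preterminal children only
            nouns = [c[0] for c in children if c.label() in NOUN_TAGS]
            if len(nouns) == 2:
                pairs[(nouns[0], nouns[1])] += 1
    return pairs

On the example sentence above, blind_pairs would yield both (jing1mao4, wang3lai2) and (wang3lai2, fa1zhan3), while enhanced_pairs would yield only (jing1mao4, wang3lai2) and would skip the four-noun NP entirely.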

IV. Evaluation

We asked three native speakers of Chinese to decide the bracketing structure of the 1,186 noun triples in order to develop the gold standard. Of the three, two are from Taiwan and both are Linguistics majors; the third is from Mainland China and has no linguistic background. Since the Chinese Treebank data come from Mainland China news articles, we expected that the annotator from Mainland China would make the judgements in a more intuitive way, while the two annotators from Taiwan would draw on their linguistic knowledge. The three judges spent from 2 to 7 hours to finish the whole job, with an average of 4 hours. Several questions were raised while they were making their judgements: 1) flat structure versus binary branching structure; 2) implicit possessive usage; 3) lexical or syntactic ambiguity; 4) not understanding the term at all.

1) flat structure versus binary branching structure

Whether a noun sequence has a flat structure or a binary branching structure is always a debatable question. Sometimes a judge simply could not decide which two nouns should be paired together and had to leave all three nouns at the same level. The triple (sheng3 qu1 shi4 -- province district municipality) is one such example. Cases like these might be interpreted as really being (n1 conjunction n2 conjunction n3), with the conjunctions simply dropped. This is consistent with what Lauer found. He noted that some noun compounds will be ambiguous even in context: 35 out of his 279 noun triples could not be assigned either a left- or a right-branching structure, even when context was taken into account.

2) implicit possessive usage

At times, a Chinese noun pair (n1 n2) can be interpreted as a noun compound (n1n2) that can be paraphrased as (n1's n2). This kind of usage affects the decision of whether a noun triple has a left- or right-branching structure. For example, the first two nouns in (dang1di4 jing1ji4 zhi1zhu4 -- that locale economy mainstay) could be construed as "that locale's economy."

3) lexical or syntactic ambiguity

Although common sense might suggest that a word consisting of more than two characters would be less ambiguous (since each character can be seen as refining the meaning), there are nevertheless nouns with lexical or syntactic ambiguity, and these made it hard for the annotators to make their judgements.

4) not understanding the term at all

There were only a few of these cases. Each annotator had four to six terms that were unknown to them.

The Answer Keys

Based on the three sets of answers, we applied two different guidelines to produce two sets of answer keys. The first guideline is to use only the answers on which all three annotators agreed. By this standard we got 369 left-branching answers (L), 143 right-branching answers (R), 8 undecidable answers (U), and 666 triples for which at least one annotator gave a different answer (N). That means we have only 369 + 143 = 512 valid answers. We called this the agreement_all_three version. The second guideline is to use the answers on which at least two annotators agreed. By this standard we got 685 L, 365 R, 38 U, and 98 N, for a total of 685 + 365 = 1050 valid answers. This was called the agreement_above_two answer key.
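A minimal sketch of how the two answer keys can be derived from the three annotators' labels (one label 'L', 'R', or 'U' per triple; the data structures are illustrative):

from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def build_key(annotations: Dict[Triple, List[str]], min_agreement: int) -> Dict[Triple, str]:
    """Keep a triple only if at least `min_agreement` annotators gave the same
    L or R label; U labels and disagreements are excluded from the valid answers."""
    key = {}
    for triple, labels in annotations.items():
        label, count = Counter(labels).most_common(1)[0]
        if count >= min_agreement and label in ("L", "R"):
            key[triple] = label
    return key

The agreement_all_three key corresponds to min_agreement=3 and the agreement_above_two key to min_agreement=2.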

V. The Results

Using the agreement_all_three answer key and the adjacency model, we get the following results (all figures are percent correct for the method):

  adjacency model, blind version of noun pairs:
    chi-square test:     62.7%
    mutual information:  62.3%
    t test:              72.9%
    word frequency:      72.5%

  adjacency model, enhanced version of noun pairs:
    chi-square test:     68.0%
    mutual information:  75.9%
    t test:              76.0%
    word frequency:      78.9%

Using the agreement_all_three answer key and the dependency model, the results are as follows:

  dependency model, blind version of noun pairs:
    chi-square test:     72.5%
    mutual information:  71.9%
    t test:              73.5%
    word frequency:      72.8%

  dependency model, enhanced version of noun pairs:
    chi-square test:     74.8%
    mutual information:  71.3%
    t test:              71.3%
    word frequency:      74.0%

If we use the agreement_above_two answer key instead, the accuracies are distributed similarly but are lower in general. By comparison, assigning left-branching to all triples results in an accuracy of 72.1% against the agreement_all_three standard and 65.2% against the agreement_above_two standard.

VI. Discussion

Using the adjacency model, the "enhanced" version of the noun pairs outperformed the syntax-blind version by a wide margin: the accuracy difference ranges from 5% to 13%. If we switch to the dependency model, however, the difference is much smaller; it seems that the dependency model does not need as much linguistic information.

The chi-square method did not perform well, probably because the sample size is too small. Comparing the t test and mutual information, the t test works slightly better than the mutual information method. At the moment, the word frequency method is the best one. Although the simplest method can (perhaps should) of course be the best solution, this result was unexpected. Possible causes bring us back to our training data: it is not a complex data set, and because of its over-simplified distribution and small size, the simple word frequency method won out. If we could get more data, or found good ways to normalize our training data, we would expect the t test to perform best.
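For reference, the per-method accuracies and the all-left baseline reported above can be computed with a sketch like the following (building on the bracket and build_key sketches given earlier; all names are illustrative):

def accuracy(bracketer, key):
    """Fraction of gold triples whose predicted branching matches the key ('L'/'R')."""
    correct = sum(1 for triple, gold in key.items()
                  if ("L" if bracketer(triple) == "left" else "R") == gold)
    return correct / len(key)

def left_baseline_accuracy(key):
    """Accuracy of assigning left-branching to every triple in the key."""
    return sum(1 for gold in key.values() if gold == "L") / len(key)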

VII. Future Work

As indicated above, the system has a very severe sparse data problem: most of the single nouns and noun pairs appear only once. To alleviate this shortcoming, tokenizing our training set in a more coarse-grained manner might be a good approach. In other words, we would like to change the tokenization of our training data from the word level used thus far to a class or category level -- that is, to cluster our nouns into several categories. This is the approach Lauer used (borrowed from Resnik and Hearst (1993) (2)). He grouped words into classes and essentially compared the mutual information between the classes the words belonged to rather than between the words themselves. Lauer used Roget's Thesaurus for the classes. We will use HowNet, which can be treated as a Chinese semantic network, to define the categories we want. HowNet's noun category is a hierarchical structure with, at deepest, three levels. Here, we use only the information of the highest level. The algorithm for assigning a class name to a noun entry in our training data is shown below (a sketch follows at the end of this section):

1. Get the noun (n1) from our single-noun file.
2. Look it up in the HowNet dictionary.
3. Extract the definition part of that dictionary entry.
4. Extract the highest-level information from the definition obtained in step 3.
5. Assign the noun the class name obtained in step 4.
6. Proceed to the next noun entry.

We have done a preliminary test of this algorithm. However, not every noun in our training data has an entry as a noun in the HowNet dictionary, and the overlap between our nouns and the HowNet dictionary noun entries is surprisingly limited. This forces us toward one of two strategies. We can keep the classification guidelines derived from HowNet and assign the class name for each noun by hand. Another approach would be to check whether a Chinese Treebank noun that is not categorized as a noun in HowNet appears elsewhere in HowNet with another classification, such as a verb.
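A minimal sketch of the class-assignment step above. It assumes the HowNet noun entries have already been loaded into a plain dict mapping each noun to its definition string, with the highest-level sememe first; both points are simplifying assumptions about preprocessing, not a description of the actual HowNet file format.

from typing import Dict, Optional

def assign_class(noun: str, hownet_nouns: Dict[str, str]) -> Optional[str]:
    """Return the top-level class name for `noun`, or None if HowNet has no noun entry."""
    definition = hownet_nouns.get(noun)
    if definition is None:
        return None                      # must be classified by hand, or via another POS entry
    return definition.split(",")[0]      # keep only the highest-level sememe

def classify_single_noun_file(noun_freqs: Dict[str, int],
                              hownet_nouns: Dict[str, str]) -> Dict[str, str]:
    """Map every noun in the single-noun file to a class name, skipping unknown nouns."""
    classes = {}
    for noun in noun_freqs:
        cls = assign_class(noun, hownet_nouns)
        if cls is not None:
            classes[noun] = cls
    return classes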

References

(1) Mark Lauer, "Corpus Statistics Meet the Noun Compound: Some Empirical Results," pp. 47-54, _Proceedings of the Annual Meeting of the Association for Computational Linguistics_, 1995.

(2) P. Resnik and M. Hearst, "Structural Ambiguity and Conceptual Relations," pp. 58-64, _Proceedings of the Workshop on Very Large Corpora: Academic and Industrial Perspectives_, Ohio State University, June 1993.
