Word Sense Disambiguation

1 Word Sense Disambiguation
D. De Cao, R. Basili
Corso di Web Mining e Retrieval a.a
May 21, 2009
Excerpt of the R. Mihalcea and T. Pedersen AAAI 2005 Tutorial, at: tpederse/tutorials/advances-in-wsd-aaai-2005.ppt

5 Definitions
Word sense disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities.
  The sense inventory usually comes from a dictionary or thesaurus.
  Approaches: knowledge-intensive methods, supervised learning, and (sometimes) bootstrapping.
Word sense discrimination is the problem of dividing the usages of a word into different meanings, without regard to any particular existing sense inventory.
  Approaches: unsupervised techniques.

9 Computers versus Humans
Polysemy: most words have many possible meanings.
A computer program has no basis for knowing which one is appropriate, even if it is obvious to a human.
Ambiguity is rarely a problem for humans in their day-to-day communication, except in extreme cases.
Example:
  The fisherman jumped off the bank and into the water.
  The bank down the street was robbed!

10 Brief Historical Overview
Noted as a problem for Machine Translation (Weaver, 1949).
  A word can often only be translated if you know the specific sense intended (a bill in English could be a pico or a cuenta in Spanish).
Bar-Hillel (1960) posed the following:
  Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.
  Is pen a writing instrument or an enclosure where children play?
  He declared the problem unsolvable and left the field of MT!

11 Brief Historical Overview
1970s-1980s: rule-based systems
  Rely on hand-crafted knowledge sources
1990s: corpus-based approaches
  Dependence on sense-tagged text
  (Ide and Veronis, 1998) give an overview of the history from the early days to 1998
2000s: hybrid systems
  Minimizing or eliminating use of sense-tagged text
  Taking advantage of the Web

15 Practical Applications
Machine Translation
  Translate bill from English to Spanish: is it a pico or a cuenta? Is it a bird jaw or an invoice?
Information Retrieval
  Find all Web pages about cricket: the sport or the insect?
Question Answering
  What is George Miller's position on gun control? The psychologist or the US congressman?
Knowledge Acquisition
  Add to KB: Herb Bergson is the mayor of Duluth. Minnesota or Georgia?

19 Overview of the Problem
Many words have several meanings (homonymy / polysemy).
  Ex: chair - furniture or person
  Ex: child - young person or human offspring
Determine which sense of a word is used in a specific sentence.
Note: often, the different senses of a word are closely related.
  Ex: title - right of legal ownership; document that is evidence of the legal ownership
Sometimes, several senses can be activated in a single context (co-activation).
  Ex: This could bring competition to the trade.
  competition: the act of competing; the people who are competing

23 Word Senses
The meaning of a word in a given context.
Word sense representations:
With respect to a dictionary
  chair = a seat for one person, with a support for the back; "he put his coat over the back of the chair and sat down"
  chair = the position of professor; "he was awarded an endowed chair in economics"
With respect to the translation in a second language
  chair = chaise
  chair = directeur
With respect to the context where it occurs (discrimination)
  Sit on a chair. Take a seat on this chair.
  The chair of the Math Department. The chair of the meeting.

26 Approaches to Word Sense Disambiguation
Knowledge-Based Disambiguation
  Uses external lexical resources such as dictionaries and thesauri
  Uses discourse properties
Supervised Disambiguation
  Based on a labeled training set
  The learning system has a training set of feature-encoded inputs AND their appropriate sense label (category)
Unsupervised Disambiguation
  Based on unlabeled corpora
  The learning system has a training set of feature-encoded inputs BUT NOT their appropriate sense label (category)

31 All Words Word Sense Disambiguation
Attempt to disambiguate all open-class words in a text.
  He put his suit over the back of the chair
Knowledge-based approaches:
Use information from dictionaries
  Definitions / examples for each meaning
  Find similarity between definitions and current context
Position in a semantic network
  Find that table is closer to chair/furniture than to chair/person
Use discourse properties
  A word exhibits the same sense in a discourse / in a collocation

32 All Words Word Sense Disambiguation
Minimally supervised approaches
  Learn to disambiguate words using small annotated corpora
  E.g. SemCor: a corpus where all open-class words are disambiguated (200,000 running words)
Most frequent sense

36 Targeted Word Sense Disambiguation
Disambiguate one target word.
  Take a seat on this chair
  The chair of the Math Department
WSD is viewed as a typical classification problem: use machine learning techniques to train a system.
Training:
  Corpus of occurrences of the target word, each occurrence annotated with the appropriate sense
  Build feature vectors: a vector of relevant linguistic features that represents the context (e.g. a window of words around the target word)
Disambiguation:
  Disambiguate the target word in new, unseen text
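
The train-then-disambiguate loop described on this slide can be sketched with a small bag-of-words Naive Bayes classifier. The slides do not name a specific learner, so Naive Bayes with add-one smoothing is an assumption here, as are the function names and the four training sentences (taken from the chair examples used elsewhere in the deck).

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (context_string, sense_label) pairs."""
    word_counts = defaultdict(Counter)  # sense -> word -> count
    sense_counts = Counter()            # sense -> number of training examples
    vocab = set()
    for text, sense in examples:
        tokens = text.lower().split()
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return word_counts, sense_counts, vocab


def classify(text, model):
    """Return the sense with the highest log posterior for the given context."""
    word_counts, sense_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense in sorted(sense_counts):
        score = math.log(sense_counts[sense] / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][tok] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy sense-tagged corpus for the target word "chair" (illustrative only).
TRAIN = [
    ("Take a seat on this chair", "furniture"),
    ("Sit on a chair", "furniture"),
    ("The chair of the Math Department", "person"),
    ("The chair of the meeting", "person"),
]
MODEL = train_naive_bayes(TRAIN)
print(classify("Sit on this chair", MODEL))
print(classify("The chair of the committee", MODEL))
```

A real system would use the richer feature vectors described on the following slides rather than raw word counts.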

43 Targeted Word Sense Disambiguation
Take a window of n words around the target word.
Encode information about the words around the target word.
  Typical features include: words, root forms, POS tags, frequency, ...
Example: An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.
Surrounding context (local features)
  [ (guitar, NN1), (and, CJC), (player, NN1), (stand, VVB) ]
Frequent co-occurring words (topical features)
  [fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band]
  [0,0,0,1,0,0,0,0,0,0,1,0]
Other features:
  [followed by player, contains show in the sentence, ... ]
  [yes, no, ... ]
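
As a rough illustration of the local and topical features above, the following sketch extracts a window of two words on each side of the target word bass and a binary vector over the topical vocabulary listed on the slide. POS tags are omitted because they would require a tagger; the function names and the simple letter-only tokenizer are assumptions.

```python
import re


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def local_window(tokens, target, n=2):
    """Return the n tokens to the left and the n tokens to the right of the target."""
    i = tokens.index(target)
    return tokens[max(0, i - n):i] + tokens[i + 1:i + 1 + n]


def topical_vector(tokens, vocabulary):
    """Binary vector: 1 if the vocabulary word occurs anywhere in the sentence."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


SENTENCE = ("An electric guitar and bass player stand off to one side, "
            "not really part of the scene, just as a sort of nod to "
            "gringo expectations perhaps.")
TOPICAL_VOCAB = ["fishing", "big", "sound", "player", "fly", "rod",
                 "pound", "double", "runs", "playing", "guitar", "band"]

TOKENS = tokenize(SENTENCE)
LOCAL = local_window(TOKENS, "bass", n=2)
TOPICAL = topical_vector(TOKENS, TOPICAL_VOCAB)
print(LOCAL)
print(TOPICAL)
```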

46 Unsupervised Disambiguation
Disambiguate word senses:
  without supporting tools such as dictionaries and thesauri
  without a labeled training text
Without such resources, word senses are not labeled.
  We cannot say chair/furniture or chair/person
We can:
  Cluster/group the contexts of an ambiguous word into a number of groups
  Discriminate between these groups without actually labeling them

50 Unsupervised Disambiguation
Hypothesis: same senses of words will have similar neighboring words.
Disambiguation algorithm:
  Identify context vectors corresponding to all occurrences of a particular word
  Partition them into regions of high density
  Assign a sense to each such region
Example:
  Sit on a chair. Take a seat on this chair.
  The chair of the Math Department. The chair of the meeting.
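
One way to picture the grouping step is to treat each context as a set of neighboring words and merge contexts whose sets are similar enough. The sketch below swaps in Jaccard similarity and a simple threshold-based single-link merge for the density-based partitioning the slide mentions; the threshold of 0.2 and all names are assumptions. On such tiny contexts the groups are carried largely by function words, so this is only a toy.

```python
def context_set(sentence, target):
    """Words of the sentence other than the target word itself."""
    return {w for w in sentence.lower().split() if w != target}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_contexts(sentences, target, threshold=0.2):
    """Single-link grouping: merge two groups when any pair of contexts is similar enough."""
    contexts = [context_set(s, target) for s in sentences]
    labels = list(range(len(sentences)))  # every context starts in its own group
    for i in range(len(contexts)):
        for j in range(i + 1, len(contexts)):
            if jaccard(contexts[i], contexts[j]) >= threshold:
                old, new = labels[j], labels[i]
                labels = [new if label == old else label for label in labels]
    return labels


SENTENCES = [
    "Sit on a chair",
    "Take a seat on this chair",
    "The chair of the Math Department",
    "The chair of the meeting",
]
LABELS = cluster_contexts(SENTENCES, "chair")
print(LABELS)
```

The group identifiers are arbitrary: the procedure separates the two usages of chair but, as the slides note, cannot name them.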

51 Evaluating Word Sense Disambiguation
Metrics:
  Precision = percentage of words that are tagged correctly, out of the words addressed by the system
  Recall = percentage of words that are tagged correctly, out of all words in the test set
Special tags are possible:
  Unknown
  Proper noun
  Multiple senses
Compare to a gold standard:
  SEMCOR corpus, SENSEVAL corpus, ...
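
The two metrics differ only in their denominator, which a short sketch makes concrete. The instance identifiers, sense labels, and function name below are made up for illustration; the system is assumed to leave out instances it does not attempt.

```python
def precision_recall(gold, predicted):
    """gold: instance id -> correct sense; predicted: only the instances the system tagged."""
    attempted = [i for i in predicted if i in gold]
    correct = sum(1 for i in attempted if predicted[i] == gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


GOLD = {"w1": "chair/furniture", "w2": "chair/person",
        "w3": "bank/river", "w4": "bank/finance"}
# The system tags three of the four instances and gets two of them right.
PRED = {"w1": "chair/furniture", "w2": "chair/furniture", "w3": "bank/river"}

P, R = precision_recall(GOLD, PRED)
print(P, R)
```

Here precision is 2 out of 3 attempted words and recall is 2 out of 4 test words, so a system that abstains on hard cases can trade recall for precision.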

56 Evaluating Word Sense Disambiguation
Difficulty in evaluation:
  The nature of the senses to distinguish has a huge impact on results
  Coarse versus fine-grained sense distinction
Coarse distinction:
  chair = a seat for one person, with a support for the back; "he put his coat over the back of the chair and sat down"
  chair = the position of professor; "he was awarded an endowed chair in economics"
Fine-grained distinction:
  bank = a financial institution that accepts deposits and channels the money into lending activities; "he cashed a check at the bank"; "that bank holds the mortgage on my home"
  bank = a building in which commercial banking is transacted; "the bank is on the corner of Nassau and Witherspoon"
Sense maps
  Cluster similar senses
  Allow for both fine-grained and coarse-grained evaluation
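
A sense map can be thought of as a dictionary from fine-grained sense identifiers to coarse clusters, applied to both the gold and the predicted tags before scoring. The sense identifiers, the mapping, and the data below are hypothetical and only illustrate how the same output scores differently at the two granularities.

```python
# Hypothetical sense map: two fine-grained financial senses share one coarse cluster.
SENSE_MAP = {
    "bank#1": "bank/finance",   # financial institution
    "bank#2": "bank/finance",   # building where banking is transacted
    "bank#3": "bank/river",     # sloping land beside water
}


def accuracy(gold, predicted, sense_map=None):
    """Fraction of matching tags, optionally after mapping to coarse senses."""
    def coarse(sense):
        return sense_map.get(sense, sense) if sense_map else sense

    hits = sum(1 for g, p in zip(gold, predicted) if coarse(g) == coarse(p))
    return hits / len(gold)


GOLD = ["bank#1", "bank#2", "bank#3", "bank#1"]
PRED = ["bank#2", "bank#2", "bank#3", "bank#3"]

FINE = accuracy(GOLD, PRED)
COARSE = accuracy(GOLD, PRED, SENSE_MAP)
print(FINE, COARSE)
```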

59 Knowledge-based Methods for Word Sense Disambiguation
Knowledge-based WSD = class of WSD methods relying (mainly) on knowledge drawn from dictionaries and/or raw text.
Resources
  Yes: machine readable dictionaries, raw corpora
  No: manually annotated corpora
Scope
  All open-class words

62 Machine Readable Dictionaries
In recent years, most dictionaries have been made available in machine readable format (MRD):
  Oxford English Dictionary
  Collins
  Longman Dictionary of Contemporary English (LDOCE)
Thesauruses add synonymy information:
  Roget Thesaurus
Semantic networks add more semantic relations:
  WordNet
  EuroWordNet

64 MRD - A Resource for Knowledge-based WSD
For each word in the language vocabulary, an MRD provides:
  A list of meanings
  Definitions (for all word meanings)
  Typical usage examples (for most word meanings)
WordNet definitions/examples for the noun plant:
  1 buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles"
  2 a living organism lacking the power of locomotion
  3 something planted secretly for discovery by another; "the police used a plant to trick the thieves"; "he claimed that the evidence against him was a plant"
  4 an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience

68 MRD - A Resource for Knowledge-based WSD
A thesaurus adds:
  An explicit synonymy relation between word meanings
WordNet synsets for the noun plant:
  1 plant, works, industrial plant
  2 plant, flora, plant life
A semantic network adds:
  Hypernymy/hyponymy (IS-A), meronymy/holonymy (PART-OF), antonymy, entailment, etc.
WordNet related concepts for the meaning plant life - {plant, flora, plant life}:
  hypernym: {organism, being}
  hyponym: {house plant}, {fungus}, ...
  meronym: {plant tissue}, {plant part}
  holonym: {Plantae, kingdom Plantae, plant kingdom}

69 Lesk Algorithm
(Michael Lesk 1986): identify senses of words in context using definition overlap.
Algorithm:
  1 Retrieve from the MRD all sense definitions of the words to be disambiguated
  2 Determine the definition overlap for all possible sense combinations
  3 Choose the senses that lead to the highest overlap

71 Lesk Algorithm: Example
Disambiguate PINE CONE.
PINE
  1 kinds of evergreen tree with needle-shaped leaves
  2 waste away through sorrow or illness
CONE
  1 solid body which narrows to a point
  2 something of this shape whether solid or hollow
  3 fruit of certain evergreen trees
Definition overlaps:
  Pine#1 ∩ Cone#1 = 0
  Pine#2 ∩ Cone#1 = 0
  Pine#1 ∩ Cone#2 = 1
  Pine#2 ∩ Cone#2 = 0
  Pine#1 ∩ Cone#3 = 2
  Pine#2 ∩ Cone#3 = 0
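
The pairwise search over sense combinations can be sketched as below, using the PINE and CONE definitions from the slide. The stopword list and the crude plural stripping (so that trees matches tree) are assumptions; with them the winning pair and its overlap of 2 match the slide, but the counts for the other pairs depend on how function words are handled and may differ from the slide's figures.

```python
from itertools import product

# Small illustrative stopword list (an assumption, not part of the original algorithm).
STOPWORDS = {"a", "of", "or", "to", "with", "this", "which", "whether"}


def content_words(definition):
    """Lowercased definition words, minus stopwords, with a crude plural strip."""
    words = set()
    for tok in definition.lower().split():
        if tok in STOPWORDS:
            continue
        if tok.endswith("s") and not tok.endswith("ss") and len(tok) > 3:
            tok = tok[:-1]  # "trees" -> "tree", "kinds" -> "kind"
        words.add(tok)
    return words


def lesk_pair(senses_a, senses_b):
    """Try every sense combination and keep the one with the largest definition overlap."""
    best = None
    for (id_a, def_a), (id_b, def_b) in product(senses_a.items(), senses_b.items()):
        overlap = len(content_words(def_a) & content_words(def_b))
        if best is None or overlap > best[2]:
            best = (id_a, id_b, overlap)
    return best


PINE = {
    1: "kinds of evergreen tree with needle-shaped leaves",
    2: "waste away through sorrow or illness",
}
CONE = {
    1: "solid body which narrows to a point",
    2: "something of this shape whether solid or hollow",
    3: "fruit of certain evergreen trees",
}

BEST = lesk_pair(PINE, CONE)
print(BEST)
```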

76 Lesk Algorithm for More than Two Words?
I saw a man who is 98 years old and can still walk and tell jokes
Nine open-class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)
43,929,600 sense combinations! How to find the optimal sense combination?
Simulated annealing (Cowie, Guthrie, Guthrie 1992)
  Define a function E = combination of word senses in a given text.
  Find the combination of senses that leads to the highest definition overlap (redundancy).
  1 Start with E = the most frequent sense for each word
  2 At each iteration, replace the sense of a random word in the set with a different sense, and measure E
  3 Stop iterating when there is no change in the configuration of senses
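
The size of the search space is simply the product of the per-word sense counts, and the annealing loop can be sketched generically. In the sketch below the scoring function is a hypothetical stand-in for definition overlap, sense index 0 stands in for the most frequent sense, and a fixed number of steps replaces the "no change" stopping rule; the temperature schedule is likewise an assumption rather than the schedule of Cowie et al.

```python
import math
import random

SENSE_COUNTS = {"see": 26, "man": 11, "year": 4, "old": 8, "can": 5,
                "still": 4, "walk": 10, "tell": 8, "joke": 3}
N_COMBINATIONS = math.prod(SENSE_COUNTS.values())


def anneal(sense_counts, score, steps=1000, temperature=1.0, cooling=0.995, seed=0):
    """Search for a high-scoring sense assignment; returns (best assignment, best score)."""
    rng = random.Random(seed)
    words = list(sense_counts)
    current = {w: 0 for w in words}  # index 0 stands in for the most frequent sense
    current_score = score(current)
    best, best_score = dict(current), current_score
    for _ in range(steps):
        word = rng.choice(words)
        others = [s for s in range(sense_counts[word]) if s != current[word]]
        candidate = dict(current)
        candidate[word] = rng.choice(others)  # replace with a different sense
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse moves less often as we cool down.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = dict(current), current_score
        temperature = max(temperature * cooling, 1e-6)
    return best, best_score


def toy_score(assignment):
    """Hypothetical stand-in for definition overlap: rewards sense index 1."""
    return sum(1 for sense in assignment.values() if sense == 1)


BEST, BEST_SCORE = anneal(SENSE_COUNTS, toy_score)
print(N_COMBINATIONS, BEST_SCORE)
```

In a real setting `score` would sum the definition overlaps among the senses currently chosen for all words in the text.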

78 Lesk Algorithm: A Simplified Version
Original Lesk definition: measure overlap between sense definitions for all words in context.
  Identify simultaneously the correct senses for all words in context
Simplified Lesk (Kilgarriff & Rosenzweig 2000): measure overlap between the sense definitions of a word and the current context.
  Identify the correct sense for one word at a time
  Search space significantly reduced
Algorithm for simplified Lesk:
  1 Retrieve from the MRD all sense definitions of the word to be disambiguated
  2 Determine the overlap between each sense definition and the current context
  3 Choose the sense that leads to the highest overlap

80 Example of simplified Lesk
Disambiguate PINE in "Pine cones hanging in a tree".
PINE
  1 kinds of evergreen tree with needle-shaped leaves
  2 waste away through sorrow or illness
Overlap with the sentence:
  Pine#1 ∩ Sentence = 1
  Pine#2 ∩ Sentence = 0
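
The simplified variant needs only one pass over the senses of the target word. The sketch below counts shared lowercase tokens between each PINE definition and the example sentence, with no stopword removal or stemming (an assumption that happens to reproduce the overlaps shown above); the function names are illustrative.

```python
def tokens(text):
    """Set of lowercased whitespace-separated tokens."""
    return set(text.lower().split())


def simplified_lesk(context, senses):
    """Return the sense whose definition shares the most words with the context."""
    ctx = tokens(context)
    overlaps = {sid: len(tokens(definition) & ctx)
                for sid, definition in senses.items()}
    best = max(overlaps, key=overlaps.get)
    return best, overlaps


PINE = {
    1: "kinds of evergreen tree with needle-shaped leaves",
    2: "waste away through sorrow or illness",
}

BEST, OVERLAPS = simplified_lesk("Pine cones hanging in a tree", PINE)
print(BEST, OVERLAPS)
```

Because each word is handled independently, the cost grows with the sum of the sense counts rather than their product.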

81 Evaluations of Lesk Algorithm
Initial evaluation by M. Lesk
  50-70% on short samples of manually annotated text, with respect to the Oxford Advanced Learner's Dictionary
Simulated annealing
  47% on 50 manually annotated sentences
Evaluation on Senseval-2 all-words data, with back-off to random sense (Mihalcea & Tarau 2004)
  Original Lesk: 35%
  Simplified Lesk: 47%
Evaluation on Senseval-2 all-words data, with back-off to most frequent sense (Vasilescu, Langlais, Lapalme 2004)
  Original Lesk: 42%
  Simplified Lesk: 58%

83 Semantic Similarity
Words in a discourse must be related in meaning for the discourse to be coherent (Halliday and Hasan, 1976).
Use this property for WSD: identify related meanings for words that share a common context.
Context span:
  1 Local context: semantic similarity between pairs of words
  2 Global context: lexical chains

84 Semantic Similarity in a Local Context
Similarity is determined between pairs of concepts, or between a word and its surrounding context.
Relies on similarity metrics on semantic networks (Rada et al. 1989).
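
A common path-based metric on a semantic network is the length of the shortest path between two concepts: the shorter the path, the more similar the concepts. The sketch below runs a breadth-first search over a tiny, hypothetical IS-A hierarchy (not taken from WordNet) to show why table comes out closer to chair/furniture than to chair/person.

```python
from collections import deque

# Hypothetical toy IS-A links (child -> parent), for illustration only.
IS_A = {
    "table": "furniture",
    "chair/furniture": "furniture",
    "furniture": "artifact",
    "artifact": "entity",
    "chair/person": "person",
    "person": "entity",
}


def neighbours(node):
    """Concepts one IS-A link away, in either direction."""
    linked = {child for child, parent in IS_A.items() if parent == node}
    if node in IS_A:
        linked.add(IS_A[node])
    return linked


def path_length(start, goal):
    """Number of links on the shortest path, or None if the concepts are unconnected."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(path_length("table", "chair/furniture"))
print(path_length("table", "chair/person"))
```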

88 Decision List for WSD (Yarowsky, 1994)
Identify collocational features from sense-tagged data.
Word immediately to the left or right of the target:
  I have my bank/1 statement.
  The river bank/2 is muddy.
Pair of words to the immediate left or right of the target:
  The world's richest bank/1 is here in New York.
  The river bank/2 is muddy.
Words found within k positions to the left or right of the target, where k is often :
  My credit is just horrible because my bank/1 has made several mistakes with my account and the balance is very low.

90 Decision List for WSD (Yarowsky, 1994)
Sort order of collocation tests using log of conditional probabilities.
Words most indicative of one sense (and not the other) will be ranked highly.
$$\mathrm{Abs}\left(\log \frac{p(s = 1 \mid F_i = \mathrm{Collocation}_i)}{p(s = 2 \mid F_i = \mathrm{Collocation}_i)}\right)$$
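
The ranking can be sketched from raw co-occurrence counts. In the sketch below each collocation has a count with sense 1 and a count with sense 2; the counts are invented, and the small additive smoothing constant `alpha` (needed to avoid dividing by zero) is an assumption rather than the smoothing used in the original paper. Classification applies the first matching rule in the sorted list.

```python
import math


def decision_list(counts, alpha=0.1):
    """counts: collocation -> (count with sense 1, count with sense 2)."""
    rules = []
    for colloc, (c1, c2) in counts.items():
        # p(s=1|F) / p(s=2|F): the shared denominator c1 + c2 + 2*alpha cancels.
        ratio = (c1 + alpha) / (c2 + alpha)
        predicted = 1 if ratio > 1 else 2
        rules.append((abs(math.log(ratio)), colloc, predicted))
    rules.sort(key=lambda rule: rule[0], reverse=True)  # most indicative first
    return rules


def classify(context_collocations, rules, default=1):
    """Apply the highest-ranked rule whose collocation occurs in the context."""
    for _score, colloc, sense in rules:
        if colloc in context_collocations:
            return sense
    return default


# Invented counts for the target word "bank".
COUNTS = {
    "bank statement": (10, 0),
    "river bank": (0, 8),
    "the bank": (50, 50),
}
RULES = decision_list(COUNTS)
print([colloc for _score, colloc, _sense in RULES])
print(classify({"river bank", "the bank"}, RULES))
```

An uninformative collocation such as "the bank" scores zero and sinks to the bottom of the list, so it only fires when nothing more indicative is present.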

94 References I
(Gale, Church and Yarowsky 1992) Gale, W., Church, K., and Yarowsky, D. Estimating upper and lower bounds on the performance of word-sense disambiguation programs. ACL
(Miller et al., 1994) Miller, G., Chodorow, M., Landes, S., Leacock, C., and Thomas, R. Using a semantic concordance for sense identification. ARPA Workshop
(Miller, 1995) Miller, G. WordNet: A lexical database. Communications of the ACM, 38(11)
(Senseval) Senseval evaluation exercises
(Agirre and Rigau, 1995) Agirre, E. and Rigau, G. A proposal for word sense disambiguation using conceptual distance. RANLP
(Banerjee and Pedersen 2002) Banerjee, S. and Pedersen, T. An adapted Lesk algorithm for word sense disambiguation using WordNet. CICLING 2002.

95 References II
(Cowie, Guthrie and Guthrie 1992) Cowie, J., Guthrie, J. A., and Guthrie, L. Lexical disambiguation using simulated annealing. COLING
(Jiang and Conrath 1997) Jiang, J. and Conrath, D. Semantic similarity based on corpus statistics and lexical taxonomy. COLING
(Lesk, 1986) Lesk, M. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. SIGDOC
(Lin 1998) Lin, D. An information theoretic definition of similarity. ICML
(Mihalcea, Tarau, Figa 2004) Mihalcea, R., Tarau, P., and Figa, E. PageRank on Semantic Networks with Application to Word Sense Disambiguation. COLING
(Patwardhan, Banerjee, and Pedersen 2003) Patwardhan, S., Banerjee, S., and Pedersen, T. Using Measures of Semantic Relatedness for Word Sense Disambiguation. CICLING 2003.

96 References III
(Resnik 1995) Resnik, P. Using information content to evaluate semantic similarity. IJCAI
(Vasilescu, Langlais, Lapalme 2004) Vasilescu, F., Langlais, P., and Lapalme, G. Evaluating variants of the Lesk approach for disambiguating words. LREC
(Yarowsky, 1994) Yarowsky, D. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of ACL. pp
(Yarowsky, 2000) Yarowsky, D. Hierarchical decision lists for word sense disambiguation. Computers and the Humanities, 34.


2.1 The Theory of Semantic Fields 2 Semantic Domains In this chapter we define the concept of Semantic Domain, recently introduced in Computational Linguistics [56] and successfully exploited in NLP [29]. This notion is inspired by the

More information

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

A Bayesian Learning Approach to Concept-Based Document Classification

A Bayesian Learning Approach to Concept-Based Document Classification Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors

More information

Language Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus

Language Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus Language Acquisition Fall 2010/Winter 2011 Lexical Categories Afra Alishahi, Heiner Drenhaus Computational Linguistics and Phonetics Saarland University Children s Sensitivity to Lexical Categories Look,

More information

Rule Learning With Negation: Issues Regarding Effectiveness

Rule Learning With Negation: Issues Regarding Effectiveness Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United

More information

First Grade Curriculum Highlights: In alignment with the Common Core Standards

First Grade Curriculum Highlights: In alignment with the Common Core Standards First Grade Curriculum Highlights: In alignment with the Common Core Standards ENGLISH LANGUAGE ARTS Foundational Skills Print Concepts Demonstrate understanding of the organization and basic features

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

The MEANING Multilingual Central Repository

The MEANING Multilingual Central Repository The MEANING Multilingual Central Repository J. Atserias, L. Villarejo, G. Rigau, E. Agirre, J. Carroll, B. Magnini, P. Vossen January 27, 2004 http://www.lsi.upc.es/ nlp/meaning Jordi Atserias TALP Index

More information

Automatic Extraction of Semantic Relations by Using Web Statistical Information

Automatic Extraction of Semantic Relations by Using Web Statistical Information Automatic Extraction of Semantic Relations by Using Web Statistical Information Valeria Borzì, Simone Faro,, Arianna Pavone Dipartimento di Matematica e Informatica, Università di Catania Viale Andrea

More information

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

More information

Multilingual Sentiment and Subjectivity Analysis

Multilingual Sentiment and Subjectivity Analysis Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department

More information

English Language and Applied Linguistics. Module Descriptions 2017/18

English Language and Applied Linguistics. Module Descriptions 2017/18 English Language and Applied Linguistics Module Descriptions 2017/18 Level I (i.e. 2 nd Yr.) Modules Please be aware that all modules are subject to availability. If you have any questions about the modules,

More information

DKPro WSD A Generalized UIMA-based Framework for Word Sense Disambiguation

DKPro WSD A Generalized UIMA-based Framework for Word Sense Disambiguation DKPro WSD A Generalized UIMA-based Framework for Word Sense Disambiguation Tristan Miller 1 Nicolai Erbs 1 Hans-Peter Zorn 1 Torsten Zesch 1,2 Iryna Gurevych 1,2 (1) Ubiquitous Knowledge Processing Lab

More information

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

Outline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt

Outline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt Outline Using Web Data for Linguistic Purposes NCLT, Dublin City University Outline Outline 1 Corpora as linguistic tools 2 Limitations of web data Strategies to enhance web data 3 Corpora as linguistic

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information

Accuracy (%) # features

Accuracy (%) # features Question Terminology and Representation for Question Type Classication Noriko Tomuro DePaul University School of Computer Science, Telecommunications and Information Systems 243 S. Wabash Ave. Chicago,

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

The stages of event extraction

The stages of event extraction The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks

More information

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,

More information

Matching Similarity for Keyword-Based Clustering

Matching Similarity for Keyword-Based Clustering Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

A Domain Ontology Development Environment Using a MRD and Text Corpus

A Domain Ontology Development Environment Using a MRD and Text Corpus A Domain Ontology Development Environment Using a MRD and Text Corpus Naomi Nakaya 1 and Masaki Kurematsu 2 and Takahira Yamaguchi 1 1 Faculty of Information, Shizuoka University 3-5-1 Johoku Hamamatsu

More information

Rule Learning with Negation: Issues Regarding Effectiveness

Rule Learning with Negation: Issues Regarding Effectiveness Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

A process by any other name

A process by any other name January 05, 2016 Roger Tregear A process by any other name thoughts on the conflicted use of process language What s in a name? That which we call a rose By any other name would smell as sweet. William

More information

University of Waterloo School of Accountancy. AFM 102: Introductory Management Accounting. Fall Term 2004: Section 4

University of Waterloo School of Accountancy. AFM 102: Introductory Management Accounting. Fall Term 2004: Section 4 University of Waterloo School of Accountancy AFM 102: Introductory Management Accounting Fall Term 2004: Section 4 Instructor: Alan Webb Office: HH 289A / BFG 2120 B (after October 1) Phone: 888-4567 ext.

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

Switchboard Language Model Improvement with Conversational Data from Gigaword

Switchboard Language Model Improvement with Conversational Data from Gigaword Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar

More information

Controlled vocabulary

Controlled vocabulary Indexing languages 6.2.2. Controlled vocabulary Overview Anyone who has struggled to find the exact search term to retrieve information about a certain subject can benefit from controlled vocabulary. Controlled

More information

TextGraphs: Graph-based algorithms for Natural Language Processing

TextGraphs: Graph-based algorithms for Natural Language Processing HLT-NAACL 06 TextGraphs: Graph-based algorithms for Natural Language Processing Proceedings of the Workshop Production and Manufacturing by Omnipress Inc. 2600 Anderson Street Madison, WI 53704 c 2006

More information

Procedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing

Procedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 141 ( 2014 ) 124 128 WCLTA 2013 Using Corpus Linguistics in the Development of Writing Blanka Frydrychova

More information

1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature

1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature 1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details

More information

Multi-Lingual Text Leveling

Multi-Lingual Text Leveling Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

Lexical Similarity based on Quantity of Information Exchanged - Synonym Extraction

Lexical Similarity based on Quantity of Information Exchanged - Synonym Extraction Intl. Conf. RIVF 04 February 2-5, Hanoi, Vietnam Lexical Similarity based on Quantity of Information Exchanged - Synonym Extraction Ngoc-Diep Ho, Fairon Cédrick Abstract There are a lot of approaches for

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de

More information

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence. NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and

More information

Radius STEM Readiness TM

Radius STEM Readiness TM Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Semantic Evidence for Automatic Identification of Cognates

Semantic Evidence for Automatic Identification of Cognates Semantic Evidence for Automatic Identification of Cognates Andrea Mulloni CLG, University of Wolverhampton Stafford Street Wolverhampton WV SB, United Kingdom andrea@wlv.ac.uk Viktor Pekar CLG, University

More information

Context Free Grammars. Many slides from Michael Collins

Context Free Grammars. Many slides from Michael Collins Context Free Grammars Many slides from Michael Collins Overview I An introduction to the parsing problem I Context free grammars I A brief(!) sketch of the syntax of English I Examples of ambiguous structures

More information

*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN

*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN From: AAAI Technical Report WS-98-08. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Recommender Systems: A GroupLens Perspective Joseph A. Konstan *t, John Riedl *t, AI Borchers,

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio

Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds

More information

A Comparison of Two Text Representations for Sentiment Analysis

A Comparison of Two Text Representations for Sentiment Analysis 010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational

More information

Using Web Searches on Important Words to Create Background Sets for LSI Classification

Using Web Searches on Important Words to Create Background Sets for LSI Classification Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract

More information

CEFR Overall Illustrative English Proficiency Scales

CEFR Overall Illustrative English Proficiency Scales CEFR Overall Illustrative English Proficiency s CEFR CEFR OVERALL ORAL PRODUCTION Has a good command of idiomatic expressions and colloquialisms with awareness of connotative levels of meaning. Can convey

More information

1. Introduction. 2. The OMBI database editor

1. Introduction. 2. The OMBI database editor OMBI bilingual lexical resources: Arabic-Dutch / Dutch-Arabic Carole Tiberius, Anna Aalstein, Instituut voor Nederlandse Lexicologie Jan Hoogland, Nederlands Instituut in Marokko (NIMAR) In this paper

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

Lecture 1: Basic Concepts of Machine Learning

Lecture 1: Basic Concepts of Machine Learning Lecture 1: Basic Concepts of Machine Learning Cognitive Systems - Machine Learning Ute Schmid (lecture) Johannes Rabold (practice) Based on slides prepared March 2005 by Maximilian Röglinger, updated 2010

More information

Combining Bidirectional Translation and Synonymy for Cross-Language Information Retrieval

Combining Bidirectional Translation and Synonymy for Cross-Language Information Retrieval Combining Bidirectional Translation and Synonymy for Cross-Language Information Retrieval Jianqiang Wang and Douglas W. Oard College of Information Studies and UMIACS University of Maryland, College Park,

More information

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer

More information

arxiv:cmp-lg/ v1 22 Aug 1994

arxiv:cmp-lg/ v1 22 Aug 1994 arxiv:cmp-lg/94080v 22 Aug 994 DISTRIBUTIONAL CLUSTERING OF ENGLISH WORDS Fernando Pereira AT&T Bell Laboratories 600 Mountain Ave. Murray Hill, NJ 07974 pereira@research.att.com Abstract We describe and

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

CLASSIFICATION OF PROGRAM Critical Elements Analysis 1. High Priority Items Phonemic Awareness Instruction

CLASSIFICATION OF PROGRAM Critical Elements Analysis 1. High Priority Items Phonemic Awareness Instruction CLASSIFICATION OF PROGRAM Critical Elements Analysis 1 Program Name: Macmillan/McGraw Hill Reading 2003 Date of Publication: 2003 Publisher: Macmillan/McGraw Hill Reviewer Code: 1. X The program meets

More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

USER ADAPTATION IN E-LEARNING ENVIRONMENTS

USER ADAPTATION IN E-LEARNING ENVIRONMENTS USER ADAPTATION IN E-LEARNING ENVIRONMENTS Paraskevi Tzouveli Image, Video and Multimedia Systems Laboratory School of Electrical and Computer Engineering National Technical University of Athens tpar@image.

More information

University of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma

University of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of

More information

Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages

Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Nuanwan Soonthornphisaj 1 and Boonserm Kijsirikul 2 Machine Intelligence and Knowledge Discovery Laboratory Department of Computer

More information

Let's Learn English Lesson Plan

Let's Learn English Lesson Plan Let's Learn English Lesson Plan Introduction: Let's Learn English lesson plans are based on the CALLA approach. See the end of each lesson for more information and resources on teaching with the CALLA

More information

Student User s Guide to the Project Integration Management Simulation. Based on the PMBOK Guide - 5 th edition

Student User s Guide to the Project Integration Management Simulation. Based on the PMBOK Guide - 5 th edition Student User s Guide to the Project Integration Management Simulation Based on the PMBOK Guide - 5 th edition TABLE OF CONTENTS Goal... 2 Accessing the Simulation... 2 Creating Your Double Masters User

More information

The taming of the data:

The taming of the data: The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data

More information

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

More information

A Graph Based Authorship Identification Approach

A Graph Based Authorship Identification Approach A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico

More information

BYLINE [Heng Ji, Computer Science Department, New York University,

BYLINE [Heng Ji, Computer Science Department, New York University, INFORMATION EXTRACTION BYLINE [Heng Ji, Computer Science Department, New York University, hengji@cs.nyu.edu] SYNONYMS NONE DEFINITION Information Extraction (IE) is a task of extracting pre-specified types

More information

Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features

Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Sriram Venkatapathy Language Technologies Research Centre, International Institute of Information Technology

More information

Sleeping Coconuts Cluster Projects

Sleeping Coconuts Cluster Projects Sleeping Coconuts Cluster Projects Grades K 1 Description: A story, an indoor relay race for pre-readers and new readers to demonstrate the benefits of doing Bible translation in cluster projects, and

More information

EQuIP Review Feedback

EQuIP Review Feedback EQuIP Review Feedback Lesson/Unit Name: On the Rainy River and The Red Convertible (Module 4, Unit 1) Content Area: English language arts Grade Level: 11 Dimension I Alignment to the Depth of the CCSS

More information

CS 446: Machine Learning

CS 446: Machine Learning CS 446: Machine Learning Introduction to LBJava: a Learning Based Programming Language Writing classifiers Christos Christodoulopoulos Parisa Kordjamshidi Motivation 2 Motivation You still have not learnt

More information

LOUISIANA HIGH SCHOOL RALLY ASSOCIATION

LOUISIANA HIGH SCHOOL RALLY ASSOCIATION LOUISIANA HIGH SCHOOL RALLY ASSOCIATION Literary Events 2014-15 General Information There are 44 literary events in which District and State Rally qualifiers compete. District and State Rally tests are

More information

Distant Supervised Relation Extraction with Wikipedia and Freebase

Distant Supervised Relation Extraction with Wikipedia and Freebase Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational

More information

LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE

LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE Submitted in partial fulfillment of the requirements for the degree of Sarjana Sastra (S.S.)

More information

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,

More information
