Semantics 3/3 (Lexical semantics)
Slides based on Jurafsky and Martin, Speech and Language Processing
Ing. Roberto Tedesco, PhD (roberto.tedesco@polimi.it)
NLP, academic year 17-18, Prof. L. Sbattella
Lexical semantics
• The linguistic study of:
  - The meaning of words
  - Relations among words and their meanings
• Tools:
  - Resources: lexical databases (e.g. WordNet)
  - Technologies: Word Sense Disambiguation
Lexical Semantics: How to represent word meanings
Some basic definitions
• Lexeme: the smallest unit with an orthographic form, a phonological form and a meaning
  - Orthographic form (written form)
  - Phonological form (spoken form)
  - Sense (the meaning)
• The orthographic form is usually given in base form: the lemma
• Lexicon: a collection of lexemes (including special forms like compound nouns)
Lexical relations among lexemes l Most used: Polysemy / Homonymy Synonymy Antonymy Hyponymy/hypernymy Meronymy/holonymy l Others exist 5
Polysemy / Homonymy
• Polysemy: a single lexeme with several related senses
  - The bank is constructed from red brick (the building)
  - I withdrew the money from the bank (the financial establishment)
  - Frequent words tend to be polysemous, especially verbs: to get, to put, ...
• Homonymy: different lexemes with the same form, but with distinct, unrelated senses
  - bank (a financial establishment)
  - bank (the land alongside or sloping down to a river or lake)
• So, we have two bank lexemes, with 4 senses
Homographs and homophones
• All the polysemous senses of a lexeme share the same orthographic and phonological form
• For homonym lexemes, instead, we can have:
  - Homographs: lexemes with the same orthographic form
    E.g. conduct (noun) [ˈkänˌdəkt] and conduct (verb) [kənˈdəkt]
  - Homophones: lexemes with the same phonological form
    E.g. write and right; piece and peace
  - Perfect homonyms: homograph + homophone
    E.g. bank (a financial establishment) and bank (the land alongside or sloping down to a river or lake)
Problems related to homonymy and polysemy
• The following problems are stated for homographs / homophones; of course, they also hold for polysemy
• Text-To-Speech is affected by homographs with different phonological forms
  - conduct (noun) [ˈkänˌdəkt] and conduct (verb) [kənˈdəkt]
  - bass (noun: a voice in the lowest range) [bās] and bass (noun: the common European freshwater perch) [bas]
• Information Retrieval is affected by homographs
  - QUERY: bat care → which bat?
    - bat as an implement with a handle and a solid surface, usually of wood, used for hitting the ball
    - bat as a mainly nocturnal mammal capable of sustained flight
Problems related to homonymy and polysemy (continued)
• Spelling correction is affected by homophones
  - People tend to confuse homophones while writing (malapropism): weather → whether
  - This leads to real-word spelling errors
• Speech recognition is affected by homophones (to, too, two) but also by perfect homonyms
  - bank has two senses that occur in different contexts
  - Speech recognition is based on statistical models of word co-occurrences
  - In these models, the two meanings of bank are conflated
  - As a result, words co-occurring with the wrong sense are also taken into account
Metaphor and Metonymy
• Special kinds of polysemy
• Metaphor: constructs an analogy between two things or ideas; the analogy is conveyed by the use of a metaphorical word in place of some other word
  - Germany will pull Slovenia out of its economic slump
• Metonymy: a concept is denoted by naming some other concept closely related to it
  - The White House announced yesterday
  - This chapter talks about part-of-speech tagging
Synonymy
• Different lexemes with the same meaning
  - youth / adolescent
  - big / large
  - automobile / car
• What does it mean for two lexemes to mean the same thing?
  - Practical definition: two lexemes are considered synonyms if they can be substituted for one another in sentences without changing the meaning of the sentence (substitutability)
Synonymy (continued)
• Perfect synonyms are rare
  - Lexemes rarely share all their senses
• E.g. big and large?
  - That's my big sister
  - That's my large sister
  - The substitution fails because big has, among its senses, the notion of being older, while large lacks it
Antonymy
• Lexemes with opposite senses
• Opposite but related!
  - dark / light
  - boy / girl
  - hot / cold
  - up / down
  - in / out
Hypernymy/hyponymy
• Hyponymy: a hyponym lexeme denotes a subclass of another lexeme
• Hypernymy: a hypernym lexeme denotes a superclass of another lexeme
• E.g., since dogs are canids:
  - dog is a hyponym of canid
  - canid is a hypernym of dog
Meronymy/holonymy
• Meronymy: a meronym lexeme denotes a constituent part of, or a member of, another lexeme
• Holonymy: a holonym lexeme denotes the whole of which another lexeme denotes a part
• E.g., since trees have a trunk and limbs:
  - trunk and limb are meronyms of tree
  - tree is a holonym of both trunk and limb
Lexical Databases
• Model senses and the relationships among them
• Model a language lexicon
• A sense:
  - Represents a specific meaning
  - Is a collection of synonym terms
• Relationships are a predefined set:
  - Hyponym/hypernym: the subclass relationship
  - Meronym/holonym: the part-of relationship
  - Synonym/antonym
Lexical Databases l Node: word; arc: lexical relationship 17
A Lexical Database: WordNet
• English lexicon database
  - About 150,000 terms: nouns, verbs, adjectives, adverbs
• Terms are organized in sets called synsets:
  - A synset contains synonym lexemes
  - A synset carries a specific sense, a meaning
  - A synset has a gloss, explaining the carried meaning
  - A lexeme can appear in several synsets (homonymy/polysemy)
• Synsets or single lexemes are connected by a set of predefined relations:
  - So-called semantic relations: connect synsets
  - So-called lexical relations: connect lexemes
  - NB: this is WordNet terminology; they are both lexical relations!
WordNet: Number of senses
[Figure: plot with axes "# senses" and "# verbs"]
WordNet: Synsets
[Figure. Source: http://slideplayer.com/slide/5196/]
WordNet: Structure
• Nouns and verbs:
  - Two taxonomies of synsets
• Adjectives:
  - Pairs of opposite lexemes form a group
  - Each adjective is connected to synonym lexemes
• Adverbs:
  - Connected to the related adjectives
• NB: WordNet is not a dictionary; it does not contain:
  - Pronouns, articles, particles (e.g. prepositions)
  - I.e., WordNet does not contain the closed vocabulary (the "keywords") of English
  - WordNet contains the open vocabulary of English
WordNet: Relations
• Main semantic relations for nouns:
  - X is hypernym of Y: X is a superclass of Y
  - Y is hyponym of X: Y is a subclass of X
  - X is holonym of Y: X is the whole and Y is part of it
  - Y is meronym of X: Y is part of X
  - X is coordinated with Y: X and Y have a common hypernym
• Main semantic relations for verbs:
  - hypernym, hyponym, coordinated
  - X is troponym of Y: X is a particular way to do Y (e.g. X = to walk, Y = to move)
  - X implies Y: action X implies action Y (X = to snore implies Y = to sleep). Actually, it is a logical relationship
WordNet: Relations (continued)
• Main semantic relations for adjectives:
  - similar to
• and adverbs:
  - pertainym: connects the adverb to the related adjective
• Some lexical relations:
  - antonymy: opposite adjectives
  - synonymy (lexemes in the same synset are implicitly connected by the synonymy relation)
WordNet: Domains
• Labels that identify usage domains
  - Associated with synsets, for nouns and verbs
  - Associated with lexemes, for adjectives and adverbs
• Domains are actually synset names
• E.g. domains associated with light:
    1 of 15 senses of light
    Sense 1
    light, visible light, visible radiation
        TOPIC->(noun) physics#2, natural philosophy#1
28 WordNet: Domains
WordNet: Instance of
(Usually, ontologies do that)
• Recent WordNet versions distinguish between classes and instances
• It is not easy to separate classes and instances
  - A typical problem when one tries to define an ontology
  - E.g. is Nero d'Avola a class or an instance?
    - An instance, because it is an element of the set Wine
    - A class, because, in turn, it is a set of bottles
  - It depends on the goal
• WordNet approach: proper nouns are usually instances (this is just a general rule)
WordNet: Attributes
(Usually, ontologies define that)
• Adjectives can represent values associated with a noun
  - The noun weight has the attribute adjectives light, heavy
  - The adjective light is an attribute of the nouns weight, value, light
Multi-language lexical databases: MultiWordNet
• Translates WordNet into many languages
  - One-to-one synset translation
  - Is this a sound approach?
• Adds to synsets the so-called semantic domain
  - A taxonomy of hundreds of labels, denoting usage domains
  - The goal is similar to that of WordNet Domains
32 MultiWordNet
33 Multi-language lexical databases: EuroWordNet
MultiWordNet vs EuroWordNet
• MultiWordNet
  - Quick and dirty approach: implies a one-to-one matching among senses in different languages
    - But this assumption is not true!
  - It is easy to add new languages
• EuroWordNet
  - Sound approach: each language defines its own network
  - The ILI structure is the intermediate language and makes it possible to connect all the languages
  - It is not easy to add new languages
Lexical Databases and NLP
• Semantic similarity among words W_1 and W_2
  - Distance (possibly a weighted distance) in terms of the relations connecting two words
    - Using hypernym/hyponym (path in a tree)
    - Using all the relations (path in a graph)
  - WordNet is composed of synsets, then:

$$d_{SN}(W_1, W_2) = \min_{S_1 \in \mathrm{synsetsof}(W_1),\; S_2 \in \mathrm{synsetsof}(W_2)} d_{SYN}(S_1, S_2)$$

$$d_{SYN}(S_1, S_2) = \min\, \mathrm{path}(S_1, S_2)$$

    where min path(S_1, S_2) is the length of the shortest path connecting the two synsets
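The two distances above can be sketched over a small hand-made graph. This is only an illustration under stated assumptions: the class `SynsetDistance`, its methods, and the toy synset identifiers (`dog#1`, `canid#1`, ...) are hypothetical and do not come from WordNet or from any WordNet library; all arcs have weight 1, so a breadth-first search yields the shortest path.

```java
import java.util.*;

// Minimal sketch: d_SYN as shortest path between synsets, d_SN as the minimum over synset pairs.
class SynsetDistance {
    // Toy graph (hypothetical data): synset id -> neighbouring synset ids, arcs are undirected.
    private final Map<String, Set<String>> arcs = new HashMap<>();
    // Toy lexicon (hypothetical data): word -> synsets that contain it.
    private final Map<String, Set<String>> synsetsOf = new HashMap<>();

    void addArc(String a, String b) {
        arcs.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        arcs.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    void addWord(String word, String... synsets) {
        synsetsOf.computeIfAbsent(word, k -> new HashSet<>()).addAll(Arrays.asList(synsets));
    }

    // d_SYN: number of arcs on the shortest path between two synsets, or -1 if unreachable.
    int synsetDistance(String from, String to) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(to)) return dist.get(cur);
            for (String next : arcs.getOrDefault(cur, Collections.emptySet())) {
                if (!dist.containsKey(next)) {
                    dist.put(next, dist.get(cur) + 1);
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    // d_SN: minimum d_SYN over all synset pairs of the two words, or -1 if no pair is connected.
    int wordDistance(String w1, String w2) {
        int best = -1;
        for (String s1 : synsetsOf.getOrDefault(w1, Collections.emptySet())) {
            for (String s2 : synsetsOf.getOrDefault(w2, Collections.emptySet())) {
                int d = synsetDistance(s1, s2);
                if (d >= 0 && (best < 0 || d < best)) best = d;
            }
        }
        return best;
    }

    // Builds the toy network used in the example.
    static SynsetDistance toyNetwork() {
        SynsetDistance sn = new SynsetDistance();
        sn.addArc("dog#1", "canid#1");
        sn.addArc("wolf#1", "canid#1");
        sn.addArc("canid#1", "carnivore#1");
        sn.addArc("cat#1", "feline#1");
        sn.addArc("feline#1", "carnivore#1");
        sn.addWord("dog", "dog#1");
        sn.addWord("wolf", "wolf#1");
        sn.addWord("cat", "cat#1");
        return sn;
    }

    public static void main(String[] args) {
        SynsetDistance sn = toyNetwork();
        System.out.println(sn.wordDistance("dog", "wolf")); // prints 2
        System.out.println(sn.wordDistance("dog", "cat"));  // prints 4
    }
}
```

With weighted relations, the breadth-first search would have to be replaced by a weighted shortest-path algorithm such as Dijkstra's.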
Lexical Databases and NLP (continued)
• Clustering
  - Divide similar words into clusters, using the distance
  - Divide similar documents into clusters, using distances among their words
• Advanced search engines
  - Search for a word and its synonyms, hyponyms, etc.
  - Search for an adjective and the derived adverb...
• Lemmatization: from the inflected form to the base form (trees → tree; running → to run)
  - Depends on the particular SN: WordNet uses heuristics
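A library for WordNet: JWNL

    import java.io.FileInputStream;
    import net.didion.jwnl.JWNL;
    import net.didion.jwnl.data.IndexWord;
    import net.didion.jwnl.data.POS;
    import net.didion.jwnl.data.Synset;
    import net.didion.jwnl.data.Word;
    import net.didion.jwnl.dictionary.Dictionary;

    // Load the JWNL configuration (the file path is elided in the original slide)
    JWNL.initialize(new FileInputStream( ));
    Dictionary dic = Dictionary.getInstance();
    // Look up the noun "wine" and print the lemmas of every sense (senses are 1-based)
    IndexWord idxword = dic.lookupIndexWord(POS.NOUN, "wine");
    if (idxword != null) {
        for (int i = 1; i <= idxword.getSenseCount(); i++) {
            Synset synset = idxword.getSense(i);
            for (Word w : synset.getWords()) {
                System.out.println(w.getLemma());
            }
        }
    }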
Internal structure of words
• Thematic roles: roles associated with verbal arguments
• Selectional restrictions: constraints that verbs pose on their arguments
• Primitive decomposition: decomposing words into primitive parts
• Semantic fields: take into account the background information that lexemes may share
  - See MultiWordNet's Semantic Domains, or WordNet's Domains
Thematic roles
• He opened a door

$$\exists e, x, y \;\; Isa(e, Opening) \land Opener(e, he) \land OpenedThing(e, y) \land Isa(y, door)$$

• Houston's Billy Hatcher broke a bat

$$\exists e, x, y \;\; Isa(e, Breaking) \land Breaker(e, BillyHatcher) \land BrokenThing(e, y) \land Isa(y, bat)$$

• Semantic deep roles: Opener, OpenedThing, Breaker, BrokenThing
• Opener and Breaker have something in common
  - They are both volitional actors, often animate, and they cause an event to happen → AGENT
• OpenedThing and BrokenThing have something in common
  - Inanimate objects affected by the action → THEME
Thematic roles 40 l Commonly-used thematic roles
41 Thematic roles: examples
Linking theory
• Thematic roles as an intermediate level:
  - Semantic deep role (e.g. Breaker)
  - Thematic role (e.g. AGENT)
  - Grammatical realization (e.g. subject)
• Example:

    surface form              Houston's Billy Hatcher   broke   a bat
    grammatical realization   subject                   verb    dir-obj
    thematic roles            AGENT                             THEME
    semantic deep roles       Breaker                           BrokenThing
Issues with linking theory
• Such thematic roles only apply to arguments of verbs
• But other parts of speech have arguments, too. E.g. nouns:
  - destruction of the city
  - father of the bride
• Linking theory does not consider them
FrameNet
• An English lexicon listing the syntactic and thematic combinations of each word (not only verbs)
• Each word (Lexical Unit, LU) is defined inside a frame
• Each frame has Frame Elements (FEs)
  - The thematic roles, very specific
  - With various possible grammatical realizations
• FEs are arranged in Patterns
• Frames are connected to each other by means of particular relationships
• VerbNet is another English lexicon, for verbs
FrameNet Valence Patterns: appreciate.v (Judgment)
• Thematic roles:
  - Cognizer: the Cognizer makes the judgment
  - Evaluee: the person or thing about whom/which a judgment is made
  - Reason: typically, there is a constituent expressing the reason for the Judge's judgment
• Grammatical realizations (Phrase Type.Grammatical Function)
  - E.g. NP.Obj: Noun Phrase.Object
Selectional restrictions
• A semantic constraint imposed by a lexeme on the concepts that can fill the argument roles associated with it
• Remember the sentence "I wanna eat someplace that's close to Politecnico"?
  - Try to interpret it using the transitive version of eat
• The transitive version of eat has AGENT and THEME roles:
  - I (AGENT) wanna eat someplace that's close to Politecnico (THEME)
  - Semantic ill-formedness (unless you are Godzilla)
  - THEME should be edible, for the transitive form of eat
  - Selectional restriction violation
Representing selectional restrictions
• I want to eat a hamburger
• Representation with roles:

$$\exists e, y \;\; Eating(e) \land Agent(e, Speaker) \land Theme(e, y) \land Isa(y, Hamburger)$$

• Adding restrictions:

$$\exists e, y \;\; Eating(e) \land Agent(e, Speaker) \land Theme(e, y) \land Isa(y, Hamburger) \land Isa(y, EdibleThing)$$

• Using WordNet it is possible to derive that a word is edible
  - By following the hypernym taxonomy
Hamburger is edible
• Hypothesis: I must know that the word food means something edible
• I must map EdibleThing to food
  - Actually, to the synset containing food
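The check "is hamburger an EdibleThing?" can be sketched as a walk up the hypernym chain until the synset mapped to EdibleThing (here, food) is reached. The taxonomy below is hand-made for illustration and is not actual WordNet data; the class `HypernymCheck` and its method names are hypothetical. The sketch also assumes a single hypernym per synset, whereas WordNet synsets can have several.

```java
import java.util.*;

// Minimal sketch: testing a selectional restriction by following hypernym arcs.
class HypernymCheck {
    // Toy taxonomy (hypothetical data): synset -> its direct hypernym.
    static final Map<String, String> HYPERNYM = new HashMap<>();
    static {
        HYPERNYM.put("hamburger", "sandwich");
        HYPERNYM.put("sandwich", "dish");
        HYPERNYM.put("dish", "food");
        HYPERNYM.put("gold", "metal");
        HYPERNYM.put("metal", "substance");
    }

    // True if target is reachable from synset by following hypernym arcs upward.
    static boolean isA(String synset, String target) {
        Set<String> seen = new HashSet<>(); // guards against cycles in malformed data
        String cur = synset;
        while (cur != null && seen.add(cur)) {
            if (cur.equals(target)) return true;
            cur = HYPERNYM.get(cur);
        }
        return false;
    }

    // Restriction for transitive "eat": the THEME must be a kind of food (EdibleThing).
    static boolean satisfiesEatTheme(String theme) {
        return isA(theme, "food");
    }

    public static void main(String[] args) {
        System.out.println(satisfiesEatTheme("hamburger")); // prints true
        System.out.println(satisfiesEatTheme("gold"));      // prints false
    }
}
```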
Primitive decomposition
• So far, words seem to represent atomic symbols carrying semantic information
• But words have internal structure
• For verbs: CD (Conceptual Dependency)
  - Eleven primitive predicates
  - Used to represent all predicate-like language expressions
  - Each verb is a combination of such primitive predicates
• The waiter brought Mary the check
  - brought: physical movement of an object + change of possession/control of an object
Primitive decomposition (continued)
• The waiter brought Mary the check

$$\exists x, y \;\; Atrans(x) \land Actor(x, Waiter) \land Object(x, Check) \land To(x, Mary) \land Ptrans(y) \land Actor(y, Waiter) \land Object(y, Check) \land To(y, Mary)$$
Lexical Semantics: Word Sense Disambiguation (WSD)
1) WSD & selectional restrictions
• WSD as a side effect of semantic analysis
• Restrictions eliminate ill-formed components
• As a result, the right meanings survive
• If the predicate is unambiguous:
  - I wash dishes (wash requires something washable)
  - I eat this dish (eat requires an edible thing)
  - The predicate selects the correct sense of its argument (dish)
• If the argument is unambiguous:
  - Which airlines serve Denver? (Denver is a location)
  - Which ones serve breakfast? (breakfast is edible)
  - The argument selects the sense of the predicate (serve)
WSD & selectional restrictions (continued)
• If both the predicate and its arguments have multiple senses:
  - I'm looking for a restaurant that serves vegetarian dishes
  - Both are ambiguous; several sense combinations are possible
  - But, in this case, only one sense combination does not violate the selectional constraints (serve as serving food and dish as edible thing)
• Limitations of this approach:
  - What kind of dishes do you recommend? (dish as?)
  - You can't eat gold for lunch if you're hungry! (gold is not edible. Violation? No, because of the "can't")
  - Mr. Kulkarni ate glass on an empty stomach (for Mr. Kulkarni, glass is edible!)
  - Inter FC will eat Milan AC at the next match (metaphor)
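The case where both predicate and argument are ambiguous can be sketched as a filter over all sense combinations: a pair survives only if the argument's semantic class matches what the predicate sense requires. The sense labels and semantic classes below are hypothetical, chosen to mirror the serve/dish example; a fuller version would test class membership through the hypernym taxonomy rather than by string equality.

```java
import java.util.*;

// Minimal sketch: WSD as a side effect of checking selectional restrictions.
class RestrictionWsd {
    // Hypothetical inventory: predicate sense -> semantic class required for its THEME.
    static final Map<String, String> SERVE_SENSES = new LinkedHashMap<>();
    // Hypothetical inventory: argument sense -> semantic class it belongs to.
    static final Map<String, String> DISH_SENSES = new LinkedHashMap<>();
    static {
        SERVE_SENSES.put("serve#provide-food", "EdibleThing");
        SERVE_SENSES.put("serve#fly-to", "Location");
        DISH_SENSES.put("dish#plate", "Artifact");
        DISH_SENSES.put("dish#course", "EdibleThing");
    }

    // Returns every (predicate sense, argument sense) pair that violates no restriction.
    static List<String> compatible(Map<String, String> predicateSenses,
                                   Map<String, String> argumentSenses) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> p : predicateSenses.entrySet()) {
            for (Map.Entry<String, String> a : argumentSenses.entrySet()) {
                if (p.getValue().equals(a.getValue())) {
                    result.add(p.getKey() + " + " + a.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Four combinations are possible, only one survives.
        System.out.println(compatible(SERVE_SENSES, DISH_SENSES));
        // prints [serve#provide-food + dish#course]
    }
}
```

The limitations listed on the slide still apply: a hard filter of this kind would reject the negated, unusual, and metaphorical examples outright.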
2) WSD & Machine Learning
• Classify words by means of a stochastic model
  - Classes: the meanings
• Input:
  - The word to classify (the so-called target word)
  - The portion of text where it is embedded (context)
  - Usually, the POS of the words (target and context)
  - Often, morphological analysis is performed on words
  - Less often, some form of parsing is used
• Output:
  - The right class (i.e., the right meaning)
Features
• The input is transformed into a set of features
• Common features for WSD:
  - The target word itself
  - The target word collocations
  - The target word co-occurrences
• Representation:
  - For each word, a vector of feature name/value pairs is computed
  - Such vectors are used to train, test, and run the model
• First of all, we need to choose the window that represents the context of the word to classify
Window
• Sentence: An electric guitar and bass player stand off to one side not really part of the scene, just as a sort of nod to gringo expectations perhaps
• Window: +/- 2 words
• Target word: bass
• Resulting context: An electric [guitar and BASS player stand] off to one side not really part of the scene, just as a sort of nod to gringo expectations perhaps
Features l The target word (not the lemma!) l Collocation About context words (usually, in base form) in specific positions around the word to classify l Co-occurrence Whether a given word (usually, its base form) appears in the context of the target word, or not 57
Collocation
• About context words in specific positions around the target word
  - E.g. word base form and POS:
    [..., word_{n-2}, POS_{n-2}, word_{n-1}, POS_{n-1}, word_{n+1}, POS_{n+1}, ...]
• Representation: a vector
  - Using window = +/-2: guitar and bass player stand
    [guitar, NN, and, CJC, player, NN, stand, VVB]
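A minimal sketch of how such a collocation vector could be computed from a tokenized, POS-tagged sentence. The class `CollocationFeatures`, the `"<none>"` padding for positions outside the sentence, and the tag assigned to the target word are assumptions of this sketch; lemmatization to base forms is left out.

```java
import java.util.*;

// Minimal sketch: collocation features = (word, POS) pairs at fixed offsets around the target.
class CollocationFeatures {
    // Builds [word_{n-k}, POS_{n-k}, ..., word_{n+k}, POS_{n+k}], skipping the target at index n.
    // Positions outside the sentence are padded with "<none>" (an assumption of this sketch).
    static List<String> extract(String[] words, String[] tags, int target, int window) {
        List<String> features = new ArrayList<>();
        for (int i = target - window; i <= target + window; i++) {
            if (i == target) continue; // the target word is a separate feature
            boolean inside = i >= 0 && i < words.length;
            features.add(inside ? words[i] : "<none>");
            features.add(inside ? tags[i] : "<none>");
        }
        return features;
    }

    public static void main(String[] args) {
        String[] words = {"guitar", "and", "bass", "player", "stand"};
        String[] tags  = {"NN", "CJC", "NN", "NN", "VVB"}; // tag of "bass" is assumed, it is not used
        System.out.println(extract(words, tags, 2, 2));
        // prints [guitar, NN, and, CJC, player, NN, stand, VVB]
    }
}
```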
Co-occurrence
• Whether a given word (usually, the base form) appears in the context of the target word, or not
  - Preliminary step: collect the n most frequent co-occurring words, according to a corpus, for each target word
  - Feature calculation: select the words appearing in the window
• Representation: a vector
  - Using window = +/-2: e.g., guitar and bass player stand
  - E.g., collect the n = 12 most frequent co-occurring words in sentences with the target word bass (every meaning):
    [fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band]
  - Then, an example of feature for the target word bass:
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    (the two 1s correspond to player and guitar)
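The feature calculation step can be sketched as follows: given the precomputed list of frequent co-occurring words, mark with 1 those that appear inside the window. The class `CooccurrenceFeatures` is hypothetical, and the sketch matches surface forms exactly instead of reducing words to their base form.

```java
import java.util.*;

// Minimal sketch: binary co-occurrence vector over a fixed list of frequent context words.
class CooccurrenceFeatures {
    // The 12 most frequent co-occurring words for "bass", as listed on the slide.
    static final String[] VOCABULARY = {
        "fishing", "big", "sound", "player", "fly", "rod",
        "pound", "double", "runs", "playing", "guitar", "band"
    };

    // vector[j] = 1 if vocabulary[j] appears within +/- window words of the target, else 0.
    static int[] extract(String[] vocabulary, String[] words, int target, int window) {
        Set<String> context = new HashSet<>();
        int from = Math.max(0, target - window);
        int to = Math.min(words.length - 1, target + window);
        for (int i = from; i <= to; i++) {
            if (i != target) context.add(words[i]);
        }
        int[] vector = new int[vocabulary.length];
        for (int j = 0; j < vocabulary.length; j++) {
            vector[j] = context.contains(vocabulary[j]) ? 1 : 0;
        }
        return vector;
    }

    public static void main(String[] args) {
        String[] words = {"guitar", "and", "bass", "player", "stand"};
        System.out.println(Arrays.toString(extract(VOCABULARY, words, 2, 2)));
        // prints [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    }
}
```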
Example: bass
• Sense s ∈ {1, 2, 3, 4, 5, 6, 7, 8}
Supervised machine learning
• Such models undergo a training phase:
  - Input: a training set
  - Output: the trained model
• Training set: a (usually huge) set of samples
  - A sample: (list of features; right class)
  - E.g.: ( [guitar, NN, and, CJC, player, NN, stand, VVB], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], bass; right class: 2 )
• Popular models: Naïve Bayes, Decision lists/trees, Neural Nets, Support Vector Machines, etc.
Naïve Bayes
• P(s): sense prior probability
• v_j: j-th feature
• P(v_j | s): probability of feature v_j, given sense s
• Use a tagged corpus to calculate these values (tags: the right senses)
• A sample: guitar and bass player stand
  - v_1: [guitar, and, player, stand]
  - v_2: [NN, CJC, NN, VVB]
  - v_3: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
  - v_4: bass
  - s: 7 (tag: the right sense)
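Naïve Bayes (continued)
• Having n features, we want to find:

$$\hat{s} = \operatorname*{argmax}_{s \in S} P(s \mid v_1, v_2, \ldots, v_n)$$

• Using Bayes:

$$\hat{s} = \operatorname*{argmax}_{s \in S} \frac{P(v_1, v_2, \ldots, v_n \mid s)\, P(s)}{P(v_1, v_2, \ldots, v_n)}$$

• The denominator does not depend on s → it does not modify the result of argmax → it can be dropped:

$$\hat{s} = \operatorname*{argmax}_{s \in S} P(v_1, v_2, \ldots, v_n \mid s)\, P(s)$$

• Finally, assuming independence of the features (given the sense):

$$\hat{s} = \operatorname*{argmax}_{s \in S} P(s) \prod_{j=1}^{n} P(v_j \mid s)$$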
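The final formula can be sketched as a small classifier that estimates P(s) and P(v_j | s) by counting over tagged samples. Several choices here are assumptions not stated on the slides: add-one smoothing (to avoid zero probabilities for unseen feature values), log-probabilities (to avoid underflow), a reduced feature set with only the previous and the next word, and the invented training samples. The class `NaiveBayesWsd` and the sense labels are hypothetical.

```java
import java.util.*;

// Minimal sketch: argmax_s P(s) * prod_j P(v_j | s), computed in log space.
class NaiveBayesWsd {
    private final Map<String, Integer> senseCounts = new HashMap<>();
    // featureCounts.get(sense).get("j=value") = number of samples of that sense with v_j = value
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
    // valuesAt.get(j) = distinct values observed for feature j (used by add-one smoothing)
    private final Map<Integer, Set<String>> valuesAt = new HashMap<>();
    private int total = 0;

    void train(String sense, String... features) {
        senseCounts.merge(sense, 1, Integer::sum);
        total++;
        Map<String, Integer> counts = featureCounts.computeIfAbsent(sense, k -> new HashMap<>());
        for (int j = 0; j < features.length; j++) {
            counts.merge(j + "=" + features[j], 1, Integer::sum);
            valuesAt.computeIfAbsent(j, k -> new HashSet<>()).add(features[j]);
        }
    }

    // log P(s) + sum_j log P(v_j | s), with add-one smoothing (an assumption of this sketch).
    double logScore(String sense, String[] features) {
        int n = senseCounts.get(sense);
        double score = Math.log((double) n / total);
        Map<String, Integer> counts = featureCounts.get(sense);
        for (int j = 0; j < features.length; j++) {
            int c = counts.getOrDefault(j + "=" + features[j], 0);
            int distinct = valuesAt.getOrDefault(j, Collections.emptySet()).size();
            score += Math.log((c + 1.0) / (n + distinct));
        }
        return score;
    }

    // Returns the sense with the highest score.
    String classify(String... features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String sense : senseCounts.keySet()) {
            double s = logScore(sense, features);
            if (s > bestScore) { bestScore = s; best = sense; }
        }
        return best;
    }

    // Invented toy corpus for "bass": feature 0 = previous word, feature 1 = next word.
    static NaiveBayesWsd toyModel() {
        NaiveBayesWsd nb = new NaiveBayesWsd();
        nb.train("music", "and", "player");
        nb.train("music", "electric", "guitar");
        nb.train("music", "the", "player");
        nb.train("fish", "sea", "fishing");
        nb.train("fish", "striped", "fillet");
        nb.train("fish", "the", "fishing");
        return nb;
    }

    public static void main(String[] args) {
        NaiveBayesWsd nb = toyModel();
        System.out.println(nb.classify("and", "player"));  // prints music
        System.out.println(nb.classify("the", "fishing")); // prints fish
    }
}
```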
Bootstrapping approaches
• Common issue: a large corpus is needed!
• Bootstrap:
  - Start with a small number of instances of each sense for each lexeme (seeds)
  - Train a classifier
  - Use the classifier to label a larger set of words
  - Check the correctness of the labeling
  - Repeat
• Selecting seeds
  - Hand-label a subset of the corpus, using the "one sense per collocation" approach
The one sense per collocation approach
• For each sense of the target lexeme, discover word(s) that frequently co-occur with it
• Use the sentences where such words appear as a seeding set for the target lexeme
• E.g. bass
  - Assume play occurs with the music sense and fish occurs with the fish sense
  - Select sentences containing either play or fish, not both!
• How to select the co-occurring words and the related sense?
  - By hand (examining the co-occurring words and the target lexeme)
  - Using a lexical database
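The seed selection rule for bass can be sketched as follows: a sentence becomes a seed for a sense only if cue words of exactly one sense appear in it. The class `SeedSelection`, the cue table, and the example sentences are hypothetical; the sketch matches exact lowercase tokens, so inflected forms such as "playing" are not recognized without lemmatization.

```java
import java.util.*;

// Minimal sketch: "one sense per collocation" seed labeling.
class SeedSelection {
    // Hypothetical cue table for the target lexeme "bass": cue word -> sense it signals.
    static final Map<String, String> CUES = new HashMap<>();
    static {
        CUES.put("play", "music");
        CUES.put("fish", "fish");
    }

    // Returns the seed sense for a sentence, or null when cues of zero or of several senses occur.
    static String seedSense(String sentence, Map<String, String> cueToSense) {
        Set<String> tokens = new HashSet<>(Arrays.asList(sentence.toLowerCase().split("\\W+")));
        Set<String> senses = new HashSet<>();
        for (Map.Entry<String, String> e : cueToSense.entrySet()) {
            if (tokens.contains(e.getKey())) senses.add(e.getValue());
        }
        return senses.size() == 1 ? senses.iterator().next() : null;
    }

    public static void main(String[] args) {
        System.out.println(seedSense("We play bass in a band", CUES));            // prints music
        System.out.println(seedSense("They fish for bass in the lake", CUES));    // prints fish
        System.out.println(seedSense("I play bass and fish on weekends", CUES));  // prints null
    }
}
```

The sentences labeled this way would form the initial training set of the bootstrapping loop; sentences returning null stay unlabeled until a later iteration.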
REFERENCES

On lexical databases
• WordNet
  - http://wordnet.princeton.edu/
  - http://www.aclweb.org/anthology/j/j06/j06-1001.pdf
• WordNet Domains
  - http://wndomains.fbk.eu/
• MultiWordNet
  - http://multiwordnet.itc.it/english/home.php
  - http://wndomains.itc.it/wordnetdomains.html
• EuroWordNet
  - http://www.illc.uva.nl/eurowordnet/
• Global WordNet
  - http://globalwordnet.org

On verbal frames
• FrameNet
  - http://framenet.icsi.berkeley.edu/
• VerbNet
  - http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
• PropBank
  - http://verbs.colorado.edu/~mpalmer/projects/ace.html
• Unified Verb Index
  - http://verbs.colorado.edu/verb-index/

Unifying lexical resources
• SemLink
  - http://verbs.colorado.edu/semlink/