Computational Semantics
1. Natural Language Processing and Communication
Björn Gambäck
Department of Computer and Information Science, Norwegian University of Science and Technology
SICS, Swedish Institute of Computer Science AB
11.09.2012 TDT13, lecture 1: Björn Gambäck 1

Course Examination
Oral presentation (15-20 min), in November [not graded]
Short essay (½-2 pages) on the same topic [not graded]
Oral exam (ca. 20 min), ca. 5-6 December [graded]
Grade: sum of points gained in both theory modules

Main Reasons to Process Natural Languages
Allow computer agents to communicate with people
Allow agents to acquire information from (written) language
Make it easier for people to communicate with people

Languages are for Communication
A speaker must put words to his/her thoughts.
A hearer must recognize the thoughts expressed from the words he/she perceives.
Both presuppose the capacity to recognize systematic connections between meaning and linguistic form.

What is Computational Semantics?
1. How can we automate the process of associating semantic representations with expressions of natural language?
2. How can we use logical representations of natural language expressions to automate the process of drawing inferences?
Patrick Blackburn and Johan Bos: Representation and Inference for Natural Language: A First Course in Computational Semantics. CSLI Publications, Stanford, California, March 2005. www.blackburnbos.org

Two Fundamental Traits of Human Languages
Ambiguity: a word or a string of words has more than one meaning.
Redundancy: the same information is expressed more than once.
Natural Language Processing
Syntax: how signs are related to each other
Semantics: how signs are related to things
Pragmatics: how signs are related to people

General NLP System Architecture
[Figure: architecture diagram; recoverable components include User Modeling, Grammar and Dialogue Management]

Analysis Depth and Analysis Width
Units of analysis, from smallest to largest, with examples:
morphemes: car-s
words: cars
phrases: see the cars
sentences: John doesn't see the cars.
paragraphs: Three sports cars are speeding down the street. John doesn't see the cars. He steps out into the street.
texts: Our story is about a short-sighted man named John. He lives in a small city with narrow streets. One day John goes for a walk. Three sports cars are speeding down the street. John doesn't see the cars. He steps out into the street.

The Research Frontier
Depth of analysis: Syntax, Compositional Semantics, Situational Semantics, Pragmatics
Width of analysis: morphemes, words, phrases, sentences, paragraphs, texts
(What kind of knowledge of language is needed?)

Some NLP Applications
Text-to-speech
Speech recognition
OCR
Information retrieval
Information extraction
Machine translation
Dialogue systems
What is a Language?
There are 6000-8000 languages in the world. (Why are the figures not more specific than that?)
There are 82 languages in Ethiopia. (How can we be sure of that? Why not 80 or 85?)
How many languages are there in Norway? 11! (according to the Ethnologue):
Norwegian: Bokmål, Nynorsk
Norwegian Sign Language
Finnish: Kven
Romani: Tavringer, Vlax
Norwegian Traveller
Saami: Lule, Pite, North, South

One or Two Individuals (Languages)?
How can you tell if a person speaks the same language as yourself, or if she speaks another, different language?
Do two speakers of the same language always speak alike?
Is it always impossible to understand a person who speaks another language?

Grammatical vs. Meaningful Sentences
Belonging to the string set: *brown sleeps blue dog the
Grammatical (belonging to the language): The blue brown blue brown blue dog sleeps
Understandable: The blue dog sleeps
Meaningful: The brown dog sleeps

Context-Free Grammar (CFG)
The left-hand side (LHS) of every rule is exactly one non-terminal. An example grammar in Prolog DCG notation:
    s --> np, vp.
    np --> name.
    np --> n.
    np --> det, n.
    vp --> v.
    vp --> v, np.
    vp --> v, np, np.
    name --> [john].
    name --> [mary].
    det --> [a].
    det --> [the].
    n --> [dog].
    n --> [dogs].
    v --> [snores].
    v --> [see].
    v --> [sees].
    v --> [gives].

Grammar Coverage
Coverage is never complete: "All grammars leak."
Remedies: add more rules; write more specific rules; add more features.

Syntactic Ambiguity
Joe said that Martha expected that it would rain yesterday
She asked him or she persuaded him to leave
He knew the girl left
Tycker du om Line? (Swedish: 'Do you like Line?')
Vad tycker du om Line? (Swedish: 'What do you think about Line?')
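As a sketch of how the CFG above can decide whether a string belongs to the language, here is a small top-down recognizer in Python. The rules and lexicon are transcribed from the DCG; the function names `ends` and `recognize` are hypothetical, and tracking the set of possible end positions is just one simple strategy, which works here because no rule is left-recursive.

```python
# Rules transcribed from the DCG above; category names follow the slide.
RULES = {
    "s": [["np", "vp"]],
    "np": [["name"], ["n"], ["det", "n"]],
    "vp": [["v"], ["v", "np"], ["v", "np", "np"]],
}
# Preterminal categories and the words they may rewrite to.
LEXICON = {
    "name": {"john", "mary"},
    "det": {"a", "the"},
    "n": {"dog", "dogs"},
    "v": {"snores", "see", "sees", "gives"},
}


def ends(cat: str, words: list[str], i: int) -> set[int]:
    """Return every position j such that words[i:j] can be parsed as `cat`."""
    if cat in LEXICON:
        matched = i < len(words) and words[i] in LEXICON[cat]
        return {i + 1} if matched else set()
    result: set[int] = set()
    for rhs in RULES[cat]:
        positions = {i}
        # Thread every possible end position through the right-hand side.
        for symbol in rhs:
            positions = {j for p in positions for j in ends(symbol, words, p)}
        result |= positions
    return result


def recognize(sentence: str) -> bool:
    """True if the whole sentence can be derived from the start symbol s."""
    words = sentence.lower().split()
    return len(words) in ends("s", words, 0)


for example in ["John sees Mary", "Mary gives John the dog", "dog the sees"]:
    print(example, "->", recognize(example))
```

The grammar also accepts "the dogs sees a dog": with no number features, nothing enforces subject-verb agreement, which is one reason the Grammar Coverage slide suggests adding more features.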
Lexical Ambiguity
I made her duck
her: possessive pronoun, or object pronoun
duck: verb, or noun
make: create, or cook

Structural Ambiguity
I saw a man in the park with a telescope
I saw a man in [the park with a telescope]
I saw [a man] in the park [with a telescope]
I [saw] a man in the park [with a telescope]

Redundancy
We discussed computers yesterday (past time is marked both by the verb tense and by "yesterday")
Den gula bilen (Swedish, 'the yellow car'; definiteness is marked both by "den" and by the suffix "-en")
(pseudo-)Amharic: Man-the he-died (the subject is marked again on the verb)

Semantic Construction
Given a sentence of a language, is there a systematic way of constructing its semantic representation?
Can we translate a syntactic structure into an abstract representation of its actual meaning (e.g. first-order logic)?

Compositional Semantics
Compositional semantics = the abstract meaning of a sentence (built from the meaning of its parts)
Situational semantics = adds context-dependent information
World knowledge = knowledge about the world shared between groups of people

Forget About It
FBI Technician: What's "forget about it"?
Donnie Brasco: "Forget about it" is like if you agree with someone, you know, like "Raquel Welch is one great piece of ass", forget about it. But then, if you disagree, like "A Lincoln is better than a Cadillac?" Forget about it! You know? But then, it's also like if something's the greatest thing in the world, like Mingio's Peppers, forget about it. But it's also like saying "Go to hell!" too. Like, you know, like "Hey Paulie, you got a one inch pecker?" and Paulie says "Forget about it!" Sometimes it just means forget about it.
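To make the structural ambiguity concrete, the sketch below counts parse trees with a CKY chart that stores counts instead of trees. The grammar is a hypothetical toy grammar in Chomsky normal form written for this example, not one from the lecture; it lets a PP attach either to a VP or to an NP. Under it, "I saw a man in the park with a telescope" gets five trees, more than the three bracketings listed, because for some readings the grammar also distinguishes whether "in the park" modifies "a man" or the verb phrase.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
# Hypothetical lexicon: word -> possible preterminal categories.
LEXICON = {
    "i": {"NP"}, "saw": {"V"}, "a": {"Det"}, "the": {"Det"},
    "man": {"N"}, "park": {"N"}, "telescope": {"N"},
    "in": {"P"}, "with": {"P"},
}


def count_parses(sentence: str, start: str = "S") -> int:
    """Count distinct parse trees for the sentence using a CKY chart."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j)][cat] = number of trees of category cat spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, set()):
            chart[(i, i + 1)][cat] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for parent, b, c in BINARY:
                    if left[b] and right[c]:
                        # Every left subtree combines with every right subtree.
                        chart[(i, j)][parent] += left[b] * right[c]
    return chart[(0, n)][start]


print(count_parses("I saw a man in the park with a telescope"))  # prints 5
```

Counting rather than enumerating keeps the chart small even when the number of trees grows quickly with each added PP.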
Construction of Semantic Representations
Three basic principles:
Lexicalization: try to keep semantic information lexicalized.
Compositionality: pass information up compositionally from the terminals.
Underspecification: don't make a choice unless you have to (the interpretation of ambiguous parts is left unresolved).
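The lexicalization and compositionality principles can be illustrated with a small Python sketch in the spirit of lambda-based semantic construction: every word's meaning is a function stored in the lexicon, and the meaning of each phrase is obtained by applying the meaning of its left daughter to the meaning of its right daughter. The lexicon, the tuple-based tree format `(label, left, right)` and the ASCII formula syntax (`&` for conjunction, `>` for implication) are assumptions made for this example. It always gives the subject widest scope, so it builds one fixed reading of a quantifier-scope ambiguity rather than an underspecified representation.

```python
from itertools import count
from typing import Callable, Union

Tree = Union[str, tuple]


def make_lexicon() -> dict[str, Callable]:
    """Build a hypothetical lexicon; each call gets fresh variables x1, x2, ..."""
    fresh = count(1)

    def quantifier(template: str) -> Callable:
        # Det meaning: takes a noun meaning, returns an NP meaning (a function
        # from a predicate to a formula) that binds a fresh variable.
        def det(noun: Callable[[str], str]) -> Callable:
            def np(pred: Callable[[str], str]) -> str:
                var = f"x{next(fresh)}"
                return template.format(v=var, restr=noun(var), body=pred(var))
            return np
        return det

    return {
        "john": lambda pred: pred("john"),  # proper name: applies P to john
        "mary": lambda pred: pred("mary"),
        "a": quantifier("exists {v}.({restr} & {body})"),
        "every": quantifier("all {v}.({restr} > {body})"),
        "dog": lambda v: f"dog({v})",
        "man": lambda v: f"man({v})",
        "snores": lambda v: f"snores({v})",
        # Transitive verb: takes the object NP meaning, then the subject.
        "sees": lambda obj: lambda subj: obj(lambda y: f"sees({subj},{y})"),
    }


def interpret(tree: Tree, lexicon: dict[str, Callable]):
    """Words are looked up; a node (label, left, right) applies left to right."""
    if isinstance(tree, str):
        return lexicon[tree]
    _label, left, right = tree
    return interpret(left, lexicon)(interpret(right, lexicon))


def semantics(tree: Tree) -> str:
    """Interpret a sentence tree with a fresh lexicon (fresh variable counter)."""
    return interpret(tree, make_lexicon())


print(semantics(("s", "john", ("vp", "sees", "mary"))))
print(semantics(("s", ("np", "a", "dog"), "snores")))
print(semantics(("s", ("np", "every", "man"), ("vp", "sees", ("np", "a", "dog")))))
```

All the semantic information lives in the lexicon; the combination step knows nothing except function application guided by the tree.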
Lexicalization
Simple grammars, BUT terribly unwieldy lexical feature structures.
Try to express lexical generalizations.
Alternatively: extend the formalism to make it more expressive, and let special features have dedicated (complex) behaviour.

Compositionality, Frege's Principle
Meaning ultimately flows from the lexicon.
Meanings are combined by syntactic information.
The meaning of the whole is a function of the meaning of its parts ("parts" = the substructure given by syntax).

Underspecification
A meaning ϕ of a formalism L is underspecified = it represents an ambiguous sentence in a more compact manner than a disjunction of all its readings.
L is complete = L's disambiguation device produces all possible refinements of any ϕ.
Example: consider a sentence with 3 quantified NPs (with underspecified scoping relations). L must be able to represent all $2^{3!} = 2^{6} = 64$ refinements (partial and complete disambiguations) of the sentence.

Phenomena for Underspecification
Local ambiguities, e.g., lexical ambiguities, anaphoric or deictic use of PRO
Global ambiguities, e.g., scopal ambiguities, collective-distributive readings
Ambiguous or incoherent non-semantic information, e.g., PP-attachment, number disagreement

Word Meaning
Built in from the start?! Or learnt by observation?
Word usage in context by a community: "the meaning of a word is its use in the language" (Ludwig Wittgenstein 1953)

Distributional Hypothesis
Words with similar usage have similar meanings; similarity = sharing contexts (Zellig Harris 1954, 1968).
Distributional data is used to model similarity: "you shall know a word by the company it keeps" (John Rupert Firth 1957).
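The distributional hypothesis can be turned into a similarity measure with very little code. The sketch below builds co-occurrence count vectors from a tiny hypothetical corpus (any word within `window` positions counts as context; both the corpus and the window size are assumptions for illustration) and compares the vectors by cosine similarity. Real systems use far larger corpora and usually reweight the raw counts, for example with pointwise mutual information, so this only illustrates the idea.

```python
from collections import Counter
from math import sqrt

# Tiny hypothetical corpus, made up for this illustration.
CORPUS = [
    "cat drinks milk",
    "dog drinks milk",
    "cat chases mouse",
    "dog chases ball",
    "car needs fuel",
]


def context_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors: dict[str, Counter] = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            context = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    context[words[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(c * v[word] for word, c in u.items() if word in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


vecs = context_vectors(CORPUS)
print(cosine(vecs["cat"], vecs["dog"]))  # 0.75
print(cosine(vecs["cat"], vecs["car"]))  # 0.0
```

Here "cat" and "dog" come out similar because they share three of their four context words (drinks, milk, chases), while "cat" and "car" share none.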