Björn Gambäck 1 October 2013


Computational Semantics 1. Natural Language Processing and Communication
Björn Gambäck
Department of Computer and Information Science, Norwegian University of Science and Technology
SICS, Swedish Institute of Computer Science AB
11.09.2012 TDT13, lecture 1: Björn Gambäck

Course Examination
- Oral presentation (15-20 min), in November [not graded]
- Short essay (½-2 pages) on the same topic [not graded]
- Oral exam (ca. 20 min), ca. 5-6 December [graded]
- Grade: sum of points gained in both theory modules

Main Reasons to Process Natural Languages
- Allow computer agents to communicate with people
- Allow agents to acquire information from (written) language
- Make it easier for people to communicate with people

Languages are for Communication
- A speaker must put words to his/her thoughts
- A hearer must recognize the thoughts expressed from the words he/she perceives
- Both presuppose: the capacity to recognize systematic connections between meaning and linguistic form

What is Computational Semantics?
1. How can we automate the process of associating semantic representations with expressions of natural language?
2. How can we use logical representations of natural language expressions to automate the process of drawing inferences?
Patrick Blackburn and Johan Bos: Representation and Inference for Natural Language: A First Course in Computational Semantics. CSLI Publications, Stanford, California. March 2005. www.blackburnbos.org

Two Fundamental Traits of Human Languages
- Ambiguity: a word or a string of words has more than one meaning
- Redundancy: the same information is expressed more than once

Natural Language Processing
- Syntax: how signs are related to each other
- Semantics: how signs are related to things
- Pragmatics: how signs are related to people
Example: Mr. Smith is expressive

General NLP System Architecture
[Figure: system architecture diagram; recoverable component labels: User Modeling, Grammar, Dialogue Management]

Analysis Depth

Analysis Width
- morphemes: car-s
- words: cars
- phrases: see the cars
- sentences: John doesn't see the cars.
- paragraphs: Three sports cars are speeding down the street. John doesn't see the cars. He steps out into the street.
- texts: Our story is about a short-sighted man named John. He lives in a small city with narrow streets. One day John goes for a walk. Three sports cars are speeding down the street. John doesn't see the cars. He steps out into the street.

The Research Frontier
- Syntax, Compositional Semantics, Situational Semantics, Pragmatics
- morphemes, words, phrases, sentences, paragraphs, texts

Some NLP applications
(What kind of knowledge of language is needed?)
- Text-to-speech
- Speech Recognition
- OCR
- Information Retrieval
- Information Extraction
- Machine Translation
- Dialogue Systems

What is a language?
- There are 6000-8000 languages in the World. (Why are the figures not more specific than that?)
- There are 82 languages in Ethiopia. (How can we be sure of that? Why not 80 or 85?)
- How many languages are there in Norway? 11! (according to the Ethnologue):
  Norwegian: Bokmål, Nynorsk; Norwegian Sign Language
  Finnish: Kven
  Romani: Tavringer, Vlax; Norwegian Traveller
  Saami: Lule, Pite, North, South

One or two individuals (languages)?
- How can you tell if a person speaks the same language as yourself or if she speaks another, different language?
- Do two speakers of the same language always speak alike?
- Is it always impossible to understand a person who speaks another language?

Grammatical vs. Meaningful Sentences
- Belonging to the string set: * brown sleeps blue dog the
- Grammatical (belonging to the language)? The blue brown blue brown blue dog sleeps
- Understandable: The blue dog sleeps
- Meaningful: The brown dog sleeps

Context-Free Grammar (CFG)
LHS = one non-terminal
  s --> np, vp.
  np --> name.
  np --> n.
  np --> det, n.
  vp --> v.
  vp --> v, np.
  vp --> v, np, np.
  name --> [john].
  name --> [mary].
  det --> [a].
  det --> [the].
  n --> [dog].
  n --> [dogs].
  v --> [snores].
  v --> [see].
  v --> [sees].
  v --> [gives].

Grammar Coverage
- Coverage is never complete
- Add more rules
- All grammars leak
- More specific rules
- Add more features

Syntactic Ambiguity
- Joe said that Martha expected that it would rain yesterday
- She asked him or she persuaded him to leave
- He knew the girl left
- Tycker du om Line? (Swedish: "Do you like Line?")
- Vad tycker du om Line? (Swedish: "What do you think of Line?")
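
As a rough illustration of the Context-Free Grammar slide, the sketch below encodes the same rules as a Python dictionary and checks sentences with a naive top-down recogniser. The rule set comes from the slide; the names GRAMMAR, expand, count_parses and is_grammatical are hypothetical helpers introduced only for this example, and the approach works here only because the toy grammar has no left-recursive rules.

```python
# Toy top-down recogniser for the lecture's context-free grammar.
# Rules are copied from the slide; all function names are illustrative assumptions.
GRAMMAR = {
    "s": [["np", "vp"]],
    "np": [["name"], ["n"], ["det", "n"]],
    "vp": [["v"], ["v", "np"], ["v", "np", "np"]],
    "name": [["john"], ["mary"]],
    "det": [["a"], ["the"]],
    "n": [["dog"], ["dogs"]],
    "v": [["snores"], ["see"], ["sees"], ["gives"]],
}


def expand(symbols, words):
    """Yield the words left over after deriving `symbols` from a prefix of `words`."""
    if not symbols:
        yield words
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try every right-hand side in turn.
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, words)
    elif words and words[0] == first:
        # Terminal: it must match the next input word.
        yield from expand(rest, words[1:])


def count_parses(sentence):
    """Count the derivations of `sentence` from the start symbol s."""
    words = sentence.lower().split()
    return sum(1 for remaining in expand(["s"], words) if not remaining)


def is_grammatical(sentence):
    return count_parses(sentence) > 0


if __name__ == "__main__":
    for text in ["John sees Mary", "the dogs snores", "brown sleeps blue dog the"]:
        print(text, "->", is_grammatical(text))
```

Note that the recogniser accepts "the dogs snores": the grammar has no number agreement, which is one concrete sense in which coverage calls for more specific rules or more features.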

Lexical Ambiguity
I made her duck
- her: possessive pronoun; her: object pronoun
- duck: verb; duck: noun
- make = create; make = cook

Structural Ambiguity
I saw a man in the park with a telescope
- I saw a man in [the park with a telescope]
- I saw [a man] in the park [with a telescope]
- I [saw] a man in the park [with a telescope]

Redundancy
- We discussed computers yesterday
- Den gula bilen (Swedish: "the yellow car")
- (pseudo-)Amharic: Man-the he-died

Semantic Construction
- Given a sentence of a language, is there a systematic way of constructing its semantic representation?
- Can we translate a syntactic structure into an abstract representation of its actual meaning (e.g. first-order logic)?

Compositional Semantics
- Compositional Semantics = the abstract meaning of a sentence (built from the meaning of its parts)
- Situational Semantics = adds context-dependent information
- World knowledge = knowledge about the world shared between groups of people

Forget about it
FBI Technician: "What's 'forget about it'?"
Donnie Brasco: "'Forget about it' is like if you agree with someone, you know, like 'Raquel Welch is one great piece of ass, forget about it.' But then, if you disagree, like 'A Lincoln is better than a Cadillac? Forget about it!' you know? But then, it's also like if something's the greatest thing in the world, like Mingio's Peppers, 'forget about it.' But it's also like saying 'Go to hell!' too. Like, you know, like 'Hey Paulie, you got a one inch pecker?' and Paulie says 'Forget about it!' Sometimes it just means 'forget about it.'"

Construction of Semantic Representations
Three basic principles:
- Lexicalization: try to keep semantic information lexicalized
- Compositionality: pass information up compositionally from terminals
- Underspecification: don't make a choice unless you have to (the interpretation of ambiguous parts is left unresolved)
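
To make the lexicalization and compositionality principles concrete, here is a minimal sketch that maps a syntax tree to a first-order-logic-style string. Everything in it is an assumption made for illustration: the LEXICON entries, the tuple-based tree encoding, the predicate spellings such as see(john,mary), and the function name interpret. It is not the construction method used in the course.

```python
# Word meanings live in the lexicon (lexicalization); phrase meanings are
# obtained by applying one daughter's meaning to another (compositionality).
# All names and the tree format are hypothetical choices for this sketch.
LEXICON = {
    "john": "john",
    "mary": "mary",
    "snores": lambda subj: f"snore({subj})",
    "sees": lambda obj: lambda subj: f"see({subj},{obj})",
}


def interpret(tree):
    """Compute the meaning of a tree given as a word or (label, child, ...)."""
    if isinstance(tree, str):
        return LEXICON[tree]
    label, *children = tree
    meanings = [interpret(child) for child in children]
    if label == "s":
        # s --> np, vp: the verb phrase meaning is applied to the subject.
        np_meaning, vp_meaning = meanings
        return vp_meaning(np_meaning)
    if label == "vp" and len(meanings) == 2:
        # vp --> v, np: the verb meaning is applied to the object.
        v_meaning, np_meaning = meanings
        return v_meaning(np_meaning)
    if len(meanings) == 1:
        # Unary rules (np --> name, vp --> v) pass the meaning up unchanged.
        return meanings[0]
    raise ValueError(f"no semantic rule for {label} with {len(meanings)} daughters")


if __name__ == "__main__":
    tree = ("s", ("np", "john"), ("vp", "sees", ("np", "mary")))
    print(interpret(tree))
```

The grammar-specific part of interpret is small because nearly all information sits in the lexical entries; the price, as the slides note for lexicalized approaches in general, is that those entries become more complex.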

Lexicalization
- Simple grammars BUT terribly unwieldy lexical feature structures.
- Try to express lexical generalizations.
- Alternatively: extend the formalism to make it more expressive. Let special features have dedicated (complex) behaviour.

Compositionality, Frege's Principle
- Meaning ultimately flows from the lexicon
- Meanings are combined by syntactic information
- The meaning of the whole is a function of the meaning of its parts ("parts" = the substructure given by syntax)

Underspecification
- A meaning ϕ of a formalism L is underspecified = it represents an ambiguous sentence in a more compact manner than by a disjunction of all readings
- L is complete = L's disambiguation device produces all possible refinements of any ϕ
- Example: consider a sentence with 3 quantified NPs (with underspecified scoping relations). There are 3! = 6 scope orderings, so L must be able to represent all 2^(3!) = 2^6 = 64 refinements (partial and complete disambiguations) of the sentence.

Phenomena for Underspecification
- local ambiguities, e.g., lexical ambiguities, anaphoric or deictic use of PRO
- global ambiguities, e.g., scopal ambiguities, collective-distributive readings
- ambiguous or incoherent non-semantic information, e.g., PP-attachment, number disagreement

Word Meaning
- Built in from the start?! Or learnt by observation?
- Word usage in context by a community
- "the meaning of a word is its use in the language" (Ludwig Wittgenstein 1953)

Distributional Hypothesis
- Words with similar usage have similar meanings
- Similarity = share contexts (Zellig Harris 1954, 1968)
- Distributional data used to model similarity
- "you shall know a word by the company it keeps" (John Rupert Firth 1957)
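
The distributional hypothesis can be tried out on a very small scale. The sketch below counts, for each word, the words that occur within a fixed window around it, and compares the resulting count vectors with cosine similarity. The six-sentence CORPUS, the window size of 2, and the function names context_vectors and cosine are all assumptions made for this illustration; real distributional models use far larger corpora and usually reweight the raw counts.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for this sketch.
CORPUS = [
    "the dog chases the cat",
    "the dog eats the food",
    "the cat eats the food",
    "the cat chases the mouse",
    "john drives the car",
    "mary drives the car",
]


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    vecs = context_vectors(CORPUS)
    print("dog ~ cat:", round(cosine(vecs["dog"], vecs["cat"]), 3))
    print("dog ~ car:", round(cosine(vecs["dog"], vecs["car"]), 3))
```

On this corpus "dog" comes out closer to "cat" than to "car", because the first two share the contexts "chases" and "eats" while "car" shares only the determiner.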