CSCI 5832 Natural Language Processing

CSCI 5832 Natural Language Processing
Lecture 4
Jim Martin
1/25/07, CSCI 5832 Spring 2007

Today 1/25
- More English Morphology
- FSAs and Morphology
- Break
- FSTs

English Morphology
- Morphology is the study of the ways that words are built up from smaller meaningful units called morphemes.
- We can usefully divide morphemes into two classes:
  - Stems: the core meaning-bearing units
  - Affixes: bits and pieces that adhere to stems to change their meanings and grammatical functions

Inflectional Morphology
- Inflectional morphology concerns the combination of stems and affixes where the resulting word
  - has the same word class as the original
  - serves a grammatical/semantic purpose different from the original

Nouns and Verbs (English)
- Nouns are simple (not really)
  - Markers for plural and possessive
- Verbs are only slightly more complex
  - Markers appropriate to the tense of the verb

FSAs and the Lexicon
- First we'll capture the morphotactics: the rules governing the ordering of affixes in a language.
- Then we'll add in the actual words.

Simple Rules

Adding the Words
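The two steps described under "FSAs and the Lexicon" can be sketched in code: first the morphotactics as a small FSA over morpheme classes, then the actual words plugged in from a lexicon. This is only a minimal sketch. The state names, the class names and the word lists are assumptions made for illustration and are not taken from the slides.

```python
# Hypothetical toy lexicon; the word lists are illustrative, not from the lecture.
LEXICON = {
    "reg-noun": {"cat", "dog", "fox"},
    "irreg-sg-noun": {"goose", "mouse", "ox"},
    "irreg-pl-noun": {"geese", "mice", "oxen"},
    "plural": {"s"},
}

# Morphotactics as an FSA over morpheme classes: state -> {class: next state}.
TRANSITIONS = {
    "q0": {"reg-noun": "q1", "irreg-sg-noun": "q2", "irreg-pl-noun": "q2"},
    "q1": {"plural": "q2"},
    "q2": {},
}
ACCEPTING = {"q1", "q2"}


def recognize(word, state="q0"):
    """Return True if some split of `word` into lexicon morphemes reaches an accepting state."""
    if not word:
        return state in ACCEPTING
    for cls, nxt in TRANSITIONS[state].items():
        for morpheme in LEXICON[cls]:
            # Consume one morpheme of this class and continue from the next state.
            if word.startswith(morpheme) and recognize(word[len(morpheme):], nxt):
                return True
    return False
```

With this lexicon, `recognize("cats")` and `recognize("geese")` succeed while `recognize("gooses")` fails. Note that `recognize("foxs")` also succeeds, because the sketch knows nothing about spelling changes yet.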

Derivational Rules

Parsing/Generation vs. Recognition
- Recognition is usually not quite what we need.
  - Usually, if we find some string in the language, we need to find the structure in it (parsing).
  - Or we have some structure and we want to produce a surface form (production/generation).
- Example: from "cats" to "cat +N +PL" and back

Homework
- How big is your vocabulary?

Projects
- 2 styles of projects
  - Something no one has done. You might ask yourself why no one has done it.
  - Tasks that have benchmarks and current best results from bakeoffs
- To get ideas about the latter, go to acl.ldc.upenn.edu and poke around.

Projects
- Other ideas
  - Anything to do with blogs
  - Machine learning applied to X
  - Clustering (unsupervised)
  - Classification (supervised)
  - Bioinformatic language sources
  - Search engines (getting old)
  - Semantic tagging (getting hot)

Applications
- The kind of parsing we're talking about is normally called morphological analysis.
- It can either be
  - an important stand-alone component of an application (spelling correction, information retrieval)
  - or simply a link in a chain of processing

Finite State Transducers
- The simple story
  - Add another tape
  - Add extra symbols to the transitions
  - On one tape we read "cats", on the other we write "cat +N +PL", or the other way around.

FSTs

Transitions
c:c  a:a  t:t  +N:ε  +PL:s
- c:c means read a c on one tape and write a c on the other
- +N:ε means read a +N symbol on one tape and write nothing on the other
- +PL:s means read +PL and write an s

Typical Uses
- Typically, we'll read from one tape using the first symbol on the machine transitions (just as in a simple FSA).
- And we'll write to the second tape using the other symbols on the transitions.
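The transition pairs above can be tried out with a very small transducer. The sketch below is an illustration under assumptions, not the machine from the lecture figures: the state numbers, the `ARCS` table and the `transduce` helper are hypothetical names, and the machine covers only the single stem "cat".

```python
EPS = ""  # epsilon: nothing is read or written on that tape

# Arcs of a toy transducer for "cat": (state, lexical symbol, surface symbol, next state).
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "+N", EPS, 4),
    (4, "+PL", "s", 5),
]
ACCEPTING = {4, 5}


def transduce(symbols, read_lexical=True, state=0):
    """Yield every output symbol list for `symbols`, reading one tape and writing the other."""
    if not symbols and state in ACCEPTING:
        yield []
    for src, lex, surf, dst in ARCS:
        if src != state:
            continue
        # Pick which side of the pair is the input and which is the output.
        inp, out = (lex, surf) if read_lexical else (surf, lex)
        if inp == EPS:
            # Epsilon on the input side: move without consuming a symbol.
            rests = transduce(symbols, read_lexical, dst)
        elif symbols and symbols[0] == inp:
            rests = transduce(symbols[1:], read_lexical, dst)
        else:
            continue
        for rest in rests:
            yield ([out] if out != EPS else []) + rest
```

Reading the lexical tape, `list(transduce(["c", "a", "t", "+N", "+PL"]))` gives the surface symbols of "cats". Passing `read_lexical=False` with `list("cats")` runs the same arcs in the other direction and recovers `c a t +N +PL`.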

Ambiguity
- Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state.
  - It didn't matter which path was actually traversed.
- In FSTs the path to an accept state does matter, since different paths represent different parses and different outputs will result.

Ambiguity
- What's the right parse for "unionizable"?
  - Union-ize-able
  - Un-ion-ize-able
- Each represents a valid path through the derivational morphology machine.
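To see how both parses of "unionizable" fall out of one machine, here is a minimal sketch that enumerates every path instead of stopping at the first. The morpheme inventory is hypothetical and chosen only to reproduce the two parses on the slide. As a further simplifying assumption, the surface spelling "iz" is listed directly as a form of the suffix -ize, so no spelling rule is needed for the dropped e.

```python
# Hypothetical morpheme inventory, chosen only to reproduce the two parses on the slide.
PREFIXES = {"un"}
STEMS = {"union", "ion"}
# Surface spelling -> morpheme; "iz" stands in for -ize with its final e dropped (assumption).
SUFFIXES = {"ize": "ize", "iz": "ize", "able": "able"}


def parses(word):
    """Return every segmentation of `word` of the shape prefix* stem suffix*."""
    results = []

    def after_stem(rest, acc):
        # An empty remainder means this path ended in an accepting state.
        if not rest:
            results.append(acc)
        for spelling, morpheme in SUFFIXES.items():
            if rest.startswith(spelling):
                after_stem(rest[len(spelling):], acc + [morpheme])

    def before_stem(rest, acc):
        for prefix in PREFIXES:
            if rest.startswith(prefix):
                before_stem(rest[len(prefix):], acc + [prefix])
        for stem in STEMS:
            if rest.startswith(stem):
                after_stem(rest[len(stem):], acc + [stem])

    before_stem(word, [])
    return sorted("-".join(path) for path in results)
```

Calling `parses("unionizable")` returns both `un-ion-ize-able` and `union-ize-able`. Nothing in the machine itself prefers one over the other.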

Ambiguity
- There are a number of ways to deal with this problem:
  - Simply take the first output found
  - Find all the possible outputs (all paths) and return them all (without choosing)
  - Bias the search so that only one or a few likely paths are explored

The Gory Details
- Of course, it's not as easy as "cat +N +PL" <-> "cats".
- As we saw earlier, there are geese, mice and oxen.
- But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes:
  - Cats vs. Dogs
  - Fox and Foxes

Multi-Tape Machines
- To deal with this we can simply add more tapes and use the output of one tape machine as the input to the next.
- So to handle irregular spelling changes we'll add intermediate tapes with intermediate symbols.

Generativity
- Nothing really privileged about the directions.
- We can write from one and read from the other or vice-versa.
- One way is generation, the other way is analysis.

Multi-Level Tape Machines
- We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape.

Lexical to Intermediate Level

Intermediate to Surface
- The "add an e" rule, as in fox^s# <-> foxes#

Foxes
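The "add an e" rule can be approximated with a regular-expression rewrite, shown here in the generation direction only. The slide gives just the fox^s# example. The exact context used below (insert e after a stem-final x, s or z, before a suffix s at the end of the word) follows the usual textbook statement of the rule and should be read as an assumption. The function name is hypothetical.

```python
import re


def intermediate_to_surface(form):
    """Apply e-insertion, then erase morpheme boundaries (^); # marks the end of the word."""
    # Insert e between a stem-final x, s or z and the suffix s at the end of the word.
    form = re.sub(r"([xsz])\^(?=s#)", r"\1e", form)
    # Where the rule does not apply, the boundary is deleted and everything else is copied unchanged.
    return form.replace("^", "")
```

So `intermediate_to_surface("fox^s#")` gives `foxes#`, while `cat^s#` simply becomes `cats#`. A real transducer for this rule would also run in the analysis direction, which this one-way rewrite does not attempt.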

Note
- A key feature of this machine is that it doesn't do anything to inputs to which it doesn't apply, meaning that they are written out unchanged to the output tape.
- Turns out the multiple tapes aren't really needed; they can be compiled away.

Overall Scheme
- We now have one FST that has explicit information about the lexicon (actual words, their spelling, facts about word classes and regularity).
  - Lexical level to intermediate forms
- We have a larger set of machines that capture orthographic/spelling rules.
  - Intermediate forms to surface forms
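The overall scheme amounts to chaining two mappings: lexical level to intermediate form, then intermediate form to surface form. The sketch below shows that cascade for generation only, using ordinary functions in place of compiled transducers. The lexicon contents, the function names and the single spelling rule are all assumptions made for illustration.

```python
import re

# Hypothetical lexicon: regular stems, plus irregular plurals stored whole.
REGULAR_NOUNS = {"cat", "dog", "fox"}
IRREGULAR_PLURALS = {"goose": "geese", "mouse": "mice", "ox": "oxen"}


def lexical_to_intermediate(stem, features):
    """Map e.g. ('fox', ['+N', '+PL']) to the intermediate form 'fox^s#'."""
    plural = "+PL" in features
    if stem in IRREGULAR_PLURALS:
        # Irregular forms come straight from the lexicon, with no boundary symbol.
        return (IRREGULAR_PLURALS[stem] if plural else stem) + "#"
    if stem in REGULAR_NOUNS:
        return stem + ("^s" if plural else "") + "#"
    raise KeyError(stem)


def intermediate_to_surface(form):
    """Apply the e-insertion spelling rule, then drop the ^ and # markers."""
    form = re.sub(r"([xsz])\^(?=s#)", r"\1e", form)
    return form.replace("^", "").rstrip("#")


def generate(stem, features):
    """Run the cascade: the output of the lexicon stage is the input to the spelling stage."""
    return intermediate_to_surface(lexical_to_intermediate(stem, features))
```

For example, `generate("fox", ["+N", "+PL"])` passes through `fox^s#` and comes out as `foxes`, while `generate("goose", ["+N", "+PL"])` passes through the spelling stage unchanged and comes out as `geese`.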

Overall Scheme

Next Time
- Finish Chapter 3