CS674 Natural Language Processing


Last class
  Need for morphological analysis
  Basics of English morphology
  Finite-state morphological parsing
    » Introduction

Goal
  Input: surface form
  Output: stem plus morphological features
  Focus: productive nominal plural (-s), verbal progressive (-ing)
  Examples
    foxes   fox +N +PL
    geese   goose +N +PL
    eating  eat +V +PRES-PART
    goose   (goose +N +SG) or (goose +V)

What knowledge sources will we need?
  Lexicon
    » List of stems and affixes with basic information about each
  Morphotactics
    » Model of morpheme ordering
    » Explains which classes of morphemes can follow others
  Spelling rules (orthographic rules)
    » Model the spelling changes that occur in a word when two morphemes combine

Topics for today
  Finite-state morphological parsing
    » Lexicon and morphotactics
    » Morphological parsing with FSTs
    » Orthographic rules
    » Combining it all

The lexicon
  Usually not represented as a list of words
  Structured as
    » List of stems and affixes
    » Representation of the morphotactics
  Represent via a finite-state automaton (J&M Ch. 2)
  [Figure: J&M Fig 3.2]

Verbal inflection

FSAs for derivational morphology
  Much more complex
  Often use CFGs instead
  Consider adjective morphology: what's the problem?

FSAs for morphological recognition
  Goal: use the FSAs to determine whether an input string of letters makes up a legitimate English word
  Combine the list of stems with the FSA
  Expand each arc with all of the morphemes that comprise the class
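The recognition idea above can be sketched in a few lines of Python. This is a minimal illustration only: the state names, the class names other than those on the slides, and the word lists are assumptions, and the arcs are expanded with their morphemes at lookup time rather than compiled into a letter-level automaton.

```python
# Hypothetical toy lexicon: the word lists are illustrative, not the course lexicon.
LEXICON = {
    "reg-noun": ["fox", "cat", "tree", "cloud"],
    "irreg-sg-noun": ["goose", "sheep", "mouse"],
    "irreg-pl-noun": ["geese", "sheep", "mice"],
    "plural": ["s"],
}

# Morphotactics as an FSA over morpheme classes: state -> [(class, next state)].
ARCS = {
    "q0": [("reg-noun", "q1"), ("irreg-sg-noun", "q2"), ("irreg-pl-noun", "q2")],
    "q1": [("plural", "q2")],
    "q2": [],
}
FINAL = {"q1", "q2"}


def recognize(word, state="q0"):
    """Return True if some path through the FSA spells out `word` exactly."""
    if word == "" and state in FINAL:
        return True
    for morph_class, next_state in ARCS[state]:
        # "Expand" the arc: try every morpheme that belongs to the class.
        for morpheme in LEXICON[morph_class]:
            if word.startswith(morpheme) and recognize(word[len(morpheme):], next_state):
                return True
    return False
```

With this sketch, recognize("cats") and recognize("geese") succeed and recognize("gooses") fails. It also accepts "foxs" and rejects "foxes", because spelling changes are not part of the morphotactics and have to be handled by separate orthographic rules.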

Two-level morphology
  Represents a word as a correspondence between
    Surface level
      » Represents the spelling of the word, i.e. letter sequences
    Lexical level
      » Represents a concatenation of morphemes, i.e. morpheme and feature sequences

Two-level morphology example
  Mapping between the two levels is accomplished via a finite-state transducer (FST)
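One way to picture the correspondence is as a sequence of lexical:surface symbol pairs, from which either level can be read off. The alignment below is an illustrative assumption (it is not taken from the slides), and the empty string stands in for an empty symbol on one level.

```python
# One word as a sequence of (lexical symbol, surface symbol) pairs.
# "" means that nothing appears on that level at this position.
CATS = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", ""), ("+PL", "s")]


def lexical_level(pairs):
    """Concatenate the lexical side: morphemes and features."""
    return "".join(lexical for lexical, _ in pairs)


def surface_level(pairs):
    """Concatenate the surface side: the spelling of the word."""
    return "".join(surface for _, surface in pairs)
```

Here lexical_level(CATS) gives "cat+N+PL" and surface_level(CATS) gives "cats".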

Finite-state transducers
  A finite-state automaton that maps between one set of symbols and another
  An FSA defines a formal language by defining a set of strings
  An FST defines a relation between sets of strings
  Reads one string and generates another

Formal definition
  Q: a finite set of N states q0, q1, ..., qN
  q0: start state
  F: set of final states
  Σ: a finite alphabet of input-output pairs i:o
  δ(q, i:o): transition function between states. Given a state q ∈ Q and a complex symbol i:o, δ(q, i:o) returns a new state q' ∈ Q

FST morphological parser

Two-level lexicon
  reg-noun    irreg-pl-noun      irreg-sg-noun
  tree        g o:e o:e s e      goose
  cloud       sheep              sheep
              m o:i u:ε s:c e    mouse
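The formal definition maps fairly directly onto code. The sketch below is one possible encoding, not a reference implementation: the class name FST, the helper add_path, and the state names are hypothetical, epsilon is only supported on the output side, and the transition function is stored as a list of arcs per state so that non-deterministic machines are allowed. The paths are built from the irregular plural entries of the two-level lexicon, reading each pair as lexical:surface.

```python
from collections import defaultdict


class FST:
    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)
        # delta[state] -> list of (input symbol, output symbol, next state)
        self.delta = defaultdict(list)

    def add_arc(self, src, i, o, dst):
        self.delta[src].append((i, o, dst))

    def transduce(self, symbols, state=None, pos=0):
        """Yield every output string the machine produces for `symbols`."""
        if state is None:
            state = self.start
        if pos == len(symbols) and state in self.finals:
            yield ""
        for i, o, dst in self.delta[state]:
            if pos < len(symbols) and symbols[pos] == i:
                for rest in self.transduce(symbols, dst, pos + 1):
                    yield o + rest


def add_path(fst, src, spec, dst):
    """Add a chain of arcs for a spec such as 'g o:e o:e s e' (x means x:x)."""
    pairs = [p.split(":") if ":" in p else [p, p] for p in spec.split()]
    state = src
    for k, (i, o) in enumerate(pairs):
        # Intermediate states get unique tuple names (an arbitrary choice).
        nxt = dst if k == len(pairs) - 1 else (src, spec, k)
        fst.add_arc(state, i, "" if o == "ε" else o, nxt)
        state = nxt


# Irregular plural nouns from the two-level lexicon.
fst = FST("q0", {"qf"})
for entry in ["g o:e o:e s e", "s h e e p", "m o:i u:ε s:c e"]:
    add_path(fst, "q0", entry, "qf")
```

For example, list(fst.transduce(list("goose"))) yields the single output "geese", and the input "mouse" yields "mice". An input with no matching path, such as "cat", yields no outputs.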

Lexical and intermediate tapes

Orthographic rules
  E-insertion (for example)
    » e added after -s, -z, -x, -ch, -sh before -s
    » watch/watches
    » fox/foxes
  Implement these rules as a cascade of FSTs
    » Output of one transducer is the input to the next transducer
    » One transducer per orthographic rule
    » Each transducer needs to express the constraints necessary for that rule and allow any other string of symbols to pass through unchanged

Transducer for E-insertion
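The cascade idea can be illustrated without building the actual transducer. In the sketch below each stage is a plain string function, and a regular-expression substitution stands in for the E-insertion FST; it is not the transducer itself. The intermediate-tape notation is an assumption made for this example: "^" marks a morpheme boundary and "#" marks the end of the word.

```python
import re


def e_insertion(tape):
    """Insert e after s, z, x, ch, sh at a morpheme boundary before a final s.

    Any tape that does not match the rule context passes through unchanged.
    """
    return re.sub(r"(s|z|x|ch|sh)\^(?=s#)", r"\1^e", tape)


def strip_boundaries(tape):
    """Final stage: remove the boundary symbols to get the surface spelling."""
    return tape.replace("^", "").replace("#", "")


def cascade(tape, stages=(e_insertion, strip_boundaries)):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        tape = stage(tape)
    return tape
```

Under these assumptions cascade("fox^s#") gives "foxes" and cascade("watch^s#") gives "watches", while cascade("cat^s#") gives "cats" because the rule context is not met. Further spelling rules would be added as extra stages in the tuple.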

Ambiguity
  foxes can be a verb as well as a noun
  Local ambiguities occur
    » E.g. caress
  What shall we do?
    » Non-determinism requires the FST-parsing algorithm to include a search algorithm
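A small agenda-based search shows why the parser cannot simply commit to the first matching path. This is a simplified sketch rather than an FST parser: the stem list, the "+3SG" tag, and the treatment of "s" and "es" as interchangeable suffix spellings are all assumptions made for illustration. Treating the failed "care" hypothesis as the local ambiguity in caress is one plausible reading of the slide's example.

```python
# Hypothetical toy lexicon: stem -> possible categories.
STEMS = {"fox": ["N", "V"], "care": ["N", "V"], "caress": ["N", "V"]}

# Suffix spellings and the feature each contributes, per category.
# Spelling rules are collapsed into the "s"/"es" variants here, which overgenerates.
SUFFIXES = {
    "N": [("", "+SG"), ("s", "+PL"), ("es", "+PL")],
    "V": [("", ""), ("s", "+3SG"), ("es", "+3SG")],
}


def parse(surface):
    """Return every analysis of `surface`, exploring all hypotheses depth-first."""
    results = []
    agenda = [(0, None, "")]  # (characters consumed, category, analysis so far)
    while agenda:
        pos, cat, analysis = agenda.pop()
        if cat is None:
            # Every stem that matches here is a separate hypothesis to pursue.
            for stem, cats in STEMS.items():
                if surface.startswith(stem, pos):
                    for c in cats:
                        agenda.append((pos + len(stem), c, f"{stem} +{c}"))
        else:
            rest = surface[pos:]
            for suffix, feature in SUFFIXES[cat]:
                if rest == suffix:
                    results.append(f"{analysis} {feature}".strip())
            # If no suffix matched, this hypothesis is a dead end and is dropped.
    return sorted(results)
```

With this lexicon, parse("foxes") returns both "fox +N +PL" and "fox +V +3SG", which is a global ambiguity that only later context could resolve. For parse("caress"), the hypotheses that start with the stem "care" are explored and abandoned when the remaining "ss" matches no suffix, leaving "caress +N +SG" and "caress +V".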