Matilde Marcolli, CS101: Mathematical and Computational Linguistics, Winter 2015. Additional Topics


Main Reference: Judith L. Klavans, Philip Resnik (Eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language, MIT Press, 1996.

Syntactic Parameters and Models of Language Acquisition
Linguistic parameter space H, with |H| = 2^N for N binary parameters.
Problem: locate a target grammar G in this parameter space on the basis of input text in the language L_G.
We have already seen some models (e.g. the Markov chain model).
General idea: combine linguistic (symbolic) and statistical techniques in constructing such models.
Shyam Kapur and Robin Clark, The Automatic Construction of a Symbolic Parser via Statistical Techniques, in The Balancing Act: Combining Symbolic and Statistical Approaches to Language, MIT Press, 1996, pp. 95-117.

The process of setting values of syntactic parameters also involves reorganizing the grammar to reflect the changed parameter value (more linguistic input): a self-modification process (a realistic model of language acquisition/evolution).
The most commonly used learning algorithm (see previous lectures) moves one step in parameter space, triggered by failure to parse an incoming sentence; this is inefficient: it basically amounts to a random walk in parameter space.
Different idea: the choice of the next step uses previously built structure (incrementally build the grammar, modifying it when some parameter needs to be reset).

Focus on a set of syntactic parameters:
1. Relative order of specifier and head (place of determiner relative to noun, position of VP-modifying adverbs)
2. Relative order of head and complement (VO versus OV; prepositions versus postpositions)
3. Scrambling: (some amount of) free word order allowed
4. Relative order of negative markers and verbs (more than one parameter: English has not after the first tensed auxiliary, French wraps it around the verb: ne ... pas, etc.)
5. Root word order changes: certain word order changes allowed in root clauses but not in embedded clauses (e.g. inversion in root questions in English)

6. Rightward dislocation (as in: That this happens amazes me)
7. Wh-movement: location of wh-questions in the phrase (English: only one, in first position; French: as in English, or in situ; Polish: several wh-questions stacked at the beginning)
8. Exceptional case marking, structural case marking: allows for structures like V[+tense] NP VP[-tense], a tensed verb, a noun phrase, and a verb phrase headed by an infinitive verb
9. Raising and Control: distinguishes raising and control verbs (e.g. they seem to be trying: seem is a raising-to-subject verb, takes a semantic argument that belongs to an embedded predicate; or he proved them to be wrong: prove is a raising-to-object verb; control verbs: he stopped laughing, they told me to go there, ...)
10. Long and short-distance anaphora: the short-distance anaphor himself corefers to an NP within the same local domain; other languages have long-distance anaphors

In Principles and Parameters theory, trigger data (cues) force the learner to set certain particular parameters.
Where do statistical properties of the input text enter in parameter setting?
Example: English has sentences like John thinks that Mary likes him, where him is a local anaphor (for John), and sentences like Mary likes him, where him is not co-referential with anything else in the sentence. By statistical frequency of occurrences, him is not always an anaphor. (This avoids erroneously setting the long-distance anaphor parameter for English; unlike sig in Icelandic, which can only be used as an anaphor, long or short distance.)
Idea: a model of parameter setting should involve statistical analysis of the input text.

Parameter Setting Model
Space with N binary parameters; random subdivision of the parameters into m groups P_1, ..., P_m; first set all parameters in the first group P_1:
1. no parameter is set at the start
2. both values ± for each Π_i ∈ P_1 are competing
3. for each Π_i there is a pair of hypotheses H_i^±
4. these hypotheses are tested on input evidence
5. if H_i^- fails or H_i^+ succeeds, set Π_i = +, else Π_i = -
Then continue with P_2, ..., P_m.

Window sizes for hypothesis testing: suitable window sizes during which the algorithm is sensitive to occurrence/non-occurrence of phenomena; failure to occur within the specified window is taken as negative evidence.
Example:
1. H_i^+: expect not to observe phenomena from a fixed set O_i^- supporting Π_i = -
2. H_i^-: expect not to observe phenomena from a fixed set O_i^+ supporting Π_i = +
Testing H_i^+: two small numbers w_i, k_i
1. take an input window of w_i sentences: record occurrences of phenomena in O_i^-
2. repeat this construction of a window of size w_i for k_i times: let c_i be the number of windows in which phenomena in O_i^- occurred at least once
3. the hypothesis H_i^+ succeeds if c_i / k_i < 1/2
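A rough Python sketch of the window-based hypothesis test and the group-wise parameter-setting loop described above (a minimal sketch, not Kapur and Clark's actual implementation): the corpus is an abstract iterator over sentences, the detector functions for the sets O_i^- are left as inputs, only the H_i^+ test is shown (the symmetric H_i^- test is analogous), and all function and parameter names are hypothetical.

    def window_test(sentences, occurs_in_O_minus, w, k):
        """Test H^+ for one parameter: expect NOT to observe phenomena from O^-.
        Returns True if H^+ succeeds (phenomena occurred in fewer than half the windows)."""
        c = 0  # number of windows in which some phenomenon in O^- occurred at least once
        for _ in range(k):
            window = [next(sentences) for _ in range(w)]   # one window of w input sentences
            if any(occurs_in_O_minus(s) for s in window):
                c += 1
        return c / k < 0.5

    def set_parameters(sentences, groups, w=300, k=10):
        """Set parameters group by group: all of P_1 first, then P_2, and so on.
        `groups` is a list of groups; each group is a list of (name, detector) pairs,
        where detector(sentence) says whether the sentence exhibits a phenomenon in O^-."""
        values = {}
        for group in groups:
            for name, detector in group:
                values[name] = '+' if window_test(sentences, detector, w, k) else '-'
        return values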

The sets O_i^± have to be such that the parser is always capable of analyzing the input for occurrences.
Note: with this method some parameters get set more quickly than others (those parameters that are expressed more frequently). Word order parameters, for example, are expressed in all sentences: they are the first ones to be set.
But there are, for example, languages like German that are SOV but with a V2 parameter moving the verb into second position in root clauses (making some sentences look SVO). We know from the previous discussion of the Gibson-Wexler algorithm that the parameter space for these word order plus V2 parameters has a local maxima problem. What happens to V2 parameter setting in this model? Can it avoid the problem?

Word order and the V2 parameter
Entropy: S(X) = - \sum_{x} p(x) \log p(x), for a random variable X.
Conditional entropy: S(X|Y) = - \sum_{x,y} p(x,y) \log p(x|y) = \sum_{x,y} p(x,y) \log \frac{p(y)}{p(x,y)},
which measures how much better the first variable can be predicted when the second is known.
Pin down word order by analyzing the entropy of positions in the neighborhood of verbs.
Observation: in a V2 language there is more entropy to the left of the verb than to the right (the position to the left is less predictable).
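A toy numerical illustration of this observation (the numbers are invented for illustration): suppose that immediately to the left of a given verb four different words occur, each with probability 1/4, while to the right one word occurs with probability 7/8 and another with probability 1/8; then

S(W \mid \text{left}) = 4 \cdot \tfrac{1}{4} \log_2 4 = 2 \text{ bits}, \qquad S(W \mid \text{right}) = \tfrac{7}{8}\log_2\tfrac{8}{7} + \tfrac{1}{8}\log_2 8 \approx 0.54 \text{ bits},

so the position to the left of the verb is less predictable, as expected in a V2 language.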

In the input text, consider data (v, d, w) with v one of the 20 most frequent verbs, d a position either to the left or to the right of v, and w the word that occurs in that position.
Procedure for setting the V2 parameter: compute the conditional entropies H(W | V, D); if H(W | V, D = left) > H(W | V, D = right), set V2 = +, otherwise set V2 = -.
The correct result is obtained when the algorithm is tested on 9 languages.
What hypothesis H_{V2}^± does this procedure correspond to? Simply use H_{V2}^+ = expect not to observe lower entropy on the left of verbs.
Window size used: 300 sentences, with 10 repetitions.
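A minimal Python sketch of this entropy comparison, assuming the corpus is given as tokenized sentences, the list of frequent verbs is supplied separately, and only the words immediately adjacent to a verb are considered; the function names and these simplifications are mine, not from the paper.

    from collections import Counter
    from math import log2

    def conditional_entropy(triples):
        """Estimate H(W | V, D) from a list of (v, d, w) triples."""
        joint = Counter(triples)                           # counts of (v, d, w)
        context = Counter((v, d) for v, d, _ in triples)   # counts of (v, d)
        n = len(triples)
        h = 0.0
        for (v, d, w), c in joint.items():
            h -= (c / n) * log2(c / context[(v, d)])       # -p(v,d,w) log p(w | v,d)
        return h

    def set_v2(sentences, frequent_verbs):
        """Compare entropy of the word immediately to the left vs. right of the
        given frequent verbs; more entropy on the left suggests V2 = +."""
        left, right = [], []
        for s in sentences:                                # s is a list of tokens
            for i, tok in enumerate(s):
                if tok in frequent_verbs:
                    if i > 0:
                        left.append((tok, 'left', s[i - 1]))
                    if i + 1 < len(s):
                        right.append((tok, 'right', s[i + 1]))
        return '+' if conditional_entropy(left) > conditional_entropy(right) else '-'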

Another application of the same algorithm: Clitic Pronouns
Clitic pronouns (from Greek κλιτικός, 'inflexional'): syntactically independent but phonologically associated with another word.
Example: in French, me, te (object clitics), je, tu (subject clitics), moi, toi (non-clitic, free standing), nous, vous (ambiguous).
Automatic identification and classification of clitic pronouns is related to correctly setting the syntactic parameters for the syntax of pronominals.
Also use the method based on entropies of positions: the algorithm computes entropy profiles H(W | P = p) for three positions to the left and to the right of each pronoun, and clusters together pronouns that have similar entropy profiles; one finds that this gives the correct syntactic grouping.
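A minimal Python sketch of the entropy-profile idea, under the same assumptions (tokenized sentences, a given list of pronouns); the clustering step here is a simple greedy grouping by Euclidean distance between profiles, with a hypothetical threshold, rather than whatever clustering method the paper actually uses.

    from collections import Counter
    from math import log2

    def entropy(counts):
        """Shannon entropy of the empirical distribution given by a Counter."""
        total = sum(counts.values())
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def entropy_profile(sentences, pronoun, offsets=(-3, -2, -1, 1, 2, 3)):
        """Entropy of the word at each offset around the pronoun."""
        profile = []
        for off in offsets:
            words = Counter()
            for s in sentences:
                for i, tok in enumerate(s):
                    if tok == pronoun and 0 <= i + off < len(s):
                        words[s[i + off]] += 1
            profile.append(entropy(words) if words else 0.0)
        return profile

    def cluster_pronouns(sentences, pronouns, threshold=1.0):
        """Greedily group pronouns whose entropy profiles are close."""
        profiles = {p: entropy_profile(sentences, p) for p in pronouns}
        clusters = []
        for p, prof in profiles.items():
            for cluster in clusters:
                rep = profiles[cluster[0]]
                if sum((a - b) ** 2 for a, b in zip(prof, rep)) ** 0.5 < threshold:
                    cluster.append(p)
                    break
            else:
                clusters.append([p])
        return clusters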

Some Concluding Remarks on Linguistics and Statistics
Statistical methods in computational linguistics (especially Hidden Markov Models) have come to play a prominent role in recent years. Statistical methods are also the basis for Natural Language Processing and Machine Translation techniques.
Theoretical linguistics, on the other hand, is focused on understanding the syntactic structures of languages, generative grammars, models of how the human mind acquires and processes language, and of how languages change and evolve in time.
Is there a Linguistics versus Statistics tension in the field? The sociological answer is yes, but the scientific answer should be no.

Language Learning
If there were only a discrete setting where some parameters are switched on and off, one would expect abrupt changes in a learner's acquisition process. In experimental observation of children learning a language, grammar changes happen as changes in frequencies of use of different possibilities, over a stretch of time. This is more consistent with the idea that the language learner is dealing with probabilistic grammars and trying out rules for a time.
A probabilistic grammar is a combination of a theoretical linguistic substrate (context-free grammars, tree-adjoining grammars, etc.) with a probabilistic datum associated to the production rules:
Discrete (algebraic) structures + (continuous) probabilities
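A minimal sketch of this combination for a context-free substrate: each nonterminal carries production rules whose probabilities sum to one, and sampling a sentence means repeatedly choosing rules according to those probabilities (the toy grammar below is invented for illustration).

    import random

    # Toy probabilistic context-free grammar: production rules plus probabilities.
    # For each nonterminal, the rule probabilities sum to 1.
    PCFG = {
        'S':   [ (('NP', 'VP'), 1.0) ],
        'NP':  [ (('Det', 'N'), 0.7), (('N',), 0.3) ],
        'VP':  [ (('V', 'NP'), 0.6), (('V',), 0.4) ],
        'Det': [ (('the',), 1.0) ],
        'N':   [ (('cows',), 0.5), (('grass',), 0.5) ],
        'V':   [ (('graze',), 1.0) ],
    }

    def generate(symbol='S'):
        """Sample a string from the grammar, expanding rules according to their probabilities."""
        if symbol not in PCFG:                     # terminal symbol
            return [symbol]
        rules, weights = zip(*PCFG[symbol])
        rhs = random.choices(rules, weights=weights)[0]
        return [tok for sym in rhs for tok in generate(sym)]

    print(' '.join(generate()))   # e.g. "the cows graze the grass"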

Language Evolution
Both language change by dialect diversification and by interaction with other languages require frequencies/probabilities describing the spreading of change and the proportions of different language speakers among a population. Even assuming every individual adult speaker uses a fixed (non-probabilistic) grammar, probabilistic methods are intrinsic to the description of language change over a population.
Linguistic theories formulated before computational methods (like the wave theory model of language change) are already naturally compatible with the probabilistic approach. Even the setting of syntactic parameters in a given language can be seen as probabilistic (see the head-initial/head-final subdivision).

Parsing Ambiguities
Even completely unremarkable and seemingly unambiguous sentences can have lots of different parsings (it is just that most of them would be considered very unusual). Many of these (grammatical) parsings would be accepted by a grammar but are not in agreement with human perception.
Example: the English sentence The cows are grazing in the grass seems completely unambiguous, but are is also a noun, a measure of size: a hectare is a hundred ares. The alternative parsing would be grammatical, but very unlikely: it is probabilistically suppressed.

Natural versus Computer Languages
Separating out the functioning of natural languages into a grammar and a compiler (that uses the grammar to produce and parse sentences) is convenient for theoretical understanding, but does not correspond to an actual distinction (e.g. to different structures in the human mind).
Analogy with computer languages: grammars (formal languages) also work for describing computer languages; they provide an abstract description of the structure of the computation being performed, but in actual compiler operation grammar and parsing work simultaneously and not as separate entities.
Grammar is an abstract idealization of linguistic data, which has the power of simplicity (like algebraic structures).

Autonomy of Syntax
Chomsky's famous example: the sentences
- revolutionary new ideas appear infrequently
- colorless green ideas sleep furiously
are syntactically equally well structured (same structure); the second is grammatical but would be discarded by any statistical analysis.
Syntax is in itself an interesting (algebraic) structure, but it is autonomous only as long as it is not interfaced with semantics. Syntax as algebraic grammar is one (very important) aspect of linguistics, but not the only one.

The Goals of Linguistics
Describe language: how it is produced, comprehended, learned, and how it evolves over time.
Goal of generative linguistics: produce a model (grammar) that generates sentences in a given language L that reflect the structure as recognized by a human speaker of the language L.
Turing Test for Linguistics: a model passes the test if the sentences it generates are recognized as grammatical and natural by a human speaker. Grammatical is not enough; natural is a matter of degrees: both algebraic and probabilistic aspects contribute (the test cannot be passed by an unweighted grammar).

Criticism of Markov Models
Already in his early paper Three Models for the Description of Language, Chomsky criticized Shannon's n-gram models and statistical approximations to English.
Main point of criticism: it is impossible to choose n and ε so that P_n(s) > ε if and only if the sentence s is grammatical. This was already pointed out by Shannon: at the order-n approximation there will be some more elaborate dependencies affecting grammaticality that the approximation does not capture.
But the inadequacy of Markov models lies in their being finite-state automata, not in their being statistical: probabilistic context-free grammars or probabilistic tree-adjoining grammars are more sophisticated statistical models than Shannon's n-grams.

Reference for these concluding remarks: Steven Abney, Statistical Methods and Linguistics, in The Balancing Act: Combining Symbolic and Statistical Approaches to Language, MIT Press, 1996, pp. 1-26.