Non-parametric Bayesian models for computational morphology


Dissertation defence
Kairit Sirts
Institute of Informatics, Tallinn University of Technology
18.06.2015

Outline
1. NLP and computational morphology
2. Why non-parametric Bayesian modeling?
3. Thesis claims
4. Model 1: joint POS tagging and morphological segmentation
5. Model 2: weakly-supervised morphological segmentation
6. Model 3: morphosyntactic clustering using distributional and morphological cues
7. Future work

Natural language processing
Human-human interaction: 1-2 languages
Human-computer interaction: 90 languages

World's languages
[Figure: the ~7000 world languages arranged by speaker count, from the big languages (Mandarin, English, Spanish, Hindi/Urdu) down to languages with fewer than 1000 speakers; only a fraction (~275 languages have more than 1M speakers) have any substantial level of computer support.]

Language complexity
Related to morphological complexity.

English nouns: 4 inflected forms

      Singular  Plural
Nom   bird      birds
Gen   bird's    birds'

Estonian nouns: 28 inflected forms

      Singular  Plural
Nom   lind      linnud
Gen   linnu     lindude
Part  lindu     linde
Ill   lindu     lindudesse

Morphology
Studies the word's internal structure.

Definition 1 (Haspelmath and Sims, p. 3): Morphology is the study of the combination of morphemes to yield words. Morphemes are the smallest meaningful constituents of words.
  disconnections → dis_connect_ion_s

Definition 2 (Haspelmath and Sims, p. 2): Morphology is the study of systematic covariation in the form and meaning of words.
  Mutter → Mütter

M. Haspelmath and A. D. Sims. Understanding Morphology, 2nd edition. Hodder Education, 2010.

Computational morphology
Useful for machine translation, speech recognition, information retrieval, and natural language generation.

SPARSITY
- Infrequent words (Zipf's law)
- Fixed-size vocabularies

Recognizing a word:
  disconnection: out of vocabulary
  dis, connection: in the vocabulary
  disconnection = dis + connection
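As an illustration of how segmentation counters the out-of-vocabulary problem, here is a minimal sketch; the vocabulary and the greedy splitting routine are invented for illustration and are not the thesis models:

# Minimal sketch: recognize an out-of-vocabulary word by splitting it
# into in-vocabulary parts. Vocabulary and longest-prefix-first strategy
# are illustrative assumptions, not the models proposed in the thesis.
VOCAB = {"dis", "connect", "connection", "ion", "s"}

def segment(word, vocab=VOCAB):
    """Return one split of word into vocabulary items, or None."""
    if word in vocab:
        return [word]
    for i in range(len(word) - 1, 0, -1):  # prefer longer prefixes
        prefix, rest = word[:i], word[i:]
        if prefix in vocab:
            tail = segment(rest, vocab)
            if tail is not None:
                return [prefix] + tail
    return None

print(segment("disconnection"))  # ['dis', 'connection']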

Computational morphology tasks
Morphological segmentation: splitting words into morphemes
  disconnections → dis_connect_ion_s
Part-of-speech tagging (clustering): clustering words based on their syntactic function
  noun, verb, adjective, pronoun, ...
Morphological analysis: assigning each word a set of morphosyntactic features
  hallides → hall+des //_A_ pos pl in //

Why non-parametric Bayesian modeling?
Supervised vs unsupervised: enables working with languages that lack annotated linguistic data.
Algorithmic vs model-based: a probabilistic modeling framework provides semantics to the model.
Frequentist vs Bayesian:
  Frequentist: P(Data | Model)
  Bayesian: P(Data | Model) * P(Model)
Non-parametric priors generate Zipfian distributions.
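The last point can be shown with a few lines of simulation: draws from a Chinese restaurant process, the simplest non-parametric prior, produce a few large clusters and a long tail of small ones. A sketch under toy settings (the concentration parameter alpha is chosen arbitrarily):

import random

def crp(n_customers, alpha=1.0):
    """Chinese restaurant process: the (n+1)-th customer joins table k
    with probability count_k / (n + alpha), or opens a new table with
    probability alpha / (n + alpha)."""
    tables = []  # number of customers at each table
    for n in range(n_customers):
        r = random.uniform(0, n + alpha)
        for k, weight in enumerate(tables + [alpha]):
            r -= weight
            if r <= 0:
                break
        if k == len(tables):
            tables.append(1)   # open a new table
        else:
            tables[k] += 1     # join an existing table
    return tables

sizes = sorted(crp(10000), reverse=True)
print(sizes[:10])  # a few big tables, then a long tail of singletons: Zipf-like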

Claim A
For unsupervised or weakly-supervised learning of natural language structures, it is vital to model not only the known properties of those structures but also some latent regularities or patterns, even if they have no specific meaning in linguistic terms.

Claim B
Unsupervised learning can be improved by integrating different aspects of the same process into a joint model; this helps to resolve ambiguities, leading to overall better results.

Model 1
Joint POS induction and morphological segmentation

Joint POS induction and morphological segmentation

NOUN       VERB  VERB      PREP  DET  NOUN
Children   are   playing   in    the  courtyard
Child_ren  are   play_ing  in    the  court_yard

Results
Competitive results in POS induction, tested on 15 languages.
Mediocre results in morphological segmentation, tested on 4 languages.
Assessing the joint learning with semi-supervised experiments (Estonian):

                 Tags   Segments
Unsupervised     47.6   51.9
Semi-supervised  40.5   44.5

Contributions
State-of-the-art results in unsupervised POS induction over several languages.
Empirical evidence that morphological information and POS assignments influence each other in the joint learning setting (Claim B).

Model 2
Weakly-supervised morphological segmentation

Weakly-supervised morphological segmentation
Adaptor Grammars framework (Johnson et al., 2007): combines probabilistic context-free grammars with non-parametric Bayesian modeling.
Two weakly-supervised methods:
- AG Select, which uses model selection
- Semi-supervised AG
Comparing morphology grammars: a word is a sequence of morphemes, with morpheme sub- or super-structures.

Grammars for learning morphology

MorphSeq:     Word → Morph+

SubMorphs:    Word → Morph+
              Morph → SubMorph+

Compounding:  Word → Compound+
              Compound → Prefix* Stem Suffix*
              Prefix, Stem, Suffix → SubMorph+
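To make the grammar shapes concrete, the sketch below expands the Compounding grammar top-down with arbitrary toy rule probabilities and an invented sub-morph inventory; real Adaptor Grammars additionally cache and reuse whole subtrees rather than sampling rules independently:

import random

SUBMORPHS = ["sal", "t", "i", "ness", "un", "do", "er"]  # invented inventory

def sub_morph_seq():                        # SubMorph+
    return [random.choice(SUBMORPHS) for _ in range(random.randint(1, 2))]

def compound():                             # Compound -> Prefix* Stem Suffix*
    prefixes = [sub_morph_seq() for _ in range(random.randint(0, 1))]
    stem = [sub_morph_seq()]
    suffixes = [sub_morph_seq() for _ in range(random.randint(0, 2))]
    return prefixes + stem + suffixes

def word():                                 # Word -> Compound+
    compounds = [compound() for _ in range(random.randint(1, 2))]
    morphs = [m for c in compounds for m in c]
    return "_".join("".join(m) for m in morphs)

print(word())  # e.g. 'salt_i_ness'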

Results
Average F1-scores over four languages (Eng, Est, Fin, Tur); weakly-supervised models use 1000 annotated word types.

                 Unsupervised  Weakly-supervised
AG MorphSeq      58.0          63.4
AG SubMorphs     63.3          66.1
AG Compounding   62.4          69.8 (1)
AG Select        -             70.8 (1)

(1) Turkish excluded

Contributions
State-of-the-art results in both unsupervised and weakly-supervised morphological segmentation across several languages.
Empirical evidence that grammars modeling additional latent sub- or super-structures perform consistently better than grammars modeling only flat morpheme sequences (Claims A and B).

Model 3
Morphosyntactic clustering using distributional and morphological cues

Morphosyntactic clustering using distributional and morphological cues
- Unsupervised clustering model
- Distributional information via word embeddings
- Non-parametric prior using a suffix similarity function
- Clustering and similarity function learned jointly

Word embeddings
Trained with neural networks; clustered as multivariate Gaussian random variables.
[Figure: example contexts ("and began copying", "and began to", "to peaceful sounds", "on peaceful terms") and a 2-D plot of embeddings for words such as peaceful, pedantic, guarded, began, played, stepped.]
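A minimal sketch of the "clusters as multivariate Gaussians" idea: score an embedding under each cluster's Gaussian and assign it to the best one. The dimensionality, the cluster parameters, and the embedding itself are random placeholders here; in the actual model they are learned.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
dim = 50

# Placeholder cluster parameters: random means and identity covariances
# stand in for the parameters that the real model infers.
clusters = {
    "ADJ-like": (rng.normal(size=dim), np.eye(dim)),
    "VERB-like": (rng.normal(size=dim), np.eye(dim)),
}
embedding = rng.normal(size=dim)  # stand-in for a trained word embedding

scores = {name: multivariate_normal.logpdf(embedding, mean=mu, cov=cov)
          for name, (mu, cov) in clusters.items()}
print(max(scores, key=scores.get))  # most likely cluster for this word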

Results on English
Good results on English; less impressive results on other languages.

Model             # Clusters  Accuracy
K-means baseline  104         16.1
IGMM baseline     55.6        41.0
Our model         47.2        64.0

Contributions
Empirical evidence that the joint model using both sources of information learns better clusters than the one using distributional information only (Claim B).
Evidence that the non-parametric model, when allowed to choose the number of morphosyntactic clusters freely, makes a reasonable choice on English (Claim A).

Future research
Study the relations between suffixes and (morpho)syntactic categories in morphologically complex languages; the current models are probably biased toward English.
Combine the models:
- Use Adaptor Grammar segmentation in the joint POS induction and segmentation model
- Combine the two syntactic clustering models
- Use learned suffixes as features in the morphosyntactic clustering model
Apply the segmentation models to more languages.

Conclusions
Three models of computational morphology:
- Defined in the non-parametric Bayesian framework
- Unsupervised or weakly-supervised
- All employ joint learning in different ways and demonstrate that it is beneficial
- Demonstrate the utility of modeling additional latent structures

Contributions
Joint POS induction and morphological segmentation:
- State-of-the-art results in unsupervised POS induction over several languages
- Morphological information and POS assignments influence each other in the joint learning setting (Claim B)
Weakly-supervised morphological segmentation:
- State-of-the-art results in morphological segmentation across several languages
- Modeling latent sub- or super-structures is helpful for learning morphological segmentations (Claims A and B)
Morphosyntactic clustering using distributional and morphological cues:
- The model using both sources of information learns better clusters than the one using distributional information only (Claim B)
- The non-parametric model, allowed to choose the number of morphosyntactic clusters freely, makes a reasonable choice on English (Claim A)

Question 1
A question regarding the joint POS induction and morphological segmentation model: One innovation of your model over prior work is the ability to learn the number of tags automatically by using the infinite HMM. What do you think the impact on your model's performance would be if you used a fixed finite number of tags instead, using Dirichlet priors?

Question 2
Regarding the model for morphological segmentation using Adaptor Grammars: In the Adaptor Grammars framework, it was difficult to introduce a weighting factor for a small set of labeled data by simply including each labeled word in the dataset multiple times. What do you think about using weights for the labeled words when computing the posterior grammar, after training? Would that achieve the goal of giving higher weight to observed segmentations?

Adaptor Grammars
[Figure: full parse tree of sing_ing, expanding Word → Morphs → Morph Morphs → ... → Chars → individual characters.]

PCFG:
P(Word → sing_ing) = P(Word → Morphs) · P(Morphs → Morph Morphs) · P(Morph → Chars) · P(Chars → Char Chars) · P(Char → s) · P(Char → i) · ...

Adaptor Grammar:
P(Word → sing_ing) = P(Word → Morphs) · P(Morphs → Morph Morphs) · P(Morph → sing) · P(Morph → ing)
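The cached-subtree idea can be reduced to a toy CRP-style adaptor over Morph yields: the first "ing" must be spelled out character by character under the base PCFG, but afterwards the whole subtree can be reused with probability proportional to its count. All constants and the uniform character model below are illustrative, and the Pitman-Yor discount is omitted:

class MorphAdaptor:
    """Toy CRP-style cache over Morph yields (Pitman-Yor discount omitted)."""
    def __init__(self, base_prob, alpha=1.0):
        self.base_prob = base_prob   # PCFG probability of spelling out chars
        self.alpha = alpha
        self.counts = {}
        self.total = 0

    def prob(self, morph):
        # Mixture of reusing a cached subtree and generating from the base.
        denom = self.total + self.alpha
        cached = self.counts.get(morph, 0) / denom
        fresh = self.alpha / denom * self.base_prob(morph)
        return cached + fresh

    def observe(self, morph):
        self.counts[morph] = self.counts.get(morph, 0) + 1
        self.total += 1

morphs = MorphAdaptor(lambda m: (1 / 27) ** len(m))  # uniform toy char model
print(morphs.prob("ing"))  # tiny: must be built character by character
morphs.observe("ing")
print(morphs.prob("ing"))  # large: 'ing' is now cached as a single unit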

Semi-supervised AG
Use labeled data to extract counts of different rules and subtrees.
Labels must be compatible with the grammar; full bracketing is not required.
Example input: (Morph s i n g) (Morph i n g)

Question 3
Regarding the model for morphological segmentation using Adaptor Grammars: Is it possible to use a small labeled set both for selecting a morphological template, as in AG Select, and for gathering counts from labeled segmentations, as in semi-supervised AG?

AG Select

Template grammar:
  Word → M1 | M1 M2
  M1 → M11 | M11 M12
  M2 → M21 | M21 M22
  M11, M12, M21, M22 → Chars+

[Figure: binary tree over the characters s-a-l-t-i-n-e-s-s, with M1 dominating M11 ("sal") and M12 ("t"), and M2 dominating M21 ("i") and M22 ("ness").]

Segmentations read off the tree:
  M1 M2            → salt_iness
  M1 M21 M22       → salt_i_ness
  M11 M12 M2       → sal_t_iness
  M11 M12 M21 M22  → sal_t_i_ness


Feature-based similarity function

           -   -d  -ed  -c  -ic  -s  -es
stepped    1   1   1    0   0    0   0
played     1   1   1    0   0    0   0
metallic   1   0   0    1   1    0   0
pedantic   1   0   0    1   1    0   0
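A sketch of the feature extraction behind this table: mark which suffixes from a fixed list each word ends with, then compare words by their shared active features. The suffix list mirrors the table, but the overlap score is an illustrative stand-in, since in the thesis the similarity function is learned jointly with the clustering.

SUFFIXES = ["", "d", "ed", "c", "ic", "s", "es"]  # '' = the empty suffix

def suffix_features(word):
    """Binary vector: does word end with each suffix? ('' always matches.)"""
    return [int(word.endswith(s)) for s in SUFFIXES]

def similarity(w1, w2):
    """Number of shared active suffix features (illustrative score)."""
    return sum(a & b for a, b in zip(suffix_features(w1), suffix_features(w2)))

print(suffix_features("stepped"))         # [1, 1, 1, 0, 0, 0, 0]
print(similarity("stepped", "played"))    # 3: '', -d, -ed
print(similarity("stepped", "metallic"))  # 1: only the empty suffix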

Distance-dependent Chinese restaurant process
[Figure: seating diagrams in which words (stepped, played, metallic, pedantic) choose tables; suffix-similar words such as stepped and played end up at the same table, and tables are associated with morphosyntactic categories (Afp, Vmis, Ncns).]
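A sketch of the seating rule in the figure: each word picks another word to link to with weight given by a proximity function (here a shared-suffix count), or itself with weight alpha; the tables are the connected components of the resulting link graph. The proximity function and alpha are illustrative assumptions, not the learned similarity used in the thesis.

import random
from collections import defaultdict

SUFFIXES = ["d", "ed", "c", "ic", "s", "es"]

def proximity(w1, w2):
    """Illustrative proximity: number of suffixes the two words share."""
    return sum(w1.endswith(s) and w2.endswith(s) for s in SUFFIXES)

def ddcrp_tables(words, alpha=1.0):
    """Distance-dependent CRP: sample a link for every customer, then
    read the tables off as connected components of the link graph."""
    n = len(words)
    links = {}
    for i in range(n):
        weights = [alpha if j == i else proximity(words[i], words[j])
                   for j in range(n)]
        links[i] = random.choices(range(n), weights=weights)[0]

    parent = list(range(n))          # union-find over the sampled links
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in links.items():
        parent[find(i)] = find(j)

    tables = defaultdict(list)
    for i in range(n):
        tables[find(i)].append(words[i])
    return list(tables.values())

print(ddcrp_tables(["stepped", "played", "metallic", "pedantic"]))
# e.g. [['stepped', 'played'], ['metallic', 'pedantic']]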