
Advanced Artificial Intelligence
Part II. Statistical NLP: Applications of HMMs and PCFGs in NLP
Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme
Most slides taken (or adapted) from Adam Przepiorkowski (Poland); figures by Manning and Schütze.

Contents
- Part-of-speech tagging: the task, why it matters, and approaches: naive, VMM, HMM, transformation-based learning
- Probabilistic parsing: PCFGs and treebanks
Based on parts of chapters 10, 11 and 12 of Manning and Schütze, Statistical NLP, and chapter 8 of Jurafsky and Martin, Speech and Language Processing.

Motivations and Applications
Part-of-speech tagging: assign to each word its contextually correct part of speech. The example below is ambiguous between two tag sequences; the first is the intended reading, the second treats "put" as a noun and "chairs" as a verb:

The  representative  put  chairs  on  the  table
AT   NN              VBD  NNS     IN  AT   NN
AT   JJ              NN   VBZ     IN  AT   NN

Some tags: AT: article, NN: singular or mass noun, VBD: verb, past tense, NNS: plural noun, IN: preposition, JJ: adjective, VBZ: verb, 3rd person singular present.

[Table 10.1 of Manning and Schütze]

Why POS tagging?
- It is the first step in parsing: more tractable than full parsing, and a useful intermediate representation.
- It is a preprocessing step for several other, more complex NLP tasks, e.g. information extraction, word sense disambiguation, speech synthesis.
- It is the oldest task in statistical NLP and easy to evaluate.
- It is inherently sequential.

Different approaches
- Start from a tagged training corpus and learn from it.
- Simplest approach: for each word, predict its most frequent tag (a 0th-order Markov model). This already achieves about 90% accuracy at the word level (for English).
- The best taggers achieve 96-97% accuracy at the word level (for English). At the sentence level this is less impressive: with, say, 20 words per sentence, that amounts to on average one tagging error per sentence.
- It is unclear how much better one can do, since the gold-standard annotations themselves contain human errors.
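As a concrete illustration, here is a minimal sketch of this baseline (a sketch, not a reference implementation: the corpus format, a list of sentences of (word, tag) pairs, and the fallback to the overall most frequent tag for unknown words are assumptions made here):

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """For each word, record its most frequent tag in the training corpus."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            overall[tag] += 1
    default_tag = overall.most_common(1)[0][0]   # fallback for unknown words
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return lexicon, default_tag

def tag_baseline(words, lexicon, default_tag):
    """Tag every word independently, ignoring all context."""
    return [(w, lexicon.get(w, default_tag)) for w in words]

corpus = [[("the", "AT"), ("representative", "NN"), ("put", "VBD"),
           ("chairs", "NNS"), ("on", "IN"), ("the", "AT"), ("table", "NN")]]
lexicon, default_tag = train_baseline(corpus)
print(tag_baseline(["the", "table", "walks"], lexicon, default_tag))
# [('the', 'AT'), ('table', 'NN'), ('walks', 'AT')] -- 'walks' unseen, gets the fallback
```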

Notation [Table 10.2 of Manning and Schütze]

Visible Markov Model
- Assume the visible Markov model (VMM) of last week, with states corresponding to tags.
- Lexical (word) information is only represented implicitly.

[Table 10.3 of Manning and Schütze]

Hidden Markov Model
- Make the lexical information explicit and use HMMs: state values correspond to possible tags, observations to possible words.
- So we have transition probabilities $a_{jk} = P(t^k \mid t^j)$ and emission probabilities $b_{jl} = P(w^l \mid t^j)$, and a tagged sentence has probability $P(w_1 \ldots w_n, t_1 \ldots t_n) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)$.

Estimating the parameters
- From a tagged corpus, by maximum likelihood estimation: $a_{jk} = C(t^j, t^k) / C(t^j)$ and $b_{jl} = C(w^l : t^j) / C(t^j)$, where $C(\cdot)$ are corpus counts.
- So even though a hidden Markov model is being learned, everything is visible during learning!
- Possibly apply smoothing (cf. n-grams).
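These counts are straightforward to collect. A minimal sketch under the same assumed corpus format as above, adding start and end markers so that each row of the transition table sums to one:

```python
from collections import Counter

def estimate_hmm(tagged_sentences, start="<s>", end="</s>"):
    """MLE: a(t'|t) = C(t,t')/C(t) and b(w|t) = C(w:t)/C(t)."""
    trans, emit, tag_count = Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        prev = start
        tag_count[start] += 1
        for word, tag in sentence:
            trans[(prev, tag)] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            prev = tag
        trans[(prev, end)] += 1  # close the sentence so each row sums to 1
    a = {pair: c / tag_count[pair[0]] for pair, c in trans.items()}
    b = {pair: c / tag_count[pair[0]] for pair, c in emit.items()}
    return a, b
```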

[Table 10.4 of Manning and Schütze]

Tagging with an HMM
- For a new, untagged sentence, run the Viterbi algorithm to find the most probable tag sequence.
- Similar techniques are employed for protein secondary structure prediction.
- Problems: the need for a large tagged corpus, and unknown words (cf. Zipf's law).
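A minimal, unsmoothed Viterbi sketch over the dictionaries `a` and `b` estimated above; it assumes every word of the input was seen in training (real taggers add smoothing and the unknown-word handling discussed below):

```python
def viterbi(words, tags, a, b, start="<s>"):
    """Find the most probable tag sequence for `words` under the HMM (a, b)."""
    # delta[t]: probability of the best tag path ending in tag t at the current word
    delta = {t: a.get((start, t), 0.0) * b.get((t, words[0]), 0.0) for t in tags}
    backpointers = []
    for w in words[1:]:
        new_delta, back = {}, {}
        for t in tags:
            # best previous tag to transition from
            p = max(tags, key=lambda p: delta[p] * a.get((p, t), 0.0))
            new_delta[t] = delta[p] * a.get((p, t), 0.0) * b.get((t, w), 0.0)
            back[t] = p
        delta = new_delta
        backpointers.append(back)
    # recover the best path by following backpointers from the best final tag
    best = max(delta, key=delta.get)
    path = [best]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return list(reversed(path))
```

In practice one works with log probabilities to avoid underflow on long sentences; the plain products above keep the sketch close to the formulas.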

Unknown words
- There are two classes of parts of speech: open (e.g. nouns) and closed (e.g. articles). For closed classes all words are known, so an unknown word can only belong to an open class.
- Emission probabilities for unknown words can be estimated from word features such as capitalization and suffix, with the combined estimate divided by a normalization constant Z.

What if no tagged corpus is available?
- Use traditional HMM training (Baum-Welch), but assume a dictionary (lexicon) that lists the possible tags for each word.
- One possibility: initialize the word generation (symbol emission) probabilities as
$$b^*_{jl} = \begin{cases} 0 & \text{if } t^j \text{ is not a possible part of speech for } w^l \\ 1/|T(w^l)| & \text{otherwise} \end{cases}$$
where $T(w^l)$ is the set of tags admissible for $w^l$.

Assume $b^*_{jl} = P(t^j \mid w^l) = 1/|T(w^l)|$, i.e. uniform over the admissible tags. We want $P(w^l \mid t^j)$, which by Bayes' rule is

$$P(w^l \mid t^j) = \frac{P(t^j \mid w^l)\,P(w^l)}{P(t^j)} = \frac{P(t^j \mid w^l)\,P(w^l)}{\sum_{w^m} P(t^j \mid w^m)\,P(w^m)} = \frac{\dfrac{1}{|T(w^l)|} \cdot \dfrac{C(w^l)}{\sum_{w^k} C(w^k)}}{\sum_{w^m} \dfrac{1}{|T(w^m)|} \cdot \dfrac{C(w^m)}{\sum_{w^k} C(w^k)}} = \frac{C(w^l)/|T(w^l)|}{\sum_{w^m} C(w^m)/|T(w^m)|}$$

(the sums over $w^m$ effectively range over the words for which $t^j$ is admissible, since $P(t^j \mid w^m) = 0$ elsewhere).
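In code, this initialization needs only raw word counts and the lexicon. A sketch, where `lexicon` (mapping each word to its set of admissible tags) and `word_counts` are assumed given:

```python
def init_emissions(word_counts, lexicon):
    """b*(w|t) = (C(w)/|T(w)|) / sum of C(w')/|T(w')| over words w' admitting t."""
    weight = {w: word_counts[w] / len(lexicon[w]) for w in word_counts}
    tags = {t for admissible in lexicon.values() for t in admissible}
    b = {}
    for t in tags:
        allowed = [w for w in weight if t in lexicon[w]]
        z = sum(weight[w] for w in allowed)  # the normalization constant for tag t
        for w in allowed:
            b[(t, w)] = weight[w] / z
    return b

# toy example: "chairs" can be NNS or VBZ, "the" only AT
lexicon = {"the": {"AT"}, "chairs": {"NNS", "VBZ"}, "table": {"NN"}}
word_counts = {"the": 100, "chairs": 10, "table": 20}
print(init_emissions(word_counts, lexicon))
```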

Transformation-Based Learning (Eric Brill)
- Observation: predicting the most frequent tag already results in excellent behaviour.
- Why not try to correct the mistakes that are made?
- Apply transformation rules of the form: IF conditions THEN replace tag_j by tag_i.
- Which transformations/corrections are admissible? How can they be learned?

[Tables 10.7 and 10.8 of Manning and Schütze]

The learning algorithm
- Greedy search: tag the corpus with the baseline tagger; repeatedly select the transformation that most reduces the number of tagging errors against the gold standard, apply it, and append it to the ordered rule list; stop when no transformation yields a further improvement.
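A sketch of the greedy loop, with two simplifications that are assumptions of this sketch rather than Brill's actual design: a single rule template (retag a word based on the previous tag alone) and a corpus represented as one flat tag sequence:

```python
from collections import Counter

def score_rules(gold, current):
    """Score each candidate rule (prev_tag, from_tag, to_tag) by fixes - breakages."""
    score = Counter()
    tagset = set(gold)
    for i in range(1, len(gold)):
        prev, cur = current[i - 1], current[i]
        if cur != gold[i]:
            score[(prev, cur, gold[i])] += 1      # the rule would fix this error
        else:
            for to in tagset - {cur}:             # the rule would break a correct tag
                score[(prev, cur, to)] -= 1
    return score

def tbl_learn(gold, baseline, max_rules=10):
    """Greedy TBL loop: pick the best rule, apply it, repeat until no gain."""
    current, rules = list(baseline), []
    for _ in range(max_rules):
        score = score_rules(gold, current)
        if not score:
            break
        rule, gain = score.most_common(1)[0]
        if gain <= 0:
            break
        prev, frm, to = rule
        # apply simultaneously, reading the tags as they were before this pass
        current = [to if i > 0 and current[i - 1] == prev and t == frm else t
                   for i, t in enumerate(current)]
        rules.append(rule)
    return rules, current
```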

Remarks
- Other machine learning methods could be applied as well (e.g. decision trees, rule learning).

Rule-based tagging
- The oldest method: hand-crafted rules.
- Start by assigning all potential tags to each word.
- Disambiguate using manually created rules, e.g. for the word "that":
  IF the next word is an adjective, an adverb or a quantifier,
  AND the word after that is a sentence boundary,
  AND the previous word is not a consider-type verb,
  THEN erase all tags apart from the adverbial tag,
  ELSE erase the adverbial tag.

Learning PCFGs for parsing
- Learning from complete data: everything is observed/visible, since the examples are parse trees (cf. POS tagging from tagged corpora). This is how PCFGs are learned from treebanks, and it is easy: just counting.
- Learning from incomplete data, i.e. from sentences without parse trees, is harder: it requires the EM approach, here the inside-outside algorithm.

How does it work?
- Let R := {r | r is a rule that occurs in one of the parse trees in the corpus}.
- For every rule N → ζ in R, estimate its probability by relative frequency:
  P(N → ζ) = Count(N → ζ) / Count(N).
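A sketch of the counting step; the nested-tuple tree representation (label, children...) is an assumption chosen to keep the example self-contained, not a standard treebank format:

```python
from collections import Counter

def productions(tree):
    """Yield (lhs, rhs) pairs for every node, including lexical rules."""
    label, children = tree[0], tree[1:]
    if isinstance(children[0], tuple):          # internal node
        yield (label, tuple(c[0] for c in children))
        for c in children:
            yield from productions(c)
    else:                                       # preterminal: lexical rule
        yield (label, children)

def estimate_pcfg(treebank):
    """P(N -> zeta) = Count(N -> zeta) / Count(N), by relative frequency."""
    rule_count, lhs_count = Counter(), Counter()
    for tree in treebank:
        for lhs, rhs in productions(tree):
            rule_count[(lhs, rhs)] += 1
            lhs_count[lhs] += 1
    return {(l, r): c / lhs_count[l] for (l, r), c in rule_count.items()}

# e.g. (S (NP (AT the) (NN dog)) (VP (VBD barked)))
tree = ("S", ("NP", ("AT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked")))
print(estimate_pcfg([tree]))
```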

Conclusions
- POS tagging is a typical application of statistical NLP; we saw VMMs, HMMs and TBL.
- Statistical taggers give good results for positional languages (such as English) and are relatively cheap to build, but overfitting avoidance is needed, they are difficult to interpret (black boxes), and they are linguistically naive.

Conclusions (continued)
- Rule-based taggers give very good results and are interpretable, but are expensive to build; they are presumably better suited for free word order languages.
- Transformation-based learning: a good compromise?
- Treebank grammars are pretty effective (and easy to learn), but it is hard to get the corpus.