Introduction to Advanced Natural Language Processing (NLP)


Advanced Natural Language Processing, L645 / B659, Dept. of Linguistics, Indiana University, Fall 2015

Definition of CL 1
"Computational linguistics is the study of computer systems for understanding and generating natural language." (Ralph Grishman (1986), Computational Linguistics: An Introduction, Cambridge University Press)

Definition of CL 2
"Simply put, computational linguistics is the scientific study of language from a computational perspective. Computational linguists are interested in providing computational models of various kinds of linguistic phenomena. These models may be knowledge-based ('hand-crafted') or data-driven ('statistical' or 'empirical'). Work in computational linguistics is in some cases motivated from a scientific perspective in that one is trying to provide a computational explanation for a particular linguistic or psycholinguistic phenomenon; and in other cases the motivation may be more purely technological in that one wants to provide a working component of a speech or natural language system. ..."

Definition of CL 2 (cont.)
"... Indeed, the work of computational linguists is incorporated into many working systems today, including speech recognition systems, text-to-speech synthesizers, automated voice response systems, web search engines, text editors, language instruction materials, to name just a few." (http://www.aclweb.org/archive/misc/what.html, retrieved 8/3/15)

Short History
- 1950s: Machine Translation
- 1964: ALPAC Report
- 1966: ELIZA
- 1973: SHRDLU
- 1980s: knowledge-based CL
- 1990s & beyond: statistical / machine learning approaches in CL

Early 1950s
Machine Translation (MT) was one of the earliest applications of computers.
- Major players: US and USSR
- Russian to English and the reverse
- Georgetown University (Washington) system: translated sample texts in 1954
- Euphoria: a lot of funding, many groups in the US and USSR
- BUT: the systems could not scale up

ALPAC Report
Assessed the research results of groups working on MT. Conclusions:
- MT is not possible in the near future
- Funding for MT should cease!
- Basic research should be supported
- Word-by-word translation does not work
- Linguistic knowledge is needed

ELIZA
The first chatterbot: a computer program that mimics human conversation.
- Author: Joseph Weizenbaum (MIT)
- Simulates a (Rogerian) therapist
- The user types in a statement or set of statements in natural language
- ELIZA then analyzes the user's statement and generates a response
- Basic technology: pattern matching

USER: You don't argue with me.
ELIZA: WHY DO YOU THINK I DON'T ARGUE WITH YOU
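The pattern-matching idea can be sketched in a few lines of Python. The rules and pronoun table below are purely illustrative, not Weizenbaum's original script:

```python
import re

# A minimal ELIZA-style pattern matcher (illustrative rules only).

# Swap first- and second-person words when echoing the user's phrase back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "you": "I", "your": "my"}

# (pattern, response template) pairs, tried in order; {0} is the echoed phrase.
RULES = [
    (re.compile(r"you don'?t (.+)", re.I), "WHY DO YOU THINK I DON'T {0}"),
    (re.compile(r"i feel (.+)", re.I), "WHY DO YOU FEEL {0}"),
    (re.compile(r"(.*)", re.I), "PLEASE TELL ME MORE"),
]

def reflect(phrase):
    """Swap pronouns so the echoed phrase makes sense from ELIZA's side."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in phrase.split())

def respond(statement):
    for pattern, template in RULES:
        match = pattern.search(statement)
        if match:
            phrase = match.group(1).rstrip(".!?")
            return template.format(reflect(phrase)).upper()
    return "PLEASE GO ON"

print(respond("You don't argue with me."))
# WHY DO YOU THINK I DON'T ARGUE WITH YOU
```

Everything here is surface manipulation: the program has no model of meaning, which is exactly why the approach did not generalize beyond scripted small talk.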

SHRDLU
Interaction with a robot in a blocks world.
- Author: Terry Winograd (MIT)
The user can:
- ask the robot to manipulate the blocks
- ask it about the blocks' configurations
- ask it about its reasoning
- update facts
SHRDLU understands language in a limited domain by using syntactic parsing and semantic reasoning:
- large-scale grammar of English + parser
- procedural semantics for words and phrases

Knowledge-Based CL
- Proof of concept & manually written rules
- Linguistic/logic paradigm extensively pursued
- Later: development of linguistic formalisms (Lexical Functional Grammar, Head-Driven Phrase Structure Grammar, Tree-Adjoining Grammar, etc.)
Limitations:
- not robust enough
- few applications
- not scalable
... though such systems are still getting better.
Addressing these limitations led to the more recent statistical approaches.

Statistical / Machine Learning Approaches
- Instead of writing rules, have the computer learn rules / regularities
- Approach the massive ambiguity problem with probabilities
- Need annotated data for training
- Data sparseness problem
- Unsupervised learning does not help here: it yields no linguistically relevant rules

To sum up, there are two main approaches to doing work in CL:
- Theory-driven (knowledge-based): working from a theoretical framework, come up with a scheme for an NLP task, e.g., parse a sentence using a handwritten HPSG grammar
- Data-driven (statistical): working from some data (and some framework), derive a scheme for an NLP task, e.g., parse a sentence using a grammar derived from a corpus
The difference is often a matter of degree. This course is more data-driven & probabilistic.

Rarity of usage
Consider the following (Abney 1996):
(1) The a are of I.
(2) John saw Mary.
"The a are of I" is an acceptable noun phrase (NP): a and I are labels on a map, and an are is a measure of area. "John saw Mary" is ambiguous between a sentence (S) and an NP: a type of saw (a John saw) which picks out the Mary we are talking about (cf. Typhoid Mary). We don't get these readings right away because they are rare usages of these words. Rarity needs to be defined probabilistically.

Wide coverage of rules
Grammar rules work sometimes & not others. Typically, if a noun is premodified by both an adjective and another noun, the adjective must precede the modifying noun:
(3) tall (A) shoe (N) rack
(4) *shoe (N) tall (A) rack
But not always:
(5) a Kleene-star (N) transitive (A) closure
(6) highland (N) igneous (A) formations
If language is categorical and you have a rule which allows N A N, then you have to do something to prevent "shoe tall rack".
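A minimal sketch of the probabilistic alternative: instead of a categorical rule, score both premodifier orders by how often they occur. The counts below are invented purely for illustration:

```python
from collections import Counter

# Hypothetical corpus counts of premodifier orders before a head noun.
order_counts = Counter({
    ("A", "N"): 9850,  # e.g. "tall (A) shoe (N) rack"
    ("N", "A"): 150,   # e.g. "Kleene-star (N) transitive (A) closure"
})

def order_probability(order):
    """Relative frequency of a premodifier order among all observed orders."""
    return order_counts[order] / sum(order_counts.values())

# "A N" is far more probable, so "shoe tall rack" is heavily dispreferred,
# but the rare "N A" order is not categorically ruled out, so
# "Kleene-star transitive closure" can still be analyzed.
print(order_probability(("A", "N")))  # 0.985
print(order_probability(("N", "A")))  # 0.015
```

The point is the shape of the solution: a soft preference learned from data replaces a hard rule that either overgenerates or undergenerates.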

Using probabilities: The Ambiguity of Language
Language is ambiguous in a variety of ways:
- Word senses: e.g., bank
- Word categories: e.g., can
- Semantic scope: e.g., All cats hate a dog.
- Syntactic structure: e.g., I shot the elephants in my pajamas.
Often, however, of all the ambiguous choices, one is the best.

Syntactic Ambiguity
(7) Our company is training workers
The intuitive analysis, with "is" as an auxiliary and "training workers" as a verb phrase:
[S [NP Our company] [VP [Aux is] [VP [V training] [NP workers]]]]

Less intuitive analyses (1): "training workers" as a gerund NP
[S [NP Our company] [VP [Aux is] [NP [VP [V training] [NP workers]]]]]

Less intuitive analyses (2): "training" as an adjective modifying "workers"
[S [NP Our company] [VP [V is] [NP [Adj training] [NP workers]]]]
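The three analyses of "Our company is training workers" can be encoded as nested (label, children...) tuples — a minimal sketch, not a standard tree library — to check that they all cover the same word string:

```python
# Each tree is (category, child, child, ...); leaves are plain word strings.
T1 = ("S", ("NP", "Our", "company"),
           ("VP", ("Aux", "is"), ("VP", ("V", "training"), ("NP", "workers"))))
T2 = ("S", ("NP", "Our", "company"),
           ("VP", ("Aux", "is"), ("NP", ("VP", ("V", "training"), ("NP", "workers")))))
T3 = ("S", ("NP", "Our", "company"),
           ("VP", ("V", "is"), ("NP", ("Adj", "training"), ("NP", "workers"))))

def leaves(tree):
    """Collect the word leaves of a tuple-encoded tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the category label
        words.extend(leaves(child))
    return words

# All three trees yield the same sentence; only the structure differs.
print(leaves(T1) == leaves(T2) == leaves(T3))  # True
```

This is exactly what makes syntactic disambiguation hard: the surface string alone does not decide between the analyses, so a parser needs some further criterion, such as probability.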

We can induce probabilistic information from language data, potentially data annotated with linguistic information. Thus, we will become familiar with processing large texts, i.e., corpora. Corpora are often annotated with linguistic mark-up, such as part-of-speech labels or syntactic annotation. These corpora will serve as our data from which to learn probabilities. Corpora are not the only lexical resources out there; dictionaries (e.g., WordNet) are also important, but these are often derived from corpora.

Using corpora for simple analysis: Word counts
We can use corpora to give us some basic information about word occurrences:
- Word types = the number of distinct words in the corpus
- Word tokens = the number of actual word occurrences in the corpus; multiple occurrences of the same word type are counted each time
If we compare word types and tokens, we see that there are:
- a few word types which occur a large number of times (often function words)
- a large number of word types which occur only a few times or only once
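The type/token distinction is easy to make concrete in code; the tiny "corpus" below is invented for illustration:

```python
from collections import Counter

def type_token_counts(text):
    """Return (number of word tokens, number of word types, per-type counts)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return len(tokens), len(counts), counts

# Toy corpus: the function word "the" dominates, as in real corpora.
text = "the cat saw the dog and the dog saw the cat"
n_tokens, n_types, counts = type_token_counts(text)
print(n_tokens)               # 11 tokens
print(n_types)                # 5 types
print(counts.most_common(1))  # [('the', 4)]
```

On real corpora the same two numbers drive measures such as the type-token ratio, and the skew toward a few very frequent types is what the next slide formalizes.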

Zipf's Law
This idea is formulated in Zipf's Law: the frequency (f) of a word is inversely proportional to its rank (r).
(8) a. f · r = k, where k is some constant, i.e., f = k / r (Zipf)
    b. f = P / (r + ρ)^B, where P, ρ, and B are parameters which measure a text's richness (Mandelbrot)
Mandelbrot adjusted Zipf's Law to better handle high- and low-ranking words; with B = 1 and ρ = 0, it is identical to Zipf's Law (where P = k).
Important insight: most words are rare!

Linguistic levels
- phonetics / phonology
- morphology
- POS annotation
- syntax
- lexical semantics
- discourse

CL Analysis
- finite-state morphology (analysis + generation)
- POS tagging
- parsing
- word sense disambiguation
- detecting selectional restrictions (kill, murder, assassinate)
- shallow inference (X killed Y → Y is dead)
- anaphora / coreference resolution

Concepts Borrowed from Computer Science
- finite-state automata / transducers
- search: divide and conquer, beam search, nondeterminism, guides and oracles
- parsing (compilers)
- dynamic programming
- machine learning approaches: decision trees, k-nearest neighbors, clustering, support vector machines, ...
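To make the first item concrete: a finite-state automaton is just a transition table plus a set of accepting states. The toy DFA below (a hypothetical machine over the alphabet {a, b}, accepting strings that end in "ab") shows the mechanics that finite-state morphology builds on:

```python
# DFA: states q0..q2; q2 is reached exactly when the input so far ends in "ab".
TRANSITIONS = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}
START, ACCEPT = "q0", {"q2"}

def accepts(s):
    """Run the DFA over the string and report whether it ends in an accept state."""
    state = START
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPT

print(accepts("aab"))  # True
print(accepts("aba"))  # False
```

A finite-state transducer extends the same idea by emitting output symbols on each transition, which is what lets one machine both analyze and generate morphological forms.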