Foundations of Natural Language Processing Lecture 1 Introduction


Foundations of Natural Language Processing, Lecture 1: Introduction. Alex Lascarides (slides based on those of Philipp Koehn, Alex Lascarides, Sharon Goldwater). 16 January 2018. Alex Lascarides FNLP Lecture 1 16 January 2018

What is Natural Language Processing?

What is Natural Language Processing?

Applications:
  Machine Translation
  Information Retrieval
  Question Answering
  Dialogue Systems
  Information Extraction
  Summarization
  Sentiment Analysis
  ...

Core technologies:
  Language modelling
  Part-of-speech tagging
  Syntactic parsing
  Named-entity recognition
  Coreference resolution
  Word sense disambiguation
  Semantic Role Labelling
  ...

This course

NLP is a big field! We focus mainly on core ideas and methods needed for the technologies in the second column (and eventually for applications):
  Linguistic facts and issues
  Computational models and algorithms

More advanced methods and specific application areas are covered in 4th/5th-year courses:
  Natural Language Understanding
  Machine Translation
  Topics in NLP
  Automatic Speech Recognition

What does an NLP system need to know?

Language consists of many levels of structure. Humans fluently integrate all of these in producing/understanding language. Ideally, so would a computer!

Words

This is a simple sentence    [WORDS]

Morphology

This is a simple sentence; "is" = be +3sg +present    [WORDS, MORPHOLOGY]

Parts of Speech

This/DT is/VBZ a/DT simple/JJ sentence/NN; "is" = be +3sg +present    [PART OF SPEECH, WORDS, MORPHOLOGY]

Syntax

[S [NP This/DT] [VP is/VBZ [NP a/DT simple/JJ sentence/NN]]]    [SYNTAX, PART OF SPEECH, WORDS, MORPHOLOGY]

Semantics

This/DT is/VBZ a/DT simple/JJ sentence/NN, with word senses SIMPLE1 "having few parts" and SENTENCE1 "string of words satisfying the grammatical rules of a language", and semantic representation ∃y(this_dem(x) ∧ be(e, x, y) ∧ simple(y) ∧ sentence(y))    [SEMANTICS, SYNTAX, PART OF SPEECH, WORDS, MORPHOLOGY]

Discourse

This is a simple sentence. But it is an instructive one. The two sentences stand in a CONTRAST relation.    [DISCOURSE, SEMANTICS, SYNTAX, PART OF SPEECH, WORDS, MORPHOLOGY]

Why is NLP hard?

1. Ambiguity at many levels:
  Word senses: bank (finance or river?)
  Part of speech: chair (noun or verb?)
  Syntactic structure: I saw a man with a telescope
  Quantifier scope: Every child loves some movie
  Multiple: I saw her duck
  Reference: John dropped the goblet onto the glass table and it broke.
  Discourse: The meeting is cancelled. Nicholas isn't coming to the office today.

How can we model ambiguity, and choose the correct analysis in context?

Ambiguity

Inf2a started to discuss methods of dealing with ambiguity:
  Non-probabilistic methods (FSMs for morphology, CKY parsers for syntax) return all possible analyses.
  Probabilistic models (HMMs for POS tagging, PCFGs for syntax) and algorithms (Viterbi, probabilistic CKY) return the best possible analysis, i.e., the most probable one according to the model.

This best analysis is only good if our model's probabilities are accurate. Where do they come from?
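The Viterbi idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the toy tag set and the transition/emission probabilities below are invented for the example, not estimated from any corpus.

```python
# A minimal Viterbi decoder for a toy HMM POS tagger.
# All probabilities here are illustrative assumptions.

TAGS = ["DT", "NN", "VB"]

# P(tag | previous tag); "<s>" is the start-of-sentence state.
trans = {
    "<s>": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
    "DT":  {"DT": 0.05, "NN": 0.85, "VB": 0.1},
    "NN":  {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB":  {"DT": 0.5, "NN": 0.3, "VB": 0.2},
}

# P(word | tag); words outside these tables get probability 0.
emit = {
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"dog": 0.4, "walk": 0.3, "bank": 0.3},
    "VB": {"walk": 0.5, "runs": 0.5},
}

def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[t] = (probability, path) of the best tag path ending in tag t
    best = {t: (trans["<s>"][t] * emit[t].get(words[0], 0.0), [t]) for t in TAGS}
    for w in words[1:]:
        new = {}
        for t in TAGS:
            # Extend the best previous path into tag t.
            p, path = max((best[prev][0] * trans[prev][t] * emit[t].get(w, 0.0),
                           best[prev][1]) for prev in TAGS)
            new[t] = (p, path + [t])
        best = new
    return max(best.values())[1]

print(viterbi(["the", "dog", "runs"]))  # -> ['DT', 'NN', 'VB']
```

With real data the probabilities would be estimated from a tagged corpus, and log-probabilities would be used to avoid numerical underflow on long sentences.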

Statistical NLP

Like most other parts of AI, NLP today is dominated by statistical methods:
  Typically more robust than earlier rule-based methods.
  Relevant statistics/probabilities are learned from data (cf. Inf2b).
  Normally requires lots of data about any particular phenomenon.

Why is NLP hard?

2. Sparse data due to Zipf's Law. To illustrate, let's look at the frequencies of different words in a large text corpus. Assume a word is a string of letters separated by spaces (a great oversimplification, we'll return to this issue...).

Word Counts

Most frequent words (word types) in the English Europarl corpus (out of 24m word tokens):

any word:
  Frequency  Type
  1,698,599  the
  849,256    of
  793,731    to
  640,257    and
  508,560    in
  407,638    that
  400,467    is
  394,778    a
  263,040    I

nouns:
  Frequency  Type
  124,598    European
  104,325    Mr
  92,195     Commission
  66,781     President
  62,867     Parliament
  57,804     Union
  53,683     report
  53,547     Council
  45,842     States

Word Counts

But also, out of 93,638 distinct word types, 36,231 occur only once. Examples:
  cornflakes, mathematicians, fuzziness, jumbling
  pseudo-rapporteur, lobby-ridden, perfunctorily
  Lycketoft, UNCITRAL, H-0695
  policyfor, Commissioneris, 145.95, 27a
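The counting itself is straightforward. A minimal sketch with Python's collections.Counter, using a made-up stand-in sentence rather than Europarl, and the same split-on-spaces notion of "word" the slide just warned about:

```python
# Count word types in a toy corpus, treating a "word" as a string
# of letters separated by spaces (the slide's oversimplification).
from collections import Counter

corpus = "the cat sat on the mat and the dog sat on the log"
counts = Counter(corpus.split())

# Most frequent word types, analogous to the Europarl table above.
print(counts.most_common(3))   # -> [('the', 4), ('sat', 2), ('on', 2)]

# Word types that occur only once (hapax legomena).
hapaxes = [w for w, c in counts.items() if c == 1]
print(sorted(hapaxes))         # -> ['and', 'cat', 'dog', 'log', 'mat']
```

On a real corpus like Europarl the same few lines reproduce both tables on the slides: a handful of very frequent function words, and a long tail of types seen only once.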

Plotting word frequencies

Order words by frequency. What is the frequency of the nth ranked word?


Rescaling the axes

To really see what's going on, use logarithmic axes:



Zipf's law

Summarizes the behaviour we just saw:

  f × r ≈ k

  f = frequency of a word
  r = rank of a word (if sorted by frequency)
  k = a constant

Why a line in log-scales?

  f × r = k  ⟹  f = k / r  ⟹  log f = log k − log r
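A quick numeric check of the relationship, using frequencies constructed to follow f = k/r exactly (the constant k = 1000 and the five ranks are arbitrary choices for the sketch):

```python
# Verify that for ideal Zipfian frequencies, f * r recovers the
# constant k, and log f + log r recovers log k -- which is why the
# plot is a straight line of slope -1 on log-log axes.
import math

k = 1000.0
freqs = [k / r for r in range(1, 6)]   # frequencies for ranks 1..5

for r, f in enumerate(freqs, start=1):
    print(r, f, f * r, round(math.log(f) + math.log(r), 6))
```

Real corpus counts only follow the law approximately, so a fitted log-log line has some scatter, but the slope stays close to −1 across corpora and languages.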

Implications of Zipf's Law

Regardless of how large our corpus is, there will be a lot of infrequent (and zero-frequency!) words. In fact, the same holds for many other levels of linguistic structure (e.g., syntactic rules in a CFG). This means we need to find clever ways to estimate probabilities for things we have rarely or never seen.
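One classic family of such "clever ways" is smoothing, covered later in the course. A toy sketch of add-one (Laplace) smoothing, where the six-word corpus and the assumed vocabulary size V are made up for illustration:

```python
# Maximum-likelihood estimates give unseen words probability zero;
# add-one (Laplace) smoothing reserves some probability mass for them.
from collections import Counter

counts = Counter("the cat sat on the mat".split())
N = sum(counts.values())   # total tokens observed (here, 6)
V = 10_000                 # assumed vocabulary size (an arbitrary choice)

def mle(word):
    """Maximum-likelihood estimate P(word)."""
    return counts[word] / N

def laplace(word):
    """Add-one smoothed estimate P(word)."""
    return (counts[word] + 1) / (N + V)

print(mle("the"))        # 2/6, about 0.33
print(mle("zebra"))      # -> 0.0: unseen word gets zero probability
print(laplace("zebra"))  # small but nonzero
```

The cost of smoothing is that probability mass is taken away from seen words, so the estimates for frequent words shrink slightly; better schemes than add-one manage this trade-off more carefully.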

Why is NLP hard?

3. Variation. Suppose we train a part-of-speech tagger on the Wall Street Journal:

  Mr./NNP Vinken/NNP is/VBZ chairman/NN of/IN Elsevier/NNP N.V./NNP ,/, the/DT Dutch/NNP publishing/VBG group/NN ./.

What will happen if we try to use this tagger for social media?

  ikr smh he asked fir yo last name

(Twitter example due to Noah Smith)

Why is NLP hard?

4. Expressivity. Not only can one form have different meanings (ambiguity), but the same meaning can be expressed with different forms:
  She gave the book to Tom vs. She gave Tom the book
  Some kids popped by vs. A few children visited
  Is that window still open? vs. Please close the window

Why is NLP hard?

5 and 6. Context dependence and unknown representation. The last example also shows that correct interpretation is context-dependent and often requires world knowledge. This is very difficult to capture, since we don't even know how to represent the knowledge a human has/needs: What is the meaning of a word or sentence? How to model context? Other general knowledge? That is, in the limit NLP is hard because AI is hard. In particular, we've made remarkably little progress on the Knowledge Representation problem...

Background needed for this course

We assume you are familiar with most/all of the following:
  Basic Python programming
  Finite-state machines, regular languages
  Context-free grammars
  Dynamic programming (e.g. edit distance, Viterbi, and/or CKY algorithms)
  Concepts from machine learning (estimating probabilities, making predictions based on data)
  Probability theory (conditional probabilities, Bayes' Rule, independence and conditional independence, expectations)
  Vectors, logarithms
  Concepts of syntactic structure and semantics and the relationship between them (ideally for natural language, but at least for programming languages)
  Some basic linguistic concepts (e.g. parts of speech, inflection)

Where we are headed

Informatics 2a discussed ideas and algorithms for NLP from a largely formal, algorithmic perspective. Here we build on that by:
  Focusing on real data with all its complexities.
  Discussing some of the algorithms in more depth, as probabilistic inference.
  Introducing some tasks and technologies that didn't fit into the Inf2a story.

Course organization

Lecturer: Alex Lascarides
Lectures: Tue/Fri 10:00-10:50
Labs: two groups. Reply to the email from ITO to be assigned a group. Labs start next week!
Web site (for slides, lectures, labs, assignments, due dates, etc.): http://www.inf.ed.ac.uk/teaching/courses/fnlp/
Course mailing list: fnlp-students@inf. Register ASAP to get on the list!
Course discussion forum: Piazza. The link for signing up to FNLP's Piazza page is on the FNLP website.

Outside work required

In addition to attending lectures, you are expected to keep up with:
  Readings from textbook: Speech and Language Processing, 3rd edition (online) and 2nd edition (paperback, international version), Jurafsky and Martin.
  NLP techniques in Python: Bird, S., E. Klein and E. Loper, Natural Language Processing with Python (2009), O'Reilly Media.
  Weekly (unassessed) labs (in Python), to help solidify concepts and give you practical experience. Help and feedback available from the lab demonstrator.

Lectures are being recorded. Recordings will be linked from the lectures page week by week. The audience is not in shot.

Two assignments (in Python): the second is worth 30%; the first will be reviewed and marked, but will not contribute to your final mark. Exam in May, worth 70% of the final mark.

We will also provide some optional further readings/exercises for those who wish to stretch themselves. These will be clearly marked as optional (non-examinable).