Statistical NLP: linguistic essentials. Updated 10/15

Parts of Speech and Morphology
Syntactic or grammatical categories, also called parts of speech (POS), are classes of words with similar syntactic behavior. Examples of word categories are noun, verb, adjective, preposition, and adverb. The most basic test for whether words belong to the same class is the substitution test: words of the same class can replace one another in the same context, e.g. The {intelligent, sad, green, fat} one is in the corner.
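The substitution test can be approximated distributionally: record the left and right neighbor of every word in a corpus, then group words that occur in the same frame. This is a minimal sketch, assuming a tiny hypothetical corpus and a one-word context on each side. It only finds substitutions that are actually attested, so it is a rough illustration rather than a real grammaticality test.

```python
from collections import defaultdict

# Hypothetical toy corpus: the first four sentences fill the same frame with different words.
corpus = [
    "the intelligent one is in the corner",
    "the sad one is in the corner",
    "the green one is in the corner",
    "the fat one is in the corner",
    "the children slept in the corner",
]

def contexts(sentences):
    """Map each word to the set of (left, right) neighbor pairs it occurs between."""
    table = defaultdict(set)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(words) - 1):
            table[words[i]].add((words[i - 1], words[i + 1]))
    return table

def substitutable(table, frame):
    """Words attested in the given (left, right) frame, i.e. that pass the substitution test there."""
    return sorted(word for word, ctxs in table.items() if frame in ctxs)

table = contexts(corpus)
print(substitutable(table, ("the", "one")))  # ['fat', 'green', 'intelligent', 'sad']
```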

Syntactic categories
Traditional part-of-speech systems distinguish 8 categories. Corpus linguists use much finer-grained classifications of word classes, whose labels are known as POS tags.

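Morphological processes
Word categories are systematically related by morphological processes, such as forming the plural of a noun from its singular. The major types of morphological processes are:
Inflection: drive → driven, egg → eggs
Derivation: drive → driver, wide → widely
Compounding: database, overtake
Inflection produces another form of the same word, so the category is unchanged, while derivation creates a new word, often of a different category (verb → noun, adjective → adverb).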
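To make the contrast concrete, here is a toy sketch with hypothetical, deliberately incomplete rules (not a real morphological analyzer). Each function returns the new form together with its category, so one can see that inflection keeps the category while derivation changes it. Irregular forms such as driven have to come from a lexicon; the small IRREGULAR_PARTICIPLES table is an assumption for illustration.

```python
# Toy morphology with hypothetical, deliberately incomplete rules.
# Each function returns (new_form, category) so the effect on the category is visible.

IRREGULAR_PARTICIPLES = {"drive": "driven", "eat": "eaten", "see": "seen"}  # illustrative lexicon

def inflect_plural(noun):
    """Inflection: regular plural, noun -> noun."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es", "noun"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies", "noun"
    return noun + "s", "noun"

def inflect_participle(verb):
    """Inflection: past participle, verb -> verb (irregular forms looked up first)."""
    if verb in IRREGULAR_PARTICIPLES:
        return IRREGULAR_PARTICIPLES[verb], "verb"
    return (verb + "d" if verb.endswith("e") else verb + "ed"), "verb"

def derive_agent_noun(verb):
    """Derivation: agentive -er, verb -> noun."""
    return (verb + "r" if verb.endswith("e") else verb + "er"), "noun"

def derive_adverb(adjective):
    """Derivation: -ly, adjective -> adverb."""
    return adjective + "ly", "adverb"

def compound(first, second):
    """Compounding: join two existing words into one."""
    return first + second

print(inflect_plural("egg"))        # ('eggs', 'noun')
print(inflect_participle("drive"))  # ('driven', 'verb')
print(derive_agent_noun("drive"))   # ('driver', 'noun')
print(derive_adverb("wide"))        # ('widely', 'adverb')
print(compound("data", "base"))     # database
```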

Main Syntactic Functions of Words
Typically, nouns refer to entities in the world (e.g. people, animals, hat). Determiners specify the particular reference of a noun (e.g. the, a), and adjectives describe properties of nouns (e.g. red, long, intelligent). Verbs describe actions, activities and states (e.g. have, threw, walked). Adverbs modify verbs in the same way as adjectives modify nouns (e.g. often, heavily). Prepositions are typically small words that express spatial or temporal relationships (e.g. in, on, over). Prepositions can also be used as particles to create phrasal verbs (e.g. look up). Conjunctions and complementizers link two words, phrases or clauses (e.g. the conjunctions and, or, but, and the complementizer that).

Brown tags (partial list)
NN   singular noun
NNS  plural noun
NP   proper noun
NR   adverbial noun (e.g. home)
JJ   adjective
AT   article
VB   verb, base form
VBD  verb, past tense (e.g. liked)
VBZ  verb, third person singular present (e.g. likes)
RB   adverb
IN   preposition
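In a tagged corpus, each token is paired with its tag, written here as word/TAG. This is a minimal sketch, assuming a hypothetical toy lexicon that gives each word exactly one tag. Real taggers must choose among several possible tags (e.g. saw as VBD or NN). Unknown words get the placeholder tag UNK, which is an assumption of this sketch and not a Brown tag.

```python
# Hypothetical toy lexicon: one Brown-style tag per word.
LEXICON = {
    "the": "AT", "children": "NNS", "students": "NNS", "cake": "NN",
    "ate": "VBD", "saw": "VBD", "often": "RB", "in": "IN", "tall": "JJ",
    "home": "NR",
}

def tag(sentence):
    """Return space-separated word/TAG tokens; unknown words get the placeholder tag 'UNK'."""
    return " ".join(f"{w}/{LEXICON.get(w.lower(), 'UNK')}" for w in sentence.split())

print(tag("The tall children ate the cake"))
# The/AT tall/JJ children/NNS ate/VBD the/AT cake/NN
```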

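Phrase structure
Words are organized into phrases, which are nested hierarchically. A whole phrase can fill the same slot as a single word. For example, any of the following can be the subject of "... saw him":
She / The woman / The tall woman / The tall woman with sad eyes
and likewise any of these can be the object of "She saw ...":
him / the man / the fat man / the fat man with the red beard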

Major phrase types
Noun phrase (NP), e.g. the homeless old man in the park that lay on the bench
Prepositional phrase (PP), e.g. under the fence painted yesterday
Verb phrase (VP), e.g. coughed severely

Phrase structure grammars
Syntactic analysis of a sentence is key to determining its meaning: Mary gave Peter a book and Peter gave Mary a book contain the same words but describe different events. In English, word order is essential for inferring who did what to whom. Many other languages (e.g. Latin, Russian) have much freer word order and signal these relations mainly through case marking. Regularities in word order are often captured by rewrite rules.
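The two example sentences are indistinguishable to a bag-of-words representation, yet a reading that uses word order recovers who gave what to whom. This is a minimal sketch. The fixed template GIVER gave RECIPIENT THING is a hypothetical simplification that handles only this one construction.

```python
from collections import Counter

def ditransitive_roles(sentence):
    """Read roles off word positions in the hypothetical template 'GIVER gave RECIPIENT THING...'."""
    giver, _verb, recipient, *thing = sentence.split()
    return {"giver": giver, "recipient": recipient, "thing": " ".join(thing)}

a = "Mary gave Peter a book"
b = "Peter gave Mary a book"

print(Counter(a.split()) == Counter(b.split()))  # True: same bag of words
print(ditransitive_roles(a))  # {'giver': 'Mary', 'recipient': 'Peter', 'thing': 'a book'}
print(ditransitive_roles(b))  # {'giver': 'Peter', 'recipient': 'Mary', 'thing': 'a book'}
```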

Syntax or Phrase Structure: A simple context-free grammar
The Grammar:
S --> NP VP
NP --> AT NNS
NP --> AT NN
NP --> NP PP
VP --> VP PP
VP --> VBD
VP --> VBD NP
PP --> IN NP
The Lexicon:
AT --> the
NNS --> children | students | mountains
VBD --> slept | ate | saw
IN --> in | of
NN --> cake
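The grammar above can be run directly with a chart parser. This is a CKY-style sketch that counts parses instead of only recognizing sentences. CKY normally assumes Chomsky normal form. Here, every rule is binary except VP --> VBD, so one unary pass per chart cell is enough. That shortcut would break for grammars with chains of unary rules. The function and variable names are illustrative.

```python
from collections import defaultdict

# The grammar and lexicon from the slide, as Python data.
BINARY = [  # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("NP", "AT", "NNS"), ("NP", "AT", "NN"), ("NP", "NP", "PP"),
    ("VP", "VP", "PP"), ("VP", "VBD", "NP"),
    ("PP", "IN", "NP"),
]
UNARY = [("VP", "VBD")]  # (parent, child)
LEXICON = {
    "the": ["AT"],
    "children": ["NNS"], "students": ["NNS"], "mountains": ["NNS"],
    "slept": ["VBD"], "ate": ["VBD"], "saw": ["VBD"],
    "in": ["IN"], "of": ["IN"],
    "cake": ["NN"],
}

def add_unary(cell):
    """Apply each unary rule once; enough here because there are no unary chains."""
    for parent, child in UNARY:
        if child in cell:
            cell[parent] += cell[child]

def count_parses(sentence, start="S"):
    """Number of distinct parse trees rooted in `start` (0 if the sentence is not in the language)."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j] maps a category to the number of ways it can span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[i][i + 1][tag] += 1
        add_unary(chart[i][i + 1])
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i][j]
            for k in range(i + 1, j):  # split point
                left, right = chart[i][k], chart[k][j]
                for parent, lchild, rchild in BINARY:
                    if lchild in left and rchild in right:
                        cell[parent] += left[lchild] * right[rchild]
            add_unary(cell)
    return chart[0][n].get(start, 0)

print(count_parses("the children slept"))                          # 1
print(count_parses("the children ate the cake in the mountains"))  # 2
print(count_parses("children the slept"))                          # 0
```

The second sentence gets two parses because the PP in the mountains can attach either to the NP the cake or to the VP.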

Syntax or Phrase Structure: A Parse Tree I, II and III [figures not included]

Local and Non-Local Dependencies
A local dependency is a dependency between two words expressed within the same syntactic rule. A non-local dependency is an instance in which two words are syntactically dependent even though they occur far apart in a sentence (e.g. subject-verb agreement; long-distance dependencies such as wh-extraction). Non-local phenomena are a challenge for statistical NLP approaches that model only local dependencies, such as n-grams.
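To see why, consider what a trigram model conditions on when predicting the verb in a sentence with a long subject. The sketch uses an invented example sentence. The head noun woman is four words before sees, outside the two-word history. The only noun inside the history is the plural eyes, which is not the word the verb agrees with. The helper names are hypothetical.

```python
def ngram_history(words, i, n):
    """The n-1 preceding words that an n-gram model conditions on when predicting words[i]."""
    return words[max(0, i - (n - 1)):i]

def within_history(head, dependent, n):
    """True if position `head` falls inside the n-gram history of position `dependent`."""
    return 0 < dependent - head < n

sentence = "the tall woman with sad eyes sees him".split()
subject_head = sentence.index("woman")   # 2
verb = sentence.index("sees")            # 6

print(ngram_history(sentence, verb, 3))       # ['sad', 'eyes']
print(within_history(subject_head, verb, 3))  # False
print(within_history(subject_head, verb, 5))  # True
```

A larger n helps for this sentence. However, the subject can be pushed arbitrarily far from its verb by adding more modifiers, so no fixed n covers all cases.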

Semantic Roles
Most commonly, noun phrases are arguments of verbs. These arguments have semantic roles: the agent of an action, the patient, and other roles such as the instrument or the goal. In English, these semantic roles roughly correspond to the grammatical notions of subject and object. The mapping is complicated, however, by the distinction between direct and indirect objects and by active versus passive voice.
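Here is a toy sketch of the voice complication for a transitive verb. In the active voice, the subject is the agent and the object is the patient. In the passive, the subject is the patient, and the agent, if expressed at all, appears in a by-phrase. The function name and arguments are hypothetical, and the input is assumed to be already split into grammatical functions.

```python
def assign_roles(subject, obj=None, by_phrase=None, voice="active"):
    """Toy mapping from grammatical functions to semantic roles for a transitive verb."""
    if voice == "active":
        return {"agent": subject, "patient": obj}
    if voice == "passive":
        # The passive subject is the patient; the agent, if expressed, is in the by-phrase.
        return {"agent": by_phrase, "patient": subject}
    raise ValueError(f"unknown voice: {voice}")

# The children ate the cake.
active = assign_roles("the children", obj="the cake")
# The cake was eaten by the children.
passive = assign_roles("the cake", by_phrase="the children", voice="passive")
print(active == passive)  # True: different subjects, same roles
```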

Subcategorization
Different verbs can relate different numbers of entities: compare transitive and intransitive verbs. Arguments tightly related to the verb are called complements, while less tightly related ones are called adjuncts. Prototypical adjuncts tell us the time, place, or manner of the action or state described by the verb. Verbs are classified according to the types of complements they permit; this is called subcategorization. Subcategorization allows us to capture syntactic as well as semantic regularities.
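Subcategorization frames can be stored as a lexicon that lists, for each verb, the complement sequences it permits. Adjuncts are deliberately left out because they can be added to almost any verb. Here is a sketch with a hypothetical, heavily simplified lexicon. Real entries have more frames and finer categories.

```python
# Hypothetical subcategorization lexicon: verb -> permitted complement sequences.
SUBCAT = {
    "slept": [()],                          # intransitive
    "ate": [(), ("NP",)],                   # optionally transitive
    "put": [("NP", "PP")],                  # needs both an object and a location
    "gave": [("NP", "NP"), ("NP", "PP")],   # gave Mary a book / gave a book to Mary
}

def licensed(verb, complements):
    """True if the verb's lexical entry allows exactly this complement sequence."""
    return tuple(complements) in SUBCAT.get(verb, [])

print(licensed("slept", []))          # True
print(licensed("slept", ["NP"]))      # False: *The children slept the cake
print(licensed("put", ["NP", "PP"]))  # True
print(licensed("put", ["NP"]))        # False: *She put the cake
```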

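Attachment Ambiguity and Garden-Path Sentences
Attachment ambiguities occur with phrases that could attach to two different nodes in the parse tree. E.g. in The children ate the cake with a spoon, the PP with a spoon can attach to the VP (the spoon is the instrument of eating) or to the NP the cake (a cake that comes with a spoon). Garden-path sentences lead the reader along an analysis that suddenly turns out not to work. E.g. The horse raced past the barn fell: raced is first read as the main verb, but it actually begins a reduced relative clause (the horse [that was] raced past the barn), and fell is the main verb.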
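The two readings of The children ate the cake with a spoon can be written as trees with identical yields but different attachment sites. The sketch uses nested tuples. It assumes the lexicon of the simple context-free grammar above, extended with a (AT), with (IN) and spoon (NN). These additions are assumptions for the example.

```python
# Trees as nested tuples: (label, child, child, ...); preterminals are (tag, word).
VP_ATTACH = ("S",
    ("NP", ("AT", "The"), ("NNS", "children")),
    ("VP",
        ("VP", ("VBD", "ate"), ("NP", ("AT", "the"), ("NN", "cake"))),
        ("PP", ("IN", "with"), ("NP", ("AT", "a"), ("NN", "spoon")))))

NP_ATTACH = ("S",
    ("NP", ("AT", "The"), ("NNS", "children")),
    ("VP",
        ("VBD", "ate"),
        ("NP",
            ("NP", ("AT", "the"), ("NN", "cake")),
            ("PP", ("IN", "with"), ("NP", ("AT", "a"), ("NN", "spoon"))))))

def is_preterminal(tree):
    return len(tree) == 2 and isinstance(tree[1], str)

def words(tree):
    """Left-to-right sequence of words (the yield) of a tree."""
    if is_preterminal(tree):
        return [tree[1]]
    return [w for child in tree[1:] for w in words(child)]

def attachment_site(tree):
    """Label of the first node (pre-order) that has a PP as a direct child."""
    if is_preterminal(tree):
        return None
    label, *children = tree
    if any(child[0] == "PP" for child in children):
        return label
    for child in children:
        site = attachment_site(child)
        if site is not None:
            return site
    return None

print(words(VP_ATTACH) == words(NP_ATTACH))                # True: same sentence
print(attachment_site(VP_ATTACH), attachment_site(NP_ATTACH))  # VP NP
```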

Semantics
Semantics is the study of the meaning of words, constructions, and utterances. It can be divided into two parts: lexical semantics and compositional semantics, which studies how word meanings combine into the meanings of larger units.
Lexical semantics studies relations between word meanings such as hypernymy, hyponymy, antonymy, meronymy, holonymy, synonymy, homonymy, polysemy, and homophony.
Compositionality: the meaning of a whole is built from the meanings of its parts, but it often differs from a simple sum of those parts. Idioms are the extreme case, where the phrase means something completely different from its parts (e.g. kick the bucket).
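Hypernymy (is-a) relations form a hierarchy that can be followed upward, and hyponymy is the inverse relation. Resources such as WordNet encode these relations at scale. This sketch instead uses a hypothetical hand-built taxonomy of four entries.

```python
# Hypothetical miniature taxonomy: each word maps to its direct hypernym.
HYPERNYM = {
    "poodle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
}

def hypernyms(word):
    """All hypernyms of a word, nearest first (follows the is-a chain upward)."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def is_hyponym_of(word, other):
    """True if `word` is a direct or indirect hyponym of `other`."""
    return other in hypernyms(word)

print(hypernyms("poodle"))             # ['dog', 'mammal', 'animal']
print(is_hyponym_of("cat", "animal"))  # True
print(is_hyponym_of("animal", "cat"))  # False
```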

Pragmatics
Pragmatics goes beyond the meaning of a sentence and tries to explain what the speaker is really expressing. Its topics include the scope of quantifiers, speech acts, discourse analysis, and anaphoric relations. The resolution of anaphoric relations is crucial to the task of information extraction.
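A classic baseline for anaphora resolution picks the closest preceding noun that agrees with the pronoun in number and gender. This sketch implements that recency-plus-agreement heuristic over a hypothetical feature lexicon. Real resolvers use much richer syntactic and discourse evidence, so treat this only as an illustration of the task.

```python
# Hypothetical feature lexicons: (number, gender); None means "unspecified".
NOUN_FEATURES = {
    "woman": ("sg", "fem"),
    "man": ("sg", "masc"),
    "cake": ("sg", "neut"),
    "children": ("pl", None),
}
PRONOUN_FEATURES = {
    "she": ("sg", "fem"),
    "he": ("sg", "masc"),
    "it": ("sg", "neut"),
    "they": ("pl", None),
}

def resolve(tokens, pronoun_index):
    """Closest preceding known noun agreeing in number (and in gender, if the pronoun specifies one)."""
    number, gender = PRONOUN_FEATURES[tokens[pronoun_index].lower()]
    for i in range(pronoun_index - 1, -1, -1):
        features = NOUN_FEATURES.get(tokens[i].lower())
        if features is None:
            continue  # not a noun in the toy lexicon
        noun_number, noun_gender = features
        if noun_number == number and (gender is None or noun_gender == gender):
            return tokens[i]
    return None  # no agreeing antecedent found

tokens = "The woman saw the children . She smiled".split()
print(resolve(tokens, tokens.index("She")))  # woman
```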