Search engines, Question Answering and Syntactic Analysis

Kaarel Kaljurand (kaarel@ut.ee)
Tartu University
Theory Days in Koke 2004, Koke, Estonia

Outline of the talk
- Search (information retrieval, information extraction, question answering)
- Problems with currently available search tools (e.g. Google)
- Currently available NLP tools and how they can be put to use: a Question Answering system
- A closer look at syntactic analysis in Question Answering

The search problem
Definition: provide an answer to a statement of the user's information need
- How is this statement formulated?
- How is the answer formulated?
- What are the features of the knowledge source?
- How to process the knowledge source (= understand its meaning)?

The search problem (cont.)
Knowledge source
- Database (information is highly structured)
- Web (natural language, redundancy)
- Small text collection (e.g. technical manual)
Information need
- Summarization
- List of the characters in Hamlet.
- What did the author want to say in this essay?
- ...

Keyword-based (web) search
- Keyword-based search: mapping a set of keywords to a set of documents
- Query as a Boolean formula ("pet AND dog AND-NOT cat")
- Bag-of-words model to represent documents
- Ranking
- Small amount of NLP: lemmatization, stop-word lists
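The mapping from keywords to documents can be illustrated with a small inverted index. This is only a minimal sketch: the three documents, the stop-word list and the function names are made up for illustration, ranking is left out, and no lemmatization is applied.

```python
from collections import defaultdict

# Hypothetical toy collection; ids and texts are invented for this sketch.
DOCS = {
    1: "my pet dog chases the cat",
    2: "a dog is a popular pet",
    3: "the cat sleeps all day",
}

# Hypothetical stop-word list.
STOP_WORDS = {"a", "the", "is", "my", "all"}


def tokenize(text):
    """Lower-case, split on whitespace and drop stop words (no lemmatization)."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def build_index(docs):
    """Map each keyword to the set of ids of documents whose bag of words contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


index = build_index(DOCS)

# The Boolean query "pet AND dog AND-NOT cat" expressed with set operations.
hits = (index["pet"] & index["dog"]) - index["cat"]
print(sorted(hits))  # [2]
```

Word order and sentence structure are discarded when the index is built, which is the bag-of-words simplification named on the slide.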

Problems with keyword-based search
- Documents are written in natural language: ambiguity (synonymy, polysemy) exists at every level of language
- The user has to convert the question into a set of keywords, which is not very intuitive ("Find a document that contains the word dog")
- Usually too many results are retrieved
- The result unit is a file (which can be of any size) instead of a linguistic unit, e.g. a sentence or a paragraph

Overcoming the problems
- Phrase search, to overcome poor syntax modeling (probably works better with English, where the word order is more fixed)
- Ranking (using meta-information like links), classification (teoma.com)
- Excerpts and highlighting (to overcome big text sizes)
- Location information, personalized results
- NLP: lemmatization, query expansion with synonyms (from e.g. WordNet)
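Query expansion with synonyms can be sketched as follows. The synonym table is a hypothetical stand-in for a resource such as WordNet, and the function names are assumptions made for this example only.

```python
# Hypothetical synonym table standing in for a resource such as WordNet.
SYNONYMS = {
    "dog": {"hound", "canine"},
    "car": {"automobile"},
}


def expand_query(keywords, synonyms=SYNONYMS):
    """Return one OR-group (a set of alternatives) per original keyword."""
    return [{kw} | synonyms.get(kw, set()) for kw in keywords]


def matches(document_words, or_groups):
    """A document matches if it contains at least one word from every OR-group."""
    return all(group & document_words for group in or_groups)


doc = {"best", "canine", "food", "brands"}

expanded = expand_query(["dog", "food"])
print(matches(doc, expanded))              # True: "canine" stands in for "dog"
print(matches(doc, [{"dog"}, {"food"}]))   # False: the unexpanded query misses it
```

Expansion trades precision for recall: a polysemous keyword pulls in synonyms of senses the user did not intend, so the ambiguity problem from the previous slide does not go away.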

NLP intensive search: Question Answering
- Maps a natural language question to a (short) natural language answer
- As ambitious as Machine Translation: tries to understand the documents by applying analysis at all levels of language
- The NLP-intensive methods are the interesting ones, although QA can also be attempted with simple pattern matching plus a wrapper for keyword-based search (e.g. askjeeves.com)

Levels of language analysis
- Morphology: dog = dogs, quick = quickly, koer = koerakeselikkusegagi
- Syntax: John gave Mary a book = A book was given to Mary by John
- Semantics:
  John gave Mary a book = Mary got a book from John
  John would have run = John runs
  vi edits texts = vi is a text editor
  John kills himself = John kills John
  John kills Mary → Mary is dead

Pragmatics: John Person, CEO JobTitle

Components of languagecomputer.com
- Named Entity Recognition (names of companies, persons, locations etc.)
- Syntactic Analysis (noun and verb groups, PP attachments)
- Coreference Resolution (President Bush = George W. Bush)
- Meta-information extraction from WordNet glosses
- Logical Form Generation
- Theorem proving (with Otter)

Document representation example
Heavy selling of Standard & Poor's 500-stock index futures in Chicago relentlessly beat stocks downward.
heavy_JJ(x1) & selling_NN(x1) & of_IN(x1,x6) & Standard_NN(x2) & &_CC(x13,x2,x3) & Poor_NN(x3) & 's_POS(x6,x13) & 500-stock_JJ(x6) & index_NN(x4) & future_NN(x5) & nn_NNC(x6,x4,x5) & in_IN(x1,x8) & Chicago_NN(x8) & relentlessly_RB(e12) & beat_VB(e12,x1,x9) & stocks_NN(x9) & downward_RB(e12).

Question Answering screenshot
Open domain QA: What percent of the Earth's air is oxygen?

Syntax formalisms
Phrase Structure Grammar (Chomsky 1957)
- Focuses on phrase structure
- Analysis and generation
- Sensitive to word order
Dependency Grammar (Tesnière 1959, Mel'čuk 1987)
- Focuses on binding words
- Compatible with free word order languages
- Structure is more semantic
- Less focus on grammatical correctness

Dependency Grammar example
Subject, object and indirect object

Closeness to semantics
Syntactic relations map nicely to semantic ones:
- subject → actor
- object → patient
- adjective modifier → property
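The correspondence between syntactic relations and semantic roles can be written down as a lookup table. The sketch below uses the three pairs from the slide. The triple format (relation, head, dependent) and the example sentence are assumptions made for illustration.

```python
# The three correspondences listed on the slide.
SYNTAX_TO_SEMANTICS = {
    "subject": "actor",
    "object": "patient",
    "adjective modifier": "property",
}


def semantic_roles(dependencies):
    """Translate (relation, head, dependent) triples into (role, head, dependent).

    Relations without a listed semantic counterpart are skipped.
    """
    return [
        (SYNTAX_TO_SEMANTICS[relation], head, dependent)
        for relation, head, dependent in dependencies
        if relation in SYNTAX_TO_SEMANTICS
    ]


# Hypothetical deep analysis of "The quick dog chases a cat".
deps = [
    ("subject", "chases", "dog"),
    ("object", "chases", "cat"),
    ("adjective modifier", "dog", "quick"),
    ("determiner", "dog", "the"),
]

roles = semantic_roles(deps)
for role in roles:
    print(role)
```

A fixed table like this is only a first approximation: in a passive sentence, for example, the syntactic subject is the patient rather than the actor.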

Levels of dependency analysis
Shallow
- The nature of modification (e.g. subject) is specified, but not the target
- Quite reliable (Constraint Grammar: 95% reliability for English)
Deep
- The full relation is specified, e.g. subject(run, dog)
- Subject and object relations are detected correctly 90% of the time
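The difference between the two levels can be shown on one data structure. In this sketch a deep analysis stores, for every word, its function and the index of its head. The shallow view keeps the function and drops the head. The sentence, the tuple layout and the function names are assumptions for illustration, and surface forms are used where the slide's example uses lemmas.

```python
# Hypothetical deep analysis of "the dog runs":
# (word form, function, index of the head word or None for the main verb).
deep = [
    ("the", "determiner", 1),
    ("dog", "subject", 2),
    ("runs", "main verb", None),
]


def shallow_view(analysis):
    """Keep the nature of the modification but drop its target."""
    return [(form, function) for form, function, _head in analysis]


def relations(analysis):
    """Render the full relations in function(head, dependent) notation."""
    return [
        f"{function}({analysis[head][0]}, {form})"
        for form, function, head in analysis
        if head is not None
    ]


print(shallow_view(deep))
print(relations(deep))
```

The shallow view can always be derived from the deep one, but not the other way round, which is why the deep analysis is the harder and less reliable task.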

- Difficult problems, e.g. PP-attachment ("I saw a man with a hat" vs. "I saw an ant with a microscope")
- Existing systems: Connexor Machinese Syntax, MINIPAR, Link Parser etc.

Deep Dependency Grammar rules
- Each word in the sentence modifies (is a dependent of) another word (the so-called "head")
- Each word can modify only one head
- Head-modifier relations have types (e.g. main verb, subject, object, attribute)
- The sentence structure is a tree (no modification cycles are allowed)
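These rules can be checked mechanically. The sketch below assumes a sentence is given as a list `heads`, where `heads[i]` is the index of the head of word `i`, or `None` for the one word that has no head (the main verb). The single-head rule is then built into the representation, and what remains is to check for exactly one root and the absence of cycles. The function name and the encoding are assumptions for illustration, and relation types are omitted.

```python
def is_valid_dependency_tree(heads):
    """Check that a list of head indices describes a tree.

    heads[i] is the index of the head of word i, or None for the root.
    """
    n = len(heads)

    # Every head index must point at a word of the sentence.
    if any(h is not None and not 0 <= h < n for h in heads):
        return False

    # Exactly one word (the main verb) has no head.
    if sum(1 for h in heads if h is None) != 1:
        return False

    # Following the heads upwards from any word must reach the root
    # without visiting the same word twice (no modification cycles).
    for start in range(n):
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


# "the dog runs": the -> dog, dog -> runs, runs is the root.
print(is_valid_dependency_tree([1, 2, None]))   # True
# Words 0 and 1 modify each other: a cycle.
print(is_valid_dependency_tree([1, 0, None]))   # False
```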

Example 1: Classification of adverbs

Example 2: Question analysis

Example 3: Coordination, control structures
John and Mary are subjects of "promise" and "dance"

Existing Estonian NLP tools
- Morphological analyzer
- A shallow dependency parser based on the Constraint Grammar formalism
- WordNet semantic dictionary