TEXT ANALYSIS AND COMPREHENSION:


Text Analysis and Information Extraction
TEXT ANALYSIS AND COMPREHENSION: BASIC CONCEPTS; CHALLENGES; APPLICATION DOMAINS
Jelena Jovanović
Email: jeljov@gmail.com
Web: http://jelenajovanovic.net

Outline
Text analysis and comprehension:
- Why is it relevant? Why do we need it?
- What challenges does it face?
- What are typical approaches to text analysis and comprehension?

Why is it relevant? Why do we need it?
- Context-aware spelling and grammar checking
- Semantic search: more advanced than traditional, keyword-based search
- Information extraction: extraction of entities and their relationships from texts of different sorts
- Machine (automated) translation

Why is it relevant? Why do we need it?
- New interfaces: dialog-based systems
- Business applications: reputation management, context-aware advertising, business analytics

What are the challenges?
The complexity of human language. Some examples:
- Mary and Sue are sisters. / Mary and Sue are mothers.
- Joe saw his brother skiing on TV. The fool didn't have a jacket on! / The fool didn't recognize him!

What are the challenges?
Examples (cont.):
- I deposited $100 in the bank. / The river deposited sediment along the bank.
- Put on something warm, it's cold outside.
- I'll come quickly! See you soon!

What are the challenges?
To sum up, human language is:
- Full of ambiguous terms and phrases
- Based on the use of context for defining and conveying meaning
- Full of fuzzy, probabilistic terms
- Based on commonsense knowledge and reasoning
- Influenced by, and an influencer of, human social interactions

What are the challenges?
Complex, layered structure of human language:
- What words appear in the given piece of text?
- What phrases can be identified?
- Are there words that modify the meaning of other words?
- What is the (literal) meaning of the identified words and phrases?
- What can be deduced from the fact that someone said something in the given context?
- What kind of reaction could be expected?

What are the challenges?
The levels of language analysis:

Morphology
  Description: Recognizing words and the variety of their forms.
  Example: "use", "uses", "user" are different forms of the same word.

Syntax and Grammar
  Description: Recognizing the type of the word.
  Example: "There are 5 rows in the table." Here "rows" is a noun. "She rows 5 times per week." Here "rows" is a verb.
  Description: Identifying how different words are related to one another.
  Example: "Bob went out; he needed some fresh air." The pronoun "he" refers to Bob.

Semantics
  Description: Determining the meaning of words (often based on their context).
  Example: "The car driver was injured." vs. "The driver was installed in the computer."
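To make the semantics level concrete, here is a minimal sketch of choosing a meaning for "driver" from its context by counting overlapping words. It is only an illustration, not a method taken from the slides: the sense labels, the cue-word sets, and the function names `tokenize` and `guess_sense` are all hypothetical and hand-written for this example.

```python
import re

# Hypothetical, hand-written sense inventory for the word "driver".
# Each sense is paired with a few cue words that tend to appear near it.
SENSES = {
    "person who drives a vehicle": {"car", "bus", "road", "injured", "license"},
    "software that controls a device": {"computer", "installed", "printer", "software"},
}


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())


def guess_sense(sentence, senses=SENSES):
    """Return the sense whose cue words overlap most with the sentence.

    On a tie, the sense listed first in the inventory is returned.
    """
    context = set(tokenize(sentence))
    return max(senses, key=lambda sense: len(senses[sense] & context))


if __name__ == "__main__":
    for text in ("The car driver was injured.",
                 "The driver was installed in the computer."):
        print(text, "->", guess_sense(text))
```

Real systems rely on much larger sense inventories or on learned models; the point here is only that the surrounding words carry the information needed to pick a meaning.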

Language/text modeling
Main approaches to text/language modeling:
- Logical models
  - Rely on detailed linguistic analysis and an abstract representation of the sentence structure (typically in the form of a parse tree)
  - Models of this type need to be created manually
[Figure: An example of a tree-based model of a sentence structure. Image source: http://goo.gl/qgcqs9]
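As a rough illustration of a manually created, tree-based sentence model, the sketch below encodes a parse tree for "Joe saw his brother" as nested tuples. The tree itself, the category labels (S, NP, VP, Det, N, V), and the helper names `leaves` and `bracketed` are assumptions made for this example; they are not taken from the figure on the slide.

```python
# A hand-built parse tree: each node is (label, child, child, ...),
# and a leaf is a plain string (the word itself).
TREE = (
    "S",
    ("NP", ("N", "Joe")),
    ("VP",
     ("V", "saw"),
     ("NP", ("Det", "his"), ("N", "brother"))),
)


def leaves(node):
    """Return the words of the tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def bracketed(node):
    """Render the tree in the usual bracketed notation."""
    if isinstance(node, str):
        return node
    inner = " ".join(bracketed(child) for child in node[1:])
    return "(" + node[0] + " " + inner + ")"


if __name__ == "__main__":
    print(" ".join(leaves(TREE)))
    print(bracketed(TREE))
```

Every node in such a tree has to be specified by hand (or by hand-written grammar rules), which is why models of this kind are costly to build.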

Language/text modeling
Main approaches to text/language modeling:
- Stochastic models
  - Based on the probability of occurrence of individual words or sequences of words (typically 2-4 words)*
  - These models are learned, i.e., their creation is automated through the application of machine learning methods over large text corpora
- Hybrid models
  - Combine characteristics of logical and stochastic models
  - E.g., assigning probabilities to individual elements of a tree-based language model
* A sequence of n words with an associated probability is often referred to as an n-gram.
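The idea of a stochastic model can be sketched with a bigram (2-gram) model estimated from counts. The three-sentence corpus below is a toy stand-in for the large corpora mentioned above, and the names `train_bigrams` and `prob`, as well as the `<s>` and `</s>` boundary markers, are assumptions made for this illustration. The estimate is a plain relative frequency with no smoothing.

```python
from collections import Counter, defaultdict

# Toy corpus; a real model would be trained on a large text collection.
CORPUS = [
    "i deposited money in the bank",
    "the river deposited sediment along the bank",
    "the bank is near the river",
]


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def prob(counts, prev, nxt):
    """Estimate P(nxt | prev) as a relative frequency (no smoothing)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print("P(bank | the)  =", prob(model, "the", "bank"))
    print("P(river | the) =", prob(model, "the", "river"))
```

In this corpus "the" is followed by "bank" three times and by "river" twice, so the model prefers "bank" after "the". Unseen word pairs get probability zero, which is one reason practical n-gram models add smoothing.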

Recommendation
- The Natural Language Processing topic within the course Introduction to Artificial Intelligence at Udacity.com
  URL: https://www.udacity.com/course/cs271
- Lecture on Natural Language Processing held during the International Summer School on Semantic Computing, Berkeley 2011
  URL: http://videolectures.net/sssc2011_martell_naturallanguage/