Applications of memory-based natural language processing

Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007

Current ILK members
  Principal investigator: Antal van den Bosch
  Post-doc researchers: Piroska Lendvai, Martin Reynaert, Roser Morante, Erwin Marsi
  Ph.D. students: Sander Canisius, Toine Bogers, Marieke van Erp, Herman Stehouwer
  Scientific programmers: Ko van der Sloot, Steve Hunt, Peter Berck
  Guest researchers: Erik Tjong Kim Sang, Iris Hendrickx, Walter Daelemans

Outline of the talk
  1. Scientific embedding
    1.1 NLP as classification
    1.2 Inference in NLP
  2. Memory-based NLP applications
  3. Embedded memory-based applications
  4. Software and infrastructure
  5. e-learning?

1 Scientific embedding (1)
  Language processing is memory-based
  Learning consists of:
    Storing instances in memory
    Drawing analogies with the stored instances to deal with new experiences
  Learning is a supervised process
    Annotated data are needed

Representation of instances
  Task: assigning part-of-speech tags
  Instance layout: left context, focus word, right context, class
  Example window: were always accepted . _  ->  ?
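The windowed instance layout described on this slide can be sketched as follows. This is a minimal illustration, not the ILK implementation: the function name `make_instances`, the window width of two words on each side, the `_` padding symbol, the example sentence, and the Penn Treebank style tags are all assumptions made for the example.

```python
def make_instances(tokens, tags, left=2, right=2, pad="_"):
    """Turn one tagged sentence into fixed-width windowed instances.

    Each instance is (features, label): the features are the focus word
    with `left` words of context before it and `right` words after it,
    and the label is the tag of the focus word.
    """
    padded = [pad] * left + list(tokens) + [pad] * right
    instances = []
    for i, tag in enumerate(tags):
        # padded[i + left] is the focus word of the i-th instance
        window = tuple(padded[i:i + left + 1 + right])
        instances.append((window, tag))
    return instances


# Hypothetical example sentence and tags, for illustration only.
sentence = ["They", "were", "always", "accepted", "."]
tags = ["PRP", "VBD", "RB", "VBN", "."]

for features, label in make_instances(sentence, tags):
    print(features, "->", label)
```

Every token yields exactly one instance, so a sentence of n words becomes n rows of equal width. At test time the label slot is the unknown that the classifier has to fill in.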

1 Scientific embedding (2)
  Language processing has simplicity constraints:
    Context is a local phenomenon
    Abstraction is harmful

1 Scientific embedding (3)
  Language processing can be reduced to:
    Classification
      Segmentation, mapping
    Inference
      Finding the optimal sequence/structure

1.1 NLP as classification (1)
  Classification: given a new test instance X,
    Compare it to all memory instances:
      Compute a distance between X and memory instance Y
      Update the top k of closest instances (nearest neighbors)
    When done, take the majority class of the k nearest neighbors as the class of X
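The classification procedure on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions: the slide does not name a distance function, so a plain overlap distance (the number of mismatching feature values) is assumed here, and the names `overlap_distance`, `knn_classify`, and the toy memory of three-word windows are hypothetical.

```python
import heapq
from collections import Counter


def overlap_distance(x, y):
    """Assumed metric: count the feature positions where x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def knn_classify(memory, x, k=1):
    """Classify x by majority vote among its k nearest memory instances.

    `memory` is a list of (features, label) pairs kept as-is: nothing is
    abstracted away at training time, all work happens at query time.
    """
    nearest = heapq.nsmallest(
        k, memory, key=lambda inst: overlap_distance(x, inst[0])
    )
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy memory: (left word, focus word, right word) -> tag of the focus word.
memory = [
    (("the", "book", "is"), "NN"),
    (("to", "book", "a"), "VB"),
    (("a", "book", "."), "NN"),
    (("will", "book", "the"), "VB"),
]

print(knn_classify(memory, ("to", "book", "the"), k=3))
```

Scanning the whole memory for every query is the simplest possible strategy; it makes the cost of classification grow with the number of stored instances, which is why practical systems index or compress the instance base.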

1.1 NLP as classification (2)
  Sentence accent placement
  Dependency relation assignment

1.2 Inference in NLP
  Local classifications -> global solution
  Open up a search space in which there is an optimal global solution
  Search algorithms:
    Constraint satisfaction inference
    Beam search
    Viterbi
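Of the search algorithms listed, Viterbi is the easiest to show compactly. The sketch below assumes that each local classifier outputs a log-score per label and position, and that a transition table scores adjacent label pairs; the function name, the B/I/O chunk labels, and all numbers are hypothetical and chosen only to show how a global search can overrule a locally preferred but inconsistent label.

```python
def viterbi(local_scores, transition, labels):
    """Return the label sequence with the highest total score.

    local_scores: one dict per position, mapping label -> log-score
    transition:   dict mapping (previous label, current label) -> log-score
    """
    # best[t][label] = (score of best path ending in label at t, previous label)
    best = [{lab: (local_scores[0][lab], None) for lab in labels}]
    for t in range(1, len(local_scores)):
        column = {}
        for cur in labels:
            prev = max(
                labels,
                key=lambda p: best[t - 1][p][0] + transition[(p, cur)],
            )
            score = best[t - 1][prev][0] + transition[(prev, cur)]
            column[cur] = (score + local_scores[t][cur], prev)
        best.append(column)

    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab][0])]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return list(reversed(path))


labels = ["B", "I", "O"]
# Hypothetical local classifier outputs for a three-token input.
local_scores = [
    {"B": -2.0, "I": -3.0, "O": -0.2},
    {"B": -1.0, "I": -0.8, "O": -2.0},
    {"B": -3.0, "I": -0.3, "O": -2.0},
]
# All transitions are neutral except O -> I, which is heavily penalised
# because a chunk cannot continue where none has started.
transition = {(p, c): 0.0 for p in labels for c in labels}
transition[("O", "I")] = -10.0

print(viterbi(local_scores, transition, labels))
```

Taking the best label at each position independently would give O, I, I here, which violates the chunking constraint; the global search returns O, B, I instead.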

2 Memory-based NLP apps
  Basic NLP
    Spelling correction
    Speech synthesis
    Morpho-syntax
    Semantics
    Machine translation
  Embedded NLP
    Dialogue systems
    Professional document writing
    Knowledge enrichment

2.1 Morpho-phonology

2.2 Morpho-syntax

2.3 Semantics

2.3 Semantics Semantic relations: content-container

2.4 Machine Translation
  Memory-based text-to-text processing
    Machine translation
    Language modelling
    Confusible disambiguation

3 Embedded Memory-Based Apps
  Dialogue systems
    NWO IMIX: ROLAQUAD
  Professional document writing
    Senter Novem IOP-MMI: À Propos
  Knowledge enrichment in domains
    NWO CATCH: MITCH

3.1 Semantic Classification in QA
  Answer retrieval from domain documents through alignment of question analyses with off-line document analyses

3.2 Professional Document Writing
  Pro-active personalization for professional document writing
    Recommend related articles for a 'focus' online news article
    Retrieve similar passages
    Classify experts

3.3 Knowledge Enrichment
  Mining information from texts in the cultural heritage
  From documents to knowledge bases and ontologies
  Goal: research and develop techniques to discover new meaning in large collections of partially structured data that are available at Naturalis

3.4 Text Mining in Animal Data

In sum

[Diagram: language technology modules and applications, arranged from Text to Meaning]
  LT Modules: Lexical / Morphological Analysis, Tagging, Chunking, Syntactic Analysis, Word Sense Disambiguation, Grammatical Relation Finding, Named Entity Recognition, Semantic Analysis, Reference Resolution, Discourse Analysis
  Applications: OCR, Spelling Error Correction, Grammar Checking, Information Retrieval, Document Classification, Information Extraction, Summarization, Question Answering, Ontology Extraction and Refinement, Dialogue Systems, Machine Translation

4 Software and Infrastructure
  Open Source (GPL) software, among others:
    TiMBL, MBT: machine learning and sequence processing
    NeXTeNS: text-to-speech conversion
    Tadpole: POS tagging, lemmatization, morphological analysis, shallow parsing
  Demos
    Web interfaces
  Computing infrastructure
    One supercomputer; one high-end file server
    Approx. 20 computing servers, 4 web/data servers, 20 desktops
    Parallelisation: Dimbl, Mumbl

e-learning?
  Better accessibility
    Recommendation tools
    Multi-lingual NLP & MT
  Creating better e-learning apps with more natural interfaces
    Speech synthesis
    QA, dialogue systems
  Language e-learning
    Help the computer learn language
  Win-win situation, open mind

Thanks for your attention!
  More information: http://ilk.uvt.nl

Partners
  Academic
    CNTS, University of Antwerp
    Project partners: Nijmegen, Groningen, Maastricht, Utrecht, Eindhoven, Leuven
    University of Bergen, Dublin City University, Polytechnic University of Catalunya, Saarland University, University of Illinois at Urbana-Champaign
  Non-commercial
    Naturalis Museum of Natural History
  Industrial
    Textkernel
    Project partners: Polderland, SEC, Irion, Trezorix

Spin off
  Textkernel B.V.
    Information extraction
    Robust text matching
    Dialogue systems
  Foundation for Inductive Learning Applications
    Broker for Tilburg and Antwerp university software
    Consultancy

Eager vs Lazy Learning