Multilingual Content Extraction Extended with Background Knowledge for Military Intelligence

Multilingual Content Extraction Extended with Background Knowledge for Military Intelligence
Dr. Matthias Hecking, Fraunhofer FKIE, matthias.hecking@fkie.fraunhofer.de
Dr. Andreas Wotzlaw, University of Cologne, wotzlaw@informatik.uni-koeln.de
Ravi Coote, Fraunhofer FKIE, ravi.coote@fkie.fraunhofer.de

Outline
1. Introduction
2. Combined Deep and Shallow Parsing
3. Logical Inferences on Text Content
4. Background Knowledge
5. Conclusion, References

1. Introduction
Motivation/problem description:
The content of large quantities of intelligence reports and other documents written in different languages must be analyzed.
During this information and knowledge exploration (content analysis), a formal description of the actions and involved entities is constructed.
The extracted information can be combined and enhanced with background knowledge.
Conclusions can be drawn from the extracted and enhanced information.
Various approaches:
Shallow parsing with application-specific combination of analysis results, used in current projects.
Information Extraction, e.g., the ZENON project.
Our mie project.

1. Introduction - mie Project: Main ideas
Our approach: the project "Multilingual content analysis with semantic inference on militarily relevant texts" (mie).
Combined deep and shallow parsing approach.
The extracted meaning of each sentence is formalized in formal logic.
Simple English and (very simple) Arabic texts can be processed.
The formalized content is extended with background knowledge (integration of WordNet and YAGO).
New conclusions (logical inferences) can be drawn through the application of theorem provers and model builders.

1. Introduction - mie Project: Logical inference I
The problem of drawing conclusions from texts and relevant background knowledge is formalized as a pair consisting of a text and a hypothesis. The following is a typical example:
Text T: German soldiers were involved in a battle near Kundus. Two of them were badly injured. They were brought with a military airplane to Germany.
Hypothesis H: Some hurt soldiers were transported to Germany.

1. Introduction - mie Project: Logical inference II
Drawing inferences on militarily relevant texts can be formulated as a problem of recognizing textual entailment (RTE), a well-known academic problem. In RTE we want to automatically identify the type of logical relation between two input texts (T and H). The mie system can be used to decide between the following, mutually exclusive conjectures with respect to background knowledge (a minimal decision sketch follows below):
1. T entails H,
2. T ∧ H is inconsistent, i.e., T ∧ H contains a contradiction, or
3. H is informative with respect to T, i.e., T does not entail H and T ∧ H is consistent.
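As a purely illustrative aside (not part of the mie system), the three conjectures amount to a small decision procedure over two abstract checks, entailment and consistency. In the following Python sketch, entails and is_consistent are hypothetical placeholders for whatever inference back end is available (a concrete realization is sketched in Section 3).

# Illustrative sketch only: maps two abstract logical checks onto the three
# mutually exclusive RTE verdicts described above. `entails` and
# `is_consistent` are hypothetical placeholders for an inference back end.

def entails(premises, hypothesis):
    """Return True iff the premises logically entail the hypothesis."""
    raise NotImplementedError

def is_consistent(formulas):
    """Return True iff the conjunction of the formulas has a model."""
    raise NotImplementedError

def rte_verdict(background, text, hypothesis):
    premises = background + [text]          # background axioms plus T
    if entails(premises, hypothesis):
        return "T entails H"
    if not is_consistent(premises + [hypothesis]):
        return "T and H are inconsistent"
    return "H is informative with respect to T"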

1. Introduction - mie Project: Prototype I. English input.

1. Introduction - mie Project: Prototype II. A second language.

1. Introduction - mie Project: Prototype III. Result of the inference process.

1. Introduction - mie Project: Architecture
Main modules:
Syntactic and semantic analysis
Logical inference
Minimal Recursion Semantics (MRS)
Graphical user interface (GUI)

2. Combined Deep and Shallow Parsing - I
Task of this module: syntactic processing and semantic construction.
XML-based middleware architecture Heart of Gold: flexible integration of shallow and deep linguistics-based and semantics-oriented NLP components.
Shallow processing: statistical or simple rule-based, typically finite-state methods.
Deep processing: HPSG parser PET with the English Resource Grammar (ERG) and a simple Arabic HPSG grammar.

2. Combined Deep and Shallow Parsing - II
Tokenization: Java tool JTok.
Part-of-speech tagging: statistical tagger TnT, trained for English on the Penn Treebank.
Named entity recognition: SProUT.
HPSG parser PET: highly efficient runtime parser for unification-based grammars; core of the rule-based, fine-grained deep analysis.
Output semantics: Robust Minimal Recursion Semantics (RMRS). (An illustrative combination sketch follows below.)
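The components above each have their own interfaces; the following Python sketch only illustrates the combination idea, namely that shallow annotations (tokens, POS tags, named-entity spans) prepare the input for the deep HPSG parser and serve as a fallback when no deep analysis is found. All function names are hypothetical placeholders, not the actual JTok, TnT, SProUT, or PET APIs.

# Schematic illustration of a combined shallow/deep pipeline. Every function
# is a hypothetical placeholder, not the real JTok/TnT/SProUT/PET interface.

def tokenize(text):             # shallow: split the text into tokens
    raise NotImplementedError

def pos_tag(tokens):            # shallow: statistical part-of-speech tagging
    raise NotImplementedError

def named_entities(tokens):     # shallow: rule-based named entity recognition
    raise NotImplementedError

def deep_parse(tokens, tags, entities):
    # deep: HPSG parsing; returns a semantic representation or None on failure
    raise NotImplementedError

def analyze_sentence(text):
    tokens = tokenize(text)
    tags = pos_tag(tokens)
    entities = named_entities(tokens)
    # Shallow annotations guide the deep parser, e.g. by supplying lexical
    # types for unknown words such as named entities.
    semantics = deep_parse(tokens, tags, entities)
    if semantics is None:
        # Robustness: fall back to the shallow annotations alone.
        return {"tokens": tokens, "pos": tags, "entities": entities}
    return {"rmrs": semantics, "entities": entities}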

2. Combined Deep and Shallow Parsing - III: Result of the combined deep and shallow parsing.

3. Logical Inferences on Text Content - I
Task of this module: logical deduction and integration of background knowledge.
The MRS expressions are translated into a semantically equivalent representation in First-Order Logic with Equality (FOLE).
The relevant background knowledge is retrieved.
Inference engines:
Theorem provers: prove that a formula is valid.
Model builders: show that a formula is true in at least one model.
The theorem prover attempts to prove the input, while the model builder simultaneously tries to find a model for the negation of the input (see the sketch below).
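This division of labor can be sketched as follows. The code is illustrative only: prove and find_model are hypothetical wrappers around a theorem prover and a model builder (not the actual mie interfaces), and formulas are assumed to be simple nested tuples such as ("not", phi). The two functions realize the abstract entails/is_consistent checks from the RTE sketch in Section 1.

# Illustrative sketch of the theorem prover / model builder division of labor.

def prove(formula, timeout):        # True if a proof of `formula` is found
    raise NotImplementedError

def find_model(formula, timeout):   # True if a model of `formula` is found
    raise NotImplementedError

def is_valid(formula, timeout=30):
    # The prover works on the formula, the model builder on its negation;
    # whichever gives a conclusive answer settles the question.
    if prove(formula, timeout):
        return True                 # proof found: the formula is valid
    if find_model(("not", formula), timeout):
        return False                # counter-model found: not valid
    return None                     # inconclusive within the timeout

def is_consistent(formula, timeout=30):
    # A formula is consistent iff it has a model, i.e. iff its negation is not valid.
    if find_model(formula, timeout):
        return True
    if prove(("not", formula), timeout):
        return False
    return None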

3. Logical Inferences on Text Content - II: Semantic representation of T as a FOLE formula.
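The formula itself appears only as a figure on the original slide and is not reproduced here. Purely as an illustration of the FOLE style (event variables, unary predicates for concepts, binary predicates for roles), and not the actual mie output, the hypothesis H from Section 1 could be rendered roughly as ∃x ∃e (soldier(x) ∧ hurt(x) ∧ transport(e) ∧ patient(e, x) ∧ destination(e, germany)).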

4. Background Knowledge - I
The FOLE formulas (T and H) are automatically extended with problem-relevant knowledge in the form of background knowledge axioms.
1st source: WordNet 3.0
A lexical database (taxonomy) of synonymy, hyperonymy (e.g., location is a hyperonym of city), and hyponymy (e.g., city is a hyponym of location) relations. Approx. 2.6 million entries.
It helps the logical inference process to detect entailments between lexical units from the text and the hypothesis.
The hyperonymy/hyponymy relation in WordNet spans a directed acyclic graph (DAG) with the root node entity; this may induce inconsistencies between the input problem formulas and the extracted knowledge, which must be taken into account during the integration process.

4. Background Knowledge - II
Integration of WordNet:
List all concepts and individuals from the input formulas.
Look up these search predicates in WordNet and build the knowledge graph (using the hyperonymy/hyponymy and synonymy relations).
The graph is optimized so that only those concepts which are directly relevant for the inference problem appear in the tree. (An illustrative sketch follows below.)
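As a rough illustration of how such taxonomic axioms can be harvested (not the mie implementation, which has its own WordNet integration), the following Python sketch uses NLTK's WordNet interface to turn hypernym links of the predicates found in the input formulas into implication axioms such as all x. (city(x) -> location(x)). Only the most frequent sense of each word is used, mirroring the restriction mentioned in the conclusion.

# Illustrative sketch (not the mie implementation): derive taxonomic
# background axioms from WordNet for the predicates in the input formulas.
# Requires NLTK with the WordNet corpus data installed.
from nltk.corpus import wordnet as wn

def taxonomy_axioms(predicates, max_depth=3):
    """Follow hypernym links up to max_depth and emit implication axioms
    of the form 'all x. (sub(x) -> super(x))'."""
    axioms = set()
    for pred in predicates:
        for synset in wn.synsets(pred, pos=wn.NOUN)[:1]:   # most frequent sense only
            frontier = [(synset, 0)]
            while frontier:
                current, depth = frontier.pop()
                if depth >= max_depth:
                    continue
                for hyper in current.hypernyms():
                    sub = current.lemma_names()[0].lower()
                    sup = hyper.lemma_names()[0].lower()
                    axioms.add(f"all x. ({sub}(x) -> {sup}(x))")
                    frontier.append((hyper, depth + 1))
    return sorted(axioms)

# Example: predicates collected from T and H
print("\n".join(taxonomy_axioms(["soldier", "airplane", "city"])))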

4. Background Knowledge - III
2nd source: YAGO
Large ontology; approx. 22 million facts and relations. Assembled automatically from the category system and the infoboxes of Wikipedia and combined with taxonomic relations from WordNet.
Integration of YAGO:
Consult YAGO about search predicates that were not recognized in the WordNet phase.
The result of every YAGO query is in general represented by a DAG.
To preserve correctness of the results, select for the integration only those concepts, individuals, and relations which lie on the longest path from the most general concept to one of the direct hyperonyms of the leaf. (A sketch of this selection follows below.)
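The longest-path selection can be sketched as follows. This is illustrative only; it assumes the YAGO query result has already been converted into a dictionary mapping each concept to its direct specializations (hyponyms), with a designated most general concept as root and the direct hyperonyms of the leaf as targets. Because the graph is a DAG, the recursion terminates and the longest path is well defined.

# Illustrative sketch: keep only the concepts on the longest path from the
# most general concept down to a direct hyperonym of the leaf.
from functools import lru_cache

def longest_path(specializations, root, targets):
    targets = set(targets)

    @lru_cache(maxsize=None)
    def best_from(node):
        # Returns (length, path) of the longest path from `node` to a target.
        if node in targets:
            return (1, (node,))
        best = None
        for child in specializations.get(node, ()):
            sub = best_from(child)
            if sub and (best is None or sub[0] + 1 > best[0]):
                best = (sub[0] + 1, (node,) + sub[1])
        return best

    result = best_from(root)
    return list(result[1]) if result else []

# Hypothetical (made-up) query result for a place name such as "Kundus":
graph = {
    "entity": ["location", "artifact"],
    "location": ["region", "city"],
    "region": ["city"],
    "city": [],
}
print(longest_path(graph, "entity", targets=["city"]))
# -> ['entity', 'location', 'region', 'city']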

4. Background Knowledge - IV: Result of a query to YAGO and integration of the result.

4. Background Knowledge - V: Concepts from WordNet and YAGO.

5. Conclusion
In this presentation, we introduced the mie system, which is based on a combination of deep and shallow parsing with logical inferences on the analysis results and background knowledge.
Possible improvements:
The Arabic HPSG grammar is still very small.
During the inference process only the most probable meaning of each word is considered; also considering other, less probable meanings might increase the inferential power.
It would be interesting to look at the inconsistent cases of the inference process. They were caused by errors in presupposition and anaphora resolution, incorrect syntactic derivations, and inadequate semantic representations.
For the implementation of a temporal calculus, temporal relations from YAGO such as during, since, or until could also be considered.

5. References
A. Wotzlaw and R. Coote. Recognizing textual entailment with deep-shallow semantic analysis and logical inference. In: Proceedings of SEMAPRO 2010, Florence, Italy, 2010.
R. Coote and A. Wotzlaw. Generation of first-order expressions from a broad-coverage HPSG grammar. In: Proceedings of AAIA'10, Wisla, Poland, 2010.
A. Wotzlaw. Towards better ontological support for recognizing textual entailment. In: Proceedings of EKAW 2010, Lisbon, Portugal, 2010.
M. Hecking, A. Wotzlaw, and R. Coote. Abschlussbericht des Projektes Multilinguale Inhaltserschließung [Final report of the project Multilingual Content Analysis]. FKIE-Bericht Nr. 207, Wachtberg, Germany, 2011.
M. Hecking. Multilinguale Textinhaltserschließung auf militärischen Texten [Multilingual content analysis of military texts]. In: M. Wunder and J. Grosche (eds.), Verteilte Führungsinformationssysteme, Springer-Verlag, 2009.
M. Hecking and T. Sarmina Baneviciene. A Tajik extension of the multilingual information extraction system ZENON. In: Proceedings of the 15th International Command and Control Research and Technology Symposium (ICCRTS), Santa Monica, CA, USA, 2010.
M. Hecking. System ZENON: Semantic analysis of intelligence reports. In: Proceedings of LangTech 2008, Rome, Italy, February 28-29, 2008.

Thank you for your attention! Questions?