
Identifying Implicit Relationships Within Natural-Language Questions Brandon Marlowe ID: 2693414

What is Watson? Watson is a question answering computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM's first CEO, industrialist Thomas J. Watson. The computer system was specifically developed to answer questions on the quiz show Jeopardy! and, in 2011, the Watson computer system competed on Jeopardy! against former winners Brad Rutter and Ken Jennings winning the first place prize of $1 million. - Wikipedia

Watson - 2011

IBM Watson Hardware Specs (as of 2011)
- Cluster of 90 IBM Power 750 servers
- Each server: one 3.5 GHz POWER7 processor with 8 cores and 32 threads (720 cores, 2,880 threads total)
- 16 TB of RAM combined
- Can process 500 GB of data per second

Important Terms

What are Implicit Relationships Within Natural-Language Questions?
- Implicit: "capable of being understood from something else though unexpressed" - Merriam-Webster Dictionary
- Related: "connected by reason of an established or discoverable relation" - Merriam-Webster Dictionary
- Language: "the words, their pronunciation, and the methods of combining them used and understood by a community" - Merriam-Webster Dictionary

What is Machine Learning?
- Machine Learning: "Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed." - Wikipedia
- Features: "In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon being observed... features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition." - Wikipedia

Types of Questions
- Common Bonds: questions whose solution is a common element or characteristic shared by all the entities within the question. These make up less than 0.2% of all Jeopardy! questions.
- Missing Links: questions whose solution relies on identifying a missing entity explicitly or implicitly referred to within the question.

Common Bonds: Things with arches!

Missing Links: "On hearing of the discovery of George Mallory's body, this explorer told reporters he still thinks he was first." Answer: Sir Edmund Hillary. Missing Link: Mount Everest.

How Does Watson Do It?

Watson's Four Computational Steps
1) Question Analysis
2) Candidate Answer Generation
3) Candidate Answer Scoring
4) Merging and Ranking of Candidates

1) Question Analysis
Analysis is done using four components:
- Spreading Activation Algorithm
- N-Gram Corpus
- PRISMATIC Knowledge Base
- Wikipedia link-crawling

Spreading Activation Algorithm
- A method for searching associative, neural, or semantic networks
- Begins at a set of source nodes with weights or activation values
- IBM developed a recursive S-A algorithm that identifies related concepts based on heterogeneous data resources
- Higher activation values between nodes = stronger relationship
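As an illustrative sketch only (not IBM's implementation), spreading activation over a weighted concept graph can be written in a few lines of Python; the graph representation, `decay` factor, and node names below are all assumptions for the example:

```python
from collections import defaultdict

def spread_activation(graph, sources, depth=2, decay=0.5):
    """Propagate activation from source nodes through a weighted concept
    graph; a node's final activation estimates how strongly it relates
    to the sources.

    graph:   {node: [(neighbor, edge_weight), ...]} with weights in [0, 1]
    sources: {node: initial_activation}
    depth:   number of propagation hops (depth=1 for Common Bond questions)
    """
    activation = defaultdict(float)
    activation.update(sources)
    frontier = dict(sources)
    for _ in range(depth):
        next_frontier = defaultdict(float)
        for node, act in frontier.items():
            # Each neighbor receives activation scaled by edge weight and decay.
            for neighbor, weight in graph.get(node, []):
                next_frontier[neighbor] += act * weight * decay
        for node, act in next_frontier.items():
            activation[node] += act
        frontier = next_frontier
    return dict(activation)
```

In the "things with arches" spirit of the Common Bonds example, activating nodes for several question entities at once lets a shared neighbor (e.g. "arch") accumulate activation from all of them and surface as the most related concept.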

Spreading Activation Algorithm Example Visualization

N-Gram Corpus
- Captures semantic relatedness between words using Normalized Google Distance (NGD)
- NGD measures conceptual/semantic similarity between word pairs (e.g., "football" and "player")
- Terms that frequently occur together are more likely to appear in an n-gram
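NGD (due to Cilibrasi and Vitányi) is computed from document-frequency counts: NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}). A minimal sketch, with all counts in the usage note invented for illustration:

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance between terms x and y.

    fx, fy : number of documents containing x (resp. y)
    fxy    : number of documents containing both x and y
    n      : total number of indexed documents
    Returns 0 for terms that always co-occur; larger values mean less related.
    """
    if fxy == 0:
        return math.inf  # never co-occur: maximally distant
    log_fx, log_fy = math.log(fx), math.log(fy)
    return (max(log_fx, log_fy) - math.log(fxy)) / (math.log(n) - min(log_fx, log_fy))
```

With made-up counts such as `ngd(1000, 2000, 800, 10**6)`, the distance comes out small (roughly 0.13), indicating a closely related pair; unrelated terms with few shared documents score much higher.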

PRISMATIC Knowledge Base
- Determines conceptual and semantic relatedness based on syntactic arrangement
- Uses VerbNet, FrameNet, and WordNet (minimally) as resources, each of which is manually built
- WordNet contains synset information (definitions, synonyms, antonyms, etc.)
- FrameNet contains frames that describe the structure of selected words used in association
- VerbNet maps verbs to their associated Levin classes

PRISMATIC Knowledge Base

PRISMATIC Knowledge Base
Example sentence: "In 1921, Einstein received the Nobel Prize for his original work on the photoelectric effect."
[Figure: the sentence's parse tree, annotated with the frame slots PRISMATIC extracts]
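To make the slot idea concrete, here is a hand-built frame for the example sentence; the slot names are illustrative only, not PRISMATIC's exact schema, and the matching helper is a stand-in for the corpus-scale aggregation PRISMATIC actually performs:

```python
# Hand-built frame for "In 1921, Einstein received the Nobel Prize for his
# original work on the photoelectric effect." (slot names are illustrative).
frame = {
    "verb": "receive",
    "subj": "Einstein",
    "obj": "Nobel Prize",
    "mod_for": "work on the photoelectric effect",
    "mod_in": "1921",
}

def frames_matching(frames, **slots):
    """Return frames whose slots all match the given values; counting such
    matches over a parsed corpus yields frame-occurrence statistics."""
    return [f for f in frames if all(f.get(k) == v for k, v in slots.items())]
```

Querying, say, all frames with `verb="receive"` and `obj="Nobel Prize"` would then reveal which subjects typically fill that slot, which is the kind of syntactic-relatedness signal the slide describes.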

Wikipedia Links
- Wikipedia metadata enables Watson to determine semantic relationships
- <X> represents links where the anchor text and the title of the target document are both X
- <X Y> represents links where X is the anchor text and Y is the title of the target document
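The <X> / <X Y> distinction can be illustrated by parsing raw MediaWiki link markup. This regex-based sketch assumes the common `[[Target]]` and `[[Target|anchor]]` forms and ignores edge cases such as section links and templates:

```python
import re

# Matches MediaWiki link markup: [[Target]] or [[Target|anchor text]].
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def extract_links(wikitext):
    """Yield (anchor_text, target_title) pairs from raw wikitext.

    <X> links have anchor == target; <X Y> links differ, and the pair
    itself (e.g. "highest mountain" -> "Mount Everest") encodes a
    semantic relationship between the two phrases.
    """
    for match in WIKI_LINK.finditer(wikitext):
        target = match.group(1).strip()
        anchor = (match.group(2) or target).strip()
        yield anchor, target
```

For example, `"the [[Mount Everest|highest mountain]] and [[Edmund Hillary]]"` yields one <X Y> pair ("highest mountain", "Mount Everest") and one <X> pair ("Edmund Hillary", "Edmund Hillary").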

2) Candidate Answer Generation
Answer generation differs between Common Bond and Missing Link questions.
Common Bond:
- Identifies concepts closely related to the entities in the question
- Considers the union of all related concepts as candidates
- The S-A algorithm is invoked on each question entity
- Common bond solutions are directly related to entities in the question, so spreading activation depth = 1
Missing Link:
- Candidate answers are generated and used as hypothesized missing links
- The missing links are then passed back into the algorithm along with the question, and new candidate solutions are generated
- Good missing links are highly related to concepts in the question but must be ruled out as possible solutions
- Missing links are of the wrong answer type (e.g., Mount Everest is not a person) but have high association with the question
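The Common Bond branch can be sketched as a set operation over each entity's depth-1 related concepts, favoring concepts shared by every entity; the `related` function here is an assumed stand-in for a depth-1 spreading-activation call, not IBM's code:

```python
def common_bond_candidates(entities, related):
    """Generate candidate answers for a Common Bond question.

    entities: the entities mentioned in the question.
    related:  function mapping an entity to its set of directly related
              concepts (a stand-in for depth-1 spreading activation).
    Returns the union of related concepts, with concepts shared by every
    entity listed first, since the bond should relate to all entities.
    """
    concept_sets = [related(e) for e in entities]
    union = set().union(*concept_sets)
    shared = set.intersection(*concept_sets) if concept_sets else set()
    return sorted(shared) + sorted(union - shared)
```

For the arches example, entities like a hypothetical {"McDonald's", "St. Louis", "foot"} would each contribute their related concepts, and "arch", appearing in every set, would head the candidate list.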

3) Candidate Answer Scoring
Answer scoring differs between Common Bond and Missing Link questions.
Common Bond:
- Candidates are scored on semantic relatedness to each entity in the question
- The similarity score is calculated using NGD and the N-Gram Corpus
- Candidates semantically close to all entities are ranked highly
- Scores are used in the final ranking step
Missing Link:
- Watson performs worse when the missing link is implicit
- An additional answer scorer includes the identified missing link to measure the semantic relationship between all entities
- The new answer scorer allows the textual evidence scorers to operate more effectively
- The score is calculated by determining semantic relatedness between the missing link and candidate answers
- Three instances of the scoring method run in parallel, one for each resource (N-Gram Corpus, PRISMATIC, and Wikipedia links)
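A minimal sketch of relatedness-based scoring, with the `relatedness` function assumed (e.g. something derived from NGD, mapped into [0, 1]) and the missing-link variant folded in as an optional argument; this is an illustration of the idea, not IBM's scorer:

```python
def score_candidate(candidate, entities, relatedness, missing_link=None):
    """Score a candidate by its average semantic relatedness to the question.

    relatedness:  assumed symmetric function returning a value in [0, 1]
                  (e.g. derived from NGD over the N-Gram Corpus).
    missing_link: for Missing Link questions, the hypothesized link is added
                  to the comparison set, so candidates strongly tied to it
                  score higher, as with the additional answer scorer.
    """
    terms = list(entities) + ([missing_link] if missing_link else [])
    return sum(relatedness(candidate, t) for t in terms) / len(terms)
```

Running one such scorer per resource (N-Gram Corpus, PRISMATIC, Wikipedia links) and feeding the three scores into the final ranker matches the parallel-instances setup the slide describes.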

4) Merging and Ranking of Candidates
Watson's confidence threshold

Experimental Evaluation

Experimental Evaluation
Separate experiments were run for Common Bonds and Missing Links.
Common Bonds Evaluation Setup:
- Tested end-to-end system performance
- Trained Watson on a set of 14,770 questions (102 Common Bond)
- Two versions of Watson: enhanced (with the N-Gram Corpus, AKA the Common Bond Answer Generator) and baseline (without it)
- 139 previously unseen Common Bond questions given to Watson
- Two main benchmarks:
  - Binary Recall = percentage of questions for which the system produced the correct answer as a candidate answer
  - Precision@70 = precision when answering the top 70% of questions it was most confident about
Missing Link Evaluation Setup:
- Tested end-to-end system performance
- Two versions of Watson: enhanced (with Missing Link processing) and baseline (without it)
- 1,112 previously unseen Missing Link questions given to Watson
- Two main benchmarks:
  - Binary Recall (same as Common Bonds)
  - Question-Answering Accuracy, which tests Watson's ability to promote candidate answers produced by the Missing-Link Answer Scorer to the top of the candidate list
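The two benchmarks can be pinned down in code; this is a straightforward reading of the definitions above, not IBM's evaluation harness:

```python
def binary_recall(candidate_lists, gold):
    """Fraction of questions whose gold answer appears anywhere in the
    system's candidate list for that question."""
    hits = sum(1 for cands, g in zip(candidate_lists, gold) if g in cands)
    return hits / len(gold)

def precision_at(answers, gold, confidences, fraction=0.70):
    """Precision on the top `fraction` of questions ranked by confidence
    (Precision@70 corresponds to fraction=0.70)."""
    k = max(1, round(len(answers) * fraction))
    ranked = sorted(zip(confidences, answers, gold), reverse=True)[:k]
    correct = sum(1 for _, a, g in ranked if a == g)
    return correct / k
```

Binary Recall measures whether the answer-generation stage surfaces the right answer at all, while Precision@N measures end-to-end accuracy once Watson's confidence threshold decides which questions to attempt.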

Experimental Results: Common Bond Evaluation
Enhanced system vs. baseline system:
- The Common Bond Answer Generator produced at least one candidate answer for 113 of the 139 questions (81%)
- For 80 of those 113 (71%), the correct answer was one of the candidates
- Binary Recall improved for only 6 additional questions when combining all the systems
- The generator fails to produce the correct answer when the solution is an abstract concept (e.g., Question: Modem, Quasar, Gestapo; Answer: Acronyms)
- Ultimately, the N-Gram Corpus was left out of the final system

Experimental Results: Missing Link Evaluation
- Of the 1,112 questions presented to Watson, 259 were identified as having a Missing Link
- Just under 20% of those were not actually Missing Link questions
- ~60% of the Missing Links were explicit, ~40% were implicit
- Questions within the Missing Link subset are more difficult: humans score ~48% on Missing Link questions

Experimental Results
Example: Watson initially selects a candidate of the correct type, then chooses an answer lower in the list and identifies that as the missing link.

Conclusion
- Three knowledge resources developed by IBM: the N-Gram Corpus, the PRISMATIC Knowledge Base, and Wikipedia web-link crawling
- Spreading Activation Algorithm: supported by all three knowledge resources; recursively traverses a semantic network to discover semantic relationships
- Massive implications for AI, with widespread applications in health care, law, and advertising

Sources
- https://www.merriam-webster.com/dictionary/implicit
- https://www.merriam-webster.com/dictionary/language
- https://www.merriam-webster.com/dictionary/related
- https://en.wikipedia.org/wiki/watson_(computer)
- https://en.wikipedia.org/wiki/machine_learning
- https://en.wikipedia.org/wiki/feature_%28machine_learning%29
- Fan, James; Ferrucci, David; Gondek, David; Kalyanpur, Aditya (2010). PRISMATIC: Inducing Knowledge from a Large Scale Lexicalized Relation Resource. 122-127.