The DI@UE's participation in QA4MRE: from QA to multiple choice challenge


The DI@UE's participation in QA4MRE: from QA to multiple choice challenge

José Saias and Paulo Quaresma
Departamento de Informática, ECT, Universidade de Évora, Portugal
{jsaias,pq}@di.uevora.pt

Abstract. This QA4MRE edition brought two challenges to the DI@UE team: the absence of Portuguese as a working language and the different nature of the task when compared with our previous participation in QA@CLEF. We addressed this multiple choice answering problem by assessing answer candidates in a text-surface-based manner, without deep linguistic processing. The system employs a Lucene-based search engine and WordNet to assist in synonym checking and morphological normalization. Answer analysis and the criteria for the answering decision are fundamentally based on a superficial analysis of the document text, enriched with semantic validation of the compatibility between terms. The solution we describe answered 73 of the 120 questions, with 18 correct answers and an accuracy of 0.15.

1 Introduction

This paper describes the participation of the Informatics Department of the University of Évora (DI@UE) team in the Question Answering for Machine Reading Evaluation (QA4MRE)^1 track of the Cross Language Evaluation Forum (CLEF 2011). In previous editions of CLEF, DI@UE focused on the Portuguese monolingual Question Answering (QA) task^2. In QA@CLEF 2008 we used the Senso system [4] for open domain QA, featuring a Portuguese stemmer, a text indexation engine and an answer validation module. In this system's latest evolution [5], candidate answers for each question are contextualized over time, space and a semantic dimension. This organization enables differentiation of the answer list from multiple perspectives and supports the answer appreciation process, but it is not multilingual. It incorporates tools for the Portuguese language, which was not included in the list of languages for this QA4MRE edition. Even with the introduction of tools for the English language, the Senso system did not seem the most suitable for processing QA4MRE questions.

^1 http://celct.fbk.eu/qa4mre/
^2 University of Évora previous work at CLEF: 2004 [7], 2005 [8], 2007 [3] and 2008 [4].

The task in which we participated until 2008 was substantially different from the one required this year. For QA@CLEF, the purpose was to automatically find the answer to a set of questions. Systems were required to detect the possible answers by themselves inside the document collections (Wikipedia and a news corpus) [9]. The QA4MRE main task aims to test a system's ability to understand the meaning communicated by a text [10]. The task proposes a reading comprehension exercise where each question about a document has five choices, from which systems must identify the correct answer. Despite the complexity of the process, with the need for a justification with the elements that support the answer and the need for textual inference, these task rules give rise to specialized approaches. Taking advantage of having the answer among five possibilities, the effort can be directed to assessing answer candidates. This may be seen as a subtask of full QA, in the sense that it does not need an answer extraction phase.

The next section presents the main resources employed and the system architecture. The approach used in QA4MRE is described with examples in section 3. The obtained results are presented in section 4. Finally, some conclusions and future work are pointed out in section 5.

2 System Resources and Architecture

The main system components and their interactions are presented in figure 1. The XML Layer is the component responsible for parsing the input, sorting out the questions with their multiple choice answers and maintaining a connection to their particular reading test document. When all questions are processed, this component generates the XML output and makes sure the syntax is correct and conforms to the DTD. The Question Classifier module was designed to determine the type of the question, which is later considered when assessing each response. The Local KB holds a starting knowledge base containing common sense facts about places, entities and events. Its content is important, for example, in the Named Entity Recognition (NER) process. The Libs Module contains collections of text documents, referred to as Background Collections (BC). The English version for all three topics (AIDS; Climate Change; Music and Society) corresponds to approximately 2 gigabytes of text. This module also includes a Lucene^3 based text search engine, used to index all BC text and for subsequent document retrieval operations. Since we are working with English text, WordNet [2] is a significant external tool for assisting in synonym checking, term base form normalization and finding definitions. This resource is consulted through the Java API for WordNet Searching^4.

^3 Apache Lucene is an open source project with advanced indexing and searching features. http://lucene.apache.org/
^4 Java API for WordNet Searching: http://lyle.smu.edu/~tspell/jaws/index.html
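To make the role of these resources concrete, the sketch below (not the actual DI@UE code) shows one way the Libs Module could index a Background Collection document with Lucene and expand a question term with WordNet synonyms through the JAWS API, assuming a Lucene 3.x installation; the index path, the "contents" field name, the example text and the WordNet location are illustrative assumptions.

```java
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import edu.smu.tspell.wordnet.Synset;
import edu.smu.tspell.wordnet.WordNetDatabase;

public class BcRetrievalSketch {

    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("bc-index"));       // illustrative index path
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);

        // Index one Background Collection document (field name "contents" is an assumption).
        IndexWriter writer = new IndexWriter(dir, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.add(new Field("contents", "... text of one BC document about music and society ...",
                Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        // Expand a question term with WordNet synonyms through the JAWS API.
        System.setProperty("wordnet.database.dir", "/usr/share/wordnet");  // local WordNet path (assumption)
        WordNetDatabase wordnet = WordNetDatabase.getFileInstance();
        StringBuilder expanded = new StringBuilder("orchestra");
        for (Synset synset : wordnet.getSynsets("orchestra")) {
            for (String form : synset.getWordForms()) {
                expanded.append(" OR \"").append(form).append('"');
            }
        }

        // Retrieve candidate supporting documents for the expanded search expression.
        IndexSearcher searcher = new IndexSearcher(dir);
        Query query = new QueryParser(Version.LUCENE_30, "contents", analyzer)
                .parse(expanded.toString());
        for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
            System.out.println(searcher.doc(hit.doc).get("contents"));
        }
        searcher.close();
    }
}
```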

The Answer Analyzer is responsible for assessing each answer choice for a question. This check is fundamentally based on a superficial analysis of the text, enriched with semantic validation of the compatibility between terms. Besides the reading test document content, the system also examines BC documents retrieved using the question text and the answer choice's text. Possibly relevant phrases are highlighted for further consideration when deciding between the various choices. With the information collected for each candidate answer to a question, the Answer Selector module applies a set of criteria to choose the most plausible answer. The next section explains how the system processes each QA4MRE question.

Fig. 1. System Architecture

3 Methodology

Although Machine Reading can be defined as the automatic understanding of text [6], our participation in QA4MRE is not based on deep linguistic processing. Our approach to identifying the correct answer choice to a question comprises the following steps:

- Named Entity Recognition - prior identification of any entity names, dates, quantities or other expressions that influence question interpretation.

- Question Classification - determine the category of the question in order to adopt specific procedures in the treatment of answers. The question "How many orchestras were mentioned in the London Times?" is of the Quantity subtype of the Factoid category.

- Document Retrieval - search for documents in the Background Collections that can support one of the possible answers to the question. It uses the Lucene tool with a search expression built according to the question category and each answer candidate. The search expression avoids stop words and can be expanded using synonyms or morphological normalization.

- Passage Retrieval - to minimize the text area where the more time consuming techniques must be applied, the retrieved documents and the reading test document are divided into text segments.

- Answer Analysis - all possible answers are analyzed in the text segments. For each of the multiple choices we intend to verify:
  A - Is there a textual answer pattern for this question category that is verified, in the current text segment, for this answer choice?
  B - If both question and answer key elements are present in the segment, what is the (minimal) distance between them?
  In both cases, the question key element to find in the text segments is the question focus, that is, the entity or object that the question refers to, about which some information is to be determined. The question focus is identified together with the category of the question. When question classification fails, the default procedure is to look for terms in the question text, filtering out stop words. The presence of these key elements in a text segment is checked firstly by exact term match, but also through semantic compatibility (synonym, hypernym, base form).

- Answer Selection - considering the cases A and B detected for each answer, the system identifies the most plausible answer. The decision is based on the following ordered criteria (a minimal sketch of this logic follows the list):
  1. If there is one single answer that verified case A, then that answer is chosen.
  2. If multiple answers verify case A:
     (a) If one of them has more occurrences of case A, then that answer is chosen.
     (b) Otherwise, among the answers having the same number of occurrences of case A, the tie is resolved by the criteria that follow.
  3. If there is one single answer that verified case B, then that answer is chosen.
  4. If multiple answers verify case B and one (and only one) of them has the minimal distance observed in a text segment, then that answer is chosen.
  5. If multiple answers verify case B and one (and only one) of them has the minimum value for the average distance observed in the segments where it had occurrences, then that answer is chosen.
  6. If none of the above applies, then the question remains unanswered.
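The following is a minimal, hypothetical sketch of how the Answer Selector could apply these criteria; the Candidate class, its fields (caseACount, caseBDistances) and the helper methods are assumptions standing in for the evidence the Answer Analyzer collects, and criterion 2(b)'s restriction of the later criteria to the tied answers is kept.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical container for the evidence gathered per answer choice.
class Candidate {
    final String text;
    int caseACount;                 // segments where a textual answer pattern matched (case A)
    final List<Integer> caseBDistances = new ArrayList<>(); // question/answer key element distances (case B)

    Candidate(String text) { this.text = text; }

    boolean hasCaseB() { return !caseBDistances.isEmpty(); }

    int minDistance() {
        int min = Integer.MAX_VALUE;
        for (int d : caseBDistances) min = Math.min(min, d);
        return min;
    }

    double avgDistance() {
        double sum = 0;
        for (int d : caseBDistances) sum += d;
        return sum / caseBDistances.size();
    }
}

public class AnswerSelectorSketch {

    /** Returns the chosen answer, or null to leave the question unanswered (criterion 6). */
    static Candidate select(List<Candidate> candidates) {
        // Criteria 1 and 2(a): a unique candidate with the most case A occurrences wins.
        Candidate bestA = null;
        boolean tieA = false;
        for (Candidate c : candidates) {
            if (c.caseACount == 0) continue;
            if (bestA == null || c.caseACount > bestA.caseACount) { bestA = c; tieA = false; }
            else if (c.caseACount == bestA.caseACount) tieA = true;
        }
        if (bestA != null && !tieA) return bestA;

        // Criterion 2(b): when case A is tied, only the tied answers go forward.
        List<Candidate> pool = new ArrayList<>(candidates);
        if (bestA != null) {
            pool.clear();
            for (Candidate c : candidates)
                if (c.caseACount == bestA.caseACount) pool.add(c);
        }

        // Criteria 3 and 4: a single case B candidate, or a unique minimal distance.
        List<Candidate> withB = new ArrayList<>();
        for (Candidate c : pool) if (c.hasCaseB()) withB.add(c);
        if (withB.isEmpty()) return null;                           // criterion 6
        if (withB.size() == 1) return withB.get(0);                 // criterion 3
        withB.sort(Comparator.comparingInt(Candidate::minDistance));
        if (withB.get(0).minDistance() < withB.get(1).minDistance())
            return withB.get(0);                                    // criterion 4

        // Criterion 5: break the tie by the smallest average distance, if unique.
        withB.sort(Comparator.comparingDouble(Candidate::avgDistance));
        if (withB.get(0).avgDistance() < withB.get(1).avgDistance())
            return withB.get(0);
        return null;                                                // criterion 6
    }
}
```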

The fifth criterion is the main difference between the two submitted runs. In the first run it was used only to break ties resulting from criterion 4, whereas in the second run it is applied more broadly.

4 Results

This methodology was applied to the 120 questions. Two full executions were completed and their results were submitted. For the second run, the chart in figure 2 shows the proportion of correct and incorrect answers on the one hand, and the number of unanswered questions on the other. It makes clear that the approach leads to a large number of wrong answers. However, adding the right answers to the unanswered questions gives more than the number of wrong answers.

Fig. 2. Evaluation at QA level for the second run

A more specific assessment of each of the runs can be found in table 1. We can see that the system answered 74 questions in the first run. The remaining 46 questions were left unanswered, without selecting a candidate answer. Of the responses submitted, only 15 were classified as correct. In the second run, one more question was left unanswered. The system was correct in 18 of the 73 responses submitted. We note that the evaluation of the second run is more favorable, because while it increased the number of correct answers, it also decreased the number of erroneous results. This improvement is confirmed by the measure in the second column from the right of the table. The accuracy is calculated as the number of responses submitted and classified as correct divided by the number of questions. On run 02 the accuracy is 18/120 = 0.15, which is better than in the former run. C@1 is a balanced measure rewarding systems that, for the same number of correct answers, perform better over the remaining answers [1]. By leaving some questions unanswered, a system can decrease the number of incorrect results.
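As a quick check, the short sketch below reproduces the accuracy and C@1 scores reported in Table 1 from the run counts, using the accuracy definition above and the C@1 formula given in equation (1) below.

```java
public class Qa4mreScores {

    // Accuracy: correct answers over all questions.
    static double accuracy(int correct, int questions) {
        return (double) correct / questions;
    }

    // C@1 (Peñas and Rodrigo, 2011): unanswered questions are credited
    // with the system's observed precision; see equation (1).
    static double cAt1(int correct, int unanswered, int questions) {
        return (correct + unanswered * ((double) correct / questions)) / questions;
    }

    public static void main(String[] args) {
        // Run 01: 15 correct, 46 unanswered, 120 questions -> accuracy 0.13, C@1 0.17
        System.out.printf("run 01: accuracy=%.2f C@1=%.2f%n",
                accuracy(15, 120), cAt1(15, 46, 120));
        // Run 02: 18 correct, 47 unanswered, 120 questions -> accuracy 0.15, C@1 0.21
        System.out.printf("run 02: accuracy=%.2f C@1=%.2f%n",
                accuracy(18, 120), cAt1(18, 47, 120));
    }
}
```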

            unanswered                 answered            all
Run     #   Right  Wrong  Empty    #   Right  Wrong   Accuracy  C@1
01     46     0      0     46     74    15     59       0.13    0.17
02     47     0      0     47     73    18     55       0.15    0.21

Table 1. QA level evaluation for each run

This measure is calculated using the formula in equation (1):

C@1 = (#correct + #unanswered * (#correct / #questions)) / #questions    (1)

The C@1 value for each run is shown in the rightmost column of table 1. Again, the system got its best result in the second run, with C@1 = (18 + 47*(18/120))/120 = 0.21. The next section presents some conclusions about these results and considerations about our QA4MRE participation.

5 Discussion

We believe that the outcome of this task is not comparable with our previous work in QA@CLEF. The aim of this work is to automatically understand the meaning of each question and its answer hypotheses in order to determine the answer. We found that answer analysis with variant A did not contribute to any answer. This may be due to problems with the question classifier, which managed to correctly assign a category to only 9 questions, and some of those still had problems in the detection of the question focus. Thus, criteria 3, 4, 5 and 6 turned out to be the dominant ones in the multiple choice answering process. Looking at the questions, and although our text-surface-based approach can be much improved, it seems that there is a limit beyond which only a deeper semantic analysis can reach the answers. This QA4MRE edition brought us two challenges: the absence of Portuguese as a working language and the different nature of the task and its objectives. Given the lack of time to implement a more robust solution, we consider the results satisfactory. Moreover, we believe this experiment constitutes a basis for a future, semantically more advanced system, enabled for English, that can be tested in an upcoming participation.

References

1. Anselmo Peñas and Alvaro Rodrigo. A Simple Measure to Assess Non-response. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 1415-1424 (2011). ISBN: 978-1-932432-87-9.

2. George A. Miller. WordNet: A lexical database for English. Communications of the ACM (1995).
3. José Saias and Paulo Quaresma. The Senso question answering approach to Portuguese QA@CLEF-2007. Technical report, CLEF 2007 Working Notes, Cross-Language Evaluation Forum Workshop, Budapest, Hungary (2007). ISBN: 2-912335-32-9.
4. José Saias and Paulo Quaresma. The Senso question answering system at QA@CLEF 2008. Technical report, Universidade de Évora, Multiple Language Question Answering @ Cross-Language Evaluation Forum (2008). ISBN: 2-912335-43-4.
5. José Saias. Contextualização e Activação Semântica na Selecção de Resultados em Sistemas de Pergunta-Resposta. PhD Thesis (2010). hdl.handle.net/10174/2505
6. Lucy Vanderwende. Answering and Questioning for Machine Reading. American Association for Artificial Intelligence (2007).
7. Paulo Quaresma, Luis Quintano, Irene Rodrigues, José Saias and Pedro Salgueiro. The University of Évora approach to QA@CLEF-2004. CLEF 2004 Working Notes (2004).
8. Paulo Quaresma and Irene Rodrigues. A Logic Programming Based Approach To QA@CLEF05 Track. CLEF 2005 Working Notes (2005).
9. QA@CLEF2008. Guidelines for the participants in QA@CLEF 2008. http://clefqa.fbk.eu/2008/download/qa@clef08 Guidelines-for-Participants new.pdf
10. QA4MRE@CLEF2011. Track Guidelines. http://celct.fbk.eu/qa4mre/