Aligning Sentences from Standard Wikipedia to Simple Wikipedia. Written by Hwang et al. Presented by Xia Cui for


2 Overview
- Wikipedia
  - Simple article: shorter sentences, simpler words and grammar
  - Standard article
- Aim: sentence alignment. For every simple sentence, find the corresponding sentence (or sentence fragments) in standard Wikipedia.
- Problem: the articles are not strictly parallel, and their presentation order can differ greatly.
- Solution: sentence-level scoring combined with sequence-level search

3 Sentence-Level Scoring
- Kauchak, 2013: cosine distance between vector representations of the tf.idf scores of the words in each sentence
  - tf.idf: term frequency-inverse document frequency, a measure of how important a word is to a document
- Wu and Palmer, 1994: word-level pairwise semantic similarity score
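The tf.idf scoring above can be sketched in a few lines of plain Python. This is a minimal illustration rather than Kauchak's implementation: each sentence is treated as its own "document", the smoothed idf formula is an assumption, and the names tfidf_vectors and cosine are hypothetical. It computes cosine similarity; the corresponding distance is one minus this value.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Return one sparse tf.idf vector (a dict) per tokenized sentence."""
    n = len(sentences)
    # Document frequency: the number of sentences each word occurs in.
    df = Counter(word for sent in sentences for word in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        # Smoothed idf (an assumption) so that very common words keep a small weight.
        vectors.append(
            {w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1.0) for w in tf}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

sentences = [
    "the cat sat on the mat".split(),
    "a cat sat on a mat".split(),
    "stock markets fell sharply today".split(),
]
vecs = tfidf_vectors(sentences)
sim_related = cosine(vecs[0], vecs[1])    # sentences sharing several words
sim_unrelated = cosine(vecs[0], vecs[2])  # sentences sharing no words
```

Sentences with no words in common score exactly zero, which is one reason a purely lexical score struggles with paraphrases.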

4 Sequence-Level Search
- Zhu et al., 2010: without constraint
  - alignments can be one-to-many
  - two sentences are aligned if their similarity score > threshold
- Coster and Kauchak, 2011; Barzilay and Elhadad, 2003: with a sequential constraint
  - dynamic programming, recursive optimization
  - relies on consistent ordering, which does not always hold for Wikipedia
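The contrast between the two search strategies can be illustrated on a toy similarity matrix. The sketch below is a simplification: align_ordered is a plain monotone one-to-one dynamic program, not the exact recurrence of Coster and Kauchak or Barzilay and Elhadad, and both function names and the matrix values are made up for illustration.

```python
def align_unconstrained(sim, threshold):
    """Every (row, column) pair scoring above the threshold; one-to-many is allowed."""
    return [
        (i, j)
        for i, row in enumerate(sim)
        for j, score in enumerate(row)
        if score > threshold
    ]

def align_ordered(sim):
    """Highest-scoring one-to-one alignment that preserves sentence order."""
    n, m = len(sim), len(sim[0])
    # best[i][j]: best total score using the first i rows and the first j columns.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j],                          # skip row i-1
                best[i][j - 1],                          # skip column j-1
                best[i - 1][j - 1] + sim[i - 1][j - 1],  # align the pair
            )
    # Trace back through the table to recover the chosen pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    return pairs[::-1]

# Rows: simple sentences; columns: standard sentences.
# The two matching pairs appear in opposite order in the two articles.
sim = [
    [0.1, 0.9],
    [0.8, 0.2],
]
unconstrained = align_unconstrained(sim, 0.5)
ordered = align_ordered(sim)
```

On this matrix the unconstrained search recovers both crossing pairs, while the ordered search can keep only one of them, which is the failure mode the slide attributes to inconsistent ordering.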

5 Simplification Datasets
- Good: the semantics of the simple and standard sentences completely match
- Good Partial: one sentence covers the other, but contains additional information
- Partial: the sentences discuss unrelated concepts, but share a short related phrase
- Bad: the sentences discuss unrelated concepts

6 Simplification Datasets (Cont.)
- Manually annotated (native speaker): 67,853 pairs (277 good, 281 good partial, 117 partial and 67,178 bad)
- Automatically aligned: threshold > 0.45; good: 0.67; good partial: [...]
  - [...]K good, 130K good partial, 110K unlabelled
  - 51.5M potential (threshold < 0.45)

7 Sentence Alignment
- Sentence-level score, built on word-level similarity
  - WikNet similarity
  - Structural semantic similarity
- Greedy search

8 Word-Level Similarity: WikNet Similarity
- WikNet: a graph that leverages synonym information in Wiktionary plus word-definition co-occurrence
  - Node: a word
  - Edge: links w1 and w2 if word w2 appears in any sense of the definitions of word w1
- Preprocessing
  - morphological variations are mapped to the base form
  - atypical word senses are removed
  - stopwords are removed
- Extended Jaccard coefficient
  - Jaccard coefficient (Salton and McGill, 1983): based on the number of shared neighbors of two words
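As a rough illustration of the graph and the plain Jaccard coefficient, |N(w1) ∩ N(w2)| / |N(w1) ∪ N(w2)| over neighbor sets N, consider the sketch below. The toy definitions stand in for Wiktionary entries, treating edges as undirected is an assumption, and build_graph and jaccard are hypothetical names.

```python
def build_graph(definitions):
    """Link w1 and w2 when w2 occurs in a definition of w1 (edges kept undirected)."""
    graph = {}
    for w1, definition_words in definitions.items():
        for w2 in definition_words:
            if w1 == w2:
                continue
            graph.setdefault(w1, set()).add(w2)
            graph.setdefault(w2, set()).add(w1)
    return graph

def jaccard(graph, w1, w2):
    """Shared direct neighbors divided by all direct neighbors of the two words."""
    n1, n2 = graph.get(w1, set()), graph.get(w2, set())
    union = n1 | n2
    return len(n1 & n2) / len(union) if union else 0.0

# Toy, already-preprocessed definitions (stopwords removed, base forms only).
definitions = {
    "car": ["vehicle", "wheel", "engine"],
    "automobile": ["vehicle", "engine", "road"],
    "banana": ["fruit", "yellow"],
}
graph = build_graph(definitions)
car_vs_automobile = jaccard(graph, "car", "automobile")  # 2 shared of 4 neighbors
car_vs_banana = jaccard(graph, "car", "banana")          # no shared neighbors
```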

9 WikNet Similarity (Cont.)
- Extended Jaccard coefficient
  - neighbors within n-step reach (Fogaras and Racz, 2005)
  - additional term: whether the words are direct neighbors or not
  - if the words or their neighbors have synonym sets in Wiktionary, the shared synonyms are used
  - if two words are in each other's synonym lists, the similarity is set to 1
  - otherwise: [...], where [...] is the l-step neighbor set of wi
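The two ideas on this slide that can be shown without the missing formula are the n-step neighbor set and the synonym shortcut. The sketch below is only an approximation under stated assumptions: it applies the plain Jaccard ratio to n-step neighbor sets, omits the direct-neighbor term and the shared-synonym handling, and is not the slide's formula. The function names, the toy graph and the default n=2 are all hypothetical.

```python
def n_step_neighbors(graph, word, n):
    """Words reachable from `word` in at most n steps, excluding the word itself."""
    frontier, seen = {word}, {word}
    for _ in range(n):
        frontier = {nb for w in frontier for nb in graph.get(w, ())} - seen
        seen |= frontier
    return seen - {word}

def extended_similarity(graph, synonyms, w1, w2, n=2):
    """Jaccard ratio over n-step neighbor sets, with a synonym shortcut."""
    mutual_synonyms = (
        w2 in synonyms.get(w1, set()) and w1 in synonyms.get(w2, set())
    )
    # Treating identical words as similarity 1 is an assumption of this sketch.
    if w1 == w2 or mutual_synonyms:
        return 1.0
    g1 = n_step_neighbors(graph, w1, n)
    g2 = n_step_neighbors(graph, w2, n)
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 0.0

# Toy graph: physician - medicine - treatment - nurse, plus one unrelated neighbor each.
graph = {
    "physician": {"medicine", "degree"},
    "degree": {"physician"},
    "medicine": {"physician", "treatment"},
    "treatment": {"medicine", "nurse"},
    "nurse": {"treatment", "uniform"},
    "uniform": {"nurse"},
}
synonyms = {"physician": {"doctor"}, "doctor": {"physician"}}

one_step = extended_similarity(graph, synonyms, "physician", "nurse", n=1)
two_step = extended_similarity(graph, synonyms, "physician", "nurse", n=2)
synonym_pair = extended_similarity(graph, synonyms, "physician", "doctor")
```

With direct neighbors only, the two words share nothing; widening the reach to two steps exposes the overlap, which is the motivation for the n-step extension.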

10 Structural Semantic Similarity
- Between words, plus the dependency structure between words in a sentence
- Stanford's dependency parser (de Marneffe et al., 2006)
- Create a triplet for each word
  - w: given word; h: head word; r: relationship between w and h
- Similarity between w1 and w2: [...]
  - [...]: WikNet similarity
  - [...]: dependency similarity between relations r1 and r2 (same category: [...]; otherwise: [...])

11 Greedy Sequence-Level Alignment
- Compute the similarity between all sentences Sj in the simple article and Ai in the standard article
- Select the most similar sentence pair and remove all other pairs involving those sentences
  - (S*, A*) = argmax s(Sj, Ai)
- Repeat until all sentences in the shorter document are aligned
- Good; Good Partial: Ai (fragments of standard sentence Ai)
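The greedy procedure above can be sketched as follows, assuming the sentence-level scores are already available as a matrix. Sorting all pairs once and skipping sentences that are already taken is equivalent to repeatedly picking the argmax and discarding conflicting pairs. The name greedy_align and the toy scores are hypothetical, and the fragment handling for good partial matches is not covered.

```python
def greedy_align(sim):
    """Greedy one-to-one alignment.

    sim[j][i] is the score between simple sentence j and standard sentence i.
    Returns (j, i, score) triples in the order they were selected.
    """
    candidates = sorted(
        ((score, j, i) for j, row in enumerate(sim) for i, score in enumerate(row)),
        key=lambda item: item[0],
        reverse=True,
    )
    used_simple, used_standard, pairs = set(), set(), []
    for score, j, i in candidates:
        # Skip pairs involving a sentence that has already been aligned.
        if j in used_simple or i in used_standard:
            continue
        pairs.append((j, i, score))
        used_simple.add(j)
        used_standard.add(i)
    return pairs

# Two simple sentences (rows) against three standard sentences (columns).
sim = [
    [0.20, 0.90, 0.10],
    [0.80, 0.85, 0.30],
]
alignment = greedy_align(sim)
```

Because no ordering constraint is imposed, the second simple sentence is free to align with a standard sentence that appears earlier in the article. The loop stops producing pairs once every sentence of the shorter document has been used.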

12 Experiments
- Preprocessing
  - topic names, list markers and non-English text are removed
  - data was tokenized, lemmatized and parsed by Stanford CoreNLP
- Evaluation: precision-recall; max F1; AUC
- Comparison (Greedy Structural WikNet)
  - Unconstrained WordNet (Mohler and Mihalcea, 2009): an unconstrained search for aligning sentences and WordNet semantic similarity
  - Unconstrained Vector Space (Zhu et al., 2010): vector space representation and an unconstrained search for aligning sentences
  - Ordered Vector Space (Coster and Kauchak, 2011): dynamic programming for sentence alignment and vector space scoring
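The max F1 measure can be read as the best F1 obtainable by sweeping the alignment threshold over the similarity scores. The sketch below illustrates that reading on made-up scores and labels; it is not the evaluation code behind the reported results, and the function names are hypothetical.

```python
def precision_recall_f1(scores, labels, threshold):
    """Precision, recall and F1 when pairs scoring >= threshold count as aligned."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum(l and not p for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def max_f1(scores, labels):
    """Best F1 over all thresholds taken from the observed scores."""
    best_f1, best_threshold = 0.0, None
    for threshold in sorted(set(scores), reverse=True):
        _, _, f1 = precision_recall_f1(scores, labels, threshold)
        if f1 > best_f1:
            best_f1, best_threshold = f1, threshold
    return best_f1, best_threshold

# Made-up similarity scores and gold labels (True means a correct alignment).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False]
best_f1, best_threshold = max_f1(scores, labels)
```

Each threshold also yields one point on the precision-recall curve, so the same sweep provides the inputs for an area-under-curve figure.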

13 Results

14 Results(Cont.)

15 Future Work
- Introducing other techniques using the introduced datasets
- Better text preprocessing
- Learning similarities
- Phrase alignment to obtain better partial matches

Aligning Sentences from Standard Wikipedia to Simple Wikipedia
William Hwang, Hannaneh Hajishirzi, Mari Ostendorf, and Wei Wu, {wshwang, hannaneh, ostendor, weiwu}@u.washington.edu, University of Washington

More information

CLSciSumm Shared Task: On the Contribution of Similarity measure and Natural Language Processing Features for Citing Problem

CLSciSumm Shared Task: On the Contribution of Similarity measure and Natural Language Processing Features for Citing Problem CLSciSumm Shared Task: On the Contribution of Similarity measure and Natural Language Processing Features for Citing Problem Elnaz Davoodi *, Kanika Madan *, Jia Gu ( * Equal contribution) Thomson Reuters,

More information

Short Text Similarity with Word Embeddings

Short Text Similarity with Word Embeddings Short Text Similarity with s CS 6501 Advanced Topics in Information Retrieval @UVa Tom Kenter 1, Maarten de Rijke 1 1 University of Amsterdam, Amsterdam, The Netherlands Presented by Jibang Wu Apr 19th,

More information

SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores

SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores Nabin Maharjan, Rajendra Banjade, Nobal B. Niraula, Vasile Rus Department of Computer Science,

More information

Automatic Vector Space Based Document Summarization Using Bigrams

Automatic Vector Space Based Document Summarization Using Bigrams Automatic Vector Space Based Document Summarization Using Bigrams Rajeena Mol M. 1, Sabeeha K. P. 2 P.G. Student, Department of Computer Science and Engineering, M.E.A Engineering College, Kerala, India

More information

A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Bo Pang and Lillian Lee (2004)

A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Bo Pang and Lillian Lee (2004) A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts Bo Pang and Lillian Lee (2004) Document-level Polarity Classification Determining whether an article is

More information

Monitoring Classroom Teaching Relevance Using Speech Recognition Document Similarity

Monitoring Classroom Teaching Relevance Using Speech Recognition Document Similarity Monitoring Classroom Teaching Relevance Using Speech Recognition Document Similarity Raja Mathanky S 1 1 Computer Science Department, PES University Abstract: In any educational institution, it is imperative

More information

STS-UHH at SemEval-2017 Task 1: Scoring Semantic Textual Similarity Using Supervised and Unsupervised Ensemble

STS-UHH at SemEval-2017 Task 1: Scoring Semantic Textual Similarity Using Supervised and Unsupervised Ensemble STS-UHH at SemEval-2017 Task 1: Scoring Semantic Textual Similarity Using Supervised and Unsupervised Ensemble Sarah Kohail LT Group Amr Rekaby Salama NATS Group Chris Biemann LT Group {kohail,salama,biemann}@informatik.uni-hamburg.de

More information

INTRODUCTION TO TEXT MINING

INTRODUCTION TO TEXT MINING INTRODUCTION TO TEXT MINING Jelena Jovanovic Email: jeljov@gmail.com Web: http://jelenajovanovic.net 2 OVERVIEW What is Text Mining (TM)? Why is TM relevant? Why do we study it? Application domains The

More information

Text Summarization of Turkish Texts using Latent Semantic Analysis

Text Summarization of Turkish Texts using Latent Semantic Analysis Text Summarization of Turkish Texts using Latent Semantic Analysis Makbule Gulcin Ozsoy Dept. of Computer Eng. Middle East Tech. Univ. Ankara, Turkey e1395383@ceng.metu.edu.tr Ilyas Cicekli Dept. of Computer

More information

Text Summarization of Turkish Texts using Latent Semantic Analysis

Text Summarization of Turkish Texts using Latent Semantic Analysis Text Summarization of Turkish Texts using Latent Semantic Analysis Makbule Gulcin Ozsoy Dept. of Computer Eng. Middle East Tech. Univ. e1395383@ceng.metu.edu.tr Ilyas Cicekli Dept. of Computer Eng. Bilkent

More information

Malayalam Text summarization Using Vector Space Model

Malayalam Text summarization Using Vector Space Model RESEARCH ARTICLE OPEN ACCESS Malayalam Text summarization Using Vector Space Model Kanitha D K, D. Muhammad Noorul Mubarak 2 & S. A. Shanavas 3 (Computational Linguistics, Department of Linguistics, University

More information

Comparing the value of Latent Semantic Analysis on two English-to-Indonesian lexical mapping tasks

Comparing the value of Latent Semantic Analysis on two English-to-Indonesian lexical mapping tasks Comparing the value of Latent Semantic Analysis on two English-to-Indonesian lexical mapping tasks David Moeljadi Nanyang Technological University October 16, 2014 1 Outline The Authors The Experiments

More information

Deep Learning. Mohammad Ebrahim Khademi Lecture 14: Natural Language Processing

Deep Learning. Mohammad Ebrahim Khademi Lecture 14: Natural Language Processing Deep Learning Mohammad Ebrahim Khademi Lecture 14: Natural Language Processing OUTLINE Introduction to Natural Language Processing Word Vectors SVD Based Methods Iteration Based Methods Word2vec Language

More information

Predicting the Semantic Orientation of Adjective. Vasileios Hatzivassiloglou and Kathleen R. McKeown Presented By Yash Satsangi

Predicting the Semantic Orientation of Adjective. Vasileios Hatzivassiloglou and Kathleen R. McKeown Presented By Yash Satsangi Predicting the Semantic Orientation of Adjective Vasileios Hatzivassiloglou and Kathleen R. McKeown Presented By Yash Satsangi Aim To validate that conjunction put constraints on conjoined adjectives and

More information

CROSS-LINGUAL INFORMATION RETRIEVAL WITH EXPLICIT SEMANTIC ANALYSIS

CROSS-LINGUAL INFORMATION RETRIEVAL WITH EXPLICIT SEMANTIC ANALYSIS 1 CROSS-LINGUAL INFORMATION RETRIEVAL WITH EXPLICIT SEMANTIC ANALYSIS Philipp Sorg and Philipp Cimiano Working Notes of the Annual CLEF Meeting, 2008 Tiago Luís Outline 2 Cross-Language IR Explicit Semantic

More information

Using Machine Learning Methods and Linguistic Features in Single-Document Extractive Summarization

Using Machine Learning Methods and Linguistic Features in Single-Document Extractive Summarization Using Machine Learning Methods and Linguistic Features in Single-Document Extractive Summarization Alexander Dlikman and Mark Last Department of Information Systems Engineering Ben-Gurion University of

More information

Brent Fitzgerald. CS224N Final Project - June 1, 2000

Brent Fitzgerald. CS224N Final Project - June 1, 2000 IMPLEMENTATION OF AN AUTOMATED TEXT SEGMENTATION SYSTEM USING HEARST S TEXTTILING ALGORITHM Brent Fitzgerald brentf@stanford.edu CS224N Final Project - June 1, 2000 ABSTRACT This paper describes the implementation

More information

Ameeta Agrawal Nikolay Yakovets. 01 Dec 2011

Ameeta Agrawal Nikolay Yakovets. 01 Dec 2011 Ameeta Agrawal Nikolay Yakovets 01 Dec 2011 In complex sentences, facts can be presented with varied and complex linguis2c construc2ons. Prime Minister Vladimir V. Pu2n, the country's paramount leader,

More information

Unsupervised Relation Extraction from Web. -Bhavishya Mittal (11198) - Vempati Anurag Sai (Y )

Unsupervised Relation Extraction from Web. -Bhavishya Mittal (11198) - Vempati Anurag Sai (Y ) Unsupervised Relation Extraction from Web -Bhavishya Mittal (11198) - Vempati Anurag Sai (Y9227645) Problem Statement Previous Work Approach Self learning Extractor Probability Query Work Done Work Remaining

More information

Motivation, Methods and Evaluation. Sowmya Vajjala

Motivation, Methods and Evaluation. Sowmya Vajjala Motivation, Methods and Evaluation (with Detmar Meurers) Center for Language Technology University of Gothenburg, Sweden 20 November 2014 1 / 29 What is readability analysis? We want to measure how difficult

More information

The Contribution of FaMAF at 2008.Answer Validation Exercise

The Contribution of FaMAF at 2008.Answer Validation Exercise The Contribution of FaMAF at QA@CLEF 2008.Answer Validation Exercise Julio J. Castillo Faculty of Mathematics Astronomy and Physics National University of Cordoba, Argentina cj@famaf.unc.edu.ar Abstract.

More information

Natural Language Processing SoSe Summarization. (based on the book of Jurafski and Martin 2009)

Natural Language Processing SoSe Summarization. (based on the book of Jurafski and Martin 2009) Natural Language Processing SoSe 2015 Summarization Dr. Mariana Neves July 6th, 2014 (based on the book of Jurafski and Martin 2009) Outline 2 Task Single-document summarization Multi-document summarization

More information

Mining Meaning From Wikipedia

Mining Meaning From Wikipedia Mining Meaning From Wikipedia PD Dr. Günter Neumann LT-lab, DFKI, Saarbrücken Outline 1. Introduction 2. Wikipedia 3. Solving NLP tasks 4. Namend Entity Disambiguation 5. Information Extraction 6. Ontology

More information

CS502: Compilers & Programming Systems

CS502: Compilers & Programming Systems CS502: Compilers & Programming Systems Context Free Grammars Zhiyuan Li Department of Computer Science Purdue University, USA Course Outline Languages which can be represented by regular expressions are

More information

An Information Retrieval-Based Approach to Determining Contextual Opinion Polarity of Words

An Information Retrieval-Based Approach to Determining Contextual Opinion Polarity of Words An Information Retrieval-Based Approach to Determining Contextual Opinion Polarity of Words Olga Vechtomova 1, Kaheer Suleman 2, Jack Thomas 2 1 Department of Management Sciences, University of Waterloo,

More information

Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN:

Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN: Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN: 1137-3601 revista@aepia.org Asociación Española para la Inteligencia Artificial España Rezk, Martín I.; Alonso i Alemany,

More information

A Comparison between Sentiment Analysis of Student Feedback at Sentence Level and at Token Level

A Comparison between Sentiment Analysis of Student Feedback at Sentence Level and at Token Level 482 A Comparison between Sentiment Analysis of Student Feedback at Sentence Level and at Token Level 1 Chandrika Chatterjee, 2 Kunal Chakma 1, 2 Computer Science and Engineering, National Institute of

More information

Computing Semantic Relatedness using Wikipedia Taxonomy by Spreading Activation

Computing Semantic Relatedness using Wikipedia Taxonomy by Spreading Activation Computing Semantic Relatedness using Wikipedia Taxonomy by Spreading Activation May Sabae Han, and Ei Ei Mon Abstract Semantic relatedness means the degree of the nearness of two documents or two terms

More information

Evaluating Translational Correspondence using Annotation Projection

Evaluating Translational Correspondence using Annotation Projection Evaluating Translational Correspondence using Annotation Projection R. Hwa, P. Resnik, A. Weinberg & O. Kolak (2002) Presented by Jeremy G. Kahn Presentation for Ling 580 (Machine Translation) 10 Jan 2006

More information

Combined Cluster Based Ranking for Web Document Using Semantic Similarity

Combined Cluster Based Ranking for Web Document Using Semantic Similarity IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 1, Ver. IV (Jan. 2014), PP 06-11 Combined Cluster Based Ranking for Web Document Using Semantic Similarity

More information

Machine Translation using Deep Learning Methods Max Fomin Michael Zolotov

Machine Translation using Deep Learning Methods Max Fomin Michael Zolotov Machine Translation using Deep Learning Methods Max Fomin Michael Zolotov Sequence to Sequence Learning with Neural Networks Learning Phrase Representations using RNN Encoder Decoder for Statistical Machine

More information

Identifying Implicit Relationships Within Natural-Language Questions. Brandon Marlowe ID:

Identifying Implicit Relationships Within Natural-Language Questions. Brandon Marlowe ID: Identifying Implicit Relationships Within Natural-Language Questions Brandon Marlowe ID: 2693414 What is Watson? Watson is a question answering computer system capable of answering questions posed in natural

More information

CSA4020. Multimedia Systems:

CSA4020. Multimedia Systems: CSA4020 Multimedia Systems: Adaptive Hypermedia Systems Lecture 7: Term Relationships & Grouping Multimedia Systems: Adaptive Hypermedia Systems 1 Problems with Single-Term Indexing Single terms are either

More information

Effectiveness of Indirect Dependency for Automatic Synonym Acquisition

Effectiveness of Indirect Dependency for Automatic Synonym Acquisition Effectiveness of Indirect Dependency for Automatic Synonym Acquisition Masato HAGIWARA, Yasuhiro OGAWA, and Katsuhiko TOYAMA Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan {hagiwara,yasuhiro,toyama}@kl.i.is.nagoya-u.ac.jp

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

A Convolution Kernel for Sentiment Analysis using Word-Embeddings

A Convolution Kernel for Sentiment Analysis using Word-Embeddings A Convolution Kernel for Sentiment Analysis using Word-Embeddings James Thorne Department of Computer Science University of Sheffield jthorne1@sheffield.ac.uk Abstract. Accurate analysis of a sentence

More information

SUMMARY In order to enter an international circuit, a language must reach a certain level of informatization. This means the existence of some

SUMMARY In order to enter an international circuit, a language must reach a certain level of informatization. This means the existence of some SUMMARY In order to enter an international circuit, a language must reach a certain level of informatization. This means the existence of some resources and programs specially made for the respective language

More information

Exploring the Vector Space Model for Finding Verbs Synonyms

Exploring the Vector Space Model for Finding Verbs Synonyms Exploring the for Finding Verbs Synonyms in Portuguese Recent Advances in Natural Language Processing September 14-16, 2009, Borovets, Bulgaria Luís Sarmento Paula Carvalho Eugénio Oliveira September 14,

More information

Feature Creation and Selection

Feature Creation and Selection Feature Creation and Selection INFO-4604, Applied Machine Learning University of Colorado Boulder October 24, 2017 Prof. Michael Paul Features Often the input variables (features) in raw data are not ideal

More information

Extending WordNet using Generalized Automated Relationship Induction

Extending WordNet using Generalized Automated Relationship Induction Extending WordNet using Generalized Automated Relationship Induction Lawrence McAfee lcmcafee@stanford.edu Nuwan I. Senaratna nuwans@cs.stanford.edu Todd Sullivan tsullivn@stanford.edu This paper describes

More information

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad - 500 043 INFORMATION TECHNOLOGY TUTORIAL QUESTION BANK Name INFORMATION RETRIEVAL SYSTEM Code A70533 Class IV B. Tech I Semester

More information

Incremental Input Stream Segmentation for Real-time NLP Applications

Incremental Input Stream Segmentation for Real-time NLP Applications Incremental Input Stream Segmentation for Real-time NLP Applications Mahsa Yarmohammadi Streaming NLP for Big Data Class SBU Computer Science Department 9/29/2016 Outline Introduction Simultaneous speech-to-speech

More information

As Simple As It Gets - A sentence simplifier for different learning levels and contexts

As Simple As It Gets - A sentence simplifier for different learning levels and contexts As Simple As It Gets - A sentence simplifier for different learning levels and contexts Abstract This paper presents a text simplification method that transforms complex sentences into simplified forms.

More information

Traceability Between Business Process and Software Component using Probabilistic Latent Semantic Analysis

Traceability Between Business Process and Software Component using Probabilistic Latent Semantic Analysis Traceability Between Business Process and Software Component using Probabilistic Latent Semantic Analysis Fony Revindasari 1, Riyanarto Sarno 2, Adhatus Solichah 3 Informatics Department, Faculty of Information

More information

Problems in Current Text Simplification Research

Problems in Current Text Simplification Research Problems in Current Text Simplification Research Wei Xu Chris Callison-Burch Courtney Napoles UPenn UPenn JHU TACL paper @ EMNLP Sep-20-2015 What is Text Simplification What is Text Simplification INPUT

More information

ASSOCIATING DOCUMENTS TO CONCEPT MAPS IN CONTEXT

ASSOCIATING DOCUMENTS TO CONCEPT MAPS IN CONTEXT Concept Mapping: Connecting Educators Proc. of the Third Int. Conference on Concept Mapping Tallinn, Estonia & Helsinki, Finland 2008 ASSOCIATING DOCUMENTS TO CONCEPT MAPS IN CONTEXT Alejandro Valerio

More information

Improving Layman Readability of Clinical Narratives with Unsupervised Synonym Replacement

Improving Layman Readability of Clinical Narratives with Unsupervised Synonym Replacement Building Continents of Knowledge in Oceans of Data: The Future of Co-Created ehealth A. Ugon et al. (Eds.) 2018 European Federation for Medical Informatics (EFMI) and IOS Press. This article is published

More information

Learning Matching Models with Weak

Learning Matching Models with Weak Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots Yu Wu SKLSDE, Beihang University wuyu@buaa.edu.cn Wei Wu Microsoft Corporation wuwei@microsoft.com Zhoujun

More information

Discourse: Structure and Coherence Kathy McKeown. Thanks to Dan Jurafsky, Diane Litman, Andy Kehler, Jim Martin

Discourse: Structure and Coherence Kathy McKeown. Thanks to Dan Jurafsky, Diane Litman, Andy Kehler, Jim Martin Discourse: Structure and Coherence Kathy McKeown Thanks to Dan Jurafsky, Diane Litman, Andy Kehler, Jim Martin HW4: For HW3 you experiment with different features (at least 3) and different learning algorithms

More information

39 Managing Information Disparity in Multi-lingual Document Collections

39 Managing Information Disparity in Multi-lingual Document Collections 39 Managing Information Disparity in Multi-lingual Document Collections KEVIN DUH, NTT Communication Science Laboratories CHING-MAN AU YEUNG, NTT Communication Science Laboratories TOMOHARU IWATA, NTT

More information

Integrating Distributional Lexical Contrast into Word Embeddings for Antonym Synonym Distinction

Integrating Distributional Lexical Contrast into Word Embeddings for Antonym Synonym Distinction Integrating Distributional Lexical Contrast into Word Embeddings for Antonym Synonym Distinction Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu Institute for Natural Language Processing (IMS) University

More information

Identifying Similarities and Differences Across English and Arabic News

Identifying Similarities and Differences Across English and Arabic News Identifying Similarities and Differences Across English and Arabic News David Kirk Evans, Kathleen R. McKeown Department of Computer Science Columbia University New York, NY, 10027, USA {devans,kathy}@cs.columbia.edu

More information

KeyPhrase Extraction with Lexical Chains Gönenç Ercan Computer Engineering Dept. Bilkent University, Ankara, Turkey

KeyPhrase Extraction with Lexical Chains Gönenç Ercan Computer Engineering Dept. Bilkent University, Ankara, Turkey KeyPhrase Extraction with Lexical Chains Gönenç Ercan Computer Engineering Dept. Bilkent University, Ankara, Turkey ercangu@cs.bilkent.edu.tr ABSTRACT Keyphrases have various usages, including indexing,

More information

Semi-supervised emotion lexicon expansion with label propagation

Semi-supervised emotion lexicon expansion with label propagation Semi-supervised emotion lexicon expansion with label propagation Mario Giulianelli 1 Daniël de Kok 2 1 University of Amsterdam 2 Seminar für Sprachwissenschaft University of Tübingen CLIN, 2018 1/19 Emotion

More information

Automated Educational Course Metadata Generation Based on Semantics Discovery

Automated Educational Course Metadata Generation Based on Semantics Discovery Automated Educational Course Metadata Generation Based on Semantics Discovery Marián Šimko and Mária Bieliková Institute of Informatics and Software Engineering, Faculty of Informatics and Information

More information

Towards a Vecsigrafo Portable Semantics in Knowledge-based Text Analytics. Ronald Denaux & José Manuel Gómez Pérez HSSUES Oct.

Towards a Vecsigrafo Portable Semantics in Knowledge-based Text Analytics. Ronald Denaux & José Manuel Gómez Pérez HSSUES Oct. Towards a Vecsigrafo Portable Semantics in Knowledge-based Text Analytics Ronald Denaux & José Manuel Gómez Pérez HSSUES Oct. 21st, 2017 The Cognitive Chasm How can humans and AI interact with and understand

More information

Building Document Graphs for Multiple News Articles Summarization: An Event-Based Approach

Building Document Graphs for Multiple News Articles Summarization: An Event-Based Approach Building Document Graphs for Multiple News Articles Summarization: An Event-Based Approach Wei Xu 1, Chunfa Yuan 1, Wenjie Li 2, Mingli Wu 2, and Kam-Fai Wong 3 1 Department of Computer Science and Technology

More information

Lexical Disambiguation

Lexical Disambiguation Lexical Disambiguation The Interaction of Knowledge Sources in Word Sense Disambiguation Will Roberts wroberts@coli.uni-sb.de Wednesday, 4 June, 2008 1/34 Will Roberts Lexical Disambiguation Word Senses

More information

SimpLe: Lexical Simplification using Word Sense Disambiguation

SimpLe: Lexical Simplification using Word Sense Disambiguation SimpLe: Lexical Simplification using Word Sense Disambiguation Nikolay YAKOVETS a,1 a and Ameeta AGRAWAL a Department of Computer Science and Engineering, York University, Canada Abstract. Sentence simplification

More information

Developing Word Sense Disambiguation Corpuses using Word2vec and Wu Palmer for Disambiguation

Developing Word Sense Disambiguation Corpuses using Word2vec and Wu Palmer for Disambiguation Developing Word Sense Disambiguation Corpuses using Word2vec and Wu Palmer for Disambiguation Fadli Husein Wattiheluw Department of Informatics Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia

More information

Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection

Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection Thade Nahnsen, Özlem Uzuner, Boris Katz Computer Science and Artificial Intelligence Laboratory Massachusetts Institute

More information

Final Projects. Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison

Final Projects. Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison Final Projects Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison Alessandro Raganato, José Camacho Collados and Roberto Navigli lcl.uniroma1.it/wsdeval Word Sense Disambiguation

More information

Putting it Simply: a Context-Aware Approach to Lexical Simplification

Putting it Simply: a Context-Aware Approach to Lexical Simplification Putting it Simply: a Context-Aware Approach to Lexical Simplification Or Biran Computer Science Columbia University New York, NY 10027 ob2008@columbia.edu Samuel Brody Noémie Elhadad Communication & Information

More information

Utilizing Contextually Relevant Terms in Bilingual Lexicon Extraction

Utilizing Contextually Relevant Terms in Bilingual Lexicon Extraction Utilizing Contextually Relevant Terms in Bilingual Lexicon Extraction Azniah Ismail Department of Computer Science University of York York YO10 5DD UK azniah@cs.york.ac.uk Suresh Manandhar Department of

More information

LIMSIILES: Basic English Substitution for Student Answer Assessment at SemEval 2013

LIMSIILES: Basic English Substitution for Student Answer Assessment at SemEval 2013 LIMSIILES: Basic English Substitution for Student Answer Assessment at SemEval 2013 Martin Gleize LIMSI-CNRS & ENS B.P. 133 91403 ORSAY CEDEX, France gleize@limsi.fr Brigitte Grau LIMSI-CNRS & ENSIIE B.P.

More information

Multilingual. Language Processing. Applications. Natural

Multilingual. Language Processing. Applications. Natural Multilingual Natural Language Processing Applications Contents Preface xxi Acknowledgments xxv About the Authors xxvii Part I In Theory 1 Chapter 1 Finding the Structure of Words 3 1.1 Words and Their

More information

Easy First Dependency Parsing of Modern Hebrew

Easy First Dependency Parsing of Modern Hebrew Easy First Dependency Parsing of Modern Hebrew Yoav Goldberg and Michael Elhadad Ben Gurion University of the Negev Department of Computer Science POB 653 Be er Sheva, 84105, Israel {yoavg elhadad}@cs.bgu.ac.il

More information

A review of word embedding and document similarity algorithms applied to academic text

A review of word embedding and document similarity algorithms applied to academic text A review of word embedding and document similarity algorithms applied to academic text Computer Science Bachelor s Thesis Author: Jon Ezeiza Alvarez Supervisor: Prof. Dr. Hannah Bast Motivation A consequence

More information
