Structure Discovery and Visualization in Scientific Literature

Size: px
Start display at page:

Download "Structure Discovery and Visualization in Scientific Literature"

Transcription

1 DIPF-Workshop im Lichtenberghaus Chris Biemann, August 2, 2012 Data-driven Methods for Text Analysis Structure Discovery and Visualization in Scientific Literature

2 Outline What standard NLP can do Structure Discovery: unsupervised and knowledge-free methods Word co-occurrence Semantic Similarity Segmentation with Topic Models Visual Analytics of Language Conclusion 2

3 What standard NLP can do Indexing: Find documents containing search terms, ordered by relevance, filtered by meta data (think: Google / Opac) Low-level-processing: Language recognition, Parts-ofspeech tagging, Named Entity recognition, glossing, segment detection.... and where we are going: semantic matching and paraphrase recognition automatic summarization personal assistant 3

4 Structure Discovery: Collection exploration Structure Discovery: Units (words, sentences, documents) are characterized by distinguishing features Similar units are grouped: discovery of structure in the data groups of units for new units/features for further analysis structures can be visualized or used by processing systems Advantages: domain- and language independent no manual creation of lexical resources or training material 4

5 Syntagmatic vs. Paradigmatic Relations Ferdinand de Saussure Syntagmatic Relations: syntactic constraints in the context Paradigmatic Relations: associations, semantic constraints 5

6 Co-occurrence Graphs from Large Corpora Pairs of words that significantly co-occur within a given window significance test: score interesting pairs higher result: top-significant co-occurrences per word Gesamtschule Lerntheorie Local neighborhoods: top significant co-occurrences, and their top significant connections Mix of syntactic and semantic relations Biemann, Chr.; Bordag, S.; Heyer, G.; Quasthoff, U.; Wolff, Chr.: Language independent Methods for Compiling Monolingual Lexi cal Data, Proceedings of CicLING 2004, Seoul, Korea 6

7 Distributional Thesaurus (DT) Computed from distributional similarity statistics Entry for a target word consists of a ranked list of neighbors meeting gathering 56.0 seminar 49.0 meet 46.0 lecture 43.0 conference 42.0 concert 38.0 fair 35.0 exhibition 33.0 demonstration 33.0 reception 33.0 rally 32.0 presentation 30.0 symposium 28.0 screening 27.0 workshop 26.0 dinner 26.0 occasion 25.0 reading 25.0 picnic 25.0 congress PowerPoint Excel Word Access Outlook Flash Microsoft_Excel WordPerfect PostScript SVG RTF Microsoft_Word XML Internet_Explorer DjVu TIFF PDB insight Kuh Hund Kuh First order Second order 2 bunt#attr fliegen#subj Katze#kon die#det Hund

8 Matching with semantic expansions Knowledge-based Word Sense Disambiguation (à la Lesk) A patient fell over a stack of magazines in an aisle at a physiotherapist practice. customer student individual person mother user passenger.. rose dropped climbed increased slipped declined tumbled surged pile copy lots dozens array collection amount ton Zero word overlap field hill line river stairs road hall driveway physician attorney psychiatrist scholar engineer journalist contractor session game camp workouts training meeting work WordNet: S: (n) magazine (product consisting of a paperback periodic publication as a physical object) "tripped over a pile of magazines jumped woke turned drove walked blew put fell.. stack tons piece heap collection bag loads mountain.. Overlap = 2 Overlap = 1 Overlap = 2 8

9 Text Mr. Pohs, previously executive vice president and chief operating officer, was named interim president and chief executive officer after David M. Harrold, a company founder, resigned from the posts for personal reasons in August. Cellular said Robert J. Lunday Jr., its chairman and another founder, resigned from the company s board to pursue the sale of his telephone Intuition: company, Big Sandy Telecommunications Inc. Apartheid foes staged a massive antigovernment rally in South Africa. More than 70,000 people filled a soccer stadium on the outskirts of the black township of Soweto and between welcomed segments freed leaders of the outlawed African National Congress. It was considered South Africa s largest opposition rally. Cohesion within segments is higher than cohesion 9

10 Text Segmentation using Topic Models Mr.:62 Pohs:2,:2 previously:4 executive:2 vice:2 president:2 and:17 chief:2 Mr.:62 operating:2 Pohs:2 officer:2,:2 previously:4,:72 was:2 executive:2 named:2 interim:2 vice:2 president:2 and:17 and:73 chief:2 operating:2 executive:2 officer:2,:72 after:17 was:2 David:2 named:2 M:27 interim:2.:36 Harrold:65 president:2,:2 and:73 a:84 company:2 chief:2 executive:2 founder:2,:26 officer:2 resigned:2 after:17 from:91 David:2 the:34 M:27 posts:2.:36 Harrold:65 for:62 personal:61,:2 a:84 company:2 reasons:2 founder:2 in:84 August:2,:26 resigned:2.:58 Cellular:70 from:91 said:54 the:34 Robert:2 posts:2 J:61 for:62.:42 personal:61 Lunday:2 Jr:18 reasons:2.:31,:44 in:84 its:57 August:2 chairman:2.:58 and:73 Cellular:70 another:25 said:54 founder:2 Robert:2,:31 J:61 resigned:2.:42 Lunday:2 from:91 Jr:18 the:57.:31,:44 its:57 company:2 chairman:2 s:24 board:2 and:73 to:10 another:25 pursue:2 founder:2 the:10,:31 sale:55 resigned:2 of:67 his:28 from:91 telephone:31 the:57 company:2 company:42 s:24,:74 board:2 Big:10 Sandy:50 to:10 pursue:2 Telecommunications:31 the:10 sale:55 of:67 Inc:2 his:28.:74 telephone:31 company:42,:74 Big:10 Sandy:50 Telecommunications:31 Inc:2.:74 Apartheid:37 foes:37 staged:41 a:37 massive:37 antigovernment:37 rally:37 in:40 South:37 Apartheid:37 Africa:37 foes:37.:19 staged:41 More:29 a:37 than:34 massive:37 70:45,:26 antigovernment: people:37 filled:17 rally:37 a:22 in:40 soccer:37 South:37 Africa:37 stadium:88.:19 on:46 More:29 the:34 than:34 outskirts:37 70:45,:26 of: the:24 people:37 black:37 filled:17 township:37 a:22 of:45 soccer:37 Soweto:37 stadium:88 and:37 on:46 welcomed:11 the:34 outskirts:37 freed:37 leaders:37 of:93 the:24 of:98 black:37 the:57 township:37 outlawed:37 of:45 Soweto:37 African:37 and:37 National:45 welcomed:11 Congress:87 freed:37 leaders:37.:72 It:79 of:98 was:55 the:57 considered:37 South:37 outlawed:37 Africa:37 African:37 s:33 National:45 largest:90 opposition:67 Congress:87.:72 rally:37 It:79.:37 was:55 considered:37 South:37 Africa:37 s:33 largest:90 opposition:67 rally:37.:37 Riedl M., Biemann C. (2012): TopicTiling: A Text Segmentation Algorithm based on LDA, Proc. of the Student Research Workshop of the 50th ACL, Jeju, Republic of Korea 10

11 Visual Analytics using NLP NLP: getting linguistic annotations right Visual Analytics: present data in an interesting way. The interpretation lies in the eye of the beholder NLP + Visual Analytics can yield interesting tools for literature research and document collection understanding 11

12 Term Maps and Concept Trails Background Map: significant terms and their co-occurrences Red/Yellow Trail: Sequence of terms in incoming document Georgien Afghanistan Irak Quickly maps a new document in a background map e.g. visualizes how a new document matches current 12 body of references Martin Riedl and Chris Biemann, TU Biemann, Darmstadt C., Böhm, K., Heyer, G., Melz, R. (2004): SemanticTalk: Software for Visualizing Brainstorming Sessions and Thematic Concept Trails on Document Collections, Proceedings of ECML/PKDD 2004, Pisa, Italy 12

13 Time lines: Frequency over time slices Eiken, U.C., Liseth, A.T., Richter, M., Witschel, F. and Biemann, C. (2006): Ord i Dag: Mining Norwegian Daily Newswire. Proc. FinTAL, Turku, Finland Quasthoff, U. (2007): Deutsches Neologismenwörterbuch. Neue Wörter und Wortbedeutungen in der Gegenwartssprache. Berlin, De Gruyter 13

14 Conclusions Language Technology can solve many basic preprocessing tasks Structure Discovery can be used to unveil specific phenomena and relations of language units methodology is independent of domain or language resulting structure is domain-specific and adopts to changes Visual Analytics of language material visual aid for locating an incoming document in a background map time series analysis (many more possible, ask Daniela Oelke) Statistics over text is a powerful tool to support literature research Language technology cannot replace human researchers, editors and authors. But it can make their job easier! 14

15 Q&A 15

16 Positional Co-occurrences: sagte vs. meinte Also store the distance between words in the sentence captures parts of syntactic structure similar terms have similar contexts 16

17 Clustering of DT entries: Sense Induction paper#nn bright#jj 17

18 Cooc- PEDOCS 18

Applications of memory-based natural language processing

Applications of memory-based natural language processing Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

Distant Supervised Relation Extraction with Wikipedia and Freebase

Distant Supervised Relation Extraction with Wikipedia and Freebase Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational

More information

Matching Similarity for Keyword-Based Clustering

Matching Similarity for Keyword-Based Clustering Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

TextGraphs: Graph-based algorithms for Natural Language Processing

TextGraphs: Graph-based algorithms for Natural Language Processing HLT-NAACL 06 TextGraphs: Graph-based algorithms for Natural Language Processing Proceedings of the Workshop Production and Manufacturing by Omnipress Inc. 2600 Anderson Street Madison, WI 53704 c 2006

More information

On document relevance and lexical cohesion between query terms

On document relevance and lexical cohesion between query terms Information Processing and Management 42 (2006) 1230 1247 www.elsevier.com/locate/infoproman On document relevance and lexical cohesion between query terms Olga Vechtomova a, *, Murat Karamuftuoglu b,

More information

Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard

Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.

More information

Word Sense Disambiguation

Word Sense Disambiguation Word Sense Disambiguation D. De Cao R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 May 21, 2009 Excerpt of the R. Mihalcea and T. Pedersen AAAI 2005 Tutorial, at: http://www.d.umn.edu/ tpederse/tutorials/advances-in-wsd-aaai-2005.ppt

More information

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

More information

2.1 The Theory of Semantic Fields

2.1 The Theory of Semantic Fields 2 Semantic Domains In this chapter we define the concept of Semantic Domain, recently introduced in Computational Linguistics [56] and successfully exploited in NLP [29]. This notion is inspired by the

More information

Leader 1: Dr. Angela K. Lewis Leader 2: Dr. Tondra Loder-Jackson Professor of Political Science Associate Professor of Education dralewis@uab.edu tloder@uab.edu 205.934.8416 205.934.8304 Course Description

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

The taming of the data:

The taming of the data: The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data

More information

Using Semantic Relations to Refine Coreference Decisions

Using Semantic Relations to Refine Coreference Decisions Using Semantic Relations to Refine Coreference Decisions Heng Ji David Westbrook Ralph Grishman Department of Computer Science New York University New York, NY, 10003, USA hengji@cs.nyu.edu westbroo@cs.nyu.edu

More information

Postprint.

Postprint. http://www.diva-portal.org Postprint This is the accepted version of a paper presented at CLEF 2013 Conference and Labs of the Evaluation Forum Information Access Evaluation meets Multilinguality, Multimodality,

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

The stages of event extraction

The stages of event extraction The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks

More information

Leveraging Sentiment to Compute Word Similarity

Leveraging Sentiment to Compute Word Similarity Leveraging Sentiment to Compute Word Similarity Balamurali A.R., Subhabrata Mukherjee, Akshat Malu and Pushpak Bhattacharyya Dept. of Computer Science and Engineering, IIT Bombay 6th International Global

More information

Welcome to. ECML/PKDD 2004 Community meeting

Welcome to. ECML/PKDD 2004 Community meeting Welcome to ECML/PKDD 2004 Community meeting A brief report from the program chairs Jean-Francois Boulicaut, INSA-Lyon, France Floriana Esposito, University of Bari, Italy Fosca Giannotti, ISTI-CNR, Pisa,

More information

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,

More information

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,

More information

Universiteit Leiden ICT in Business

Universiteit Leiden ICT in Business Universiteit Leiden ICT in Business Ranking of Multi-Word Terms Name: Ricardo R.M. Blikman Student-no: s1184164 Internal report number: 2012-11 Date: 07/03/2013 1st supervisor: Prof. Dr. J.N. Kok 2nd supervisor:

More information

The Smart/Empire TIPSTER IR System

The Smart/Empire TIPSTER IR System The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of

More information

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

More information

MARYLAND BLACK BUSINESS SUMMIT & EXPO March 24-27, 2011 presented by AATC * Black Dollar Exchange * BBH Tours

MARYLAND BLACK BUSINESS SUMMIT & EXPO March 24-27, 2011 presented by AATC * Black Dollar Exchange * BBH Tours Baltimore, MD. February 23, 2011 Lou Fields, President of AATC and founder of the Black Dollar Exchange announced the First Annual Maryland Black Business Summit & Expo being held in the City of Baltimore

More information

OFFICE OF ENROLLMENT MANAGEMENT. Annual Report

OFFICE OF ENROLLMENT MANAGEMENT. Annual Report 2014-2015 OFFICE OF ENROLLMENT MANAGEMENT Annual Report Table of Contents 2014 2015 MESSAGE FROM THE VICE PROVOST A YEAR OF RECORDS 3 Undergraduate Enrollment 6 First-Year Students MOVING FORWARD THROUGH

More information

Operational Knowledge Management: a way to manage competence

Operational Knowledge Management: a way to manage competence Operational Knowledge Management: a way to manage competence Giulio Valente Dipartimento di Informatica Universita di Torino Torino (ITALY) e-mail: valenteg@di.unito.it Alessandro Rigallo Telecom Italia

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

Constructing Parallel Corpus from Movie Subtitles

Constructing Parallel Corpus from Movie Subtitles Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

Level 1 Mathematics and Statistics, 2015

Level 1 Mathematics and Statistics, 2015 91037 910370 1SUPERVISOR S Level 1 Mathematics and Statistics, 2015 91037 Demonstrate understanding of chance and data 9.30 a.m. Monday 9 November 2015 Credits: Four Achievement Achievement with Merit

More information

Learning From the Past with Experiment Databases

Learning From the Past with Experiment Databases Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University

More information

UNIT IX. Don t Tell. Are there some things that grown-ups don t let you do? Read about what this child feels.

UNIT IX. Don t Tell. Are there some things that grown-ups don t let you do? Read about what this child feels. UNIT IX Are there some things that grown-ups don t let you do? Read about what this child feels. There are lots of things They won t let me do- I'm not big enough yet, They say. So I patiently wait Till

More information

Literature and the Language Arts Experiencing Literature

Literature and the Language Arts Experiencing Literature Correlation of Literature and the Language Arts Experiencing Literature Grade 9 2 nd edition to the Nebraska Reading/Writing Standards EMC/Paradigm Publishing 875 Montreal Way St. Paul, Minnesota 55102

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za

More information

Outreach Connect User Manual

Outreach Connect User Manual Outreach Connect A Product of CAA Software, Inc. Outreach Connect User Manual Church Growth Strategies Through Sunday School, Care Groups, & Outreach Involving Members, Guests, & Prospects PREPARED FOR:

More information

CS 446: Machine Learning

CS 446: Machine Learning CS 446: Machine Learning Introduction to LBJava: a Learning Based Programming Language Writing classifiers Christos Christodoulopoulos Parisa Kordjamshidi Motivation 2 Motivation You still have not learnt

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

Objective: Model division as the unknown factor in multiplication using arrays and tape diagrams. (8 minutes) (3 minutes)

Objective: Model division as the unknown factor in multiplication using arrays and tape diagrams. (8 minutes) (3 minutes) Lesson 11 3 1 Lesson 11 Objective: Model division as the unknown factor in multiplication using arrays Suggested Lesson Structure Fluency Practice Application Problem Concept Development Student Debrief

More information

UVA Office of University Building Official. Annual Report

UVA Office of University Building Official. Annual Report UVA Office of University Building Official Annual Report 2009-2010 Introduction The University of Virginia Office of University Building Official (OUBO) is charged with the administration of the Virginia

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

Executive Summary. Gautier High School

Executive Summary. Gautier High School Pascagoula School District Mr. Boyd West, Principal 4307 Gautier-Vancleave Road Gautier, MS 39553-4800 Document Generated On January 16, 2013 TABLE OF CONTENTS Introduction 1 Description of the School

More information

BYLINE [Heng Ji, Computer Science Department, New York University,

BYLINE [Heng Ji, Computer Science Department, New York University, INFORMATION EXTRACTION BYLINE [Heng Ji, Computer Science Department, New York University, hengji@cs.nyu.edu] SYNONYMS NONE DEFINITION Information Extraction (IE) is a task of extracting pre-specified types

More information

The MEANING Multilingual Central Repository

The MEANING Multilingual Central Repository The MEANING Multilingual Central Repository J. Atserias, L. Villarejo, G. Rigau, E. Agirre, J. Carroll, B. Magnini, P. Vossen January 27, 2004 http://www.lsi.upc.es/ nlp/meaning Jordi Atserias TALP Index

More information

Prentice Hall Literature: Timeless Voices, Timeless Themes Gold 2000 Correlated to Nebraska Reading/Writing Standards, (Grade 9)

Prentice Hall Literature: Timeless Voices, Timeless Themes Gold 2000 Correlated to Nebraska Reading/Writing Standards, (Grade 9) Nebraska Reading/Writing Standards, (Grade 9) 12.1 Reading The standards for grade 1 presume that basic skills in reading have been taught before grade 4 and that students are independent readers. For

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de

More information

*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN

*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN From: AAAI Technical Report WS-98-08. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Recommender Systems: A GroupLens Perspective Joseph A. Konstan *t, John Riedl *t, AI Borchers,

More information

Enhancing Customer Service through Learning Technology

Enhancing Customer Service through Learning Technology C a s e S t u d y Enhancing Customer Service through Learning Technology John Hancock Implements an online learning solution which integrates training, performance support, and assessment Chris Howard

More information

Lesson M4. page 1 of 2

Lesson M4. page 1 of 2 Lesson M4 page 1 of 2 Miniature Gulf Coast Project Math TEKS Objectives 111.22 6b.1 (A) apply mathematics to problems arising in everyday life, society, and the workplace; 6b.1 (C) select tools, including

More information

Introduction to Yearbook / Newspaper Course Syllabus

Introduction to Yearbook / Newspaper Course Syllabus Introduction to Yearbook / Newspaper Course Highland East Junior High School 2017-18 Teacher: Mr. Gibson Classroom: 305 Hour: 4th Hour Email: briangibson@mooreschools.com Phone: 735-4580 Website resources:

More information

Reading Horizons. A Look At Linguistic Readers. Nicholas P. Criscuolo APRIL Volume 10, Issue Article 5

Reading Horizons. A Look At Linguistic Readers. Nicholas P. Criscuolo APRIL Volume 10, Issue Article 5 Reading Horizons Volume 10, Issue 3 1970 Article 5 APRIL 1970 A Look At Linguistic Readers Nicholas P. Criscuolo New Haven, Connecticut Public Schools Copyright c 1970 by the authors. Reading Horizons

More information

Measures of the Location of the Data

Measures of the Location of the Data OpenStax-CNX module m46930 1 Measures of the Location of the Data OpenStax College This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 The common measures

More information

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

How To Design A Training Course By Peter Taylor

How To Design A Training Course By Peter Taylor How To Design A Training Course By Peter Taylor If you are looking for a ebook by Peter Taylor How to Design a Training Course in pdf format, then you've come to right website. We presented the full edition

More information

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY

More information

Ontologies vs. classification systems

Ontologies vs. classification systems Ontologies vs. classification systems Bodil Nistrup Madsen Copenhagen Business School Copenhagen, Denmark bnm.isv@cbs.dk Hanne Erdman Thomsen Copenhagen Business School Copenhagen, Denmark het.isv@cbs.dk

More information

DKPro WSD A Generalized UIMA-based Framework for Word Sense Disambiguation

DKPro WSD A Generalized UIMA-based Framework for Word Sense Disambiguation DKPro WSD A Generalized UIMA-based Framework for Word Sense Disambiguation Tristan Miller 1 Nicolai Erbs 1 Hans-Peter Zorn 1 Torsten Zesch 1,2 Iryna Gurevych 1,2 (1) Ubiquitous Knowledge Processing Lab

More information

An Introduction to Simio for Beginners

An Introduction to Simio for Beginners An Introduction to Simio for Beginners C. Dennis Pegden, Ph.D. This white paper is intended to introduce Simio to a user new to simulation. It is intended for the manufacturing engineer, hospital quality

More information

A Bayesian Learning Approach to Concept-Based Document Classification

A Bayesian Learning Approach to Concept-Based Document Classification Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors

More information

Constraining X-Bar: Theta Theory

Constraining X-Bar: Theta Theory Constraining X-Bar: Theta Theory Carnie, 2013, chapter 8 Kofi K. Saah 1 Learning objectives Distinguish between thematic relation and theta role. Identify the thematic relations agent, theme, goal, source,

More information

Executive Summary. Sidney Lanier Senior High School

Executive Summary. Sidney Lanier Senior High School Montgomery County Board of Education Dr. Antonio Williams, Principal 1756 South Court Street Montgomery, AL 36104 Document Generated On October 7, 2015 TABLE OF CONTENTS Introduction 1 Description of the

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Annotation Projection for Discourse Connectives

Annotation Projection for Discourse Connectives SFB 833 / Univ. Tübingen Penn Discourse Treebank Workshop Annotation projection Basic idea: Given a bitext E/F and annotation for F, how would the annotation look for E? Examples: Word Sense Disambiguation

More information

U VA THE CHANGING FACE OF UVA STUDENTS: SSESSMENT. About The Study

U VA THE CHANGING FACE OF UVA STUDENTS: SSESSMENT. About The Study About The Study U VA SSESSMENT In 6, the University of Virginia Office of Institutional Assessment and Studies undertook a study to describe how first-year students have changed over the past four decades.

More information

university of wisconsin MILWAUKEE Master Plan Report

university of wisconsin MILWAUKEE Master Plan Report university of wisconsin MILWAUKEE Master Plan Report 2010 introduction CUNNINGHAM 18 INTRODUCTION EMS CHEMISTRY LAPHAM 19 INTRODCUCTION introduction The University of Wisconsin-Milwaukee (UWM) is continually

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

Chapter 4: Valence & Agreement CSLI Publications

Chapter 4: Valence & Agreement CSLI Publications Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).

More information

InTraServ. Dissemination Plan INFORMATION SOCIETY TECHNOLOGIES (IST) PROGRAMME. Intelligent Training Service for Management Training in SMEs

InTraServ. Dissemination Plan INFORMATION SOCIETY TECHNOLOGIES (IST) PROGRAMME. Intelligent Training Service for Management Training in SMEs INFORMATION SOCIETY TECHNOLOGIES (IST) PROGRAMME InTraServ Intelligent Training Service for Management Training in SMEs Deliverable DL 9 Dissemination Plan Prepared for the European Commission under Contract

More information

Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities

Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities Soto Montalvo GAVAB Group URJC Raquel Martínez NLP&IR Group UNED Arantza Casillas Dpt. EE UPV-EHU Víctor Fresno GAVAB

More information

Interactive Whiteboard

Interactive Whiteboard 50 Graphic Organizers for the Interactive Whiteboard Whiteboard-ready graphic organizers for reading, writing, math, and more to make learning engaging and interactive by Jennifer Jacobson & Dottie Raymer

More information

CNS 18 21th Communications and Networking Simulation Symposium

CNS 18 21th Communications and Networking Simulation Symposium CNS 18 21th Communications and Networking Simulation Symposium Spring Simulation Multi-conference 2018 Organizing Committee AAA General Chair: Dr. Abdolreza Abhari, aabhari@ryerson.ca Ryerson University,

More information

Name: Class: Date: ID: A

Name: Class: Date: ID: A Name: Class: _ Date: _ Test Review Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Members of a high school club sold hamburgers at a baseball game to

More information

Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio

Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds

More information

Top US Tech Talent for the Top China Tech Company

Top US Tech Talent for the Top China Tech Company THE FALL 2017 US RECRUITING TOUR Top US Tech Talent for the Top China Tech Company INTERVIEWS IN 7 CITIES Tour Schedule CITY Boston, MA New York, NY Pittsburgh, PA Urbana-Champaign, IL Ann Arbor, MI Los

More information

The Talent Development High School Model Context, Components, and Initial Impacts on Ninth-Grade Students Engagement and Performance

The Talent Development High School Model Context, Components, and Initial Impacts on Ninth-Grade Students Engagement and Performance The Talent Development High School Model Context, Components, and Initial Impacts on Ninth-Grade Students Engagement and Performance James J. Kemple, Corinne M. Herlihy Executive Summary June 2004 In many

More information

Close Up. washington, Dc High School Programs

Close Up. washington, Dc High School Programs Close Up washington, Dc High School Programs Washington Close Up offers the most comprehensive educational opportunity in Washington, DC. Established in 1971, Close Up is the nation s leading nonprofit,

More information

Characterizing Mathematical Digital Literacy: A Preliminary Investigation. Todd Abel Appalachian State University

Characterizing Mathematical Digital Literacy: A Preliminary Investigation. Todd Abel Appalachian State University Characterizing Mathematical Digital Literacy: A Preliminary Investigation Todd Abel Appalachian State University Jeremy Brazas, Darryl Chamberlain Jr., Aubrey Kemp Georgia State University This preliminary

More information

Fluency YES. an important idea! F.009 Phrases. Objective The student will gain speed and accuracy in reading phrases.

Fluency YES. an important idea! F.009 Phrases. Objective The student will gain speed and accuracy in reading phrases. F.009 Phrases Objective The student will gain speed and accuracy in reading phrases. Materials YES and NO header cards (Activity Master F.001.AM1) Phrase cards (Activity Master F.009.AM1a - F.009.AM1f)

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &,

! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &, ! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &, 4 The Interaction of Knowledge Sources in Word Sense Disambiguation Mark Stevenson Yorick Wilks University of Shef eld University of Shef eld Word sense

More information

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com

More information

Learning Methods for Fuzzy Systems

Learning Methods for Fuzzy Systems Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8

More information

A Topic Maps-based ontology IR system versus Clustering-based IR System: A Comparative Study in Security Domain

A Topic Maps-based ontology IR system versus Clustering-based IR System: A Comparative Study in Security Domain A Topic Maps-based ontology IR system versus Clustering-based IR System: A Comparative Study in Security Domain Myongho Yi 1 and Sam Gyun Oh 2* 1 School of Library and Information Studies, Texas Woman

More information

Developing Grammar in Context

Developing Grammar in Context Developing Grammar in Context intermediate with answers Mark Nettle and Diana Hopkins PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE The Pitt Building, Trumpington Street, Cambridge, United

More information

An Interactive Intelligent Language Tutor Over The Internet

An Interactive Intelligent Language Tutor Over The Internet An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This

More information

14 N Leo News. Information for all Leos. District 14N Leo Clubs

14 N Leo News. Information for all Leos. District 14N Leo Clubs May 21, 2014 Volume 2, Issue 3 14 N Leo News Information for all Leos District 14N Leo Clubs -Apollo Ridge Senior -Apollo Ridge Middle -Beaver Area -Beaver Falls Area -Blackhawk Members from at least three

More information

Dear campus colleagues, Thank you for choosing to present the CME Bulletin Board in a Bag : Native American History Month in your area this November!

Dear campus colleagues, Thank you for choosing to present the CME Bulletin Board in a Bag : Native American History Month in your area this November! Dear campus colleagues, Thank you for choosing to present the CME Bulletin Board in a Bag : Native American History Month in your area this November! In this packet, and any attached documents, you will

More information

The role of the first language in foreign language learning. Paul Nation. The role of the first language in foreign language learning

The role of the first language in foreign language learning. Paul Nation. The role of the first language in foreign language learning 1 Article Title The role of the first language in foreign language learning Author Paul Nation Bio: Paul Nation teaches in the School of Linguistics and Applied Language Studies at Victoria University

More information

Task Tolerance of MT Output in Integrated Text Processes

Task Tolerance of MT Output in Integrated Text Processes Task Tolerance of MT Output in Integrated Text Processes John S. White, Jennifer B. Doyon, and Susan W. Talbott Litton PRC 1500 PRC Drive McLean, VA 22102, USA {white_john, doyon jennifer, talbott_susan}@prc.com

More information

Can Human Verb Associations help identify Salient Features for Semantic Verb Classification?

Can Human Verb Associations help identify Salient Features for Semantic Verb Classification? Can Human Verb Associations help identify Salient Features for Semantic Verb Classification? Sabine Schulte im Walde Institut für Maschinelle Sprachverarbeitung Universität Stuttgart Seminar für Sprachwissenschaft,

More information

AUTHORING E-LEARNING CONTENT TRENDS AND SOLUTIONS

AUTHORING E-LEARNING CONTENT TRENDS AND SOLUTIONS AUTHORING E-LEARNING CONTENT TRENDS AND SOLUTIONS Danail Dochev 1, Radoslav Pavlov 2 1 Institute of Information Technologies Bulgarian Academy of Sciences Bulgaria, Sofia 1113, Acad. Bonchev str., Bl.

More information

Innovative Methods for Teaching Engineering Courses

Innovative Methods for Teaching Engineering Courses Innovative Methods for Teaching Engineering Courses KR Chowdhary Former Professor & Head Department of Computer Science and Engineering MBM Engineering College, Jodhpur Present: Director, JIETSETG Email:

More information

Short Text Understanding Through Lexical-Semantic Analysis

Short Text Understanding Through Lexical-Semantic Analysis Short Text Understanding Through Lexical-Semantic Analysis Wen Hua #1, Zhongyuan Wang 2, Haixun Wang 3, Kai Zheng #4, Xiaofang Zhou #5 School of Information, Renmin University of China, Beijing, China

More information

Using the CU*BASE Member Survey

Using the CU*BASE Member Survey Using the CU*BASE Member Survey INTRODUCTION Now more than ever, credit unions are realizing that being the primary financial institution not only for an individual but for an entire family may be the

More information

Adjusting a semantic taxonomy and annotation tool for historical corpora

Adjusting a semantic taxonomy and annotation tool for historical corpora Adjusting a semantic taxonomy and annotation tool for historical corpora Dr Paul Rayson @perayson Director of UCREL research centre, School of Computing and Communications, Lancaster, UK Joint work with

More information