Annotated datasets for NER

Annotated datasets for NER
TOPIC: Training data for Named Entity Recognition
- Give a brief overview of available annotated datasets for NER, i.e. the data we need to train models with full supervision
- Do you think this is enough data to train good supervised models? Give us some results that support your answer
- What about using unsupervised learning?
Nadeau and Sekine, A survey of named entity recognition and classification, Lingvisticae Investigationes 30, 2007, pp. 3-26.
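Annotated NER data is often distributed with one token per line and a BIO tag (B- opens an entity, I- continues it, O is outside any entity). The following is a minimal sketch of turning such annotations into entity spans. The two-column layout, the sample sentence, and the function names are assumptions for illustration and are not taken from a specific corpus.

```python
def read_two_column(text):
    """Read 'token tag' lines (assumed layout) into parallel lists."""
    tokens, tags = [], []
    for line in text.splitlines():
        if line.strip():
            token, tag = line.split()
            tokens.append(token)
            tags.append(tag)
    return tokens, tags


def bio_to_spans(tokens, tags):
    """Convert parallel token/BIO-tag lists into (entity_text, type) pairs."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # A stray I- tag with a different type is treated like B-.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O": close any open entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical sample in the assumed two-column layout.
SAMPLE = """\
Marie B-PER
Curie I-PER
visited O
Paris B-LOC
"""

tokens, tags = read_two_column(SAMPLE)
entities = bio_to_spans(tokens, tags)
print(entities)
```

Counting the spans per entity type over a whole training set gives a quick first answer to the question of how much supervision is actually available.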

Annotated data for Medical NER
TOPIC: Named Entities in the CLEF-eHEALTH challenge
- Give an overview of the CLEF-eHEALTH challenge
- Talk about NER in this challenge (Task 1)
- Present the training data provided for medical NER
- Which set of classes is annotated?
- How can you use this data to train a classifier (e.g. a linear model)?
https://sites.google.com/site/clefehealth2016/
https://sites.google.com/site/clefehealth2015/

Supervised NER
TOPIC: Linear models for Named Entity Recognition
- Get a training set for a NER task (e.g. CLEF e-health)
- Model the problem as a multi-class classification task
- Consider the following methods:
  - (non-sequential) Linear models
  - Linear-chain conditional random fields
- Which one do you think will work better, and why?
https://sites.google.com/site/clefehealth2016/
Nadeau and Sekine, A survey of named entity recognition and classification, Lingvisticae Investigationes 30, 2007, pp. 3-26.
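To make the non-sequential option concrete, here is a minimal sketch of NER as token-level multi-class classification with a perceptron, one simple kind of linear model. The label set (`O`, `DISO`), the feature templates, and the toy sentences are hypothetical and chosen only for illustration. Each token is classified independently, which is exactly what a linear-chain CRF would change by also scoring transitions between neighbouring labels.

```python
from collections import defaultdict

CLASSES = ["O", "DISO"]  # hypothetical label set: outside / disorder mention


def token_features(tokens, i):
    """Sparse features for token i: bias, word identity, previous word."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return ["bias", "w=" + tokens[i].lower(), "prev=" + prev]


def score(weights, feats, label):
    """Linear score: sum of the weights of the active features for a label."""
    return sum(weights[label][f] for f in feats)


def predict(weights, feats):
    """Pick the highest-scoring label (ties go to the first class listed)."""
    return max(CLASSES, key=lambda label: score(weights, feats, label))


def train(sentences, epochs=5):
    """Multi-class perceptron: on a mistake, reward gold and penalise guess."""
    weights = {label: defaultdict(float) for label in CLASSES}
    for _ in range(epochs):
        for tokens, tags in sentences:
            for i, gold in enumerate(tags):
                feats = token_features(tokens, i)
                guess = predict(weights, feats)
                if guess != gold:
                    for f in feats:
                        weights[gold][f] += 1.0
                        weights[guess][f] -= 1.0
    return weights


# Toy training data (hypothetical, not from CLEF eHealth).
TRAIN = [
    (["Patient", "has", "diabetes", "."], ["O", "O", "DISO", "O"]),
    (["No", "fever", "today", "."], ["O", "DISO", "O", "O"]),
]

weights = train(TRAIN)
test_tokens = ["Has", "fever", "."]
predicted = [
    predict(weights, token_features(test_tokens, i))
    for i in range(len(test_tokens))
]
print(list(zip(test_tokens, predicted)))
```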

Supervised NER
TOPIC: Neural Networks for Named Entity Recognition
- What are the advantages of neural networks over linear models?
- What do the non-linear activations do?
- Present a neural network for the NER task
- Should we use neural networks instead of linear models for NER? Give us some results that support your answer
Collobert et al., Natural Language Processing (Almost) from Scratch, Journal of Machine Learning Research, 2011, pp. 2493-2537.
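One way to see what the non-linear activations contribute is a small example, sketched here with hand-set weights (an illustrative assumption, not a trained model and not the architecture of Collobert et al.). With a ReLU between the two layers the network computes XOR, which no linear model can represent. With the activation removed, the same weights collapse into a single linear function of the inputs.

```python
def relu(z):
    """Rectified linear unit: the non-linearity."""
    return max(0.0, z)


def identity(z):
    """No activation: stacked layers stay linear."""
    return z


def two_layer(x1, x2, activation):
    """Two hidden units followed by a linear output (hand-set weights)."""
    h1 = activation(x1 + x2)
    h2 = activation(x1 + x2 - 1.0)
    return h1 - 2.0 * h2


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

with_relu = [two_layer(a, b, relu) for a, b in INPUTS]
without_activation = [two_layer(a, b, identity) for a, b in INPUTS]

print(with_relu)           # XOR pattern
print(without_activation)  # equals 2 - (x1 + x2), a linear function
```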

Supervised NER
TOPIC: Weakly Supervised Named Entity Recognition
- Starting from a few examples ("seed examples"), how do you automatically build a named entity classifier? This is sometimes referred to as "bootstrapping"
- What are the problems with this approach? How do you block the process from generalizing too much?
- Should we use weak supervision instead of (full) supervision for NER? Give us some results that support your answer
Nadeau and Sekine, A survey of named entity recognition and classification, Lingvisticae Investigationes 30, 2007, pp. 3-26.
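The bootstrapping loop can be sketched as follows. This is a simplified illustration, assuming single-token entities and context patterns made of the neighbouring left and right word. The toy corpus, the function names, and the `min_support` parameter are hypothetical. Requiring a pattern to be supported by several known entities is one simple way to keep the process from generalizing too much.

```python
from collections import defaultdict


def contexts(corpus):
    """Yield (left, word, right) for every token that has both neighbours."""
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(1, len(tokens) - 1):
            yield tokens[i - 1], tokens[i], tokens[i + 1]


def bootstrap(corpus, seeds, min_support=2, rounds=3):
    """Alternate between learning context patterns and harvesting entities."""
    entities = set(seeds)
    for _ in range(rounds):
        # 1. Collect the contexts in which known entities occur.
        support = defaultdict(set)
        for left, word, right in contexts(corpus):
            if word in entities:
                support[(left, right)].add(word)
        # 2. Keep only patterns seen with enough distinct entities.
        patterns = {p for p, ents in support.items() if len(ents) >= min_support}
        # 3. Any word in an accepted pattern becomes a new entity.
        found = {w for l, w, r in contexts(corpus) if (l, r) in patterns}
        new = found - entities
        if not new:
            break
        entities |= new
    return entities


# Toy corpus (hypothetical).
CORPUS = [
    "she flew to Paris yesterday",
    "he flew to London yesterday",
    "they flew to Berlin yesterday",
    "it rained in Paris today",
    "he stayed in bed today",
]

strict = bootstrap(CORPUS, {"Paris", "London"}, min_support=2)
loose = bootstrap(CORPUS, {"Paris", "London"}, min_support=1)
print(sorted(strict))
print(sorted(loose))
```

With `min_support=2` only the pattern shared by both seeds is trusted and the loop adds "Berlin". With `min_support=1` the weak pattern around "in ... today" is also accepted and the non-entity "bed" drifts into the result.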

NER Domain Adaptation
TOPIC: Domain adaptation and failure to adapt
- What is the problem of domain adaptation?
- How is it addressed in statistical classification approaches to NER?
- How well does it work?
Daumé III, Frustratingly Easy Domain Adaptation, ACL, 2007.
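The feature augmentation idea in the cited paper can be sketched in a few lines: every feature is duplicated into a shared copy and a domain-specific copy, so a standard classifier can learn which features transfer across domains and which do not. The feature names and the sparse-dictionary representation below are assumptions for illustration.

```python
def augment(features, domain):
    """Map a sparse feature dict to shared plus domain-specific copies.

    The copy for the other domain is implicitly zero because the
    representation is sparse.
    """
    augmented = {}
    for name, value in features.items():
        augmented["general:" + name] = value
        augmented[domain + ":" + name] = value
    return augmented


def dot(a, b):
    """Dot product of two sparse feature dicts."""
    return sum(value * b[name] for name, value in a.items() if name in b)


# Hypothetical token features.
x = {"w=fever": 1.0, "capitalized": 1.0}

source_x = augment(x, "source")
target_x = augment(x, "target")

same_domain = dot(source_x, augment(x, "source"))
cross_domain = dot(source_x, target_x)
print(source_x)
print(same_domain, cross_domain)
```

In the augmented space two examples from the same domain are twice as similar as the same two examples from different domains, which is why in-domain training data ends up weighing more for in-domain predictions.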

Classification-based Citation Parsing
TOPIC: Parsing citations using classifiers
- How is the citation parsing problem formulated using classifiers?
- What sort of information is available?
- What does the training data look like?
- What sorts of downstream applications are based on citation parsing?
Peng et al., Information extraction from research papers using conditional random fields, Information Processing & Management, 2006, pp. 963-979.

Question Answering
TOPIC: Information Extraction for Question Answering
- In 2011, IBM's Watson defeated two human champions in the US quiz show Jeopardy
- Give an overview of Watson's question answering engine DeepQA
- Highlight how information extraction techniques are used in a complex pipeline for this application
Ferrucci et al., An Overview of the DeepQA Project, AI Magazine, 2010, pp. 59-79.

Reading Comprehension
TOPIC: Natural Language Comprehension with Neural Networks
- A machine reading system can answer queries about the content of natural language documents
- Which resources are required to build a system that is able to solve real-world tasks?
- How would we design and train a system based on Artificial Neural Networks?
Hermann et al., Teaching Machines to Read and Comprehend, NIPS, 2015, pp. 1693-1701.

Event Detection
TOPIC: Event Detection in Social Media
- Activity in social media (e.g., Twitter) can be monitored and analyzed to spot events
- Use cases: natural disasters, epidemics, stock market, ...
- What are the challenges and which information extraction techniques can be employed?
- Give a high-level sketch of the overall pipeline
Yin et al., Using Social Media to Enhance Emergency Situation Awareness, IEEE Intelligent Systems, November/December 2012, pp. 52-59.
Sakaki et al., Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors, WWW, 2010, pp. 851-860.
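One stage of such a pipeline, spotting an unusual rise in relevant messages, can be sketched with a simple baseline: count keyword-matching messages per time window and flag a window whose count lies far above the recent average. This is an illustrative assumption, not the method of either cited paper. The counts, the function name, and the parameters `history` and `k` are hypothetical.

```python
from statistics import mean, pstdev


def detect_bursts(counts, history=5, k=3.0):
    """Return indices of windows whose count exceeds mean + k * std
    of the preceding `history` windows."""
    bursts = []
    for t in range(history, len(counts)):
        window = counts[t - history:t]
        # Floor the deviation at 1.0 so a flat history does not make
        # every small fluctuation look like a burst.
        spread = max(pstdev(window), 1.0)
        if counts[t] > mean(window) + k * spread:
            bursts.append(t)
    return bursts


# Hypothetical number of messages mentioning "earthquake" per minute.
COUNTS = [2, 3, 2, 4, 3, 2, 30, 28, 3]

bursts = detect_bursts(COUNTS)
print(bursts)
```

Only the onset of the spike is flagged in this example, because the spike itself inflates the statistics of the following windows. A real system would also need a classifier to decide whether a matching message actually reports an event.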

Sentiment Analysis
TOPIC: Applications of Sentiment Analysis: Political Opinion and Customer Suggestions
- Sentiment analysis and opinion mining:
  - Capturing public opinion in forums, blogs, social networks, ...
  - Automatic classification of sentiment
- Describe possible applications of sentiment analysis, e.g. for election prediction, product preferences, marketing, ...
Wang et al., A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle, ACL System Demonstrations, 2012, pp. 115-120.
Negi and Buitelaar, Towards the Extraction of Customer-to-Customer Suggestions from Reviews, EMNLP, 2015, pp. 2159-2167.

IE and Computer Vision (ADVANCED!)
TOPIC: Cross-modal Information Extraction
- Detecting objects in the visual world (in images) and mapping them to words
- Possible applications: caption generation, event detection based on multi-modal input, image search, ...
- Are methods from natural language processing helpful?
- Distributional semantics, with a projection between an image-based semantic space and a word-based semantic space
- How to learn new concepts?
Lazaridou et al., Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world, ACL, 2014, pp. 1403-1414.
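The projection between the two semantic spaces can be sketched as a linear map followed by a nearest-neighbour search in the word space. In practice the map would be learned from images paired with words. Here the matrix `W`, the toy word vectors, and the image vector are hand-set, hypothetical values used only to show the mechanics.

```python
import math


def project(matrix, vector):
    """Apply a linear map (list of rows) to an image feature vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_word(image_vector, matrix, word_vectors):
    """Label an image with the word closest to its projection."""
    projected = project(matrix, image_vector)
    return max(word_vectors, key=lambda w: cosine(projected, word_vectors[w]))


# Hypothetical 2-d word space and 3-d image space.
WORD_VECTORS = {"cat": [1.0, 0.0], "car": [0.0, 1.0], "dog": [0.9, 0.1]}
W = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]

image = [0.1, 0.2, 0.9]
label = nearest_word(image, W, WORD_VECTORS)
print(label)
```

Because the search runs over the whole vocabulary, an image can be labelled with a word that never appeared in the paired training data, which is one answer to the question of how new concepts can be handled.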