Structured Output Prediction


CS4780/5780 Machine Learning, Fall 2011
Thorsten Joachims, Cornell University

Reading: T. Joachims, T. Hofmann, Yisong Yue, Chun-Nam Yu, Predicting Structured Objects with Support Vector Machines, Communications of the ACM, Research Highlight, 52(11):97-104, 2009. http://mags.acm.org/communications/200911/

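Discriminative vs. Generative
Bayes rule: $h_{bayes}(x) = \arg\max_{y} P(Y=y \mid X=x) = \arg\max_{y} P(X=x \mid Y=y)\, P(Y=y)$
Generative:
  Make assumptions about $P(X=x \mid Y=y)$ and $P(Y=y)$.
  Estimate the parameters of the two distributions.
Discriminative:
  Define a set of prediction rules (i.e. hypotheses) $H$.
  Find $h$ in $H$ that best approximates the classifications made by $h_{bayes}$.
Question: Can we train HMMs discriminatively?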

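Idea for Discriminative Training of HMM
Bayes rule: $h_{bayes}(x) = \arg\max_{y} P(Y=y \mid X=x) = \arg\max_{y} P(X=x \mid Y=y)\, P(Y=y)$
Model $P(Y=y \mid X=x)$ with $w \cdot \phi(x,y)$ so that $\arg\max_{y} P(Y=y \mid X=x) = \arg\max_{y} w \cdot \phi(x,y)$.
Intuition:
  Tune $w$ so that the correct $y$ has the highest value of $w \cdot \phi(x,y)$.
  $\phi(x,y)$ is a feature vector that describes the match between $x$ and $y$.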

Training HMMs with Structural SVM
Define $\phi(x,y)$ so that the model is isomorphic to an HMM:
  One feature for each possible start state
  One feature for each possible transition
  One feature for each possible output in each possible state
  Feature values are counts
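The count features listed above can be sketched as follows. This is a minimal illustration, not the course's implementation: the function names `hmm_joint_features` and `score`, and the tuple keys used to index features, are assumptions made for this example.

```python
from collections import Counter

def hmm_joint_features(words, tags):
    """Return phi(x, y) as a sparse count vector (hypothetical helper).

    Keys are illustrative: ("start", tag), ("trans", prev_tag, tag),
    and ("emit", tag, word). Values are how often each event occurs.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    phi = Counter()
    if tags:
        phi[("start", tags[0])] += 1          # one feature per start state
    for prev, cur in zip(tags, tags[1:]):
        phi[("trans", prev, cur)] += 1        # one feature per transition
    for word, tag in zip(words, tags):
        phi[("emit", tag, word)] += 1         # one feature per (state, output)
    return phi

def score(w, phi):
    """Linear score w . phi(x, y) for a sparse weight dict w."""
    return sum(w.get(key, 0.0) * count for key, count in phi.items())

phi = hmm_joint_features(
    ["The", "dog", "chased", "the", "cat"],
    ["Det", "N", "V", "Det", "N"],
)
```

For the example sentence, the transition Det to N occurs twice, so its feature value is 2, while "The" and "the" are counted as separate emissions because no lowercasing is applied in this sketch.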

Structural Support Vector Machine
Joint features $\phi(x,y)$ describe the match between $x$ and $y$.
Learn weights $w$ so that $w \cdot \phi(x,y)$ is maximal for the correct $y$.

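Structural SVM Training Problem
Hard-margin optimization problem:
  Training set: $(x_1, y_1), \dots, (x_n, y_n)$
  Prediction rule: $h(x) = \arg\max_{y \in Y} w \cdot \phi(x,y)$
  Optimization: $\min_{w} \frac{1}{2}\, w \cdot w$ subject to $w \cdot \phi(x_i, y_i) - w \cdot \phi(x_i, y) \ge 1$ for all $i$ and all $y \in Y \setminus \{y_i\}$
The correct label $y_i$ must have a higher value of $w \cdot \phi(x_i, y)$ than any incorrect label $y$.
Find the weight vector with the smallest norm.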
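For HMM-style count features, the prediction rule (the argmax over all tag sequences) does not need enumeration, because the linear score decomposes into start, transition, and emission terms. A minimal Viterbi sketch under that assumption follows; the function name `viterbi`, the dictionary-based weights, and the toy weight values are all illustrative and not taken from the slides.

```python
def viterbi(words, tags, start, trans, emit):
    """Return (best_score, best_tags) for the additive score
    start[y_1] + sum_t trans[(y_{t-1}, y_t)] + sum_t emit[(y_t, x_t)].
    Missing weights are treated as 0.0."""
    if not words:
        return 0.0, []
    # best[t][tag]: best score of any tag prefix that ends in `tag` at position t
    best = [{t: start.get(t, 0.0) + emit.get((t, words[0]), 0.0) for t in tags}]
    back = []  # back[t-1][tag]: best predecessor of `tag` at position t
    for word in words[1:]:
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + trans.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev] + trans.get((prev, cur), 0.0)
                           + emit.get((cur, word), 0.0))
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[-1][last], path

# Toy weights (assumed values, chosen only to make the example readable).
tags = ["Det", "N", "V"]
start = {"Det": 1.0}
trans = {("Det", "N"): 1.0, ("N", "V"): 1.0, ("V", "Det"): 1.0}
emit = {("Det", "the"): 1.0, ("N", "dog"): 1.0,
        ("V", "chased"): 1.0, ("N", "cat"): 1.0}
best_score, best_tags = viterbi(["the", "dog", "chased", "the", "cat"],
                                tags, start, trans, emit)
```

The running time is linear in the sentence length and quadratic in the number of tags, which is what makes the argmax practical for sequence outputs.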

Soft-Margin Structural SVM
The loss function $\Delta(y_i, y)$ measures the match between target $y_i$ and prediction $y$.
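The slide does not say which loss is used. One common choice for sequence labeling, assumed here purely for illustration, is the Hamming loss: the number of positions where the predicted tag differs from the target tag.

```python
def hamming_loss(y_true, y_pred):
    """Number of positions where prediction and target disagree.

    An assumed example of a loss Delta(y_true, y_pred); it is zero
    exactly when the two sequences are identical."""
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(y_true, y_pred))

target = ["Det", "N", "V", "Det", "N"]
prediction = ["Det", "N", "N", "Det", "N"]
loss = hamming_loss(target, prediction)
```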

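Soft-Margin Structural SVM
Soft-margin optimization problem (in the form used in the reading):
  $\min_{w,\, \xi \ge 0} \; \frac{1}{2}\, w \cdot w + \frac{C}{n} \sum_{i=1}^{n} \xi_i$
  subject to $w \cdot \phi(x_i, y_i) - w \cdot \phi(x_i, y) \ge \Delta(y_i, y) - \xi_i$ for all $i$ and all $y \in Y \setminus \{y_i\}$
Lemma: The training loss is upper bounded by the average slack:
  $\frac{1}{n} \sum_{i=1}^{n} \Delta(y_i, h(x_i)) \le \frac{1}{n} \sum_{i=1}^{n} \xi_i$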
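The lemma follows in two lines. The derivation below is a sketch that assumes margin-rescaling constraints of the form $w \cdot \phi(x_i, y_i) - w \cdot \phi(x_i, y) \ge \Delta(y_i, y) - \xi_i$, slack variables $\xi_i \ge 0$, and a loss with $\Delta(y, y) = 0$; the symbol $\hat{y}_i$ for the prediction is introduced here for convenience.

```latex
% Let \hat{y}_i = h(x_i) = \arg\max_{y} w \cdot \phi(x_i, y).
% If \hat{y}_i = y_i, then \Delta(y_i, \hat{y}_i) = 0 \le \xi_i.
% Otherwise, apply the constraint for y = \hat{y}_i:
\begin{align*}
\xi_i &\ge \Delta(y_i, \hat{y}_i)
        - \bigl[\, w \cdot \phi(x_i, y_i) - w \cdot \phi(x_i, \hat{y}_i) \,\bigr] \\
      &\ge \Delta(y_i, \hat{y}_i)
        \qquad \text{since } w \cdot \phi(x_i, \hat{y}_i) \ge w \cdot \phi(x_i, y_i).
\end{align*}
% Averaging over the training set:
\[
\frac{1}{n} \sum_{i=1}^{n} \Delta\bigl(y_i, h(x_i)\bigr)
\;\le\; \frac{1}{n} \sum_{i=1}^{n} \xi_i .
\]
```

The second inequality uses only the fact that the prediction maximizes the score, so the bracketed margin cannot be positive when the prediction is wrong.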

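Cutting-Plane Algorithm for Structural SVM
Input: $(x_1, y_1), \dots, (x_n, y_n)$, $C$, $\epsilon$
$S \leftarrow \emptyset$, $w \leftarrow 0$, $\xi \leftarrow 0$
REPEAT
  FOR $i = 1, \dots, n$
    compute $\hat{y} = \arg\max_{y \in Y} \{ \Delta(y_i, y) + w \cdot \phi(x_i, y) \}$   (find most violated constraint)
    IF $w \cdot \phi(x_i, y_i) - w \cdot \phi(x_i, \hat{y}) < \Delta(y_i, \hat{y}) - \xi_i - \epsilon$   (violated by more than $\epsilon$?)
      $S \leftarrow S \cup \{ w \cdot \phi(x_i, y_i) - w \cdot \phi(x_i, \hat{y}) \ge \Delta(y_i, \hat{y}) - \xi_i \}$   (add constraint to working set)
      $[w, \xi] \leftarrow$ optimize StructSVM over $S$
    ENDIF
  ENDFOR
UNTIL $S$ has not changed during iteration
Polynomial Time Algorithm (SVM-struct)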
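The inner step of the loop, finding the most violated constraint and testing whether it is violated by more than epsilon, can be sketched as below. This is a toy illustration only: it enumerates every tag sequence instead of using dynamic programming, assumes HMM-style count features and Hamming loss, and leaves out the quadratic-program re-optimization entirely. All function names are hypothetical.

```python
from itertools import product

def features(words, tags):
    """Sparse HMM-style count features (start, transition, emission)."""
    phi = {}
    def bump(key):
        phi[key] = phi.get(key, 0) + 1
    bump(("start", tags[0]))
    for prev, cur in zip(tags, tags[1:]):
        bump(("trans", prev, cur))
    for word, tag in zip(words, tags):
        bump(("emit", tag, word))
    return phi

def score(w, words, tags):
    """Linear score w . phi(x, y) with a sparse weight dict."""
    return sum(w.get(k, 0.0) * v for k, v in features(words, tags).items())

def hamming(y_true, y_pred):
    """Assumed loss Delta: number of mismatched positions."""
    return sum(a != b for a, b in zip(y_true, y_pred))

def most_violated(w, words, y_true, tagset):
    """argmax over y of Delta(y_true, y) + w . phi(x, y), by brute force."""
    return max(
        (list(y) for y in product(tagset, repeat=len(words))),
        key=lambda y: hamming(y_true, y) + score(w, words, y),
    )

def is_violated(w, words, y_true, y_hat, xi, eps):
    """True if the constraint for y_hat is violated by more than eps."""
    margin = score(w, words, y_true) - score(w, words, y_hat)
    return margin < hamming(y_true, y_hat) - xi - eps

# With all-zero weights, every sequence scores 0, so the most violated
# constraint belongs to a sequence that gets every tag wrong.
w = {}
words, y_true = ["the", "dog"], ["Det", "N"]
y_hat = most_violated(w, words, y_true, ["Det", "N", "V"])
add_to_working_set = is_violated(w, words, y_true, y_hat, xi=0.0, eps=0.1)
```

In practice the enumeration would be replaced by a loss-augmented Viterbi pass, since the number of tag sequences grows exponentially with sentence length.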

Experiment: Part-of-Speech Tagging
Task
  Given a sequence of words x, predict the sequence of tags y.
    x: The dog chased the cat
    y: Det N V Det N
  Dependencies from tag-tag transitions in Markov model.
Model
  Markov model with one state per tag and words as emissions
  Each word described by ~250,000 dimensional feature vector (all word suffixes/prefixes, word length, capitalization, ...)
Experiment (by Dan Fleisher)
  Train/test on 7966/1700 sentences from Penn Treebank

Test Accuracy (%)
  Brill (RBT)                   95.78
  HMM (ACOPOST)                 95.63
  kNN (MBT)                     95.02
  Tree Tagger                   94.68
  SVM Multiclass (SVM-light)    95.75
  SVM-HMM (SVM-struct)          96.49
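The per-word features mentioned in the model description (suffixes, prefixes, length, capitalization) might be extracted along the following lines. The exact feature set used in the experiment is not given, so the function name `word_features` and the feature-name strings are assumptions for illustration.

```python
def word_features(word):
    """Return the set of active binary features for one word.

    Illustrative only: every prefix and suffix (lowercased), the word
    length, and a capitalization flag. Each distinct string would be
    one dimension of a large sparse feature vector."""
    feats = {f"len={len(word)}"}
    if word[:1].isupper():
        feats.add("capitalized")
    for k in range(1, len(word) + 1):
        feats.add(f"prefix={word[:k].lower()}")
        feats.add(f"suffix={word[-k:].lower()}")
    return feats

dog = word_features("Dog")
chased = word_features("chased")
```

Because most words share only a few affixes, each word activates a handful of dimensions out of the full feature space, which keeps the representation sparse.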

NE Identification
Identify all named locations, named persons, named organizations, dates, times, monetary amounts, and percentages.

Experiment: Named Entity Recognition
Data
  Spanish Newswire articles
  300 training sentences
  9 tags: no-name, plus beginning and continuation of person name, organization, location, misc name
  Output words are described by features (e.g. starts with capital letter, contains number, etc.)
Error on test set (% mislabeled tags):
  Generative HMM: 9.36%
  Support Vector Machine HMM: 5.08%

General Problem: Predict Complex Outputs
Supervised Learning from Examples
  Find a function from input space X to output space Y such that the prediction error is low.
Typical
  Output space is just a single number
    Classification: -1, +1
    Regression: some real number
General
  Predict outputs that are complex objects

Examples of Complex Output Spaces
Natural Language Parsing
  Given a sequence of words x, predict the parse tree y.
  Dependencies from structural constraints, since y has to be a tree.
    x: The dog chased the cat
    y: (S (NP (Det The) (N dog)) (VP (V chased) (NP (Det the) (N cat))))

Examples of Complex Output Spaces
Noun-Phrase Co-reference
  Given a set of noun phrases x, predict a clustering y.
  Structural dependencies, since the prediction has to be an equivalence relation.
  Correlation dependencies from interactions.
    x: The policeman fed the cat. He did not know that he was late. The cat is called Peter.
    y: [figure: the same text with its noun phrases grouped into co-reference clusters]

Examples of Complex Output Spaces
Scene Recognition
  Given a 3D point cloud with RGB from a Kinect camera
  Segment into volumes
  Geometric dependencies between segments (e.g. monitor usually close to keyboard)

Wrap-Up

Classification
Discriminative
  Decision Trees
  Perceptron
  Linear SVMs
  Kernel SVMs
Generative
  Multinomial Naïve Bayes
  Multivariate Naïve Bayes
  Less Naïve Bayes
  Linear Discriminant
Nearest Neighbor
Methods + Theory + Practice
Other Methods
  Logical rule learning
  Online Learning
  Logistic Regression
  Neural Networks
  RBF Networks
  Boosting
  Bagging
  Parametric (Graphical) Models
  Non-Parametric Models
  *-Regression

Structured Prediction
Discriminative
  Structural SVMs
Generative
  Hidden Markov Model
Other Methods
  Maximum Margin Markov Networks
  Conditional Random Fields
  Markov Random Fields
  Bayesian Networks
  Statistical Relational Learning
Related course: CS4782 Prob Graphical Models

Unsupervised Learning
Clustering
  Hierarchical Agglomerative Clustering
  K-Means
  Mixture of Gaussians and EM-Algorithm
Other Methods
  Spectral Clustering
  Latent Dirichlet Allocation
  Latent Semantic Analysis
  Multi-Dimensional Scaling
Other Tasks
  Outlier Detection
  Novelty Detection
  Dimensionality Reduction
  Non-Linear Manifold Detection
Related course: CS4850 Math Found for the Information Age

Other Learning Problems and Applications
Recommender Systems
Reinforcement Learning and Markov Decision Processes
  CS4758 Robot Learning
Computer Vision
  CS4670 Intro Computer Vision
Natural Language Processing
  CS4740 Intro Natural Language Processing

Other Machine Learning Courses at Cornell
  CS 4700 - Introduction to Artificial Intelligence
  CS 4780/5780 - Machine Learning
  CS 4758 - Robot Learning
  CS 4782 - Probabilistic Graphical Models
  OR 4740 - Statistical Data Mining
  CS 6756 - Advanced Topics in Robot Learning: 3D Perception
  CS 6780 - Advanced Machine Learning
  CS 6784 - Advanced Topics in Machine Learning
  ORIE 6740 - Statistical Learning Theory for Data Mining
  ORIE 6750 - Optimal learning
  ORIE 6780 - Bayesian Statistics and Data Analysis
  ORIE 6127 - Computational Issues in Large Scale Data-Driven Models
  BTRY 6502 - Computationally Intensive Statistical Inference
  MATH 7740 - Statistical Learning Theory