Lecture 1: Introduction

Lecture 1: Introduction Kai-Wei Chang CS @ University of Virginia kw@kwchang.net CS6501- Advanced Machine Learning 1

What is this course about? v You've learned how to make binary and multiclass predictions v But many real-world problems are more complex than that v This course focuses on: v exciting techniques developed in machine learning for problems that require complex decisions CS6501- Advanced Machine Learning 2

Machine learning 101 CS6501- Advanced Machine Learning 3

CS6501- Advanced Machine Learning 4

Perceptron, decision tree, support vector machine, K-NN, Naïve Bayes, logistic regression. CS6501- Advanced Machine Learning 5

Classification is generally well-understood v Theoretically: generalization bounds v # examples needed to train a good model v Algorithmically: v Efficient algorithms for large data sets v E.g., it takes a few seconds to train a linear SVM on data with millions of instances and features v Algorithms for non-linear models v E.g., kernel methods Is this enough to solve all real-world problems? CS6501- Advanced Machine Learning 6

m=40, n=10 CS6501- Advanced Machine Learning 7

Machine Translation CS6501- Advanced Machine Learning 8

Self-driving car CS6501- Advanced Machine Learning 9

Reading Comprehension CS6501- Advanced Machine Learning 10

Q: [Chris] = [Mr. Robin]? Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book Slide modified from Dan Roth Kai-Wei Chang (University of Virginia) 11

Complex Decision Structure Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book Kai-Wei Chang (University of Virginia) 12

Co-reference Resolution Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book 13

Challenges v Modeling challenges: how to model a complex decision? Structured prediction models v Representation challenges: how to extract features? Deep learning models v Algorithmic challenges: large amount of data and complex decision structure Inference / learning algorithms (The slide overlays illustrative text excerpts: the Christopher Robin passage, a coreference example about Bill Clinton and Vladimir Putin, and an abstract on learning-to-search / LOLS.) 14

This Lecture v Course Overview v The key challenges & solutions (we know so far) v What will you learn from this course? v Course Information CS6501- Advanced Machine Learning 15

Modeling Challenges v How to model a complex decision? v Why is this important? Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book CS6501- Advanced Machine Learning 16

Language is structural CS6501- Advanced Machine Learning 17

Handwriting recognition v What is this letter? CS6501- Advanced Machine Learning 18

Handwriting recognition v What is this letter? CS6501- Advanced Machine Learning 19

Visual recognition CS6501- Advanced Machine Learning 20

Human body recognition CS6501- Advanced Machine Learning 21

Structured Prediction Assign values to a set of interdependent output variables Task / Input / Output: v Part-of-speech tagging: They operate ships and banks. → Pronoun Verb Noun And Noun v Dependency parsing: They operate ships and banks. → dependency tree (with Root) v Segmentation: They operate ships and banks. 22

Bridge the gap v Simple classifiers are not designed to handle complex outputs v Need to make multiple decisions jointly v Example: POS tagging: can you can a can as a canner can can a can Example from Vivek Srikumar CS6501- Advanced Machine Learning 23

Make multiple decisions jointly v Example: POS tagging: can you can a can as a canner can can a can v Each part needs a label v Assign a tag (V., N., A., ) to each word in the sentence v The decisions are mutually dependent v E.g., cannot have a verb followed by a verb v Results are evaluated jointly CS6501- Advanced Machine Learning 24
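The joint nature of the decision can be made concrete by brute force: score every tag sequence, discard the ones violating the constraint, and keep the best. A toy sketch, using a made-up three-word sentence "you can fish" and hypothetical per-word scores (not real model outputs):

```python
from itertools import product

# Hypothetical per-word tag scores (made up for illustration).
word_scores = {
    "you":  {"Pro": 1.0},
    "can":  {"Vb": 1.0, "Md": 0.7},
    "fish": {"Vb": 0.9, "Nn": 0.8},
}
sentence = ["you", "can", "fish"]

def valid(tags):
    # Joint constraint from the slide: no verb directly followed by a verb.
    return not any(a == "Vb" and b == "Vb" for a, b in zip(tags, tags[1:]))

def score(tags):
    # Independent per-word scores summed over the sentence.
    return sum(word_scores[w][t] for w, t in zip(sentence, tags))

# Enumerate all joint assignments, keep only valid ones, pick the best.
candidates = [t for t in product(*(word_scores[w] for w in sentence)) if valid(t)]
best = max(candidates, key=score)
# best = ("Pro", "Vb", "Nn"): the independently best (Pro, Vb, Vb) is ruled out
```

Brute force only works for tiny examples; the following slides show why (the output space is combinatorial) and how to avoid the enumeration.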

Structured prediction problems v Problems that v have multiple interdependent output variables v and the output assignments are evaluated jointly v Need a joint assignment to all the output variables v We call it joint inference, global inference, or simply inference CS6501- Advanced Machine Learning 25

A General learning setting v Input: x ∈ X v Truth: y ∈ Y(x) v Predicted: h(x) ∈ Y(x) v Loss: loss(y, y') I can can a can Pro Md Vb Dt Nn Pro Md Md Dt Vb Pro Md Md Dt Nn Pro Md Nn Dt Md Pro Md Nn Dt Vb Goal: make a joint prediction to minimize a joint loss: find h ∈ H such that h(x) ∈ Y(x), minimizing E_{(x,y)~D}[loss(y, h(x))], based on N samples (x_i, y_i) ~ D Kai-Wei Chang (University of Virginia) 26

Combinatorial output space v Input: x ∈ X v Truth: y ∈ Y(x) v Predicted: h(x) ∈ Y(x) v Loss: loss(y, y') I can can a can Pro Md Vb Dt Nn Pro Md Md Dt Vb Pro Md Md Dt Nn Pro Md Nn Dt Md Pro Md Nn Dt Vb # POS tags: 45 How many possible outputs for a sentence with 10 words? 45^10 ≈ 3.4 × 10^16 Observation: Not all sequences are valid, and we don't need to consider all of them Kai-Wei Chang (University of Virginia) 27
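The 45^10 figure on the slide is easy to check directly:

```python
# 45 POS tags, 10-word sentence: the number of joint tag assignments.
num_tags, length = 45, 10
num_outputs = num_tags ** length
print(num_outputs)  # 34050628916015625, i.e. roughly 3.4 x 10^16
```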

Representation of interdependent output variables v A compact way to represent output combinations v Abstract away unnecessary complexities v We know how to process them v Graph algorithms for linear chain, tree, etc. Pronoun Verb Noun And Noun Root They operate ships and banks. CS6501- Advanced Machine Learning 28
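For a linear chain, the graph algorithm alluded to above is Viterbi: dynamic programming finds the highest-scoring joint assignment without enumerating the combinatorial output space. A minimal max-sum sketch; the score functions `emit` and `trans` are hypothetical interfaces, not something defined on the slides:

```python
def viterbi(words, tags, emit, trans):
    """Return the best tag sequence under additive emission/transition scores."""
    # best[t]: score of the best partial sequence ending in tag t
    best = {t: emit(words[0], t) for t in tags}
    backptrs = []
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for t in tags:
            # Best previous tag to transition from, for each current tag t.
            p = max(prev, key=lambda s: prev[s] + trans(s, t))
            best[t] = prev[p] + trans(p, t) + emit(w, t)
            ptr[t] = p
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(best, key=best.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Runtime is O(length × |tags|²) rather than |tags|^length, which is what makes the compact linear-chain representation useful.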

A General Formula ŷ = argmax_{y ∈ Y} f(y; w, x) (x: input, w: model parameters, Y: output space) v Inference/Test: given w, x, solve the argmax v Learning/Training: find a good w CS6501- Advanced Machine Learning 29

The Input x ŷ = argmax_{y ∈ Y} f(y; w, x) x: representation of the input v Feature extraction: mapping a domain element into a representation v Typically x ∈ R^n or x ∈ {0,1}^n v E.g., bag-of-words v Can be obtained by a (deep) neural network CS6501- Advanced Machine Learning 30
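A minimal bag-of-words sketch of the feature extraction step (the four-word vocabulary and the input text are made up):

```python
vocab = ["book", "poem", "farm", "magazine"]

def bag_of_words(text):
    """Map raw text to a count vector x in R^|vocab|."""
    words = text.lower().split()
    return [words.count(v) for v in vocab]

x = bag_of_words("The poem was printed in a magazine")
# x = [0, 1, 0, 1]: one "poem", one "magazine", no "book" or "farm"
```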

The Label Space Y ŷ = argmax_{y ∈ Y} f(y; w, x) Y: label space (output space) v Binary classification: Y = {-1, 1} v Regression: Y = R v Multi-class classification: Y = {1, 2, …, K} v Structured prediction: Y = {structured objects} v sequences of labels, parse trees, etc. v represented by multiple variables with constraints CS6501- Advanced Machine Learning 31

Algorithms/models for structured prediction v Many learning algorithms can be generalized to the structured case v Perceptron → Structured perceptron v SVM → Structured SVM v Logistic regression → Conditional random field (a.k.a. log-linear models) v Can be solved by a reduction stack: v Structured prediction → multi-class → binary CS6501- Advanced Machine Learning 32
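The perceptron-to-structured-perceptron generalization above amounts to one change: the update compares the gold structure against the argmax structure. A sketch in which `predict` (the inference routine) and `features` (the feature map) are hypothetical interfaces left abstract:

```python
def structured_perceptron_step(w, x, y_gold, predict, features):
    """One update: w += phi(x, y_gold) - phi(x, y_hat) when the prediction is wrong.

    predict(w, x)  -> argmax_y f(y; w, x)          (inference routine)
    features(x, y) -> dict of feature name -> value (feature map phi)
    """
    y_hat = predict(w, x)
    if y_hat != y_gold:
        # Move weights toward the gold structure's features...
        for f, v in features(x, y_gold).items():
            w[f] = w.get(f, 0.0) + v
        # ...and away from the (wrongly) predicted structure's features.
        for f, v in features(x, y_hat).items():
            w[f] = w.get(f, 0.0) - v
    return w
```

The binary perceptron is the special case where Y has two elements and inference is trivial; for sequences, `predict` would be something like Viterbi.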

Representation Challenges v How to obtain features? Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book CS6501- Advanced Machine Learning 33

Representation Challenges v How to obtain features? 1. Design features based on domain knowledge v E.g., by patterns in parse trees When Chris was three years old, his father wrote a poem about him. v By nicknames Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. v Need human experts/knowledge CS6501- Advanced Machine Learning 34

Representation Challenges v How to obtain features? 1. Design features based on domain knowledge 2. Design feature templates and then let machine find the right ones v E.g., use all words, pairs of words, Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book CS6501- Advanced Machine Learning 35

Representation Challenges v How to obtain features? 1. Design features based on domain knowledge 2. Design feature templates and then let the machine find the right ones v Challenges: v # features can be very large v # English words: 171K (Oxford) v # bigrams: 171K^2 ≈ 3 × 10^10, # trigrams? v For some domains, it is hard to design features CS6501- Advanced Machine Learning 36
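A sketch of template-generated features, illustrating the blow-up the slide quantifies (the sentence and the two templates are made up for illustration):

```python
def template_features(words):
    """Instantiate unigram and bigram feature templates over a token list."""
    feats = set()
    for w in words:
        feats.add(("unigram", w))                 # one feature per token
    for a, b in zip(words, words[1:]):
        feats.add(("bigram", a, b))               # one feature per adjacent pair
    return feats

feats = template_features(["his", "father", "wrote", "a", "poem"])
# 5 unigram + 4 bigram features fire on this sentence, but the template's
# capacity is vocabulary-sized:
print(171_000 ** 2)  # 29241000000, roughly 3 x 10^10 possible bigram features
```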

Representation learning v Learn compact representations of features v Combinatorial (continuous representation) CS6501- Advanced Machine Learning 37

Representation learning v Learn compact representations of features v Combinatorial (continuous representation) v Hierarchical/compositional CS6501- Advanced Machine Learning 38

What you will learn from this course v Structured prediction v Models / inference / learning v Representation (deep) learning v Input/output representations v Combining structured models and deep learning CS6501- Advanced Machine Learning 39

What to Read? v Machine learning ICML, NIPS, ECML, AISTATS, ICLR, JMLR, MLJ v Natural Language Processing ACL, NAACL, EACL, EMNLP, CoNLL, Coling, TACL v Computer Vision ICCV, CVPR v Data Mining KDD, ICDM, CIKM, SDM v Artificial Intelligence AAAI, IJCAI, UAI, JAIR CS6501- Advanced Machine Learning 40

This course v New course, first time being offered v Comments are welcome v Designed for first- or second-year PhD students v Lecture + student presentations v I assume v programming experience (for the final project) v probability, calculus, and linear algebra (HW0) v basic ML background: AI, ML, NLP, CV CS6501- Advanced Machine Learning 41

Staff v Instructor: Kai-Wei Chang v Email: ml16@kwchang.net v Office: R412 Rice Hall v Office hour: 14:00 15:00, Tue. v TAs (Office@R432 Rice Hall): v Wasi Ahmad, wua4nw@virginia.edu v Supplementary session: 17:00-18:00 Tue v Md Rizwan Parvez, mp5eb@virginia.edu v TA hour: 14:00 -- 15:00, Thu CS6501- Advanced Machine Learning 42

Grading (tentative) v Lectures & forum v Participate in discussion (bonus credits) v Review quizzes (30%): 3 review quizzes v Homework sets (15%): 3 homework sets v Paper presentation (15%) v Final project (40%) v No rounding/ceiling on final scores CS6501- Advanced Machine Learning 43

Quizzes v Format v Multiple choice questions v Fill-in-the-blank v Short answer questions v Each quiz: ~30 min in class v Schedule: see course website v Closed book, Closed notes, Closed laptop CS6501- Advanced Machine Learning 44

Homework set v Format: v Math problems v Programming problems v Schedule: see course website CS6501- Advanced Machine Learning 45

Paper presentation v Each group has 2~3 students v Pick one slot at: v Register your choice early https://goo.gl/usy5ta v 25~30 min presentation + Q&A v Will be graded by the instructor, TAs, and other students v Starts from 2/1 CS6501- Advanced Machine Learning 46

Final Project v Work in groups (2~3 students) v Project proposal v Written report, 2 page maximum v Project report v < 8 pages, NIPS format v Due 2 days before the final presentation v Project presentation (15%) v ~ 5-min in-class presentation CS6501- Advanced Machine Learning 47

No idea? CS6501- Advanced Machine Learning 48

Typical project topics v New idea/model for a well-known problem v New application for a model v Implementation of an old idea v Reproduce results of a paper v Implement algorithms using a different framework/programming language v Contact me if you want some ideas CS6501- Advanced Machine Learning 49

Late Policy v Credit of 48 hours for all the assignments v Including proposal and final project v No accumulation v No additional grace period v No make-up exams & no late homework v unless under emergency situations CS6501- Advanced Machine Learning 50

Cheating/Plagiarism v No. Ask if you have concerns v UVA Honor Code: http://www.virginia.edu/honor/ CS6501- Advanced Machine Learning 51

Lectures and office hours v Participation is highly appreciated! v Ask questions about anything v Feedback is welcome v Lead discussion in this class v Enroll in Piazza https://piazza.com/virginia/fall2017/cs6501001 CS6501- Advanced Machine Learning 52

Waiting list v Start attending the first few meetings of the class as if you were registered. Given that some students will drop the class, some space will free up. CS6501- Advanced Machine Learning 53