Lecture 1: Introduction

Raymond Wilkinson
5 years ago

1 Lecture 1: Introduction
Kai-Wei Chang, University of Virginia (kw@kwchang.net)
CS6501- Advanced Machine Learning

2 What is this course about?
- You've learned how to make binary and multiclass predictions
- But many real-world problems are more complex than that
- This course focuses on:
  - exciting techniques developed in machine learning for problems that require making complex decisions

3 Machine learning 101

5 Perceptron, decision tree, support vector machine, k-NN, Naïve Bayes, logistic regression

6 Classification is generally well-understood
- Theoretically: generalization bounds
  - # examples needed to train a good model
- Algorithmically:
  - Efficient algorithms for large data sets
    - E.g., it takes a few seconds to train a linear SVM on data with millions of instances and features
  - Algorithms for non-linear models
    - E.g., kernel methods
Is this enough to solve all real-world problems?

7 [Figure: m=40, n=10]

8 Machine Translation

9 Self-driving car

10 Reading Comprehension

11 Q: [Chris] = [Mr. Robin]?
Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book
(Slide modified from Dan Roth; Kai-Wei Chang, University of Virginia)

12 Complex Decision Structure
(Same Christopher Robin passage as slide 11.)

13 Co-reference Resolution
(Same Christopher Robin passage as slide 11.)

14 Challenges
[Figure: overlaid excerpts from a research paper on learning to search for structured prediction (LOLS), the Christopher Robin passage, and a news passage: "Bill Clinton, recently elected as the President of the USA, has been invited by the Russian President, Vladimir Putin, to visit Russia. President Clinton said that he looks forward to strengthening ties between USA and Russia."]
- Modeling challenges
  - How to model a complex decision?
  - Structured prediction models
- Representation challenges
  - How to extract features?
  - Deep learning models
- Algorithmic challenges
  - Large amount of data and complex decision structure
  - Inference / learning algorithms

15 This Lecture
- Course overview
  - The key challenges & solutions (we know so far)
  - What will you learn from this course?
- Course information

16 Modeling Challenges
- How to model a complex decision?
- Why is this important?
(Same Christopher Robin passage as slide 11.)

17 Language is structural

18 Handwriting recognition
- What is this letter?

19 Handwriting recognition
- What is this letter?

20 Visual recognition

21 Human body recognition

22 Structured Prediction
Assign values to a set of interdependent output variables.

Task | Input | Output
Part-of-speech tagging | They operate ships and banks. | Pronoun Verb Noun And Noun
Dependency parsing | They operate ships and banks. | [Figure: dependency tree rooted at Root]
Segmentation | [Figure] | [Figure]

23 Bridge the gap
- Simple classifiers are not designed to handle complex outputs
- Need to make multiple decisions jointly
- Example: POS tagging: can you can a can as a canner can can a can
(Example from Vivek Srikumar)

24 Make multiple decisions jointly
- Example: POS tagging: can you can a can as a canner can can a can
- Each part needs a label
  - Assign a tag (V., N., A., ...) to each word in the sentence
- The decisions are mutually dependent
  - Cannot have a verb followed by a verb
- Results are evaluated jointly
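To make the dependency concrete, here is a minimal Python sketch that counts the tag sequences allowed by the slide's constraint. The three-tag set `("N", "V", "A")` and all function names are hypothetical; the only rule encoded is "no verb followed by a verb". The brute-force counter enumerates every sequence, while the dynamic-programming counter reaches the same number by remembering only whether the previous tag was a verb, which is the kind of local dependency joint inference can exploit.

```python
from itertools import product

# Toy tag set (an assumption for illustration): V = verb, N = noun, A = anything else.
TAGS = ("N", "V", "A")

def is_valid(tags):
    """Joint constraint from the slide: a verb cannot be followed by a verb."""
    return all(not (a == "V" and b == "V") for a, b in zip(tags, tags[1:]))

def count_valid_brute(n):
    """Enumerate all len(TAGS)**n sequences and count the valid ones."""
    return sum(1 for seq in product(TAGS, repeat=n) if is_valid(seq))

def count_valid_dp(n):
    """Same count without enumeration: track sequences ending in a verb vs. not."""
    end_v, end_other = 1, len(TAGS) - 1  # valid sequences of length 1
    for _ in range(n - 1):
        # a verb may only follow a non-verb; a non-verb may follow anything
        end_v, end_other = end_other, (end_v + end_other) * (len(TAGS) - 1)
    return end_v + end_other

sentence = "can you can a can as a canner can can a can".split()
n = len(sentence)
print(n, len(TAGS) ** n, count_valid_dp(n))
```

For the 12-word sentence this prints `12 531441 186304`: only about a third of all tag sequences satisfy even this single constraint, and the dynamic program never has to list them.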

25 Structured prediction problems
- Problems that
  - have multiple interdependent output variables
  - and whose output assignments are evaluated jointly
- Need a joint assignment to all the output variables
  - We call it joint inference, global inference, or simply inference

26 A General learning setting
- Input: $x \in \mathcal{X}$
- Truth: $y \in \mathcal{Y}(x)$
- Predicted: $h(x) \in \mathcal{Y}(x)$
- Loss: $\mathrm{loss}(y, y')$

Example input: I can can a can
Candidate outputs:
Pro Md Vb Dt Nn
Pro Md Md Dt Vb
Pro Md Md Dt Nn
Pro Md Nn Dt Md
Pro Md Nn Dt Vb

Goal: make a joint prediction that minimizes a joint loss. Find $h \in \mathcal{H}$ such that $h(x) \in \mathcal{Y}(x)$, minimizing
$$\mathbb{E}_{(x, y) \sim D}\big[\mathrm{loss}(y, h(x))\big]$$
based on $N$ samples $(x_n, y_n) \sim D$.
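The slide leaves $\mathrm{loss}(y, y')$ abstract. One common choice for tag sequences is the Hamming loss, the number of positions tagged differently; the sketch below assumes that choice (the slide does not prescribe it) and scores the slide's candidate outputs for "I can can a can" against "Pro Md Vb Dt Nn", taken here as the correct tagging. `empirical_risk` is the average over the $N$ samples that stands in for the expectation over $D$; the constant predictor is a hypothetical stand-in for a learned $h$.

```python
def hamming_loss(y_true, y_pred):
    """Number of positions where the predicted tag differs from the true tag."""
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must have the same length")
    return sum(t != p for t, p in zip(y_true, y_pred))

def empirical_risk(h, samples, loss=hamming_loss):
    """Average loss of predictor h over N samples (x_n, y_n): an estimate of E[loss(y, h(x))]."""
    return sum(loss(y, h(x)) for x, y in samples) / len(samples)

truth = "Pro Md Vb Dt Nn".split()
candidates = ["Pro Md Md Dt Vb", "Pro Md Md Dt Nn", "Pro Md Nn Dt Md", "Pro Md Nn Dt Vb"]
for c in candidates:
    print(c, hamming_loss(truth, c.split()))

def always_md(x):
    """Hypothetical predictor that ignores its input and always outputs one sequence."""
    return "Pro Md Md Dt Nn".split()

samples = [("I can can a can".split(), truth)]
print(empirical_risk(always_md, samples))
```

The candidate losses come out as 2, 1, 2, 2, and the empirical risk of the constant predictor is 1.0. A 0/1 loss on the whole sequence would give every wrong candidate the same loss of 1; the Hamming loss instead credits the nearly correct "Pro Md Md Dt Nn".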

27 Combinatorial output space
- (Same setting and example as slide 26.)
- # POS tags: 45
- How many possible outputs for a sentence with 10 words? $45^{10} \approx 3.4 \times 10^{16}$
- Observation: not all sequences are valid, and we don't need to consider all of them

28 Representation of interdependent output variables
- A compact way to represent output combinations
  - Abstract away unnecessary complexities
  - We know how to process them
    - Graph algorithms for linear chains, trees, etc.
[Figure: the tags Pronoun Verb Noun And Noun as a linear chain, and a dependency tree rooted at Root, over "They operate ships and banks."]
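For a linear chain, the standard graph algorithm is Viterbi decoding, a dynamic program. The sketch below assumes a first-order model whose score is a sum of per-word emission scores and adjacent-tag transition scores; the tables `EMIT` and `TRANS` hold made-up numbers chosen for illustration, not learned weights, and all names are hypothetical. It performs roughly $nK^2$ score evaluations for $n$ words and $K$ tags (about $10 \cdot 45^2 = 20{,}250$ for 10 words and 45 tags) instead of enumerating all $K^n$ sequences.

```python
def viterbi(words, tags, emit, trans):
    """
    Exact argmax over tag sequences for a first-order linear-chain score:
        score(y) = sum_i emit(words[i], y_i) + sum_{i>0} trans(y_{i-1}, y_i)
    Returns (best tag sequence, its score).
    """
    # best[i][t]: highest score of any tag prefix of length i+1 that ends in tag t
    best = [{t: emit(words[0], t) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best prefix
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + trans(p, t))
            best[i][t] = best[i - 1][prev] + trans(prev, t) + emit(words[i], t)
            back[i][t] = prev
    # Pick the best final tag, then follow the back-pointers to recover the sequence
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy scores (made up): unlisted (word, tag) pairs get a penalty, unlisted transitions get 0
EMIT = {("I", "Pro"): 2, ("can", "Md"): 1, ("can", "Vb"): 1, ("can", "Nn"): 1, ("a", "Dt"): 2}
TRANS = {("Pro", "Md"): 1, ("Md", "Vb"): 1, ("Vb", "Dt"): 1, ("Dt", "Nn"): 1}

def emit(word, tag):
    return EMIT.get((word, tag), -5)

def trans(prev, tag):
    return TRANS.get((prev, tag), 0)

print(viterbi("I can can a can".split(), ["Pro", "Md", "Vb", "Dt", "Nn"], emit, trans))
```

With these toy scores it prints `(['Pro', 'Md', 'Vb', 'Dt', 'Nn'], 11)`. The three occurrences of "can" have identical emission scores, so it is the transition scores alone that separate modal, verb, and noun readings.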

29 A General Formula
$$\hat{y} = \arg\max_{y \in \mathcal{Y}} f(y; w, x)$$
where $x$ is the input, $w$ the model parameters, and $\mathcal{Y}$ the output space.
- Inference/Test: given $w$ and $x$, solve the argmax
- Learning/Training: find a good $w$

30 The Input x
$\hat{y} = \arg\max_{y \in \mathcal{Y}} f(y; w, x)$
- $x$: representation of the input
  - Feature extraction: mapping a domain element into a representation
  - Typically $x \in \mathbb{R}^n$ or $x \in \{0,1\}^n$
    - E.g., bag-of-words
  - Can be obtained by a (deep) neural network
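A minimal sketch of bag-of-words feature extraction, assuming a fixed, hypothetical vocabulary and simple lower-case whitespace tokenization (no punctuation handling). With `binary=True` the result lies in $\{0,1\}^n$; with counts it is a nonnegative integer vector in $\mathbb{R}^n$, where $n$ is the vocabulary size.

```python
def bag_of_words(text, vocab, binary=True):
    """Map text to a length-len(vocab) vector: 0/1 indicators (binary) or word counts."""
    index = {word: i for i, word in enumerate(vocab)}
    x = [0] * len(vocab)
    for token in text.lower().split():
        i = index.get(token)
        if i is not None:  # words outside the vocabulary are ignored
            x[i] = 1 if binary else x[i] + 1
    return x

vocab = ["a", "can", "i", "you"]  # hypothetical vocabulary
print(bag_of_words("I can can a can", vocab))                # [1, 1, 1, 0]
print(bag_of_words("I can can a can", vocab, binary=False))  # [1, 3, 1, 0]
```

Word order is discarded: "can I can" and "I can can" map to the same vector.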

31 The Label Space Y
$\hat{y} = \arg\max_{y \in \mathcal{Y}} f(y; w, x)$
- $\mathcal{Y}$: label space (output space)
  - Binary classification: $\mathcal{Y} = \{-1, 1\}$
  - Regression: $\mathcal{Y} = \mathbb{R}$
  - Multi-class classification: $\mathcal{Y} = \{1, 2, \ldots, K\}$
  - Structured prediction: $\mathcal{Y}$ = {structured objects}
    - sequences of labels, parse trees, etc.
    - represented by multiple variables with constraints
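Only the output space changes across these problem types; the argmax stays the same. A minimal sketch of that idea, assuming toy inputs, weights, and score functions (all values made up, all names hypothetical) and enumerating $\mathcal{Y}$ explicitly, which is only feasible when the output space is small:

```python
from itertools import product

def predict(f, w, x, output_space):
    """Inference: y_hat = argmax over y in output_space of f(y; w, x), by enumeration."""
    return max(output_space, key=lambda y: f(y, w, x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = [1.0, 0.0, 2.0]  # toy input vector

# Binary classification: Y = {-1, 1}, f(y; w, x) = y * (w . x)
w_bin = [0.5, -1.0, 0.25]
print(predict(lambda y, w, x: y * dot(w, x), w_bin, x, [-1, 1]))

# Multi-class: Y = {1, ..., K}, one weight vector per class, f(y; w, x) = w_y . x
w_multi = {1: [1.0, 0.0, 0.0], 2: [0.0, 0.0, 1.0], 3: [0.0, 1.0, 0.0]}
print(predict(lambda y, w, x: dot(w[y], x), w_multi, x, [1, 2, 3]))

# Structured: Y = all length-3 sequences over {0, 1}
def f_struct(y, w, x):
    # unary: reward label 1 where x_i > 0 and label 0 where x_i == 0
    unary = sum(w["unary"] for xi, yi in zip(x, y) if (xi > 0) == (yi == 1))
    # pairwise: add w["pair"] (negative here) for every pair of adjacent 1 labels
    pairwise = sum(w["pair"] for a, b in zip(y, y[1:]) if a == b == 1)
    return unary + pairwise

print(predict(f_struct, {"unary": 1.0, "pair": -0.5}, x, list(product([0, 1], repeat=3))))
```

The three calls print `1`, `2`, and `(1, 0, 1)`. The structured output space here already has $2^3$ elements; with realistic lengths and label sets, enumeration becomes hopeless, which is why the inference algorithms on the other slides matter.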

32 Algorithms/models for structured prediction
- Many learning algorithms can be generalized to the structured case
  - Perceptron → Structured perceptron
  - SVM → Structured SVM
  - Logistic regression → Conditional random field (a.k.a. log-linear models)
- Can be solved by a reduction stack
  - Structured prediction → multi-class → binary
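As an illustration of the first generalization only (structured SVMs and CRFs are not shown), here is a minimal structured perceptron sketch. It assumes a hypothetical joint feature map $\phi(x, y)$ built from (word, tag) and (previous tag, tag) counts, a linear score $f(y; w, x) = w \cdot \phi(x, y)$, and brute-force inference over all tag sequences, which is only feasible for toy inputs; a real tagger would replace `predict` with a dynamic-programming decoder. The update is the perceptron rule lifted to structures: on a mistake, add the features of the true output and subtract those of the predicted one.

```python
from collections import Counter
from itertools import product

def phi(words, tags):
    """Hypothetical joint feature map phi(x, y): counts of (word, tag) and (previous tag, tag)."""
    feats = Counter()
    prev = "<s>"  # start-of-sentence marker
    for word, tag in zip(words, tags):
        feats[("emit", word, tag)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats

def score(w, words, tags):
    """Linear score f(y; w, x) = w . phi(x, y); a Counter returns 0 for unseen features."""
    return sum(w[k] * v for k, v in phi(words, tags).items())

def predict(w, words, tagset):
    """Inference by brute force: argmax over all len(tagset)**len(words) tag sequences."""
    return max(product(tagset, repeat=len(words)), key=lambda y: score(w, words, y))

def train(data, tagset, epochs=5):
    """Structured perceptron: on a mistake, w += phi(x, y) - phi(x, y_hat)."""
    w = Counter()
    for _ in range(epochs):
        for words, gold in data:
            guess = predict(w, words, tagset)
            if list(guess) != list(gold):
                w.update(phi(words, gold))
                w.subtract(phi(words, guess))
    return w

TAGSET = ("Pro", "Md", "Vb", "Dt", "Nn")
data = [("I can can a can".split(), "Pro Md Vb Dt Nn".split())]
w = train(data, TAGSET)
print(predict(w, data[0][0], TAGSET))
```

On this single toy example, one update is enough and the trained weights reproduce `('Pro', 'Md', 'Vb', 'Dt', 'Nn')`. In general the perceptron's convergence guarantee requires the data to be separable by some $w$.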

33 Representation Challenges
- How to obtain features?
(Same Christopher Robin passage as slide 11.)

34 Representation Challenges
- How to obtain features?
  1. Design features based on domain knowledge
     - E.g., by patterns in parse trees: When Chris was three years old, his father wrote a poem about him.
     - By nicknames: Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm.
     - Need human experts/knowledge
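Hand-designed features are functions a human writes from domain knowledge. The sketch below encodes two hypothetical co-reference cues for a pair of mentions from the passage: a nickname test (the shorter first name is a prefix of the longer one, as with "Chris" and "Christopher") and a shared-surname test (as with "Mr. Robin" and "Christopher Robin"). Both rules are illustrative assumptions, and their gaps (e.g., "Bill" is not a prefix of "William") show why such features need human experts/knowledge.

```python
def first_and_last(mention):
    """Split a mention like 'Christopher Robin' into (first token, last token), lower-cased."""
    tokens = mention.lower().split()
    return tokens[0], tokens[-1]

def nickname_match(a, b):
    """Hypothetical rule: the shorter first name is a proper prefix of the longer one."""
    first_a, first_b = first_and_last(a)[0], first_and_last(b)[0]
    short, long_ = sorted((first_a, first_b), key=len)
    return short != long_ and long_.startswith(short)

def same_surname(a, b):
    """Hypothetical rule: both mentions have at least two tokens and share the last one."""
    if len(a.split()) < 2 or len(b.split()) < 2:
        return False
    return first_and_last(a)[1] == first_and_last(b)[1]

def pair_features(a, b):
    """Feature dictionary for a candidate co-referent mention pair."""
    return {"nickname": nickname_match(a, b), "same_surname": same_surname(a, b)}

for pair in [("Chris", "Christopher Robin"), ("Mr. Robin", "Christopher Robin"), ("Chris", "Mr. Robin")]:
    print(pair, pair_features(*pair))
```

Note that neither cue links "Chris" to "Mr. Robin" directly; that answer only follows by combining the two pairwise decisions through "Christopher Robin".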

35 Representation Challenges
- How to obtain features?
  1. Design features based on domain knowledge
  2. Design feature templates and then let the machine find the right ones
     - E.g., use all words, pairs of words, ...
(Same Christopher Robin passage as slide 11.)

36 Representation Challenges
- How to obtain features?
  1. Design features based on domain knowledge
  2. Design feature templates and then let the machine find the right ones
- Challenges:
  - # features can be very large
    - # English words: 171K (Oxford)
    - # bigrams: $171\text{K}^2 \approx 3 \times 10^{10}$; # trigrams?
  - For some domains, it is hard to design features
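A minimal sketch of feature templates, assuming two hypothetical templates (unigram and bigram) applied to a sentence from the passage. Each template is instantiated at every position of the input. The last lines redo the slide's arithmetic for a 171K-word vocabulary, including the trigram case the slide asks about; these are upper bounds, since most of the possible n-grams never occur.

```python
def template_features(words):
    """Instantiate two hypothetical templates: 'unigram=w' and 'bigram=w1_w2'."""
    feats = [f"unigram={w}" for w in words]
    feats += [f"bigram={a}_{b}" for a, b in zip(words, words[1:])]
    return feats

print(template_features("Robin is alive and well".split()))

# Upper bounds on the feature-space size for a vocabulary of 171K words
V = 171_000
print(f"bigrams:  {V**2:.1e}")  # 2.9e+10, i.e. about 3 x 10^10
print(f"trigrams: {V**3:.1e}")  # 5.0e+15, i.e. about 5 x 10^15
```

A five-word sentence yields only nine active features, but each comes from a space of tens of billions of possible bigram features, which is why such feature vectors are stored sparsely.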

37 Representation learning
- Learn compact representations of features
  - Combinatorial (continuous representation)

38 Representation learning
- Learn compact representations of features
  - Combinatorial (continuous representation)
  - Hierarchical/compositional

39 What will you learn from this course?
- Structured prediction
  - Models / inference / learning
- Representation (deep) learning
  - Input/output representations
- Combining structured models and deep learning

40 What to Read?
- Machine learning: ICML, NIPS, ECML, AISTATS, ICLR, JMLR, MLJ
- Natural language processing: ACL, NAACL, EACL, EMNLP, CoNLL, COLING, TACL
- Computer vision: ICCV, CVPR
- Data mining: KDD, ICDM, CIKM, SDM
- Artificial intelligence: AAAI, IJCAI, UAI, JAIR

41 This course
- New course, offered for the first time
  - Comments are welcome
- Designed for first- or second-year PhD students
- Lectures + student presentations
- I assume
  - programming experience (for the final project)
  - probability, calculus, and linear algebra (HW0)
  - basic ML background: AI, ML, NLP, CV

42 Staff
- Instructor: Kai-Wei Chang
  - ml16@kwchang.net
  - Office: R412 Rice Hall
  - Office hour: 14:00-15:00, Tue.
- TAs (office: R432 Rice Hall):
  - Wasi Ahmad, wua4nw@virginia.edu
    - Supplementary session: 17:00-18:00, Tue.
  - Md Rizwan Parvez, mp5eb@virginia.edu
    - TA hour: 14: :00, Thu.

43 Grading (tentative)
- Lectures & forum
  - Participate in discussion (bonus credits)
- Review quizzes (30%): 3 review quizzes
- Homework sets (15%): 3 homework sets
- Paper presentation (15%)
- Final project (40%)
- No rounding/ceiling on final scores

44 Quizzes
- Format
  - Multiple-choice questions
  - Fill-in-the-blank
  - Short-answer questions
- Each quiz: ~30 min in class
- Schedule: see course website
- Closed book, closed notes, closed laptop

45 Homework sets
- Format:
  - Math problems
  - Programming problems
- Schedule: see course website

46 Paper presentation
- Each group has 2~3 students
- Pick one slot at:
  - Register your choice early
- 25~30 min presentation + Q&A
- Graded by the instructor, the TAs, and other students
- Presentations start on 2/1

47 Final Project
- Work in groups (2~3 students)
- Project proposal
  - Written report, 2 pages maximum
- Project report
  - < 8 pages, NIPS format
  - Due 2 days before the final presentation
- Project presentation (15%)
  - ~5-min in-class presentation

48 No idea?

49 Typical project topics
- New idea/model for a well-known problem
- New application for a model
- Implementation of an old idea
  - Reproduce results of a paper
  - Implement algorithms using a different framework/programming language
- Contact me if you want some ideas

50 Late Policy
- A credit of 48 hours for all assignments
  - Including the proposal and final project
  - No accumulation
  - No further grace period
- No make-up exams or late homework
  - unless in an emergency

51 Cheating/Plagiarism
- No. Ask if you have concerns.
- UVA Honor Code:

52 Lectures and office hours
- Participation is highly appreciated!
  - Ask questions about anything
  - Feedback is welcome
- Lead discussions in this class
- Enroll in Piazza

53 Waiting list
- Start attending the first few meetings of the class as if you are registered. Given that some students will drop the class, some space will free up.
