Lecture 1: Introduction
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
CS6501 - Advanced Machine Learning
What is this course about?
- You've learned how to make binary and multiclass predictions
- But many real-world problems are more complex than that
- This course focuses on:
  - exciting techniques developed in machine learning for problems that require making complex decisions
Machine learning 101
Perceptron, decision tree, support vector machine, k-NN, Naïve Bayes, logistic regression.
Classification is generally well-understood
- Theoretically: generalization bounds
  - Number of examples needed to train a good model
- Algorithmically:
  - Efficient algorithms for large data sets
    - E.g., it takes a few seconds to train a linear SVM on data with millions of instances and features
  - Algorithms for non-linear models
    - E.g., kernel methods
Is this enough to solve all real-world problems?
m=40, n=10
Machine Translation
Self-driving car
Reading Comprehension
Q: [Chris] = [Mr. Robin]?
Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book
Slide modified from Dan Roth
Complex Decision Structure
Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book
Co-reference Resolution
Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book
Challenges
[Figure: overlapping text excerpts, including the Christopher Robin passage; the remaining text is not recoverable from the extraction]
- Modeling challenges: how to model a complex decision?
  - Structured prediction models
- Representation challenges: how to extract features?
  - Deep learning models
- Algorithmic challenges: large amount of data and complex decision structure
  - Inference / learning algorithms
This Lecture
- Course overview
- The key challenges & solutions (that we know so far)
- What will you learn from this course?
- Course information
Modeling Challenges
- How to model a complex decision?
- Why is this important?
Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book
Language is structural
Handwriting recognition
- What is this letter?
Visual recognition
Human body recognition
Structured Prediction
Assign values to a set of interdependent output variables
Task | Input | Output
Part-of-speech tagging | They operate ships and banks. | Pronoun Verb Noun And Noun
Dependency parsing | They operate ships and banks. | [dependency tree over: Root They operate ships and banks.]
Segmentation | [image] | [image]
Bridge the gap
- Simple classifiers are not designed to handle complex output
- Need to make multiple decisions jointly
- Example: POS tagging: "can you can a can as a canner can can a can"
Example from Vivek Srikumar
Make multiple decisions jointly
- Example: POS tagging: "can you can a can as a canner can can a can"
- Each part needs a label
  - Assign a tag (V., N., A., ...) to each word in the sentence
- The decisions are mutually dependent
  - Cannot have a verb followed by a verb
- Results are evaluated jointly
Structured prediction problems
- Problems that
  - have multiple interdependent output variables
  - and whose output assignments are evaluated jointly
- Need a joint assignment to all the output variables
  - We call it joint inference, global inference, or simply inference
A General learning setting
- Input: $x \in \mathcal{X}$
- Truth: $y^* \in \mathcal{Y}(x)$
- Predicted: $h(x) \in \mathcal{Y}(x)$
- Loss: $\mathrm{loss}(y, y^*)$
Example: "I can can a can"
  Pro Md Vb Dt Nn
  Pro Md Md Dt Vb
  Pro Md Md Dt Nn
  Pro Md Nn Dt Md
  Pro Md Nn Dt Vb
Goal: make a joint prediction to minimize a joint loss
find $h \in \mathcal{H}$ such that $h(x) \in \mathcal{Y}(x)$, minimizing $\mathbb{E}_{(x,y) \sim D}[\mathrm{loss}(y, h(x))]$ based on $N$ samples $(x_n, y_n) \sim D$
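To make the joint loss concrete, here is a minimal sketch that scores the candidate tag sequences above with Hamming loss and averages the loss over samples. Two things are assumptions for illustration: the slide does not say which loss is used (Hamming loss is one common choice for sequences), and the first listed tag sequence is treated as the truth. The names `hamming_loss`, `empirical_risk`, and `constant_h` are hypothetical.

```python
from typing import Sequence

Tags = Sequence[str]


def hamming_loss(y_true: Tags, y_pred: Tags) -> int:
    """Count the positions where the predicted tag differs from the true tag."""
    if len(y_true) != len(y_pred):
        raise ValueError("tag sequences must have the same length")
    return sum(t != p for t, p in zip(y_true, y_pred))


def empirical_risk(h, samples, loss=hamming_loss) -> float:
    """Average loss of predictor h over N samples (x_n, y_n)."""
    return sum(loss(y, h(x)) for x, y in samples) / len(samples)


sentence = ("I", "can", "can", "a", "can")
# Assumption: the first tag sequence on the slide is taken as the truth.
truth = ("Pro", "Md", "Vb", "Dt", "Nn")
candidates = [
    ("Pro", "Md", "Md", "Dt", "Vb"),
    ("Pro", "Md", "Md", "Dt", "Nn"),
    ("Pro", "Md", "Nn", "Dt", "Md"),
    ("Pro", "Md", "Nn", "Dt", "Vb"),
]
losses = [hamming_loss(truth, c) for c in candidates]
print(losses)  # [2, 1, 2, 2]


def constant_h(x):
    """A hypothetical predictor that always outputs the second candidate."""
    return candidates[1]


risk = empirical_risk(constant_h, [(sentence, truth)])
print(risk)  # 1.0
```

The empirical average over the $N$ samples stands in for the expectation over $D$, which cannot be computed directly.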
Combinatorial output space
- Input: $x \in \mathcal{X}$
- Truth: $y^* \in \mathcal{Y}(x)$
- Predicted: $h(x) \in \mathcal{Y}(x)$
- Loss: $\mathrm{loss}(y, y^*)$
Example: "I can can a can"
  Pro Md Vb Dt Nn
  Pro Md Md Dt Vb
  Pro Md Md Dt Nn
  Pro Md Nn Dt Md
  Pro Md Nn Dt Vb
# POS tags: 45
How many possible outputs are there for a sentence with 10 words? $45^{10} \approx 3.4 \times 10^{16}$
Observation: Not all sequences are valid, and we don't need to consider all of them
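The count above, and the observation that not all sequences are valid, can be checked with a short sketch. The three-tag set and the single forbidden pair (verb followed by verb) are toy assumptions chosen to keep the enumeration small; `count_all` and `count_valid` are hypothetical helper names.

```python
from itertools import product


def count_all(num_tags: int, length: int) -> int:
    """Number of tag sequences when every position can take any tag."""
    return num_tags ** length


def count_valid(tag_set, length, forbidden) -> int:
    """Count sequences with no forbidden adjacent pair, by enumeration (toy sizes only)."""
    return sum(
        all((a, b) not in forbidden for a, b in zip(seq, seq[1:]))
        for seq in product(tag_set, repeat=length)
    )


total = count_all(45, 10)
print(f"{total:.1e}")  # 3.4e+16

# Toy setting: 3 tags, 3 words, and "V followed by V" is not allowed.
toy_total = count_all(3, 3)
toy_valid = count_valid(("V", "N", "A"), 3, {("V", "V")})
print(toy_valid, "of", toy_total)  # 22 of 27
```

Enumeration is only feasible for toy sizes, which is the point of the slide: at 45 tags and 10 words the space is far too large to list.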
Representation of interdependent output variables
- A compact way to represent output combinations
  - Abstract away unnecessary complexities
  - We know how to process them
    - Graph algorithms for linear chains, trees, etc.
[Figure: the tag sequence "Pronoun Verb Noun And Noun" and a dependency tree over "Root They operate ships and banks."]
A General Formula
$\hat{y} = \arg\max_{y \in \mathcal{Y}} f(y; w, x)$
where $x$ is the input, $w$ the model parameters, and $\mathcal{Y}$ the output space
- Inference/Test: given $w$, $x$, solve the argmax
- Learning/Training: find a good $w$
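For a linear-chain output such as a tag sequence, the argmax can be solved exactly with dynamic programming (the Viterbi algorithm). The sketch below assumes that $f(y; w, x)$ decomposes into per-position scores plus scores on adjacent tag pairs; that decomposition, the toy numbers, and the names `emit`, `trans`, `score`, and `viterbi` are illustrative assumptions, not something the slide specifies.

```python
from itertools import product


def score(tags, emit, trans):
    """f(y; w, x) under the assumed decomposition: per-position plus adjacent-pair scores."""
    total = sum(emit[i][t] for i, t in enumerate(tags))
    total += sum(trans[(a, b)] for a, b in zip(tags, tags[1:]))
    return total


def viterbi(emit, trans, tag_set):
    """Exact argmax over all tag sequences in O(length * |tags|^2) time."""
    # best[i][t]: best score of any prefix of length i + 1 that ends in tag t
    best = [{t: emit[0][t] for t in tag_set}]
    # back[i - 1][t]: best previous tag when position i takes tag t
    back = []
    for i in range(1, len(emit)):
        scores, pointers = {}, {}
        for t in tag_set:
            prev = max(tag_set, key=lambda p: best[-1][p] + trans[(p, t)])
            scores[t] = best[-1][prev] + trans[(prev, t)] + emit[i][t]
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tag_set, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return tuple(reversed(path))


# Made-up scores standing in for w and x: two tags, three words.
tag_set = ("N", "V")
emit = [{"N": 1, "V": 2}, {"N": 2, "V": 2}, {"N": 1, "V": 0}]
trans = {("N", "N"): 0, ("N", "V"): 1, ("V", "N"): 1, ("V", "V"): -5}

y_hat = viterbi(emit, trans, tag_set)
# Brute-force check over all |tags|^length sequences (feasible only for toy sizes).
brute = max(product(tag_set, repeat=len(emit)), key=lambda y: score(y, emit, trans))
print(y_hat, score(y_hat, emit, trans), score(brute, emit, trans))
```

Several sequences can tie for the best score, so the comparison with brute force is on the score rather than on the sequence itself.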
The Input x
$\hat{y} = \arg\max_{y \in \mathcal{Y}} f(y; w, x)$
x: representation of the input
- Feature extraction: mapping a domain element into a representation
  - Typically $x \in \mathbb{R}^n$ or $x \in \{0,1\}^n$
  - E.g., bag-of-words
  - Can be obtained by a (deep) neural network
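A minimal bag-of-words sketch shows both representations mentioned above, a count vector in $\mathbb{R}^n$ and an indicator vector in $\{0,1\}^n$. The vocabulary and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a text to a count vector with one dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def binary_bag_of_words(text, vocabulary):
    """Map a text to an indicator vector in {0,1}^n."""
    return [int(c > 0) for c in bag_of_words(text, vocabulary)]


# Hypothetical six-word vocabulary; "ship" does not occur in the text.
vocabulary = ["a", "as", "can", "canner", "ship", "you"]
text = "can you can a can as a canner can can a can"
print(bag_of_words(text, vocabulary))         # [3, 1, 6, 1, 0, 1]
print(binary_bag_of_words(text, vocabulary))  # [1, 1, 1, 1, 0, 1]
```

Note that this representation discards word order, so all six occurrences of "can" look the same regardless of their role in the sentence.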
The Label Space Y
$\hat{y} = \arg\max_{y \in \mathcal{Y}} f(y; w, x)$
Y: label space (output space)
- Binary classification: $\mathcal{Y} = \{-1, 1\}$
- Regression: $\mathcal{Y} = \mathbb{R}$
- Multi-class classification: $\mathcal{Y} = \{1, 2, \ldots, K\}$
- Structured prediction: $\mathcal{Y} = \{\text{structured objects}\}$
  - sequences of labels, parse trees, etc.
  - represented by multiple variables with constraints
Algorithms/models for structured prediction
- Many learning algorithms can be generalized to the structured case
  - Perceptron → Structured perceptron
  - SVM → Structured SVM
  - Logistic regression → Conditional random field (a.k.a. log-linear models)
- Can be solved by a reduction stack
  - Structured prediction → multi-class → binary
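As one example of the generalization, the following is a rough sketch of a structured perceptron for tagging. It is a toy under stated assumptions: the feature map (word-tag and tag-tag indicator counts), the brute-force inference over all tag sequences, and the single training sentence are all chosen for brevity, and the function names are hypothetical. Practical implementations would replace the enumeration with an efficient inference algorithm.

```python
from collections import Counter
from itertools import product


def features(words, tags):
    """Joint feature map phi(x, y): counts of (word, tag) and (tag, next tag) pairs."""
    phi = Counter()
    for word, tag in zip(words, tags):
        phi[("emit", word, tag)] += 1
    for a, b in zip(tags, tags[1:]):
        phi[("trans", a, b)] += 1
    return phi


def predict(words, weights, tag_set):
    """Inference: argmax over all tag sequences of w . phi(x, y), by enumeration."""
    def score(tags):
        return sum(weights.get(k, 0) * v for k, v in features(words, tags).items())
    return max(product(tag_set, repeat=len(words)), key=score)


def train(data, tag_set, epochs=5):
    """Perceptron update: on a mistake, w += phi(x, gold) - phi(x, guess)."""
    weights = Counter()
    for _ in range(epochs):
        for words, gold in data:
            guess = predict(words, weights, tag_set)
            if tuple(guess) != tuple(gold):
                weights.update(features(words, gold))
                weights.subtract(features(words, guess))
    return weights


tag_set = ("Pro", "Md", "Vb", "Dt", "Nn")
words = ("I", "can", "can", "a", "can")
gold = ("Pro", "Md", "Vb", "Dt", "Nn")  # assumed gold tagging for this toy example

weights = train([(words, gold)], tag_set)
prediction = predict(words, weights, tag_set)
print(prediction)
```

The only change from the binary perceptron is that prediction is an argmax over structured outputs and the update uses the difference of two joint feature vectors.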
Representation Challenges
- How to obtain features?
Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book
Representation Challenges
- How to obtain features?
1. Design features based on domain knowledge
  - E.g., by patterns in parse trees
    "When Chris was three years old, his father wrote a poem about him."
  - By nicknames
    "Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm."
  - Need human experts/knowledge
Representation Challenges
- How to obtain features?
1. Design features based on domain knowledge
2. Design feature templates and then let the machine find the right ones
  - E.g., use all words, pairs of words, ...
Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book
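A feature template can be sketched as a function that instantiates features mechanically from the text. The example below assumes that "pairs of words" means adjacent pairs (bigrams) and uses whitespace tokenization; `ngrams` and `template_features` are hypothetical names.

```python
def ngrams(tokens, n):
    """All runs of n adjacent tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def template_features(tokens, max_n=2):
    """Instantiate the templates 'every word' and 'every adjacent pair of words'."""
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(tokens, n))
    return feats


tokens = "his father wrote a poem about him".split()
feats = template_features(tokens)
print(len(feats))  # 13: 7 single words plus 6 adjacent pairs
print(feats[7])    # ('his', 'father')
```

The learning algorithm then decides which of the generated features receive non-zero weight.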
Representation Challenges
- How to obtain features?
1. Design features based on domain knowledge
2. Design feature templates and then let the machine find the right ones
  - Challenges:
    - # features can be very large
      - # English words: 171K (Oxford)
      - # bigrams: $(171\mathrm{K})^2 \approx 3 \times 10^{10}$; # trigrams?
    - For some domains, it is hard to design features
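The growth in the number of possible features can be worked out directly. This sketch takes the 171K vocabulary size from the slide and counts every ordered combination of words as a potential n-gram feature, which is an upper bound since most combinations never occur in real text.

```python
vocab_size = 171_000  # English word count quoted on the slide (Oxford)

# Upper bound on distinct n-gram features: vocab_size ** n.
ngram_counts = {n: vocab_size ** n for n in (1, 2, 3)}
for n, count in ngram_counts.items():
    print(n, f"{count:.1e}")
# 1 1.7e+05
# 2 2.9e+10
# 3 5.0e+15
```

Under this bound, each step from unigrams to bigrams to trigrams multiplies the feature space by the vocabulary size.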
Representation learning
- Learn compact representations of features
  - Combinatorial (continuous representation)
  - Hierarchical/compositional
What you will learn from this course
- Structured prediction
  - Models / inference / learning
- Representation (deep) learning
  - Input/output representations
- Combining structured models and deep learning
What to Read?
- Machine learning: ICML, NIPS, ECML, AISTATS, ICLR, JMLR, MLJ
- Natural language processing: ACL, NAACL, EACL, EMNLP, CoNLL, Coling, TACL
- Computer vision: ICCV, CVPR
- Data mining: KDD, ICDM, CIKM, SDM
- Artificial intelligence: AAAI, IJCAI, UAI, JAIR
This course
- New course, offered for the first time
  - Comments are welcome
- Designed for first- or second-year PhD students
- Lecture + student presentations
- I assume
  - programming experience (for the final project)
  - probability, calculus, and linear algebra (HW0)
  - basic ML background: AI, ML, NLP, CV
Staff
- Instructor: Kai-Wei Chang
  - Email: ml16@kwchang.net
  - Office: R412 Rice Hall
  - Office hour: 14:00-15:00, Tue.
- TAs (Office: R432 Rice Hall):
  - Wasi Ahmad, wua4nw@virginia.edu
    - Supplementary session: 17:00-18:00, Tue.
  - Md Rizwan Parvez, mp5eb@virginia.edu
    - TA hour: 14:00-15:00, Thu.
Grading (tentative)
- Lectures & forum
  - Participate in discussion (bonus credits)
- Review quizzes (30%): 3 review quizzes
- Homework sets (15%): 3 homework sets
- Paper presentation (15%)
- Final project (40%)
- No rounding/ceiling on final scores
Quizzes
- Format
  - Multiple-choice questions
  - Fill-in-the-blank
  - Short-answer questions
- Each quiz: ~30 min in class
- Schedule: see course website
- Closed book, closed notes, closed laptop
Homework set
- Format:
  - Math problems
  - Programming problems
- Schedule: see course website
Paper presentation
- Each group has 2~3 students
- Pick one slot at: https://goo.gl/usy5ta
  - Register your choice early
- 25~30 min presentation + Q&A
- Will be graded by the instructor, TAs, and other students
- Starting from 2/1
Final Project
- Work in groups (2~3 students)
- Project proposal
  - Written report, 2-page maximum
- Project report
  - < 8 pages, NIPS format
  - Due 2 days before the final presentation
- Project presentation (15%)
  - ~5-min in-class presentation
No idea?
Typical project topics
- New idea/model for a well-known problem
- New application for a model
- Implementation of an old idea
  - Reproduce the results of a paper
  - Implement algorithms using a different framework/programming language
- Contact me if you want some ideas
Late Policy
- Credit of 48 hours for all the assignments
  - Including the proposal and final project
  - No accumulation
- No further grace period
- No make-up exams or late homework
  - unless in an emergency situation
Cheating/Plagiarism
- No. Ask if you have concerns
- UVA Honor Code: http://www.virginia.edu/honor/
Lectures and office hours
- Participation is highly appreciated!
  - Ask questions about anything
  - Feedback is welcome
  - Lead discussion in this class
- Enroll in Piazza: https://piazza.com/virginia/fall2017/cs6501001
Waiting list
- Start attending the first few meetings of the class as if you were registered. Given that some students will drop the class, some space will free up.