Introduction to Machine Learning: Course Logistics

Varun Chandola

January 31, 2018

Contents

1 Class Details
2 Syllabus
3 Textbooks
4 Grading
5 Gradiance
6 Python
7 Socrative Online
8 Honor Code
9 Checklist and Resources
10 Warmup

1 Class Details

Lecture Information
Monday, Wednesday, Friday (9.00-9.50 AM), 109 Knox

Recitations
1. 10.00-10.50 AM Monday, Norton 210
2. 01.00-01.50 PM Tuesday, Bell 337
3. 08.00-08.50 AM Friday, Cooke 17a

Recitation topics are listed in the syllabus. No recitation this week.

Class web page
http://www.cse.buffalo.edu/~chandola/machinelearning.html
https://piazza.com/buffalo/spring2018/cse474/home

Instructor
Varun Chandola, http://www.cse.buffalo.edu/~chandola
Email: chandola@buffalo.edu
Office: 304 Davis Hall
Phone: (716) 645-4747
Office Hours: 1.00 PM - 3.00 PM (Mondays)
Teaching Assistants
Xin Ma, Email: xma4@buffalo.edu, Office Hours: 10.00 AM - 11.00 AM (Fridays)
Rudra Prasad Bakshi, Email: rudrapra@buffalo.edu, Office Hours: 11.30 AM - 1.30 PM (Wednesdays)
Hongfei Xue, Email: hongfeix@buffalo.edu, Office Hours: 4.00 PM - 5.00 PM (Mondays)

Piazza
Primary medium of communication. All announcements, teaching notes, slides, polls, etc. are available through Piazza.
Questions?
1. General post to all (name will be visible); choose the appropriate folder.
2. Private post to instructor, TAs.
Interact!

2 Syllabus

Theoretical Machine Learning
Concept Learning
Mistake Bound Online Learning
Vapnik-Chervonenkis Dimension
PAC Learning
Statistical Learning Theory

Machine Learning Tools
Bayesian Inference
Expectation Maximization
Optimization

Machine Learning Algorithms
Linear Regression
Linear Classification
Neural Networks
Support Vector Machines
Kernel Methods
Latent Space Models (PCA)
Mixture Models
Bayesian Networks

3 Textbooks

No prescribed text. Primary references:
Tom Mitchell, Machine Learning. McGraw-Hill, 1997.
Kevin Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
Optional reading list:
David MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003. http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning. Springer, 2009.
Chris Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
Richard Duda, Peter Hart and David Stork, Pattern Classification, 2nd ed. John Wiley & Sons, 2001.
David Barber, Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012. http://web4.cs.ucl.ac.uk/staff/d.barber/textbook/09113.pdf

4 Grading

Grading Scheme
Short weekly quizzes using Gradiance (12): 20%
Programming Assignments (3): 30%
Mid-term Exam (in-class, open book/notes) on 03/16/2018: 20%
Final Exam (in-class, open book/notes) on 05/16/2018: 30%

All components will be individually curved.

Final grade:
A  [92.5, 100]
A- [87.5, 92.5)
B+ [82.5, 87.5)
B  [77.5, 82.5)
B- [72.5, 77.5)
C+ [67.5, 72.5)
C  [62.5, 67.5)
C- [57.5, 62.5)

Use UBLearns for all electronic submissions.

The final exam will not be comprehensive. All multiple-choice objective problems; no partial credit.

5 Gradiance

An online quiz system.
One quiz per week, released on Monday by 8.59 AM and due the next Sunday by 11.59 PM.
3-4 multiple choice problems about topics covered that week; all material covered in class.
A warm up quiz (ungraded) is posted.
5-minute delay between successive submissions.
Only 3 tries allowed; the maximum score will be used.
Every wrong answer will result in 1 negative point per try.

Gradiance Enrollment
Go to http://www.newgradiance.com/services
Register and use the class token FC4761F5.
Make sure you register using your UBIT name as the username. No other username will be accepted.
6 Python

All programming assignments and class demonstrations use Python.

Resources:
Github Repo: https://github.com/ubdsgroup/ubmlcourse
Installing python, ipython
Python IDE - Canopy
More about ipython notebooks: http://nbviewer.ipython.org/github/ubdsgroup/ubmlcourse/tree/master/notebooks/
Python for Developers, a complete book on Python programming by Ricardo Duarte
CodeAmerica - Python
An introduction to machine learning with Python and scikit-learn (repo and overview) by Hannes Schulz and Andreas Mueller

7 Socrative Online

Online student response system. Random number generator!
http://m.socrative.com/student/
Enter class ID - 5943
Optional.

8 Honor Code

Academic Integrity and Honor Code
http://www.cse.buffalo.edu/shared/policies/academic.php

It is against the ML honor code to:
1. Collaborate on Gradiance quizzes
2. Collaborate or cheat during exams
3. Submit someone else's work, including from the internet, as one's own for any submission
4. Misuse the Piazza forum

You are allowed to:
1. Have discussions about homeworks. Every student should submit their own homework, with the names of the students in the discussion group explicitly mentioned.
2. Collaborate in groups of 2 or 3 for programming assignments. One submission is required for each group.

Violation of the ML honor code and departmental policy will result in an automatic F for the concerned submission. Two violations will result in a failing grade for the course.

9 Checklist and Resources

Checklist
1. Sign-up for Piazza
2. Sign-up for Gradiance, try the warm-up quiz
3. Read the department's academic integrity policy

Resources
Piazza - piazza.com/buffalo/spring2018/cse474/home
Video Channel - TBA
Course slides and handouts - www.cse.buffalo.edu/~chandola/machinelearning.html
Github Repo - github.com/ubdsgroup/ubmlcourse
Notebooks - nbviewer.ipython.org/github/ubdsgroup/ubmlcourse/tree/master/

10 Warmup

A fair coin: probability of heads? 5 heads in a row? 5th head after seeing 4 heads in a row?

Gambler's Fallacy: if I know that the probability of two people bringing a bomb on a plane is very low, should I bring a bomb along to make myself safer?

Assuming that the coin is fair (i.e., the probability of observing heads is 0.5) and the tosses are independent, the probability P(fifth head | four heads) will be 0.5, because the event of observing a heads is independent of what has been observed so far. A simple application of Bayes rule reveals the same answer:

P(fifth head | four heads) = P(four heads | fifth head) P(fifth head) / P(four heads)

Note that the event "four heads" is independent of the event "fifth head", since the first four tosses occur before the fifth; this means that P(four heads | fifth head) = P(four heads). Hence P(fifth head | four heads) = P(fifth head) = 0.5. (A short simulation at the end of this section checks this empirically.)

The Gambler's Fallacy says (mistakenly) that if an event happens more frequently in the present, then the chances of it happening later will decrease, and vice versa. Consider a different game in which winning means getting at least 1 head in 4 tosses. In the beginning the probability of winning is:

1 - (1/2)^4 = 93.75%

Now if we toss a tail in the first trial, will the winning probability stay the same or change, and would it increase or decrease? According to the Gambler's Fallacy it should increase. However, the probability of winning actually gets revised to:

1 - (1/2)^3 = 87.5%

So by getting a tails in the first toss, we lower our probability of winning by 6.25 percentage points.

Matrix Vector Products

Let [3, 4] denote a vector in a 2D space. What happens when we multiply it with a number? With a matrix? For a given matrix, can we find a vector such that the matrix-vector product equals a scalar-vector product? That is, for a given matrix A, we are interested in finding a vector x such that:

Ax = λx

where λ is a scalar. The solutions are the set of eigenvectors of A. (A numerical sketch follows below.)
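A minimal sketch of this matrix-vector warmup in the course's language (Python with NumPy); the 2x2 matrix used here is an arbitrary choice for illustration, not one from the slides:

import numpy as np

# Arbitrary 2x2 example matrix (an assumption for illustration).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([3.0, 4.0])

# Multiplying with a number only scales the vector; multiplying with a
# matrix generally changes both its length and its direction.
print(2 * v)   # [6. 8.]
print(A @ v)   # [11. 25.]

# Eigenvectors are the special directions that the matrix only scales:
# each column x of vecs satisfies A x = lambda x for its eigenvalue.
vals, vecs = np.linalg.eig(A)
for lam, x in zip(vals, vecs.T):
    print(lam, np.allclose(A @ x, lam * x))  # True for each eigenpair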
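Returning to the coin-toss questions at the start of this warmup, here is a minimal simulation (all names are my own, for illustration) that estimates P(fifth head | four heads) empirically. For "5 heads in a row", independence gives (1/2)^5 = 1/32 ≈ 3.1%.

import random

def estimate_fifth_given_four(trials=1_000_000, seed=0):
    """Estimate P(fifth head | four heads) for a fair coin."""
    rng = random.Random(seed)
    four = fifth = 0
    for _ in range(trials):
        tosses = [rng.random() < 0.5 for _ in range(5)]
        if all(tosses[:4]):        # first four tosses came up heads
            four += 1
            fifth += tosses[4]     # did the fifth also come up heads?
    return fifth / four

# Prints roughly 0.5, as the Bayes-rule argument predicts: the coin
# has no memory of the first four tosses.
print(estimate_fifth_given_four())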
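The at-least-one-head probabilities behind the Gambler's Fallacy example can be checked the same way:

# P(at least one head in n fair tosses) = 1 - (1/2)**n.
for n in (4, 3):
    print(n, 1 - 0.5 ** n)
# 4 tosses left: 0.9375; after a first tail, 3 tosses left: 0.875.
# The first tail lowers the winning probability, contrary to the fallacy.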