Introduction to Machine Learning

1, DATA11002 Introduction to Machine Learning. Lecturer: Antti Ukkonen. TAs: Saska Dönges and Janne Leppä-aho. Department of Computer Science, University of Helsinki (based in part on material by Patrik Hoyer, Jyrki Kivinen, and Teemu Roos). November 1st to December 14th, 2018.

2, Introduction. Practical details of the course: lectures, exercises, exam, grading. Course outline. What is machine learning? Motivation & examples, definition, relation to other fields, examples.

3, Practical details (1) Lectures: November 1st (today) to December 14th, Thursdays at 2pm-4pm (Physicum D101) and Fridays at 10am-12pm (Exactum A111). Lecturer: Antti Ukkonen (Exactum A341, antti.ukkonen@helsinki.fi). Language: English. Based on the course textbook (next slide).

4, Practical details (2) Textbook: authors: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani; title: An Introduction to Statistical Learning with Applications in R; publisher: Springer (2013, first edition); web page: www-bcf.usc.edu/~gareth/isl/. We'll cover the whole book except splines and generalized additive models (GAMs), and include some additional Bayesian stuff.

5, Practical details (3) Lecture material: This set of slides (by Hoyer/Kivinen/Roos/Ukkonen) is intended for use as part of the actual lectures, together with the blackboard etc. We will cover some topics in more detail than the textbook (and some in less). In particular, some additional detail is needed for homework problems. Both the selected parts of the textbook and the additional material indicated on the course homepage are required material for the exam.

6, Practical details (4) Exercises: Two kinds: mathematical exercises (pencil-and-paper) and computer exercises (support given in R, but Python is a good choice too). A problem set is handed out every Friday, focusing on topics from that week's lectures. Solutions are returned at the exercise sessions. NB: You get points only by attending your own exercise group (not group 99). Solutions can be returned by email only in exceptional circumstances, not including being busy at work: email janne.leppa-aho@helsinki.fi. Language of exercise sessions: English. Exercise points make up 40% of your total grade; you must get at least half the points to be eligible for the course exam.

7, Practical details (5) Exercises this week: This week we offer voluntary R tutorials on Thursday Nov 1st at 4pm (D123) (after this lecture!) and Friday Nov 2nd at 12pm (D123) (after tomorrow's lecture!). Instruction on R and its features used in this course. Voluntary, no points awarded. Recommended for everyone not previously familiar with R. Bring your own laptop, with R (and possibly RStudio) installed.

8, Practical details (6) Course exam (these can sometimes change at short notice!): Tuesday, December 18 at 9:00am (NB: not 9:15am). Makes up 60% of your course grade. You must get at least half the points of the exam to pass the course. Pencil-and-paper problems, similar in style to the exercises (also essay or "explain" problems). You may also answer exam problems in Finnish or Swedish. You may bring a hand-written cheat sheet (one A4)! (Note: To be eligible to take a separate exam you need to first complete some programming assignments. These will be available on the course web page a bit later. However, since you are here at the lecture, this probably does not concern you.)

9, Practical details (7) Prerequisites: Mathematics: Basics of probability theory and statistics, linear algebra (i.e., vectors and matrices) and real analysis (i.e., derivatives, etc.) Computer science: Good programming skills (but no previous familiarity with R necessary)

10, Related courses Various advanced Data Science courses: Advanced Course in Machine Learning (period IV) Statistical Data Science (period II) Computational Statistics I-II (periods I-II) Probabilistic Graphical Models (period III) Introduction to Data Science (period I) Deep Learning (period II) Many seminars also have a strong machine learning flavour! Lots of courses at Aalto as well!

11, Practical details (8) Course material: Webpage (public information about the course): https://courses.helsinki.fi/en/data11002/124843969 NB: You should have signed up at Oodi. Help? Ask the assistants/lecturer at exercises/lectures Contact assistants/lecturer

12, Course outline. Introduction. Ingredients of machine learning: task, models, data; evaluation and model selection. Supervised learning: classification, regression. Unsupervised learning: clustering, dimension reduction.

13, What is machine learning? Definition: machine = computer, computer program (in this course); learning = improving performance on a given task, based on experience / examples. In other words: instead of the programmer writing explicit rules for how to solve a given problem, the programmer instructs the computer how to learn from examples. In many cases the computer program can even become better at the task than the programmer is!

14, Example 1: tic-tac-toe. How to program the computer to play tic-tac-toe? Option A: The programmer writes explicit rules, e.g. "if the opponent has two in a row, and the third position is free, place your mark there", etc. (lots of work, difficult, not at all scalable!) Option B: Go through the game tree, choose optimally (for non-trivial games, must be combined with some heuristics to restrict tree size). Option C: Let the computer try out various strategies by playing against itself and others, noting which strategies lead to winning and which to losing (= "machine learning").
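To make Option B concrete, here is a minimal sketch of an exhaustive game-tree search (minimax) for tic-tac-toe. It is an illustration only, not code from the course: the board encoding (a 9-character string of 'X', 'O' and ' ') and the function names `winner`, `value` and `best_move` are assumptions made for this example.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def other(player):
    return 'O' if player == 'X' else 'X'


@lru_cache(maxsize=None)
def value(board, player):
    """Game value from X's point of view with `player` to move:
    +1 if X wins, 0 for a draw, -1 if O wins, assuming optimal play."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    if ' ' not in board:
        return 0
    results = [value(board[:i] + player + board[i + 1:], other(player))
               for i, cell in enumerate(board) if cell == ' ']
    # X maximizes the value, O minimizes it.
    return max(results) if player == 'X' else min(results)


def best_move(board, player):
    """Index of an optimal move for `player` on a non-full board."""
    moves = [i for i, cell in enumerate(board) if cell == ' ']

    def score(i):
        return value(board[:i] + player + board[i + 1:], other(player))

    return max(moves, key=score) if player == 'X' else min(moves, key=score)
```

Tic-tac-toe is small enough to search completely; for larger games the same recursion would need a depth limit and a heuristic evaluation, as the slide notes.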

15, Arthur Samuel (1950s and 1960s): Computer program that learns to play checkers. The program plays against itself thousands of times and learns which positions are good and which are bad (i.e. which lead to winning and which to losing). The computer program eventually becomes much better than the programmer.

16, Example 2: spam filter. Programmer writes rules: "If it contains 'viagra' then it is spam." (difficult, not user-adaptive) The user marks which mails are spam and which are legit, and the computer learns by itself which words are predictive. [Figure: three example mails with labels. From: medshop@spam.com, Subject: viagra cheap meds... (spam); From: my.professor@helsinki.fi, Subject: important information here's how to ace the test... (non-spam); From: mike@example.org, Subject: you need to see this how to win $1,000,000... (?)]
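One way the computer could learn which words are predictive is to count how often each word appears in mails the user marked as spam versus legit. The sketch below uses a naive Bayes score with add-one smoothing; this particular method, the function names `train` and `spam_score`, and the label 'ham' for legit mail are assumptions for illustration, not something the slides prescribe.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label) pairs, label is 'spam' or 'ham'.
    Returns per-label word counts and the number of mails per label."""
    word_counts = {'spam': Counter(), 'ham': Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def spam_score(text, word_counts, label_counts):
    """Log-odds of spam versus ham under naive Bayes with add-one smoothing.
    Positive means 'more likely spam'. Assumes both labels were seen in training."""
    vocab = set(word_counts['spam']) | set(word_counts['ham'])
    score = math.log(label_counts['spam'] / label_counts['ham'])
    for label, sign in (('spam', 1), ('ham', -1)):
        total = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += sign * math.log((word_counts[label][word] + 1) / total)
    return score


# Toy, made-up training data in the spirit of the slide.
toy_mails = [
    ("viagra cheap meds", "spam"),
    ("cheap meds win money", "spam"),
    ("important information about the test", "ham"),
    ("lecture notes for the test", "ham"),
]
toy_words, toy_labels = train(toy_mails)
```

With this toy data, `spam_score("cheap viagra", toy_words, toy_labels)` is positive and `spam_score("test information", toy_words, toy_labels)` is negative. Unlike a hand-written rule, the scores adapt as the user marks more mails.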

17, Problem setup One definition of machine learning: A computer program improves its performance on a given task with experience (i.e. examples, data). So we need to separate Task: What is the problem that the program is solving? Performance measure: How is the performance of the program (when solving the given task) evaluated? Experience: What is the data (examples, features) that the program is using to improve its performance?

18, Related scientific disciplines (1) Artificial Intelligence (AI): Machine learning can be seen as one approach towards implementing intelligent machines (or at least machines that behave in a seemingly intelligent way). Artificial neural networks, computational neuroscience: Inspired by and trying to mimic the function of biological brains, in order to make computers that learn from experience. Modern machine learning really grew out of the neural networks boom in the 1980s and early 1990s. Pattern recognition: Recognizing objects and identifying people in controlled or uncontrolled settings, from images, audio, etc. Such tasks typically require machine learning techniques.

19, Deep learning? Is a family of machine learning methods. Has become incredibly popular in the past few years. Yields very good results, e.g. in computer vision and speech recognition tasks. Heavily based on classical work on artificial neural network methods. (Fundamentally not really that novel.) Basic principles of ML covered in this course also apply to deep learning!

20, Availability of data These days it is very easy to collect data (sensors are cheap, much information digital) store data (hard drives are big and cheap) transmit data (essentially free on the internet). The result? Everybody is collecting large quantities of data. Businesses: shops (market-basket data), search engines (web pages and user queries), financial sector (asset prices, transaction metadata, etc), manufacturing (sensors of all kinds), social networking sites (user activity, links), anybody with a web server (hits, user activity) Science: genomes sequenced, gene expression data, experiments in high-energy physics, images of remote galaxies, global ecosystem monitoring data, drug research and development, public health data But how to benefit from it? Analysis is becoming key!

21, Big Data. One definition: "data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges" (Oxford English Dictionary). 3V: volume, velocity, and variety (Doug Laney, 2001). A database may be able to handle a lot of data, but you can't really implement a machine learning algorithm as an SQL query. In this course we do not consider technical issues relating to extremely large data sets (check out the courses Introduction to Big Data Management and Big Data Frameworks). The basic principles of machine learning still apply, but many algorithms may be difficult to implement efficiently.

22, Related scientific disciplines (2) Data mining: Trying to identify interesting and useful associations and patterns in huge datasets. Focus on scalable algorithms. Example: shopping basket analysis (frequent itemsets). Statistics: Historically, introductory courses on statistics tend to focus on hypothesis testing and some other basic problems. However, there's a lot more to statistics than hypothesis testing. There is a lot of interaction between research in machine learning, data mining and statistics.
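As a small illustration of shopping basket analysis, the sketch below counts how many baskets contain each pair of items and keeps the pairs that reach a minimum support. It is a brute-force count, not a scalable algorithm such as Apriori; the function name `frequent_pairs` and the example baskets are made up for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {(item_a, item_b): count} for item pairs that occur together
    in at least `min_support` baskets. Pairs are stored in sorted order."""
    counts = Counter()
    for basket in baskets:
        # set() so an item repeated within one basket is counted once
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical market-basket data.
example_baskets = [
    ["bread", "milk"],
    ["bread", "milk", "beer"],
    ["milk", "beer"],
    ["bread", "milk", "diapers"],
]
```

Here `frequent_pairs(example_baskets, 3)` keeps only the pair (bread, milk), which appears in three of the four baskets.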

23, Example 3 Prediction of search queries The programmer provides a standard dictionary (words and expressions change!) Previous search queries are used as examples!
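A minimal sketch of using previous search queries as examples: suggest the most frequent past queries that start with what the user has typed so far. Real systems are far more elaborate; the function name `suggest` and the sample query log are assumptions for illustration.

```python
from collections import Counter


def suggest(prefix, past_queries, k=3):
    """Return up to k past queries starting with `prefix`,
    most frequent first (ties broken alphabetically)."""
    counts = Counter(q.strip().lower() for q in past_queries)
    prefix = prefix.lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:k]]


# Hypothetical query log.
query_log = ["machine learning", "machine learning", "machine translation",
             "mac os", "weather"]
```

Because the suggestions come from what users actually typed, new words and expressions show up as soon as they appear in the log, which a fixed dictionary cannot do.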

24, Example 4 Ranking search results: Various criteria for ranking results What do users click on after a given search? Search engines can learn what users are looking for by collecting queries and the resulting clicks.

25, Example 5 Detecting credit card fraud: Credit card companies typically end up paying for fraud (stolen cards, stolen card numbers). It's thus useful to screen transactions automatically! Important to be adaptive to the behaviors of customers, i.e. learn from existing data how users normally behave, and try to detect unusual transactions (anomaly detection).
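A very simple form of anomaly detection, sketched under the assumption that only the transaction amount is used: learn a customer's normal spending from past amounts and flag a new transaction that lies many standard deviations from the mean. The function name `is_unusual`, the threshold of 3 and the sample history are illustrative choices, not part of the course material.

```python
from statistics import mean, stdev


def is_unusual(amount, history, threshold=3.0):
    """Flag `amount` if it is more than `threshold` sample standard
    deviations away from the mean of the customer's past amounts.
    `history` needs at least two values."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts identical: anything different is unusual.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical past transaction amounts for one customer.
past_amounts = [20, 25, 22, 30, 18, 27]
```

Because the mean and spread are computed per customer, the same amount can be normal for one user and suspicious for another, which is the adaptivity the slide asks for.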

26, Example 6 Self-driving cars: Sensors (radars, cameras) superior to humans How to make the computer react appropriately to the sensor data? Note: The sensors can be broken and deliver incorrect/broken data. Adversarial attacks on computer vision systems!

27, Example 7 Character recognition: Automatically sorting mail (handwritten characters) Digitizing old books and newspapers into easily searchable format (printed characters)

28, Example 8 Recommendation systems ("collaborative filtering"): Amazon: "Customers who bought X also bought Y..." Netflix: "Based on your movie ratings, you might enjoy..." Spotify: "Discover Weekly" playlists. [Figure: matrix of ratings given by the users Linda, Jack, Bill, Lucy and John to the movies Seven, Fargo, Aliens, Leon and Avatar, with some entries marked "?".]
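The idea of filling in a missing rating from the ratings of similar users can be sketched as follows. This is one simple user-based variant (cosine similarity over co-rated movies, then a similarity-weighted average); the function names and the rating values are made up for illustration and are not the values from the slide's figure.

```python
import math


def similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.
    Returns None if no other user with shared ratings has rated it."""
    num = den = 0.0
    for other_user, their in ratings.items():
        if other_user == user or item not in their:
            continue
        w = similarity(ratings[user], their)
        num += w * their[item]
        den += w
    return num / den if den > 0 else None


# Hypothetical ratings on a 1-5 scale.
toy_ratings = {
    "Linda": {"Seven": 5, "Fargo": 4, "Aliens": 1},
    "Jack": {"Seven": 5, "Fargo": 5, "Aliens": 1, "Leon": 5},
    "Bill": {"Seven": 1, "Fargo": 2, "Aliens": 5, "Leon": 1},
}
```

In this toy data Linda's tastes resemble Jack's more than Bill's, so her predicted rating for Leon is pulled towards Jack's 5. Practical systems usually mean-center the ratings first, since raw cosine similarity on all-positive ratings separates users only weakly.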

29, Example 9 Machine translation: Traditionally: statistical machine translation based on example data Most recently: neural machine translation based on deep sequence-to-sequence models

30, Example 10 Online store website optimization: What items to present, what layout? What colors to use? Can significantly affect sales volume Experiment, and analyze the results! (lots of decisions on how exactly to experiment and how to ensure meaningful results)

31, Example 11 Mining chat and discussion forums Breaking news Detecting outbreaks of infectious disease Tracking consumer sentiment about companies / products

32, Example 12 Real-time sales and inventory management: Picking up quickly on new trends (what's hot at the moment?). Deciding on what to produce or order.

33, Example 13 Prediction of friends on Facebook, or prediction of who you'd like to follow on Twitter.

34, What about privacy? Users are surprisingly willing to sacrifice privacy to obtain useful services and benefits. Regardless of what position you take on this issue, it is important to know what can and what cannot be done with various types of information (i.e. what the dangers are). Privacy-preserving data mining: What type of statistics/data can be released without exposing sensitive personal information? (e.g. government statistics) Developing data mining algorithms that limit exposure of user data (e.g. Collaborative filtering with privacy, Canny 2002).

35, We're in this together. Let's do it!