DATA11001 INTRODUCTION TO DATA SCIENCE EPISODE 6: MACHINE LEARNING

TODAY'S MENU
1. WHAT IS ML?
2. CLASSIFICATION AND REGRESSION
3. EVALUATING PERFORMANCE & OVERFITTING

WHAT IS MACHINE LEARNING?
Definition:
- machine = computer, computer program (in this course)
- learning = improving performance on a given task, based on experience / examples
In other words, instead of the programmer writing explicit rules for how to solve a given problem, the programmer instructs the computer how to learn from examples.
In many cases the computer program can even become better at the task than the programmer is!

EXAMPLE: SPAM FILTER
Method #1: The programmer writes rules: "If it contains 'viagra' then it is spam." (difficult, not user-adaptive)
Method #2: The user marks which mails are spam and which are legit, and an ML algorithm is used to construct a classifier.

From: medshop@spam.com / Subject: viagra / "cheap meds..." → spam
From: my.professor@helsinki.fi / Subject: important information / "here's how to ace the exam..." → non-spam
From: mike@example.org / Subject: you need to see this / "how to win $1,000,000..." → ?
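As a concrete illustration of method #2, here is a minimal sketch of learning a spam classifier from user-labeled mails, assuming scikit-learn is available; the tiny training set and its labels are invented for illustration:

```python
# A minimal sketch of "method #2": learn a spam classifier from
# user-labeled examples instead of hand-written rules.
# The tiny training set below is illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_mails = [
    "viagra cheap meds",
    "important information here's how to ace the exam",
    "cheap viagra best price",
    "exam schedule for the course",
]
train_labels = ["spam", "non-spam", "spam", "non-spam"]

# Turn each mail into a vector of word counts (bag of words).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_mails)

# Fit a naive Bayes classifier on the labeled examples.
classifier = MultinomialNB()
classifier.fit(X, train_labels)

# Classify a new, unlabeled mail.
new_mail = vectorizer.transform(["you need to see this: how to win $1,000,000"])
print(classifier.predict(new_mail))  # e.g. ['spam']
```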

MACHINE LEARNING SETTING
One definition of machine learning: a computer program improves its performance on a given task with experience (i.e. examples, data). So we need to separate:
- Task: What is the problem that the program is solving?
- Performance measure: How is the performance of the program (when solving the given task) evaluated?
- Experience: What is the data (examples) that the program is using to improve its performance?

NEIGHBORING DISCIPLINES
- Artificial intelligence (AI): Machine learning can be seen as one approach towards implementing intelligent machines.
- Neural networks, deep learning: Inspired by, and trying to mimic, the function of biological brains, in order to make computers that learn from experience. Modern machine learning really grew out of the neural networks boom of the 1980s and early 1990s.
- Pattern recognition: Recognizing objects and identifying people in controlled or uncontrolled settings, from images, audio, etc. Such tasks typically require machine learning techniques.

NEIGHBORING DISCIPLINES
Statistics:
- Historically, introductory courses on statistics tend to focus on hypothesis testing and some other basic problems such as linear regression.
- There's a lot more to statistics than hypothesis testing.
- There is a lot of interaction between research in machine learning and statistics.

NEIGHBORING DISCIPLINES
[Figure: image from machinelearners.wordpress.com]

KINDS OF MACHINE LEARNING
Supervised machine learning: the task is to predict the correct (or a good) response y given an input x, e.g.:
+ classify samples as normal or abnormal
+ classify emails as spam or legit ("ham")
+ predict movie profits based on director, actors, ...
+ generate text descriptions of images
Unsupervised machine learning: the task is to create models or summaries of the input x (no y):
+ clustering (users, products, text documents by topic, ...)
+ building dependency graphs (Bayesian networks, ...)
+ reducing dimensionality to the essentials
+ visualization (dimension reduction to 2D/3D)
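To make the contrast concrete, here is a minimal sketch of one supervised and one unsupervised task, assuming scikit-learn and NumPy; the toy data is invented for illustration:

```python
# Minimal contrast between supervised and unsupervised learning,
# using scikit-learn and small made-up datasets (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: inputs x with known responses y (here: spam = 1, ham = 0).
X_train = np.array([[5, 1], [4, 2], [0, 8], [1, 7]])  # e.g. (spammy words, personal words)
y_train = np.array([1, 1, 0, 0])
clf = LogisticRegression().fit(X_train, y_train)
print(clf.predict([[6, 0]]))  # predict y for a new x -> [1]

# Unsupervised: only inputs x, no y; find structure (here: 2 clusters).
X = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)  # a cluster assignment for each point, e.g. [0 0 1 1]
```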

KINDS OF MACHINE LEARNING
Other kinds exist as well:
- semi-supervised learning: a supervised learning task, but only some of the training data is labeled
- reinforcement learning: like supervised learning, but with no direct feedback about the goodness of individual choices; instead there is a delayed reward/penalty (e.g., win/lose a game, reach the destination successfully/not, ...)
We'll mostly focus on supervised and unsupervised learning. The goal here is to learn to identify a machine learning problem and choose the right approach, rather than to learn the details.

KINDS OF MACHINE LEARNING
Case: Bank loan application
Training data: 10 000 customer background questionnaires & info about paid-on-time/not
Task: Predict whether a new customer will pay back on time or not, based on their background questionnaire
ML approach: SUPERVISED LEARNING!

KINDS OF MACHINE LEARNING
Case: Autonomous car
Training data: Control data from Tesla drivers driving around & info about crash/no-crash
Task: A self-driving car
ML approach: SUPERVISED LEARNING (for learning how to mimic human drivers) + REINFORCEMENT LEARNING (for learning to drive even better!)

KINDS OF MACHINE LEARNING
Case: Customer segmentation
Training data: Shopping basket data from 1 000 000 purchases
Task: Group customers into different groups to tailor product placement and marketing
ML approach: UNSUPERVISED LEARNING (clustering)

KINDS OF MACHINE LEARNING
Case: Product pricing
Training data: Sales data (product descriptions, final price) from an online marketplace (swap.com, huuto.net)
Task: Choose an appropriate price for new products based on their description
ML approach: SUPERVISED LEARNING (but remember the "game-theoretic aspect")

LOSS FUNCTIONS
The key problem in supervised learning (classification and regression) is to maximize predictive performance. (Of course, computational complexity is also crucial in big-data scenarios.)
Performance is measured using a loss function:
- predictor f: X → Y (maps an input x ∈ X to an output ŷ ∈ Y)
- loss L: Y² → ℝ (maps the predicted output ŷ and the correct output y to a score measuring "cost" or "error")
- Training loss: average of L(f(x), y) over (x, y) in the training data set
- Test loss: average of L(f(x), y) over (x, y) in the test data set

LOSS FUNCTIONS
Example loss functions:
- squared error (regression): L(ŷ, y) = (ŷ − y)²
- zero-one loss (classification): L(ŷ, y) = 1 if ŷ ≠ y, 0 if ŷ = y
- log-loss (probabilistic classification): L(p̂, y) = −log p̂(y), where p̂(y) is the predicted probability of y
NB: In the last case, the predictor outputs a probability distribution over the outcomes.
It is important to understand what the real "cost" or utility in the practical application is: minimizing one thing can be far from optimal in terms of another.
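These losses are simple to write out in code. The following is a minimal sketch, assuming NumPy; the predictions and labels are made up for illustration:

```python
# The three example loss functions from the slide, written out in Python.
# The predictions and labels below are made up for illustration.
import numpy as np

def squared_error(y_hat, y):
    """Regression loss: L(y_hat, y) = (y_hat - y)**2."""
    return (y_hat - y) ** 2

def zero_one_loss(y_hat, y):
    """Classification loss: 1 if the prediction is wrong, 0 if it is right."""
    return 0.0 if y_hat == y else 1.0

def log_loss(p_hat, y):
    """Probabilistic classification loss: -log of the probability given to the true class."""
    return -np.log(p_hat[y])

# Training/test loss = the average loss over the (x, y) pairs in that data set.
y_true = np.array([3.0, -1.0, 2.5])
y_pred = np.array([2.5, -0.5, 2.0])
print(np.mean(squared_error(y_pred, y_true)))  # mean squared error: 0.25

p_hat = {"spam": 0.9, "non-spam": 0.1}  # predicted distribution for one email
print(log_loss(p_hat, "spam"))          # -log(0.9) ≈ 0.105
```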

OVERFITTING
Training loss can be low because:
1. the problem is simple and good predictions are easy to find, or
2. we have tried a huge number of different predictors and some of them just happen to fit the training data!
The second alternative is called overfitting. In case of overfitting, the training error is small but the test error is big.

OVERFITTING
The overfitting problem is closely related to the complexity of the models being fitted:
- There are fewer simple models than complex models.
- Therefore, fitting a simple model leads to a lower risk of overfitting than fitting a complex model.
Classic example: polynomial fitting.

OVERFITTING
[Figure: Left: data source (black line), data (circles), and three regression models of increasing complexity; Right: training error (blue) and test error (red), in mean squared error, of the three models]
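The effect in the figure can be reproduced with a short script. Here is a minimal sketch, assuming NumPy; the data-generating function, noise level, and polynomial degrees are arbitrary choices, not necessarily those behind the figure:

```python
# Polynomial fitting as an overfitting demo: a complex model (high degree)
# can get a lower training error but a higher test error than a simple one.
# The data source, noise level, and degrees are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.3, n)  # "data source" plus noise
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

for degree in (1, 4, 10):  # three models of increasing complexity
    coeffs = np.polyfit(x_train, y_train, degree)       # fit on training data
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: training MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```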

VALIDATION
A separate validation data set can be used to reduce the risk of overfitting.
[Figure: the available data is split into a training set and a validation set]
Fit models with varying complexity on the training data, e.g.:
- regression with different covariate subsets (feature selection)
- decision trees with a variable number of nodes
- support vector machines with different regularization parameters
Choose the subset / number of nodes / regularization based on performance on the validation set.
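Here is a minimal sketch of validation-based model selection, continuing the polynomial example above; the split sizes and candidate degrees are arbitrary choices:

```python
# Model selection with a held-out validation set: fit candidate models on the
# training part, then pick the one with the lowest validation loss.
# Continues the polynomial example; split sizes and degrees are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = np.sin(3 * x) + rng.normal(0, 0.3, 100)

# Split the available data: first 70 points for training, the rest for validation.
x_train, y_train = x[:70], y[:70]
x_val, y_val = x[70:], y[70:]

best_degree, best_loss = None, np.inf
for degree in range(1, 13):                      # models of varying complexity
    coeffs = np.polyfit(x_train, y_train, degree)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    if val_mse < best_loss:
        best_degree, best_loss = degree, val_mse
print(f"chosen degree: {best_degree} (validation MSE {best_loss:.3f})")
```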

CROSS-VALIDATION
To get more reliable statistics than a single split provides, use K-fold cross-validation (see Exercise 1.3.c):
1. Divide the data into K equal-sized subsets.
[Figure: the available data is split into subsets 1 2 3 4 5]
2. For j from 1 to K:
   2.1 Train the model(s) using all data except that of subset j.
   2.2 Compute the resulting validation error on subset j.
3. Average the K results.
When K = N (i.e. each datapoint is a separate subset) this is known as leave-one-out cross-validation.
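A minimal sketch of the K-fold procedure written out by hand, again continuing the polynomial example (K and the degree are arbitrary choices):

```python
# K-fold cross-validation written out by hand (plain NumPy).
# Continues the polynomial example; K and the degree are arbitrary choices.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 100)
y = np.sin(3 * x) + rng.normal(0, 0.3, 100)

K, degree = 5, 3
indices = rng.permutation(len(x))            # shuffle before splitting
folds = np.array_split(indices, K)           # K (nearly) equal-sized subsets

errors = []
for j in range(K):                           # each fold serves once as the validation set
    val_idx = folds[j]
    train_idx = np.concatenate([folds[i] for i in range(K) if i != j])
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    errors.append(np.mean((np.polyval(coeffs, x[val_idx]) - y[val_idx]) ** 2))

print(f"{K}-fold CV estimate of test MSE: {np.mean(errors):.3f}")
# With K = len(x), the same loop performs leave-one-out cross-validation.
```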