CSE 190 Lecture 1.5. Data Mining and Predictive Analytics. Supervised Learning: Regression


What is supervised learning? Supervised learning is the process of trying to infer from labeled data the underlying function that produced the labels associated with the data

What is supervised learning? Given labeled training data of the form {(x_1, y_1), ..., (x_N, y_N)}, infer the function f such that f(x_i) ≈ y_i

Example Suppose we want to build a movie recommender e.g. which of these films will I rate highest?

Example Q: What are the labels? A: ratings that others have given to each movie, and that I have given to other movies

Example Q: What is the data? A: features about the movie and the users who evaluated it Movie features: genre, actors, rating, length, etc. User features: age, gender, location, etc.

Example Movie recommendation: f(user features, movie features) = (predicted) rating

Solution 1 Design a system based on prior knowledge, e.g.

def prediction(user, movie):
    if user['age'] <= 14:
        if movie['mpaa_rating'] == 'G':
            return 5.0
        else:
            return 1.0
    elif user['age'] <= 18:
        if movie['mpaa_rating'] == 'PG':
            return 5.0
    # ... etc.

Is this supervised learning?

Solution 2 Identify words that I frequently mention in my social media posts, and recommend movies whose plot synopses use similar types of language, i.e. recommend argmax_synopsis similarity(synopsis, posts). Is this supervised learning?

Solution 3 Identify which attributes (e.g. actors, genres) are associated with positive ratings. Recommend movies that exhibit those attributes. Is this supervised learning?

Solution 1 (design a system based on prior knowledge) Disadvantages: Depends on possibly false assumptions about how users relate to items Cannot adapt to new data/information Advantages: Requires no data!

Solution 2 (identify similarity between wall posts and synopses) Disadvantages: Depends on possibly false assumptions about how users relate to items May not be adaptable to new settings Advantages: Requires data, but does not require labeled data

Solution 3 (identify attributes that are associated with positive ratings) Disadvantages: Requires a (possibly large) dataset of movies with labeled ratings Advantages: Directly optimizes a measure we care about (predicting ratings) Easy to adapt to new settings and data

Supervised versus unsupervised learning Learning approaches attempt to model data in order to solve a problem Unsupervised learning approaches find patterns/relationships/structure in data, but are not optimized to solve a particular predictive task Supervised learning aims to directly model the relationship between input and output variables, so that the output variables can be predicted accurately given the input

Regression Regression is one of the simplest supervised learning approaches to learn relationships between input variables (features) and output variables (predictions)

Linear regression Linear regression assumes a predictor of the form Xθ ≈ y, where X is the matrix of features (data), θ is the vector of unknowns (which features are relevant), and y is the vector of outputs (labels) (or, if you prefer, y_i ≈ x_i · θ for each data point)

Linear regression Linear regression assumes a predictor of the form Xθ ≈ y. Q: Solve for theta A: θ = (X^T X)^{-1} X^T y (the least-squares solution)
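
A minimal NumPy sketch of this closed-form solution (the toy X and y below are made up for illustration; this is not the course's own code):

import numpy as np

# Toy data: each row of X is one example, with a constant "offset" feature first
X = np.array([[1.0, 5.0],
              [1.0, 7.5],
              [1.0, 9.2],
              [1.0, 4.1]])
y = np.array([3.5, 4.0, 4.5, 3.0])

# Closed-form least-squares solution: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y

# np.linalg.lstsq computes the same solution more stably
theta_stable, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)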

Example 1 How do preferences toward certain beers vary with age?

Example 1 (Slide shows screenshots of the data: beers, ratings/reviews, and user profiles.)

Example 1 50,000 reviews are available on http://jmcauley.ucsd.edu/cse190/data/beer/beer_50000.json (see course webpage) See also non-alcoholic beers: http://jmcauley.ucsd.edu/cse190/data/beer/non-alcoholic-beer.json

Example 1 Real-valued features How do preferences toward certain beers vary with age? How about ABV? (code for all examples is on http://jmcauley.ucsd.edu/cse190/code/week1.py)
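
A rough sketch of how this regression might be set up (the field names 'beer/ABV' and 'review/overall', and the line-by-line parsing, are assumptions about the dataset's format; week1.py may do this differently):

import ast
import urllib.request
import numpy as np

# Read the dataset one record per line
url = "http://jmcauley.ucsd.edu/cse190/data/beer/beer_50000.json"
data = [ast.literal_eval(line.decode('utf-8')) for line in urllib.request.urlopen(url)]

# Feature vector: a constant offset plus the beer's ABV
rows = [d for d in data if 'beer/ABV' in d]
X = np.array([[1.0, float(d['beer/ABV'])] for d in rows])
y = np.array([float(d['review/overall']) for d in rows])

theta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
# theta[0]: predicted rating at ABV = 0
# theta[1]: change in predicted rating per unit of ABV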

Example 1 Real-valued features What is the interpretation of the fitted parameters theta (the offset θ_0 and the ABV coefficient θ_1)? (code for all examples is on http://jmcauley.ucsd.edu/cse190/code/week1.py)

Example 2 Categorical features How do beer preferences vary as a function of gender? (code for all examples is on http://jmcauley.ucsd.edu/cse190/code/week1.py)
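
A sketch of one way to encode this categorical feature as a binary indicator (the 'user/gender' field and its values are assumptions about the data; 'data' is the list of review records loaded in the previous sketch):

import numpy as np

rows = [d for d in data if 'user/gender' in d]
X = np.array([[1.0, 1.0 if d['user/gender'] == 'Female' else 0.0] for d in rows])
y = np.array([float(d['review/overall']) for d in rows])

theta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
# theta[0]: predicted rating for the reference group
# theta[1]: offset in the predicted rating for the 'Female' group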

Example 3 Random features What happens as we add more and more random features? (code for all examples is on http://jmcauley.ucsd.edu/cse190/code/week1.py)
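
A small self-contained experiment that illustrates the point (entirely synthetic data, not the course code): the training MSE keeps shrinking as random features are added, even though the features carry no information about y.

import numpy as np

np.random.seed(0)
N = 200
y = np.random.rand(N)  # outputs unrelated to any feature

for K in [1, 10, 50, 100, 199]:
    # K purely random features plus a constant offset
    X = np.hstack([np.ones((N, 1)), np.random.rand(N, K)])
    theta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    mse = np.mean((y - X @ theta) ** 2)
    print(K, mse)  # training MSE approaches 0 as K approaches N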

Exercise How would you build a feature to represent the month, and the impact it has on people's rating behavior?

CSE 190 Lecture 2 Data Mining and Predictive Analytics Regression Diagnostics

Regression recap Regression is one of the simplest supervised learning approaches to learn relationships between input variables (features) and output variables (predictions)

Linear regression recap Linear regression assumes a predictor of the form Xθ ≈ y, where X is the matrix of features (data), θ is the vector of unknowns (which features are relevant), and y is the vector of outputs (labels) (or, if you prefer, y_i ≈ x_i · θ for each data point)

Linear regression recap Linear regression assumes a predictor of the form Xθ ≈ y. Q: Solve for theta A: θ = (X^T X)^{-1} X^T y (the least-squares solution)

Example 3 (from Tuesday) Random features What happens as we add more and more random features? (code for all examples is on http://jmcauley.ucsd.edu/cse190/code/week1.py)

Exercise (from Tuesday) How would you build a feature to represent the month, and the impact it has on people's rating behavior?

Exercise (from Tuesday) How would you build a feature to represent the month? Option 1: a single integer, {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, ...}[mon]? Option 2: a 12-dimensional one-hot vector, e.g. Jan = [1,0,0,0,0,0,0,0,0,0,0,0], Feb = [0,1,0,0,0,0,0,0,0,0,0,0], ..., Nov = [0,0,0,0,0,0,0,0,0,0,1,0] (etc.) Option 3: an 11-dimensional encoding where one month (here Jan) is all zeros, e.g. Jan = [0,0,0,0,0,0,0,0,0,0,0], Feb = [0,0,0,0,0,0,0,0,0,0,1], Mar = [0,0,0,0,0,0,0,0,0,1,0] (etc.) Any benefit of one vs. another?
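
A sketch of the one-hot option in Python (a hypothetical helper, not from the course code):

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def month_feature(mon):
    # 12-dimensional one-hot encoding of the month
    feat = [0.0] * 12
    feat[months.index(mon)] = 1.0
    return feat

# e.g. month_feature('Nov') == [0,0,0,0,0,0,0,0,0,0,1,0]
# If the model also has a constant offset feature, dropping one dimension
# (the 11-dimensional encoding) avoids making the month features linearly
# dependent on that constant.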

What does the data actually look like? Season vs. rating (overall)

Today: Regression diagnostics Mean-squared error (MSE): MSE(f) = (1/N) Σ_i (y_i - f(x_i))^2

Regression diagnostics Q: Why MSE (and not mean-absolute error or something else)?

Regression diagnostics Quantile-Quantile (QQ)-plot

Regression diagnostics Coefficient of determination Q: How low does the MSE have to be before it's low enough? A: It depends! The MSE is proportional to the variance of the data

Regression diagnostics Coefficient of determination (R^2 statistic) Mean: ȳ = (1/N) Σ_i y_i Variance: Var(y) = (1/N) Σ_i (y_i - ȳ)^2 MSE: MSE(f) = (1/N) Σ_i (y_i - f(x_i))^2

Regression diagnostics Coefficient of determination (R^2 statistic) FVU(f) = MSE(f) / Var(y) (FVU = fraction of variance unexplained) FVU(f) = 1: trivial predictor (always predict the mean) FVU(f) = 0: perfect predictor

Regression diagnostics Coefficient of determination (R^2 statistic) R^2 = 1 - FVU(f) = 1 - MSE(f) / Var(y) R^2 = 0: trivial predictor R^2 = 1: perfect predictor
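
These diagnostics are straightforward to compute; a minimal sketch:

import numpy as np

def regression_diagnostics(y, y_pred):
    # Mean-squared error, fraction of variance unexplained, and R^2
    mse = np.mean((y - y_pred) ** 2)
    fvu = mse / np.var(y)
    r2 = 1.0 - fvu
    return mse, fvu, r2

# e.g. with predictions from a fitted linear model:
# mse, fvu, r2 = regression_diagnostics(y, X @ theta)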

Overfitting Q: But can't we get an R^2 of 1 (MSE of 0) just by throwing in enough random features? A: Yes! This is why MSE and R^2 should always be evaluated on data that wasn't used to train the model A good model is one that generalizes to new data

Overfitting When a model performs well on training data but doesn't generalize, we are said to be overfitting Q: What can be done to avoid overfitting?

Occam's razor "Among competing hypotheses, the one with the fewest assumptions should be selected" (image from personalspirituality.net)

Occam's razor Q: What is a complex versus a simple hypothesis?

Occam's razor A1: A simple model is one where theta has few non-zero parameters (only a few features are relevant) A2: A simple model is one where theta is almost uniform (few features are significantly more relevant than others)

Occam's razor A1: A simple model is one where theta has few non-zero parameters, i.e. ||θ||_1 is small A2: A simple model is one where theta is almost uniform, i.e. ||θ||_2^2 is small ("proof" on whiteboard)

Regularization Regularization is the process of penalizing model complexity during training: arg min_θ (1/N) Σ_i (y_i - x_i · θ)^2 + λ ||θ||_2^2 where the first term is the MSE and the second is the (l2) model complexity

Regularization Regularization is the process of penalizing model complexity during training How much should we trade off accuracy versus complexity?

Optimizing the (regularized) model We no longer have a convenient closed-form solution for theta Need to resort to some form of approximation algorithm

Optimizing the (regularized) model Gradient descent: 1. Initialize θ at random 2. While (not converged) do θ := θ - α ∇f(θ) All sorts of annoying issues: How to initialize theta? How to determine when the process has converged? How to set the step size alpha? These aren't really the point of this class though
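
A minimal gradient-descent sketch for the regularized objective above (the step size, iteration count, and initialization are arbitrary illustrative choices, not the course's settings):

import numpy as np

def gradient_descent(X, y, lam, alpha=1e-4, iters=1000):
    # Minimize (1/N)||y - X theta||^2 + lam ||theta||^2 by gradient descent
    N, K = X.shape
    theta = 0.01 * np.random.randn(K)          # 1. initialize at random
    for _ in range(iters):                     # 2. (crude) stopping rule
        grad = -2.0 / N * (X.T @ (y - X @ theta)) + 2.0 * lam * theta
        theta = theta - alpha * grad           # step in the negative gradient direction
    return theta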

Optimizing the (regularized) model Gradient descent in scipy: (code for all examples is on http://jmcauley.ucsd.edu/cse190/code/week1.py)
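
One way this might look with scipy's L-BFGS routine (a sketch assuming the regularized MSE objective above; the actual week1.py may differ):

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def objective(theta, X, y, lam):
    # Regularized MSE: (1/N)||y - X theta||^2 + lam ||theta||^2
    diff = y - X @ theta
    return diff @ diff / len(y) + lam * (theta @ theta)

def gradient(theta, X, y, lam):
    # Gradient of the objective above (supplied to the optimizer)
    return -2.0 / len(y) * (X.T @ (y - X @ theta)) + 2.0 * lam * theta

# Synthetic data just to show the call; X, y, lam would come from the real problem
X = np.hstack([np.ones((50, 1)), np.random.rand(50, 3)])
y = np.random.rand(50)
theta0 = np.zeros(X.shape[1])
theta_opt, final_obj, info = fmin_l_bfgs_b(objective, theta0, fprime=gradient,
                                           args=(X, y, 1.0))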

Model selection How much should we trade off accuracy versus complexity? Each value of lambda generates a different model. Q: How do we select which one is the best?

Model selection How to select which model is best? A1: The one with the lowest training error? A2: The one with the lowest test error? We need a third sample of the data that is not used for training or testing

Model selection A validation set is constructed to tune the model's parameters Training set: used to optimize the model's parameters Test set: used to report how well we expect the model to perform on unseen data Validation set: used to tune any model parameters that are not directly optimized

Model selection A few theorems about training, validation, and test sets The training error increases as lambda increases The validation and test error are at least as large as the training error (assuming infinitely large random partitions) The validation/test error will usually have a sweet spot between under- and over-fitting
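
A sketch of choosing lambda with a validation set (the 'fit' argument stands for any training routine, e.g. the gradient-descent sketch above; the candidate grid in the example is illustrative):

import numpy as np

def validation_mse(theta, X, y):
    return np.mean((y - X @ theta) ** 2)

def select_lambda(fit, X_train, y_train, X_valid, y_valid, lambdas):
    # Train one model per candidate lambda; keep the one with lowest validation MSE
    best = min(((lam, fit(X_train, y_train, lam)) for lam in lambdas),
               key=lambda pair: validation_mse(pair[1], X_valid, y_valid))
    return best  # (best lambda, its fitted theta); report performance on the test set afterwards

# e.g. best_lam, best_theta = select_lambda(gradient_descent, X_train, y_train,
#                                           X_valid, y_valid, [0.01, 0.1, 1, 10, 100])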

Summary of Week 1: Regression Linear regression and least-squares (a little bit of) feature design Overfitting and regularization Gradient descent Training, validation, and testing Model selection

Coming up! An exciting case study (i.e., my own research)!

Homework Homework is available on the course webpage http://cseweb.ucsd.edu/~jmcauley/cse190/homework1.pdf Please submit it at the beginning of the week 3 lecture (Apr 14)

Office hours (in addition to my office hours on Wednesday) There will be office hours on Friday (with Long): 12:30-2:30pm in EBU3B B275 And on Monday (with Pranay): 5:00-7:00pm in EBU3B B250A

A question Q: Is this class going to be too much work? A: No

Questions?