Optimization for Data Science

Optimization for Data Science
Master 2 Data Science, Univ. Paris Saclay
Robert M. Gower & Alexandre Gramfort

Core Info
Where: Telecom ParisTech
Location: Amphi Estaunié or B312
ECTS: 5 ECTS
Volume: 40h
When: 12 weeks (including a one-week break for holidays and one week for the exam)
Online: All teaching materials are on moodle: http://datasciencex-master-paris-saclay.fr/education/
Students also upload their projects and reports via moodle. All students **must** be registered on moodle.

Who am I?
Robert M. Gower, Assistant Prof at Telecom
robert.gower@telecom-paristech.fr
www.ens.fr/~rgower
Research topics: stochastic algorithms for optimization, numerical linear algebra, quasi-Newton methods and automatic differentiation (backpropagation).

Introduction to Optimization in Machine Learning
Robert M. Gower
Master 2 Data Science, Univ. Paris Saclay
Optimisation for Data Science

An Introduction to Supervised Learning

References for this class
Understanding Machine Learning: From Theory to Algorithms, Chapter 1
Convex Optimization, pages 67 to 79

Is There a Cat in the Photo? Yes / No
x: Input/Feature
y: Output/Target
Find a mapping h that assigns the correct target to each input.

Labeled Data: The training set
y = -1 means no/false
Training set -> Learning Algorithm -> -1

Example: Linear Regression for Height
Labeled data:
Sex     Age  Height
Male    30   1.72 m
Female  70   1.52 m
Example Hypothesis: Linear Model
Example Training Problem
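The formulas for the linear model and its training problem did not survive the transcription. As a minimal sketch, assume the hypothesis is h_w(x) = <w, x> (with a constant feature for the intercept) and the training problem is least squares, which matches the squared loss discussed on the following slides. The data, the feature encoding and the function names below are hypothetical and are not taken from the lecture.

```python
import numpy as np

# Hypothetical toy data (NOT from the lecture). Columns are
# [1 (intercept), sex (1 = male, 0 = female), age in years].
X = np.array([
    [1.0, 1.0, 30.0],
    [1.0, 0.0, 70.0],
    [1.0, 1.0, 45.0],
    [1.0, 0.0, 25.0],
    [1.0, 1.0, 60.0],
])
# Heights in metres; the values are made up for illustration.
y = np.array([1.72, 1.52, 1.78, 1.65, 1.70])


def fit_least_squares(X, y):
    """Return w minimizing (1/n) * sum_i (<w, x_i> - y_i)^2."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def predict(w, X):
    """Linear hypothesis h_w(x) = <w, x>, applied to every row of X."""
    return X @ w


w = fit_least_squares(X, y)
train_mse = np.mean((predict(w, X) - y) ** 2)
```

Here `np.linalg.lstsq` plays the role of the training algorithm: it returns the parameters w for which the gradient of the squared error vanishes.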

Linear Regression for Height
[Figure: Height (vertical axis) against Age (horizontal axis), with the fit found by the training algorithm]
Other options aside from linear?

Parametrizing the Hypothesis
Linear, Polynomial, Neural Net
[Figures: two plots of Height against Age]

Loss Functions
Why a Squared Loss?
The Training Problem
Typically a convex function

Choosing the Loss Function
Quadratic Loss, Binary Loss, Hinge Loss (y = 1 in all figures)
EXE: Plot the binary and hinge loss functions when y = 1.
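The definitions of the three losses are missing from the transcription. The sketch below assumes the usual forms for a real-valued prediction h and a label y in {-1, +1}: quadratic (h - y)^2, binary (0-1) loss on the sign of the prediction, and hinge max(0, 1 - y h). The lecture may use slightly different conventions, for example for the binary loss.

```python
import numpy as np


def quadratic_loss(h, y):
    """Squared difference between prediction h and target y."""
    return (h - y) ** 2


def binary_loss(h, y):
    """0 when the prediction has the same sign as y, 1 otherwise (assumed form)."""
    return np.where(h * y > 0, 0.0, 1.0)


def hinge_loss(h, y):
    """max(0, 1 - y * h): zero only when the margin y * h is at least 1."""
    return np.maximum(0.0, 1.0 - y * h)


# Evaluate each loss on a grid of predictions with y = 1, as in the slide's figures.
h_grid = np.linspace(-2.0, 3.0, 11)
losses_at_y1 = {
    "quadratic": quadratic_loss(h_grid, 1.0),
    "binary": binary_loss(h_grid, 1.0),
    "hinge": hinge_loss(h_grid, 1.0),
}
```

The arrays in `losses_at_y1` can be passed to any plotting library to draw the curves asked for in the exercise.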

Loss Functions
The Training Problem
Is a notion of loss enough? What happens when we do not have enough data?

Overfitting and Model Complexity
Fitting 1st order polynomial
Fitting 3rd order polynomial
Fitting 9th order polynomial
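The figures for these three fits are not in the transcription. A minimal sketch of the same experiment, assuming synthetic data (noisy samples of a sine curve, which is a hypothetical choice and not necessarily the lecture's), shows how the training error shrinks as the polynomial order grows:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Synthetic data (an assumption): noisy samples of one period of a sine.
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)


def train_mse(degree):
    """Least-squares polynomial fit of the given degree; return its training MSE."""
    p = Polynomial.fit(x, y, deg=degree)
    return float(np.mean((p(x) - y) ** 2))


# Training error for the 1st, 3rd and 9th order fits.
errors = {d: train_mse(d) for d in (1, 3, 9)}
```

A lower training error for the 9th order fit does not mean it predicts new points better; that gap between training error and error on new data is what overfitting refers to.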

Regularization
Regularizer Functions
General Training Problem: a loss term (goodness of fit, fidelity term, etc.) plus a regularizer (penalizes complexity), weighted by a parameter that controls the tradeoff between fit and complexity.
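The formula itself is missing from the transcription. A plausible reconstruction, assuming the notation h_w for the hypothesis with parameters w, \ell for the loss, R for the regularizer, and \lambda > 0 for the tradeoff parameter (the symbols other than \lambda are assumptions):

```latex
\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(h_w(x_i),\, y_i\big) \;+\; \lambda\, R(w)
```

The sum is the goodness-of-fit term, R(w) penalizes complexity, and \lambda controls the tradeoff between the two.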

Overfitting and Model Complexity
Fitting kth order polynomial
For λ big enough, the solution is a 2nd order polynomial

Exe: Ridge Regression
Linear hypothesis + L2 loss + L2 regularizer = Ridge Regression
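As a sketch of how these three ingredients combine, assume the objective (1/n) ||Xw - y||^2 + λ ||w||^2; the scaling by 1/n and the function name `ridge_fit` are assumptions, and the data are synthetic. Setting the gradient to zero gives a linear system that can be solved directly:

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimize (1/n) * ||X w - y||^2 + lam * ||w||^2 in closed form."""
    n, d = X.shape
    # Zero gradient condition: (X^T X / n + lam * I) w = X^T y / n.
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n
    return np.linalg.solve(A, b)


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))                      # synthetic features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)

w_small = ridge_fit(X, y, lam=1e-3)   # weak regularization, close to least squares
w_large = ridge_fit(X, y, lam=1e3)    # strong regularization, w shrinks toward zero
```

Comparing `w_small` and `w_large` shows the role of λ: a larger value trades goodness of fit for a smaller parameter norm.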

Exe: Support Vector Machines
Linear hypothesis + Hinge loss + L2 regularizer = SVM with soft margin
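A minimal sketch of the resulting objective, assuming the form (1/n) Σ max(0, 1 - y_i <w, x_i>) + λ ||w||^2 with labels in {-1, +1}. Because the hinge loss is not differentiable everywhere, the sketch minimizes it with a plain subgradient method; the step size 0.5 / sqrt(t), the synthetic data and the function names are illustrative assumptions, not the lecture's choices.

```python
import numpy as np


def svm_objective(w, X, y, lam):
    """Mean hinge loss of the linear hypothesis plus lam * ||w||^2."""
    margins = y * (X @ w)
    return float(np.mean(np.maximum(0.0, 1.0 - margins)) + lam * (w @ w))


def svm_subgradient(w, X, y, lam):
    """One valid subgradient of svm_objective at w."""
    margins = y * (X @ w)
    active = margins < 1.0  # examples whose hinge loss is nonzero
    hinge_part = -(y[active, None] * X[active]).sum(axis=0) / X.shape[0]
    return hinge_part + 2.0 * lam * w


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))                     # synthetic features
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)        # synthetic labels

lam = 0.01
w = np.zeros(2)
initial_objective = svm_objective(w, X, y, lam)       # equals 1.0 at w = 0
best_objective = initial_objective
for t in range(1, 201):
    w = w - (0.5 / np.sqrt(t)) * svm_subgradient(w, X, y, lam)
    # Subgradient steps need not decrease the objective, so track the best value.
    best_objective = min(best_objective, svm_objective(w, X, y, lam))
```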

Exe: Logistic Regression
Linear hypothesis + Logistic loss + L2 regularizer = Logistic Regression
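A minimal sketch of this objective and its gradient, assuming the logistic loss log(1 + exp(-y <w, x>)) with labels in {-1, +1} and the regularizer λ ||w||^2; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def logistic_objective(w, X, y, lam):
    """Mean logistic loss of the linear hypothesis plus lam * ||w||^2."""
    margins = y * (X @ w)
    # log(1 + exp(-m)) computed stably as logaddexp(0, -m).
    return float(np.mean(np.logaddexp(0.0, -margins)) + lam * (w @ w))


def logistic_gradient(w, X, y, lam):
    """Gradient of logistic_objective with respect to w."""
    margins = y * (X @ w)
    # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)) = -0.5 * (1 - tanh(m / 2)).
    coeff = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
    return X.T @ coeff / X.shape[0] + 2.0 * lam * w


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))                      # synthetic features
y = np.where(X[:, 0] - X[:, 2] > 0, 1.0, -1.0)        # synthetic labels

objective_at_zero = logistic_objective(np.zeros(3), X, y, lam=0.1)
gradient_at_zero = logistic_gradient(np.zeros(3), X, y, lam=0.1)
```

Unlike the hinge loss, the logistic loss is differentiable everywhere, so the objective has a gradient at every w.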

The Machine Learner's Job

The Statistical Learning Problem: The hard truth
Do we really care if the loss is small on the known labelled data pairs (xi, yi)? Nope.
We really want to have a small loss on new unlabelled observations!
Assume the data are sampled from a distribution, where the distribution is unknown.

The statistical learning problem: minimize the expected loss over an unknown expectation.
Variance of sample mean
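The formula for the variance of the sample mean is missing from the transcription. For n independent samples with common variance σ^2, the standard result is Var(sample mean) = σ^2 / n, which is why an average over more data is a better stand-in for the unknown expectation. The simulation below is a sketch that checks this numerically; the Gaussian distribution and the values of σ, n and the number of trials are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 2.0      # true standard deviation (hypothetical choice)
n = 25           # size of each sample
trials = 20000   # number of independent sample means to draw

# Each row is one sample of size n; its mean is one draw of the sample mean.
samples = rng.normal(loc=0.0, scale=sigma, size=(trials, n))
sample_means = samples.mean(axis=1)

empirical_var = float(sample_means.var())
theoretical_var = sigma ** 2 / n   # sigma^2 / n
```

With these settings `empirical_var` should land close to `theoretical_var`, and increasing `n` shrinks both.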