
ECE 6254 Statistical Machine Learning, Spring 2017
Mark A. Davenport
Georgia Institute of Technology, School of Electrical and Computer Engineering

Statistical machine learning
How can we learn effective models from data, and how can we apply these models to practical inference and signal processing problems? Applications include classification, prediction, regression, clustering, modeling, and data exploration/visualization. Our approach is statistical inference: the main subject of this course is how to reason about and work with probabilistic models to help us make inferences from data.

What is machine learning?
learn: to gain or acquire knowledge of or skill in (something) by study, experience, or being taught. How do we learn that something is a tree? By a definition? ("A perennial plant with an elongated stem, or trunk, supporting leaves or branches.") No: by examples! A good definition of learning for this course: using a set of examples to infer something about an underlying process.

Why learn from data?
Traditional signal processing is top down: given a model for our data, derive the optimal algorithm. A learning approach is more bottom up: given some examples, derive a good algorithm. Sometimes a good model is really hard to derive from first principles.

Examples of learning
The Netflix prize: predict how a user will rate a movie (a 10% improvement earned a $1 million prize). Some pattern exists: users do not assign ratings completely at random; if you like The Godfather, you'll probably like The Godfather Part II. But the pattern is hard to pin down mathematically, and we have lots and lots of data: we know how a user has rated other movies, and we know how other users have rated this (and other) movies.

Examples of learning
Handwritten digit recognition.

A day in the life
A sequence of slides illustrating machine learning at work in everyday life: waking up in the morning, getting into the car to drive to work, once at work, over the lunch break, heading home for the day, finally home for the day, and before getting into bed.

Supervised learning
We are given input data $x_1, \dots, x_n$. Each $x_i$ represents a measurement or observation of some natural or man-made phenomenon; $x_i$ may be called the input, pattern, signal, feature vector, instance, or independent variable, and its coordinates may be called features, attributes, predictors, or covariates. In the supervised case, we are also given output data $y_1, \dots, y_n$; each $y_i$ may be called the output, label, response, or dependent variable. The pairs $(x_1, y_1), \dots, (x_n, y_n)$ are called the training data.

Supervised learning
We can think of each input-output pair $(x_i, y_i)$ as obeying a (possibly noisy) relationship $y_i \approx f(x_i)$. The goal of supervised learning is usually to generalize this input-output relationship so that we can predict the output associated with a previously unseen input. The primary supervised learning problems are classification (the outputs take values in a discrete set of labels) and regression (the outputs are continuous).
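To make the prediction goal concrete, here is a minimal sketch (my illustration, not from the slides) of perhaps the simplest supervised learner: a 1-nearest-neighbor classifier that memorizes the training pairs and predicts the label of the closest training input. The data and names are made up.

```python
import numpy as np

# Toy training data: inputs x_i in R^2 with binary labels y_i (made up)
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 0.8]])
y_train = np.array([0, 0, 1, 1])

def predict_1nn(x):
    """Predict the label of x as the label of the nearest training input."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

print(predict_1nn(np.array([0.1, 0.0])))  # -> 0 (near the first cluster)
print(predict_1nn(np.array([0.8, 0.9])))  # -> 1 (near the second cluster)
```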

Unsupervised learning
The inputs $x_1, \dots, x_n$ are not accompanied by labels. The goal of unsupervised learning is typically not related to future observations; instead, we just want to understand the structure in the data sample itself, or to infer some characteristic of the underlying probability distribution. Examples of unsupervised learning problems include clustering, density estimation, dimensionality reduction/feature selection, and visualization.
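As one concrete instance of clustering (again my sketch, not part of the slides), here is a bare-bones k-means loop on unlabeled two-dimensional points; the data, initialization, and choice of k are arbitrary and for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Unlabeled inputs: two well-separated blobs in R^2 (made up)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])

k = 2
centers = X[[0, -1]].copy()  # simple deterministic init: one point from each blob
for _ in range(20):
    # Assign each point to its nearest center ...
    labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
    # ... then move each center to the mean of its assigned points
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)  # approximately the two blob means, near (0, 0) and (3, 3)
```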

Other variants of learning
Semi-supervised learning, active learning, online learning, reinforcement learning, anomaly detection, ranking, transfer learning, multi-task learning. In general, most learning problems can be thought of as variants of traditional signal processing problems, but where we have no idea (a priori) how to model our signals.

Prerequisites
Probability: random variables, expectation, joint distributions, independence, conditional distributions, Bayes' rule, the multivariate normal distribution, etc. Linear algebra: norms, inner products, orthogonality, linear independence, eigenvalues/vectors, eigenvalue decompositions, etc. Multivariable calculus: partial derivatives, gradients, the chain rule, etc. Python or similar programming experience (C or MATLAB).

Text
There is no formally required textbook for this course; I will draw material primarily from a few sources. A list of useful books and links to relevant papers will be posted on the course webpage. Lecture notes will also be posted on the course webpage.

Grading
Pre-test (5%), homework (25%), midterm exam (20%), final exam (20%), final project (25%), participation (5%).

Distance learning
Welcome to our online students! Recorded lectures will be available to all students (including on-campus students). I need your help to make this a success. Online resources: course website, T-Square, Piazza.

A learning puzzle?

Is learning even possible? or: How I learned to stop worrying and love statistics
Supervised learning: given training data $(x_1, y_1), \dots, (x_n, y_n)$, we would like to learn an (unknown) function $f$ such that $f(x_i) = y_i$, and we would like $f(x)$ to be accurate for $x$ other than $x_1, \dots, x_n$. But as we have just seen, this is impossible: without any additional assumptions, we can conclude nothing about $f$ except (maybe) its value on $x_1, \dots, x_n$.

Probability to the rescue!
Any $f$ agreeing with the training data may be possible, but that does not mean that any $f$ is equally probable. A short digression: suppose that I have a biased coin, which lands on heads with some unknown probability $p$. I toss the coin $n$ times (independently) and observe the fraction of heads $\hat{p}$. Does $\hat{p}$ tell us anything about $p$?

What can we learn from $\hat{p}$?
Given enough tosses (large $n$), we expect that $\hat{p} \approx p$. Law of large numbers: $\hat{p} \to p$ as $n \to \infty$. Clearly, at least in a very limited sense, we can learn something about $p$ from observations. There is always the possibility that we are totally wrong, but given enough data, the probability of that should be very small.
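A quick simulation (my addition, not from the slides) makes the law of large numbers tangible: the empirical fraction of heads concentrates around the true bias as the number of tosses grows. The bias and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.7  # the coin's true probability of heads ("unknown" to the learner)

for n in [10, 100, 10_000]:
    tosses = rng.random(n) < p       # n independent tosses; True = heads
    p_hat = tosses.mean()            # empirical fraction of heads
    print(n, p_hat, abs(p_hat - p))  # the gap typically shrinks as n grows
```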

Connection to learning
Coin tosses: we want to estimate $p$. Learning: we want to estimate a function $f$. Suppose we have a hypothesis $h$ and that the input space is discrete. Think of the $x_i$ as a series of independent coin tosses, where the $x_i$ are drawn from a probability distribution $P_X$: heads means our hypothesis is correct, i.e., $h(x_i) = f(x_i)$; tails means our hypothesis is wrong, i.e., $h(x_i) \neq f(x_i)$. Define the risk $R(h) = \mathbb{P}[h(x) \neq f(x)]$ and the empirical risk $\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{h(x_i) \neq f(x_i)\}$.
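In code, the empirical risk is just the fraction of sampled inputs on which the hypothesis disagrees with the truth. A minimal sketch, with a made-up true function f, hypothesis h, and sampling distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

f = lambda x: x < 50  # the unknown "true" labeling function (made up)
h = lambda x: x < 60  # our hypothesis: wrong exactly when x is in [50, 60)

x = rng.integers(0, 100, size=1_000)  # x_i drawn i.i.d. from a uniform P_X
emp_risk = np.mean(h(x) != f(x))      # empirical risk R_hat_n(h)

print(emp_risk)  # close to the true risk R(h) = P[h(x) != f(x)] = 0.10
```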

Trust, but verify
The law of large numbers guarantees that, as long as we have enough data, we will have $\hat{R}_n(h) \approx R(h)$. This means that we can use $\hat{R}_n(h)$ to verify whether $h$ was a good hypothesis. Unfortunately, verification is not learning: Where did $h$ come from? What if $\hat{R}_n(h)$ is large? How do we know if $R(h) = 0$, or at least if $R(h) \approx 0$? Given many possible hypotheses, how can we pick a good one?

E pluribus unum
Consider an ensemble of many hypotheses $h_1, \dots, h_M$. If we fix a hypothesis $h_j$ before drawing our data, then the law of large numbers tells us that $\hat{R}_n(h_j) \approx R(h_j)$ with high probability. However, for a fixed $n$, if $M$ is large it can still be very likely that there is some hypothesis $h_j$ for which $\hat{R}_n(h_j)$ is very far from $R(h_j)$.

Back to the coin analogy
Question 1: If I toss a fair coin 10 times, what is the probability that I get 10 heads? Answer: $(1/2)^{10} \approx 0.001$. Question 2: If I toss 1000 fair coins 10 times each, what is the probability that some coin will get 10 heads? Answer: $1 - (1 - 2^{-10})^{1000} \approx 0.624$. This phenomenon forms the fundamental challenge of multiple hypothesis testing.
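Both answers are easy to check directly (a quick computation, not part of the slides):

```python
p_ten_heads = 0.5 ** 10                      # one coin: ~0.000977, i.e., ~0.001
p_some_coin = 1 - (1 - p_ten_heads) ** 1000  # at least one of 1000 coins: ~0.624
print(round(p_ten_heads, 4), round(p_some_coin, 3))
```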

...and back to learning
If we have many hypotheses (large $M$), then even though $\hat{R}_n(h) \approx R(h)$ is likely for any fixed hypothesis $h$, it is also likely that there will be at least one hypothesis whose empirical risk $\hat{R}_n(h)$ is very different from its true risk $R(h)$. How do we adapt our approach to handle many hypotheses? Next time we will be a bit more quantitative and take a first crack at solving this problem.