CS 6375 Advanced Machine Learning (Qualifying Exam Section) Nicholas Ruozzi University of Texas at Dallas

Slides adapted from David Sontag and Vibhav Gogate

Course Info
Instructor: Nicholas Ruozzi
Office: ECSS 3.409
Office hours: Tues. 10am-11am
TA: ?
Office hours and location: ?
Course website: www.utdallas.edu/~nrr150130/cs6375/2017fa/

Prerequisites
CS 5343 (algorithms)
Mathematical sophistication
Basic probability
Linear algebra: eigenvalues, eigenvectors, matrices, vectors, etc.
Multivariate calculus: derivatives, integration, gradients, Lagrange multipliers, etc.
I'll review some concepts as we come to them, but you should brush up in areas where you aren't as comfortable.

Grading
5-6 problem sets (50%)
See the collaboration policy on the web
Mix of theory and programming (in MATLAB or Python)
Available and turned in on eLearning
Approximately one assignment every two weeks
Midterm exam (20%)
Final exam (30%)
(subject to change)

Course Topics
Dimensionality reduction
PCA
Matrix factorizations
Learning: supervised, unsupervised, active, reinforcement, ...
Learning theory: PAC learning, VC dimension
SVMs & kernel methods
Decision trees, k-NN, ...
Parameter estimation: Bayesian methods, MAP estimation, maximum likelihood estimation, expectation maximization, ...
Clustering: k-means & spectral clustering
Graphical models
Neural networks
Bayesian networks: naïve Bayes
Statistical methods
Boosting, bagging, bootstrapping
Sampling
Ranking & collaborative filtering

What is ML?
"A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."
- Tom Mitchell

Basic Machine Learning Paradigm
Collect data
Build a model using training data
Use the model to make predictions

Supervised Learning
Input: $(x^{(1)}, y^{(1)}), \ldots, (x^{(M)}, y^{(M)})$
$x^{(m)}$ is the $m^{th}$ data item and $y^{(m)}$ is the $m^{th}$ label
Goal: find a function $f$ such that $f(x^{(m)})$ is a good approximation to $y^{(m)}$
We can then use $f$ to predict $y$ values for previously unseen $x$ values

Examples of Supervised Learning
Spam email detection
Handwritten digit recognition
Stock market prediction
More?

Supervised Learning
Hypothesis space: the set of allowable functions $f: X \to Y$
Goal: find the best element of the hypothesis space
How do we measure the quality of $f$?

Types of Learning
Supervised: the training data includes the desired output
Unsupervised: the training data does not include the desired output
Semi-supervised: some training data comes with the desired output
Active learning: semi-supervised learning where the algorithm can ask for the correct outputs for specifically chosen data points
Reinforcement learning: the learner interacts with the world via allowable actions, which change the state of the world and result in rewards; the learner attempts to maximize rewards through trial and error

Regression
[Figure: y versus x]
Hypothesis class: linear functions $f(x) = ax + b$
How do we measure the quality of the approximation?

Linear Regression
In typical regression applications, we measure the fit using a squared loss function
$L(f) = \frac{1}{M} \sum_m \left(f(x^{(m)}) - y^{(m)}\right)^2$
We want to minimize the average loss on the training data.
For 2-D linear regression, the learning problem is then
$\min_{a,b} \frac{1}{M} \sum_m \left(a x^{(m)} + b - y^{(m)}\right)^2$
For an unseen data point $x$, the learning algorithm predicts $f(x)$.
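
A minimal sketch of evaluating this loss, assuming the data are stored as two Python lists of equal length. The helper names `squared_loss` and `line`, and the four data points (which lie exactly on $y = 2x + 1$), are hypothetical illustrations rather than part of the slides.

```python
def squared_loss(f, xs, ys):
    """Average squared loss L(f) = (1/M) * sum_m (f(x^(m)) - y^(m))^2."""
    M = len(xs)
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / M


def line(a, b):
    """Return the linear hypothesis f(x) = a*x + b."""
    return lambda x: a * x + b


# Hypothetical training data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(squared_loss(line(2.0, 1.0), xs, ys))  # 0.0: this line fits every point
print(squared_loss(line(1.0, 1.0), xs, ys))  # 3.5: errors 0, 1, 2, 3 squared and averaged
```

Any candidate $(a, b)$ can be scored this way; learning then means searching for the pair with the smallest value.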

Linear Regression
$\min_{a,b} \frac{1}{M} \sum_m \left(a x^{(m)} + b - y^{(m)}\right)^2$
How do we find the optimal $a$ and $b$?
Solution 1: take derivatives and solve (there is a closed-form solution!)
Solution 2: use gradient descent
Gradient descent is much more likely to be useful for general loss functions, which often have no closed-form minimizer.
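
For the 2-D case, setting the partial derivatives with respect to $a$ and $b$ to zero gives the standard least-squares solution $a = \frac{\sum_m (x^{(m)} - \bar{x})(y^{(m)} - \bar{y})}{\sum_m (x^{(m)} - \bar{x})^2}$ and $b = \bar{y} - a\bar{x}$, where $\bar{x}$ and $\bar{y}$ are the sample means. The sketch below computes it directly; the function name `fit_line_closed_form` and the noisy sample data are hypothetical, and it assumes at least two distinct $x$ values so the denominator is nonzero.

```python
def fit_line_closed_form(xs, ys):
    """Least-squares line y = a*x + b from the zero-derivative conditions."""
    M = len(xs)
    x_bar = sum(xs) / M
    y_bar = sum(ys) / M
    # Sample covariance and variance terms (the common 1/M factors cancel)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx  # assumes at least two distinct x values
    b = y_bar - a * x_bar
    return a, b


# Hypothetical noisy data scattered around y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a, b = fit_line_closed_form(xs, ys)
print(a, b)  # approximately 1.94 and 1.09
```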

Gradient Descent
Iterative method to minimize a (convex) differentiable function $f$
Pick an initial point $x_0$
Iterate until convergence:
$x_{t+1} = x_t - \gamma_t \nabla f(x_t)$
where $\gamma_t$ is the $t^{th}$ step size (sometimes called the learning rate)
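
A minimal sketch of the iteration for a scalar variable, assuming the caller supplies the gradient and a step-size schedule. The name `gradient_descent`, the stopping tolerance `tol` (a simple stand-in for "until convergence"), and the iteration cap are illustrative choices, not part of the slides.

```python
def gradient_descent(grad, x0, step_size, num_iters=1000, tol=1e-10):
    """Minimize a differentiable scalar function given its gradient.

    grad(x) returns the derivative at x; step_size(t) returns gamma_t.
    Stops early once an update moves x by less than tol.
    """
    x = x0
    for t in range(num_iters):
        x_new = x - step_size(t) * grad(x)  # x_{t+1} = x_t - gamma_t * grad f(x_t)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Example: f(x) = (x - 3)^2 has gradient 2(x - 3) and minimizer x = 3
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0, step_size=lambda t: 0.1)
print(x_min)
```

With the constant step $\gamma_t = 0.1$, each update multiplies the distance to the minimizer by 0.8, so the iterates approach 3 geometrically.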

Gradient Descent
[Figure: illustration of gradient descent; source: Wikipedia]

Gradient Descent
$\min_{a,b} \frac{1}{M} \sum_m \left(a x^{(m)} + b - y^{(m)}\right)^2$
What is the gradient of this function?
What does the gradient descent iteration look like for this simple regression problem?
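
Differentiating the objective gives $\frac{\partial}{\partial a} = \frac{2}{M}\sum_m \left(a x^{(m)} + b - y^{(m)}\right) x^{(m)}$ and $\frac{\partial}{\partial b} = \frac{2}{M}\sum_m \left(a x^{(m)} + b - y^{(m)}\right)$, so each gradient descent step updates $a \leftarrow a - \gamma_t \frac{\partial}{\partial a}$ and $b \leftarrow b - \gamma_t \frac{\partial}{\partial b}$ simultaneously. A sketch of that iteration follows; the helper names, the constant step size 0.05, the iteration count, the starting point $a = b = 0$, and the data are hypothetical choices that happen to converge for this small example.

```python
def regression_gradient(a, b, xs, ys):
    """Gradient of (1/M) * sum_m (a*x^(m) + b - y^(m))^2 with respect to (a, b)."""
    M = len(xs)
    residuals = [a * x + b - y for x, y in zip(xs, ys)]
    grad_a = 2.0 / M * sum(r * x for r, x in zip(residuals, xs))
    grad_b = 2.0 / M * sum(residuals)
    return grad_a, grad_b


def fit_line_gd(xs, ys, step_size=0.05, num_iters=5000):
    """Run gradient descent from a = b = 0 with a constant step size."""
    a, b = 0.0, 0.0
    for _ in range(num_iters):
        # Compute both partial derivatives before updating either parameter
        grad_a, grad_b = regression_gradient(a, b, xs, ys)
        a -= step_size * grad_a
        b -= step_size * grad_b
    return a, b


# Hypothetical noisy data scattered around y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a, b = fit_line_gd(xs, ys)
print(a, b)
```

For this data the iterates should approach the least-squares line $a = 1.94$, $b = 1.09$, the same answer the closed-form solution gives; a step size that is too large would make them diverge instead.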

Linear Regression
In higher dimensions, the linear regression problem is essentially the same, only now $x^{(m)} \in \mathbb{R}^n$:
$\min_{a \in \mathbb{R}^n, b} \frac{1}{M} \sum_m \left(a^T x^{(m)} + b - y^{(m)}\right)^2$
We can still use gradient descent to minimize this; it is not much more difficult than the $n = 1$ case.
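
A vectorized sketch using NumPy, assuming the inputs are stacked as the rows of an $M \times n$ matrix `X`. The gradient with respect to $a$ becomes $\frac{2}{M} X^T r$, where $r$ is the vector of residuals $a^T x^{(m)} + b - y^{(m)}$; the function name `fit_linear_gd`, the step size, the iteration count, and the synthetic noiseless data are assumptions for illustration.

```python
import numpy as np


def fit_linear_gd(X, y, step_size=0.1, num_iters=5000):
    """Gradient descent for min over (a, b) of (1/M) * sum_m (a^T x^(m) + b - y^(m))^2.

    X: (M, n) array whose rows are the x^(m); y: length-M array of labels.
    """
    M, n = X.shape
    a = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        residuals = X @ a + b - y               # a^T x^(m) + b - y^(m) for every m
        grad_a = (2.0 / M) * (X.T @ residuals)  # length-n gradient for a
        grad_b = (2.0 / M) * residuals.sum()    # scalar gradient for b
        a -= step_size * grad_a
        b -= step_size * grad_b
    return a, b


# Hypothetical synthetic data: 100 points in R^3 on an exact linear model
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 3))
true_a = np.array([1.0, -2.0, 0.5])
y = X @ true_a + 0.3
a, b = fit_linear_gd(X, y)
print(a, b)
```

Because the synthetic data are generated without noise, the recovered parameters should match `true_a` and the intercept 0.3 closely.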

Gradient Descent
Gradient descent converges under certain technical conditions on the function $f$ and the step size $\gamma_t$.
If $f$ is convex, then any fixed point of gradient descent must correspond to a global optimum of $f$.
In general, convergence is only guaranteed to a local optimum.
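
To illustrate the last point, here is a sketch on a hypothetical non-convex function, $f(x) = x^4 - 3x^2 + x$, which has a global minimum near $x \approx -1.30$ and a second local minimum near $x \approx 1.13$. Running the same gradient descent iteration (constant step size 0.01, an assumption chosen to keep the steps stable) from two different starting points ends in two different minima.

```python
def f(x):
    """A non-convex function with two local minima."""
    return x**4 - 3 * x**2 + x


def f_prime(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def run_gd(x0, step_size=0.01, num_iters=2000):
    """Plain gradient descent with a constant step size."""
    x = x0
    for _ in range(num_iters):
        x -= step_size * f_prime(x)
    return x


x_right = run_gd(2.0)   # settles near the local minimum at about 1.13
x_left = run_gd(-2.0)   # settles near the global minimum at about -1.30
print(x_right, f(x_right))
print(x_left, f(x_left))
```

Both runs reach a point where the derivative is essentially zero, but only the run started at $x_0 = -2$ finds the global optimum.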

Regression
What if we enlarge the hypothesis class?
Quadratic functions
$k$-degree polynomials
Can we always learn better with a larger hypothesis class?
Since the larger classes contain the smaller ones, enlarging the hypothesis space never increases the minimum training cost, but this does NOT necessarily mean better predictive performance.
Fitting the training data better while predicting unseen data worse is known as overfitting.
Ideally, we would select the simplest hypothesis consistent with the observed data.
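
A sketch of the effect using NumPy's `polyfit`, on hypothetical data: ten noisy samples of the line $y = 2x + 1$, plus ten held-out samples from the same line. Because linear functions are contained in cubics, and cubics in degree-9 polynomials, the training error cannot go up as $k$ grows; with ten training points a degree-9 polynomial can interpolate them exactly. The held-out error carries no such guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data: a noisy line y = 2x + 1 at 10 points
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + 1.0 + rng.normal(scale=0.1, size=10)
# Hypothetical held-out points drawn from the same noisy line
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2.0 * x_test + 1.0 + rng.normal(scale=0.1, size=10)


def mse(coeffs, x, y):
    """Average squared loss of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


train_err, test_err = {}, {}
for k in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=k)  # least-squares fit of degree k
    train_err[k] = mse(coeffs, x_train, y_train)
    test_err[k] = mse(coeffs, x_test, y_test)
    print(f"degree {k}: train {train_err[k]:.2e}, held-out {test_err[k]:.2e}")
```

Comparing the two printed columns shows whether the extra flexibility actually helped on points the fit never saw.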

Binary Classification
Regression operates over a continuous set of outcomes.
Suppose that we want to learn a function $f: X \to \{0,1\}$.
As an example:

      x1  x2  x3  y
  1    0   0   1  0
  2    0   1   0  1
  3    1   1   0  1
  4    1   1   1  0

How do we pick the hypothesis space?
How do we find the best $f$ in this space?
How many functions with three binary inputs and one binary output are there?

Binary Classification

      x1  x2  x3  y
       0   0   0  ?
  1    0   0   1  0
  2    0   1   0  1
       0   1   1  ?
       1   0   0  ?
       1   0   1  ?
  3    1   1   0  1
  4    1   1   1  0

$2^8 = 256$ possible functions (each of the $2^3 = 8$ input rows can be assigned output 0 or 1)
$2^4 = 16$ are consistent with the observations (the 4 unobserved rows are unconstrained)
How do we choose the best one?
What if the observations are noisy?
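
The counts above can be checked by brute force. This sketch (all names are hypothetical) enumerates every truth table over three binary inputs and keeps those that agree with the four observed rows.

```python
from itertools import product

# All 8 settings of (x1, x2, x3)
inputs = list(product([0, 1], repeat=3))

# The four labeled rows from the table
observations = {
    (0, 0, 1): 0,
    (0, 1, 0): 1,
    (1, 1, 0): 1,
    (1, 1, 1): 0,
}

# A function is a truth table: one output bit for each of the 8 inputs
all_functions = [
    dict(zip(inputs, outputs)) for outputs in product([0, 1], repeat=len(inputs))
]

# Keep only the functions that reproduce every observed label
consistent = [
    f for f in all_functions if all(f[x] == y for x, y in observations.items())
]

print(len(all_functions))  # 256 = 2**8
print(len(consistent))     # 16 = 2**4
```

Every one of the 16 consistent functions fits the training data perfectly, so the data alone cannot choose among them; that choice has to come from the hypothesis space or a preference for simpler hypotheses.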

Challenges in ML
How do we choose the right hypothesis space?
A number of factors influence this decision: difficulty of learning over the chosen space, how expressive the space is, ...
How do we evaluate the quality of our learned hypothesis?
Prefer simpler hypotheses (to prevent overfitting)
Want the outcome of learning to generalize to unseen data

Challenges in ML
How do we find the best hypothesis?
This can be an NP-hard problem!
We need fast, scalable algorithms if they are to be applicable to real-world scenarios