Introduction to Machine Learning
1 Nov. 2018
D. Ratner, SLAC National Accelerator Laboratory


Introduction
What is machine learning?
Arthur Samuel (1959): the ability to learn without being explicitly programmed.
Tom Mitchell (1998): a computer program learns from experience E with respect to task T if its performance P at T improves with experience E.
When is machine learning successful? For tasks that humans can learn but have trouble explaining how they do. (Slide shows a spectrum running from regression through neural networks to "sentient computers".)

Introduction: Topics
Supervised learning (examples with labels): ML framework/terminology; regression vs. classification; parametric vs. non-parametric models.
Unsupervised learning (examples, no labels): clustering, anomaly/breakout detection, generation.
Reinforcement learning (examples, partial labels): control, games, optimization.
Goal of Lecture 1: learn the terminology and framework of ML.
Goal of Lecture 2: see examples of ML in accelerator physics.
Material drawn from: Stanford CS 229, EE103; Michael Nielsen, Neural Networks and Deep Learning.

Supervised learning: Parametric models. Least Squares Regression
Start from a simple problem: can we predict house price? The training set consists of m examples; each example has n attributes (x) and one label (y). Our goal: given a new example x, can we predict its label y?
Hypothesis for example i, summing over the n attributes (with a bias term $x_0^{(i)} = 1$) and parameters/weights $\theta$:

$h_\theta(x^{(i)}) = \sum_{j=0}^{n} \theta_j \, x_j^{(i)}$

Supervised learning: Parametric models. Least Squares Regression
The core of machine learning: how do we learn the best $\theta$ given data X, y? We need a metric for "best": a cost/loss function. Examples: mean squared error (MSE), absolute error, etc. MSE, with m the number of examples and y the ground truth (label):

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Optimal $\theta$ (normal equation): $\theta = (X^T X)^{-1} X^T y$, where $\theta$ is (n+1) x 1, X is m x (n+1), and y is m x 1.
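A minimal sketch of the normal equation in numpy (illustrative; the house-size attribute, prices, and noise level below are made up, not the lecture's data):

import numpy as np

rng = np.random.default_rng(0)
m = 100
size = rng.uniform(50, 300, size=m)                  # hypothetical attribute: house size
price = 1000 * size + 5e4 + rng.normal(0, 1e4, m)    # label: price, with noise

X = np.column_stack([np.ones(m), size])              # m x (n+1) design matrix (bias column of ones)
theta = np.linalg.solve(X.T @ X, X.T @ price)        # solves (X^T X) theta = X^T y
print(theta)                                         # approximately [5e4, 1000]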

Supervised learning: Parametric models. Least Squares Regression
Instead of the closed-form solution, we can minimize the same MSE cost by gradient descent. Stochastic gradient descent updates $\theta$ after each example i, with learning rate $\alpha$:

$\theta_j := \theta_j - \alpha \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
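A matching stochastic gradient descent sketch (again illustrative, same made-up dataset; the attribute is standardized so a fixed learning rate behaves well):

import numpy as np

rng = np.random.default_rng(0)
m = 100
size = rng.uniform(50, 300, size=m)
price = 1000 * size + 5e4 + rng.normal(0, 1e4, m)

size_std = (size - size.mean()) / size.std()         # standardize the attribute
X = np.column_stack([np.ones(m), size_std])          # m x (n+1) design matrix

alpha = 0.01                                         # learning rate
theta = np.zeros(2)
for epoch in range(100):
    for i in rng.permutation(m):                     # visit examples in random order
        err = X[i] @ theta - price[i]                # h_theta(x^(i)) - y^(i)
        theta -= alpha * err * X[i]                  # update theta after each example i
print(theta)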

Supervised learning: Hyper-parameter choice. Least Squares Regression
Hyper-parameters: how do we choose the model itself? E.g., pick the model architecture, cost function, learning rate, etc. Here the attributes are expanded into polynomial features; the slide shows fits of degree p = 1, 2, and 10 to the same data.

Supervised learning: Hyper-parameter choice. Least Squares Regression
Hyper-parameters: how do we choose the model itself? E.g., pick the model architecture, cost function, learning rate, etc. Split the data into training and test (and validation) sets; a typical split is 80/20 or 80/10/10.

Degree (p) | Train error | Test error
1          | 0.65        | 0.75
2          | 0.47        | 0.57
10         | 0.15        | 2.54

(Slide plots train and test error J versus polynomial degree p, with the p = 1, 2, 10 fits marked.)
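The mechanics of the split can be sketched with scikit-learn (hypothetical sine-plus-noise data; the error values in the table above come from the lecture's own example, not from this snippet):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * x[:, 0]) + rng.normal(0, 0.1, size=50)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.2, random_state=0)  # 80/20 split
for p in (1, 2, 10):
    feats = PolynomialFeatures(degree=p)
    model = LinearRegression().fit(feats.fit_transform(x_tr), y_tr)
    print(p,
          mean_squared_error(y_tr, model.predict(feats.transform(x_tr))),   # train error
          mean_squared_error(y_te, model.predict(feats.transform(x_te))))   # test error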

Supervised learning: Hyper-parameter choice. Bias-Variance Tradeoff
High bias (under-fitting): collect new attributes, create new features, add more parameters.
High variance (over-fitting): use fewer features (selected e.g. by mutual information), gather more data.
(Slide plots train and test error J versus polynomial degree p: test error is high at both the high-bias and high-variance ends, while train error keeps falling.)

Supervised learning: Hyper-parameter choice. Bias-Variance Tradeoff
Regularization: modify the cost function so that it penalizes large amplitudes of $\theta$, e.g.

$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$
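A sketch of one common choice, ridge regression, where the L2 penalty simply adds lambda*I to the normal equation (the slide's exact cost function is not preserved in this transcription, so treat this as the standard textbook version rather than necessarily the lecture's):

import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form L2-regularized least squares; the bias term is not penalized.
    I = np.eye(X.shape[1])
    I[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)

X = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([0.1, 1.0, 2.1, 2.9, 4.2])
print(ridge_fit(X, y, lam=1.0))    # weights shrink toward 0 as lam grows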

Supervised learning: Parametric models. Least Squares Regression: Probabilistic interpretation
Assume the labels are generated with Gaussian noise, $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$ with $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$. Define the likelihood of the parameters:

$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)$

The "best" $\theta$ is the one that makes the observed data most likely.

Supervised learning: Parametric models. Least Squares Regression: Probabilistic interpretation
Maximum Likelihood Estimation (MLE): maximize the log likelihood

$\ell(\theta) = \log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2$

Maximizing $\ell(\theta)$ is therefore the same as minimizing the least-squares cost.

Supervised learning: Parametric models. Least Squares Regression: Bayesian interpretation
Worked example. Let A = sick (1% of the population) and B = healthy (99%):

              | Sick, P(A) = 1% | Healthy, P(B) = 99%
Positive test | 90%             | 10%
Negative test | 10%             | 90%

Given a positive result, what is the probability of a correct diagnosis? Bayes' rule:

$P(\text{sick} \mid +) = \frac{P(+ \mid \text{sick})\,P(\text{sick})}{P(+ \mid \text{sick})\,P(\text{sick}) + P(+ \mid \text{healthy})\,P(\text{healthy})} = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.1 \times 0.99} \approx 8\%$

Maximum a posteriori (MAP) estimation applies the same idea to $\theta$: maximize the posterior $p(\theta \mid X, y) \propto p(y \mid X, \theta)\,p(\theta)$. The prior $p(\theta)$ becomes the regularization term.
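The slide's numbers check out with a few lines of Python:

p_sick, p_healthy = 0.01, 0.99
p_pos_given_sick, p_pos_given_healthy = 0.90, 0.10

p_sick_given_pos = (p_pos_given_sick * p_sick) / (
    p_pos_given_sick * p_sick + p_pos_given_healthy * p_healthy)
print(p_sick_given_pos)    # ~0.083, i.e. the ~8% on the slide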

Supervised learning: Parametric models. Logistic Regression
Classification problem: did a house sell? The output is limited to the range [0, 1], so full (linear) regression seems awkward. Instead use the logistic hypothesis

$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

and classify y = 1 when h > 0.5, y = 0 otherwise. Using MLE to derive the update rule gives

$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$

which is the same form as for OLS, except now h is non-linear.
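A minimal sketch of this update rule in numpy (illustrative; the did-the-house-sell data is synthetic):

import numpy as np

rng = np.random.default_rng(0)
m = 200
x = rng.uniform(-2, 2, size=m)
y = (x + rng.normal(0, 0.5, size=m) > 0).astype(float)   # 0/1 labels
X = np.column_stack([np.ones(m), x])

def h(theta, xi):                          # logistic hypothesis
    return 1.0 / (1.0 + np.exp(-xi @ theta))

alpha, theta = 0.1, np.zeros(2)
for epoch in range(100):
    for i in rng.permutation(m):
        theta += alpha * (y[i] - h(theta, X[i])) * X[i]   # same form as the OLS update
print(theta)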

Supervised learning: Non-parametric. Instance-based learning
Parametric model: learn a fixed set of parameters $\theta$, then discard the training data. Non-parametric model: keep the training examples themselves and compare new examples against them.
Example: k-nearest neighbors. To label a new point x*, take a vote among the k training examples closest to it. (Slide shows x* in the (x1, x2) plane with nearby labeled examples x^(1) through x^(5).)
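A k-nearest-neighbors sketch with scikit-learn (toy data): nothing is "fit" in the parametric sense; prediction is a vote over the k closest stored examples.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # "fit" just stores the data
print(knn.predict([[1.5, 1.5]]))                      # majority vote of the 3 nearest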

Supervised learning: Non-parametric. Optimal-margin classifier
Alternative classifier definition: find a hyperplane that divides the classes. Optimal-margin classifier: pick the hyperplane that maximizes the minimum distance of any example from the plane.

Supervised learning: Non-parametric. Optimal-margin classifier
The support vector machine (SVM) finds this maximum-margin hyperplane for classes labeled y = +1 and y = -1; the examples nearest the boundary are the support vectors. Prediction rule:

$h_{w,b}(x) = \mathrm{sign}(w^T x + b)$

Supervised learning: Non-parametric. Support Vector Machines
What happens if the classes aren't separable? Try adding new features, e.g. $x_1^2 + x_2^2$, which turns a circular boundary in the original space into a linear one in the new feature space.

Supervised learning: Non-parametric. SVMs and Kernels
Feature mapping: replace x with $\phi(x)$. The SVM equations depend on the data only through inner products $\langle x, z \rangle$, so define the kernel

$K(x, z) = \langle \phi(x), \phi(z) \rangle$

The new SVM equation is the old one with every inner product $\langle x, z \rangle$ replaced by K(x, z).

Supervised learning: Non-parametric. SVMs and Kernels
This substitution is the kernel trick: K(x, z) can often be evaluated cheaply without ever constructing $\phi(x)$ explicitly. Mercer's theorem: K(x, z) is a valid kernel iff the corresponding kernel matrix is symmetric and positive semi-definite.
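A sketch of the kernel trick in practice with scikit-learn's SVC, on toy data echoing the $x_1^2 + x_2^2$ example: labels depend on the radius, so no linear boundary in the raw features works, but the RBF kernel handles it without constructing new features explicitly.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)    # label by distance from origin

clf = SVC(kernel="rbf").fit(X, y)                # K(x, z) = exp(-gamma * ||x - z||^2)
print(clf.score(X, y))                           # high training accuracy, no explicit phi(x)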

Supervised learning: Non-parametric. Presenting Classification Results
How do I report how well my model works? Accuracy alone can mislead: with the sick/healthy example above, a classifier that always predicts "healthy" is 99% accurate! Use precision and recall instead:

$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$

(Figure: precision/recall diagram, from Wikipedia.)

Supervised learning: Non-parametric. Presenting Classification Results
How do I pick the threshold for classification? Sweeping the threshold (e.g. h = 0.3, 0.5, 0.7) trades precision against recall: a low threshold catches more positives (higher recall) at the cost of more false alarms (lower precision). (Figure from Wikipedia.)

Supervised learning: Non-parametric. Presenting Classification Results
Plotting precision against recall over all thresholds gives the precision-recall curve; the area under the curve (AUC) summarizes performance in a single number. (Figure: precision-recall curve, from scikit-learn/Wikipedia.)
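A sketch of computing a precision-recall curve and its AUC with scikit-learn (toy scores standing in for a model's h(x) outputs):

import numpy as np
from sklearn.metrics import precision_recall_curve, auc

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])   # h(x) for each example

precision, recall, thresholds = precision_recall_curve(y_true, scores)
print(auc(recall, precision))    # area under the precision-recall curve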

Supervised learning: Parametric models. The Perceptron
A perceptron takes weighted inputs plus a bias and applies a non-linear activation:

$a = f(w_1 x_1 + w_2 x_2 + w_3 x_3 + b)$

Common activation functions f: sigmoid, tanh, ReLU.
[Michael Nielsen, Neural Networks and Deep Learning, Determination Press (2015)]

Supervised learning: Parametric models. Artificial Neural Networks
Stack perceptrons into layers: input, hidden layers, output, with a cost function (e.g. MSE) on the output. Problem: computing the gradient of the cost naively, weight by weight, costs O(n^2) evaluations. Clever idea to the rescue: use the chain rule! Backpropagation computes all gradients in a single backward pass.
[Michael Nielsen, Neural Networks and Deep Learning, Determination Press (2015)]
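A minimal backpropagation sketch in numpy (illustrative: a one-hidden-layer network on XOR with sigmoid activations and MSE loss; convergence on this toy problem depends on the random initialization):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)    # hidden layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)    # output layer
alpha = 0.5

for step in range(10000):
    a1 = sigmoid(X @ W1 + b1)                    # forward pass
    a2 = sigmoid(a1 @ W2 + b2)
    d2 = (a2 - y) * a2 * (1 - a2)                # backward pass: chain rule
    d1 = (d2 @ W2.T) * a1 * (1 - a1)             # reuse d2 rather than recompute
    W2 -= alpha * a1.T @ d2
    b2 -= alpha * d2.sum(axis=0)
    W1 -= alpha * X.T @ d1
    b1 -= alpha * d1.sum(axis=0)

print(a2.round(2).ravel())                       # should approach [0, 1, 1, 0]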

Supervised learning: Parametric models. Convolutional Neural Networks
[Michael Nielsen, Neural Networks and Deep Learning, Determination Press (2015)]

Supervised learning: Parametric models. ANNs: practical tips
1. Training is slow: use GPUs.
2. Large models can have millions of parameters and are prone to over-fitting: use regularization, drop-out, noise layers, and lots of data.
3. Always plot training AND validation loss: this shows bias vs. variance.
4. Not training? Try different loss functions, activations, architectures, mini-batch parameters, optimization algorithms, learning rates; check data quality.
(Slide shows a network schematic with input, hidden layers, and output, plus train/test loss curves.)

Unsupervised learning: What can be accomplished without labels?
Supervised learning: X, y. Unsupervised learning: X only. What can we hope to accomplish?
1. Clustering (classification)
2. Decomposition (e.g. separating audio signals)
3. Anomaly/breakout detection (e.g. fault detection/prediction)
4. Generation (e.g. creating new examples within a class)

Unsupervised learning: What can be accomplished without labels?
Clustering: divide X into k categories. K-means algorithm (see the sketch below):
a. Pick k random centroids.
b. Loop until convergence:
   1. Assign each example to its nearest centroid.
   2. Update each centroid to the mean of its cluster.
See also: hierarchical clustering, DBSCAN, etc.
http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html
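A minimal k-means sketch in numpy following the algorithm above (it assumes no cluster ever goes empty, which a production implementation must handle):

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]       # a. k random centroids
    for _ in range(iters):                                    # b. loop until convergence
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)                             # 1. assign to nearest centroid
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new                                       # 2. update centroids to cluster means
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)    # approximately [[0, 0], [2, 2]]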

Unsupervised learning: Time series data: Anomaly/Breakout/Changepoint Detection
Anomaly detection: identify points that are statistical outliers from a distribution. One tool is the generalized ESD test (GESD), available e.g. in PyAstronomy (installable via pip); see the sketch below.
Breakout/changepoint detection: find the point in time at which the distribution itself changed.
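A usage sketch, assuming PyAstronomy's pyasl.generalizedESD interface (check the package docs for the exact signature):

import numpy as np
from PyAstronomy import pyasl   # pip install PyAstronomy

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)
x[20], x[75] = 8.0, -9.0        # inject two obvious anomalies

# Test for up to maxOLs outliers at significance level alpha.
n_out, idx = pyasl.generalizedESD(x, maxOLs=5, alpha=0.05)
print(n_out, sorted(idx))       # expect the two injected indices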

Unsupervised learning: Generating new data
Unsupervised learning with neural networks: train a model to generate new examples based on the training set. Examples: deep dreaming (if you train a network to recognize dogs, it will hallucinate dogs) and style transfer [Gatys, et al.].

Unsupervised learning: Generating new data
Generative Adversarial Network (GAN): a generator maps random noise to fake examples, while a discriminator tries to classify examples as real (from the training set) or fake (from the generator). Both are trained against a cross-entropy (log loss) objective, so the generator improves by fooling the discriminator.
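A highly simplified training-loop sketch in PyTorch on a 1-D toy distribution (every architecture and hyper-parameter here is a placeholder, not the lecture's):

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # noise -> fake sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # sample -> real/fake logit
bce = nn.BCEWithLogitsLoss()                                       # cross entropy (log loss)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    # Discriminator step: real examples labeled 1, generated ones labeled 0.
    real = torch.randn(64, 1) * 0.5 + 3.0                 # stand-in "training set": N(3, 0.5)
    fake = G(torch.randn(64, 8)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: try to make the discriminator output 1 on fakes.
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())              # should drift toward ~3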

Partial supervision: Reinforcement Learning
Third category: partial supervision. E.g., when playing a game, we will not have a known label for every position. The setup is a Markov decision process with states s, actions a, transition probabilities p(s' | s, a), and rewards r(s, a). The goal is to find a policy: the optimal action a given the state s. (Example: AlphaGo.)
https://en.wikipedia.org/wiki/Reinforcement_learning
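The slide only names the ingredients; as a hedged illustration of how they fit together, here is tabular Q-learning (one standard RL algorithm, not necessarily the one behind AlphaGo) on a made-up 5-state chain where action 1 moves right and the rightmost state gives reward 1:

import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))           # action-value table
alpha, gamma, eps = 0.5, 0.9, 0.2             # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    for t in range(100):                      # cap steps so episodes stay finite
        # Epsilon-greedy action choice from the current value estimates.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if s == n_states - 1:                 # terminal state reached
            break

print(Q.argmax(axis=1))    # learned policy: action 1 (right) in states 0 through 3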