A Very Brief Introduction to Machine Learning and its Application to PCE


A Very Brief Introduction to Machine Learning and its Application to PCE
[Title-slide figure: network diagram with LERs 1-4 and a path computation element]

David Meyer
Next Steps for the Path Computation Element Workshop, Feb 17-18, 2015
http://ict-one.eu/pace/public_wiki/mediawiki-1.19.7/index.php?title=Workshops
dmm@{brocade.com, uoregon.edu, 1-4-5.net, ...}

Agenda
- Goals for this Talk
- What is Machine Learning?
- Kinds of Machine Learning
- Machine Learning Fundamentals (shallow dive)
  - Regression and Classification
  - Inductive Learning
  - Focus on Artificial Neural Networks (ANNs)
  - A Bit on Unsupervised Learning
- Deep Learning: Google Power Usage Effectiveness (PUE) Optimization Application
- PCE?

With figures courtesy of Yoshua Bengio and others

Goals for this Talk: To give us a basic common understanding of machine learning so that we can discuss the application of machine learning to PCE.

Before We Start: What is the SOTA in Machine Learning?

Building High-level Features Using Large Scale Unsupervised Learning, Andrew Ng et al., 2012
http://arxiv.org/pdf/1112.6209.pdf

Training a deep neural network:
- Showed that it is possible to train neurons to be selective for high-level concepts using entirely unlabeled data
- In particular, they trained a deep neural network that functions as a detector for faces, human bodies, and cat faces by training on random frames of YouTube videos (ImageNet [1]). These neurons naturally capture complex invariances such as out-of-plane and scale invariance.

Details of the model:
- Sparse deep autoencoder (we'll talk about what this is later in this deck)
- O(10^9) connections, O(10^7) 200x200-pixel images, 10^3 machines, 16K cores, so input data lives in R^40000
- Three days to train
- 15.8% accuracy categorizing 22K object classes, a 70% improvement over current results (random guessing achieves less than 0.005% accuracy for this dataset)

[1] http://www.image-net.org/

Agenda
- Goals for this Talk
- What is Machine Learning?
- Machine Learning Fundamentals (shallow dive)
  - Regression and Classification
  - Inductive Learning
  - Focus on Artificial Neural Networks (ANNs)
- PCE?

What is Machine Learning?

"The complexity in traditional computer programming is in the code (programs that people write). In machine learning, algorithms (programs) are in principle simple and the complexity (structure) is in the data. Is there a way that we can automatically learn that structure? That is what is at the heart of machine learning." -- Andrew Ng

That is, machine learning is about the construction and study of systems that can learn from data. This is very different from traditional computer programming.

The Same Thing Said in Cartoon Form

Traditional programming: Data + Program -> Computer -> Output
Machine learning: Data + Output -> Computer -> Program

When Would We Use Machine Learning?
- When patterns exist in our data, even if we don't know what they are (or perhaps especially when we don't know what they are)
- When we cannot pin down the functional relationships mathematically (else we would just code up the algorithm)
- When we have lots of (unlabeled) data; labeled training sets are harder to come by
- When data is of high dimension (high-dimensional features, e.g., sensor data) and we want to discover lower-dimensional representations (dimension reduction)

Aside: machine learning is heavily focused on implementability, frequently using well-known numerical optimization techniques, and lots of open source code is available. See e.g., libsvm (Support Vector Machines): http://www.csie.ntu.edu.tw/~cjlin/libsvm/. Most of my code is in Python: http://scikit-learn.org/stable/ (many others). Languages (e.g., Octave: https://www.gnu.org/software/octave/).
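To show how little code a baseline takes with the tools mentioned above, here is a minimal scikit-learn sketch; the dataset and model choice are illustrative, not from the slides:

```python
# Minimal supervised-learning sketch with scikit-learn (illustrative only).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)   # 8x8 handwritten digits, flattened to 64 features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf", gamma=0.001)  # an SVM, the model family behind libsvm
clf.fit(X_train, y_train)             # the "learning": fit structure in the data
print("held-out accuracy:", clf.score(X_test, y_test))
```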

Why Machine Learning Is Hard: What is a "2"?

Examples of Machine Learning Problems
- Pattern recognition: facial identities or facial expressions; handwritten or spoken words (e.g., Siri); medical images; sensor data/IoT
- Optimization: many parameters have hidden relationships that can be the basis of optimization (obvious PCE use case)
- Pattern generation: generating images or motion sequences
- Anomaly detection: unusual patterns in the telemetry from physical and/or virtual plants (e.g., data centers); unusual sequences of credit card transactions; unusual patterns of sensor data from a nuclear power plant, or an unusual sound in your car engine, or ...
- Prediction: future stock prices or currency exchange rates

Agenda
- Goals for this Talk
- What is Machine Learning?
- Machine Learning Fundamentals (shallow(er) dive)
  - Inductive Learning: Regression and Classification
  - Focus on Artificial Neural Networks (ANNs)
- PCE?

So What Is Inductive Learning?

Given examples of a function (x, f(x)):
- Supervised learning (because we're given f(x)); we don't explicitly know f (rather, we're trying to fit a model to the data)
- Labeled data set (i.e., the f(x)'s); the training set may be noisy, e.g., (x, f(x) + ε)
- Notation: (x_i, f(x_i)) is denoted (x^(i), y^(i)); y^(i) is sometimes called t_i (t for "target")

Predict the function f(x) for new examples x:
- Discrimination/prediction (regression): f(x) continuous
- Classification: f(x) discrete
- Estimation: f(x) = P(Y = c | x) for some class c
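To make the regression/classification distinction concrete, a small sketch on synthetic data (the data and the two model choices are mine, for illustration only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))

# Regression: f(x) is continuous (here, a noisy line y = 3x + 1 + noise).
y_cont = 3.0 * x.ravel() + 1.0 + rng.normal(0, 0.5, size=100)
reg = LinearRegression().fit(x, y_cont)
print(reg.predict([[5.0]]))           # a real-valued prediction

# Classification: f(x) is discrete (here, a thresholded 0/1 label).
y_disc = (x.ravel() > 5.0).astype(int)
clf = LogisticRegression().fit(x, y_disc)
print(clf.predict_proba([[5.0]]))     # an estimate of P(Y = c | x)
```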

Neural Nets in 1 Slide ☺

Forward Propagation Cartoon

Backpropagation Cartoon
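Since the forward- and back-propagation slides are figures, here is a minimal numpy sketch of the same two passes for a one-hidden-layer network; the layer sizes, sigmoid activation, and squared-error loss are my choices, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny one-hidden-layer network: 3 inputs -> 4 hidden units -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
x, t = rng.normal(size=3), np.array([1.0])    # one training example (x, target)

# Forward propagation: push the input through the layers.
h = sigmoid(W1 @ x + b1)
y = sigmoid(W2 @ h + b2)

# Backpropagation: apply the chain rule from the output layer back toward
# the input layer, for the squared-error loss J = 0.5 * (y - t)^2.
delta2 = (y - t) * y * (1 - y)                # dJ/d(output pre-activation)
delta1 = (W2.T @ delta2) * h * (1 - h)        # dJ/d(hidden pre-activation)
grad_W2, grad_b2 = np.outer(delta2, h), delta2
grad_W1, grad_b1 = np.outer(delta1, x), delta1

# One gradient-descent step on the parameters.
alpha = 0.1
W2 -= alpha * grad_W2; b2 -= alpha * grad_b2
W1 -= alpha * grad_W1; b1 -= alpha * grad_b1
```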

More Formally: Empirical Risk Minimization

(The loss function, also called the cost function, is denoted J(θ).) Any interesting cost function is complicated and non-convex.
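The slide gives only the phrase; written out over a training set {(x^(i), y^(i))}, empirical risk minimization is (a standard formulation, not transcribed from the deck):

```latex
J(\theta) = \frac{1}{n} \sum_{i=1}^{n} L\big(f_\theta(x^{(i)}),\, y^{(i)}\big),
\qquad
\theta^\star = \arg\min_{\theta} J(\theta)
```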

Solving the Risk (Cost) Minimization Problem: Gradient Descent, Basic Idea
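The basic idea on this slide is the standard update rule: take a step of size α (the learning rate) against the gradient of the cost. This is the textbook form, not transcribed from the figure:

```latex
\theta \leftarrow \theta - \alpha \, \nabla_{\theta} J(\theta)
```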

Gradient Descent Intuition 1: Convex Cost Function. One of the many nice properties of convexity is that any local minimum is also a global minimum.

Gradient Descent Intuition 2: Unfortunately, any interesting cost function is likely non-convex.

Solving the Optimization Problem: Gradient Descent for Linear Regression. The big breakthrough in the 1980s from the Hinton lab was the backpropagation algorithm, which is a way of computing the gradient of the loss function with respect to the model parameters θ.
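A minimal sketch of batch gradient descent for linear regression with squared-error cost; the synthetic data, learning rate, and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])  # bias column + one feature
y = X @ np.array([2.0, -3.0]) + rng.normal(0, 0.1, n)    # noisy linear target

theta, alpha = np.zeros(2), 0.5
for _ in range(1000):
    grad = X.T @ (X @ theta - y) / n   # gradient of J(theta) = (1/2n)||X theta - y||^2
    theta -= alpha * grad              # gradient-descent update
print(theta)                           # should approach [2.0, -3.0]
```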

Agenda
- Goals for this Talk
- What is Machine Learning?
- Kinds of Machine Learning
- Machine Learning Fundamentals (shallow dive)
  - Inductive Learning: Regression and Classification
  - Focus on Artificial Neural Networks (ANNs)
- PCE?

Now, How About PCE?

PCE is ideally suited to SDN and machine learning:
- Can we infer properties of paths we can't directly see (i.e., those in other domains)? These likely live in high-dimensional space(s)
- Other inference tasks? Aggregate bandwidth consumption, most loaded links/congestion, cumulative cost of a path set
- Uncover unseen correlations that allow for new optimizations

How to get there from here:
- The PCE was always a form of SDN
- Applying machine learning to the PCE requires understanding the problem you want to solve and what data sets you have

PCE Data Sets

Assume we have a labeled data set {(X^(1), Y^(1)), ..., (X^(n), Y^(n))}, where X^(i) is an m-dimensional vector and Y^(i) is usually a k-dimensional vector, k < m.

Strawman X (the PCE knows this information):
X^(i) = (path end points, desired path constraints, computed path, aggregate path constraints (e.g., path cost), minimum cost path, minimum load path, maximum residual bandwidth path, aggregate bandwidth consumption, load of the most loaded link, cumulative cost of a set of paths, other (possibly exogenous) data)

The Y^(i)'s are a set of classes we want to predict, e.g., congestion, latency, ...
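Purely to make the strawman concrete, a hypothetical sketch of how one (X, Y) pair might be assembled; every field name here is invented for illustration, not part of any PCE protocol or data model:

```python
import numpy as np

# Hypothetical encoding of one PCE observation as (X, Y); all fields invented.
def encode_example(path_record):
    X = np.array([
        path_record["src_id"], path_record["dst_id"],     # path end points
        path_record["max_latency_ms"],                    # desired path constraint
        path_record["path_cost"],                         # aggregate path cost
        path_record["min_cost_path_cost"],                # minimum cost path
        path_record["max_residual_bw_mbps"],              # max residual bandwidth path
        path_record["aggregate_bw_mbps"],                 # aggregate bandwidth consumption
        path_record["most_loaded_link_util"],             # load of the most loaded link
    ], dtype=float)
    # Label: the class we want to predict, e.g., did the path congest?
    Y = np.array([path_record["congested"]], dtype=float)  # here k = 1 < m = 8
    return X, Y
```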

What Might the Labels Look Like? [Figure: example label vectors Y^(i), one row per instance]

Making This Real (what do we have to do?)
- Choose the labels of interest: what are the classes of interest, what might we want to predict?
- Get the (labeled) data set (this is always the trick)
- Split into training, test, and cross-validation sets (see the sketch after this list); avoid generalization error (bias, variance) and avoid data leakage
- Choose a model. I would try a supervised DNN: we want to find non-obvious features, which likely live in high-dimensional space
- Test on (previously) unseen examples
- Write code
- Iterate
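A minimal sketch of that split step, using scikit-learn as the deck does elsewhere; the 60/20/20 proportions are illustrative:

```python
from sklearn.model_selection import train_test_split

# X, y: the labeled PCE data set from the previous slide.
# First carve off 20% as a final, untouched test set...
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# ...then split the remainder into training (60%) and cross-validation (20%) sets.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# Fit on X_train, tune on X_val, and touch X_test only once at the end;
# this is one guard against data leakage and optimistic generalization estimates.
```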

Issues/Challenges
- Is there a unique model that PCEs would use? Unlikely, which points to online learning
- PCE is a non-perceptual task (we think). Most if not all of the recent successes with ML have been on perceptual tasks (image recognition, speech recognition/generation, ...). Does the Manifold Hypothesis hold for non-perceptual data sets?
- Unlabeled vs. labeled data: most commercial successes in ML have come with deep supervised learning, which requires labeled data, and we don't have ready access to large labeled data sets (always a problem)
- Time series data: with the exception of Recurrent Neural Networks, most ANNs do not explicitly model time (e.g., Deep Neural Networks)
- Training vs. {prediction, classification} complexity: stochastic (online) vs. batch vs. mini-batch. Where are the computational bottlenecks, and how do those interact with (quasi) real-time requirements?

Q & A. Thanks!

BTW, How Can Machine Learning Possibly Work?

You want to build statistical models that generalize to unseen cases. What assumptions do we need to do this (essentially, to predict the future)? Four main prior assumptions are (at least) required:

- Smoothness
- Manifold Hypothesis
- Distributed Representation/Compositionality: compositionality is useful for describing the world around us efficiently, and distributed representations (features) are meaningful by themselves. Non-distributed: the number of distinguishable regions is linear in the number of parameters. Distributed: the number of distinguishable regions grows almost exponentially in the number of parameters, since each parameter influences many regions, not just local neighbors. We want to generalize non-locally to never-seen regions.
- Shared Underlying Explanatory Factors: the assumption here is that there are shared underlying explanatory factors, in particular between p(x) (the prior distribution) and p(y|x) (the posterior distribution). Disentangling these factors is in part what machine learning is about.

Before this, however: what is the problem in the first place?

What We Are Fighting: The Curse of Dimensionality

Smoothness. Smoothness assumption: if x is geometrically close to x′, then f(x) ≈ f(x′).

Smoothness, Basically. [Figure: probability mass P(Y = c | X; θ) varying smoothly over X]

Manifold Hypothesis

The Manifold Hypothesis states that natural data forms lower-dimensional manifolds in its embedding space. Why should this be? Well, it seems that there are both theoretical and experimental reasons to suspect that the Manifold Hypothesis is true. And if you believe that the MH is true, then the task of a machine learning classification algorithm is fundamentally to separate a bunch of tangled-up manifolds.