Session 4: Regularization (Chapter 7)


1 Session 4: Regularization (Chapter 7)
Tapani Raiko, Aalto University, 30 September 2015

2 Table of Contents
- Background
- Regularization methods
- Exercises

3 Goal of Regularization
- Neural networks are very powerful (universal approximators).
- It is easy to perform great on the training set (overfitting).
- Regularization improves generalization to new data at the expense of increased training error.
- Use held-out validation data to choose hyperparameters (e.g. regularization strength).
- Use held-out test data to evaluate performance.

4 Example
- Without regularization, training error goes to zero and learning stops.
- With noise regularization, test error keeps dropping.

5 Expressivity demo: Training first layer only
- No regularization, training W^(1) and b^(1) only.
- 0.2% error on the training set, 2% error on the test set.

6 What is overfitting?
- Probability theory states how we should make predictions (of y_test) using a model with unknowns θ and data X = {x_train, y_train, x_test}:

      P(y_test | X) = ∫ P(y_test, θ | X) dθ = ∫ P(y_test | θ, X) P(θ | X) dθ

- The probability of observing y_test is obtained by summing or integrating over all different explanations θ: the term P(y_test | θ, X) is the probability of y_test given a particular explanation θ, and it is weighted by the probability of that explanation, P(θ | X).
- However, such computation is intractable.
- If we want to choose a single θ to represent all the probability mass, it is better not to overfit to the highest probability peak, but to find a good representative of the mass: the center of gravity of the posterior matters, not just its maximum.
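For reference, the usual single-θ shortcut is the maximum a posteriori (MAP) plug-in, stated here explicitly in standard notation (the slide only alludes to it):

    P(y_{\text{test}} \mid X) \approx P(y_{\text{test}} \mid \hat{\theta}, X),
    \qquad
    \hat{\theta} = \arg\max_{\theta} P(\theta \mid X)

The warning above is that when the posterior is skewed, its maximum can sit far from most of the mass, so the plug-in prediction generalizes worse than a mass-centered choice of θ.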

7 Table of Contents
- Background
- Regularization methods
- Exercises

8 Regularization methods
- Limited size of network
- Early stopping
- Weight decay
- Data augmentation
- Injecting noise
- Parameter sharing (e.g. convolutional)
- Sparse representations
- Ensemble methods
- Auxiliary tasks (e.g. unsupervised)
- Probabilistic treatment (e.g. variational methods)
- Adversarial training, ...

9 Limited size of network
- Rule of thumb: when #parameters is ten times smaller than #outputs × #examples, overfitting will not be severe (e.g. with 10 outputs and 50,000 training examples, keep #parameters below 50,000).
- Reducing input dimensionality (e.g. by PCA) helps in reducing #parameters.
- Easy, with low computational complexity, but other methods give better accuracy.
- Data augmentation increases #examples.
- Parameter sharing decreases #parameters.
- Auxiliary tasks increase #outputs.

10 Early stopping
- Monitor validation performance during training.
- Stop when it starts to deteriorate.
- With other regularization in place, validation performance might never start to deteriorate.
- Keeps the solution close to the initialization.
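A minimal sketch of the loop described above; train_epoch and validation_error are hypothetical helpers standing in for your own training and evaluation code:

    import copy

    def train_with_early_stopping(model, patience=10, max_epochs=500):
        """Stop when validation error has not improved for `patience` epochs."""
        best_error = float("inf")
        best_model = copy.deepcopy(model)
        epochs_without_improvement = 0
        for _ in range(max_epochs):
            train_epoch(model)                  # hypothetical: one pass over training data
            err = validation_error(model)       # hypothetical: error on held-out set
            if err < best_error:
                best_error = err
                best_model = copy.deepcopy(model)
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break                       # validation error started to deteriorate
        return best_model                       # stays close to the initialization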

11 Weight decay (Tikhonov, 1943)
- Add a penalty term to the training cost: C̃ = C + Ω(θ).
- Note: Ω is a function of the parameters θ only, not of the data.
- L2 regularization: Ω(θ) = (λ/2) ‖θ‖², with hyperparameter λ for strength. Gradient: ∂Ω(θ)/∂θ_i = λ θ_i.
- L1 regularization: Ω(θ) = λ ‖θ‖₁. Gradient: ∂Ω(θ)/∂θ_i = λ sign(θ_i). Induces sparsity: often many parameters become exactly zero.
- Max-norm: constrain the row vectors w_i of the weight matrices to ‖w_i‖₂ ≤ c.
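These penalties and their gradients are one-liners in code; a numpy sketch (function names are illustrative, not from the course code):

    import numpy as np

    def l2_penalty(theta, lam):
        # Omega(theta) = (lam/2) ||theta||^2, gradient lam * theta_i
        return 0.5 * lam * np.sum(theta ** 2), lam * theta

    def l1_penalty(theta, lam):
        # Omega(theta) = lam ||theta||_1, gradient lam * sign(theta_i)
        return lam * np.sum(np.abs(theta)), lam * np.sign(theta)

    def max_norm_project(W, c):
        # After each update, clip every row w_i back to ||w_i||_2 <= c.
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        return W * np.minimum(1.0, c / np.maximum(norms, 1e-12))

During training, the penalty gradient is simply added to the gradient of the data cost C.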

12 Weight decay
[Figure: contour plots for L2 regularization (left) and L1 regularization (right); w* marks the unregularized solution, w̃ the regularized solution.]

13 Weight decay as Bayesian prior
- Consider the maximum a posteriori (MAP) solution.
- Bayes rule: P(θ | X) ∝ P(X | θ) P(θ); written on a −log scale: C = −log P(X | θ) − log P(θ).
- Assuming a Gaussian prior P(θ) = N(0, λ⁻¹ I), we get Ω(θ) = −Σ_i log exp(−(λ/2) θ_i²) = (λ/2) ‖θ‖², up to an additive constant.
- L2 regularization ↔ Gaussian prior
- L1 regularization ↔ Laplace prior
- Max-norm regularization ↔ uniform prior with finite support
- Ω = 0 ↔ maximum likelihood
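Spelling out the Gaussian step above (a sketch of the derivation; normalization constants are dropped since they do not depend on θ):

    % Gaussian prior with precision \lambda on each parameter:
    P(\theta) = \prod_i \sqrt{\tfrac{\lambda}{2\pi}}\,
                \exp\!\left(-\tfrac{\lambda}{2}\,\theta_i^2\right)
    % Take -log P(theta) and drop the additive constant:
    -\log P(\theta) = \tfrac{\lambda}{2}\sum_i \theta_i^2 + \text{const}
                    = \tfrac{\lambda}{2}\,\lVert\theta\rVert^2 + \text{const}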

14 Data augmentation
- Image from (Dosovitskiy et al., 2014).
- Augment the data with image-specific transformations.
- E.g. cropping away just 2 pixels gets you 9 times the data!
- Infinite MNIST
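The 9x claim is just counting crop offsets: trimming 2 pixels per axis leaves 3 × 3 = 9 possible window positions. A numpy sketch:

    import numpy as np

    def crops(image, margin=2):
        """All crops of an (H, W) image obtained by trimming `margin` pixels per axis."""
        h, w = image.shape
        return [image[dy:h - margin + dy, dx:w - margin + dx]
                for dy in range(margin + 1)
                for dx in range(margin + 1)]

    image = np.random.rand(28, 28)   # e.g. an MNIST digit
    views = crops(image)             # 9 crops of size 26 x 26
    assert len(views) == 9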

15 Injecting noise (Sietsma and Dow, 1991)
- Inject random noise during training, sampled afresh in each epoch.
- Can be applied to the input data, to hidden activations, or to the weights.
- Can be seen as data augmentation.
- Simple and effective.
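A sketch of input-noise injection with fresh noise on every pass; sigma is a hyperparameter to tune on the validation set:

    import numpy as np

    def noisy_batches(x_train, y_train, sigma=0.1, batch_size=100, rng=np.random):
        """Yield minibatches with fresh additive Gaussian noise on the inputs."""
        idx = rng.permutation(len(x_train))
        for start in range(0, len(x_train), batch_size):
            batch = idx[start:start + batch_size]
            x = x_train[batch] + sigma * rng.standard_normal(x_train[batch].shape)
            yield x, y_train[batch]   # targets are left clean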

16 Injecting noise to inputs (analysis)
- Inject small additive Gaussian noise at the inputs.
- Assume a least-squares error at the output y.
- A Taylor series expansion around x shows this corresponds to penalizing the squared norm of the Jacobian, ‖J‖²:

      J = dy/dx = [ ∂y_1/∂x_1 ... ∂y_1/∂x_d ]
                  [     ⋮             ⋮     ]
                  [ ∂y_c/∂x_1 ... ∂y_c/∂x_d ]

- For linear networks, this reduces to an L2 penalty.
- Rifai et al. (2011) penalize the Jacobian directly.
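For a linear map y = Wx the Jacobian is J = W everywhere, and the correspondence E‖y(x+ε) − y(x)‖² = σ² ‖J‖²_F for ε ~ N(0, σ²I) can be checked numerically (a self-contained sketch):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 5))    # linear "network": y = W x, so J = W
    x = rng.standard_normal(5)
    sigma = 0.01

    def f(x):
        return x @ W.T

    eps = sigma * rng.standard_normal((100000, 5))
    diff = f(x + eps) - f(x)                       # equals W @ eps for each sample
    empirical = np.mean(np.sum(diff ** 2, axis=1))
    predicted = sigma ** 2 * np.sum(W ** 2)        # sigma^2 * ||J||_F^2
    print(empirical, predicted)                    # the two should nearly agree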

17 Parameter sharing
- Force sets of parameters to be equal.
- Reduces the number of (unique) parameters.
- Important in convolutional networks (next week).
- Auto-encoders sometimes share weights between encoder and decoder (Oct 28 session).

18 Sparse representations
- Penalize the representation h using a penalty Ω(h) that makes it sparse.
- An L1 penalty on the weights makes W sparse; similarly, an L1 penalty can make h sparse.
- It is also possible to set a desired sparsity level.
- Sparse coding is common in image processing.

19 Ensemble methods
- Train several models and average their outputs.
- Also known as bagging or model averaging.
- It helps to make the individual models different by:
  - varying models or algorithms
  - varying hyperparameters
  - varying data (dropping examples or dimensions)
  - varying the random seed
- A single final model can be trained to mimic the performance of the ensemble, for test-time computational efficiency (Hinton et al., 2015).
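A sketch of the averaging step; train_model and predict_proba are hypothetical stand-ins for whatever training code and probabilistic outputs you have:

    import numpy as np

    # Members differ only by random seed here; varying data or
    # hyperparameters per member works the same way.
    models = [train_model(x_train, y_train, seed=s) for s in range(5)]

    def ensemble_predict(models, x):
        # Average class probabilities over the members, then take the argmax.
        probs = np.mean([m.predict_proba(x) for m in models], axis=0)
        return np.argmax(probs, axis=-1)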

20 Dropout (Hinton et al., 2012)
- Each time we present a data example x, randomly delete each hidden node with probability 0.5.
- Can be seen as injecting noise or as an ensemble:
  - multiplicative binary noise
  - training an ensemble of 2^h networks with weight sharing (h = number of hidden nodes)
- At test time, use all nodes but divide the weights by 2.
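The train/test asymmetry for one hidden layer, as a numpy sketch (ReLU chosen as a placeholder nonlinearity):

    import numpy as np

    def hidden_train(x, W, b, rng=np.random):
        h = np.maximum(0.0, x @ W + b)       # hidden activations
        mask = rng.random(h.shape) >= 0.5    # delete each node with prob. 0.5
        return h * mask                      # multiplicative binary noise

    def hidden_test(x, W, b):
        # Use all nodes; halving the activations here is equivalent to
        # dividing the outgoing weights of this layer by 2.
        return 0.5 * np.maximum(0.0, x @ W + b)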

21 Dropout training (figure)

22 Dropout as bagging (figure)

23 Auxiliary tasks
- Multi-task learning: parameter sharing between multiple tasks.
- E.g. speech recognition and speaker identification could share low-level representations.
- Layer-wise pretraining (Hinton and Salakhutdinov, 2006) can be seen as using unsupervised learning as an auxiliary task (Nov 4 session).

24 Probabilistic treatment
- Variational methods are starting to appear in deep learning research.
- See T-61.5140 Machine Learning: Advanced Probabilistic Methods.
- Jyri Kivinen might discuss these in the Nov 11 session.

25 Adversarial training (Szegedy et al., 2014)
- Search for an input x̃ near a datapoint x that would have an output ỹ very different from y.
- Adversaries can be found surprisingly close!
- Miyato et al. (2015) build a very effective regularizer from this idea.

26 Table of Contents
- Background
- Regularization methods
- Exercises

27 Exercises
- Read Chapter 7 (Regularization) and Chapter 9 (Convolutional Networks).
- Read the Theano tutorial on regularization.
- Extend your MNIST classifier to include regularization. Consider at least L2 weight decay and additive Gaussian noise injected at the inputs.
- Choose a good regularization strength using a held-out validation set.
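One way to structure the last two exercise items, assuming hypothetical helpers train_classifier (applying L2 weight decay of strength lam and input noise of scale sigma) and error(model, x, y):

    # Grid search over regularization strengths on the validation set.
    best = None
    for lam in [0.0, 1e-4, 1e-3, 1e-2]:          # L2 weight decay strengths
        for sigma in [0.0, 0.1, 0.3]:            # input-noise scales
            model = train_classifier(x_train, y_train,
                                     l2_strength=lam, input_noise=sigma)
            val_err = error(model, x_valid, y_valid)
            if best is None or val_err < best[0]:
                best = (val_err, lam, sigma, model)

    val_err, lam, sigma, model = best
    print("chosen lambda =", lam, ", sigma =", sigma)
    print("test error:", error(model, x_test, y_test))   # evaluate once, at the end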
