Session 4: Regularization (Chapter 7)


1 Session 4: Regularization (Chapter 7). Tapani Raiko, Aalto University, 30 September 2015.

2 Table of Contents: Background, Regularization methods, Exercises.

3 Goal of Regularization. Neural networks are very powerful (universal approximators), so it is easy to perform well on the training set while overfitting. Regularization improves generalization to new data at the expense of increased training error. Use held-out validation data to choose hyperparameters (e.g. the regularization strength), and held-out test data to evaluate final performance.

4 Example. Without regularization, training error goes to zero and learning stops. With noise regularization, test error keeps dropping.

5 Expressivity demo: training the first layer only. No regularization, training W^(1) and b^(1) only. 0.2% error on the training set, 2% error on the test set.

6 What is overfitting? Probability theory states how we should make predictions (of y_test) using a model with unknowns θ and data X = {x_train, y_train, x_test}:

P(y_test | X) = ∫ P(y_test, θ | X) dθ = ∫ P(y_test | θ, X) P(θ | X) dθ.

The probability of observing y_test is obtained by summing or integrating over all different explanations θ. The term P(y_test | θ, X) is the probability of y_test given a particular explanation θ, and it is weighted by the probability of that explanation, P(θ | X). However, such a computation is intractable. If we want to choose a single θ to represent all the probability mass, it is better not to overfit to the highest probability peak, but to find a good representative of the mass: the posterior probability mass matters, so a center of gravity beats the maximum.
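A minimal numpy sketch of why the posterior mass matters (the toy logistic model and the posterior mean/std values are illustrative assumptions, not from the slides): a plug-in prediction from the single most probable weight is more overconfident than a prediction that averages over posterior samples.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Suppose the posterior over a single logistic-regression weight w is
# N(mean=2.0, std=1.5) -- a wide posterior, as with little training data.
post_mean, post_std = 2.0, 1.5
x_test = 1.0

# Plug-in prediction from the single most probable weight (the peak).
p_peak = sigmoid(post_mean * x_test)

# Prediction that integrates over the posterior mass (Monte Carlo average).
w = rng.normal(post_mean, post_std, size=100_000)
p_mass = sigmoid(w * x_test).mean()

print(f"peak only: {p_peak:.3f}   averaged over mass: {p_mass:.3f}")
# The averaged prediction is less overconfident, illustrating why the
# posterior mass, not just its peak, matters.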

7 Table of Contents: Background, Regularization methods, Exercises.

8 Regularization methods: limited size of network, early stopping, weight decay, data augmentation, injecting noise, parameter sharing (e.g. convolutional), sparse representations, ensemble methods, auxiliary tasks (e.g. unsupervised), probabilistic treatment (e.g. variational methods), adversarial training, ...

9 Limited size of network. Rule of thumb: when the number of parameters is ten times smaller than (number of outputs) × (number of examples), overfitting will not be severe. Reducing input dimensionality (e.g. by PCA) helps in reducing parameters. Pros: easy, low computational complexity. Cons: other methods give better accuracy. Related: data augmentation increases the number of examples, parameter sharing decreases the number of parameters, and auxiliary tasks increase the number of outputs.
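A small Python sketch of the rule of thumb (the layer sizes and training-set size are made-up examples): count the parameters of a fully connected network and compare against (#outputs × #examples) / 10.

def mlp_param_count(layer_sizes):
    """Weights + biases of a fully connected network, e.g. [784, 100, 10]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

layer_sizes = [784, 100, 10]      # an MNIST-sized classifier (illustrative)
n_examples = 50_000               # training-set size (illustrative)
n_outputs = layer_sizes[-1]

params = mlp_param_count(layer_sizes)
budget = n_outputs * n_examples / 10
print(f"{params} parameters vs. budget {budget:.0f}:",
      "OK" if params <= budget else "expect overfitting")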

10 Early stopping. Monitor validation performance during training and stop when it starts to deteriorate. With other regularization in place, validation performance might never start deteriorating. Early stopping keeps the solution close to the initialization.
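A minimal sketch of the procedure (train_one_epoch and validation_error are placeholders for whatever training code is in use, not names from the slides): keep the parameters from the epoch with the best validation error and stop once it has not improved for a while.

import copy

def train_with_early_stopping(model, train_one_epoch, validation_error,
                              patience=10, max_epochs=500):
    best_err = float("inf")
    best_model = copy.deepcopy(model)
    epochs_since_best = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        err = validation_error(model)
        if err < best_err:
            best_err, best_model = err, copy.deepcopy(model)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:  # validation error deteriorated
                break
    return best_model, best_err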

11 Weight decay (Tikhonov, 1943). Add a penalty term to the training cost: C = E + Ω(θ), where E is the original data-dependent cost and Ω(θ) is a function of the parameters θ only, not of the data. L2 regularization: Ω(θ) = (λ/2) ||θ||², with hyperparameter λ for the strength; gradient ∂Ω(θ)/∂θ_i = λ θ_i. L1 regularization: Ω(θ) = λ ||θ||_1; gradient ∂Ω(θ)/∂θ_i = λ sign(θ_i). L1 induces sparsity: often many parameters become exactly zero. Max-norm: constrain the row vectors w_i of the weight matrices to ||w_i||_2 ≤ c.
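A short numpy sketch (parameter values are arbitrary) of the two penalties and their gradients, as they would be added to the training cost and its gradient:

import numpy as np

def l2_penalty(theta, lam):
    """Omega(theta) = (lam / 2) * ||theta||^2 and its gradient lam * theta."""
    return 0.5 * lam * np.sum(theta ** 2), lam * theta

def l1_penalty(theta, lam):
    """Omega(theta) = lam * ||theta||_1 and its (sub)gradient lam * sign(theta)."""
    return lam * np.sum(np.abs(theta)), lam * np.sign(theta)

theta = np.array([0.5, -1.2, 0.0, 3.0])
lam = 0.01
for name, (omega, grad) in [("L2", l2_penalty(theta, lam)),
                            ("L1", l1_penalty(theta, lam))]:
    print(name, omega, grad)   # add omega to the cost, grad to the gradient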

12 Weight decay: effect of the L2 (left) and L1 (right) penalties, comparing the unregularized solution w* with the regularized solution w̃.

13 Weight decay as Bayesian prior. Consider the maximum a posteriori solution. Bayes rule: P(θ | X) ∝ P(X | θ) P(θ), written on the negative log scale: C = −log P(X | θ) − log P(θ). Assuming a Gaussian prior P(θ) = N(0, λ⁻¹ I) we get Ω(θ) = −Σ_i log exp(−λ θ_i² / 2) = (λ/2) ||θ||². Correspondences: L2 regularization ↔ Gaussian prior; L1 regularization ↔ Laplace prior; max-norm regularization ↔ uniform prior with finite support; Ω = 0 ↔ maximum likelihood.

14 Data augmentation. Image from (Dosovitskiy et al., 2014): data augmented by image-specific transformations. E.g. cropping by just 2 pixels gets you 9 times the data! See also Infinite MNIST.
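A minimal numpy sketch of the 2-pixel-crop idea (the image size and contents are illustrative): cropping a 28×28 image by 2 pixels, i.e. taking every 26×26 window, yields 3 × 3 = 9 shifted versions of each example.

import numpy as np

def all_crops(image, margin=2):
    """All crops of size (H - margin) x (W - margin); margin=2 gives 9 crops."""
    h, w = image.shape
    ch, cw = h - margin, w - margin
    return np.stack([image[dy:dy + ch, dx:dx + cw]
                     for dy in range(margin + 1)
                     for dx in range(margin + 1)])

image = np.arange(28 * 28, dtype=float).reshape(28, 28)  # stand-in for a digit
crops = all_crops(image)
print(crops.shape)   # (9, 26, 26): nine times the data from one image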

15 Injecting noise (Sietsma and Dow, 1991). Inject random noise during training, drawn separately in each epoch. It can be applied to the input data, to hidden activations, or to the weights, and can be seen as a form of data augmentation. Simple and effective.
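A short numpy sketch of injecting additive Gaussian noise at the inputs (the noise level, array shapes, and the take_gradient_step placeholder are illustrative assumptions): fresh noise is drawn every time a minibatch is produced, so every epoch sees a different corrupted version of the data.

import numpy as np

rng = np.random.default_rng(0)

def noisy_batches(x, y, batch_size=128, noise_std=0.1):
    """Yield minibatches with fresh input noise every time they are drawn."""
    idx = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        b = idx[start:start + batch_size]
        yield x[b] + noise_std * rng.normal(size=x[b].shape), y[b]

x_train = rng.normal(size=(512, 784))          # stand-in data
y_train = rng.integers(0, 10, size=512)
xb, yb = next(noisy_batches(x_train, y_train))
print(xb.shape, yb.shape)
# for epoch in range(n_epochs):
#     for xb, yb in noisy_batches(x_train, y_train):
#         take_gradient_step(xb, yb)           # placeholder training step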

16 Injecting noise to inputs (analysis). Inject small additive Gaussian noise at the inputs and assume a least-squares error at the output y. A Taylor series expansion around x shows that this corresponds to penalizing the squared norm of the Jacobian ||J||², where J = dy/dx is the c × d matrix with entries ∂y_i/∂x_j (outputs y_1, ..., y_c, inputs x_1, ..., x_d). For linear networks, this reduces to an L2 penalty on the weights. Rifai et al. (2011) penalize the Jacobian directly.
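A small numpy check of the claim for a linear network (the weights, input, target, and noise level are random illustrative values): the expected least-squares loss under input noise equals the clean loss plus σ² times the squared Frobenius norm of the Jacobian, which for a linear model is the weight matrix itself.

import numpy as np

rng = np.random.default_rng(0)

# A linear "network" y = W x with least-squares error against a target t.
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
t = rng.normal(size=3)
sigma = 0.3

def loss(x_in):
    return np.sum((W @ x_in - t) ** 2)

# Monte Carlo estimate of the expected loss under input noise.
noisy = np.mean([loss(x + sigma * rng.normal(size=5)) for _ in range(100_000)])

# Analytic prediction: clean loss + sigma^2 * ||J||_F^2, with J = W here.
predicted = loss(x) + sigma ** 2 * np.sum(W ** 2)
print(noisy, predicted)   # the two agree, up to Monte Carlo error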

17 Parameter sharing. Force sets of parameters to be equal, which reduces the number of (unique) parameters. Important in convolutional networks (next week). Auto-encoders sometimes share weights between the encoder and the decoder (Oct 28 session).
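A tiny numpy sketch of one form of sharing mentioned above (layer sizes and the tanh nonlinearity are illustrative assumptions): an auto-encoder whose decoder reuses the transpose of the encoder weight matrix, halving the number of unique weight parameters.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 784))   # single shared weight matrix
b_enc, b_dec = np.zeros(64), np.zeros(784)

def autoencode(x):
    h = np.tanh(W @ x + b_enc)        # encoder uses W
    return W.T @ h + b_dec            # decoder reuses W transposed (shared)

x = rng.normal(size=784)
print(autoencode(x).shape)            # (784,): one W serves both directions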

18 Sparse representations. Penalize the representation h using Ω(h) to make it sparse. An L1 penalty on the weights makes W sparse; similarly, an L1 penalty on the activations can make h sparse. It is also possible to set a desired sparsity level. Sparse coding is common in image processing.
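A short numpy sketch of an L1 activity penalty (layer sizes and the strength beta are illustrative): the penalty on h and its subgradient, which would be added to the cost and backpropagated through the layer that produced h.

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

W, b = rng.normal(scale=0.1, size=(100, 784)), np.zeros(100)
x = rng.normal(size=784)
h = relu(W @ x + b)                   # hidden representation

beta = 0.01                           # sparsity strength
omega = beta * np.sum(np.abs(h))      # add this to the training cost
d_omega_dh = beta * np.sign(h)        # backpropagate this through the layer
print(omega, np.mean(h == 0.0))       # penalty value, current fraction of zeros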

19 Ensemble methods. Train several models and average their outputs; also known as bagging or model averaging. It helps to make the individual models different by varying the models or algorithms, the hyperparameters, the data (dropping examples or dimensions), or the random seed. It is possible to train a single final model to mimic the performance of the ensemble, for test-time computational efficiency (Hinton et al., 2015).
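A minimal numpy sketch of model averaging (the member models here are untrained linear scorers with softmax outputs, purely for illustration): average the predicted class probabilities of several models that differ only in their random seed.

import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

x = np.random.default_rng(0).normal(size=784)
members = []
for seed in range(5):                          # e.g. vary only the random seed
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.05, size=(10, 784)) # stand-in "trained" model
    members.append(W)

probs = np.mean([softmax(W @ x) for W in members], axis=0)  # ensemble output
print(probs.argmax(), probs.max())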

20 Dropout (Hinton et al., 2012). Each time we present a data example x, randomly delete each hidden node with probability 0.5. It can be seen as injecting noise (multiplicative binary noise) or as an ensemble (training an ensemble of 2^h networks with weight sharing, where h is the number of hidden nodes). At test time, use all nodes but divide the outgoing weights by 2.
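A short numpy sketch of dropout on one hidden layer (layer sizes and the tanh nonlinearity are illustrative): multiply the hidden activations by a fresh binary mask during training, and at test time use all nodes with the outgoing weights halved.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(100, 784))
W2 = rng.normal(scale=0.1, size=(10, 100))

def forward(x, train=True, p_drop=0.5):
    h = np.tanh(W1 @ x)
    if train:
        mask = rng.random(h.shape) >= p_drop   # delete each node with prob 0.5
        return W2 @ (h * mask)
    return (W2 * (1.0 - p_drop)) @ h           # test time: halve outgoing weights

x = rng.normal(size=784)
print(forward(x, train=True).shape, forward(x, train=False).shape)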

21 Dropout training (figure).

22 Dropout as bagging (figure).

23 Auxiliary tasks. Multi-task learning: parameter sharing between multiple tasks; e.g. speech recognition and speaker identification could share low-level representations. Layer-wise pretraining (Hinton and Salakhutdinov, 2006) can be seen as using unsupervised learning as an auxiliary task (Nov 4 session).

24 Probabilistic treatment. Variational methods are starting to appear in deep learning research; see also the course Machine Learning: Advanced Probabilistic Methods. Jyri Kivinen might discuss these in the Nov 11 session.

25 Adversarial training (Szegedy et al., 2014). Search for an input x̃ near a datapoint x whose output ỹ is very different from y. Adversarial examples can be found surprisingly close! Miyato et al. (2015) build a very effective regularizer from this idea.
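A minimal numpy sketch of the search (the toy logistic model, step size, and number of steps are illustrative assumptions; the original paper uses a more elaborate optimization): take small gradient-ascent steps on the loss with respect to the input to find a nearby x̃ whose prediction differs from the one at x.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = rng.normal(size=784)            # toy "trained" logistic-regression weights
x = rng.normal(size=784)
y = 1.0                             # the label the model currently predicts for x

x_adv = x.copy()
for _ in range(10):                 # small gradient-ascent steps on the loss
    p = sigmoid(w @ x_adv)
    grad_x = (p - y) * w            # d(cross-entropy)/dx for this model
    x_adv += 0.01 * np.sign(grad_x) # move in the direction that hurts most

print(sigmoid(w @ x), sigmoid(w @ x_adv), np.abs(x_adv - x).max())
# The prediction can change a lot even though x_adv stays close to x.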

26 Table of Contents: Background, Regularization methods, Exercises.

27 Exercises. Read Chapter 7 (Regularization) and Chapter 9 (Convolutional Networks), and the Theano tutorial on regularization. Extend your MNIST classifier to include regularization: consider at least L2 weight decay and additive Gaussian noise injected at the inputs, and choose a good regularization strength using a held-out validation set.
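A framework-agnostic Python sketch of how the pieces of the exercise fit together (init_model, minibatches, grad_cost, sgd_step, and error_rate are placeholders for your own MNIST classifier code, not real library calls): add the L2 penalty to the gradient, inject Gaussian noise into each input batch, and sweep both strengths on the validation set.

import numpy as np

rng = np.random.default_rng(0)

def fit_and_validate(lam, noise_std, data, n_epochs=20):
    """Train one model with the given regularization strengths and return
    its validation error. All helper functions are placeholders."""
    x_tr, y_tr, x_val, y_val = data
    theta = init_model()
    for _ in range(n_epochs):
        for xb, yb in minibatches(x_tr, y_tr):
            xb = xb + noise_std * rng.normal(size=xb.shape)   # input noise
            g = grad_cost(theta, xb, yb) + lam * theta        # + L2 gradient
            theta = sgd_step(theta, g)
    return error_rate(theta, x_val, y_val)

# Choose hyperparameters on the validation set, never on the test set:
# best = min((fit_and_validate(lam, s, data), lam, s)
#            for lam in [0.0, 1e-4, 1e-3, 1e-2]
#            for s in [0.0, 0.1, 0.3])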
