Hello! Practical deep neural nets for detecting marine mammals
daniel.nouri@gmail.com / @dnouri

Kaggle competitions
2-second sound clips: right whale upcall or not?

ICML2013 comp results (1)
47k examples, 10% positive: AUC 0.988 (Kaggle valid set), accuracy 97.3%
62k examples, 19% positive: AUC 0.992 (Kaggle valid set), accuracy 97.3%

ICML2013 comp results (2)
Confusion matrix:
                predicted no   predicted yes
    actual no       3152              79
    actual yes        29             740

ICML2013 comp results (3)
           precision   recall   f1-score   support
    neg       0.99      0.98       0.98      3231
    pos       0.90      0.96       0.93       769
    avg       0.97      0.97       0.97      4000

Predictions

This presentation
1. Quick overview: deep learning
2. An implementation: cuda-convnet
3. Practical tips for better results

Neural networks
Find weights so that the hypothesis h produces the desired output.

Deep neural networks
"Deep" because of many hidden layers.

Deep learning: and the brain
Fascinating idea, the "one algorithm" hypothesis: rewire the sensors so that auditory input feeds the visual cortex, and the visual cortex will learn to hear.

Deep learning: so what
A DNN is not just a classifier, but also a very powerful feature extractor. It can stand in for hand-tuned signal processing: filtering, noise reduction, contour extraction, and per-species (sometimes uninformed) assumptions.

Deep learning: claim
Big, bold claim: less work, better results. Challenge me!

Deep learning: breakthrough
Recent breakthroughs in many fields: image recognition, image search (autoencoders), speech recognition, natural language processing, and passive acoustics for detecting mammals!

Deep learning: old ideas
Backprop for training weights; but training used to be hard.

Deep learning: new things
New developments that enabled the breakthrough: much larger (deeper) nets, the ability to train them better through GPUs (a huge jump in performance), more (labeled) data, the 'relu' activation function, and dropout.

Implementation: cuda-convnet
By Alex Krizhevsky of Hinton's group. Open source with good docs; examples included (CIFAR). A very fast implementation of convolutional DNNs based on CUDA; C++ and Python. code.google.com/p/cuda-convnet/

cuda-convnet: ILSVRC 2012
Large Scale Visual Recognition Challenge 2012: 1.2 million high-resolution training images, 1000 object classes. The winning code was based on cuda-convnet, trained for a week on two GPUs: 60 million parameters and 650,000 neurons. 16.4% error versus 26.1% (2nd place).

cuda-convnet: ILSVRC 2012

cuda-convnet: config (1)
layers.cfg defines the architecture:

    [fc4]            # layer name
    type=fc          # type of layer
    inputs=fc3       # layer input
    outputs=512      # number of units
    initW=0.01       # weight initialization
    neuron=relu      # activation function

cuda-convnet: config (2)
layers.cfg defines many layers: [data] [resize] [conv1] [pool1] [conv2] [pool2] [fc3] [fc4] [fc5] [probs] [logprob]

cuda-convnet: config (3)
layer-params.cfg defines additional params for the layers in layers.cfg: params that may change during training, e.g. learning rate and regularization. An example section is sketched below.
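
For instance, a conv layer's section might look like the following; the parameter names are cuda-convnet's, while the values are merely along the lines of its CIFAR example:

    [conv1]
    epsW=0.001    # learning rate for the weights
    epsB=0.002    # learning rate for the biases
    momW=0.9      # momentum for the weights
    momB=0.9      # momentum for the biases
    wc=0.004      # weight decay (L2 regularization)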

cuda-convnet: input file format
Actual training data: data_batch_1, data_batch_2, ..., data_batch_n; statistics (mean): batches_meta. Each data_batch_* is a pickled dict with {'data': NumPy array, 'labels': list}: a few lines of Python, sketched below.
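
A hedged sketch of those few lines. The shapes follow the column-major CIFAR convention (one column per example); the sizes and the zero-filled placeholder data are illustrative:

    import cPickle as pickle
    import numpy as np

    num_examples, num_features = 4000, 100 * 100   # e.g. 100x100 spectrograms
    data = np.zeros((num_features, num_examples), dtype=np.single)
    labels = [0] * 3000 + [1] * 1000               # one integer label per example

    with open('data_batch_1', 'wb') as f:
        pickle.dump({'data': data, 'labels': labels}, f)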

cuda-convnet: data provider
A Python class responsible for reading data and passing it on to the neural net. An example data layer is included; you can adjust it, e.g. when dealing with grayscale input or different cropping. A sketch follows.
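
A hedged sketch modeled on the CIFARDataProvider class in cuda-convnet's convdata.py; the class name, the grayscale sizes, and the assumption that batches_meta stores a 'data_mean' are mine:

    import numpy as n
    from data import LabeledMemoryDataProvider

    class SpectrogramDataProvider(LabeledMemoryDataProvider):
        def __init__(self, data_dir, batch_range, init_epoch=1,
                     init_batchnum=None, dp_params={}, test=False):
            LabeledMemoryDataProvider.__init__(self, data_dir, batch_range,
                                               init_epoch, init_batchnum,
                                               dp_params, test)
            self.img_size, self.num_colors = 100, 1  # grayscale, not RGB
            for d in self.data_dic:
                # subtract the mean; ensure single precision and C order
                d['data'] = n.require(d['data'] - self.batch_meta['data_mean'],
                                      dtype=n.single, requirements='C')
                d['labels'] = n.require(n.array(d['labels']).reshape(
                    (1, d['data'].shape[1])), dtype=n.single, requirements='C')

        def get_next_batch(self):
            epoch, batchnum, datadic = LabeledMemoryDataProvider.get_next_batch(self)
            return epoch, batchnum, [datadic['data'], datadic['labels']]

        def get_data_dims(self, idx=0):
            # size of a data vector (idx 0) or of a label (idx 1)
            return self.img_size**2 * self.num_colors if idx == 0 else 1

cuda-convnet looks providers up by the name passed to --data-provider, so a new class also has to be registered the way the CIFAR providers are.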

cuda-convnet: training (1)

    python convnet.py --data-path=../cifar-10-batches-py-colmajor/ \
        --save-path=../tmp --test-range=5 --train-range=1-4 \
        --layer-def=layers.cfg --layer-params=layer-params.cfg \
        --data-provider=cifar-cropped --test-freq=13 \
        --crop-border=4 --epochs=100

cuda-convnet: training (2)
Continue training from a snapshot:

    python convnet.py -f ../tmp/convnet_2013-06-14_15.54.31 --epochs=110

cuda-convnet: prediction
Input: data_batch_x; output: CSV file and other formats. See the predict script at github.com/dnouri/noccn.

Practical tips for better results
Lots of hyperparameters; the most important:
- number and type of layers
- number of units per layer
- number of convolutional filters and their size
- weight initialization (initW)
- learning rates (epsW)
- weight decay
- number of input dims

Practical: where to start
Lots of parameters; automated grid search is not feasible, at least not for bigger nets. You need to start with reasonable defaults; standard architectures go a long way.

Practical: try examples
I worked from the CIFAR-10 image classification examples when I started with the upcall detection challenge; feeding a spectrogram into a very similar net already gave great results.

Practical: overfit first
Configure the net to overfit first; add regularization later, except maybe weight decay in conv layers, which helps with learning. Hinton: if your deep neural net isn't overfitting, it isn't big enough.

Practical: init weights (1)
Fine-tuning net hyperparameters can take a long time; a net with better initialized weights trains much faster, reducing the round-trip time for fine-tuning. We initialize weights from a random distribution.

Practical: init weights (2)
Play a little and compare the training error of the first epoch: whatever trains faster wins. If you change the number of units, you'll probably want to change the scale of the weight initialization too; see the sketch below.
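
One common rule of thumb, not from the deck: scale the initialization with the fan-in, so that pre-activation variance stays roughly constant as the layer grows. A minimal sketch:

    import numpy as np

    def init_weights(fan_in, fan_out, rng=np.random):
        # shrink the scale as the number of inputs per unit grows
        scale = 1.0 / np.sqrt(fan_in)
        return rng.normal(0.0, scale, size=(fan_in, fan_out))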

Practical: check filters
Noisy convolutional filters are bad for generalization.

Practical: check weights
Make sure that all or at least many filters are active (shown here: the second conv layer).

Practical: init weights (3)
DBNs: pre-training to learn weights. Use it if you don't have a lot of labeled data.

Practical: learning rate
Relatively easy to find good values. Too high: the training error doesn't decrease. Too low: the training error decreases slowly and gets stuck in a local optimum. Reduce the rate at the end of training to get a little more gain.

Practical: weight decay
Pulls weights towards zero and makes for cleaner filters. Don't use it for fully connected layers; instead use...

Practical: Dropout
A recent development; the effect is similar to averaging many individual nets, but faster to train and test. Use dropout of 0.5 in fully connected layers, sometimes 0.2 in input layers. My best model uses dropout and overfits very little. A sketch of the idea follows.
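
A NumPy sketch of the technique, not cuda-convnet's implementation; this is the "inverted" variant, which scales at train time so that test-time activations need no adjustment:

    import numpy as np

    def dropout(activations, p=0.5, rng=np.random):
        # zero each unit with probability p; scale survivors by 1/(1-p)
        # so the expected activation is unchanged
        mask = rng.rand(*activations.shape) > p
        return activations * mask / (1.0 - p)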

Practical: data augmentation
More data, better generalization. Augment data at train time, e.g. mix an example together with a random negative example; a sketch follows.
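
The deck doesn't spell out the mixing recipe; this is a hedged sketch of one plausible version, where a random negative clip's spectrogram is added to a training example (names and the plain sum are illustrative):

    import numpy as np

    def mix_with_negative(example, negatives, rng=np.random):
        # add in a randomly chosen negative-class spectrogram;
        # the label of `example` stays the same
        neg = negatives[rng.randint(len(negatives))]
        return example + neg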

Practical: cropping
Another way to augment data: crop a 100x100 window from the 120x100 spectrogram; a sketch follows.
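
A minimal sketch of that crop, assuming the 120-wide dimension is the time axis and NumPy arrays (names are mine):

    import numpy as np

    def random_crop(spec, out_width=100, rng=np.random):
        # spec: 100 (frequency) x 120 (time); pick a random 100-wide window
        offset = rng.randint(spec.shape[1] - out_width + 1)
        return spec[:, offset:offset + out_width]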

References (1)
ImageNet Classification with Deep Convolutional Neural Networks [Krizhevsky 2012]
Improving neural networks by preventing co-adaptation of feature detectors [Hinton 2012]
Practical recommendations for gradient-based training of deep architectures [Bengio 2012]

References (2)
code.google.com/p/cuda-convnet/
github.com/dnouri/cuda-convnet
github.com/dnouri/noccn
daniel.nouri@gmail.com
Thanks!