Artificial Neural Networks. Andreas Robinson 12/19/2012


Introduction
Artificial Neural Networks
- Machine learning technique
- Learning from past experience/data
- Predicting/classifying novel data
- Biologically motivated: the human brain
- 1-layer networks are related to Support Vector Machines
- Universal approximators

Motivating Quote
"We are currently experiencing a second Neural Network Renaissance (the first one happened in the 1980s and early 90s). In many applications, our deep NNs are now outperforming all other methods including the theoretically less general and less powerful support vector machines (which for a long time had the upper hand, at least in practice)." - Dr. Jürgen Schmidhuber
Between 2009 and 2012 the Swiss AI lab has won 8 international pattern recognition contests and currently holds the record on several machine learning benchmark datasets.

NN Example Applications
- Post Office OCR: used to recognize handwritten zip code digits for the postal service; also bank check readers
- DARPA Grand Challenge: used by the winning team as part of the solution for extracting roads from aerial imagery
- DARPA Deep Learning BAA (2009 to present): unsupervised deep architectures, automatic feature extraction; much of the research has shifted toward deep neural networks and related approaches
- Goodrich Aerospace: learning telemetry mapping from shear ports / Pitot probes
- Primordial: proposed as a possible approach for Natick Phase 2 land cover extraction using deep neural networks; previously tried SVMs, Maximum Likelihood, EM, and region segmentation

Summary
- Neurons
- Single-layer networks: perceptrons (1950s-60s)
- Multi-layer networks (1-2 layers, 1980s-90s)
- Demo
- Deep neural networks
- Recurrent neural networks (briefly)
- Competitions / benchmarks
- Libraries

Biological Neuron
              Neurons    Synapses    Ops/sec
Human Brain   ~10^11     ~10^14      ~10^17

Artificial Neuron

Perceptrons
- Invented: Rosenblatt, 1957
- Structure: input/output layers, no hidden layers
- Activation function: hard threshold
- Perceptron example: something missing? (feed forward)
- Shape of the decision boundary?
- Learning rule: W_i <- W_i + alpha * (y - h_w(x)) * x_i (see the sketch below)
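
A minimal sketch of the learning rule above (illustrative, not from the slides), assuming the OR function as training data and a hard-threshold activation:

# Perceptron sketch: learns OR with W_i <- W_i + alpha * (y - h_w(x)) * x_i.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([0, 1, 1, 1], dtype=float)                      # OR targets
w = np.zeros(2)
b = 0.0
alpha = 0.1

def h(x):
    # Hard-threshold activation on the weighted sum
    return 1.0 if np.dot(w, x) + b > 0 else 0.0

for epoch in range(20):
    for xi, yi in zip(X, y):
        err = yi - h(xi)
        w += alpha * err * xi   # perceptron learning rule
        b += alpha * err        # bias treated as a weight on a constant input

print([h(xi) for xi in X])  # expected: [0.0, 1.0, 1.0, 1.0]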

Perceptron Limitations
- Linear separators only
- Problematic cases (not linearly separable)
- Decline of neural network research: "Perceptrons", Minsky & Papert, 1969
- Also the first AI winter

Multilayer Neural Networks
- Key innovation: a technique for training more than one layer - back propagation
- Reinvigorated interest in neural nets
- Back prop invented: 1969 [Ho]; reinvented: 1974 [Werbos], 1985 [Parker]; widespread use: 1980s and early 90s
- Addressed key deficiencies that had been raised with perceptrons, e.g. XOR
- Still feed forward: typically 1-2 hidden layers

Multilayer Neural Network

Activation Functions
- Hard threshold
- Support non-linearities
- Sigmoid: differentiable
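
A small sketch (not from the slides) of the two activation functions mentioned; the sigmoid's differentiability is what makes back propagation possible:

import numpy as np

def hard_threshold(z):
    # Perceptron-style step activation
    return np.where(z > 0, 1.0, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative used in the back-propagation weight updates
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-5, 5, 11)
print(hard_threshold(z))
print(np.round(sigmoid(z), 3))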

Training the Network
- Back propagation: gradient descent to minimize error on the training sample
- Adjust weights in the direction that locally minimizes the error
- Direction determined by the gradient of the error function (local partial derivatives) on the training samples: (dE/dw_0, dE/dw_1, ...)
- Differentiable activation function + squared error => closed-form derivative
- Resulting weight update equation; errors propagate back through the network (see the sketch below)
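
A minimal back-propagation sketch (illustrative, assuming the XOR problem as data): one hidden layer, sigmoid activations, squared error, batch gradient descent. Like any gradient descent it can occasionally stall in a local minimum, as noted on the theory slide below.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=1.0, size=(2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(scale=1.0, size=(4, 1)); b2 = np.zeros(1)   # hidden -> output
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: gradients of the squared error w.r.t. each weight
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0]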

Multilayer Networks - Theory
- Universal approximator with 1 hidden layer: finite domain, continuous functions
- Local minima: momentum, retraining
- Typical topology: 1-2 layers
- Regression vs classification (output activation)
- Determining the structure and parameters
- Overfitting: cross validation, early stopping, regularization

Universal Approximator Visualization

Pre-processing
- Manual feature selection: HOGs, wavelets, shape descriptors, statistical properties, SIFT
- Reduce dimensionality
- Improve separability
- Segmentation / entity detection
- Training data deformations / invariants

Neural Network Demo
- Sample app: svn://bordeaux/source/classifysvm/ (OpenCV)
- Number of features?
- Adjustable parameters: number of hidden nodes, training iterations
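
A sketch of a comparable shallow network using OpenCV's Python bindings (the cv2.ml.ANN_MLP API of OpenCV 3/4, not the original classifysvm sample app); the layer sizes and termination criteria correspond to the adjustable parameters named above, and the XOR data is just an illustrative stand-in:

import numpy as np
import cv2

samples = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
responses = np.array([[0], [1], [1], [0]], dtype=np.float32)  # XOR labels

mlp = cv2.ml.ANN_MLP_create()
mlp.setLayerSizes(np.array([2, 4, 1]))                  # features, hidden nodes, outputs
mlp.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
mlp.setTrainMethod(cv2.ml.ANN_MLP_BACKPROP)
mlp.setTermCriteria((cv2.TERM_CRITERIA_MAX_ITER | cv2.TERM_CRITERIA_EPS, 1000, 1e-6))  # training iterations
mlp.train(samples, cv2.ml.ROW_SAMPLE, responses)

_, predictions = mlp.predict(samples)
print(np.round(predictions.ravel(), 2))  # approximate XOR outputs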

Deep Networks
- Definition: number of layers, hierarchical structure
- Deep networks traditionally not used
  - Common fallacy: a 1-layer network is already a universal approximator
  - Lack of effective training algorithms: local minima, vanishing gradients, small training sets
- Automatic feature extraction in the lower layers
- Unsupervised: recent algorithms for training, e.g. stacked auto-encoders, Restricted Boltzmann Machines (RBMs), etc.
- Supervised: convolutional networks, unsupervised pre-training

Motivation for Deep Structures
- Single layer: universal approximator
- Compact representations: "most functions representable compactly with a deep architecture would require a very large number of components if represented with a shallow one"
  - Example: for all k, there are depth-(k+1) circuits of linear size that require exponential size to simulate with depth-k circuits
  - Complexity in terms of number of bits or number of input nodes
- Generalization
  - Lookup table: linear in sample size, exponential in bits
  - Sub-exponential representation => underlying pattern

Unsupervised: Auto-encoders
- No labeled training samples
- Sparse auto-encoder: learn a compact representation (input => input)
- Small number of hidden nodes, or a bias in the optimization towards zero-valued weights
- Can use back prop to train the network (a minimal sketch follows below)
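
A minimal auto-encoder sketch (illustrative): a small hidden layer forces the network to learn a compact representation of its own input, trained with ordinary back propagation; the synthetic data with hidden 3-D structure is an assumption made for the example.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Unlabeled data with hidden 3-D structure, observed as 8-D inputs.
latent = rng.random((200, 3))
X = sigmoid(latent @ rng.normal(size=(3, 8)))
n = X.shape[0]

n_hidden = 3                      # small number of hidden nodes (the bottleneck)
W1 = rng.normal(scale=0.1, size=(8, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 8)); b2 = np.zeros(8)
lr = 0.5

for step in range(5000):
    h = sigmoid(X @ W1 + b1)          # encode
    out = sigmoid(h @ W2 + b2)        # decode: reconstruct the input
    d_out = (out - X) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out) / n; b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * (X.T @ d_h) / n;   b1 -= lr * d_h.mean(axis=0)

print("mean reconstruction error:", np.mean((out - X) ** 2))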

Auto-encoder Visualization
Images that maximize the activation of each hidden node (feature)

Stacked Auto-encoders
- Hierarchical, deep structure
- Hidden nodes represent features
- Low-level to more abstract features: edges, shapes, faces
- Unsupervised pre-training (sketched below)
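
A greedy layer-wise pre-training sketch (illustrative, with made-up layer sizes and random data): each auto-encoder is trained on the hidden-layer output of the one below it, producing a stack of increasingly abstract feature layers.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.5, steps=2000):
    # Train one auto-encoder on X and return its encoder weights (W, b).
    n, n_in = X.shape
    W1 = rng.normal(scale=0.1, size=(n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_in)); b2 = np.zeros(n_in)
    for _ in range(steps):
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        d_out = (out - X) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * (h.T @ d_out) / n; b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_h) / n;   b1 -= lr * d_h.mean(axis=0)
    return W1, b1

X = rng.random((200, 16))                 # unlabeled input data
layer_sizes = [8, 4]                      # two stacked feature layers
encoders, layer_input = [], X
for n_hidden in layer_sizes:
    W, b = train_autoencoder(layer_input, n_hidden)
    encoders.append((W, b))
    layer_input = sigmoid(layer_input @ W + b)   # feed features to the next layer

print("top-layer feature shape:", layer_input.shape)  # (200, 4)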

Google YouTube Classification
- Google team, 2012: unsupervised learning from YouTube images
- Stacked sparse auto-encoders, 9 layers
- 1 billion weight connections (compared to roughly 10^14 synapses in the brain)
- 10 million images: random unlabeled YouTube frames
- 1000-machine cluster (16,000 cores), trained for 3 days
- Pooling and local contrast normalization; local receptive field
- Who: Jeff Dean, a Google technical fellow (et al.); Andrew Y. Ng, a Stanford computer scientist
- Current record on the ImageNet database (supervised training on top of the pre-trained network, 20k object types): 15.8% accuracy, 70% better than the previous best (random guess: 0.005%) - a challenging dataset
- LSVRC ImageNet 2012 Challenge (not Google): deep convolutional network (GPU)
  - Best team: 15.3% error (as opposed to accuracy); runner-up 26.1% (SIFT features); 1000 object types
- Deep learning quote from the winning team: "The point about this approach is that it scales beautifully. Basically you just need to keep making it bigger and faster, and it will get better. There's no looking back now." [Hinton]

Google Auto-encoder Visualization
Images that maximize the activation of two hidden node features

Deep Networks - Supervised
- Convolutional neural networks: the current top approach in many machine learning competitions/datasets
- Biologically motivated: visual cortex, local receptive field
- Shared weights: reduced search space, translation invariance
- Trained with back prop; Yann LeCun (90s)
- Unsupervised pre-training, e.g. stacked auto-encoders: the Google approach, non-convolutional (downplayed by the Swiss team)

Convolutional Networks
- Local receptive field: biological motivation
- Shared weights (convolutions): multiple feature maps per layer, translation invariance
- Pooling (max): downsampling
(Figures: local receptive field; shared weights)
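
A sketch (illustrative, with a made-up edge filter) of the two building blocks named above: a shared-weight convolution over local receptive fields, and max pooling for downsampling.

import numpy as np

def conv2d(image, kernel):
    # Slide one shared kernel over every local receptive field of the image.
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    # Downsample by keeping the maximum in each size x size block.
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = np.array([[1.0, -1.0], [1.0, -1.0]])   # a simple vertical-edge filter

feature_map = conv2d(image, kernel)   # shape (7, 7)
pooled = max_pool(feature_map)        # shape (3, 3)
print(feature_map.shape, pooled.shape)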

Deep Network Results - Swiss AI Lab (Dr. Schmidhuber)
- Primarily convolutional networks and recurrent networks (see next slide), with GPU implementations
- In some cases, raw imagery rather than manual features
- Since 2009 the lab has won 8 first prizes in visual pattern recognition contests, including better-than-human performance in traffic sign recognition (IJCNN 2011)
- Top performance in the following benchmarks:
  - MNIST Handwritten Digits Benchmark ("1st human-competitive result in 2011"), 0.23% error
  - NORB Object Recognition Benchmark
  - CIFAR Image Classification Benchmark
  - The Weizmann & KTH Human Action Recognition Benchmarks

Recurrent Neural Networks
- Backward connections (loops), as in the human brain (see the sketch below)
- Turing complete; compact representations
- Top performance in several handwriting recognition competitions
  - ICDAR 2009: the Arabic Connected Handwriting Competition, the Handwritten Farsi/Arabic Character Recognition Competition, and the French Connected Handwriting Competition
- Same Swiss team as the previous slide (Schmidhuber)
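
A sketch (illustrative, random weights and data) of what the backward connections mean in practice: the hidden state at each time step feeds back into the next step, so the network carries memory along the sequence.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 5
W_in = rng.normal(scale=0.1, size=(n_in, n_hidden))       # input -> hidden
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden (the loop)
b = np.zeros(n_hidden)

def step(x_t, h_prev):
    # One recurrent update: the new state depends on the input AND the old state.
    return np.tanh(x_t @ W_in + h_prev @ W_rec + b)

sequence = rng.random((10, n_in))   # a length-10 input sequence
h = np.zeros(n_hidden)
for x_t in sequence:
    h = step(x_t, h)
print("final hidden state:", np.round(h, 3))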

Some Existing Libraries
- OpenCV: basic shallow neural network implementation; primarily a computer vision library
- Fast Artificial Neural Network Library (FANN): C library for efficient feed-forward networks - http://leenissen.dk/fann/wp/
- Pynnet: Python library for deep neural networks (stacked auto-encoders, convolutional networks, recurrent networks, etc.) - http://code.google.com/p/pynnet/

References
- Russell & Norvig, Artificial Intelligence: A Modern Approach
- http://www.idsia.ch/~juergen/vision.html
- http://deeplearning.net/tutorial/lenet.html
- http://research.google.com/archive/unsupervised_icml2012.html
- http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity
- http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders
- http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html?pagewanted=2&_r=1