Parallel Distributed Processing: Selected History up to Deep Learning


COGS 201 - 9/20/16

Goals
1. Give a historical overview of the development of PDP
2. Start a conversation about neural networks and machine learning
3. Identify resources to learn about cutting-edge machine intelligence and deep learning

Outline: Parallel Distributed Processing (PDP)
1. PDP: its situation in time
2. Application of PDP: "On Learning the Past Tenses of English Verbs"
3. Example: learning to classify hand-drawn digits

Why did I choose Chapter 18? I worked with electrical engineers doing signal processing applied to physics-based imaging systems; they were machine learning experts. I have been curious ever since, and this is my chance to learn. At first this chapter seemed like a good way for me to learn about a real-world computational language problem. I found out it is an example that both lays the foundation for, and was quickly superseded by, more recent work. As such, we will also look at more recent work in machine learning, including deep learning, which I know even less about. Please teach me!

Origins of PDP. The earliest roots of the approach can be found in the work of... neurologists. Feldman and Ballard (1982) laid out many of the computational principles of the approach (under the name of connectionism), and stressed the biological implausibility of most... computational models in artificial intelligence (AI) [1]. Hopfield's (1982) contribution of the idea that networks can be seen as seeking minima of an energy function played a prominent role in the development of the Boltzmann machine, which re-appears in neural networks and machine learning later.

1 David E. Rumelhart and James L. McClelland. Parallel Distributed Processing. MIT Press, 1986. isbn: 0262521873. url: http://stanford.edu/~jlmcc/papers/PDP/chapter18.pdf. Vol. 1, p. 41.

The PDP renaissance. PDP promised to imitate human learning and physiology in solving a number of problems, including:
1. Processing sentences
2. Place recognition
3. Learning the past tense of English verbs (Ch. 18)
See [2].

2 David E. Rumelhart and James L. McClelland. Parallel Distributed Processing. MIT Press, 1986. isbn: 0262521873. url: http://stanford.edu/~jlmcc/papers/PDP/chapter18.pdf. Vol. 2.

Unlike the formal-grammar approach, the rules for forming the past tense are learned as correct examples are shown to the system. "The connectionist/PDP perspective eschews the concepts of symbols and rules in favor of a model of the mind that closely reflects the functioning of the brain." [3]

3 Marc F. Joanisse and James L. McClelland. "Connectionist perspectives on language learning, representation and processing". In: Wiley Interdisciplinary Reviews: Cognitive Science 6.3 (2015), pp. 235-247. issn: 19395086. doi: 10.1002/wcs.1340.

The model behaves like a human in that it goes through three stages of past-tense learning:
1. Phase I: the ten most common verbs; 8 irregular, 2 regular
2. Phase II: a large set of regular verbs; at this point the model gets confused and makes mistakes on the original irregular verbs it had previously learned
3. Phase III: expansion to more examples, including the irregular verbs again; mistakes stop for the first-learned irregulars
Learning means presenting correct conjugations to the model, e.g. GO → WENT, and changing the network based on the error. Mathematical vectors represent the phoneme structure in the input/output data. More specifically, phonemes are translated to a more complex structure, Wickelphones, which combine to form Wickelfeatures.
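To make the encoding concrete, here is a minimal Python sketch of the Wickelphone idea: each phoneme is represented as a context-sensitive triple (predecessor, phoneme, successor), with # marking word boundaries. The function name and example phoneme string are my own illustration, not code from the original model.

    def wickelphones(phonemes):
        """Decompose a phoneme string into Wickelphones: context-sensitive
        triples (left, center, right), with '#' marking word boundaries."""
        padded = ["#"] + list(phonemes) + ["#"]
        return [(padded[i - 1], padded[i], padded[i + 1])
                for i in range(1, len(padded) - 1)]

    # /kat/ ("cat") -> [('#', 'k', 'a'), ('k', 'a', 't'), ('a', 't', '#')]
    print(wickelphones("kat"))

In the actual model, each Wickelphone is then coarse-coded as a set of Wickelfeatures (triples of phonetic features), which is what the network's input and output units represent.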

(Figure from [4].)

4 David E. Rumelhart and James L. McClelland. Parallel Distributed Processing. MIT Press, 1986. isbn: 0262521873. url: http://stanford.edu/~jlmcc/papers/PDP/chapter18.pdf.

A major weakness of this approach: the Wickelphones and Wickelfeatures, which never caught on. They were improved upon just a few years later, as explained in [5] and acknowledged in Joanisse and McClelland (2015) [6].

5 Charles X. Ling. "Learning the Past Tense of English Verbs: The Symbolic Pattern Associator vs. Connectionist Models". In: Journal of Artificial Intelligence Research 1 (1994), pp. 209-229. arxiv: cs/9402101.
6 Marc F. Joanisse and James L. McClelland. "Connectionist perspectives on language learning, representation and processing". In: Wiley Interdisciplinary Reviews: Cognitive Science 6.3 (2015), pp. 235-247. issn: 19395086. doi: 10.1002/wcs.1340.

Weaknesses of the original program. Criticisms centered on:
1. "issues of high error rates and low reliability of the experimental results"
2. "the inappropriateness of the training and testing procedures"
3. "hidden features of the representation and the network architecture that facilitate learning"
4. "opaque knowledge representation of the networks"
(List is made of selected quotes from [7].)

7 Charles X. Ling. "Learning the Past Tense of English Verbs: The Symbolic Pattern Associator vs. Connectionist Models". In: Journal of Artificial Intelligence Research 1 (1994), pp. 209-229. arxiv: cs/9402101.

TensorFlow: Library for Machine Intelligence
1. Python or C++
2. Optimized for deep learning model building, training, validating, and testing
3. Can configure automatic GPU utilization on Linux and Mac (see the sketch below)
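As a minimal illustration of GPU utilization (item 3), TensorFlow 1.x (current at the time of this talk) can log which device each operation lands on; placement on an available GPU is automatic. This is a generic sketch, not code from the slides:

    import tensorflow as tf

    # Log whether each op is placed on the CPU or a GPU.
    config = tf.ConfigProto(log_device_placement=True)
    with tf.Session(config=config) as sess:
        a = tf.constant([[1.0, 2.0]])
        b = tf.constant([[3.0], [4.0]])
        print(sess.run(tf.matmul(a, b)))  # [[11.]]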

MNIST For ML Beginners: handwritten digit classification. The "Hello, World!" of machine learning. Each digit is assigned to a class, which is represented as an output vector with value 1 at the index corresponding to the digit and 0 elsewhere. So, for example,

$$0 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad 1 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad 9 = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{1}$$
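The same one-hot encoding in a few lines of numpy (the helper name is mine, for illustration):

    import numpy as np

    def one_hot(digit, num_classes=10):
        """Return a length-10 vector with 1 at the digit's index, 0 elsewhere."""
        v = np.zeros(num_classes)
        v[digit] = 1.0
        return v

    print(one_hot(1))  # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]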

MNIST For ML Beginners: handwritten digit classification. The TensorFlow Python library includes the MNIST data:

    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

The data: pairs of images and the actual digit. (The example image shown corresponds to the digit 1.)
1. 55,000 for training
2. 10,000 for testing
3. 5,000 for validation
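For context, the MNIST beginners tutorial continues from this loading step to a softmax regression classifier. The sketch below uses the TensorFlow 1.x API of the time; the hyperparameters (batch size 100, learning rate 0.5, 1000 steps) are illustrative rather than authoritative:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 784])   # flattened 28x28 images
    y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot labels

    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.nn.softmax(tf.matmul(x, W) + b)        # predicted class probabilities

    # Cross-entropy loss, minimized by gradient descent.
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            batch_xs, batch_ys = mnist.train.next_batch(100)
            sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})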

MNIST For ML Beginners: handwritten digit classification. "The training set is used to fit the models; the validation set is used to estimate prediction error for model selection; the test set is used for assessment of the generalization error of the final chosen model. Ideally, the test set should be kept in a 'vault,' and be brought out only at the end of the data analysis." [8]
1. 55,000 for training
2. 10,000 for testing
3. 5,000 for validation

8 Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer, 2009. isbn: 9780387848570. doi: 10.1007/b94608. url: http://statweb.stanford.edu/~tibs/elemstatlearn/. p. 222.

Training & back-propagation. The back-propagation technique was one of the major contributions of the PDP research group (Rumelhart et al., 1986). The conceptual outline of training the neural network is below [9]:
1. Present the inputs and calculate the outputs using the current weights (initialize the weights to random values for the first presentation)
2. Check the difference between the computed output and the target output
3. Adjust the weights according to the difference and the back-propagation algorithm

9 David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. "Learning representations by back-propagating errors". In: Nature 323.6088 (1986), pp. 533-536. issn: 0028-0836. doi: 10.1038/323533a0.
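A minimal numpy sketch of this loop for a network with one hidden layer (sigmoid activations, squared error, full-batch updates). The toy XOR data, learning rate, and layer sizes are my own choices for illustration, not the 1986 setup:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy training data: XOR inputs and target outputs.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # random initial weights
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

    lr = 0.5
    for step in range(20000):
        # Present the inputs; calculate outputs with the current weights.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        # Difference between target and computed output.
        E = T - Y
        # Back-propagate the error and adjust the weights.
        dY = E * Y * (1 - Y)
        dH = (dY @ W2.T) * H * (1 - H)
        W2 += lr * H.T @ dY
        b2 += lr * dY.sum(axis=0)
        W1 += lr * X.T @ dH
        b1 += lr * dH.sum(axis=0)

    print(Y.round(2).ravel())  # typically approaches [0, 1, 1, 0]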

Training & back-propagation. "...the backpropagation equations are so rich that understanding them well requires considerable time and patience as you gradually delve deeper into the equations. The good news is that such patience is repaid many times over." [10]

10 Michael A. Nielsen. Neural Networks and Deep Learning. 2015. url: http://neuralnetworksanddeeplearning.com/.

Some confusing terminology:
1. "Hidden layer": a layer that is not an input or an output layer
2. Perceptrons: neurons where the output activation is either 0 or 1
3. Sigmoid neurons: the digit-learning example uses sigmoid neurons, where the activation is a value between 0 and 1
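The perceptron/sigmoid distinction in two small Python functions (my own illustration):

    import numpy as np

    def perceptron(z):
        """Perceptron: the output is exactly 0 or 1 (a step function)."""
        return np.where(z > 0, 1.0, 0.0)

    def sigmoid(z):
        """Sigmoid neuron: the output is a value between 0 and 1."""
        return 1.0 / (1.0 + np.exp(-z))

    print(perceptron(0.3), sigmoid(0.3))  # 1.0 vs ~0.574

Small changes in the input shift a sigmoid neuron's output smoothly, which is what makes gradient-based learning such as back-propagation possible.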

For ML beginners: a neural network in the browser: playground.tensorflow.org

The "depth" of deep learning comes from the number of layers used in the network. As the number of layers is increased, a hierarchical dependency is also added. This allows the network to represent the types of hierarchical patterning that occur in many natural data sources [11]. See the comparison of popular deep learning frameworks on GitHub, and read more on deep learning frameworks, including TensorFlow.

11 Michael A. Nielsen. Neural Networks and Deep Learning. 2015. url: http://neuralnetworksanddeeplearning.com/; Tensorflow.org. "An Open Source Software Library for Machine Intelligence". url: https://www.tensorflow.org/ (visited on 09/18/2016).

According to Wikipedia, Google has been running Tensor Processing Units (TPUs) in its data centers for over a year. These TPUs are like GPUs, but specialized not just for matrix multiplications but for the tensors, the multidimensional arrays, of TensorFlow.