TECHNIQUES: THE MULTI-LAYER PERCEPTRON

The General Multi-Layer Perceptron

The multi-layer perceptron (MLP) is one of the most important and widely used network models. It links together processing units into a network made up of layers:
- an input layer (set by the problem data)
- an output layer (of solution values)
- typically one or two hidden layers (whose units model patterns in the input data)

Each layer is fully connected to the succeeding layer, and the network is feedforward. When using or testing a trained network, the input values set the values of the elements in the first hidden layer, which influence the next layer, and so on until the values of the output layer elements are set. See a JavaScript model of the feedforward calculation in the Neural Planner diggers example.

What if a known input pattern produces a wildly incorrect output signal? Then we need to train the network through a learning process. Learning in neural networks is done by changing the weighting factors (weights) at each element to reduce the output errors. I've put together a simple JavaScript demonstration of learning in a single-layer perceptron.
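The feedforward calculation described above is easy to sketch in code. The following is a minimal illustration (not the Neural Planner example itself), assuming NumPy, sigmoid activation units and one hidden layer; the layer sizes and the random, untrained weights are invented.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(x, weights, biases):
    # Each layer is fully connected to the next: the activations of one
    # layer set the inputs of the succeeding layer, through to the output.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Illustrative 3-4-2 network (3 inputs, 4 hidden units, 2 outputs), untrained.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(feedforward(np.array([0.2, 0.7, 0.1]), weights, biases))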

Learning

In MLPs, learning is supervised, with separate training and recall phases. For an example of how you train and then test a network, see Bob Mitchell's handwriting recognizer. During training, the nodes in the hidden layers organise themselves so that different nodes learn to recognise different features of the total input space. During the recall phase of operation, the network responds to inputs that exhibit features similar to those learned during training, so incomplete or noisy inputs may be completely recovered by the network.

In its learning phase, you give the network a training set of examples with known inputs and outputs. Then:
1. For each input pattern, the network produces an output pattern.
2. It compares the actual output with the desired one from the training set and calculates an error.
3. It adjusts its weights a little to reduce the error (sliding down the slope).
4. It repeats steps 1-3 many times for every example in the training set until it has minimised the errors.
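These four steps can be sketched for the simplest case, a single-layer perceptron with one sigmoid output unit (a multi-layer network needs back propagation, covered below). The toy training set, learning rate and number of passes are invented for illustration.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Invented training set with known inputs and outputs (logical OR).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([0.0, 1.0, 1.0, 1.0])

w = np.zeros(2)
b = 0.0
rate = 0.5                                 # size of each small adjustment

for epoch in range(2000):                  # step 4: repeat many times
    for x, target in zip(X, targets):
        y = sigmoid(w @ x + b)             # step 1: produce an output pattern
        error = target - y                 # step 2: compare with the desired output
        w += rate * error * x              # step 3: adjust the weights a little
        b += rate * error

print(np.round(sigmoid(X @ w + b), 2))     # outputs end up close to the targets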

For a graphical visualisation of how a neural network gradually adjusts itself to get closer and closer to the input patterns, see Jochen Fröhlich's Java simulation of a self-organising Kohonen feature map. (Note that this uses a different learning algorithm, not back propagation, but it still gradually gets closer to the training set values.)

There are many weights to be adjusted, so consider a multi-dimensional surface constructed by plotting the total network error in weight space (i.e. over all the possible changes in weight); a 3-D approximation of this surface looks like a landscape of hills and valleys. During training:
- objective: find the global minimum on the error surface.
- solution: gradient descent, i.e. adjust the weights to follow the steepest downhill slope.
- we don't know the surface in advance, so we explore it in many small steps.

Back propagation

During training, information is propagated back through the network and used to update the connection weights. How? Different neural network architectures use different algorithms to calculate the weight changes. Back propagation (BP) is a commonly used (but inefficient) algorithm in MLPs. We know the errors at the output layer, but not at the hidden layer elements. BP solves the problem of how to calculate the hidden layer errors: it propagates the output errors back to the previous layer using the output element weights. The mathematics of this algorithm is given in several textbooks and on-line tutorials. For a detailed explanation of the back propagation algorithm, see Carling, Alison (1992) Introducing Neural Networks, Wilmslow: Sigma Press, pp. 147-154.
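As a hedged sketch of that idea (not the algorithm from Carling's book verbatim): the output-layer errors are passed back through the output weights to give hidden-layer errors, and both weight layers then take a small gradient-descent step. The toy XOR data, network size, initialisation and learning rate are all invented; sigmoid units keep the values between 0 and 1.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)

# Invented toy problem (XOR); inputs and targets are already in the 0..1 range.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)   # input  -> hidden
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)   # hidden -> output
rate = 0.5                                                  # learning rate

for epoch in range(10000):
    # Feedforward pass.
    H = sigmoid(X @ W1 + b1)                   # hidden layer values
    Y = sigmoid(H @ W2 + b2)                   # output layer values

    # Errors are known at the output layer...
    out_err = (T - Y) * Y * (1 - Y)
    # ...and are propagated back to the hidden layer through the output weights.
    hid_err = (out_err @ W2.T) * H * (1 - H)

    # Gradient descent: small steps down the error surface.
    W2 += rate * H.T @ out_err
    b2 += rate * out_err.sum(axis=0)
    W1 += rate * X.T @ hid_err
    b1 += rate * hid_err.sum(axis=0)

# Usually approaches [0, 1, 1, 0]; a poorer local minimum is possible
# (see point 3 in the list below).
print(np.round(Y.ravel(), 2))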

It helps to know some features of BP when training neural networks.
1. Internally, most BP networks work with values between 0 and 1. If your inputs have a different range, NN simulators like Neural Planner will scale each input variable so that its minimum maps to 0 and its maximum to 1.
2. They change the weights each time by some fraction of the change needed to completely correct the error. This fraction, β, is the learning rate.
   a. High learning rates cause the learning algorithm to take large steps on the error surface, with the risk of missing a minimum or unstably oscillating across the error minimum ('sloshing').
   b. Small steps, from a low learning rate, eventually find a minimum, but they take a long time to get there.
   c. Some NN simulators can be set to reduce the learning rate as the error decreases.
   d. Sloshing can also be reduced by mixing into the weight change a proportion of the last weight change, smoothing out small fluctuations. This proportion is the momentum term.
3. The algorithm finds the nearest local minimum, which is not always the lowest minimum. One solution commonly used in back propagation (see the sketch after this list) is to:
   1. restart learning every so often from a new set of random weights (i.e. somewhere else in the weight space),
   2. find the local minimum from each new start, and
   3. keep track of the best minimum found.
4. Overfitting is when the NN learns the specific details of the training set instead of the general pattern found in all present and future data. There can be two causes:
   a. Training for too long. Solution: test against a separate test set every so often, and stop when the results on the test set start getting worse.
   b. Too many hidden nodes. One node can model a linear function; more nodes can model higher-order functions, or more input patterns; too many nodes model the training set too closely, preventing generalisation.
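The restart strategy in point 3 and the early-stopping test in point 4a might look something like the sketch below; train_fn, evaluate_fn, train_one_epoch and test_error are hypothetical callables standing in for whichever simulator or training code is in use.

import numpy as np

def train_with_restarts(train_fn, evaluate_fn, n_restarts=5, seed=0):
    # Restart learning from new random weights, find the local minimum from
    # each start, and keep track of the best minimum found.
    rng = np.random.default_rng(seed)
    best_weights, best_error = None, np.inf
    for _ in range(n_restarts):
        weights = train_fn(np.random.default_rng(rng.integers(1 << 32)))
        error = evaluate_fn(weights)
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error

def train_with_early_stopping(train_one_epoch, test_error, max_epochs=1000, patience=10):
    # Test against a separate test set every so often and stop when the
    # results on the test set start getting worse.
    best, epochs_since_best = np.inf, 0
    for epoch in range(max_epochs):
        train_one_epoch()                  # one more pass over the training set
        err = test_error()
        if err < best:
            best, epochs_since_best = err, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break
    return best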

Learning parameters for Neural Networks

A summary of the parameters used in BP networks to control the learning behaviour:

Parameter            Models            Function
Learning rate        All               Controls the step size for weight adjustments. Decreases over time for some types of NN.
Momentum             Back propagation  Smooths the effect of weight adjustments over time.
Error tolerance      Back propagation  Specifies how close the output value must be to the desired value before the error is considered corrected.
Activation function  All               The function used at each neural processing unit to generate the output signal from the weighted average of inputs. Most common is the sigmoid function.

Adapted from: Joseph P. Bigus (1996) Data Mining with Neural Networks. New York: McGraw-Hill, p. 82.
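As a rough illustration of where these parameters enter the training calculation (the function names and default values below are invented, not taken from any particular simulator):

import numpy as np

def update_weights(W, gradient, prev_delta, learning_rate=0.25, momentum=0.9):
    # The learning rate controls the step size; the momentum term mixes in a
    # proportion of the last weight change to smooth the adjustments over time.
    delta = learning_rate * gradient + momentum * prev_delta
    return W + delta, delta

def error_is_tolerable(outputs, targets, error_tolerance=0.1):
    # Error tolerance: an output counts as correct once it is this close
    # to the desired value.
    return np.all(np.abs(outputs - targets) <= error_tolerance)

def activation(x):
    # The most common activation function: the sigmoid.
    return 1.0 / (1.0 + np.exp(-x))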

Neural nets: Back to Biological foundations | On to NN applications

Prepared by Dr. David R. Newman.