Connectionism (Artificial Neural Networks) and Dynamical Systems


COMP 40260 Connectionism (Artificial Neural Networks) and Dynamical Systems Part 2

Read Rethinking Innateness, Chapters 1 & 2

Let's start with an old neural network, created before training from data was possible. It illustrates how we might use some aspects of our network (processing times, sequence, etc.) to mimic measurable behavioural variables.

McClelland and Rumelhart's 1981 model of the Word Superiority effect: weights are inhibitory (dot) or excitatory (arrow), and weight values are hand-crafted to achieve the desired results.

The Interactive Activation Model: a gradual mutual constraint satisfaction process. Units represent hypotheses about the visual input at several levels and positions: features, letters, and words. Connections code contingent relations: excitatory connections for consistent relations, inhibitory connections for inconsistent relations, and lateral inhibition for competition among mutually inconsistent possibilities within a level. Connections run in both directions, so that the network tends to evolve toward a state of activation in which everything is consistent.

Interactive activation simultaneously identifies words and letters. Stimulus input comes first to the letter level, but as it builds up, it starts to influence the word level. Letter input from all four positions makes WORK the most active word unit (there is no word WORR). Although the bottom-up input to the letter level supports K and R equally in the fourth letter position, feedback from the word level supports K, causing it to become more active, and lateral inhibition then suppresses activation of R.

The patterns seen in the physiology are comparable to those seen in the interactive activation model, in that the effect of direct input is manifest first, followed somewhat later by contextual influences, presumably mediated in the physiology by neurons sensitive to the overall configuration of display elements. [Figure: time course of the direct and context effects.]

There is a JavaScript implementation of this model you might like to play with. Stick to those aspects that you learned in class before exploring fine detail. Web app: http://www.psychology.nottingham.ac.uk/staff/wvh/jiam/
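To make the mutual constraint satisfaction dynamics concrete, here is a minimal sketch in Python (an illustration only, not McClelland and Rumelhart's implementation: the parameter values, the tiny three-unit example, and the exact update function are assumptions in the general spirit of the model):

# Minimal sketch of one interactive-activation update step (illustrative values).
MAX_A, MIN_A, REST, DECAY, STEP = 1.0, -0.2, -0.1, 0.1, 0.1

def ia_step(activations, weights, external):
    """One update of every unit.
    activations: dict unit -> current activation
    weights: dict (sender, receiver) -> weight (+ excitatory, - inhibitory)
    external: dict unit -> external (stimulus) input
    """
    new = {}
    for unit, a in activations.items():
        # Net input: external input plus weighted activation of active senders.
        net = external.get(unit, 0.0)
        for (sender, receiver), w in weights.items():
            if receiver == unit and activations[sender] > 0:
                net += w * activations[sender]
        # Positive net input pushes the unit toward its maximum, negative net
        # input toward its minimum, and decay pulls it back toward its resting level.
        if net > 0:
            delta = net * (MAX_A - a)
        else:
            delta = net * (a - MIN_A)
        delta -= DECAY * (a - REST)
        new[unit] = min(MAX_A, max(MIN_A, a + STEP * delta))
    return new

# Tiny example: the word unit WORK excites the letter unit K, K feeds back to
# WORK, and the letter units K and R inhibit each other (lateral inhibition).
acts = {"K": 0.0, "R": 0.0, "WORK": 0.2}
wts = {("WORK", "K"): 0.3, ("K", "WORK"): 0.3, ("K", "R"): -0.2, ("R", "K"): -0.2}
ext = {"K": 0.1, "R": 0.1}
for _ in range(30):
    acts = ia_step(acts, wts, ext)
print({u: round(a, 2) for u, a in acts.items()})

Running ia_step repeatedly, with excitatory connections between consistent hypotheses and inhibitory connections within a level, produces the gradual settling behaviour described above: the unit for K ends up more active than the unit for R.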

McClelland and Elman's TRACE model of spoken word recognition (1984): mapping from sound input to word output, via phonemes.

TRACE: inhibition only within layers; sequential input; features extracted from speech.

A Java-based implementation of TRACE is available at http://magnuson.psy.uconn.edu/jtrace/ if you feel like playing with it. The Wikipedia page is also fairly good in the overview it provides, though don't expect much depth in the discussion: https://en.wikipedia.org/wiki/TRACE_(psycholinguistics)

Both models are examples of Interactive Activation networks.

Jets and Sharks

In Lab 2, we will be playing with an implementation of the Jets and Sharks network. Further detailed information is available, e.g. http://staff.itee.uq.edu.au/janetw/cmc/chapters/iac/ http://www.cs.indiana.edu/~port/brainwave.doc/iac.html

Learning (Take 1). The Word Superiority network and TRACE used hand-crafted weights. It would be nice to learn the appropriate values from data. Why might this be good? Why might this be bad?

Hebbian Learning

Hebbian learning. Hebb's postulate: "When an axon of cell A ... excites cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells so that A's efficiency as one of the cells firing B is increased." Hebbian learning learns pairwise correlations (and nothing else), can generate intriguing structure in large multi-layered networks, and is one of a family of unsupervised learning techniques.

Hebbian learning: units that fire together, wire together. It is an associative learning method: similar things are stored in similar ways.

The rule: Δw_ij = η · a_i · a_j, where η is the learning rate, Δw_ij is the change in the weight to unit i from unit j, a_i is the activation of unit i, and a_j is the activation of unit j.
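As a concrete sketch (plain Python, with an arbitrary learning rate), the rule simply adds the product of the two units' activations, scaled by the learning rate, to the weight:

def hebb_update(w_ij, a_i, a_j, learning_rate=0.01):
    """Return the updated weight to unit i from unit j (simple Hebbian rule)."""
    return w_ij + learning_rate * a_i * a_j

# Two units that are repeatedly co-active strengthen their connection.
w = 0.0
for _ in range(100):
    w = hebb_update(w, a_i=1.0, a_j=1.0)
print(round(w, 2))   # 1.0 after 100 co-activations at learning_rate 0.01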

Issues with Hebbian learning: it uses only local knowledge; it learns correlations; and it is unstable (in its simple form). It is nonetheless the basis for most unsupervised learning techniques: simple... adaptable...

Classical conditioning, implemented in the style of Hebb.

These data have been generated to illustrate the simplest Hebbian learning. Notice that the two inputs are correlated, and Input 1 is typically about twice the size of Input 2.

There are two weights, w31 and w32. We set these to random values. Let's do that four different times. Each time, we pick two random numbers. We can plot these in the same 2-D space as the patterns.

Here are the first 20 of 1000 input patterns. Now we take each pattern in turn and calculate how much we would change the weights based on that pattern alone. We keep track of this over the 1000 patterns, then change the weights by a small fraction of the accumulated amount (the size of the fraction is given by the learning rate).

Here you see the changes in the weights from four different starting points, as we repeat this process over and over. Notice that it doesn't matter where we start: we always end up with weights in a ratio of about 2:1. That is, the weight values come to reflect the correlation evident in the data set.
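The demonstration can be reproduced in a few lines (a sketch under the assumptions implied above: a single linear output unit, two correlated inputs with Input 1 about twice Input 2, and batch updates over 1000 patterns; the exact distributions, learning rate and epoch count are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)

# 1000 input patterns: Input 1 is roughly twice Input 2, so the two are correlated.
x2 = rng.normal(1.0, 0.2, size=1000)
x1 = 2.0 * x2 + rng.normal(0.0, 0.1, size=1000)
patterns = np.column_stack([x1, x2])

learning_rate = 0.0001

for start in range(4):                        # four different random starting points
    w = rng.normal(0.0, 1.0, size=2)          # random initial weights (w31, w32)
    for epoch in range(20):
        out = patterns @ w                    # assumed linear output unit: w31*x1 + w32*x2
        # Accumulate the Hebbian change over all 1000 patterns, then apply a
        # small fraction of it (the fraction is set by the learning rate).
        w += learning_rate * (patterns * out[:, None]).sum(axis=0)
    print(w[0] / w[1])                        # the weight ratio settles near 2:1 every time

Note that only the ratio of the weights stabilises; their magnitudes keep growing, which is the problem raised on the next slide.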

Limits on pure Hebbian learning: Hebbian learning learns correlations, and only correlations. Some means of stopping unlimited growth of the weights is necessary; physiologically, Long-Term Depression is needed as a counter-mechanism to Long-Term Potentiation.
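The slides do not specify a mechanism; purely as an illustration (the decay term below is an assumption, not part of the lecture's model), adding a decay term is one simple way to keep a Hebbian weight bounded:

def bounded_hebb_update(w_ij, a_i, a_j, learning_rate=0.01, decay=0.005):
    growth = learning_rate * a_i * a_j   # strengthening (the potentiation-like term)
    shrink = decay * w_ij                # counter-mechanism: the weight decays toward 0
    return w_ij + growth - shrink

# With constant co-activation the weight now settles at learning_rate/decay
# (here 2.0) instead of growing without limit.
w = 0.0
for _ in range(2000):
    w = bounded_hebb_update(w, 1.0, 1.0)
print(round(w, 2))   # approaches 2.0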

Behold, the PERCEPTRON!!!!

The Perceptron

Perceptron Convergence Procedure. A perceptron is a 2-layer network with a threshold activation function at the output units (+/- 1). It is trained on a data set for which we have both inputs and targets. Weight changes are based on the error at the outputs: each weight's change depends on the error produced and on the activation coming along that weight from a given input (a credit-and-blame algorithm).

Perceptron Convergence, cont'd. The PCP requires an explicit teacher. Similar inputs yield similar outputs (cf. also Hebbian learning), which is not a bad idea in principle. But many problems cannot be solved under this limitation; the famous example is learning XOR.
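Before the truth-table examples on the next slides, here is a minimal sketch of the procedure (illustrative Python: outputs are coded 0/1 rather than the +/-1 of the slide, and the learning rate and epoch count are arbitrary choices):

def threshold(net):
    return 1 if net > 0 else 0

def train_perceptron(data, learning_rate=0.1, epochs=100):
    """Perceptron Convergence Procedure on ((input, input), target) examples."""
    w = [0.0, 0.0]                              # one weight per input line
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            output = threshold(w[0] * x1 + w[1] * x2 + bias)
            error = target - output             # zero when correct, +/-1 when wrong
            w[0] += learning_rate * error * x1  # change scaled by the incoming activation
            w[1] += learning_rate * error * x2
            bias += learning_rate * error
    return w, bias

# OR is linearly separable, so the procedure finds a correct set of weights.
OR = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 1)]
w, b = train_perceptron(OR)
print([threshold(w[0] * x1 + w[1] * x2 + b) for (x1, x2), _ in OR])   # [0, 1, 1, 1]

Training on the XOR table instead never settles on weights that classify all four patterns correctly, which is the limitation the next slides illustrate.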

Learning AND: a single output unit with input weights 1.0 and 1.0 and a bias weight of -1.5 computes AND.
inputs (0,0) → output 0; (1,0) → 0; (0,1) → 0; (1,1) → 1
[Figure: the four patterns plotted in input space.]

Learning OR: a single output unit with input weights 1.0 and 1.0 and a bias weight of -0.5 computes OR.
inputs (0,0) → output 0; (1,0) → 1; (0,1) → 1; (1,1) → 1
[Figure: the four patterns plotted in input space.]

Learning XOR: no single set of weights will do (???).
inputs (0,0) → output 0; (1,0) → 1; (0,1) → 1; (1,1) → 0
The patterns are not linearly separable: no single line in input space separates the cases with output 1 from those with output 0.
[Figure: the four patterns plotted in input space.]

Adding hidden units: a layer of hidden units re-represents the input patterns, so that in hidden-unit space the XOR patterns become linearly separable.
[Figure: the four patterns plotted in input space and in hidden-unit space.]
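To make this concrete, here is a hand-crafted sketch (the particular weights are illustrative choices, in the same hand-crafted spirit as the earlier models): the hidden units compute OR and AND of the inputs, and in that hidden-unit space XOR becomes a linearly separable problem for the output unit.

def threshold(net):
    return 1 if net > 0 else 0

def xor_net(x1, x2):
    h_or  = threshold(1.0 * x1 + 1.0 * x2 - 0.5)    # hidden unit 1: OR of the inputs
    h_and = threshold(1.0 * x1 + 1.0 * x2 - 1.5)    # hidden unit 2: AND of the inputs
    # Output unit: excited by the OR unit, inhibited by the AND unit ("OR but not AND").
    return threshold(1.0 * h_or - 2.0 * h_and - 0.5)

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((x1, x2), xor_net(x1, x2))   # outputs 0, 1, 1, 0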

A perceptron is a classifier. Strictly speaking, each output node is a classifier (output = 1 or 0). If the classes are linearly separable, then the Perceptron Convergence Procedure will reach a solution that correctly classifies all items in the training set. If the classes are not linearly separable, it won't.

The Minsky and Papert challenge. A straightforward training procedure for 2-layer linear networks had long been known. It was also known that multi-layered networks with non-linear hidden units could solve much tougher problems. Minsky and Papert (Perceptrons, 1969) famously claimed that such complex networks could not be readily trained. Backpropagation ("backprop") later solved this problem (for many cases).
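As a closing sketch (a minimal modern formulation with sigmoid units and NumPy, not the original presentation; the layer sizes, learning rate, iteration count and random seed are arbitrary choices), a network with one layer of non-linear hidden units, trained by backpropagation, learns XOR from the same four patterns that defeat the two-layer perceptron:

import numpy as np

rng = np.random.default_rng(1)

# The XOR training set: four input patterns and their targets.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 4))   # input -> hidden weights (4 hidden units)
b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1))   # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(20000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)          # hidden activations
    Y = sigmoid(H @ W2 + b2)          # output activations
    # Backward pass: propagate the output error back through the weights.
    dY = (Y - T) * Y * (1 - Y)        # error signal at the output unit
    dH = (dY @ W2.T) * H * (1 - H)    # error signal at the hidden units
    W2 -= lr * H.T @ dY
    b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH
    b1 -= lr * dH.sum(axis=0)

print(np.round(Y.ravel(), 2))         # typically close to [0, 1, 1, 0]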