CP8206 Soft Computing & Machine Intelligence 1 PRINCIPLE OF ARTIFICIAL NEURAL NETWORKS Important properties of artificial neural networks will be discussed, namely: (i) the underlying principle of artificial neural networks, (ii) the general representation of neural networks, & (iii) the principles of the error-correction algorithm.

CP8206 Soft Computing & Machine Intelligence 2 ARTIFICIAL INTELLIGENCE & NEURAL NETWORKS During the past twenty years, interest in applying the results of Artificial Intelligence (AI) research has been growing rapidly. AI relates to the development of the theories & techniques required for a computational engine to efficiently perceive, think & act with intelligence in complex environments. The artificial intelligence discipline is concerned with intelligent computer systems that exhibit the characteristics associated with intelligence in human behavior, such as understanding language, learning, solving problems & reasoning.

CP8206 Soft Computing & Machine Intelligence 3 BRANCHES OF AI Developments in some branches of AI have already led to new technologies having significant effects on problem-solving approaches. These include new ways of defining problems, new methods of representing the existing knowledge regarding the problems, & new problem-handling methods. There are several distinctive areas of research in Artificial Intelligence, most importantly artificial neural networks, fuzzy logic systems & expert systems, each with its own specific interests, research techniques, terminology & objectives (Fig. 1).

CP8206 Soft Computing & Machine Intelligence 4 Fig. 1: Partial taxonomy of Artificial Intelligence depicting a number of important AI branches & their relationships. [Diagram: AI branches into Neural Networks, Expert Systems, Genetic Algorithms & Fuzzy Systems; hybrid areas include Neuro-Fuzzy Systems, Neuro-Genetic Systems & Fuzzy-Expert Systems.]

CP8206 Soft Computing & Machine Intelligence 5 NEURAL NETWORKS Among the various branches of AI, the area of artificial neural networks in particular has received considerable attention during the past twenty years. An artificial neural network is a massively parallel & distributed processor that has a natural propensity for storing experiential knowledge & making it available for use. The underlying idea is to implement a processor that works in a fashion similar to the human brain.

CP8206 Soft Computing & Machine Intelligence 6 NEURAL NETWORKS A NN resembles the brain in two respects: first, knowledge is acquired through a learning process, & second, inter-neuron connection strengths known as weights are used to store the knowledge. The learning process involves modification of the connection weights to obtain a desired objective. Major applications of neural networks can be categorized into five groups: pattern recognition, image processing, signal processing, system identification & control.

CP8206 Soft Computing & Machine Intelligence 7 NEURAL NETWORKS There are a variety of definitions for artificial neural networks, each of which highlights some aspects of this methodology, such as its similarity to its biological counterpart, its parallel computation capabilities, & its interaction with the outside world. A neural network is a non-programmable dynamic system with capabilities such as trainability & adaptivity that can be trained to store, process & retrieve information. It also possesses the ability to learn & to generalize based on past observations.

CP8206 Soft Computing & Machine Intelligence 8 NEURAL NETWORKS Neural networks owe their computing power to their parallel/distributed structure & the manner in which their activation functions are defined. This information-processing ability provides the possibility of solving complex problems. Function approximation (I/O mapping): the ability to approximate any nonlinear function to the desired degree. Learning & generalization: the ability to learn I/O patterns, extract the hidden relationships among the presented data, & provide an acceptable response to new data that the network has not yet experienced. This enables neural networks to provide models based on imprecise information.

CP8206 Soft Computing & Machine Intelligence 9 NEURAL NETWORKS Adaptivity: capable of modifying their memory, & thus their functionality, over time. Fault tolerance: due to their highly parallel/distributed structure, failure of a number of neurons to generate the correct response does not lead to failure of the overall performance of the system.

CP8206 Soft Computing & Machine Intelligence 10 NEURAL NETWORKS - DISADVANTAGES Large dimensionality that leads to memory restrictions; selection of the optimum configuration; convergence difficulties, especially when the solution is trapped in local minima; choice of training methodology; black-box representation, i.e., lack of explanation capabilities & transparency.

CP8206 Soft Computing & Machine Intelligence 11 NEURAL NETWORKS A neural network can be characterized in terms of: Neurons: the basic processing units defining the manner in which computation is performed. Neuron activation functions: indicate the function of each neuron. Inter-neuron connection patterns: define the way neurons are connected to each other. Learning algorithms: define how the knowledge is stored in the network.

CP8206 Soft Computing & Machine Intelligence 12 NEURON MODEL The NN paradigm attempts to clone the physical structure & functionality of the biological neuron. Artificial neurons, like their biological counterparts, receive inputs, [x_1, x_2, ..., x_r], from the outside world or from other neurons through incoming connections. Each neuron then generates product terms, [w_i x_i], using the inputs & the connection weights ([w_1, w_2, ..., w_r] represents the connection memory). The product terms are then summed using an addition operator to produce the neuron's internal activity index, v(t).

CP8206 Soft Computing & Machine Intelligence 13 NEURON MODEL This index is passed to an activation function, ϕ(.), which produces an output, y(t):

$$v(t) = \sum_{i=1}^{r} w_i x_i \qquad (1)$$

$$y(t) = \varphi\big(v(t)\big) \qquad (2)$$

A more general model of the neuron functionality can be provided by the introduction of a threshold measure, w_0, for the activation function.

CP8206 Soft Computing & Machine Intelligence 14 NEURON MODEL This signifies the scenario where a neuron generates an output if its input is beyond the threshold (Fig. 2), i.e.,

$$y(t) = \varphi\!\left(\sum_{i=1}^{r} w_i x_i - w_0\right) \qquad (3)$$

This model is a simple yet useful approximation of the biological neuron & can be used to develop different neural structures, including feedforward & feedback networks (Fig. 3).
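As a minimal illustration of Eqs. (1)-(3), a short Python sketch of a single artificial neuron; the function & variable names are illustrative choices, not part of the original slides:

```python
import numpy as np

def neuron_output(x, w, w0, phi):
    """Artificial neuron of Eq. (3): y = phi(sum_i w_i * x_i - w0)."""
    v = np.dot(w, x) - w0          # internal activity index minus the threshold w0
    return phi(v)                  # activation function produces the output

# Example: a neuron with a hard-limiting (indicator) activation function
step = lambda v: 1.0 if v >= 0 else -1.0
x = np.array([0.5, -1.0, 2.0])     # inputs x_1 .. x_r
w = np.array([0.8, 0.3, 0.1])      # connection weights w_1 .. w_r
print(neuron_output(x, w, w0=0.2, phi=step))
```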

CP8206 Soft Computing & Machine Intelligence 15 Fig. 2: Nonlinear model of a neuron. [Diagram: inputs x_1, x_2, ..., x_r & a fixed input ±1 are weighted by the synaptic weights (synaptic operation), summed by the aggregation operation to produce the activity a_k, & passed through the activation function ϕ(.) (somatic operation) to yield the output y_k.]

CP8206 Soft Computing & Machine Intelligence 16 TYPES OF ACTIVATION FUNCTIONS Each neuron includes a nonlinear function, known as the activation function, that transforms several weighted input signals into a single numerical output signal. The neuron activation function, ϕ(.), expresses the functionality of the neuron. There are at least three main classes of activation functions: linear, sigmoid & Gaussian. Table 3.1 illustrates different types of activation functions.

CP8206 Soft Computing & Machine Intelligence 17 NEURAL NETWORK ARCHITECTURES The manner in which neurons are connected together defines the architecture of a neural network. These architectures can be classified into two main groups (Fig. 3): feedforward neural networks & recurrent neural networks.

CP8206 Soft Computing & Machine Intelligence 18 Fig. 3: Classification of different neural network structures. [Diagram: Neural Networks divide into Feedforward, Lattice & Recurrent structures; feedforward networks may be single-layer or multi-layer (e.g., Perceptron, Radial Basis Function), & recurrent networks may be single-layer or multi-layer (e.g., Elman, Hopfield).]

CP8206 Soft Computing & Machine Intelligence 19 FEEDFORWARD NEURAL NETWORK The flow of information is from input to output. SINGLE-LAYER NETWORK (Fig. 4): the main body of the structure consists of only one layer (a one-dimensional vector) of neurons. It can be considered a linear association network that relates output patterns to input patterns.

CP8206 Soft Computing & Machine Intelligence 20 Fig. 4: Single-layer feedforward neural network. [Diagram: inputs x_1, x_2, ..., x_r feed a single layer of neurons ϕ(.) producing outputs y_1, y_2, ..., y_r.]

CP8206 Soft Computing & Machine Intelligence 21 A MULTI-LAYER NETWORK (Fig. 5): The structure consists of two or more layers of neurons. The function of the additional layers is to extract higher-order statistics. The network acquires a global perspective, despite its local connectivity, by virtue of the extra set of synaptic connections & the extra dimension of neural interaction. It is specified by: the number of inputs & outputs, the number of layers, the number of neurons in each layer, the network connection pattern, & the activation function for each layer (a specification is sketched below).
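As a concrete, purely illustrative way of capturing such a specification, a short Python sketch; the class & field names are hypothetical, not from the slides, & fully connected layers are assumed:

```python
from dataclasses import dataclass, field
from typing import Callable, List
import numpy as np

@dataclass
class LayerSpec:
    neurons: int                                        # number of neurons in the layer
    activation: Callable[[np.ndarray], np.ndarray]      # activation function of the layer

@dataclass
class NetworkSpec:
    n_inputs: int                                       # number of network inputs
    layers: List[LayerSpec] = field(default_factory=list)  # hidden & output layers, fully connected

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
linear = lambda v: v

# e.g. 3 inputs, one hidden layer of 5 sigmoid neurons, 2 linear output neurons
spec = NetworkSpec(n_inputs=3, layers=[LayerSpec(5, sigmoid), LayerSpec(2, linear)])
print(spec)
```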

CP8206 Soft Computing & Machine Intelligence 22 Fig. 5: Multi-layer feedforward neural network. [Diagram: inputs x_1, x_2, ..., x_p feed a first layer of neurons ϕ_1(.), whose outputs feed a second layer ϕ_2(.) producing outputs y_1, ..., y_q.]

CP8206 Soft Computing & Machine Intelligence 23 RECURRENT NEURAL NETWORK A recurrent structure represents a network in which there is at least one feedback connection. Fig. 6 depicts a multi-layer recurrent neural network, which is similar to the feedforward case except for the presence of feedback loops & z⁻¹ (the unit-delay operator), which introduces the delay involved in feeding the output back to the input.

CP8206 Soft Computing & Machine Intelligence 24 Fig. 6: Multi-layer recurrent neural network. [Diagram: as in Fig. 5, inputs x_1, ..., x_p feed a first layer ϕ_1(.) & a second layer ϕ_2(.) producing outputs y_1, ..., y_q; in addition, the outputs are fed back to the first layer through feedback connections containing z⁻¹ unit-delay elements.]
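To make the role of the z⁻¹ unit-delay operator concrete, a minimal sketch of one time step of such a recurrent network; the layer sizes, weight names & sigmoid activations are assumptions for illustration, not from the slides:

```python
import numpy as np

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def recurrent_step(x_t, y_prev, W_in, W_fb, W_out):
    """One time step: the previous output y(t-1), held by the z^-1 delay,
    is fed back & combined with the current input x(t)."""
    h = sigmoid(W_in @ x_t + W_fb @ y_prev)   # first layer sees the input plus the delayed output
    y_t = sigmoid(W_out @ h)                  # second layer produces the new output
    return y_t

rng = np.random.default_rng(0)
W_in, W_fb, W_out = rng.normal(size=(4, 3)), rng.normal(size=(4, 2)), rng.normal(size=(2, 4))
y = np.zeros(2)                               # initial (delayed) output
for x in rng.normal(size=(5, 3)):             # a short input sequence of 5 steps
    y = recurrent_step(x, y, W_in, W_fb, W_out)
print(y)
```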

CP8206 Soft Computing & Machine Intelligence 25 Table 3.1: Neural network activation functions (part 1):

Piecewise linear: $f(x) = \begin{cases} +1 & \text{if } x > b \\ a\,x & \text{if } -b \le x \le b \\ -1 & \text{if } x < -b \end{cases}$

Linear: $f(x) = a\,x$

Indicator: $f(x) = \operatorname{sgn}(x)$

[Plots of the three functions omitted.]

CP8206 Soft Computing & Machine Intelligence 26 Table 3.1 (continued):

Sigmoid: $f(x) = \dfrac{1}{1 + e^{-a x}}$

Bipolar sigmoid: $f(x) = \tanh(a\,x)$

Gaussian: $f(x) = e^{-x^2 / (2\sigma^2)}$

[Plots of the three functions omitted.]
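For reference, a minimal NumPy sketch of the activation functions listed in Table 3.1; the parameter names a, b & sigma follow the table, while the function names themselves are illustrative:

```python
import numpy as np

def linear(x, a=1.0):
    return a * x                                   # f(x) = a.x

def indicator(x):
    return np.sign(x)                              # f(x) = sgn(x)

def piecewise_linear(x, a=1.0, b=1.0):
    # saturates at +/-1 outside [-b, b], linear with slope a inside
    return np.where(x > b, 1.0, np.where(x < -b, -1.0, a * x))

def sigmoid(x, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * x))            # f(x) = 1 / (1 + e^(-a.x))

def bipolar_sigmoid(x, a=1.0):
    return np.tanh(a * x)                          # f(x) = tanh(a.x)

def gaussian(x, sigma=1.0):
    return np.exp(-x**2 / (2.0 * sigma**2))        # f(x) = exp(-x^2 / (2*sigma^2))
```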

CP8206 Soft Computing & Machine Intelligence 27 MULTI-LAYER PERCEPTRON (MLP) A class of NNs that consists of one input layer & one output layer, representing the system inputs & outputs respectively, together with one or more hidden layers that provide the learning capability for the network (Fig. 7). The basic element of an MLP network is an artificial neuron whose activation function, for the hidden layers, is a smooth, differentiable function (usually sigmoid). The neurons in the output layer have a linear activation function.

CP8206 Soft Computing & Machine Intelligence 28 w_ij, b_ij: weights & biases of the hidden layer, with i = 1, ..., m indexing the hidden neurons & j = 1, ..., n indexing the inputs; ω_i: output-layer weights. The hidden layer uses the sigmoid function $g(x) = \dfrac{1}{1 + e^{-x}}$ & the output layer is linear, so the network computes

$$f(x_1, \ldots, x_n) = \sum_{i=1}^{m} \omega_i\, g\!\left(\sum_{j=1}^{n} w_{ij}\, x_j - \theta_i\right)$$

Fig. 7: General structure of a Multi-Layer Perceptron network, illustrating the concept of input, hidden & output layers. [Diagram: inputs X_1, X_2, ..., X_n, hidden-layer neurons g(x) with weights w_{1,1}, ..., w_{n,m}, & output weights ω_1, ..., ω_m.]

CP8206 Soft Computing & Machine Intelligence 29 MLP The output of an MLP network, therefore, can be represented as follows:

$$F(x_1, \ldots, x_p) = \underbrace{\sum_{i=1}^{M} \omega_i\, \underbrace{g\!\Big(\underbrace{\textstyle\sum_{j=1}^{p} w_{ij}\, x_j - \theta_i}_{\text{internal activation}}\Big)}_{\text{hidden layer output}}}_{\text{output layer output}} \qquad (4)$$

where F(·) is the network output, [x_1, ..., x_p] is the input vector having p inputs, M denotes the number of hidden neurons, w represents the hidden-layer connection weights, θ is the threshold value associated with the hidden neurons, & ω represents the output-layer connection weights, which in effect serve as coefficients of the linear output function.
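Eq. (4) maps directly onto a small vectorized forward pass; a minimal NumPy sketch under assumed dimensions (p = 4 inputs, M = 6 hidden neurons, a single linear output; all names & values are illustrative):

```python
import numpy as np

def mlp_output(x, w, theta, omega):
    """Eq. (4): F(x) = sum_i omega_i * g( sum_j w_ij * x_j - theta_i )."""
    g = lambda v: 1.0 / (1.0 + np.exp(-v))   # sigmoid hidden-layer activation
    hidden = g(w @ x - theta)                # hidden-layer outputs, one per hidden neuron
    return omega @ hidden                    # linear output layer

rng = np.random.default_rng(1)
p, M = 4, 6                                  # number of inputs & hidden neurons
x = rng.normal(size=p)                       # input vector [x_1, ..., x_p]
w = rng.normal(size=(M, p))                  # hidden-layer weights w_ij
theta = rng.normal(size=M)                   # hidden-neuron thresholds theta_i
omega = rng.normal(size=M)                   # output-layer weights omega_i
print(mlp_output(x, w, theta, omega))
```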

CP8206 Soft Computing & Machine Intelligence 30 UNIVERSAL APPROXIMATION It has been proven mathematically that standard multi-layer perceptron networks using arbitrary squashing functions are capable of approximating any continuous function from one finite-dimensional space to another to any desired degree of accuracy, provided sufficient hidden neurons are available. A squashing function is a non-decreasing function σ defined such that

$$\sigma(t) \to 1 \ \text{as}\ t \to +\infty, \qquad \sigma(t) \to 0 \ \text{as}\ t \to -\infty. \qquad (6)$$
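For example, the logistic sigmoid $\sigma(t) = \dfrac{1}{1 + e^{-t}}$ is a squashing function in this sense: it is non-decreasing, with $\sigma(t) \to 1$ as $t \to +\infty$ & $\sigma(t) \to 0$ as $t \to -\infty$.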

CP8206 Soft Computing & Machine Intelligence 31 UNIVERSAL APPROXIMATION It has further been shown that this approximation can be achieved using a multilayer perceptron with only one hidden layer & a sigmoid activation function. MLPs are thus a class of universal approximators & can be used successfully to solve difficult problems in diverse areas using the error back-propagation learning algorithm. Furthermore, failure in learning can be attributed to factors such as inadequate learning, an insufficient number of hidden neurons, or a non-deterministic relationship between inputs & outputs.

CP8206 Soft Computing & Machine Intelligence 32 THE STONE-WEIERSTRASS THEOREM can be used to prove that NNs are capable of uniformly approximating any real continuous function on a compact set to an arbitrary degree of accuracy. This theorem states that for any given real continuous function, f, on a compact set U ⊂ Rⁿ, there exists an NN, F, that is an approximate realization of the function f(·):

$$F(x_1, \ldots, x_p) = \sum_{i=1}^{M} \omega_i\, \varphi\!\left(\sum_{j=1}^{p} w_{ij}\, x_j - \theta_i\right) \qquad (7)$$

$$\left| F(x_1, \ldots, x_p) - f(x_1, \ldots, x_p) \right| < \varepsilon \qquad (8)$$

where X = (x_1, x_2, ..., x_n) ∈ U represents the input, & ε denotes the approximation error for all {x_1, ..., x_p} ∈ U, ε being an arbitrarily small positive value.

CP8206 Soft Computing & Machine Intelligence 33 LEARNING PROCESS Learning is accomplished through the associations between different I/O patterns. Regularities & irregularities in the training data are extracted, & consequently are validated using validation data. Learning is achieved by stimulating the network with data representing the function to be learned & attempting to optimize a related performance measure. It is assumed that the data represent a system that is deterministic in nature with unknown probability distributions.

CP8206 Soft Computing & Machine Intelligence 34 LEARNING PROCESS The fashion in which the parameters are adjusted determines the type of learning. There are two general learning paradigms (Fig. 8): unsupervised learning & supervised learning. Unsupervised learning is not within the scope of this course & will not be discussed.

CP8206 Soft Computing & Machine Intelligence 35 Fig. 8: A classification of learning algorithms. [Diagram: Learning Algorithms divide into Supervised Learning (Back-Propagation, Widrow-Hoff rule, Perceptron rule, Associative) & Unsupervised Learning (Self-Organizing/Kohonen, Hebbian, Competitive).]

CP8206 Soft Computing & Machine Intelligence 36 SUPERVISED LEARNING The organization & training of a neural network by a combination of repeated presentation of input patterns & their associated output patterns; equivalent to adjusting the network weights. In supervised learning, a set of training data is used to help the network arrive at appropriate connection weights. This can be seen in the conventional delta rule, one of the early supervised algorithms, developed in the work of McCulloch & Pitts & of Rosenblatt. In this method, a training data set is always available that provides the system with ideal output values for a set of known inputs, & the goal is to obtain the strength of each connection in the network.

CP8206 Soft Computing & Machine Intelligence 37 BACK-PROPAGATION The best-known supervised learning algorithm. This learning rule was first developed by Werbos & later improved by Rumelhart et al. Learning is done on the basis of direct comparison of the output of the network with known correct answers. It is an efficient method of computing the change in each connection weight in a multi-layer network so as to reduce the error in the outputs, & it works by propagating errors backwards from the output layer to the input layer.

CP8206 Soft Computing & Machine Intelligence 38 Back-propagation is an efficient method of computing the change in each connection weight in a multi-layer network so as to reduce the error in the outputs. The method essentially works by propagating errors backwards from the output layer to the input layer. Assuming that w_ji denotes the connection weight from the i-th neuron to the j-th, x_j signifies the input to the j-th neuron, y_j represents the corresponding output, & d_j is the desired output:

Total input to unit j: $x_j = \sum_i y_i\, w_{ji}$ (9)

Output from unit j: $y_j = \dfrac{1}{1 + e^{-x_j}}$ (10)

CP8206 Soft Computing & Machine Intelligence 39 The back-propagation algorithm attempts to minimize the global error which, for a given set of weights, is the sum over training cases & output units of the squared difference between the actual & desired outputs, i.e.,

$$E = \frac{1}{2} \sum_{c} \sum_{j} \left( y_{j,c} - d_{j,c} \right)^2 \qquad (11)$$

where E denotes the global error. The error derivatives for all weights can be computed by working backwards from the output units after a case has been presented, & given the derivatives, the weights are updated to reduce the error.

CP8206 Soft Computing & Machine Intelligence 40 Fig. 9: Basic idea of the back-propagation learning algorithm, working backwards from an output unit j towards a hidden unit i:

$$\frac{\partial E}{\partial y_j} = y_j - d_j, \qquad \frac{\partial E}{\partial x_j} = \frac{\partial E}{\partial y_j}\,\frac{dy_j}{dx_j} = \frac{\partial E}{\partial y_j}\, y_j (1 - y_j),$$

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial x_j}\,\frac{\partial x_j}{\partial w_{ji}} = \frac{\partial E}{\partial x_j}\, y_i, \qquad \frac{\partial E}{\partial y_i} = \sum_j \frac{\partial E}{\partial x_j}\, w_{ji}.$$

CP8206 Soft Computing & Machine Intelligence 41 BACK-PROPAGATION Consists of two passes: forward & backward. Forward pass: a training case is presented to the network; the training case itself consists of an input vector & its associated (desired) output. Backward pass: starts when the output error, i.e., the difference between the desired & actual output, is propagated back through the network & changes are made to the connection weights in order to reduce the output error. Different training cases are then presented to the network. The process of presenting epochs of training cases to the network continues until the average error over the entire training set reaches a defined error goal.
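A minimal NumPy sketch of these two passes for a single-hidden-layer network trained on the XOR mapping, following Eqs. (9)-(11) & the derivatives of Fig. 9; the data set, layer sizes, learning rate & stopping criterion are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
# XOR training set; a constant 1 is appended to each input to act as a bias/threshold
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)                 # desired outputs d_j

W1 = rng.normal(size=(3, 3))                                    # input -> hidden weights
W2 = rng.normal(size=(4, 1))                                    # hidden (+bias) -> output weights
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))                    # Eq. (10)
eta, goal = 0.5, 0.01                                           # learning rate & error goal

for epoch in range(20000):
    # forward pass: Eqs. (9)-(10), layer by layer
    h = sigmoid(X @ W1)                       # hidden-layer outputs
    h1 = np.hstack([h, np.ones((4, 1))])      # append a bias unit to the hidden layer
    y = sigmoid(h1 @ W2)                      # network outputs
    E = 0.5 * np.sum((y - D) ** 2)            # global error, Eq. (11)
    if E < goal:                              # stop once the error goal is reached
        break
    # backward pass: error derivatives of Fig. 9, propagated from output to input
    d_out = (y - D) * y * (1 - y)             # dE/dx_j at the output units
    d_hid = (d_out @ W2[:3].T) * h * (1 - h)  # dE/dx_j at the hidden units (bias row excluded)
    W2 -= eta * h1.T @ d_out                  # dE/dw_ji = dE/dx_j * y_i
    W1 -= eta * X.T @ d_hid
# note: convergence to the goal is not guaranteed for every random initialization
print(f"stopped after {epoch} epochs, E = {E:.4f}")
```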

CP8206 Soft Computing & Machine Intelligence 42 Fig. 10: Basic presentation of the back-propagation learning algorithm. [Flowchart: define the network structure, connection pattern, activation functions & performance measure; prepare training & validation data; provide a stimulus from the training set to the network; the feedforward flow of information generates the output & the performance measure; the error is back-propagated through the network & changes proportional to the derivative of the error with respect to each weight are made to the synaptic weights; if the performance measure is not satisfactory, repeat with further training stimuli; once satisfactory, provide a stimulus from the validation set, generate the output & performance measure via the feedforward flow, & if this is also satisfactory, end training.]