
CSE-E4810 Machine Learning and Neural Networks (5 cr)
Lecture 1: Introduction to Neural Networks
Prof. Juha Karhunen
https://mycourses.aalto.fi/course/view.php?id=13086
Aalto University School of Science, Espoo, Finland

Artificial neural networks
- Consist of simple, adaptive processing units, often called neurons.
- The neurons are interconnected, forming a large network.
- Computation takes place in parallel, often layer by layer.
- Nonlinearities are typically used in the computations.
- An important property of neural networks is that they learn from input data.
- Artificial neural networks have their roots in many areas, including neuroscience and neurobiology, mathematics and statistics, artificial intelligence, statistical physics, engineering, and signal processing.
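In its simplest form, a single neuron computes a weighted sum of its inputs plus a bias and passes the result through a nonlinearity; in standard textbook notation (a common convention, not fixed by these slides),

    y = \varphi\Big( \sum_{i=1}^{n} w_i x_i + b \Big) = \varphi(w^T x + b),

where x is the input vector, w the synaptic weight vector, b the bias, and \varphi the nonlinearity.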

(Figure: a fully connected feedforward network with a 10-neuron input layer, a 4-neuron hidden layer, and a 2-neuron output layer.)

Example of an artificial neural network
- The figure shows a fully connected feedforward network.
- There are three layers: an input layer, a hidden layer, and an output layer.
- In such a network, computations proceed layer by layer from the input layer to the output layer.
- The input layer of 10 neurons only feeds the components of the data vector into the network.
- All the actual computations take place in the hidden layer of four neurons and in the output layer of two neurons.
- In this example, the input (data) vectors are thus 10-dimensional and the output vectors two-dimensional.
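A minimal sketch of the forward pass through such a 10-4-2 network (NumPy; the tanh nonlinearity, the linear output layer, and the random weights are illustrative assumptions, not taken from the lecture):

    import numpy as np

    rng = np.random.default_rng(0)

    # Weights and biases of a 10 -> 4 -> 2 feedforward network.
    W1, b1 = rng.standard_normal((4, 10)), np.zeros(4)  # input -> hidden
    W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)   # hidden -> output

    def forward(x):
        """Propagate one 10-dimensional input vector layer by layer."""
        h = np.tanh(W1 @ x + b1)  # hidden layer: 4 nonlinear neurons
        return W2 @ h + b2        # output layer: 2 neurons (linear here)

    x = rng.standard_normal(10)   # one 10-dimensional data vector
    print(forward(x))             # the corresponding 2-dimensional output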

Neural computing was inspired by computing in human brains. Neural networks resemble the brain in two respects:
1. The network acquires knowledge from its environment using a learning process (algorithm).
2. Synaptic weights, which are interneuron connection strengths, are used to store the learned information.
This is very different from digital computing. However, artificial neural network methods are in practice realized on standard digital computers, because standard computers have a huge advantage in usability over neurocomputers (hardware realizations of neural networks).

Computational intelligence
Computational intelligence is a broader area, which includes:
- Neural networks
- Fuzzy systems
- Evolutionary computing (especially genetic algorithms)
- Artificial intelligence
- Other machine learning approaches, such as graphical modeling and Bayesian methods
Our three machine learning courses cover neural networks and machine learning.

Application areas of neural networks
Neural networks have applications in many branches of science and engineering, including:
- Modeling of nonlinear systems and mappings
- Time series processing
- Pattern recognition
- Signal processing
- Automatic control
- Engineering
- Business and banking
- Many applied sciences

Figure 1: An example application of neural networks in business.

Benefits of neural networks

Nonlinearity
- Allows modeling of nonlinear functions and processes.
- The nonlinearity is distributed through the network: each neuron typically has a nonlinear output.
- Using nonlinearities has drawbacks, too: local minima, more difficult analysis, and no easy closed-form linear solutions.

Input-output mapping
- In supervised learning, the input-output mapping is learned from training data, for example from known prototypes in classification.
- Typically, some statistical criterion is used, and the synaptic weights (free parameters) are modified to optimize that criterion.
- After the input-output mapping has been learned, it can be used for mapping new input vectors.

Adaptivity
- The weights (parameters) can be retrained with new data, so the network can adapt to a nonstationary environment.
- However, the changes must be slow enough.

Fault tolerance and VLSI implementability
- Neural networks are well suited to very-large-scale integration (VLSI) technology.
- If some neurons are damaged, the performance degrades gradually; standard computers do not have this property.
- Some neurocomputers have been built, but their programming and use is difficult.

Neurobiological analogy
- Human brains are fast, powerful, fault tolerant, and use massively parallel computing.
- Neurobiologists try to explain the operation of human brains using artificial neural networks.
- Engineers use neural computation principles for solving complex problems.

Learning types
Two major categories: supervised and unsupervised learning.

Supervised learning
- A certain amount of training data is available.
- The training data consist of known input-output pairs; the known outputs are sometimes called desired responses.
- The training data are used to learn the weights of the network.
- The input-output mapping learned in this way can then be applied to new, unseen data vectors.
- The quality of learning is measured using a suitable criterion, such as the mean-square error between the outputs of the network and the corresponding desired responses (a standard form of this criterion is given below).
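For N training pairs, with d(n) the desired response and y(n) the network output for the n-th input, the mean-square error criterion takes the standard textbook form (the notation is a common convention, not fixed by these slides):

    E = \frac{1}{N} \sum_{n=1}^{N} \| d(n) - y(n) \|^2

The synaptic weights are adjusted so as to minimize E over the training data.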

Unsupervised learning
- There are no known input-output training pairs available, only data (input) vectors.
- Unsupervised learning methods typically fit a chosen type of model to the input data.
- The parameters of the model, that is, the weights of the neural network, are learned from the input data.
- A suitable statistical criterion is used to measure the quality of learning.
- A minimal example of fitting a model to unlabeled data is sketched below.
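As one concrete instance of fitting a model to unlabeled data, here is a minimal k-means clustering sketch (k-means is chosen purely for illustration; the slides do not name a specific method):

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        """Fit k cluster centroids to unlabeled data X (n_samples x dim)."""
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assign each point to its nearest centroid.
            dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
            labels = dists.argmin(axis=1)
            # Move each centroid to the mean of its assigned points.
            for j in range(k):
                if np.any(labels == j):
                    centroids[j] = X[labels == j].mean(axis=0)
        return centroids, labels

    X = np.random.default_rng(1).standard_normal((200, 2))
    centroids, labels = kmeans(X, k=3)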

Other types of learning
- In semi-supervised learning, there is a small amount of labeled training data but lots of unlabeled data, and both are used in learning.
- This is a common situation nowadays: the internet, for example, provides lots of data, but labeling it is costly and/or time-consuming.
- In reinforcement learning, one knows the desired output only coarsely: a reward can be given for good performance and/or a punishment for poor performance.
- Humans and animals typically learn in this way.
- A more advanced mathematical form of reinforcement learning is dynamic programming, where optimization of the reward is based on the combined effect of several sequential decisions.
- We shall not discuss these learning types in our course.

A short history of neural networks
- In 1943, McCulloch and Pitts presented the first simple mathematical model of a neuron, with no learning.
- In 1958, Rosenblatt introduced the perceptron, the first computational neural network with learning.
- In 1960, Widrow introduced the Widrow-Hoff learning rule and the network structures associated with it. This learning rule for a single neuron has found widespread use in adaptive signal processing under the name LMS (least-mean-square) algorithm; its update rule is recalled below.
- In 1969, Minsky and Papert criticized the perceptron in their book for its limited capabilities. This led to a slowdown of neural network research in the 1970s.
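For reference, the LMS algorithm adapts the weight vector w(n) of a single linear neuron at step n as follows (standard formulation, with learning rate \eta, input vector x(n), and desired response d(n)):

    e(n) = d(n) - w(n)^T x(n)
    w(n+1) = w(n) + \eta \, e(n) \, x(n)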

- A boom attracting lots of researchers to neural networks began around 1985, with several new promising approaches: Hopfield's network, multilayer perceptrons using backpropagation learning, and the self-organizing map.
- This strong research activity continued largely through the 1990s.
- During the last decade, many researchers have moved from neural networks to studying other machine learning methods and data mining.
- However, neural networks have many real-world applications in engineering, science, and business, and many conferences and journals still cover their recent developments.

Emerging research topics
- Recently, neural networks have again become popular, mainly due to deep learning, in which one uses neural networks with many layers.
- We shall discuss it somewhat superficially in the last lecture, because it is a difficult topic.
- By training deep neural networks wisely, world records have been achieved in many benchmark classification problems.
- Another new research topic is cognitive computing, whose ultimate goal is to build brain-like cognitive computing chips.
- The SyNAPSE project tries to combine neuroscience, supercomputing, and nanotechnology to achieve that goal.

Examples of applications with real-world data

Classification of handwritten digits
- Deep belief networks (DBNs) are advanced neural network methods for nonlinear mapping and classification. They use a stack of restricted Boltzmann machines.
- We shall discuss these topics briefly in the last lecture (lecture 13).
- Data: handwritten digits (0, 1, 2, ..., 9) from the widely used MNIST benchmark database.
- The MNIST data is often used for testing the performance of different mapping and/or classification methods.
- By mapping the high-dimensional handwritten digit data to two dimensions, one can assess the quality of the mapping visually.

- One can also compare the classification errors of different methods on the MNIST data.
- Figure 2 shows that the DBN provides a nonlinear mapping which separates the digits quite well even in two dimensions.
- The classification error of the deep belief network is only 1.0%. This is smaller than that of multilayer perceptrons, 1.6%, and of support vector machines, 1.4%.
- These widely used neural network methods are discussed in more detail later in this course.
- Principal component analysis (PCA) is a widely used linear mapping method, but it provides a much worse mapping than the DBN in this example; see Figure 3 and the sketch below.
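A minimal sketch of such a linear PCA mapping to two dimensions (this uses scikit-learn and its small bundled digits dataset rather than the full MNIST database, so it illustrates the idea only and is not the code behind the figures):

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    digits = load_digits()  # 8x8 digit images as 64-dimensional vectors
    X_2d = PCA(n_components=2).fit_transform(digits.data)

    # One color per digit class; heavy overlap indicates poor separation.
    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=digits.target, cmap="tab10", s=8)
    plt.colorbar(label="digit class")
    plt.show()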

Figure 2: Mapping of the MNIST data using a deep belief network.

Figure 3: Mapping of the MNIST data using principal component analysis.

Web mining using self-organizing maps
- The self-organizing map (SOM) is a useful tool for visualizing and organizing data, with many real-world applications.
- It was developed by Prof. Teuvo Kohonen in our laboratory.
- The next figure shows an application of the SOM to web mining of a huge patent dataset of some 6.8 million patents.
- The self-organizing map computed for it had about one million neurons.
- Using keywords, one can search the map for patents closely related in content.
- The map insets in the figure show the results of a coarse, medium, and fine search.
- The core SOM training step is sketched below.
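At the heart of SOM training is a simple update: find the best-matching unit for an input vector, then pull it and its map neighbors toward that input. A minimal sketch for a one-dimensional map (the Gaussian neighborhood and the parameter values are common textbook choices, not taken from the WEBSOM system):

    import numpy as np

    rng = np.random.default_rng(0)
    n_units, dim = 20, 3                     # 20 map units, 3-dimensional data
    W = rng.standard_normal((n_units, dim))  # prototype (weight) vectors
    grid = np.arange(n_units)                # unit positions on the 1-D map

    def som_step(x, W, lr=0.1, sigma=2.0):
        """One SOM update: move the winner and its neighbors toward x."""
        bmu = np.argmin(np.linalg.norm(W - x, axis=1))     # best-matching unit
        h = np.exp(-(grid - bmu) ** 2 / (2 * sigma ** 2))  # neighborhood
        W += lr * h[:, None] * (x - W)
        return W

    for x in rng.standard_normal((1000, dim)):  # train on random data vectors
        W = som_step(x, W)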

Figure 4: WEBSOM document map of 6.8 million patents.

Analysis of world climate data
- In this course, we discuss independent component analysis (ICA).
- ICA can often find more meaningful components in vector-valued input data than, for example, PCA.
- Denoising source separation (DSS) is an ICA-related technique which can utilize prior information.
- The DSS and ICA methods have been developed in our laboratory.
- In this example, we consider the application of DSS techniques to world climate data: a huge data set of daily weather measurements over 56 years at 10,000 locations around the globe.
- Quantities such as surface temperature, precipitation, air pressure, and cloudiness were measured.
- A minimal ICA usage example is sketched below.
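To give a flavor of how ICA is applied in practice, here is a minimal sketch that separates two synthetic mixed signals with scikit-learn's FastICA (FastICA stands in here for the DSS method actually used in the climate study, which is not part of scikit-learn):

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    t = np.linspace(0, 8, 2000)

    # Two independent sources: a slow sinusoid and a fast square wave.
    S = np.c_[np.sin(2 * t), np.sign(np.sin(13 * t))]
    A = np.array([[1.0, 0.5], [0.5, 2.0]])  # unknown mixing matrix
    X = S @ A.T                             # observed mixed signals

    # Recover the independent components from the mixtures alone.
    S_est = FastICA(n_components=2, random_state=0).fit_transform(X)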

Figure 5: A satellite image of the Earth.

Figure 6: The component describing global warming, separated by DSS.

- DSS with suitable prior information can extract a component which clearly corresponds to global warming. The previous figure shows it both with respect to time (upper curve) and location (world map).
- The uppermost curve in the next figure depicts the component extracted by DSS with the largest spatial interannual variability. It describes the El Niño phenomenon quite well; compare it with the third curve, which is the climatological El Niño index.
- The two other curves are derivatives of the El Niño phenomenon: one separated by DSS (component 2), and one computed from the climatological index (component 4).
- The red curves show the mean value of each component.

Figure 7: The two uppermost components separated by DSS have the largest interannual variability. The third curve is the El Niño index used in climatology, and the fourth one is its derivative.

- The last image shows the spatial patterns corresponding to the El Niño component found by denoising source separation: surface temperature (top subfigure), sea level pressure (middle subfigure), and precipitation (bottom subfigure).
- In all the spatial images, red shows values larger than normal, and blue correspondingly shows values smaller than normal.

Figure 8: Surface temperature (top), sea level pressure (middle), and precipitation (bottom) corresponding to the first component found by DSS shown in the previous figure.

More on neural networks and machine learning

Useful books
1. E. Alpaydin, Introduction to Machine Learning, 3rd ed., The MIT Press, 2014. Used as the textbook in our course T-61.3050 Machine Learning: Basic Principles, this undergraduate-level book deals mainly with machine learning methods other than neural networks.
2. C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006. A graduate-level textbook which is a useful reference, especially on probabilistic methods. It is too difficult, and deals too little with neural networks, for the purposes of our course. Some examples from this book are presented in our course.
3. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice-Hall, 1998. This book was used earlier, when we had two courses on neural networks. It is too extensive (800 pages) for the purposes of our course. However, we use its Chapter 6 on support vector machines, and the matters of lecture 11 (processing of temporal information) are taken from its Chapter 13.
4. S. Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson Int. Ed., 2009. This 3rd edition of the previous book is not markedly better than the 2nd edition, but now runs to more than 900 pages. Chapters have been restructured, some new material has been added, and some has been left out. The main problem of this book is that it discusses matters throughout too extensively and in too much detail.
5. K. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012. An excellent and extensive (over 1000 pages) book on probabilistic machine learning methods, but it hardly discusses neural networks at all.

Journals publishing new research results
- Journals on neural network research: Neural Computation, IEEE Trans. on Neural Networks, Neural Networks, Neurocomputing, Neural Network Letters, and Int. Journal on Neural Systems. Many of these also publish articles on other machine learning methods.

- Journals on machine learning research: Machine Learning and the Journal of Machine Learning Research.

International conferences
- IJCNN, the IEEE Int. Joint Conf. on Neural Networks, is the largest neural network conference in the world.
- ICANN, the Int. Conf. on Artificial Neural Networks, is the premier European conference on neural networks, and nowadays also on machine learning.
- NIPS, Neural Information Processing Systems, is a high-quality conference on machine learning and neural networks.
- ICML, the Int. Conf. on Machine Learning, is a high-quality machine learning conference.
- ECML, the European Conf. on Machine Learning, is the corresponding good-quality European conference.
- There are many other smaller and/or lower-quality conferences.
- Usually, new research results are first published at conferences, and the valuable enough ones are later published in expanded form in journals.