Supervised and Unsupervised Learning. Ciro Donalek Ay/Bi 199 April 2011


Summary. KDD and Data Mining Tasks. Finding the optimal approach. Supervised Models: Neural Networks, Multi Layer Perceptron, Decision Trees. Unsupervised Models: K-means, Self Organizing Maps. Ensembles. Links and References.

Knowledge Discovery in Databases. KDD may be defined as: "The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data". KDD is an interactive and iterative process involving several steps.

You got your data: what's next? What kind of analysis do you need? Which model is more appropriate for it?

Clean your data! Data preprocessing transforms the raw data into a format that will be more easily and effectively processed for the purpose of the user. Some tasks: sampling, which selects a representative subset from a large population of data; noise treatment; strategies to handle missing data (sometimes your rows will be incomplete: not all parameters are measured for all samples); normalization; feature extraction, which pulls out specified data that is significant in some particular context. Use standard formats!
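Two of the preprocessing steps above, normalization and sampling, can be sketched in a few lines (a minimal illustration; the function names and the min-max rescaling choice are mine, not from the lecture):

```python
import numpy as np

def minmax_normalize(x):
    """Rescale each column of x to the [0, 1] range (min-max normalization)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

def sample_subset(x, n, seed=0):
    """Select a random subset of n rows, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=n, replace=False)
    return x[idx]

data = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
scaled = minmax_normalize(data)   # every column now spans [0, 1]
subset = sample_subset(data, 2)   # a representative 2-row sample
```

Min-max scaling keeps attributes with very different ranges (here, units vs. hundreds) from dominating distance-based models.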

Missing Data. Missing data are a part of almost all research, and we all have to decide how to deal with them. Complete Case Analysis: use only rows with all the values. Available Case Analysis. Substitution. Mean Value: replace the missing value with the mean value for that particular attribute. Regression Substitution: replace the missing value with a value predicted from similar cases. Matching Imputation: for each unit with a missing y, find a unit with similar values of x in the observed data and take its y value. Maximum Likelihood, EM, etc. Some DM models can deal with missing data better than others.
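Mean-value substitution, the simplest strategy above, can be sketched with NumPy (an illustration; `mean_impute` is a made-up helper name):

```python
import numpy as np

def mean_impute(x):
    """Replace each NaN with the mean of its column, computed
    from the observed (non-missing) entries only."""
    x = np.asarray(x, dtype=float).copy()
    col_means = np.nanmean(x, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(x))
    x[nan_rows, nan_cols] = col_means[nan_cols]
    return x

x = np.array([[1.0, 10.0],
              [np.nan, 30.0],
              [3.0, np.nan]])
filled = mean_impute(x)   # NaNs become the column means 2.0 and 20.0
```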

Data Mining. Data Mining is about automating the process of searching for patterns in the data. In more detail, the most relevant DM tasks are: association, sequence or path analysis, clustering, classification, regression, visualization.

Finding Solutions via Purposes. What kind of analysis do you need? Regression: predict new values based on the past (inference); compute the new values for a dependent variable based on the values of one or more measured attributes. Classification: divide samples into classes, using a training set of previously labeled data. Clustering: partitioning of a data set into subsets (clusters) so that data in each subset ideally share some common characteristics. Classification is in some ways similar to clustering, but requires that the analyst know ahead of time how the classes are defined.

Cluster Analysis How many clusters do you expect?

Search for Outliers

Classification. A data mining technique used to predict group membership for data instances. There are two ways to assign a new value to a given class. Crisp classification: given an input, the classifier returns its label. Probabilistic classification: given an input, the classifier returns its probabilities of belonging to each class; useful when some mistakes can be more costly than others. Winner-take-all and other rules: assign the object to the class with the highest probability (WTA), but only if that probability is greater than 40% (WTA with threshold).
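The two decision rules above can be sketched directly on a vector of per-class probabilities (function names and probability values are illustrative; the 40% threshold matches the slide):

```python
import numpy as np

def wta(probs, labels):
    """Winner-take-all: return the label of the most probable class."""
    return labels[int(np.argmax(probs))]

def wta_threshold(probs, labels, thresh=0.4):
    """WTA with threshold: refuse to classify when the winning
    probability does not exceed the threshold (here 40%)."""
    i = int(np.argmax(probs))
    return labels[i] if probs[i] > thresh else "unclassified"

print(wta([0.7, 0.3], ["true object", "artifact"]))          # true object
print(wta_threshold([0.35, 0.33, 0.32],
                    ["setosa", "versicolor", "virginica"]))  # unclassified
```

The threshold variant is the one to use when some mistakes are more costly than others: it trades a few unclassified objects for fewer confident errors.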

Regression / Forecasting. Data table: statistical correlation; mapping without any prior assumption on the functional form of the data distribution; machine learning algorithms are well suited for this. Curve fitting: find a well defined and known function underlying your data; theory / expertise can help.

Machine Learning. To learn: to get knowledge of by study, experience, or being taught. Types of Learning: Supervised, Unsupervised.

Unsupervised Learning. The model is not provided with the correct results during the training. It can be used to cluster the input data into classes on the basis of their statistical properties only. Cluster significance and labeling: the labeling can be carried out even if the labels are only available for a small number of objects representative of the desired classes.
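K-means, the unsupervised model listed in the summary, is a concrete example of clustering on statistical properties alone. A minimal sketch (initial centers are passed in explicitly for reproducibility; no labels are used anywhere):

```python
import numpy as np

def kmeans(x, init_centers, n_iter=20):
    """Alternate two steps: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # distance of every point to every center
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return assign, centers

# two well-separated blobs of points
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
               rng.normal(5.0, 0.1, (10, 2))])
assign, centers = kmeans(x, init_centers=x[[0, -1]])
# each blob ends up in its own cluster
```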

Supervised Learning. Training data include both the input and the desired results. For some examples the correct results (targets) are known and are given to the model during the learning process. The construction of proper training, validation and test sets is crucial. These methods are usually fast and accurate. They have to be able to generalize: give the correct results when new data are given as input, without knowing the target a priori.

Generalization. Refers to the ability to produce reasonable outputs for inputs not encountered during training. In other words: NO PANIC when "never seen before" data are given as input!

A common problem: OVERFITTING. Learning the data and not the underlying function: the model performs well on the data used during training and poorly with new data. Use proper training sets and early stopping.

Datasets. Training set: a set of examples used for learning, where the target value is known. Validation set: a set of examples used to tune the architecture of a classifier and estimate the error. Test set: used only to assess the performance of a classifier; it is never used during the training process, so the error on the test set provides an unbiased estimate of the generalization error.
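The three sets can be carved out of a single labeled dataset with a random split (a sketch; the 60/20/20 proportions are a common choice I am assuming, not one the lecture prescribes):

```python
import numpy as np

def split_dataset(n, seed=0):
    """Shuffle the indices 0..n-1 and cut them 60/20/20 into
    training, validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (idx[:n_train],                  # training: used for learning
            idx[n_train:n_train + n_val],   # validation: architecture tuning
            idx[n_train + n_val:])          # test: final, unbiased assessment

train, val, test = split_dataset(100)
```

The key discipline is in the last line of the docstring comments: the test indices are set aside once and never touched again until the final evaluation.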

IRIS dataset. IRIS consists of 3 classes, 50 instances each, and 4 numerical attributes (sepal and petal length and width in cm). Each class refers to a type of Iris plant (Setosa, Versicolor, Virginica). The first class is linearly separable from the other two, while the 2nd and the 3rd are not linearly separable from each other.
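The IRIS data described above ships with scikit-learn, for instance (the library choice is mine; the lecture does not name one):

```python
from sklearn.datasets import load_iris

# Load IRIS: 3 classes x 50 instances, 4 numerical attributes in cm.
iris = load_iris()
print(iris.data.shape)       # (150, 4)
print(iris.target_names)     # the three Iris species
print(iris.feature_names)    # sepal/petal length and width in cm
```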

PQ Artifacts Dataset. 2 main classes and 4 numerical attributes. The classes are: true objects, artifacts.

Data Selection. "Garbage in, garbage out": training, validation and test data must be representative of the underlying model, and all eventualities must be covered. Unbalanced datasets: since the network minimizes the overall error, the proportion of types of data in the set is critical. Options include the inclusion of a loss matrix (Bishop, 1995); often, the best approach is to ensure even representation of different cases, then to interpret the network's decisions accordingly.

Artificial Neural Network. An Artificial Neural Network is an information processing paradigm inspired by the way biological nervous systems process information: a large number of highly interconnected simple processing elements (neurons) working together to solve specific problems.

A simple artificial neuron. The basic computational element is often called a node or unit. It receives input from some other units, or from an external source. Each input has an associated weight w, which can be modified so as to model synaptic learning. The unit computes some function of the weighted sum of its inputs:
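That weighted sum can be written out directly (a sketch; the sigmoid activation and the particular weight values are illustrative choices, not from the slide):

```python
import numpy as np

def neuron(inputs, weights, bias=0.0):
    """A single artificial unit: weighted sum of the inputs,
    passed through a sigmoid activation function."""
    z = np.dot(inputs, weights) + bias        # weighted sum of inputs
    return 1.0 / (1.0 + np.exp(-z))           # sigmoid activation

out = neuron(np.array([1.0, 2.0]), np.array([0.5, -0.5]), bias=0.5)
print(out)   # sigmoid(1*0.5 - 2*0.5 + 0.5) = sigmoid(0) = 0.5
```

Learning amounts to adjusting `weights` and `bias`; the unit's structure never changes.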

Neural Networks. A Neural Network is usually structured into an input layer of neurons, one or more hidden layers and one output layer. Neurons belonging to adjacent layers are usually fully connected, and the various types and architectures are identified both by the different topologies adopted for the connections and by the choice of the activation function. The values of the functions associated with the connections are called weights. The whole game of using NNs lies in the fact that, in order for the network to yield appropriate outputs for given inputs, the weights must be set to suitable values. The way this is obtained allows a further distinction among modes of operation.

Neural Networks: types. Feedforward: Single Layer Perceptron, MLP, ADALINE (Adaptive Linear Neuron), RBF. Self Organized: SOM (Kohonen Maps). Recurrent: Simple Recurrent Network, Hopfield Network. Stochastic: Boltzmann machines, RBM. Modular: Committee of Machines, ASNN (Associative Neural Networks), Ensembles. Others: Instantaneously Trained, Spiking (SNN), Dynamic, Cascades, NeuroFuzzy, PPS, GTM.

Multi Layer Perceptron. The MLP is one of the most used supervised models: it consists of multiple layers of computational units, usually interconnected in a feed-forward way. Each neuron in one layer has direct connections to all the neurons of the subsequent layer.

Learning Process: Back Propagation. The output values are compared with the target to compute the value of some predefined error function; the error is then fed back through the network. Using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function. After repeating this process for a sufficiently large number of training cycles, the network will usually converge.
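One such training cycle can be traced numerically on a tiny MLP (1 input, 2 tanh hidden units, 1 linear output; the architecture, learning rate, and toy target are my own illustration): compute the error function, propagate the error back, adjust the weights, and check that the error decreased.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20).reshape(-1, 1)
t = x ** 2                                    # toy target function
W1, b1 = rng.normal(size=(1, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)                  # hidden layer
    return h, h @ W2 + b2                     # linear output

def mse(y):                                   # predefined error function
    return float(np.mean((y - t) ** 2))

h, y = forward(x)
loss_before = mse(y)

# backward pass: gradients of the MSE w.r.t. each weight and bias
dy = 2 * (y - t) / len(x)
dW2, db2 = h.T @ dy, dy.sum(axis=0)
dh = dy @ W2.T * (1 - h ** 2)                 # tanh derivative
dW1, db1 = x.T @ dh, dh.sum(axis=0)

lr = 0.05                                     # learning rate
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2

_, y2 = forward(x)
loss_after = mse(y2)                          # smaller than loss_before
```

Repeating this weight update for many cycles is exactly the "sufficiently large number of training cycles" the slide refers to.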

Hidden Units. The best number of hidden units depends on: the number of inputs and outputs, the number of training cases, the amount of noise in the targets, the complexity of the function to be learned, and the activation function. Too few hidden units => high training and generalization error, due to underfitting and high statistical bias. Too many hidden units => low training error but high generalization error, due to overfitting and high variance. Rules of thumb don't usually work.

Activation and Error Functions

Activation Functions

Results: confusion matrix

Results: completeness and contamination. Exercise: compute completeness and contamination for the previous confusion matrix (test set).
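The exercise can be checked numerically. Assuming the usual definitions (the slides do not restate them): completeness of a class is the fraction of its true members that are recovered, and contamination is the fraction of objects assigned to that class that actually belong elsewhere. The confusion matrix below is illustrative, not the one from the slides; rows are true classes, columns predicted classes.

```python
import numpy as np

# Illustrative 2-class confusion matrix: rows = true, columns = predicted.
cm = np.array([[45,  5],
               [10, 40]])

# completeness per true class: correctly recovered / total true members
completeness = cm.diagonal() / cm.sum(axis=1)       # [0.9, 0.8]
# contamination per predicted class: wrongly assigned / total assigned
contamination = 1 - cm.diagonal() / cm.sum(axis=0)  # [10/55, 5/45]
print(completeness)
print(contamination)
```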

Decision Trees. Another classification method. A decision tree is a set of simple rules, such as "if the sepal length is less than 5.45, classify the specimen as Setosa." Decision trees are also nonparametric because they do not require any assumptions about the distribution of the variables in each class.
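A tree of this kind can be fit on IRIS and its rules printed, e.g. with scikit-learn (the library and the `max_depth` choice are mine; the thresholds it learns need not match the 5.45 quoted above):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)
# print the learned if-then rules, in the same spirit as the slide's example
print(export_text(clf, feature_names=iris.feature_names))
```

Even at depth 2 the tree separates the three Iris classes well, and every decision it makes is a human-readable threshold on one attribute.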