An Introduction to Machine Learning


MindLAB Research Group, Universidad Nacional de Colombia. Introduction to Intelligent Systems

Outline
1. What's machine learning; History
2. Supervised learning
3. Non-supervised learning

Observation and analysis

Tycho Brahe

Johannes Kepler

Data and models

Machine Learning

[Slide diagram, "Machine Learning with Images": Data feeds Learning / Model Induction to produce a Model, and the Model applied to data yields a Prediction. The slide also excerpts a mitosis-detection paper on a cascaded strategy that combines handcrafted and CNN-derived features: mitosis candidates are segmented, extracted, and classified via random forests and convolutional neural networks (CNN), a second-stage random forest is trained on handcrafted features for the remaining candidates, and the final decision combines the stages; an earlier Nippon Electric Company (NEC) team attempt to stack CNN-learned and handcrafted features yielded an F-measure of 0.659, suggesting that more intelligent combinations of the two feature sets are required.]

The fourth paradigm

Machine Learning
Construction and study of systems that can learn from data.
Main problem: to find patterns, relationships, and regularities in the data that allow building descriptive and predictive models.
Related fields:
- Statistics
- Pattern recognition and computer vision
- Data mining and knowledge discovery
- Data analytics

Brief history
- Fisher's linear discriminant (Fisher, 1936)
- Artificial neuron model (McCulloch and Pitts, 1943)
- Perceptron (Rosenblatt, 1957); perceptron limitations (Minsky & Papert, 1969)
- Probably approximately correct learning (Valiant, 1984)
- Multilayer perceptron and backpropagation (Rumelhart et al., 1986)
- Decision trees (Quinlan, 1987)
- Bayesian networks (Pearl, 1988)
- Support vector machines (Cortes & Vapnik, 1995)
- Efficient MLP learning, deep learning (Hinton et al., 2007)

Machine Learning in the news

Supervised learning
Fundamental problem: to find a function that relates a set of inputs to a set of outputs.
Typical problems: classification, regression.

Non-supervised learning
There are no labels for the training samples.
Fundamental problem: to find the underlying structure of a training data set.
Typical problems: clustering (see the k-means sketch below), segmentation, dimensionality reduction, latent topic analysis.
Some samples may have labels; in that case it is called semi-supervised learning.
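As an illustration of the clustering problem, here is a minimal k-means sketch in NumPy; the two-blob toy data and k = 2 are made-up assumptions for the example, not anything from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two Gaussian blobs in 2-D (made-up for illustration)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

k = 2
centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
for _ in range(20):
    # Assignment step: label each point with its nearest center
    labels = ((X[:, None, :] - centers) ** 2).sum(axis=2).argmin(axis=1)
    # Update step: move each center to the mean of its assigned points
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)  # expected to land near (0, 0) and (5, 5)
```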

The machine learning process

Model induction from data
Learning is an ill-posed problem: there can be more than one solution for the same problem, and the solutions are sensitive to small changes in the problem.
It is therefore necessary to make additional assumptions about the kind of pattern we want to learn.
Hypothesis space: the set of valid patterns that can be learnt by the learning algorithm.
Occam's razor: all things being equal, the simplest solution tends to be the best one.
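A small sketch of how the choice of hypothesis space plays out in practice (toy data assumed): fitting the same noisy, nearly linear samples within two hypothesis spaces, degree-1 and degree-9 polynomials, and comparing them at a held-out point, in the spirit of Occam's razor:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
t = 2 * x + rng.normal(0, 0.1, x.size)   # noisy samples of a linear function

x_new, t_new = 0.05, 2 * 0.05            # a held-out point (not in the training set)
for degree in (1, 9):                    # two hypothesis spaces: lines vs degree-9 polynomials
    coeffs = np.polyfit(x, t, degree)    # least-squares fit within that space
    err = abs(np.polyval(coeffs, x_new) - t_new)
    print(f"degree {degree}: held-out error {err:.3f}")
# The richer space can fit the training noise perfectly, yet it tends to
# predict worse at the held-out point: the simpler consistent model wins.
```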

Approaches to learning
- Probabilistic (see the sketch below):
  - Generative models: model P(Y, X)
  - Discriminative models: model P(Y | X)
- Geometrical:
  - Manifold learning: model the geometry of the space where the data lives
  - Max-margin learning: model the separation between the classes
- Optimization: energy/loss/risk minimization
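To make the generative/discriminative distinction concrete, a 1-D sketch with illustrative numbers (not from the lecture): the generative route models P(x | y) and P(y) and recovers P(y | x) by Bayes' rule; with equal-variance Gaussians the resulting posterior is exactly the logistic form that a discriminative model would fit directly:

```python
import numpy as np

# Generative route: a Gaussian P(x | y) per class plus a class prior P(y),
# then the posterior P(y | x) via Bayes' rule. All numbers are illustrative.
def gauss(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

priors = {"+": 0.5, "-": 0.5}
params = {"+": (1.0, 1.0), "-": (-1.0, 1.0)}  # (mean, std) per class

def posterior_pos(x):
    joint = {y: priors[y] * gauss(x, *params[y]) for y in priors}
    return joint["+"] / (joint["+"] + joint["-"])  # Bayes' rule

# With equal variances this posterior is exactly sigma(2 * x): the logistic
# form P(C+ | x) = sigma(w^T x) that a discriminative model fits directly.
print(posterior_pos(0.8))
```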

Learning as optimization
General optimization problem: $\min_{f \in H} L(f, D)$, with $H$ the hypothesis space, $D$ the training data, and $L$ the loss/error.
Example, logistic regression:
- Hypothesis space: $y(x) = P(C_+ \mid x) = \sigma(w^T x)$
- Cross-entropy error: $E(w) = -\ln p(\mathbf{t} \mid w) = -\sum_{n=1}^{l} \left[ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \right]$
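A minimal sketch of the slide's logistic-regression example, minimizing the cross-entropy $E(w)$ by plain gradient descent on made-up toy data; it uses the standard gradient $\nabla E(w) = \sum_n (y_n - t_n) x_n$:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy binary problem: a bias column plus one input feature
X = np.column_stack([np.ones(100), rng.normal(0, 1, 100)])
t = (X[:, 1] + rng.normal(0, 0.3, 100) > 0).astype(float)   # noisy labels

def sigma(z):
    return 1 / (1 + np.exp(-z))

w, eta = np.zeros(2), 0.5
for _ in range(500):
    y = sigma(X @ w)            # y(x) = sigma(w^T x)
    grad = X.T @ (y - t)        # gradient of the cross-entropy E(w)
    w -= eta * grad / len(t)    # gradient-descent step

E = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))   # final cross-entropy
print(w, E)
```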

Methods
- Supervised generative: naïve Bayes, graphical models, Markov random fields, hidden Markov models
- Supervised discriminative: logistic regression, ridge regression, conditional random fields
- Supervised geometrical: max-margin classification (SVM), k-nearest neighbors
- Non-supervised generative: latent semantic analysis, latent Dirichlet allocation, Gaussian mixtures
- Non-supervised geometrical: k-means, PCA, manifold learning
- Other: neural networks (deep learning), decision trees, association rules

Methods

Strategies
- Optimization (non-linear, convex, etc.); stochastic gradient descent (see the sketch below)
- Kernel methods
- Maximum likelihood estimation
- Maximum a posteriori estimation
- Bayesian estimation (variational learning, Gaussian processes)
- Expectation maximization
- Maximum entropy models
- Sampling (Markov chain Monte Carlo, particle filtering)
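Among these strategies, stochastic gradient descent is the easiest to show in a few lines: the same logistic-regression update as in the earlier sketch, applied to one randomly drawn sample at a time (same toy-data assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(100), rng.normal(0, 1, 100)])
t = (X[:, 1] > 0).astype(float)

def sigma(z):
    return 1 / (1 + np.exp(-z))

w, eta = np.zeros(2), 0.1
for _ in range(5 * len(t)):            # a few passes over the data
    n = rng.integers(len(t))           # draw one training sample at random
    y_n = sigma(X[n] @ w)
    w -= eta * (y_n - t[n]) * X[n]     # single-sample gradient step
print(w)
```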

Evaluation

Training error vs generalization error
Training error: $\sum_{i=1}^{l} L(f_w, S_i)$, the loss accumulated over the training samples.
Generalization error: $\mathbb{E}[L(f_w, S)]$, the expected loss on a new sample $S$.

Cross validation
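A sketch of k-fold cross validation around an arbitrary training routine; `fit(X, t)` and `error(model, X, t)` are hypothetical stand-ins for whatever model-fitting and evaluation functions are being assessed:

```python
import numpy as np

def cross_val_error(X, t, fit, error, k=5, seed=0):
    """Average held-out error of a model over k folds."""
    idx = np.random.default_rng(seed).permutation(len(t))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], t[train])                # train on k - 1 folds
        errors.append(error(model, X[test], t[test]))  # evaluate on the held-out fold
    return float(np.mean(errors))
```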

Overfitting and underfitting

Regularization
Controls the complexity of a learned model.
Usually the regularization term corresponds to a norm of the parameter vector ($L_1$ or $L_2$ are the most common).
In some cases it is equivalent to including a prior and finding a MAP solution.
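Continuing the logistic-regression sketch, an $L_2$ penalty $\frac{\lambda}{2}\lVert w \rVert^2$ adds a single $\lambda w$ term to the gradient; the value of $\lambda$ here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(100), rng.normal(0, 1, 100)])
t = (X[:, 1] > 0).astype(float)

def sigma(z):
    return 1 / (1 + np.exp(-z))

lam = 0.1                             # regularization strength (illustrative value)
w, eta = np.zeros(2), 0.5
for _ in range(500):
    y = sigma(X @ w)
    grad = X.T @ (y - t) + lam * w    # the L2 penalty (lam/2)*||w||^2 adds lam*w
    w -= eta * grad / len(t)
print(w)                              # weights shrink toward zero as lam grows
```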

Features
Features represent our prior knowledge of the problem and depend on the type of data.
There are specialized features for practically any kind of data (images, video, sound, speech, text, web pages, etc.).
Medical imaging:
- Standard computer vision features (color, shape, texture, edges, local and global, etc.)
- Specialized features tailored to the problem at hand
New trend: learning features from data.

Feature learning

Unsupervised feature learning

AMIDA-MICCAI 2013 Challenge

High-throughput data analytics
Large-scale machine learning (big data):
- Large number of samples
- Large samples (whole-slide images, 4D high-resolution volumes)
- Scalable learning algorithms (on-line learning)
- Distributed computing architectures (Hadoop, Spark)
- GPGPU computing and multicore architectures

Questions? fagonzalezo@unal.edu.co http://www.mindlaboratory.org