Introduction to Machine Learning

Andrea Passerini passerini@disi.unitn.it Machine Learning

What is Machine Learning
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (T. Mitchell)

Successful applications of machine learning
- Speech recognition, optical character recognition, computer vision
- Learning to drive an autonomous vehicle (DARPA Grand Challenges, Google Self-Driving Car)
- Game playing (IBM's Deep Blue, Watson, AlphaGo)
- Recommender systems
Hot topic: big players are heavily investing in machine learning (Google, Facebook, Amazon, IBM, Uber...).

Components of a well-posed learning problem
- Task to be addressed by the system (e.g. recognizing handwritten characters)
- Performance measure to evaluate the learned system (e.g. number of misclassified characters)
- Training experience to train the learning system (e.g. labelled handwritten characters)

Designing a machine learning system
1. Formalize the learning task
2. Collect data
3. Extract features
4. Choose class of learning models
5. Train model
6. Evaluate model

Formalize the learning task
- Define the task that should be addressed by the learning system (e.g. recognizing handwritten characters from images).
- A learning problem is often composed of a number of related tasks. E.g.:
  - Segment the image into words and each word into characters.
  - Identify the language of the writing.
  - Classify each character into the language alphabet.
- Choose an appropriate performance measure for evaluating the learned system (e.g. number of misclassified characters).

Collect data
- A set of training examples needs to be collected in machine-readable format.
- Data collection is often the most cumbersome part of the process, since it requires manual intervention, especially in labelling examples for supervised learning.
- Recent approaches to the labelling problem try to exploit unlabelled data, which is much cheaper to obtain (semi-supervised learning).

Extract features
- A relevant set of features needs to be extracted from the data in order to provide inputs to the learning system.
- Prior knowledge is usually necessary in order to choose the appropriate features for the task at hand.
- Too few features can miss relevant information, preventing the system from learning the task with reasonable performance.
- Including noisy features can make the learning problem harder.
- Too many features can require more examples than are available for training.

Choose learning model class
- A simple model such as a linear classifier is easy to train but insufficient for data that are not linearly separable.
- An overly complex model can memorize noise in the training data and fail to generalize to new examples.
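The limitation of a linear classifier can be illustrated with a small sketch. The perceptron update rule and the AND/XOR toy datasets below are assumptions chosen for illustration; they are not taken from the slides. AND is linearly separable, XOR is not.

```python
def train_perceptron(examples, epochs=100, lr=1.0):
    """Train a linear threshold unit with the perceptron rule.

    examples: list of ((x1, x2), label) with label in {0, 1}.
    Returns (weights, bias).
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in examples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred
            if err != 0:
                # Move the decision boundary towards the misclassified example.
                w[0] += lr * err * x1
                w[1] += lr * err * x2
                b += lr * err
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


def count_errors(model, examples):
    """Number of examples misclassified by the linear model (weights, bias)."""
    w, b = model
    return sum(
        (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) != y
        for (x1, x2), y in examples
    )


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_errors = count_errors(train_perceptron(AND_DATA), AND_DATA)
xor_errors = count_errors(train_perceptron(XOR_DATA), XOR_DATA)
```

On AND the perceptron reaches zero training errors, while on XOR at least one example stays misclassified no matter how long training runs, because no single line separates the two classes.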

Train model
- Training a model implies searching through the space of possible models (aka hypotheses) given the chosen model class.
- Such a search typically aims at fitting the available training examples well according to the chosen performance measure.
- However, the learned model should perform well on unseen data (generalization), and not simply memorize training examples (overfitting).
- Different techniques can be used to improve generalization, usually by trading off model complexity against training set fitting.
"Entia non sunt multiplicanda praeter necessitatem" (Entities are not to be multiplied without necessity). William of Occam (Occam's razor)
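One common way to write the trade-off between training set fitting and model complexity is regularized empirical risk minimization. The slides do not give this formula; the symbols are assumptions introduced here: $\mathcal{F}$ is the chosen model class, $\ell$ a loss function, $\Omega$ a complexity penalty, and $\lambda \ge 0$ the trade-off weight.

```latex
\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \;
\frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i), y_i\bigr) \;+\; \lambda \, \Omega(f)
```

With $\lambda = 0$ only the fit to the training examples matters. Larger values of $\lambda$ favour simpler models, in the spirit of Occam's razor.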

Evaluate model
- The learned model is evaluated according to its ability to generalize to unseen examples.
- Evaluation can provide insights into the model's weaknesses and suggest directions for refining or modifying it.
- Evaluation can involve comparing different models/learners in order to select the best performing one.
- The statistical significance of observed differences between the performance of different models should be assessed with appropriate statistical tests.
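A minimal sketch of estimating generalization on unseen examples is k-fold cross-validation, a technique the slides do not name explicitly. The helper names, the majority-class baseline learner and the toy labels are all hypothetical. The folds are contiguous, which assumes the examples are already in random order.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_examples))
        yield train_idx, test_idx
        start += size


def cross_validated_error(xs, ys, fit, predict, k=5):
    """Average misclassification rate over k held-out folds."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        wrong = sum(predict(model, xs[i]) != ys[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / len(errors)


# Hypothetical baseline learner: always predict the most frequent training label.
def fit_majority(train_xs, train_ys):
    return max(set(train_ys), key=train_ys.count)


def predict_majority(model, x):
    return model


xs = list(range(10))
ys = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
cv_error = cross_validated_error(xs, ys, fit_majority, predict_majority, k=5)
```

Any pair of `fit`/`predict` functions with the same signature can be plugged in, so two learners can be compared on identical folds before a significance test is applied.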

Learning settings: supervised learning
- The learner is provided with a set of input/output pairs $(x_i, y_i) \in \mathcal{X} \times \mathcal{Y}$.
- The learned model $f : \mathcal{X} \to \mathcal{Y}$ should map input examples into their outputs (e.g. classify character images into the character alphabet).
- A domain expert is typically involved in labelling input examples with the corresponding outputs.
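As a minimal sketch of learning a mapping from inputs to outputs out of labelled pairs, the following uses a 1-nearest-neighbour rule. This learner and the toy 2-D points are assumptions for illustration; the slides do not prescribe a specific algorithm.

```python
def nearest_neighbor_classifier(training_pairs):
    """Return f: X -> Y that copies the label of the closest training input.

    training_pairs: list of (x, y), where x is a tuple of floats.
    """
    def f(x):
        def sq_dist(pair):
            xi, _ = pair
            return sum((a - b) ** 2 for a, b in zip(xi, x))

        _, label = min(training_pairs, key=sq_dist)
        return label

    return f


training_pairs = [
    ((0.0, 0.0), "a"),
    ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
]
f = nearest_neighbor_classifier(training_pairs)
label_near_origin = f((0.1, 0.0))
label_near_one = f((1.0, 0.9))
```

Here `f` plays the role of the learned model: it is built only from the labelled pairs and can then be applied to inputs it has never seen.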

Learning settings: unsupervised learning
- The learner is provided with a set of input examples $x_i \in \mathcal{X}$, with no labelling information.
- The learner models the training examples, e.g. by grouping them into clusters according to their similarity.
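Grouping unlabelled examples by similarity can be sketched with k-means on one-dimensional data. The algorithm choice, the initial centroids and the data points are assumptions for illustration.

```python
def k_means_1d(points, initial_centroids, iterations=10):
    """Plain k-means on scalars; returns (centroids, clusters)."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = k_means_1d(points, initial_centroids=[0.0, 6.0])
```

No labels are used anywhere: the two groups emerge only from the distances between the points.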

Learning settings: semi-supervised learning
- The learner is provided with a set of input/output pairs $(x_i, y_i) \in \mathcal{X} \times \mathcal{Y}$.
- A (typically much bigger) additional set of unlabelled examples $x_i \in \mathcal{X}$ is also provided.
- As in supervised learning, the learned model $f : \mathcal{X} \to \mathcal{Y}$ should map input examples into their outputs.
- Unlabelled data can be exploited to improve performance, e.g. by forcing the model to produce similar outputs for similar inputs, or by allowing it to learn a better internal representation of the examples.
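One simple way to make the model produce similar outputs for similar inputs is to spread labels greedily from labelled to nearby unlabelled points. The sketch below is a hypothetical illustration on 1-D data, not a method described in the slides.

```python
def propagate_labels(labeled, unlabeled):
    """Greedy nearest-neighbour label propagation on scalars.

    labeled: list of (x, y) pairs; unlabeled: list of x values.
    Repeatedly labels the unlabeled point closest to any already-labeled
    point, copying the label of that closest point.
    """
    known = list(labeled)
    pending = list(unlabeled)
    while pending:
        # Unlabeled point that lies closest to the labeled set so far.
        best = min(pending, key=lambda u: min(abs(u - x) for x, _ in known))
        _, nearest_label = min(known, key=lambda pair: abs(pair[0] - best))
        known.append((best, nearest_label))
        pending.remove(best)
    return dict(known)


labeled = [(0.0, "a"), (10.0, "b")]
unlabeled = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = propagate_labels(labeled, unlabeled)
```

In this toy example the point 6.0 is closer to the "b" example at 10.0 than to the "a" example at 0.0, so a purely supervised nearest-neighbour rule would label it "b". Because the unlabelled points form a dense chain starting at 0.0, propagation assigns it "a" instead.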

Learning settings: reinforcement learning
- The learner is provided with a set of possible states S and, for each state, a set of possible actions A, each moving it to a next state.
- On performing action a from state s, the learner receives an immediate reward r(s, a).
- The task is to learn a policy that chooses, for each state s, the action a maximizing the overall reward (including future moves).
- The learner has to deal with delayed rewards coming from future moves, and with the trade-off between exploitation and exploration.
- Typical applications include movement policies for robots and sequential scheduling problems in general.
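The setting above can be sketched with tabular Q-learning and an epsilon-greedy policy. The slides do not name this algorithm, and the 5-state chain environment, the reward of 1 at the last state and all hyperparameters are assumptions for illustration.

```python
import random


def q_learning_chain(n_states=5, episodes=300, alpha=0.5, gamma=0.9,
                     epsilon=0.3, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    States are 0..n_states-1. Each episode starts in state 0 and ends on
    reaching the last state, which yields reward 1 (all other rewards are 0).
    """
    rng = random.Random(seed)
    actions = ("left", "right")
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    def step(s, a):
        s_next = max(s - 1, 0) if a == "left" else s + 1
        reward = 1.0 if s_next == n_states - 1 else 0.0
        return s_next, reward

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        # Break ties at random so that untrained states are still explored.
        return rng.choice([a for a in actions if q[(s, a)] == best])

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Exploration with probability epsilon, exploitation otherwise.
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s_next, r = step(s, a)
            # The target includes the discounted value of future moves.
            target = r + gamma * max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
    return q


q = q_learning_chain()
policy = [max(("left", "right"), key=lambda a: q[(s, a)]) for s in range(4)]
```

The reward only arrives at the end of the chain, yet the learned values propagate it backwards, so the greedy policy chooses "right" in every non-terminal state.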

Supervised learning tasks: classification
- Binary: assign an example to one of two possible classes (often a positive and a negative one). E.g. digit vs non-digit character.
- Multiclass: assign an example to one of $n > 2$ possible classes. E.g. assign a digit character to one of {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
- Multilabel: assign an example to a subset of $m \le n$ of the possible classes. E.g. predict the topics of a text.

Supervised learning tasks: regression and ranking
- Regression: assign a real value to an example. E.g. predict the biodegradation rate of a molecular compound under aerobic conditions.
- Ordinal regression or ranking: order a set of examples according to their relative importance/quality with respect to the task. E.g. order emails according to their urgency.
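As a minimal sketch of regression, the following fits a straight line by ordinary least squares using the closed-form solution for one input variable. The linear model and the toy data are assumptions; the slides do not specify a regression method.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for scalar inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1 without noise.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
prediction = slope * 4.0 + intercept  # real-valued output for a new input
```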

Unsupervised learning tasks
- Dimensionality reduction: reduce the dimensionality of the data while maintaining as much information as possible. E.g. principal component analysis (PCA), random projections.
- Clustering: cluster data into homogeneous groups according to their similarity. E.g. cluster genes according to their expression levels.
- Novelty detection: detect novel examples which differ from the distribution of a certain set of data. E.g. recognize anomalous network traffic indicating a possible attack.
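Novelty detection can be sketched in its simplest form by flagging values that lie far from the mean of a reference sample. The three-standard-deviation threshold and the traffic figures below are hypothetical, and the rule implicitly assumes roughly Gaussian reference data.

```python
import statistics


def fit_novelty_detector(reference, threshold=3.0):
    """Return is_novel(x), based on the distance from the reference mean."""
    mean = statistics.mean(reference)
    std = statistics.stdev(reference)

    def is_novel(x):
        # Novel if more than `threshold` standard deviations from the mean.
        return abs(x - mean) > threshold * std

    return is_novel


# Hypothetical requests-per-second measurements under normal conditions.
reference_traffic = [98, 102, 100, 97, 103, 101, 99, 100]
is_novel = fit_novelty_detector(reference_traffic)
normal_flag = is_novel(104)
attack_flag = is_novel(250)
```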

Probabilistic reasoning
- Reasoning in the presence of uncertainty.
- Evaluating the effect of a certain piece of evidence on other related variables.
- Estimating probabilities and relations between variables from a set of observations.
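The effect of a piece of evidence on a related variable can be worked out with Bayes' theorem. The numbers below (a 1% prior, a 90% true positive rate and a 5% false positive rate) are made up for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) by Bayes' theorem for a binary hypothesis H and evidence E.

    prior: P(H); likelihood: P(E | H); false_positive_rate: P(E | not H).
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
```

With these assumed numbers the posterior is about 0.15: observing the evidence raises the probability of the hypothesis from 1% to roughly 15%, far below what the 90% true positive rate alone might suggest.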

Choice of learning algorithms
The choice depends on the information available:
- Full knowledge of the probability distributions of the data: Bayesian decision theory.
- Form of the probabilities known, parameters unknown: parameter estimation from training data.
- Form of the probabilities unknown, training examples available: discriminative methods, which do not model the input data (as generative methods do) but learn a function predicting the desired output given the input.
- Form of the probabilities unknown, labelled training examples unavailable (only inputs): unsupervised methods, e.g. clustering examples by similarity.
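For the first case, where the distributions are fully known, the minimum-error decision rule of Bayesian decision theory can be written as follows. The slides do not spell this out; the formula assumes a 0-1 loss, with $P(y)$ the class prior and $p(x \mid y)$ the class-conditional density.

```latex
\hat{y}(x) \;=\; \arg\max_{y \in \mathcal{Y}} P(y \mid x)
\;=\; \arg\max_{y \in \mathcal{Y}} \frac{p(x \mid y)\, P(y)}{p(x)}
\;=\; \arg\max_{y \in \mathcal{Y}} p(x \mid y)\, P(y)
```

The denominator $p(x)$ does not depend on $y$ and can be dropped. In the second case the same rule is applied after estimating the parameters of $p(x \mid y)$ and $P(y)$ from training data.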