ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015


http://intelligentoptimization.org/lionbook Roberto Battiti and Mauro Brunato, 2015, all rights reserved. Slides can be used and modified for classroom usage, provided that the attribution (link to book website) is kept.

Chap. 3: Learning requires a method
"Data Mining, noun: 1. Torturing the data until it confesses... and if you torture it long enough, you can get it to confess to anything."

What is learning? Unifying different cases by discovering the underlying explanatory laws. Learning from examples is only a means to reach the real goal: generalization, the capability of explaining new cases

Supervised learning architecture: feature extraction and classification

Performance estimation If the goal is generalization, estimating the performance has to be done with extreme care. Feature extraction produces the labeled examples (x_i, y_i), i = 1, ..., L. The output can be a class (classification) or a number (regression). The output can also be a probability.

Learning from labeled examples: minimization and generalization A flexible model f(x;w) is used, where the flexibility is given by some tunable parameters (or weights) w. The determination of the best parameters is fully automated; this is why the method is called machine learning, after all.

Flexible model with internal parameters (mental images)

Learning from labeled examples: minimization and generalization (2) Fix the free parameters by demanding that the learned model works correctly on the examples in the training set. Power of optimization: we start by defining an error measure to be minimized, then use an automated optimization process to determine the optimal parameters.

Learning from labeled examples: minimization and generalization (3) A suitable error measure is the sum of the errors between the correct answer (given by the example label) and the outcome predicted by the model. If the function is smooth, one can discover points of low altitude by being blindfolded and parachuted to a random initial point, then repeatedly stepping downhill (gradient descent).
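The minimization idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the linear model, the toy data, the learning rate and the number of steps are all invented here, and the squared difference is used as the per-example error.

```python
# Minimal sketch: fit f(x; w) = w0 + w1 * x by gradient descent on the
# sum of squared errors. Data, learning rate and step count are assumptions.

def model(x, w):
    """Flexible model f(x; w) with two tunable parameters."""
    return w[0] + w[1] * x

def sum_squared_error(examples, w):
    """Error measure: sum of squared differences between label and prediction."""
    return sum((y - model(x, w)) ** 2 for x, y in examples)

def gradient(examples, w):
    """Partial derivatives of the sum of squared errors with respect to w0, w1."""
    g0 = sum(-2.0 * (y - model(x, w)) for x, y in examples)
    g1 = sum(-2.0 * (y - model(x, w)) * x for x, y in examples)
    return [g0, g1]

def gradient_descent(examples, w_init, learning_rate=0.01, steps=5000):
    """Start from w_init and repeatedly take a small step downhill."""
    w = list(w_init)
    for _ in range(steps):
        g = gradient(examples, w)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w

# Toy training set generated by y = 1 + 2x (an assumption for the demo)
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w_start = [0.0, 0.0]
w_fit = gradient_descent(examples, w_start)
print("fitted parameters:", w_fit)
print("final error:", sum_squared_error(examples, w_fit))
```

The learning rate has to be small enough for the steps to keep going downhill; too large a value makes the iteration diverge.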

RMS (root mean square) error function Take the individual errors, square them, average them (sum and divide), and take the square root. The square root is optional (optimizing the sum of squares or its square root leads to the same result).
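Written out, and assuming the notation used earlier in the slides (model f(x; w), labeled examples (x_i, y_i), i = 1, ..., L), the RMS error could read as follows. The symbol E_RMS is introduced here only for illustration.

```latex
E_{\mathrm{RMS}}(w) = \sqrt{\frac{1}{L} \sum_{i=1}^{L} \bigl( f(x_i; w) - y_i \bigr)^2 }
\qquad
\arg\min_{w} E_{\mathrm{RMS}}(w) = \arg\min_{w} \sum_{i=1}^{L} \bigl( f(x_i; w) - y_i \bigr)^2
```

The second identity holds because the square root is monotonically increasing and dividing by the constant L does not move the minimum.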

Bias-Variance dilemma Minimization of an error function is a first critical component, but not the only one. We are interested in generalization. Model complexity matters!

Bias-Variance dilemma 1. Models with too few parameters are inaccurate because of a large bias: they lack flexibility. 2. Models with too many parameters are inaccurate because of a large variance: they are too sensitive to the sample details (changes in the details will produce huge variations). 3. Identifying the best model requires identifying the proper model complexity, i.e., the proper architecture and number of parameters.
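A small experiment can make the three points above concrete. The sketch below is an assumption-laden illustration, not the method of the slides: it uses a piecewise-constant model whose number of bins plays the role of the number of parameters, a synthetic noisy sine target, and arbitrary sample sizes.

```python
import math
import random

def target(x):
    """Hypothetical underlying law that the model should discover."""
    return math.sin(2.0 * math.pi * x)

def make_data(n, rng, noise=0.3):
    """Draw n noisy labeled examples with x in [0, 1)."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, target(x) + rng.gauss(0.0, noise)))
    return data

def fit_bins(train, n_bins):
    """One parameter per bin: the mean training label falling in that bin."""
    global_mean = sum(y for _, y in train) / len(train)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in train:
        b = int(x * n_bins)
        sums[b] += y
        counts[b] += 1
    # Bins without training examples fall back to the global mean
    return [s / c if c else global_mean for s, c in zip(sums, counts)]

def mse(data, params):
    """Mean squared error of the piecewise-constant model on data."""
    n_bins = len(params)
    return sum((y - params[int(x * n_bins)]) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(40, rng)
validation = make_data(400, rng)

complexities = [1, 4, 16, 64]   # number of parameters of each model
train_errors = []
validation_errors = []
for n_bins in complexities:
    params = fit_bins(train, n_bins)
    train_errors.append(mse(train, params))
    validation_errors.append(mse(validation, params))

for n_bins, e_train, e_val in zip(complexities, train_errors, validation_errors):
    print(f"{n_bins:3d} parameters: train MSE {e_train:.3f}, validation MSE {e_val:.3f}")
```

The training error can only decrease as bins are added, because each finer partition refines the previous one. The validation error is the quantity to look at when choosing the complexity.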

Learn, validate, test! Careful experimental procedures are needed to measure the effectiveness of the learning process. It is a terrible mistake to measure the performance of a learning system on the same examples used for training. The test set is used only once, for a final measure of performance.
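One possible way to organize the three sets is sketched below. The function name, the 60/20/20 proportions and the toy examples are assumptions made for the illustration.

```python
import random

def split_examples(examples, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint training, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    train = shuffled[n_test + n_validation:]
    return train, validation, test

# Toy labeled examples (x, y); purely illustrative
examples = [(x, 2 * x) for x in range(100)]
train, validation, test = split_examples(examples)
print(len(train), len(validation), len(test))
```

The training set fixes the parameters, the validation set is used to compare models, and the test set is touched only once at the very end.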

K-fold Cross-validation The original sample is randomly partitioned into K subsamples. A single subsample is used as the validation data, and the remaining K - 1 subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The validation results are then averaged.
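The procedure can be sketched as follows. The helper names, the trivial "predict the mean label" model and the toy data are hypothetical and stand in for any real training and error functions.

```python
import random

def k_fold_indices(n_examples, k, seed=0):
    """Randomly partition the indices 0..n_examples-1 into k folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(examples, k, train_fn, error_fn, seed=0):
    """Average validation error over k folds, each used exactly once."""
    folds = k_fold_indices(len(examples), k, seed)
    errors = []
    for held_out in folds:
        held_out_set = set(held_out)
        training = [ex for i, ex in enumerate(examples) if i not in held_out_set]
        validation = [examples[i] for i in held_out]
        fitted = train_fn(training)
        errors.append(error_fn(fitted, validation))
    return sum(errors) / k

# Hypothetical toy model: always predict the mean training label
def train_mean(training):
    return sum(y for _, y in training) / len(training)

def mean_squared_error(prediction, validation):
    return sum((y - prediction) ** 2 for _, y in validation) / len(validation)

examples = [(x, float(x % 2)) for x in range(10)]
folds = k_fold_indices(len(examples), k=5)
cv_error = cross_validate(examples, 5, train_mean, mean_squared_error)
print("folds:", folds)
print("cross-validated error:", cv_error)
```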

K-fold Cross-validation Advanced: stratification helps in classification (subdivide each class separately, so that every fold contains examples of each class).
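A minimal sketch of stratified folds, assuming that "subdivide each class" means dealing the examples of every class evenly over the folds. The function name and the toy labels are invented for the illustration.

```python
from collections import defaultdict
import random

def stratified_folds(labels, k, seed=0):
    """Build k folds of example indices, splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the members of this class round-robin over the folds
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds

# Unbalanced toy problem: 10 positive and 40 negative examples
labels = ["pos"] * 10 + ["neg"] * 40
folds = stratified_folds(labels, k=5)
for fold in folds:
    n_pos = sum(1 for i in fold if labels[i] == "pos")
    print(f"fold size {len(fold)}, positives {n_pos}")
```

With a purely random partition, a fold could end up with very few positive examples; here every fold keeps the 1:4 class proportion of the whole sample.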

Errors of different kinds Accuracy is defined as the fraction of correct answers over the total. Precision is the fraction of true positives over the number of retrieved cases (those predicted as positive). Recall is the fraction of true positives over the number of relevant cases (those that are actually positive).
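For a two-class problem the three measures can be computed from the counts of true/false positives and negatives. The sketch below uses invented labels; returning 0.0 when a denominator is zero is a convention assumed here, not something stated in the slides.

```python
def confusion_counts(true_labels, predicted_labels, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Correct answers over the total."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    """True positives over retrieved (predicted positive) cases."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """True positives over relevant (actually positive) cases."""
    return tp / (tp + fn) if tp + fn else 0.0

true_labels =      [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(true_labels, predicted_labels)
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
# prints 0.8 0.75 0.75
```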

Gist The goal of machine learning is to use a set of training examples to realize a system which will correctly generalize to new cases, in the same context but not seen during learning. ML learns, i.e., determines appropriate values for the free parameters of a flexible model, by automatically minimizing a measure of the error on the example set, possibly corrected to discourage complex models, and therefore improving the chances of correct generalization. The output value of the system can be a class (classification), or a number (regression). In some cases having as output the probability for a class increases flexibility of usage.

Gist Accurate classifiers can be built without any knowledge elicitation phase, just starting from an abundant and representative set of example data. This is a dramatic paradigm change. ML is very powerful but requires a strict method (a kind of pedagogy of ML). For sure, never estimate performance on the training set: this is a mortal sin. Be aware that re-using validation data will create optimistic estimates. If examples are scarce, use cross-validation to show off that you are an expert ML user. To be on the safe side, set aside some test examples and use them only once at the end to estimate performance. There is no single way to measure the performance of a model: different kinds of mistakes can have very different costs. Accuracy, precision and recall are some possibilities; a confusion matrix gives the complete picture when there are more classes.
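A confusion matrix for more than two classes can be tabulated as in the sketch below. The class names and labels are made up, and the layout (rows for true classes, columns for predicted classes) is one common convention assumed here.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(truth, guess)] for guess in classes] for truth in classes]

classes = ["cat", "dog", "bird"]
true_labels =      ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted_labels = ["cat", "dog", "dog", "dog", "bird", "cat"]
matrix = confusion_matrix(true_labels, predicted_labels, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>5}: {row}")
# prints:
#   cat: [1, 1, 0]
#   dog: [0, 2, 0]
#  bird: [1, 0, 1]
```

The diagonal holds the correct answers; each off-diagonal entry shows which kind of mistake was made, so different costs can be attached to different cells.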