Machine learning theory

Machine learning theory: Introduction
Hamid Beigy, Sharif University of Technology
February 27, 2017

Table of contents
1. Introduction
2. Supervised learning
3. Reinforcement learning
4. Unsupervised learning
5. Machine learning theory
6. Outline of course
7. References

What is machine learning?
Definition (Mohri et al., 2012): computational methods that use experience to improve performance or to make accurate predictions.
Definition (Mitchell, 1997): a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Example (spam classification). Task: determine whether emails are spam or non-spam. Experience: incoming emails with human classification. Performance measure: percentage of correct decisions.

Why do we need machine learning?
We need machine learning because
1. Some tasks are too complex to program: tasks performed by animals and humans, such as driving, speech recognition, and image understanding; and tasks beyond human capabilities, such as weather prediction, analysis of genomic data, and web search.
2. Some tasks need adaptivity. Once a program has been written, it stays unchanged, but in tasks such as optical character recognition and speech recognition we need the behavior to adapt as new data arrives.

Types of machine learning
Based on the information provided to the learner, machine learning algorithms can be classified into three main groups.
1. Supervised/predictive learning: the goal is to learn a mapping from inputs x to outputs t given the labeled set S = {(x_1, t_1), (x_2, t_2), ..., (x_m, t_m)}. Each x_k is called a feature vector. When t_i ∈ {0, 1}, the learning problem is called classification; when t_i ∈ R, it is called regression.
2. Unsupervised/descriptive learning: the goal is to find interesting patterns in the data S = {x_1, x_2, ..., x_m}. Unsupervised learning is arguably more typical of human and animal learning.
3. Reinforcement learning: learning by interacting with an environment. A reinforcement learning agent learns from the consequences of its actions.

Applications of machine learning
1. Supervised learning.
   Classification: document classification and spam filtering; image classification and handwriting recognition; face detection and recognition.
   Regression: predicting stock market prices; predicting the temperature of a location; predicting the amount of PSA.
2. Unsupervised/descriptive learning: discovering clusters; discovering latent factors; discovering graph structures (correlations of variables); matrix completion (filling in missing values); collaborative filtering; market-basket analysis (frequent item-set mining).
3. Reinforcement learning: game playing; robot navigation.

The need for probability theory
A key concept in machine learning is uncertainty. Data comes from a process that is not completely known, and we express this lack of knowledge by modeling the process as a random process. The process may actually be deterministic, but because we do not have access to complete knowledge about it, we model it as random and use probability theory to analyze it.

Supervised learning
In supervised learning, the goal is to find a mapping from inputs X to outputs t given a labeled set of input-output pairs S = {(x_1, t_1), (x_2, t_2), ..., (x_m, t_m)}; S is called the training set.
In the simplest setting, each training input x is a D-dimensional vector of numbers. Each component of x is called a feature, attribute, or variable, and x is called a feature vector.
In general, x could be a complex structured object, such as an image, a sentence, an email message, a time series, a molecular shape, or a graph.
When t_i ∈ {−1, +1} or t_i ∈ {0, 1}, the problem is known as classification; when t_i ∈ R, it is known as regression.
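
As a concrete illustration (a minimal sketch, not from the slides), a training set S with numeric feature vectors can be stored as an m × D matrix together with a target vector; all values below are hypothetical.

    import numpy as np

    # Training set S = {(x_1, t_1), ..., (x_m, t_m)} with m = 4 examples,
    # each a D = 2 dimensional feature vector; X[k] is x_k.
    X = np.array([[1.0, 2.0],
                  [0.5, 1.5],
                  [3.0, 0.2],
                  [2.2, 2.8]])

    t_class = np.array([0, 0, 1, 1])          # classification: t_i in {0, 1}
    t_reg = np.array([0.7, -1.2, 3.4, 0.1])   # regression: t_i in R

    print(X.shape)  # (m, D) = (4, 2)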

Classification
The learning algorithm should find a particular hypothesis h ∈ H that approximates C as closely as possible. We choose H, and the aim is to find an h ∈ H that is similar to C. This reduces the problem of learning the class to the easier problem of finding the parameters that define h.
A hypothesis h makes a prediction for an instance x in the following way:
h(x) = 1 if h classifies x as a positive example, and h(x) = 0 if h classifies x as a negative example.

Classification (cont.)
In real life we do not know c(x), and hence we cannot evaluate how well h(x) matches c(x). We use a small subset of all possible values of x, the training set, as a representation of the concept.
The empirical error (risk), or training error, is the proportion of training instances on which h disagrees with c:
R̂(h) = (1/m) Σ_{i=1}^{m} I[h(x_i) ≠ c(x_i)].
When R̂(h) = 0, h is called a hypothesis consistent with the dataset S.
In many cases we can find infinitely many h such that R̂(h) = 0. But which of them is better for predicting future examples? This is the problem of generalization: how well our hypothesis will classify future examples that are not part of the training set.
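
A minimal sketch of computing the empirical risk for a toy one-dimensional problem; the concept c, the threshold hypothesis h, and the data distribution are all assumptions made for the demo.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 200
    x = rng.uniform(0.0, 1.0, size=m)   # training inputs

    def c(x):
        # Hypothetical target concept: positive iff x >= 0.4.
        return (x >= 0.4).astype(int)

    def h(x):
        # A candidate hypothesis from a class of threshold classifiers.
        return (x >= 0.5).astype(int)

    # Empirical risk: the fraction of training points where h disagrees with c.
    R_hat = np.mean(h(x) != c(x))
    print(f"empirical risk R_hat(h) = {R_hat:.3f}")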

Classification (generalization)
The generalization capability of a hypothesis is usually measured by the true error/risk:
R(h) = P_{x∼D}[h(x) ≠ c(x)].
We assume that H includes C, that is, there exists h ∈ H such that R̂(h) = 0. Given a hypothesis class H, it may be the case that we cannot learn C; that is, there is no h ∈ H for which R̂(h) = 0. Thus, in any application, we need to make sure that H is flexible enough, i.e., has enough capacity, to learn C.
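
The true risk is unavailable in real applications, but in a simulation where we control the data-generating distribution D we can approximate it with a large fresh sample; a sketch reusing the hypothetical c and h from above:

    import numpy as np

    rng = np.random.default_rng(1)

    def c(x):                            # hypothetical target concept
        return (x >= 0.4).astype(int)

    def h(x):                            # hypothetical learned hypothesis
        return (x >= 0.5).astype(int)

    # Monte Carlo estimate of R(h) = P[h(x) != c(x)] under D = Uniform(0, 1).
    # h and c disagree exactly on [0.4, 0.5), so the estimate should be near 0.1.
    x_fresh = rng.uniform(0.0, 1.0, size=1_000_000)
    print(f"estimated true risk R(h) = {np.mean(h(x_fresh) != c(x_fresh)):.4f}")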

Regression
In regression, c(x) is a continuous function, hence the training set is of the form S = {(x_1, t_1), (x_2, t_2), ..., (x_m, t_m)} with t_k ∈ R.
In regression, there is noise added to the output of the unknown function:
t_k = f(x_k) + ϵ,  k = 1, 2, ..., m,
where f(x_k) ∈ R is the unknown function and ϵ is the random noise. The explanation for the noise is that there are extra hidden variables z_k that we cannot observe:
t_k = f(x_k, z_k) + ϵ,  k = 1, 2, ..., N.
Our goal is to approximate the output by a function g(x). The empirical error on the training set S is
R̂(g) = (1/m) Σ_{k=1}^{m} [t_k − g(x_k)]².
The aim is to find a g(·) that minimizes the empirical error. We assume a hypothesis class for g(·) with a small set of parameters.
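
A minimal regression sketch: noisy targets t_k = f(x_k) + ϵ are generated from a hypothetical f, and g is picked from the class of degree-3 polynomials by minimizing the empirical squared error (ordinary least squares).

    import numpy as np

    rng = np.random.default_rng(0)
    m = 50
    x = rng.uniform(-1.0, 1.0, size=m)

    def f(x):
        # Unknown true function, assumed for the demo.
        return np.sin(np.pi * x)

    t = f(x) + rng.normal(scale=0.1, size=m)   # t_k = f(x_k) + noise

    # Hypothesis class: polynomials of degree 3. np.polyfit returns the
    # coefficients minimizing the sum of squared errors, i.e. it minimizes
    # the empirical risk over this class.
    g = np.poly1d(np.polyfit(x, t, deg=3))

    R_hat = np.mean((t - g(x)) ** 2)           # empirical squared error
    print(f"empirical risk of g: {R_hat:.4f}")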

Introduction
Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a scalar reward/reinforcement signal. The learner is not told which actions to take, as in supervised learning, but must discover which actions yield the most reward by trying them.
Trial-and-error search and delayed reward are the two most important features of reinforcement learning.
Reinforcement learning is defined not by characterizing learning algorithms but by characterizing a learning problem: any algorithm that is well suited to solving that problem is considered a reinforcement learning algorithm.
One of the challenges that arises in reinforcement learning, as in other kinds of learning, is the tradeoff between exploration and exploitation.
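
The exploration-exploitation tradeoff can be made concrete with a two-armed bandit and an ε-greedy agent; the reward probabilities below are assumptions for the demo, not from the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    p_reward = np.array([0.3, 0.7])   # assumed success probability of each arm
    eps = 0.1                         # exploration rate
    counts = np.zeros(2)              # pulls per arm
    values = np.zeros(2)              # running mean reward per arm

    for _ in range(10_000):
        if rng.random() < eps:                  # explore: pick a random arm
            a = int(rng.integers(2))
        else:                                   # exploit: best arm so far
            a = int(np.argmax(values))
        r = float(rng.random() < p_reward[a])   # scalar reward from environment
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update

    print("estimated arm values:", values)  # should approach [0.3, 0.7]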

Reinforcement learning
A key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment.
[Figure: representation of the general reinforcement learning scenario; the agent sends an action to the environment, and the environment returns a state and a reward to the agent.]

Introduction
Examples of unsupervised learning tasks:
1. Clustering: find natural groupings in the data.
2. Dimensionality reduction: find projections that carry important information.
3. Compression: represent the data using fewer bits.
Unsupervised learning is like supervised learning with missing outputs (or with missing inputs).

Clustering
Clustering is fundamentally problematic and subjective. Given data X = {x_1, x_2, ..., x_m}, the goal is to learn to understand the data by re-representing it in some intelligent way.
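
As an illustration, a minimal k-means sketch that re-represents the data by cluster assignments; the synthetic two-blob data and the choice k = 2 are assumptions for the demo.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic data: two Gaussian blobs in R^2.
    X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(100, 2)),
                   rng.normal([3.0, 3.0], 0.5, size=(100, 2))])

    k = 2
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(20):
        # Assignment step: each point goes to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its cluster.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    print("cluster centers:\n", centers)  # near [0, 0] and [3, 3]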

What is machine learning theory?
1. What are the intrinsic properties of a given learning problem that make it hard or easy to solve?
2. How much do we need to know ahead of time about what is being learned in order to learn it effectively?
3. Why are simpler hypotheses better?
4. How do we formalize machine learning problems (e.g., online, statistical)?
5. How do we pick the right model to use, and what are the tradeoffs between various models?
6. How many instances do we need to see to learn to a given accuracy?
7. How do we design learning algorithms with provable guarantees on performance?

Example
1. Suppose that you have a coin with an unknown probability θ of coming up heads.
2. We must determine this probability as accurately as possible by experimentation.
3. The experiment is to toss the coin repeatedly. Denote the two possible outcomes of a single toss by 1 (heads) and 0 (tails).
4. If we toss the coin m times, we can record the outcomes as x_1, ..., x_m, where each x_i ∈ {0, 1} and P[x_i = 1] = θ, independently of all other x_i's.
5. What would be a reasonable estimate of θ? By the law of large numbers, in a long sequence of independent coin tosses the relative frequency of heads eventually approaches the true value of θ with high probability. Hence,
θ̂ = (1/m) Σ_i x_i.
6. Using the Chernoff bound, we have
P[|θ̂ − θ| > ϵ] ≤ 2e^{−2ϵ²m}.
7. Equivalently, m ≥ (1/(2ϵ²)) log(2/δ), where 1 − δ specifies the confidence of the estimation.
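
A quick simulation of the coin example: with ϵ = 0.05 and δ = 0.05 the bound gives m ≥ (1/(2ϵ²)) log(2/δ) ≈ 738 tosses, and the sketch below checks empirically that P[|θ̂ − θ| > ϵ] indeed stays below δ (the value of θ is an assumption made for the simulation).

    import numpy as np

    rng = np.random.default_rng(0)
    theta, eps, delta = 0.6, 0.05, 0.05   # theta is unknown in reality

    m = int(np.ceil(np.log(2 / delta) / (2 * eps**2)))
    print(f"sample size from the bound: m = {m}")   # 738

    # Repeat the whole experiment many times and count how often the
    # estimate theta_hat misses theta by more than eps.
    trials = 10_000
    tosses = rng.random((trials, m)) < theta   # m Bernoulli(theta) tosses per row
    theta_hat = tosses.mean(axis=1)
    failure_rate = np.mean(np.abs(theta_hat - theta) > eps)
    print(f"P[|theta_hat - theta| > eps] = {failure_rate:.4f} (bound: {delta})")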

Machine learning theory
There are two basic questions:
1. How large a sample do we need to achieve a given accuracy with a given confidence?
2. How efficient can our learning algorithm be?
The first question belongs to statistical learning theory and the second to computational learning theory, although there is some overlap between the two fields.

Outline of course I
1. Introduction
2. Part 1 (theoretical foundations)
   1. Consistency and the PAC model
   2. Learning by uniform convergence
   3. Empirical and structural risk minimization
   4. Growth functions, VC-dimension, covering numbers, ...
   5. Learning by non-uniform convergence and MDL
   6. Generalization bounds
   7. Regularization and stability of algorithms
   8. Analysis of kernel learning
   9. Computational complexity and running time of learning algorithms
   10. The PAC-MDP model for reinforcement learning
   11. Theoretical foundations of clustering
3. Part 2 (analysis of algorithms)
   1. Linear classification
   2. Boosting
   3. SVM and kernel-based learning
   4. Regression
   5. Learning automata
   6. Reinforcement learning

Outline of course II
   7. Ranking
   8. Online learning
   9. Active learning
   10. Semi-supervised learning
4. Part 3 (advanced topics)
   1. Rademacher complexity
   2. PAC-Bayes theory
   3. Universal learning
   4. Advanced topics

Course evaluation
Evaluation:
- Mid-term exam: 30% (1396/2/4)
- Final exam: 20%
- Take-home exam: 25%
- Homeworks: 15%
- Project: 15% (explore a theoretical or empirical question and present it)
A sum of at least 7.2 on all exams is required for passing.
Course page: http://ce.sharif.edu/courses/95-96/2/ce956-1/
Lectures: in general on the board; occasionally slides will be used.
TAs:

Main references

References I
Anthony, M., and Bartlett, P. L. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
Anthony, M., and Biggs, N. Computational Learning Theory: An Introduction. Cambridge University Press, 1992.
Devroye, L., Györfi, L., and Lugosi, G. A Probabilistic Theory of Pattern Recognition. Springer, 1996.
Kearns, M. J., and Vazirani, U. An Introduction to Computational Learning Theory. MIT Press, 1994.
Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of Machine Learning. MIT Press, 2012.

References II
Shalev-Shwartz, S., and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

Relevant journals
1. IEEE Transactions on Pattern Analysis and Machine Intelligence
2. Journal of Machine Learning Research
3. Pattern Recognition
4. Machine Learning
5. Neural Networks
6. Neural Computation
7. Neurocomputing
8. IEEE Transactions on Neural Networks and Learning Systems
9. Annals of Statistics
10. Journal of the American Statistical Association
11. Pattern Recognition Letters
12. Artificial Intelligence
13. Data Mining and Knowledge Discovery
14. IEEE Transactions on Cybernetics (formerly SMC-B)
15. IEEE Transactions on Knowledge and Data Engineering
16. Knowledge and Information Systems

Relevant conferences
1. Neural Information Processing Systems (NIPS)
2. International Conference on Machine Learning (ICML)
3. European Conference on Machine Learning (ECML)
4. Asian Conference on Machine Learning (ACML)
5. Conference on Learning Theory (COLT)
6. Algorithmic Learning Theory (ALT)
7. Conference on Uncertainty in Artificial Intelligence (UAI)
8. Practice of Knowledge Discovery in Databases (PKDD)
9. International Joint Conference on Artificial Intelligence (IJCAI)
10. IEEE International Conference on Data Mining (ICDM)

Relevant packages and datasets
1. Packages:
   - R: http://www.r-project.org/
   - Weka: http://www.cs.waikato.ac.nz/ml/weka/
   - RapidMiner: http://rapidminer.com/
   - MOA: http://moa.cs.waikato.ac.nz/
2. Datasets:
   - UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
   - StatLib: http://lib.stat.cmu.edu/datasets/
   - Delve: http://www.cs.toronto.edu/~delve/data/datasets.html