Programming Social Robots for Human Interaction. Lecture 4: Machine Learning and Pattern Recognition


Zheng-Hua Tan
Dept. of Electronic Systems, Aalborg Univ., Denmark
zt@es.aau.dk, http://kom.aau.dk/~zt

Many of the figures are provided by Chris Bishop.

Course outline
1. Introduction to Robot Operating System (ROS)
2. Introduction to iSocioBot and NAO robot, and demos
3. Social Robots and Applications
4. Machine Learning and Pattern Recognition
5. Speech Processing I: Acquisition of Speech, Feature Extraction and Speaker Localization
6. Speech Processing II: Speaker Identification and Speech Recognition
7. Image Processing I: Image Acquisition, Pre-processing and Feature Extraction
8. Image Processing II: Face Detection and Face Recognition
9. User Modelling
10. Multimodal Human-Robot Interaction

Classification examples
Handwritten digit recognition. Speech recognition: "It's not easy to recognize speech" vs. "It's not easy to wreck a nice beach" (two acoustically near-identical sentences).

Regression example
Polynomial curve fitting (from Bishop).

Density estimation examples (from Bishop).

General information
References:
- Pattern Recognition and Machine Learning by Bishop
- Introduction to Machine Learning by Alpaydin
For more resources, refer to http://kom.aau.dk/~zt/cources/machine_learning/Machine_learning_resources.htm

Lecture outline
- Introduction
- Machine learning: concepts, supervised learning, unsupervised learning
- Memory-based learning
- Model-based learning

What is machine learning?
Machine: a computing device. Learning is acquiring and improving performance through experience; it is the acquisition and development of memories and behaviors, including skills, knowledge, understanding, values, and wisdom. Machine learning is programming computers to optimize a performance criterion using example data or past experience; it is concerned with the design and development of algorithms and techniques that allow computers to learn from examples or experience.

Machine learning's WHs
When: we want computers to learn when it is too difficult or too expensive to program them directly to perform a task (e.g., spam filtering), when
- human expertise does not exist (e.g., navigating on Mars),
- humans are unable to explain their expertise (e.g., speech recognition), or
- the solution changes over time (e.g., routing on a computer network).
What: get the computer to learn density, discriminant, or regression functions by showing it examples of inputs (and outputs).
How: we write a parameterized program and let the learning algorithm find the set of parameters that best approximates the desired function or behavior.

Why study machine learning?
Build intelligent computer systems that
- acquire or improve knowledge from examples,
- adapt to users, customize, and are context-aware, and
- discover patterns in large databases (data mining).
The timing is good:
- ubiquitous computing: computers are cheap, powerful, and everywhere;
- progress in algorithm and theory development;
- abundant data.
Study is needed to develop new algorithms and to understand which algorithms should be applied in which circumstances, primarily aiming at good generalization performance on unseen test data.

Related subjects and applications
Statistics: statistical estimation targets the same problem as machine learning, and most learning algorithms are statistical in nature.
Pattern recognition is machine learning where the output of the learning machine is a set of discrete categories.
Data mining is machine learning applied to large databases.
Applications: speech recognition, handwriting recognition, bioinformatics, adaptive control, natural language processing, web search and text classification, fraud detection, time-series prediction, etc.

Types of machine learning
Supervised learning: given inputs along with corresponding outputs, find the correct outputs for test inputs.
- Classification: 1-of-N discrete output (pattern recognition).
- Regression: real-valued output (prediction).
Unsupervised learning: given only inputs as training data, find structure in the input space:
- density estimation
- clustering
- dimensionality reduction
Reinforcement learning: given inputs from the environment, take actions that affect the environment, and produce action sequences that maximize the expected scalar reward (or minimize punishment). This is similar to animal learning.

Supervised learning: {input, output}
Classification: assign each input to one of a finite number of discrete categories, i.e., learn a decision boundary that separates one class from the other. This can be split into two separate stages (a code sketch follows below):
- Inference stage: use the training data to learn a model for p(C_k|x), either a probabilistic generative or a discriminative model.
- Decision stage: use these posterior probabilities to make optimal class assignments.
Alternatively, we can solve both problems together and simply learn a discriminant function that maps inputs x directly to decisions.
Regression: the desired output consists of one or more continuous variables; learn a continuous input-output mapping from a limited number of examples. Regression is also known as curve fitting or function approximation.

Supervised learning raises two questions:
- How to represent the inputs and outputs?
- How to select a hypothesis space that is both powerful and searchable to represent the relationship between inputs and outputs?
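A minimal sketch of the two-stage view, assuming scikit-learn is available; the toy data, the choice of logistic regression as the discriminative model, and all variable names are illustrative, not from the lecture.

```python
# Two-stage classification: learn p(C_k|x), then decide.
# A minimal sketch with made-up 1-D toy data; assumes scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.array([[0.1], [0.4], [0.5], [0.9], [1.2], [1.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Inference stage: fit a discriminative model of the posteriors p(C_k|x).
model = LogisticRegression().fit(X_train, y_train)

X_test = np.array([[0.3], [1.1]])
posteriors = model.predict_proba(X_test)    # one row of p(C_k|x) per input

# Decision stage: assign each input to the most probable class.
decisions = posteriors.argmax(axis=1)
print(posteriors, decisions)
```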

Unsupervised learning: {input}
Discover the unknown structure of the inputs:
- Density estimation: determine the probability density distribution of the data within the input space, e.g. k-NN, histogram, or kernel methods.
- Clustering: discover groups of similar examples (clumps) within the data, e.g. k-means, EM; a k-means sketch follows after the outline below.
- Dimensionality reduction: project the data from a high-dimensional space down to low dimensions.
- Compression/quantization: discover a function that, for each input, computes a compact code from which the input can be reconstructed (a form of clustering).
- Association: e.g., in retail, from customer transactions to consumer behavior: people who bought A also bought B.

Lecture outline
- Introduction
- Machine learning
- Memory-based learning
- Model-based learning
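Here is the k-means sketch referenced above, in plain NumPy; the synthetic two-clump data, k = 2, and the iteration count are arbitrary illustrative choices.

```python
# A compact k-means sketch: alternate assignment and update steps.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),     # two synthetic clumps
               rng.normal(3, 0.5, (50, 2))])

k = 2
centers = X[rng.choice(len(X), k, replace=False)]  # random initial centers

for _ in range(20):
    # Assignment step: each point goes to its nearest center.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each center moves to the mean of its assigned points.
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)
```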

Learning is more than memorization
Constructing a lookup table is easy: simply store all the inputs and their corresponding outputs from the training data; for a new input, compare it to all the stored samples and produce the output associated with the matching prototype.
Problem: in general, new inputs differ from the training prototypes. The key to learning is generalization: the ability to produce correct outputs or behavior on previously unseen inputs.

Memory-based learning: a simple trick
Compute the distances between the input and all the stored prototypes, instead of requiring identity:
- 1-nearest-neighbor search: choose the class of the nearest prototype.
- K-nearest-neighbor search: choose the class that has the majority among the K nearest prototypes.
This is so-called lazy learning, memory-based learning, or instance-based learning, and is similar to case-based reasoning; a sketch follows below.
Challenges:
- What is the right similarity measure?
- High computational cost for a large number of prototypes.
- The curse of dimensionality and data sparsity.
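And the k-nearest-neighbor sketch referenced above; Euclidean distance as the similarity measure, the toy prototypes, and k = 3 are illustrative assumptions.

```python
# Memory-based learning in a few lines: store all prototypes,
# compare at query time, and take a majority vote among the k nearest.
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every prototype
    nearest = np.argsort(dists)[:k]               # indices of the k nearest
    votes = y_train[nearest]
    return np.bincount(votes).argmax()            # majority class among them

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=3))  # -> 1
```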

Lecture outline
- Introduction
- Machine learning
- Memory-based learning
- Model-based learning: over-fitting, bias-variance trade-off

Model-based learning
Build a model that is a good, useful approximation to the data, or construct a general, explicit description of the target function:
- linear vs. nonlinear
- parametric vs. nonparametric
Training examples can be discarded once they are processed, which is computationally efficient and efficient in memory use. The approach is limited by the chosen learning bias: the model is a coarse approximation of the target function.

Linear classifier, two classes (from Alpaydin):
$g(\mathbf{x}) = \mathbf{w}^{\mathrm{T}}\mathbf{x} + w_0$, and we choose $C_1$ if $g(\mathbf{x}) > 0$, $C_2$ otherwise.

Regression: polynomial curve fitting, again! (from Bishop)
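Picking up the discriminant from the classifier slide, here it is written out directly; the weights, bias, and test point are made-up numbers.

```python
# The two-class linear discriminant g(x) = w^T x + w0, applied directly.
import numpy as np

w = np.array([1.0, -2.0])   # weight vector (made-up)
w0 = 0.5                    # bias term (made-up)

def classify(x):
    g = w @ x + w0          # g(x) = w^T x + w0
    return "C1" if g > 0 else "C2"

print(classify(np.array([2.0, 0.5])))   # g = 1.5 > 0 -> C1
```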

Sum-of-squares error function (from Bishop). For a polynomial model $y(x, \mathbf{w}) = \sum_j w_j x^j$, fitting minimizes
$E(\mathbf{w}) = \tfrac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2$
over the coefficients $\mathbf{w}$.

Polynomial model selection (fits of various orders, from Bishop): a high-order polynomial becomes excessively tuned to the random noise!
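A sketch of this fitting-and-model-selection experiment in NumPy; the noisy-sinusoid data mirrors Bishop's running example, and the orders tried are illustrative. It also computes the RMS errors discussed on the next slide.

```python
# Least-squares polynomial fits of increasing order on noisy sinusoid data:
# training error keeps falling, but a high order fits the noise.
import numpy as np

rng = np.random.default_rng(1)
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, N)   # noisy targets

x_test = np.linspace(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test)                 # the underlying function

for M in (1, 3, 9):                         # polynomial order
    w = np.polyfit(x, t, M)                 # least-squares (sum-of-squares) fit
    rms_train = np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))
    rms_test = np.sqrt(np.mean((np.polyval(w, x_test) - t_test) ** 2))
    print(M, rms_train, rms_test)           # M=9 fits the training noise
```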

Over-fitting (from Bishop): training error keeps falling with model order while test error rises, hence the need for a separate validation (or hold-out) set for model selection. The errors are compared using the root-mean-square (RMS) error
$E_{\mathrm{RMS}} = \sqrt{2E(\mathbf{w}^\star)/N}$.

Polynomial coefficients for various orders (table from Bishop): the coefficient magnitudes blow up as the order increases.

Regularization: penalize large coefficient values by minimizing
$\tilde{E}(\mathbf{w}) = \tfrac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2 + \tfrac{\lambda}{2}\lVert\mathbf{w}\rVert^2$
(from Bishop).

Regularization: fits for two values of the regularization parameter λ (figures from Bishop).
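A sketch of the penalized fit; solving the regularized normal equations is one standard way to implement the quadratic penalty above (the data and λ values are illustrative).

```python
# Regularized (ridge) polynomial fitting via the normal equations:
# minimizing the penalized sum-of-squares gives
# w = (lambda*I + Phi^T Phi)^{-1} Phi^T t.
import numpy as np

rng = np.random.default_rng(1)
N, M = 10, 9
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, N)

Phi = np.vander(x, M + 1)                     # design matrix of powers of x
for lam in (0.0, 1e-3):
    w = np.linalg.solve(lam * np.eye(M + 1) + Phi.T @ Phi, Phi.T @ t)
    print(lam, np.abs(w).max())               # the penalty shrinks coefficients
```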

Polynomial coefficients for various λ (9th-order polynomial, table from Bishop).

Data set size (9th-order polynomial, N = 10, from Bishop): for a given model complexity, the over-fitting problem becomes less severe as the size of the data set increases.

Number of data sets: with 100 data sets, training multiple polynomials and then averaging them lets the contributions from the variance term tend to cancel, leading to improved predictions. (Figure from Bishop: the dependence of bias and variance on model complexity.)

Bias-variance trade-off
There is a trade-off between bias and variance: very flexible models have low bias and high variance, while relatively rigid models have high bias and low variance. The mean square error of an estimator $d$ for an unknown parameter $\theta$ decomposes as
$r(d, \theta) = E[(d - \theta)^2] = (E[d] - \theta)^2 + E[(d - E[d])^2] = \mathrm{Bias}^2 + \mathrm{Variance}.$
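The decomposition follows by adding and subtracting $E[d]$ inside the square; the cross term vanishes because $E[d - E[d]] = 0$:

```latex
% One-line derivation of the bias-variance decomposition above.
\begin{align*}
E\!\left[(d-\theta)^2\right]
  &= E\!\left[\big((d - E[d]) + (E[d] - \theta)\big)^2\right] \\
  &= E\!\left[(d - E[d])^2\right]
     + 2\,(E[d]-\theta)\,\underbrace{E\!\left[d - E[d]\right]}_{=0}
     + (E[d]-\theta)^2 \\
  &= \underbrace{(E[d]-\theta)^2}_{\text{Bias}^2}
     + \underbrace{E\!\left[(d-E[d])^2\right]}_{\text{Variance}}.
\end{align*}
```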

Beating the bias-variance trade-off
We can reduce the variance by averaging lots of models trained on different datasets. This seems silly: if we had lots of different datasets, it would be better to combine them into one big training set (with more training data there will be much less variance). Weird idea: we can create different datasets by bootstrap sampling of our single training dataset. This is called bagging, and it works surprisingly well; a sketch follows below. But if we have enough computation, it's better to do the right Bayesian thing: combine the predictions of many models using the posterior probability of each parameter vector as the combination weight.

Over-fitting: a still unsolved problem! (from Bishop)
The least-squares approach to finding the model parameters resorts to intuition and represents a specific case of maximum likelihood, and the over-fitting problem can be understood as a general property of maximum likelihood. A more principled approach is probability theory, the foundation for machine learning.
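The bagging sketch promised above, in NumPy: the same model class is fit on bootstrap resamples of a single training set and the predictions are averaged. Cubic fits, the data sizes, and the replicate count are arbitrary illustrative choices.

```python
# Bagging in miniature: fit the same model on bootstrap resamples of one
# training set, then average the predictions to reduce variance.
import numpy as np

rng = np.random.default_rng(2)
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, N)   # one noisy training set
x_test = np.linspace(0, 1, 100)

preds = []
for _ in range(50):                      # 50 bootstrap replicates
    idx = rng.integers(0, N, N)          # sample N points with replacement
    w = np.polyfit(x[idx], t[idx], 3)    # cubic fit on the resample
    preds.append(np.polyval(w, x_test))

bagged = np.mean(preds, axis=0)          # averaged (bagged) prediction
rms = np.sqrt(np.mean((bagged - np.sin(2 * np.pi * x_test)) ** 2))
print(rms)
```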

Lecture outline
- Introduction
- Machine learning: concepts, supervised learning, unsupervised learning
- Memory-based learning
- Model-based learning: over-fitting, bias-variance trade-off

Course outline
1. Introduction to Robot Operating System (ROS)
2. Introduction to iSocioBot and NAO robot, and demos
3. Social Robots and Applications
4. Machine Learning and Pattern Recognition
5. Speech Processing I: Acquisition of Speech, Feature Extraction and Speaker Localization
6. Speech Processing II: Speaker Identification and Speech Recognition
7. Image Processing I: Image Acquisition, Pre-processing and Feature Extraction
8. Image Processing II: Face Detection and Face Recognition
9. User Modelling
10. Multimodal Human-Robot Interaction