Machine Learning: Algorithms and Applications


Machine Learning: Algorithms and Applications
Floriano Zini, Free University of Bozen-Bolzano, Faculty of Computer Science, Academic Year 2011-2012
Lecture 11 (21 May 2012): Unsupervised Learning (cont.)
Slides courtesy of Bing Liu: www.cs.uic.edu/~liub/webminingbook.html

Road map
- Basic concepts
- K-means algorithm
- Representation of clusters
- Hierarchical clustering
- Distance functions
- Data standardization
- Handling mixed attributes
- Which clustering algorithm to use?
- Cluster evaluation
- Summary

Mixed attributes
The distance functions we have seen so far are for data whose attributes are all numeric, or all nominal, etc. In many practical cases, data has attributes of different types, drawn from the following six:
- interval-scaled
- ratio-scaled
- symmetric binary
- asymmetric binary
- nominal
- ordinal
Clustering a data set involving mixed attributes is a challenging problem.

Convert to a single type
One common way of dealing with mixed attributes is to:
1. Choose a dominant attribute type
2. Convert the other types to this type
E.g., if most attributes in a data set are interval-scaled, we convert ordinal and ratio-scaled attributes to interval-scaled attributes; it is also appropriate to treat symmetric binary attributes as interval-scaled.

Convert to a single type (cont.)
It does not make much sense to convert a nominal attribute or an asymmetric binary attribute to an interval-scaled attribute, but this is frequently done in practice by assigning numbers to the values according to some hidden ordering, e.g., the prices of the fruits they denote. Alternatively, a nominal attribute can be converted to a set of (symmetric) binary attributes, which are then treated as numeric attributes.
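The nominal-to-binary conversion described above is what is usually called one-hot encoding. Below is a minimal sketch in plain Python; the function name and the "fruit" attribute are illustrative choices, not from the lecture.

```python
def one_hot(values):
    """Convert a list of nominal values into a set of symmetric binary attributes.

    Each distinct value becomes one 0/1 column, which can then be
    treated as a numeric attribute.
    """
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

# Hypothetical example: a nominal attribute "fruit"
rows, cols = one_hot(["apple", "banana", "apple", "cherry"])
print(cols)   # ['apple', 'banana', 'cherry']
print(rows)   # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```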

Combining individual distances
This approach computes individual attribute distances and then combines them. A combination formula, proposed by Gower, is

$$\mathrm{dist}(x_i, x_j) = \frac{\sum_{f=1}^{r} \delta_{ij}^{f}\, d_{ij}^{f}}{\sum_{f=1}^{r} \delta_{ij}^{f}} \qquad (4)$$

where r is the number of attributes and the distance $\mathrm{dist}(x_i, x_j)$ is between 0 and 1. The indicator $\delta_{ij}^{f}$ is

$$\delta_{ij}^{f} = \begin{cases} 0 & \text{if } x_{if} \text{ or } x_{jf} \text{ is missing} \\ 0 & \text{if attribute } f \text{ is asymmetric and } x_{if} = x_{jf} = 0 \\ 1 & \text{otherwise (i.e., } x_{if} \text{ and } x_{jf} \text{ are not missing)} \end{cases}$$

and $d_{ij}^{f}$ is the distance contributed by attribute f, in the range [0, 1].

Combining individual distances (cont.)
If f is a binary or nominal attribute:

$$d_{ij}^{f} = \begin{cases} 1 & \text{if } x_{if} \neq x_{jf} \\ 0 & \text{otherwise} \end{cases}$$

With this choice, distance (4) reduces to:
- equation (3) of lecture 10 if all attributes are nominal
- the simple matching distance, (1) of lecture 10, if all attributes are symmetric binary
- the Jaccard distance, (2) of lecture 10, if all attributes are asymmetric binary

If f is interval-scaled:

$$d_{ij}^{f} = \frac{|x_{if} - x_{jf}|}{R_f}$$

where $R_f = \max(f) - \min(f)$ is the value range of f. If all attributes are interval-scaled, distance (4) reduces to the Manhattan distance, assuming that all attribute values are standardized. Ordinal and ratio-scaled attributes are converted to interval-scaled attributes and handled in the same way.
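To make formula (4) concrete, here is a minimal sketch in Python for a mix of nominal and interval-scaled attributes. The function name, the use of None for missing values, and the example records are my own illustrative choices; asymmetric binary handling is omitted for brevity.

```python
def gower_distance(x, y, types, ranges):
    """Distance (4): weighted average of per-attribute distances in [0, 1].

    types[f]  is "nominal" or "interval"
    ranges[f] is R_f = max(f) - min(f) for interval-scaled attributes
    Missing values are None; their attributes get indicator delta_f = 0.
    """
    num, den = 0.0, 0.0
    for f, (xf, yf) in enumerate(zip(x, y)):
        if xf is None or yf is None:
            continue  # delta_f = 0: attribute f contributes nothing
        if types[f] == "nominal":
            d = 0.0 if xf == yf else 1.0
        else:  # interval-scaled: |x_if - x_jf| / R_f
            d = abs(xf - yf) / ranges[f]
        num += d    # delta_f = 1 for this attribute
        den += 1.0
    return num / den if den > 0 else 0.0

# Hypothetical records: (color, weight in grams), weight range R = 100
x = ("red", 150.0)
y = ("green", 120.0)
print(gower_distance(x, y, types=["nominal", "interval"],
                     ranges=[None, 100.0]))  # (1 + 0.3) / 2 = 0.65
```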

How to choose a clustering algorithm
Clustering research has a long history, and a vast collection of algorithms is available; we only introduced several main ones. Choosing the best algorithm is challenging:
- Every algorithm has limitations and works well only with certain data distributions
- It is very hard, if not impossible, to know what distribution the application data follow
- The data may not fully follow any ideal structure or distribution required by the algorithms
- One also needs to decide how to standardize the data, choose a suitable distance function, and select other parameter values

How to choose a clustering algorithm (cont.)
Due to these complexities, the common practice is to:
1. Run several algorithms using different distance functions and parameter settings
2. Carefully analyze and compare the results
The interpretation of the results must be based on insight into the meaning of the original data and knowledge of the algorithms used. Clustering is highly application dependent and, to a certain extent, subjective (personal preferences).

Cluster evaluation: a hard problem
The quality of a clustering is very hard to evaluate because we do not know the correct clusters. Some methods that are used:
User inspection
- A panel of experts inspects the resulting clusters and scores them
- Study centroids and spreads
- Examine rules (e.g., from a decision tree) that describe the clusters
- For text documents, one can inspect the clusters by reading the documents
- The final score is the average of the individual scores
Manual inspection is labor intensive and time consuming.

Cluster evaluation: ground truth
We use some labeled data (for classification), under the assumption that each class is a cluster. Let the classes in the data D be C = (c_1, c_2, ..., c_k). The clustering method produces k clusters, which divide D into k disjoint subsets D_1, D_2, ..., D_k. After clustering, a confusion matrix is constructed; from the matrix, we compute various measures: entropy, purity, precision, recall, and F-score.

Evaluation measures: entropy
For each cluster D_i, we can measure the entropy as

$$\mathrm{entropy}(D_i) = -\sum_{j=1}^{k} \Pr_i(c_j) \log_2 \Pr_i(c_j)$$

where $\Pr_i(c_j)$ is the proportion of class c_j in cluster D_i. The entropy of the whole clustering is

$$\mathrm{entropy}_{total}(D) = \sum_{i=1}^{k} \frac{|D_i|}{|D|}\, \mathrm{entropy}(D_i)$$

where $|D_i| / |D|$ is the weight of cluster D_i, proportional to its size.

Evaluation measures: purity
Purity measures the extent to which a cluster contains only one class of data:

$$\mathrm{purity}(D_i) = \max_j \Pr_i(c_j)$$

The purity of the whole clustering is

$$\mathrm{purity}_{total}(D) = \sum_{i=1}^{k} \frac{|D_i|}{|D|}\, \mathrm{purity}(D_i)$$

where $|D_i| / |D|$ is again the weight of cluster D_i. Precision, recall, and F-measure can be computed as well, based on the class that is most frequent in the cluster.
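Both measures follow directly from the cluster-by-class confusion matrix. A minimal sketch, assuming the matrix is given as a list of per-cluster rows of class counts; the example matrix is invented for illustration:

```python
from math import log2

def entropy_purity(confusion):
    """Total entropy and purity of a clustering.

    confusion[i][j] = number of points of class j in cluster i.
    """
    n = sum(sum(row) for row in confusion)
    total_entropy, total_purity = 0.0, 0.0
    for row in confusion:
        size = sum(row)
        probs = [c / size for c in row if c > 0]  # Pr_i(c_j), skipping zeros
        entropy_i = -sum(p * log2(p) for p in probs)
        purity_i = max(probs)
        total_entropy += size / n * entropy_i     # weight |D_i| / |D|
        total_purity += size / n * purity_i
    return total_entropy, total_purity

# Invented 3-cluster, 3-class confusion matrix
m = [[90, 5, 5], [10, 80, 10], [2, 3, 95]]
print(entropy_purity(m))
```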

An example
We can use the total entropy or purity to compare different clustering results, whether from the same algorithm or from different algorithms. Precision, recall, and F-measure can be computed as well for each cluster: the precision of class Science in cluster 1 is 0.89, the recall is 0.83, and the F-measure is thus 0.86.

A remark about ground-truth evaluation
This method is commonly used to compare different clustering algorithms, but a real-life data set for clustering has no class labels. Thus, although an algorithm may perform very well on some labeled data sets, there is no guarantee that it will perform well on the actual application data at hand. The fact that it performs well on some labeled data sets does, however, give us some confidence in the quality of the algorithm. This evaluation method is said to be based on external data or information.
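For reference, the F-measure is the harmonic mean of precision and recall; plugging in the numbers above:

$$F = \frac{2\,P\,R}{P + R} = \frac{2 \cdot 0.89 \cdot 0.83}{0.89 + 0.83} \approx 0.86$$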

Evaluation based on internal information
- Intra-cluster cohesion (compactness): cohesion measures how near the data points in a cluster are to the cluster centroid. The sum of squared error (SSE) is a commonly used measure.
- Inter-cluster separation (isolation): separation means that different cluster centroids should be far away from one another.
In most applications, expert judgment is still the key.

Indirect evaluation
In some applications, clustering is not the primary task but is used to help perform another task. We can then use the performance on the primary task to compare clustering methods. For instance, suppose the primary task is to provide recommendations on book purchasing to online shoppers. If we can cluster shoppers according to their features, we might be able to provide better recommendations. We can then evaluate different clustering algorithms based on how well they help with the recommendation task. Here, we assume that the recommendation quality can be reliably evaluated.
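As a small illustration of the cohesion measure, here is a sketch of SSE over a clustering; the data layout (a list of clusters, each a list of points) is an assumed convention, not from the slides:

```python
def sse(clusters):
    """Sum of squared errors: for each cluster, sum the squared Euclidean
    distance of every point to that cluster's centroid."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim))
                     for p in points)
    return total

# Two invented 2-D clusters: SSE = 2.0 + 4.0 = 6.0
print(sse([[(0, 0), (2, 0)], [(5, 5), (7, 7)]]))
```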

Summary
Clustering has a long history and is still an active research area. There is a huge number of clustering algorithms, and more appear every year; we only introduced several main ones. There are many others, e.g., density-based algorithms, subspace clustering, scale-up methods, neural-network-based methods, fuzzy clustering, co-clustering, etc. Clustering is hard to evaluate, but very useful in practice; this partially explains why a large number of clustering algorithms are still being devised every year. Clustering is highly application dependent and, to some extent, subjective.

Reinforcement Learning
These slides are an adaptation of slides drawn by Tom Mitchell and modified by Liviu Ciortuz.

Introduction
Supervised learning is the simplest and most studied type of learning. But how can an agent learn behaviors when it doesn't have a teacher to tell it how to perform?
- The agent has a task to perform
- It takes some actions in the world
- At some later point, it gets feedback telling it how well it did on the task
- The agent performs the same task over and over again
This problem is called reinforcement learning: the agent gets positive reward for tasks done well and negative reward for tasks done poorly.

Introduction (cont.)
The goal is to get the agent to act in the world so as to maximize its rewards. The agent has to figure out which of its actions made it get the reward or punishment; this is known as the credit assignment problem. Reinforcement learning can be used to train computers to do many tasks, such as:
- playing board games
- job shop scheduling
- controlling robot flight
- taxi scheduling

Overview
Task: control learning, i.e., make an autonomous agent (robot) perform actions, observe the consequences, and learn a control strategy. The Q-learning algorithm acquires optimal control strategies from delayed rewards, even when the agent has no prior knowledge of the effects of its actions on the environment. Reinforcement learning is related to dynamic programming, which is used to solve optimization problems; while DP assumes that the agent/program knows the effects (and rewards) of all its actions, in RL the agent has to experiment in the real world.

Reinforcement Learning Problem
Target function to learn: a policy $\pi : S \to A$
Goal: maximize the discounted cumulative reward

$$r_0 + \gamma r_1 + \gamma^2 r_2 + \dots \qquad \text{where } 0 \le \gamma < 1$$

Example: playing Backgammon (TD-Gammon [Tesauro, 1995]); immediate reward +100 if win, -100 if lose, 0 otherwise.

Control learning characteristics
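A tiny sketch of the discounted return defined above; the function name and the sample reward episode are illustrative assumptions, not from the slides:

```python
def discounted_return(rewards, gamma):
    """Compute r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite reward sequence."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Illustrative episode: no reward until a win (+100) at the final step
print(discounted_return([0, 0, 0, 100], gamma=0.9))  # 100 * 0.9**3 = 72.9
```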

Learning Sequential Control Strategies Using Markov Decision Processes

Agent's Learning Task