Unsupervised Learning: Clustering


Vibhav Gogate, The University of Texas at Dallas. Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer.

Machine Learning: supervised learning, unsupervised learning, reinforcement learning; each can be parametric or non-parametric. For supervised learning:
- Y continuous (regression): Gaussians (learned in closed form); linear functions (1. learned in closed form, 2. using gradient descent)
- Y discrete (classification):
  - Decision trees: greedy search; pruning
  - Probability of class given features: 1. learn P(Y), P(X|Y) and apply Bayes rule; 2. learn P(Y|X) with gradient descent
  - Non-probabilistic: linear (perceptron, gradient descent); nonlinear (neural net: backprop); support vector machines

Overview of Learning. What is being learned, by type of supervision (e.g., experience, feedback):

  Supervision       | Discrete function | Continuous function | Policy
  ------------------+-------------------+---------------------+------------------------
  Labeled examples  | Classification    | Regression          | Apprenticeship learning
  Reward            |                   |                     | Reinforcement learning
  Nothing           | Clustering        | PCA                 |

Key Perspective on Learning: learning as optimization (closed form, greedy search, gradient ascent) of a loss function (error + regularization).

Clustering Systems: Unsupervised Learning. Clustering requires data, but no labels. It detects patterns, e.g., in grouping emails or search results, customer shopping patterns, or program executions (intrusion detection). It is useful when you don't know what you're looking for. But: you often get gibberish.

Clustering. Basic idea: group together similar instances. Example: 2D point patterns. What could "similar" mean? One option: small (squared) Euclidean distance, d(x, y) = ||x − y||² = Σ_j (x_j − y_j)².

Outline: K-means & agglomerative clustering; Expectation Maximization (EM); Principal Component Analysis (PCA).

K-Means: an iterative clustering algorithm.
- Pick K random points as cluster centers (means).
- Alternate:
  1. Assign each data instance to the closest cluster center.
  2. Change each cluster center to the average of its assigned points.
- Stop when no point's assignment changes.
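A minimal NumPy sketch of this loop, assuming numeric data in an (n, d) array (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means on an (n, d) data array X."""
    rng = np.random.default_rng(seed)
    # Pick K random data points as the initial means.
    means = X[rng.choice(len(X), size=k, replace=False)].copy()
    assignments = None
    for _ in range(n_iters):
        # Assignment step: send each point to its closest mean.
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new = dists.argmin(axis=1)
        if assignments is not None and np.array_equal(new, assignments):
            break  # no point's assignment changed: converged
        assignments = new
        # Update step: move each mean to the average of its points.
        for j in range(k):
            pts = X[assignments == j]
            if len(pts) > 0:  # guard against an empty cluster
                means[j] = pts.mean(axis=0)
    return means, assignments
```

For example, `means, labels = kmeans(X, k=3)` returns the K centers and each point's cluster index.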

K-means clustering: Example. [The slides step through one run on a 2D point set:]
- Pick K random points as cluster centers (means).
- Iterative step 1: assign data instances to the closest cluster center.
- Iterative step 2: change each cluster center to the average of its assigned points.
- Repeat until convergence. [Several further slides show successive iterations of the same example.]

Example: K-Means for Segmentation. The goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance. [Figures: the original image alongside segmentations with K=2, K=3, and K=10; the final slide annotates the three results with 4%, 8%, and 17%.]
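As a sketch of how such a segmentation can be produced, here is one way to quantize an image's colors with k-means, assuming scikit-learn is available and the image is already loaded as an H x W x 3 array (nothing here is from the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

def segment(image, k):
    """Cluster pixel colors with k-means and repaint each pixel
    with its cluster's mean color (image: H x W x 3 array)."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    km = KMeans(n_clusters=k, n_init=10).fit(pixels)
    # Each pixel is replaced by the mean color of its cluster.
    quantized = km.cluster_centers_[km.labels_]
    return quantized.reshape(h, w, c)
```

Storing only K colors plus one label per pixel is also why this kind of quantization compresses the image.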

K-Means as Optimization. Consider the total squared distance of the points to their assigned means,

    Φ(a, c) = Σ_i ||x_i − c_{a_i}||²,

where the x_i are the points, a_i is the assignment of point i, and the c_k are the means. Each iteration has two stages:
- Update assignments: fix the means c, change the assignments a.
- Update means: fix the assignments a, change the means c.
This is coordinate descent on Φ. Will it converge? Yes, if you can argue that each update can't increase Φ.

Phase I: Update Assignments. For each point, re-assign it to the closest mean. This can only decrease the total distance Φ.

Phase II: Update Means. Move each mean to the average of its assigned points. This also can only decrease the total distance. (Why? Fun fact: the point y with minimum total squared Euclidean distance to a set of points {x} is their mean.)
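To see the two-phase argument concretely, here is a small self-contained check, on synthetic data, that Φ never increases after either phase (a sketch; the data and names are made up for illustration):

```python
import numpy as np

def phi(X, means, assignments):
    """Total squared distance of each point to its assigned mean."""
    return ((X - means[assignments]) ** 2).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
k = 3
means = X[rng.choice(len(X), size=k, replace=False)].copy()
assignments = rng.integers(0, k, size=len(X))

for _ in range(10):
    # Phase I: re-assign each point to its closest mean.
    before = phi(X, means, assignments)
    dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    assignments = dists.argmin(axis=1)
    assert phi(X, means, assignments) <= before + 1e-9
    # Phase II: move each mean to the average of its points.
    before = phi(X, means, assignments)
    for j in range(k):
        pts = X[assignments == j]
        if len(pts) > 0:
            means[j] = pts.mean(axis=0)
    assert phi(X, means, assignments) <= before + 1e-9
```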

Initialization. K-means is non-deterministic: it requires initial means, and it does matter which ones you pick! What can go wrong? A poor initialization can leave k-means stuck in a bad local optimum. There are various schemes for preventing this kind of thing: variance-based split/merge, initialization heuristics.

K-Means Getting Stuck. A local optimum: [figure].

K-Means Questions. Will K-means converge? To a global optimum? Will it always find the true patterns in the data? What if the patterns are very, very clear? What is the runtime? Do people ever use it? How many clusters should you pick?

Agglomerative Clustering. First merge very similar instances; incrementally build larger clusters out of smaller clusters. Algorithm:
- Maintain a set of clusters.
- Initially, each instance is in its own cluster.
- Repeat: pick the two closest clusters; merge them into a new cluster.
- Stop when there's only one cluster left.
This produces not one clustering, but a family of clusterings, represented by a dendrogram.

Agglomerative Clustering. How should we define "closest" for clusters with multiple elements? Many options:
- Closest pair (single-link clustering)
- Farthest pair (complete-link clustering)
- Average of all pairs
- Ward's method (minimum variance, like k-means)
Different choices create different clustering behaviors.
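A minimal sketch of these options using SciPy's hierarchical clustering, assuming SciPy is installed (the method strings map onto the linkage criteria above):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

# 'single' = closest pair, 'complete' = farthest pair,
# 'average' = average of all pairs, 'ward' = minimum variance.
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)  # the full merge tree (dendrogram)
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
    print(method, np.bincount(labels))
```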

Clustering Behavior. [Figure: average-link, farthest-pair, and nearest-pair clusterings of mouse tumor data, from Hastie.]

Agglomerative Clustering Questions Will agglomerative clustering converge? To a global optimum? Will it always find the true patterns in the data? Do people ever use it? How many clusters to pick?

Soft Clustering. Clustering typically assumes that each instance is given a hard assignment to exactly one cluster. This does not allow for uncertainty in class membership, or for an instance to belong to more than one cluster. Soft clustering instead gives the probabilities that an instance belongs to each of a set of clusters: each instance is assigned a probability distribution across a set of discovered categories (the probabilities over all categories must sum to 1).

Expectation Maximization (EM). A probabilistic method for soft clustering. A direct method that assumes k clusters {c_1, c_2, …, c_k}; a soft version of k-means. It assumes a probabilistic model of categories that allows computing P(c_i | E) for each category c_i for a given example E. For text, one typically assumes a naïve Bayes category model, with parameters θ = {P(c_i), P(w_j | c_i) : i ∈ {1, …, k}, j ∈ {1, …, |V|}}.

EM Algorithm. An iterative method for learning a probabilistic categorization model from unsupervised data.
- Initially assume a random assignment of examples to categories.
- Learn an initial probabilistic model by estimating the model parameters θ from this randomly labeled data.
- Iterate the following two steps until convergence:
  - Expectation (E-step): compute P(c_i | E) for each example given the current model, and probabilistically re-label the examples based on these posterior probability estimates.
  - Maximization (M-step): re-estimate the model parameters, θ, from the probabilistically re-labeled data.
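The slides instantiate EM for text with a naïve Bayes category model; as a self-contained illustration of the same E-step/M-step loop, here is a hedged sketch of EM for a one-dimensional mixture of two Gaussians, the "soft k-means" case the slides allude to (the data and names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two 1-D Gaussian clusters.
X = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 200)])

k = 2
weights = np.full(k, 1.0 / k)                  # mixing weights P(c_i)
mu = rng.choice(X, size=k, replace=False)      # random initial means
sigma = np.ones(k)                             # initial standard deviations

def gauss(x, m, s):
    """Gaussian density of x under mean m and std s (broadcasts)."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

for _ in range(50):
    # E-step: posterior P(c_i | x) for every example (soft re-labeling).
    resp = weights * gauss(X[:, None], mu, sigma)   # shape (n, k)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the soft labels.
    nk = resp.sum(axis=0)
    weights = nk / len(X)
    mu = (resp * X[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (X[:, None] - mu) ** 2).sum(axis=0) / nk)

print(weights, mu, sigma)  # should land near the true mixture parameters
```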

Acknowledgements. The k-means & Gaussian mixture models presentation contains material from an excellent tutorial by Andrew Moore: http://www.autonlab.org/tutorials/
K-means applet: http://www.elet.polimi.it/upload/matteucc/clustering/tutorial_html/appletkm.html
Gaussian mixture models applet: http://www.neurosci.aist.go.jp/%7eakaho/mixtureem.html