Lecture: Clustering and Segmentation


Lecture: Clustering and Segmentation Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 12-1

What we will learn today: Introduction to segmentation and clustering; Gestalt theory for perceptual grouping; Agglomerative clustering; Oversegmentation. Reading: [FP] Chapters 14.2, 14.4


Image Segmentation. Goal: identify groups of pixels that go together. Slide credit: Steve Seitz, Kristen Grauman

The Goals of Segmentation: separate the image into coherent objects. (Figure: an image and its human segmentation.) Slide credit: Svetlana Lazebnik

The Goals of Segmentation: group together similar-looking pixels ("superpixels") for efficiency of further processing. X. Ren and J. Malik, Learning a classification model for segmentation, ICCV 2003. Slide credit: Svetlana Lazebnik

Segmentation for feature support (e.g., a 50x50 patch). Slide: Derek Hoiem

Segmentation for efficiency. [Felzenszwalb and Huttenlocher 2004] [Hoiem et al. 2005, Mori 2005] [Shi and Malik 2001] Slide: Derek Hoiem

Segmentation as a result. Rother et al. 2004

Types of segmentations: oversegmentation, undersegmentation, multiple segmentations.

One way to think about segmentation is clustering. Clustering: group together similar data points and represent them with a single token. Key challenges: (1) What makes two points/images/patches similar? (2) How do we compute an overall grouping from pairwise similarities? Slide: Derek Hoiem

Why do we cluster?
- Summarizing data: look at large amounts of data; patch-based compression or denoising; represent a large continuous vector with the cluster number.
- Counting: histograms of texture, color, SIFT vectors.
- Segmentation: separate the image into different regions.
- Prediction: images in the same cluster may have the same labels.
Slide: Derek Hoiem

How do we cluster?
- Agglomerative clustering: start with each point as its own cluster and iteratively merge the closest clusters.
- K-means (next lecture): iteratively re-assign points to the nearest cluster center.
- Mean-shift clustering (next lecture): estimate modes of the probability density function.

General ideas. Tokens: whatever we need to group (pixels, points, surface elements, etc.). Bottom-up clustering: tokens belong together because they are locally coherent. Top-down clustering: tokens belong together because they lie on the same visual entity (object, scene, ...). These two approaches are not mutually exclusive.

Examples of Grouping in Vision: grouping video frames into shots; determining image regions; object-level grouping; figure-ground. What things should be grouped? What cues indicate groups? Slide credit: Kristen Grauman

Similarity. Slide credit: Kristen Grauman

Symmetry. Slide credit: Kristen Grauman

Common Fate. Image credit: Arthus-Bertrand (via F. Durand). Slide credit: Kristen Grauman

Proximity. Slide credit: Kristen Grauman

Müller-Lyer Illusion. What makes the bottom line look longer than the top line?

What we will learn today: Introduction to segmentation and clustering; Gestalt theory for perceptual grouping; Agglomerative clustering; Oversegmentation

The Gestalt School. Grouping is key to visual perception. Elements in a collection can have properties that result from relationships: "the whole is greater than the sum of its parts." Examples: illusory/subjective contours, occlusion, familiar configuration. http://en.wikipedia.org/wiki/gestalt_psychology Slide credit: Svetlana Lazebnik

Gestalt Theory. Gestalt: "whole" or "group." The whole is greater than the sum of its parts; relationships among parts can yield new properties/features. Psychologists identified a series of factors that predispose a set of elements to be grouped (by the human visual system). "I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have '327'? No. I have sky, house, and trees." Max Wertheimer (1880-1943), Untersuchungen zur Lehre von der Gestalt, Psychologische Forschung, Vol. 4, pp. 301-350, 1923. http://psy.ed.asu.edu/~classics/wertheimer/forms/forms.htm

Gestalt Factors. These factors make intuitive sense, but are very difficult to translate into algorithms. Image source: Forsyth & Ponce

Continuity through Occlusion Cues

Continuity through Occlusion Cues: continuity, explanation by occlusion

Continuity through Occlusion Cues (further examples). Image source: Forsyth & Ponce

Figure-Ground Discrimination

The Ultimate Gestalt?

What we will learn today: Introduction to segmentation and clustering; Gestalt theory for perceptual grouping; Agglomerative clustering; Oversegmentation

What is similarity? Similarity is hard to define, but "we know it when we see it." The real meaning of similarity is a philosophical question; we will take a more pragmatic approach.

Clustering: distance measure. Clustering is an unsupervised learning method. Given items, the goal is to group them into clusters. We need a pairwise distance/similarity function between items, and sometimes the desired number of clusters. When data (e.g. images, objects, documents) are represented by feature vectors, a commonly used similarity measure is the cosine similarity. Let x and y be two data vectors; the angle θ between them satisfies cos θ = (x · y) / (||x|| ||y||).

Defining Distance Measures. Let x and x′ be two objects from the universe of possible objects. The distance (similarity) between x and x′ is a real number denoted by sim(x, x′). The Euclidean distance is defined as d(x, x′) = sqrt(Σ_i (x_i − x′_i)²). In contrast, the cosine similarity measure would be sim(x, x′) = (x · x′) / (||x|| ||x′||).
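As a concrete illustration, the two measures above can be written out in a few lines of Python (a minimal sketch; the function names are mine, not from the slides):

```python
import math

def euclidean_distance(x, y):
    # L2 distance between two feature vectors of equal length
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cosine_similarity(x, y):
    # Cosine of the angle between the vectors: 1 = same direction, 0 = orthogonal
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)

a, b = [1.0, 0.0], [0.0, 1.0]
print(euclidean_distance(a, b))  # sqrt(2) ≈ 1.414
print(cosine_similarity(a, b))   # 0.0 (orthogonal vectors)
```

Note that cosine similarity is scale-invariant, which is one reason it is popular for feature vectors whose magnitudes vary.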

Desirable Properties of a Clustering Algorithm: scalability (in terms of both time and space); ability to deal with different data types; minimal requirements for domain knowledge to determine input parameters; interpretability and usability. Optional: incorporation of user-specified constraints.

Animated example

Agglomerative clustering. Slide credit: Andrew Moore

Agglomerative clustering. How to define cluster similarity?
- Average distance between points
- Maximum distance
- Minimum distance
- Distance between means or medoids
How many clusters?
- Clustering creates a dendrogram (a tree).
- Threshold based on the maximum number of clusters or on the distance between merges.

Agglomerative Hierarchical Clustering - Algorithm
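The algorithm can be sketched in plain Python (a simplified, unoptimized illustration; function and parameter names are mine, and 2-D points stand in for arbitrary tokens):

```python
import math

def agglomerative(points, num_clusters, linkage="single"):
    """Start with each point as its own cluster; repeatedly merge the
    two closest clusters until num_clusters remain."""
    def cluster_distance(c1, c2):
        pair_dists = [math.dist(p, q) for p in c1 for q in c2]
        if linkage == "single":    # minimum pairwise distance
            return min(pair_dists)
        if linkage == "complete":  # maximum pairwise distance
            return max(pair_dists)
        return sum(pair_dists) / len(pair_dists)  # average link

    clusters = [[p] for p in points]  # every point begins as its own cluster
    while len(clusters) > num_clusters:
        # find the closest pair of clusters under the chosen linkage
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # merge cluster j into cluster i
    return clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(agglomerative(pts, 2))  # → [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
```

Recording the distance at each merge instead of stopping at a fixed count would yield the dendrogram discussed on the previous slide.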

Different measures of nearest clusters: single link. Produces long, skinny clusters.

Different measures of nearest clusters: complete link. Produces tight clusters.

Different measures of nearest clusters: average link. Robust against noise.

Conclusions: Agglomerative Clustering.
Good: simple to implement, widespread application; clusters have adaptive shapes; provides a hierarchy of clusters; no need to specify the number of clusters in advance.
Bad: may produce imbalanced clusters; still have to choose the number of clusters or a threshold to cut the dendrogram; does not scale well (runtime of O(n³)); can get stuck at a local optimum.

What we will learn today: Introduction to segmentation and clustering; Gestalt theory for perceptual grouping; Agglomerative clustering; Oversegmentation

How do we segment using clustering? Solution: an oversegmentation algorithm, introduced by Felzenszwalb and Huttenlocher in the paper "Efficient Graph-Based Image Segmentation."

Problem Formulation. Graph G = (V, E): V is the set of nodes (i.e., pixels); E is the set of undirected edges between pairs of pixels; w(vi, vj) is the weight of the edge between nodes vi and vj. A segmentation S partitions G into subgraphs G′ = (V, E′), where E′ ⊆ E, so that G is divided into distinct clusters C.

Predicate for Segmentation. Predicate D determines whether there is a boundary for segmentation: D(C1, C2) is true if Dif(C1, C2) > MInt(C1, C2), where Dif(C1, C2) is the difference between the two clusters and MInt(C1, C2) is the minimum internal difference of clusters C1 and C2.

Predicate for Segmentation. The difference between two components is the minimum weight edge that connects a node vi in cluster C1 to a node vj in C2: Dif(C1, C2) = min over edges (vi, vj), vi ∈ C1, vj ∈ C2, of w(vi, vj).

Predicate for Segmentation Predicate D determines whether there is a boundary for segmentation. In(C1, C2) is to the maximum weight edge that connects two nodes in the same component. Lecture 12-55

Predicate for Segmentation. The threshold function τ(C) = k/|C| sets how much the difference between components must exceed the internal difference within a component. Properties of the constant k: if k is large, it causes a preference for larger components; k does not set a minimum size for components.
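Putting the predicate together, here is a sketch in Python (names are mine; edges are (weight, u, v) triples, and Int(C) is simplified to the maximum internal edge weight rather than the maximum edge of C's minimum spanning tree, as in the paper):

```python
def dif(edges, c1, c2):
    """Dif(C1, C2): minimum weight edge connecting the two components."""
    s1, s2 = set(c1), set(c2)
    crossing = [w for w, u, v in edges
                if (u in s1 and v in s2) or (u in s2 and v in s1)]
    return min(crossing) if crossing else float("inf")

def internal(edges, c):
    """Int(C): largest edge weight inside the component (simplified;
    the paper uses the maximum edge of C's minimum spanning tree)."""
    s = set(c)
    inside = [w for w, u, v in edges if u in s and v in s]
    return max(inside) if inside else 0.0

def boundary(edges, c1, c2, k):
    """Predicate D: True if Dif(C1, C2) > MInt(C1, C2),
    with MInt = min(Int(C1) + k/|C1|, Int(C2) + k/|C2|)."""
    mint = min(internal(edges, c1) + k / len(c1),
               internal(edges, c2) + k / len(c2))
    return dif(edges, c1, c2) > mint

edges = [(1.0, "a", "b"), (1.0, "c", "d"), (10.0, "b", "c")]
print(boundary(edges, ["a", "b"], ["c", "d"], k=2.0))  # True: crossing edge exceeds MInt
```

Raising k (e.g., to 20.0 here) raises MInt and suppresses the boundary, matching the slide's observation that large k prefers larger components.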

Features and weights. Project every pixel into the feature space defined by (x, y, r, g, b). Every pixel is connected to its 8 neighboring pixels, and the weights are determined by the difference in intensities; weights between pixels are the L2 (Euclidean) distance in feature space. Edges are kept for only the top ten nearest neighbors in feature space, to ensure a run time of O(n log n), where n is the number of pixels.
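As an illustration of this weighting scheme, the following sketch builds 8-neighborhood grid edges with L2 feature-space weights (a minimal version under my own naming; it omits the paper's nearest-neighbor variant and the O(n log n) machinery):

```python
import math

def grid_edges(image):
    """image: 2D list (rows) of (r, g, b) tuples. Connect each pixel to its
    8 neighbors; weight = L2 distance in (x, y, r, g, b) feature space."""
    h, w = len(image), len(image[0])
    # Only the "forward" half of the 8-neighborhood, so each edge is added once.
    nbrs = [(0, 1), (1, -1), (1, 0), (1, 1)]
    edges = []
    for y in range(h):
        for x in range(w):
            f1 = (x, y) + image[y][x]  # 5-D feature vector for this pixel
            for dy, dx in nbrs:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    f2 = (nx, ny) + image[ny][nx]
                    edges.append((math.dist(f1, f2), (x, y), (nx, ny)))
    return edges

# Tiny 2x2 example: left column black, right column white.
img = [[(0, 0, 0), (255, 255, 255)],
       [(0, 0, 0), (255, 255, 255)]]
print(len(grid_edges(img)))  # 6 edges in a 2x2 8-connected grid
```

Vertical edges within a column get small weights (spatial distance only), while edges crossing the black/white boundary get large color terms, which is exactly what the segmentation predicate exploits.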

Results

What we have learned today: Introduction to segmentation and clustering; Gestalt theory for perceptual grouping; Agglomerative clustering; Oversegmentation