Conceptual Clustering


Outline
  - What is conceptual clustering? Why?
  - Conceptual vs. numerical clustering
  - Definitions and key points
  - Approaches
  - The AQ/CLUSTER approach: adapting STAR generation for conceptual
    clustering
  - The COBWEB conceptual clustering approach

University of Crete, Fall 2000 course

Conceptual Clustering: What

How to group examples / cases / observations / objects, based on their
descriptions. An unsupervised learning method: no class is assigned to the
cases.

Example: a taxonomy of species

      BODY_COVER   HEART_CHAMBER    BODY_TEMP     FERTILIZATION
  s1  hair         four             regulated     internal
  s2  feathers     four             regulated     internal
  s3  cornified    imperfect-four   unregulated   internal
  s4  moist        three            unregulated   external
  s5  scales       two              unregulated   external

Hierarchical conceptual clustering:

  {s1, ..., s5}
    {s1, s2}
      {s1}  hair, four, ...
      {s2}  feathers, four, ...
    {s3}    cornified, imperfect-four, ...
    {s4, s5}
      {s4}  moist, three, ...
      {s5}  scales, two, ...

Conceptual Clustering vs. Numerical Clustering

Numerical clustering is based on distances: the points form two groups, but
are the groups hard to interpret? Conceptual clustering is based on the
objects' descriptions: the same points form one group, the DIAMOND concept.

Points, facts, observations, instances, examples, and cases are put together
if they represent the same concept.

Conceptual Clustering: Key-points

A conceptual clustering system accepts a set of object descriptions (events,
facts, observations, ...) and produces a classification scheme over them:
a semantic-network HIERARCHY of classes, sub-classes, and instances.

  - Does not require a teacher: unsupervised learning.
  - An evaluation function is needed for the goodness of a clustering.
  - Contextual factors:
      Performance: are the resulting classifications any good?
      Environment: if it changes dynamically, then hierarchical clustering.

Conceptual Clustering: Definition

Given: a set of unclassified instances I and an evaluation function e.
Do: create a set of clusters for I that maximizes e.

  - Do the clusters need to be disjoint?
  - Clusters can be hierarchically related.

Evaluation functions (for the quality of clusters):
  - Maximize intra-cluster similarity.
  - Maximize inter-cluster dissimilarity.
  - Prefer simpler clusterings (Occam's razor).
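As a minimal sketch of such an evaluation function over symbolic events: the attribute-match similarity `sim` and the score "mean intra-cluster similarity minus mean inter-cluster similarity" are illustrative choices, not definitions from the lecture.

```python
from itertools import combinations, product

def sim(a, b):
    """Fraction of attribute positions where two symbolic events agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def clustering_score(clusters):
    """Toy evaluation function e: mean intra-cluster similarity minus
    mean inter-cluster similarity (higher is better)."""
    intra = [sim(a, b) for c in clusters for a, b in combinations(c, 2)]
    inter = [sim(a, b)
             for c1, c2 in combinations(clusters, 2)
             for a, b in product(c1, c2)]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(intra) - mean(inter)

# Events from the species example (body cover, chambers, temp, fertilization):
s1 = ("hair", "four", "regulated", "internal")
s2 = ("feathers", "four", "regulated", "internal")
s4 = ("moist", "three", "unregulated", "external")
s5 = ("scales", "two", "unregulated", "external")

good = clustering_score([[s1, s2], [s4, s5]])  # 0.625
bad  = clustering_score([[s1, s4], [s2, s5]])  # negative score
```

A clustering that groups similar species scores higher than one that mixes them, which is exactly the behavior e is meant to reward.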

Conceptual Clustering: Definitions & key-points -2

Performance measures:
  - Ability to predict all, or the important, attributes.
  - Comprehensibility and utility of the induced clusters.
  - Ability to generate a hierarchy: a recognition process over structured
    descriptions.

ML contribution to clustering:
  - Representation: symbolic variables.
  - Automatic characterization of the induced clusters.

Conceptual Clustering: Approaches

CLUSTER [Michalski & Stepp, 1983]
  - STAR generation; hierarchical organization, where branches are
    distinguishing characterizations.
  - Hill-climbing with backtracking.
  - Pre-specified number of clusters.

AUTOCLASS [Cheeseman et al., 1988]
  - Probability distributions of member values; Bayesian.
  - Finds the most probable partition of the instances, maximizing:

      p(θ, π | D, N) = p(θ, π | N) p(D | θ, π, N) / p(D | N)

  - Number of clusters not pre-specified.

COBWEB [Fisher, 1987]
  - Statistical measure of category (cluster) utility.
  - Number of clusters NOT pre-specified.
  - Incremental.

Conceptual Clustering: AQ/CLUSTER

Adapt AQ for conceptual clustering. AQ requires a classification of the
examples into POSitive and NEGative.

Given:
  - A collection of events, E.
  - The number of clusters desired, k.
  - The criterion of clustering quality, LEF.
Find:
  - A disjoint set of clusters over the collection of events that optimizes
    the given criterion of clustering quality.

AQ/CLUSTER: Terminology

Variables:
  - Nominal (categorical):  DOMAIN(Xi) = {v1, v2, ..., vm}
  - Linear (quantitative):  DOMAIN(Xi) = [vi .. vj]
  - Structured, e.g. a generalization tree for shape:

      shape
        oval:        circle, ellipse
        polygon
          3-sides:   triangle
          4-sides:   rectangle, trapezoid, square

Syntactic distance:  d(e1, e2) = SUM_i sd(x1,i, x2,i)

Relational statement [Xi : Ri], where Ri is the reference of a variable,
e.g. [length > 2], [color = blue OR red], [weight = 2..5].
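The syntactic distance above can be sketched in code. The slide does not define the per-variable measure `sd`, so the choices here (0/1 mismatch for nominal variables, absolute difference for linear ones) are assumptions for illustration.

```python
def sd(v1, v2, linear=False):
    """Per-variable syntactic distance: 0/1 mismatch for nominal
    variables, absolute difference for linear (quantitative) ones."""
    if linear:
        return abs(v1 - v2)
    return 0 if v1 == v2 else 1

def syntactic_distance(e1, e2, linear_mask):
    """d(e1, e2) = SUM_i sd(x1,i, x2,i), summed over all variables."""
    return sum(sd(a, b, lin) for a, b, lin in zip(e1, e2, linear_mask))

# Two events over (color, length, n_sides); length is a linear variable:
e1 = ("blue", 2, 4)
e2 = ("red", 5, 4)
d = syntactic_distance(e1, e2, linear_mask=(False, True, False))  # 1 + 3 + 0
```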

Conceptual Clustering: Adapting STAR generation

1. k events (seeds) are selected, where k = the number of clusters wanted.
2. The star G(ei | E - {ei}) is generated for each seed event against the
   other events.
3. The complexes are modified to construct a disjoint cover that optimizes
   the LEF.
4. Is the termination condition satisfied?
5. Otherwise, choose new seeds:
   - If cluster quality improves, choose central events.
   - If cluster quality is not improving, choose border events.

Central events: those nearest the geometric mean of the set of events in the
cluster.

At the end one has a set of clusters and their descriptions (RULES). For each
cluster, perform the same procedure to build a hierarchy.

Adapting STAR generation for conceptual clustering: Disjoint covers

We have k covers, each covering one seed event and not the other k-1. From
the rest of the events in the given set, determine those covered by more than
one of the k covers (the Multiple Covered Event List, m-list). The size of
this list is a measure of cluster quality; if the m-list is empty, terminate.

Refunion (complex + events -> complex), for linear variables:

  e1 = (2, 3, 0, 1)   newly selected event
  e2 = (0, 2, 1, 1)   newly selected event
  c  = [X1 = 2..3][X2 = 4][X3 = 0][X4 = 2]
  c' = [X1 = 0..3][X2 = 2..4][X3 = 0..1][X4 = 1..2]

For structured variables: climb the generalization tree.

Quality: sparseness of a cluster

  r(c) = 1 - p(c) / (p(c) + s(c))

  p(c): # events covered by c
  s(c): # events covered from E - c

MINIMIZE total sparseness; MAXIMIZE simplicity (as few attributes as
possible).
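The refunion step on the slide can be reproduced directly for linear variables: each interval of the complex is widened just enough to cover the new events. The interval representation as (lo, hi) pairs is an implementation choice.

```python
def refunion(complex_, *events):
    """Widen each [Xi = lo..hi] interval of a complex just enough to also
    cover the given events (linear variables only)."""
    out = []
    for i, (lo, hi) in enumerate(complex_):
        vals = [e[i] for e in events] + [lo, hi]
        out.append((min(vals), max(vals)))
    return out

# The slide's example: c = [X1=2..3][X2=4][X3=0][X4=2] with events e1, e2
c  = [(2, 3), (4, 4), (0, 0), (2, 2)]
e1 = (2, 3, 0, 1)
e2 = (0, 2, 1, 1)
c2 = refunion(c, e1, e2)  # [X1=0..3][X2=2..4][X3=0..1][X4=1..2]
```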

AQ/CLUSTER: Flow Chart

Given: E, a set of data events; k, the number of clusters; LEF, the
clustering quality criterion.

  (1) Choose initial k seed events from E.
  (2) Determine a star for each seed against the other seed events.
  (3) By appropriately modifying and selecting complexes from the stars,
      construct a disjoint cover of E that optimizes the LEF criterion.
  (4) Is the termination criterion satisfied? If so, stop.
  (5) Is the clustering quality improving?
      - Yes: choose k new central events; go to (2).
      - No:  choose k new border events; go to (2).
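A heavily simplified sketch of this loop, under stated assumptions: linear variables only, seed-nearest assignment in place of real star generation, bounding intervals in place of LEF-driven complex selection, and re-seeding always by central events. It shows the shape of the iteration, not the actual AQ/CLUSTER algorithm.

```python
def dist(a, b):
    """City-block distance between two linear events."""
    return sum(abs(x - y) for x, y in zip(a, b))

def bounding_complex(events):
    """Describe a cluster as one [Xi = lo..hi] interval per variable."""
    return [(min(e[i] for e in events), max(e[i] for e in events))
            for i in range(len(events[0]))]

def central_event(events):
    """The event nearest the attribute-wise mean of its cluster."""
    mean = [sum(e[i] for e in events) / len(events)
            for i in range(len(events[0]))]
    return min(events, key=lambda e: dist(e, mean))

def cluster(events, seeds, rounds=5):
    """Assign events to the nearest seed, describe each cluster by a
    bounding complex, then re-seed with central events and repeat."""
    for _ in range(rounds):
        groups = {s: [] for s in seeds}
        for e in events:
            groups[min(seeds, key=lambda s: dist(e, s))].append(e)
        seeds = [central_event(g) for g in groups.values() if g]
    groups = [g for g in groups.values() if g]
    return groups, [bounding_complex(g) for g in groups]

events = [(0, 0), (1, 0), (0, 1), (8, 8), (9, 8), (8, 9)]
groups, complexes = cluster(events, seeds=[(0, 0), (9, 8)])
```

Each returned complex is a symbolic description of its cluster, which is the point that distinguishes conceptual from purely numerical clustering.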

COBWEB: The Basics

  - Representation: attribute-value pairs.
  - Search: heuristic statistical evaluation measure.
  - Hierarchical clustering: different state representations.
  - Method: operators to build classification schemes.
  - Control: a high-level algorithmic process applying the evaluation
    measure and forming states by applying the operators.

Ability to identify basic-level categories. Basic-level categories (e.g.,
Bird) are retrieved more quickly than either more general (e.g., Animal) or
more specific (e.g., Robin) ones:

  Animal
    Bird    <- a basic-level category
      Robin

This gives an efficient recognition process and better classification, and
maximizes inference-related capabilities.

COBWEB: Towards a measure of Category Utility

Trade-off between intra-class similarity and inter-class dissimilarity.

An index for intra-class similarity: maximize

  P(Ai = Vij | Ck)

a continuous analogue of logical necessity. The higher this probability:
  - the more necessary Ai = Vij is for predicting Ck, i.e. the more
    necessary it is that objects sharing this attribute-value pair are in
    the same category;
  - the greater the proportion of class members sharing this
    attribute-value pair.

COBWEB: Towards a measure of Category Utility -2

Trade-off between intra-class similarity and inter-class dissimilarity.

An index for inter-class dissimilarity: maximize

  P(Ck | Ai = Vij)

a continuous analogue of logical sufficiency. The higher this probability:
  - the more sufficient Ai = Vij is for predicting Ck;
  - the less this attribute-value pair predicts the other classes that
    share it.
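Both indices are simple conditional probabilities and can be estimated from counts. The two-cluster data below (birds vs. mammals over hypothetical body-cover and locomotion attributes) is made up for illustration.

```python
def necessity(cluster, attr, value):
    """P(Ai = Vij | Ck): the proportion of members of cluster Ck sharing
    the attribute-value pair (the analogue of logical necessity)."""
    return sum(e[attr] == value for e in cluster) / len(cluster)

def sufficiency(clusters, k, attr, value):
    """P(Ck | Ai = Vij): of all events with the value, the fraction that
    falls in cluster k (the analogue of logical sufficiency)."""
    total = sum(e[attr] == value for c in clusters for e in c)
    return sum(e[attr] == value for e in clusters[k]) / total

# Hypothetical two-cluster partition over (body cover, locomotion) events:
birds   = [("feathers", "flies"), ("feathers", "flies"),
           ("feathers", "walks")]
mammals = [("hair", "walks"), ("hair", "walks")]
clusters = [birds, mammals]

p_nec = necessity(birds, 0, "feathers")          # 1.0: all birds share it
p_suf = sufficiency(clusters, 0, 0, "feathers")  # 1.0: only birds have it
```

"feathers" is both fully necessary and fully sufficient for the bird cluster here; "walks" would score lower on both, since it neither covers all birds nor excludes mammals.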

Category Utility: Final Definition

  CU({C1, C2, ..., Cn}) =
      SUM_k P(Ck) [ SUM_i SUM_j P(Ai = Vij | Ck)^2
                    - SUM_i SUM_j P(Ai = Vij)^2 ] / n

The first double sum is the expected number of attribute values correctly
guessed given Ck; the second is the expected number correctly guessed
without any class knowledge.

The special case of irrelevant attributes: if Ai = Vij is independent of
class membership, then P(Ai = Vij | Ck) = P(Ai = Vij). If this holds for
all values j, the attribute contributes 0 to CU: Ai is irrelevant.
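The definition above translates almost line by line into code; the probabilities are estimated as frequencies over the events. The species events are reused from the taxonomy example.

```python
def category_utility(clusters):
    """CU({C1,...,Cn}) = (1/n) * SUM_k P(Ck) *
       [ SUM_ij P(Ai=Vij|Ck)^2 - SUM_ij P(Ai=Vij)^2 ]."""
    events = [e for c in clusters for e in c]
    n_attrs = len(events[0])

    def sq_sum(ev):
        # SUM_i SUM_j P(Ai = Vij)^2, estimated from the events in ev
        total = 0.0
        for i in range(n_attrs):
            counts = {}
            for e in ev:
                counts[e[i]] = counts.get(e[i], 0) + 1
            total += sum((v / len(ev)) ** 2 for v in counts.values())
        return total

    base = sq_sum(events)
    return sum(len(c) / len(events) * (sq_sum(c) - base)
               for c in clusters) / len(clusters)

# The species events from the earlier taxonomy example:
s1 = ("hair", "four", "regulated", "internal")
s2 = ("feathers", "four", "regulated", "internal")
s4 = ("moist", "three", "unregulated", "external")
s5 = ("scales", "two", "unregulated", "external")

cu_good = category_utility([[s1, s2], [s4, s5]])   # 0.8125
cu_bad  = category_utility([[s1, s4], [s2, s5]])   # 0.1875
```

The partition that matches the taxonomy scores far higher, which is what lets COBWEB use CU as its search heuristic.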

COBWEB: The Operators

Operator 1: placing an object in an existing node (sub-cluster of the
hierarchy).
  - Place the object in each existing sub-cluster in turn and compute CUi
    (i = 1, ..., # sub-clusters so far).
  - Identify the BEST CUi; the object is placed in the corresponding node.

Operator 2: creating a new class (sub-cluster).
  - Apply Operator 1 to get CU[object in BEST host] (reuse the previous
    results).
  - Compute CU[existing sub-clusters + NewNode].
  - If CU[existing sub-clusters + NewNode] > CU[object in BEST host],
    create the NEW NODE (new class / sub-cluster).
  - This is why the number of clusters is not predefined.

COBWEB: The Operators -2

Operator 3: merging two nodes (moving up one level), needed because
operators 1 and 2 are biased by the initial order in which the objects
arrive.
  - Do it for all node pairs.
  - The MERGED-NEW-NODE gets the summed probabilities (counts) of the two
    merged nodes.
  - If CU[with the merged node] improves the clustering quality, keep the
    merge.

COBWEB: The Operators -3

Operator 4: splitting a node into its children (moving down one level),
again because operators 1 and 2 are biased by the initial input order.
  - Do it for all nodes.
  - The SPLIT-NEW-NODEs are the children of the split node, which replace
    it at its level.
  - If CU[with the split nodes] improves on CU[with the unsplit node], keep
    the split.

COBWEB: The algorithmic process (the CONTROL)

Search: hill-climbing with backtracking. N = node; I = new instance.

  Train(N, I):
    if leaf(N):
      create a sub-tree(N, I)
    else:
      Incorporate(I, N)            ; update N's probabilities
      compute the score of placing I in each child of N
      N1    = child with the highest score       (HIGH)
      N2    = child with the second-highest score
      NEW   = score of placing I as a new child of N
      MERGE = score of merging N1 and N2 and putting I in the merged node
      SPLIT = score of splitting N1 into its children
      if the highest score is:
        HIGH:  Train(N1, I)
        NEW:   add I as a new child of N
        MERGE: Train(merge(N1, N2, N), I)
        SPLIT: Train(split(N1, N), I)

  Repeat until all instances have been presented.
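Under the simplifying assumptions that nodes store only the instances they cover (no probability tables) and that the MERGE and SPLIT branches are left out, the control loop might be sketched as below. It keeps only operators 1 and 2, so it is a reduced COBWEB, not Fisher's full algorithm.

```python
class Node:
    """A node of the tree: the instances it covers plus its children."""
    def __init__(self, inst=None):
        self.instances = [inst] if inst is not None else []
        self.children = []

def cu(children):
    """Category utility of the partition induced by a list of child nodes
    (same definition as on the Category Utility slide)."""
    clusters = [c.instances for c in children if c.instances]
    if len(clusters) < 2:
        return 0.0
    events = [e for c in clusters for e in c]
    def sq(ev):
        # SUM over attributes i and values j of P(Ai = Vij)^2 within ev
        total = 0.0
        for i in range(len(events[0])):
            counts = {}
            for e in ev:
                counts[e[i]] = counts.get(e[i], 0) + 1
            total += sum((v / len(ev)) ** 2 for v in counts.values())
        return total
    base = sq(events)
    return sum(len(c) / len(events) * (sq(c) - base)
               for c in clusters) / len(clusters)

def train(node, inst):
    """Incremental placement of one instance, using only operator 1
    (best existing host) and operator 2 (new singleton class)."""
    node.instances.append(inst)
    if not node.children:                       # leaf node
        if len(node.instances) > 1:             # grow it into a sub-tree
            node.children = [Node(e) for e in node.instances]
        return
    scores = []                                 # operator 1: try each host
    for child in node.children:
        child.instances.append(inst)
        scores.append(cu(node.children))
        child.instances.pop()
    best = max(range(len(scores)), key=scores.__getitem__)
    new_score = cu(node.children + [Node(inst)])  # operator 2: new class
    if new_score > scores[best]:
        node.children.append(Node(inst))
    else:
        train(node.children[best], inst)

root = Node()
for e in [("hair", "four"), ("feathers", "four"),
          ("scales", "two"), ("hair", "four")]:
    train(root, e)
# root ends up with three sub-clusters; the two ("hair", "four") objects
# share one of them.
```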

COBWEB: The different Uses

CLASSIFICATION:
  1. Eliminate the class attribute from the data.
  2. Form the COBWEB classification tree.
  3. Pass each unseen (test) example through the tree until it reaches a
     leaf.
  4. Use the best-host node to classify the case: take as its class the
     class with the highest number of objects in the node.

INFERRING ATTRIBUTE VALUES:
  5. For an unknown value in a test example, predict it from the attribute
     values of the objects in the best-host node: the most frequent value.
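Both uses reduce to a majority vote over the objects stored in the best-host node. The helper names and the example host node below are hypothetical; they assume the tree traversal has already delivered the node's objects.

```python
from collections import Counter

def majority_class(node_instances, class_index):
    """Classify by the most frequent class label among the objects in
    the best-host node reached by the test example."""
    return Counter(e[class_index] for e in node_instances).most_common(1)[0][0]

def infer_value(node_instances, attr_index):
    """Predict a missing attribute value as the most frequent value of
    that attribute in the best-host node."""
    return Counter(e[attr_index] for e in node_instances).most_common(1)[0][0]

# Suppose a test example ended up in a node containing these objects
# (last position is the held-out class label):
host = [("hair", "four", "mammal"), ("hair", "four", "mammal"),
        ("feathers", "four", "bird")]

label = majority_class(host, class_index=2)  # "mammal"
cover = infer_value(host, attr_index=0)      # "hair"
```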

COBWEB: Incrementality and its evaluation

Criteria for evaluating an incremental system (like COBWEB):
  - COST of incorporating a single instance.
  - QUALITY of the learned classification tree.
  - Number of objects needed to STABILIZE the classification tree.

  COST = O(B^2 * log_B(n) * A * V)

  B:        average branching factor
  log_B(n): maximum depth
  n:        number of objects classified so far
  A:        number of attributes
  V:        mean number of values per attribute