Conceptual Clustering


Outline
  - What is conceptual clustering? Why?
  - Conceptual vs. numerical clustering
  - Definitions and key points
  - Approaches
  - The AQ/CLUSTER approach: adapting STAR generation for conceptual
    clustering
  - The COBWEB conceptual clustering approach

University of Crete, Fall 2000 course

Conceptual Clustering: What

How to group examples / cases / observations / objects, based on their
descriptions. An unsupervised learning method: no class is assigned to the
cases.

Example: a taxonomy of species

      BODY_COVER   HEART_CHAMBER    BODY_TEMP     FERTILIZATION
  s1  hair         four             regulated     internal
  s2  feathers     four             regulated     internal
  s3  cornified    imperfect-four   unregulated   internal
  s4  moist        three            unregulated   external
  s5  scales       two              unregulated   external

Hierarchical conceptual clustering:

  {s1, ..., s5}
    {s1, s2}
      {s1}  hair, four, ...
      {s2}  feathers, four, ...
    {s3}    cornified, imperfect-four, ...
    {s4, s5}
      {s4}  moist, three, ...
      {s5}  scales, two, ...

Conceptual Clustering vs. Numerical Clustering

Numerical clustering is based on distances: the points form two groups, but
are the groups hard to interpret? Conceptual clustering is based on the
objects' descriptions: the same points form one group, the DIAMOND concept.

Points, facts, observations, instances, examples, and cases are put together
if they represent the same concept.

Conceptual Clustering: Key-points

A conceptual clustering system accepts a set of object descriptions (events,
facts, observations, ...) and produces a classification scheme over them:
a semantic-network HIERARCHY of classes, sub-classes, and instances.

  - Does not require a teacher: unsupervised learning.
  - An evaluation function is needed for the goodness of a clustering.
  - Contextual factors:
      Performance: are the resulting classifications any good?
      Environment: if it changes dynamically, then hierarchical clustering.

Conceptual Clustering: Definition

Given: a set of unclassified instances I and an evaluation function e.
Do: create a set of clusters for I that maximizes e.

  - Do the clusters need to be disjoint?
  - Clusters can be hierarchically related.

Evaluation functions (for the quality of clusters):
  - Maximize intra-cluster similarity.
  - Maximize inter-cluster dissimilarity.
  - Prefer simpler clusterings (Occam's razor).
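As a minimal sketch of such an evaluation function over symbolic events: the attribute-match similarity `sim` and the score "mean intra-cluster similarity minus mean inter-cluster similarity" are illustrative choices, not definitions from the lecture.

```python
from itertools import combinations, product

def sim(a, b):
    """Fraction of attribute positions where two symbolic events agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def clustering_score(clusters):
    """Toy evaluation function e: mean intra-cluster similarity minus
    mean inter-cluster similarity (higher is better)."""
    intra = [sim(a, b) for c in clusters for a, b in combinations(c, 2)]
    inter = [sim(a, b)
             for c1, c2 in combinations(clusters, 2)
             for a, b in product(c1, c2)]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(intra) - mean(inter)

# Events from the species example (body cover, chambers, temp, fertilization):
s1 = ("hair", "four", "regulated", "internal")
s2 = ("feathers", "four", "regulated", "internal")
s4 = ("moist", "three", "unregulated", "external")
s5 = ("scales", "two", "unregulated", "external")

good = clustering_score([[s1, s2], [s4, s5]])  # 0.625
bad  = clustering_score([[s1, s4], [s2, s5]])  # negative score
```

A clustering that groups similar species scores higher than one that mixes them, which is exactly the behavior e is meant to reward.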

Conceptual Clustering: Definitions & key-points -2

Performance measures:
  - Ability to predict all, or the important, attributes.
  - Comprehensibility and utility of the induced clusters.
  - Ability to generate a hierarchy: a recognition process over structured
    descriptions.

ML contribution to clustering:
  - Representation: symbolic variables.
  - Automatic characterization of the induced clusters.

Conceptual Clustering: Approaches

CLUSTER [Michalski & Stepp, 1983]
  - STAR generation; hierarchical organization, where branches are
    distinguishing characterizations.
  - Hill-climbing with backtracking.
  - Pre-specified number of clusters.

AUTOCLASS [Cheeseman et al., 1988]
  - Probability distributions of member values; Bayesian.
  - Finds the most probable partition of the instances, maximizing:

      p(θ, π | D, N) = p(θ, π | N) p(D | θ, π, N) / p(D | N)

  - Number of clusters not pre-specified.

COBWEB [Fisher, 1987]
  - Statistical measure of category (cluster) utility.
  - Number of clusters NOT pre-specified.
  - Incremental.

Conceptual Clustering: AQ/CLUSTER

Adapt AQ for conceptual clustering. AQ requires a classification of the
examples into POSitive and NEGative.

Given:
  - A collection of events, E.
  - The number of clusters desired, k.
  - The criterion of clustering quality, LEF.
Find:
  - A disjoint set of clusters over the collection of events that optimizes
    the given criterion of clustering quality.

AQ/CLUSTER: Terminology

Variables:
  - Nominal (categorical):  DOMAIN(Xi) = {v1, v2, ..., vm}
  - Linear (quantitative):  DOMAIN(Xi) = [vi .. vj]
  - Structured, e.g. a generalization tree for shape:

      shape
        oval:        circle, ellipse
        polygon
          3-sides:   triangle
          4-sides:   rectangle, trapezoid, square

Syntactic distance:  d(e1, e2) = SUM_i sd(x1,i, x2,i)

Relational statement [Xi : Ri], where Ri is the reference of a variable,
e.g. [length > 2], [color = blue OR red], [weight = 2..5].
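The syntactic distance above can be sketched in code. The slide does not define the per-variable measure `sd`, so the choices here (0/1 mismatch for nominal variables, absolute difference for linear ones) are assumptions for illustration.

```python
def sd(v1, v2, linear=False):
    """Per-variable syntactic distance: 0/1 mismatch for nominal
    variables, absolute difference for linear (quantitative) ones."""
    if linear:
        return abs(v1 - v2)
    return 0 if v1 == v2 else 1

def syntactic_distance(e1, e2, linear_mask):
    """d(e1, e2) = SUM_i sd(x1,i, x2,i), summed over all variables."""
    return sum(sd(a, b, lin) for a, b, lin in zip(e1, e2, linear_mask))

# Two events over (color, length, n_sides); length is a linear variable:
e1 = ("blue", 2, 4)
e2 = ("red", 5, 4)
d = syntactic_distance(e1, e2, linear_mask=(False, True, False))  # 1 + 3 + 0
```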

Conceptual Clustering: Adapting STAR generation

1. k events (seeds) are selected, where k = the number of clusters wanted.
2. The star G(ei | E - {ei}) is generated for each seed event against the
   other events.
3. The complexes are modified to construct a disjoint cover that optimizes
   the LEF.
4. Is the termination condition satisfied?
5. Otherwise, choose new seeds:
   - If cluster quality improves, choose central events.
   - If cluster quality is not improving, choose border events.

Central events: those nearest the geometric mean of the set of events in the
cluster.

At the end one has a set of clusters and their descriptions (RULES). For each
cluster, perform the same procedure to build a hierarchy.

Adapting STAR generation for conceptual clustering: Disjoint covers

We have k covers, each covering one seed event and not the other k-1. From
the rest of the events in the given set, determine those covered by more than
one of the k covers (the Multiple Covered Event List, m-list). The size of
this list is a measure of cluster quality; if the m-list is empty, terminate.

Refunion (complex + events -> complex), for linear variables:

  e1 = (2, 3, 0, 1)   newly selected event
  e2 = (0, 2, 1, 1)   newly selected event
  c  = [X1 = 2..3][X2 = 4][X3 = 0][X4 = 2]
  c' = [X1 = 0..3][X2 = 2..4][X3 = 0..1][X4 = 1..2]

For structured variables: climb the generalization tree.

Quality: sparseness of a cluster

  r(c) = 1 - p(c) / (p(c) + s(c))

  p(c): # events covered by c
  s(c): # events covered from E - c

MINIMIZE total sparseness; MAXIMIZE simplicity (as few attributes as
possible).
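The refunion step on the slide can be reproduced directly for linear variables: each interval of the complex is widened just enough to cover the new events. The interval representation as (lo, hi) pairs is an implementation choice.

```python
def refunion(complex_, *events):
    """Widen each [Xi = lo..hi] interval of a complex just enough to also
    cover the given events (linear variables only)."""
    out = []
    for i, (lo, hi) in enumerate(complex_):
        vals = [e[i] for e in events] + [lo, hi]
        out.append((min(vals), max(vals)))
    return out

# The slide's example: c = [X1=2..3][X2=4][X3=0][X4=2] with events e1, e2
c  = [(2, 3), (4, 4), (0, 0), (2, 2)]
e1 = (2, 3, 0, 1)
e2 = (0, 2, 1, 1)
c2 = refunion(c, e1, e2)  # [X1=0..3][X2=2..4][X3=0..1][X4=1..2]
```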

AQ/CLUSTER: Flow Chart

Given: E, a set of data events; k, the number of clusters; LEF, the
clustering quality criterion.

  (1) Choose initial k seed events from E.
  (2) Determine a star for each seed against the other seed events.
  (3) By appropriately modifying and selecting complexes from the stars,
      construct a disjoint cover of E that optimizes the LEF criterion.
  (4) Is the termination criterion satisfied? If so, stop.
  (5) Is the clustering quality improving?
      - Yes: choose k new central events; go to (2).
      - No:  choose k new border events; go to (2).
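A heavily simplified sketch of this loop, under stated assumptions: linear variables only, seed-nearest assignment in place of real star generation, bounding intervals in place of LEF-driven complex selection, and re-seeding always by central events. It shows the shape of the iteration, not the actual AQ/CLUSTER algorithm.

```python
def dist(a, b):
    """City-block distance between two linear events."""
    return sum(abs(x - y) for x, y in zip(a, b))

def bounding_complex(events):
    """Describe a cluster as one [Xi = lo..hi] interval per variable."""
    return [(min(e[i] for e in events), max(e[i] for e in events))
            for i in range(len(events[0]))]

def central_event(events):
    """The event nearest the attribute-wise mean of its cluster."""
    mean = [sum(e[i] for e in events) / len(events)
            for i in range(len(events[0]))]
    return min(events, key=lambda e: dist(e, mean))

def cluster(events, seeds, rounds=5):
    """Assign events to the nearest seed, describe each cluster by a
    bounding complex, then re-seed with central events and repeat."""
    for _ in range(rounds):
        groups = {s: [] for s in seeds}
        for e in events:
            groups[min(seeds, key=lambda s: dist(e, s))].append(e)
        seeds = [central_event(g) for g in groups.values() if g]
    groups = [g for g in groups.values() if g]
    return groups, [bounding_complex(g) for g in groups]

events = [(0, 0), (1, 0), (0, 1), (8, 8), (9, 8), (8, 9)]
groups, complexes = cluster(events, seeds=[(0, 0), (9, 8)])
```

Each returned complex is a symbolic description of its cluster, which is the point that distinguishes conceptual from purely numerical clustering.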

COBWEB: The Basics

  - Representation: attribute-value pairs.
  - Search: heuristic statistical evaluation measure.
  - Hierarchical clustering: different state representations.
  - Method: operators to build classification schemes.
  - Control: a high-level algorithmic process applying the evaluation
    measure and forming states by applying the operators.

Ability to identify basic-level categories. Basic-level categories (e.g.,
Bird) are retrieved more quickly than either more general (e.g., Animal) or
more specific (e.g., Robin) ones:

  Animal
    Bird    <- a basic-level category
      Robin

This gives an efficient recognition process and better classification, and
maximizes inference-related capabilities.

COBWEB: Towards a measure of Category Utility

Trade-off between intra-class similarity and inter-class dissimilarity.

An index for intra-class similarity: maximize

  P(Ai = Vij | Ck)

a continuous analogue of logical necessity. The higher this probability:
  - the more necessary Ai = Vij is for predicting Ck, i.e. the more
    necessary it is that objects sharing this attribute-value pair are in
    the same category;
  - the greater the proportion of class members sharing this
    attribute-value pair.

COBWEB: Towards a measure of Category Utility -2

Trade-off between intra-class similarity and inter-class dissimilarity.

An index for inter-class dissimilarity: maximize

  P(Ck | Ai = Vij)

a continuous analogue of logical sufficiency. The higher this probability:
  - the more sufficient Ai = Vij is for predicting Ck;
  - the less this attribute-value pair predicts the other classes that
    share it.
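Both indices are simple conditional probabilities and can be estimated from counts. The two-cluster data below (birds vs. mammals over hypothetical body-cover and locomotion attributes) is made up for illustration.

```python
def necessity(cluster, attr, value):
    """P(Ai = Vij | Ck): the proportion of members of cluster Ck sharing
    the attribute-value pair (the analogue of logical necessity)."""
    return sum(e[attr] == value for e in cluster) / len(cluster)

def sufficiency(clusters, k, attr, value):
    """P(Ck | Ai = Vij): of all events with the value, the fraction that
    falls in cluster k (the analogue of logical sufficiency)."""
    total = sum(e[attr] == value for c in clusters for e in c)
    return sum(e[attr] == value for e in clusters[k]) / total

# Hypothetical two-cluster partition over (body cover, locomotion) events:
birds   = [("feathers", "flies"), ("feathers", "flies"),
           ("feathers", "walks")]
mammals = [("hair", "walks"), ("hair", "walks")]
clusters = [birds, mammals]

p_nec = necessity(birds, 0, "feathers")          # 1.0: all birds share it
p_suf = sufficiency(clusters, 0, 0, "feathers")  # 1.0: only birds have it
```

"feathers" is both fully necessary and fully sufficient for the bird cluster here; "walks" would score lower on both, since it neither covers all birds nor excludes mammals.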

Category Utility: Final Definition

  CU({C1, C2, ..., Cn}) =
      SUM_k P(Ck) [ SUM_i SUM_j P(Ai = Vij | Ck)^2
                    - SUM_i SUM_j P(Ai = Vij)^2 ] / n

The first double sum is the expected number of attribute values correctly
guessed given Ck; the second is the expected number correctly guessed
without any class knowledge.

The special case of irrelevant attributes: if Ai = Vij is independent of
class membership, then P(Ai = Vij | Ck) = P(Ai = Vij). If this holds for
all values j, the attribute contributes 0 to CU: Ai is irrelevant.
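The definition above translates almost line by line into code; the probabilities are estimated as frequencies over the events. The species events are reused from the taxonomy example.

```python
def category_utility(clusters):
    """CU({C1,...,Cn}) = (1/n) * SUM_k P(Ck) *
       [ SUM_ij P(Ai=Vij|Ck)^2 - SUM_ij P(Ai=Vij)^2 ]."""
    events = [e for c in clusters for e in c]
    n_attrs = len(events[0])

    def sq_sum(ev):
        # SUM_i SUM_j P(Ai = Vij)^2, estimated from the events in ev
        total = 0.0
        for i in range(n_attrs):
            counts = {}
            for e in ev:
                counts[e[i]] = counts.get(e[i], 0) + 1
            total += sum((v / len(ev)) ** 2 for v in counts.values())
        return total

    base = sq_sum(events)
    return sum(len(c) / len(events) * (sq_sum(c) - base)
               for c in clusters) / len(clusters)

# The species events from the earlier taxonomy example:
s1 = ("hair", "four", "regulated", "internal")
s2 = ("feathers", "four", "regulated", "internal")
s4 = ("moist", "three", "unregulated", "external")
s5 = ("scales", "two", "unregulated", "external")

cu_good = category_utility([[s1, s2], [s4, s5]])   # 0.8125
cu_bad  = category_utility([[s1, s4], [s2, s5]])   # 0.1875
```

The partition that matches the taxonomy scores far higher, which is what lets COBWEB use CU as its search heuristic.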

COBWEB: The Operators

Operator 1: placing an object in an existing node (sub-cluster of the
hierarchy).
  - Place the object in each existing sub-cluster in turn and compute CUi
    (i = 1, ..., # sub-clusters so far).
  - Identify the BEST CUi; the object is placed in the corresponding node.

Operator 2: creating a new class (sub-cluster).
  - Apply Operator 1 to get CU[object in BEST host] (reuse the previous
    results).
  - Compute CU[existing sub-clusters + NewNode].
  - If CU[existing sub-clusters + NewNode] > CU[object in BEST host],
    create the NEW NODE (new class / sub-cluster).
  - This is why the number of clusters is not predefined.

COBWEB: The Operators -2

Operator 3: merging two nodes (moving up one level), needed because
operators 1 and 2 are biased by the initial order in which the objects
arrive.
  - Do it for all node pairs.
  - The MERGED-NEW-NODE gets the summed probabilities (counts) of the two
    merged nodes.
  - If CU[with the merged node] improves the clustering quality, keep the
    merge.

COBWEB: The Operators -3

Operator 4: splitting a node into its children (moving down one level),
again because operators 1 and 2 are biased by the initial input order.
  - Do it for all nodes.
  - The SPLIT-NEW-NODEs are the children of the split node, which replace
    it at its level.
  - If CU[with the split nodes] improves on CU[with the unsplit node], keep
    the split.

COBWEB: The algorithmic process (the CONTROL)

Search: hill-climbing with backtracking. N = node; I = new instance.

  Train(N, I):
    if leaf(N):
      create a sub-tree(N, I)
    else:
      Incorporate(I, N)            ; update N's probabilities
      compute the score of placing I in each child of N
      N1    = child with the highest score       (HIGH)
      N2    = child with the second-highest score
      NEW   = score of placing I as a new child of N
      MERGE = score of merging N1 and N2 and putting I in the merged node
      SPLIT = score of splitting N1 into its children
      if the highest score is:
        HIGH:  Train(N1, I)
        NEW:   add I as a new child of N
        MERGE: Train(merge(N1, N2, N), I)
        SPLIT: Train(split(N1, N), I)

  Repeat until all instances have been presented.
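Under the simplifying assumptions that nodes store only the instances they cover (no probability tables) and that the MERGE and SPLIT branches are left out, the control loop might be sketched as below. It keeps only operators 1 and 2, so it is a reduced COBWEB, not Fisher's full algorithm.

```python
class Node:
    """A node of the tree: the instances it covers plus its children."""
    def __init__(self, inst=None):
        self.instances = [inst] if inst is not None else []
        self.children = []

def cu(children):
    """Category utility of the partition induced by a list of child nodes
    (same definition as on the Category Utility slide)."""
    clusters = [c.instances for c in children if c.instances]
    if len(clusters) < 2:
        return 0.0
    events = [e for c in clusters for e in c]
    def sq(ev):
        # SUM over attributes i and values j of P(Ai = Vij)^2 within ev
        total = 0.0
        for i in range(len(events[0])):
            counts = {}
            for e in ev:
                counts[e[i]] = counts.get(e[i], 0) + 1
            total += sum((v / len(ev)) ** 2 for v in counts.values())
        return total
    base = sq(events)
    return sum(len(c) / len(events) * (sq(c) - base)
               for c in clusters) / len(clusters)

def train(node, inst):
    """Incremental placement of one instance, using only operator 1
    (best existing host) and operator 2 (new singleton class)."""
    node.instances.append(inst)
    if not node.children:                       # leaf node
        if len(node.instances) > 1:             # grow it into a sub-tree
            node.children = [Node(e) for e in node.instances]
        return
    scores = []                                 # operator 1: try each host
    for child in node.children:
        child.instances.append(inst)
        scores.append(cu(node.children))
        child.instances.pop()
    best = max(range(len(scores)), key=scores.__getitem__)
    new_score = cu(node.children + [Node(inst)])  # operator 2: new class
    if new_score > scores[best]:
        node.children.append(Node(inst))
    else:
        train(node.children[best], inst)

root = Node()
for e in [("hair", "four"), ("feathers", "four"),
          ("scales", "two"), ("hair", "four")]:
    train(root, e)
# root ends up with three sub-clusters; the two ("hair", "four") objects
# share one of them.
```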

COBWEB: The different Uses

CLASSIFICATION:
  1. Eliminate the class attribute from the data.
  2. Form the COBWEB classification tree.
  3. Pass each unseen (test) example through the tree until it reaches a
     leaf.
  4. Use the best-host node to classify the case: take as its class the
     class with the highest number of objects in the node.

INFERRING ATTRIBUTE VALUES:
  5. For an unknown value in a test example, predict it from the attribute
     values of the objects in the best-host node: the most frequent value.
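Both uses reduce to a majority vote over the objects stored in the best-host node. The helper names and the example host node below are hypothetical; they assume the tree traversal has already delivered the node's objects.

```python
from collections import Counter

def majority_class(node_instances, class_index):
    """Classify by the most frequent class label among the objects in
    the best-host node reached by the test example."""
    return Counter(e[class_index] for e in node_instances).most_common(1)[0][0]

def infer_value(node_instances, attr_index):
    """Predict a missing attribute value as the most frequent value of
    that attribute in the best-host node."""
    return Counter(e[attr_index] for e in node_instances).most_common(1)[0][0]

# Suppose a test example ended up in a node containing these objects
# (last position is the held-out class label):
host = [("hair", "four", "mammal"), ("hair", "four", "mammal"),
        ("feathers", "four", "bird")]

label = majority_class(host, class_index=2)  # "mammal"
cover = infer_value(host, attr_index=0)      # "hair"
```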

COBWEB: Incrementality and its evaluation

Criteria for evaluating an incremental system (like COBWEB):
  - COST of incorporating a single instance.
  - QUALITY of the learned classification tree.
  - Number of objects needed to STABILIZE the classification tree.

  COST = O(B^2 * log_B(n) * A * V)

  B:        average branching factor
  log_B(n): maximum depth
  n:        number of objects classified so far
  A:        number of attributes
  V:        mean number of values per attribute