Sparse Gaussian Graphical Models with Unknown Block Structure


Sparse Gaussian Graphical Models with Unknown Block Structure
Department of Computer Science, University of British Columbia

Outline
Introduction
Related Work: Graphical Lasso; Group L1 Penalized Maximum Likelihood; Sparse Dependency Networks
Unknown Block Structure: Model; Variational Inference
Experiments and Results
Conclusions

Introduction: Covariance Estimation
Estimating the covariance matrix Σ of a Gaussian distribution is known to be difficult when the number of data cases N is low relative to the number of data dimensions D.
[Figure: illustrative Gaussians for D = 1, D = 2, and D = 3.]

Introduction: Covariance Selection
In 1972, Dempster proposed clamping some of the elements of the precision matrix Ω = Σ⁻¹ to zero as a way of controlling complexity and deriving better covariance estimates.
Zeros in the precision matrix correspond to absent edges in the Gaussian Graphical Model (GGM), so favoring sparse precision matrices corresponds to favoring sparse GGMs.
[Figure: a precision matrix over X, Y, Z with zeros in the (X, Z) entries, next to the corresponding GGM in which X and Z are not connected.]
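As a minimal sketch of this correspondence (a toy example, not from the paper), the three-variable chain X - Y - Z can be written down directly: the precision matrix is sparse, yet its inverse, the covariance, is dense.

```python
import numpy as np

# Toy example mirroring the slide's X, Y, Z graph: edges X-Y and Y-Z,
# no X-Z edge, so the (X, Z) precision entries are exactly zero.
omega = np.array([[2.0, 0.6, 0.0],
                  [0.6, 2.0, 0.6],
                  [0.0, 0.6, 2.0]])

sigma = np.linalg.inv(omega)  # the covariance is dense even though omega is sparse

# The zero pattern of the precision matrix gives the GGM edge set.
edges = [(i, j) for i in range(3) for j in range(i + 1, 3)
         if abs(omega[i, j]) > 1e-12]
print(edges)        # [(0, 1), (1, 2)]: X-Y and Y-Z, but no X-Z edge
print(sigma[0, 2])  # nonzero: X and Z are marginally dependent, just not conditionally
```

The nonzero sigma[0, 2] shows why sparsity is imposed on the precision matrix rather than on the covariance.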

Introduction: Group Sparsity
For some kinds of data, the variables can be clustered or grouped into types that share similar connectivity or correlation patterns. If we can infer these groups, we can use them to regularize precision matrix estimation both when N is comparable to D and when N < D.

Introduction: Problem Statement
The problem we address in this work is how to estimate sparse, block-structured Gaussian precision matrices when the blocks are not known a priori.


Related Work: Graphical Lasso
The graphical lasso is a technique for sparse precision estimation based on independently penalizing the L1 norm of each precision matrix entry [Banerjee et al.; Yuan & Lin]:
  maximize_Ω  log det(Ω) − tr(SΩ) − ν Σ_i Ω_ii − λ Σ_{i≠j} |Ω_ij|
S: empirical covariance matrix.
ν: diagonal regularization parameter.
λ: off-diagonal regularization parameter.
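A quick way to see the graphical lasso in action (illustrative only: this uses scikit-learn's solver, not the implementations cited on this slide, and its single `alpha` plays the role of the off-diagonal penalty λ):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Sample from a Gaussian whose true precision is a sparse chain X - Y - Z.
rng = np.random.default_rng(0)
true_omega = np.array([[2.0, 0.6, 0.0],
                       [0.6, 2.0, 0.6],
                       [0.0, 0.6, 2.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(true_omega), size=500)

# L1-penalized maximum likelihood estimate of the precision matrix.
omega_hat = GraphicalLasso(alpha=0.1).fit(X).precision_
print(np.round(omega_hat, 2))  # the weak (0, 2) entry is shrunk toward zero
```

With enough data and a suitable penalty, the estimate recovers the chain's zero pattern while the empirical covariance alone would not.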

Related Work: Group Graphical Lasso
The graphical lasso has been extended to group sparsity by penalizing the norm of each block of the precision matrix, given a known grouping of the variables [Duchi et al.; Schmidt et al.]:
  maximize_Ω  log det(Ω) − tr(SΩ) − Σ_{k ≤ l} λ_kl ||Ω_{G_k G_l}||_{p_kl}
G_k: set of variables in group k.
λ_kl: penalty parameter for entries between groups k and l.
p_kl: norm on entries between groups k and l; Schmidt et al. use p_kl = 1 within groups and p_kl = 2 between groups.
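The block-wise penalty can be sketched in a few lines. The function name and the `groups`/`lam` layout below are hypothetical choices for illustration; the norms follow Schmidt et al. (p_kl = 1 within groups, p_kl = 2 between groups).

```python
import numpy as np

def group_penalty(omega, groups, lam):
    """Group L1/L2 penalty on a precision matrix given a known partition.

    groups: dict mapping group id -> list of variable indices.
    lam:    dict of dicts, lam[k][l] for k <= l, the per-block penalty weight.
    """
    total = 0.0
    ids = sorted(groups)
    for a, k in enumerate(ids):
        for l in ids[a:]:
            block = omega[np.ix_(groups[k], groups[l])]
            if k == l:
                total += lam[k][l] * np.abs(block).sum()          # p_kl = 1 within
            else:
                total += lam[k][l] * np.sqrt((block ** 2).sum())  # p_kl = 2 between
    return total

# Toy usage: identity precision, two groups of two variables, unit weights.
omega = np.eye(4)
groups = {0: [0, 1], 1: [2, 3]}
lam = {0: {0: 1.0, 1: 1.0}, 1: {1: 1.0}}
print(group_penalty(omega, groups, lam))  # 2 + 2 within-group L1, 0 between -> 4.0
```

The L2 norm on each between-group block shrinks the whole block toward zero together, which is what produces block sparsity.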

Related Work: Sparse Dependency Nets
In a sparse dependency net we penalize the L1 norm of the linear regression weights for each node j regressed on every other node i ≠ j [Meinshausen & Bühlmann]:
  min_{w_j}  Σ_n (x_nj − Σ_{i≠j} w_ji x_ni)² + λ Σ_{i≠j} |w_ji|
w_ji: linear regression weight for node j given node i.
x_nj: value of data dimension j for data case n.
x_{n,−j}: values of all data dimensions but j for data case n.
λ: penalty parameter.
We can then extract a graph from the nonzero weights and fit a GGM using IPF or gradient-based optimization.
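A sketch of this neighborhood-selection idea, using scikit-learn's Lasso for each per-node regression; the symmetrizing "OR" rule below is one common convention, not necessarily the slide's, and `neighborhood_selection` is a name chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, lam):
    """Lasso-regress each node j on all other nodes; nonzero weights give edges."""
    n, d = X.shape
    W = np.zeros((d, d))
    for j in range(d):
        others = [i for i in range(d) if i != j]
        W[j, others] = Lasso(alpha=lam).fit(X[:, others], X[:, j]).coef_
    # "OR" rule: keep edge (i, j) if either regression gives a nonzero weight.
    return (np.abs(W) + np.abs(W.T)) > 0

# Data from a chain graph X - Y - Z; the method should recover its edges.
rng = np.random.default_rng(0)
true_omega = np.array([[2.0, 0.6, 0.0],
                       [0.6, 2.0, 0.6],
                       [0.0, 0.6, 2.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(true_omega), size=500)
adj = neighborhood_selection(X, lam=0.1)
print(adj.astype(int))
```

The resulting adjacency matrix is the graph one would then fix before fitting the GGM parameters.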


Unknown Block Structure: Overview
A two-stage approach to precision estimation:
1. Use a hierarchical dependency-network-based model to infer a grouping of the variables.
2. Fix the grouping and estimate the precision matrix using the group L1/L2 method of Schmidt et al.
Using the group graphical lasso to estimate the precision matrix gives us block sparsity when it is well supported by the data, and block shrinkage in general.
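To make the two-stage shape concrete, here is a toy stand-in: stage 1 is replaced by spectral clustering of the absolute correlation matrix (NOT the authors' hierarchical dependency network model), purely to show how an inferred grouping of the variables would be produced and handed to stage 2.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Two latent factors induce two blocks of correlated variables: {0,1,2} and {3,4,5}.
rng = np.random.default_rng(0)
z1 = rng.standard_normal((200, 1))
z2 = rng.standard_normal((200, 1))
X = np.hstack([z1 + 0.3 * rng.standard_normal((200, 3)),
               z2 + 0.3 * rng.standard_normal((200, 3))])

# Stage-1 stand-in: cluster the variables using |correlation| as an affinity.
C = np.abs(np.corrcoef(X, rowvar=False))
labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                            random_state=0).fit_predict(C)
print(labels)  # group labels per variable, ready to feed the stage-2 group estimator
```

In the actual method, stage 1 is the hierarchical dependency network model inferred with variational Bayes; only the interface (variables in, group labels out) is the same.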

Unknown Block Structure: Model
[Figure: the generative model, combining a stochastic block model, a dependency network, and a spike-and-slab style prior.]


Unknown Block Structure: Inference
Variational Bayes Approximation: We use a fully factorized variational Bayes approximation for learning.

Unknown Block Structure: Inference
Variational Bayes Learning Algorithm:
[Figure: pseudocode for the variational Bayes learning algorithm.]

Unknown Block Structure: Inference
Extensions to Basic Variational Inference:
The variational updates for the cluster indicators are tightly coupled. To help get around this problem, we introduce explicit cluster-splitting steps based on graph cuts.
For large problems, the dependency network weight updates are very costly, at O(D⁴) per iteration. We use a fast adaptive variational update schedule to help with this problem.


Experiments: Methods
T: Tikhonov regularization.
IL1: independent L1 penalized maximum likelihood (i.e., the graphical lasso).
KGL1: group L1/L2 penalized maximum likelihood with known groups.
UGL1: group L1/L2 penalized maximum likelihood with groups inferred by our hierarchical dependency network.
UGL1F: same as UGL1, but using the fast update schedule.

Experiments: Empirical Protocol
We used fixed hyperparameters for the hierarchical dependency network to infer the groups for UGL1 and UGL1F.
We report five-fold cross-validation test log likelihood estimates (relative to the Tikhonov baseline) as a function of the regularization parameter λ.
We present results on two data sets.
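The evaluation metric can be sketched as follows. `avg_test_loglik` and `tikhonov_precision` are hypothetical helper names chosen here, and the zero-mean assumption is ours; the slides do not specify these details.

```python
import numpy as np

def avg_test_loglik(X_test, omega):
    """Average per-case log likelihood of zero-mean Gaussian data under precision omega."""
    d = X_test.shape[1]
    _, logdet = np.linalg.slogdet(omega)
    quad = np.einsum('ni,ij,nj->n', X_test, omega, X_test)  # x' omega x per case
    return 0.5 * (logdet - d * np.log(2 * np.pi)) - 0.5 * quad.mean()

def tikhonov_precision(X_train, eps=0.1):
    """Tikhonov baseline: invert the regularized empirical covariance S + eps*I."""
    S = np.cov(X_train, rowvar=False)
    return np.linalg.inv(S + eps * np.eye(S.shape[0]))

# Toy protocol: one train/test split instead of the slides' five folds.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
train, test = X[:80], X[80:]

baseline = avg_test_loglik(test, tikhonov_precision(train))
candidate = avg_test_loglik(test, np.eye(5))  # stand-in for a competing estimator
relative = candidate - baseline               # score relative to the Tikhonov baseline
print(relative)
```

Repeating this over five folds and a grid of λ values yields the curves shown in the results slides.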

Results: CMU Motion Capture Data Set (N = {25, 50, 75, 100}, D = 60)

Results: CMU Test Log Likelihood, N = 25
[Plot: test log likelihood vs. λ for known groups, inferred groups, and no groups.]

Results: CMU Test Log Likelihood, N = 50
[Plot: the same three-way comparison at N = 50.]

Results: CMU Test Log Likelihood, N = 75
[Plot: the same comparison at N = 75.]

Results: CMU Test Log Likelihood, N = 100
[Plot: the same comparison at N = 100.]

Results: CMU Inferred Structures, N = 50

Results: CMU Estimated Precision Matrix

Results: Gasch Genes Data Set (N = 174, D = 667)

Results: Genes Test Set Log Likelihood

Results: Genes Inferred Structures

Results: Genes Estimated Precision


Conclusions and Future Work
We have demonstrated a method for estimating sparse, block-structured precision matrices when the blocks are not known a priori.
The method uses variational inference in a hierarchical dependency network model to estimate the blocks, combined with convex optimization to estimate the precision matrix given the blocks.
In work appearing at UAI '09, we present an alternative approach based on converting the graphical lasso and group L1/L2 penalty functions into distributions on positive definite matrices.

The End