Constraint-based Bayesian Network Learning with Permutation Tests

Constraint-based Bayesian Network Learning with Permutation Tests
Marco Scutari (marco.scutari@stat.unipd.it)
Adriana Brogini (brogini@stat.unipd.it)
Department of Statistical Sciences
June 15, 2010

Bayesian networks: definitions

A Bayesian network B = (G, P) is a graphical model composed of:
- a directed acyclic graph G = (U, A), where each node represents a random variable X ∈ U and the arcs in A specify the conditional dependence structure of U;
- a global probability distribution P(U) defined over the variable set U, which factorizes into a set of local probability distributions:
$$P(\mathbf{U}) = \prod_{X_i \in \mathbf{U}} P(X_i \mid \Pi_{X_i}),$$
where $\Pi_{X_i}$ is the set of parents of the node $X_i$.
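For concreteness, here is a minimal Python sketch of this factorization on a hypothetical three-node network A → C ← B with hand-made CPTs; the network, names, and numbers are illustrative, not part of the talk:

```python
# Toy factorization P(U) = prod_i P(X_i | Pi_{X_i}) for A -> C <- B.
# CPTs map (value, parent values...) -> probability; all numbers are made up.
cpts = {
    "A": {("a0",): 0.6, ("a1",): 0.4},                       # P(A), no parents
    "B": {("b0",): 0.7, ("b1",): 0.3},                       # P(B), no parents
    "C": {("c0", "a0", "b0"): 0.9, ("c1", "a0", "b0"): 0.1,  # P(C | A, B)
          ("c0", "a0", "b1"): 0.5, ("c1", "a0", "b1"): 0.5,
          ("c0", "a1", "b0"): 0.4, ("c1", "a1", "b0"): 0.6,
          ("c0", "a1", "b1"): 0.2, ("c1", "a1", "b1"): 0.8},
}
parents = {"A": [], "B": [], "C": ["A", "B"]}

def joint_probability(assignment):
    """P(U = assignment) as the product of the local distributions."""
    p = 1.0
    for node, pa in parents.items():
        key = (assignment[node],) + tuple(assignment[q] for q in pa)
        p *= cpts[node][key]
    return p

print(joint_probability({"A": "a1", "B": "b0", "C": "c1"}))  # 0.4 * 0.7 * 0.6 = 0.168
```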

Learning Bayesian networks

Model selection for a Bayesian network (usually called learning) is performed in two steps:
1. structure learning: finding a graph structure that encodes the conditional independence (CI) relationships present in the data;
2. parameter learning: fitting the parameters of each local distribution given the graph structure selected in the previous step.

Most modern structure learning algorithms use conditional independence tests to identify CI constraints from the data (constraint-based algorithms), sometimes together with goodness-of-fit scores (hybrid algorithms).
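The constraint-based idea can be sketched in a few lines: start from the complete undirected graph and drop an edge whenever some conditioning set renders its endpoints independent. The sketch below is deliberately naive (it tries all small conditioning sets, whereas real algorithms such as PC or MMHC restrict them far more carefully); `ci_test` is a hypothetical callable returning a p-value:

```python
from itertools import combinations

def learn_skeleton(variables, ci_test, alpha=0.05, max_cond=2):
    """Naive constraint-based skeleton search (illustrative only)."""
    edges = {frozenset(p) for p in combinations(variables, 2)}
    for edge in list(edges):
        x, y = tuple(edge)
        others = [v for v in variables if v not in edge]
        for size in range(max_cond + 1):
            # Drop the edge if X and Y test independent given some Z.
            if any(ci_test(x, y, z) > alpha for z in combinations(others, size)):
                edges.discard(edge)
                break
    return edges
```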

Parametric vs permutation tests for structure learning

Proofs of correctness of structure learning algorithms assume that the conditional independence tests commit neither type I nor type II errors [6, 8, 10]. This makes the use of parametric tests problematic because:
- most of them are asymptotic or approximate, yet they are often applied in situations where convergence is problematic (high-dimensional data, "small n, large p" settings);
- they require distributional assumptions that are difficult to justify and are rarely satisfied by real-world data.

Permutation tests suffer from neither of these limitations [7], and therefore result in more effective model selection.
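The following Python sketch shows how such a permutation CI test can be built (a minimal illustration of my own, not the authors' implementation): the observed statistic is compared against its distribution under permutations of X within each stratum of Z, which preserves the X–Z and Y–Z associations while destroying any residual X–Y dependence given Z.

```python
import numpy as np

def g2_statistic(x, y, z):
    """Conditional G^2 = 2 * sum O * log(O / E), pooled over the strata of z."""
    g2 = 0.0
    for s in np.unique(z):
        xs, ys = x[z == s], y[z == s]
        n = len(xs)
        for a in np.unique(xs):
            for b in np.unique(ys):
                o = np.sum((xs == a) & (ys == b))
                e = np.sum(xs == a) * np.sum(ys == b) / n
                if o > 0:
                    g2 += 2 * o * np.log(o / e)
    return g2

def permutation_ci_test(x, y, z, B=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = g2_statistic(x, y, z)
    exceed = 0
    for _ in range(B):
        xp = x.copy()
        for s in np.unique(z):
            idx = np.where(z == s)[0]
            xp[idx] = rng.permutation(xp[idx])   # shuffle X within each stratum
        if g2_statistic(xp, y, z) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + B)                # empirical p-value

# Example: X and Y each depend on Z but are independent given Z,
# so the p-value should be large.
rng = np.random.default_rng(1)
z = rng.integers(0, 2, 200)
x = np.where(rng.random(200) < 0.8, z, 1 - z)    # X mostly copies Z
y = np.where(rng.random(200) < 0.8, z, 1 - z)    # Y mostly copies Z
print(permutation_ci_test(x, y, z))
```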

Model validation: experimental setting

The impact of permutation tests on Bayesian network learning has been evaluated for the Max-Min Hill-Climbing (MMHC) hybrid algorithm [9], one of the best performers to date, which has been extensively tested over a wide variety of data sets. In particular:
- data sets have been generated from the ALARM network [2], which is often used as a benchmark for testing structure learning algorithms. ALARM contains 37 discrete nodes, for a total of 509 parameters;
- the $G^2$ log-likelihood ratio test [1] has been used as the CI test, with an α = 0.05 threshold. $G^2$ is also equivalent to the mutual information CI test up to a constant [5].
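For contrast with the permutation sketch above, here is a self-contained sketch of the parametric version of the same $G^2$ test, which refers the pooled statistic to its asymptotic chi-squared distribution with $(|X| - 1)(|Y| - 1)|Z|$ degrees of freedom; this asymptotic approximation is exactly what degrades on the sparse tables produced by small samples or large conditioning sets. All names are mine:

```python
import numpy as np
from scipy.stats import chi2

def parametric_g2_test(x, y, z, alpha=0.05):
    """Asymptotic conditional G^2 test for discrete x, y given z."""
    xi, yi, zi = (np.unique(v, return_inverse=True)[1] for v in (x, y, z))
    nx, ny, nz = xi.max() + 1, yi.max() + 1, zi.max() + 1
    counts = np.zeros((nz, nx, ny))
    np.add.at(counts, (zi, xi, yi), 1)            # 3-way contingency table
    g2 = 0.0
    for t in counts:                              # one 2-way table per stratum of z
        expected = np.outer(t.sum(axis=1), t.sum(axis=0)) / t.sum()
        mask = t > 0
        g2 += 2 * np.sum(t[mask] * np.log(t[mask] / expected[mask]))
    dof = (nx - 1) * (ny - 1) * nz                # standard df for this CI test
    p_value = chi2.sf(g2, dof)
    return p_value, p_value <= alpha              # reject CI when p <= alpha
```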

Model validation: goodness of fit

Goodness of fit has been measured with the following scores:
- the Bayesian Information Criterion (BIC) [4], a penalized likelihood score;
- the Bayesian Dirichlet equivalent (BDe) score [3], a posterior score based on a Dirichlet distribution with a uniform prior;
- the Structural Hamming Distance (SHD) [9], an extension of the Hamming distance measure for undirected graphs.

Each score has been computed on 4 sets of pairs of Bayesian networks learned from samples of different sizes (50 networks for each size), using the parametric and permutation implementations of the $G^2$ CI test.
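To make the SHD concrete, here is a simplified sketch that counts missing, extra, and wrongly oriented arcs between two DAGs; note that the published SHD [9] is defined on equivalence classes (CPDAGs), so this DAG-level version is a simplification of my own:

```python
def shd(learned, true):
    """Simplified Structural Hamming Distance between two DAGs,
    each given as a set of directed arcs (tail, head)."""
    learned_skel = {frozenset(a) for a in learned}
    true_skel = {frozenset(a) for a in true}
    missing_or_extra = len(learned_skel ^ true_skel)     # symmetric difference
    shared = learned_skel & true_skel
    reversed_arcs = sum(1 for a in learned
                        if frozenset(a) in shared and a not in true)
    return missing_or_extra + reversed_arcs

true_dag = {("A", "C"), ("B", "C")}
learned_dag = {("C", "A"), ("B", "C"), ("B", "A")}
print(shd(learned_dag, true_dag))   # 1 extra arc + 1 reversed arc = 2
```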

Effect on the BIC score of fitted networks
[Plot: relative BIC improvement (0.00 to 0.10) against sample size (200, 500, 1000, 5000).]

Effect on the BDe score of fitted networks
[Plot: relative BDe improvement (0.00 to 0.15) against sample size (200, 500, 1000, 5000).]

Effect on the BIC score, predictive goodness of fit
[Plot: relative BIC improvement (0.00 to 0.10) against sample size (200, 500, 1000, 5000).]

Effect on the BDe score, predictive goodness of fit
[Plot: relative BDe improvement (0.00 to 0.10) against sample size (200, 500, 1000, 5000).]

Effect on Structural Hamming Distance (SHD)
[Plot: relative SHD improvement against sample size (200, 500, 1000, 5000).]

Conclusions

The correctness of structure learning algorithms depends heavily on the performance of the underlying CI tests. Parametric tests are problematic in many of the real-world settings in which Bayesian networks are used ("small n, large p"). Model selection based on permutation tests consistently produces networks with higher BIC and BDe scores, for both small and moderately large sample sizes.

References

[1] A. Agresti. Categorical Data Analysis. Wiley, 2002.
[2] I. A. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks. In Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine, pages 247–256, 1989.
[3] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Machine Learning, 20(3):197–243, 1995. Available as Technical Report MSR-TR-94-09.
[4] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[5] S. Kullback. Information Theory and Statistics. Wiley, 1959.
[6] D. Margaritis. Learning Bayesian Network Model Structure from Data. PhD thesis, School of Computer Science, Carnegie Mellon University, May 2003. Available as Technical Report CMU-CS-03-153.
[7] F. Pesarin. Multivariate Permutation Tests with Applications in Biostatistics. Wiley, 2001.
[8] I. Tsamardinos, C. F. Aliferis, and A. Statnikov. Algorithms for Large Scale Markov Blanket Discovery. In Proceedings of the 16th International Florida Artificial Intelligence Research Society Conference, pages 376–381, 2003.
[9] I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65(1):31–78, 2006.
[10] T. S. Verma and J. Pearl. Equivalence and Synthesis of Causal Models. Uncertainty in Artificial Intelligence, 6:255–268, 1991.