Bayesian Networks (Structure) Learning. Machine Learning CSE546, Carlos Guestrin, University of Washington. November 25, 2013.

Review: Bayesian Networks
- Compact representation for probability distributions
- Exponential reduction in the number of parameters
- Fast probabilistic inference, as shown in the demo examples: compute P(X | e)
- Example network: Flu and Allergy are parents of Sinus; Sinus is a parent of Headache and Nose

Today: learn BN structure

Learning Bayes nets
- Data: x^(1), ..., x^(m)
- Two things to learn: the structure (graph) and the CPTs P(X_i | Pa_Xi) (parameters)

Learning the CPTs
- Given the structure, estimate each CPT from the data by maximum likelihood: for each discrete variable X_i,
  P̂(X_i = x | Pa_Xi = u) = Count(X_i = x, Pa_Xi = u) / Count(Pa_Xi = u)
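The counting step above can be sketched in a few lines. This is a minimal illustration, not the course's code: the function name `learn_cpt` and the data format (a list of dicts mapping variable names to values) are assumptions for the example.

```python
from collections import Counter

def learn_cpt(data, child, parents):
    """Maximum-likelihood CPT: P(child = x | parents = u) = Count(x, u) / Count(u)."""
    joint = Counter()   # counts of (parent assignment, child value)
    marg = Counter()    # counts of the parent assignment alone
    for row in data:
        u = tuple(row[p] for p in parents)
        joint[(u, row[child])] += 1
        marg[u] += 1
    return {(u, x): n / marg[u] for (u, x), n in joint.items()}

# Toy data for the Flu -> Sinus edge of the example network (hypothetical values).
data = [{"Flu": 1, "Sinus": 1}, {"Flu": 1, "Sinus": 1},
        {"Flu": 1, "Sinus": 0}, {"Flu": 0, "Sinus": 0}]
cpt = learn_cpt(data, "Sinus", ["Flu"])
print(cpt[((1,), 1)])  # 2/3: Sinus=1 in 2 of the 3 rows with Flu=1
```

Because the likelihood factorizes over variables, each CPT can be estimated independently with one pass of counting like this.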

Information-theoretic interpretation of maximum likelihood
Given the structure G, the log likelihood of the data decomposes as
  log P(D | θ̂, G) = m Σ_i Î(X_i ; Pa_Xi) − m Σ_i Ĥ(X_i)
where Î is the empirical mutual information between a node and its parents and Ĥ is the empirical marginal entropy. The entropy terms do not depend on the graph, so maximizing likelihood amounts to choosing parents that have maximal mutual information with each node.

Decomposable score
- The log data likelihood is a decomposable score: it decomposes over families in the BN (a node and its parents), which will lead to significant computational efficiency:
  Score(G : D) = Σ_i FamScore(X_i, Pa_Xi : D)
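Decomposability is easy to see in code: the total score is just a sum of per-family terms, each computable from counts over one node and its parents. A minimal sketch, with hypothetical helper names (`fam_score`, `bn_score`) and the same list-of-dicts data format assumed above:

```python
import math
from collections import Counter

def fam_score(data, child, parents):
    """Log-likelihood contribution of one family: sum over rows of log P_hat(child | parents)."""
    joint, marg = Counter(), Counter()
    for row in data:
        u = tuple(row[p] for p in parents)
        joint[(u, row[child])] += 1
        marg[u] += 1
    # Grouping identical rows: n copies of (u, x) contribute n * log(Count(x,u)/Count(u)).
    return sum(n * math.log(n / marg[u]) for (u, _), n in joint.items())

def bn_score(data, structure):
    """structure maps each node to its parent list; the score decomposes over families."""
    return sum(fam_score(data, x, pa) for x, pa in structure.items())

# Hypothetical data where B mostly follows A, so the edge A -> B should help.
data = [{"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}, {"A": 0, "B": 1}]
s_edge = bn_score(data, {"A": [], "B": ["A"]})
s_empty = bn_score(data, {"A": [], "B": []})
print(s_edge, s_empty)
```

The computational payoff is that a local change to the graph (adding or removing one parent) only requires recomputing a single family's term, not the whole score.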

How many trees are there?
- Exponentially many: by Cayley's formula there are n^(n-2) labeled trees on n nodes. Nonetheless, an efficient algorithm finds the optimal tree.

Scoring a tree 1: equivalent trees
- For tree-structured BNs the score depends only on the undirected skeleton: mutual information is symmetric, so every orientation of the same undirected tree gets the same score.

Scoring a tree 2: similar trees
- Each edge contributes its mutual information term independently, so trees that differ in one edge differ in score only by the terms for the edges that changed.

Chow-Liu tree learning algorithm, part 1
For each pair of variables X_i, X_j:
- Compute the empirical distribution: P̂(x_i, x_j) = Count(x_i, x_j) / m
- Compute the mutual information:
  Î(X_i, X_j) = Σ_{x_i, x_j} P̂(x_i, x_j) log [ P̂(x_i, x_j) / (P̂(x_i) P̂(x_j)) ]
Define a graph with nodes X_1, ..., X_n, where edge (i, j) gets weight Î(X_i, X_j).

Chow-Liu tree learning algorithm, part 2
- Optimal tree BN: compute the maximum-weight spanning tree of this graph
- Directions in the BN: pick any node as the root; breadth-first search from the root defines the edge directions

Structure learning for general graphs
- In a tree, a node has at most one parent
- Theorem: the problem of learning a BN structure with at most d parents per node is NP-hard for any fixed d > 1
- Most structure learning approaches therefore use heuristics; next, a quick look at the two simplest heuristics
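The two Chow-Liu steps (pairwise mutual information, then a maximum-weight spanning tree) fit in a short sketch. Everything here is illustrative: the function names, the Kruskal-with-union-find implementation, and the toy data are assumptions, and the returned tree is undirected (the slides' BFS rooting step is left out for brevity).

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical I(X_i; X_j) from joint and marginal counts."""
    m = len(data)
    pij, pi, pj = Counter(), Counter(), Counter()
    for row in data:
        pij[(row[i], row[j])] += 1
        pi[row[i]] += 1
        pj[row[j]] += 1
    return sum((nij / m) * math.log(nij * m / (pi[a] * pj[b]))
               for (a, b), nij in pij.items())

def chow_liu_tree(data, variables):
    """Maximum-weight spanning tree over MI edge weights (Kruskal + union-find)."""
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(variables, 2)), reverse=True)
    parent = {v: v for v in variables}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v
    tree = []
    for w, i, j in edges:  # greedily take the heaviest edge that joins two components
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy data: A and B are perfectly correlated, C is independent (hypothetical example).
data = [{"A": 0, "B": 0, "C": 0}, {"A": 1, "B": 1, "C": 0},
        {"A": 0, "B": 0, "C": 1}, {"A": 1, "B": 1, "C": 1}]
tree = chow_liu_tree(data, ["A", "B", "C"])
print(tree)  # the high-MI edge (A, B) is always kept
```

With n variables this costs O(n^2) mutual-information computations plus the spanning-tree step, which is why the optimal tree is tractable even though general structure learning is NP-hard.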

Learning BN structure using local search
- Start from the Chow-Liu tree
- Local search moves: add an edge, delete an edge, invert (reverse) an edge
- Score candidate structures using BIC

Learning graphical model structure using LASSO
- Graph structure is about selecting parents: e.g., which of Flu, Allergy, Headache, Nose are parents of Sinus
- With no independence assumptions, each CPT depends on all the other variables; with independence assumptions, it depends only on a few key variables
- One approach to structure learning: sparse logistic regression (LASSO) to select each node's parents

What you need to know about learning BN structures
- Decomposable scores: maximum likelihood and its information-theoretic interpretation
- Best tree: the Chow-Liu algorithm
- Beyond tree-like models, structure learning is NP-hard
- Use heuristics, such as local search and LASSO

Learning Theory. Machine Learning CSE546, Carlos Guestrin, University of Washington. October 27, 2013.

What now?
- We have explored many ways of learning from data. But how good is our classifier, really? And how much data do I need to make it good enough?

A simple setting
- Classification with N data points
- A finite number of possible hypotheses (e.g., decision trees of depth d)
- A learner finds a hypothesis h that is consistent with the training data, i.e. gets zero training error: error_train(h) = 0
- What is the probability that h has more than ε true error: error_true(h) ≥ ε?

How likely is a bad hypothesis to get N data points right?
- A hypothesis h consistent with the training data got N i.i.d. points right
- h is bad if it gets all this data right but has high true error
- Prob. that an h with error_true(h) ≥ ε gets one data point right: at most 1 − ε
- Prob. that an h with error_true(h) ≥ ε gets all N data points right: at most (1 − ε)^N

But there are many possible hypotheses that are consistent with the training data.
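The (1 − ε)^N survival probability, and the standard bound (1 − ε)^N ≤ e^(−εN) used later, can be checked numerically. A minimal sketch with arbitrarily chosen ε and N:

```python
import math

eps, n = 0.1, 50  # hypothetical true-error threshold and sample size
# A fixed h with error_true(h) >= eps gets one i.i.d. point right with prob. <= 1 - eps,
# hence all n points right with prob. <= (1 - eps)^n, which is <= e^(-eps * n).
exact = (1 - eps) ** n
loose = math.exp(-eps * n)
print(exact, loose)  # the exponential bound is slightly larger, as expected
```

So with ε = 0.1, a single bad hypothesis has well under a 1% chance of surviving 50 training points; the remaining question is how many such hypotheses get a chance to survive.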

How likely is the learner to pick a bad hypothesis?
- Prob. that a particular h with error_true(h) ≥ ε gets N data points right: at most (1 − ε)^N
- There are k hypotheses consistent with the data. How likely is the learner to pick a bad one?

Union bound
- P(A or B or C or D or ...) ≤ P(A) + P(B) + P(C) + P(D) + ...

How likely is the learner to pick a bad hypothesis, continued
- Applying the union bound over the k consistent hypotheses, and using k ≤ |H|:
  P(learner picks some bad h) ≤ k (1 − ε)^N ≤ |H| (1 − ε)^N

Generalization error in finite hypothesis spaces [Haussler '88]
Theorem: Let the hypothesis space H be finite and D a dataset of N i.i.d. samples, with 0 < ε < 1. For any learned hypothesis h that is consistent with the training data:
  P(error_true(h) > ε) ≤ |H| e^(−Nε)
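Setting the Haussler bound |H| e^(−Nε) equal to a failure probability δ and solving for N gives the usual sample-complexity form N ≥ (ln|H| + ln(1/δ)) / ε. A small sketch, with a hypothetical hypothesis-space size of 2^20:

```python
import math

def sample_complexity(h_size, eps, delta):
    """Smallest integer N with h_size * exp(-N * eps) <= delta,
    i.e. N >= (ln(h_size) + ln(1/delta)) / eps."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# Example: |H| = 2^20 hypotheses, eps = 0.1, delta = 0.05 (all assumed values).
n = sample_complexity(h_size=2 ** 20, eps=0.1, delta=0.05)
print(n)
```

Note the pleasant consequence of the union bound: N grows only logarithmically in |H|, so even an exponentially large hypothesis space needs only linearly many samples in its description length.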