Bayesian networks. Javier Béjar (LSI - FIB). Term 2011/2012.


Bayesian networks

A Bayesian network is a formalism for representing dependence/independence relationships among attributes as a directed acyclic graph (DAG). Each node stores the probability distribution of its variable conditioned on its parents in the graph. From these local distributions, and using the topology of the network, we can infer the probability distribution of the values of any attribute given the values of any subset of the attributes in the network.

Bayesian networks

[Figure: an example network, a DAG over the variables Drinking, Smoking, Diet, Exercise, Weight, Blood Pressure and Heart Attack.]
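The factorization the slides describe can be made concrete in a few lines of code. The following is a minimal Python sketch over a three-variable fragment of the example network; the edges and the probability values are invented for illustration, since the slides do not give them.

```python
# A Bayesian network as a dict of parents plus one conditional
# probability table (CPT) per node. Structure and numbers are
# assumptions for illustration only.
parents = {
    "Smoking": [],
    "Exercise": [],
    "HeartAttack": ["Smoking", "Exercise"],
}

# P(X = 1 | parent values), indexed by the tuple of parent values
cpt = {
    "Smoking": {(): 0.3},
    "Exercise": {(): 0.6},
    "HeartAttack": {(0, 0): 0.10, (0, 1): 0.05,
                    (1, 0): 0.30, (1, 1): 0.15},
}

def joint(assign):
    """P(assign) as the product of the local conditionals (chain rule)."""
    p = 1.0
    for var, pa in parents.items():
        key = tuple(assign[v] for v in pa)
        p1 = cpt[var][key]                        # P(var = 1 | parents)
        p *= p1 if assign[var] == 1 else 1.0 - p1
    return p

# P(Smoking = 1, Exercise = 0, HeartAttack = 1) = 0.3 * 0.4 * 0.30
print(joint({"Smoking": 1, "Exercise": 0, "HeartAttack": 1}))
```

Storing one CPT per node instead of the full joint table is exactly the saving the next slide describes.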

What information do they represent?

The structure of a Bayesian network represents:

- The joint probability distribution of the data
- How to factorize the joint probability distribution into independent components, reducing the cost of probability estimation (the sketch below counts the parameters saved)

From the Bayesian network we can obtain:

- The probability of the values of an attribute given a subset of the attributes
- The set of attributes that have a direct influence on an attribute
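To put the reduction in estimation cost in numbers, the sketch below counts the free parameters of the full joint over the seven binary variables of the example figure against a factorized version. The parent counts are an assumption for illustration, since the figure's edges are not given.

```python
# Free parameters for 7 binary variables: full joint vs. factorized.
n = 7
full_joint = 2 ** n - 1          # 127 free parameters

# Factorized: a node with k binary parents needs 2**k parameters.
# These parent counts are assumed for illustration.
num_parents = {"Drinking": 0, "Smoking": 0, "Diet": 0, "Exercise": 0,
               "Weight": 2, "BloodPressure": 3, "HeartAttack": 2}
factorized = sum(2 ** k for k in num_parents.values())

print(full_joint, factorized)    # 127 vs. 20
```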

The Machine Learning perspective

- They allow studying the relationships among attributes
- They allow discovering the patterns of those relationships
- They allow inferring the values of an unknown attribute through its dependencies
- They can be used to discover the relevance of attributes in classification tasks
- They can be used as a classifier (a generalized version of naive Bayes)

Learning Bayesian networks

- Building a Bayesian network by hand is difficult (it requires expert knowledge)
- The topology and the probability distributions can be learned from a dataset
- We want to learn the network topology (DAG) that best fits the data

Learning Bayesian networks

We can define the problem as a search problem:

- Search space: all the DAGs that can be defined over the n variables. Their number grows super-exponentially, following the recurrence (computed in the sketch below)

  f(n) = \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i} 2^{i(n-i)} f(n-i)

  with f(0) = 1 and f(1) = 1; for example, f(13) ≈ 1.8 × 10^{31}
- Search operators: the possible modifications to a DAG
- Heuristic function: we want to obtain the simplest network that explains the probability distribution of the data
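The recurrence can be transcribed directly into code; this short memoized sketch (not part of the slides) reproduces the counts and shows how quickly the search space explodes.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def f(n: int) -> int:
    """Number of DAGs on n labeled nodes (Robinson's recurrence)."""
    if n <= 1:
        return 1
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * f(n - i)
               for i in range(1, n + 1))

print(f(3))    # 25
print(f(13))   # about 1.87e31
```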

Bayesian network evaluation

The evaluation of a network has three components:

- A priori information: prior knowledge that biases the search toward specific types of networks (for example, prior probabilities or a partial order among the nodes)
- Information from the dataset: the joint probability estimated for a given topology, and how well that estimate fits the actual data
- Network complexity: a term that biases the search toward simpler networks, penalizing the number of connections or of estimation parameters

Quality functions

There are multiple goal functions that can be used to evaluate Bayesian networks, based on different criteria:

- Bayesian estimation (as in naive Bayes/the EM algorithm)
- Minimum Description Length (the DAG that best compresses the dataset)
- Information theory (as in decision trees)

The estimation methods differ in the types of the attributes: Bayesian networks with discrete attributes (multinomial) or with continuous ones (multinormal).

Quality functions (multinomial networks)

A dataset represents multiple instantiations of a random variable X = (X_1, X_2, ..., X_n). Each random variable X_i has a multinomial distribution with r_i values.

Variable   Values
X_1        {a, b, c}
X_2        {0, 1}
X_3        {a, b, c, d}
...        ...
X_n        {a, b}

Quality functions (multinomial networks)

Π_i is the set of parents of the variable X_i (the variables that have a direct influence on X_i), for example Π_1 = {X_2, X_3}.

We define s_i as the number of possible combinations of the values of the variables in Π_i (with r_j the number of values of variable X_j):

s_i = \prod_{X_j \in \Pi_i} r_j

For example, s_1 = r_2 · r_3 = 2 · 4 = 8.

Quality functions (multinomial networks)

We define N_{ijk} as the number of cases in the dataset where the variable X_i takes its j-th value and the parents of X_i take the k-th combination of values of Π_i.

We define

N_{ik} = \sum_{j=1}^{r_i} N_{ijk}

as the number of cases in the dataset where the parents of X_i take the k-th combination of values, independently of the value of X_i.

We can estimate how well the parent set Π_i explains the distribution of X_i observed in the data as:

\sum_{j=1}^{r_i} \sum_{k=1}^{s_i} N_{ijk} \log \frac{N_{ijk}}{N_{ik}}

Quality functions (multinomial networks)

Example: should A and B be the parents of C (A, B → C)?

A B C   f(A,B)   f(A,B,C)
0 0 0   0.5      0.4
0 0 1   0.5      0.1
0 1 0   0.2      0
0 1 1   0.2      0.2
1 0 0   0.1      0.05
1 0 1   0.1      0.05
1 1 0   0.2      0.15
1 1 1   0.2      0.05

We evaluate (see the worked sketch below):

\sum_{j=1}^{r_C} \sum_{k=1}^{s_C} N_{Cjk} \log \frac{N_{Cjk}}{N_{Ck}}
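As a worked sketch (not from the slides), the snippet below turns the table's relative frequencies into counts by assuming a dataset of 100 cases, an arbitrary size chosen for illustration, and evaluates the score for C with parents {A, B} and with no parents; the helper score_C_given is likewise hypothetical.

```python
from math import log

# Relative frequencies f(A,B,C) from the table, turned into counts
# out of an assumed 100 cases (sample size is an assumption).
freq = {(0,0,0): .40, (0,0,1): .10, (0,1,0): .00, (0,1,1): .20,
        (1,0,0): .05, (1,0,1): .05, (1,1,0): .15, (1,1,1): .05}
N = {abc: 100 * f for abc, f in freq.items()}

def score_C_given(parents_idx):
    """Sum_{j,k} N_Cjk log(N_Cjk / N_Ck) for C with the given parents.

    j ranges over the values of C, k over the parent configurations.
    Terms with N_Cjk = 0 contribute 0 (the 0 log 0 = 0 convention).
    """
    n_jk, n_k = {}, {}
    for (a, b, c), n in N.items():
        k = tuple((a, b)[i] for i in parents_idx)  # parent configuration
        n_jk[(k, c)] = n_jk.get((k, c), 0) + n
        n_k[k] = n_k.get(k, 0) + n
    return sum(n * log(n / n_k[k]) for (k, _), n in n_jk.items() if n > 0)

print(score_C_given((0, 1)))  # C with parents {A, B}
print(score_C_given(()))      # C with no parents, for comparison
```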

Quality functions (multinomial networks)

Summing these values over all the variables of the network gives an estimate of how well the topology of the network fits the joint probability distribution of the data:

Q_I(B) = \sum_{i=1}^{n} \sum_{j=1}^{r_i} \sum_{k=1}^{s_i} N_{ijk} \log \frac{N_{ijk}}{N_{ik}}

Quality functions (multinomial networks)

The a priori probability of the network, p(B):

- This probability can be described explicitly as a probability over the network topologies
- It lets us introduce bias information into the search (order of exploration of the variables, constraints on the possible parents of a variable, maximum number of parents, ...)

Quality functions (multinomial networks)

The penalty for the complexity of the network (the simpler, the better):

- The penalty is usually a function of the number of parameters needed to estimate the probabilities and of the size of the dataset
- There are different criteria (see the sketch after this list):
  - Maximum likelihood information criterion: no penalization for network complexity
  - Akaike information criterion: the penalty is the number of parameters needed to estimate the network probabilities
  - Minimum Description Length criterion: the penalty is the product of the number of parameters and the logarithm of the size of the dataset
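A minimal sketch (not from the slides) of the three criteria as penalized scores, assuming the log-likelihood term Q_I(B) has already been computed; the 1/2 factor in the MDL penalty follows the usual formulation of that criterion, which the slide's description abbreviates.

```python
from math import log

def penalized_score(loglik: float, n_params: int, n_cases: int,
                    criterion: str = "mdl") -> float:
    """Score = log-likelihood term minus a complexity penalty.

    loglik corresponds to Q_I(B); n_params is the number of
    probabilities to estimate; n_cases is the dataset size.
    """
    if criterion == "ml":     # maximum likelihood: no penalty
        penalty = 0.0
    elif criterion == "aic":  # AIC: number of parameters
        penalty = float(n_params)
    elif criterion == "mdl":  # MDL: (1/2) * n_params * log(n_cases)
        penalty = 0.5 * n_params * log(n_cases)
    else:
        raise ValueError(criterion)
    return loglik - penalty
```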

Algorithm K-2

- Hill-climbing search
- The initial state is an empty network
- We assume that a precedence order of the nodes is known
- For each node we evaluate all its possible parents (only those that precede it in the order)
- We define q_i(Π_i) as the evaluation function restricted to the nodes that are parents of the variable X_i
- We greedily add the parent that most improves this function, while there is improvement (see the pseudocode and the sketch below)

Algorithm K-2

Algorithm: K-2
    Order the variables
    for i = 1 to n do
        Π_i = ∅
    end
    for i = 1 to n do
        repeat
            select the Y ∈ {X_1, ..., X_{i-1}} \ Π_i that maximizes g = q_i(Π_i ∪ {Y})
            δ = g - q_i(Π_i)
            if δ > 0 then
                Π_i = Π_i ∪ {Y}
            end
        until δ ≤ 0 or Π_i = {X_1, ..., X_{i-1}}
    end
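A compact Python sketch of the control flow above, under these assumptions: score(i, parents) stands for the slides' q_i(Π_i) and is left as an abstract callback, the variables are assumed to be given already in precedence order, and the optional max_parents bound reflects the bias mentioned earlier.

```python
def k2(n_vars, score, max_parents=None):
    """K-2: greedy parent selection under a fixed variable order.

    n_vars: number of variables, indexed 0..n_vars-1 in precedence order.
    score(i, parents): the evaluation q_i for a candidate parent set
    (an assumed callback, e.g. the multinomial score sketched above).
    """
    parents = [set() for _ in range(n_vars)]
    for i in range(n_vars):
        candidates = set(range(i))       # only predecessors in the order
        current = score(i, parents[i])
        while candidates:
            if max_parents is not None and len(parents[i]) >= max_parents:
                break
            # best single addition to the current parent set
            y = max(candidates, key=lambda c: score(i, parents[i] | {c}))
            gain = score(i, parents[i] | {y}) - current
            if gain <= 0:                # until delta <= 0
                break
            parents[i].add(y)
            current += gain
            candidates.remove(y)
    return parents
```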

Algorithm K-2

[Figure: a step-by-step trace of K-2 on a four-node network, following the precedence order 1, 2, 3, 4 and adding at each step the parents that improve the score.]

Other local search approaches

Genetic algorithms:
- The chromosomes are the DAGs
- Crossover interchanges subgraphs between pairs of solutions
- Mutation adds, deletes and inverts edges

Simulated annealing:
- Search operators: inversions, and addition/deletion of edges between pairs of nodes, possibly involving a third node (see the sketch below)
- Worsening changes are accepted with a probability that decreases as the search advances
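As a sketch of how such operators can be applied while keeping the graph a DAG, the snippet below implements a random edge move with an acyclicity check plus the Metropolis acceptance rule; the move set and helper names are simplified assumptions, not the slides' exact operators.

```python
import math
import random

def has_path(adj, src, dst):
    """DFS: is dst reachable from src following directed edges?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node])
    return False

def random_move(adj):
    """Propose adding, deleting or inverting a random edge, keeping a DAG."""
    a, b = random.sample(list(adj), 2)
    new = {v: set(children) for v, children in adj.items()}
    if b in new[a]:
        new[a].discard(b)                         # deletion...
        if random.random() < 0.5 and not has_path(new, a, b):
            new[b].add(a)                         # ...or inversion, if acyclic
    elif not has_path(adj, b, a):
        new[a].add(b)                             # addition, if acyclic
    return new

def accept(delta, temperature):
    """Metropolis rule: always accept improvements, worse moves sometimes."""
    return delta > 0 or random.random() < math.exp(delta / temperature)
```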

Probabilistic Graphical Models

Bayesian networks are just one member of the family of probabilistic graphical models:

- Directed graphs: Tree Augmented Networks, Influence Diagrams
- Undirected graphs: Conditional Random Fields, Markov Random Fields
- Sequential directed graphs: Markov Chains, Hidden Markov Models, Markov Decision Processes

These models are applied in several domains, such as speech recognition, vision, robotics, planning, human-computer interaction, ...