Probabilistic Graphical Models. Dr. Xiaowei Huang


Transcription:

Probabilistic Graphical Models Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/

Up to now,
- Overview of Machine Learning
- Traditional Machine Learning Algorithms
- Deep Learning

Topics
- Positioning of Probabilistic Inference
- Recap: Naïve Bayes
- Example Bayes Networks
- Example Probability Query
- What is a Graphical Model?

Perception-Cognition-Action Loop

What's left?
[Slide figure: the perception-cognition-action loop, with labels: environment, perception, dataset, learning, structural representation / knowledge (e.g., a probabilistic graphical model), inference, sampling, and action (e.g., planning).]

What are Graphical Models?
[Slide figure: a model and the data it describes.]

Fundamental Questions
- Representation: How do we capture/model uncertainties in possible worlds? How do we encode our domain knowledge/assumptions/constraints?
- Inference: How do I answer questions/queries according to my model and/or based on given data?
- Learning: Which model is right for the data (MAP and MLE)?

Recap: Naïve Bayes

Parameters for the Joint Distribution
- Each X_i represents the outcome of tossing coin i.
- Assume the coin tosses are marginally independent; therefore P(X_1, ..., X_n) = P(X_1) P(X_2) ... P(X_n). (Recall: this is the assumption behind naïve Bayes.)
- If we use the standard parameterization of the joint distribution, the independence structure is obscured and 2^n parameters are required. However, we can use a more natural set of parameters: n parameters.
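A minimal Python sketch (illustrative, not taken from the slides) of this parameter-count gap for n binary coin tosses:

# Parameter counts for n binary coin tosses: the full joint table versus
# the independent parameterization described above.
def full_joint_params(n: int) -> int:
    # One entry per configuration of n binary variables
    # (strictly 2**n - 1 free parameters, since the entries sum to one).
    return 2 ** n

def independent_params(n: int) -> int:
    # One parameter (probability of heads) per coin.
    return n

for n in (3, 10, 30):
    print(n, full_joint_params(n), independent_params(n))
# n = 30: 1,073,741,824 table entries versus 30 parameters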

Recap of Basic Probability Concepts
- What is the joint probability distribution over multiple variables?
- How many state configurations are there in total?
- Do they all need to be represented explicitly?
- Do we get any scientific insight?
Recall: naïve Bayes

Conditional Parameterization
Example: a company is trying to hire recent graduates.
- The goal is to hire intelligent employees.
- There is no way to test intelligence directly, but we have access to the student's score, which is informative but not fully indicative.
Two random variables:
- Intelligence I: Val(I) = {i^0 (low), i^1 (high)}
- Score S: Val(S) = {s^0 (low), s^1 (high)}
The joint distribution has 4 entries, so three parameters are needed.

Joint distribution:
  I    S    P(I, S)
  i^0  s^0  0.665
  i^0  s^1  0.035
  i^1  s^0  0.06
  i^1  s^1  0.24

Alternative Representation: Conditional Parameterization
- A representation more compatible with causality: Intelligence is influenced by genetics and upbringing; Score is influenced by Intelligence.
- Note: BNs are not required to follow causality, but they often do.
- Need to specify P(I) and P(S | I): one marginal and two conditionals, i.e., three binomial distributions (3 parameters).
(Graph: Intelligence → Score)

P(I):
  i^0  i^1
  0.7  0.3

P(S | I):
  I    s^0   s^1
  i^0  0.95  0.05
  i^1  0.2   0.8
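As a quick check (not shown on the slide), a few lines of Python confirm that this conditional parameterization reproduces the joint table P(I, S) from the previous slide via the chain rule:

# Chain rule check: P(I, S) = P(I) * P(S | I), using the numbers above.
P_I = {"i0": 0.7, "i1": 0.3}
P_S_given_I = {"i0": {"s0": 0.95, "s1": 0.05},
               "i1": {"s0": 0.2,  "s1": 0.8}}

for i, pi in P_I.items():
    for s, ps in P_S_given_I[i].items():
        print(i, s, round(pi * ps, 3))
# i0 s0 0.665, i0 s1 0.035, i1 s0 0.06, i1 s1 0.24  -- matches the joint table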

Naïve Bayes Model
The Grade variable G represents grades A (g^1), B (g^2), C (g^3).
(Graph: I → G, I → S)

P(I):
  i^0  i^1
  0.7  0.3

P(G | I):
  I    g^1   g^2   g^3
  i^0  0.2   0.34  0.46
  i^1  0.74  0.17  0.09

P(S | I):
  I    s^0   s^1
  i^0  0.95  0.05
  i^1  0.2   0.8

Conditional Parameterization and Conditional Independence
Conditional parameterization is combined with conditional independence assumptions to produce very compact representations of high-dimensional probability distributions.

Recall: Naïve Bayes Model
- Score and Grade are independent given Intelligence (assumption): knowing Intelligence, the Score gives no information about the Grade.
- Assertions:
  - From probabilistic reasoning (chain rule): P(I, S, G) = P(I) P(S | I) P(G | I, S)
  - From the assumption: P(G | I, S) = P(G | I)
  - Combining, we have P(I, S, G) = P(I) P(S | I) P(G | I)
- Therefore three binomials and two 3-valued multinomials suffice: 7 parameters, more compact than the full joint distribution (which has 2 · 2 · 3 = 12 entries).
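A small Python sketch (illustrative, reusing the CPD numbers from the slides above) that assembles the factored joint P(I) P(G | I) P(S | I) and checks that it behaves like a distribution:

from itertools import product

# CPDs copied from the slides above.
P_I = {"i0": 0.7, "i1": 0.3}
P_G_given_I = {"i0": {"g1": 0.2,  "g2": 0.34, "g3": 0.46},
               "i1": {"g1": 0.74, "g2": 0.17, "g3": 0.09}}
P_S_given_I = {"i0": {"s0": 0.95, "s1": 0.05},
               "i1": {"s0": 0.2,  "s1": 0.8}}

# Factored joint: P(I, G, S) = P(I) * P(G | I) * P(S | I)
joint = {(i, g, s): P_I[i] * P_G_given_I[i][g] * P_S_given_I[i][s]
         for i, g, s in product(P_I, ["g1", "g2", "g3"], ["s0", "s1"])}

print(round(sum(joint.values()), 10))   # 1.0 -- a valid distribution
# Free parameters: 1 (for P(I)) + 2*2 (for P(G|I)) + 2*1 (for P(S|I)) = 7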

Example Bayes Networks

BN for the General Naïve Bayes Model
- Encoded using a very small number of parameters
- Linear in the number of variables
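To make the "linear" claim concrete (standard counting, not spelled out on this slide): with a class variable C taking k values and n binary features X_1, ..., X_n, the naïve Bayes factorization P(C, X_1, ..., X_n) = P(C) P(X_1 | C) ... P(X_n | C) needs (k - 1) + n·k independent parameters, which grows linearly in n, whereas the full joint table has k · 2^n entries.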

Application of the Naïve Bayes Model: Medical Diagnosis
- Pathfinder expert system for lymph-node disease (Heckerman et al., 1992)
- The full BN agreed with the human expert in 50/53 cases
- Naïve Bayes agreed in 47/53 cases

Student Bayesian Network
(Nodes: Difficulty, Intelligence, Grade, Score, Letter)

Student Bayesian Network
(Nodes: X_1 Difficulty, X_2 Intelligence, X_3 Grade, X_4 Score, X_5 Letter)

Student Bayesian Network
If the X_i satisfy the conditional independencies described by a PGM, the joint distribution can be factored into a product of simpler terms, e.g.,
P(X_1, X_2, X_3, X_4, X_5) = P(X_1) P(X_2) P(X_3 | X_1, X_2) P(X_4 | X_2) P(X_5 | X_3)
What is the benefit of using a PGM?
- Incorporation of domain knowledge and causal (logical) structures
- Far fewer parameters: 1 + 1 + 4 + 2 + 2 = 10, a reduction from the 2^5 = 32 entries of the full joint table (see the counting sketch below)
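A small Python sketch of this parameter counting (illustrative; the edge set below is the usual student-network structure assumed above, and all variables are treated as binary for consistency with the count on the slide):

# Independent parameters of a BN: for each node X with parents pa(X),
# (|Val(X)| - 1) * prod(|Val(parent)|) entries are free.
from math import prod

card = {"D": 2, "I": 2, "G": 2, "S": 2, "L": 2}          # assumed binary
parents = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}

def bn_param_count(card, parents):
    return sum((card[x] - 1) * prod(card[p] for p in parents[x])
               for x in card)

print(bn_param_count(card, parents))   # 1 + 1 + 4 + 2 + 2 = 10
print(prod(card.values()) - 1)         # 31 free parameters for the full 2**5-entry joint table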

Student Bayesian Network
- Represents a joint probability distribution over multiple variables
- BNs represent it in terms of a graph and conditional probability distributions (CPDs)
- This results in great savings in the number of parameters needed

Joint Distribution from the Student BN
pa(X_i): the parent nodes of X_i
CPDs: P(D), P(I), P(G | D, I), P(S | I), P(L | G)
Joint distribution: P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(S | I) P(L | G); in general, P(X_1, ..., X_n) = Π_i P(X_i | pa(X_i))

Example Probability Query

Example of a Probability Query
- Posterior marginal estimation: P(Y | E = e), the distribution over a query variable Y given evidence e
- Probability of evidence: P(E = e); here we are asking for a specific probability rather than a full distribution

Computing the Probability of Evidence
- Probability distribution of the evidence variables: P(E) = Σ_w P(W = w, E), summing the joint over all assignments w to the non-evidence variables W
- Probability of evidence: P(E = e) = Σ_w P(W = w, E = e)
- More generally, any query P(Y | E = e) can be computed as P(Y, e) / P(e) by marginalizing out the remaining variables
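A minimal Python sketch (illustrative, reusing the I/S joint from the earlier slide) of both query types, obtained by summing out the non-evidence variable:

# Joint P(I, S) from the earlier slide.
joint = {("i0", "s0"): 0.665, ("i0", "s1"): 0.035,
         ("i1", "s0"): 0.06,  ("i1", "s1"): 0.24}

# Probability of evidence: P(S = s1), summing out the non-evidence variable I.
p_evidence = sum(p for (i, s), p in joint.items() if s == "s1")
print(p_evidence)                       # 0.275

# Posterior marginal: P(I | S = s1) = P(I, s1) / P(s1).
posterior = {i: joint[(i, "s1")] / p_evidence for i in ("i0", "i1")}
print(posterior)                        # {'i0': ~0.127, 'i1': ~0.873}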

Rational Statistical Inference

What is a Graphical Model?

So What is a Graphical Model? In a nutshell, GM = Multivariate Statistics + Structure

What is a Graphical Model?
- The informal blurb: it is a smart way to write/specify/compose/design exponentially large probability distributions without paying an exponential cost, and at the same time endow the distributions with structured semantics.
- A more formal description: it refers to a family of distributions on a set of random variables that are compatible with all the probabilistic independence propositions encoded by a graph that connects these variables.

Two Types of GMs
- Directed edges give causality relationships (Bayesian Network or Directed Graphical Model): P(X_1, ..., X_n) = Π_i P(X_i | pa(X_i))
- Undirected edges simply give correlations between variables (Markov Random Field or Undirected Graphical Model): P(X_1, ..., X_n) = (1/Z) Π_C ψ_C(X_C), a product of clique potentials normalized by the partition function Z
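A toy Python sketch (illustrative, not from the slides) of the undirected case: two binary variables coupled by a single pairwise potential and normalized by the partition function Z:

from itertools import product

# Pairwise potential psi(a, b): larger value when the two variables agree,
# encoding a correlation rather than a causal direction.
def psi(a, b):
    return 2.0 if a == b else 1.0

states = list(product([0, 1], repeat=2))
Z = sum(psi(a, b) for a, b in states)            # partition function
P = {(a, b): psi(a, b) / Z for a, b in states}   # P(A, B) = psi(A, B) / Z
print(Z, P)   # Z = 6.0; agreeing states get 1/3, disagreeing states get 1/6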

Example: Alarm Network
