Graphical Models for Genomic Selection

Graphical Models for Genomic Selection
Marco Scutari 1, Phil Howell 2
1 m.scutari@ucl.ac.uk, Genetics Institute, University College London
2 phil.howell@niab.com, NIAB
June 12, 2013

Background

Background Bayesian networks: an overview

A Bayesian network (BN) [6, 7] is the combination of:
a directed acyclic graph G = (V, E), in which each node v_i ∈ V corresponds to a random variable X_i (a gene, a trait, an environmental factor, etc.);
a global probability distribution over X = {X_i}, which can be split into simpler local probability distributions according to the arcs a_ij ∈ E present in the graph.
This combination allows a compact representation of the joint distribution of high-dimensional problems, and simplifies inference using the graphical properties of G.

Background The two main properties of Bayesian networks

[Figure: a ten-node example network (X_1, ..., X_10) with the Markov blanket of one node highlighted: its parents, its children and its children's other parents (spouses).]

The defining characteristic of BNs is that graphical separation implies (conditional) probabilistic independence. As a result, the global distribution factorises into local distributions: each one is associated with a node X_i and depends only on its parents Π_{X_i},

P(X) = ∏_{i=1}^{p} P(X_i | Π_{X_i}).

In addition, we can visually identify the Markov blanket of each node X_i (the set of nodes that completely separates X_i from the rest of the graph, and thus includes all the knowledge needed to do inference on X_i).
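As a minimal sketch of these two properties (toy node names, not the MAGIC network), the snippet below builds a small DAG with bnlearn [10] and queries parent sets, Markov blankets and graphical separation:

## A small, hypothetical five-node DAG: two markers (G1, G2) and three traits.
library(bnlearn)

dag <- model2network("[G1][G2][FT|G1:G2][HT|FT][YLD|FT:HT]")

parents(dag, "YLD")                      # the local distribution of YLD depends on "FT" "HT" only
mb(dag, "FT")                            # Markov blanket of FT: "G1" "G2" "HT" "YLD"
dsep(dag, x = "G1", y = "YLD", z = "FT") # TRUE: graphical separation given FT

Here dsep() returns TRUE because conditioning on FT blocks every path from G1 to YLD, so the network encodes the conditional independence of G1 and YLD given FT.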

Background Bayesian networks for GS and GWAS

From the definition, if we have a set of traits and markers for each variety, all we need for GS and GWAS are the Markov blankets of the traits [11]. Using common sense, we can make some additional assumptions:
traits can depend on markers, but not vice versa;
traits that are measured after the variety is harvested can depend on traits that are measured while the variety is still in the field (and obviously on the markers as well), but not vice versa.
Most markers are discarded when the Markov blankets are learned. Only those that are parents of one or more traits are retained; all other markers' effects are indirect and redundant once the Markov blankets have been learned. Assumptions on the direction of the dependencies allow us to reduce Markov blanket learning to learning the parents of each trait, which is a much simpler task.
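As an illustration, these ordering assumptions can be encoded as a blacklist of forbidden arc directions for structure learning, for example with bnlearn's tiers2blacklist(); the node names and the grouping of the traits below are hypothetical:

library(bnlearn)

markers      <- c("G1", "G2", "G3")
field.traits <- c("FT", "HT", "YR.FIELD")   # measured while in the field
later.traits <- c("YLD", "FUS")             # measured after harvest

## Arcs may only point from earlier tiers to later ones, so traits cannot be
## parents of markers and post-harvest traits cannot point back to field traits.
bl <- tiers2blacklist(list(markers, field.traits, later.traits))
head(bl)                                    # a "from"/"to" matrix of forbidden arcs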

Learning

Learning Learning the Bayesian network

1. Feature Selection.
1.1 For each trait, use the SI-HITON-PC algorithm [1, 10] to learn the parents and the children of the trait; children can only be other traits, parents are mostly markers, spouses can be either. Dependencies are assessed with Student's t-test for Pearson's correlation [5] and α = 0.01.
1.2 Drop all the markers which are not parents of any trait.
2. Structure Learning. Learn the structure of the BN from the nodes selected in the previous step, setting the directions of the arcs according to the assumptions in the previous slide. The optimal structure can be identified with a suitable goodness-of-fit criterion such as BIC [9]. This follows the spirit of other hybrid approaches [3, 12], which have been shown to perform well in the literature.
3. Parameter Learning. Learn the parameters of the BN as a Gaussian BN [6]: each local distribution is a linear regression and the global distribution is a hierarchical linear model.
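A schematic sketch of this pipeline in bnlearn, assuming a data frame magic whose columns are the (numeric) traits and markers; it outlines the three steps rather than reproducing the exact script used for the analysis:

library(bnlearn)

traits  <- c("FT", "HT", "YLD", "YR.GLASS", "YR.FIELD", "MIL", "FUS")
markers <- setdiff(names(magic), traits)

## 1. Feature selection: parents and children of each trait via SI-HITON-PC,
##    using the exact t-test for Pearson's correlation at alpha = 0.01.
nbr <- lapply(traits, function(t)
         learn.nbr(magic, node = t, method = "si.hiton.pc",
                   test = "cor", alpha = 0.01))
keep <- union(traits, intersect(unique(unlist(nbr)), markers))

## 2. Structure learning on the retained nodes, with the ordering assumptions
##    from the previous slide as a blacklist and BIC as the network score.
bl  <- tiers2blacklist(list(setdiff(keep, traits), traits))
dag <- hc(magic[, keep], blacklist = bl, score = "bic-g")

## 3. Parameter learning: a Gaussian BN, one linear regression per node.
fitted <- bn.fit(dag, magic[, keep])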

Learning The Parameters of the Bayesian Network

The local distribution of each trait X_i is a linear model

X_i = µ + Π_{X_i} β + ε = µ + (X_j β_j + ... + X_k β_k) + (X_l β_l + ... + X_m β_m) + ε

in which the first group of terms collects the traits in Π_{X_i} and the second group the markers. It can be estimated with any frequentist or Bayesian approach in which the nodes in Π_{X_i} are treated as fixed effects (e.g. ridge regression [4], the elastic net [13], etc.). For each marker X_i, the nodes in Π_{X_i} are other markers in LD with X_i, since COR(X_i, X_j | Π_{X_i}) ≠ 0 implies β_j ≠ 0. This is also intuitively true for markers that are children of X_i, as LD is symmetric.
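For example, the local distribution of a single trait can be estimated as a penalised regression on its parent set. A minimal sketch with glmnet, reusing dag and magic from the previous sketch; the choice of ridge regression and the node name are illustrative, not prescriptive:

library(bnlearn)
library(glmnet)

pa <- parents(dag, "YLD")                  # markers and/or other traits
X  <- as.matrix(magic[, pa, drop = FALSE])
y  <- magic[, "YLD"]

## alpha = 0 is ridge regression [4], alpha = 1 the lasso, intermediate values
## the elastic net [13]; lambda is chosen by cross-validation.
fit <- cv.glmnet(X, y, alpha = 0)
coef(fit, s = "lambda.min")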

Learning A caveat about causal interpretations

[Comic: http://xkcd.com/552/]

Even though good BNs have a structure that mirrors cause-effect relationships [8], and even though there is ample literature on how to learn causal BNs from observational data, inferring causal effects from a BN requires great care even with completely independent data (i.e. with no family structure).

Learning The MAGIC data

The MAGIC data (the same as in Ian's talk) include 721 varieties, 16K markers and the following phenotypes: flowering time (FT); height (HT); yield (YLD); yellow rust, as measured in the glasshouse (YR.GLASS); yellow rust, as measured in the field (YR.FIELD); mildew (MIL); and fusarium (FUS). Varieties with missing phenotypes or family information and markers with > 20% missing data were dropped. The phenotypes were adjusted for family structure via BLUP and the markers screened for MAF > 0.01 and COR < 0.99.
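A base-R sketch of the marker screening step (the BLUP adjustment for family structure is not shown), assuming geno is a hypothetical numeric matrix of 0/1/2 allele counts with one column per marker:

## Minor allele frequency filter (MAF > 0.01).
maf  <- colMeans(geno, na.rm = TRUE) / 2
maf  <- pmin(maf, 1 - maf)
geno <- geno[, maf > 0.01]

## Near-collinearity filter: for every pair of markers with |cor| >= 0.99,
## drop the one that comes earlier in the matrix.
cc <- abs(cor(geno, use = "pairwise.complete.obs"))
cc[upper.tri(cc, diag = TRUE)] <- 0
geno <- geno[, colSums(cc >= 0.99) == 0]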

Learning Bayesian network learned from MAGIC

[Plot of the learned network: the 7 trait nodes (FT, HT, YLD, YR.GLASS, YR.FIELD, MIL, FUS) and 44 marker nodes.]

51 nodes (7 traits, 44 markers), 86 arcs, 137 parameters for 600 obs.

Learning Phenotypic traits in MAGIC

[Plots of the distributions of the phenotypic traits: FT, MIL, HT, YR.GLASS, FUS, YLD, YR.FIELD.]

Learning Assessing arc strength with bootstrap resampling

Friedman et al. [2] proposed an approach to assess the strength of each arc based on bootstrap resampling and model averaging:
1. For b = 1, 2, ..., m:
1.1 sample a new data set X_b from the original data X using either parametric or nonparametric bootstrap;
1.2 learn the structure of the graphical model G_b = (V, E_b) from X_b.
2. Estimate the confidence that each possible edge e_i is present in the true network structure G_0 = (V, E_0) as

p̂_i = P̂(e_i) = (1/m) Σ_{b=1}^{m} 1{e_i ∈ E_b},

where 1{e_i ∈ E_b} is equal to 1 if e_i ∈ E_b and 0 otherwise.
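In bnlearn this procedure is available as boot.strength(); a sketch reusing the data, blacklist and score from the learning sketch above (R = 200 bootstrap samples is an arbitrary choice):

library(bnlearn)

strength <- boot.strength(magic[, keep], R = 200, algorithm = "hc",
                          algorithm.args = list(blacklist = bl, score = "bic-g"))

head(strength)            # columns: from, to, strength (the p_i above), direction

## Model averaging: keep only the arcs whose strength exceeds a significance threshold.
avg <- averaged.network(strength)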

Learning Averaged Bayesian network from MAGIC

[Plot of the averaged network over the same 7 traits and 44 markers.]

81 out of 86 arcs from the original BN are significant.

Learning Phenotypic traits in MAGIC

[Subgraph of the phenotypic traits: FUS, HT, MIL, YR.GLASS, FT, YLD, YR.FIELD.]

from       to         strength  direction
YR.GLASS   YLD        0.636     1.000
YR.GLASS   HT         0.074     0.648
YR.GLASS   YR.FIELD   1.000     0.724
YR.GLASS   FT         0.020     0.800
HT         YLD        0.722     1.000
HT         YR.FIELD   0.342     0.742
HT         FUS        0.980     0.885
HT         MIL        0.012     0.666
YR.FIELD   YLD        0.050     1.000
YR.FIELD   FUS        0.238     0.764
YR.FIELD   MIL        0.402     0.661
FUS        YR.GLASS   0.030     0.666
FUS        YLD        0.546     1.000
FUS        MIL        0.058     0.758
MIL        YR.GLASS   0.824     0.567
MIL        YLD        0.176     1.000
FT         YLD        1.000     1.000
FT         HT         0.420     0.809
FT         YR.FIELD   0.932     0.841
FT         FUS        0.436     0.692
FT         MIL        0.080     0.825

Arcs in the BN are highlighted in red in the table.

Inference

Inference Inference in Bayesian networks

Inference for BNs usually takes two forms:
conditional probability queries, in which the distribution of one or more nodes of interest is investigated conditional on a second set of nodes (which are either completely or partially fixed);
maximum a posteriori queries, in which we look for the most likely outcome of a certain event (involving one or more nodes) conditional on evidence on a set of nodes (which is often completely fixed for computational reasons).
In practice this amounts to answering "what if?" questions (hence the name queries) about what could happen in observed or unobserved scenarios using posterior probabilities or density functions.
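Both kinds of query can be approximated by simulation from the fitted network; a minimal sketch using bnlearn's cpdist() and cpquery() on the fitted Gaussian BN from the earlier sketches (node names, evidence values and the event threshold are placeholders, not values from the MAGIC analysis):

library(bnlearn)

## Conditional probability query: the distribution of YLD given fixed values
## of two other nodes, approximated by likelihood weighting.
sim <- cpdist(fitted, nodes = "YLD",
              evidence = list(FT = 30, HT = 80), method = "lw")
summary(sim$YLD)

## The probability of an event of interest given interval evidence,
## approximated by logic sampling.
cpquery(fitted, event = (YLD > 6), evidence = (FT < 30))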

Inference Flowering time: what if we fix directly related alleles?

[Density plot of flowering time under the POPULATION, EARLY and LATE scenarios; the three means are 28.07, 31.49 and 35.09.]

Fixing 6 genes that are parents of FT in the BN not to be homozygotes for late flowering (EARLY) or for early flowering (LATE). Heterozygotes are allowed in both cases.
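A sketch of how such a scenario comparison can be run with cpdist(): the marker parents of FT are fixed and FT is simulated from the fitted network. This simplifies the query on the slide (which only excludes one homozygote class and allows heterozygotes), and which homozygote corresponds to early or late flowering depends on the coding of each marker, so the 0/2 values are purely illustrative.

library(bnlearn)

ft.markers <- setdiff(parents(dag, "FT"), traits)   # the marker parents of FT
early <- as.list(setNames(rep(0, length(ft.markers)), ft.markers))
late  <- as.list(setNames(rep(2, length(ft.markers)), ft.markers))

ft.early <- cpdist(fitted, nodes = "FT", evidence = early, method = "lw")
ft.late  <- cpdist(fitted, nodes = "FT", evidence = late,  method = "lw")
c(mean(ft.early$FT), mean(ft.late$FT))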

Inference Flowering time: which nodes we used...

[Plot of the network highlighting the nodes used in the query.]

Inference Yellow rust: what if we fix (in)directly related alleles?

[Density plot of yellow rust (field) under the POPULATION, SUSCEPTIBLE (FIELD), SUSCEPTIBLE (ALL), RESISTANT (FIELD) and RESISTANT (ALL) scenarios; the five means are 1.46, 1.67, 2.44, 2.78 and 3.03.]

Fixing 8 genes that are parents of YR.FIELD, then another 7 that are parents of YR.GLASS, either not to be homozygotes for yellow rust susceptibility or for yellow rust resistance. Heterozygotes are allowed in both cases.

Inference Yellow rust: nodes farther away can help...

[Plot of the network highlighting the nodes used in the query.]

Inference G3140: can we guess the allele?

[Density plot of G3140 for the TALL and SHORT scenarios; the two means are 0.32 and 1.5.]

If we have two varieties for which we scored low levels of fusarium (0 to 2), and which are among the top 25% yielding, but one is tall (top 25%) and one is short (bottom 25%), which is the most probable allele for gene G3140?
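A sketch of this backwards query with cpdist() and logic sampling: we condition on the traits and look at the simulated distribution of G3140. The numeric thresholds below are illustrative stand-ins for the quartiles mentioned above, and since markers are continuous nodes in the Gaussian BN we compare mean simulated values rather than discrete genotype probabilities.

library(bnlearn)

tall  <- cpdist(fitted, nodes = "G3140",
                evidence = (FUS <= 2) & (YLD > 7) & (HT > 90))
short <- cpdist(fitted, nodes = "G3140",
                evidence = (FUS <= 2) & (YLD > 7) & (HT < 70))

## The scenario with the higher (or lower) mean simulated value points to the
## more probable allele for G3140 in each variety.
c(mean(tall$G3140), mean(short$G3140))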

Inference G3140: information travels backwards...

[Plot of the network highlighting the nodes used in the query.]

Conclusions

Conclusions Conclusions

Bayesian networks provide an intuitive representation of the relationships linking phenotypes and markers, both within and between the two sets. Given a few reasonable assumptions, we can learn a Bayesian network for multiple-trait GWAS and GS efficiently, reusing state-of-the-art general-purpose algorithms. Once learned, Bayesian networks provide a flexible tool for inference on both the markers and the phenotypes. Thanks!

Conclusions Acknowledgements

NIAB
Ian Mackay: data preparation and general support
Phil Howell: has run the MAGIC programme and collected disease scores and yield data
Nick Gosman: involved in the running of the MAGIC programmes
Rhian Howells: collected the flowering time data
Richard Hornsell: performed crossing to create the MAGIC population and preparation of DNA
Pauline Bancept: collected the glasshouse yellow rust data

UCL
David Balding: my Supervisor

References

References References I

[1] C. F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos. Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification Part I: Algorithms and Empirical Evaluation. Journal of Machine Learning Research, 11:171-234, 2010.
[2] N. Friedman, M. Goldszmidt, and A. Wyner. Data Analysis with Bayesian Networks: A Bootstrap Approach. In Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 196-205. Morgan Kaufmann, 1999.
[3] N. Friedman, D. Pe'er, and I. Nachman. Learning Bayesian Network Structure from Massive Datasets: The Sparse Candidate Algorithm. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI), pages 206-221. Morgan Kaufmann, 1999.
[4] A. E. Hoerl and R. W. Kennard. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(1):55-67, 1970.
[5] H. Hotelling. New Light on the Correlation Coefficient and Its Transforms. Journal of the Royal Statistical Society, Series B (Methodological), 15(2):193-232, 1953.
[6] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

References References II

[7] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[8] J. Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2nd edition, 2009.
[9] G. E. Schwarz. Estimating the Dimension of a Model. Annals of Statistics, 6(2):461-464, 1978.
[10] M. Scutari. bnlearn: Bayesian Network Structure Learning, Parameter Learning and Inference, 2013. R package version 3.3.
[11] M. Scutari, I. Mackay, and D. J. Balding. Improving the Efficiency of Genomic Selection (submitted). Statistical Applications in Genetics and Molecular Biology, 2013.
[12] I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65(1):31-78, 2006.
[13] H. Zou and T. Hastie. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, 67(2):301-320, 2005.