Computational Biology

BIOL 6385 / BMEN 6389, Spring 2017 (Jan. 10 to Apr. 27), The University of Texas at Dallas
Instructor: Prof. Michael Q. Zhang (associate instructor: Dr. Pradipta Ray)

What the course teaches
Computational and statistical methods for analyzing biological data and understanding biological systems.
Introduces computational aspects of:
- Genomics
- Evolution & phylogenetics
- Gene regulation & gene networks
Focus on generic methods and algorithms, NOT on specific protocols or tools.

Course resources: instructors (contact details on website)
- Michael Zhang, Instructor
- Pradipta Ray, Associate Instructor
- Milos Pavlovic, Teaching Assistant

Course resources: mailing list (biol6385@googlegroups.com)
- To join, email the instructors your preferred email address (UTD email preferred).
- This is a broadcast list: only instructors post.
- Students should email the instructors directly (email early, not late).
- Email is the preferred mode of communication.

Course resources: website
- Home page (dates, contacts, hours, news): http://utdallas.edu/~prr105020/biol6385/
- Schedule tab (schedule, handouts, homework, solutions)
- Course info tab (course policy)

Course policy
Attendance and participation: active participation in classroom discussion is expected. Attendance is mandatory except with special permission from the instructor.

Grading: based on the midterm and final exams and 3 problem sets.
- Homework (3 problem sets): 50%
- Midterm: 25%
- Final exam: 25%
This is a graduate course: don't focus on grades; the goal is to understand the subject matter! Final letter grades will depend on clustering and relative quantile profiles, not on a direct translation of numerical grades.

Examinations
- Exams are 75 minutes long, open book and open notes. No computers or communication devices are allowed.
- Midterm exam: March 2, during class hours, in class.
- Final exam: April 27, during class hours, in class.
- It is impossible for us to accommodate individual requests to reschedule the exams.

Homework
- To be done individually.
- Late homework: homework is worth full credit at the beginning of class on the due date, 75% for the next 24 hours, 50% from 24 to 96 hours after the due date, and 0% after that.
- Turn in all 3 HWs, even if for no credit, to pass the course.
- Late HW assignments must be turned in to the instructors.

Textbooks (primary, secondary, tertiary)
For how to access them online, or from a library near you, check the class website.

Reference books
http://work.caltech.edu/lectures.htm
For how to access them online, or from a library near you, check the class website.

5 sections
Unit 1: Modelling Uncertainty in Biology
- How to build a framework to rationally deal with uncertainty: probability
- How to estimate and infer parameters associated with such uncertainty: statistics
- How to proceed when there are many sources of uncertainty in a system: Bayes nets / deep neural networks
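As a small illustration of the "estimate and infer parameters" step in Unit 1, the sketch below estimates base probabilities from a DNA string. With no pseudocount it returns the maximum-likelihood estimate; with a positive pseudocount it returns the posterior mean under a symmetric Dirichlet prior. The function name `base_frequencies` and the toy sequence are assumptions made for illustration, not course material.

```python
from collections import Counter


def base_frequencies(seq, pseudocount=0.0, alphabet="ACGT"):
    """Estimate base probabilities from counts.

    pseudocount = 0 gives the maximum-likelihood estimate (count / total).
    pseudocount > 0 gives the posterior mean under a symmetric Dirichlet
    prior with that concentration per base.
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in alphabet) + pseudocount * len(alphabet)
    if total == 0:
        raise ValueError("no countable bases and no pseudocount")
    return {b: (counts[b] + pseudocount) / total for b in alphabet}


seq = "ACGTTTGA"  # toy sequence, invented for illustration
mle = base_frequencies(seq)
smoothed = base_frequencies(seq, pseudocount=1.0)
print(mle)
print(smoothed)
```

The pseudocount pulls the estimates toward the uniform distribution, which matters most when the sequence is short.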

Unit 2: Molecular Sequence Analysis
- Searching and alignment of sequences
- Modelling composition of sequences and guessing their functionality: classification of subsequences and annotation
- Integrative analysis: how to combine evidence from multiple and extra-sequential sources when analyzing sequences
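To make "alignment of sequences" concrete, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence with a linear gap penalty). The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for illustration; the course may use a different scheme.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Only the score is computed (no traceback), so two rows of the
    dynamic-programming table are enough.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Keeping only two rows trades the ability to recover the alignment itself for memory linear in the length of `b`.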

Unit 3: Markovian models
- Markov chains: the Markov condition among random variables, factoring the joint
- Hidden Markov Models: what happens when the state of the system is unobserved?
- Supervised and unsupervised inference: Forward-Backward type of algorithms, Baum-Welch / Expectation Maximization algorithm
- Pair and profile HMMs: engineering Markovian models to solve computational biological problems
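The forward pass of the Forward-Backward family can be sketched in a few lines. The example below computes the likelihood of an observed DNA string under a two-state HMM by summing over all hidden paths. The state names and every probability in the toy model are invented for illustration, and the function assumes a non-empty observation sequence.

```python
STATES = ("AT_rich", "GC_rich")  # hypothetical hidden states
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
EMIT = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def forward_likelihood(obs, states=STATES, start_p=START,
                       trans_p=TRANS, emit_p=EMIT):
    """Return P(obs) under the HMM using the forward recursion.

    alpha[s] holds P(observations so far, current state = s).
    """
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


print(forward_likelihood("GGCACTGAA"))
```

For long sequences the probabilities underflow, so practical implementations rescale `alpha` at each step or work in log space.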

Unit 4: Evolution & Comparative Genomics
- Evolutionary dynamics: how DNA may change by mutations
- Multiple sequence alignment: comparing sequences across individuals or species
- Phylogenetic trees: clustering based on sequences, explicitly modelling evolution of sequences
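One simple way to model "how DNA may change by mutations" is the Jukes-Cantor substitution model, which corrects the observed fraction of differing sites for multiple substitutions at the same site. The sketch below is an illustration under that model only, not necessarily the model used in the course, and it assumes the two sequences are already aligned, gap-free, and of equal length.

```python
import math


def p_distance(a, b):
    """Fraction of aligned sites at which the two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor_distance(a, b):
    """Expected substitutions per site: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


s1 = "ACGTACGTAC"  # toy aligned sequences, invented for illustration
s2 = "ACGTACGTAT"
print(p_distance(s1, s2), jukes_cantor_distance(s1, s2))
```

The corrected distance is always at least as large as the raw fraction of differences, and pairwise distances of this kind are a typical input to distance-based tree building.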

Unit 5: Generic Machine Learning Approaches for Comp Biologists
- Optimization techniques: greedy and more systematic optimization strategies
- Markov Chain Monte Carlo: algorithms to sample from probability distributions
- Classification: identifying classes of observation, category prediction
- Regression: estimating quantitative relationships among multiple variables, forecasting
- Structure learning: how to learn the structure of data
- Ensemble learning: combining learning machines
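As a minimal sketch of the Markov Chain Monte Carlo item in Unit 5, the code below implements a random-walk Metropolis sampler for a one-dimensional target known only through its log density. The standard normal target, the step size, and the function name are assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_density, x0, n_samples, step=2.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density."""
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if logp_new >= logp or rng.random() < math.exp(logp_new - logp):
            x, logp = proposal, logp_new
        samples.append(x)  # a rejected move repeats the current state
    return samples


def log_std_normal(x):
    """Log density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


samples = metropolis(log_std_normal, x0=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

Only ratios of the target density are needed, which is why the sampler works without knowing the normalizing constant.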

What's computational biology?
Bioinformatics applies principles of information sciences and technologies to make the vast, diverse, and complex life sciences data more understandable and useful. Computational biology uses mathematical and computational approaches to address theoretical and experimental questions in biology. Although bioinformatics and computational biology are distinct, there is also significant overlap and activity at their interface. [1] Wikipedia
Learning: Information → Knowledge, but what's more important than knowledge?

"Information is any difference that makes a difference." (Shannon / Turing / Bateson; digital revolution)
"It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material." (Watson & Crick)

The Human Genome Project (1990-2003) (http://www.nhgri.nih.gov/hgp/)
Figure: mapped human genes.
"The new paradigm, now emerging, is that all the genes will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical." W. Gilbert (1991)

Gene finding and structure/function prediction (Sequence → Structure → Function)

Figure: a typical vertebrate gene. The DNA carries exons E1 to E7 separated by introns I1 to I6; splicing yields the mRNA.

Some sizes of human genes

Name               Size (kb)   mRNA (kb)   Introns
β-globin           1.5         0.6         2
Insulin            1.7         0.4         2
Protein kinase C   11          1.4         7
Albumin            25          2.1         14
Catalase           34          1.6         12
LDL receptor       45          5.5         17
Factor VIII        186         9           25
Thyroglobulin      300         8.7         36
Dystrophin         > 2000      17          > 50

Figure: human β-globin gene.
Example: alternative splicing of the fly sex determination gene.
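A quick calculation on the table above shows how small a share of each gene ends up in the mRNA. The sketch below divides the mRNA length by the gene length for each row. For dystrophin the table gives only bounds ("> 2000" kb, "> 50" introns), so the code uses 2000 kb and 50 as stand-ins, which makes the computed fraction an upper bound; the variable and function names are assumptions for illustration.

```python
# (name, gene size in kb, mRNA size in kb, number of introns), from the table.
# Dystrophin entries are lower bounds in the table ("> 2000", "> 50").
GENES = [
    ("β-globin", 1.5, 0.6, 2),
    ("Insulin", 1.7, 0.4, 2),
    ("Protein kinase C", 11, 1.4, 7),
    ("Albumin", 25, 2.1, 14),
    ("Catalase", 34, 1.6, 12),
    ("LDL receptor", 45, 5.5, 17),
    ("Factor VIII", 186, 9, 25),
    ("Thyroglobulin", 300, 8.7, 36),
    ("Dystrophin", 2000, 17, 50),
]


def mrna_fraction(gene_kb, mrna_kb):
    """Fraction of the gene's length that is retained in the mRNA."""
    return mrna_kb / gene_kb


fractions = {name: mrna_fraction(g, m) for name, g, m, _ in GENES}
for name, frac in sorted(fractions.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {100 * frac:5.1f}% of the gene is in the mRNA")
```

In this table the fraction falls from 40% for β-globin to under 1% for dystrophin, so the largest genes here are almost entirely intron.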

CF Gene Discovery (1989)
Positional cloning:
- Linkage analysis
- Physical mapping
- cDNA selection
- Sequencing
- Database search (alignment)

Single gene regulation
Figure labels: enhancer; CTCF (insulator/boundary); promoter.

GRN: Respiration Module (Segal et al., Nature Genetics 2003)
Figure labels: predicted regulator; regulation program; module genes.
Are module genes known targets of the predicted regulators? Hap4 + Msn4 are known to regulate module genes.

Personal Medicine

(Synthetic Biology)

The omics cascade, but nature is unity
Figure labels: Environment; Comp. Biol.; Syn. Biol. (modified from ebookbrowse)
What's more interesting than understanding ourselves?

Two levels of modeling

1. Statistical (macroscopic) and population models
- Simple correlation: $Y \sim X$
- Probabilistic / predictive: $P(Y, X)$, $P(Y \mid X)$, with $\bar{Y} = f(x, \alpha) = E[Y \mid x] = \sum_y y \, P(Y = y \mid X = x)$
- e.g. $f = a x + b$ (linear regression); Boyle's law: $V = C(T) / p$; kinetic theory (Boltzmann); Brownian motion (Einstein): $\frac{\partial \rho}{\partial t} = D \frac{\partial^2 \rho}{\partial x^2}$, $\frac{\overline{x^2}}{2t} = D = \frac{RT}{6 \pi \eta r N}$

2. Biophysical / biochemical (microscopic) and evolutionary (dynamical) models
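The predictive formula $E[Y \mid x] = \sum_y y \, P(Y = y \mid X = x)$ can be evaluated directly from a joint probability table. The sketch below does this for a small discrete joint distribution; the table values and the function name are invented for illustration.

```python
def conditional_expectation(joint, x):
    """Return E[Y | X = x] from a joint table {(x, y): P(X = x, Y = y)}."""
    # Marginal P(X = x), needed to turn joint into conditional probabilities.
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    if p_x == 0:
        raise ValueError("P(X = x) is zero, so the conditional is undefined")
    # Sum over y of y * P(X = x, Y = y), divided by P(X = x).
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x


# Toy joint distribution over binary X and Y (values invented).
JOINT = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

print(conditional_expectation(JOINT, 0))
print(conditional_expectation(JOINT, 1))
```

Here the conditional mean of Y rises from X = 0 to X = 1, which is the kind of dependence a regression function $f(x, \alpha)$ is meant to summarize.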

Chance-Life: Statistical Learning
Probabilistic graphical models (chains / trees / DAGs): http://www.pgm-class.org/
- Directed: Bayesian networks, phylogeny
- Undirected: Markov networks (HMM / generative, CRF / discriminative)
- Representation: conditional independence, H-C theorem (MN = Gibbs)
- Inference: DP / VP
- Learning: MLE / BE, EM / MCMC, sparsity, regularization
Machine learning & learning machines: http://jan2012.ml-class.org/
- ANN, GA, perceptron, SVM, boosting, Boltzmann machine
Belief, behavior, Boosting (Efron)

Machine-brain convergence
IBM's supercomputer Deep Blue beat world chess champion Garry Kasparov in a six-game match in May 1997, a dramatic reversal of their battle the previous year.
Machine: an extension of the human being, replacing or beating man in specific functional tasks.
On March 15, 2016, the distributed version of AlphaGo completed a 4-1 win against Lee Sedol, whose Elo rating is now estimated at 3,520. The distributed version of AlphaGo is now estimated at 3,586. It is unlikely that AlphaGo would have won against Lee Sedol if it had not improved since 2015.
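To put the two Elo estimates in perspective, the standard Elo model converts a rating difference into an expected score per game, $E_A = 1 / (1 + 10^{(R_B - R_A)/400})$. The sketch below applies that formula to the ratings quoted above; it illustrates the rating system only and is not a claim about how those estimates were produced.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


ALPHAGO = 3586   # estimated rating quoted in the text
LEE_SEDOL = 3520  # estimated rating quoted in the text

print(elo_expected_score(ALPHAGO, LEE_SEDOL))
```

A gap of 66 points corresponds to a modest per-game edge, so a lower-rated earlier version could plausibly have been the underdog.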

Cognitive Computing