Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics
BTRY 4830/6830; PBSB.5201.03
Jason Mezey
Biological Statistics and Computational Biology (BSCB)
Department of Genetic Medicine
Institute for Computational Biomedicine
jgm45@cornell.edu
TA: Manisha Munasinghe, mam737@cornell.edu
TA: Zijun Zhao, ziz2003@med.cornell.edu
Spring 2018: Jan. 25 - May 8, T/Th 8:40-9:55

Why you're here
Spring 2018 Course Announcement: Quantitative Genomics and Genetics
Professor: Jason Mezey, Biological Statistics and Computational Biology (Cornell); Department of Genetic Medicine (Weill)
Dates: Jan. 25 - May 8
Days: Tues. and Thurs.
Time: 8:40 am - 9:55 am
Room for Cornell, Ithaca: 224 Weill Hall
Room for WCMC: Belfer 204A or 302A
COURSE DESCRIPTION: A rigorous treatment of analysis techniques used to understand the genetics of complex phenotypes using genomic data. This course will cover the fundamentals of statistical methodology, with applications to the identification of genetic loci responsible for disease, agriculturally relevant, and evolutionarily important phenotypes. The data focus will be genome-wide data collected for association, inbred, and pedigree experimental designs. Analysis techniques will focus on the central importance of generalized linear models in quantitative genomics, with an emphasis on both Frequentist and Bayesian computational approaches. Tools learned in class will be implemented in the computer lab, where the language R will be taught from the ground up (no previous experience required or expected).
GRADING: S/U or Letter Grade.
CREDITS: 4 (lecture + computer lab).
SUGGESTED PREREQUISITES: At least one class in Genetics and one class in probability and/or statistics.

Today
Logistics (times/locations, registering, syllabus, schedule, requirements, computer labs, video-conferencing, etc.)
An intuitive overview of the goals and the field of quantitative genomics
The foundational connection between biology and probabilistic modeling
Begin our introduction to modeling and probability

Times and Locations I
This is a distance-learning class taught in two locations: Cornell, Ithaca and Weill, NYC.
I will teach each lecture from EITHER Ithaca or NYC (all lectures will be video-conferenced), and I expect questions from both locations.
Lectures will be recorded. The recordings will be posted along with slides/notes and will also serve as a backup (if needed).
I encourage you to come to class...

Times and Locations II
Lectures are held (almost) every Tues./Thurs., 8:40-9:55 AM; see the class schedule.
The Ithaca lecture will always be in 224 Weill Hall.
DEPENDING ON THE DATE, the Weill lecture location will be Belfer 302-A, Belfer 204-A, or another room. A spreadsheet listing these locations will be made available (please read it carefully!!)

Times and Locations III
There is a REQUIRED computer lab (if you take the course for credit).
In Ithaca: Lab 1 will meet 5-6 PM on Thurs. (!!) in MNLB30A (!!), Mann Library; Lab 2 will meet Fri. 8-9 AM in Weill 226 (bring your laptop every week!!). The lab will be taught by Manisha.
In NYC: the lab will meet 4-5 PM on Thurs. (!!) in LC-504 (Conference Room, 5th floor, 1300 York Ave Building) and will be taught by Zijun. Please bring a laptop EVERY week.
If you have an unavoidable conflict at this time, please send me an email (we will do our best to accommodate, but...).
THE FIRST COMPUTER LAB IS NEXT WEEK = Feb. 1 (!!)

Times and Locations IV
Jason will hold office hours on both campuses by video-conference, Thurs. 2:30-4:30 PM.
Office hours will be conducted using Zoom: https://cornell.zoom.us/j/724550601
NOTE: unofficial help sessions can be scheduled with Jason, Manisha, or Zijun by appointment.
NO office hours this week; the first will be Feb. 1 (!!)

Registering for the class I
You may take this class for a letter grade, S/U, or Audit. If you can register for this class, please do so (even if you plan to audit!!)
If you are a Cornell undergraduate or graduate student, or a WCMC graduate student, you can officially register for this course.
If you are a student at MSKCC or Rockefeller (or elsewhere), you may register, but you will need to fill out the form "Application for Non-Degree Student" and make sure it gets to the WCMC registrar (registrar@med.cornell.edu). PLEASE DO THIS TODAY.
If you are a postdoc at Cornell and would like to register for the course, please come talk to me.
If you are a postdoc at MSKCC or Rockefeller (or elsewhere), you may register, but you will need to fill out the form "Application for Non-Degree Student" and make sure it gets to the WCMC registrar (registrar@med.cornell.edu). PLEASE DO THIS TODAY.

Registering for the class II
If you are not a postdoc (e.g., Technician, Research Associate, etc.), you may be able to register at Cornell or WCMC. If you are at Cornell, please check with the registrar or the appropriate office. If you are at another institution, check with your institution's Human Resources Office (then contact me).
If you audit or do not register officially, I strongly recommend that you do the work for the class, i.e., homework/exams/project/lab (we will grade your work!). My observation is that you are likely to be wasting your time if you do not do the work, but I leave this up to you...

Registering for the class III
In Ithaca: you must register for both the lecture (3 credits) and the computer lab (1 credit) if you take the course for a letter grade. If you are an undergraduate, register for BTRY 4830 (lecture and lab); if you are a graduate student, register for BTRY 6830 (same).
In NYC:
Weill: the course (PBSB.5021.03) should be available in the Graduate School drop-down at learn.weill.cornell.edu
Rockefeller: email Kristen Cullen (cullenk@mail.rockefeller.edu)
Other: check with the WCMC registrar for instructions
Please contact me if there are any issues with registering (!!)

Grading
We will grade undergraduates and graduates separately (!!)
Grading: problem sets (20%), computer lab attendance (5%), project (25%), mid-term (20%), final (30%)
A short problem set roughly every 2 weeks
A single project (~1 month)
Exams will be take-home (open book)
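The five grading weights above sum to 100%. As a hedged illustration of how they combine arithmetically, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale; the slide does not say how individual components are scored or where letter-grade cutoffs fall (and undergraduates and graduates are graded separately), so the component names and the example scores below are hypothetical.

```python
# Weights taken from the Grading slide; the dictionary keys are hypothetical names.
WEIGHTS = {
    "problem_sets": 0.20,
    "lab_attendance": 0.05,
    "project": 0.25,
    "midterm": 0.20,
    "final": 0.30,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-component scores (each assumed to be on a 0-100 scale) into a weighted total."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must provide exactly one value per grading component")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student, for illustration only.
example = {"problem_sets": 90, "lab_attendance": 100, "project": 80, "midterm": 70, "final": 85}
print(round(weighted_score(example), 2))  # → 82.5
```

Because the final exam carries 30% and the project 25%, those two components together account for more than half of the total.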

Class Resources I: Website
The class website will be under the Classes link on my site: http://mezeylab.cb.bscb.cornell.edu/

Website resources
We will post information about the course and a schedule that will be updated during the semester (check back often!!)
There is no textbook for the class, but I will post slides for all lectures. Supplementary readings (and other useful documents) will also be posted.
We will post videos of lectures (with a delay in most cases).
All homeworks, exams, keys, etc. will be posted elsewhere (see slides that follow).
All computer labs and code will be posted elsewhere (see slides that follow).

Class Resources II: Piazza
MAKE SURE YOU SIGN UP ON PIAZZA, whether you officially register or not: all communication for the course goes through Piazza (!!)
Main: http://support.piazza.com/customer/en/portal/articles/1646659-enroll-in-a-class
Class: https://piazza.com/class/jckpr075ilk5n4
Step 1: Sign up on Piazza (if you don't have an account already)!
Step 2: Enroll in BTRY 6830 (regardless of whether you are a graduate or undergraduate student).
If you have problems getting on to Piazza, email Jason or Manisha and we will get you set up.

Email and Posting
ALL EMAIL for any aspect of the course must be sent through PIAZZA (we will stop answering direct emails after the first week of the course). PLEASE DON'T email Jason, Manisha, or Zijun directly after the first week (unless it's an emergency).
Posting protocol: post all questions and comments on Piazza.
Public posts (let the community of students and instructors help out)
Private posts (to Jason, Manisha, and Zijun)
Please note that the expected response time to questions will be at least 24 hrs (sometimes longer...), depending on the availability of the instructors.
We encourage public posts so that your classmates can help you out as well.

Class Resources III: CMS
Assignments will be posted on CUCS CMS.
Class: https://cms.csuglab.cornell.edu/
Main: http://www.cs.cornell.edu/projects/cms/userdoc/
All submissions should be made through the CMS website. If you do not have a NetID (i.e., you are not at Cornell, Ithaca), please email me directly at jgm45@cornell.edu with the subject "Register on CMS" and I will get you set up.
Please don't email your submissions to Jason, Manisha, or Zijun.

Should I be in this class?
No probability or statistics: not recommended.
Limited probability or statistics (high school, a long time ago, etc.): if you take the class, be ready to work (!!)
Prob/Stats (e.g., BTRY 4080+4090 or BTRY 6010+6020 in Ithaca, Quantitative Understanding in Biology at Weill, etc.): you'll be fine.
No or limited exposure to genetics: you'll be fine.
No or limited exposure to programming: you'll be fine (we will teach you programming in R from the ground up).
Strong quantitative background (e.g., stats or CS graduate student): you may find the intuitive discussion of quantitative subjects and the applications interesting.

Tell us about you
Please fill out the following survey: https://goo.gl/forms/h8h0cvh69ekxmdko2
We will post the link and send out reminders on Piazza (please sign up ASAP!)
Please do this even if you are just sitting in on the class (!!) It helps us with logistics and with making class planning as good as possible (within our constraints).

What you will learn in this class I
A rigorous introduction to the basics of probability and statistics that is intuition-based (not proof-based).
Foundational concepts of how probability and statistics are at the core of genetics, complete enough to build additional/more advanced understanding (i.e., enough to "get your hooks into the subject").
Exposure to many advanced probability/statistics/genetics/algorithmic concepts that will allow you to build understanding beyond this class (ranging from a brief mention to entire lectures, depending on the subject).
Clear explanations for convincing yourself that the basics of mathematics and programming are not hard (i.e., anyone can do it if they devote the time).

What you will learn in this class II
An intuitive and practical understanding of linear models and related concepts that are the foundation of many subjects in statistics, machine learning, and computational biology.
The computational approaches necessary to perform inference with these models (EM, MCMC, etc.).
The statistical models and frameworks that allow us to identify the specific genetic differences responsible for measurable differences among organisms.
You will be able to analyze a large data set for this particular problem, e.g., a Genome-Wide Association Study (GWAS).
You will have a deep understanding of quantitative genomics, a field that from the outside seems diffuse and confusing.

Questions about logistics?

Subject overview
We know that aspects of an organism (measurable attributes and states, such as disease) are influenced by the genome (the entire DNA sequence) of an individual.
This means differences in genomes (genotypes) can produce differences in a phenotype:
Genotype: any quantifiable genomic difference among individuals, e.g., Single Nucleotide Polymorphisms (SNPs). Other examples?
Phenotype: any measurable aspect of an organism (that is not the genotype!). Examples?
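To make the genotype/phenotype distinction concrete, here is a minimal Python sketch of how one might store both for a few individuals. It assumes one common convention that the slide does not specify: at each SNP in a diploid genome, the genotype is recorded as the number of copies (0, 1, or 2) of an arbitrarily chosen allele. The individuals, alleles, and phenotype values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    """One sampled individual: SNP genotypes plus one measured phenotype (hypothetical data)."""
    name: str
    # Assumed coding: copies (0, 1, or 2) of a chosen allele at each SNP.
    genotypes: list[int]
    phenotype: float  # any measurable trait, e.g., height in cm


def encode_snp(allele1: str, allele2: str, counted_allele: str) -> int:
    """Count how many of the two alleles at a diploid SNP match the counted allele."""
    return int(allele1 == counted_allele) + int(allele2 == counted_allele)


# Two hypothetical individuals genotyped at three SNPs (the pair of alleles at each SNP).
alice_alleles = [("A", "G"), ("C", "C"), ("T", "T")]
bob_alleles = [("G", "G"), ("C", "T"), ("T", "T")]
counted = ["A", "T", "T"]  # allele counted at each SNP (an arbitrary choice)

alice = Individual("alice", [encode_snp(a, b, c) for (a, b), c in zip(alice_alleles, counted)], 172.0)
bob = Individual("bob", [encode_snp(a, b, c) for (a, b), c in zip(bob_alleles, counted)], 165.5)
print(alice.genotypes, bob.genotypes)  # → [1, 0, 2] [0, 1, 2]
```

The third SNP is identical in both individuals, so only the first two SNPs could contribute to their difference in phenotype.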

Example: People are different...
An illustration: people differ physically, metabolically, in disease, and in countless other ways.
We know that the environment plays a role in these differences... and for many of them, differences in the genome also play a role.
For any two people, there are millions of differences in their DNA, a subset of which are responsible for producing differences in a given measurable aspect.

An illustration continued...
The problem: for any two people, there can be millions of differences between their genomes... How do we figure out which differences are involved in producing phenotypic differences and which ones are not?
This course is concerned with how we do this. Note that the problem (and the methodology) applies to any measurable difference, for any type of organism!!
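As a toy illustration of the question above, the Python sketch below simulates genotypes at a handful of SNPs, makes the phenotype depend on only one of them, and then correlates each SNP with the phenotype one at a time to see which one stands out. Everything here is an assumption for illustration: the simulation settings, the 0/1/2 genotype coding, and the use of a plain Pearson correlation as a screening score. It is not the course's method; per the course description, the analyses in this course center on (generalized) linear models and proper statistical testing.

```python
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_people: int, n_snps: int, causal: int, effect: float, seed: int = 1):
    """Simulate 0/1/2 genotypes and a phenotype driven by one causal SNP plus Gaussian noise."""
    rng = random.Random(seed)
    # Each of the two allele copies is the counted allele with probability 0.5.
    genos = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_snps)] for _ in range(n_people)]
    pheno = [effect * g[causal] + rng.gauss(0.0, 1.0) for g in genos]
    return genos, pheno


def scan(genos: list[list[int]], pheno: list[float]) -> list[float]:
    """Correlate each SNP's genotype column with the phenotype, one SNP at a time."""
    n_snps = len(genos[0])
    return [pearson([g[j] for g in genos], pheno) for j in range(n_snps)]


genos, pheno = simulate(n_people=200, n_snps=5, causal=2, effect=1.5)
r = scan(genos, pheno)
best = max(range(len(r)), key=lambda j: abs(r[j]))
print("per-SNP correlations:", [round(v, 2) for v in r])
print("strongest SNP:", best)
```

With a strong simulated effect the causal SNP stands out clearly; with real data, millions of SNPs, and small effects, separating true signals from chance correlations is exactly the statistical problem the course addresses.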

Why do we want to know this?
If you know which genome differences are responsible:
From a child's genome, we could predict adult features.
We could target the genomic differences responsible for genetic diseases with gene therapy.
We could manipulate the genomes of agricultural crops to produce disease-resistant strains.
We could explain why a disease has a particular frequency in a population, and why we see a particular set of differences.
These differences provide a foundation for understanding how pathways, developmental processes, and physiological processes work.
The list goes on...

Quantitative genetics and connection to other disciplines
Broad classification of fields of genetics:
Modeling genetic fields: quantitative genetics, systems genetics, population genetics, etc.
Mechanism genetic fields: molecular genetics, cellular genetics, etc.
Model system genetic fields: human genetics, yeast genetics, etc.
Subject genetic fields: medical genetics, developmental genetics, evolutionary genetics, agricultural genetics, etc.
Quantitative genomics is a field concerned with modeling the relationship between genomes and phenotypes, and with using these models to discover and predict.

History of genetics (relevant to quantitative genetics)
In sum: during the last two decades, the greater availability of DNA sequence data has completely changed our ability to make connections between genome differences and phenotypes.

Connection of genomics and genetics
Traditionally, studying the impact of the genome on phenotypes (and their relationship) was the province of the fields of genetics.
Given this dependence on genomes, it is no surprise that modern genetic fields now incorporate genomics: the study of an organism's entire genome (Wikipedia definition).
However, one can study genetics without genomics (i.e., without direct information concerning DNA), and the merging of genetics and genomics is quite recent.

Present/future: advances in next-generation sequencing are driving the field

Why this is a good time to be learning about this subject
Mapping (identifying) genotypes (genetic loci) with effects on important phenotypes is fast becoming the major use of genomic data and a major focus of genomics.
However, the data collection, experimental, and statistical analysis techniques for doing this are still being developed.
The current statistical approaches are the focus of this course (i.e., you will have a solid foundation by the end).
The importance is just now starting to permeate broadly (i.e., we are now in the "internet generation" for genomics and the impact of genomics on biology).
The basic statistical approaches are (= should be) applied in ANY analysis of ANY genomic data for ANY purpose.

That's it for today
Next lecture, we will begin our formal and technical introduction to probability.
We will start by defining the concepts of a system, experiments and experimental trials, and sample outcomes and sample spaces.