Statistics 571 Statistical Methods for Bioscience I

Statistics 571: Statistical Methods for Bioscience I
Lecture 1: Cecile Ane
Lecture 2: Nicholas Keuler
Department of Statistics, University of Wisconsin-Madison
Fall 2009

Outline
1. Course Information
2. Introduction to Statistics
3. Descriptive Statistics
4. Shape of distributions

Course Information
www.stat.wisc.edu/courses/st571-ane/
- Read the entire syllabus carefully.
- Complete the survey sheet.
- Switch section? Late homework.
- Block the dates and times for the exams NOW:
  - Tuesday, October 13
  - Tuesday, November 24
  - Wednesday, December 20, 7:45am - 9:45am
- No discussion this week.

Course Information
Get help beyond lectures: reading materials, course website, forum, discussion sections, office hours, etc.
Your feedback is highly appreciated. Examples of comments from previous years:
- "make shorter exams"
- "make slides or write big"
- "powerpoint is good, but instead of having the examples printed off, leave blank space, go over on the board [...] have us copy them down"
- "get more practical advice"
Your evaluations are most valuable to me! Ask questions, get involved! Forum on Learn UW.

Course Information: Why R? Why not Microsoft Excel?
Limitations of Microsoft Excel:
- 65K-row limit on data size
- little data protection, little/no tracking
- XL2000 has many errors, without warning. Can get negative correlation coefficients, wrong pie charts, wrong paired t-test with missing values; does not accept categorical predictors in multiple regression, etc.
- Some bugs are fixed, new bugs are created in XL2003. Still doesn't have distributions right.
- Lots of errors known for over 10 years without fixes.
McCullough & Wilson (2005). On the accuracy of statistical procedures in Microsoft Excel 2003. Computational Statistics & Data Analysis 49(4):1244-1252.
Foresight: The International Journal of Applied Forecasting, issue 3 (2006):
- R. Hesse. Incorrect Nonlinear Trend Curves in Excel.
- B. McCullough. The Unreliability of Excel's Statistical Procedures.
- P. Fields. On the Use and Abuse of Microsoft Excel.

Expectations with Computing and R
- Resources on course webpage. Tutorial at first discussion.
- Good practice: keep assignments/projects in separate folders. Keep a plain text file (.r extension) with the list of commands to replicate what you have done. Example...
- Being able to use computing software is essential for analyzing your own data when the time comes.
- My goal = you own the methods and gain independence.
- I expect that you will experiment with R and try things on your own, so as to get a good understanding of how R works.
- Getting error and warning messages is normal while experimenting. Don't get stuck: get help! Forum, friends, TAs, instructor.

Expectations with Assignments
- Must be written clearly.
- When including R commands and output, don't put them alone. Add comments to explain in English what the commands are doing, and interpret the results.
- When using graphs, include axis labels, a legend if necessary, etc. Handwritten legends are okay.

Introduction to Statistics
What is statistics?
- Branch of scientific inquiry: helps to determine cause and association, and to make predictions.
- Organize and summarize data from a sample (i.e., a subset of a population).
- Use information in the data to draw conclusions about a population (i.e., all individuals of a particular type).
Population vs. sample
- A book vs. a few pages of the book.
- All corn plants vs. 100 plants in a field.

Introduction to Statistics
Probability vs. statistics
- Probability: mathematics of chance and randomness. Properties of samples when the population is known. Deductive approach.
- Statistics: a sample is available. Conclusions about a population when one sample is known. Inductive approach.
Three main topics
- Descriptive statistics: display & summarize data in a sample.
- Probability: given a population, study the uncertainty associated with a sample taken from the population.
- Statistics: given a sample, learn methods to draw conclusions about a population, while taking into account the uncertainty in the sample.

Russell et al. (2007) Science 317:941-943

Descriptive Statistics
Example: height of seedlings
Thirteen (13) red pine seedlings were sampled from a nursery in Wisconsin. The heights of these seedlings were (in cm):
42 23 43 34 49 56 31 47 61 54 46 34 26
Graphical methods describe data by visual/graphical techniques.
- Stem-and-leaf plot*, dot plot
- Histogram
Numerical methods extract summarizing numbers that characterize the data set and reveal main features.
- Measures of location/center: sample mean; sample median*; sample quantiles, box plot*
- Measures of spread: sample range; interquartile range (IQR)*; sample variance, standard deviation
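The starred summaries (median, quantiles, IQR) and the sample range can be computed directly in R. The following is a minimal sketch for the seedling heights; it assumes R's default quantile rule, and other textbook conventions for quartiles can give slightly different values on other data sets.

```r
# Seedling heights (cm) from the example above
hts <- c(42, 23, 43, 34, 49, 56, 31, 47, 61, 54, 46, 34, 26)

med <- median(hts)                   # sample median: middle value of the sorted data
q   <- quantile(hts, c(0.25, 0.75))  # first and third quartiles (R's default rule)
rng <- diff(range(hts))              # sample range: maximum minus minimum
iqr <- IQR(hts)                      # interquartile range: Q3 - Q1

print(c(median = med, range = rng, IQR = iqr))
print(q)
```

Calling boxplot(hts) would draw the corresponding box plot.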

Descriptive Statistics: stem-and-leaf, dotplots
A stem-and-leaf plot:
  2 | 36
  3 | 144
  4 | 23679
  5 | 46
  6 | 1
An alternative is a dot plot.
Stem-and-leaf plots and dot plots have information about the shape, center, and spread of the data distribution, as well as outliers and the number of observations.
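To see where the stems and leaves come from, the display can be rebuilt by hand: the tens digit is the stem and the units digit is the leaf. This is only an illustrative sketch (the variable names are made up for the example); in practice R's stem() function does this in one call.

```r
# Seedling heights (cm)
hts <- c(42, 23, 43, 34, 49, 56, 31, 47, 61, 54, 46, 34, 26)

sorted <- sort(hts)
stems  <- sorted %/% 10   # integer division: tens digit
leaves <- sorted %% 10    # remainder: units digit

# Paste together the leaves that share a stem
stem_leaf <- tapply(leaves, stems, paste, collapse = "")

for (s in names(stem_leaf)) {
  cat(s, "|", stem_leaf[[s]], "\n")
}
```

A dot plot of the same data could be drawn with stripchart(hts, method = "stack").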

Descriptive Statistics: Histogram
- Divide data into non-overlapping classes.
- Count the number of observations (i.e., frequencies) in each class (i.e., tally).
- Draw rectangles with height = frequencies and base = class intervals.
For the height of seedlings:
class        19.5-29.5  29.5-39.5  39.5-49.5  49.5-59.5  59.5-69.5
frequencies  2          3          5          2          1
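The tally step can be sketched in R with the class boundaries from this slide. The sketch uses cut() and table() to count observations per class, and checks the result against the counts that hist() computes for the same breaks.

```r
# Seedling heights (cm)
hts <- c(42, 23, 43, 34, 49, 56, 31, 47, 61, 54, 46, 34, 26)

# Class boundaries 19.5, 29.5, ..., 69.5 (no observation falls on a boundary)
breaks <- seq(19.5, 69.5, by = 10)

# Assign each observation to a class, then tally
classes <- cut(hts, breaks = breaks)
freq    <- table(classes)
print(freq)

# hist() does the same tally; plot = FALSE returns the counts without drawing
h <- hist(hts, breaks = breaks, plot = FALSE)
print(h$counts)
```

Dropping plot = FALSE draws the histogram with these class intervals as the bases of the rectangles.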

Descriptive Statistics: Histogram

Ex: milk production of organic cows
[Figures: dot plot of milk; two histograms of milk (vertical axis: frequency)]

Descriptive Statistics: Remarks
- A histogram is a pictorial representation of the frequency distribution of the data.
- Note the boundary values for the class intervals.
- Histograms have information about the shape, center, and spread of the data distribution.

Descriptive Statistics: Sample mean
The sample mean of a data set $y_1, y_2, \ldots, y_n$ provides a measure of location/center of the data set. To compute the sample mean:
- add all the values: $\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n$
- divide by the number of observations $n$: $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$
Seedlings: 42 23 43 34 49 56 31 47 61 54 46 34 26
$y_1 = 42, y_2 = 23, y_3 = 43, \ldots, y_{13} = 26$ and thus $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{546}{13} = 42$ cm.
$\bar{y}$ is the balance point of the dot plot.
Sometimes $\sum_{i=1}^{n} y_i$ is abbreviated as $\sum y_i$.
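The two steps (add, then divide) can be followed in R on the seedling data. The sketch also checks the "balance point" remark: the deviations from the mean sum to zero.

```r
# Seedling heights (cm)
hts <- c(42, 23, 43, 34, 49, 56, 31, 47, 61, 54, 46, 34, 26)

n     <- length(hts)   # number of observations
total <- sum(hts)      # step 1: add all the values
ybar  <- total / n     # step 2: divide by n

# Balance point: positive and negative deviations cancel out
dev_sum <- sum(hts - ybar)

print(c(n = n, total = total, ybar = ybar, dev_sum = dev_sum))
```

The built-in mean(hts) gives the same value as ybar.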

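Descriptive Statistics: Sample variance
$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$
Height of seedlings: $y_1 = 42, y_2 = 23, \ldots$, and we had $\bar{y} = 42$.
42 23 43 34 49 56 31 47 61 54 46 34 26
$s^2 = \frac{(42 - 42)^2 + (23 - 42)^2 + \cdots + (26 - 42)^2}{13 - 1} = \frac{1658}{12} = 138.17$
Sample variance measures the average squared deviation. Why divide by $n - 1$ and not $n$?
For hand calculation, use the working formulas
$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right]$ or $s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right]$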
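As a quick check that the definition and the two working formulas agree, here is a small R sketch on the seedling heights. The variable names are chosen for this illustration only.

```r
# Seedling heights (cm)
hts <- c(42, 23, 43, 34, 49, 56, 31, 47, 61, 54, 46, 34, 26)

n    <- length(hts)
ybar <- mean(hts)

# Definition: sum of squared deviations divided by n - 1
s2_def <- sum((hts - ybar)^2) / (n - 1)

# Working formula 1: uses the sum of squares and the squared sum
s2_wf1 <- (sum(hts^2) - sum(hts)^2 / n) / (n - 1)

# Working formula 2: uses the sum of squares and the mean
s2_wf2 <- (sum(hts^2) - n * ybar^2) / (n - 1)

print(c(definition = s2_def, working1 = s2_wf1, working2 = s2_wf2, var = var(hts)))
```

All four values should print as 138.1667, matching the 138.17 on the slide after rounding.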

Descriptive Statistics: Sample standard deviation
Sample standard deviation (SD) is the square root of the sample variance: $s = \sqrt{s^2}$
Height of seedlings: $s = \sqrt{138.17} = 11.75$ cm.
Sample standard deviation is a typical deviation, as the interval $\bar{y} \pm 1s$ captures about 2/3 of bell-shaped data.
> mean(milk)
[1] 36.21429
> sd(milk)
[1] 9.760033
[Figure: milk data with the mean (36.2) and steps of one SD (sd = 9.8) marked on the axis at 16.6, 26.4, 36.2, 46.0, 55.8]
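The "about 2/3" rule of thumb can be tried on data. The milk values are not listed in these slides, so this sketch uses the seedling heights instead; with only 13 observations the proportion is a rough check, not a proof.

```r
# Seedling heights (cm)
hts <- c(42, 23, 43, 34, 49, 56, 31, 47, 61, 54, 46, 34, 26)

ybar <- mean(hts)
s    <- sd(hts)

# Which observations lie within one SD of the mean?
within_1sd <- abs(hts - ybar) <= s

n_within    <- sum(within_1sd)    # count of observations inside the interval
prop_within <- mean(within_1sd)   # proportion inside the interval

cat("Interval:", round(ybar - s, 2), "to", round(ybar + s, 2), "\n")
cat("Observations within 1 SD:", n_within, "of", length(hts), "\n")
cat("Proportion:", round(prop_within, 2), "\n")
```

Here 8 of the 13 heights fall inside the interval, a proportion of about 0.62.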

The mean is sensitive to large values
Suppose data values are 2 4 6 7 8 10 12. Then ȳ = 7, s = 3.42.
Suppose data values are 2 4 6 7 8 10 102. Then ȳ = 19.86, s = 36.32.
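The effect of the single large value can be reproduced in R. The sketch below also reports the median, which is an addition for comparison and is not part of this slide: it stays the same in both data sets.

```r
# Two data sets that differ only in the largest value
a <- c(2, 4, 6, 7, 8, 10, 12)
b <- c(2, 4, 6, 7, 8, 10, 102)

# Mean, SD, and median for one data set
summarize <- function(y) c(mean = mean(y), sd = sd(y), median = median(y))

res <- rbind(a = summarize(a), b = summarize(b))
print(round(res, 2))
```

Changing 12 to 102 moves the mean from 7 to about 19.86 and the SD from about 3.42 to about 36.32, while the median remains 7.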

Key R commands
> hts = c(42, 23, 43, 34, 49, 56, 31, 47, 61, 54, 46, 34, 26) # enter data
> hts
 [1] 42 23 43 34 49 56 31 47 61 54 46 34 26
> length(hts) # sample size
[1] 13
> stem(hts) # stem-and-leaf plot
  The decimal point is 1 digit(s) to the right of the |
  2 | 36
  3 | 144
  4 | 23679
  5 | 46
  6 | 1
> hist(hts) # histogram plot
> mean(hts) # sample mean
[1] 42
> var(hts) # sample variance
[1] 138.1667
> sd(hts) # sample standard deviation
[1] 11.75443

Shape of the distribution of the data
Weight of soil: example 1
Actual weight of 15 2-lb. bags of soil used for a lab experiment.
2.36 2.27 2.42 2.13 2.19 2.33 2.54 2.21 2.06 2.36 2.51 2.45 2.12 2.32 2.29
  The decimal point is 1 digit(s) to the left of the |
  20 | 6
  21 | 239
  22 | 179
  23 | 2366
  24 | 25
  25 | 14
mean ȳ = 2.30, standard deviation s = 0.14

Shape of the distribution of the data
Weight of soil: example 2
  19 | 3
  20 | 5
  21 | 8
  22 | 47
  23 | 344789
  24 | 1124
mean ȳ = 2.30, standard deviation s = 0.15
Mean and spread are similar to ex. 1, but the distribution is...

Shape of the distribution of the data
Weight of soil: example 3
  20 | 69
  21 | 35779
  22 | 9
  23 |
  24 | 01358
  25 | 15
mean ȳ = 2.30, standard deviation s = 0.17
Mean and spread are similar to ex. 1, but the distribution is...
Need to look at the data, not just at numerical summaries!
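The three soil examples can be compared side by side in R. Only the first sample is listed as raw data; the values for examples 2 and 3 below are read back from the stem-and-leaf displays, which is an assumption that the displays show every observation to two decimals.

```r
# Example 1: raw data as listed
sample1 <- c(2.36, 2.27, 2.42, 2.13, 2.19, 2.33, 2.54, 2.21,
             2.06, 2.36, 2.51, 2.45, 2.12, 2.32, 2.29)
# Examples 2 and 3: reconstructed from the stem-and-leaf displays (assumption)
sample2 <- c(1.93, 2.05, 2.18, 2.24, 2.27, 2.33, 2.34, 2.34,
             2.37, 2.38, 2.39, 2.41, 2.41, 2.42, 2.44)
sample3 <- c(2.06, 2.09, 2.13, 2.15, 2.17, 2.17, 2.19, 2.29,
             2.40, 2.41, 2.43, 2.45, 2.48, 2.51, 2.55)

samples <- list(sample1 = sample1, sample2 = sample2, sample3 = sample3)

# Numerical summaries: nearly identical across the three samples
summ <- sapply(samples, function(y) c(mean = mean(y), sd = sd(y)))
print(round(summ, 2))

# Stem-and-leaf displays: the shapes differ
for (nm in names(samples)) {
  cat("\n", nm, "\n")
  stem(samples[[nm]])
}
```

R chooses the stems itself, so the printed displays may be grouped differently from the ones on the slides.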

Soil weight examples
[Figure: histograms of sample1, sample2, and sample3 (vertical axis: Frequency)]

Types of data
There are two broad classes of data: quantitative (i.e., numerical) and qualitative (i.e., categorical) data.
- For quantitative data, each observation has a number associated with it. ex: weight, milk yield, or # of cows on a farm. Quantitative data are either continuous or discrete. ex: weight and milk yield are continuous data and # of cows on a farm are discrete data.
- For qualitative data, each observation can be put into a category, which is either nominal or ordered. ex: 15 cows are assigned to 3 types of beds or 3 different diet types (VC = Vitamin and Choline):
bed types  # of cows    diet types  # of cows
Hay        5            high in VC  5
Cement     6            low in VC   5
Others     4            control     5
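In R, qualitative data are usually stored as factors, and table() gives the counts per category. The following sketch rebuilds the bed-type and diet-type counts from the cow example; the order of the cows within each vector is made up for illustration, since only the totals are given.

```r
# Bed type for each of the 15 cows (order within the vector is hypothetical)
bed <- factor(rep(c("Hay", "Cement", "Others"), times = c(5, 6, 4)),
              levels = c("Hay", "Cement", "Others"))

# Diet type for each of the 15 cows (VC = Vitamin and Choline)
diet <- factor(rep(c("high in VC", "low in VC", "control"), each = 5),
               levels = c("high in VC", "low in VC", "control"))

bed_counts  <- table(bed)    # number of cows per bed type
diet_counts <- table(diet)   # number of cows per diet type

print(bed_counts)
print(diet_counts)
```

Quantitative data such as weight or milk yield would instead be stored as plain numeric vectors.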