Variables, distributions, and samples. Phil 12: Logic and Decision Making Spring 2011 UC San Diego 4/21/2011


Midterm this Tuesday!
Don't need a blue book or scantron
Just bring something to write with
Sample midterm
Not posting an answer key
Check answers by checking text, notes, in section, office hours, email
If asking me or TAs, must talk through what you think the answer might be, talk through options, reasoning

Anonymous clicker question
Do you want me to hold office hours Monday afternoon or evening?
A. Yes, Monday 2-4pm
B. Yes, Monday 3-5pm
C. No, I'm good

Review
Observational research involves careful recording and analysis of what is observed
- Without an attempt to manipulate what happens
Naturalistic vs. participant observation
Risks that must be minimized:
- Observer bias
- Reactivity
- Anthropomorphizing

Coding Schemes
A coding scheme is a set of categories used to classify observed phenomena
- extract data so as to learn from the observations
How can a coding scheme be poorly designed?
- fail to have a category for some phenomena you care about recording and analyzing
- use one category for phenomena you would like to distinguish

Recording continuously vs. selectively
Continuous observation: recording what is happening at every moment of time
Time sampling: recording what is happening at predetermined intervals
Event sampling: recording whenever an event of a specified kind occurs
Situation sampling: recording what happens in a variety of different situations (locations)

Clicker question
To determine how many students carry backpacks, a researcher sits outside the library and records, for every fifth student who exits, whether they have a backpack. The researcher is performing
A. Continuous observation
B. Time sampling
C. Event sampling
D. Situation sampling

Variables
The data from observational research is analyzed in terms of variables
A variable is a characteristic or feature of an event that varies (i.e., takes on different values)
- Variables of a thrown ball: velocity, momentum, direction, spin,...
- Variables of human hair: color, length, texture,...
- Variables of human cognition: memory span, speed of reasoning, emotional state,...

Types of variables
Variables differ in the type of measurement of the values of the variable that is possible.
Sometimes one refers to types of scales rather than types of variables.
1. Categorical or nominal variables
2. Ordinal or rank variables
3. Interval variables
4. Ratio variables

Types of variables - 1
Categorical or nominal variables: items can be assigned to a category (whose members can then be counted, or compared on another variable)
- Examples:
  Gender: male/female
  Major: psychology, political science, economics,...
  Organisms: Plant, Animal, Bacteria, Virus,...

Types of variables - 2
Ordinal or rank variables: There is a rank-order to the values the variable may take
- Numbers might be assigned to the items, but since there is no metric, one cannot compare how much higher or lower one item on the scale is than another
- Examples:
  Movies: *, **, ***, ****
  Class rank: top 10, next 10, etc.
  Patient condition: resting and comfortable, stable, guarded, and critical
  Socioeconomic class: low, middle, high

Types of variables - 3
Interval variables: equal differences between numbers assigned to items reflect equal differences between the values being measured.
- Allows additive comparison (e.g., x is three more than y)
- But lacking a natural zero-point, does not permit multiplicative comparison (e.g., x is three times y)
- Examples:
  Intelligence: IQ score
  Temperature: in degrees Celsius or Fahrenheit
  Personality: degree of extroversion

Types of variables - 4
Ratio variables: items are rated on a scale with equal intervals and a natural 0-point.
- Allows for both additive and multiplicative comparison
- Examples:
  Age: in years, months, days,...
  Temperature: in degrees Kelvin
  Time: in milliseconds, seconds, years,...
  Velocity, acceleration, etc.
- Interval and ratio data are often treated similarly and counted as score data

Summary: Types of Variables
Type of variable             Example
Categorical or nominal       college major
Ordinal or rank              patient condition
Interval (score variable)    temperature in degrees Fahrenheit
Ratio (score variable)       age
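The difference between interval and ratio scales can be checked with a little arithmetic. The sketch below uses two made-up temperatures (10 and 20 degrees Celsius, not from the slides) and shows that differences agree on both scales, while the "twice as much" comparison only makes sense on the Kelvin scale, which has a natural zero.

```python
# Two hypothetical temperatures, chosen only for illustration.
c1, c2 = 10.0, 20.0                      # degrees Celsius (interval scale)
k1, k2 = c1 + 273.15, c2 + 273.15        # same temperatures in Kelvin (ratio scale)

# Additive comparison: the difference is the same on both scales.
diff_c = c2 - c1
diff_k = k2 - k1

# Multiplicative comparison: only the Kelvin ratio is physically meaningful.
ratio_c = c2 / c1   # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = k2 / k1   # about 1.035

print(f"difference: {diff_c:.2f} C, {diff_k:.2f} K")
print(f"ratio on Celsius scale: {ratio_c:.3f}")
print(f"ratio on Kelvin scale:  {ratio_k:.3f}")
```

The Celsius ratio changes if the zero-point is moved (for example by converting to Fahrenheit), which is why multiplicative comparisons are reserved for ratio variables.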

Clicker question
The variable "number of clicker responses" is
A. A categorical or nominal variable
B. An ordinal or rank variable
C. An interval variable
D. A ratio variable

Clicker question
On the CAPE evaluations, you respond to questions such as "Exams are representative of the course material" (the variables being measured) using the following answer choices (values):
1 = strongly disagree
2 = disagree
3 = neither
4 = agree
5 = strongly agree
What type of variable are these questions?
A. A categorical or nominal variable
B. An ordinal or rank variable
C. An interval variable
D. A ratio variable

Visual representations of data

Nominal & ordinal variables: Bar graphs & Pie Charts Example: Profile of pet ownership in San Diego County

Score variables: Histograms
Histograms rather than bar graphs are used because score variables are continuous
This is done by creating bins and tabulating the number of items in each bin
The size of bins can create radically different pictures of the distribution!
[Figure: the same data plotted with bin size 0.25 and with bin size 1]

[Figure: three histograms of "Daily Life Activities: Studying (online + offline)", number of people against hours (0 to 12), drawn with bin sizes of 1 hr, 0.5 hr, and 0.25 hr]
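The effect of bin size can be tried out directly. The following is a minimal sketch that tabulates the same values with bins of 1, 0.5, and 0.25 hours; the `hours` list is invented for illustration and is not the class data behind the histograms, and `bin_counts` is a helper name introduced here.

```python
import math
from collections import Counter

def bin_counts(values, bin_size):
    """Count how many values fall in each bin [start, start + bin_size)."""
    counts = Counter(math.floor(v / bin_size) * bin_size for v in values)
    return dict(sorted(counts.items()))

# Hypothetical hours of studying per day for 12 people (not the class data).
hours = [0.5, 1.0, 1.25, 1.5, 2.0, 2.0, 2.25, 2.75, 3.0, 3.5, 4.0, 6.0]

for size in (1, 0.5, 0.25):
    print(f"bin size {size}: {bin_counts(hours, size)}")
```

With the 1-hour bins the data look like a single hump around 2 hours; with 0.25-hour bins the same values spread over many short bars.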

Normal and non-normal distributions
Normal distributions
- Have a single peak
- Scores equally distributed around the peak
- Fewer scores further from the peak
Non-normal distributions
- Skewed
- Bimodal

[Figure: two histograms of "Daily Life Activities" (N = 32), number of people against hours (0 to 12), one for "Studying (online + offline)" and one for "In class"]

Clicker question
The distribution below is
<100   100-199   200-299   300-399   400-499   500-599   600-699   700-799   >800
63     45        35        37        82        35        39        53        53
A. Normal since it has one peak
B. Normal since scores are equally distributed around the peak
C. Not normal because there are not fewer scores further from the peak
D. Not normal because scores are not equally distributed around the peak

Describing distributions
Two principal measures:
1. Central tendency
   [Figure: two comparable distributions differing in central tendency]
2. Variability (the standard deviation)
   [Figure: two distributions with same central tendency but differing in variability]

Three measures of central tendency
Mean: the arithmetic average (sum of all the scores divided by the number of instances)
Median: the score of which half are higher and half are lower
Mode: the most frequent score
Consider this distribution of values: 2, 6, 9, 7, 9, 9, 10, 8, 6, 7
mean = 73 / 10 = 7.3
median = 7.5
mode = 9
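The three values for the distribution above can be checked with Python's standard `statistics` module. This is only a sketch for verifying the arithmetic; the variable names are introduced here.

```python
import statistics

scores = [2, 6, 9, 7, 9, 9, 10, 8, 6, 7]

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # middle of the sorted scores (average of the two middle values here)
mode = statistics.mode(scores)      # most frequent score

print(sorted(scores))
print("mean:", mean, "median:", median, "mode:", mode)
```

Sorting makes the median easy to see: the two middle values are 7 and 8, so the median is 7.5.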

Which measure to use?
If the distribution is normal, all three measures of central tendency give the same result
- The mean is the easiest to calculate and the most frequently reported
If there are extreme outliers in one direction, the mean may be distorted
- Exam scores: 21, 72, 76, 79, 82, 84, 87, 88, 90, 91, 95
  Mean: 78.6
  Median: 84
- In such a case, the median gives a better picture of the central tendency of the class
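To see how much a single outlier moves each measure, the sketch below recomputes the mean and median of the exam scores with and without the score of 21. Dropping the outlier is done here only to illustrate its influence; the slides do not suggest discarding it.

```python
import statistics

scores = [21, 72, 76, 79, 82, 84, 87, 88, 90, 91, 95]
without_outlier = scores[1:]  # same scores minus the 21, for comparison only

mean_all = statistics.mean(scores)
median_all = statistics.median(scores)
mean_trimmed = statistics.mean(without_outlier)
median_trimmed = statistics.median(without_outlier)

print(f"with 21:    mean {mean_all:.1f}, median {median_all}")
print(f"without 21: mean {mean_trimmed:.1f}, median {median_trimmed}")
```

The mean shifts by almost six points while the median shifts by only one and a half, which is the sense in which the median is less distorted by outliers.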

Measures of variability
Variability concerns: How much do the scores vary?
Range: the lowest value to the highest value
[Figure: two example histograms]

Measures of variability
Variability concerns: How much do the scores vary?
Range: the lowest value to the highest value
Variance: $\frac{\sum (X - \text{mean})^2}{N}$
Standard deviation: $\sqrt{\text{Variance}}$
[Figure: two distributions with Mean = 5.0, one with SD = 0 and one with SD = 1.04]

Measures of variability
Variability concerns: How much do the scores vary?
Range: the lowest value to the highest value
Variance: $\frac{\sum (X - \text{mean})^2}{N}$
Standard deviation: $\sqrt{\text{Variance}}$
- Intuitive interpretation (for a normal distribution):
  1 SD either side of the mean: the part of the range in which about 68% of the scores fall
  2 SD either side of the mean: the part of the range in which about 95% of the scores fall
  3 SD either side of the mean: the part of the range in which about 99% of the scores fall
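The 68% / 95% / 99% figures can be checked by simulation. The sketch below draws a large number of normally distributed scores and counts how many fall within 1, 2, and 3 SD of the mean. The mean of 100 and SD of 15 are arbitrary assumptions, and `fraction_within` is a helper name introduced here.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical normally distributed scores (mean 100, SD 15 chosen arbitrarily).
scores = [random.gauss(100, 15) for _ in range(100_000)]
mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)

def fraction_within(k):
    """Fraction of scores lying within k standard deviations of the mean."""
    low, high = mean - k * sd, mean + k * sd
    return sum(low <= x <= high for x in scores) / len(scores)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.3f}")
```

The simulated fractions should come out close to 0.68, 0.95, and just over 0.99. The rule applies to normal distributions; skewed or bimodal data can give quite different fractions.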

Variance
Consider a distribution (Mean = 6):
X                4    5    5    6    6    6    7    7    8
X - mean        -2   -1   -1    0    0    0    1    1    2
(X - mean)^2     4    1    1    0    0    0    1    1    4
$\text{Variance} = \frac{\sum (X - \text{mean})^2}{N} = \frac{12}{9} = 1.33$
$\text{SD} = \sqrt{\text{Variance}} = \sqrt{1.33} = 1.15$
Range of 1 SD = 6 ± 1.15 = 4.85 to 7.15
Range of 2 SD = 6 ± 2.30 = 3.70 to 8.30
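The same calculation can be followed step by step in code. This sketch mirrors the rows of the worked example (deviations, squared deviations, their sum divided by N) and uses the population formula, dividing by N as the slide does; the variable names are introduced here.

```python
import math

xs = [4, 5, 5, 6, 6, 6, 7, 7, 8]
n = len(xs)

mean = sum(xs) / n                         # 54 / 9
deviations = [x - mean for x in xs]        # X - mean
squared = [d ** 2 for d in deviations]     # (X - mean)^2
variance = sum(squared) / n                # divide by N (population variance)
sd = math.sqrt(variance)

print("mean:", mean)
print("sum of squared deviations:", sum(squared))
print(f"variance: {variance:.2f}, SD: {sd:.2f}")
print(f"range of 1 SD: {mean - sd:.2f} to {mean + sd:.2f}")
print(f"range of 2 SD: {mean - 2 * sd:.2f} to {mean + 2 * sd:.2f}")
```

The standard library's `statistics.pvariance` and `statistics.pstdev` compute the same quantities in one call.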

Range and Standard Deviation
[Figure: a distribution marked with the range, the span containing 68% of scores, and the span containing 95% of scores]

Clicker question
On an exam on which scores were distributed normally and the mean was 86 and the SD was 4,
A. 68% of the scores were between 78 and 94
B. 68% of the scores were between 82 and 90
C. 95% of the scores were between 78 and 94
D. 95% of the scores were between 82 and 90
E. None of the above

Populations
The phenomena about which we seek to draw conclusions in a study are known as the population.
Sometimes one can study each member of the population of interest
But if the population is large:
- it may be impossible to study the whole population
- there may be no need to study the whole population

Samples
A sample is a subset of the population chosen for study.
From studying the distribution of a variable in a sample, one makes an estimate of the distribution in the actual population
Sometimes the estimate from a sample may be more accurate than trying to study the population itself
- U.S. Census

Is the sample biased?
If information about the sample is to be informative about the actual population, the sample must be representative
- Randomization: attempt to ensure that the sample is representative by avoiding bias in selecting the sample
Risk: inadvertently developing a misrepresentative sample
- E.g., using telephone numbers in the phonebook to sample the electorate

Does the sample reflect the population?
Does the mean of the sample reflect the mean of the actual population?
- Sampling distribution simulation
- Very unlikely that the mean of the sample will exactly equal the mean of the population
- Key question: how much does the mean of the sample vary from the mean of the actual population?
Given the mean of a sample, what is the range within which the mean of the actual population lies?
- To determine this, the standard deviation measure is very useful

Standard deviation and mean
In 68% of samples, the mean of the population will fall within 1 standard deviation of the mean of the sample
In 95% of samples, the mean of the population will fall within 2 standard deviations of the mean of the sample
[Figure label: Sample mean]
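A small simulation can show what the 68% and 95% claims look like in practice. The sketch below rests on an interpretation the slide does not spell out: "standard deviation" is read as the standard deviation of the sample mean (the standard error, estimated as the sample SD divided by the square root of the sample size). The population mean of 50, population SD of 25, and sample size of 50 are all hypothetical.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 50.0, 25.0   # hypothetical population, assumed normal
N, TRIALS = 50, 5000            # sample size and number of repeated samples

within1 = within2 = 0
for _ in range(TRIALS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_mean = statistics.fmean(sample)
    # Assumed reading of "SD" here: the standard error of the sample mean.
    std_error = statistics.stdev(sample) / math.sqrt(N)
    gap = abs(POP_MEAN - sample_mean)
    within1 += gap <= std_error
    within2 += gap <= 2 * std_error

frac1 = within1 / TRIALS
frac2 = within2 / TRIALS
print(f"population mean within 1 SE of sample mean: {frac1:.3f}")
print(f"population mean within 2 SE of sample mean: {frac2:.3f}")
```

Under these assumptions the two fractions come out near 0.68 and 0.95. Using the raw sample SD in place of the standard error would give intervals that almost always contain the population mean.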

What happens as sample size gets larger?
As sample size grows, the SD of the sample mean (the standard error) shrinks
So with larger samples, the range of 2 standard deviations shrinks
Assume sample mean is 50:
Sample size   Range of 2 SD (95% confidence interval)   Range of 3 SD (99% confidence interval)
10            34.5-65.5                                 29.5-70.5
20            39-61                                     35.6-64.4
50            43-57                                     40.9-59.1
100           45-55                                     43.5-56.5
500           47.8-52.2                                 47.1-52.9
1000          48.4-51.6                                 48-52
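The shrinking intervals follow from dividing the spread by the square root of the sample size. The slide does not say what spread was assumed. The sketch below guesses an SD of 25 and uses the usual multipliers 1.96 (95%) and 2.576 (99%), which comes close to the tabulated figures; both the value 25 and the multipliers are assumptions, and `interval` is a helper name introduced here.

```python
import math

SAMPLE_MEAN = 50.0
ASSUMED_SD = 25.0          # assumption: not stated on the slide
Z95, Z99 = 1.96, 2.576     # conventional multipliers for 95% and 99% intervals

def interval(n, z):
    """Return (low, high) for sample size n: mean +/- z * SD / sqrt(n)."""
    half_width = z * ASSUMED_SD / math.sqrt(n)
    return SAMPLE_MEAN - half_width, SAMPLE_MEAN + half_width

for n in (10, 20, 50, 100, 500, 1000):
    lo95, hi95 = interval(n, Z95)
    lo99, hi99 = interval(n, Z99)
    print(f"n={n:5d}  95%: {lo95:.1f}-{hi95:.1f}  99%: {lo99:.1f}-{hi99:.1f}")
```

Because the half-width scales with 1 / sqrt(n), quadrupling the sample size halves the width of the interval.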

Example of estimating population mean from sample mean
Example: age of people eating at the Food Court
- Draw a sample to make inference of average age of people eating at the Food Court
Age          <17   17   18   19   20   21   22   23   24   25   >25
Population     6   18   23   34   32   18   26   29   14   10    10
Sample counts (n = 10): 2, 1, 3, 1, 2, 1

Estimating real distribution
Age          <17   17   18   19   20   21   22   23   24   25   >25
Population     6   18   23   34   32   18   26   29   14   10    10
Sample 1 counts (n = 10): 2, 1, 3, 1, 2, 1
Sample 2 counts (n = 20): 1, 2, 4, 6, 3, 2, 2
Mean of the actual population: 20.63
                      Sample 1     Sample 2
Mean of the sample:   19.4         20.1
SD of the sample:     1.9          1.6
Range of 1 SD:        17.5-22.3    18.5-21.7
Range of 2 SD:        15.9-24.2    16.9-23.3
Want to predict more accurately? Use a larger sample size

Review
Four types of variables:
- Nominal, ordinal, interval, ratio
Values of variables are distributed
- Important goal: characterizing the distribution
Graphs
- Bar graphs for nominal and ordinal variables
- Histograms for score variables
Normal versus non-normal distributions
- Skewed, bimodal, etc.

Review
Two principal measures of distributions
- Central tendency
  Mean, median, mode
- Variability
  Range, variance, SD
- 1 SD includes approx. 68% of scores
- 2 SD includes approx. 95% of scores
- 3 SD includes approx. 99% of scores

Review
Population and samples
- From studying the distribution in sample, estimate the distribution in the actual population
- Mean of actual population will
  Fall within one SD of mean of sample for 68% of samples
  Fall within two SD of mean of sample for 95% of samples
  Fall within three SD of mean of sample for 99% of samples
- Larger sample yields smaller SD of the sample mean and hence more precise estimate
- Hence, to improve the precision of an estimate, use a larger sample