Chapter 2: Descriptive and Graphical Statistics


Chapter 2: Descriptive and Graphical Statistics
Section 2.1: Location Measures
Cathy Poliak, Ph.D.
cathy@math.uh.edu
Office: Fleming 11c
Department of Mathematics, University of Houston
Lecture 5 - Math 3339

Outline
1. Describing Distributions by Graphs
2. Numerical Descriptions
3. Mean, Median and Mode
4. Measurements of Spread
5. Percentiles
6. Quartiles
7. The 1.5IQR Rule

A Data Set: Course Grades From Previous Semesters
https://www.math.uh.edu/~cathy/math3339/data/grades.txt

Student  Score    Grade  Tests    Quiz    HW       Opt-out  Session
1        100.707  A      99.233   87.308  101.270  yes      Sp16
2        81.310   B      75       98.231  64.444   yes      Sp16
3        8.194    F      14.667   12.769  3.175    no       Sp16
4        90.449   A      91.533   77.231  82.222   yes      Sp16
5        68.461   D      65.783   81.769  68.571   no       Sp16
6        103.955  A      103.32   97.923  101.905  yes      Sp16
7        92.889   A      95.6     85.923  75.556   no       Sp16
8        84.805   B      83.2     79.385  75.238   yes      Sp16
9        91.640   A      89.967   91.231  85.079   yes      Sp16
10       22.316   F      17.433   40.615  44.444   no       Sp16
11       98.363   A      94.167   99.231  101.587  yes      Sp16
12       49.250   F      43.917   73.077  78.095   no       Sp16
13       16.967   F      15.5     20.077  29.841   no       Sp16
14       50.747   F      45.533   67.385  57.460   no       Sp16
15       43.184   F      72.983   47.462  38.413   no       Sp16
16       100.845  A      98.667   96.231  100.317  yes      Sp16
17       84.195   B      77.5     87.154  95.556   yes      Sp16
18       84.400   B      78.733   78.615  82.540   yes      Sp16
19       67.170   D      74.3     68.538  72.063   no       Fal15
20       87.413   B      92       82.077  77.778   yes      Fal15
21       67.899   D      71.8     71.077  84.127   no       Fal15
22       74.676   C      70.083   83.308  73.016   no       Fal15
23       40.054   F      44.133   21.308  33.333   no       Fal15
24       101.014  A      101.08   98.923  95.873   no       Fal15
25       11.972   F      17.1     10.385  3.810    no       Fal15
26       79.831   B      86.233   71.923  46.667   no       Fal15
27       83.301   B      94.6     69.692  60.317   no       Fal15
28       72.299   C      64.967   67.615  99.394   no       Sum16
29       83.821   B      77.2     80.923  83.030   yes      Sum16
30       90.703   A      83.617   87.923  80.000   no       Sum16

Distributions
When observing a data set, one of the first things we want to know is how each variable is distributed.
The distribution of a variable tells us what values it takes and how often it takes these values based on the individuals.
The distribution of a variable can be shown through tables, graphs, and numerical summaries.

Describing distributions
Graphs give an initial view of a distribution and its characteristics.
We then use numerical descriptions to get a better understanding of the distribution's characteristics.

Distributions for categorical variables
The distribution of a categorical variable lists the categories and gives either the count or the percent of cases that fall in each category.
One way to show it is a frequency table, which displays the categories alongside the count or percent of cases in each.
We then look at graphs (bar or pie) to determine the distribution of a categorical variable.

Frequency Tables

Opt-out  Percent
Yes      40%
No       60%

Grade  Percent
A      30%
B      26.67%
C      6.67%
D      10%
F      26.67%
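The percentages in these frequency tables can be reproduced with a short R sketch. It assumes the Grade and Opt-out columns of the data set above are typed in by hand as the vectors grade and opt_out (hypothetical names), rather than read from the course file.

```r
# Grade and Opt-out columns copied from the 30-student data set above.
grade <- c("A", "B", "F", "A", "D", "A", "A", "B", "A", "F",
           "A", "F", "F", "F", "F", "A", "B", "B", "D", "B",
           "D", "C", "F", "A", "F", "B", "B", "C", "B", "A")
opt_out <- c("yes", "yes", "no", "yes", "no", "yes", "no", "yes", "yes", "no",
             "yes", "no", "no", "no", "no", "yes", "yes", "yes", "no", "yes",
             "no", "no", "no", "no", "no", "no", "no", "no", "yes", "no")

# table() counts the cases in each category;
# prop.table() turns the counts into proportions of the total.
grade_pct <- round(100 * prop.table(table(grade)), 2)
opt_pct   <- round(100 * prop.table(table(opt_out)), 2)

print(grade_pct)
print(opt_pct)
```

Dropping the prop.table() call gives the count version of the same tables.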

Describing Data By Graphs
Graphs are an easy and quick way to describe the data. The type of graph we use depends on the type of data we have.
Graphs for categorical variables:
  Bar graphs: Each bar represents a category, and the height of each bar represents either the count or the percent.
  Pie charts: Help us see what part of the whole each group forms.
Graphs for quantitative variables:
  Dotplot
  Stemplot
  Histogram
  Boxplot

Bar Graph of Letter Grades
[Figure: bar graph of counts (axis 0 to 8) for letter grades A, B, C, D, F]

Pie Chart of Letter Grades
[Figure: pie chart with slices for letter grades A, B, C, D, F]

R code
First create a table:
    counts = table(grades$grade)
For a bar graph:
    barplot(counts)
For a pie chart:
    pie(counts)

Describing distributions of quantitative variables
The distribution of a variable tells us what values it takes and how often it takes these values.
There are four main characteristics to describe a distribution:
1. Shape
2. Center
3. Spread
4. Outliers

Describing a distribution
Shape
  A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other.
  A distribution is skewed to the right if the right side (higher values) of the graph extends much farther out than the left side.
  A distribution is skewed to the left if the left side (lower values) of the graph extends much farther out than the right side.
  A distribution is uniform if the graph is at the same height (frequency) from the lowest to the highest value of the variable.
Center - the value with roughly half the observations taking smaller values and half taking larger values.
Spread - from the graphs, we describe the spread of a distribution by giving the smallest and largest values.
Outliers - individual values that fall outside the overall pattern.

Dot plots
A dot plot is made by putting dots above the values listed on a number line.
[Figure: dot plot of Price of Basketball Shoes; axis from 0 to 300]
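As a rough illustration, a dot plot can be drawn in base R with stripchart(). The shoe prices behind the figure are not listed in these slides, so this sketch uses the course scores from the data set above instead, rounded to the nearest 10 (an assumption made only so that equal values stack visibly).

```r
# Course scores copied from the data set above.
scores <- c(100.707, 81.310, 8.194, 90.449, 68.461, 103.955, 92.889, 84.805,
            91.640, 22.316, 98.363, 49.250, 16.967, 50.747, 43.184, 100.845,
            84.195, 84.400, 67.170, 87.413, 67.899, 74.676, 40.054, 101.014,
            11.972, 79.831, 83.301, 72.299, 83.821, 90.703)

# Round to the nearest 10 so that repeated values pile up.
rounded <- round(scores, -1)

# method = "stack" places one dot above another for repeated values.
stripchart(rounded, method = "stack", pch = 16, offset = 0.5,
           xlab = "Course score (rounded to nearest 10)")

# The height of each stack equals the count of that value.
dot_counts <- table(rounded)
print(dot_counts)
```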

Stem-and-leaf plot
1. Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
R code: stem(dataset name$variable name)
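The three steps can also be carried out by hand in R, which shows what stem() does internally. This is a minimal sketch, not the implementation of stem(): it assumes the course scores from the data set above, rounded to whole numbers so that each leaf is a single digit, and the names stems, leaves and rows are illustrative.

```r
# Course scores copied from the data set above.
scores <- c(100.707, 81.310, 8.194, 90.449, 68.461, 103.955, 92.889, 84.805,
            91.640, 22.316, 98.363, 49.250, 16.967, 50.747, 43.184, 100.845,
            84.195, 84.400, 67.170, 87.413, 67.899, 74.676, 40.054, 101.014,
            11.972, 79.831, 83.301, 72.299, 83.821, 90.703)

x <- round(scores)

# Step 1: stem = all but the final digit, leaf = the final digit.
stems  <- x %/% 10
leaves <- x %% 10

# Step 2: list every stem from smallest to largest, including empty ones.
by_stem <- split(leaves, factor(stems, levels = min(stems):max(stems)))

# Step 3: write the leaves of each stem in increasing order.
rows <- sapply(by_stem, function(l) paste(sort(l), collapse = ""))

for (s in names(rows)) cat(sprintf("%2s | %s\n", s, rows[[s]]))
```

Keeping the empty stem (here, 3) matters: dropping it would hide the gap in the distribution.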

Stem-and-leaf Plot
This is the number of wins each pitcher recorded in the 2015 baseball season.
> stem(era$wins)

  The decimal point is 1 digit(s) to the right of the |

  2 | 679
  3 | 8
  4 | 1246678
  5 | 022223345677889
  6 | 1234577
  7 | 0019
  8 | 6

Stem-and-leaf Plot of ERA
> stem(era$era)

  The decimal point is at the |

  1 | 78
  2 | 1
  2 | 567889
  3 | 00023344
  3 | 67777889
  4 | 0001111233
  4 | 579

Example of Stem-and-leaf Plot
> stem(grades$score)

  The decimal point is 1 digit(s) to the right of the |

   0 | 8
   1 | 27
   2 | 2
   3 |
   4 | 039
   5 | 1
   6 | 788
   7 | 25
   8 | 01344457
   9 | 01238
  10 | 1114

Better Plot
> stem(grades$score,scale=0.5)

  The decimal point is 1 digit(s) to the right of the |

   0 | 827
   2 | 2
   4 | 0391
   6 | 78825
   8 | 0134445701238
  10 | 1114

1. What is the "shape" of this distribution?
   a) skewed left b) skewed right c) symmetric d) uniform
2. What is the approximate center of this distribution?
   a) 50 b) 82 c) 8.5 d) 4

Histograms
Bar graph for quantitative variables.
Values of the variable are grouped together.
Bars are touching.
The width of the bar represents an interval of values (range of numbers) for that variable.
The height of the bar represents the number of cases within that range of values.
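A histogram of the course scores can be sketched in R with hist(). The class boundaries 0, 20, ..., 120 are an assumption chosen to span the data; the slides do not state which breaks were used.

```r
# Course scores copied from the data set above.
scores <- c(100.707, 81.310, 8.194, 90.449, 68.461, 103.955, 92.889, 84.805,
            91.640, 22.316, 98.363, 49.250, 16.967, 50.747, 43.184, 100.845,
            84.195, 84.400, 67.170, 87.413, 67.899, 74.676, 40.054, 101.014,
            11.972, 79.831, 83.301, 72.299, 83.821, 90.703)

# Each bar covers one interval of width 20 (assumed class boundaries).
breaks <- seq(0, 120, by = 20)

# hist() draws the bars and returns the count of cases in each interval.
h <- hist(scores, breaks = breaks,
          main = "Histogram of Course Scores", xlab = "Course Scores")

# Bar heights: number of scores falling in each interval.
print(h$counts)
```

By default hist() uses right-closed intervals, so a score exactly on a boundary is counted in the class below it.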

Histogram of Course Scores
[Figure: histogram; x-axis: Course Scores (0 to 120), y-axis: Frequency (0 to 12)]

Cumulative Frequency Polygon
Plot a point above each upper class boundary at a height equal to the cumulative frequency of the class.
Connect the plotted points with line segments.
A similar graph can be used with the cumulative percents.
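The construction can be sketched in R as follows. The classes of width 20 are an assumption, and so is the extra point at height 0 above the lowest boundary, a common convention that lets the polygon start on the axis.

```r
# Course scores copied from the data set above.
scores <- c(100.707, 81.310, 8.194, 90.449, 68.461, 103.955, 92.889, 84.805,
            91.640, 22.316, 98.363, 49.250, 16.967, 50.747, 43.184, 100.845,
            84.195, 84.400, 67.170, 87.413, 67.899, 74.676, 40.054, 101.014,
            11.972, 79.831, 83.301, 72.299, 83.821, 90.703)

breaks <- seq(0, 120, by = 20)               # assumed class boundaries
h <- hist(scores, breaks = breaks, plot = FALSE)

cum_freq <- cumsum(h$counts)                 # cumulative frequency per class
cum_prop <- cum_freq / length(scores)        # cumulative proportion

# One point above each upper class boundary, joined by line segments.
# The leading 0 (assumption) anchors the polygon at the lowest boundary.
plot(breaks, c(0, cum_prop), type = "b",
     xlab = "Course Scores", ylab = "Cumulative Proportion",
     main = "Cumulative Percent Polygon")
```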

Cumulative Percent Polygon
[Figure: cumulative frequency chart; x-axis: 0 to 120, y-axis: Cumulative Proportion (0.0 to 1.0)]

Describing Quantitative Variables with Numbers
Center - mean, median or mode
Spread - range, interquartile range, variance, or standard deviation
Location - percentiles or standard scores
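Most of these summaries are one-line calls in base R. The sketch below applies them to the course scores from the data set above; the names center, spread, quartiles and z are illustrative. The mode is left out because base R has no built-in function for the statistical mode.

```r
# Course scores copied from the data set above.
scores <- c(100.707, 81.310, 8.194, 90.449, 68.461, 103.955, 92.889, 84.805,
            91.640, 22.316, 98.363, 49.250, 16.967, 50.747, 43.184, 100.845,
            84.195, 84.400, 67.170, 87.413, 67.899, 74.676, 40.054, 101.014,
            11.972, 79.831, 83.301, 72.299, 83.821, 90.703)

# Center
center <- c(mean = mean(scores), median = median(scores))

# Spread
spread <- c(range    = diff(range(scores)),  # largest minus smallest
            IQR      = IQR(scores),
            variance = var(scores),
            sd       = sd(scores))

# Location: percentiles (here the quartiles) and standard scores
quartiles <- quantile(scores, probs = c(0.25, 0.50, 0.75))
z <- (scores - mean(scores)) / sd(scores)

print(center)
print(spread)
print(quartiles)
```

For these scores the mean falls below the median, which agrees with the left-skewed shape seen in the stem-and-leaf plot.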

Parameters and Statistics
A parameter is a number that describes the population. A parameter is a fixed number, but in practice we usually do not know its value.
A statistic is a number that describes a sample. The value of a statistic is known when we have taken a sample, but it can change from sample to sample.
The purpose of sampling or experimentation is usually to use statistics to make statements about unknown parameters; this is called statistical inference.
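A small simulation can make the distinction concrete. Everything in this sketch is hypothetical: the population is simulated (normal with mean 50 and standard deviation 10), and the seed, population size and sample size are arbitrary choices, not values from the course.

```r
set.seed(3339)  # arbitrary seed, only for reproducibility

# Hypothetical population; in practice the population is rarely observed.
population <- rnorm(100000, mean = 50, sd = 10)

# Parameter: one fixed number describing the whole population.
mu <- mean(population)

# Statistic: the mean of a sample of 100; repeat for five different samples.
xbar <- replicate(5, mean(sample(population, 100)))

print(mu)    # stays the same no matter which sample is drawn
print(xbar)  # varies from sample to sample
```

Each sample mean lands near mu, but no two samples are expected to give exactly the same value.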

Notation of Parameters and Statistics

Name                    Statistic  Parameter
mean                    x̄          µ (mu)
standard deviation      s          σ (sigma)
correlation             r          ρ (rho)
regression coefficient  b          β (beta)
proportion              p̂          p

Example
A carload lot of ball bearings has a mean diameter of 2.503 centimeters. This is within the specifications for acceptance of the lot by the purchaser. The inspector happens to inspect 100 bearings from the lot with a mean diameter of 2.515 centimeters. This is outside the specified limits, so the lot is mistakenly rejected.
Is each of the bold numbers (2.503 and 2.515) a parameter or a statistic?

Presidential Approval Rating
According to Gallup.com, on January 25, 2017, 46% of Americans approved of how Trump was doing as President.
Gallup tracks daily the percentage of Americans who approve or disapprove of the job Donald Trump is doing as president. Daily results are based on telephone interviews with approximately 1,500 national adults; the margin of error is ± 3 percentage points.
Is this 46% a statistic or a parameter?

Question about the Graphs
Given the first type of plot indicated in each pair, which of the second plots could not always be generated from it?
a) dot plot, histogram
b) stem and leaf, dot plot
c) histogram, stem and leaf
d) dot plot, box plot