Chapter 2: Descriptive and Graphical Statistics

Section 2.1: Location Measures
Cathy Poliak, Ph.D.
cathy@math.uh.edu
Office: Fleming 11c
Department of Mathematics, University of Houston
Lecture 5 - Math 3339

Outline
1. Describing Distributions by Graphs
2. Numerical Descriptions
3. Mean, Median and Mode
4. Measurements of Spread
5. Percentiles
6. Quartiles
7. The 1.5IQR Rule

A Data Set: Course Grades From Previous Semesters
https://www.math.uh.edu/~cathy/math3339/data/grades.txt

Student  Score    Grade  Tests    Quiz    HW       Opt-out  Session
1        100.707  A      99.233   87.308  101.270  yes      Sp16
2        81.310   B      75       98.231  64.444   yes      Sp16
3        8.194    F      14.667   12.769  3.175    no       Sp16
4        90.449   A      91.533   77.231  82.222   yes      Sp16
5        68.461   D      65.783   81.769  68.571   no       Sp16
6        103.955  A      103.32   97.923  101.905  yes      Sp16
7        92.889   A      95.6     85.923  75.556   no       Sp16
8        84.805   B      83.2     79.385  75.238   yes      Sp16
9        91.640   A      89.967   91.231  85.079   yes      Sp16
10       22.316   F      17.433   40.615  44.444   no       Sp16
11       98.363   A      94.167   99.231  101.587  yes      Sp16
12       49.250   F      43.917   73.077  78.095   no       Sp16
13       16.967   F      15.5     20.077  29.841   no       Sp16
14       50.747   F      45.533   67.385  57.460   no       Sp16
15       43.184   F      72.983   47.462  38.413   no       Sp16
16       100.845  A      98.667   96.231  100.317  yes      Sp16
17       84.195   B      77.5     87.154  95.556   yes      Sp16
18       84.400   B      78.733   78.615  82.540   yes      Sp16
19       67.170   D      74.3     68.538  72.063   no       Fal15
20       87.413   B      92       82.077  77.778   yes      Fal15
21       67.899   D      71.8     71.077  84.127   no       Fal15
22       74.676   C      70.083   83.308  73.016   no       Fal15
23       40.054   F      44.133   21.308  33.333   no       Fal15
24       101.014  A      101.08   98.923  95.873   no       Fal15
25       11.972   F      17.1     10.385  3.810    no       Fal15
26       79.831   B      86.233   71.923  46.667   no       Fal15
27       83.301   B      94.6     69.692  60.317   no       Fal15
28       72.299   C      64.967   67.615  99.394   no       Sum16
29       83.821   B      77.2     80.923  83.030   yes      Sum16
30       90.703   A      83.617   87.923  80.000   no       Sum16

Distributions
When observing a data set, one of the first things we want to know is how each variable is distributed.
The distribution of a variable tells us what values it takes and how often it takes these values based on the individuals.
The distribution of a variable can be shown through tables, graphs, and numerical summaries.

Describing distributions
An initial view of a distribution and its characteristics can be obtained from graphs.
We then use numerical descriptions to get a better understanding of the distribution's characteristics.

Distributions for categorical variables
The distribution of a categorical variable lists the categories and gives either the count or the percent of cases that fall in each category.
One way to show it is a frequency table, which displays the categories along with the count or percent of cases that fall in each category.
We then look at graphs (bar graph or pie chart) to determine the distribution of a categorical variable.

Frequency Tables

Opt-out  Percent
Yes      40%
No       60%

Grade  Percent
A      30%
B      26.67%
C      6.67%
D      10%
F      26.67%
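
As a minimal sketch of how the grade percents above could be computed in R, the block below retypes the Grade column of the course data by hand. The vector name `grade` is chosen here for illustration only; in the lecture the data are read into `grades`.

```r
# Letter grades of the 30 students, typed in from the course data set
grade <- c("A","B","F","A","D","A","A","B","A","F",
           "A","F","F","F","F","A","B","B","D","B",
           "D","C","F","A","F","B","B","C","B","A")

counts  <- table(grade)                        # number of students per grade
percent <- round(100 * prop.table(counts), 2)  # percent of students per grade
print(counts)
print(percent)
```

`prop.table()` divides each count by the total, so the percents add up to 100 apart from rounding.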

Describing Data By Graphs
Graphs are an easy and quick way to describe the data.
The type of graph we use depends on the type of data we have.
Graphs for categorical variables:
  Bar graphs: each bar represents a category, and the height of each bar is the count or percent for that category.
  Pie charts: help us see what part of the whole each group forms.
Graphs for quantitative variables:
  Dotplot
  Stemplot
  Histogram
  Boxplot

Bar Graph of Letter Grades
[Figure: bar graph of letter grades; x-axis: grade (A, B, C, D, F), y-axis: count (0 to 8)]

Pie Chart of Letter Grades
[Figure: pie chart of letter grades with slices A, B, C, D, F]

R code
First create a table:
    counts = table(grades$grade)
For a bar graph:
    barplot(counts)
For a pie chart:
    pie(counts)

Describing distributions of quantitative variables
The distribution of a variable tells us what values it takes and how often it takes these values.
There are four main characteristics to describe a distribution:
1. Shape
2. Center
3. Spread
4. Outliers

Describing a distribution
Shape:
  A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other.
  A distribution is skewed to the right if the right side (higher values) of the graph extends much farther out than the left side.
  A distribution is skewed to the left if the left side (lower values) of the graph extends much farther out than the right side.
  A distribution is uniform if the graph is at the same height (frequency) from the lowest to the highest value of the variable.
Center: the value with roughly half the observations taking smaller values and half taking larger values.
Spread: from the graphs, we describe the spread of a distribution by giving the smallest and largest values.
Outliers: individual values that fall outside the overall pattern.

Dot plots
A dot plot is made by putting dots above the values listed on a number line.
[Figure: dot plot of Price of Basketball Shoes; axis from 0 to 300]

Stem-and-leaf plot
1. Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
R code: stem(dataset_name$variable_name)

Stem-and-leaf Plot
This is the number of wins for each pitcher in the 2015 baseball season.
> stem(era$wins)

  The decimal point is 1 digit(s) to the right of the |

  2 | 679
  3 | 8
  4 | 1246678
  5 | 022223345677889
  6 | 1234577
  7 | 0019
  8 | 6

Stem-and-leaf Plot of ERA
> stem(era$era)

  The decimal point is at the |

  1 | 78
  2 | 1
  2 | 567889
  3 | 00023344
  3 | 67777889
  4 | 0001111233
  4 | 579

Example of Stem-and-leaf Plot
> stem(grades$score)

  The decimal point is 1 digit(s) to the right of the |

   0 | 8
   1 | 27
   2 | 2
   3 |
   4 | 039
   5 | 1
   6 | 788
   7 | 25
   8 | 01344457
   9 | 01238
  10 | 1114

Better Plot
> stem(grades$score,scale=0.5)

  The decimal point is 1 digit(s) to the right of the |

   0 | 827
   2 | 2
   4 | 0391
   6 | 78825
   8 | 0134445701238
  10 | 1114

1. What is the "shape" of this distribution?
   a) skewed left  b) skewed right  c) symmetric  d) uniform
2. What is the approximate center of this distribution?
   a) 50  b) 82  c) 8.5  d) 4

Frequency Table of Scores

Score     Tally   Frequency (count)   Percent
0-20
20-40
40-60
60-80
80-100
100-120

Histograms
Bar graph for quantitative variables.
Values of the variable are grouped together.
Bars are touching.
The width of the bar represents an interval of values (range of numbers) for that variable.
The height of the bar represents the number of cases within that range of values.
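
The grouping behind a histogram can be sketched in R as follows. This is only an illustration under two assumptions: it uses the course scores rounded to whole numbers, and it puts a score that falls exactly on a class boundary into the class that starts there, which may not be the convention used for the lecture's figure.

```r
# Course scores rounded to whole numbers
scores <- c(8,12,17,22,40,43,49,51,67,68,68,72,75,80,81,
            83,84,84,84,85,87,90,91,92,93,98,101,101,101,104)

breaks <- seq(0, 120, by = 20)   # class boundaries 0, 20, ..., 120

# Assumption: classes are [0,20), [20,40), ..., [100,120)
classes <- cut(scores, breaks = breaks, right = FALSE)
freq    <- table(classes)
percent <- round(100 * freq / length(scores), 1)
print(cbind(Frequency = freq, Percent = percent))

# hist() with the same breaks gives the same bar heights (plot = FALSE skips drawing)
h <- hist(scores, breaks = breaks, right = FALSE, plot = FALSE)
print(h$counts)
```

Dropping `plot = FALSE` draws the histogram itself; changing `right = FALSE` to `right = TRUE` moves boundary values such as 40 and 80 into the lower class and changes some bar heights.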

Histogram of Course Scores
[Figure: histogram of course scores; x-axis: Course Scores (0 to 120), y-axis: Frequency (0 to 12)]

Cumulative Frequency Polygon
Plot a point above each upper class boundary at a height equal to the cumulative frequency of the class.
Connect the plotted points with line segments.
A similar graph can be used with the cumulative percents.
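
A minimal sketch of these steps in R, assuming the rounded course scores and classes of width 20 that include their lower boundary (both are choices made for this illustration):

```r
scores <- c(8,12,17,22,40,43,49,51,67,68,68,72,75,80,81,
            83,84,84,84,85,87,90,91,92,93,98,101,101,101,104)
breaks <- seq(0, 120, by = 20)

# Frequency of each class [0,20), [20,40), ..., [100,120)
freq <- table(cut(scores, breaks = breaks, right = FALSE))

cum_freq <- cumsum(as.vector(freq))      # cumulative frequency at each upper boundary
cum_prop <- cum_freq / length(scores)    # cumulative proportion

# The polygon starts at height 0 on the lowest boundary,
# then has one point above each upper class boundary
x <- breaks
y <- c(0, cum_prop)
print(data.frame(boundary = x, cumulative_proportion = round(y, 3)))

# Draw the polygon only when running interactively
if (interactive()) {
  plot(x, y, type = "b", xlab = "Scores", ylab = "Cumulative Proportion")
}
```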

Cumulative Percent Polygon
[Figure: Cumulative Frequency Chart; x-axis: scores (0 to 120), y-axis: Cumulative Proportion (0.0 to 1.0)]

Describing Quantitative Variables with Numbers
Center: mean, median or mode
Spread: range, interquartile range, variance, or standard deviation
Location: percentiles or standard scores

Parameters and Statistics
A parameter is a number that describes the population. A parameter is a fixed number, but in practice we usually do not know its value.
A statistic is a number that describes a sample. The value of a statistic is known when we have taken a sample, but it can change from sample to sample.
The purpose of sampling or experimentation is usually to use statistics to make statements about unknown parameters; this is called statistical inference.

Notation of Parameters and Statistics

Name                     Statistic   Parameter
mean                     x̄           µ (mu)
standard deviation       s           σ (sigma)
correlation              r           ρ (rho)
regression coefficient   b           β (beta)
proportion               p̂           p

Example
A carload lot of ball bearings has a mean diameter of 2.503 centimeters. This is within the specifications for acceptance of the lot by the purchaser. The inspector happens to inspect 100 bearings from the lot with a mean diameter of 2.515 centimeters. This is outside the specified limits, so the lot is mistakenly rejected.
Is each of the two mean diameters (2.503 and 2.515) a parameter or a statistic?

Presidential Approval Rating
According to Gallup.com on January 25, 2017, 46% of Americans approved of how Trump was doing as President.
Gallup tracks daily the percentage of Americans who approve or disapprove of the job Donald Trump is doing as president. Daily results are based on telephone interviews with approximately 1,500 national adults; the margin of error is ±3 percentage points.
Is this 46% a statistic or a parameter?

Measuring center: The mean
Most common measure of center.
Arithmetic average.
To calculate the mean of a set of observations x_1, x_2, ..., x_n, add their values and divide by the number of observations n.
Denoted x̄, called "x-bar", if the data are from a sample; µ, called "mu", if the data are from the entire population.
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
where n is the size of the sample and N is the size of the population.

Measuring center: The Median
The median M is the midpoint of a data set such that half of the observations are smaller and the other half are larger.
1. Arrange all observations in order of size, from smallest to largest.
2. Find the middle value of the arranged observations by counting (n + 1)/2 from the bottom of the list.
If the number of observations n is odd, the median M is the center observation in the ordered list.
If the number of observations n is even, the median M is the mean of the two center observations in the ordered list.
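
The counting rule can be written out as a short R function. This is a sketch for illustration: the function name `median_by_position` and the two small data sets are made up here, and in practice R's built-in `median()` does the same job.

```r
# Median by the counting rule: position (n + 1) / 2 in the sorted list
median_by_position <- function(x) {
  x   <- sort(x)        # step 1: arrange from smallest to largest
  n   <- length(x)
  pos <- (n + 1) / 2    # step 2: location of the middle
  # odd n: pos is a whole number, so floor and ceiling pick the same value;
  # even n: pos ends in .5, so the two center observations are averaged
  mean(x[c(floor(pos), ceiling(pos))])
}

odd_example  <- c(3, 1, 7, 5, 9)   # n = 5, made-up values
even_example <- c(3, 1, 7, 5)      # n = 4, made-up values

print(median_by_position(odd_example))
print(median_by_position(even_example))
```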

Measuring Center: The Mode
The mode of a data set is the numerical value that appears most frequently.
A data set can have one mode, or two or more modes.
A data set may not have any mode.
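
Base R has no function that returns the statistical mode (`mode()` reports how an object is stored), so one possible sketch is to tabulate the values. The helper name `modes`, the example vectors, and the convention that a data set with no repeated value has no mode are assumptions made for this illustration.

```r
# Hypothetical helper: return every value that occurs most often
modes <- function(x) {
  counts <- table(x)
  if (max(counts) == 1) return(numeric(0))   # no value repeats: no mode
  as.numeric(names(counts)[counts == max(counts)])
}

print(modes(c(2, 4, 4, 7, 9)))   # one mode
print(modes(c(1, 1, 5, 8, 8)))   # two modes
print(modes(c(3, 6, 9)))         # no mode
```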

Calculate the mean, median and mode
The following is a stem-and-leaf plot of the course scores. Determine the mean, median and mode of the course scores.

  The decimal point is 1 digit(s) to the right of the |

   0 | 8
   1 | 27
   2 | 2
   3 |
   4 | 039
   5 | 1
   6 | 788
   7 | 25
   8 | 01344457
   9 | 01238
  10 | 1114

Finding mean and median in R
> scores=c(8,12,17,22,40,43,49,51,67,68,68,72,75,80,81,
           83,84,84,84,85,87,90,91,92,93,98,101,101,101,104)
> mean(scores)
[1] 71.03333
> median(scores)
[1] 82

Example: Test Scores
The test scores of a class of 20 students have a mean of 71.6 and the test scores of another class of 14 students have a mean of 78.4. Find the mean of the combined group.
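
One way to work this example is to rebuild each class total from its mean and size, then divide the grand total by the combined number of students. The variable names below are chosen for illustration.

```r
n    <- c(20, 14)        # class sizes
xbar <- c(71.6, 78.4)    # class means

# Each class total is n * xbar; the combined mean is the grand total over all students
combined <- sum(n * xbar) / sum(n)
print(combined)

# The built-in weighted.mean() performs the same calculation
print(weighted.mean(xbar, w = n))
```

The combined mean sits closer to 71.6 than to 78.4 because the larger class carries more weight; simply averaging the two means would ignore the class sizes.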

Example
The following are ages of automobiles.
8 3 6 5 5 2 10 9 8 2 3 2 2
Determine the mean, median and mode of this set.

Mean vs. Median
x̄(score) = 71.1 and M(score) = 82.3
If the mean and the median are both numbers that describe the center of the values, then why do we have different values?
If the data contain outliers (values that are far beyond the range of the others), the mean is pulled toward these outliers.
The median is resistant to extreme values (outliers) in the data set.
The mean is NOT robust against extreme values.
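
A small illustration of this resistance, using made-up values (they are not from the course data): replacing one observation with an extreme value moves the mean a great deal and leaves the median where it was.

```r
x <- c(10, 12, 13, 15, 20)     # made-up values

# Same data with the largest value replaced by an extreme one
x_outlier <- x
x_outlier[5] <- 200

print(c(mean = mean(x),         median = median(x)))
print(c(mean = mean(x_outlier), median = median(x_outlier)))
```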

[Figure: Basketball Team]

Average Test Scores?
What are the mean and median of each section's test scores?

Section A   Section B
65          42
66          54
67          58
68          62
71          67
73          77
74          77
77          85
77          93
77          100

Types of Measurements for the Spread
Range
Percentiles
Quartiles
IQR (interquartile range)
Variance
Standard deviation
Coefficient of Variation

The Range
The range is the difference between the highest and lowest values.
Section A: Range = 77 - 65 = 12
Section B: Range = 100 - 42 = 58

Percentiles
The pth percentile of data is the value such that p percent of the observations fall at or below it.
We use percentiles to report spread when the median is our measure of center.
To find the measurement with a desired percentile rank: the 100P th percentile (where P is a proportion between 0 and 1) is the measurement with rank (position in the ordered list) nP + 0.5, where n represents the number of data values in the sample.

The 90th percentile of Section A test scores
1. Arrange the scores in order from lowest to highest.
   65 66 67 68 71 73 74 77 77 77
2. n = 10, P = 0.90, so the 90th percentile for this list is at nP + 0.5 = 10(0.9) + 0.5 = 9.5, the mean of the 9th and 10th place values.
3. The 90th percentile is (77 + 77)/2 = 77.
Find the 35th percentile.
Find the 75th percentile.
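
The position rule from this example can be sketched as an R function. The name `percentile_by_position` is hypothetical, and clamping the position to the ends of the list is an assumption added here so that very small or very large P still return a value. R's built-in `quantile()` uses a different default definition, so its answers can differ from this rule.

```r
# Percentile by the position rule n*P + 0.5 (P is a proportion such as 0.90)
percentile_by_position <- function(x, P) {
  x   <- sort(x)
  n   <- length(x)
  pos <- n * P + 0.5
  pos <- min(max(pos, 1), n)                # assumption: stay within positions 1..n
  if (abs(pos - round(pos)) < 1e-8) {
    x[round(pos)]                           # whole-number position: take that value
  } else {
    mean(x[c(floor(pos), ceiling(pos))])    # otherwise average the two neighbors
  }
}

section_a <- c(65, 66, 67, 68, 71, 73, 74, 77, 77, 77)
print(percentile_by_position(section_a, 0.90))   # position 9.5
```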

Determine the 25th percentile of the Course Scores
Another way to estimate percentiles is to read them from the cumulative frequency polygon.
[Figure: Cumulative Frequency Chart; x-axis: Scores (0 to 120), y-axis: Cumulative Proportion (0.0 to 1.0)]

Determining Percentiles
Suppose you know the position (order) of a value and want to know what percentile it is ranked at.
If you have n data measurements, the ith ordered value x_i represents the 100(i - 0.5)/n th percentile.
Example: Determine the percentile of the 4th order statistic for a sample size of n = 15.
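
As a quick sketch, the formula translates directly into R; the function name `percentile_of_rank` is made up for this illustration.

```r
# Percentile of the i-th ordered value among n measurements: 100 * (i - 0.5) / n
percentile_of_rank <- function(i, n) 100 * (i - 0.5) / n

pct <- percentile_of_rank(4, 15)   # 4th order statistic, n = 15
print(pct)

# Going the other way, the position rule n*P + 0.5 recovers the rank
print(15 * (pct / 100) + 0.5)
```

The two formulas are inverses of each other: solving rank = nP + 0.5 for P gives P = (rank - 0.5)/n.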

Examples of percentiles
Suppose you want to know what percentile you are at in a certain class. You know there are 200 students in this class and that 20 of the students have scores above yours. What is your percentile?
Suppose your percentile came out to be the 90th percentile. How many students scored the same as or below you? What about at the 50th percentile?

The Quartiles
The first quartile, Q1, is the 25th percentile.
The second quartile, Q2, is the median and the 50th percentile.
The third quartile, Q3, is the 75th percentile.

Determining Q1 for Basketball Shoe Prices
Arrange in order, n = 15.
100 110 120 120 140 140 140 150 185 185 215 215 250 250 290
Q1: P = 0.25, nP + 0.5 = 15(0.25) + 0.5 = 4.25.
Since we do not get an integer, we find the mean of the 4th and 5th elements in the ordered dataset.
Q1 = (120 + 140)/2 = 130.

Determine Q2 for Basketball Shoe Prices
Arrange in order, n = 15.
100 110 120 120 140 140 140 150 185 185 215 215 250 250 290
Q2: P = 0.5, nP + 0.5 = 15(0.5) + 0.5 = 8.
So Q2 is the 8th element of the ordered data.
Q2 = 150.

Determine Q3 for Basketball Shoe Prices
Arrange in order, n = 15.
100 110 120 120 140 140 140 150 185 185 215 215 250 250 290
Q3: P = 0.75, nP + 0.5 = 15(0.75) + 0.5 = 11.75.
Again, since we did not get an integer, the third quartile is the mean of the 11th and 12th elements in the ordered data.
Q3 = (215 + 215)/2 = 215.

R code for finding Q1, Q2, and Q3
The values Minimum, Q1, Median (Q2), Q3, and Maximum are called the Five Number Summary.
> shoeprice=c(100,110,120,120,140,140,140,150,
              185,185,215,215,250,250,290)
> fivenum(shoeprice)
[1] 100 130 150 215 290

Interquartile Range
The interquartile range, IQR, is the difference between Q3 and Q1:
IQR = Q3 - Q1

Example
Twelve babies spoke for the first time at the following ages (in months):
8 9 10 11 12 13 15 15 18 20 20 26
Find Q1, Q2, Q3, the range and the IQR.
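
One way to check an answer to this example in R is with `fivenum()`, which for these twelve values gives the same quartiles as the position rule nP + 0.5. The variable names are chosen for illustration. Note that R's `IQR()` function relies on a different default quantile definition and can report a different value, so the IQR is computed here directly from the five number summary.

```r
ages <- c(8, 9, 10, 11, 12, 13, 15, 15, 18, 20, 20, 26)   # months

five <- fivenum(ages)    # minimum, Q1, median (Q2), Q3, maximum
names(five) <- c("min", "Q1", "Q2", "Q3", "max")
print(five)

data_range <- five[["max"]] - five[["min"]]
iqr        <- five[["Q3"]] - five[["Q1"]]
print(c(range = data_range, IQR = iqr))
```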

Detecting Outliers: 1.5IQR Rule
An outlier is an observation that is "distant" from the rest of the data.
Outliers can occur by chance or by measurement errors.
Any point that falls outside the interval from Q1 - 1.5(IQR) to Q3 + 1.5(IQR) is considered an outlier.

Outliers for Basketball Shoe Prices?
Recall: Q1 = 130, Q3 = 215, so IQR = 215 - 130 = 85.
Q1 - 1.5(IQR) = 130 - 1.5(85) = 2.5
Q3 + 1.5(IQR) = 215 + 1.5(85) = 342.5
Any price that is below $2.50 or above $342.50 is considered an outlier.
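
The same calculation can be sketched in R. The helper name `iqr_fences` is hypothetical; the quartiles are taken from `fivenum()`, as on the earlier slide.

```r
# Outlier boundaries from the quartiles, using the 1.5 IQR rule
iqr_fences <- function(q1, q3) {
  iqr <- q3 - q1
  c(lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr)
}

shoeprice <- c(100,110,120,120,140,140,140,150,
               185,185,215,215,250,250,290)

q      <- fivenum(shoeprice)        # min, Q1, median, Q3, max
fences <- iqr_fences(q[2], q[4])
print(fences)

# Prices outside the boundaries would be flagged as outliers
outliers <- shoeprice[shoeprice < fences["lower"] | shoeprice > fences["upper"]]
print(outliers)
```

For these fifteen prices the result is empty, since every price lies between the two boundaries.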

Outliers?
The following is information from 91 pairs of basketball shoes:
> fivenum(shoes$price)
[1]  40  75  90 120 250
The highest four numbers in the dataset are ..., 170, 225, 250, 250. Are there any prices that are considered outliers?

Example of Outliers
Twelve babies spoke for the first time at the following ages (in months):
8 9 10 11 12 13 15 15 18 20 20 26
Using the 1.5 IQR rule, give the boundaries of the outliers.

A Graph of the Five Number Summary: Boxplot
A central box spans the quartiles.
A line inside the box marks the median.
Lines extend from the box out to the smallest and largest observations.
Asterisks represent any values that are considered to be outliers.
Boxplots are most useful for side-by-side comparison of several distributions.
R code: boxplot(dataset_name$variable_name)
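
The numbers behind a boxplot can be inspected without drawing it, using `boxplot.stats()`. The sketch below uses the rounded course scores as an illustration. In R's default boxplot the lines (whiskers) stop at the most extreme observations that are not outliers, and outliers are drawn as open circles, not asterisks.

```r
scores <- c(8,12,17,22,40,43,49,51,67,68,68,72,75,80,81,
            83,84,84,84,85,87,90,91,92,93,98,101,101,101,104)

b <- boxplot.stats(scores)   # default coef = 1.5 corresponds to the 1.5 IQR rule
print(b$stats)               # lower whisker end, Q1, median, Q3, upper whisker end
print(b$out)                 # observations that would be drawn as individual points
```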

Boxplot of Prices
[Figure: horizontal boxplot of basketball shoe prices; axis from 50 to 250]
    boxplot(shoes$price, horizontal = TRUE)

Boxplot of Course Scores
[Figure: boxplot of course scores; axis from 20 to 100]

Boxplot of Course Scores by Session
[Figure: side-by-side horizontal boxplots of course scores for sessions Fal15, Sp16 and Sum16; axis from 20 to 100]
    boxplot(grades$score~grades$session, horizontal = TRUE)

Question about the Graphs
Given the first type of plot indicated in each pair, which of the second plots could not always be generated from it?
a) dot plot, histogram
b) stem and leaf, dot plot
c) histogram, stem and leaf
d) dot plot, box plot