Chapter 1: Looking at Data Distributions
Introduction to the Practice of Statistics, Sixth Edition
David S. Moore, George P. McCabe, Bruce A. Craig
Copyright 2009 by W. H. Freeman and Company

Statistics is the science of collecting, organizing, and interpreting numerical facts, which we call data. The study and collection of data are important in the work of many professions, so training in the science of statistics is valuable preparation for a variety of careers. We will learn data analysis and collection approaches within various application contexts. Context and state are important. Remember that the goal of statistics is not calculation for its own sake but gaining understanding from numbers. That understanding has to be useful within its context.

You cannot learn statistics without practice. So practice, practice, and practice! Be prepared to solve problems from data! The gain will be worth the pain. We will learn it together.

Let us start with some concepts so we sound like professionals. Any set of data contains information about some group of individuals. The information is organized in variables. For example, we can organize your statistics class scores (variables) by your student ID (individual). The purpose is for me to check the grades to make sure everyone is doing well (context).

Figure 1.1

Some variables use numerical data (scores) and some use letters (grades). Statisticians came up with three ways to describe your data.

Bar chart
Pie chart
Definition, pg 4

The distribution seems like a different animal! How do we describe the distribution of our data? Before we answer this question, let's first plot our data. Plotting the data is a more intuitive way to learn something from it.

Figure 1.3
- One variable at a time
- Takes too much screen space
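A bar chart or pie chart of a letter-grade variable displays the count or percent of individuals in each category. The sketch below shows one way to compute those numbers before plotting. The grade list and the function name `categorical_distribution` are made up for illustration and do not come from the textbook.

```python
from collections import Counter

# Hypothetical letter grades for ten students (made-up illustration data).
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]


def categorical_distribution(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in sorted(counts.items())}


distribution = categorical_distribution(grades)

# These counts are the bar heights of a bar chart;
# the percents are the slice sizes of a pie chart.
for category, (count, percent) in distribution.items():
    print(f"{category}: count={count}, percent={percent:.1f}%")
```

The percents add up to 100 because every individual falls into exactly one category, which is what makes a pie chart appropriate for this kind of variable.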
Table format

Histogram: yeah, I learned this in 7th grade. A histogram breaks the range of values of a variable into classes and displays only the count or percent of the observations that fall into each class. Important: you can choose any convenient number of classes, but you should always choose classes of equal width.

Table 1.1
Figure 1.4
Table 1.2
Definition, pg 9

OK. We talked a lot about the stemplot. I will just leave two graphs on the next two pages. I trust you know how to get there. Note that a stemplot is not good for large samples.

Figure 1.5
Definition, pg 18
Table 1.4
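The equal-width class rule for histograms described above can be sketched in a few lines. This is a minimal illustration, not the textbook's procedure: the scores, the starting value 50, the class width 10, and the function name `equal_width_histogram` are all assumptions chosen for the example.

```python
import math


def equal_width_histogram(values, low, width, n_classes):
    """Count observations in n_classes classes of equal width.

    Class i covers low + i*width <= x < low + (i+1)*width.
    """
    counts = [0] * n_classes
    for x in values:
        index = math.floor((x - low) / width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[index] += 1
    return counts


# Hypothetical exam scores (made-up illustration data).
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 90, 93, 97]
counts = equal_width_histogram(scores, low=50, width=10, n_classes=5)

# Print a text histogram: one row per class, one star per observation.
for i, count in enumerate(counts):
    left = 50 + 10 * i
    percent = 100 * count / len(scores)
    print(f"[{left}, {left + 10}): {'*' * count} ({count}, {percent:.0f}%)")
```

Each class includes its left endpoint and excludes its right endpoint, so a score of exactly 90 is counted once, in the 90 to 100 class.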
OK. We learned how to plot data. Now let us get back to the distribution. This slide is very important: it tells you what to look for in a graph or a data source. We will learn how to get these numbers.

Figure 1.10
Definition, pg 15
Definition, pg 31
Definition, pg 32 33
Definition, pg 35
Definition, pg 37
Definition, pg 38a
Figure 1.19
Definition, pg 38b
Definition, pg 39
Figure 1.21
Definition, pg 40 41
Definition, pg 43a
Definition, pg 43b

You surely know this from high school.

Definition, pg 45
Definition, pg 46 47

1.3 Density Curve and Normal Distributions

You know what the blue bars represent, right? A histogram. By simply fitting a smooth curve to the histogram, we get the density curve.

Figure 1.24

Characteristics:
- Is always on or above the horizontal axis (a histogram plots frequencies, which cannot be negative)
- Has area exactly 1 underneath it (it shows the distribution of *all* data)
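The two characteristics above can be checked numerically. The sketch below uses a made-up triangular density on the interval from 0 to 2 (not a curve from the textbook) and approximates areas with the trapezoid rule. The names `triangular_density` and `area_under` are hypothetical helpers.

```python
def triangular_density(x):
    """A hypothetical density curve: a triangle on [0, 2] peaking at x = 1."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0  # never below the horizontal axis


def area_under(curve, a, b, steps=2000):
    """Approximate the area under curve between a and b (trapezoid rule)."""
    h = (b - a) / steps
    total = 0.5 * (curve(a) + curve(b))
    for i in range(1, steps):
        total += curve(a + i * h)
    return total * h


# The whole curve accounts for all of the data, so its area should be 1.
total_area = area_under(triangular_density, 0.0, 2.0)

# The area over an interval is the proportion of observations in that interval.
share_below_half = area_under(triangular_density, 0.0, 0.5)

print(f"total area = {total_area:.4f}")
print(f"proportion between 0 and 0.5 = {share_below_half:.4f}")
```

The second number shows how a density curve is used in practice: the area above any interval gives the proportion of all observations that fall in it.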
More characteristics

Definition, pg 56
Definition, pg 57

One particular type of density curve is a normal curve, which describes a normal distribution: N(μ, σ)

The 68-95-99.7 rule
Figure 1.28

*All* normal distributions have the same overall shape. The exact density curve for a particular normal distribution is specified by giving its mean and its standard deviation.

Figure 1.30
An example of a normal distribution: N(64.5, 2.5)

Standardized Normal Distribution: N(0, 1)
Notice the mean is 0 and the standard deviation is 1.

Definition, pg 59
Figure 1.29
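The 68-95-99.7 rule can be checked for the N(64.5, 2.5) example using Python's standard library. This is a small sketch rather than anything from the textbook; the variable names are illustrative.

```python
from statistics import NormalDist

# The example distribution from the slide: mean 64.5, standard deviation 2.5.
example = NormalDist(mu=64.5, sigma=2.5)

# Proportion of observations within k standard deviations of the mean.
within = {}
for k in (1, 2, 3):
    low = example.mean - k * example.stdev
    high = example.mean + k * example.stdev
    within[k] = example.cdf(high) - example.cdf(low)
    print(f"within {k} sd ({low:.1f} to {high:.1f}): {100 * within[k]:.1f}%")
```

Changing `mu` and `sigma` changes the interval endpoints but not the three proportions, which is what it means for all normal distributions to share the same overall shape.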
We can apply a linear transformation, z = (x - μ) / σ, to get a standardized distribution.

Definition, pg 61
Definition, pg 62

Check Table A in your textbook: given a z-score of 1.47, what is the area to the left of this z-score? When x is the mean, the cumulative proportion to the left is 50%.

Figure 1.31
Figure 1.32

How high must a value be to fall in the top 10% (the blue block), given a distribution of N(505, 110)? Two steps solve the problem.
(1) Use Table A to get the z value: look for 0.9 in the table, which is the proportion to the left of z (the yellow block). This gives z = 1.28.
(2) Un-standardize: transform z back to the data scale using the linear transformation. (x - 505) / 110 = 1.28, so x = 505 + 1.28 × 110 = 645.8.

Review Chapter 1
- Five-number summary
- Mean / std
- Histogram
- Density curve
- Normal distribution, its mean and std
- Z-score, 68-95-99.7 rule
- Plot your data (stemplot, bar chart, pie chart, time plot)

Figure 1.33
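The Table A lookup and the two-step top-10% calculation from this chapter can be reproduced in software. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed table, so its answer differs slightly from the table-based one because Table A rounds z to two decimals.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Table A lookup: area to the left of z = 1.47 (about 0.9292).
area_left = standard_normal.cdf(1.47)

# Step 1: find the z value with 90% of the area to its left (about 1.28).
z_cut = standard_normal.inv_cdf(0.90)

# Step 2: un-standardize with x = mu + z * sigma for N(505, 110).
mu, sigma = 505, 110
x_cut = mu + z_cut * sigma

# Same step using the rounded table value z = 1.28, which gives 645.8.
x_table = mu + 1.28 * sigma

print(f"area left of z = 1.47: {area_left:.4f}")
print(f"z for the top 10%: {z_cut:.4f}")
print(f"cutoff with exact z: {x_cut:.1f}")
print(f"cutoff with table z = 1.28: {x_table:.1f}")
```

The exact cutoff comes out a little above 645.8 because the unrounded z is slightly larger than 1.28. Either answer is fine for homework as long as the method is stated.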