Algebra I
October 19th - October 23rd
Unit 3: Descriptive Statistics
Jump Start
*New procedure: From now on, Jump Starts will be Regents questions on topics previously or currently being covered. A few of the week's Jump Start questions will appear on your Friday quizzes.

Distributions: Their Shapes & Center
Data are often summarized by graphs; a graph is the first indicator of the variability (how spread out the data are) in a data set. Below are some of the graphs that you will encounter in Algebra.

Dot plot: a plot of each data value on a scale or number line.
Histogram: a graph that groups the data into intervals and represents the data in each interval by a bar.
Box plot: a graph that provides a picture of the data ordered and divided into four intervals, each containing approximately 25% of the data.

Practice!
Directions: For questions 1-15 (pages 3-7 of the packet), try to answer the questions about each graph. You may work independently or with a partner. Start with the graph your table has been assigned, and then go on to the other graphs. Be ready to share your responses for your assigned graph.
Transportation officials collect data on flight delays (the number of minutes past the scheduled departure time that a flight takes off). Consider the dot plot of the delay times for sixty BigAir flights during December 2012.

1. What do you think this graph is telling us about the flight delays for these sixty flights?
2. Can you think of a reason why the data presented by this graph provide important information? Who might be interested in this data distribution?
3. Based on your previous work with dot plots, would you describe this dot plot as representing a symmetric or a skewed data distribution? (Recall that a skewed data distribution is not mound shaped.) Explain your answer.
A random sample of eighty viewers of a television show was selected. The dot plot below shows the distribution of the ages (in years) of these eighty viewers.

4. What do you think this graph is telling us about the ages of the eighty viewers in this sample?
5. Can you think of a reason why the data presented by this graph provide important information? Who might be interested in this data distribution?
6. Based on your previous work with dot plots, would you describe this dot plot as representing a symmetric or a skewed data distribution? Explain your answer.
The following histogram represents the age distribution of the population of Kenya in 2010.

7. What do you think this graph is telling us about the population of Kenya?
8. Why might we want to study the data represented by this graph?
9. Based on your previous work with histograms, would you describe this histogram as representing a symmetric or a skewed distribution? Explain your answer.
The following histogram represents the age distribution of the population of the United States in 2010.

10. What do you think this graph is telling us about the population of the United States?
11. Why might we want to study the data represented by this graph?

Thirty students from River City High School were asked how many pets they owned. The following box plot was prepared from their answers.

12. What does the box plot tell us about the number of pets owned by the thirty students at River City High School?
13. Why might understanding the data behind this graph be important?
Twenty-two juniors from River City High School participated in a walkathon to raise money for the school band. The following box plot was constructed using the number of miles walked by each of the twenty-two juniors.

14. What do you think the box plot tells us about the number of miles walked by the twenty-two juniors?
15. Why might understanding the data behind this graph be important?

Understanding Check: Try the following question on your own.
Sam said that 50% of the twenty-two juniors at River City High School who participated in the walkathon walked at least ten miles. Do you agree? Why or why not?
A measure of center is a value that attempts to describe a data set by identifying its central position (that is, a typical value in the data set). Measures of center include the mean ( ), median ( ), and mode ( ). Below is a description of how to use your calculator to find the mean and the median, the two measures of center we will focus on.
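As a cross-check on the calculator, the mean and the median can also be found with a few lines of Python. This is only a sketch: the pet counts below are made up for illustration (they are not the River City data), and the variable names are our own. It uses Python's built-in `statistics` module.

```python
from statistics import mean, median

# Hypothetical pet counts for seven students (illustrative only,
# NOT the River City High School data).
pets = [0, 1, 1, 2, 3, 5, 9]

# Mean: add up all the values and divide by how many there are.
pets_mean = mean(pets)

# Median: the middle value once the data are put in order.
pets_median = median(pets)

print("mean:", pets_mean)      # 21 / 7 = 3
print("median:", pets_median)  # the 4th of 7 ordered values = 2
```

In this made-up set, the one student with nine pets pulls the mean (3) above the median (2).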
Enter the following two data sets into your calculator. We will be using these data sets to complete the last example for today.

Data Set 1: Pet owners (Put into L1)
Students from River City High School were randomly selected and asked, "How many pets do you currently own?" The results are recorded below.

Data Set 2: Length of the east hallway at River City High School (Put into L2)
Twenty students were selected to measure the length of the east hallway. Two marks were made on the hallway's floor, one at the front of the hallway and one at the end of the hallway. Each student was given a meter stick and asked to use it to determine the length between the marks to the nearest tenth of a meter. The results are recorded below.

Below are dot plots representing each data set. Answer the questions that follow using the dot plots and the information in your calculator.
1) Calculate the mean number of pets owned by the thirty students from River City High School. Calculate the median number of pets owned by the thirty students.
2) What do you think is a typical number of pets for students from River City High School? Explain how you made your estimate.
3) Why do you think that different students got different results when they measured the same distance of the east hallway?
4) What is the mean length of the east hallway data set? What is the median length?
5) A construction company will be installing a handrail along a wall from the beginning point to the ending point of the east hallway. The company asks you how long the handrail should be. What would you tell the company? Explain your answer.
Recap
Try answering the following questions about interpreting data sets independently.

Test Scores (as %) for 6th Period
38   72   88   96   102

1. What was the highest score on the test?
2. What percent of the class scored above a 72?
3. What was the median score on the test?
4. What percent of the class scored between 88 and 96?
5. Do you think that this test was too hard for the students? Explain.
6. The accompanying box-and-whisker plots can be used to compare the annual incomes of three professions. Based on the box-and-whisker plots, which statement is true?
   (1) The median income for nuclear engineers is greater than the income of all musicians.
   (2) The median income for police officers and musicians is the same.
   (3) All nuclear engineers earn more than all police officers.
   (4) A musician will eventually earn more than a police officer.
Deviations from the Mean
A consumers' organization is planning a study of the various brands of batteries that are available. As part of its planning, it measures the lifetime (i.e., how long a battery can be used before it must be replaced) of each of six batteries of Brand A and eight batteries of Brand B. Dot plots showing the battery lives for each brand are shown below.

1. Does one brand of battery tend to last longer, or are they roughly the same? What calculations could you do in order to compare the battery lives of the two brands?
2. Do the battery lives tend to differ more from battery to battery for Brand A or for Brand B?
3. Would you prefer a battery brand that has battery lives that do not vary much from battery to battery? Why or why not?

Variability can be measured by deviations. A deviation is the distance between two values. Today we will be looking at deviations from the mean to interpret the variability in different data sets.

*To find deviations from the mean:
The table below shows the lives (in hours) of the Brand A batteries.

Life (Hours):              83    94    96    106    113    114
Deviation from the Mean:

4. Calculate the deviations from the mean for the remaining values, and write your answers in the appropriate places in the table.
5. What do you notice about the values you came up with?

The table below shows the battery lives and the deviations from the mean for Brand B.

Life (Hours):              73     76     92    94    110    117    118    124
Deviation from the Mean:  -27.5  -24.5  -8.5  -6.5  +9.5  +16.5  +17.5  +23.5

6. Ignoring the sign of the deviation, which data set tends to have larger deviations from the mean, A or B? Why do you think that is?

Directions: Complete questions 1-6 with a partner or independently.
The lives of five batteries of a third brand, Brand C, were determined. The dot plot below shows the lives of the Brand A and Brand C batteries.

1. Which brand has the greater mean battery life? (You should be able to answer this question without doing any calculations.)
2. Which brand shows greater variability?
3. Which brand would you expect to have the greater deviations from the mean (ignoring the signs of the deviations)?

The table below shows the lives (in hours) of the Brand A batteries.

Life (Hours):              83    94    96    106    113    114
Deviation from the Mean:  -18    -7    -5     +5    +12    +13

The table below shows the lives for the Brand C batteries.

Life (Hours):              115    119    112    98    106
Deviation from the Mean:

4. Calculate the mean battery life for Brand C. (Be sure to include a unit in your answer.)
5. Write the deviations from the mean in the empty cells of the table for Brand C.
6. Ignoring the signs, are the deviations from the mean generally larger for Brand A or for Brand C? Does your answer agree with your answer to question 3?
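The deviations in the Brand A table can be checked with a short Python sketch. It assumes only the six Brand A battery lives given in this lesson; the variable names are our own. A deviation is each value minus the mean, so values below the mean give negative deviations and values above it give positive ones.

```python
# Brand A battery lives in hours, copied from the lesson's table.
brand_a = [83, 94, 96, 106, 113, 114]

# Mean life: total hours divided by the number of batteries.
mean_a = sum(brand_a) / len(brand_a)

# Deviation from the mean: each value minus the mean.
# Negative = below the mean, positive = above the mean.
deviations_a = [life - mean_a for life in brand_a]

print("mean:", mean_a)
print("deviations:", deviations_a)
```

The same steps work for any of the other brands by swapping in that brand's list of battery lives.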
Measuring and Interpreting Standard Deviation
Yesterday we made conclusions about the variability of a data set by looking at the distance of each data value from the mean (the deviation from the mean). The deviations from the mean are used to calculate what is known as the standard deviation, which describes how spread out a data set is.
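To see how the deviations feed into the standard deviation, here is a minimal Python sketch using the Brand A battery lives from the previous lesson. It assumes the sample version of the formula (divide the sum of the squared deviations by n - 1, then take the square root), which is the value a TI calculator reports as Sx. The variable names are our own.

```python
import math
from statistics import stdev

# Brand A battery lives in hours (from the previous lesson).
brand_a = [83, 94, 96, 106, 113, 114]
n = len(brand_a)

# Step 1: find the mean.
mean_a = sum(brand_a) / n

# Step 2: square each deviation from the mean so the signs do not cancel.
squared_devs = [(life - mean_a) ** 2 for life in brand_a]

# Step 3: divide the total by n - 1 (sample formula), then take the square root.
sample_sd = math.sqrt(sum(squared_devs) / (n - 1))

# Cross-check against Python's built-in sample standard deviation.
library_sd = stdev(brand_a)

print("standard deviation:", round(sample_sd, 2))
```

Larger deviations from the mean give larger squared deviations, which is why a more spread-out data set has a larger standard deviation.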
Standard deviation is a measure of how spread out your data are. To calculate the standard deviation of a data set, follow the steps below on your calculator.
*Steps 1-3 only need to be done when first turning on the calculator.

1) Hit the 2ND button and then 0 to access the CATALOG.
2) Scroll down until you get to DiagnosticOn and hit ENTER.
3) Hit ENTER again; Done should appear on your screen.
4) Hit the STAT button (second row of buttons, on the end) and then ENTER.
5) If there are numbers in L1, clear them by using the arrow buttons to highlight L1, then press the CLEAR button and then ENTER.
6) Move the cursor back down into L1, type the first data value, and press ENTER. Continue entering the remaining data values into L1 in the same way.
7) Once all data values are entered, press 2ND and then the MODE button to QUIT and return to the home screen.
8) Press the STAT button again, use the cursor to select CALC at the top, and hit ENTER (1-Var Stats should be selected) and then ENTER again.
9) You will have a list of statistics on your home screen. The mean is the x̄ value, and the standard deviation for a sample is the Sx value.

Example 1
Use a calculator to find the mean and standard deviation for the following set of data. A set of eight men have heights (in inches) as shown below.

67.0   70.9   67.6   69.8   69.7   70.9   68.7   67.2

Indicate the mean and standard deviation you obtained from your calculator to the nearest hundredth.
Mean:
Standard Deviation:
Example 2
The heights (in inches) of nine women are as shown below.

68.4   70.9   67.4   67.7   67.1   69.2   66.0   70.3   67.6

Use the statistical features of your calculator to find the mean and the standard deviation of these heights to the nearest hundredth.
Mean:
Standard Deviation:

Example 3
A group of people attended a talk at a conference. At the end of the talk, ten of the attendees were given a questionnaire that consisted of four questions. The questions were optional, so it was possible that some attendees might answer none of the questions, while others might answer 1, 2, 3, or all 4 of the questions (so the possible numbers of questions answered are 0, 1, 2, 3, and 4). Suppose that the numbers of questions answered by each of the ten people were as shown in the dot plot below.

Use the statistical features of your calculator to find the mean and the standard deviation of the data set.
Mean:
Standard Deviation:
Suppose the dot plot looked like this:

Use your calculator to find the mean and the standard deviation of this distribution.

Remember that the size of the standard deviation is related to the size of the deviations from the mean. Explain why the standard deviation of this distribution is greater than the standard deviation in Example 3. What does this tell us about the data?

*Think about this: If the standard deviation of a data set is zero, what can be said about the data within the set?

Exit Ticket
I can calculate and interpret standard deviation.
Three data sets are shown in the dot plots below.
a. Which data set has the smallest standard deviation of the three? Justify your answer.
b. Which data set has the largest standard deviation of the three? Justify your answer.
Interquartile Range
Remember...
When data is skewed, standard deviation (a typical distance of the data values from the mean) does not give precise information about the variability of the data. In these situations, we can use another measure of variability called the interquartile range. The interquartile range (IQR) is a measure of how spread out the middle 50% of the data is, and it gives us information about the variability when data is skewed.

To find the IQR:

Example 1
Below is a data set consisting of the number of hours of television 40 students watched over the weekend.

1) What is the interquartile range (IQR) for this distribution? What percent of the students fall within this interval?
2) Do you think the data distribution represented by the box plot is a skewed distribution? Why or why not?
3) Estimate the typical number of hours students watched television. Explain why you chose this value.
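The interquartile range can also be computed directly from a list of data. The Python sketch below is one way to do it, under a stated assumption: it uses the "median of each half" rule for quartiles (Q1 is the median of the lower half of the ordered data, Q3 is the median of the upper half, and the middle value is left out of both halves when the number of values is odd). Other software may use slightly different quartile rules. The function name `quartiles` is our own, and the data are the Brand B battery lives from the Deviations from the Mean lesson.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-each-half rule (an assumed convention)."""
    if len(data) < 2:
        raise ValueError("need at least two data values")
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]    # the lower half of the ordered data
    upper = values[-half:]   # the upper half of the ordered data
    return median(lower), median(values), median(upper)

# Brand B battery lives in hours (from the Deviations from the Mean lesson).
brand_b = [73, 76, 92, 94, 110, 117, 118, 124]

q1, med, q3 = quartiles(brand_b)

# IQR: the width of the interval holding the middle 50% of the data.
iqr = q3 - q1

print("Q1:", q1, "median:", med, "Q3:", q3, "IQR:", iqr)
```

On a box plot, Q1 and Q3 are the two ends of the box, so the IQR is the length of the box.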
Example 2
Transportation officials collect data on flight delays (the number of minutes a flight takes off after its scheduled time). Consider the box plot of the delay times in minutes for 60 BigAir flights during December 2012:

1) How many flights left more than 60 minutes late?
2) Is this data distribution considered skewed? If so, which way? If not, why not?
3) The mean of the 60 flight delays is approximately 42 minutes. Do you think that 42 minutes is typical of the number of minutes a BigAir flight was delayed? Why or why not?
4) What is the interquartile range, or IQR, of this data set?
5) What would be a good indicator of the typical number of minutes a BigAir flight was delayed? Why?
Summary
When data is skewed, use the ____________ to describe a typical value and the ____________ to measure variability.
When data is symmetric, use the ____________ to describe a typical value and the ____________ to measure variability.

Practice Comparing Distributions
Week 8 Homework

1.) Sam said that a typical flight delay for the sixty BigAir flights was approximately one hour. Do you agree? Why or why not?

2.) The chart below shows three sets of data. For each set, determine the mean and the standard deviation and answer the questions that follow.
    a. Which data set has the largest variability (it is spread out the most)? How do you know?
    b. Which data set has the smallest variability (it is spread out the least)? How do you know?
3.)

4.) A movie theater recorded the number of tickets sold daily for a popular movie during the month of June. The box-and-whisker plot shown below represents the data for the number of tickets sold, in hundreds. Which conclusion can be made using this plot?
    (1) The second quartile is 600.
    (2) The mean of the attendance is 400.
    (3) The range of the attendance is 300 to 600.
    (4) Twenty-five percent of the attendance is between 300 and 400.