# AP STATISTICS. Summer FUN School Year. contain articles and basic information to help you answer the questions in this packet.


Go on the internet to select the Gapminder World panel, and the scatterplot should load. You are looking at worldwide data of Life Expectancy vs. Per Capita Income. Point your cursor at the x-axis or y-axis labels to get more information about these variables. Every colored circle on the graph represents a country. Point the cursor at various circles and the name of the country will appear. The size of each circle is proportional to that country's population; look in the lower right corner to see each country's population as you point the cursor at it.

If you would like, slide the year indicator back to the first year that data was recorded (1950 for this combination of variables), and then click on Play to watch the change in the scatterplot, year by year, from that year to the present. Even more fun is to select one or more countries (this causes all the other countries to dim into the background), and watch the track made by the selected countries over time.

9. What is the relationship between Per Capita Income and Life Expectancy in the world?
10. Which countries are the farthest from the pattern shown by the rest of the world?
11. Which country has the highest life expectancy now?
12. Which has the highest per capita income now?
13. Which has the lowest income now?
14. The lowest life expectancy now?
15. Which group of countries (by color) has gained most since 1950 relative to the rest of the world, in both income and life expectancy?
16. Watch the track of Rwanda from [...]. What events in Rwanda might explain the unusual changes that happened?

Part 3: Vocabulary List

Please define, IN YOUR OWN WORDS (handwritten), each of the following terms using the information on the StatTrek website. When asked, provide a unique example of the word. Examples from the StatTrek website or this packet will NOT receive credit.

1. Categorical Variables
   Example:
2. Quantitative Variables
   Example:
3. Univariate Data:
4. Bivariate Data:
5. Median:
6. Mean:
7. Population:
   Example:
8. Sample:
   Example:
9. Center:
10. Spread:
11. Symmetry:
12. Unimodal and Bimodal:

13. Skewness:
    Sketch Skewed Left:
    Sketch Skewed Right:
14. Uniform:
15. Gaps:
16. Outliers:
17. Dotplots:
18. Difference between bar chart and histogram:
19. Stemplots:
20. Boxplots:
21. Quartiles:
22. Range:
23. Interquartile Range:
24. Parallel boxplots:
25. Parameter:
26. Statistic:

Part 4: Practice Problems

Use Appendix 3 and your research of vocabulary terms to help you answer the following questions.

CATEGORICAL OR QUANTITATIVE

Determine if the variables listed below are quantitative or categorical. Neatly print Q for quantitative and C for categorical.

1. Time it takes to get to school
2. Number of shoes owned
3. Hair color
4. Temperature of a cup of coffee
5. Teacher salaries
6. Gender
7. Facebook user
8. Height
9. Amount of oil spilled
10. Age of Oscar winners
11. Type of pain medication
12. Jellybean flavors
13. Country of origin
14. Type of meat

STATISTIC: WHAT IS THAT?

A statistic is a number calculated from data. Quantitative data has many different statistics that can be calculated. Determine the given statistics from the data below on the number of home runs Mark McGwire hit in each season from [...].

Mean: ____ Minimum: ____ Maximum: ____ Median: ____ Q1: ____ Q3: ____ Range: ____ IQR: ____

CENTER & SPREAD OF A DISTRIBUTION (REVIEW NOTES IN APPENDIX 3)

Last year students collected data on the ages of their moms and dads when they (the students) were born. The following are their results.

Dad: [...]
Mom: [...]

1. Find the mean and the median for the Dad data.
   To find the mean using your calculator, go to 2nd STAT MATH 5 and then type in L1 by pressing 2nd 1. This will add all the values in the list. Then divide by 26 to get the mean. Round the mean to 2 decimal places.
   To find the median, sort the data in the list: STAT 2 L1. The median is exactly in the middle between the 13th and the 14th values.
   Mean: ____ Median: ____ Are they the same? If not, which is larger?
2. Find the mean and the median for the Mom data.
   Mean: ____ Median: ____ Are they the same? If not, which is larger?
3. Now compare the two means you calculated. Which is larger? Is this result what you expected? Why or why not? Give an explanation in real-world context.
4. Calculate the range for each set of data.
   Dad: ____ Mom: ____

5. Are these ranges about the same? If not, what are some reasons that might cause this difference? Give an explanation in real-world context.
6. Find Q1 and Q3 for the Dad data.
   Q1: ____ Q3: ____
7. Find Q1 and Q3 for the Mom data.
   Q1: ____ Q3: ____

You have now calculated the Five-Number Summary. This can also be used as a way to determine the spread of a set of data. The five-number summary consists of: Minimum, Q1, Median, Q3, Maximum.

Write the five-number summary for the Dad data:
Write the five-number summary for the Mom data:

8. Now calculate the IQR for each of the two sets of data.
   Dad: ____ Mom: ____

ACCIDENTAL DEATHS

In 1997 there were 92,353 deaths from accidents in the United States. Among these were 42,340 deaths from motor vehicle accidents, 11,858 from falls, 10,163 from poisoning, 4,051 from drowning, and 3,601 from fires. The rest were listed as "other" causes.

a. Find the percent of accidental deaths from each of these causes, rounded to the nearest percent.
b. What percent of accidental deaths were from "other" causes?
c. NEATLY create a well-labeled bar graph of the distribution of causes of accidental deaths. Be sure to include an "other causes" bar. Label axes, scale, and title.
d. A pie chart is another graphical display used to show all the categories in a categorical variable relative to each other. By hand, create a pie chart for the accidental death percentages. Label appropriately.

WEATHER!

The data below give the number of hurricanes that happened each year from 1944 through 2000 as reported by Science magazine.

a. Make a dotplot to display these data. Make sure you include appropriate labels, title, and scale.

SHOPPING SPREE!

A marketing consultant observed 50 consecutive shoppers at a supermarket. One variable of interest was how much each shopper spent in the store. Here are the data (rounded to the nearest dollar), arranged in increasing order:

a. Make a stemplot using tens of dollars as the stem and dollars as the leaves. Make sure you include appropriate labels, title, and key.

KEY:

WHERE DO OLDER FOLKS LIVE?

This table gives the percentage of residents aged 65 or older in each of the 50 states.

Histograms are a way to display quantitative data grouped into bins (the bars). These bins have the same width and scale and are touching because the number line is continuous. To make a histogram you must first decide on an appropriate bin width and count how many observations are in each bin. The bins for percentage of residents aged 65 or older have been started below for you.

a. Finish the chart of bin widths and then create a histogram using those bins on the grid below. Make sure you include appropriate labels, title, and scale.
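The binning step described above (pick a bin width, then count the observations in each bin) can be sketched in Python. This is only an illustration: the percentages in `sample_percents`, the starting value 5, and the width 2 are made up for the example and are not the state data from the table.

```python
import math
from collections import Counter


def bin_counts(values, start, width):
    """Count how many values fall in each equal-width bin [low, low + width)."""
    counts = Counter(math.floor((v - start) / width) for v in values)
    rows = []
    # Walk every bin from the lowest to the highest used, keeping empty bins
    for i in range(min(counts), max(counts) + 1):
        low = start + i * width
        rows.append((low, low + width, counts.get(i, 0)))
    return rows


# Hypothetical percentages, NOT the data from the packet
sample_percents = [5.5, 9.9, 10.1, 11.3, 11.9, 12.4, 12.6, 13.0, 13.8, 15.2]

for low, high, count in bin_counts(sample_percents, start=5, width=2):
    relative = count / len(sample_percents)
    print(f"{low} to <{high}: frequency {count}, relative frequency {relative:.0%}")
```

Each value belongs to exactly one bin because a value on a boundary (such as 13.0) is placed in the bin that starts there. Empty bins are kept so the histogram shows gaps.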

SSHA SCORES

Here are the scores on the Survey of Study Habits and Attitudes (SSHA) for 18 first-year college women: [...] and for 20 first-year college men: [...]

a. Put the data values in order for each gender. Compute numerical summaries for each gender.

The 20 percent reduction in risk certainly sounds impressive. But to really understand what this statistic means, you need to ask, "20 percent lower than what?" In other words, you need to know the chance of breast cancer for people who do not use aspirin. Unfortunately, this information did not appear in any of the media reports. While it might be tempting to fault journalists for sloppy, incomplete reporting, it is hard to blame them when the information was missing from the journal article itself.

In the study, Columbia University researchers asked approximately 3,000 women with and without breast cancer about their use of aspirin in the past. The typical woman in this study was between the ages of 55 and 64. According to the National Cancer Institute, about 20 out of 1,000 women in this age group will develop breast cancer in the next five years. Therefore, the "20 percent lower chance" would translate into a change in risk from 20 per 1,000 women to 16 per 1,000, or four fewer breast cancers per 1,000 women over five years. For people who prefer to look at percentages, this translates as meaning that 2 percent develop breast cancer without aspirin, while 1.6 percent develop it with aspirin, for an absolute risk reduction of 0.4 percent over five years.

Another way to present these results would be to say that a woman's chance of being free from breast cancer over the next five years was 98.4 percent if she used aspirin and 98 percent if she did not. Seeing the actual risks leaves a very different impression than a statement like "aspirin lowers breast cancer risk by 20 percent." (See "Research Basics: How Big Is the Difference?")

Against What Size Harms?

Is the potential benefit of aspirin big enough to outweigh its known harms? Unfortunately, aspirin, like most drugs, can have side effects. These, according to the U.S. Preventive Services Task Force, include a small risk of serious (and possibly fatal) bleeding in the stomach or intestine, or strokes from bleeding in the brain -- harms briefly noted but not quantified in the original study or in most media reports. To decide whether aspirin is worth taking, women need to know how the potential size of aspirin's benefit in reducing breast cancer compares with the drug's potential harms. Sound medical practice dictates doing the same kind of calculation -- of potential benefits against potential harms -- anytime you consider taking a drug.

We provide the relevant information in the "Aspirin Study Facts," below. The first column shows the health outcome being considered (e.g., getting breast cancer, having a major bleeding event). The second column shows the chance of the outcome over five years for women not taking aspirin. The third column shows the corresponding chance for women taking aspirin. And the fourth column shows the difference -- the possible effect of aspirin. As the table shows, the size of the known risk for stomach bleeding to a woman taking aspirin daily nearly matches the size of the still-hypothetical benefit in terms of breast cancer protection. That kind of comparison might lead some women to conclude that the tradeoff doesn't warrant the risk. While it may take you some time to become familiar with this table, we think this sort of presentation would be helpful in many situations; for example, whenever people are deciding about taking a new medication or undergoing elective surgery.

Is It Really Aspirin?

Does aspirin really prevent breast cancer, or is there some other difference between women in the study that could account for the difference in cancer rates? Can we be sure that aspirin was responsible for the "20 percent fewer" breast cancers that the Columbia researchers found among aspirin users compared with nonusers? To understand why not, it is necessary to know some of the details about how the study was conducted.
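The article's conversion from a relative risk reduction ("20 percent lower") to absolute numbers can be reproduced with a few lines of Python. This is a minimal sketch of the arithmetic only; the function name `risk_summary` and its parameter names are invented for illustration, and the inputs (20 per 1,000 baseline, 20 percent relative reduction) are the figures quoted in the article.

```python
def risk_summary(baseline_per_1000, relative_reduction):
    """Turn a baseline risk and a relative reduction into absolute figures."""
    treated_per_1000 = baseline_per_1000 * (1 - relative_reduction)
    fewer_per_1000 = baseline_per_1000 - treated_per_1000
    return {
        "baseline_percent": baseline_per_1000 / 10,          # per 1,000 -> percent
        "treated_percent": treated_per_1000 / 10,
        "absolute_reduction_percent": fewer_per_1000 / 10,
        "fewer_cases_per_1000": fewer_per_1000,
        "cancer_free_percent_untreated": 100 - baseline_per_1000 / 10,
        "cancer_free_percent_treated": 100 - treated_per_1000 / 10,
    }


# Figures quoted in the article: 20 per 1,000 over five years, 20% lower risk
summary = risk_summary(baseline_per_1000=20, relative_reduction=0.20)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The same 20 percent relative reduction would look very different with another baseline: starting from 2 per 1,000 instead of 20, it would mean only 0.4 fewer cases per 1,000, which is why the baseline matters.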

The researchers collected information from all of the women in New York's Nassau and Suffolk counties on Long Island who were diagnosed with breast cancer in 1996 and [...]. For comparison, they matched these women with others who did not have breast cancer, but who were about the same age and from the same counties. The researchers asked all the women about their use of aspirin. They found that aspirin use was more common among the women without breast cancer.

While the researchers were careful to report that the use of aspirin was "associated" with reduced risk of breast cancer, the media used stronger language, suggesting aspirin played a role in preventing breast tumors. Unfortunately, this kind of study -- an observational study -- cannot prove that it was the aspirin that lowered breast cancer risk. Strictly speaking, the researchers demonstrated only that there is an association between aspirin and breast cancer.

Consider how an association between aspirin and breast cancer could exist even if aspirin has no effect on breast cancer. It could be that women who use aspirin regularly are already at a lower risk of breast cancer. Imagine, for example, there was a gene that protected against breast cancer but also made people more susceptible to pain. Women who carried this gene would be more apt to use aspirin for pain relief. The lower breast cancer risk in aspirin users might simply reflect the fact that they had this gene. In other words, aspirin might have nothing to do with the findings. To really know if aspirin lowers breast cancer risk would require a different kind of study -- a randomized trial. (See "Research Basics: Cause or Association?")

Nonetheless, observational studies are important (and often crucial) in building the case for doing a randomized trial. In this instance, the researchers had a theory for how aspirin might prevent breast cancers. They predicted that it would only be true for certain kinds of cancers (so-called hormone receptor positive cancers, the most common kind, which account for about 60 percent of all breast cancers). And that is just what they observed: The association between aspirin and breast cancer was not seen in hormone receptor negative cancers. That the researchers' prediction was correct supports (but does not prove) the idea that aspirin reduces risk. The next logical step would be a randomized trial.

The difference between "cause" and "association" may seem subtle, but it is actually profound. Even so, people -- like the headline writers in this case -- often go beyond the evidence at hand and assume that an association is causal. Readers should know that many associations do not reflect cause and effect.

The Bottom Line

In a large observational study, researchers found slightly fewer breast cancers among women who took aspirin regularly compared with women who did not. Because aspirin's benefit in reducing breast cancer (assuming it can be proven) was small, it may not outweigh the drug's known harms. While it is possible that aspirin itself reduces the risk of breast cancer, we cannot be sure from this study. It would take a randomized trial to be certain. Fortunately, one has just been completed by researchers at Harvard Medical School, and the results are expected in the very near future. Until then, it is too soon to recommend taking aspirin to prevent breast cancer.

Lisa Schwartz, Steven Woloshin and Gilbert Welch are physician researchers in the VA Outcomes Group in White River Junction, Vt., and faculty members at the Dartmouth Medical School. They conduct regular seminars on how to interpret medical studies. (See http://[...]) The views expressed do not necessarily represent the views of the Department of Veterans Affairs or the United States Government.

The Washington Post Company

Appendix 3: Quick Reference of Statistical Basics

I. Types of Data

Quantitative (or measurement) Data

These are data that take on numerical values that actually represent a measurement such as size, weight, how many, how long, score on a test, etc. For these data, it makes sense to find things like the average or the range (largest value − smallest value). By contrast, it doesn't make sense to find the mean shirt color, because shirt color is not a quantitative variable. Some quantitative variables take on discrete values, such as shoe size (6, 6½, 7, ...) or the number of soup cans collected by a school. Other quantitative variables take on continuous values, such as your height (60 inches, [...] inches, [...] inches, etc.) or how much water it takes to fill up your bathtub ([...] gallons or [...] gallons or 99 gallons, etc.).

Categorical (or qualitative) Data

These are data that take on values that describe some characteristic of something, such as the color of shirts. These values are categories of a population, such as M or F for gender of people, or Don't Drive or Drive for the method of transportation used by students to get to school. These are examples of binary variables, which have only two possible values. Some categorical variables have more than two values, such as hair color, brand of jeans, and so on.

Two types of variables:
Quantitative: Discrete or Continuous
Categorical: Binary or More than 2 categories

II. Numerical Descriptions of Quantitative Data

Measures of Center

Mean: The sum of all the data values divided by the number (n) of data values.
Example Data: 4, 36, 10, 22, 9
Mean = $\bar{x} = \frac{\sum x_i}{n} = \frac{4 + 36 + 10 + 22 + 9}{5} = \frac{81}{5} = 16.2$

Median: The middle element of an ordered set of data. When the number of values is even, the median is the average of the two middle values.
Examples
Data: 4, 36, 10, 22, 9. In order: 4, 9, 10, 22, 36. Median = 10
Data: 4, 36, 10, 22, 9, 43. In order: 4, 9, 10, 22, 36, 43. Median = $\frac{10 + 22}{2} = 16$

Measures of Spread

Range: Maximum value − Minimum value
Example Data: 4, 36, 10, 22, 9. Range = Max − Min = 36 − 4 = 32

Interquartile Range (IQR): The difference between the 75th percentile (Q3) and the 25th percentile (Q1). This is Q3 − Q1. Q1 is the median of the lower half of the data and Q3 is the median of the upper half. In neither case is the median of the data included in these calculations. The IQR spans the middle 50% of the data. Each quarter of the data, as divided by the quartiles, contains 25% of the data.
Examples
1. Data: 4, 36, 10, 22, 9. In order: 4, 9, 10, 22, 36. Q1 = 6.5 and Q3 = 29. So, the IQR = 29 − 6.5 = 22.5
2. Data: 4, 9, 10, 22, 36, 43. Q1 = 9 and Q3 = 36. So, the IQR = 36 − 9 = 27
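The worked examples above can be checked with a short Python sketch. It follows the rule stated in this appendix: Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out when the number of values is odd. Other software may use a different quartile rule and give slightly different answers. The function names `quartiles` and `describe` are invented for this illustration.

```python
from statistics import mean, median


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves of the data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                 # values below the median position
    upper = ordered[len(ordered) - half:]  # values above the median position
    return median(lower), median(upper)


def describe(values):
    """Collect the measures of center and spread defined in this section."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
    }


five_values = [4, 36, 10, 22, 9]
six_values = [4, 36, 10, 22, 9, 43]

print(describe(five_values))
print(describe(six_values))
```

For the five-value data set this gives a mean of 16.2, a median of 10, a range of 32, and an IQR of 22.5; for the six-value data set the median is 16 and the IQR is 27, matching the hand calculations.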

Five-number summary: consists of the Minimum, Q1, Median, Q3, and Maximum. To find these statistics, enter your data into your calculator using the list function: STAT, ENTER, then type the data into L1. If you make a mistake, you can go to the error and press DEL. If you forget an item, you can go to the line below where it is supposed to be and press 2nd DEL to insert it. To find each value of the five-number summary, go to STAT, CALC, 1 (1-Var Stats), type in L1 by pressing 2nd 1, press ENTER, and scroll down to minX, Q1, Med, Q3, and maxX.

NOTE: If the lists you are using already have numbers in them before you start, you can clear them this way: Arrow up (↑) to the line where L1 is shown. Press CLEAR, then the down arrow (↓).

III. Graphical Displays of Univariate (one variable) Data

Dotplot, Boxplot (Box and Whiskers), Stemplot (Stem and Leaf), Histogram

[Figure: Student GPAs Dot Plot; axis: GPA]

To make a Dotplot:
1. Draw and label a number line so that all the values in your dataset will fit.
2. Graph each of the data values with a dot. Be sure to line the dots up vertically as well as horizontally so that you can really see the shape of the graph.

[Figure: Stemplot of Student GPAs. Key: 3 | 4 = 3.4]

TO MAKE A STEMPLOT:
1. Put the data in ascending order. Make a key!
2. Use only the last digit of the number as a leaf (see the numbers to the right of the line; each digit is the last digit of a larger number).
3. Use one, two, or more digits as the stem. (Sometimes, you can truncate data when there are too many digits in each data value, i.e. the number 20,578 would become 20 | 5, where the 20 is in thousands. Note that this is different from rounding.)
4. Place the stem digit(s) to the left of the line and the leaf digit to the right of the line. Do this for each data value. You should then arrange the leaves in ascending order.
5. Sometimes, there are many numbers with the same stem. In this situation it might be useful to break the numbers with the same stem into either two distinct groups (each on a separate line; say, leaves from 0 to 4 on the first line and 5 to 9 on the second) or into five distinct groups as is shown in the graph to the right. Here, the first line for each stem contains all the 0 and 1 leaves, the next line contains the 2 and 3 leaves, and so on. This technique is called splitting the stems. It is useful in some cases in order to show the shape of the data more clearly.
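The stemplot steps above can be sketched in Python for data with one decimal place, such as GPAs. This is a simplified illustration: it does not split stems, and the GPA values in `sample_gpas` are hypothetical because the packet's own data are not reproduced here. The function name `stemplot` and the `leaf_unit` parameter are invented for the example.

```python
from collections import defaultdict


def stemplot(values, leaf_unit=0.1):
    """Return stemplot rows as strings, using the last digit as the leaf."""
    stems = defaultdict(list)
    for value in sorted(values):              # step 1: ascending order
        scaled = int(round(value / leaf_unit))  # e.g. 3.4 becomes 34
        stem, leaf = divmod(scaled, 10)       # steps 2-3: split stem and leaf
        stems[stem].append(leaf)
    rows = []
    # Step 4: list every stem, including empty ones, so gaps stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows


# Hypothetical GPAs, NOT the data from the packet
sample_gpas = [2.1, 2.8, 3.0, 3.4, 3.4, 3.7, 3.9, 4.0]

print("Key: 3 | 4 = 3.4")
for row in stemplot(sample_gpas):
    print(row)
```

Because the values are sorted first, the leaves on each row come out in ascending order automatically.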

[Figure: Boxplot of Student GPAs; axis: GPA]

To make a Boxplot:
1. Draw and label a number line that includes the minimum and the maximum values for the set of data.
2. Calculate the five-number summary and make a dot for each of these summary numbers above the number line.
3. Draw a line between the 1st and 2nd dots, showing the lower quartile; and then draw a line from the 4th to the 5th dot to show the upper quartile. These are commonly called the whiskers.
4. Draw a rectangular box from the 2nd to the 4th dot and draw a line through the box on the middle dot (the median).

NOTE: In AP Statistics, a modified boxplot is used. This shows any outliers. An outlier is a data point that does not fit the pattern of the rest of the data. When your calculator or computer software graphs a modified boxplot, an algorithm is used to determine what it takes to "not fit the pattern of the rest of the data." The rule is: a value is an outlier if it is more than 1.5 × IQR away from the box part of the graph (below Q1 or above Q3). These outliers are shown with dots or stars, or any other small symbol.

[Figure: Frequency Histogram of Student GPAs; axis: GPA]

To make a histogram:
1. Put the data into ascending order.
2. Decide upon evenly spaced intervals into which to divide the set of data (such as 0, 10, 20, 30, etc.) and then count the number of values that fall within each interval. This number is called the frequency. If you divide each of these frequencies by the size of the data set, n, making percents, then you have what are called relative frequencies.
3. Draw and label a 1st quadrant graph using scales appropriate for the data. Be sure to include a label for the x-axis and for the y-axis.
4. Graph the frequencies that you calculated in step 2.

Categorical Data: Bar Graph, Circle Graph (Pie Chart)

I'm assuming that you already know how to make these two types of graphs. If you need help, you can search the internet for directions.
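The outlier rule for modified boxplots described above (more than 1.5 × IQR beyond the box) can be sketched in Python. The quartiles here are computed as medians of the lower and upper halves, the method used in this appendix; the data in `sample` are hypothetical, and the function names are invented for the illustration.

```python
from statistics import median


def outlier_fences(values):
    """Return the (lower, upper) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                  # median of the lower half
    q3 = median(ordered[len(ordered) - half:])   # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """List the values that fall outside the fences."""
    low, high = outlier_fences(values)
    return [v for v in values if v < low or v > high]


# Hypothetical data, chosen so that one value sits far from the rest
sample = [4, 9, 10, 22, 36, 43, 120]

print("Fences:", outlier_fences(sample))
print("Outliers:", find_outliers(sample))
```

For this sample, Q1 = 9 and Q3 = 43, so the IQR is 34 and the fences are −42 and 94. Only 120 lies outside them, so it would be drawn as a separate symbol and the upper whisker would stop at 43.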

IV. Assessing the Shape of a Graph

There are two basic shapes that we will examine: Symmetric and Skewed.

Symmetric: One can tell if a graph is symmetric if a vertical line in the center divides the graph into two fairly congruent shapes. (A graph does not have to be bell-shaped to be considered symmetric.) Mean ≈ Median in a symmetric distribution.

Skewed: One can tell that a graph is skewed if the graph has a big clump of data on either the left (skewed right) or on the right (skewed left) with a tendency to get flatter and flatter as the values of the data increase (skewed right) or decrease (skewed left). A common misconception is that the skewness occurs at the big clump; the skew is named for the side with the long tail.

Relationship between Mean and Median in a skewed distribution: Skewed Left, the mean is Less. Skewed Right, the mean is Might.

Gathering Information from a Graphical Display

The first thing that should be done after gathering data is to examine it graphically and numerically to find out as much information about the various features of the data as possible. These will be important when choosing what kind of procedures will be appropriate to use to find out an answer to a question that is being investigated. The features that are the most important are Center, Unusual Features, Shape, and Spread: CUSS. Most of these can only be seen in a graph. However, sometimes the shape is indistinct (difficult to discern). In this instance (usually because of a very small set of data), it's appropriate to label the shape "indistinct."
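The mean and median rule for skewed distributions above can be seen in a tiny Python example. Both data sets are made up for illustration: the first has a long tail to the right, and the second is its mirror image.

```python
from statistics import mean, median

# Hypothetical data with a clump on the left and a long tail to the right
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

# Mirror image of the same data: clump on the right, long tail to the left
left_skewed = [21 - x for x in right_skewed]

for name, data in [("skewed right", right_skewed), ("skewed left", left_skewed)]:
    print(f"{name}: mean = {mean(data):.1f}, median = {median(data):.1f}")
```

In the right-skewed set the single large value pulls the mean (4.7) above the median (3), while in the left-skewed set the single small value pulls the mean (16.3) below the median (18). The median barely reacts to the extreme value in either case.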

Name:

AP Statistics Summer Assignment Rubric

Total: ____ /75

| Part 1: Essay | Possible Points | Points Earned |
|---|---|---|
| Formatting and Citations: Formatting (correct font, double spaced, correct length for all sections and total paper, section titles; in other words, directions followed!) | 7 | /7 |
| Formatting and Citations: Sources Cited | 5 | /5 |
| Content: What is a statistician? In-depth analysis of what a statistician does using at least 2 sources. | 8 | /8 |
| Content: Why take statistics? A persuasive explanation of why statistics is useful to high school students. | 8 | /8 |
| Content: Why are you taking statistics? A personal explanation of why you are taking AP Stats. | 8 | /8 |
| Total | | /36 |

| Math Packet | Possible Points | Points Earned |
|---|---|---|
| Completed and on time | 5 | /5 |
| Part 2: Reading and Writing | 7 | /7 |
| Part 3: Vocabulary List | 7 | /7 |
| Part 4: Practice Problems | 20 | /20 |
| Total | | /39 |