# The Ultimate Student's Starter Kit to AP Statistics


## Transcription

A PUBLICATION OF ALBERT.IO: AP STATISTICS

The Ultimate Student's Starter Kit to AP Statistics: Everything You Need to Get Started

*AP and Advanced Placement are registered trademarks of the College Board, which was not involved in the production of, and does not endorse, this product.

Ready to Score Higher? Stop stressing about the AP Statistics exam. Albert has got your back! With thousands of practice questions, personalized statistics, and anytime, anywhere access, Albert helps you learn faster and master the difficult concepts you are bound to see on test day.

### Table of Contents

- Introduction
- About Us
- Is AP Statistics Hard?
- How to Calculate Means
- How to Calculate Medians
- How to Calculate Ranges
- Standard Deviation
- The German Tank Problem Explained
- Z-score Calculations & Percentiles in a Normal Distribution
- Describing Distributions in AP Statistics
- 3 Ways to Approach AP Statistics Free Response Questions
- How to Study for AP Statistics
- The Ultimate List of AP Statistics Tips

### Introduction

AP Statistics is no walk in the park. Last year, only 13.9% of students earned a 5 on the exam. That's why we've created this ebook. It's designed to be a helpful starter kit for any student planning to take the AP Statistics exam. By beginning here, you'll gain a better understanding of the test and receive essential tools to set yourself up for success.

This book features information from the Albert Blog, where new academic resources are published every day of the week. Be sure to check the blog regularly and subscribe to hear about our new posts. You can also find tips and study guides for your AP classes, and admissions advice for your dream school, on our blog.

Contact us if you have any questions, suggestions, or comments!

Last Updated: January

### About Us

#### What is Albert?

Albert bridges the gap between learning and mastery with interactive content written by world-class educators.

We offer:

- Tens of thousands of AP-style practice questions in all the major APs
- A complete competitive online leaderboard to see where you stand compared to others
- Immediate feedback on each question answered
- An easy-to-access platform from any Internet-enabled device
- In-depth personal statistics to track your progress
- Intuitive classroom tools for teachers and administrators

Discover why thousands of students and educators trust Albert.

#### Why Students Love Us

We asked students how they did after using Albert. Here is what they had to say:

"I scored very well this year: four 5s and one 4. Albert helped me get used to the types of questions asked on the exam, and overall my scores were better this year." (Robyn G., Chambersburg Area Senior High School)

"Last year was my first year taking an AP test, and unfortunately I did not do as well as I had hoped. The subject had not been my best, and that was definitely displayed in my performance. However, this year I made a much higher score on my AP test. The previous year had been AP World History, and I had made a 2. This year it was AP English Language, and I scored a 4. There was a definite jump in my score, because Albert pushed me to focus on my weaknesses and form them into strengths." (Charlotte R., Rome High)

"I scored a 4 on AP Biology, much higher than expected. Albert was an effective resource to guide me through AP Biology. Keeping up with it consistently all year as I learned the lesson in class was crucial to reinforcing my understanding and long-term memorization of Biology. After class each day, Albert helped the ideas I was taught in the morning sink in." (Lily O., Wake Forest High School)

### Is AP Statistics Hard?

So you are thinking about taking AP Statistics, and you are wondering how much work it will be. Or maybe you've already signed up and you're wondering what you've gotten yourself into. Either way, we know how difficult it is to figure out just how much work a class will take and whether it is worth the effort. Here at Albert.io we've got you covered with the full story on AP Statistics difficulty, along with review information, study guides, and plenty of practice questions.

In the following article, you'll get a full AP Stats review, including what the AP Stats Exam is, whether you should take the exam, insight into AP Stats difficulty, and how the exam is scored. We'll also throw in some links to info about the best study guides and practice questions. First, let's take a look at what the AP Stats Exam is.

#### What is the AP Statistics Exam?

AP Exams give top high school students the chance to study college-level course material and perform college-level coursework while still in high school, so they gain the skills they will need in college. If a student demonstrates mastery of the material, more than 90% of four-year colleges and universities in the US will reward that effort with real credits that take the place of a college course.

All AP classes are a step above regular classes in difficulty, and AP Statistics is no exception. But before we dive into the specifics of AP Stats difficulty, the first thing we will cover in our AP Stats review is whether it's a good idea for you to take the AP Stats Exam in the first place.

#### Should You Take the AP Statistics Exam?

Let's weigh the pros and cons. On the negative side, AP Exams are difficult and require a lot of time and effort. The AP Statistics Exam is no easier than any other AP Exam. There is also a fee to cover the costs of administering and scoring the exams. This fee is listed online at \$93 per exam.

On the other hand, the benefits of taking AP Statistics include developing skills like hypothesis testing and statistical literacy that will help you not only in the science classes you take in college but also for the rest of your life. And depending on how much your college charges per credit hour, there's a good chance that being able to waive a semester-long college statistics class for less than \$100 will end up saving you gobs of money in the long run.

In general, taking an AP Statistics class is a challenge, but well worth the effort in the long run. And if you have already invested the time and energy in taking an AP Stats class, it would be a waste not to buckle down and take the AP Stats Exam.

#### What's the AP Statistics Exam Like?

The AP Statistics Exam is a three-hour test with two separate sections. The first section is a 90-minute multiple-choice section with a total of 40 questions. The second section is a 90-minute free-response section with a total of six problems to be solved: five free-response questions and one investigative task. Graphing calculators are allowed on the exam, and students are encouraged to use them.

Both sections count equally towards the overall score you are given, from 1 to 5. The score you get is standardized across all the students who took the exam at the same time as you, and colleges generally offer credit for all scores above a 2.

Unless you have a special extension from CollegeBoard, the AP Stats Exam usually takes place during the second week of May. The exams are then scored, and scores are usually released during the first or second week of July. (Make sure you keep this timeline in mind when you are preparing your AP Statistics study guide!)

Next, we will continue our AP Statistics review and get right down to talking about AP Stats difficulty.

#### How Difficult is AP Statistics?

It's not so easy to give a straightforward answer to the question of AP Statistics difficulty. Every student has strengths and weaknesses, and every school does too. Here we will break down AP Stats difficulty into several different components, including average scores, the difficulty of the content, the skills required, and how easy it is to balance AP Stats with other aspects of your life.

It is easy to say what percentage of students taking the AP Statistics Exam scored in a high range.

Here is a chart of the percentage of people who got each of the five possible scores on the AP Stats Exam over the last five years. (The year labels and the first year's column did not survive in this transcription; the four remaining columns are shown in their original order.)

| Score | | | | |
|---|---|---|---|---|
| 5 | 13.2% | 14.0% | 12.6% | 12.2% |
| 4 | 18.9% | 20.9% | 20.2% | 20.9% |
| 3 | 25.2% | 24.5% | 25.0% | 25.7% |
| 2 | 18.9% | 17.9% | 18.8% | 18.1% |
| 1 | 23.8% | 22.7% | 23.4% | 23.1% |

As you can see, a high percentage of students earn scores of 3, 4, or 5 on the AP Statistics Exam. In fact, across all five years, an average of 59% of test-takers earned a score of 3, 4, or 5. This means that almost 3 out of every 5 students taking the AP Stats Exam have the potential to earn college credit.

These rates compare favorably to other popular AP Exams, but every student is different. While it might be easier for some students to get a high score on the AP Statistics Exam, it could be easier for others to score highly on the AP Biology Exam or the AP US History Exam.

Let's continue our discussion of AP Statistics difficulty by talking about the specific content that is covered on the AP Stats Exam.
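As a rough check on the "3 or higher" figure, the sketch below adds up the shares for scores 5, 4, and 3 in each column of the chart that can still be read. It assumes the rows run from score 5 down to score 1, and it uses only four of the five years, so the result need not match the five-year average quoted above exactly. The variable names are illustrative.

```python
# Percentages read from the four year-columns that are still legible in the chart.
# Assumption: the first three rows are scores 5, 4, and 3, in that order.
share_by_score = {
    5: [13.2, 14.0, 12.6, 12.2],
    4: [18.9, 20.9, 20.2, 20.9],
    3: [25.2, 24.5, 25.0, 25.7],
}

# Share of test-takers scoring 3 or higher in each legible column.
pass_rates = [round(sum(column), 1) for column in zip(*share_by_score.values())]

# Mean of those yearly shares (four years only).
average_pass_rate = sum(pass_rates) / len(pass_rates)

print(pass_rates)
print(round(average_pass_rate, 1))
```

Each yearly share lands between 57% and 60%, which is in line with the "almost 3 out of every 5" reading.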

#### Course Content

The content on the AP Statistics Exam centers around four major themes. These are:

1. Exploring Data: Describing patterns and outliers in datasets
2. Sampling & Experimentation: Planning and conducting statistical analyses
3. Anticipating Patterns: Using probability and simulation to explore random events
4. Statistical Inference: Hypothesis testing and estimating population parameters

You should make sure to include each of these themes in your AP Statistics study guide. If you are interested in what these sections specifically include, you can check out the CollegeBoard AP Statistics course description (pg. 11).

Let's continue our discussion of AP Statistics difficulty by talking more broadly about what skills are required to be successful on the test.

#### Skills Required

As with most AP Exams, getting a good score on the AP Statistics Exam takes more than just memorizing facts and figures. In fact, the real AP Statistics difficulty lies not in how much you can memorize, but rather in your ability to perform statistical reasoning. Let's look at each skill you will need for the exam individually:

1. Arithmetic: AP Statistics does not require a whole lot of complex math, and certainly no calculus. But you will be using a lot of equations, so a general number sense is a great help and will go a long way towards lowering overall AP Stats difficulty.

2. Graphical Literacy: There is a lot of graphing involved in AP Statistics. What we mean here is the ability to look at a graph and gain a good sense of what information is expressed there. This is the kind of thing tested on the ACT Science section, for example.
3. Statistical Reasoning: This is the primary skill that you will work on at every single stage of your AP Statistics class. In short, this is the ability to use appropriate statistical language when describing your findings. For example: "At the 5% significance level, we reject the null hypothesis in favor of the alternative hypothesis."
4. Trial and Error: While you may think it strange to consider this a skill, the fact is that AP Statistics practice problems sometimes require a "plug and chug" approach. This means being able to keep trying new solutions until you find one that works.

Some of these skills differ from person to person: for example, some people believe that they are "math people," and some people don't. However, each of these four skills can be developed over the course of an AP Statistics class.

Let's continue our discussion of AP Statistics difficulty by talking about how you can balance AP Statistics with your other work.

#### Finding Balance

For many students taking AP classes, it can be difficult to know how many is too many. It's very common for highly motivated students to overload themselves with so many classes that their performance suffers in all of them.

Getting a 3, the average score on the AP Statistics exam (or the median score, for you statistics experts!), is equivalent to getting a low B or a high C in a college course. That's nothing to sneeze at for a junior or senior in high school! And even if you didn't receive the score you wanted or expected, that doesn't say anything about who you are as a person, and it doesn't even mean that you won't get college credit! Every college has a policy on what scores it will accept for college credit, and you can find these policies online.

#### Normalized Scores

Also, remember that your AP Statistics scores are normalized, and the number you get from 1 to 5 is relative to how everyone else did. So it's not just about how many questions you get right or wrong; rather, it's about how you do compared to every other student who took the same test that you did. So if CollegeBoard decides to include a really difficult new free-response question on the Exam that you take, don't worry! Everybody will probably struggle with it just as much as you did.

Now let's take a look at a practice AP Statistics question to give you a small taste of the type of material that will be covered on the AP Statistics Exam.

#### Practice Question

Consider a data set of positive values, at least two of which are not equal. Which of the following sample statistics will be changed when each value in this data set is multiplied by a constant whose absolute value is greater than 1?

I. The mean
II. The median
III. The standard deviation

(a) I only
(b) II only
(c) III only
(d) I and II only
(e) I, II, and III

In this AP Statistics practice question, the best approach is both to reason out the answer using statistical reasoning and to plug and chug, just to make sure. Let's define a simple data set as {1, 2, 3}. Now the problem says to multiply each value in the set by a constant whose absolute value is greater than 1. Let's multiply each number in the set by two, to give us {2, 4, 6}.

With this simple example, we can clearly see that the mean and median have both changed from 2 to 4. So we know that sample statistics I and II are both changed. But to find out if sample statistic III changes, there's no need to go into the detail of calculating the standard deviation. Using our basic statistical knowledge, we know that standard deviation is a measure of how distant individual values in a set are from the mean. In the first set, the first and third values were only one unit away from the mean. In the second set, the same values were two units away from the mean. Therefore, we know that all three sample statistics will be different and that the answer must then be (e).
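The plug-and-chug check above can also be run as a short sketch using Python's standard `statistics` module. The data set {1, 2, 3} and the multiplier 2 come from the worked example; the variable names are illustrative, and `statistics.stdev` computes the sample standard deviation.

```python
import statistics

data = [1, 2, 3]
scaled = [2 * x for x in data]  # multiply every value by the constant 2

for label, values in (("original", data), ("scaled", scaled)):
    print(
        label,
        "mean:", statistics.mean(values),
        "median:", statistics.median(values),
        "stdev:", statistics.stdev(values),  # sample standard deviation
    )
```

All three statistics double when the data are doubled, which matches answer (e).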

### How to Calculate Means

Means, medians, modes, oh my! AP Statistics is all about the numbers, but the first step to getting the math right is understanding the terms that go along with it. One of the most basic and important concepts in AP Stats is the mean. You'll need to know how to use it for almost everything else covered on the test! While you've surely heard the word before, the definition of mean in math is very different from the way you're used to hearing it in daily life. In stats, mean is just a fancy name for the mathematical average of a set of numbers.

In this AP Statistics review, we'll explain what a mean is, how to calculate the mean, and how to master mean-related questions with ease on the AP test.

#### Defining the Mean

First things first: what does mean even mean? The mean is a synonym for average, and it's one way to measure the center of a set of numbers.

Your textbook probably discusses the mean under the section called descriptive statistics, because that's exactly what it does: it helps us to describe a set of numbers. So, why would we even want to know how to calculate the mean? Well, statistics is all about looking for relationships among data, and, as we'll see, the mean can reveal a lot!

Let's work from an example. Say you just got back a test, and you want to know how you compared to your classmates: did you score above or below the average? To answer this question, first you'll have to add up all the scores in your class and divide by the number of students. That's the mean! Then, you can see where your score falls compared to the rest of the class.

#### Example 1

Let's go through this problem step by step. Imagine there are 5 students in your class: Tom, Kelly, Mark, Amanda, and you. The test scores are listed in the table below.

| Tom | Kelly | Mark | Amanda | You |
|---|---|---|---|---|
| 80% | 58% | 60% | 100% | 87% |

You got an 87%. That's not as high as you'd like, but how does it compare to everyone else? First, add up all the scores:

80 + 58 + 60 + 100 + 87 = 385

We then divide that number by the total number of students to find the mean, or the center of the scores.

385 / 5 = 77
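The same two steps (sum, then divide by the count) can be written as a minimal Python sketch. The scores are the ones from the table; the dictionary and variable names are illustrative.

```python
# Test scores (in percent) from Example 1.
scores = {"Tom": 80, "Kelly": 58, "Mark": 60, "Amanda": 100, "You": 87}

total = sum(scores.values())       # step 1: add up all the scores
mean_score = total / len(scores)   # step 2: divide by the number of students

# How far your own score sits above (positive) or below (negative) the mean.
difference = scores["You"] - mean_score

print(total, mean_score, difference)
```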

The mean score is 77%. That shows you scored 10 points above average. Looks like you're not doing too shabby after all!

This example shows that it is hard to know how to interpret a number without looking at how it compares to the average. Is 50% a bad score? Not if the average score is 10%! That's where the mean comes in handy. It can tell us about the relationships between our numbers, which is the basis of statistics!

Now that you know what a mean is in practical terms, let's introduce the formula for the mean:

$$\text{mean} = \frac{\sum x}{N}$$

While this may look a bit scary, it's simply a way of representing the steps we just did, only this time using symbols instead of words. $\sum x$ just means to add up all the x's, or all of our scores. Then, we divide by N, which is the number of scores in our set. Knowing what these symbols mean and how to use them will be very important for the test!

The concept of the mean is crucial for AP Stats not only for what it can tell us on its own, but also because it will be used in a ton of other formulas throughout the course. Let's try another sample to make sure we've got it down.

#### Example 2

A researcher wants to find out if boys or girls are taller at age 10. The heights in inches are recorded for a random sample of 5 boys and 5 girls. The data are shown below. Which group is taller?

[Table: boys' height (in.) and girls' height (in.)]

From the data, we can see that some of the girls are taller than some of the boys, and vice versa. But which group is taller overall? First, we'll want to sum the heights for each group. Then, we'll divide each of those sums by the number in the group to find the mean.

From this, we can conclude that the sample of boys is, on average, taller than the sample of girls.

#### When to Use the Mean

While the mean is used in a lot of statistics, it's also very important to know when to use it and when not to use it.

The mean is only useful for interval or ratio data. Intervals and ratios are types of quantitative measurements. Unlike categorical variables such as gender or nationality, quantitative variables such as height can be measured on a numerical scale.

Sometimes categorical variables are also represented with numbers. For example, if you took a survey and recorded the gender of the survey-takers, you might put down a 1 for males and a 0 for females. However, you still can't use a mean in this situation, because the numbers are really just placeholders! You also can't use a mean for ordinal data, which is data about the order of a set, such as the finishers in a race. The mean of the 1st, 2nd, and 3rd place finishers is 2, which is also a meaningless number! If you're still confused about the types of data and what they are, you'll want to review levels of measurement.

Another important thing to know about the mean is that it is skewed by outliers. In other words, the mean is pulled towards the extremes: very big or very small numbers. Let's look at another example to explain this.

Say we poll a random sample of 5 people and ask how many hours of television they watch every day. Their data is listed below.

| Person 1 | Person 2 | Person 3 | Person 4 | Person 5 |
|---|---|---|---|---|
| 1 hr | 0.5 hr | 1.5 hrs | 11 hrs | 1 hr |

Most of our surveyed people watch between 0.5 and 1.5 hours a day, but Person 4 watches 11 hours! Let's calculate the mean.

(1 + 0.5 + 1.5 + 11 + 1) / 5 = 15 / 5 = 3

So it looks like the average time spent watching TV is about 3 hours a day. But let's see what happens when we remove Person 4.

(1 + 0.5 + 1.5 + 1) / 4 = 4 / 4 = 1

Person 4's extreme TV habits pulled the mean from 1 to 3 hours! Although it is a descriptive statistic, the mean doesn't describe our data very well in the first case: it would be a misrepresentation to claim that most people watch 3 hours of TV per day. In cases when there are outliers like this, you may want to use some other measures of central tendency, like the median and the mode.
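To see the pull of the outlier numerically, here is a small sketch that computes the mean of the TV-watching data with and without Person 4, and the median for comparison. It uses Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Hours of TV per day for Persons 1 through 5; Person 4 (11 hours) is the outlier.
hours = [1, 0.5, 1.5, 11, 1]

mean_all = statistics.mean(hours)

# Drop the single extreme value and recompute the mean.
without_outlier = [h for h in hours if h != 11]
mean_without = statistics.mean(without_outlier)

# The median of the full data set barely notices the outlier.
median_all = statistics.median(hours)

print(mean_all, mean_without, median_all)
```

The mean triples when the outlier is included, while the median of the full data set stays at 1 hour.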

#### Using Means on the AP Statistics Exam

Now that we've seen some examples of how to calculate the mean and when to use it, let's talk about where it will come into play on the AP Statistics test.

Because the mean is such a basic concept in AP Stats, most of the time you will be asked about the relationship of the mean to other statistics. You may need to calculate a mean in order to determine a z-score or standard deviation, for example. Other times, you may be given a mean and asked to figure out some other statistics, or to determine whether the data set has outliers like the one we discussed in the last section. The mean is likely to pop up all over the test, so make sure you have it down pat!

In this AP Statistics review, we've talked about what the mean is, how to calculate the mean, and how to use it on a test. At this point, you should be a master of all things mean and ready to move on to some bigger concepts! Happy cramming!

### How to Calculate Medians

Dealing with the stats part of the AP Statistics exam can be intimidating enough on its own, but on top of that, you are probably discovering that there's enough confusing vocabulary to make your head spin! Take the median, for example. Along with its siblings, mean and mode, it makes up the measures of central tendency, and like any family members, they can be tough to tell apart. However, once you get a handle on putting names to (mathematical) faces, you'll find that calculating the median is easy, and it's a simple concept that will be crucial on the exam.

This AP Statistics review will explain what the median is and how to tell it apart from similar concepts, show you how to calculate the median, and help you learn how to conquer median-related questions on the AP exam.

#### What is the Median and Why do We Use it in AP Statistics?

Before we learn how to calculate the median, it helps to understand it in the context of related terms. The median is the second of the measures of central tendency (see our other blog post for a review of the mean), which means it is one way of showing us the central value within a set of numbers.

Outside of statistics, you've probably heard the word median used to refer to the strip of land that divides the two halves of the road, like in the picture below. Similarly, the mathematical median is the point that divides the two halves of a distribution.

In stats, we refer to a set of numbers as a distribution. A distribution, unlike a physical object, doesn't have just one center. Rather, there are different ways to think about what the central value in a distribution is, and, therefore, there are different ways to calculate it.

One way to think about the center of a distribution is to take the mathematical average, or mean. For example, when your teacher returns a test, your class will have a distribution of scores: some students may score high, the ones who didn't study will have low scores, and the rest will fall somewhere in the middle. If you found out that the mean score on a test was 60%, you might guess that it was a difficult test and most students did poorly (however, the brightest students probably still scored well above the mean!). If the mean score on the next test was 90%, you might guess that most students did very well (though the worst students probably still scored well below the mean). In either case, there is still a spread of scores from the lowest to the highest, but the center of that spread appears to be moving around from test to test.

However, there are a few times when the mean does not give us an accurate picture of the center of a distribution. That's because the mean is pulled towards extreme values. For example, if most people have a very low score on the test but a few smarty-pants get 100's, the mean score will be pulled up. As a result, the majority of the class will fall below the mean, while only a few people will be above the mean. If there are significantly more people on one side than the other, we clearly haven't divided our distribution very well!

Let's illustrate this with a simple example of 5 students:

| Student 1 | Student 2 | Student 3 | Student 4 | Student 5 |
|---|---|---|---|---|
| 45% | 39% | 35% | 41% | 100% |

To calculate the mean, we add up all the scores and divide by the number of students, and we discover that the mean is 52%. This calculation means that 4 out of 5 students (80%) fall below the mean, while only 1 student (20%) is above the mean. In this case, the mean isn't a very good way of measuring the center of the scores.

That's where the median comes in. The median divides the scores evenly in half. It's the middle value, such that 50% of the distribution falls below it and 50% above it.

The median is a useful descriptive statistic for cases in which the distribution is asymmetrical, or skewed, such as in the picture below. As you can see, in this image, the mean has been pulled slightly to the right of the median (obviously not as drastic a shift as in the previous example of 5 students, but pulled nonetheless). Conversely, when a distribution is symmetrical, the mean and median will be the same value! As a result, we can compare the mean and median to figure out the shape of a distribution. When the mean is above the median, the distribution is positively skewed, meaning the long tail is to the right. When the mean is below the median, the distribution is negatively skewed.
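The five-student example can be checked with a short sketch that computes the mean and then counts how many scores fall below it. The scores are the ones in the table; the variable names are illustrative.

```python
# Test scores (in percent) for Students 1 through 5.
scores = [45, 39, 35, 41, 100]

mean_score = sum(scores) / len(scores)

# Scores strictly below the mean, and the share of the class they represent.
below_mean = [s for s in scores if s < mean_score]
share_below = len(below_mean) / len(scores)

print(mean_score, below_mean, share_below)
```

Four of the five scores sit below the mean, which is the lopsided split described above.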

#### Keeping Your Terms Straight

One of the initial challenges you have to overcome before you can master calculating the median and using it on the test is simply making sure you don't mix it up with related concepts! The measures of central tendency (mean, median, and mode) are obviously similar, but they all serve different functions, so it's important to remember which term is which. You can accomplish this by creating memory associations or other mnemonic devices. Let's talk about some of these options.

Memory associations link the name of something to its meaning through ideas or images that are easy to remember. For example, to remember that mean is the average, we might come up with a phrase like "the average crocodile is very mean." For median, you might think of the image of the road median from the beginning of this post, and remember that the median divides evenly in half. It doesn't matter what image you choose. In fact, you may want to make up your own! The stranger and more personal it is, the easier it will be to remember (just make sure it connects back to the definition somehow!).

Other useful mnemonic devices are songs, poems, or rhymes. A teacher of mine taught us to remember median by adapting an old nursery rhyme: "Hey diddle diddle, the median's the middle." Again, you don't have to use one that already exists. If you spend a little bit of time now thinking of your own way to remember these terms, I guarantee you'll never mix them up again!

#### How to Calculate the Median for AP Statistics

Now that we know what it is, let's talk about how to calculate the median. The first step to calculating the median is to arrange our scores in numerical order. Let's try this with our example above of the five students. When we rearrange their scores from smallest to largest, they look like this:

| Student 3 | Student 2 | Student 4 | Student 1 | Student 5 |
|---|---|---|---|---|
| 35% | 39% | 41% | 45% | 100% |

From there, it's simple: we just look to see which value is right in the middle and divides the set in half. So the median in this example is 41%!

It's pretty obvious when you have an odd number of scores, but how do you determine the median when there is no clear middle? Let's try an example with an even number of scores.

Imagine there are 6 people, and we're looking at their heights in inches.

| Person 1 | Person 2 | Person 3 | Person 4 | Person 5 | Person 6 |
|---|---|---|---|---|---|
| 73 in. | 53 in. | 61 in. | 57 in. | 70 in. | 65 in. |

Again, let's arrange them in numerical order.

| Person 2 | Person 4 | Person 3 | Person 6 | Person 5 | Person 1 |
|---|---|---|---|---|---|
| 53 in. | 57 in. | 61 in. | 65 in. | 70 in. | 73 in. |

In this case, there is no middle score. However, another way to think about it is that there are two middle scores. So, we need to find the number that is directly between those two. To do that, we'll make use of our old friend the mean to find the value that falls right between Person 3 and Person 6.

First we add up the two middle scores:

61 + 65 = 126

Then, we divide by the number of scores we're looking at:

126 / 2 = 63

In this case, our median is 63 inches. It's important to note that the median doesn't have to be a number that is actually found among the scores in your distribution!

#### Mastering Median-Related Questions on the AP Stats Exam

Since the median, like the other measures of central tendency, is one of the most basic concepts in AP Stats, you're unlikely to encounter any questions on the exam that simply ask you to calculate a median. Rather, this calculation is much more likely to be combined with other concepts, such as the mean.

For example, you may be asked to determine whether the mean or the median is a better measure to use in a specific case. That's why it's very important to remember the differences between the two and when each is useful: the mean is good for symmetrical distributions with no outliers, while the median is good for asymmetrical distributions or ones with outliers.

You could also be asked to use the mean and median to determine the shape of the distribution, so make sure to review the properties of skewness mentioned previously.
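The sort-then-pick-the-middle procedure, including the even-count case, can be sketched as a small Python function. The `median` helper below is a hand-written illustration of the steps described in this chapter (the standard library's `statistics.median` does the same job); the data are the five test scores and six heights from the worked examples.

```python
def median(values):
    """Return the middle value of a list, averaging the two middle values if needed."""
    ordered = sorted(values)           # step 1: arrange in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]            # odd count: a single middle value
    # even count: take the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


test_scores = [45, 39, 35, 41, 100]    # five students, in percent
heights = [73, 53, 61, 57, 70, 65]     # six people, in inches

print(median(test_scores))
print(median(heights))
```

Note that the second result, 63, is not one of the six recorded heights.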

From this AP Statistics review, you should now know how to calculate the median, when to use it relative to other measures, and how to keep your terms straight! Have you come up with any fun ways to remember the difference between mean and median? Drop your tips in the comments below to help out your fellow AP Stats crammers!

### How to Calculate Ranges

Statistics has a habit of taking words that we know and love, and turning them into something else completely. Take the range, for example. While this term may make you think of the old tune "Home on the Range," the definition of range in math involves way fewer deer and antelope.

Luckily for you, this AP Statistics review is here to save you from confusion. Read on to learn what a range is and why it's important, how to calculate a range, and how to tackle range-related questions on the AP Stats exam. By the end of this article, you'll be at home on the range!

#### What is a Range?

Below is a picture of what you might imagine when you hear the word range: a wide, open stretch of land. In a way, a connection can be made to the meaning of range in math: it's the stretch of your data set, the distance from one side of your distribution to the other. It's a descriptive statistic, which is exactly what it sounds like: it helps to describe the shape of a distribution. In this case, the range tells us how wide or spread out a distribution is.

Just like they preferred it in the Old West, the range is a simple idea, and calculating the range is just as easy. Which brings us right along to the next point...

How to Calculate the Range

In its most basic form, the range is simply the numeric distance between the smallest and largest values in your distribution. At this point, calculating it is probably obvious: you just subtract the smallest number from the largest! Just for fun, let's try out an example. Say we have a class of 5 students, and their AP Stats test has just been returned. Their scores looked like this:

| Student 1 | Student 2 | Student 3 | Student 4 | Student 5 |
|---|---|---|---|---|
| 90% | 75% | 82% | 98% | 40% |

The first step is to locate our smallest and largest values. Easy peasy! Since that's so simple, we might as well arrange the data into numerical order, as that may help us later on if we want to calculate other things, like the median (see our post on the median for a review of that concept).

| Student 5 | Student 2 | Student 3 | Student 1 | Student 4 |
|---|---|---|---|---|
| 40% | 75% | 82% | 90% | 98% |

Then, we take our largest value and subtract our smallest value:

98 − 40 = 58

So our range is 58%. It's that simple! In fact, the range can also be expressed in another, even easier way: 40% to 98%. While the first method is more common, this second way of expressing the range is useful when the problem also requires you to know the exact values of the endpoints, rather than just the distance between them. Which version you use will depend on the problem.

Why Do We Calculate the Range?

As you can see, the range gives us an idea of how spread out our distribution is. The various measures of spread, including the range, are referred to in AP Stats as measures of variability. Variability refers to how much your scores vary: are they all pretty similar and packed together, or do they differ by a large amount? Let's look at some examples to better understand the concept.

First, imagine your class takes a test that is incredibly easy. Everyone in the class is likely to do very well. The scores might look like this:

| Student 1 | Student 2 | Student 3 | Student 4 | Student 5 |
|---|---|---|---|---|
| 98% | 100% | 99% | 100% | 97% |

The range, in this case, is 100 − 97 = 3.

Next, suppose the same students take a different test that is extremely difficult, and everyone does poorly. The scores might look like this:

| Student 1 | Student 2 | Student 3 | Student 4 | Student 5 |
|---|---|---|---|---|
| 65% | 67% | 66% | 67% | 64% |

When we calculate our range: 67 − 64 = 3.

The range is exactly the same in both cases, even though the scores are very different! That's because the shape of the distribution hasn't changed; it has simply been shifted over to the left. In other words, the range doesn't reflect the actual values of your scores, just the relative differences among them.

In both of these examples, the scores are packed in close together. Another way of saying this is that the variability is low. Conversely, in the first example above (the scores from 40% to 98%), we had some scores that were low and some that were high. In that case, the variability is much greater. The image below illustrates this: the skinny distribution has low variability (and therefore a small range), while the fat distribution has high variability (and a large range).

[Image: a narrow distribution and a wide distribution. Source: Wikimedia Commons]

Problems with Using the Standard Range

Every statistic has its pros and cons: each is useful in certain situations, but not in others. For example, in a previous post we discussed how the mean is pulled by the extremes, and so we may want to use the median in cases where the distribution is asymmetrical or has outliers. The standard range faces a similar issue. Consider the following:

Imagine the same situation as the last example, in which we have a very difficult test and 5 students all do pretty poorly on it. However, this time, suppose we add a sixth student who did extremely well (smells like a cheater to me). The data could look like this:

| Student 1 | Student 2 | Student 3 | Student 4 | Student 5 | Student 6 |
|---|---|---|---|---|---|
| 65% | 67% | 66% | 67% | 64% | 100% |

Then we calculate the range: 100 − 64 = 36.

With the addition of this single outlier, the range has jumped from 3% to 36%! This new number is a very poor indication of the variability in our scores. If we were to look at that number alone, we would expect the scores to be pretty spread out. However, 5 out of 6 scores are still packed in close together, and only one is far apart. This example illustrates how the range is influenced by outliers.
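To see the outlier effect in numbers, here is a small Python sketch of the range calculation using the difficult-test scores from this example. The helper name `value_range` is just an illustrative choice.

```python
def value_range(scores):
    """Range = largest value minus smallest value."""
    return max(scores) - min(scores)


hard_test = [65, 67, 66, 67, 64]       # the five students who did poorly
with_outlier = hard_test + [100]       # add the sixth student who scored 100%

print(value_range(hard_test))     # 3
print(value_range(with_outlier))  # 36
```

A single added score changes the result from 3 to 36, even though the other five scores have not moved at all.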

Enter the Interquartile Range (IQR)

That's where the interquartile range comes into play. The IQR is the range you get when you ignore the top 25% and bottom 25% of the data, so that any outliers are cut out. As a result, it describes the middle 50% of your data set and may provide a better description of the variability in your data.

[Image: boxplot showing the interquartile range. Source: Wikimedia Commons]

When you hear quartiles, think quarters, or dividing scores into four even chunks. You accomplish this using the same technique we used to find the median. Here are the steps for how to calculate the interquartile range:

1. Arrange your data in numerical order.
2. Find the median (which is also Q2). Your data is now split in half.
3. Find a median for each of the halves (Q1 and Q3).
4. The IQR is the range between Q1 and Q3: simply subtract the smaller from the larger, as we did before.

The dividing lines of your quartiles are called Q1, Q2, and Q3. It may seem strange at first that there are only 3 of these, but imagine taking a small rope and cutting it into 4 pieces. How many cuts would you have to make? Only 3, since the last cut will result in 2 pieces.

Let's try this with a concrete example. This time, we will use 11 students to make the math simpler. Below we have the test scores for 11 students. First, we rearrange the data to be in numerical order:

[Table of the 11 ordered test scores not preserved in this transcription.]

Then, we find the median as well as Q1 and Q3. As you can see, Q2 is the value directly in the middle, and Q1 and Q3 are the values in the middle of their respective halves. Now, the IQR is simply the range from Q1 to Q3:

89 − 79 = 10

In this example, most of the scores are between 79% and 89%, but a couple of students got scores much lower, and a couple got scores much higher. Those scores are outliers and would expand our range to be much bigger. As a result, the IQR provides a more accurate picture of our data in this case.

Now, the example above was designed to have easy-to-find quartiles, but what if our data don't divide easily? For example, imagine we have 5 data points again.

[Table of five ordered scores (Student 5, Student 2, Student 3, Student 1, Student 4) not preserved in this transcription.]

In this case, our median is simple to find, but our Q1 and Q3 will fall between two values.

To get around this, we use the same logic as when this happens with the median: you simply find the number that is directly between those values by taking the mean. Q1 is between the first two values, so it is the mean of those two. Q3 falls between the last two values, so it is the mean of those two. Then, our IQR is Q3 minus Q1.

Using Ranges on the AP Statistics Exam

The most common reason to calculate range on the AP Stats exam will be for a boxplot, also known as a box-and-whisker plot (see the image above for an example). Boxplots give a detailed description of your data because they include several different values: the standard range, the IQR, and the median. The box represents the IQR, the line in the center of the box represents the median, and the whiskers represent the full range. This representation allows the viewer to get an accurate picture of the variability as well as the outliers, and provides a measure of the central value.

From this AP Stats review, you now know what a range is, how to calculate ranges and IQRs, why they're important, and how to use them on the AP exam! At this point, you should have a handle on the main methods for describing distributions, including measures of central tendency and variability. With these foundational concepts nailed down, you've laid the groundwork for the rest of AP Statistics!
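The IQR steps from this chapter (sort, find the median, then find the median of each half) can be sketched in Python as follows. Treat this as one reasonable reading of the method rather than the only one: when the number of scores is odd, this sketch leaves the median itself out of both halves, and calculators or software that use a different quartile convention may report slightly different values. The function names are illustrative, and the data are the five-student and six-student score sets from the range examples earlier in the chapter.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)                  # step 1: numerical order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                    # scores below the median
    # for an odd count, skip the middle score so it is in neither half
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


five_scores = [90, 75, 82, 98, 40]
print(quartiles(five_scores))   # (57.5, 82, 94.0)
print(iqr(five_scores))         # 36.5

six_scores = [65, 67, 66, 67, 64, 100]   # includes the 100% outlier
print(iqr(six_scores))          # 2
```

With the outlier included, the full range of the six scores is 36, while the IQR is only 2, which is why the IQR can be the better description when one score sits far from the rest.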

Standard Deviation

Standard deviation is used to measure variability in statistics by calculating the average distance from the mean of all the values in a data set. Another way to think of it is to ask, "How much do the values in this data set deviate from the mean value?" The nuts and bolts of the equation are fairly simple; it just has a lot of different components to consider. This crash course will take you through how to calculate and interpret standard deviation. Then we'll look at an example from the AP Statistics test.

First Step: Calculating the Mean of a Data Set

In our example, the Smith family has five children. Let's say we want to find the standard deviation from the mean age of the siblings. Our data set is the five children's ages:

{3, 7, 8, 12, 16}

To find the standard deviation from the mean, we first need to know what our mean, or average, is. To calculate the mean, add each number in the data set:

3 + 7 + 8 + 12 + 16 = 46

Then divide your result by the number of values in your data set, or N. We have five values (for five Smith kids), so our N = 5:

46 / 5 = 9.2

9.2, then, is our mean, or µ. The formula for finding the mean of a data set can also be expressed as:

$$\mu = \frac{\sum x}{N} \qquad\text{or}\qquad \bar{x} = \frac{\sum x}{n}$$

where $\sum x$ is the sum of all the values in the data set, N is the number of values in the population, and n is the number of values in the data sample. Notice that the mean symbol changes when referring to a sample mean versus the population mean. If you are taking the mean of a sample rather than the whole data set, you will want to use the x-bar symbol ($\bar{x}$) for the mean rather than µ. Since we are calculating the mean age of all the Smith children, we will use µ.

You may also see the x in this equation written as $x_i$. The subscript i simply means individual, telling you to consider each individual x value. Meanwhile, the sigma symbol, $\sum$, means you have to take the sum of something. So the formula tells you that the mean is equal to the sum of all the x values divided by N. This may seem pretty basic, but understanding the code of this formula will make understanding the standard deviation formula that much easier.

Next: Calculating the Standard Deviation

The standard deviation is the average distance from the mean. Our µ = 9.2 for the values {3, 7, 8, 12, 16}. Therefore, we need to find the distance of each of those values from the mean, and then calculate the average distance.

The formula is one you'll want to learn by heart, even though it's included on the AP Stats formula sheet. Make sure you include this one on your AP Statistics study guide.

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

This formula represents the standard deviation from µ. The i subscript is something you may or may not see written out in this equation; it just depends on how clear the writer wants to be. The subscript simply means to take the sum over each individual point in the data set. Since we're working with the whole population of Smith children, this is the formula we'll cover first. Later, we'll cover the formula for a sample.

Let's set up the equation for our data set and go through it step by step:

$$\sigma = \sqrt{\frac{(3-9.2)^2 + (7-9.2)^2 + (8-9.2)^2 + (12-9.2)^2 + (16-9.2)^2}{5}} = \sqrt{\frac{98.8}{5}} = \sqrt{19.76} \approx 4.45$$

Our standard deviation is about 4.45. That means that each Smith child is, roughly speaking, an average distance of 4.45 years away from the mean age of all the Smith children.

That's the basic formula for standard deviation. If you need to find the standard deviation of a sample, refer to this formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

The main difference here, apart from the use of s for the sample standard deviation and the x-bar symbol for the sample mean, is the n − 1. A lowercase n refers to the size of the sample while a capital N refers to the size of the total population, and dividing by n − 1 adjusts for the difference between the sample and the whole.

Interpreting the Standard Deviation

A high standard deviation generally means that the data points are widely scattered from the average, while a low standard deviation means that the data points are closer to the mean. This allows you to compare results within a population group. It also allows you to compare the standard deviation of results between different population groups. This is particularly useful if you are attempting to reproduce your results in a scientific study.

Say, for instance, that you are testing response times of participants in a driving simulation. The control group is well rested, while the 3 experimental groups have had 6, 4, and 2 hours of sleep respectively. The standard deviation in their response times would give valuable insight into how erratic drivers become when sleep-deprived.
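To make the two standard deviation formulas concrete, here is a short Python sketch using the Smith children's ages {3, 7, 8, 12, 16}. It computes the population version (divide by N) step by step, then shows what the sample version (divide by n − 1) would give if the same five ages were treated as a sample, which is an assumption made purely for comparison. The cross-checks use `pstdev` and `stdev` from Python's standard `statistics` module.

```python
import math
from statistics import pstdev, stdev

ages = [3, 7, 8, 12, 16]

mu = sum(ages) / len(ages)                      # mean: 46 / 5 = 9.2
squared_devs = [(x - mu) ** 2 for x in ages]    # squared distance of each age from the mean

# Population standard deviation: divide the sum of squares by N
sigma = math.sqrt(sum(squared_devs) / len(ages))

# Sample standard deviation: divide by n - 1 instead
# (only appropriate if the ages were a sample from a larger group)
s = math.sqrt(sum(squared_devs) / (len(ages) - 1))

print(round(mu, 2), round(sigma, 2), round(s, 2))   # 9.2 4.45 4.97

# The standard library agrees with the hand calculation
assert math.isclose(sigma, pstdev(ages))
assert math.isclose(s, stdev(ages))
```

The sample version comes out a little larger (about 4.97 versus 4.45) because dividing by n − 1 instead of N makes the result bigger.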

[Image: normal distribution curve marked off in standard deviations from the mean. Source: Wikimedia Commons]

In this figure, the x-axis represents the distance from the mean in standard deviations, while the height of the curve shows how common values are at each point. On the x-axis, the 0 is the mean. The points to its left are -1, -2, and -3 standard deviations from the mean, and the points to its right are 1, 2, and 3 standard deviations from the mean. This graph tells us that 34.1% of this data set falls between the mean and 1 standard deviation on each side (so about 68.2% falls between -1 and 1), while a mere 0.1% falls beyond 3 standard deviations in each tail.

Standard Deviation on the AP Statistics Test

On the AP Statistics test, you will be given all the relevant standard deviation formulas on the AP Stats formula sheet. The questions on the test will ask you to demonstrate your knowledge of standard deviation and interpret it in the context of a practical problem. Often, this means using a given standard deviation to calculate another value in a different formula.

Take, for instance, this question from the FRQ portion of the 2009 AP Stats exam.

[Image: free-response question from the 2009 AP Statistics exam. Source: CollegeBoard]

This question asks a student to apply the concept of standard deviation in context to determine other information about the tire treads. The red circle marks the most important information you need for this problem. Once we use the standard deviation to find the 70th percentile, we can use that answer to solve parts b and c.

First, we need to get the z-score for 70 percent, another calculation which involves the standard deviation. The formula to find the z-score is:

$$z = \frac{x - \mu}{\sigma}$$

For our example, x is the unknown 70th-percentile value, so we look up the z-score that has 70 percent of the distribution below it.

So for 70 percent, z ≈ 0.52. We already have the standard deviation, so we can plug both values into this formula for calculating the percentile:

$$x = \mu + z\sigma$$

Let's see how the student in this example wrote that formula out with the values for this problem to complete it:

[Image: student response to the 2009 free-response question. Source: CollegeBoard]

This particular test-taker also underlined the same information that we circled in red above and wrote out the mean, standard deviation, p, and z-score needed to complete the problem. This test-taking strategy lets you organize your thoughts and mark relevant information in a question clearly. It also spells out your process for the examiners, who can follow along with your work.
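The percentile step (find the z-score for the 70th percentile, then convert it back to a raw value with x = μ + zσ) can be sketched in Python. The mean of 100 and standard deviation of 10 below are hypothetical placeholders, not the values from the exam question. `NormalDist` comes from Python's standard `statistics` module (Python 3.8 or later) and stands in for the z-table lookup.

```python
from statistics import NormalDist

# z-score with 70% of the standard normal distribution below it
z_70 = NormalDist().inv_cdf(0.70)
print(round(z_70, 2))   # 0.52

# Hypothetical mean and standard deviation (NOT the exam's values)
mu, sigma = 100.0, 10.0

# Convert the z-score back into a raw value: x = mu + z * sigma
x_70 = mu + z_70 * sigma
print(round(x_70, 1))   # 105.2
```

Swapping in the mean and standard deviation given in a problem is all that changes; the z-score for the 70th percentile stays the same.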

Wrapping Up Standard Deviation

Standard deviation is one of the most important and frequently used statistics we can find, whether used on its own to tell us something about a data set or as part of an equation to find a percentile or other information. As a rule of thumb, remember that a high standard deviation means lots of variation from the mean and may be caused by factors such as outliers or a more scattered data set, while a low standard deviation tends to mean less variation from the mean and a more homogeneous data set.

The German Tank Problem Explained

What is the German Tank Problem?

The German Tank Problem is a famous statistical problem that helped the Allied Forces during World War II, and can help you with your AP Statistics review. Statisticians use estimators when dealing with samples from a larger population. Often, it can be useful to know the size of the total population when you are working with a limited sample from a population of unknown size. Estimating the total population size, or population maximum, can let you draw more accurate conclusions about the sample and how it represents or fails to represent the entire population. The German Tank Problem will help you do just that.

In World War II, each manufactured German tank or piece of weaponry was printed with a serial number. Using serial numbers from damaged or captured German tanks, the Allies were able to calculate the total number of tanks and other machinery in the German arsenal. Allied mathematicians were only able to collect a limited sample of German tanks, but used that sample to estimate the population maximum of German tanks. They applied the same principles to estimating the number and importance of factories, as well as a number of other manufactured munitions.

Statistical analysis proved far more accurate than estimates based on conventional intelligence gathering, which tended to wildly overestimate the number of tanks produced each month. For example, traditional intelligence gathering put production of tanks at an absurdly high 1,400 tanks per month. When statisticians calculated the population maximum, their estimate was a mere 256 tanks per month. It turned out that the statisticians' information was more accurate than the spies': according to German records recovered after the war, they had produced 255 tanks for the month in question.

As you can probably imagine, this was tactically useful information. In fact, it was critical to the Allies' plan for D-Day. The German Tank Problem has remained one of the most famous examples of applied mathematics in the twentieth century. Today, the German Tank Problem is used by many AP Stats teachers to demonstrate how to estimate a population parameter.

The Problem with Samples and Estimators

Essentially, the German Tank Problem demonstrates how to estimate the size of an entire population given only a limited sample. There are several estimators we can use to guess at the size of the population when given only a sample. None of them yields very accurate results, as the following example will demonstrate.

Say you have a sample of five random serial numbers from a group with an unknown population maximum. We'll say those serial numbers are 3, 21, 30, 87, and 115. For the purpose of this example, we'll also say that the population maximum is 150 so that you can follow along more easily.

First, we can attempt to calculate the population maximum by doubling the maximum value of the sample. The maximum sample value, or highest value in the sample, is 115. Doubled, that gives us 230, more than one and a half times our real population maximum. That method is far too inaccurate and tends to overshoot the population maximum.

So is the method of using twice the mean value of the sample, but in this sample it has the opposite problem of underestimation. The mean of our example sample is 51.2. Doubled, that comes out to 102.4. It's closer to our target of 150, but still off by about 32%.

Lastly, we come to the method of doubling the sample median. The median of this sample is 30; doubled, that comes to 60, a long way off from 150.

Simply put, none of these methods provides the kind of accuracy you'd need if you had to plan a battle. They all rely too heavily on which numbers you draw for your sample. While some samples might give you estimates that get fairly close to the real population maximum, these methods are too hit-or-miss for high-stakes applied math.

[Image source: Wikimedia Commons]

How to Use the Minimum-Variance Unbiased Estimator (MVUE)

Fortunately, there's a more accurate and relatively easy way to find an unbiased estimator. All you need to know is your sample size, the sample maximum (largest value in your sample), and this equation:

Population Maximum = Sample Maximum + (Sample Maximum / Sample Size) − 1

The MVUE equation solves the German Tank Problem by operating on the assumption that the population maximum is likely to be just a little higher than the sample maximum. That difference between sample maximum and population maximum is approximately equal to the mean gap between each number in the sample.

In our example, our sample size is 5 and our sample maximum is 115. Our equation would look like this:

Population Maximum = 115 + (115 / 5) − 1

That gives us an estimate of 137. In this particular sample, which I took by using a random number generator, we're still off by about 9%. But compared to the methods demonstrated earlier, the Minimum-Variance Unbiased Estimator estimated the population maximum within a margin of error of 10%. For an estimate based on only 5 numbers out of 150, that's impressively accurate.
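The estimators discussed in this chapter are easy to compare side by side in Python. The sketch below uses the sample serial numbers 3, 21, 30, 87, and 115 and the stated true maximum of 150; the function name `mvue_max` and the dictionary layout are illustrative choices only.

```python
from statistics import mean, median


def mvue_max(sample):
    """Estimate the population maximum: m + m/k - 1,
    where m is the sample maximum and k is the sample size."""
    m, k = max(sample), len(sample)
    return m + m / k - 1


serials = [3, 21, 30, 87, 115]
true_max = 150   # known here only because the example says so

estimates = {
    "2 x max": 2 * max(serials),
    "2 x mean": 2 * mean(serials),
    "2 x median": 2 * median(serials),
    "MVUE": mvue_max(serials),
}

for name, value in estimates.items():
    error_pct = abs(value - true_max) / true_max * 100
    print(f"{name}: estimate {value:.1f}, off by {error_pct:.0f}%")
```

For this sample the MVUE gives 137, about 9% below the true maximum, while the other three estimates (230, 102.4, and 60) miss by roughly 53%, 32%, and 60%. A different random sample would give different numbers, so this is one illustration rather than a proof.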

Wrapping Up the German Tank Question

For your AP Stats test, you may have to solve problems similar to the German Tank Problem using the Minimum-Variance Unbiased Estimator. As World War II demonstrated, there are many real-world applications for estimating the population maximum from your sample. You should be familiar with different estimators and be able to understand, and possibly explain for a free-response question, how and why the MVUE equation works.

Sampling distributions and parameters are groundwork concepts for introductory statistics, and an AP-level student should have a firm grasp of their relevance. For a teacher's perspective on the application of sampling distributions, including their relation to the German Tank Problem and various other creative examples of their relevance, see this Special Focus published by the College Board.

Z-score Calculations & Percentiles in a Normal Distribution

One of the challenges in preparing for the AP Statistics exam is that the concepts build upon one another. Some statistical tests involve several steps, combining earlier and simpler concepts into more complex ones. As a result, failing to understand any one of the earlier ideas in the course can mean big trouble when it comes time for the exam. Z-score calculations are a perfect example of this. As one of the core skills in AP Statistics, z-score calculations require you to combine much of what is covered in the first half of the course.

Use this AP Statistics review to be sure you understand everything you need to beat z-score questions on the exam. We will walk you through all of the concepts related to z-scores, show you how to perform z-score calculations using sample questions, and explain percentiles in a normal distribution.

What is a Z-score?

A z-score shows you the distance between an observed score and the mean in units of standard deviations. These terms may sound a bit complicated right now if they are new to you. However, z-score calculations are very simple to perform once you understand all of the concepts that lead up to them.

Ingredients for Z-score Calculations

Performing statistical tests, like z-score calculations, is a bit like cooking. All you need to do is follow the recipe, but first, you need to have all the ingredients! Below are all the concepts you should understand in order to fully grasp z-score calculations.

We'll review these ingredients first to find out where your gaps in knowledge are. Then, we'll go over the major concepts one at a time.

- Frequency Distributions
- Density Curves & Probability
- Normal Distribution
- Mean and Standard Deviation
- P-values

Frequency Distributions

A frequency distribution is a table showing the number of observations of each outcome along a given dimension. It can be represented graphically in several ways, including histograms and line charts. For example, we may measure the number of students in our class of 50 students who earned each possible letter grade, which could look like this:

[Table: letter grade (A, B, C, D, F), number grade, and number of students; values not preserved in this transcription.]

Plotted as a histogram, it would look like this:

[Image: histogram of the number of students earning each letter grade.]

Density Curves & Probability

A density curve looks similar to a frequency distribution, but it represents the probability of observing each of the outcomes. Probabilities are fractions of 1. To understand this, consider the probability of observing the grade A in the example above. There are 50 students, and 5 of them earned A's, so the probability of observing an A in this class is 5/50 = 0.10. Below, we've plotted the probability of each outcome to illustrate that the total probability adds up to 1.

[Table: letter grade (A, B, C, D, F, Total), number grade, and probability; values not preserved in this transcription.]

Since the total probability of all possible outcomes is 1, the area under the curve is also 1. With z-scores, we will use the concept of area under the curve to determine the probability of various outcomes.

Standard Normal Distribution

A normal distribution is one that is symmetrical and bell-shaped, like the examples we've seen here. The standard normal distribution is a special type, having a mean of 0 and a standard deviation of 1, like the one below. In calculating z-scores, we convert a normal distribution into the standard normal distribution. This process is called standardizing. Since distributions come in various units of measurement, we need a common unit in order to compare them. The standard unit used to compare different distributions is the standard deviation.

[Image: the standard normal distribution. Source: Wikimedia Commons]

Mean and Standard Deviation

The mean and the standard deviation are the two main ingredients that go into calculating the z-score. The mean is a measure of the center of a distribution (see our other blog post for a review of means). The standard deviation is a measure of the spread of a distribution: it shows the average distance of each observation from the mean. In statistics, we represent the mean and standard deviation using letters from the Greek alphabet.

- The symbol for mean is μ
- The symbol for standard deviation is σ

The standard deviation is important for z-scores because it tells us whether a score is close to or far away from the mean. Imagine a class takes a test, and the mean score is 50%, but student S scored 75%. Is this a good or a bad score? Well, if every other student scored between 45% and 55%, the distribution has a small standard deviation, and suddenly S's score seems a lot more impressive! On the other hand, if the distribution of scores is more spread out (large standard deviation) and falls between 0% and 100%, S is no longer as happy about his score.

As we discussed above, the standard normal distribution has a mean of 0 and a standard deviation of 1. Z-scores are represented in units of standard deviations. A z-score of 1 means that an observation is 1 standard deviation away from the mean. So, in the example above, if the standard deviation is 25, S's score of 75 is 1 standard deviation away from the mean of 50: he has a z-score of 1. If the standard deviation is 5, S's score is now 5 standard deviations away from the mean: he would have a z-score of 5.
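To see how the standard deviation changes the meaning of the same raw score, here is a tiny Python sketch for a score of 75 on a test with a mean of 50. The three standard deviations tried below (25, 10, and 5) are hypothetical values picked for illustration.

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma


score, mean_score = 75, 50

for sigma in (25, 10, 5):   # hypothetical standard deviations
    print(sigma, z_score(score, mean_score, sigma))
# 25 -> 1.0, 10 -> 2.5, 5 -> 5.0
```

The raw score never changes, but the smaller the standard deviation, the further the score sits from the mean in standard-deviation units.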

P-values

The p stands for probability. In this review, p-values represent the probability of observing a z-score at least as extreme as a specific one. Just as the probability of observing an A was lower than the probability of observing a C in our original example, the probability of observing a z-score of 3 is lower than the probability of observing a z-score of 1. The larger the z-score, the smaller the probability! This stems from the fact that the further away you get from the mean, the more unlikely the scores become. Z-scores can be converted into p-values (and vice versa) by using a simple table that is found in the back of any statistics textbook. If you're not familiar with using a z-table, see this short video for a review.

Percentiles in a Normal Distribution: The 68-95-99.7 Rule

Instead of always using a z-table, there is also a convenient rule for estimating the probability of a given outcome. It is called the 68-95-99.7 Rule. This rule means that 68% of the observations fall within 1 standard deviation of the mean, 95% fall within 2 standard deviations, and 99.7% fall within 3 standard deviations. That means the probability of observing an outcome more than 3 standard deviations from the mean is very low: 0.3%.

[Image: normal curve illustrating the 68-95-99.7 Rule. Source: Wikimedia Commons]
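If you want to check the rule yourself, the percentages can be recomputed from the standard normal distribution. This Python sketch uses `NormalDist` from the standard `statistics` module (Python 3.8 or later) in place of a z-table; the helper name `within_k` is just an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1


def within_k(k):
    """Probability of landing within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, f"{within_k(k):.4f}")

# Probability of being more than 3 standard deviations from the mean
print(f"{1 - within_k(3):.4f}")
```

The results come out to roughly 0.68, 0.95, and 0.997, with about 0.003 left over beyond 3 standard deviations, which is where the rounded figures in the rule come from.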

Performing Z-score Calculations

Now that you have all the ingredients, you're ready for the recipe! The formula for a z-score looks like this:

$$z = \frac{x - \mu}{\sigma}$$

x represents an observed score, also known as a raw score. As previously mentioned, μ represents the mean and σ represents the standard deviation. To calculate a z-score, we simply subtract the mean from a raw score and then divide by the standard deviation. (On exam questions, the mean and standard deviation may be provided, or you may need to calculate them, so make sure you know how to do that!) Then, we take our z-score and check the z-table to find the p-value of that score.

There are several ways you may be asked to use z-scores on the AP Statistics exam. You may have to compare scores in two distributions, find the probability of a certain observation, or find the probability of an interval between two observations. You can also go in reverse, using p-values to find z-scores and then raw scores. Let's try some examples!

Example 1

Tom is a sprinter, and Alex is a long-jumper. They both compete at the track meet this weekend, along with 4 other athletes in each of their respective events. Tom thinks he is a better athlete than Alex is. Is there evidence for his claim? (Note: we are assuming that sprint times and long-jump distances are normally distributed. Otherwise, we can't use z-score calculations!)

The sprint times in seconds are as follows:

[Table: sprint times in seconds for Tom and Athletes 2 through 5; values not preserved in this transcription.]

Calculating the mean and standard deviation of these times gives us what we need for Tom's z-score.

The long-jump distances in feet are as follows:

[Table: long-jump distances in feet for Alex and Athletes 2 through 5; values not preserved in this transcription.]

Calculating the mean and standard deviation of these distances gives us what we need for Alex's z-score.

Keep in mind that Tom is racing, so he wants to have a smaller score than his competitors, whereas Alex is going for greater distance. Tom's time is below the mean, and Alex's distance is 0.77 standard deviations above the mean. Alex's result is further from the mean in the direction he wants, which means that Alex is actually the better athlete relative to his competition!

We can also use our z-table to find the probability of earning each score. Based on the z-scores we calculated above, the p-value of an athlete running as fast as or faster than Tom did is .26. Similarly, the p-value of an athlete jumping as far as or farther than Alex did is .22.

Example 2

What if we wish to find the probability of scoring within a certain range? For example, what is the probability of a student scoring between 85% and 90% on a test if the mean is 80% and the standard deviation is 5%? First, we find the z-scores for both sides of our range:

$$z = \frac{85 - 80}{5} = 1 \qquad z = \frac{90 - 80}{5} = 2$$

We are looking for the probability of the shaded area under the curve, pictured below.

[Image: normal curve with the area between the two scores shaded.]

Z-tables can vary in the information they display, but they generally show the area above a given score, the area below a given score, and sometimes the area between the mean and the score. This information gives us several methods of solving the problem, but all of them involve simple subtraction. For example, we can take the area below our higher score, which the z-table tells us is .9772, and subtract the area below our lower score, which the z-table tells us is .8413. That leaves us with just the area between the two values we're interested in:

.9772 − .8413 = .1359

That means there is a 13.59% probability of a student scoring between 85% and 90% on this exam.

Congratulations! You now have a handle on every concept you need to know when it comes to z-score calculations on the AP Statistics exam!
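Example 2 can be double-checked in Python without a z-table. The sketch below uses the mean of 80 and standard deviation of 5 from the example, and `NormalDist` from the standard `statistics` module (Python 3.8 or later) to supply the areas below each score. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 80, 5
low, high = 85, 90

# z-scores for both ends of the interval
z_low = (low - mu) / sigma     # 1.0
z_high = (high - mu) / sigma   # 2.0

scores = NormalDist(mu=mu, sigma=sigma)

area_below_high = scores.cdf(high)   # about 0.9772
area_below_low = scores.cdf(low)     # about 0.8413

p_between = area_below_high - area_below_low
print(z_low, z_high, round(p_between, 4))   # 1.0 2.0 0.1359
```

The subtraction is the same one done with the z-table values: the area below the higher score minus the area below the lower score.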

Describing Distributions in AP Statistics

Image Source: Wikimedia Commons

Describing distributions is one of the key skills you'll need to earn a high score on the AP Statistics exam. If you need proof of this, just flip through some past exam questions, which can be found at the College Board website. You'll notice that the first free response question is almost always a question requiring you to look at a graph and describe it.

At this point, you may be thinking, "How hard could it be to just describe something? I describe things all the time!" Unfortunately, describing distributions for AP Statistics is not like describing a movie you watched last weekend. There is definitely a right and a wrong way to do it, and the test-makers at College Board expect you to go through specific steps and use specific language.

Use this quick AP Stats review to learn everything you need about describing distributions. We'll review all of the relevant concepts, view some examples, and finish up with some practice questions. You'll be exam-ready in no time!

Distributions: A Review

Before learning how to describe distributions, it's obviously important to understand what they are. A distribution is the set of numbers observed from some measure that is taken. For example, the histogram below represents the distribution of observed heights of black cherry trees. Scores between feet are the most common, while higher and lower scores are less common. The most commonly observed heights were between feet, of which the researcher found 10 cases.

Image Source: Wikimedia Commons

4 Key Concepts: A Preview

When describing distributions on the AP Statistics exam, there are 4 key concepts that you need to touch on every time: center, shape, spread, and outliers. Below is a preview of the main elements you will use to describe each of these concepts. In the following sections, we'll explain each of these terms one by one.

1. Center
   a. Mean
   b. Median
   c. Mode
2. Shape
   a. Symmetrical vs. Skewed
   b. Unimodal vs. Bimodal
3. Spread
   a. Range
   b. IQR
4. Outliers
   a. Are there any?

To ensure that you remember each of these 4 concepts, it is very helpful to come up with a mnemonic device, such as an acronym or a sentence. For example, you could rearrange the letters into SOCS, and remember to think, "When describing a distribution, ask about its socks." Or, you could come up with a short sentence like "Cats Sometimes Sleep Outside." If you make your own, it will be even easier to remember; the more unique and wacky, the better.

1. Center

The first concept you should understand when describing distributions is the center, which is captured by the measures of central tendency: mean, median, and mode. There are multiple measures because there are different ways to think about what the center of a distribution is. Each measure has pros and cons and will be useful in different situations. As a result, you need to provide all three measures to give a full description.

Mean

Image Source: Wikimedia Commons

The mean is the arithmetic average of all of the scores in your distribution. To calculate it, you simply add up all of the scores, and then divide by the total number of scores. The mean is important for many other statistical calculations you will need in AP Stats. However, the mean is also skewed by outliers. In other words, it is pulled towards the extremes.
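To see how strongly a single extreme score can pull the mean, here is a small sketch. The scores are invented purely for illustration and do not come from any exam question.

```python
def mean(scores):
    """Arithmetic average: add up all the scores, then divide by how many there are."""
    return sum(scores) / len(scores)

typical_scores = [70, 75, 80, 85, 90]   # hypothetical data
with_outlier = typical_scores + [200]   # same data plus one extreme score

print(mean(typical_scores))  # mean of the original five scores
print(mean(with_outlier))    # mean after adding the outlier
```

One extra score moves the mean from 80 to 100, even though five of the six scores are 90 or below.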

Median

The median is the exact middle score in your distribution. To find the median, you must arrange all of the scores in numerical order. Then, you find the score that falls directly in the center, splitting the distribution evenly in half. If the number of scores in your distribution is even, there will be two scores in the center. In this case, you take the mean of the two middle numbers, and the result will be your median.

Mode

The mode is the easiest measure of central tendency to find. It is simply the most common score in your distribution, or the number that appears most often. Sometimes you may have a tie between two or more scores that all appear the same number of times in your distribution. In cases like this, you have more than one mode, and that is perfectly fine. However, there can only be one mean and one median per distribution.

2. Shape

The next thing to consider about a distribution is its shape. At the most basic level, distributions can be described as either symmetrical or skewed. You will see that there are also relationships between the shape of a distribution and the positions of each measure of central tendency.

Symmetrical Distributions

Symmetrical distributions are ones where the right and left halves are perfect mirrors of each other. Symmetrical distributions that are bell-shaped are also known as normal distributions.

An example of a normal distribution is pictured below.

Image Source: Wikimedia Commons

Distributions may also have a single peak or more than one peak. We call distributions with a single peak unimodal. "Modal" comes from the word "mode," which makes sense when you consider that the peak of a distribution is also the score that appears most frequently. Distributions with two equal peaks are bimodal, since two scores appear more frequently than the others but are equally frequent to each other.

Below is an example of a bimodal distribution.

Image Source: Wikimedia Commons

There are also cases in which a distribution appears to have two peaks, but one peak is larger than the other, such as the one below. For purposes of the AP Statistics exam, these can be described as bimodal, though strictly speaking they are unimodal since there is only one most frequent score.

Image Source: Wikimedia Commons

In normal distributions, the mean, median, and mode will all fall in the same location. If the distribution is symmetrical but has more than one peak, the mean and median will be the same as each other, but the mode will be different, and there will be more than one.

Skewed Distributions

We call distributions that are not symmetrical skewed. They may be skewed either to the right or to the left. Right skew is also termed positive skew, since the x-axis becomes more positive as it moves to the right. As you might expect, left skew is termed negative skew.

The location of the tail (the longer end of the distribution) determines the direction of the skew. If the tail is to the right, the distribution is right skewed, and vice versa. You can remember this by imagining taking a normal distribution, pinching one end of it, and stretching it out in that direction. The direction in which you stretch the distribution is the direction of the skew.

Image Source: Wikimedia Commons

When a distribution is skewed, the mean will be pulled towards the tail. The halfway point of the distribution (the median) will also fall off the peak in the direction of the tail, but not as far as the mean. The mode will remain at the peak. As a result, in a right skewed distribution the mode < median < mean, while in a left skewed distribution, the mean < median < mode.

Understanding this idea can allow you to determine the shape of a distribution simply by knowing the measures of central tendency.

A comparison of mean, median, and mode in a right-skewed distribution. Image Source: Wikimedia Commons
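The ordering of mode, median, and mean in a right-skewed distribution can be illustrated with a tiny dataset. The scores below are made up for this sketch (most values are small, with a tail of a few large ones), and the ordering is a rule of thumb that holds for typical skewed data rather than a guarantee for every dataset.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed scores: a cluster of small values and a long right tail.
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

score_mode = mode(scores)      # most frequent score, sits at the peak
score_median = median(scores)  # halfway point of the ordered scores
score_mean = mean(scores)      # pulled furthest toward the tail

print(score_mode, score_median, score_mean)
print(score_mode < score_median < score_mean)
```

Reversing the pattern (a few very small values and a cluster of large ones) would give the left-skewed ordering instead.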

3. Spread

The main measure of spread that you should know for describing distributions on the AP Statistics exam is the range. The range is simply the distance from the lowest score in your distribution to the highest score. To calculate the range, you just subtract the lower number from the higher one.

You can also utilize the interquartile range (IQR), which is a bit more complicated (for a review, see our other post on ranges). The IQR is the range of the middle 50% of the data. The IQR is useful for situations in which you have outliers.

Image Source: Wikimedia Commons
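Here is a minimal sketch of both measures of spread. It assumes one common classroom convention for quartiles (Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the middle score when the count is odd); other conventions give slightly different quartiles. The data and the helper name `quartiles` are hypothetical.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the data."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + (n % 2):]  # skip the middle score when n is odd
    return median(lower_half), median(upper_half)

scores = [2, 4, 5, 7, 8, 9, 11, 12, 40]   # hypothetical data with one high outlier

data_range = max(scores) - min(scores)    # highest score minus lowest score
q1, q3 = quartiles(scores)
iqr = q3 - q1                             # range of the middle 50% of the data

print(data_range, q1, q3, iqr)
```

The single score of 40 stretches the range to 38, while the IQR stays at 7, which is why the IQR is the more useful measure when outliers are present.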

4. Outliers

Outliers are scores that fall far outside of the main part of your distribution, either much higher or much lower. Outliers appear to be disconnected from the pack, meaning there are no scores observed between the outlier and the rest of the distribution.

Below is an example of a distribution with one lower outlier. Notice on the right side, the distribution dips and rises again. However, this observation is not technically an outlier, since it is not disconnected from the rest of the distribution.

When describing distributions on the AP Statistics exam, you simply need to indicate whether or not there are outliers, so this section of the question should be easy!

Practice Questions

Now that you know all of the concepts you need to describe a distribution on the AP Statistics exam, let's try a couple of practice problems!

Practice AP Statistics Free Response Question 1

Now we just go through each of our 4 points!

Center: The median salaries for both corporations are approximately equal. (The mean and mode are not shown in boxplots, so we can't touch on those here.)

Shape: The salary distribution of corporation A appears skewed slightly to the left, while corporation B is approximately symmetrical.

Spread: The range and interquartile range for corporation A are larger than those of corporation B.

Outliers: Corporation A has two outliers, while corporation B has none.

Practice AP Statistics Free Response Question 2

Center: The mode is the easiest measure to find since it is simply the most frequent score, which in this case is dollars. You could also write all of the dollar amounts in a table to find the median and calculate the mean, but that would take more time and is unnecessary here. You know by the skew that the median is slightly higher than the mode, and the mean will be the highest of the three.

Shape: This distribution is unimodal and positively skewed.

Spread: We can't find the exact range in this case since the graph shows us intervals of tip amounts rather than the exact numbers. However, the highest possible amount would be 22.5 dollars, and the lowest possible amount would be 0 dollars, making the greatest possible range 22.5 dollars. Using similar logic, we know that the smallest possible range is 17.5 dollars.

Outliers: This distribution has one outlier in the dollars range.

Now that you've had a bit of practice, you should feel very comfortable using the 4 key concepts of center, shape, spread, and outliers. You're ready to take on any question about describing distributions on the AP Statistics exam!
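The "similar logic" behind the two range bounds in the tip question can be sketched as follows. The histogram itself is not reproduced here, so the bin edges are an assumption inferred from the numbers in the answer: the lowest occupied interval is taken to run from 0 to 2.5 dollars and the highest from 20 to 22.5 dollars. The function name `range_bounds` is hypothetical.

```python
def range_bounds(lowest_bin, highest_bin):
    """Smallest and greatest possible range when only histogram intervals are known."""
    low_start, low_end = lowest_bin
    high_start, high_end = highest_bin
    greatest = high_end - low_start   # lowest score at the very bottom, highest at the very top
    smallest = high_start - low_end   # lowest score at the top of its bin, highest at the bottom of its bin
    return smallest, greatest

# Assumed bin edges, inferred from the 0, 17.5, and 22.5 dollar figures in the answer.
smallest_range, greatest_range = range_bounds((0.0, 2.5), (20.0, 22.5))

print(smallest_range, greatest_range)
```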

3 Ways to Approach AP Statistics Free Response Questions

How to Approach AP Statistics Free Response Questions

At first glance, the free response questions (FRQs) may seem like the most intimidating part of the AP Statistics exam. Essay questions on a math test? The FRQ section consists of five short answer questions and one investigative task. This portion takes ninety minutes and is half your exam score. Here are some tips for studying beforehand and for how to approach the test itself.

Before the Test: Read Past AP Statistics Free Response Questions

Studying the exams of the past will prepare you for the kinds of questions the test may have. The College Board website features archives of free-response questions and sample answers dating back to This free resource exists to help students like you prepare for the exam. If you have a study group or a study hall period, try solving some of the past FRQs as a team. Some of you may have stronger writing skills while others have stronger math skills, so take the opportunity to learn from each other.

Structuring Your Written Responses

The primary goal of writing about math is to cover all the relevant information clearly and succinctly, with adept analysis that shows you understand how the math applies to a real-life problem. Keep your responses brief and avoid filler content.

Let's break down the structure of this sample response to part 2(e) of the AP Stats investigative task question:

1. State your answer.
2. Explain it in a few sentences.
3. Restate and justify your answer.

Image Source: College Board

In the first sentence, the student stated their answer and gave a brief reason. The second sentence went on to explain why they chose that answer. In the final sentence, the student restated their answer using their analysis to justify their conclusion. Notice the transition and information words: "because," "tend to be," and "Therefore." Words that transition to or set up information are the key to a neat, concise response.

How to Study for AP Statistics

Anyone who has taken an AP Exam knows that it's a big undertaking. Not only did the class require an entire school year (sometimes two!) of classwork (probably much harder stuff than what you get in non-AP classes), but now your teachers or your parents want you to take an external exam? (Sometimes in addition to the end-of-the-year assessment for the class itself!) And as if all of this weren't enough, the AP Exam you are thinking about taking is the AP Statistics Exam.

So what is the AP Statistics Exam all about? Here at Albert.io, we've got everything you need to feel completely confident. In this review, you'll read about what the AP Statistics Exam is like, what to include on your AP Statistics study guide, the best AP Statistics study plan, and all the AP Statistics tips you need to ace the exam.

What is the AP Statistics Exam Like?

The AP Statistics Exam is a three-hour, paper-and-pencil exam that consists of two sections. Students are allowed and encouraged to use a graphing calculator during the test. The first section is a 90-minute multiple-choice section, consisting of 40 questions, and the second section is a 90-minute free-response section, consisting of five questions and one investigative task. The two sections are weighted equally and standardized to give exam-takers a score from 1 to 5, with college credit usually offered for scores of 3, 4, or 5. For students who take the exam during the standard time (generally the second week of May), scores become available during the first or second week of July.

So now that you have a general idea of what the AP Statistics Exam is like, let's take a closer look at each of the topics we just mentioned. First, we'll start with a comprehensive overview of what to include on your AP Statistics study guide. Then, we'll discuss the best AP Statistics study plan. Finally, we'll talk AP Statistics tips and provide more study resources.

What Should You Include in Your AP Statistics Study Guide?

There are four major themes in AP Statistics courses that form part of the assessment on the AP Statistics Exam. These are:

Exploring Data (20-30% of exam): describing patterns and outliers in datasets
Sampling & Experimentation (10-15% of exam): planning and conducting statistical analyses
Anticipating Patterns (20-30% of exam): using probability and simulation to explore random events
Statistical Inference (30-40% of exam): hypothesis testing and estimating population parameters

Each of these themes should be fully covered in your AP Statistics study guide. Now let's take a look at each of these themes in depth, and define what exactly the AP Statistics Exam will cover.

Exploring Data

Exploring Data is not just about finding patterns in the data. It's also about describing when data points don't fit the patterns. Students must be able to report important characteristics of datasets, like the shape, location, and variability of a given distribution of data, but they also must be able to report outliers in the data and to describe how outliers change the characteristics of distributions.

Specifically, topics to put on your AP Statistics study guide for Exploring Data include: dot plots, stem plots, histograms, cumulative frequency plots, median, mean, interquartile range, standard deviation, z-scores, boxplots, scatterplots, correlation, least squares regression, transformations (logarithmic and power), frequency tables, bar charts, marginal frequency, residual plots, and data comparison.

You can find the entire list of what content is included in the Exploring Data theme in the College Board AP Statistics course description (pg. 11).

Let's take a look at a sample question for the Exploring Data theme so that you know what type of content to include on your AP Statistics study guide.

In the scatterplot of y versus x shown above, you'll find the least squares regression line superimposed on the plot. Which of the following points has the largest residual?

(a) A
(b) B
(c) C
(d) D
(e) E

In this sample question (adapted from College Board), determining the correct answer hinges on remembering your statistics vocabulary. A residual is the difference between the observed value of a data point and the value that was predicted by the mathematical model. So in this case, that means the vertical distance between the regression line (which predicts where each data point would be if it exactly fit the model) and the actual points. We see that point A lies the greatest distance above or below the regression line, so the correct answer is (a).

Now let's continue to the second major theme to include in your AP Statistics study guide.
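The definition of a residual can be made concrete with a short sketch. The scatterplot from the sample question is not reproduced here, so the five labeled points below are invented for illustration; they were chosen so that point A ends up furthest from the fitted line. The helper `least_squares_line` is our own name and uses the standard slope and intercept formulas.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled points (x, y); not the points from the original question.
points = {"A": (3.0, 9.0), "B": (1.0, 2.0), "C": (2.0, 4.0), "D": (4.0, 8.0), "E": (5.0, 10.0)}

xs = [x for x, _ in points.values()]
ys = [y for _, y in points.values()]
slope, intercept = least_squares_line(xs, ys)

# Residual = observed y minus the y predicted by the line.
residuals = {label: y - (intercept + slope * x) for label, (x, y) in points.items()}
largest = max(residuals, key=lambda label: abs(residuals[label]))

print(residuals)
print(largest)
```

Note that the comparison uses the absolute value of each residual, since a point can be far above or far below the line.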

Sampling & Experimentation

It's easy for students who have just learned a laundry list of statistical procedures in their AP Statistics class to take a "plug and chug" approach to running statistical tests. This approach simply means plugging the data points into your graphing calculator, and then running each of your favorite statistical tests until you get a data output that looks familiar.

However, while this approach is certainly useful if you have completely run out of ideas, it's important to note that the Sampling & Experimentation portion of the AP Statistics Exam does not simply assess your ability to run statistical tests and correctly report the results. It's equally important that you demonstrate to the AP scorers (especially on the free-response section) that you are collecting data according to a carefully thought out, well-developed plan.

To be more specific, topics to add to your AP Statistics study guide that you'll need to know for Sampling & Experimentation questions include: census, surveys, experiments, observational studies, populations, samples, sampling methods, random selection, stratified random sampling, random assignment, confounding variables, matched pairs design, simple random sampling, cluster sampling, replication, placebo effect, generalizability, bias, control groups, and blinding.

Each person in a simple random sample of 2,000 received a survey, and 317 people returned their survey. How could nonresponse cause the results of the survey to be biased?

(a) Those who did not respond reduced the sample size, and small samples have more bias than large samples.
(b) Those who did not respond caused a violation of the assumption of independence.
(c) Those who did not respond were indistinguishable from those who did not receive the survey.
(d) Those who did not respond represent a stratum, changing the simple random sample into a stratified random sample.
(e) Those who did respond may differ in some important way from those who did not respond.

In this sample question (adapted from College Board), arriving at the correct answer hinges on your general understanding of random sampling. This question requires you to use logic to reason the problem out rather than remembering some factoid. If the goal is to create a random sample, then you know that what's important is not the number of individuals in the sample, so option (a) is out. A person not responding would not influence whether another person responds, so option (b) is out. Finally, the key to differentiating between the last three options is understanding that in a random sample, the goal is for the sample to represent all of the different groups present in the population. Option (e) expresses the greatest threat to this idea of having a truly random sample.

As we saw in this example, finding the correct answer required almost no detailed knowledge of what a simple random sample is. Much more important was a general understanding of statistical concepts. Keep this in mind as you create your AP Statistics study guide! Now let's move on to the third major theme, Anticipating Patterns.

In this sample question (adapted from College Board), you can find the correct answer by doing some simple arithmetic. The central probability idea at play in this problem is what happens when you roll a fair die 100 times. The Law of Large Numbers tells us that over 100 rolls, each of the six outcomes will occur about equally often. So about 83.33% of the rolls will be a number between 1 and 5 (earning 3 points), and the remaining 16.67% of rolls will be a 6 (earning 20 points). Multiplying 83.33 by 3 and adding the result to 16.67 x 20 gives us a total of about 583 points. Therefore, the answer is (c).

Let's continue now with the last theme you should have included on your AP Statistics study guide.

Statistical Inference

Statistical Inference is truly the meat of the AP Statistics Exam. With 30-40% of the questions on the exam designed to test this theme, Statistical Inference is the most important thing for you to understand on the AP Statistics Exam. The skills you need for this theme include being able to select a statistical test to run in a given situation, running that test, and, most importantly, being able to describe your conclusions using appropriate statistical language. In other words, this is where you showcase your understanding of statistical confidence (read: confidence intervals, confidence intervals, confidence intervals).

Other topics fall under this theme as well. You should make sure to include these in your AP Statistics study guide: confidence intervals, margins of error, unbiasedness, variability, significance tests, null hypothesis, alternative hypothesis, p-values, one- and two-sided tests, Type I error, Type II error, power, goodness of fit, homogeneity of proportions, and least-squares regression line.

You can check out the College Board AP Statistics course description (pg. 13) for a full description of what concepts are included. Now let's check out a practice problem, to give you a better idea of what should go on your AP Statistics study guide.

In a test of H0: μ = 8 versus Ha: μ ≠ 8, a sample of size 220 leads to a p-value of Which of the following must be true?

(a) A 95% confidence interval for μ calculated from these data will not include μ = 8.
(b) At the 5% level if H0 is rejected, the probability of a Type II error is
(c) The 95% confidence interval for μ calculated from these data will be centered at μ = 8.
(d) The null hypothesis should not be rejected at

The most important thing to keep in mind with this seemingly complicated practice question (adapted from College Board) is what you don't know. With only the null hypothesis, the alternative hypothesis, the sample size, and the p-value, we don't know what test was run, what the results of the test were, or what conclusions to draw. The process here should be to carefully scan the possible answers and select one that we can state with confidence. We start with option (a). For a two-sided test, a p-value below 0.05 means that H0 is rejected at the 5% level, and that happens exactly when the hypothesized value falls outside the 95% confidence interval. So a p-value less than 0.05 indicates that a 95% confidence interval would not include the hypothesized mean μ = 8. Therefore, option (a) must be true.

Now that we have reviewed the four themes that comprise the AP Statistics Exam and mentioned all of the topics that you need to have included on your AP Statistics study guide, let's move on to some specific strategies for how you can study all of the material covered on the AP Statistics Exam. We'll start by talking about the difference between the two sections of the exam, and then we'll help you find the best AP Statistics study plan that works for you.
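The connection between a two-sided p-value and a 95% confidence interval in the practice problem above can be checked numerically. This is only a sketch under stated assumptions: the sample mean and standard deviation are hypothetical (the question does not give them), and with n = 220 we use a normal (z) approximation rather than the t procedure your calculator would run. The function names are our own.

```python
import math

Z_STAR_95 = 1.959964  # critical value for a 95% confidence level (normal approximation)

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p_value(sample_mean, sample_sd, n, mu0):
    """Two-sided p-value for H0: mu = mu0 using a z approximation."""
    standard_error = sample_sd / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    return 2.0 * (1.0 - normal_cdf(abs(z)))

def confidence_interval_95(sample_mean, sample_sd, n):
    """95% confidence interval for the mean, centered at the sample mean."""
    margin = Z_STAR_95 * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical summary statistics; only n = 220 and mu0 = 8 come from the question.
n, mu0 = 220, 8.0
sample_mean, sample_sd = 8.4, 2.8

p_value = two_sided_p_value(sample_mean, sample_sd, n, mu0)
low, high = confidence_interval_95(sample_mean, sample_sd, n)
interval_contains_mu0 = low <= mu0 <= high

print(round(p_value, 3), (round(low, 2), round(high, 2)), interval_contains_mu0)
```

With these made-up numbers the p-value is below 0.05 and the interval excludes 8. The interval is centered at the sample mean, not at 8, which is also why option (c) cannot be correct.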

How Should You Study for the AP Statistics Exam?

To study properly for the AP Statistics Exam, the first thing you have to understand is what resources you will have available to you on test day. This means two critically important things:

1. You will be able to use your graphing calculator on test day. So you should use it while you are studying. It's crucial that you are familiar with the exact calculator you will have available for the exam; you don't want to waste any time on test day figuring out how to use a new or unfamiliar calculator. Take the time now to look through the approved list of calculators and talk to your AP Statistics teacher about what calculator you will want to use on test day.

2. You will have an equation sheet available to you on test day. So you should not waste any time memorizing any formulas that are on the sheet. The equation sheet (available here as the first five pages of a College Board sample free-response packet) contains a list of equations as well as graphs of different distributions. Make sure that you use the formula sheet enough while studying that you are familiar with what is contained on the sheet and where to find what you are looking for.

Now that you know to include your graphing calculator and equation sheet in your studying, let's take a look at the two sections of the exam one at a time and discuss the best ways to prepare for each.

Multiple Choice Section

As you may have guessed, the best way to study for the multiple-choice section of the AP Statistics Exam is to do a whole lot of practice multiple-choice questions. This allows you not only to test yourself on your learning but also to practice testing strategies, like process of elimination, that will serve you well on the multiple-choice section. You can find plenty of practice AP Statistics multiple-choice questions right here at Albert.io. (Check out this link to start practicing.)

Now that you have a good idea of how to study for both the multiple-choice and free-response sections of the AP Statistics Exam, let's move on to finding the best AP Statistics study plan for you. After that, we'll offer some more of our best AP Statistics tips.

What's the Best AP Statistics Study Plan for You?

The AP Statistics Exam is not a test that you want to cram for. In fact, one of the AP Statistics tips that we'll talk about later is that you shouldn't do any studying the day of the exam, or even the day before. With this much content, it's important to space out your studying. All it takes is finding the right AP Statistics study plan that fits your situation. Below we have a 6-month AP Statistics study plan, a 3-month AP Statistics study plan, and a 1-month AP Statistics study plan that provide specific tips for what you should be doing at each stage before the exam. Depending on how much time you have left, pick up your test prep at one of these three options.

6 Month AP Statistics Study Plan

With six months left before the AP Statistics Exam, it is probably November (the test is in May), and you are likely starting the third month of your AP Statistics class. You've learned enough content that you are aware of some AP Statistics concepts, and you have taken a few tests and quizzes. Taking a practice AP Statistics Exam or doing practice questions could potentially be counter-productive at this stage, as you might not have enough of a learning base for the practice to be effective.

Instead, the most important thing for you to do right now is to be thorough in your learning and make sure that nothing falls through the cracks. What we mean is that the AP Statistics curriculum is cumulative, so concepts you learn now will reappear throughout the course. This is the time for you to ensure that you fully understand the foundational principles of statistical reasoning, so that you can build up into more and more specific applications as the course progresses.