1st Quarterly Exam ~ Sampling, Designs, Exploring Data and Regression

Part 1 Review

I. SAMPLING

MC I-1.) [APSTATSMC2014-6M] Approximately 52 percent of all recent births were boys. In a simple random sample of 100 recent births, 49 were boys and 51 were girls. The most likely explanation for the difference between the observed results and the expected results in this case is
(A) bias
(B) variability due to sampling
(C) nonsampling error
(D) a sampling frame that is incomplete
(E) small sample size

MC I-2.) [APSTATSMC2007-2] In which of the following situations would it be most difficult to use a census?
(A) To determine what proportion of licensed bicycles on a university campus have lights.
(B) To determine what proportion of students in a high school support wearing uniforms.
(C) To determine what proportion of registered students enrolled in a college are employed more than 20 hours each week.
(D) To determine what proportion of single-family dwellings in Tenafly, New Jersey have two-car garages.
(E) To determine what proportion of fish in Lake Michigan are bass.

MC I-3.) [CBAPSTATSPRACTICE-2] Under which of the following conditions is it preferable to use stratified random sampling rather than simple random sampling?
A.) The population can be divided into a large number of strata so that each stratum contains only a few individuals.
B.) The population can be divided into a small number of strata so that each stratum contains a large number of individuals.
C.) The population can be divided into strata so that the individuals in each stratum are as much alike as possible.
D.) The population can be divided into strata so that the individuals in each stratum are as different as possible.
E.) The population can be divided into strata of equal sizes so that each individual in the population still has the same chance of being selected.

MC I-4.) [CBAPSTATSPRACTICE-9] Each person in a simple random sample of 2,000 received a survey, and 317 people returned their survey. How could nonresponse cause the results of the survey to be biased?
A.) Those who did not respond reduced the sample size, and small samples have more bias than large samples.
B.) Those who did not respond caused a violation of the assumption of independence.
C.) Those who did not respond were indistinguishable from those who did not receive the survey.
D.) Those who did not respond represent a stratum, changing the simple random sample into a stratified random sample.
E.) Those who did respond may differ in some important way from those who did not respond.

MC I-5.) [APSTATSMC2014-2] A researcher wants to know the percentage of villages in a certain African country that have access to a clean drinking water source less than ¼ mile from the center of the village. The country is divided into 12 districts and each district has many villages in it, as indicated in the table below. The researcher selects a random sample of 10% of the villages from each district. Which of the following terms best describes this sampling method?
A.) Simple random sampling
B.) Stratified random sampling
C.) Cluster sampling
D.) Systematic sampling
E.) Voluntary response sampling

MC I-6.) [APSTATSMC ] In a recent poll of 1,500 randomly selected eligible voters, only 525 (35 percent) said that they did not vote in the last election. However, a vote count showed that 80 percent of eligible voters actually did not vote in the last election. Which of the following types of bias is most likely to have occurred in the poll?
(A) Nonresponse bias
(B) Sampling bias
(C) Selection bias
(D) Response bias
(E) Undercoverage bias

MC I-7.) [APSTATSMC ] In the design of a survey, which of the following best explains how to minimize response bias?
(A) Increase the sample size.
(B) Decrease the sample size.
(C) Randomly select the sample.
(D) Increase the number of questions in the survey.
(E) Carefully word and field-test survey questions.

MC I-8.) [APSTATSMC2014-9] A survey was administered to parents of high school students in a certain state to see if the parents thought the students' academic needs were being met. To select the sample, the parents were divided into two groups: one group of parents who live in cities with populations of more than 100,000 and the other group of parents who live in cities with populations less than or equal to 100,000. A random sample of 100 parents from each group was taken. Which of the following statements about the sample of 200 parents is true?
(A) It is a convenience sample because the sample of parents was easily obtained.
(B) It is a stratified random sample because parents were randomly selected from each group.
(C) It is a random cluster sample because parents were randomly selected from each group.
(D) It is a random cluster sample because groups of high schools were randomly selected.
(E) It is a systematic sample because the parents were systematically divided into two groups.

MC I-9.) [APSTATSMC2012-4] A bank surveyed all of its 60 employees to determine the proportion who participate in volunteer activities. Which of the following statements is true?
(A) The bank should not use the data from this survey because this is an observational study.
(B) The bank can use the result of this survey to prove that working for the bank causes employees to participate in volunteer activities.
(C) The bank did not select a random sample of employees, so the survey will not provide the bank with useful information.
(D) The bank would have to use the survey data to construct a confidence interval in order to estimate the proportion of employees who participate in volunteer activities.
(E) The bank does not need to use an inference procedure to determine the proportion of employees who participate in volunteer activities because the survey was a census of all employees.

MC I-10.) [APSTATSMC2002-4] Suppose that 30 percent of the subscribers to a cable television service watch the shopping channel at least once a week. You are to design a simulation to estimate the probability that none of five randomly selected subscribers watches the shopping channel at least once a week. Which of the following assignments of the digits 0 through 9 would be appropriate for modeling an individual subscriber's behavior in this simulation?
(A) Assign "0, 1, 2" as watching the shopping channel at least once a week and "3, 4, 5, 6, 7, 8, and 9" as not watching.
(B) Assign "0, 1, 2, 3" as watching the shopping channel at least once a week and "4, 5, 6, 7, 8, and 9" as not watching.
(C) Assign "1, 2, 3, 4, 5" as watching the shopping channel at least once a week and "6, 7, 8, 9, and 0" as not watching.
(D) Assign "0" as watching the shopping channel at least once a week and "1, 2, 3, 4, and 5" as not watching; ignore digits "6, 7, 8, and 9."
(E) Assign "3" as watching the shopping channel at least once a week and "0, 1, 2, 4, 5, 6, 7, 8, and 9" as not watching.

MC I-11.) [APSTATSMC ] The student government at a high school wants to conduct a survey of student opinion. It wants to begin with a simple random sample of 60 students. Which of the following survey methods will produce a simple random sample?
(A) Survey the first 60 students to arrive at school in the morning.
(B) Survey every 10th student entering the school library until 60 students are surveyed.
(C) Use random numbers to choose 15 each of the first-year, second-year, third-year, and fourth-year students.
(D) Number the cafeteria seats. Use a table of random numbers to choose seats and interview the students until 60 have been interviewed.
(E) Number the students in the official school roster. Use a table of random numbers to choose 60 students from this roster for the survey.
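To check choice (A) for MC I-10, the simulation can be run directly. This is a minimal sketch, assuming the five subscribers behave independently and using Python's seeded random-number generator in place of a random digit table; the function name `simulate_none_watch` is illustrative only.

```python
import random

def simulate_none_watch(trials=100_000, seed=1):
    """Estimate P(none of 5 subscribers watches) with the digit assignment of choice (A)."""
    rng = random.Random(seed)
    watch_digits = {0, 1, 2}  # 3 of the 10 digits, so P(watch) = 0.30
    none_count = 0
    for _ in range(trials):
        # One trial: one random digit 0-9 for each of the five subscribers.
        digits = [rng.randint(0, 9) for _ in range(5)]
        if not any(d in watch_digits for d in digits):
            none_count += 1
    return none_count / trials

estimate = simulate_none_watch()
exact = 0.7 ** 5  # exact answer under the independence assumption
print(f"simulated: {estimate:.4f}  exact: {exact:.5f}")
```

With many trials the simulated proportion should land close to the exact value \(0.7^5 \approx 0.168\); choices (B) through (E) would instead model a watching probability of 0.4, 0.5, 1/6, or 0.1.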

MC I-12.) [APSTATSMC1997-7M] A certain county has 1,000 farms. Corn is grown on 100 of these farms but on none of the others. In order to estimate the total farm acreage of corn for the county, two plans are proposed.
Plan I:
a.) Sample 20 farms at random.
b.) Estimate the mean acreage of corn per farm +/- some standard error.
c.) Multiply the mean and standard error by 1000 to get the interval of estimate of the total.
Plan II:
a.) Identify the 100 corn-growing farms.
b.) Sample 20 corn-growing farms at random.
c.) Estimate the mean acreage of these 20 corn-growing farms +/- some standard error.
d.) Multiply the mean and standard error by 100 to get the interval of estimate of the total.
On the basis of the information given, which of the following is the better method for estimating the total farm acreage of corn for the county?
(A) Choose plan I over plan II.
(B) Choose plan II over plan I.
(C) Choose either plan, since both are good and will produce equivalent results.
(D) Choose neither plan, since neither estimates the total farm acreage of corn.
(E) The plans cannot be evaluated from the information given.

MC I-13.) [APSTATSFRQ2015-3] Recently, a company acquired the rights to use a forest like the one shown in the photograph below to harvest trees to produce lumber. The company wants to conduct a study to estimate the mean trunk diameter of the trees from the forest by taking a random sample of approximately 5 percent of the trees from the forest. For the study, the company divides the forest into 200 equally sized plots of approximately one acre each, as shown in the figure below.

Because of previous logging practices and growth patterns, plots with older trees, such as Plot 6, tend to have fewer trees but with larger trunk diameters, and plots with younger trees, such as Plot 121, tend to have more trees but with smaller trunk diameters. This is illustrated in the two figures of Plot 6 and Plot 121 by the varying number and sizes of the symbols.

a.) Describe a procedure for using cluster sampling to obtain a random sample of approximately 5 percent of the trees from the forest, using the plots as clusters.

b.) Describe a procedure for using stratified sampling to obtain a random sample of approximately 5 percent of the trees from the forest, using the plots as strata.
c.) For the study, give one advantage of using cluster sampling as described in part a.) over stratified sampling as described in part b.).
d.) For the study, give one advantage of using stratified sampling as described in part b.) over cluster sampling as described in part a.).
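The difference between the two plans in MC I-12 shows up clearly in a simulation. This sketch assumes a hypothetical county (the acreages are invented, not from the problem): 100 corn farms with 20 to 80 acres of corn each and 900 farms with none. Both plans are unbiased for the total, but Plan II, which never wastes sample slots on the 900 known zeros, should have a much smaller spread.

```python
import random
import statistics

# Hypothetical county: 100 corn farms (20-80 acres of corn each) and 900 with no corn.
rng = random.Random(7)
corn_acres = [rng.uniform(20, 80) for _ in range(100)]
all_farms = corn_acres + [0.0] * 900
true_total = sum(corn_acres)

def plan_one():
    # SRS of 20 farms from all 1,000; scale the sample mean up by 1,000.
    return statistics.mean(rng.sample(all_farms, 20)) * 1000

def plan_two():
    # SRS of 20 farms from the 100 known corn farms; scale up by 100.
    return statistics.mean(rng.sample(corn_acres, 20)) * 100

reps = 5000
est1 = [plan_one() for _ in range(reps)]
est2 = [plan_two() for _ in range(reps)]
print(f"true total: {true_total:.0f}")
print(f"Plan I : mean {statistics.mean(est1):.0f}, SD {statistics.stdev(est1):.0f}")
print(f"Plan II: mean {statistics.mean(est2):.0f}, SD {statistics.stdev(est2):.0f}")
```

In this setup most Plan I samples contain only a few corn farms, so its estimates swing widely around the true total, while Plan II's estimates cluster tightly around it.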

II. DESIGN OF STUDIES

MC II-1.) [APSTATSMC1997-9] To check the effect of cold temperature on the elasticity of two brands of rubber bands, one box of Brand A and one box of Brand B rubber bands are tested. Ten bands from the Brand A box are placed in a freezer for two hours and ten bands from the Brand B box are kept at room temperature. The amount of stretch before breakage is measured on each rubber band, and the mean for the cold bands is compared to the mean for the others. Is this a good experimental design?
(A) No, because the means are not proper statistics for comparison.
(B) No, because more than two brands should be used.
(C) No, because more temperatures should be used.
(D) No, because temperature is confounded with brand.
(E) Yes

MC II-2.) [APSTATSMC2014-3] A well-designed experiment should have which of the following characteristics?
I. Subjects assigned randomly to treatments
II. A control group or at least two treatment groups
III. Replication
(A) I only
(B) I and II only
(C) I and III only
(D) II and III only
(E) I, II, and III

MC II-3.) [APSTATSMC ] Which of the following distinguishes an observational study from a randomized experiment?
(A) In an observational study volunteers are always used, whereas in a randomized experiment a random sample is always taken from the population.
(B) In an observational study a random sample is always taken from the population, whereas in a randomized experiment volunteers are always used.
(C) In an observational study treatments are not randomly assigned, whereas in a randomized experiment treatments are randomly assigned.
(D) In an observational study a control group is never used, whereas in a randomized experiment a control group is always used.
(E) An observational study can be double-blind, whereas a randomized experiment can only be single-blind because the experimenter determines who is randomly assigned to each treatment.

MC II-4.) [APSTATSMC ] A compact disc (CD) manufacturer wanted to determine which of two different cover designs for a newly released CD will generate more sales. The manufacturer chose 70 stores to sell the CD. Thirty-five of these stores were randomly assigned to sell CDs with one of the cover designs and the other 35 were assigned to sell the CDs with the other cover design. The manufacturer recorded the number of CDs sold at each of the stores and found a significant difference between the mean number of CDs sold for the two cover designs. Which of the following gives the conclusion that should be made based on the results and provides the best explanation for the conclusion?
(A) It is not reasonable to conclude that the difference in sales was caused by the different cover designs because this was not an experiment.
(B) It is not reasonable to conclude that the difference in sales was caused by the different cover designs because there was no control group for comparison.
(C) It is not reasonable to conclude that the difference in sales was caused by the different cover designs because the 70 stores were not randomly chosen.
(D) It is reasonable to conclude that the difference in sales was caused by the different cover designs because the cover designs were randomly assigned to stores.
(E) It is reasonable to conclude that the difference in sales was caused by the different cover designs because the sample size was large.

MC II-5.) [2012APSTATSMC ] The manager of a public swimming pool wants to compare the effectiveness of two laundry detergents, Detergent A and Detergent B, in cleaning the towels that are used daily. As each dirty towel is turned in, it is placed into the only washing machine on the premises. When the washing machine contains 20 towels, the manager flips a coin to determine whether Detergent A or Detergent B will be used for that load. The cleanliness of the load of towels is rated on a scale of 1 to 10 by a person who does not know which detergent was used. The manager continues this experiment for many days. Which of the following best describes the manager's study?
(A) A completely randomized design
(B) A randomized block design with Detergent A and Detergent B as blocks
(C) A randomized block design with the washing machine as the block
(D) A matched-pairs design with Detergent A and Detergent B as the pair
(E) An observational study

MC II-6.) [APSTATSMC2002-1] Which of the following is a key distinction between well-designed experiments and observational studies?
(A) More subjects are available for experiments than for observational studies.
(B) Ethical constraints prevent large-scale observational studies.
(C) Experiments are less costly to conduct than observational studies.
(D) An experiment can show a direct cause-and-effect relationship, whereas an observational study cannot.
(E) Tests of significance cannot be used on data collected from an observational study.

MC II-7.) [APSTATSMC ] A researcher wishes to test a new drug developed to treat hypertension (high blood pressure). A group of 40 hypertensive men and 60 hypertensive women is to be used. The experimenter randomly assigns 20 of the men and 30 of the women to placebo and assigns the rest to the treatment. The major reason for separate assignment for men and women is that
(A) it is a large study with 100 subjects.
(B) the new drug may affect men and women differently.
(C) the new drug may affect hypertensive and nonhypertensive people differently.
(D) this design uses matched pairs to detect the new-drug effect.
(E) there must be an equal number of subjects in both the placebo group and the treatment group.

MC II-8.) [APSTATSFRQ2016-3] Alzheimer's disease results in a loss of cognitive ability beyond what is expected with typical aging. A local newspaper published an article with the following headline. The article reported that a study tracked the medical histories of 21,123 men and women for 23 years. The article stated that, for those who smoked at least two packs of cigarettes a day, the risk of developing Alzheimer's disease was 2.57 times the risk for those who did not smoke.
(a) Identify the explanatory and response variables in the study.
(b) Is the study described in the article an observational study or an experiment? Explain.
(c) Exercise status (regular weekly exercise versus no regular weekly exercise) was mentioned in the article as a possible confounding variable. Explain how exercise status could be a confounding variable in the study.
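The separate assignment in MC II-7 is a randomized block design with gender as the blocking variable. The sketch below shows one way to carry it out, assuming hypothetical subject IDs (M01 to M40, W01 to W60) and a shuffle-then-split rule; the helper name `block_randomize` is illustrative, not a standard library function.

```python
import random

def block_randomize(blocks, seed=None):
    """Within each block, randomly send half of the subjects to placebo and half to the drug."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, subjects in blocks.items():
        shuffled = subjects[:]          # copy so the original list is untouched
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for s in shuffled[:half]:
            assignment[s] = (block_name, "placebo")
        for s in shuffled[half:]:
            assignment[s] = (block_name, "drug")
    return assignment

# Hypothetical subject IDs: 40 men and 60 women.
blocks = {
    "men": [f"M{i:02d}" for i in range(1, 41)],
    "women": [f"W{i:02d}" for i in range(1, 61)],
}
result = block_randomize(blocks, seed=42)
```

Randomizing within each block guarantees 20 men and 30 women in each group, so any gender difference in response to the drug cannot be mixed up with the treatment comparison.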

III. EXPLORING DATA

MC III-1.) [APSTATSMC2014-2] Professor James gave the same test to his three sections. On the 34-question test, the highest score was 32 and the lowest was 15. Based on the information displayed in the boxplots below, which of the following statements is true?
(A) Section 1 has the smallest interquartile range.
(B) The lowest score in section 2 is higher than the highest score in either of the other sections.
(C) Section 2 has the smallest range of scores.
(D) The top 25% of scores in section 2 are lower than the highest score in section 3.
(E) At least 50% of the scores in section 3 are higher than all of the scores in section 1.

MC III-2.) [APSTATSMC2014-7] Each person in a random sample of adults was asked how many DVDs he or she owned. Summary statistics are given below. Which of the following statements is true?
(A) Seventy-five percent of the adults in the sample own more than 95 DVDs.
(B) Fifty percent of the adults in the sample own between 0 and DVDs.
(C) The distribution of the number of DVDs owned appears to be approximately symmetric.
(D) The interquartile range of the number of DVDs owned is 65.
(E) The distribution of the number of DVDs owned contains outliers on both the low side and the high side.

MC III-3.) [APSTATSMC2012-2] A random sample of 374 United States pennies was collected, and the age of each penny was determined. According to the boxplot below, what is the approximate interquartile range (IQR) of the ages?

MC III-4.) [APSTATSMC ] The table above shows the sample size, the mean, and the median for two samples of measurements. What is the median for the combined sample of 47 measurements?
(A)
(B)
(C) \( \dfrac{21(42.6) + 26(49.2)}{47} \)
(D) \( \dfrac{21(45.0) + 26(48.5)}{47} \)
(E) It cannot be determined from the information given.
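Why (E) is correct in MC III-4: knowing each sample's size and median does not pin down the median of the combined data. The sketch below builds two hypothetical versions of the data with the same sample sizes (21 and 26) and, as an assumption since the table is not reproduced here, the values 45.0 and 48.5 from choice (D) as the two sample medians. The combined medians differ.

```python
import statistics

# Two hypothetical data sets, each with n1 = 21, n2 = 26 and sample medians 45.0 and 48.5.
version_a = ([45.0] * 21, [48.5] * 26)
version_b = ([40.0] * 10 + [45.0] + [60.0] * 10, [30.0] * 13 + [67.0] * 13)

for name, (s1, s2) in [("A", version_a), ("B", version_b)]:
    assert len(s1) == 21 and len(s2) == 26
    print(name, statistics.median(s1), statistics.median(s2),
          "combined median:", statistics.median(s1 + s2))
```

Both versions have identical sample sizes and sample medians, yet the combined median is 48.5 in one and 45.0 in the other, so no formula built only from those summaries can give the answer.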

MC III-5.) [APSTATSMC ] The number of hurricanes reaching the East Coast of the United States was recorded for each of the last ten decades by the National Hurricane Center. Summary measures are shown below.
Min = 12, Max = 24, Lower quartile = 15, Upper quartile = 18, Median = 16, n = 10
Which of the following statements is true?
(A) The smallest observation is 12 and it is an outlier. No other observation in the data set could be an outlier.
(B) The largest observation is 24 and it is an outlier. No other observations in the data set could be outliers.
(C) Both 12 and 24 are outliers. It is possible that there are also other outliers.
(D) 12 is an outlier and it is possible that there are other outliers at the low end of the data set. There are no outliers at the high end of the data set.
(E) 24 is an outlier and it is possible that there are other outliers at the high end of the data set. There are no outliers at the low end of the data set.

MC III-6.) [APSTATSMC2015-7] Data were collected on the number of text messages sent by each student in a large school for one day. A boxplot of the data is shown below. Based on the boxplot, which of the following statements is the most reasonable conclusion?
(A) There are more students with data values below the median than there are students with data values above the median.
(B) There are more students with data values between the first quartile and the median than there are students with data values between the median and the third quartile.
(C) There are fewer students with data values between the first quartile and the median than there are students with data values between the median and the third quartile.
(D) There are approximately the same number of students with data values between the first quartile and the minimum as there are students with data values between the third quartile and the maximum.
(E) The data are less spread out between the first quartile and the median than between the median and the third quartile.
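The outlier check in MC III-5 uses the 1.5 × IQR rule. A minimal sketch of that arithmetic (the helper name `outlier_fences` is illustrative):

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences of the 1.5 x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = outlier_fences(q1=15, q3=18)
print(lower, upper)                        # 10.5 22.5
print("12 is an outlier:", 12 < lower)     # False
print("24 is an outlier:", 24 > upper)     # True
```

Because only the minimum and maximum are known, any other values between 22.5 and 24 would also be outliers, while nothing can fall below 10.5 when the minimum is 12.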

MC III-7.) [APSTATSMC ] For a sample of 42 rabbits, the mean weight is 5 pounds and the standard deviation of weights is 3 pounds. Which of the following is most likely true about the weights for the rabbits in this sample?
(A) The distribution of weights is approximately normal because the sample size is 42, and therefore the central limit theorem applies.
(B) The distribution of weights is approximately normal because the standard deviation is less than the mean.
(C) The distribution of weights is skewed to the right because the least possible weight is within 2 standard deviations of the mean.
(D) The distribution of weights is skewed to the left because the least possible weight is within 2 standard deviations of the mean.
(E) The distribution of weights has a median that is greater than the mean.

MC III-8.) [APSTATSMC2012-3] The histogram below shows the number of minutes needed by 45 students to finish playing a computer game. Which of the following statements is correct?
(A) The distribution is skewed to the right.
(B) The distribution is skewed to the left.
(C) The distribution appears to be normal.
(D) The distribution appears to be chi-square.
(E) The distribution appears to be uniform.

MC III-9.) [APSTATSFRQFORMB2010-1] As a part of the US Department of Agriculture's Super Dump cleanup efforts in the early 1990s, various sites in the country were targeted for cleanup. Three of the targeted sites, River X, River Y, and River Z, had become contaminated with pesticides because they were located near abandoned pesticide dump sites. Measurements of the concentration of aldrin (a commonly used pesticide) were taken at twenty randomly selected locations in each river near the dump sites. The boxplots shown below display the five-number summaries for the concentrations, in parts per million (ppm), of aldrin for the twenty locations that were sampled in each of the three rivers. Compare the distributions of the concentration of aldrin among the three rivers.

MC III-10.) [APSTATSMC ] The histogram below displays the times, in minutes, needed for each chimpanzee in a sample of 26 to complete a simple navigational task. The largest observation, 93, is an outlier since \( 93 > Q_3 + 1.5(Q_3 - Q_1) \). Which of the following boxplots could represent the information in the histogram?

MC III-11.) [APSTATSMC ] A school is having a contest in which students guess the number of candies in a jar. The student whose guess is closest to the correct number of candies in the jar wins a prize. The numbers of candies guessed by male and female students are shown in the back-to-back stemplot below. Which of the following statements is true about the distributions of guesses?
(A) The distribution of guesses for male students is skewed to the left, and the distribution of guesses for female students is skewed to the right.
(B) The distribution of guesses for male students is skewed to the right, and the distribution of guesses for female students is skewed to the left.
(C) The distributions of guesses for male and female students are both skewed to the right.
(D) The distributions of guesses for male and female students are both skewed to the left.
(E) The distributions of guesses for male and female students are both symmetric.

MC III-12.) [CBAPSTATSPRACTICEPROBLEM] Consider a data set of positive values, at least two of which are not equal. Which of the following sample statistics will be changed when each value in this data set is multiplied by a constant whose absolute value is greater than 1?
I. The mean
II. The median
III. The standard deviation
(A) I only
(B) II only
(C) III only
(D) I and II only
(E) I, II, and III
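A quick numerical check of MC III-12, using a small hypothetical data set and the hypothetical constant k = -3 (any k with |k| > 1 would do): the mean and median are multiplied by k and the standard deviation by |k|, so all three change.

```python
import statistics

data = [2, 3, 5, 7, 11]   # hypothetical positive values, not all equal
k = -3                    # hypothetical constant with |k| > 1

scaled = [k * x for x in data]
for label, f in [("mean", statistics.mean), ("median", statistics.median),
                 ("SD", statistics.stdev)]:
    print(f"{label}: {f(data):.3f} -> {f(scaled):.3f}")
```

A negative k flips the sign of the mean and median, but the standard deviation, which measures spread, only grows by the factor |k|.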

MC III-13.) [APSTATSMC ]

MC III-14.) [APSTATSMC2014-3] Administrators at a state university computed the mean GPA (grade point average) for juniors and seniors majoring in either physics or chemistry. The results are displayed in the table below. When juniors and seniors are grouped together, could physics majors have a higher mean GPA than chemistry majors?
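MC III-14 is an instance of Simpson's paradox: a combined mean is a weighted average of the group means, so the group sizes can reverse a comparison. In this sketch all counts are hypothetical, and so are the GPAs except 3.2 (physics seniors) and 3.0 (chemistry juniors), which echo the values mentioned in the answer key; the GPAs are chosen so that chemistry majors have the higher mean in each class year.

```python
# Hypothetical (count, mean GPA) for each major and class year.
groups = {
    "physics":   {"junior": (1, 2.9),  "senior": (99, 3.2)},
    "chemistry": {"junior": (99, 3.0), "senior": (1, 3.3)},
}

def overall_mean(by_year):
    """Weighted mean GPA when juniors and seniors are grouped together."""
    total_students = sum(n for n, _ in by_year.values())
    total_gpa = sum(n * gpa for n, gpa in by_year.values())
    return total_gpa / total_students

for major, by_year in groups.items():
    print(major, round(overall_mean(by_year), 3))
```

Chemistry wins within each class year, but physics has the higher combined mean because almost all physics majors are seniors, the higher-GPA year.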

MC III-15.) [APSTATSFRQ2013-3] An environmental group conducted a study to determine whether crows in a certain region were ingesting food containing unhealthy levels of lead. A biologist classified lead levels greater than 6.0 parts per million (ppm) as unhealthy. The lead levels of a random sample of 23 crows in the region were measured and recorded. The data are shown in the stemplot below. What proportion of crows in the sample had lead levels that are classified by the biologist as unhealthy?

MC III-16.) [APSTATSFRQB2011] Records are kept by each state in the US on the number of pupils enrolled in public schools and the number of teachers employed by public schools for each school year. From these records, the ratio of the number of pupils to the number of teachers (P-T ratio) can be calculated for each state. The histograms below show the P-T ratio for every state during the school year. The histogram on the left displays the ratios for the 24 states that are west of the Mississippi River, and the histogram on the right displays the ratios for the 26 states that are east of the Mississippi River.

(a) Describe how you would use the histograms to estimate the median P-T ratio for each group (west and east) of states. Then use this procedure to estimate the median of the west group and the median of the east group.
(b) Write a few sentences comparing the distributions of P-T ratios for states in the two groups (west and east) during the school year.
(c) Using your answers in parts (a) and (b), explain how you think the mean P-T ratio during the school year will compare for the two groups (west and east).

MC III-17.) [APSTATSFRQ2016-1] Robin works as a server in a small restaurant, where she can earn a tip (extra money) from each customer she serves. The histogram shows the distribution of her 60 tip amounts for one day of work.
(a) Write a few sentences to describe the distribution of tip amounts for the day shown.
(b) One of the tip amounts was $8. If the $8 tip had been $18, what effect would the increase have had on the following statistics? Justify your answers.
The mean:
The median:
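The procedure asked for in MC III-16 (a) amounts to counting up through the bars until reaching the middle value(s). A minimal sketch, assuming bins of width 1 from 12 to 22 and hypothetical bar heights for 24 states (the actual histogram counts are not reproduced here); `median_bins` is an illustrative name.

```python
def median_bins(bin_edges, counts):
    """Return the bin interval(s) containing the middle value(s) of the data."""
    n = sum(counts)
    # 1-based positions of the middle value(s), counting up from the smallest value:
    # positions n/2 and n/2 + 1 when n is even, position (n + 1)/2 when n is odd.
    middle = {(n + 1) // 2, n // 2 + 1}
    intervals = []
    cumulative = 0
    for i, c in enumerate(counts):
        first, last = cumulative + 1, cumulative + c   # positions covered by this bar
        if any(first <= p <= last for p in middle):
            intervals.append((bin_edges[i], bin_edges[i + 1]))
        cumulative += c
    return intervals

west_edges = list(range(12, 23))                  # hypothetical bins 12-13, ..., 21-22
west_counts = [2, 4, 4, 3, 3, 3, 2, 1, 1, 1]      # hypothetical counts for 24 states
print(median_bins(west_edges, west_counts))
```

For 24 states the median is the average of the 12th and 13th values; for the 26 eastern states it would be the average of the 13th and 14th.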

MC III-18.) [APSTATSMC2007-1] The statistics below provide a summary of the distribution of heights, in inches, for a simple random sample of 200 young children.
Mean: 46 inches
Median: 45 inches
Standard Deviation: 3 inches
First Quartile: 43 inches
Third Quartile: 48 inches
About 100 children in the sample have heights that are
(A) less than 43 inches
(B) less than 48 inches
(C) between 43 and 48 inches
(D) between 40 and 52 inches
(E) more than 46 inches

MC III-19.) [APSTATSMC ] One hundred people were interviewed and classified according to their attitude toward small cars and their personality type. The results are shown in the table below. Which of the following is true?
(A) Of the three attitude groups, the group with the negative attitude has the highest proportion of type A personality types.
(B) Of the three attitude groups, the group with the neutral attitude has the highest proportion of type B personality types.
(C) For each personality type, more than half of the 100 respondents have a neutral attitude toward small cars.
(D) The proportion that has a positive attitude toward small cars is higher among people with a type B personality type than among people with a type A personality type.
(E) More than half of the 100 respondents have a type A personality type and a positive attitude toward small cars.

MC III-20.) [APSTATSFRQ ] The boxplots below summarize two data sets, A and B. Which of the following must be true?
I. Set A contains more data than Set B.
II. The box of Set A contains more data than the box of Set B.
III. The data in Set A have a larger range than the data in Set B.
(A) I only
(B) III only
(C) I and II only
(D) II and III only
(E) I, II, and III

MC III-21.) [APSTATSFRQ2012-5] The histogram below displays the frequencies of waiting times, in minutes, for 175 patients in a dentist's office. Which of the following could be the median of the waiting times, in minutes?
(A) 2.50
(B) 7.25
(C)
(D)
(E)

MC III-22.) [APSTATSFRQ2012-6] Data were collected on the amount, in dollars, that individual customers spent on dinner in an Italian restaurant. The quartiles for these data are given below. Which of the following statements must be true for these customers?
(A) At least half of the customers spent less than or equal to $44.27 and at least half spent greater than or equal to $
(B) Seventy-five percent of the customers spent between $36.27 and $
(C) Twenty-five percent of the customers spent less than or equal to $58.97 and the remaining 75 percent spent greater than or equal to $
(D) The mean amount spent by customers is $
(E) A majority of customers spent $

Answers

Part 1 Review

I. SAMPLING

MC I-1.) B
MC I-2.) E
MC I-3.) C. Choice A: reducing the number of people in each group is not the reason to divide into strata. Choice B: it divides by size, not by similarity within each stratum. Choice D: making each group as varied as possible is how clusters are formed in cluster sampling. Choice E: dividing into groups of equal size is not the purpose of forming strata.
MC I-4.) E. Choice A: a reduced sample can still be useful as long as the conditions are satisfied. Choice B: the responses are still independent as long as the people do not influence each other. Choice C: there is no information about this. Choice D: the people who did not respond may not have responded regardless of how they were sampled.
MC I-5.) B
MC I-6.) D
MC I-7.) E
MC I-8.) B
MC I-9.) E
MC I-10.) A
MC I-11.) E
MC I-12.) B. Plan I is an SRS of all 1,000 farms, 900 of which grow no corn, so most sampled farms contribute zero and the estimate varies a lot. Plan II samples only from the 100 corn-growing farms (in effect, stratifying by whether corn is grown, since the non-corn stratum is known to contribute zero), so its estimate has much less variability.
MC I-13.)
a.) 1.) Label each plot from 1 to 200. 2.) Randomly generate 10 different numbers from 1 to 200 (10 plots are 5% of the 200 plots). 3.) Measure ALL the trees in those 10 sampled plots.
b.) 1.) Use the 200 plots as strata. 2.) Within each plot, label ALL the trees, then randomly select 5% of the labeled trees in that plot. 3.) Measure the selected trees in every plot.
c.) Cluster sampling is easier and cheaper to carry out, since only the trees in 10 plots need to be measured. The drawback is that the sampled plots may not represent the forest well, because tree age and size vary from area to area.
d.) Stratified sampling may better represent the entire forest, since every plot contributes trees to the sample. However, it could be harder or more expensive to complete, since the trees in every plot need to be labeled and sampled.
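The two procedures in the MC I-13 answers can be sketched in code. This assumes a hypothetical forest in which each of the 200 plots holds between 20 and 120 trees with made-up IDs such as "P6-T3"; it also assumes at least one tree is taken from every plot in the stratified version, which the answer does not specify. Because plot sizes differ, the cluster sample is only about 5% of the trees on average.

```python
import random

rng = random.Random(2015)

# Hypothetical forest: plots 1-200, each holding 20 to 120 labeled trees.
forest = {plot: [f"P{plot}-T{t}" for t in range(1, rng.randint(20, 120) + 1)]
          for plot in range(1, 201)}

def cluster_sample(forest, n_plots=10):
    """Choose 10 of the 200 plots at random and take every tree in the chosen plots."""
    chosen = rng.sample(sorted(forest), n_plots)
    return [tree for plot in chosen for tree in forest[plot]]

def stratified_sample(forest, fraction=0.05):
    """Take a simple random sample of about 5% of the trees within every plot."""
    sample = []
    for trees in forest.values():
        k = max(1, round(fraction * len(trees)))  # assumption: at least one tree per plot
        sample.extend(rng.sample(trees, k))
    return sample

clusters = cluster_sample(forest)
strata = stratified_sample(forest)
print(len(clusters), len(strata), sum(len(t) for t in forest.values()))
```

The cluster sample touches only 10 plots (cheaper fieldwork), while the stratified sample reaches all 200 plots, so old-growth and young areas are both represented.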

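II. DESIGN OF STUDIES

MC II-1.) D
MC II-2.) E
MC II-3.) C. In an observational study, treatments are not randomly assigned.
MC II-4.) D. Because the cover designs were randomly assigned to the stores, this is an experiment, so concluding that the designs caused the difference in sales is justified.
MC II-5.) A
MC II-6.) D
MC II-7.) B
MC II-8.)
a.) The explanatory variable is the degree of cigarette smoking. The response variable is whether the person develops Alzheimer's disease.
b.) Observational: the people were not assigned to a certain degree of cigarette smoking.
c.) Exercise status would be a confounding variable if it is associated with both smoking and Alzheimer's disease: people who exercise regularly may tend to smoke less, and people who exercise regularly may also be less likely to develop Alzheimer's disease. Then the higher risk among heavy smokers could be due, at least in part, to lack of exercise rather than to smoking, and the two effects cannot be separated.

III. EXPLORING DATA

MULTIPLE-CHOICE PROBLEMS

MC III-1.) E. Each quarter of a boxplot (minimum to Q1, Q1 to median, median to Q3, Q3 to maximum) contains about 25% of the data.
MC III-2.) D. The interquartile range is
MC III-3.) 15. IQR is
MC III-4.) E.
MC III-5.) E. IQR = 18 - 15 = 3, so the fences are 15 - 1.5(3) = 10.5 and 18 + 1.5(3) = 22.5. The maximum, 24, is above 22.5, so it is an outlier, and other values between 22.5 and 24 could also be outliers. The minimum, 12, is above 10.5, so there are no low outliers.
MC III-6.) D
MC III-7.) C. Weights cannot be negative, so the data can extend at most 5 pounds (5/3, about 1.67 standard deviations, less than 2) below the mean, while they can extend farther above it. A longer right tail is therefore likely, so the distribution is probably not symmetric about the mean or median.
MC III-8.) B. The distribution is skewed to the left.
MC III-9.)
Shape (S): River X is skewed to the right; River Y is roughly symmetric; River Z is skewed to the left.
Outliers (O): None of the rivers has an outlier.
Center (C): River X has the highest median of all three rivers.
Spread (S): River Z has the smallest spread, with values clustered around its center.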

MC III-10.) D. A quartile value is a number from the data, and 70 is the largest number in the data that is less than , and the median is around the 13th number, which is greater than 20.
MC III-11.) D. The left side should be the side with smaller numbers, and the right side should have larger numbers.
MC III-12.) E. This problem is easier to understand once the concepts and formulas for expected value and variance are introduced. Let $X$ be the random variable for the original data set, and let $Y = kX$ for a constant $k > 0$. Then $E[Y] = kE[X]$ and $\mathrm{Var}[Y] = k^2\,\mathrm{Var}[X]$, or equivalently $\sigma_Y = k\sigma_X$, and the median of $Y$ is the median of $X$ multiplied by $k$. All three measures change.
MC III-13.) D. The median is around 55, and the median divides the area under the curve into two parts of equal area.
MC III-14.) Yes. Since no counts are given for each category, consider an extreme case: only one junior takes physics and many more seniors take physics, while only one senior takes chemistry and many more juniors take chemistry. In this case the averages could be close to 3.2 for seniors taking physics and 3.0 for juniors taking chemistry.
MC III-15.) 17.4%. There are 4 crows with ppm higher than 6: 6.3, 6.4, 6.6, and 6.8. So, the answer is 17.4%.
MC III-16.)
a.) The West median is between 15 and 16, and the East median is between 15 and 16. The medians are similar.
b.) West: spread out (range = 22 - 12 = 10), skewed to the right, unimodal. East: clustered (range = 19 - 12 = 7), symmetric, unimodal. So, the West has more variability.
c.) Since the two regions have similar medians and the West is skewed to the right, the West should have the higher mean.
MC III-17.)
a.) SOCS. Shape: skewed to the right. Outliers: one tip of around \$20 is an outlier. Center: the median is around \$2.5 to \$5. Spread: most tips are less than \$5, and the range is 0 to 22.5.
b.) mean has right dollars, and median is unchanged, since it is around 2.5~5, and \$8 and \$18 are on 60 6
MC III-18.) C. Half of the children are in the middle 50%.
MC III-19.) B. 45%
MC III-20.) B.
MC III-21.) B.
MC III-22.) A.
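A quick numerical check of the scaling rules in MC III-12, assuming $k > 0$: multiply a small hypothetical data set by $k$ and compare the summaries. The population versions `pstdev`/`pvariance` from Python's `statistics` module are used here; the sample standard deviation `stdev` scales by $k$ in the same way.

```python
from math import isclose
from statistics import mean, median, pstdev, pvariance

x = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical original data set
k = 3                          # any constant k > 0
y = [k * v for v in x]         # every value multiplied by k

print("mean:    ", mean(x), "->", mean(y))            # multiplied by k
print("SD:      ", pstdev(x), "->", pstdev(y))        # multiplied by k
print("variance:", pvariance(x), "->", pvariance(y))  # multiplied by k**2
print("median:  ", median(x), "->", median(y))        # multiplied by k
```

The mean goes from 5 to 15, the standard deviation from 2 to 6, the variance from 4 to 36, and the median from 4.5 to 13.5, so all three measures named in the problem change.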

Part 2 Quarterly Exam Questions
MULTIPLE-CHOICE QUESTIONS
I. SAMPLING
MC I-1.) [APSTATSMC2002-9] A volunteer for a mayoral candidate's campaign periodically conducts polls to estimate the proportion of people in the city who are planning to vote for this candidate in the upcoming election. Two weeks before the election, the volunteer plans to double the sample size in the polls. The main purpose of this is to
(A) reduce nonresponse bias
(B) reduce the effects of confounding variables
(C) reduce bias due to the interviewer effect
(D) decrease the variability in the population
(E) decrease the standard deviation of the sampling distribution of the sample proportion
MC I-2.) [APSTATSMC ] A high school statistics class wants to conduct a survey to determine what percentage of students in the school would be willing to pay a fee for participating in after-school activities. Twenty students are randomly selected from each of the freshman, sophomore, junior, and senior classes to complete the survey. This plan is an example of which type of sampling?
(A) Cluster
(B) Convenience
(C) Simple random
(D) Stratified random
(E) Systematic
MC I-3.) [APSTATSMC ] Jason wants to determine how age and gender are related to political party preference in his town. Voter registration lists are stratified by gender and age-group. Jason selects a simple random sample of 50 men from the 20 to 29 age-group and records their age, gender, and party registration (Democratic, Republican, neither). He also selects an independent simple random sample of 60 women from the 40 to 49 age-group and records the same information. Of the following, which is the most important observation about Jason's plan?
(A) The plan is well conceived and should serve the intended purpose.
(B) His samples are too small.
(C) He should have used equal sample sizes.
(D) He should have randomly selected the two age groups instead of choosing them nonrandomly.
(E) He will be unable to tell whether a difference in party affiliation is related to differences in age or to the difference in gender.

MC I-4.) [APSTATSMC ] A study of existing records of 27,000 automobile accidents involving children in Michigan found that about 10 percent of children who were wearing a seatbelt (group SB) were injured and that about 15 percent of children who were not wearing a seatbelt (group NSB) were injured. Which of the following statements should NOT be included in a summary report about this study?
(A) Driver behavior may be a potential confounding factor.
(B) The child's location in the car may be a potential confounding factor.
(C) This study was not an experiment, and cause-and-effect inferences are not warranted.
(D) This study demonstrates clearly that seat belts save children from injury.
(E) Concluding that seatbelts save children from injury is risky, at least until the study is independently replicated.
MC I-5.) [APSTATSMC ] Which of the following is NOT a characteristic of stratified sampling?
(A) Random sampling is part of the sampling procedure.
(B) The population is divided into groups of units that are similar on some characteristic.
(C) The strata are based on facts known before the sample is selected.
(D) Each individual unit in the population belongs to one and only one of the strata.
(E) Every possible subset of the population, of the desired sample size, has an equal chance of being selected.
MC I-6.) [APSTATSMC ] A polling firm is interested in surveying a representative sample of registered voters in the United States. The firm has automated its sampling so that random phone numbers within the United States are called. Each time a number is called, the procedure below is followed. If there is no response or if an answering machine is reached, another number is automatically called. If a person answers, a survey worker verifies that the person is at least 18 years of age. If the person is not at least 18 years of age, no response is recorded, and another number is called. If the person is at least 18 years of age, that person is surveyed.
Some people claim the procedure being used does not permit the results to be extended to all registered voters. Which of the following is NOT a legitimate concern about the procedure being used?
(A) Registered voters with children under the age of 18 years may be underrepresented in the sample.
(B) Registered voters with unlisted telephone numbers may be underrepresented in the sample.
(C) Registered voters who have more than one telephone number may be overrepresented in the sample.
(D) Registered voters who live in households consisting of more than one voter may be underrepresented.
(E) People who are not registered to vote may bias the sample results.

MC I-7.) [APSTATSMC ] When using a one-sample t-procedure to construct a confidence interval for the mean of a finite population, a condition is that the population size be at least 10 times the sample size. The reason for the condition is to ensure that
(A) the sample size is large enough
(B) the central limit theorem is applicable for the sample mean
(C) the sample standard deviation is a good approximation of the population standard deviation
(D) the degree of dependence among observations is negligible
(E) the sampling method is not biased
MC I-8.) [APSTATSMC2013-2] A school principal wanted to investigate student opinion about the food served in the school cafeteria. The principal selected at random 50 first-year students, 50 second-year students, 50 third-year students, and 50 fourth-year students to complete a questionnaire. Which of the following best describes the principal's sampling plan?
(A) A stratified random sample
(B) A simple random sample
(C) A cluster sample
(D) A convenience sample
(E) A systematic sample
MC I-9.) [APSTATSMC ] A certain motel is roughly 20 miles from the entrance to Yosemite National Park. The motel manager wants to get a better estimate of the distance and asks five people to each measure the distance, to the nearest tenth of a mile, using the odometer in his or her car. The manager will use the median of the five measurements as the estimate of the distance. Which of the following statements is NOT a statistical justification for the manager's plan?
(A) Odometer reading should be considered a variable when used to measure this distance.
(B) The median of the five measurements is more likely to be close to the actual distance than is a single measurement.
(C) The actual distance should be considered a variable, and taking five measurements allows the manager to estimate the variability in the actual distance.
(D) If one or two odometers give inaccurate readings, the estimate still should be fairly close to the actual distance.
(E) The manager can get some indication of how far off the estimate might be.

MC I-10.) [APSTATSMC ] A regional transportation authority is interested in estimating the mean number of minutes working adults in the region spend commuting to work on a typical day. A random sample of working adults will be selected from each of three strata: urban, suburban, and rural. Selected individuals will be asked the number of minutes they spend commuting to work on a typical day. Why is stratification used in this situation?
(A) To remove bias when estimating the proportion of working adults living in urban, suburban, and rural areas.
(B) To remove bias when estimating the mean commuting time.
(C) To reduce bias when estimating the mean commuting time.
(D) To decrease the variability in estimates of the proportion of working adults living in urban, suburban, and rural areas.
(E) To decrease the variability in estimates of the mean commuting time.

II. DESIGN OF STUDIES
MC II-1.) [APSTATSMC ] An experiment will be conducted to determine whether children learn their multiplication facts better by practicing with flash cards or by practicing on a computer. Children who volunteer for the experiment will be randomly assigned to one of the two treatments. Because the children's gender may affect the outcome, there will be blocking by gender. After practice, the children will be given a test on their multiplication facts. Why will it not be possible to conduct a double-blind experiment?
(A) The experimenter will know whether the child is a boy or a girl and whether he or she used flash cards or the computer.
(B) The child will know whether he or she is a boy or a girl.
(C) The child will know whether he or she used flash cards or the computer.
(D) The person who grades the tests will know whether the child was a boy or a girl.
(E) The person who grades the tests will know whether the child used flash cards or the computer.
MC II-2.) [APSTATSMC ]
MC II-3.) [APSTATSMC ] A study of existing records of 27,000 automobile accidents involving children in Michigan found that about 10 percent of children who were wearing a seatbelt (group SB) were injured and that about 15 percent of children who were not wearing a seatbelt (group NSB) were injured. Which of the following statements should NOT be included in a summary report about this study?
(A) Driver behavior may be a potential confounding factor.
(B) The child's location in the car may be a potential confounding factor.
(C) This study was not an experiment, and cause-and-effect inferences are not warranted.
(D) This study demonstrates clearly that seat belts save children from injury.
(E) Concluding that seatbelts save children from injury is risky, at least until the study is independently replicated.

MC II-4.) [APSTATSMC2007-9] A television news editor would like to know how local registered voters would respond to the question, "Are you in favor of the school bond measure that will be voted on in an upcoming special election?" A television survey is conducted during a break in the evening news by listing two telephone numbers side by side on the screen, one for viewers to call if they approve of the bond measure, and the other to call if they disapprove. This survey method could produce biased results for a number of reasons. Which one of the following is the most obvious reason?
(A) It uses a stratified sample rather than a simple random sample.
(B) People who feel strongly about the issue are more likely to respond.
(C) Viewers should be told about the issues before the survey is conducted.
(D) Some registered voters who call might not vote in the election.
(E) The wording of the question is biased.
MC II-5.) [APSTATSMC ] Automobile brake pads are either metallic or nonmetallic. An experiment is to be conducted to determine whether the stopping distance is the same for both types of brake pads. In previous studies, it was determined that car size (small, medium, large) is associated with stopping distance, but car type (sedan, wagon, coupe) is not associated with stopping distance. The experiment would be best done
(A) by blocking on car size
(B) by blocking on car type
(C) by blocking on stopping distance
(D) by blocking on brake pad type
(E) without blocking
MC II-6.) [APSTATSMC ] A group of students has 60 houseflies in a large container and needs to assign 20 to each of the three groups labeled A, B, and C for an experiment. They can capture the flies one at a time when the flies enter a side chamber in the container that is baited with food. Which of the following methods will be most likely to result in three comparable groups of 20 houseflies each?
(A) Label the first 20 flies caught as group A, the second 20 caught as group B, and the third 20 caught as group C.
(B) Write the letters A, B, and C on separate slips of paper. Randomly pick one of the slips of paper and assign the first 20 flies caught to that group. Pick another slip and assign the next 20 flies caught to that group. Assign the remaining flies to the remaining group.
(C) When each fly is caught, roll a die. If the die shows an even number, the fly is labeled A. If the die shows an odd number, the fly is labeled B. When 20 flies have been labeled A and 20 have been labeled B, the remaining flies are then labeled C.
(D) Place each fly in its own numbered container (numbered from 1 to 60) in the order that it was caught. Write the numbers from 1 to 60 on slips of paper, put the slips in a jar, and mix them well. Pick 20 numbers out of the jar. Assign the flies in the containers with those numbers to group A. Pick 20 more numbers and assign the flies in the containers with those numbers to group B. Assign the remaining 20 flies to group C.
(E) When each fly is caught, roll a die. If the die shows a 1 or 2, the fly is labeled A. If the die shows a 3 or 4, the fly is labeled B. If the die shows a 5 or 6, the fly is labeled C. Repeat this process for all 60 flies.

MC II-7.) [APSTATSMC ] A randomized block design will be used in an experiment to compare two lotions that protect people from getting sunburned. Which of the following should guide the formation of the blocks?
(A) Participants in the same block should receive the same lotion.
(B) Participants should be randomly assigned to the blocks.
(C) Participants should be kept blind as to which block they are in.
(D) Participants within each block should be as similar as possible with respect to how easily they get sunburned.
(E) Participants within each block should be as different as possible with respect to how easily they get sunburned.
MC II-8.) [APSTATSMC ] The dining and nutrition staff at the University of Georgia plans to survey students to get their opinion on the new nutrition program introduced this semester at each of the on-campus dining halls. They are interested in getting feedback from students living both on-campus and off-campus about the new gluten-free and vegetarian options offered at each meal. Which of the following sampling methods is the most appropriate for accomplishing this?
(A) Hand out a survey to every 10th student that enters each dining hall on a specified day.
(B) Group students by housing status, one group representing those living on campus and the other representing those living off campus. Give a survey to 100 randomly selected students from each group.
(C) On equally sized slips of paper, write down the names of all the dormitories on campus as well as all the apartment complexes off campus. Put all the names in a hat, mix them well, and draw out five of them. Give a survey to all students in the five randomly selected buildings.
(D) Hand out a survey to the first 50 students that enter each dining hall on a specified day.
(E) Create a Facebook page for each dining hall where students can post their comments.
MC II-9.) [APSTATSMC ] A university statistics professor wants to know if including review problems in each set of homework problems (treatment I) is more effective than including only new problems (treatment II). He teaches three sections of the course: a morning, an afternoon, and an evening section, each with 30 students. Within each section the professor randomly assigns 15 students to treatment I and 15 students to treatment II. Compared to randomly assigning 45 students to each treatment, what is the advantage of randomly assigning 15 students to each treatment within each section?
(A) Random assignment within section eliminates the placebo effect.
(B) Random assignment within section allows the professor to generalize the results to all sections.
(C) Random assignment within section permits the professor and students to be blinded as to the treatment group assignment.
(D) Random assignment within section accounts for possible differences in performance due to the time of day the class meets.
(E) Random assignment within section reduces the effect of nonresponse bias.

MC II-10.) [APSTATSMC ] Nearly 12,000 high school students across 11 different countries were surveyed about both their sleeping habits and their performance in school. Based on the results, researchers concluded that a lack of sleep is linked to students earning poor grades in school. Which of the following statements is true?
(A) This is an observational study. Therefore, researchers cannot conclude that a lack of sleep causes poor grades.
(B) This is an observational study. Therefore, researchers can conclude that a lack of sleep causes poor grades.
(C) This study is a well-designed experiment. Therefore, researchers cannot conclude that a lack of sleep causes poor grades.
(D) This study is a well-designed experiment. Therefore, researchers can conclude that a lack of sleep causes poor grades.
(E) This is neither an observational study nor a well-designed experiment.

III. EXPLORING DATA
MC III-1.) [APSTATSMC ]
MC III-2.) [APSTATSMC ]

MC III-3.) [APSTATSMC ]
MC III-4.) [APSTATSMC ] The boxplots shown above summarize two data sets, I and II. Based on the boxplots, which of the following statements about these two data sets CANNOT be justified?
(A) The range of data set I is equal to the range of data set II.
(B) The interquartile range of data set I is equal to the interquartile range of data set II.
(C) The median of data set I is less than the median of data set II.
(D) Data set I and data set II have the same number of data points.
(E) About 75% of the values in data set II are greater than or equal to about 50% of the values in data set I.

MC III-5.) [APSTATSMC ] A small town employs 34 salaried, nonunion employees. Each employee receives an annual salary increase of between \$500 and \$2000 based on a performance review by the mayor's staff. Some employees are members of the mayor's political party, and the rest are not. Students at the local high school form two lists, A and B, one for the raises granted to employees who are in the mayor's party, and the other for raises granted to employees who are not. They want to display a graph (or graphs) of the salary increases in the student newspaper that readers can use to judge whether the two groups of employees have been treated in a reasonably equitable manner. Which of the following displays is least likely to be useful to readers for this purpose?
(A) Back-to-back stemplots of A and B
(B) Scatterplot of B versus A
(C) Parallel boxplots of A and B
(D) Histograms of A and B that are drawn to the same scale
(E) Dotplots of A and B that are drawn to the same scale
MC III-6.) [APSTATSMC ] The figure above shows a cumulative relative frequency histogram of 40 scores on a test given in an AP Statistics class. Which of the following conclusions can be made from the graph?
(A) There is greater variability in the lower 20 test scores than in the higher 20 test scores.
(B) The median test score is less than 50.
(C) Sixty percent of the students had test scores above 80.
(D) If the passing score is 70, most students did not pass the test.
(E) The horizontal nature of the graph for the test scores of 60 and below indicates that those scores occurred most frequently.

MC III-7.) [APSTATSMC ] The histograms below represent the distribution of five different data sets, each containing 28 integers, from 1 through 7, inclusive. The horizontal and vertical scales are the same for all graphs. Which graph represents the data set with the largest standard deviation?
MC III-8.) [APSTATSMC ] Five estimators for a parameter are being evaluated. The true value of the parameter is 0. Simulations of 100 random samples, each of size n, are drawn from the population. For each simulated sample, the five estimates are computed. The histograms below display the simulated sampling distributions for the five estimators. Which simulated sampling distribution is associated with the best estimator for this parameter?

MC III-9.) [APSTATSMC ] The amount of time required for each of 100 mice to navigate through a maze was recorded. The histogram below shows the distribution of times, in seconds, for the 100 mice. Which of the following values is closest to the standard deviation of the times for the 100 mice?
(A) 2.5 seconds
(B) 10 seconds
(C) 20 seconds
(D) 50 seconds
(E) 90 seconds
MC III-10.) [APSTATSMC ] A graph (not shown) of the selling prices of homes in a certain city for the month of April reveals that the distribution is skewed to the left. Which of the following statements is the most reasonable conclusion about the selling prices based on the graph?
(A) The mean is greater than the median.
(B) The median is the average of the first quartile and the third quartile.
(C) There are fewer selling prices between the first quartile and the median than there are between the median and the third quartile.
(D) There are more selling prices that are less than the mean than selling prices that are greater than the mean.
(E) The value of maximum minus third quartile is less than the value of first quartile minus minimum.

FRQ 3.3.) [APSTATSFRQ ] Two large corporations, A and B, hire many new college graduates as accountants at entry-level positions. In 2009 the starting salary for an entry-level accountant position was \$36,000 a year at both corporations. At each corporation, data were collected from 30 employees who were hired in 2009 as entry-level accountants and were still employed at the corporation five years later. The yearly salaries of the 60 employees in 2014 are summarized in the boxplots below.
a.) Write a few sentences comparing the distributions of the yearly salaries at the two corporations.
b.) Suppose both corporations offered you a job for \$36,000 a year as an entry-level accountant.
(i) Based on the boxplots, give one reason why you might choose to accept the job at corporation A.
(ii) Based on the boxplots, give one reason why you might choose to accept the job at corporation B.

FRQ 3.4.) [APSTATSFRQ , Investigative Task ] Tropical storms in the Pacific Ocean with sustained winds that exceed 74 miles per hour are called typhoons. Graph A below displays the number of recorded typhoons in two regions of the Pacific Ocean, the Eastern Pacific and the Western Pacific, for the years from 1997 to
a.) Compare the distributions of yearly frequencies of typhoons for the two regions of the Pacific Ocean for the years from 1997 to
b.) For each region, describe how the yearly frequencies changed over the time period from 1997 to

A moving average for data collected at regular time increments is the average of data values for two or more consecutive increments. The 4-year moving averages for the typhoon data are provided in the table below. For example, the Eastern Pacific 4-year moving average for 2000 is the average of 22, 16, 15, and 21, which is equal to 18.5.
c.) Show how to calculate the 4-year moving average for the year 2010 in the Western Pacific. Write your value in the appropriate place in the table.
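The moving-average definition above can be sketched in Python as follows. This is a minimal illustration, assuming the four Eastern Pacific counts in the worked example (22, 16, 15, 21) are the 1997 through 2000 values in that order, and that each average is reported at the last year of its window, as in the example for 2000. `moving_averages` is a hypothetical helper name.

```python
def moving_averages(counts_by_year, window=4):
    """Map each year to the average of the `window` consecutive years ending there.

    Assumes the years are consecutive with no gaps; the first (window - 1) years
    have no moving average.
    """
    years = sorted(counts_by_year)
    averages = {}
    for i in range(window - 1, len(years)):
        span = years[i - window + 1:i + 1]
        averages[years[i]] = sum(counts_by_year[y] for y in span) / window
    return averages


# Eastern Pacific counts from the worked example, assumed to be 1997-2000 in order.
eastern = {1997: 22, 1998: 16, 1999: 15, 2000: 21}
print(moving_averages(eastern))  # {2000: 18.5}
```

For part c.), the same calculation would use the Western Pacific counts for 2007 through 2010 read from Graph A; those counts are not reproduced in this text.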

d.) Graph B below shows both yearly frequencies (connected by dashed lines) and the respective 4-year moving averages (connected by solid lines). Use your answer in part (c) to complete the graph.
e.) Consider graph B.
i) What information is more apparent from the plots of the 4-year moving averages than from the plots of the yearly frequencies of typhoons?
ii) What information is less apparent from the plots of the 4-year moving averages than from the plots of the yearly frequencies of typhoons?

Answers Part 2 Quarterly Exam Questions
I. SAMPLING
MC I-1.) E. The standard deviation of the sampling distribution of the sample proportion is $\sigma_{\hat{p}} = \sqrt{p_0 q_0 / n}$, so doubling $n$ decreases it.
MC I-2.) D
MC I-3.) E. Jason probably should have chosen the same age groups for both men and women.
MC I-4.) D. You can't claim cause and effect from an observational study.
MC I-5.) E
MC I-6.) B
MC I-7.) D
MC I-8.) A
MC I-9.) C
MC I-10.) E
II. DESIGN OF STUDIES
MC II-1.) C
MC II-2.) E
MC II-3.) E
MC II-4.) B
MC II-5.) A
MC II-6.) D
MC II-7.) D
MC II-8.) B
MC II-9.) D
MC II-10.) A
III. EXPLORING DATA
MC III-1.) D
MC III-2.) B
MC III-3.) A
MC III-4.) D
MC III-5.) B
MC III-6.) A
MC III-7.) D
MC III-8.) B
MC III-9.) B
MC III-10.) E
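The answer to MC I-1 rests on the standard deviation of the sampling distribution of the sample proportion, $\sqrt{p(1-p)/n}$. The sketch below plugs in a hypothetical true proportion and poll size (the problem gives neither) to show what doubling $n$ does; `sd_p_hat` is a hypothetical helper name.

```python
import math


def sd_p_hat(p, n):
    """Standard deviation of the sampling distribution of the sample proportion."""
    return math.sqrt(p * (1 - p) / n)


p = 0.5   # hypothetical true proportion of voters favoring the candidate
n = 400   # hypothetical original poll size
before = sd_p_hat(p, n)
after = sd_p_hat(p, 2 * n)
print(f"n = {n}: {before:.4f}; n = {2 * n}: {after:.4f}; ratio = {after / before:.4f}")
```

Doubling $n$ multiplies the standard deviation by $1/\sqrt{2} \approx 0.707$ rather than cutting it in half, and it does nothing to reduce bias, which is why choices (A) through (C) do not apply.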

The following answers/solutions are from the College Board. Your answers/solutions could vary.
FRQ 4.1.)
a.) The median is less affected by skewness and outliers than the mean. With a variable such as income, a small number of very large incomes could dramatically increase the mean but not the median. Therefore, the median would provide a better estimate of a typical income value.
b.) Method 2 is better than Method 1. A sample obtained from Method 1 could be biased because of the voluntary nature of the response. It is plausible that class members with larger incomes might be more likely to return the form than class members with smaller incomes. The mean income for such a sample would overestimate the mean income of all class members. With Method 2, despite the smaller sample size, the random selection is likely to result in a sample that is more representative of the entire class and produce an unbiased estimate of mean yearly income of all class members.
FRQ 4.2.)
a.) The study was an experiment because treatments (D-cycloserine or placebo) were imposed by the researchers on the people with acrophobia.
b.) No. The experiment was designed to compare the D-cycloserine group with a control group that received the placebo. The researchers can conclude that the D-cycloserine pill and two therapy sessions show significantly more improvement than a placebo and two therapy sessions. However, there is no basis for comparison with another group of people with acrophobia who received eight therapy sessions and no pill.
c.) One example is that if the therapists were allowed to choose who received the placebo and who received D-cycloserine, they might assign the people with more severe acrophobia to one of the groups and the people with less severe acrophobia to the other group. Thus, the improvement after only two therapy sessions could be related to the initial severity of the acrophobia rather than to the effects of D-cycloserine.
FRQ 4.3.)
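The FRQ 4.1 a.) argument (a few very large incomes pull the mean up but barely move the median) can be checked with a tiny example. The incomes below are invented for illustration only; they are not from the problem.

```python
from statistics import mean, median

# Hypothetical yearly incomes (dollars) for a handful of class members.
incomes = [42_000, 48_000, 51_000, 55_000, 60_000]
with_one_large = incomes + [2_000_000]  # add a single very large income

print("mean:  ", mean(incomes), "->", mean(with_one_large))
print("median:", median(incomes), "->", median(with_one_large))
```

Adding one very large income moves the mean from 51,200 to 376,000, while the median only shifts from 51,000 to 53,000, so the median stays a better description of a typical income.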

