Name:
Exam Score:

Instructions: This exam covers the material from Chapters 1 through 3. Please read each question carefully before you attempt to solve it. Remember that you must show all of your work clearly in order to get credit. Multiple-choice answers without explanations will receive zero points. Any problem that requires data will say so. The data can be found on the class website (www.math-tek.com). The exam is closed book. Good luck!

Problem 1: Researchers collected data on 16,500 high school students in the US in an attempt to identify the determinants of academic performance. The variables of interest in the study included each student's cumulative GPA, family income, teacher quality, the number of laboratories in the school attended, neighborhood crime rate, GPA of friends, and the level of education of the student's parents; the parents' educational level was recorded using labels such as high school graduate, college graduate, and so on. Identify the following.
a. Population
b. Quantitative independent variable? Explain why the variable you chose is quantitative.
c. Categorical independent variable? Explain why the variable you chose is categorical.
d. Confounding variables. What makes them confounding variables?
e. Is this study an experiment? Explain your answer.
f. Can a causal relationship be established?

a. All high school students in the US.
b. Family income: monetary information is typically reported as a numerical quantity.
c. Parents' educational attainment: we are told the information was recorded using labels.
d. Student study time (study schedule): study time, which was not among the recorded variables, can probably explain a large portion of the variation in academic performance.
e. This study is not an experiment. Students cannot realistically be randomly assigned to different types of families.
f. No. Because this is not an experiment, we cannot establish a causal relationship.
Problem 2: Multiple Choice
According to the following data table, which variable(s) is (are) categorical? Explain your answer.

Summary Statistics
Age   Gender   Shoe Size   Ethnicity
18    1        10          1
23    0        7           0
21    0        6           2
19    1        11          1
20    1        10          3

a. Gender and ethnicity
b. None are categorical because there are only numbers in the table
c. Gender, shoe size, and ethnicity
d. Gender
e. All of the above
f. None of the above

Answer is a. Gender and ethnicity are categorical (e.g., male, female, Asian, African American, etc.), but in this example the categories are coded as numbers.

Problem 3: Multiple Choice
What would you expect the shape of the following distribution to look like? Explain your answer.
The distribution of the time (in minutes) it takes to drive to work using the same route each day.
a. Right Skewed
b. Left Skewed
c. Symmetric
d. None of the above

Answer is c. The distribution of the time it takes to drive to work using the same route each day should be roughly symmetric because the time you leave your house is probably the same each day. The commute times will be very similar on a day-to-day basis.

Problem 4: Multiple Choice
A large state university conducted a survey among its students and received 300 responses. The survey asked the students to provide the following information: Age, Year in School (Freshman, Sophomore, Junior, Senior), Gender, GPA. What type of graph would you use to describe the variables Gender and Year in School? Explain your answer.
a. A side-by-side histogram should be used since these are two numerical variables.
b. A side-by-side bar chart should be used since these are two numerical variables.
c. A side-by-side histogram should be used since these are two categorical variables.
d. A side-by-side bar chart should be used since these are two categorical variables.

Answer is d.

Problem 5: Multiple Choice
What is the difference between a histogram and a relative frequency histogram? Explain your answer.
a. A histogram uses counts to record how many observations are in a data set, and a relative frequency histogram uses proportions.
b. A histogram uses categories to record how many observations are in a data set, and a relative frequency histogram uses counts.
c. A histogram uses numbers to record how many observations are in a data set, and a relative frequency histogram uses categories.
d. A histogram uses proportions to record how many observations are in a data set, and a relative frequency histogram uses counts.
e. None of the above

Answer is a.

Problem 6: Multiple Choice
Order the following histograms from least to most variability. Explain your answer.
a. (ii), (i), (iii)
b. (iii), (i), (ii)
c. (ii), (iii), (i)
d. (i), (ii), (iii)
e. None of the above

Answer is b.

Problem 7: Multiple Choice
What percentage of the participants had a heart rate greater than 130 bpm? Show your calculation and explain your answer.
a. 13%
b. 53%
c. 50%
d. 33%
e. 27%
f. 10%

The answer is b: (number over 130 bpm) / n = 8/15 ≈ 53%.
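The arithmetic behind the Problem 7 answer can be sketched in Python. The heart-rate data themselves are not reproduced here, so this sketch uses only the two counts stated in the answer (8 participants above 130 bpm out of n = 15); the function name `percent_of_total` is a hypothetical helper, not part of the exam.

```python
def percent_of_total(count, total):
    """Return count / total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * count / total


# Counts taken from the stated answer to Problem 7.
over_130 = 8   # participants with a heart rate above 130 bpm
n = 15         # total number of participants

pct = percent_of_total(over_130, n)
print(f"{pct:.0f}%")  # prints "53%"
```

Rounding 53.33...% to the nearest whole percent gives choice b.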
Problem 8: The Executions Excel file shows the total number of executions in the United States each year from 1977 to 2014.
a. Find the median and interpret it.
b. Find the IQR to measure the variability in the number of executions. What can you discern from this information? (Hint: It might be easier to interpret the IQR as an interval.)
c. What is the mean number of executions? Interpret this number.
d. How do the mean and median compare to one another? Explain your reasoning. Which is a better measure of center in this case, the mean or the median? (Hint: histogram)
e. Which year had the highest number of executions? Which year had the lowest number of executions?

a. The median number of executions in the US (per year) is 38.
b. IQR = Q3 − Q1 = 55.25 − 16.50 = 38.75. In about 50% of the years from 1977 to 2014, the number of executions was between roughly 17 and 55. That means that the number of executions carried out varies quite a lot from year to year.
c. The mean number of executions in the US (per year) is 36.
d. The mean and median are relatively close to one another. The distribution is slightly right skewed, but it also resembles a symmetric distribution. In this case, either the mean or the median would serve well as a measure of center.
e. The highest number of executions took place in 1999 (98 executions) and the lowest in 1978 and 1980 (0 executions in both years).

Problem 9: The Behavioral Risk Factor Surveillance System (BRFSS) is the nation's premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. We will focus on a random sample of 20,000 people from the BRFSS survey conducted in the year 2000. There are over 200 variables in this data set, but we will work with a small subset. Use the BRFSS Excel file to answer the following questions.
a. Consider the weight and wtdesire variables, which are the weight and the desired weight of the survey participants. Calculate summary statistics for each variable (i.e., the five-number summary and the mean). On average, are people heavier than their desired weight? Explain using your summary statistics.
b. Create a histogram of people's weights using a class width of 20. What is the shape of the distribution?
c. Are there any outliers? If so, how much do these individuals weigh?

a. The summary statistics are given in the table below. Notice that the average person weighs about 170 lbs but desires to weigh about 155 lbs, which means that on average people tend to be heavier than the weight they desire to be.

Summary Statistics
Statistic   weight   wtdesire
Min.          68.0       68.0
1st Qu.      140.0      130.0
Median       165.0      150.0
Mean         169.7      155.1
3rd Qu.      190.0      175.0
Max.         500.0      680.0

b. The histogram is slightly right-skewed, but you can also argue that it is relatively symmetric.

[Figure: Histogram of Participant Weight. Horizontal axis: Weight (pounds); vertical axis: Frequency.]

c. There are two outliers, who weigh 495 and 500 pounds.

Problem 10: The standard deviation for a sample is given by the formula

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

a. Clearly explain what the numerator of this formula calculates and interpret the calculation. Do the same for the denominator, and finally, do the same for the entire formula.
b. Give an example in which you interpret the standard deviation; use made-up values for the mean and standard deviation in your example.

a. The numerator is the sum of the squared distances of each observation from the sample mean. The denominator is the number of observations minus 1. Taken together, the formula measures the dispersion of the data about the mean.
b. Imagine that the mean score for a quiz is 70% and the standard deviation is 10 percentage points. This tells us that the quiz scores sit, on average, about 10 percentage points away from 70%.
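The pieces of the Problem 10 formula can be traced step by step in a short Python sketch. The five quiz scores below are made up for illustration (they are not exam data) and were chosen so that the mean is 70 and the sample standard deviation is 10, matching the example in part b. The function name `sample_sd` is a hypothetical helper; the result is checked against the standard library's `statistics.stdev`.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Numerator: sum of squared distances of each observation from the mean.
    sum_sq = sum((x - mean) ** 2 for x in values)
    # Denominator: number of observations minus 1.
    return math.sqrt(sum_sq / (n - 1))


# Hypothetical quiz scores (percent), invented for this illustration.
scores = [60, 60, 70, 80, 80]

mean_score = sum(scores) / len(scores)
sd_score = sample_sd(scores)

print(mean_score)  # 70.0
print(sd_score)    # 10.0

# Cross-check against the standard library implementation.
assert math.isclose(sd_score, statistics.stdev(scores))
```

Here the numerator is 100 + 100 + 0 + 100 + 100 = 400 and the denominator is 5 − 1 = 4, so s = sqrt(400 / 4) = 10.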