ECO220 Mid-Term Test (June 29, 2005) SOLUTIONS

T (1) Adding up the heights of the bars in a frequency histogram yields the sample size.
F (2) In general, as the sample size increases you should reduce the number of bins in a density histogram.
F (3) The standard error of the population mean is given by: [formula garbled in source; only the symbol "N" survives]
T (4) The interquartile range is robust to outliers.
F (5) A 90% confidence interval estimator is more likely to exclude the true population parameter with a sample size of 10 compared to a sample size of 100.
F (6) Inferences should not be based on very small samples because estimators from such samples are biased.
T (7) If X is a normal random variable then Y = [expression garbled in source: "2 * 0 e"] is also a normal random variable.
T (8) If $X_1, X_2, X_3, X_4, X_5$ are independent random variables with identical probability distributions, then $E[5X_1]$ is equivalent to $E\left[\sum_{i=1}^{5} X_i\right]$.
F (9) If $X_1, X_2, X_3, X_4, X_5$ are independent random variables with identical probability distributions, then $V[5X_1]$ is equivalent to $V\left[\sum_{i=1}^{5} X_i\right]$.
F (10) If a purely random sample is selected and the sample does not suffer from nonresponse or selection bias, then you can conclude that you have experimental data and not observational data.

b (1) In which case is it reasonable to expect that the sample median will be greater than the sample mean?
b (2) The sample standard deviation and the coefficient of variation of the data shown in this histogram are closest to which of the following?
e (3) How many extra units should the firm expect to sell if it runs 10 additional promotions?
a (4) If the baggage of all passenger tiers is handled in the same way, what is the probability (to the nearest hundredth) that of the 10 people who had their bags lost by the airline, none of them are super elite?
c (5) Consider a population with a variance of 100. To obtain a 95% interval estimator of the population mean such that the difference between the upper and lower confidence limits is no more than 2 units would require a sample size of at least:
c (6) Consider the data summarized in the tabulation below. Which of the following is a reasonable inference?
d (7) Which of the following statements is most plausible?
e (8) Based on these survey results what is the point estimate of the total dollar amount (includes bonuses and salary increases) that the company should set aside if it wants to meet its employees' expectations?
b (9) Which of the following scatter diagrams is plausible when X and Y are measured in 1000s of dollars:
d (10) What is the s.d. of the amount of additional compensation (includes bonus and salary increase) expected by employees?
b (11) If on average Q is 1 million units, then what is the expected value of total costs (TC)?
b (12) For which of these populations would a sample size of 10,000 yield the narrowest (most precise) 95% confidence interval estimator of the population mean?
d (13) Consider the sample of data shown in this graph. The 95% confidence interval estimator (rounded to nearest tenth) of μ is:
e (14) Suppose X and Y are independent. X ~ U[0, 10] and Y ~ U[-10, b] (upper bound b garbled in source: 0 or 10), where U[a, b] denotes the uniform distribution and its parameters. If a sample of 100 observations of each is taken, how is the sum of the sample means distributed?
a (15) Suppose that on an average Saturday night during peak season, 24 of a hotel's 304 rooms request room service. What is the probability that on a given Saturday night during peak season 35 or more rooms request room service?

(1) Call Diane's sales X and Dan's sales Y. Must find V[X + Y].

$V[X + Y] = V[X] + V[Y] + 2\,COV[X, Y]$
$E[X] = 0.5 \cdot 1 + 0.5 \cdot 2 = 1.5$
$E[Y] = 0.6 \cdot 1 + 0.4 \cdot 2 = 1.4$
$V[X] = 0.5(1 - 1.5)^2 + 0.5(2 - 1.5)^2 = 0.25$
$V[Y] = 0.6(1 - 1.4)^2 + 0.4(2 - 1.4)^2 = 0.24$
$COV[X, Y] = 0.4(1 - 1.5)(1 - 1.4) + 0.1(1 - 1.5)(2 - 1.4) + 0.2(2 - 1.5)(1 - 1.4) + 0.3(2 - 1.5)(2 - 1.4) = 0.1$
$V[X + Y] = 0.25 + 0.24 + 2 \cdot 0.1 = 0.69$
$0.69^{0.5} = 0.83$

The s.d. of total sales per month is 0.83 sales.
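The arithmetic in (1) can be checked numerically. The sketch below assumes that each person sells either 1 or 2 units per month and that the joint probabilities are the four weights appearing in the covariance sum (0.4, 0.1, 0.2, 0.3). The function and variable names are illustrative and not part of the test. It computes V[X + Y] two ways: from the decomposition used above, and directly from the distribution of the total.

```python
from math import isclose, sqrt

# Joint probabilities P(X = x, Y = y), read off the four covariance terms.
# Assumption: each person sells either 1 or 2 units per month.
joint = {(1, 1): 0.4, (1, 2): 0.1, (2, 1): 0.2, (2, 2): 0.3}


def variance_of_sum(joint):
    """Return V[X + Y] computed as V[X] + V[Y] + 2*COV[X, Y]."""
    ex = sum(p * x for (x, _), p in joint.items())
    ey = sum(p * y for (_, y), p in joint.items())
    vx = sum(p * (x - ex) ** 2 for (x, _), p in joint.items())
    vy = sum(p * (y - ey) ** 2 for (_, y), p in joint.items())
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
    return vx + vy + 2 * cov


def variance_of_sum_direct(joint):
    """Return V[X + Y] straight from the distribution of the total X + Y."""
    mean_total = sum(p * (x + y) for (x, y), p in joint.items())
    return sum(p * (x + y - mean_total) ** 2 for (x, y), p in joint.items())


var_total = variance_of_sum(joint)
sd_total = sqrt(var_total)

# Both routes should agree up to floating-point rounding.
assert isclose(var_total, variance_of_sum_direct(joint))
print(f"V[X + Y] = {var_total:.2f}, s.d. = {sd_total:.2f}")
```

Because the covariance is positive, the variance of the total exceeds V[X] + V[Y] = 0.49; ignoring the covariance term would understate the s.d. of total sales.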
(2) (a) The population contains an observation for each high school student, where the actual number of caffeinated soft drinks that each purchases in a day from any vending machine on school property is recorded. This population would have a discrete distribution because the number of caffeinated soft drinks that a high school student could potentially purchase in a day is: 0, 1, 2, 3, up to some maximum that is not very large (it can certainly be no more than the capacity of the vending machines). This population would have the property that there would be a large number of 0s (students that don't purchase) and lots of students buying 1 or maybe 2 drinks.

Given that the number of successes (purchasing a drink) is likely to be small, one may be tempted to say that it will be Poisson distributed. However, purchases are not likely to be independent (if you already bought one recently then you are less likely to buy another one in the subsequent time period). In addition, across students the important assumption is that all students are independent draws from the SAME Poisson experiment (with the same lambda parameter). If different students have different lambdas, which is likely because not all students have the same taste for caffeinated soft drinks, then the data will have overdispersion (variance > mean) and hence will not be Poisson distributed, which we talked about in Lecture 4. This is another reason to say that it is not likely to be Poisson.

The Binomial distribution is also inappropriate because of the violation of independence and because it is not possible to identify a fixed number of trials. (Even if you wanted to call the breaks in between classes trials this would not work because there is the possibility that during a single trial a student could buy more than one can.)
Hence, other than to note that the population will be discrete, and note its properties (lots of zeros and small integers), we cannot say it will follow a named distribution that we have studied. (Note: the normal distribution would be completely inappropriate. The population is very discrete: most students would buy 0, 1 or 2 drinks. Also, it is not possible for a student to buy a negative number of drinks, which is likely what the normal approximation would predict.)

(b) The population parameter of interest is the population mean (μ). A plausible estimator of this parameter is the sample mean (X-bar).

(c)
(1) Selection Bias #1: Not all students in the relevant population have an equal chance of being included in the sample. Those students that do not purchase caffeinated soft drinks have 0 probability of being included in the sample. This is because even among students that visit the vending machine, this researcher is ONLY selecting those that actually purchase a CAFFEINATED beverage. In addition, students that do not make ANY purchases from the vending machine also have 0 probability of being included in the sample. This introduces a selection bias. In this case NO ONE in the collected sample will have purchased 0 caffeinated soft drinks. This is a big problem given that lots of students in the target population will purchase 0.
(2) Selection Bias #2: Even among students that purchase a caffeinated beverage, the probability of inclusion in the sample is NOT equal. Those students that purchase more caffeinated soft drinks would have a higher probability of being included in the sample: they would be visiting the vending machine more frequently and hence have multiple chances to be included in the random sample. You might remember from class that this bias has a name: an avidity bias (more avid people have a higher probability of inclusion in the sample).
(3) Selection Bias #3: The researcher is standing next to a particular vending machine in the high school. It is likely that different types of students visit different vending machines, which are located in different areas of the school. (For example, if the machine the researcher is standing next to is near the gymnasium then the sample would include a disproportionate number of athletic students.) Again, not all students would have an equal probability of inclusion: the type of students that use that particular vending machine would be disproportionately represented.
(4) Target Population not equal to Sampled Population: The researcher is only sampling students from a particular high school. However, the assignment was to study a typical high school student. To the extent that the school that the researcher selected is different or special in any way (which it likely is given that there is considerable variance across school districts), then this sample will reflect that difference. Students at other high schools have no chance of being included in this sample.
(5) The researcher is asking the wrong question. (i) The researcher is asking how many the student already drank. However, those students that are surveyed earlier in the day would give lower numbers than those surveyed later in the day. The assignment asked how many caffeinated soft drinks would be purchased in an entire day by a typical high school student. If you ask a student how many they have had so far and the day is not over yet, then you have not collected the correct piece of information. (ii) The researcher is asking about soft drinks rather than caffeinated soft drinks. (iii) The researcher is asking how many the student already drank rather than how many the student purchased.
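The combined effect of Selection Bias #1 and #2 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population shares below are hypothetical (they do not come from the test), and the chance of being intercepted at the machine is assumed to be proportional to the number of drinks a student buys. The function name is illustrative.

```python
import random

# Hypothetical population (NOT from the test): number of caffeinated drinks
# bought per day and the share of students buying that many.
PURCHASES = [0, 1, 2, 3]
SHARES = [0.60, 0.25, 0.10, 0.05]

# True population mean: 0*0.60 + 1*0.25 + 2*0.10 + 3*0.05
POP_MEAN = sum(k * p for k, p in zip(PURCHASES, SHARES))


def vending_machine_sample_mean(n, rng):
    """Mean purchases among n students intercepted at the machine.

    Assumption: the chance of inclusion is proportional to the number of
    purchases, so students who buy 0 drinks get weight 0 and are never sampled.
    """
    weights = [k * p for k, p in zip(PURCHASES, SHARES)]
    sample = rng.choices(PURCHASES, weights=weights, k=n)
    return sum(sample) / n


rng = random.Random(220)
print("population mean:", round(POP_MEAN, 2))
for n in (50, 450, 45000):
    # Larger samples settle on the wrong value rather than on POP_MEAN.
    print(n, round(vending_machine_sample_mean(n, rng), 2))
```

Under these assumptions the sample mean converges to E[k²]/E[k] = 1.1/0.6 ≈ 1.83 rather than to the population mean of 0.60, so a larger sample only estimates the wrong quantity more precisely.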
(d) The sample mean would be a biased estimator of the population mean. It is likely that the sample mean would be biased upward: the reason is that we are excluding all students that drink zero from our sample and disproportionately sampling students that drink large numbers of soft drinks. The sample mean would also not be a consistent estimator of the population mean. The reason is that there are many systematic biases present in the sampling plan and survey instrument (discussed in (c)). These will not go away by simply collecting a larger sample. For example, collecting data on 450 randomly selected students instead of 50 will do NOTHING to address the biases in the data collection process. Hence in this case the estimator will be biased and inconsistent.

(3) (a) $P(X < 0) = P\left(Z < \frac{0 - 7}{\sqrt{27}}\right) = P(Z < -1.35) = 0.0890$

(b) $\mu_{\bar{X}} = \mu = 7$, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{27}}{\sqrt{30}} = \frac{3\sqrt{3}}{\sqrt{30}} = \frac{3}{\sqrt{10}} = 0.9487$
$P(\bar{X} < 0) = P\left(Z < \frac{0 - 7}{0.9487}\right) = P(Z < -7.38) \approx 0$

(c) $P(Y < 0) = 2 \cdot \frac{1}{16 - (-2)} = 0.1111$
Note: The probability is found using the formula for the area of a rectangle: A = b*h.

(d) $\mu_{\bar{Y}} = \mu = \frac{16 + (-2)}{2} = 7$, $\sigma^2 = \frac{(16 - (-2))^2}{12} = 27$, $\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}} = \frac{18/\sqrt{12}}{\sqrt{30}} = \frac{\sqrt{27}}{\sqrt{30}} = \frac{3\sqrt{3}}{\sqrt{30}} = \frac{3}{\sqrt{10}} = 0.9487$
$P(\bar{Y} < 0) = P\left(Z < \frac{0 - 7}{0.9487}\right) = P(Z < -7.38) \approx 0$

(e) In (a) and (c) we obtained different answers for the probability of obtaining a single observation less than zero. The reason is that in (a) we used the normal distribution and in (c) we used the uniform distribution. Even though the population distributions in both cases have the same mean and variance, the probabilities were different because of the different shapes of these two distributions. In (b) and (d) we obtained the same answer for the probability of obtaining a sample mean with n = 30 that is less than zero. The reason is that the sampling distribution of the sample mean is the same in both cases: normal. It is normal in (b) because the population was normal. It is (approximately) normal in (d) because of the Central Limit Theorem. Hence, because the shape of the distributions is the same and the parameters are the same, the probabilities must be the same.
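The four probabilities in (3) can be reproduced with Python's standard library. The sketch assumes, as the worked values imply, that X is normal with mean 7 and variance 27, that Y is uniform on [-2, 16], and that n = 30. The variable names are illustrative. Part (d) uses the Central Limit Theorem approximation, not the exact distribution of a mean of uniforms.

```python
from math import sqrt
from statistics import NormalDist

MU, VAR, N = 7.0, 27.0, 30   # mean, variance, and sample size from the solution
A, B = -2.0, 16.0            # assumed uniform bounds giving the same mean and variance

# (a) Single draw from the normal population.
p_a = NormalDist(MU, sqrt(VAR)).cdf(0.0)

# (b) Sample mean of n = 30 normal draws: standard error is sigma / sqrt(n).
std_error = sqrt(VAR / N)
p_b = NormalDist(MU, std_error).cdf(0.0)

# (c) Single draw from U[A, B]: rectangle of base (0 - A) and height 1 / (B - A).
p_c = (0.0 - A) / (B - A)

# (d) Sample mean of n = 30 uniform draws, using the CLT normal approximation.
mu_y = (A + B) / 2
var_y = (B - A) ** 2 / 12
p_d = NormalDist(mu_y, sqrt(var_y / N)).cdf(0.0)

print(f"(a) {p_a:.4f}  (b) {p_b:.2e}  (c) {p_c:.4f}  (d) {p_d:.2e}")
```

The single-observation probabilities (a) and (c) differ because the population shapes differ, while (b) and (d) coincide because both sampling distributions are treated as normal with the same mean and standard error.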