EPE/EDP 660 Exam 3

{2 points} Name ______

You MUST work alone: no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated. Exam is scored out of 100 points.

{3 points} Minitab (or other approved software) output must be included. It must be clearly labeled, with all answers clearly identified. In addition, you must include a copy of your session window. Do NOT include a copy of the worksheet.

Read each question before responding. In order to receive partial credit, work must be shown.

PART A: (23 points)

Fill in each blank on the answer sheet with the best choice. {1 point each blank}

(1) The two branches of statistics are ______ and ______.
(2) The ______ measures the direction and strength of the linear association between two variables.
(3) ______ is the idea that simpler models are easier to understand and appreciate, and therefore have a "beauty" that their more complicated counterparts often lack.
(4) If H0 is true and we reject it, we have made a ______ error.
(5) In a(n) ______ design, the total sum of squares is made up of the treatment sum of squares and the error sum of squares.
(6) Predicting y when the x values are outside the range of experimentation is ______.
(7) We refer to ______ as the error term ε having constant variance σ² for all levels of the independent variables.
(8) If the effect of a 1-unit change in one independent variable depends on the level of the other independent variable, we have a(n) ______.
(9) In ______, the β parameter is interpreted as the percentage change in odds for every 1-unit increase in x_i, holding all other x's fixed.
(10) In a hypothesis test, if the p-value = .94 and you have set alpha at .05, you would ______ the null hypothesis.
(11) ______ occurs when two (or more) independent variables in a regression are related; they measure essentially the same thing.
(12) Variance can be separated into two major components, ______, variability in particular groups, and ______, variability depending on group.
True/False: Determine the correctness of each statement by assigning the best choice, True or False.

Using the following table, answer items 13-15.

ID   Age   Score [0-100%]   Sex   Disease [Relapse or Remission]
1    24    89               F     Relapse
2    32    74               F     Remission
3    36    77               F     Remission
4    28    92               M     Relapse

(13) ID is an ordinal measure.
(14) Sex could be classified as a qualitative variable and a nominal measure.
(15) Score is a ratio measure.
(16) When choosing a measure of central tendency, if the data set has extreme values, the mean would be the best measure.
(17) Range and standard deviation are both measures of variability.
(18) To test if all of the slope parameters are zero, we use an F test.
(19) The value of SST does not change with the model, as it depends only on the values of the dependent variable y.
(20) Once an interaction has been deemed important in a model, we cannot remove any associated first-order terms in the model.
(21) In a completely randomized experimental design with 4 factors and 4 levels, 8 treatments exist.
PART B: Short Answer (30 points)

Answer the questions below. {5 points each}

(1) In hypothesis testing, does rejecting the null hypothesis prove that the research hypothesis is correct? Specifically, can we accept the alternative? Explain.
(2) A colleague conducts a study and finds a positive correlation between income and health. She concludes that higher income causes better health. Is this a suitable conclusion? Explain.
(3) Explain when we might use stepwise regression, and note at least one reason we would need to use caution in drawing inferences from a stepwise model.
(4) In an experimental design, what is the purpose of blocking? Explain.
(5) Consider the assumption of equal population variances in ANOVA. Why is this important? Explain.
(6) In an ANOVA, why is it preferable to use a follow-up analysis such as Tukey's Multiple Comparisons of Means as opposed to multiple t-tests?
PART C: Data Analysis (42 points)

*** Use α = .05 for testing purposes ***

Consider the following data set (posted on the website as FinalData). The High School and Beyond data set includes the following variables: sex (1-male, 2-female), SES (socioeconomic status: 1-low, 2-middle, 3-upper), school type (1-public, 2-private), type of high school program (1-general, 2-academic, or 3-vocational), self-concept scores, and motivation level scores, in addition to test scores on an achievement test in writing. The data are posted as HSB Data for Final under Exams on the website.

1. Descriptive statistics were produced for all the continuous variables, including a correlation matrix. Using the output below, describe the distribution of each variable and their relationship with one another. {5 points}

Descriptive Statistics: self concept, motivation, WRTG

Variable       N     N*   Mean     StDev    Minimum   Q1        Median   Q3
self concept   600   0    0.0049   0.7055   -2.6200   -0.3000   0.0300   0.4400
Motivation     600   0    0.6608   0.3427    0.0000    0.3300   0.6700   1.0000
WRTG           600   0    52.385   9.726    25.500    44.300    54.100   59.900

Variable       Maximum   Skewness   Kurtosis
self concept   1.1900    -0.90       1.56
Motivation     1.0000    -0.59      -0.88
WRTG           67.100    -0.47      -0.70

Correlations: self concept, motivation, WRTG

             self concept   motivation
Motivation   0.289
WRTG         0.019          0.254
2. A multiple regression equation was computed to explain the variation in Self-Concept, with a summary residual analysis. Using the output below:

A. Write the regression model in population format. Label each component, i.e., main effect, error, etc. {5 points}
B. Determine if the model has utility. Report your p-value and explain the decision. {3 points}
C. Test the significance of the variables included. Interpret the results. {3 points}
D. Do you feel the assumptions of regression have held in this analysis? Explain. {4 points}

Regression Analysis: self concept versus motivation, SEX, motivation*sex

The regression equation is
self concept = 0.209 + 0.195 motivation - 0.403 SEX + 0.279 motivation*sex

Predictor         Coef      SE Coef   T       P
Constant           0.2094   0.1875     1.12   0.265
motivation         0.1953   0.2595     0.75   0.452
SEX               -0.4034   0.1185    -3.41   0.001
motivation*sex     0.2792   0.1602     1.74   0.082

S = 0.666568   R-Sq = 11.2%   R-Sq(adj) = 10.7%

Analysis of Variance

Source           DF    SS        MS       F       P
Regression       3     33.341    11.114   25.01   0.000
Residual Error   596   264.810   0.444
Total            599   298.151

[Figure: Residual Plots for self concept. Four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).]
3. Using an ANOVA approach:

A. Conduct an analysis to determine if there is a significant difference between the self-concept of students by SES (1 = low, 2 = average, 3 = high). {3 points}
   i. Produce the 4 in 1 plot. {1 point}
   ii. Produce the comparative boxplots. {2 points}
   iii. Make sure to run Tukey's post hoc. {2 points}
B. Based on your results, is there sufficient evidence of a difference between the self-concept of students for different SES levels? Explain. {3 points}
C. If you found an overall difference, where did the individual differences lie? Justify your answer. {2 points}

4. Researchers decided to block on School Type to attempt to control for variation.

A. List the explained and unexplained components of the model. List the random effect(s). {4 points}
B. Using the output below, determine if the blocking was useful. Explain. {3 points}

General Linear Model: self concept versus SES, School Type

Factor        Type     Levels   Values
SES           fixed    3        1, 2, 3
School Type   random   2        1, 2

Analysis of Variance for self concept, using Adjusted SS for Tests

Source        DF    Seq SS     Adj SS     Adj MS   F      P
SES           2     4.5017     4.2549     2.1274   4.32   0.014
School Type   1     0.0521     0.0521     0.0521   0.11   0.745
Error         596   293.5972   293.5972   0.4926
Total         599   298.1510

S = 0.701864   R-Sq = 1.53%   R-Sq(adj) = 1.03%

C. Plot the potential interaction between SES and school type. {2 points}

When you are finished, submit your exam and celebrate. You have just completed 660 in the 4-week summer session!