Math 385/585 Applied Regression Analysis

Fall 2017, Section 001
Meeting time: 1:50 to 2:50 M W F
Instructor: Dr. Chris Edwards
Phone: 948-3969
Office: Swart 123
Classroom: Swart 3

Text: Applied Linear Statistical Models, 5th edition, by Kutner, Nachtsheim, Neter, and Li. Earlier editions of the text will likely be adequate, but you will have to allow for different page numbers and homework problem numbers.

Catalog Description: A practical introduction to regression emphasizing applications rather than theory. Simple and multiple regression analysis, basic components of experimental design, and elementary model building. Both conventional and computer techniques will be used in performing the analyses. Prerequisite: Math 201 or Math 301, and Math 256, each with a grade of C or better.

Course Objectives: Linear models in statistics are the backbone of many applications, including regression and ANOVA techniques. Math 385 focuses on the regression side of modeling, while Math 386 focuses on the ANOVA side. In Math 385, students will learn how to calculate and interpret regression estimates, including parameter estimates, fitted values, and residuals, and will be able to perform statistical inference. In addition to simple linear regression, successful students will understand the issues introduced in multiple linear regression, including polynomial regression and non-linear regression. Finally, students will be able to assess model adequacy and will know methods to update and improve a model.

Upon successful completion of the course, students are expected to be able to:
- Identify and understand the components and assumptions of the standard linear regression model
- Use statistical inference on regression model coefficients, including confidence intervals and hypothesis tests
- Construct and interpret the ANOVA table for a linear regression model
- Calculate and analyze residuals from a regression model
- Perform diagnostics on a regression model, including assessing lack of fit
- Perform remedial measures, such as transformations, to improve a regression model
- Understand how linear algebra can be used to describe a multiple regression model
- Perform inference in multiple regression and understand how the increased number of dimensions adds complexity to the interpretations due to collinearity
- Understand how to fit polynomial regression models
- Know how to use indicator variables in regression models
- Build a model from a pool of candidate variables, using techniques such as best subsets and stepwise regression
- Identify outliers, in both the X and Y dimensions, in multiple regression models
- Understand the basics of non-linear regression, including logistic regression
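For orientation on the first objective, the standard simple linear regression model that Chapters 1 and 2 of the text work with can be written as follows (this statement of the model and its assumptions is standard background added for reference, not quoted from the syllabus):

  Yᵢ = β₀ + β₁Xᵢ + εᵢ,  i = 1, ..., n,  with the εᵢ independent N(0, σ²) errors,

where β₀ and β₁ are the regression coefficients estimated by least squares and σ² is the error variance.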

Grading: Final grades are based on these 300 points:

  Topic                               Points    Tentative Date   Chapters
  Exam 1: Simple Linear Regression    70 pts.   October 6        1 to 4
  Exam 2: Multiple Regression I       70 pts.   November 13      5 to 8
  Exam 3: Multiple Regression II      70 pts.   December 15      9 to 11, 13 and 14
  Homework (15 points each)           90 pts.

Homework: I will collect (around) 5 homework problems approximately once every other week. The due dates are listed on the course outline below. I suggest that you work together in small groups on the homework if you like; don't forget that I am a resource for you to use. Often we will use computer software to perform our analyses; include printouts where appropriate, but please make your papers readable. In other words, I don't want 25 pages of printout handed in if you can summarize it in two pages.

Final grades are assigned as follows:

  270 pts.          A   (90%)
  260 pts.          A-  (87%)
  250 pts.          B+  (83%)
  240 pts.          B   (80%)
  230 pts.          B-  (77%)
  220 pts.          C+  (73%)
  210 pts.          C   (70%)
  200 pts.          C-  (67%)
  190 pts.          D+  (63%)
  180 pts.          D   (60%)
  179 pts. or less  F

Office Hours: Office hours are times when I will be in my office to help you. There are many other times when I am in my office; if I am in and not busy, I will be happy to help. My office hours for the Fall 2017 semester are 3:00 to 3:45 Monday and Wednesday, and 9:00 to 11:00 Tuesday.

Philosophy: I strongly believe that you, the student, are the only person who can make yourself learn. Therefore, whenever it is appropriate, I expect you to discover the mathematics we will be exploring. I do not feel that lecturing to you will teach you how to do mathematics. I hope to be your guide while we learn some mathematics, but you will need to do the learning. I expect each of you to come to class prepared to digest the day's material. That means you will benefit most by having read each section of the text and the Day By Day notes before class. My personal belief is that one learns best by doing. I believe that you must be truly engaged in the learning process to learn well. Therefore, I do not think that my role as your teacher is to tell you the answers to the problems we will encounter; rather, I believe I should point you in a direction that will allow you to see the solutions yourselves. To accomplish that goal, I will find different interactive activities for us to work on. Your job is to use me, your text, your friends, and any other resources to become adept at the material. The Day By Day notes also include Skills that I expect you to attain.

Math 585 Expectations: Expectations for the graduate students are understandably more rigorous than for the undergraduate students. Students taking Math 585 will have an extra theoretical problem added to each homework, to be assigned during the semester. In addition, a final project worth 50 points will be due at the end of the semester. This project will involve a complete analysis of a data set, including model estimation, development, and validation.

Course Outline (tentative):

Week of September 4
  Mon Sep 4:   No class
  Wed Sep 6:   Day 1 - Introduction, Least Squares
  Fri Sep 8:   Day 2 - Models (Sections 1.1 to 1.5)

Week of September 11
  Mon Sep 11:  Day 3 - Estimation (Sections 1.6 to 1.8)
  Wed Sep 13:  Day 4 - Inference (Sections 2.1 to 2.3)
  Fri Sep 15:  Day 5 - Interval Estimates (Sections 2.4 to 2.6)

Week of September 18
  Mon Sep 18:  Day 6 - ANOVA (Section 2.7); Homework 1 due
  Wed Sep 20:  Day 7 - GLM (Section 2.8)
  Fri Sep 22:  Day 8 - Residuals I (Sections 3.1 to 3.6)

Week of September 25
  Mon Sep 25:  Day 9 - Residuals II (Sections 3.1 to 3.6)
  Wed Sep 27:  Day 10 - Lack of Fit (Section 3.7)
  Fri Sep 29:  Day 11 - Transformations (Sections 3.8 to 3.9)

Week of October 2
  Mon Oct 2:   Day 12 - Simultaneous Inference (Sections 4.1 to 4.3); Homework 2 due
  Wed Oct 4:   Day 13 - Review
  Fri Oct 6:   Day 14 - Exam 1

Week of October 9
  Mon Oct 9:   Day 15 - Intro to Matrices (Sections 5.1 to 5.7)
  Wed Oct 11:  Day 16 - Regression Matrices (Sections 5.8 to 5.13)
  Fri Oct 13:  Day 17 - Multiple Regression Models (Sections 6.1 to 6.2)

Week of October 16
  Mon Oct 16:  Day 18 - Inference (Sections 6.3 to 6.6)
  Wed Oct 18:  Day 19 - Intervals (Section 6.7)
  Fri Oct 20:  Day 20 - Diagnostics (Section 6.8)

Week of October 23
  Mon Oct 23:  Day 21 - Extra Sums of Squares (Section 7.1); Homework 3 due
  Wed Oct 25:  Day 22 - GLM Tests (Sections 7.2 to 7.3)
  Fri Oct 27:  Day 23 - Computational Problems and Multicollinearity (Sections 7.5 to 7.6)

Week of October 30
  Mon Oct 30:  Day 24 - Polynomial Models (Section 8.1)
  Wed Nov 1:   Day 25 - Interactions I (Section 8.1)
  Fri Nov 3:   Day 26 - Interactions II (Section 8.2)

Week of November 6
  Mon Nov 6:   Day 27 - Dummy Variables I (Sections 8.3 to 8.7)
  Wed Nov 8:   Day 28 - Dummy Variables II (Sections 8.3 to 8.7)
  Fri Nov 10:  Day 29 - Review; Homework 4 due

Week of November 13
  Mon Nov 13:  Day 30 - Exam 2
  Wed Nov 15:  Day 31 - Model Building (Sections 9.1 to 9.3)
  Fri Nov 17:  Day 32 - Best Subsets (Sections 9.4 to 9.6)

Week of November 20
  Mon Nov 20:  Day 33 - Diagnostics (Sections 10.1 to 10.2)
  Wed Nov 22:  No class
  Fri Nov 24:  No class

Week of November 27
  Mon Nov 27:  Day 34 - X Outliers (Section 10.3)
  Wed Nov 29:  Day 35 - Y Outliers (Section 10.4); Homework 5 due
  Fri Dec 1:   Day 36 - Trees (Section 11.4)

Week of December 4
  Mon Dec 4:   Day 37 - Non-Linear Regression I (Sections 13.1 to 13.2)
  Wed Dec 6:   Day 38 - Non-Linear Regression II (Sections 13.3 to 13.4)
  Fri Dec 8:   Day 39 - Logistic Regression (Sections 14.2 to 14.3)

Week of December 11
  Mon Dec 11:  Day 40 - Logistic Inference (Section 14.5); Homework 6 due
  Wed Dec 13:  Day 41 - Review
  Fri Dec 15:  Day 42 - Exam 3

Homework Assignments: (subject to change if we discover difficulties as we go)

Homework 1 - Due September 18

1.19, p. 35. Grade point average. The director of admissions of a small college selected 120 students at random from the new freshman class in a study to determine whether a student's grade point average (GPA) at the end of the freshman year (Y) can be predicted from the ACT test score (X). The results of the study follow. Assume that first-order regression model (1.1) is appropriate.

  i:   1      2      3     ...  118    119    120
  Xᵢ: 21     14     28     ...   28     16     28
  Yᵢ: 3.897  3.885  3.778  ...  3.914  1.860  2.948

a) Obtain the least squares estimates of β₀ and β₁, and state the estimated regression function.
b) Plot the estimated regression function and the data. Does the estimated regression function appear to fit the data well?
c) Obtain a point estimate of the mean freshman GPA for students with ACT test score X = 30.
d) What is the point estimate of the change in the mean response when the entrance test score increases by one point?

1.23, p. 36. Refer to Grade point average Problem 1.19.
a) Obtain the residuals eᵢ. Do they sum to zero in accord with (1.17)?
b) Estimate σ² and σ. In what units is σ expressed?

1.33, p. 37. Refer to the regression model Yᵢ = β₀ + εᵢ in Exercise 1.30. Derive the least squares estimator of β₀ for this model.

2.4, p. 90. Refer to Grade point average Problem 1.19.
a) Obtain a 99 percent confidence interval for β₁. Interpret your confidence interval. Does it include zero? Why might the director of admissions be interested in whether the confidence interval includes zero?
b) Test, using the test statistic t, whether or not a linear association exists between student's ACT score (X) and GPA at the end of the freshman year (Y). Use a level of significance of 0.01. State the alternatives, decision rule, and conclusion.
c) What is the P-value of your test in part (b)? How does it support the conclusion reached in part (b)?

2.55, p. 97. Derive the expression for SSR in (2.51): SSR = b₁² Σ(Xᵢ − X̄)².
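The syllabus does not prescribe particular software, so the short Python sketch below is only one illustrative way to check the computations asked for in Problems 1.19 and 2.4 (least squares estimates, a fitted value at X = 30, residuals, and a confidence interval for β₁). The six data values are the fragment printed above, not the full 120-case data set.

```python
import numpy as np
from scipy import stats

# Fragment of the GPA data shown above (the full problem uses all 120 cases).
x = np.array([21, 14, 28, 28, 16, 28], dtype=float)       # ACT score (X)
y = np.array([3.897, 3.885, 3.778, 3.914, 1.860, 2.948])  # freshman GPA (Y)

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()                 # least squares estimates
fitted = b0 + b1 * x
resid = y - fitted                            # residuals sum to ~0 (Problem 1.23a)
mse = np.sum(resid ** 2) / (n - 2)            # estimate of sigma^2

# Point estimate of mean GPA at X = 30 (Problem 1.19c)
yhat_30 = b0 + b1 * 30

# 99 percent confidence interval for beta_1 (Problem 2.4a)
se_b1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))
t_crit = stats.t.ppf(0.995, df=n - 2)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, fit at X=30: {yhat_30:.4f}")
print(f"99% CI for beta_1: ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
```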

Homework 2 - Due October 2

2.23, p. 93. Refer to Grade point average Problem 1.19.
a) Set up the ANOVA table.
b) What is estimated by MSR in your ANOVA table? By MSE? Under what condition do MSR and MSE estimate the same quantity?
c) Conduct an F test of whether or not β₁ = 0. Control the α risk at 0.01. State the alternatives, decision rule, and conclusion.
d) What is the absolute magnitude of the reduction in the variation of Y when X is introduced into the regression model? What is the relative reduction? What is the name of the latter measure?
e) Obtain r and attach the appropriate sign.
f) Which measure, R² or r, has the more clear-cut operational interpretation? Explain.

2.67, p. 99. Refer to Grade point average Problem 1.19.
a) Plot the data, with the least squares regression line for ACT scores between 20 and 30 superimposed.
b) On the plot from part (a), superimpose a plot of the 95 percent confidence band for the true regression line for ACT scores between 20 and 30. Does the confidence band suggest that the true regression relation has been precisely estimated? Discuss.

3.3, p. 146-147. Refer to Grade point average Problem 1.19.
a) Prepare a box plot for the ACT scores Xᵢ. Are there any noteworthy features in this plot?
b) Prepare a dot plot of the residuals. What information does this plot provide?
c) Plot the residuals eᵢ against the fitted values Ŷᵢ. What departures from regression model (2.1) can be studied from this plot? What are your findings?
d) Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation between the ordered residuals and their expected values under normality. Test the reasonableness of the normality assumption here using Table B.6 and α = 0.05. What do you conclude?
e) Conduct the Brown-Forsythe test to determine whether or not the error variance varies with the level of X. Divide the data into the two groups X > 26 and X ≤ 26, and use α = 0.01. State the decision rule and conclusion. Does your conclusion support your preliminary findings in part (c)?
f) Information is given below for each student on two variables not included in the model, namely, intelligence test score X₂.

3.21, p. 151. Derive the result in (3.29), SSE = SSPE + SSLF, starting from the identity Yᵢⱼ − Ŷᵢⱼ = (Yᵢⱼ − Ȳⱼ) + (Ȳⱼ − Ŷᵢⱼ).

Homework 3 - Due October 23

3.17, p. 150-151. Sales growth. A marketing researcher studied annual sales of a product that had been introduced 10 years ago. The data are as follows, where X is the year (coded) and Y is sales in thousands of units:

  i:   1    2    3    4    5    6    7    8    9   10
  Xᵢ:  0    1    2    3    4    5    6    7    8    9
  Yᵢ: 98  135  162  178  221  232  283  300  374  395

a) Prepare a scatter plot of the data. Does a linear relation appear adequate here?
b) Use the Box-Cox procedure and standardization (3.36) to find an appropriate power transformation of Y. Evaluate SSE for λ = 0.3, 0.4, 0.5, 0.6, 0.7. What transformation of Y is suggested?
c) Use the transformation Y′ = √Y and obtain the estimated linear regression function for the transformed data.
d) Plot the estimated regression line and the transformed data. Does the regression line appear to be a good fit to the transformed data?
e) Obtain the residuals and plot them against the fitted values. Also prepare a normal probability plot. What do your plots show?
f) Express the estimated regression function in the original units.
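Problem 3.17(b) asks for SSE under several Box-Cox power transformations. A rough Python sketch of that search is below; it assumes the standardized form of the transformation, dividing by a factor based on the geometric mean of Y so that SSE values are comparable across λ, which is one common reading of standardization (3.36), so treat the details as an assumption rather than a quotation from the text.

```python
import numpy as np

# Sales growth data from Problem 3.17
x = np.arange(10, dtype=float)                       # coded year X
y = np.array([98, 135, 162, 178, 221, 232, 283, 300, 374, 395], dtype=float)

def sse_simple_regression(x, w):
    """SSE from regressing w on x by least squares."""
    b1 = np.sum((x - x.mean()) * (w - w.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = w.mean() - b1 * x.mean()
    return np.sum((w - (b0 + b1 * x)) ** 2)

gm = np.exp(np.mean(np.log(y)))                      # geometric mean of Y
for lam in [0.3, 0.4, 0.5, 0.6, 0.7]:
    # Standardized Box-Cox transformation (assumed form of (3.36)):
    # W = (Y^lambda - 1) / (lambda * gm^(lambda - 1))
    w = (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))
    print(f"lambda = {lam:.1f}: SSE = {sse_simple_regression(x, w):.2f}")
```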

4.21, p. 175. When the predictor variable is so coded that X̄ = 0 and the normal error regression model (2.1) applies, are b₀ and b₁ independent? Are the joint confidence intervals for β₀ and β₁ then independent?

5.7, p. 210. Refer to Plastic hardness Problem 1.22. Using matrix methods, find: (1) Y′Y, (2) X′X, (3) X′Y.

5.20, p. 211. Find the matrix A of the quadratic form: 7Y₁² − 8Y₁Y₂ + 8Y₂².

5.26, p. 212. Refer to Plastic hardness Problems 1.22 and 5.7.
a) Using matrix methods, obtain the following: (1) (X′X)⁻¹, (2) b, (3) Ŷ, (4) H, (5) SSE, (6) s²{b}, (7) s²{pred} when Xₕ = 30.
b) From part (a6), obtain the following: (1) s²{b₁}, (2) s{b₀, b₁}, (3) s{b₀}.
c) Obtain the matrix of the quadratic form for SSE.

Homework 4 - Due November 10

6.10, p. 249. Refer to Grocery retailer Problem 6.9.
a) Fit regression model (6.5) to the data for three predictor variables. State the estimated regression function. How are b₁, b₂, and b₃ interpreted here?
b) Obtain the residuals and prepare a box plot of the residuals. What information does this plot provide?
c) Plot the residuals against Ŷ, X₁, X₂, X₃, and X₁X₂ on separate graphs. Also prepare a normal probability plot. Interpret the plots and summarize your findings.
d) Prepare a time plot of the residuals. Is there any indication that the error terms are correlated? Discuss.
e) Divide the 52 cases into two groups, placing the 26 cases with the smallest fitted values Ŷᵢ into group 1 and the other 26 cases into group 2. Conduct the Brown-Forsythe test for constancy of the error variance, using α = 0.01. State the decision rule and conclusion.
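Problems 5.7, 5.26, and 6.10 all lean on the matrix formulation of regression. The sketch below shows, in Python with numpy, how the core quantities (b, H, SSE, s²{b}) fall out of the standard matrix formulas; the toy data are made up purely for illustration and are not the Plastic hardness or Grocery retailer data from the text.

```python
import numpy as np

# Toy illustration only; replace with the actual problem data.
x1 = np.array([16.0, 24.0, 32.0, 40.0, 16.0, 24.0])
y  = np.array([199.0, 214.0, 241.0, 259.0, 205.0, 213.0])

X = np.column_stack([np.ones_like(x1), x1])   # design matrix with intercept column
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)              # (X'X)^(-1)
b = XtX_inv @ X.T @ y                         # least squares estimates b = (X'X)^(-1) X'Y
H = X @ XtX_inv @ X.T                         # hat matrix H = X (X'X)^(-1) X'
y_hat = H @ y                                 # fitted values
e = y - y_hat                                 # residuals
sse = e @ e                                   # SSE = e'e
mse = sse / (n - p)
s2_b = mse * XtX_inv                          # estimated variance-covariance matrix s^2{b}

print("b =", b)
print("SSE =", sse, " MSE =", mse)
print("s^2{b} =\n", s2_b)
```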

7.4, p. 289. Refer to Grocery retailer Problem 6.9.
a) Obtain the analysis of variance table that decomposes the regression sum of squares into extra sums of squares associated with X₁; with X₃, given X₁; and with X₂, given X₁ and X₃.
b) Test whether X₂ can be dropped from the regression model given that X₁ and X₃ are retained. Use the F test statistic and α = 0.05. State the alternatives, decision rule, and conclusion. What is the P-value of the test?
c) Does SSR(X₁) + SSR(X₂ | X₁) equal SSR(X₂) + SSR(X₁ | X₂) here? Must this always be the case?

7.17, p. 290. Refer to Grocery retailer Problem 6.9.
a) Transform the variables by means of the correlation transformation (7.44) and fit the standardized regression model (7.45).
b) Calculate the coefficients of determination between all pairs of predictor variables. Is it meaningful here to consider the standardized regression coefficients to reflect the effect of one predictor variable when the others are held constant?
c) Transform the estimated standardized regression coefficients by means of (7.53) back to the ones for the fitted regression model in the original variables. Verify that they are the same as the ones obtained in Problem 6.10a.
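Problem 7.17 works with the correlation transformation and standardized regression coefficients. As a rough Python illustration on made-up data, assuming the usual definition of the correlation transformation, (value minus mean) divided by (√(n−1) times the standard deviation), as the form of (7.44), and the usual back-transformation bₖ = (s_Y/s_Xk)·bₖ′ as the form of (7.53):

```python
import numpy as np

# Made-up data for illustration; substitute the Grocery retailer data.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2)) * [5.0, 2.0] + [50.0, 10.0]
y = 3.0 + 0.8 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=2.0, size=20)

n = len(y)

def corr_transform(v):
    # Correlation transformation (assumed form of (7.44)):
    # (v_i - mean) / (sqrt(n - 1) * standard deviation)
    return (v - v.mean()) / (np.sqrt(n - 1) * v.std(ddof=1))

Xs = np.column_stack([corr_transform(X[:, j]) for j in range(X.shape[1])])
ys = corr_transform(y)

# Standardized model has no intercept; solve by least squares.
b_std, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

# Transform standardized coefficients back to the original scale
# (assumed form of (7.53)): b_k = (s_Y / s_Xk) * standardized b_k.
b_orig = b_std * (y.std(ddof=1) / X.std(axis=0, ddof=1))
b0 = y.mean() - np.sum(b_orig * X.mean(axis=0))

print("standardized coefficients:", b_std)
print("back-transformed coefficients:", b_orig, " intercept:", b0)
```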

8.16, p. 337-338. Refer to Grade point average Problem 1.19. An assistant to the director of admissions conjectured that the predictive power of the model could be improved by adding information on whether the student had chosen a major field of concentration at the time the application was submitted. Assume that regression model (8.33) is appropriate, where X₁ is entrance test score and X₂ = 1 if the student had indicated a major field of concentration at the time of application and 0 if the major field was undecided. Data for X₂ were as follows:

  i:   1   2   3  ...  118  119  120
  X₂:  0   1   0  ...    1    1    0

a) Explain how each regression coefficient in model (8.33) is interpreted here.
b) Fit the regression model and state the estimated regression function.
c) Test whether the X₂ variable can be dropped from the regression model; use α = 0.01. State the alternatives, decision rule, and conclusion.
d) Obtain the residuals for regression model (8.33) and plot them against X₁X₂. Is there any evidence in your plot that it would be helpful to include an interaction term in the model?

8.34, p. 340. In a regression study, three types of banks were involved, namely, commercial, mutual savings, and savings and loan. Consider the following system of indicator variables for type of bank:

  Type of bank        X₂   X₃
  Commercial           1    0
  Mutual savings       0    1
  Savings and loan     1    1

a) Develop a first-order linear regression model for relating last year's profit or loss (Y) to size of bank (X₁) and type of bank (X₂, X₃).
b) State the response functions for the three types of banks.
c) Interpret each of the following quantities: (1) β₂, (2) β₃, (3) β₂ − β₃.

Homework 5 - Due November 29

9.15, p. 378-379. Kidney function. Creatinine clearance (Y) is an important measure of kidney function, but it is difficult to obtain in a clinical office setting because it requires 24-hour urine collection. To determine whether this measure can be predicted from some data that are easily available, a kidney specialist obtained the data that follow for 33 male subjects. The predictor variables are serum creatinine concentration (X₁), age (X₂), and weight (X₃).
a) Prepare separate dot plots for each of the three predictor variables. Are there any noteworthy features in these plots? Comment.
b) Obtain the scatter plot matrix. Also obtain the correlation matrix of the X variables. What do the scatter plots suggest about the nature of the functional relationship between the response variable Y and each predictor variable? Discuss. Are any serious multicollinearity problems evident? Explain.
c) Fit the multiple regression function containing the three predictor variables as first-order terms. Does it appear that all predictor variables should be retained?

9.16, p. 379. Refer to Kidney function Problem 9.15.
a) Using first-order and second-order terms for each of the three predictor variables (centered around the mean) in the pool of potential X variables (including cross products of the first-order terms), find the three best hierarchical subset regression models according to the Cₚ criterion.
b) Is there much difference in Cₚ for the three best subset models?
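Problem 9.16 asks for the best subsets by the Cₚ criterion. A small brute-force Python sketch of that idea is below; it uses Mallows' Cₚ = SSEₚ/MSE(full) − (n − 2p) on made-up data, and it ignores the "hierarchical" restriction in the problem, so it only shows the mechanics of the search rather than the assigned analysis.

```python
import numpy as np
from itertools import combinations

# Made-up data standing in for the Kidney function predictors.
rng = np.random.default_rng(1)
n = 33
X_full = rng.normal(size=(n, 3))                      # stand-ins for X1, X2, X3
y = 5 + 2 * X_full[:, 0] - 1 * X_full[:, 2] + rng.normal(size=n)

def fit_sse(X, y):
    """Return (SSE, number of parameters) for an intercept-plus-X model."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ b
    return r @ r, Xd.shape[1]

sse_full, p_full = fit_sse(X_full, y)
mse_full = sse_full / (n - p_full)

results = []
for k in range(1, 4):
    for subset in combinations(range(3), k):
        sse_p, p = fit_sse(X_full[:, list(subset)], y)
        cp = sse_p / mse_full - (n - 2 * p)           # Mallows' Cp
        results.append((cp, subset))

for cp, subset in sorted(results)[:3]:                # three best subsets by Cp
    print(f"subset {subset}: Cp = {cp:.2f}")
```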

9.19, p. 379. Refer to Kidney function Problem 9.15.
a) Using the same pool of potential X variables as in Problem 9.16a, find the best subset of variables according to forward stepwise regression, with α limits of 0.10 and 0.15 to add or delete a variable, respectively.
b) How does the best subset according to forward stepwise regression compare with the best subset according to the R²ₐ,ₚ criterion obtained in Problem 9.16a?

10.10(a), p. 415. Refer to Grocery retailer Problems 6.9 and 6.10.
a) Obtain the studentized deleted residuals and identify any outlying Y observations. Use the Bonferroni outlier test procedure with α = 0.05. State the decision rule and conclusion.

Homework 6 - Due December 11

10.10(b-f), p. 415. Refer to Grocery retailer Problems 6.9 and 6.10.
b) Obtain the diagonal elements of the hat matrix. Identify any outlying X observations using the rule of thumb presented in the chapter.
c) Management wishes to predict the total labor hours required to handle the next shipment, containing X₁ = 300,000 cases, with indirect costs of the total hours X₂ = 7.2 and X₃ = 0 (no holiday in week). Construct a scatter plot of X₂ against X₁ and determine visually whether this prediction involves an extrapolation beyond the range of the data. Also, use (10.29) to determine whether an extrapolation is involved. Do your conclusions from the two methods agree?
d) Cases 16, 22, 43, and 48 appear to be outlying X observations, and cases 10, 32, 38, and 40 appear to be outlying Y observations. Obtain the DFFITS, DFBETAS, and Cook's distance values for each of these cases to assess their influence. What do you conclude?
e) Calculate the average absolute percent difference in the fitted values with and without each of these cases. What does this measure indicate about the influence of each of the cases?
f) Calculate Cook's distance Dᵢ for each case and prepare an index plot. Are any cases influential according to this measure?
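Problem 10.10 revolves around leverage and influence measures. The Python sketch below computes hat-matrix diagonals, studentized deleted residuals, and Cook's distance directly from the standard formulas, on made-up data since the Grocery retailer data are in the text; it illustrates the formulas rather than reproducing the assigned solution.

```python
import numpy as np

# Made-up data for illustration; use the Grocery retailer data in practice.
rng = np.random.default_rng(2)
n = 52
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([10.0, 2.0, 0.5, -1.0]) + rng.normal(size=n)
p = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T
h = np.diag(H)                                   # leverages h_ii
e = y - H @ y                                    # residuals
sse = e @ e
mse = sse / (n - p)

# Studentized deleted residuals t_i (for outlying Y observations)
t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e ** 2))

# Cook's distance D_i (influence)
D = (e ** 2) * h / (p * mse * (1 - h) ** 2)

# Common rule of thumb for outlying X observations: h_ii > 2p/n
print("high-leverage cases:", np.where(h > 2 * p / n)[0])
print("largest |t_i|:", np.max(np.abs(t)).round(3))
print("largest Cook's D:", np.max(D).round(3))
```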

11.29, p. 479. Refer to Muscle mass Problem 1.27.
a) Fit a two-region regression tree. What is the first split point based on age? What is SSE for this two-region tree?
b) Find the second split point given the two-region tree in part (a). What is SSE for the resulting three-region tree?
c) Find the third split point given the three-region tree in part (b). What is SSE for the resulting four-region tree?
d) Prepare a scatter plot of the data with the four-region tree in part (c) superimposed. How well does the tree fit the data? What does the tree suggest about the change in muscle mass with age?
e) Prepare a residual plot of eᵢ versus Ŷᵢ for the four-region tree in part (d). State your findings.

13.10, p. 550. Enzyme kinetics. In an enzyme kinetics study the velocity of a reaction (Y) is expected to be related to the concentration (X) as follows:

  Yᵢ = γ₀Xᵢ / (γ₁ + Xᵢ) + εᵢ

Eighteen concentrations have been studied and the results follow:

  i:   1    2    3   ...   16    17    18
  Xᵢ:  1    1.5  2   ...   30    35    40
  Yᵢ:  2.1  2.5  4.9 ...   19.7  21.3  21.6

a) To obtain starting values for g₀ and g₁, observe that when the error term is ignored we have Yᵢ′ = β₀ + β₁Xᵢ′, where Yᵢ′ = 1/Yᵢ, β₀ = 1/γ₀, β₁ = γ₁/γ₀, and Xᵢ′ = 1/Xᵢ. Therefore fit a linear regression function to the transformed data to obtain initial estimates g₀⁽⁰⁾ = 1/b₀ and g₁⁽⁰⁾ = b₁/b₀.
b) Using the starting values obtained in part (a), find the least squares estimates of the parameters γ₀ and γ₁.

13.12, p. 550. Refer to Enzyme kinetics Problem 13.10. Assume that the fitted model is appropriate and that large-sample inferences can be employed here.
1) Obtain an approximate 95 percent confidence interval for γ₀.
2) Test whether or not γ₀ = 20; use α = 0.05. State the alternatives, decision rule, and conclusion.
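Problems 13.10 and 13.12 fit the nonlinear model Yᵢ = γ₀Xᵢ/(γ₁ + Xᵢ) + εᵢ. One way to carry out the two-step approach in part (a), linearize to get starting values and then run nonlinear least squares, is sketched below in Python with scipy; the six data values are only the fragment printed above, so the numbers it produces are illustrative rather than answers to the problem.

```python
import numpy as np
from scipy.optimize import curve_fit

# Fragment of the enzyme kinetics data shown above (the full problem has 18 cases).
x = np.array([1.0, 1.5, 2.0, 30.0, 35.0, 40.0])
y = np.array([2.1, 2.5, 4.9, 19.7, 21.3, 21.6])

# Step 1: linearize 1/Y = beta0 + beta1 * (1/X) to get starting values.
xr, yr = 1.0 / x, 1.0 / y
b1 = np.sum((xr - xr.mean()) * (yr - yr.mean())) / np.sum((xr - xr.mean()) ** 2)
b0 = yr.mean() - b1 * xr.mean()
g0_start, g1_start = 1.0 / b0, b1 / b0        # g0 = 1/b0, g1 = b1/b0

# Step 2: nonlinear least squares for Y = g0 * X / (g1 + X).
def mm(x, g0, g1):
    return g0 * x / (g1 + x)

(g0_hat, g1_hat), cov = curve_fit(mm, x, y, p0=[g0_start, g1_start])

# Approximate large-sample 95% CI for g0 from the asymptotic covariance matrix.
se_g0 = np.sqrt(cov[0, 0])
print(f"starting values: g0 = {g0_start:.3f}, g1 = {g1_start:.3f}")
print(f"estimates: g0 = {g0_hat:.3f}, g1 = {g1_hat:.3f}")
print(f"approx. 95% CI for g0: ({g0_hat - 1.96 * se_g0:.3f}, {g0_hat + 1.96 * se_g0:.3f})")
```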