Professor Jacoby                                        PLS 802
319 South Kedzie                                        Spring 2017
jacoby@msu.edu

REGRESSION ANALYSIS

Course Objectives:

This course provides an introduction to the theory, methods, and practice of regression analysis. The goal is to provide students with the skills that are necessary to: (1) read, understand, and evaluate the professional literature that uses regression analysis; (2) design and carry out studies that employ regression techniques for testing substantive theories; and (3) prepare to learn about more advanced statistical procedures.

This course will not dwell on statistical theory. But neither will it take a superficial, cookbook approach to methodology. Instead, we will concentrate on the importance of evaluating empirical relationships between variables as a component of the theory-testing process; the utility of regression analysis for doing so; the nature of the basic regression model; and the development of the regression estimators. We will see that the regression model depends very heavily on several assumptions. Therefore, we will examine these assumptions in detail, considering why they are necessary, whether they are valid in practical research situations, and the consequences of violating them in particular applications of the regression techniques. These formal, analytic treatments will be counterbalanced by the frequent use of substantive examples and class exercises. The overall course objective is not to turn you into a statistician; rather, we are trying to maximize your research skills as a political scientist.

Course Prerequisites:

Any course of this type must assume a working knowledge of elementary statistical concepts and techniques. Students should be familiar with such ideas as univariate descriptive statistics, graphical displays for univariate data, the fundamentals of probability, sampling distributions, statistical inference, confidence intervals, and hypothesis testing.
An understanding of these basic concepts is absolutely essential before moving on to the more complicated topics that will make up the majority of the course material. Therefore, I assume that everyone has taken at least one prior course in introductory statistics and data analysis (e.g., PLS 801).

Course Requirements:

Regular attendance and active class participation are expected. This is a mandatory component of the course: statistical knowledge is cumulative, and gaps in the early material will always have detrimental consequences later on.

Homework assignments will be given frequently (about once a week). Some of these will be problems requiring pencil-and-paper calculations, but most of the assignments will be computer-based data analysis exercises. All of them are intended to familiarize you with the various concepts and techniques introduced in class and in the readings. Assignments will not be graded for correct answers, but they will be checked for completion, and comments will be provided.

There will be two examinations in this course. The midterm and the final (which is cumulative) will both be take-home exams; the specific procedures and expectations will be discussed in class.

The course grades will be determined as follows:

    20%   Homework assignments
    10%   Class participation
    30%   Midterm examination
    40%   Final examination

TEXTBOOKS

The required text for this course is:

Fox, John. (2016) Applied Regression Analysis and Generalized Linear Models (Third Edition). Thousand Oaks, CA: Sage.

Some alternative and supplemental texts that could be used in this course are (in alphabetical order):

Berry, William D. and Stanley Feldman. (1985) Multiple Regression in Practice. Beverly Hills, CA: Sage.

Gujarati, Damodar N. and Dawn C. Porter. (2009) Basic Econometrics (Fifth Edition). Boston, MA: McGraw-Hill Irwin.

Kennedy, Peter. (2008) A Guide to Econometrics (Sixth Edition). Malden, MA: Blackwell Publishing.

McClendon, McKee. (1994) Multiple Regression and Causal Analysis. Prospect Heights, IL: Waveland Press.

Wooldridge, Jeffrey M. (2013) Introductory Econometrics: A Modern Approach (Fifth Edition). Mason, OH: South-Western Cengage Learning.

The Fox text is an excellent book, providing a thorough and modern treatment of the general and generalized linear model from a social scientific perspective. Unfortunately, this book (like most statistics texts) is extremely expensive. In addition, there are some topics that it does not cover. For these reasons, I include the preceding list of additional works. These books could be used in addition to, or as alternatives to, the Fox text. The Gujarati and Porter text and the Wooldridge text provide similarly broad coverage of the material from an econometric perspective. Their level of technical discourse is similar to that used in the Fox text. The treatments of the material in the McClendon text and the Berry and Feldman monograph are fairly elementary and non-technical. Nevertheless, they are reasonably priced and, taken together, they cover most of the course topics in an accessible manner. The Kennedy book is a nice supplement to the texts, providing accessible, and sometimes provocative, discussions about most of the topics that will be covered during the semester.
We will discuss the various books in greater detail in class, and I will be happy to talk with anyone about the pros and cons of each one.

COURSE WEB SITE

The website for this course is located at the following URL:

http://www.polisci.msu.edu/jacoby/msu/pls802

The contents of this website will evolve and expand as the course proceeds through the subject matter. You should regard the site as an information resource. It will contain the syllabus, handouts, datasets, assignments, computing resources, study materials for examinations, and links to other relevant sites on the web.

COMPUTING AND SOFTWARE

Statistical methods almost invariably require repetitive calculations, applied to large amounts of data. At the same time, graphical displays of quantitative information require a precision in their rendering that is almost impossible to achieve using pencil and paper. For these reasons, computers and statistical software are absolutely necessary for employing modern statistical techniques in an effective manner. They will be closely integrated into the course material.

Most of our work (including class examples, demonstrations, homework, and examinations) will rely on the Stata statistical software package. Stata is available in the computer labs that are located throughout the MSU campus. It can also be purchased for a reasonable price through Stata's GradPlan option. For pricing and details, go to http://stata.com/order/new/edu/gradplans/us-pickup/. Regardless of how you access Stata, you should make sure that you are using a relatively recent version, no earlier than Stata 12 (the latest version is Stata 14). Earlier versions of Stata do not include several commands that we will be using in this class (e.g., marginsplot).

You may also use other statistical software in this class (e.g., SAS, SYSTAT, SPSS, MatLab), as long as it has the analytical routines and capacities that are required to complete the assignments and examinations. In order to make sure that this is the case, students who are interested in alternative software for assignments and examinations must check with me early in the semester!

An additional software option for this class is the R statistical computing environment. Along with its superb functionality and flexibility, R is also attractive because it is available free of charge. You can download the software from the R website, http://www.r-project.org/. This site also provides a great deal of useful information about R and many useful links to additional material (e.g., manuals, FAQs, and the newsletter).
Installation on your own computer is very easy; if you are offered any choices during the process, you should use the defaults. Ph.D. students who are planning to declare a minor field in Methodology should give particular consideration to using R.

While there are no required textbooks devoted to computing, you might find some of the following works to be helpful for learning either Stata or R and using the respective software systems for regression analysis:

Acock, Alan C. (2012) A Gentle Introduction to Stata (Revised Third Edition). College Station, TX: Stata Press.

Fox, John and Sanford Weisberg. (2011) An R Companion to Applied Regression (Second Edition). Thousand Oaks, CA: Sage.

Mitchell, Michael N. (2012) A Visual Guide to Stata Graphics. College Station, TX: Stata Press.

Mitchell, Michael N. (2012) Interpreting and Visualizing Regression Models Using Stata. College Station, TX: Stata Press.

Verzani, John. (2005) Using R for Introductory Statistics. Boca Raton, FL: Chapman and Hall.

Zuur, Alain F.; Elena N. Ieno; Erik H. W. G. Meesters. (2009) A Beginner's Guide to R. New York, NY: Springer.

TOPICS AND READING ASSIGNMENTS

I. Introduction and Preliminary Material

   A. Causality, functional dependence, and the nature of statistical models
      Fox (2016), pages 1-11
      Gujarati and Porter (2009), pages 1-12
      Wooldridge (2013), pages 1-17
      McClendon (1994), pages 1-19
      Practice for the Social Sciences, Chapter 1

   B. Basic introduction to Stata (if necessary)
      (No explicit reading assignment)

   C. Basic tools: linear transformations, linear combinations, and the properties of statistical estimators
      Fox (2016), Appendix D
      Gujarati and Porter (2009), pages 15-48
      Wooldridge (2013), pages 703-707, 737-741, 755-768
      McClendon (1994), pages 20-28
      Kennedy (2008), pages 11-39
      Practice for the Social Sciences, Chapter 2, pages 9-49

   D. Graphical displays for visualizing data
      Fox (2016), pages 28-54
      Jacoby, William G. and Saundra K. Schneider. (2017) Graphical Displays for Political Science Research. In Lonna Rae Atkeson and R. Michael Alvarez (Editors) The Oxford Handbook of Polling and Polling Methods. (Forthcoming).
      Practice for the Social Sciences, Chapter 2, pages 1-9

   E. Scatterplot smoothing and simple nonparametric regression
      Fox (2016), pages 13-25
      Practice for the Social Sciences, Chapter 3, pages 1-11

II. The Descriptive Linear Regression Model

   A. Bivariate regression
      Fox (2016), pages 82-92
      Gujarati and Porter (2009), pages 55-61
      Wooldridge (2013), pages 22-45
      McClendon (1994), pages 28-45, 53-59
      Practice for the Social Sciences, Chapter 3, pages 11-53

   B. The multiple regression model
      Fox (2016), pages 92-100
      Gujarati and Porter (2009), pages 188-207, 227-229
      Wooldridge (2013), pages 69-81
      McClendon (1994), pages 60-83, 94-109, 116-118

III. Statistical Inference for the Linear Regression Model

   A. Regression assumptions and properties of the least squares estimator
      Fox (2016), pages 106-110
      Gujarati and Porter (2009), pages 61-105
      Wooldridge (2013), pages 45-57, 93-94, 99-104, 168-178
      McClendon (1994), pages 133-146
      Berry and Feldman (1985), pages 9-12
      Kennedy (2008), pages 40-50

   B. Confidence intervals and hypothesis tests for bivariate regression models
      Fox (2016), pages 111-112
      Gujarati and Porter (2009), pages 107-135
      Wooldridge (2013), pages 118-140
      McClendon (1994), pages 147-154

   C. Statistical inference for multiple regression
      Fox (2016), pages 112-117
      Gujarati and Porter (2009), pages 233-260
      Wooldridge (2013), pages 140-154, 178-183, 207-216
      McClendon (1994), pages 157-174
      Berry and Feldman (1985), pages 12-18
      Kennedy (2008), pages 51-70

   D. Interpretation and specification issues in regression analysis
      Fox (2016), pages 100-102, 117-120
      Gujarati and Porter (2009), pages 154-159, 468-482
      Wooldridge (2013), pages 83-93, 98-99, 154-159, 186-191, 200-207
      McClendon (1994), pages 45-49, 154-157
      Berry and Feldman (1985), pages 18-26
      Kennedy (2008), pages 71-95

MIDTERM EXAMINATION DISTRIBUTED ON THURSDAY, MARCH 2

IV. Categorical Independent Variables

   A. Dummy variables
      Fox (2016), pages 128-138
      Gujarati and Porter (2009), pages 277-288
      Wooldridge (2013), pages 227-240
      McClendon (1994), pages 198-214, 223-226
      Kennedy (2008), pages 232-235

   B. Multiplicative terms and interaction
      Fox (2016), pages 140-150
      Gujarati and Porter (2009), pages 288-290
      Wooldridge (2013), pages 240-248
      McClendon (1994), pages 271-287
      Kennedy (2008), pages 235-240

   C. A brief introduction to analysis of variance (ANOVA)
      Fox (2016), pages 153-177
      McClendon (1994), pages 226-229

V. Alternative Representations of the Linear Regression Model (If Time Permits)

   A. The multiple regression model in matrix form
      Fox (2016), pages 202-224, 227-230
      Gujarati and Porter (2009), pages 849-875
      Wooldridge (2013), pages 807-819
      McClendon (1994), pages 119-132

   B. Vector geometry of the linear regression model
      Fox (2016), pages 245-258
      Wickens, Thomas D. (1995) The Geometry of Multivariate Statistics. Hillsdale, NJ: Lawrence Erlbaum. (Especially Chapters 1-6).

VI. Potential Problems with the Linear Regression Model

   A. Multicollinearity and its effects
      Fox (2016), pages 341-346
      Gujarati and Porter (2009), pages 320-347
      Wooldridge (2013), pages 94-98
      Berry and Feldman (1985), pages 37-50
      Kennedy (2008), pages 192-202

   B. Outliers, unusual, and influential observations
      Fox (2016), pages 266-289
      Kennedy (2008), pages 346-349

   C. Functional form, nonlinearity, and transformations
      Fox (2016), pages 55-75, 307-318
      Gujarati and Porter (2009), pages 476-496
      Wooldridge (2013), pages 304-308
      McClendon (1994), pages 230-270
      Berry and Feldman (1985), pages 51-72
      Kennedy (2008), pages 95-97
      Jacoby, William G. (2000) Loess: A Nonparametric Graphical Tool for Depicting Relationships Between Variables. Electoral Studies 19: 577-613.

   D. Nonnormal and nonconstant (heteroskedastic) errors
      Fox (2016), pages 296-307
      Gujarati and Porter (2009), pages 365-411
      Wooldridge (2013), pages 268-294
      McClendon (1994), pages 174-195
      Berry and Feldman (1985), pages 73-88
      Kennedy (2008), pages 112-118

   E. Measurement error and regression analysis
      Fox (2016), pages 120-123
      Gujarati and Porter (2009), pages 482-486
      Wooldridge (2013), pages 317-324
      Berry and Feldman (1985), pages 26-37
      Kennedy (2008), pages 157-160
      Jacoby, William G. and Saundra K. Schneider. (2012) Dependent Variable Measurement Error in Regression Models: Complacency, Caution, and Correction. Unpublished manuscript.

VII. Dichotomous Dependent Variables: A Brief Look

   A. The linear probability model
      Fox (2016), pages 370-375
      Gujarati and Porter (2009), pages 541-553
      Wooldridge (2013), pages 248-253, 294-296

   B. The logistic regression and probit models
      Fox (2016), pages 370-391
      Gujarati and Porter (2009), pages 553-574
      Wooldridge (2013), pages 584-596
      Kennedy (2008), pages 241-244

VIII. Nonindependent Errors and Time Series Data: A Brief Look
      Fox (2016), pages 474-495
      Gujarati and Porter (2009), pages 412-466
      Wooldridge (2013), pages 344-434
      Kennedy (2008), pages 118-122

IX. Good Statistical Practice in Political Science Research
      Wooldridge (2013), pages 676-700
      Kennedy (2008), pages 361-384
      Berk, Richard A. (2004) What to Do. Chapter 11 in Regression Analysis: A Constructive Critique. Thousand Oaks, CA: Sage.

FINAL EXAMINATION DISTRIBUTED ON THURSDAY, MAY 4