STATISTICS 641 - Fall 2005
Methods of Statistics, I

Time and Place: MWF 1:50-2:40 p.m., BLOCKER 457

Instructor: Dr. Michael Longnecker
Office: 404D Blocker Building
Phone: 845-3141 (Office) and 690-0553 (Home)
Email: longneck@stat.tamu.edu
Office Hours: TR 12:00 noon - 2:30 p.m., or by appointment
Prerequisite for Course: Concurrent enrollment in STAT 610 or its equivalent.
Grader for Course: Beverly Gaucher; office hours TBA, Room 506C.

Text: Statistics and Data Analysis from Elementary to Intermediate, by Tamhane & Dunlop.
Supplementary Text: A Handbook of Statistical Analyses using SAS, by Der and Everitt.
Supplementary Text: Introductory Statistics with R, by Peter Dalgaard.

References:
1. Graphical Methods for Data Analysis, by J. Chambers, W. Cleveland, B. Kleiner, and P. Tukey.
2. Nonparametric Statistical Methods, 2nd Ed., by M. Hollander and D. Wolfe.
3. Categorical Data Analysis, 2nd Ed., by A. Agresti.
4. Statistics for Experimenters, 2nd Ed., by G. Box, W. Hunter, and S. Hunter.
5. Statistical Principles of Research Design and Analysis, 2nd Ed., by R. Kuehl.
6. Applied Regression Methods, 3rd Ed., by N. Draper and H. Smith.
7. Applied Multivariate Statistical Analysis, 2nd Ed., by D. Wichern and R. Johnson.

Computing: Data sets will be analyzed using the SAS, Minitab, and R software packages. These software packages are available on the department's computer network.

Homework: Homework will be assigned and collected regularly. Hyejin Shin will be the first grader of all homework assignments, and I will also review all assignments. We will select a subset of the assigned problems for grading. In this course, the methods of solving a problem are as important as the final solutions, so homework should be detailed enough to adequately demonstrate your method of solution. You may discuss the homework problems with other students, but you should write up your solutions independently. Do not copy other students' solutions. Homework will be graded by the teaching assistant.
Exams: There will be two semester exams and a comprehensive final exam, each worth 30% of the course grade. The final exam will be given on Tuesday, December 13, from 3:30-5:30 p.m.

Make-up Policy: Make-up of missed exams will be allowed only for university-approved reasons. The course instructor must be notified of the absence as soon as possible. If the absence is approved, the final exam grade will be used as the grade for the missed exam; that is, the final exam will be worth 60% of the course grade.
ADA Statement:
STATEMENT ON DISABILITIES: The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact the Office of Support Services for Students with Disabilities in Room 126 of the Koldus Student Services Building. The phone number is 845-1637.

Plagiarism Statement:
STATEMENT ON PLAGIARISM: The handouts used in this course are copyrighted. By handouts, I mean all materials generated for this class, which include but are not limited to syllabi, quizzes, exams, lab problems, in-class materials, review sheets, and additional problem sets. Because these materials are copyrighted, you do not have the right to copy the handouts unless I expressly grant permission. As commonly defined, plagiarism consists of passing off as one's own the ideas, words, writing, etc., that belong to another. In accordance with this definition, you are committing plagiarism if you copy the work of another person and turn it in as your own, even if you should have the permission of that person. Plagiarism is one of the worst academic sins, for the plagiarist destroys the trust among colleagues without which research cannot be safely communicated. If you have any questions regarding plagiarism, please consult the latest issue of the Texas A&M University Student Rules, under the section "Scholastic Dishonesty."

Academic Integrity Statement: An Aggie does not lie, cheat, or steal or tolerate those who do.
All syllabi shall contain the above Aggie Honor Code and refer students to the Honor Council Rules and Procedures on the web at http://www.tamu.edu/aggiehonor

It is further recommended that instructors print the following on assignments and examinations:

"On my honor, as an Aggie, I have neither given nor received unauthorized aid on this academic work."

Signature of Student: ______________________

INCOMPLETE GRADE: A temporary grade of I (Incomplete) at the end of a semester indicates that the student has completed the course with the exception of a major quiz, the final exam, or other work. The instructor shall give this grade only when the deficiency is due to an authorized absence or another cause beyond the control of the student.
TOPICS COVERED

I. Introduction to Statistics:
   a. Role of statistics in scientific investigations
   b. Experimental vs. observational studies
   c. Types of sampling schemes
   d. Probability: from population to samples
   e. Statistics: from samples to population
II. Distribution Theory and Data Modeling:
   a. Discrete distributions:
      1. pmf, cdf, percentiles
      2. expected value, standard deviation
      3. binomial: number of successes in n independent Bernoulli trials
      4. negative binomial: number of trials needed to obtain r successes
      5. hypergeometric: number of Type A units in a sample of n units selected without replacement from a population consisting of Type A and Type B units (if sampling is with replacement, the distribution is binomial)
      6. Poisson: number of occurrences of an event in a unit of time or space
   b. Continuous distributions:
      1. relationship between the pdf, cdf, and percentiles
      2. why the pdf is not the probability of an event
      3. survival function and hazard function
      4. normal: symmetric distribution
      5. Weibull: models processes with nonconstant failure rates, maximums
      6. gamma: models the time to the kth occurrence of an event in a Poisson process
      7. exponential: models processes with constant failure rates; time to the next occurrence of an event in a Poisson process
      8. relationship between the normal distribution and the t, chi-squared, and F distributions
      9. uniform (central to simulation): special case of the beta distribution
      10. Cauchy, double exponential, and logistic: symmetric distributions
      11. lognormal: skewed to the right, support on the positive reals
      12. mixture distributions: multimodal distributions
   c. Be able to recognize processes for which each of the above discrete and continuous distributions would be the appropriate model
III. Graphical descriptions of data:
   a. Univariate:
      1. edf, histogram, nonparametric density estimator
      2. box plot, stem-and-leaf plot
      3. quantile plot
   b. Multivariate:
      1. scatterplot, draftsman's plot
      2. side-by-side box plot
IV. Numerical summaries of data:
   a. measures of location: mean, median, mode, percentiles, trimmed means, M-estimators
   b. measures of dispersion: range, IQR, standard deviation, SIQR, coefficient of variation, MAD
   c. measures of higher-order properties: skewness, kurtosis
   d. measures of dependency: correlation, partial correlation
V. Cumulative and Empirical Distribution Functions (cdf and edf):
   a. relation of numerical summaries to the cdf and edf
   b. the inverse probability transform and its relationship to computer simulation of continuous distributions
   c. simulation of discrete distributions
   d. goodness-of-fit tests:
      1. chi-squared test for discrete data
      2. Shapiro-Wilk test for normality
      3. Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling with corrections
VI. Graphical Comparisons:
   a. reference distribution probability plot
   b. comparisons of populations/processes using box plots
   c. comparisons of populations/processes using quantile plots
   d. probability plot for location-scale families of distributions
   e. normal and Weibull probability plots
VII. Sampling Distributions and Their Application:
   a. relationship between samples and populations
   b. central limit theorem for the sample mean: the distribution of $(\bar{X} - \mu)/(\sigma/\sqrt{n})$ converges to the standard normal distribution as $n \to \infty$
   c. central limit theorems for sample quantiles, the sample median, and sample standard deviations
   d. extreme order statistics, properly standardized, converge in distribution to an extreme value distribution, not to the normal distribution
VIII. Estimation of population parameters:
   a. biased vs. unbiased estimators
   b. estimation of population parameters using the edf in place of the cdf in the definition of the parameter
   c. mean squared error as a measure of the performance of an estimator
   d. confidence intervals for population parameters
   e. determining sample sizes for estimating µ and π to a specified level of precision
   f. constructing tolerance intervals for processes and populations
   g. understanding the difference between tolerance intervals and confidence intervals
IX. Tests of Hypotheses:
   a. null and alternative hypotheses
   b. Type I and Type II errors
   c. rejection regions
   d. OC curves and power curves for z-tests, t-tests, and F-tests
   e. sample size determination for achieving a specified power or P(Type II error)
   f. p-values
   g. test statistics for hypotheses about population parameters: µ, σ, p, failure rate, λ in the Poisson model
   h. test statistics for comparing:
      population means µ1, µ2, ..., µk
      population standard deviations σ1, σ2, ..., σk
      population proportions π1, π2, ..., πk
      small-sample tests for binomial and Poisson parameters and for the parameter of the exponential distribution
   i. distribution-free procedures and their relative performance vs. parametric procedures
   j. evaluation of the required conditions in inference procedures:
      robustness of some procedures to deviations from normality
      effect of correlation on most statistical procedures
      detecting non-normality using tests and quantile plots
      detecting unequal variances using Levene's test and box plots
      detecting correlation using Durbin-Watson and runs tests, scatterplots
   k. goodness-of-fit tests for discrete and continuous distributions:
      1. chi-squared test for discrete data
      2. Shapiro-Wilk test for normality
      3. Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling with corrections
X. Fitting Models to Data:
   a. strategy of regression analysis
   b. general linear model in matrix form
   c. LSE and its relationship to the MLE
   d. AOV for regression
   e. properties of the LSE of the parameters
   f. model specification
   g. regression diagnostics
   h. discussion of the detection of violations of assumptions
   i. variable selection
XI. Analysis of Completely Randomized Designs:
   a. single-factor experiments
   b. multifactor experiments
   c. parameter estimation
   d. tests, confidence intervals, and graphical displays
   e. relating the models to regression models
   f. discussion of the detection of violations of assumptions
   g. design concepts illustrated using examples from the literature and consulting examples where the lack of design considerations led to inappropriate or incomplete analyses
   h. discussion of the need for more complex designs
   i. discussion of alternative analyses: transformations, rank-based techniques, etc.
   j. multiple comparisons, contrasts
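Topic V.b links the inverse probability transform to computer simulation of continuous distributions. As a rough illustration only (written in Python by assumption, although the course itself uses SAS, Minitab, and R), the sketch below inverts the exponential cdf $F(x) = 1 - e^{-\lambda x}$, giving $F^{-1}(u) = -\ln(1 - u)/\lambda$, and applies it to Uniform(0, 1) draws. The function name `sample_exponential`, the rate 2.0, and the seed 641 are hypothetical choices for the example.

```python
import math
import random
from typing import List, Optional


def sample_exponential(rate: float, n: int, seed: Optional[int] = None) -> List[float]:
    """Draw n exponential(rate) variates by the inverse probability transform.

    Hypothetical helper for illustration. The exponential cdf
    F(x) = 1 - exp(-rate * x) has inverse F^{-1}(u) = -ln(1 - u) / rate,
    so applying F^{-1} to Uniform(0, 1) draws yields exponential draws.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the log is finite.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


if __name__ == "__main__":
    draws = sample_exponential(rate=2.0, n=100_000, seed=641)
    mean = sum(draws) / len(draws)
    # For an exponential(rate) distribution the population mean is 1 / rate.
    print(f"sample mean = {mean:.4f}, theoretical mean = {1 / 2.0:.4f}")
```

The same idea applies to any continuous distribution whose cdf can be inverted in closed form, such as the Weibull; for discrete distributions (topic V.c), the generalized inverse of the step-function cdf is used instead.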
MS in STATISTICS - FIRST YEAR COURSES

STAT 610 Theory of Statistics, I: Intro. Prob. Theory, Distribution Theory, Generating Functions, Limit Theorems, Transformations
STAT 604 Computational Statistics: Computer Algorithms, Random Number Generation, Simulation Studies, S+, C++, Java
STAT 641 Methods of Statistics, I: Design of Studies, Data Analysis, Graphical Summaries, Inferences
STAT 611 Theory of Statistics, II: Estimation (Point & Interval), Hypothesis Testing, Decision Theory, Bayesian Theory
STAT 608 Regression Analysis: Linear and Non-Linear Models, Least Squares, Non-Full Rank Models
STAT 642 Methods of Statistics, II: Experimental Design, ANOVA, Variance Components, Inferences