MULTIPLE COMPARISONS (Section 4.4)


1. Bonferroni Method.

Last time: If we form two 95% confidence intervals for two means or two effect differences, etc., then the probability that, under repeated sampling with the same design, the procedures used will give intervals each containing the true mean, effect difference, etc. might only be 90% -- we have no reason to believe it must be any higher without more information. That is, the simultaneous (or family-wise, or overall) confidence level is 90%.

Analogous calculations show: if we form m confidence intervals, each with confidence level 1 - α individually, then the simultaneous (family-wise, overall, experiment-wise) confidence level could be as low as 1 - mα.

Consequence: If we want overall level 1 - α, then choose individual level 1 - α/m. This is called the Bonferroni method. For example, if we are forming 5 confidence intervals and want an overall 95% confidence level, we need to use the procedure for individual 99% confidence intervals.

Bonferroni typically gives wide intervals.

Example: In the battery experiment, the individual 95% confidence intervals for the four means shown in the Minitab output have a Bonferroni overall confidence level of only 80%. If we want an overall confidence level of 95% for the four confidence intervals, we need to calculate individual 98.75% confidence intervals:

    se = √(mse/r_i) = √(2368/4) = 24.33,  using the t-value t(12, .99375) = 2.9345.

Result: The confidence intervals have half-width 2.9345 × 24.33 = 71.40 -- compare with 2.1254 × 24.33 = 51.71 for the individual 95% confidence intervals -- more than a third wider.
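
As a quick check of this arithmetic, here is a minimal Python sketch (assuming, as above, mse = 2368, r = 4 observations per treatment, and 12 error degrees of freedom). Note that the individual-interval multiplier it computes is the plain t quantile t(12, .975) ≈ 2.18, which differs slightly from the 2.1254 quoted from the Minitab output.

    # Bonferroni vs. individual half-widths for the battery example
    # (assumed quantities: mse = 2368, r = 4, error df = 12, m = 4 intervals).
    from scipy import stats

    mse, r, df_error = 2368, 4, 12
    m, alpha = 4, 0.05                                    # overall level 95%

    se = (mse / r) ** 0.5                                 # se of a treatment mean
    t_indiv = stats.t.ppf(1 - alpha / 2, df_error)        # individual 95% CI
    t_bonf = stats.t.ppf(1 - alpha / (2 * m), df_error)   # individual 98.75% CI

    print(f"se = {se:.2f}")
    print(f"individual 95% half-width:           {t_indiv * se:.2f}")
    print(f"Bonferroni (overall 95%) half-width: {t_bonf * se:.2f}")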

This illustrates the reality: to get a given family-wise confidence level, you will get wider confidence intervals than those formed at the same individual confidence level.

A Bonferroni approach can also be used for hypothesis tests: if you want to do m hypothesis tests on your data, and you want an overall type I error rate of α (that is, you want the probability of falsely rejecting at least one of the null hypotheses to be no more than α), you can achieve this by using a significance level of α/m for each test individually. (A short code sketch of this rule follows below.)

Example: Suppose the experimenter in the battery example collected the data, analyzed them, looked at the confidence intervals in the Minitab output, noticed that the estimate of the mean for the second level was largest and the estimate for the first level the second largest, and tested the null hypothesis H0: µ1 = µ2. For what p-values should he reject the null hypothesis using the Bonferroni method in order to claim his result is significant at the .05 level?

Pre-planned comparisons and data snooping

A pre-planned comparison is identified before running the experiment. The experiment should be designed so that the items to be estimated are estimable and their variances are as small as possible.

Data snooping is looking at your data after the experiment has been performed, deciding something looks interesting, and then doing a test on it. There is nothing inherently wrong with data snooping -- often interesting results are found this way. But data-snooping tests need to be done with care to obtain an honest significance level. The problem is that they are usually the result of several comparisons, not just the one formally tested. So if, for example, a Bonferroni procedure is used, you need to take all the other comparisons that were done informally into account when setting the significance level.
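
A minimal sketch of the Bonferroni rejection rule (reject H0_i only when p_i < α/m); the function name and numbers are illustrative only, and m is left as an argument because, for data-snooped comparisons, it must count every comparison that was examined, even informally.

    # Bonferroni rule for m tests: reject H0_i only if p_i < alpha / m.
    def bonferroni_reject(p_values, alpha=0.05, m=None):
        # m may exceed the number of tests formally carried out, e.g. when
        # other comparisons were examined informally (data snooping).
        m = m if m is not None else len(p_values)
        return [p < alpha / m for p in p_values]

    # One formally tested p-value, but six pairwise comparisons were eyeballed:
    print(bonferroni_reject([0.02], alpha=0.05, m=6))   # [False]: not significant at overall .05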

Summary of utility of Bonferroni methods:
- Not recommended for data snooping -- it is too easy to overlook comparisons that were made in deciding what to test.
- OK for pre-planned comparisons when m is small.
- Not useful when m is large -- too conservative (the CIs may be too wide; the type II error rate too large).

Comment: In regression, interest is often in model building, not estimation or establishing causality, so less attention is paid to multiple inference. (But model validation, using another data set, is important.) Some uses of regression do require attention to multiple inference (e.g., estimating more than one parameter in a regression equation). Bonferroni methods can be used in regression; confidence regions in parameter space usually give tighter results. Unfortunately, many users of statistics aren't aware of the problems with multiple comparisons.

2. General Comments on Methods for Multiple Comparisons.

There are many methods for multiple comparisons. All the methods that we will discuss produce confidence intervals with endpoints of the form

    Ĉ ± w·se(Ĉ),

where:
- C is the contrast or other parameter being estimated;
- Ĉ is the least squares estimate of C;
- se(Ĉ) is the standard error of Ĉ;
- w (the critical coefficient) depends on the overall confidence level 1 - α, the method, the number v of treatments, the number m of things being estimated, and the number of error degrees of freedom.

For Bonferroni, w = w_B = the upper α/(2m) critical value of the t distribution with n - v error degrees of freedom, i.e., the quantile t(n - v, 1 - α/(2m)) in the notation used above (2.9345 in the battery example).

Note: The half-width w·se(Ĉ) of the confidence interval is called the minimum significant difference (msd) -- it is the smallest value of |Ĉ| that will produce a confidence interval not containing 0, and hence declare the contrast significantly different from zero.

3. Scheffé Method.

- Does not depend on the number of comparisons being made.
- Applies to contrasts only.

The idea: every contrast is a linear combination of the v - 1 "treatment versus control" contrasts τ2 - τ1, ..., τv - τ1. A 1 - α confidence region for these v - 1 contrasts is formed. This confidence region for these special contrasts determines confidence bounds for every possible contrast, independently of the number of contrasts.
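
For a numerical feel for the critical coefficients, here is a hedged sketch comparing Bonferroni with Scheffé for the battery-example quantities (v = 4 treatments, error df = 12, α = .05). The Scheffé coefficient is taken as w_S = √((v - 1)·F(v - 1, n - v, α)), the usual form for contrasts in a one-way model; that formula is not written out above, so treat it as an assumption of the sketch.

    # Bonferroni vs. Scheffé critical coefficients (assumed: v = 4, error df = 12).
    from math import sqrt
    from scipy import stats

    v, df_error, alpha = 4, 12, 0.05
    w_S = sqrt((v - 1) * stats.f.ppf(1 - alpha, v - 1, df_error))   # Scheffé

    for m in (3, 6, 20, 100):                   # number of contrasts estimated
        w_B = stats.t.ppf(1 - alpha / (2 * m), df_error)            # Bonferroni
        print(f"m = {m:3d}:  w_B = {w_B:.3f},  w_S = {w_S:.3f}")

For small m the Bonferroni coefficient is the smaller of the two; as m grows it eventually exceeds w_S, which is the trade-off summarized next.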

Summary of utility of Scheffé method:
- It does not matter how many comparisons are made, so it is suitable for data snooping.
- For large m, it gives shorter confidence intervals than Bonferroni.
- For small m, it is "expensive insurance."

Note: Minitab 15 does not give the Scheffé method, so we won't use it in this class.

4. Tukey Method for All Pairwise Comparisons.

Used for all pairwise contrasts τi - τj. Also called the Honest Significant Difference method, since (for equal sample sizes) it depends on the distribution of the statistic

    Q = (max{T1, ..., Tv} - min{T1, ..., Tv}) / √(MSE/r),  where Ti = Ȳi· - µi.

This distribution is called the Studentized range distribution. Like the F distribution, it depends on two parameters (here the number of treatments v and the error degrees of freedom n - v).

Critical coefficient: w_T = q(v, n - v, α)/√2.

For equal sample sizes, the overall confidence level is exactly 1 - α; for unequal sample sizes, it is at least 1 - α.

Note: Since this method deals only with pairwise contrasts, the standard error of the estimate of τi - τj needed in the calculation of the msd is just

    √(mse(1/r_i + 1/r_j)).
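
Here is a small sketch of the Tukey critical coefficient and msd for the battery-example quantities (assumed: v = 4, r = 4 per treatment, error df = 12, mse = 2368, α = .05), using SciPy's studentized-range distribution:

    # Tukey critical coefficient w_T = q(v, n-v, alpha)/sqrt(2) and the msd.
    from math import sqrt
    from scipy import stats

    v, df_error, r, mse, alpha = 4, 12, 4, 2368, 0.05

    q = stats.studentized_range.ppf(1 - alpha, v, df_error)   # q(v, n-v, alpha)
    w_T = q / sqrt(2)                                          # critical coefficient
    se_pair = sqrt(mse * (1 / r + 1 / r))                      # se of a pairwise contrast
    print("Tukey msd:", round(w_T * se_pair, 2))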

Summary of utility of Tukey method:
- Usually gives shorter confidence intervals than either Bonferroni or Scheffé.
- In its basic form it can be used only for pairwise comparisons. (There is an extension to all contrasts, but it is usually not as good as Scheffé.)

Example: Battery experiment.

5. Dunnett Method for Treatment-Versus-Control Comparisons.

If treatment 1 is a control, then we are likely to be interested in the treatment-versus-control contrasts τi - τ1. The method is based on the joint distribution of the estimators Ȳi· - Ȳ1· (a type of multivariate t distribution).

Because the distribution is complicated, the calculation of w_D is best left to reliable software. Not all software (e.g., Minitab) gives one-sided confidence intervals, which might be desired.

Summary of utility of Dunnett method:
- Best method for treatment-versus-control comparisons.
- Not applicable to other types of comparisons.

Example: Battery experiment.
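
Since the calculation is best left to software, here is a hedged sketch using scipy.stats.dunnett (available in SciPy 1.11 and later); the response values below are placeholders for illustration, not the actual battery data:

    # Treatment-versus-control comparisons via Dunnett's method.
    import numpy as np
    from scipy import stats

    control = np.array([570, 611, 523, 545])   # placeholder data: treatment 1 (control)
    trt2 = np.array([480, 460, 510, 495])      # placeholder data: treatment 2
    trt3 = np.array([610, 590, 625, 600])      # placeholder data: treatment 3

    res = stats.dunnett(trt2, trt3, control=control)
    print(res.pvalue)                                   # adjusted p-values vs. control
    print(res.confidence_interval(confidence_level=0.95))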

6. Hsu's Method for Multiple Comparisons with the Best Treatment.

Instead of comparing each treatment with a control group, each treatment is compared with the best of the other treatments. The procedure varies slightly depending on whether "best" means largest or smallest. (Minitab allows the user to choose which is desired.) See p. 90 of the textbook for details.

Summary of utility of Hsu method:
- Good for what it does.
- Not applicable to other types of comparisons.

Example: Battery experiment.

7. Other Methods.

There are many. Books have been written on the subject (e.g., Miller; Hsu). Some people have their favorites, which others argue are not good choices.

8. Combinations of Methods.

There are various possibilities; see p. 91 for some. The idea: split α between the methods, analogous to the Bonferroni procedure.

Example: If the experiment is intended to test treatment versus control, use Dunnett with overall α = .02 for that, and use Tukey, Hsu, or Scheffé at overall α = .03 for other things of interest that arise.