Bootstrapping: described and illustrated
Comparing the standard mean, 10% and 20% trimmed means, & median

Before discussing the main topic, let us quickly review sampling distributions so that everyone is clear on the major theoretical background. Because the concept of a sampling distribution of a statistic (especially a mean) is so fundamental to bootstrapping (what it's about, why it works as it does), I want to review the following.

The sampling distribution of the mean has three principal characteristics you should remember:

(1) For any sample size n, the mean of all (!) sample means [drawn from a finite population] is necessarily equal to the population mean (such a statistic is said to be unbiased);

(2) The variance of the distribution of all means (always) equals the population variance divided by n; and (perhaps surprisingly),

(3) As the sample size n grows larger, the shape of the sampling distribution of the mean tends toward that of a normal curve, regardless of the shape or form of the parent population.

Thus, it is properly said that the distribution of the sample mean necessarily has mean µ and variance σ²/n, where these Greek letters stand for the population mean and variance respectively. Moreover, such a sampling distribution approaches normal form as the sample size n grows larger, for [virtually] every population! It is the generality of this last point that is so distinctive. The square root of the variance σ²/n (written σ/√n, so we may write σ_mean = σ/√n) is called the standard error of the mean (i.e., the standard deviation of the sampling distribution of the mean, σ_mean), a term frequently encountered in statistical practice. What has just been stated are the principal results of the Central Limit Theorem.

(Stop to note that the sampling distribution [of any statistic] is itself a population; but such a distribution is wholly distinct from the distribution of the parent population, and from the distribution of any particular sample. Because of its status as a population, the characteristics of a sampling distribution are generally denoted by Greek letters [consider that all possible samples of a given size were the sources of the statistics]. But don't confuse the sampling distribution (of any statistic, and there is generally a different one for each statistic) with the parent population from which the samples were drawn. The preceding points are fundamental to the bootstrap method (see below). But note that when we speak of a bootstrap distribution of a statistic, we are talking about an approximate sampling distribution of a particular statistic (not just the mean!), based on a large number of bootstrap samples; for each bootstrap sample, the sampling is done with replacement from a particular sample-as-population, and each bootstrap sample is of the same size as the original, i.e., n. Still, that large number [1000 below] of statistics is far smaller than the total number of possible samples, which is generally n to the power n (n^n) in a bootstrap context, or N to the power n (N^n) for a finite population of size N.)

Bootstrapping entails sampling with replacement from a vector (or from the rows of a matrix or data frame; see below for how to do this in R), so that each bootstrap sample is always the same size as the original. But don't confine your thinking to just the mean as we begin to consider bootstrapping; in general, bootstrap distributions can be created for any statistic that can be computed, and each statistic is based on a set of resampled data points.
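To make the three properties concrete, here is a quick R sketch (my addition, not part of the illustration below; the exponential parent and the seed are arbitrary choices) that samples repeatedly from a strongly skewed population and inspects the resulting sampling distribution of the mean:

set.seed(123)                     # arbitrary seed, for reproducibility
n <- 25
many.means <- replicate(10000, mean(rexp(n, rate = 1)))  # Exp(1): mu = 1, sigma^2 = 1
mean(many.means)   # approximately 1, the population mean (unbiasedness)
var(many.means)    # approximately sigma^2 / n = 1/25 = 0.04
hist(many.means)   # roughly normal in shape, despite the skewed parent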
The following illustration begins from a vector y that contains n = 100 values, originally generated as a random sample from the t₃ distribution (i.e., t with 3 degrees of freedom), and then scaled to have a mean of 20 and a standard deviation of about 3. This accounts for the relatively long tails of y, compared with a Gaussian (normal) distribution, that you see below. See the plot of y, which is both a sample and a population, depending on your point of view (both ideas are relevant); its summary statistics (parameters?) are given below.
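The exact commands that produced y are not shown; one plausible reconstruction (a sketch only; the seed is arbitrary, and scale() makes the s.d. exactly 3 rather than "about" 3) is:

set.seed(42)   # arbitrary seed; not the original generating code
y <- as.vector(20 + 3 * scale(rt(100, df = 3)))  # standardize t_3 draws, then shift/stretch
c(mean(y), sd(y))   # 20 and 3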

Here is the R function I used to obtain four central-tendency estimates for each of 1000 bootstrap samples (copy and paste means4 into your R session):

means4 <- function(x, tr1 = .1, tr2 = .2) {
  # given a vector x, compute FOUR statistics to assess central tendency
  xm1   <- mean(x)               # conventional mean
  xmt.1 <- mean(x, trim = tr1)   # 10% trimmed mean
  xmt.2 <- mean(x, trim = tr2)   # 20% trimmed mean
  xm.5  <- median(x)             # 50% trimmed mean = median
  xms <- c(xm1, xmt.1, xmt.2, xm.5)
  xms <- round(xms, 2)
  list(xms = xms)
}

The four estimates above are reported as mean1 ... mean4 below. Now we use the bootstrap function from the library bootstrap; this is the command used for the main bootstrap run (1000 replicates; use nboot >= 1000 for good C.I.'s):

mns4.y <- bootstrap(y, nboot = 1000, means4)

This generated 1000 bootstrap replications of the four statistics [library: bootstrap in R]; all of the R commands used are given below. Numerical summary of the bootstrap results:

> cbind(my.summary(y = x), my.summary(mns4.500))
         pop.  mean1  mean2  mean3  mean4   # mean1 is just the conventional mean
means   20.00  19.99  20.01  20.00  20.02   # departures from 20 indicate bias
s.d.s    3.03   0.30   0.26   0.26   0.24   # first value is the population s.d.; the rest are bootstrap s.e.'s
skewns  -1.04  -0.07   0.00  -0.09  -0.06

These standard errors are the key results here; we will discuss them below (see the plot on the next page). A second run, again with 1000 bootstrap replications, gave the following results (I ignore the pop. values here, as they did not change):

        mean1  mean2  mean3  mean4
means   19.99  20.02  20.01  20.03
s.d.s    0.30   0.25   0.25   0.24
skewns  -0.07   0.10   0.00  -0.02

For practical purposes, the two runs are identical. Following are the four bootstrap distributions for the first set.

NB: The initial population was a long-tailed sample. Its use affords an opportunity to study the conventional sample mean as an estimate of the center of a distribution when normality does not hold. We in fact see below that the conventional mean is the worst of the four estimators of the center of the parent population's distribution, based on 1000 bootstrap samples. Remember that the initial sample-as-population had n = 100 scores, so n = 100 for each bootstrap sample as well. Thus the first s.e., for mean1, can be calculated from theory; that theory says to divide the population s.d. by sqrt(n), here 3.03/sqrt(100) = .303. We are most informed by the computed standard-error estimates; these quantify how well the different estimators of the center of this distribution work in relation to one another.

To repeat: each bootstrap sample entails sampling WITH replacement from the elements of the initial data vector y. Each of the B = 1000 bootstrap samples contains n = 100 resampled scores, and all four statistics (the "means", three of them trimmed) were computed for each bootstrap sample. The summary results, and especially the standard-error estimates based on the bootstrap replicates, are the principal results on which one will usually focus in studies like this. See the documentation for bootstrap for more information on what this function does, or can do.
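If you prefer to see the mechanics laid bare, the same study can be run in base R without the bootstrap library. A sketch (my addition), assuming y and means4 as defined above:

set.seed(7)   # arbitrary seed
B <- 1000
reps <- t(replicate(B, means4(sample(y, length(y), replace = TRUE))$xms))  # B x 4 matrix of replicates
colnames(reps) <- c("mean1", "mean2", "mean3", "mean4")
round(colMeans(reps), 2)       # departures from 20 indicate bias
round(apply(reps, 2, sd), 2)   # bootstrap s.e.'s (compare .30, .26, .26, .24 above)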

The first major book on the bootstrap was written by Bradley Efron, the inventor of the bootstrap, and Tibshirani: An Introduction to the Bootstrap, 1993. There are now at least a dozen books, many of them technical, about bootstrapping. (The May 2003 issue of Statistical Science is devoted exclusively to articles on bootstrapping, marking its 25th anniversary.)

Some things you may find useful about bootstrapping within the world of R:

1. A vector such as y can be regarded as y[1:n], where one controls the contents via bracket notation: y[c(1,3)] gives the 1st and 3rd elements of y; y[n:1] presents the y values in reverse order; and y[sample(1:n, n, replace = TRUE)] yields a bootstrap sample of y, of size n. The latter, repeated (sampling WITH replacement each time), becomes the basis for a bootstrap analysis.

2. A matrix such as yy, regarded as yy[1:n, 1:p] (of order n x p), can be examined in parts using bracket notation; e.g., yy[1:3, ] displays the first 3 rows of yy. To sample the rows of yy, use yy[sample(1:n, n, replace = TRUE), ], where the comma in [ , ] separates the row and column designations.

R commands used to get the preceding numerical results:

> bt.mn4 <- bootstrap(x, nboot = 1000, theta = means4)   # x is called xt3 in Fig. 1
> bt.mns4 <- as.data.frame(bt.mn4$thetastar)   # the output thetastar is of class 'list'; it needs to be a data.frame or a matrix for what follows
> bt.mns4 <- t(bt.mns4)   # the transpose of the matrix (data frame) bt.mns4 is taken for convenience (below)
> gpairs(bt.mns4)   # the gpairs function is from the package YaleToolkit
> my.summary(bt.mns4)   # I wrote my.summary; copy it into your R session, or just use R's built-in summary

my.summary <- function(xxx, dig = 3) {
  # xxx is taken to be an input data.frame or matrix
  xxx <- as.matrix(xxx)
  xm  <- apply(xxx, 2, mean)        # column means
  s.d <- sqrt(apply(xxx, 2, var))   # column s.d.s
  xs  <- scale(xxx)
  sk  <- apply(xs^3, 2, mean)       # skewness
  kr  <- apply(xs^4, 2, mean) - 3   # excess kurtosis
  rg  <- apply(xxx, 2, range)       # low and high
  sumry <- round(rbind(xm, s.d, sk, kr, rg), 3)
  dimnames(sumry)[1] <- list(c("means", "s.d.s", "skewns", "krtsis", "low", "high"))
  sumry <- round(sumry, dig)
  sumry
}

Bootstrapping sources in R

R functions for bootstrapping can be found in the bootstrap and boot libraries, so you should examine the help files for several of the functions in these libraries to see how to proceed. Note that bootstrap is a much smaller library than boot, and generally easier to use effectively. I recommend that you begin with the function bootstrap in the library of the same name, but the greater generality of boot will be most helpful in some more advanced situations. The help.search function will yield more packages that reference bootstrapping, so give that a try too. Naturally, many introductions and discussions can be found on the web; let's see what you like. Post a URL or two.
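For a taste of boot's more general interface, here is a sketch (my addition, not the run used above); note that boot requires a statistic written in the (data, indices) form:

library(boot)
theta.fn <- function(data, idx) means4(data[idx])$xms   # (data, indices) form required by boot
bt <- boot(y, statistic = theta.fn, R = 1000)           # assumes y and means4 from above
apply(bt$t, 2, sd)                      # bootstrap s.e.'s for the four estimators
boot.ci(bt, type = "perc", index = 4)   # percentile CI for the median (the 4th statistic)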

These pages summarize the key points and offer more general principles. Of particular interest is the concept of confidence intervals, and ways to use bootstrap methods to generate limits for CIs. Note the (deliberate) redundancy between what follows and what you have read above.

The essence of bootstrapping: We begin by assuming a quantitative variable that we want to characterize or describe using several different statistics (means, medians, trimmed means, variances, s.d.s, skewness, etc.). Our goal is to make inferences about parent-population parameters using confidence intervals that have been constructed from the information in a reasonably large number of computer-generated bootstrap samples. (Take note: we will NOT introduce any mathematical theory here; all that follows involves computer-intensive computation, but no theory as such.)

1. Begin from an initial sample, not too small (say, 30 or 40 cases at a minimum); this should be a random sample, or a sample for which we can reasonably argue that it represents some larger universe of scores that we shall think of as our parent population. Let y represent the sample data.

2. Decide what feature(s) of the parent population we would like to make inferences about (center, spread, skewness, etc.); then, given one or two choices, say center and spread, decide on the statistics we want to use for inference. We might have two, three, or more alternative measures of each feature (e.g., four "means" for center; s.d.s and IQRs to assess spread), for a total of S statistics, say. One goal here is to compare the various estimators with one another with respect to their usefulness in making inferences. (We might also have begun with difference scores in our initial vector.)

3. Compute and save each of these statistics for our initial sample of y values; we shall call them by a special name: bootstrap parameters (which are also statistics; see below). Reflect on this point, since it is easy to become confused here.

4. Choose or write the functions that we will apply to each bootstrap sample, where each bootstrap sample is simply a sample drawn WITH replacement from the initial sample. The initial sample will now be regarded, for the purposes of bootstrapping, as our bootstrap population. (Note carefully that in what follows we must take care to distinguish the parent population from the bootstrap population. The latter can be said to have bootstrap parameters, which are also properly labeled as conventional sample statistics.)

5. Generate bootstrap samples a substantial number of times (say B = 500 to 1000), saving the bootstrap replicates (those that measure center, spread, skewness) for each of the bootstrap samples. It is best to generate an array (a matrix of order B x S) that contains all of these; they shall be called replicate values for the respective statistics, and they will be the basis for the ultimate inferences.

6. Summarize the preceding table by columns, one for each statistic that relates to a particular feature of the initial bootstrap population (recalling that our bootstrap population began as our initial sample). Both numeric and graphical methods should usually be employed.

7. Compute and compare the (conventional) means of the replicate statistics (columns) with the bootstrap population parameters; the differences may be positive or negative, and these differences measure bias.
Ideally, we might seek zero bias, but small amounts of bias are usually tolerated, particularly if the biased statistics have compensating virtues, especially relatively small variation across the set of bootstrap samples.

8. Then compute and compare the s.d.s of the respective statistics; often the main goal of the entire bootstrapping study is to find which statistics have the smallest s.d.s (which is to say, the smallest bootstrap standard errors), since these are the statistics that will have the narrowest confidence intervals. If a statistic is found to be notably biased, we may want to adjust the statistics (nominally used as the centers of our ultimate confidence intervals).

9. Generate the density distributions (histograms are fine) and, more importantly, selected quantiles of the bootstrap statistics (the bootstrap replicates) we generated. For example, if we aim to get a 95% interval for a trimmed mean, we find the 2.5% and 97.5% quantile points of the distribution of that trimmed mean, and (supposing it has minimal bias) these become our confidence limits for a 95% interval; a sketch follows this list. We will surely want to compare these limits with those for the conventional mean. The statistics with the narrowest CIs can usually be said to be best, particularly if they were also found to be minimally biased. Similar methods are used for 99% CIs, etc. Graphics can be useful in this context, but be sure to note that all the information is based on a (rather arbitrary) initial sample, so care must be taken not to misinterpret, or over-interpret, results.

10. Summarize by comprehensively describing the main results, also noting that this methodology has bypassed normal-theory methods that, strictly speaking, apply only when normality assumptions can be invoked; moreover, we have made no assumptions about the shape or other features of the (putative [look it up!]) parent population. In particular, we have not assumed normality at any point, and we have gone (well) beyond the mean to consider virtually any statistic of interest (review these; add others).

11. Finally, recognize that the interpretation of any such bootstrap CI is essentially the same as that for a conventional CI gotten by normal-theory methods.

These ideas readily generalize to statistics that are vectors, such as vectors of regression coefficients. This means we are all free to invent and use bootstrap methods to study the comparative merits and demerits of a wide variety of statistics, without regard for whether they are supported by normal theory. We need not invoke normality assumptions, nor make any other so-called parametric assumptions in the process. The main thing to note is that a bootstrap sample drawn from a vector or matrix is just a sample drawn with replacement from the ROWS of an initial sample data matrix, and (vector) bootstrap statistics are computed for each bootstrap matrix, analogous to what has been described above. The computation may be intense, but with modern computers such operations are readily carried out for rather large matrices (thousands of rows, hundreds of columns) if efficient computational routines are used. Conventional statistics can be notably inferior to certain newer counterparts, a point that often needs to be considered seriously.
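To make step 9 concrete, here is a short sketch (my addition) computing percentile limits from the B x S replicate matrix reps built in the base-R sketch earlier:

quantile(reps[, "mean3"], probs = c(0.025, 0.975))   # 95% percentile CI for the 20% trimmed mean
quantile(reps[, "mean1"], probs = c(0.025, 0.975))   # same for the conventional mean, for comparison
# the narrower of the two intervals points to the better estimator of center here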