Making Sense of Statistics

Visit our online Study Skills resource at www.skills4study.com

Palgrave Study Guides

A Handbook of Writing for Engineers by Joan van Emden
Authoring a PhD by Patrick Dunleavy
Effective Communication for Arts and Humanities Students by Joan van Emden and Lucinda Becker
Effective Communication for Science and Technology by Joan van Emden
How to Manage your Arts, Humanities and Social Science Degree by Lucinda Becker
How to Manage your Science and Technology Degree by Lucinda Becker and David Price
How to Study Foreign Languages by Marilyn Lewis
How to Write Better Essays by Bryan Greetham
Key Concepts in Politics by Andrew Heywood
Making Sense of Statistics by Michael Wood
The Mature Student's Guide to Writing by Jean Rose
The Postgraduate Research Handbook by Gina Wisker
Professional Writing by Sky Marsen
Research Using IT by Hilary Coombes
Skills for Success by Stella Cottrell
The Student's Guide to Writing by John Peck and Martin Coyle
The Study Skills Handbook (second edition) by Stella Cottrell
Studying Economics by Brian Atkinson and Susan Johns
Studying History (second edition) by Jeremy Black and Donald M. MacRaild
Studying Mathematics and its Applications by Peter Kahn
Studying Modern Drama (second edition) by Kenneth Pickering
Studying Psychology by Andrew Stevenson
Study Skills for Speakers of English as a Second Language by Marilyn Lewis and Hayo Reinders
Teaching Study Skills and Supporting Learning by Stella Cottrell

Palgrave Study Guides: Literature
General Editors: John Peck and Martin Coyle

How to Begin Studying English Literature (third edition) by Nicholas Marsh
How to Study a Jane Austen Novel (second edition) by Vivien Jones
How to Study Chaucer (second edition) by Rob Pope
How to Study a Charles Dickens Novel by Keith Selby
How to Study an E. M. Forster Novel by Nigel Messenger
How to Study James Joyce by John Blades
How to Study Linguistics (second edition) by Geoffrey Finch
How to Study Modern Poetry by Tony Curtis
How to Study a Novel (second edition) by John Peck
How to Study a Poet (second edition) by John Peck
How to Study a Renaissance Play by Chris Coles
How to Study Romantic Poetry (second edition) by Paul O'Flinn
How to Study a Shakespeare Play (second edition) by John Peck and Martin Coyle
How to Study Television by Keith Selby and Ron Cowdery
Linguistic Terms and Concepts by Geoffrey Finch
Literary Terms and Criticism (third edition) by John Peck and Martin Coyle
Practical Criticism by John Peck and Martin Coyle

Making Sense of Statistics
A Non-mathematical Approach

Michael Wood

© Michael Wood 2003

All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission.

No paragraph of this publication may be reproduced, copied or transmitted save with written permission or in accordance with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms of any licence permitting limited copying issued by the Copyright Licensing Agency, 90 Tottenham Court Road, London W1T 4LP.

Any person who does any unauthorised act in relation to this publication may be liable to criminal prosecution and civil claims for damages.

The author has asserted his right to be identified as the author(s) of this work in accordance with the Copyright, Designs and Patents Act 1988.

First published 2003 by PALGRAVE MACMILLAN
Houndmills, Basingstoke, Hampshire RG21 6XS and 175 Fifth Avenue, New York, N.Y. 10010
Companies and representatives throughout the world

PALGRAVE MACMILLAN is the global academic imprint of the Palgrave Macmillan division of St. Martin's Press, LLC and of Palgrave Macmillan Ltd. Macmillan is a registered trademark in the United States, United Kingdom and other countries. Palgrave is a registered trademark in the European Union and other countries.

ISBN 978-1-4039-0107-1
ISBN 978-0-230-80278-0 (ebook)
DOI 10.1007/978-0-230-80278-0

This book is printed on paper suitable for recycling and made from fully managed and sustained forest sources.

A catalogue record for this book is available from the British Library.

10 9 8 7 6 5 4 3 2
12 11 10 09 08 07 06 05 04

for Annette

Contents

List of Figures
List of Tables
Preface

1 Introduction: Statistics, Non-mathematical Methods and How to Use this Book
  1.1 Statistics
  1.2 The difficulties of statistics
  1.3 Non-mathematical methods
  1.4 Bucket and ball models and computer simulation
  1.5 Numbers, calculators and computers
  1.6 Suggestions for studying statistics and using this book

2 Probability, Samples, Buckets and Balls
  2.1 Probability
  2.2 What do the balls and buckets represent?
  2.3 Where do probabilities come from?
  2.4 What can you do with probabilities?
  2.5 Sampling
  2.6 Similar concepts
  2.7 Exercises
  2.8 Summary of main points

3 Summing Things up: Graphs, Averages, Standard Deviations, Correlations and so on
  3.1 Introduction
  3.2 What can a sample tell us about?
  3.3 Variables, cases and units of analysis
  3.4 Summarising a single variable
  3.5 Summarising the relation between two category variables
  3.6 Summarising the relation between one category and one number variable
  3.7 Summarising the relation between two number variables
  3.8 Summarising the relation between three or more variables
  3.9 Beyond the data: cause, effect and the wider context
  3.10 Similar concepts
  3.11 Exercises
  3.12 Summary of main points

4 Why Use Statistics? Pros, Cons and Alternatives
  4.1 Introduction
  4.2 Characteristic features of the statistical approach
  4.3 What sort of conclusions does statistics provide?
  4.4 What are the alternatives to statistics?
  4.5 Anecdotes about specific cases
  4.6 Fuzzy logic
  4.7 Chaos
  4.8 Exercise
  4.9 Summary of main points

5 Calculating Probabilities: Mental Ball Crunching and Computer Games
  5.1 Working out probabilities by listing equally likely possibilities
  5.2 Working out probabilities by thought experiments: the 'lots of times' tactic
  5.3 Working out probabilities by computer simulation
  5.4 Simulating distributions of probabilities: the two bucket model
  5.5 The Poisson distribution: counts of random events
  5.6 The normal distribution
  5.7 Similar concepts
  5.8 Exercises
  5.9 Summary of main points

6 Possible Worlds and Actual Worlds: How can we Decide What's True?
  6.1 Introduction
  6.2 An experiment on telepathy
  6.3 Actual and possible worlds
  6.4 Bayes' principle: the pruned ball method
  6.5 Confidence intervals
  6.6 Null hypothesis testing
  6.7 Exercises
  6.8 Summary of main points

7 How Big is the Error? Confidence Intervals
  7.1 Introduction
  7.2 Bootstrap confidence intervals
  7.3 Large populations: does size matter?
  7.4 What is the smallest sample we can get away with?
  7.5 Other statistics: proportions, quartiles, correlations and so on
  7.6 Assumptions behind bootstrap confidence intervals: what to check for
  7.7 Similar concepts
  7.8 Exercises
  7.9 Summary of main points

8 Checking if Anything is Going on: Tests of Null Hypotheses
  8.1 An example: testing the hypothesis that a doctor is not a murderer
  8.2 General procedure and rationale for null hypothesis tests
  8.3 Testing a hypothesis that there is no relationship
  8.4 An alternative: confidence intervals for a relationship
  8.5 The interpretation of p values
  8.6 Why p values can be misleading: things to watch for
  8.7 Choosing a method of inference: p values and other approaches
  8.8 Similar concepts
  8.9 Exercises
  8.10 Summary of main points

9 Predicting the Unpredictable or Explaining the Inexplicable: Regression Models
  9.1 Introduction: predicting earnings on the Isle of Fastmoney
  9.2 Straight line prediction models: linear regression
  9.3 Assessing the accuracy of prediction models
  9.4 Prediction models with several independent variables: multiple regression
  9.5 Using the regression procedures in Excel and SPSS
  9.6 Things to check for with regression
  9.7 Cause, effect, prediction and explanation
  9.8 Similar concepts
  9.9 Exercises
  9.10 Summary of main points

10 How to do it and What Does it Mean? The Design and Interpretation of Investigations
  10.1 The logic of empirical research: surveys, experiments and so on
  10.2 The practicalities of doing empirical research
  10.3 Interpreting statistical results and solving problems
  10.4 Similar concepts
  10.5 Exercises

Appendices
  A Using spreadsheets (Excel) for statistics
    A.1 Functions
    A.2 Random number functions: generating samples and probabilities
    A.3 Statistical procedures
    A.4 Pivot Table reports
    A.5 The Solver
    A.6 Formatting and rounding numbers
    A.7 Producing a histogram
  B A brief guide to the statistical package, SPSS
    B.1 Data entry
    B.2 Analysis: commonly used methods
  C Data and program files for downloading
  D Comments on some of the exercises

Notes
References
Index

List of Figures

3.1 Histogram of units drunk on Saturday based on sample of 92 (one outlier excluded)
3.2 Histogram of estimated total weekly units drunk based on sample of 92 (three outliers excluded)
3.3 Scatter diagram of Satunits and Sununits
3.4 Scatter diagram of Satunits and Monunits
3.5 Scatter diagram of age and Satunits
3.6 SPSS scatter diagram with sunflowers
3.7 Scatter diagrams corresponding to correlations of +1, 0 and -1
4.1 Truth values for 'drinks too much'
5.1 Tree diagram for three coin-tosses
5.2 Tree diagram for hands of two cards
5.3 Distribution of the number of girls among 100 babies based on 200 000 simulated groups of babies
5.4 Normal distribution to model heights of 18-year-old males
5.5 Two bucket model simulation of the normal distribution
7.1 First resample (question on socialising)
7.2 Resample distribution (question on socialising)
7.3 Error distribution (question on socialising)
7.4 Top of 'Lots of resamples' sheet of resample.xls
8.1 Probability of different numbers of deaths (mean = 40)
8.2 Resample distribution of difference of means
8.3 Resample distribution of range of means (sample of 20)
8.4 Resample distribution of range of means (sample of 92)
8.5 Confidence distribution for difference between proportion of female smokers and proportion of male smokers
8.6 Confidence distribution for Kendall correlation between drinking and smoking
9.1 Relationship between 10 km run time and earnings (based on sample of 16)
9.2 Relationship between 10 km run time and 10 km cycle time (based on sample of 16)
9.3 One line for predicting 10 km cycle time from 10 km run time (based on sample of 16)
9.4 A better line for predicting 10 km cycle time from 10 km run time (based on sample of 16)
9.5 Best fit line for predicting 10 km cycle time from 10 km run time (based on sample of 16)
9.6 Best fit line for predicting earnings from 10 km run time (based on sample of 16)

List of Tables

2.1 Numbers of men in different categories
2.2 Random numbers
3.1 Drink data from 20 students
3.2 Frequencies for Figure 3.2
3.3 Weights in grams of apples in two bags
3.4 Percentages of female and male smokers in Table 3.1
3.5 Mean units drank on Saturday night for males and females (based on Table 3.1)
3.6 Calculation of a correlation coefficient between Satunits and Sununits
3.7 Relation between Kendall's correlation coefficient and the probability of a same-direction observation
3.8 Kendall's correlation coefficient between variables in Table 3.1 (data from all 92 students)
3.9 Table showing relation between three variables in drink data
4.1 Balances in four bank accounts opened on 1 January
5.1 Computer simulation of 1000 jennys
5.2 Lottery probability distribution based on 200 resamples
5.3 Lottery probability distribution based on 6 000 000 resamples
5.4 Probability distribution from 10000 simulated and 38 real Manchester United matches
6.1 Telepathy experiment: number of balls in different categories
7.1 Data from question about socialising with accountants
7.2 Guessed population for question about socialising with accountants
7.3 Estimated 95% confidence intervals and error limits based on pilot sample of 10
8.1 Differences between shop X and shop Y: p values
8.2 Differences between shop X and shop Y: confidence intervals
9.1 Subsample of data from the Isle of Fastmoney
9.2 Calculation of error and square error for Figure 9.3
9.3 Calculation of error and square error for Figure 9.4
9.4 Calculation of three variable prediction model (three slopes and base prediction equal to 3)
9.5 Three variable regression model for earnings based on subsample of 16
9.6 Five variable regression model for earnings based on subsample of 16
9.7 Regression terminology

Preface

There are a great many books on statistics, ranging from advanced mathematical works to simple introductions to the subject. The problem with books in the latter category is that statistics is a difficult subject: it is not that simple. This means that introductory books either cover only the very beginnings of the subject, or they leave a lot unmentioned or unexplained.

This book takes an unusual approach to statistics, a non-mathematical approach, which is designed to tackle this problem. The idea is to present a view of statistics which clarifies what the subject is really about, without getting bogged down in mathematics. There are only two mathematical equations in this book, and both are optional diversions from the main argument. Instead, the book focuses on concepts which are simple enough to describe in words and on computer simulations.

The book is intended for anyone who wants an introduction to probability and statistics which concentrates on the underlying concepts, rather than the mathematical formulae. This includes students studying statistics at college or university, readers who use statistics in research, or who want to be able to follow the ins and outs of the statistical results described in research reports, as well as readers with a general interest in statistics.

The book has a website at www.palgrave.com/studyguides/wood, from where you can download data files, interactive Excel spreadsheets and a program for resampling.

I am grateful to many people for their comments, suggestions, reactions and encouragement, especially Arief Daynes, Andreas Hoecht, David Preece, Alan Rutter, Alan Stockdale, Annette Wood, and several reviewers of earlier drafts of the book.

Michael Wood
Emsworth, 2003