Brian Hilburn, CHPR Dirk Schaefer, EUROCONTROL


Experimental Methods II: Designing Experiments
Brian Hilburn, CHPR; Dirk Schaefer, EUROCONTROL
COURSE 102: RESEARCH IN DECISION SUPPORT SYSTEMS FOR FUTURE AIR TRAFFIC MANAGEMENT
La Granja, 9th-12th July, 2012
www.hala-sesar.net

Overview
- Measurement Scales and Distributions
- Experimental Design
  - Validity
  - Sampling methods
  - Statistical power
  - Sample size
  - Factorial design, introduction
- Experimental Design in Practice
- Exercise
10/07/12 2

Measurement Scales and Distributions

Measurement Scales
- Ratio: absolute zero
- Interval: equivalent units
- Ordinal: ordered attributes
- Nominal: named attributes

Distributions
Discrete vs. continuous distributions
- Discrete: a countable set of separate values; categorisation of observations
- Continuous: any value within a range; distribution of observations
Which is which? Number of children? Height? Wealth? Response time? Errors?

Normal distribution
- Example: plot the weight of all people in the room
- Normal distribution (Gaussian, bell-shaped)
- Measures: mean μ; variance σ²; standard deviation σ
- Standard normal distribution: μ = 0; σ² = 1
- Floor and ceiling effects
- Testing for normal distribution

Deviations from normality
Normal vs. non-normal
- Kurtosis: leptokurtic (thin), mesokurtic, platykurtic (flat)
- Skewness: negative, positive
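As a rough illustration of how skewness and kurtosis can be quantified, the sketch below computes both from the central moments of a sample. The function names are hypothetical, and the moment-based (population) definitions are an assumption; statistics packages often apply small-sample corrections that give slightly different numbers.

```python
from statistics import fmean

def central_moment(values, order):
    """Average of (x - mean) ** order over the sample."""
    mean = fmean(values)
    return fmean([(x - mean) ** order for x in values])

def skewness(values):
    """Third standardised moment: 0 if symmetric, > 0 if the long tail is on the right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardised moment minus 3: about 0 for mesokurtic (normal-like) data,
    > 0 for leptokurtic and < 0 for platykurtic data."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]      # flat, symmetric toy data
right_tailed = [1, 1, 1, 10]     # one large value creates a positive skew

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed))
```

A clearly non-zero skewness or excess kurtosis is only a hint of non-normality; a formal normality test is still needed for a decision.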

Some non-Gaussian distributions
Poisson distribution
- Example: plot the number of customers queuing at the supermarket cash desk
- Discrete
- Constant in log/normal diagram
Power-law distribution
- Example: plot the income of all people in the country
- Continuous
- Linear in log/log diagram

Characteristics of the normal distribution
Measures of central tendency (example data: 2 2 3 3 3 4 5)
- Mean: arithmetic average = 3.14
- Median: middle value = 3
- Mode: highest frequency value = 3
Measures of variance
- Variance = average of squared differences from the mean
- Standard deviation = √Variance
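The example values can be checked with Python's standard `statistics` module. This is a minimal sketch; it assumes the population form of the variance (dividing by n), which matches the slide's "average of squared differences". The sample form (`statistics.variance`, dividing by n - 1) would give a larger value.

```python
import statistics

scores = [2, 2, 3, 3, 3, 4, 5]  # example data from the slide

mean = statistics.mean(scores)          # arithmetic average
median = statistics.median(scores)      # middle value of the sorted data
mode = statistics.mode(scores)          # most frequent value
variance = statistics.pvariance(scores) # mean squared deviation from the mean
std_dev = statistics.pstdev(scores)     # square root of the variance

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.3f} std_dev={std_dev:.3f}")
```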

Experimental Design

Experimental Design: Overview
- Investigate possible cause-and-effect relationships
- INFER (p value)
- Manipulate one independent variable to influence the other variable(s)
- Control other relevant variables
- Measure effect by statistical means

Validity
- Validity vs. (experimental) reliability
- Internal validity: are we measuring correctly? Sampling, measurement, experimental runs, choice of stats tests, p value
- External validity: do results generalise?
- Face validity: does it look valid? (not true validity)
Some threats to validity
- History: events occur
- Maturation: participants change
- Testing: testing itself causes a change
- Instrumentation: calibration shift in instrument (or scorer)
- Statistical regression: a sample selected for extreme scores will regress toward the mean
- Selection: biases in selection of groups
- Mortality: differential loss of respondents across groups
- Confounds

Experimental Design: Basic Steps
- State the problem: what is the effect of X on Y?
- Form hypothesis (one- vs. two-tailed?): H0
  - Define independent variable(s): what you manipulate experimentally or via selection, e.g. age, traffic load, display design
  - Define dependent variables: what you measure, e.g. response time, preference, etc.
  - Consider control variables: what is constant?
- Design (control group? repeated trials?)
- Sample
- Collect data
- Analyse and conclude

Conducting the experiment
(listed from the final step back to the starting point)
- Analyse & conclude: How do we infer from statistics?
- Run: How are experimental runs organised and run?
- Assign: How are participants assigned to conditions?
- Sample: How are participants chosen?
- Hypothesis
- Research Question
- Research Objective
- Curiosity

Example research questions
Question 1: Do tall controllers perform better?
- DV: number of near misses per hour
- IV = ? What are the two levels of the IV?
- Can we use the same controller? No -> BETWEEN-subjects design
Question 2: Do sober controllers perform better?
- DV: number of near misses per hour
- IV = ? What are the two levels of the IV?
- Can we use the same controller? Yes! Yes (or No) -> WITHIN- (or BETWEEN-) subjects design

Repeated Measures vs. Between-Subjects Designs
Repeated measures: the same participant is exposed to various conditions and/or repeated runs
Some advantages:
- Fewer Ss required (always a problem in ATM!)
- Greater statistical power
- Reduced variability
Some risks:
- Regression
- Conditions make repeated measures impossible
- Sequence effects

Sequence effect
Differences in the DV can sometimes be caused by the sequence of experimental runs when subjects participate in more than one run, e.g.
- Fatigue
- Learning
- Carry-over
- Maturation
- Reactivity
Solutions include:
- Randomise
- Counterbalance conditions across Ss (e.g. Latin square)
Within Ss: A B C D D C B A
Across Ss (Latin square):
A B C
B C A
C A B
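The 3 x 3 square above can be generated by cyclically shifting the list of conditions. The sketch below is one simple way to do that and to hand out run orders to participants; the function names and participant labels are hypothetical.

```python
def latin_square(conditions):
    """Cyclic Latin square: row r is the condition list shifted left by r places,
    so each condition appears once in every row and every column."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

def assign_orders(participants, conditions):
    """Give each participant one row of the square, cycling through the rows."""
    square = latin_square(conditions)
    return {p: square[i % len(square)] for i, p in enumerate(participants)}

for row in latin_square(["A", "B", "C"]):
    print(" ".join(row))

print(assign_orders(["S1", "S2", "S3", "S4", "S5", "S6"], ["A", "B", "C"]))
```

A cyclic square balances the position of each condition but not which condition precedes which, so it does not fully control carry-over effects. Using a number of participants that is a multiple of the number of rows keeps the orders equally represented.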

Sampling
- Random
- Stratified
- Stratified random
- Cluster (aka Multistage)
- Convenience (e.g. self selection)
- Systematic random
- Others
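Three of the sampling methods listed above can be sketched with Python's standard `random` module. The controller pool, the unit types and the function names are hypothetical, chosen only to make the example concrete.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sampling: every unit has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_random_sample(population, step, rng):
    """Systematic random sampling: random start, then every step-th unit."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_random_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified random sampling: a random sample drawn within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

# Hypothetical pool: 20 controllers, alternating between two unit types.
controllers = [(f"C{i:02d}", "tower" if i % 2 == 0 else "en-route") for i in range(20)]
rng = random.Random(2012)  # fixed seed so the draw is reproducible

srs = simple_random_sample(controllers, 5, rng)
systematic = systematic_random_sample(controllers, 4, rng)
stratified = stratified_random_sample(controllers, lambda c: c[1], 3, rng)
print(srs, systematic, stratified, sep="\n")
```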

Assignment and design
- Control confounds
- Permit conclusions about DV
Methods include:
- Blocking: groups stratified
- Holding variable(s) constant: only test 42-year-old, IQ 110, 80 kg male controllers
- Randomising
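Blocking and randomising can be combined: participants are grouped into blocks on a known variable and then randomly assigned to conditions within each block. The following is a minimal sketch under that assumption; the participant list, the age blocks and the condition names are hypothetical.

```python
import random
from collections import Counter

def blocked_random_assignment(participants, block_of, conditions, rng):
    """Shuffle participants within each block, then deal out conditions in turn,
    so every block contributes (nearly) equally to every condition."""
    blocks = {}
    for person in participants:
        blocks.setdefault(block_of(person), []).append(person)
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for index, person in enumerate(shuffled):
            assignment[person] = conditions[index % len(conditions)]
    return assignment

# Hypothetical participants: (id, age block).
participants = [(f"C{i:02d}", "under 40" if i < 6 else "40 and over") for i in range(12)]
assignment = blocked_random_assignment(
    participants, lambda p: p[1], ["baseline display", "new display"], random.Random(7)
)

# Count how many participants of each block ended up in each condition.
counts = Counter((person[1], condition) for person, condition in assignment.items())
print(counts)
```

Because age is balanced across the two display conditions, it cannot be confounded with the display effect.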

Statistical power
- Power: the ability of a test to correctly reject a false null hypothesis
- Power is driven by, e.g.:
  - sample size
  - alpha level
  - effect size
- Power test: set sample size a priori
- Question: is effect size knowable a priori?

Estimating required sample size
The number of samples / participants in an experiment must be determined; it depends on:
- The experimental design, e.g. within- or between-subject design
- The error variance of the expected distribution (DV)
- The randomizing technique, e.g. a Latin square for a 2*2 design means the sample size must be a multiple of 4
Techniques for assessing sample size:
- Equations
- Look-up tables
- Pre-experiments
- Experience
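As an example of the "equations" route, the sketch below uses a common normal-approximation formula for comparing two independent group means, n per group = 2 * ((z_alpha + z_power) / d)^2, where d is the standardised effect size. The formula choice, the function name and the default values are assumptions for illustration; exact t-test calculations and within-subject designs give somewhat different numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-group comparison.

    effect_size is the standardised mean difference (difference / standard deviation).
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for the alpha level
    z_power = standard_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {n_per_group(d)} participants per group")
```

The required sample grows quickly as the expected effect shrinks, which is why a plausible effect size, from pre-experiments or experience, matters so much.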

One factor design
Does the new display help?
1 FACTOR, 2 LEVELS
[Figure: Performance with Baseline display vs. New display, showing the main effect of Display]

Factorial design: Two factors
Does the new display help both young and old controllers?
2 x 2 DESIGN
[Figure: Performance of Young and Old controllers with Baseline display vs. New display]
- Main effect of Display
- Interaction, Age x Display
- Simple main effects
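The three terms above can be read directly off the four cell means of a 2 x 2 design. The sketch below shows the arithmetic only; the cell means are invented for illustration, and a real analysis would test these effects with an ANOVA on the raw data.

```python
def two_by_two_effects(cell_means):
    """Descriptive effects for an Age (young/old) x Display (baseline/new) design."""
    yb = cell_means[("young", "baseline")]
    yn = cell_means[("young", "new")]
    ob = cell_means[("old", "baseline")]
    on = cell_means[("old", "new")]
    return {
        # Main effect: difference between display levels, averaged over age.
        "main_effect_display": ((yn + on) - (yb + ob)) / 2,
        # Main effect: difference between age levels, averaged over display.
        "main_effect_age": ((ob + on) - (yb + yn)) / 2,
        # Simple main effects: the display effect within each age group.
        "simple_effect_display_young": yn - yb,
        "simple_effect_display_old": on - ob,
        # Interaction: how much the display effect differs between age groups.
        "interaction_age_x_display": (on - ob) - (yn - yb),
    }

# Hypothetical mean performance scores per cell (not from the slides).
example_means = {
    ("young", "baseline"): 70.0,
    ("young", "new"): 80.0,
    ("old", "baseline"): 60.0,
    ("old", "new"): 62.0,
}
effects = two_by_two_effects(example_means)
for name, value in effects.items():
    print(f"{name}: {value}")
```

In this made-up case the new display helps on average, but mostly the young controllers, which is exactly what a non-zero interaction expresses.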

Factorial design: Three factors
Does training help young and old controllers differently, in transitioning to the new display?
2 x 2 x 2 DESIGN
[Figure: panels for No training and Training]

Transitioning to new cockpit automation (based on data from Casner, 2003)

Experimental Design
A (hypothetical) quick and dirty study:
- Hypothesis: Controllers will accept the new iplane app
- Participants: email volunteers (n=4)
- Procedure: 1 hour familiarisation, 1 hour test session, verbal debrief and survey
- Measure: On a scale of 1-5, how much do you like iplane?
- Conclusion: Average is 4.2, therefore acceptance is good!
How many errors can you find?

Experimental Design in Practice

Types of Experiments
(ordered from most realism to most control)
- Live trials (shadow-mode trial)
- Human-in-the-loop experiments ("simulations")
  - Multi-operator
  - Single operator
- Vignettes / non-nominal events scenarios
- Gaming sessions
- Fast-time simulations
- Numerical methods

Scenario design for ATM HITL simulations
- Within-subject design often preferable
  - Reduces random effects / increases statistical power
  - Participants can be debriefed / surveyed on a comparison of various design options
- BUT: you can't use the same scenario more than once
- Reduce scenario effects by designing comparable traffic scenarios
  - Aircraft count
  - Traffic complexity, e.g. NASA's Dynamic Density
- Anonymize scenarios
  - Change aircraft callsigns
  - Rotate / mirror-image
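The anonymisation steps can be sketched as a simple geometric transformation. This is only an illustration under stated assumptions: aircraft are hypothetical dictionaries with an x (east) and y (north) position and a compass heading, the sector itself is assumed to be transformed the same way, and the `ANON` callsign pattern is made up.

```python
import math

def anonymise_scenario(aircraft, rotate_deg=0.0, mirror=False):
    """Return a disguised copy of a traffic scenario.

    Mirroring flips east and west; rotation is clockwise in degrees. Both keep
    all distances and relative geometry intact, so traffic complexity is unchanged.
    """
    theta = math.radians(rotate_deg)
    result = []
    for index, ac in enumerate(aircraft, start=1):
        x, y, heading = ac["x"], ac["y"], ac["heading_deg"]
        if mirror:
            # Reflect about the north-south axis; headings flip accordingly.
            x, heading = -x, (360.0 - heading) % 360.0
        # Clockwise rotation of the position (x east, y north).
        new_x = x * math.cos(theta) + y * math.sin(theta)
        new_y = -x * math.sin(theta) + y * math.cos(theta)
        result.append({
            "callsign": f"ANON{index:03d}",  # replace the original callsign
            "x": new_x,
            "y": new_y,
            "heading_deg": (heading + rotate_deg) % 360.0,
        })
    return result

original = [
    {"callsign": "ABC123", "x": 0.0, "y": 10.0, "heading_deg": 0.0},
    {"callsign": "XYZ789", "x": 5.0, "y": 0.0, "heading_deg": 90.0},
]
print(anonymise_scenario(original, rotate_deg=90.0, mirror=True))
```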

Scenario design for ATM vignettes
- In HITL simulations the situation unfolds in response to action taken by the operator
- Typically only the first aircraft or aircraft pair is comparable when repeating scenarios
- Vignettes are short traffic scenarios consisting of first aircraft pairs
- Typically 2-5 minutes

MUFASA vignettes

Existing vs. synthetic sectors
Existing ("real") sector
- Realistic
- Necessary if you want to observe a specific sector-related effect, e.g. sector redesign
- Must use a homogeneous population
- Reality bias
Synthetic sector
- Designed to meet research needs
- No constraint on population

Exercise

Exercise
Please design an experiment for testing the hypothesis you have defined in the previous exercise.
- Outline the experimental plan.
- Work in the same groups.
- Time: 20 minutes.
- Be prepared to present your experimental plan in 2 minutes.

Backup slides