Brian Hilburn, CHPR Dirk Schaefer, EUROCONTROL

Experimental Methods II: Designing Experiments
COURSE 102: RESEARCH IN DECISION SUPPORT SYSTEMS FOR FUTURE AIR TRAFFIC MANAGEMENT
La Granja, 9th-12th July, 2012
www.hala-sesar.net

Overview
- Measurement scales and distributions
- Experimental design
  - Validity
  - Sampling methods
  - Statistical power
  - Sample size
  - Factorial design, introduction
- Experimental design in practice
- Exercise
10/07/12

Measurement Scales and Distributions

Measurement Scales
- Ratio: equal units and an absolute zero
- Interval: equivalent (equal-sized) units, but no absolute zero
- Ordinal: ordered attributes
- Nominal: named attributes

Distributions
Discrete vs. continuous distributions:
- Discrete: countable values (counts or categories); observations are categorised
- Continuous: any value within a range; observations are measured along a continuum
Which is which? Number of children? Height? Wealth? Response time? Errors?

Normal distribution
Example: plot the weight of all people in the room.
- Normal distribution (Gaussian, bell-shaped)
- Measures: mean μ; variance σ²; standard deviation σ
- Standard normal distribution: μ = 0, σ² = 1
- Floor and ceiling effects (scores bunched at the bottom or top of a scale)
- Testing for normality
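To make the standard normal distribution concrete, the sketch below converts raw scores to z-scores, z = (x − μ)/σ, which yields data with mean 0 and variance 1. The weights are made-up illustration values, not course data, and the population SD is assumed. A formal normality test such as Shapiro-Wilk (for instance `scipy.stats.shapiro`) is a separate step not shown here.

```python
from statistics import mean, pstdev, pvariance

def z_scores(values):
    """Standardise values: subtract the mean, divide by the population SD."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical body weights in kg (illustration only).
weights = [62.0, 70.0, 75.0, 81.0, 90.0]
z = z_scores(weights)
print(mean(z), pvariance(z))  # approximately 0 and 1
```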

Deviations from normality (normal vs. non-normal shapes)
- Kurtosis: leptokurtic (peaked, "thin"), mesokurtic (normal), platykurtic (flat)
- Skewness: negative (long left tail) or positive (long right tail)
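Skewness and kurtosis can both be computed from central moments. The sketch below uses the simple moment estimators g1 = m3/m2^1.5 and g2 = m4/m2² − 3 (excess kurtosis, 0 for a normal distribution); statistics packages often report bias-corrected variants that differ slightly for small samples. The three datasets are invented to show a symmetric, a positively skewed and a platykurtic case.

```python
from statistics import mean

def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k."""
    m = mean(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Moment skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment excess kurtosis g2 = m4 / m2**2 - 3 (< 0 means platykurtic)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_tailed = [1, 1, 1, 2, 2, 3, 10]
flat = [1, 2, 3, 4, 5]
print(skewness(symmetric), skewness(right_tailed), excess_kurtosis(flat))
```

A positive result for `right_tailed` indicates positive skew; the negative excess kurtosis of `flat` marks it as platykurtic.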

Some non-Gaussian distributions
- Example: plot the number of customers queuing at the supermarket checkout.
  Poisson distribution: discrete; mean = variance = λ
- Example: plot the income of all people in the country.
  Power-law distribution: continuous; linear in a log/log diagram
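Why a power law is linear in a log/log diagram: taking logs of p(x) = C·x^(−α) gives log p = log C − α·log x, a straight line with slope −α. The sketch below checks this numerically with an ordinary least-squares fit; C = 1 and α = 2.5 are arbitrary illustrative values.

```python
import math

def power_law(x, c=1.0, alpha=2.5):
    """Power-law density p(x) = c * x**(-alpha) (illustrative parameters)."""
    return c * x ** (-alpha)

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

xs = [1, 2, 4, 8, 16, 32]
log_x = [math.log(x) for x in xs]
log_p = [math.log(power_law(x)) for x in xs]
print(ols_slope(log_x, log_p))  # slope equals -alpha up to rounding error
```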

Characteristics of the normal distribution
Measures of central tendency, for the data 2 2 3 3 3 4 5:
- Mean (arithmetic average) = 3.14
- Median (middle value) = 3
- Mode (highest-frequency value) = 3
Measures of variance:
- Variance = average of the squared differences from the mean
- Standard deviation = √Variance
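The worked example can be reproduced with Python's standard `statistics` module. Following the slide's definition (average squared difference from the mean), this sketch uses the population variance `pvariance`; the sample variance `statistics.variance`, which divides by n − 1, would give about 1.14 instead.

```python
import math
import statistics

data = [2, 2, 3, 3, 3, 4, 5]

mean = statistics.mean(data)           # arithmetic average
median = statistics.median(data)       # middle value of the sorted data
mode = statistics.mode(data)           # most frequent value
variance = statistics.pvariance(data)  # average squared difference from the mean
sd = math.sqrt(variance)               # standard deviation = square root of variance

print(f"mean={mean:.2f} median={median} mode={mode} variance={variance:.2f} sd={sd:.2f}")
# mean=3.14 median=3 mode=3 variance=0.98 sd=0.99
```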

Experimental Design

Experimental Design: Overview
- Investigate possible cause-and-effect relationships and INFER them statistically (p value)
- Manipulate an independent variable (IV) and observe its effect on the dependent variable(s) (DV)
- Control other relevant variables
- Measure the effect by statistical means

Validity
- Validity vs. (experimental) reliability: a valid study measures what it intends to measure; a reliable one gives consistent results when repeated
- Internal validity: are we measuring correctly, i.e. can changes in the DV be attributed to the IV? Depends on sampling, measurement, experimental runs, choice of statistical tests, p value
- External validity: do the results generalise?
- Face validity: does it look valid? (not true validity)
Some threats to validity:
- History: external events occur during the study
- Maturation: participants change over time
- Testing: the test itself causes a change
- Instrumentation: calibration shift in the instrument (or scorer)
- Statistical regression: a sample selected for extreme scores will regress toward the mean
- Selection: biases in the selection of groups
- Mortality: differential loss of respondents across groups
- Confounds

Experimental Design: Basic Steps
- State the problem: what is the effect of X on Y?
- Form the hypothesis (one- or two-tailed?) and the null hypothesis H₀
  - define the independent variable(s): what you manipulate experimentally or via selection, e.g. age, traffic load, display design
  - define the dependent variable(s): what you measure, e.g. response time, preference
  - consider control variables: what is held constant?
- Design (control group? repeated trials?)
- Sample
- Collect data
- Analyse and conclude
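The one- vs. two-tailed choice changes the p value obtained from the same test statistic. This sketch assumes a standard-normal (z) test statistic, a large-sample approximation; for a t statistic the t distribution would be used instead. The direction of a one-tailed test must be fixed in the hypothesis before the data are seen.

```python
import math

def p_two_tailed(z):
    """P(|Z| >= |z|) under H0 for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def p_one_tailed(z):
    """P(Z >= z) under H0: the directional (upper-tail) alternative."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for z in (1.8, 1.96):
    print(z, round(p_one_tailed(z), 4), round(p_two_tailed(z), 4))
```

With z = 1.8 the result is significant at α = .05 one-tailed (p ≈ .036) but not two-tailed (p ≈ .072).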

Conducting the experiment
From curiosity to conclusion:
- Curiosity
- Research objective
- Research question
- Hypothesis
- Sample: how are participants chosen?
- Assign: how are participants assigned to conditions?
- Run: how are experimental runs organised and run?
- Analyse & conclude: how do we infer from statistics?

Example research questions
Question 1: Do tall controllers perform better?
- DV: number of near misses per hour
- IV = ? What are the two levels of the IV?
- Can we use the same controller in both conditions? No -> BETWEEN-subjects design
Question 2: Do sober controllers perform better?
- DV: number of near misses per hour
- IV = ? What are the two levels of the IV?
- Can we use the same controller in both conditions? Yes! (or No) -> WITHIN (or BETWEEN) subjects design

Repeated Measures vs. Between-Subjects Designs
Repeated measures: the same participant is exposed to several conditions and/or repeated runs.
Some advantages:
- Fewer subjects (Ss) required (always a problem in ATM!)
- Greater statistical power
- Reduced variability (each participant acts as their own control)
Some risks:
- Regression
- Some conditions make repeated measures impossible
- Sequence effects
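The power advantage of repeated measures comes from removing between-participant variability. The sketch below, on invented scores, computes a paired (within-subjects) t and a pooled two-sample (between-subjects) t for the same numbers. Analysing one dataset both ways is purely illustrative; a real study supports only the analysis that matches its design.

```python
import math
from statistics import mean, stdev, variance

def paired_t(a, b):
    """Within-subjects t: mean difference / standard error of the differences."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def independent_t(a, b):
    """Between-subjects t with pooled variance (Student's two-sample t)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(b) - mean(a)) / se

# Hypothetical scores for five controllers under two display conditions.
baseline = [10, 20, 30, 40, 50]
new_display = [12, 21, 33, 41, 53]
print(paired_t(baseline, new_display), independent_t(baseline, new_display))
```

The paired t is about 4.47 and the independent t about 0.20. The large differences between controllers swamp the 2-point effect unless each controller serves as their own control.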

Sequence effects
Differences in the DV can sometimes be caused by the sequence of experimental runs when subjects participate in more than one run, e.g.:
- Fatigue
- Learning
- Carry-over
- Maturation
- Reactivity
Solutions include:
- Randomise the order of conditions
- Counterbalance conditions across Ss, e.g. with a Latin square:
  A B C
  B C A
  C A B
- Counterbalance within Ss, e.g. A B C D followed by D C B A
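Both counterbalancing schemes are easy to generate. This sketch builds a cyclic Latin square (each row is one participant's order) and the within-subject "order then reverse" sequence. A cyclic square balances serial position but not immediate carry-over (A is always followed by B). A balanced (Williams) Latin square is the usual remedy, not shown here.

```python
def cyclic_latin_square(conditions):
    """Row i is the condition list rotated left by i positions.

    Each condition appears once in every row (participant) and once in
    every column (position in the running order)."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

def abba_sequence(conditions):
    """Within-subject counterbalancing: the order followed by its reverse."""
    return list(conditions) + list(reversed(conditions))

for row in cyclic_latin_square(["A", "B", "C"]):
    print(" ".join(row))
print(" ".join(abba_sequence(["A", "B", "C", "D"])))
```

This prints the three rows A B C, B C A, C A B, followed by A B C D D C B A.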

Sampling
- Random
- Stratified
- Stratified random
- Cluster (aka multistage)
- Convenience (e.g. self-selection)
- Systematic random
- Others
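Three of the listed sampling methods are sketched below with Python's `random` module. The pool of 60 controllers split into "novice" and "experienced" strata is hypothetical, and the fixed seed only makes the draw reproducible.

```python
import random

def simple_random_sample(population, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(population, n)

def stratified_random_sample(population, stratum_of, fraction, rng):
    """Split into strata, then draw the same fraction at random from each."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

def systematic_random_sample(population, k, rng):
    """Random start, then every k-th unit."""
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical pool of 60 controllers: 40 novices and 20 experienced.
controllers = [(f"C{i:02d}", "novice" if i < 40 else "experienced") for i in range(60)]
rng = random.Random(42)  # fixed seed so the draw is reproducible
strat = stratified_random_sample(controllers, lambda c: c[1], 0.25, rng)
print(sum(c[1] == "novice" for c in strat), sum(c[1] == "experienced" for c in strat))
```

The stratified sample always contains 10 novices and 5 experienced controllers, preserving the 2:1 ratio of the pool; a simple random sample of 15 only matches that ratio on average.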

Assignment and design
Aims: control confounds, so that conclusions can be drawn about the DV.
Methods include:
- Blocking (groups stratified)
- Holding variable(s) constant, e.g. only test 42-year-old, IQ 110, 80 kg male controllers
- Randomising
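Blocking and randomising can be combined: group participants by a blocking variable, then randomise assignment within each block. In the sketch below the participants, the "young"/"old" blocks and the two display conditions are hypothetical. Shuffling each block and then dealing conditions in turn keeps every block balanced.

```python
import random
from collections import Counter, defaultdict

def block_randomise(participants, block_of, conditions, rng):
    """Randomly assign conditions within each block.

    Participants are grouped by a blocking variable, shuffled within the
    block, then dealt conditions in turn, so condition counts within a
    block differ by at most one."""
    blocks = defaultdict(list)
    for p in participants:
        blocks[block_of(p)].append(p)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for i, p in enumerate(shuffled):
            assignment[p] = conditions[i % len(conditions)]
    return assignment

# Hypothetical participants: (id, age group).
people = [(f"P{i:02d}", "young" if i < 8 else "old") for i in range(16)]
plan = block_randomise(people, lambda p: p[1], ["baseline", "new display"], random.Random(7))
print(Counter((p[1], cond) for p, cond in plan.items()))
```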

Statistical power
- Power: the ability of a test to correctly reject the null hypothesis when it is false (1 − β)
- Power is driven by, e.g.:
  - sample size
  - alpha level
  - effect size
- An a priori power test sets the sample size before the experiment
- Question: is the effect size knowable a priori?
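The effect of sample size, alpha and effect size on power can be explored with a normal approximation for a two-sided, two-sample comparison of means: power ≈ Φ(d·√(n/2) − z₁₋α/₂), where d is the standardised effect size (Cohen's d). This is a sketch; exact t-based tools such as G*Power give slightly different values. The effect size d must be assumed in advance, which is exactly the difficulty the slide raises.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    Normal approximation; the negligible contribution of the opposite
    tail is ignored. d is the assumed standardised effect size."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * (n_per_group / 2) ** 0.5 - z_crit)

for n in (10, 20, 64):
    print(n, round(approx_power(0.5, n), 2))
```

For a medium effect (d = 0.5), power reaches roughly 0.8 only at around 64 participants per group; lowering alpha to .01 reduces power further.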

Estimating required sample size
The number of samples/participants in an experiment must be determined in advance. It depends on:
- The experimental design, e.g. within- or between-subjects design
- The error variance of the expected distribution (of the DV)
- The randomising technique, e.g. counterbalancing the four conditions of a 2×2 design with a Latin square means the sample size must be a multiple of 4
Techniques for assessing sample size:
- Equations
- Look-up tables
- Pre-experiments (pilot studies)
- Experience
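As an example of the "equations" approach, the sketch below uses the common normal-approximation formula n = 2·((z₁₋α/₂ + z₁₋β)/d)² per group for a between-subjects comparison of two means. It then rounds up to fill complete counterbalancing blocks. The effect size d = 0.5 and block size 4 are illustrative assumptions. A within-subjects design would use the variance of the differences and usually needs fewer participants.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Participants per group for a two-sided, two-sample comparison of means.

    Normal-approximation equation n = 2 * ((z_(1-alpha/2) + z_power) / d)**2,
    rounded up to a whole participant."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

def round_up_to_multiple(n, k):
    """Round n up so that it fills complete counterbalancing blocks of size k."""
    return math.ceil(n / k) * k

n = n_per_group(0.5)
print(n, round_up_to_multiple(n, 4))  # 63 64
```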

One-factor design
Does the new display help?
1 factor, 2 levels: baseline display vs. new display
[Figure: Performance for baseline vs. new display, showing the main effect of Display]

Factorial design: two factors
Does the new display help both young and old controllers?
2 x 2 design: Age (young, old) x Display (baseline, new)
[Figure: Performance by Display for young and old controllers, illustrating the main effect of Display, the Age x Display interaction, and simple main effects]
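The three kinds of effect in a 2 x 2 design can be read directly off the four cell means. The sketch below computes them for invented performance scores. Here the new display helps young controllers by 20 points and old controllers by only 5, so there is an Age x Display interaction of 15 alongside a main effect of Display of 12.5.

```python
def factorial_2x2_effects(cells):
    """Main effects, simple main effects and interaction from 2 x 2 cell means.

    `cells` maps (age, display) to a mean performance score."""
    simple_display = {
        age: cells[(age, "new")] - cells[(age, "baseline")] for age in ("young", "old")
    }
    return {
        # Display effect averaged over age groups (difference of marginal means).
        "main_display": (simple_display["young"] + simple_display["old"]) / 2,
        # Age effect averaged over displays.
        "main_age": (cells[("young", "baseline")] + cells[("young", "new")]) / 2
        - (cells[("old", "baseline")] + cells[("old", "new")]) / 2,
        # Effect of Display within each age group.
        "simple_display": simple_display,
        # How much the Display effect differs between age groups.
        "interaction": simple_display["young"] - simple_display["old"],
    }

# Hypothetical cell means (higher = better performance).
means = {
    ("young", "baseline"): 60, ("young", "new"): 80,
    ("old", "baseline"): 50, ("old", "new"): 55,
}
print(factorial_2x2_effects(means))
```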

Factorial design: three factors
Does training help young and old controllers differently in transitioning to the new display?
2 x 2 x 2 design: Age x Display x Training (no training, training)
[Figure: panels for No training and Training]
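Each added factor multiplies the number of experimental cells, which drives the sample size. This sketch enumerates the cells of the 2 x 2 x 2 design with `itertools.product`. The factor names and levels are taken from the slide questions, assuming Age and Display are the other two factors, as in the 2 x 2 example.

```python
from itertools import product

factors = {
    "age": ["young", "old"],
    "display": ["baseline", "new"],
    "training": ["no training", "training"],
}
conditions = list(product(*factors.values()))
for cell in conditions:
    print(cell)
print(len(conditions), "cells")  # 2 x 2 x 2 = 8
```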

[Figure: Transitioning to new cockpit automation, based on data from Casner, 2003]

Experimental Design
A (hypothetical) quick and dirty study:
- Hypothesis: controllers will accept the new iplane app
- Participants: email volunteers (n=4)
- 1 hour familiarisation, 1 hour test session
- Procedure: verbal debrief and survey
- Measure: "On a scale of 1-5, how much do you like iplane?"
- Conclusion: the average is 4.2, therefore acceptance is good!
How many errors can you find?

Experimental Design in Practice

Types of Experiments
Ordered from most realism to most control:
- Live trials (shadow-mode trials)
- Human-in-the-loop (HITL) experiments (simulations): multi-operator or single operator
- Vignettes / non-nominal event scenarios
- Gaming sessions
- Fast-time simulations
- Numerical methods

Scenario design for ATM HITL simulations
- A within-subject design is often preferable:
  - it reduces random effects / increases statistical power
  - participants can be debriefed / surveyed on a comparison of the various design options
- BUT: you can't use the same scenario more than once
- Reduce scenario effects by designing comparable traffic scenarios:
  - aircraft count
  - traffic complexity, e.g. NASA's Dynamic Density
- Anonymize scenarios:
  - change aircraft callsigns
  - rotate / mirror-image
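Rotating, mirroring and renaming produce scenario variants that look different to the controller but keep the same traffic geometry. This sketch assumes a hypothetical scenario format: a list of dicts with `callsign`, `x` (east) and `y` (north) in NM relative to the sector centre, and `heading` in degrees clockwise from north. The callsigns are invented. Mirror-imaging swaps eastbound and westbound traffic, so flight levels may also need checking against the semicircular rule.

```python
import math

def rotate(aircraft, angle_deg, centre=(0.0, 0.0)):
    """Rotate positions clockwise about the sector centre and turn headings with them."""
    a = math.radians(angle_deg)
    cx, cy = centre
    out = []
    for ac in aircraft:
        dx, dy = ac["x"] - cx, ac["y"] - cy
        out.append({**ac,
                    "x": cx + dx * math.cos(a) + dy * math.sin(a),
                    "y": cy - dx * math.sin(a) + dy * math.cos(a),
                    "heading": (ac["heading"] + angle_deg) % 360})
    return out

def mirror_east_west(aircraft, centre_x=0.0):
    """Mirror-image the scenario about the north-south line through centre_x."""
    return [{**ac, "x": 2 * centre_x - ac["x"], "heading": (360 - ac["heading"]) % 360}
            for ac in aircraft]

def rename_callsigns(aircraft, new_callsigns):
    """Replace callsigns one-to-one with a list of unused, made-up callsigns."""
    return [{**ac, "callsign": cs} for ac, cs in zip(aircraft, new_callsigns)]

def separation(a, b):
    """Horizontal distance between two aircraft (NM)."""
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])

# Hypothetical conflict pair (callsigns and positions are invented).
scenario = [
    {"callsign": "ABC123", "x": 0.0, "y": 10.0, "heading": 180},
    {"callsign": "XYZ456", "x": -10.0, "y": 0.0, "heading": 90},
]
variant = rename_callsigns(mirror_east_west(rotate(scenario, 90)), ["DEF789", "GHI012"])
print(variant)
print(separation(*scenario), separation(*variant))  # geometry is preserved
```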

Scenario design for ATM vignettes
- In HITL simulations the situation unfolds in response to the actions taken by the operator
- So, when scenarios are repeated, typically only the first aircraft or aircraft pair is comparable
- Vignettes are short traffic scenarios consisting only of such first aircraft pairs, typically 2-5 minutes long

MUFASA vignettes


Existing vs. synthetic sectors
Existing (real) sector:
- Realistic
- Necessary if you want to observe a specific sector-related effect, e.g. sector redesign
- Must use a homogeneous population (controllers familiar with that sector)
- Reality bias
Synthetic sector:
- Designed to meet research needs
- No constraint on population

Exercise

Exercise
Please design an experiment to test the hypothesis you defined in the previous exercise, and outline the experimental plan.
- Work in the same groups.
- Time: 20 minutes.
- Be prepared to present your experimental plan in 2 minutes.

Backup slides