Quantitative analysis with statistics (and ponies) (Some slides, pony-based examples from Blase Ur)


Interviews, diary studies Start stats Thursday: Ethics/IRB Tuesday: More stats New homework is available 2

INTERVIEWS 3

Why an interview? Rich data (from fewer people) Good for exploration When you aren't sure what you'll find Helps identify themes, gain new perspectives Usually cannot generalize quantitatively Potential for bias (conducting, analyzing) Structured vs. semi-structured 4

Interview best practices Make participants comfortable Avoid leading questions Support whatever participants say Don't make them feel incorrect or stupid Know when to ask a follow-up Get a broad range of participants (hard) 5

Try it! In pairs, write two interview questions about password security/usability Change partners with another pair and ask each other; report back 6

DIARY STUDIES 7

Why do a diary study? Rich longitudinal data (from a few participants) In the field (ish) Natural reactions and occurrences Existence and quantity of phenomena User reactions in the moment rather than via recall Lots of work for you and your participants On paper vs. technology-mediated 8

Experience sampling Kind of a prompted diary Send participants a stimulus when they are in their natural life, not in the lab 9

Diary / ESM best practices When will an entry be recorded? How often? Over what time period? How long will it take to record an entry? How structured is the response? Pay well Pay per response? But don't create bias 10

Facebook regrets (Wang et al.) Online survey, interviews, diary study, 2nd survey What do people regret posting? Why? How do users mitigate? 11

FB regrets Interviews Semi-structured, in-person, in-lab Recruiting via Craigslist Why a pre-screening questionnaire? 19/301 Coded by a single author for high-level themes 12

FB regrets Diary study The diary study did not turn out to be very useful Daily online form (30 days) Facebook activities, incidents Have you changed anything in your privacy settings? What and why? Have you posted something on Facebook and then regretted doing it? Why and what happened? 22+ days of entries: $15 12/19 interviewees entered 1+ logs (217 total logs) 13

Location-sharing (Consolvo et al.) Whether and what about location to disclose To people you know Preliminary interview Buddy list, expected preferences Two-week ESM (simulated location requests) Final interview to reflect on experience 14

ESM study Whether to disclose or not, and why Customized askers, customized context questions If so, how granular? Where are you and what are you doing? One-time or standing request $60-$250 to maximize participation Average response rate: above 90% 15

Statistics for experimental comparisons The main idea: Hypothesis testing Choosing the right test: Comparisons Regressions Other stuff Non-independence, directional tests, effect size Tools 16

What's the big idea, anyway? OVERVIEW 17

Statistics In general: analyzing and interpreting data We often mean: Statistical hypothesis testing Question: Are two things different? Is it unlikely the data would look like this unless there is actually a difference in real life? 18

Important note This lecture is not going to be precise or complete. It is intended to give you some intuition and help you understand what questions to ask. 19

The prototypical case Q: Do ponies who drink more caffeine make better passwords? Experiment: Recruit 30 ponies. Give 15 caffeine pills and 15 placebos. They all create passwords. http://www.fanpop.com/clubs/my-little-pony-friendship-is-magic/images/33207334/title/little-pony-friendship-magic-photo 20

Hypotheses Null hypothesis: There is no difference Caffeine does not affect pony password strength. Alternative hypothesis: There is a difference Caffeine affects pony password strength. Note what is not here (more on this later): Which direction is the effect? How strong is the effect? 21

Hypotheses, continued Statistical test gives you one of two answers: 1. Reject the null: We have (strong) evidence the alternative is true. 2. Don't reject the null: We don't have (strong) evidence the alternative is true. Again, note what isn't here: We have strong evidence the null is true. (NOPE) 22

P values What is the probability that the data would look like this if there's no actual difference? i.e., Probability we tell everyone about ponies and caffeine but it isn't really true Most often, α = 0.05; some people choose 0.01 If p < 0.05, reject the null hypothesis; there is a significant difference between caffeine and placebo A p-value is not magic, just probability, and the threshold is arbitrary But, reported TRUE or FALSE: You don't say something is more significant because the p-value is lower 23
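
As a concrete sketch, here is how the caffeine-vs-placebo comparison might look in R (the tool discussed at the end); the data frame, column names, and strength scores are all invented for illustration.

    # Hypothetical data: 15 caffeinated and 15 placebo ponies, each with a
    # numeric password-strength score (made-up numbers).
    set.seed(1)
    ponies <- data.frame(
      group    = rep(c("caffeine", "placebo"), each = 15),
      strength = c(rnorm(15, mean = 55, sd = 10), rnorm(15, mean = 50, sd = 10))
    )

    # Two-sided test of the null hypothesis "no difference in means".
    result <- t.test(strength ~ group, data = ponies)
    result$p.value   # reject the null if this is below alpha = 0.05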

Type II Error (False negative) There is a difference, but you didn't find evidence No one will know the power of caffeinated ponies Hypothesis tests DO NOT BOUND this error Instead, statistical power is the probability of rejecting the null hypothesis if you should Requires that you estimate the effect size (hard) 24
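
R's built-in power.t.test() makes the trade-off concrete; the effect size (delta) and standard deviation below are guesses you would have to justify before running the study.

    # Power of a two-sample t-test with 15 ponies per group, assuming
    # (guessing) a true difference of 10 strength points and an sd of 10.
    power.t.test(n = 15, delta = 10, sd = 10, sig.level = 0.05)

    # Or solve for the per-group sample size needed to reach 80% power.
    power.t.test(power = 0.80, delta = 10, sd = 10, sig.level = 0.05)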

Hypotheses, power, probability After an experiment, one of four things has happened (total P = 1):

                              You rejected the null            You didn't
    Reality: Difference       estimated via power analysis
    Reality: No difference    bounded by α

Which box are you in? You don't know. 25

Correlation and causation Correlation: We observe that two things are related Do rural or urban ponies make stronger passwords? Causation: We randomly assigned participants to groups and gave them different treatments If designed properly Do password meters help ponies? 26

CHOOSING THE RIGHT TEST 27

What kind of data do you have? Explanatory variables: inputs, x-values e.g., conditions, demographics Outcome variables: outputs, y-values e.g., time taken, Likert responses, password strength 28

What kind of data do you have? Quantitative Discrete (Number of caffeine pills taken by each pony) Continuous (Weight of each pony) Categorical Binary (Is it or isn't it a pony?) Nominal: No order (Color of the pony) Ordinal: Ordered (Is the pony super cool, cool, a little cool, or uncool?) http://i196.photobucket.com/albums/aa92/karina408_album/wallpaper-53.jpg 29

What kind of data do you have? Does your dependent data follow a normal distribution? (You can test this!) If so, use parametric tests. If not, use non-parametric tests. Are your data independent? If not, repeated-measures, mixed models, etc. 30
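
A quick way to check normality in R is a plot plus a Shapiro-Wilk test; this sketch reuses the hypothetical ponies data frame from above.

    # Eyeball the distribution, then run a formal normality test.
    hist(ponies$strength)
    qqnorm(ponies$strength); qqline(ponies$strength)
    shapiro.test(ponies$strength)   # a small p-value suggests the data are NOT normal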

If both are categorical... Participants each used one of two systems Did they like the system they got? (Yes/no) H_A: System affects user sentiment Use (Pearson's) χ² (chi-squared) test of independence. Fewer than 5 data points in any single cell? Use Fisher's Exact Test (also works with lots of data) 31

Contingency tables Rows = one variable, columns = the other Example: Row = condition, Column = true/false χ² = 97.013, df = 14, p = 1.767e-14 32
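
In R, both tests take a contingency table directly; the 2x2 counts below are invented.

    # Rows = condition, columns = liked the system (yes/no); counts are made up.
    tbl <- matrix(c(40, 10,
                    25, 25),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(condition = c("A", "B"), liked = c("yes", "no")))

    chisq.test(tbl)    # Pearson's chi-squared test of independence
    fisher.test(tbl)   # use instead when any cell count is small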

Explanatory: categorical Outcome: continuous. Participants each used one system Measure a continuous value (time taken, password guess #) H_A: System affects password strength Normal, continuous outcome (compare means): 2 conditions: t-test 3+ conditions: ANOVA 33
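
A sketch in R, reusing the hypothetical ponies data frame; ponies3 stands in for a version with a three-level condition factor.

    # Two conditions: compare means with a t-test (as sketched earlier).
    t.test(strength ~ group, data = ponies)

    # Three or more conditions (e.g., caffeine / NyQuil / placebo): one-way ANOVA.
    fit <- aov(strength ~ group, data = ponies3)
    summary(fit)   # omnibus F-test: are the group means all the same?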

Explanatory: categorical Outcome: continuous. Non-normal outcome, ordinal outcome Does one group tend to have larger values? 2 conditions: Mann-Whitney U (AKA Wilcoxon rank-sum) 3+ conditions: Kruskal-Wallis 34
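
The non-parametric counterparts in R, on the same hypothetical data frames:

    # Two conditions: Mann-Whitney U / Wilcoxon rank-sum test.
    wilcox.test(strength ~ group, data = ponies)

    # Three or more conditions: Kruskal-Wallis test.
    kruskal.test(strength ~ group, data = ponies3)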

Outcome: Length of password 35

What about Likert-scale data? Respond to the statement: Ponies are magical. 7: Strongly agree 6: Agree 5: Mildly agree 4: Neutral 3: Mildly disagree 2: Disagree 1: Strongly disagree 36

What about Likert-scale data? Some people treat it as continuous (not good) Other people treat it as ordinal (better!) The difference between 1 and 2 need not equal the difference between 2 and 3 Use Mann-Whitney U / Kruskal-Wallis Another good option: binning (simpler) Transform into binary agree and not agree Use χ² or FET 37
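
A sketch of the binning approach in R; likert is a hypothetical data frame with one 1-7 response per participant plus the condition (one of two) each participant saw.

    likert$agree <- likert$response >= 5          # bin: 5-7 = agree, 1-4 = not agree
    tbl <- table(likert$condition, likert$agree)  # contingency table of counts
    chisq.test(tbl)                               # or fisher.test(tbl) for small cells

    # Ordinal alternative, without binning (two conditions):
    wilcox.test(response ~ condition, data = likert)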

[Chart: "Password meter annoying," by condition. Condition groups: Control, Visual, Scoring, Visual & Scoring. Conditions: baseline meter, three-segment, green, tiny, huge, no suggestions, text-only, bunny, half-score, one-third-score, nudge-16, nudge-comp8, text-only half-score, bold text-only half-score] 38

Notes for study design Plan your analysis before you collect data! What explanatory, outcome variables? Which tests will be appropriate? Ensure that you collect what you need and know what to do with it Otherwise your experiment may be wasted 39

CONTRASTS 40

Contrasts If you have more than two conditions, H_0 = the conditions are all the same H_A = the conditions are not all the same Omnibus test If you fail to reject the null, you are done ONLY if you reject this null may you compare individual conditions to each other AKA pairwise comparisons 41

Example: Password meters: 15 conditions Does assigned meter affect password strength? Omnibus test: yes Individual meter: Better than no meter? One meter better than another meter? 42

P values and multiple testing P-values bound Type I error (false positive) You expect this to happen 5% of the time if α = 0.05 What happens if you conduct a lot of statistical tests in one experiment? Your cumulative probability of a Type I error can increase dramatically! 43

Correcting p-values Goal: Adjust the math so your overall Type I error remains bounded by α = 0.05 Many methods for correcting p values Bonferroni correction: Easy but conservative (Multiply p values by the number of tests) Holm-Bonferroni is also frequently used 44
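
R's p.adjust() implements both corrections, and pairwise helpers can apply them for you; the p-values below are invented.

    p_raw <- c(0.004, 0.020, 0.049, 0.310)   # made-up p-values from four contrasts

    p.adjust(p_raw, method = "bonferroni")   # multiply by the number of tests (capped at 1)
    p.adjust(p_raw, method = "holm")         # Holm-Bonferroni: less conservative, still valid

    # Or run all pairwise comparisons with correction in one step
    # (ponies3 is the hypothetical three-condition data frame from earlier).
    pairwise.wilcox.test(ponies3$strength, ponies3$group, p.adjust.method = "holm")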

Planned vs. Unplanned Contrasts N-1 free planned contrasts Actually, really planned. No peeking at the data. Additional contrasts (planned or unplanned) require p-correction for multiple testing 45

Contrasts in the meters paper We ran pairwise contrasts comparing each condition to our two control conditions, no meter and baseline meter. In addition, to investigate hypotheses about the ways in which conditions varied, we ran planned contrasts comparing tiny to huge, nudge-16 to nudge-comp8, half-score to one-third-score, text-only to text-only half-score, half-score to text-only half-score, and text-only half-score to bold text-only half-score. 46

Continuous/ordinal data 47

Notes for study design Lots of conditions means lots of correction Which means you need big effect sizes or large N Consider limiting conditions What do you really want to test? Full-factorial or not? 48

Finding a relationship among variables CORRELATION, REGRESSION 49

Correlation Measure two numeric values Are they related? Pearson correlation Requires both variables to be normal Only looks for a linear relationship Often preferred: Spearman's rank correlation coefficient (Spearman's ρ) Evaluates a relationship's monotonicity e.g., one variable consistently increases (or consistently decreases) as the other increases 50
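
In R, with two hypothetical numeric vectors (one value per pony):

    cor.test(caffeine_mg, strength, method = "pearson")    # linear relationship, assumes normality
    cor.test(caffeine_mg, strength, method = "spearman")   # monotonic relationship (Spearman's rho)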

Regressions What is the relationship among variables? Generally one outcome (dependent variable) Often multiple factors (independent variables) The type of regression you perform depends on the outcome Binary outcome: logistic regression Ordinal outcome: ordinal / ordered regression Continuous outcome: linear regression 51

Example regression Outcome: Pass pony quiz (or not): Logistic Total score on pony quiz: Linear Independent variables: Age of pony Number of prior races Diet: hay or pop-tarts (code as eatshay=true/false) (Indicator variables for color categories) Etc. 52

What you get Linear: Outcome = a*x1 + b*x2 + c e.g., Score = 5*eatsHay - 3*age + 7 Logistic: Coefficients are in log odds Intuition: probability of passing decreases with age, increases if the pony ate hay, etc. 53
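
A minimal sketch of both regressions in R; quiz is a hypothetical data frame with the variables from the previous slide (score, passed, eatsHay, age, priorRaces).

    # Linear regression: continuous outcome (total quiz score).
    lin_fit <- lm(score ~ eatsHay + age + priorRaces, data = quiz)
    summary(lin_fit)    # coefficients are in the units of the outcome

    # Logistic regression: binary outcome (passed the quiz or not).
    logit_fit <- glm(passed ~ eatsHay + age + priorRaces, family = binomial, data = quiz)
    summary(logit_fit)      # coefficients are log odds
    exp(coef(logit_fit))    # exponentiate to get odds ratios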

Interactions in a regression Normally, outcome = a*x1 + b*x2 + c + ... Interactions account for situations when two variables are not simply additive. Instead, their interaction impacts the outcome e.g., Maybe blue ponies, and only blue ponies, get a larger benefit from eating pop-tarts before the quiz Outcome = a*x1 + b*x2 + c + d*(x1*x2) + ... 54
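
In R's formula syntax, x1:x2 adds just the interaction term and x1*x2 adds both main effects plus the interaction; isBlue is a hypothetical indicator added to the quiz data frame from above.

    # Does the effect of diet on quiz score differ for blue ponies?
    inter_fit <- lm(score ~ eatsHay * isBlue + age, data = quiz)
    summary(inter_fit)   # the eatsHay-by-isBlue row is the interaction coefficient d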

Example regression output 55

Notes for study design The more input variables in your regression, the more data you will need to collect to get useful results 56

Try it! In groups of 2-3 Does caffeine impact pony password strength? When strength = cracked or not cracked When strength = 0-100 scoring When strength = self-reported perception 1-5 Compare caffeine, NyQuil, placebo Do gender, state of residence, and education level impact pony password strength? 57

Non-independence, directional testing, effect size OTHER THINGS TO CONSIDER 58

What if you have lots of questions? If we ask 40 privacy questions on a Likert scale, how do we analyze this survey? One option: Add responses to get a privacy score Make sure the scales are the same Reverse if needed (e.g., "personal privacy is important to me" vs. "I don't care if companies sell my data") Important: Verify that responses are correlated! 59
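
A sketch of reverse-coding and summing on a 1-7 scale; responses is a hypothetical data frame with columns q1 through q40, and q12 stands in for a negatively worded item.

    # Reverse-code negatively worded items on a 1-7 scale: 7 becomes 1, 6 becomes 2, ...
    responses$q12 <- 8 - responses$q12

    # Sum across all 40 question columns to get one privacy score per participant.
    responses$privacy_score <- rowSums(responses[, paste0("q", 1:40)])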

Verifying correlation Usually preferred: Spearman's rank correlation coefficient (Spearman's ρ) Evaluates a relationship's monotonicity e.g., all variables get larger with privacy sensitivity 60

Another option: Factor analysis Evaluate underlying factors you are detecting You specify N, a number of factors Algorithm groups related questions (N groups) Each group is a factor Factor loadings measure goodness of correlation Questions loading primarily onto one factor are useful 61
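
Base R's factanal() is one way to do this; responses is the hypothetical data frame of Likert items from above, and the choice of three factors is arbitrary here.

    fa <- factanal(responses[, paste0("q", 1:40)], factors = 3)
    fa$loadings   # how strongly each question loads on each factor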

In groups: Plan your analysis Does caffeine impact pony password strength? When strength = cracked or not cracked When strength = 0-100 scoring Compare caffeine, NyQuil, placebo Do gender, age, state of residence, and education level impact pony privacy concern? Concerned vs. unconcerned Privacy score by adding 30 questions 62

Independence Why might your data not be independent? Non-independent sample (bad!) The inherent design of the experiment (ok!) Example: Same ponies make passwords, before and after taking the caffeine pills Each pony cannot be independent of itself 63

Repeated measures AKA within subjects Measure the same participant multiple times Paired T-test Two samples per participant, two groups Repeated measures ANOVA More general 64
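
A paired t-test in R; before and after are hypothetical vectors giving each pony's password strength before and after the caffeine pills, in the same pony order.

    t.test(before, after, paired = TRUE)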

Hierarchy and mixed models For regressions, use a mixed model Intuition: Each pony's result is driven by a combo of individual skills, group characteristics, treatment effects Case 1: Many measurements of each pony Case 2: The ponies have some other relationship. e.g., all ponies attended 1 of 5 security camps. (You want to control for this, but not evaluate it.) 65
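
The lme4 package is a common choice for mixed models in R; this sketch assumes a hypothetical long-format data frame measurements with one row per measurement, and adds random intercepts per pony and per camp.

    library(lme4)   # install.packages("lme4") if needed

    # Fixed effect of treatment, random intercepts for pony (case 1) and camp (case 2).
    fit <- lmer(strength ~ treatment + (1 | pony) + (1 | camp), data = measurements)
    summary(fit)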

Directional testing If your hypothesis goes one way: Caffeinated ponies make stronger passwords. More power than more general tests BUT, must select direction BEFORE looking at data Won't reject null if there's a difference the other way Example: One-tailed t-test Use with caution! 66
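
In R, the direction is selected with the alternative argument; with the formula interface the direction refers to the first factor level (here caffeine), using the hypothetical ponies data from earlier.

    # Alternative hypothesis: caffeinated ponies have GREATER mean strength than placebo.
    t.test(strength ~ group, data = ponies, alternative = "greater")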

Effect size Hypothesis test: Is there a difference? Also (more?) important: How big a difference? Findings can be significant but unimportant

    Factor               Coef.     Exp(coef)   SE       p-value
    login count          <0.001    1.000       <0.001   <0.001
    password fail rate   -0.543    0.581       0.116    <0.001
    gender (male)        0.078     0.925       0.027    0.005
    engineering          -0.273    0.761       0.048    <0.001
    humanities           -0.107    0.898       0.054    0.048
    public policy        0.079     1.082       0.058    0.176

67

TOOLS 68

So how do I DO these tests? Excel: Very easy, but not very powerful Doesn't have many useful tests R: Most powerful, steepest learning curve Like MATLAB but for stats Somewhat bizarre language/API/data representation Free and open-source (awesome add-on packages) SPSS: Graphical, also quite powerful Expensive ($25 student license from Terpware) Somewhat scriptable, not as flexible as R 69

R tutorials http://www.statmethods.net http://cyclismo.org/tutorial/r/ 70

Choosing a test http://webspace.ship.edu/pgmarr/geo441/statistical%20test%20Flow%20Chart.pdf http://abacus.bates.edu/~ganderso/biology/resources/statistics.html http://bama.ua.edu/~jleeper/627/choosestat.html http://med.cmb.ac.lk/smj/volume%203%20downloads/Page%2033-37%20-%20Choosing%20the%20correct%20statistical%20test%20made%20easy.pdf http://fwncwww14.wks.gorlaeus.net/images/home/news/Flowchart2011.jpg 71