Why OUT-OF-LEVEL Testing? 2017 CTY Johns Hopkins University

BEFORE WE GET STARTED
  Welcome and introductions
  Today's session will last about 20 minutes
  Feel free to ask questions at any time by speaking into your phone or by using the Q&A feature at the top of your screen
  Please press *6 to mute your phone; #6 will unmute your phone
  Copies of the slides from today's presentation will be available from the web page you will be directed to when we conclude the session

AGENDA
  What is Psychometrics
  Roles as a Psychometrician
  Mandatory Testing
  Impact of NCLB
  Rationale for Testing
  Questions?

WHAT IS PSYCHOMETRICS
  Psychology/Education/Statistics/Research design
  Definition
    Reliability and validity
    Broad and specific simultaneously
  How did I get interested?
    Teaching
    NCLB
    Where are the white hats?
    Math background
    Ph.D. (Fordham University)
  A rose by any other name

ROLE OF PSYCHOMETRICS
  Current Role
    Test development
    Security
    Test score analyses
    Technology
    Industry standards
  Prior Role
    Educational Testing Service (ETS)
    Automated scoring
    Quality control
    Analyses (SAS)

MANDATORY TESTING
  Why so Frequent?
  No Child Left Behind (NCLB) 2001
    Federal mandate ($) / State administered tests
    Grades 3-8, High School
    Proficiency 2013/2014
    Adequate Yearly Progress (AYP)
    School and subgroup average
    States set proficiency levels
    Qualified teachers (core subjects)
    Bipartisan act

INTENDED IMPACT
  Purpose of NCLB
    Hidden protected groups
    Achievement gap
    Increase standards
    International standing
    Global economy
  Mixed results

UNINTENDED IMPACT
  Purpose of NCLB
  Additional testing
    Local Levels
    Formative / Summative
    Non-core subjects
  Test prep resources

PERFORMANCE IMPACT - Domestic
  National Assessment of Educational Progress (NAEP)
    1969 1990 1996
    Grades 4, 8, and 12
    Subjects
      Core (RRR + S)
      Supplementary (Humanities + TEL)
  Results
    Prior trend continued
    Recent plateau
  Efficacy
    Multiple state tests
    Prep proof

PERFORMANCE IMPACT - International
  Trends in International Mathematics and Science Study (TIMSS)
    1995
    Grades 4 and 8
    Namesake
    Every four years
  Results
    Slightly above average
    Upward trend

PERFORMANCE IMPACT - International
  Program for International Student Assessment (PISA)
    2000
    15-year-olds
    Math, science, and reading
    Every three years
  Results
    Average
    Flat trend

ACTUAL IMPACT
  A Psychometrician's Warning
  Statistics of interest
    Rankings
    Scores
  Inferential statistics
    Significance of difference
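A minimal sketch of the "significance of difference" point, assuming a large-sample two-sided z-test on two group means. The function name and every number below are hypothetical and are not taken from NAEP, TIMSS, or PISA; published assessments use more elaborate variance estimation than this.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (z, two-sided p) for the difference between two group means.

    Assumes independent groups and samples large enough for the
    normal approximation (an illustrative simplification).
    """
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical: the same 3-point gap with small and with large samples.
z_small, p_small = z_test_mean_difference(503, 100, 400, 500, 100, 400)
z_large, p_large = z_test_mean_difference(503, 100, 40000, 500, 100, 40000)

print(f"n = 400 per group:   z = {z_small:.2f}, p = {p_small:.3f}")
print(f"n = 40000 per group: z = {z_large:.2f}, p = {p_large:.6f}")
```

With these made-up inputs, the 3-point gap is not significant at the 0.05 level for the small samples and is significant for the large ones, which is why a ranking or a raw score gap alone says little.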

ACTUAL IMPACT - Subgroups
  Achievement Gap
    Decreasing
    Still exists
    Still significant

RATIONALE FOR TESTING
  Purpose
    NCLB: Accountability
      Students
      Teachers
      Schools/Districts
    Others?
  Basic types
    Diagnostic (Past)
      Clinical / Classroom
    Achievement (Present)
      Formative / Benchmark / Summative
    Aptitude (Future)

RATIONALE FOR TESTING
  Scope of Norms
  Ends of distribution
  [Graphic: axes labeled Frequency and Test Score]

RATIONALE FOR TESTING
  Knowledge vs. aptitude
  Testing as an authentic assessment
    Speed
    Familiarity
  Position on Test Prep

SUMMARY
  Is testing occurring too frequently?
    Resources
    Preference
  Is the frequency worth the effort?
    TBD
    Trendy
  Can it be minimized?
    Sampling options
    Level of interest (Purpose)

FINAL THOUGHTS
  Is testing inherently evil?
  Why should anyone be tested?
    Purpose Purpose Purpose
  Can it be minimized?
  Is research a critical element of testing?
    Reliability
    Validity

QUESTIONS?
  Frank Williams
  fwilli34@jhu.edu