ILSAs: The OECD Programme for International Student Assessment (PISA)
Andreas Schleicher, OECD Director for Education and Skills
PISA in brief
Every three years since 2000, over half a million students
- representing 15-year-olds in now over 80 countries
- national sample sizes vary between 4,000 and 30,000 students
take an internationally agreed 2-hour test
- focus on mathematics, science and reading
- problem-solving, collaborative problem-solving, creative thinking, financial literacy
and respond to questions on
- their personal background, their schools, their well-being and their motivation
Teachers, principals, parents and system leaders provide data on:
- school policies, practices, resources and institutional factors that help explain performance differences
Some design choices and trade-offs
ILSAs are complements to, not substitutes for, other research methods
Design choices and trade-offs
- Balancing investment in measurement of outcomes vs. measurement of covariates that can help to explain outcomes
- Improved questionnaires a priority
- Larger samples vs. higher quality of measurement
- Greater geographic granularity of results to compare apples with apples
- E.g. validity vs. efficiency or relevance vs. reliability
- Being open to generating new insights and hypotheses on the nature of relationships (fishing) vs. constraining design to answer specific questions (hunting)
- ILSAs are complements to, not substitutes for, other research methods
- In particular, longitudinal components place constraints on design
- Balancing type I and type II errors
Design choices and trade-offs
- Measuring change vs. changing the measures
- Every three years one of the frameworks is revised
- Bridging studies for content and delivery
- New measures are first explored through innovative assessment areas
- As comparable as possible but as country-specific as necessary
- Adaptive assessment instruments
- Greater investment in better and more modular context questionnaires
- Integration and links with national assessments
- Frameworks informed but not constrained by national standards and curricula
- Curriculum validation studies, extensive consensus building
Memorisation is less useful as problems become more difficult (OECD average)
[Figure: odds ratio of success (y-axis, from less success to greater success) against complexity of mathematics tasks on the PISA scale (x-axis, 300 to 800), from easy to difficult problems; R² = 0.81. Source: Figure 4.3]
Elaboration strategies are more useful as problems become more difficult (OECD average)
[Figure: odds ratio of success (y-axis, from less success to greater success) against complexity of mathematics tasks on the PISA scale (x-axis, 300 to 800), from easy to difficult problems; R² = 0.82. Source: Figure 6.2]
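A reading aid for the odds ratios on the two slides above: a value below 1 means that students who report using a strategy have lower odds of solving a task than students who do not, and a value above 1 means higher odds. The sketch below is a minimal illustration with made-up counts, not PISA data. The function name `odds_ratio` and the 2x2 layout are assumptions for illustration; the published figures come from the source report's own models, not from raw counts like these.

```python
def odds_ratio(correct_with, wrong_with, correct_without, wrong_without):
    """Odds of success with the strategy divided by odds of success without it."""
    odds_with = correct_with / wrong_with
    odds_without = correct_without / wrong_without
    return odds_with / odds_without


# Illustrative counts only (hypothetical, not PISA data).
# Easy task: strategy users and non-users succeed at the same rate.
easy = odds_ratio(correct_with=80, wrong_with=20,
                  correct_without=80, wrong_without=20)
# Difficult task: strategy users succeed less often than non-users.
hard = odds_ratio(correct_with=20, wrong_with=80,
                  correct_without=30, wrong_without=70)

print(f"easy task odds ratio: {easy:.2f}")
print(f"difficult task odds ratio: {hard:.2f}")
```

With these toy counts the odds ratio is 1 for the easy task and falls below 1 for the difficult one, which is the pattern the memorisation slide describes.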
Interpretation and reporting Keeping things as simple as possible but as complex as necessary; and as comparable as possible but as country-specific as necessary
Things that can be addressed (and are)
- Comparable samples
- Sampling errors are now well within measurement errors
- Coverage issues addressed through supplementary studies (e.g. PISA-D)
- Meaningful units of comparisons
- Most federal countries now collect state-level PISA data
- Better instruments to facilitate and support use of PISA data
- Investment in communication
- Quality of media coverage has significantly evolved
Policy-uses and limitations The perfect can be the enemy of the good (and remember that without measurement policy-makers may just throw a coin)
A world without PISA
One question is whether ILSAs meet the methodological requirements of some gold standard of social research; another question is what the state of knowledge and policy development, both nationally and internationally, would have been without them.
Some scientific revolutions have started with controlled experiments; many have started with better measurement.
A world with PISA
- Seeing what is possible in education
- Helping policy-makers and educators to look outwards
- Placing national standards in a broader perspective
- Exposing grade inflation
- Contextualising curricular choices
- Lowering the political cost of action
- Raising the political cost of inaction
- Generating hypotheses
Science performance and equity in PISA (2015)
[Figure: mean science performance (y-axis, 350 to 550 score points; higher performance at the top) against socio-economic equity (x-axis; more equity to the right), with one labelled point per country or economy; Dominican Rep. is annotated with its score (332).]
Some countries combine excellence with equity
Brazil: School performance and schools' socio-economic profile
[Figure: school-level score points (y-axis, 200 to 700, with proficiency bands from Below 1b to Level 6) against the PISA index of economic, social and cultural status (x-axis, -3 to 3); public schools and private schools are shown separately.]
Viet Nam: School performance and schools' socio-economic profile
[Figure: school-level score points (y-axis, 200 to 700, with proficiency bands from Below 1b to Level 6) against the PISA index of economic, social and cultural status (x-axis, -3 to 3); public schools and private schools are shown separately.]
Comparing apples with apples and oranges with oranges
PISA mathematics performance by decile of social background
[Figure: PISA mathematics performance, from low to high, by decile of social background.]
- Mathematics performance of the 10% most privileged American 15-year-olds (~Japan)
- Mathematics performance of the 10% most disadvantaged American 15-year-olds (~Mexico)
Poverty is not destiny
Learning outcomes by international deciles of the PISA index of economic, social and cultural status (ESCS) (Figure I.6.7)
[Figure: score points (y-axis, 280 to 630) for the bottom, second, middle, ninth and top deciles in each country or economy, with the OECD median student marked.]
% of students in the bottom international deciles of ESCS: Dominican Republic 40; Algeria 52; Kosovo 10; Qatar 3; FYROM 13; Tunisia 39; Montenegro 11; Jordan 21; United Arab Emirates 3; Georgia 19; Lebanon 27; Indonesia 74; Mexico 53; Peru 50; Costa Rica 38; Brazil 43; Turkey 59; Moldova 28; Thailand 55; Colombia 43; Iceland 1; Trinidad and Tobago 14; Romania 20; Israel 6; Bulgaria 13; Greece 13; Russia 5; Uruguay 39; Chile 27; Latvia 25; Lithuania 12; Slovak Republic 8; Italy 15; Norway 1; Spain 31; Hungary 16; Croatia 10; Denmark 3; OECD average 12; Sweden 3; Malta 13; United States 11; Macao (China) 22; Ireland 5; Austria 5; Portugal 28; Luxembourg 14; Hong Kong (China) 26; Czech Republic 9; Poland 16; Australia 4; United Kingdom 5; Canada 2; France 9; Korea 6; New Zealand 5; Switzerland 8; Netherlands 4; Slovenia 5; Belgium 7; Finland 2; Estonia 5; Viet Nam 76; Germany 7; Japan 8; Chinese Taipei 12; B-S-J-G (China) 52; Singapore 11
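One way to read the percentages above: pool students from all countries, find the ESCS value that marks off the bottom international deciles, then ask what share of each country's students falls at or below it. The sketch below is a minimal illustration under stated assumptions: the data are invented, students are unweighted, and the cutoff is set at the bottom two deciles because the slide does not say how many deciles are counted. `share_below_international_cutoff` is a hypothetical helper name; a real analysis would use the survey weights.

```python
from collections import defaultdict


def share_below_international_cutoff(students, quantile=0.2):
    """students: list of (country, escs) pairs.

    Returns {country: percentage of its students at or below the pooled cutoff}.
    """
    pooled = sorted(escs for _, escs in students)
    # Nearest-rank cutoff on the pooled (international) distribution.
    k = max(1, round(quantile * len(pooled)))
    cutoff = pooled[k - 1]

    totals = defaultdict(int)
    below = defaultdict(int)
    for country, escs in students:
        totals[country] += 1
        if escs <= cutoff:
            below[country] += 1
    return {c: 100.0 * below[c] / totals[c] for c in totals}


# Toy data (hypothetical): country A is poorer on average than country B.
toy = [("A", -2.0), ("A", -1.5), ("A", 0.0), ("A", 0.5), ("A", 1.0),
       ("B", -1.0), ("B", 0.2), ("B", 0.8), ("B", 1.2), ("B", 1.5)]
shares = share_below_international_cutoff(toy)
print(shares)
```

Because the cutoff is international and not national, the shares differ widely between countries, which is what allows students of similar background to be compared across systems.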
Generating hypotheses Making policy alternatives visible
Category 1. Analyse policy variation across countries
- Country-level correlation and partial correlation
- Three-level regression models
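To make the first item concrete, the sketch below computes a country-level correlation and a first-order partial correlation, i.e. the association between a policy variable x and an outcome y after removing what both share with a third variable z. It is a minimal illustration with invented numbers; the names `pearson` and `partial_corr` and the toy values are assumptions, and three-level regression models are not covered here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y, controlling for z (first-order formula)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy country-level values (hypothetical, one entry per country).
policy = [1, 2, 3, 4]    # x: some policy indicator
outcome = [2, 1, 4, 3]   # y: mean performance, rescaled
wealth = [1, 1, 2, 2]    # z: a background variable such as national income

raw = pearson(policy, outcome)
controlled = partial_corr(policy, outcome, wealth)
print(f"raw correlation: {raw:.2f}")
print(f"partial correlation given z: {controlled:.2f}")
```

In this toy data the raw correlation is 0.6, while the partial correlation given z is -1: the positive association is entirely due to z. That is the reason for reporting both at the country level.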
Spending per student from the age of 6 to 15 and science performance (Figure II.6.2)
[Figure: science performance (y-axis, 300 to 600 score points) against average spending per student from the age of 6 to 15 (x-axis, 0 to 200 thousand USD, PPP), with one labelled point per country or economy; two R² values are annotated, 0.41 and 0.01.]
Student-teacher ratios and class size (Figure II.6.14)
[Figure: student-teacher ratio (y-axis, 5 to 30) against class size in language of instruction (x-axis, 15 to 50), with one labelled point per country or economy and the OECD averages marked; R² = 0.25. One region is labelled "High student-teacher ratios and small class sizes", the other "Low student-teacher ratios and large class sizes".]
Learning time and science performance (Figure II.6.23)
[Figure: PISA science score (y-axis, 300 to 600) against total learning time in and outside of school (x-axis, 35 to 60), with one labelled point per country or economy and the OECD averages marked; R² = 0.21.]
Learning time and science performance (Figure II.6.23)
[Figure: for each country or economy, listed from Finland to Dominican Republic, intended learning time at school (hours) and study time after school (hours) on an axis from 0 to 70, and score points in science per hour of total learning time on an axis from 6 to 16.]
Category 2. Analyse policy variation within countries
- Regression models, by country
- School fixed-effects models
- Student fixed-effects models
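As an illustration of the school fixed-effects idea, the sketch below removes each school's mean from both the predictor and the outcome and then fits an ordinary least-squares slope, so that only variation within schools is used. It is a minimal sketch under stated assumptions: the data are invented, the helper names `ols_slope` and `school_fixed_effects_slope` are hypothetical, and it ignores the survey weights and plausible values that a real PISA analysis would need.

```python
from collections import defaultdict


def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den


def school_fixed_effects_slope(rows):
    """rows: list of (school_id, x, y). Slope of y on x within schools."""
    by_school = defaultdict(list)
    for school, x, y in rows:
        by_school[school].append((x, y))
    demeaned = []
    for obs in by_school.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        # Subtract the school mean so differences between schools drop out.
        demeaned.extend((x - mx, y - my) for x, y in obs)
    return ols_slope(demeaned)


# Toy data (hypothetical): school B is both better resourced and higher scoring.
rows = [("A", 0, 400), ("A", 1, 410), ("A", 2, 420),
        ("B", 3, 500), ("B", 4, 510), ("B", 5, 520)]

pooled = ols_slope([(x, y) for _, x, y in rows])
within = school_fixed_effects_slope(rows)
print(f"pooled slope: {pooled:.1f}")
print(f"within-school slope: {within:.1f}")
```

In this toy data the pooled slope is 28 score points per unit of x, but the within-school slope is only 10: most of the pooled association reflects differences between the two schools, not differences among students in the same school.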
Differences in educational resources between advantaged and disadvantaged schools (Figure I.6.14)
[Figure: mean index difference between advantaged and disadvantaged schools (y-axis, about -3 to 1) for the index of shortage of educational material and the index of shortage of educational staff, by country or economy, listed from CABA (Argentina) to Latvia. One side of zero is labelled "Disadvantaged schools have more resources than advantaged schools", the other "Disadvantaged schools have fewer resources than advantaged schools".]
Category 5. Analyse ILSA data that follow the same students over time. Canadian YITS study
Canadian PISA/YITS study: Learning beyond Fifteen
Growth in reading skills
[Figure: improvements in reading skills between the ages of 15 and 24 (PISA-15 and PISA-24 scores), by individual and family-related factors associated with skills at age 15: sense of mastery (low, high), family educational support (low, high), parental cultural communication (low, high) and socio-economic background (disadvantaged, advantaged).]
In conclusion
ILSAs can help policy-development in many ways
- Seeing what is possible in education
- Helping policy-makers and educators to look outwards
- Putting national standards in a broader perspective
- Exposing grade inflation
- Lowering the political cost of action
- Raising the political cost of inaction
- Generating hypotheses
See ILSAs as complements to, not substitutes for, other research methods
Don't overload ILSAs with unrealistic expectations and avoid time and energy traps
Don't underplay but also don't overplay
- RCTs
- Neuroscience, big data, predictive analytics
Keep limitations in mind when reporting results
Thank you
Find out more about our work at www.oecd.org/pisa
- All publications
- The complete micro-level database
Email: Andreas.Schleicher@OECD.org
Twitter: SchleicherOECD
Wechat: AndreasSchleicher
Multi-dimensional constructs
Comparing countries and economies on the different science knowledge subscales (Figure I.2.30)
Score points:
- Singapore: overall science scale 556; content knowledge 553; procedural and epistemic knowledge 558
- Chinese Taipei: overall science scale 532; content knowledge 538; procedural and epistemic knowledge 528
- Austria: overall science scale 495; content knowledge 501; procedural and epistemic knowledge 490
Gender
The difference is not in how good they are at science but in their attitudes to science
Boys' and girls' strengths and weaknesses in science (Figure I.2.29)
[Figure: score-point difference (boys - girls), y-axis from -15 to 25, for science overall; science competencies (explaining phenomena scientifically, evaluating and designing scientific enquiry, interpreting data and evidence scientifically); knowledge types (content knowledge, procedural and epistemic knowledge); and content areas (physical systems, living systems, Earth and space).]
It is harder for boys, on average, to perform well on these types of tasks...
Top-performing boys' and girls' strengths and weaknesses (Figure I.2.29)
[Figure: score-point difference (boys - girls), y-axis from -15 to 25, for science overall; science competencies (explaining phenomena scientifically, evaluating and designing scientific enquiry, interpreting data and evidence scientifically); knowledge types (content knowledge, procedural and epistemic knowledge); and content areas (physical systems, living systems, Earth and space).]
...but the highest-achieving boys perform better than the highest-achieving girls on all types of tasks, including these
Bottom-performing boys' and girls' strengths and weaknesses (Figure I.2.29)
[Figure: score-point difference (boys - girls), y-axis from -16 to 4, for science overall; science competencies (explaining phenomena scientifically, evaluating and designing scientific enquiry, interpreting data and evidence scientifically); knowledge types (content knowledge, procedural and epistemic knowledge); and content areas (physical systems, living systems, Earth and space).]
... It is harder for girls to perform well on these types of tasks, even among low achievers