Machine Learning and Development Policy


Machine Learning and Development Policy Sendhil Mullainathan (joint papers with Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, Ziad Obermeyer)

Magic? Hard not to be wowed. But what makes it tick? Could that be used elsewhere? In my own work?

AI Approach: We do it perfectly. Introspect how. Program that up.

Programming: For each review, make a vector of words. Figure out whether it has positive words and negative words. Count.

Trying to Copy Humans: Brilliant, Dazzling, Cool, Gripping, Moving; Bad, Suck, Cliched, Slow, Awful: 60%.

This Approach Stalled: Trivial problems proved impossible. Marvin Minsky once assigned "the problem of computer vision" as a summer project. Forget about the more complicated problems like language.
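The hand-programmed approach described above can be sketched in a few lines. This is a minimal illustration, not the code behind the slide: the word lists are the ones on the slide, while the function names and the example reviews are assumptions made up for the sketch.

```python
# Word lists taken from the slide; everything else here is illustrative.
POSITIVE = {"brilliant", "dazzling", "cool", "gripping", "moving"}
NEGATIVE = {"bad", "suck", "cliched", "slow", "awful"}

def word_vector(review):
    """Lower-case the review, split on whitespace, and strip punctuation."""
    return [w.strip(".,!?;:\"'()") for w in review.lower().split()]

def classify_by_counting(review):
    """Return 1 (positive) if positive words outnumber negative ones, else 0."""
    words = word_vector(review)
    n_pos = sum(w in POSITIVE for w in words)
    n_neg = sum(w in NEGATIVE for w in words)
    return 1 if n_pos > n_neg else 0

# Invented example reviews.
print(classify_by_counting("A gripping, moving story."))
print(classify_by_counting("Slow and cliched. Awful!"))
```

The rule is only as good as the introspected word lists, which is the weakness the following slides turn to.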

What is the magic trick? Make this an empirical exercise. Collect some data. Look at what combination of words predicts being a good review. Example dataset: 2000 movie reviews, 1000 good and 1000 bad.

Learning, not Programming: Love, Superb, STILL; Bad, Stupid, Great?, Worst!: 95%. (Pang, Lee and Vaithyanathan)
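To make the contrast with hand-picked word lists concrete, here is a small sketch in which the word weights are estimated from labeled reviews instead of being typed in. It uses a naive-Bayes-style log-odds score with add-one smoothing, chosen for brevity; it is not claimed to be the method behind the figure on the slide, and the six reviews are invented.

```python
import math
from collections import Counter

def learn_word_weights(reviews, labels):
    """Estimate a log-odds weight per word from labeled reviews (add-one smoothing)."""
    pos_counts, neg_counts = Counter(), Counter()
    for text, y in zip(reviews, labels):
        # Count each word at most once per review.
        (pos_counts if y == 1 else neg_counts).update(set(text.lower().split()))
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    vocab = set(pos_counts) | set(neg_counts)
    return {
        w: math.log((pos_counts[w] + 1) / (n_pos + 2))
        - math.log((neg_counts[w] + 1) / (n_neg + 2))
        for w in vocab
    }

def predict_sentiment(weights, review):
    """Return 1 if the summed weights of the review's words are positive, else 0."""
    score = sum(weights.get(w, 0.0) for w in set(review.lower().split()))
    return 1 if score > 0 else 0

# Invented toy data: 1 = good review, 0 = bad review.
reviews = ["still love it", "superb and still great", "great ? worst !",
           "stupid and bad", "i love this superb film", "bad bad film"]
labels = [1, 1, 0, 0, 1, 0]

weights = learn_word_weights(reviews, labels)
print(predict_sentiment(weights, "still superb"))
print(predict_sentiment(weights, "worst film"))
```

Nothing in the code says which words are positive; the signs of the weights come entirely from the data.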

Machine learning: Turn any intelligence task into an empirical learning task. Specify what is to be predicted. Specify what is used to predict it.
$$Y = \underbrace{\{0, 1\}}_{\text{Positive?}}, \qquad X = \underbrace{\{0, 1\}^k}_{\text{Word Vector}}, \qquad \hat f = \arg\min_f E[L(f(x), y)]$$

Machine learning: Turn any intelligence task into an empirical learning task. Specify what is to be predicted. Specify what is used to predict it. Underneath most machine intelligence you see. Not a coincidence that ML and big data arose together.

Wonderful: Great that engineers discovered the 100+ year old field of statistics! We've been estimating functions from data for a long time. In part true. But far from the full story.
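The argmin on this slide can be illustrated by replacing the expectation with an average over a sample and searching a small function class. This is a sketch under assumptions: the 0-1 loss, the class of "predict positive if word j is present" rules, and the five data points are all invented for illustration.

```python
def zero_one_loss(prediction, y):
    """L(f(x), y): 0 if the prediction is right, 1 otherwise."""
    return 0 if prediction == y else 1

def empirical_risk(f, sample):
    """Average loss of f over the sample, standing in for E[L(f(x), y)]."""
    return sum(zero_one_loss(f(x), y) for x, y in sample) / len(sample)

def erm(function_class, sample):
    """Return the function in the class with the lowest empirical risk."""
    return min(function_class, key=lambda f: empirical_risk(f, sample))

def make_single_word_rule(j):
    """Hypothetical rule: predict positive exactly when word j is present."""
    return lambda x: x[j]

# x is a word vector in {0, 1}^k with k = 3; y is in {0, 1}. Invented data.
sample = [((1, 0, 1), 1), ((1, 1, 0), 1), ((0, 1, 0), 0),
          ((0, 0, 1), 0), ((1, 0, 0), 1)]
function_class = [make_single_word_rule(j) for j in range(3)]

f_hat = erm(function_class, sample)
print([empirical_risk(f, sample) for f in function_class])
```

With real text the function class is far richer than three one-word rules, which is where the out-of-sample question on the later slides comes in.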

Pang, Lee and Vaithyanathan

Machine Learning: High dimensional statistical procedure. Despite this, machine learning can do well. It's no surprise we fit well. We fit well out of sample.

So: Estimation: fit Y with X; low dimensional. Machine Learning: fit Y with X out of sample; high dimensional. JUST BETTER?
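The distinction between fitting and fitting out of sample can be shown with a deliberately extreme example. The sketch below, with invented data and hypothetical function names, compares a lookup table that memorizes the training sample with a one-word rule, scoring both on the training data and on held-out data.

```python
def accuracy(f, sample):
    """Share of observations where f(x) equals y."""
    return sum(f(x) == y for x, y in sample) / len(sample)

def fit_memorizer(train):
    """Memorize every training point; predict 0 for anything unseen."""
    table = {x: y for x, y in train}
    return lambda x: table.get(x, 0)

def simple_rule(x):
    """Predict positive exactly when the first word is present."""
    return x[0]

# Invented word vectors; the held-out x values never appear in training.
train = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((0, 0, 1), 0)]
holdout = [((1, 0, 1), 1), ((1, 1, 1), 1), ((0, 1, 1), 0), ((0, 0, 0), 0)]

memorizer = fit_memorizer(train)
for name, f in [("memorizer", memorizer), ("simple rule", simple_rule)]:
    print(name, accuracy(f, train), accuracy(f, holdout))
```

Both procedures fit the training sample perfectly, so in-sample fit alone cannot tell them apart; only the held-out score does.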

Data: $S_n = (y_i, x_i)$ iid (Reviews). Function class: $\mathcal{F}$ (Vector of Words). Estimation: estimates $\hat f$ [f, f] (Sentiment Predictor). Unbiased functions:
$$E_S[\hat f_{A,S}] = f = \underbrace{E[y \mid x]}_{\text{Right } \hat f}$$
Data size (information going in): thousands? Estimates (information coming out): tens of thousands. How can we get more information out than we're putting in?

Data: $S_n = (y_i, x_i)$ iid (Reviews). Function class: $\mathcal{F}$ (Vector of Words). Estimation: estimates $\hat f$ [f, f] (Sentiment Predictor). Unbiased functions:
$$E_S[\hat f_{A,S}] = f = \underbrace{E[y \mid x]}_{\text{Right } \hat f}$$
Do we need this?

Unbiased functions:
$$E_S[\hat f_{A,S}] = f = \underbrace{E[y \mid x]}_{\text{Right } \hat f}$$
Gets more out? Put more in.

Prediction and Estimation. Estimation: adjudicate between variables; confidence intervals around coefficients. Prediction: do not adjudicate; arbitrary choices to deal with covariance.

Estimation vs Prediction. Estimation ($\hat\beta$): strict assumptions about the data generating process; back out parameters; low dimensional. Prediction ($\hat y_i$): allow for flexible functional forms; get individual predictions; do not adjudicate between observably similar functions (variables).

Great for Engineers. Much of what we do is inference of coefficients. In fact we fret about causal inference. What use is a procedure where even the coefficients aren't meaningful?
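One way to see what "do not adjudicate between observably similar functions" means: when two variables move together in the data, very different coefficient vectors produce the same predictions. The sketch below uses an invented, exactly collinear dataset and hypothetical coefficient values to make the point.

```python
# Invented data in which x1 and x2 are exactly collinear.
rows = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]

def linear_prediction(beta, x):
    """y_hat = beta_1 * x1 + beta_2 * x2."""
    return beta[0] * x[0] + beta[1] * x[1]

# Three candidate coefficient vectors that tell very different stories.
candidates = {
    "all weight on x1": (2.0, 0.0),
    "all weight on x2": (0.0, 2.0),
    "split evenly": (1.0, 1.0),
}

predictions = {
    name: [linear_prediction(beta, x) for x in rows]
    for name, beta in candidates.items()
}
for name, y_hat in predictions.items():
    print(name, y_hat)
```

All three candidates give the same individual predictions, so a prediction procedure has no reason to prefer one over another, while an estimation exercise would need further assumptions to pick among them.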

Applications of Machine Learning: New Data; Prediction in Policy

An Example: New Data

Xie et al. (2016)

What does this have to do with ML? Processing of data requires machine learning. How do you relate luminosity to income levels?

Crop Yield

Cell Phone Data: Blumenstock et al. (2015)

Blumenstock et al. (2015)

New Kinds of Data: Measurement has always played a central role in development. These new data give us a new way to measure. Not the depth that Morduch will discuss tomorrow. But breadth. And a very different look at life.

Applications of Machine Learning: New Data; Prediction in Policy

Question: Can prediction be directly useful in policy? Policy decisions seem inherently causal: Should we do policy X? What will X do? What happens with and without X?

Two Toy Policy Decisions: Rain Dance (Causation, $\hat\beta$); Umbrella (Prediction, $\hat y$). Common elements: both are decisions with payoffs; both rely on data of the type Y = rain, X = variables correlated with rain; both use data to estimate a function y = f(x).

Framework: X = Atmospheric Conditions; decision $X_0$ = Rain Dance; Y = Rain. Causation.

Framework: X = Atmospheric Conditions; decision $X_0$ = Umbrella; Y = Rain. Prediction.

Rain Dance: X = Atmospheric Conditions; decision $X_0$ = Rain Dance; Y = Rain; Causation; Experiments. Umbrella: X = Atmospheric Conditions; decision $X_0$ = Umbrella; Y = Rain; Prediction; Machine Learning.

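Payoff decomposition (diagram: X, $X_0$, Y):
$$\frac{d\pi}{dX_0} = \underbrace{\frac{\partial \pi}{\partial X_0}(Y)}_{\text{Prediction}} + \frac{\partial \pi}{\partial Y}\,\underbrace{\frac{\partial Y}{\partial X_0}}_{\text{Causation}}$$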
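The umbrella case from the framework above can be written as a tiny decision rule. Carrying an umbrella does not change whether it rains, so the only unknown that matters for the payoff is a prediction of rain. The payoff numbers and function names below are assumptions chosen for illustration.

```python
def expected_umbrella_payoff(carry, p_rain, dry_benefit=1.0, carry_cost=0.3):
    """Expected payoff of carrying (carry=1) or not (carry=0).

    dry_benefit and carry_cost are illustrative numbers, not from the slides.
    The decision does not affect p_rain; it only changes the payoff given rain.
    """
    return carry * (p_rain * dry_benefit - carry_cost)

def should_carry(p_rain, dry_benefit=1.0, carry_cost=0.3):
    """Carry the umbrella only if doing so has the higher expected payoff."""
    with_umbrella = expected_umbrella_payoff(1, p_rain, dry_benefit, carry_cost)
    without_umbrella = expected_umbrella_payoff(0, p_rain, dry_benefit, carry_cost)
    return with_umbrella > without_umbrella

print(should_carry(0.8))
print(should_carry(0.1))
```

A rain dance has the opposite structure: its payoff depends entirely on whether the decision changes Y, which a prediction of rain cannot tell us.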

Are there Umbrella Problems? Decisions where predictions matter, where we can have big social impact, and with enough data: prediction policy problems.

Prediction

A Policy Problem in the US: Each year police make over 12 million arrests. Where do people wait for trial? Release vs. detain: high stakes. Pre-trial detention spells average 2-3 months (can be up to 9-12 months). Nearly 750,000 people in jails in the US. Consequential for jobs and families as well as crime. (Kleinberg, Lakkaraju, Leskovec, Ludwig and Mullainathan)

Judge's Problem: The judge must decide whether to release or not (bail). A defendant out on bail can behave badly: fail to appear at the case; commit a crime. The judge is making a prediction.

Prediction Policy Problem: Large dataset of decisions. Build a prediction algorithm.

Build a Decision Aid? Simplest aid: safeguard. Re-ranking?
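A re-ranking aid of the kind this slide asks about could look roughly like the sketch below: hold the number of releases fixed at what the judge chose, but release the defendants with the lowest predicted risk. The risk scores, the judge's decisions, and the function name are all hypothetical; this is an illustration of the idea, not the procedure used in the underlying paper.

```python
def rerank_release(predicted_risk, judge_released):
    """Release the same number of defendants as the judge, lowest risk first.

    predicted_risk: one score per defendant (higher = riskier), hypothetical.
    judge_released: 1 if the judge released that defendant, else 0.
    Returns a list of 0/1 release decisions in the original order.
    """
    n_release = sum(judge_released)
    by_risk = sorted(range(len(predicted_risk)), key=lambda i: predicted_risk[i])
    released = set(by_risk[:n_release])
    return [1 if i in released else 0 for i in range(len(predicted_risk))]

# Invented scores and decisions for five defendants.
predicted_risk = [0.10, 0.80, 0.30, 0.05, 0.60]
judge_released = [0, 1, 1, 0, 1]

reranked = rerank_release(predicted_risk, judge_released)
print(reranked)
```

Keeping the release count fixed isolates the question of who is released from the question of how many.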

Bail Not Unique. Pure prediction problems: poverty targeting (Adelman et al. 2016); retail crystal ball: weather and yield prediction (Rosenzweig and Udry 2013); teacher selection (predict non-attendance?). Pseudo-prediction problems: treatment effects depend on risk; predict risk; target high risk pregnancies for hospital delivery.

Key Inputs. Problem: prediction affects decision; individual, micro decisions. Inputs: reasonable individual data; large samples (10,000+?).

Conclusion: Fortunate enough to see two large changes in policy: RCTs and behavioral economics. I think this will be the next one. Three papers I've drawn on: "Machine Learning: An Econometric Approach", with Jann Spiess, Journal of Economic Perspectives, forthcoming. "Human Decisions and Machine Predictions", with Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, and Jens Ludwig. "Targeting Poverty by Predicting Poverty: Using Machine Learning in Targeted Transfer Program", with Melissa Adelman, Jonathan Glidden, Paul Niehaus, and Jack Willis.