The Role of Statistics in Data Science, and Vice Versa

Jessica Utts, Professor of Statistics, University of California, Irvine; President, American Statistical Association
Nicholas Horton, Professor of Statistics, Amherst College

Some Issues for Discussion
- How does statistics (as a discipline) view the emerging field of data science?
- What can statisticians contribute to data science?
- What elements of statistics are essential for data science education?

Overview and History
- Statistics has evolved along with technology and the growth of data
- Statistics from the 1990s
- Statistics today!
- Foundational goal is the same
- ASA's vision statement says it well: "A world that relies on data and statistical thinking to drive discovery and inform decisions"
- But methods for achieving that goal have changed and expanded

A Very Early Adopter: John Tukey
1962, Annals of Mathematical Statistics
Identified four driving forces in the new science:
1. The formal theories of statistics
2. Accelerating developments in computers and display devices
3. The challenge, in many fields, of more and ever larger bodies of data
4. The emphasis on quantification in an ever wider variety of disciplines

A Less Early Adopter: Leo Breiman, 2001
Statistical Modeling: The Two Cultures
"There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown."
"The statistical community has been committed to the almost exclusive use of data models."
"Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets."
"If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools."
(Statistical Science, 2001, with discussants)

A Side Comment
David Donoho's 50 Years of Data Science (2015) is worth reading. His version of Breiman's 2 cultures:
- The Generative [stochastic data] Modeling culture seeks to develop stochastic models which fit the data, and then make inferences about the data-generating mechanism based on the structure of those models.
- The Predictive [algorithmic] Modeling culture prioritizes prediction, is effectively silent about the underlying mechanism generating the data, and allows for many different predictive algorithms, preferring to discuss only accuracy of prediction made by different algorithms on various datasets.
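The contrast between the two cultures can be made concrete with a small sketch. It is not taken from the talk: the simulated data, the helper names (`simulate`, `fit_line`, `knn_predict`, `mse`) and the choice of a straight-line model versus k-nearest neighbours are all assumptions made for illustration. The first approach posits a stochastic model and reports an estimate with its uncertainty; the second makes no claim about the mechanism and is judged only by held-out prediction error.

```python
import random
import statistics


def simulate(n, seed=0):
    """Simulated data (an assumption for this sketch): y = 2 + 0.5*x + N(0, 1) noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Generative culture: least-squares line, returning intercept, slope, slope SE."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    slope_se = (rss / (n - 2) / sxx) ** 0.5
    return intercept, slope, slope_se


def knn_predict(train_x, train_y, x, k=5):
    """Predictive culture: average the responses of the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def mse(preds, ys):
    """Mean squared prediction error."""
    return statistics.fmean((p - y) ** 2 for p, y in zip(preds, ys))


xs, ys = simulate(200)
train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]

# Generative view: estimate the mechanism and quantify uncertainty about it.
intercept, slope, slope_se = fit_line(train_x, train_y)
print(f"slope estimate {slope:.3f}, standard error {slope_se:.3f}")

# Predictive view: compare algorithms only by accuracy on held-out data.
line_mse = mse([intercept + slope * x for x in test_x], test_y)
knn_mse = mse([knn_predict(train_x, train_y, x) for x in test_x], test_y)
print(f"held-out MSE: line {line_mse:.3f}, 5-NN {knn_mse:.3f}")
```

Only the fitted line yields a statement about the data-generating mechanism (a slope with a standard error); the nearest-neighbour predictor offers nothing to interpret, only a held-out error to compare against other algorithms.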

Fast Forward 14 Years: ASA Statement on the Role of Statistics in Data Science, 2015
- Identifies foundational data science fields:
  - Database management
  - Statistics and machine learning
  - Distributed and parallel systems
- Encourages greater, mutually beneficial collaboration across these three fields
- Intersects with numerous disciplines and related research areas

Many ongoing disciplinary collaborations
Some examples:
- Genomics (and personalized medicine)
- Health services research (electronic medical records)
- Business analytics (customer tracking)
- Smart cities (and sensor networks)
- Astronomy (data streams)
- And others

ASA Statement, Continued
- Notes that statistics education must evolve to meet needs
  - For example, address inclusion of data science in K-12, community college
  - More later on other aspects of education
- Elucidates role of statistics in data science

From the ASA Statement: The Role of Statistics
"Framing questions statistically allows researchers to leverage data resources to extract knowledge and obtain better answers. The central dogma of statistical inference, that there is a component of randomness in data, enables researchers to formulate questions in terms of underlying processes and to quantify uncertainty in their answers. A statistical framework allows researchers to distinguish between causation and correlation and thus to identify interventions that will cause changes in outcomes."

The ASA Statement, continued
"It also allows them to establish methods for prediction and estimation, to quantify their degree of certainty, and to do all of this using algorithms that exhibit predictable and reproducible behavior. In this way, statistical methods aim to focus attention on findings that can be reproduced by other researchers with different data resources. Simply put, statistical methods allow researchers to accumulate knowledge."

The Statistical Inquiry Cycle
Wild and Pfannkuch, 1999, International Statistical Review
Problem, Plan, Data, Analysis, Conclusions (PPDAC)
[Figure: the PPDAC cycle, with these stages and labels]
- PROBLEM: Grasping system dynamics; Defining problem
- PLAN: Measurement system; Sampling design; Data management; Piloting and analysis
- DATA: Data collection; Data management; Data cleaning
- ANALYSIS: Data exploration; Planned analyses; Unplanned analyses; Hypothesis generation
- CONCLUSIONS: Interpretation; Conclusions; New ideas; Communication

How to carry out PPDAC?
"This scientific approach to statistical problem-solving is important for all data analysts. It needs to start in the first course and be a consistent theme in all subsequent courses."
- American Statistical Association Guidelines for Undergraduate Programs in Statistics (2014), http://www.amstat.org/asa/education/Curriculum-Guidelines-for-Undergraduate-Programs-in-Statistical-Science.aspx

How to carry out PPDAC?
"Working with data requires extensive computing skills. To be prepared for statistics and data science careers, students need facility with professional statistical analysis software, the ability to wrangle data in various ways and algorithmic problem-solving. Students should be fluent in higher-level programming languages and facile with database systems."
- American Statistical Association Guidelines for Undergraduate Programs in Statistics (2014), http://www.amstat.org/asa/education/Curriculum-Guidelines-for-Undergraduate-Programs-in-Statistical-Science.aspx

How to carry out PPDAC?
Statistical Methods and Theory: Need to
- understand issues of design, confounding, and bias
- have a foundation in theoretical statistics principles for sound analyses
- develop knowledge and gain experience applying a variety of statistical methods
- assess appropriateness of methods, and communicate results

How to carry out PPDAC?
Data Wrangling and Computation: Need to
- be facile with professional statistical software
- program in a higher-level language and think algorithmically
- use simulation-based statistical techniques and undertake simulation studies
- manage and wrangle data, and undertake analyses in a reproducible manner

How to carry out PPDAC?
Statistical Practice and Communication: Need to
- write clearly, speak fluently, and construct effective visual displays and compelling summaries
- demonstrate ability to collaborate in teams and to organize and manage projects
- incorporate ethical precepts into all aspects of their work
- communicate complex statistical methods in basic terms to managers and other audiences

How to carry out PPDAC?
Discipline-Specific Knowledge: Need to
- apply statistical reasoning to domain-specific questions
- translate research questions into statistical questions
- communicate results appropriate to different disciplinary audiences
Skills taken from undergraduate guidelines, but relevant at other levels as well

Park City Group Report (2016)
Curriculum Guidelines for Undergraduate Programs in Data Science (DeVeaux + 24 other authors)
- Data science as science
- Interdisciplinary nature
- Data at the core
- Analytical (computational and statistical) thinking and problem-solving
- (New pathways for) mathematical foundations
- Flexibility
http://www.amstat.org/asa/files/pdfs/edu-datascienceguidelines.pdf

What do statisticians bring to the table?
- Importance of context
- Accounting for variability
- Design, confounding, and analysis of found (observational) data
- Understanding of inference, multiplicity and reproducibility issues
- Statistical analysis (PPDAC) cycle
- Long history of making decisions with data
- Experience working on multidisciplinary teams

Some Issues for Discussion
- How does statistics (as a discipline) view the emerging field of data science?
- What can statisticians contribute to data science?
- What elements of statistics are essential for data science education?