Executive Data Science Education

Claudia Perlich, Chief Scientist (@claudia_perlich)

Introductions
- Master's in Computer Science, CU Boulder
- Master's in Computer Science, TU Darmstadt
- PhD in Information Systems, Stern School of Business
- IBM T.J. Watson Research Lab: Research Staff Member
- Chief Scientist at Dstillery
- Adjunct Professor at NYU Stern

Data Mining for Business Intelligence
- Course has been offered since 2003
- Since 2015: two tracks, Managerial and Technical
- Managerial track (MBA elective): no coding required; uses WEKA; many cases covered; focus on managing a data science (DS) project

Course Format
- Weekly 3-hour lectures
- 6 homeworks, heavily scripted to build toward the project
- 2-3 guest speakers
- 2-hour take-home final
- Project: find your own predictive problem and data; solve it (using any tool of your choice); demonstrate business value; 10-15 page written report plus in-class presentation

Broad Outline
- Terminology
- Methods: supervised and some unsupervised learning; model evaluation
- Importance and applications
- Managing DS: deployment; hiring and interviews; proposal evaluation

Syllabus Example from Fall 2016

Typical Class Format
- 60% technical material, with occasional demonstrations using WEKA
- 40% case discussion:
  - 10% business case brainstorming
  - 30% teacher presentation of the solution
  - 10% critical evaluation of impact

High-level Goals of the Class
1. Approach business problems data-analytically: think carefully and systematically about whether and how data can improve performance.
2. Be able to interact competently on the topic of data mining for business intelligence: know the basics of data mining processes, algorithms, and systems well enough to do so.
3. Gain hands-on experience mining data, so that you can follow up on ideas or opportunities that present themselves.

Managerial Objectives
- Appreciate the hardships of good DS
  - Data prep is key and takes as long as it takes
  - Sometimes the data does not support solving the problem
- Recognize DS opportunities in your business
  - Only some problems are DS problems
- Evaluate DS
  - BS detection
  - Train to think backwards from the problem, not forwards from the data
- How to get started with DS in a (small) company
- How to hire a data scientist (what to look for)

Course Philosophy
Problem? -> Data -> Algorithms

What is Good Data Science?
Problem, Data, Algorithms

Case Discussion
Three types of cases/example projects:
1. Published work with technical details and business impact. Read them as best you can (some details may be beyond your current knowledge).
2. Brainstorming problems. Think about the problem and possible solutions, and be prepared to present your ideas in class and defend them.
3. Problems I have worked on that demonstrate a concept from class.

Case Instructions
1. Reframe the issue in your own mind and consider why it might be relevant or important; ask questions (and possibly answer them).
2. What possible actions can you take in your role, and specifically what (micro) decisions or choices do they translate to?
3. How do you measure better decisions?
4. How can the data you have (or should have) help you make those decisions better?
5. Can you evaluate your strategy BEFORE implementing it?
6. What alternative baseline strategies should you compare against?

Case Example: Direct Mailing Campaign
You are working for your favorite non-profit organization. For the past 5 years it has run bi-annual mailing campaigns soliciting monetary donations from a database of existing donors. Historically, it sent a letter to everybody. You are in charge of running this year's mailing campaign, and you wonder whether you can do better.
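One way to make the mailing decision concrete is an expected-value rule: mail a donor only if the estimated probability of a gift times the expected gift size exceeds the cost of the letter. This is a minimal sketch, not the course's prescribed solution. The slide gives no costs or amounts, so `MAIL_COST`, the `Donor` fields, and all donor numbers are hypothetical, and the probability and gift estimates are assumed to come from some predictive model.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical cost per letter; the case gives no actual figure.
MAIL_COST = 0.80


@dataclass
class Donor:
    donor_id: str
    p_respond: float      # assumed model estimate: probability this donor gives
    expected_gift: float  # assumed model estimate: gift size if the donor gives


def expected_profit(d: Donor, mail_cost: float = MAIL_COST) -> float:
    """Expected net value of mailing one donor: p * gift - cost."""
    return d.p_respond * d.expected_gift - mail_cost


def should_mail(d: Donor, mail_cost: float = MAIL_COST) -> bool:
    """Mail only when the expected return exceeds the cost of the letter."""
    return expected_profit(d, mail_cost) > 0


# Hypothetical donors for illustration.
donors = [
    Donor("a", 0.10, 25.0),  # 2.50 - 0.80 > 0  -> mail
    Donor("b", 0.02, 20.0),  # 0.40 - 0.80 < 0  -> skip
    Donor("c", 0.05, 40.0),  # 2.00 - 0.80 > 0  -> mail
]
mail_list = [d.donor_id for d in donors if should_mail(d)]
print(mail_list)  # ['a', 'c']
```

Compared with "send a letter to everybody", this rule skips donors whose expected gift does not cover the cost of reaching them.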

What Makes a Good Project Problem?
- You can take some action.
- You have some hope that a predictive model can add value.
- You are likely to be able to demonstrate business value by simulating a decision.
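In the direct-mailing case the organization historically mailed everybody, so past campaigns record an outcome for every donor. That makes it possible to replay a targeting policy on held-out historical data and compare it against the "mail everybody" baseline before deploying it. A minimal sketch, assuming hypothetical model scores, gift amounts, a per-letter cost, and a 0.5 score cutoff chosen only for illustration:

```python
# Each hypothetical holdout record: (model score, actual gift in a past campaign; 0.0 = no gift).
HOLDOUT = [
    (0.90, 50.0), (0.70, 0.0), (0.60, 20.0), (0.30, 0.0),
    (0.20, 0.0), (0.10, 5.0), (0.05, 0.0), (0.01, 0.0),
]
MAIL_COST = 2.0  # assumed cost per letter


def campaign_profit(records, mail_predicate, mail_cost=MAIL_COST):
    """Replay a past campaign: actual gifts minus mailing costs for the selected donors."""
    selected = [gift for score, gift in records if mail_predicate(score)]
    return sum(selected) - mail_cost * len(selected)


# Baseline: the historical policy of mailing everybody.
baseline = campaign_profit(HOLDOUT, lambda score: True)
# Hypothetical model-based policy: mail only donors scoring at least 0.5.
targeted = campaign_profit(HOLDOUT, lambda score: score >= 0.5)
print(baseline, targeted)  # 59.0 64.0
```

On these made-up numbers the targeted policy forgoes one small gift but saves more in postage. With real data the comparison could come out either way, which is what the simulation is meant to reveal.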

Bad Project Ideas
- Flight delays
- Box office returns
- NYC traffic accidents
- Stock returns

Putting It to the Test: Proposal Evaluation
- What exactly is the business problem to be solved, and what action will be taken?
- What business entity does an instance/example correspond to?
- Is a target variable defined (supervised vs. unsupervised)?
  - If so, is it defined precisely? Think about the values it can take.
- Are the attributes defined precisely? Think about the values they can take.
- Will modeling this target variable solve or improve the stated business problem?
- Will it be reasonably possible to get values for the attributes and put them into a single table?
- Will it be reasonably possible to get values for the target variable (for training) and put them into the table?
  - How exactly does one acquire values for the target variable? Is there any cost involved? If so, is it taken into account?
- Is holdout data used? (Cross-validation is one technique.)
- Is there a plan for domain-knowledge validation?
- Are the evaluation setup and metric appropriate for the business task?
  - E.g., are business costs and benefits taken into account?
  - For classification, how is the threshold chosen?
  - Are probability estimates used directly?
  - Is ranking more appropriate (e.g., for a fixed budget)?
- Will deployment as planned actually (best) address the stated business problem?
- Is the choice of model appropriate for the choice of target variable (classification vs. regression)?
- Does the model/modeling technique meet the other requirements of the task: accuracy, comprehensibility, speed of learning, speed of application, amount of data required, type of data, missing values, fit with knowledge of the problem (e.g., definitely non-linear)? See chart on next slide.
- Should various models be tried and compared (in evaluation)?
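Two of the evaluation questions, how a classification threshold is chosen and whether ranking fits a fixed budget, can be sketched in a few lines. This is an illustration under stated assumptions: `top_k_under_budget` and `cost_based_threshold` are hypothetical helper names, the threshold formula assumes calibrated probabilities and zero cost for correct decisions, and the scores are made up.

```python
from __future__ import annotations

import heapq


def top_k_under_budget(scores: dict[str, float], budget: float, cost_per_action: float) -> list[str]:
    """Fixed budget: rank cases by model score and act on as many as the budget pays for."""
    k = int(budget // cost_per_action)  # number of actions the budget can cover
    return heapq.nlargest(k, scores, key=scores.get)


def cost_based_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability cutoff that minimizes expected cost for a calibrated score p.

    Acting on a negative costs cost_fp; failing to act on a positive costs cost_fn.
    Acting is cheaper in expectation when (1 - p) * cost_fp < p * cost_fn,
    i.e. when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


# Hypothetical model scores.
scores = {"u1": 0.90, "u2": 0.40, "u3": 0.75, "u4": 0.10}
print(top_k_under_budget(scores, budget=10.0, cost_per_action=5.0))  # ['u1', 'u3']
print(cost_based_threshold(cost_fp=1.0, cost_fn=9.0))  # 0.1
```

The two approaches answer different questions: with a fixed budget only the ranking matters, while without one the cutoff follows from the relative costs of the two kinds of error.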

Things Students Struggle with Most
- Recognizing a predictive modeling problem
- How good is good? The value of good baselines
- Translating a model into an action
- Precise language (e.g., "accuracy")
- Data preparation for the project is always late
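The "how good is good?" and "precise language (accuracy)" points can be illustrated with a small, made-up example. When only 5% of cases are positive, a baseline that always predicts the majority class already reaches 95% accuracy, while a model that actually finds some positives can score lower on accuracy. The labels and predictions below are hypothetical.

```python
from __future__ import annotations


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of actual positives that the predictions catch."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_on_positives) / len(preds_on_positives)


# Hypothetical labels: 5 responders among 100 cases (5% base rate).
y_true = [1] * 5 + [0] * 95

# Majority-class baseline: always predict "no response".
baseline = [0] * 100

# Hypothetical model: flags 3 of the 5 responders plus 10 non-responders.
model = [1, 1, 1, 0, 0] + [1] * 10 + [0] * 85

print(accuracy(y_true, baseline), recall(y_true, baseline))  # 0.95 0.0
print(accuracy(y_true, model), recall(y_true, model))        # 0.88 0.6
```

Which of the two is "better" depends on the action taken and its costs and benefits, not on accuracy alone.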

Thank You!