Syllabus Data Mining for Business Analytics - Managerial INFO-GB.3336, Spring 2018

Course information

When: Mondays and Wednesdays 3-4:20pm
Where: KMEC 3-65

Professor Manuel Arriaga
Email: marriaga@stern.nyu.edu
Web: http://pages.stern.nyu.edu/~marriaga
Office: KMEC 8-59
Office Hours: By appointment

Teaching assistant Liam Greenamyre
Email: ltg245@stern.nyu.edu
Office hours: TBA

Course Overview

The goal of this course is to give you a solid understanding of the opportunities, techniques and critical challenges in using data mining and predictive modeling in a business setting. This course will provide you with hands-on experience using a variety of real-world datasets. We will pay special attention to how we can best understand and translate business challenges into data mining problems. So that you can develop that ability, in our lectures we will cover the major issues involved in knowledge discovery and decision making as well as core technical concepts and machine learning methods. Our discussion of these more technical aspects will be carried out without getting into their mathematical underpinnings. If you are interested in a deeper, more technical perspective and have some programming experience, consider taking Data Science for Business Analytics Technical [INFO-GB.2336] instead.

This course doesn't promise to turn you into a data scientist (although this may happen anyway!). It is meant to make you literate in data science, which means you will be comfortable doing some hands-on work (albeit not at scale), interacting with and managing data scientists as well as evaluating data science proposals from a business standpoint.

Prerequisites

The course does not have any prerequisites.

Learning Goals

There are two primary and two secondary learning goals associated with this course:

(i) Critical and Integrative Thinking: specifically, how to formulate business problems in terms that make them amenable to being solved through a systematic modeling approach. Formulation is key, as is the construction and evaluation of the model. This skill is also essential as a manager tasked with evaluating the proposals, progress, and work outputs of data science teams.
(ii) Modeling: you should be competent in applying basic statistical and machine learning methods to data. Your modeling expertise should be sufficient for you to manage data science teams.
(iii) Effective Oral Communication: each student shall be able to communicate verbally in an organized, clear, and persuasive manner, and be a responsive listener. You will have the chance to demonstrate communication skills via a presentation of your term project.
(iv) Interpersonal Awareness and Working in Teams: students will submit a project which may entail working in a small group (2-4 people) and must apportion tasks appropriately and submit a quality product in a timely manner.

Self-learning is a particularly important part of this course. You will get the best value from this course if you actively experiment with and explore ideas instead of just coming to class and expecting to be told what works and what doesn't. There's nothing like learning by doing. Accordingly, 35% of the grade is assigned to your project. So, start early. Exploratory work always takes longer than you think. Indeed, your very first assignment is to write a 1-2 page summary of what you might do as your project. Even if you end up changing topics, the exercise will help you start thinking about it seriously before you get into the nitty-gritty of the quantitative exercises.

Reading materials

The textbook for this course is: Data Science for Business: What you need to know about data mining and data analytic thinking by Provost & Fawcett (O'Reilly, 2013). In the readings section of this syllabus, any reference to chapters without any additional information refers to chapters from our textbook.

We will also read some chapters of an old data mining book: Seven Methods for Transforming Corporate Data Into Business Intelligence, Vasant Dhar and Roger Stein, Prentice-Hall (1997). These chapters will be shared through NYU Classes. In the readings section of this syllabus, readings from this book can easily be identified by the prefix DS.

Finally, additional reading materials will also be made available through NYU Classes.

Software

The key concepts and methods discussed in this course are not specific to any piece of software. However, for the assignments and hands-on practice we will use Weka, an open-source, multiplatform data mining toolkit: https://www.cs.waikato.ac.nz/ml/weka/

Weka is a well-established, highly popular data mining application, so abundant documentation, how-to videos and Q&A threads are easy to find online. The official go-to source is known as the Weka book: Data Mining: Practical Machine Learning Tools and Techniques by Ian Witten, Eibe Frank, Mark Hall (ISBN-10: 0123748569).

All individual assignments must be done in Weka. For your final project, you are welcome to either use Weka or explore other tools. The latter route will probably appeal to the more technically minded among you, in particular when considering tools such as R or Python's scikit-learn library.

Requirements and grading

Given the nature of the material we will be covering, you are expected to attend all sessions and to arrive on time. There is a strong cumulative aspect to the structure of this course, as is often the case when discussing more technical material.

There will be five assignments, each of which builds on a previous one. These will be front-loaded so that you complete most of them in the first half of the semester, which should leave you time to spend on your term project. Assignments will be due by the beginning of our Wednesday class (3pm). You must turn in all assignments on the dates they are due.

The project is the most important component of the course and gives you a chance to do your own thing. Start early. You can do the project in groups of 2 to 4 people. Completing the project entails two deliverables (a project proposal and a final report) as well as delivering an in-class presentation at the end of our course.

There is no final exam. The grade breakdown is as follows.
Assignments: 55 points
Term project: 35 points
Class participation and attendance: 10 points
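As a quick illustration of how the three components add up to 100 points, here is a minimal sketch. The syllabus gives only the point weights; the helper name `course_points` and the idea of scoring each component as a fraction between 0 and 1 are assumptions made for this example, not the course's actual grading procedure.

```python
# Point weights taken from the grade breakdown above.
WEIGHTS = {"assignments": 55, "term_project": 35, "participation": 10}


def course_points(fractions):
    """Return total points out of 100.

    `fractions` maps each component to the share of its points earned,
    expressed as a number in [0, 1] (an assumption for illustration).
    """
    missing = set(WEIGHTS) - set(fractions)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


# Hypothetical student: 90% on assignments, 80% on the project,
# full marks for participation and attendance.
example = {"assignments": 0.9, "term_project": 0.8, "participation": 1.0}
print(round(course_points(example), 2))
```

The sketch makes the relative importance visible: a ten-percentage-point change on the project moves the total by 3.5 points, against 5.5 points for the same change across all assignments.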

Term project

The term project should be a substantial piece of work that (i) involves the use and application of techniques learned in this course and, just as importantly, (ii) is of interest to you. Most projects fall into one of the following categories (these are just examples, not an exhaustive list of what is accepted):

a) An original idea that you want to build on and test. Examples: Is it possible to extract useful sentiment information from news? If so, how? Build and evaluate a machine learning-based trading strategy based on high-frequency data.
b) Replication/extension of an existing study or result. Example: Past research shows that boosting and bagging result in variance reduction: we compare these methods on 20 standard datasets from the UCI database and demonstrate under what conditions they work best.
c) Extension of an assignment. Example: In Assignment 5 we considered an imbalanced class problem. We consider 20 imbalanced class problems and evaluate the impacts of oversampling the minority class.
d) Applying a data-driven approach to a core business problem within your organization (must at a minimum include preliminary results and a detailed proposal for further analysis).

You will present your project in the last two sessions of the semester, so make sure you start on it early and give a polished presentation!
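Project category (c) refers to resampling for imbalanced class problems. The following is a minimal sketch of random oversampling, assuming the common variant in which examples of the rarer classes are duplicated until every class is as frequent as the largest one. The function name `oversample_minority` and the toy data are hypothetical; assignments in this course use Weka, so this is only meant for those exploring Python for the project.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of the rarer classes until every class
    has as many examples as the most frequent class."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    counts = Counter(labels)
    target = max(counts.values())      # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):    # no-op for the largest class
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: five "no" examples and a single "yes" example.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
labels = ["no", "no", "no", "no", "no", "yes"]
new_rows, new_labels = oversample_minority(rows, labels)
print(Counter(new_labels))
```

Resampling should be applied to the training data only; duplicating examples before the train/test split lets copies of the same example land on both sides and inflates the measured performance.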

Timeline (subject to small revisions)

Please note: assignments are always due by the beginning of our second class of each week (i.e., Wednesday 3pm).

Week 1 (starts Jan 29)
Topics: What is the course about? What is predictive analytics? The data mining process
Readings: Chap 1 & 2
Assignments: Assignment 1 handed out

Week 2 (starts Feb 5), Week 3 (starts Feb 12), Week 4 (only Feb 21)
Topics:
Predictive modeling in action
Introduction to Trees
Software installation & demo
More trees; logistic regression and support vector machines
Model performance analysis 1: evaluation and validation
Overfitting and its avoidance
Model performance analysis 2: ROC, lift, MSE, etc.
Readings: Chap 3 & 4; Chap 5; Chap 7 & 8
Assignments: Assignment 1 due, Assignment 2 handed out; Assignment 2 due, Assignment 3 handed out; Assignment 3 due, Assignment 4 handed out

Week 5 (starts Feb 26), Week 6 (starts Mar 5), Week 7 (starts Mar 19), Week 8 (starts Mar 26), Week 9 (starts Apr 2), Week 10 (starts Apr 9), Week 11 (starts Apr 16), Week 12 (starts Apr 23), Week 13 (starts Apr 30)
Topics:
Text as data
Bayesian modeling and the Naïve Bayes approach
Connectionism: Neural networks and deep learning
SPRING BREAK
Similarity, clusters and neighbors
Crowds of predictive models
Boosting and Random Forests
Evolutionary approaches and genetic algorithms
Prediction and Noise revisited
How to evaluate data science proposals
Topic TBD
Guest industry speakers
Term project presentations
Readings: Chap 9 & 10 - DS Chapter 6; Chapter 6; Reading on website; DS Chapter 5; Chap 11 & 13
Assignments: Assignment 4 due; Project proposal due; Assignment 5 handed out; Assignment 5 due; Final project report due by May 7
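The timeline lists ROC, lift and MSE among the model performance measures. For readers who want to see what these numbers are, here is a minimal sketch in plain Python. The function names and the toy labels and scores are made up for illustration; the ROC curve is summarized by its area (AUC), computed as the share of positive/negative pairs that the model ranks correctly, which is one standard way to obtain it.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive gets the higher score (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at(y_true, scores, fraction):
    """Lift: positive rate among the top-scored `fraction` of cases,
    divided by the positive rate in the whole population."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(t for _, t in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Toy example: 1 = responder, 0 = non-responder; scores are model outputs.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]

print("AUC:", round(roc_auc(y_true, scores), 3))
print("Lift at top 50%:", round(lift_at(y_true, scores, 0.5), 3))
print("MSE:", round(mse(y_true, scores), 3))
```

A lift above 1 at a given cutoff means that targeting the top-scored cases finds responders at a higher rate than targeting at random, which is why lift is popular in marketing-style business problems.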