Course Syllabus
Eco 6380.701 Predictive Analytics for Economists
Spring 2017, TTh 6:30-7:50 pm, 110 Dedman Life Sciences Building


This course is a follow-up to Eco 5350 Introductory Econometrics. Statistical methods used in engineering and computer science are introduced to complement the traditional economist's toolbox of statistical methods. An emphasis of this course will be to demonstrate how this extended toolbox can be used to improve economic and business decision-making.

Purposes of Course:

There are several major purposes of this course. As the result of taking this course, the student should have an understanding of:

- The basics of supervised learning: prediction and classification
- Prediction models including multiple linear regression, artificial neural networks, regression trees, K-nearest Neighbors, and Lasso models
- Classification models including logit/probit models, classification trees, Naïve-Bayes models, and Support Vector Machines
- Model validation by means of data partitioning
- Scoring models on data sets with outcomes yet to be realized
- Methods of unsupervised learning: exploratory data analysis (EDA), principal components, cluster analysis, association rules
- Ensemble modeling, where predictions and classifications are made using combinations of models
- How to use standard data mining packages including XLMiner, SPSS Modeler, and R

Evaluation of the Student:

The evaluation of the student will consist of four parts:

- Quick Quizzes (20%)
- Homework Exercises (20%)
- Mid-term Exam (30%)
- Final Exam (30%): Thursday, May 11, 6:00-9:00 PM in Room 110 Dedman Life Sciences Building

The Quick Quizzes (QQs) will consist of occasional short answer and/or multiple-choice quizzes that will be administered in the first five minutes of the class. They are meant to reinforce the concepts presented in the previous lecture. In addition to keeping the students current in the class and providing review material for the mid-term and final exams, the QQs allow me to keep track of student attendance. It has been my experience that the more Quick Quizzes missed by students, the lower their scores on the mid-term and final exams. The bottom line is that it pays to come to class! I will be dropping your lowest QQ grade before calculating the QQ average.

The purpose of homework exercises is to reinforce the concepts discussed in class. They invariably will be based on computer-oriented empirical problems using XLMiner, SPSS Modeler, and R. In completing the homework exercises, students can confer with each other with respect to programming advice and discussion of basic ideas, but in the final analysis each student is expected to write up his/her own homework answers and not make copies of others' homework. Copying someone else's homework to hand in as one's own work is a violation of the SMU Honor Code and will be dealt with according to the rules of the SMU Honor Code. You should know that the homework assignments are very important in that the basic ideas covered by them invariably show up on the mid-term and final exams.

If you know you are going to be missing a class on the day a homework exercise is due, hand in your homework in advance to receive full credit for your work. Any homework that is handed in late will be given a one letter grade reduction for each day of tardiness. I will be dropping the lowest homework exercise score before calculating the exercise average. Additionally, I want all homework handed in in hardcopy form, with no pdf files sent to my e-mail address or the address of my teaching assistant. If you must send in your finished homework by e-mail, a point deduction (out of 10 points) will be applied to the student's exercise. Also, I am expecting the homework to be typed rather than handwritten. Handwritten homework will be given a one grade point deduction for not being typed.

The mid-term exam will cover the topics in the first half of the course. The final exam will cover only the topics covered following the mid-term exam.

Note: After 4 unexcused class absences, I reserve the right to administratively drop students from the class.

My grading scale in this course is approximately as follows:

92-100  A
90-91   A-
88-89   B+
82-87   B
80-81   B-
78-79   C+
72-77   C
70-71   C-
68-69   D+
62-67   D
60-61   D-
0-59    F
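As an illustration only, the grading arithmetic described above (the 20/20/30/30 weights, dropping the lowest Quick Quiz and homework scores, and the approximate letter scale) can be sketched in Python. The sketch assumes every component is scored out of 100, treats the lower end of each band as a cutoff for scores that fall between bands, and ignores late and e-mail penalties. The function names are hypothetical, and this is not the instructor's official calculation.

```python
from typing import Sequence

# Component weights in percent, as stated in the syllabus.
WEIGHTS = {"quizzes": 20, "homework": 20, "midterm": 30, "final": 30}

# Lower bound of each band in the approximate grading scale.
# Assumption: a score between two bands (e.g. 91.5) gets the lower band.
SCALE = [
    (92, "A"), (90, "A-"), (88, "B+"), (82, "B"), (80, "B-"), (78, "C+"),
    (72, "C"), (70, "C-"), (68, "D+"), (62, "D"), (60, "D-"), (0, "F"),
]


def average_dropping_lowest(scores: Sequence[float]) -> float:
    """Average of the scores after removing one copy of the lowest score."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop the lowest")
    return (sum(scores) - min(scores)) / (len(scores) - 1)


def course_score(quizzes: Sequence[float], homework: Sequence[float],
                 midterm: float, final: float) -> float:
    """Weighted course score on a 0-100 scale (all inputs assumed 0-100)."""
    total = (
        WEIGHTS["quizzes"] * average_dropping_lowest(quizzes)
        + WEIGHTS["homework"] * average_dropping_lowest(homework)
        + WEIGHTS["midterm"] * midterm
        + WEIGHTS["final"] * final
    )
    return total / 100


def letter_grade(score: float) -> str:
    """Map a 0-100 course score to a letter using the approximate scale."""
    for lower_bound, letter in SCALE:
        if score >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    score = course_score([60, 80, 100], [70, 90, 90], midterm=80, final=90)
    print(score, letter_grade(score))
```

In the usage example, the lowest quiz (60) and lowest homework (70) are dropped, so both averages are 90 and the weighted score is 0.2(90) + 0.2(90) + 0.3(80) + 0.3(90) = 87, which falls in the B band.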

Additional Details

Classroom Website: http://faculty.smu.edu/tfomby/
Office: Room 301M, Umphrey Lee, 214-768-2559
E-mail address: tfomby@smu.edu
Office Hours: 3:00-4:30 PM TTh or by appointment

My Graduate Teaching Assistant: Igor Zhadan. His e-mail address is izhadan@smu.edu. If you should need extra tutorials or help outside of my office hours, contact Mr. Zhadan and he will be happy to go over concepts that you may not fully understand.

Textbook and Computer Software:

The required textbook for this course is Data Mining for Business Intelligence by Galit Shmueli, Nitin R. Patel, and Peter C. Bruce (Wiley, 3rd ed., 2016), hereafter referred to as SPB. This book, when purchased new rather than used, includes complimentary access to an EXCEL add-in called XLMiner. I will be giving you more instructions in class on how to download the add-in to your computer. Later in class I will provide pdf files for operating XLMiner (XLMinerUserGuide_2016.pdf and XLMinerReferenceGuide_2016.pdf).

In addition you can download, free, another book in pdf form that I will be referring to in class: An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (Springer, 2015). The free download of the book can be obtained at www.statlearning.com.

We will also be using the SPSS Modeler software package. Later in class I will provide pdf files of the User's Guide for this software (SPSS_Modeler 17 UsersGuide.pdf) and the Algorithms Guide (Modeler17 Algorithms Guide.pdf). Access to this software package can be obtained through Apps.smu. To use SPSS Modeler on the Apps.smu system you will first need to download the Citrix Receiver. Go to the website http://www.smu.edu/businessfinance/oit/services/appssmu and, as a first-time user, you will be prompted to download Citrix Receiver to your PC or laptop. Citrix Receiver provides you with virtual access to the SPSS Modeler software: it makes it appear that you have SPSS Modeler installed on your own computer when, in fact, it is being accessed from an SMU server on campus. After you install the Citrix Receiver on your computer, you can log on to it by entering your student ID and personal password. Thereafter you can work on your homework assignments, etc. using SPSS Modeler.

General comments on work and class etiquette:

In order to succeed in this class, constant work is essential. Come to class. Read all assigned readings, complete all exercises on time, and prepare for the Quick Quizzes. Don't get behind.

If there is something in class discussion or homework assignments that you don't understand, don't hesitate to ask me in class, after class, during office hours, or through e-mail. Obviously, general rules of etiquette apply: cell phones are to be turned off during class and miscellaneous reading material stowed away.

Important Dates to Remember:

First Day of Class: Tuesday, January 24
Spring Break: Monday-Sunday, March 13-19 (No Classes)
Last Day to Drop a Course: Tuesday, April 11
Last Day of Semester in this Class: Thursday, May 4

Exam Dates:

Midterm Exam: Thursday, March 23. This is the second class following spring break.
Final Exam: Thursday, May 11, 6:00-9:00 PM in 110 Dedman Life Sciences Building.

Some Standard Stuff You Should Know

Excused Absences for University Extracurricular Activities: Students participating in an officially sanctioned, scheduled University extracurricular activity should be given the opportunity to make up class assignments or other graded assignments missed as a result of their participation. It is the responsibility of the student to make arrangements with the instructor prior to any missed scheduled examination or other missed assignment for making up the work. (University Undergraduate Catalogue)

Disability Accommodations: Students needing academic accommodations for a disability must first register with Disability Accommodations & Success Strategies (DASS). Students can call 214-768-1470 or visit http://www.smu.edu/provost/alec/dass to begin the process. Once registered, students should then schedule an appointment with the professor as early in the semester as possible, present a DASS Accommodation Letter, and make appropriate arrangements. Please note that accommodations are not retroactive and require advance notice to implement.

Religious Observance: Religiously observant students wishing to be absent on holidays that require missing class should notify their professors in writing at the beginning of the semester, and should discuss with them, in advance, acceptable ways of making up any work missed because of the absence. (See University Policy No. 1.9.)

Honor Code: All SMU students are bound by the Honor Code (see SMU Student Handbook for a complete discussion of the SMU Honor Code). The code states that any giving or receiving of aid on academic work submitted for evaluation, without the express consent of the instructor, or the toleration of such action shall constitute a breach of the Honor Code. A violation can result in an F for the course and an Honor Code Violation on your transcript.

Topics

I. Introduction
   A. What is Data Mining?
   B. Terminology of Data Mining
   C. Types of Variables: Interval, Nominal (Unordered Categorical), and Ordinal (Ordered Categorical)
   D. The Distinct Purposes of Hypothesis Testing versus Prediction (Read Breiman article)
   E. Data Mining from a Process Perspective (Fig. 1.2 in SPB)
   F. Data Mining Methods Classified by Nature of the Data (Table 1.1 in SPB)
   References: SPB, Chapter 1 and Breiman, Leo (2001), "Statistical Modeling: The Two Cultures," Statistical Science, 16, 199-231. The Breiman article will be posted to the student by class e-mail. Power Point 1: Two Cultures.

II. Overview of the Data Mining Process
   A. Core Ideas in Data Mining
      i. Classification
      ii. Prediction
      iii. Association Rules
      iv. Data Reduction
      v. Data Exploration
      vi. Data Visualization
   B. Supervised and Unsupervised Learning
   C. The Steps in Data Mining
   D. SEMMA (SAS) and CRISP (IBM)
   E. Preliminary Steps
      i. Sampling from a Database
      ii. Pre-processing and Cleaning the Data
      iii. Partitioning the Data: Training, Validation, and Test data sets
      iv. Model Evaluation and Comparison of Models
   F. Building a Model: An Example with Linear Regression
   References: SPB, Chapter 2. SAS_SEMMA.pdf and CRISP_DM.pdf, and Power Point 2: Data Mining Software. These files will be posted to the students by class e-mail and will be on CANVAS.

III. Data Exploration and Data Refinement
   A. Data Summaries
   B. Data Visualization
   C. Treatment of Missing Observations

   D. Detection of Outliers: the Box Plot
   E. Correlation Analysis
   References: SPB, Chapter 3 and Power Point 3: Missing Obs and EDA.

IV. Variable Importance and Dimension Reduction
   A. Binning: Reducing the Number of Categories in Categorical Variables
   B. Principal Component Analysis of Continuous Variables
   C. Dimension Reduction using Best Subset Regression and LASSO Modelling Techniques
   D. Dimension Reduction using Bivariate Association Probabilities (as in the Feature Selection node in SPSS Modeler), and Regression and Classification Trees
   References: SPB, Chapter 4, Modeler 17 Algorithms Guide.pdf on Feature Selection Algorithm starting on page 153, XLMinerReferenceGuide.pdf on Feature Selection Option on pages 77-101, and Power Point 19: Principal Component Analysis.

V. Evaluation Methods for Prediction and Classification Problems
   A. Prediction Measures: MAE, MSE, RMSE, MAPE, MSPE, and RMSPE
   B. Application to Validation and Test Data Sets
   C. Avoiding Overtraining
   References: SPB, Chapter 5, pp. 106-111 and Power Point 4: Avoiding Overtraining.

VI. Prediction Methods
   A. Linear Regression: Best Subset Selection
      i. Forward Selection
      ii. Backward Selection
      iii. Step-wise Regression (Efroymson's method)
      iv. All Subsets Regression (Mallows' Cp and Adjusted R-square criteria)
      v. Information Criteria (AIC, SBC, etc.)
   B. Penalized Regression Methods (Ridge, LASSO, Adaptive LASSO, and Elastic Net)
   C. k-Nearest Neighbors (k-NN)
   D. Regression Trees
      i. CART
      ii. CHAID
   E. Neural Nets
      i. Architecture of Neural Nets
         a. Neurons
         b. Input Layer
         c. Hidden Layers
         d. Output Layer
      ii. Fitting Neural Nets: Back Propagation
   F. Comparison of the Various Methods
   References: SPB, Chapters 6, 7, 9, 11 and Power Points 5, 6, 7, 8, and 12.

Mid-Term Exam: Approximately Thursday, March 23

VII. Evaluation Methods for Classification Problems
   A. Classification Measures: Classification (Confusion) Matrix, Accuracy Measures, Profit Curves, ROC Curves, and Lift Charts
   B. The Role of Over-sampling in Classification Problems
   References: SPB, Chapter 5, pp. 112-137 and Power Points 9, 10, and 11 on Evaluation of Classifiers.

VIII. Classification Methods
   A. The Naïve Rule
   B. Naïve-Bayes Classifier
   C. K-Nearest Neighbors
   D. Classification Trees
   E. Neural Nets
   F. Logistic Regression
   G. Support Vector Machines (SVM)
   References: SPB, Chapters 6, 7, 8, 9, 10, and 11.

IX. Ensemble Methods
   A. Nelson and Granger-Ramanathan Methods for Continuous Targets
   B. Majority Voting for Categorical Targets
   C. Bagging
   D. Boosting
   Reference: SPB, Chapter 13.

X. Association Rules
   A. Support and Confidence
   B. The Apriori Algorithm
   C. The Selection of Strong Rules
   Reference: SPB, Chapter 14.

Non-supervised Learning Techniques

XI. Cluster Analysis
   A. Hierarchical Clustering and Dendrograms

   B. Non-hierarchical Clustering: the K-means Algorithm
   Reference: SPB, Chapter 15.

XII. Text Mining
   A. Preprocessing the Data
   B. Singular Value Decomposition (SVD)
   C. Prediction with SVD variables
   Reference: SPB, Chapter 20.

Final Exam: Thursday, May 11, 6:00-9:00 PM, 110 Dedman Life Sciences Building