BUS 656 Introduction to Business Data Analytics
Spring 2016

Professor: Dr. Vilma Todri, Assistant Professor in the Department of Information Systems and Operations Management
Office: GBS 420
Homepage: www.vilmatodri.com
Phone: (404) 727 6629
Email: vilma.todri@emory.edu
Office Hours: Tuesdays 5-6pm, or by appointment
Note: If you send class-related emails, please include [BUS 656] in the subject line of your message.

Course Description: Virtually every aspect of business is now instrumented for data collection, and data are increasingly analyzed systematically to improve business decision-making and offer competitive advantage. In this class, we will study the fundamental principles and techniques of data mining in order to extract useful information and knowledge from data. We will improve our ability to approach problems "data-analytically", we will examine real-world examples that place data mining in context, and we will apply data mining techniques while working hands-on with data mining software. After taking this course, you should be able to view business problems from a data analytics perspective and think systematically about how techniques for extracting useful knowledge from data can improve business performance; you should also have hands-on experience with data mining techniques. Prior experience with a programming language or with data mining is useful but not necessary.

Course Website: All course-related materials (required readings, lecture slides, assignments, grades, etc.) will be posted on the course website, which is available to all registered students through Canvas (https://canvas.emory.edu/).
Textbooks and Readings:
Recommended book: Provost, Foster, and Tom Fawcett. Data Science for Business: What you need to know about data mining and data-analytic thinking. O'Reilly Media, Inc., 2013. ISBN-10: 1-4493-6132-3, ISBN-13: 978-1-4493-6132-7.
Suggested book (optional): Han, Jiawei, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques (3rd edition). Morgan Kaufmann, 2011. ISBN-10: 0123814790, ISBN-13: 978-0123814791. This book provides technical details of the predictive modeling techniques and is a very nice supplement for students who want to dig more deeply into the technical details.

Book chapters (and possibly additional online readings) may be assigned each week. Information about the weekly readings will be posted on the course website. Students are expected to ask questions about any material in the notes that remains unclear after our class discussion and the assigned reading. Students who want to go further in a topic can ask either the TA or the instructor about supplemental material.

Required Software Tools: Throughout this course, students will primarily use RapidMiner Studio (a comprehensive, widely used analytics software suite) for both in-class exercises and homework assignments. Students should download the software at https://rapidminer.com/academia/educationalprogram/ (the website may ask you to log in or sign up before downloading/activating the software). RapidMiner should automatically generate a free license for the basic version of the software during the download/registration process. Students should install the software on their laptops and bring them to class. We will do many interesting data analyses throughout the course, and being able to use RapidMiner in class on your own will enhance your learning experience.
The students should also use RapidMiner (or a similar tool of their preference such as Weka or Python and its data science/analytics/visualization libraries) to complete hands-on assignments. For RapidMiner we will provide installation instructions to make sure that you have full access to the software.
During the course, students will also be able to apply the methodologies and techniques discussed in class using other software tools (such as Weka, Python, R, and several others).

Assignments and Grading: Students will be evaluated based on their performance on:

Homework Assignments: There will be approximately 4-5 individual homework assignments in total, each taking 1-3 weeks to complete. Unless explicitly indicated otherwise, each homework assignment is due on Tuesday before class (the specific week when each assignment is due will be indicated in its description). The hands-on tasks in the homework assignments will be based on data that we will provide. Students will mine these data to gain hands-on experience in formulating problems and applying the various techniques discussed in class, and will use them to build and evaluate predictive models.

Midterm Exams and Final Exam: These will be closed-book, closed-notes, paper-and-pencil in-class exams.

Group Project: An approximately 5-week-long project for groups of approximately 4 students; the topic and the groups will be determined in collaboration with the instructor. Students will submit various milestone deliverables throughout the course. The project requirements will be discussed in class.

Participation and Class Contribution: Students should come prepared for class discussions, having satisfied themselves that they understand what we have done in prior classes. Students are also expected to attend every class session, to arrive before the starting time, to remain for the entire class, and to follow basic classroom etiquette, including refraining from doing irrelevant work or reading during class.

Grading Distribution:
- Homework: 20%
- Midterm Exams: 25%
- Final Exam: 30%
- Group Project: 15%
- Participation and Class Contribution: 10%

Late homework submissions: The due date for each homework is indicated in the class schedule and in the homework description. Unless otherwise indicated, the homework should be submitted 30 minutes before the start of class on the due date. Late submissions are subject to the following policy:
- First late day: 10% of the assigned grade is deducted;
- Second late day: 25% (i.e., 10% + 15%) of the assigned grade is deducted;
- Third through seventh late days: 50% (i.e., 10% + 15% + 25%) of the assigned grade is deducted;
- More than 7 days late: the homework will not be accepted (resulting in a score of 0 for the assignment).
Please plan accordingly and take advantage of the policy if necessary. [For example, if you are only half-way done with the homework on the due date but feel that you could finish the assignment within the next day, it may be advantageous to take the extra time and submit the assignment one day late.] Of course, if serious extenuating circumstances (e.g., illness) prevent you from submitting homework on time, you should notify the instructor as soon as possible.

If you feel that a calculation, factual, or judgment error has been made in the grading of an assignment or exam, please write me a formal memo describing the error within one week after the class date on which that assignment was returned. Include documentation (e.g., pages in the book, a copy of class notes, etc.). I will make a decision and get back to you as soon as I can. Please remember that grading any assignment requires the grader to make many judgments about how well you have answered the question. Inevitably, some of these go in your favor and some may go against you. In fairness to all students, the entire assignment or exam will be regraded.
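As a rough illustration of the grading arithmetic described above, the Python sketch below applies the late-homework deductions and the grading distribution. It is not an official grade calculator: the function and variable names are hypothetical, and it assumes that lateness is counted in whole days, that each deduction is a percentage of the assigned grade (so a score s becomes s × (1 - deduction)), and that the course score is a weighted sum of component scores on a 0-100 scale.

```python
# Hypothetical sketch of the BUS 656 grading rules; NOT an official calculator.
# Assumptions: days_late is a whole number of days, and all component scores
# are on a 0-100 scale.

# Cumulative deduction (fraction of the assigned grade) for 0, 1, and 2 days late.
LATE_DEDUCTION = {0: 0.00, 1: 0.10, 2: 0.25}


def late_adjusted_score(assigned_score: float, days_late: int) -> float:
    """Apply the late-homework deduction to an assigned homework score."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 7:
        return 0.0  # more than 7 days late: homework not accepted
    # Days 3 through 7 share the same 50% deduction.
    deduction = LATE_DEDUCTION.get(days_late, 0.50)
    return assigned_score * (1 - deduction)


# Grading distribution, expressed as fractions that sum to 1.
WEIGHTS = {
    "homework": 0.20,
    "midterm_exams": 0.25,
    "final_exam": 0.30,
    "group_project": 0.15,
    "participation": 0.10,
}


def weighted_course_score(scores: dict) -> float:
    """Combine component scores (0-100) using the grading distribution weights."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For instance, under these assumptions a homework assigned 80 points that is submitted two days late keeps 80 × 0.75 = 60 points, while the same homework submitted eight days late receives 0.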
Accommodations: Students with accommodations through the Office of Disability Services must provide me with their documentation as soon as possible so that I can plan for their accommodations. It is the student's responsibility to follow up with me about schedules, etc., for accommodations.
Academic Honesty: The course is governed by the Goizueta Honor Code, with which all students enrolled at Goizueta must comply. If you have any questions about your responsibilities under the honor code, you should see me. I take this extremely seriously and will pursue any violation of the honor code through the university's procedures. Students found to be in violation of the honor code will receive a course grade of XF (failure due to academic dishonesty).
Course Schedule:

Week 1 (Jan 10, Jan 12)
Course introduction and overview
Basic terminology and data objects
Predictive modeling framework
- Supervised vs. unsupervised methods
- Classification vs. numeric prediction

Week 2 (Jan 17, Jan 19)
Introduction to RapidMiner
- Repeatable analytics tasks and workflows

Week 3 (Jan 24, Jan 26)
Fundamentals of classification
- Building and evaluating classification models
- Technique: Decision Trees
- Technique: k-Nearest Neighbors (k-NN)

Week 4 (Jan 31, Feb 2)
Classification, class probability estimation, and ranking
- Technique: Logistic Regression
- Generalization and the issue of overfitting
- Regularization

Week 5 (Feb 7, Feb 9)
In-depth view of classifier performance and evaluation
- N-fold cross-validation approach
- Advanced evaluation metrics
- Visualization of predictive performance

Week 6 (Feb 14, Feb 16)
Evaluating predictive modeling projects and proposals
- Common flaws in predictive modeling projects
Feb 16: MIDTERM QUIZ/EXAM

Week 7 (Feb 21, Feb 23)
Mid-semester module

Week 8 (Feb 28, Mar 2)
Mid-semester module

Week 9 (Mar 7, Mar 9)
Spring Break

Week 10 (Mar 14, Mar 16)
Fundamentals of numeric prediction
- Technique: Linear Regression
- Technique: k-NN and combining functions
- Technique: Regression Trees

Week 11 (Mar 21, Mar 23)
Predictive analytics with textual data
- Document representation, feature construction
- Technique: Naïve Bayes classifier
- Practical applications of predictive textual analytics

Week 12 (Mar 28, Mar 30)
Unsupervised predictive analytics
- Technique: Clustering
Mar 30: MIDTERM 2 QUIZ/EXAM

Week 13 (Apr 4, Apr 6)
Guest Lecture
Meta-modeling and cost-aware modeling techniques
- General-purpose ensemble approaches

Week 14 (Apr 11, Apr 13)
Fundamentals of recommender systems
- Collaborative (filtering) vs. content-based approaches
- Technique: k-NN collaborative filtering
- Technique: latent modeling using matrix factorization

Week 15 (Apr 18, Apr 20)
Project Presentations
Hands-On Lab Case Studies

Week 16 (Apr 25)
FINAL QUIZ/EXAM

Note: The schedule is tentative, and the list of topics may be adjusted over the course of the term. Although we will cover all of the topics, the pace of learning and discovery will dictate the actual schedule. Any changes will be posted on the course Canvas website.