Data Science for Business

Instructors' contact information
  Names: Pekka Malo, Johanna Bragge
  Teaching Assistants: Anton Frantsev, Bikesh Upreti
  E-mail: firstname.surname@aalto.fi
  Office: CG-4.18
  Instructors' webpages:
    https://people.aalto.fi/pekka_malo
    https://people.aalto.fi/johanna_bragge

Course information
  Status of the course: Advanced Studies in the Master's degree program in Information and Service Management (DR2013); Business application course in the Aalto-level minor on Analytics and Data Science
  Academic Year, Period: 2015-2016, Period IV
  Location: Töölö, C-332
  Language of Instruction: English
  Course Website: https://mycourses.aalto.fi/course/view.php?id=3571

Pekka Malo, Assist. Prof. (statistics)
Aalto BIZ / Department of Information and Service Economy
Lecture 1, Mon 22.2.2016

What is Data Science for Business?
[Word cloud: data analytic thinking, storytelling with data, business strategy, CRISP-DM, predictive modeling, supervised, unsupervised, classification, decision trees, SVM, association rules, pattern mining, text mining, model evaluation and validation, accuracy, cross-validation, expected value framework, profit chart, dash of programming, R, Apache Spark, IBM Bluemix]

Course overview and prerequisites
Data Science for Business: What You Need to Know About Data Mining and Data Analytic Thinking

Module I: Fundamentals of predictive analytics
  Basics of predictive modeling and introduction to commonly used data mining algorithms (e.g., classification, shopping basket analysis)
  Evaluation of models, expected value framework, and avoidance of overfitting

Module II: Data Science tools for Business Analysts
  Learning data analytics with the R programming language
  Basics of Apache Spark
  Learning to deploy models on the cloud with IBM Bluemix

Prerequisites:
  Fundamentals of statistics (e.g., inference, regression analysis, and logistic regression)
  Basic skills in programming / scripting (or at least willingness to learn)

Toolkit

Learning objectives and outcomes
After completing the course, the students will be able to
  identify the role of data as a business asset
  understand the principles of predictive modeling
  recognize how different data science methods can support business decision-making
  learn basic data analytic techniques for solving business problems
  understand the promises and limitations of big data
  gain some experience in using data analytic tools (both commercial and open source) that are widely used in companies.

Upon completion of the course, the students will also receive certificates from IBM/Big Data University stating their completion of "Predictive Modeling Fundamentals I" and "Introduction to R DataCamp Course".

Introduction to Predictive Analytics

Completing the course

Contact sessions
  Lectures and tutorials (1-2 x 3h / week)        18h
  Exercise demos and workshops (2 x 3h / week)    36h
Class preparation                                 12h
Assignments                                       48h
Team case (course project)                        46h
Total                                             160h (6 credits)

NOTE: Contact sessions are compulsory, 3 x 3 hours per week (at most two 3-hour sessions can be missed).

Course timeline

22.2.2016   Week 1. Introduction to Predictive Analytics
29.2.2016   Week 2. Data Driven Decision-Making
7.3.2016    Week 3. Pattern Mining and Shopping Basket Analysis
14.3.2016   Week 4. R for Data Science
21.3.2016   Week 5. Advanced Analytics with R
29.3.2016   Week 6. Dash of Big Data with Spark and Bluemix

Modules: Fundamentals of predictive analytics; Data Science tools for business analysts

Team case
  Formation of teams (based on pre-survey)
  Submission of project proposal (1-page summary)
  Team case presentations in Week 6

Legend: modeling assignments are to be reported within 1 week of publication; BDU course assignments have due dates announced separately.

Lectures (L) and tutorials (T)

#     Date     Topic                                        Assignment
L-1   22.2.    Introduction to Predictive Analytics         BDU assignment
T-1   23.2.    Decision Tree Models
L-2   29.2.    Data Driven Decision-Making (Reaktor)
T-2   01.3.    Measuring value from predictive analytics    Modeling case 1
L-3   07.3.    Pattern mining and shopping baskets
T-3   08.3.    Supermarket transactions with Apriori        Modeling case 2
L-4   14.3.    R for data science                           BDU assignment
T-4   15.3.    R for data science (cont'd)
L-5   21.3.    High-dimensional Regression Techniques
T-5   22.3.    Learning with LASSO in R
L-6   29.3.    Dash of Big Data (IBM)
T-6   30.3.    Cloud computing with IBM Bluemix             Team case presentations

In addition to the weekly lectures and tutorials, there will be 3-hour exercise sessions every Wednesday during 24.2. - 23.3.2016.

Grading
The course assessment consists of the following three parts:
  Exam in computer lab                             30%
  Team case (course project)                       50%
  Class activity (tutorials, lectures, exercises)  20%

All assignments must be completed to pass the course. Evaluation criteria are specified separately in each assignment. When evaluating work done in teams, the starting level of the student teams will be taken into account in grading. Special attention is paid to the teams' development in knowledge sharing and learning.

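The weighting of the three parts can be illustrated with a short calculation. This is only a sketch under stated assumptions: the 0-100 scale for each part, the function name `course_score`, and the way the "all assignments completed" rule is modeled are hypothetical and are not specified in the course rules.

```python
# Hypothetical sketch: each part is assumed to be scored on a 0-100 scale.
# The weights are the ones stated on the grading slide.
WEIGHTS = {
    "exam": 0.30,            # exam in computer lab
    "team_case": 0.50,       # team case (course project)
    "class_activity": 0.20,  # tutorials, lectures, exercises
}


def course_score(scores, all_assignments_completed):
    """Return the weighted score, or None if the course cannot be passed.

    `scores` maps each key of WEIGHTS to a score on the assumed 0-100 scale.
    """
    # All assignments must be completed to pass the course.
    if not all_assignments_completed:
        return None
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


# Made-up example scores, for illustration only.
example = course_score(
    {"exam": 80, "team_case": 90, "class_activity": 70},
    all_assignments_completed=True,
)
print(example)
```

Because the team case carries half of the weight, a change in the team case score moves the total more than the same change in the exam or in class activity.
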
Assessment and grading of team case (50% of total grade)
The grading of team cases is based on a combination of peer evaluation and a corresponding evaluation by the teachers. Evaluation rubrics will be provided separately. To conduct the peer evaluation, you will be provided with a separate observation form. Each student will be able to evaluate each member of the team. All peer evaluations will be confidential.

Course material
All course communication, materials and exercises, as well as submission of exercises, will be available on the course home page in MyCourses.