IST 718: Big Data Analytics


Course information

Instructor: Daniel E. Acuna, deacuna@syr.edu, acuna.io, github.com/daniel-acuna
Term: Fall 2018
Class:
  IST 718-M002 (17034): Mo 9:30 AM - 12:15 PM, 011 Hinds Hall
  IST 718-M003 (20891): Tue 9:30 AM - 12:15 PM, 011 Hinds Hall
Office: 312 Hinds Hall
Office Hours: Tuesdays from 12:30 PM to 1:30 PM or by appointment
TA: Tong Zeng, tozeng@syr.edu
TA hours: Wednesday from 1:00 PM to 3:00 PM in ICE Box

Catalog description

A broad introduction to analytical processing tools and techniques for information professionals. Students will develop a portfolio of resources, demonstrations, recipes, and examples of various analytical techniques.

Detailed course description

This course will prepare you to participate as a Data Scientist on big data and data analytics projects. Upon the successful completion of this course, you will be able to:

- Translate a business challenge into an analytics challenge;
- Analyze big data, create statistical models, and identify insights that can lead to actionable results;
- Use Python and Apache Spark to build big data analytics pipelines;
- Learn classic and state-of-the-art machine learning techniques;
- Explain how advanced analytics can be leveraged to create competitive advantage.

Prerequisite knowledge required

Familiarity with command-line interfaces; quantitative skills, including statistics; and programming skills in Python.
Please see https://acuna.io/teaching/ist718/

Textbooks

We will use parts of four textbooks:

- Python Data Science Handbook (PDSH) by Jake VanderPlas (Free), https://jakevdp.github.io/pythondatasciencehandbook/
- An Introduction to Statistical Learning with Applications in R (ISLR) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (Free), http://www-bcf.usc.edu/~gareth/isl/islr%20sixth%20printing.pdf
- Deep Learning (DL) by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (Free), http://www.deeplearningbook.org/
- Apache Spark: The Definitive Guide (Excerpts) (ASDG) by Chambers and Zaharia, https://pages.databricks.com/definitive-guide-spark.html

Course Topics

The following is a tentative outline of topics to be covered in the course:

Aug 27/28
  Topics: Overview of the course and review. Linear algebra, calculus, statistics; Python, Jupyter notebook
  Suggested reading: DL: Ch. 2.1-2.6; DL: Ch. 3.1-3.9.3; DL: Ch. 4.3; PDSH: Ch. 1
  Events: HW 1 released

Sep 3/4
  No classes for both sessions.
  (Optional session) Programming and math session on September 4 between 6 PM and 9 PM, room TBA

Sep 10/11
  Topics: Python Programming. Numpy, Pandas, Matplotlib
  Suggested reading: PDSH: Ch. 2.2-2.7; PDSH: Ch. 3.2-3.4; PDSH: Ch. 4.1-4.9; ASDG: 3-8
  Events: HW 1 due; HW 2 released

Sep 17/18
  Topics: Introduction to Hadoop, MapReduce, and Apache Spark
  Events: Group formation

Sep 24/25
  Topics: Introduction to Spark DataFrames and Spark ML
  Suggested reading: ASDG: 44-126
  Events: Project brainstorming; HW 2 due; HW 3 released

Oct 1/2
  Topics: A Statistical Perspective on Machine Learning. Introduction to probability; maximum likelihood estimation; mean square error estimation; gradient descent
  Suggested reading: ISLR: Ch. 1; Ch. 2.1; DL: Ch. 1-2

Oct 8/9
  Topics: Assessing Model Accuracy. Confusion matrix, bias-variance tradeoff, model selection: training, validating, and testing
  Suggested reading: ISLR: Ch. 2.2

Oct 15/16
  Topics: Case 1: Sentiment Analysis of Twitter. Supervised learning, logistic regression, regularized logistic regression, elastic net regularization, model interpretation
  Suggested reading: ISLR: Ch. 6

Oct 1/2 through Oct 15/16
  Events: Project proposal due; HW 3 due; HW 4 released

Oct 22/23
  Topics: Case 1 (cont.)
  Events: HW 4 due; HW 5 released

Oct 29/30
  Topics: Case 2: A recommendation system for courses. Unsupervised learning, nearest neighbors, dimensionality reduction (Principal Component Analysis, PCA), clustering (k-means)
  Suggested reading: ISLR: Ch. 10, sections 10.1, 10.2, and 10.3
  Events: Project update presentation

Nov 5/6
  Topics: Case 2 (cont.)

Nov 12/13
  Topics: Case 3: Predicting Credit Scores with Bagging and Boosting. "Wisdom of the crowd", bagging, random forests, gradient boosting, feature importance
  Suggested reading: ISLR: Ch. 8
  Events: HW 5 due; HW 6 released

Nov 19/20
  Thanksgiving break

Nov 26/27
  Topics: Case 3 (cont.)
  Events: Nov 30 Poster Day

Dec 3/4
  Topics: Case 4: Object Recognition with Deep Learning. Neural networks, multilayer perceptron, other topics and next steps for data science careers
  Suggested reading: DL: Ch. 6
  Events: HW 6 due; Project code due on Dec 7

Methods of evaluation

Assessment               Notes                                                       Points
Quizzes (best 4 of 5)    Covers concepts; unannounced; no make-ups                   20 (4 x 5)
Homework (6)             Based on class materials; late submission will be           42 (6 x 7)
                         discounted
Group Project (1)        In groups of only 3 or 4; you choose your teammates         35 (total)
  - Project proposal     Beginning of semester                                       5
  - Project update       In-class short overview and update of project               5
  - Poster presentation  Poster day                                                  15
  - Project code         Final submission                                            10
Participation            In class and/or forums                                      3
TOTAL                                                                                100

Grading scale

Total Points Earned    Registrar Grade
95-100                 A
90-94                  A-
85-89                  B+
80-84                  B
75-79                  B-
70-74                  C+
65-69                  C
60-64                  C-
50-59                  D
0-49                   F

Quizzes

Quizzes are individual-effort, short in-class tests that measure your understanding of the concepts and terminology covered in class, labs, and the assigned readings. Quizzes may be issued at the beginning of class, so please report to class on time. No make-ups will be given to absent or late-arriving students. Consider quizzes part of your attendance. Quizzes are unannounced: no quiz dates are posted on the syllabus, so expect that any class may have a quiz. Quizzes will cover all material up to the point where the quiz is issued.

Homework

Homework is released on Tuesdays at 12:30 PM after classes and is due on Sundays at midnight before classes. You need to create a Github.com account and share your Github username with the professor. Then, you will use the Github authentication system to use http://notebook.acuna.io/

While you are encouraged to discuss homework with your classmates, homework programming and writing are individual effort. Homework is designed to check that you are keeping pace with in-class concepts and out-of-class lab activities. For late submissions, the grade of your homework will be multiplied by the following factor, where days (which may be fractional) is the number of days the submission is late (days > 0):

    def grade_factor(days):
        # No credit once a submission is more than 3 days (72 hours) late
        if days > 3:
            return 0
        else:
            return 1/(1 + days)**(3/4)

For example, 1 day late keeps about 60% of the grade, 2 days late about 44%, and 2 hours late about 94%. Submissions more than 72 hours past the deadline will receive a grade of zero.

Group Project

The group project is your chance to demonstrate what you've learned in the course and apply it to a new scenario. It is expected that each group's project will be novel, and that all will be of the highest quality. Your group will be responsible for finding a data set, analyzing it, and producing visualizations and findings from your analysis. The group project consists of four elements:

1. Project proposal. 2-page preliminary description of the project, including problem statement, solution proposed, techniques, datasets, and expected results.
2. Project update. During the middle of the semester, your project team is required to present updates about your project. You are expected to have made significant progress in your analysis and have concrete ideas about next steps.
3. Presentation. Your group is required to present your findings on the iSchool's joint poster day (November 30, 2018). During this time, your professor and TA will stop by and ask you to present your project.
4. Project code. Well-commented code to produce the results of your poster.

There is no late submission for the presentation parts, and all members of the group must be able to describe the work during poster day.
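As a rough check of the late-submission factor given in the Homework section, the sketch below evaluates it for a few lateness values. The `late_points` helper and the conversion from hours to days are illustrative assumptions, not part of the syllabus; the 7-point homework value comes from the evaluation table. It assumes Python 3, where `3 / 4` is true division.

```python
def grade_factor(days: float) -> float:
    """Late-submission multiplier from the syllabus; days may be fractional."""
    if days > 3:
        return 0.0
    return 1 / (1 + days) ** (3 / 4)


def late_points(raw_points: float, hours_late: float) -> float:
    """Hypothetical helper: points kept after applying the late factor."""
    if hours_late <= 0:
        # On-time submissions are not discounted.
        return raw_points
    return raw_points * grade_factor(hours_late / 24)


# Print the factor and the points kept on a full-credit 7-point homework.
for hours in (2, 24, 48, 72, 73):
    factor = grade_factor(hours / 24)
    print(f"{hours:>3} h late: factor {factor:.2f}, "
          f"{late_points(7, hours):.2f} of 7 points")
```

The factor drops quickly in the first day and falls to zero just past the 72-hour mark, which matches the stated cutoff.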
Participation

It is expected that you participate in class and in Blackboard forum discussions. It is also expected that you visit the professor during office hours or set up a time to meet him during the semester. Additionally, it is expected that the group visits the professor or talks to the TA to discuss the project.

General information

Teaching philosophy

My teaching philosophy centers around students as critical thinkers who challenge current beliefs using arguments rooted in strong evidence. There are three concepts to this:

- We are all inherently curious about how the world works and have an unbounded set of needs.
- We all make mistakes and all questions are valid; however, we only realize these blind spots when we think critically and discuss our ideas with others.
- Data is only a means to a goal, but we are responsible for keeping our analysis, policy recommendations, and conclusions as ethical and compassionate as possible.

Who does well in the course?

This is a relatively heavy-load course, and an ideal student should do the following:

- Study consistently throughout the semester in short bursts. Research has shown that pulling all-nighters and studying just before the class or lab will make you forget the contents later in your career. Make an effort to study at least 30 minutes every day for the class.
- Be active in class and ask the professor questions. Challenge the materials and try to see all angles of the ideas and conclusions being presented in class. Critical thinking is as important as technical ability in a data science job, and it is a highly appreciated skill.
- Focus on learning how to program and on the pieces involved in developing professional software. Although this class does not assume a large amount of programming experience, the sooner you start learning about Python and the tools taught in this class, the better you are going to do throughout the semester. Data scientists who are excellent critical thinkers AND know how to transform ideas into software are the most prized in the job market.

Academic Integrity Policy

Syracuse University's academic integrity policy reflects the high value that we, as a university community, place on honesty in academic work. The pilot policy in effect at the School of Information Studies defines our expectations for academic honesty and holds students accountable for the integrity of all work they submit.
Students should understand that it is their responsibility to learn about course-specific expectations, as well as about university-wide academic integrity expectations. The pilot policy governs appropriate citation and use of sources, the integrity of work submitted in exams and assignments, and the veracity of signatures on attendance sheets and other verification of participation in class activities. The pilot policy also prohibits students from submitting the same work in more than one class without receiving written authorization in advance from both instructors. Under the pilot policy, students found in violation are subject to grade sanctions determined by the course instructor and non-grade sanctions determined by the School or College where the course is offered. SU students are required to read an online summary of the university's academic integrity expectations and provide an electronic signature agreeing to abide by them twice a year during pre-term check-in on MySlice. For more information and the pilot policy, see http://academicintegrity.syr.edu

Disability-Related Accommodations

Syracuse University values diversity and inclusion; we are committed to a climate of mutual respect and full participation. If you believe that you need accommodations for a disability, please contact the Office of Disability Services (ODS), disabilityservices.syr.edu, located at 804 University Avenue, room 309, or call 315.443.4498 for an appointment to discuss your needs and the process for requesting accommodations. ODS is responsible for coordinating disability-related accommodations and will issue Accommodation Authorization Letters to students as appropriate. Since accommodations may require early planning and generally are not provided retroactively, please contact ODS as soon as possible. Our goal at the iSchool is to create learning environments that are usable, equitable, inclusive, and welcoming. If there are aspects of the instruction or design of this course that result in barriers to your inclusion or accurate assessment or achievement, please meet with me to discuss additional strategies beyond official accommodations that may be helpful to your success.

Religious Observances Notification and Policy

SU's religious observances policy, found at supolicies.syr.edu/emp_ben/religious_observance.htm, recognizes the diversity of faiths represented in the campus community and protects the rights of students, faculty, and staff to observe religious holy days according to their tradition. Under the policy, students should have an opportunity to make up any examination, study, or work requirements that may be missed due to a religious observance, provided they notify their instructors no later than the end of the second week of classes through an online notification form in MySlice listed under Student Services/Enrollment/My Religious Observances/Add a Notification.