Introduction to Data Science I


Course outline for COMPSCI 4414A/9637A/9114A
The University of Western Ontario
London, Ontario, Canada
Department of Computer Science
Course Outline - Fall (September - December) 2017

From Dan: This is a very high-demand course that interests students in various programs across campus. I think this is great, because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. Students will have to monitor the registration website for available spots. That said, all are welcome to sit in the room if there is space.

Objective

The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e., "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and will learn to identify which specific DS methods apply to the problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and will present their findings to their peers in the class. Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable to their project. The lectures give an overview of important methods, but the lecture content alone is not sufficient to produce a high-quality course project.

This course is designed for students who:
Like to read: have a desire to understand substantive problems
Like to think: make connections between methods and problems
Like to hack: are willing to munge (http://en.wikipedia.org/wiki/data_munging) data into usable form
Like to speak: can teach us about what they found

Prerequisites

At least one undergraduate programming course (e.g., CS2035) and at least one statistics course (e.g., STAT1024). This course entails a significant amount of self-directed learning and is aimed at fourth-year undergraduate and graduate students.

Logistics

Instructor: Dan Lizotte, dlizotte at uwo dot ca, Office MC363
Teaching Assistant: Brent Davis, bdavis56 at uwo dot ca (runs the Q/C Hour; see below)
Time: Tuesdays 2:30PM-4:30PM and Thursdays 2:30PM-3:30PM
Place: Middlesex College MC-105B (http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf)
Question and Collaboration Hour: Tuesdays 4:30PM-5:30PM, MC 320
Communication: We will be using OWL (https://owl.uwo.ca) for electronic communication.

Important Dates

Pick Brainstorming Slot by Friday, 6 Oct at 5pm
Project Proposal Due Friday, 27 Oct at 5pm
Project Draft Due Friday, 17 Nov at 5pm
Project Report Due Friday, 8 Dec at 5pm
Paper Reviews Due Friday, 15 Dec at 5pm

Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming.
Also, everyone should feel free to make improvements to any part of the wiki (e.g., if you find useful software or other resources). Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page by Friday, 6 Oct at 5pm, or Dan will pick a slot for you.

Materials

Required Texts
JWHT: James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. New York: Springer. [Free through Western (https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7)]

HTF: Hastie, T., Tibshirani, R., & Friedman, J. The Elements of Statistical Learning. An expanded version of the required text. [Free online (http://www-stat.stanford.edu/~tibs/elemstatlearn/)]
LW: Wilkinson, L. (2005). The Grammar of Graphics. [Free from Springer (https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0)]
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis (by the creator of ggplot2). [Free through Western (https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387981406)]

Review if you need to catch up:
Wasserman, L. All of Statistics (http://www.stat.cmu.edu/~larry/all-of-statistics/). [Free from Springer (http://link.springer.com/book/10.1007/978-0-387-21736-9)]
Devore, J. L., & Berk, K. N. (2012). Modern Mathematical Statistics with Applications (2nd ed.). Springer. [Free through Western (https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3)]
Linear algebra review (http://www.cs.mcgill.ca/~dprecup/courses/ml/materials/linalg-review.pdf), up to and including Section 3.7, "The Inverse"

Other Resources
The Data and Software Page

Cheat Sheets
ggplot2 cheat sheet (https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf)
Data Wrangling cheat sheet (https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf)

Texts
Spector, P. (2008). Data Manipulation with R. New York: Springer. [Free through Western (https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309)]
Probability review (http://www.cs.mcgill.ca/~dprecup/courses/ml/materials/prob-review.pdf) from Stanford University, by way of Doina Precup
List of resources (http://www.cs.mcgill.ca/~dprecup/courses/ml/resources.html) from COMP-652 at McGill (courtesy of Doina Precup)
Bishop, C. M. (2006). Pattern Recognition and Machine Learning.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction.
Alpaydin, E. (2004). Introduction to Machine Learning. MIT Press.
MacKay, D. J. C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification (2nd ed.). Wiley & Sons.

Other Links
Data Visualization for Human Perception (https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception)
Data Journalism (http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone)

Software
The dplyr package documentation (https://cran.r-project.org/web/packages/dplyr/). The "vignettes" are particularly good.
The TensorFlow library (Python, C++) (https://www.tensorflow.org/)

Topics (anticipated)

Introduction to Data Science
  Definitions
  Components
  Relationships to Other Fields
Data Munging
  Working with structured data: selecting, filtering, joining, aggregating
  Web scraping
  Simple visualizations
  Sanity checking
(Re)-introduction to Statistics
  Data Summaries
  Randomness, Sample Spaces and Events, Probability
  Random Variables, CDF, PMF, PDF
  Expectation
  Estimation
  Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap
  Inference: Hypothesis testing, P-values, Confidence Intervals
  Multivariate Statistics: conditional probability, correlation, independence
Supervised Machine Learning, Predictive Models
  Supervised Learning
    Regression
    Classification
  Reinforcement Learning and Sequential Decision Making
  Evaluation
    Variance: Test set, cross-validation, bootstrap
    Bias: Confounding, causal inference
Unsupervised Machine Learning, Representations, and Feature Construction
  Clustering
  Dimensionality reduction
  Domain-specific Feature Development
    Images
    Sounds
    Text
Visualization
  Topics to be determined

Evaluation

There will be a midterm test but no final exam. Each student will lead a brainstorming session and produce a proposal, a draft, and a report for a course project. Graduate students (9637) will additionally submit peer reviews of other class projects. For detailed requirements, see Project Guidelines.

Scholastic offences are taken seriously, and students are directed to read the appropriate policy, specifically the definition of what constitutes a Scholastic Offence, at this website: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf.

Daily Quizzes 5%

Starting with the second lecture, there will be a very short quiz at the beginning of each class covering the previous lecture's material. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. Missed quizzes will be excused only for medical reasons.

Midterm 35%

The midterm assesses competency in the fundamentals taught in the first half of the class.

Brainstorming Session 5%

Each student will prepare a presentation explaining an applied problem, as well as some data science methods that could potentially be applied to it. The presentation should be no longer than 10 minutes. We will then discuss the problem as a class, along with possible approaches to solving it using data science methods. The student should be prepared to answer deep questions about the nature of their problem, so that they receive high-quality feedback from the brainstorming session.

Project Proposal 4414: 15%, 9637: 10%

A document detailing the plan for the project. See Project Guidelines for detailed requirements.

Report Draft 5%

A draft of the final report is due Friday, 17 Nov at 5pm. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.

Project Report 35%

Each student will prepare a research paper detailing a substantive problem, the available data, the applicable data science methods, and the empirical results obtained on the problem.

Peer Review 9637 only: 5%

Each graduate student will prepare two reviews of their classmates' work.

Participation and Effort

The success of the course as a useful learning experience hinges on the active participation and effort of the students. Students are expected to attend all classes and to participate actively in the brainstorming sessions.
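The evaluation weights listed above can be combined mechanically. The Python sketch below is a hypothetical helper, not an official course tool. It encodes the 4414 and 9637 weights from this outline, drops the lowest quiz mark as the quiz policy describes, and returns a weighted final mark. It relies on several assumptions that the outline does not state. Every component is taken to be marked out of 100. The quiz component is taken to be the plain average of the quizzes that remain after the lowest is dropped. Excused quizzes are simply left out of the list. Participation and Effort has no listed weight, and 9114A has no listed breakdown, so neither is modelled. The names `WEIGHTS`, `quiz_component`, and `final_mark` are illustrative only.

```python
"""Hypothetical sketch of the course's weighted grading scheme (not an official tool).

Assumptions not stated in the outline: every component is marked out of 100,
the quiz component is the plain average of the quizzes kept after dropping the
lowest one, and excused (medical) quizzes are simply omitted from the list.
Participation and Effort has no listed weight, so it is not included.
"""

# Component weights (percent) as listed in the outline, keyed by course number.
WEIGHTS = {
    "4414": {"quizzes": 5, "midterm": 35, "brainstorming": 5,
             "proposal": 15, "draft": 5, "report": 35},
    "9637": {"quizzes": 5, "midterm": 35, "brainstorming": 5,
             "proposal": 10, "draft": 5, "report": 35, "peer_review": 5},
}


def quiz_component(quiz_marks: list[float]) -> float:
    """Average the quiz marks after dropping the single lowest one."""
    if len(quiz_marks) < 2:
        raise ValueError("need at least two quiz marks to drop the lowest")
    kept = sorted(quiz_marks)[1:]  # discard the lowest mark
    return sum(kept) / len(kept)


def final_mark(course: str, quiz_marks: list[float], marks: dict[str, float]) -> float:
    """Combine component marks (each 0-100) into a weighted final mark (0-100)."""
    weights = WEIGHTS[course]
    assert sum(weights.values()) == 100, "weights should total 100%"
    # Merge the non-quiz marks with the computed quiz component.
    scores = dict(marks, quizzes=quiz_component(quiz_marks))
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing marks for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights) / 100


if __name__ == "__main__":
    example = {"midterm": 80, "brainstorming": 90, "proposal": 85,
               "draft": 100, "report": 88}
    print(final_mark("4414", [60, 90, 100, 80], example))  # prints 85.55
```

In the example, the 60 is dropped and the remaining quizzes average to 90 before weighting. Both weight tables total 100%, which the `assert` re-checks on every call.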
Accessibility and Support Available at Western

Please contact the course instructor if you require lecture or printed material in an alternate format, or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.

Support Services

Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation and writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.

Students who are in emotional or mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options for obtaining help. Additional student-run support services are offered by the USC (http://westernusc.ca/services). The website for Registrarial Services is http://www.registrar.uwo.ca.

Missed Course Components

If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140 and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Its website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.

A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an off-campus medical facility. For further information, please consult the university's medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.

Timeline (Tentative)

7 Sep - Lectures:
12 Sep - Lectures:
14 Sep - Lectures:
19 Sep - Lectures:
21 Sep - Lectures:
26 Sep - Lectures:
28 Sep - Lectures:
3 Oct - Lectures:
5 Oct - Pick Brainstorming Slot by 6 Oct 5pm - Lectures:
10 Oct - Fall Reading Week
12 Oct - Fall Reading Week
17 Oct - Lectures:
19 Oct - Lectures: Guest Lecture by Amanda Holden of SAS. Topic TBA.
24 Oct - Lectures:
26 Oct - Project Proposal Due 27 Oct at 5pm - Lectures: Guest Lecture by Dr. Kemi Ola on Visualization
31 Oct - Lectures:
2 Nov - Lectures:
7 Nov - Midterm
9 Nov - Brainstorming: *slot1*, *slot2*, Sachi Elkerton
14 Nov - Brainstorming: *slot1*, *slot2*, *slot3*, Duff Jones, *slot5*, *slot6*
16 Nov - Project Draft Due 17 Nov at 5pm - Brainstorming: Kerlin Lobo, *slot2*, *slot3*
21 Nov - Brainstorming: *slot1*, Angela Zhao, *slot3*, *slot4*, *slot5*, *slot6*
23 Nov - Brainstorming: *slot1*, *slot2*, *slot3*
28 Nov - Brainstorming: *slot1*, *slot2*, Vanessa Zhu, *slot4*, *slot5*, *slot6*
30 Nov - Brainstorming: *slot1*, *slot2*, *slot3*
5 Dec - Brainstorming: *slot1*, *slot2*, *slot3*, *slot4*, *slot5*, *slot6*
7 Dec - Brainstorming: *slot1*, *slot2*, *slot3*

Project Document Due Friday 8 December 5pm
Reviews (graduate students only) Due Friday 15 December 5pm

Retrieved from "https://www.csd.uwo.ca/~dlizotte/teaching/ids/index.php?title=introduction_to_data_science_i&oldid=36"
This page was last edited on 13 September 2017, at 00:10.