Welcome to CS4437 / CS9637 / CS9114 Introduction to Data Science


Dr. Dan Lizotte (Comp. Sci., Epidemiology & Biostatistics)
IF YOU ARE NOT REGISTERED, please have a seat and we will discuss as a group.

A data scientist is a statistician who lives in San Francisco.
Data Science is Statistics on a Mac.
A data scientist is better at statistics than any software engineer and better at software engineering than any statistician.

Choosing and using methods from computer science and statistics to understand something about the world. (Dan)

[Figure: methods placed between Statistics and Comp Sci: Regression, GLMs, Decision Trees, Apriori, GAMs/Splines, Hypothesis Testing, Bayesian Networks, Support Vector Machines, Neural Networks, The Bootstrap]

Supervised Learning

Unsupervised Learning

Reinforcement Learning
Figure 1: Screen shots from five Atari 2600 Games: (Left-to-right) Pong, Breakout, Space Invaders, Seaquest, Beam Rider

Dr. Kemi Ola

Course Objective
Introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") scientific problems. Through individual projects, students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which specific DS methods are applicable to a problem at hand. This course requires students to show substantial initiative in investigating methods that are applicable for their project. The lectures give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.

Logistics
READ. THE. WIKI. http://www.csd.uwo.ca/~dlizotte/teaching/ids/
Instructor: Dan Lizotte, dlizotte at uwo dot ca, MC363
TA: Brent Davis, bdavis56 at uwo dot ca
Time: Tuesday from 2:30PM to 4:30PM, and Thursday from 2:30PM to 3:30PM
Place: Middlesex College MC105-B
Communication: We will be using OWL for electronic communication.
Question & Collaboration Hour: TBA, Thursdays after class

Materials
READ. THE. WIKI.
Required materials are materials that I expect you to consult if you have questions. They are not required reading cover-to-cover.

Anticipated Topics and Schedule
Introduction to Data Science: Definitions, Components, Relationships to Other Fields
Data Cleaning: Working with structured data (selecting, filtering, joining, aggregating), Simple visualizations, Face validity
Supervised Machine Learning: Regression, Classification. Linear Regression, SVMs, Trees (maybe also Reinforcement Learning and Sequential Decision Making)
(Re)-introduction to Statistics: Data Summaries, Randomness, Sample Spaces and Events, Probability, Random Variables, Inference: Hypothesis testing, P-values, Confidence Intervals
Multivariate Statistics: Conditional probability, Correlation, Independence
Evaluation: Test set, Cross-validation, Bootstrap, Confounding, Causal inference
Unsupervised Machine Learning, Representations, and Feature Construction: Clustering, Dimensionality reduction, Domain-specific Feature Development, Deep Learning, Images, Sounds, Text
Visualization

Evaluation
Daily Quizzes: 5%
Midterm: 35%
Brainstorming Session: 5%
Project Proposal: 15% (4414/9114), 10% (9637)
Report Draft: 5%
Project Report: 35%
Peer Review (9637 only): 5%
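
As a quick arithmetic check on the weights above, the following Python sketch confirms that each scheme adds up to 100% and shows how a final mark could be computed as a weighted average. The component keys and the `final_mark` helper are hypothetical names introduced for illustration, and it assumes every component is marked out of 100; the course's actual grade calculation may differ.

```python
# Weights in percent, transcribed from the Evaluation slide.
# Components shared by every course number:
COMMON = {
    "daily_quizzes": 5,
    "midterm": 35,
    "brainstorming": 5,
    "report_draft": 5,
    "project_report": 35,
}

# The proposal weight differs by course number; peer review is 9637 only.
WEIGHTS = {
    "4414/9114": {**COMMON, "project_proposal": 15},
    "9637": {**COMMON, "project_proposal": 10, "peer_review": 5},
}


def final_mark(scheme, marks):
    """Weighted average of component marks (hypothetical helper).

    Assumes each entry in `marks` is a percentage between 0 and 100.
    """
    weights = WEIGHTS[scheme]
    return sum(weights[name] * marks[name] for name in weights) / 100


# Each scheme should account for the full 100%.
for scheme, weights in WEIGHTS.items():
    assert sum(weights.values()) == 100, scheme
```

For example, `final_mark("9637", marks)` with a mark of 80 on every component returns 80.0, since the weights sum to 100.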

Daily Quizzes
Very short quiz at the beginning of class covering the previous day's materials.
The final quiz will be on 31 Oct.
The lowest quiz mark will be dropped.
Quiz marks will only be excused for medical reasons.
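
The drop-the-lowest rule can be sketched in a few lines of Python. This is a minimal illustration, not the course's official calculation: `quiz_component` is a hypothetical helper, and it assumes the quiz component is the plain mean of the remaining marks and that an unexcused missed quiz is recorded as 0.

```python
def quiz_component(marks):
    """Mean of the quiz marks after dropping the single lowest one.

    Assumption: all quizzes carry equal weight and use the same scale.
    """
    if len(marks) < 2:
        raise ValueError("need at least two quiz marks to drop one")
    kept = sorted(marks)[1:]  # discard the lowest mark
    return sum(kept) / len(kept)
```

With marks of 0, 8, 10 and 6 (out of 10), the 0 is dropped and the remaining three marks average 8.0.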

Individual* Project
Project Proposal (4414/9114: 15%, 9637: 10%): Document detailing the plan for the project. See Project Guidelines on the wiki for detailed requirements.
Report Draft (5%): The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.
Project Report (35%): Each student will prepare a research paper detailing a substantive problem, the data available, the applicable DS methods, and empirical results obtained on the problem.

Brainstorming (5%)
Each student will prepare a presentation explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be no more than 10 minutes. We will then discuss the problem as a class, along with possible approaches for solving the problem using ML methods. Each student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback from the brainstorming session. See Project Guidelines on the wiki for detailed requirements.

Brainstorming
You must pick a brainstorming slot.
1. Request an account on the wiki.
2. Edit the schedule at the bottom of the main page, replacing SlotX with your name.
Pick one before Friday, 6 October at 5pm or I will pick one for you and you won't like it.

Peer Review
Each research graduate (9637) student will be assigned three project reports to review.
Primary Purpose: Provide feedback to authors that they can make use of in their future careers, which gives them a better return on the investment they have made in their course project.
Secondary Purpose: Give students a view of the variety of work that has been done in the course, and further develop reviewing skills.
Reviews from other students will not affect the grade of the author in any way. See the wiki for more details.

Accessibility and Support, Missed Course Components
Check the wiki.

Questions and Chat: Why are you here?