CptS 483:04 Introduction to Data Science

CptS 483:04 Introduction to Data Science, Fall 2017 (8/20/17)

About me
Name: Assefaw Gebremedhin
Office: EME B43
Webpage: www.eecs.wsu.edu/~assefaw
Joined WSU: Fall 2014
Research interests: combinatorial scientific computing, network science, data mining, machine learning, high performance computing, bioinformatics
Lab: Scalable Algorithms for Data Science Laboratory (https://scads.eecs.wsu.edu)
NSF CAREER project: Fast and Scalable Combinatorial Algorithms for Data Analytics (www.eecs.wsu.edu/~assefaw/fascada)
Teaching at WSU:
  CptS 483: Intro to Data Science (Fall 2015, 2016, 2017)
  CptS 591: Elements of Network Science (Spring 2015, 2016, 2017)
  CptS/STAT 424: Data Analytics Capstone (Planned)

About Data Science Class of 2017 (What I know so far)
Current enrollment: 30
By level:
  Graduate: 12 (7 PhD, 5 MS)
  Undergraduate: 16 (Senior)
  Post-bacc undergraduate: 2
By program:
  Computer Science: 22
  Electrical Engineering: 2
  Computer Engineering: 1
  Software Engineering: 1
  Bio and Ag Engineering: 1
  Mathematics: 1
  Anthropology: 1
  Biology: 1

Course websites
Public course site: https://scads.eecs.wsu.edu/index.php/data-science
  Syllabus
  Overview of schedule (updated after every lecture)
  Resources
OSBLE+: https://plus.osble.org
  Lecture material
  Assignments
  Announcements
  Posts
  Submissions and feedback
  Currently: 18 added users; 12 whitelisted (be sure to respond to invitation ASAP)

Course Description
Data Science is the study of the generalizable extraction of knowledge from data.
Data science requires an integrated skill set spanning:
  Computer science
  Mathematics & Statistics
  Domain expertise + art of problem formulation to engineer effective solutions
Purpose of this course: introduce basic principles, tools, and general mindset
Emphasis on breadth rather than depth, and on synthesis of concepts
Primarily uses the statistical computing language R

Expectation
Basic knowledge of algorithms and reasonable programming experience (equivalent to completing CptS 223)
Familiarity with basic linear algebra
Basic probability and statistics
Deficiencies can to a degree be overcome with extra effort

Topics
1. Introduction: What is Data Science?
2. Statistical Learning and Intro to R
3. Exploratory Data Analysis and the Data Science Process
4. Linear Regression
5. Classification
   K-NN, Logistic regression, Naïve Bayes classifier, Decision Trees
6. Unsupervised Learning
   K-means clustering, Hierarchical clustering, Principal Components Analysis
7. Data Wrangling
   Data cleaning, data reshaping, data integration; dplyr, tidyr
8. Data Visualization
9. Time Series Data Mining
   Distance measures, transformations, algorithms, tools (Matrix Profile, SAX)
10. Recommender Systems and Social Network Mining
11. Intro to Deep Learning
12. Data Science and Ethics

A few things
Pre-course survey
  Your background
  Level of familiarity with R, Python, MATLAB
  Topics you are excited about
  Other topics you wish to see covered
  Complete and submit on OSBLE
R tutorial (Python tutorial)
  Tutorial: generally preferred time

Course work and assessment
Assignments (30%)
  About 4 throughout the semester
  Completed and submitted individually
  Each of the assignments carries equal weight
Semester Project (30%)
  Team of two or three
  Option between choosing from a given list OR proposing own project
  Guidelines will be provided
Exam (30%)
  Late midterm
  Designed to cover most material AND complement assignments and semester project
Class participation (10%)
  Attendance
  Active participation
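As a rough illustration of how the weights above combine, the sketch below computes a weighted final score in Python. The 0-100 score scale, the function names, and the sample scores are assumptions made for illustration; the slides state only the percentages and that assignments carry equal weight.

```python
# Component weights as listed on the slide (they sum to 1.0).
WEIGHTS = {
    "assignments": 0.30,
    "project": 0.30,
    "exam": 0.30,
    "participation": 0.10,
}


def assignment_average(assignment_scores):
    """Equal-weight mean of the individual assignment scores."""
    return sum(assignment_scores) / len(assignment_scores)


def final_score(scores):
    """Weighted sum of component scores (each assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical student: four assignments plus the other three components.
    hypothetical = {
        "assignments": assignment_average([80, 90, 70, 80]),
        "project": 90,
        "exam": 70,
        "participation": 100,
    }
    print(f"final score: {final_score(hypothetical):.1f}")
```

How the resulting score maps to a letter grade is not stated on the slides; the syllabus is the reference for that.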

Weekly Schedule

Learning Outcomes
Describe what Data Science is and the skill sets needed
Describe the Data Science Process
Use R to carry out basic statistical modeling and analysis
Carry out exploratory data analysis
Apply basic machine learning algorithms for predictive modeling
Apply unsupervised learning methods to discover patterns, trends and anomalies in data
Use effective data wrangling approaches to manipulate data
Identify and explain mathematical and algorithmic ingredients of a recommender system
Create effective visualizations of data
Reason about ethical and privacy issues in data science and apply ethical practices
Work effectively in teams on data science projects
Apply knowledge gained in the course to carry out a project and write a technical report

Books
No required textbook
Lecture notes (slides) and reading material will be made available on the OSBLE+ page
References
  Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer, 2013. (Freely available online)
  Cathy O'Neil and Rachel Schutt. Doing Data Science: Straight Talk from the Frontline. O'Reilly, 2014.
  Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Mining of Massive Datasets. v2.1. Cambridge University Press, 2014. (Freely available online)
  Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques. Third Edition. Morgan Kaufmann Publishers, 2012.
  Ethem Alpaydin. Introduction to Machine Learning. Third Edition. MIT Press, 2014.
  Nathan Yau. Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. Wiley Publications, 2011.
  Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning. MIT Press, 2016. (Freely available online)

Policies
Conduct in class
  Silence personal electronics
  Arrive on time and remain throughout the class
Correspondence
  Happens via OSBLE+
Attendance
  Required. Make sure absences are cleared with me
Missing or late work
  Max 48 hrs with 10% penalty per 24 hrs
Academic Integrity
  Strongly enforced
Consult syllabus for more details