Practical Data Science with R

Instructor: Matthew Renze
Twitter: @matthewrenze
Email: info@matthewrenze.com
Web: http://www.matthewrenze.com

Course Description

Data science is the practice of transforming data into knowledge. R is the most popular programming language used by data scientists. In our data-driven economy, this combination of skills is in extremely high demand, commanding significant increases in salary, as it is revolutionizing the world around us.

In this workshop, we'll learn about the practice of data science, the R programming language, and how they can be used to transform data into actionable insight. In addition, we'll learn how to transform and clean our data, create and interpret descriptive statistics, data visualizations, and statistical models. We'll also learn how to handle Big Data, make predictions using machine learning algorithms, and deploy R to production.

Prerequisites

Please bring your own Windows laptop and complete 0 to install all of the necessary software before the workshop begins.

Module Descriptions

1. Introduction: introduce the practice of data science and the R programming language
2. Working with Data: learn how to import, transform, clean, and export data
3. Descriptive Statistics: learn how to create and interpret univariate and bivariate statistics
4. Data Visualization: learn how to create univariate, bivariate, and multivariate data visualizations
5. Statistical Modeling: learn to create Gaussian models and simple linear regression models
6. Handling Big Data: learn about big data and how to handle it with tools in R
7. Machine Learning: learn about ML and how to train, test, and implement ML models
8. R in Practice: learn about R in production, reproducible research, and industry best practices

Learning Objectives

When students are finished with this workshop, they should understand the following:

Introduction
- What data science is, why it is important, and how the process of data science works
- What R is and why it has become so popular for data science
- How to create data types, data structures, subset data tables, and find help on R topics

Working with Data
- What data munging is, what clean data are, and the steps involved in the data munging process
- How to import, transform, clean, and export data
- How to use the dplyr package in R

Descriptive Statistics
- What descriptive statistics are and how they can be used to make sense of data
- What types of variables exist and the corresponding types of data analysis we can perform
- How to create standard univariate and bivariate descriptive statistics

Data Visualization
- What data visualization is and how we can use it to identify patterns in data
- What types of data visualization we can create based on the question we are trying to answer
- How to create and interpret univariate, bivariate, and multivariate data visualizations

Statistical Modeling
- What a statistical model is and how it can be used for statistical inference
- How to create and generate data with a Gaussian distribution model
- How to create and predict with a simple linear regression model

Handling Big Data
- What Big Data is and what the limitations of R are
- How to work around these limitations with sampling and 3rd-party tools

Machine Learning
- What machine learning is and how it can be used to make predictions
- How to train, test, and implement a machine learning algorithm
- How to predict with k-means cluster analysis, decision trees, naïve Bayes, and neural networks

R in Practice
- How to use R in production with tools like R Server and shiny
- What industry best practices exist for using R for data science
- How to create reproducible research with R markdown

Course Outline

Introduction to Data Science and R
- Introduction to Data Science
- What is data science?
- Why is data science important?
- The data science process
- Introduction to R
- What is R?
- Why is R so popular for data science?
- R language basics
- Installation and setup
- Hello World
- Working with data types
- Working with data structures
- Working with data frames
- Miscellaneous topics

Working with Data
- What is data munging?
- What are clean data?
- The data munging process
- Data munging tools
- Importing data
- Transforming data
- Cleaning data
- Exporting data
- Using dplyr

Descriptive Statistics
- What are descriptive statistics?
- Types of data analysis
- Univariate descriptive statistics
- Bivariate descriptive statistics
- Creating univariate descriptive statistics
- Creating bivariate descriptive statistics

Data Visualization
- What is data visualization?
- Univariate data visualizations
- Bivariate data visualizations
- Multivariate data visualizations
- Creating univariate data visualizations
- Creating bivariate data visualizations
- Creating multivariate data visualizations

Statistical Modeling
- What are statistical models?
- Gaussian distribution models
- Linear regression models
- Creating Gaussian distribution models
- Creating linear regression models

Handling Big Data
- What is Big Data?
- How to handle big data?
- Using ff to work with large data sets
- Creating linear regression models with biglm

Machine Learning
- What is machine learning?
- Types of machine learning
- The machine learning process
- Predicting with k-means cluster analysis
- Creating training and test data sets
- Predicting with decision trees
- Predicting with naïve Bayes classifiers
- Predicting with neural networks

R in Practice
- Using R in production
- Best practices
- Reproducible research
- Exporting charts
- Using shiny
- Creating R markdown