DATA SCIENCE Statistics Machine learning NLP R Python

About the Course

Data Science is the study of the generalizable extraction of knowledge from data. Being a data scientist requires an integrated skill set spanning mathematics, statistics, machine learning, databases and programming languages, along with a good understanding of the craft of problem formulation to engineer effective solutions. This course introduces students to this rapidly growing field and equips them with some of its basic principles and tools as well as its general mindset.

- Students will learn the concepts, techniques and tools they need to deal with various facets of data science practice, including data collection and integration, exploratory data analysis, predictive modeling, descriptive modeling, data product creation, evaluation, and effective communication.
- These topics are treated with a balance of breadth and depth, and emphasis is placed on the integration and synthesis of concepts and their application to real-time problems.
- To make the learning contextual, real datasets from a variety of disciplines will be used.

Program Highlights

- Most comprehensive curriculum
- Training by passionate industry experts
- Each concept is explained by the golden rule: Theory, Example, Software Implementation (R/Python), Real-Time Applicability
- Designed for the industry
- Live project
- Placement assistance

Audience

Any degree. No programming or statistics knowledge required.

Duration & Mode of Training

3 months, online training

Course Content

INTRODUCTION
- Introduction to Data Science: the 3 W's
  - What is Data Science?
  - Why now?
  - Where is Data Science applicable?

DATA EXPLORATION USING STATISTICAL METHODS

DESCRIPTIVE AND INFERENTIAL STATISTICS
- Introduction to statistics
- Summarizing data
  - Measures of central tendency: mean, median and mode
  - Measures of variability: range, interquartile range, standard deviation and variance
  - Measures of shape: skewness and kurtosis
  - Covariance, correlation
- Data visualization
  - Histograms
  - Pie charts
  - Bar graphs
  - Box plots
- Probability basics
- Parametric and non-parametric statistical tests
  - F-test
  - z-test
  - t-test
  - Chi-square test
- Probability distributions
  - Expected value and variance
  - Discrete and continuous distributions
  - Bernoulli distribution
  - Binomial distribution
  - Poisson distribution
  - Normal distribution
  - Exponential distribution
  - Empirical Rule
  - Chebyshev's Theorem
- Sampling methods and Central Limit Theorem
  - Overview
  - Random sampling
  - Stratified sampling
  - Cluster sampling
  - Central Limit Theorem
- Hypothesis testing
  - Type I error
  - Type II error
  - Null and alternative hypotheses
  - Rejection or acceptance criterion
  - P-value
  - Confidence intervals
- ANOVA
  - Assumptions
  - One-way
  - Two-way

MACHINE LEARNING INTRODUCTION
- Introduction to Machine Learning
  - What is Machine Learning?
  - Statistics vs. Machine Learning
  - Types of Machine Learning
    - Supervised learning
    - Unsupervised learning
    - Reinforcement learning

SUPERVISED MACHINE LEARNING
- Classification
  - Nearest neighbor methods (kNN)
  - Logistic regression
- Tree-based models
  - Decision tree basics
  - Classification trees
  - Regression trees
- Probabilistic methods
  - Bayes' Rule
  - Naïve Bayes
- Regression analysis
  - Simple linear regression
    - Assumptions
    - Model development and interpretation
    - Sum of least squares
    - Model validation
  - Multiple linear regression
- Regression shrinkage methods
  - Lasso
  - Ridge
- Advanced models (black box)
  - Support Vector Machine
  - Neural networks
- Ensemble models
  - Bagging
  - Boosting
  - Random Forests
- Optimization
  - Gradient descent (batch and stochastic)
- Recommendation systems
  - Collaborative filtering
    - User-based filtering
    - Item-based filtering

UNSUPERVISED MACHINE LEARNING
- Association rules (Market Basket Analysis)
  - Apriori
- Cluster analysis
  - Hierarchical clustering
  - K-Means clustering
- Dimensionality reduction
  - Principal Component Analysis
  - Discriminant Analysis (LDA/GDA)

MODEL VALIDATION
- Confusion matrix
- ROC curve (AUC)
- Gain and lift chart
- Kolmogorov-Smirnov chart
- Root Mean Square Error (RMSE)
- Cross validation
  - Leave-one-out cross validation (LOOCV)
  - K-fold cross validation

NATURAL LANGUAGE PROCESSING
- Introduction to Natural Language Processing
- Sentiment analysis
- Text similarity

R PROGRAMMING LANGUAGE
- Introduction
  - R overview
  - Installation of R and RStudio software
  - Important R packages
- Data types in R: vectors, lists, matrices, arrays, data frames
- Decision making and loops
  - if-else, while, for
  - next, break, try-catch
- Functions
  - Writing functions
  - Nested functions
  - Built-in functions
  - vapply, sapply, tapply, lapply, etc.
- Data preparation/manipulation
  - Reading and writing data
  - Summary and structure of data
  - Exploring different datasets in R
  - Subsetting data frames
  - String manipulation in data frames
  - Handling missing values, changing data types, data binning techniques, dummy variables
- Data visualization using ggplot2
  - Basic charts: histograms, bar plots, line graphs, scatter plots, etc.

PYTHON PROGRAMMING LANGUAGE
- Introduction
  - How is Python different from R?
  - Installing Anaconda (Python)
  - Setting up with Spyder
- Data types in Python
- Importing modules
- Introduction to strings
  - String manipulation
- Control flow: for, while, if-else
- Functions
  - lambda
  - apply
- NumPy
- pandas
  - Introduction to DataFrames
- Conversion of written R code into Python
- SciPy - Machine Learning in Python
- Beautiful Soup
- Matplotlib