Lesson 1: Business Concepts. Lesson 2: Foundations of probability and statistics [4 sessions]


Lesson 1: Business Concepts

Session 1: You will learn the analytics life cycle and how analytics is used in real time, with case studies and examples.
- Analytics landscape and components
- Analytics framework: CRISP-DM
- Real-life analytics project examples

Lesson 2: Foundations of probability and statistics [4 sessions]

Session 1: This session introduces how statistics is used in business, along with basic statistical concepts such as levels of data and measures of central tendency.
- Measures of central tendency on ungrouped data (mean, median, mode)
- Measures of variability on ungrouped data (range, interquartile range, variance, standard deviation, z-scores, coefficient of variation)
- Measures of shape (skewness and the relationship of the mean, median, and mode; coefficient of skewness; kurtosis; box-and-whisker plots; histograms)
- Introduction to probability

Session 2: You will go deeper into the various distributions. You will understand the parameters that define probability distributions and the differences between discrete and continuous distributions.
- Discrete probability distributions: Bernoulli, binomial, geometric, Poisson, and the properties of each
- Continuous probability distributions: normal distribution, t-distribution, exponential distribution

Session 3: You will start making statistical inferences about populations from samples.
- Sampling
- Estimating the population mean using the z-statistic and t-statistic
- Hypothesis testing
- Confidence intervals

Session 4: By this point you will have a complete picture of how to understand the data, attributes, distributions, sample versus population, and the procedure for statistical testing. Having analysed a single variable, you will extend that understanding to analyse the relationship between variables.
- Chi-square test, t-test, z-test, F-test
- One-way ANOVA, two-way ANOVA
- Assignment on statistics
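The Lesson 2, Session 1 measures on ungrouped data can be sketched in a few lines. This is only an illustration: the course itself teaches R, Python's standard `statistics` module is used here for convenience, and the `describe` function and the sample values are hypothetical.

```python
import statistics


def describe(values):
    """Return basic descriptive measures for a list of ungrouped numbers."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,  # interquartile range
        "variance": statistics.variance(values),  # sample variance
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation
        "z_scores": [(v - mean) / stdev for v in values],
    }


# Hypothetical sample, chosen only to make the arithmetic easy to follow.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
```

Note that the quartiles depend on the interpolation method; `statistics.quantiles` defaults to the "exclusive" method, so other tools may report a slightly different interquartile range for the same data.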

Lesson 3: Introduction to programming [Four Sessions]

Session 1: In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.
- Control structures and functions: str(), class(), length(), nrow(), ncol(), seq(), cbind(), rbind(), merge()
- Data manipulation techniques: the steps involved in data cleaning, functions used in data inspection, tackling the problems faced during data cleaning, uses of functions like grepl(), grep(), sub(), coercing data, uses of the apply() functions

Session 2: We will continue with R concepts and apply statistical concepts using R.
- Pre-processing techniques: binning, filling missing values, standardization and normalization, type conversions, train-test data split, ROCR, other R concepts
- Exploratory data analysis

Session 3: Preparing data as input for machine learning algorithms
- Assignment on R understanding

Session 4: You will execute all the machine learning algorithms in R.

Lesson 4: Big Data and Apache Spark [Five Sessions]

Big Data: why and where. Data has been around (even digitally) for a while. What makes data "big", and where does this big data come from?

Session 1: Have you ever heard about technologies in the Hadoop ecosystem such as HDFS, MapReduce, and Spark? Always wanted to learn these new tools but missed concise starting material? This four-session module covers:
- Big Data framework
- Characteristics of Big Data and dimensions of scalability
- Getting value out of Big Data
- Relation between Big Data and data science
- Getting started with Hadoop

Session 2: You will be introduced to the Hadoop ecosystem and MapReduce programming. You will learn to use Sqoop and Hive to ingest and query non-trivial relational data sets.
- Use Hadoop-as-a-service platforms like HDP
- Hive, Pig, MapReduce
- Data loading using Sqoop, Oozie
- Data capturing using Flume

Session 3: You will be introduced to Apache Spark, which has overtaken Hadoop as the most active open source Big Data framework. In this module, you will understand the different frameworks available for Big Data analytics. The module also includes a first-hand introduction to Spark and a demo on building and running a Spark application and its Web UI.
- Big Data analytics with batch and real-time processing
- Why is Spark needed?
- What is Spark?
- How Spark differs from its competitors
- Spark at eBay
- Spark's place in the Hadoop ecosystem
- Spark components and its architecture
- Running programs on the Spark shell
- Spark Web UI
- Configuring Spark properties
- Hands-on: building and running a Spark application, Spark application Web UI, configuring Spark properties

Session 4: Continuation of Apache Spark: how machine learning can be applied with Apache Spark
- RDDs
- MLlib
- Spark Streaming

Lesson 5: Regression concepts [Three Sessions]

Predictive analytics searches for patterns in historical and transactional data to understand a business problem and predict future events. In many business problems, we deal with data on several variables, sometimes more than the number of observations. Regression models help us understand the relationships among these variables and how those relationships can be exploited to make decisions. The primary objective of this module is to understand how regression and causal forecasting models can be used to analyse real-life business problems such as prediction, classification, and discrete choice problems. The focus will be case-based practical problem-solving using predictive analytics techniques to interpret model outputs.
Session 1: Linear Regression
- Simple linear regression: coefficient of determination, significance tests, residual analysis, confidence and prediction intervals
- Multiple linear regression: coefficient of determination, interpretation of regression coefficients, categorical variables in regression
- Heteroscedasticity, multicollinearity, outliers, R-squared and goodness of fit
- Hypothesis testing of the regression model
- Transformation of variables
- Polynomial regression
- Case study
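As a rough illustration of simple linear regression and the coefficient of determination named in this session, the sketch below fits a line by ordinary least squares in plain Python. The course itself works in R; the function name `fit_simple_linear_regression` and the data points are hypothetical and chosen only for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination: share of the variance in y explained by the line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical data lying close to the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared = fit_simple_linear_regression(x, y)
```

An R-squared close to 1 here only says the line fits these five points well; the residual analysis and significance tests listed in the session are still needed before trusting the model.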

Session 2: Logistic Regression
Logistic regression is a method for classifying data into discrete outcomes. For example, we might use logistic regression to classify an email as spam or not spam. In this module, we introduce the notion of classification, the cost function for logistic regression, and the application of logistic regression to multi-class classification.
- Logistic function
- Estimation of probability using logistic regression
- Model evaluation
- Confusion matrix

Session 3: Time series data
The focus is on analysing and understanding time series, with financial markets as the case study.
- Trend analysis
- Cyclical and seasonal analysis
- Smoothing; moving averages; autocorrelation; ARIMA; ARIMAX
- Applications of time series in financial markets

Lesson 6: Machine learning [Twelve Sessions]

Session 1: Clustering
- What is clustering?
- Clustering examples in business verticals
- Solution strategies for clustering
- Finding pattern and fixed pattern approach
- Limitations of the fixed pattern approach
- Machine learning approaches for clustering
- Iterative approaches: K-Means and K-Medoid
- Hierarchical agglomerative approaches
- Density-based approach: DBSCAN
- Evaluation metrics for clustering: cohesion and coupling metrics, correlation metric

Session 2: Naive Bayes
- Conditional probability
- Conditional independence
- Bayes rule and examples
- Naive Bayes algorithm
- Space and time complexity: train and test time
- Laplace/additive smoothing
- Underfitting and overfitting
- Feature importance and interpretability

Session 3: KNN
- Intuitive idea of KNN classification
- KNN learning
- Limitations of KNN
- KNN regression
- Applying KNN and parameter tuning
- Pros and cons of the model

Session 4: Decision Trees
- Geometric intuition: axis-parallel hyperplanes
- Nested if-else conditions
- Sample decision tree
- Building a decision tree: entropy, information gain, Gini impurity (CART)
- Depth of a tree: geometric and programming intuition
- Categorical features with many levels
- Regression using decision trees
- Bias-variance trade-off
- Limitations

Session 5: Support vector machines (SVM)
- Geometric intuition
- Mathematical derivation
- Loss function (hinge loss) based interpretation
- Support vectors
- Linear SVM
- Non-linear SVM and kernel functions
- Primal and dual
- Kernelization: RBF kernel, polynomial kernel, domain-specific kernels
- Train and run time complexities
- Bias-variance trade-off: underfitting and overfitting
- Nu-SVM: control errors and support vectors
- SVM regression

Session 6: Neural networks
- History of neural networks and deep learning
- Perceptrons
- Self-organizing maps
- Autoencoders

- Backpropagation and the typical feed-forward algorithm
- Sigmoidal activation functions
- Mathematical formulation
- Backpropagation and the chain rule of differentiation
- Vanishing gradient problem
- Bias-variance trade-off
- Determining the number of levels
- Decision surfaces

Session 7: Ensemble Methods
- Understanding weak learners
- Approaches for ensemble learning: boosting, bagging, and randomization
- The bagging idea in depth and why it works
- Bootstrapped aggregation (bagging)
- Random forests and their construction
- Bias-variance trade-off
- Gradient boosting and the XGBoost algorithm
- Loss function and advantages
- XGBoost code samples
- AdaBoost: geometric intuition
- Cascading models
- Stacking models
- Case study

Session 8: Association Rules
- Case study
- Apriori model: intuitive idea
- Apriori model: applying the algorithm and tuning
- Pros and cons of the model
- Recommender systems: user-user, item-item, content-based

Session 9: Feature engineering
- Dimensionality reduction: PCA and EDA
- Principal Component Analysis: why learn it
- Geometric intuition

- Case study
- Mathematical objective function
- Alternative formulation of PCA: distance minimization
- Eigenvalues and eigenvectors
- PCA for dimensionality reduction and visualization
- Visualize the MNIST dataset
- Limitations of PCA

Session 10 and Session 11: Business case analysis
The objective of this session is to provide an application and end-to-end view of solving a data science problem and defending your analysis. We provide a business case in advance, in which you will be required to apply all the data pre-processing steps and prepare the input for the ML algorithms learnt thus far. The lab is designed so that everyone participates in a discussion, designs the solution approach for the given business case, and defends the analysis approach.
- Hands-on
- Revision

Lesson 7: Deployment
- Amazon cloud
- Microsoft Azure

Lesson 8: Data Visualization
- Tableau
- Real-time dashboards

Lesson 9: Advanced Concepts [2-3 sessions]

Session 12 and Session 13: Foundation classes that take you to advanced topics
- Text mining
- Deep learning
- Natural language processing