It's a Machine World: Predictive Analytics with Machine Learning

It's a Machine World: Predictive Analytics with Machine Learning. Greg Deckler, gdeckler@fusionalliance.com, @GregDeckler

Greg Deckler; Fusion Alliance; Solution Director, Cloud Services; Columbus, OH, United States; Email: gdeckler@fusionalliance.com; LinkedIn: https://www.linkedin.com/in/gregdeckler; Twitter: @GregDeckler; PBI Community: smoupre; ScoopIt: Business Intelligence Insights; Founder of the Columbus Azure ML and Power BI User Group; Author of Achieving Process Profitability, Building the IT Profit Center

Agenda: What is Machine Learning?; History of Machine Learning; Why Machine Learning?; Examples of Predictive Analytics; Core Concepts; Putting Theory into Practice; Demo; Common Issues in ML; Operationalizing ML; Resources; Questions?

About Fusion Alliance

What is Machine Learning? Machine learning can be described as computing systems that improve with experience. It can also be described as a method of turning data into software. Whatever term is used, the results remain the same; data scientists have successfully developed methods of creating software models that are trained from huge volumes of data and then used to predict certain patterns, trends, and outcomes. Predictive analytics is the underlying technology behind Machine Learning, and it can be simply defined as a way to scientifically use the past to predict the future to help drive desired outcomes.

History: Machine learning was born from the quest for artificial intelligence. Antiquity has stories of artificial beings. The study of formal or mechanical reasoning began with ancient philosophers. But where things really got moving was right around 1956.

History - 1956: Dartmouth Summer Research Project on Artificial Intelligence. John McCarthy, Marvin Minsky, Nathan Rochester, Claude Shannon, Arthur Samuel, Allen Newell, Herbert Simon, Dr. Heintz Doofenshmirtz, Alloyse von Roddenstein, Dr. Diminutive, Dr. Killbot, Dr. Goatfish

History: Arthur Lee Samuel. Coined the term "machine learning" in 1959. The 1955 version of his checkers-playing program, the Samuel Checkers-playing Program, is arguably the first example of a self-learning program.

History: The Rift. By 1980, machine learning and neural networks had fallen out of favor within AI, displaced by expert systems. Machine learning, reorganized as a separate field, started to flourish in the 1990s. It changed its goal to solvable problems of a practical nature and shifted away from symbolic approaches toward statistics and probability theory.

Machine Learning Recap: Evolved from pattern recognition and computational learning theory; Explores the study and construction of algorithms that can learn from and make predictions on data; Closely related to, and overlaps with, computational statistics; Has strong ties to mathematical optimization; Sometimes conflated with data mining; Used within the field of data analytics. Tom M. Mitchell's formal definition (1997): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." In short, turn data into programs to predict something.

Why Machine Learning? Exponential data growth; Cheap global digital storage; Ubiquitous computing power; The rise of big data analytics

Examples of Predictive Analytics: Spam/junk email filters; Mortgage applications; Various forms of pattern recognition; Life insurance; Medical insurance; Liability/property insurance; Credit card fraud detection; Airline flight scheduling; Web search page results; Predictive maintenance; Proactive health management; Warranty reserve estimation; Propensity to buy; Demand forecasting; Predictive inventory planning; Recommendation engines; Dynamic pricing; Credit worthiness evaluation; Smart grid management; Energy supply and demand; Carbon emissions and trading; Patient triage optimization

Core Concepts: Data Preparation; Types of Learning; Approaches; Outputs; Questions; Linearity; Algorithms; Training, Scoring and Evaluation

Data Preparation: Relevant; Connected; Accurate; Enough; Access

Data Preparation - Relevant

Data Preparation - Connected

Data Preparation - Accurate

Data Preparation - Enough

Learning: Supervised; Unsupervised; Reinforcement

Approaches: Decision tree learning; Association rule learning; Artificial neural networks; Deep learning; Inductive logic programming; Support vector machines; Clustering; Bayesian networks; Reinforcement learning; Representation learning; Similarity and metric learning; Sparse dictionary learning; Genetic algorithms; Rule-based machine learning; Learning classifier systems

Approaches - Supervised learning: AODE; Artificial neural network; Backpropagation; Autoencoders; Hopfield networks; Boltzmann machines; Restricted Boltzmann Machines; Spiking neural networks; Bayesian statistics; Bayesian network; Bayesian knowledge base; Case-based reasoning; Gaussian process regression; Gene expression programming; Group method of data handling; Inductive logic programming; Instance-based learning; Lazy learning; Learning Automata; Learning Vector Quantization; Logistic Model Tree; Minimum message length; Nearest Neighbor Algorithm; Analogical modeling; Probably approximately correct learning; Ripple down rules; Symbolic machine learning; Support vector machines; Random Forests; Ensembles of classifiers; Bootstrap aggregating (bagging); Boosting (meta-algorithm); Ordinal classification; Information fuzzy networks (IFN); Conditional Random Field; ANOVA; Hidden Markov models; Linear classifiers; Fisher's linear discriminant; Linear regression; Logistic regression; Multinomial logistic regression; Naive Bayes classifier; Perceptron; Support vector machines; Quadratic classifiers; k-nearest neighbor; Boosting; Decision trees; C4.5; Random forests; ID3; CART; SLIQ; SPRINT; Bayesian networks; Naive Bayes

Approaches - Reinforcement learning: Temporal difference learning; Q-learning; Learning Automata; SARSA. Semi-supervised learning: Generative models; Low-density separation; Graph-based methods; Co-training. Deep learning: Deep belief networks; Deep Boltzmann machines; Deep Convolutional neural networks; Deep Recurrent neural networks; Hierarchical temporal memory. Unsupervised learning: Expectation-maximization algorithm; Vector Quantization; Generative topographic map; Information bottleneck method; Artificial neural network; Self-organizing map; Association rule learning; Apriori algorithm; Eclat algorithm; FP-growth algorithm; Hierarchical clustering; Single-linkage clustering; Conceptual clustering; Cluster analysis; K-means algorithm; Fuzzy clustering; DBSCAN; OPTICS algorithm; Outlier Detection; Local Outlier Factor

Outputs: Classification; Anomaly Detection; Regression; Clustering; Density Estimation; Dimensionality Reduction

Questions: Is this A or B?; Is this weird?; How much, how many?; How is this organized?; What should I do next?

What Questions does ML Answer? Will this tire fail in the next 1,000 miles: Yes or no? Which brings in more customers: a $5 coupon or a 25% discount?

What Questions does ML Answer? If you have a car with pressure gauges, you might want to know: Is this pressure gauge reading normal? If you're monitoring the internet, you'd want to know: Is this message from the internet typical?

What Questions does ML Answer? What will the temperature be next Tuesday? What will my fourth quarter sales be?

What Questions does ML Answer? Which viewers like the same types of movies? Which printer models fail the same way?

What Questions does ML Answer? If I'm a temperature control system for a house: Adjust the temperature or leave it where it is? If I'm a self-driving car: At a yellow light, brake or accelerate? For a robot vacuum: Keep vacuuming, or go back to the charging station?

Linearity

Algorithms: Classification; Anomaly Detection; Regression; Clustering

Algorithms: Logistic Regression; Support Vector Machine

Algorithms: One-vs-All Multiclass Classifier; Decision Trees

Algorithms: Linear Regression; Neural Networks

Algorithms: K-means; Principal Components Analysis

It's Just Simple Math... k-nearest Neighbors; AODE; Support Vector Machines; Boltzmann Machines; Random Forests; k-means Clustering; Fisher's Linear Discriminant; Naive Bayes Classifier; Quadratic Classifiers; Perceptron
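As one illustration of the "simple math" point, here is a minimal sketch of k-nearest neighbors, the first algorithm named on this slide. The toy points, labels, and the function name `knn_predict` are made up for this example and are not from the talk; the method is just Euclidean distance plus a majority vote.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k closest training points.

    `train` is a list of (point, label) pairs; points are tuples of floats.
    """
    # Sort training rows by straight-line (Euclidean) distance to the query.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups labeled "A" and "B".
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]

label_near_a = knn_predict(train, (1.2, 1.4))
label_near_b = knn_predict(train, (6.4, 6.1))
print(label_near_a, label_near_b)
```

There is no training step at all here: the "model" is the stored data, which is why k-nearest neighbors is often grouped with lazy or instance-based learning.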

Training, Scoring and Evaluation: Training; Scoring; Evaluation; Cross Validation; Confusion Matrix; Accuracy (ACC); Precision (PPV); Recall (TPR); F1 Score (F1); Area Under Curve
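The evaluation metrics listed on this slide all fall out of the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the class and function names and the small label lists are illustrative assumptions, not material from the demo.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # actual positive, predicted positive
    fp: int  # actual negative, predicted positive
    fn: int  # actual positive, predicted negative
    tn: int  # actual negative, predicted negative

    def accuracy(self) -> float:
        """ACC: share of all predictions that were correct."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        """PPV: of everything flagged positive, how much really was."""
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        """TPR: of everything really positive, how much was caught."""
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    def f1(self) -> float:
        """F1: harmonic mean of precision and recall."""
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_matrix(actual, predicted) -> ConfusionMatrix:
    """Tally the four cells from parallel lists of 0/1 labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif p:
            fp += 1
        elif a:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


# Hypothetical scored results: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
cm = confusion_matrix(actual, predicted)
print(cm, cm.accuracy(), cm.precision(), cm.recall(), cm.f1())
```

Area Under Curve is left out because it needs predicted scores at many thresholds rather than a single set of hard labels.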

Break

Putting Theory into Practice. Brainstorming: What questions could we answer with data? Ranking: What questions are most suitable for machine learning?

Putting Theory into Practice
Rating scales: Value, Suitability and Data Available are 1-5 (Low to High); Complexity is 1-5 (High to Low); Score is 1-5 (Bad to Good).
Question | Value | Suitability | Data Available | Complexity | Score | Question Type
Predict when and why a customer becomes a "Leaver" | 3 | 5 | 5 | 4 | 4.25 | Is this A or B? (Classification)
Route deliveries to ensure guaranteed timeframe is achieved | 3 | 2 | 1 | 1 | 1.75 | What should I do now? (Reinforcement)
Price products for greater unit sales and profitability | 3 | 2 | 2 | 2 | 2.25 | How much, how many? (Regression)
Analyze social media to understand customer personas | 3 | 3 | 3 | 2 | 2.75 | How is this organized? (Clustering)
Forecast sales and labor staffing efficiently | 3 | 4 | 3 | 3 | 3.25 | How much, how many? (Regression)
Determine media for best return on marketing investments | 3 | 3 | 4 | 5 | 3.75 | How much, how many? How is this organized? (Regression, Clustering)
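The Score values in the ranking above are consistent with a plain average of the four 1-5 ratings (for example, (3 + 5 + 5 + 4) / 4 = 4.25). The slide does not state the formula, so the unweighted mean below is an inference from the numbers, and the function name `score` is illustrative. A minimal sketch that reproduces the ranking:

```python
# (question, value, suitability, data_available, complexity), as rated on the slide.
questions = [
    ("Predict when and why a customer becomes a Leaver", 3, 5, 5, 4),
    ("Route deliveries to ensure guaranteed timeframe is achieved", 3, 2, 1, 1),
    ("Price products for greater unit sales and profitability", 3, 2, 2, 2),
    ("Analyze social media to understand customer personas", 3, 3, 3, 2),
    ("Forecast sales and labor staffing efficiently", 3, 4, 3, 3),
    ("Determine media for best return on marketing investments", 3, 3, 4, 5),
]


def score(value, suitability, data_available, complexity):
    """Assumed formula: unweighted mean of the four ratings.

    Complexity is already rated High to Low, so a higher number is better
    for every input and no rating needs to be inverted.
    """
    return (value + suitability + data_available + complexity) / 4


# Rank candidate questions from best to worst score.
ranked = sorted(
    ((score(*ratings), question) for question, *ratings in questions),
    reverse=True,
)

for s, question in ranked:
    print(f"{s:.2f}  {question}")
```

If some criteria matter more to a team than others, the same function could take weights; that would be a change to the method shown on the slide.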

DEMO: Don't Panic

Common Issues: Bias; Class Imbalance Problem

Bias: Bias in the data; Bias created by using ML

Class Imbalance Problem: If a dataset consists of 10000 genuine and 10 fraudulent transactions, the classifier will tend to classify fraudulent transactions as genuine. The reason is easily explained by the numbers. Suppose the machine learning algorithm can produce one of two models: Model 1 classifies 7 out of 10 fraudulent transactions as genuine and 10 out of 10000 genuine transactions as fraudulent. Model 2 classifies 2 out of 10 fraudulent transactions as genuine and 100 out of 10000 genuine transactions as fraudulent. Model 1 makes 17 mistakes in total and Model 2 makes 102, so an algorithm that simply minimizes the number of mistakes prefers Model 1, even though it misses 7 of the 10 frauds.
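To make the two models above concrete, this sketch computes accuracy and recall on the fraud class from the counts given on the slide. The helper name `summarize` and the extra "always genuine" baseline are additions for illustration only.

```python
def summarize(name, fraud_missed, genuine_flagged, n_fraud=10, n_genuine=10_000):
    """Accuracy and fraud recall for a model, from its two error counts."""
    caught = n_fraud - fraud_missed          # true positives
    cleared = n_genuine - genuine_flagged    # true negatives
    return {
        "model": name,
        "errors": fraud_missed + genuine_flagged,
        "accuracy": (caught + cleared) / (n_fraud + n_genuine),
        "recall": caught / n_fraud,
    }


model_1 = summarize("Model 1", fraud_missed=7, genuine_flagged=10)
model_2 = summarize("Model 2", fraud_missed=2, genuine_flagged=100)
# Illustrative baseline (not on the slide): never flag anything as fraud.
baseline = summarize("Always genuine", fraud_missed=10, genuine_flagged=0)

for m in (model_1, model_2, baseline):
    print(f'{m["model"]:<15} errors={m["errors"]:<4} '
          f'accuracy={m["accuracy"]:.4f} recall={m["recall"]:.1f}')
```

Model 1 wins on accuracy while catching only 3 of the 10 frauds, and the do-nothing baseline scores higher still, which is why accuracy alone is a poor guide on imbalanced data.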

Class Imbalance Problem

Class Imbalance Problem - Solutions: Cost function based approaches; Sampling

Cost Function Based Approaches: The intuition is that if we think one false negative is worse than one false positive, we give it more weight, counting that one false negative as, e.g., 100 errors instead of one. For example, if 1 false negative is as costly as 100 false positives, then the machine learning algorithm will try to make fewer false negatives, even at the price of more false positives, since that is cheaper overall.
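Applied to the two fraud models from the Class Imbalance Problem slide, the 100-to-1 weighting reverses the preference. This is a minimal sketch of the idea as a scoring rule; the function name `weighted_cost` is illustrative, and real libraries typically expose the same idea through class or sample weights rather than a function like this.

```python
def weighted_cost(false_negatives, false_positives, fn_weight=100, fp_weight=1):
    """Total cost when each false negative counts as `fn_weight` false positives."""
    return fn_weight * false_negatives + fp_weight * false_positives


# Error counts from the slide: a missed fraud is a false negative,
# a genuine transaction flagged as fraud is a false positive.
cost_model_1 = weighted_cost(false_negatives=7, false_positives=10)
cost_model_2 = weighted_cost(false_negatives=2, false_positives=100)

print("Model 1 cost:", cost_model_1)
print("Model 2 cost:", cost_model_2)
print("Preferred:", "Model 2" if cost_model_2 < cost_model_1 else "Model 1")
```

With equal weights Model 1 looks better (17 errors against 102); with the 100-to-1 weighting Model 2 is clearly cheaper.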

Sampling - Undersampling

Sampling - Oversampling

Sampling - SMOTE
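The three sampling slides name techniques without spelling them out, so the following is a simplified sketch under stated assumptions, not the method used in the demo. It uses random undersampling, random oversampling with replacement, and a SMOTE-style step that interpolates between a minority point and one of its nearest minority neighbors. All function names and the toy data are hypothetical.

```python
import random


def undersample(majority, minority, rng):
    """Randomly keep only as many majority rows as there are minority rows."""
    return rng.sample(majority, len(minority)), list(minority)


def oversample(majority, minority, rng):
    """Randomly repeat minority rows (with replacement) to match the majority."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority), list(minority) + extra


def smote_like(minority, n_new, k, rng):
    """Make n_new synthetic minority points, SMOTE-style.

    Each new point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbors.
    """
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        # k nearest neighbors of x by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbor = rng.choice(others[:k])
        u = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbor)))
    return synthetic


# Hypothetical 2-D toy data: 200 majority rows, 10 minority rows.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

maj_u, min_u = undersample(majority, minority, rng)
maj_o, min_o = oversample(majority, minority, rng)
min_s = minority + smote_like(minority, n_new=190, k=3, rng=rng)

print(len(maj_u), len(min_u))   # balanced by shrinking the majority
print(len(maj_o), len(min_o))   # balanced by repeating minority rows
print(len(majority), len(min_s))  # balanced by adding synthetic rows
```

Undersampling throws data away, oversampling repeats the same few rows, and the SMOTE-style step adds new but plausible rows; which trade-off is acceptable depends on how much minority data exists.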

Operationalizing Machine Learning: 1. Extract; 2. Call web service; 3. Return predictions; 4. Publish; 5. Schedule refresh. Components shown: Power BI, R, Azure ML, Power BI Service, Gateway

Platforms: Automatic Business Modeler; Algorithmia; algorithms.io; Amazon Machine Learning; BigML; DataRobot; FICO Analytic Cloud; Google Prediction API; HPE Haven OnDemand; IBM's Watson Analytics; Microsoft Azure Machine Learning; MLJAR.com; PurePredictive; Yottamine

Questions? Try Machine Learning, what's the worst that could happen? gdeckler@fusionalliance.com @GregDeckler