Overview of Machine Learning and H2O.ai


Machine Learning Overview

What is machine learning? "The field of study that gives computers the ability to learn without being explicitly programmed." -- Arthur Samuel, 1959

Why now?
- Data, computers, and algorithms are commodities
- Unstructured data
- Increasing competition in business

Estimating a model for inference        | Training a model for prediction
----------------------------------------|-------------------------------------------
What happened? Why?                     | What will happen?
Assumptions, parsimony, interpretation  | Predictive accuracy, production deployment
Linear models, statistics               | Machine learning
Models tend to be static                | Many models can evolve elegantly

[Venn diagram: data science at the intersection of machine learning, traditional research, and the "danger zone"]

No free lunch:

1. There is no perfect language. "If someone claims to have the perfect programming language, he is either a fool or a salesman or both." -- Bjarne Stroustrup

2. There is no perfect algorithm. "Algorithms that search for an extremum of a cost function perform exactly the same when averaged over all possible cost functions." -- D.H. Wolpert

3. Doing things right is always hard. "Developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive." -- Google, "Hidden Technical Debt in Machine Learning Systems"

H2O.ai Overview

Company Overview

Founded: 2011; venture-backed, debuted in 2012
Products:
- H2O: In-Memory AI Prediction Engine
- Sparkling Water: Spark Integration
- Steam: Deployment engine
- Deep Water: Deep Learning
Mission: Operationalize data science, and provide a platform for users to build beautiful data products
Team: 70 employees; distributed-systems engineers doing machine learning; world-class visualization designers
Headquarters: Mountain View, CA

H2O.ai offers a 100% open-source AI platform: a product suite to operationalize data science.
- H2O: in-memory, distributed machine learning algorithms with speed and accuracy
- Deep Water: state-of-the-art deep learning on GPUs with TensorFlow, MXNet, or Caffe, with the ease of use of H2O
- Sparkling Water: H2O integration with Spark; the best machine learning on Spark
- Steam: operationalize and streamline model building, training, and deployment, automatically and elastically

H2O.ai is now focused on experience beyond algorithms and data:
- H2O Flow: a single web-based document for code execution, text, mathematics, plots, and rich media
- R, Python, and Spark APIs: advanced, scalable ML in the language of your choice
- H2O Steam: elastic ML and AutoML; operationalize data science
[Diagram: platform stack from data through H2O and Deep Water up to verticals]

High-Level Architecture

- Load data from HDFS, S3, NFS, SQL, or local data prep into the H2O compute engine.
- Data are held distributed and in-memory, with loss-less compression.
- Within the engine: exploratory and descriptive analysis, feature engineering and selection, supervised and unsupervised modeling, model evaluation and selection, prediction, and data and model storage.
- Models export as Plain Old Java Objects (POJOs) to a production scoring environment, or wherever your imagination takes them.
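A minimal sketch of that flow in the h2o Python API: connect to the compute engine, load a file into the distributed in-memory store, train a model, and export a POJO. The CSV path and the "churn" target column are hypothetical placeholders.

```python
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()  # connect to (or start) a local H2O compute engine

# Load data into the distributed in-memory store (HDFS/S3/NFS paths work too)
frame = h2o.import_file("customers.csv")        # hypothetical file
frame["churn"] = frame["churn"].asfactor()      # treat the target as categorical

train, valid = frame.split_frame(ratios=[0.8], seed=42)

model = H2OGradientBoostingEstimator(ntrees=100, seed=42)
model.train(y="churn", training_frame=train, validation_frame=valid)

preds = model.predict(valid)   # in-cluster scoring
model.download_pojo(path=".")  # export a Plain Old Java Object for production scoring
```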

Intro to Machine Learning Algos

Algorithms on H2O

Supervised Learning
- Statistical Analysis
  - Penalized Linear Models: super-fast, super-scalable, and interpretable
  - Naïve Bayes: straightforward linear classifier
- Decision Tree Ensembles
  - Distributed Random Forest: easy-to-use tree-bagging ensembles
  - Gradient Boosting Machine: highly tunable tree-boosting ensembles
- Neural Networks (Multilayer Perceptron, Deep Learning)
  - Deep neural networks: multi-layer feed-forward neural networks for standard data mining tasks
  - Convolutional neural networks: sophisticated architectures for pattern recognition in images, sound, and text
- Stacking
  - Stacked Ensemble: combine multiple types of models for better predictions

Unsupervised Learning
- Clustering
  - K-means: partitions observations into similar groups; automatically detects the number of groups
- Dimensionality Reduction
  - Principal Component Analysis: transforms correlated variables to independent components
  - Generalized Low Rank Models: extends the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data
- Anomaly Detection
  - Autoencoders: find outliers using a nonlinear dimensionality reduction technique
- Term Embeddings
  - Word2vec: generate context-sensitive numerical representations of a large text corpus
- Aggregator
  - Aggregator: efficient, advanced sampling that creates smaller data sets from larger data sets
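For reference, these algorithms map onto estimator classes in the h2o Python module (class names as in recent h2o-3 releases; check your version's docs):

```python
from h2o.estimators import (
    H2OGeneralizedLinearEstimator,           # penalized linear models
    H2ONaiveBayesEstimator,                  # naive Bayes
    H2ORandomForestEstimator,                # distributed random forest
    H2OGradientBoostingEstimator,            # gradient boosting machine
    H2ODeepLearningEstimator,                # MLP / deep learning (and autoencoders)
    H2OStackedEnsembleEstimator,             # stacked ensembles
    H2OKMeansEstimator,                      # k-means clustering
    H2OPrincipalComponentAnalysisEstimator,  # PCA
    H2OGeneralizedLowRankEstimator,          # GLRM
    H2OWord2vecEstimator,                    # word2vec term embeddings
    H2OAggregatorEstimator,                  # aggregator sampling
)
```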

Supervised Learning

- Regression: How much will a customer spend? H2O algos: penalized linear models, random forest, gradient boosting, neural networks, stacked ensembles.
- Classification: Will a customer make a purchase, yes or no? H2O algos: penalized linear models, naïve Bayes, random forest, gradient boosting, neural networks, stacked ensembles.
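A sketch of both tasks with one of the listed algorithms, reusing the `train` frame from the earlier sketch; the "spend" and "purchase" columns are hypothetical.

```python
from h2o.estimators import H2ORandomForestEstimator

# Regression: how much will a customer spend? (numeric target)
reg = H2ORandomForestEstimator(ntrees=50, seed=1)
reg.train(y="spend", training_frame=train)

# Classification: will a customer make a purchase? (target must be a factor)
train["purchase"] = train["purchase"].asfactor()
clf = H2ORandomForestEstimator(ntrees=50, seed=1)
clf.train(y="purchase", training_frame=train)
print(clf.auc(train=True))  # training AUC for the binary classifier
```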

Unsupervised Learning

- Clustering: grouping rows, e.g. creating groups of similar customers (the slide's illustrative clusters: "soccer mom," "DINK," "HINRY"). H2O algos: k-means.
- Feature extraction: grouping columns to create a small number of new representative dimensions, e.g. PC1 = -0.3*x_i - 0.4*x_j. H2O algos: principal components, generalized low rank models, autoencoders, Word2vec.
- Anomaly detection: detecting outlying rows, i.e. finding high-value, fraudulent, or weird customers (the slide's illustrative outliers: "billionaire," "fraudster," "weirdo"). H2O algos: principal components, generalized low rank models, autoencoders.
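A sketch of clustering and feature extraction on a numeric frame, again assuming the hypothetical `train` frame from above.

```python
from h2o.estimators import H2OKMeansEstimator, H2OPrincipalComponentAnalysisEstimator

# Clustering: group similar rows (customers)
km = H2OKMeansEstimator(k=5, estimate_k=True, seed=1)  # search for up to 5 clusters
km.train(training_frame=train)
segments = km.predict(train)  # one cluster label per row

# Feature extraction: compress correlated columns into a few components
pca = H2OPrincipalComponentAnalysisEstimator(k=2, transform="STANDARDIZE")
pca.train(training_frame=train)
scores = pca.predict(train)   # PC1, PC2 per row; extreme scores hint at anomalies
```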

Usage Recommendations and Problems: Supervised Learning

Penalized Linear Models (regression, classification)
- Recommendations: create interpretable models with super-fast training time; can extrapolate beyond the training data domain; few hyperparameters to tune; nonlinear and interaction terms must be specified manually; select the correct target distribution.
- Problems: NAs; outliers/influential points; strongly correlated inputs; rare categorical levels in new data.

Naïve Bayes (classification)
- Recommendations: nonlinear and interaction terms should be specified by users; relies on a linear independence assumption; often less accurate than more sophisticated classifiers.
- Problems: rare categorical levels in new data.

Random Forest (regression, classification)
- Recommendations: builds accurate models without overfitting; few hyperparameters to tune; requires less data prep; great for implicitly modeling interactions; difficulty extrapolating beyond the training data domain; can be difficult to interpret.
- Problems: rare categorical levels in new data.

Gradient Boosting Machines (regression, classification)
- Recommendations: builds accurate models without overfitting (often more accurate than random forest); requires less data prep; great for implicitly modeling interactions; many hyperparameters; difficulty extrapolating beyond the training data domain; can be difficult to interpret.
- Problems: rare categorical levels in new data.

Neural Networks (deep learning & MLP; regression, classification)
- Recommendations: great for modeling interactions in fully connected topologies; can extrapolate beyond the training data domain; deep learning architectures are best suited for pattern recognition in images, videos, and sound; many hyperparameters; difficult to interpret.
- Problems: NAs; overfitting; outliers/influential points; long training times; strongly correlated inputs; rare categorical levels in new data.
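Illustrating the first row of the table, a sketch of a penalized linear model in H2O: the target distribution is chosen via `family`, and the penalty is tuned automatically. The "purchase" target is again a hypothetical column.

```python
from h2o.estimators import H2OGeneralizedLinearEstimator

glm = H2OGeneralizedLinearEstimator(
    family="binomial",   # select the correct target distribution
    alpha=0.5,           # blend of L1 (lasso) and L2 (ridge) penalties
    lambda_search=True,  # tune the penalty strength automatically
)
glm.train(y="purchase", training_frame=train)
print(glm.coef())  # penalized coefficients stay interpretable
```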

Usage Recommendations and Problems: Unsupervised Learning

k-means (clustering)
- Recommendations: great for creating Gaussian, non-overlapping, roughly equally sized clusters; the number of clusters can be unknown.
- Problems: NAs; outliers/influential points; strongly correlated inputs; cluster labels sensitive to initialization; curse of dimensionality.

Principal Components Analysis (feature extraction, dimension reduction, anomaly detection)
- Recommendations: great for extracting a number <= N of linear, orthogonal features from i.i.d. numeric data; great for plotting extracted features in a reduced-dimensional space to analyze data structure, e.g. clusters, hierarchy, sparsity, outliers.
- Problems: NAs; outliers/influential points; categorical inputs.

Generalized Low Rank Models (feature extraction, dimension reduction, anomaly detection, matrix completion)
- Recommendations: great for extracting linear features from mixed data; great for plotting extracted features in a reduced-dimensional space to analyze data structure; great for imputing NAs.
- Problems: outliers/influential points.

Autoencoders (neural networks; feature extraction, dimension reduction, anomaly detection)
- Recommendations: great for extracting nonlinear features from mixed data; great for plotting extracted features in a reduced-dimensional space to analyze structure, e.g. clusters, hierarchy, sparsity, outliers.
- Problems: NAs; overtraining; outliers/influential points; long training times; many hyperparameters; strongly correlated inputs; rare categorical levels in new data.

Word2vec (feature extraction from text)
- Recommendations: great for extracting highly representative, context-sensitive term embeddings (i.e. numerical vectors) from text; great for text preprocessing prior to further supervised or unsupervised analysis.
- Problems: many hyperparameters; overtraining; specifying term weightings prior to training; long training times.
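A sketch of the autoencoder row above, used for anomaly detection: rows the network reconstructs poorly are candidate outliers. The hidden-layer sizes and the error threshold are arbitrary illustrative choices.

```python
from h2o.estimators import H2ODeepLearningEstimator

ae = H2ODeepLearningEstimator(
    autoencoder=True,
    hidden=[10, 2, 10],  # bottleneck layer forces a nonlinear compression
    epochs=20,
)
ae.train(training_frame=train)

# Per-row mean squared reconstruction error; large values flag outliers
recon_error = ae.anomaly(train)
outliers = train[recon_error["Reconstruction.MSE"] > 0.05, :]  # illustrative cutoff
```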