Big Data Terms, Tools and Algorithms: What I've Learned in the Past 12 Months


Kenneth P. Sanford, Ph.D. (ekenomics@gmail.com, @ekenomics)

Outline
- What I've learned in the past year
- Economists as storytellers and analytics architects in this space
- The rise of ML and AI
- What is ML?
- Why ML is here to stay
- Technological changes (Spark, streaming)
- Language changes (SAS, R, Python)
- Methodological changes (deep learning, online learning)
- What economists should do to learn

Economists in Data Science (a Year Ago)
Data Extraction: SQL, APIs, SAS, R
Munging and Manipulation: SQL, SAS, R
Computation: R, SAS
Visualization: Tableau, Excel

Why Economics Is a Data Science
- Understand objective functions
- Academic answer vs. useful answer
- Great storytellers
- Solid visualization skills
- Observational data and causality
- Interdisciplinary training
- Solid knowledge of regression (the background of predictive modeling)

Over the past year

Proprietary to Free AND Open Source http://r4stats.com/articles/popularity/

Analytics for Operations (OA) vs. Analytics as Product (AP): Hypothetical Examples
- Amazon
  OA: Retail needs estimates of delivery cost, timing, etc.
  AP: Create and sell customer cross-sell data (Customer 360)
- Uber
  OA: Where to suggest that drivers locate
  AP: Targeted list of drivers for maintenance coupons
- LinkedIn
  OA: Who you may know
  AP: Who might buy machine learning software

This cloud stuff is real.

Static Analytics (On Premise)
Data Extraction: SQL, APIs, SAS, R
Munging and Manipulation: SQL, SAS, R
Computation: R, SAS
Visualization: Tableau, Excel

Production Analytics (Cloud)
Data Extraction: SQL, APIs, SAS, R, Python, Hadoop
Munging and Manipulation: SQL, SAS, R, Python, Spark
Computation: R, SAS, Python, Spark
Visualization and App Development: Score Code (Java), Tableau, Excel, API, Streaming, D3

Machine Learning and Artificial Intelligence

Machine Learning and AI Vocabulary

Concept            Statistics/Econometrics       Machine Learning
Computation        Fit/Estimate                  Train
Left-hand side     Dependent variable            Target
Right-hand side    Regressor/Predictor/Class     Feature/Factor/Enum
Goal               Estimation/Explanation        Prediction

Statistics vs. Machine Learning
Statistics: good estimators are
- unbiased in small samples
- consistent, if not unbiased
- efficient
Machine learning: good models predict well.
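For reference, the three estimator properties on the statistics side have standard textbook definitions. Here $\hat{\theta}_n$ is an estimator of a parameter $\theta$ from a sample of size $n$. The last line is one common way (an assumption here, not stated on the slide) to make "predict well" concrete: mean squared error of a fitted model $\hat{f}$ on $m$ held-out observations.

```latex
\begin{aligned}
\text{Unbiased:}\quad   & \mathbb{E}[\hat{\theta}_n] = \theta \\
\text{Consistent:}\quad & \hat{\theta}_n \xrightarrow{\;p\;} \theta \quad \text{as } n \to \infty \\
\text{Efficient:}\quad  & \operatorname{Var}(\hat{\theta}_n) \le \operatorname{Var}(\tilde{\theta}_n) \quad \text{for any other unbiased } \tilde{\theta}_n \\
\text{Predicts well:}\quad & \mathrm{MSE}_{\text{test}} = \frac{1}{m} \sum_{i=1}^{m} \bigl(y_i - \hat{f}(x_i)\bigr)^2 \ \text{is small}
\end{aligned}
```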

Diagnostics: Evaluating Your Model
- Receiver operating characteristic (ROC) curve
- Lift curves
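Both diagnostics come from ranking observations by model score. An ROC curve plots the true positive rate against the false positive rate as the classification threshold is lowered. Lift measures how much richer in events the top-scored slice is compared with the whole population. The following is a minimal standard-library sketch that assumes binary 0/1 labels with both classes present. The function names and the toy labels and scores are hypothetical, chosen only for illustration.

```python
# Minimal ROC curve, AUC, and lift, standard library only.
# Assumes binary labels (1 = event, 0 = non-event), both classes present,
# and a score where higher means "more likely to be an event".

def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs as the threshold
    sweeps from the highest score down; tied scores are stepped over together."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def lift_at(labels, scores, fraction):
    """Event rate among the top `fraction` of scored observations,
    divided by the overall event rate."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical test-set labels and model scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(auc(roc_points(labels, scores)))   # 0.75
print(lift_at(labels, scores, 0.25))     # 2.0
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. A lift of 2.0 in the top 25% means that slice contains events at twice the base rate.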

Supervised Learning Methods (know Y)
- Regression (GLM)
- Lasso
- Ridge
- Elastic net
- Decision tree
- Random forest
- Gradient boosted models
- Support vector machine
- Neural network
- Deep learning

Unsupervised Learning Methods (don't know Y)
- Clustering: k-means, hierarchical
- Principal components analysis
- Autoencoders
- Non-negative matrix factorization
- Generalized low rank models
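Lasso, ridge, and elastic net are all regression with a penalty on coefficient size. Ridge adds $\lambda \sum \beta^2$. Lasso adds $\lambda \sum |\beta|$, which can set coefficients exactly to zero. Elastic net mixes the two. Ridge is the easiest of these to write down. The sketch below is deliberately simplified to one feature that is already centered and a model with no intercept (illustrative assumptions, not how a full GLM package is set up). It shows how raising the hypothetical penalty `lam` shrinks the fitted slope toward zero.

```python
# Ridge regression for one centered feature and no intercept, an illustrative
# simplification. Minimizing sum((y - b*x)**2) + lam * b**2 gives the closed
# form b = sum(x*y) / (sum(x*x) + lam); lam = 0 is ordinary least squares.

def ridge_slope(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical, already centered
ys = [-4.0, -2.0, 0.0, 2.0, 4.0]
for lam in (0.0, 10.0, 90.0):
    print(lam, ridge_slope(xs, ys, lam))   # slope shrinks: 2.0, 1.0, 0.2
```

With real data, $\lambda$ is usually chosen by comparing out-of-sample performance across candidate values rather than fixed in advance.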

Fitting: Training and Test Samples
Why? Our objective is to predict, and with big data we may have many observations, so we can reserve some data to evaluate out-of-sample performance objectively.
Partitioning the data:
- Train: estimate one or more models
- Test: compare the predictive qualities of the models
[Figure: pie chart dividing the data into Training Data and Test Data, with segments of 40% and 60%]
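A random partition like the one on the slide takes only a few lines. This sketch shuffles a copy of the data with a seeded generator so that the split is reproducible. `train_test_split` here is a hypothetical helper written for illustration, not a library function. The 40% test share and the seed value are illustrative defaults.

```python
import random


def train_test_split(rows, test_fraction=0.4, seed=42):
    """Randomly partition rows into (train, test). The 40% default test share
    and the seed are illustrative choices, not values from the slide."""
    rng = random.Random(seed)     # seeded so the split is reproducible
    shuffled = list(rows)         # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(100))
print(len(train), len(test))   # 60 40
```

Every candidate model is fit on `train` only, and the comparison between models uses `test` only, so the test score stays an honest estimate of out-of-sample performance.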

Decision Tree
[Figure: customers classified as Delinquent or Normal using line of credit usage and years on the job]
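A decision tree is grown by repeatedly picking the single feature and threshold that best separate the classes, then repeating on each side. The sketch below finds one such split (a "stump") by counting misclassifications. It runs on hypothetical toy records whose field names `credit_usage` and `years_on_job` were invented to echo the slide's axes. Real tree learners usually score splits with Gini impurity or entropy instead of raw error counts, and they recurse.

```python
def best_stump(rows, labels, features):
    """Return (feature, threshold, errors) for the split with the fewest
    misclassifications, predicting the majority label on each side."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            errors = 0
            for side in (left, right):
                if side:
                    majority = max(set(side), key=side.count)
                    errors += sum(lab != majority for lab in side)
            if best is None or errors < best[2]:
                best = (f, t, errors)
    return best


# Hypothetical customers: share of credit line used, years at current job.
rows = [
    {"credit_usage": 0.90, "years_on_job": 1},
    {"credit_usage": 0.80, "years_on_job": 2},
    {"credit_usage": 0.85, "years_on_job": 6},
    {"credit_usage": 0.30, "years_on_job": 1},
    {"credit_usage": 0.20, "years_on_job": 8},
    {"credit_usage": 0.40, "years_on_job": 5},
]
labels = ["delinquent", "delinquent", "delinquent", "normal", "normal", "normal"]
result = best_stump(rows, labels, ["credit_usage", "years_on_job"])
print(result)   # ('credit_usage', 0.4, 0)
```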

Clustering
[Figure: customer income vs. customer age]

Clustering
[Figure: customer income vs. customer age, with clusters labeled HENRYs, Soccer Moms, and DINKs]
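k-means, the first clustering method in the unsupervised list, alternates between two steps. It assigns each point to its nearest centroid, then moves each centroid to the mean of its points, and stops when nothing changes. The sketch below is a plain implementation of that loop (Lloyd's algorithm) on hypothetical (age, income in $k) points. The algorithm returns only cluster numbers. Names such as "HENRYs" or "DINKs" are attached by a person afterwards. With real data, income and age would normally be standardized first, because otherwise the larger-scale variable dominates the distance.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assigning each point to its nearest
    centroid and moving each centroid to the mean of its points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        new = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new.append(centroids[c])   # empty cluster: keep old centroid
        if new == centroids:
            break
        centroids = new
    return centroids, [nearest(p) for p in points]


# Hypothetical customers as (age, income in thousands of dollars).
points = [(25, 40), (27, 45), (26, 42), (45, 150), (47, 160), (44, 155)]
centroids, assign = kmeans(points, 2)
print(assign)   # first three share one cluster id, last three the other
```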

Algorithm Improvements and Adoption: Deep Learning
- Deep learning learns a hierarchy of nonlinear transformations
- Neurons transform their input in a nonlinear way
- A black-box, brute-force method that is very good at pattern recognition
- Deep learning got a boost in the last decade from faster hardware and algorithmic advances
- Great for image recognition and text
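The phrase "neurons transform their input in a non-linear way" has a concrete meaning. Each neuron computes a weighted sum of its inputs plus a bias and passes the result through a nonlinear activation. Stacking layers of such neurons produces the hierarchy of transformations. The sketch below shows only the forward pass of a tiny two-layer network. The weights are hypothetical, picked by hand so that the hidden layer computes |x1 - x2|, and no training (normally backpropagation with gradient descent) is shown.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron computes activation(w . x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Apply layers in sequence; each layer's output feeds the next."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical hand-set weights: the two ReLU neurons compute max(0, x1 - x2)
# and max(0, x2 - x1), so the output neuron sees |x1 - x2|.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], relu),   # hidden layer
    ([[1.0, 1.0]], [0.0], sigmoid),                   # output layer
]
print(forward([1.0, 1.0], layers))   # [0.5]
print(forward([2.0, 1.0], layers))   # sigmoid(1), about 0.731
```

No single linear layer can compute |x1 - x2|, but two layers with a ReLU between them can. This is a small instance of why the nonlinearity matters.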

Software Technologies of Big ML

What Economists can do

Where to Learn More
- DataCamp
- Code School
- Kaggle competitions
- Read Athey, Imbens, Bajari, etc.
- Conferences: ODSC, MLConf