Python Machine Learning Step-by-Step: Modeling Financial Time Series Data

Python Machine Learning Step-by-Step: Modeling Financial Time Series Data
Reece Heineke, Director of Big Data, Credibly
February 27, 2017

Outline
What is Machine Learning?
Data Preparation: Overview, Python Toolbox, Trade Ideas to Data, Conclusion
Exploratory Data Analysis: Overview, Scatter Plot, Principal Component Analysis (PCA), Conclusion
Fitting Models: Overview, Models and Pipelines, Learning Curves, Interpretability, Conclusion
A Fitted Model

What is Machine Learning?
1. Machine learning is a subfield of computer science that provides computers with the ability to learn without being explicitly programmed.
2. There are two sides to every machine learning problem:
2.1 The learning
2.2 Model produced from the learning
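The two sides named above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn and synthetic data (neither the data nor the choice of LinearRegression comes from the talk): fit() is the learning, and the fitted object is the model produced from it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 3x + 1 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.01, size=200)

# Side 1: the learning. fit() estimates the parameters from the data.
model = LinearRegression().fit(X, y)

# Side 2: the model produced from the learning. It can be inspected and
# reused on new inputs without repeating the learning step.
slope = float(model.coef_[0])
intercept = float(model.intercept_)
prediction = float(model.predict(np.array([[0.5]]))[0])
```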

Data Preparation: Overview
Review the Python software stack
Motivate the problem
Discuss some issues specific to time series modeling
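The slides do not list the time series issues they have in mind. One commonly cited issue is look-ahead: a model should not be trained on observations that come after the ones it is evaluated on. A minimal sketch, assuming scikit-learn's TimeSeriesSplit and a stand-in array of 100 time-ordered rows:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for 100 time-ordered observations (an assumption, not the talk's data).
n_samples = 100
X = np.arange(n_samples).reshape(-1, 1)

# Each fold trains on an initial segment and tests on the block that follows it.
splitter = TimeSeriesSplit(n_splits=4)
folds = list(splitter.split(X))

# True when every training index precedes every test index in its fold.
no_look_ahead = all(train.max() < test.min() for train, test in folds)
```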

Python Toolbox [1]
[1] Scientific Python by Eueung Mulyana

Trump2Cash [2]
[2] Trump2Cash GitHub Project

Input: Trump criticizes Toyota on Twitter

Output: Toyota stock opens lower [3]
[3] Toyota Stock on Yahoo Finance's Interactive Chart

WSJ Analysis of Trump Tweets [4]
[4] by Akane Otani and Shane Shifflett

IPython: A Data Scientist's Best Friend (Jupyter Notebook)

Data Preparation: Conclusion
We now have an illustrative data set to work with
Data set has 10 numeric dimensions: 9 inputs, 1 output
Data set is large (400MB compressed)

Exploratory Data Analysis: Overview
Covariance and Correlation Matrices
Scatter plots
Principal Component Analysis (PCA)
Kernel PCA
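As a sketch of the first item, covariance and correlation matrices for a 10-column data set (9 inputs, 1 output) can be computed with NumPy. The data below is synthetic and the nonlinear target is an assumption chosen for illustration. Pearson correlation only measures linear association, so a nonlinear dependence may show up weakly in this matrix.

```python
import numpy as np

# Synthetic stand-in: 9 random inputs and 1 output that depends nonlinearly
# on input 0 (an assumption, not the talk's data set).
rng = np.random.default_rng(0)
n_rows = 1000
inputs = rng.normal(size=(n_rows, 9))
output = np.sin(3.0 * inputs[:, 0]) + 0.1 * rng.normal(size=n_rows)
data = np.column_stack([inputs, output])

# rowvar=False treats each column as a variable and each row as an observation.
cov = np.cov(data, rowvar=False)
corr = np.corrcoef(data, rowvar=False)

# The correlation matrix is the covariance matrix rescaled by the standard deviations.
std = np.sqrt(np.diag(cov))
corr_from_cov = cov / np.outer(std, std)
```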

Using IPython Jupyter Notebook

Scatter Plot: What can we say about the data?

scikit-learn Algorithm Cheat-Sheet: Just looking [5]
[5] scikit-learn Cheat-Sheet

Principal Component Analysis (PCA)
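A minimal PCA sketch, assuming scikit-learn and synthetic inputs built from two hidden factors (the mixing weights and noise level are illustrative assumptions). The explained variance ratios show how much of the total variance each component captures.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Nine inputs driven by two latent factors plus small noise (assumed setup).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = np.vstack([np.linspace(1.0, 2.0, 9), np.linspace(-2.0, 2.0, 9)])
X = latent @ mixing + 0.05 * rng.normal(size=(500, 9))

# Standardize first so that no input dominates because of its scale.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=9).fit(X_scaled)

ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)
```

With this construction the first two components should account for nearly all of the variance, which is the kind of structure a PCA plot is meant to reveal.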

Kernel PCA with Radial Basis Function (RBF)
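A corresponding sketch for kernel PCA with an RBF kernel, assuming scikit-learn's KernelPCA. The gamma value and the random inputs are assumptions for illustration, not settings from the talk.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Random stand-in for the 9 input dimensions (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 9))

# The RBF kernel lets the projection follow nonlinear structure;
# gamma controls the kernel width and is chosen arbitrarily here.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1)
X_kpca = kpca.fit_transform(X)
```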

Exploratory Data Analysis: Conclusion
Nonlinear relationship with (0, 9), (2, 9), (6, 9)
All other dimensions are quite random

Fitting Models: Overview
scikit-learn's models and pipelines
Illustrative learning curves

scikit-learn Revisited [6]
[6] scikit-learn Cheat-Sheet

scikit-learn Pipeline [7]
[7] Python Machine Learning by Sebastian Raschka
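A pipeline chains preprocessing steps and a final estimator behind a single fit/predict interface. A minimal sketch, assuming scikit-learn; the step names, the synthetic linear data, and the choice of steps are illustrative, not the talk's configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic linear data with 9 inputs (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 9))
y = X @ np.arange(1.0, 10.0) + 0.1 * rng.normal(size=400)

# Each step is fitted on the training data only, then applied in order.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=9)),
    ("regress", LinearRegression()),
])
pipe.fit(X[:300], y[:300])

# score() returns R^2 on the held-out rows.
r2 = pipe.score(X[300:], y[300:])
```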

Holdout Method [8]
[8] Python Machine Learning by Sebastian Raschka
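The holdout method sets aside part of the data for evaluation. A sketch assuming scikit-learn's train_test_split; shuffle=False is an assumption made here so that, for time-ordered rows, the held-out block comes after the training block.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for 100 time-ordered rows (assumption).
X = np.arange(100).reshape(-1, 1)
y = 2.0 * np.arange(100)

# Hold out the final 25% of rows; with shuffle=False the order is preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)
```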

Cross-Validation [9]
[9] Python Machine Learning by Sebastian Raschka
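K-fold cross-validation repeats the holdout idea over several folds and reports one score per fold. A minimal sketch, assuming scikit-learn and synthetic linear data (both assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic linear data (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=300)

# Five folds: each fold is used once for validation and four times for training.
cv = KFold(n_splits=5)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
mean_score = scores.mean()
```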

Learning Curves: What do they tell us? [10]
[10] Python Machine Learning by Sebastian Raschka
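A learning curve compares training and validation scores as the training set grows. A sketch assuming scikit-learn's learning_curve; the estimator, its depth, and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic nonlinear data (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=500)

# Scores are computed at five training-set sizes, each with 5-fold CV.
train_sizes, train_scores, val_scores = learning_curve(
    DecisionTreeRegressor(max_depth=4, random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="r2",
)

# Average over folds to get one curve each for training and validation.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
```

A persistent gap between the two curves is usually read as overfitting, while two low curves that have converged are usually read as underfitting.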

Poor fit: Linear Regression even with (K)PCA
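One way a linear model can fit poorly is when the target depends on an input nonlinearly. The sketch below uses synthetic data with a quadratic target (an assumption, not the talk's data set) and compares held-out R^2 for a linear regression and a shallow decision tree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# The output depends on the square of input 0; inputs 1 and 2 are noise (assumed setup).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(600, 3))
y = X[:, 0] ** 2 + 0.01 * rng.normal(size=600)

X_train, X_test = X[:450], X[450:]
y_train, y_test = y[:450], y[450:]

# A straight line cannot follow a symmetric curve, so its R^2 stays near zero.
linear_r2 = LinearRegression().fit(X_train, y_train).score(X_test, y_test)

# A tree approximates the curve with piecewise-constant segments.
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_train, y_train)
tree_r2 = tree.score(X_test, y_test)
```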

Good fits: SVR (RBF) and Decision Tree Learning Curves

Classic Overfitting: Random Forest Regressor

Decision Trees: Easy to understand
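Part of what makes a decision tree easy to understand is that its fitted rules can be printed as text. A sketch assuming scikit-learn's export_text; the feature names x0, x1, x2 and the step-function target are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# The output is a step function of input 0 (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.where(X[:, 0] > 0.0, 1.0, -1.0)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# export_text renders the learned splits as indented if/else rules.
rules = export_text(tree, feature_names=["x0", "x1", "x2"])
print(rules)
```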

Fitting Models: Conclusion
Support Vector Regression (SVR) with a Radial Basis Function (RBF) kernel has higher accuracy
Decision Tree is easier to understand
Choice involves our own priors on the underlying structure

Second Half of Machine Learning: A Persistent Model Jupyter Notebook
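The slides do not say how the fitted model was persisted. One common option is to serialize the fitted estimator with Python's pickle module, sketched below on synthetic data (the model choice and data are assumptions). Pickled files should only be loaded from trusted sources.

```python
import pickle

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic training data (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(3.0 * X[:, 0])

model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Serialize the fitted model to bytes; the same bytes could be written to a file.
blob = pickle.dumps(model)

# Later, or in another process, restore the model and predict without refitting.
restored = pickle.loads(blob)
same_predictions = bool(np.allclose(model.predict(X), restored.predict(X)))
```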

Thanks for listening: Q&A https://github.com/rheineke/time series modeling