Python Machine Learning Step-by-Step: Modeling Financial Time Series Data
Reece Heineke, Director of Big Data, Credibly
February 27, 2017
What is Machine Learning?
Data Preparation: Overview, Python Toolbox, Trade Ideas to Data, Conclusion
Exploratory Data Analysis: Overview, Scatter Plot, Principal Component Analysis (PCA), Conclusion
Fitting Models: Overview, Models and Pipelines, Learning Curves, Interpretability, Conclusion
A Fitted Model
What is Machine Learning?
1. Machine learning is a subfield of computer science that gives computers the ability to learn without being explicitly programmed.
2. There are two sides to every machine learning problem:
2.1 The learning
2.2 The model produced by the learning
Data Preparation: Overview
- Review the Python software stack
- Motivate the problem
- Discuss some issues specific to time series modeling
Python Toolbox (source: Scientific Python by Eueung Mulyana)
Trump2Cash (source: Trump2Cash GitHub project)
Input: Trump criticizes Toyota on Twitter
Output: Toyota stock opens lower (source: Toyota stock on Yahoo Finance's interactive chart)
WSJ Analysis of Trump Tweets (by Akane Otani and Shane Shifflett)
IPython: A Data Scientist's Best Friend (Jupyter Notebook)
Data Preparation: Conclusion
- We now have an illustrative data set to work with
- The data set has 10 numeric dimensions: 9 inputs, 1 output
- The data set is large (400MB compressed)
Exploratory Data Analysis: Overview
- Covariance and correlation matrices
- Scatter plots
- Principal Component Analysis (PCA)
- Kernel PCA
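As a rough illustration of the first item, covariance and correlation matrices can be computed directly with NumPy. This is only a sketch on synthetic data: the array shape mirrors the 9-input, 1-output layout described earlier, but the values and the target formula are made up for the example and are not the talk's data set.

```python
import numpy as np

# Synthetic stand-in for the data set (assumed layout: 9 inputs, 1 output).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 9))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)  # hypothetical target
data = np.column_stack([X, y])  # columns 0-8 are inputs, column 9 is the output

# rowvar=False treats each column as one variable.
cov = np.cov(data, rowvar=False)
corr = np.corrcoef(data, rowvar=False)

print(cov.shape, corr.shape)
print(np.round(corr[:, 9], 2))  # linear correlation of each column with the output
```

Correlation only measures linear association, so a low value in the last column does not rule out a nonlinear relationship; that is one reason to look at scatter plots as well.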
Using IPython (Jupyter Notebook)
Scatter Plot: What can we say about the data?
scikit-learn Algorithm Cheat-Sheet: Just looking (source: scikit-learn Cheat-Sheet)
Principal Component Analysis (PCA)
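A minimal PCA sketch with scikit-learn, assuming a 9-column input matrix. The data here is random noise generated for the example, and standardizing before PCA is a common choice rather than something the slides specify.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 9 input dimensions (not the talk's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 9))

# Standardize so that no single dimension dominates the variance.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)
print(pca.explained_variance_ratio_)  # share of variance captured by each component
```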
Kernel PCA with Radial Basis Function (RBF)
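Kernel PCA with an RBF kernel can be sketched the same way. The two-ring data set and the `gamma` value below are arbitrary choices for illustration; in practice `gamma` usually needs tuning.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Two noisy concentric rings: structure that linear PCA cannot unfold.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=200)
radii = np.repeat([1.0, 3.0], 100)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X += 0.05 * rng.normal(size=X.shape)

# gamma controls the width of the RBF kernel (arbitrary value here).
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.5)
X_kpca = kpca.fit_transform(X)

print(X_kpca.shape)
```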
Exploratory Data Analysis: Conclusion
- Nonlinear relationships in dimension pairs (0, 9), (2, 9), and (6, 9)
- All other dimensions look quite random
Fitting Models: Overview
- scikit-learn's models and pipelines
- Illustrative learning curves
scikit-learn Revisited (source: scikit-learn Cheat-Sheet)
scikit-learn Pipeline (source: Python Machine Learning by Sebastian Raschka)
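A pipeline chains preprocessing steps and a final estimator behind a single `fit`/`predict` interface. The sketch below assumes a scaler followed by an RBF-kernel SVR; the step names and the synthetic data are made up for the example.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in: 9 inputs, one nonlinear target (not the talk's data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 9))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

# Each step is a (name, estimator) pair; the step names are arbitrary labels.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svr", SVR(kernel="rbf")),
])

pipe.fit(X, y)              # fits the scaler, then the SVR on the scaled data
pred = pipe.predict(X[:5])  # applies the same scaling before predicting
print(pred)
```

Because the scaler is fitted inside the pipeline, its statistics come only from the data passed to `fit`, which helps avoid leaking information from validation data.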
Holdout Method (source: Python Machine Learning by Sebastian Raschka)
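A holdout split can be sketched with `train_test_split`. Passing `shuffle=False` keeps the rows in their original order, so the test set is the most recent slice; that choice is an assumption made here because the data is a time series, and the 80/20 ratio is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in, assumed to be ordered by time (not the talk's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 9))
y = rng.normal(size=500)

# shuffle=False: the first 80% of rows train the model, the last 20% test it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(X_train.shape, X_test.shape)
```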
Cross-Validation (source: Python Machine Learning by Sebastian Raschka)
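A k-fold cross-validation sketch using `cross_val_score`, with 5 folds and R^2 scoring picked for illustration. The data is synthetic. For time-ordered data, scikit-learn's `TimeSeriesSplit` can be passed as `cv` instead of an integer so that each fold validates on later rows than it trains on.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in: 9 inputs, one nonlinear target (not the talk's data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 9))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])

# Each of the 5 folds is held out once while the other 4 train the pipeline.
scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")

print(scores)
print(scores.mean(), scores.std())
```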
Learning Curves: What do they tell us? (source: Python Machine Learning by Sebastian Raschka)
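A learning curve plots training and validation scores against the number of training samples. The numbers behind such a plot can be produced with `learning_curve`; the estimator, its depth, and the synthetic data below are assumptions for the sketch.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: 9 inputs, one nonlinear target (not the talk's data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 9))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

# Evaluate at 5 training-set sizes, each with 5-fold cross-validation.
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeRegressor(max_depth=4, random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="r2",
)

# Rows correspond to training sizes, columns to folds.
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{int(n):4d} samples: train R^2 = {tr:.2f}, validation R^2 = {va:.2f}")
```

A large, persistent gap between the two curves suggests overfitting, while two low curves that have converged suggest the model is too simple for the data.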
Poor fit: Linear Regression even with (K)PCA
Good fits: SVR (RBF) and Decision Tree Learning Curves
Classic Overfitting: Random Forest Regressor
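One common symptom of overfitting is a training score far above the held-out score. The sketch below reproduces that gap on deliberately noisy synthetic data. It is not the talk's experiment, and the forest settings are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Weak signal buried in heavy noise (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 9))
y = np.sin(X[:, 0]) + rng.normal(scale=1.0, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fully grown trees can fit much of the noise in the training set.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

train_r2 = forest.score(X_train, y_train)
test_r2 = forest.score(X_test, y_test)
print(f"train R^2 = {train_r2:.2f}, test R^2 = {test_r2:.2f}")
```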
Decision Trees: Easy to understand
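Part of what makes a decision tree easy to read is that its fitted rules can be printed as nested if/else thresholds. A small sketch with `export_text`, using a made-up three-feature data set and hypothetical feature names `x0` to `x2`:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data: the target depends only on the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0) + 0.05 * rng.normal(size=400)

# A shallow tree keeps the printed rule set short.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Feature names are hypothetical labels for the three columns.
rules = export_text(tree, feature_names=["x0", "x1", "x2"])
print(rules)
```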
Fitting Models: Conclusion
- Support Vector Regression (SVR) with a Radial Basis Function (RBF) kernel has higher accuracy
- The Decision Tree is easier to understand
- The choice involves our own priors on the underlying structure
Second Half of Machine Learning: A Persistent Model (Jupyter Notebook)
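One simple way to persist a fitted model is to serialize it with Python's `pickle` module and load it back later. This is a sketch of that idea, not necessarily the approach used in the notebook; the file name and the model are placeholders.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Fit a small model on synthetic data (not the talk's data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 9))
y = np.sin(X[:, 0])
model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# Write the fitted model to disk, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"  # hypothetical file name
    path.write_bytes(pickle.dumps(model))
    restored = pickle.loads(path.read_bytes())

# The restored model should reproduce the original predictions.
same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```

Pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is not guaranteed to load correctly under a different scikit-learn version.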
Thanks for listening: Q&A
https://github.com/rheineke/time_series_modeling