Prediction of Bike Sharing Systems for Casual and Registered Users
Mahmood Alhusseini (mih@stanford.edu)
CS229: Machine Learning

Abstract - In this project, two different approaches to predicting bike sharing demand are studied. The first approach predicts the exact number of bikes that will be rented, using Support Vector Machines (SVM). The second approach classifies the demand into five levels, from 1 (lowest) to 5 (highest), using Softmax Regression and Support Vector Machines.

Index Terms - regression, classification, prediction, SVM, Softmax

I. Introduction

Bike sharing systems have seen increasing demand over the past two decades as a result of rapid advancements in technology (Figure 1). However, as seen in Figure 2, demand still fluctuates over the year due to factors such as temperature and time. The goal of this project is to present a model for predicting these fluctuations in demand for both casual and registered consumers, so that the service can be optimized for both system providers and consumers. Two approaches were used: 1) a continuous model that predicts exact demand, and 2) classification into five levels of demand. In the continuous model, SVM regression was used to predict the data, while SVM classification and Softmax Regression were used for approach 2. (SVMs were implemented using the libsvm package available online; approach I used the SVR option, while approach II used the SVM classification option.)

Figure 1: Constant demand increase for bike sharing systems from 2000 to 2010 worldwide [1].

Figure 2: Fluctuations in demand for bike sharing systems over different months of the year [2].

II. Data and Features

The data was obtained from the Kaggle Bike Sharing Demand competition [3] (https://www.kaggle.com/c/bike-sharing-demand/data, accessed December 10, 2014). The data contains 10886 sample points, of which 8886 were used for training the model and 2000 for testing it. The original data contained 9 features and 3 labels, described as follows:

Feature/Label   Value
Date and time   Format is MM/DD/YYYY HH:MM
Season          Takes 4 values: 1 = spring, 2 = summer, 3 = fall, 4 = winter
Holiday         1 = yes, 0 = no
Working day     1 = yes, 0 = no
Weather         1 = clear, few clouds, partly cloudy;
                2 = mist + cloudy, mist + broken clouds, mist + few clouds;
                3 = light snow, light rain + thunderstorm + scattered clouds, light rain + scattered clouds;
                4 = heavy rain + ice pellets + thunderstorm + mist, snow + fog
Temp            Temperature in Celsius
Atemp           "Feels like" temperature
Humidity        Relative humidity
Wind speed      No units given
Casual          Number of non-registered user rentals initiated
Registered      Number of registered user rentals initiated
Count           Total number of rentals (casual + registered)

Table 1: Description of features and labels as given on the competition data page [3].

As seen in Table 1, the date and time of each sample were given as a single feature, which made the feature difficult to interpret and incorporate into a model. Therefore, the date and time were split into four separate features: year, month, day, and time. This brought the total number of features to 13, making it easier to integrate the features into the model.
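As an illustration, here is a minimal sketch of this split, assuming pandas and the raw Kaggle CSV (the file name, column names, and the use of pandas are assumptions; the report does not describe its preprocessing code):

```python
import pandas as pd

# Load the Kaggle bike sharing data (file name assumed).
data = pd.read_csv("train.csv", parse_dates=["datetime"])

# Split the single date/time field into four separate features,
# as described above: year, month, day, and time (hour of day).
data["year"] = data["datetime"].dt.year
data["month"] = data["datetime"].dt.month
data["day"] = data["datetime"].dt.day
data["hour"] = data["datetime"].dt.hour
data = data.drop(columns=["datetime"])
```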
Moreover, the features and labels in the data were normalized. Each label y was normalized according to:

y' = \frac{y - \min(Y)}{\max(Y) - \min(Y)}

where y' is the normalized value, y is the original value, and Y is the vector containing the values y.
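A minimal sketch of this min-max normalization, continuing from the data frame above (the variable names are assumptions):

```python
import numpy as np

def min_max_normalize(y):
    """Scale a label vector to [0, 1] via y' = (y - min) / (max - min)."""
    y = np.asarray(y, dtype=float)
    return (y - y.min()) / (y.max() - y.min())

# Normalize each label independently; the total count label is not used.
casual_norm = min_max_normalize(data["casual"])
registered_norm = min_max_normalize(data["registered"])
```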
Because this project deals only with predicting the number of casual users and the number of registered users independently (i.e., each label was treated as a separate problem with the same features), normalization was performed only for the casual and registered labels; the total count label was not considered.

III. Methods

a. Approach I: Continuous model prediction using SVM

Initially, Linear Regression was used to fit the data, but the results obtained were very erroneous. It was therefore expected that using SVMs in their regression form would give better results, since different linear and nonlinear kernels can be used; a Gaussian kernel was used in the algorithm. The effectiveness of the SVM was then measured using the Root Mean Squared Logarithmic Error (RMSLE):

\text{RMSLE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( \ln(\hat{y}_i + 1) - \ln(y_i + 1) \right)^2}

where m is the number of samples, \hat{y}_i is the predicted value, and y_i is the actual value. The algorithms and calculations were performed for both casual and registered users independently. Feature analysis was performed to find which parameters contribute the most to accuracy improvement.
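A minimal sketch of this setup, assuming scikit-learn's SVR as a stand-in for the libsvm SVR option used in the original work (X_train, y_train, X_test, and y_test are assumed to exist; the parameter values shown are placeholders for the tuning discussed below):

```python
import numpy as np
from sklearn.svm import SVR

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error, as defined above."""
    y_pred = np.maximum(y_pred, 0)  # guard against negative predictions
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# Gaussian (RBF) kernel SVM regression, trained independently per label
# (casual and registered).
svr = SVR(kernel="rbf", C=100, gamma=1)
svr.fit(X_train, y_train)
print("test RMSLE:", rmsle(y_test, svr.predict(X_test)))
```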
b. Approach II: Classification into five different classes

Turning the problem into a classification problem allows for a better interpretation of how well the model is performing. At first, it was decided to turn the problem into ten different classes, from 1 (lowest) to 10 (highest); however, this gave a maximum test accuracy of about 60%. In an attempt to simplify the problem and obtain better accuracy, the classification labels were reduced to five, from 1 (lowest) to 5 (highest), which allowed for a larger error margin in the model. Two learning algorithms were used in approach II: the first is Softmax Regression, and the second is SVMs (with a Gaussian kernel); different SVM settings were used to optimize the results. Similar to what was done in approach I, feature analysis was performed to find which features contribute the most to accuracy improvement. The performance of both algorithms is calculated using:

\text{Accuracy} = \frac{\text{number of samples labeled correctly}}{\text{total number of samples}}
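The report does not state how the five demand levels were derived from the rental counts; the sketch below assumes equal-frequency (quintile) binning as one plausible choice:

```python
import numpy as np

def to_demand_levels(counts, n_levels=5):
    """Bin raw rental counts into demand levels 1..n_levels by quantile."""
    cut_points = np.quantile(counts, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(counts, cut_points) + 1  # levels 1 (lowest) .. 5 (highest)

levels_casual = to_demand_levels(data["casual"].to_numpy())

def accuracy(y_true, y_pred):
    """Fraction of samples labeled correctly, as defined above."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))
```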
IV. Results and Discussion

a. Approach I: Continuous model prediction using SVM

The SVM algorithm was run many times under different options for c, g, kernel type, and kernel degree. The findings showed that increasing c resulted, on the whole, in a better fit of the training data. The parameter g, on the other hand, had to be adjusted to avoid under- or over-fitting the training data, which would translate into a larger RMSLE on the testing data. Figures 3 and 4 summarize the results.

Figure 3: Best results for RMSLE error as a function of c in SVM.

Figure 4: Best results for RMSLE error as a function of g in SVM.

The best RMSLE obtained was around 1.9, with c = 100 and g = 1. This is still high compared with results submitted on the Kaggle website where the data was obtained (as of December 12, 2014, leaderboard results ranged from 0.24976 to 4.76189: https://www.kaggle.com/c/bike-sharing-demand/leaderboard; a solution posted online gives an RMSLE of 0.70 using Random Forests: http://www.techdreams.org/programming/solving-kaggles-bike-sharing-demand-machine-learning-problem/9343-20140821). It is thought that the results could be improved further by using different options for the SVM, or even different machine learning algorithms such as Random Forests or Neural Networks. However, given the time constraint of the project and the long computational time needed to run the program, these were the best solutions achieved.
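A minimal sketch of this parameter sweep, again assuming scikit-learn in place of the original libsvm workflow (the grids are assumptions, chosen to bracket the best values reported above; rmsle() is the function from the earlier sketch):

```python
import numpy as np
from sklearn.svm import SVR

best_params, best_err = None, float("inf")
for C in [0.1, 1, 10, 100, 1000]:
    for gamma in [0.01, 0.1, 1, 10]:
        model = SVR(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        err = rmsle(y_test, model.predict(X_test))
        if err < best_err:
            best_params, best_err = (C, gamma), err
print("best (C, gamma):", best_params, "with RMSLE:", best_err)
```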
Feature selection was performed on the data to see which features contributed to the best results. Table 2 summarizes the results.

Feature removed          Best casual RMSLE   Best registered RMSLE
Overall (none removed)   1.93                3.92
Year                     1.93                3.92
Month                    1.93                3.92
Day                      1.94                3.92
Time                     1.94                4.52
Season                   1.94                4.53
Holiday                  1.94                4.72
Working day              2.03                4.83
Weather                  2.21                5.07
Temperature              2.37                5.29
Feels like temp.         3.92                6.22
Humidity                 4.72                6.86

Table 2: Feature selection for SVM in approach I.

The results follow the expected trend: fewer features resulted in a higher error. The two features whose removal contributed most to the error were humidity and "feels like" temperature, followed by weather, time, and holiday. The year, month, and day had little or no effect on the error.
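The report does not spell out the removal procedure; the sketch below assumes features are dropped cumulatively in the order of Table 2, retraining and re-scoring after each removal (column names are the assumed engineered features; SVR and rmsle() as in the earlier sketches):

```python
all_features = ["year", "month", "day", "hour", "season", "holiday",
                "workingday", "weather", "temp", "atemp", "humidity", "windspeed"]
removal_order = ["year", "month", "day", "hour", "season", "holiday",
                 "workingday", "weather", "temp", "atemp", "humidity"]

remaining = list(all_features)
for feature in removal_order:
    remaining.remove(feature)
    model = SVR(kernel="rbf", C=100, gamma=1).fit(train_df[remaining], y_train)
    err = rmsle(y_test, model.predict(test_df[remaining]))
    print(f"removed {feature}: RMSLE = {err:.2f}")
```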
b. Approach II: Classification Models

i. Softmax Regression Results

The Softmax algorithm was run first in order to get a good initial estimate of the accuracy of the model. The accuracy on the training data was found to be 100% for both casual and registered users, while the testing accuracy was around 85% and 86% for casual and registered users, respectively.

Figure 5: Softmax regression accuracy for classification.
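A minimal sketch of the Softmax classifier, assuming scikit-learn's multinomial logistic regression as the implementation (the report does not name its softmax implementation; levels_train and levels_test are the assumed 1-5 demand labels):

```python
from sklearn.linear_model import LogisticRegression

# Multinomial logistic regression is softmax regression over the 5 levels.
softmax = LogisticRegression(multi_class="multinomial", solver="lbfgs",
                             max_iter=1000)
softmax.fit(X_train, levels_train)
print("train accuracy:", softmax.score(X_train, levels_train))
print("test accuracy:", softmax.score(X_test, levels_test))
```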
ii. Support Vector Machines

The second method used was SVM classification. The algorithm was run for different values of c and g, and the optimized solution was found at c = 1 and g = 0.25. The training accuracy was around 99% for both casual and registered users, while the testing accuracy was around 91% for casual users and 86% for registered users.

Figure 6: SVM accuracy as a function of parameter c.

Figure 7: SVM accuracy as a function of parameter g.

In order to understand which classes were harder to predict than others, a classification matrix was prepared. As seen in Table 3, most of the errors were in the classification of demand classes 2 and 3.

Actual \ Predicted   1      2      3      4      5
1                    90%    9%     1%     0%     0%
2                    32%    39%    20%    8%     2%
3                    4%     23%    40%    28%    4%
4                    0%     0%     0%     67%    33%
5                    0%     0%     0%     0%     0%

Table 3: Classification table for SVM classification.

This result could be due to several reasons, including the limited number of samples with labels 2 and 3, which does not give the learning algorithm enough training examples to build a more solid model.

Similar to what was done in approach I, feature selection was implemented. The results for casual users were unexpected, as accuracy increased to 100% as features were removed from the model; the largest increase, around 5%, came from removing the day feature. For registered users, the results were as expected: accuracy decreased with fewer features, and the largest decreases, about 2% each, came from removing the day and working day features. Table 4 summarizes the results.

Feature removed          Casual accuracy   Registered accuracy
Overall (none removed)   0.9395            0.8624
Year                     0.9415            0.8599
Month                    0.949             0.858
Day                      0.9875            0.833
Time                     0.9875            0.8345
Season                   0.99              0.8215
Holiday                  0.99              0.821
Working day              0.991             0.8055
Weather                  0.991             0.813
Temperature              0.9945            0.8075
Feels like temp.         1                 0.8275
Humidity                 1                 0.8445

Table 4: Feature selection implemented in the SVM algorithm.

The increase in accuracy with fewer features for casual users was not expected. One reason could be that the data are very random and that the features used do not capture the attributes needed to predict how casual users choose to rent bikes; such missing attributes include nationality, socioeconomic status, and length of visit. More data and features are needed to understand the situation with casual users.
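A minimal sketch of how a row-normalized classification matrix like Table 3 can be computed, assuming scikit-learn's SVC at the optimized setting reported above (variable names are assumptions):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# RBF-kernel SVM classifier at the optimized parameters c = 1, g = 0.25.
svm_clf = SVC(kernel="rbf", C=1, gamma=0.25).fit(X_train, levels_train)
cm = confusion_matrix(levels_test, svm_clf.predict(X_test),
                      labels=[1, 2, 3, 4, 5])
# Row-normalize so each entry is a percentage of the actual class.
row_pct = 100 * cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)
print(np.round(row_pct).astype(int))
```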
V. Conclusion and Future Work

Two approaches were taken to predict bike sharing demand. In the first approach, the continuous model, initial results gave a high RMSLE of 7.65, which was then lowered to 1.9 by changing the parameters of the system. The results contained high variance, which could be due to not having enough training samples or to not selecting features that would optimize the problem. It is very possible that better results could be obtained if the SVM were further optimized; in addition, other algorithms such as Random Forests and Neural Networks could give better results.

As for the second approach, demand classification, classifying the data into 5 groups gave good results of up to around 95% accuracy, with SVM outperforming Softmax Regression. These results also contained high variance, which could be due to the same reasons mentioned above. One interesting observation is that decreasing the number of features increased accuracy for casual users, while (as expected) it decreased accuracy for registered users.

Future work aims at optimizing the SVM algorithm to produce a model with lower testing error, as well as at trying other algorithms (Random Forests and Neural Networks). Moreover, the effect of feature selection on casual users in approach II needs further investigation, to understand how the algorithm produces unexpectedly better results with fewer features. Finally, for approach II it would be beneficial to increase the number of classification labels to 10 (or more) different classes while still obtaining a high level of accuracy.

VI. References

[1] http://www.kevinauyeung.com/cycleHireScheme.html, accessed December 07, 2014.
[2] http://chi.streetsblog.org/wp-content/uploads/2014/09/screenshot-2014-09-03-11.01.42.png, accessed December 07, 2014.
[3] https://www.kaggle.com/c/bike-sharing-demand, accessed December 07, 2014. (Source of the data.)
[4] R. Regue and W. Recker, "Using Gradient Boosting Machines to Predict Bikesharing Station States," UC Irvine, TRB 2014, Figure 2, p. 11.
[5] http://beyondvalence.blogspot.com/2014/07/predicting-capital-bikeshare-demand-in_10.html, accessed December 07, 2014.
[6] http://beyondvalence.blogspot.com/2014/06/predicting-capital-bikeshare-demand-in.html, accessed December 12, 2014.
[7] http://brandonharris.io/kaggle-bike-sharing/, accessed December 12, 2014.