Multivariate k-nearest Neighbor Regression for Time Series data


Multivariate k-nearest Neighbor Regression for Time Series data - a novel Algorithm for Forecasting UK Electricity Demand
ISF 2013, Seoul, Korea
Fahad H. Al-Qahtani, Dr. Sven F. Crone
Management Science, Lancaster University

Agenda: Multivariate KNN Regression for Time Series
1. Introduction
   - KNN for Classification
   - KNN for Regression
   - Formulation and algorithm
   - Meta-parameters
   - KNN Univariate and Multivariate Models
2. KNN for Electricity Load Forecasting
   - Problem and Related work review
   - Experiment Setup
   - Data Description
   - Univariate Model
   - Multivariate Model with One Dummy Variable (WorkDay)
   - Result
3. Conclusions and Future Work

Introduction: KNN for Classification
Introduced by Fix and Hodges (1951) and later formalised by Cover and Hart (1967).
Figure 1: KNN algorithm with k=4 and Euclidean distance
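To make the classification idea concrete, here is a minimal sketch of majority-vote KNN with k=4 and Euclidean distance, matching the setting named in the figure caption. The function name `knn_classify` and the toy points are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=4):
    """Return the majority label among the k training points nearest to query."""
    # Pair every training point's Euclidean distance to the query with its label.
    neighbours = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and pick the most frequent one.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters labelled "A" and "B".
points = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, query=(2, 2)))
```

With an even k such as 4, a 2-2 vote is possible; in this sketch the tie goes to the label of the nearer neighbour, which is one choice among several.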

KNN for Regression:

Introduction: KNN for Regression
Formulation: the algorithm is defined by the number of neighbors (K), a distance measure, a feature vector (a window of length W), and an operator that combines the selected neighbors to estimate the forecast.
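The four ingredients of the formulation can be sketched for a univariate series as follows. This is an illustration under stated assumptions rather than the authors' implementation: the distance measure is Euclidean, the combining operator is the plain mean, and the names `knn_forecast`, `window`, and `horizon` are hypothetical.

```python
import math


def knn_forecast(series, window, k, horizon=1):
    """Forecast the next `horizon` values from the k most similar past windows."""
    # The reference pattern is the most recent window of W observations.
    reference = series[-window:]
    candidates = []
    # Slide over every past window that still has `horizon` known values after it.
    for start in range(len(series) - window - horizon + 1):
        pattern = series[start:start + window]
        continuation = series[start + window:start + window + horizon]
        candidates.append((math.dist(pattern, reference), continuation))
    # Keep the k nearest patterns (assumed distance measure: Euclidean).
    candidates.sort(key=lambda c: c[0])
    nearest = [cont for _, cont in candidates[:k]]
    # Assumed combining operator: unweighted mean of the neighbours' continuations.
    return [sum(values) / len(nearest) for values in zip(*nearest)]


# Hypothetical toy series with period 3.
toy_series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
print(knn_forecast(toy_series, window=2, k=2))
```

Other combining operators, such as a median or a distance-weighted mean, would slot into the last line without changing the rest of the sketch.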

Introduction: Multivariate Model
Why do we need a multivariate model?
Consider this example: a reference with window (W) = 1. Using KNN, the reference matches two different patterns (a work day and a non-work day), which gives an ambiguous result.
How can we resolve this ambiguity? (2 ways)
1. Increase the W length, but this involves more computational cost, and it can get worse in the presence of bank holidays.
2. Reveal some pre-known information about the day being predicted and eliminate the false pattern.

Introduction: Multivariate Model
Introducing a multivariate model consisting of:
- Previous load observations
- Calendar information about the next day
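One way to picture such a feature vector is to append a 0/1 work-day flag for the day being predicted to the previous day's load profile. The sketch below assumes exactly that, with Euclidean distance and a mean combiner; the slides do not say how the calendar information enters the distance, so `flag_weight` and every name here are hypothetical.

```python
import math


def multivariate_knn_forecast(daily_profiles, workday_flags,
                              next_day_is_workday, k, flag_weight=1.0):
    """Forecast the next day's profile from load history plus a work-day flag.

    daily_profiles[d] holds the loads of day d; workday_flags[d] is 1 if day d
    is a work day and 0 otherwise.
    """
    def features(profile, next_flag):
        # Previous load observations followed by calendar information
        # about the day that comes next (assumed encoding: scaled 0/1 dummy).
        return list(profile) + [flag_weight * next_flag]

    reference = features(daily_profiles[-1], next_day_is_workday)
    candidates = []
    for d in range(len(daily_profiles) - 1):
        vector = features(daily_profiles[d], workday_flags[d + 1])
        candidates.append((math.dist(vector, reference), daily_profiles[d + 1]))
    candidates.sort(key=lambda c: c[0])
    nearest = [following for _, following in candidates[:k]]
    return [sum(values) / len(nearest) for values in zip(*nearest)]


# Hypothetical two-value "profiles": identical work days precede both a
# work day and a non-work day, which is the ambiguous case.
profiles = [[10, 10], [10, 10], [5, 5], [10, 10], [10, 10], [5, 5], [10, 10]]
flags = [1, 1, 0, 1, 1, 0, 1]
print(multivariate_knn_forecast(profiles, flags, next_day_is_workday=0, k=2))
print(multivariate_knn_forecast(profiles, flags, next_day_is_workday=1, k=2))
```

Setting `flag_weight=0.0` reduces the sketch to the univariate case, where the two kinds of following day are averaged together.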


Electricity Load Forecasting
Problem: Accurate load forecasting is essential for the planning and operations of utility companies. >1% in forecast error can increase the operating cost of a power utility by 10 million.
Challenges:
- Data with triple seasonality (daily, weekly and annual)
- Outliers, bank holidays and exogenous drivers (temperature, economy, special events)
Models: from conventional statistical models to advanced computational models

Review of Related Work: KNN for Time Series Forecasting
Application areas: most applications are in the following areas:
- Finance (Fernández-Rodríguez, Sosvilla-Rivero et al. 1999; Andrada-Félix, Fernández-Rodríguez et al. 2003)
- Hydrology and Earth Science (Jayawardena, Li et al. 2002; She and Yang 2010)
- Climatology (Dimri, Joshi et al. 2008)

Review of Related Work: Within the Electricity Demand Forecasting Application Area
- Four journal papers: (Lora 2006; Lora, Santos et al. 2007; Sorjamaa, Hao et al. 2007; Jursa and Rohrig 2008)
- Eight conference contributions: (Tsakoumis, Vladov et al. 2002; Fidalgo and Matos 2007; Bhanu, Sudheer et al. 2008; El-Attar, Goulermas et al. 2009; Kang, Guo et al. 2009; Swief, Hegazy et al. 2009; Karatasou and Santamouris 2010; Zu, Bi et al. 2012)
Across these studies:
- No systematic way to set the KNN algorithm parameters
- Bank holidays and weekends are excluded
- Only previous observations are relied on

Experiment Setup: Objectives
- Evaluate the influence of adding features to the KNN algorithm by comparing the accuracy and performance of the univariate and multivariate models (the latter with only the workday feature).
- Set the parameters of the KNN algorithm for the univariate and multivariate models and produce forecasts for the UK electricity data.
- Evaluate the performance of both models against statistical benchmarks.

Experiment Setup: UK Electricity Demand Data
Hourly electrical load time series for 2 years, taken from data covering 2001 to 2008

Experiment Setup: UK Electricity Demand Data
Training data set: all days in 2004
Testing data set: all days in 2005

Experiment Setup Univariate Model Tuning

Experiment Setup Multivariate Model with the WorkDay feature

Experiment Setup Multivariate Model Setting K:

Experiment Setup: Statistical Benchmarks
- 2 seasonal naïve models (random walk): $\mathrm{RW}_{24}$ and $\mathrm{RW}_{168}$, where $\mathrm{RW}_s:\ y_{t+h} = y_{t-s+h}$
- 2 seasonal k-average models: $\mathrm{MOVAV}(7)_{24}$ and $\mathrm{MOVAV}(7)_{168}$, where $\mathrm{MOVAV}(k)_s:\ y_{t+h} = \frac{1}{k}\sum_{i=1}^{k} y_{t-is+h}$
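The two benchmark families can be sketched in a few lines. This assumes the forecast horizon satisfies 1 <= h <= s and reads the averaged term as the observation i seasons back (index t - i*s + h); the function names are hypothetical.

```python
def seasonal_naive(history, s, h):
    """RW_s benchmark: repeat the value observed one season (s steps) earlier."""
    t = len(history)  # history[t - 1] holds y_t
    return history[t - s + h - 1]


def seasonal_moving_average(history, s, h, k):
    """MOVAV(k)_s benchmark: mean of the values at the same seasonal
    position in each of the last k seasons."""
    t = len(history)
    values = [history[t - i * s + h - 1] for i in range(1, k + 1)]
    return sum(values) / k


# Hypothetical toy history y_1..y_12 with season length 4.
toy_history = list(range(1, 13))
print(seasonal_naive(toy_history, s=4, h=1))
print(seasonal_moving_average(toy_history, s=4, h=1, k=2))
```

For hourly load data, s = 24 gives the daily-season variant and s = 168 the weekly-season variant, so MOVAV(7) with s = 168 needs at least seven weeks of history.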

Experiment Setup Result:

Experiment Setup: Result
Computation cost:
- Univariate model: 6.5 minutes
- Multivariate model: 2.7 minutes
- 59% improvement

Experiment Setup: Extended Multivariate Model
- Work Day Type
- Position within the Year (Linear and Circular)
- Position within the Data
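The difference between a linear and a circular position within the year can be illustrated as below. The slides do not say how either encoding is computed, so this sketch assumes a fraction of the year for the linear form and a sine/cosine pair for the circular form; `position_in_year` is a hypothetical helper.

```python
import math
from datetime import date


def position_in_year(day):
    """Return (linear, sin, cos) encodings of a date's position within its year."""
    days_in_year = (date(day.year + 1, 1, 1) - date(day.year, 1, 1)).days
    # Linear position: 0.0 on 1 January, approaching 1.0 at year end.
    linear = (day.timetuple().tm_yday - 1) / days_in_year
    # Circular position: the same fraction mapped onto the unit circle.
    angle = 2 * math.pi * linear
    return linear, math.sin(angle), math.cos(angle)


start = position_in_year(date(2005, 1, 1))
end = position_in_year(date(2005, 12, 31))
print(end[0] - start[0])                  # linear gap across the year
print(math.dist(start[1:], end[1:]))      # circular gap across the year
```

Under the linear encoding, 31 December and 1 January sit at opposite ends of the scale, while the circular encoding places them next to each other, which matters for a distance-based method such as KNN.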


Conclusions and Future Work: Concluding Remarks
- The KNN algorithm is intuitive, easy to implement, and can give reliable results for electricity demand forecasting when its parameters are set correctly.
- Including extra information about the day being predicted in the KNN algorithm can increase its accuracy and improve its performance.

Conclusions and Future Work: Future Work
- Include exogenous variables such as temperature and humidity.
- Improve KNN performance by implementing an active learning mechanism for selecting the most informative training data.
- Integrate KNN with other forecasting frameworks such as NN and SVM.

Questions?
Fahad H. Al-Qahtani
Lancaster University Management School, Centre for Forecasting - Lancaster, LA1 4YX
email: alqahta2@exchange.lancs.ac.uk