Improving Fairness in Memory Scheduling

Improving Fairness in Memory Scheduling Using a Team of Learning Automata
Aditya Kajwe and Madhu Mutyam
Department of Computer Science & Engineering, Indian Institute of Technology - Madras
June 14, 2014

Outline
1 Introduction
2 Related Work
3 Our Learning Automata-based Algorithm
4 Experiments
5 Conclusion

Introduction
DRAM scheduling
- The order in which memory access requests from the CPU are processed at DRAM.
- Impacts main memory fairness, throughput & power consumption.
Metrics for evaluating a scheduling algorithm
- harmonic speedup, execution time, sum-of-IPCs, maximum slowdown, weighted speedup
- harmonic speedup $= \dfrac{N}{\sum_{i} \left( IPC_i^{alone} / IPC_i^{shared} \right)}$
- Harmonic speedup provides a good balance between fairness and system performance.
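The harmonic speedup formula above can be sketched in a few lines of Python. This is only an illustration: the function name `harmonic_speedup` and the IPC values are hypothetical and do not come from the slides.

```python
def harmonic_speedup(ipc_alone, ipc_shared):
    """Harmonic speedup: N divided by the sum of per-thread slowdowns."""
    if len(ipc_alone) != len(ipc_shared) or not ipc_alone:
        raise ValueError("need one alone/shared IPC pair per thread")
    # Slowdown of thread i is IPC_i^alone / IPC_i^shared.
    slowdowns = [alone / shared for alone, shared in zip(ipc_alone, ipc_shared)]
    return len(slowdowns) / sum(slowdowns)


# Hypothetical two-thread workload in which sharing halves each thread's IPC.
ipc_alone = [2.0, 1.0]
ipc_shared = [1.0, 0.5]
hs = harmonic_speedup(ipc_alone, ipc_shared)
print(hs)  # 0.5, since both slowdowns are 2 and 2 / (2 + 2) = 0.5
```

Because the denominator sums slowdowns, one badly slowed thread lowers the metric more than it would lower an arithmetic mean of speedups, which is why the metric is sensitive to fairness.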

Related Work
- ATLAS [2]: prioritizes threads that have attained the least service
- PAR-BS [5]: processes DRAM requests in batches, and uses the shortest-job-first (SJF) principle within a batch
- MORSE [4]: extends Ipek et al.'s learning technique [1] to target arbitrary figures of merit
- MISE [6]: estimates the slowdown of each application and redistributes bandwidth accordingly
Thread Cluster Memory Scheduling (TCMS) [3]
- divides threads into two clusters
- prioritizes the latency-sensitive cluster over the bandwidth-sensitive cluster
- periodically shuffles priority in the bandwidth cluster

Our Learning Automata-based Algorithm: Overview of a Learning Automaton (LA)
A simple model for dynamic decision making in unknown environments.
Structure of FALA (Finite Action Learning Automaton)
Formally, a FALA can be described by the quadruple $(A, B, \tau, p(k))$:
- $A = \{\alpha_1, \alpha_2, \ldots, \alpha_r\}$: finite set of actions
- $B$: set of all possible reinforcements
- $\tau$: learning algorithm to update $p(k)$
- $p(k) = [p_1(k), p_2(k), \ldots, p_r(k)]^T$: action probability vector at instant $k$
The higher the probability value for a thread, the higher its priority for DRAM scheduling.
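As a rough illustration of this structure, the sketch below stores the action set $A$ and the vector $p(k)$ and derives a thread priority order from the probabilities. The class name `FALA`, the uniform initialisation, and the thread labels are assumptions made for the example; the slides do not specify them.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FALA:
    """Finite Action Learning Automaton: holds the actions A and the vector p(k)."""
    actions: list                           # A = {alpha_1, ..., alpha_r}
    p: list = field(default_factory=list)   # p(k), one probability per action

    def __post_init__(self):
        if not self.p:
            # Assumption: start from a uniform action probability vector.
            self.p = [1.0 / len(self.actions)] * len(self.actions)

    def choose(self, rng):
        """Sample the index of an action according to p(k)."""
        return rng.choices(range(len(self.actions)), weights=self.p, k=1)[0]

    def priority_order(self):
        """Actions sorted by decreasing probability (higher p means higher priority)."""
        ranked = sorted(range(len(self.actions)), key=lambda i: -self.p[i])
        return [self.actions[i] for i in ranked]


# Hypothetical automaton over three threads.
la = FALA(actions=["thread0", "thread1", "thread2"], p=[0.2, 0.5, 0.3])
order = la.priority_order()           # ['thread1', 'thread2', 'thread0']
picked = la.choose(random.Random(0))  # index of the sampled action
```

The reinforcement set $B$ and the learning algorithm $\tau$ are left out here because they only matter once the automaton interacts with an environment.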

Our Learning Automata-based Algorithm: Operation of a Single FALA
1. Choose an action $\alpha$ (schedule a memory request) based on the action probability vector.
2. Get the reinforcement $\beta$ (harmonic speedup) from the system.
3. Update the action probabilities (thread priorities) using the learning algorithm $\tau$ (equation 2).
- This cycle repeats forever.
[Figure: the Learning Automaton (p) sends action $\alpha$ to the Environment, receives reinforcement $\beta$, and updates p through $\tau$.]
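The three steps can be sketched as a loop against a toy environment. Everything about the environment is an assumption made for illustration: here each action is rewarded with a fixed hypothetical probability and $\beta$ is 0 or 1, whereas in the talk the reinforcement is the harmonic speedup reported by the system. The update in step 3 is the Linear Reward-Inaction rule.

```python
import random


def run_single_fala(reward_prob, lam=0.05, steps=2000, seed=1):
    """Run the choose / reinforce / update cycle for one automaton."""
    rng = random.Random(seed)
    r = len(reward_prob)
    p = [1.0 / r] * r  # assumption: uniform initial probabilities
    for _ in range(steps):
        # 1. Choose an action according to the action probability vector.
        i = rng.choices(range(r), weights=p, k=1)[0]
        # 2. Get the reinforcement (toy stand-in for harmonic speedup).
        beta = 1.0 if rng.random() < reward_prob[i] else 0.0
        # 3. Update the probabilities: move p towards the chosen action,
        #    scaled by lam * beta; beta == 0 leaves p unchanged.
        p = [pj + lam * beta * ((1.0 if j == i else 0.0) - pj)
             for j, pj in enumerate(p)]
    return p


# Hypothetical environment in which action 0 is rewarded most often.
p_final = run_single_fala([0.9, 0.2, 0.1])
print(p_final)
```

The loop is finite only so the example terminates; a hardware scheduler would keep cycling for as long as the system runs.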

Our Learning Automata-based Algorithm: The Learning Algorithm τ
Linear Reward-Inaction ($L_{R-I}$) [7] is one learning algorithm. When action $\alpha_i$ is chosen:
$p_i = p_i + \lambda \beta (1 - p_i)$
$p_j = p_j - \lambda \beta p_j, \quad j \neq i$
The above 2 equations can be combined using vector notation, where $e_i$ is the unit vector with a 1 in position $i$:
$p(k+1) = p(k) + \lambda \beta(k) \left( e_i - p(k) \right)$    (1)
Equation for a team of N FALA:
$p_i(k+1) = p_i(k) + \lambda \beta(k) \left[ e_{\alpha_i(k)} - p_i(k) \right], \quad 1 \le i \le N$    (2)
The automata implicitly cooperate to perform a stochastic search over the space of rewards [7]: coordination among multiple memory controllers.
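A minimal sketch of equation (2) follows, assuming every automaton in the team receives the same reinforcement $\beta(k)$ and updates only its own probability vector. The function name `team_step` and the numbers are hypothetical.

```python
def team_step(team_p, beta, chosen, lam):
    """Apply equation (2) to each automaton using the shared reinforcement beta.

    team_p : list of probability vectors, one per automaton
    chosen : index of the action each automaton selected at this instant
    """
    new_team = []
    for p, a in zip(team_p, chosen):
        # e_a - p: +(1 - p_j) for the chosen action, -p_j for the others.
        new_team.append([pj + lam * beta * ((1.0 if j == a else 0.0) - pj)
                         for j, pj in enumerate(p)])
    return new_team


# Two automata with two actions each; both are fully rewarded (beta = 1).
team = [[0.5, 0.5], [0.25, 0.75]]
updated = team_step(team, beta=1.0, chosen=[0, 1], lam=0.1)
# updated is approximately [[0.55, 0.45], [0.225, 0.775]]

# With beta = 0 nothing changes, which is the "inaction" in Reward-Inaction.
unchanged = team_step(team, beta=0.0, chosen=[0, 1], lam=0.1)
```

Each vector still sums to 1 after the update, because the amount added to the chosen action equals the total amount removed from the others.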

Our Learning Automata-based Algorithm: Scheduling
[Figure]

Our Learning Automata-based Algorithm: Implementation
- Storage cost per controller: 3.3 Kbits (TCMS = 2.6 Kbits)
- Additional logic is required for calculating the reward and updating $p(k)$
- Calculating HS on the fly requires the instantaneous $IPC_i^{alone}$. We use the overall $IPC_i^{alone}$, obtained by running a benchmark alone on the same baseline system, to get a rough estimate of HS.
- Updating $p(k)$ is not on the critical path and can be performed over many tens of CPU cycles.
- As an approximation, we consider the latency for determining the reward for a scheduling decision to be 90 cycles.
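One way to picture the 90-cycle reward latency is a queue of pending decisions whose probability update is applied only once the reward is available. This is a software sketch under stated assumptions, not the hardware design from the talk: the class `DelayedUpdater`, its methods, and the `reward_fn` callback (standing in for the HS estimate) are all hypothetical.

```python
from collections import deque

REWARD_LATENCY = 90  # cycles; the approximation quoted in the slides


class DelayedUpdater:
    """Queues scheduling decisions and updates p once their reward is ready."""

    def __init__(self, p, lam=0.1):
        self.p = list(p)
        self.lam = lam
        self.pending = deque()  # entries are (ready_cycle, chosen_action)

    def record(self, cycle, action):
        """Remember a decision made at `cycle`; its reward arrives later."""
        self.pending.append((cycle + REWARD_LATENCY, action))

    def tick(self, cycle, reward_fn):
        """Apply every update whose reward is available by `cycle`."""
        applied = 0
        while self.pending and self.pending[0][0] <= cycle:
            _, action = self.pending.popleft()
            beta = reward_fn()  # hypothetical stand-in for the HS estimate
            self.p = [pj + self.lam * beta * ((1.0 if j == action else 0.0) - pj)
                      for j, pj in enumerate(self.p)]
            applied += 1
        return applied


upd = DelayedUpdater([0.5, 0.5])
upd.record(cycle=0, action=0)
early = upd.tick(cycle=89, reward_fn=lambda: 1.0)  # 0 updates: reward not ready
late = upd.tick(cycle=90, reward_fn=lambda: 1.0)   # 1 update applied
```

Since the update is off the critical path, scheduling decisions can continue to be made from the current $p(k)$ while earlier decisions wait in the queue.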

Experiments: Experimental Setup
- Modified version of the gem5 simulator
- 16 CPU cores and 4 memory controllers
- PARSEC: eight multi-threaded benchmarks with the simmedium input set
- SPEC CPU2006: eight multiprogrammed workloads of varying memory intensity, run for 500 million instructions

Experiments: Results
[Figures: PARSEC; SPEC CPU2006]

Experiments: Scalability
[Figure]

Conclusion: Future Work
- Improve the reward mechanism
- Evaluate on a wider variety of workloads (SPLASH and NAS benchmarks)
- Compare against more recent scheduling algorithms (MISE)
- Perform a more accurate hardware feasibility analysis
- Evaluate on a synthetic workload where the outcome should be predictable

Conclusion
- A learning technique is exploited to improve fairness without much additional hardware cost.
- The approach is scalable and works on multiprogrammed as well as parallel workloads.

Questions?

References
[1] E. Ipek, O. Mutlu, J. F. Martínez, and R. Caruana. Self-optimizing memory controllers: A reinforcement learning approach. In Proceedings of the 35th Annual International Symposium on Computer Architecture, ISCA '08, pages 39-50, Washington, DC, USA, 2008. IEEE Computer Society.
[2] Y. Kim, D. Han, O. Mutlu, and M. Harchol-Balter. ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers. In M. T. Jacob, C. R. Das, and P. Bose, editors, HPCA, pages 1-12. IEEE Computer Society, 2010.
[3] Y. Kim, M. Papamichael, O. Mutlu, and M. Harchol-Balter. Thread cluster memory scheduling: Exploiting differences in memory access behavior. In Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 43, pages 65-76, Washington, DC, USA, 2010. IEEE Computer Society.
[4] J. Mukundan and J. Martinez. MORSE: Multi-objective reconfigurable self-optimizing memory scheduler. In High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on, pages 1-12, Feb 2012.
[5] O. Mutlu and T. Moscibroda. Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems. In Proceedings of the 35th Annual International Symposium on Computer Architecture, ISCA '08, pages 63-74, Washington, DC, USA, 2008. IEEE Computer Society.
[6] L. Subramanian, V. Seshadri, Y. Kim, B. Jaiyen, and O. Mutlu. MISE: Providing performance predictability and improving fairness in shared main memory systems. In Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), HPCA '13, pages 639-650, Washington, DC, USA, 2013. IEEE Computer Society.
[7] M. A. L. Thathachar and P. S. Sastry. Networks of Learning Automata. Springer, 2004.