Phase-Change-Memory Devices for Non-von Neumann Computing in the AI Era

Phase-Change-Memory Devices for Non-von Neumann Computing in the AI Era
Evangelos Eleftheriou, IBM Fellow, IBM Research - Zurich

Application Trends
[Chart: application domains plotted by computational complexity (O(N) to O(N^3)) against data volume (MB to PB). Classical HPC applications and uncertainty quantification sit at high complexity and modest data volume; deep learning, dimensionality reduction, database queries, knowledge-graph creation, and information retrieval (Hadoop / HPC graph analytics) extend toward TB-PB volumes.]

Performance and Power Efficiency Trends
[Chart, 2002-2016: performance (Petaflops/s) versus power efficiency (Gigaflops/W).]
Increasing gap between performance and power efficiency.
Diminishing performance/power-efficiency gains from technology scaling.

AI Computational Requirements: Very Challenging
~10^18 FLOPS for image classification
~10^7 hours for speech training in several languages
Classical scaling alone is not the solution.
Key focus:
Decrease the power density and power consumption.
Overcome the CPU/memory bottleneck of conventional computing architectures.
Design new AI algorithms, accelerators, interconnects, and software technologies.
Merolla et al., Science, 2014

Improve von Neumann Computing
Storage-class memory
Near-memory computing
Monolithic 3D integration
[Diagram: memory stacked with CMOS processing units and the CPU.]
Goal: minimize the time and distance of memory accesses.
Burr et al., IBM J. Res. Dev., 2008; Vermij et al., Proc. ACM CF, 2016; Wong, Salahuddin, Nature Nanotechnology, 2015

Go beyond von Neumann Computing
Neuromorphic computing: LeCun, Bengio, Hinton, Nature, 2015; Merolla et al., Science, 2014; Indiveri, Liu, Proc. IEEE, 2015
Computational memory: Borghetti et al., Nature, 2010; Di Ventra and Pershin, Scientific American, 2015; Hosseini et al., Electron Dev. Lett., 2015; Sebastian et al., Nature Communications, 2017

Neural Hardware: Digital or Analog?
Fully digital: Google TPU, IBM SyNAPSE, Manchester Univ.
Analog/hybrid: Stanford, UZH / ETH, Heidelberg Univ.
[Chart: area/power of SRAM- versus RRAM-based neurons and synapses at the 1-million-neuron scale. Rajendran et al., IEEE Trans. Electron Dev., 2013]
Large improvements in power, area, and learning performance for memristive neural hardware.
Truly non-von Neumann computation: the potential of a memristive neuron/synapse.

Resistive Memory Devices
Charge-based memory/storage → resistance-based memory/storage
Spin-transfer torque magnetic random access memory (STT-MRAM)
Metal-oxide resistive random access memory (ReRAM)
Conductive-bridge random access memory (CBRAM)
Phase-change memory (PCM)
Significant impact on the memory/storage hierarchy.
Monolithic integration of memory and computation units.
Sufficient richness of dynamics for non-von Neumann computing.

Phase-Change Memory (PCM)
Amorphize → high-resistance state; crystallize → low-resistance state.
Use two distinct solid phases of a Ge-Sb-Te (GST) alloy to store a bit.
Use intermediate phases to obtain a continuum of states, i.e. resistance levels.
Transition between phases via controlled heating and cooling.
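The multi-level behaviour described above can be sketched with a toy model in which the device state is a crystalline fraction and the conductance interpolates between the two phases. The conductance values and pulse response below are illustrative assumptions, not measured device data:

```python
# Toy PCM cell model (illustrative only; all parameter values are assumptions).
G_AMORPHOUS = 1e-7    # siemens, high-resistance (RESET) state
G_CRYSTALLINE = 1e-4  # siemens, low-resistance (SET) state

class PCMCell:
    """Stores its state as the crystalline fraction of the GST volume."""

    def __init__(self):
        self.f_cryst = 0.0  # fully amorphous after fabrication/RESET

    def reset(self):
        # A strong, abrupt pulse melts and quenches the material: amorphize.
        self.f_cryst = 0.0

    def set_pulse(self, strength=0.2):
        # A moderate pulse crystallizes part of the remaining amorphous volume;
        # repeated pulses yield a continuum of intermediate conductance levels.
        self.f_cryst = min(1.0, self.f_cryst + strength * (1.0 - self.f_cryst))

    @property
    def conductance(self):
        # Simple linear mixing between the two phases (a toy assumption).
        return self.f_cryst * G_CRYSTALLINE + (1 - self.f_cryst) * G_AMORPHOUS
```

Each partial-SET pulse moves the cell a step along the resistance continuum, which is what makes the multi-level storage used later for analog computation possible.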

Phase-Change Devices in Spiking Neural Networks
[Diagram: PCM synapse delivering a postsynaptic potential ("input") to a PCM neuron.]
Ovshinsky, E\PCOS, 2004; Wright, Adv. Mater., 2011; Kuzum et al., Nano Lett., 2012; Jackson et al., ACM JETCS, 2013; Tuma et al., Nature Nanotechnology, 2016; Pantazi et al., Nanotechnology, 2016; Tuma et al., IEEE Electron Dev. Lett., 2016
All-PCM architecture: areal/energy efficiency.
Can we exploit some unique physical attributes?

Phase-Change Neurons
Stochastic phase-change neurons: T. Tuma, A. Pantazi, M. Le Gallo, A. Sebastian & E. Eleftheriou, Nature Nanotechnology, Aug. 2016
The internal state of the neuron is stored in the phase configuration of a PCM device.
Neuronal dynamics are emulated using the physics of crystallization.
The neurons exhibit inherent stochasticity, which is key for neuronal population coding.
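A minimal sketch of such a neuron, assuming an integrate-and-fire abstraction in which the membrane-potential state stands in for the crystalline fraction and threshold jitter stands in for the device's inherent stochasticity (all parameter values are illustrative, not from the paper):

```python
import random

class StochasticPCMNeuron:
    """Toy integrate-and-fire neuron: the state variable mimics the
    crystalline fraction of a PCM device, growing with each input pulse
    and resetting (re-amorphizing) when the neuron fires."""

    def __init__(self, threshold=1.0, noise=0.1, seed=None):
        self.threshold = threshold   # mean firing threshold
        self.noise = noise           # std of threshold jitter (stochasticity)
        self.state = 0.0             # internal phase-configuration state
        self.rng = random.Random(seed)

    def step(self, drive):
        """Integrate one input pulse; return True if the neuron fires."""
        self.state += drive
        # Inherent randomness: the effective threshold jitters pulse to pulse.
        if self.state >= self.threshold + self.rng.gauss(0.0, self.noise):
            self.state = 0.0         # fire-and-reset
            return True
        return False

def spike_times(seed, steps=200, drive=0.11):
    """Spike times of one neuron under a constant input drive."""
    n = StochasticPCMNeuron(seed=seed)
    return [t for t in range(steps) if n.step(drive)]
```

Driving two identically configured neurons with the same constant input yields different spike trains, which is the property exploited for population coding on the next slide.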

Neuronal Population Coding
How does the brain store and represent complex stimuli given the slowness, unreliability, and uncertainty of individual neurons?
High-speed, information-rich stimuli (motion, vision, sound) versus slow (~10 Hz), stochastic, unreliable neurons.
"As in any good democracy, individual neurons count for little; it is population activity that matters. For example, as with control of eye and arm movements, visual discrimination is much more accurate than would be predicted from the responses of single neurons." — Averbeck et al., Nature Reviews, 2006
[Figure: spiking activity of a neuron population. T. Tuma et al., Nature Nanotechnology, 2016]

Application of an SNN: Temporal Correlation Detection
Algorithmic goals:
Determine whether some data streams are statistically correlated.
Observe variations in the activity of the correlated inputs.
React quickly to the occurrence of correlated inputs.
Continuously and dynamically re-evaluate the learned statistics.
Use only unsupervised learning and consume very low power.
Application areas: finance, science, medicine, big data, and more.

Learning Patterns with a Spiking Neural Network
[Figure: input pattern; synaptic weights and output of neuron #1 and neuron #2. A. Pantazi et al., Nanotechnology, 2016]
Purely unsupervised neuromorphic computation: no counting, no transfers between memory and CPU!

Computational Memory
Conventional: processing unit + conventional memory. Proposed: processing unit + computational memory.
The concept: perform certain computational tasks without the need to transfer data back and forth in the process.
Borghetti et al., Nature, 2010; Di Ventra and Pershin, Scientific American, 2015; Hosseini et al., Electron Dev. Lett., 2015; Sebastian et al., Nature Communications, 2017

PCM to Perform Analog Matrix-Vector Multiplications
Map the matrix elements to conductance values; map the vector elements to read voltages; decipher the result from the measured currents.
Matrix-vector multiplication exploits the multi-level storage capability together with Kirchhoff's and Ohm's laws.
A crossbar array performs fast matrix-vector multiplication without data movement, in O(1) time.
Burr et al., Adv. Phys.: X, 2017
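In circuit terms: if matrix element A[i][j] is stored as conductance G[i][j] and the input vector as read voltages V[j], Ohm's law gives a per-device current G[i][j]*V[j], and Kirchhoff's current law sums those currents along each output wire. A minimal sketch with idealized devices (no variability or noise):

```python
def crossbar_matvec(G, V):
    """Idealized memristive crossbar: G[i][j] is the conductance (siemens)
    of the device at row i / column j, and V[j] the read voltage applied
    to column j. The current collected on row wire i is
    sum_j G[i][j] * V[j] (Ohm + Kirchhoff), i.e. one matrix-vector
    product obtained in a single parallel read step."""
    return [sum(g * v for g, v in zip(row, V)) for row in G]

# Example: a 2x3 matrix stored as conductances, read with voltages V.
G = [[1e-4, 2e-4, 0.5e-4],
     [0.0,  1e-4, 3e-4]]
V = [0.1, 0.2, 0.1]
I = crossbar_matvec(G, V)   # row currents, in amperes
```

All devices conduct simultaneously, which is why the physical operation takes constant time regardless of matrix size — the O(1) claim above.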

How Precise is the Multiplication?
[Experimental matrix-multiplication results.]
Owing to device variability, stochasticity, etc., the analog matrix-vector multiplication is not highly precise.

Application 1: Optimization Solvers
[Figure: measured input signal, compressed measurements, reconstructed signal. Le Gallo et al., Proc. IEDM, 2017]
Compressed sensing: reconstruction of a high-dimensional signal from a small number of measurements.
Used in applications such as MRI, facial recognition, holography, audio restoration, and mobile-phone camera sensors.

Compressed Sensing/Recovery Using Computational Memory
Complexity reduction from O(N^2) to O(N); potential 10^6x speed-up on a 1000x1000-pixel image.
Le Gallo et al., Proc. IEDM, 2017
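Recovery algorithms for compressed sensing are dominated by matrix-vector products with the measurement matrix — exactly the operation a PCM crossbar performs in place. As an illustration (not the authors' algorithm), the sketch below recovers a sparse vector with plain iterative soft thresholding (ISTA); the two matvecs per iteration are the part that would be offloaded to computational memory. Problem sizes, step size, and regularization are arbitrary choices:

```python
import random

random.seed(0)
M, N = 6, 12                       # 6 measurements of a 12-dim sparse signal
A = [[random.gauss(0, 1) / M**0.5 for _ in range(N)] for _ in range(M)]
x_true = [0.0] * N
x_true[1], x_true[5] = 1.5, -1.0   # 2-sparse ground truth

def matvec(B, v):                  # the operation a crossbar would do in place
    return [sum(b * u for b, u in zip(row, v)) for row in B]

def matvec_T(B, v):                # transpose product (reverse read direction)
    return [sum(B[i][j] * v[i] for i in range(len(B))) for j in range(len(B[0]))]

y = matvec(A, x_true)              # compressed measurements

def soft(u, t):                    # soft-thresholding operator
    return max(abs(u) - t, 0.0) * (1.0 if u > 0 else -1.0)

# Conservative step size: 1/trace(A^T A) is always <= 1/||A^T A||, so stable.
step = 1.0 / sum(a * a for row in A for a in row)
lam = 0.01                         # sparsity-promoting regularization
x = [0.0] * N
for _ in range(1000):              # ISTA: x <- soft(x + step*A^T(y - A x), step*lam)
    r = [yi - ai for yi, ai in zip(y, matvec(A, x))]
    g = matvec_T(A, r)
    x = [soft(xi + step * gi, step * lam) for xi, gi in zip(x, g)]

res_init = sum(yi * yi for yi in y) ** 0.5
res_final = sum((yi - ai) ** 2 for yi, ai in zip(y, matvec(A, x))) ** 0.5
```

The per-iteration cost is two matvecs plus an O(N) thresholding pass; with the matvecs done in memory, only the O(N) part remains on the digital side — the source of the complexity reduction quoted above.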

Image Reconstruction with Computational Memory
[Plot: reconstruction error (NMSE, 10^0 down to 10^-3) versus iteration t (0-30) for the PCM chip, a 4x4-bit fixed-point implementation, and floating point.]
Experimental result: 128x128 image, 50% sampling rate, computational-memory unit with 131,072 PCM devices. Le Gallo et al., Proc. IEDM, 2017
Estimated power reduction of 50x compared to an optimized 4-bit FPGA matrix-vector multiplier delivering the same reconstruction accuracy at the same speed.

Can We Compute with the Dynamics of PCM?
A nanoscale non-volatile integrator. Sebastian et al., Nature Communications, 2017
Can we exploit the crystallization dynamics for computational memory?

Application 2: Correlation Detection
Goal: detect temporal correlations between event-based data streams.
Each process is assigned to a single PCM device. Whenever the process takes the value 1, a SET pulse is applied to its device, with the amplitude or width of the pulse proportional to the instantaneous sum of all processes. The conductances of the memory devices then decipher the correlated groups.
Sebastian et al., Nature Communications, 2017
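A small simulation conveys the idea (a toy model: the pulse response is a plain additive conductance increment, and the process statistics are illustrative). Three of ten binary processes fire together, driven by a common source; their devices accumulate more conductance because the SET-pulse strength tracks the instantaneous sum of all processes:

```python
import random

random.seed(42)
N_PROC, N_CORR, T = 10, 3, 2000   # processes, correlated subset, time steps
P_FIRE, ETA = 0.3, 1e-3          # firing rate; conductance gain per unit pulse

g = [0.0] * N_PROC               # toy device conductances (arbitrary units)
for _ in range(T):
    master = 1 if random.random() < P_FIRE else 0
    # The first N_CORR processes copy a common source; the rest fire
    # independently with the same rate.
    events = [master] * N_CORR + [
        1 if random.random() < P_FIRE else 0 for _ in range(N_PROC - N_CORR)
    ]
    s = sum(events)              # instantaneous sum of all processes
    for i, e in enumerate(events):
        if e:                    # SET pulse with strength proportional to s
            g[i] += ETA * s

corr_g, uncorr_g = g[:N_CORR], g[N_CORR:]
```

When a correlated process fires, its partners fire too, so the instantaneous sum — and hence its pulse strength — is systematically larger; after enough steps the correlated group is separable by conductance alone, with no data ever read back during the computation.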

Experimental Results (1 Million PCM Devices)
[Figure: input processes and resulting device conductances.]
Detects a very weak correlation of c = 0.01.
No shuttling of data back and forth; massively parallel.

Comparative Study vs. IBM POWER8+ Architecture
200x improvement in computation time!
Peak dynamic power on the order of watts, compared to hundreds of watts.
Sebastian et al., Nature Communications, 2017

What if Arbitrarily High Precision is Needed? Mixed-precision computing to the rescue!
The bulk of the computation runs in low-precision computational memory; refinement runs in a high-precision digital processing engine.

Application 3: Linear Equation Solver
Digital processor (high precision) + computational memory (low precision).
The solution is iteratively updated with a low-precision error-correction term.
The error-correction term is obtained using an inexact inner solver.
The matrix multiplications in the inner solver are performed using a PCM array.
Le Gallo et al., Mixed-Precision In-Memory Computing, arXiv, 2017
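This is the classical iterative-refinement pattern: an inexact low-precision solver produces the error-correction term, and a high-precision outer loop applies it and recomputes the residual. The sketch below emulates "low precision" by quantizing values to a coarse grid and uses a few Jacobi sweeps as the inexact inner solver on a small diagonally dominant system — an illustration of the principle, not the paper's implementation:

```python
def lp(v):
    """Emulate a low-precision device: quantize values to steps of 1/64."""
    return [round(x * 64) / 64 for x in v]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner_solve_lowprec(A, r, sweeps=10):
    """Inexact Jacobi solver for A z = r, with quantized matrix products
    (the part that would run on the PCM array)."""
    n = len(A)
    z = [0.0] * n
    for _ in range(sweeps):
        Az = lp(matvec(A, lp(z)))
        z = [z[i] + (r[i] - Az[i]) / A[i][i] for i in range(n)]
    return z

def mixed_precision_solve(A, b, outer_iters=60):
    """High-precision refinement around the low-precision inner solver.
    The residual is normalized before each inner solve so the quantization
    error stays relative to the current residual size — that is what makes
    arbitrarily precise answers reachable."""
    n = len(A)
    x = [0.0] * n
    for _ in range(outer_iters):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # high precision
        s = max(abs(ri) for ri in r)
        if s < 1e-14:
            break
        z = inner_solve_lowprec(A, [ri / s for ri in r])    # low precision
        x = [xi + s * zi for xi, zi in zip(x, z)]           # error correction
    return x

# Diagonally dominant test system (assumed example).
A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 5.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [1.0, 0.0, 1.0, 6.0]]
b = [1.0, 2.0, 0.0, -1.0]
x = mixed_precision_solve(A, b)
residual = max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x)))
```

Each outer iteration shrinks the residual by roughly the relative error of the inexact inner solve, so even a ~5%-accurate inner solver drives the residual to machine precision in a few dozen refinements.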

Linear Equation Solver: Experimental Results
Mixed-precision computing provides a pathway to arbitrarily precise computation using computational memory.
Le Gallo et al., Mixed-Precision In-Memory Computing, arXiv, 2017

System-Level Performance Analysis
POWER8 CPU as the high-precision processing unit; simulated in-memory computing unit.
Significant improvement in the time/energy-to-solution metric.
The higher the accuracy of the computational memory, the higher the gain.

Application 4: Mixed-Precision Deep Learning
Synaptic weights are stored in computational memory.
The matrix-vector multiplications associated with forward/backward propagation are performed in place, with low precision.
The desired weight updates are accumulated in high precision.
Nandakumar et al., arXiv:1712.01192, 2017
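The update rule can be sketched for a single linear neuron: the forward and backward passes use the quantized (device) weights, while gradients accumulate in a high-precision variable chi; only when |chi| exceeds the device's programming granularity EPS is a weight-update pulse applied. The dataset, learning rate, and EPS value below are illustrative assumptions:

```python
EPS = 1.0 / 16   # assumed programming granularity of one device pulse

# Toy regression dataset: scalar target y = 0.5*x0 - 0.25*x1 (assumed example).
data = [((x0 / 4.0, x1 / 4.0), 0.5 * x0 / 4.0 - 0.25 * x1 / 4.0)
        for x0 in range(-4, 5) for x1 in range(-4, 5)]

w = [0.0, 0.0]     # low-precision device weights, always multiples of EPS
chi = [0.0, 0.0]   # high-precision update accumulators (one per synapse)
lr = 0.05

def loss():
    return sum((w[0] * x[0] + w[1] * x[1] - y) ** 2 for x, y in data) / len(data)

loss_init = loss()
for epoch in range(200):
    for x, y in data:
        # Forward/backward pass with the low-precision weights — this is
        # the in-place matvec a PCM array would provide.
        err = w[0] * x[0] + w[1] * x[1] - y
        for i in range(2):
            chi[i] -= lr * err * x[i]        # accumulate in high precision
            # Transfer to the device only in units of whole pulses.
            n_pulses = int(chi[i] / EPS)
            if n_pulses != 0:
                w[i] += n_pulses * EPS
                chi[i] -= n_pulses * EPS
loss_final = loss()
```

Because sub-pulse gradient contributions are never discarded — they wait in chi until they add up to a full pulse — training converges despite the coarse weight granularity, which is the point of keeping the accumulation in high precision.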

Simulation Results
Two PCM devices in a differential configuration represent each synapse.
A device-model-based network simulation achieves 97.78% test accuracy.
Additional accuracy drops come from read noise (0.26%) and the analog-digital converters (0.12%).
Nandakumar et al., Mixed-Precision Training of Deep Neural Networks Using Computational Memory, arXiv, 2017