Memory Intensive Architectures. Shahar Kvatinsky, Viterbi Faculty of Electrical Engineering, Technion - Israel Institute of Technology. ICRI-CI, June 2017.
Memristors: Emerging Nonvolatile Memory Technologies. Resistive RAM (RRAM); STT-MRAM; Phase Change Memory (PCM).
Example: Intel 3D XPoint. 16 GB, PCIe 3.0, 241 mm², 20 nm process, 4F², 1S1R cells between M4 and M5; 91.4% memory efficiency (4.5X higher than DRAM).
Memristors Add New Capabilities to CMOS. Sea of memory above the logic: dense, nonvolatile, fast, and CMOS compatible.
Memory Intensive Architectures: tight integration of memory and logic. Bringing memory to logic; in-memory computing. [Figure: Input, CPU, Output, and Memory blocks.]
MIA Research Projects: memory design; embedded memory; neuromorphic computing; Memristive Memory Processing Unit.
Memory Design and Methodologies: understanding the fundamental issues in resistive memory. Circuits for memory (Ramadan et al., submitted to TCAS-I); coding for RRAM (Cassuto et al., ISIT 13, 16, TIT 16); RRAM/PCM in the memory system (Nishil Talati).
On-Die Intensive Memory Circuits and Architectures: Multistate Register (TVLSI 15); Continuous Flow Multithreading (CAL 14); IoT RFIC (Wainstein et al., ISCAS 17, Memrisys 17).
Neuromorphic Computing: online gradient descent training (TNNLS 15, ISCAS 16); machine learning accelerators (Tzofnat Greenberg); configurable mixed signal circuits (Loai Danial).
Agenda: Memristors and MIA; Memristive MPU (mmpu) architecture; Summary.
Processing In-Memory (PIM): Reducing Data Movement. Prior art from the 90s: PIM machine; Active Pages; SA connected to SIMD pipeline. Recent prior art: Micron Automata Processor. Configuration: a CPU with memory (DRAM), where the memory (DRAM) is paired with CMOS Processing Units (PUs). Data transfer is still required between the DRAM and the PUs. References: M. Gokhale et al., "Processing in memory: the Terasys massively parallel PIM array," Computer, 1995; M. Oskin et al., "Active pages: A computation model for intelligent memory," Comput. Archit. News, 1998; D. Elliott et al., "Computational RAM: Implementing processors in memory," IEEE Des. Test, 1999; P. Dlugosch et al., "An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing," IEEE TPDS, 2014.
Real Computing within the Memory: Beyond von Neumann Architecture. [Figure: Input Device; CPU (Control Unit, Arithmetic/Logic Unit); Output Device; Memory; Memory Processing Unit (MPU).]
mmpu: Solving the von Neumann Bottleneck. Moving from DRAM to memristive memory. mmpu: performing computation USING the memristive memory cells. [Figure: CPU connected to mmpu units through Clock, Address, Data, and Controls.]
Logic within Memory: Logic Families. Unipolar logic (ICSEE 16, VLSI-SoC 16); IMPLY (ICCD 11, TVLSI 14); Akers array (MEJ 14); MAGIC (TCAS II 14, TNANO 16). [Figure: Akers array, 2x2 model.]
Memristor: Memory Resistor. A resistor with varying resistance: current in one direction decreases the resistance, and current in the opposite direction increases it. [Figure: current-voltage characteristic.]
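The behavior on this slide can be sketched with a toy two-state model. This is a minimal illustration, not a device model from the talk: the `Memristor` class, the abrupt threshold switching, the polarity convention, and every numeric value are assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class Memristor:
    """Toy threshold-switching memristor. All values are illustrative assumptions."""
    r_on: float = 1e3         # low-resistance state, ohms (assumed)
    r_off: float = 1e6        # high-resistance state, ohms (assumed)
    v_t_on: float = 0.4       # positive threshold that decreases resistance (assumed)
    v_t_off: float = 0.4      # negative threshold magnitude that increases resistance (assumed)
    resistance: float = 1e6   # starts in the high-resistance state

    def apply(self, voltage: float) -> float:
        """Apply a voltage pulse and return the current drawn at the start of the pulse.

        Assumed convention: positive voltage decreases resistance,
        negative voltage increases it, and sub-threshold voltages leave
        the state unchanged (which is what makes the cell nonvolatile).
        """
        current = voltage / self.resistance
        if voltage >= self.v_t_on:
            self.resistance = self.r_on
        elif voltage <= -self.v_t_off:
            self.resistance = self.r_off
        return current


cell = Memristor()
cell.apply(1.0)                    # strong positive pulse: resistance decreases
state_after_set = cell.resistance
cell.apply(0.1)                    # small read pulse: state is retained
state_after_read = cell.resistance
cell.apply(-1.0)                   # strong negative pulse: resistance increases
state_after_reset = cell.resistance
```

Real devices switch gradually and nonlinearly; the abrupt thresholds here only capture the "direction of current sets the resistance" idea.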
MAGIC: Memristor Aided LoGIC. Example of MAGIC NOR: R_ON = logic 1, R_OFF = logic 0, with R_OFF >> R_ON. Initialize OUT to R_ON, then apply V_G. If both inputs are R_OFF, the voltage across OUT is << V_G/2 and OUT stays at R_ON; otherwise the voltage across OUT is > V_G/2 and the OUT resistance increases to R_OFF. Truth table (IN1, IN2 -> NOR): 0, 0 -> 1; 0, 1 -> 0; 1, 0 -> 0; 1, 1 -> 0. Reference: S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E. G. Friedman, A. Kolodny, and U. C. Weiser, "MAGIC Memristor Aided LoGIC," IEEE TCAS II, Nov. 2014.
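The NOR example can be checked with a simple voltage-divider sketch. It assumes the two input memristors are in parallel and in series with the output memristor, and that OUT switches to R_OFF once the voltage across it exceeds a threshold. The resistance values, `V_G`, and `V_T_OFF` are illustrative assumptions, not figures from the cited paper.

```python
R_ON = 1e3      # logic 1 (assumed value, ohms)
R_OFF = 1e6     # logic 0 (assumed value, ohms)
V_G = 1.0       # applied gate voltage (assumed value, volts)
V_T_OFF = 0.4   # switching threshold of OUT, chosen below V_G / 2 (assumed)


def to_resistance(bit: int) -> float:
    """Map a logic value to the resistance that stores it."""
    return R_ON if bit else R_OFF


def out_voltage(in1: int, in2: int) -> float:
    """Voltage across OUT right after V_G is applied, with OUT initialized to R_ON."""
    r1, r2 = to_resistance(in1), to_resistance(in2)
    r_inputs = (r1 * r2) / (r1 + r2)   # inputs in parallel
    return V_G * R_ON / (r_inputs + R_ON)


def magic_nor(in1: int, in2: int) -> int:
    """Evaluate MAGIC NOR: OUT starts at logic 1 and may be switched to logic 0."""
    r_out = R_ON                        # initialization step
    if out_voltage(in1, in2) > V_T_OFF:
        r_out = R_OFF                   # resistance increases: OUT becomes logic 0
    return 1 if r_out == R_ON else 0


truth_table = {(a, b): magic_nor(a, b) for a in (0, 1) for b in (0, 1)}
```

With these values, the voltage across OUT is about 0.002 V_G when both inputs are logic 0 and at least about 0.5 V_G otherwise, which is why a threshold somewhat below V_G/2 separates the two cases.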
Real MAGIC. Reference: B. C. Jung et al., "Zero-static-power nonvolatile logic-in-memory circuits for flexible electronics," Nano Research, April 2017.
MAGIC NOR in a Crossbar. [Figure: IN1, IN2, and OUT cells in a crossbar, with V_G applied.]
MAGIC NOR in a Memristive Memory. [Figure: IN1, IN2, and OUT cells within a memory array, with V_G and V_Isolate applied.] Reference: N. Talati, S. Gupta, P. Mane, and S. Kvatinsky, "Logic Design within Memristive Memories Using MAGIC," IEEE Transactions on Nanotechnology, July 2016.
Hierarchy of Logical Functions. Base: MAGIC NOR. Functions in the hierarchy: NOT, OR, AND, NAND, NOR, XOR, COPY; ADD, SUB, MUL, DIV, POW, SQRT; matrix multiplication, convolution.
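As an illustration of how such a hierarchy can rest on a single gate, the sketch below builds the basic Boolean functions and a ripple-carry ADD from NOR alone. The decompositions are standard textbook ones and are an assumption here; they are not necessarily the mappings used in the mmpu work, and function names such as `full_adder` and `add` are hypothetical.

```python
def nor(a: int, b: int) -> int:
    """The only primitive: every function below is composed from it."""
    return 0 if (a or b) else 1


def not_(a: int) -> int:
    return nor(a, a)


def or_(a: int, b: int) -> int:
    return not_(nor(a, b))


def and_(a: int, b: int) -> int:
    return nor(not_(a), not_(b))


def nand(a: int, b: int) -> int:
    return not_(and_(a, b))


def xor(a: int, b: int) -> int:
    # XOR is true when the inputs are neither both 0 nor both 1.
    return nor(nor(a, b), and_(a, b))


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Return (sum_bit, carry_out) using only NOR-derived gates."""
    partial = xor(a, b)
    sum_bit = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(carry_in, partial))
    return sum_bit, carry_out


def add(x: int, y: int, width: int = 8) -> int:
    """Ripple-carry addition of two unsigned integers, modulo 2**width."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result


total = add(23, 42)
wrapped = add(200, 100)   # overflows 8 bits and wraps around
```

Higher levels such as MUL or matrix multiplication would be layered on `add` in the same way, at the cost of more NOR steps.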
Parallel Vector Operation within Memristive MPU. $f_n : R^n \times R^n \to R^n$, with $f_n(a, b) = c$ for $a = (a_0, a_1, a_2, \ldots, a_n)$, $b = (b_0, b_1, b_2, \ldots, b_n)$, and $c = (c_0, c_1, c_2, \ldots, c_n)$. Latency of the vector operation is independent of the length of the vector.
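The latency claim can be illustrated with a small counting model. It assumes each vector element occupies one row of the array and that one NOR step acts on the same three columns of every row at once. The `Crossbar` class, its method names, and the five-step XOR sequence are hypothetical; operand loading is not counted, and output-column initialization is folded into each step for simplicity.

```python
class Crossbar:
    """Toy bit array where one NOR step is applied to all rows at once."""

    def __init__(self, rows: int, cols: int) -> None:
        self.cells = [[0] * cols for _ in range(rows)]
        self.steps = 0  # number of row-parallel NOR steps issued

    def write_column(self, col: int, bits: list[int]) -> None:
        for row, bit in zip(self.cells, bits):
            row[col] = bit

    def read_column(self, col: int) -> list[int]:
        return [row[col] for row in self.cells]

    def nor_columns(self, in1: int, in2: int, out: int) -> None:
        # This Python loop stands in for what the array would do on
        # every row simultaneously, so it counts as a single step.
        for row in self.cells:
            row[out] = 0 if (row[in1] or row[in2]) else 1
        self.steps += 1


def vector_xor(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], int]:
    """Element-wise XOR of two bit vectors; returns (result, NOR steps used)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("vectors must have the same length")
    xb = Crossbar(rows=len(a_bits), cols=7)
    xb.write_column(0, a_bits)
    xb.write_column(1, b_bits)
    xb.nor_columns(0, 1, 2)   # col 2 = NOR(a, b)
    xb.nor_columns(0, 0, 3)   # col 3 = NOT a
    xb.nor_columns(1, 1, 4)   # col 4 = NOT b
    xb.nor_columns(3, 4, 5)   # col 5 = a AND b
    xb.nor_columns(2, 5, 6)   # col 6 = a XOR b
    return xb.read_column(6), xb.steps


short_result, short_steps = vector_xor([0, 0, 1, 1], [0, 1, 0, 1])
long_result, long_steps = vector_xor([1] * 1024, [0] * 1024)
```

Both calls report the same step count, because the vector length only changes how many rows participate in each step, not how many steps are issued.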
mmpu µarchitecture. [Figure: memristive memory array holding the vectors a_0..a_n and b_0..b_n, with column control, row control, and the mmpu controller; the final build adds the result vector c_0..c_n.] Reference: R. Ben-Hur and S. Kvatinsky, "Memory Processing Unit for In-Memory Processing," Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, July 2016.
CPU-mmpu Systems: Accelerator or Main Memory? mmpu: memristive memory with processing capabilities. DIMM? [Figure: CPU with accelerators, mmpu, memristive memory, and DRAM, connected through Clock, Address, Data, and Controls.]
Issues Involved in mmpu Architecture: memory design; mmpu controller design and optimization; periphery design; programming model; software; applications. [Figure: mmpu architecture with CPU?, mmpu controller, and mmpu.]
Agenda: Memristors and MIA; Memristive MPU (mmpu) architecture; Summary.
Memristors to the Rescue? New technologies enable memory intensive architectures: better processors (multithreading, low power); accelerators (machine learning); smart memories (memory processing unit).
Thanks!