
Memory System Support for Online Data-Intensive Services

Boris Grot
School of Informatics, University of Edinburgh
www.inf.ed.ac.uk

July 16, 2015, Kyungpook National University

The Big Data Explosion

[Image: Erik Fitzpatrick]

Memory: the New Efficiency Battleground

Server CPUs are getting more efficient:
- Wimpy cores → low energy/op
- Many cores/chip → fewer sockets [SOP]

DRAM:
- Demand for capacity is outpacing technology scaling
- Growing contributor to datacenter Total Cost of Ownership (TCO)

[Diagram: many-core CMP with 16 cores]

Must innovate in the memory system

DRAM 101

DRAM is organized in pages
- A page consists of multiple cache blocks
- A DRAM page ≠ an OS page

DRAM is accessed at block granularity
- A page is first activated into the row buffer (an energy-intensive operation)
- Blocks are then fetched from the row buffer
- Row buffer hits are 3x lower energy than activations

Do servers leverage row buffer locality?

[Diagram: a DRAM page activated into the row buffer]
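To make the energy argument concrete, here is a toy open-page model of the mechanism above. The energy constants are illustrative assumptions, chosen only to reflect the slide's claim that a row buffer hit costs roughly 3x less than a page activation; real values depend on the DRAM part.

```python
# Toy open-page DRAM energy model illustrating why row buffer hits matter.
ACTIVATION_ENERGY = 3.0  # energy units per page activation (assumed)
HIT_ENERGY = 1.0         # energy units per row buffer hit (assumed: 3x lower)
PAGE_SIZE = 1024         # bytes per DRAM page (assumed)
BLOCK_SIZE = 64          # bytes per cache block

def access_energy(block_addresses):
    """Replay block-granularity accesses against a single open row buffer."""
    open_page = None
    energy = 0.0
    activations = 0
    for addr in block_addresses:
        page = addr // PAGE_SIZE
        if page != open_page:          # row miss: activate a new page
            energy += ACTIVATION_ENERGY
            activations += 1
            open_page = page
        else:                          # row hit: fetch from the row buffer
            energy += HIT_ENERGY
    return energy, activations

# Sequential (bulk) accesses within one page mostly hit the row buffer...
bulk = [b * BLOCK_SIZE for b in range(16)]        # 16 blocks, 1 page
# ...while pointer-chasing accesses touch a different page every time.
scattered = [i * PAGE_SIZE for i in range(16)]    # 16 blocks, 16 pages

print(access_energy(bulk))       # (18.0, 1): one activation, 15 cheap hits
print(access_energy(scattered))  # (48.0, 16): one activation per access
```

Under these assumptions, the scattered trace spends 2.7x more energy than the bulk trace for the same number of blocks, which is the locality gap the rest of the talk targets.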

Information Stores for Big Data

Pointer-intensive structures store bulk data objects
- Constant-time data object retrievals
- Example structures: hash tables (e.g., web search, object caching), trees (e.g., databases, file systems)
- Example objects: memory-mapped files, SW objects, DB rows

[Diagram: hash table with buckets 0-10 pointing to bulk objects A, B, and C]

Retrieving a Bulk Object

[Diagram: the application's hash table lookup locates bulk objects A, B, and C in memory]

- Keys are spread over the memory space
- Bulk objects are contiguously allocated
- Accesses are fine-grained for key lookups and bulk for data objects
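This access pattern can be sketched in a few lines. The class below is a minimal illustration, not the talk's actual software: a hash-table index resolves a key with a small, fine-grained read, and the value itself is a contiguously allocated object that is read in bulk.

```python
# Sketch of an information store mixing fine-grained key lookups
# with bulk object reads (illustrative; not from the talk).

class InfoStore:
    def __init__(self):
        self.heap = bytearray()   # contiguous storage for bulk objects
        self.index = {}           # key -> (offset, length): fine-grained lookups

    def put(self, key, payload: bytes):
        offset = len(self.heap)
        self.heap += payload      # bulk object allocated contiguously
        self.index[key] = (offset, len(payload))

    def get(self, key) -> bytes:
        # Step 1: fine-grained access -- a few bytes resolve the key.
        offset, length = self.index[key]
        # Step 2: bulk access -- a contiguous read of the whole object.
        return bytes(self.heap[offset:offset + length])

store = InfoStore()
store.put("A", b"x" * 1024)   # e.g., a 1 KB DB row or memory-mapped file
store.put("B", b"y" * 2048)
assert store.get("A") == b"x" * 1024
```

The key-to-offset probe touches a handful of scattered bytes, while `get` then streams a contiguous kilobyte or more; that split is exactly the bimodal traffic the next slide quantifies.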

Server Memory Traffic is Bimodal

Bulk accesses account for 60-75% of all memory accesses
- Bulk access: touches ≥ 50% of the bytes within a 1KB region

Are bulk accesses leveraged by memory?

[Chart: breakdown of memory accesses into bulk vs. fine-grained for Data Serving, Media Streaming, Online Analytics, Web Search, and Web Serving]
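The slide's bulk-access definition is easy to state as code. The sketch below classifies 1 KB regions of a block-address trace; the region and block sizes follow the talk, while the trace format is an assumption for illustration.

```python
# Minimal classifier for the slide's definition: an access pattern to a
# 1 KB region is "bulk" if it touches at least 50% of the region's bytes.
from collections import defaultdict

REGION = 1024   # bytes per region (from the slide's definition)
BLOCK = 64      # bytes per memory access (cache-block granularity)

def classify_regions(block_addresses):
    """Return {region_base: 'bulk' | 'fine-grained'} for a block-address trace."""
    touched = defaultdict(set)                 # region -> distinct blocks hit
    for addr in block_addresses:
        region = (addr // REGION) * REGION
        touched[region].add(addr // BLOCK)
    return {
        region: "bulk" if len(blocks) * BLOCK >= REGION // 2 else "fine-grained"
        for region, blocks in touched.items()
    }

# A streamed object touches every block of its region; a key probe touches one.
trace = [b * BLOCK for b in range(16)] + [4096]
print(classify_regions(trace))  # {0: 'bulk', 4096: 'fine-grained'}
```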

Bulk Accesses Are Poorly Exploited

Row buffer locality is poorly exploited
- Requests from multiple cores interleave
- Limited instruction window size restricts memory-level parallelism (MLP)

DRAM page activations are the chief contributor to memory energy (~60%)

[Charts: row buffer hits as a fraction of memory accesses, and activation energy as a fraction of memory energy, for the five workloads]

Need to improve row buffer locality

Prior Work

Memory access scheduling: prioritize row buffer hits
- Effectiveness limited by instruction window size

Spatial prefetching and scheduled writebacks
- High hardware cost
- Limited opportunity: only a fraction of memory accesses are covered

Need a comprehensive mechanism with low cost

Streaming Can Exploit Locality

- Stream the contents of the row buffer to the last-level cache (LLC)
- Subsequent accesses become LLC hits

[Diagram: row buffer contents streamed into the LLC, turning the application's subsequent requests into LLC hits]

Challenge: fine-grained accesses cause overfetch

BuMP: Bulk Memory Access Prediction and Streaming [MICRO'14]

Prediction: identify bulk accesses
- For both memory reads and writes

Streaming: upon an access to a bulk object
- Read: stream the entire object into the LLC
- Write: stream the entire object to memory

BuMP: Memory Reads

Memory reads are triggered by LLC misses
- The majority (57-75%) go to pages with coarse-grained data

Prediction: associate coarse-grained access regions with the code operating on them
- Identify functions that operate on coarse-grained data
- Use the program counter (PC) of the first access

Streaming: upon a memory reference
- Check whether the PC belongs to a coarse-grained operation
- If so, trigger a bulk fetch

Cost is low because only a few PCs trigger bulk accesses
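The prediction logic can be sketched as follows. This is a hypothetical behavioral model of a BuMP-style read predictor, not the paper's hardware: the table layout, sizes, and threshold are assumptions, but the idea matches the slide, i.e. remember the PC of the first access to each page, and flag that PC as coarse-grained once the page proves densely touched.

```python
# Behavioral sketch of PC-based bulk prediction (assumed parameters).
PAGE = 1024
BLOCK = 64
BULK_THRESHOLD = (PAGE // BLOCK) // 2   # >= half the blocks => coarse-grained

class BulkPredictor:
    def __init__(self):
        self.page_owner = {}    # page -> (PC of first access, blocks touched)
        self.bulk_pcs = set()   # PCs observed to operate on coarse-grained data

    def on_llc_miss(self, pc, addr):
        """Train on a miss; return True if a bulk fetch should be triggered."""
        page = addr // PAGE
        if page not in self.page_owner:
            self.page_owner[page] = (pc, set())
        owner_pc, blocks = self.page_owner[page]
        blocks.add(addr // BLOCK)
        if len(blocks) >= BULK_THRESHOLD:   # page proved coarse-grained:
            self.bulk_pcs.add(owner_pc)     # remember the triggering PC
        return pc in self.bulk_pcs          # predict: stream the whole object

pred = BulkPredictor()
# A (hypothetical) scan loop at PC 0x400A densely touches page 0...
for blk in range(8):
    pred.on_llc_miss(0x400A, blk * BLOCK)
# ...so its first touch of a new page now predicts a bulk fetch.
print(pred.on_llc_miss(0x400A, 10 * PAGE))  # True
```

Because only the handful of PCs that drive bulk scans ever enter `bulk_pcs`, the tracking state stays small, consistent with the slide's low-cost claim.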

BuMP: Memory Reads Prediction

[Diagram, shown in three animation steps: LLC misses populate a History Tracking Table that records the PC of each page's first access (e.g., pages A and B associated with PC1, another page with PC2)]

BuMP: Memory Reads Streaming

[Diagram: a reference whose PC matches a coarse-grained entry in the History Tracking Table triggers streaming of object C from the row buffer into the LLC]

Exploit row buffer locality when profitable

BuMP: Memory Writes

Memory writes are evictions of modified LLC blocks
- A significant share (21-38%) of DRAM traffic
- The majority (62-86%) go to pages with coarse-grained data

Prediction: track modified, LLC-resident coarse-grained data
- Extends the tracking table with a modified bit

Streaming: upon writing back an LLC block to memory
- Check whether it belongs to a page with coarse-grained data
- If so, trigger a bulk writeback
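The write path can be sketched in the same style. This is again a hypothetical model, not the paper's hardware: the data structures are illustrative, and the key behavior is that evicting one dirty block from a page known to be coarse-grained drains all of that page's dirty blocks in a single bulk writeback.

```python
# Behavioral sketch of bulk writebacks for coarse-grained pages (assumed).
PAGE = 1024
BLOCK = 64

class WritebackCoalescer:
    def __init__(self, coarse_pages):
        self.coarse_pages = set(coarse_pages)   # pages flagged coarse-grained
        self.dirty = {}                         # page -> set of dirty blocks

    def on_store(self, addr):
        self.dirty.setdefault(addr // PAGE, set()).add(addr // BLOCK)

    def on_evict(self, addr):
        """Return the list of block numbers written back by this eviction."""
        page = addr // PAGE
        if page in self.coarse_pages:           # bulk writeback: drain the page
            return sorted(self.dirty.pop(page, set()))
        self.dirty.get(page, set()).discard(addr // BLOCK)
        return [addr // BLOCK]                  # fine-grained: single block

wb = WritebackCoalescer(coarse_pages={0})
for blk in range(4):
    wb.on_store(blk * BLOCK)    # dirty blocks 0-3 of page 0
print(wb.on_evict(0))           # one eviction drains all four: [0, 1, 2, 3]
```

Draining a page's dirty blocks together lets all of them write into a single open row, amortizing one activation, which mirrors how the read path amortizes activations via streaming.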

Methodology

Server applications [CloudSuite]
- Data Serving, Online Analytics, Web Search, Web Serving, Media Streaming

Many-core server
- 16-core CMP @ 2.5 GHz
- 16 GB of DRAM

Performance evaluation
- Simics: full-system simulation
- Flexus: cycle-accurate models of CMP & DRAM

Energy consumption
- Custom DRAM energy models based on Micron

Evaluation Highlights

- BuMP reduces row activations by 2x over Base-Open
- Small over-fetch rate of ~12%
- Improves performance by ~11% over Base-Open

[Chart: memory energy breakdown (row activations vs. row buffer hits & interface) for Base and BuMP across Data Serving, Media Streaming, Online Analytics, Web Search, and Web Serving]

Improves memory energy per access by 23%

BuMP: Summary

Servers access memory at two granularities
- Fine: pointer-intensive data structures
- Coarse: bulk data objects

DRAM does not exploit coarse-grained accesses
- Accesses to different objects are interleaved

BuMP improves server energy efficiency
- Identifies bulk accesses & triggers bulk transfers
- Improves memory energy per access by 23%
