Transcript of "Memory System Characterization of Big Data Workloads"
Martin Dimitrov, Karthik Kumar, Patrick Lu, Vish Viswanathan, Thomas Willhalm

Page 1:

Memory System Characterization of Big Data Workloads Martin Dimitrov, Karthik Kumar, Patrick Lu, Vish Viswanathan, Thomas Willhalm

Page 2:

INTEL CONFIDENTIAL

Agenda

• Why big data memory characterization?

• Workloads, Methodology and Metrics

• Measurements and results

• Conclusion and outlook

Page 3:

Why big data memory characterization?

• Studies show exponential data growth to come.

• Big Data: information from unstructured data

• Primary technologies are Hadoop and NoSQL

Page 4:

Why big data memory characterization?

Large data volumes can put pressure on the memory subsystem.

Optimizations trade off CPU cycles to reduce the load on memory, e.g. compression.

It is important to understand the memory usage of big data:

• Power: memory consumes up to 40% of total server power

• Performance: memory latency, capacity, and bandwidth are important
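The compression tradeoff mentioned above can be illustrated with a small sketch (not from the slides) using Python's standard zlib module: spending CPU cycles on compression shrinks the in-memory footprint.

```python
# Illustration (not from the slides): trading CPU cycles for memory by
# compressing an in-memory data block with zlib from the standard library.
import zlib

# A repetitive 1 MB buffer standing in for an in-memory data block.
raw = b"big data record " * 65536

compressed = zlib.compress(raw, level=6)  # costs CPU cycles...
ratio = len(compressed) / len(raw)        # ...but shrinks the footprint

restored = zlib.decompress(compressed)
assert restored == raw
```

The same tradeoff underlies Hadoop's intermediate-data compression discussed later in the deck.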

Page 5:

Why big data memory characterization?

DRAM scaling is hitting limits

Emerging memories have higher latency

Focus on latency-hiding optimizations

How do latency-hiding optimizations apply to big data workloads?

Page 6:

Executive Summary

• Provide insight into memory access characteristics of big data applications

• Examine implications on prefetchability, compressibility, cacheability

• Understand impact on memory architectures for big data usage models

Page 7:

Agenda

• Why big data memory characterization?

• Workloads, Methodology and Metrics

• Measurements and results

• Conclusion and outlook

Page 8:

Big Data workloads

• Sort

• WordCount

• Hive Join

• Hive Aggregation

• NoSQL indexing

We analyze these workloads using hardware DIMM traces, performance counter monitoring, and performance measurements

Page 9:

General Characterization

Memory footprint from DIMM trace:

• Memory in GB touched at least once by the application

• Amount of memory needed to keep the workload "in memory"

EMON:

• CPI

• Cache behavior: L1, L2, LLC MPI

• Instruction and data TLB MPI

Understand how the workloads use memory
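A minimal sketch of how the footprint metric above could be computed, assuming the DIMM trace is available as a list of physical addresses (the trace format here is hypothetical):

```python
# Sketch (assumed trace format): estimate the memory footprint from a
# DIMM trace given as a list of physical addresses, by counting the
# unique 64-byte cache lines touched at least once.
CACHE_LINE = 64

def footprint_bytes(addresses):
    """Bytes of memory touched at least once, at cache-line granularity."""
    lines = {addr // CACHE_LINE for addr in addresses}
    return len(lines) * CACHE_LINE

trace = [0x1000, 0x1008, 0x1040, 0x2000, 0x1000]  # toy trace
fp = footprint_bytes(trace)  # 3 distinct lines -> 192 bytes
```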

Page 10:

Cache Line Working Set Characterization

1. For each cache line, compute number of times it is referenced

2. Sort cache lines by their number of references

3. Select a footprint size, say X MB

4. What fraction of total references is contained in X MB of the hottest cache lines?

Identifies the hot working set of the application
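The four steps above can be sketched as follows (a toy implementation; the trace is again assumed to be a list of addresses):

```python
# Sketch of the four steps above: rank cache lines by reference count
# and ask what fraction of all references the hottest footprint contains.
from collections import Counter

CACHE_LINE = 64

def hot_fraction(addresses, footprint_bytes):
    counts = Counter(addr // CACHE_LINE for addr in addresses)  # step 1
    ranked = sorted(counts.values(), reverse=True)              # step 2
    n_lines = footprint_bytes // CACHE_LINE                     # step 3
    return sum(ranked[:n_lines]) / len(addresses)               # step 4

# Toy trace: one line referenced 90 times, ten lines referenced once each.
trace = [0] * 90 + [64 * i for i in range(1, 11)]
share = hot_fraction(trace, 64)  # the single hottest line holds 90%
```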

Page 11:

Cache Simulation

Run the workload through an LRU cache simulator and vary the cache size.

Considers the temporal nature of accesses, not only spatial:

• Streaming through regions larger than the cache size

• Eviction and replacement policies impact cacheability

• Focus on smaller sub-regions

Hit rates indicate potential for cacheability in tiered memory architecture
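A minimal LRU simulator of the kind described, sketched under the assumption of 64-byte lines and a trace of addresses:

```python
# Minimal LRU cache simulator sketch: replay a trace of addresses
# against a given cache capacity and report the hit rate.
from collections import OrderedDict

CACHE_LINE = 64

def lru_hit_rate(addresses, cache_bytes):
    capacity = cache_bytes // CACHE_LINE  # lines that fit in the cache
    cache = OrderedDict()                 # least recently used line first
    hits = 0
    for addr in addresses:
        line = addr // CACHE_LINE
        if line in cache:
            hits += 1
            cache.move_to_end(line)       # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False) # evict least recently used
    return hits / len(addresses)

# Streaming cyclically through a region larger than the cache defeats
# LRU entirely, as the first bullet above warns.
stream = [64 * i for i in range(100)] * 2  # cycle over 100 lines, twice
rate = lru_hit_rate(stream, 64 * 50)       # cache holds only 50 lines
```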

Page 12:

Entropy

• Compressibility and predictability are important

• A signal with high information content is harder to compress and difficult to predict

• Entropy helps understand this behavior. For a set of cache lines K:
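The entropy formula itself did not survive the transcript. For reference probabilities p_k (the fraction of all references that fall on cache line k), the standard Shannon entropy would read:

```latex
% Reconstructed Shannon entropy over a set of cache lines K;
% p_k = (references to line k) / (total references).
% The log base used on the original slide is not recoverable here.
H(K) = -\sum_{k \in K} p_k \log p_k
```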

Lower entropy → more compressibility and predictability

Page 13:

Entropy - example

Three example access patterns (A), (B), (C), each with footprint 640 B (10 cache lines), 100 references, 10 references/line on average:

(A) 64-byte cache: 10% hits; 192-byte cache: 30% hits; entropy: 1

(B) 64-byte cache: 19% hits; 192-byte cache: 57% hits; entropy: 0.785

(C) 64-byte cache: 91% hits; 192-byte cache: 93% hits; entropy: 0.217

Lower entropy → more compressibility and predictability
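The effect in the example can be reproduced with a short sketch (base-2 entropy assumed; the slide's base or normalization is not recoverable from this transcript, and it only rescales the values):

```python
# Sketch: entropy of a cache-line reference distribution (base 2 assumed;
# the base used on the slides is not shown in this transcript).
from math import log2

def entropy(ref_counts):
    total = sum(ref_counts)
    probs = [c / total for c in ref_counts if c > 0]
    return -sum(p * log2(p) for p in probs)

uniform = [10] * 10        # like pattern (A): every line equally hot
skewed = [91] + [1] * 9    # like pattern (C): one very hot line

# The skewed pattern has lower entropy, hence it is easier to compress
# and predict, matching the tagline above.
```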

Page 14:

Correlation and Trend Analysis

Examine the trace for trends. E.g., an increasing trend in the upper physical address ranges → aggressively prefetch to an upper cache.

• With s = 64, l = 1000, the test function f mimics an ascending stride through a memory region of 1000 cache lines

• A negative correlation with f indicates a decreasing trend

High correlation → strong trend → predict and prefetch
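A sketch of the correlation test: the exact definition of the test function f is not shown in this transcript, so a sawtooth f(t) = s * (t mod l), which mimics an ascending stride through l cache lines of size s, is assumed here.

```python
# Sketch: correlate an address trace with a stride test function.
# The exact f from the slides is not shown in this transcript; a
# sawtooth f(t) = s * (t mod l) is assumed as a plausible form.
from statistics import mean

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def stride_f(t, s=64, l=1000):
    return s * (t % l)

f_vals = [stride_f(t) for t in range(1000)]

# A trace striding upward through memory correlates strongly with f;
# the reversed trace correlates negatively (a decreasing trend).
ascending = [64 * i for i in range(1000)]
r_up = pearson(ascending, f_vals)                  # close to +1
r_down = pearson(list(reversed(ascending)), f_vals)  # close to -1
```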

Page 15:

Agenda

• Why big data memory characterization?

• Big Data Workloads

• Methodology and Metrics

• Measurements and results

• Conclusion and outlook

Page 16:

General Characterization

• NoSQL and sort have highest footprints

• Hadoop compression reduces footprints and improves execution time

Page 17:

General Characterization

• Sort has highest cache miss rates (transform large volume from one representation to another)

• Compression helps reduce LLC misses

(Chart: L2 MPKI per workload)

Page 18:

General Characterization

• Workloads have high peak bandwidths

• Sort has a ~10x larger footprint than WordCount but lower DTLB MPKI: DTLB pressure depends on how widespread references are across page granularities, not on footprint size alone

Page 19:

Cache Line Working Set Characterization

• The hottest 100 MB contains 20% of all references

• NoSQL has the most spread among its cache lines

• Sort has a 120 GB footprint, but 60% of its references fall within 1 GB

Page 20:

Cache Simulation

The percentage of cache hits is higher than the percentage of references from the footprint analysis:

Big Data workloads operate on smaller memory regions at a time

Page 21:

Entropy

Big Data workloads have higher entropy (>13) than SPEC workloads (>7) → they are less compressible and predictable

(SPEC entropy data from [Shao et al. 2013])

Page 22:

Normalized Correlation

• Hive aggregation has high correlation magnitudes (+,-)

• Enabling prefetchers yields higher correlation in general

Potential for effective prediction and prefetching schemes for workloads like Hive aggregation

Page 23:

Take Aways & Next Steps

• Big Data workloads are memory intensive

• Potential for latency-hiding techniques like cacheability and predictability to be successful

• A large 4th-level cache can benefit big data workloads

• Future work:

  • Including more workloads in the study

  • Scaling dataset sizes, etc.