GNW01: In-Memory Processing for Databases

gluent.com 1

In-Memory Execution for Databases

Tanel Poder, a long time computer performance geek

Intro: About me

• Tanel Põder
• Oracle Database Performance geek (18+ years)
• Exadata Performance geek
• Linux Performance geek
• Hadoop Performance geek

• CEO & co-founder:

Expert Oracle Exadata book (2nd edition is out now!)

Instant promotion

Gluent as a data virtualization layer

[Diagram labels: Gluent, Oracle, Teradata, NoSQL, Big Data Sources]

Open Data Formats!

Gluent Advisor

1. Analyzes DB storage use and access patterns for safe offloading
2. 500+ databases analyzed
3. 10+ PB analyzed – 81% offloadable
4. 2-24x query speedup

10 PB

Interested in analyzing your database?

http://gluent.com/whitepapers

Tape is dead, disk is tape, flash is disk, RAM locality is king

Jim Gray, 2006

http://research.microsoft.com/en-us/um/people/gray/talks/flash_is_good.ppt

Seagate Cheetah 15k RPM disk specs

200 MB/sec!

Spinning disk IO throughput

• B-Tree index-walking disk-based RDBMS
  • 15000 rpm spinning disks
  • ~200 random IOPS per disk
  • ~8 kB read per random IO
  • 8 kB * 200 IOPS = 1.6 MB/sec per disk

• Full scanning based workloads
  • Potentially much more data to access & filter
  • Partition pruning, zone maps, storage indexes help to skip data [1]
  • Scan only required columns (formats with large chunk sizes)
  • Sequential IO rate up to 200 MB/sec per disk

[1] http://www.dbms2.com/2013/05/27/data-skipping/

However, index scans can read only a subset of data
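The per-disk arithmetic on this slide can be sketched as below. The IOPS, IO size and sequential rate are the slide's figures; the 100 GB table size is a hypothetical value added only to make the gap tangible.

```python
# Per-disk throughput for the two access patterns on the slide.
# Slide figures: ~200 random IOPS, ~8 kB per random IO, up to 200 MB/sec sequential.
# The 100 GB table size is hypothetical, chosen only for illustration.

RANDOM_IOPS = 200
IO_SIZE_KB = 8
SEQUENTIAL_MB_S = 200.0


def random_io_mb_per_sec(iops: int = RANDOM_IOPS, io_kb: int = IO_SIZE_KB) -> float:
    """Throughput of index-driven random reads (1 MB = 1000 kB, as on the slide)."""
    return iops * io_kb / 1000.0


def seconds_to_read(size_mb: float, mb_per_sec: float) -> float:
    """Time needed to read size_mb at a given rate."""
    return size_mb / mb_per_sec


random_mb_s = random_io_mb_per_sec()
speed_ratio = SEQUENTIAL_MB_S / random_mb_s
table_mb = 100 * 1000  # hypothetical 100 GB table

print(f"random IO:     {random_mb_s:.1f} MB/s per disk")
print(f"sequential IO: {SEQUENTIAL_MB_S:.0f} MB/s per disk ({speed_ratio:.0f}x)")
print(f"100 GB via random IO:     {seconds_to_read(table_mb, random_mb_s) / 3600:.1f} hours")
print(f"100 GB via sequential IO: {seconds_to_read(table_mb, SEQUENTIAL_MB_S) / 60:.1f} minutes")
```

The comparison only holds when the index path has to touch most of the table. An index scan that reads a small subset of the data can still win.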

Scanning a bunch of spinning disks can keep your CPUs really busy!

* Not even talking about flash or RAM here!

A simple query bottlenecked by CPU

9 GB scanned, processed in 7 seconds:

~1300 MB/s in PX, ~80 MB/s per slave

A complex query bottlenecked by CPU

Complex query: much more CPU spent on aggregations, joins. 9 GB processed in 1.5 minutes

9 GB / 90 seconds = ~100 MB/s in PX

6 MB/s per slave
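The throughput figures for the simple and the complex query can be reproduced with a few lines. The data volume and elapsed times are from the slides. The slides do not state the number of PX slaves, so `PX_SLAVES = 16` is an assumption inferred from the per-slave figures.

```python
# Scan throughput for the simple and the complex query.
# 9 GB and the elapsed times (7 s, 90 s) come from the slides.
# PX_SLAVES = 16 is an ASSUMPTION inferred from the per-slave numbers.

SCANNED_MB = 9 * 1024
PX_SLAVES = 16


def throughput(elapsed_sec: float, slaves: int = PX_SLAVES) -> tuple[float, float]:
    """Return (total MB/s, MB/s per slave) for one query run."""
    total = SCANNED_MB / elapsed_sec
    return total, total / slaves


simple_total, simple_per_slave = throughput(7)
complex_total, complex_per_slave = throughput(90)

print(f"simple query:  {simple_total:.0f} MB/s total, {simple_per_slave:.0f} MB/s per slave")
print(f"complex query: {complex_total:.0f} MB/s total, {complex_per_slave:.1f} MB/s per slave")
```

Both queries scan the same 9 GB, so the drop in per-slave throughput comes from the extra CPU work per row and not from the storage.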

If disks and storage subsystems are getting so fast, why all the buzz around in-memory database systems?

* Can't we just cache the old database files in RAM?

A simple Data Retrieval test!

• Retrieve 1% rows out of an 8 GB table:

    SELECT COUNT(*)
         , SUM(order_total)
    FROM orders
    WHERE warehouse_id BETWEEN 500 AND 510

The Warehouse IDs range between 1 and 999

Test data generated by SwingBench tool

Data Retrieval: Test Results

• Remember, this is a very simple scanning + filtering query:

TESTNAME                   PLAN_HASH   ELA_MS   CPU_MS      LIOS  BLK_READ
------------------------- ---------- -------- -------- --------- ---------
test1: index range scan *   16715356   265203    37438    782858    511231
test2: full buffered */ C  630573765   132075    48944   1013913    849316
test3: full direct path *  630573765    15567    11808   1013873   1013850
test4: full smart scan */  630573765     2102      729   1013873   1013850
test5: full inmemory scan  630573765      155      155        14         0
test6: full buffer cache   630573765     7850     7831   1014741         0

Test 5 & Test 6 run entirely from memory

Source: http://www.slideshare.net/tanelp/oracle-database-inmemory-option-in-action

But why 50x difference in CPU usage?
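The 50x figure is the CPU time of test6 (row format in the buffer cache) divided by the CPU time of test5 (in-memory column store). A small sketch using only the ELA_MS and CPU_MS values from the results table:

```python
# (ELA_MS, CPU_MS) per test, copied from the results table.
results = {
    "test1: index range scan": (265203, 37438),
    "test2: full buffered": (132075, 48944),
    "test3: full direct path": (15567, 11808),
    "test4: full smart scan": (2102, 729),
    "test5: full inmemory scan": (155, 155),
    "test6: full buffer cache": (7850, 7831),
}

# Use the in-memory columnar scan as the CPU baseline.
baseline_cpu = results["test5: full inmemory scan"][1]

for name, (ela_ms, cpu_ms) in results.items():
    print(f"{name:<26} ela {ela_ms:>7} ms  cpu {cpu_ms:>6} ms  "
          f"{cpu_ms / baseline_cpu:6.1f}x CPU vs test5")

cpu_ratio = results["test6: full buffer cache"][1] / baseline_cpu
```

Neither test5 nor test6 reads any blocks from disk, so the whole difference is CPU time spent on data that is already in RAM.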

Tape is dead, disk is tape, flash is disk, RAM locality is king

Jim Gray, 2006

http://research.microsoft.com/en-us/um/people/gray/talks/flash_is_good.ppt

Latency Numbers Every Programmer Should Know

Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms

Source: https://gist.github.com/jboner/2841832

CPU = fast

CPU L2/L3 cache in between

RAM = slow

RAM access is the bottleneck of modern computers

Waits for RAM access show up as CPU usage in monitoring tools

Want to wait less? Do it less!

CPU & cache friendly data structures are key!

[Diagram: a row-format data block. Block level: headers and ITL entries, then a row directory (#0 offset, #1 offset, #2 offset, #3 offset, …), then rows #0 to #8 and onwards, each with its own row header. Row level: header byte, lock byte, CC byte, followed by repeated (column length, column data) pairs.]

• OLTP: Block->Row->Column format
  • 8 kB blocks
  • Great for writes, changes
• Field-length encoding
  • Reading column #100 requires walking through all preceding columns
• Columns (with similar values) not densely packed together
• Not CPU cache friendly for analytics!
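Why reading column #100 means walking through all preceding columns can be shown with a simplified field-length encoded row. This is a minimal sketch, not the real block format: it assumes a single length byte per column and leaves out the row header, lock byte and column count. The helper names `encode_row` and `read_column` are hypothetical.

```python
def encode_row(values: list[bytes]) -> bytes:
    """Field-length encoding: each column is a 1-byte length followed by its data."""
    out = bytearray()
    for value in values:
        if len(value) > 255:
            raise ValueError("this sketch only supports columns up to 255 bytes")
        out.append(len(value))
        out += value
    return bytes(out)


def read_column(row: bytes, col_no: int) -> tuple[bytes, int]:
    """Return (column data, number of length bytes inspected) for a 0-based column."""
    pos = 0
    inspected = 0
    for _ in range(col_no):
        pos += 1 + row[pos]  # skip one preceding column: length byte + data
        inspected += 1
    length = row[pos]
    inspected += 1
    return row[pos + 1:pos + 1 + length], inspected


# A 100-column row: reaching the last column touches every length byte before it.
row = encode_row([f"val{i}".encode() for i in range(100)])
value, inspected = read_column(row, 99)
print(value, inspected)
```

There is no way to jump straight to a column because its offset depends on the lengths of all columns before it. A scan that needs only one column out of many pays this cost for every row.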

Scanning columnar data structures

[Diagram: scanning a column in a row-oriented data block, where col1 to col6 values are interleaved row by row, compared with scanning a column in a column-oriented compression unit, where the values of each column are stored together.]

Read filter column(s) first. Access only projected columns if matches found.

Reduced memory traffic. More sequential RAM access, SIMD on adjacent data.
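The "filter column first, projected columns only for matches" pattern can be sketched as follows. The table is synthetic (10,000 rows with made-up values), and the column names are borrowed from the earlier test query. The sketch shows the access pattern only. Python does not expose SIMD or cache behavior, so it says nothing about real speed.

```python
from array import array

N_ROWS = 10_000  # synthetic table size, chosen only for illustration

# Row-oriented layout: all columns of one row sit next to each other.
rows = [(i, 1 + i % 999, float(i % 97)) for i in range(N_ROWS)]

# Column-oriented layout: each column is one contiguous typed array.
columns = {
    "order_id": array("q", (r[0] for r in rows)),
    "warehouse_id": array("q", (r[1] for r in rows)),
    "order_total": array("d", (r[2] for r in rows)),
}


def scan_rows(lo: int, hi: int) -> tuple[int, float]:
    """Row format: every column of every row passes through the scan."""
    count, total = 0, 0.0
    for _order_id, warehouse_id, order_total in rows:
        if lo <= warehouse_id <= hi:
            count += 1
            total += order_total
    return count, total


def scan_columns(lo: int, hi: int) -> tuple[int, float]:
    """Column format: read the filter column first, then only matching positions."""
    warehouse_ids = columns["warehouse_id"]
    matches = [i for i, w in enumerate(warehouse_ids) if lo <= w <= hi]
    order_totals = columns["order_total"]
    return len(matches), sum((order_totals[i] for i in matches), 0.0)


print(scan_rows(500, 510))
print(scan_columns(500, 510))
```

Both functions return the same answer. The columnar version never touches `order_id` and reads `order_total` only at the matching positions.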

How to measure this stuff?

CPU Performance Counters on Linux

# perf stat -d -p PID sleep 30

 Performance counter stats for process id '34783':

      27373.819908 task-clock                #    0.912 CPUs utilized
    86,428,653,040 cycles                    #    3.157 GHz
    32,115,412,877 instructions              #    0.37  insns per cycle
                                             #    2.39  stalled cycles per insn
     7,386,220,210 branches                  #  269.828 M/sec
        22,056,397 branch-misses             #    0.30% of all branches
    76,697,049,420 stalled-cycles-frontend   #   88.74% frontend cycles idle
    58,627,393,395 stalled-cycles-backend    #   67.83% backend cycles idle
       256,440,384 cache-references          #    9.368 M/sec
       222,036,981 cache-misses              #   86.584 % of all cache refs
       234,361,189 LLC-loads                 #    8.562 M/sec
       218,570,294 LLC-load-misses           #   93.26% of all LL-cache hits
        18,493,582 LLC-stores                #    0.676 M/sec
         3,233,231 LLC-store-misses          #    0.118 M/sec
     7,324,946,042 L1-dcache-loads           #  267.589 M/sec
       305,276,341 L1-dcache-load-misses     #    4.17% of all L1-dcache hits
        36,890,302 L1-dcache-prefetches      #    1.348 M/sec

      30.000601214 seconds time elapsed

Measure what's going on inside a CPU!

Metrics explained in my blog entry: http://bit.ly/1PBIlde
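The derived metrics in the right-hand column of the perf output are plain ratios of the raw counters. A short sketch that recomputes them from the counter values shown on the slide (the helper name `ratio` is only for illustration):

```python
# Raw counter values copied from the perf stat output on the slide.
counters = {
    "cycles": 86_428_653_040,
    "instructions": 32_115_412_877,
    "stalled-cycles-frontend": 76_697_049_420,
    "cache-references": 256_440_384,
    "cache-misses": 222_036_981,
    "LLC-loads": 234_361_189,
    "LLC-load-misses": 218_570_294,
    "L1-dcache-loads": 7_324_946_042,
    "L1-dcache-load-misses": 305_276_341,
}


def ratio(numerator: str, denominator: str) -> float:
    """Divide one raw counter by another."""
    return counters[numerator] / counters[denominator]


ipc = ratio("instructions", "cycles")
stalled_per_insn = ratio("stalled-cycles-frontend", "instructions")
cache_miss_pct = 100 * ratio("cache-misses", "cache-references")
llc_miss_pct = 100 * ratio("LLC-load-misses", "LLC-loads")
l1_miss_pct = 100 * ratio("L1-dcache-load-misses", "L1-dcache-loads")

print(f"instructions per cycle:  {ipc:.2f}")
print(f"stalled cycles per insn: {stalled_per_insn:.2f}")
print(f"cache misses:            {cache_miss_pct:.3f} % of all cache refs")
print(f"LLC load misses:         {llc_miss_pct:.2f} % of LLC loads")
print(f"L1 dcache load misses:   {l1_miss_pct:.2f} % of L1 dcache loads")
```

An IPC of 0.37 together with a 93% last-level cache miss ratio is what a process waiting on RAM looks like: the CPU shows as busy while it mostly stalls on memory.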

Testing data access path differences on Oracle 12c

    SELECT COUNT(cust_valid) FROM customers_nopart c WHERE cust_id > 0

Run the same query on the same dataset stored in different formats/layouts.

Full details: http://blog.tanelpoder.com/2015/11/30/ram-is-the-new-disk-and-how-to-measure-its-performance-part-3-cpu-instructions-cycles/

Test result data: http://bit.ly/1RitNMr

CPU instructions used for scanning/counting 69M rows

Average CPU instructions per row processed

• Knowing that the table has about 69M rows, I can calculate the average number of instructions issued per row processed

CPU cycles consumed (full scans only)

CPU efficiency (Instructions-per-Cycle)

Yes, modern superscalar CPUs can execute multiple instructions per cycle

Reducing memory writes within SQL execution

• Old approach:
  1. Read compressed data chunk
  2. Decompress data (write data to temporary memory location)
  3. Filter out non-matching rows
  4. Return data

• New approach:
  1. Read and filter compressed columns
  2. Decompress only required columns of matching rows
  3. Return data
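The two approaches can be contrasted with a toy example. The slide does not name a compression scheme, so this sketch assumes dictionary encoding with a sorted dictionary, which lets a range predicate be evaluated directly on the codes. The data is synthetic and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def dict_encode(values):
    """Dictionary-encode a column: sorted distinct values plus one code per row."""
    dictionary = sorted(set(values))
    code_of = {v: code for code, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


# Synthetic columns, for illustration only.
wh_dict, wh_codes = dict_encode([1 + i % 999 for i in range(10_000)])
ot_dict, ot_codes = dict_encode([float(i % 97) for i in range(10_000)])


def old_approach(lo, hi):
    """Decompress everything into temporary memory, then filter."""
    warehouse_ids = [wh_dict[c] for c in wh_codes]  # one write per row
    order_totals = [ot_dict[c] for c in ot_codes]   # one write per row
    matching = [t for w, t in zip(warehouse_ids, order_totals) if lo <= w <= hi]
    decompressed = len(warehouse_ids) + len(order_totals)
    return len(matching), sum(matching), decompressed


def new_approach(lo, hi):
    """Filter on the compressed codes, decompress only the matching values."""
    # Translate the predicate into a code range once (the dictionary is sorted).
    lo_code = bisect_left(wh_dict, lo)
    hi_code = bisect_right(wh_dict, hi)  # exclusive upper bound
    hits = [i for i, c in enumerate(wh_codes) if lo_code <= c < hi_code]
    decoded = [ot_dict[ot_codes[i]] for i in hits]
    return len(decoded), sum(decoded), len(decoded)


print(old_approach(500, 510))
print(new_approach(500, 510))
```

The third element of each result counts how many values were decompressed. With this synthetic data the old approach decompresses 20,000 values and the new one 110, while both return the same count and sum.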

Memory reads & writes during internal processing

Unit = MB

Read only requested columns

Rows counted from chunk headers

Scan compressed data: few memory writes

Past & Future

Some commercial column store history

• Disk-optimized column stores
  • Expressway 103 / Sybase IQ (early '90s)
  • MonetDB (early '90s)
  • Oracle Hybrid Columnar Compression (disk/OLTP optimized)
  • …

• Memory-optimized column stores
  • …
  • SAP HANA (December 2010)
  • IBM DB2 with BLU Acceleration (June 2013)
  • Oracle Database 12c with In-Memory Option (July 2014)
  • …

* Not addressing memory-optimized OLTP/row-stores here

Future-proof Open Data Formats!

• Disk-optimized columnar data structures
  • Apache Parquet
    • https://parquet.apache.org/
  • Apache ORC
    • https://orc.apache.org/

• Memory / CPU-cache optimized data structures
  • Apache Arrow
    • Not only storage format
    • … also a cross-system/cross-platform IPC communication framework
    • https://arrow.apache.org/

Future

1. RAM gets cheaper + bigger, not necessarily faster
2. CPU caches get larger
3. RAM blends with storage and becomes non-volatile
4. IO subsystems (flash) get even closer to CPUs
5. IO latencies shrink
6. The latency difference between non-volatile storage and volatile RAM shrinks - new database layouts!
7. CPU cache is king – new data structures needed!

References

• Slides & Video of this presentation:
  • http://www.slideshare.net/tanelp
  • https://vimeo.com/gluent

• Index range scans vs full scans:
  • http://blog.tanelpoder.com/2014/09/17/about-index-range-scans-disk-re-reads-and-how-your-new-car-can-go-600-miles-per-hour/

• RAM is the new disk series:
  • http://blog.tanelpoder.com/2015/08/09/ram-is-the-new-disk-and-how-to-measure-its-performance-part-1/
  • https://docs.google.com/spreadsheets/d/1ss0rBG8mePAVYP4hlpvjqAAlHnZqmuVmSFbHMLDsjaU/

Thanks!

http://gluent.com/whitepapers

We are hiring developers & data engineers!!!

http://blog.tanelpoder.com
tanel@tanelpoder.com
@tanelpoder
