
© Copyright 2013-2016 Levyx Inc. Proprietary and Confidential

Linux on Power Anwendertag 2016

IBM – Böblingen

NoSQL database query software optimized for IBM Power

CAPI Flash-aware solutions

The Deluge of Data Is Here and Accelerating


Levyx – An Optimization Bridge between SW & HW

Software / Hardware (NVMs, flash SSDs, multi-core processors)

Hardware-agnostic storage middleware designed to optimize SW data transactions with the latest HW

IBM SW applications that can benefit from Levyx's technology: Netezza analytics, Big Insight, Dr. Watson, Spark, BigSQL, Memcached, etc.



Secret Sauce: Helium™ Data Access Engine

• Patent-pending multi-core optimization tool

• Ultra-low-latency indexing engine for billions of objects

• NVM/flash replaces DRAM

• Enables very dense nodes

• World's fastest key-value store




Helium: Flash/Multi-Core Optimized

Traditional stack: Application-analytics platform → Database → Database storage engine → OS file system → OS volume manager → OS device driver → Disk controller → Disk drive

Levyx stack: Application-analytics platform → Database → Levyx Helium → OS device driver → Flash controller F/W → Flash chips

Leverage multi-core/multi-channel parallelism to boost performance and reduce latency; reduce layers of abstraction and overhead.




Helium Key Attributes

• Compact RAM-based index – tens of billions of keys, PTB's of data

• Flash-optimized – tight 99th and 99.99th percentile latency

• Lock-free architecture

• Structured:

–Full SQL command set: Sort, Join, Group-by, Filter, Aggregate, Projections, etc.

• Unstructured:

–Get, Put, Delete, Point/Range Query, Point Update

• ACID compliance/transaction groups

• In-line dictionary compression

• Snapshot
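The unstructured operations listed above (get, put, delete, point/range query, point update) can be illustrated with a minimal in-memory sketch. This is not Helium's API: the class and method names are hypothetical, and a sorted Python list searched with `bisect` merely stands in for Helium's compact RAM-based index.

```python
from bisect import bisect_left, insort


class SortedKVStore:
    """Hypothetical key/value store illustrating get/put/delete and range queries.

    A sorted key list (searched with bisect) stands in for an ordered index;
    values live in a dict. This is NOT Helium's API.
    """

    def __init__(self):
        self._keys = []      # ordered index over all keys
        self._values = {}    # key -> value

    def put(self, key, value):
        # Insert or overwrite; the index only changes for a new key.
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def get(self, key, default=None):
        # Point query.
        return self._values.get(key, default)

    def update(self, key, fn):
        # Point update: apply fn to an existing value.
        if key not in self._values:
            raise KeyError(key)
        self._values[key] = fn(self._values[key])

    def delete(self, key):
        # Remove the key from both the value map and the ordered index.
        if key in self._values:
            del self._values[key]
            del self._keys[bisect_left(self._keys, key)]

    def range(self, lo, hi):
        # Range query: all (key, value) pairs with lo <= key < hi, in key order.
        start = bisect_left(self._keys, lo)
        end = bisect_left(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[start:end]]


store = SortedKVStore()
for k, v in [("c", 3), ("a", 1), ("b", 2), ("d", 4)]:
    store.put(k, v)
store.update("b", lambda v: v * 10)
store.delete("d")
print(store.range("a", "c"))  # [('a', 1), ('b', 20)]
```

Keeping the index ordered is what makes range queries cheap: a range is two binary searches plus a contiguous slice, rather than a scan over every key.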



Helium: Programming Language/Platform/Wrapper Support

• Portable implementation with architecture- and OS-specific dependencies fully isolated

• Available on Unix/Linux/Windows/Mac platforms

• Distributed in the form of a library

• Fully documented key/value API

• Bundled as a server with client API support in popular languages

–C, C++, Java, Node.js, REST, etc.

• Wrappers for popular KVS

–RocksDB, Memcached

• Platform for integration with other technologies

–Support for structured data (to improve Spark's shuffle performance)

–Columnar database integration with SparkSQL



What We Have Innovated

“Traditional” data path: main memory → I/O subsystem → flash

• Flash/NVM treated as second-class citizens

• File systems and OS kernels not designed to fully utilize bandwidth

• Block-oriented, unstructured access

Levyx data path: registers, caches, main memory/flash/NVM

• A single, persistent, high-capacity, high-bandwidth, low-latency memory layer

• Scalable with the number of cores in a system

• Object-oriented, highly structured data access

Our innovation: a simpler, more scalable I/O stack


CAPI-Attached Flash Optimization: CAPI changes the game. Direct flash access feels more like memory and less like storage.

• Attach IBM FlashSystem to POWER8 via CAPI

• Read/write commands issued via APIs directly from applications eliminate 97% of the code path length

• Saves 10+ cores per 1M IOPS

Conventional path: Application → read/write syscall → file system → LVM → disk and adapter device driver (strategy(), iodone()). Pin buffers, translate, map DMA, start I/O; on completion: interrupt, unmap, unpin, iodone scheduling.

CAPI Flash path: Application → user library (POSIX async I/O-style API: aio_read(), aio_write()) → shared-memory work queue.

20K instructions reduced to <2000
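Python's standard library has no binding for POSIX `aio_read()`/`aio_write()`, so the sketch below only imitates their asynchronous pattern: submit reads now, reap completions later. A thread pool issues `os.pread` calls (Unix only) and the caller collects the results afterwards. It does not model CAPI, the shared-memory work queue, or the shortened code path; the helper names `submit_reads` and `reap` are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def submit_reads(pool, fd, requests):
    """Queue (offset, length) reads and return immediately with one future per read.

    Hypothetical helper, loosely analogous to filling in an aiocb and calling
    aio_read(): the caller does not block while the reads are in flight.
    """
    return [pool.submit(os.pread, fd, length, offset) for offset, length in requests]


def reap(futures):
    """Wait for and collect results in submission order.

    Loosely analogous to aio_suspend() followed by aio_return().
    """
    return [f.result() for f in futures]


# Usage: write a small file, then issue three reads asynchronously.
with tempfile.TemporaryFile() as f:
    f.write(b"0123456789" * 4)
    f.flush()
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = submit_reads(pool, f.fileno(), [(0, 4), (10, 4), (35, 5)])
        # Other work could run here while the reads complete.
        print(reap(pending))  # [b'0123', b'0123', b'56789']
```

A thread pool still pays for a kernel system call per read; the point here is only the submit/complete shape of the interface, not the reduction in instructions per I/O.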


Performance of Spark on POWER: 7-node S812LC (10-core) vs. 7-node E5-2690 v3 (12-core)

Workloads: Machine Learning, SQL, Graph

• 1.7X system-to-system advantage
• 2X core-to-core advantage
• 1.5X price-performance advantage
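The 1.7X system-level and 2X core-level figures are consistent if, as assumed here, each S812LC node contributes 10 cores and each E5-2690 v3 node 12 cores: normalizing the system advantage by total core count gives roughly 2X. A minimal check of that arithmetic, with the per-node core counts labeled as assumptions:

```python
NODES = 7
POWER_CORES_PER_NODE = 10   # assumption: one 10-core POWER8 per S812LC node
INTEL_CORES_PER_NODE = 12   # assumption: one 12-core E5-2690 v3 per node
SYSTEM_ADVANTAGE = 1.7      # system-to-system figure from the slide


def core_to_core_advantage(system_adv, nodes, power_cores, intel_cores):
    """Per-core advantage = system advantage x (total Intel cores / total POWER cores)."""
    return system_adv * (nodes * intel_cores) / (nodes * power_cores)


adv = core_to_core_advantage(SYSTEM_ADVANTAGE, NODES, POWER_CORES_PER_NODE, INTEL_CORES_PER_NODE)
print(f"{adv:.2f}")  # 2.04
```

Since both clusters have 7 nodes, the node count cancels and the result reduces to 1.7 × 12/10.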

CAPI Flash Configurations

• External flash configuration: Power S822L / S812L + FlashSystem 900 – up to 56TB of extended memory with one POWER8 server + CAPI-attached flash
• Integrated flash configuration (NEW): Power S822L / S812L / S822LC – up to 8TB of super-fast storage tier on one POWER8 server

[Charts: IOPS per hardware thread and latency (microseconds) for Conventional, CAPI-I and CAPI-E]



CAPI Flash Solution Use Cases

Memory Expansion

• Application is constrained by single-system memory capacity; typical growth is through additional compute nodes.

• CAPI Flash APIs offer highly efficient flash access and increased total capacity at better $/throughput.

Data Cache

• Application uses in-memory caches for data storage and is typically constrained by the ratio of memory to underlying storage.

• CAPI Flash APIs offer access to much larger ephemeral or persistent data in flash, freeing up RAM.

Fast Storage

• Application is constrained by the I/O overhead and throughput of the existing storage infrastructure.

• CAPI Flash APIs offer extremely high I/O per CPU thread with low latency.
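For the Data Cache use case, the general idea of keeping a small hot set in RAM while a much larger data set lives in flash can be sketched as a two-tier cache. This is a generic illustration, not the CAPI Flash API: the class `TwoTierCache` is hypothetical, an `OrderedDict` LRU models the RAM tier, and a plain dict stands in for the flash-backed tier.

```python
from collections import OrderedDict


class TwoTierCache:
    """Hypothetical RAM-over-flash cache: a bounded LRU tier in front of a large backing tier."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()   # hot entries, least recently used first
        self.flash = {}            # stand-in for the much larger flash tier

    def put(self, key, value):
        # Write through to the flash tier so the entry survives eviction from RAM.
        self.flash[key] = value
        self._promote(key, value)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)      # mark as most recently used
            return self.ram[key]
        if key in self.flash:
            value = self.flash[key]        # RAM miss: read from the flash tier
            self._promote(key, value)
            return value
        return None

    def _promote(self, key, value):
        # Place the entry in RAM and evict the least recently used one if over capacity.
        self.ram[key] = value
        self.ram.move_to_end(key)
        if len(self.ram) > self.ram_capacity:
            self.ram.popitem(last=False)


cache = TwoTierCache(ram_capacity=2)
for i in range(5):
    cache.put(i, i * i)
print(list(cache.ram))  # [3, 4]
print(cache.get(0))     # 0 (served from the flash tier, then promoted)
print(list(cache.ram))  # [4, 0]
```

The RAM tier's size no longer bounds the data set; it only bounds how much of it is served at DRAM speed.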

Netezza analytics, Big Insight, Dr. Watson, Spark, BigSQL, Memcached etc…



Helium vs RocksDB vs Aerospike: http://www.levyx.com/content/helium-demo




Levyx Products

• Helium-DB Storage Engine

–World's fastest key-value store for big data analytics and operational databases

–In-memory speeds for very large data sets, with persistence

• LevyxSpark: Apache Spark + Helium

– Storage-optimized and accelerated open-source Spark for real-time/high-I/O-performance applications

– Full Spark SQL query pushdown (join, group-by, filter, etc.) and acceleration to machine-code speeds

– Node consolidation with a combined memory-flash storage layer

• Levyx Enhanced Memcached

–Maximized resource utilization – over 1 million transactions per second per node

–Data persistence without sacrificing ACID properties



Helium Accelerated Memcached

• Faster: with standard 90:10 and 70:30 (get:set) mixes, Helium-Memcached is at least 3.5 to 5.5 times better in TPS.

• Cheaper: a single Helium-Memcached node scales with cores/SSD vs. stock Memcached (which needs multiple nodes and large amounts of RAM).

• Simpler: plug and play with existing Memcached applications. Rapid automatic recovery from persisted SSD simplifies

TPS, 70% get / 30% set     Memcached    Levyx Memcached
SolarFlare                 1.3 M        5.1 M
SolarFlare OpenOnload      1.5 M        7.4 M

TPS, 90% get / 10% set     Memcached    Levyx Memcached
SolarFlare                 3.9 M        10.0 M
SolarFlare OpenOnload      4.2 M        14.8 M
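The per-configuration speedups can be computed directly from the Memcached TPS figures above. The sketch does only that arithmetic; the dictionary layout is an illustrative choice, and the numbers are copied from the slide.

```python
# TPS figures (millions) from the slide: (stock Memcached, Levyx Memcached).
RESULTS = {
    ("70:30", "SolarFlare"): (1.3, 5.1),
    ("70:30", "SolarFlare OpenOnload"): (1.5, 7.4),
    ("90:10", "SolarFlare"): (3.9, 10.0),
    ("90:10", "SolarFlare OpenOnload"): (4.2, 14.8),
}


def speedups(results):
    """Return Levyx Memcached TPS divided by stock Memcached TPS, rounded to 2 places."""
    return {case: round(levyx / stock, 2) for case, (stock, levyx) in results.items()}


for (mix, network), ratio in speedups(RESULTS).items():
    print(f"{mix} get:set, {network}: {ratio}x")
```

With these figures, the ratios range from about 2.56x (90:10, SolarFlare) to about 4.93x (70:30, SolarFlare OpenOnload).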



LevyxSpark Reduces Nodes and Cost

• Spark without Levyx: 500 nodes (r3.8xlarge), $33,600/day
• Spark with Levyx: 50 nodes (c3.8xlarge), $1,920/day
• 15X lower cost!
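The slide's totals can be broken down per node. Using only the node counts and daily costs stated above, the totals give a 17.5X cost ratio, above the 15X headline, on a 10X reduction in node count. The implied per-node-hour prices assume the daily cost is spread evenly across all nodes and 24 hours; no instance pricing is looked up.

```python
# Figures from the slide: node count and total cost per day for each cluster.
WITHOUT_LEVYX = {"nodes": 500, "usd_per_day": 33_600}
WITH_LEVYX = {"nodes": 50, "usd_per_day": 1_920}


def per_node_per_hour(cluster):
    """Implied cost per node-hour, assuming cost is spread evenly over all nodes and 24 hours."""
    return cluster["usd_per_day"] / cluster["nodes"] / 24


cost_ratio = WITHOUT_LEVYX["usd_per_day"] / WITH_LEVYX["usd_per_day"]
node_ratio = WITHOUT_LEVYX["nodes"] / WITH_LEVYX["nodes"]
print(f"cost ratio {cost_ratio:.1f}x, node ratio {node_ratio:.0f}x")
# cost ratio 17.5x, node ratio 10x
print(f"${per_node_per_hour(WITHOUT_LEVYX):.2f} vs ${per_node_per_hour(WITH_LEVYX):.2f} per node-hour")
# $2.80 vs $1.60 per node-hour
```

The cost ratio exceeds the node ratio because the 50 Levyx nodes are also cheaper per node.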

Cyber Security Real Time Monitoring Use Case

“Often times technology vendors advertise scale-out as a way to reach high performance goals. It is a proven approach, but it is often used to mask single node inefficiencies. Without a solution where CPU, memory, network, and local storage are properly balanced, this is simply what we call ‘throwing hardware at the problem’. Hardware that, virtual or not, customers pay for.”

– Google Blog, 2015, in reference to Levyx and its groundbreaking technology

Multi-million operations per second on a single Google Compute Engine instance, Thursday, July 30, 2015: https://cloudplatform.googleblog.com/2015/07/Multi-million-operations-per-second-on-a-single-Google-Cloud-Platform-instance.html



LevyxSpark Advantages

• Faster

–Combined solution provides superior performance vs. native Apache Spark, especially in situations involving:

•Large datasets with sorting, joins, group-by (heavy shuffling)

•Workloads with small random inserts and point queries

•Index lookups instead of filtering

• Cheaper

–Up to 90% reduction in nodes, or lower-cost nodes, for equivalent or greater “in-memory” capacity

• Simpler

–Reduced network complexity

–No need to tier from Memory to Flash



LevyxSpark and OpenPOWER: Ideal Dense “Scale-in” Platform

• POWER8

–High core count/relatively low cost

–CAPI high-performance interface

• 2-week porting effort

–Goal: native Spark (FC) vs. LevyxSpark (CAPI)

• Test unit

–Power System S822L 2-socket POWER8 server

–20 POWER8 cores, 160 logical CPUs (SMT8, 8 threads per core)

–256GB RAM

–Apache Spark 1.6

–FC and CAPI HBAs connected to IBM FlashSystem 840

–Ubuntu 16.04.01



Test Benchmarks

• Sort – Integer, String, GenSort

–Read an input table from the data ingestion drive

–Sort the table based on an integer column

–Write the sorted table to the flash subsystem

• Iterative Join

–Read 16 tables from the data ingestion drive

–Save the final join result to the flash subsystem

–For 10 iterations:

•Change one of the inputs of the join graph

•Calculate the new value of the final join result

•Update the result on the flash subsystem

• Incremental Update to Sorted Table

–Read an input table from the data ingestion drive as a baseline data set

–For 10 iterations:

•Read another small table from the data ingestion drive

•Add all elements of the small table to the baseline data set

•Sort the baseline data on the first integer column

•Write the sorted table to the flash subsystem
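The Incremental Update benchmark loop described above can be sketched in a few lines. This is an in-memory illustration only, not the benchmark code: rows are tuples whose first element is the integer sort key, the `write_sorted` callback stands in for writing to the flash subsystem, and the function and parameter names are hypothetical.

```python
import random


def incremental_update(baseline, read_small_table, write_sorted, iterations=10):
    """Hypothetical sketch of the Incremental Update benchmark loop.

    baseline         -- list of rows (tuples) read from the ingestion drive
    read_small_table -- callable returning the next small batch of rows
    write_sorted     -- callable that persists the sorted table (stand-in for flash)
    """
    table = list(baseline)
    for _ in range(iterations):
        table.extend(read_small_table())        # add all rows of the small table
        table.sort(key=lambda row: row[0])      # sort on the first integer column
        write_sorted(table)                     # persist the sorted result
    return table


rng = random.Random(42)
base = [(rng.randint(0, 10_000), "base") for _ in range(1_000)]
written = []
result = incremental_update(
    base,
    read_small_table=lambda: [(rng.randint(0, 10_000), "inc") for _ in range(50)],
    write_sorted=lambda t: written.append(len(t)),
)
print(len(result), written[-1])  # 1500 1500
```

Re-sorting the whole table on every iteration mirrors the benchmark description. Python's Timsort is efficient on such mostly sorted input; a key-value engine with an ordered index could instead insert the new rows directly.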




Specification – Test Bench Summary

Benchmark            Data Set Size (GB)    Comment
Sort                 64, 128, 256, 512     Highlight advantage of LevyxSpark in analytical use cases
Iterative Join       128, 256, 512         Highlight advantage of LevyxSpark in data persisting
Incremental Update   128, 256, 512         Highlight advantage of LevyxSpark in transactional use cases

PERFORMANCE COMPARISON


Integer Sort Test Bench

[Charts: execution time and average CPU user % vs. input size (64, 128, 256, 512 GB), LevyxSpark vs. Spark]

String Sort Test Bench

[Charts: execution time and average CPU user % vs. input size (64GB, 128GB, 256GB, 512GB), LevyxSpark vs. Spark]

GenSort Test Bench

[Charts: execution time and average CPU user % vs. input size (64GB, 128GB, 256GB), LevyxSpark vs. Spark]

Iterative Graph Test Bench

[Charts: execution time and average CPU user % vs. input size (128GB, 176GB, 256GB), LevyxSpark vs. Spark; two data points are marked “Stock Spark Error”]

Incremental Update

[Charts: execution time and average CPU user % vs. input size (64GB, 128GB, 256GB), LevyxSpark vs. Spark]



Summary

LevyxSpark plus POWER8/CAPI integration: an ideal combination for I/O-intensive Apache Spark workloads

– Balanced scale-in platform: fewer nodes needed for a given workload

– Cores freed up by CAPI integration allow more analytical/computational workloads

– Larger datasets per node; reduced shuffling, spills and crashes

Levyx is ready to collaborate on making applications POWER CAPI Flash-aware



Contact within IBM

Randy [email protected]

Questions?

Levyx Contact

Kim [email protected]: +4916094878794



www.levyx.com

Danke Schön!

Thank You!

