Using NoSQL technologies for handling of the CMS Conditions



Roland Sipos for the CMS Collaboration

Forum on Concurrent Programming Models and Frameworks, 17 June 2015

Overview
● Intro
  ○ CMS Conditions Database
  ○ NoSQL
● Candidates
  ○ Test framework
  ○ Deployment
● Results
● Outlook

Intro

Conditions Database
Alignment and calibration constants that record a given "state" of the CMS detector.

Essential for the analysis and reconstruction of the recorded data.

Also critical for the dataflow; they need to be properly re-synchronized during data processing.

CondDB - Details
The Conditions workload is free from:
● Full table scans
  ○ Only "by key" (or range of keys) access
● Joins
● Complex, nested queries
● Transactions
  ○ Data is written once, and never deleted or altered
● Absolute consistency
  ○ Only consistency criterion: newly appended data should be available for reads ASAP (in less than a few seconds)
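The key-only access pattern above can be sketched as a pure key-value layout (a hypothetical illustration, not the actual CondDB schema): payloads are written once under a content-hash key, and an interval-of-validity (IOV) index maps a "since" time to the right payload key.

```python
import bisect
import hashlib

payloads = {}   # payload_hash -> BLOB (write-once, never altered)
iov_since = []  # sorted "since" times
iov_keys = []   # payload key valid from the matching "since" time

def write_payload(blob: bytes, since: int) -> str:
    """Append-only write: store the BLOB under its hash, extend the IOV index."""
    key = hashlib.sha1(blob).hexdigest()
    payloads.setdefault(key, blob)            # existing payloads are never overwritten
    pos = bisect.bisect_right(iov_since, since)
    iov_since.insert(pos, since)
    iov_keys.insert(pos, key)
    return key

def read_payload(run: int) -> bytes:
    """'By key' access only: find the last IOV whose 'since' <= run."""
    pos = bisect.bisect_right(iov_since, run) - 1
    if pos < 0:
        raise KeyError("no conditions valid for run %d" % run)
    return payloads[iov_keys[pos]]
```

Note that neither joins nor scans appear: every read is a single range-of-keys lookup, which is exactly the access shape a key-value store serves well.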

CondDB - Motivations
Find alternative data storage technologies for the CMS Conditions data for:
● storing BLOBs
● and their metadata
● in a read-heavy environment
Further requirements:
● Durability
● High availability
● (Optional scalability)

Do we really need relational access for such a use-case?

Relational vs. Non-relational
Relational:
● Based on the relational model: relational algebra
  ○ schema
● SQL: not mandatory, but the most widespread
● Transactions
  ○ ACID
● Well tested, "proven"

Non-relational:
● Not based on the relational model
  ○ schema-free
● Query languages may differ:
  ○ Datalog, XPath, etc.
● Unique operations (e.g. CRUD)
  ○ BASE
● Many quite new solutions
  ○ beta-phase versions, etc.

NoSQL - ACID vs. BASE
ACID
● Atomicity
  ○ "all or nothing" transactions
● Consistency
  ○ data is always valid
● Isolation
  ○ transactions are independent
● Durability
  ○ permanent state

BASE
● Basic Availability
  ○ it's OK to give approximate answers
● Soft state
  ○ easier (schema) evolution
● Eventual consistency
  ○ stored data achieves a consistent state with time

NoSQL - CAP Theorem

● Consistency, aka "All clients see the same data at the same time!"
● Availability, aka "Every request gets a response about success or failure!"
● Partition tolerance

[Diagram: RDBMSs sit on the consistency + availability side; eventual consistency on the availability + partition-tolerance side; enforced consistency (PAXOS) on the consistency + partition-tolerance side.]

"Of these three attributes, you can choose only two." - Eric Brewer, 2000

More than 10 years have passed, and much misleading/false information has been spread based on the theorem.

Partitioning - General
● Distributed memory cache
● Clustering
  ○ scaling the persistency layer
● Separate operations (reads and writes)
  ○ dedicated master for writes, group of slaves for reads
● Sharding - horizontal partitioning
  ○ the storage volume is distributed among many nodes

Scale up: "put more RAM or a better processor in the server"
Scale out: distribute data among computing elements

Scale out = partitioning

Partitioning - Vertical

Splitting up the data stored in one entity into multiple ones. (It's a bit more than normalization.)

"Different columns on different resources."

E.g.: frequently accessed columns in a separate table, cached in-memory; rarely used columns stored on disk.
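For a conditions-like record, vertical partitioning could look like the following sketch (store names and layout are hypothetical): small, frequently read metadata is kept in a cheap in-memory table, while the rarely touched BLOB column lives in a separate, slower store.

```python
hot_store = {}   # tag -> small metadata dict, cheap to cache in RAM
cold_store = {}  # tag -> large payload BLOB, served from disk

def insert(tag, since, version, blob):
    # Same logical row, split by column across two resources.
    hot_store[tag] = {"since": since, "version": version}
    cold_store[tag] = blob

def get_metadata(tag):
    return hot_store[tag]    # metadata-only queries do no BLOB I/O

def get_payload(tag):
    return cold_store[tag]   # the slower resource is hit only on demand
```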

Partitioning - Horizontal

One entity, but its data may be distributed over many storage elements.

"Different rows on different resources."

E.g.: year-based distribution over different storage elements based on an "insertion time" constraint.
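The year-based example above can be sketched as follows (the per-year "nodes" are plain dicts here, standing in for real storage elements): the insertion-time constraint routes both the write and the read to the right shard.

```python
import datetime

# One logical table; rows are spread over several storage elements
# according to a year-based "insertion time" constraint.
shards = {2013: {}, 2014: {}, 2015: {}}

def write_row(key, value, inserted: datetime.datetime):
    shards[inserted.year][key] = value   # the row lands on that year's node

def read_row(key, inserted: datetime.datetime):
    return shards[inserted.year][key]    # the same constraint routes the read
```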

NoSQL - General
NoSQL in keywords:
● Only a buzzword
  ○ Meaning: "One size does not fit all!"
● CAP Theorem
● ACID vs. BASE
● Different models
  ○ Document store, Key-Value, Column-oriented, BigTable

NoSQL means "we have options"!
Not against relational DBs, but a complement to them!

NoSQL - Options

[Diagram: taxonomy of database options. Top-level split: non-relational vs. relational, and operational vs. analytic; NoSQL models shown: Document, Key-value, DaaS, Column-oriented, Graph; alongside NewSQL (brand-new RDBMS and add-on). Example systems include Oracle, IBM DB2, MS SQL Server, MySQL, PostgreSQL, JustOneDB, Hadoop, Cloudera, Hadapt, Oracle TimesTen, IBM Infosphere, SAP (Hana, Sybase IQ), HP Vertica, SPARK, Lotus Notes, CouchDB, MongoDB, RavenDB, MarkLogic, Riak, Redis, Voldemort, BerkeleyDB, Couchbase, Cassandra, Accumulo, BigTable, HyperTable, HBase, Neo4j, SQL Azure, Amazon RDS, SimpleDB, App Engine, Xeround, FathomDB, NuoDB, Clustrix, VoltDB, SnakeSQL, ScaleDB, MySQL Cluster, GenieDB, Tokutek, Drizzle; plus flat, hierarchical, network, etc.]

source: Tim Gasper - Big Data Right Now: Five trendy open source technologies

Challenges 1.
How to choose? Rule of thumb: benchmarking!
But an exact way of NoSQL benchmarking cannot exist. (It's not like TPC-X for RDBMSs.) Even if we want to compare them, we must deal with different...
○ problems and possibilities,
○ APIs,
○ partitioning techniques, etc.

Challenges 2.
The main issue is that the designs and preferred use-cases of the NoSQL databases are REALLY different. There is no "x is better than y" argument for EVERY use-case.

Benchmarks are based on several use cases:
● different computing elements,
● compared by write/read/update/scan operations (mixed with a ratio, e.g. 90%/10%/0%)
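Such a mixed-ratio workload can be generated with a weighted draw; the sketch below (function name and ratios are illustrative, in the style of YCSB-like workload mixes) produces an operation sequence that is, e.g., 90% reads and 10% writes.

```python
import random

def workload(n_ops, ratios, seed=0):
    """Draw a benchmark operation mix, e.g. 90% read / 10% write / 0% update.

    ratios: mapping of operation name -> relative weight.
    A fixed seed makes the generated sequence reproducible across runs.
    """
    rng = random.Random(seed)
    ops, weights = zip(*ratios.items())
    return [rng.choices(ops, weights)[0] for _ in range(n_ops)]
```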

Prototypes - The candidates

Selection
In multiple phases...

Find:
● Showstopper problems (no-go)
● Barely usable (some issues)
● Promising candidates

Preliminary testing.

Candidates
No-go:
● HBase (w/ HDFS)
  ○ BLOB size problem
● CouchDB
  ○ Drivers
● Hypertable
  ○ Still in development
● etc.: app-layer needs, CAP characteristics, durability problems

Promising:
● MongoDB
● Cassandra

So-so:
● Riak
  ○ Query routing!
● (Couchbase)

Deployment
Automated virtual environments on OpenStack.
○ Personal tenant - biased by user interactions
○ Thanks to the collaboration with CERN IT, the evaluation was made on dedicated resources
○ SSD-cached vs. disk comparisons were also made

Details:
○ No overcommit
○ Instances are "equally" distributed over the hypervisors (for 5 nodes: 2-2-1 on 3 hypervisors)
○ 1 Gbit NICs (shared between co-hosted VMs)

Evaluation

Empirical evaluation: check whether a given prototype meets the usability and performance criteria of the desired solution.

If more than one passes the criteria, choose the best one based on essential features and performance characteristics.

CustomSamplers 1.
An extension for JMeter with CMS-specific needs, in order to measure the performance of different databases. For each candidate the extension has:
● Deployers
  ○ To build up the data model
● QueryHandlers
  ○ Simulate the CMS workflow
● ConfigElements
  ○ Configure persistency objects
● Samplers
  ○ Report to the testplan listeners

CustomSamplers 2.
Testplans are XML configurations that set up the behaviour of the testing engine by controlling:
● number of threads
● ramp-up time of thread creation
● connection-layer configuration
● assignment of requests to threads

1 TPS: 1200 threads started over 1200 seconds results in a 20-minute constant-stage test.

Results
Increasing request numbers: 1-9 TPS (for both remote and single testplans)
● Exploring limits of saturating factors like:
  ○ Network bandwidth
  ○ Access of persistency objects
  ○ Storage elements (ephemeral disk/SSD, Ceph)
● Scaling out (different cluster setups):
  ○ Node numbers (5 x m1.large, 4 x m1.medium)
  ○ Routing techniques (round-robin, token-aware)
  ○ Distributed testing (4 JMeter engines)

Single node - Normal test - DISK

Single node - Remote test - DISK

Single node - Normal test - SSD

Single node - Remote test - SSD

Medium cluster - Remote test - DISK

Medium cluster - Remote test - SSD

Large cluster - Remote test - DISK

Large cluster - Remote test - SSD

Ceph volume

Outro - Present and future

Application layer
The current implementation of the session layer is highly modular and extendable with alternative storage backends. Steps:
● Handling persistency objects
  ○ Extending the software framework with NoSQL support
● Implement the Session interfaces
  ○ Implementing the "equivalent" CondDB queries
● Testing

Integration
● Release validation
● Find differences between the current solution and the prototypes
  ○ Using real data
  ○ Real use-cases - using CMSSW

This will be the final performance comparison between the different deployments.

Outlook
● Understand and eliminate issues during the release validation
● Fine-tune critical performance factors
● Formal evaluation and comparison of the different solutions

Long-term project!
Not a "by tomorrow" change, but for LS2.

The end
Thank you for your attention!

Any questions are welcome!

From: http://geek-and-poke.com/