Ariadne: Prima Facie
Illya Shapoval
CERN, KIPT, UNIFE, INFN-FE
2nd LHCb Computing Workshop
4th-8th November 2013
CERN
8-10-2013 LHCb Computing Workshop , 4th-8th November, CERN 2
Content
● Objectives
● Approach
● Choice of DBMS
● Ariadne
● Use cases
Introduction
● LHCb data processing implies handling of heterogeneous metadata entities
– Versions of applications (in 2 dimensions)
– Conditions Database states (in 2x3 dimensions)
– Real data reconstruction types
– MC data simulation types
– Many others: trigger configurations, architecture specifiers, etc.
Objectives: how did it start
How are entities related?
What entities are compatible?
What entities are compatible with an application?
What entities are compatible with a CondDB state?
Is a CondDB state consistent?
What entities are compatible with a data processing type?
Requirements
● An operational space with a generic way of
– Expressing relationship constraints
– Tracking relationships
– Extracting solutions
● Ease of data management
● Flexibility (to extend the area of application)
Approach: property graphs
● Modeling structured metadata
– as nodes with attributes (key+value)
[Figure: example nodes carrying key+value attributes, such as A:1, A:2, A:3, B:"W", R:"T" and S:99]
Approach: property graphs
● Modeling structured metadata
– as nodes with attributes (key+value)
– with typed connections
[Figure: the same example nodes, now linked by typed connections labelled "compatible" and "xyz"]
Approach: property graphs
● Modeling structured metadata
– as nodes with attributes (key+value)
– with typed connections
● Tracking?
● Extracting?
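The property-graph idea on these slides can be illustrated with a minimal in-memory sketch. It is not Ariadne's or Neo4j's implementation: the class and method names (`PropertyGraph`, `add_node`, `connect`, `neighbours`) are hypothetical, connections are treated as undirected for simplicity, and the attribute values are just the placeholders from the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A metadata entity: an id plus free-form key/value attributes."""
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    """Nodes plus typed connections (undirected in this sketch)."""
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: set = field(default_factory=set)    # (rel_type, frozenset({a, b}))

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def connect(self, a, b, rel_type):
        self.edges.add((rel_type, frozenset((a, b))))

    def neighbours(self, node_id, rel_type):
        """Ids of nodes linked to node_id by a connection of the given type."""
        found = set()
        for kind, pair in self.edges:
            if kind == rel_type and node_id in pair:
                found |= pair - {node_id}
        return found


# Placeholder data mirroring the figure: attributes A, B, R, S and two
# connection types, "compatible" and "xyz".
g = PropertyGraph()
g.add_node(1, A=1)
g.add_node(2, A=2, B="W")
g.add_node(3, R="T", S=99)
g.connect(1, 2, "compatible")
g.connect(2, 3, "xyz")
```

Because the connection type is part of each edge, "tracking" reduces to following edges of one type, for example `g.neighbours(2, "compatible")`.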
Choice of DBMS model
Relevant characteristics                          | Relational solution | Graph solution
Object-relational impedance mismatch problem [1]  | Suffers from it [2] | Free of it [2]
Flexibility of schema                             | No                  | Yes
Performance of structural queries                 | Poor [3]            | Better [3]
Scaling to data complexity                        | Poor [4]            | Better [4]
Ease of data management                           | Doable              | Better
[1] C. Ireland et al., A Classification of Object-Relational Impedance Mismatch, DBKDA ’09.
[2] Correlated with the “Performance of structural queries” row of the table.
[3] C. Vicknair et al., A Comparison of a Graph Database and a Relational Database, ACM SE ’10, NY.
[4] See the next slide.
Other NoSQL DBMS models
[Figure: DBMS models placed on axes of data size and data complexity: key-value dbs, column dbs, document dbs, graph dbs, relational dbs. Annotation: "Still billions of nodes"]
Choice of graph DBMS
Taken from the “Knowledge Base of Relational and NoSQL DBMS” at: http://db-engines.com/en/ranking/graph+dbms
Neo4j
Ariadne: system design
A tracking system for relationships in LHCb metadata
(Leveled data flow diagram, shown in the Yourdon-DeMarco DFD notation)
[Figure: context diagram and level 0. Flows: "Metadata primitives" into Ariadne via Publish (CL tools, web FE); "Topological solutions" out of Ariadne via Query (Ariadne Python API, CL tools, web FE). Data store: Neo4j database]
Ariadne: previous security model
[Figure: Neo4j Jetty server; Admin CL tools and Neo4j admin web interface (RW access); public Ariadne XMLRPC server (RO access); clients: users’ CL tools and LHCb job (marked "?")]
Problem: Administration access was secured by IP-based rules, and was therefore very limited and inconvenient.
Ariadne: new security model
[Figure: Neo4j Jetty server; Admin CL tools and Neo4j admin web interface (RW access) behind an Apache server with SSO IAA; public Ariadne XMLRPC server (RO access); clients: users’ CL tools and LHCb job (marked "?")]
Evolution: All administration components of Ariadne are integrated into the CERN SSO IAA infrastructure (no LDAP!)
Ariadne: current knowledge graph
● Metadata entities that the current graph contains (>500)
– Applications (A)
– CondDB tags (T; or TD, TC and TQ to specify partitions)
– DetectorTypes (DataTypes) (D), RecoTypes (R), SimTypes (S)
– Platforms (P), GRID sites (G)
● Relationships between those entities (~50k)
– A-T, A-D, A-R, A-S
– D-T, R-T, S-T
Tracking Relationships: Matching Patterns (1)
● What is the full compatible set of entities for a concrete real data processing type?
[Figure: a query pattern over the nodes A, TD, TC, TQ, D and R is passed to Ariadne, which returns the matching solution]
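The pattern-matching step can be mimicked with a brute-force sketch in plain Python. This is only a stand-in for the graph query that Neo4j would execute, not Ariadne's code. The entity names are placeholders, the CondDB partitions are collapsed into a single kind T, and the tracked kind pairs are an assumed subset chosen for illustration.

```python
from itertools import product

# Hypothetical entities grouped by kind (A: application, T: CondDB tag,
# D: data type, R: reconstruction type). All names are placeholders.
ENTITIES = {
    "A": ["app-1", "app-2"],
    "T": ["tag-1", "tag-2"],
    "D": ["data-2012"],
    "R": ["reco-1", "reco-2"],
}

# Kind pairs between which compatibility is tracked (assumed for this sketch).
TRACKED = [("A", "T"), ("A", "D"), ("A", "R"), ("D", "T"), ("R", "T")]

# Hypothetical "compatible" relationships, stored as unordered pairs.
COMPATIBLE = {frozenset(p) for p in [
    ("app-1", "tag-1"), ("app-1", "data-2012"), ("app-1", "reco-1"),
    ("data-2012", "tag-1"), ("reco-1", "tag-1"),
    ("app-2", "tag-2"), ("app-2", "data-2012"), ("app-2", "reco-2"),
    ("data-2012", "tag-2"), ("reco-2", "tag-2"),
    ("app-2", "reco-1"),
]}


def match(fixed):
    """Return every choice of one entity per kind that honours `fixed` and
    has a "compatible" relationship on every tracked pair of kinds."""
    kinds = list(ENTITIES)
    choices = [[fixed[k]] if k in fixed else ENTITIES[k] for k in kinds]
    solutions = []
    for combo in product(*choices):
        picked = dict(zip(kinds, combo))
        if all(frozenset((picked[a], picked[b])) in COMPATIBLE
               for a, b in TRACKED):
            solutions.append(picked)
    return solutions


# "What is the full compatible set for reconstruction type reco-1?"
solutions = match({"R": "reco-1"})
```

With this placeholder data, `app-2` is compatible with `reco-1` but `reco-1` is not compatible with `tag-2`, so the only surviving combination is the one built around `app-1` and `tag-1`. Fixing more than one kind in `fixed` gives the two-constraint variant of the same query.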
Tracking Relationships: Matching Patterns (2)
● What is the full compatible set of entities for a concrete application and MC data processing type?
[Figure: a query pattern over the nodes A, TD, TC, TQ, D and S is passed to Ariadne, which returns the matching solution]
Tracking Relationships: Filtering Multiple Solutions (3)
● What is the full compatible set of entities for a concrete application and MC data processing type?
[Figure: a query pattern over the nodes A, TD, TC, TQ, D and S is passed to Ariadne, which returns a set of solutions; the "Latest" criterion selects among them]
Ariadne filters multiple results according to a criterion provided by the user.
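Filtering by a user-supplied criterion can be sketched as follows. The slide does not say how "Latest" is defined, so the ranking below (application version first, then tag version) is an assumption, as are the function names and the made-up version tuples.

```python
# Hypothetical solutions as a pattern match might return them; each entity
# is a (name, version) pair and the versions are invented for illustration.
solutions = [
    {"A": ("app", (1, 0)), "T": ("tag", (2, 1))},
    {"A": ("app", (1, 2)), "T": ("tag", (2, 0))},
    {"A": ("app", (1, 2)), "T": ("tag", (2, 3))},
]


def latest(solution):
    """Assumed ranking key: application version first, then tag version."""
    return (solution["A"][1], solution["T"][1])


def filter_solutions(candidates, criterion):
    """Keep only the solution(s) that maximise the user-supplied criterion."""
    if not candidates:
        return []
    best = max(criterion(s) for s in candidates)
    return [s for s in candidates if criterion(s) == best]


chosen = filter_solutions(solutions, latest)
```

Passing the criterion as a function keeps the filter generic: a different ranking can be swapped in without touching `filter_solutions`.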
Other application domains
How are entities related?
What entities are compatible?
What entities are compatible with an application?
What entities are compatible with a CondDB state?
Is a CondDB state consistent?
What entities are compatible with a data processing type?
…?
Summary
● Ariadne: a generic tracking system for relationships in LHCb metadata
– Provides a generic UI layer for heterogeneous metadata;
– Based on the novel Neo4j graph database;
– Provides powerful expressiveness when dealing with complex data (lots of relationships);
– Scalable and high-performance solution for complex data.
Ariadne developers:
Illya Shapoval
Marco Clemencic
Marco Cattaneo
Many thanks to Joel Closier for the admin support of the Ariadne hosting machine.
Special thanks to Regina Hunyadi for the amazing painting “Ariadne” and the authorization to use a copy of it.
The system is named after Ariadne, the character of ancient Greek legend who was in charge of the Cnossian Labyrinth and helped Theseus, with a ball of thread, to find his way back out of the labyrinth after killing the Minotaur.
Ariadne: system requirements
(narrows down to Neo4j’s requirements)
           | Minimum                  | Recommended     | Actual
CPU        | Intel Core i3            | Intel Core i7   | Intel(R) Xeon(R) CPU L5640 @ 2.27GHz
Memory     | 2GB                      | 16-32GB or more | 48GB
Disk       | SATA                     | SSD w/ SATA III | SATA II
Filesystem | ext4 (or similar)        | ext4, ZFS       | ext4
Software   | Oracle Java 7, OpenJDK 7 | Oracle Java 7   | Oracle Java 7
Compatibility of Entities: Definition
● Two entities are declared compatible if a job that uses them simultaneously:
A. never fails because of the combination;
B. is configured by the combination to work exactly as the developer anticipated.
Implication:
● A set of entities is declared compatible if and only if every pair of entities in the set between which compatibility is tracked is compatible.
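The set-level definition translates directly into a pairwise check. The sketch below is an illustration under assumptions, not Ariadne's API: the function name, its parameters and the entity names are all hypothetical.

```python
from itertools import combinations


def is_compatible_set(entities, kind_of, tracked_kinds, compatible_pairs):
    """True if every pair of entities whose kinds are tracked is compatible.

    entities         -- iterable of entity names
    kind_of          -- dict: entity name -> kind (e.g. "A", "T", "P")
    tracked_kinds    -- set of frozensets of kinds with tracked compatibility
    compatible_pairs -- set of frozensets of entity names declared compatible
    """
    for a, b in combinations(entities, 2):
        if frozenset((kind_of[a], kind_of[b])) not in tracked_kinds:
            continue  # compatibility is not tracked between these kinds
        if frozenset((a, b)) not in compatible_pairs:
            return False
    return True


# Placeholder data: compatibility is tracked only between applications (A)
# and CondDB tags (T); the platform (P) imposes no constraint here.
kind_of = {"app-1": "A", "tag-1": "T", "plat-1": "P"}
tracked = {frozenset(("A", "T"))}
declared = {frozenset(("app-1", "tag-1"))}

ok = is_compatible_set(["app-1", "tag-1", "plat-1"], kind_of, tracked, declared)
broken = is_compatible_set(["app-1", "tag-1", "plat-1"], kind_of, tracked, set())
```

Pairs whose kinds are not tracked are skipped rather than treated as failures, which is how the sketch reads the "between which compatibility is tracked" qualifier.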