J. Portell (IEEC-UB / DAPCOM) - Unveiling the Galaxy
Unveiling the mysteries of our Galaxy with InterSystems Caché
Dr. Jordi Portell i de Mora
on behalf of the Gaia team at the Institute for Space Studies of Catalonia (IEEC-UB)
and the Gaia Data Processing and Analysis Consortium throughout Europe
InterSystems Spain Summit 2016 - Barcelona, 26 October 2016
It’s all about big numbers
After all, our areas are not that different:
• Human genome: ~3 billion base pairs
• Company customers: up to ~7 billion people (potentially)
• Our Galaxy: ~200 billion stars
So you want a census of our Galaxy?
1. Get the list of stars
2. Locate them
3. Get their information:
   a) CV: Where are they coming from? (And where are they going?)
   b) Education: How ‘bright’ are they?
   c) Personal interests: What is their favourite color?
   d) Food habits: What are they composed of?
4. Do this for a representative fraction of the Galaxy
Gentle reminder: we’re talking about ~200 billion stars (2×10¹¹)
The answer: ESA’s Gaia mission
• Global astrometry from space
– Positions, distances and motions of ~1 billion stars
– Photometry: brightness and colors
– Spectroscopy: “fingerprint” (chemical composition)
• Most complete and accurate Milky Way census to date
– Accuracy: ~0.000000004 degrees (~15 microarcseconds)
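As a quick check of the accuracy figure, converting 15 microarcseconds to degrees uses only the definition 1 degree = 3600 arcseconds:

```latex
1^\circ = 3600'' = 3.6\times10^{9}\,\mu\mathrm{as}
\quad\Longrightarrow\quad
15\,\mu\mathrm{as} = \frac{15}{3.6\times10^{9}}\,{}^\circ \approx 4.2\times10^{-9}\,{}^\circ \approx 0.000000004^\circ
```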
The Gaia spacecraft in a nutshell
• Orbit around the “L2 point”, 1.5 million km from Earth
• 5 years of operations (nominally)
• 2 telescopes
• Gigapixel camera: 106 CCDs, 9 Mpix each
• Autonomous operation: object detection on board
Launch!
• Watch the video here: https://www.youtube.com/watch?v=GidVVgTEfJg
• …and more Gaia videos here: http://www.cosmos.esa.int/web/gaia/media-gallery/videos
Data, data, data!
• Downlink: 7 Mbps, 8 h/day
– 25 GB/day (65 GB uncompressed)
– 115 TB in 5 years
• 50 million measurements/day (1 measurement = 12-15 tiny photos)
– 100 billion measurements in 5 years
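The downlink figures above can be cross-checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming decimal units (1 GB = 10⁹ bytes) and 365-day years; the helper names are illustrative only.

```python
# Back-of-the-envelope check of the Gaia downlink figures.
# Assumptions: decimal units (1 GB = 1e9 bytes) and 365-day years.

SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365


def daily_downlink_gb(rate_mbps: float, hours_per_day: float) -> float:
    """Data volume received per day at a given link rate, in decimal GB."""
    bits = rate_mbps * 1e6 * hours_per_day * SECONDS_PER_HOUR
    return bits / 8 / 1e9


def mission_total(per_day: float, years: float) -> float:
    """Accumulate a daily quantity over a mission of the given length."""
    return per_day * DAYS_PER_YEAR * years


if __name__ == "__main__":
    print(f"Downlink: {daily_downlink_gb(7, 8):.1f} GB/day")
    print(f"Uncompressed, 5 years: {mission_total(65, 5) / 1000:.0f} TB")
    print(f"Measurements, 5 years: {mission_total(50e6, 5) / 1e9:.0f} billion")
```

Running it prints 25.2 GB/day, 119 TB and 91 billion measurements: the same order as the quoted 25 GB/day, ~115 TB and ~100 billion.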
And… now what do we do with the data?
• All data must be promptly received and handled
– Continuous data accumulation and arrangement
• Spacecraft health must be monitored daily
– Many issues can only be detected after data processing
• Big processing only at the end? NO!
– Incremental data processing
• VERY complex algorithms
– Iterative solution of a HUGE system of equations
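To give a feel for what “iterative solution” means, here is a generic conjugate-gradient solver for a small symmetric positive-definite system A x = b. It is a textbook stand-in chosen for illustration, not the solver DPAC actually uses; plain Python lists keep it dependency-free, whereas a system of Gaia’s size needs sparse storage and distributed computation.

```python
# Generic conjugate-gradient solver for A x = b, A symmetric positive-definite.
# Illustration only: not DPAC's actual solver.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Iteratively refine x until the residual norm |b - A x| is below `tol`."""
    x = [0.0] * len(b)
    r = list(b)                # residual for the initial guess x = 0
    p = list(r)                # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                     # optimal step along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]    # next A-conjugate direction
        rs_old = rs_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` converges to about `[0.0909, 0.6364]` (that is, 1/11 and 7/11). Each iteration only needs matrix-vector products, which is why this family of methods scales to very large systems.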
The Data Processing and Analysis Consortium (DPAC)
DPAC keystone: the Daily Pipeline
• Reception and decoding of raw data packets
• Decompression
• Initial Data Treatment (IDT)
• First Look diagnostics and calibrations
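The four stages above form a chain in which each stage consumes the previous stage’s output. The Python sketch below shows only that chaining under loudly hypothetical assumptions: a made-up packet format (4-byte length header plus payload), zlib as a stand-in codec, JSON as a stand-in for raw measurements, and a trivial “First Look” summary. None of this reflects the real Gaia telemetry format or the Java pipeline.

```python
import json
import struct
import zlib


def decode_packet(packet: bytes) -> bytes:
    """Reception/decoding: strip a hypothetical 4-byte big-endian length header."""
    (length,) = struct.unpack(">I", packet[:4])
    payload = packet[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return payload


def decompress(payload: bytes) -> bytes:
    """Decompression: zlib is only a stand-in for the mission's own codec."""
    return zlib.decompress(payload)


def initial_data_treatment(raw: bytes) -> list:
    """IDT stand-in: turn raw bytes into measurement records."""
    return json.loads(raw.decode("utf-8"))


def first_look(measurements: list) -> dict:
    """First Look stand-in: a simple daily diagnostic summary."""
    fluxes = [m["flux"] for m in measurements]
    mean = sum(fluxes) / len(fluxes) if fluxes else 0.0
    return {"count": len(fluxes), "mean_flux": mean}


def run_daily_pipeline(packets) -> dict:
    measurements = []
    for packet in packets:
        measurements.extend(initial_data_treatment(decompress(decode_packet(packet))))
    return first_look(measurements)


def make_packet(measurements) -> bytes:
    """Test helper: build a packet in the same hypothetical format."""
    body = zlib.compress(json.dumps(measurements).encode("utf-8"))
    return struct.pack(">I", len(body)) + body
```

For instance, two packets carrying fluxes 10, 20 and 30 yield `{"count": 3, "mean_flux": 20.0}`.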
Some features of the Gaia daily pipeline
• Complex software system written in Java (as is all DPAC software)
– Portability, scalability (multi-threaded operation), development tools
– ~1 million lines of code
• Many different algorithms
– Data-driven approach: algorithms are triggered depending on the type of data received
• High efficiency required
– Data train approach: prepare the data and pass them to the algorithms
– Avoid ‘random’ data access… if possible!
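The data-driven and data-train ideas can be illustrated with a small Python sketch. The registry, the `triggered_by` decorator, the record fields (`type`, `source_id`) and the example algorithm are all hypothetical, not DPAC’s Java code. Incoming records are grouped into one train per data type. Each train is sorted so that algorithms read it sequentially. Every algorithm registered for that type is then triggered on the whole train.

```python
from collections import defaultdict

# Hypothetical registry: data type -> algorithms triggered by that type.
HANDLERS = defaultdict(list)


def triggered_by(data_type):
    """Decorator registering an algorithm for one incoming data type."""
    def register(algorithm):
        HANDLERS[data_type].append(algorithm)
        return algorithm
    return register


def run_data_trains(records):
    """Group records into one 'train' per data type, sort each train,
    then hand the whole train to every algorithm registered for it."""
    trains = defaultdict(list)
    for record in records:
        trains[record["type"]].append(record)
    for data_type, train in trains.items():
        # Sorting by a key gives ordered, sequential access
        # instead of random look-ups record by record.
        train.sort(key=lambda r: r["source_id"])
        for algorithm in HANDLERS.get(data_type, []):
            algorithm(train)


# Example algorithm (hypothetical): remembers which sources it saw.
processed = []


@triggered_by("astrometric")
def collect_astrometric(train):
    processed.append([r["source_id"] for r in train])
```

Records of a type with no registered algorithm are simply grouped and left untouched, so adding a new algorithm only means registering it for the data type it needs.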
Products from the daily pipeline
• Raw measurements reconstruction
– Flagging of peculiar objects and conditions
• Monitoring of the ‘basic angle’ between the telescopes
– Interferometry: accuracy of 3.6 picometres
Products from the daily pipeline
• First determination of satellite attitude
– 100 milliarcsecond accuracy (0.00003 degrees)
• Determination of astrophysical background
– Needed for accurate photometry
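Why the background matters for photometry can be shown with a toy example: if the sky level is not subtracted, it inflates the measured brightness and biases the position estimate. The sketch below estimates the background as the median of a small window’s border pixels and then computes a background-subtracted flux and flux-weighted centroid. Both the median-of-border estimator and the function names are illustrative assumptions; the real Gaia processing uses far more elaborate background and image models.

```python
from statistics import median


def estimate_background(window):
    """Median of the border pixels of a small image window
    (assumed to be dominated by sky background, not by the star)."""
    border = list(window[0]) + list(window[-1])
    for row in window[1:-1]:
        border += [row[0], row[-1]]
    return median(border)


def flux_and_centroid(window):
    """Background-subtracted flux and flux-weighted centroid (x, y)."""
    background = estimate_background(window)
    total = sx = sy = 0.0
    for y, row in enumerate(window):
        for x, value in enumerate(row):
            signal = max(value - background, 0.0)  # clip negative noise
            total += signal
            sx += x * signal
            sy += y * signal
    if total == 0.0:
        return background, 0.0, None  # nothing above the background
    return background, total, (sx / total, sy / total)
```

For the window `[[10, 10, 10, 10], [10, 30, 50, 10], [10, 10, 10, 10]]` the background is 10, the flux is 60 and the centroid is about (1.67, 1.0). Without the subtraction the flux would be 160.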
Products from the daily pipeline
• Image parameters determination
– Main output for downstream systems
– Position and brightness
• Preliminary cross-matching
– Identification of observations and link to catalogue sources
– Most demanding element in terms of DB performance
– Continuous queries on a table with ~3 billion records
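Cross-matching boils down to finding, for each new observation, the nearest catalogue source within a small radius. The only way to do that against billions of records is through a spatial index rather than a full scan. The toy Python index below buckets sources into cubic cells on the unit sphere, so that only 27 neighbouring cells need checking. The class, the cell scheme and the 1-arcsecond radius are illustrative assumptions, not how DPAC or Caché organise the catalogue table; production systems typically use database indexes on a standard sky-pixelization scheme.

```python
import math
from collections import defaultdict


def unit_vector(ra_deg, dec_deg):
    """Direction on the celestial sphere as a 3D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


class CatalogueIndex:
    """Toy spatial index: sources bucketed into cubic 3D cells whose edge
    equals the chord length of the match radius."""

    def __init__(self, sources, radius_arcsec):
        # Chord length between two unit vectors separated by the match radius.
        self.chord = 2 * math.sin(math.radians(radius_arcsec / 3600) / 2)
        self.cells = defaultdict(list)
        for source_id, ra, dec in sources:
            v = unit_vector(ra, dec)
            self.cells[self._cell(v)].append((source_id, v))

    def _cell(self, v):
        return tuple(math.floor(c / self.chord) for c in v)

    def match(self, ra, dec):
        """Nearest catalogue source within the radius, or None."""
        v = unit_vector(ra, dec)
        cx, cy, cz = self._cell(v)
        best_id, best_dist = None, self.chord
        # Any source within the radius lies in this cell or an adjacent one.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for source_id, w in self.cells.get((cx + dx, cy + dy, cz + dz), ()):
                        d = math.dist(v, w)
                        if d <= best_dist:
                            best_id, best_dist = source_id, d
        return best_id
```

With sources `(1, 10.0, 20.0)` and `(2, 10.0, 20.01)` and a 1-arcsecond radius, an observation at (10.0, 20.0001) links to source 1. One at (50.0, -30.0) links to nothing and would be a candidate new source.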
But… is this technically feasible??
YES!
• Pipeline typically idle a few hours/day, even on ‘dense’ days
• Peaks of ~150 million measurements/day handled without problems
• ~10 tables, typically with ~1-2 billion records
• Pipeline is stable
How?
• 30 IBM iDataPlex computing nodes
– CPU typically underused; RAM quite busy
• Very big DB node:
– 32 Xeon cores (E5-4640, 2.4 GHz), ~50% average usage on ‘dense’ days
– 1.25 TB RAM
– 7.5 TB of local SSD for the most frequently accessed tables
– NetApp NAS storage: FAS 8060 with ~600 disks and 10 TB SSD
– 10G network (Cisco 5548, <1 ms latency)
– Typical DB disk occupation: ~20 TB (regular cleanup: only the most recent data are kept)
– InterSystems Caché ;-)
– Over 10 billion records served daily (incl. complex queries)
• Also CESCA/CSUC (pre-deployment tests):
– 5 computing nodes + DB node (“just” 64 GB RAM)
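Daily totals like the ~150 million measurements/day peak and the 10 billion records served daily are easier to compare with hardware limits as per-second rates. This small helper averages a daily load over a processing window. The 20-hour window in the example is a hypothetical reading of “idle a few hours/day”, not a figure from the talk.

```python
SECONDS_PER_HOUR = 3600


def per_second(per_day, active_hours=24.0):
    """Average rate needed to handle `per_day` items within `active_hours`."""
    return per_day / (active_hours * SECONDS_PER_HOUR)


if __name__ == "__main__":
    # Peak measurement load, over a full day and over a hypothetical 20 h window.
    print(f"{per_second(150e6):,.0f} measurements/s over 24 h")
    print(f"{per_second(150e6, active_hours=20):,.0f} measurements/s over 20 h")
    # Records served daily by the DB node.
    print(f"{per_second(10e9):,.0f} records/s over 24 h")
```

This prints about 1,736 and 2,083 measurements/s, and about 115,741 records/s. These are averages only; actual load is burstier than this.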
2 years of nominal operations
• 50 billion measurements processed so far
Main achievement (so far): Gaia DR1
Just a tiny (daily) detail...
What is all this useful for?
• Science and knowledge
– Better understanding of the Galaxy we live in
– Astronomers are so happy with our data!
• Technological advances
– Numerical algorithms and techniques
– Massive data handling systems
– New data handling and compression algorithms
A clear example: the FAPEC compressor
• …so it gives a direct return to you!
– Efficient data handling and compression of medical images, genomic information, professional imaging, massive data transfers, etc.
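The talk does not describe FAPEC’s internals, so the sketch below shows only the generic idea behind prediction-error compression. It is not FAPEC. Each sample is predicted from the previous one. The signed prediction errors are zigzag-mapped to non-negative integers and then written with a Rice code of fixed parameter `k`. Smooth data give small errors, and small errors take few bits. All function names are illustrative. A real adaptive coder would tune its coding parameters to the data instead of fixing `k`.

```python
def zigzag(n):
    """Map signed errors to non-negative ints: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(m):
    return m // 2 if m % 2 == 0 else -(m + 1) // 2


def rice_encode(values, k):
    """Rice code: quotient in unary ('1' * q + '0'), remainder in k bits."""
    out = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        out.append("1" * q + "0" + (format(r, f"0{k}b") if k else ""))
    return "".join(out)


def rice_decode(bits, k, count):
    values, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == "1":
            q += 1
            i += 1
        i += 1  # skip the terminating '0'
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append((q << k) | r)
    return values


def compress(samples, k=1):
    """Predict each sample from the previous one and Rice-code the errors.
    The first sample is returned unchanged."""
    errors = [b - a for a, b in zip(samples, samples[1:])]
    return samples[0], rice_encode([zigzag(e) for e in errors], k)


def decompress(first, bits, count, k=1):
    out = [first]
    for m in rice_decode(bits, k, count - 1):
        out.append(out[-1] + unzigzag(m))
    return out
```

For the samples `[100, 101, 103, 102, 102, 104]` the five prediction errors fit in 15 bits (`"100110001001100"`), and decompression restores the original list exactly (lossless).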
More info: http://gaia.ub.edu | @GaiaUB | GaiaApp