MARS: Experience with managing a multi-petabytes...


Page 1: MARS: Experience with managing a multi-petabytes archiveorap.irisa.fr/ArchivesForums/Forum24Lille/F24Presentations/Baudoin... · Estonia Romania EUMETSAT . Hungary Serbia WMO. IcelandSlovakia


MARS: Experience with managing a multi-petabytes archive

Baudouin Raoult
Head of Data and Service Section

ECMWF

Supporting States and Co-operation

Belgium, Ireland, Portugal, Denmark, Italy, Switzerland, Germany, Luxembourg, Finland, Spain, The Netherlands, Sweden, France, Norway, Turkey, Greece, Austria, United Kingdom

Co-operation agreements or working arrangements with:
  States: Czech Republic, Croatia, Estonia, Hungary, Iceland, Latvia, Lithuania, Montenegro, Morocco, Romania, Serbia, Slovakia, Slovenia
  Organisations: ACMAD, ESA, EUMETSAT, WMO, JRC, CTBTO, CLRTAP

ECMWF Objectives

Operational forecasting up to 15 days ahead (including waves)

R & D activities in forecast modelling

Data archiving and related services

Operational forecasts for the coming month and season

Advanced NWP training

Provision of supercomputer resources

Assistance to WMO programmes

Management of Regional Meteorological Data Communications Network (RMDCN)

Current computer configuration

October 2008

ECMWF Forecasting system:

ECMWF Forecast Products

Atmosphere global forecasts
  Forecast to ten days from 00 and 12 UTC at 25 km resolution and 91 levels
  50 ensemble forecasts to fifteen days from 00 and 12 UTC at 50 km resolution

Ocean wave forecasts
  Global forecast to ten days from 00 and 12 UTC at 50 km resolution
  European waters forecast to five days from 00 and 12 UTC at 25 km resolution

Monthly forecasts: Atmosphere-ocean coupled model
  Global forecasts to one month:
    atmosphere: 1.125° resolution, 62 levels
    ocean: horizontally-varying resolution (1/3° to 1°), 9 levels

Seasonal forecasts: Atmosphere-ocean coupled model
  Global forecasts to six months:
    atmosphere: 1.8° resolution, 40 levels
    ocean: horizontally-varying resolution (1/3° to 1°), 9 levels

What is a field? An object uniquely identified by:

  Forecasting system
  Date
  Analysis time
  Level
  Parameter
  Time step
  …

Up to 11 attributes
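
As an illustration of "an object uniquely identified by its attributes", a field identifier can be modelled as an immutable record that serves as a lookup key. This is only a sketch: the class name `FieldKey`, the six attribute names, the sample values and the `(file, offset, length)` entry are all assumptions made for the example, not the MARS data model (which uses up to 11 attributes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldKey:
    """Hypothetical identifier for one field; attribute names are illustrative."""
    system: str      # forecasting system
    date: str        # YYYYMMDD
    time: str        # analysis time, e.g. "12"
    level: int
    parameter: str
    step: int        # time step in hours


# A frozen dataclass is hashable, so the key can index a catalogue directly.
catalogue = {}
key = FieldKey("oper", "20010101", "12", 500, "temperature", 24)
catalogue[key] = ("tape-file-0001", 4096, 180_000)  # (file, offset, length), made up

same_key = FieldKey("oper", "20010101", "12", 500, "temperature", 24)
other_key = FieldKey("oper", "20010101", "12", 850, "temperature", 24)

found = catalogue.get(same_key)     # same attributes, same field
missing = catalogue.get(other_key)  # one attribute differs, different field
```

Two keys built from the same attribute values compare equal, which is what "uniquely identified" means in practice: changing any one attribute designates a different field.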

MARS: A managed archive

Meteorological Archival and Retrieval System
22 years of existence
Retrievals expressed in meteorological terms
Post-processing facilities
  Interpolation between various data representations
  Interpolation on coarser grids
  Sub-area extractions
Manages meteorological fields
Data in GRIB and BUFR format according to WMO standards

MARS: A managed archive (cont.)

Not a file system
  Users are not aware of the location of the data

An archive, not a database
  Metadata online
  Data offline

A meteorological language

Retrieve,
    date      = 20010101/to/20010131,
    parameter = temperature/geopotential,
    type      = forecast,
    step      = 12/to/240/by/12,
    levels    = 1000/850/500/200,
    grid      = 2/2,
    area      = -10/20/10/0
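
To give a feel for how much data such a short request designates, the sketch below expands the `/to/` and `/by/` shorthand of the request above and counts the fields involved. It assumes that the request stands for every combination of the listed values; the helper names `expand_numbers` and `expand_dates` are invented for the example and are not part of MARS.

```python
from datetime import datetime, timedelta


def expand_numbers(value):
    """Expand 'a/to/b' or 'a/to/b/by/s' into a list of ints; otherwise split on '/'."""
    parts = value.split("/")
    if len(parts) >= 3 and parts[1].lower() == "to":
        start, stop = int(parts[0]), int(parts[2])
        step = int(parts[4]) if len(parts) == 5 and parts[3].lower() == "by" else 1
        return list(range(start, stop + 1, step))
    return [int(p) for p in parts]


def expand_dates(value):
    """Expand 'YYYYMMDD/to/YYYYMMDD' into one string per day; otherwise split on '/'."""
    parts = value.split("/")
    if len(parts) == 3 and parts[1].lower() == "to":
        start = datetime.strptime(parts[0], "%Y%m%d").date()
        stop = datetime.strptime(parts[2], "%Y%m%d").date()
        days = (stop - start).days
        return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]
    return parts


dates = expand_dates("20010101/to/20010131")
params = "temperature/geopotential".split("/")
steps = expand_numbers("12/to/240/by/12")
levels = expand_numbers("1000/850/500/200")

# Assumption: one field per combination of date, parameter, step and level.
n_fields = len(dates) * len(params) * len(steps) * len(levels)
print(n_fields)
```

Under that assumption the eight-line request covers several thousand fields, which is why retrievals are expressed in meteorological terms rather than as file names.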

MARS: Contents

All operational model outputs
  Analysis, Forecasts, EPS, Seasonal, Wave
  Climatologies, Hindcasts
ECMWF Research experiments
Member States’ Research experiments
Observations
  Conventional, Satellite
  Analysis input (for restartability)
  Analysis feedback
Images

MARS: Contents (cont.)

Member States’ own model data
  HIRLAM, COSMO, …
International collaborations
  PROVOST, ECSN, ENSEMBLE, DEMETER, TIGGE, …
Reanalysis
  ERA15, ERA40, ERA Interim
Other centres (Washington, Tokyo, Toulouse, Offenbach, Exeter …)
  For comparison

Archive size vs. Supercomputer power

[Figure: HPC (GFLOPs) and Archive (TB) for each supercomputer, on logarithmic axes from 0.01 to 100000. Systems: Cray-1A (01/11/1978), X-MP/2 (01/11/1983), X-MP/4 (01/01/1986), X-MP/8 (01/01/1990), C90/12 (01/01/1992), C90/16 (01/01/1993), VPP700/48 (01/06/1996), VPP700-112 (01/10/1997), VPP5000 (01/04/1999), IBM-P4 (31/12/2002), IBM-P5 (01/07/2004), IBM-P5+ (31/12/2006), IBM-P6 (01/01/2009). Data labels: 0.05, 0.1, 0.4, 2, 5.2, 9, 45, 56, 90.5, 413.8, 1156, 4174, 10800.]

Archive size vs. Supercomputer power

[Figure: HPC (GFLOPs) and Archive (TB) for each supercomputer from Cray-1A (01/11/1978) to IBM-P6 (01/01/2009), on linear axes (0 to 25000 and 0 to 12000). Data labels: 0.05, 0.1, 0.4, 2, 5.2, 9, 45, 56, 90.5, 413.8, 1156, 4174, 10800.]

MARS in numbers (March 2009)

~1000 active users, at ECMWF and in the Member States
Analysis from 1980, Forecasts from 1985
ERA40: Analysis and observations since 1957
6 PByte of data in 3.4 * 10^10 fields
8.2 million files
Growing daily by 10 TBytes (30 million fields)
About 300,000 requests per day (100 million fields)
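
A few averages follow directly from the figures above. The sketch below derives them, assuming decimal units (1 PByte = 10^15 bytes); the variable names are illustrative and the results are rough averages, not measured distributions.

```python
# Figures quoted on the slide (March 2009); decimal units are an assumption.
PB, TB = 10**15, 10**12

archive_bytes = 6 * PB
archive_fields = 3.4e10
archive_files = 8.2e6
daily_bytes = 10 * TB
daily_fields = 30e6
daily_requests = 300_000
daily_fields_served = 100e6

avg_field_kb = archive_bytes / archive_fields / 1e3         # average archived field size
avg_file_mb = archive_bytes / archive_files / 1e6           # average file size
fields_per_file = archive_fields / archive_files            # fields aggregated per file
new_field_kb = daily_bytes / daily_fields / 1e3             # average size of a newly archived field
fields_per_request = daily_fields_served / daily_requests   # average request size

print(f"average field: {avg_field_kb:.0f} KB, average file: {avg_file_mb:.0f} MB")
print(f"fields per file: {fields_per_file:.0f}, fields per request: {fields_per_request:.0f}")
print(f"average newly archived field: {new_field_kb:.0f} KB")
```

The averages show the effect of aggregation: files hold thousands of fields each, and newly archived fields are on average larger than the historical mean.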

MARS: Requirements

Scalable
  Data volumes (many orders of magnitude: TB, PB, EB, …)
  Number of fields (hundreds of billions)
  Number of requests

Robust
  Power cuts, disks full, network glitches, damaged tapes
  Data loss is unacceptable

Performant
  Hardware is expensive: make the most of the available resources (CPU, disks, tape drives, network, …)
  Human resources are even more expensive: users should not wait too long…

MARS: Requirements (cont.)

Sustainable
  Data must be readable 100 years from now (and more…)
  Archive must survive technology changes (hardware and software)

Capable of evolution
  Support for new data types
  Support for new communities of users

Serves operations and research
  Different expectations

MARS: Requirements (cont.)

Data Integrity (the most important one)
  Did we archive already corrupted data?
  Did we archive the wrong data?
  Did the network corrupt the data?
  Did the disks corrupt the data?
  Did the tape drives corrupt the data?
  Is the tape damaged?
  Was there a software bug?

Consequences
  Impossible to investigate an event that happened several years ago.
  Loss of confidence in the data: one corrupted piece of data = lack of trust in the whole archive

Data Integrity

Self describing data
Checking the data to be archived:
  By the client before sending to the server
  By the server on reception
  Data is retrieved again and checked against the original
All disks RAID or mirrored
“Enterprise quality” drives and tapes
Backups made from primary tape copy
Backups on different tape technology
Software does not touch the data
A lot of testing…
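
The three checks listed above (by the client, by the server on reception, and by reading the data back) can be sketched as follows. This is a toy model under stated assumptions: the slides do not say how MARS compares data, so SHA-256 digests are used here purely for illustration, and `ArchiveServer` and `archive_with_checks` are hypothetical names backed by an in-memory dictionary rather than disks and tapes.

```python
import hashlib


def digest(data: bytes) -> str:
    # SHA-256 is an assumption; any strong checksum would illustrate the idea.
    return hashlib.sha256(data).hexdigest()


class ArchiveServer:
    """Toy in-memory stand-in for the archive; not the MARS implementation."""

    def __init__(self):
        self._store = {}

    def archive(self, name, data, client_digest):
        # Check on reception: reject anything that changed in transit.
        if digest(data) != client_digest:
            raise ValueError(f"{name}: corrupted in transit")
        self._store[name] = bytes(data)

    def retrieve(self, name):
        return self._store[name]


def archive_with_checks(server, name, data):
    expected = digest(data)                 # computed by the client before sending
    server.archive(name, data, expected)    # verified by the server on reception
    if digest(server.retrieve(name)) != expected:
        raise ValueError(f"{name}: read-back does not match the original")
    return expected


server = ArchiveServer()
stored_digest = archive_with_checks(server, "field-0001", b"GRIB...7777")

# Simulate corruption between client and server: the digest no longer matches.
try:
    server.archive("field-0002", b"GRIB...7778", digest(b"GRIB...7777"))
    rejected = False
except ValueError:
    rejected = True
```

The point of checking at every hop is that a mismatch is caught while the original data still exists and can be resent.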

Scalability, manageability, performance

Keep the number of files to a minimum
  Large files
Use collocation to reduce tape access
  Tape families, large files
Manage queues and priorities
Build a system in which data can be moved around, where files can be split, joined and migrated.
Minimise dependencies on commercial software
  New releases may force you to perform unwanted migrations
  Two commercial software products may become incompatible
  They may not be there in 20 years

Scalability, manageability, performance

OO design, based on three abstract classes:
  Tape manager (create, remove, exists, location, …)
  Tape file (size, last access, …)
  Data stream (open, read, write, close, partial reads)

Physical layer is abstracted, indirection is the key

[Diagram labels: Meteo. Server, Data Server, Metadata, Caches, Requests, References, Data, Tape System]
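
The three abstract classes named above could look roughly like the sketch below. MARS itself is written in C++; Python is used here only for brevity. The method names are taken from the slide, but their signatures, the placement of `open` on the tape file, and the in-memory backend (`MemoryManager`, `MemoryFile`, `MemoryStream`) are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
import time


class DataStream(ABC):
    @abstractmethod
    def read(self, offset: int, length: int) -> bytes: ...   # partial reads
    @abstractmethod
    def write(self, data: bytes) -> None: ...
    @abstractmethod
    def close(self) -> None: ...


class TapeFile(ABC):
    @abstractmethod
    def size(self) -> int: ...
    @abstractmethod
    def last_access(self) -> float: ...
    @abstractmethod
    def open(self) -> DataStream: ...


class TapeManager(ABC):
    @abstractmethod
    def create(self, name: str) -> TapeFile: ...
    @abstractmethod
    def remove(self, name: str) -> None: ...
    @abstractmethod
    def exists(self, name: str) -> bool: ...
    @abstractmethod
    def location(self, name: str) -> str: ...


class MemoryStream(DataStream):
    def __init__(self, owner):
        self._owner = owner

    def read(self, offset, length):
        self._owner.accessed = time.time()
        return bytes(self._owner.buffer[offset:offset + length])

    def write(self, data):
        self._owner.accessed = time.time()
        self._owner.buffer.extend(data)

    def close(self):
        pass


class MemoryFile(TapeFile):
    def __init__(self):
        self.buffer = bytearray()
        self.accessed = time.time()

    def size(self):
        return len(self.buffer)

    def last_access(self):
        return self.accessed

    def open(self):
        return MemoryStream(self)


class MemoryManager(TapeManager):
    """Hypothetical backend; a CFS, TSM or HPSS backend would implement the same interface."""

    def __init__(self):
        self._files = {}

    def create(self, name):
        self._files[name] = MemoryFile()
        return self._files[name]

    def remove(self, name):
        del self._files[name]

    def exists(self, name):
        return name in self._files

    def location(self, name):
        return f"memory://{name}"


manager: TapeManager = MemoryManager()   # callers only see the abstract type
tape_file = manager.create("fields-2001-01")
stream = tape_file.open()
stream.write(b"GRIBxxxx7777GRIByyyy7777")
second_field = stream.read(12, 12)       # partial read of one field
stream.close()
```

Because callers depend only on the abstract types, replacing the storage backend means writing three new subclasses, which is the indirection the slide refers to.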

MARS manages its own caches

“Pre-archive” space
  Data is first stored there as produced by the model
    • Efficient archival
  Allows incremental archiving
  Data is then sorted and aggregated into large tape files
    • Efficient retrievals

Retrieval caches
  Field level caching, only small parts of files are cached
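
Field-level caching, as described above, means keeping individual byte ranges of large tape files on disk rather than whole files. The sketch below shows one way to do this, assuming a least-recently-used eviction policy and a byte budget; the slides do not state the eviction policy, and `FieldCache` and `fetch_from_tape` are hypothetical names.

```python
from collections import OrderedDict


class FieldCache:
    """Caches byte ranges (fields) of tape files, evicting the least recently used."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.hits = 0
        self.misses = 0
        self._entries = OrderedDict()   # (file, offset, length) -> bytes

    def get(self, file, offset, length, fetch):
        key = (file, offset, length)
        if key in self._entries:
            self._entries.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        data = fetch(file, offset, length)      # slow path: go to tape
        self._entries[key] = data
        self.used += len(data)
        while self.used > self.capacity and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used -= len(evicted)
        return data


tape = {"tape-file-0001": bytes(1000)}   # stand-in for one large tape file
tape_reads = []


def fetch_from_tape(file, offset, length):
    tape_reads.append((file, offset, length))
    return tape[file][offset:offset + length]


cache = FieldCache(capacity_bytes=100)
first = cache.get("tape-file-0001", 0, 50, fetch_from_tape)
again = cache.get("tape-file-0001", 0, 50, fetch_from_tape)   # served from the cache
cache.get("tape-file-0001", 50, 50, fetch_from_tape)
cache.get("tape-file-0001", 100, 50, fetch_from_tape)         # evicts the oldest field
```

Only 150 of the file's 1000 bytes were ever read from tape, and at most 100 bytes stay cached at any time.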

MARS manages its own queues

Three queues:
  User request queues
  Tape read queues
  Tape write queues

One user request can create several tape read requests
Read requests are sorted according to volume and position
  All possible requests for a volume are processed
Write requests are sorted by families

Presenter notes
A read request can go to one or more tape managers. All requests from a volume
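
Sorting read requests "according to volume and position" can be sketched as grouping pending reads by tape volume, so that each tape is mounted once, and ordering each group by position on the tape. The `TapeRead` record and the choice to serve the busiest volume first are assumptions made for this example; the slides do not specify the ordering between volumes.

```python
from collections import defaultdict
from typing import NamedTuple


class TapeRead(NamedTuple):
    request_id: int   # the user request this read belongs to
    volume: str       # tape volume holding the data
    position: int     # offset of the data on that volume


def schedule(reads):
    """Return [(volume, reads sorted by position)], one entry per tape mount."""
    by_volume = defaultdict(list)
    for read in reads:
        by_volume[read.volume].append(read)
    # Busiest volume first is an arbitrary policy chosen for the sketch.
    order = sorted(by_volume, key=lambda v: (-len(by_volume[v]), v))
    return [(v, sorted(by_volume[v], key=lambda r: r.position)) for v in order]


pending = [
    TapeRead(1, "V2", 900),   # user request 1 needs data from two volumes
    TapeRead(1, "V1", 300),
    TapeRead(2, "V1", 100),
    TapeRead(3, "V1", 200),
]
plan = schedule(pending)
mounts = len(plan)
```

Four reads from three user requests are served with two tape mounts, and each tape is read in a single forward pass.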

MARS manages its own queues – cont.

A fixed number of tape drives is allocated for reading or writing
  Queues and disk spaces are monitored
  Results are fed into a decision making algorithm
  Drive allocation is adjusted accordingly

Better control
  Minimise tape mounts
  Optimise tape drive usage
  Priorities (serve VIPs first)
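
The slides do not describe the decision-making algorithm itself. Purely as an illustration of the feedback loop (monitor queues and disk space, then adjust the drive split), the sketch below divides drives in proportion to queue lengths and gives writes more weight as the pre-archive disk fills up. The function name, the weighting factor of 4 and the minimum of one drive per direction are all invented for the example.

```python
def allocate_drives(total_drives, read_queue, write_queue, prearchive_fill, min_each=1):
    """Hypothetical heuristic: returns (drives for reading, drives for writing).

    prearchive_fill is the fraction (0.0 to 1.0) of pre-archive disk space in use.
    """
    # Writes become more urgent as the pre-archive space fills up (made-up weighting).
    write_pressure = write_queue * (1.0 + 4.0 * prearchive_fill)
    read_pressure = float(read_queue)
    total = read_pressure + write_pressure
    if total == 0:
        readers = total_drives // 2
    else:
        readers = round(total_drives * read_pressure / total)
    # Always keep at least min_each drives available in each direction.
    readers = max(min_each, min(total_drives - min_each, readers))
    return readers, total_drives - readers


quiet_disk = allocate_drives(10, read_queue=40, write_queue=10, prearchive_fill=0.0)
full_disk = allocate_drives(10, read_queue=40, write_queue=10, prearchive_fill=0.75)
writes_only = allocate_drives(10, read_queue=0, write_queue=20, prearchive_fill=0.5)
```

With the same queues, a nearly full pre-archive disk shifts drives from reading to writing, which is the kind of adjustment the monitoring is meant to drive.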

WEBMARS
Activity

System activity
Queues
Request progress
Instructive

Better usage

A brief history

1987 - MARS I: CFS – IBM/MVS mainframe - PL/I
  Data in CFS: Common File System from Los Alamos
  Ran out of steam after 5 million files

1997 - MARS II: TSM – AIX – C++
  TSM (ADSM): A backup system from Tivoli (IBM)
  Problem with support: ECMWF is one of the largest TSM sites, atypical use of the software

2001 - MARS III: HPSS – AIX – C++
  HPSS is designed for and used by large scientific sites
  HPSS is very scalable, excellent support
  ECMWF is a relatively small site in the HPSS community

Back-archive

Copying data between systems
Costly
  In human resources
  In computer resources
  In time
  Financially
Must be done without service interruption
Do not underestimate it!!
Two back-archives
  1997: 25 TB (18 months)
  2001: 300 TB (9 months)
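
The two back-archives above translate into very different sustained copy rates. The sketch below works them out, assuming decimal terabytes and an average month of 365.25/12 days; the rates are averages over the whole migration, not peak transfer speeds.

```python
TB = 10**12                                # decimal terabyte (assumption)
SECONDS_PER_MONTH = 365.25 / 12 * 86_400   # average month length (assumption)


def sustained_mb_per_s(terabytes, months):
    """Average copy rate needed to move `terabytes` in `months`."""
    return terabytes * TB / (months * SECONDS_PER_MONTH) / 1e6


rate_1997 = sustained_mb_per_s(25, 18)    # first back-archive
rate_2001 = sustained_mb_per_s(300, 9)    # second back-archive
speedup = rate_2001 / rate_1997

print(f"1997: {rate_1997:.2f} MB/s, 2001: {rate_2001:.2f} MB/s, ratio: {speedup:.0f}x")
```

The second migration moved twelve times the data in half the time, all while the archive stayed in service.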

Statistics gathering

Statistics are gathered continuously
  Short term to check the health of the system
  Medium term to check the effect of a change in the system (new disks, reconfigurations of the queues)
  Long term for evolution planning

What?
  System activity (CPU, disk usage, tape mounts…)
  Application performance (cache hits, …)
  User “experience” (number of fields per second)

The system needs to be tuned regularly
  Hardware changes (disks, tape drives,…)
  Access patterns change (new users, new projects,…)

Cache hits in the various MARS servers

Changing tape technologies

QoS: MARS performance from a user point of view

Total MARS Data Archived Daily

New HPC / Parallel run

With a 60% growth per annum, half of the archive is less than 18 months old

Deletion of 1 petabyte
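
The 18-month figure follows from compound growth. The sketch below checks it, assuming steady exponential growth at 60% per year and ignoring deletions; the function name is illustrative.

```python
import math

growth = 0.60   # 60% growth per annum, as quoted on the slide


def share_younger_than(years, growth):
    """Fraction of the archive written during the last `years` years.

    With size S(t) = S0 * (1 + growth)**t, the data older than `years`
    amounts to S(t - years), so the younger share is 1 - (1 + growth)**(-years).
    """
    return 1.0 - (1.0 + growth) ** (-years)


# Age below which exactly half of the archive lies: solve (1 + growth)**(-tau) = 1/2.
half_age_years = math.log(2) / math.log(1 + growth)
half_age_months = 12 * half_age_years

print(f"half of the archive is younger than {half_age_months:.1f} months")
```

The result is a little under 18 months, consistent with the slide.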

Conclusion

Optimise resource usage
  Data collocation, tape families
  Minimise the number of files
Implement QoS through queues
Use enterprise quality tapes and drives
Allow for migration and back-archive
Gather statistics, analyse trends, tune and re-tune
Test everything many times
Corruption will happen, it is a fact: check data integrity at all stages, before it is too late, do not trust built-in mechanisms

Thank you