
A scalable infrastructure for CMS data analysis based on OpenStack Cloud and GlusterFS
Datacenter Indirection Infrastructure (DII) for High Energy Physics (HEP)

1 Salman Toor, 2 Lirim Osmani, 1 Paula Eerola, 1 Oscar Kraemer, 1 Tomas Lindén, 2 Sasu Tarkoma and 3 John White
1 Helsinki Institute of Physics (HIP), CMS Program; 2 Computer Science Department, University of Helsinki; 3 Helsinki Institute of Physics (HIP), Technology Program
{Salman.Toor, Paula.Eerola, Carl.Kraemer, Tomas.Linden}@helsinki.fi, {Lirim.Osmani, Sasu.Tarkoma}@cs.helsinki.fi, [email protected]

Introduction

The aim of this project is to design and implement a scalable and resilient infrastructure for CERN High Energy Physics (HEP) data analysis. The infrastructure is built from OpenStack components structured as a private cloud on top of the Gluster File System. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources.

Architecture

- Computational resources
  - Dell PowerEdge servers, 2 quad-core Intel Xeon CPUs each
  - 32 GB (8 x 4 GB) RAM
  - 4 x 10 GbE Broadcom 57718 network interfaces
- Gluster File System servers
  - Dell PowerEdge servers
  - 512 GB LUN attached to each server
  - FCoE (Fibre Channel over Ethernet) protocol

System Components

- OpenStack Cloud (Grizzly release)
- Gluster File System
- Advanced Resource Connector (ARC) middleware
- CERN Virtual Machine File System (CVMFS)
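As an illustration of how the ARC middleware exposes the site to users, the sketch below submits a minimal test job with the standard ARC client tools. This is only a sketch: the xRSL job description, the cluster endpoint, and the executable name are hypothetical examples, not values taken from the poster.

```python
# Minimal sketch: submitting a test job through the ARC client tools.
# The xRSL description, cluster endpoint and file names are hypothetical;
# "run.sh" is assumed to exist in the working directory.
import subprocess
import tempfile

XRSL = """&
(executable = "run.sh")
(jobName = "cms-test-job")
(stdout = "stdout.txt")
(stderr = "stderr.txt")
(cpuTime = "60 minutes")
"""

def submit_test_job(cluster="example-ce.hip.fi"):
    """Write a minimal xRSL job description and submit it with arcsub."""
    with tempfile.NamedTemporaryFile("w", suffix=".xrsl", delete=False) as f:
        f.write(XRSL)
        jobfile = f.name
    # arcsub prints the job ID on success; arcstat/arcget can then be used
    # to track and retrieve the job.
    subprocess.run(["arcsub", "-c", cluster, jobfile], check=True)

if __name__ == "__main__":
    submit_test_job()
```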

Live Migration

- Experiments with different kinds of instances:
  - a minimal Ubuntu m1.small VM took 6 s
  - a fully configured worker-node VM took 43 s
- Random failures were experienced when a large number of live-migration requests were issued (a command-level sketch of such a stress test is shown below)
- About 4% performance loss, evaluated with the HEPSPEC2006 benchmark
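The poster does not show how the migration requests were issued; the sketch below is one plausible way to drive many live migrations in quick succession with the Grizzly-era nova CLI, assuming a shared GlusterFS instance store. Instance names and the target hypervisor are hypothetical.

```python
# Minimal sketch: firing many live-migration requests in quick succession,
# roughly the kind of stress test under which random failures were observed.
# Instance names and the target compute host are hypothetical; OpenStack
# credentials are taken from the usual OS_* environment variables.
import subprocess

INSTANCES = [f"wn-{i:02d}" for i in range(1, 11)]   # hypothetical VM names
TARGET_HOST = "compute-05"                          # hypothetical hypervisor

def live_migrate_all():
    """Request a live migration for each instance without waiting in between."""
    for name in INSTANCES:
        # With a shared GlusterFS instance store, plain live-migration is
        # sufficient; --block-migrate would only be needed for local disks.
        subprocess.run(["nova", "live-migration", name, TARGET_HOST], check=False)

if __name__ == "__main__":
    live_migrate_all()
```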

System Stability

- Burst-mode VM boot requests were tested against both local-disk and GlusterFS-based setups (a sketch of such a burst is shown below)
- GlusterFS gave a uniform boot response across the burst
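A minimal sketch of issuing such a burst of boot requests with the nova CLI is given below; the flavor, image name, and instance count are hypothetical placeholders rather than values reported on the poster.

```python
# Minimal sketch: booting a burst of identical VMs in a single request using
# the nova CLI's --min-count/--max-count options. Flavor, image and count are
# hypothetical; credentials come from the OS_* environment variables.
import subprocess

def boot_burst(count=20, flavor="m1.small", image="ubuntu-12.04", prefix="burst"):
    """Ask nova to start `count` instances at once and return immediately."""
    subprocess.run(
        ["nova", "boot",
         "--flavor", flavor,
         "--image", image,
         "--min-count", str(count),
         "--max-count", str(count),
         prefix],
        check=True,
    )

if __name__ == "__main__":
    boot_burst()
```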

GlusterFS IO

GlusterFS is used in two roles (see the volume sketch below):
- inside the cloud, to provide a shared area for the virtual machines (Brick 1 & 2, 2 TB)
- outside the cloud, for the Glance and Nova repositories (Brick 3 & 4, 1 TB)
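The poster does not list the exact volume layout; the sketch below shows one plausible way to create and mount a two-brick replicated GlusterFS volume of this kind with the standard gluster CLI. Host names, brick paths, and the volume name are hypothetical.

```python
# Minimal sketch: creating and mounting a two-brick replicated GlusterFS
# volume for the shared VM area. Host names, brick paths and the volume name
# are hypothetical; run with root privileges on the storage/compute nodes.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def create_shared_volume():
    # One replicated volume spanning the two GlusterFS servers (Brick 1 & 2).
    run(["gluster", "volume", "create", "vm-shared", "replica", "2",
         "gluster1:/export/brick1", "gluster2:/export/brick2"])
    run(["gluster", "volume", "start", "vm-shared"])

def mount_on_client(mountpoint="/mnt/vm-shared"):
    # The native FUSE client mounts the volume on each compute/worker node.
    run(["mkdir", "-p", mountpoint])
    run(["mount", "-t", "glusterfs", "gluster1:/vm-shared", mountpoint])

if __name__ == "__main__":
    create_shared_volume()
    mount_on_client()
```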

                Brick-1            Brick-2            Brick-3            Brick-4
Days            20                 20                 20                 20
Total Reads     40 GB              41 GB              169 GB             831 GB
Total Writes    38 GB              36 GB              364 GB             1017 GB

Average - Maximum Latency (milliseconds) / No. of Calls (in millions)
Read            0.05-11.6 / 29.8   0.05-10.8 / 31.0   0.29-217.1 / 0.6   0.38-2386 / 0.32
Write           0.08-16.8 / 1.8    0.07-14.0 / 1.9    1.66-5151 / 10.8   1.06-7281 / 7.8
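The poster does not say how these per-brick latency and call-count figures were collected; GlusterFS's built-in profiler is one way to obtain numbers of this shape, as in the sketch below (the volume name is hypothetical).

```python
# Minimal sketch: collecting per-brick latency and call counts with the
# GlusterFS volume profiler, which reports min/avg/max latency and the number
# of calls per file operation for each brick. The volume name is hypothetical.
import subprocess

def profile_volume(volume="vm-shared"):
    """Enable profiling, then print the cumulative per-brick statistics."""
    subprocess.run(["gluster", "volume", "profile", volume, "start"], check=True)
    # ... let the workload run for the measurement period ...
    result = subprocess.run(["gluster", "volume", "profile", volume, "info"],
                            check=True, capture_output=True, text=True)
    print(result.stdout)

if __name__ == "__main__":
    profile_volume()
```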

Type            Nodes   Cores   Memory (GB)

Virtual Machines (QCOW2 images)
WN              25      4       14
CE              1       4       14
GlusterFS       2       4       14

Real Machines
Controller      1       8       32
Compute         19      8       32
GlusterFS       2       8       32

Local and GlusterFS based storage

        Root (local)   Ephemeral disk   GlusterFS shared mount point
WN      10 GB          45 GB            900 GB
CE      10 GB          -                900 GB
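The worker-node shape above (4 cores, 14 GB RAM, 10 GB root disk, 45 GB ephemeral disk) maps naturally onto a custom OpenStack flavor; a minimal sketch of defining one with the Grizzly-era nova CLI is shown below. The flavor name and the use of an auto-generated ID are hypothetical choices, not taken from the poster.

```python
# Minimal sketch: defining a custom nova flavor matching the worker-node
# shape in the table above (4 vCPUs, 14 GB RAM, 10 GB root, 45 GB ephemeral).
# The flavor name is hypothetical; nova expects RAM in MB and disks in GB.
import subprocess

def create_wn_flavor(name="wn.cms", ram_mb=14 * 1024, root_gb=10,
                     ephemeral_gb=45, vcpus=4):
    subprocess.run(
        ["nova", "flavor-create",
         name, "auto",            # "auto" lets nova pick the flavor ID
         str(ram_mb), str(root_gb), str(vcpus),
         "--ephemeral", str(ephemeral_gb)],
        check=True,
    )

if __name__ == "__main__":
    create_wn_flavor()
```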

Performance Analysis

- The evaluation is based on the CMS Dashboard together with the CSC and NDGF accounting and monitoring systems
- More than 65k jobs have been processed, including CMS production and analysis jobs
- Example of a specific user: 400 analysis jobs were run with 74 days of walltime and 85% CPU efficiency (see the sketch below)
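For clarity, CPU efficiency here is the ratio of consumed CPU time to walltime; the small sketch below works through that arithmetic for the quoted user. The derived CPU time is an implication of the quoted numbers, not a figure stated on the poster.

```python
# Minimal sketch: CPU efficiency = CPU time / walltime. With 74 walltime days
# at 85% efficiency, the implied CPU time is about 63 days.
def cpu_time_days(walltime_days, cpu_efficiency):
    return walltime_days * cpu_efficiency

if __name__ == "__main__":
    walltime = 74.0      # days of walltime reported for the example user
    efficiency = 0.85    # 85% CPU efficiency
    print(f"Implied CPU time: {cpu_time_days(walltime, efficiency):.1f} days")
```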

Conclusion

We have demonstrated:
- more flexible system and user management through the virtualized environment;
- efficient addition and removal of virtual resources;
- scalability with an acceptable performance loss;
- a seamless view of our site through the ARC middleware.

Acknowledgements

- This project is funded by the Academy of Finland (AoF).
- Thanks to Ulf Tigerstedt (CSC) for help with the HEPSPEC tests.
- Thanks to the CMS collaboration for sending production jobs to process.