Advancing Science through Coordinated Cyberinfrastructure

Daniel S. Katz, [email protected]
Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory
Affiliate Faculty, Center for Computation & Technology, Louisiana State University
Adjunct Associate Professor, Electrical and Computer Engineering, LSU

description

How local, regional, and national cyberinfrastructure can be coordinated and linked to advance science and engineering, based on experiences and lessons from the Center for Computation & Technology at LSU (ideas, funding, implementation), plus some thoughts on what might be done differently if we were starting today. Presented at the First Workshop of the Center for Computational Engineering & Sciences, Unicamp, Campinas, Brazil, 10 April 2014.

Transcript of Advancing Science through Coordinated Cyberinfrastructure

Page 1: Advancing Science through Coordinated Cyberinfrastructure

   

www.ci.anl.gov  www.ci.uchicago.edu  

Advancing Science through Coordinated Cyberinfrastructure

Daniel S. Katz, [email protected]
Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory
Affiliate Faculty, Center for Computation & Technology, Louisiana State University
Adjunct Associate Professor, Electrical and Computer Engineering, LSU

Page 2: Advancing Science through Coordinated Cyberinfrastructure


2  Advancing Science through CI – [email protected]

Topics

• What we did in Louisiana from 2006-2010
• What I would do differently now

• A short video to highlight some additional issues that I hope the Center for Computational Engineering & Sciences will keep in mind

Page 3: Advancing Science through Coordinated Cyberinfrastructure


Louisiana

• Area: 134,382 km² (33/51)
• Population: 4,533,000 (2010, 25/51)
• GDP: $208 billion (2009, 24/51)
• GDP/person: $45,700 (2009, 21/51)
• In poverty: 17% (2009, 44/51)
• High school degree: 82% (2009, 46/51)
• BS degree: 21% (2009, 47/51)
• Advanced degree: 7% (2009, 48/51)

State goals: talented workforce, greater competitiveness, strong educational system, increased economic development

Page 4: Advancing Science through Coordinated Cyberinfrastructure


PITAC Report Summary:
• "Computational science -- the use of advanced computing capabilities to understand and solve complex problems -- is critical to scientific leadership, economic competitiveness, and national security. It is one of the most important technical fields of the 21st century because it is essential to advances throughout society."
• "Universities must significantly change organizational structures: multidisciplinary & collaborative research are needed [for US] to remain competitive in global science"

Complex problems: Innovations will occur at boundaries

Page 5: Advancing Science through Coordinated Cyberinfrastructure


Big Science and Infrastructure

• Higgs* boson discovery announced at CERN July 4, 2012
• Instrument: Large Hadron Collider (LHC)
• Infrastructure
  – Computing hardware: Worldwide LHC Computing Grid (WLCG): 235,000 cores across 36 countries, including Open Science Grid (OSG, US), European Grid Infrastructure (EGI, Europe), ...
  – Data: ~20 PB of data created in 2011-2012
  – Software: grid middleware, physics analysis applications, ...
  – Networks
  – Education & training
• Data generated centrally, moved (~3 PB/week) across multi-tiered infrastructure to be computed upon

Page 6: Advancing Science through Coordinated Cyberinfrastructure


Big Science and Infrastructure

• Hurricanes affect humans
• Multi-physics: atmosphere, ocean, coast, vegetation, soil
  – Sensors and data as inputs
• Humans: what have they built, where are they, what will they do
  – Data and models as inputs
• Infrastructure:
  – Urgent/scheduled processing, workflow systems
  – Software applications, workflows
  – Networks
  – Decision-support systems, visualization
  – Data storage, interoperability

Page 7: Advancing Science through Coordinated Cyberinfrastructure


Long-tail Science and Infrastructure

• Exploding data volumes & powerful simulation methods mean that more researchers need advanced infrastructure
• Such "long-tail" researchers cannot afford expensive expertise and unique infrastructure
• Challenge: outsource and/or automate time-consuming common processes
  – Tools, e.g., Globus Online and data management
    o Note: much LHC data is moved by Globus GridFTP, e.g., May/June 2012: >20 PB, >20M files
  – Gateways, e.g., nanoHUB, CIPRES, access to scientific simulation software

[Figure: NSF grant size, 2007 ("Dark data in the long tail of science", B. Heidorn)]

Page 8: Advancing Science through Coordinated Cyberinfrastructure


Long-tail Science and Infrastructure

• CIPRES Science Gateway for Phylogenetics
  – Study of diversification of life and relationships among living things through time
• Highly used, as of mid-2013:
  – Cited in at least 400 publications, e.g., Nature, PNAS, Cell
  – More than 5000 unique users in 3 years
  – Used routinely in at least 68 undergraduate classes
  – 45% US (including most states), 55% from 70 other countries
• Infrastructure
  – Flexible web application
    o A science gateway; uses software and lessons from the XSEDE gateways team, e.g., identity management, HPC job control
  – Science software: tree inference and sequence alignment
    o Parallel versions of MrBayes, RAxML, GARLI, BEAST, MAFFT
    o PAUP*, POY, ClustalW, CONTRAlign, FSA, MUSCLE, ...
  – Data
    o Personal user space for storing results
    o Tools to transfer and view data

Credit: Mark Miller, SDSC

Page 9: Advancing Science through Coordinated Cyberinfrastructure


Infrastructure Challenges

• Science
  – Larger teams, more disciplines, more countries
• Data
  – Size, complexity, rates all increasing rapidly
  – Need for interoperability (systems and policies)
• Systems
  – More cores, more architectures (GPUs), more memory hierarchy
  – Changing balances (latency vs. bandwidth)
  – Changing limits (power, funds)
  – System architecture and business models changing (clouds)
  – Network capacity growing; increased networks -> increased security
• Software
  – Multiphysics algorithms, frameworks
  – Programming models and abstractions for science, data, and hardware
  – V&V, reproducibility, fault tolerance
• People
  – Education and training
  – Career paths
  – Credit and attribution

Page 10: Advancing Science through Coordinated Cyberinfrastructure


Cyberinfrastructure  

"Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible."
  -- Craig Stewart

Page 11: Advancing Science through Coordinated Cyberinfrastructure


Computational & Data-enabled Science & Engineering (CDS&E)

• LIGO: Laser Interferometer Gravitational-Wave Observatory
• Ties together theory, computation, and experiment
  – Each drives the other two!

Page 12: Advancing Science through Coordinated Cyberinfrastructure


How We Started

• State commitment: $25M/year for Vision 20/20
  – $9M: LSU -> CCT (similarly, ULL -> LITE)
• University commitment to build new programs for the 21st century
• State and University willingness to make extraordinary investments
• Opportunity to build a new world-class program in interdisciplinary research and education, involving all of LSU
• Ed Seidel-led vision to instigate state-wide collaboration

Page 13: Advancing Science through Coordinated Cyberinfrastructure


Advancing Research

• Potentially requires advances in three areas, depending on existing strengths

Page 14: Advancing Science through Coordinated Cyberinfrastructure


CCT Organization

[Organization chart: Director's Office (Edward Seidel); HPC Partnership (McMahon); Cyberinfrastructure Development (Katz); Focus Areas (Allen); other chart elements: LSU HPC, LONI Systems and Software, Performance Team, Corporate Relations, Blue Waters etc., NSF TeraGrid; Core Comp. Sci., Coast to Cosmos, Material World, Cultural Computing, Visualization; Labs: ACAL, DSL, Viz, LCAT, ...]

Page 15: Advancing Science through Coordinated Cyberinfrastructure


Cyberinfrastructure Development

• Vision: combine research and infrastructure
  – Research
    o Computer science
    o Applications
    o Tools
  – Infrastructure
    o Hardware
    o Operations
    o Policies
• Both together have squared growth of either alone
• CyD staff – PhDs in CS and apps who understand the whole picture and want to grow the ecosystem

Page 16: Advancing Science through Coordinated Cyberinfrastructure


Computing in Louisiana

[Diagram: the LONI 40 Gbps network connecting UNO, Tulane, UL-L, SUBR, LSU, and LA Tech to the National Lambda Rail; LONI ~100 TF IBM and Dell supercomputers; CyberTools: tools and services; LONI Institute: people and collaborations; links out to TeraGrid and OSG]

Page 17: Advancing Science through Coordinated Cyberinfrastructure


LONI - Networking & Computing

[Map of LONI sites: LSU, La Tech, LSU HSC (two campuses), ULL, Tulane, SU, UNO, ULM, McNeese, NSU, SLU, Alex. Legend: LONI node; multiple 10GE links; ~500-core Dell cluster & 112-processor IBM P5 cluster; ~4500-core Dell cluster]

Network: partners and customers

Page 18: Advancing Science through Coordinated Cyberinfrastructure


LONI Computing Resources (2010)

• One central Dell cluster (Queen Bee)
  – 5500 IB-connected cores at ISB in Baton Rouge
  – Archival storage contracted through NCSA
  – 50% of allocations dedicated to TeraGrid from 2008
• Six distributed 512-core Dell clusters
• Five distributed 14-node (112-processor) IBM P5-575 clusters
• Distributed PetaShare storage
  – 32 TB disk @ each small Dell cluster
  – 8 TB disk on LSU & LaTech small Dell clusters - for LBRN
  – 8 TB at SC-S & HSC-NO - for LBRN
  – 250 TB tape
• All run by HPC@LSU, including user support/training

Page 19: Advancing Science through Coordinated Cyberinfrastructure


$12M NSF CyberTools Project: Enabler and Driver

Page 20: Advancing Science through Coordinated Cyberinfrastructure


Cactus

• Component-based HPC framework
  – Freely-available environment for collaborative application development
• Cutting edge CS
  – Grid computing, petascale, accelerators, steering, remote viz
• Active user & developer communities
  – 10-year pedigree, >$10M support
  – Numerical Relativity, CFD, Coastal, Reservoir Engineering, ...
• Domain-specific toolkits, e.g., CFD toolkit
  – FD/FV/FE numerical methods
  – Structured, multi-block, unstructured
  – Uses PETSc, Trilinos, MUMPS, HYPRE
  – Used to build Black Oil Toolkit

Page 21: Advancing Science through Coordinated Cyberinfrastructure


PetaShare

• Main concept: data is managed (migrated, moved, replicated, cached, etc.) automatically
• Data-aware storage systems, data-aware schedulers, cross-domain metadata scheme
• Provides: 250 TB disk, 400 TB tape storage (and access to national storage facilities)
• Applications: coastal & environmental modeling, geospatial analysis, bioinformatics, medical imaging, fluid dynamics, petroleum engineering, numerical relativity, high energy physics

Credit: Tevfik Kosar

Page 22: Advancing Science through Coordinated Cyberinfrastructure


LONI Institute
"CCT for Louisiana"

• $15M 5-year project
  – $7M BoR, $8M from LaTech, LSU, SUBR, Tulane, UNO, ULL
• Catalyzes new inter-institutional collaborations, ambitious projects, and top-level hires:
  – LONI network and computing
  – NSF projects: PetaShare, VizTangibles, TeraGrid, Blue Waters
  – EPSCoR: NSF CyberTools, DOE UCoMS, DoD
  – NIH: $17M LBRN
  – Promote collaborative research at interfaces for innovation

Page 23: Advancing Science through Coordinated Cyberinfrastructure


LONI Institute Vision

• LONI investments create world-leading infrastructure
• Create bold new inter-university superstructure
  – New faculty, staff, students; train others. Focus on CS, Bio, Materials, but all disciplines impacted
  – Promote research at interfaces for innovation
• Draw on, enhance strengths of all universities
  – Strong groups recently created; collectively world-class
  – Solve complex problems through collaboration & computation
  – Much stronger recruiting opportunities for all institutions
  – Statewide interdisciplinary education & research program
• Create University-Industry Research Centers (UIRCs)
  – Research Triangle, NCSA/UIUC, Bay Area, others
• Transform Louisiana
  – Such committed cooperation between sites is extraordinary

Page 24: Advancing Science through Coordinated Cyberinfrastructure


LONI Institute Hiring and Projects

• Two new faculty at each institution (12 total)
  – Six in CS, six in Comp. Bio/Materials
• Six Computational Scientists
  – Following the Bavarian KONWIHR project
  – Support 70-90 projects over five years; lead to external funding
• Graduate students
  – 36 new students funded, trained; two years each
• One coordinator/economic development
• All hiring coordinated across the state
• Leading faculty across the state create multi-institutional seed projects
• Building on seeds, dozens of new projects selected, started
• Exploit common themes, computing environments, tools found in all areas

Page 25: Advancing Science through Coordinated Cyberinfrastructure


TeraGrid (XSEDE)

• TeraGrid: world's largest open scientific discovery infrastructure
• Leadership-class resources at eleven partner sites combined to create an integrated, persistent computational resource
  – High-performance networks
  – High-performance computers (>1 Pflops (~100,000 cores) -> 1.75 Pflops)
    o And a Condor pool (with ~13,000 CPUs)
  – Visualization systems
  – Data collections (>30 PB, >100 discipline-specific databases)
  – Science gateways
  – User portal
  – User services - help desk, training, advanced app support
• Allocated to US researchers and their collaborators through a national peer-review process
  – Generally, review of computing, not science
• Mid-2011: TeraGrid --> XSEDE

Page 26: Advancing Science through Coordinated Cyberinfrastructure


Campus Champions

• A "champion" is a staff or faculty member on a campus who provides information on XSEDE to his/her colleagues
• Currently ~160 institutions represented by champions
• Champions get:
  – Monthly training and updates
  – Start-up accounts
  – Forum for sharing and interactions
  – Access to information on usage by local users
  – Registrations for the annual XSEDE conference waived
• Champions do:
  – Raise awareness locally
  – Provide training
  – Get users started with access quickly
  – Represent needs of local community
  – Provide feedback to improve services
  – Attend annual XSEDE conference
  – Share their training and education materials
  – Build community across campus, and among all Champions

[Map, March 26, 2014 (revised March 22, 2014): Campus Champion institutions - standard: 87; EPSCoR states: 51; Minority Serving Institutions: 12; EPSCoR states and Minority Serving Institutions: 8; total: 158]

Credit: Kay Hunt

Page 27: Advancing Science through Coordinated Cyberinfrastructure


LONI and National Cyberinfrastructure

• TeraGrid
  – One of the 11 TeraGrid Resource Providers
  – Playing a role in TG-wide governance (TeraGrid Forum, Executive Steering Committee, various working groups, GIG Director of Science)
  – Contributed administrative software AmieGold (glue between TG account info and local info) and CS software (HARC, PetaShare, SAGA)
• OSG
  – Currently providing resources
• XSEDE
  – LONI not a partner in XSEDE, but a service provider
• Nationally
  – Bringing in new users from the southeast US
  – LONI Institute Computational Scientists -> Campus Champions

Page 28: Advancing Science through Coordinated Cyberinfrastructure


NSF Vision: Infrastructure Role & Lifecycle

• Create and maintain a CI ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale
• Support the foundational research needed to continue to efficiently advance CI
• Enable transformative, interdisciplinary, collaborative science and engineering research and education through the use of advanced CI
• Transform practice through new policies for CI addressing challenges of academic culture, open dissemination and use, reproducibility and trust, curation, sustainability, governance, citation, stewardship, and attribution of authorship
• Develop a next-generation diverse workforce of scientists and engineers equipped with essential skills to use and develop CI, with CI used in both the research and education process

Page 29: Advancing Science through Coordinated Cyberinfrastructure


Relevant NSF Programs

• EPSCoR - targeted support for states that are less successful in NSF funding
• MRI - Major Research Instrumentation
• CIF21 (NSF's CI umbrella)
  – eXtreme Digital (XD)
  – Track 1 (Blue Waters)
  – Software Infrastructure for Sustained Innovation (SI2)
  – Campus Cyberinfrastructure - Network Infrastructure and Engineering (CC-NIE)
• Integrative Graduate Education and Research Traineeship Program (IGERT)
• General research programs

Page 30: Advancing Science through Coordinated Cyberinfrastructure


Recap (to 2010)

• Louisiana decides that science and technology can lead to a better future
• Builds a regional cyberinfrastructure (network, computing, software, ~data, people) that connects to national-scale infrastructure
  – Using a mix of national, state, and local funding
• Starts to change culture - infuse computation into academic departments, interdisciplinary hiring, large collaborative projects
• But...
• Didn't really think about data as much as we would have were we starting again today

Page 31: Advancing Science through Coordinated Cyberinfrastructure


• Swift is designed to compose large parallel workflows, from serial or parallel application programs, to run fast and efficiently on a variety of platforms
  – A parallel scripting system for grids and clusters for loosely-coupled applications - programs (executable, shell, Python, R, Octave, Matlab, etc.) linked by exchanging files
  – Easy to write: a simple high-level C-like functional language that allows small Swift scripts to do large-scale work
  – Easy to run: contains all services for running, in one Java application
    o Works on multicore workstations, HPC, grids (interfaces to schedulers, Globus, ssh)
  – A powerful, efficient, scalable, and flexible execution engine
    o Scaling to O(10M) tasks - 0.5M in live science work, and growing
    o Collective data management being developed to optimize I/O
• Used in earth science, neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, knowledge modeling, and more
• http://www.ci.uchicago.edu/swift

M. Wilde, M. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz, I. Foster, "Swift: A language for distributed parallel scripting," Parallel Computing, 37(9), pp. 633-652, 2011.
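Swift's model of loosely coupled programs linked only by the files they exchange can be illustrated with a small plain-Python sketch (this is not Swift itself; the task functions and file names are invented for illustration): independent tasks each write an output file, a downstream task consumes only those files, and tasks with no data dependency may run in parallel.

```python
# Sketch of Swift's "programs linked by exchanging files" idea in plain
# Python (hypothetical stand-in tasks, not the Swift system itself).
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def square_to_file(workdir, i):
    """Stand-in for an external program: takes an input, writes a file."""
    path = os.path.join(workdir, f"out_{i}.txt")
    with open(path, "w") as f:
        f.write(str(i * i))
    return path

def sum_files(paths, workdir):
    """A second 'program' whose only inputs are the files produced above."""
    total = sum(int(open(p).read()) for p in paths)
    result_path = os.path.join(workdir, "result.txt")
    with open(result_path, "w") as f:
        f.write(str(total))
    return result_path

with tempfile.TemporaryDirectory() as workdir:
    # The four square_to_file tasks are independent, so they can run in
    # parallel; sum_files must wait for all of their output files.
    with ThreadPoolExecutor() as pool:
        paths = list(pool.map(lambda i: square_to_file(workdir, i), range(4)))
    total_text = open(sum_files(paths, workdir)).read()

print(total_text)  # 0 + 1 + 4 + 9 = 14
```

Because each "program" touches only its own files, the same wiring works whether the tasks run as threads on a laptop or as jobs on a cluster, which is the portability the slide describes.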

Page 32: Advancing Science through Coordinated Cyberinfrastructure


Swift programming model: all execution driven by parallel data flow

• analyze1() and analyze2() are computed in parallel
• analyze() returns r when they are done
• This parallelism is automatic
• Works recursively throughout the program's call graph
  – E.g., can embed within a foreach loop, itself done in parallel
  – foreach loops can be nested

(int r) analyze(int i)
{
  j = analyze1(i);
  k = analyze2(i);
  r = 0.5*(j + k);
}
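For readers more familiar with mainstream languages, the Swift snippet above behaves roughly like the following Python sketch using futures (the bodies of analyze1/analyze2 are invented placeholders; note that Swift infers this parallelism from the data flow automatically, whereas here it must be requested explicitly):

```python
# Rough Python analogue of the Swift dataflow snippet above.
# analyze1 and analyze2 have no mutual dependency, so they run
# concurrently; the final expression blocks until both finish.
from concurrent.futures import ThreadPoolExecutor

def analyze1(i):
    return i + 1        # placeholder computation

def analyze2(i):
    return i * 2        # placeholder computation

def analyze(i):
    with ThreadPoolExecutor() as pool:
        j = pool.submit(analyze1, i)    # starts immediately
        k = pool.submit(analyze2, i)    # runs in parallel with analyze1
        return 0.5 * (j.result() + k.result())  # waits for both results

print(analyze(10))  # 0.5 * (11 + 20) = 15.5
```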

Page 33: Advancing Science through Coordinated Cyberinfrastructure


Swift Environment

• The Swift runtime system has drivers and algorithms to efficiently support and aggregate vastly diverse runtime environments

[Diagram: a Swift script and application programs on a submit host (login node, laptop, Linux server); a data server; execution targets including clouds: Amazon EC2, XSEDE Wispy, FutureGrid, ...]

Page 34: Advancing Science through Coordinated Cyberinfrastructure


Globus

• Big data transfer and sharing...
  ...with Dropbox-like simplicity...
  ...directly from your own storage systems
• Run as a non-profit service to the non-profit research community

Page 35: Advancing Science through Coordinated Cyberinfrastructure


Globus Users

• "I need a good place to store or backup my (big) research data, at a reasonable price."
• "I need to easily, quickly, and reliably move or mirror portions of my data to other places, including my campus HPC system, lab server, desktop, laptop, XSEDE, cloud, etc."
• "I need a way to easily and securely share my data with my colleagues at other institutions."
• "I want to publish my data so that it's available and discoverable long-term."
• "I want to archive my data in case it's needed sometime in the future."

Page 36: Advancing Science through Coordinated Cyberinfrastructure


Globus is SaaS

• Web, command line, and REST interfaces
• Reduced IT operational costs
• New features automatically available
• Consolidated support & troubleshooting
• Easy to add your laptop, server, cluster, supercomputer, etc. with Globus Connect

Page 37: Advancing Science through Coordinated Cyberinfrastructure


Globus Connected Resources on Campus

• Research computing center
• Department / lab storage
• Campus-wide home/project file system
• Mass storage systems
• Science instruments
• Desktops and laptops
• Custom web applications
• Amazon Web Services S3

Page 38: Advancing Science through Coordinated Cyberinfrastructure


Lessons

• The three triangle facets (infrastructure, computational, interdisciplinary) have to be taken seriously at the highest levels, and seen as an important component of academic research
• Infrastructure needs to be integrated at all levels (laboratory, campus, regional, national, international) - users need to be able to easily move work and data to appropriate systems, and collaborate across locations
• Education and training of students and faculty is crucial - vast improvements are needed over the small numbers currently reached through HPC center tutorials; computation and computational thinking need to be part of new curricula across all disciplines
• Emphasis should be placed on broadening participation in computation - not just focusing on high-end systems where decreasing numbers of researchers can join in, but making tools much more easily usable and intuitive, freeing all researchers from the limitations of their personal workstations, and providing access to simple tools for large-scale parameter studies, data archiving, visualization, and collaboration
• The vision needs to be consistent - it cannot be just one person's
• Funding needs to be stable (activities need to be sustainable)

Page 39: Advancing Science through Coordinated Cyberinfrastructure


Video

• Data Sharing - https://www.youtube.com/watch?v=N2zK3sAtr-4

Page 40: Advancing Science through Coordinated Cyberinfrastructure


Sources

• D. S. Katz et al., "Louisiana: A Model for Advancing Regional e-Science through Cyberinfrastructure," Philosophical Transactions of the Royal Society A, 367(1897), 2009.
  – Authors from Louisiana State University, Tulane University, University of Louisiana at Lafayette, Louisiana Tech University, Louisiana Community and Technical College System, Southern University, University of New Orleans
• G. Allen and D. S. Katz, "Computational science, infrastructure and interdisciplinary research on university campuses: experiences and lessons from the Center for Computation and Technology," NSF Workshop on Sustainable Funding and Business Models for Academic Cyberinfrastructure Facilities, Cornell University, 2010.
• D. S. Katz and D. Proctor, "A Framework for Discussing e-Research Infrastructure Sustainability," http://dx.doi.org/10.6084/m9.figshare.790767, submitted to the Workshop on Sustainable Software for Science: Practice and Experiences (http://wssspe.researchcomputing.org.uk) at SC13.
• Swift: Swift Team, led by Mike Wilde, http://www.ci.uchicago.edu/swift
• Globus: Globus Team, led by Ian Foster and Steve Tuecke, http://www.globus.org