Yushu Yao

Big Data @ NERSC - Data sharing and analytic Services

Uses of Data at NERSC

Diagram labels: Experiment, Computer Simulation; Store, Analyze, Share

•  Data comes to (or is generated at) NERSC from apparatus or computer simulations
•  Three things people do: store, analyze, and share


Store/Share/Analyze Data At NERSC


Diagram labels: Science Gateway, GPFS, HPSS


Science Gateway Services

•  Publish data on the web
    –  Create a www directory in your project space and put your data on the web
•  Build sophisticated web portals
    –  Build full-stack web applications for your science at NERSC using Python/Django, PHP, or Ruby on Rails
•  NEWT – the NERSC REST API
    –  Use the NEWT REST HTTP API to access NERSC HPC resources directly from your web apps
    –  Support for Authentication, Jobs, Commands, Files, Persistent Store, NIM, and System Information at NERSC

Gateway examples: http://portal.nersc.gov
NEWT: https://newt.nersc.gov
Science Gateway information: http://www.nersc.gov/users/science-gateways/
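To make the idea of calling NEWT from a web app more concrete, here is a minimal sketch that only builds an HTTP request and does not send it. The host comes from the slide; the `/newt/` path prefix, the resource names (`status`, `file`), and the system name `example_system` are assumptions for illustration and should be checked against the NEWT documentation before use.

```python
from urllib.parse import quote, urljoin
from urllib.request import Request

# Host taken from the slide; the "/newt/" prefix is an assumption.
NEWT_BASE = "https://newt.nersc.gov/newt/"


def newt_request(resource, *parts, method="GET"):
    """Build (but do not send) an HTTP request for a NEWT-style resource.

    `resource` and `parts` are joined into a URL path; the actual
    resource names must be taken from the NEWT documentation.
    """
    path = "/".join(quote(str(p), safe="") for p in (resource, *parts))
    return Request(urljoin(NEWT_BASE, path), method=method)


# Hypothetical call: ask for the status of a made-up system name.
req = newt_request("status", "example_system")
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) and handling authentication are left out on purpose, since those details depend on the service itself.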


Traditional SQL Database Services

•  PostgreSQL and MySQL
•  Good for:
    –  Structured, relational data
    –  Mid-size, up to several GB in total
    –  Transactional operations
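A small sketch of what "structured, relational data" and "transactional operations" mean in practice. It uses Python's built-in `sqlite3` module purely as a stand-in so that it runs anywhere; it is not the PostgreSQL or MySQL service described on the slide, and the `runs` table and its values are made up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a PostgreSQL/MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs ("
    " id INTEGER PRIMARY KEY,"
    " detector TEXT NOT NULL,"
    " events INTEGER NOT NULL CHECK (events >= 0))"
)

# First transaction: committed when the `with` block exits cleanly.
with conn:
    conn.executemany(
        "INSERT INTO runs (detector, events) VALUES (?, ?)",
        [("A", 1200), ("B", 800)],
    )

# Second transaction: the second UPDATE violates the CHECK constraint,
# so the whole transaction (including the first UPDATE) is rolled back.
try:
    with conn:
        conn.execute("UPDATE runs SET events = events + 100 WHERE detector = 'A'")
        conn.execute("UPDATE runs SET events = events - 1000 WHERE detector = 'B'")
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT detector, events FROM runs ORDER BY detector").fetchall()
print(rows)
```

The point of the sketch is the all-or-nothing behaviour: after the failed transaction, both rows still hold their original values.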


MongoDB

•  Key-value pair / text database
•  Good for:
    –  Unstructured, text data
    –  Mid-size to large, e.g. 10 GB of text
    –  E.g. metadata that has an ever-changing schema

To request a MongoDB/PostgreSQL/MySQL database: https://www.nersc.gov/users/science-gateways/science-gateway-databases/
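The "ever-changing schema" case can be illustrated with a short sketch. It uses plain Python dictionaries as a stand-in for documents in a document store; it does not use MongoDB or its driver, and the file names, field names, and the `find` helper are all hypothetical.

```python
import json

# Made-up metadata records: documents in one collection need not
# share the same set of fields.
collection = [
    {"file": "run001.h5", "instrument": "detector-A", "exposure_s": 30},
    {"file": "run002.h5", "instrument": "detector-A", "tags": ["calibration"]},
    {"file": "sim_17.nc", "code": "hypothetical-sim",
     "resolution": {"nx": 512, "ny": 512}},
]


def find(docs, **criteria):
    """Return documents whose top-level fields equal every given key=value."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


matches = find(collection, instrument="detector-A")
print(json.dumps(matches, indent=2))
```

Adding a new field to later records requires no table migration, which is the property the slide points to for metadata.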


RStudio Beta Service (r.nersc.gov)

•  Simply go to http://r.nersc.gov and log in with your NIM username/password
•  IDE for the popular R analytic package
•  Access to global file systems (project, etc.)
•  Rich set of machine learning and other analytic packages
•  Web-based: pick up unfinished work in any other browser


Spark/Hadoop Testbed

•  MapReduce on large amounts of data
•  Good for:
    –  Unstructured or structured data that is not easily indexed
    –  Large, e.g. 100 GB or more
    –  E.g. large amounts of sequencing data
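As a rough illustration of the MapReduce model on sequencing-style data, the sketch below counts 3-mers in a couple of made-up reads. It is plain single-process Python meant to show the map, shuffle, and reduce phases; it does not use the Spark or Hadoop APIs, and the function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_kmers(read, k=3):
    """Map phase: emit a (k-mer, 1) pair for every length-k window of one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


reads = ["ACGTAC", "CGTACG"]  # made-up reads
pairs = [pair for read in reads for pair in map_kmers(read)]
counts = reduce_counts(shuffle(pairs))
print(counts)
```

On a real cluster the map and reduce steps run in parallel across many nodes, which is why the approach suits data that is large and not easily indexed.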


SciDB Testbed: Array-Like Data Examples


SciDB Testbed at NERSC

•  Large parallel array database/analytics
•  Good for:
    –  Large (10 GB to 10 TB) array-structured data
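For a sense of what "array-structured data" looks like, here is a tiny sketch of a block-wise average over a 2-D grid, the kind of operation that is natural when values are addressed by their position in an array. It is plain Python on a made-up 4x4 grid, not SciDB's query language or API, and `block_mean` is a hypothetical helper.

```python
def block_mean(grid, block):
    """Average non-overlapping block x block tiles of a 2-D list of numbers."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            # Collect the cells of one tile, clipping at the grid edges.
            cells = [grid[i][j]
                     for i in range(r, min(r + block, rows))
                     for j in range(c, min(c + block, cols))]
            out_row.append(sum(cells) / len(cells))
        out.append(out_row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
coarse = block_mean(image, 2)
print(coarse)
```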


More Information

•  For more information about the SciDB and Spark testbeds at NERSC, please email [email protected]

 


National Energy Research Scientific Computing Center
