NDN and Big Data Science - NIST

NDN and Big Data Science. Inder Monga, Interim Director and CTO, ESnet; Interim Director, Scientific Networking Division, Lawrence Berkeley National Lab. Named-Data Network Workshop @ NIST, May 31-June 1

Transcript of NDN and Big Data Science - NIST

Page 1: NDN and Big Data Science - NIST

NDN and Big Data Science

Inder Monga
Interim Director and CTO, ESnet
Interim Director, Scientific Networking Division
Lawrence Berkeley National Lab

Named-Data Network Workshop @ NIST

May 31-June 1

Page 2: NDN and Big Data Science - NIST

Agenda

6/1/16 imonga at es dot net

Big Science Data

Global Science Collaborations

NDN for Science

Page 3: NDN and Big Data Science - NIST

How big is data (visual comparison)*

•  Byte: one grain of rice
•  Kilobyte: cup of rice
•  Megabyte: 8 bags of rice
•  Gigabyte: 3 tractor trailers
•  Terabyte: 2 container ships
•  Petabyte: layer of rice over Manhattan
•  Exabyte: 2 layers over the United Kingdom
•  Zettabyte: fills the Pacific Ocean

* David Wellman @ Myriad Genetics

Page 4: NDN and Big Data Science - NIST


Every Instagram photo = 110 KB
216,000 photos are sent to Instagram every minute

This equals 23 GB of data per minute

Instagram data produced per day worldwide = 33 TB
Equal to filling ~1,032 iPhones (32 GB each)
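The arithmetic behind these figures can be checked with a short sketch. The photo size, upload rate, and phone capacity come from the slide; the use of decimal units (1 GB = 1,000,000 KB) is an assumption. Without intermediate rounding the result is about 23.8 GB per minute and about 34 TB per day, close to the slide's rounded 23 GB and 33 TB.

```python
# Inputs quoted on the slide.
PHOTO_KB = 110
PHOTOS_PER_MINUTE = 216_000
IPHONE_GB = 32

# Assumption: decimal units (1 GB = 1,000,000 KB, 1 TB = 1,000 GB).
MINUTES_PER_DAY = 24 * 60

gb_per_minute = PHOTO_KB * PHOTOS_PER_MINUTE / 1_000_000
tb_per_day = gb_per_minute * MINUTES_PER_DAY / 1_000
iphones_per_day = tb_per_day * 1_000 / IPHONE_GB

print(f"{gb_per_minute:.2f} GB per minute")
print(f"{tb_per_day:.1f} TB per day")
print(f"{iphones_per_day:.0f} iPhones (32 GB each) per day")
```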


Page 5: NDN and Big Data Science - NIST


Page 6: NDN and Big Data Science - NIST

DOE Science “Apps”

Page 7: NDN and Big Data Science - NIST

Advanced Light Source

Page 8: NDN and Big Data Science - NIST

Advanced Light Source

Page 9: NDN and Big Data Science - NIST

Advanced Light Source

Page 10: NDN and Big Data Science - NIST

Scenario 1: All too common process of discovery

Page 11: NDN and Big Data Science - NIST

Beamline – Capture to Results

Basic Energy Sciences (BES) supports fundamental research to understand, predict, and ultimately control matter and energy at the electronic, atomic, and molecular levels in order to provide the foundations for new energy technologies and to support DOE missions in energy, environment, and national security.

http://science.energy.gov/bes/

Page 12: NDN and Big Data Science - NIST

Scenario 2: E Pluribus Unum

Hundreds to thousands of images are created in a few hours… they can range in size from MB to TB

Processing on this order of magnitude can’t be done locally – we need to send (over a network) to a more capable facility

After processing on a supercomputer, models are created.

Page 13: NDN and Big Data Science - NIST

Big Data vs. Big Data

Don’t Forget:

Instagram data produced/day worldwide by millions of people = 33 TB

One biology experiment at one beamline by a team of nine scientists = 119 TB (Photosystem II X-Ray Study)

Page 14: NDN and Big Data Science - NIST

Big Science Data in Motion = Elephant Flow!
IoT watching LOLCats = Mice flow!

[Figure label: Science Data]

Page 15: NDN and Big Data Science - NIST

Elephant Data vs. Mice Data Behavior

Page 16: NDN and Big Data Science - NIST

Science Data Transferred Monthly by ESnet

Factor of 10 growth every 47 months

Pt-to-pt circuits

LHCONE (T1-T1/2) traffic

Available at https://my.es.net/network/traffic-volume
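To put "factor of 10 every 47 months" in more familiar terms, the sketch below converts it into an annual growth factor and a doubling time. Only the 47-month figure comes from the slide; the assumption of smooth exponential growth and the function name are illustrative.

```python
import math

MONTHS_PER_TENFOLD = 47  # from the slide


def growth_factor(months: float) -> float:
    """Traffic multiplier after `months`, assuming smooth exponential growth."""
    return 10 ** (months / MONTHS_PER_TENFOLD)


annual_factor = growth_factor(12)
doubling_months = MONTHS_PER_TENFOLD * math.log10(2)

print(f"Annual growth factor: {annual_factor:.2f}x")
print(f"Doubling time: {doubling_months:.1f} months")
```

Under that assumption, traffic grows by roughly 80% per year and doubles a little more often than every 15 months.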


Page 17: NDN and Big Data Science - NIST

Traffic growth at blistering rates

[Figure: observed and projected traffic in petabytes per month, log scale from 1 to 10,000]

Observed: 43 PB/month in April 2016; Run 2 is accelerating the growth

Projected: traffic reaches 1 exabyte per month by ~2020 and 10 EB/month by ~2024

Slide from Harvey Newman
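The growth rates implied by these data points can be estimated with a short calculation. This is a rough sketch: the slide gives only approximate years, so treating "~2020" and "~2024" as January of those years and taking 1 EB as 1,000 PB are assumptions, as is the helper name.

```python
import math


def months_per_tenfold(start_pb: float, end_pb: float, months: float) -> float:
    """Months needed for a 10x increase, given two points on an exponential curve."""
    return months / math.log10(end_pb / start_pb)


# April 2016 (43 PB/month) to an assumed January 2020 (1 EB/month) is 45 months.
observed_to_2020 = months_per_tenfold(43, 1_000, 45)

# Assumed January 2020 (1 EB/month) to January 2024 (10 EB/month) is 48 months.
from_2020_to_2024 = months_per_tenfold(1_000, 10_000, 48)

print(f"2016 to ~2020: 10x every {observed_to_2020:.0f} months")
print(f"~2020 to ~2024: 10x every {from_2020_to_2024:.0f} months")
```

With these assumed dates, the near-term projection is steeper than the 2020 to 2024 segment, which matches the slide's note that Run 2 is accelerating the growth.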

Page 18: NDN and Big Data Science - NIST

Superfacility: interconnection of multiple facilities via the network

Researchers from Berkeley Lab and SLAC conducted protein crystallography experiments at LCLS to investigate photoexcited states of PSII, with near-real-time computational analysis at NERSC.

“Taking snapshots of photosynthetic water oxidation using femtosecond X-ray diffraction and spectroscopy,” Nature Communications 5, 4371 (9 July 2014)

Page 19: NDN and Big Data Science - NIST

Agenda

Big Science Data

Global Science Collaborations

NDN for Science

Page 20: NDN and Big Data Science - NIST

Use Case #1

Researchers from Berkeley Lab and SLAC conducted protein crystallography experiments at LCLS to investigate photoexcited states of PSII, with near-real-time computational analysis at NERSC.

“Taking snapshots of photosynthetic water oxidation using femtosecond X-ray diffraction and spectroscopy,” Nature Communications 5, 4371 (9 July 2014)

50 TB moved a night
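As a rough sense of scale, the sketch below converts "50 TB a night" into a sustained network rate. The slide does not say how long a night is, so the 8-hour and 12-hour windows are assumptions, as are decimal terabytes and the function name.

```python
def required_gbps(terabytes: float, hours: float) -> float:
    """Average rate in Gbit/s needed to move `terabytes` within `hours`."""
    bits = terabytes * 1e12 * 8          # decimal terabytes to bits
    return bits / (hours * 3600) / 1e9   # bits per second to Gbit/s


for window_hours in (8, 12):  # assumed lengths of "a night"
    rate = required_gbps(50, window_hours)
    print(f"{window_hours} h window: {rate:.1f} Gbit/s sustained")
```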

Page 21: NDN and Big Data Science - NIST

Use Case #2: LHCONE data – multiple replicas, global reach

LHC Run 2 = 300 PB of data, 1 EB in 2018

Page 22: NDN and Big Data Science - NIST

Use Case #3: Worldwide Earth System Grid Federation Sites

Page 23: NDN and Big Data Science - NIST

Use Case Galore

#4 - LSST: Data coming from Chile, stored at NCSA, and analyzed among a global collaboration

#5 - SKA: Data coming from South Africa and Australia, analyzed among a global collaboration

#6 - Bio-Health, Precision Medicine, Genomics: Open-data trend, data sets available at many websites.

Page 24: NDN and Big Data Science - NIST

Superfacility Vision: A network of connected facilities, software and expertise to enable new modes of discovery

[Diagram labels]

Center: Extreme Data Science Facility (XDSF)

Network: ESnet

Data sources: MS-DESI, ALS, LHC, JGI, APS, LCLS, other data-producing sources

Capabilities: new math; real-time analysis; high performance software; novel compute/data platforms; data mgmt. and sharing; programmable network

Page 25: NDN and Big Data Science - NIST

Agenda

Big Science Data

Global Science Collaborations

NDN for Science
Thanks to Christos Papadopoulos, Susmit Shannigrahi

Page 26: NDN and Big Data Science - NIST

High-level objectives for scientific data: alignment with NDN approach

•  Abstract the storage and network capability and location dependence from the user-data interaction

•  Enable users to specify and retrieve the portions of data the workflow needs

•  Radically simplify how scientific users manage, move and manipulate large, distributed science data repositories, but with high throughput end-to-end

•  Create a secure, scalable framework based on integrated data management and network transport

Page 27: NDN and Big Data Science - NIST

Challenge #1: Naming and Data Discovery

Page 28: NDN and Big Data Science - NIST

Data Discovery UI

Page 29: NDN and Big Data Science - NIST

Data Discovery UI

•  Three intuitive ways to search scientific data
   – Auto-complete, name component based search, and tree view

•  Can work with any hierarchical datasets
   – We have two instances, for climate and HEP data

•  Provides metadata browsing, subsetting, staging capabilities
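The three search modes listed for the Data Discovery UI can be illustrated over a small catalog of hierarchical names. This is a minimal sketch, not the UI's implementation: the dataset names, the function names, and the in-memory list standing in for a catalog are all hypothetical.

```python
# Hypothetical hierarchical dataset names (not taken from the talk).
NAMES = [
    "/climate/modelA/2005/temperature",
    "/climate/modelA/2005/precipitation",
    "/climate/modelB/2006/temperature",
    "/hep/runX/2015/dataset1",
]


def components(name):
    """Split a hierarchical name into its components."""
    return [c for c in name.split("/") if c]


def autocomplete(prefix, names=NAMES):
    """Auto-complete: names that start with what the user has typed."""
    return sorted(n for n in names if n.startswith(prefix))


def component_search(component, names=NAMES):
    """Name-component search: names containing the component at any level."""
    return sorted(n for n in names if component in components(n))


def build_tree(names=NAMES):
    """Tree view: nested dictionaries keyed by name component."""
    tree = {}
    for name in names:
        node = tree
        for c in components(name):
            node = node.setdefault(c, {})
    return tree


def render(tree, depth=0):
    """Flatten the tree into indented lines for display."""
    lines = []
    for key in sorted(tree):
        lines.append("  " * depth + key)
        lines.extend(render(tree[key], depth + 1))
    return lines


print(autocomplete("/climate/modelA"))
print(component_search("temperature"))
print("\n".join(render(build_tree())))
```

Because all three modes operate only on the name hierarchy, the same code would work for any dataset whose names are hierarchical, which is the property the slide highlights.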

Page 30: NDN and Big Data Science - NIST

Challenge #2: Subsetting of data

Page 31: NDN and Big Data Science - NIST

Subsetting

•  NDN names easily extend to support subsetting

•  Add query parameters as an encoded name component

•  Services can parse the Interest name and perform the intended action
   – Retrieval after subsetting is much more economical than subsetting after retrieval
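The idea of carrying subsetting parameters inside the name can be sketched with plain string handling. This is an illustration only: the percent-encoded query string, the parameter names, and both helper functions are assumptions and do not represent the NDN wire format or any NDN library API.

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode


def subset_name(dataset_name, **params):
    """Append the query parameters as one percent-encoded name component."""
    query = urlencode(sorted(params.items()))
    # safe="" also encodes "/", so the query stays a single component.
    return f"{dataset_name}/{quote(query, safe='')}"


def parse_subset_name(name):
    """Service side: split off the last component and decode the parameters."""
    prefix, _, last = name.rpartition("/")
    return prefix, dict(parse_qsl(unquote(last)))


# Hypothetical dataset name and bounding box.
name = subset_name(
    "/climate/modelA/2005/temperature", lat="30,40", lon="-110,-100"
)
prefix, params = parse_subset_name(name)
print(name)
print(prefix, params)
```

A service that receives such a name can apply the decoded parameters at the data source and return only the requested portion, which is the "retrieval after subsetting" case the slide describes as more economical.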

Page 32: NDN and Big Data Science - NIST

Challenge #3: High performance end-to-end

[Figure: throughput curves across distance classes: Local (LAN), Metro Area, Regional, Continental, International. Legend: Measured (TCP Reno), Measured (HTCP), Theoretical (TCP Reno), Measured (no loss)]

With loss, high performance beyond metro distances is essentially impossible
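One commonly used model for loss-limited TCP Reno throughput is the Mathis et al. approximation, rate ≈ (MSS / RTT) × (C / √p) with C = √(3/2). The sketch below applies it to illustrate why throughput collapses as distance grows. Whether the slide's theoretical curve uses exactly this model is an assumption, and the round-trip times, segment size, and loss rate are illustrative values rather than the slide's measurements.

```python
import math


def mathis_throughput_mbps(mss_bytes: float, rtt_ms: float, loss_rate: float) -> float:
    """Approximate loss-limited TCP Reno throughput in Mbit/s (Mathis model)."""
    c = math.sqrt(3 / 2)
    bytes_per_second = (mss_bytes / (rtt_ms / 1000)) * (c / math.sqrt(loss_rate))
    return bytes_per_second * 8 / 1e6


# Illustrative round-trip times per distance class (assumed values).
RTT_MS = {
    "Local (LAN)": 1,
    "Metro Area": 5,
    "Regional": 20,
    "Continental": 80,
    "International": 150,
}

for label, rtt in RTT_MS.items():
    mbps = mathis_throughput_mbps(mss_bytes=1460, rtt_ms=rtt, loss_rate=1e-4)
    print(f"{label:15s} RTT {rtt:4d} ms -> about {mbps:8.1f} Mbit/s")
```

In this model throughput is inversely proportional to round-trip time, so at a fixed loss rate a path with 100 times the latency achieves one hundredth of the rate.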

Page 33: NDN and Big Data Science - NIST

NDN with OSCARS

•  Some data transfers require high bandwidth reserved paths

•  We have integrated an NDN strategy with OSCARS
   – A data retrieval Manager expresses a special Interest to the strategy layer
   – Strategy communicates with OSCARS to reserve a path
   – Interest/Data exchange uses the newly created path

•  Fully transparent to the application

Page 34: NDN and Big Data Science - NIST

NDN for Intelligent Data Transfers

•  Strategy for large scientific data transfers

•  Retrieval Manager queries network for options
•  Makes a decision, informs strategy
•  Tells client to start retrieval

[Diagram: Data Requester, Retrieval Decision Manager, NDN Strategy, reservation capable face(s), Data Source 1, Data Source 2]

Diagram flow: express retrieval requirements to Decision Manager; consult Strategy: what are the options?; explore and compile options; return options; inform strategy of decision; tell client when to start

Page 35: NDN and Big Data Science - NIST

[Diagram: Data Requester, Retrieval Decision Manager, NDN Strategy, Reservation Manager (OSCARS), reservation capable face(s), Data Source 1, Data Source 2]

1. Consult Strategy: What are the options?
2. Response: Reservation
3. Setup reservation
4. Inform strategy on successful reservation
5. Strategy sets path priority for that interest
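The five-step reservation flow can be mimicked with a small in-memory simulation. Everything here is hypothetical: the class names, methods, face names, and bandwidth figures are stand-ins for illustration and do not reflect the actual OSCARS interface or any NDN forwarder API.

```python
from dataclasses import dataclass, field


@dataclass
class ReservationManager:
    """Stand-in for a circuit reservation service such as OSCARS."""
    available_gbps: float

    def setup(self, gbps: float) -> bool:
        # Step 3: grant the reservation only if capacity remains.
        if gbps > self.available_gbps:
            return False
        self.available_gbps -= gbps
        return True


@dataclass
class Strategy:
    """Stand-in for a forwarding strategy that knows its faces."""
    faces: dict                                    # face name -> reservation capable?
    preferred: dict = field(default_factory=dict)  # name prefix -> face

    def options(self) -> list:
        # Steps 1-2: report the faces that could carry a reserved path.
        return [name for name, capable in self.faces.items() if capable]

    def set_path_priority(self, prefix: str, face: str) -> None:
        # Step 5: prefer the reserved face for Interests under this prefix.
        self.preferred[prefix] = face

    def face_for(self, interest_name: str) -> str:
        for prefix, face in self.preferred.items():
            if interest_name.startswith(prefix):
                return face
        return next(iter(self.faces))  # default: first configured face


def retrieve(prefix, gbps, strategy, reservations) -> bool:
    """Decision-manager logic: consult, reserve, then inform the strategy."""
    candidates = strategy.options()                        # steps 1-2
    if candidates and reservations.setup(gbps):            # step 3
        strategy.set_path_priority(prefix, candidates[0])  # steps 4-5
        return True
    return False


strategy = Strategy(faces={"best-effort-face": False, "reserved-face": True})
oscars = ReservationManager(available_gbps=100)
granted = retrieve("/climate/modelA", 40, strategy, oscars)
print(granted, strategy.face_for("/climate/modelA/2005/temperature"))
```

In this sketch the application only asks for data by name; the choice of face happens inside the strategy, which mirrors the slide's claim that the reservation is transparent to the application.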

Page 36: NDN and Big Data Science - NIST

Roadmap for NDN Experimentation

Summer Internship (2012)
•  Data subsetting for climate
•  Susmit S., Christos P., John Wu, Alex Sim

CC-NIE collaboration with CSU (2013)
•  Deploy NDN nodes
•  Configure NDN overlay

NDN for HEP (2015)
•  Present paper at CHEP
•  Funding for Fermi/Caltech

Expand Climate Testbed (20xx?)
•  ICNWG – ESGF focus
•  High-speed data transfer, multiple replicas

HEP Testbed combine (20xx?)
•  From Lab to WAN
•  xRootD integration

High-Throughput NDN (20xx?)
•  Infrastructure aware NDN
•  Forwarding Strategies

Page 37: NDN and Big Data Science - NIST

Many unproven questions still…

•  Where is the complexity being pushed to, and what needs to be done to manage that?
   – From the scientist to the network

•  How can a network operator maintain, automate and operationally manage that complexity?

•  Think through the failure models
•  Think through performance models

•  How does this work or compete with software scientists have already built to manage their data – what’s the best way to integrate and/or migrate?

Page 38: NDN and Big Data Science - NIST

The future is new data scientists!

17-year-old Brittany Wenger creates breast cancer detection tool that is 99% accurate on a minimally invasive, previously inaccurate test.

Machine Learning + Online Data + Cloud Computing

Page 39: NDN and Big Data Science - NIST

Thank you!

imonga at es dot net