NDN and Big Data Science - NIST
NDN and Big Data Science
Inder Monga
Interim Director and CTO, ESnet
Interim Director, Scientific Networking Division
Lawrence Berkeley National Lab
Named-Data Network Workshop @ NIST
May 31 - June 1
Agenda
Big Science Data
Global Science Collaborations
NDN for Science
6/1/16 imonga at es dot net
How big is data (visual comparison)*
• Byte: one grain of rice
• Kilobyte: cup of rice
• Megabyte: 8 bags of rice
• Gigabyte: 3 tractor trailers
• Terabyte: 2 container ships
• Petabyte: layer of rice over Manhattan
• Exabyte: 2 layers over the United Kingdom
• Zettabyte: fills the Pacific Ocean
* David Wellman @ Myriad Genetics
Every Instagram photo = 110 KB
216,000 photos are sent to Instagram every minute
This equals 23 GB of data per minute
Instagram data produced per day worldwide = 33 TB
Equal to filling ~1,032 32 GB iPhones
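The Instagram figures above can be re-derived with a few lines of arithmetic. This is a rough check, assuming decimal units (1 KB = 1,000 bytes); the variable names are illustrative. Unrounded, the results come out slightly higher than the slide's numbers, which appear to be rounded down at each step.

```python
# Decimal unit sizes in bytes (an assumption; the slide does not say which units it uses).
KB, GB, TB = 10**3, 10**9, 10**12

PHOTO_BYTES = 110 * KB          # size of one photo, from the slide
PHOTOS_PER_MINUTE = 216_000     # upload rate, from the slide
MINUTES_PER_DAY = 24 * 60
IPHONE_BYTES = 32 * GB          # capacity of one 32 GB iPhone

bytes_per_minute = PHOTO_BYTES * PHOTOS_PER_MINUTE
bytes_per_day = bytes_per_minute * MINUTES_PER_DAY
iphones_per_day = bytes_per_day / IPHONE_BYTES

print(f"per minute: {bytes_per_minute / GB:.2f} GB")
print(f"per day:    {bytes_per_day / TB:.2f} TB")
print(f"iPhones:    {iphones_per_day:.0f}")
```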
DOE Science “Apps”
Advanced Light Source
Scenario 1: All too common process of discovery
Beamline: Capture to Results
Basic Energy Sciences (BES) supports fundamental research to understand, predict, and ultimately control matter and energy at the electronic, atomic, and molecular levels in order to provide the foundations for new energy technologies and to support DOE missions in energy, environment, and national security.
http://science.energy.gov/bes/
Scenario 2: E Pluribus Unum
Hundreds to thousands of images are created in a few hours … they can range in size from MB to TB
Processing on this order of magnitude can't be done locally; we need to send the data (over a network) to a more capable facility
After processing on a supercomputer, models are created.
Big Data vs. Big Data
Don't Forget:
Instagram data produced per day worldwide by millions of people = 33 TB
One biology experiment at one beamline by a team of nine scientists = 119 TB (Photosystem II X-ray study)
Big Science Data in Motion = Elephant Flow!
IoT watching LOLCats = Mice Flow!
[Figure: Science Data]
Elephant Data vs. Mice Data Behavior
Science Data Transferred Monthly by ESnet
Factor of 10 growth every 47 months
Point-to-point circuits
LHCONE (T1-T1/2) traffic
Available at https://my.es.net/network/traffic-volume
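To make "a factor of 10 every 47 months" easier to compare with other growth figures, it can be converted into a yearly growth factor and a doubling time. The sketch below assumes smooth exponential growth at that rate; the function and constant names are illustrative.

```python
import math

MONTHS_PER_TENFOLD = 47  # from the slide: traffic grows 10x every 47 months

def growth_factor(months):
    """Multiplicative traffic growth over a span of months, assuming a constant exponential rate."""
    return 10 ** (months / MONTHS_PER_TENFOLD)

# Growth over one year, and the time needed for traffic to double.
annual_factor = growth_factor(12)
doubling_months = MONTHS_PER_TENFOLD * math.log10(2)

print(f"yearly growth factor: {annual_factor:.2f}x")
print(f"doubling time:        {doubling_months:.1f} months")
```

Under this assumption, traffic grows by roughly 80% per year and doubles in a little over 14 months.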
Traffic growth at blistering rates
[Figure: traffic in petabytes per month on a log scale from 1 to 10,000, observed and projected]
43 PB/month in April 2016; Run 2 is accelerating the growth
Projected traffic reaches 1 exabyte per month by ~2020 and 10 EB/month by ~2024
Slide from Harvey Newman
Superfacility: interconnection of multiple facilities via the network
Researchers from Berkeley Lab and SLAC conducted protein crystallography experiments at LCLS to investigate photoexcited states of PSII, with near-real-time computational analysis at NERSC.
“Taking snapshots of photosynthetic water oxidation using femtosecond X-ray diffraction and spectroscopy,” Nature Communications 5, 4371 (9 July 2014)
Agenda
Big Science Data
Global Science Collaborations
NDN for Science
Use Case #1
Researchers from Berkeley Lab and SLAC conducted protein crystallography experiments at LCLS to investigate photoexcited states of PSII, with near-real-time computational analysis at NERSC.
“Taking snapshots of photosynthetic water oxidation using femtosecond X-ray diffraction and spectroscopy,” Nature Communications 5, 4371 (9 July 2014)
50 TB moved per night
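For a sense of what 50 TB per night asks of the network, the volume can be turned into a sustained rate. The slide does not say how long a "night" is, so the transfer window below is a hypothetical parameter, and decimal terabytes are assumed.

```python
def sustained_gbps(terabytes, hours):
    """Average rate in Gbit/s needed to move `terabytes` (decimal TB) within `hours`."""
    bits = terabytes * 10**12 * 8
    seconds = hours * 3600
    return bits / seconds / 10**9

# The window lengths are assumptions for illustration, not values from the slide.
for window_hours in (6, 8, 12):
    print(f"{window_hours:>2} h window: {sustained_gbps(50, window_hours):.1f} Gbit/s")
```

With an assumed 8-hour window the transfer needs about 14 Gbit/s sustained for the whole night, before any protocol overhead.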
Use Case #2: LHCONE data, multiple replicas, global reach
LHC Run 2 = 300 PB of data, 1 EB in 2018
Use Case #3: Worldwide Earth System Grid Federation Sites
Use Case Galore
#4 - LSST: Data coming from Chile, stored at NCSA, and analyzed among a global collaboration
#5 - SKA: Data coming from South Africa and Australia, analyzed among a global collaboration
#6 - Bio-Health, Precision Medicine, Genomics: Open-data trend, data sets available at many websites.
Superfacility Vision: A network of connected facilities, software and expertise to enable new modes of discovery
[Diagram labels: ESnet; Extreme Data Science Facility (XDSF); data sources MS-DESI, ALS, LHC, JGI, APS, LCLS, and other data-producing sources]
• New math
• Real-time analysis
• High-performance software
• Novel compute/data platforms
• Data management and sharing
• Programmable network
Agenda
Big Science Data
Global Science Collaborations
NDN for Science
Thanks to Christos Papadopoulos, Susmit Shannigrahi
High-level objectives for scientific data: alignment with NDN approach
• Abstract storage and network capability, and location dependence, away from the user-data interaction
• Enable users to specify and retrieve the portions of data the workflow needs
• Radically simplify how scientific users manage, move and manipulate large, distributed science data repositories, while keeping high end-to-end throughput
• Create a secure, scalable framework based on integrated data management and network transport
Challenge #1: Naming and Data Discovery
Data Discovery UI
• Three intuitive ways to search scientific data
  – Auto-complete, name-component-based search, and tree view
• Can work with any hierarchical datasets
  – We have two instances, for climate and HEP data
• Provides metadata browsing, subsetting, staging capabilities
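The three search modes listed above (auto-complete, name-component search, tree view) all follow from the names being hierarchical. The sketch below is not the UI's actual code; the catalog entries and function names are hypothetical and only illustrate how each mode can be derived from a list of slash-separated names.

```python
def split_name(name):
    """Split a hierarchical name such as '/a/b/c' into its components."""
    return [c for c in name.split("/") if c]

def autocomplete(names, typed):
    """Suggest completions of the typed prefix, one name level at a time."""
    suggestions = set()
    for name in names:
        if name.startswith(typed):
            # Cut at the first '/' after the typed text so the user descends level by level.
            cut = name.find("/", len(typed))
            suggestions.add(name if cut == -1 else name[:cut])
    return sorted(suggestions)

def component_search(names, wanted):
    """Return the names that contain every wanted component, at any depth."""
    return [n for n in names if set(wanted) <= set(split_name(n))]

def build_tree(names):
    """Build a nested dict that a tree view could render."""
    tree = {}
    for name in names:
        node = tree
        for comp in split_name(name):
            node = node.setdefault(comp, {})
    return tree

# Hypothetical catalog entries, made up for illustration.
CATALOG = [
    "/climate/model-a/precipitation/2010",
    "/climate/model-a/temperature/2010",
    "/climate/model-b/temperature/2011",
    "/hep/run2/detector-x/events",
]

print(autocomplete(CATALOG, "/climate/model-"))
print(component_search(CATALOG, ["temperature", "2010"]))
print(sorted(build_tree(CATALOG)))
```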
Challenge #2: Subsetting of data
Subsetting
• NDN names easily extend to support subsetting
• Add query parameters as an encoded name component
• Services can parse the Interest name and perform the intended action
  – Retrieval after subsetting is much more economical than subsetting after retrieval
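One way to picture "query parameters as an encoded name component" is to percent-encode the whole query so that it occupies exactly one component of the name, which the serving side then decodes. The sketch below uses only the Python standard library, not an NDN library; the dataset name, parameter names and encoding scheme are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode

def make_subset_name(dataset_name, params):
    """Append the subsetting query to a dataset name as a single encoded component."""
    query = urlencode(sorted(params.items()))
    # safe="" also escapes '/', '&' and '=', so the query cannot split into several components.
    return dataset_name.rstrip("/") + "/" + quote(query, safe="")

def parse_subset_name(interest_name):
    """Recover the dataset prefix and the query parameters from an Interest name."""
    prefix, _, last = interest_name.rpartition("/")
    return prefix, dict(parse_qsl(unquote(last)))

# Hypothetical dataset and subsetting request.
name = make_subset_name("/climate/model-a", {"var": "temperature", "lat": "30:40"})
print(name)
print(parse_subset_name(name))
```

A service that receives this name can run the subsetting operation itself and return only the requested portion, which is the "retrieval after subsetting" case the slide recommends.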
Challenge #3: High performance end-to-end
[Figure: TCP performance by distance category (Local (LAN), Metro Area, Regional, Continental, International). Series: Measured (TCP Reno), Measured (HTCP), Theoretical (TCP Reno), Measured (no loss)]
With loss, high performance beyond metro distances is essentially impossible
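The effect of loss over distance can be illustrated with the Mathis et al. approximation for loss-limited TCP Reno throughput, rate ≈ (MSS / RTT) · sqrt(3/2) / sqrt(p). The slide does not state which model its theoretical curve uses, so this is an assumption, and the round-trip times and loss rate below are hypothetical values chosen only to show the trend.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate):
    """Approximate upper bound on TCP Reno throughput (bits/s) under random packet loss."""
    return (mss_bytes * 8 / rtt_seconds) * math.sqrt(1.5) / math.sqrt(loss_rate)

# Hypothetical round-trip times in milliseconds for each distance category.
ILLUSTRATIVE_RTT_MS = {
    "Local (LAN)": 1,
    "Metro Area": 5,
    "Regional": 20,
    "Continental": 80,
    "International": 150,
}

MSS_BYTES = 1460   # typical Ethernet-sized segment (assumption)
LOSS_RATE = 1e-4   # one packet lost in 10,000 (assumption)

for label, rtt_ms in ILLUSTRATIVE_RTT_MS.items():
    rate = mathis_throughput_bps(MSS_BYTES, rtt_ms / 1000, LOSS_RATE)
    print(f"{label:<14} {rate / 1e6:10.1f} Mbit/s")
```

Because the bound is inversely proportional to RTT, the same loss rate that barely matters on a LAN caps a long-distance flow at a small fraction of the link capacity.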
NDN with OSCARS
• Some data transfers require high-bandwidth reserved paths
• We have integrated an NDN strategy with OSCARS
  – A data retrieval manager expresses a special Interest to the strategy layer
  – The strategy communicates with OSCARS to reserve a path
  – Interest/Data exchange uses the newly created path
• Fully transparent to the application
[Diagram: Retrieval Decision Manager, NDN Strategy, Data Requester, Data Source 1, Data Source 2, reservation-capable face(s)]
• Express retrieval requirements to Decision Manager
• Consult Strategy: What are the options?
• Explore and compile options
• Return options
• Inform strategy of decision
• Tell client when to start
NDN for Intelligent Data Transfers
• Strategy for large scientific data transfers
• Retrieval Manager queries network for options
• Makes a decision, informs strategy
• Tells client to start retrieval
[Diagram: Retrieval Decision Manager, NDN Strategy, Data Requester, Data Source 1, Data Source 2, reservation-capable face(s), Reservation Manager (OSCARS)]
1. Consult Strategy: What are the options?
2. Response: Reservation
3. Set up reservation
4. Inform strategy on successful reservation
5. Strategy sets path priority for that Interest
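The five numbered steps can be read as a small control loop between the decision manager, the strategy and the reservation service. The sketch below is a toy model of that sequence, not the integration described in the talk: every class, method and value is hypothetical, and no real OSCARS or NDN forwarder API is used.

```python
class ReservationManager:
    """Stand-in for a reservation service such as OSCARS (hypothetical interface)."""
    def __init__(self, capacity_gbps):
        self.capacity_gbps = capacity_gbps

    def reserve(self, gbps):
        """Return a path identifier, or None if the request exceeds the remaining capacity."""
        if gbps > self.capacity_gbps:
            return None
        self.capacity_gbps -= gbps
        return f"circuit-{gbps}g"

class Strategy:
    """Stand-in for the NDN forwarding strategy (hypothetical interface)."""
    def __init__(self, reservation_capable):
        self.reservation_capable = reservation_capable
        self.priority = {}  # Interest name -> preferred path

    def options(self, name):
        # Steps 1-2: the strategy reports which retrieval options exist for this name.
        return ["reservation", "best-effort"] if self.reservation_capable else ["best-effort"]

    def set_priority(self, name, path):
        # Step 5: Interests for this name are steered onto the reserved path.
        self.priority[name] = path

class RetrievalDecisionManager:
    def __init__(self, strategy, reservations):
        self.strategy = strategy
        self.reservations = reservations

    def plan(self, name, gbps):
        """Walk through steps 1-5 and return the path the client should use."""
        if "reservation" in self.strategy.options(name):   # steps 1-2
            path = self.reservations.reserve(gbps)          # step 3
            if path is not None:
                self.strategy.set_priority(name, path)      # steps 4-5
                return path
        return "best-effort"

strategy = Strategy(reservation_capable=True)
manager = RetrievalDecisionManager(strategy, ReservationManager(capacity_gbps=100))
first = manager.plan("/hep/run2/events", 40)
second = manager.plan("/climate/model-a", 80)
print(first, second, strategy.priority)
```

In this toy run the second request exceeds the remaining capacity and falls back to best-effort, one of the failure cases an operator would need to think through.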
Roadmap for NDN Experimentation
Summer Internship, 2012
• Data subsetting for climate
• Susmit S., Christos P., John Wu, Alex Sim
CC-NIE collaboration with CSU, 2013
• Deploy NDN nodes
• Configure NDN overlay
NDN for HEP, 2015
• Present paper at CHEP
• Funding for Fermi/Caltech
Expand Climate Testbed, 20xx?
• ICNWG, ESGF focus
• High-speed data transfer, multiple replicas
HEP Testbed combine, 20xx?
• From lab to WAN
• xRootD integration
High-Throughput NDN, 20xx?
• Infrastructure-aware NDN
• Forwarding strategies
Many unanswered questions still…
• Where is the complexity being pushed to, and what needs to be done to manage that?
  – From the scientist to the network
• How can a network operator maintain, automate and operationally manage that complexity?
• Think through the failure models
• Think through performance models
• How does this work with, or compete with, the software scientists have already built to manage their data? What's the best way to integrate and/or migrate?
The future is new data scientists!
17-year-old Brittany Wenger creates breast cancer detection tool that is 99% accurate on a minimally invasive, previously inaccurate test.
Machine Learning + Online Data + Cloud Computing
Thank you!
imonga at es dot net