Slide 1

UCSC 100 Gbps Science DMZ – 1 year 9 month Update
Brad Smith & Mary Doyle


Slide 2

Goal 1 – 100 Gbps DMZ – Complete!


Slide 3

Goal 2 – Collaborate with users to use it!

• MCD Biologist doing brain wave imaging

• SCIPP analyzing LHC ATLAS data

• HYADES cluster doing Astrophysics visualizations

• CBSE Cancer Genomics Hub


Slide 4

Exploring mesoscale brain wave imaging data
James Ackman, Assistant Professor
Department of Molecular, Cell, & Developmental Biology
University of California, Santa Cruz

[Diagram: workflow. 1. Record brain activity patterns (local computing) → Science DMZ → 2. Analyze cerebral connectivity (external computing).]

• Acquire 60 2.1 GB TIFF images/day (~120 GB/day total).
• Initially transferred at 20 Mbps = 12–15 mins/TIFF = 15 hrs/day!
• With the Science DMZ, 354 Mbps = ~1 min/TIFF = ~1 hr/day!
• Expected to grow 10x over the near term.
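
A quick sanity check of these figures (a minimal Python sketch; the 2.1 GB image size and the 20 Mbps and 354 Mbps rates come from the bullets above):

    # Transfer-time arithmetic for the imaging workflow above.
    GB = 1e9  # bytes
    image_bytes = 2.1 * GB    # one TIFF image
    images_per_day = 60

    def transfer_seconds(size_bytes, rate_bps):
        """Seconds to move size_bytes over a link running at rate_bps."""
        return size_bytes * 8 / rate_bps

    for label, rate_bps in [("before, 20 Mbps", 20e6),
                            ("Science DMZ, 354 Mbps", 354e6)]:
        per_image = transfer_seconds(image_bytes, rate_bps)
        per_day = per_image * images_per_day
        print(f"{label}: {per_image / 60:.1f} min/image, "
              f"{per_day / 3600:.1f} hrs/day")

    # before, 20 Mbps: 14.0 min/image, 14.0 hrs/day   (the 12-15 min / 15 hr figures)
    # Science DMZ, 354 Mbps: 0.8 min/image, 0.8 hrs/day   (the ~1 min / ~1 hr figures)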


Slide 5

Ryan, [email protected]

Santa Cruz Institute for Particle Physics

SCIPP Network Usage for Physics with ATLAS


Slide 6

ATLAS is a 7-story-tall, 100-megapixel “camera”:
• taking 3-D pictures of proton-proton collisions 20 million times per second,
• saving 10 PB of data per year.

[Diagram: ATLAS Detector cutaway, with a T. rex and humans for scale; labels mark the proton beams (p+), collision point, Tracker, Calorimeter, and Muon Spectrometer.]


Slide 7

Data Volume

• LHC running 2009–2012 produced ~100 PB
– Currently ~10 PB/year

• SCIPP processes and skims that data on the LHC computing grid, and brings ~10 TB of data to SCIPP each year.
– The 12 hr transfer time impacts the ability to provide input for the next experiment (see the sketch below).

• Expect ≈4 times the data volume in the next run, 2015–2018.

• Our bottleneck is downloading the skimmed data to SCIPP.

• Current download rate: ~a few TB every few weeks.
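
For scale, a minimal sketch (assuming the ~400 Mbps throughput shown on the next slide) of why a few-TB skim lands in the quoted 12-hour range:

    # Download time for a skimmed dataset at SCIPP's observed throughput.
    TB = 1e12  # bytes

    def hours(size_bytes, rate_bps):
        return size_bytes * 8 / rate_bps / 3600

    for tb in (1, 2, 3):
        print(f"{tb} TB at 400 Mbps: {hours(tb * TB, 400e6):.1f} hrs")

    # 1 TB at 400 Mbps: 5.6 hrs
    # 2 TB at 400 Mbps: 11.1 hrs   <- consistent with the ~12 hr transfers above
    # 3 TB at 400 Mbps: 16.7 hrs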


Slide 8

Throughput 1 Gbps – 400 Mbps

[Diagram: SCIPP cluster. A Dell 6248 switch (2007) bridges the public and private networks. atlas01 (headprv), atlas02 (int0prv), atlas03 (nfsprv), and atlas04 (int1prv) front eight worker nodes (wrk0prv ... wrk7prv, 128 CPUs), with XROOTD and NFS data flows to two ≈20 TB storage pools. 1 Gb links throughout, including the campus network uplink used for grid downloads and users.]


Slide 9

Throughput 10 Gbps – 400 Mbps?!

[Diagram: same SCIPP cluster topology, now with 10 Gb links into the Dell 6248 switch and a 10 Gb campus network uplink for grid downloads; the user link remains 1 Gb.]


Slide 10

Offload Dell Switch – 1.6 Gbps. With help from ESNet!

[Diagram: same topology again, with an additional 10 Gb path that offloads grid-download traffic from the Dell 6248 switch.]


Slide 11

SCIPP Summary

• Quadrupled throughput
– Reduced download time from 12 hrs to 3 hrs

• Still a long way from the 10 Gbps potential
– ~30 mins (factor of 8; see the sketch after this list)

• Probably not going to be enough for the new run
– ~4x data volume

• Possible problems
– atlas03 storage (not enough spindles)
– WAN or protocol problems
– 6-year-old Dell switch
– Investigating a GridFTP solution and a new LHC data access node from SDSC

• We are queued up to help them when they’re ready…
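
The numbers above follow from transfer time scaling inversely with throughput; a minimal sketch using the 400 Mbps / 12 hr baseline from the earlier slides:

    # Download time scales inversely with throughput.
    baseline_mbps = 400
    baseline_hours = 12.0

    for label, mbps in [("quadrupled (1.6 Gbps)", 1600),
                        ("10 Gbps potential", 10000)]:
        t = baseline_hours * baseline_mbps / mbps
        print(f"{label}: {t:.1f} hrs ({t * 60:.0f} min)")

    # quadrupled (1.6 Gbps): 3.0 hrs (180 min)   <- the 12 hr -> 3 hr reduction
    # 10 Gbps potential: 0.5 hrs (29 min)        <- the "~30 mins" target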


Slide 12

Hyades

• Hyades is an HPC cluster for Computational Astrophysics

• Funded by a $1 million grant from NSF in 2012

• Users from the departments of Astronomy & Astrophysics, Physics, Earth & Planetary Sciences, Applied Math & Statistics, Computer Science, etc.

• Many are also users of national supercomputers


Slide 13

Hyades Hardware

• 180 Compute Nodes
• 8 GPU Nodes
• 1 MIC Node
• 1 big-memory Analysis Node
• 1 3D Visualization Node
• Lustre Storage, providing 150 TB of scratch space
• 2 FreeBSD File Servers, providing 260 TB of NFS space
• 1 Petabyte Cloud Storage System, using Amazon S3 protocols


Slide 14


Slide 15

Data Transfer

• 100+ TB between Hyades and NERSC

• 20 TB between Hyades and NASA Pleiades; in the process of moving 60+ TB from Hyades to NCSA Blue Waters

• 10 TB from Europe to Hyades

• Shared 10 TB of simulation data with collaborators in Australia, using the Huawei Cloud Storage


Slide 16

Remote Visualization

• Ein is a 3D Visualization workstation, located in an Astronomy office (200+ yards from Hyades)

• Connected to Hyades via a 10G fiber link

• Fast network enables remote visualization in real time:
– Graphics processing locally on Ein
– Data storage and processing remotely, either on Hyades or on NERSC supercomputers


Slide 17

CBSE CGHub

• NIH/NCI archive of cancer genomes
• 10/2014 – 1.6 PB of genomes uploaded
• 1/2014 – 1 PB/month downloaded(!) (rate worked out below)

• Located at SDSC… managed from UCSC

• Working with CGHub to explore L2/“engineered” paths
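
For a sense of why engineered paths matter here, a minimal sketch of the average rate implied by 1 PB/month (a 30-day month is assumed):

    # Sustained throughput implied by CGHub's download volume.
    PB = 1e15                  # bytes
    month_s = 30 * 24 * 3600   # seconds in a 30-day month

    rate_bps = 1 * PB * 8 / month_s
    print(f"1 PB/month = {rate_bps / 1e9:.1f} Gbps sustained, around the clock")

    # 1 PB/month = 3.1 Gbps sustained, around the clock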


Slide 18

Innovations…

• “Research Data Warehouse”
– DTN with long-term storage

• Whitebox switches
– On-chip packet buffer – 12 MB
– 128 10 Gb/s SERDES... so 32 40-gig ports (port math below)
– SOC… price leader, uses less power
– Use at network edge
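
The port math in the SERDES bullet, as a minimal sketch (40 GbE conventionally bonds four 10 Gb/s lanes per port):

    # Whitebox switch port arithmetic: 128 SERDES lanes at 10 Gb/s each.
    lanes = 128
    lane_gbps = 10
    lanes_per_40g_port = 4   # 40 GbE bonds four 10 Gb/s lanes

    total_gbps = lanes * lane_gbps
    ports_40g = lanes // lanes_per_40g_port
    print(f"{total_gbps} Gb/s of SERDES capacity = {ports_40g} x 40 Gb/s ports")

    # 1280 Gb/s of SERDES capacity = 32 x 40 Gb/s ports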


Slide 19

Project Summary

• 100 Gbps Science DMZ completed

• Improved workflow for a number of research groups

• Remaining targets
– Extend Science DMZ to more buildings
– Further work with SCIPP… when they need it
– L2 (“engineered”) paths with CBSE (genomics)
– SDN integration

• Innovations
– “Research Data Warehouse” – DTN as long-term storage
– Whitebox switches


Slide 20

Questions?

Brad Smith

Director Research & Faculty Partnerships, ITS

University of California Santa Cruz

[email protected]