Downloading, Processing, and Archiving a...
Transcript of Downloading, Processing, and Archiving a...
Downloading, Processing, and Archiving a Petabyte-Sized1 Data Set on the Cheap
Kenneth P. Bowman - Atmospheric Sciences
1
NEXRAD: U.S. Operational Weather Radar Network
~140 sites in contiguous U.S.
Using ~100 sites east of Rocky Mountains
KYUX
KLTX
KICT
KSRX
KVBX
KVNX
KINX
KEMX
KTWX
KTBW
KTLH
KLWX
KCCX
KLSX
KSGF
KOTX
KLIX
KFSD
KSHV
KATX
KSOX
KHNX
KMUX
KNKX
KSJT
KMTX
KDAX
KJGX
KFCX
KRIW
KRGX
KUDX
KRAX
KDVN
KPUX
KRTX
KGYX
KSFX
KEAX
KPBZ
KIWA
KDIX
KPDT
KPAH
KOAXKIWX
KHTX
KLNX
KTLX
KAKQ
KAPX
KOHXKMHX
KVAX
KMOB
KMSXKMBX
KMPX
KMKX
KMAF
KAMX
KNQA
KMLB
KMAX
KMXX
KMQT
KLBB
KLVX
KVTX
KLZK
KILX
KDFX
KESX
KLCH
KARX
KMRX
KBYX
KJAX
KDGX
KJKL
KIND
KHGX
KHDX
KGSP
KGRB
KTFX
KGRR
KGJX
KUEX
KGLD
KGGW
KEOX
KPOEKGRK
KTYX
KHPX
KFSX
KMVX
KVWX
KBHX
KLRX
KEPZ
KEVX
KEYX
KDYX
KDLH
KDOX
KDDC
KDTX
KDMX
KFTG
KFWS
KCRP
KGWX
KCAE
KCLE
KILN
KLOTKCYS
KRLX
KCLX
KICX
KCBW
KFDX
KCXX
KBUF
KBRO
KOKX
KBOX
KCBX
KBIS
KBMX
KBGM
KBLX
KBBX
KEWX
KFFC
KAMA
KFDR
KABX
KENX
KABR
KLGX
VCP12 Coverage4,000 ft above ground level*
6,000 ft above ground level*
10,000 ft above ground level*
0 250 500 750125 Miles
* Bottom of beam height (assuming Standard Atmospheric Refraction).Terrain blockage indicated where 50% or more of beam blocked.
NEXRAD Coverage Below 10,000 Feet AGL
1Uncompressed
Individual Radar Volume Scans (Level 2)• Spherical grid – azimuth, elevation, range• Weather-dependent scan modes – 4.5 to 10 min per scan• Merging into 3-D rectangular grids for analysis (Level 3)• Warm-season files are larger and more frequent• ‘Legacy’ resolution (pre-2008)
• 1° x (5 to 14 elevations) x 1 km out to 460 km• ~200 kB compressed, 6 MB uncompressed (x 30)
• Super-resolution (2008 – present)• 0.5° x (5 to 14 elevations) x 0.25 km out to 460 km• ~1.8 MB compressed, 15 MB uncompressed
• Dual polarization (2010 – present)• Added polarization hardware• ~8 MB compressed, 43 MB uncompressed
2
10 years of Data from NOAA NCEI1
• 2004 – 2008 (‘legacy’ resolution)• 12 months: ~5 TB/yr comp., ~150 TB/yr uncomp.
• 2009 – 2010 (super-resolution - software)• 12 months: ~11 TB/yr comp., ~300 TB/yr uncomp.
• 2011 – present (added polarization variables)• 4 months: ~6 to 14 TB/yr comp., ~180 to 450 TB
uncomp.• Total 2004 – 2013: ~80 TB • Data must be ordered and staged from tape – slow and labor
intensive!• Max download speed ~1 MB/s x 4 – slow!• 2+ months/per year of data to download• Hourly, 3-D gridded data for 10 years is <1 TB (compressed)
31National Centers for Environmental Information (formerly NCDC)
Local Storage w/ Pod Storage System• Low cost, open-source hardware storage devices developed
by Backblaze (Mac online backup service)
• 45-drive, 4U rack mount NAS (~$5k w/o drives)
• Not hot-swappable
• One 180 TB (raw) device w/ 4 TB drives
• One 270 TB (raw) device w/ 6 TB drives (~$17k, $63/TB)
• Configured to run FreeNAS
• ZFS RAID-Z2 volumes
• 50 to 80 MB/s read speeds
• Access via NFS
• Process on local Linux cluster
4
Amazon Web Services to the Rescue!• 2015 – NOAA Big Data Contracts with multiple cloud
service providers
• AWS now has entire 1991 – present NEXRAD archive online, including a real-time data feed
• 270 TB (180 million files)
• Accessible with AWS SDKs
• Download speed ~50 MB/s
• For future will evaluate
• Processing the data with AWS cloud computing services
• Providing our data products in the cloud
5