Overview
Introduction
Missions
Evolution
Philosophy
Software
Components and Subsystems:
> RDBMS
> Ingest
> VDC/Scheduler
> Distribution
Scientific Support
Browser
Ocean Data Processing System (ODPS)
Introduction
The ODPS is an automated data system that provides ingest, processing, archive, and distribution functions for legacy, operational, and future remote-sensing satellite missions.
Legacy Missions:
> CZCS Oct 1978 – Jun 1986
> OCTS Nov 1996 – Jun 1997
Operational Missions:
> Aqua-MODIS Jul 2002 –
> MERIS Mar 2002 –
> SeaWiFS Sep 1997 –
> Terra-MODIS Feb 2000 –
Future Missions:
> Aquarius
> Glory
> NPP VIIRS
Evolution
Originally developed between 1991 and 1996 to support SeaWiFS
Support for OCTS added in 1996
Delivered to MODIS project to serve as the MODIS Emergency Backup System (MEBS) in 1997
Complete system redesign and rewrite 2003-2004
Delivered to GISS in 2008 to support Glory mission
Multiple evolutionary cycles in response to changes in hardware infrastructure and support-function requirements
> Began on early multi-processor SGI IRIX systems
> Ported to Linux in 2000
> Processing concurrency increased from 30 to over 500
> Distribution functions added in 2004
> Storage evolution
> Validation targets
Philosophy
Adaptive framework that allows any standalone program to be incorporated as a system job
Loosely coupled, modular subsystems
> Ease of maintenance
> Development and testing alongside production
> Subsystem swapping
Standardized coding practices minimize impact of operating-system upgrades
> SGI IRIX to Linux
> 32-bit to 64-bit
> Strict GSFC IT requirements necessitate more-frequent OS updates
Software lifecycle of requirements analysis, rapid-prototype development, and refinement allows new concepts to be quickly developed and adopted for operational use
> Data subscriptions and orders
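The "any standalone program can become a system job" idea rests on a standard job-shell interface. A minimal sketch of such a wrapper is below; the record format and names are illustrative assumptions, not the actual ODPS job-shell conventions.

```shell
#!/bin/bash
# Minimal sketch of a generic job-shell wrapper: any standalone program
# is run unchanged, and the wrapper reports a uniform one-line record
# (name, exit status, elapsed time) that the framework could parse.
# The record format here is an assumption for illustration.
run_job() {
  local prog="$1"; shift
  local start status
  start=$(date +%s)
  "$prog" "$@"                      # run the wrapped standalone program as-is
  status=$?
  echo "job=$(basename "$prog") status=$status elapsed=$(( $(date +%s) - start ))s"
  return "$status"
}

run_job /bin/true
```

Because the wrapper imposes nothing on the wrapped program beyond its exit status, a new scientific program needs no modification to come under Scheduler control.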
Ingest and Distribution Statistics
ODPS currently manages over 20 million files in its archive, about 1.06 petabytes
Daily ingests:
> 576 MODIS-L0 granules, 120 GB (60 GB each for Aqua and Terra)
> 2 SeaWiFS recorder dumps, 200 MB each
> 2-3 SeaWiFS HRPT (direct broadcast) passes, 50 MB each
> 5-6 MERIS-L1 granules, 1 GB each
Distribution (Oct 2010):
> 978 orders; 650,786 files; 5.2 TB
> 473 active subscriptions; 576,346 files staged
Proprietary Software
RDBMS
Sybase Adaptive Server Enterprise 15.0.3
Sybase Open Client CT Library
Sybase Transact-SQL
Processing
IDL (limited use)
Open Source Software
Framework
GCC 4.x
Perl 5
Perl DBI module with Sybase driver
OpenMotif 2.x
Bash
Image Generation
GMT
ImageMagick
NetPbm
Octave
Version Control
Subversion
Software
Subsystems
VDC/Scheduler
Data Acquisition and Ingest
Archive Device Manager
Data Distribution
RDBMS
File Management and Migration
Level-3 Scheduler
Primary element that manages all system activity
Core databases support generic system framework, data ingest, processing, file management, and distribution functions
Mission databases house mission-specific data and procedures
High level of reuse possible for similar missions; e.g., MODIS Aqua/Terra, SeaWiFS, and OCTS are all ocean-color missions with similar product suites, data flows, and processing requirements
Database and transaction-log dumps performed regularly and stored in three different locations
Clone of database-server hardware and OS maintained as a warm backup
Components and Subsystems: RDBMS
Admin Catalog Dataflow Processing
Generic Core Databases
MODIS Aqua
MODIS Terra
OCTS
SeaWiFS
Mission-Specific Databases
New Mission
CZCS
Aquarius
VIIRS
Components and Subsystems: RDBMS
RDBMS
Vendor Client Library
Vendor Library Module
Database Services Layer
C Interface Functions
Perl DBI Module
Perl Scripts
C Programs
Goal: Isolate RDBMS from system software
To use a different RDBMS vendor, swap in a new Database Services Layer
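The vendor-isolation idea can be sketched as a single dispatch point that application code calls, with only this layer knowing which vendor client to invoke. The environment-variable name and the non-Sybase vendor entry are assumptions for illustration; `isql` is the Sybase Open Client command-line tool.

```shell
# Sketch of the Database Services Layer: callers use one entry point,
# and swapping RDBMS vendors means replacing only this dispatch table.
# RDBMS_VENDOR and the postgres entry are illustrative assumptions.
RDBMS_VENDOR=${RDBMS_VENDOR:-sybase}

db_client_cmd() {
  case "$RDBMS_VENDOR" in
    sybase)   echo "isql" ;;    # Sybase Open Client command-line tool
    postgres) echo "psql" ;;    # a hypothetical alternate vendor layer
    *)        return 1    ;;
  esac
}

db_client_cmd
```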
Data types and sources are described in the database
Active, passive, and periodic notification methods
> Active method scans remote systems for new files
> Passive method handles messages for new files
> Periodic method schedules transfers of files at specified intervals
File transfers performed by ingest daemons and scheduler tasks
FTP, RCP, SCP, SFTP, and HTTP transfer protocols supported
Generic file transfer process hands off to data-specific post-transfer scripts
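The active method above can be sketched as a scan loop that reports files not yet seen and records them so later scans skip them. A local directory stands in for a remote system, and the registry file stands in for the ingest tables in the RDBMS; all names here are illustrative assumptions.

```shell
# Sketch of the "active" ingest method: scan for files not yet ingested,
# announce each new one, and remember it. In the real system the scan
# targets remote hosts and state lives in the database, not a flat file.
scan_for_new_files() {
  local incoming="$1" registry="$2" f name
  touch "$registry"
  for f in "$incoming"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    if ! grep -qxF "$name" "$registry"; then
      echo "new file: $name"       # a real scan would queue the transfer here
      echo "$name" >> "$registry"
    fi
  done
}

dir=$(mktemp -d)
touch "$dir/granule.L0"
scan_for_new_files "$dir" "$dir/.seen"   # first pass reports the file
scan_for_new_files "$dir" "$dir/.seen"   # second pass finds nothing new
```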
Subsystems: Ingest
Ingest: Flowchart
Visual Database Cookbook (VDC)
> Prototype developed in 1991
> Four separate programs
> Originally a distributed model
Runs in a daemon-like state on each server on which processing or supporting jobs need to run
Two main functions:
Task Scheduler – Runs high-level jobs (tasks) that support a variety of system functions
Processing Engine – Runs processing streams, typically scientific programs, sequenced into steps such as L0->L1, L1->L2, etc.
Greedy client model adopted in 2004
Unification of task scheduler and processing engine in 2007
Subsystems: VDC/Scheduler
VDC Function: Scheduler
Primary system element responsible for coordinating most of the system activity
Monitors task records in a to-do list database table and runs tasks according to defined attributes
> Manual
> Periodic
> Timed
> Triggered
Standard job-shell interface allows new programs to be quickly adapted for Scheduler control
Tasks may be bound to specific hosts or claimed by any available host in the processing group
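The four task attributes can be sketched as a small dispatch in the polling loop that decides whether a task is due. The attribute names come from the slide; the decision logic, the trigger-file convention, and the argument layout are illustrative assumptions.

```shell
# Sketch of to-do-list dispatch: each task record carries a run
# attribute, and the Scheduler's polling loop decides whether it is due.
task_is_due() {
  local attr="$1" now="$2" arg="$3"
  case "$attr" in
    periodic)  return 0 ;;                # due on every polling cycle
    timed)     [ "$now" -ge "$arg" ] ;;   # due once its start time passes
    triggered) [ -e "$arg" ] ;;           # due when its trigger file exists
    manual)    return 1 ;;                # waits for an operator request
    *)         return 1 ;;
  esac
}

task_is_due periodic 1100 ""   && echo "run periodic task"
task_is_due timed    1100 1030 && echo "run timed task"
task_is_due manual   1100 ""   || echo "manual task waits for operator"
```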
VDC Function: Scheduler
VDC/Scheduler
Daily Task Scheduler
Task Shell
Tasks for the current day
Daily Tasks
To-do List
User input via SCHEDMON GUI
VDC Function: Processing Engine
Scalable infrastructure for concurrent processing of serial streams (e.g. L0 -> L1A -> L1B -> L2)
Each instance of the VDC Engine actively competes for jobs that it is allowed to run based on priority, length of time in the queue, and processing weight
Uses recipes to encapsulate data-specific processing schemes, parameters, and pre-processing rules
Virtual Processing Units (VPUs) serve as distinct processing resources and are allocated based on available time, current OS load, and processing weight
Comprehensive processing priorities allow high-priority real-time data to be handled ahead of lower-priority processing
Standard job-shell interface allows new scientific programs to be quickly adapted as recipe steps
Captures system boot time and monitors OS load
Invokes recipe steps and monitors step-execution time
Handles operator-requested stream actions
Performs flushing operations on completed tasks and streams
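The greedy-client competition for jobs can be sketched as a ranking: each Engine instance orders the jobs it is allowed to run by priority (higher first), breaking ties by time already spent in the queue (longer first), and claims the winner. The `priority:age:name` record format is an illustrative assumption, not the real job-file layout.

```shell
# Sketch of a greedy-client job claim: pick the highest-priority job,
# breaking priority ties in favor of the longest-waiting job.
claim_best_job() {
  sort -t: -k1,1nr -k2,2nr | head -n 1 | cut -d: -f3
}

printf '%s\n' \
  "5:120:L1A-to-L1B" \
  "9:30:realtime-L0-to-L1A" \
  "5:600:L2-reprocess" | claim_best_job
```

Ranking real-time data highest is how the comprehensive processing priorities let it jump ahead of routine reprocessing.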
VDC Function: Processing Engine
Runs in a daemon-like state
Polls jobs in the processing queue and runs the pre-processing rule procedures
Promotes job status when all rule procedures complete successfully
Governed by currently configured processing priorities
Primarily used for matching proper ancillary data with granules in the processing queue
VDC: Rule Manager
Polls processing queue for jobs that have met pre-processing requirements
Generates VDC job files from recipe templates according to configured priorities and populates the VDC queue
Runs as a Scheduler task, so it can easily be configured to run as often as needed to keep the VDC queue full
VDC: MakeVDC
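MakeVDC's template step can be sketched as filling a recipe template's placeholders with a job's particulars to produce a VDC job file. The `@TOKEN@` placeholder convention and the granule name are illustrative assumptions, not the actual ODPS recipe format.

```shell
# Sketch of generating a VDC job file from a recipe template:
# placeholders in the template are replaced with this job's values.
make_job_file() {
  local template="$1" granule="$2" step="$3"
  sed -e "s/@GRANULE@/$granule/g" -e "s/@STEP@/$step/g" "$template"
}

template=$(mktemp)
printf 'run step @STEP@ on granule @GRANULE@\n' > "$template"
make_job_file "$template" A2010275.L1A L1A-to-L1B
```

Keeping the scheme in data (recipes) rather than code is what lets new data-specific processing be adopted without touching the engine.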
VDC: Flowchart
VDCMON GUI
Interactive, web-based Data Ordering System, currently supporting Aqua and Terra MODIS, CZCS, OCTS, SeaWiFS
Data Subscription System, currently supporting Aqua and Terra MODIS and SeaWiFS, allows users to define region and products of interest
Order and Subscription Manager daemons poll the order and subscription queues and stage files on FTP servers (stage rate ~12 GB/hr)
Near-real-time data extraction and image support
Web-CGI applications that allow users to view and update their orders and subscriptions
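The Order Manager pattern above can be sketched as a loop that reads queued orders and stages each ordered file into a per-order pickup area. The queue file and directory layout are illustrative assumptions; the real daemons read orders from the RDBMS and stage onto FTP servers.

```shell
# Sketch of an order-staging loop: each queue record names an order and
# a file; the file is copied into that order's pickup directory.
stage_orders() {
  local queue="$1" pickup="$2" order file
  while read -r order file; do
    mkdir -p "$pickup/$order"
    cp "$file" "$pickup/$order/"     # stage a copy for the user to fetch
    echo "staged $(basename "$file") for order $order"
  done < "$queue"
}

work=$(mktemp -d)
echo "data" > "$work/granule.L2"
printf 'ORD1 %s\n' "$work/granule.L2" > "$work/queue"
stage_orders "$work/queue" "$work/ftp"
```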
Subsystems: Distribution
Order Manager
Subscription Manager
Extraction and Mapping Recipe
Local Distribution Servers
Users
Data and images optionally pushed to users
Regional Extraction and Map Requests
Data Subscriptions
Data Orders
Distribution: Flowchart
Scientific Support
24/7 operational support for forward-stream processing
> 9-to-5 staffing
> Extended lights-out periods
> No unscheduled downtime in the past year due to system-software faults
Support algorithm/calibration testing alongside production
> Product suites
> Test recipes
> Alternate tags in science-software repository
> Processing priorities
Non-standard processing requests
> Regional L3 processing
> Great Barrier Reef research
> Mozambique Whale Shark research
> GMT Intermediate Coastline
> Aquarius Simulation
OceanColor Web
oceancolor.gsfc.nasa.gov
Consolidated data access, information, services, and community feedback