

The Key Players
- Maria Nieto-Santisteban (JHU)
- Ani Thakar (JHU)
- Alex Szalay (JHU)
- Jim Gray (Microsoft)
- Catherine van Ingen (Microsoft)

What is Pan-STARRS?
- Pan-STARRS - a new telescope facility
  - 4 smallish (1.8 m) telescopes, but with extremely wide field of view
  - Can scan the sky rapidly and repeatedly, and can detect very faint objects
  - Unique time-resolution capability
- Project was started by IfA with help from the Air Force, Maui High Performance Computer Center, MIT’s Lincoln Lab and Science Applications International Corp. SAIC has dropped out & the JHU database team has joined.

The PS-4 Telescope Array Concept

The PS1 Prototype – Walk before you run!
- Pan-STARRS pushes 4 areas of technology: wide-field imaging telescope, large format CCD mosaic camera, high throughput image processing pipeline, & data-intensive database server.
- We were advised to build a functional prototype, PS1, to test and integrate these new approaches.
- The prototype, PS1, is now nearing operational readiness on Haleakala, Maui.

The PS1 Science Consortium
- University of Hawaii, Institute for Astronomy
- Max Planck Society, Institutes in Garching & Heidelberg
- Harvard-Smithsonian Center for Astrophysics
- Las Cumbres Observatory Global Telescope Network
- Johns Hopkins University, Department of Physics and Astronomy
- University of Edinburgh, Institute of Astronomy
- Durham University, Extragalactic Astronomy & Cosmology Research Group
- Queen’s University Belfast, Astrophysics Research Center
- National Central University, Taiwan

PS1 Key Science Projects
- Population of objects in the inner solar system
- Population of objects in the outer solar system (beyond Jupiter)
- Low mass stars, brown dwarfs, & young stellar objects
- Search for exo-planets by stellar transits
- Structure of the Milky Way and Local Group
- Dedicated deep survey of M31
- Massive stars and SN progenitors
- Cosmology investigations with variables and explosive transients
- Galaxy properties
- Active galactic nuclei and high redshift quasars
- Cosmological lensing
- Large scale structure

PS1 Observatory on Haleakala
- Telescope and Camera operational by interactive or queue control

1.4 Gigapixel Camera Assembly with L3 Corrector Lens as Dewar Window

Gibbous Moon, 1 millisecond exposure

M31 Poster at the January 2008 AAS Meeting

M51

Astronomy Is Happening Now!
- The project has not yet reached the Operational Readiness Review (November 2008), but data taken with PS1 and processed through the system have been used to:
  - Discover brown dwarf candidates
  - Discover new asteroids
  - Monitor one of the medium deep target fields for supernovae

What is the PSPS?
The Published Science Products Subsystem of Pan-STARRS will:
- Provide access to the data products generated by the Pan-STARRS telescopes and data reduction pipelines
- Provide a data archive for the Pan-STARRS data products
- Provide adequate security to protect the integrity of the Pan-STARRS data products & protect the operational systems from malicious attacks.

PSPS Design Driving Requirements
- Hold over 1.5x10^11 detections and their supporting metadata for ~ 5.5x10^9 objects.
- Support ~ 100 TBytes of disk storage on hardware that is > 99% reliable
- Serve as an archive for the Pan-STARRS data products
- Provide security for the data stored within the system, both against accidental and intentional actions.
- Provide users access to the data stored in the system, and the ability to search it.
- Hold sufficient metadata to allow users to determine the observational legacy and processing history of the Pan-STARRS data products.
- The PSPS baseline configuration should accommodate future additions of databases (i.e., be expandable).
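To put the first two requirements in proportion, here is a small back-of-envelope calculation in Python. It uses only the figures quoted on this slide. Treating "100 TBytes" as decimal terabytes and spending the whole disk on detection rows are simplifying assumptions, so the per-detection figure is a loose upper bound that ignores object tables, indexes, metadata, and any redundancy.

```python
# Figures quoted in the requirements above.
DETECTIONS = 1.5e11   # "over 1.5x10^11 detections"
OBJECTS = 5.5e9       # "~ 5.5x10^9 objects"
DISK_BYTES = 100e12   # "~ 100 TBytes"; decimal terabytes is an assumption


def detections_per_object(detections: float, objects: float) -> float:
    """Average number of detections stored per catalog object."""
    return detections / objects


def byte_budget_per_detection(disk_bytes: float, detections: float) -> float:
    """Upper bound on bytes per detection if the entire disk held only detections."""
    return disk_bytes / detections


if __name__ == "__main__":
    print(round(detections_per_object(DETECTIONS, OBJECTS), 1))      # 27.3
    print(round(byte_budget_per_detection(DISK_BYTES, DETECTIONS)))  # 667
```

Roughly 27 detections per object and a budget of well under 1 kB per detection give a feel for why the detection tables dominate the storage design.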

What is PSPS? From the PS1 System View
- Responsible for managing the catalogs of digital data
- PS1 PSPS will not receive image files, which are retained by IPP
- Three significant PS1 I/O threads:
  - Ingest of detections and initial celestial object data from IPP
  - Ingest of moving object data from MOPS
  - User queries of detection/object data records

[Figure: PS1 system data flow. Labeled components: Telescope, Gigapixel Camera, Image Processing Pipeline (IPP), Moving Object Processing System (MOPS), and the Published Science Products Subsystem (PSPS) with its Solar System Data Manager (SSDM), Object Data Manager (ODM), Data Retrieval Layer (DRL), and Web-Based Interface (WBI), plus End Users. Labeled flows: photons, images, detection records, records.]

PSPS Components Overview/Terminology
- DRL: Data Retrieval Layer
  - Software clients, not humans, are PDCs
  - Connects to DMs
- PDC: Published Data Client
  - WBI: Web Based Interface
  - External PDCs (non-PSPS)
- DM: Data Manager (generic)
  - ODM: Object Data Manager
  - SSDM: Solar System Data Manager
[Figure: PSPS component diagram. Labeled elements: WBI and other Published Data Clients; DRL with its Standard User API and Administrator API; Data Manager API; ODM and SSDM; PSPS-IPP Interface and PSPS-MOPS Interface to IPP and MOPS; Preferred Science Client (Data Provider); future Pan-STARRS subsystem; non-Pan-STARRS system. Labeled flows: metadata, detections, raw science data, science data, IDs. Legend: interface contract, interface dependency, Pan-STARRS subsystem, PSPS component, future component, data manager.]

Prototype ODM Structure
[Figure: prototype ODM structure. Labeled layers: Web Based Interface (WBI); Query Manager (QM); Data Storage (DS), with a main PS1 database (PartitionsMap, Objects, LnkToObj, Meta, Detections) connected by linked servers to slice databases P1 ... Pm (each holding [Objects_p], [LnkToObj_p], [Detections_p], and Meta); Data Loading Pipeline (DLP), with LoadAdmin (PartitionMap) connected by linked servers to LoadSupport1 ... LoadSupportn (each holding objZoneIndx, orphans, Detections_l, and LnkToObj_l); Data Transformation Layer (DX). Legend: database, full table, [partitioned table], output table, partitioned view.]

ODM Components
- Query Manager (QM)
- Workflow Manager (WFM)
- Cluster Manager (CLM)
- PS1 ODM Database
- Performance Monitor


PS1 Schema Relationships

Detailed Design
- Reuse SDSS software as much as possible
- Data Transformation Layer (DX) – Interface to IPP
- Data Loading Pipeline (DLP)
- Data Storage (DS)
  - Schema and Test Queries
  - Database Management System
  - Scalable Data Architecture
  - Hardware
- Query Manager (QM: CasJobs for prototype)

Data Storage – DBMS
- Microsoft SQL Server 2005
  - Relational DBMS with excellent query optimizer
- Plus
  - Spherical/HTM (C# library + SQL glue)
    - Spatial index (Hierarchical Triangular Mesh)
  - Zones (SQL library)
    - Alternate spatial decomposition with dec zones
  - Many stored procedures and functions
    - From coordinate conversions to neighbor search functions
  - Self-extracting documentation (metadata) and diagnostics
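The "dec zones" idea can be illustrated with a short Python sketch. This is not the project's SQL library: the function names and the 0.5 degree zone height are assumptions chosen for illustration. The sky is cut into horizontal declination strips of equal height, each object is assigned an integer zone number, and a neighbor search only needs to look at the few zones that overlap the search radius.

```python
import math


def zone_id(dec_deg: float, zone_height_deg: float) -> int:
    """Map a declination in [-90, +90] degrees to an integer zone number.

    Zone 0 starts at the south celestial pole; numbers grow northward.
    """
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def candidate_zones(dec_deg: float, radius_deg: float, zone_height_deg: float) -> range:
    """Zones that can contain neighbors within radius_deg of the given declination."""
    lo = zone_id(max(dec_deg - radius_deg, -90.0), zone_height_deg)
    hi = zone_id(min(dec_deg + radius_deg, 90.0), zone_height_deg)
    return range(lo, hi + 1)


if __name__ == "__main__":
    height = 0.5  # hypothetical zone height in degrees
    print(zone_id(0.0, height))
    print(list(candidate_zones(0.1, 0.3, height)))
```

Because the zone number is a plain integer column, it can be indexed and joined on with ordinary relational operators, which is what makes this decomposition attractive inside a DBMS.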

Data Storage – Scalable Architecture
- Monolithic database design (a la SDSS) will not do it
- SQL Server does not have a cluster implementation
  - Do it by hand
- Partitions vs Slices
  - Partitions are file-groups on the same server
    - Parallelize disk accesses on the same machine
  - Slices are data partitions on separate servers
  - We use both!
- Additional slices can be added for scale-out
- For PS1, use SQL Server Distributed Partitioned Views (DPVs)
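To make the DPV idea concrete: a distributed partitioned view is a view defined as a UNION ALL over member tables that live on different linked servers, each addressed by a four-part name. The Python sketch below only builds the text of such a view definition. The server, database, and view names are hypothetical, and a real deployment also needs CHECK constraints on the partitioning column of each member table so the optimizer can skip slices that cannot match a query.

```python
from typing import Sequence, Tuple

# (linked server, database, table); all names used here are illustrative.
Member = Tuple[str, str, str]


def build_dpv_sql(view_name: str, members: Sequence[Member]) -> str:
    """Return a CREATE VIEW statement that unions the member tables."""
    if not members:
        raise ValueError("a partitioned view needs at least one member table")
    selects = [
        f"SELECT * FROM [{server}].[{database}].dbo.[{table}]"
        for server, database, table in members
    ]
    return f"CREATE VIEW {view_name} AS\n" + "\nUNION ALL\n".join(selects)


if __name__ == "__main__":
    members = [
        ("SliceServer1", "P1", "Detections_p1"),  # hypothetical slice 1
        ("SliceServer2", "P2", "Detections_p2"),  # hypothetical slice 2
    ]
    print(build_dpv_sql("Detections", members))
```

Adding a slice for scale-out then amounts to adding one more member table and regenerating the view.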

Distributed Architecture
- The bigger tables will be spatially partitioned across servers called Slices
- Using slices improves system scalability
- Tables are sliced into ranges of ObjectID, which correspond to broad declination ranges
- ObjectID boundaries are selected so that each slice has a similar number of objects
- Distributed Partitioned Views “glue” the data together
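One way to picture the boundary selection is the following Python sketch. It is an illustration under stated assumptions, not the project's procedure: it assumes a sorted sample of ObjectIDs is available, picks equal-count cut points from it, and routes an ObjectID to a slice with a binary search. The function names and sample IDs are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def choose_boundaries(sorted_object_ids: Sequence[int], n_slices: int) -> List[int]:
    """Pick n_slices - 1 ObjectID cut points so each slice gets a similar count."""
    if n_slices < 1:
        raise ValueError("n_slices must be at least 1")
    n = len(sorted_object_ids)
    return [sorted_object_ids[(i * n) // n_slices] for i in range(1, n_slices)]


def slice_for(object_id: int, boundaries: Sequence[int]) -> int:
    """Return the 0-based slice index whose ObjectID range contains object_id."""
    return bisect_right(boundaries, object_id)


if __name__ == "__main__":
    ids = list(range(100, 112))         # 12 hypothetical ObjectIDs, already sorted
    cuts = choose_boundaries(ids, 3)    # two cut points for three slices
    counts = [0, 0, 0]
    for object_id in ids:
        counts[slice_for(object_id, cuts)] += 1
    print(cuts, counts)
```

Because the cut points come from the data rather than from fixed declination steps, dense regions of the sky get narrower ObjectID ranges and every slice ends up with a similar load.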

Adding New Types of Data in the ODM
Because of the interaction between our logical and physical schema, we do not consider it prudent to arbitrarily add new types of data to the ODM.
One area where expansion does fit naturally into our design is the addition of new filters. These can accommodate new detections (perhaps not even coming from Pan-STARRS) that cover all or part (e.g., Medium Deep Survey fields) of the sky. This would allow including into the data tables observations from other sources (e.g., GALEX Extended Mission, Spitzer Warm Mission, UKIRT, CFHT) that range from the far ultraviolet to the far infrared, provided the data are formatted consistently with the ODM logical schema.

Client Databases
- Client databases can be either
  - Standalone databases attached to the DRL (as shown in the earlier slide)
  - MyDB instances attached to the ODM internal network. These are SQL Server databases with
    - Ownership by individuals, groups, or key projects/science clients
    - Unidirectional (ODM to MyDB) write privilege
    - Bidirectional read privilege
    - Table access which can be defined at the user, group, or world level, allowing selected export of results
    - The ability to load data into the MyDB from outside the ODM
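The user/group/world table access described for MyDB can be pictured with a toy Python model. It is only a sketch of the idea: the class, field, and user names are hypothetical, and the real mechanism would rest on SQL Server permissions rather than application code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Visibility(Enum):
    USER = "user"    # only the owner may read the table
    GROUP = "group"  # the owner and listed group members may read
    WORLD = "world"  # anyone may read


@dataclass(frozen=True)
class MyDBTable:
    name: str
    owner: str
    visibility: Visibility
    group_members: FrozenSet[str] = frozenset()


def can_read(table: MyDBTable, user: str) -> bool:
    """Decide whether user may read table under the three-level scheme."""
    if user == table.owner or table.visibility is Visibility.WORLD:
        return True
    if table.visibility is Visibility.GROUP:
        return user in table.group_members
    return False


if __name__ == "__main__":
    results = MyDBTable(
        name="brown_dwarf_candidates",   # hypothetical result table
        owner="alice",
        visibility=Visibility.GROUP,
        group_members=frozenset({"bob"}),
    )
    print(can_read(results, "bob"), can_read(results, "carol"))
```

Moving a table from user to group to world visibility is what the slide calls selected export of results.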

Some Lessons Learned
- “GrayWulf: Scalable Cluster Architecture for Data Intensive Computing” submitted to HICSS-09 conference.
- Big databases are not created equal -- user query patterns will dictate the data storage model/architecture.
- “When” matters -- PS1 has to do things with today’s technology & can’t count on Moore’s law. This also will affect how much data you’ll have to deal with.

Some Lessons Learned
- Resources are accessed by
  - End users who perform analyses on shared database
  - Data valets who maintain shared databases
  - Operators who maintain compute & storage
- The Approach
  - “20 queries” capture science interests
But which set of 20 queries? Not all users will want to access the tables in the same way. However, there are clear patterns of queries that are common to all users, and we have designed the system to support them.

Some Lessons Learned
- Resources are accessed by
  - End users who perform analyses on shared database
  - Data valets who maintain shared databases
  - Operators who maintain compute & storage
- The Approach
  - “20 queries” capture science interests
  - Divide & Conquer determines partitioning
This is an area where our team has spent a great deal of effort. There are many possibilities available and it’s unclear which is the best. We’ve decided on a model with objects held in the main database and detections and copies of some smaller tables in the slices. OK, then how do you choose to partition? What RAID model?

Some Lessons Learned
- Resources are accessed by
  - End users who perform analyses on shared database
  - Data valets who maintain shared databases
  - Operators who maintain compute & storage
- The Approach
  - “20 queries” capture science interests
  - Divide & Conquer determines partitioning
  - Faults Happen – handling must be designed into all data valet processes
This is a second area that has involved a great deal of design effort. In SDSS much of the workflow monitoring and error handling occurred in the loading phase, but the PS1 ODM will be loading all the time. We expect the most potential problems in the load/merge process! We’re taking a Sunny, Sticky, and Cloudy day approach to the testing and error handling implementation. Ultimately real data will define the Rainy day case; hopefully it won’t be a Cat 5 hurricane!

And Finally