R3 Kickoff Meeting Ocean Observatories Initiative Common Execution Infrastructure (CEI) Subsystem...
R3 Kickoff Meeting
Ocean Observatories Initiative
Common Execution Infrastructure (CEI) Subsystem
OOI CI System Architecture Team:
CEI Developers
04/18/23
• CEI Developer: John Bresnahan, Argonne National Lab (part-time)
• CEI Developer: Patrick Armstrong, University of Chicago
• CEI Developer: Pierre Riteau, University of Chicago (part-time)
• CEI Senior Developer: Pierre Riteau, University of Chicago
Subsystem Purpose
• Allow OOI applications and system to
  – Provide Highly Available (HA) services
  – Scale to demand
• Enact OOI deployment policies in elastic environment
• Provide a deployment foundation for OOI CI
Core System Structure: Service Layers
CEI Scope
• Elastic Computing Services
  – Implement elastic computing services to provide on-demand scaling and high availability.
• Execution Engine Catalog & Repository Services
  – Work with operations and ITV to develop and refine tools to upload and sync the different deployable type representations adapted to each site.
• Process Management Services
  – Provide the management services for policy-based process execution within specified deployable types intended to support the data distribution services; as such, the processes are sequential and primarily require a process-to-resource match.
• Process Catalog & Repository Services
  – Maintain process definitions as well as lists of active processes.
• Integration with the National Computing Infrastructure
  – Provide the capability to deploy OOI processing on the Amazon cloud services as well as academic clouds.
High Availability and Scaling
• High Availability
  – Towards an always-on service model
  – Failures in outsourced resources
  – Providing a pool of replenishable compute resources
• Autoscaling
  – Provide resources for peaks in demand
  – Ensure good utilization during “valleys” in demand
  – Flexible resource mix
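The autoscaling goals above (capacity for peaks, good utilization in valleys) can be illustrated with a small sizing rule. This is a minimal sketch, not the CEI policy: the demand metric (`queue_length`), `per_worker_capacity`, and the `min_workers`/`max_workers` bounds are all hypothetical names introduced for illustration.

```python
import math


def target_workers(queue_length, per_worker_capacity, min_workers, max_workers):
    """Return how many workers to run for the current demand.

    Hypothetical inputs: queue_length stands in for any demand metric,
    per_worker_capacity for how much of that demand one worker absorbs.
    """
    if per_worker_capacity <= 0:
        raise ValueError("per_worker_capacity must be positive")
    # Workers needed to absorb the current demand, rounded up.
    needed = math.ceil(queue_length / per_worker_capacity)
    # Clamp: keep a floor for availability, cap the pool at a ceiling.
    return max(min_workers, min(max_workers, needed))
```

With a floor of 2 and a ceiling of 20, an idle queue keeps the pool at the floor, while a large backlog is capped at the ceiling rather than growing without bound.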
Resources for HA and Scaling
• EPU Management: monitor and regulate set properties based on system-specific and application-specific metrics
  – Cloud resources are available on-demand, but any particular resource may fail at any time
  – Applications/processes can absorb new resources
  – Applications/processes can tolerate failures
[Figure: EPU]
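One way to read the assumptions above: because any resource may fail at any time and the application can absorb new resources, a control loop can simply replace whatever has failed. The sketch below shows a single pass of such a loop. It is an illustration under stated assumptions, not the EPU implementation; `reconcile`, `is_healthy`, and `launch` are hypothetical names.

```python
def reconcile(pool, target_size, is_healthy, launch):
    """One pass of a hypothetical control loop over a worker pool.

    pool: list of worker ids
    is_healthy: callable(worker_id) -> bool (assumed health check)
    launch: callable() -> new worker id (assumed to request a resource)
    Returns the new pool; the input list is not modified.
    """
    # Drop workers that failed the health check.
    survivors = [w for w in pool if is_healthy(w)]
    # Replenish the pool until it is back at the target size.
    while len(survivors) < target_size:
        survivors.append(launch())
    return survivors
```

For example, with a pool of `["a", "b", "c"]`, a target of 3, and a health check that fails only `"b"`, the pass keeps `"a"` and `"c"` and launches one replacement. Shrinking an oversized pool is deliberately left out of this sketch.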
Managing Resources
Elastic Processing Unit (EPU) Management
[Figure: EPU Management diagram. Labels shown: three EPU Management blocks; execution engines EE ioncore 1.3, EE ioncore 1.2, and EE matlab 6.1; a Decision Engine; context-agent and ou-agent (repeated); Provisioner; DTRS; CB; IaaS ("create instance"). Legend: AMQP, Other.]
Making the EPU HA
[Figure: Making the EPU HA. Labels shown: EPU Workers (repeated), each group with an ou-agent; Bootstrap EPU; Dedicated DE; Provisioner/DTRS; IaaS ("create instance"); cloudinit.d. Legend: AMQP, Other.]
Managing Processes
Creating a Process I
[Figure: Creating a Process I. Labels shown: Process Definition Registry; Process Dispatcher with Decision Engine; EE type A instance with ee-agent; Process Instance Registry. Arrow labels: "request to activate process X", "lookup", "launch", "enter". Legend: AMQP, Other.]
Creating a Process II
[Figure: Creating a Process II. Labels shown: Process Definition Registry; Process Dispatcher with Decision Engine; EPU Management; Provisioner/DTRS; IaaS; EE type A instance with ee-agent; Process Instance Registry. Arrow labels: "request to activate process X", "lookup", "launch", "enter", "request instance", "create instance". Legend: AMQP, Other.]
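The arrow labels in the two process-creation diagrams (lookup, launch, enter, request instance) suggest a flow that can be sketched in a few lines. This is one possible reading of those labels, not the actual Process Dispatcher: the class layout, `slots_per_instance`, the `ee_type` field, and the `request_instance` callback (standing in for EPU Management) are all assumptions made for illustration.

```python
class ProcessDispatcher:
    """Hypothetical sketch of the lookup / launch / enter flow."""

    def __init__(self, definitions, request_instance, slots_per_instance=2):
        self.definitions = definitions            # stands in for the Process Definition Registry
        self.request_instance = request_instance  # stands in for EPU Management
        self.slots_per_instance = slots_per_instance
        self.ee_instances = {}       # instance id -> {"type": ..., "processes": [...]}
        self.instance_registry = {}  # stands in for the Process Instance Registry

    def activate(self, process_id, definition_name):
        # lookup: find the definition and the EE type it needs.
        ee_type = self.definitions[definition_name]["ee_type"]
        instance_id = self._find_instance_with_room(ee_type)
        if instance_id is None:
            # request instance: no running EE has room, so ask for a new one.
            instance_id = self.request_instance(ee_type)
            self.ee_instances[instance_id] = {"type": ee_type, "processes": []}
        # launch: in the diagrams this would go through the ee-agent.
        self.ee_instances[instance_id]["processes"].append(process_id)
        # enter: record where the process is running.
        self.instance_registry[process_id] = instance_id
        return instance_id

    def _find_instance_with_room(self, ee_type):
        for instance_id, ee in self.ee_instances.items():
            if ee["type"] == ee_type and len(ee["processes"]) < self.slots_per_instance:
                return instance_id
        return None
```

With two slots per instance, the first two activations of the same definition share one EE instance and the third triggers a request for a new one, which corresponds to the difference between the first and second diagram.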
Inside an Execution Engine
[Figure: Inside an Execution Engine. Labels shown: EE type A instance containing context-agent, ee-agent, ou-agent, and three supervisord blocks; two CC instances; Matlab script; process (adapter) 1; Process Dispatcher; EPU Management; Package Server; data flow labels "datastream", "subscription", "result". Arrows are annotated with operation codes (C, M, CMR, CMK, CMKO). Legend: AMQP, Other.]
Operation codes: C = create, M = monitor, R = restart, K = kill, O = I/O
Adventures in Availability
$$A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$
where MTBF is the mean time between failures and MTTR is the mean time to repair.
• Time to repair (TTR)
  – Diagnosis
  – Time to scale (TTS)
    • PENDING (request)
    • STARTED (deployment)
    • RUNNING (contextualization)
[Figure: TTS: preliminary results for 2,000 VMs provisioned on AWS EC2]
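The availability formula can be checked with a short calculation. The numbers in the usage note are illustrative only and are not taken from the measurements on this slide; the function name `availability` is introduced here for the example.

```python
def availability(mtbf, mttr):
    """Availability A = MTBF / (MTBF + MTTR); both arguments in the same time unit."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("mtbf and mttr must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


# Illustrative values: halving the repair time raises availability.
slow_repair = availability(99.0, 1.0)
fast_repair = availability(99.0, 0.5)
```

Since time to scale is part of time to repair, shortening TTS lowers MTTR and so raises A for a fixed MTBF, which is why `fast_repair` is larger than `slow_repair` here.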
R3 Scope
• Process management
  – Activation and validation
  – New execution site registration
• Integration with National Infrastructure
  – Framework for integration of academic cloud providers, TeraGrid and OSG
  – Integration with Microsoft cloud
R3 Activities
• Refine/change scope to achieve a complete and maintainable system
• Decide on specific solutions for R3 scope
Questions?