How the Linux and Grid Communities can Build the Next-Generation Internet Platform

Transcript of a 29-slide presentation by Ian Foster (foster@mcs.anl.gov), Argonne National Laboratory / University of Chicago.

  • Slide 1
  • How the Linux and Grid Communities can Build the Next-Generation Internet Platform. Ian Foster (foster@mcs.anl.gov), Argonne National Lab / University of Chicago, Globus Project
  • Slide 2
  • Ottawa Linux Symposium, July 24, 2003. Linux has gained tremendous traction as a server operating system. However, a variety of technology trends, the Grid being one, are converging to create a service-based future in which functions such as computing and storage are virtualized, and services and resources are increasingly integrated within and across enterprises. The servers that will power this sort of environment will require new capabilities, including high scalability, integrated resource management, and RAS (reliability, availability, serviceability). I discuss what I see as development priorities if Linux is to retain its leadership role as a server operating system.
  • Slide 3
  • The (Power) Grid: On-Demand Access to Electricity. [Chart: quality and economies of scale improving over time]
  • Slide 4
  • By Analogy, A Computing Grid: decouple production and consumption; enable on-demand access; achieve economies of scale; enhance consumer flexibility; enable new devices. On a variety of scales: department, campus, enterprise, Internet.
  • Slide 5
  • Requirements: dynamically link resources/services from collaborators, customers, eUtilities (members of an evolving virtual organization) into a virtual computing system: a dynamic, multi-faceted system spanning institutions and industries, configured to meet instantaneous needs for multi-faceted QoX (security, performance, reliability) for demanding workloads.
  • Slide 6
  • For Example: Real-Time Online Processing. Layers: Applications (delivery), Application Services (distribution), Servers (execution). Application virtualization: automatically connect applications to services; dynamic and intelligent provisioning. Infrastructure virtualization: dynamic and intelligent provisioning; automatic failover.
  • Slide 7
  • Examples of Linux-Based Grids: High Energy Physics. Production run on the Integration Testbed: simulate 1.5 million full CMS events for physics studies (~500 sec per event on an 850 MHz processor); 2 months of continuous running across 5 testbed sites; managed by a single person at the US-CMS Tier 1.
  • Slide 8
  • Examples of Linux-Based Grids: Earthquake Engineering (U. Nevada Reno), www.neesgrid.org
  • Slide 9
  • Grid Technologies & Community. Grid technologies have been developed since the mid-90s, a product of work on resource sharing for scientific collaboration, with growing commercial adoption. The open source Globus Toolkit has emerged as a de facto standard: an international community of contributors, thousands of deployments worldwide, commercial support providers. The Global Grid Forum serves as a community and standards body, home to the recent OGSA work.
  • Slide 10
  • The Emergence of Open Grid Standards (timeline, 1990-2010, with increasing functionality and standardization): custom solutions built on Internet standards (computer science research); then the Globus Toolkit, a de facto standard with a single implementation; then the Open Grid Services Architecture: real standards, multiple implementations, built on Web services, etc.; looking ahead, managed shared virtual systems.
  • Slide 11
  • Open Grid Services Infrastructure (OGSI): a service requestor (e.g. a user application) uses a service registry for service discovery; a service factory, on a Create Service request, performs resource allocation, creates a service instance, and returns a Grid Service Handle; instances register with the registry. Service invocation, service data, keep-alives, and notifications flow between the parties. Interactions are standardized using WSDL and SOAP, and authentication & authorization are applied to all requests.
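The factory/registry interaction described on this slide can be sketched in a few lines of Python. This is a toy illustration of the pattern only, not the Globus/OGSI API; every class, method, and handle format below is invented for the sketch:

```python
import uuid

class ServiceRegistry:
    """Toy registry mapping service names to factories (illustrative only)."""
    def __init__(self):
        self._factories = {}
    def register(self, name, factory):
        self._factories[name] = factory
    def discover(self, name):
        return self._factories[name]

class ServiceInstance:
    """A transient service instance identified by a Grid Service Handle (GSH)."""
    def __init__(self, name):
        self.handle = "gsh://%s/%s" % (name, uuid.uuid4())  # unique handle
        self.alive = True                                   # refreshed by keep-alives
    def keep_alive(self):
        self.alive = True

class ServiceFactory:
    """Creates service instances on request, as in the OGSI factory pattern."""
    def __init__(self, name):
        self.name = name
        self.instances = []
    def create_service(self):
        inst = ServiceInstance(self.name)
        self.instances.append(inst)
        return inst.handle

# A requestor discovers the factory via the registry,
# then asks it to create an instance and receives a handle.
registry = ServiceRegistry()
registry.register("compute", ServiceFactory("compute"))
factory = registry.discover("compute")
handle = factory.create_service()
print(handle)  # e.g. gsh://compute/3f2a...
```

In the real OGSI protocol these interactions are SOAP messages described by WSDL, with authentication and authorization applied to each request; here they are plain method calls.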
  • Slide 12
  • OGSA: Open Grid Services Architecture, built on basic Web services functionality, with OGSI as the interface to Grid infrastructure. A virtual integration architecture: users and applications in a problem domain X sit atop distributed application & integration technology for that domain, which rests on a generic virtual service access and integration layer, down to compute, data & storage resources. Services include structured data access, integration, and transformation (over relational, XML, and semi-structured data), plus registry, job submission, data transport, resource usage, banking, brokering, workflow, and authorisation.
  • Slide 13
  • But It's Not Turtles All the Way Down. Our ability to deliver virtualized services efficiently and with the desired QoX ultimately depends on the underlying platform! At multiple levels, including but not limited to: dynamic provisioning & resource management; reliability, availability, manageability; performance and parallelism. There are new demands on the OS in each area.
  • Slide 14
  • (1) Dynamic Provisioning. Static provisioning dedicates resources, as is typical of co-lo hosting, with manual reprovisioning as needed. But load is dynamic: one must overprovision for surges, and the variable cost of capacity is high. Dynamic provisioning is needed to achieve true economies of scale: load multiplexing, trading off cost vs. quality, service level agreements, dynamic resource recruitment.
  • Slide 15
  • Load Is Dynamic. ibm.com external site, February 2001: daily fluctuations (3x), workday cycle, weekends off. World Cup soccer site, May-June 1998 (ita.ee.lbl.gov): seasonal fluctuations, event surges (11x). [Charts: request rate by day of week and by week]
  • Slide 16
  • For Example: Energy-Conscious Provisioning. Under light load, concentrate traffic on a minimal set of servers and step surplus servers down to a low-power state (APM and ACPI); activate surplus servers on demand (Wake-on-LAN). Browndown: provision for a specified energy target. Even smarter: also manage air conditioning. Measured power draw: CPU idle 93 W, CPU max 120 W, boot 136 W, disk spin 6-10 W, off/hibernate 2-3 W. Idling consumes 60% to 70% of peak power demand.
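The Wake-on-LAN mechanism the slide relies on is simple enough to sketch: a "magic packet" of six 0xFF bytes followed by the target's MAC address repeated 16 times, conventionally sent as a UDP broadcast to port 9. The helper names below are my own; the packet format is the standard one:

```python
import socket

def make_magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the 6-byte
    target MAC address repeated 16 times (102 bytes total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast so the sleeping
    server's NIC can see it and power the machine back on."""
    pkt = make_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(pkt, (broadcast, port))

# wake("00:11:22:33:44:55")  # hypothetical MAC of a stepped-down server
```

A provisioning controller in the scheme above would call `wake()` for each surplus server it wants to bring back as load rises.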
  • Slide 17
  • Power Management via MUSE: IBM Trace Run (Before). [Chart: throughput (requests/s), power draw (watts), latency (ms, x50)] MUSE: Jeff Chase et al., Duke University (SOSP 2001).
  • Slide 18
  • Power Management via MUSE: IBM Trace Run (After). MUSE: Jeff Chase et al., Duke University (SOSP 2001).
  • Slide 19
  • Dynamic Provisioning: OS Issues. Hot plug memory, CPU, and I/O; core virtualization capabilities for partitioning. Security: containment & data integrity in a virtualized environment (user-mode Linux++?). Scheduler improvements for resource and workload management: allocate for required resource consumption; dynamic, sub-processor logical partitioning. Improved instrumentation & accounting, to determine actual resource consumption.
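The CPU hot-plug capability asked for here has since become a standard Linux sysfs interface: writing 0 or 1 to /sys/devices/system/cpu/cpuN/online takes a CPU offline or brings it back. A minimal sketch, assuming a Linux system with root privileges (the function names are mine; note that cpu0 is often not offlinable):

```python
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")

def set_cpu_online(cpu, online):
    """Offline or online one CPU via the Linux hotplug sysfs interface.
    Requires root; raises OSError if the kernel refuses."""
    (CPU_SYSFS / ("cpu%d" % cpu) / "online").write_text("1" if online else "0")

def parse_cpu_mask(mask):
    """Parse a sysfs CPU list such as '0-3,6' into [0, 1, 2, 3, 6]."""
    cpus = []
    for part in mask.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            cpus.extend(range(lo, hi + 1))
        else:
            cpus.append(int(part))
    return cpus

def online_cpus():
    """List CPUs currently online, per /sys/devices/system/cpu/online."""
    return parse_cpu_mask((CPU_SYSFS / "online").read_text().strip())

print(parse_cpu_mask("0-3,6"))  # [0, 1, 2, 3, 6]
```

A workload manager could combine `online_cpus()` and `set_cpu_online()` to grow or shrink a partition's CPU count as demand changes, which is exactly the kind of dynamic repartitioning the slide calls for.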
  • Slide 20
  • (2) Reliability, Availability, Manageability. Error log and diagnostics frameworks: the foundation for automated error analysis and recovery of distributed & remote systems, enabling problem determination, automated reconfiguration, and localization of failures. Configuration management: determine hardware configuration/inventory; apply/remove service/support patches; isolate failing components quickly.
  • Slide 21
  • (3) Performance and Parallelism: E.g., Data Integration. Assume remote data arrives at 1 GB/s, 10 local bytes are touched per remote byte, and 100 operations are performed per byte. Then the wide area link (an end-to-end switched lambda?) carries 1 GB/s, parallel I/O on the local network must sustain 10 GB/s, and parallel computation must sustain 1000 Gop/s. More than 1 GByte/s is achievable today (FAST, 7 streams, LA-Geneva).
  • Slide 22
  • Performance and Parallelism: distributed/cluster/parallel file systems; optimized TCP/IP stacks; scheduling of computation & communication; Web100 configuration & instrumentation.
  • Slide 23
  • Web100: Overcome the TCP/IP Wizard Gap
  • Slide 24
  • Web100 Kernel Instrument Set. Definition: a set of instruments designed to collect as much information as possible to enable a user to isolate the performance problems of a TCP connection. How it is implemented: each instrument is a variable in a "stats" structure that is linked through the kernel socket structure; the Linux /proc interface is used to expose these instruments outside the kernel.
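Web100's /proc export follows the same line-oriented pattern as the stock Linux /proc/net counters. As an illustration of reading such instruments from user space, here is a parser for the standard /proc/net/snmp format (alternating header and value lines per protocol), using a canned sample string rather than Web100's own files, which I do not reproduce here:

```python
def parse_proc_net_snmp(text):
    """Parse /proc/net/snmp-style text: for each protocol, a header line
    ('Tcp: Name1 Name2 ...') followed by a value line ('Tcp: v1 v2 ...').
    Returns {protocol: {counter_name: value}}."""
    stats = {}
    lines = text.strip().splitlines()
    for hdr, val in zip(lines[::2], lines[1::2]):
        proto = hdr.split(":")[0]
        names = hdr.split()[1:]
        values = [int(v) for v in val.split()[1:]]
        stats[proto] = dict(zip(names, values))
    return stats

# Canned sample in the /proc/net/snmp layout; on a real Linux box you
# would read open("/proc/net/snmp").read() instead.
sample = """Tcp: ActiveOpens PassiveOpens RetransSegs
Tcp: 120 45 7"""

tcp = parse_proc_net_snmp(sample)["Tcp"]
print(tcp["RetransSegs"])  # 7
```

Counters like RetransSegs are exactly the kind of evidence used to separate genuine packet loss from sender-side stalls, as in the transatlantic-transfer example on the next slide.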
  • Slide 25
  • For Example: a recent transatlantic transfer showed frequent drops in data rate, but no loss or retransmits. Web100 identified the problem as Linux "send stall" congestion events.
  • Slide 26
  • Grid/Linux Cooperation: We Have Testbeds, Users, Applications. [Map: UK e-Science grid, with Tier0/1, Tier2, and Tier3 facilities at Edinburgh, Glasgow, Newcastle, Belfast, DL, Manchester, Oxford, Cambridge, RAL, Hinxton, Cardiff, Soton, and London, connected by 10 Gbps, 2.5 Gbps, 622 Mbps, and other links]
  • Slide 27
  • The Evolution of the Server: Increased Flexibility (and Complexity). [Chart: server flexibility and complexity increasing over time] Significant implications for the underlying operating system.
  • Slide 28
  • Summary. The Grid community is creating middleware for distributed resource & service sharing: open source software for resource & service virtualization and for service management/integration, motivated by wonderful applications. But we need help from the OS. Linux: the next-generation Internet platform? It could be, but significant evolution is required to address provisioning and resource management; availability and manageability; performance and parallelism; and other issues. The Grid community can provide testbeds, users, requirements, and applications.
  • Slide 29
  • For More Information: the Globus Project, www.globus.org; the Global Grid Forum, www.ggf.org; background information, www.mcs.anl.gov/~foster; GlobusWORLD 2004, www.globusworld.org, Jan 2004, San Francisco. 2nd Edition: November 2003.