Thomas J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598
Challenges and Opportunities in Autonomic Computing
June 25, 2002, presentation to ICS'02
Alfred Z. Spector, VP, Services & Software
Copyright IBM 2002
AZS Presentation to ICS'02 June 25 02 Copyright IBM
Abstract
Significant advances are required to make systems more adaptive to the growing range of impulses affecting them and to reduce their total cost of management. Progress seems to require significant innovation in adaptive techniques, systems architecture, software engineering, and standards. In this presentation, I will survey the space of the requirements and draw example problems from real systems. I'll then discuss the space of our research at IBM and highlight some of the more compelling research projects we are doing in the area. I'll conclude with a summary of some key challenges for the broader community as they relate to autonomic computing.
Introduction
Autonomic Computing
Motivation
Space
Goals
Examples, Mature and Research
Our Research Agenda
The Space of Research
Outline
Map of IBM Research labs: Almaden, Austin, Beijing, Delhi, Haifa, Tokyo, Watson, Zürich
1945: 1st IBM Research Lab in NY (Columbia U)
1952: San Jose, California
Establishment years shown on the map: 1955, 1961, 1972, 1982, 1986, 1995, 1995, 1998
IBM Research Worldwide
Geometric growth now generating really large quantum gains
Installed base has reached critical mass
Building blocks, painstakingly developed over many years, work
Society increasingly accepts & needs I/T
So many more things are now feasible
But challenges in harnessing I/T grow; e.g., using massive parallelism
Unabashed Technical Optimism
An application server typically supports:
5 applications, 10 EJBs, hundreds of servlets, ~100 configuration parameters
A web server typically serves:
thousands of web artifacts, ~20 configuration parameters
Failure protocols for each component are different: time-out, number of retries, where and what they log, how they fail
The increasing challenge of managing large systems is due to the inherent complexity of the solution and the sheer number of heterogeneous components
Typical Enterprise System Configuration: Complex System Topology
Diagram of the topology. Labeled elements include:
Tiers: Presentation, Business Logic, Gateway, Back-end Systems
Front end for online customer service
Platforms: SUN, AIX, TPF, OS390
Servers and subsystems: Netscape Ent. Server, WebSphere App Server, E-mail, E-mail Address Capture, DSS Gateways, DSS Client, Sybase Security Servers, Sybase Expressnet DB Servers, LocalDirector, Hub Server Group, App Logging, Gateway Logging, OICS Engine, CAS, IMS, CICS, MSC, DSUs, EPRD Sysplex, PPRD Complex, IPCE Sysplex, IPCW
Protocols and links: HTTP, MQ, JDBC, SNA, APPC LU 6.2, Network
Messaging has ~50 configuration parameters
CIOs speak out:
“Most of my costs are really pure maintenance and operations – keeping the processes running that keep the ship afloat. Our development budget suffers.”
“Y2K and 9/11 have forced us to look at what we have – and we have too much complexity.”
Increasing emphasis on Total Cost Of Ownership
Increasing emphasis on QoS
Increasing emphasis on time to market when installing applications
  Which creates change and instability
Improvement in Manageability
  Absolute requirement, with exponential growth of boxes outstripping productivity improvements for administrators
Problems:
  Increasing complexity
  Management is people intensive
    Cost of management
    Availability of people and skills to do management
Solutions must be open
Industry Trends
Towards Autonomic Computing
Self-optimizing: system designed to automatically manage resources so that the servers meet the enterprise needs in the most efficient fashion
Self-configuring: system designed to define itself "on the fly"
Self-protecting: system designed to protect itself from any unauthorized access anywhere
Self-healing: autonomic problem determination and resolution
IBM Goals
Create and deploy self-managing infrastructure technologies to reduce complexity, lower cost of ownership, and increase reliability
Establish an architectural framework for leadership in Autonomic Computing
Provide technologies to reduce the cost of managing systems; that is, automating automation (automation squared)
Dimension axes shown in the diagram, with their labeled values:
Failure: Random, Malicious, Catastrophic
Load Variability: Sparse, Aggressive
Attack: Small, Highly malicious
Autonomic Computing Dimensions
Other dimensions
Principles:
  Local management structure
  Redundancy, heterogeneity
  Dynamic run-time binding
  Validation and self-protection
Requirements:
  System is always on, always live
  Zero IT administration
  Any system element can fail
Problems:
  Testing / verification
  Root cause analysis
  Global system management
  "Evolving" software vs. upgrading
  Machine-optimizable components
  Standards
Principles, Requirements, Problems
Scale of scope, from largest to smallest: Society, Enterprise, Campus, System, Component
Component end: static, predesigned, fewer options
Society end: dynamic, self-assembling, many options
Architectural Styles at Various Stages
zSeries CPU recovery
CPU duplex
zSeries Sysplex
WebSphere
DB2 self management
Intrusion detection and rejection
Antivirus immune system
Network Dispatcher
IBM Example Mature Technologies
Duplicated: complex controls, arithmetic dataflow
Shared: cache controls, cache data/address flow
R-Unit: check all state updates; preserve known good state
If error:
  1. Stop state updates
  2. Refresh from saved state
  3. Restart CPU
If error persists:
  1. Extract saved state (SE)
  2. Load into spare CPU
  3. Start spare CPU
Diagram units: I-Unit (unchecked), I-Unit (mirror), E-Unit (unchecked), E-Unit (mirror), R-Unit (ECC on saved state), Cache (parity)
Diagram flows: address, cache data, instructions, results / state updates, saved state data
CFW 3/30/00
zSeries CPU Error Detection and Recovery
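The recovery sequence on this slide (preserve known good state, retry from it on error, move to a spare CPU if the error persists) can be sketched in software terms. This is only an illustration of the control flow, not the zSeries hardware mechanism: the names `TransientFault`, `run_with_recovery`, and `flaky_step`, the string CPU identifiers, and the single-retry default are all assumptions made for the example.

```python
import copy


class TransientFault(Exception):
    """Raised by a simulated CPU when a state update fails its check."""


def run_with_recovery(step, state, cpus, max_retries=1):
    """Run `step` on cpus[0]; on error retry from saved state, then fail over.

    `step(cpu, state)` returns the new state or raises TransientFault.
    Returns a (new_state, cpu_used) pair.
    """
    saved = copy.deepcopy(state)            # preserve known good state
    primary, spares = cpus[0], list(cpus[1:])
    for _ in range(max_retries + 1):        # first attempt plus retries
        try:
            # each attempt starts from a fresh copy of the saved state
            return step(primary, copy.deepcopy(saved)), primary
        except TransientFault:
            continue                        # stop updates, refresh, restart
    for spare in spares:                    # error persists: use a spare CPU
        try:
            return step(spare, copy.deepcopy(saved)), spare
        except TransientFault:
            continue
    raise RuntimeError("no healthy CPU available")


def flaky_step(cpu, state):
    """Hypothetical workload: always faults on cpu0, succeeds elsewhere."""
    if cpu == "cpu0":
        raise TransientFault("checker mismatch")
    state["counter"] += 1
    return state


result, used = run_with_recovery(flaky_step, {"counter": 41}, ["cpu0", "spare1"])
```

Here `cpu0` fails on both the first attempt and the retry, so the saved state is loaded onto `spare1`, which completes the update.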
Diagram labels: four SMP CECs, each with CICS, IMS, and DB2; two Sysplex Timers; two Coupling Facilities; two ESCON Directors; CICS Applications, IMS Applications, DB2 Applications
No SPOF - hardware or software
CEC: 16 CPU SMP
Sysplex: 32 CECs or 512 processors
zSeries Parallel Sysplex
Nanny process to restart application server processes that have failed or hung.
Basic resource management - threads, connections, bean pools allocated as needed (within pre-set min and max).
Optimized workload management using both session and transactional affinity.
Transaction log recoverability.
Centralized administration for clustering. Can duplicate server configuration across servers.
WebSphere Application Server: Today
Initial Design and Layout
  Hardware configuration (a la Estimator for DB2 for 390)
  Logical database design
  Physical data layout (partitioning, allocation to nodegroups, clustering)
  Auxiliary data structures (indexes, ASTs)
Configuration parameters
  DB2 for Unix, Windows, & OS/2 V7.1: 73 database manager parms, 72 database parameters (vs. 52 in V5!)
    330 registry variables!
    Memory allocation among various heaps, buffer pools, etc.
  DB2 for OS/390 and z/OS V7: 200 DB2 system parameters (ZPARMs) -- 116 hidden
    Memory allocation among EDM, Statement Caching, and Sort pools
    60 bufferpools with choices of Virtual, Hiper, and DataSpace-backed
Dynamic Monitoring & Adjustment
  Database statistics to collect and when, Clustering and REORG
  Buffer pool hit ratios, Memory allocation
  Problem determination (deadlocks, bad plans, ...)
  System / query status & visualization of all the above
Huge Scope of DBA Responsibilities
Event Correlation to improve accuracy and scalability
Intrusion Tolerance to ensure that the IDS itself is protected against attack
Behavior-Based Intrusion Detection to enable detection of previously unknown attacks
Distributed Event Triage and Correlation
Agent-based ID systems
State of the Art in Intrusion Detection
Diagram labels: Clients, Administrator, Widget Co., Active Network, Automated Virus Analysis Center (Analyze, Derive Cure, Distribute)
* Sold as Norton Anti-Virus Corporate Edition
Digital Immune System
Diagram labels: Clients, Administrator, Widget Co., Wodget Co., Active Network, Automated Virus Analysis Center
Digital Immune System
Diagram labels: Internet, Active ND, Standby ND
Multiple Virtual Clusters
  Multiple services within each Cluster
  Separate balancing parameters used for each Cluster
Automatically balances load within each Cluster
Fault tolerant: standby ND automatically takes over for failed active ND
Requires no operating system modifications
Requires no physical alteration to network
Requires no specific code on servers. Server agent code can be installed but is not required
Utilizes up to three metrics to balance within each Cluster
  Static: based on counts at ND (no server code)
  Advisors: measure performance of a specific application (server code)
  System: measures overall performance of the system (utilizes OS performance monitors)
Dynamic feedback used to balance the load
  Monitors systems and uses a weighted combination of the metrics to reassign load
  Weighted round-robin, weights automatically adjusted based on feedback
Remotely manageable
  Interfaces available to connect to a broader autonomic system
  Start, Stop, Quiesce machines in a Cluster
  Add or Remove Clusters
Layer 3 and layer 7 routing supported
Network Dispatcher: Autonomic Load Distribution
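The feedback idea on this slide (combine several metrics into one load figure per server, then adjust round-robin weights accordingly) can be illustrated with a small sketch. It is not Network Dispatcher's actual algorithm: the function names, the metric names (`connections`, `advisor`, `system`), the proportions, and the one-step weight adjustment are all assumptions chosen for the example.

```python
from itertools import cycle, islice


def combined_load(metrics, proportions):
    """Weighted combination of one server's metrics (higher means busier)."""
    return sum(proportions[name] * value for name, value in metrics.items())


def adjust_weights(weights, load, min_weight=1, max_weight=10):
    """Lower the weight of servers above average load, raise those below."""
    avg = sum(load.values()) / len(load)
    new = {}
    for server, w in weights.items():
        if load[server] > avg:
            w -= 1
        elif load[server] < avg:
            w += 1
        new[server] = max(min_weight, min(max_weight, w))
    return new


def schedule(weights, n):
    """Weighted round-robin: each server appears `weight` times per cycle."""
    ring = [s for s, w in sorted(weights.items()) for _ in range(w)]
    return list(islice(cycle(ring), n))


# Hypothetical measurements, normalized to the range 0..1.
proportions = {"connections": 0.5, "advisor": 0.3, "system": 0.2}
metrics = {
    "a": {"connections": 0.9, "advisor": 0.8, "system": 0.7},
    "b": {"connections": 0.2, "advisor": 0.3, "system": 0.4},
}
load = {s: combined_load(m, proportions) for s, m in metrics.items()}
new_weights = adjust_weights({"a": 2, "b": 2}, load)
picks = schedule(new_weights, 8)
```

With these numbers server `a` is busier than average, so its weight drops to 1 while `b` rises to 3, and `b` receives six of the next eight requests.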
Four-tier Web Serving Architecture
Diagram tiers: PODs with Front End Caching (eNetDispatcher in front of caches), Origin caches, Origin Servers, Content Management Servers
Flow labels: HIT, MISS, pre-feed, net
Content Sources: Results, Lotus News/Photos, Publishing, CIS/NetCam
IBM Olympic Experiences
Océano: provisioning and running stateless servers
eWLM: ebusiness Work Load Manager, open servers
eBPM: WebSphere
ABLE: AI, policy engine, and agents
Blue Gene: cellular computing architecture
Security
Self healing
Ongoing IBM Research Projects
Diagram labels: Requests, Router, Macy's, SportsWeb
Today:
  Fixed resource allocation
  Separate management
  Best effort basis, using own resources
Océano:
  Virtualized hardware
  Single point of system management
  Track performance metrics
  Aggregate & correlate metrics (end-to-end) to SLA violations
  Orchestrate reconfiguration
  Router: throttle incoming requests
Océano Project
Self-tuning, End-to-End Performance Management:
  Dynamic allocation of server resources
  Workload balancing & routing
  Cross platform reporting
  Policy based for various classes of users & applications
Diagram labels: Internet, Appliance Servers, Web Application Servers, Data and Transaction Servers, Internet/Extranet, Business Partners, Existing Business Data
Distributed Workload Management
Adjust every configuration parameter dynamically, while the system is in use!
Expand and shrink memory usage, based on workload
Automatically profile workloads and create/recommend indexes, partitioning, clustering, summary tables, ... to improve performance
Automatically detect the need for, estimate the duration of, and schedule maintenance operations (like reorg, statistics collection, backup, load, rebind)
Observe actual performance and exploit that information to improve operations. Recommend action when things aren't the way you want them to be.
Project into the future to detect coming problems, like low memory or constrained disk space, and notify you by page or e-mail days or weeks in advance!
Wouldn't it be great if your database was as easy to maintain and as self-controlled as your fridge?
Can your database do this? Soon it will...
SMART's Vision
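The "project into the future" goal (warning about constrained disk space days in advance) can be illustrated with a simple trend projection. This is a minimal sketch that assumes linear growth and a least-squares fit; the slides do not say which forecasting method SMART uses, and the function names and sample figures below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_gb, capacity_gb):
    """Days after the last sample until projected usage reaches capacity.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = fit_line(days, used_gb)
    if slope <= 0:
        return None
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - days[-1])


# Hypothetical samples: disk usage grows 2 GB per day on a 100 GB volume.
remaining = days_until_full([0, 1, 2, 3, 4], [50, 52, 54, 56, 58], 100)
if remaining is not None and remaining < 30:
    print(f"warning: disk projected full in about {remaining:.0f} days")
```

A real monitor would have to cope with noisy or seasonal workloads; a straight line is just the simplest model that makes the idea concrete.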
Java-based agent framework and AI component library
Agent builder, test and debug tools, multi-agent platform
Add adaptivity through on-line machine learning (data mining)
Policy-based behavior using rules-based knowledge representation
Add reflexive, reactive, and deliberative goal-seeking behaviors
Distributed hierarchical communication and feedback control
Diagram labels: AbleAgent (Sensors, Effectors, Learning, Reasoning, Intelligent Control), System Monitors, System Controls
ABLE Autonomic Components
Chip (2 processors): 2.8/5.6 GF/s, 4 MB
Board (8 chips, 2x2x2): 22.4/44.8 GF/s, 2.08 GB
Rack (128 boards, 8x8x16): 2.9/5.7 TF/s, 266 GB
System (64 cabinets, 32x32x64): 180/360 TF/s, 16 TB
Chip diagram labels: 440 core (two), EDRAM, I/O
Autonomic Computing Issues: checkpointing, routing around failed nodes, data migration, communication route optimization
Blue Gene/L System
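The peak rates on this slide follow from multiplying the per-chip figure (2.8/5.6 GF/s) by the packaging counts (8 chips per board, 128 boards per rack, 64 cabinets). The short calculation below checks that arithmetic; the variable names are illustrative only.

```python
# Per-chip peak rate from the slide, in GF/s, as a (low, high) pair.
chip = (2.8, 5.6)
chips_per_board = 8
boards_per_rack = 128
racks_per_system = 64


def scale(rate, factor):
    """Multiply both ends of a (low, high) rate pair by a unit count."""
    return tuple(r * factor for r in rate)


board = scale(chip, chips_per_board)      # GF/s per board
rack = scale(board, boards_per_rack)      # GF/s per rack
system = scale(rack, racks_per_system)    # GF/s for the full system

for name, rate in [("board", board), ("rack", rack), ("system", system)]:
    print(f"{name}: {rate[0]:.1f} / {rate[1]:.1f} GF/s")
```

The board figure matches the slide exactly (22.4/44.8 GF/s). The rack and system products come to about 2.87/5.73 TF/s and 183.5/367 TF/s, so the slide's 2.9/5.7 TF/s and 180/360 TF/s appear to be rounded.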
Behavior-Based Intrusion Detection
Secure Distributed Storage
Secure Boot & System Configuration Monitoring
Tamper-responsive hardware
Traps for catching worms and DoS agents
Certified systems that guarantee program separation
Current Security Research
Self-managing storage systems
Self-managing data base systems
LEO, DB2 Learning Optimizer
Architecture for control of autonomic systems
A Few New Projects
Diagram labels:
Space (1, 2, 3): Sequential, Skip Sequential, Random
Device (a, b, c): Sequential, Skip Sequential, Random
Database, with Database Autonomic Manager (Policy and History)
File System, with File System Autonomic Manager (Policy and History)
Storage System, with Storage System Autonomic Manager (Policy and History)
Policy, Alerts
Standard Porting Layer, Enhancement additions
ALOMS-Tango: Storage for Data Base Systems
Diagram labels: SQL Compilation, Statistics, Optimizer, Best Plan, Plan Execution, Estimated Cardinalities, Actual Cardinalities, Adjustments
Loop steps:
  1. Monitor
  2. Analyze
  3. Feedback
  4. Exploit
Learning in Query Optimization
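The four-step loop on this slide (monitor, analyze, feedback, exploit) compares estimated cardinalities with the actual ones seen at execution time and feeds adjustments back to the optimizer. The sketch below shows one very simple form such an adjustment could take, a per-predicate correction ratio. It is an assumption made for illustration, not LEO's actual algorithm, and the class and method names are hypothetical.

```python
class CardinalityFeedback:
    """Keeps a multiplicative correction per predicate."""

    def __init__(self):
        self.adjustment = {}  # predicate text -> correction factor

    def estimate(self, predicate, base_estimate):
        """Exploit: apply any learned correction to the base estimate."""
        return base_estimate * self.adjustment.get(predicate, 1.0)

    def record(self, predicate, base_estimate, actual):
        """Monitor, analyze, feedback: store the ratio actual / estimate."""
        if base_estimate > 0:
            self.adjustment[predicate] = actual / base_estimate


fb = CardinalityFeedback()
first = fb.estimate("age > 60", 1000)   # no history yet: estimate unchanged
fb.record("age > 60", 1000, 250)        # execution saw only 250 rows
second = fb.estimate("age > 60", 1000)  # the correction is now applied
larger = fb.estimate("age > 60", 2000)  # same ratio on a larger base estimate
```

After one observation the estimate for the same predicate drops from 1000 to 250 rows, and the same ratio carries over to later estimates for that predicate.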
Diagram: the same management structure is repeated at each layer
Layers: Data Base, Application and Integration Middleware, Operating System, File System, Storage System, Processor System
At each layer: an Autonomic Manager (policy based management: measure, model, direct), Policy and History, Managed Components, Managed Operations
Labeled flows: Policy, Alerts, Measurement, Workload and service agreements, Hints and Directions, Administrator alerts and measurement
Autonomic Computing - The Whole System
An autonomic element encapsulates services
Functional unit
  Provides the service
  Web server, DB, etc.
Management unit
  Controls functional unit
  Controls access
  Negotiates for input, output services
Diagram labels: Mgt. Unit (monitor, control; access control), Func. Unit, Management channel (input, output), Functional channel (input, output)
Autonomic System Architecture
An Autonomic Element
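The split between a functional unit that provides the service and a management unit that monitors it, controls it, and gates access can be made concrete with a small sketch. Everything here is an illustrative assumption: the class names, the `running` flag, the restart-on-failure policy, and the client allow-list are not taken from the slides.

```python
class FunctionalUnit:
    """Provides the service (a stand-in for a web server, DB, etc.)."""

    def __init__(self):
        self.running = True

    def handle(self, request):
        if not self.running:
            raise RuntimeError("functional unit is down")
        return f"handled {request}"


class ManagementUnit:
    """Monitors and controls the functional unit and gates access to it."""

    def __init__(self, unit, allowed_clients):
        self.unit = unit
        self.allowed = set(allowed_clients)
        self.restarts = 0

    def request(self, client, payload):
        if client not in self.allowed:   # access control
            raise PermissionError(client)
        if not self.unit.running:        # monitor, then control: restart
            self.unit.running = True
            self.restarts += 1
        return self.unit.handle(payload)


unit = FunctionalUnit()
element = ManagementUnit(unit, allowed_clients={"web"})
first_reply = element.request("web", "q1")
unit.running = False                     # simulate a failure
second_reply = element.request("web", "q2")
```

Callers only ever talk to the management unit, which is what lets it enforce access and repair the functional unit without the caller noticing.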
Systems: webs of elements; composition of elements; composition of services; late binding (dynamic, by negotiated SLA)
Diagram labels: Directory, Web Server, DB, Storage
Self-configuring
  New web server added
  Negotiates with directory for service
  Gets location of DB, storage services
(Leg of a) Strawman Architecture
An Autonomic System
Systems: webs of elements; composition of elements; composition of services; late binding (dynamic, by negotiated SLA)
Diagram labels: Directory, Web Server, DB, Storage
Self-configuring
  New web server added
  Negotiates with directory for service
  Gets location of DB, storage services
  Negotiates with DB, storage services
(Leg of a) Strawman Architecture
An Autonomic System
Systems: webs of elements; composition of elements; composition of services; late binding (dynamic, by negotiated SLA)
Diagram labels: Directory, Web Server, DB, Storage
Self-healing
  Storage service dies
(Leg of a) Strawman Architecture
An Autonomic System
Systems: webs of elements; composition of elements; composition of services; late binding (dynamic, by negotiated SLA)
Diagram labels: Directory, Web Server, DB, Storage
Self-healing
  Storage service dies
  DB gets location of new storage service
(Leg of a) Strawman Architecture
An Autonomic System
Systems: webs of elements; composition of elements; composition of services; late binding (dynamic, by negotiated SLA)
Diagram labels: Directory, Web Server, DB, Storage
Self-healing
  Storage service dies
  DB gets location of new storage service
  DB binds new storage service
  DB initializes new storage service
(Leg of a) Strawman Architecture
An Autonomic System
Systems: webs of elements; composition of elements; composition of services; late binding (dynamic, by negotiated SLA)
Diagram labels: Directory, Web Server, DB, Storage
Self-healing
  Storage service dies
  DB gets location of new storage service
  DB binds new storage service
  DB initializes new storage service
  Back in business with no interruption!
(Leg of a) Strawman Architecture
An Autonomic System
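The self-healing sequence on these slides (storage service dies, DB gets the location of a new storage service from the directory, binds it, initializes it, and carries on) can be sketched as late binding through a directory. This is a toy model under stated assumptions: the `Directory`, `Storage`, and `Database` classes are hypothetical, negotiation of SLAs is left out, and here it is the DB itself that reports the dead service to the directory.

```python
class Directory:
    """Maps a service kind to the providers currently registered for it."""

    def __init__(self):
        self.services = {}

    def register(self, kind, provider):
        self.services.setdefault(kind, []).append(provider)

    def deregister(self, kind, provider):
        self.services[kind].remove(provider)

    def locate(self, kind):
        providers = self.services.get(kind, [])
        if not providers:
            raise LookupError(f"no {kind} service registered")
        return providers[0]


class Storage:
    def __init__(self, name):
        self.name, self.alive, self.initialized = name, True, False

    def write(self, data):
        if not self.alive:
            raise ConnectionError(self.name)
        return f"{self.name} stored {data}"


class Database:
    def __init__(self, directory):
        self.directory = directory
        self.storage = None
        self.bind()

    def bind(self):
        self.storage = self.directory.locate("storage")  # get location, bind
        self.storage.initialized = True                  # initialize

    def save(self, data):
        try:
            return self.storage.write(data)
        except ConnectionError:
            # Storage service died: drop it, rebind via the directory, retry.
            self.directory.deregister("storage", self.storage)
            self.bind()
            return self.storage.write(data)


directory = Directory()
s1, s2 = Storage("s1"), Storage("s2")
directory.register("storage", s1)
directory.register("storage", s2)
db = Database(directory)
before = db.save("x")
s1.alive = False                  # simulate the storage service dying
after = db.save("y")              # the caller sees no interruption
```

Because the DB never holds a hard-wired reference to a particular storage service, replacing a failed one is a lookup and a retry rather than a reconfiguration.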
A long list of difficult problems
  Systems: an extremely different way of creating systems
  Theory: difficult issues in complex systems, etc.
Candidate Grand Challenge at the Computing Research Association (CRA) Grand Challenges Conference (ongoing today)
Autonomic Computing: A Grand Challenge?
Architecture and basic principles
Fundamentals and theory
Standards
Product applications + implications
Software engineering discipline
Proof points for all above
(IBM) Autonomic Computing Action Framework
Table columns: Component, System, Federation (horizontal axis: Time). Rows, by research discipline:
Optimization Algorithms: data mining, continual optimization; workload management; extended cross-system workload management
Control Theory: resource SLA management; component policy management and enforcement; monitoring; aggregating data and keeping relevant history; end-to-end service level agreement management
Distributed Alg. & Control: scripting sensors & control; optimization without complete or up-to-date information
Security: intrusion detection; sensor, instrumentation; federated intrusion detection
Special Languages: translate business policy to component policies; SLA specification language and processor; policy specification language and processor; rationalizing distributed policy
Adaptive/Learning Theories: call center optimization; SLA and policy engine
Complex Systems: automated operation; agent technology; Autonomic Computing framework; federated system architecture
Infrastructure: component-level problem determination; unit of work tracking
The Space of Research