Mark Jones, Senior Product Manager
How Automation Can Help You: Use Cases for NetIQ Aegis™
Our Vision For IT Process Automation: 3 Years In The Making
3 years ago NetIQ had a vision for converging our systems & security management products to support consolidated incident & event handling.
But customers said, help us connect to our other tools as well.
“We’re the Noah’s Ark of tools: we have two of everything.” (VP of Operations at a major financial institution)
So we altered our plan to give customers greater control of the tools they’ve already invested in by creating a strategy for heterogeneous IT Process Automation (ITPA).
Introducing NetIQ® Aegis™: The Control & Automation Platform for IT Processes
NetIQ Aegis is a software platform that models, automates, measures and improves run books and ITIL-based processes, bringing control and automation to IT Operations.
• Scope: ITIL Process (macro), Run Books (micro)
• Cycle: Model, Automate, Measure, Improve
Use Case #1: Sympathetic Event Correlation
Diagram participants: NetIQ Aegis, NetIQ AppManager, Database Server, Web Server, Application Server
1. SQL Server down event
2. AppManager receives event
• From the agent on the server
3. AppManager event triggers an Aegis workflow
• Correlation engine begins listening for sympathetic events that match rules
4. AppManager receives sympathetic access failure events
• From application and web servers
5. Aegis’ correlation engine sees the sympathetic events
• And matches them to pre-defined rules
6. Aegis closes the sympathetic events
• Reducing the volume of AppManager events to be dealt with
• Update comments in the original event accordingly
Additional correlation examples:
• Suppress machine down events from hosts on attached subnets when a router fails
• Identify root cause from multiple events, e.g. a congested network segment identified by a combination of Network ResponseTime events, and high queue lengths on some Exchange servers
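The correlation flow in steps 1 to 6 can be illustrated with a minimal Python sketch. This is not the Aegis correlation engine or its API: the `Event` class, the `RULES` table, the event kinds and the `correlate` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: int
    host: str
    kind: str
    status: str = "open"
    comments: list = field(default_factory=list)


# Hypothetical rule table: root-cause kind -> sympathetic kinds it explains.
RULES = {"sql_server_down": {"db_access_failure"}}


def correlate(root, incoming, rules=RULES):
    """Close open events that the rules mark as sympathetic to `root`."""
    sympathetic_kinds = rules.get(root.kind, set())
    closed = []
    for ev in incoming:
        if ev.status == "open" and ev.kind in sympathetic_kinds:
            ev.status = "closed"
            closed.append(ev)
    if closed:
        # Step 6: record what was suppressed on the original event.
        ids = ", ".join(str(ev.event_id) for ev in closed)
        root.comments.append(f"Closed sympathetic events: {ids}")
    return closed


root = Event(1, "db01", "sql_server_down")
incoming = [
    Event(2, "web01", "db_access_failure"),
    Event(3, "app01", "db_access_failure"),
    Event(4, "web01", "disk_low"),  # unrelated, must stay open
]
closed = correlate(root, incoming)
```

Only events whose kind appears in the rule for the root cause are closed, so unrelated events raised in the same period stay visible to the operator.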
Use Case #2: Managing Maintenance Modes
Diagram participants: NetIQ Aegis, NetIQ AppManager, Application Owner, Outlook Form
1. Application owner sends an email request to set maintenance mode
• Using an Outlook form
2. Aegis receives the email and parses it
• Identifies the resource to set maintenance mode on and the time window
3. Aegis sends a reminder email before the start of maintenance
• With an opportunity to cancel via email
4. Aegis sets the maintenance mode in AppManager
• On the right machine at the right time
5. Administrator performs maintenance
6. Aegis sends a reminder email before the expiration of maintenance
• With an opportunity to “snooze” or extend via email
7. Aegis stops maintenance mode
• On time with no further approval
8. Aegis sends email confirming maintenance stoppage
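The timing of steps 3, 4, 6, 7 and 8 can be sketched as a simple schedule derived from the requested window. This is an illustrative sketch only: the function name, the 30-minute reminder lead and the action strings are assumptions, and the slides do not say how far ahead Aegis sends its reminders.

```python
from datetime import datetime, timedelta


def maintenance_timeline(start, end, reminder_lead=timedelta(minutes=30)):
    """Return (time, action) pairs for one maintenance window, in order."""
    if end <= start:
        raise ValueError("maintenance window must end after it starts")
    # Never schedule the expiry reminder before the window has started.
    expiry_reminder = max(start, end - reminder_lead)
    return [
        (start - reminder_lead, "send start reminder (owner may cancel)"),
        (start, "set maintenance mode"),
        (expiry_reminder, "send expiry reminder (owner may extend)"),
        (end, "stop maintenance mode"),
        (end, "send confirmation email"),
    ]


# Illustrative window; the date itself has no significance.
start = datetime(2024, 1, 1, 14, 0)
end = datetime(2024, 1, 1, 16, 0)
timeline = maintenance_timeline(start, end)
```

An extension requested through the expiry reminder would amount to calling the function again with a later `end`.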
Use Case #3: Low Disk Space Response
Diagram participants: NetIQ AppManager, NetIQ Aegis, Admin, AppManager Agent, Archive, Trash
1. Available disk space falls below threshold
• Likely caused by temp file growth
2. AppManager detects condition
• AppManager Knowledge Script generates event
3. Aegis requests disk usage analysis from AppManager
• Identify top N culprits by folder, file type, age
• Extra attention on known temp file storage areas
4. Aegis sends email to admin requesting approval to clean up
• Embed results of disk usage analysis & link to Aegis web site
5. Administrator approves partial cleanup through Aegis (or by replying to email)
• Admin can select individual folders or file types for deletion, archiving or user attention
6. Aegis commands AppManager (AM) to perform cleanup
• Delete approved files and analyze new disk space status
7. Aegis sends confirmation email to admin
• Identify files deleted and new disk space status
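The "top N culprits" analysis in step 3 can be approximated with the Python standard library. This is a sketch of the idea, not the AppManager Knowledge Script; `disk_usage_report` is a hypothetical name, and ranking by file age is left out for brevity.

```python
import os
from collections import Counter


def disk_usage_report(root, top_n=3):
    """Sum file sizes under `root` by folder and by file extension."""
    by_folder, by_type = Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            by_folder[dirpath] += size
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_type[ext] += size
    return {
        "folders": by_folder.most_common(top_n),
        "file_types": by_type.most_common(top_n),
    }
```

Calling `disk_usage_report(path, top_n=5)` on a volume's root returns the five largest folders and file types by total bytes, which is the kind of summary step 4 embeds in the approval email.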
Use Case #4: VM Dynamic Performance Management
Diagram participants: NetIQ Aegis, NetIQ AppManager, VMware Virtual Center, VMware ESX Hosts, Attachmate WinINSTALL, Load Balancer or Controller, Admin, Critical Business Service
1. Detect poor performance on VM-hosted service
• Performance problem detected by AppManager ResponseTime
2. Identify VM host with spare capacity
3. Gain approval to provision new VMs
• Send email to admin with proposed changes, requesting approval to automatically respond
4. Provision new VM guest
• Clone VM, configure LAN settings, etc & boot
5. Apply post-image updates per corp standard
• Patches, configuration updates since VM image was created
6. Configure applications
• Machine-specific settings required on guest and other machines in business service
7. Validate application function
• Verify proper application function before bringing into production
8. Bring new guest into production rotation
• Configure load balancer, application controller or similar
9. Verify improved service performance
• Repeat as necessary for up to 3 new guests total
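The "repeat as necessary for up to 3 new guests" loop in step 9 can be sketched as a bounded control loop. The callables `service_is_healthy` and `provision_guest` are hypothetical stand-ins: the first represents the AppManager ResponseTime check, the second bundles steps 2 to 8. The response-time figures in the simulation are invented for illustration.

```python
def scale_out(service_is_healthy, provision_guest, max_guests=3):
    """Add guests one at a time until the service is healthy or the cap is hit."""
    added = []
    while not service_is_healthy() and len(added) < max_guests:
        added.append(provision_guest())
    return added, service_is_healthy()


# Simulated service: each new guest cuts response time by 300 ms.
state = {"response_ms": 900, "guests": 0}


def healthy():
    return state["response_ms"] <= 500


def provision():
    state["guests"] += 1
    state["response_ms"] -= 300
    return f"guest-{state['guests']}"


added, ok = scale_out(healthy, provision)
```

The cap matters: if performance does not recover after the third guest, the loop stops and returns `False` for the health flag, so the problem can be escalated instead of provisioning indefinitely.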
Use Case #5: Web Server Sequential Restart
Diagram participants: NetIQ AppManager, NetIQ Aegis, Admin, Attachmate WinINSTALL, Active Sessions, Web Servers, Load Balancer
1. Admin applies a patch to all web servers
• Reboot needed to finalize
2. Admin initiates “Restart Web Farm” Runbook
• Customized runbook automated by Aegis
3. Aegis blocks new sessions to first server
• Uses NetIQ AppManager to configure load balancer
4. Aegis commands AppManager to monitor for server to reach zero active sessions
• Users “bleed” off as they end their sessions on their own; AppManager sends event when zero sessions remain
5. Aegis commands AppManager to restart the web server
• Aegis waits for notification that reboot is complete
6. Aegis commands AppManager (AM) to test basic functionality
• Verify that web server properly performs expected duties
7. Aegis enables new sessions to the server
• Uses NetIQ AppManager to configure load balancer
8. Aegis verifies web site health
• Users are accessing the rebooted server successfully and no Response Time or other errors reported on the web farm
9. Send progress notification to Admin
• Include % remaining & ETA for completion
10. Go to Step 3 for next server
• Iterate until all servers completed
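The per-server loop in steps 3 to 10 follows a drain, restart, test, re-enable pattern. The sketch below shows that pattern under stated assumptions: `ops` is a hypothetical object exposing one method per step, `FakeOps` is a test double that only records calls, and the farm-wide health check of step 8 and the ETA of step 9 are omitted.

```python
def rolling_restart(servers, ops, notify):
    """Restart servers one at a time, keeping the rest of the farm in rotation."""
    total = len(servers)
    for done, server in enumerate(servers, start=1):
        ops.block_new_sessions(server)       # step 3
        ops.wait_for_zero_sessions(server)   # step 4
        ops.restart(server)                  # step 5
        if not ops.smoke_test(server):       # step 6
            # Leave the server out of rotation and stop the run.
            raise RuntimeError(f"{server} failed its post-restart check")
        ops.enable_new_sessions(server)      # step 7
        remaining = 100 * (total - done) // total
        notify(f"{done}/{total} servers restarted, {remaining}% remaining")


class FakeOps:
    """Test double: records every call and reports success."""

    def __init__(self):
        self.log = []

    def __getattr__(self, name):
        def call(server):
            self.log.append((name, server))
            return True
        return call


ops = FakeOps()
messages = []
rolling_restart(["web1", "web2"], ops, messages.append)
```

Stopping on the first failed check is a deliberate choice in this sketch: a bad patch then takes out one server, not the whole farm.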
Use Case #6: Incident Management
Diagram participants: NetIQ AppManager, NetIQ Aegis, Helpdesk, Ticketing, Incident Stakeholders, Management, Other Sources (RFCs, CMDB, NetIQ Change Guardian, etc.)
1. Incident occurs
• Performance problem detected by AppManager ResponseTime
2. Collect related events from other data sources
• Changes, tickets, intrusions, etc during same time period
• Broaden scope to other machines in business service and correlate
3. Create helpdesk ticket
• Apply proper classifications
• Embed link to web page with related incidents
4. Helpdesk staff works ticket
• Relevant information already collected & presented with ticket
5. Monitor existing incident management workflow
• Support ticketing workflow with Aegis Investigation Assistance
• Wait for ticket to be resolved (not closed)
6. Initiate Incident Probation Period
• Verify proper service restoration, record in ticket
• Search all tools for unanticipated downstream impacts, reopen ticket if found
7. Coordinate post-incident review for Problem Management
• Request explanatory info from stakeholders, e.g. how well was incident handled, how to prevent recurrence
• Produce unified report for management
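Step 2, collecting records from the same time period across the machines of the business service, amounts to a time-window and host filter. The sketch below assumes each record is a dict with `source`, `host` and `time` keys and uses a one-hour window; the record layout, the window size and the function name are all assumptions, not the Aegis data model.

```python
from datetime import datetime, timedelta


def related_events(incident_time, service_hosts, records,
                   window=timedelta(hours=1)):
    """Return records on the service's hosts within +/- `window` of the incident."""
    lo, hi = incident_time - window, incident_time + window
    matches = [
        r for r in records
        if lo <= r["time"] <= hi and r["host"] in service_hosts
    ]
    return sorted(matches, key=lambda r: r["time"])


# Illustrative data only.
t0 = datetime(2024, 1, 1, 12, 0)
records = [
    {"source": "change", "host": "web01", "time": t0 - timedelta(minutes=20)},
    {"source": "ticket", "host": "db01", "time": t0 - timedelta(hours=3)},
    {"source": "intrusion", "host": "mail01", "time": t0 + timedelta(minutes=5)},
    {"source": "change", "host": "db01", "time": t0 - timedelta(minutes=50)},
]
related = related_events(t0, {"web01", "db01"}, records)
```

Passing the full host set of the business service, not just the machine that raised the incident, is what "broaden scope" means here.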
Use Case #7: Change Management
Diagram participants: NetIQ Aegis; AppManager; NetIQ Change Administrator; “Request for Change” Process; Change Requester; Administrator; Management; Incident Stakeholders; CMDB; Tripwire, NetIQ Change Guardian, etc; All Data Sources (Net. Mgmt, Etc)
1. Change is requested & approved
• Via existing “Request for Change” process
2. Detect approved change request
• Monitor Remedy or other change management system
3. Provision access in change control tool
• Managed by NetIQ Change Administrator
4. Change Requester executes change per approved ticket
• Actions bounded by change control tool
5. Change audit tool detects actual config changes
• Tripwire or NetIQ Change Guardian
6. Reconcile audited changes to the approved RFC
• Group audited changes by time, machine, individual
• Request review of changes: auth or unauth, relevant ticket ID, etc
• Update ticket and CMDB with related changes
7. Perform system health check
• After change, verify proper service levels
8. Correlate changes to impacts
• Search other tools for downstream impacts from change such as performance problems, new vulnerabilities, etc.
9. Coordinate Post-Change Review
• Change is “completed” but not “closed” until the CAB has completed review
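The reconciliation in step 6 can be sketched as matching each audited change against the machine, individual and time window of the approved RFC. In the slides the authorized/unauthorized decision is made by a reviewer; the sketch below only pre-sorts changes into "matches the RFC" and "needs review". The dict layout and all field names are assumptions for illustration.

```python
from datetime import datetime


def reconcile(rfc, audited_changes):
    """Split audited changes into those matching the RFC and those needing review."""
    matched, needs_review = [], []
    for change in audited_changes:
        in_scope = (
            change["machine"] in rfc["machines"]
            and change["user"] == rfc["requester"]
            and rfc["start"] <= change["time"] <= rfc["end"]
        )
        (matched if in_scope else needs_review).append(change)
    return matched, needs_review


# Illustrative data only.
day = datetime(2024, 1, 1)
rfc = {
    "id": "RFC-1",
    "machines": {"web01"},
    "requester": "alice",
    "start": day.replace(hour=9),
    "end": day.replace(hour=11),
}
changes = [
    {"machine": "web01", "user": "alice", "time": day.replace(hour=10)},
    {"machine": "db01", "user": "alice", "time": day.replace(hour=10)},
    {"machine": "web01", "user": "bob", "time": day.replace(hour=10)},
    {"machine": "web01", "user": "alice", "time": day.replace(hour=12)},
]
matched, needs_review = reconcile(rfc, changes)
```

A change that fails any one of the three criteria (wrong machine, wrong individual, outside the window) goes to the review list, mirroring the grouping "by time, machine, individual".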
Use Case #8: Vulnerability Management
Diagram participants: NetIQ Aegis; AppManager; Secure Configuration Manager; Remedy; Administrator; Patch Manager, WinINSTALL, SMS, Etc; All Data Sources (VM, SM, Etc)
1. Initiate vulnerability & policy violation scan
• Or scan on an existing schedule
2. Identify resulting vulnerabilities
3. Request permission to remediate via existing Change Management process (RFC)
• Group by machine, service, vulnerability class, etc.
4. Monitor for approved RFC
5. Initiate remediation
• Using provisioning tools such as WinINSTALL, SMS, etc. or by assigned administrator
6. Initiate vulnerability scan to verify remediation
• Verify that vulnerability was indeed remediated
7. Perform system health check
• After change, verify that remediation did not impact service levels
8. Relate changes to impacts
• Search other tools for downstream impacts from change such as performance problems, new vulnerabilities, etc.
9. Close change request
• Or escalate if impacts are found
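Two small pieces of this flow lend themselves to a sketch: grouping scan findings for the RFC (step 3) and comparing the verification scan with the original one (step 6). The finding layout, the placeholder vulnerability IDs and both function names are hypothetical and do not come from Secure Configuration Manager.

```python
from collections import defaultdict


def group_findings(findings, key="machine"):
    """Group vulnerability IDs by the given field, e.g. machine or class."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding[key]].append(finding["vuln_id"])
    return {k: sorted(v) for k, v in sorted(groups.items())}


def still_open(before, after):
    """Vulnerabilities found in both scans, i.e. not actually remediated."""
    return sorted(set(before) & set(after))


# Placeholder IDs, not real advisories.
findings = [
    {"machine": "web01", "vuln_id": "V-2"},
    {"machine": "db01", "vuln_id": "V-1"},
    {"machine": "web01", "vuln_id": "V-1"},
]
grouped = group_findings(findings)
unresolved = still_open(grouped["web01"], ["V-2"])
```

An empty result from `still_open` corresponds to closing the change request in step 9; a non-empty one corresponds to escalation.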