Finisar Corporation: Monitoring Performance. December, 2006.

Page 1:

Finisar Corporation

Monitoring Performance

Page 2:

The growing SAN challenge

complexity

heterogeneity

virtualization

change

fabric blindness

you don’t know what to do when things go wrong

you don’t know the source of SAN issues

you don’t know what you can’t see

Page 3:

Fabric blindness leads to…

Application brownouts or blackouts occur - and have significant business impact

Frantic fire-fighting

Internal finger-pointing: application vs. network vs. storage

External finger-pointing: vendors

Unacceptably long resolution times

Page 4:

Information Highway

Network performance shares many similarities with that of your daily commute

Storage Area Networks are no exception

Just like a large sprawling city, as SANs grow, performance becomes more difficult to ensure

Let’s take a look at planning for a faster commute

Page 5:

Is Performance Important?

Bugatti Veyron – The fastest production car
0 – 60 mph in 3.2 seconds
Top speed well over 200 mph
Price more than $1,000,000

Chevy Matiz – One of the slowest cars
0 – 60 mph in 21.9 seconds
Top speed about 85 mph
Price about $10,000

The Difference
6.8 times the acceleration
3 – 4 times as fast
More than 10 times the price

Page 6:

The Real Difference

In this environment they both go the same speed.

In fact, in most environments they would take roughly the same time from A to B.

So maybe the right question is: when is performance important, and how is it measured?

Page 7:

Rush Hour

In LA, which has the worst rush hour commute time, there is an 81% average delay during rush hour

Often certain routes are congested while others have limited traffic that is not affected

Page 8:

Rush Hour

On many SANs there is a 500% average delay during peak times

There is no notification of a problem (time-out) until response times reach 6000% of normal maximums and 75,000% of the low-load average

Queues (just like on-ramps) can fill even at low bandwidth conditions

Often certain routes are congested while others have limited traffic that is not affected
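To make the two time-out percentages on this slide concrete, the arithmetic below applies them to an assumed low-load average response time. The 2 ms baseline is a made-up value chosen only for illustration, and the helper name `percent_of` is not from the presentation.

```python
# Illustrative arithmetic only: the baseline is an assumed value,
# not a measurement from the presentation.
LOW_LOAD_AVG_MS = 2.0  # hypothetical low-load average response time


def percent_of(baseline_ms: float, percent: float) -> float:
    """Return the response time that is `percent` percent of `baseline_ms`."""
    return baseline_ms * percent / 100.0


# A time-out at 75,000% of the low-load average ...
timeout_ms = percent_of(LOW_LOAD_AVG_MS, 75_000)
# ... is also described as 6000% of the normal maximum, which implies:
normal_max_ms = timeout_ms / (6000 / 100.0)

print(f"time-out fires at  {timeout_ms:.0f} ms")
print(f"implied normal max {normal_max_ms:.1f} ms "
      f"({normal_max_ms / LOW_LOAD_AVG_MS:.1f}x the low-load average)")
```

Under that assumed baseline, response times can grow from a few milliseconds to well over a second before a time-out reports anything.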

Page 9:

The impact of accidents

The impact of accidents depends on their severity

Pileups can result in routes that are impassable

Minor accidents can cause delays that far exceed even the impact of rush hour

Page 10:

The impact of errors

The impact of errors depends on the severity of the issue

Physical errors can result in routes that are unusable

Occasional errors can cause delays that far exceed even the impact of rush hour

Page 11:

Patch Work

Often short-term solutions to problems become long-term hazards

Page 12:

Patch Work

Often short-term solutions to problems become long-term hazards

Page 13:

Planning for and monitoring the commute

City planners architect the roadways for what they believe will be the commute demands

In some cases they use simulation to compare various alternatives

Finally, they monitor the traffic patterns to prevent and resolve problems and to plan better for the future

Page 14:

Planning for and monitoring the SAN

SAN architects plan the fabrics for what they believe will be the storage demands

In some cases they use simulation and tests to compare various alternatives

Finally, they monitor the traffic patterns to prevent and resolve problems and to plan better for the future

Page 15:

Planning

The roadways are designed for the expected traffic loads

Often one of the biggest mistakes in planning is using out-of-date information or incorrect assumptions.

Page 16:

Planning

The fabrics are designed for the expected traffic loads

Often one of the biggest mistakes in planning is using out-of-date information or incorrect assumptions.

Page 17:

Simulations are sometimes used to compare changes

Page 18:

Simulations are sometimes used to compare changes

Page 19:

Monitoring: I/Os Per Second

Which route has more cars passing by every second?

In this scenario they could all be the same… Some with a few cars moving very fast, others with many cars that are going slow

So what, if anything, does that measurement tell us about performance?

Page 20:

Monitoring: I/Os Per Second

Which route has more MBs passing through every second?

In this scenario they could all be the same… Some with no requests and some with slow requests due to congestion

So what, if anything, does that measurement tell us about performance?
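The point of these two slides can be illustrated with Little's Law, which relates throughput, the number of requests in flight, and mean response time in a steady state. The sketch below uses two hypothetical links with identical I/Os per second. The link names and all numbers are invented for illustration.

```python
def mean_response_time_ms(outstanding_ios: float, iops: float) -> float:
    """Little's Law: mean response time = mean outstanding I/Os / throughput."""
    return outstanding_ios / iops * 1000.0


# Two hypothetical links that report the same I/Os per second.
links = {
    "link_a": {"iops": 5000.0, "outstanding": 5.0},    # few requests, moving fast
    "link_b": {"iops": 5000.0, "outstanding": 100.0},  # many requests, queued up
}

for name, m in links.items():
    rt = mean_response_time_ms(m["outstanding"], m["iops"])
    print(f"{name}: {m['iops']:.0f} IOPS, mean response time {rt:.1f} ms")
```

Both links show the same rate, yet one responds about twenty times slower than the other, so a rate counter alone says little about how long each request waits.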

Page 21:

Modern Monitoring

Looks at the real traffic flows
Can assess performance
Pinpoints the source of slowdowns such as accidents and congestion
Speeds resolution of many problems
In many cases, helps to prevent issues from becoming problems

Page 22:

Different methods of network monitoring

Software Monitoring
No interference on the physical link
Software agent needed
Affected by host system performance

Hardware Monitoring
Isolated from software and host issues
Intrusive on the physical link
Dedicated monitoring hardware

Page 23:

Modern Monitoring

single TAP

Looks at the real traffic flows
Can assess performance
Pinpoints the source of slowdowns such as accidents and congestion
Speeds resolution of many problems
In many cases, helps to prevent issues from becoming problems

Page 24:

Performance Analysis and Tuning

Request size and queue depth are two key contributors to performance tuning

Pre-production run with variable queue depth and request size. A higher queue depth could increase throughput, but could also cause congestion and reduce throughput

Chart series: Queue = 2, Queue = 4, Queue = 8, Queue = 16

Page 25:

Performance Analysis and Tuning

Read size of 8 KB with a variable queue depth setting. Response time ranges from 10 ms to 65 ms. The ideal queue depth for this system would be 8 with 8 KB I/O
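A queue depth sweep like the one described on this slide could be evaluated roughly as follows. This is a minimal sketch: the latency figures are invented to span the 10 ms to 65 ms range mentioned above and are not the values from the original chart. Throughput is estimated with Little's Law on the assumption that the queue stays full.

```python
# Hypothetical pre-production results for 8 KB reads:
# queue depth -> mean response time in ms (invented for illustration).
measured_latency_ms = {2: 10.0, 4: 14.0, 8: 22.0, 16: 65.0}


def estimated_iops(queue_depth: int, latency_ms: float) -> float:
    """Estimate throughput via Little's Law, assuming the queue stays full."""
    return queue_depth / (latency_ms / 1000.0)


def best_queue_depth(latency_by_depth: dict) -> int:
    """Return the queue depth with the highest estimated throughput."""
    return max(latency_by_depth,
               key=lambda qd: estimated_iops(qd, latency_by_depth[qd]))


for qd, lat in measured_latency_ms.items():
    print(f"queue={qd:2d}  latency={lat:5.1f} ms  "
          f"~{estimated_iops(qd, lat):6.1f} IOPS")
print("best queue depth:", best_queue_depth(measured_latency_ms))
```

With these sample numbers, throughput rises up to a queue depth of 8 and then falls at 16, because latency grows faster than the number of requests in flight.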

Page 26:

Performance Analysis and Tuning

Queue depth of 4 with variable read size

Throughput gain at the expense of latency

At 32 KB I/O, the throughput gain no longer keeps up with the increase in latency
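One way to locate the point where throughput gain stops keeping up with latency is to compare the relative change in each between successive read sizes. The sketch below does this for a fixed queue depth of 4. The latency values are invented for illustration, and the function names are hypothetical.

```python
QUEUE_DEPTH = 4
# Hypothetical sweep: read size in KB -> mean latency in ms (invented values).
latency_ms_by_size_kb = {4: 0.8, 8: 0.9, 16: 1.1, 32: 1.8, 64: 3.4}


def throughput_mb_s(size_kb: float, latency_ms: float,
                    queue_depth: int = QUEUE_DEPTH) -> float:
    """Estimated throughput; KB per ms equals MB per s (1 MB = 1000 KB)."""
    return queue_depth * size_kb / latency_ms


def first_diminishing_size(latency_by_size: dict):
    """Return the first size whose throughput gain is below its latency growth."""
    sizes = sorted(latency_by_size)
    for prev, cur in zip(sizes, sizes[1:]):
        gain = (throughput_mb_s(cur, latency_by_size[cur])
                / throughput_mb_s(prev, latency_by_size[prev]) - 1.0)
        growth = latency_by_size[cur] / latency_by_size[prev] - 1.0
        if gain < growth:
            return cur
    return None  # throughput kept pace with latency across the whole sweep


print("diminishing returns start at:",
      first_diminishing_size(latency_ms_by_size_kb), "KB")
```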

Page 27:

Good Performance Monitoring

Does not focus on the irrelevant

Alarms for known issues

Unless there is an increasing pattern

Page 28:

Good Performance Monitoring

Does not focus on the irrelevant

Alarms for known issues

Unless there is an increasing pattern
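One possible reading of this slide is that isolated minor events are ignored unless their count keeps climbing, while a known-issue threshold raises an alarm outright. The sketch below encodes that reading. The function names, the threshold, and the run length are all assumptions made for illustration.

```python
def has_increasing_pattern(counts, min_run=3):
    """True if each of the last `min_run` intervals exceeds the one before it."""
    if len(counts) < min_run + 1:
        return False
    recent = counts[-(min_run + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def should_alarm(counts, known_issue_threshold, min_run=3):
    """Alarm on a known-issue threshold, or on a steadily rising event count."""
    if not counts:
        return False
    return (counts[-1] >= known_issue_threshold
            or has_increasing_pattern(counts, min_run))


# Per-interval error counts for three hypothetical links.
print(should_alarm([1, 0, 2, 1, 1], known_issue_threshold=50))   # noise only
print(should_alarm([0, 1, 3, 6, 10], known_issue_threshold=50))  # rising trend
print(should_alarm([0, 0, 0, 0, 60], known_issue_threshold=50))  # known issue
```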

Page 29:

Effects of SAN performance monitoring

Eliminate internal and vendor finger-pointing

Receive advance warning of potential problems

Reduce business risk

Page 30:

Two recent customer case studies

Case 1: A SAN problem was the root cause of an application disruption

Case 2: A SAN problem was suspected as the root cause of an application disruption - but it was not the cause

Page 31:

Case 1: company profile

Large US insurance firm

Broad offering of insurance and financial products

10,000+ agents and employees

Large Microsoft Exchange implementation

Exchange data replicated to a remote site for backup and disaster recovery

Page 32:

Case 1: customer crisis

Exchange application slowed and became essentially unusable

User complaints flood IT

Business operations adversely impacted

Page 33:

Case 1: resolution efforts

Exchange server event log - no problems
Storage array log files - no problems
Primary and secondary DR links tested - no problems
Switch fabric manager - no problems
Exchange throughput still low - pressure mounting - but no way to diagnose the problem
Elapsed time = 8+ hours

Page 34:

Case 1: Modern Performance monitoring

Probed storage link – unusually high Exchange Completion Times – proved SAN is the problem
Storage array response – good
Remote replication acknowledgments – too long

Solution – re-route the DR traffic through secondary link – Exchange performance restored. Elapsed time = 30 minutes

Cause and Fix – Remote switch was busy dealing with an RSCN storm because of a bad HBA in an unrelated application server at the remote site – Replaced HBA
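As a rough illustration of the measurement, and assuming "Exchange Completion Time" here means the elapsed time between a command and its matching status response as seen on the link, the sketch below pairs the two from a captured trace. The record layout, host and array names, and timings are hypothetical and do not represent any analyzer's actual output format.

```python
def exchange_completion_times(frames):
    """Pair each command with its status frame; return completion times in ms."""
    started = {}
    completed = []
    for ts_ms, initiator, target, exchange_id, kind in frames:
        key = (initiator, target, exchange_id)
        if kind == "command":
            started[key] = ts_ms
        elif kind == "status" and key in started:
            completed.append((key, ts_ms - started.pop(key)))
    return completed


# Hypothetical trace: (timestamp in ms, initiator, target, exchange id, kind)
frames = [
    (0.0, "host1", "array1", 0x10, "command"),
    (0.5, "host1", "array1", 0x11, "command"),
    (1.2, "host1", "array1", 0x10, "status"),
    (48.5, "host1", "array1", 0x11, "status"),
]

SLOW_MS = 20.0  # assumed threshold for an "unusually high" completion time
for key, elapsed in exchange_completion_times(frames):
    flag = "SLOW" if elapsed > SLOW_MS else "ok"
    print(f"exchange {key[2]:#x}: {elapsed:.1f} ms {flag}")
```

Because the timing is taken from the link itself, it does not depend on what the host, switch, or array logs report.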

Page 35:

Sync replication impact on production

Remote replication enabled

Remote replication disabled

Page 36:

Case 1: summary

Normal business operations were quickly restored
Conclusive data that prevented finger-pointing
Without deep SAN performance monitoring/analysis: it would have taken extraordinary effort to get to the root cause and resolution
If deep SAN monitoring/analysis had been in place: problem would have been prevented

Page 37:

Case 2: company profile

Large UK financial services firm

Assets of £540 billion

Over 20 million customers

Major UK mortgage and savings provider and credit card issuer

Relies on Oracle databases for transaction processing systems

Page 38:

Case 2: problem statement

Sudden but intermittent slowdown of Oracle-based applications

Widespread user complaints driving high level of internal visibility

Business operations adversely impacted

SAN was assumed to be the problem

Page 39:

Case 2: Modern Performance Monitoring

Deep SAN monitoring/analysis solution already in place

Quickly determined that all SAN parameters were within normal ranges - problem was not within the SAN

Trending report indicated time of problem occurrence - IT tracked back to an application “enhancement”

Elapsed time = <30 mins

Page 40:

Case 2: Modern Performance Monitoring

Page 41:

Case 2: Modern Performance Monitoring

Increased link traffic

Page 42:

Case 2: summary

Quickly identified SAN was not the root problem
Identified exact time of problem manifestation – helped identify the root cause: poorly designed database query
Quickly restored normal business operations
Customer acknowledgement: without deep SAN monitoring/analysis solution, it would have taken days and many unproductive efforts to resolve

Page 43:

Where do you stand?

Are your networks being planned with appropriate, timely information, or are they just happening?
How are you monitoring performance? Do you know if your response times are degrading? Are your queue depth settings correct? How would you react to a brownout?
What would the impact be to your business of response times that were 6000% longer than you are seeing now due to errors or congestion?
Does your monitoring alert you to conditions that are irrelevant while not informing you of conditions that are likely to impact your business?
Are you flying blind when it comes to the health and performance of your SAN?

Page 44:

Thank You. Questions?

Or, contact us to:
Get a Finisar SAN assessment of your availability and performance needs
Walk through detailed SAN diagnostic scenarios
Schedule a web briefing for your organization

Today’s slides: www.finisar.com/webcast/NW1006.php