Trouble Shoot in Guide

94
Application T roublesho oting Guide

Transcript of Trouble Shoot in Guide

Page 1: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 1/93N E T W O R K S U P E R V I S I O N

ApplicationTroubleshooting Guide

Page 2: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 2/93

Table of contents

Introduction

Background The TCP Protocol The Li e o a Packet Other Troubleshooting Factors 7Why UDP is so Di erent Why is Understanding Troubleshooting Important 9

Application Flow The Amazing Sequence o Events in Connectingto a Server DHCP DNS Lookups ARP Broadcasting The Route to the Server 24Establishing the Connection to the Server 26Sender/Receiver Interaction 30 The Flow o Data Closing the Connection 40Understanding How Applications Fail 42 Failures with DNS Failures with ARP Failures with Routing 47 Problems in Establishing a Connection 51 Slow Responses rom Servers 53 Closing the Connection 54

Troubleshooting Applications Baselining

Five Key Steps to Success ul Application Troubleshooting 58 Determine the domain o the problem and

exonerate the network 58 Validating Connectivity to the Application Server 65 Determine the Network Path 70 Application Flow Analysis 74 Using OptiView ® Protocol Expert 78 Fix the Problem Validate the Fix

Document the Fix 8Case Studies Case Study 1: Obtaining Switch Statistics 87 Case Study 2: Investigating WAN Link Per ormance 89

Summary

www fukenetworks com1

Page 3: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 3/93

Fluke Networks 2

Guide to Troubleshooting Application Problems

IntroductionControversy began early on when computers were connected toeach other and programmers began to write applications that

sent in ormation back and orth between them. The controversy iswhether the apparent slowness in per ormance o the applicationis due to how the program is executing, or the slowness in thenetwork. When networks were rst used, links were very slow whencompared to direct connections between computers, printers anddisk drives. So, networks o ten were blamed or the waiting time

or the system to produce output.

However, today’s networks operate at vastly higher speeds. Whenusers get rustrated with the per ormance o an application, it isdi cult to isolate whether the lack o per ormance is in the pro-cessing in the network attached devices, or in the network itsel .

In this document, we will address this problem. In order to dothis we will need to discuss how applications work. In particular,we will ocus on how they use or ail to use the network. This will include both ailure to send in ormation in a timely manner and

ailure to respond to requests coming rom the network. We’ll also give you a guide in what to look or when troubleshooting

applications.

BackgroundCompanies have become almost totally dependent on computersto conduct their business. How many times have you been in aretail store or government o ce and heard a sta member say,

“We can’t help you because the system is down.” Companiesdepend on computers or a variety o tasks. Revenue andexpenditures are recorded in transaction processing systems.

Page 4: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 4/93

www fukenetworks com3

Application Troubleshooting Resource Center: www fukenetworks com/appts

Background

Orders are taken and inventories are adjusted in similar systems.Most written communications are now done by email ratherthan by letters that are typed and mailed. Even instant written

communications such as instant messaging is moving rom adstatus to serious message communications where business isnegotiated and deals are closed.

Beginning about 2000, businesses began to seriously evolvevoice communications over to data networks with VoIP (Voice overIP). Most communications carriers and a majority o large enter-prise companies use VoIP exclusively or voice messaging. Withthe explosion o YouTube and Internet TV, video is beginning tocome into enterprise networks as well. Very recently, a new areais collaboration. Under the label o Web 2.0, it represents a groupo tools that allow users to share and exchange voice, data andvideo or the purpose o collaborating on projects. These toolsand services are rapidly becoming popular with lawyers, marketinggroups, consulting rms and service providers such as architectsand engineers.

In TCP/IP networks, applications can be distinguished bythe protocol over which they run. Nearly 90% run over TCP(transmission control protocol). That is, they send and receivetheir messages in packets that have been built by TCP. The other10% use the UDP (user datagram protocol). These two protocolswere devised nearly orty years ago but have proven to beextremely robust and enduring. While modi cations in TCP havebeen made over the years, its essential operation has remainedalmost constant. In our document we’ll ocus on TCP applications,

Page 5: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 5/93

Fluke Networks 4

Guide to Troubleshooting Application Problems

since they represent the large majority o the instances in whichusers are reporting slowness in application per ormance.

The TCP ProtocolI you segment the tra c on a corporate network into categoriesdetermined by the purpose o that tra c, you would probably

nd a break down something like this: email (10%), transactionprocessing (25%), http web tra c (35%), broadcast and control (10%) and other tra c (20%). The only part o the mix that is

based primarily on UDP is in the “other” category. The remainderuse mostly TCP.

TCP is designed to be adaptive to the network. This means when itworks on behal o the application,it modi es how it is interacting with

the network. This is based on howthe network seems to be respond-ing to the requests and commandsit is sending into the network. I the network seems to be ast, it will become more aggressive in sending

in ormation. I it senses a slow-down or problem with packets beingdropped, it will quickly decrease therate at which it sends packets.

TCP is based on a well documented algorithm ( ormula or opera-tional procedure). However, individual operating systems such asMicroso t® Windows® XP and a particular version o Linux make

TCP has three main unctions:

(1) It handles the rate o fowbetween the two end applica-tions. (2) It assures reliability

or the delivery o all data. (3)It provides or the detection o errors and correction o data byusing retransmission o packetsthat are corrupted or dropped.

Page 6: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 6/93

www fukenetworks com5

Application Troubleshooting Resource Center: www fukenetworks com/appts

Background

slight variations in how they implement the algorithm. A detailedunderstanding o the algorithm is not necessary to troubleshoota network. Yet, its e ect is pro ound and later, we’ll see some

examples o how it works.

The adaptive nature o TCP is what makes it very use ul. It can beused over 56 kbps dial-up links and 10 Gbps links. The applicationprogrammer doesn’t have to deal with that signi cant di erence inlink speeds - TCP deals with it. Onthe other hand, this adaptive natureo TCP means that slowness in theapplication per ormance might be theway the application is coded or theway in which the application interactswith the network. I the applica-tion processing is slow, the data tobe transmitted isn’t sent to the TCPstack in a timely manner. This alsomeans TCP can’t deliver packets to thenetwork. In the other case TCP sensesa slow network and decreases the rate at which it o ers packets tothe network.

So, when TCP is the protocol being used, both network problemsand server problems are hidden by TCP’s adaptive nature.

IP Ne tw ork

Transpo rt

App lica t ion

Figure 1 TCP Holds the Applicationand Network Together

Page 7: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 7/93

Fluke Networks 6

Guide to Troubleshooting Application Problems

The Li e o a PacketLet’s consider a relatively simple model o client/server interactionwhere TCP is being used. Let’s say we have data to send rom the

client to the server. (It works almost identically when the serversends data to the client.) The client application creates a blocko data and sends it to the TCP so tware in the client’s computer.It’s placed in an output bu er and timers are started thatare associated with that block o data. TCP nds out rom theoperating system how much data it can send in each outgoing

TCP block called a segment . Usually it is 1460 bytes maximum.This segment is delivered to the IP so tware and the IP addresseso the sender and receiver are added. Then the IP so twarecontacts the operating system to say an outgoing message isready to be sent. When the operating system gives the okay,the outgoing packet is moved to the network inter ace card (NIC)

to be sent. In this entire process, the per ormance o the clientprocessor is critical to how ast this all can take place.

A ter the packet is sent rom the inter ace card, the networkper ormance is the critical actor. Packets can ollow a short, astroute, a slow, long route or any combination o links. It is common

or a packet to traverse 10-20 individual network links when goingover an Internet connection. Like the adage about the strengtho a chain depending on the weakest link, the speed through thenetwork depends on the slowest link. More o ten than not, theslowest link is the access link into the network and the egress linkout o the network.

Page 8: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 8/93

www fukenetworks com7

Application Troubleshooting Resource Center: www fukenetworks com/appts

Background

At the server, the TCP segment is received on the NIC, deliveredto the IP so tware and the process described above is reversed.During this process, it is the speed o the processing in the server

that a ects how quickly the data is received and the acknowledge-ment is prepared to be sent back.

In conclusion, there are three critical phases necessary or thesuccess ul delivery o the application message to the server’sapplication:

1. Processing o the message in the client to prepare it to be sent.

2. Network delivery.

3. Processing in the server when the message arrives.

Other Troubleshooting FactorsThere are some other things that make it di cult to troubleshootTCP applications. The TCP protocol isn’t very well understood. Justrecently in ormation about TCP’s operation has begun to appearin networking textbooks and industry whitepapers. Yet, it is thesingle layer o the seven layer model which has a signi cantinfuence on the per ormance o the application-to-networkinter ace. Because it isn’t well understood, network engineers andtechnicians usually don’t look at it as a source o explaining whatis causing a problem. O ten, they look at server con gurations,cabling issues, or system memory as explanations when actuallythese have little relationship to the underlying problem.

Page 9: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 9/93

Fluke Networks 8

Guide to Troubleshooting Application Problems

Also, TCP isn’t an especially e cient protocol. For years, datacommunications experts considered an overhead amount o 20%or more to be excessive. But, TCP routinely operates with 50% or

more overhead when the actual application data bytes and thetotal bytes are used to calculate the overhead.

Finally, as we will learn later in this document, TCP reacts veryaggressively to packets that are dropped within the network.When a single packet is dropped, a le trans er may decrease itsdelivery rate by 70% or more.

Why UDP is so Di erentWhile we won’t examine UDP in detail in this paper, it is worth-while to consider how it compares with TCP. That will ampli y whatwe have learned about TCP.

UDP makes no provision or fow control, error detection or correc-tion, or the reliability o data delivery. These responsibilities arepushed up to the application. There ore, since UDP involves muchless processing, poor per ormance is more o ten dependent on theper ormance o the network or the slowness o the processor unit.I the processor is slow, other applications will show the sameslowness. This makes the isolation o the cause o slowness lesscomplicated than in the case o TCP applications.

Page 10: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 10/93

www fukenetworks com9

Application Troubleshooting Resource Center: www fukenetworks com/appts

Background

Why is Understanding TroubleshootingImportantWhen applications appear to be slow, rustrated users waste time

and energy. Slow applications cause lost time which, in turn,means lost money. Also, IT sta time is misspent. In someinstances, help desk or PC analysts expend much o their timerebooting client stations or checking IP con gurations andcabling connections.

When an e ort is made to learn about troubleshooting applicationsand the skills are applied to the system to improve the per or-mance o both the network and the applications, it’s an investmentin the company. It pays o in dividends the same way that savingon shipping costs, reducing uel consumption or eliminating anyother ine ciency, pay o .

But troubleshooting TCP applications isn’t easy. Unlike voice orvideo, which are typically UDP based, application per ormancedegradation isn’t obvious. I the sound is bad or the TV screenshows distortion, everyone sees it immediately. But with TCPapplications like transaction processing systems, per ormancetends to be judged based on how the system behaved yesterday.I it is slower today than it was yesterday, users begin tocomplain. Even worse, i the application degrades slowly, thechange may not be detected at all and the company begins tolose money through the ine ciency.

Page 11: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 11/93

Fluke Networks 10

Guide to Troubleshooting Application Problems

Understanding Types o Applications

There are many examples o applications that will all under ourscrutiny. Web applications include both applications that actuallycommunicate over the Internet and any that use a browser suchas Internet Explorer® or Fire ox.® The latter are o ten calledweb enabled applications . It is becoming more common orapplications to be described as “browser-based.” This generallymeans that the user begins by opening a browser and connectingto the server application rom within that browser window. Mosto us have used these applications when we do on-line banking,order a movie, or reserve an airline ticket. What is important to usis that the underlying tra c will be transmitted between the clientand the server using HTTP (hypertext trans er protocol) or HTTPS(secure hypertext trans er protocol). Both o these are TCP basedapplication program inter aces.

SQL(structured query language) applications involve calls roma client application to a server database application or recordsor tables rom a SQL database. These can be, or don’t have to be,web enabled and use HTTP or HTTPS. For example, the clientapplication can present a result to a user that asks or a eld to

be completed on the screen. A ter lling in a value, it is sent tothe server to do a computation. The server takes the result anddetermines that another record needs to be returned to theclient. All o this interaction takes place under the control o theTCP protocol and will be a ected by TCP’s operation, the individual processors’ per ormances and the network’s per ormance.

Closely associated to database activity are transaction processing systems. These o ten overlap SQL database applications.

Page 12: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 12/93

www fukenetworks com11

Application Troubleshooting Resource Center: www fukenetworks com/appts

Background

In transaction processing systems, the emphasis is on takingrelatively small amounts o data and doing a calculation in a shortperiod o time. ERP systems that have individual components such

as an accounts payable module, inventory module, or order entrymodule are examples o transaction processing systems. When auser changes a customer’s telephone number, a transaction hasoccurred. The data to make such a change and the in ormationthat shows that the change occurred are trans erred betweenthe client and the server under the control o TCP.

Imaging systems , such as those that inventory photos or x-rays,are also very o ten TCP based. They are usually very large les andmoving them rom a server to a client is a bulk trans er. Such bulktrans ers are the reason TCP was originally designed. One majorapplication program inter ace that is used to do this is FTP( le trans er protocol). It can be executed rom the commandline, rom within a le trans er program or rom within a browser.A user clicks on a thumbnail o the image they want to view andthe FTP protocol is invoked to retrieve the image rom the server.FTP is one o the best examples o simple application o the TCPprotocol. The client identi es the le to be retrieved (in this casean image), and a separate connection is opened or the trans er.The emphasis in the operation is to send as much data as possibleas quickly as the network can handle it. These applications cansaturate links on networks more quickly than almost any other typeo application. As a result, they appear to be more sensitive to themethod o operation dictated within TCP. Small network problemsbecome very evident in a le trans er, especially i it is a large le.

Page 13: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 13/93

Fluke Networks 12

Guide to Troubleshooting Application Problems

Data warehousing systems are also dependent on bulk letrans ers. A data warehouse is generally considered to be a highlyorganized, indexed repository o a company’s electronic documents.

When a document needs to be retrieved, the client applicationindicates the document or documents or parameters that uniquelyde ne the in ormation needed. When the server applicationreceives this in ormation, it retrieves the in ormation and sendsit as a bulk le trans er. Consequently they are nearly always TCPbased. And, as a result, they behave very much like other le

trans er based processes.

While VoIP payload is trans erred between the phones using UDP,the call set-up o ten uses TCP. In the exchange that takes placewhen a phone goes o -hook, only small amounts o data aretrans erred. However, the process can involve ARP, DNS and theother process that typically de ne a connection to a server.Poor per ormance or ailure in this connection won’t a ect thequality o the call. It will be apparent in the time it takes to makethe connection or the lack o signaling tones to which we havebecome accustomed. For example, the user might dial but not heareither a busy signal or a ringing tone.

Another category o applications that use TCP are CRM(customer relationship management) systems . They are usuallya combination o transaction oriented and database systems.They also depend on TCP.

Be ore we delve more deeply into the application fow process,

we need to consider one more item.

Page 14: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 14/93

www fukenetworks com13

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

Application Flow

We’re going to consider six issues that a ect application fow:

1. DNS lookups

2. ARP resolution

3. Establishing the TCP connection

4. Sender/receiver (interaction)

5. Data fow

6. Closing the TCP connection

In this section we will not consider UDP applications because theytypically represent 15% or less o the tra c on the network and

it is easier to isolate the cause o the poor per ormance. We will discuss UDP later in the paper.

The Amazing Sequence o Events inConnecting to a ServerVery ew o us ever stop to consider what is actually taking

place when we click on an icon that tells our application to getsomething rom a server. But understanding the process can bevery help ul. Let’s start when our user sits down at their desk tobegin working with the application. We’ll assume the computer wasturned o over night and the rst thing they want to do is checkthe company’s home page or noti cations about weather related

cancellations.

Page 15: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 15/93

Fluke Networks 14

Guide to Troubleshooting Application Problems

Here is a typical sequence:

1. Computer o .

2. Computer powered up.

3. OS loaded, NIC detected.

4. TCP/IP stack checked: IP address, mask, de ault router(gateway) and DNS server known.

5. I DHCP is enabled, the DHCP server is contacted to receive

these values.

6. User indicates to start connection to server (e.g. open IE).

7. IE request is created or the de ault home page(e.g. www.google.com).

8. IP so tware looks up a DNS server IP address

(which we assume is not local).

9. IP so tware sends ARP request to local net to get MAC addresso router.

10. The router receives the broadcast, recognizes that it is thetarget o the query and sends a response containing its MACaddress onto the network as a broadcast.

11. IP so tware sends IP packet with DNS request to router,which orwards to DNS server.

12. DNS server returns IP address or www.google.com.

13. TCP/IP stack sends connection request to server at Google. TM

14. Network delivers connection request.

15. Connection established.

16. (Story to be continued later).

Page 16: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 16/93

www fukenetworks com15

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

It’s rather easy to see that there is much that can go wrong.We’ll be re erring back to this sequence later in our discussion.

With the dependence that companies have on TCP basedapplications, we now turn our attention to the six phases o the TCP connection between the client and server applications.

DHCPThere are two ways a user can make sure that a client has the

appropriate con guration parameters to connect to the network.One way is to assign the values through the operating system.In Windows,® a screen in the Network Settings allows entry o theIP address, subnet mask, de ault router, and DNS server.On the other hand, in the same window, the user can check theradio buttons to obtain these automatically. In this case, the

dynamic host confguration protocol (DHCP) will be used.Figure 2 illustrates the process.

client server

DHCPREQUEST

DHCPOFFER

DHCPDISCOVER

DHCPACK

Figure 2 DHCP Process

Page 17: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 17/93

Fluke Networks 16

Guide to Troubleshooting Application Problems

To begin, the client sends a local broadcast indicating it needsservice rom a DHCP server. This rame is called aDHCP Discover .Typically, all DHCP servers that hear the broadcast will respond

with a broadcast, which is called a DHCP O er . The o er will contain the con guration parameters the client needs. More thanone o er may be received since there may be multiple DHCPservers. The client chooses one o the o ers by sending a broad-cast called a DHCP Request . The servers that sent the o ers thatweren’t accepted realize they are no longer part o the process.

The server that made the accepted o er completes the assignmento the parameters by sending a broadcast called the DHCP

Acknowledgement .

In the event that the DHCP server is on a di erent network thanthe client, the local router that separates their networks mustknow to orward DHCP broadcasts. This orwarding action is calledDHCP relay. Routers do not normally orward broadcasts.

DNS LookupsBe ore, we briefy mentioned DNS. But now let’s consider it inmore detail. DNS re ers to both the protocol domain name services

and the device called the domain name server . While it seems tocontain a redundancy, a DNS server is a domain name servicesserver. The purpose o DNS is to match IP addresses o hosts tothe name that resides in that host. For example, i Apple BakerCompany, Inc. has the domain name abc.com registered with theInternet community and they have also registered the address

221.221.13.0, we say that abc.com maps to 221.221.13.0.

221.221.13.0 <-> abc.com

Page 18: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 18/93

www fukenetworks com17

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

When a process queries with a name and learns the associatedaddress, we say the name was resolved . Most o ten names areresolved to provide the address. However, it is sometimes

necessary to resolve the address to learn the name. This is knownas a reverse lookup. DNS server can do both o these. The DNSserver is a computer with a stored table that contains the mapping(or association) o many names and IP addresses. Note that theDNS table does not contain the hardware addresses o the devicethat has a particular IP address. That’s in a di erent table and

we’ll consider that later.

In the sequence o events on page 14, the client browser knew thehome page was www.google.com. In steps 10 and 11, you see thatthe client browser asked to be connected to the server at Google TM But the client’s TCP/IP so tware didn’t know the IP address orwww.google.com. As a result a query was sent to the DNS serverand the IP address was returned.

On some rare occasions, another method is used to resolve namesto addresses. Clients will store a host name table. This table actslike a local DNS table and also the administrator o the local computer to create xed, permanent associations between namesand IP addresses. This is occasionally used in manu acturingoperations where there is a need, or desire, to avoid a name server.This technique is not adequate or situation where the user mightconnect to the public Internet because the names that need to beresolved are unpredictable.

Page 19: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 19/93

Fluke Networks 18

Guide to Troubleshooting Application Problems

Now that we know something about how DNS works, let’s considerwhat can go wrong. First, it is important to understand thatapplications are rarely written to send messages or data to an

IP address. It is almost always sent to a server with a name.So, DNS is almost always invoked. Second, the address o the DNSserver that is con gured in the client or o ered by DHCP is o tenthe wrong server or one that isn’t easily reached as in Figure 3.I it is the wrong server, the name query will be relayed to anotherserver, and potentially to more servers until it reaches one that

can resolve the name. Consequently, to the client application,it appears the query was handled slowly by the local DNS server,when, in act, the query was resolved by a device that was much

urther away. In addition, or e ciency and redundancy purposes,many large enterprises have many DNS servers. In one largeuniversity, there are ten DNS servers. Users are emailed a list and

they may pick two to con gure their client. In a case like this, thelikelihood that the user picks the best two servers, assuming thereare two best, is less than 3%! So, in most cases i there are manyDNS servers, you can probably assume the best two haven’t beencon gured in the client.

Page 20: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 20/93

www fukenetworks com19

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

Figure 3 The Location o DNS

The packet that contains the query name is called a command.The response, coming rom the DNS server that contains the IPaddress, is called the response. In Figure 4 you can see three DNScommands and three corresponding responses.

DNS Serverclient

Local Network

Ideal

DNS query

client DNS ServerRemote NetworkInefficient

DNS query

14 router hops

Local Network

Page 21: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 21/93

Fluke Networks 20

Guide to Troubleshooting Application Problems

Figure 4 DNS Commands and Responses

So, in conclusion, DNS can be slow due to network slowness,improper or poor choice o DNS server, or a lost query. In thelatter case, usually the client automatically sends a second or thirdrequest a ter it times out on the rst request. DNS can also ail,i the query contains a name that can’t be recognized or i noDNS server can be ound. O these two possibilities, the second ismore common because the administrator o the computer lists anonexistent DNS server or provides an invalid address in the TCP/IPcon guration. For example, you might be surprised by how manyusers con gure their TCP/IP stack by inserting the company’s email or web server in the DNS server eld.

Page 22: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 22/93

www fukenetworks com21

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

ARP BroadcastingARP (address resolution protocol) is used when a device knowsthe IP address but doesn’t know the MAC address o the same

device. In our description o connection to the server on page 14,step 9 indicates what happens. The client was con gured with theIP address o the router. But sending the packet requires the MAC(hardware) address o the device. The ARP broadcast containsthe query and the response to the query lls the need and theaddresses are resolved. When the client learns the MAC address,

it will temporarily store it in a table called its arp cache(it is not common to use upper case letters in this case.)

Figure 5 The ARP Command

Figure 5 shows the arp table o a client. This table is available inWindows® clients by typing the command arp –a on the commandline. The entries in the arp table are usually stored or abouttwo minutes in Windows.® Then it is discarded. Other operatingsystems store the entries or di erent lengths o time. All operat-ing systems supporting TCP/IP will have such a table. The arp tableis a good place to look to rst to see i a device knows the addresso a particular local station. It is important to remember that thearp table contains only devices rom the local network (subnet).

Page 23: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 23/93

Fluke Networks 22

Guide to Troubleshooting Application Problems

Devices beyond a router will not be listed because they are onanother network.

Another view o this problem is illustrated here:

IP A <--> IP B

MAC x MAC y

Suppose the device with IP address A wants to send a message tothe device with IP address B. It knows its own MAC address but not

the MAC address o the other station. Its application directed themessage to be sent to IP address B. Or, possibly, its applicationdirected it be sent to a name at B and DNS provided the IP addressB. Either way, it still needs the MAC address o B to complete itstask. ARP will be invoked and the entry B and y will be added tothe arp table or temporary storage.

The ARP protocol is a necessary part o TCP network (in the caseo IPv4). ARP is also used or some more discretionary tasks.Sometimes re erred to as discovery techniques, ARP is used whena device like a server wants to see who is actually on a local network. By sequentially sending an ARP query to every possibleaddress and then watching who responds, it can build a list o devices that are actually unctioning on the network. Windows®browsing and many network troubleshooting tools implement thistechnique.

Figures 6 and 7 show an ARP query and the correspondingresponse.

Page 24: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 24/93

www fukenetworks com23

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

Figure 6 The ARP Query

Figure 7 The ARP Response

The Internet standard that speci es the ARP protocol also makesprovision or a process called reverse ARP. This allows a device tobroadcast a MAC address to nd out who has the corresponding IPaddress. While it might seem this could be use ul, it is rarely usedin practice.

Page 25: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 25/93

Fluke Networks 24

Guide to Troubleshooting Application Problems

The Route to the ServerSuppose that the client has now obtained the IP address o theserver and knows it isn’t on the local network. The client also

knows the MAC address o the router. The client has a connectionrequest that needs to reach the server. O course it will send it tothe router which will in turn orward it to another router, and so

orth until it arrives at the router on the server’s local network.Keep in mind, that when it reaches the destination network, thatrouter may need to invoke ARP to get the server’s MAC address.

But how many steps (links) will the connection request passthrough and how ast and reliable is each link? And, is the routeselected the optimal route? Usually that is determined by twothings: how the network was designed and what routing protocol was used. I a particular link is slow and was used to create thepath rom the client to the server, a response to the connection

request will be slow. Had a aster link been used to create the paththe response will be quicker.

In today’s IP networks, routers establish a set o links betweenthe potential sending and receiving networks using a routing

protocol. Investigating how such protocols work is beyond the

scope o this document, but it is important to understand a ewthings that are common to all routing protocols. First, they aredynamic and automatic. That is, they will build a set o linksamong themselves so that all devices can be reached by otherdevices. Second, they periodically update each other so that i alink ails, a new set o paths are created. Finally, they all make

provisions or static routes , or routes that are manually created and

Page 26: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 26/93

Page 27: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 27/93

Fluke Networks 26

Guide to Troubleshooting Application Problems

Establishing the Connection to the ServerThe method o creating the connection or all TCP applicationsis the same. It is widely known as the three-way handshake.

Here’s what happens.

Be ore data can be trans erred over the connection, certainparameters o operation must be agreed upon by the sending andreceiving TCP so tware modules. Recall that TCP is responsible orfow control, reliable delivery, and error control. In order to agree

on how this will be handled, each TCP stack exchanges certainin ormation during this three-way handshake. For example, in orderto establish the maximum number o bytes that will be permittedin each packet, the sender includes a statement o the intendedMTU (maximum transmission unit). While this is normally 1460bytes, it isn’t mandatory that value be used. I the other TCP

module receives this indication and doesn’t like the value andthinks it should be lower, it simply doesn’t acknowledge that itreceived it. The sender must take the hint and lower the value.

A second thing that must be established is how many bytes thereceiver can receive in total in its receive bu er. This value, calledthe window advertisement , is also indicated in the rst packetexchanged in the handshake. The other end records this value andany adjustments it receives so that it never overwhelms its partnerwith too much data. The exchange o these two values provide orfow control.

Another value that must be established is the point at which eachend will begin to count bytes o data. This is a random value calledthe initial sequence number and each announces their value in the

rst two steps o the handshake. From that initial value, every

Page 28: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 28/93

Page 29: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 29/93

Fluke Networks 28

Guide to Troubleshooting Application Problems

Port numbers or server processes generally are well-knownorregistered . That means that everyone agrees on which applicationprocess the number is associated with. For example, email o ten

uses ports 25 or 110. HTTP uses port 80. There are hundreds o others. It’s a good idea to learn most o the common ones. Likeso many other network topics, using your avorite search engineand the phrase “TCP port numbers” will lead you to many lists thatare available. In this step, the client will normally use a randomlychosen port. Windows® o ten selects a value in the range o 2000-

2500.

The SYN packet also contains the sender’s window advertisement.Also, it contains a request or the maximum transmission unit(MTU). Finally, it indicates the act that it is a SYN packet bysetting a single bit equal to one. That particular bit is never setto one unless SYN is being indicated.

In the second step in the three-way handshake, the server sendsa response called the SYN/ACK packet to the client. It containsits window advertisement, initial sequence number and the portnumbers o the two application processes. It also takes thesequence number it received, increases the value by one andreturns it as its acknowledgement value. This is how it tells theother TCP entity that it is now synchronized to its sequencenumbers. It will also change the value o a single bit, theacknowledgement bit to one. There ore, in its packet both theSYN bit and the ACK bit are one and that is why the packet iscalled the SYN/ACK packet.

Page 30: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 30/93

www fukenetworks com29

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

In the third and last step o the handshake, the client sendsa packet with the sequence number increased by one (That’scheating because it didn’t actually send data bytes. It’s the only

exception to the rule about sequence numbers.) This third packetallows the client to acknowledge that it received the server’sacceptance o the opening o the connection. The connection isnow prepared to allow data to fow between the two applicationsrepresented by the port numbers.

A common metric o per ormance o the network application is theround trip time (RTT). This is the time rom when the SYN is sentand the SYN/ACK is returned. This value is one o the things thesending TCP stack uses to determine how much data should beo ered to the network. I the RTT is low, the network appears tobe ast. I the RTT is high, the network or the server appears tobe slow and the TCP stack will adjust accordingly.

In Figure 10, an HTTP Get command is sent to the server in therst step. 65 ms. later that packet is acknowledged. This time

is the RTT. It is important to note that this value isn’t theapplication response time. This is the acknowledgement o theTCP protocol in the server. The application responds in the thirdstep shown in Figure 10.

Page 31: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 31/93

Fluke Networks 30

Guide to Troubleshooting Application Problems

Sender/Receiver InteractionThe most common model or interaction in the TCP/IP environ-ment is the client server model. In this model, the client makes a

request or some orm o service and the server ul lls this request.For example, the client browser may request the objects on a homepage. An ERP client may request a eld rom within a data recordon the server. As illustrated be ore, a client process may invoke FTPto receive a large le. In each case, the client is making a requestthat it hopes the server will ul ll. The server applications that

ul ll these requests are sometimes called services. So, we may saythat a particular service is or isn’t available on the server. The waya client knows whether or not a service (process) is available isby the response to the SYN packet. I the server responds to theSYN with a SYN/ACK, it is in orming the client that the service isavailable. For example, that’s why requests or web pages are sent

to port 80. That port identi es a server that is acting as a webserver. While it might also act as an email server, i it doesn’trespond to SYNs sent to port 25, it is in orming the client itdoesn’t do email services using the SMTP (simple mail trans erprotocol) system. That service is not available.

So, a ter a connection is established, a client makes a requestor in ormation. Say the request is or a web page element. The

server may acknowledge receipt o the request with an ACK packetcontaining no data. Then, it may send the le containing therequested in ormation. In act that is what is being illustratedin Figure 10.

Page 32: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 32/93

www fukenetworks com31

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

Figure 10 A Request and the Response

TCP acknowledgements are governed by a airly complicated seto rules. However, there is a basic set o behaviors we need tounderstand. First, not every packet needs to be acknowledged.The TCP stack, written or the operating system involved, mayimplement a procedure in which every packet is acknowledged,only occasional packets are acknowledged, or some combinationo the two rules. What is rather typical o all TCP stacks is thatdata is sent in a TCP segment and then the receiver waits until one o two things occurs. Either it receives another segment, or200 ms. transpire. I the 200 ms. pass and no additional segmentsare received, it will acknowledge the one it received. I a secondor third segment is received, it may acknowledge them at anypoint that it is ready to do so. Sometimes when the packets areexamined, you will see a very homogeneous pattern: two segmentssent, acknowledgement received, two more sent, acknowledgementreceived and so orth.

Page 33: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 33/93

Page 34: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 34/93

www fukenetworks com33

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

• It uses a technique called slow start . In this procedure one ortwo segments are sent and the acknowledgements are awaited.I they return quickly, the sender increases the block size (that is,

the number o segments it sends simultaneously). It will continueto increase the block size until one o two things happens: eitherthere is a time out because an acknowledgement isn’t received(such as when a segment has been discarded) or the total datasent is one-hal the receiver’s advertised window value.

Let’s consider some examples. Suppose the sender has a hundredor so segments o data to transmit. We show the interaction inFigure 11. However, rather than complicate things by using theactual acknowledgement values, we simply use consecutiveintegers that correspond to the number o blocks o data.

Step one starts a ter the three-way handshake has been completedand the client is ready to up load its in ormation. The client sendsa rame and awaits an acknowledgement. The acknowledgementreturns quickly as Ack1. In step 2, because the client’s TCP stackis using slow start and it has received one acknowledgement, itnow increases the send group to two segments and sends segment2 and segment 3. Again, the server responds quickly with Ack3.In step 3, it increases the outgoing group size to 4 because itreceived acknowledgements or two more segments. It sends all

our segments. Once again, the server quickly responds withAck7, assuring the client that it received all our segments.

Page 35: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 35/93

Fluke Networks 34

Guide to Troubleshooting Application Problems

1

client

time

2

server

1

3

4

Ack 1

2

3

4

7

8

Ack 12

Ack 3

Ack 7

12

}

Figure 11 TCP Operation

Page 36: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 36/93

Page 37: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 37/93

Page 38: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 38/93

www fukenetworks com37

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

In step 1, the client sends segment 1 and the server quicklyresponds with Ack1. In step 2, using slow start, the client sendstwo blocks o data and the server quickly responds with Ack3.

In step three, again because o slow start, the client sends oursegments, segments 4, 5, 6 and 7. But segment 6 is discardedby the network. When the server receives segments 4, 5 and 7,it doesn’t acknowledge receipt o 7. Instead, it acknowledgessegment 5, the last one that was success ully received in order .In step our, the client has received Ack5 but isn’t sure i six was

lost or is still wandering around the network waiting to arrive. So,the client sends just segment 8. Since, segment 6 isn’t wanderingaround but has been lost, the server quickly acknowledges receipto data but it acknowledges segment 5 again with Ack5. In step5, the client still isn’t permitted to assume that segment 6 is lostso it again sends one segment, segment 9. The server sends Ack5

again. Finally, because the client has received three duplicateacknowledgements (steps 3, 4 and 5), it is permitted to assumethat segment 6 has been lost and it retransmits that segment.Now that the server has segments 4 though 9, it sends Ack9. Asillustrated in the nal step, the client continues as it begins andsends segment 10. It will continue to implement slow start with

sending succeeding segments.

This illustration showed some rather remarkable acts. First,because a single rame was dropped, the sending TCP entityreacted very strongly by cutting back to sending one rame at atime until the problem was resolved by the retransmission. Keep inmind that i the segments each contained 1000 bytes, or example,this reaction may have been caused by a single bit error!

Page 39: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 39/93

Fluke Networks 38

Guide to Troubleshooting Application Problems

This clearly demonstrates that TCP does not behave well in errorprone environments such as those created by wireless or poorlycabled networks. When comparing the rst and second examples,

it was clear that a single bit error could reduce the throughput bythousands o bytes in the time interval illustrated.

Our third example involves a clean network but a severely over-whelmed server. It’s illustrated in Figure 13.

1

2

3

client

time

2

server

200 ms.

230 ms.

280 ms.

1

3

4

}}

}

Ack 1

Ack 3

Ack 5

4

5

6

7

Figure 13 Overwhelmed Server

Page 40: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 40/93

www fukenetworks com39

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

The client begins by sending a single segment in step 1. In orderto comply with delayed acknowledgement, the server waits 200ms. to see i more data will arrive. It does not. So, it sends Ack1.

In step 2, the client increases the outgoing bu er to two segmentsand sends segments two and three. The server’s reaction isimportant to understand. I the server were quickly processingthe incoming data, it would respond on or be ore the 200 ms.wait time. Instead, the server uses 230 ms. to acknowledge thatit received the two segments. Then it sends Ack3. In step three,

the client’s calculation o round trip time (RTT) tells TCP that thissituation involves either a slow network or slow server. One o the two is not operating optimally. So, rather than increase itsoutgoing number o segments as it did in the other examples, theclient sends two segments again, segments our and ve. In step3, the server is still operating sub optimally, so it takes it 280 ms.

to determine that it received two more segments. Then the serversends Ack5. In the nal step illustrated, step 4, the client still realizes the RTT is high, so it proceeds to send only two segmentsagain. A ter these segments are delivered, i the server wouldhappen to increase its per ormance (maybe it was busy withanother large computational task), the RTT would be a lower value.

The client’s calculation would indicate that it can increase thenumber o simultaneous segments it is sending.

This example points out an additional important point. A subtlepoint is illustrated by the act that the client never aggressivelyincreased the number o segments being simultaneously sent. Thisindicated slowness somewhere. The turn-around time at the server

Page 41: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 41/93

Page 42: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 42/93

Page 43: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 43/93

Fluke Networks 42

Guide to Troubleshooting Application Problems

In some o the early implementations o TCP applications, theconnection was closed by issuing a segment with the RESET bit setto one. While the other TCP stack might see this as an indication

the session was over, it might also correctly interpret it as anindication that the timers and sequence numbers were con usedand needed to be corrected. Issuing resets occasionally causesservers to retain resources or the TCP session long a ter they areno longer being used by the applications. O course, nearly thesame thing might be expected i the FIN segment were to be lost.

But in this case, the other end would not acknowledge receipt o the FIN segment and it would eventually be retransmitted. So, the

our-way handshake is the proper way or a session to be closed.

Understanding How Applications Fail

Failures with DNSIn order to understand how DNS can ail we need to look moreclosely at how it works. When a client wants to make a requesto a server, it usually has the name o the server such aswww.psu.edu or www.fukenetworks.com. DNS is a distributeddatabase that stores the names and the corresponding IP

addresses. The database in DNS is a hierarchical tree whichdenotes two things. First, every node (DNS server) is above orbelow another DNS server. Second, there is a single path betweenany two servers. The nodes close to the root o the tree are calledroot servers. They keep track o who knows all the names in theirown branch which is below them. This can be seen in Figure 15.

For example, the .com server knows who is responsible or a namelike fukenetworks.com because the su x is .com. Likewise, thenode fukenetworks.com knows who can resolve names containingsupport.fukenetworks.com.

Page 44: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 44/93

Page 45: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 45/93

Page 46: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 46/93

Page 47: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 47/93

Fluke Networks 46

Guide to Troubleshooting Application Problems

server 1 server 2

firewall

Where isServer 1?

Here Iam!

Figure 17 Proxy ARP

In this case when a station wants to communicate with Server 1,it will send an ARP broadcast. The rewall will respond with aunicast response. This creates the impression in the station thatthe rewall is, in act, Server 1. Network administrators like thistechnique because it completely hides the act that the two

servers are on a separate, protected network.

But proxy ARP is prone to being miscon gured. Especially in thecase o servers that need multiple IP addresses on a physical NICand in cases where it is used in mobile IP applications.

On some occasions con guration mistakes can lead to incorrect

ARP responses. Consider the example in Figure 18. Suppose theserver has been accidentally con gured with the incorrect address

Page 48: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 48/93

Page 49: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 49/93

Fluke Networks 48

Guide to Troubleshooting Application Problems

and most servers that act as routers use the RIP (routing in orma-tion protocol). That protocol is over thirty years old. Routers andservers that use the protocol choose the destination path based on

the number o hops (links between routers). However, no provisionis made to assess the speed or throughput o a link. Consequently,links might be in a path that is very slow when an alternative pathis not being used. The best way to spot a slow link is probably touse a tool like trace route. In Figure 8 (page 25) you can see thatlink 8 had a higher than normal RTT with 88 ms. While that may be

an aberration, repeated use o trace route might con rm that it isalways the slow link.

One actor that can be important is the network router discardingpackets. Among routers, there is no provision or retransmission.So, the client or the server must determine that the routers havediscarded packets. When they do, they will generally retransmit thepackets. But you will recall that we pointed out that this retrans-mission can happen signi cantly later and cause TCP to slow downits rate o delivery by a large actor. So, why would routers discardpackets? There are two main reasons. First, a packet arrives at arouter and it is corrupted. That is, it has been damaged and therouter detects that some in ormation it contains is incorrect.It won’t try to determine the correct in ormation. That woulduse valuable processor cycles. It discards the packet. In certaininstances, it will send a packet called an ICMP (Internet Control Message Protocol) packet back to the source indicating thediscard. But the packet is lost. The second thing that can happenis that the router simply becomes too congested (that is, busy).Nearly all routers will reach a threshold where they begin to

Page 50: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 50/93

Page 51: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 51/93

Page 52: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 52/93

www fukenetworks com51

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

or the cause and miss the actual source o the degradation inper ormance. Other causes o packet loss can be an over subscribedquality o service technique, bad NIC cards in routers, or

intermediate wireless links (such as wireless bridges).

Problems in Establishing a Connection

Applications communicate between ports. In act, in TCP andUDP discussions, we say the TCP session is de ned or uniquelycharacterized by these our parameters: the sending IP address,the sending TCP port number, the receiving IP address, and thereceiving TCP port. Programmers re er to the set o our parametersalong with the protocol as a socket . These our values

(IP address A, port x) <--> (IP address B, port y)

uniquely identi y the logical communications channel between theclient and the server.

client server

IP A IP B

APPLIC APPLIC APPLIC APPLIC APPLIC APPLIC APPLIC APPLIC APPLIC APPLIC

IP Destination Address BIP Source Address ASource Port X

Destination Port YFigure 20 Defning a Session

Page 53: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 53/93

Page 54: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 54/93

www fukenetworks com53

Application Troubleshooting Resource Center: www fukenetworks com/appts

Application Flow

Most servers have several open ports which represent servicesit will provide. For example, a server with ports 25 and 80 openwould be available to provide email and web page serving. I the

server actively announces these services, we say it advertises theservices. So, o ten a network support tool will scan the network tosee which devices have open ports and then send a SYN to veri ythe status o the port. Among the bad guys are hackers that areattempting to do reconnaissance on your network. They also wantto know which devices are servers and which ports are open.

But it isn’t likely that they have honorable purposes.

Slow Responses rom Servers

In the client-server relationship, when a client sends a request toa server, it is usually represented by a packet that has a unique

ormat which is determined by the application program inter ace.For example, with HTTP, when a client requests a home page orpart o a page rom a web server, an HTTP GET packet is sent to theserver. It contains a signi cant number o parameters such as theversion o HTTP being used, the date and time, the le requestedand other values. The GET is sent only a ter the client and sessionserver has been established by the three-way handshake.

How quickly the service request is ul lled may depend on several actors. The server may be slow to respond because it needs to

compute something or search or the le. In the case o HTTP,it may even send a response that indicates acknowledgemento the request without supplying the le to ul ll the request.It’s like saying, “I’m working on it.” The request may get lost or be

Page 55: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 55/93

Page 56: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 56/93

www fukenetworks com55

Application Troubleshooting Resource Center: www fukenetworks com/appts

Troubleshooting Applications

This may orce the operating system to keep memory resourcesand times active when they are no longer needed. Keeping theresources viable will take additional computational cycles rom

the server and decrease its overall per ormance.

Troubleshooting ApplicationsO ten troubleshooting is challenging and a “best” method isn’tobvious. Some technicians use the ew tools they are amiliar withto try to solve every problem. The adage, “When the only tool you

have is a hammer, everything begins to look like a nail, “ comesto mind. In network troubleshooting, technicians who know thephysical level sometimes attempt to solve a problem with cabletesters and meters. Technicians who have an RF communicationsbackground gravitate towards spectrum analyzers.

On the other hand, team members working on a problem mayeach have a di erent view o the same problem. The systemsanalyst will think o CPU per ormance, insu cient memory, and

ragmented disk space as the cause o the problem. The networkarchitect might see it as routing issues. Or, the manager o network in rastructure might decide to test the WAN links or

throughput or replace copper links with ber connections.

Good troubleshooting requires having a broad base o experience,a solid knowledge o the technology involved, and the availabilityo the proper tools.

Page 57: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 57/93

Fluke Networks 56

Guide to Troubleshooting Application Problems

BaseliningOne o the techniques that is used the least is baselining thenetwork. Yet, it is one o the most use ul techniques. Baselining isrecording data based on the “normal” per ormance o the network.By knowing how the network per orms when users are satis ed,you have a point o comparison when problems are reported. Havethe level o broadcasts gone up? Are there new protocols on the

network? Is utilization higher? Is the ERP application responsetime slower? Questions like these are impossible to answer withoutthe corresponding data gathered under normal operating

OptiView® Analyzers: Portableand Rack Models.

Fluke Networks’ OptiView Series IIINetwork Analyzers are available in twoorm actors to give you a clear view o

your entire enterprise – providing visibilityinto every piece o hardware, everyapplication, and every connectionon your network.

Choose the Integrated Network Analyzeror portable, all-in-one analysis or the

Workgroup Analyzer or permanent orsemi-permanent deployment that acts

as a “virtual network engineer 24/7” in the core or at remotesites. Or, use them together to create a power ul combination– a troubleshooting tool or the access layer and an analyzerwatching the core, remote site or critical network point. Networkpro essionals can conduct all the necessary tests at the remote sitewithout ever leaving the headquarters site.

Page 58: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 58/93

Page 59: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 59/93

Page 60: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 60/93

Page 61: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 61/93

Page 62: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 62/93

www fukenetworks com61

Application Troubleshooting Resource Center: www fukenetworks com/appts

Troubleshooting Applications

An easier approach is to use the same screen that we used totest DHCP. In gure 23 we can see that the OptiView has beencon gured to resolve the name DBase02.appsvr04. net.com. The

test was success ul and the IP address was returned in 140 ms.By clicking on the edit button on the right any speci c name canbe queried. By clicking on the Add DNS Server button, any speci cserver can be tested.

Figure 23 DNS Test using OptiView

A test which we use less requently is the reverse lookup.By submitting an IP address, the DNS server should return thecorresponding server that uses that name. While the command linetool nslookup can be used, you can add a reverse name lookup tothe OptiView by just entering the IP address in the screen thatpops up when you select a DNS Server and click Add . This is shownin Figure 24.

Page 63: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 63/93

Page 64: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 64/93

www fukenetworks com63

Application Troubleshooting Resource Center: www fukenetworks com/appts

Troubleshooting Applications

and ping allowing up to N hops (ping – i N). You can see the rsto these illustrated in Figure 25. The continuous ping is usuallystopped with CNTL-C. While there are slight variations o the ping

command across di erent operating systems, most support theusage described here.

Figure 25 Ping with Designated Payload

So, suppose we believe that we have a route that is droppingpackets that are over 1000 bytes because the MTU isn’t thestandard value 1460. We can send a ping that speci es a payloado 900 bytes ollowed by a ping that speci es a payload o 1100bytes. This will give us an indication. Keep in mind that thisdepends on the act that nothing in the path is blocking ICMPpackets or security reasons.

Ping connectivity tests are easy to con gure and run in theOptiView. In Figure 26 you can see the screen that appears whenyou select a device on the Discovery screen and click on Device

Detail. From here you can run ping tests and other tests that wewill discuss later.

Page 65: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 65/93

Page 66: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 66/93

www fukenetworks com65

Application Troubleshooting Resource Center: www fukenetworks com/appts

Troubleshooting Applications

Figure 28 The Ping Report

Validating Connectivity to the Application Server

There are three tasks to be accomplished here. First, we should seei we can connect to the application o interest. Second, we shouldmeasure the response time o the application. Third, we shouldcheck the vital statistics o the server.

Application Connectivity . As we discussed earlier, applicationscommunicate through ports. In particular, server applications usewell-known ports. Once the server has been selected, we can usethe same screen that we used to generate a ping test. The dropdown menu allows us to choose the TCP connectivity test. This isshown in Figure 29.

Page 67: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 67/93

Page 68: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 68/93

Page 69: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 69/93

Fluke Networks 68

Guide to Troubleshooting Application Problems

were, we get a list o the transactions, a Bounce Chart and atable o Transport Statistics about those transactions. From thetransport statistics we can see what the transaction was about

and how quickly the server responded. This is shown in Figure 34.

Figure 32 Server Overview Screen

Figure 33 HTTP Servers

Page 70: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 70/93

Page 71: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 71/93

Page 72: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 72/93

www fukenetworks com71

Application Troubleshooting Resource Center: www fukenetworks com/appts

Troubleshooting Applications

From the command line o the operating system trace route (layer3 only) is a utility that is actually an extension o the ping utility.In Windows,® i you type the command tracert x.x.x.x, a series o

ping packets are sent to the target. In the rst packet, the hopcount (TTL) is set to zero. There ore, the rst routing device thatreceives the packet is required to discard it and report its action tothe source. Then the utility increases the hop count in the packetto 1 and sends the ping. This packet passes thought the rstrouter. But that router is required to decrease the hop count by

one. So, when it arrives at the second router, the hop counthas been reduced to zero and the second router drops thepacket. However, like the rst router, it reports to the sourcethat it dropped the packet. You can see what is happening.Each successive ping packet goes one more hop. Also, each time apacket is discarded, the source gets a report telling it who dropped

the packet. By using a timer on each ping, the source can build areport like the one in Figure 36.

Figure 36 Trace Route

This creates a listing o the path that was ollowed rom the sourceto the target x.x.x.x. The trace route tool also gives us important

Page 73: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 73/93

Page 74: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 74/93

www fukenetworks com73

Application Troubleshooting Resource Center: www fukenetworks com/appts

Troubleshooting Applications

in ormation about how long it takes the ping to travel acrosseach link.

Most technicians either don’t know or don’t remember the switchesused with trace route. So, OptiView makes trace route an easy-to-use tool. In addition, it determines the layer two and layer threeroutes on a single screen and allows you to move on to router andswitch queries and more. Figure 37 shows a trace route analysis

rom OptiView.

Figure 37 OptiView Trace Route

When you nd a link that is slow or appears to have a problemon one o its attached links, you can query the device to lookat the link rom the perspective o the switch or router. Later,

we will consider a case study in which solving the problem usessuch queries.

Page 75: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 75/93

Page 76: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 76/93

Page 77: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 77/93

Page 78: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 78/93

www fukenetworks com77

Application Troubleshooting Resource Center: www fukenetworks com/appts

Troubleshooting Applications

Troubleshooting Intermittent Problems

When intermittent problems occur, most IT pro essionals turn to packetcapture to diagnose the problem. The OptiView analyzers make it easy totroubleshoot intermittent problems with advanced triggering and lter-ing to capture the tra c be ore, a ter or around the event occurrenceand ensures the event is captured the rst time to avoid doing randomtra c captures that may not contain anything o interest. To capture aspeci c event the analyzer must inspect the contents o each packet tosee i it matches a pattern or error message string which is indicativeo the event occurring. The string matching is per ormed in hardware

in real-time with line-rate gigabit capture to ensure all the relevantpackets are captured – a ter all, you can’t analyze packets that werenever captured. A total o eight sets o triggers or lters can be de nedto trigger a capture unattended or later analysis, allowing analysiswhen you have time, not when the event occurred.

Free String Match

Page 79: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 79/93

Page 80: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 80/93

www fukenetworks com79

Application Troubleshooting Resource Center: www fukenetworks com/appts

Troubleshooting Applications

Figure 41 The Capture Tab

This shows the details o the packets on the le t side o the screenand a bounce chart on the right side o the screen. The bouncechart shows the interaction and the time and size o packets inthe interaction. The packet detail on the le t is broken into threesections. The top o the screen shows a listing o the rames asthey were captured. The middle section shows the detailed decodeo the rame highlighted and the bottom section shows both theASCII value and the hex value o each byte in order.

From a portion o the bounce chart shown in Figure 42, you cansee the TCP connection being made between port 1080 on theclient and port 80 (HTTP) on the server. It takes 73.193 ms.

Page 81: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 81/93

Page 82: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 82/93

www fukenetworks com81

Application Troubleshooting Resource Center: www fukenetworks com/appts

Troubleshooting Applications

Figure 43 Server In ormation

The screen that appears will look like Figure 44. It contains a

wealth o in ormation about the exchange between the clientand the server you picked. Here you can look or evidence o poorserver per ormance, small payload sizes and other issues.

Reporting and documenting

Fluke Networks’ OptiView® Reporter works with the OptiView IntegratedNetwork Analyzers and Workgroup Analyzers to provide documenting andtrending capabilities rom one central location. The OptiView hardwareagents automatically per orm detailed device, VLAN and networkdiscovery, problem identi cation, and in rastructure device analysis.This in ormation, in turn, is automatically imported into OptiViewReporter or reporting, trending, and event noti cation.

Page 83: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 83/93

Fluke Networks 82

Guide to Troubleshooting Application Problems

Figure 44 Server Details

Starting rom the Overview screen, i you click on the Transactionstab, you can see all the transactions or a particular server. This isshown in Figure 45.

Figure 45 The Transactions Tab

Page 84: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 84/93

Page 85: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 85/93

Fluke Networks 84

Guide to Troubleshooting Application Problems

From the Overview screen, you can click on the Issues Tab.That will display all o the issues that OptiView ound in theconnection. In Figure 48 you see a list o retransmissions thatwere discovered. As we discussed in the section on TCP operation,retransmissions will cause the TCP timing algorithm to assume thelink is not robust. As a result it will slow down the rate at whichit o ers packets to the connection.

Figure 47 The Bounce Chart or Slow Server

Page 86: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 86/93

www fukenetworks com85

Application Troubleshooting Resource Center: www fukenetworks com/appts

Troubleshooting Applications

Figure 48 The Issues Tab

Fix the Problem

The third step in our ve step process is to x the problem.Because there can be a great number o causes, we can onlyassume you’ve ound it by this time. However, you should knowwhat single or what ew items combined to cause the inadequateper ormance o the application. Sometimes this step will be easy.

However, it can be both time consuming and expensive. Forexample, it could be as easy as changing the con guration o arouter. On the other hand, you might need a larger server or a

aster WAN link. However, it could also be the application itsel – as we pointed out earlier, TCP routinely operates with 50% ormore overhead, so imagine what is happening when an application

uses minimum length rames o 64 bytes. The MAC rame head-ers, IP and TCP headers and rame check sequences can occupy

Page 87: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 87/93

Fluke Networks 86

Guide to Troubleshooting Application Problems

Application Troubleshooting Resource Center: www fukenetworks com/appts

46 bytes o the rame, leaving only 18 bytes available to trans erdata. Badly written applications could require thousands o ramesexchanged to complete a single simple transaction.

Validate the Fix

I the problem a ected one particular user, spend some time to besure the user sees a di erence a ter the problem has been xed.I the entire network was down, thoroughly check that all partsare now up and unctioning well. I connectivity to a serverapplication was the issue, don’t just check connectivity to theserver. Check that the application can be reached by making surea complete transaction can occur. I a particular link was slow, runa series o ping tests over time and check back to make sure thatall o the response times are good. Taking a little extra care a ter a

x is made can lead to work time saved because the problem won’toccur again.

Document the Fix

It’s a very good idea to keep written records o what you x and tohave the records well organized. This is important to other individ-uals who may need to x a similar problem i you aren’t availableto consult. Also, records are necessary or documenting the needto acquire larger servers or test equipment. Documentation shouldinclude screen shots, notes stored in text documents, spreadsheets with incidents and time on task and a host o other kindso documents. What is important is to create a record or your

uture re erral and or others who may have need o such history.

Page 88: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 88/93

Page 89: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 89/93

Page 90: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 90/93

www fukenetworks com89

Application Troubleshooting Resource Center: www fukenetworks com/appts

Case Study 2: Investigating WAN LinkPer ormance

In this case study, we’ll see how the OptiView can be used toanalyze a WAN link that appears to be a bottleneck in the path tothe server. In this scenario, access and per ormance across a WANlink has been slow. The service provider is saying that the companyneeds an additional T-1 circuit.

In Figure 51 below, we begin with the Device Detail Inter ace

screen that is similar to the one in Case Study 1. This time we arequerying a router.

Figure 51 Inter aces on the Router

Page 91: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 91/93

Fluke Networks 90

Guide to Troubleshooting Application Problems

By highlighting the inter ace with the suspected WAN link, we cansee some o the inter ace statistics and see that it is, in act at T-1circuit running at 1.544 Mbps and that the MTU is 1500 bytes.

This means that IP packets with the standard maximum size o 1500 bytes will not be ragmented in order to traverse the link.By clicking on the bar graph button next to View . the screen inFigure 52 is produced.

Figure 52 Utilization o the WAN Link

This screen makes it obvious that the utilization o the WAN linkis only slightly under 50%. This screen is periodically updated,so you could study it over a period o time to be sure about yourconclusion. As we mentioned be ore, it would be wise to moveon to an analysis o the server or the application running on

the server.

Page 92: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 92/93

www fukenetworks com91

Guide to Troubleshooting Application Problems

Application Troubleshooting Resource Center: www fukenetworks com/appts

SummaryIn this document, we have provided in ormation on how networkapplications work. We have discussed the typical sequence or a

client to use when it makes a connection to a server and asks ora service to be provided. As part o that analysis we considered theDNS lookup, the role o ARP, how the TCP connection is opened,how the request is sent and the response is received, and howthe connection is closed. We considered the major categories o applications such as web applications, database applications and

transaction processing systems. During that explanation we notedthat 90% o the applications are based on the TCP protocol.

In analyzing how the client interacts with the server, we consid-ered the route to the server and how it can hide problems thata ect per ormance. We also did a brie tutorial on the operation

o TCP since its operation has such a signi cant infuence on howwell the application appears to per orm. We discovered that whenpackets are lost in the network, the impact on TCP per ormance issigni cant. This led us to the conclusion that the quality o thenetwork is very important.

In considering TCP operation, we described the three-wayhandshake that is used to create the connection, the methodo assuring reliability o the trans er o data and the our-wayhandshake that closes the connection.

Page 93: Trouble Shoot in Guide

8/8/2019 Trouble Shoot in Guide

http://slidepdf.com/reader/full/trouble-shoot-in-guide 93/93

Guide to Troubleshooting Application Problems

We moved on to a discussion o a Five Step Process to troubleshootapplications. We stress that a logical approach would include:(1) determining the domain o the problem; (2) per orming an

application fow analysis; (3) xing the problem; (4) validatingthat the problem is xed; and (5) documenting that was xed andhow it was xed.

Using the techniques suggested in this paper should reduce ngerpointing between your application developers and your networksta . And, it will increase your success in solving networkproblems, reducing costs to your company.

For more in ormation on Network Troubleshooting and ApplicationDiagnostics visit the Application Troubleshooting Resource Centerat

www.fukenetworks.com/apptsto request a ree 5-day network

analyzer trial, sign-up or upcoming events and downloadtechnical content