Sclends basic network troubleshooting

Is Evergreen Slowing Down – Basic Network Troubleshooting. Rogan Hamby, June 13th, 2013

Transcript of Sclends basic network troubleshooting

Page 1: Sclends basic network troubleshooting

Is Evergreen Slowing Down – Basic Network Troubleshooting

Rogan Hamby, June 13th, 2013

Page 2: Sclends basic network troubleshooting

Is Evergreen slowing down?

Page 3: Sclends basic network troubleshooting

There could be several culprits.

Page 4: Sclends basic network troubleshooting

Staff Client Issues

Page 5: Sclends basic network troubleshooting

There are known memory leaks in the staff client. These are being actively addressed by the community.

Page 6: Sclends basic network troubleshooting

If this is the cause, it probably isn’t affecting every station in the same way.

Reboot the troubled station.

Page 7: Sclends basic network troubleshooting

Network Issues

Page 8: Sclends basic network troubleshooting

From your local switch having fits, to a router in Tennessee dying, to someone in Atlanta doing a thirteen terabit backup, we are at the mercy of the pipes in between.

Page 9: Sclends basic network troubleshooting

Usually these problems will grow slowly. All machines will be affected, but it may not seem like that at first, as some activities are more prone to interruption.

Page 10: Sclends basic network troubleshooting

Staff who face patrons and staff whose functions move large data frames (e.g. cataloging) will usually notice first, because that is where lost packets and latency have the greatest perceivable impact.

Page 11: Sclends basic network troubleshooting

Now it’s important to look at your network path. The paths from SCLENDS member libraries to the hosting facility share many common elements, but the only hops shared by every library are the last few.

Page 12: Sclends basic network troubleshooting

If you use ICMP- or UDP-based tools, be aware of the false positives they can give, since those protocols are often blocked.

Page 13: Sclends basic network troubleshooting

I recommend that you use TCP-based traceroutes.

Page 14: Sclends basic network troubleshooting

Windows – Pingplotter Pro

http://www.pingplotter.com/pro/

Page 15: Sclends basic network troubleshooting

Linux – traceroute -T

Page 16: Sclends basic network troubleshooting

Mac – Path Analyzer Pro. Uses protocol paths, not just hops.

http://www.pathanalyzer.com/
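
When none of the tools above is at hand, a rough TCP-based check can still be run from any station with Python. The sketch below is not a traceroute: it only times the TCP handshake to the far end, so it shows whether latency or failed connections exist but not which hop causes them. The function names, the port, and the host name in the usage comment are assumptions for illustration, not part of the SCLENDS setup.

```python
import socket
import time


def tcp_connect_times(host, port, attempts=5, timeout=3.0):
    """Time `attempts` TCP handshakes to host:port.

    Returns one entry per attempt: the connect time in milliseconds,
    or None if the attempt failed or timed out.
    """
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # create_connection performs a full TCP handshake, so it is
            # not fooled by routers that drop ICMP or UDP probes.
            with socket.create_connection((host, port), timeout=timeout):
                elapsed_ms = (time.perf_counter() - start) * 1000.0
            results.append(elapsed_ms)
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results.append(None)
    return results


def summarize(results):
    """Reduce a list of timings to failure count and min/avg/max."""
    ok = [r for r in results if r is not None]
    return {
        "attempts": len(results),
        "failed": len(results) - len(ok),
        "min_ms": min(ok) if ok else None,
        "avg_ms": sum(ok) / len(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


# Usage (hypothetical host name; substitute your own Evergreen server):
#   print(summarize(tcp_connect_times("evergreen.example.org", 443)))
```

Running it from a troubled station and from a healthy one at the same time gives a quick comparison before reaching for a full TCP traceroute.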

Page 17: Sclends basic network troubleshooting

If the issue is on your local LAN or anywhere in SC and is ongoing, you need to address it either internally or with the state-level E-rate board.

Page 18: Sclends basic network troubleshooting

If the issue is outside SC, we can try to appeal for a remedy or some kind of routing change, but we can’t guarantee results.

Page 19: Sclends basic network troubleshooting

If the issue is at the hosting facility we can fix the issues immediately.

Page 20: Sclends basic network troubleshooting

Standard Traceroute

Page 21: Sclends basic network troubleshooting

TCP Based Path

Page 22: Sclends basic network troubleshooting

So… what if everything so far looks clear?

Page 23: Sclends basic network troubleshooting

It’s a SERVER(s)!

Page 24: Sclends basic network troubleshooting

Our Setup

Load Balancer

App Servers

Database Servers: Production; Replication and Reporting

Page 25: Sclends basic network troubleshooting

How can I tell which has just gone to meet Werner Jacob?

(warning: broad simplifications ahead)

Page 26: Sclends basic network troubleshooting

If it’s the DB servers, then everything goes to heck, starting with database retrieval, and the errors will usually say ‘SQL’ in them somewhere. But it’s quick!

Page 27: Sclends basic network troubleshooting

If it’s only the replication one, then only reports will be affected, including notices.

Page 28: Sclends basic network troubleshooting

App bricks – it’s very rare for all four app bricks to fail at once, so usually some machines will do fine while others have issues, or the problem appears random.

Page 29: Sclends basic network troubleshooting

Example: When catalogers have template issues, they may have lost them on one brick but not others.

Page 30: Sclends basic network troubleshooting

When a brick crashes, you will usually get errors referencing various PM files (Perl modules) or specific scripts.

Page 31: Sclends basic network troubleshooting

When it’s the load balancer – everything slows down painfully and everything goes to heck. Eventually stations will time out, and the errors will reflect that.
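
The rules of thumb above can be sketched as a small triage helper that reads a copied error message and suggests where to look first. This is only an illustration of the same broad simplifications: the function name and the keyword choices are assumptions, the sample messages in the comments are made up, and a timeout can just as easily come from the network path as from the load balancer.

```python
import re


def guess_culprit(error_text):
    """Suggest a first place to look, based on keywords in an error.

    Mirrors the slides' rules of thumb:
      'SQL' in the message          -> database server
      a .pm file or 'perl' mention  -> app brick
      a timeout                     -> load balancer (or network path)
    """
    text = error_text.lower()
    if "sql" in text:
        return "database server"
    if re.search(r"\.pm\b", text) or "perl" in text:
        return "app brick"
    if "timed out" in text or "timeout" in text:
        return "load balancer or network path"
    return "unknown"


# Made-up sample messages, for illustration only:
#   guess_culprit("ERROR: SQL statement failed")
#   guess_culprit("died at Example/Module.pm line 12")
#   guess_culprit("Network request timed out")
```

A result of "unknown" is not a sign that nothing is wrong; it just means the message doesn't match any of these patterns and should be reported as copied.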

Page 32: Sclends basic network troubleshooting

Don’t jump to conclusions, but these examples should give you some insight into the kinds of things to look for.

Page 33: Sclends basic network troubleshooting

Copy errors. Observe and report. Communicate on the listserv. An IRC channel specific to SCLENDS is also available. Call Rogan in an emergency (he’s not always at his desk).

Page 34: Sclends basic network troubleshooting