Post on 12-Jan-2016
NOC Services and Applications 1
NOC Services and Applications
AFNOG 2003
Sunday Folayan & Brian Longwe
Based on:
• Netmgt T4-98 by Scott Bradner
• Netmgt T2-99 by Abha Ahuja
• Afnog T2-2001/2 by Brian Longwe
What is Network Management?
“In order to operate a reliable service, the network must be managed according to a determined discipline, using a coherent structure of information management.”
Geoff Huston, ISP Survival Guide
What is a NOC?
The Network Operations Centre (NOC) monitors and manages a service provider’s network, providing:
• Information about current, historical and planned availability of systems
• Network status and operational statistics
• Fault monitoring and management
Engineers can coordinate their work through the NOC
Network Management - Components
Parts of Network Management
• Configuration/Change management
• Performance/Accounting management
• Fault management
• Security management
Configuration Management
Maintaining information relating to the design of the network and its current configuration
Network state
• Record of network topology
– Static: what is deployed, where it is deployed, how it is attached, who is responsible for it, how to contact them
– Dynamic: operational status of the network elements
Configuration Management
Inventory management
• database of network elements
• history of changes & problems
Directory maintenance
• all hosts & applications
• nameserver database
Host and service naming coordination
• "Information is not information if you can't find it"
Configuration Management
Operational control of the network
• Start/stop individual components
• Alter configuration of devices
• Load and save config versions
• Hardware/software upgrades
Methods of access
• SNMP Get / SNMP Set
• Out-of-band access
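As an illustration of loading and saving config versions, here is a minimal Python sketch that keeps saved copies of each device's configuration and shows what changed between two of them using the standard-library `difflib`. The `ConfigArchive` class, the device name and the sample config lines are hypothetical; collecting the configs from the routers is assumed to happen elsewhere, and a production archive would live on disk or in version control rather than in memory.

```python
import difflib
from collections import defaultdict
from typing import Dict, List


class ConfigArchive:
    """Hypothetical in-memory store of saved configuration versions per device."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = defaultdict(list)

    def save(self, device: str, config: str) -> int:
        """Store a new version (skipping exact duplicates); return its 1-based number."""
        versions = self._versions[device]
        if not versions or versions[-1] != config:
            versions.append(config)
        return len(versions)

    def load(self, device: str, version: int) -> str:
        """Return a saved version, e.g. to push it back after a bad change."""
        if version < 1:
            raise IndexError("versions are numbered from 1")
        return self._versions[device][version - 1]

    def diff(self, device: str, old: int, new: int) -> List[str]:
        """Unified diff between two saved versions of one device's config."""
        return list(difflib.unified_diff(
            self.load(device, old).splitlines(),
            self.load(device, new).splitlines(),
            fromfile=f"{device} v{old}",
            tofile=f"{device} v{new}",
            lineterm="",
        ))
```

Diff lines beginning with `-` were removed and lines beginning with `+` were added, so a bad change can be spotted quickly and the previous version retrieved with `load()`.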
What is SNMP?
Simple Network Management Protocol
• query/response system
– can obtain status from a device
– standard queries
– enterprise-specific queries
• uses a database defined in the MIB
– Management Information Base
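To make the query/response model concrete, the Python sketch below builds an SNMPv1 GetRequest by hand with ASN.1 BER (tag-length-value) encoding and sends it to UDP port 161. It assumes SNMPv1, a read community of `public` and the MIB-II object sysUpTime.0 (1.3.6.1.2.1.1.3.0); the helper names are hypothetical and the reply is returned undecoded.

```python
import socket


def encode_length(n: int) -> bytes:
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def tlv(tag: int, value: bytes) -> bytes:
    """Tag-length-value triple, the building block of every SNMP message."""
    return bytes([tag]) + encode_length(len(value)) + value


def encode_int(v: int) -> bytes:
    """ASN.1 INTEGER in minimal two's-complement form."""
    size = max(1, (v.bit_length() + 8) // 8)
    return tlv(0x02, v.to_bytes(size, "big", signed=True))


def encode_oid(oid: str) -> bytes:
    """OBJECT IDENTIFIER: first two arcs packed as 40*a+b, the rest base-128."""
    arcs = [int(a) for a in oid.split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.insert(0, 0x80 | (arc & 0x7F))
            arc >>= 7
        body.extend(chunk)
    return tlv(0x06, bytes(body))


def build_get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    """SNMPv1 GetRequest for a single object."""
    varbind = tlv(0x30, encode_oid(oid) + tlv(0x05, b""))  # value is NULL in a request
    pdu = tlv(0xA0, encode_int(request_id)
              + encode_int(0)                 # error-status
              + encode_int(0)                 # error-index
              + tlv(0x30, varbind))           # variable-bindings list
    return tlv(0x30, encode_int(0)            # version: 0 means SNMPv1
               + tlv(0x04, community.encode())
               + pdu)


def snmp_get_raw(host: str, community: str, oid: str, timeout: float = 2.0) -> bytes:
    """Send the request to UDP port 161 and return the undecoded response."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_get_request(community, oid), (host, 161))
        reply, _ = sock.recvfrom(65535)
        return reply
```

For example, `snmp_get_raw("192.0.2.1", "public", "1.3.6.1.2.1.1.3.0")`, with a documentation address standing in for a router. The reply is a GetResponse PDU (tag `0xA2`) with the same layout; in practice `snmpget` from Net-SNMP or an SNMP library handles both the encoding and the decoding.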
What do we use SNMP for?
Query routers for:
• in and out bytes per second
• CPU load
• uptime
• BGP peer session status
Query hosts for:
• network status
• message queues
• Web traffic
• Squid proxy load
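Routers do not report bytes per second directly: SNMP exposes running counters such as `ifInOctets` and `ifOutOctets` (IF-MIB), and a poller such as MRTG derives a rate from the difference between two samples. A minimal Python sketch, assuming 32-bit counters that wrap at most once between polls; the sample values are illustrative. On fast links the 64-bit `ifHCInOctets`/`ifHCOutOctets` counters avoid ambiguous wraps.

```python
COUNTER32_MODULUS = 2**32  # ifInOctets/ifOutOctets are Counter32 values


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase between two samples, allowing for a single wrap past the maximum."""
    return (current - previous) % modulus


def octets_per_second(previous: int, current: int, interval_s: float,
                      modulus: int = COUNTER32_MODULUS) -> float:
    """Average rate over the polling interval, in bytes per second."""
    if interval_s <= 0:
        raise ValueError("polling interval must be positive")
    return counter_delta(previous, current, modulus) / interval_s


def utilisation_percent(bytes_per_s: float, link_bytes_per_s: float) -> float:
    """Share of the link's capacity in use (multiply bytes by 8 for bit/s)."""
    return 100.0 * bytes_per_s / link_bytes_per_s
```

With two samples taken 300 seconds apart, `octets_per_second(prev, curr, 300)` gives the average bytes per second over that five-minute window, even if the counter wrapped in between.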
SNMP Exercise
Please complete the SNMP exercise.
Configuration Management
[Figure: SNMP-driven network display]
Performance Management
A consistent level of network performance
• Data collection
– interface statistics
– throughput
– error rates
– usage
– percent availability
• Data analysis for performance metrics and trends
• Establishment of performance thresholds
• Capacity planning and deployment
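Percent availability is usually reported per period, for example per month. A minimal Python sketch, assuming outages are recorded as (start, end) timestamps in seconds, e.g. taken from fault tickets or from the monitoring system's down/up events; the function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def percent_availability(period_start: float, period_end: float,
                         outages: Iterable[Tuple[float, float]]) -> float:
    """Percentage of [period_start, period_end) not covered by any outage.

    Outages are clipped to the reporting period, and overlapping ones (e.g. the
    same failure reported by two monitors) are merged so downtime is not counted twice.
    """
    total = period_end - period_start
    if total <= 0:
        raise ValueError("reporting period must have positive length")
    clipped = sorted((max(s, period_start), min(e, period_end))
                     for s, e in outages if e > period_start and s < period_end)
    downtime = 0.0
    run_start: Optional[float] = None
    run_end: Optional[float] = None
    for s, e in clipped:
        if run_end is None or s > run_end:      # gap: close the previous run
            if run_end is not None:
                downtime += run_end - run_start
            run_start, run_end = s, e
        else:                                   # overlap: extend the current run
            run_end = max(run_end, e)
    if run_end is not None:
        downtime += run_end - run_start
    return 100.0 * (total - downtime) / total
```

One hour of downtime in a 30-day month, for instance, comes out at about 99.86% availability.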
Importance of Network Statistics
• Accounting
• Troubleshooting
• Long-term trend analysis
• Capacity planning
Two different types:
• active measurement
• passive measurement
Management tools have statistical functionality.
MRTG Traffic Analysis for Hssi1/0/0
System: msu.mich.net in
Maintainer:
Interface: Hssi1/0/0 (2)
IP: hssi1-0-0.msu.mich.net (198.108.22.102)
Max Speed: 5630.6 kBytes/s (propPointToPointSerial)
MRTG
Check out http://noc.ws.afnog.org/mrtg
Performance Management Tools
netflow
• cflowd (http://www.caida.org/tools/measurement/cflowd)
– collects flow information from Cisco routers
– AS-to-AS information
– source and destination IP and port information
– useful for accounting and statistics
– how much of my traffic is port 80?
– how much of my traffic goes to AS237?
Netflow examples
Top ten lists (or top five)
##### Top 5 AS's based on number of bytes #######
srcAS  dstAS      pkts       bytes
 6461    237   4473872  3808572766
  237    237  22977795  3180337999
 3549    237   6457673  2816009078
 2548    237   5215912  2457515319
##### Top 5 Nets based on number of bytes ######
Net Matrix
----------
number of net entries: 931777
SRCNET/MASK       DSTNET/MASK           PKTS       BYTES
165.123.0.0/16    35.8.0.0/13         745858  1036296098
207.126.96.0/19   198.108.98.0/24     708205   907577874
206.183.224.0/19  198.108.16.0/22     740218   861538792
35.8.0.0/13       128.32.0.0/16       671980   467274801

##### Top 10 Ports #######
port  in packets    in bytes out packets   out bytes
119     10863322  2808194019     5712783   427304556
80      36073210   862839291    17312202  1387817094
20       1079075  1100961902      614910    62754268
7648     1146864   419882753     1147081   414663212
25       1532439    97294492     2158042   722584770
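Top-N tables like these are simple aggregations over flow records. A minimal Python sketch, assuming the flows have already been exported by the router, collected (for example by cflowd) and decoded into a simplified record; the `Flow` type and its field names are hypothetical, although NetFlow v5 records do carry source/destination AS, ports, packet and byte counts.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow record."""
    src_as: int
    dst_as: int
    dst_port: int
    packets: int
    octets: int


def top_n(flows: Iterable[Flow], key: Callable[[Flow], Hashable],
          n: int = 5) -> List[Tuple[Hashable, int]]:
    """Sum octets per key and return the n largest totals, biggest first."""
    totals: Counter = Counter()
    for flow in flows:
        totals[key(flow)] += flow.octets
    return totals.most_common(n)


def share_percent(flows: Iterable[Flow], predicate: Callable[[Flow], bool]) -> float:
    """Percentage of all octets carried by flows matching predicate."""
    flows = list(flows)
    total = sum(f.octets for f in flows)
    matched = sum(f.octets for f in flows if predicate(f))
    return 100.0 * matched / total if total else 0.0
```

`top_n(flows, key=lambda f: (f.src_as, f.dst_as))` yields an AS-pair table, and `share_percent(flows, lambda f: f.dst_port == 80)` answers "how much of my traffic is port 80?" for one direction; a fuller report would also count flows with source port 80 (the return traffic).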
Accounting Management
What do you account for?
• Use of the network and the services it provides
Types of accounting data
• RADIUS/TACACS accounting data from access servers
• Interface statistics
• Protocol statistics
Accounting data affects business models
• Bill on usage?
• Flat-rate billing?
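Usage-based billing needs per-user byte totals, which RADIUS accounting provides. A minimal Python sketch, assuming accounting records have already been parsed into dictionaries keyed by RADIUS attribute name (`User-Name`, `Acct-Status-Type`, `Acct-Input-Octets`, `Acct-Output-Octets`, and the `Acct-*-Gigawords` attributes that count 32-bit wraps); the function names and the price per gigabyte are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

WRAP = 2**32  # Acct-*-Octets are 32-bit; Acct-*-Gigawords count the wraps


def session_octets(record: Mapping) -> int:
    """Total bytes (in + out) reported by one accounting record."""
    octets_in = record.get("Acct-Input-Gigawords", 0) * WRAP + record.get("Acct-Input-Octets", 0)
    octets_out = record.get("Acct-Output-Gigawords", 0) * WRAP + record.get("Acct-Output-Octets", 0)
    return octets_in + octets_out


def usage_by_user(records: Iterable[Mapping]) -> Dict[str, int]:
    """Sum usage per user from Stop records (each holds the session's final totals)."""
    usage: Dict[str, int] = defaultdict(int)
    for record in records:
        if record.get("Acct-Status-Type") == "Stop":
            usage[record["User-Name"]] += session_octets(record)
    return dict(usage)


def usage_charge(octets: int, price_per_gb: float) -> float:
    """Usage-based bill; a flat-rate plan would ignore octets entirely."""
    return round(octets / 1e9 * price_per_gb, 2)
```

Only Stop records are summed here because, under this assumption, they carry the complete session totals; Start and Interim-Update records would otherwise double-count.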
Fault Management
Identify the fault
• Regular polling of network elements
Isolate the fault
• Diagnosis of the network components
Respond to the fault
• Allocate resources to resolve the fault
• Priority scheduling
• Technical/management escalation
Resolve the fault
• Notification
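Regular polling produces false alarms if a single lost reply marks a node as down. One common approach, shown here as a Python sketch, is to raise an alarm only after several consecutive failed polls and to report recovery on the first success. The threshold of three polls and the `NodeMonitor` name are assumptions, and the poll itself (ping, SNMP get) is represented only by its boolean result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeMonitor:
    """Tracks one network element; the poll itself happens elsewhere."""
    name: str
    down_after: int = 3   # hypothetical threshold: consecutive failures before alarming
    failures: int = 0
    is_down: bool = False

    def record_poll(self, ok: bool) -> Optional[str]:
        """Feed one poll result; return "DOWN" or "UP" on a state change, else None."""
        if ok:
            self.failures = 0
            if self.is_down:
                self.is_down = False
                return "UP"
            return None
        self.failures += 1
        if not self.is_down and self.failures >= self.down_after:
            self.is_down = True
            return "DOWN"
        return None
```

A "DOWN" result is the point at which the NOC would open a ticket and notify on-call staff; further failures while the node is already down produce no repeat alarms.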
Fault Management - systems
Reporting mechanism
• link to NOC
• notify on-call personnel
Set up & control alarm procedures
Repair/recovery procedures
Ticket system
Fault Management - Fault Detection
Who notices a problem with the network?
• Network Operations Centre with 24x7 operations staff
– open a trouble ticket to track the problem
– preliminary troubleshooting
– assign an engineer to the problem or escalate the ticket status
• Customer calls
• Other ISPs
Fault Management - Fault Detection (cont.)
How can you tell if there is a problem with the network?
• Network monitoring tools
– common utilities: ping, traceroute, SNMP
– monitoring systems: NOCol, Big Brother, NetSaint, NMIS, HP OpenView, etc.
• Report state or unreachability
– detect node down
– routing problems
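ping needs ICMP raw sockets, which usually require root privileges, so this Python sketch swaps in a different active check: it times a TCP connection to a service the node should be running (for example SSH on port 22 of a router, or port 80 on a web server). It is a substitute for ping, not a reimplementation of it; the function name, host and port are placeholders.

```python
import socket
import time
from typing import Optional


def tcp_check(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return TCP connect time in milliseconds, or None if the service is unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; nothing is sent
    except OSError:  # refused, timed out, no route to host, ...
        return None
    return (time.monotonic() - start) * 1000.0
```

Fed into a poller, a `None` result counts as a failed poll; the connect time itself doubles as a rough active measurement of latency to the service.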
Exercise: Big Brother
• Download the Big Brother source from http://t2.ws.afnog.org/downloads.htm
• Follow the instructions at http://t2.ws.afnog.org/bigbrother-setup-notes.txt
• Set up bb-hosts to monitor the routers of the other tables in the class
Fault Management - Ticket System
Very important! Need a mechanism to track:
• failures
• current status of outages
• carrier tickets
Fault Management - Ticket System
The system provides for:
• short-term memory & communication
• scheduling and work assignment
• referrals and dispatching
• oversight
• statistical analysis
• long-term accountability
Fault Management - Ticket Usage
• Create a ticket for ALL calls
• Create a ticket for ALL problems
• Create a ticket for ALL scheduled events
• A copy of the ticket is mailed to the reporter and mailing list(s)
• All milestones in resolving the problem are logged under the same ticket #
• The ticket stays "open" until the problem is resolved
• The ticket reporter determines when the ticket should be closed
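The ticket rules above can be sketched as a small Python class: every milestone is logged under the same number, every update returns the list of people who should get a copy, and only the reporter can close the ticket. The `Ticket` class, its numbering scheme and the mailing-list address are hypothetical; a real system such as OTRS also handles queues, priorities and mail delivery.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List

_next_number = count(1)  # hypothetical ticket numbering


@dataclass
class Ticket:
    subject: str
    reporter: str
    notify: List[str] = field(default_factory=list)  # e.g. NOC mailing list(s)
    number: int = field(default_factory=lambda: next(_next_number))
    status: str = "open"
    log: List[str] = field(default_factory=list)

    def add_milestone(self, who: str, text: str) -> List[str]:
        """Log a step under the same ticket number; return the addresses to mail a copy to."""
        if self.status != "open":
            raise ValueError(f"ticket {self.number} is closed")
        self.log.append(f"{who}: {text}")
        return [self.reporter, *self.notify]

    def close(self, requested_by: str) -> None:
        """Only the reporter decides that the problem is resolved."""
        if requested_by != self.reporter:
            raise PermissionError("only the reporter can close this ticket")
        self.status = "closed"
```

An engineer can add milestones but cannot close the ticket; it stays open until the reporter confirms the fix.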
Fault Management - Ticket Example
Sample opening ticket
Subject: Fix sshd on T2 instructor machines
Serial Number: 6
Area: none
Queue: afnog-noc
Requestors: pfs@cisco.com
Owner: inst
Status: resolved
Last User Contact: Wed Jun 11 17:02:21 2003 (30 hr ago)
Current Priority: 1
Final Priority: 1
Due: No date assigned
Last Action: Wed June 11 17:02:21 2003 (30 hr ago)
Created: Mon June 9 14:08:08 2003 (2 days ago)
Fault Management - Ticket Example
Sample progress ticket
TT0000033975 has been MODIFIED. Here are the fields that have been changed:
CopyOfTime : 5
TTC Temp : 0
Ticket information log : toppi@umich.edu said ...
While I was investigating this, Debbie from UUNet called (via Merit main number) to tell us they were seeing it down. She can be reached at xxx-xxxx. The UUNet ticket is xxxxx..
Fault Management - Ticket Example
Sample closing ticket
• includes previous ticket contents plus resolution

Users on the laptop station minihub are not getting correct DHCP responses. No gateway or DNS entries are returned.
Thanks, - Hervey

-- CUSTOMER INFORMATION ---------------------
'inst' (AFNOG Instructors)
-------------------------------------
There have been several issues. First, the Cisco config-switch was set so the box would forget its config on a power cycle (and we've had a few). Second, I made a typo when I cleaned up a DNS file. Things *should* be working now (famous last words). Resolving this till I hear otherwise.
GJ
----------------------------------------------------------------
> otherwise.
> GJ
Many thanks! - Hervey
Exercise: Ticket System
• Download the OTRS source from http://t2.noc.ws.afnog.org/downloads.htm
• Follow the instructions at http://t2.noc.ws.afnog.org/OTRS-setup-notes.txt
• Create 2-3 users within the ticket system
• Create tickets to track network occurrences as they occur - network failures will be provided ;-)
Fault Management - typical failures
• Node unpingable
• No IP connectivity to router
• Possible reasons:
– serial link down: call telco
– router down/hardware problem: call engineer
– routing problem: troubleshoot with traceroute and the routeviews machine
Security Management: Do’s & Don’ts
• Don’t leave things that are likely to be interesting to mice lying on the kitchen table overnight
• Plug the holes that mice are using to get into the house
• Don’t provide places within the house for mice to build nests
• Set traps along walls where you often see mice out of the corner of your eye
• Check the traps daily to rebait them and to dispose of squashed mice. Full traps don’t catch mice, and they smell
• Avoid using commercial bait-and-kill poisons. Traditional snap traps are best
• Get a cat!
Security Management - Tools
Security tools
• cops - host configuration checker (www.cert.org)
• swatch - email reports of activity on machine
• TCP Wrappers - log connections, restrict access
• ssh/skey - crypto authentication and communications
• Tripwire - monitor changes to system files
Keep up to date with security information
• bug reports
– CERT advisories mailing list: http://www.cert.org./contact_cert/certmaillist.html
• bug fixes
• intruder alerts
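Tripwire's core idea, recording a cryptographic digest of important system files and later reporting any that changed, fits in a short Python sketch. This is not Tripwire itself: it hashes file contents only (Tripwire also checks permissions, ownership and other attributes), and the function names and file paths are hypothetical. The baseline should be stored where an intruder cannot rewrite it, for example on read-only media.

```python
import hashlib
import os
from typing import Dict, Iterable, List


def snapshot(paths: Iterable[str]) -> Dict[str, str]:
    """Map each existing file to the SHA-256 digest of its contents."""
    digests = {}
    for path in paths:
        if not os.path.isfile(path):
            continue  # missing files show up as "removed" in compare()
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        digests[path] = h.hexdigest()
    return digests


def compare(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files added, removed or modified since the baseline was taken."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(p for p in baseline.keys() & current.keys()
                          if baseline[p] != current[p]),
    }
```

Typical use is to take a baseline of files such as `/etc/passwd` and system binaries right after installation, then run `compare()` from cron and mail any non-empty result to the administrators.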
Security Management – Good Practice
Reporting procedure for security events
• e.g. break-ins
• abuse email address for customers to report complaints (abuse@your-isp.net)
Control internal and external gateways
• control firewalls (external and internal)
Security log management
• centralised logging host
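For centralised logging, each host forwards its messages to the logging host using syslog. A minimal Python sketch using the standard library's `logging.handlers.SysLogHandler`, which sends over UDP port 514 by default; the function name, the log host and the choice of the `auth` facility for security events are assumptions.

```python
import logging
import logging.handlers


def make_remote_logger(loghost: str, port: int = 514, name: str = "noc") -> logging.Logger:
    """Logger whose records are sent over UDP to a central syslog host."""
    handler = logging.handlers.SysLogHandler(
        address=(loghost, port),
        facility=logging.handlers.SysLogHandler.LOG_AUTH,  # assumed facility for security events
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

For example, `make_remote_logger("loghost.example.net").warning("failed login for root")` sends one syslog datagram to the hypothetical host `loghost.example.net`. Because UDP syslog is unauthenticated and can drop messages, the logging host should accept traffic only from your own address space.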
How do I manage my network?
Which tools should I use? What do I really need?
• Keep it simple!
• Need to consider engineers working remotely
• Don’t want to spend too much time maintaining the tool (it should be helping you!)
• Different tools for NOC and engineers
• Different tools for statistics
• RELIABILITY!
References
• http://www.merit.edu/ipma/docs/isp.html
• http://www.nanog.org
• http://www.caida.org
• http://www.nlanr.net
• http://www.cisco.com
• http://www.amazing.com/internet/
• http://www.isp-resource.com/
• http://www.merit.edu/ipma
• http://www.ripe.net
More Tools!
• http://www.caida.org/Tools/
– OC3Mon/Coral
• http://www.merit.edu/~ipma
– RouteTracker
– IRRj
– ASExplorer
• http://www.geektools.com/
• http://www.merit.edu/ipma/tools/other.html
ASexplorer
Route Flap Stats
Looking Glass Tools
route-views.oregon-ix.net>show ip bgp 35.0.0.0
BGP routing table entry for 35.0.0.0/8, version 56135569
Paths: (17 available, best #12)
  11537 237
    198.32.8.252 from 198.32.8.252
      Origin incomplete, localpref 100, valid, external
      Community: 11537:900 11537:950
  2914 5696 237
    129.250.0.3 (inaccessible) from 129.250.0.3
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 2914:420
  2914 5696 237
    129.250.0.1 (inaccessible) from 129.250.0.1
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 2914:420
  3561 237 237 237
    204.70.4.89 from 204.70.4.89
      Origin IGP, localpref 100, valid, external
  267 1225 237
    204.42.253.253 from 204.42.253.253
      Origin IGP, localpref 100, valid, external
      Community: 267:1225 1225:237
http://www.merit.edu/~ipma/tools/lookingglass.html
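Looking-glass output is easy to post-process. In the route-views sample every path to 35.0.0.0/8 ends in AS237, the origin AS, and the path `3561 237 237 237` shows AS-path prepending. A Python sketch that pulls AS paths out of Cisco-style `show ip bgp` output, assuming each path sits on a line of its own containing only AS numbers; lines such as `Local`, AS sets in braces or confederation segments in parentheses are simply ignored, and the function names are hypothetical.

```python
import re
from typing import List

# A Cisco-style AS-path line holds nothing but AS numbers, e.g. "  3561 237 237 237"
AS_PATH_LINE = re.compile(r"^\s*\d+(?: \d+)*\s*$")


def as_paths(show_ip_bgp: str) -> List[List[int]]:
    """Extract every AS path from 'show ip bgp <prefix>' output."""
    return [[int(asn) for asn in line.split()]
            for line in show_ip_bgp.splitlines() if AS_PATH_LINE.match(line)]


def prepend_count(path: List[int]) -> int:
    """Number of repeated adjacent ASNs, i.e. how much the path was padded."""
    return sum(1 for a, b in zip(path, path[1:]) if a == b)
```

The origin AS of each path is `path[-1]`; if different paths disagree on it, the prefix is being originated by more than one AS, which is worth investigating.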
More Looking Glass Tools
Traceroute servers http://www.merit.edu/ipma/tools/trace.html
Query: trace Addr: www.isoc.org
Translating "www.isoc.org"...domain server (206.205.242.132) [OK]
Type escape sequence to abort.
Tracing the route to info.isoc.org (198.6.250.9)

  1 iad1-core2-fa5-0-0.atlas.digex.net (165.117.129.2) 0 msec 0 msec 4 msec
  2 dca5-core2-s5-0-0.atlas.digex.net (165.117.53.41) 0 msec 4 msec 0 msec
  3 dca5-core1-fa5-1-0.atlas.digex.net (165.117.56.117) 4 msec 0 msec 4 msec
  4 Hssi3-1-0.BR1.DCA1.ALTER.NET (209.116.159.98) 0 msec 0 msec 4 msec
  5 101.ATM2-0.XR1.DCA1.ALTER.NET (146.188.160.226) [AS 701] 4 msec 0 msec 4 msec
  6 195.ATM7-0.XR1.TCO1.ALTER.NET (146.188.160.102) [AS 701] 4 msec 0 msec 0 msec
  7 193.ATM8-0-0.GW1.TCO1.ALTER.NET (146.188.160.33) [AS 701] 4 msec 4 msec 4 msec
  8 charlie.isoc.org (198.6.250.1) [AS 701] 8 msec 8 msec 8 msec
  9 info.isoc.org (198.6.250.9) [AS 701] 8 msec * 12 msec
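Traceroute output can be post-processed the same way. The Python sketch below parses Cisco-style hop lines like those in the trace to www.isoc.org into hop number, name, address, optional AS number, round-trip times and lost probes. The `Hop` record and function name are hypothetical, and other traceroute implementations format their output differently.

```python
import re
from typing import List, NamedTuple, Optional


class Hop(NamedTuple):
    number: int
    name: Optional[str]      # None when no probe was answered
    ip: Optional[str]
    asn: Optional[int]       # only present if the router does AS lookups
    rtts_ms: List[int]
    lost: int                # probes shown as '*'


HOP_LINE = re.compile(r"\s*(\d+)\s+(.*)")
RESPONDER = re.compile(r"([^\s*]+)(?:\s+\((\d+(?:\.\d+){3})\))?(?:\s+\[AS (\d+)\])?")


def parse_traceroute(text: str) -> List[Hop]:
    hops = []
    for line in text.splitlines():
        m = HOP_LINE.match(line)
        if not m:
            continue  # headers such as "Tracing the route to ..."
        number, rest = int(m.group(1)), m.group(2)
        r = RESPONDER.match(rest)
        name = r.group(1) if r else None
        ip = (r.group(2) or r.group(1)) if r else None  # bare-IP hops have no "(...)"
        asn = int(r.group(3)) if r and r.group(3) else None
        rtts = [int(x) for x in re.findall(r"(\d+) msec", rest)]
        hops.append(Hop(number, name, ip, asn, rtts, rest.count("*")))
    return hops
```

`[h for h in parse_traceroute(text) if h.name][-1]` gives the last hop that answered; when a trace dies partway, that hop and the one after it are where to start looking for a routing problem.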
SNMP Tool references
• MON - http://www.kernel.org/software/mon/
• NOCol - ftp://ftp.navya.com/pub/vikas/nocol.tar.gz
• Sysmon - ftp://puck.nether.net/pub/jared
• Rover - http://www.merit.edu/~rover
• Concord - http://www.concord.com
• http://www.merit.net/~netscarf