NAPUS Performance & Availability Reporting, June 2012, John Sherwood
![Page 1: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/1.jpg)
NAPUS Performance & Availability Reporting
June, 2012 John Sherwood
![Page 2: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/2.jpg)
What is NAPUS?
• Network Availability, Performance, and User Support Working Group (NAPUS-WG)
– Formed under CANARIE Technical Committee at CANHEIT 2011
• July 4, 2011: Inaugural NAPUS meeting
– Goal: “To enable national consistency across Canada for measuring network availability and performance..”
• Chairs: Andree Toonk (BCnet) and JF Amiot (Cybera)
![Page 3: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/3.jpg)
NAPUS Sub-Committee
• Sept 13, 2011 meeting set up sub-committee “... to work on set of best practices and recommendations for Availability and Performance reporting”
• Two reports commissioned:
– “Network Availability and Performance Monitoring and Reporting”
– “Reporting and tracking Multi-domain Lightpath service issues”
• Reports were received, approved, and sent to NAPUS, Tech Committee, and OAC in March/April
![Page 4: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/4.jpg)
Network Availability and Performance Monitoring and Reporting
Andree Toonk
Jean-Francois Amiot
Jun Jian
Gerry Miller
John Sherwood
Thomas Tam
![Page 5: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/5.jpg)
Goal of the Report
• “... to provide definitions and guidelines for measuring and reporting network operational status in a standardized way”
• Attempt to report Availability or Performance in a single number
– e.g. “99.97% availability during March”
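To give a feel for what a single number such as “99.97% availability during March” implies, the sketch below converts an availability figure into the time the service may spend outside the available state. The function name `downtime_budget` is hypothetical, and the calculation assumes the window is the full 31 days of March.

```python
from datetime import timedelta


def downtime_budget(availability: float, window: timedelta) -> timedelta:
    """Time a service may be non-available in `window` and still report `availability`.

    Hypothetical helper for illustration; `availability` is a fraction in [0, 1].
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return window * (1.0 - availability)


march = timedelta(days=31)  # assumption: the whole calendar month is the window
budget = downtime_budget(0.9997, march)
print(round(budget.total_seconds() / 60, 1))  # about 13.4 minutes
```

In other words, 99.97% over March leaves a budget of roughly 13 minutes for the whole month.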
![Page 6: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/6.jpg)
What is Availability?
• A service, such as a network, is engineered to certain design criteria.
• The service is “available” if it meets those design criteria.
![Page 7: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/7.jpg)
What is Performance?
• Wikipedia says: “Network performance refers to the service quality of a telecommunications product as seen by the customer.” (http://en.wikipedia.org/wiki/Network_performance)
– i.e. Performance is in the eye of the beholder
• Abortive attempt to quantify: [formula garbled in extraction; the surviving symbols are J, L, D, R and P]
![Page 8: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/8.jpg)
Availability vs. Performance
• Availability is quantifiable and measurable.
• Performance is much more subjective.
• Therefore, NAPUS decided to focus their effort on Availability.
![Page 9: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/9.jpg)
Step one: Availability of What?
• Define “Service”
– “...an entity with well defined endpoints, characteristic parameters, and performance criteria”
• A Service could be a network, a web server, or some other definable entity.
![Page 10: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/10.jpg)
Step two: Endpoints
• For a web server, there is only one endpoint
• For a network there are two endpoints
– must be accurately defined
– typically unidirectional
![Page 11: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/11.jpg)
Step three: Define Parameters
• Characteristic parameters define how well a service behaves
• Some possibilities for networks:
– BER (bit error rate), mostly useful for layer 1 links
– latency
– jitter
– packet loss, measured at layer 2 or 3
– bandwidth
![Page 12: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/12.jpg)
Step four: Performance Targets
• Each parameter should have a performance target
• May have a secondary (“degraded”) target
• Service is considered “available” if it meets all of its targets
• Availability is “unknown” if data is missing
![Page 13: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/13.jpg)
Example
| SERVICE TITLE | IP Transport, NBnet to CANARIE Halifax |
| --- | --- |
| Endpoint1 | NBnet perfSONAR station |
| Endpoint2 | CANARIE Halifax perfSONAR station |
| Latency performance target | ≤ 10.0 msec |
| Latency performance target (degraded service) | > 10.0 and ≤ 35.0 msec (to allow for failure of the Fredericton-Halifax link, and rerouting of traffic through Montreal) |
| IP Successful Delivery, long term | ≥ 0.9995 in any 24 hour period |
| IP Successful Delivery, short term | ≥ 0.998 in any 10 minute period |
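As a rough illustration of how targets like these could be applied to one measurement, the sketch below classifies a single sample using the latency targets (10.0 and 35.0 msec) and the short-term delivery target (0.998) from the example. The class and function names are hypothetical, and treating a missed delivery target as “unavailable” is an assumption, since the example defines no degraded delivery target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Targets:
    """Targets taken from the NBnet to CANARIE Halifax example (names are hypothetical)."""
    latency_ok_ms: float = 10.0        # latency performance target
    latency_degraded_ms: float = 35.0  # latency target for degraded service
    delivery_min: float = 0.998        # IP Successful Delivery, short term


def classify(latency_ms: Optional[float],
             delivery_ratio: Optional[float],
             t: Targets = Targets()) -> str:
    """Return 'operational', 'degraded', 'unavailable' or 'unknown' for one sample."""
    # Missing data means the status cannot be determined.
    if latency_ms is None or delivery_ratio is None:
        return "unknown"
    # Assumption: delivery has no degraded target, so missing it means unavailable.
    if delivery_ratio < t.delivery_min or latency_ms > t.latency_degraded_ms:
        return "unavailable"
    if latency_ms > t.latency_ok_ms:
        return "degraded"
    return "operational"


print(classify(8.2, 0.9999))   # operational
print(classify(22.0, 0.9990))  # degraded
print(classify(40.0, 1.0))     # unavailable
```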
![Page 14: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/14.jpg)
Availability Definitions
| TERM | DEFINITION |
| --- | --- |
| Operational | A service is considered “operational” if it meets all of its performance targets for a non-degraded service. This is the status at a moment in time. |
| Degraded | A service is considered “degraded” if it meets all of its performance targets, except that one or more of the targets it meets are defined as a degraded target (e.g. longer than normal latency, but still usable) |
| Unavailable | A service is considered “unavailable” if it fails to meet one or more of its operational or degraded performance targets. |
| Availability | The fraction of time over a defined window during which a service is considered to be “operational”. This is the status over time rather than at a particular moment. |
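The “Availability” definition can be sketched as a simple fraction over a series of per-interval statuses. This is a minimal sketch under two assumptions that the slides do not state: every status covers an interval of equal length, and “degraded”, “unavailable” and “unknown” intervals all count against availability because only “operational” time is counted. The function name is hypothetical.

```python
from typing import Sequence


def availability(statuses: Sequence[str]) -> float:
    """Fraction of equal-length intervals in the window that were 'operational'."""
    if not statuses:
        raise ValueError("no measurements in the window")
    operational = sum(1 for s in statuses if s == "operational")
    return operational / len(statuses)


# Hypothetical window of eight intervals, for illustration only.
window = ["operational"] * 6 + ["degraded", "unavailable"]
print(availability(window))  # 0.75
```

A report could also publish the share of “unknown” intervals next to this figure, so that missing data is not mistaken for an outage.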
![Page 15: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/15.jpg)
Sample “Core” Network
![Page 16: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/16.jpg)
Meta Service
• Sample network is too complex to define as a service
• So, define “meta service” as a set of simpler services, e.g. {S1, S2, ... Sn}
• Then, 5 minute measures from each service are aggregated & time sorted
![Page 17: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/17.jpg)
Meta Service States
| META SERVICE STATUS | DEFINITION |
| --- | --- |
| Operational | All of the most recent results from all services are “operational” |
| Degraded | Any of the most recent results from any of the services are “degraded” and all others are “operational” |
| Unavailable | One or more of the most recent results from any of the services are “unavailable” |
| Unknown | Any of the most recent results from any of the services are “unknown” |
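The state table can be sketched as a function over the most recent status of each member service. The table does not say which state wins when one service is “unavailable” and another is “unknown”; the sketch assumes “unavailable” takes precedence, and that choice is an assumption. The function name and the service labels are hypothetical.

```python
from typing import Mapping

VALID = {"operational", "degraded", "unavailable", "unknown"}


def meta_status(latest: Mapping[str, str]) -> str:
    """Combine the most recent status of each service into a meta-service status."""
    if not latest:
        raise ValueError("a meta service needs at least one member service")
    states = set(latest.values())
    if not states <= VALID:
        raise ValueError(f"unexpected status values: {sorted(states - VALID)}")
    # Assumption: 'unavailable' outranks 'unknown'; the source table leaves this open.
    if "unavailable" in states:
        return "unavailable"
    if "unknown" in states:
        return "unknown"
    if "degraded" in states:
        return "degraded"
    return "operational"


print(meta_status({"S1": "operational", "S2": "degraded", "S3": "operational"}))  # degraded
```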
![Page 18: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/18.jpg)
Mtl->Wpg Latency
![Page 19: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/19.jpg)
Mtl->Hfx Latency
![Page 20: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/20.jpg)
Mtl->Hfx Latency June 7
![Page 21: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/21.jpg)
Possible explanations
• Traffic burst
– but 10msec @ 2Gbps is 20Mbits, or more than 1500 normal ethernet packets!
• Measurement error
– perfSONAR station load
– clock error (is ntp hiccupping?)
– ...??
• Router queuing (packet has low priority)
• Packet re-routing
• Maybe it is real
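The traffic-burst arithmetic can be checked with a few lines. The sketch assumes a “normal” Ethernet packet of 1500 bytes, which is not stated on the slide, and uses hypothetical names throughout.

```python
def queued_bits(delay_ms: int, link_bps: int) -> int:
    """Bits a link carries during `delay_ms` milliseconds at `link_bps` bits per second."""
    return delay_ms * link_bps // 1000


PACKET_BYTES = 1500  # assumption: full-size Ethernet payload

bits = queued_bits(10, 2_000_000_000)  # 10 msec at 2 Gbps
packets = bits // (PACKET_BYTES * 8)

print(bits)     # 20000000 bits, i.e. 20 Mbits
print(packets)  # 1666 full-size packets
```

A 10 msec latency excursion would therefore need a queue of more than 1600 full-size packets ahead of the probe, which is why a plain traffic burst looks like an unlikely explanation.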
![Page 22: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/22.jpg)
Recommendations
• CANARIE maintain perfSONAR at each core router, IX, etc.
• Each ORAN measure IP Transport availability
• CANARIE and each ORAN report monthly on network availability
• These reports be published
![Page 23: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/23.jpg)
perfSONAR Workshop
• Cybera will host the free Internet2 perfSONAR workshop on October 1 at Summit2012
![Page 24: NAPUS Performance & Availability Reporting June, 2012John Sherwood.](https://reader035.fdocuments.us/reader035/viewer/2022062720/56649ef65503460f94c0a55a/html5/thumbnails/24.jpg)