SAA Troubleshooting
8/10/2019 SAA Troubleshooting
1/42
Troubleshooting Network Degradation with SAA
Rujipars Aungkhanawin
Alcatel-Lucent
March 25, 2013
SAA Configuration Review
GUI Configuration in 5620-SAM
Configuration Generated on NE
Script Files
Naming Convention
ICMP Ping Test Name
I_BASD03_01/1/10-BPBI13_02/2/20
I_TASD03_01/1/10-TPBI13_02/2/20
ETH-CFM Test Name
E_BASD03_01/1/10-BPBI13_02/2/20
E_TASD03_01/1/10-TPBI13_02/2/20
SAA Name Example
Use one Word
Use Four Words
PNCABKCA17W
Node Name & CLLI
CSN & WiFi Gateway
I_TASD03_01/L002-TPBI13_02/L001
I_TASD03_01/L001-TPNCA1_02/1/12
Note: Don't use LAG for TUC testing
01/1/10 = Card/MDA/Port number
L001 = LAG 1
I = ICMP ping
E = ETH-CFM
B = BFKT
T = TUC
ASD03 = Node name (tested node)
PNCA7
Replace Code
CWT = MTG
HAM = PBI
SRW = BRK
PTN, RCU, RN MSC
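The naming convention above is regular enough to decode mechanically. The following is a minimal Python sketch, not part of the original slides. The names `parse_saa_name` and `normalise_port` and the regular expression are illustrative assumptions, inferred only from the example names shown here: a test-type letter, then a network letter, node name and port for each end of the link.

```python
import re

# Assumed layout, inferred from the example names on this slide:
#   <I|E>_<B|T><node>_<port>-<B|T><node>_<port>
NAME_RE = re.compile(
    r"^(?P<test>[IE])_"
    r"(?P<src_net>[BT])(?P<src_node>[A-Z0-9]+)_(?P<src_port>[A-Z0-9/]+)-"
    r"(?P<dst_net>[BT])(?P<dst_node>[A-Z0-9]+)_(?P<dst_port>[A-Z0-9/]+)$"
)

TEST_TYPES = {"I": "icmp-ping", "E": "eth-cfm"}
NETWORKS = {"B": "BFKT", "T": "TUC"}


def normalise_port(port):
    """Turn '04/2/12' into '4/2/12'; leave other forms (e.g. LAG) unchanged."""
    parts = port.split("/")
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        return "/".join(str(int(p)) for p in parts)
    return port


def parse_saa_name(name):
    """Split an SAA test name into its assumed components."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"name does not follow the assumed convention: {name!r}")
    g = m.groupdict()
    return {
        "test": TEST_TYPES[g["test"]],
        "src_net": NETWORKS[g["src_net"]],
        "src_node": g["src_node"],
        "src_port": normalise_port(g["src_port"]),
        "dst_net": NETWORKS[g["dst_net"]],
        "dst_node": g["dst_node"],
        "dst_port": normalise_port(g["dst_port"]),
    }


if __name__ == "__main__":
    print(parse_saa_name("I_TASD03_01/1/10-TPBI13_02/2/20"))
```

A name that does not match raises `ValueError`, which is a cheap way to spot tests that were created without following the convention.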
ICMP Ping SAA Test Configuration
SAA Configuration in 5620-SAM
Under menu Tools/Service Test Manager (STM)
ICMP Ping Properties: General
- Name and Description are the same
System address of the node generating the test
(not the IP address for the ICMP ping)
Target IP address for the ICMP ping
Egress interface for the ICMP ping
ICMP Ping Properties: Parameters
1000 ping packets per test
Time-out used to decide that a ping packet received no response
Interval has no meaning for rapid ping
Rapid ping generates 100 packets per second
QoS marking for generated ping packets: Network Control, in-profile
On-demand execution
ICMP Ping Properties: Result Configuration
The test is considered failed if the number of timed-out packets exceeds the configured number of packets in a test.
An SNMP trap is generated only when the test fails, not every time the test completes.
ICMP Ping Properties: Threshold
An alarm is generated when the number of lost packets reaches a defined threshold.
The alarm is cleared when the number of lost packets goes below the threshold.
ICMP Ping Properties: Threshold (Contd)
A threshold-crossing alarm is generated when 10 or more packets are lost in 1 test.
Time since the last threshold crossing occurred
Number of packets lost when the last threshold crossing occurred
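The parameters on these slides imply some simple arithmetic: 1000 rapid-ping packets at 100 packets per second take about 10 seconds to send, and a rising threshold of 10 lost packets corresponds to 1% loss. The Python sketch below is only an illustration of that arithmetic. The function names are hypothetical, and the "10 or more" comparison follows the wording of the slide rather than any vendor specification.

```python
def rapid_ping_duration_s(count=1000, packets_per_second=100):
    """Approximate sending time of one rapid-ping test, ignoring time-outs."""
    return count / packets_per_second


def evaluate_run(sent, received, rising_threshold=10):
    """Summarise one test run against the loss threshold described on the slide."""
    lost = sent - received
    return {
        "lost": lost,
        "loss_pct": 100.0 * lost / sent,
        # Assumption from the slide text: the alarm is raised at
        # "10 or more" lost packets in one test.
        "alarm": lost >= rising_threshold,
    }


if __name__ == "__main__":
    print(rapid_ping_duration_s())
    print(evaluate_run(sent=1000, received=986))
```

With 986 responses out of 1000 (14 packets lost, as in Example #1 later in this deck) the sketch reports an alarm; with 991 responses it does not.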
SAA Test Configuration Created on NE
*A:HAMMBK0900M>config>saa# info
----------------------------------------------
        test "I_THAM09_01/1/01-THAM05_02/1/12" owner "sas:0:1:1:e"
            description "I_THAM09_01/1/01-THAM05_02/1/12"
            type
                icmp-ping 10.197.254.44 rapid size 1500 source 10.197.170.18 next-hop 10.197.170.17 count 1000
            exit
            trap-gen
                test-fail-enable
            exit
            loss-event rising-threshold 10
            no shutdown
        exit
        test "I_THAM09_02/1/01-THAM05_03/2/05" owner "sas:0:2:2:e"
            description "I_THAM09_02/1/01-THAM05_03/2/05"
            type
                icmp-ping 10.197.254.44 rapid size 1500 source 10.197.170.238 next-hop 10.197.170.237 count 1000
            exit
            trap-gen
                test-fail-enable
            exit
            loss-event rising-threshold 10
            no shutdown
        exit
----------------------------------------------
The next-hop parameter was added through the CLI after the
configuration was created by the Service Test Manager GUI.
CRON Configuration (Test Scheduling)
*A:HAMMBK0900M>config>cron# info
----------------------------------------------
        script "SAA-Icmp-ping"
            location "cf3:/cron-script/SAA-Icmp-ping.txt"
            no shutdown
        exit
        script "Deleted-SAA-result"
            location "cf3:/cron-script/Deleted-SAA-result.txt"
            no shutdown
        exit
        action "SAA-Icmp-ping"
            results "cf3:/cron-result/SAA-Icmp-ping"
            script "SAA-Icmp-ping"
            no shutdown
        exit
        action "Deleted-SAA-result"
            results "cf3:/cron-result/Deleted-SAA-result"
            script "Deleted-SAA-result"
            no shutdown
        exit
script: defines the script files to be scheduled by CRON (via script binding).
action: defines the binding between a script and its result location; it is
referred to by the CRON schedule.
CRON Configuration (Test Scheduling) (Contd)
        schedule "SAA-Icmp-ping"
            description "SAA-Icmp-ping"
            action "SAA-Icmp-ping"
            type calendar
            day-of-month all
            hour all
            minute 1 30
            weekday all
            no shutdown
        exit
        schedule "Deleted-SAA-result"
            description "Deleted-SAA-result"
            action "Deleted-SAA-result"
            type calendar
            day-of-month all
            hour 4
            minute 45
            month all
            weekday all
            no shutdown
        exit
----------------------------------------------
schedule: defines when to execute the action.
Schedule "SAA-Icmp-ping" runs the SAA tests at minutes 1 and 30 of every hour, every day.
Its result log is cleared by the Deleted-SAA-result script.
Schedule "Deleted-SAA-result" runs the script that clears the logs produced by the SAA test schedule at 4:45 am every day.
Its own result log, in turn, is deleted by the SAA-Icmp-ping script.
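To make the calendar schedules concrete, the Python sketch below lists upcoming run times for a schedule defined by a set of minutes and, optionally, a set of hours. It is an illustration only: `next_runs` is a hypothetical helper, it models just the hour and minute fields with day-of-month, month and weekday treated as "all", and it does not claim to reproduce the router's CRON implementation.

```python
from datetime import datetime, timedelta


def next_runs(start, minutes, hours=None, count=4):
    """Return the next `count` run times strictly after `start`.

    minutes: set of minute values (e.g. {1, 30})
    hours:   set of hour values, or None for "hour all"
    """
    runs = []
    # Step minute by minute from the first whole minute after `start`.
    t = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while len(runs) < count:
        if t.minute in minutes and (hours is None or t.hour in hours):
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


if __name__ == "__main__":
    now = datetime(2013, 3, 29, 16, 10)
    # SAA-Icmp-ping: minute 1 30, hour all
    for run in next_runs(now, minutes={1, 30}):
        print("SAA-Icmp-ping     ", run)
    # Deleted-SAA-result: hour 4, minute 45
    for run in next_runs(now, minutes={45}, hours={4}, count=1):
        print("Deleted-SAA-result", run)
```

Stepping one minute at a time is deliberately simple; it is fast enough for a handful of runs and keeps the matching rule easy to read.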
Script Files
Two script files are used: SAA-Icmp-ping.txt and Deleted-SAA-result.txt.
*A:HAMMBK0900M>file cf3:\cron-script\ # dir
Volume in drive cf3 on slot A has no label.
Volume in drive cf3 on slot A is formatted as FAT32.
Directory of cf3:\cron-script\
02/20/2013 10:12a .
03/26/2013 11:40a ..
02/18/2013 02:51p 38 Deleted-SAA-result.txt
02/20/2013 10:12a 195 SAA-Icmp-ping.txt
2 File(s) 233 bytes.
2 Dir(s) 1601929216 bytes free.
*A:HAMMBK0900M>file cf3:\cron-script\ # type SAA-Icmp-ping.txt
File: SAA-Icmp-ping.txt
-------------------------------------------------------------------------------
exit all
oam saa I_THAM09_01/1/01-THAM05_02/1/12 owner sas:0:1:1:e start
oam saa I_THAM09_02/1/01-THAM05_03/2/05 owner sas:0:2:2:e start
file delete cf3:/cron-result/Deleted*.* force
===============================================================================
The oam saa lines run the SAA tests one by one.
The file delete line removes the result files generated by the
Deleted-SAA-result script.
*A:HAMMBK0900M>file cf3:\cron-script\ # type Deleted-SAA-result.txt
File: Deleted-SAA-result.txt
-------------------------------------------------------------------------------
file delete cron-result\SAA*.* force
===============================================================================
This script deletes the result logs generated by the SAA tests.
Running it also creates a log, which is deleted by the
SAA-Icmp-ping script above.
SAA Working Status on NE
Type the command and a partial SAA name, then press TAB for auto-fill/hints.
Don't forget to include the owner parameter.
*A:HAMMBK0900M# show saa "I_THAM09_0
"I_THAM09_01/1/01-THAM05_02/1/12" "I_THAM09_02/1/01-THAM05_03/2/05"
*A:HAMMBK0900M# show saa "I_THAM09_01/1/01-THAM05_02/1/12" owner
owner
"sas:0:1:1:e"
*A:HAMMBK0900M# show saa "I_THAM09_01/1/01-THAM05_02/1/12" owner "sas:0:1:1:e"
===============================================================================
SAA Test Information
===============================================================================
Test name : I_THAM09_01/1/01-THAM05_02/1/12
Owner name : sas:0:1:1:e
Description : I_THAM09_01/1/01-THAM05_02/1/12
Accounting policy : None
Continuous : No
Administrative status : Enabled
Test type : icmp-ping 10.197.254.44 rapid size 1500 source
10.197.170.18 next-hop 10.197.170.17 count 1000
Trap generation : test-fail-enable test-fail-threshold 1
Test runs since last clear : 1642
Number of failed test runs : 12
Last test result : Success
-------------------------------------------------------------------------------
Threshold
Type Direction Threshold Value Last Event Run #
-------------------------------------------------------------------------------
Loss-rt Rising 10 15 03/24/2013 09:30:21 1534
Falling None None Never None
===============================================================================
Interesting test result
The Loss-rt row shows the number of packets lost and the date/time
when the threshold was exceeded.
The ping command can be executed manually with these same parameters
(use ping in the CLI instead of icmp-ping).
SAA Working Status on NE (Contd)
*A:HAMMBK0900M# show saa "I_THAM09_01/1/01-THAM05_02/1/12" owner "sas:0:1:1:e"
Loss-rt Rising 10 15 03/24/2013 09:30:21 1534
Falling None None Never None
===============================================================================
Test Run: 1643
Total number of attempts: 1000
Number of requests that failed to be sent out: 0
Number of responses that were received: 1000
Number of requests that did not receive any response: 0
Total number of failures: 0, Percentage: 0
(in ms) Min Max Average Jitter
Outbound : 0.000 0.000 0.000 0.000
Inbound : 0.000 0.000 0.000 0.000
Roundtrip : 0.434 2.00 0.563 0.135
Per test packet:
Sequence Outbound Inbound RoundTrip Result
1 0.000 0.000 0.464 Response Received
2 0.000 0.000 0.462 Response Received
3 0.000 0.000 0.464 Response Received
...
997 0.000 0.000 0.534 Response Received
998 0.000 0.000 0.570 Response Received
999 0.000 0.000 0.531 Response Received
1000 0.000 0.000 0.557 Response Received
Test Run: the number of test runs; it should keep increasing every hour.
The Min/Max/Average/Jitter rows are a summary of the round-trip time.
The per-packet rows give the round-trip time of each test packet,
kept for 3 tests = 3000 packets total.
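When many tests have to be checked, the counters in the run summary can be pulled out of the CLI text with a few regular expressions. The Python sketch below is a hedged illustration that works on the summary lines as they appear in this transcript. `parse_run_summary` and the field keys are hypothetical names, and the patterns assume the exact wording shown here, which may differ between software releases.

```python
import re

# Sample text copied from the run summary shown on this slide.
SAMPLE = """\
Test Run: 1643
Total number of attempts: 1000
Number of requests that failed to be sent out: 0
Number of responses that were received: 1000
Number of requests that did not receive any response: 0
Total number of failures: 0, Percentage: 0
"""

# Field name -> pattern; the wording is assumed to match the output above.
FIELDS = {
    "run": r"Test Run:\s*(\d+)",
    "attempts": r"Total number of attempts:\s*(\d+)",
    "received": r"Number of responses that were received:\s*(\d+)",
    "no_response": r"Number of requests that did not receive any response:\s*(\d+)",
}


def parse_run_summary(text):
    """Extract the integer counters that are present in `text`."""
    summary = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match:
            summary[key] = int(match.group(1))
    return summary


if __name__ == "__main__":
    print(parse_run_summary(SAMPLE))
```

Fields that are missing from the text are simply left out of the result, so a partial paste of the output does not raise an error.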
SAA Configuration in 5620-SAM
Under menu Tools/Service Test Manager (STM)
The example is filtered to display only ETH-CFM tests.
ETH-CFM Properties: General
- Name and Description are the same
Target MAC address
System address of the node generating the test
ETH-CFM Properties: Test Parameters & Results Configuration
Number of Loopback (LB) messages sent in a test
The NE raises an SNMP trap when at least 1 message is lost in 1 test.
ETH-CFM Properties: Threshold
An alarm is generated when the number of lost packets reaches a defined threshold.
The alarm is cleared when the number of lost packets goes below the threshold.
ETH-CFM Properties: Threshold (Contd)
A threshold-crossing alarm is generated when 3 or more packets are lost in 1 test.
Time since the last threshold crossing occurred
Number of packets lost when the last threshold crossing occurred
SAA Test Configuration Created on NE
*A:EKCCBK0200M# /configure saa
*A:EKCCBK0200M>config>saa# info
----------------------------------------------
        test "E_TEKC02_02/1/03-TEKC23_01/2/02" owner "sas:0:1:1:k"
            description "E_TEKC02_02/1/03-TEKC23_01/2/02"
            type
                eth-cfm-loopback 00:00:00:00:00:01 mep 102 domain 1 association 2 size 1500 count 10 timeout 1 interval 1
            exit
            trap-gen
                test-fail-enable
            exit
            loss-event rising-threshold 3
            no shutdown
        exit
----------------------------------------------
*A:EKCCBK0200M>config>saa# /configure eth-cfm
*A:EKCCBK0200M>config>eth-cfm# info
----------------------------------------------
        domain 1 format none level 3
            association 2 format string name "SAA.EKC02.EKC23"
                bridge-identifier 720000472
                exit
            exit
        exit
----------------------------------------------
MEP ID = 102
MD Index = 1
MA Index = 2
Association 2 is configured under service ID 720000472.
SAA Test Configuration Created on NE (Contd)
*A:EKCCBK0200M# show eth-cfm cfm-stack-table
===============================================================================
CFM Stack Table Defect Legend:
R = Rdi, M = MacStatus, C = RemoteCCM, E = ErrorCCM, X = XconCCM, A = AisRx
===============================================================================
CFM SAP Stack Table
===============================================================================
Sap            Lvl Dir  Md-index   Ma-index   MepId  Mac-address       Defect
-------------------------------------------------------------------------------
2/1/3:624.0    3   Up   1          2          102    00:00:00:00:00:02 ------
===============================================================================
Looking up the row where MEP ID = 102, MD Index = 1 and MA Index = 2
gives SAP ID = 2/1/3:624.0.
*A:EKCCBK0200M# /configure service vpls 720000472 sap 2/1/3:624.0
*A:EKCCBK0200M>config>service>vpls>sap# info
----------------------------------------------
                eth-cfm
                    mep 102 domain 1 association 2 direction up
                        mac-address 00:00:00:00:00:02
                        no shutdown
                    exit
                exit
----------------------------------------------
View the configuration of the known service ID and SAP.
By following the previous steps, all configuration related to the ETH-CFM test
can be traced.
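The lookup described above (from MD index, MA index and MEP ID to the SAP) can be sketched in Python as a scan over the rows of the CFM SAP stack table. This is an illustrative helper only: `find_sap` is a hypothetical name, and the column order is assumed to be exactly the one shown in the output in this deck (Sap, Lvl, Dir, Md-index, Ma-index, MepId, Mac-address, Defect).

```python
# Sample rows copied from the CFM SAP stack table shown in this deck.
SAMPLE = """\
Sap Lvl Dir Md-index Ma-index MepId Mac-address Defect
-------------------------------------------------------------------------------
2/1/3:624.0 3 Up 1 2 102 00:00:00:00:00:02 ------
"""


def find_sap(table_text, md_index, ma_index, mep_id):
    """Return the SAP whose row matches the given MD/MA/MEP, or None."""
    wanted = (str(md_index), str(ma_index), str(mep_id))
    for line in table_text.splitlines():
        cols = line.split()
        # Data rows have a numeric level and a direction of Up or Down;
        # header and separator lines fail this check and are skipped.
        if len(cols) >= 7 and cols[1].isdigit() and cols[2] in ("Up", "Down"):
            sap, _level, _direction, md, ma, mep = cols[:6]
            if (md, ma, mep) == wanted:
                return sap
    return None


if __name__ == "__main__":
    print(find_sap(SAMPLE, md_index=1, ma_index=2, mep_id=102))
```

The returned SAP is the value to use in the `/configure service vpls ... sap ...` context when viewing the MEP configuration.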
CRON Configuration
*A:EKCCBK0200M>config>cron# info
----------------------------------------------
        script "SAA-ETH-CFM"
            location "cf3:/cron-script/SAA-ETH-CFM.txt"
            no shutdown
        exit
        action "SAA-ETH-CFM"
            results "cf3:/cron-result/SAA-ETH-CFM"
            script "SAA-ETH-CFM"
            no shutdown
        exit
        schedule "SAA-ETH-CFM"
            description "SAA-ETH-CFM"
            action "SAA-ETH-CFM"
            type calendar
            day-of-month all
            hour all
            minute 15 45
            month all
            weekday all
            no shutdown
        exit
        schedule "Deleted-SAA-result"
            description "Deleted-SAA-result"
            action "Deleted-SAA-result"
            type calendar
            day-of-month all
            hour 4
            minute 45
            month all
            weekday all
            no shutdown
        exit
----------------------------------------------
Schedule "SAA-ETH-CFM" runs the SAA test at minutes 15 and 45 of every hour, every day.
Its result log is cleared by the Deleted-SAA-result script.
Script Files
*A:EKCCBK0200M>file cf3:\cron-script\ # dir
Volume in drive cf3 on slot A is SMART.
Volume in drive cf3 on slot A is formatted as FAT32.
Directory of cf3:\cron-script\
03/05/2013 05:25p .
04/01/2013 02:03a ..
02/15/2013 05:07p 128 SAA-Icmp-ping.txt
- - .
03/05/2013 05:25p 121 SAA-ETH-CFM.txt
3 File(s) 287 bytes.
2 Dir(s) 1764376576 bytes free.
*A:EKCCBK0200M>file cf3:\cron-script\ # type SAA-ETH-CFM.txt
File: SAA-ETH-CFM.txt
-------------------------------------------------------------------------------
exit all
oam saa "E_TEKC02_02/1/03-TEKC23_01/2/02" owner "sas:0:1:1:k" start
file delete cron-result\Deleted*.* force
===============================================================================
*A:EKCCBK0200M>file cf3:\cron-script\ # type Deleted-SAA-result.txt
File: Deleted-SAA-result.txt
-------------------------------------------------------------------------------
file delete cron-result\SAA*.* force
===============================================================================
The oam saa lines run the SAA tests one by one, if there is more than one.
The file delete line in SAA-ETH-CFM.txt removes the result files generated by the
Deleted-SAA-result script.
SAA Troubleshooting Example
Troubleshooting SAA Alarms
Confirm the fault location (optional)
- The SAA name already gives a clue to the problem location.
- Recheck to make sure that the given name correctly identifies the problem location.
Correlate the SAA alarm with other faults
- If no other fault relates to the problem location indicated by the SAA, apply the verification steps.
- If there are other alarms at the location indicated by the SAA, follow the troubleshooting steps for those alarms.
SAA Alarm Example #1
3 links on different cards at PNCABKCA17W (PNC1 CSN)
It is unlikely that the problem is caused by 3 faulty cards/ports at the same time.
Nevertheless, the demonstration picks one SAA test to examine further as an example.
3 links on 3 different cards
Example #1: Understanding the situation
Name of the SAA test
I_TPNCA7W_04/2/12-TPKG04_02/1/05
I = ICMP Ping
T = TUC
PNCA7W is from PNCA BKCA17W, where the test is running
04/2/12 = the port that originates the SAA ICMP ping to its neighbor
T = TUC
PKG04 is from PKGGBK0406W (PKG-04 PTN), the ping target
02/1/05 = the port on PKG-04 that is the target of the ping
Summary
The SAA test, which uses ICMP ping from CSN PNC-1
(PNCABKCA17W) egressing on port 4/2/12 to PTN PKG-04
(PKGGBK0406W) ingressing on port 2/1/5, lost
14 of the 1000 packets sent.
The loss of 14 packets exceeds the threshold of 10, so
the alarm was raised on Mar 29 at 16:30:24 local time
(the time of detection).
Example #1: Go to the NE to check
It is not always necessary to go to the NE if other alarms already indicate a
problem with a component involved in the alarm, for example:
- Alarm on the transmission network of the link
- Alarm on the NE itself indicating a faulty card/port
If there is no other alarm related to the components used by the SAA test,
or you just want to check on the NE for some reason, the NE CLI can be reached
by right-clicking the NE, then selecting NE Sessions, then Telnet Session or
SSH Session.
Example #1: View the SAA Test status
Type show saa I_ then press TAB; the CLI partially fills the name and
shows the available SAA test names.
Type a few more characters then press TAB; the SAA name is filled in.
Type the keyword owner then press TAB again.
Type s; the CLI fills in the owner of the SAA test
automatically. Then press ENTER to view the result.
*A:PNCABKCA17W# show saa "I_TPNCA7W_0
"I_TPNCA7W_01/2/03-THAM05_01/1/03" "I_TPNCA7W_02/2/11-TBGU05_02/1/03"
"I_TPNCA7W_04/2/12-TPKG04_02/1/05"
*A:PNCABKCA17W# show saa "I_TPNCA7W_04/2/12-TPKG04_02/1/05" owner
owner
"sas:0:3:3:e"
*A:PNCABKCA17W# show saa "I_TPNCA7W_04/2/12-TPKG04_02/1/05" owner "sas:0:3:3:e"
===============================================================================
SAA Test Information
===============================================================================
Test name : I_TPNCA7W_04/2/12-TPKG04_02/1/05
Owner name : sas:0:3:3:e
Description : I_TPNCA7W_04/2/12-TPKG04_02/1/05
Accounting policy : None
Continuous : No
Administrative status : Enabled
Test type : icmp-ping 10.197.254.46 rapid size 1500 source
10.100.17.53 next-hop 10.100.17.54 count 1000
Trap generation : test-fail-enable test-fail-threshold 1
Test runs since last clear : 1529
Number of failed test runs : 1
Last test result : Success
-------------------------------------------------------------------------------
Threshold
Type Direction Threshold Value Last Event Run #
-------------------------------------------------------------------------------
Jitter-in Rising None None Never None
Press any key to continue (Q to quit)
Source 10.100.17.53 is the parameter that precisely identifies
the network interface. This value is used in the next step.
Note: Although the port number in the SAA test name can also
identify the ingress/egress port, it could be incorrect because of
a mistake in configuration.
Example #1: View the SAA Test status (Contd)
-------------------------------------------------------------------------------
Threshold
Type Direction Threshold Value Last Event Run #
-------------------------------------------------------------------------------
Jitter-in Rising None None Never None
Falling None None Never None
Loss-out Rising None None Never None
Falling None None Never None
Loss-rt Rising 10 14 03/29/2013 16:30:24 1396
Falling None None Never None
===============================================================================
Test Run: 1528
Total number of attempts: 1000
Number of requests that failed to be sent out: 0
Number of responses that were received: 1000
Number of requests that did not receive any response: 0
Total number of failures: 0, Percentage: 0
(in ms) Min Max Average Jitter
Outbound : 0.000 0.000 0.000 0.000
Inbound : 0.000 0.000 0.000 0.000
Roundtrip : 1.33 2.08 1.37 0.013
Per test packet:
Sequence Outbound Inbound RoundTrip Result
1 0.000 0.000 1.38 Response Received
2 0.000 0.000 1.44 Response Received
3 0.000 0.000 1.37 Response Received
4 0.000 0.000 1.36 Response Received
5 0.000 0.000 1.36 Response Received
6 0.000 0.000 1.36 Response Received
Example #1: Check the ports involved
Run show router interface and look for the line that contains
the source IP address, together with the one line before it.
From the command below, we can confirm that:
- The port that sources the SAA test is 4/2/12
- The network interface is To_PKG04_7750_G_2/1/5
The interface name is only a description; it CANNOT guarantee that
the target node is PKG-04 or that the target port is 2/1/5.
*A:PNCABKCA17W# show router interface | match 10.100.17.53 pre-lines 1
To_PKG04_7750_G_2/1/5 Up Up/Down Network 4/2/12
10.100.17.53/30 n/a
*A:PNCABKCA17W#
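The same "matching line plus one line before it" lookup can be applied offline to saved CLI output. The Python sketch below is a hedged illustration: `find_interface` is a hypothetical helper, and it assumes the two-line layout in which the interface name and port are on one line and the IP address is on the next, as in the output from this deck.

```python
# Sample lines copied from the show router interface output in this deck.
SAMPLE = """\
To_PKG04_7750_G_2/1/5 Up Up/Down Network 4/2/12
10.100.17.53/30 n/a
"""


def find_interface(show_output, ip):
    """Return the interface name and port for the line preceding `ip`."""
    lines = [ln for ln in show_output.splitlines() if ln.strip()]
    # Walk over (previous line, current line) pairs, like "pre-lines 1".
    for prev, cur in zip(lines, lines[1:]):
        fields = cur.split()
        # The address line starts with "<ip>/<prefix-length>".
        if fields and fields[0].split("/")[0] == ip:
            cols = prev.split()
            return {"interface": cols[0], "port": cols[-1]}
    return None


if __name__ == "__main__":
    print(find_interface(SAMPLE, "10.100.17.53"))
```

Comparing the whole address rather than a substring avoids a false match such as 10.100.17.5 against 10.100.17.53.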
Example #1: Check the ports involved (Contd)
Telnet to the destination IP address, then run show router interface and look
for the line that contains the destination IP address, together with the one
line before it.
*A:PNCABKCA17W# telnet 10.100.17.54
Trying 10.100.17.54 ...
####################################################################
# W A R N I N G #
####################################################################
# #
# Unauthorized access to this system is forbidden and will be #
# ... authorized user. #
# #
# By accessing this system, you agree that your actions may be #
# monitored if unauthorized usage is suspected. #
# #
####################################################################
Login: someuser
Password:
*A:PKGGBK0406W# show router interface | match 10.100.17.54 pre-lines 1
To_PNC1_7750_G_4/2/12 Up Up/Down Network 2/1/5
10.100.17.54/30 n/a
*A:PKGGBK0406W#
- The ingress port at the destination is 2/1/5
- The network interface is To_PNC1_7750_G_4/2/12
Example #1: Check the ports involved (Contd)
From the previous checks we can confirm that the SAA alarm correctly
indicates a problem between
- CSN PNC-1 (PNCABKCA17W) port 4/2/12 and
- PTN PKG-04 (PKGGBK0406W) port 2/1/5
as indicated by the SAA name.
If the checks reveal inconsistent information, for example
- Incorrect destination node name
- Incorrect source/destination port number
the results from the previous CLI steps should be used as the reference.
*** Correcting the SAA name so that it reflects the correct source/destination
description requires you to:
- Remove and re-create the test in the Service Test Manager GUI
- Reconfigure the new SAA in the CLI to add the next-hop parameter
- Edit the script file to update the SAA test name to be executed
SAA Alarm Example #2
In this case, the reboot of PTN RST-02 completed at 13:34, having started some time before that.
It can be concluded that the SAA alarms on the underlying RCUs were caused by the PTN reboot.
A node reboot at the PTN causes SAA tests to fail at many RCUs.
SAA Alarm Example #3
Alarm on the SAA test named E_TEKC02_02/1/03-TEKC23_01/2/02. This alarm happened 20 days ago, so there is no clue in the
active Alarm window.
Try checking in Historical Alarm.
SAA Alarm Example #3 (Contd)
The Historical Alarm indicates that there was a problem on the links between EKC-23
and RCUs such as EKC-02 and EKC-20.
In this case, no need
Further Troubleshooting Commands
If the Alarm / Historical Alarm windows cannot identify the cause of the
SAA alarm raised, apply the following commands to the related
components (port/card/SAP/SDP).
ICMP Ping Test (for network ports)
show port detail
show card detail
ETH-CFM (for access ports)
show service id sap detail
show service id sdp detail