SAN Congestion! Understanding, Troubleshooting, Mitigating in A Cisco Fabric

This document discusses SAN congestion caused by slow drain, which occurs when devices do not receive data at the expected line rate. It defines key slow drain terminology like buffer-to-buffer credits, transitions to zero credits, slow ports, and credit loss recovery. It also describes features in Cisco MDS switches for detecting and mitigating slow drain issues, such as stuck port detection and dropping frames to free up buffer space.


SAN Congestion!

Understanding, Troubleshooting,
Mitigating in a Cisco Fabric
Edward Mazurek
Technical Lead Data Center Storage Networking
CCIE 6448
BRKSAN-3446
Agenda

• Introduction
• Slow Drain Terminology
• Understanding Fibre Channel Flow Control
• MDS Slow Drain Features
• Troubleshooting Slow Drain
• Alerting and Mitigating Slow Drain
• Conclusion
Introduction
• Slow drain is a term used to describe SAN congestion
• When devices do not receive data at line rate, this can cause congestion in the SAN
• SANs are getting increasingly complex and heterogeneous
• Many different speeds
• Many different types of devices
• Host/storage workloads increasing
Reasons for Slow Drain
• Edge devices - An edge device can be slow to respond for a variety of reasons:
• Server performance problems: application or OS
• Host bus adapter (HBA) problems: driver or physical failure
• Speed mismatches: one fast device and one slow device
• Nongraceful virtual machine exit on a virtualized server, resulting in packets held in HBA buffers
• Storage subsystem performance problems, including overload
• Inter-Switch Links (ISLs)
• Lack of B2B credits for the distance the ISL is traversing
• Ex: roughly 4 credits per km @ 8Gbps (full-size frames)
• The existence of slow drain edge devices
• Edge devices with faster speeds than ISLs, even when port-channeled
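The "4 credits per km at 8Gbps" rule of thumb comes from keeping enough frames in flight to cover one round trip on the link. The Python sketch below assumes about 5 us/km of one-way propagation delay in fiber, full-size 2148-byte frames, and 8b/10b line coding, so it covers 1/2/4/8GFC only. None of these constants come from the slides.

```python
import math

# Assumptions (not from the slides):
FIBER_DELAY_S_PER_KM = 5e-6   # ~5 us/km one-way propagation delay in glass
FULL_FRAME_BYTES = 2148       # SOF + 24-byte header + 2112-byte payload + CRC + EOF
LINE_RATE_GBAUD = {1: 1.0625, 2: 2.125, 4: 4.25, 8: 8.5}  # 8b/10b speeds only

def frame_time_s(speed_gfc: int) -> float:
    """Time to serialize one full-size frame (8b/10b: 10 line bits per byte)."""
    bytes_per_s = LINE_RATE_GBAUD[speed_gfc] * 1e9 / 10
    return FULL_FRAME_BYTES / bytes_per_s

def required_credits(distance_km: float, speed_gfc: int) -> int:
    """Credits needed so the sender never waits: frames sent during one round trip."""
    round_trip_s = 2 * distance_km * FIBER_DELAY_S_PER_KM
    return math.ceil(round_trip_s / frame_time_s(speed_gfc))

print(required_credits(1, 8))   # 4
print(required_credits(10, 8))  # 40
```

Smaller average frames occupy the link for less time, so a real workload can need proportionally more credits than this full-frame estimate.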
Reasons for Slow Drain
• Port-channel bandwidth is not the same as individual link bandwidth: 4 x 4Gb is not equal to 16Gb
• An individual exchange (src, dst, oxid) traverses a single member ISL
• A member ISL sending at its full 4Gbps rate causes congestion back to the storage
[Figure: host H1 and 8Gb edge links on either side of a 16Gb (total) port-channel made of 4 x 4Gb links. The Tx queue to the slower member link fills, the ingress port reaches 0 Rx credits remaining, no R_Rdy is sent and no B2B credits are returned, and congestion backs up through the VOQs.]
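Load balancing is per exchange (src, dst, oxid), so every frame of one exchange follows the same member ISL. That caps a single exchange at one member's speed, however wide the port-channel is. The sketch below uses `zlib.crc32` as a stand-in hash. It is hypothetical and is not the MDS hashing algorithm; it keeps only the property that matters here.

```python
import zlib

def member_link(sid: int, did: int, oxid: int, num_links: int) -> int:
    """Pick a port-channel member for a frame using a stand-in hash of (SID, DID, OXID).

    The real MDS hash differs; the property kept here is that the same
    exchange always maps to the same member link.
    """
    key = f"{sid:06x}:{did:06x}:{oxid:04x}".encode()
    return zlib.crc32(key) % num_links

def members_used(frames, num_links: int) -> set:
    """Return the set of member links used by a list of (sid, did, oxid) frames."""
    return {member_link(s, d, o, num_links) for s, d, o in frames}

# 100 frames of one exchange between hypothetical FCIDs 0x010203 and 0x040506
one_exchange = [(0x010203, 0x040506, 0x1234)] * 100
print(members_used(one_exchange, 4))  # a single member: one 4Gb link carries it all
```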
Slow Drain Terminology

SAN Congestion! BRKSAN-3446


Slow Drain Terminology
• B2B – Buffer to Buffer Credits / Credits Remaining
• B2B Transitions to Zero
• Slow Ports / B2B Credit not Available
• Stuck Ports
• Credit Loss Recovery
Slow Drain Terminology
B2B credits / Credits remaining
• Buffer to Buffer credits, or B2B credits, are the agreed-upon buffer space on each side of an FC link
• Exchanged in FLOGI and ACC(FLOGI) on F ports
• Exchanged in ELP and ACC(ELP) on ISLs (E ports)
• B2B credits remaining is the count of FC frames that can still be sent by each side of an FC link
• Credits are returned by the R_Rdy FC ordered set
[Figure: the end device sends FLOGI to the MDS (x Tx credits, x Rx credits) and the MDS replies with ACC(FLOGI) (y Tx credits, y Rx credits); on an ISL between two MDS switches, ELP and ACC(ELP) carry x and y the same way.]
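The credits each side may transmit are the receive buffers the other side advertised in FLOGI/ELP and the matching ACC. The small Python model below is hypothetical and is not MDS code. It reproduces the N-port login values used in this deck (the N port advertises 1 credit, the F port advertises 3) and the 500/250 ISL example.

```python
from dataclasses import dataclass

@dataclass
class PortCredits:
    name: str
    rx_buffers: int      # receive buffers this port advertises in FLOGI/ELP (or the ACC)
    tx_credits: int = 0  # learned from the peer's advertisement

def login(a: PortCredits, b: PortCredits) -> None:
    """Each side's Tx credits are set to the Rx buffers the other side advertised."""
    a.tx_credits = b.rx_buffers
    b.tx_credits = a.rx_buffers

device = PortCredits("N port", rx_buffers=1)
fc1_14 = PortCredits("MDS9710-A fc1/14", rx_buffers=3)
login(device, fc1_14)
print(fc1_14.tx_credits, fc1_14.rx_buffers)  # 1 3
```

The output matches "Transmit B2B Credit is 1 / Receive B2B Credit is 3": one side's Tx is always the adjacent side's Rx.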
Slow Drain Terminology
B2B Transitions to zero
• Whenever a port hits zero credits, this is counted as a “B2B transition to zero”
• Occurs and is counted in both the Tx and Rx direction
• Transmit B2B transition to zero indicates the attached device didn’t return credits
• Receive B2B transition to zero indicates this port didn’t return credits
• Important: Amount of time at zero credits is not easily determined
• Can occur normally
[Figure: interface fc1/1 (F port) starts with 3 Tx credits and sends three FC frames; it reaches 0 remaining Tx credits and the Tx transitions-to-zero counter is incremented (example: 130 transmit B2B credit transitions to zero).]
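A transitions-to-zero count says little by itself, because two ports can show the same count while spending very different amounts of time at zero. The minimal Python sketch below works over a hypothetical trace of (time in microseconds, remaining Tx credits) changes. It counts transitions the way the counter does and also measures the time at zero that the counter does not report.

```python
def zero_credit_stats(events, end_us):
    """Return (transitions_to_zero, time_at_zero_us) for a credit trace.

    events: time-ordered (time_us, remaining_tx_credits) pairs, one per change.
    end_us: end of the observation window, used if the trace ends at zero.
    """
    transitions = 0
    time_at_zero = 0
    zero_since = None
    previous = None
    for t, credits in events:
        if credits == 0 and previous != 0:
            transitions += 1          # what the hardware counter records
            zero_since = t
        elif credits != 0 and previous == 0:
            time_at_zero += t - zero_since
            zero_since = None
        previous = credits
    if zero_since is not None:
        time_at_zero += end_us - zero_since
    return transitions, time_at_zero

brief = [(0, 3), (10, 0), (11, 1)]         # at zero for 1 us
stuck = [(0, 3), (10, 0), (400_010, 1)]    # at zero for 400 ms
print(zero_credit_stats(brief, 1_000_000))  # (1, 1)
print(zero_credit_stats(stuck, 1_000_000))  # (1, 400000)
```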
Slow Drain Terminology
Slow Port Detection
• A slow port is a port attached to an FC device that returns credits slowly
• The receiver of the FC frame does not immediately return an R_Rdy to the sender
• B2B Tx Credit not available is the term most often used when the MDS detects that the B2B credits are at zero for 100ms
• Called a Slow Port
[Figure: interface fc1/1 (E port) starts with 3 Tx credits, sends three FC frames and reaches 0 remaining Tx credits at 0ms. New! A slowport event is recorded (if configured); Tx credit not available is incremented at 100ms and again at 200ms.]
MDS Slow Drain Features
Stuck Port Detection
• If no-credit-drop is configured and Tx credits are at 0 for that amount of time, then the port is considered “stuck”
• Start dropping frames immediately, without regard to the age of the frames
• Any newly arriving frames are dropped immediately as long as the port remains at 0 Tx credits
• Frees up frames queued at ISLs destined for slow/stuck ports more quickly
[Figure: port at 0 Tx credits from 0 sec, with frames from other ports waiting in the Rx queue. At the 300ms no-credit-drop time the frames in the Rx queue are dropped, and while there are no Tx credits any newly arriving frames are dropped immediately. Once credit arrives, sending resumes.]
Slow Drain Terminology
Credit Loss Recovery
• If transmit credits are at zero for 1 second (F port) or 1.5 seconds (E port), then credit loss recovery is invoked
• Link Reset (LR) is transmitted
• If Link Reset Response (LRR) is received, then both sides are back at the full B2B credits
• If LRR is not received, then the link is failed
• A counter is incremented and optionally a port-monitor alert can be generated
• Link Reset would be better named Link Credit Reset; it is part of FC-FS (Framing and Signaling)
[Figure: Successful: no credits (stuck) at 0 sec; LR sent at 1/1.5 sec; LRR received by +60ms; the port resumes normal operation, nondisruptively. Unsuccessful: no credits (stuck) at 0 sec; LR sent at 1/1.5 sec; no response by +60ms; shut/no shut.]
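The credit loss recovery decision can be summarized as a small function. In the hedged Python sketch below, the 1 s / 1.5 s thresholds and the +60 ms wait for LRR are the values shown on the slide. The function itself is a simplified, hypothetical model, not creditmon's implementation.

```python
from typing import Optional

# Values shown on the slide; the function is a simplified, hypothetical model.
ZERO_CREDIT_TIMEOUT_S = {"F": 1.0, "E": 1.5}  # time at 0 Tx credits before LR is sent
LRR_WAIT_S = 0.060                             # wait for the Link Reset Response

def credit_loss_outcome(port_type: str, seconds_at_zero: float,
                        lrr_delay_s: Optional[float]) -> str:
    """Classify what happens to a port stuck at 0 Tx credits.

    lrr_delay_s is how long after the LR the LRR arrived, or None if it never did.
    """
    if seconds_at_zero < ZERO_CREDIT_TIMEOUT_S[port_type]:
        return "waiting"            # threshold not reached yet, no recovery started
    if lrr_delay_s is not None and lrr_delay_s <= LRR_WAIT_S:
        return "recovered"          # full B2B credits restored, nondisruptive
    return "link failed"            # no LRR in time: port is failed (shut/no shut)

print(credit_loss_outcome("F", 1.0, 0.010))  # recovered
print(credit_loss_outcome("E", 1.2, None))   # waiting
print(credit_loss_outcome("E", 1.5, None))   # link failed
```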
Understanding Fibre Channel Flow Control



Understanding Fibre Channel Flow Control
• Fibre Channel classes
• Fibre Channel class 3
• Fibre Channel Flow Control
• Fibre Channel Flow Control – Example
• MDS frame and credit processing
Fibre Channel classes
[Figure: service classes shown against the Fabric: Class 1, Class 2, Class 3, Class 4, Class 6, Class F and FC-AL.]
All data is currently transported using Class 3
Fibre Channel Class 3
• Class 3 is a best-effort packetized service:
• The receiving port does not acknowledge receipt of frames. If the fabric cannot deliver the frame for any reason, the frame can be discarded without notifying the sending port. However, Class 3 is not really unreliable, because it relies on the upper-layer protocol (ULP) to help ensure that frames are delivered, by detecting and recovering from lost frames
• Class 3 does not guarantee fixed latency because data paths are variable
• Class 3 does not guarantee in-order delivery. For most Fibre Channel
applications, including storage applications, the ULP is responsible for
guaranteeing in-order delivery
Fibre Channel Flow Control
• Fibre Channel flow control attempts to minimize the chance of dropped frames
• Frames are only transmitted when it is known that the receiver has buffer space
• For each frame sent an R_Rdy (B2B Credit) should be returned
• R_Rdys can only be returned once the frame that has previously occupied that
buffer location has been handled
• R_Rdys are not sent reliably – they can be corrupted/lost
• Each side informs the other side of the number of buffer credits it has
• F ports - In the Fabric Login(FLOGI)
• E ports – In the Exchange Link Parameters(ELP)

• Note: B2B credits are not negotiated – just agreed to
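The sender-side rules above fit in a few lines: transmit only with a credit, get one credit back per R_Rdy, and count each transition to zero. The Python model below is minimal, and the class and method names are illustrative only.

```python
class TxCredits:
    """Sender-side B2B credit accounting for one FC port (illustrative model)."""

    def __init__(self, agreed: int):
        self.agreed = agreed            # credits agreed at FLOGI/ELP
        self.remaining = agreed         # "transmit B2B credit remaining"
        self.transitions_to_zero = 0

    def can_send(self) -> bool:
        return self.remaining > 0

    def send_frame(self) -> None:
        if not self.can_send():
            raise RuntimeError("no Tx credits: frame must wait in the queue")
        self.remaining -= 1
        if self.remaining == 0:
            self.transitions_to_zero += 1

    def r_rdy(self) -> None:
        # One R_Rdy returns one credit (never more than agreed). A lost R_Rdy
        # simply never arrives, which is why credit loss recovery exists.
        self.remaining = min(self.remaining + 1, self.agreed)

port = TxCredits(agreed=3)
for _ in range(3):
    port.send_frame()
print(port.remaining, port.transitions_to_zero, port.can_send())  # 0 1 False
port.r_rdy()
print(port.remaining)  # 1
```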


Fibre Channel Flow Control
N-Port Login
• The end device sends FLOGI with 1 credit: the N port has one credit!
• MDS9710-A replies with ACC (FLOGI) with 3 credits: the F port has three credits!

MDS9710-A# show int fc1/14
fc1/14 is up
……….
Transmit B2B Credit is 1
Receive B2B Credit is 3
3 receive B2B credit remaining
1 transmit B2B credit remaining
1 low priority transmit B2B credit remaining

Note: These values are not typical; they are chosen for simplicity. Typical F port values are 16-32.
Fibre Channel Flow Control
Frame Flow Control
• As FC frames flow into the fabric, the MDS Rx buffer queue is decremented by 1 B2B credit for each received frame
• Once an R_Rdy is sent by the MDS, it frees up one B2B credit
[Figure: the N port sends a frame into F port fc1/14 on MDS9710-A, which returns an R_Rdy; two overlapping “show interface fc1/14” outputs (Transmit B2B Credit is 1, Receive B2B Credit is 3) show the receive B2B credit remaining count as the frame is received and the R_Rdy is returned.]
Understanding Fibre Channel Flow Control
Tx and Rx Perspective
• Tx indicates the transmit side of a port
• Rx indicates the receive side of a port
• One side’s Tx is the adjacent side’s Rx
• It is important to understand which ISL direction the congestion is on
• Note: Increasing B2B credits does not usually increase performance

MDS9710-A# show interface fc1/1 bbcredit
fc1/1 is trunking
Transmit B2B Credit is 500
Receive B2B Credit is 250
Receive B2B Credit performance buffers is 0
250 receive B2B credit remaining
500 transmit B2B credit remaining
500 low priority transmit B2B credit remaining

[Figure: ISL between MDS9710-A fc1/1 (E) and MDS9710-B fc1/2 (E). fc1/1: Transmit B2B Credit is 500, Receive B2B Credit is 250. fc1/2: Transmit B2B Credit is 250, Receive B2B Credit is 500.]
Fibre Channel Flow Control – Example cont’
Normal flow
[Figure: Xgig analyzer trace between the MDS (FC Port(1,1,3)) and the server (FC Port(1,1,4)) showing FC data frames each followed by an R_Rdy; delta time between FC data and R_Rdy is ~0.7us.]

Fibre Channel Flow Control – Example cont’
Delayed/No R_RDYs
[Figure: Xgig analyzer trace between the MDS (FC Port(1,1,3)) and the server (FC Port(1,1,4)): only FC data frames, no R_Rdys.]
Fibre Channel Flow Control – Example cont’
R-RDY recovery
[Figure: Xgig analyzer trace between the MDS (FC Port(1,1,3)) and the server (FC Port(1,1,4)): R_Rdys start arriving, followed by more R_Rdys.]
MDS Frame and Credit Processing
1. Initiator sends an FC frame to the MDS port ASIC
2. FC frame is received in its entirety and stored
3. FC frame is transmitted to the VOQ
4. XBAR interface sends a request to the Arbiter for a grant to transmit the frame to the egress port via the XBAR
5. Arbiter grants the request to the XBAR interface to forward the frame; the grant is only sent when the egress port has buffer space available
6. FC frame is forwarded to the XBAR, then an R_Rdy is sent back since the buffer is now free
7. FC frame is forwarded to the egress line card
8. MDS port ASIC forwards the frame to the target
9. Credit is returned to the Arbiter
[Figure: Line Card 1 and Line Card 2 (ports, MDS port ASIC, VOQs, XBAR interface) connected through the Fabric Module (XBAR), with the Arbiter on the active Supervisor.]
The Issue: Non-Responsive Devices causing upstream blocking
[Figure: a 5 MB read issued from host H2 to storage S2, with H2 as the slow drain device. Along the path from S2 back to H2, each hop shows alternating 0 Tx credits remaining / 0 Rx credits remaining, no R_Rdys sent and no B2B credits returned, with frames backing up in the VOQs and ISL interface buffers. Congestion appears at every hop, and all devices, including H1 and S1, are impacted.]


MDS Slow Drain Features



MDS Slow Drain Features - Existing
• Virtual Output Queues
• Display credits and remaining credits
• Detect Tx and Rx credit transitions to zero
• Slow Port Detection
• Tx and Rx Credit not Available
• Stuck Port Detection
• Credit Loss Recovery / LR Rcvd B2B
• Display ingress queuing
MDS Slow Drain Features - Enhanced!
• Congestion drop frames
• No credit drop frames
• On Board Failure Logging
• Port-monitor alerting / portguard
MDS Slow Drain Features - New!
• slowport-monitor
• show interface counters - txwait
• show interface - Percentage Tx credits not available for last 1s/1m/1h/72h
• txwait-history graphs
• show logging onboard txwait
• SNMP fcIfTxWaitCount variable
• show tech-support slowdrain
• DCNM Slow Drain Analysis
Virtual Output Queues (VOQs)
[Figure: Switch Without VOQ: a single input queue at port 1 holding frames to ports 4, 5 and 6, with a frame to port 4 at the top of the queue. VOQ Model: input port 1 keeps a separate VOQ for frames to each of ports 4, 5 and 6, and only the top of the port 4 VOQ is blocked.]
This diagram shows the primary difference between a VOQ-based switch and a switch without VOQ.
If destination port 4 were congested, the switch without VOQ would block, with frames to other output ports waiting behind the blocked port.
In contrast, with VOQs only the VOQ associated with port 4 is blocked; frames to all other ports flow normally.

MDS implements VOQs on the input interface
VOQs help prevent head-of-line blocking
VOQs can alleviate but do not prevent congestion caused by slow drain
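The toy Python comparison below models the two switches in the diagram, assuming egress port 4 is blocked. The arrival order of frames is hypothetical and illustrative only; it is not read from the figure.

```python
def deliver_fifo(frames, blocked):
    """Single input FIFO: nothing behind a frame for a blocked port can leave."""
    delivered = []
    for port in frames:
        if port in blocked:
            break                    # head-of-line blocking
        delivered.append(port)
    return delivered

def deliver_voq(frames, blocked):
    """One virtual output queue per egress port: only blocked VOQs stall."""
    voqs = {}
    for port in frames:
        voqs.setdefault(port, []).append(port)
    delivered = []
    for port, queue in voqs.items():
        if port not in blocked:
            delivered.extend(queue)
    return delivered

frames = [4, 6, 5, 4, 6, 5, 5]      # destination port of each queued frame
print(deliver_fifo(frames, {4}))     # []
print(deliver_voq(frames, {4}))      # [6, 6, 5, 5, 5]
```

The frames for port 4 still wait in the VOQ model. VOQs remove head-of-line blocking but do nothing about the slow port itself.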
MDS Slow Drain Features
Display credits and remaining credits
• MDS can display the Tx and Rx credits agreed upon on each interface
• MDS can also display the credits remaining in both directions
• Tx and Rx credits are static values
• Remaining credits are an instantaneous value
• Available via the show interface bbcredit command

MDS9710# show interface fc1/1 bbcredit
fc1/1 is up
Transmit B2B Credit is 128
Receive B2B Credit is 32
Receive B2B Credit performance buffers is 0
24 receive B2B credit remaining
100 transmit B2B credit remaining
100 low priority transmit B2B credit remaining

[Figure annotations: MDS owes 8 credits (32 - 24); 28 Tx frames outstanding (128 - 100).]
MDS Slow Drain Features
Detect credit transitions to zero
• Each time the Tx or Rx credits go to zero the MDS increments a counter
• Maintained as a hardware statistic
• Available in
• show interface counters
• slot x show hardware internal fc-mac port y statistics
• show hardware internal statistics

• Since there is no indication of time at zero, this is not a great indication of slow drain in and of itself
• Use the slowport-monitor or various txwait commands instead
MDS Slow Drain Features
Tx/Rx Credit not Available
• MDS software process detects when a port is at zero Tx or Rx credits for 100ms
• Since this is done by software, it may not catch each and every occurrence
• Available in:
• slot x show hardware internal fc-mac port y error-statistics
• show logging onboard error-stats
• xxx_CNTR_RX_WT_AVG_B2B_ZERO
• xxx_CNTR_TX_WT_AVG_B2B_ZERO
• show system internal snmp credit-not-available
• port-monitor tx-credit-not-available
[Figure: B2B credits sampled every 100 ms from 0 sec to 1 sec; each 100 ms interval at zero credits is recorded and timestamped.]
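The statement that a 100 ms software check "may not catch each and every time" can be shown with a simple sampler. The sketch below is a hypothetical model, and the MDS implementation may differ. It counts only the 100 ms windows during which Tx credits stayed at zero from start to end, so a long zero period that straddles a window boundary can go uncounted.

```python
def credit_not_available_count(zero_periods_ms, duration_ms, window_ms=100):
    """Count sampling windows in which Tx credits stayed at zero for the whole window.

    zero_periods_ms: list of (start, end) times, in ms, when credits were at zero.
    Hypothetical model of a 100 ms software sampler.
    """
    count = 0
    for start in range(0, duration_ms, window_ms):
        end = start + window_ms
        if any(s <= start and end <= e for s, e in zero_periods_ms):
            count += 1
    return count

print(credit_not_available_count([(0, 150)], 1000))   # 1
print(credit_not_available_count([(50, 190)], 1000))  # 0: 140 ms at zero, never a full window
```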
MDS Slow Drain Features
Credit Loss Recovery
• Creditmon is a process that runs periodically in each linecard
• It checks for transmit credits at zero
• F port at 0 Tx credits for 1 second
• E port at 0 Tx credits for 1.5 seconds
• Credit loss recovery is invoked
• If successful, then it is non-disruptive
• If the port is at 0 Rx credits, the adjacent device is responsible for initiating recovery
• Part of the FC-FS specification
[Figure: Successful: no credits (stuck) at 0 sec; LR sent at 1/1.5 sec; LRR received by +60ms and the port resumes normal operation. Unsuccessful: no credits (stuck) at 0 sec; LR sent at 1/1.5 sec; no response by +60ms, so the link fails with “Link failure Link reset failed due to timeout”.]
MDS Slow Drain Features
LR Rcvd B2B
• Adjacent device initiates credit loss recovery
• If MDS receives LR, it checks if its input buffers are empty
• If the input buffers are still not empty after 90ms, the “LR Rcvd B2B” condition occurs and the link fails with reason “Link failure Link Reset failed nonempty Recv queue”
• Indication of upstream congestion

slot 10 show port-config internal link-events
Time PortNo Speed Event Reason
---- ------ ----- ----- ------
Apr 3 18:53:36 2014 00591356 fc10/30 4G UP Not FL
Apr 3 18:53:34 2014 00810034 fc10/30 --- DOWN LR Rcvd B2B

[Figure: MDS port and FC device; no credits from the MDS at 0 sec; the device sends LR at 1 sec; no response; at +90ms LR Rcvd B2B and shut/no shut.]
MDS Slow Drain Features
Congestion Drop Frames
• Each frame the MDS receives is time stamped
• If a frame cannot be delivered to the egress port, it is timeout dropped
• MDS (by default) drops frames as timeout drops at 500ms
• Can be configured from 100ms to 500ms in 1ms intervals (Enhanced!)
• Lowering it will time out frames sooner and reduce the effects of slow drain devices
[Figure: frames arrive at 0 sec and the MDS checks the timestamp of each frame; at 500ms the frames are dropped from the queue.]
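Age-based timeout dropping is easy to model with a FIFO of timestamped frames. The Python sketch below rests on two assumptions. First, a frame is dropped once its age reaches the timeout (the exact comparison the hardware uses is not stated on the slide). Second, queue order equals arrival order, so only the head needs checking. The 100-500 ms range and the 500 ms default come from the slide.

```python
from collections import deque

DEFAULT_CONGESTION_DROP_MS = 500   # default from the slide (configurable 100-500 ms)

def drop_aged_frames(queue: deque, now_ms: int,
                     timeout_ms: int = DEFAULT_CONGESTION_DROP_MS) -> int:
    """Drop frames that have waited timeout_ms or longer; return how many were dropped.

    queue holds (arrival_ms, frame) pairs, oldest first, so only the head needs checking.
    """
    if not 100 <= timeout_ms <= 500:
        raise ValueError("congestion-drop timeout must be 100-500 ms")
    dropped = 0
    while queue and now_ms - queue[0][0] >= timeout_ms:
        queue.popleft()
        dropped += 1
    return dropped

q = deque([(0, "A"), (200, "B"), (450, "C")])
print(drop_aged_frames(q, now_ms=600))                  # 1 (frame A waited 600 ms)
print(drop_aged_frames(q, now_ms=600, timeout_ms=100))  # 2 (B and C with a 100 ms timeout)
```

Lowering the timeout frees the queue sooner, which is the effect the slide describes.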
MDS Slow Drain Features
Stuck port / No-credit-drop frames
• Frames are normally queued for the Congestion drop time
• Optionally, frames can be dropped immediately if the egress port is at 0 Tx B2B credits for a specified time
• Frees up frames queued at ISLs destined for slow/stuck ports more quickly
• Helps unrelated devices in the presence of congestion
• Configured from 1ms to 500ms in 1ms intervals (Enhanced!)
• Done by hardware at the exact time
[Figure: port at 0 Tx credits from 0 sec, with frames from other ports waiting in the Rx queue. At the 300ms no-credit-drop time the frames in the Rx queue are dropped, and while there are no Tx credits any newly arriving frames are dropped immediately. Once credit arrives, sending resumes.]
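Unlike the age-based congestion drop, no-credit-drop keys on how long the egress port has been at 0 Tx credits. The Python sketch below is a hypothetical model, not MDS code. It decides what happens to any frame for the port, whether already queued or newly arriving: send it if credits exist, queue it if the port has not been at zero long enough, and drop it once the configured time has elapsed. The 1-500 ms range is from the slide; treating "for that amount of time" as "greater than or equal" is an assumption.

```python
class EgressPort:
    """Hypothetical model of no-credit-drop for one egress port (not MDS code)."""

    def __init__(self, no_credit_drop_ms: int, tx_credits: int):
        if not 1 <= no_credit_drop_ms <= 500:
            raise ValueError("no-credit-drop must be 1-500 ms")
        self.threshold_ms = no_credit_drop_ms
        self.tx_credits = tx_credits
        self.zero_since_ms = 0 if tx_credits == 0 else None

    def set_credits(self, credits: int, now_ms: int) -> None:
        """Record a change in remaining Tx credits."""
        if credits == 0 and self.tx_credits != 0:
            self.zero_since_ms = now_ms     # just hit zero: start the timer
        elif credits > 0:
            self.zero_since_ms = None       # credit arrived: no longer stuck
        self.tx_credits = credits

    def decide(self, now_ms: int) -> str:
        """What to do with a frame (queued or newly arriving) for this port."""
        stuck = (self.zero_since_ms is not None
                 and now_ms - self.zero_since_ms >= self.threshold_ms)
        if stuck:
            return "drop"
        return "send" if self.tx_credits > 0 else "queue"

port = EgressPort(no_credit_drop_ms=300, tx_credits=3)
port.set_credits(0, now_ms=1000)
print(port.decide(1100))   # queue
print(port.decide(1300))   # drop
port.set_credits(1, now_ms=1350)
print(port.decide(1360))   # send
```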
MDS Slow Drain Features
Display ingress queuing
• MDS can show ports that have frames queued and the destination (egress) port(s) they are queued for
• Instantaneous (real time) only
• Helpful when other indicators are not conclusive
• DI – Destination Index – This is an internal representation of the port
[Figure: an ingress port's VOQs, each holding frames for an egress port identified by its DI.]
MDS Slow Drain Features
Display dropped packet info
• MDS 9710/9396S has the capability of displaying some key packet info for
packets that have experienced a timeout drop
• 32 packets are kept per forwarding instance
• Output contains:
• Source FCID (SID)
• Destination FCID(DID)
• RCTL – Routing control (ELS, ABTS, etc.)
• Source Index(SI)
• Destination Index(DI)
MDS Slow Drain Features
Slowport-monitor New!
• MDS can monitor ports withholding credits for as little as 1ms
• Records the last 10 events, with the duration and the date/time when each occurred
• Included in OBFL
• Full featured for MDS 9700, 9396S, 9148S and 9250i
• Gen 3 has similar functionality but only records 1 event per 100ms cycle
• Gen 4 records the total wait time in each 100ms cycle

system timeout slowport-monitor 5 mode f

Interface Delay Timestamp
fc1/13 11ms 03/27/2015 12:01:00
fc1/13 8ms 03/27/2015 14:09:45

[Figure: timeline of a port at 0 Tx credits from 0 sec, with the 5ms threshold marked and the R_RDY arriving at 10ms.]
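Keeping "the last 10 events" maps naturally onto a bounded queue. The Python sketch below is hypothetical and only mirrors the behavior described on the slide: record an event when a period at zero Tx credits ends and lasted at least the threshold, and keep only the newest 10 events. The printed rows use the same format as the slide's output.

```python
from collections import deque
from datetime import datetime

class SlowportMonitor:
    """Hypothetical model: remember the last N periods at zero Tx credits >= threshold."""

    def __init__(self, threshold_ms: float, history: int = 10):
        self.threshold_ms = threshold_ms
        self.events = deque(maxlen=history)   # oldest event is discarded automatically

    def credit_returned(self, interface: str, waited_ms: float, when: datetime) -> None:
        """Call when an R_RDY ends a period at zero Tx credits."""
        if waited_ms >= self.threshold_ms:
            self.events.append((interface, waited_ms, when))

mon = SlowportMonitor(threshold_ms=5)
mon.credit_returned("fc1/13", 11, datetime(2015, 3, 27, 12, 1, 0))
mon.credit_returned("fc1/13", 3, datetime(2015, 3, 27, 12, 5, 0))   # below threshold, ignored
mon.credit_returned("fc1/13", 8, datetime(2015, 3, 27, 14, 9, 45))
for intf, delay, ts in mon.events:
    print(f"{intf} {delay}ms {ts:%m/%d/%Y %H:%M:%S}")
# fc1/13 11ms 03/27/2015 12:01:00
# fc1/13 8ms 03/27/2015 14:09:45
```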
MDS Slow Drain Features
On Board Failure Logging (OBFL)
• Each linecard logs significant events to an NVRAM buffer
• Events are time stamped
• Events can be displayed by date/time
• show logging onboard <module x> <starttime mm/dd/yy-hh:mm:ss>
• error-stats
• flow-control request-timeout
• flow-control timeout-drops
• slowport-monitor-events New!
• txwait
[Figure: Line Card 1, Line Card 2 … Line Card n, each with its own OBFL NVRAM holding Error-stats, Flow-control Timeouts and Request-timeouts, Slowport-monitor-events and Txwait.]
MDS Slow Drain Features
Port-monitor / Portguard
• Allows alerting on many slow drain indications
• Three new counters! (New!)
• Optional portguard action allows the port to be either flapped or error-disabled
• Different policies for E / F ports
[Figure: an active port-monitor policy sends SNMP alerts to the DCNM server for Link-loss, Credit-loss, Tx-credit-not-avail, Slowport-count, Slowport-oper-delay and txwait.]
MDS Slow Drain Features
txwait-history New!
• Displays a graphical history of TxWait (credit not available)
• Shows:
• Last minute
• Last 60 minutes
• Last 72 hours

MDS# show process creditmon txwait-history port 13
TxWait history for port fc1/13:
==============================
[Graph: Credit Not Available per second (last 60 seconds); y-axis TxWait in ms (0-1000), x-axis seconds 0-60; # = TxWait (ms).]
MDS Slow Drain Features New!
Percentage Tx credits not available for last 1s/1m/1h/72h
• show interface counters command
• Provides a quick way to check for
problems
• Available for:
• MDS 9500 (Gen4 only)
• MDS 9700 (Gen5)
• MDS 9396S (Gen5)
• MDS 9148S
• MDS 9250i
MDS Slow Drain Features
show tech-support slowdrain New!

• New “show tech-support” flavor


• Contains all the commands
necessary to troubleshoot SAN
congestion issues
• Best when issued against the entire
fabric via DCNM
Troubleshooting Slow Drain



Troubleshooting Slow Drain
• Classifying Slow Drain Symptoms
• Methodology
• Level by Level Troubleshooting
Troubleshooting Slow Drain
Classifying Slow Drain Symptoms
Levels of Performance Degradation
Level | Host Symptoms | Default Switch Behavior
1 | Latency | Frame queuing
2 | SCSI errors/retransmission | Frame dropping
3 | Extreme Delay | Links failing/reset
Note: Each level includes all the symptoms of the previous levels
Troubleshooting Slow Drain
Classifying Slow Drain Symptoms - Level 1: Latency
• Latency indicates SCSI exchanges are taking longer than normal
• No SCSI errors or retransmissions are noted
• Subtle and difficult to detect
• ISLs and other ports should be checked for low numbers of Tx/Rx remaining
credits
• Use the new slowport-monitor, OBFL txwait, txwait-history and alerting capabilities
Troubleshooting Slow Drain
Classifying Slow Drain Symptoms - Level 2: Retransmission
• Once any frame in a SCSI exchange is dropped the exchange will be aborted
• Aborted exchanges will be listed in host logs
• Frames are held for a maximum of 500ms prior to dropping as timeouts
• This is the default Congestion Drop value
• Frames can also be dropped as timeouts if no-credit-drop is configured
• Use “show logging onboard starttime <date-time> error-stats”
Troubleshooting Slow Drain
Classifying Slow Drain Symptoms - Level 3: Extreme Delay
• Typically caused by ports without credits for 1 or 1.5 seconds
• Credit-loss Recovery is invoked
• Links may fail and/or flap
• Typically many timeout drops are also recorded
Troubleshooting Slow Drain
Methodology
• Cisco recommends troubleshooting slow drain in the following order:

Level 3: Extreme Delay

Level 2: Retransmission

Level 1: Latency
Troubleshooting Slow Drain
Methodology – Follow Congestion to Source
• If Rx congestion, then find ports communicating with this port that have Tx congestion
• Zoning defines which devices communicate with this port
• Understand the topology
• If the port communicating with the port showing Rx congestion is FCIP:
• Check for TCP retransmits
• Check for overutilization of FCIP
[Figure: an F port with 0 Rx credits remaining and an E port with 0 Tx credits remaining, with congestion between them.]
Troubleshooting Slow Drain
Methodology - Follow Congestion to Source
• If Tx congestion is found:
• If F port, then the attached device is the slow drain device
• If E port, then go to the adjacent switch and continue troubleshooting
• Continue to track through the fabric until the destination F port is discovered
[Figure: path F → E → E → F across two switches, with Tx credits at 0 remaining and Rx credits at 0 remaining marking where the congestion is.]
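The "follow congestion to source" walk can be written as a loop over switches. The Python sketch below rests on several assumptions. The switch and interface names are hypothetical. The input dictionaries stand in for what you would gather from the show commands in this section. Each switch is simplified to have at most one Tx-congested port.

```python
from typing import Dict, Optional, Tuple

def follow_congestion(switch: str,
                      tx_congested: Dict[str, Optional[str]],
                      port_type: Dict[Tuple[str, str], str],
                      isl_peer: Dict[Tuple[str, str], str],
                      max_hops: int = 16) -> Optional[Tuple[str, str]]:
    """Walk Tx congestion hop by hop until it ends on an F port.

    tx_congested: switch -> interface showing Tx congestion on that switch (or None)
    port_type:    (switch, interface) -> "F" or "E"
    isl_peer:     (switch, E-port interface) -> adjacent switch name
    """
    for _ in range(max_hops):
        intf = tx_congested.get(switch)
        if intf is None:
            return None                       # no Tx congestion found on this switch
        if port_type[(switch, intf)] == "F":
            return (switch, intf)             # the attached device is the slow drain device
        switch = isl_peer[(switch, intf)]     # E port: continue on the adjacent switch
    return None                               # guard against loops in bad input

culprit = follow_congestion(
    "MDS-A",
    tx_congested={"MDS-A": "fc1/1", "MDS-B": "fc2/5"},
    port_type={("MDS-A", "fc1/1"): "E", ("MDS-B", "fc2/5"): "F"},
    isl_peer={("MDS-A", "fc1/1"): "MDS-B"},
)
print(culprit)  # ('MDS-B', 'fc2/5')
```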
Level 3: Extreme Delay - Troubleshooting
Check for credit loss recovery
• Supervisor command on all platforms
• Module command also available
• Credit loss recovery events are the most severe slow drain indications
• Check/change cables/SFPs/HBAs
• show logging onboard error-stats also contains this

MDS9710-1# show process creditmon credit-loss-events
Module: 01 Credit Loss Events: YES
----------------------------------------------------
| Interface | Total | Timestamp |
| | Events | |
----------------------------------------------------
| fc1/13 | 11524 | 1. Sat Mar 29 14:21:48 2014 |
| | | 2. Sat Mar 29 14:21:47 2014 |
| | | 3. Sat Mar 29 14:21:46 2014 |
| | | 4. Sat Mar 29 14:21:45 2014 |
| | | 8. Sat Mar 29 14:21:41 2014 |
| | | 9. Sat Mar 29 14:21:40 2014 |
| | |10. Sat Mar 29 14:21:39 2014 |
----------------------------------------------------
Level 3: Extreme Delay - Troubleshooting
Check for LR Rcvd B2B
• Two places to check:
1. Module link-events
2. Logging log
• Both indicate the same thing: Rx congestion
• Not normally a problem with this port, but with the port this port is switching packets to
• If multiple ports fail at similar times, then they are switching to the same port

MDS9710-1# slot 1 show port-config internal link-events
*************** Port Config Link Events Log ***************
Time PortNo Speed Event Reason
---- ------ ----- ----- ------
...
Jul 28 00:46:39 2012 00670297 fc1/25 --- DOWN LR Rcvd B2B

MDS9710-1# show logging log
%PORT-2-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down (Link failure)
%PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down (Link failure Link Reset failed nonempty recv queue)
Level 2: Retransmission - Troubleshooting
Check for Transmit Frame Drops
• TIMEOUT drops are drops for packets that hit either the Congestion Drop or No-Credit-Drop thresholds
• They are normally counted several different ways
• Reference the appendix for counter names and definitions

MDS9710-1# show hard internal statistics module 1 pktflow dropped
Hardware statistics on module 01:
|------------------------------------------------------------------------|
| Device:Lightning Role:ARB-MUX Mod: 1 |
|------------------------------------------------------------------------|
|------------------------------------------------------------------------|
| Device:F16 Xbar Driver Role:FABRIC Mod: 1 |
|------------------------------------------------------------------------|
|------------------------------------------------------------------------|
| Device:F16 Que Driver Role:QUE Mod: 1 |
|------------------------------------------------------------------------|
|------------------------------------------------------------------------|
| Device:F16 Fwd Driver Role:L2 Mod: 1 |
|------------------------------------------------------------------------|
|------------------------------------------------------------------------|
| Device:F16 Mac Driver Role:FCMAC Mod: 1 |
|------------------------------------------------------------------------|
Instance:1
Cntr Name Value Ports
----- ----- ----- -----
0 F16_TMM_TIMEOUT_STATS_DROP 0000000000088775 13-16 -
1 F16_TMM_PORT_FRM_DROP_CNT 0000000000088775 13 -
2 F16_TMM_TOLB_TIMEOUT_DROP_CNT 0000000000088775 13 -
Level 2: Retransmission - Troubleshooting
Show logging onboard error-stats
• Counters are polled every 20 seconds
• When a counter value changes, it is included
• Several different counters are in error-stats:
• Timeout drops
• Credit loss recovery
• Tx/Rx credit not available (100ms)
• Force timeout on/off

mds9710-2# show logging onboard error-stats
----------------------------
Module: 1
----------------------------
--------------------------------------------------------------------------------
ERROR STATISTICS INFORMATION FOR DEVICE DEVICE: FCMAC
--------------------------------------------------------------------------------
Interface | | | Time Stamp
Range | Error Stat Counter Name | Count |MM/DD/YY HH:MM:SS
| | |
--------------------------------------------------------------------------------
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |242618 |04/14/14 12:17:58
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |124 |04/14/14 12:17:58
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |124 |04/14/14 12:17:58
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |201650 |04/14/14 12:17:38
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/14/14 12:17:38
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |107 |04/14/14 12:17:38

242618 – 201650 = 40968 timeout drops in the last 20 seconds
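The subtraction done by hand above (242618 - 201650) can be automated across every counter. The Python sketch below assumes the row layout shown in the sample output and assumes rows for the same counter appear newest first. The regular expression is an assumption about that layout, not a documented format.

```python
import re
from collections import defaultdict

# One data row of "show logging onboard error-stats", e.g.
# fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |242618 |04/14/14 12:17:58
ROW = re.compile(r"^(\S+)\s*\|(\S+)\s*\|(\d+)\s*\|(\d\d/\d\d/\d\d \d\d:\d\d:\d\d)")

def counter_deltas(lines):
    """Return {(interface, counter): newest value - previous value}.

    Assumes rows for the same counter appear newest first, as in the sample output.
    """
    values = defaultdict(list)
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            intf, name, count, _timestamp = match.groups()
            values[(intf, name)].append(int(count))
    return {key: v[0] - v[1] for key, v in values.items() if len(v) >= 2}

sample = """\
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |242618 |04/14/14 12:17:58
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |124 |04/14/14 12:17:58
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |124 |04/14/14 12:17:58
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |201650 |04/14/14 12:17:38
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/14/14 12:17:38
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |107 |04/14/14 12:17:38
""".splitlines()

for (intf, name), delta in counter_deltas(sample).items():
    print(intf, name, delta)   # e.g. 40968 timeout drops in one 20-second poll
```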
Level 2: Retransmission - Troubleshooting
Display dropped packet information
• MDS 9710 and 9396S maintain a FIFO list of the last 32 dropped packets
• Display is per instance (8 ports)
• These contain:
• Source FCID (SID)
• Destination FCID (DID)
• RCTL – Routing control (ELS, ABTS, etc.)
• Source Index (SI)
• Destination Index (DI)
• These are not necessarily the slow device! Could be a victim!

module-1# show hardware internal fcmac inst 0 tmm_timeout_stat_buffer
Port Group num: 0 TMM TIMEOUT BUFFERS
---------------------------------------------
TO_RD:22 TO_WR:6 NUM PKTS:32
--------------------------------------------------------------
TMM TIMEOUT Packet :0
CHIPTIME :14227(0x3793) ZERO:0 FCTYPE:0
SID:330040 DID:170040 RCTL:0
TSTMP_VALID:1 HDRTSTMP:14176(0x3760) HDRCTL:6144 SI:12
DI:2 AT:0 PORTNUM:1
TMM TIMEOUT Packet :1
CHIPTIME :14227(0x3793) ZERO:0 FCTYPE:0
SID:330040 DID:170040 RCTL:0
TSTMP_VALID:1 HDRTSTMP:14176(0x3760) HDRCTL:6144 SI:12
DI:2 AT:0 PORTNUM:1

MDS9710-2# show system internal fcfwd idxmap port-to-interface
Port to Interface Table:(All values in hex)
--------------------------------------------------------------------------------
glob| |VL|lcl| if |slot|port| mts | port| flags
idx | if_index | |idx|type| | | node| mode|
-----|--------------------------|--|---|----|----|----|-----|-----|-------------
0| 01000000 fc1/1 | 0| 00| 01 | 00 | 00 | 0102| 08 | 00
1| 01001000 fc1/2 | 0| 01| 01 | 00 | 01 | 0102| 00 | 00
2| 01002000 fc1/3 | 0| 02| 01 | 00 | 02 | 0102| 00 | 00
<snip>
12| 01012000 fc1/13 | 0| 12| 01 | 00 | 12 | 0102| 00 | 00

The if_index column shows the actual interface name: SI:12 is fc1/13 and DI:2 is fc1/3, so this shows packets from fc1/13 to fc1/3 dropped
Level 1: Latency - Troubleshooting
Credit Not Available
• Indicates 100ms increments where Tx B2B credits were 0
• % indicates % of 1 second, so 20% is 200ms

MDS9513# show system internal snmp credit-not-available

Module: 6 Number of events logged: 6

------------------------------------------------------------------------------------------
Port Threshold Rising/Falling Interval(s) Event Time Type Duration available
----------------------------------------------------------------------------------------------------------

fc6/32 10/0(%) 1 Wed Apr 2 17:23:54 2014 Rising 10% (100ms Tx delay)
fc6/32 10/0(%) 1 Wed Apr 2 17:24:39 2014 Falling 0%
fc6/32 10/0(%) 1 Wed Apr 2 17:24:40 2014 Rising 20%
fc6/32 10/0(%) 1 Wed Apr 2 17:25:53 2014 Falling 0%
fc6/32 10/0(%) 1 Wed Apr 2 17:25:54 2014 Rising 20% (200ms Tx delay)
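The table reads as a rising/falling threshold with hysteresis (10/0%) over 1-second intervals. The Python sketch below is hypothetical. It converts a percentage of the interval to milliseconds and emits Rising/Falling events the way this output appears to. The sample values are illustrative and not taken from the device.

```python
def pct_to_ms(percent: int, interval_s: int = 1) -> int:
    """Convert a credit-not-available percentage of the interval to milliseconds."""
    return percent * interval_s * 1000 // 100

def rising_falling_events(samples, rising_pct=10, falling_pct=0):
    """Emit Rising/Falling events with hysteresis, like a 10/0(%) threshold.

    samples: (time_label, percent) pairs, one per polling interval.
    """
    events = []
    alarmed = False
    for t, pct in samples:
        if not alarmed and pct >= rising_pct:
            alarmed = True
            events.append((t, "Rising", pct))
        elif alarmed and pct <= falling_pct:
            alarmed = False
            events.append((t, "Falling", pct))
    return events

samples = [("17:23:53", 0), ("17:23:54", 10), ("17:23:55", 20),
           ("17:24:39", 0), ("17:24:40", 20)]
for t, kind, pct in rising_falling_events(samples):
    print(t, kind, f"{pct}%", f"{pct_to_ms(pct)}ms")
# 17:23:54 Rising 10% 100ms
# 17:24:39 Falling 0% 0ms
# 17:24:40 Rising 20% 200ms
```

Hysteresis is why no new event appears at 17:23:55 even though the value rose to 20%: the port is already in the alarmed state until it falls back to 0%.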
Level 1: Latency - Troubleshooting
Credit Not Available – continued
• Included in OBFL error-stats
• Tracked in both Rx and Tx directions
• Indicates 100ms intervals where Tx or Rx credit is not available
• FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counts credit not available in 100ms increments: incremented by 217 - 108 = 109 in 20 seconds
• FCP_SW_CNTR_CREDIT_LOSS counts credit loss: incremented by 19 - 9 = 10 in 20 seconds

--------------------------------------------------------------------------------
ERROR STATISTICS INFORMATION FOR DEVICE DEVICE: FCMAC
--------------------------------------------------------------------------------
Interface | | | Time Stamp
Range | Error Stat Counter Name | Count |MM/DD/YY HH:MM:SS
| | |
--------------------------------------------------------------------------------
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |1496855 |04/07/15 22:44:23
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |217 |04/07/15 22:44:23
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |19 |04/07/15 22:44:23
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |1486654 |04/07/15 22:44:03
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/07/15 22:44:03
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |9 |04/07/15 22:44:03
Level 1: Latency - Troubleshooting
Check ISLs for Lack of Transmit Credits
• Data frames are sent using low priority credits
• If Tx B2B credit remaining is low, then congestion is toward the adjacent switch
• If Rx B2B credit remaining is low, then congestion is in this switch and perhaps other switches
• Can only be done while congestion is in progress

MDS9710# show interface | include "fc|Belong|low priority|remain" | exclude "description" | exclude "Peer" | include "trunking" next 3
fc1/3 is trunking
500 receive B2B credit remaining
0 transmit B2B credit remaining
0 low priority transmit B2B credit remaining
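The interpretation rules above can be applied mechanically to captured output. A Python sketch, assuming the output was captured while congestion was in progress; the slide does not quantify "low", so the `low` cut-off (default 0) is an assumption, and `isl_credit_check` is a hypothetical helper:

```python
import re

PORT_RE = re.compile(r"\s*(fc\S+) is trunking")
CREDIT_RE = re.compile(
    r"\s*(\d+) (low priority transmit|transmit|receive) B2B credit remaining")


def isl_credit_check(output, low=0):
    """Interpret remaining-credit lines for trunking ports.

    A count at or below `low` (an assumed cut-off) is treated as "low".
    """
    credits = {}
    port = None
    for line in output.splitlines():
        m = PORT_RE.match(line)
        if m:
            port = m.group(1)
            credits[port] = {}
            continue
        m = CREDIT_RE.match(line)
        if m and port is not None:
            credits[port][m.group(2)] = int(m.group(1))

    verdicts = {}
    for port, c in credits.items():
        notes = []
        # Data frames use low priority credits, so that is the Tx count to check.
        if c.get("low priority transmit", low + 1) <= low:
            notes.append("Tx credits low: congestion toward the adjacent switch")
        if c.get("receive", low + 1) <= low:
            notes.append("Rx credits low: congestion in this switch (and perhaps others)")
        verdicts[port] = notes
    return verdicts


sample = """fc1/3 is trunking
    500 receive B2B credit remaining
    0 transmit B2B credit remaining
    0 low priority transmit B2B credit remaining"""
print(isl_credit_check(sample))
```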
Level-1 Troubleshooting: Latency
Check Transitions to Zero counters
• Transitions to zero indicate when Tx or Rx B2B credits go to zero, even just for an instant of time
• Transmit indicates the adjacent device is withholding credits
• Receive indicates this MDS is withholding credits from the adjacent device
• Look for large, incrementing numbers, since some devices go to zero normally

9710-1# show int fc1/13 counters
fc1/13
549317 Transmit B2B credit transitions to zero
2388296 Receive B2B credit transitions to zero
1934443328 2.5us TxWait due to lack of transmit credits
Percentage Tx credits not available for last 1s/1m/1h/72h: 0%/0%/98%/1%
32 receive B2B credit remaining
17 transmit B2B credit remaining
17 low priority transmit B2B credit remaining
Last clearing of "show interface" counters 01:25:25
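Raw transition counts are easier to judge as rates. A Python sketch that normalizes the counters above by the time since they were last cleared; it assumes the "Last clearing" value is in hh:mm:ss form (longer uptimes may be printed differently) and uses the 2.5us txwait unit described later in this section. Both functions are hypothetical helpers:

```python
def hms_to_seconds(hms):
    """Convert an 'hh:mm:ss' string (e.g. '01:25:25') to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def since_clear_summary(tx_to_zero, rx_to_zero, txwait_ticks, last_clear):
    """Average rates since the interface counters were last cleared."""
    elapsed = hms_to_seconds(last_clear)
    txwait_s = txwait_ticks * 2.5 / 1_000_000  # txwait counts 2.5us ticks
    return {
        "tx_transitions_per_s": tx_to_zero / elapsed,
        "rx_transitions_per_s": rx_to_zero / elapsed,
        "txwait_seconds": txwait_s,
        "txwait_pct_of_elapsed": 100 * txwait_s / elapsed,
    }


summary = since_clear_summary(549317, 2388296, 1934443328, "01:25:25")
for key, value in summary.items():
    print(f"{key}: {value:.1f}")
```

For fc1/13 this works out to roughly 107 Tx and 466 Rx transitions per second, and about 4836 seconds of TxWait in 5125 seconds (about 94%), in line with the 98% reported for the last hour.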
Level-1 Troubleshooting: Latency
Check for Frame Queuing on Ingress Ports
• "Prio 3" is class 3
• 000004 is a port bitmap in hexadecimal; each non-zero bit indicates the presence of one or more queued frames from that ingress port
• B'0000 0000 0000 0000 0000 0100': the rightmost bit is Port 1, then Port 2, then Port 3, so the set bit is Port fc1/3
• GI (Hex) is the Global Index (egress port); GI c is fc1/13

module-1# show hardware internal f16_que inst 0 table iqm-statusmem0
+-------------------------------------------------------------------------------
| IQM: PG0 Status Memory (logical layout) for F16 Que Driver
| Inst 0; port(s) 1-8
| Each instance is 8 ports on this LC
| Note: Only non-zero entries are displayed
| Each non-zero bit indicates pending frame in VOQ for that IB
+----------+--------+--------+--------+--------+
| GI (Hex) | Prio 0 | Prio 1 | Prio 2 | Prio 3 |
+----------+--------+--------+--------+--------+
| c        | 000000 | 000000 | 000000 | 000004 |
+----------+--------+--------+--------+--------+

rtp-san-33-18-9710-2# show system internal fcfwd idxmap port-to-interface
Port to Interface Table:(All values in hex)
--------------------------------------------------------------------------------
glob|                          |VL|lcl| if |slot|port| mts | port| flags
idx | if_index                 |  |idx|type|    |    | node| mode|
-----|--------------------------|--|---|----|----|----|-----|-----|-------------
   0| 01000000 fc1/1           | 0| 00| 01 | 00 | 00 | 0102| 00  | 00
   1| 01001000 fc1/2           | 0| 01| 01 | 00 | 01 | 0102| 00  | 00
…snip
   b| 0100b000 fc1/12          | 0| 0b| 01 | 00 | 0b | 0102| 00  | 00
   c| 0100c000 fc1/13          | 0| 0c| 01 | 00 | 0c | 0102| 00  | 00

Input interface fc1/3 has frame(s) queued for fc1/13
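The bitmap decoding above can be written out in a few lines. A Python sketch, assuming (as on this slide) that bit 0 of the Prio 3 bitmap is the first port of the queue instance; the GI-to-interface table holds only the idxmap rows shown above, and the `slot`/`first_port` parameters are hypothetical, since the port range per instance is linecard specific:

```python
# GI (global index) to interface, copied from the idxmap rows shown above
# (the rows marked "…snip" on the slide are not included).
GI_TO_INTERFACE = {0x0: "fc1/1", 0x1: "fc1/2", 0xB: "fc1/12", 0xC: "fc1/13"}


def queued_ingress_ports(bitmap_hex, first_port):
    """Ports whose bit is set in a Prio 3 bitmap; bit 0 is the instance's first port."""
    value = int(bitmap_hex, 16)
    return [first_port + bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def describe(gi_hex, prio3_bitmap, slot=1, first_port=1):
    """Render one iqm-statusmem row as 'ingress has frame(s) queued for egress'."""
    egress = GI_TO_INTERFACE.get(int(gi_hex, 16), f"GI 0x{gi_hex}")
    ingress = [f"fc{slot}/{p}" for p in queued_ingress_ports(prio3_bitmap, first_port)]
    return f"{', '.join(ingress)} has frame(s) queued for {egress}"


print(describe("c", "000004"))  # instance 0 covers ports 1-8 on this LC
```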


Level-1 Troubleshooting: Latency
Check for Frame Queuing on Ingress Ports - continued
• For generation 3:
• slot x show hardware internal up-xbar <0-1> queued-packet-info
• For generation 4:
• slot x show hardware internal que inst <0-3> memory iqm-statusmem0|1
• For generation 5/9396S:
• slot x show hardware internal f16_que inst 0 table iqm-statusmem0
• 9148, 9250i & 9148S – Not available
• Each instance is a defined number of ports and is LC specific
• Issue the command several times and look for a GI that keeps appearing; that GI is the slow port
• Output is real time (instantaneous)
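Because each run is only an instantaneous snapshot, the "same GI across several runs" check amounts to counting how often each GI appears. A Python sketch, assuming the GIs from each run have already been extracted; the `min_fraction` knob and the sample runs are hypothetical:

```python
from collections import Counter


def recurring_gis(samples, min_fraction=1.0):
    """GIs present in at least `min_fraction` of the command runs.

    samples: one iterable of GI hex strings per run of the queue command.
    """
    seen = Counter(gi for run in samples for gi in set(run))
    needed = min_fraction * len(samples)
    return sorted(gi for gi, count in seen.items() if count >= needed)


# Hypothetical GIs collected from four runs of the command
runs = [{"c"}, {"c", "3"}, {"c", "3"}, {"c", "7"}]
print(recurring_gis(runs))       # ['c']
print(recurring_gis(runs, 0.5))  # ['3', 'c']
```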
Level-1 Troubleshooting: Latency
Check for Arbitration Timeouts
• Request-timeouts indicate frames that could not immediately be sent to "Dest Intf" (the slow egress port)
• They do not indicate actual packet drops; the frames were just delayed
• If Dest Intf is FCIP, then there are problems on the FCIP tunnel
• Check for TCP retransmits
• Check for overutilization of FCIP

MDS9513# show logging onboard flow-control request-timeout
----------------------------
Module: 9
----------------------------
--------------------------------------------------------------------------------
| Dest | Source |Events| Timestamp                | Timestamp                |
| Intf | Intf   | Count| Earliest                 | Latest                   |
--------------------------------------------------------------------------------
|fc1/2 |fc9/24, |    28|Sun Feb 9 00:28:23 2014   |Sun Feb 9 00:28:24 2014   |
--------------------------------------------------------------------------------
Level-1 Troubleshooting: Latency
New!
txwait
• txwait is a counter that increments every 2.5us while the port is at 0 Tx credits and there are frames queued for transmit
• txwait * 2.5 / 1000000 = seconds the port was unable to transmit
• Only applies to the following:
• MDS 9500 with generation 4 linecards:
• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)
• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)
• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)
• MDS 9148S 16G Multilayer Fabric Switch
• MDS 9250i Multiservice Fabric Switch
• MDS 9396S 16G Multilayer Fabric Switch

• Other modules and switches return zero


Level-1 Troubleshooting: Latency
New!
txwait - continued
• txwait can be seen in the following:
• show interface counters
• Raw value in 2.5us units
• show interface counters
• Percentage of time Tx credits were not available for the last 1s/1m/1h/72h
• show process creditmon txwait-history
• 60sec, 60min, 72hour graphs
• show logging onboard txwait
• SNMP fcIfTxWaitCount variable
Level-1 Troubleshooting: Latency
txwait - show interface counters New!

mds9710-1# show interface fc1/13 counters | i fc|wait


fc1/13
6252650 2.5us Txwaits due to lack of transmit credits
6252650 * 2.5 / 1000000 = 15.631625 seconds
The above indicates the MDS was unable to transmit on fc1/13 for over 15 seconds since the counters were last cleared
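The conversion above, and its inverse, as a tiny Python sketch; the function names are illustrative only. The inverse is handy for expressing time budgets (for example the 100ms logging floor used by `show logging onboard txwait`) as tick counts:

```python
TXWAIT_TICK_US = 2.5  # one txwait increment = 2.5 microseconds


def txwait_to_seconds(ticks):
    """Seconds the port sat at 0 Tx credits with frames queued."""
    return ticks * TXWAIT_TICK_US / 1_000_000


def seconds_to_txwait(seconds):
    """Inverse conversion: a duration expressed in 2.5us txwait ticks."""
    return round(seconds * 1_000_000 / TXWAIT_TICK_US)


print(txwait_to_seconds(6252650))  # 15.631625
print(seconds_to_txwait(0.1))      # 40000 ticks = 100ms
```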
Level-1 Troubleshooting: Latency
New!
txwait - Percentage Tx credits not available for last 1s/1m/1h/72h
• Utilizes the underlying txwait counter

MDS9710-1# show interface fc1/13 counters
fc1/13
5 Transmit B2B credit transitions to zero
2 Receive B2B credit transitions to zero
0 2.5us TxWait due to lack of transmit credits
Percentage Tx credits not available for last 1s/1m/1h/72h: 1%/5%/3%/2%
32 receive B2B credit remaining
128 transmit B2B credit remaining
128 low priority transmit B2B credit remaining
Level-1 Troubleshooting: Latency
txwait - show logging onboard txwait New!

MDS9513# show logging onboard txwait module 4



---------------------------------
Module: 4 txwait count
---------------------------------
Notes:
- Sampling period is 20 seconds
- Only txwait delta >= 100 ms are logged

-----------------------------------------------------------------------------
| Interface | Delta TxWait Time     | Congestion | Timestamp                |
|           | 2.5us ticks | seconds |            |                          |
-----------------------------------------------------------------------------
| fc4/1     | 52927       | 0       | 0%         | Wed May 27 13:20:12 2015 |
| fc4/1     | 52926       | 0       | 0%         | Wed May 27 13:19:52 2015 |
| fc4/1     | 105854      | 0       | 1%         | Wed May 27 13:19:32 2015 |
| fc4/1     | 52926       | 0       | 0%         | Wed May 27 13:19:12 2015 |

• Recorded every 20 seconds
• Delta values are recorded when they are 100ms or more in the 20-second interval
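The seconds and Congestion columns follow from the tick delta and the 20-second sampling period. A Python sketch of that arithmetic; truncation to whole numbers is inferred from the rows above (0.66% prints as 0%, 1.32% as 1%), not taken from documentation, and `onboard_txwait_row` is a hypothetical name:

```python
TICK_US = 2.5               # txwait tick
SAMPLE_PERIOD_S = 20        # OBFL txwait sampling period
MIN_LOGGED_TICKS = 40_000   # 100ms / 2.5us: smaller deltas are not logged


def onboard_txwait_row(delta_ticks):
    """Return (seconds, congestion_pct) for one 20-second sample, truncated to
    whole numbers, or None when the delta is under the 100ms logging floor."""
    if delta_ticks < MIN_LOGGED_TICKS:
        return None
    wait_s = delta_ticks * TICK_US / 1_000_000
    return int(wait_s), int(100 * wait_s / SAMPLE_PERIOD_S)


for ticks in (52927, 105854, 39999):
    print(ticks, onboard_txwait_row(ticks))
```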
Level-1 Troubleshooting: Latency
txwait-history New!
• Graphical display of time where Tx credits are not available
• Similar in format to cpu history
• 3 graphs per port
• Last 60 seconds
• Last 60 minutes
• Last 72 hours
• Utilizes the underlying txwait counter

mds9710-1# show process creditmon txwait-history module 1 port 13
TxWait history for port fc1/13:
==============================
[Graph: Credit Not Available per second (last 60 seconds); x-axis 0-60 seconds, y-axis 0-1000, # = TxWait (ms)]
Level-1 Troubleshooting: Latency
slowport-monitor New!
• system timeout slowport-monitor <1-500> mode e|f (admin delay in ms, for E or F ports)

• Events are captured every 100ms


• Last 10 events per port captured in slowport-monitor-events
• Logging onboard slowport-monitor-events captures more events
• Currently implemented for:
• 9500
• Gen 3 LCs - DS-X9248-48K9 and DS-X92xx-96K9 modules
• Gen 4 LCs - DS-X9232-256K9 and DS-X9248-256K9 modules
• 9700 & 9396S (Gen 5)
• 9250i & 9148S

• Differences exist between Gen3, Gen4 and 9700/9250i/9148S/9396S


Level-1 Troubleshooting: Latency
show process creditmon slowport-monitor command New!
• system timeout slowport-monitor… must be configured

• Events are captured every 100ms


• Last 10 events per port captured in slowport-monitor-events
• Differences exist between Gen3, Gen4 and 9700/9250i/9148S/9396S
Level-1 Troubleshooting: Latency
New!
Slowport-monitor – Gen3 LCs - DS-X9248-48K9 and DS-X92xx-96K9 modules
• Gen3 modules have basic HW capabilities
• Each 100ms, it can only be determined whether the port was at zero Tx credits for the admin delay period
• The actual amount of time, and the number of times within that 100ms, cannot be determined
• Recorded when at least one complete event occurred
• No oper delay

mds9513# show process creditmon slowport-monitor-events module 2

Module: 02 Slowport Detected: YES
=========================================================================
Interface = fc2/1
--------------------------------------------------------
| admin | slowport  | Timestamp                 |
| delay | detection |                           |
| (ms)  | count     |                           |
--------------------------------------------------------
| 10    | 194       |  1. 04/29/15 17:19:13.345 |
| 10    | 193       |  2. 04/29/15 17:19:13.245 |
| 10    | 192       |  3. 04/29/15 17:19:13.145 |
| 10    | 191       |  4. 04/29/15 17:19:13.045 |
| 10    | 190       |  5. 04/29/15 17:19:12.945 |
| 10    | 189       |  6. 04/29/15 17:19:12.845 |
| 10    | 188       |  7. 04/29/15 17:19:12.745 |
| 10    | 187       |  8. 04/29/15 17:19:12.645 |
| 10    | 186       |  9. 04/29/15 17:19:12.545 |
| 10    | 185       | 10. 04/29/15 17:19:12.445 |
--------------------------------------------------------

• Configured delay 10ms; events at 100ms intervals; only 1 event in the last 100ms


Level-1 Troubleshooting: Latency
New!
Slowport-monitor – Gen4 LCs - DS-X92xx-256K9
• Gen4 modules use txwait for slowport-monitor
• Recorded when txwait is >= admin delay within 100ms
• oper delay is cumulative delay
• Txwait is cumulative for the 100ms interval, so with a 10ms admin delay either of these records an event:
• 1 x 10ms
• 10 x 1ms

MDS9513# show process creditmon slowport-monitor-events module 4
Module: 04 Slowport Detected: YES
==================================================================
Interface = fc4/1
----------------------------------------------------------------
| admin | slowport  | txwait | Timestamp                 |
| delay | detection | oper   |                           |
| (ms)  | count     | delay  |                           |
|       |           | (ms)   |                           |
----------------------------------------------------------------
| 10    | 18        | 16     |  1. 05/21/15 14:39:09.102 |
| 10    | 17        | 56     |  2. 05/21/15 14:39:09.002 |
| 10    | 16        | 59     |  3. 05/21/15 14:39:08.905 |
| 10    | 15        | 10     |  4. 05/21/15 14:38:54.590 |
| 10    | 14        | 41     |  5. 05/21/15 14:38:54.490 |
| 10    | 13        | 80     |  6. 05/21/15 14:38:54.390 |
| 10    | 12        | 37     |  7. 05/21/15 14:38:39.970 |
| 10    | 11        | 56     |  8. 05/21/15 14:38:39.870 |
| 10    | 10        | 34     |  9. 05/21/15 14:38:39.775 |
| 10    | 9         | 29     | 10. 05/21/15 14:38:25.430 |
----------------------------------------------------------------

• Configured delay 10ms; 100ms intervals; only 1 event per 100ms; oper delay is the cumulative delay in 100ms


Level-1 Troubleshooting: Latency
slowport-monitor – 9700/9250i/9148S/9396S (Gen 5 LCs) New!

• Gen5/9250i/9148S/9396S have enhanced HW capabilities
• Each 100ms interval, the number of times Tx credits remained at 0 for the configured (admin) delay is counted
• The average operational delay is determined; this is how long the port was at 0 Tx credits
• Recorded when at least one complete event occurred

MDS9710-1# show process creditmon slowport-monitor-events

Module: 01 Slowport Detected: YES
=========================================================================
Interface = fc1/13
----------------------------------------------------------------
| admin | slowport  | oper  | Timestamp                 |
| delay | detection | delay |                           |
| (ms)  | count     | (ms)  |                           |
----------------------------------------------------------------
| 5     | 1300      | 20    |  1. 04/01/15 23:03:38.823 |
| 5     | 1296      | 19    |  2. 04/01/15 23:03:38.724 |
| 5     | 1291      | 19    |  3. 04/01/15 23:03:38.623 |
…
| 5     | 1256      | 19    | 10. 04/01/15 23:03:37.923 |
----------------------------------------------------------------

• Configured delay (5ms); oper delay is the actual average delay
• 4 events in the last 100ms (1300 - 1296)
• Note: Oper delay limited by no-credit-drop threshold
Level-1 Troubleshooting: Latency
slowport-monitor – Comparison
System timeout slowport-monitor 5
[Figure: Tx credits over one 100ms interval (Time (ms), 0-100) containing two periods at 0 Tx credits, 15ms and 30ms long, as recorded by each platform]
• Gen5/9250i/9148S/9396S: 2 events in 100ms, 2 events logged; oper delay = (15ms + 30ms) / 2 ≈ 22ms
• 9500 Gen3: 0 Tx >= 5ms in the 100ms poll, 1 event logged
• 9500 Gen4: total time 45ms >= 5ms in 100ms, 1 event logged
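The three behaviours can be captured in a simplified Python model, assuming each 100ms interval is described by the list of complete zero-Tx-credit periods (in ms) it contains. It deliberately ignores details such as Gen4 txwait only counting while frames are queued and the Gen5 oper delay being capped by the no-credit-drop threshold; `slowport_events` is a hypothetical name, not switch logic:

```python
def slowport_events(zero_periods_ms, admin_ms, generation):
    """Return (events_logged, oper_delay_ms) for one 100ms interval.

    zero_periods_ms: durations of the complete periods at 0 Tx credits.
    oper_delay_ms is None where the hardware does not measure it.
    """
    if generation == "gen3":
        # Only an indication: did any single period reach the admin delay?
        hit = any(p >= admin_ms for p in zero_periods_ms)
        return (1 if hit else 0), None
    if generation == "gen4":
        # txwait accumulates over the interval; at most one event per 100ms.
        total = sum(zero_periods_ms)
        return (1, total) if total >= admin_ms else (0, None)
    if generation == "gen5":
        # Every complete period reaching the admin delay is an event;
        # oper delay is their average (shown as whole ms).
        hits = [p for p in zero_periods_ms if p >= admin_ms]
        if not hits:
            return 0, None
        return len(hits), int(sum(hits) / len(hits))
    raise ValueError(f"unknown generation: {generation}")


for gen in ("gen3", "gen4", "gen5"):
    print(gen, slowport_events([15, 30], admin_ms=5, generation=gen))
```

With the 15ms and 30ms periods from the comparison this gives 1 event (no delay) for Gen3, 1 event with 45ms for Gen4, and 2 events averaging 22ms for Gen5. It also shows why Gen4 flags ten 1ms periods against a 10ms admin delay while Gen3 does not.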
Level-1 Troubleshooting: Latency
show logging onboard slowport-monitor command New!

More events are available via logging onboard (Gen5/9250i/9148S/9396S output shown)


MDS9710-1# show logging onboard slowport-monitor-events

---------------------------------
Module: 1 slowport-monitor-events
---------------------------------

--------------------------------------------------------------------------
| admin | slowport | oper | Timestamp | Interface
| delay | detection | delay | |
| (ms) | count | (ms) | |
--------------------------------------------------------------------------
| 20 | 49 | 489 | 05/11/15 21:04:46.779 | fc1/13
| 20 | 48 | 489 | 05/11/15 21:04:46.272 | fc1/13
| 20 | 47 | 489 | 05/11/15 21:04:45.779 | fc1/13
| 20 | 46 | 489 | 05/11/15 21:04:45.272 | fc1/13
Level-1 Troubleshooting: Latency
Slowport-monitor – Comparison
Linecard | Maximum events per 100ms interval | Actual delay measured? | Notes
DS-X9248-48K9 (gen3), DS-X9224-96K9 (gen3), DS-X9248-96K9 (gen3) | 1 | No, just an indication that the admin delay was reached; actual delay could be much more | If actual delay hits the slowport-monitor admin delay, an indication is made. That indication is checked every 100ms and, if true, an event is raised.
DS-X9232-256K9 (gen4), DS-X9248-256K9 (gen4) | 1 | Yes, actual delay is the total delay per 100ms interval | If the total delay (sum of all individual delays) in a 100ms interval hits the slowport-monitor admin delay, an event is raised.
DS-X9448-768K9 (gen5), MDS 9396S (gen5), MDS 9148S, MDS 9250i | 100 | Yes, average delay for all events in the 100ms interval | If actual delay hits the slowport-monitor admin delay and the port recovered (received credit), an event is raised. These are checked every 100ms interval.
Level-1 Troubleshooting: Latency
show tech-support slowdrain New!

• Contains all the commands available that pertain to slow drain


• Contains “context” commands to understand the FC topology
• Contains name server commands to identify devices
• Contains active zonesets to understand device relationships
• Most useful when run from DCNM and gathered for the entire fabric
• SAN Client -> Tools -> Run CLI Commands…
DCNM Slow Drain
Analysis



New!
DCNM Slow Drain Analysis
• DCNM 7.1(1) added Slow Drain Analysis
• Used for pulling fabric wide slow drain counters for a defined period of time
• Useful for ongoing slow drain problems
• Accessed from the Web Client: Health -> Diagnostics -> Slow Drain Analysis
DCNM Slow Drain Analysis
Starting

Slow Drain Analysis


DCNM Slow Drain Analysis
3 steps to initiate collection of slow drain counters for a fabric

Step 1: Select fabric
Step 2: Choose duration
Step 3: Start collection
DCNM Slow Drain Analysis
While underway…

Almost finished
DCNM Slow Drain Analysis
Finished

Select job
DCNM Slow Drain Analysis
Completed Report
• 509 credit loss events in 10 minutes!
• Only show rows with non-zero counters
• Filter results as needed
DCNM Slow Drain Analysis
Counter explanations - help
Hover over a counter for additional information
DCNM Slow Drain Analysis
Show non-zero data rows only

Only show rows with non-zero counters

Only 3 rows with non-zero counters
DCNM Slow Drain Analysis
Filtering

Filter results as needed
Slow Drain Alerting and
Mitigation



Slow Drain Alerting and Mitigation
• Port Monitor
• Congestion counters
• Portguard

• Adjust Congestion Drop Threshold


• Setting the No Credit Drop Threshold
Slow Drain Alerting and Mitigation
Port-monitor alerting
• Port-monitor allows monitoring of several counters relating to slow drain
• credit-loss-reco            Credit loss recovery counter
• lr-rx                       The number of link resets received by the fc-port
• lr-tx                       Link resets transmitted by the fc-port
• timeout-discards            Timeout discards counter
• tx-credit-not-available     Credit not available counter (in 100ms increments)
• tx-discards                 Tx discards counter
• slowport-count              Number of slowport events New!
• slowport-oper-delay         Slowport operational delay New!
• txwait                      Amount of time at 0 Tx credits with frames queued New!
Note: Other counters are also valuable and should be considered for monitoring, but they are not related to slow drain
Slow Drain Alerting and Mitigation
Port-monitor counter - credit-loss-reco
• Number of times credit loss recovery was initiated because the port was at 0 Tx credits for 1 second (F ports) or 1.5 seconds (E ports)
• Most severe indication of congestion
• Normally other counters like timeout-discards will also increment
• Configure as a simple delta counter with a low value
• Applies to all types of switches and linecards
Slow Drain Alerting and Mitigation
Port-monitor counter - lr-rx and lr-tx
• Number of times a Link Reset (LR) was received (lr-rx)
• Number of times a Link Reset (LR) was transmitted (lr-tx)
• Similar to credit-loss-reco counter
• May increment for other reasons besides congestion
• Normally other counters like timeout-discards will also increment
• Configure as a simple delta counter with a low value
• Applies to all types of switches and linecards
Slow Drain Alerting and Mitigation
Port-monitor counter - timeout-discards
• Number of packets dropped due to reaching the congestion-drop (timeout)
threshold
• When packets are dropped, SCSI errors result at the hosts and targets
• Configure as a simple delta counter with a low value
• Applies to all types of switches and linecards
Slow Drain Alerting and Mitigation
Port-monitor counter - tx-credit-not-available
• Indicates 100ms intervals of a port at 0 Tx credits
• rising-threshold is configured as a percentage of the poll interval (1 second)
• Examples:
• counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
• 10 is 10% of 1 second, or 100ms
• counter tx-credit-not-available poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4
• 20 is 20% of 1 second, or 200ms
• Only multiples of 10 (10, 20, 30, etc.) should be configured, since the counter increments in 100ms units
• Applies to all types of switches and linecards
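A quick Python check of the percentage-to-milliseconds rule above, assuming the 1-second poll interval used in the examples; `tcna_rising_ms` is a hypothetical helper for sanity-checking a policy before it is configured:

```python
def tcna_rising_ms(rising_pct, poll_interval_s=1):
    """Convert a tx-credit-not-available rising-threshold (% of the poll
    interval) to milliseconds of zero-Tx-credit time."""
    # The counter moves in 100ms steps, so with a 1-second poll only
    # multiples of 10% line up with values it can actually report.
    if rising_pct % 10 != 0 or not (0 < rising_pct <= 100):
        raise ValueError("rising-threshold should be a multiple of 10 (10, 20, 30, ...)")
    return rising_pct * poll_interval_s * 1000 // 100


print(tcna_rising_ms(10))  # 100
print(tcna_rising_ms(20))  # 200
```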
Slow Drain Alerting and Mitigation
Port-monitor counter - tx-discards
• The number of packets dropped at egress for a variety of reasons.
• This counter also includes timeout discards
• Configure as a simple delta counter with a low value
• Applies to all types of switches and linecards
Slow Drain Alerting and Mitigation
New!
Port-monitor counter - Slowport-count
• Counts the number of times the slowport-monitor threshold was reached
• Only applies to MDS 9500 with generation 3 linecards
• 1/2/4/8 Gbps 24-Port Fibre Channel switching module (DS-X9224-96K9)
• 1/2/4/8 Gbps 48-Port Fibre Channel switching module (DS-X9248-96K9)
• 1/2/4/8 Gbps 4/44-Port Fibre Channel switching module (DS-X9248-48K9)

• Only counts a maximum of once per 100ms interval (10 per second)
• Indicates 0 Tx credits for at least the slowport-monitor interval
• Slowport-monitor must be configured for this to alert
• Refer to gen3 slowport-monitor section for more info
Slow Drain Alerting and Mitigation
New!
Port-monitor counter - slowport-oper-delay
• Alerts on slowport operational(actual) delay
• Only applies to the following
• MDS 9500 with generation 4 linecards
• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)
• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)
• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)
• MDS 9148S 16G Multilayer Fabric Switch
• MDS 9250i Multiservice Fabric Switch
• MDS 9396S 16G Multilayer Fabric Switch

• Alerts on the operational (actual) delay, not on the admin (configured) delay


Slow Drain Alerting and Mitigation New!
Port-monitor counter - slowport-oper-delay - continued
• Configured as an absolute counter
• Slowport-monitor must be configured for this to alert!
• Refer to Gen4 slowport-monitor section for more info
• Refer to Gen5/9250i/9148S/9396S slowport-monitor section for more info
Slow Drain Alerting and Mitigation
New!
Port-monitor counter - txwait
• Measures time port is at 0 Tx credits and frames are queued to send
• Only applies to the following
• MDS 9500 with generation 4 linecards
• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)
• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)
• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)
• MDS 9148S 16G Multilayer Fabric Switch
• MDS 9250i Multiservice Fabric Switch
• MDS 9396S 16G Multilayer Fabric Switch

• Configured as a percentage of the polling interval
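To see what a txwait percentage means in raw counter terms, here is a Python sketch that turns a per-poll tick delta into a percentage and compares it with the rising-threshold. It assumes RMON-style "greater than or equal" semantics for crossing the threshold, and the function names are hypothetical:

```python
def txwait_pct_of_poll(delta_ticks, poll_interval_s=1):
    """Share of the poll interval spent at 0 Tx credits with frames queued."""
    return 100 * delta_ticks * 2.5 / 1_000_000 / poll_interval_s


def txwait_alert(delta_ticks, rising_pct=20, poll_interval_s=1):
    """True when the delta reaches the rising-threshold (20% = 200ms per second)."""
    return txwait_pct_of_poll(delta_ticks, poll_interval_s) >= rising_pct


print(txwait_pct_of_poll(80_000))  # 80,000 ticks = 200ms = 20% of 1 second
```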


Slow Drain Alerting and Mitigation
Port-monitor slowport counters - comparison
Linecard              | slowport-count | slowport-oper-delay | txwait
DS-X9248-48K9 (gen3)  | X              |                     |
DS-X9224-96K9 (gen3)  | X              |                     |
DS-X9248-96K9 (gen3)  | X              |                     |
DS-X9232-256K9 (gen4) |                | X                   | X
DS-X9248-256K9 (gen4) |                | X                   | X
DS-X9448-768K9 (gen5) |                | X                   | X
MDS 9148S             |                | X                   | X
MDS 9250i             |                | X                   | X
MDS 9396S             |                | X                   | X
Slow Drain Alerting and Mitigation
Port-monitor alerting
• Port-monitor allows separate policies for:
• F, FL ports (access)
• E, TL ports (trunks)
• Both F ports and E ports (all)
• Only one policy type per port can be active at a time
• Note: port-type access includes F port connections to NPV switches, which can carry several logins
• Note: NP ports are not currently monitored

MDS9513(config-port-monitor)# port-type ?
access-port  Configure port-monitoring for access ports
all          Configure port-monitoring for all ports
trunks       Configure port-monitoring for trunk ports
Slow Drain Alerting and Mitigation
Port-monitor alerting - continued
• counter <name> poll-interval <interval> {delta | absolute} rising-threshold <rthresh> event <id> falling-threshold <fthresh> event <id> [portguard {errordisable | flap}]
• poll-interval – Seconds; how often this counter is checked
• delta – Compare the current value with the value at the previous poll interval
• absolute – Match the actual value
• rising-threshold – How much the counter must increase in this poll interval (delta), or the value it must reach (absolute), to trigger
• event – Severity of the alert: info, warning, error, etc.
• falling-threshold – The value at or below which the counter must fall in a poll interval to reset the alert
• portguard – Optional; action to take when the rising-threshold is reached
• errordisable – Place the port in error-disabled state; a manual shut/no shut is required to re-activate it
• flap – shut/no shut the port
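One way to reason about rising and falling thresholds for a delta counter is to replay polled readings through them. A Python sketch modeled on RMON-style hysteresis (a rising alert fires once, then re-arms only after the delta drops to the falling threshold); this is an illustrative model under that assumption, not the switch's implementation, and the sample readings are hypothetical:

```python
def replay_delta_alerts(samples, rising, falling):
    """Replay cumulative counter readings (one per poll interval) through a
    delta rising/falling threshold pair and return the alerts that fire."""
    alerts = []
    armed = True  # a rising alert can fire
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if armed and delta >= rising:
            alerts.append((i, "rising", delta))  # portguard, if set, acts here
            armed = False
        elif not armed and delta <= falling:
            alerts.append((i, "falling", delta))
            armed = True
    return alerts


# Hypothetical timeout-discards readings with rising 50 / falling 10
print(replay_delta_alerts([0, 60, 130, 135, 200], rising=50, falling=10))
# [(1, 'rising', 60), (3, 'falling', 5), (4, 'rising', 65)]
```

Note that the second poll (delta 70) does not fire again: until the delta falls back to the falling threshold, the alert stays latched.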
Slow Drain Alerting and Mitigation
Port-monitor alerting – continued
• Monitor-counter command determines which counters are active in a policy

rtp-san-33-18-9710-1(config-port-monitor)# monitor counter ?


credit-loss-reco Configure credit loss recovery counter
err-pkt-from-port Configure err-pkt-from-port counter
err-pkt-from-xbar Configure err-pkt-from-xbar counter
err-pkt-to-xbar Configure err-pkt-to-xbar counter
invalid-crc Configure invalid-crc counter
invalid-words Configure invalid-words counter
link-loss Configure link-failure counter
lr-rx Configure the number of link resets received by the fc-port
lr-tx Configure the number of link resets transmitted by the fc-port
rx-datarate Configure rx performance counter
signal-loss Configure signal-loss counter
slowport-count Configure slow port sub-100ms counter
slowport-oper-delay Configure slow port operation delay
sync-loss Configure sync-loss counter
timeout-discards Configure timeout discards counter
tx-credit-not-available Configure credit not available counter
tx-datarate Configure tx performance counter
tx-discards Configure tx discards counter
txwait Configure tx total wait counter
Slow Drain Alerting and Mitigation
Port-monitor alerting – RMON event severities
• Event indicates severity in alert
• 1 – Fatal
• 2 – Critical
• 3 – Error
• 4 – Warning
• 5 – Informational

mds9513(config-port-monitor)# show rmon events
Event 1 is active, owned by PMON@FATAL
Description is FATAL(1)
Event firing causes log and trap to community public, last fired never
Event 2 is active, owned by PMON@CRITICAL
Description is CRITICAL(2)
Event firing causes log and trap to community public, last fired never
Event 3 is active, owned by PMON@ERROR
Description is ERROR(3)
Event firing causes log and trap to community public, last fired never
Event 4 is active, owned by PMON@WARNING
Description is WARNING(4)
Event firing causes log and trap to community public, last fired 2014/02/21-17:13:11
Event 5 is active, owned by PMON@INFO
Description is INFORMATION(5)
Event firing causes log and trap to community public, last fired 2014/03/08-08:25:19
Slow Drain Alerting and Mitigation
Port-monitor alerting – Example
port-monitor name AllPorts
port-type all
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4
counter slowport-oper-delay poll-interval 1 absolute rising-threshold 50 event 4 falling-threshold 0 event 4
counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4

• port-type all: the policy applies to Access(F) and Trunk(E) ports
• The "no monitor counter" lines are counters that are not monitored
• New!: slowport-count, slowport-oper-delay and txwait

Note: The above monitors 9 slow drain counters and does not monitor 10 others
Slow Drain Alerting and Mitigation
Port-monitor alerting – activation and output

MDS9710-1# show port-monitor AllPorts

Policy Name : AllPorts


Admin status : Not Active
Oper status : Not Active
Port type : All Ports
---------------------------------------------------------------------------------------------------------
Counter Threshold Interval Rising Threshold event Falling Threshold event PMON Portguard
------- --------- -------- ---------------- ----- ------------------ ----- --------------
TX Discards Delta 60 50 4 10 4 Not enabled
LR RX Delta 60 5 4 1 4 Not enabled
LR TX Delta 60 5 4 1 4 Not enabled
Timeout Discards Delta 60 50 4 10 4 Not enabled
Credit Loss Reco Delta 60 1 4 0 4 Not enabled
TX Credit Not Available Delta 1 10% 4 0% 4 Not enabled
slowport-count Delta 1 5 4 0 4 Not enabled
slowport-oper-delay Absolute 1 50ms 4 0ms 4 Not enabled
txwait Delta 1 20% 4 0% 4 Not enabled
----------------------------------------------------------------------------------------------------------
Slow Drain Alerting and Mitigation
SNMP trap OIDs sent by port-monitor
• Port-monitor sends SNMP traps with the following object identifiers (OIDs):
• fcIfTxWtAvgBBCreditTransitionToZero: 1.3.6.1.4.1.9.9.289.1.2.1.1.38
• Note: There is no OID in the Rx direction.
• fcIfCreditLoss: 1.3.6.1.4.1.9.9.289.1.2.1.1.37
• fcIfLinkResetOuts: 1.3.6.1.4.1.9.9.289.1.2.1.1.10
• fcIfLinkResetIns: 1.3.6.1.4.1.9.9.289.1.2.1.1.9
• fcIfTimeOutDiscards: 1.3.6.1.4.1.9.9.289.1.2.1.1.35
• fcIfOutDiscards: 1.3.6.1.4.1.9.9.289.1.2.1.1.36
• fcIfSlowportCount: 1.3.6.1.4.1.9.9.289.1.2.1.1.44 New!
• fcIfSlowportOperDelay: 1.3.6.1.4.1.9.9.289.1.2.1.1.45 New!
• fcIfTxWaitCount: 1.3.6.1.4.1.9.9.289.1.2.1.1.15 New!
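On the receiving side, a trap collector has to map varbind OIDs back to these names. A Python sketch using only the OIDs listed above, assuming each varbind OID is one of these columns followed by a single index sub-identifier (the interface ifIndex); `decode_varbind` is a hypothetical helper, not part of any SNMP library:

```python
# Column OIDs listed on this slide
PMON_TRAP_OIDS = {
    "1.3.6.1.4.1.9.9.289.1.2.1.1.38": "fcIfTxWtAvgBBCreditTransitionToZero",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.37": "fcIfCreditLoss",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.10": "fcIfLinkResetOuts",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.9": "fcIfLinkResetIns",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.35": "fcIfTimeOutDiscards",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.36": "fcIfOutDiscards",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.44": "fcIfSlowportCount",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.45": "fcIfSlowportOperDelay",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.15": "fcIfTxWaitCount",
}


def decode_varbind(oid):
    """Split an instance OID into (object name, index), assuming the index is
    a single trailing sub-identifier; returns None for unrelated OIDs."""
    base, _, index = oid.rpartition(".")
    name = PMON_TRAP_OIDS.get(base)
    return (name, int(index)) if name else None


name, if_index = decode_varbind("1.3.6.1.4.1.9.9.289.1.2.1.1.15.16826368")
print(name, hex(if_index))  # fcIfTxWaitCount 0x100c000
```

In this example the index 0x0100c000 matches the if_index for fc1/13 in the idxmap output shown earlier in the troubleshooting section.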
Slow Drain Alerting and Mitigation
Port-monitor portguard

• Adding portguard to error-disable or flap a port lets the switch mitigate problems automatically
• Should be applied to access (F) ports only
• Use separate access(F) and trunk(E) policies
• Applies to delta counters only
Slow Drain Alerting and Mitigation
Port-monitor portguard - continued
• The following adds portguard to timeout-discards and credit-loss-reco and raises their rising-thresholds (timeout-discards from 50 to 60, credit-loss-reco from 1 to 4):
port-monitor name AccessPorts
port-type access
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 60 event 4 falling-threshold 10 event 4 portguard errordisable
counter credit-loss-reco poll-interval 60 delta rising-threshold 4 event 4 falling-threshold 0 event 4 portguard errordisable
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4
counter slowport-oper-delay poll-interval 1 absolute rising-threshold 50 event 4 falling-threshold 0 event 4
counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4

• port-type access: Access(F) port policy
• timeout-discards: error disable the port when 60 timeout-discards happen in 60 seconds
• credit-loss-reco: error disable the port when 4 credit loss recovery events occur in 60 seconds
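When the same policy is pushed to many switches, it can help to generate the `counter` lines rather than type them. A Python sketch that emits lines in exactly the syntax shown in this section and enforces the "portguard applies to delta counters only" rule; it assumes the same event severity for rising and falling, as in these examples, and `counter_line` is a hypothetical helper:

```python
def counter_line(name, poll, kind, rising, falling, event=4, portguard=None):
    """Build one port-monitor 'counter' command using the syntax on these slides."""
    if kind not in ("delta", "absolute"):
        raise ValueError("kind must be 'delta' or 'absolute'")
    if portguard is not None:
        if portguard not in ("errordisable", "flap"):
            raise ValueError("portguard must be 'errordisable' or 'flap'")
        if kind != "delta":
            raise ValueError("portguard applies to delta counters only")
    line = (f"counter {name} poll-interval {poll} {kind} "
            f"rising-threshold {rising} event {event} "
            f"falling-threshold {falling} event {event}")
    if portguard:
        line += f" portguard {portguard}"
    return line


print(counter_line("timeout-discards", 60, "delta", 60, 10, portguard="errordisable"))
print(counter_line("credit-loss-reco", 60, "delta", 4, 0, portguard="errordisable"))
```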
Slow Drain Alerting and Mitigation
Port-monitor portguard – trunk (E) port policy
port-monitor name ISLPorts
port-type trunks
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4
counter slowport-oper-delay poll-interval 1 absolute rising-threshold 80 event 4 falling-threshold 0 event 4
counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4
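Every counter line in these sample policies has the same shape, so policies can be audited mechanically. The sketch below is a hypothetical regex-based parser that covers only the `counter ...` form used in these samples, not the full NX-OS grammar; `parse_counter_line` and the dictionary keys are illustrative names.

```python
import re

# Shape of the "counter ..." lines in these sample policies (not the full NX-OS grammar).
COUNTER_RE = re.compile(
    r"counter (?P<name>\S+) poll-interval (?P<poll>\d+) (?P<mode>delta|absolute) "
    r"rising-threshold (?P<rising>\d+) event (?P<rising_event>\d+) "
    r"falling-threshold (?P<falling>\d+) event (?P<falling_event>\d+)"
    r"(?: portguard (?P<portguard>\S+))?$"
)


def parse_counter_line(line: str) -> dict:
    """Parse one port-monitor counter line into a dict; raise ValueError otherwise."""
    match = COUNTER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a port-monitor counter line: {line!r}")
    fields = match.groupdict()
    return {
        "name": fields["name"],
        "poll_interval_s": int(fields["poll"]),
        "mode": fields["mode"],
        "rising": int(fields["rising"]),
        "rising_event": int(fields["rising_event"]),
        "falling": int(fields["falling"]),
        "falling_event": int(fields["falling_event"]),
        "portguard": fields["portguard"],  # None when no portguard action is set
    }


line = "counter slowport-oper-delay poll-interval 1 absolute rising-threshold 80 event 4 falling-threshold 0 event 4"
print(parse_counter_line(line)["rising"])  # 80
```

Lines such as `no monitor counter rx-datarate` are rejected with `ValueError`, which makes it easy to separate disabled counters from active ones.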
Port-monitor portguard – when activated
mds9710-1# show port-monitor active

Policy Name  : ISLPorts
Admin status : Active
Oper status  : Active
Port type    : All Trunk Ports
---------------------------------------------------------------------------------------------------------
Counter                  Threshold  Interval  Rising Threshold  event  Falling Threshold  event  PMON Portguard
-------                  ---------  --------  ----------------  -----  -----------------  -----  --------------
TX Discards              Delta      60        100               4      10                 4      Not enabled
LR RX                    Delta      60        5                 4      1                  4      Not enabled
LR TX                    Delta      60        5                 4      1                  4      Not enabled
Timeout Discards         Delta      60        100               4      10                 4      Not enabled
Credit Loss Reco         Delta      60        1                 4      0                  4      Not enabled
TX Credit Not Available  Delta      1         10%               4      0%                 4      Not enabled
slowport-count           Delta      1         5                 4      0                  4      Not enabled
slowport-oper-delay      Absolute   1         50ms              4      0ms                4      Not enabled
txwait                   Delta      1         20%               4      0%                 4      Not enabled
---------------------------------------------------------------------------------------------------------

Policy Name  : AccessPorts
Admin status : Active
Oper status  : Active
Port type    : All Access Ports
---------------------------------------------------------------------------------------------------------
Counter                  Threshold  Interval  Rising Threshold  event  Falling Threshold  event  PMON Portguard
-------                  ---------  --------  ----------------  -----  -----------------  -----  --------------
TX Discards              Delta      60        50                4      10                 4      Not enabled
LR RX                    Delta      60        5                 4      1                  4      Not enabled
LR TX                    Delta      60        5                 4      1                  4      Not enabled
Timeout Discards         Delta      60        60                4      10                 4      Error Disable
Credit Loss Reco         Delta      60        4                 4      0                  4      Error Disable
TX Credit Not Available  Delta      1         10%               4      0%                 4      Not enabled
slowport-count           Delta      1         5                 4      0                  4      Not enabled
slowport-oper-delay      Absolute   1         50ms              4      0ms                4      Not enabled
Tx wait                  Delta      1         20%               4      0%                 4      Not enabled
---------------------------------------------------------------------------------------------------------
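The `%` values in the TX Credit Not Available and txwait rows are thresholds on time spent at zero Tx credits within the poll interval. Assuming that reading, the sketch below relates a percentage threshold to the TxWait counter shown by show interface counters, which counts in 2.5us units. The function names are hypothetical.

```python
TXWAIT_TICK_S = 2.5e-6  # show interface counters reports TxWait in 2.5us units


def txwait_percent(delta_ticks: int, poll_interval_s: float = 1.0) -> float:
    """Percent of the poll interval spent waiting for Tx credits."""
    return 100.0 * delta_ticks * TXWAIT_TICK_S / poll_interval_s


def ticks_for_percent(percent: float, poll_interval_s: float = 1.0) -> int:
    """TxWait delta (in 2.5us ticks) that corresponds to a percentage threshold."""
    return round(percent / 100.0 * poll_interval_s / TXWAIT_TICK_S)


# txwait rising-threshold 20 with poll-interval 1: 20% of 1 s is 200ms of waiting
print(ticks_for_percent(20))  # 80000
```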
DCNM event log (figure not reproduced)
Adjust Congestion Drop Threshold Lower
• Lowering congestion drop timeout value from 500ms to 200ms
  system timeout congestion-drop 200 mode f
• Frees up ingress buffer space quicker
• Can be set differently on F and E ports
• Congestion timeout for mode F should be smaller than (or equal to) mode E
• Global command for switch
• Recommended for F ports
Figure: frames queued between 0 sec and 200ms; the timestamp of each frame is checked, and frames still queued at 200ms are dropped from the queue.
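The timestamp check in the figure can be modeled as ageing frames out of an ingress queue. The sketch below is an illustration, not the switch's hardware logic: it assumes frames are held in arrival order and that a frame is dropped once its age reaches the congestion-drop timeout (the `>=` comparison is an assumption). `age_out` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Tuple


def age_out(queue: Deque[Tuple[float, str]], now_ms: float, timeout_ms: float = 200.0) -> int:
    """Drop queued frames whose age has reached timeout_ms; return how many were dropped.

    queue holds (arrival_ms, frame_id) in arrival order, so the oldest frame is
    always at the left end and the scan can stop at the first frame that is young enough.
    """
    dropped = 0
    while queue and now_ms - queue[0][0] >= timeout_ms:
        queue.popleft()
        dropped += 1
    return dropped


frames = deque([(0.0, "a"), (50.0, "b"), (180.0, "c")])
print(age_out(frames, now_ms=260.0), [f for _, f in frames])  # 2 ['c']
```

With the 500ms default the same queue would keep all three frames at 260ms, which is why lowering the value frees ingress buffers sooner.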
Setting the No Credit Drop Threshold
• system timeout no-credit-drop 200 mode f
• No-credit-drop causes frames to be dropped immediately if the destination port is at 0 Tx credits for the time specified
• Should be used in conjunction with lowering the congestion-drop threshold
• Recommended for F ports
• Can drastically improve ISL performance under slow drain conditions
• Reflected in the xxx_FORCE_TIMEOUT_ON/OFF counters
• By default no-credit-drop is not enabled
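The no-credit-drop decision depends only on how long the destination port has been at 0 Tx credits. The sketch below models it together with the 500ms stuck-port default described in the counter tables (MDS 9700, 9250i and 9148S, NX-OS 6.2(9) and later): the force-timeout counter can increment at the default, but frames are dropped only when no-credit-drop is configured. The function name and the `>=` comparison are assumptions.

```python
from typing import Optional, Tuple

STUCK_PORT_DEFAULT_MS = 500.0  # default per the counter tables (MDS 9700/9250i/9148S, NX-OS 6.2(9)+)


def no_credit_check(zero_credit_ms: float, configured_ms: Optional[float]) -> Tuple[bool, bool]:
    """Return (force_timeout_counted, frames_dropped) for a port at 0 Tx credits.

    configured_ms is the "system timeout no-credit-drop" value, or None when the
    command is not configured (the default, in which case nothing is dropped).
    """
    threshold = configured_ms if configured_ms is not None else STUCK_PORT_DEFAULT_MS
    reached = zero_credit_ms >= threshold
    return reached, reached and configured_ms is not None


print(no_credit_check(250, 200))   # (True, True)
print(no_credit_check(600, None))  # (True, False)
```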
Test results – congestion-timeout/no-credit-timeout
Topology: two switches connected by an 8Gbps ISL on fc1/3 of each switch. Ag104/1 and Ag104/2 attach at 4Gbps on fc1/13, and Ag104/3 and Ag104/4 attach at 4Gbps on fc1/14, one of each pair per switch. Ag104/4 is the slow drain device.
Chart (frames/sec): 104/4 R_Rdy delay 300ms, default timeout settings
Chart (frames/sec): 104/4 R_Rdy delay 300ms, congestion-drop/no-credit-drop 200ms. Almost 3X improvement on the flow!
Summary

• FC B2B flow control helps reduce packet loss
• Devices with problems can cause congestion problems in the fabric
• This congestion can propagate through the fabric affecting unrelated devices
• MDS has several features designed to alert on, identify and mitigate slow drain
• Classify your problem and follow the troubleshooting guidelines
Troubleshooting Summary
Where do you start?
• Proactive
  - Configure slowport-monitor
  - Configure congestion-drop and no-credit-drop
  - Configure port-monitor policies
• Reactive
  - Use several show logging onboard commands with the starttime option to display events
Proactive
• Configure slowport-monitor at 10-25ms for both E & F ports
  - system timeout slowport-monitor 10 mode e
  - system timeout slowport-monitor 10 mode f
• Configure congestion-drop on F ports
  - system timeout congestion-drop 200 mode f
  - Don't go below 200ms!
• Configure no-credit-drop on F ports
  - system timeout no-credit-drop <ms> mode f
  - 200ms – safe, 100ms – aggressive, 50ms – very aggressive
• Configure port-monitor policy(s)
  - Use the samples included in the port-monitor section
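These proactive recommendations can be captured as guard rails when generating configuration. The hypothetical helper below emits the `system timeout` lines from this slide and rejects values outside the guidance: slowport-monitor 10-25ms, congestion-drop not below 200ms and not above the mode E value (assumed here to be the 500ms default the earlier slide lowers from). The no-credit-drop labels follow the slide; bucketing values between its data points downward is an assumption.

```python
from typing import List

CONGESTION_DROP_E_DEFAULT_MS = 500  # assumed mode E value (the default the slide lowers from)


def build_proactive_config(slowport_ms: int = 10,
                           congestion_drop_f_ms: int = 200,
                           no_credit_drop_f_ms: int = 200,
                           congestion_drop_e_ms: int = CONGESTION_DROP_E_DEFAULT_MS) -> List[str]:
    """Emit the proactive 'system timeout' lines, enforcing the slide's guidance."""
    if not 10 <= slowport_ms <= 25:
        raise ValueError("slowport-monitor: recommended range is 10-25ms")
    if congestion_drop_f_ms < 200:
        raise ValueError("congestion-drop: don't go below 200ms")
    if congestion_drop_f_ms > congestion_drop_e_ms:
        raise ValueError("congestion-drop for mode F should be <= mode E")
    return [
        f"system timeout slowport-monitor {slowport_ms} mode e",
        f"system timeout slowport-monitor {slowport_ms} mode f",
        f"system timeout congestion-drop {congestion_drop_f_ms} mode f",
        f"system timeout no-credit-drop {no_credit_drop_f_ms} mode f",
    ]


def no_credit_drop_risk(ms: int) -> str:
    """Label from the slide: 200ms safe, 100ms aggressive, 50ms very aggressive."""
    if ms >= 200:
        return "safe"
    if ms >= 100:
        return "aggressive"
    return "very aggressive"  # anything below 100ms (assumption)


print("\n".join(build_proactive_config()))
```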
Reactive
• show logging onboard starttime <mm/dd/yy-hh:mm:ss> error-stats
  - Includes timestamped indications of all three levels of congestion:
    - Credit-loss-recovery
    - timeout-discards
    - Latency: 100ms Tx & Rx average wait
• show logging onboard starttime <mm/dd/yy-hh:mm:ss> slowport-monitor-events
  - Includes timestamped slowport-monitor events
  - Mostly for grade 1 (latency) issues
• show logging onboard starttime <mm/dd/yy-hh:mm:ss> txwait
  - Includes timestamped interfaces that had >=100ms delay in 20 seconds
  - Mostly for grade 1 (latency) issues
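The starttime argument uses the mm/dd/yy-hh:mm:ss format shown in the command reference. A small hypothetical helper, using only the Python standard library, can build the three reactive commands for a given start time and an optional module number.

```python
from datetime import datetime
from typing import List, Optional

REACTIVE_SECTIONS = ("error-stats", "slowport-monitor-events", "txwait")


def obfl_commands(start: datetime, module: Optional[int] = None) -> List[str]:
    """Build the reactive 'show logging onboard' commands for a given start time."""
    stamp = start.strftime("%m/%d/%y-%H:%M:%S")  # mm/dd/yy-hh:mm:ss
    mod = f" module {module}" if module is not None else ""
    return [f"show logging onboard{mod} starttime {stamp} {section}" for section in REACTIVE_SECTIONS]


for cmd in obfl_commands(datetime(2016, 3, 7)):
    print(cmd)
```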
Additional References
• Slow Drain Device Detection and Congestion Avoidance Whitepaper
• http://www.cisco.com/c/en/us/products/collateral/storage-networking/mds-9700-series-multilayer-directors/white_paper_c11-729444.html
• Generation 4 (gen4) Linecard Slow Drain Counters and Commands Troubleshooting
• http://www.cisco.com/c/en/us/support/docs/storage-networking/mds-9509-multilayer-director/116098-trouble-gen4-00.html
• MDS 9148 Slow Drain Counters and Commands
• http://www.cisco.com/c/en/us/support/docs/storage-networking/mds-9100-series-multilayer-fabric-switches/116401-trouble-mds9148-00.html
MDS Command Reference
MDS 9500
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] Displays basic interface information including Tx/Rx credit values agreed to
in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one or
more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters Displays more counters pertaining to the interface including:
Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits
remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays one
or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details Displays more counters pertaining to the interface but regarding slow drain,
not much different than the above. This will only work for a specified
interface or range of interfaces. To display all fc interfaces use the show
interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters Displays more counters for all interfaces but regarding slow drain, not much
different than the show interface counters command.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats Display OBFL error-stats. This contains many counters related to slow drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait Display txwait delta values recorded when greater than 99ms per 20 second interval
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events Display OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x] Display OBFL arbitration timeouts. Note these are not packet drops. These likely indicate the destination interface listed is congested. The source interface will retry the arbitration request.
show hardware internal statistics [module x] [device all|fcmac] Displays statistical information for ports which include errors as well

show hardware internal statistics module x pktflow dropped Displays packet drop counters

show hardware internal errors [module x] Displays error information for ports. Error indications include frame
drops/discards for various reasons including timeout discards.
slot x show port-config internal link-events Linecard command to display the link-events log. This is a concise event log
for interfaces going up and down.
show process creditmon credit-loss-events [module x] Display last 10 credit loss events per interface
show process creditmon slowport-monitor-events [module x] Display last 10 slowport-monitor events per interface. Requires system timeout slowport-monitor to be configured.
show process creditmon txwait-history [module x] [port x] Display 60 second, 60 minute, 72 hour histogram graphs
Only valid for generation 4 linecards
show system internal snmp credit-not-available Displays instances of the 100ms tx-credit-not-available
slot x show hardware internal up-xbar <0-1> queued-packet-info Displays information indicating packets that are momentarily queued.
For generation 3 linecards only
slot x show hardware internal que inst <0-3> memory iqm-statusmem0|1 Displays information indicating packets that are momentarily queued.
For generation 4 linecards only
MDS 9700 / MDS 9396S
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] Displays basic interface information including Tx/Rx credit values agreed
to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one
or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters Displays more counters pertaining to the interface including:
Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits
remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays
one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details Displays more counters pertaining to the interface but regarding slow
drain, not much different than the above. This will only work for a specified
interface or range of interfaces. To display all fc interfaces use the show
interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters Displays more counters for all interfaces but regarding slow drain, not
much different than the show interface counters command.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats Display OBFL error-stats. This contains many counters related to slow
drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss,
force-timeout, etc. Often the first command to use.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait Display txwait delta values recorded when greater than 99ms per 20
second interval
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events Display OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x] Display OBFL arbitration timeouts. Note these are not packet drops. These likely indicate the destination interface listed is congested. The source interface will retry the arbitration request.
show hardware internal statistics [module x] [device all|fcmac] Displays statistical information for ports which include errors as well
show hardware internal statistics [module x|module-all] pktflow dropped Displays packet drop counters. Note: if "module x" or "module-all" is omitted, only the counters for the supervisors are displayed. This is probably not what you want.
show hardware internal errors [module x|module-all] Displays error information for ports. Error indications include frame drops/discards for various reasons including timeout discards. Note: if "module x" or "module-all" is omitted, only the counters for the supervisors are displayed. This is probably not what you want.
slot x show port-config internal link-events Linecard command to display the link-events log. This is a concise event
log for interfaces going up and down.
show process creditmon credit-loss-events [module x] Display last 10 credit loss events per interface
show process creditmon slowport-monitor-events [module x] Display last 10 slowport-monitor events per interface. Requires system timeout slowport-monitor to be configured.
show process creditmon txwait-history [module x] [port x] Display 60 second, 60 minute, 72 hour histogram graphs
Only valid for generation 4 linecards
show system internal snmp credit-not-available Displays instances of the 100ms tx-credit-not-available
slot x show hardware internal fcmac inst [0-5 | 0-11] tmm_timeout_stat_buffer Displays information indicating packets dropped due to timeouts
slot x show hardware internal f16_que inst [0-5 | 0-11] table iqm-statusmem0|1 Displays information indicating packets that are momentarily queued.
MDS 9148
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] Displays basic interface information including Tx/Rx credit values agreed
to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one
or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters Displays more counters pertaining to the interface including:
Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits
remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays
one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details Displays more counters pertaining to the interface but regarding slow
drain, not much different than the above. This will only work for a specified
interface or range of interfaces. To display all fc interfaces use the show
interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters Displays more counters for all interfaces but regarding slow drain, not
much different than the show interface counters command.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats Display OBFL error-stats. This contains many counters related to slow drain
including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout,
etc. Often the first command to use.
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x] Display OBFL arbitration timeouts. Note these are not packet drops. These likely indicate the destination interface listed is congested. The source interface will retry the arbitration request.
show hardware internal statistics all Displays statistical information for ports which include errors as well
slot 1 show hardware internal errors Displays error information for ports. Error indications include frame
drops/discards for various reasons including timeout discards.
show hardware internal packet-flow dropped Display counts of packets dropped
show hardware internal packet-dropped-reason Displays counters of packets dropped and the counter names (reasons) for each
slot x show port-config internal link-events Linecard command to display the link-events log. This is a concise event
log for interfaces going up and down.
show process creditmon credit-loss-events [module x] Display last 10 credit loss events per interface

show system internal snmp credit-not-available Displays instances of the 100ms tx-credit-not-available
MDS 9250i
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] Displays basic interface information including Tx/Rx credit values agreed
to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one
or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters Displays more counters pertaining to the interface including:
Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits
remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays
one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details Displays more counters pertaining to the interface but regarding slow
drain, not much different than the above. This will only work for a specified
interface or range of interfaces. To display all fc interfaces use the show
interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters Displays more counters for all interfaces but regarding slow drain, not
much different than the show interface counters command.
show interface fcx/y counters details Displays more counters pertaining to the interface but regarding slow drain, not
much different than the above.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats Display OBFL error-stats. This contains many counters related to slow drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use. This command requires a single interface or interface range.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait Display txwait delta values recorded when greater than 99ms per 20 second interval
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events Display OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x] Display OBFL arbitration timeouts. Note these are not packet drops. These likely indicate the destination interface listed is congested. The source interface will retry the arbitration request.
slot 1 show hardware internal statistics device all Displays statistical information for ports which include errors as well
slot 1 show hardware internal errors Displays error information for ports. Error indications include frame
drops/discards for various reasons including timeout discards.
slot 1 show port-config internal link-events Linecard command to display the link-events log. This is a concise event log
for interfaces going up and down.
show process creditmon credit-loss-events [module x] Display last 10 credit loss events per interface
show process creditmon slowport-monitor-events Display last 10 slowport-monitor events per interface. Requires system timeout slowport-monitor to be configured.
show process creditmon txwait-history [module x] [port x] Display 60 second, 60 minute, 72 hour histogram graphs
Only valid for generation 4 linecards
show system internal snmp credit-not-available Displays instances of the 100ms tx-credit-not-available
show hardware internal packet-flow dropped Display counts of packets dropped
MDS 9148S
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] Displays basic interface information including Tx/Rx credit values agreed to
in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one or
more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters Displays more counters pertaining to the interface including:
Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits
remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays one
or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details Displays more counters pertaining to the interface but regarding slow drain,
not much different than the above. This will only work for a specified
interface or range of interfaces. To display all fc interfaces use the show
interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters Displays more counters for all interfaces but regarding slow drain, not
much different than the show interface counters command.

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats Display OBFL error-stats. This contains many counters related to slow drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait Display txwait delta values recorded when greater than 99ms per 20 second interval
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events Display OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x] Display OBFL arbitration timeouts. Note these are not packet drops. These likely indicate the destination interface listed is congested. The source interface will retry the arbitration request.
slot 1 show hardware internal statistics device all Displays statistical information for ports which include errors as well
slot 1 show hardware internal errors Displays error information for ports. Error indications include frame
drops/discards for various reasons including timeout discards.
slot 1 show port-config internal link-events Linecard command to display the link-events log. This is a concise event
log for interfaces going up and down.
show process creditmon credit-loss-events [module x] Display last 10 credit loss events per interface
show process creditmon slowport-monitor-events Display last 10 slowport-monitor events per interface. Requires system timeout slowport-monitor to be configured.
show process creditmon txwait-history [module x] [port x] Display 60 second, 60 minute, 72 hour histogram graphs
Only valid for generation 4 linecards
show system internal snmp credit-not-available Displays instances of the 100ms tx-credit-not-available
show hardware internal packet-flow dropped Display counts of packets dropped
Slow drain counters and descriptions
For the following MDS switches:
9500 – Gen2/3/4 linecards
9700 – Gen 5 linecard
9148 – 8G 48 port Fabric switch
9250i – Multiservice Fabric Switch
9148S – 16G 48 port Fabric switch
9396S – 16G 96 port fabric switch

Table 1 – Counters indicating delay only
Table 2 – Counters indicating frame drops
Table 3 – Counters indicating action on or for an interface
Table 4 – Counters representing interrupts
Table 5 – SNMP variables
Superscripts indicate linecard generation or switch type. See the
list located after Table 5.
Table 1 - Counters indicating delay only
Counter names: FCP_CNTR_RCM_CH0_LACK_OF_CREDIT (2); AK_FCP_CNTR_RCM_CH0_LACK_OF_CREDIT (3); THB_RCM_RCP0_RBBZ_CH0 (4, note1); F16_RCM_RCP0_RBBZ_CH0 (5); FCP_CNTR_RCM_RBBZ_CH0 (48); VIP_RCM_RBBZ_CH0_CNT (50i, 48S)
Description: Total count of transitions to zero for Rx B2B credits on ch0. These transitions typically indicate that the switch is applying back pressure to the attached device because of perceived congestion, and this perceived congestion can be the result of a lack of Tx B2B credits being returned on an interface over which this device is communicating.
There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.
Also shown in the output of show interface counters:
  xxxx receive B2B credit transitions from zero
  or
  xxxx Receive B2B credit transitions to zero
In the above, "from" was changed to "to" via this bug: CSCug35184 show interface counters - transitions of rx BB credit to zero state
Commands:
  Sup: show hardware internal statistics all (2, 3, 4, 48, note2); show hardware internal statistics device all (5, note2); None (48s, 50i, note3)
  Linecard: slot x show hardware internal statistics (2, 3, 48); slot x show hardware internal fc-mac port x error-statistic (2, 3); slot x show hardware internal statistics device fcmac all (4); slot x show hardware internal statistics device fcmac|all (5, 48s, 50i)
Additional info:
  Note1: CSCts28865 B2B credit 0 transitions incorrect for generation 4 linecards. Integrated in NX-OS 5.2(2).
  Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac.
  Note3: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S.

Counter names: FCP_CNTR_QMM_CH0_LACK_OF_TRANSMIT_CREDIT (2); AK_FCP_CNTR_QMM_CH0_LACK_OF_TRANSMIT_CREDIT (3); THB_TMM_PORT_TBBZ_CH0 (4, note1); F16_RCM_RCP0_TBBZ_CH0 (5); FCP_CNTR_TMM_TBBZ_CH0 (48); FCP_CNTR_TMM_TBBZ_CH1 (48); VIP_TMM_TBBZ_CH0_CNT (50i, 48S); VIP_TMM_TBBZ_CH1_CNT (50i, 48S)
Description: Total count of transitions to zero for Tx B2B credits on ch0 or ch1. These transitions are typically the result of the attached device withholding the R_Rdy primitive from the switch due to congestion in that device.
There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.
Also shown in the output of show interface counters:
  xxxx transmit B2B credit transitions from zero
  or
  xxxx Transmit B2B credit transitions to zero
In the above, "from" was changed to "to" via this bug: CSCug35184 show interface counters - transitions of rx BB credit to zero state
Commands:
  Sup: show hardware internal statistics all (2, 3, 4, 48); show hardware internal statistics device all (5); None (48s, 50i, note3)
  Linecard: slot x show hardware internal statistics (2, 3, 48); slot x show hardware internal fc-mac port x error-statistic (2, 3); slot x show hardware internal statistics device fcmac all (4); slot x show hardware internal statistics device fcmac|all (5, 48s, 50i)
Additional info:
  Note1: CSCts28865 B2B credit 0 transitions incorrect for generation 4 linecards.
  Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac.
  Note3: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S.

Counter names: None (2, 3); THB_TMM_PORT_TWAIT_CNT (4); F16_TMM_PORT_TWAIT_CNT (5); None (48); VIP_TMM_TXWAIT_CH0_CNT (50i, 48S); VIP_TMM_TXWAIT_CH1_CNT (50i, 48S)
Description: Packet is available to send, but no credit is available.
  Gen4/Gen5: increments every clock cycle (cycle = 2.353 nanoseconds, 425MHz)
  9250i/9148S: increments every clock cycle (cycle = 2ns, 500MHz)
  Must multiply by the number of ports in the port-group to get actual time.
  To calculate actual time: Twait * clock cycle time * ports in port_group
Commands:
  Sup: see note3; None (5, note1); None (48s, 50i, note2)
  Linecard: slot x show hardware internal statistics device fcmac|all (5, 48s, 50i)
Additional info:
  Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac.
  Note2: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S.
  Note3: See Table 3 SNMP variable fcIfTxWaitCount.
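The TxWait-to-time calculation in this table multiplies the raw count by the clock cycle period (about 2.353 ns at 425MHz, 2 ns at 500MHz) and by the number of ports in the port group. The sketch below applies it; the platform keys and function name are hypothetical, and the port-group multiplier is an input defaulting to 1 because the table does not state exactly which platforms need it.

```python
CYCLE_NS = {
    "gen4": 1e9 / 425e6,   # 425MHz -> about 2.353 ns per cycle
    "gen5": 1e9 / 425e6,
    "9250i": 1e9 / 500e6,  # 500MHz -> 2 ns per cycle
    "9148s": 1e9 / 500e6,
}


def txwait_seconds(twait_count: int, platform: str, ports_in_port_group: int = 1) -> float:
    """Convert a raw TWAIT clock-cycle count into seconds spent waiting for Tx credits."""
    return twait_count * CYCLE_NS[platform] * 1e-9 * ports_in_port_group


print(round(txwait_seconds(425_000_000, "gen5"), 6))  # 1.0
```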

Counter names: FCP_CNTR_RX_WT_AVG_B2B_ZERO (2, 48); AK_FCP_CNTR_RX_WT_AVG_B2B_ZERO (3); FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO (4); FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO (48, note1); FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO (5, 50i, note2); FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO (48S)
Description: Count of the number of times an interface was at zero Rx B2B credits for 100 ms. This status typically indicates that the switch is withholding the R_Rdy primitive from the device attached on that interface due to congestion in the path to devices with which it is communicating.
Always incremented by the software creditmon process.
Commands:
  OBFL: show logging onboard error-stats (2, 3, 4, 5, 48, 48s, 50i)
  Sup hardware internal errors: show hardware internal errors all|module x (2, 3, 4); None (5, note5); None (48s, 50i, note4)
  Sup hardware internal statistics: show hardware internal statistics all (2, 3, 48); None (4, 5, note3); None (48, 48s, 50i, note4)
  Linecard hardware internal statistics: slot x show hardware internal statistics (2, 3, 48); slot x show hardware internal fc-mac port x error-statistic (2, 3); slot x show hardware internal errors (4, 5, 48, 48s, 50i); slot x show hardware internal statistics device fcmac all port x (4, 5)
Additional info:
  Note1: MDS 9148 added support for this counter in NX-OS 5.2(6).
  Note2: Gen5 and 9250i do not increment in NX-OS 6.2(1), 6.2(5) and 6.2(7). CSCui27981 FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO not incrementing on DS-X9448-768K9. Integrated in: 6.2(9).
  Note3: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open.
  Note4: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open.

Counter names: FCP_CNTR_TX_WT_AVG_B2B_ZERO (2); AK_FCP_CNTR_TX_WT_AVG_B2B_ZERO (3); FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO (4); FCP_CNTR_TX_WT_AVG_B2B_ZERO (48, note1, note2); FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO (5, 50i, 48s)
Description: Count of the number of times that an interface was at zero Tx B2B credits for 100 ms. This status typically indicates congestion at the device attached on that interface.
Incremented by the creditmon software process on MDS 9500 and 9148. Consequently, it could indicate an interval between 100ms and 199ms.
In NX-OS 6.2(1) through 6.2(7) on the 9710 and NX-OS 6.2(5) through 6.2(7) on the 9250i, this was incremented based on the F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H and VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_RAISING interrupts. Consequently, it incremented only once when the HW interrupt occurred, not once per 100ms interval as in prior releases.
MDS 9700, 9250i and 9148S: in NX-OS 6.2(9) these are once again incremented by the software creditmon process, so they again increment for each 100ms interval in which the port remains at 0 Tx credits.
Commands:
  OBFL: show logging onboard error-stats (2, 3, 4, 5, 48, 48s, 50i)
  Sup hardware internal errors: show hardware internal errors all|module x (2, 3, 4); None (5, note5); None (48s, 50i, note4)
  Sup hardware internal statistics: show hardware internal statistics all (2, 3, 48); None (4, 5, note3); None (48, 48s, 50i, note4)
  Linecard hardware internal statistics: slot x show hardware internal statistics (2, 3, 48); slot x show hardware internal fc-mac port x error-statistic (2, 3); slot x show hardware internal errors (4, 5, 48, 48s, 50i); slot x show hardware internal statistics device fcmac all port x (4, 5)
Additional info:
  Note1: MDS 9148 added support for this counter in NX-OS 5.2(6).
  Note2: CSCud93587 MDS9148 OBFL doesn't contain FCP_CNTR_TX_WT_AVG_B2B_ZERO. Integrated in: Unresolved.
  Note3: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open.
  Note4: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open.

Counter names: RI12_CP_CNT_RESEND_MSG_DROP (2, 3); FAL_RI0_CP_CNT_RESEND_MSG_DROP (4)
Description: These are not packet drops. Only the request resend message to the arbiter was dropped. This can be the case when the original request was finally serviced, so the follow-up message was dropped. It can indicate some minor congestion of the egress port, so the request could not be granted immediately. This is counted against the ingress port, but it probably indicates some congestion on an egress port.
Check show logging onboard flow-control request-timeout; you might see corresponding entries.
Commands:
  Sup: show hardware internal errors all|module x
  OBFL: show logging onboard error-stats

Counter Name: F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT [5, Note1, Note3]
Description: Count of times the port was at zero Tx credits for the stuck port timeout value.
NX-OS 6.2(1) through 6.2(7) on the 9710: this was used for credit loss recovery, so it was set to 1s (F port)/1.5s (E port).
MDS 9700, 9250i and 9148S: In NX-OS 6.2(9) the software creditmon process once again detects credit loss recovery, and the stuck force timeout is used for "system timeout no-credit-drop". This defaults to 500ms with no action taken (no packets are dropped when it is reached). To take action, it must be configured via:
system timeout no-credit-drop <ms> mode e|f
This counter increments even if "system timeout no-credit-drop" is not configured, since the threshold defaults to 500ms. If no-credit-drop is not configured, no action is taken and the counter simply indicates that the port was at zero Tx credits for 500ms (Note3).
This is similar to the Viper counter VIP_TMM_STK_PRT_TO_TRANSITION_CHx_CNT.
Commands:
- Sup: None (Note2)
- Linecard: slot x show hardware internal statistics device all
- Linecard: slot x show hardware internal errors
Additional Info:
- Note1: Might falsely increment during port flap: CSCus70632 F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT increments during port flap
- Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
- Note3: CSCut27271 Stuck port threshold not reset to default when removing no-credit-drop. Integrated in: NX-OS 6.2(13)
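The thresholds described in these tables interact, so a compact model may help. The sketch below assumes NX-OS 6.2(9) and later behavior with the defaults stated here: 100ms slow-port intervals, a 500ms stuck port (no-credit-drop) threshold that only drops frames when configured, and credit loss recovery after 1s on F ports and 1.5s on E ports. The function and its event labels are hypothetical; this is a simplified illustration, not creditmon's logic.

```python
from typing import List, Optional

SLOW_INTERVAL_MS = 100                   # per-interval Tx credit-wait granularity
STUCK_PORT_DEFAULT_MS = 500              # stuck port / no-credit-drop default
CREDIT_LOSS_MS = {"F": 1000, "E": 1500}  # credit loss recovery thresholds


def zero_credit_events(duration_ms: int, port_type: str,
                       no_credit_drop_ms: Optional[int] = None) -> List[str]:
    """List what one continuous episode at zero Tx credits would trigger.

    no_credit_drop_ms=None models "system timeout no-credit-drop" not being
    configured: the stuck-port counter still increments at 500 ms, but no
    frames are dropped.
    """
    kind = port_type.upper()
    if kind not in CREDIT_LOSS_MS:
        raise ValueError("port_type must be 'E' or 'F'")
    events = []
    intervals = duration_ms // SLOW_INTERVAL_MS
    if intervals:
        events.append(f"{intervals} x 100 ms at zero Tx credits")
    stuck_ms = STUCK_PORT_DEFAULT_MS if no_credit_drop_ms is None else no_credit_drop_ms
    if duration_ms >= stuck_ms:
        if no_credit_drop_ms is None:
            events.append("stuck port counter increments, no action")
        else:
            events.append("no-credit-drop: frames dropped at line rate")
    if duration_ms >= CREDIT_LOSS_MS[kind]:
        events.append("credit loss recovery: LR sent")
    return events


print(zero_credit_events(1200, "F"))
print(zero_credit_events(1200, "E", no_credit_drop_ms=200))
```

The same 1200ms episode triggers credit loss recovery on an F port but not on an E port. Configuring no-credit-drop changes only whether frames are discarded; it does not return credits.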


Counter Name: F16_TMM_PORT_CWAIT_FORCE_TIMEOUT_H_L_CNT [5]
Description: Count of times a credit was received after the slow port timeout threshold had been triggered.
Commands:
- Sup: None (Note1)
- Linecard: slot 1 show hardware internal statistics device all|fcmac
- Linecard: slot x show hardware internal errors
Additional Info:
- Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open

Counter Name:
- VIP_TMM_STK_PRT_TO_TRANSITION_CH0_CNT [48S,50i, Note2]
- VIP_TMM_STK_PRT_TO_TRANSITION_CH1_CNT [48S,50i, Note2]
Description: Count of times the port was at zero Tx credits for the stuck port timeout value.
- channel 0 (high priority queue)
- channel 1 (low priority queue)
NX-OS 6.2(5) through 6.2(7) on the 9250i: this was used for credit loss recovery, so it was set to 1s (F port)/1.5s (E port).
In NX-OS 6.2(9) the software creditmon process once again detects credit loss recovery, and the stuck force timeout is used for "system timeout no-credit-drop". This defaults to 500ms with no action taken (no packets are dropped when it is reached). To take action, it must be configured via:
system timeout no-credit-drop <ms> mode e|f
This counter increments even if "system timeout no-credit-drop" is not configured, since the threshold defaults to 500ms. If no-credit-drop is not configured, no action is taken and the counter simply indicates that the port was at zero Tx credits for 500ms (Note2).
This is similar to the Gen5 counter F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT.
Commands:
- Sup: None (Note1)
- Linecard: slot 1 show hardware internal statistics device all|fcmac
- Linecard: slot 1 show hardware internal errors
Additional Info:
- Note1: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open
- Note2: CSCut27271 Stuck port threshold not reset to default when removing no-credit-drop. Integrated in: open

Counter Name:
- VIP_TMM_SLO_PRT_TO_TRANSITION_CH0_CNT [48S,50i]
- VIP_TMM_SLO_PRT_TO_TRANSITION_CH1_CNT [48S,50i]
Description: Count of times the port was at zero Tx credits for the slow port timeout value.
- channel 0 (high priority queue)
- channel 1 (low priority queue)
NX-OS 6.2(5) through 6.2(7) on the 9250i: this was used to increment the FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter. Consequently, that counter incremented only once, when the HW interrupt occurred, and not at each 100ms interval as in prior releases.
In NX-OS 6.2(9) this is used for the slowport-monitor feature, which must be configured via:
system timeout slowport-monitor <ms> mode e|f
Slowport-monitor events can be displayed via:
show process creditmon slowport-monitor-events
show logging onboard slowport-monitor-events
This is similar to the Gen5 counter F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H.
Commands:
- Sup: None (Note1)
- Linecard: show hardware internal statistics device all|fcmac
Additional Info:
- Note1: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open

Slow drain counters and descriptions
Table 2 - Counters indicating frame drops
Counter Name Description Commands Additional Info

Counter Name:
- None [2]
- None [3]
- THB_TMM_PORT_FRM_DROP_CNT [4, Note1]
- F16_TMM_PORT_FRM_DROP_CNT [5]
- FCP_CNTR_TMM_NORMAL_DROP [48]
- VIP_TMM_NORMAL_DROP_CNT [50i,48S, Note2]
- VIP_TMM_TOTAL_DROP_CNT [50i,48S, Note2]
Description: Number of frames dropped in the tolb_path or np path by the Transmit Memory Manager (TMM). These drops include all types of packet drops: timeout, offline, abort drops, dummy frame drops at egress, etc. These counters are the aggregate counters for all the underlying counters.
Commands:
OBFL:
- show logging onboard error-stats [2,3,4,5,48,48S,50i]
Sup hardware internal errors/statistics:
- show hardware internal errors all|module x [2,3,4]
- show hardware internal statistics all [48]
- None [48S,50i, Note3]
Sup packet-dropped-reason:
- show hardware internal packet-dropped-reason mod [48,48S,50i]
Linecard hardware internal errors:
- show hardware internal fc-mac port x error-statistic [2,3]
- show hardware internal errors [4,5,48,48S,50i]
Additional Info:
- Note1: Bugs: CSCud77292 Gen 4 linecards do not increment output discards on interface statistics. Integrated into NX-OS 5.2(8c) and 6.2(1)
- Note2: It is not normal for drops to occur, so this counter's name is misleading. The following bug renamed VIP_TMM_NORMAL_DROP_CNT to VIP_TMM_TOTAL_DROP_CNT, since the drops included in this counter are not necessarily normal: CSCus60322 Add VIP_TMM_TO_CNT and VIP_TMM_TO_DROP_CNT to packet-flow dropped. Integrated in NX-OS 6.2(13)

Counter Name:
- FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES [2]
- AK_FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES [3]
- THB_TMM_TOLB_TIMEOUT_DROP_CNT [4]
- F16_TMM_TOLB_TIMEOUT_DROP_CNT [5]
- FCP_CNTR_TMM_TIMEOUT_DROP [48]
- VIP_TMM_TO_DROP_CNT [50i,48S, Note2, Note3]
Description: Timeout drops at egress due to frames hitting the congestion-drop threshold. The congestion-drop threshold is set via the following command and is on at 500ms by default on all "modes" (port types):
system timeout congestion-drop <ms> mode e|f
Commands:
OBFL:
- show logging onboard error-stats [2,3,4,5,48,48S,50i]
- show logging onboard flow-control timeout-drops [2,3,4,5,48,48S,50i, Note1]
Sup hardware internal errors:
- show hardware internal errors all|module x [2,3,4]
- show hardware internal statistics all [48]
- show hardware internal statistics all [5, Note4]
- None [48S,50i, Note5]
Sup packet-dropped-reason:
- show hardware internal packet-dropped-reason mod [2,3,48,48S,50i]
Linecard hardware internal errors:
- slot x show hardware internal statistics [2,3,48]
- show hardware internal fc-mac port x error-statistic [2,3]
- show hardware internal errors [2,3,4,5,48,48S,50i]
- show hardware internal statistics device fcmac all [4]
- show hardware internal statistics device all|fcmac [2,3,5,48,48S,50i]
Additional Info:
- Note1: These do not appear until NX-OS 6.2(9)
- Note2: These are included in VIP_TMM_TO_CNT
- Note3: Should be included in `show hardware internal statistics pktflow dropped`: CSCus60322 Add VIP_TMM_TO_CNT and VIP_TMM_TO_DROP_CNT to packet-flow dropped. Integrated into NX-OS 6.2(13)
- Note4: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
- Note5: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open
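Congestion-drop is a per-frame age check at egress, whereas no-credit-drop acts on how long the port itself has been at zero Tx credits. As a minimal sketch, assume a FIFO egress queue in which each frame records its enqueue time in milliseconds; `expire_frames` is a hypothetical name, and the 500ms default comes from the row above. Real ASIC queueing is not modeled.

```python
from collections import deque
from typing import Deque, List, Tuple

CONGESTION_DROP_DEFAULT_MS = 500  # default congestion-drop threshold per the table


def expire_frames(queue: Deque[Tuple[str, int]], now_ms: int,
                  threshold_ms: int = CONGESTION_DROP_DEFAULT_MS) -> List[str]:
    """Drop frames whose time in the egress queue has reached threshold_ms.

    queue holds (frame_id, enqueue_ms) in arrival order, so the oldest frame
    is at the head and the scan can stop at the first frame still in time.
    Returns the dropped frame ids (what a timeout-discard counter would add).
    """
    dropped = []
    while queue and now_ms - queue[0][1] >= threshold_ms:
        frame_id, _ = queue.popleft()
        dropped.append(frame_id)
    return dropped


egress = deque([("a", 0), ("b", 100), ("c", 450)])
print(expire_frames(egress, now_ms=560))   # ['a']
print(expire_frames(egress, now_ms=1000))  # ['b', 'c']
```

Lowering the threshold frees egress buffers sooner, at the cost of discarding frames that might still have been delivered.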

Counter Name:
- THB_TMM_TIMEOUT_STATS_DROP [4]
- F16_TMM_TIMEOUT_STATS_DROP [5]
Description: Timeout stats dropped because the stats FIFO is full. These counters are not real drops. As understood for the F16/TBIRD ASIC, the TMM has a TIMEOUT STATS FIFO that holds the packets that have timed out. If the FIFO is full and has not been read, newly timed-out packets are not written into the FIFO; instead they are counted by TIMEOUT_STATS_DROP.
This is a TBIRD/F16 feature. Viper does not have this feature, and neither do Gen2/Gen3.
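The FIFO behavior above is a bounded queue that refuses new entries when full and counts the refusals. The following hypothetical Python model (class and method names are illustrative) assumes, as the description states, that records are simply not written when the FIFO is full; it does not reflect the actual ASIC register layout.

```python
from collections import deque
from typing import Deque, List


class TimeoutStatsFifo:
    """Hypothetical model of a bounded FIFO of timed-out frame records."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Deque[str] = deque()
        self.timeout_stats_drop = 0  # records that could not be stored

    def record_timeout(self, frame_id: str) -> None:
        # When the FIFO is full, the new record is not written (nothing is
        # overwritten); only the stats-drop counter increments. The frame
        # itself is assumed to be counted by the regular timeout-drop counters.
        if len(self.entries) >= self.capacity:
            self.timeout_stats_drop += 1
        else:
            self.entries.append(frame_id)

    def read_all(self) -> List[str]:
        """Drain the FIFO, as a reader (e.g. software polling) would."""
        drained = list(self.entries)
        self.entries.clear()
        return drained


fifo = TimeoutStatsFifo(capacity=2)
for frame in ("f1", "f2", "f3"):
    fifo.record_timeout(frame)
print(fifo.read_all(), fifo.timeout_stats_drop)  # ['f1', 'f2'] 1
```

A nonzero TIMEOUT_STATS_DROP therefore means some timeout detail was lost, not that additional frames were discarded.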


Counter Name:
- FCP_CNTR_LAF_C3_TIMEOUT_FRAMES_DISCARD [2]
- AK_FCP_CNTR_LAF_C3_TIMEOUT_FRAMES_DISCARD [3]
- THB_TMM_TO_CNT_CLASS_3 [4]
- F16_TMM_TO_CNT_CLASS_3 [5]
- None [48,50i,48S]
Description: Count of class-3 Fibre Channel frames dropped as a result of congestion-drop timeout.
Commands:
OBFL:
- show logging onboard error-stats [2,3,4,5]
Sup hardware internal errors:
- show hardware internal errors all|module x [2,3]
- show hardware internal statistics device fcmac|all [4,5, Note1]
Linecard hardware internal errors:
- show hardware internal fc-mac port x error-statistic [2,3]
- show hardware internal statistics [2,3]
- show hardware internal statistics device fcmac all [4]
- show hardware internal statistics device fcmac [5]
- show hardware internal errors [2,3]
Additional Info:
- Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open

Counter Name:
- FCP_CNTR_LAF_CF_TIMEOUT_FRAMES_DISCARD [2]
- AK_FCP_CNTR_LAF_CF_TIMEOUT_FRAMES_DISCARD [3]
- THB_TMM_TO_CNT_CLASS_F [4]
- F16_TMM_TO_CNT_CLASS_F [5]
- None [48,48S,50i]
Description: Count of class-F Fibre Channel frames dropped due to congestion-drop timeout.
Commands:
OBFL:
- show logging onboard error-stats [2,3,4,5]
Sup hardware internal errors:
- show hardware internal errors all|module x [2,3]
- show hardware internal statistics device fcmac|all [4,5, Note1]
Linecard hardware internal errors:
- show hardware internal fc-mac port x error-statistic [2,3]
- show hardware internal statistics [2,3]
- show hardware internal statistics device fcmac all [4]
- show hardware internal statistics device fcmac [5]
- show hardware internal errors [2,3]
Additional Info:
- Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open

Counter Name:
- F16_TMM_PORT_STUCK_FORCE_TIMEOUT_CNT [5]
- VIP_TMM_STUCK_PORT_TO_CNT [48S,50i]
Description: Total number of frames force-timeout dropped by stuck port processing (no-credit-drop).
Gen2/3/4 and the 9148 do not have a counter for this. Any frames dropped as a result of no-credit-drop on these are simply counted as timeout discards.
Commands:
Sup:
- None [5, Note1]
- None [48S,50i, Note2]
Linecard:
- show hardware internal statistics device all|fcmac
Additional Info:
- Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
- Note2: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open

Counter Name:
- FCP_CNTR_TMM_TIMEOUT [48]
- VIP_TMM_TO_CNT [48S,50i, Note1]
Description: Total number of timeout drops, including frames timed out due to congestion (pkt timeout), HW stuck force timeout, and HW slow port force timeout.
Commands:
Sup hardware internal statistics:
- show hardware internal statistics all [48]
- None [48S,50i, Note2]
Sup packet-dropped-reason:
- show hardware internal packet-dropped-reason mod [48,48S,50i]
Linecard hardware internal errors/statistics:
- show hardware internal errors [48,48S,50i]
- show hardware internal statistics [48]
- show hardware internal statistics device all|fcmac [48S,50i]
OBFL:
- show logging onboard error-stats [48,48S,50i]
Additional Info:
- Note1: Should be included in `show hardware internal statistics pktflow dropped`. See: CSCus60322 Add VIP_TMM_TO_CNT and VIP_TMM_TO_DROP_CNT to packet-flow dropped. Integrated into 6.2(13)
- Note2: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open

Counter Name:
- F16_TMM_PORT_CWAIT_FORCE_TIMEOUT_CNT [5]
- VIP_TMM_SLOW_PORT_TO_CNT [48S,50i]
Description: Total timeout packets dropped due to slow-port-monitor processing. This does not increment, since the slow-port-monitor feature does not include a packet drop function.
Commands: None. Not implemented.

Slow drain counters and descriptions
Table 3 - Counters indicating an action on or for an interface
Counter Name Description Commands Additional Info

Counter Name:
- FCP_CNTR_CREDIT_LOSS [2,48]
- AK_FCP_CNTR_CREDIT_LOSS [3]
- FCP_SW_CNTR_CREDIT_LOSS [4,5,48S,50i]
Description: Count of the number of times that creditmon credit loss recovery has been invoked on a port.
Commands:
OBFL:
- show logging onboard error-stats [2,3,4,5,48,48S,50i]
Sup hardware internal errors/statistics:
- show hardware internal errors all|module x [2,3,4]
- None [4,5, Note1]
- show hardware internal statistics all [48]
- None [48S,50i, Note2]
Linecard hardware internal errors/statistics:
- show hardware internal fc-mac port x error-statistic [2,3]
- show hardware internal statistics [2,3,48]
- show hardware internal statistics device fcmac all [4]
- show hardware internal statistics device fcmac|all [5,48,48S,50i]
- show hardware internal errors [2,3,4,48,48S,50i]
Additional Info:
- Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
- Note2: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open

Counter Name:
- FCP_CNTR_FORCE_TIMEOUT_ON [2,48]
- AK_FCP_CNTR_FORCE_TIMEOUT_ON [3]
- FCP_SW_CNTR_FORCE_TIMEOUT_ON [4]
- FCP_SW_CNTR_FORCE_TIMEOUT_ON [5,50i,48S, Note2]
Description: Count of the number of times this port has reached the "system timeout no-credit-drop" threshold. When a port is at zero Tx B2B credits for the time specified, the port starts to drop packets at line rate.
Note 1: For the 9700 and 9250i, these counters only increment in releases prior to the introduction of the HW slow drain feature in 6.2(9). Since the 9148S was first supported in NX-OS 6.2(9), this counter never increments on it. Refer instead to the F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT [5] and VIP_TMM_STK_PRT_TO_TRANSITION_CH0_CNT [48S,50i] counters, which indicate the same thing (but are not in OBFL). Whether these should be re-added is being investigated.
See the following counters to determine frames dropped due to force timeout:
- F16_TMM_PORT_STUCK_FORCE_TIMEOUT_CNT [5]
- VIP_TMM_STUCK_PORT_TO_CNT [48S,50i]
Commands:
OBFL:
- show logging onboard error-stats [2,3,4,5,48,48S,50i]
Sup hardware internal errors/statistics:
- show hardware internal errors all|module x [2,3,4]
- show hardware internal statistics all [48]
- None [48S,50i, Note2]
Linecard hardware internal errors/statistics:
- show hardware internal fc-mac port x error-statistic [2,3]
- show hardware internal statistics [2,3,48]
- show hardware internal statistics device fcmac all [4]
- show hardware internal statistics device fcmac|all [48]
- show hardware internal errors [2,3,4,48]
- None [5,48S,50i, Note3]
Additional Info:
- Note2: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open
- Note3: After NX-OS 6.2(9) the following counters are not incrementing on MDS 9700, 9250i, 9148S: CSCus93140 no-credit-drop SW counters not incrementing on MDS 9700, 9250i, 9148S. Integrated in: open

Counter Name:
- FCP_CNTR_FORCE_TIMEOUT_OFF [2]
- AK_FCP_CNTR_FORCE_TIMEOUT_OFF [3]
- FCP_SW_CNTR_FORCE_TIMEOUT_OFF [4,48]
- FCP_SW_CNTR_FORCE_TIMEOUT_OFF [5,50i,48S, Note 1]
Description: Count of the number of times the port has recovered from the "system timeout no-credit-drop" condition. This typically means that an R_RDY primitive has been returned or, possibly, that an LR and LRR have occurred.
Note 1: For the 9700 and 9250i, these counters only increment in releases prior to the introduction of the HW slow drain feature in 6.2(9). Since the 9148S was first supported in NX-OS 6.2(9), they never increment on it. They are re-added into hardware internal statistics and logging onboard error-stats via the following bug: CSCus93140 no-credit-drop SW counters not incrementing on MDS 9700, 9250i, 9148S.
F16_TMM_PORT_STUCK_FORCE_TIMEOUT_H_L_CNT [5] indicates the same thing.
Commands:
OBFL:
- show logging onboard error-stats [2,3,4,5,48,48S,50i]
Sup hardware internal errors/statistics:
- show hardware internal errors all|module x [2,3,4]
- show hardware internal statistics all [48]
- None [48S,50i, Note2]
Linecard hardware internal errors/statistics:
- show hardware internal fc-mac port x error-statistic [2,3]
- show hardware internal statistics [2,3,48]
- show hardware internal statistics device fcmac all [4]
- show hardware internal statistics device fcmac|all [48]
- show hardware internal errors [2,3,4,48]
- None [5,48S,50i, Note3]
Additional Info:
- Note2: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open
- Note3: After NX-OS 6.2(9) the following counters are not incrementing on MDS 9700, 9250i, 9148S: CSCus93140 no-credit-drop SW counters not incrementing on MDS 9700, 9250i, 9148S. Integrated in: open

Counter Name:
- AK_FCP_CNTR_LINK_RESET_OUT [2,3]
- FCP_SW_CNTR_LINK_RESET_OUT [48]
- FCP_SW_CNTR_LINK_RESET_OUT [4,5,50i,48S, Note1]
Description: Count of times a Link Reset (LR) was transmitted from the interface.
Also shown in the output of "show interface counters detail" as:
xxx link reset protocol errors transmitted
or
xxx link reset transmitted while link is active
The "while link is active" line counts only link resets transmitted while the link is active.
Commands:
Sup hardware internal errors/statistics:
- show hardware internal statistics all [2,3,48]
Linecard hardware internal errors/statistics:
- show hardware internal statistics all [2,3,48]
Additional Info:
- Note1: These are not incremented. CSCus99138 Port software counters not incrementing. Integrated in: open
Counter Name:
- AK_FCP_CNTR_LINK_RESET_IN [2,3]
- FCP_SW_CNTR_LINK_RESET_IN [48]
- FCP_SW_CNTR_LINK_RESET_IN [4,5,50i,48S, Note1]
Description: Count of times a Link Reset (LR) was received on the interface.
Also shown in the output of "show interface counters detail" as:
xxx link reset protocol errors received
or
xxx link reset received while link is active
The "while link is active" line counts only link resets received while the link is active.
Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
Additional Info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT. show logging onboard interrupt-stats shows the above as IP_FCMAC_INTR_PRIM_RX_SEQ_LR.
Counter Name:
- AK_FCP_CNTR_LRR_OUT [2,3]
- FCP_SW_CNTR_LRR_OUT [4,5,48,50i,48S, Note1]
Description: Count of times a Link Reset Response (LRR) was transmitted from the interface.
Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
Additional Info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT above.
Counter Name:
- AK_FCP_CNTR_LRR_IN [2,3]
- FCP_SW_CNTR_LRR_IN [4,5,48,50i,48S, Note1]
Description: Count of times a Link Reset Response (LRR) was received on the interface.
Also shown using show interface fcx/y:
xx input OLS, xx LRR, 0 NOS, xx loop inits
Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
Additional Info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT above.


Counter Name:
- AK_FCP_CNTR_OLS_OUT [2,3]
- FCP_SW_CNTR_OLS_OUT [4,5,48,50i,48S, Note1]
Description: Count of times an Offline Sequence (OLS) was transmitted from the interface.
Also shown using show interface fcx/y:
xx output OLS, xx LRR, xx NOS, xx loop inits
Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
Additional Info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT above.


Counter Name:
- AK_FCP_CNTR_OLS_IN [2,3]
- FCP_SW_CNTR_OLS_IN [4,5,48,50i,48S, Note1]
Description: Count of times an Offline Sequence (OLS) was received on the interface.
Also shown using show interface fcx/y:
xx input OLS, xx LRR, 0 NOS, xx loop inits
Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
Additional Info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT above.


Counter Name:
- AK_FCP_CNTR_NOS_OUT [2,3]
- FCP_SW_CNTR_NOS_OUT [4,5,48,50i,48S, Note1]
Description: Count of times a Not Operational Sequence (NOS) was transmitted from the interface.
Also shown using show interface fcx/y:
xx output OLS, xx LRR, xx NOS, xx loop inits
Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
Additional Info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT above.


Counter Name:
- AK_FCP_CNTR_NOS_IN [2,3]
- FCP_SW_CNTR_NOS_IN [4,5,48,50i,48S, Note1]
Description: Count of times a Not Operational Sequence (NOS) was received on the interface.
Also shown using show interface fcx/y:
xx input OLS, xx LRR, 0 NOS, xx loop inits
Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
Additional Info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT above.


Counter Name:
- AK_FCP_CNTR_LRR_OUT [2,3]
- FCP_SW_CNTR_LRR_OUT [4,5,48,50i,48S, Note1]
Description: Count of times a Link Reset Response (LRR) was transmitted from the interface.
Also shown using show interface fcx/y:
xx output OLS, xx LRR, xx NOS, xx loop inits
Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
Additional Info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT above.

Slow drain counters and descriptions
Table 4 – Interrupt counters
Counter Name Description Commands Additional Info

Counter Name: F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H [5]
Description: Slowport condition detected count (low-to-high transition, i.e. credit wait (cwait) > threshold).
NX-OS 6.2(1) through 6.2(7): count of times the port was at zero Tx credits for 100ms. It only increments on the initial 100ms interval. In these "pre-slowport-monitor" releases this counter was used to trigger the FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.
NX-OS 6.2(9) and later: should not occur.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- slot x show hardware internal fcmac port x interrupt-counts

Counter Name: F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_H_L [5]
Description: Slowport condition exited count (high-to-low transition, i.e. credit wait (cwait) < threshold).
NX-OS 6.2(1) through 6.2(7): count of times the port received a credit after being at zero Tx credits for 100ms or longer. In these "pre-slowport-monitor" releases this counter was used to re-arm the FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.
NX-OS 6.2(9) and later: should not occur.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- slot x show hardware internal fcmac port x interrupt-counts

Counter Name: F16_FCP_INTR_TMM_P_STUCK_FORCE_TIMEOUT_L_H [5]
Description: Stuck port condition detected count (low-to-high transition). Configured via:
system timeout no-credit-drop <ms> mode e|f
Defaults to 500ms with no action.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- slot x show hardware internal fcmac port x interrupt-counts
Counter Name: F16_FCP_INTR_TMM_P_STUCK_FORCE_TIMEOUT_H_L [5]
Description: Count of times the stuck port condition exited.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- slot x show hardware internal fcmac port x interrupt-counts

Counter Name:
- VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH0_OVER_THRESHOLD_RAISING [48S,50i]
- VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_RAISING [48S,50i]
Description: Slowport condition detected count (low-to-high transition, i.e. credit wait (cwait) > threshold).
NX-OS 6.2(5) through 6.2(7): count of times the port was at zero Tx credits for 100ms. It only increments on the initial 100ms interval. In these "pre-slowport-monitor" releases this counter was used to trigger the FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.
NX-OS 6.2(9) and later: should not occur.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- slot x show hardware internal fcmac port x interrupt-counts
Additional Info: Note: VIPER does not have a high-to-low interrupt like F16.

Counter Name:
- VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH0_OVER_THRESHOLD_FALLING [48S,50i]
- VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_FALLING [48S,50i]
Description: Count of times the slowport condition exited.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- slot x show hardware internal fcmac port x interrupt-counts
Additional Info: Note: These are displayed in OBFL with the VIPER_FCP_INTR_ prefix but without the prefix in other places.

Counter Name:
- VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH0_RAISING
- VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH1_RAISING
Description: Count of times the port was at zero Tx credits for the stuck port timeout value "no-credit-drop" (default value 500ms).
NX-OS 6.2(5) through 6.2(7): count of times the port was at zero Tx credits for 1s (F port) or 1.5s (E port). In these "pre-slowport-monitor" releases this interrupt was used to trigger the FCP_SW_CNTR_CREDIT_LOSS counter increment.
NX-OS 6.2(9) and later: should not occur.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- slot x show hardware internal fcmac port x interrupt-counts

Counter Name:
- VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH0_FALLING
- VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH1_FALLING
Description: Count of times the stuck port condition exited.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- slot x show hardware internal fcmac port x interrupt-counts

Counter Name: IP_FCMAC_INTR_PRIM_RX_SEQ_NOS
Description: Not Operational Sequence received on the interface. NOS is a sequence that is transmitted continuously until an OLS is received.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- show hardware internal fc-mac port 1 interrupt-counts [2,3,48]
- slot x show hardware internal fcmac port x interrupt-counts [4,5,48S,50i]
Additional Info: Also shown by show interface fcx/y counters details and show interface detailed-counters as:
xxx non-operational sequences received

Counter Name: IP_FCMAC_INTR_PRIM_RX_SEQ_OLS
Description: Offline Sequence received on the interface. OLS is a sequence that is transmitted continuously until an LR is received.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- show hardware internal fc-mac port 1 interrupt-counts [2,3,48]
- slot x show hardware internal fcmac port x interrupt-counts [4,5,48S,50i]
Additional Info: Also shown by show interface fcx/y counters details and show interface detailed-counters as:
xxx Offline Sequence errors received


Counter Name: IP_FCMAC_INTR_PRIM_RX_SEQ_LR
Description: Link Reset received on the interface. LR is normally sent under two conditions:
1) Link bring-up: NOS/OLS/LR/LRR
2) Credit loss recovery: LR is sent to bring each side back up to its full complement of B2B credits. This does not bounce or flap the link; it just restores the B2B credits.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- show hardware internal fc-mac port 1 interrupt-counts [2,3,48]
- slot x show hardware internal fcmac port x interrupt-counts [4,5,48S,50i]
Additional Info: Also shown by show interface fcx/y counters details and show interface detailed-counters as:
xxx link reset protocol errors received


Counter Name: IP_FCMAC_INTR_PRIM_RX_SEQ_LRR
Description: Link Reset Response received on the interface. This is sent in response to a Link Reset.
Commands:
OBFL:
- show logging onboard interrupt-stats
Linecard:
- show hardware internal fc-mac port 1 interrupt-counts [2,3,48]
- slot x show hardware internal fcmac port x interrupt-counts [4,5,48S,50i]
Additional Info: Also shown by show interface fcx/y counters details and show interface detailed-counters as:
xxx link reset responses received
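The NOS, OLS, LR and LRR rows above describe a request/response chain. The sketch below encodes that chain as a lookup table and walks it. The final LRR to Idle step comes from the Fibre Channel link reset protocol rather than from this table and is labeled as an assumption. Timers and error handling are ignored, and the names are hypothetical. Starting at LR models credit loss recovery, which restores credits without the NOS/OLS steps that take the link down.

```python
from typing import List

# Which primitive a port sends after receiving each primitive sequence.
RESPONSE = {
    "NOS": "OLS",   # NOS is sent until OLS is received
    "OLS": "LR",    # OLS is sent until LR is received
    "LR": "LRR",    # LR is answered with LRR
    "LRR": "IDLE",  # assumption (FC link reset protocol): LRR answered with Idles
}


def exchange(first: str) -> List[str]:
    """Alternate sides, each answering the last primitive, until Idle is reached."""
    if first not in RESPONSE:
        raise ValueError(f"unknown primitive sequence: {first}")
    sent = [first]
    while sent[-1] != "IDLE":
        sent.append(RESPONSE[sent[-1]])
    return sent


print(exchange("NOS"))  # ['NOS', 'OLS', 'LR', 'LRR', 'IDLE']  link bring-up
print(exchange("LR"))   # ['LR', 'LRR', 'IDLE']  credit loss recovery, no flap
```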

Slow drain counters and descriptions
Table 5 - SNMP variables applicable to slow drain
Counter Name Description Commands Additional Info

Counter Name:
- fcIfTxWaitCount [2,3,48, Note1]
- fcIfTxWaitCount [4, Note2, Note3]
- fcIfTxWaitCount [5, Note3]
- fcIfTxWaitCount [50i,48S, Note4]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.15
Description: The number of times the FC-port waited due to lack of transmit credits while there were packets queued for transmit. This is in units of 2.5us. To calculate seconds: txwait * 2.5 / 1000000
There is no OID for the Rx direction.
Based on the following counters:
- THB_TMM_PORT_TWAIT_CNT [4]
- F16_TMM_PORT_TWAIT_CNT [5]
- VIP_TMM_TXWAIT_CH0_CNT [50i,48S]
- VIP_TMM_TXWAIT_CH1_CNT [50i,48S]
Commands: Displayed via:
show interface fcx/y counters detailed | i wait
-or-
show interface detailed-counters | i fc|wait
Not generated by port-monitor.
Example:
rtp-san-34-15-9513# show int fc4/1 counters details | i wait
82864704 waits due to lack of transmit credits
Additional Info:
- Note1: On gen2, gen3 and 9148, this always returns zero.
- Note2: Added to Gen4 linecards in NX-OS 5.2(2)
- Note3: Prior to 6.2(11a) this counter was inaccurate. See the following bug: CSCus15233 fcIfTxWaitCount incorrect on DS-X9232-256K9 and DS-X9248-768K9. Fixed in 6.2(11a)
- Note4: Prior to 6.2(11a) this counter was inaccurate. See the following bug: CSCus15745 fcIfTxWaitCount incorrect for MDS 9250i and 9148S. Fixed in 6.2(11a)
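The seconds conversion above can be wrapped in two small helpers. This is a minimal sketch: the 2.5us unit and the example value 82864704 come from the fcIfTxWaitCount row, while the percentage helper is a hypothetical addition. It assumes you take two readings of the cumulative counter a known number of seconds apart and that the counter did not wrap in between.

```python
TXWAIT_UNIT_US = 2.5  # fcIfTxWaitCount increments once per 2.5 microseconds of waiting


def txwait_seconds(txwait: int) -> float:
    """Convert a TxWait count to seconds: txwait * 2.5 / 1000000."""
    return txwait * TXWAIT_UNIT_US / 1_000_000


def txwait_percent(previous: int, current: int, interval_s: float) -> float:
    """Hypothetical helper: share of a polling interval spent waiting for Tx credits."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return 100.0 * txwait_seconds(current - previous) / interval_s


print(round(txwait_seconds(82864704), 2))  # 207.16
```

For the fc4/1 example, 82864704 waits correspond to roughly 207 seconds of cumulative time spent at zero Tx credits with frames queued. A percentage per polling interval is usually easier to alert on than the raw cumulative value.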

Counter Name: fcIfCreditLoss [2,3,4,5,48,50i,48S]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.37
Description: The number of link resets that have occurred due to unavailable credits from the peer side of the link.
Commands: Generated by port-monitor counter credit-loss-reco. Shown in the output of show interface counters:
xxx timeout discards, xxx credit loss
Additional Info: Credit loss recovery is initiated by the MDS after 1 second (F port) / 1.5 seconds (E port) at zero Tx credits. Other products may initiate it at different intervals.

Counter Name: fcIfLinkResetOuts [2,3,4,5,48,50i,48S]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.10
Description: The number of link reset protocol errors issued by the FC-Port to the attached FC-Port.
Commands: Generated by port-monitor counter lr-tx. Shown in the output of show interface fcx/y counters detailed:
xxx link reset protocol errors transmitted
or
xxx link reset transmitted while link is active

Counter Name: fcIfLinkResetIns [2,3,4,5,48,50i,48S]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.9
Description: The number of link reset protocol errors received by the FC-Port from the attached FC-Port.
Commands: Generated by port-monitor counter lr-rx. Shown in the output of show interface fcx/y counters detailed:
xxx link reset protocol errors received
or
xxx link reset received while link is active

Counter Name: fcIfTimeOutDiscards [2,3,4,5,48,50i,48S]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.35
Description: The number of packets that are dropped due to time-out at the FC-port or due to the FC-port going offline.
Commands: Generated by port-monitor counter timeout-discards. Shown in the output of show interface counters:
xxx timeout discards, xxx credit loss

Counter Name: fcIfOutDiscards [2,3,4,5,48,50i,48S]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.36
Description: The total number of packets that are discarded on the egress side of the FC-port.
Commands: Generated by port-monitor counter tx-discards. Shown in the show interface fcx/y command:
xxx discards, xxx errors


Counter Name: fcIfTxWtAvgBBCreditTransitionToZero [2,3,4,5,48,50i,48S]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.38
Description: Count of the number of times that an interface was at zero Tx B2B credits for 100 ms. This status typically indicates congestion at the device attached to that interface.
Commands: Generated by port-monitor counter tx-credit-not-available.
- show system internal snmp credit-not-available
Additional Info: See the following hardware internal error for more info: xxx_CNTR_TX_WT_AVG_B2B_ZERO
Note: There is no OID in the Rx direction.
CSCus93323 Portmonitor fcIfTxWtAvgBBCreditTransitionToZero truncates hcAlarmOwner

Counter Name: fcIfBBCreditTransistionFromZero [2,3,4,5,48,50i,48S]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.28
Description: Increments when the transmit B2B credit transitions to zero. There is no indication of time at zero for this counter; it could stay at zero for just an instant or for an extended duration.
Commands: Not generated by port-monitor. Shown in the output of show interface counters:
xxxx Transmit B2B credit transitions to zero
Additional Info: Based on the TBBZ hardware statistic.

Counter Name: fcHCIfBBCreditTransistionFromZero [2,3,4,5,48,50i,48S]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.40
Description: Increments when the transmit B2B credit transitions to zero. There is no indication of time at zero for this counter; it could stay at zero for just an instant or for an extended duration.
Commands: Not generated by port-monitor. Shown in the output of show interface counters:
xxxx Transmit B2B credit transitions to zero
Additional Info: Based on the TBBZ hardware statistic.

Counter Name: fcIfBBCreditTransistionToZero [2,3,4,5,48,50i,48S]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.39
Description: Increments when the receive B2B credit transitions to zero. There is no indication of time at zero for this counter; it could stay at zero for just an instant or for an extended duration.
Commands: Not generated by port-monitor. Shown in the output of show interface counters:
xxxx Receive B2B credit transitions to zero
Additional Info: Based on the RBBZ hardware statistic.

Counter Name: fcHCIfBBCreditTransistionToZero [2,3,4,5,48,50i,48S]
OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.41
Description: Increments when the receive B2B credit transitions to zero. There is no indication of time at zero for this counter; it could stay at zero for just an instant or for an extended duration.
Commands: Not generated by port-monitor. Shown in the output of show interface counters:
xxxx Receive B2B credit transitions to zero
Additional Info: Based on the RBBZ hardware statistic.
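All OIDs in Table 5 are columns of a single table entry, 1.3.6.1.4.1.9.9.289.1.2.1.1 (the CISCO-FC-FE-MIB subtree). To poll one interface, append an instance index to the column OID. The sketch below assumes that index is the interface's ifIndex; verify this against the MIB before relying on it. It also includes a wrap-safe delta helper, since cumulative counters can roll over between polls. The counter widths (32-bit for the fcIf objects, 64-bit for the fcHCIf objects) are assumptions to confirm per object. The ifIndex 12345 is a hypothetical value.

```python
# Column OIDs from Table 5; all sit under the same table entry OID.
FC_IF_ENTRY = "1.3.6.1.4.1.9.9.289.1.2.1.1"
COLUMN = {
    "fcIfLinkResetIns": 9,
    "fcIfLinkResetOuts": 10,
    "fcIfTxWaitCount": 15,
    "fcIfBBCreditTransistionFromZero": 28,
    "fcIfTimeOutDiscards": 35,
    "fcIfOutDiscards": 36,
    "fcIfCreditLoss": 37,
    "fcIfTxWtAvgBBCreditTransitionToZero": 38,
    "fcIfBBCreditTransistionToZero": 39,
    "fcHCIfBBCreditTransistionFromZero": 40,
    "fcHCIfBBCreditTransistionToZero": 41,
}


def instance_oid(name: str, if_index: int) -> str:
    """OID of one interface's value, assuming the table is indexed by ifIndex."""
    if if_index <= 0:
        raise ValueError("ifIndex must be a positive integer")
    return f"{FC_IF_ENTRY}.{COLUMN[name]}.{if_index}"


def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Difference between two polls of a counter, allowing for one wrap at 2**bits."""
    modulus = 1 << bits
    return (current - previous) % modulus


print(instance_oid("fcIfCreditLoss", 12345))  # 1.3.6.1.4.1.9.9.289.1.2.1.1.37.12345
```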

Legend - superscripts
• Superscripts:
• 1: Generation 1 modules are no longer supported by NX-OS 5.0 (and later releases) and are not covered by this presentation

• 2: Generation 2 DS-X9112, DS-X9124, DS-X9148, and DS-X9304-18K9 modules

• 3: Generation 3 DS-X9248-48K9 and DS-X92xx-96K9 modules

• 4: Generation 4 DS-X92xx-256K9 modules

• 5: Generation 5 Cisco MDS 9710/9706 DS-X9448-768K9 module and MDS 9396S

• 48: Cisco MDS 9148

• 50i: Cisco MDS 9250i

• 48S: Cisco MDS 9148S

• Legend
• AK: Aakash (Generation 2 or Generation 3 line card MAC ASIC)
• THB: Thunderbird (Generation 4 ASIC)
• F16: F16 (Generation 5 ASIC)
• SAB: Sabre ASIC for MDS 9148
• VIP: Viper ASIC for MDS 9250i and 9148S
• RI: Request Interface
• TMM: Transmit Memory Manager
• FCP_SW: These indicate software counters
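When working through the tables, it can help to decode the superscripts and counter-name prefixes mechanically. The hypothetical Python helper below encodes the legend above as dictionaries. The VIPER key (used by the VIPER_FCP_INTR_ interrupt counters) is an assumption that maps it to the Viper ASIC. Note references such as note3 are ignored, and Generation 1 is omitted because it is not covered.

```python
import re
from typing import Dict, List

PLATFORM = {
    "2": "Generation 2 modules",
    "3": "Generation 3 modules",
    "4": "Generation 4 modules",
    "5": "Generation 5 (MDS 9710/9706 DS-X9448-768K9, MDS 9396S)",
    "48": "Cisco MDS 9148",
    "50I": "Cisco MDS 9250i",
    "48S": "Cisco MDS 9148S",
}
ASIC = {
    "AK": "Aakash (Gen2/Gen3)",
    "THB": "Thunderbird (Gen4)",
    "F16": "F16 (Gen5)",
    "SAB": "Sabre (MDS 9148)",
    "VIP": "Viper (MDS 9250i, 9148S)",
    "VIPER": "Viper (MDS 9250i, 9148S)",  # assumption: VIPER_ prefix is also Viper
}


def decode_superscripts(tag: str) -> List[str]:
    """'4,5,48s,50i,note3' -> platforms; note references are skipped."""
    codes = (code.strip().upper() for code in tag.split(","))
    return [PLATFORM[code] for code in codes if code in PLATFORM]


def decode_counter(name: str) -> Dict[str, object]:
    """Classify a counter name using the legend's prefixes and abbreviations."""
    tokens = name.upper().split("_")
    return {
        "asic": ASIC.get(tokens[0], "not ASIC-prefixed"),
        "software": name.upper().startswith("FCP_SW_"),
        "tmm": "TMM" in tokens,  # Transmit Memory Manager
        "request_interface": any(re.fullmatch(r"RI\d*", t) for t in tokens),
    }


print(decode_superscripts("48s,50i,note2"))
print(decode_counter("FAL_RI0_CP_CNT_RESEND_MSG_DROP"))
```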