SAN Congestion! Understanding, Troubleshooting, Mitigating in A Cisco Fabric
Edward Mazurek
Technical Lead Data Center Storage Networking
CCIE 6448
BRKSAN-3446
Agenda
• Introduction
• Slow Drain Terminology
• Understanding Fibre Channel Flow Control
• MDS Slow Drain Features
• Troubleshooting Slow Drain
• Alerting and Mitigating Slow Drain
• Conclusion
Introduction
• Slow drain is a term used to describe SAN congestion
• When devices cannot receive data at line rate, congestion can build up in the SAN
• SANs are getting increasingly complex and heterogeneous
• Many different speeds
• Many different types of devices
• Host/storage workloads increasing
Reasons for Slow Drain
• Edge devices - An edge device can be slow to respond for a variety of reasons:
• Server performance problems: application or OS
• Host bus adapter (HBA) problems: driver or physical failure
• Speed mismatches: one fast device and one slow device
• Non-graceful virtual machine exit on a virtualized server, resulting in packets held in HBA buffers
• Storage subsystem performance problems, including overload
• Inter Switch Links (ISLs)
• Lack of B2B credits for the distance the ISL is traversing
• Example: 4 credits per km at 8Gbps
• The existence of slow drain edge devices
• Edge devices with faster speeds than ISLs, even when port-channeled
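The "4 credits per km at 8Gbps" example scales with link speed and frame size. The sketch below is a rough planning aid only. It assumes the common rule of thumb of about speed_gbps / 2 credits per km for full-size (about 2 KB) frames, which reproduces the 4-per-km figure at 8Gbps, and it assumes smaller average frames need proportionally more credits. `required_credits` is a hypothetical helper, not an MDS command.

```python
import math

FULL_FRAME_PAYLOAD = 2048  # bytes; approximate full-size FC data field (assumption)

def required_credits(distance_km: float, speed_gbps: float,
                     avg_frame_bytes: int = FULL_FRAME_PAYLOAD) -> int:
    """Rough B2B credits needed to keep an ISL of this length streaming.

    Rule of thumb (assumption): speed_gbps / 2 credits per km for full-size
    frames, e.g. 4 credits per km at 8Gbps. Each frame consumes one credit
    regardless of size, so smaller frames need proportionally more credits.
    """
    per_km = (speed_gbps / 2) * (FULL_FRAME_PAYLOAD / avg_frame_bytes)
    return math.ceil(distance_km * per_km)

if __name__ == "__main__":
    print(required_credits(10, 8))        # 10 km at 8Gbps, full-size frames
    print(required_credits(10, 8, 1024))  # same link, 1 KB average frames
    print(required_credits(10, 16))       # 10 km at 16Gbps
```

Halving the average frame size doubles the estimate, which is why an ISL that looks adequately provisioned for bulk transfers can still run out of credits with small-frame workloads.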
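Reasons for Slow Drain (continued)
• Port-channel bandwidth is not the same as individual link bandwidth: 4 x 4Gb is not equal to one 16Gb link
• Traffic is spread across the member links by source, destination and OX_ID, so a member ISL sending at its full 4Gbps rate can cause congestion back toward the storage
(Figure: 8Gb host H1 and an 8Gb storage port connected through a port-channel of 4 x 4Gb links (16Gb total), with VOQs on the ingress switch.)
No-credit-drop
• If no-credit-drop is configured (300ms in this example) and Tx credits remain at 0 for that amount of time, then the port is considered “stuck”
• The switch then starts dropping frames immediately, without regard to the age of the frames
(Figure: timeline in which no B2B credits and no R_Rdys are returned and 0 credits remain; once the 300ms no-credit-drop time expires, frames in the Rx queue are dropped. Other labels: "Signaling No Response", "+60ms", "Shut/No Shut".)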
Understanding Fibre Channel Flow Control
(Figure: Fibre Channel classes of service marked against the fabric: Class 1, Class 2, Class 3, Class 4, Class 6, Class F and FC-AL.)
All data is currently transported using Class 3
Fibre Channel Class 3
• Class 3 is a best-effort packetized service:
• The receiving port does not acknowledge receipt of frames. If the fabric cannot deliver a frame for any reason, the frame can be discarded without notifying the sending port. However, Class 3 is not really unreliable, because it relies on the upper-layer protocol (ULP) to detect and recover from lost frames
• Class 3 does not guarantee fixed latency, because data paths are variable
• Class 3 does not guarantee in-order delivery. For most Fibre Channel applications, including storage applications, the ULP is responsible for guaranteeing in-order delivery
Fibre Channel Flow Control
• Fibre Channel flow control attempts to minimize the chance of dropped frames
• Frames are only transmitted when it is known that the receiver has buffer space
• For each frame sent, an R_Rdy (B2B credit) should be returned
• R_Rdys can only be returned once the frame that previously occupied that buffer location has been handled
• R_Rdys are not sent reliably – they can be corrupted or lost
• Each side informs the other side of the number of buffer credits it has:
• F ports – in the Fabric Login (FLOGI)
• E ports – in the Exchange Link Parameters (ELP)
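The credit accounting above can be modeled as a toy sender/receiver pair. This is a simplified illustration, not switch code: the `Link` class and its methods are hypothetical, the receiver returns exactly one R_Rdy per freed buffer, and lost or corrupted R_Rdys are not modeled.

```python
from collections import deque

class Link:
    """Toy model of one direction of an FC link using B2B credits."""

    def __init__(self, rx_buffers: int):
        # Credits the receiver advertised at login (FLOGI or ELP).
        self.tx_credits = rx_buffers
        self.rx_queue = deque()

    def send(self, frame) -> bool:
        """Transmit only when the receiver is known to have a free buffer."""
        if self.tx_credits == 0:
            return False          # no credit, no frame: the sender must wait
        self.tx_credits -= 1
        self.rx_queue.append(frame)
        return True

    def receiver_drains(self, count: int = 1) -> None:
        """Receiver handles frames; each freed buffer returns one R_Rdy."""
        for _ in range(min(count, len(self.rx_queue))):
            self.rx_queue.popleft()
            self.tx_credits += 1  # the R_Rdy arrives back at the sender

link = Link(rx_buffers=3)                 # 3 credits, as in the FLOGI example
sent = [link.send(i) for i in range(5)]
print(sent)                               # [True, True, True, False, False]
link.receiver_drains(1)                   # one R_Rdy returned
print(link.send(99))                      # True: one credit was freed
```

A receiver that never calls `receiver_drains` leaves the sender at 0 Tx credits indefinitely, which is the slow drain condition in its simplest form.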
(Figure: an end device (N port) logs in to MDS9710-A (F port); the ACC (FLOGI) carries 3 credits, so the F port has three credits.)
MDS9710-A# show int fc1/14
fc1/14 is up
……….
Transmit B2B Credit is 1
Receive B2B Credit is 3
3 receive B2B credit remaining
1 transmit B2B credit remaining
1 low priority transmit B2B credit remaining
Note: These values are not typical; they were chosen for simplicity. Typical F port values are 16-32.
Fibre Channel Flow Control
Frame Flow Control
• As FC frames flow into the fabric, the MDS's remaining Rx B2B credits are decremented by 1 for each received frame
• Each R_Rdy sent by the MDS frees up one B2B credit for the sender
(Figure: frames and R_Rdys exchanged between the N port and the F port; one panel shows only data and no R_Rdys.)
Fibre Channel Flow Control – Example (continued)
R_Rdy recovery
(Figure: Xgig analyzer trace between the MDS (FC Port 1,1,3) and the server (FC Port 1,1,4), showing R_Rdys and then more R_Rdys being returned.)
MDS Frame and Credit Processing
1. Initiator sends an FC frame to the MDS port ASIC
2. FC frame is received in its entirety and stored
3. FC frame is transmitted to the VOQ
6. FC frame is forwarded to the XBAR, then an R_Rdy is sent back since the buffer is now free
7. FC frame is forwarded to the egress line card
(Figure: ingress Line Card 1 (port and VOQ), the arbiter on the active supervisor, the XBAR, and egress Line Card 2 (port and interface); steps 4 and 5 appear only in the figure.)
(Figure: a single input queue at port 1, where the frame to port 4 at the top of the queue blocks the frames to ports 5 and 6 behind it, compared with per-destination VOQs for ports 4, 5 and 6 at the same input port.)
This diagram shows the primary difference between a VOQ-based switch and a switch without VOQ.
If destination port 4 were congested, the switch without VOQ would block, with frames to other output ports waiting behind the frame for the blocked port (head-of-line blocking).
In contrast, with VOQ only the VOQ associated with port 4 is blocked; frames to all other ports flow normally.
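The difference can be reproduced with a small simulation. This is a sketch under simplified assumptions, not a model of the MDS arbiter: each queue may forward its head frame once per time step, output port 4 accepts nothing, and the arrival order and the `forward` helper are illustrative.

```python
from collections import deque

def forward(frames, blocked, use_voq, steps):
    """Return the frames delivered within `steps` time steps.

    frames  -- destination port of each frame, in arrival order
    blocked -- set of congested output ports that accept nothing
    use_voq -- True: one queue per destination; False: one shared FIFO
    """
    delivered = []
    if use_voq:
        queues = {}
        for dst in frames:
            queues.setdefault(dst, deque()).append(dst)
    else:
        queues = {"fifo": deque(frames)}

    for _ in range(steps):
        for q in queues.values():
            # Only the head of each queue is eligible to be forwarded.
            if q and q[0] not in blocked:
                delivered.append(q.popleft())
    return delivered

# Input queue at port 1: a frame to congested port 4 sits at the head.
arrivals = [4, 6, 5, 4, 6, 5]
print(forward(arrivals, blocked={4}, use_voq=False, steps=6))  # []
print(forward(arrivals, blocked={4}, use_voq=True, steps=6))   # [6, 5, 6, 5]
```

Without VOQ nothing is delivered, even though ports 5 and 6 are idle; with VOQ only the two frames for port 4 remain queued.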
• Since it gives no indication of how long credits remained at zero, this is not a good indication of slow drain in and of itself
• Use the slowport-monitor or the various txwait commands instead
MDS Slow Drain Features
Tx/Rx Credit not Available
• B2B credits are sampled every 100ms: an MDS software process detects when a port is at zero Tx or Rx credits for 100ms
• Since this is done by software, it may not catch each and every occurrence
• Available in:
• slot x show hardware internal fc-mac port y error-statistics
• show logging onboard error-stats (timestamped!)
• xxx_CNTR_RX_WT_AVG_B2B_ZERO
• xxx_CNTR_TX_WT_AVG_B2B_ZERO
• show system internal snmp credit-not-available
• port-monitor tx-credit-not-available
(Figure: credits sampled every 100ms over a 1 second timeline.)
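MDS Slow Drain Features
(Figure: credits at zero starting at 0 sec, followed by a successful recovery. Example link event:)
Apr 3 18:53:34 2014 00810034 fc10/30 --- DOWN LR Rcvd B2B
• Frames are normally queued for up to the congestion-drop time (300ms in this example) before being dropped
(Figure: congestion-drop and no-credit-drop timelines, both at 300ms.)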
MDS Slow Drain Features
Display dropped packet info
• The MDS 9710 and 9396S can display key information for packets that have experienced a timeout drop
• 32 packets are kept per forwarding instance
• Output contains:
• Source FCID (SID)
• Destination FCID (DID)
• RCTL – Routing control (ELS, ABTS, etc.)
• Source Index (SI)
• Destination Index (DI)
• MDS can monitor ports withholding credits for as little as 1ms
• Records the last 10 events, with the duration and the date/time when each occurred
• Included in OBFL
• Full featured for MDS 9700, 9396S, 9148S and 9250i
(Figure: a port withholding R_RDYs for 5ms, recorded with Interface, Delay and Timestamp.)
(Figure labels: Request-timeouts, Slowport-monitor-events, Txwait.)
• slowport-monitor-events New!
• txwait
MDS Slow Drain Features
Port-monitor / Portguard
• Allows alerting on many slow drain indications
• Three new counters (New!): slowport-count, slowport-oper-delay and txwait
(Figure: an active port-monitor policy sending SNMP alerts to the DCNM server for counters including link-loss, credit-loss, tx-credit-not-avail, slowport-count, slowport-oper-delay and txwait.)
Note: Each level includes all the symptoms of the previous levels
Troubleshooting Slow Drain
Classifying Slow Drain Symptoms - Level 1: Latency
• Latency indicates SCSI exchanges are taking longer than normal
• No SCSI errors or retransmissions are noted
• Subtle and difficult to detect
• ISLs and other ports should be checked for low numbers of Tx/Rx remaining
credits
• Use new slowport-monitor, OBFL txwait, txwait-history and alerting
capabilities
Troubleshooting Slow Drain
Classifying Slow Drain Symptoms - Level 2: Retransmission
• Once any frame in a SCSI exchange is dropped the exchange will be aborted
• Aborted exchanges will be listed in host logs
• Frames are held for a maximum of 500ms before being dropped as timeouts
• This is the default congestion-drop value
• Frames can also be dropped as timeouts if no-credit-drop is configured
• Use “show logging onboard starttime <date-time> error-stats”
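The two timeout-drop conditions can be summarized as a decision function. This is a minimal sketch under stated assumptions: `should_drop` is hypothetical, the congestion-drop default is taken as 500ms from the bullet above, no-credit-drop is treated as disabled unless a value is supplied (300ms in the earlier example), and the real switch applies these checks in hardware per frame and per port.

```python
from typing import Optional

def should_drop(frame_age_ms: float, zero_tx_credit_ms: float,
                congestion_drop_ms: float = 500,
                no_credit_drop_ms: Optional[float] = None) -> bool:
    """Decide whether a queued frame is discarded as a timeout drop.

    frame_age_ms      -- how long this frame has waited in the switch
    zero_tx_credit_ms -- how long the egress port has been at 0 Tx credits
    """
    # no-credit-drop: once the port is "stuck", drop regardless of frame age.
    if no_credit_drop_ms is not None and zero_tx_credit_ms >= no_credit_drop_ms:
        return True
    # congestion-drop: otherwise frames are held up to the configured maximum.
    return frame_age_ms >= congestion_drop_ms

print(should_drop(100, 0))                          # False: still within 500ms
print(should_drop(10, 300, no_credit_drop_ms=300))  # True: port considered stuck
```

The second case shows why no-credit-drop frees buffers sooner: a fresh frame behind a stuck port is discarded immediately instead of waiting out the full congestion-drop time.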
Troubleshooting Slow Drain
Classifying Slow Drain Symptoms - Level 3: Extreme Delay
• Typically caused by ports without credits for 1 second (F ports) or 1.5 seconds (E ports)
• Credit-loss Recovery is invoked
• Links may fail and/or flap
• Typically many timeout drops are also recorded
Troubleshooting Slow Drain
Methodology
• Cisco recommends troubleshooting slow drain in the following order:
Level 3: Extreme Delay
Level 2: Retransmission
Level 1: Latency
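To make the three levels concrete, the sketch below classifies a port from a few counter increments, checking the most severe level first. The `PortDeltas` fields and the any-nonzero rule are hypothetical simplifications; they loosely correspond to the credit-loss recovery, link failure, timeout drop, txwait and slowport-monitor data discussed in this session.

```python
from dataclasses import dataclass

@dataclass
class PortDeltas:
    """Counter increments seen on one port over an observation window."""
    credit_loss_recoveries: int = 0   # e.g. FCP_SW_CNTR_CREDIT_LOSS delta
    link_failures: int = 0
    timeout_drops: int = 0            # e.g. timeout-discards delta
    txwait_ticks: int = 0             # 2.5us ticks spent at 0 Tx credits
    slowport_events: int = 0

def classify(d: PortDeltas) -> str:
    """Return the most severe slow drain symptom level the deltas indicate."""
    if d.credit_loss_recoveries or d.link_failures:
        return "Level 3: Extreme Delay"
    if d.timeout_drops:
        return "Level 2: Retransmission"
    if d.txwait_ticks or d.slowport_events:
        return "Level 1: Latency"
    return "No slow drain symptoms"

print(classify(PortDeltas(txwait_ticks=105854)))
print(classify(PortDeltas(timeout_drops=10201)))
print(classify(PortDeltas(credit_loss_recoveries=10, timeout_drops=10201)))
```

Because each level includes the symptoms of the previous ones, checking from Level 3 downward returns the single most relevant level for a port.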
Troubleshooting Slow Drain
Methodology – Follow Congestion to Source
• If there is Rx congestion, find the ports communicating with this port that have Tx congestion
• Zoning defines which devices communicate with this port
• Understand the topology (F and E ports)
1. Module link-events
*************** Port Config Link Events Log ***************
----                 ------    -----   -----   ------
Time                 PortNo    Speed   Event   Reason
----                 ------    -----   -----   ------
Jul 28 00:46:39 2012 00670297 fc1/25 --- DOWN LR Rcvd B2B
2. Logging log
MDS9710-1# show logging log
%PORT-2-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down (Link failure)
%PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down (Link failure Link Reset failed nonempty recv queue)
• Both indicate the same thing – ... Rx congestion
• Not normally a problem with this port, but with the port this port is switching packets to
fc6/32 10/0(%) 1 Wed Apr 2 17:23:54 2014 Rising 10%   (100ms Tx delay)
fc6/32 10/0(%) 1 Wed Apr 2 17:24:39 2014 Falling 0%
fc6/32 10/0(%) 1 Wed Apr 2 17:24:40 2014 Rising 20%   (200ms Tx delay)
fc6/32 10/0(%) 1 Wed Apr 2 17:25:53 2014 Falling 0%
fc6/32 10/0(%) 1 Wed Apr 2 17:25:54 2014 Rising 20%   (200ms Tx delay)
Level 1: Latency - Troubleshooting
Credit Not Available – continued
• Included in OBFL error-stats
• Tracked in both Rx and Tx directions
• Indicates 100ms intervals where Tx or Rx credit is not available
--------------------------------------------------------------------------------
ERROR STATISTICS INFORMATION FOR DEVICE DEVICE: FCMAC
--------------------------------------------------------------------------------
Interface |                                 |         | Time Stamp
Range     | Error Stat Counter Name         | Count   | MM/DD/YY HH:MM:SS
          |                                 |         |
--------------------------------------------------------------------------------
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |1496855 |04/07/15 22:44:23
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |217 |04/07/15 22:44:23
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |19 |04/07/15 22:44:23
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |1486654 |04/07/15 22:44:03
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/07/15 22:44:03
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |9 |04/07/15 22:44:03
• FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO (credit not available, in 100ms increments) incremented by 217 - 108 = 109 in 20 seconds
• FCP_SW_CNTR_CREDIT_LOSS (credit loss) incremented by 19 - 9 = 10 in 20 seconds
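The delta arithmetic above can be automated when comparing OBFL snapshots. This rough sketch assumes the pipe-delimited row layout shown (interface | counter | count | timestamp) and simply skips any line that does not match it; the function names are hypothetical and this is not a Cisco tool.

```python
from datetime import datetime

SAMPLE = """\
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |1496855 |04/07/15 22:44:23
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |217 |04/07/15 22:44:23
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |19 |04/07/15 22:44:23
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |1486654 |04/07/15 22:44:03
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/07/15 22:44:03
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |9 |04/07/15 22:44:03
"""

def parse(text):
    """Yield (interface, counter, count, timestamp) from error-stats rows."""
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # skip headers, separators and blank lines
        ts = datetime.strptime(fields[3], "%m/%d/%y %H:%M:%S")
        yield fields[0], fields[1], int(fields[2]), ts

def deltas(text):
    """Map (interface, counter) -> (increase, seconds) from oldest to newest row."""
    samples = {}
    for intf, name, count, ts in parse(text):
        samples.setdefault((intf, name), []).append((ts, count))
    result = {}
    for key, points in samples.items():
        points.sort()
        (t0, c0), (t1, c1) = points[0], points[-1]
        result[key] = (c1 - c0, (t1 - t0).total_seconds())
    return result

for (intf, name), (inc, secs) in deltas(SAMPLE).items():
    print(f"{intf} {name}: +{inc} in {secs:.0f}s")
```

For the sample rows this reports 10201 timeout drops, 109 credit-not-available intervals (roughly 10.9 seconds without Tx credits) and 10 credit loss events, all within 20 seconds.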
Level 1: Latency - Troubleshooting
Check ISLs for Lack of Transmit Credits
• Data frames are sent using low priority credits
• If Tx B2B credit remaining is low, then congestion is toward the adjacent switch
• If Rx B2B credit remaining is low, then congestion is in this switch and perhaps other switches
• Can only be done while congestion is in progress
MDS9710# show interface | include "fc|Belong|low priority|remain" | exclude "description" | exclude "Peer" | include "trunking" next 3
fc1/3 is trunking
500 receive B2B credit remaining
0 transmit B2B credit remaining
0 low priority transmit B2B credit remaining
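Because this check only works while congestion is in progress, it helps to scan the output quickly. A minimal sketch (Python 3.8+ for the `:=` operator) that parses the filtered `show interface` output above and flags interfaces whose low priority (data) transmit credits remaining are at or below a threshold. The threshold value and the helper names are assumptions.

```python
import re

SAMPLE = """\
fc1/3 is trunking
500 receive B2B credit remaining
0 transmit B2B credit remaining
0 low priority transmit B2B credit remaining
"""

HEADER = re.compile(r"^(fc\d+/\d+) is ")
LOW_PRIO = re.compile(r"^(\d+) low priority transmit B2B credit remaining")
RX = re.compile(r"^(\d+) receive B2B credit remaining")

def credit_snapshot(text):
    """Return {interface: {"rx": n, "tx_low": n}} from show interface output."""
    result, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if m := HEADER.match(line):
            current = result.setdefault(m.group(1), {})
        elif current is not None and (m := LOW_PRIO.match(line)):
            current["tx_low"] = int(m.group(1))
        elif current is not None and (m := RX.match(line)):
            current["rx"] = int(m.group(1))
    return result

def tx_starved(snapshot, threshold=0):
    """Interfaces whose data (low priority) Tx credits are at or below threshold."""
    return [i for i, c in snapshot.items() if c.get("tx_low", threshold + 1) <= threshold]

snap = credit_snapshot(SAMPLE)
print(snap)              # {'fc1/3': {'rx': 500, 'tx_low': 0}}
print(tx_starved(snap))  # ['fc1/3']
```

Running it several times a few seconds apart distinguishes an ISL that is persistently at zero from one that only touches zero momentarily.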
Level-1 Troubleshooting: Latency
Check Transitions to Zero counters
• Transitions to zero indicate when Tx or Rx B2B credits go to zero, even just for an instant of time
• Transmit indicates the adjacent device is withholding credits
• Receive indicates this MDS is withholding credits from the adjacent device
• Look for large, incrementing numbers, since some devices go to zero normally
9710-1# show int fc1/13 counters
fc1/13
…
549317 Transmit B2B credit transitions to zero
2388296 Receive B2B credit transitions to zero
1934443328 2.5us TxWait due to lack of transmit credits
Percentage Tx credits not available for last 1s/1m/1h/72h: 0%/0%/98%/1%
32 receive B2B credit remaining
17 transmit B2B credit remaining
17 low priority transmit B2B credit remaining
Last clearing of "show interface" counters 01:25:25
Level-1 Troubleshooting: Latency
Check for Frame Queuing on Ingress Ports
• "Prio 3" is class 3
• 000004 is a port bitmap in hexadecimal; each set bit indicates the presence of one or more queued frames from that port
• B'0000 0000 0000 0000 0000 0100' (bits from the right: Port 1, Port 2, Port 3) = Port fc1/3
• GI (Hex) is the Global Index (egress port): c = fc1/13
module-1# show hardware internal f16_que inst 0 table iqm-statusmem0
+-------------------------------------------------------------------------------
| IQM: PG0 Status Memory (logical layout) for F16 Que Driver
| Inst 0; port(s) 1-8
| Each instance is 8 ports on this LC
Note: Only non-zero entries are displayed
Each non-zero bit indicates pending frame in VOQ for that IB
+----------+--------+--------+--------+--------+
| GI (Hex) | Prio 0 | Prio 1 | Prio 2 | Prio 3 |
+----------+--------+--------+--------+--------+
| c        | 000000 | 000000 | 000000 | 000004 |
+----------+--------+--------+--------+--------+
rtp-san-33-18-9710-2# show system internal fcfwd idxmap port-to-interface
Port to Interface Table:(All values in hex)
--------------------------------------------------------------------------------
glob|                          |VL|lcl| if |slot|port| mts | port| flags
 idx| if_index                 |  |idx|type|    |    | node| mode|
-----|--------------------------|--|---|----|----|----|-----|-----|-------------
   0| 01000000 fc1/1            | 0| 00| 01 | 00 | 00 | 0102| 00 | 00
   1| 01001000 fc1/2            | 0| 01| 01 | 00 | 01 | 0102| 00 | 00
…snip
   b| 0100b000 fc1/12           | 0| 0b| 01 | 00 | 0b | 0102| 00 | 00
   c| 0100c000 fc1/13           | 0| 0c| 01 | 00 | 0c | 0102| 00 | 00
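Decoding the Prio 3 bitmap by hand is error prone. This minimal sketch assumes, as the slide annotation suggests, that bit n of the bitmap corresponds to port first_port + n of the instance, so 000004 in instance 0 (ports 1-8) is fc1/3. The GI-to-interface table is copied from the idxmap excerpt above rather than read from a switch, and all names are hypothetical.

```python
def queued_ingress_ports(bitmap_hex: str, first_port: int = 1) -> list:
    """Ports with frames waiting in the VOQ, decoded from a hex port bitmap."""
    value = int(bitmap_hex, 16)
    return [first_port + bit for bit in range(value.bit_length()) if value >> bit & 1]

# Global index -> interface, taken from the idxmap output above.
GI_TO_INTERFACE = {0x0: "fc1/1", 0x1: "fc1/2", 0xB: "fc1/12", 0xC: "fc1/13"}

def describe(gi_hex: str, prio3_bitmap: str, slot: int = 1, first_port: int = 1) -> str:
    """Summarize one iqm-statusmem row as 'ingress ports -> egress port'."""
    egress = GI_TO_INTERFACE.get(int(gi_hex, 16), f"GI 0x{gi_hex}")
    ingress = [f"fc{slot}/{p}" for p in queued_ingress_ports(prio3_bitmap, first_port)]
    return f"frames queued from {', '.join(ingress) or 'none'} to egress {egress}"

print(describe("c", "000004"))  # frames queued from fc1/3 to egress fc1/13
```

For instance 1 (ports 9-16) pass `first_port=9`; a bitmap with several bits set lists every ingress port that has frames waiting for the same congested egress port.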
----------------------------
Module: 9
----------------------------
--------------------------------------------------------------------------------
| Dest  | Source  |Events| Timestamp                | Timestamp                |
| Intf  | Intf    | Count| Earliest                 | Latest                   |
--------------------------------------------------------------------------------
|fc1/2  |fc9/24,  |    28|Sun Feb  9 00:28:23 2014  |Sun Feb  9 00:28:24 2014  |
--------------------------------------------------------------------------------
• These do not indicate actual packet drops – the frames were just delayed
• If the Dest Intf is FCIP, then there are problems on the FCIP tunnel
• Check for TCP retransmits
• Check for overutilization of FCIP
Level-1 Troubleshooting: Latency
New!
txwait
• txwait is a counter that increments every 2.5us when port is at 0 Tx credits and
there are frames queued for transmit
• txwait * 2.5 / 1000000 = seconds of time the port was unable to transmit
• Only applies to the following:
• MDS 9500 with generation 4 linecards:
• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)
• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)
• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)
• MDS 9148S 16G Multilayer Fabric Switch
• MDS 9250i Multiservice Fabric Switch
• MDS 9396S 16G Multilayer Fabric Switch
-----------------------------------------------------------------------------
| Interface | Delta TxWait Time          | Congestion | Timestamp                |
|           | 2.5us ticks | seconds      |            |                          |
-----------------------------------------------------------------------------
| fc4/1 | 52927 | 0 | 0% | Wed May 27 13:20:12 2015 |
| fc4/1 | 52926 | 0 | 0% | Wed May 27 13:19:52 2015 |
| fc4/1 | 105854 | 0 | 1% | Wed May 27 13:19:32 2015 |
| fc4/1 | 52926 | 0 | 0% | Wed May 27 13:19:12 2015 |
• Delta values recorded when they are more than 100ms in the
20 second interval
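The txwait formula translates directly into code. This small sketch applies it to the sample values in this section; the helper names are hypothetical, and truncating the seconds and percentage columns to whole numbers is an assumption that happens to match the OBFL table.

```python
TICK_US = 2.5  # txwait increments once every 2.5 microseconds

def txwait_seconds(ticks: int) -> float:
    """txwait * 2.5 / 1000000 = seconds the port was unable to transmit."""
    return ticks * TICK_US / 1_000_000

def congestion_percent(delta_ticks: int, interval_s: float = 20) -> int:
    """Share of an interval spent at 0 Tx credits, truncated to whole percent."""
    return int(txwait_seconds(delta_ticks) / interval_s * 100)

# OBFL txwait rows: delta ticks recorded per 20 second interval.
for ticks in (52927, 105854):
    print(ticks, f"{txwait_seconds(ticks):.3f}s", f"{congestion_percent(ticks)}%")

# Cumulative counter from "show int fc1/13 counters", assuming 01:25:25
# is the elapsed time since the counters were last cleared.
total = txwait_seconds(1934443328)
print(f"{total:.1f}s of {85 * 60 + 25}s")
```

For fc1/13 this gives about 4836 seconds unable to transmit out of roughly 5125 seconds since the last clear, about 94%, in line with the 98% one-hour average in the same output.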
Level-1 Troubleshooting: Latency
txwait-history New!
• Graphical display of the time where Tx credits are not available
• Similar in format to cpu history
• 3 graphs per port:
• Last 60 seconds
• Last 60 minutes
• Last 72 hours
• Utilizes the underlying txwait counter
mds9710-1# show process creditmon txwait-history module 1 port 13
TxWait history for port fc1/13:
==============================
(Graph: Credit Not Available per second (last 60 seconds); # = TxWait (ms); y-axis 0 to 1000 ms, x-axis 0 to 60 seconds.)
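The histogram style can be approximated in a few lines. This is a simplified rendering sketch, not NX-OS code: it assumes a column shows `#` at every 100ms level that the second's TxWait reached, omits the per-column digit rows and axis labels of the real output, and uses made-up input values.

```python
def render_history(txwait_ms, top=1000, step=100):
    """Render per-second TxWait (ms) as rows of '#', cpu-history style.

    Each column is one second; a '#' appears in a row when that second's
    TxWait reached the row's level (simplified approximation).
    """
    rows = []
    for level in range(top, 0, -step):
        cells = "".join("#" if ms >= level else " " for ms in txwait_ms)
        rows.append(f"{level:>4} {cells}".rstrip())
    return rows

# Illustrative values only: six seconds of TxWait in milliseconds.
for row in render_history([0, 0, 290, 880, 400, 0]):
    print(row)
```

A tall column means that second was spent almost entirely without Tx credits, which is easier to spot at a glance than the raw 2.5us tick counts.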
Level-1 Troubleshooting: Latency
slowport-monitor New!
• system timeout slowport-monitor <1-500> mode e|f
(Figure: Tx credits over a 0-100 ms window for the 9500 Gen3, the 9500 Gen4 and a third platform. Example annotations: "2 events logged" and "(15ms + 30ms)/2 = 22ms".)
Level-1 Troubleshooting: Latency
show logging onboard slowport-monitor command New!
--------------------------------------------------------------------------
| admin | slowport | oper | Timestamp | Interface
| delay | detection | delay | |
| (ms) | count | (ms) | |
--------------------------------------------------------------------------
| 20 | 49 | 489 | 05/11/15 21:04:46.779 | fc1/13
| 20 | 48 | 489 | 05/11/15 21:04:46.272 | fc1/13
| 20 | 47 | 489 | 05/11/15 21:04:45.779 | fc1/13
| 20 | 46 | 489 | 05/11/15 21:04:45.272 | fc1/13
Level-1 Troubleshooting: Latency
Slowport-monitor – Comparison
Linecard | Maximum events per 100ms interval | Actual delay measured? | Notes
DS-X9248-48K9, DS-X9224-96K9, DS-X9248-96K9 (gen3) | 1 | No – just an indication that the admin delay was reached; the actual delay could be much more | If the actual delay hits the slowport-monitor admin delay, an indication is made. That indication is checked every 100ms and, if true, an event is raised.
DS-X9232-256K9, DS-X9248-256K9 (gen4) | 1 | Yes – the actual delay is the total delay per 100ms interval | If the total delay (sum of all individual delays) in a 100ms interval hits the slowport-monitor admin delay, an event is raised.
DS-X9448-768K9 (gen5), MDS 9396S (gen5), MDS 9148S, MDS 9250i | 100 | Yes – the average delay for all events in the 100ms interval | If the actual delay hits the slowport-monitor admin delay and the port recovers (receives a credit), an event is raised. These are checked every 100ms interval.
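The table's three behaviors can be contrasted in a short model of one 100ms interval. This is a sketch of the documented behavior, not switch code: `slowport_events` is hypothetical, each entry in `delays_ms` is one period at 0 Tx credits that ended when a credit arrived, and truncating the gen5 average to whole milliseconds is an assumption chosen to match the "(15ms + 30ms)/2 = 22ms" example.

```python
from statistics import mean

def slowport_events(delays_ms, admin_delay_ms, generation):
    """Events raised in ONE 100ms interval as (event_count, oper_delay_ms).

    delays_ms -- individual periods (ms) at 0 Tx credits in the interval
    """
    if generation == 3:
        # Only an indication that the admin delay was reached; not measured.
        return [(1, None)] if any(d >= admin_delay_ms for d in delays_ms) else []
    if generation == 4:
        total = sum(delays_ms)  # sum of all individual delays in the interval
        return [(1, total)] if total >= admin_delay_ms else []
    if generation == 5:
        hits = [d for d in delays_ms if d >= admin_delay_ms]
        if not hits:
            return []
        # Up to 100 events per interval; oper delay is the average delay.
        return [(min(len(hits), 100), int(mean(hits)))]
    raise ValueError("generation must be 3, 4 or 5")

for gen in (3, 4, 5):
    print(gen, slowport_events([15, 30], admin_delay_ms=10, generation=gen))
```

Note that two short 5ms and 6ms stalls raise an event on gen4 (their 11ms sum reaches a 10ms admin delay) but not on gen5, where each individual delay must reach the admin delay.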
Level-1 Troubleshooting: Latency
show tech-support slowdrain New!
DCNM Slow Drain Analysis
• Step 1: Select fabric; Step 2: Choose duration; Step 3: Start collection
• While underway, collection progress is displayed until it is almost finished
• When finished, select the job to view the report
• The completed report in this example shows 509 credit loss events in 10 minutes!
• Counter explanations (help): hover over a counter for additional information
• Show non-zero data rows only: in this example only 3 rows have non-zero counters
• Filtering: filter results as needed
Slow Drain Alerting and Mitigation
Port-monitor counter - slowport-count
• Only counts a maximum of once per 100ms interval (10 per second)
• Indicates 0 Tx credits for at least the slowport-monitor admin delay
• Slowport-monitor must be configured for this counter to alert
• Refer to the gen3 slowport-monitor section for more information
Slow Drain Alerting and Mitigation
New!
Port-monitor counter - slowport-oper-delay
• Alerts on the slowport operational (actual) delay
• Only applies to the following:
• MDS 9500 with generation 4 linecards
• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)
• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)
• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)
• MDS 9148S 16G Multilayer Fabric Switch
• MDS 9250i Multiservice Fabric Switch
• MDS 9396S 16G Multilayer Fabric Switch
Note: The above monitors 9 slow drain counters and does not monitor 10 others
Slow Drain Alerting and Mitigation
Port-monitor alerting – activation and output
• Adding portguard to error-disable or flap a port can help the switch mitigate problems automatically
• Should be applied to access (F) ports only
• Use separate access (F) and trunk (E) policies
• Applies to delta counters only
Slow Drain Alerting and Mitigation
Port-monitor portguard - continued
• The following access (F) port policy adds portguard to timeout-discards and credit-loss-reco and adjusts the rising-threshold up a bit:
port-monitor name AccessPorts
port-type access
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 60 event 4 falling-threshold 10 event 4 portguard errordisable
counter credit-loss-reco poll-interval 60 delta rising-threshold 4 event 4 falling-threshold 0 event 4 portguard errordisable
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4
counter slowport-oper-delay poll-interval 1 absolute rising-threshold 50 event 4 falling-threshold 0 event 4
counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4
• timeout-discards: error-disable the port when 60 timeout-discards happen in 60 seconds
• credit-loss-reco: error-disable the port when 4 credit loss recovery events occur in 60 seconds
• slowport-oper-delay is a new counter (New!)
Slow Drain Alerting and Mitigation
Port-monitor portguard – trunk (E) port policy
port-monitor name ISLPorts
port-type trunks
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4
counter slowport-oper-delay poll-interval 1 absolute rising-threshold 80 event 4 falling-threshold 0 event 4
counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4
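To reason about what a `counter ...` line will do, it can help to model its rising and falling thresholds. This sketch is a hypothetical model, not Cisco code: it assumes RMON-style hysteresis (after a rising event, another rising event requires a falling event first), treats "delta" as the change since the previous poll and "absolute" as the raw sampled value, and only reports the portguard action rather than performing it.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CounterPolicy:
    """Hypothetical model of one 'counter ...' line in a port-monitor policy."""
    name: str
    rising: int
    falling: int
    delta: bool = True                # 'delta' (vs 'absolute') sampling
    portguard: Optional[str] = None   # e.g. "errordisable"
    _prev: Optional[int] = field(default=None, init=False, repr=False)
    _raised: bool = field(default=False, init=False, repr=False)

    def poll(self, raw: int) -> Optional[str]:
        """Evaluate one poll-interval sample and return an event, if any."""
        if self.delta:
            if self._prev is None:    # first poll only establishes a baseline
                self._prev = raw
                return None
            value, self._prev = raw - self._prev, raw
        else:
            value = raw
        if not self._raised and value >= self.rising:
            self._raised = True
            action = f" -> portguard {self.portguard}" if self.portguard else ""
            return f"{self.name} rising ({value}){action}"
        if self._raised and value <= self.falling:
            self._raised = False
            return f"{self.name} falling ({value})"
        return None

# Mirrors: counter timeout-discards poll-interval 60 delta rising-threshold 60
#          event 4 falling-threshold 10 event 4 portguard errordisable
td = CounterPolicy("timeout-discards", rising=60, falling=10, portguard="errordisable")
for sample in (1000, 1070, 1075):     # cumulative counter, one sample per 60s poll
    print(td.poll(sample))
```

The same model shows why portguard applies to delta counters only: an absolute counter such as slowport-oper-delay reports a level, not a count of new events within the poll interval.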
Slow Drain Alerting and Mitigation
Port-monitor portguard – when activated
mds9710-1# show port-monitor active
(Figure: test topology with two MDS switches connected by an 8Gbps ISL on fc1/3; test ports Ag104/1 through Ag104/4 attach at 4Gbps to fc1/13 and fc1/14 on the two switches, and Ag104/4 acts as the slow drain device.)
Test results – congestion-timeout/no-credit-timeout
(Graph: frames/sec with Ag104/4 delaying R_Rdys by 300ms, default timeout settings.)
Test results – congestion-timeout/no-credit-timeout
(Graph: frames/sec with Ag104/4 delaying R_Rdys by 300ms, congestion-drop/no-credit-drop set to 200ms.)
• Almost 3X improvement on the flow!
Summary
Troubleshooting Summary
• Reactive
• Use several show logging onboard commands with the starttime option to display events
• Proactive
• Configure slowport-monitor at 10-25ms for both E and F ports:
system timeout slowport-monitor 10 mode e
system timeout slowport-monitor 10 mode f
MDS Command Reference
MDS 9500
show hardware internal statistics module x pktflow dropped
    Displays packet drop counters.
show hardware internal errors [module x]
    Displays error information for ports. Error indications include frame drops/discards for various reasons, including timeout discards.
slot x show port-config internal link-events
    Linecard command to display the link-events log. This is a concise event log for interfaces going up and down.
show process creditmon credit-loss-events [module x]
    Displays the last 10 credit loss events per interface.
show process creditmon slowport-monitor-events [module x]
    Displays the last 10 slowport-monitor events per interface. System timeout slowport-monitor must be configured.
MDS Command Reference
MDS 9500 - continued
show process creditmon txwait-history [module x] [port x]
    Displays 60 second, 60 minute and 72 hour histogram graphs. Only valid for generation 4 linecards.
show system internal snmp credit-not-available
    Displays instances of the 100ms tx-credit-not-available.
slot x show hardware internal up-xbar <0-1> queued-packet-info
    Displays information indicating packets that are momentarily queued. For generation 3 linecards only.
slot x show hardware internal que inst <0-3> memory iqm-statusmem0|1
    Displays information indicating packets that are momentarily queued. For generation 4 linecards only.
MDS Command Reference
MDS 9700 / MDS 9396S
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…]
    Displays basic interface information, including the Tx/Rx credit values agreed to in the FLOGI/ELP exchange and the Tx/Rx credits remaining. Displays one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters
    Displays more counters pertaining to the interface, including timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits remaining, Txwait in 2.5us units, and average Txwait for 1s/1m/1h/72h. Displays one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details
    Displays more counters pertaining to the interface; for slow drain it is not much different from the above. This only works for a specified interface or range of interfaces; to display all fc interfaces, use the show interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters
    Displays more counters for all interfaces; for slow drain it is not much different from the show interface counters command.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats
    Displays OBFL error-stats. This contains many counters related to slow drain, including timeout drops, Tx/Rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait
    Displays txwait delta values recorded when greater than 99ms per 20 second interval.
MDS Command Reference
MDS 9700 / MDS 9396S - continued
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events
    Displays OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface.
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x]
    Displays OBFL arbitration timeouts. Note that these are not packet drops. They likely indicate that the destination interface listed is congested; the source interface will retry the arbitration request.
show hardware internal statistics [module x] [device all|fcmac]
    Displays statistical information for ports, including errors.
show hardware internal statistics [module x|module-all] pktflow dropped
    Displays packet drop counters. Note: if "module x" or "module-all" is omitted, only the counters for the supervisors are displayed. This is probably not what you want.
show hardware internal errors [module x|module-all]
    Displays error information for ports. Error indications include frame drops/discards for various reasons, including timeout discards. Note: if "module x" or "module-all" is omitted, only the counters for the supervisors are displayed. This is probably not what you want.
slot x show port-config internal link-events
    Linecard command to display the link-events log. This is a concise event log for interfaces going up and down.
show process creditmon credit-loss-events [module x]
    Displays the last 10 credit loss events per interface.
show process creditmon slowport-monitor-events [module x]
    Displays the last 10 slowport-monitor events per interface. System timeout slowport-monitor must be configured.
MDS Command Reference
MDS 9700 / MDS 9396S - continued
show process creditmon txwait-history [module x] [port x]
    Displays 60 second, 60 minute and 72 hour histogram graphs. Only valid for generation 4 linecards.
show system internal snmp credit-not-available
    Displays instances of the 100ms tx-credit-not-available.
slot x show hardware internal fcmac inst [0-5 | 0-11] tmm_timeout_stat_buffer
    Displays information indicating packets dropped due to timeouts.
slot x show hardware internal f16_que inst [0-5 | 0-11] table iqm-statusmem0|1
    Displays information indicating packets that are momentarily queued.
MDS Command Reference
MDS 9148
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…]
    Displays basic interface information, including the Tx/Rx credit values agreed to in the FLOGI/ELP exchange and the Tx/Rx credits remaining. Displays one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters
    Displays more counters pertaining to the interface, including timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits remaining, Txwait in 2.5us units, and average Txwait for 1s/1m/1h/72h. Displays one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details
    Displays more counters pertaining to the interface; for slow drain it is not much different from the above. This only works for a specified interface or range of interfaces; to display all fc interfaces, use the show interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters
    Displays more counters for all interfaces; for slow drain it is not much different from the show interface counters command.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats
    Displays OBFL error-stats. This contains many counters related to slow drain, including timeout drops, Tx/Rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use.
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x]
    Displays OBFL arbitration timeouts. Note that these are not packet drops. They likely indicate that the destination interface listed is congested; the source interface will retry the arbitration request.
MDS Command Reference
MDS 9148 - continued
show hardware internal statistics all
    Displays statistical information for ports, including errors.
slot 1 show hardware internal errors
    Displays error information for ports. Error indications include frame drops/discards for various reasons, including timeout discards.
show hardware internal packet-flow dropped
    Displays counts of packets dropped.
show hardware internal packet-dropped-reason
    Displays counters of packets dropped and the counter names (reasons) for each.
slot x show port-config internal link-events
    Linecard command to display the link-events log. This is a concise event log for interfaces going up and down.
show process creditmon credit-loss-events [module x]
    Displays the last 10 credit loss events per interface.
show system internal snmp credit-not-available
    Displays instances of the 100ms tx-credit-not-available.
MDS Command Reference
MDS 9250i
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…]
  Displays basic interface information, including the Tx/Rx credit values agreed to in the FLOGI/ELP exchange and the Tx/Rx credits remaining. Displays one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters
  Displays more counters pertaining to the interface, including: timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits remaining, Txwait in 2.5us units, and average Txwait for 1s/1m/1h/72h. Displays one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details
  Displays more counters pertaining to the interface. For slow drain purposes, not much different from the above. This only works for a specified interface or range of interfaces; to display all fc interfaces, use the show interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters
  Displays more counters for all interfaces. For slow drain purposes, not much different from the show interface counters command.
show interface fcx/y counters details
  Displays more counters pertaining to the interface. For slow drain purposes, not much different from the above.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats
  Displays OBFL error-stats. This contains many counters related to slow drain, including timeout drops, Tx/Rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use. This command requires a single interface or interface range.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait
  Displays txwait delta values, recorded when greater than 99ms per 20-second interval.
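The txwait OBFL entry above records a value only when the TxWait delta in a 20-second interval exceeds 99ms. The following Python sketch models that filtering from cumulative TxWait samples (in 2.5us units) taken every 20 seconds; it is a simplified illustration of the rule as described, not the NX-OS implementation, and the function name `txwait_obfl_entries` is hypothetical.

```python
from typing import List, Tuple

TXWAIT_TICK_US = 2.5     # one TxWait count = 2.5 microseconds
INTERVAL_MS = 20_000.0   # OBFL txwait interval: 20 seconds
THRESHOLD_MS = 99.0      # only deltas greater than 99 ms are recorded


def txwait_obfl_entries(samples: List[int]) -> List[Tuple[int, float, float]]:
    """Return (interval_index, delta_ms, percent_of_interval) for every
    20-second interval whose TxWait delta exceeds 99 ms.

    samples: cumulative TxWait counter readings taken every 20 seconds.
    """
    entries = []
    for i in range(1, len(samples)):
        # Convert the counter delta from 2.5 us units to milliseconds.
        delta_ms = (samples[i] - samples[i - 1]) * TXWAIT_TICK_US / 1000.0
        if delta_ms > THRESHOLD_MS:
            pct = delta_ms * 100.0 / INTERVAL_MS
            entries.append((i, delta_ms, pct))
    return entries
```

For example, a jump of 4,000,000 counts between two samples is 10 seconds of waiting, or 50% of that 20-second interval, while a jump of 10,000 counts (25ms) would not be recorded.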
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events
  Displays OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface.
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x]
  Displays OBFL arbitration timeouts. Note that these are not packet drops. They likely indicate that the listed destination interface is congested. The source interface will retry the arbitration request.
slot 1 show hardware internal statistics device all
  Displays statistical information for ports, including errors.
slot 1 show hardware internal errors
  Displays error information for ports. Error indications include frame drops/discards for various reasons, including timeout discards.
slot 1 show port-config internal link-events
  Linecard command to display the link-events log. This is a concise event log for interfaces going up and down.
show process creditmon credit-loss-events [module x]
  Displays the last 10 credit loss events per interface.
show process creditmon slowport-monitor-events
  Displays the last 10 slowport-monitor events per interface. The system timeout slowport-monitor command must be configured.
show process creditmon txwait-history [module x] [port x]
  Displays 60-second, 60-minute, and 72-hour histogram graphs. Only valid for generation 4 linecards.
show system internal snmp credit-not-available
  Displays instances of the 100ms tx-credit-not-available condition.
show hardware internal packet-flow dropped
  Displays counts of packets dropped.
MDS Command Reference
MDS 9148S
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…]
  Displays basic interface information, including the Tx/Rx credit values agreed to in the FLOGI/ELP exchange and the Tx/Rx credits remaining. Displays one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters
  Displays more counters pertaining to the interface, including: timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits remaining, Txwait in 2.5us units, and average Txwait for 1s/1m/1h/72h. Displays one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details
  Displays more counters pertaining to the interface. For slow drain purposes, not much different from the above. This only works for a specified interface or range of interfaces; to display all fc interfaces, use the show interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters
  Displays more counters for all interfaces. For slow drain purposes, not much different from the show interface counters command.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats
  Displays OBFL error-stats. This contains many counters related to slow drain, including timeout drops, Tx/Rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait
  Displays txwait delta values, recorded when greater than 99ms per 20-second interval.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events
  Displays OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface.
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x]
  Displays OBFL arbitration timeouts. Note that these are not packet drops. They likely indicate that the listed destination interface is congested. The source interface will retry the arbitration request.
slot 1 show hardware internal statistics device all
  Displays statistical information for ports, including errors.
slot 1 show hardware internal errors
  Displays error information for ports. Error indications include frame drops/discards for various reasons, including timeout discards.
slot 1 show port-config internal link-events
  Linecard command to display the link-events log. This is a concise event log for interfaces going up and down.
show process creditmon credit-loss-events [module x]
  Displays the last 10 credit loss events per interface.
show process creditmon slowport-monitor-events
  Displays the last 10 slowport-monitor events per interface. The system timeout slowport-monitor command must be configured.
show process creditmon txwait-history [module x] [port x]
  Displays 60-second, 60-minute, and 72-hour histogram graphs. Only valid for generation 4 linecards.
show system internal snmp credit-not-available
  Displays instances of the 100ms tx-credit-not-available condition.
show hardware internal packet-flow dropped
  Displays counts of packets dropped.
Slow drain counters and descriptions
For the following MDS switches:
9500 – Gen2/3/4 linecards
9700 – Gen 5 linecard
9148 – 8G 48 port Fabric switch
9250i – Multiservice Fabric Switch
9148S – 16G 48 port Fabric switch
9396S – 16G 96 port fabric switch
FCP_CNTR_RCM_CH0_LACK_OF_CREDIT [2]
AK_FCP_CNTR_RCM_CH0_LACK_OF_CREDIT [3]
THB_RCM_RCP0_RBBZ_CH0 [4, note1]
F16_RCM_RCP0_RBBZ_CH0 [5]
FCP_CNTR_RCM_RBBZ_CH0 [48]
VIP_RCM_RBBZ_CH0_CNT [50i, 48S]
  Description: Total count of transitions to zero for Rx B2B credits on ch0. These transitions typically indicate that the switch is applying back pressure to the attached device because of perceived congestion, and this perceived congestion can be the result of a lack of Tx B2B credits being returned on an interface over which this device is communicating. There is no indication of time at zero for this counter: it could stay at zero for just an instant or for an extended duration of time. Also shown in the output of show interface counters as:
    xxxx receive B2B credit transitions to zero
  Commands:
    Sup: show hardware internal statistics all [2, 3, 4, 48, note2]; show hardware internal statistics device all [5, note2]; None [48s, 50i, note3]
    Linecard: slot x show hardware internal statistics [2, 3, 48]; slot x show hardware internal fc-mac port x error-statistic [2, 3]; slot x show hardware internal statistics device fcmac all [4]; or slot x show hardware internal statistics device
  Additional info:
    Note1: CSCts28865 B2B credit 0 transitions incorrect for generation 4 linecards. Integrated in NX-OS 5.2(2).
    Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac.
    Note3: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S.
Table 1 - Counters indicating delay only
None [2, 3]
THB_TMM_PORT_TWAIT_CNT [4]
F16_TMM_PORT_TWAIT_CNT [5]
None [48]
VIP_TMM_TXWAIT_CH0_CNT [50i, 48S]
VIP_TMM_TXWAIT_CH1_CNT [50i, 48S]
  Description: A packet is available to send, but no credit is available.
    Gen4/Gen5: increments every clock cycle (cycle = 2.353 ns, 425 MHz).
    9250i/9148S: increments every clock cycle (cycle = 2 ns, 500 MHz).
    Must multiply by the number of ports in the port-group to get actual time.
    To calculate actual time, multiply the count by the cycle time above.
  Commands:
    Sup: [note3]; None [5, note1]; None [48s, 50i, note2]
    Linecard: slot x show hardware internal statistics device fcmac|all [5, 48s, 50i]
  Additional info:
    Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac.
    Note2: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S.
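Because the TWAIT hardware counters count clock cycles rather than time, a raw value has to be scaled by the cycle period quoted in the row above. The Python sketch below does that conversion using only the two clock rates given in the table. The platform keys and the function name are hypothetical, and the `ports_in_group` multiplier reflects the table's note about port-groups; which platforms need it is an assumption left to the caller.

```python
# Clock rates quoted in Table 1 for the per-port TxWait (TWAIT) hardware counters.
CLOCK_HZ = {
    "gen4_gen5": 425_000_000,    # THB_/F16_TMM_PORT_TWAIT_CNT, ~2.353 ns per cycle
    "9250i_9148s": 500_000_000,  # VIP_TMM_TXWAIT_CHx_CNT, 2 ns per cycle
}


def twait_cycles_to_seconds(cycles: int, platform: str,
                            ports_in_group: int = 1) -> float:
    """Convert a raw TWAIT clock-cycle count into seconds of Tx-credit wait.

    ports_in_group models the note that some counters must be multiplied by
    the number of ports in the port-group (assumption: caller decides when).
    """
    return cycles * ports_in_group / CLOCK_HZ[platform]
```

Dividing by the clock frequency is the same as multiplying by the cycle period, and it avoids the rounding in the quoted 2.353ns value.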
FCP_CNTR_RX_WT_AVG_B2B_ZERO [2, 48]
AK_FCP_CNTR_RX_WT_AVG_B2B_ZERO [3]
FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO [4]
  Description: Count of the number of times an interface was at zero Rx B2B credits for 100 ms. This status typically indicates that the switch is withholding the R_Rdy primitive from the device attached on that interface due to congestion in the path to devices with which it is communicating.
  Commands:
    OBFL: show logging onboard error-stats [2, 3, 4, 5, 48, 48s, 50i]
  Additional info:
    Note1: MDS 9148 added support for this counter in NX-OS 5.2(6).
FCP_CNTR_TX_WT_AVG_B2B_ZERO [2]
AK_FCP_CNTR_TX_WT_AVG_B2B_ZERO [3]
FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO [4]
  Description: Count of the number of times that an interface was at zero Tx B2B credits for 100 ms. This status typically indicates congestion at the device attached on that interface.
    MDS 9700, 9250i and 9148S: In NX-OS 6.2(9) these are once again incremented by the software creditmon process. They once again increment for each 100ms interval in which the port remains at 0 Tx credits.
  Commands:
    OBFL: show logging onboard error-stats [2, 3, 4, 5, 48, 48s, 50i]
    None [4, 5, note3]; None [48, 48s, 50i, note4]
    slot x show hardware internal statistics device fcmac all port x [4, 5]
  Additional info:
    Note1: MDS 9148 added support for this counter in NX-OS 5.2(6).
    Note4: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S.
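The counter's value depends on how it increments: once per 100ms interval spent at zero Tx credits (the NX-OS 6.2(9) behavior described above) or only once per episode (the 9250i behavior in NX-OS 6.2(5) through 6.2(7), described in the VIP_TMM_SLO_PRT_TO_TRANSITION entry of this table). The Python sketch below contrasts the two under the simplifying assumption that each zero-credit episode is known as a single duration in milliseconds; the function name is hypothetical.

```python
from typing import Iterable


def tx_wt_avg_b2b_zero_increments(zero_credit_ms: Iterable[int],
                                  per_interval: bool = True) -> int:
    """Estimate how much FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO would grow.

    zero_credit_ms: durations (ms) of separate episodes at 0 Tx credits.
    per_interval=True: one increment per full 100 ms at zero Tx credits.
    per_interval=False: at most one increment per episode, when the
    hardware interrupt fires after the first 100 ms.
    """
    total = 0
    for ms in zero_credit_ms:
        intervals = ms // 100  # full 100 ms intervals in this episode
        total += intervals if per_interval else min(intervals, 1)
    return total
```

With episodes of 50ms, 100ms and 350ms, the per-interval model yields 4 increments while the once-per-episode model yields 2, so a long stall is under-represented by the older behavior.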
RI12_CP_CNT_RESEND_MSG_DROP [2, 3]
FAL_RI0_CP_CNT_RESEND_MSG_DROP [4]
  Description: These are not packet drops. Only the request resend message to the arbiter was dropped. This can happen when the original request was finally serviced, so the follow-up message was dropped. It can indicate some minor congestion at the egress port, so that the request could not be granted immediately. This is counted against the ingress port, but it probably indicates some congestion on an egress port. Check show logging onboard flow-control request-timeout; you might see corresponding entries.
  Commands:
    Sup: show hardware internal errors all|module x
    OBFL: show logging onboard error-stats
F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT [5, note1, note3]
  Description: Count of times the port was at zero Tx credits for the stuck port timeout value.
    NX-OS 6.2(1) through 6.2(7): on the 9710 this was used for credit loss recovery, so it was set to 1s (F port)/1.5s (E port).
    MDS 9700, 9250i and 9148S: In NX-OS 6.2(9) the software creditmon process once again handles credit loss recovery, and the stuck force timeout is used for “system timeout no-credit-drop”. It defaults to 500ms with no action taken (no packets are dropped when it is reached).
    This counter will increment even if “system timeout no-credit-drop” is not configured, since it defaults to 500ms. If no-credit-drop is not configured, no action is taken and the counter simply indicates that the port was at zero Tx credits for 500ms.
    This is similar to the Viper counter VIP_TMM_STK_PRT_TO_TRANSITION_CHx_CNT.
  Commands:
    Sup: None [note2]
    Linecard: slot x show hardware internal statistics device all; slot x show hardware internal errors
  Additional info:
    Note1: Might falsely increment during port flap: CSCus70632 F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT increments during port flap.
    Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac.
    Note3: CSCut27271 Stuck port threshold not reset to default when removing no-credit-drop. Integrated in: NX-OS 6.2(13).
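The stuck-port entry separates two things: the counter incrementing (which happens at the 500ms default even without configuration) and frames actually being dropped (only when “system timeout no-credit-drop” is configured). The Python sketch below models that distinction for a single zero-credit episode in NX-OS 6.2(9) and later. It is a simplified illustration; the class and function names are hypothetical and the `>=` comparison at the threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_STUCK_TIMEOUT_MS = 500  # default stuck-port threshold described above


@dataclass
class StuckPortResult:
    counter_incremented: bool  # F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT grows
    frames_dropped: bool       # frames are force-timeout dropped


def classify_zero_credit_episode(zero_credit_ms: int,
                                 no_credit_drop_ms: Optional[int] = None
                                 ) -> StuckPortResult:
    """no_credit_drop_ms=None means no-credit-drop is not configured: the
    500 ms default still increments the counter, but no action is taken."""
    if no_credit_drop_ms is None:
        threshold = DEFAULT_STUCK_TIMEOUT_MS
    else:
        threshold = no_credit_drop_ms
    reached = zero_credit_ms >= threshold
    return StuckPortResult(counter_incremented=reached,
                           frames_dropped=reached and no_credit_drop_ms is not None)
```

So a port stalled for 600ms increments the counter in both cases, but only drops frames when a no-credit-drop value (for example 300ms) is configured.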
F16_TMM_PORT_CWAIT_FORCE_TIMEOUT_H_L_CNT [5]
  Description: Count of times a credit was received after the slow port timeout threshold had been triggered.
  Commands:
    Sup: None [note1]
    Linecard:
  Additional info:
    Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac.
VIP_TMM_STK_PRT_TO_TRANSITION_CH0_CNT [48S, 50i, note2]
VIP_TMM_STK_PRT_TO_TRANSITION_CH1_CNT [48S, 50i, note2]
  Description: Count of times the port was at zero Tx credits for the stuck port timeout value. Channel 0 is the high priority queue.
    NX-OS 6.2(5) through 6.2(7): on the 9250i this was used for credit loss recovery, so it was set to 1s (F port)/1.5s (E port).
  Commands:
    Sup: None [note1]
    Linecard: slot 1 show hardware internal statistics device all|fcmac; slot 1 show hardware internal errors
  Additional info:
    Note1: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S.
    Note2: CSCut27271 Stuck port threshold not reset to default when removing no-credit-drop.
VIP_TMM_SLO_PRT_TO_TRANSITION_CH0_CNT [48S, 50i]
VIP_TMM_SLO_PRT_TO_TRANSITION_CH1_CNT [48S, 50i]
F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H
  Description: Count of times the port was at zero Tx credits for the slow port timeout value. Channel 0 is the high priority queue.
    NX-OS 6.2(5) through 6.2(7): on the 9250i this was used to increment the FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter. Consequently, that counter incremented only once, when the HW interrupt occurred, and not for each 100ms interval as in prior releases.
  Commands:
    Sup: None [note1]
    Linecard: show hardware internal statistics device all|fcmac
  Additional info:
    Note1: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S.
Table 1 - end
Table 2 - Counters indicating frame drops
Counter Name Description Commands Additional Info
None [2]
None [3]
THB_TMM_PORT_FRM_DROP_CNT [4, note1]
F16_TMM_PORT_FRM_DROP_CNT [5]
FCP_CNTR_TMM_NORMAL_DROP [48]
  Description: Number of frames dropped in the tolb_path or np path by the Transmit Memory Manager (TMM). These drops include all types of packet drops: timeout, offline, abort drops, dummy frame drops at egress, etc. These counters are the aggregate counters for all the underlying counters.
  Commands:
    OBFL: show logging onboard error-stats [2, 3, 4, 5, 48, 48s, 50i]
    Sup hardware internal errors/statistics: show hardware internal errors all|module x [2, 3, 4]
  Additional info:
    Note1: Bugs: CSCud77292 Gen 4 linecards do not increment output discards on interface statistics. Integrated into NX-OS 5.2(8c) and 6.2(1).
FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES [2]
AK_FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES [3]
THB_TMM_TOLB_TIMEOUT_DROP_CNT [4]
F16_TMM_TOLB_TIMEOUT_DROP_CNT [5]
FCP_CNTR_TMM_TIMEOUT_DROP [48]
VIP_TMM_TO_DROP_CNT [50i, 48S, note2, note3]
  Description: Timeout drops at egress due to frames hitting the congestion drop threshold. The congestion drop threshold is set via the following command and is on at 500ms by default on all “modes” (port types):
    system timeout congestion-drop mode e|f
  Commands:
    OBFL: show logging onboard error-stats [2, 3, 4, 5, 48, 48s, 50i]; show logging onboard flow-control timeout-drops [2, 3, 4, 5, 48, 48s, 50i, note1]
    Sup hardware internal errors: show hardware internal errors all|module x [2, 3, 4]; show hardware internal statistics all [48]; show hardware internal statistics all [5, note4]; None [48s, 50i, note5]
  Additional info:
    Note1: These do not appear until NX-OS 6.2(9).
    Note2: These are included in VIP_TMM_TO_CNT.
    Note3: Should be included in `show hardware internal statistics pktflow dropped`: CSCus60322 Add VIP_TMM_TO_CNT and VIP_TMM_TO_DROP_CNT to packet-flow dropped.
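A timeout drop happens when a frame has waited at egress long enough to hit the congestion-drop threshold (500ms by default). The Python sketch below models that rule for one egress queue, assuming frames are held in arrival order with their enqueue times; it is a simplified illustration of the behavior described above, not how the TMM is implemented, and the function name is hypothetical.

```python
from collections import deque
from typing import Deque, Tuple

DEFAULT_CONGESTION_DROP_MS = 500  # default congestion-drop threshold


def drop_timed_out_frames(queue: Deque[Tuple[str, int]], now_ms: int,
                          threshold_ms: int = DEFAULT_CONGESTION_DROP_MS) -> int:
    """Drop frames that have been queued for threshold_ms or longer.

    queue holds (frame_id, enqueue_time_ms) in arrival order, so the oldest
    frames are on the left. Returns the number of frames dropped, i.e. what
    an egress timeout-drop counter would add for this check.
    """
    dropped = 0
    while queue and now_ms - queue[0][1] >= threshold_ms:
        queue.popleft()
        dropped += 1
    return dropped
```

Because frames are in arrival order, checking only the head of the queue is enough: once the oldest remaining frame is younger than the threshold, every frame behind it is too.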
THB_TMM_TIMEOUT_STATS_DROP [4]
F16_TMM_TIMEOUT_STATS_DROP [5]
  Description: Timeout stats dropped because the stats FIFO is full. These counters are not real drops. As understood from the F16/Thunderbird (TBIRD) ASIC, there is a TIMEOUT STATS FIFO in the TMM. This FIFO holds packets that have timed out. If the FIFO is full and has not been read, newly timed-out packets are not written into the FIFO; instead, they are counted by TIMEOUT_STATS_DROP.
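The TIMEOUT STATS FIFO described above is a bounded queue that refuses new entries when full instead of overwriting old ones. The Python sketch below models that behavior. The class name is hypothetical and the capacity is an arbitrary parameter, since the real FIFO depth is not given here. Note that Python's `deque(maxlen=...)` would discard the oldest entry instead, which is the opposite policy, so the sketch checks the length explicitly.

```python
from typing import List


class TimeoutStatsFifo:
    """Bounded FIFO modelling the TMM TIMEOUT STATS FIFO described above.

    When the FIFO is full and has not been read, a newly timed-out packet is
    not written (existing entries are kept) and is only counted in
    timeout_stats_drop, mirroring *_TMM_TIMEOUT_STATS_DROP. The counter
    therefore does not represent additional packet drops.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[str] = []
        self.timeout_stats_drop = 0

    def record_timeout(self, packet_id: str) -> None:
        if len(self.entries) < self.capacity:
            self.entries.append(packet_id)
        else:
            self.timeout_stats_drop += 1  # FIFO full: entry is not recorded

    def read_all(self) -> List[str]:
        """Drain the FIFO, oldest first, as a software reader would."""
        drained, self.entries = self.entries, []
        return drained
```

With a capacity of 2 and five timeouts before any read, two entries are kept and the counter reaches 3; after the FIFO is read, new timeouts are recorded again.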
FCP_CNTR_LAF_CF_TIMEOUT_FRAMES_DISCARD [2]
AK_FCP_CNTR_LAF_CF_TIMEOUT_FRAMES_DISCARD [3]
  Description: Count of class-F Fibre Channel frames dropped due to congestion-drop timeout.
  Commands:
    OBFL: show logging onboard error-stats [2, 3, 4, 5]
  Additional info:
    Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac.
F16_TMM_PORT_STUCK_FORCE_TIMEOUT_CNT [5]
VIP_TMM_STUCK_PORT_TO_CNT [48S, 50i]
  Description: Total number of frames force-timeout dropped by stuck port processing (no-credit-drop).
  Commands:
    Sup: None [5, note1]
    show hardware internal statistics device all|fcmac
  Additional info:
    Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac.
    Note2: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S.
FCP_CNTR_TMM_TIMEOUT [48]
VIP_TMM_TO_CNT [48S, 50i, note1]
  Description: Total number of timeout drops, which includes frames timed out due to congestion (pkt timeout), HW stuck force timeout, and HW slow port force timeout.
  Commands:
    Sup hardware internal statistics: show hardware internal statistics all [48]; None [48S, 50i, note2]
    Sup packet-dropped-reason: show hardware internal packet-dropped-reason mod [48, 48S, 50i]
    OBFL:
  Additional info:
    Note1: Should be included in `show hardware internal statistics pktflow dropped`. See: CSCus60322 Add VIP_TMM_TO_CNT and VIP_TMM_TO_DROP_CNT to packet-flow dropped.
F16_TMM_PORT_CWAIT_FORCE_TIMEOUT_CNT [5]
VIP_TMM_SLOW_PORT_TO_CNT [48S, 50i]
  Description: Total timeout packets dropped due to slow-port-monitor processing. This doesn't increment, since the slow-port-monitor feature doesn't include a packet drop function.
  Commands: None. Not implemented.
Table 2 - end
Table 3 - Counters indicating an action on or for an interface
Counter Name Description Commands Additional Info
FCP_CNTR_CREDIT_LOSS [2, 48]
AK_FCP_CNTR_CREDIT_LOSS [3]
  Description: Count of the number of times that creditmon credit loss recovery has been invoked on a port.
  Commands:
    OBFL: show logging onboard error-stats [2, 3, 4, 5, 48, 48s, 50i]
    show hardware internal errors all|module x [2, 3, 4]; None [4, 5, note1]; show hardware internal statistics all [48]; None [48s, 50i, note2]
  Additional info:
    Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac.
    Note2: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open.
FCP_CNTR_FORCE_TIMEOUT_ON [2, 48]
AK_FCP_CNTR_FORCE_TIMEOUT_ON [3]
  Description: Count of the number of times the "system timeout no-credit-drop" threshold has been reached by this port. When a port is at zero Tx B2B credits for the time specified, the port starts to drop packets at line rate.
  Commands:
    OBFL: show logging onboard error-stats [2, 3, 4, 5, 48, 48s, 50i]
    None [5, 48S, 50i, note3]
  Additional info:
    Note2: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S.
FCP_CNTR_FORCE_TIMEOUT_OFF [2]
AK_FCP_CNTR_FORCE_TIMEOUT_OFF [3]
  Description: Count of the number of times that the port has recovered from the system timeout no-credit-drop condition. This status typically means that the R_Rdy primitive has been returned, or possibly that an LR and LRR have occurred.
  Commands:
    OBFL: show logging onboard error-stats [2, 3, 4, 5, 48, 48s, 50i]
    None [5, 48S, 50i, note3]
  Additional info:
    Note2: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S.
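FORCE_TIMEOUT_ON and FORCE_TIMEOUT_OFF form a pair: ON counts entering the no-credit-drop state and OFF counts leaving it when credits return. The Python sketch below counts both from a series of Tx-credit samples for one port. Sampling credits at a fixed interval is an assumption made for illustration (the switch tracks this in hardware), and the function name is hypothetical.

```python
from typing import List, Tuple


def force_timeout_on_off(tx_credit_samples: List[int], sample_ms: int,
                         no_credit_drop_ms: int) -> Tuple[int, int]:
    """Count FORCE_TIMEOUT_ON / FORCE_TIMEOUT_OFF transitions for one port.

    tx_credit_samples: Tx B2B credits remaining, sampled every sample_ms.
    ON is counted once the port has been at zero Tx credits for
    no_credit_drop_ms (it then drops frames at line rate); OFF is counted
    when credits return while in that state.
    """
    on = off = 0
    zero_for_ms = 0
    dropping = False
    for credits in tx_credit_samples:
        if credits == 0:
            zero_for_ms += sample_ms
            if not dropping and zero_for_ms >= no_credit_drop_ms:
                dropping = True  # no-credit-drop threshold reached
                on += 1
        else:
            if dropping:
                dropping = False  # R_Rdy returned: port recovered
                off += 1
            zero_for_ms = 0
    return on, off
```

With 100ms samples and a 300ms threshold, two separate stalls of 300ms and 400ms each produce one ON and one OFF, while a 200ms stall produces neither.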
AK_FCP_CNTR_LINK_RESET_OUT [2, 3]
FCP_SW_CNTR_LINK_RESET_OUT [48]
FCP_SW_CNTR_LINK_RESET_OUT [4, 5, 50i, 48S, note1]
  Description: Count of times a Link Credit Reset (LR) was transmitted from the interface. Also shown in the output of “show interface counters detail” as:
    xxx link reset protocol errors transmitted
  Note that the above only counts link resets that are transmitted when the link is active.
  Commands:
    Sup hardware internal errors/statistics: show hardware internal statistics all [2, 3, 48]
    Linecard hardware internal errors/statistics: or show hardware internal statistics all [2, 3, 48]
  Additional info:
    Note1: These are not incremented. CSCus99138 Port software counters not incrementing. Integrated in: open.
AK_FCP_CNTR_LINK_RESET_IN [2, 3]
FCP_SW_CNTR_LINK_RESET_IN [48]
FCP_SW_CNTR_LINK_RESET_IN [4, 5, 50i, 48S, note1]
  Description: Count of times a Link Credit Reset (LR) was received on the interface. Also shown in the output of “show interface counters detail” as:
    xxx link reset protocol errors received
  or as IP_FCMAC_INTR_PRIM_RX_SEQ_LR (show logging onboard interrupt-stats will show the above).
  Note that the above only counts link resets that are received when the link is active.
  Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
  Additional info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT.
AK_FCP_CNTR_LRR_OUT [2, 3]
FCP_SW_CNTR_LRR_OUT [4, 5, 48, 50i, 48S, note1]
  Description: Count of times a Link Credit Reset Response (LRR) was transmitted from the interface.
  Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
  Additional info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT above.
AK_FCP_CNTR_LRR_IN [2, 3]
FCP_SW_CNTR_LRR_IN [4, 5, 48, 50i, 48S, note1]
  Description: Count of times a Link Credit Reset Response (LRR) was received on the interface. Also shown using show interface fcx/y.
  Commands: Reference AK_FCP_CNTR_LINK_RESET_OUT above.
  Additional info: Reference Additional Info in AK_FCP_CNTR_LINK_RESET_OUT.
Table 3 - end
Table 4 – Interrupt counters
Counter Name Description Commands Additional Info
F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H [5]
  Description: Slowport condition detected count (Low to High transition, i.e. credit wait (cwait) > threshold).
    NX-OS 6.2(1) through 6.2(7): count of times the port was at zero Tx credits for 100ms. Only increments on the initial 100ms interval. In these “pre-slowport-monitor” releases this counter was used to trigger the FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.
  Commands:
    OBFL: show logging onboard interrupt-stats
    Linecard: slot x show hardware internal fcmac port x interrupt-counts
F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_H_L [5]
  Description: Slowport condition exited count (High to Low transition, i.e. credit wait (cwait) < threshold).
    NX-OS 6.2(1) through 6.2(7): count of times the port received a credit after being at zero Tx credits for 100ms or longer. In these “pre-slowport-monitor” releases this counter was used to re-arm the FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.
  Commands:
    OBFL: show logging onboard interrupt-stats
    Linecard: slot x show hardware internal fcmac port x interrupt-counts
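The L_H and H_L interrupt counters are edge counters: one increments when credit wait rises above the threshold, the other when it falls back below it. The Python sketch below shows that edge detection over a series of credit-wait samples. It is only an illustration of the Low-to-High and High-to-Low semantics, not the ASIC logic; the sample values and function name are hypothetical.

```python
from typing import Iterable, Tuple


def cwait_interrupt_counts(cwait_ms: Iterable[float],
                           threshold_ms: float) -> Tuple[int, int]:
    """Count slowport Low-to-High and High-to-Low transitions.

    L_H: credit wait rises above threshold (slowport condition detected).
    H_L: credit wait falls back below threshold (condition exited).
    """
    l_h = h_l = 0
    high = False
    for value in cwait_ms:
        if not high and value > threshold_ms:
            high = True
            l_h += 1
        elif high and value < threshold_ms:
            high = False
            h_l += 1
    return l_h, h_l
```

A port still stuck when sampling ends shows one more L_H than H_L, which is a quick hint that the slowport condition has not cleared.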
F16_FCP_INTR_TMM_P_STUCK_FORCE_TIMEOUT_L_H [5]
  Description: Stuck port condition detected count (Low to High transition). Configured via:
  Commands:
    OBFL:
    Linecard:
VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH0_OVER_THRESHOLD_RAISING [48s, 50i]
VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_RAISING [48s, 50i]
  Description: Slowport condition detected count (Low to High transition, i.e. credit wait (cwait) > threshold).
    NX-OS 6.2(5) through 6.2(7): count of times the port was at zero Tx credits for 100ms. Only increments on the initial 100ms interval. In these “pre-slowport-monitor” releases this counter was used to trigger the FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.
  Commands:
    OBFL: show logging onboard interrupt-stats
    Linecard: slot x show hardware internal fcmac port x interrupt-counts
  Additional info:
    Note: VIPER does not have a High to Low interrupt like F16.
VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH0_OVER_THRESHOLD_FALLING [48s, 50i]
VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_FALLING [48s, 50i]
  Description: Slowport condition exited count.
  Commands:
    OBFL: show logging onboard interrupt-stats
    Linecard: slot x show hardware internal fcmac port x interrupt-counts
  Additional info:
    Note: These are displayed in OBFL with the VIPER_FCP_INTR_ prefix but without the prefix in other places.
VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH0_RAISING
VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH1_RAISING
  Description: Count of times the port was at zero Tx credits for the stuck port timeout value “no-credit-drop” (default value 500ms).
    NX-OS 6.2(5) through 6.2(7): count of times the port was at zero Tx credits for 1s (F port) or 1.5s (E port). In these “pre-slowport-monitor” releases this interrupt was used to trigger the FCP_SW_CNTR_CREDIT_LOSS counter increment.
  Commands:
    OBFL: show logging onboard interrupt-stats
    Linecard: slot x show hardware internal fcmac port x interrupt-counts
IP_FCMAC_INTR_PRIM_RX_SEQ_NOS Not Operational Sequence received on the interface. OBFL: show interface fcx/y counters details
IP_FCMAC_INTR_PRIM_RX_SEQ_OLS Off Line Sequence received on the interface. OBFL: show interface fcx/y counters details
IP_FCMAC_INTR_PRIM_RX_SEQ_LRR Link Reset Response received on the interface. This is sent in response to a Link Reset. OBFL: show interface fcx/y counters details
Table 4 - end
Table 5 - SNMP variables applicable to slow drain
Counter Name Description Commands Additional Info
fcIfTxWaitCount [2, 3, 48, note1]
fcIfTxWaitCount [4, note2, note3]
fcIfTxWaitCount [5, note3]
fcIfTxWaitCount [50i, 48S, note4]
  OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.15
  Description: The number of times the FC-port waited due to lack of transmit credits while there were packets queued for transmit. This is in units of 2.5us. To calculate seconds: txwait * 2.5 / 1000000. There is no OID for the Rx direction of this counter. Not generated by port-monitor.
  Based on the following counters:
    THB_TMM_PORT_TWAIT_CNT [4]
    F16_TMM_PORT_TWAIT_CNT [5]
    VIP_TMM_TXWAIT_CH0_CNT [50i, 48S]
    VIP_TMM_TXWAIT_CH1_CNT [50i, 48S]
  Displayed via:
    show interface fcx/y counters detailed | i wait
    -or-
    show interface detailed-counters | i fc|wait
  Example:
    rtp-san-34-15-9513# show int fc4/1 counters details | i wait
    82864704 waits due to lack of transmit credits
  Additional info:
    Note1: On gen2, gen3 and 9148, this will always return zero.
    Note2: Added to Gen4 linecards in NX-OS 5.2(2).
    Note3: Prior to 6.2(11a) this counter was inaccurate. See the following bug: CSCus15233 fcIfTxWaitCount incorrect on DS-X9232-256K9 and DS-X9248-768K9. Fixed in 6.2(11a).
    Note4: Prior to 6.2(11a) this counter was inaccurate. See the following bug: CSCus15745 fcIfTxWaitCount incorrect for MDS 9250i and 9148S. Fixed in 6.2(11a).
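The formula above (txwait * 2.5 / 1000000) turns the CLI or SNMP TxWait value into seconds. The Python sketch below applies it to captured `show interface ... counters details | i wait` output, such as the example in this row. The function name and regular expression are assumptions; the pattern only matches the "waits due to lack of transmit credits" wording shown in the example.

```python
import re
from typing import Optional

TXWAIT_TICK_US = 2.5  # TxWait unit: 2.5 microseconds

_WAIT_RE = re.compile(r"^\s*(\d+)\s+waits due to lack of transmit credits")


def txwait_seconds_from_cli(output: str) -> Optional[float]:
    """Find the 'waits due to lack of transmit credits' line and return seconds.

    Applies seconds = txwait * 2.5 / 1000000. Returns None when the line is
    not present in the captured output.
    """
    for line in output.splitlines():
        match = _WAIT_RE.match(line)
        if match:
            return int(match.group(1)) * TXWAIT_TICK_US / 1_000_000
    return None


sample = (
    "rtp-san-34-15-9513# show int fc4/1 counters details | i wait\n"
    "82864704 waits due to lack of transmit credits"
)
print(txwait_seconds_from_cli(sample))
```

For the example output, 82864704 waits correspond to roughly 207.16 seconds spent with frames queued and no Tx credits. On its own this is a cumulative total; comparing two readings over a known interval gives a rate.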
fcIfCreditLoss [2, 3, 4, 5, 48, 50i, 48S]
  OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.37
  Description: The number of link resets that have occurred due to unavailable credits from the peer side of the link. Generated by port-monitor counter credit-loss-reco. Shown in the output of show interface counters as:
    xxx timeout discards, xxx credit loss
  Additional info: Credit loss recovery is initiated by the MDS after 1 second (F port) / 1.5 seconds (E port) at zero Tx credits. Other products may initiate it at different intervals.
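The credit loss recovery trigger depends on port type: 1 second at zero Tx credits for an F port and 1.5 seconds for an E port. The tiny Python sketch below encodes those two MDS thresholds from this row. The function name is hypothetical, and other products may use different intervals, as noted above.

```python
CREDIT_LOSS_RECOVERY_MS = {"F": 1000, "E": 1500}  # MDS thresholds from this row


def triggers_credit_loss_recovery(port_type: str, zero_tx_credit_ms: int) -> bool:
    """Return True if an MDS port at zero Tx credits for this long would start
    credit loss recovery (Link Credit Reset), incrementing fcIfCreditLoss.

    port_type is 'F' or 'E' (case-insensitive).
    """
    return zero_tx_credit_ms >= CREDIT_LOSS_RECOVERY_MS[port_type.upper()]
```

The same 1.2-second stall therefore triggers recovery on an F port but not on an E port, which is one reason credit loss counts can differ between the edge and the ISLs.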
fcIfLinkResetOuts [2, 3, 4, 5, 48, 50i, 48S]
  OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.10
  Generated by port-monitor counter lr-tx.
  The number of link reset protocol errors issued by the FC-Port to the attached FC-Port. Shown in the output of show interface fcx/y counters detailed as:
    xxx link reset protocol errors transmitted
or
  The number of link reset protocol errors received by the FC-Port from the attached FC-Port. Shown in the output of show interface fcx/y counters detailed as:
    xxx link reset protocol errors received
or
  The number of packets that are dropped due to time-out at the FC-port or due to the FC-port going offline. Shown in the output of show interface counters as:
    xxx timeout discards, xxx credit loss
fcIfOutDiscards [2, 3, 4, 5, 48, 50i, 48S]
  OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.36
  Generated by port-monitor counter tx-discards.
  The total number of packets that are discarded in the egress side of the FC-port.
  Additional info: CSCus93323 Portmonitor truncates hcAlarmOwner.
fcIfTxWtAvgBBCreditTransitionToZero
fcIfBBCreditTransistionFromZero [2, 3, 4, 5, 48, 50i, 48S]
  OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.28
  Not generated by port-monitor. Based on the TBBZ hardware statistic.
  Increments when the transmit B2B credit transitions to zero. Shown in the output of show interface counters as:
    xxxx Transmit B2B credit transitions to zero
  There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.
fcHCIfBBCreditTransistionFromZero [2, 3, 4, 5, 48, 50i, 48S]
  OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.40
  Not generated by port-monitor. Based on the TBBZ hardware statistic.
  Increments when the transmit B2B credit transitions to zero. Shown in the output of show interface counters.
  There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.
fcIfBBCreditTransistionToZero [2, 3, 4, 5, 48, 50i, 48S]
  OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.39
  Not generated by port-monitor. Based on the RBBZ hardware statistic.
  Increments when the receive B2B credit transitions to zero. Shown in the output of show interface counters.
  There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.
fcHCIfBBCreditTransistionToZero [2, 3, 4, 5, 48, 50i, 48S]
  OID: 1.3.6.1.4.1.9.9.289.1.2.1.1.41
  Not generated by port-monitor. Based on the RBBZ hardware statistic.
  Increments when the receive B2B credit transitions to zero. Shown in the output of show interface counters as:
    xxxx Receive B2B credit transitions to zero
  There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.
Table 5 - end
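When the Table 5 variables are polled over SNMP, each column OID needs an instance suffix, and the useful number is usually the difference between two polls. The Python sketch below shows both steps without any SNMP library. Several parts are assumptions to verify against CISCO-FC-FE-MIB: that the table is indexed by the interface's ifIndex (so an instance is `<column OID>.<ifIndex>`) and which objects are 32-bit versus 64-bit counters. The ifIndex value in the usage line is made up.

```python
from typing import Dict

# Column OIDs listed in Table 5. Assumption: rows are indexed by ifIndex.
SLOW_DRAIN_OIDS: Dict[str, str] = {
    "fcIfTxWaitCount": "1.3.6.1.4.1.9.9.289.1.2.1.1.15",
    "fcIfCreditLoss": "1.3.6.1.4.1.9.9.289.1.2.1.1.37",
    "fcIfLinkResetOuts": "1.3.6.1.4.1.9.9.289.1.2.1.1.10",
    "fcIfOutDiscards": "1.3.6.1.4.1.9.9.289.1.2.1.1.36",
}


def instance_oid(name: str, if_index: int) -> str:
    """Build the per-interface instance OID for one Table 5 variable."""
    return f"{SLOW_DRAIN_OIDS[name]}.{if_index}"


def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Difference between two polls of a counter, allowing a single wrap.

    bits=32 for Counter32 objects, bits=64 for Counter64 (HC) objects.
    """
    if current >= previous:
        return current - previous
    return current + (1 << bits) - previous


# Hypothetical ifIndex for one fc interface:
print(instance_oid("fcIfCreditLoss", 16777216))
```

A fcIfTxWaitCount delta can then be converted to seconds with the 2.5us unit from this table and compared with the polling interval to see what fraction of that time the port spent waiting for credits.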
Legend - superscripts
• Superscripts:
• 1: Generation 1 modules are no longer supported by NX-OS 5.0 (and later releases) and are not covered by this presentation
• Legend
• AK: Aakash (Generation 2 or Generation 3 line card MAC ASIC)
• THB: Thunderbird (Generation 4 ASIC)
• F16: F16 (Generation 5 ASIC)
• SAB: Sabre ASIC for MDS 9148
• VIP: Viper ASIC for MDS 9250i and 9148S
• RI: Request Interface
• TMM: Transmit Memory Manager
• FCP_SW: These indicate software counters
Don’t forget: Cisco Live sessions will be available
for viewing on-demand after the event at
CiscoLive.com/Online
Thank you