Troubleshooting Supplement
Cisco UCS B-Series
Section Links
UCS tools for Troubleshooting
Blade/Server Troubleshooting
IOM (FEX) Troubleshooting
Fabric Interconnect Troubleshooting
SAN Troubleshooting
UCS tools for Troubleshooting
System Components - Major Points of Service
Points of service:
UCS Manager (XML and CLI), NXOS, physical connections to chassis and core SAN/LAN network, cluster operations
Chassis Management Controller (CMC) operations, chassis discovery, physical connections to Fabric Interconnect (FI) and logical connections to Adaptor cards
Baseboard Management Controller (BMC) of compute nodes, all compute node components (memory, proc, mezz cards, disk)
Power, fans, connectors
Components:
Cisco UCS Manager: embedded in Fabric Interconnect
Cisco UCS 6100 Series Fabric Interconnects: UCS 6120XP 20 Port Fabric Interconnect, UCS 6140XP 40 Port Fabric Interconnect
Cisco UCS 2100 Series Fabric Extenders: logically part of Fabric Switch, inserts into Blade Enclosure
Cisco UCS 5100 Series Blade Chassis: flexible bay configurations, logically part of Fabric Interconnect
Cisco UCS B-Series Blade Servers: UCS B-200 M1 Blade Server, UCS B-250 M1 Extended Memory Blade Server
Cisco UCS Network Adapters: three adapter options, mix adapters within blade chassis
61xx Fabric Interconnect (FI)
Active/Active Clustered System
Navigation to proper component when troubleshooting
CLI NX-OS or UCSM
Virtual IP
Management Network
IP #A
Switch-A#
IP #B
Switch-B#
UCS 2100 Fabric Extender Switch Connection
Each UCS 2100 Fabric Extender in a UCS 5100 Blade Server Chassis is
connected to a 6100 Series Fabric Interconnect for redundancy or
bandwidth aggregation.
Each Fabric Extender provides 4 x 10GE ports to the NX5K switch (the Fabric Interconnect).
Physical link health checks and chassis discovery occur over these links.
[Figure: back of a UCS 5100 Series Blade Server Chassis, with its UCS 2100 Series Fabric Extenders connected to UCS 6100 Series Switch A and Switch B]
Unified Compute System Manager
Part of UCS troubleshooting is verifying that
UCSM is communicating with the end
systems correctly
Management
interfaces
Redundant
management
service
UCSM
switch elements
UCSM
chassis elements
multiple protocol
support
server elements
Redundant management plane
UCSM access
Enable logging in Java to capture issues
Example of session log file on client
Client logs for debugging UCSM access and client KVM access are found at this location
on the client system:
C:\Documents and Settings\userid\Application Data\Sun\Java\Deployment\log\.ucsm
UCSM Client Logs
To find which log to view for issues with the UCSM window, open Task Manager and check the
process ID of the javaw process. A file matching that ID should appear in the log area; also base the choice on the time
modified.
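The "time modified" check can be scripted. The sketch below is illustrative only: the helper name newest_log is hypothetical, and the demo uses throwaway files rather than a real .ucsm directory. On a client you would pass the log directory path shown above.

```python
import os
import tempfile
from pathlib import Path

def newest_log(log_dir):
    """Return the most recently modified file in log_dir, or None if it is empty."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)

# Demo with temporary files; the file names are made up for illustration.
with tempfile.TemporaryDirectory() as d:
    old = Path(d, "old.log")
    old.write_text("older session")
    os.utime(old, (1_000_000_000, 1_000_000_000))
    new = Path(d, "new.log")
    new.write_text("current session")
    os.utime(new, (1_000_000_100, 1_000_000_100))
    demo_result = newest_log(d).name

print(demo_result)  # new.log
```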
2010 Cisco and/or its affiliates. All rights reserved.
Cisco Confidential
Interface Stats and reports
Statistics breakdown
Live/now
History
UCS Internal Operations
Unified Compute System Manager (UCSM)
and Data Management Engine (DME) run as a cluster
Stateful switch-over
Object state is replicated
Distributed cluster state
Stored in chassis EPROM
Solves split brain
Application Gateway (AG)
interfaces to the blade
[Figure: UCSM cluster architecture. Fabric Interconnect A runs UCSM-A and Fabric Interconnect B runs UCSM-B; one instance is active and the other standby. Each instance has an Interface Layer, DME, FSM, Replicator, Persistifier (backed by flash), HA Controller, and Application Gateway Layer. The Application Gateway Layers connect to the CMCs in Chassis 1, Chassis 2, Chassis 3, ..., and each chassis holds an EPROM.]
Events per component
FarNorth-A# scope server ?
  WORD          <chassis-id>/<blade-id>
  dynamic-uuid  Dynamic UUID
FarNorth-A# scope server 1/1
FarNorth-A /chassis/server # show event
Server Discovery FSM
FSM runs as a workflow involving many stages (FSM-Stage)
Workflows are predefined, and stages can be skipped if:
Not needed (in HA if the remote is down; no NIC configuration for Oplin)
FSM flags (shallow checkpoint or deep checkpoint)
Each stage is an interaction between:
DME -> Application Gateway -> End Point
DME just manages the state of the object and workflow, and then
instructs the AG to perform the activity.
AGs do the real work.
FSMs usually have the following notation:
FSM <Object><Workflow><Operation><Where-is-it-executed>
Object: Blade/Chassis
Workflow: Discover/Association
Operation: Pnuos-Config (PnuOS, the Processing Node Utility OS, is a Linux-based pre-boot execution environment that can boot on a
processing node to run diagnostics, report inventory, or configure the
firmware state of the blade)
Where is generally , or A or B or Local or Peer
If Where is not specified, it is executed on managing node
FSM
Almost every action done by UCSM has an FSM to verify operation and status
View and monitor each action for ongoing feedback and the progress state of the action
Logs are kept for review and troubleshooting
OBFL
Onboard Fault Log stores hardware logs on the
different components, saved at the time of the issue.
The logs can alternatively be viewed by connecting to the
device.
show tech-support will capture these logs
System Event Log (SEL) Events Supported
Server BIOS events
3 kinds of equipment end-points:
Memory Unit (DIMM): ECC errors, Address Parity, Memory Mismatch
Processor Unit: Memory Mirroring, Sparing, SMI Link errors
Motherboard: PCIe, QPI uncorrectable errors, Legacy PCI errors
All these errors are modeled as stats properties. The ones for which thresholds are not
defined get reported as statistics only
BMC, BIOS, and OS log platform errors to the BMC's System Event Log
(SEL) buffer
POST and run-time errors
Used as an effective health monitoring tool
System Event Log (SEL) - config
Users can define rules (policies) for backing up and clearing the SEL across all
servers in the UCS system, or they can manually trigger a SEL backup on
individual servers.
System Event Logs = Management Logs
Chassis
Make sure that servers are discovered
Make sure backup destination path is valid
Can also be done via CLI
Server
CLI navigation
SSH or Telnet to the Cluster IP when possible
You will connect to the Primary FI in the cluster automatically
Cisco UCS 6100 Series Fabric Interconnect
Using keyboard-interactive authentication.
The copyrights to certain works contained herein are owned by
other third parties and are used and distributed under license.
Some parts of this software may be covered under the GNU Public
License or the GNU Lesser General Public License. A copy of
each such license is available at
http://www.gnu.org/licenses/gpl.html and
http://www.gnu.org/licenses/lgpl.html
FarNorth-B#
FarNorth-B# show cluster state
Cluster Id: 0xf76362a0c56011de-0x8446000decd07b44
B: UP, PRIMARY
A: UP, SUBORDINATE
HA READY
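When checking many systems, the cluster state output above can be parsed to confirm which FI is primary before scoping into it. This is a minimal sketch that assumes the output keeps the layout shown above; the function names are hypothetical and not part of any Cisco tool.

```python
import re

SAMPLE = """\
Cluster Id: 0xf76362a0c56011de-0x8446000decd07b44
B: UP, PRIMARY
A: UP, SUBORDINATE
HA READY
"""

def parse_cluster_state(text):
    """Return (roles, ha_ready) from 'show cluster state' style output."""
    roles = {}
    ha_ready = False
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"^([AB]):\s*(\w+),\s*(\w+)$", line)
        if m:
            fabric, state, role = m.groups()
            roles[fabric] = {"state": state, "role": role}
        elif line == "HA READY":
            ha_ready = True
    return roles, ha_ready

def primary_fabric(roles):
    """Return the fabric letter whose role is PRIMARY, or None."""
    for fabric, info in roles.items():
        if info["role"] == "PRIMARY":
            return fabric
    return None

roles, ha_ready = parse_cluster_state(SAMPLE)
print(primary_fabric(roles), ha_ready)  # B True
```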
UCS CLI navigation Structure
Almost the same as NXOS, with slight differences in layout
But the configuration is held in an XML structure
FarNorth-B#
  acknowledge     Acknowledge
  backup          Backup
  clear           Reset functions
  commit-buffer   Commit transaction buffer
  connect         Connect to Another CLI
  decommission    Decommission managed objects
  discard-buffer  Discard transaction buffer
  end             Go to exec mode
  exit            Exit from command interpreter
  recommission    Recommission Server Resources
  remove          Remove
  scope           Changes the current mode
  set             Set property values
  show            Show running system information
  terminal        Set terminal line parameters
  top             Go to the top mode
  up              Go up one mode
  where           Show information about the current mode
FarNorth-B# show
  chassis              Chassis
  cli                  CLI commands
  clock                Display current Date
  cluster              Cluster mode
  configuration        Show information about configuration sessions
  eth-uplink           Ethernet Uplink
  event                Event Manager commands
  fabric-interconnect  Show Fabric Interconnect
  fault                Fault
  identity             Identity
  iom                  IO Module
  license              Show the contents of all the license files
  org                  Organizations
  security             Security mode
  sel                  System Event Log
  server               Server
  service-profile      Service Profile
  system               System-related show commands
  timezone             Set timezone
  version              System version
  vif                  Virtual Interfaces
UCS Configuration from CLI
Not recommended as a best practice, but sometimes
required because of a problem
More for use in direct troubleshooting or
verification of proper config from UCSM
Gives you a good understanding of the XML structure
for third-party API configurations and of
navigation
As a system admin, you will need to be somewhat familiar
with the CLI for troubleshooting
XML configuration navigation
Configuration verification or to show pending changes
FarNorth-A# show configuration ?
  <CR>
  >                Redirect it to a file
  >>               Redirect it to a file in append mode
  all              All
  no-diff-markers  Don't Show Diff Markers
  no-pending       Don't Show Pending Config
  pending          Show Only Pending Config
  |                Pipe command output to filter
Save off config to file
(UCSM also has backup methods)
FarNorth-A# show configuration > ?
  ftp:        Dest File URI
  scp:        Dest File URI
  sftp:       Dest File URI
  tftp:       Dest File URI
  volatile:   Dest File URI
  workspace:  Dest File URI
Configuration tools
FarNorth-A# show configuration | ?
  cut      Print selected parts of lines.
  egrep    Egrep - print lines matching a pattern
  grep     Grep - print lines matching a pattern
  head     Display first lines
  last     Display last lines
  less     Filter for paging
  no-more  Turn-off pagination for command output
  sort     Stream Sorter
  tr       Translate, squeeze, and/or delete characters
  uniq     Discard all but one of successive identical lines
  vsh      The shell that understands cli command
  wc       Count words, lines, characters
  begin    Begin with the line that matches
  count    Count number of lines
  end      End with the line that matches
  exclude  Exclude lines that match
  include  Include lines that match
Scope
Scoping moves between the different UCS configuration components
Details on hardware components are obtained with the connect command
You want to be on the Primary FI
FarNorth-B# scope
  adapter              Mezzanine Adapter
  chassis              Chassis
  eth-server           Ethernet Server Domain
  eth-uplink           Ethernet Uplink
  fabric-interconnect  Fabric Interconnect
  fc-uplink            FC Uplink
  firmware             Firmware
  host-eth-if          Host Ethernet Interface
  host-fc-if           Host FC Interface
  monitoring           Monitor the system
  org                  Organizations
  security             Security mode
  server               Server
  service-profile      Service Profile
  system               Systems
  vhba                 VHBA
Management Commands (scope, where, up & top)
UCSM Navigation
CLI Equivalent to NavPane
Connect NXOS
Connects from the XML-based UCSM CLI to the standard NXOS component of the
Fabric Interconnect (FI).
Used to assist in troubleshooting; very familiar to IOS
and Nexus users, with all the show commands
Used to run advised debugs
Show the switch running config (non-server config)
Enable and run ethanalyzer
Clear interface counters found on the FI
Cannot be used to configure UCS (read only)
Connect
Hardware Troubleshooting
Connect attaches you to hardware
and read-only NXOS
FarNorth-B# connect
  adapter     Mezzanine Adapter
  bmc         Baseboard Management Controller (CIMC)
  clp         Connect to DMTF CLP
  iom         IO Module
  local-mgmt  Connect to Local Management CLI
  nxos        Connect to NXOS CLI
FarNorth-A# connect local-mgmt
  <CR>        Defaults to primary
  a           Fabric A
  b           Fabric B
Most dangerous local-mgmt commands:
- erase configuration
- reboot
FarNorth-A(local-mgmt)# ?
  cd                Change current directory
  clear             Reset functions
  cluster           Cluster mode
  connect           Connect to Another CLI
  copy              Copy a file
  cp                Copy a file
  delete            Delete managed objects
  dir               Show content of dir
  enable            Enable
  end               Go to exec mode
  erase             Erase
  erase-log-config  Erase the mgmt logging config file
  exit              Exit from command interpreter
  install-license   Install a license
  ls                Show content of dir
  mkdir             Create a directory
  move              Move a file
  mv                Move a file
  ping              Test network reachability
  pwd               Print current directory
  reboot            Reboots Fabric Interconnect
  rm                Remove a file
  rmdir             Remove a directory
  run-script        Run a script
  show              Show running system information
  ssh               SSH to another system
  tail-mgmt-log     Tail mgmt log file
  telnet            Telnet to another system
  terminal          Set terminal line parameters
  top               Go to the top mode
  traceroute        Traceroute to destination
Connect to NXOS
FarNorth-A# connect nxos
  <CR>
  a  Fabric A
  b  Fabric B
FarNorth-A(nxos)# ?
  clear         Reset functions (only place you can clear counters today)
  cli           CLI commands
  debug         Debugging functions
  debug-filter  Enable filtering for debugging functions
  end           Go to exec mode
  ethanalyzer   Configure cisco fabric analyzer
  exit          Exit from command interpreter
  no            Negate a command or set its defaults
  ntp           Execute NTP commands
  pop           Pop mode from stack or restore from name
  push          Push current mode to stack or save it under name
  show          Show running system information
  system        System management commands
  terminal      Set terminal line parameters
  test          Test command
  undebug       Disable Debugging functions (See also debug)
  where         Shows the cli context you are in
Most popular examples:
show run
show fex detail
show interface
show lacp
debug
show npv flogi-table
show mac-address-table
Ethernet Interfaces on CPU
Troubleshooting Uses
In ethanalyzer terminology, the internal ethernet interfaces are named:
eth3 = inbound-lo
eth4 = inbound-hi
eth3 handles Rx and Tx of low-priority control pkts
IGMP, CDP
TCP/UDP/IP/ARP (for management purposes only)
eth4 handles Rx and Tx of high-priority control pkts
FC (FC packets come to the switch CPU as FCoE packets) and FCoE
STP (spanning-tree), LACP, DCBX (Data Center Bridging)
Save to file and use the Wireshark tool to help diagnose the issue
1) FarNorth-A(nxos)# ethanalyzer local interface inbound-hi write volatile:///ciscolive
2) FarNorth-A(local-mgmt)# cd volatile:///
FarNorth-A(local-mgmt)# dir
25192 May 18 11:08:17 2010 ciscolive
3) FarNorth-A(local-mgmt)# copy volatile:///ciscolive tftp:
Enter hostname for the tftp server: 10.91.42.134
Trying to connect to tftp server......
Connection to server Established. Copying Started.....
TFTP put operation was successful
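Picking the wrong internal interface yields an empty capture, so it can help to encode the protocol split listed above. The sketch below only builds the command string in the form used in step 1; the function names are hypothetical, and the protocol sets are copied from the lists above rather than from any Cisco reference.

```python
# Protocol-to-interface split as listed in this section.
INBOUND_LO = {"IGMP", "CDP", "TCP", "UDP", "IP", "ARP"}   # eth3, low priority
INBOUND_HI = {"FC", "FCOE", "STP", "LACP", "DCBX"}        # eth4, high priority

def capture_interface(protocol):
    """Return the ethanalyzer interface name that carries the given control protocol."""
    name = protocol.upper()
    if name in INBOUND_HI:
        return "inbound-hi"
    if name in INBOUND_LO:
        return "inbound-lo"
    raise ValueError(f"protocol not covered by this section: {protocol}")

def capture_command(protocol, filename):
    """Build the capture command in the same form as step 1 above."""
    interface = capture_interface(protocol)
    return f"ethanalyzer local interface {interface} write volatile:///{filename}"

print(capture_command("lacp", "ciscolive"))
print(capture_command("cdp", "cdpcheck"))
```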
KVM
Tool to snapshot the screen for support
A WebEx recording works best
Monitoring with UCSM and CLI
Compute System
BMC (per blade)
Voltage, current sensors (power)
Thermal sensors (DIMMs, CPUs, adapter)
Sensor values available via IPMI
CMC
Per blade totals
Per chassis totals
PSU redundancy state
Changes are passed to UCSM
Critical transitions via async notifications
Periodic polling
UCSM maintains stats
SAM maintains state
State, stats available via GUI, CLI, API
Fabric Monitoring
Vifs: interface stats, states
Adaptor: interface stats, aggregate stats, states
FEX: interface stats, states
Switch: interface stats, vifs stats, states
Data Gathering for Support
A detailed UCSM tech-support should be taken as soon as possible after a
failure occurs. UCSM tech-support contains a running configuration
snapshot as well as an application error/debug log.
If a problem is easily reproducible, retry the configuration attempt and
collect tech-support files immediately.
1. Collect the UCSM tech-support
A# connect local-mgmt
A(local-mgmt)# show tech-support ucsm detail
2. Collect tech-support on one or more problematic chassis (and their
components such as server, IOM, BMC)
A(local-mgmt)# show tech-support chassis <chassis id> all detail
3. Copy the collected file to tftp.cisco.com (171.69.17.19)
A(local-mgmt)# copy workspace:///techsupport/<name_of_the_file>.tar tftp://171.69.17.19
Data Gathering for Support - examples
FarNorth-A(local-mgmt)# show tech-support ucsm detail
Initiating tech-support information task on FABRIC A ...
Initiating tech-support information task on FABRIC B ...
Completed initiating tech-support subsystem tasks (Total: 2)
All tech-support subsystem tasks are completed (Total: 2)
The detailed tech-support information is located at
workspace:///techsupport/20100517125801_FarNorth_UCSM.tar
FarNorth-A(local-mgmt)# dir
16 Oct 30 09:31:03 2009 cores
31 Nov 20 13:14:20 2009 diagnostics
1024 Oct 30 09:29:05 2009 lost+found/
1024 May 17 12:59:47 2010 techsupport/
FarNorth-A(local-mgmt)# show tech-support chassis 1 all detail
Initiating tech-support information task on Chassis 1 FabricExtender1 ...
Remotely initiating tech-support information task on Chassis 1 FabricExtender2
Initiating tech-support information task on Chassis 1 FabricExtender2 ...
Initiating tech-support information task on IBMC1 on Chassis 1 ...
Initiating tech-support information task on Adaptor 1 on Chassis/Blade 1/1 ...
Initiating tech-support information task on IBMC2 on Chassis 1 ...
Initiating tech-support information task on Adaptor 1 on Chassis/Blade 1/2 ...
Initiating tech-support information task on IBMC3 on Chassis 1 ...
Initiating tech-support information task on Adaptor 1 on Chassis/Blade 1/3 ...
Initiating tech-support information task on Adaptor 2 on Chassis/Blade 1/3 ...
Initiating tech-support information task on IBMC7 on Chassis 1 ...
Initiating tech-support information task on Adaptor 1 on Chassis/Blade 1/7 ...
Completed initiating tech-support subsystem tasks (Total: 11)
All tech-support subsystem tasks are completed (Total: 11)
The detailed tech-support information is located at
workspace:///techsupport/20100517124544_FarNorth_BC001_all.tar
FarNorth-A(local-mgmt)# cd///techsupport
FarNorth-A(local-mgmt)# ls
2140160 May 17 12:52:58 2010 20100517124544_FarNorth_BC001_all.tar
12871680 May 17 12:59:47 2010 20100517125801_FarNorth_UCSM.tar
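If the session output is saved to a file, the generated archive names can be pulled out and turned into copy commands. This is a small sketch, assuming the "workspace:///techsupport/....tar" line format shown above; the function names are hypothetical and the TFTP address is the one quoted in step 3.

```python
import re

SAMPLE = """\
All tech-support subsystem tasks are completed (Total: 2)
The detailed tech-support information is located at
workspace:///techsupport/20100517125801_FarNorth_UCSM.tar
All tech-support subsystem tasks are completed (Total: 11)
The detailed tech-support information is located at
workspace:///techsupport/20100517124544_FarNorth_BC001_all.tar
"""

def techsupport_paths(text):
    """Return every tech-support archive URI found in the captured output."""
    return re.findall(r"workspace:///techsupport/\S+\.tar", text)

def copy_command(path, tftp_host):
    """Build the local-mgmt copy command for one archive."""
    return f"copy {path} tftp://{tftp_host}"

for path in techsupport_paths(SAMPLE):
    print(copy_command(path, "171.69.17.19"))
```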
Core Dumps
Once the TFTP Core Exporter is
configured and enabled, dumps
will be transferred
Once transferred, select the dumps and
move them to the trash can
Blade Troubleshooting
Troubleshooting Flow
For the rest of the session we will work from the Blade servers up toward the LAN and
SAN network
Start: Blades -> IOM Modules -> Fabric Interconnects -> LAN-SAN: End
Common Debug Scenarios
Blades
BMC doesn't boot
Corrupt BMC BIOS, POST failure, not completing
Attempt to connect to the BMC to diagnose
View logs, collect tech-support
Bad Service-Profile - Association Failure
Bad Hardware
Bad DIMM(s): reseat/replace
CPU or other component: check logs
Adaptor issues
Connect to Mezz cards to diagnose issues
BMC Troubleshooting
- Debug Firmware Utility
Command   Description
mctools   Gets basic information on the state of the BMC to UCS management API
network   See current network configuration and socket information
obfl      Live obfl
messages  Live /var/log/messages file
alarms    What sensors are in alarm
sensors   Current sensor readings from IPMI
power     The current power state of the x86
Connect CIMC
Debug Utility
Show tech detail and logs
Get a snapshot of the KVM screen
Use it to verify the health of a blade when
questioning UCSM and
wanting to look at the lowest level
of blade data points
FarNorth-A# connect cimc 1/3
Trying 127.5.1.3...
Connected to 127.5.1.3.
Escape character is '^]'.
BMC Debug Firmware Utility Shell
[ help ]#
Useful commands marked with arrow
__________________________________________
Debug Firmware Utility
__________________________________________
Command List
__________________________________________
alarms
cores
exit
help [COMMAND]
images
mctools
memory
messages
network
obfl
post
power
sensors
sel
fru
mezz1fru
mezz2fru
tasks
top
update
users
version
__________________________________________
Notes:
"enter Key" will execute last command
"COMMAND ?" will execute help for that command
__________________________________________
Mezz Cards Common Debug & Isolation Hints
Verify physical link state between IOM and M81KR
using show interface brief on the switch CLI
VIC M81KR (Palo)
Verify vif state and vnic state from the M81KR
perspective using the show-vifs command and the show-systemstatus command.
Find the vif corresponding to the link
Verify M81KR-Intel/M81KR-Q or E physical link state
using the M81KR Link Event Log
Verify state of the control channel (VIC/DCBX/VNTAG)
Verify state of the VIF from the vic protocol perspective (VIC
log on M81KR)
For FC, look at FC logs for FLOGI/LS_ACC
Look at the link state from the host perspective using
host based tools
M71KR-Q & M71KR-E (Menlo)
M81KR - Palo Adaptor
adapter 1/1/1 # help
Available commands:
  connect            - Connect to remote debug shell
  exit               - Exit from subshell
  help               - List available commands
  history            - Show command history
  show-fwlist        - Show firmware versions on the adapter
  show-identity      - Show adapter identity
  show-phyinfo       - Show adapter phyinfo
  show-systemstatus  - Show adapter status
adapter 1/1/1 # connect
adapter 1/1/1 (top):1# help
Available commands:
  attach-fls     - Attach to fls
  attach-mcp     - Attach to mcp
  estat          - Run fc performance monitor
  exit           - Exit from subshell
  help           - List available commands
  history        - Show command history
  phy-read       - Read PHY register
  show-fru       - Show FRU contents
  show-fwdtab    - Show forwarding table
  show-log       - Show system log
  show-macstats  - Show MAC statistics
Same type of commands
as M71KR
Use the connect command to attach to the Master
Control Program (mcp), which is the main Palo
firmware application, to get more details
adapter 1/1/1 (top):2# attach-mcp
M81KR - Adapter Debug CLI (vifinfo)
adapter 2/8/1 (top):2# attach-mcp
vnic - shows vnic overview
FarNorth-A# connect adapter 2/8/1
adapter 2/8/1 # connect
adapter 2/8/1 (top):1# attach-mcp
adapter 2/8/1 (mcp):1# vnic
vnicid    : internal id of vnic, use for other vnic cmds
vnicname  : ucsm provisioned name for this vnic
vnictype  : en=ethernet, fc=fcoe
vnicstate : state of vnic
lif       : internal logical if id, use for other lif/vif cmds
lifstate  : state of lif
vifuif    : bound uplink 0 or 1, =:primary, -:secondary, >:current
vifucsm   : ucsm id for this vif
vifidx    : switch id for this vif (vethXXX)
vifvlan   : default vlan for traffic
vifstate  : state of vif
Details of Vif
Vifinfo shows network connectivity
COS, default vlan, rate limits
Vifinfo shows address registration list
Unicast, broadcast, multicast
adapter 2/8/1 (mcp):2# vif 2
lifid: 2
uif: 0
state: UP
adminst: UP
flags: NIV, CREATED, VIFHASH, VUP, VIFINFO
vifindex: 1241
hash: 89
priority: 0
create retries: 2
last req: VIF_ENABLE
reqstatus: OK
reqcc: SUCCESS
evtrace: LINK_UP CREATE_FAILED TIMEOUT CREATE_FAILED TIMEOUT CREATE_OK ENABLE_OK SET_UP
reg'd addrs: vlan 0 mac 00:25:b5:00:00:17
             vlan 0 mac ff:ff:ff:ff:ff:ff
             vlan 0 mac 00:00:00:00:00:00
inadd addrs:
toadd addrs:
indel addrs:
todel addrs:
provinfo.oui: 00 00 0c
provinfo.type: SAM_CA
provinfo.data.vifid: 1241
provinfo.data.cookie: 0x5285a
provinfo.data.viftype: ETH
vifinfo.priority: 0
vifinfo.vifid: 2
vifinfo.default_cos: 0
vifinfo.vifstate: E--
vifinfo.vlan: 1
vifinfo.ratelimit.burstsize: 0
vifinfo.ratelimit.rate: -1
M81KR MAC Statistics
adapter 2/8/1 (mcp):3# dcem-macstats 0
TOTAL DESCRIPTION
24841 Tx frames len == 64
63470 Tx frames 64 < len <= 127
51113 Tx frames 128 <= len <= 255
380 Tx frames 256 <= len <= 511
225020 Tx frames 512 <= len <= 1023
160 Tx frames 1024 <= len <= 1518
2865 Tx frames 1519 <= len <= 2047
367849 Tx total packets
147903879 Tx bytes
367849 Tx good packets
346958 Tx unicast frames
20277 Tx multicast frames
614 Tx broadcast frames
25 Tx frames with VLAN tag
8 Rx Frames len == 64
1063448 Rx Frames 64 < len <= 127
41133 Rx Frames 128 <= len <= 255
24707 Rx Frames 256 <= len <= 511
2359 Rx Frames 512 <= len <= 1023
372 Rx Frames 1024 <= len <= 1518
8901 Rx Frames 1519 <= len <= 2047
1140928 Rx total received packets
110619220 Rx bytes
1140928 Rx good packets
311492 Rx unicast frames
74263 Rx multicast frames
755173 Rx broadcast frames
147903879 Tx bytes for good packets
110619220 Rx bytes for good packets
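A quick sanity check on these counters is that the frame-size buckets and the unicast/multicast/broadcast counts each add up to the total packet count. The sketch below uses the values from the output above; the helper names are hypothetical and this is not a built-in adapter function.

```python
# Counters copied from the dcem-macstats output above.
tx_buckets = [24841, 63470, 51113, 380, 225020, 160, 2865]
tx_cast = {"unicast": 346958, "multicast": 20277, "broadcast": 614}
tx_total = 367849

rx_buckets = [8, 1063448, 41133, 24707, 2359, 372, 8901]
rx_cast = {"unicast": 311492, "multicast": 74263, "broadcast": 755173}
rx_total = 1140928

def consistent(buckets, cast, total):
    """True when both the size buckets and the cast types sum to the total."""
    return sum(buckets) == total and sum(cast.values()) == total

def broadcast_share(cast, total):
    """Fraction of frames that were broadcast."""
    return cast["broadcast"] / total

print(consistent(tx_buckets, tx_cast, tx_total))   # True
print(consistent(rx_buckets, rx_cast, rx_total))   # True
print(f"Rx broadcast share: {broadcast_share(rx_cast, rx_total):.1%}")
```

In this sample roughly two thirds of the received frames are broadcast, which is the kind of ratio worth a second look when chasing a connectivity or performance complaint.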
Adapter Debug CLI (logs)
show-log: display internal adapter logs
adapter 1/3/1 (top):2# show-log
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.uif[289]-6-Port 0 set to VNTAG mode
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.uif[289]-6-Port 0: Running
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.vif[289]-6-uif0 starting link up in niv mode
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.vic[289]-6-vic0: peer eth0.0 00:0d:ec:6d:b8:3c start
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.uif[289]-6-Port 0 FSM: WAIT_NIVDELAYTIMEO/RXVNTAG => RUNNING
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.vic[289]-6-vic0: starting timer for peer VIC_OPEN
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.vic[289]-6-vic0: app_start_done flags OPEN_SENT status OK
...
Memory errors
Check Server Event Log/Faults
show sel 2/1
5ed| 03/29/2010 02:20:50 | Memory 0x02| Uncorrectable ECC/other uncorrectable memory error | Rank: 0, DIMMSocket: 1, Channel: C, Socket: 0 | Asserted
What to gather and look at for memory issues
On CIMC - run show tech
On KVM - capture the BIOS version
On KVM - capture the memory configuration from the BIOS
On CIMC - capture the BIOS version
On CIMC - capture the memory inventory
Show memdetails (get shot)
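To locate the failing DIMM quickly, the SEL entry shown earlier in this section can be split into fields. The sketch below assumes the pipe-separated, six-field layout of that one sample line; the field names and the function name are hypothetical labels chosen for illustration.

```python
# Sample entry copied from the "show sel" output in this section.
SAMPLE = ("5ed| 03/29/2010 02:20:50 | Memory 0x02| Uncorrectable ECC/other "
          "uncorrectable memory error | Rank: 0, DIMMSocket: 1, Channel: C, "
          "Socket: 0 | Asserted")

def parse_sel_line(line):
    """Split one SEL entry into a dict; raises ValueError on an unexpected layout."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    record_id, timestamp, sensor, description, details, state = parts
    location = {}
    for item in details.split(","):
        key, _, value = item.partition(":")
        location[key.strip()] = value.strip()
    return {
        "id": record_id,
        "timestamp": timestamp,
        "sensor": sensor,
        "description": description,
        "location": location,
        "state": state,
    }

entry = parse_sel_line(SAMPLE)
print(entry["location"])
```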
Reboots
Need to find out the reason for a hardware reboot:
BMC (CIMC) issue in hardware/firmware on the server
UCS Service Profile: caused by a profile change/issue
Other hardware on the blade: CPU, memory
User induced: reset button
Blade Reboots
Viewing OBFL for the reason of a reboot
Reboot - pressing front-panel button:
0:2009 Dec 29 19:45:04:BMC:kernel::<0>LPCReset ISR-> ResetState: 1 <---this indicates Reset occurred
4:2009 Dec 29 19:45:04:BMC:kernel::<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618FCSd/bmc/drivers/vdd_pwr_good
/gooding/vdd_pwr_good_cb.c:19:Platformis Gooding: Deasserted
5:2009 Dec 29 19:45:04:BMC:kernel:-:<5>USB FS: VDDPower WAKEUP-Power Good = OFF
5:2009 Dec 29 19:45:04:BMC:kernel:-:<5>USB HS: VDDPower WAKEUP-Power Good = OFF
1:2009 Dec 29 19:45:04:BMC:kernel::<1>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/block_transfer/
block_transfer.c:564:block_transfer_deallocate_entire_list--> Dumped: 0x0000files.
5:2009 Dec 29 19:45:04:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECTfor interface[0]
5:2009 Dec 29 19:45:04:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECTfor interface[1]
5:2009 Dec 29 19:45:05:BMC:IPMI:470: Pilot2SrvPower.c:369:BladePower Changed To: [ OFF ]
5:2009 Dec 29 19:45:05:BMC:IPMI:497: VirtualSEL.c:26:SELEvt[02 0D]< C10B02 41 5C3A4B20 00 04 25 52 08 00 FF FF>
4:2009 Dec 29 19:45:34:BMC:kernel:-:<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/vdd_pwr_good/
gooding/vdd_pwr_good_cb.c:19:Platformis Gooding: Asserted
5:2009 Dec 29 19:45:34:BMC:kernel:-:<5>USB FS: VDDPower WAKEUP-Power Good = ON
This is a signature of HW failure (power off followed by power on in 4-5 seconds; an Intel feature that reacts to HW failure):
0:2009 Nov 25 11:44:55:BMC:kernel::<0>LPCReset ISR-> ResetState: 1 <---this indicates Reset occurred
4:2009 Nov 25 11:44:55:BMC:kernel:-:<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/vdd_pwr_good/
gooding/vdd_pwr_good_cb.c:19:Platformis Gooding: Deasserted
5:2009 Nov 25 11:44:55:BMC:kernel:-:<5>USB FS: VDDPower WAKEUP-Power Good = OFF
5:2009 Nov 25 11:44:55:BMC:kernel:-:<5>USB HS: VDDPower WAKEUP-Power Good = OFF
1:2009 Nov 25 11:44:55:BMC:kernel::<1>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/block_transfer/
block_transfer.c:564:block_transfer_deallocate_entire_list--> Dumped: 0x0000files.
5:2009 Nov 25 11:44:55:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECTfor interface[0]
5:2009 Nov 25 11:44:55:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECTfor interface[1]
4:2009 Nov 25 11:44:55:BMC:kernel:-:<4>kbdmouse_write: mouse write aborted for device reset.
5:2009 Nov 25 11:44:55:BMC:IPMI:472: Pilot2SrvPower.c:369:BladePower Changed To: [ OFF ]
5:2009 Nov 25 11:44:55:BMC:IPMI:500: VirtualSEL.c:26:SELEvt[22 02]< 22 02 02 B718 0D4B20 00 04 25 52 08 00 FF FF>
3:2009 Nov 25 11:45:16:BMC:doctor-bmc:584: doctor-bmc.c:1143:Tcp-> Connection between remote ip 0xFE00037F at port 0x86A4 and local ip 0x200037F at port 0xFAA is in TCP_TIME_WAIT state for at least 2 min 30 seconds.
3:2009 Nov 25 11:45:16:BMC:doctor-bmc:584: doctor-bmc.c:1155:Tcp-> Total Errors Found: 1
5:2009 Nov 25 11:45:21:BMC:kernel:-:<5>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/pilot2_power/pilot2_power.c:266:do_power_on
remote ip 0xFE00037F = 254 0 3 127, read in reverse byte order as 127.3.0.254 (the CMC0 interface to the blades), and local ip 0x200037F = 2 0 3 127, read as 127.3.0.2
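The hex addresses in the doctor-bmc line are easier to read once converted to dotted notation. This is a minimal sketch that assumes, as the byte breakdown above suggests, that the value is logged with the least significant byte holding the first octet; the function name is hypothetical.

```python
import socket
import struct

def decode_obfl_ip(value):
    """Convert a 32-bit hex address from the OBFL into dotted-quad notation.

    Packing as little-endian places the low byte (0x7F here) first,
    which makes it the first octet of the address.
    """
    return socket.inet_ntoa(struct.pack("<I", value))

print(decode_obfl_ip(0xFE00037F))  # 127.3.0.254
print(decode_obfl_ip(0x0200037F))  # 127.3.0.2
```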
Blade Reboots
Viewing OBFL for reason
This is an actual customer power reset from UCSM (power on in 8 minutes):
0:2009 Dec 22 17:16:26:BMC:kernel::<0>LPCReset ISR-> ResetState: 1 <---this indicates Reset occurred
4:2009 Dec 22 17:16:26:BMC:kernel:-:<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/vdd_pwr_good/
gooding/vdd_pwr_good_cb.c:19:Platformis Gooding: Deasserted
5:2009 Dec 22 17:16:26:BMC:kernel:-:<5>USB FS: VDDPower WAKEUP-Power Good = OFF
5:2009 Dec 22 17:16:26:BMC:kernel:-:<5>USB HS: VDDPower WAKEUP-Power Good = OFF
1:2009 Dec 22 17:16:26:BMC:kernel::<1>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/block_transfer/
block_transfer.c:564:block_transfer_deallocate_entire_list--> Dumped: 0x0000files.
5:2009 Dec 22 17:16:26:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECTfor interface[0]
5:2009 Dec 22 17:16:26:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECTfor interface[1]
5:2009 Dec 22 17:16:27:BMC:IPMI:474: Pilot2SrvPower.c:369:BladePower Changed To: [ OFF ]
5:2009 Dec 22 17:16:27:BMC:IPMI:511: VirtualSEL.c:26:SELEvt[98 02]< 98 02 02 EBFE 30 4B20 00 04 25 52 08 00 FF FF>
5:2009 Dec 22 17:24:49:BMC:
[email protected]:1275: mcserver_ipmi_extensions.c:212:[mcserver_set_vdd_power] "Power Cycle
5:2009 Dec 22 17:24:49:BMC:kernel:-:<5>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers
/pilot2_power/pilot2_power.c:313:do_cycle
5:2009 Dec 22 17:24:49:BMC:kernel:-:<5>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers
/pilot2_power/pilot2_power.c:232:do_power_off
5:2009 Dec 22 17:24:59:BMC:kernel:-:<5>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers
/pilot2_power/pilot2_power.c:266:do_power_on
4:2009 Dec 22 17:24:59:BMC:kernel:-:<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers
/vdd_pwr_good/gooding/vdd_pwr_good_cb.c:19:Platformis Gooding: Asserted
5:2009 Dec 22 17:24:59:BMC:kernel:-:<5>USB FS: VDDPower WAKEUP-Power Good = ON
Blade Reboots
Viewing OBFL for reason
This is an IPMI request, coming from UCSM as an authorized reboot or as a result of having the desired power state set to OFF.
5:2009 Dec 23 18:16:58:BMC:[email protected]:1275: mcserver_ipmi_extensions.c:212:[mcserver_set_vdd_power] "Power Off" <---indicator that an IPMI initiated reset has occurred.
5:2009 Dec 23 18:16:58:BMC:kernel:-:<5>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers
/pilot2_power/pilot2_power.c:232:do_power_off
0:2009 Dec 23 18:17:03:BMC:kernel::<0>LPCReset ISR-> ResetState: 1 <---this indicates you've entered Reset for whatever reason
4:2009 Dec 23 18:17:03:BMC:kernel:-:<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/
vdd_pwr_good/gooding/vdd_pwr_good_cb.c:19:Platformis Gooding: Deasserted
5:2009 Dec 23 18:17:03:BMC:kernel:-:<5>USB FS: VDDPower WAKEUP-Power Good = OFF
5:2009 Dec 23 18:17:03:BMC:kernel:-:<5>USB HS: VDDPower WAKEUP-Power Good = OFF
1:2009 Dec 23 18:17:03:BMC:kernel::<1>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers
/block_transfer/block_transfer.c:564:block_transfer_deallocate_entire_list--> Dumped: 0x0000files.
5:2009 Dec 23 18:17:03:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECTfor interface[0]
5:2009 Dec 23 18:17:03:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECTfor interface[1]
Also, for all resets the DME logs should be viewed for more information.
DME logs are found in /var/sysmgr/sam_logs/ inside the .tar file produced by
<show tech-support ucsm detail>, in svc_sam_dme.log
A# connect local-mgmt
A(local-mgmt)# show tech-support ucsm detail
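The OBFL signatures discussed in the reboot examples can be turned into a rough first-pass filter. The sketch below is a heuristic only: the labels and function name are hypothetical, the sample lines are abbreviated from the excerpts, and a reset without an IPMI request still needs the timing and DME logs to tell a front-panel press from a hardware failure.

```python
import re

# Signatures taken from the OBFL excerpts; spacing is matched loosely.
RESET_RE = re.compile(r"LPC\s*Reset ISR")
IPMI_MARK = "mcserver_set_vdd_power"

def classify_reset(obfl_lines):
    """Return a rough label for a blade reset based on OBFL lines."""
    saw_reset = any(RESET_RE.search(line) for line in obfl_lines)
    saw_ipmi = any(IPMI_MARK in line for line in obfl_lines)
    if saw_ipmi:
        return "ipmi-request"    # power request arrived over IPMI, for example from UCSM
    if saw_reset:
        return "reset-no-ipmi"   # front-panel button or hardware failure; check timing
    return "no-reset-seen"

# Abbreviated sample lines based on the excerpts in this section.
button = ["0:2009 Dec 29 19:45:04:BMC:kernel::<0>LPCReset ISR-> ResetState: 1"]
ipmi = [
    '5:2009 Dec 23 18:16:58:BMC:mcserver_ipmi_extensions.c:212:[mcserver_set_vdd_power] "Power Off"',
    "0:2009 Dec 23 18:17:03:BMC:kernel::<0>LPCReset ISR-> ResetState: 1",
]

print(classify_reset(button))  # reset-no-ipmi
print(classify_reset(ipmi))    # ipmi-request
```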
Serial over LAN
(SoL)
Requires Serial over LAN and an IPMI profile to be configured
and then applied to the service profile
Access via the same IP address as KVM
Can be configured on the fly and applied to the service profile without
disruption
Uses the open-source ipmitool
http://ipmitool.sourceforge.net/
[Figure: an IPMI user on the management network accessing the BMC interface over a Serial over LAN connection, using the KVM end point IP address on the blade]
IPMI
IPMI does not run on the OS installed on the blade
Totally independent of the installed OS; runs even if the OS is down
IPMI runs on the Baseboard Management Controller
Supports serviceability in four main areas:
System Event Log (SEL)
OS Watchdog, hardware alerts, etc.
Sensor Data Repository (SDR)
Temperature controls, inventory, etc.
Power control
Serial over LAN
DMIDECODE
http://www.nongnu.org/dmidecode/
Dmidecode reports information about your system's hardware as described in
your system BIOS according to the SMBIOS/DMIstandard.
This will often include usage status for the CPU sockets, expansion slots (e.g.
AGP, PCI, ISA) and memory module slots, and the list of I/O ports (e.g. serial,
parallel, USB).
Support for Linux and Windows
dmidecode --type {KEYWORD | NUMBER}, where KEYWORD is one of:
bios
system
baseboard
chassis
processor
memory
cache
connector
slot
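A small sketch of how the --type option could be driven from a script: it accepts either one of the keywords listed above or an SMBIOS type number and returns the argument list. The helper names are hypothetical; only the dmidecode --type option itself comes from the tool.

```python
import subprocess

# Keywords accepted by `dmidecode --type`, as listed above.
DMI_KEYWORDS = ("bios", "system", "baseboard", "chassis", "processor",
                "memory", "cache", "connector", "slot")


def dmidecode_args(dmi_type):
    """Return the argv for `dmidecode --type`, given a keyword or a type number."""
    value = str(dmi_type).lower()
    if not value.isdigit() and value not in DMI_KEYWORDS:
        raise ValueError(f"unknown dmidecode type keyword: {dmi_type!r}")
    return ["dmidecode", "--type", value]


def run_dmidecode(dmi_type):
    """Run dmidecode on the local host (usually requires root) and return its text output."""
    result = subprocess.run(dmidecode_args(dmi_type),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, dmidecode_args("memory") yields the arguments for listing the memory module slots of a blade from its installed OS.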
IOM (FEX) Troubleshooting
Troubleshooting Flow
We will work from the blade servers up toward the LAN and SAN networks
Start: Blades -> IOM Modules -> Fabric Interconnects -> LAN-SAN: End
IOM connections: chassis backplane view
[Figure: chassis backplane view showing Path A and Path B from each blade (Blade 1 through Blade 7) to IOM1 and IOM2]
Half-width servers: 1 mezz card (one A and one B path)
Full-width servers: 2 mezz cards (two A and two B paths)
FarNorth-A(nxos)# show fex
  FEX         FEX            FEX         FEX
Number    Description      State       Model        Serial
------------------------------------------------------------------------
1         FEX0001          Online      N20-C6508    QCI132800SN
2         FEX0002          Online      N20-C6508    QCI131600Z9
IOM connections
Each IOM (aka Fabric Extender) provides
8+1 internal IO channels (8 slots + 1 internal mgmt network)
4 external ports (10 Gbps each; no EtherChannel in the 1st release)
The servers' mezz cards use those IO channels for external
connectivity
Servers with one mezz card use one IO channel per IOM
vNIC1 can, for instance, use IOM1 while vNIC2 uses IOM2
This vNIC-to-IOM routing is flexible and user-configurable
Servers with two mezz cards use two IO channels per IOM
Server vNICs are automatically pinned to fabric links
Each IOM actually provides a 9th internal IO channel for internal
management connectivity
Viewing Blade ports
These interfaces are backplane traces
(From <sh int brief> at the NXOS prompt)
EthX/Y/Z where
X = chassis number
Y = mezz card number (always 1 with half-width blades)
Z = IOM port number (slot where the blade server resides)
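The EthX/Y/Z naming rule can be sketched as a small parser that splits a backplane interface name into its chassis, mezz card and slot numbers. The class and function names are hypothetical, chosen only for this illustration.

```python
import re
from typing import NamedTuple


class BackplanePort(NamedTuple):
    chassis: int    # X = chassis number
    mezz_card: int  # Y = mezz card number (always 1 with half-width blades)
    slot: int       # Z = IOM port number (slot where the blade resides)


# Accepts both the short form (Eth2/1/8) and the long form (Ethernet2/1/8).
_PORT_RE = re.compile(r"^Eth(?:ernet)?(\d+)/(\d+)/(\d+)$", re.IGNORECASE)


def parse_backplane_port(name):
    """Split an EthX/Y/Z interface name into chassis, mezz card and slot."""
    match = _PORT_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not an EthX/Y/Z backplane interface: {name!r}")
    return BackplanePort(*(int(group) for group in match.groups()))
```

For instance, Eth2/1/8 decodes to chassis 2, mezz card 1, slot 8. A two-level name such as Eth1/5 is rejected, since that form names a port on the fabric interconnect itself.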
IOM to Fabric Interconnect connections
UCSM calls these ports server ports
NXOS CLI calls them fex-fabric interfaces
Note: those EthX/Y ports are interfaces on the fabric interconnects
There can be 1, 2 or 4 ports between an IOM and an FI
FarNorth-A(nxos)# sh interface fex-fabric
     Fabric      Fabric        Fex                FEX
Fex  Port        Port State    Uplink    Model       Serial
---------------------------------------------------------------
1    Eth1/1      Active        1         N20-C6508   QCI132800SN
1    Eth1/2      Active        2         N20-C6508   QCI132800SN
2    Eth1/5      Active        2         N20-C6508   QCI131600Z9
2    Eth1/6      Active        1         N20-C6508   QCI131600Z9
interface Ethernet1/1
switchport mode fex-fabric
pinning server
fex associate 1 chassis-serial FOX1327GKGN module-serial QCI132800SN module-slot left
no shutdown
interface Ethernet1/2
switchport mode fex-fabric
pinning server
fex associate 1 chassis-serial FOX1327GKGN module-serial QCI132800SN module-slot left
Actual IOM-to-FI pinning scheme
Server slots pinned to uplink
[Figure: three IOM diagrams (1, 2 and 4 links), each showing server slots 1 through 8 pinned to the uplinks toward the switch]
1 link
Uplink: slots 1,2,3,4,5,6,7,8
How to read this: with one IOM-to-FI link, all servers use that link
2 links
Uplink 1: slots 1,3,5,7
Uplink 2: slots 2,4,6,8
How to read this: with two IOM-to-FI links, servers in slots 1,3,5,7 use link
number 1 while other slots use link number 2
4 links
Uplink 1: slots 1,5
Uplink 2: slots 2,6
Uplink 3: slots 3,7
Uplink 4: slots 4,8
How to read this: with four IOM-to-FI links, servers in slots 1 and 5 use link 1,
slots 2 and 6 use link 2, slots 3 and 7 use link 3, and slots 4 and 8 use link 4
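The three pinning tables above follow a simple round-robin pattern, which can be written as a one-line formula. This is a sketch that reproduces the tables shown here, not a statement of how UCS Manager computes pinning internally; the function names are hypothetical.

```python
SLOTS_PER_CHASSIS = 8
SUPPORTED_LINK_COUNTS = (1, 2, 4)  # link counts shown in the tables above


def pinned_uplink(slot, links):
    """Return the uplink number (1-based) that a server slot is pinned to."""
    if links not in SUPPORTED_LINK_COUNTS:
        raise ValueError(f"unsupported number of IOM-to-FI links: {links}")
    if not 1 <= slot <= SLOTS_PER_CHASSIS:
        raise ValueError(f"slot out of range: {slot}")
    # Slots are dealt out to the uplinks in round-robin order.
    return (slot - 1) % links + 1


def pinning_table(links):
    """Return {uplink: [slots]} for a given number of IOM-to-FI links."""
    table = {uplink: [] for uplink in range(1, links + 1)}
    for slot in range(1, SLOTS_PER_CHASSIS + 1):
        table[pinned_uplink(slot, links)].append(slot)
    return table


if __name__ == "__main__":
    for links in SUPPORTED_LINK_COUNTS:
        print(links, "link(s):", pinning_table(links))
```

Knowing the expected uplink for a slot makes it easier to spot a server that lost connectivity because its particular IOM-to-FI link went down.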
Verifying IOM-to-FI pinning
FarNorth-A(nxos)# show run interface ethernet1/1/7
version 4.1(3)N2(1.3)
interface Ethernet1/1/7
vntag max-vifs 30
pinning server
fabric-interface Eth1/1
no shutdown
FarNorth-A(nxos)# show run interface ethernet2/1/8
version 4.1(3)N2(1.3)
interface Ethernet2/1/8
vntag max-vifs 30
pinning server
fabric-interface Eth1/5
no shutdown
Good for identifying the proper path to the mezz adapter
E.g.: IOM1, slot 7 is pinned to link 1 (Eth1/1); IOM2, slot 8 is pinned to link 5 (Eth1/5)
Run <show run int eX/Y/Z> to verify
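When many backplane ports need checking, the fabric-interface line can be pulled out of saved show run interface output with a few lines of script. The sketch below assumes the output has been captured to a string and that keywords are whitespace-separated as on the device; the function name is hypothetical.

```python
def pinned_fabric_interface(running_config):
    """Return the fabric-interface named in a `show run interface` dump, or None."""
    for line in running_config.splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "fabric-interface":
            return words[1]
    return None


# Sample taken from the Ethernet1/1/7 output above.
SAMPLE = """\
version 4.1(3)N2(1.3)
interface Ethernet1/1/7
vntag max-vifs 30
pinning server
fabric-interface Eth1/1
no shutdown
"""

if __name__ == "__main__":
    print(pinned_fabric_interface(SAMPLE))
```

A result of None means the dump contained no fabric-interface line, which is worth investigating for a backplane port that should be pinned.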
Show Fex Detail
FEX: 1 Description: FEX0001 state: Online
FEX version: 4.1(3)N2(1.3) [Switch version: 4.1(3)N2(1.3)]
FEX Interim version: 4.1(3)N2(1.2.168a)
Switch Interim version: 4.1(3)N2(1.2.168a)
Chassis Model: N20-C6508, Chassis Serial: FOX1327GKGN
Extender Model: N20-I6584, Extender Serial: QCI132800SN
Part No: 73-11623-04
Card Id: 67, Mac Addr: 00:26:51:08:67:f4, Num Macs: 10
Module SwGen: 12594 [Switch SwGen: 21]
pinning-mode: static Max-links: 1
Fabric port for control traffic: Eth1/1
Fabric interface state:
Eth1/1 - Interface Up. State: Active
Eth1/2 - Interface Up. State: Active
Fex Port      State  Fabric Port  Primary Fabric
Eth1/1/1      Up     Eth1/1       Eth1/2
Eth1/1/2      Up     Eth1/2       Eth1/2
Eth1/1/3      Up     Eth1/1       Eth1/2
Eth1/1/4      Up     Eth1/2       Eth1/2
Eth1/1/7      Up     Eth1/1       Eth1/2
Eth1/1/9      Up     Eth1/2       Eth1/2
FEX: 2 Description: FEX0002 state: Online
FEX version: 4.1(3)N2(1.3) [Switch version: 4.1(3)N2(1.3)]
FEX Interim version: 4.1(3)N2(1.2.168a)
Switch Interim version: 4.1(3)N2(1.2.168a)
Chassis Model: N20-C6508, Chassis Serial: FOX1317G26R
Extender Model: N20-I6584, Extender Serial: QCI131600Z9
Part No: 73-11623-04
Card Id: 67, Mac Addr: 00:24:97:1f:6d:aa, Num Macs: 10
Module SwGen: 12594 [Switch SwGen: 21]
pinning-mode: static Max-links: 1
Fabric port for control traffic: Eth1/5
Fabric interface state:
Eth1/5 - Interface Up. State: Active
Eth1/6 - Interface Up. State: Active
Fex Port      State  Fabric Port  Primary Fabric
Eth2/1/1      Up     Eth1/6       Eth1/5
Eth2/1/2      Up     Eth1/5       Eth1/5
Eth2/1/8      Up     Eth1/5       Eth1/5
Eth2/1/9      Up     Eth1/5       Eth1/5
Understanding the Virtual Interface
Servers with one mezz card present two external 10GE
interfaces to the Fabric Interconnects
The server OS views the interfaces as 10GE NICs and
HBAs, depending on the configuration specified in the
Service Profile
These northbound interfaces can carry both Ethernet
and FC traffic (FCoE). We need a mechanism to identify
the origin server
The concept of a Virtual Interface, or VIF, is created for this purpose (see next slide)
Virtual interfaces (VIF)
[Figure: Blade 1 OS with southbound (OS-side) interfaces veth0, veth1, vhba0 and vhba1; each external mezz card 10GE port adds a virtual interface tag to associate frames to a VIF; Eth X/Y/Z interfaces on IOM 1 and IOM 2; IOM-to-FI links carry Vif 1 and Vif 2 to Fabric A and Vif 3 and Vif 4 to Fabric B]
Attaching to FEX
FarNorth-A# connect iom ?
<1-255> Chassis ID
FarNorth-A# connect iom 1
Attaching to FEX 1 ...
To exit type 'exit', to abort type '$.'
Bad terminal type: "xterm". Will assume vt100.
From the FEX attach CLI, the user can monitor
CPU, memory, etc.
show system resources
show process cpu
show process memory
show system uptime
VIFs
Ethernet and FC are muxed on the same physical
links
The concept of virtual interfaces (vifs) is used to split
Eth and FC
Two types of VIFs: veth and vfc
veth for Ethernet; vfc for FC traffic
Each EthX/Y/Z interface typically has multiple vifs
attached to it to carry traffic to and from a server
To find all vifs associated with an EthX/Y/Z interface,
do this:
FarNorth-A(nxos)# show vifs interface ethernet2/1/8
Interface      VIFS
----------------------------------------------------------------------
Eth2/1/8       veth1241, veth1243, veth9461, veth9463
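Saved show vifs interface output can be turned into a lookup table with a short parser. The sketch assumes the two-column layout shown above (interface name, then a comma-separated VIF list); the function names are hypothetical.

```python
def parse_vifs(output):
    """Map each interface in `show vifs interface` output to its list of VIF names."""
    vifs = {}
    for line in output.splitlines():
        if "#" in line:  # skip the CLI prompt line, if it was captured
            continue
        fields = line.replace(",", " ").split()
        # Skip blank lines, the dashed separator and the column header.
        if len(fields) < 2 or fields[0] == "Interface":
            continue
        vifs[fields[0]] = fields[1:]
    return vifs


def split_by_type(names):
    """Separate VIF names into Ethernet (veth) and Fibre Channel (vfc) VIFs."""
    return {
        "veth": [name for name in names if name.startswith("veth")],
        "vfc": [name for name in names if name.startswith("vfc")],
    }


SAMPLE = """\
Interface      VIFS
----------------------------------------------------------------------
Eth2/1/8       veth1241, veth1243, veth9461, veth9463
"""

if __name__ == "__main__":
    print(parse_vifs(SAMPLE))
```

The same parser works when the command is run against a vethernet interface, in which case the listed VIFs are the vfc interfaces bound to it.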
VIFs for FC traffic (FCoE)
FarNorth-A(nxos)# show vifs interface ethernet2/1/8
Interface      VIFS
----------------------------------------------------------------------
Eth2/1/8       veth1241, veth1243, veth9461, veth9463,
FarNorth-A(nxos)# sh int vethernet9463
vethernet9463 is up
Bound Interface is Ethernet2/1/8
Hardware: VEthernet
Encapsulation ARPA
Port mode is access
Last link flapped 1week(s) 1day(s)
Last clearing of "show interface" counters never
1 interface resets
FarNorth-A(nxos)# show int vfc1271
vfc1271 is up
Bound interface is vethernet9463
Hardware is Virtual Fibre Channel
Port WWN is 24:f6:00:0d:ec:d0:7b:7f
Admin port mode is F, trunk mode is off
snmp link state traps are enabled
Port mode is F, FCID is 0x710005
Port vsan is 100
All vifs associated with an EthX/Y/Z
interface are pinned to the fabric port
that the EthX/Y/Z interface is pinned to.
Vifs in the 10000+ range are used for FC
traffic. Check the VLAN to VSAN
mapping (show vlan fcoe)
FarNorth-A(nxos)# show vifs interface vethernet9463
Interface      VIFS
----------------------------------------------------------------------
veth9463       vfc1271,
FCoE VLAN is 100
FarNorth-A(nxos)# show vlan fcoe
VLAN      VSAN      Status
--------  --------  --------
1         1         Operational
100       100       Operational
Redwood Connection Information
show tech-support fex <1 or 2>
This captures the output needed
to determine congestion, packet
counters, and pause control on the server
ports and network ports of the IOM
The next few slides show examples of the output
Redwood Traffic Information
Traffic Rates on IOM
Shows pause frames and drops when investigating performance concerns
RMON Stats
Top commands for debugging
# Port Info
show clock
show platform fwm event-history lif <PORT>
show system internal ethpm info interface <PORT>
show system internal ethpm event-history interface <PORT>
show platform software dcbx internal info interface <PORT>
show platform software dcbx internal errors
show platform software sifmgr info interface <PORT>
show clock
# IOM
connect local-mgmt <fabric>
connect iom <chassis_id>
terminal length 0
show platform software redwood sts
show platform software redwood oper
show platform software redwood log
show platform software redwood elog
show platform software redwood ilog
show platform software redwood ints
# Global Info
show clock
show platform fwm event-history errors
show platform fwm event-history msgs
show platform fwm errors
show system internal ethpm event-history errors
show system internal ethpm info trace
show system internal ethpm event-history msgs
show platform software sifmgr event-history errors
show platform software sifmgr event-history lock
show platform software sifmgr info trace
show platform software sifmgr event-history msgs
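To avoid retyping the port-level list for every suspect interface, the commands can be generated from templates. This is only a sketch: the keyword spacing in the templates is an assumption reconstructed from the list above, and the function name is hypothetical.

```python
# Port Info commands from the list above; {port} stands for <PORT>.
# Keyword spacing is assumed; verify against the CLI before use.
PORT_INFO_TEMPLATES = [
    "show clock",
    "show platform fwm event-history lif {port}",
    "show system internal ethpm info interface {port}",
    "show system internal ethpm event-history interface {port}",
    "show platform software dcbx internal info interface {port}",
    "show platform software dcbx internal errors",
    "show platform software sifmgr info interface {port}",
    "show clock",
]


def port_info_commands(port):
    """Return the Port Info command list with <PORT> filled in."""
    return [template.format(port=port) for template in PORT_INFO_TEMPLATES]


if __name__ == "__main__":
    print("\n".join(port_info_commands("ethernet1/1/7")))
```

The list is bracketed by show clock at both ends, as in the original, so the captured output carries its own start and end timestamps.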
Fabric Interconnect Troubleshooting
Troubleshooting Flow
We will work from the blade servers up toward the LAN and SAN networks
Start: Blades -> IOM Modules -> Fabric Interconnects -> LAN-SAN: End
6100 Fabric Interconnect Troubleshooting
Understanding the Fabric Port Manager
Physical link issues
Server Links
FEX-Links
DCBX Discovery
MAC address functions in End Host Mode
Fabric Port Management
Managed by UCS Manager as part of the overall chassis discovery
process
Number of deployed fabric ports is defined in UCS Manager
service profile
A change in the number of deployed fabric ports requires Re-acknowledge Chassis
Supports Explicit Pinning only, as determined by UCS Manager
UCS Manager recalculates pinning distribution when fabric
port(s) go down
Supports even number of fabric ports only
No support for fabric port channel
Troubleshooting 10GbE: Link Not Coming Up
Check PHY driver software link state:
switch# show hardware internal gatos port ethernet1/19 xcvrinfo
Port 0/18:
State: UP
XCVR insert debounce timer running
XCVR link debounce timer not running
TX enable signal is on
Debounce timeout: 0.100 seconds
Link up : 506097 usecs after Wed May 12 22:38:08 2010
Link dn debounce start : 0 usecs after Thu Jan 1 00:00:00 1970
Link debounce end : 0 usecs after Thu Jan 1 00:00:00 1970
Counters:
Interrupt cntrs:
Bit error cntrs:
Bit Error Rate: 0x0000000000000000 Bit Error Rate(since linkup): 0x00000000
Error blocks : 0x0000000000000043 Error blocks(since linkup) : 0x00000011
Link cntrs:
Link up: 0x9(9)
Link dn: 0x0(0)
Link debounced with link up: 0x0(0)
Link debounced with link up since last enable: 0x0(0)
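The link counters at the end of this output are printed as hex with the decimal value in parentheses. A short sketch for extracting them from saved output follows; it assumes whitespace-separated text as on the device, and the function name is hypothetical.

```python
import re

# Matches counter lines such as "Link up: 0x9(9)" but not the timestamp
# line "Link up : 506097 usecs after ...".
_COUNTER_RE = re.compile(r"^\s*(Link [^:]+):\s*0x([0-9a-fA-F]+)\((\d+)\)\s*$")


def link_counters(output):
    """Return {counter name: value} for the 'Link cntrs' section."""
    counters = {}
    for line in output.splitlines():
        match = _COUNTER_RE.match(line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2), 16)
    return counters


SAMPLE = """\
Link up : 506097 usecs after Wed May 12 22:38:08 2010
Link cntrs:
Link up: 0x9(9)
Link dn: 0x0(0)
Link debounced with link up: 0x0(0)
Link debounced with link up since last enable: 0x0(0)
"""

if __name__ == "__main__":
    print(link_counters(SAMPLE))
```

Comparing the counters from two captures taken a few minutes apart shows whether the link is still transitioning.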
Enabling the Server link
After enabling fabric port
FarNorth-A(nxos)# show running-config interface ethernet1/1
version 4.1(3)N2(1.3)
interface Ethernet1/1
switchport mode fex-fabric
pinning server
fex associate 1 chassis-serial FOX1327GKGN module-serial QCI132800SN module-slot left
no shutdown
FarNorth-A(nxos)# show interface fex-fabric
     Fabric      Fabric        Fex                FEX
Fex  Port        Port State    Uplink    Model       Serial
-------------------------------------------------------------------------------------
1    Eth1/1      Active        1         N20-C6508   QCI132800SN
1    Eth1/2      Active        2         N20-C6508   QCI132800SN
2    Eth1/5      Active        2         N20-C6508   QCI131600Z9
Transition States (successive snapshots of the last row as the port comes up):
2                Discovered    1         N20-C6508   QCI131600Z9
2    Eth1/6      Configured    1         N20-C6508   QCI131600Z9
2    Eth1/6      Fabric Up     0
2    Eth1/6      Active        1         N20-C6508   QCI131600Z9
Fabric Port Management
FarNorth-A(nxos)# show fex 1 detail
FEX: 1 Description: FEX0001 state: Online
FEX version: 4.1(3)N2(1.3) [Switch version: 4.1(3)N2(1.3)]
FEX Interim version: 4.1(3)N2(1.2.168a)
Switch Interim version: 4.1(3)N2(1.2.168a)
Chassis Model: N20-C6508, Chassis Serial: FOX1327GKGN
Extender Model: N20-I6584, Extender Serial: QCI132800SN
Part No: 73-11623-04
Card Id: 67, Mac Addr: 00:26:51:08:67:f4, Num Macs: 10
Module SwGen: 21 [Switch SwGen: 21]
pinning-mode: static Max-links: 1
Fabric port for control traffic: Eth1/1
Fabric interface state:
Eth1/1 - Interface Up. State: Active
Eth1/2 - Interface Up. State: Active
Fex Port      State  Fabric Port  Primary Fabric
Eth1/1/1      Up     Eth1/1       Eth1/2
Eth1/1/2      Up     Eth1/2       Eth1/2
Eth1/1/3      Up     Eth1/1       Eth1/2
Eth1/1/4      Up     Eth1/2       Eth1/2
Eth1/1/7      Up     Eth1/1       Eth1/2
Eth1/1/9      Up     Eth1/2       Eth1/2
Logs:
[05/12/2010 22:38:28.273779] Module register received
[05/12/2010 22:38:28.276776] Registration response sent
[05/12/2010 22:38:28.546132] Module Online Sequence
Callouts on this slide: Fabric Ports (the Fabric interface state lines), Pinned fabric Port (the Fabric Port column), FEX Event history (the Logs section)
Network Interface Virtualization (NIV)
protocol negotiation w/ DCBX
The switch and adapter use the DCBX (LLDP-based protocol) NIV
TLV (Feature Type 7, Subtype 0) to:
indicate NIV capability
negotiate the control VNTAG for the virtual interface used by the adapter
management entity
Initial protocol frames are non-VNTAG
All frames contain a VNTAG once negotiated
VIC protocol
Allocate/deallocate virtual
interfaces (driven by Interface Virtualizer)
Set VIF state (active/standby)
Virtual interface list management (driven by switch)
MAC address registration (MAC filtering offload from adapter to switch)
DCBX Troubleshooting
Checking for DCBX negotiation results
In the dump of show platform software dcbx internal info interface ethernet1/1/1, look
for every feature negotiation result as shown below
feature type 3 sub_type 0
feature state variables: oper_version 0 error 0 oper_mode 1 feature_seq_no 0 remote_feature_tlv_present 1
remote_tlv_not_present_notification_sent 0 remote_tlv_aged_out 0
feature register params max_version 0, enable 1, willing 0 advertise 1, disruptive_error 0 mts_addr_node
0x101 mts_addr_sap 0x1e5
Desired config cfg length: 1 data bytes:08
Operating config cfg length: 1 data bytes:08
Error
1) Indicates negotiation error.
2) Never expected to happen when connected to a CNA adapter
3) When two N5Ks are connected back-to-back
4) If PFC is enabled on different CoS values, a negotiation error can happen
Operating Config
Indicates negotiation result
Absence of operating config indicates that the peer does not support this DCBX TLV or that a negotiation error occurred
remote_feature_tlv_present indicates whether the remote peer supports this feature TLV or not
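The three checks described above (error flag, presence of the operating config, and remote_feature_tlv_present) can be sketched as a parser over one feature block of the dump. It assumes field names and values are whitespace-separated as on the device; the function names are hypothetical.

```python
STATE_MARKER = "feature state variables:"
PARAMS_MARKER = "feature register params"


def parse_feature_state(block):
    """Return the 'feature state variables' of one feature block as {name: int}."""
    text = " ".join(block.split())  # undo line wrapping in the dump
    start = text.index(STATE_MARKER) + len(STATE_MARKER)
    end = text.find(PARAMS_MARKER, start)
    tokens = text[start:end if end != -1 else None].split()
    # Tokens alternate: name value name value ...
    return dict(zip(tokens[0::2], (int(value) for value in tokens[1::2])))


def negotiation_ok(block):
    """True if no error is flagged, the peer sent the TLV, and an operating config exists."""
    state = parse_feature_state(block)
    return (state.get("error") == 0
            and state.get("remote_feature_tlv_present") == 1
            and "Operating config" in block)


SAMPLE = """\
feature type 3 sub_type 0
feature state variables: oper_version 0 error 0 oper_mode 1 feature_seq_no 0 remote_feature_tlv_present 1
remote_tlv_not_present_notification_sent 0 remote_tlv_aged_out 0
feature register params max_version 0, enable 1, willing 0 advertise 1, disruptive_error 0 mts_addr_node
0x101 mts_addr_sap 0x1e5
Desired config cfg length: 1 data bytes:08
Operating config cfg length: 1 data bytes:08
"""

if __name__ == "__main__":
    print(negotiation_ok(SAMPLE))
```

Running the check once per feature block in the dump gives a quick pass/fail list before reading the raw output in detail.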
MAC Address Learning Functions
Server MAC address is learned via traffic generated by the server
Once learned, the server MAC address is static
Server MAC address is only learned on a server port
MAC address learning is disabled on border ports
Network-to-server traffic can only be forwarded (subject to the RPF and déjà vu
checks) if the server MAC address is already learned on a server port.
Server MAC address can move from one server port to another server port
Server MAC address can move outside the EH-node. The old server MAC
address is removed when a packet with the same source MAC is received on
the original pinned border port (more on that later). E.g. a VM moved and
generated a gratuitous ARP
Adapter can register MAC addresses with the switch
Switch relieves the adapter of performing MAC address filtering
Menlo adapters always register * (send all traffic to Menlo)
Verifying End Host Mode Status and Configuration
Mac address table
FarNorth-A(nxos)# show mac-address-table
VLAN      MAC Address       Type    Age       Port
---------+-----------------+-------+---------+------------------------------
1         0025.b500.0004    static  0         veth1235
1         0025.b500.0007    static  0         veth1243
1         0025.b500.0008    static  0         veth1200
1         0025.b500.0009    static  0         veth1199
1         0025.b500.000c    static  0         veth1207
1         0025.b500.0017    static  0         veth1241
1         0025.b500.0018    static  0         veth1277
.
. <cut>
.
4044      0024.971f.6a45    dynamic 0         Eth1/1/9
4044      0024.971f.6b6f    dynamic 0         Eth1/1/9
4044      0024.971f.6b8d    dynamic 0         Eth2/1/9
4044      0024.971f.6da8    dynamic 0         Eth2/1/9
4044      0026.5108.67f2    dynamic 0         Eth1/1/9
4044      0026.5108.7de1    dynamic 0         Eth1/1/9
4044      0026.5108.ac59    dynamic 0         Eth1/1/9
4044      0026.5108.c9a1    dynamic 0         Eth2/1/9
1         0100.5e7f.fffa    igmp    0         Po2 veth1207
1         0100.5e7f.fffd    igmp    0         Po2 veth1277
200       0100.5e7f.fffa    igmp    0         veth1199 veth1200
Total MAC Addresses: 47
FarNorth-A(nxos)# show mac-address-table ?
<CR>
>             Redirect it to a file
>>            Redirect it to a file in append mode
address       Address
aging-time    Display Aging Time (configured or default)
count         Display only the count of MAC entries
dynamic       Display Dynamic Entries
interface     Interface
multicast     Show Multicast MAC Table entries
notification  Display Notification Information
static        Display Static Entries
vlan          VLAN
|             Pipe command output to filter
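For long tables it can help to parse saved show mac-address-table output and count entries by type, for example to confirm that server MAC addresses on veth ports appear as static. The sketch assumes whitespace-separated columns in the order VLAN, MAC address, type, age, port(s); the function names are hypothetical.

```python
from collections import Counter


def parse_mac_table(output):
    """Return one dict per data row of `show mac-address-table` output."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric VLAN followed by a dotted MAC address.
        if len(fields) >= 5 and fields[0].isdigit() and fields[1].count(".") == 2:
            entries.append({
                "vlan": int(fields[0]),
                "mac": fields[1],
                "type": fields[2],
                "age": int(fields[3]),
                "ports": fields[4:],  # igmp rows can list several ports
            })
    return entries


def count_by_type(entries):
    """Count entries per type (static, dynamic, igmp, ...)."""
    return Counter(entry["type"] for entry in entries)


SAMPLE = """\
VLAN      MAC Address       Type    Age       Port
---------+-----------------+-------+---------+------------------------------
1         0025.b500.0004    static  0         veth1235
4044      0024.971f.6a45    dynamic 0         Eth1/1/9
1         0100.5e7f.fffa    igmp    0         Po2 veth1207
Total MAC Addresses: 47
"""

if __name__ == "__main__":
    print(count_by_type(parse_mac_table(SAMPLE)))
```

Filtering the parsed entries on the port field gives the MAC addresses learned behind a particular veth or backplane interface.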
Verifying End Host Mode Status and Configuration
running-config
UCS-HA-B(nxos)# show running-config interface ethernet1/9
interface Ethernet1/9
switchport mode trunk
switchport trunk allowed vlan 1
pinning border
no shutdown
UCS-HA-B(nxos)# show running-config interface veth681
interface vethernet681
switchport trunk allowed vlan 1
bind interface Ethernet1/1/5
no pinning server sticky
pinning server pinning-failure link-down
Verifying End Host Mode Status and Configuration
Server port pinning information
FarNorth-A(nxos)# show pinning server-interfaces
---------------+-----------------+------------------------+-----------------
SIF Interface   Sticky            Pinned Border Interface  Pinned Duration
---------------+-----------------+------------------------+-----------------
Eth1/1          Yes
Eth1/2          Yes
Eth1/5          Yes
Eth1/6          Yes
veth1199        No                Po2                      2d53:9:57
veth1200        No                Po2                      2d53:9:59
veth1207        No                Po2                      2d53:10:18
veth1235        No                Po2                      2d53:10:22
veth1241        No                Po2                      2d53:9:38
veth1243        No                Po2                      2d53:9:38
veth1277        No                Po2                      2d53:9:50
veth9395        Yes
veth9396        Yes
.
. <cut>
.
Total Interfaces : 37
Verifying End Host Mode Status and Configuration
Border port information
FarNorth-A(nxos)# show pinning border-interfaces
--------------------+---------+----------------------------------------------------------
Border Interface     Status    SIFs
--------------------+---------+----------------------------------------------------------
Po2                  Active    veth1199 veth1200 veth1207 veth1235
                               veth1241 veth1243 veth1277
Eth1/19              Down
Eth1/20              Down
Total Interfaces : 3
SAN Troubleshooting
Tracing a server FC connection
Determine the server's pWWN
Assigned through the service profile
Verify on the host that it matches:
Check the local FLOGI for that pWWN on UCS: