ECS - ECS Upgrade Procedures-ECS 2.2.1 HF1 or 3.0.0 To 3.0.0 HF1 Operating System Offline Update
Topic
ECS Upgrade Procedures
Selections
What ECS Version Are You Upgrading To?: ECS 3.0.x.x or below
Select Type of ECS Upgrade Being Performed: ECS OS Upgrade - Offline/Online Procedures
Select ECS OS Upgrade Version/Procedure: 2.2.1 HF1 or 3.0.0 to 3.0.0 HF1 Upgrade
Select ECS OS Upgrade Type: OS - Offline
REPORT PROBLEMS
If you find any errors in this procedure or have comments regarding this application, send email to
SolVeFeedback@dell.com
Copyright © 2022 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell
EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be
trademarks of their respective owners.
The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of
any kind with respect to the information in this publication, and specifically disclaims implied warranties of
merchantability or fitness for a particular purpose.
Use, copying, and distribution of any software described in this publication requires an applicable
software license.
This document may contain certain words that are not consistent with Dell's current language guidelines.
Dell plans to update the document over subsequent future releases to revise these words accordingly.
This document may contain language from third party content that is not under Dell's control and is not
consistent with Dell's current guidelines for Dell's own content. When such third party content is updated
by the relevant third parties, this document will be revised accordingly.
Page 1 of 70
Contents
Preliminary Activity Tasks .......................................................................................................3
Read, understand, and perform these tasks.................................................................................................3
Preliminary Activity Tasks
This section may contain tasks that you must complete before performing this procedure.
Table 1 List of cautions, warnings, notes, and/or KB solutions related to this activity
2. This is a link to the top trending service topics. These topics may or may not be related to this activity.
This is merely a proactive attempt to make you aware of any KB articles that may be associated with
this product.
Note: There may not be any top trending service topics for this product at any given time.
Dell Technologies Confidential Information version: 2.3.6.90
ECS 3.0.0 Update OS OFFLINE
Note: The next section is an existing PDF document that is inserted into this procedure. You may see
two sets of page numbers because the existing PDF has its own page numbering. Page x of y on the
bottom will be the page number of the entire procedure.
Elastic Cloud Storage (ECS)
Version 3.0 HF1
Copyright © 2013-2017 Dell Inc. or its subsidiaries. All rights reserved.
Dell believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS-IS.” DELL MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND
WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. USE, COPYING, AND DISTRIBUTION OF ANY DELL SOFTWARE DESCRIBED
IN THIS PUBLICATION REQUIRES AN APPLICABLE SOFTWARE LICENSE.
Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their respective owners.
Published in the USA.
EMC Corporation
Hopkinton, Massachusetts 01748-9103
1-508-435-1000 In North America 1-866-464-7381
www.EMC.com
2 Elastic Cloud Storage (ECS) 3.0 HF1 ECS OS Offline Update Guide
Page 7 of 70
CONTENTS
Figures 5
Tables 7
FIGURES
1 U-Series racks............................................................................................................. 11
2 C-Series racks.............................................................................................................12
3 Turtle switch laptop connection.................................................................................. 13
TABLES
1 Revision history...........................................................................................................10
2 Switch rack private IP addresses................................................................................ 13
3 Node-rack private IP addresses.................................................................................. 14
4 Node-rack private IP addresses.................................................................. 43
5 S3 browser settings....................................................................................................59
CHAPTER 1
ECS OS Offline upgrade
l Revision history.................................................................................................. 10
l Introduction........................................................................................................ 10
l Connect the service laptop to the ECS appliance............................................... 10
l Verifying the xDoctor version............................................................................. 15
l Checking ECS health using xDoctor................................................................... 16
l Disabling alerting................................................................................................ 18
l Preparing for multi-node command tool usage................................................... 18
l Check if Compliance is enabled and working properly........................................ 20
l Verify the disks................................................................................................... 21
l Record reference information.............................................................................21
l Prevent install during restart..............................................................................24
l Turn off the master role as rack install server.................................................... 25
l Set the ignore list on all nodes........................................................................... 25
l Preserve the previous refit bundle and history................................................... 26
l Verify the update bundle on <Node 1>................................................................27
l Distribute the OS update files to all nodes......................................................... 28
l Verify remote IPMI management functionality................................................... 32
l Check for NFS mounts.......................................................................................32
l Offline upgrade.................................................................................................. 33
l Exit containers................................................................................................... 33
l Perform the OS update on all nodes................................................................... 41
l Move the PXE menu.......................................................................................... 42
l Restart and reconnect each node, except <Node 1>.......................................... 43
l Reboot <Node 1>............................................................................................... 46
l Verify bonding mode.......................................................................................... 47
l Restart the containers....................................................................................... 48
l Save the post-update OS information................................................................50
l Compare pre- and post-upgrade information......................................................51
Revision history
This revision history table lists the changes made in this Update OS Offline Guide.
Introduction
Describes how the Update OS Offline procedure is used.
Of the supported upgrade paths to ECS 3.0 HF1, only the following path requires an
OS update: ECS 2.2.1 HF1 to ECS 3.0 HF1.
After performing a full OS update, also known as a "refit," continue on with the Fabric
and Object Services Upgrade to ECS 3.0 HF1.
Use this procedure to update the ECS OS of a single appliance only when the system
has been taken offline and out-of-service for a predefined maintenance window.
This procedure does not upgrade the ECS fabric software running on the nodes. This
procedure describes how to apply the OS update to all nodes at the same time,
rebooting all nodes except one, and finally rebooting the last node to complete the OS
update.
During this ECS OS Offline update procedure, no object CRUD operations are
functional.
Upgrade order:
l Perform the ECS OS update for all nodes in a rack before upgrading the fabric and
object layers.
l Perform the OS and fabric update of all nodes on all racks at a given site before
upgrading the object service at that site.
l Perform the update of all layers at a given site before proceeding to a peer site for
Geo.
2. Configure the service laptop hardwire Ethernet interface to use the static IP:
IP Address 192.168.219.250, Netmask 255.255.255.0
3. Connect the service laptop to turtle switch port 50 by using the RJ-45 cable,
highlighted in orange in the following figure. If the port is in use, label and
disconnect the existing cable.
Assign the IP address, 192.168.219.99 with the 255.255.255.0 subnet mask.
ping 192.168.219.251
Example output:
5. From the node identified in the service request, or by using Node 1 from node
specific procedures, find the private IP address for the node in the following
table:
7. Display the ECS OS currently installed on each node in sequential order and
verify that the same OS version is found on every node:
If you are unsure that the displayed OS version indicates a valid version that you
can upgrade from, refer to the Introduction section of this document or the ECS
3.0 HF1 Release Notes for more detail.
CAUTION
If all nodes do not have the same OS version, contact EMC Technical
Support to open a Service Request. Do not proceed with the upgrade.
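The uniformity check in step 7 can be scripted. The sketch below assumes the per-node version strings have already been collected into a file; the collection command and the sample version string are illustrative, not taken from this guide:

```shell
# Hypothetical sketch: verify every node reports the same OS version string.
# The per-node version strings would come from a multi-node command run; a
# canned sample stands in here so the check itself can be exercised anywhere.
cat > /tmp/os_versions.txt <<'EOF'
node1 3.0.0.0-86239.4f57cc4
node2 3.0.0.0-86239.4f57cc4
node3 3.0.0.0-86239.4f57cc4
node4 3.0.0.0-86239.4f57cc4
EOF

# Count distinct version strings (second field); 1 means all nodes match.
distinct=$(awk '{print $2}' /tmp/os_versions.txt | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
    echo "OK: all nodes report the same OS version"
else
    echo "MISMATCH: $distinct versions found - open a Service Request, do not proceed"
fi
```

If the count is anything other than 1, stop and open a Service Request as the caution above requires.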
CAUTION
If the output does not contain any IPs, then the Nile Area Network (NAN)
is not installed or not in use at the site. NAN is used for node to node
communication. Contact EMC Technical Support to open a Service
Request. Do not proceed with the upgrade.
CAUTION
After the completion of an OS update, the installed version of xDoctor may have
reverted to a version earlier than the latest. It is vital to recheck the xDoctor
version and upgrade to the latest before proceeding.
Procedure
1. Log into the current Rack Master.
ssh master.rack
In the following example, the xDoctor version is not uniform on all nodes.
3. If the installed version of xDoctor listed in the previous step is not the latest
version, see the ECS xDoctor User's Guide, available in ECS SolVe, for details
on upgrading or reinstalling xDoctor.
sudo -i xdoctor -u -A
sudo -i xdoctor
For example:
sudo -i xdoctor
2015-10-27 18:25:16,149: xDoctor_4.4-24 - INFO: Initializing xDoctor v4.4-24 ...
2015-10-27 18:25:16,191: xDoctor_4.4-24 - INFO: Removing orphaned session -
session_1445968876.975
2015-10-27 18:25:16,193: xDoctor_4.4-24 - INFO: Starting xDoctor
session_1445970315.896 ... (SYSTEM)
2015-10-27 18:25:16,193: xDoctor_4.4-24 - INFO: Master Control Check ...
2015-10-27 18:25:16,242: xDoctor_4.4-24 - INFO: xDoctor Composition - Full Diagnostic
Suite for ECS
2015-10-27 18:25:16,364: xDoctor_4.4-24 - INFO: Session limited to 0:30:00
2015-10-27 18:25:16,465: xDoctor_4.4-24 - INFO: Validating System Version ...
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: |- xDoctor version is sealed to 4.4-24
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: |- System version is sealed to
1.2.0.0-417.6e959c4.75
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: Distributing xDoctor session files ...
2015-10-27 18:25:17,627: xDoctor_4.4-24 - INFO: Collecting data on designated nodes,
please be patient ... (update every 5 to 30 seconds)
2015-10-27 18:25:22,650: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:05
2015-10-27 18:25:32,698: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:15
2015-10-27 18:25:47,770: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:30
2015-10-27 18:26:07,870: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:50
2015-10-27 18:26:10,283: xDoctor_4.4-24 - INFO: Waiting for local data collectors ...
2015-10-27 18:26:20,324: xDoctor_4.4-24 - INFO: All data collected in 0:01:02
2015-10-27 18:26:20,325: xDoctor_4.4-24 - INFO: -----------------
. . . . . . . . . .
2015-10-27 18:26:37,013: xDoctor_4.4-24 - INFO: -------------------------
2015-10-27 18:26:37,013: xDoctor_4.4-24 - INFO: xDoctor session_1445970315.896
finished in 0:01:21
2015-10-27 18:26:37,027: xDoctor_4.4-24 - INFO: Successful Job:1445970315 Exit Code:
141
2. Determine the report archive for the xDoctor session executed in the previous
step.
For example:
3. View the latest xDoctor report using the output from the command in the
previous step.
Add the -WEC option to display only "Warning, Error and Critical" events.
Timestamp = 2015-10-27_210554
Category = health
Source = fcli
Severity = ERROR
Message = Object Main Service not Healthy
Extra = 10.241.172.46
RAP = RAP014
Solution = 204179
4. If the report returns any Warning, Error, or Critical events, resolve those
events before continuing this procedure.
All xDoctor reported Warning, Error, and Critical events must be resolved
before proceeding. Contact ECS Global Technical Support (GTS) or Field
Support Specialists for assistance as required.
logout
Disabling alerting
Disable dial home during the maintenance period.
Procedure
1. Temporarily disable connect home using xDoctor.
This command prevents the transmission of dial home alerts during the service
engagement.
For example,
2. Notify the customer to disable all configured notification policies (email,
SNMP, and/or rsyslog) through the ECS GUI during this service activity.
Note
This action ensures that the customer's ECS monitoring process will not be
flooded with events caused by the service activity.
Verify the output from all nodes to ensure that the viprexec command returns
successful output from every node.
Procedure
1. Check that the system has visibility to all the nodes you intend to update by
using PuTTY or a similar tool. As the administrative user, SSH to Node 1:
cd /var/tmp
sudo getrackinfo
Example output:
cat /var/tmp/MACHINES
viprexec -i "pingall"
If the MACHINES file is empty, there will be no return. If a MACHINES file is not
found, the following error displays:
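As a quick sanity check before driving viprexec, you can count the entries in the MACHINES file. This sketch runs against a sample file with illustrative private IPs; on the rack the real file is /var/tmp/MACHINES:

```shell
# Sketch: sanity-check a MACHINES file before using it for multi-node commands.
# A sample file is created here; on the rack, check /var/tmp/MACHINES instead.
cat > /tmp/MACHINES.sample <<'EOF'
192.168.219.1
192.168.219.2
192.168.219.3
192.168.219.4
EOF

nodes=$(grep -c . /tmp/MACHINES.sample)   # count non-empty lines
if [ "$nodes" -gt 0 ]; then
    echo "MACHINES lists $nodes node(s)"
else
    echo "MACHINES is empty - regenerate it before proceeding"
fi
```

An empty or missing file means viprexec has nothing to iterate over, which matches the silent or error behavior described above.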
6. Verify the RMM settings and create a backup copy of the RMM settings.
CAUTION
If the ipmitool cannot reach all nodes in the rack, do not continue with the
procedure.
7. Remain in the SSH session on <Node 1> and in the /var/tmp directory.
Sample output:
[fabric.agent.security]
compliance_enabled = true
If Compliance is working properly, you will see COMPLIANT in the output for
each node. If any node is listed as NON-COMPLIANT, then halt the upgrade and
contact ECS Technical Support.
3. Document the compliance info in a file that will be accessible by the person
doing the services upgrade. (There is no quick way to determine if a cluster is a
Compliance cluster after the OS update procedure is complete.)
a. Switch to the /var/tmp/upgrade directory or create it if needed.
b. Create a file named compliance.info.
c. Add a statement to the file like "This is a Compliance cluster." or
"This is NOT a Compliance cluster."
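The three sub-steps above can be sketched as follows; a scratch directory stands in for /var/tmp/upgrade so the sequence can be exercised anywhere (on the rack, writing under /var/tmp/upgrade may require sudo):

```shell
# Sub-steps a-c sketched with a scratch directory; substitute /var/tmp/upgrade
# on the rack.
WORKDIR=$(mktemp -d)
mkdir -p "$WORKDIR/upgrade"                                    # a. create the directory
echo "This is a Compliance cluster." > "$WORKDIR/upgrade/compliance.info"  # b. and c.
cat "$WORKDIR/upgrade/compliance.info"
```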
Volume(s):
SCSI Device  Block Device  FS UUID                               Type  Slot  Label  Partition Name  SMART  Mount Point
-----------  ------------  ------------------------------------  ----  ----  -----  --------------  -----  -----------
/dev/sg0     /dev/sda1     d85465e1-cf40-46f1-90fe-e4a04e5c3d17  ext3  0     BOOT   n/a                    /boot
/dev/sg0     /dev/sda2     n/a                                   n/a   0     n/a    n/a             n/a    n/a
total: 62
By using grep to exclude the string GOOD, the command lists everything other
than GOOD storage disks. Here you see only system disks listed; system disks
do not report a status. Since no BAD or SUSPECT disks are reported, all
storage disks are GOOD.
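The filtering idea can be demonstrated on canned disk-status lines (the device names and statuses below are illustrative): excluding GOOD leaves only system disks, which carry no status, plus any BAD or SUSPECT disks that would need attention:

```shell
# Exercise the grep-exclusion idea on illustrative disk-status lines.
cat > /tmp/disks.sample <<'EOF'
/dev/sg0 /dev/sda1 BOOT
/dev/sg2 /dev/sdb  GOOD
/dev/sg3 /dev/sdc  GOOD
/dev/sg4 /dev/sdd  SUSPECT
EOF

# Everything except GOOD disks: system disks and any BAD/SUSPECT disks remain.
grep -v GOOD /tmp/disks.sample
```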
Example output:
Example output:
viprexec -i 'docker ps|grep -v NAMES|awk "{print \$NF}"|tr "\n" " " && echo -ne "\n"'
Expected output:
The output shows that all containers are up and running. If any container is not
up and running, investigate the problem before proceeding to upgrade.
5. Collect the node UID/Agent ID information and save it for possible analysis in
the event of failure. Run the following command:
Example output:
Example output:
2. To verify that the rack install server is disabled, run the following command:
Expected output:
no
for mac in $(sudo getrackinfo -v | egrep "private[ ]+:"|awk '{print $3}'); do sudo
setrackinfo --installer-ignore-mac $mac; done
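The mechanics of the loop above can be exercised against canned output shaped like the `private : <MAC>` lines that getrackinfo -v emits (the MACs are taken from the sample output later in this section); the real loop passes each extracted MAC to sudo setrackinfo --installer-ignore-mac:

```shell
# Demonstrate the MAC-extraction pipeline on canned getrackinfo-style lines;
# the echo stands in for the actual setrackinfo call.
cat > /tmp/rackinfo.sample <<'EOF'
private : 00:1e:67:96:3e:59
private : 00:1e:67:96:40:1b
EOF

for mac in $(egrep "private[ ]+:" /tmp/rackinfo.sample | awk '{print $3}'); do
    echo "would ignore $mac"
done
```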
2. Verify that the number of MAC addresses and entries matches the node count
and that the status is "Done" for each. This ensures that the nodes cannot
PXE boot. You will verify the addresses in the next step:
sudo getrackinfo -i
This output verifies that the dnsmasq.dhcpignore file on the node contains a
MAC address for each node.
Check that the count (4) before each MAC matches the node count for the
rack by using the command output. The MAC addresses must also match those
from the output of the getrackinfo -i command in step 2.
Example output:
4
4 00:1e:67:96:3e:59,ignore # (port 1) provo
4 00:1e:67:96:40:1b,ignore # (port 3) orem
4 00:1e:67:96:40:2f,ignore # (port 4) ogden
4 00:1e:67:96:40:75,ignore # (port 2) sandy
1 Output from host : 192.168.219.1
1 Output from host : 192.168.219.2
1 Output from host : 192.168.219.3
1 Output from host : 192.168.219.4
CAUTION
If the MAC addresses here do not match the MAC addresses shown in step
2, halt the procedure and open a Service Request with EMC Technical
Support.
4. Confirm that the ignore files are the same on all the nodes by comparing
md5sums:
CAUTION
If the md5sums are not the same for all nodes, the files are not identical. If
they are not identical, halt the upgrade and open a Service Request with
EMC Technical Support.
Sample output:
The checksums at your site will be different from those shown in this example.
Make sure the checksums for all your nodes match.
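The comparison itself reduces to a one-liner: if the per-node checksums are identical, sorting the checksum field uniquely collapses to a single line. A sketch using an illustrative checksum value:

```shell
# If all per-node checksums of dnsmasq.dhcpignore match, 'sort -u' over the
# checksum field yields exactly one line. Sample sums are illustrative.
cat > /tmp/ignore_sums.sample <<'EOF'
a3b1c2d4e5f60718293a4b5c6d7e8f90  node1:/etc/dnsmasq.dhcpignore
a3b1c2d4e5f60718293a4b5c6d7e8f90  node2:/etc/dnsmasq.dhcpignore
a3b1c2d4e5f60718293a4b5c6d7e8f90  node3:/etc/dnsmasq.dhcpignore
a3b1c2d4e5f60718293a4b5c6d7e8f90  node4:/etc/dnsmasq.dhcpignore
EOF

unique=$(awk '{print $1}' /tmp/ignore_sums.sample | sort -u | wc -l)
if [ "$unique" -eq 1 ]; then
    echo "ignore files identical on all nodes"
else
    echo "checksum mismatch - halt the upgrade and open a Service Request"
fi
```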
2. Stay in the /var/tmp directory in the SSH session on <Node 1> for the
remaining OS update procedure.
Log in to nodes with the administrator account. The default credentials are admin/
ChangeMe.
Procedure
1. Copy the OS update zip file to the /var/tmp directory on <Node 1> by using
pscp.exe or a similar copy tool. The OS update zip file has a name in a format
similar to ecs-os-update-<version>.zip:
cd /var/tmp
unzip ecs-os-update-<version>.zip
chmod +x /var/tmp/refit
md5sum -c MD5SUMS
The first sum that is shown by the third command must match the sum output
by the first command for each node. The second sum that is shown by the third
command must match the sum output by the second command for each node.
2f4f9e07fabff6f7bb3a429192e31897 /var/tmp/ecs-os-setup-
target.x86_64-2.1196.578.update.tbz
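For reference, this is how `md5sum -c` behaves, demonstrated on a scratch file: it recomputes each checksum listed in the manifest and prints OK for every file that is intact:

```shell
# Create a scratch file, record its checksum in an MD5SUMS manifest, then
# verify it; md5sum -c prints '<file>: OK' for each intact file.
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
echo "bundle payload" > bundle.bin
md5sum bundle.bin > MD5SUMS
md5sum -c MD5SUMS
```

A corrupted or truncated file would instead report FAILED, in which case the bundle must be re-copied before proceeding.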
3. Capture the RPM version info: the following command lists the last 60 installed
RPMs. The third command displays the version information for the hosts to be
updated, embedded in the file names.
4. Create the local repository on all nodes from the OS update bundle:
a. Run the following command to capture the RPM info, which can take up to 5
minutes:
Nothing to do.
Removing unnedded Hal and HWmgr
/srv/www/htdocs/repo / ~
Saving Primary metadata
Saving file lists metadata
Saving other metadata
/ ~
~
sudo zypper ref
5. Run the following command to validate the "repo" RPM repository is present
and that only one such repository is present on each node:
viprexec zypper lr
Sample output:
6. Verify that the PXE media is in place for post-update node rebuilds, or rack
expansions:
total 93556
drwxr-xr-x 2 root root 31 Nov 25 19:02 .
drwxr-xr-x 4 root root 68 Nov 23 17:40 ..
-rw-r--r-- 1 root root 90885996 Nov 8 18:39 initrd
-rw-r--r-- 1 root root 4911968 Nov 8 18:11 linux
/srv/tftpboot/pxelinux.cfg/:
total 4
drwxr-xr-x 2 root root 20 Aug 12 21:10 .
drwxr-xr-x 4 root root 68 Nov 23 17:40 ..
-rw-r--r-- 1 root root 530 Aug 12 20:55 old.file
/srv/www/htdocs/image/:
total 836772
drwxr-xr-x 2 root root 99 Nov 25 19:02 .
drwxr-xr-x 4 root root 29 Nov 23 17:40 ..
-rw-r--r-- 1 root root 59 Nov 8 18:38 ecs-os-setup-
target.x86_64-2.481.121.md5
-rw-r--r-- 1 root root 856845664 Nov 8 18:38 ecs-os-setup-
target.x86_64-2.481.121.xz
You should see initrd and linux in the boot directory and an md5 and
compressed file in the image directory. The image directory may have additional
files.
Note
If this system had a 2.2.1 > 2.2.1 HF1 quickfit update performed on it, you will
also see a second set of initrd and linux files on each node.
/srv/tftpboot/boot/:
total 93636
-rw-r--r-- 1 root root 90903648 Sep 7 15:39 initrd
-rw-r--r-- 1 root root 4975344 Sep 7 15:09 linux
/srv/tftpboot/pxelinux.cfg/:
total 4
-rw-r--r-- 1 root root 526 Jun 20 15:53 old.file
/srv/www/htdocs/image/:
total 1662120
-rw-r--r-- 1 root root 59 Sep 7 15:38 ecs-os-setup-
target.x86_64-3.1429.666.md5
-rw-r--r-- 1 root root 1702001696 Sep 7 15:38 ecs-os-setup-
target.x86_64-3.1429.666.xz
-rw-rw-rw- 1 root root 374 Sep 15 21:55 preset.cfg
Where:
l <lower> is the node ID* of lowest port device on your shared 1G turtle switch
l <upper> is the node ID* of highest port device on your shared 1G turtle switch
l <x> is the node ID* of the node where you are executing the IPMI command
l <action> is the IPMI power action: on|off|status
l *The getrackinfo command provides the node ID
In this example, you will get the status for all nodes in an eight-node system. After you
enter a command, you will be prompted to press Enter. The command then completes
and displays output:
Procedure
1. Get the status of <Node 1> and press Enter when prompted:
2. Do not proceed if IPMI commands are unable to obtain status from the BMC
for any node.
rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,nolock,proto=tcp,timeo=60
0,retrans=2,sec=sys,mountaddr=10.245.100.10,mountvers=3,mountport=2049,mountproto=udp,
local_lock=all,addr=10.245.100.10 0 0
2. Before shutting down ECS services, unmount the NFS mounts:
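One way to enumerate the NFS mounts to unmount is to filter on the filesystem-type field of /proc/mounts. The sketch below parses a canned copy so it can be run safely; the export path is illustrative, and the umount (which requires root) is only echoed:

```shell
# Find NFS mounts by filesystem type (field 3) and report each mount point
# (field 2). Parsed against a canned sample; the umount is only echoed here.
cat > /tmp/mounts.sample <<'EOF'
10.245.100.10:/export /mnt/backup nfs rw,relatime,vers=3 0 0
/dev/sda1 / ext3 rw 0 0
EOF

awk '$3 == "nfs" {print $2}' /tmp/mounts.sample | while read -r mp; do
    echo "would run: sudo umount $mp"
done
```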
CAUTION
Offline upgrade
For the offline upgrade, upgrade the nodes in parallel by using the ViPR multi-node
tools and the /var/tmp/MACHINES file that was created.
After the upgrades are deployed, restart all nodes except <Node 1>. Monitor the
restarts from <Node 1>. When all these nodes are restarted, use an SSH or ESRS
connection to another node, like <Node 2>, to monitor the <Node 1> restart.
Access any node in the rack over SSH by its IP address or hostname. (See the
node-rack reference tables.) Use .rack as the FQDN suffix for the rack-local
hostnames to differentiate them from the customer DNS environment.
Exit containers
In this procedure, you will exit the containers to preserve any customer-specific
modifications made to the current containers.
Procedure
1. From the service laptop, using PuTTY or a similar tool, SSH to <Node 1>.
Expected output:
md5sum /var/tmp/OSUpgradeExitContainers_offline.py
Expected output:
9dde8ae0775482f1e405de4d7598d239  OSUpgradeExitContainers_offline.py
Sample output:
Dec 18 10:32:20 ogden-pineapple libviprhal[124038]: Could not get number of CPUS from
dmidecode
Dec 18 10:32:20 ogden-pineapple libviprhal[124038]: executed cli: /sbin/lspci
terminated unexpectedly, rc: 13
Dec 18 10:32:20 ogden-pineapple libviprhal[124038]: Could not get number of NICs from
lspci
Dec 18 10:32:20 ogden-pineapple libviprhal[124038]: Error accessing
callhome.properties file /opt/emc/caspian/fabric/agent/data/callhome.properties.
Dec 18 10:32:20 ogden-pineapple libviprhal[124038]: dockerCmd: docker images --no-
trunc | grep 780c449fdd0e
Dec 19 07:17:38 ogden-pineapple libviprhal[38332]: ioctl 2285(0x12) failed on /dev/
sg3 with 12: Cannot allocate memory: info=0x0; size=512
Dec 19 07:26:07 ogden-pineapple systemd[1]: Stopping fabric agent...
Dec 19 07:26:07 ogden-pineapple systemd[1]: fabric-agent.service: main process
exited, code=exited, status=143/n/a
Dec 19 07:26:07 ogden-pineapple systemd[1]: Stopped fabric agent.
Dec 19 07:26:07 ogden-pineapple systemd[1]: Unit fabric-agent.service entered failed
state.
Dec 18 10:32:20 orem-pineapple libviprhal[13559]: Could not get number of NICs from
lspci
Dec 18 10:32:20 orem-pineapple libviprhal[13559]: Error accessing callhome.properties
file /opt/emc/caspian/fabric/agent/data/callhome.properties.
Dec 18 10:32:20 orem-pineapple libviprhal[13559]: dockerCmd: docker images --no-trunc
| grep 780c449fdd0e
Dec 18 10:32:20 orem-pineapple libviprhal[13559]: dockerCmd: docker images --no-trunc
| grep a07479a114af
Dec 18 10:32:20 orem-pineapple libviprhal[13559]: dockerCmd: docker images --no-trunc
| grep 32cce433c3dc
Dec 19 07:17:37 orem-pineapple libviprhal[38280]: ioctl 2285(0x12) failed on /dev/sg3
with 12: Cannot allocate memory: info=0x0; size=512
Dec 19 07:26:07 orem-pineapple systemd[1]: Stopping fabric agent...
Dec 19 07:26:07 orem-pineapple systemd[1]: fabric-agent.service: main process exited,
code=exited, status=143/n/a
Dec 19 07:26:07 orem-pineapple systemd[1]: Stopped fabric agent.
Dec 19 07:26:07 orem-pineapple systemd[1]: Unit fabric-agent.service entered failed
state.
Dec 18 10:32:20 sandy-pineapple libviprhal[43000]: Could not get number of NICs from
lspci
Dec 18 10:32:20 sandy-pineapple libviprhal[43000]: Error accessing
callhome.properties file /opt/emc/caspian/fabric/agent/data/callhome.properties.
Dec 18 10:32:20 sandy-pineapple libviprhal[43000]: dockerCmd: docker images --no-
trunc | grep 780c449fdd0e
Dec 18 10:32:20 sandy-pineapple libviprhal[43000]: dockerCmd: docker images --no-
trunc | grep a07479a114af
Dec 18 10:32:20 sandy-pineapple libviprhal[43000]: dockerCmd: docker images --no-
trunc | grep 32cce433c3dc
Dec 19 07:17:38 sandy-pineapple libviprhal[38264]: ioctl 2285(0x12) failed on /dev/
sg3 with 12: Cannot allocate memory: info=0x0; size=512
Dec 19 07:26:07 sandy-pineapple systemd[1]: Stopping fabric agent...
Dec 19 07:26:07 sandy-pineapple systemd[1]: fabric-agent.service: main process
exited, code=exited, status=143/n/a
Dec 19 07:26:07 sandy-pineapple systemd[1]: Stopped fabric agent.
Dec 19 07:26:07 sandy-pineapple systemd[1]: Unit fabric-agent.service entered failed
state.
Active: inactive (dead) since Mon 2016-12-19 07:33:37 UTC; 193ms ago
Because the command does not indicate progress, it can appear that it is not
responding. To monitor the execution, start another SSH session to <Node 1>
and run the watch command to monitor what is happening.
2. Monitor the update by either of the following methods:
l Watch method:
Example output when completed (note that the only refit entries in the output
belong to the watch and grep commands themselves, not to a still-running refit):
Use Ctrl+C to exit watch mode when the refit doupdate commands complete.
l Use the tail log output method on each node:
Where <Node x> is the rack-local hostname for the node of interest. Use
Ctrl+C to exit.
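The reason the completed output shows only the watch and grep commands is the classic ps-pipe-grep self-match; a bracketed pattern such as [r]efit avoids it, because the grep process's own command line contains the literal string "[r]efit", which the pattern does not match:

```shell
# Bracketing one character of the pattern keeps grep out of its own results,
# so an empty result really does mean no refit process is left running.
ps -ef | grep "[r]efit" || echo "no refit processes found"
```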
For example:
Example output:
Note
Example output:
Running on host <y> not connected to, or thru, node <x> and you passed: <x>:1 <action>:status
Verify with enter/return to continue - or Ctrl-C to abort
You must press Enter to confirm the command.
3. Validate that this output shows the power state for <Node x> is Chassis
Power is off.
4. If after 5 minutes, <Node x> remains powered on, force it to power off:
To confirm the command, press Enter when prompted. Verify the status and
repeat if <Node x> does not power off (step 2).
5. Power on <Node x>:
Sample output:
To confirm the command, press Enter when prompted. Verify the status and
repeat if <Node x> does not power on (step 2).
Note
CAUTION
doit uname -a
Sample output:
b. Ping the nodes and ensure that the current node is reachable.
pingall
Sample output
admin@ccaecslab01n1:~> pingall
192.168.219.1 ping succeeded
192.168.219.2 ping succeeded
c. Check to make sure the node appears in the list of running nodes.
doit uptime
Go to step 1.
4. If after 5 minutes, <Node 1> still reports 'on,' force it 'off' with this command:
refit ipmipower_node_x 1 off
Example output:
2. If you do not see slave-0 and slave-1 in the output, follow these steps:
a. SSH to the node in error:
sudo rm /etc/sysconfig/network/ifcfg-public
exit
The default bonding mode you should see (unless configured differently):
4. If the output from step 3 does not display the correct bonding mode, run the
following command:
ssh <Node x> 'sudo ifdown public && sudo ifup public'
Once corrections are made for all nodes as needed, recheck with step 3.
b. If you see the following output, you have encountered a known bug:
Reboot the affected nodes and restart this procedure from step 1a.
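The bonding mode can also be read directly from the kernel's bonding status file. A sketch, assuming the bond interface is named "public" as in this procedure; the parsing works on any file in the standard /proc/net/bonding format:

```shell
# Sketch: extract the bonding mode line from a bonding status file.
bonding_mode() {
  # $1: path to a /proc/net/bonding-format status file
  grep '^Bonding Mode' "$1"
}

# On a node: bonding_mode /proc/net/bonding/public
```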
Sample output:
total: 62
Note
The first two entries in the output represent the system disks. These two
entries will not display a status. This is normal.
Do not proceed unless you get similar output for each node.
3. Start the fabric agent:
6. Verify that the node's containers have started and the nodes are ACTIVE:
AGENT ID                              HOSTNAME                         ACTUAL MODE
86c0baa9-f35a-4766-b2f8-3046aedc5eb1  layton-chestnut.ecs.lab.emc.com  ACTIVE
Ensure that all docker containers are ACTIVE and that Exited is not displayed in the output.
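A sketch of this check as a test on a captured container listing (any "docker ps -a" style text; the container names themselves are not assumed):

```shell
# Sketch: fail if any container line in a docker listing shows an exited state.
no_exited_containers() {
  # $1: container listing text, one container per line
  ! printf '%s\n' "$1" | grep -qi 'exited'
}

# Example: no_exited_containers "$(sudo docker ps -a)" || echo "restart the fabric agent"
```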
7. If the containers are not running, stop and restart the fabric agent:
Sample output:
ls /var/tmp/postupdateversions*.log
2. Run the following command to collect the pre-upgrade version information and assign it to the variable PREUP.
echo $PREUP
head $PREUP
4. Collect the post-update version information and assign it to the variable POSTUP:
echo $POSTUP
head $POSTUP
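The timestamp comparison in step 6 below can also be done directly on the log files. A sketch, using the PREUP/POSTUP variables set above (`stat -c %Y` is GNU coreutils):

```shell
# Sketch: succeed only if the first log's modification time is earlier
# than the second's, i.e. the pre-update log predates the post-update log.
logs_in_order() {
  [ "$(stat -c %Y "$1")" -lt "$(stat -c %Y "$2")" ]
}

# logs_in_order "$PREUP" "$POSTUP" && echo "pre-update log is older, as expected"
```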
6. Review the output of each command and verify the pre-update timestamp is
earlier than the post-update timestamp:
Example output:
--- /var/tmp/refit.d/20150708-212626-preupdateversions.log 2015-07-08 21:26:27.850666079 +0000
+++ /var/tmp/refit.d/20150708-222904-postupdateversions.log 2015-07-08 22:29:05.439526482 +0000
@@ -1,23 +1,23 @@
-information about version: 1.2.0.0-398.b278210.58
+information about version: 1.2.0.0-403.b98c1c4.63
 rpm -qa | egrep 'emc|ecs|nile|vipr|asd' | sort
 connectemc-3.1.0.1-1.x86_64
 ecs-callhome-1.1.0.0-1964.d913a13.x86_64
-ecs-os-base-1.2.0.0-398.b278210.58.noarch
+ecs-os-base-1.2.0.0-403.b98c1c4.63.noarch
 emc-arista-firmware-1.2-1.0.x86_64
 emc-cdes-firmware-1.1.3.326-8.02.1.x86_64
 emc-cdes-testeses-8.14-1.0.x86_64
 emc-cdes-zoning-1.1-1.2.x86_64
 emc-drive-firmware-1.3-1.x86_64
-emc-ecs-diags-2.1.1.0-656.b635b0b.noarch
+emc-ecs-diags-2.1.1.0-673.49ee354.noarch
 emc-intel-firmware-13.1.5-1.2.BIOS02.03.0003_BMC6680.x86_64
 emc-lab-utils-1.10-1.0.x86_64
 emc-lsi-hba-firmware-4.1-1.0.x86_64
 emc-lsi-storelibir-2-17.01-657.1b62e78.1.x86_64
-emc-nan-2.1.1.0-665.836312e.x86_64
-nile-hwmgr-1.1.1.0-331.0e14626.x86_64
-nile-hwmgr-utils-1.1.1.0-331.0e14626.x86_64
+emc-nan-2.1.1.0-675.584c94e.x86_64
+nile-hwmgr-1.1.1.0-333.cf89b22.x86_64
+nile-hwmgr-utils-1.1.1.0-333.cf89b22.x86_64
 python-viprhal-1.1.1.0-1180.00f72e7.x86_64
 viprhal-1.1.1.0-1180.00f72e7.x86_64
REFIT SUCCESS running rpm -qa | egrep 'emc|ecs|nile|vipr|asd' | sort
Lines from the pre-update version are prefixed with the "-" symbol, and lines from the post-update version with the "+" symbol. Locate the ecs-os-base version in the output and verify that the post-update version displays the expected value. The Readme.txt file or Release Notes provide more information on package versions that must be verified.
CHAPTER 2
ECS OS offline post upgrade tasks
sudo getrackinfo -i
Using the command output, check that the count (4) prefixing each MAC matches the node count for the rack. The MACs must also match those in the output of the getrackinfo -i command in step 2.
Example output:
4
4 00:1e:67:96:3e:59,ignore # (port 1) provo
4 00:1e:67:96:40:1b,ignore # (port 3) orem
4 00:1e:67:96:40:2f,ignore # (port 4) ogden
4 00:1e:67:96:40:75,ignore # (port 2) sandy
1 Output from host : 192.168.219.1
1 Output from host : 192.168.219.2
1 Output from host : 192.168.219.3
1 Output from host : 192.168.219.4
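The count check described above can be automated. A sketch, assuming the captured listing is piped in and the rack has 4 nodes; nothing about the capture method is taken from this extract:

```shell
# Sketch: flag any ignore-MAC line whose leading count does not match the
# rack's node count. Reads the "N <mac>,ignore ..." listing from stdin.
check_mac_counts() {
  # $1: expected node count
  awk -v nodes="$1" '/ignore/ && $1 != nodes { print "count mismatch:", $0; bad = 1 }
                     END { exit bad }'
}
```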
3. If needed, run the following commands and then repeat steps 1 and 2.
for mac in $(sudo getrackinfo -v | egrep "private[ ]+:" | awk '{print $3}'); do
    sudo setrackinfo --installer-ignore-mac $mac
done
4. Confirm that the ignore files are the same on all the nodes by comparing
md5sums:
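The exact md5sum command is not reproduced in this extract. As a generic sketch of the comparison, this helper succeeds only when every file given has the same md5 checksum:

```shell
# Sketch: succeed only if all given files share one md5 checksum.
same_md5() {
  [ "$(md5sum "$@" | awk '{print $1}' | sort -u | wc -l)" -eq 1 ]
}
```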
Example output:
<entry>
<total_dt_num>1920</total_dt_num>
<unready_dt_num>355</unready_dt_num>
<unknown_dt_num>0</unknown_dt_num>
</entry>
<entry>
<type>RR</type>
<level>0</level>
<total_dt_num>128</total_dt_num>
<unready_dt_num>95</unready_dt_num>
--
<entry>
<type>MR</type>
<level>0</level>
<total_dt_num>128</total_dt_num>
<unready_dt_num>128</unready_dt_num>
--
<entry>
<type>LS</type>
<level>0</level>
<total_dt_num>128</total_dt_num>
<unready_dt_num>9</unready_dt_num>
<entry>
<total_dt_num>1920</total_dt_num>
<unready_dt_num>0</unready_dt_num>
<unknown_dt_num>0</unknown_dt_num>
</entry>
2. Verify.
The first time you run the command, the <unready_dt_num> entry might not be listed in the output. Rerun the command until the unready_dt_num in the "total_dt_num" section is listed as 0.
Note
A cluster with a large load can take up to 40 minutes to initialize the DTs. It is not necessary to wait for all DTs to initialize before you begin the OS update on the next node. For example, the following types can be ignored: RR, MR, or MA. If other types have non-zero unready or unknown DT counts, do not continue.
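A sketch for pulling the unready count out of saved query output in the XML format shown above (the file name dtquery.xml is hypothetical; adapt it to however you capture the output):

```shell
# Sketch: print the first <unready_dt_num> value found in a saved dtquery file.
first_unready() {
  sed -n 's:.*<unready_dt_num>\([0-9]*\)</unready_dt_num>.*:\1:p' "$1" | head -n 1
}

# Rerun the query until: [ "$(first_unready dtquery.xml)" -eq 0 ]
```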
Procedure
1. List the kernel version on each node and compare it to the information you
saved at the beginning of this procedure. Verify the kernel information has
changed:
Example output:
Option           Setting
Storage Type     S3 Compatible Storage
2. Use the S3 browser to upload the test file from your laptop to verify that you
are able to write to the appliance.
Enabling alerting
Re-enable dial home events after maintenance.
Procedure
1. Log into the current Rack Master.
ssh master.rack
This command enables the transmission of dial home alerts during the service
engagement.
For example,
CAUTION
After the completion of an OS update, the installed version of xDoctor may have
reverted to a version earlier than the latest. It is vital to recheck the xDoctor
version and upgrade to the latest before proceeding.
Procedure
1. Log into the current Rack Master.
ssh master.rack
In the following example, the xDoctor version is not uniform on all nodes.
3. If the installed version of xDoctor listed in the above step is not the latest version, see the ECS xDoctor User's Guide, available in ECS SolVe, for details on upgrading or reinstalling xDoctor.
sudo -i xdoctor -u -A
sudo -i xdoctor
For example:
sudo -i xdoctor
2015-10-27 18:25:16,149: xDoctor_4.4-24 - INFO: Initializing xDoctor v4.4-24 ...
2015-10-27 18:25:16,191: xDoctor_4.4-24 - INFO: Removing orphaned session -
session_1445968876.975
2015-10-27 18:25:16,193: xDoctor_4.4-24 - INFO: Starting xDoctor
session_1445970315.896 ... (SYSTEM)
2015-10-27 18:25:16,193: xDoctor_4.4-24 - INFO: Master Control Check ...
2015-10-27 18:25:16,242: xDoctor_4.4-24 - INFO: xDoctor Composition - Full Diagnostic
Suite for ECS
2015-10-27 18:25:16,364: xDoctor_4.4-24 - INFO: Session limited to 0:30:00
2015-10-27 18:25:16,465: xDoctor_4.4-24 - INFO: Validating System Version ...
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: |- xDoctor version is sealed to 4.4-24
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: |- System version is sealed to
1.2.0.0-417.6e959c4.75
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: Distributing xDoctor session files ...
2015-10-27 18:25:17,627: xDoctor_4.4-24 - INFO: Collecting data on designated nodes,
please be patient ... (update every 5 to 30 seconds)
2015-10-27 18:25:22,650: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:05
2015-10-27 18:25:32,698: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:15
2015-10-27 18:25:47,770: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:30
2015-10-27 18:26:07,870: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:50
2015-10-27 18:26:10,283: xDoctor_4.4-24 - INFO: Waiting for local data collectors ...
2015-10-27 18:26:20,324: xDoctor_4.4-24 - INFO: All data collected in 0:01:02
2015-10-27 18:26:20,325: xDoctor_4.4-24 - INFO: -----------------
. . . . . . . . . .
2015-10-27 18:26:37,013: xDoctor_4.4-24 - INFO: -------------------------
2015-10-27 18:26:37,013: xDoctor_4.4-24 - INFO: xDoctor session_1445970315.896
finished in 0:01:21
2015-10-27 18:26:37,027: xDoctor_4.4-24 - INFO: Successful Job:1445970315 Exit Code:
141
2. Determine the report archive for the xDoctor session executed in the previous
step.
For example:
3. View the latest xDoctor report using the output from the command in the previous step.
Add the -WEC option to display only Warning, Error, and Critical events.
Timestamp = 2015-10-27_210554
Category = health
Source = fcli
Severity = ERROR
Message = Object Main Service not Healthy
Extra = 10.241.172.46
RAP = RAP014
Solution = 204179
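A sketch of what the -WEC option filters for, applied to a saved copy of a report in the field format shown above (the report file name is hypothetical):

```shell
# Sketch: print only Warning/Error/Critical entries from a saved report.
wec_lines() {
  # $1: path to a saved report with "Severity = LEVEL" lines
  grep -E 'Severity[[:space:]]*=[[:space:]]*(WARNING|ERROR|CRITICAL)' "$1"
}
```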
4. If the report returns any Warning, Error, or Critical events, resolve those events before continuing this procedure.
All xDoctor reported Warning, Error, and Critical events must be resolved
before proceeding. Contact ECS Global Technical Support (GTS) or Field
Support Specialists for assistance as required.
logout
Dell Technologies Confidential Information version: 2.3.6.90