
ECS™ Procedure Generator

Solution for Validating your engagement

ECS 2.2.1 HF1 or 3.0.0 to 3.0.0 HF1 Operating System


Offline Update

Topic
ECS Upgrade Procedures
Selections
What ECS Version Are You Upgrading To?: ECS 3.0.x.x or below
Select Type of ECS Upgrade Being Performed: ECS OS Upgrade - Offline/Online Procedures
Select ECS OS Upgrade Version/Procedure: 2.2.1 HF1 or 3.0.0 to 3.0.0 HF1 Upgrade
Select ECS OS Upgrade Type: OS - Offline

Generated: July 5, 2022 5:54 PM GMT

REPORT PROBLEMS

If you find any errors in this procedure or have comments regarding this application, send email to
SolVeFeedback@dell.com

Copyright © 2022 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell
EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be
trademarks of their respective owners.

The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of
any kind with respect to the information in this publication, and specifically disclaims implied warranties of
merchantability or fitness for a particular purpose.

Use, copying, and distribution of any software described in this publication requires an applicable
software license.

This document may contain certain words that are not consistent with Dell's current language guidelines.
Dell plans to update the document over subsequent future releases to revise these words accordingly.

This document may contain language from third party content that is not under Dell's control and is not
consistent with Dell's current guidelines for Dell's own content. When such third party content is updated
by the relevant third parties, this document will be revised accordingly.

Publication Date: July, 2022

Dell Technologies Confidential Information version: 2.3.6.90

Contents
Preliminary Activity Tasks .......................................................................................................3
Read, understand, and perform these tasks.................................................................................................3

ECS 3.0.0 Update OS OFFLINE.............................................................................................5

Preliminary Activity Tasks
This section may contain tasks that you must complete before performing this procedure.

Read, understand, and perform these tasks


1. Table 1 lists tasks, cautions, warnings, notes, and/or knowledgebase (KB) solutions that you need to
be aware of before performing this activity. Read, understand, and when necessary perform any
tasks contained in this table and any tasks contained in any associated knowledgebase solution.

Table 1 List of cautions, warnings, notes, and/or KB solutions related to this activity

2. This is a link to the top trending service topics. These topics may or may not be related to this activity.
This is merely a proactive attempt to make you aware of any KB articles that may be associated with
this product.

Note: There may not be any top trending service topics for this product at any given time.

ECS Top Service Topics

ECS 3.0.0 Update OS OFFLINE

Note: The next section is an existing PDF document that is inserted into this procedure. You may see
two sets of page numbers because the existing PDF has its own page numbering. Page x of y on the
bottom will be the page number of the entire procedure.

Elastic Cloud Storage (ECS)
Version 3.0 HF1

ECS OS Offline Update Guide


02

Copyright © 2013-2017 Dell Inc. or its subsidiaries. All rights reserved.

Published March 2017

Dell believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS-IS.” DELL MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND
WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. USE, COPYING, AND DISTRIBUTION OF ANY DELL SOFTWARE DESCRIBED
IN THIS PUBLICATION REQUIRES AN APPLICABLE SOFTWARE LICENSE.

Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their respective owners.
Published in the USA.

EMC Corporation
Hopkinton, Massachusetts 01748-9103
1-508-435-1000 In North America 1-866-464-7381
www.EMC.com

CONTENTS

Figures 5

Tables 7

Chapter 1 ECS OS Offline upgrade 9


Revision history...........................................................................................10
Introduction................................................................................................ 10
Connect the service laptop to the ECS appliance....................................... 10
Verifying the xDoctor version......................................................................15
Checking ECS health using xDoctor............................................................16
Disabling alerting.........................................................................................18
Preparing for multi-node command tool usage............................................18
Check if Compliance is enabled and working properly................................ 20
Verify the disks........................................................................................... 21
Record reference information..................................................................... 21
Prevent install during restart...................................................................... 24
Turn off the master role as rack install server.............................................25
Set the ignore list on all nodes....................................................................25
Preserve the previous refit bundle and history........................................... 26
Verify the update bundle on <Node 1>........................................................ 27
Distribute the OS update files to all nodes..................................................28
Verify remote IPMI management functionality............................................32
Check for NFS mounts............................................................................... 32
Offline upgrade...........................................................................................33
Exit containers............................................................................................33
Perform the OS update on all nodes............................................................41
Move the PXE menu...................................................................................42
Restart and reconnect each node, except <Node 1>...................................43
Reboot <Node 1>........................................................................................46
Verify bonding mode...................................................................................47
Restart the containers................................................................................48
Save the post-update OS information........................................................ 50
Compare pre- and post-upgrade information.............................................. 51

Chapter 2 ECS OS offline post upgrade tasks 55


ECS OS offline post-update tasks for all nodes.......................................... 56
Restore the PXE menu.................................................................. 56
Post-update: verify the ignore list on all nodes.............................. 56
Check that object service initialization process is complete...........57
Check the OS kernel on all nodes.................................................. 58
Verifying the data path functionality............................................. 59
Enabling alerting............................................................................ 60
Verifying the xDoctor version........................................................ 60
Checking ECS health using xDoctor............................................... 61
Disconnect from the turtle switch.............................................................. 63

FIGURES

1 U-Series racks............................................................................................................. 11
2 C-Series racks.............................................................................................................12
3 Turtle switch laptop connection.................................................................................. 13

TABLES

1 Revision history...........................................................................................................10
2 Switch rack private IP addresses................................................................................ 13
3 Node-rack private IP addresses.................................................................................. 14
4 Node Rack Private IP addresses................................................................................. 43
5 S3 browser settings....................................................................................................59

CHAPTER 1
ECS OS Offline upgrade

This section explains the following topics:

• Revision history.................................................................................................. 10
• Introduction........................................................................................................ 10
• Connect the service laptop to the ECS appliance............................................... 10
• Verifying the xDoctor version............................................................................. 15
• Checking ECS health using xDoctor................................................................... 16
• Disabling alerting................................................................................................ 18
• Preparing for multi-node command tool usage................................................... 18
• Check if Compliance is enabled and working properly........................................ 20
• Verify the disks................................................................................................... 21
• Record reference information.............................................................................21
• Prevent install during restart..............................................................................24
• Turn off the master role as rack install server.................................................... 25
• Set the ignore list on all nodes........................................................................... 25
• Preserve the previous refit bundle and history................................................... 26
• Verify the update bundle on <Node 1>................................................................27
• Distribute the OS update files to all nodes......................................................... 28
• Verify remote IPMI management functionality................................................... 32
• Check for NFS mounts.......................................................................................32
• Offline upgrade.................................................................................................. 33
• Exit containers................................................................................................... 33
• Perform the OS update on all nodes................................................................... 41
• Move the PXE menu.......................................................................................... 42
• Restart and reconnect each node, except <Node 1>.......................................... 43
• Reboot <Node 1>............................................................................................... 46
• Verify bonding mode.......................................................................................... 47
• Restart the containers....................................................................................... 48
• Save the post-update OS information................................................................50
• Compare pre- and post-upgrade information......................................................51


Revision history
This revision history table lists the changes made in this Update OS Offline Guide.

Table 1 Revision history

Revision Description Date


01 Initial revision of this guide. September 2016

02 ECS 3.0 HF1. Switch to exit January 2017


containers script.

Introduction
Describes how the Update OS Offline procedure is used.
Of the supported upgrade paths to ECS 3.0 HF1, only the following path requires an
OS update: ECS 2.2.1 HF1 to ECS 3.0 HF1.
After performing a full OS update, also known as a "refit," continue on with the Fabric
and Object Services Upgrade to ECS 3.0 HF1.
Use this procedure to update the ECS OS of a single appliance only when the system
has been taken offline and out-of-service for a predefined maintenance window.
This procedure does not upgrade the ECS fabric software running on the nodes. This
procedure describes how to apply the OS update to all nodes at the same time,
rebooting all nodes except one, and finally rebooting the last node to complete the OS
update.
During this ECS OS Offline update procedure, no object CRUD operations are
functional.
Upgrade order:
• Perform the ECS OS update for all nodes in a rack before upgrading the fabric and object layers.
• Perform the OS and fabric update of all nodes on all racks at a given site before upgrading the object service at that site.
• Perform the update of all layers at a given site before proceeding to a peer site for Geo.

Connect the service laptop to the ECS appliance


This section explains how to connect to a node in the appliance rack by using the 1GbE
switch.
Many procedures require access to the console of an ECS node. The segment/rack
1GbE Management Switch, also referred to as the Turtle switch, can be used to
connect the service laptop to any node by using an RJ-45 network connection through
the segment/rack private network.
The Service Request identifies the node or nodes on which to perform procedures. <Node 1>
is the first node in a rack. Use <Node 1> for non-node-specific procedures.
Obtain the following:


• Node ID, for node-specific procedures
• The Administrator account password from the customer
• [Optional] For troubleshooting purposes, be sure that you have the admin account password for the 1GbE switch (turtle).
Procedure
1. Identify the turtle switch associated with the node specified in the service
request. The location of the turtle switches is shown in the following figures.
The C-Series Rack has two turtle switches.
Figure 1 U-Series racks


Figure 2 C-Series racks

2. Configure the service laptop hardwire Ethernet interface to use the static IP:
IP Address 192.168.219.250, Netmask 255.255.255.0
3. Connect the service laptop to turtle switch port 50 by using the RJ-45 cable,
highlighted in orange in the following figure. If the port is in use, label and
disconnect the existing cable.
Assign the IP address, 192.168.219.99 with the 255.255.255.0 subnet mask.


Figure 3 Turtle switch laptop connection

4. Test connectivity to the turtle switch 192.168.219.251. For example:

ping 192.168.219.251

Example output:

PING 192.168.219.251 (192.168.219.251) 56(84) bytes of data.


64 bytes from 192.168.219.251: icmp_seq=1 ttl=64 time=0.168 ms
64 bytes from 192.168.219.251: icmp_seq=2 ttl=64 time=0.143 ms
64 bytes from 192.168.219.251: icmp_seq=3 ttl=64 time=0.166 ms

Table 2 Switch rack private IP addresses

Description                              Name         Rack Private IP address   Admin Account
Arista 48-port Management Switch         turtle.rack  192.168.219.251           admin
Arista 24-port Primary Public Switch     rabbit.rack  192.168.219.252           admin
Arista 24-port Secondary Public Switch   hare.rack    192.168.219.253           admin

5. For the node identified in the service request, or <Node 1> for non-node-specific procedures, find the private IP address for the node in the following table:


Table 3 Node-rack private IP addresses

Node number   Node Name       Rack Private IP address   Admin Account
1             provo.rack      192.168.219.1             root/admin
2             sandy.rack      192.168.219.2             root/admin
3             orem.rack       192.168.219.3             root/admin
4             ogden.rack      192.168.219.4             root/admin
5             layton.rack     192.168.219.5             root/admin
6             logan.rack      192.168.219.6             root/admin
7             lehi.rack       192.168.219.7             root/admin
8             murray.rack     192.168.219.8             root/admin
9             boston.rack     192.168.219.9             root/admin
10            chicago.rack    192.168.219.10            root/admin
11            houston.rack    192.168.219.11            root/admin
12            phoenix.rack    192.168.219.12            root/admin
13            dallas.rack     192.168.219.13            root/admin
14            detroit.rack    192.168.219.14            root/admin
15            columbus.rack   192.168.219.15            root/admin
16            austin.rack     192.168.219.16            root/admin
17            memphis.rack    192.168.219.17            root/admin
18            seattle.rack    192.168.219.18            root/admin
19            denver.rack     192.168.219.19            root/admin
20            portland.rack   192.168.219.20            root/admin
21            tucson.rack     192.168.219.21            root/admin
22            atlanta.rack    192.168.219.22            root/admin
23            fresno.rack     192.168.219.23            root/admin
24            mesa.rack       192.168.219.24            root/admin

6. Connect to <Node 1> by using either of the following methods:


• The admin account from the Node-rack IP address table.
• Credentials from the customer by using an SSH client
Example connection:

ssh admin@192.168.219.1
Password:
Last login: Thu Mar 31 10:19:53 2016 from 10.250.102.247

sudo su


7. Display the ECS OS currently installed on each node in sequential order and
verify that the same OS version is found on every node:

viprexec -i 'rpm -qv ecs-os-base'

Example output for ECS 2.2.1 HF1:

Output from host : 192.168.219.5


ecs-os-base-2.2.1.0-1309.3719890.88.noarch

Output from host : 192.168.219.6


ecs-os-base-2.2.1.0-1309.3719890.88.noarch

Output from host : 192.168.219.7


ecs-os-base-2.2.1.0-1309.3719890.88.noarch

Output from host : 192.168.219.8


ecs-os-base-2.2.1.0-1309.3719890.88.noarch

If you are unsure that the displayed OS version indicates a valid version that you
can upgrade from, refer to the Introduction section of this document or the ECS
3.0 HF1 Release Notes for more detail.

CAUTION

If all nodes do not have the same OS version, contact EMC Technical
Support to open a Service Request. Do not proceed with the upgrade.

CAUTION

If the output does not contain any IPs, then the Nile Area Network (NAN)
is not installed or not in use at the site. NAN is used for node to node
communication. Contact EMC Technical Support to open a Service
Request. Do not proceed with the upgrade.

Verifying the xDoctor version


Verify that xDoctor is installed and is at the latest version across the ECS systems.

CAUTION

After the completion of an OS update, the installed version of xDoctor may have
reverted to a version earlier than the latest. It is vital to recheck the xDoctor
version and upgrade to the latest before proceeding.

Procedure
1. Log into the current Rack Master.

ssh master.rack

2. Check the xDoctor version.


sudo -i xdoctor --sysversion


In the following example, the xDoctor version is uniform on all nodes.

sudo -i xdoctor --sysversion


xDoctor Uniform on all nodes: 4.4-24

In the following example, the xDoctor version is not uniform on all nodes.

sudo -i xdoctor --sysversion


xDoctor Not Uniform on all nodes:
[4.4-17] -> ['169.254.1.1', '169.254.1.2', '169.254.1.4']
[4.4-10] -> ['169.254.1.3']

3. If the installed version of xDoctor listed in the above step is not the latest version, then the ECS xDoctor Users Guide, available in ECS SolVe, provides details on upgrading or reinstalling xDoctor.

sudo -i xdoctor -u -A

After you finish


If all nodes have the latest version, then continue with the next task in this procedure.

Checking ECS health using xDoctor


Procedure
1. Launch xDoctor and perform a Full Diagnosis Suite using the system scope
(default).

sudo -i xdoctor

For example:

sudo -i xdoctor
2015-10-27 18:25:16,149: xDoctor_4.4-24 - INFO: Initializing xDoctor v4.4-24 ...
2015-10-27 18:25:16,191: xDoctor_4.4-24 - INFO: Removing orphaned session -
session_1445968876.975
2015-10-27 18:25:16,193: xDoctor_4.4-24 - INFO: Starting xDoctor
session_1445970315.896 ... (SYSTEM)
2015-10-27 18:25:16,193: xDoctor_4.4-24 - INFO: Master Control Check ...
2015-10-27 18:25:16,242: xDoctor_4.4-24 - INFO: xDoctor Composition - Full Diagnostic
Suite for ECS
2015-10-27 18:25:16,364: xDoctor_4.4-24 - INFO: Session limited to 0:30:00
2015-10-27 18:25:16,465: xDoctor_4.4-24 - INFO: Validating System Version ...
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: |- xDoctor version is sealed to 4.4-24
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: |- System version is sealed to
1.2.0.0-417.6e959c4.75
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: Distributing xDoctor session files ...
2015-10-27 18:25:17,627: xDoctor_4.4-24 - INFO: Collecting data on designated nodes,
please be patient ... (update every 5 to 30 seconds)
2015-10-27 18:25:22,650: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:05
2015-10-27 18:25:32,698: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:15
2015-10-27 18:25:47,770: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:30
2015-10-27 18:26:07,870: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:50
2015-10-27 18:26:10,283: xDoctor_4.4-24 - INFO: Waiting for local data collectors ...
2015-10-27 18:26:20,324: xDoctor_4.4-24 - INFO: All data collected in 0:01:02
2015-10-27 18:26:20,325: xDoctor_4.4-24 - INFO: -----------------


. . . . . . . . . .
2015-10-27 18:26:37,013: xDoctor_4.4-24 - INFO: -------------------------
2015-10-27 18:26:37,013: xDoctor_4.4-24 - INFO: xDoctor session_1445970315.896
finished in 0:01:21
2015-10-27 18:26:37,027: xDoctor_4.4-24 - INFO: Successful Job:1445970315 Exit Code:
141

2. Determine the report archive for the xDoctor session executed in the previous
step.

sudo -i xdoctor -r | grep -a1 Latest

For example:

sudo -i xdoctor -r | grep -a1 Latest


Latest Report:
xdoctor -r -a 2015-10-27_183001

3. View the latest xDoctor report by using the output from the command in the previous step.
Add the -WEC option to display only "Warning, Error and Critical" events.

sudo -i xdoctor -r -a <archive date_time> -WEC

The following example shows a clean report with no events.

sudo -i xdoctor -r -a 2015-10-27_183001 -WEC

Displaying xDoctor Report (2015-10-27_183001) Filter:


['CRITICAL', 'ERROR', 'WARNING'] ...

The following example shows a report with an error.

sudo -i xdoctor -r -a 2015-10-27_210554 -WEC

Displaying xDoctor Report (2015-10-27_210554) Filter:


['CRITICAL', 'ERROR', 'WARNING'] ...

Timestamp = 2015-10-27_210554
Category = health
Source = fcli
Severity = ERROR
Message = Object Main Service not Healthy
Extra = 10.241.172.46
RAP = RAP014
Solution = 204179

4. If the report returns any Warning, Error, or Critical events, resolve those events before continuing with this procedure.
All xDoctor reported Warning, Error, and Critical events must be resolved
before proceeding. Contact ECS Global Technical Support (GTS) or Field
Support Specialists for assistance as required.


5. Logout from the master node.

logout

Disabling alerting
Disable dial home during the maintenance period.
Procedure
1. Temporarily disable connect home using xDoctor.

sudo -i xdoctor --tool --exec=connecthome_maintenance --method=disable

This command prevents the transmission of dial home alerts during the service
engagement.

For example,

2016-04-22 17:02:23,330: xDoctor_4.4-24 - INFO: Executing xDoctor Tool:


[connecthome_maintenance], using Method: [disable], Options: [] and Args: []
2016-04-22 17:02:31,108: xDoctor_4.4-24 - INFO: Request to go into a Maintenance
Window. Disabling ConnectHome ...
2016-04-22 17:02:31,109: xDoctor_4.4-24 - INFO: Disabling ConnectHome, getting into a
Maintenance Window
2016-04-22 17:02:37,162: xDoctor_4.4-24 - INFO: xDoctor Alerting successfully
disabled ...
2016-04-22 17:02:37,309: xDoctor_4.4-24 - INFO: Successful activation of the
reverting flag ...
2016-04-22 17:02:37,310: xDoctor_4.4-24 - INFO: Successfully disabled ConnectHome ...

2. Notify the customer to disable all configured notification policies (email, SNMP, and/or rsyslog) through the ECS GUI during this service activity.

Note

The above action ensures that the customer's ECS monitoring process will not be flooded with events caused by the service activity.

3. Once you see the message Successfully disabled ConnectHome and the customer has disabled (if elected) any configured notification policies, continue to the next task of this procedure.

Preparing for multi-node command tool usage


The viprscp and viprexec commands enable file distribution to, and command execution on, multiple nodes.
The viprscp and viprexec commands require a MACHINES file. The file lists the following:
• Node, rack, and private IP addresses
• Shared SSH keys (passwordless)
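
For the four-node rack used in this document's examples, the MACHINES file typically contains one rack-private IP address per line, for example (illustrative only; the actual file is created and displayed in the next task):

192.168.219.1
192.168.219.2
192.168.219.3
192.168.219.4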


Verify the output from all nodes to ensure that the viprexec command returns
successful output from every node.
Procedure
1. Check that the system has visibility to all the nodes you intend to update. Using PuTTY or a similar tool, SSH to <Node 1> as the administrative user:

cd /var/tmp

sudo getrackinfo

Example output:

Node private     Node              Public                              RMM
Ip Address       Id  Status  Mac                Ip Address       Mac                Ip Address       Node Name
===============  ==  ======  =================  ===============  =================  ===============  =========
192.168.219.1    1   MA      00:1e:67:ab:62:84  123.249.249.211  00:1e:67:96:3e:5d  123.249.249.201  provo-red
192.168.219.2    2   SA      00:1e:67:ab:62:94  123.249.249.212  00:1e:67:96:40:79  123.249.249.202  sandy-red
192.168.219.3    3   SA      00:1e:67:ab:62:80  123.249.249.213  00:1e:67:96:40:1f  123.249.249.203  orem-red
192.168.219.4    4   SA      00:1e:67:ab:62:a4  123.249.249.214  00:1e:67:96:40:33  123.249.249.204  ogden-red
Status:
M - Master, S - Slave
E - Epoxy
I - Initializing, U - Updating, A - Active
P - On, O - Off
! - Warning/Error list:
1 - Hostname set to default hostname set by installer

2. Create the MACHINES file.

sudo getrackinfo -c MACHINES

3. Verify that the MACHINES file has all nodes:

cat /var/tmp/MACHINES

4. Copy the MACHINES file to all nodes:

viprscp -X /var/tmp/MACHINES /var/tmp

5. Verify the MACHINES file is working on all nodes:

viprexec -i "pingall"


If the MACHINES file is empty, the command returns no output. If a MACHINES file is not found, the following error displays:

no suitable MACHINES file found
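
As an optional quick check (a minimal sketch), confirm that the file is present and non-empty on every node; expect one line per node in the rack:

viprexec 'wc -l /var/tmp/MACHINES'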

6. Verify the RMM settings and create a backup copy of the RMM settings.

viprexec -i 'ipmitool lan print 3|egrep -i "ip|gate|vlan|mask"' > /home/admin/RMM.bk

CAUTION

If the ipmitool cannot reach all nodes in the rack, do not continue with the
procedure.

7. Remain in the SSH session on <Node 1> and in the /var/tmp directory.

Check if Compliance is enabled and working properly


Use this procedure to determine if the Compliance feature is currently enabled and
working. If so, you will need to re-enable the Compliance monitoring setting before the
Fabric and Services upgrade.
Procedure
1. Determine if the cluster is configured as a Compliance cluster:

grep "security\|compliance" /opt/emc/caspian/fabric/agent/conf/agent_customize.conf

Sample output:

[fabric.agent.security]
compliance_enabled = true

If Compliance is enabled, then continue with Step 2.


2. Determine if Compliance is working properly:

viprexec "sudo /opt/emc/caspian/fabric/cli/bin/fcli lifecycle cluster.compliance"

If Compliance is working properly, you will see COMPLIANT in the output for
each node. If any node is listed as NON-COMPLIANT, then halt the upgrade and
contact ECS Technical Support.
3. Document the compliance info in a file that will be accessible by the person
doing the services upgrade. (There is no quick way to determine if a cluster is a
Compliance cluster after the OS update procedure is complete.)
a. Switch to the /var/tmp/upgrade directory or create it if needed.
b. Create a file named compliance.info.
c. Add a statement to the file like "This is a Compliance cluster." or
"This is NOT a Compliance cluster."


Verify the disks


It is important to verify that all disks are in the Good state before beginning this
procedure.
Procedure
1. Verify that the state of all drives is Good.

viprexec "cs_hal list vols | grep -v GOOD"

Example output for one node:

Volume(s):
SCSI Device  Block Device  FS UUID                                Type  Slot  Label  Partition Name  SMART  Mount Point
-----------  ------------  -------------------------------------  ----  ----  -----  --------------  -----  -----------
/dev/sg0     /dev/sda1     d85465e1-cf40-46f1-90fe-e4a04e5c3d17   ext3  0     BOOT   n/a                    /boot
/dev/sg0     /dev/sda2     n/a                                    n/a   0     n/a    n/a                    n/a

total: 62

Because grep excludes the string GOOD, the command lists everything other than GOOD storage disks. Here you see only system disks listed; system disks do not report a status. Since no BAD or SUSPECT disks are reported, all storage disks are GOOD.
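
If you prefer to list only problem disks directly, a minimal variant of the same check (assuming the BAD and SUSPECT status strings mentioned above) is:

viprexec "cs_hal list vols | egrep 'BAD|SUSPECT'"

No storage disks should be reported.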

Record reference information


This section explains how to collect the node IP addresses, FQDNs, and containers for
all racks being updated. All the information generated in this section should be saved
in a file that can be easily referenced by you and future support engineers.
Procedure
1. Run the following command to list the node IP addresses and other info:

viprexec ip ad show public

Example output:

Output from host : 192.168.219.1


12: public: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 00:1e:67:e3:0f:12 brd ff:ff:ff:ff:ff:ff
inet 10.241.207.57/24 scope global public
valid_lft forever preferred_lft forever
inet6 fe80::21e:67ff:fee3:f12/64 scope link
valid_lft forever preferred_lft forever

Output from host : 192.168.219.4


12: public: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 00:1e:67:e3:0c:26 brd ff:ff:ff:ff:ff:ff
inet 10.241.207.60/24 scope global public
valid_lft forever preferred_lft forever
inet6 fe80::21e:67ff:fee3:c26/64 scope link
valid_lft forever preferred_lft forever

Output from host : 192.168.219.2


12: public: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 00:1e:67:e3:0c:fa brd ff:ff:ff:ff:ff:ff
inet 10.241.207.58/24 scope global public
valid_lft forever preferred_lft forever
inet6 fe80::21e:67ff:fee3:cfa/64 scope link
valid_lft forever preferred_lft forever

Output from host : 192.168.219.3


12: public: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 00:1e:67:e3:09:f6 brd ff:ff:ff:ff:ff:ff
inet 10.241.207.59/24 scope global public
valid_lft forever preferred_lft forever
inet6 fe80::21e:67ff:fee3:9f6/64 scope link
valid_lft forever preferred_lft forever

2. Run the following command to list the FQDNs:

viprexec -i 'hostname -f'

Example output:

Output from host: 192.168.219.1


layton-melon.yourcompany.com

Output from host: 192.168.219.2


murray-melon.yourcompany.com

Output from host: 192.168.219.3


logan-melon.yourcompany.com

Output from host: 192.168.219.4


lehi-melon.yourcompany.com

3. Run the following command to list the containers:

viprexec -i 'docker ps|grep -v NAMES|awk "{print \$NF}"|tr "\n" " " && echo -ne "\n"'

Expected output:

Output from host: 192.168.219.21


object-main fabric-lifecycle fabric-zookeeper fabric-registry

Output from host: 192.168.219.22


object-main fabric-lifecycle fabric-zookeeper

Output from host: 192.168.219.23


object-main fabric-lifecycle fabric-zookeeper


Output from host: 192.168.219.24


object-main

4. Run the following command:

viprexec -i 'docker ps -a'

The output should show that all containers are up and running. If any container is not up and running, investigate the problem before proceeding with the upgrade. (An optional filter shortcut follows the sample output below.)

admin@provo-plum:~> viprexec -i "docker ps -a"

Output from host : 192.168.219.1


CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
bcff72ccd8f6 780c449fdd0e "/opt/vipr/boot/boot." 3 days ago
Up 3 days object-main
46750e0b7259 a07479a114af "./boot.sh lifecycle" 3 days ago
Up 3 days fabric-lifecycle
e11b44351b3d 32cce433c3dc "./boot.sh 1 1=169.25" 3 days ago
Up 3 days fabric-zookeeper
8df446ce8305 524f8808202b "./boot.sh" 3 days ago
Up 3 days fabric-registry

Output from host : 192.168.219.2


CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
fd9760df40df 780c449fdd0e "/opt/vipr/boot/boot." 3 days ago
Up 3 days object-main
1f8faa613a69 a07479a114af "./boot.sh lifecycle" 3 days ago
Up 3 days fabric-lifecycle
bd3a43cc38c0 32cce433c3dc "./boot.sh 2 1=169.25" 3 days ago
Up 3 days fabric-zookeeper

Output from host : 192.168.219.3


CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
1d237251a9c9 780c449fdd0e "/opt/vipr/boot/boot." 3 days ago
Up 3 days object-main
3baec7f83974 a07479a114af "./boot.sh lifecycle" 3 days ago
Up 3 days fabric-lifecycle
f99b9ae4b1eb 32cce433c3dc "./boot.sh 3 1=169.25" 3 days ago
Up 3 days fabric-zookeeper

Output from host : 192.168.219.4


CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
69d729d7681a 780c449fdd0e "/opt/vipr/boot/boot." 3 days ago
Up 3 days object-main
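
As an optional shortcut (a sketch; it assumes the Docker version on the nodes supports the status filter), list only containers that are not running; no containers should be reported:

viprexec -i 'docker ps -a --filter "status=exited"'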

5. Collect the node UID/Agent ID information and save it for possible analysis in
the event of failure. Run the following command:

viprexec -i 'dmidecode -s system-uuid'


Example output:

Output from host: 192.168.219.1


C1A71276-DCA4-E111-AC62-001E674DA2DE

Output from host: 192.168.219.2


015BCC13-11EE-E111-BE34-001E675A8E71

Output from host: 192.168.219.3


412734ED-CBA4-E111-A136-001E674D9F9B

Output from host: 192.168.219.4


110B451E-CDA3-E111-B5C6-001E674DA202

6. List the kernel version on each node:

viprexec 'uname -a'

Example output:

192.168.219.1: Linux provo-sage 3.12.53-60.30-default #1 SMP Wed Feb 10 14:41:46 UTC


2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
192.168.219.2: Linux sandy-sage 3.12.53-60.30-default #1 SMP Wed Feb 10 14:41:46 UTC
2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
192.168.219.3: Linux orem-sage 3.12.53-60.30-default #1 SMP Wed Feb 10 14:41:46 UTC
2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
192.168.219.4: Linux ogden-sage 3.12.53-60.30-default #1 SMP Wed Feb 10 14:41:46 UTC
2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
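
To keep all of this reference information in one place, a minimal sketch follows (the output file name and location are assumptions; the commands are the ones used in this section):

{ viprexec ip ad show public
  viprexec -i 'hostname -f'
  viprexec -i 'docker ps -a'
  viprexec -i 'dmidecode -s system-uuid'
  viprexec 'uname -a'
} > /var/tmp/pre-upgrade-reference.$(date +"%Y%m%d-%H%M%S").txt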

Prevent install during restart


Procedure
1. To avoid install during restart for upgrade, archive the PXE menu:

viprexec 'mv /srv/tftpboot/pxelinux.cfg/default /srv/tftpboot/pxelinux.cfg/old.file'

2. Verify that the PXE menu is archived:

viprexec 'ls -l /srv/tftpboot/pxelinux.cfg/'

Example output for each node:

-rw-r--r-- 1 root root 526 Nov 25 14:35 old.file


Turn off the master role as rack install server


Procedure
1. To disable the rack install server, run the following command:

sudo setrackinfo -p RackInstallServer no

2. To verify that the rack install server is disabled, run the following command:

sudo getrackinfo -p RackInstallServer

Expected output:

no

Set the ignore list on all nodes


Procedure
1. To reconstruct the DHCP ignore list so that PXE requests are ignored by the
master:

for mac in $(sudo getrackinfo -v | egrep "private[ ]+:"|awk '{print $3}'); do sudo setrackinfo --installer-ignore-mac $mac; done

2. Verify that the number of MAC addresses and entries match the node count
and that the status is "Done" for each. Verifying this will ensure that the nodes
will not be able to PXE boot. You will verify the addresses in the next step:

sudo getrackinfo -i

Example output from a four-node system:

Rack Installer Status


=====================
Mac Name Port Ip Status
00:1e:67:96:3e:59 provo 1 none Done!
00:1e:67:96:40:75 sandy 2 none Done!
00:1e:67:96:40:1b orem 3 none Done!
00:1e:67:96:40:2f ogden 4 none Done!

This output verifies that the dnsmasq.dhcpignore file on the node contains a
MAC address for each node.


3. Check the dnsmasq.dhcpignore file:

viprexec cat /etc/dnsmasq.dhcpignore/all | sort | uniq -c

Using the command output, check that the count (4) before each MAC matches the node count for the rack. The MAC addresses must also match those from the output of the getrackinfo -i command in step 2.
Example output:

4
4 00:1e:67:96:3e:59,ignore # (port 1) provo
4 00:1e:67:96:40:1b,ignore # (port 3) orem
4 00:1e:67:96:40:2f,ignore # (port 4) ogden
4 00:1e:67:96:40:75,ignore # (port 2) sandy
1 Output from host : 192.168.219.1
1 Output from host : 192.168.219.2
1 Output from host : 192.168.219.3
1 Output from host : 192.168.219.4

CAUTION

If the MAC addresses here do not match the MAC addresses shown in step
2, halt the procedure and open a Service Request with EMC Technical
Support.
4. Confirm that the ignore files are the same on all the nodes by comparing
md5sums:

viprexec md5sum /etc/dnsmasq.dhcpignore/all

CAUTION

If the md5sums are not the same for all nodes, the files are not identical. If
they are not identical, halt the upgrade and open a Service Request with
EMC Technical Support.

Sample output:

Output from host : 192.168.219.1


b5d076b7566a89553b49e2cfdef3e1f9 /etc/dnsmasq.dhcpignore/all
Output from host : 192.168.219.3
b5d076b7566a89553b49e2cfdef3e1f9 /etc/dnsmasq.dhcpignore/all
Output from host : 192.168.219.2
b5d076b7566a89553b49e2cfdef3e1f9 /etc/dnsmasq.dhcpignore/all
Output from host : 192.168.219.4
b5d076b7566a89553b49e2cfdef3e1f9 /etc/dnsmasq.dhcpignore/all

The checksums at your site will be different from those shown in this example.
Make sure the checksums for all your nodes match.
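
As an optional cross-check (a sketch based on the output format shown above), count the unique checksums reported across all nodes; the result should be 1:

viprexec md5sum /etc/dnsmasq.dhcpignore/all | awk '/dhcpignore/ {print $1}' | sort -u | wc -l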

Preserve the previous refit bundle and history


This section explains how to archive the upgrade package zip and bundle .tbz files,
so the files do not interfere with upgrade commands. It also explains how to move the pre-upgrade and post-upgrade logs from previous upgrades into a time-stamped archive directory.
Procedure
1. Run the following commands from the /var/tmp directory in the SSH session
on <Node 1>:

viprexec '[ -d /var/tmp/refit.d ] && mv /var/tmp/refit /var/tmp/refit.d/. 2>/dev/null'

viprexec '[ -d /var/tmp/refit.d ] && mv /var/tmp/*update* /var/tmp/refit.d/. 2>/dev/null'

viprexec '[ -d /var/tmp/refit.d ] && mv /var/tmp/MD5SUMS /var/tmp/refit.d/. 2>/dev/null'

viprexec '[ -d /var/tmp/refit.d ] && mv /var/tmp/refit.d /var/tmp/refit.d.$(date +"%Y%m%d-%H%M%S")'

2. Stay in the /var/tmp directory in the SSH session on <Node 1> for the
remaining OS update procedure.

Verify the update bundle on <Node 1>


Note

Log in to nodes with the administrator account. The default credentials are admin/
ChangeMe.

Procedure
1. Copy the OS update .zip file to the /var/tmp directory on <Node 1> by using pscp.exe or a similar copy tool. The OS update zip file has a name in the format ecs-os-update-<version>.zip:

pscp ecs-os-update-<version>.zip root@192.168.219.1:/var/tmp/ecs-os-update-<version>.zip

2. SSH to <Node 1> as the administrative user (admin).


3. Prepare the OS update files by running the following commands:

cd /var/tmp

unzip ecs-os-update-<version>.zip

chmod +x /var/tmp/refit


4. Verify the checksums of the files from the bundle:

md5sum -c MD5SUMS

Distribute the OS update files to all nodes


Procedure
1. Distribute the refit script and the update bundle to all nodes:

viprscp /var/tmp/refit /usr/local/bin/

viprscp /var/tmp/*update.tbz /var/tmp/

2. Verify the md5sum matches on all nodes.

viprexec 'md5sum /var/tmp/*update.tbz'

viprexec md5sum /usr/local/bin/refit

if [ -f MD5SUMS ]; then cat MD5SUMS; fi

The first sum that is shown by the third command must match the sum output
by the first command for each node. The second sum that is shown by the third
command must match the sum output by the second command for each node.

viprexec 'md5sum /var/tmp/*update.tbz'

Output from host : 192.168.219.2


2f4f9e07fabff6f7bb3a429192e31897 /var/tmp/ecs-os-setup-
target.x86_64-2.1196.578.update.tbz

Output from host : 192.168.219.4


2f4f9e07fabff6f7bb3a429192e31897 /var/tmp/ecs-os-setup-
target.x86_64-2.1196.578.update.tbz

Output from host : 192.168.219.1


2f4f9e07fabff6f7bb3a429192e31897 /var/tmp/ecs-os-setup-
target.x86_64-2.1196.578.update.tbz

Output from host : 192.168.219.3

2f4f9e07fabff6f7bb3a429192e31897 /var/tmp/ecs-os-setup-
target.x86_64-2.1196.578.update.tbz

provo-yellow:/var/tmp # viprexec md5sum /usr/local/bin/refit

Output from host : 192.168.219.1


8598a779130252e8272e99e44f9083da /usr/local/bin/refit

Output from host : 192.168.219.4


8598a779130252e8272e99e44f9083da /usr/local/bin/refit


Output from host : 192.168.219.2


8598a779130252e8272e99e44f9083da /usr/local/bin/refit

Output from host : 192.168.219.3


8598a779130252e8272e99e44f9083da /usr/local/bin/refit

provo-yellow:/var/tmp # if [ -f MD5SUMS ]; then cat MD5SUMS; fi


2f4f9e07fabff6f7bb3a429192e31897 ecs-os-setup-target.x86_64-2.1196.578.update.tbz
8598a779130252e8272e99e44f9083da refit
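
As an optional cross-check (a sketch based on the commands and output format above), list the unique sums reported across all nodes and compare them against MD5SUMS; you should see exactly the two sums from MD5SUMS:

viprexec 'md5sum /var/tmp/*update.tbz /usr/local/bin/refit' | awk 'NF==2 {print $1}' | sort -u
cat MD5SUMS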

3. Capture the RPM version info. The first command lists the last 60 installed RPMs; the third command displays, embedded in the file names, the version information for the hosts to be updated.

viprexec -i refit preupdateversions

viprexec -i 'grep REFIT /var/tmp/refit.d/*-preupdateversions.log'

viprexec -i 'ls -al /var/tmp/preupdateversions*.log'

4. Create the local repository on all nodes from the OS update bundle:
a. Run the following command to deploy the bundle, which can take up to 5 minutes:

viprexec 'refit deploybundle /var/tmp/ecs-os-setup-target.x86_64-*.update.tbz'

Removing old PXE files


sudo rm -f /srv/www/htdocs/image/*.{md5,xz}
Removing old repos.d files
[ -d /etc/zypp/repos.d.keep ] || sudo mkdir -p /etc/zypp/repos.d.keep
sudo mv /etc/zypp/repos.d/* /etc/zypp/repos.d.keep/.
Removing old repo packages
sudo rm -f /srv/www/htdocs/repo/*.rpm
Untarring update bundle... this can take up to 5 min
sudo tar xvf /var/tmp/ecs-os-setup-target.x86_64-2.1269.57.update.tbz
Retrieving repository 'repo' metadata [.done]
Building repository 'repo' cache [....done]
All repositories have been refreshed.
Loading repository data...
Reading installed packages...
'createrepo' is already installed.
No update candidate for 'createrepo-0.10.3-2.8.x86_64'. The highest available
version is already installed.
Resolving package dependencies...

Nothing to do.
Removing unnedded Hal and HWmgr
/srv/www/htdocs/repo / ~
Saving Primary metadata
Saving file lists metadata
Saving other metadata
/ ~
~
sudo zypper ref


Bundled deployed to host. Proceed with update.


run 'refit summary' for more instructions.

b. Verify the deployment:

viprexec 'grep REFIT /var/tmp/refit.d/*-deploybundle.log'

Example output for each node:

REFIT SUCCESS running rm -f /srv/www/htdocs/image/*.{md5,xz}


REFIT SUCCESS running [ -d /etc/zypp/repos.d.keep ] || mkdir -p /etc/zypp/
repos.d.keep
REFIT SUCCESS running mv /etc/zypp/repos.d/* /etc/zypp/repos.d.keep/.
REFIT SUCCESS running rm -f /srv/www/htdocs/repo/*.rpm
REFIT SUCCESS running tar xvf /var/tmp/ecs-os-setup-target.x86_64-
1.398.58.update.tbz
REFIT SUCCESS running zypper ref

5. Run the following command to validate the "repo" RPM repository is present
and that only one such repository is present on each node:

viprexec zypper lr

Sample output:

Output from host : 192.168.219.5


# | Alias | Name | Enabled | GPG Check | Refresh
--+-------+------+---------+-----------+--------
1 | repo | repo | Yes | ( ) No | Yes

Output from host : 192.168.219.7


# | Alias | Name | Enabled | GPG Check | Refresh
--+-------+------+---------+-----------+--------
1 | repo | repo | Yes | ( ) No | Yes

Output from host : 192.168.219.8


# | Alias | Name | Enabled | GPG Check | Refresh
--+-------+------+---------+-----------+--------
1 | repo | repo | Yes | ( ) No | Yes

Output from host : 192.168.219.6


# | Alias | Name | Enabled | GPG Check | Refresh
--+-------+------+---------+-----------+--------
1 | repo | repo | Yes | ( ) No | Yes

6. Verify that the PXE media is in place for post-update node rebuilds, or rack
expansions:

viprexec 'ls -lR /srv/tftpboot/*/ /srv/www/htdocs/image/'

Example output for each node:

Output from host : 192.168.219.5


/srv/tftpboot/boot/:


total 93556
drwxr-xr-x 2 root root 31 Nov 25 19:02 .
drwxr-xr-x 4 root root 68 Nov 23 17:40 ..
-rw-r--r-- 1 root root 90885996 Nov 8 18:39 initrd
-rw-r--r-- 1 root root 4911968 Nov 8 18:11 linux

/srv/tftpboot/pxelinux.cfg/:
total 4
drwxr-xr-x 2 root root 20 Aug 12 21:10 .
drwxr-xr-x 4 root root 68 Nov 23 17:40 ..
-rw-r--r-- 1 root root 530 Aug 12 20:55 old.file

/srv/www/htdocs/image/:
total 836772
drwxr-xr-x 2 root root 99 Nov 25 19:02 .
drwxr-xr-x 4 root root 29 Nov 23 17:40 ..
-rw-r--r-- 1 root root 59 Nov 8 18:38 ecs-os-setup-
target.x86_64-2.481.121.md5
-rw-r--r-- 1 root root 856845664 Nov 8 18:38 ecs-os-setup-
target.x86_64-2.481.121.xz

You should see initrd and linux in the boot directory and an md5 and
compressed file in the image directory. The image directory may have additional
files.

Note

If this system had a 2.2.1 > 2.2.1 HF1 quickfit update performed on it, you will also see a second set of initrd and linux on each node.

Sample output from one node:

Output from host : 192.168.219.5


/srv/tftpboot/2.2.1.0-1281.e8416b8.68/:
total 93908
-rw-r--r-- 1 root root 91186138 Sep 8 18:37 initrd
-rw-r--r-- 1 root root 4968544 Sep 8 18:37 linux

/srv/tftpboot/boot/:
total 93636
-rw-r--r-- 1 root root 90903648 Sep 7 15:39 initrd
-rw-r--r-- 1 root root 4975344 Sep 7 15:09 linux

/srv/tftpboot/pxelinux.cfg/:
total 4
-rw-r--r-- 1 root root 526 Jun 20 15:53 old.file

/srv/www/htdocs/image/:
total 1662120
-rw-r--r-- 1 root root 59 Sep 7 15:38 ecs-os-setup-
target.x86_64-3.1429.666.md5
-rw-r--r-- 1 root root 1702001696 Sep 7 15:38 ecs-os-setup-
target.x86_64-3.1429.666.xz
-rw-rw-rw- 1 root root 374 Sep 15 21:55 preset.cfg


Verify remote IPMI management functionality


Use the refit command to verify if remote IPMI management works. The command
syntax is shown below:

refit ipmipower_all_not_node_x <lower> <upper> <x> <action>

Where:
• <lower> is the node ID* of the lowest port device on your shared 1G turtle switch
• <upper> is the node ID* of the highest port device on your shared 1G turtle switch
• <x> is the node ID* of the node where you are executing the IPMI command
• <action> is the IPMI power action: on|off|status
• *The getrackinfo command provides the node ID
In this example, you will get the status for all nodes in an eight-node system. After you
enter a command, you will be prompted to press Enter. The command then completes
and displays output:

refit ipmipower_all_not_node_x 1 8 1 status
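
For instance, on the four-node rack used in this document's sample output, the corresponding status check (illustrative; the node IDs come from getrackinfo) would be:

refit ipmipower_all_not_node_x 1 4 1 status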

Procedure
1. Get the status of <Node 1> and press Enter when prompted:

refit ipmipower_node_x 1 status

2. Do not proceed if communication with the BMC using IPMI commands cannot obtain status for every node.

Check for NFS mounts


NFS mounts can interfere with the upgrade procedure.
During the planning for the upgrade, any customer NFS mounts should have been
identified to ensure that they can be disconnected at the time of upgrade without
delay. Ensure now that these previously identified NFS mounts have been unmounted.
If not, stop and work with the customer to unmount them. Do not proceed with the
update until all mounts are unmounted.
Procedure
1. Check for NFS mounts:

viprexec -i "cat /proc/mounts | grep nfs"

Example output showing an NFS mount on <Node 1>:

Output from host : 192.168.219.1


10.245.100.10:/ans/test /tmp/test nfs


rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,nolock,proto=tcp,timeo=60
0,retrans=2,sec=sys,mountaddr=10.245.100.10,mountvers=3,mountport=2049,mountproto=udp,
local_lock=all,addr=10.245.100.10 0 0

Output from host : 192.168.219.2

Output from host : 192.168.219.3

Output from host : 192.168.219.4

2. Before shutting down ECS services, the NFS mounts need to be unmounted:

CAUTION

The customer should perform the NFS unmount.

sudo umount /tmp/test

Where /tmp/test is the name of the mount.


3. Re-check for NFS mounts:

viprexec -i "cat /proc/mounts | grep nfs"

Example output showing no mounts:

Output from host : 192.168.219.1

Output from host : 192.168.219.2

Output from host : 192.168.219.3

Output from host : 192.168.219.4

Offline upgrade
For the offline upgrade, upgrade the nodes in parallel by using the ViPR multi-node tools and the /var/tmp/MACHINES file that was created earlier.
After the upgrades are deployed, restart all nodes except <Node 1>. Monitor the restarts from <Node 1>. When all these nodes are restarted, use an SSH or ESRS connection to another node, like <Node 2>, to monitor the <Node 1> restart.
Access any node in the rack over SSH by using its IP address or hostname (see the node-rack reference tables). Use .rack as the FQDN suffix for the rack-local hostnames to differentiate them from the customer DNS environment.
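
For example, one simple way to watch <Node 1> come back during its reboot, as a sketch (the admin account and rack-private IPs come from the node-rack table earlier; ping is only one way to monitor the node):

ssh admin@192.168.219.2
ping 192.168.219.1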

Exit containers
In this procedure, you will exit the containers to preserve any customer-specific
modifications made to the current containers.
Procedure
1. From the service laptop, using Putty or a similar tool, SSH to <Node 1>.


2. Copy the OSUpgradeExitContainers_offline.py script from the /Tools directory of the ECS section of the EMC SolVe Desktop to the /var/tmp directory on <Node 1> (the node that is being upgraded) by using a copy tool, such as PSCP:

pscp -pw <root password> "C:\ProgramData\EMC\SoLVeDesktop\ECS.SOLVE\tools\OSUpgradeExitContainers_offline.py" root@192.168.219.1:/var/tmp/OSUpgradeExitContainers_offline.py

Expected output:

OSUpgradeExitContainers_offline.py | 3 KB | 2.9 kB/s | ETA: 00:00:00 | 100%

3. Verify the OSUpgradeExitContainers_offline.py md5sum:

md5sum /var/tmp/OSUpgradeExitContainers_offline.py

Expected output:

9dde8ae0775482f1e405de4d7598d239 OSUpgradeExitContainers_offline.py

4. Configure permissions to run the script:

cd /var/tmp; chmod +755 /var/tmp/OSUpgradeExitContainers_offline.py

5. Exit the containers by running the OSUpgradeExitContainers_offline.py script from the /var/tmp directory:

python OSUpgradeExitContainers_offline.py --ECSbuild=2.2.1

This script executes many commands to exit containers on the nodes.

Sample output:

admin@provo-pineapple:/var/tmp> python OSUpgradeExitContainers_offline.py --ECSBuild=2.2.1
Executing command viprexec '/opt/emc/caspian/fabric/cli/bin/fcli maintenance list'

Output from host : 192.168.219.2


AGENT ID HOSTNAME ACTUAL
MODE
fc50a133-6eae-444b-b3d7-1b8647b64e88 sandy-pineapple.ecs.lab.emc.com ACTIVE

Output from host : 192.168.219.4


AGENT ID HOSTNAME ACTUAL
MODE
238f91e9-08d6-4493-a3c1-51da90a89cbc ogden-pineapple.ecs.lab.emc.com ACTIVE


Output from host : 192.168.219.1


AGENT ID HOSTNAME ACTUAL
MODE
3511d9e6-4ead-4de7-ab7e-6399ba34e08f provo-pineapple.ecs.lab.emc.com ACTIVE

Output from host : 192.168.219.3


AGENT ID HOSTNAME ACTUAL
MODE
a2a48af2-0471-4e5c-9ab1-78214183a6e0 orem-pineapple.ecs.lab.emc.com ACTIVE

Executing command viprexec 'systemctl stop fabric-agent'

Output from host : 192.168.219.4

Output from host : 192.168.219.2

Output from host : 192.168.219.3

Output from host : 192.168.219.1

wait for 20 sec


Executing command viprexec 'systemctl status fabric-agent'

Output from host : 192.168.219.4


fabric-agent.service - fabric agent
Loaded: loaded (/usr/lib/systemd/system/fabric-agent.service; enabled)
Active: failed (Result: exit-code) since Mon 2016-12-19 07:26:07 UTC; 20s ago
Process: 38332 ExecStart=/opt/emc/caspian/fabric/agent/bin/fabric-agent
(code=exited, status=143)
Process: 38330 ExecStartPre=/bin/rm -f /var/run/fabric-agent.pid (code=exited,
status=0/SUCCESS)
Main PID: 38332 (code=exited, status=143)

Dec 18 10:32:20 ogden-pineapple libviprhal[124038]: Could not get number of CPUS from
dmidecode
Dec 18 10:32:20 ogden-pineapple libviprhal[124038]: executed cli: /sbin/lspci
terminated unexpectedly, rc: 13
Dec 18 10:32:20 ogden-pineapple libviprhal[124038]: Could not get number of NICs from
lspci
Dec 18 10:32:20 ogden-pineapple libviprhal[124038]: Error accessing
callhome.properties file /opt/emc/caspian/fabric/agent/data/callhome.properties.
Dec 18 10:32:20 ogden-pineapple libviprhal[124038]: dockerCmd: docker images --no-
trunc | grep 780c449fdd0e
Dec 19 07:17:38 ogden-pineapple libviprhal[38332]: ioctl 2285(0x12) failed on /dev/
sg3 with 12: Cannot allocate memory: info=0x0; size=512
Dec 19 07:26:07 ogden-pineapple systemd[1]: Stopping fabric agent...
Dec 19 07:26:07 ogden-pineapple systemd[1]: fabric-agent.service: main process
exited, code=exited, status=143/n/a
Dec 19 07:26:07 ogden-pineapple systemd[1]: Stopped fabric agent.
Dec 19 07:26:07 ogden-pineapple systemd[1]: Unit fabric-agent.service entered failed
state.

Output from host : 192.168.219.3


fabric-agent.service - fabric agent
Loaded: loaded (/usr/lib/systemd/system/fabric-agent.service; enabled)
Active: failed (Result: exit-code) since Mon 2016-12-19 07:26:07 UTC; 20s ago
Process: 38280 ExecStart=/opt/emc/caspian/fabric/agent/bin/fabric-agent
(code=exited, status=143)
Process: 38278 ExecStartPre=/bin/rm -f /var/run/fabric-agent.pid (code=exited,
status=0/SUCCESS)
Main PID: 38280 (code=exited, status=143)

Dec 18 10:32:20 orem-pineapple libviprhal[13559]: Could not get number of NICs from
lspci
Dec 18 10:32:20 orem-pineapple libviprhal[13559]: Error accessing callhome.properties


file /opt/emc/caspian/fabric/agent/data/callhome.properties.
Dec 18 10:32:20 orem-pineapple libviprhal[13559]: dockerCmd: docker images --no-trunc
| grep 780c449fdd0e
Dec 18 10:32:20 orem-pineapple libviprhal[13559]: dockerCmd: docker images --no-trunc
| grep a07479a114af
Dec 18 10:32:20 orem-pineapple libviprhal[13559]: dockerCmd: docker images --no-trunc
| grep 32cce433c3dc
Dec 19 07:17:37 orem-pineapple libviprhal[38280]: ioctl 2285(0x12) failed on /dev/sg3
with 12: Cannot allocate memory: info=0x0; size=512
Dec 19 07:26:07 orem-pineapple systemd[1]: Stopping fabric agent...
Dec 19 07:26:07 orem-pineapple systemd[1]: fabric-agent.service: main process exited,
code=exited, status=143/n/a
Dec 19 07:26:07 orem-pineapple systemd[1]: Stopped fabric agent.
Dec 19 07:26:07 orem-pineapple systemd[1]: Unit fabric-agent.service entered failed
state.

Output from host : 192.168.219.2


fabric-agent.service - fabric agent
Loaded: loaded (/usr/lib/systemd/system/fabric-agent.service; enabled)
Active: failed (Result: exit-code) since Mon 2016-12-19 07:26:07 UTC; 20s ago
Process: 38264 ExecStart=/opt/emc/caspian/fabric/agent/bin/fabric-agent
(code=exited, status=143)
Process: 38261 ExecStartPre=/bin/rm -f /var/run/fabric-agent.pid (code=exited,
status=0/SUCCESS)
Main PID: 38264 (code=exited, status=143)

Dec 18 10:32:20 sandy-pineapple libviprhal[43000]: Could not get number of NICs from
lspci
Dec 18 10:32:20 sandy-pineapple libviprhal[43000]: Error accessing
callhome.properties file /opt/emc/caspian/fabric/agent/data/callhome.properties.
Dec 18 10:32:20 sandy-pineapple libviprhal[43000]: dockerCmd: docker images --no-
trunc | grep 780c449fdd0e
Dec 18 10:32:20 sandy-pineapple libviprhal[43000]: dockerCmd: docker images --no-
trunc | grep a07479a114af
Dec 18 10:32:20 sandy-pineapple libviprhal[43000]: dockerCmd: docker images --no-
trunc | grep 32cce433c3dc
Dec 19 07:17:38 sandy-pineapple libviprhal[38264]: ioctl 2285(0x12) failed on /dev/
sg3 with 12: Cannot allocate memory: info=0x0; size=512
Dec 19 07:26:07 sandy-pineapple systemd[1]: Stopping fabric agent...
Dec 19 07:26:07 sandy-pineapple systemd[1]: fabric-agent.service: main process
exited, code=exited, status=143/n/a
Dec 19 07:26:07 sandy-pineapple systemd[1]: Stopped fabric agent.
Dec 19 07:26:07 sandy-pineapple systemd[1]: Unit fabric-agent.service entered failed
state.

Output from host : 192.168.219.1


fabric-agent.service - fabric agent
Loaded: loaded (/usr/lib/systemd/system/fabric-agent.service; enabled)
Active: failed (Result: exit-code) since Mon 2016-12-19 07:26:07 UTC; 20s ago
Process: 41691 ExecStart=/opt/emc/caspian/fabric/agent/bin/fabric-agent
(code=exited, status=143)
Process: 41689 ExecStartPre=/bin/rm -f /var/run/fabric-agent.pid (code=exited,
status=0/SUCCESS)
Main PID: 41691 (code=exited, status=143)

Dec 18 10:32:20 provo-pineapple libviprhal[48253]: dockerCmd: docker images --no-


trunc | grep a07479a114af
Dec 18 10:32:20 provo-pineapple libviprhal[48253]: dockerCmd: docker images --no-
trunc | grep 32cce433c3dc
Dec 18 10:32:20 provo-pineapple libviprhal[48253]: dockerCmd: docker images --no-
trunc | grep 524f8808202b
Dec 19 07:16:08 provo-pineapple libviprhal[41691]: ioctl 2285(0x12) failed on /dev/
sg3 with 12: Cannot allocate memory: info=0x0; size=512
Dec 19 07:16:11 provo-pineapple libviprhal[41691]: ioctl 2285(0x12) failed on /dev/
sg3 with 12: Cannot allocate memory: info=0x0; size=512
Dec 19 07:16:14 provo-pineapple libviprhal[41691]: ioctl 2285(0x12) failed on /dev/
sg3 with 12: Cannot allocate memory: info=0x0; size=512
Dec 19 07:26:07 provo-pineapple systemd[1]: Stopping fabric agent...
Dec 19 07:26:07 provo-pineapple systemd[1]: fabric-agent.service: main process


exited, code=exited, status=143/n/a


Dec 19 07:26:07 provo-pineapple systemd[1]: Stopped fabric agent.
Dec 19 07:26:07 provo-pineapple systemd[1]: Unit fabric-agent.service entered failed
state.

Executing command viprexec 'cd /opt/emc/caspian/fabric/agent; conf/configure-object-main.sh --stop'

Output from host : 192.168.219.3


Mon Dec 19 07:26:28 UTC 2016
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0blobsvc
preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0cm
preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:--
0eventsvc preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:--
0metering preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
rm preupgrade notification sent
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:--
0resourcesvc preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0ssm
preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
SET PREPAREFORUPGRADE TO 10.245.133.197 SUCCEEDED!!

Output from host : 192.168.219.1


Mon Dec 19 07:26:28 UTC 2016
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
blobsvc preupgrade notification sent
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
cm preupgrade notification sent
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
eventsvc preupgrade notification sent
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
metering preupgrade notification sent
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
rm preupgrade notification sent
% Total % Received % Xferd Average Speed Time Time Time Current


Dload Upload Total Spent Left Speed


0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
resourcesvc preupgrade notification sent
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
ssm preupgrade notification sent
SET PREPAREFORUPGRADE TO 10.245.133.195 SUCCEEDED!!

Output from host : 192.168.219.2


Mon Dec 19 07:26:28 UTC 2016
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0blobsvc
preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0cm
preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:--
0eventsvc preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:--
0metering preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0rm
preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:11 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:--
0resourcesvc preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
ssm preupgrade notification sent
SET PREPAREFORUPGRADE TO 10.245.133.196 SUCCEEDED!!

Output from host : 192.168.219.4


Mon Dec 19 07:26:28 UTC 2016
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0blobsvc
preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0cm
preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:--
0eventsvc preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:--
0metering preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0


% Total % Received % Xferd Average Speed Time Time Time Current


Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0rm
preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:--
0resourcesvc preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0ssm
preupgrade notification sent
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
SET PREPAREFORUPGRADE TO 10.245.133.198 SUCCEEDED!!

Executing command viprexec 'docker stop --time 30 object-main'

Output from host : 192.168.219.3


object-main

Output from host : 192.168.219.2


object-main

Output from host : 192.168.219.4


object-main

Output from host : 192.168.219.1


object-main

Executing command viprexec 'docker stop fabric-lifecycle'

Output from host : 192.168.219.4


Error response from daemon: no such id: fabric-lifecycle
Error: failed to stop containers: [fabric-lifecycle]

Output from host : 192.168.219.1


fabric-lifecycle

Output from host : 192.168.219.3


fabric-lifecycle

Output from host : 192.168.219.2


fabric-lifecycle

Executing command viprexec 'docker stop fabric-zookeeper'

Output from host : 192.168.219.4


Error response from daemon: no such id: fabric-zookeeper
Error: failed to stop containers: [fabric-zookeeper]

Output from host : 192.168.219.3


fabric-zookeeper

Output from host : 192.168.219.1


fabric-zookeeper

Output from host : 192.168.219.2


fabric-zookeeper

Executing command viprexec 'docker stop fabric-registry'

Output from host : 192.168.219.2


Error response from daemon: no such id: fabric-registry
Error: failed to stop containers: [fabric-registry]

Output from host : 192.168.219.3


Error response from daemon: no such id: fabric-registry


Error: failed to stop containers: [fabric-registry]

Output from host : 192.168.219.4


Error response from daemon: no such id: fabric-registry
Error: failed to stop containers: [fabric-registry]

Output from host : 192.168.219.1


fabric-registry

Executing command viprexec 'sudo docker ps -a'

Output from host : 192.168.219.1


CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
58b099a69307 780c449fdd0e "/opt/vipr/boot/boot." 2 weeks ago
Exited (137) 23 seconds ago object-main
aa7c5da31814 a07479a114af "./boot.sh lifecycle" 2 weeks ago
Exited (137) 11 seconds ago fabric-lifecycle
2633e5003bde 32cce433c3dc "./boot.sh 1 1=169.25" 2 weeks ago
Exited (137) 1 seconds ago fabric-zookeeper
6d6b6d46cdc4 524f8808202b "./boot.sh" 2 weeks ago
Exited (0) Less than a second ago fabric-registry

Output from host : 192.168.219.4


CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
5d9e046aae72 780c449fdd0e "/opt/vipr/boot/boot." 2 weeks ago
Exited (137) 23 seconds ago object-main

Output from host : 192.168.219.3


CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
7e9bab2da339 780c449fdd0e "/opt/vipr/boot/boot." 2 weeks ago
Exited (137) 23 seconds ago object-main
3c1b21a806b2 a07479a114af "./boot.sh lifecycle" 2 weeks ago
Exited (137) 11 seconds ago fabric-lifecycle
3558ff263b6f 32cce433c3dc "./boot.sh 3 1=169.25" 2 weeks ago
Exited (137) 1 seconds ago fabric-zookeeper

Output from host : 192.168.219.2


CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
b92c3cd86c4a 780c449fdd0e "/opt/vipr/boot/boot." 2 weeks ago
Exited (137) 23 seconds ago object-main
9cfe6f765e9f a07479a114af "./boot.sh lifecycle" 2 weeks ago
Exited (137) 11 seconds ago fabric-lifecycle
28aa10da3c71 32cce433c3dc "./boot.sh 2 1=169.25" 2 weeks ago
Exited (137) 1 seconds ago fabric-zookeeper

Executing command viprexec systemctl stop docker

Output from host : 192.168.219.1

Output from host : 192.168.219.3

Output from host : 192.168.219.4

Output from host : 192.168.219.2

Executing command viprexec 'systemctl status docker | grep Active'

Output from host : 192.168.219.4


Active: inactive (dead) since Mon 2016-12-19 07:33:37 UTC; 175ms ago

Output from host : 192.168.219.1


Active: inactive (dead) since Mon 2016-12-19 07:33:37 UTC; 193ms ago

Output from host : 192.168.219.3


Active: inactive (dead) since Mon 2016-12-19 07:33:37 UTC; 191ms ago

Output from host : 192.168.219.2


Active: inactive (dead) since Mon 2016-12-19 07:33:37 UTC; 139ms ago

Perform the OS update on all nodes


This part of the update can take up to 40 minutes. Also, when using the tail command
on the /var/tmp/refit.d/*-doupdate.log file, expect significant pauses on some
packages as they are being deployed.
Procedure
1. Update the ECS OS on all nodes:

viprexec 'nohup refit doupdate >/var/tmp/refit.d/doupdate.out 2>&1 </dev/null &'

Because the command does not indicate progress, it can appear that it is not
responding. To monitor the execution, start another SSH session to <Node 1>
and run the watch command to monitor what is happening.
2. Monitor the update by either of the following methods:
l Watch method:

watch viprexec 'ps auxwww | grep refit'

Example output when the update has completed (note that the only refit processes
listed belong to watch and grep; refit itself is no longer running):

Every 2.0s: ps auxwww | grep refit                        Mon Nov 30 12:40:55 2015

root 115344  0.0  0.0  12740 1768 pts/0 S+ 12:40   0:00 watch ps auxwww | grep refit
root 115363  0.0  0.0  12736  652 pts/0 S+ 12:40   0:00 watch ps auxwww | grep refit
root 115364  0.0  0.0  13060 1460 pts/0 S+ 12:40   0:00 sh -c ps auxwww | grep refit
root 115366  0.0  0.0  10492  932 pts/0 S+ 12:40   0:00 grep refit

Use Ctrl+C to exit watch mode when the refit doupdate command completes on all nodes.
l Use the tail log output method on each node:

ssh <Node x>.rack tail -100f /var/tmp/refit.d/doupdate.out

Where <Node x> is the rack-local hostname for the node of interest. Use Ctrl+C to exit.


For example:

ssh provo.rack tail -100f /var/tmp/refit.d/doupdate.out

Example output:

Preinstall some packages with troublesome dependencies
cd /srv/www/htdocs/repo && rpm -Uvh --nodeps git-core*.x86_64.rpm qemu-x86*.x86_64.rpm
This can take up to 40min
zypper -n dup -l
zypper -n in -l fping jre1.8.0_60 libcairo-gobject2 libxml2-tools
Remove the old jre
rpm -e jre-1.7.0_67-fcs.x86_64
DONE!

Note

Do not proceed until the refit doupdate command completes on all nodes.
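
One way to spot-check completion across the rack (a sketch that assumes each node's
log ends with the DONE! line shown in the example above) is:

viprexec 'tail -1 /var/tmp/refit.d/doupdate.out'

Every node should report DONE!; if any node does not, keep waiting or inspect that
node's log.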

3. Confirm that the updates are in place:

viprexec 'zypper lu --all'

Example output from one node:

Output from host : 192.168.219.15


Loading repository data...
Reading installed packages...
No updates found.
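
If you want a single pass/fail view across the rack (a sketch that assumes the exact
"No updates found" wording shown above), count the nodes that report no pending
updates:

viprexec 'zypper lu --all' | grep -c "No updates found"

The count should equal the number of nodes in the rack; investigate any shortfall
before continuing.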

Move the PXE menu


Moving the PXE menu prevents re-install during the OS update.
Procedure
1. To prevent re-install during the update:

viprexec "mv /srv/tftpboot/pxelinux.cfg/default /srv/tftpboot/pxelinux.cfg/new.file"

2. Verify that the PXE menu is moved:

viprexec 'ls -l /srv/tftpboot/pxelinux.cfg/.'


Example output:

-rw-r--r-- 1 root root 526 Apr 15 20:14 new.file


-rw-r--r-- 1 root root 526 Dec 7 14:41 old.file

A file named default should not be displayed in the output.
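
A quick scripted check (a sketch, equivalent to inspecting the listing above) confirms
that no node still has the default file in place:

viprexec 'test ! -e /srv/tftpboot/pxelinux.cfg/default && echo "PXE menu moved" || echo "default still present"'

Every node should report PXE menu moved.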

Restart and reconnect each node, except <Node 1>


Follow the steps in this section carefully to avoid a reboot that hangs and nodes that
enter an indeterminate state (Is the node shutting down? Rebooting? Stuck?). Do not
immediately try to power off nodes using IPMI. The goal is to achieve a graceful
shutdown.
You are going to shut down and power up each node starting with the last node in the
rack. For example, if you have a four-node system, start with <Node 4> then <Node
3> and then <Node 2>. Do not power down <Node 1> at this stage.
In the procedure that follows, <Node x> refers to the node you are currently working
on. <Node y> refers to a node other than <Node x> that you will use to monitor the
shutdown of <Node x>.
Procedure
1. This procedure uses the shutdown -h command to gracefully power off
nodes. You monitor and control the shutdown of <Node x> from another node,
<Node y>. Ensure that your connection to <Node y> does not depend on a
connection through <Node x>. For example, do not use EMC Secure Remote
Support or SSH from <Node x> to <Node y>. SSH to <Node x> and shut down
<Node x>:

sudo shutdown -h now

2. Verify the power status for <Node x>:

refit ipmipower_node_x <x> status

Where <x> is the node number from the table below.

Table 4 Node Rack Private IP addresses

Node number  Node name      Rack Private IP address  Admin account

1            provo.rack     192.168.219.1            root/admin

2 sandy.rack 192.168.219.2 root/admin

3 orem.rack 192.168.219.3 root/admin

4 ogden.rack 192.168.219.4 root/admin

5 layton.rack 192.168.219.5 root/admin

6 logan.rack 192.168.219.6 root/admin


Table 4 Node Rack Private IP addresses (continued)

Node number  Node name      Rack Private IP address  Admin account

7            lehi.rack      192.168.219.7            root/admin

8 murray.rack 192.168.219.8 root/admin

9 boston.rack 192.168.219.9 root/admin

10 chicago.rack 192.168.219.10 root/admin

11 houston.rack 192.168.219.11 root/admin

12 phoenix.rack 192.168.219.12 root/admin

13 dallas.rack 192.168.219.13 root/admin

14 detroit.rack 192.168.219.14 root/admin

15 columbus.rack 192.168.219.15 root/admin

16 austin.rack 192.168.219.16 root/admin

17 memphis.rack 192.168.219.17 root/admin

18 seattle.rack 192.168.219.18 root/admin

19 denver.rack 192.168.219.19 root/admin

20 portland.rack 192.168.219.20 root/admin

21 tucson.rack 192.168.219.21 root/admin

22 atlanta.rack 192.168.219.22 root/admin

23 fresno.rack 192.168.219.23 root/admin

24 mesa.rack 192.168.219.24 root/admin

After running this command, you are prompted with:

Running on host <y> not connected to, or thru, node <x>
and you passed: <x>:1 <action>:status
Verify with enter/return to continue - or Ctrl-C to abort

You must press Enter to confirm the command. The command waits for a response before continuing.


Example output:

ipmi status for bmc of node : ipmitool -H 192.168.219.104 -U root -P passwd power status
Chassis Power is off
REFIT SUCCESS running ipmitool -H 192.168.219.104 -U root -P passwd power status

3. Validate that this output shows the power state for <Node x> is Chassis
Power is off.


4. If after 5 minutes, <Node x> remains powered on, force it to power off:

refit ipmipower_node_x <x> off

To confirm the command, press Enter when prompted. Verify the status and
repeat if <Node x> does not power off (step 2).
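
If you prefer to poll the power state rather than rerun the interactive refit command,
the sketch below uses ipmitool directly, mirroring the command shown in the sample
output of step 2. The BMC address and password are placeholders and differ on your
rack; treat this as a convenience only, and keep using the documented refit commands
for any power actions.

# Poll every 10 seconds until the chassis reports powered off (Ctrl+C to abort).
while ! ipmitool -H <Node x BMC IP> -U root -P <password> power status | grep -q "is off"; do
    sleep 10
done
echo "Chassis Power is off"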
5. Power on <Node x>:

refit ipmipower_node_x <x> on

To confirm the command, press Enter when prompted. Verify the status and
repeat if <Node x> does not power on (step 2).

Note

It will take a few minutes for the node to reboot.

CAUTION

If the node fails to power up successfully because of a hardware failure, stop the OS
update procedure immediately. Contact Engineering for recommendations and remediation.

6. Confirm that <Node x> successfully rebooted:


a. Verify the node is in the node list.

doit uname -a

Sample output:

admin@acmelab:~> doit uname -a


192.168.219.1: Linux lab1.acme.com 3.12.53-60.30-default #1 SMP Wed Feb 10
14:41:46 UTC 2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
192.168.219.2: Linux lab2.acme.com 3.12.53-60.30-default #1 SMP Wed Feb 10
14:41:46 UTC 2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
192.168.219.3: Linux lab3.acme.com 3.12.53-60.30-default #1 SMP Wed Feb 10
14:41:46 UTC 2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
192.168.219.4: Linux lab4.acme.com 3.12.62-60.62-default #1 SMP Thu Aug 4
09:06:08 UTC 2016 (b0e5a26) x86_64 x86_64 x86_64 GNU/Linux

b. Ping the nodes and ensure that the current node is reachable.

pingall

Sample output

admin@ccaecslab01n1:~> pingall
192.168.219.1 ping succeeded
192.168.219.2 ping succeeded


192.168.219.3 ping succeeded


192.168.219.4 ping succeeded

c. Check to make sure the node appears in the list of running nodes.

doit uptime

admin@ccaecslab01n1:~> doit uptime


192.168.219.1: 19:01pm up 91 days 0:06, 7 users, load
average: 0.10, 0.06, 0.11
192.168.219.2: 19:01pm up 91 days 0:26, 0 users, load
average: 0.11, 0.09, 0.13
192.168.219.3: 19:01pm up 91 days 0:10, 0 users, load
average: 0.10, 0.07, 0.11
192.168.219.4: 19:01pm up 0:03, 0 users, load
average: 4.37, 3.26, 1.32

The output should show that <Node x> is up, with an uptime that reflects the recent reboot (0:03 for node 4 in the example above).


7. Repeat this procedure for the next node in the sequence (from the highest to
lowest node number), but do not shut down <Node 1>. Log in to the next node:

ssh <Node x>

Go to step 1.

Reboot <Node 1>


After all other nodes are back up and network accessible, proceed with the power
cycle for <Node 1>.
Procedure
1. Using PuTTY or a similar tool, SSH directly to <Node 2> as the administrative
user (admin).
2. Shut down <Node 1>:

ssh 192.168.219.1 'sudo shutdown -h now'


Connection to 192.168.219.1 closed by remote host.

3. Check the power status of <Node 1>:

refit ipmipower_node_x 1 status

After running the command, you are prompted with:

Running on host <y> not connected to, or thru, node <x>


and you passed: <x>:1 <action>:status
Verify with enter/return to continue - or Ctrl-C to abort

You must press <Enter> to confirm the command and continue.


4. If after 5 minutes, <Node 1> reports 'on,' then force it 'off' with this command:

refit ipmipower_node_x 1 off

You must press <Enter> to confirm the command and continue.


5. Power on the node:

refit ipmipower_node_x 1 on

You must press <Enter> to confirm the command and continue.


6. Verify that the status is 'on', as in step 3. Repeat steps 3 and 5 until the node
reports 'on'.
Monitor booting with any means at your disposal, such as RMM virtual consoles
or serial-over-lan connections.

Verify bonding mode


For Highly Available (HA) configurations, check that both slave interfaces are
configured and that the mode is correct.
Procedure
1. Verify that both slave interfaces are configured:

viprexec 'ip link show | egrep "slave-|public"'

Example output:

6: slave-0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500


qdisc mq master public state UP mode DEFAULT group default
qlen 1000
8: slave-1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500
qdisc mq master public state UP mode DEFAULT group default
qlen 1000
10: public: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500
qdisc noqueue state UP mode DEFAULT group default

2. If you do not see slave-0 and slave-1 in the output, follow these steps:
a. SSH to the node in error:

ssh <Node x>

b. Run these commands:

sudo ifdown public

sudo rm /etc/sysconfig/network/ifcfg-public

sudo systemctl restart nan


c. Recheck using the ip link command exactly as shown:

ip link show | egrep "slave-|public"

d. Exit the SSH session to get back to <Node 1>.

exit

3. Check that the bonding mode is set to the correct value.

viprexec grep Mode /proc/net/bonding/public

The default bonding mode you should see (unless configured differently):

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
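
To see the bonding mode together with the enslaved interfaces in one pass (a sketch
that relies on the standard Linux layout of /proc/net/bonding/public), you can widen
the grep:

viprexec 'grep -E "Bonding Mode|Slave Interface" /proc/net/bonding/public'

Each node should list the expected mode plus one line per slave interface (slave-0
and slave-1 in an HA configuration).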

4. If the output from step 3 does not display the correct bonding mode, run the
following command:

ssh <Node x> 'sudo ifdown public && sudo ifup public'

Once corrections are made for all nodes as needed, recheck with step 3.

Restart the containers


Procedure
1. Verify that docker is running, and if it is not running, start it:
a. Verify

viprexec 'systemctl status docker | grep Active'

Example output from one node:

Active: active (running) since Tue 2015-08-11 15:02:28 UTC; 1min 37s ago

b. If you see the following output, you have encountered a known bug:

Output from host : 192.168.219.16


Failed to get properties: Activation of org.freedesktop.systemd1 timed out

Output from host : 192.168.219.14


Failed to get properties: Activation of org.freedesktop.systemd1 timed out

Reboot the affected nodes and restart this procedure from step 1a.


c. If docker is not running, run:

viprexec 'systemctl start docker'

Repeat step 1a.


2. Verify that all storage disks are in the GOOD state by listing any disks with a
status other than GOOD:

viprexec 'cs_hal list vols | grep -v GOOD'

Example output for one node:

Output from host : 192.168.219.3


Volume(s):
SCSI Device  Block Device  FS UUID                               Type  Slot  Label  Partition Name  SMART  Mount Point
-----------  ------------  ------------------------------------  ----  ----  -----  --------------  -----  -----------
/dev/sg0     /dev/sda1     eb2bc514-8bd8-4ae7-8bc5-ee806863ba40  ext3  0     BOOT   n/a                    /boot
/dev/sg0     /dev/sda2     n/a                                   n/a   0     n/a    n/a                    n/a

total: 62

Note

The first two entries in the output represent the system disks. These two
entries will not display a status. This is normal.

Do not proceed unless you get similar output for each node.
3. Start the fabric agent:

viprexec "systemctl start fabric-agent"

There is no node output.


4. Verify the fabric agent is running:

viprexec "systemctl status fabric-agent"

Sample output from one node:

Active: active (running) since Wed 2016-09-07 20:52:59 UTC;


4min 35s ago


5. Wait until the fabric agent starts all required containers:

viprexec "docker ps -a"

Sample output from one node:

Output from host : 192.168.219.1


CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
f4c615024fc6 780c449fdd0e "/opt/vipr/boot/boot." 13 days ago
Up 5 minutes object-main
aecfe6b312c9 a07479a114af "./boot.sh lifecycle" 13 days ago
Up 5 minutes fabric-lifecycle
40005d2645f0 32cce433c3dc "./boot.sh 1 1=169.25" 13 days ago
Up 5 minutes fabric-zookeeper
8eb5a107361a 524f8808202b "./boot.sh" 13 days ago
Up 5 minutes fabric-registry

6. Verify that the node's containers have started and the nodes are ACTIVE:

viprexec "cd /opt/emc/caspian/fabric/cli && bin/fcli


maintenance list"

Example output from one node:

AGENT ID                              HOSTNAME                         ACTUAL MODE
86c0baa9-f35a-4766-b2f8-3046aedc5eb1  layton-chestnut.ecs.lab.emc.com  ACTIVE

Ensure that all nodes report ACTIVE in the maintenance list and that Exited is not
displayed in the docker ps -a output.
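
A quick scripted check (a sketch based on the docker ps -a output format shown above)
counts exited containers per node:

viprexec 'docker ps -a | grep -c Exited'

Every node should report 0. A non-zero count means at least one container has not
restarted; in that case proceed to step 7.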
7. If the containers are not running, stop and restart the fabric agent:

viprexec "systemctl stop fabric-agent"

There is no node output. Start the fabric-agent:

viprexec "systemctl start fabric-agent"

There is no node output.

Save the post-update OS information


Procedure
1. Create an audit record that indicates an upgrade was performed on the nodes:

viprexec "refit postupdateversions"


2. Check the status:

viprexec "grep REFIT /var/tmp/refit.d/*-postupdateversions.log


grep REFIT /var/tmp/refit.d/*-postupdateversions.log"

Sample output:

REFIT SUCCESS running rpm -qa | egrep 'emc|ecs|nile|vipr|asd' | sort
REFIT SUCCESS running rpm -qa --last | head -60

Compare pre- and post-upgrade information


This section explains how to validate the success of the ECS OS update by obtaining
and comparing pre- and post-upgrade information on a single node.
Procedure
1. Display the post-upgrade ECS OS version embedded in the filename:

ls /var/tmp/postupdateversions*.log

2. Collect the pre-upgrade version information and assign it to the variable PREUP:

export PREUP=$(ls -1rt /var/tmp/refit.d/*-preupdateversions.log|head -1)

echo $PREUP

3. Verify that the file is populated:

head $PREUP

4. Collect the post-update version information and assign it to the variable POSTUP:

export POSTUP=$(ls -1rt /var/tmp/refit.d/*-postupdateversions.log|tail -1)

echo $POSTUP

5. Verify that the file is populated:

head $POSTUP


6. Review the output of the following command and verify that the pre-update
timestamp is earlier than the post-update timestamp:

ls -alrt $PREUP $POSTUP

Example output:

-rw-r--r-- 1 root root 5258 Jul 8 21:26 /var/tmp/refit.d/20150708-212626-preupdateversions.log
-rw-r--r-- 1 root root 5258 Jul 8 22:29 /var/tmp/refit.d/20150708-222904-postupdateversions.log

7. Review the output from the following command:

diff -Naurp $PREUP $POSTUP | sed '/REFIT/q'

--- /var/tmp/refit.d/20150708-212626-preupdateversions.log  2015-07-08 21:26:27.850666079 +0000
+++ /var/tmp/refit.d/20150708-222904-postupdateversions.log 2015-07-08 22:29:05.439526482 +0000
@@ -1,23 +1,23 @@
-information about version: 1.2.0.0-398.b278210.58
+information about version: 1.2.0.0-403.b98c1c4.63
 rpm -qa | egrep 'emc|ecs|nile|vipr|asd' | sort
 connectemc-3.1.0.1-1.x86_64
 ecs-callhome-1.1.0.0-1964.d913a13.x86_64
-ecs-os-base-1.2.0.0-398.b278210.58.noarch
+ecs-os-base-1.2.0.0-403.b98c1c4.63.noarch
 emc-arista-firmware-1.2-1.0.x86_64
 emc-cdes-firmware-1.1.3.326-8.02.1.x86_64
 emc-cdes-testeses-8.14-1.0.x86_64
 emc-cdes-zoning-1.1-1.2.x86_64
 emc-drive-firmware-1.3-1.x86_64
-emc-ecs-diags-2.1.1.0-656.b635b0b.noarch
+emc-ecs-diags-2.1.1.0-673.49ee354.noarch
 emc-intel-firmware-13.1.5-1.2.BIOS02.03.0003_BMC6680.x86_64
 emc-lab-utils-1.10-1.0.x86_64
 emc-lsi-hba-firmware-4.1-1.0.x86_64
 emc-lsi-storelibir-2-17.01-657.1b62e78.1.x86_64
-emc-nan-2.1.1.0-665.836312e.x86_64
-nile-hwmgr-1.1.1.0-331.0e14626.x86_64
-nile-hwmgr-utils-1.1.1.0-331.0e14626.x86_64
+emc-nan-2.1.1.0-675.584c94e.x86_64
+nile-hwmgr-1.1.1.0-333.cf89b22.x86_64
+nile-hwmgr-utils-1.1.1.0-333.cf89b22.x86_64
 python-viprhal-1.1.1.0-1180.00f72e7.x86_64
 viprhal-1.1.1.0-1180.00f72e7.x86_64
 REFIT SUCCESS running rpm -qa | egrep 'emc|ecs|nile|vipr|asd' | sort

The lines describing the pre-update version are denoted by the "-" symbol and
the post-update version is denoted by the "+" symbol. Locate the ecs-os-base
version in the output and verify that the post-update version displays the
expected value. The Readme.txt file or Release Notes provide more information
on package versions that must be verified.
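
If you only want to confirm the ecs-os-base change without reading the full diff, a
narrower sketch of the same comparison is:

diff -Naurp $PREUP $POSTUP | grep ecs-os-base

The "-" line should show the old ecs-os-base package and the "+" line should show the
expected post-update version.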


8. Unset the two exported variables and exit the node:

unset PREUP POSTUP

CHAPTER 2
ECS OS offline post upgrade tasks

This section explains the following topics:

l ECS OS offline post-update tasks for all nodes
l Disconnect from the turtle switch


ECS OS offline post-update tasks for all nodes


After all nodes have been updated to the new OS and rebooted, proceed with these
post-update tasks.
To begin, SSH to <Node 1> using PuTTY or a similar tool.

Restore the PXE menu


Restore the upgraded PXE menu in preparation for possible node expansion or
replacement.
Procedure
1. Restore the PXE menu:

viprexec 'mv /srv/tftpboot/pxelinux.cfg/new.file /srv/tftpboot/pxelinux.cfg/default'

2. Verify that the PXE menu is restored:

viprexec 'ls -l /srv/tftpboot/pxelinux.cfg/'

Example output showing the default file:

-rw-r--r-- 1 root root 526 Apr 15 20:14 default


-rw-r--r-- 1 root root 526 Dec 7 14:41 old.file

Post-update: verify the ignore list on all nodes


Procedure
1. Verify that the number of MAC address entries matches the node count:

sudo getrackinfo -i

Example output showing 'Done!' for all nodes:

Rack Installer Status


=====================
Mac Name Port Ip Status
00:1e:67:96:3e:59 provo 1 none Done!
00:1e:67:96:40:75 sandy 2 none Done!
00:1e:67:96:40:1b orem 3 none Done!
00:1e:67:96:40:2f ogden 4 none Done!

2. Check the dnsmasq ignore file:

viprexec cat /etc/dnsmasq.dhcpignore/all | sort | uniq -c


Check that the count (4) prefixing each MAC matches the node count for the rack.
The MACs must also match those from the output of the getrackinfo -i command in
step 1.
Example output:

4
4 00:1e:67:96:3e:59,ignore # (port 1) provo
4 00:1e:67:96:40:1b,ignore # (port 3) orem
4 00:1e:67:96:40:2f,ignore # (port 4) ogden
4 00:1e:67:96:40:75,ignore # (port 2) sandy
1 Output from host : 192.168.219.1
1 Output from host : 192.168.219.2
1 Output from host : 192.168.219.3
1 Output from host : 192.168.219.4

3. If needed, run the following commands and then repeat steps 1 and 2.

for mac in $(sudo getrackinfo -v | egrep "private[ ]+:"|awk '{print $3}'); do sudo setrackinfo --installer-ignore-mac $mac; done

4. Confirm that the ignore files are the same on all the nodes by comparing
md5sums:

viprexec md5sum /etc/dnsmasq.dhcpignore/all

The checksums should be the same for all nodes.


Example output:

Output from host : 192.168.219.1


b5d076b7566a89553b49e2cfdef3e1f9 /etc/dnsmasq.dhcpignore/all

Output from host : 192.168.219.3


b5d076b7566a89553b49e2cfdef3e1f9 /etc/dnsmasq.dhcpignore/all

Output from host : 192.168.219.2


b5d076b7566a89553b49e2cfdef3e1f9 /etc/dnsmasq.dhcpignore/all

Output from host : 192.168.219.4


b5d076b7566a89553b49e2cfdef3e1f9 /etc/dnsmasq.dhcpignore/all
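
To reduce the comparison to a single number (a sketch that assumes the checksum lines
contain the path dnsmasq.dhcpignore as shown above), count the distinct checksums:

viprexec md5sum /etc/dnsmasq.dhcpignore/all | grep dnsmasq.dhcpignore | sort -u | wc -l

A result of 1 means all nodes have identical ignore files; anything higher means at
least one node differs.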

Check that object service initialization process is complete


Run the following commands from one of the upgraded nodes:
Procedure
1. Verify that the directory tables are all online:
For a non-network-segmented cluster, use this command with a public IP
address, replacing the sample IP address with your own:

curl -s http://10.245.132.45:9101/stats/dt/DTInitStat/ | xmllint --format - | grep -A4 '<entry>'


For a network-segmented cluster, use this command with a data IP address, replacing
the sample IP address with your own:

curl -s http://10.10.10.45:9101/stats/dt/DTInitStat/ | xmllint --format - | grep -A4 '<entry>'

Example output:

<entry>
<total_dt_num>1920</total_dt_num>
<unready_dt_num>355</unready_dt_num>
<unknown_dt_num>0</unknown_dt_num>
</entry>
<entry>
<type>RR</type>
<level>0</level>
<total_dt_num>128</total_dt_num>
<unready_dt_num>95</unready_dt_num>
--
<entry>
<type>MR</type>
<level>0</level>
<total_dt_num>128</total_dt_num>
<unready_dt_num>128</unready_dt_num>
--
<entry>
<type>LS</type>
<level>0</level>
<total_dt_num>128</total_dt_num>
<unready_dt_num>9</unready_dt_num>
<entry>
<total_dt_num>1920</total_dt_num>
<unready_dt_num>0</unready_dt_num>
<unknown_dt_num>0</unknown_dt_num>
</entry>

2. Verify the counts.
The first time you run the command, the <unready_dt_num> entry might not be
listed in the output. Rerun the command until the unready_dt_num in the total
(first) entry is listed as 0.

Note

A cluster with a large load can take up to 40 minutes to initialize the DTs. It is not
necessary to wait for all DTs to initialize before you begin the OS update on the
next node. For example, the following types can be ignored: RR, MR, or MA. If
other types have non-zero unready or unknown DT counts, do not continue.
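
If you want to watch the overall unready count converge to 0 without retyping the
command, a minimal polling sketch is shown below. The IP address is a placeholder for
the public or data IP you used above, and the first <unready_dt_num> entry in the
output is the overall total; press Ctrl+C to stop.

while true; do
    curl -s http://<node IP>:9101/stats/dt/DTInitStat/ | xmllint --format - | grep unready_dt_num | head -1
    sleep 60
done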

Check the OS kernel on all nodes


After all OS reboots are complete, check that the kernels have been updated.


Procedure
1. List the kernel version on each node and compare it to the information you
saved at the beginning of this procedure. Verify the kernel information has
changed:

viprexec 'uname -a'

Example output:

192.168.219.1: Linux provo-sage 3.12.53-60.30-default #1 SMP Wed Feb 10 14:41:46 UTC


2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
192.168.219.2: Linux sandy-sage 3.12.53-60.30-default #1 SMP Wed Feb 10 14:41:46 UTC
2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
192.168.219.3: Linux orem-sage 3.12.53-60.30-default #1 SMP Wed Feb 10 14:41:46 UTC
2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
192.168.219.4: Linux ogden-sage 3.12.53-60.30-default #1 SMP Wed Feb 10 14:41:46 UTC
2016 (e57129f) x86_64 x86_64 x86_64 GNU/Linux
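
To confirm at a glance that every node is on the same kernel release (a sketch; the
release string is the third field of the uname -a lines above), you can list only the
release:

viprexec 'uname -r'

Every node should report the same, updated kernel release.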

Verifying the data path functionality


Before you begin
Do the following:
l Obtain the S3 browser tool
l Create or locate a text file on your laptop that you can upload to the appliance.
Consult with the customer to determine if there is an object user, a secret key, and a
bucket that you can use to test the data path. If they do not have these, you will have
to create them. Consult the ECS documentation for instructions about how to create
an object user, a secret key, and a bucket: ECS 3.0 Documentation Index.
Procedure
1. Start the S3 Browser and set up an account for the ECS Appliance with the
settings listed in the table below. After the account is configured, you should
see the bucket that was provided to you or that you created in the browser.

Table 5 S3 browser settings

Option Setting
Storage Type S3 Compatible Storage

REST Endpoint     IP address of an ECS Appliance node using port 9020 or 9021. For
                  example: 198.51.100.244:9021

Access Key ID ecsuser

Secret Access Key Object Secret Access Key

2. Use the S3 browser to upload the test file from your laptop to verify that you
are able to write to the appliance.


Enabling alerting
Re-enable dial home events after maintenance.
Procedure
1. Log into the current Rack Master.
ssh master.rack

2. Enable connect home using xDoctor.


sudo -i xdoctor --tool --exec=connecthome_maintenance --method=enable

This command re-enables the transmission of dial home alerts that were suppressed
during the service engagement.
For example:

sudo -i xdoctor --tool --exec=connecthome_maintenance --method=enable

2016-04-22 17:23:30,327: xDoctor_4.4-24 - INFO: Executing xDoctor Tool: [connecthome_maintenance], using Method: [enable], Options: [] and Args: []
2016-04-22 17:23:33,532: xDoctor_4.4-24 - INFO: Request to get out a Maintenance Window. Re-enabling ConnectHome ...
2016-04-22 17:23:33,532: xDoctor_4.4-24 - INFO: ConnectHome is in Maintenance, re-enabling it
2016-04-22 17:23:39,710: xDoctor_4.4-24 - INFO: xDoctor Alerting successfully reverted back to Enabled ...
2016-04-22 17:23:39,861: xDoctor_4.4-24 - INFO: Successfully cleared reverting flag ...
2016-04-22 17:23:39,862: xDoctor_4.4-24 - INFO: Successfully re-enabled ConnectHome ...

3. If the customer disabled notification policies (email, SNMP, and/or rsyslog)
through the ECS GUI during this service activity, notify them to re-enable the
previous notification policies.
4. Once you see the message Successfully re-enabled ConnectHome and the
customer has re-enabled (if elected) any configured notification policies,
continue to the next task of this procedure.

Verifying the xDoctor version


Verify that xDoctor is installed and is at the latest version across the ECS systems.

CAUTION

After the completion of an OS update, the installed version of xDoctor may have
reverted to a version earlier than the latest. It is vital to recheck the xDoctor
version and upgrade to the latest before proceeding.

Procedure
1. Log into the current Rack Master.

ssh master.rack


2. Check the xDoctor version.


sudo -i xdoctor --sysversion

In the following example, the xDoctor version is uniform on all nodes.

sudo -i xdoctor --sysversion


xDoctor Uniform on all nodes: 4.4-24

In the following example, the xDoctor version is not uniform on all nodes.

sudo -i xdoctor --sysversion


xDoctor Not Uniform on all nodes:
[4.4-17] -> ['169.254.1.1', '169.254.1.2', '169.254.1.4']
[4.4-10] -> ['169.254.1.3']

3. If the installed version of xDoctor listed in the above step is not the latest
version, then the ECS xDoctor User's Guide available in ECS SolVe provides
details on upgrading or reinstalling xDoctor.

sudo -i xdoctor -u -A

After you finish


If all nodes have the latest version, then continue with the next task in this procedure.

Checking ECS health using xDoctor


Procedure
1. Launch xDoctor and perform a Full Diagnostic Suite run using the system scope
(default).

sudo -i xdoctor

For example:

sudo -i xdoctor
2015-10-27 18:25:16,149: xDoctor_4.4-24 - INFO: Initializing xDoctor v4.4-24 ...
2015-10-27 18:25:16,191: xDoctor_4.4-24 - INFO: Removing orphaned session -
session_1445968876.975
2015-10-27 18:25:16,193: xDoctor_4.4-24 - INFO: Starting xDoctor
session_1445970315.896 ... (SYSTEM)
2015-10-27 18:25:16,193: xDoctor_4.4-24 - INFO: Master Control Check ...
2015-10-27 18:25:16,242: xDoctor_4.4-24 - INFO: xDoctor Composition - Full Diagnostic
Suite for ECS
2015-10-27 18:25:16,364: xDoctor_4.4-24 - INFO: Session limited to 0:30:00
2015-10-27 18:25:16,465: xDoctor_4.4-24 - INFO: Validating System Version ...
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: |- xDoctor version is sealed to 4.4-24
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: |- System version is sealed to
1.2.0.0-417.6e959c4.75
2015-10-27 18:25:16,703: xDoctor_4.4-24 - INFO: Distributing xDoctor session files ...
2015-10-27 18:25:17,627: xDoctor_4.4-24 - INFO: Collecting data on designated nodes,
please be patient ... (update every 5 to 30 seconds)
2015-10-27 18:25:22,650: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:05
2015-10-27 18:25:32,698: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:15
2015-10-27 18:25:47,770: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:30
2015-10-27 18:26:07,870: xDoctor_4.4-24 - INFO: Collecting data ... at 0:00:50


2015-10-27 18:26:10,283: xDoctor_4.4-24 - INFO: Waiting for local data collectors ...
2015-10-27 18:26:20,324: xDoctor_4.4-24 - INFO: All data collected in 0:01:02
2015-10-27 18:26:20,325: xDoctor_4.4-24 - INFO: -----------------
. . . . . . . . . .
2015-10-27 18:26:37,013: xDoctor_4.4-24 - INFO: -------------------------
2015-10-27 18:26:37,013: xDoctor_4.4-24 - INFO: xDoctor session_1445970315.896
finished in 0:01:21
2015-10-27 18:26:37,027: xDoctor_4.4-24 - INFO: Successful Job:1445970315 Exit Code:
141

2. Determine the report archive for the xDoctor session executed in the previous
step.

sudo -i xdoctor -r | grep -a1 Latest

For example:

sudo -i xdoctor -r | grep -a1 Latest


Latest Report:
xdoctor -r -a 2015-10-27_183001

3. View the latest xDoctor report using the output from the command in the previous
step. Add the -WEC option to display only Warning, Error, and Critical events.

sudo -i xdoctor -r -a <archive date_time> -WEC

The following example shows a clean report with no events.

sudo -i xdoctor -r -a 2015-10-27_183001 -WEC

Displaying xDoctor Report (2015-10-27_183001) Filter:


['CRITICAL', 'ERROR', 'WARNING'] ...

The following example shows a report with an error:

sudo -i xdoctor -r -a 2015-10-27_210554 -WEC

Displaying xDoctor Report (2015-10-27_210554) Filter:['CRITICAL', 'ERROR',


'WARNING'] ...

Timestamp = 2015-10-27_210554
Category = health
Source = fcli
Severity = ERROR
Message = Object Main Service not Healthy
Extra = 10.241.172.46
RAP = RAP014
Solution = 204179

4. If the report returns any Warning, Error, or Critical events, resolve those events
before continuing this procedure.
All xDoctor reported Warning, Error, and Critical events must be resolved
before proceeding. Contact ECS Global Technical Support (GTS) or Field
Support Specialists for assistance as required.


5. Log out from the master node.

logout

Disconnect from the turtle switch


Disconnect the turtle switch when the service of the target nodes is complete.
Procedure
1. Log out from the host.
2. Disconnect the RJ-45 cable from the turtle switch.
3. If a node cable was disconnected to attach the service laptop, reconnect the
node cable.
