IBM SmartCloud Control Desk - High Availability and Disaster Recovery Configurations
Configure middleware
components for availability
Axel Buecker
Daniel McConomy
Alfredo Ferreira
Niraj Vora
ibm.com/redbooks
International Technical Support Organization
February 2013
SG24-8109-00
Note: Before using this information and the product it supports, read the information in
“Notices” on page vii.
This edition applies to Version 7.5 of IBM SmartCloud Control Desk (product number 5725-E24)
and the IBM Tivoli Process Automation Engine Version 7.5.0.2.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
The team who wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . xi
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
4.4 Storage replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
4.5 Web server and load balancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
4.5.1 IBM HTTP Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
4.5.2 Load balancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.6 Application server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
4.6.1 WebSphere Application Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
4.7 Integration framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
4.7.1 Service integration bus configuration . . . . . . . . . . . . . . . . . . . . . . . 164
4.7.2 WebSphere MQ configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.8 Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
4.8.1 Database recovery techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
4.9 IBM SmartCloud Control Desk configuration. . . . . . . . . . . . . . . . . . . . . . 174
4.9.1 EAR configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.9.2 Database-related changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.10 Failover scenarios and testing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.10.1 Switching sites gracefully . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.10.2 Disaster failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
4.11 Symptoms of failover. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
4.12 Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
5.9.2 Primary site failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
5.9.3 Secondary site failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
5.10 Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your
local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not infringe
any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and
verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the
information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the materials
for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any
obligation to you.
Any performance data contained herein was determined in a controlled environment. Therefore, the results
obtained in other operating environments may vary significantly. Some measurements may have been made on
development-level systems and there is no guarantee that these measurements will be the same on generally
available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual
results may vary. Users of this document should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as
completely as possible, the examples include the names of individuals, companies, brands, and products. All of
these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is
entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any
form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs
conforming to the application programming interface for the operating platform for which the sample programs are
written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or
imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample
programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing
application programs conforming to IBM's application programming interfaces.
Preface
In today’s global environment, more and more organizations need to reduce their
downtime to a minimum and look for continuous availability of their systems.
Products based on the IBM® Tivoli® Process Automation Engine (TPAE), such as
IBM Maximo® Asset Management, the Maximo Industry Solutions, and
IBM SmartCloud™ Control Desk, often play a role in such environments and
therefore also have continuous availability requirements. As part of that, it is
important to understand the High Availability (HA) and Disaster Recovery (DR)
capabilities of IBM SmartCloud Control Desk and the IBM Maximo products, and
how to ensure that all the components of an HA/DR solution are properly
configured and tested to handle outages. By outlining some of the topologies we
tested and the documentation we created, we hope to demonstrate how robust the
IBM SmartCloud Control Desk and IBM Maximo infrastructure can be.
Thomas Alcott, Ana Biazetti, Alex Chung, Pam Denny, Robert Dunyak, Belinda
Fuller, Samuel Hokama, Bruce Jackson, Thomas Lumpp, Leonardo Matos,
Markus Mueller, Steven Raspudic, Martin Reitz, Lohitashwa Thyagaraj,
Cheryl Thrailkill
IBM
Aurelien Jarry
Le Groupe Createch, IBM Business Partner, Canada
Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
Stay connected to IBM Redbooks
Find us on Facebook:
http://www.facebook.com/IBMRedbooks
Follow us on Twitter:
http://twitter.com/ibmredbooks
Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
Explore new Redbooks publications, residencies, and workshops with the
IBM Redbooks weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
Part 1
Applicability: Although this book was specifically created and tested for the
IBM SmartCloud Control Desk product, its concepts and configurations are
also valid for other IBM Maximo-based products, including IBM Maximo Asset
Management and its Industry Solutions, which are based on the IBM
Tivoli Process Automation Engine.
The solution relies on the high availability of the underlying components, such as
web server, application server, database, LDAP server, and the IBM Tivoli
Process Automation Engine. A cluster manager can be used to monitor and
automate system failover.
You can configure the components of your solution environment for high
availability in various ways. Different high availability configurations handle
failover differently. For this reason, it is important to choose the correct
configuration to suit the needs of your organization.
Service-level agreement
SmartCloud Control Desk is used to manage enterprise assets, IT
environments, and availability of systems. These tasks are commonly
referenced in service-level agreements (SLAs). Therefore, contractual
obligations can mandate a certain level of system availability to meet SLAs.
User satisfaction
Frequent and unexpected outages during system utilization can directly
impact user satisfaction. Users who rely on SmartCloud Control Desk for daily
operations may lose confidence in the solution if their productivity is affected.
Keep in mind: Distance may affect network latency and overall system
performance.
This first section introduces and explains high availability and disaster recovery
terminology.
Hot standby
The software is installed and available on both the primary and secondary
nodes. The secondary system is up and running, but no transactions are
processed unless the primary node fails. Both systems have access to
identical data.
Warm standby
The software is installed and available on the secondary node, which is up
and running. If a failure occurs on the primary node, the software components
are started on the secondary node. Data is regularly replicated on the
secondary system or stored on a shared disk.
Cold standby
The secondary node acts as a backup for an identical primary system. The
secondary node is configured and installed only when the primary system
fails. Data is restored from the primary node and the secondary system is
restarted. Data is usually backed up and restored from an external storage
system.
Tip: For more information about designing highly available applications, refer
to:
http://pic.dhe.ibm.com/infocenter/tivihelp/v49r1/topic/com.ibm.mbs.doc/gp_highavail/c_ctr_high_availability.html
Configure standby systems on a remote site for disaster recovery.
Develop and test internal failover procedures.
You have several options for designing a disaster recovery plan. Based on
business need, you may decide to have a full backup to guard against any data
loss. A secondary site can serve as standby and take over the operations in case
of a disaster. Database and file system replication can be used to synchronize
data across both sites.
Important: The disaster recovery example is not meant to replace the system
backups. Backups should be taken at regular intervals in addition to the
disaster recovery solution.
2.2 Assumptions
The topologies described in this book are built on the following assumptions.
Storage
User authentication
Load balancer
2.2.1 Storage
To avoid a single point of failure, a data storage replication mechanism must be in
place to achieve local high availability for the required data. Cross-site data
replication should be in place for disaster recovery.
There are several forms of data sharing and replication; the most common are:
SAN mirroring with cross-site replication
IBM General Parallel File System (GPFS™)
Redundant Array of Independent Disks (RAID) for local high availability
Network File System (NFS) with replication
Tip: For more information about Tivoli Directory Server high availability for
IBM SmartCloud Control Desk, refer to:
http://pic.dhe.ibm.com/infocenter/tivihelp/v49r1/index.jsp?topic=/com.ibm.mbs.doc/gp_highavail/c_ctr_ha_directory_server.html
2.3 Configuration options
This publication covers three potential high availability and disaster recovery
configuration scenarios. It is important to review and weigh the pros and cons of
each configuration to help determine which solution is best for your organization.
Local high availability
Pros: Fast failover times. Protection from process and system failure.
Independent of WAN communication.
Cons: No protection from complete site failure. Not a valid disaster recovery
topology.

Multisite active-passive
Pros: Protection from complete site failure. Database and file system
replication can decrease complexity. Scheduled downtime for system
maintenance can be decreased by switching to the secondary site; examples of
such maintenance include middleware fixpack updates, OS updates, and hardware
upgrades, although upgrades to the Tivoli Process Automation Engine itself are
not included. Persisted integration transaction recovery on site failure.
Cons: Secondary site is unused while the primary is active. A high-speed,
reliable WAN link between sites is required for replication. The secondary
site will increase system infrastructure costs.

Multisite hybrid-active
Pros: Protection from complete site failure. Certain workloads can be
distributed and shared amongst both sites.
Cons: Potential increased licensing costs because the secondary site will be
operational. Increased configuration complexity and potential for
configuration error. Remote database connection could decrease performance on
the secondary site. A high-speed, reliable WAN link between sites is required
for replication. Upon site failure, the entire workload will run on one site
only, which could impact performance. Potential loss of integration
transactions on site failure with the WebSphere Application Server service
integration bus.
Application
– IBM SmartCloud Control Desk 7.5.0.0
– IBM Tivoli Process Automation Engine 7.5.0.2
Messaging - WebSphere MQ 7.0.1.9
Introducing service (virtual) Internet Protocol (IP) addresses and hostnames can
provide transparency to connecting applications while eliminating the need for
reconfiguration upon failover. A cluster manager, such as IBM Tivoli System
Automation for Multiplatforms (SA MP), can be used to monitor and automate
failover when a system or process fails.
In the multisite active-passive topology, the secondary site is passive, or
offline, and does not process any transactions. The secondary site should be
designed with the same capacity and storage as the primary site.
All the components that are necessary to ensure that the application is functional
should be replicated to the secondary site. Replication techniques, such as file
system replication using Storage Area Network (SAN) mirroring or any other disk
storage mechanism, should be deployed to ensure that the secondary site is
synchronized with the primary site at all times.
For some organizations, a localized high availability solution does not provide
enough protection, and a secondary site is required for disaster recovery. The
multisite active-passive topology provides the benefits of a local high
availability environment combined with added protection from a complete site
failure.
In the case of a disaster that would affect an entire site, a secondary site can be
brought online to continue operations where the primary left off. Database and
file-based replication mechanisms can be used to keep the standby site
synchronized with the primary.
An active-passive topology can also help reduce scheduled downtime for system
maintenance. If hardware upgrades or maintenance is required on the primary
site, the standby site can be brought online to continue processing while the
primary is down.
All components that are necessary to ensure that the application is functional
should be replicated to the secondary site using techniques as described here.
The middleware layer for WebSphere Application Server consists of two or more
separate cells dispersed across the sites. The separate cells manage their own
set of Java Virtual Machine (JVM) processes, which have to be maintained
independently. All the JVMs process transactions using the primary database
server. In case a primary site failure occurs, the JVMs can connect to the
secondary database server, which is replicated with the primary database server.
Note that during normal operation the secondary database server is in passive
mode; it cannot be configured in read-write mode.
These topologies assume that there is a high-speed network link connecting both
sites with mirroring techniques to keep the necessary data synchronized.
Often, organizations that invest in a secondary site for disaster recovery like to
utilize these resources to take some of the processing load off the primary site.
Certain resources can be brought online on the secondary site to help balance
the load between the two sites. Although this scenario is often perceived to be
ideal, the increased configuration complexity and additional licensing costs are
often overlooked.
IBM SmartCloud Control Desk relies heavily on the database layer for
transactions. If the secondary site’s application is online, it needs to connect to
the primary site’s database because the standby database is only available in
read-only mode. For this reason, this is not considered a true active-active
configuration. This cross-site connection to the database can cause performance
problems due to network latency.
Figure 2-3 shows an example of a hybrid-active topology that spans across
two sites.
2.4 Conclusion
This chapter outlined the various configuration examples for high availability and
disaster recovery for IBM SmartCloud Control Desk and IBM Maximo. By
understanding the pros and cons of each topology, an organization can begin to
design and implement a configuration that works for them.
Implementing a cluster manager in your topology allows for automated failure
detection and failover. The cluster manager is highly customizable and can
dramatically decrease failover times compared to manual methods. Introducing
service (virtual) Internet Protocol (IP) addresses can mask these system
failovers from connecting applications and users.
Local high availability is also a great place to start when working toward a full
disaster recovery plan. For example, when multiple sites are introduced, having
local high availability at each site can prevent the need to execute a full
disaster recovery procedure when only a single component of the overall
solution fails.
Figure 3-1 Example of local high availability topology configured in this chapter
3.2 Prerequisites
Before following the directions outlined here for configuring IBM SmartCloud
Control Desk and middleware components, there are some prerequisites that
should be in place:
Storage
A highly available storage solution should be in place for attachments, search
index files, integration framework files and any other files that must be shared
amongst the nodes. Redundancy through storage replication and local
mirroring is a common method to ensure storage is not a single point of
failure. There are many ways to create storage redundancy that are not
covered specifically in this book.
3.3.1 Cluster manager concepts
Let us take a closer look at some of the components and concepts related to a
cluster manager:
Cluster
A group of connected systems (nodes) that work together as a single
functional system from the perspective of the user. Clustering allows servers to
back each other up when failures occur by picking up the workload of the
failed server.
Cluster member
A single node that is defined within the cluster
Cluster manager
An application or tool used to combine the nodes of a cluster and detect the
status of the processes as defined by cluster resources and policies. The
cluster manager drives the automated failover procedures and is highly
configurable by the administrator. Many cluster managers are available that
should function properly with IBM SmartCloud Control Desk, but our
examples all use Tivoli System Automation for Multiplatforms as the cluster
management tool.
IP address takeover
The ability to transfer an Ethernet interface's IP address from one machine to
another when a server fails; to a client application, the two machines appear
at different times to be the same server. Your cluster manager can be
configured to automatically apply this service IP (or virtual IP) to the active
node only. Upon failure, the cluster manager should remove the alias interface
(and service IP along with it) from the primary node and apply it to the
secondary node where services will be restored. Transactions and
connections to the applications are always through the service IP address so
the configuration of connecting components never has to change. In many
cases, the startup of services is dependent on the application of the service
IP.
Heartbeating
A communication mechanism implemented by the cluster manager on each
node to allow each system to detect whether other cluster members are alive
or down. The nodes will send heartbeats to each other as a low-level system
status.
Resources
Applications or pieces of hardware that can be defined to the cluster
manager. These can be manually defined, or some cluster managers, such as
Tivoli System Automation for Multiplatforms, can use the concept of harvesting
to discover and define them automatically.
More information: These were just a few of the many concepts related to
cluster managers. For more information, review End-to-end Automation with
IBM Tivoli System Automation for Multiplatforms, SG24-7117.
3.3.2 Tivoli System Automation for Multiplatforms
Tivoli System Automation for Multiplatforms is the cluster manager used in the
local high availability configuration outlined in this chapter. Although SA MP is
the official abbreviation for Tivoli System Automation for Multiplatforms, it is
also referenced as TSAMP or TSA on many websites.
Tivoli System Automation for Multiplatforms comes prebundled with IBM DB2
v9.5 and later for use with the database components. For other products such as
IBM HTTP Server and IBM WebSphere Application Server, Tivoli System
Automation for Multiplatforms must be purchased and installed separately. To
install System Automation for Multiplatforms you can follow this procedure:
1. Obtain the Tivoli System Automation for Multiplatforms v3.2.2 installation
media and license from IBM Passport Advantage if you are entitled. You can
also access a trial of Tivoli System Automation for Multiplatforms from:
http://www-01.ibm.com/software/tivoli/products/sys-auto-multi/
2. Launch the installSAM executable from the Tivoli System Automation for
Multiplatforms folder.
3. Follow the instructions to install System Automation for Multiplatforms.
4. Users that will be running System Automation for Multiplatforms for
monitoring or start/stop of services must export CT_MANAGEMENT_SCOPE=2 as
a global environment variable. It is suggested to put this export command into
the startup profile for these users. Placing this command in /etc/profile, for
example, ensures that it is exported globally at system startup. Running env
| grep CT_MANAGEMENT_SCOPE should show that the variable is set.
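As a hedged illustration only, the following lines could be appended to
/etc/profile (or to each administrative user's own profile); the file location
is a suggestion, not a product requirement:

# Required command scope for System Automation for Multiplatforms users
export CT_MANAGEMENT_SCOPE=2

After logging in again, env | grep CT_MANAGEMENT_SCOPE should return
CT_MANAGEMENT_SCOPE=2.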
As an optional step, there are some predefined policies you can install for
System Automation for Multiplatforms that can be used for monitoring and
configuring your automation. This example is for Linux and these policies can be
found at:
https://www-304.ibm.com/software/brandcatalog/ismlibrary/details?catalog.label=1TW10SA02
1. Download the policies from this link.
2. Install the RPM file by running:
rpm -ivh sam.policies-1.3.3.0-1138.i386.rpm
3. The policy files should now exist in /usr/sbin/rsct/sapolicies.
Additional policies for other platforms and products can be found at:
https://www-304.ibm.com/software/brandcatalog/ismlibrary/search?rc=TivoliSystemAutomation&catalog.start=0#rc=TivoliSystemAutomation#catalog.start=0
Manual mode
When performing maintenance tasks on clusters controlled by SAMP, it may be
desirable to put System Automation for Multiplatforms into manual mode, which
allows you to stop and start services without System Automation for
Multiplatforms interference. Issue the command samctrl -M t to enter manual
mode. When maintenance is complete, samctrl -M f will enable System
Automation for Multiplatforms automation once again.
Important: After configuring the web server for high availability, make sure all
web interactions (web services and the user interface) utilize the service IP
address.
For this book the variables shown in Table 3-1 on page 33 are assumed. These
values are not mandatory for all installations and might vary in other
environments.
Table 3-1 Variables
Name Description Value
1. Install Tivoli System Automation for Multiplatforms (SAMP) on both nodes.
3.3.2, “Tivoli System Automation for Multiplatforms” on page 31 outlines this
procedure.
2. Prepare the servers to run in an System Automation for Multiplatforms
domain. On both nodes run the command:
preprpnode ihshost1 ihshost2
3. Create the System Automation for Multiplatforms domain. On one of the
nodes run:
mkrpdomain IHS_SAMP_DOMAIN ihshost1 ihshost2
4. Start the new System Automation for Multiplatforms domain. On one of the
nodes run:
startrpdomain IHS_SAMP_DOMAIN
5. Running the lsrpdomain command should show that your domain is listed and
online; see Example 3-1.
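Example 3-1 is not reproduced here. As an illustration only (the RSCT version
and port numbers are assumptions and will differ per installation), lsrpdomain
output for an online domain resembles:

Name            OpState RSCTActiveVersion MixedVersions TSPort GSPort
IHS_SAMP_DOMAIN Online  3.1.2.2           No            12347  12348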
Now it is time to create the resource groups and resources for IHS.
6. Create a new ihs.def file or modify and use the existing one in
/usr/sbin/rsct/sapolicies/ihs. Use the proper node names and ensure
that the scripts in the monitor, stop, and start command paths exist; see
Example 3-2.
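Example 3-2 is not reproduced here. As a minimal sketch, assuming the sample
policy scripts under /usr/sbin/rsct/sapolicies/ihs and an IBM HTTP Server root
of /opt/IBM/HTTPServer (both assumptions), an ihs.def for the IBM.Application
resource could look like the following; the timeout values are illustrative:

PersistentResourceAttributes::
   Name="ihs-rs"
   ResourceType=1
   NodeNameList={"ihshost1","ihshost2"}
   StartCommand="/usr/sbin/rsct/sapolicies/ihs/ihs start /opt/IBM/HTTPServer"
   StartCommandTimeout=120
   StopCommand="/usr/sbin/rsct/sapolicies/ihs/ihs stop /opt/IBM/HTTPServer"
   StopCommandTimeout=120
   MonitorCommand="/usr/sbin/rsct/sapolicies/ihs/ihs status /opt/IBM/HTTPServer"
   MonitorCommandPeriod=30
   MonitorCommandTimeout=25
   UserName="root"

The ihsip.def file for the IBM.ServiceIP resource, created in a step not
reproduced here and used later in this procedure, follows the same format; the
netmask is an assumption and must match your subnet:

PersistentResourceAttributes::
   Name="ihs-ip"
   ResourceType=1
   NodeNameList={"ihshost1","ihshost2"}
   IPAddress="IHS_SVC_IP"
   NetMask="255.255.240.0"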
8. Create the resource group in System Automation for Multiplatforms for IHS by
running:
mkrg ihs-rg
9. Create the IHS Application resource using your ihs.def file by running:
mkrsrc -f ihs.def IBM.Application
10.Add the IHS Application resource to the ihs-rg resource group by running:
addrgmbr -g ihs-rg IBM.Application:ihs-rs
11.Create the IHS ServiceIP resource using your ihsip.def file by running:
mkrsrc -f ihsip.def IBM.ServiceIP
12.Add the IHS ServiceIP resource to the ihs-rg resource group by running:
addrgmbr -g ihs-rg IBM.ServiceIP:ihs-ip
13.Create a network equivalency resource that can detect the status of the
nodes’ Ethernet interfaces by running:
mkequ ihs-ip-equ IBM.NetworkInterface:eth0:ihshost1,eth0:ihshost2
14.Create a dependency relationship that specifies that the ServiceIP depends
on the status of the network equivalency by running:
mkrel -p DependsOn -S IBM.ServiceIP:ihs-ip -G
IBM.Equivalency:ihs-ip-equ ihs-ip-rel-equ
15.Create a dependency relationship that specifies that the IHS application
depends on the status of the ServiceIP by running:
mkrel -p DependsOn -S IBM.Application:ihs-rs -G IBM.ServiceIP:ihs-ip
ihs-rs-rel-ip
16.Now that all the resources and relationships are created, run the lssam
command to view the status of the ihs-rg resource group.
17.You can now issue the chrg -o online ihs-rg command, which will change
the nominal status of the resource group to online. This will enable the
service IP on the first node and bring the IHS application online.
Figure 3-4 Example lssam output after switching the nominal status to online
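As an illustration only (the figure is not shown here), lssam output with the
nominal state online and the resources active on the first node resembles the
following, using the hostnames assumed throughout this chapter:

Online IBM.ResourceGroup:ihs-rg Nominal=Online
        |- Online IBM.Application:ihs-rs
                |- Online IBM.Application:ihs-rs:ihshost1
                '- Offline IBM.Application:ihs-rs:ihshost2
        '- Online IBM.ServiceIP:ihs-ip
                |- Online IBM.ServiceIP:ihs-ip:ihshost1
                '- Offline IBM.ServiceIP:ihs-ip:ihshost2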
19.If the ServiceIP and Application both show online in the status, then the IBM
HTTP Server should be up and running on the primary node. The service IP
should also be added as an alias to the Ethernet interface. Running ifconfig
on the active node should show the alias; refer to Example 3-4.
Example 3-4 Example ifconfig with service IP applied (eth0:0 in this case)
ti2022-l1:~ # ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:BC:70:A9
inet addr:9.12.5.169 Bcast:9.12.15.255 Mask:255.255.240.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:92730 errors:0 dropped:0 overruns:0 frame:0
TX packets:24333 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:8758460 (8.3 Mb) TX bytes:2836407 (2.7 Mb)
20.You should now be able to access the IBM HTTP Server through
http://IHS_SVC_IP/.
21.Issuing the rgreq -o move ihs-rg command switches the resources to the
second node and they should go offline on the first node; see Figure 3-5.
Figure 3-5 Example of lssam output after moving the resources to second node
Now that the resources and policy are configured, you need to add a tie
breaker. In a 2-node cluster configuration, when the nodes lose contact with
each other, they cannot determine which one failed and which one
should obtain quorum. This example uses a network tie breaker to help
resolve this problem. When specifying a network tie breaker, use an IP
address that should always be ping-able from the cluster nodes. The gateway
(router) is usually a good candidate for this.
22.Create the tie breaker resource by running this command:
mkrsrc IBM.TieBreaker Type="EXEC" Name="networktb"
DeviceInfo='PATHNAME=/usr/sbin/rsct/bin/samtb_net
Address=GATEWAY_IP Log=1' PostReserveWaitTime=30;
23.Activate this network tie breaker resource in the domain by running:
chrsrc -c IBM.PeerNode OpQuorumTieBreaker="networktb"
24.You can verify the status of this tie breaker by running the lsrsrc -c
IBM.PeerNode and lsrsrc -Ab IBM.TieBreaker commands; see Example 3-5.
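As an illustration only (Example 3-5 is not shown here), the class-level
attributes of IBM.PeerNode include the active tie breaker, along these lines:

Resource Class Persistent Attributes for IBM.PeerNode
resource 1:
        CommittedRSCTVersion  = ""
        ActiveVersionChanging = 0
        OpQuorumOverride      = 0
        CritRsrcProtMethod    = 1
        OpQuorumTieBreaker    = "networktb"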
Troubleshooting
If the IHS or service IP resources do not start or show in FAILED OFFLINE
status, you may have done something wrong when creating the resources.
The /var/log/messages file may have some information relating to the failed
startup of the service. You can also manually run the start command
/usr/sbin/rsct/sapolicies/ihs/ihs start IHS_ROOT to see whether it starts
this way. If not, there is most likely a problem with your ihs script or the IHS itself.
Starting IHS with IHS_ROOT/bin/apachectl start may also give some indication
of the problem.
The supported application servers for IBM SmartCloud Control Desk are IBM
WebSphere Application Server and Oracle WebLogic Server. In this section we
cover the setup steps to enable local high availability topology with WebSphere
Application Server, where we take a closer look at the following details:
WebSphere Application Server variables
WebSphere Application Server internal architecture
Installing WebSphere Application Server
Installing deployment manager
Automating deployment manager failover with SA MP
Troubleshooting
Installing application server profile on nodes
Automating nodeagent restart with SA MP
Federating web servers
Cluster configuration
3.5.3 Installing WebSphere Application Server
Before configuring WebSphere Application Server for high availability, the
product must be installed on all nodes. If any profile was created during
installation, remove it using the manageprofiles command.
Tip: For more information about the manageprofiles command, refer to:
http://pic.dhe.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/rxml_manageprofiles.html
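A hedged sketch of removing such a profile, assuming a WebSphere installation
root of /opt/IBM/WebSphere/AppServer and a default profile named AppSrv01
(both assumptions), follows:

# List the profiles that were created during installation
/opt/IBM/WebSphere/AppServer/bin/manageprofiles.sh -listProfiles
# Delete the unwanted profile and clean up the profile registry
/opt/IBM/WebSphere/AppServer/bin/manageprofiles.sh -delete -profileName AppSrv01
/opt/IBM/WebSphere/AppServer/bin/manageprofiles.sh -validateAndUpdateRegistry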
The deployment manager profile, Dmgr01, is installed on only one node. Server
washost1 is used, as shown in Example 3-6.
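Example 3-6 is not shown here. A typical deployment manager profile creation,
with the installation root, profile path, and administrative credentials as
assumptions, resembles:

# Run on washost1 only
/opt/IBM/WebSphere/AppServer/bin/manageprofiles.sh -create \
  -profileName Dmgr01 \
  -profilePath /opt/IBM/WebSphere/AppServer/profiles/Dmgr01 \
  -templatePath /opt/IBM/WebSphere/AppServer/profileTemplates/management \
  -serverType DEPLOYMENT_MANAGER \
  -hostName washost1 \
  -enableAdminSecurity true -adminUserName wasadmin -adminPassword WAS_PASSWORD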
Now it is time to create the resource groups and resources for the deployment
manager.
6. If administrative security is enabled, modify the scripts installed by System
Automation for Multiplatforms policies in 3.3.2, “Tivoli System Automation for
Multiplatforms” on page 31. The original script,
/usr/sbin/rsct/sapolicies/was/wasctrl-dmgr, needs to be modified to
support username and password.
a. Add the following between lines 40 and 41 (Example 3-8).
USER=$3
PASSWORD=$4
b. Replace the current stop_server function with the example in
Example 3-9.
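Example 3-9 is not shown here. The replacement function has to pass the
credentials to stopManager.sh when administrative security is enabled; a
minimal sketch, in which the variable names and the use of the first argument
as the profile path are assumptions, looks like this:

stop_server()
{
   # $1 is the deployment manager profile path; USER and PASSWORD come from
   # the extra script arguments added in the previous step.
   if [ -n "$USER" ]; then
      "$1"/bin/stopManager.sh -username "$USER" -password "$PASSWORD"
   else
      "$1"/bin/stopManager.sh
   fi
}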
8. Create a new dmgr-ip.def file for the Service IP. Be sure to use the correct IP
address (the service IP you will use), netmask and node hostnames for your
environment.
9. Create the resource group in System Automation for Multiplatforms for the
deployment manager by running:
mkrg dmgr-rg
10.Create the deployment manager Application resource using your
dmgr-jvm.def file by running:
mkrsrc -f dmgr-jvm.def IBM.Application
11.Add the deployment manager Application resource to the dmgr-rg resource
group by running:
addrgmbr -g dmgr-rg IBM.Application:dmgr-jvm
12.Create the deployment manager ServiceIP resource using your dmgr-ip.def
file by running:
mkrsrc -f dmgr-ip.def IBM.ServiceIP
13.Add the deployment manager ServiceIP resource to the dmgr-rg resource
group by running:
addrgmbr -g dmgr-rg IBM.ServiceIP:dmgr-ip
14.Create a network equivalency resource that can detect the status of the
nodes’ Ethernet interfaces by running:
mkequ dmgr-ip-equ IBM.NetworkInterface:eth0:washost1,eth0:washost2
15.Create a dependency relationship which specifies that the ServiceIP
depends on the status of the network equivalency by running:
mkrel -p DependsOn -S IBM.ServiceIP:dmgr-ip -G
IBM.Equivalency:dmgr-ip-equ dmgr-ip-rel-equ
16.Create a dependency relationship which specifies that the deployment
manager depends on the status of the ServiceIP by running:
mkrel -p DependsOn -S IBM.Application:dmgr-jvm -G
IBM.ServiceIP:dmgr-ip dmgr-jvm-rel-ip
17.Now that all the resources and relationships are created, run the lssam
command to view the status of the dmgr-rg resource group, as shown in
Figure 3-7.
18.Ensure that the deployment manager is down and update its hostname to use
the service IP using the wsadmin tool, as shown in Example 3-12.
19.You can now issue the chrg -o online dmgr-rg command, which will change
the nominal status of the resource group to online. This will enable the
service IP on the first node and bring the deployment manager online.
20.Run lssam again. It may show pending online while the services are brought
up. Running lssam again in a few moments should show that the application
and service IP are online on the first node, as shown in Figure 3-8.
Figure 3-8 Example lssam output after switching the nominal status to online
21.If the ServiceIP and Application both show online in the status, then the
deployment manager should be up and running on the primary node. The
service IP should also be added as an alias to the Ethernet interface. Running
ifconfig on the active node should show the alias; see Example 3-13 on
page 49.
Example 3-13 Example ifconfig with service IP applied (eth0:0 in this case)
ti2022-l3:~ # ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:BC:70:AB
inet addr:9.12.5.152 Bcast:9.12.15.255 Mask:255.255.240.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8364877 errors:0 dropped:0 overruns:0 frame:0
TX packets:12631835 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
22.You should now be able to access the Integrated Solutions Console through:
Figure 3-9 Example of lssam output after moving the resources to the second node
Now that the resources and policy are configured, add a tie breaker. When
specifying a network tie breaker, use an IP address that should always be
ping-able from the cluster nodes. The gateway (router) is usually a good
candidate for this.
24.Create the tie breaker resource by running this command:
mkrsrc IBM.TieBreaker Type="EXEC" Name="networktb"
DeviceInfo='PATHNAME=/usr/sbin/rsct/bin/samtb_net Address=GATEWAY_IP
Log=1' PostReserveWaitTime=30
25.Activate this network tie breaker resource in the domain by running:
chrsrc -c IBM.PeerNode OpQuorumTieBreaker="networktb"
For more information about cluster management, refer to 3.3.1, “Cluster manager
concepts” on page 29.
3.5.6 Troubleshooting
If the deployment manager or service IP resources do not start or show in the
FAILED OFFLINE status, you may have done something wrong when creating
the resources.
The /var/log/messages file may have some information relating to the failed
startup of the service. You can also manually run the start command
/usr/sbin/rsct/sapolicies/was/wasctrl-dmgr start WAS_DMGR_PATH to see if it
starts this way. If not, there is most likely a problem with your wasctrl-dmgr
script or the deployment manager itself. Starting the deployment manager with
WAS_DMGR_PATH/bin/startManager.sh may also give some indication of the
problem.
After both application server profiles are installed, they must be federated with
the deployment manager. To accomplish this, start the deployment manager with the
chrg -o online dmgr-rg command.
After starting the deployment manager, run the addNode command on all nodes.
The command to federate the washost1 node is shown in Example 3-15.
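The Example 3-15 command itself is not shown here. A typical invocation,
assuming the application server profile path placeholder WAS_PROFILE_PATH, the
deployment manager service IP placeholder DMGR_SVC_IP, the default deployment
manager SOAP port of 8879, and administrative security enabled (all
assumptions), is:

WAS_PROFILE_PATH/bin/addNode.sh DMGR_SVC_IP 8879 -username wasadmin -password WAS_PASSWORD

The ADMU messages that follow are output produced by this command.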
ADMU0306I: Note:
ADMU0302I: Any cell-level documents from the standalone ti2022-l3Cell01
configuration have not been migrated to the new cell.
ADMU0307I: You might want to:
ADMU0303I: Update the configuration on the ti2022-l3Cell01 Deployment Manager
with values from the old cell-level documents.
ADMU0306I: Note:
ADMU0304I: Because -includeapps was not specified, applications installed on
the standalone node were not installed on the new cell.
ADMU0307I: You might want to:
ADMU0305I: Install applications onto the ti2022-l3Cell01 cell using wsadmin
$AdminApp or the Administrative Console.
1. Install System Automation for Multiplatforms on both nodes. 3.3.2, “Tivoli
System Automation for Multiplatforms” on page 31 outlines this procedure.
2. Prepare the servers to run in an System Automation for Multiplatforms
domain. On both nodes run the command:
preprpnode washost1 washost2
3. If administrative security is enabled, modify the scripts installed by System
Automation for Multiplatforms policies in 3.3.2, “Tivoli System Automation for
Multiplatforms” on page 31. The original script,
/usr/sbin/rsct/sapolicies/was/wasctrl-na, needs to be modified to
support username and password:
a. Add the lines shown in Example 3-16 between line 60 and 61.
NA_USER=$5
NA_PASSWORD=$6
b. Replace the current stop case statement shown in Example 3-17. The
example only shows an excerpt of the complete script file, and the “...”
placeholder represents content that has been left out intentionally.
5. Create the resource group in System Automation for Multiplatforms for the
washost1 nodeagent by running:
mkrg nodeagent-washost1-rg
6. Create the washost1 nodeagent Application resource using your
nodeagent-washost1.def file by running:
mkrsrc -f nodeagent-washost1.def IBM.Application
7. Add the washost1 nodeagent Application resource to the
nodeagent-washost1-rg resource group by running:
addrgmbr -g nodeagent-washost1-rg IBM.Application:nodeagent-washost1
8. You can now issue the chrg -o online nodeagent-washost1-rg command,
which will change the nominal status of the resource group to online. This will
bring the nodeagent online.
3.5.9 Federating web servers
Federating the web servers into the WebSphere Deployment Manager allows
them to be mapped to the Tivoli Process Automation Engine modules during
deployment. Having the web servers in WebSphere also allows for generating
and propagating the web server plug-ins for load balancing.
Copy the configuration scripts generated during the IBM HTTP Server installation
in “Installing IBM HTTP Server” on page 33 to WAS_DMGR_PATH/bin. Run all the
copied configuration scripts as shown in Example 3-19.
Input parameters:
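The full Example 3-19 is not shown here. The generated scripts are named after
the web server definition; a sketch of one run, in which the definition name
webserver1 and the security-related arguments are assumptions, is:

cd WAS_DMGR_PATH/bin
./configurewebserver1.sh -profileName Dmgr01 -user wasadmin -password WAS_PASSWORD

When it runs, the script echoes its input parameters, as the fragment above
indicates.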
Follow these steps to create the clusters and application servers:
1. Log in to the Integrated Solutions Console and navigate to Servers →
Clusters → WebSphere application server clusters.
2. Select New.
3. Type SCCDUI for the Cluster name as shown in Figure 3-10.
4. Select Next.
5. Type SCCDUI1 for Member name as shown in Figure 3-11 on page 58.
6. Select Next.
7. Type SCCDUI2 for Member name as shown in Figure 3-12 on page 59.
Figure 3-12 Cluster member SCCDUI2 creation
8. Select Next.
9. A summary table (Figure 3-13 on page 60) is displayed with cluster
information. Review and select Finish.
After performing all the steps, the cluster panel should look as in Figure 3-14 on
page 61, and the application server panel should look as in Figure 3-15 on
page 61.
Figure 3-14 Clusters panel
3.6 Database
The database is one of the most critical components of the IBM SmartCloud
Control Desk application. You can use IBM DB2, Oracle, or Microsoft SQL Server
for the deployment. The middleware installer program provides the option of
installing a new instance of DB2 or using a preexisting instance of the DB2
database. If you choose Oracle or Microsoft SQL Server, you must install and
configure them manually.
Bringing down the database will disrupt the IBM SmartCloud Control Desk
function. It is advised to test your database high availability solution extensively in
your environment. Various high availability database configurations are available.
It is suggested that you review IBM SmartCloud Control Desk high availability
documents to choose the optimum solution for your environment.
In this book we cover the high availability topology using IBM DB2 High
Availability and Disaster Recovery (HADR), DB2 shared disk, DB2 IBM
pureScale®, Oracle Real Application Clusters, and Oracle Active Data Guard.
This section describes three options for implementing a high availability solution
for IBM SmartCloud Control Desk: cluster management with DB2 shared disk, and
DB2 HADR, both with Automatic Client Reroute (ACR). For high availability
combined with the performance and scalability of an active cluster, DB2
pureScale can be implemented. Let us now take a closer look at the following
details:
DB2 setup
HADR setup
HADR requirements
HADR considerations
HADR setup
DB2 High Availability Instance Configuration Utility
HADR with a cluster manager
HADR setup with db2haicu
DB2 shared disk high availability setup
DB2 shared disk HA requirements
DB2 shared disk HA setup
DB2 pureScale
DB2 setup
It is assumed that DB2 is already installed and configured for the IBM
SmartCloud Control Desk application.
For this topology, DB2 was installed on two separate servers. The first server
performed the role of the primary IBM SmartCloud Control Desk database, and
the second server was the standby database. The data was kept synchronized
using shared disk or HADR.
HADR setup
The HADR feature provides a highly available solution for database failure.
HADR protects against data loss by replicating data changes from the primary
database to the secondary database.
All changes that take place at the primary database are written to the DB2 logs.
These logs are shipped to the secondary database server, where the log records
are replayed to the local copy of the database. This ensures that the data on the
primary and secondary database are in a synchronized state. The secondary
server is always in the rollforward mode, in the state of near readiness, so the
takeover to the standby server is fast.
HADR uses dedicated TCP/IP communication ports and a heartbeat to track the
current state of the replication. If the standby database is up to date with the
primary database, the pair is said to be in HADR peer state.
HADR requirements
The following requirements must be in place to set up HADR:
The operating system version and patch level must be the same on the
primary and standby database server. For a short duration during the rolling
upgrade they may be different, but use caution.
The DB2 version, level and bit size (32-bit or 64-bit) must be identical on the
primary and standby database server.
The primary and standby database must have the same name. This means
that the two databases cannot be on the same server.
A reliable TCP/IP interface must be available between the HADR servers.
The database layout including the bufferpool sizes, tablespace name, size
and type, and log space must be identical on the primary and secondary
database servers.
HADR considerations
The following parameters should be considered for the HADR setup and adjusted
according to the needs:
AUTORESTART
Consider setting this db cfg parameter to OFF when the HADR database is
configured with Automatic Client Reroute (ACR). Leave AUTORESTART set to ON for
non-HADR environments.
LOGINDEXBUILD
This db cfg parameter should be set to ON so that index creation, re-creation,
or reorganization on the tables is logged on the primary database and
replayed on the secondary database.
HADR_PEER_WINDOW
This is used to ensure data consistency. If the value is set to greater than
zero, the HADR database pair continues to behave as though they are in the
peer state, for the configured time in case the connection is lost between the
two databases.
The advantage of configuring the peer window is a lower risk of transaction
loss during multiple or cascading failures. The disadvantage of configuring the
peer window is that transactions on the primary database will take longer or
time out when the primary database is in the peer window waiting for the
connection to the standby database or for the peer window to expire.
This parameter must be adjusted according to the needs of your environment.
HADR_TIMEOUT
This db cfg parameter specifies the time in seconds that the DB2 HADR
database waits for response from the other database before it considers the
communication to have failed and closes the connection.
This parameter must be adjusted according to the needs of your environment.
SYNCMODE
This db cfg parameter specifies the synchronization mode, which indicates how
log writing is managed between the primary and secondary servers. These modes
apply only when HADR is in the peer state. The valid values are:
– SYNC (Synchronous)
This mode provides the greatest protection against transaction loss, and
using it results in the longest transaction response time among the three
modes.
– NEARSYNC (Near synchronous)
While this mode has a shorter transaction response time than
synchronous mode, it also provides slightly less protection against
transaction loss.
– ASYNC (Asynchronous)
This mode has the highest chance of transaction loss if the primary
system fails. It also has the shortest transaction response time among the
three modes.
– SUPERASYNC (Super Async)
This mode ensures that the transaction can never be blocked or
experience elongated response times due to network interruption or
congestion, thereby allowing transactions to be processed more quickly.
In our example we used the value of SYNC for the parameter.
The installation path and other variables are listed in Table 3-3.
To set up HADR using the command line interface, complete the following steps:
1. Set the required database configuration parameters.
If archive logging is not configured already, then update the LOGRETAIN and
LOGARCHMETH1 parameters by running the following commands:
db2 update db cfg for DB2_DBNAME using LOGRETAIN recovery
db2 update db cfg for DB2_DBNAME using LOGARCHMETH1 LOGRETAIN
Set the LOGINDEXBUILD parameter so that the index creation and
reorganization operations are logged by running the following command:
db2 update db cfg for DB2_DBNAME using LOGINDEXBUILD ON
2. Back up the database on the primary node by running the following
command. The database backup should be an offline backup, which means
no user connections are allowed on the database.
db2 backup database DB2_DBNAME to BACKUP_PATH
3. Transfer the backup image to the secondary node.
4. Restore the database on the secondary server by running the following
command. The standby database must be in the Rollforward pending mode.
db2 restore database DB2_DBNAME from BACKUP_PATH taken at
BACKUP_TIMESTAMP replace history file
Tip: Check the database Rollforward pending status by issuing the db2 get
db cfg for DB2_DBNAME |grep "Rollforward pending" command.
Example 3-20 DB2 HADR db cfg update commands for the primary
db2 "update db cfg for DB2_DBNAME using HADR_LOCAL_HOST db2hadrhost1"
db2 "update db cfg for DB2_DBNAME using HADR_REMOTE_HOST db2hadrhost2"
db2 "update db cfg for DB2_DBNAME using HADR_LOCAL_SVC db2hadrlocalsvc"
db2 "update db cfg for DB2_DBNAME using HADR_REMOTE_SVC db2hadrremotesvc"
db2 "update db cfg for DB2_DBNAME using HADR_REMOTE_INST DB2_INSTANCE"
db2 "update db cfg for DB2_DBNAME using HADR_TIMEOUT 120"
db2 "update db cfg for DB2_DBNAME using HADR_SYNCMODE SYNC"
db2 "update db cfg for DB2_DBNAME using HADR_PEER_WINDOW 120"
6. Run the db2 "get db cfg for DB2_DBNAME" |grep HADR command.
Example 3-21 lists the db cfg parameter configuration for HADR on the
primary database.
Example 3-22 DB2 HADR db cfg update commands for the secondary
db2 "update db cfg for DB2_DBNAME using HADR_LOCAL_HOST db2hadrhost2"
db2 "update db cfg for DB2_DBNAME using HADR_REMOTE_HOST db2hadrhost1"
db2 "update db cfg for DB2_DBNAME using HADR_LOCAL_SVC db2hadrremotesvc"
8. Run the db2 "get db cfg for DB2_DBNAME" |grep HADR command.
Example 3-23 lists the db cfg parameter configuration for HADR on the
secondary database.
9. From DB2 version 9.7 Fixpack 5, the secondary database can be configured
with read-only access, which allows the application and users to run queries
and reports against the database. This step is optional.
Example 3-24 shows the commands to set the read-only access on the
secondary database.
Example 3-24 Commands to set the read-only access on the secondary database
db2set -i DB2_INSTANCE DB2_STANDBY_ISO=UR
db2set -i DB2_INSTANCE DB2_HADR_ROS=ON
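The next two steps are not reproduced here; before the status can be verified,
HADR must be started, on the standby first and then on the primary. The
standard DB2 commands are:

# On the secondary (standby) database server:
db2 start hadr on database DB2_DBNAME as standby
# On the primary database server:
db2 start hadr on database DB2_DBNAME as primary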
12.Verify HADR status by running the following command.
db2pd -d DB2_DBNAME -hadr
Example 3-25 displays the HADR status.
HADR Information:
Role State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
Primary Peer Sync 0 0
PeerWindowEnd PeerWindow
Wed Oct 31 17:49:25 2012 (1351720165) 120
LocalHost LocalService
ti2022-l5 55001
In this section, we describe how to configure HADR with IBM Tivoli System
Automation for Multiplatforms (SA MP) to enable automating HADR takeover.
Combining HADR with a cluster manager strengthens high availability for the IBM
SmartCloud Control Desk database.
The cluster manager monitors the health of the network interface, hardware, and
software processes, and detects and displays any failure. In case of failure the
cluster manager can transfer the service and all the resources to the secondary
server.
The primary and secondary server should be able to ping each other using
the hostnames and IP addresses.
The hostname resolution should be successful on the primary and secondary
server.
The service IP address must be available to configure the cluster manager.
Before using the db2haicu utility, the primary and secondary nodes must be
prepared. Run the following command as root on both servers. This command
needs to be run once per node.
preprpnode db2hadrhost1 db2hadrhost2
Once the nodes are prepared, log on to the secondary database server and
issue the db2haicu command as DB2 instance administrator. The following
section lists the setup tasks for db2haicu.
1. From the secondary database server, issue the db2haicu command; see
Example 3-27.
3. A quorum must be configured for the cluster domain. The supported quorum
type for this solution is the network quorum, which must be a ping-able IP
address (gateway router in the example) that is used to decide which node in
the cluster will act as the active node during failure, and which node will be
offline. Example 3-29 on page 72 shows the quorum creation. Enter 1 and
press Enter to create the quorum.
1. Yes
2. No
1
The following is a list of supported quorum device types:
1. Network Quorum
Enter the number corresponding to the quorum device type to be used:
[1]
1
Specify the network address of the quorum device:
9.12.4.1
Configuring quorum device for domain sccd_hadr_domain ...
Configuring quorum device for domain sccd_hadr_domain was
successful.
The cluster manager found 2 network interface cards on the machines
in the domain. You can use db2haicu to create networks for these
network interface cards. For more information, see the topic
'Creating networks with db2haicu' in the DB2 Information Center.
4. After the quorum configuration, define the public and private networks of your
system to db2haicu. This step is important for the cluster to detect network
failure. All network interfaces are automatically discovered by the db2haicu
tool. Example 3-30 shows the definition of a public network.
5. After the network definition, db2haicu prompts for the cluster manager
software being used for the current setup. Example 3-31 lists the selection of
cluster manager software, in this case SA MP.
2.Vendor
Enter a value for the high availability configuration parameter: [1]
1
Setting a high availability configuration parameter for instance
db2inst1
to TSA.
Adding DB2 database partition 0 to the cluster ...
Adding DB2 database partition 0 to the cluster was successful.
6. After the DB2 secondary instance resource has been added to the cluster
domain, confirm automation for the HADR database. Example 3-32 displays
the validation of the HADR configuration.
7. After the secondary instance has been configured, the db2haicu configuration
has to be run on the primary instance. Run the db2haicu command again on
the primary node as DB2 instance administrator. The first step is to select
cluster manager software for the setup. Example 3-33 shows the db2haicu
setup on the primary node.
8. db2haicu will then proceed to add the DB2 single partition resource for the
primary database to the cluster. Next it will prompt you for confirmation of
automating a HADR failover. Example 3-34 on page 77 shows the addition of
the HADR primary database to the domain.
Example 3-34 Add HADR primary database to the domain
Do you want to validate and automate HADR failover for the HADR
database MAXDB75? [1]
1. Yes
2. No
1
Adding HADR database MAXDB75 to the domain ...
Adding HADR database MAXDB75 to the domain was successful.
9. Once the HADR database resource has been added to the cluster, db2haicu
will prompt you to create a virtual IP address. Example 3-35 shows the
addition of the service IP configuration for the cluster.
Ensure that the service IP address and subnet mask values are correct. All
invalid inputs will be rejected. The configuration for the cluster has been
completed. As root, issue the lssam command to see the resources created.
Figure 3-16 on page 78 lists the output of the lssam command. The resource
group should be listed online along with the status for various other
resources.
10.Set up the ACR feature on both DB2 database catalogs. The client reroute
feature allows a DB2 client application to recover from a lost database
connection in case of a network failure. In the high availability configuration
the service IP address is used as alternate server for the DB2 database
catalog. Example 3-36 displays the ACR setup. This command must be run
on both DB2 nodes.
Database 1 entry:
Comment =
Directory entry type = Indirect
Catalog database partition number = 0
Alternate server hostname = 9.12.4.135
Alternate server port number = 60000
Database 1 entry:
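The full ACR command in Example 3-36 is not reproduced here. A minimal sketch, using the database alias MAXDB75, the service IP address 9.12.4.135, and port 60000 shown in the directory output above, would be the following (run it on both nodes, then verify the directory entries):

db2 update alternate server for database MAXDB75 using hostname 9.12.4.135 port 60000
db2 list database directory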
This configuration assumes that there is a shared disk configured and available
for use between the active and passive DB2 servers. Figure 3-17 on page 80
displays the typical two-node setup using a shared disk.
In case the active node fails, all the DB2 resources on the shared disk are failed
over to the passive node. The cluster manager automatically mounts the
shared disk on the passive node and restarts the DB2 instance. At that time, the
second node becomes the primary database server.
These file systems cannot be made highly available by default:
– Shared file systems such as NFS
– Clustered file systems such as GPFS, CFS
– Any file system mounted on the root (/) directory
– Any virtual file system such as /proc
Important: The mount points for the shared disks must be defined to the
operating system being run on the active and the passive nodes
(/etc/fstab file for Linux and /etc/filesystems for AIX). Consult your
system administrator for details about other operating systems.
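As an illustration, a Linux /etc/fstab entry for a cluster-managed shared file system typically uses the noauto option so that the operating system does not mount it at boot and the cluster manager can mount it on whichever node is active. The device and mount point below are assumptions for this sketch:

/dev/sdb1   /db2shared   ext3   noauto   0 0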
The DB2 instance owner name, owner ID, group name, and group ID should
be the same on all the nodes. In addition, it is required that the DB2 instance
owner password be the same on both nodes.
The DB2 installation should be performed on both nodes. The DB2 instance
and database should be created on the shared disk mounted on the active
node. Ensure that the /etc/services files on both the nodes match the DB2
entries.
The installation path and other variables are listed in Table 3-4.
Example 3-37 Cluster domain creation using db2haicu for shared disk
db2inst1@ti2022-i7:~> db2haicu
Welcome to the DB2 High Availability Instance Configuration Utility
(db2haicu).
Create a domain and continue? [1]
1. Yes
2. No
1
Create a unique name for the new domain:
sccd_db2shared_domain
Nodes must now be added to the new domain.
How many cluster nodes will the domain sccd_db2shared_domain
contain?
2
3. Enter the hostnames of the active and passive nodes and confirm the domain
creation. Example 3-38 lists the domain creation.
4. A quorum must be configured for the cluster domain. The supported quorum
type for this solution is network quorum. A network quorum must be a
ping-able IP address (gateway router in the example) that is used to decide
which node in the cluster will act as the active node during failure.
Example 3-39 shows quorum creation. Type 1 and press Enter to configure
the quorum device.
5. After the quorum configuration, define the public network of your system to
db2haicu. This step is important for the cluster to detect network failure.
Example 3-40 shows an example of a public network setup.
Adding network interface card eth0 on cluster node
ti2022-i7.itso.ibm.com to the network db2_public_network_0 was
successful.
Enter the name of the network for the network interface card: eth0
on cluster node: ti2022-l8.itso.ibm.com
1. db2_public_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card eth0 on
cluster node ti2022-l8.itso.ibm.com to the network
db2_public_network_0? [1]
1. Yes
2. No
1
Adding network interface card eth0 on cluster node
ti2022-l8.itso.ibm.com to the network db2_public_network_0 ...
Adding network interface card eth0 on cluster node
ti2022-l8.itso.ibm.com to the network db2_public_network_0 was
successful.
6. After the network definition, db2haicu prompts for the cluster manager
software being used for the current setup. Example 3-41 displays the cluster
manager selection. For our example we selected SA MP.
7. The next step is to configure the failover policy for the instance db2inst1. The
failover policy determines the set of nodes on which the cluster manager can
restart the DB2 instance if it fails.
8. Next, db2haicu prompts you to designate any noncritical mount points. In our
example we chose to designate two such points. You may add any other
noncritical mount points that you are sure you never want to fail over. The list
should include any mount points listed in /etc/fstab on Linux or
/etc/filesystems on AIX, except for the DB2 shared ones. Example 3-43
displays the noncritical mount point selection.
1. Yes
2. No
1
Enter the full path of the mount to be made non-critical:
/dev
Adding path /dev to the non-critical path list ...
Adding path /dev to the non-critical path list was successful.
Do you want to add more paths to the non-critical path list? [1]
1. Yes
2. No
2
9. Next specify the hostnames for the active and passive nodes. The db2haicu
utility will automatically add the DB2 nodes to the specified cluster manager.
Example 3-44 displays the selection of the hostnames for active and passive
nodes.
10.Once the database resource has been added to the cluster, db2haicu will
prompt you to create a service IP address. Example 3-45 lists the setup of the
service IP address.
Example 3-45 Service IP address setup for shared disk HA for DB2
Do you want to configure a virtual IP address for the DB2 partition:
0? [2]
1. Yes
2. No
1
Enter the virtual IP address:
9.12.4.167
Enter the subnet mask for the virtual IP address 9.12.4.167:
[255.255.255.0]
255.255.240.0
Select the network for the virtual IP 9.12.4.167:
1. db2_public_network_0
Ensure that the service IP address and subnet mask values are correct. All
invalid inputs will be rejected. The configuration for the cluster has been
completed.
11.Set up the ACR feature on both DB2 database catalogs. The client reroute
feature allows a DB2 client application to recover from a lost database
connection in case of a failure. In the HA configuration the service IP address
is used as alternate server for the DB2 database catalog. This command
should be run on the active database node. Example 3-46 displays the
example of an automatic client reroute.
Database 1 entry:
12.Run lssam as root from the active server to see the status of the cluster and
the new resource groups created during this process. Figure 3-18 displays
the lssam output.
DB2 pureScale
The DB2 pureScale feature incorporates several design features to deliver fault
tolerance that can not only keep your instance available, but also minimize the
effect of component failures on the rest of the database system. DB2 pureScale
works as an active cluster, which is accessible through a single IP address and
helps to achieve seamless failover. This solution gives your environment the
performance benefits of a load-balanced database and the reliability of a highly
available system.
You can implement DB2 pureScale to use these features and assist with your
high availability configuration. However, you must complete some IBM
SmartCloud Control Desk post-install configuration steps to use DB2 pureScale
with your product.
For more information about DB2 pureScale, visit the DB2 pureScale Information
Center at:
http://pic.dhe.ibm.com/infocenter/db2luw/v9r8/index.jsp
3.6.2 Oracle
If your environment uses Oracle as a database platform, high availability options
are available that are compatible with IBM SmartCloud Control Desk. Oracle
Real Application Clusters (RAC) and Oracle Active Data Guard are two of Oracle's
high availability offerings.
You can create an Oracle RAC database across multiple nodes; however, you
must perform specific configuration tasks to ensure that Oracle RAC operates
smoothly with IBM SmartCloud Control Desk.
More information about Oracle RAC with IBM SmartCloud Control Desk can be
found at:
http://pic.dhe.ibm.com/infocenter/tivihelp/v49r1/topic/com.ibm.mbs.d
oc/gp_highavail/c_oracle_rac.html
Active Data Guard provides several protection modes for log shipping and
synchronization. It is important to research and determine which configuration
works best for your organization. For more information about Active Data Guard,
consult Oracle’s website at:
http://www.oracle.com/ha
3.7 IBM SmartCloud Control Desk
In this section we describe the steps to enable high availability for the IBM
SmartCloud Control Desk. These steps can also be used to configure other IBM
Maximo products that are based on the Tivoli Process Automation Engine.
For this book we used the variables shown in Table 3-5. These values are not
mandatory for all installations and might vary in other environments.
After modifying the IP address, rebuild and redeploy all the ear files. For a new
IBM SmartCloud Control Desk installation, enter the service IP address as the
database hostname and IP address instead of the hostname and IP address of
the primary database server.
3. Add the following properties to maximo_UI.properties; see Example 3-50.
7. Update the web_MIF.xml file and comment out the BIRT servlet
configurations, as shown in Example 3-52. The example only shows an
excerpt of the complete XML file; the “...” placeholder represents content that
has been left out intentionally.
-rw-r--r-- 1 root root 1398 Oct 25 16:46 deployment-application_UI.xml
<echo>properties.jar
file=${maximo.deploydir.temp}/${maximo.propertiesjarfile}</echo>
<copy todir="${maximo.deploydir.temp}/properties" >
<fileset dir="${maximo.basedir}/properties">
<include name="**/*.*"/>
<exclude name="maximo_*.properties"/>
<exclude name="version/*.*"/>
</fileset>
</copy>
...
The example only shows an excerpt of the complete XML file; the "..." placeholder
represents content that has been left out intentionally.
# Changes IBM SmartCloud Control Desk default EAR build definition and properties file
export BASE_DIR=./../applications/maximo
cp buildmaximoear_UI.xml buildmaximoear.xml
cp $BASE_DIR/properties/maximo_UI.properties $BASE_DIR/properties/maximo.properties
cp $BASE_DIR/mboweb/webmodule/WEB-INF/web_UI_CRON.xml \
$BASE_DIR/mboweb/webmodule/WEB-INF/web.xml
export BUILD_DIR=./default
export EAR_FILENAME=sccdui.ear
export MAXIMO_PROPERTIES=maximo.properties
...
# Changes IBM SmartCloud Control Desk default EAR build definition and properties file
export BASE_DIR=./../applications/maximo
cp buildmaximoear_MIF.xml buildmaximoear.xml
cp $BASE_DIR/properties/maximo_MIF.properties $BASE_DIR/properties/maximo.properties
cp $BASE_DIR/mboweb/webmodule/WEB-INF/web_MIF.xml \
$BASE_DIR/mboweb/webmodule/WEB-INF/web.xml
export BUILD_DIR=./default
export EAR_FILENAME=sccdmif.ear
export MAXIMO_PROPERTIES=maximo.properties
...
# Changes IBM SmartCloud Control Desk default EAR build definition and properties file
export BASE_DIR=./../applications/maximo
cp buildmaximoear_CRON.xml buildmaximoear.xml
export BUILD_DIR=./default
export EAR_FILENAME=sccdcron.ear
export MAXIMO_PROPERTIES=maximo.properties
...
After following these steps, generate the new EAR files by running the three
custom build scripts, as sketched below.
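Assuming the three fragments above were saved as executable scripts in the maximo/deployment directory, for example as buildsccduiear.sh, buildsccdmifear.sh, and buildsccdcronear.sh (the file names are illustrative and not taken from this book), the builds could be started as follows:

cd <install_home>/maximo/deployment
./buildsccduiear.sh
./buildsccdmifear.sh
./buildsccdcronear.sh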
Figure 3-19 sccdui.ear deployment
7. Select Next.
8. If there is a custom virtual host configuration, map it to the appropriate
modules and select Next.
9. Click Finish.
b. Server
SCCDCRON1
c. Value
The list built based on the SQL shown in Example 3-67
Tip: Depending on how many cron tasks are defined, the size of the
attribute MAXPROPVALUE.PROPVALUE may need to be increased.
6. Select Save.
7. Repeat steps 4 on page 102 through 6 for server SCCDCRON.
8. Connect to the IBM SmartCloud Control Desk database and run the query
shown in Example 3-68 to get all non-JMS cron tasks configured in the
environment.
13.Select Save.
14.Repeat steps 8 through 13 for server SCCDMIF2.
AllowOverride None
Order allow,deny
Allow from all
</Directory>
...
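The opening lines of this directive fall on the previous page and are not reproduced here. A typical complete stanza, assuming the attachments are served under a /doclinks alias (the alias name and the Options line are assumptions for this sketch), looks like this:

Alias /doclinks "ATTACHMENTS_PATH"
<Directory "ATTACHMENTS_PATH">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>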
3. Restart IHS.
4. Repeat steps 1 on page 104 through 3 for all IHS servers.
5. Log in to IBM SmartCloud Control Desk and navigate to System
Configuration Platform Configuration System Properties.
6. Search for mxe.doclink.doctypes.defpath and set its value to
ATTACHMENTS_PATH.
7. Search for mxe.doclink.doctypes.topLevelPaths and set its value to
ATTACHMENTS_PATH.
8. Restart the IBM SmartCloud Control Desk application servers.
9. Navigate to any application with attachments, for example Service Desk
Incidents.
10.Select Select Action Attachment Library/Folders Manage Folders.
11.Update all existing Default File Path values to ATTACHMENTS_PATH; see
Figure 3-20 on page 106.
More information: For more information about the Global Search
application, refer to:
http://pic.dhe.ibm.com/infocenter/tivihelp/v50r1/index.jsp?topi
c=%2Fcom.ibm.tusc.doc%2Fglobal_search%2Ft_gsearch_intro.html
3.8 Integration framework
Integration is an important part of IBM SmartCloud Control Desk, because it
allows inter-operation with other systems and data loads. The Maximo
Integration Framework (MIF) makes use of Java Message Service (JMS)
resources and file system-based error management to function properly. This
section describes the configuration necessary to make this feature highly
available.
For this book we used the variables in Table 3-5. These values are not
mandatory for all installations and might vary in other environments.
For a local high availability topology, the SIB solution is faster to set up and
requires fewer resources, because its runtime is embedded in WebSphere
Application Server. The WebSphere MQ solution is more robust for
active-passive and hybrid-active topologies and requires less configuration
effort when the environment changes.
Tip: The DB2_INSTALL_PATH for DB2 V9.7 default value for Linux is
/opt/ibm/db2/V9.7.
5. Type WAS_DMGR_PATH for Name and the deployment manager path for Value as
shown in Figure 3-21.
6. Select OK.
7. Save and synchronize changes.
8. Navigate to Resources JDBC JDBC providers.
9. Select Cell scope and select New.
10.Select DB2 as Database Type, DB2 Using IBM JCC Driver as Provider Type
and Connection pool data source as Implementation type, as shown in
Figure 3-22 on page 110.
11.Select Next.
12.Type ${WAS_DMGR_PATH}/jdbc as both paths as shown in Figure 3-23 on
page 111.
Figure 3-23 JDBC provider driver path setup
13.Select Next.
14.A summary table (Figure 3-24 on page 112) will be displayed with the
provider information. Review and select Finish.
Important: The user ID configured must have create schema and select,
insert, delete, and update privileges.
22.Type MAXDB75 as Data source name, jdbc/MAXDB75 as JNDI name, and select
Next.
23.Select DB2 Universal JDBC Driver Provider as an existing JDBC provider
and select Next.
24.Type MAXDB75 as Database name, database IP/hostname address as Server
name, database port as Port number, uncheck the “Use this data source in
container managed persistence (CMP)” option (Figure 3-25), and select Next.
Attention: The database name, address, and port number might vary from
environment to environment. Check the database information before creating the
data source.
25.Select the maximo alias created in Figure 3-26 on page 114 for
”Component-managed authentication alias” and select Next.
26.A summary table (Figure 3-27 on page 115) will be displayed with the data
source information. Review and select Finish.
Figure 3-27 JDBC data source creation summary
37.Select Next.
38.Select the High Availability policy as shown in Figure 3-29 on page 117.
Figure 3-29 Bus member policy
39.Select Next.
40.Select Data Store and then Next.
41.Select SCCDMIF.000-intjmsbus.
42.Type jdbc/MAXDB75 as the data source JNDI name, select the maximo alias as
authentication alias and then select Next as shown in Figure 3-30 on
page 118.
43.Select Next.
44.Select Next.
45.A summary will be displayed with the bus member information. Review and
select Finish.
46.Save and synchronize changes.
47.Navigate to Service integration Buses intjmsbus Destinations.
48.Select New.
49.Select Queue and select Next.
50.Type cqinbd as Identifier and select Next.
51.Select Cluster=SCCDMIF as bus member and select Next.
52.A summary will be displayed with the destination information. Review and
select Finish.
53.Repeat steps 48 through 52 for destinations cqinerrbd, sqinbd and sqoutbd.
54.Save and synchronize changes.
55.Navigate to Resources JMS Queues.
56.Select Cell scope and then New.
57.Select Default messaging provider and then OK.
58.Type cqin as the name and jms/maximo/int/queues/cqin as the JNDI name.
59.Select intjmsbus as bus name, cqinbd as queue name, and then select OK as
shown in Figure 3-31.
60.Repeat steps 56 on page 119 through 59 for queues cqinerr, sqin and sqout.
61.Save and synchronize changes.
Figure 3-33 Queue connection factory creation
Figure 3-35 intjmsacterr activation specification creation
WebSphere MQ configuration
The WebSphere MQ queue manager must be a highly available component to
avoid a single point of failure.
User mqm and group mqm have the same uid and gid, respectively, on both
servers.
The MQ_QM_PATH is shared and mounted on the same path on both servers.
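For reference, a multi-instance queue manager is typically created with its data and log directories on the shared path. A minimal sketch, assuming qmgrs and logs subdirectories under MQ_QM_PATH (the directory names are illustrative), would be:

crtmqm -md MQ_QM_PATH/qmgrs -ld MQ_QM_PATH/logs SCCDMIF

The matching definition for the second server is obtained later from the dspmqinf -o command output and is typically added there with addmqinf, as described in the steps that follow.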
3. Start the SCCDMIF queue manager using the strmqm command with the -x
parameter to allow a standby queue manager, as shown in Example 3-73.
Log replay for queue manager 'SCCDMIF' complete.
Transaction manager state recovered for queue manager 'SCCDMIF'.
WebSphere MQ queue manager 'SCCDMIF' started.
4. Check the status of the queue manager using the dspmq command with the -x
-o all parameters to show details and multi-instance status as shown in
Example 3-74.
5. Create a file named mqsc_sccdmif.in that will be used to define the queue
manager listener and queues as shown in Example 3-75.
START LISTENER(LISTENER.TCP)
START CHANNEL(SYSTEM.ADMIN.SVRCONN)
Reminder: The MAXDEPTH values used are an example; these values can
be changed to fit environment characteristics.
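Example 3-75 is only partially shown here. As an illustration, a file of this kind typically defines the listener and the four integration queues; the queue names, port, and MAXDEPTH values below are assumptions for this sketch and must be adjusted to your environment:

DEFINE LISTENER(LISTENER.TCP) TRPTYPE(TCP) PORT(1414) CONTROL(QMGR) REPLACE
DEFINE QLOCAL(CQIN) MAXDEPTH(100000) REPLACE
DEFINE QLOCAL(CQINERR) MAXDEPTH(100000) REPLACE
DEFINE QLOCAL(SQIN) MAXDEPTH(100000) REPLACE
DEFINE QLOCAL(SQOUT) MAXDEPTH(100000) REPLACE
START LISTENER(LISTENER.TCP)
START CHANNEL(SYSTEM.ADMIN.SVRCONN)

Such a file is applied to the queue manager with runmqsc SCCDMIF < mqsc_sccdmif.in.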
7. Get the queue manager definition using the dspmqinf command with the -o
command argument as shown in Example 3-77.
10.Start the SCCDMIF queue manager using the strmqm command with the -x
parameter to start as a standby queue manager, as shown in Example 3-79.
11.Check the queue manager status again on any of the servers. mqhost2 will
now show as standby, as shown in Example 3-80.
Important: The native library path varies for each operating system. For
specific information, refer to:
http://publib.boulder.ibm.com/infocenter/wmqv7/v7r0/index.jsp?t
opic=%2Fcom.ibm.mq.csqzaw.doc%2Fja10340_.htm
3. Select OK.
4. Save and synchronize changes.
10.Select OK.
11.Repeat steps 6 on page 128 through 10 for the queues cqinerr, sqin, and
sqout (Figure 3-37).
12.Save and synchronize changes.
38.Navigate to Resources JMS Activation specifications.
39.Select Cell Scope and then New.
40.Select WebSphere MQ messaging provider and select OK.
41.Type intjmsact as name and JNDI name.
42.Select Next.
43.Type jms/maximo/int/queues/cqin as destination JNDI name and select
Next.
44.Select Enter all the required information into this wizard and select
Next.
45.Type SCCDMIF as queue manager.
46.Select Client as transport.
47.Type mqhost1 as hostname, 1414 as port and SYSTEM.DEF.SVRCONN as server
connection channel.
48.Select Next.
49.Select Test connection.
50.If the connection test result is not successful, try replacing mqhost1 with its IP
address. This value will be overridden later by the connectionNameList custom
property.
51.Select Next.
52.A summary will be displayed; review and select Finish.
53.Select the recently created connection factory intconfact.
54.Select Custom properties.
55.Select New.
56.Type connectionNameList as name and mqhost1(1414),mqhost2(1414) as
value.
57.Select OK.
58.Repeat steps 38 on page 131 through 57 for activation specification
intjmsacterr using jms/maximo/int/queues/cqinerr as destination JNDI
name.
59.Save and synchronize changes.
Other scenarios: Network failure and hardware failure testing should also be
considered but the results and symptoms are very similar to the overall system
failure testing. To simulate network failure, the Ethernet or fiber cable can be
disconnected. Hardware failures such as storage failure can be simulated by
carefully disconnecting them from the system, if possible. Although these
scenarios are not specifically covered, they should be considered when
testing.
System failure
By powering off the active node you can simulate an entire system failing.
System Automation for Multiplatforms can detect that the IBM HTTP Server
active system is offline and quickly restore services by bringing the passive node
online.
1. Run the lssam command on one of the nodes to check which node is active.
2. Open a terminal or ssh connection to the passive node to monitor the status
when the active system fails.
3. Power off the active system.
4. On the passive system, run the lssam command; it should look similar to the
output in Figure 3-39 on page 133. This shows how the first node is detected
as offline by System Automation for Multiplatforms and the services are being
restored on the second node.
Process failure
When the IBM HTTP Server process terminates unexpectedly, System
Automation for Multiplatforms will detect this and should attempt to bring the
process back online on the current node. This failover sequence is often very fast
and symptoms in the user interface are minimal.
1. Run the lssam command on one of the nodes to check which node is active.
2. Open a terminal or ssh connection on the active node to monitor the status
when the process fails.
3. Determine the process ID of the HTTP server. Note that there may be multiple
process IDs. On Linux, running ps -ef | grep httpd should show the
process IDs.
4. Kill the active processes for httpd to simulate process failure. Running kill
-9 and listing all process IDs would work, or using a command such as:
for pid in $(ps -ef |grep -v grep |grep httpd |awk '{print $2}'); do
kill -9 $pid; done
5. Run the lssam command to view the status of the cluster. System Automation
for Multiplatforms output should show that the HTTP server process is
pending online on the same node. Figure 3-40 on page 134 shows the lssam
output before and after killing the HTTP server processes.
6. The failover sequence should complete and the IBM HTTP server should
come back online on the same node. This procedure happens very fast, often
without users noticing it.
Graceful failover
Sometimes it may be desirable to change the active node from one system to
another. If the current active node requires maintenance or a reboot for example.
Forcing a graceful failover can push the active node to the second system and
users can continue to browse the application with minimal interruption.
1. Run the lssam command on one of the nodes to determine which is active.
2. Run the rgreq -o move ihs-rg command on either node to force a graceful
failover.
3. Run lssam again to see that the resources (including the service IP) are
moving to the second node.
4. Services should be restored quickly on the second node.
Symptoms of failover
Although the failover times will differ from one failure type to another, the
symptoms in the user interface should be similar for all.
1. From a web browser, connect to the IBM SmartCloud Control Desk through
the service IP or hostname.
2. Simulate a failure and try to retrieve records or interact with applications while
the cluster manager is performing the failover sequence. When the “User
Interface property setting” on page 104 is set, users in the UI should receive a
pop-up dialog explaining that communication with the server has been lost.
When communication is restored, they should be able to continue using their
session. Figure 3-41 shows an example of the dialog the user will receive.
When the second node is online and connection is re-established, another
dialog indicating the connection has been restored will show. The user can
now continue using the application with minimal interruption.
System failure
Simulating an entire system failure will prove that the deployment manager can
successfully failover to the second node and resume administrative operations
through the same service IP.
1. Run the lssam command on one of the nodes to determine which is the active
deployment manager node.
2. Open a terminal or ssh connection to the passive node to monitor the status
when the active system fails.
Process failure
Simulating an unexpected crash of the deployment manager process on the
active node will show how System Automation for Multiplatforms reacts. The
process should be quickly brought back online on the same node.
1. Run the lssam command on one of the deployment manager nodes to
determine which node is active.
2. Connect to a terminal or ssh session on the active node to monitor the status
when the process fails.
3. Determine the process ID of the deployment manager. On Linux running ps
-ef | grep dmgr should show the process ID.
4. Kill the dmgr process by running kill -9 on the process ID.
5. Running lssam should show that the process is pending online and will be
brought online on the same node. Figure 3-43 on page 137 shows an
example lssam output for a process failure.
Figure 3-43 lssam output when the dmgr process is killed
Graceful takeover
If you need to move the active node to the second machine, you can perform a
graceful takeover.
1. Run the lssam command to determine which is the active deployment
manager node.
2. Run the rgreq -o move dmgr-rg command to force the deployment manager
to move to the second node.
3. Running lssam again should show that the deployment manager has moved
to the second node.
Symptoms of failover
The main symptom of failover for the deployment manager is a loss of connection
to the deployment manager console. Administrators who may be logged into the
console at the time of failover will receive an error in the web browser indicating
that the page cannot be displayed. When the failover sequence completes,
administrators should be able to log back into the deployment manager and
resume operations.
There may be a brief moment when the application server status does not show
properly in the deployment manager console. Synchronizing the nodes should
correct this problem.
System failure
If the system fails, System Automation for Multiplatforms will show the nodeagent
in a failed offline status. When the system comes back online, System
Automation for Multiplatforms attempts to restart the nodeagent process.
Figure 3-44 shows an example of lssam output when the nodeagent system fails.
The remaining nodeagent stays online unless it fails as well.
Process failure
If the nodeagent process on either server terminates unexpectedly, System
Automation for Multiplatforms will attempt to restart the process as soon as it
detects the failure. The failed nodeagent shows a pending online status until the
nodeagent process is back online.
System failure
When the WebSphere Application Server system fails, the corresponding
applications will no longer show online on the Deployment Manager console. The
corresponding nodeagent on the same machine will also fail with system failure,
so when the system comes back online the nodeagent may restart automatically,
depending on the policy you have configured. It is possible to have the
application server JVMs start with the nodeagent by configuring the monitoring
policy on the application server (Figure 3-45 on page 139). To configure the
application servers to start with the nodeagents, you can follow this procedure:
1. In the deployment manager console, log in and go to Servers Server
Types WebSphere application servers SERVER_NAME.
2. In the Server Infrastructure section, click Java and Process Management
Monitoring policy.
3. In the General Properties section, change the Node restart state to
RUNNING.
Symptoms of failover
When a user session is active on a WebSphere application server, that user
session is tied to that application server. If the application server fails or
restarts (Figure 3-46), the user session is terminated and the user is directed
to another application server. IBM SmartCloud Control Desk does not support
session failover, so the user has to log back in to the application.
Figure 3-46 Shows an error the user may receive when an application server fails
2. The messaging engine will be displayed. Its name (Figure 3-47) will be used
to query its current application server process through the wsadmin tool. For
this example, the value used is SCCDMIF.000-intjmsbus.
3. Create a script file as shown in Example 3-81. If needed, modify the meName
variable with the current messaging engine name. For this example, the script
file is WAS_DMGR_PATH/bin/sibMEProcess.py.
4. Run the script with the wsadmin tool; the output will show the application
server running the messaging engine, as shown in Example 3-82.
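Example 3-81 and Example 3-82 are not reproduced here. A minimal sketch of such a script, assuming the messaging engine name SCCDMIF.000-intjmsbus and a query of the live SIBMessagingEngine MBean (the original script may be written differently), is:

# sibMEProcess.py - print the MBean of the running messaging engine;
# the process= attribute of the returned name identifies the hosting application server
meName = "SCCDMIF.000-intjmsbus"
print AdminControl.queryNames("type=SIBMessagingEngine,name=" + meName + ",*")

It can be run with WAS_DMGR_PATH/bin/wsadmin.sh -lang jython -f WAS_DMGR_PATH/bin/sibMEProcess.py.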
System failure
When the WebSphere Application Server system fails, the corresponding
application servers will no longer show as online in the Integrated Solutions
Console. If the messaging engine is running in one of the affected application
servers, it will need to fail over to another application server.
Process failure
By default, the WebSphere Application Server Network Deployment HAManager
will attempt to restart any application servers that have crashed or terminated
unexpectedly. When a process fails, WebSphere should restart the application
server automatically. During its restart, the messaging engine should failover to
another application server in the cluster and provide access to JMS queues
normally.
Symptoms of failover
Due to database locking mechanisms, when an unexpected process termination
occurs, the connections holding the locks are not released. The solution for this
situation is to tune the TCP keepalive parameters of the DB2 server operating
system. During startup, the messaging engine tries to obtain the lock on its
datastore tables for 15 minutes. If it cannot, the messaging engine is disabled.
Make sure your TCP keepalive parameters are set to release these idle
connections in less than 15 minutes.
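As an illustration, on a Linux DB2 server the relevant kernel parameters could be set as follows; the values are examples only (they drop a dead connection after roughly four minutes) and other operating systems use different parameter names:

sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=4

To make the settings persistent across reboots, add them to /etc/sysctl.conf.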
Tip: For more information about how the WebSphere Application Server
messaging engine works, refer to:
http://www-01.ibm.com/support/docview.wss?uid=swg27020333
If the application server that is running the JMS cron tasks is affected by a
system outage, the cron task will failover to another node in the cron task cluster.
This failover takes approximately 5 minutes to complete, so messages will not be
consumed until the cron task failover is complete.
System failure
This scenario was tested by powering down the primary database server. The
entire workload was transferred to the secondary database server by the cluster
manager. The following steps were executed:
1. Run the lssam command as root from either the primary or the secondary
database server. Figure 3-48 shows the output of the lssam command in the
normal operating environment.
2. Log on to the IBM SmartCloud Control Desk application and navigate to one
of the application panels.
4. The IBM SmartCloud Control Desk session hangs for a short interval while
the cluster manager transfers the workload to the secondary server. In one of
the tests, the IBM SmartCloud Control Desk session was lost. In that case,
log back in to the application and resume work. Any transactions that were not
committed are lost or rolled back.
5. All the resources are now transferred to the secondary server. When the
primary server comes back up, the old primary server will be added back to
the cluster manager and monitored.
Process failure
This scenario was tested by simulating the DB2 server process failure. The
database server instance was shut down while the application was connected.
The cluster manager detected that the DB2 server process was down and
restarted the process. The following steps were executed:
1. Run the lssam command as root from either the primary or the secondary
database server. The output should indicate normal operation.
2. Log on to the IBM SmartCloud Control Desk application and navigate to one
of the application screens.
3. Issue the db2_kill command to abruptly end all the DB2 server processes.
Run lssam as root user to list the status of the cluster. Figure 3-50 on
page 145 shows the lssam output during the DB2 server process failure.
Figure 3-50 lssam output during the DB2 server process failure
Symptoms of failover
When a database failover occurs, the IBM SmartCloud Control Desk application
will appear to hang until the database failover sequence is complete. When
service is restored, the user interface may show a brief database error: The
database connection failed and the record was not retrieved. Try the
operation again. If you experience repeated failures, check the log
files in the home directory or contact your system administrator.
Sometimes the user may receive a blank panel when using the application during
failover. Refreshing the browser page often corrects this problem. If the browser
session cannot be recovered, the user may need to navigate back to the login
page and re-authenticate.
System failure
This scenario can be tested by powering down the primary database server. The
entire workload should be transferred to the secondary database server by the
cluster manager.
1. Run lssam as root from either the primary or the secondary database server.
Figure 3-51 shows the output of the lssam command in the normal operating
environment.
2. Log on to the IBM SmartCloud Control Desk application and navigate to one of
the application panels.
3. Shut down the primary database server. Run lssam on the secondary server
to see the behavior of the system. Figure 3-52 on page 147 shows the lssam
command output in case of a server failure.
Figure 3-52 lssam output in case of a server failure
4. All the resources are now transferred to the secondary server. When the
primary server comes back up, the old primary server will be added back to
the cluster manager and monitored.
Process failure
This scenario simulates the DB2 server process failure. The cluster manager
detects that the DB2 server process is down and restarts the process.
1. Run lssam as root from either the primary or the secondary database server.
The output should indicate normal operation.
2. Log on to the IBM SmartCloud Control Desk application and navigate to one
of the application panels.
3. Issue the db2_kill command to abruptly end all the DB2 server processes.
Run lssam as root user to list the status of the cluster. Figure 3-53 displays
the lssam output in case of DB2 process failure in the shared disk setup.
Figure 3-53 lssam output during DB2 process failure in the shared disk setup
Graceful failover
This scenario can be tested by manually transferring the resources to the
secondary server. In case of a planned change the application resources can be
transferred to the secondary server while the primary server undergoes any
maintenance change.
1. Run lssam as root from either the primary or the secondary database server.
The output should indicate normal operation.
2. Log on to the IBM SmartCloud Control Desk application and navigate to one
of the application panels.
3. Issue the rgreq -o move db2_db2inst1_db2inst1_0-rg command to move
the resources over to the secondary server.
4. All the DB2 resources are transferred to the secondary node. The DB2
application or the server can now be taken down for maintenance or changes.
Symptoms of failover
When a database failover occurs, the IBM SmartCloud Control Desk application
will appear to hang until the database failover sequence is complete. When
service is restored, the user interface may show a brief database error: The
database connection failed and the record was not retrieved. Try the
operation again. If you experience repeated failures, check the log
files in the home directory or contact your system administrator.
Sometimes you may receive a blank panel when using the application during
failover. Refreshing the browser page often corrects this problem. If the browser
session cannot be recovered, you may need to navigate back to the login page
and re-authenticate.
3.10 Conclusion
This chapter gave an overview and configuration examples for local high
availability. It described how to eliminate single points of failure in an IBM
SmartCloud Control Desk environment.
Figure 4-1 Active-passive disaster recovery topology
When selecting a second site for the disaster recovery data center, RTO and
RPO are major considerations. If there were an environmental disaster, you
would want to make sure your second site was far enough away that it would not
be affected. The distance, however, will affect the synchronization state of the
sites and could impact the RPO times. Two sites that are very close together
could potentially maintain a near synchronous state with a low RPO (assuming a
very fast WAN link). This can also be a dangerous situation because it creates a
higher probability that both sites will be impacted by a disaster. Spreading the
duration of the maintenance. This also gives administrators a chance to
ensure the disaster recovery plan still works as designed.
– Possibility of a disaster
Many disaster situations happen unexpectedly. Massive power outages,
earthquakes, and hardware failures are examples of disasters that can
happen without any warning. For other types, such as hurricanes and other
weather-related disasters, you often have advance warning. When there is the
possibility of these types of disasters, it may be a good idea to fail over to
another site that has less of a chance of being affected. For instance, if a
hurricane is heading your way, and you have a site far enough away,
consider switching services to that site.
Disaster
This is the main reason for the disaster recovery topology. Unpredictable
events can affect your primary site and completely bring down the service.
Weather, human error, hardware failures, and malicious users are some of the
many types of problems that can bring down a site. When a disaster occurs, it
is time to execute the disaster recovery plan. Restoring services as quickly as
possible on the backup site minimizes the impact on operations.
4.3 Prerequisites
There are many prerequisites to implementing a disaster recovery topology.
Some of these topics are covered in this IBM Redbooks publication and others
are assumptions. Research which solutions best fit your organizational needs.
Local high availability
Most disaster recovery topologies are supplemental to a local high availability
solution. Often, process and hardware failures can be corrected quickly
without the need for failover to a second site. Completely switching sites has
an impact on users that cannot be masked in the way a local high availability
solution can mask it. Refer to Chapter 3, "Local high availability topology" on
page 25 for more information.
Load balancer
A load balancer can be used in front of the web servers to provide balancing
across several web servers and also provide a transparent access point for
users when a site failover occurs. There are hardware and software solutions
for load balancers but care should be taken to ensure this is not a single point
of failure. Having a load balancer that can detect when one site is offline can
help ease the transition in a disaster scenario. Most load balancing solutions
have high availability and disaster recovery options.
Appropriate licenses
When implementing an additional site into your environment, it is important to
check your license agreements to make sure this is covered. Check with your
IBM sales representative to review the license agreements for your
organization. Additional licensing may be required for the new site.
Networking
The network link between the two sites is critical for synchronization of the
application and data. Network administrators should be involved in the
planning process and the network may need to be upgraded when connecting
a second site. Redundant network links could help avoid synchronization loss
when a single link fails.
Other networking considerations such as DNS and routing can help reduce
the recovery time during a failover. For instance, if both sites have similar
networking and IP addresses can be rerouted to the second site, this can help
eliminate the need for reconfiguration when failover occurs. Full application
replication using technologies such as SAN Global Mirroring will synchronize
the hostnames and IP addresses as well as the application configuration. The
ability to manipulate the networking and hostname resolution without having
to reconfigure the whole application can speed up the process.
Storage replication
When implementing a second site, there are files that are stored on the
primary site and should be replicated to the passive site. Attached documents,
global search indexes, and integration framework files are examples of such
files. If the primary site fails, these files are required on the standby site to
continue with full application functionality. An example storage solution is
disk-based mirroring, as described in 4.4, "Storage replication" on page 156.
Some organizations choose to replicate the storage for the entire IBM
SmartCloud Control Desk environment from the active site to the passive site
instead of using the middleware replication mechanisms. This includes:
– Web server installation and configuration files
– Application server installation, configuration files, and profiles
– Database installation and database files
– Tivoli Process Automation Engine application installation files
– Any other middleware files used in the topology
This would allow for the second site to be an exact duplicate of the primary. It
is important that a robust storage mirroring solution be in place for this to work
effectively. Asynchronous versus synchronous mirroring can affect the RPO of
the site failover and should be considered. The distance between sites can
affect the synchronization ability of the storage solution.
A second site
An obvious prerequisite to a disaster recovery plan is a second site that can
take over from the primary when a disaster occurs. Careful consideration
should be given to selecting a location. Sites that are too close could both be
affected by a disaster, but sites that are too far apart will not be able to
synchronize as quickly and could potentially incur data loss.
Administrators with necessary skills
When implementing disaster recovery technologies such as database and file
system replication, the complexity of the IBM SmartCloud Control Desk
topology increases. Administrators who are familiar with these technologies
and possess the skills required to configure, maintain, and test the
infrastructure are critical. Lack of coordination among the team can lead to a
failed disaster recovery.
In a typical disaster recovery topology using mirroring, both sites are
equipped with exactly the same hardware configuration. The data is replicated or
mirrored using replication techniques such as FlashCopy, synchronous mirroring,
or asynchronous mirroring. By using these mirroring techniques, a storage disk
cluster is set up so that an update performed on the primary site is mirrored on
the secondary site. The storage volumes can be in remote locations with a high
speed WAN network link.
There are two modes for disk mirroring. Depending on your application's needs,
the distance, and your tolerance for data loss, one of the two modes can be selected.
Synchronous mirroring
In synchronous mirroring mode, an application write is committed on
the secondary site before the next write operation is permitted. This can affect
performance over the WAN. The distance between the two sites can impact
the performance of the writes.
Asynchronous mirroring
In asynchronous mirroring mode, application writes can be configured to
be written to the secondary site at a predefined interval. In this mode, the
writes are queued before they are written to the secondary site. This
provides better performance over the WAN. There is a potential for data loss if
the primary site or its disks go offline before the writes have completed on the
secondary site.
A mirror relationship is established between the two storage sites. The primary
role of this relationship allows for read/write access to the disk drives on the
primary site. The secondary role of the mirror relationship prohibits write access
to the secondary drive from any other host except the owning controllers. This
ensures that the data is in a synchronized state between the two sites. The data
is initially copied from the primary site to the secondary site. This process
requires an outage window. After the full synchronization is completed, the
updates are logged on the mirrored drive and replicated to the secondary site. If
network communication between the two sites is interrupted, the mirror is
suspended until communication is restored and all the updates are
transferred to the secondary site.
SAN: For more information about SAN mirroring features, refer to the following
IBM Redbooks publications:
IBM System Storage DS8000 Copy Services for Open Systems,
SG24-6788
IBM XIV Storage System: Copy Services and Migration, SG24-7759
IBM System Storage DS Storage Manager Copy Services Guide,
SG24-7822
SAN Volume Controller and Storwize V7000 Replication Family Services,
SG24-7574
IBM HTTP Server can be used for other IBM SmartCloud Control Desk
capabilities, such as serving attachments for the attached documents
functionality within many of the applications. These attachments are stored on a
file system and should be replicated to the second site if they are considered
critical. If you are using file-based replication for the HTTP server configuration,
these attached documents should be stored on the same paths on the backup
site so configuration changes on the IBM HTTP Server are not required. Even
when manually configuring the IBM HTTP Server on the second site, it is a good
practice to replicate the attached documents to the same location to keep the
configuration consistent.
It is important to note that IBM SmartCloud Control Desk does not support
session failover. If an application server fails and users are rerouted to another
application server JVM, they will need to log back in to a new session. Load
balancers must also be configured for sticky sessions. This means, when the
load balancer attaches a user to a specific server, this user must remain attached
to the same server throughout the duration of the session. This is referred to as
session affinity.
Important: If the SIB is being used for integration, the backup and restore
configuration method must be used to reflect the same UUIDs for messaging
engines and destinations in both environments.
For this IBM Redbooks publication, we assume the variables shown in
Table 4-1. These values are not mandatory for all installations and might vary
in other environments.
Before starting, make sure all assumptions and prerequisites from “Installing
WebSphere Application Server” on page 43 are being met.
1. Log in as the WebSphere Application Server installation user on
primary_washost1.
2. Back up the current deployment manager and node profiles using the
manageprofiles command shown in Example 4-1.
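Example 4-1 is not reproduced here. A minimal sketch of the backup, assuming the profile names Dmgr01 and AppSrv01 and a /tmp backup location (all illustrative), would be the following; the profiles should be stopped before they are backed up:

WAS_HOME/bin/manageprofiles.sh -backupProfile -profileName Dmgr01 -backupFile /tmp/Dmgr01.zip
WAS_HOME/bin/manageprofiles.sh -backupProfile -profileName AppSrv01 -backupFile /tmp/AppSrv01.zip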
Example 4-5 Hostname updates
ti-2021-2:WAS_HOME/bin # ./wsadmin.sh -lang jython -conntype NONE
WASX7357I: By request, this scripting client is not connected to any server process.
Certain configuration and application operations will be available in local mode.
WASX7031I: For help, enter: "print Help.help()"
wsadmin>AdminTask.changeHostName('[-nodeName primary_washost1CellManager01 -hostName
secondary_washost1 ]')
''
wsadmin>AdminTask.changeHostName('[-nodeName primary_washost1Node01 -hostName
secondary_washost1 ]')
''
wsadmin>AdminTask.changeHostName('[-nodeName primary_washost2Node01 -hostName
secondary_washost2 ]')
''
wsadmin>AdminTask.changeHostName('[-nodeName primary_ihshost1-node -hostName
secondary_ihshost1 ]')
''
wsadmin>AdminTask.changeHostName('[-nodeName primary_ihshost2-node -hostName
secondary_ihshost2 ]')
''
wsadmin>AdminConfig.save()
''
wsadmin>exit
Storage mirroring
This section describes an optional topology that utilizes a disk replication system
for IBM WebSphere Application Server. There are three types of data that are
important to be captured and replicated to the secondary site:
Installation data associated with the WebSphere product.
Configuration data associated with the application and the resources needed
to run them.
Run data associated with the specific instance of process and business data.
The rationale for dividing the data into these consistency groups is that the
actions that make the data inconsistent are different for each group:
The install data for each of the servers is included in the same consistency
group.
The configuration data for each of the profiles in the cell is included in the
same consistency group.
The run data for each of the profiles in the environment is included in the
same consistency group.
It is important to ensure that the write order is preserved on both sites to maintain
the consistency of the data. For more information about the disaster recovery
setup of the WebSphere Application Server using disk mirroring refer to:
http://www.ibm.com/developerworks/websphere/library/techarticles/080
9_redlin/0809_redlin.html
This section describes the necessary configuration for each of these JMS
providers.
Figure 4-2 Active-passive SIB configuration
To configure the SIB, follow the steps outlined in “Service integration bus
configuration” on page 108 using secondary site hosts and IP addresses. On
step 24 on page 113, set the secondary site database hostname/IP address.
The datastore used by SIB will be replicated to the passive site using database
replication techniques as described in 4.8.1, “Database recovery techniques” on
page 167.
4.8 Database
Disaster recovery for an enterprise application means that all critical business
operations can be recovered in case of any disaster or site-wide outage. Some
organizations have little or no tolerance for data loss, in which case the disaster
recovery solution needs to be able to restore data to the applications rapidly.
The solution must ensure the consistency of the data, allowing the systems and
applications to be restored reliably and quickly.
The database is one of the most critical components of the IBM SmartCloud Control Desk application, which supports IBM DB2, Oracle, or Microsoft SQL Server for the deployment. The middleware installer program provides the option of installing a new instance of DB2 without high availability, or
using a preexisting instance of the DB2 database. If you choose Oracle or
Microsoft SQL Server, then you must install and configure them manually.
Bringing down the database will disrupt the IBM SmartCloud Control Desk
function. There are various disaster recovery database configurations available.
This chapter describes the passive disaster recovery setup for IBM SmartCloud
Control Desk using DB2 and Oracle databases. We describe some of the
replication features of DB2 and Oracle, and how to maintain database consistency using disk mirroring techniques.
It is important to understand that with DB2 v9.7.x there can only be one HADR
standby database. For this reason, you cannot configure DB2 HADR for local
high availability as well as HADR across sites. For local high availability, the DB2 shared disk setup can be used on the primary site, while HADR provides the cross-site disaster recovery, as shown in Figure 4-4.
Figure 4-4 DB2 disaster recovery setup across two remote locations
If local high availability on the disaster recovery site is not required, a single node
can be used instead of a shared disk cluster on both sites. Figure 4-5 shows an
example of a single node disaster recovery standby.
The installation path and other variables are listed in Table 4-2.
The database on the primary site can be a standalone database with HADR
providing data synchronization on the secondary passive site. The following
topology describes a DB2 shared disk high availability setup on the primary site
with HADR providing data synchronization on the secondary passive site. For
more information about the DB2 shared disk high availability setup, refer to “DB2
shared disk high availability setup” on page 79.
To set up HADR for DB2 in the disaster recovery setup using the command line
interface, complete the following steps:
1. Set the required database configuration parameters on the primary database
server.
If archive logging is not configured already, then update the LOGRETAIN and
LOGARCHMETH1 parameters by running the following commands:
db2 update db cfg for DB2_DBNAME using LOGRETAIN recovery
db2 update db cfg for DB2_DBNAME using LOGARCHMETH1 LOGRETAIN
Set the LOGINDEXBUILD parameter so that index creation and reorganization operations are logged by running the following command:
db2 update db cfg for DB2_DBNAME using LOGINDEXBUILD ON
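As a hedged verification sketch, the updated logging parameters can be read back before continuing:
# Confirm the archive logging and index logging settings on the primary database
db2 get db cfg for DB2_DBNAME | grep -E "LOGRETAIN|LOGARCHMETH1|LOGINDEXBUILD"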
2. Back up the database on the primary node by running the following
command. The database backup should be an offline backup, which means
no user connections are allowed on the database.
db2 backup database DB2_DBNAME to BACKUP_PATH
Tip: Check the database rollforward pending status by issuing the db2 get db cfg for DB2_DBNAME | grep "Rollforward pending" command.
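The backup image is then typically transferred to and restored on the standby server so that the standby database exists before its HADR parameters are set. A hedged sketch, assuming the image has already been copied to BACKUP_PATH on the standby host:
# On the standby database server; the database is left in rollforward pending
# state, which is required before it can be started as an HADR standby
db2 restore database DB2_DBNAME from BACKUP_PATH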
Example 4-6 DB2 HADR setup in disaster recovery mode on the primary database server
db2 "update db cfg for DB2_DBNAME using HADR_LOCAL_HOST db2_sd_svcip"
db2 "update db cfg for DB2_DBNAME using HADR_REMOTE_HOST db2drhost"
db2 "update db cfg for DB2_DBNAME using HADR_LOCAL_SVC db2hadrlocalsvc"
db2 "update db cfg for DB2_DBNAME using HADR_REMOTE_SVC db2hadrremotesvc"
db2 "update db cfg for DB2_DBNAME using HADR_REMOTE_INST DB2_INSTANCE"
db2 "update db cfg for DB2_DBNAME using HADR_TIMEOUT 120"
db2 "update db cfg for DB2_DBNAME using HADR_SYNCMODE SYNC"
db2 "update db cfg for DB2_DBNAME using HADR_PEER_WINDOW 120"
6. Run the db2 "get db cfg for DB2_DBNAME" | grep HADR command. Example 4-7 lists the db cfg parameter configuration for HADR on the primary database.
Example 4-8 DB2 HADR setup in disaster recovery mode on secondary database server
db2 "update db cfg for DB2_DBNAME using HADR_LOCAL_HOST db2drhost"
db2 "update db cfg for DB2_DBNAME using HADR_REMOTE_HOST db2_sd_svcip"
db2 "update db cfg for DB2_DBNAME using HADR_LOCAL_SVC db2hadrremotesvc"
db2 "update db cfg for DB2_DBNAME using HADR_REMOTE_SVC db2hadrlocalsvc"
db2 "update db cfg for DB2_DBNAME using HADR_REMOTE_INST DB2_INSTANCE"
db2 "update db cfg for DB2_DBNAME using HADR_TIMEOUT 120"
db2 "update db cfg for DB2_DBNAME using HADR_SYNCMODE SYNC"
db2 "update db cfg for DB2_DBNAME using HADR_PEER_WINDOW 120"
8. Run the db2 "get db cfg for DB2_DBNAME" | grep HADR command. Example 4-9 lists the db cfg parameters for HADR on the secondary database server.
9. Starting with DB2 version 9.7 Fix Pack 5, the secondary database can be configured with read-only access, which allows the application and users to run queries and reports against the database. This step is optional; execute it only if read-only access to the secondary database is required for reporting purposes.
Example 4-10 shows the commands to set the read-only access on the
secondary database.
Example 4-10 Commands to set the read-only access on the secondary database.
db2set -i DB2_INSTANCE DB2_STANDBY_ISO=UR
db2set -i DB2_INSTANCE DB2_HADR_ROS=ON
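These registry variables typically take effect only after the standby instance is recycled. A hedged sketch of the recycle sequence on the standby server:
# Recycle the standby instance so that DB2_STANDBY_ISO and DB2_HADR_ROS take effect
db2 deactivate database DB2_DBNAME
db2stop
db2start
db2 start hadr on database DB2_DBNAME as standby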
HADR Information:
Role State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
Primary Peer Sync 0 4147958
PeerWindowEnd PeerWindow
Thu Nov 8 12:10:25 2012 (1352394625) 120
LocalHost LocalService
9.12.4.167 55001
Storage mirroring
This section describes a topology that uses a disk replication system for the DB2 database server. Three types of data must be captured and replicated to the secondary site:
Installation data associated with the DB2 product
Configuration data associated with the application and the resources needed
to run them
Run data associated with the specific instance of process and business data
The data can be divided into three independent but related consistency groups or into a single consistency group. In some cases the consistency group should be expanded to include data from the WebSphere Application Server. The rationale for including the data for all the servers in a single consistency group is that all of the data is required to be consistent. In some cases, as the number of nodes grows, it may be important to limit the number of consistency groups.
The rationale for dividing the data into these consistency groups is that the
actions that make the data inconsistent are different for each group:
The install data for each of the servers is included in the same consistency
group. This includes the DB2 installation folder on all the servers.
The configuration data for each of the nodes is included in the same
consistency group. This includes the DB2 instance and associated home
directory, which is needed for the operation of the DB2 server.
The run data for each of the databases in the environment is included in the
same consistency group. This includes the DB2 tablespace devices,
database backups, and database logs.
It is important to ensure that the write order is preserved on both sites to maintain
the consistency of the data.
Active Data Guard can be used as a local high availability solution or can be
extended across sites for disaster recovery. Active Data Guard can also be
combined with Oracle Real Application Clusters (RAC) technology for
performance, high availability, and disaster recovery.
For this book the variables shown in Table 4-3 are assumed. These values are
not mandatory for all installations and might vary in other environments.
2. Modify the mxe.db.url property in maximo_UI_secondary.properties to use the secondary site database addresses as described in 4.9.2, “Database-related changes” on page 176.
3. Repeat steps 1 through 2 for the files maximo_MIF.properties and
maximo_CRON.properties.
export BUILD_DIR=./default
export EAR_FILENAME=sccdui_primary.ear
export MAXIMO_PROPERTIES=maximo.properties
6. Modify the properties file replace command and EAR file name for
buildmaximoear_UI_secondary.sh as shown in Example 4-14.
export BUILD_DIR=./default
export EAR_FILENAME=sccdui_secondary.ear
export MAXIMO_PROPERTIES=maximo.properties
8. Deploy primary EAR files on the primary site environment and secondary
EAR files on the secondary site environment, as outlined in “Ear file
deployment on WebSphere Application Server” on page 100.
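As a hedged sketch (the Maximo deployment directory path and the primary-site script name are assumptions), the site-specific EAR files are typically built by running the modified scripts before they are deployed:
# Build the site-specific UI EAR files from the Maximo deployment directory
cd /opt/IBM/SMP/maximo/deployment
./buildmaximoear_UI_primary.sh
./buildmaximoear_UI_secondary.sh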
If the primary and secondary sites’ WebSphere Application Server and Tivoli Process Automation Engine application files are constantly synchronized using storage replication, then it might not be possible to preconfigure the maximo.ear file with the second site’s database hostname or IP address. Another option is to use
only hostnames for the database in the maximo.properties file and edit the hosts
file on the second site’s WebSphere Application Server servers to eliminate the
need for application redeployment. If the second site can resolve the same
hostname to the IP address of the standby database, this can help to reduce
failover time and eliminate the need for application redeployment. It is important
to review and plan this with the network administrators for both sites because
network reconfiguration may not be possible. In this case, a redeployment of the
application is required.
Tip: In all cases, the secondary site’s application servers should not be
running until a failover is required.
When a site switch or failover occurs, the standby database becomes the primary
and the IBM SmartCloud Control Desk can be brought online. When developing
a disaster recovery plan, it is important to ensure that the standby database on
the passive site becomes the primary before attempting to start the application.
Tip: IBM System Automation Application Manager (AppMan) can help simplify the DB2 HADR failover to the secondary site, based on System Automation for Multiplatforms clusters on both sites. The AppMan site switch is triggered by an operator; AppMan then uses the defined policies to start DB2 and run the HADR takeover commands. Although this configuration is not included in this book, more information can be found at:
http://www-304.ibm.com/software/brandcatalog/ismlibrary/details?catalog.label=1TW10SA08#
d. The servers should stop and the status should update, showing offline;
see Figure 4-6.
3. Shut down the WebSphere Application Server nodeagents with the following
command for each nodeagent:
– With System Automation for Multiplatforms (for all nodeagent resource
groups):
chrg -o offline nodeagent-nodename-rg
– Without System Automation for Multiplatforms (for all nodes):
$WAS_HOME/profiles/profile_name/bin/stopNode.sh -user userid
-password password
4. Shut down the WebSphere Application Server Deployment Manager by
running:
– With SA MP:
chrg -o offline dmgr-rg
– Without SA MP:
$WAS_DMGR_PATH/bin/stopManager.sh -user userid -password password
5. If using WebSphere MQ, the queue manager should also be stopped. Run the following command on the active server:
endmqm -w SCCDMIF
HADR Information:
Role State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
Primary Peer Sync 0 0
PeerWindowEnd PeerWindow
Mon Nov 12 18:53:11 2012 (1352764391) 120
LocalHost LocalService
ti-2021-3 55002
StandByFile StandByPg StandByLSN
S0000083.LOG 3394 0x00000000C7082624
8. If using WebSphere MQ, start the active and standby servers by running:
strmqm -x SCCDMIF
9. Start the Deployment Manager by running:
– With SA MP:
chrg -o online dmgr-rg
– Without SA MP:
$WAS_DMGR_PATH/bin/startManager.sh
10.Start the nodeagents by running the following command for each nodeagent:
– With System Automation for Multiplatforms (for all nodeagent resource
groups):
chrg -o online nodeagent-nodename-rg
– Without System Automation for Multiplatforms (for all nodes):
$WAS_HOME/profiles/profile_name/bin/startNode.sh
11.Start the IBM HTTP Servers by running:
– With SA MP:
chrg -o online ihs-rg
– Without SA MP:
$IHS_ROOT/bin/apachectl start
12.Start the WebSphere Application Server clusters, which will start the
application server JVMs:
a. Log in to the Integrated Solutions Console for WebSphere Application
Server as the WebSphere administrative user.
b. Go to Servers → Clusters → WebSphere application server clusters.
c. Select all the clusters from the list and click Start or Ripplestart.
13.The servers should start and the status should update showing online; see
Figure 4-7 on page 182.
14.If using WebSphere Application Server SIB, verify that the Messaging Engine
comes back online on the second site:
a. Log in to the WebSphere Integrated Solutions Console.
b. Navigate to Service integration → Buses → intjmsbus → Messaging Engines.
c. Verify the Messaging Engine status is online; see Figure 4-8.
15.If using WebSphere MQ, run the following command to check the status of the
queue managers:
dspmq -x -o all
16.If using a load balancer, ensure that the load balancer redirects users to the
new site.
If the failover operation succeeds, it is a good idea to document the time and any
notes that may be important for future reference. After the failover you should
perform the same procedure again to fail back to the primary and make sure it
works in both directions.
Optional step: It is good practice to warn the users when there will be a site
switch to minimize the impact on operations. The Bulletin Board application in
IBM SmartCloud Control Desk is one way to notify users:
1. Log in to the IBM SmartCloud Control Desk application as an administrative
user.
2. On the Go To Applications menu, click Administration → Bulletin Board.
3. Click the New Message icon.
4. Fill out the message form (Figure 4-9) and select the appropriate dates. Try to
give users plenty of notice. You can leave the Organizations, Sites and Person
Groups empty to send to all users or specify an audience.
The message should now show on the Bulletin Board on users’ Start Centers
during the time period specified.
Database Partition 0 -- Database MAXDB75 -- Active -- Up 4 days 17:03:44 -- Date
11/13/2012 11:22:02
HADR Information:
Role State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
Primary Disconnected Sync 0 0
PeerWindowEnd PeerWindow
Null (0) 120
LocalHost LocalService
ti-2021-3 55002
4. If using WebSphere MQ, start the active and standby servers by running:
strmqm -x SCCDMIF
5. Start the Deployment Manager by running:
– With SA MP:
chrg -o online dmgr-rg
– Without SA MP:
$WAS_DMGR_PATH/bin/startManager.sh
6. Start the nodeagents by running the following command for each nodeagent:
– With System Automation for Multiplatforms (for all nodeagent resource
groups):
chrg -o online nodeagent-nodename-rg
– Without System Automation for Multiplatforms (for all nodes):
$WAS_HOME/profiles/profile_name/bin/startNode.sh
10.If using WebSphere Application Server SIB, verify that the Messaging Engine
comes back online on the second site:
a. Log in to the WebSphere Integrated Solutions Console.
b. Navigate to Service integration → Buses → intjmsbus → Messaging Engines.
c. Verify that the Messaging Engine status is online; see Figure 4-11.
Figure 4-11 Messaging engine status
11.If using WebSphere MQ, run the following command to check the status of the
queue managers (Example 4-18):
dspmq -x -o all
12.If using a load balancer, ensure that the load balancer redirects users to the
new site.
13.Log in and test the application.
If the failover operation succeeds, it is a good idea to document the time and
any notes that may be important for future reference.
When the original site becomes available again, you will need to reconnect
the HADR back to a peer state for synchronization.
14.On the primary DB2 site, start the DB2 instance by running db2start as the
instance administrator. If you are using System Automation for Multiplatforms
to manage DB2, then the DB2 services should come back automatically if the
nominal status is online.
15.As the DB2 instance administrator, enable HADR by running:
db2 deactivate database DB2_DBNAME
db2 start hadr on database DB2_DBNAME as standby
16.Run the following command and ensure that the State is back in peer. It may
take some time to go back to the peer state depending on how much data
needs to synchronize.
db2pd -db DB2_DBNAME -hadr
After the failover, perform the same procedure again to fail back to the primary
and make sure it works in both directions.
When the site switch is complete and services (cron tasks, integrations, web
services, and others) are restored, the load balancer should now send users to
the active site. Depending on the networking and load balancer configuration,
users should be able to access the same web address and continue using the
application.
4.12 Conclusion
Adding a second site as a backup to your IBM SmartCloud Control Desk
topology can allow administrators to restore services in case of a site failure or
disaster. The active-passive topology acts as an insurance policy when maintaining system availability is critical. By developing a disaster recovery plan, an organization can restore essential services and reduce downtime.
Important: Implementing DB2 Automatic Client Reroute (ACR) or hosts file
changes can speed up the recovery by eliminating the need for application
redeployment on the second site after failover.
A load balancer is required for the topology to distribute users across the sites.
This load balancer should itself be highly available so it does not act as a single
point of failure. Even though the application server load is distributed to both
sites, database load is all on one site.
There is also the possibility of processing background tasks other than user
sessions exclusively on the second site. For instance, if the cron tasks are
segregated from the user interface JVMs, the cron task cluster can be offline in
one site and online in the second site. Be careful to ensure that these processes
can be brought online on the opposite site in case of a site failure. If a disaster
were to occur, each site should be able to take over all processing from the other
and handle full capacity.
higher chance of losing messages during a failover. In a disaster scenario,
complexity may prove to be a serious issue. Complexity may have a negative
effect on the recovery time.
Licensing
License agreements should be reviewed with your IBM sales representative.
Some middleware components may have specific limited usage agreements
depending on the type of license your company has. Having two sites active,
even if only at the application server level, might increase licensing costs. This
should be reviewed during the planning phase.
Integration
Integrating with external systems using IBM SmartCloud Control Desk’s
integration framework and WebSphere SIB will become more difficult to
manage in a failure scenario. SIB is limited to the scope of a WebSphere cell,
and stretching a cell across multiple sites is not recommended. Therefore, if
two sites are using independent SIB messaging engines, there will potentially
be stuck or lost transactions if a site fails with messages in the queue.
WebSphere MQ can be used to help with messaging high availability and
guaranteed delivery.
If a hybrid-active topology is the right choice for your organization, this chapter
offers information and configuration examples for this setup.
Because IBM SmartCloud Control Desk does not support a fully active-active
configuration with independent databases in both sites, both sites will need to
point to the same database server. If Site B is pointing to the database on Site A
and A fails, reconfiguration of the application on B will take time. Rebuilding and
redeploying the application, restarting application servers and redirecting any
integrations can all increase the recovery time. Reconfiguring the networking, or
modifying the host files on the recovery site can help speed up this process.
Additionally, implementing DB2 ACR can help speed up the recovery by allowing
the application to reconnect to the database on the second site after failover.
If the site that is not hosting the active database goes down, its users can be routed to the remaining site by the load balancer. Essential services that were running exclusively on the failed site will need to be brought online on the remaining site.
Many disaster situations are not forecasted and happen unexpectedly.
Massive power outages, earthquakes and hardware failure are examples
of disasters that may happen without any warning. Other types, such as hurricanes and other weather-related disasters, often come with some warning. When there is the possibility of these types of disasters, it may be a good idea to divert all the processing to the site that is least likely to be affected.
Disaster
Unpredictable events can affect one of your sites and bring down some of the
essential services. Weather, human error, hardware failures, and malicious users are some of the many problems that can bring down a site. When a
disaster occurs, it is time to execute the disaster recovery plan. Restoring lost
services as quickly as possible on the remaining site will allow for a minimal
impact on operations.
When a site fails, any essential services from that site will need to be brought online on the remaining site. This procedure could differ depending on
which site fails. For instance, if Site A hosts the active database and Site B fails,
then the database role switch will not need to occur. If Site A were to fail, then the
standby database on Site B will need to become the new primary. Other services
such as integrations, cron tasks and reporting may need to be brought online or
redirected depending on which site fails. The procedure will vary depending on
how the workload is distributed across the sites.
Without a procedure in place, this failure recovery can take too long or fail
completely. A plan that prepares for potential loss of either site will be needed.
Disaster recovery plans should include:
Names and contact information of all parties involved with the failover
procedure. It is a good idea to have backups for everyone in case someone
cannot be contacted at that time.
A detailed step-by-step outline of the order of operations for failover.
System information required for administrators to restore services.
Test cases so administrators can verify that the system is functioning properly
before allowing users to reconnect.
These are just a few of the common elements of a disaster recovery plan. Detailed documentation and thorough testing can help with creating a solid plan.
Communication of this plan to all administrators is extremely important. Many
disaster recovery procedure failures are attributed to a lack of internal planning
and coordination amongst all system administrators.
5.3 Prerequisites
There are many prerequisites to implementing a disaster recovery topology.
Some of these topics are covered in this book and others are assumptions. It is best to research which solutions fit your organizational needs.
Repetition: Although the prerequisites for this scenario are exactly the same
as for the implementation of a passive disaster recovery site, we decided to
repeat them here for convenience.
The network link between the two sites is critical for synchronization of the
application and data. Network administrators should be involved in the
planning process. The network may need to be upgraded when connecting a
second site. Redundant network links could help avoid synchronization and
communication loss.
Storage replication and sharing
When implementing a second site there are files that will get stored on the
primary site and should be replicated to the passive site. Attached documents, global search indexes, and integration framework files are examples of such files.
If the primary site fails, these files may be needed on the standby site to
continue with full application functionality. An example storage solution is
disk-based mirroring as described in 4.4, “Storage replication” on page 156.
A second site
An obvious prerequisite to a disaster recovery plan is a second site that can
take over from the primary when a disaster occurs. Careful consideration
should be taken when selecting a location. Sites that are too close together could both be affected by the same disaster, but sites that are too far apart will not be able to synchronize as quickly and may suffer data loss. In a hybrid-active configuration, the proximity of
the second site could affect performance because there will be a remote
database connection.
Administrators with necessary skills
When implementing disaster recovery technologies such as database and file
system replication, the complexity of the IBM SmartCloud Control Desk
topology increases. Administrators who are familiar with these technologies
and possess the skills required to configure, maintain, and test the
infrastructure are critical. Lack of coordination amongst the team can lead to a
failed disaster recovery.
Application installation files on both sites
It is important that the application installation directories for IBM SmartCloud Control Desk are copied to the secondary site. If the primary site fails and application reconfiguration is required, these files will need to be accessible from the secondary site. These directories should also be kept synchronized with each other after any changes. Frequent backups of the application installation directories are advised.
considered because in the event of a site failure, the remaining site should be
able to handle the extra load.
This section describes the necessary configuration for each of these JMS
providers.
Datastore configuration
Following are the steps to configure different data stores for each WebSphere
Application Server (refer to Figure 5-3 on page 201):
1. Log in to the primary site Integrated Solutions Console.
2. Navigate to Service integration → Buses → intjmsbus → Messaging engines → SCCDMIF.000-intjmsbus → Message store.
3. Type SIBSITEA as Schema name.
Figure 5-3 SIBSITEA data store
4. Select OK.
5. Save and synchronize changes.
6. Repeat steps 1 on page 200 through 2 on page 200 on the secondary site
WebSphere Application Server using SIBSITEB as Schema name.
9. Add the following values to the Generic JVM arguments field:
-DJMSQSEQCONSUMER.SEQQIN_SITEB=1 -DJMSQSEQCONSUMER.SEQQOUT_SITEB=1
10.Select OK.
11.Repeat steps 8 through 10 for application server SCCDMIF2.
12.Save and synchronize changes.
13.Restart application servers SCCDMIF1 and SCCDMIF2.
The queue manager will only be available at one site at a time, and its data and
log files will be replicated to the inactive site. The configuration of the queue
manager is the same as described in 4.7.2, “WebSphere MQ configuration” on
page 165.
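As a hedged, generic illustration of how a multi-instance queue manager is typically defined on shared storage within a site (the directory paths are assumptions; the actual steps are in 4.7.2, “WebSphere MQ configuration” on page 165):
# On the first WebSphere MQ server: create the queue manager with its data and
# logs on storage that both servers can access
crtmqm -md /MQHA/qmgrs -ld /MQHA/logs SCCDMIF
# Print the addmqinf command that registers the queue manager on the second server
dspmqinf -o command SCCDMIF
# Run the printed addmqinf command on the second server, then start an instance
# on each server; the -x flag permits a standby instance
strmqm -x SCCDMIF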
For this book the variables shown in Table 5-1 are assumed. These values are
not mandatory for all installations and might vary in other environments.
3. Select OK.
4. Navigate to Resources → JMS → Activation specifications → intjmsact → Custom properties → connectionNameList.
5. Change the current Value to the value shown in Example 5-2 (a hedged sketch of the format follows this list).
6. Select OK.
7. Repeat steps 4 through 6 for activation specification intjmsacterr.
8. Save and synchronize changes.
9. Repeat steps 1 through 8 for the secondary WebSphere Application Server.
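As a hedged sketch of the connectionNameList format referenced in step 5 (the host names and port are assumptions), a WebSphere MQ activation specification typically lists the active and standby queue manager hosts as a comma-separated list:
hostname.mq.siteA(1414),hostname.mq.siteB(1414)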
5.7 Database
Disaster recovery for an enterprise application means that all critical business
operations are recovered in case of any disaster or site-wide outage. Some
organizations have little or no tolerance for data loss, in which case the disaster
recovery solution needs to be deployed to restore data to the applications rapidly.
The solution must ensure the consistency of the data, allowing the systems and applications to be restored reliably and quickly.
Bringing down the database disrupts the IBM SmartCloud Control Desk function.
Various disaster recovery database configurations are available. It is suggested
that you review IBM SmartCloud Control Desk disaster recovery documents to
choose the optimum solution for your environment.
Here we describe the hybrid-active disaster recovery setup for IBM SmartCloud
Control Desk using DB2 and Oracle databases, and how to set up replication
features using DB2 and Oracle. In this scenario all the WebSphere Application
Servers will still point to the database on the primary site. The database on the
secondary site can be used for reporting or ad hoc querying. In case of a failover,
the database on the secondary site will take over the operations, and all the
application servers will now point to this database. Some organizations may
choose to mirror the database using disk mirroring techniques, in which case the
secondary database cannot be used for any reporting.
If ACR is configured with the alternate hostname in the database catalog, the
alternate server information is cached in memory when the JVMs start. In case
the primary site becomes inaccessible, ACR will direct the connections to the
secondary site. If the JVMs are restarted after the primary site failure, the
alternate server information cannot be read and the JVMs will fail to start
because they will not be able to connect to the database server on the primary
site and will have no information about the alternate database server on the
secondary site. There are two solutions to avoid this scenario. In the first solution, the application has to be rebuilt after modifying the database properties in the maximo.properties file. In the second solution, the hostnames can be modified in the /etc/hosts file to avoid the need to rebuild and redeploy the application.
To set up ACR with HADR in the hybrid-active disaster recovery setup, complete the following steps:
1. Update the /etc/hosts file on the WebSphere and DB2 servers on both sites. Add the hostnames for the DB2 servers from both sites to the hosts file, as shown in Example 5-3 (a hedged sketch follows this procedure).
2. Update the alternate server information for the database catalog on the DB2
database across both sites.
On the primary site run this command:
db2 "update alternate server for database DB2_DBNAME using hostname
hostname.db2.siteB port 60000"
On the secondary site run this command:
db2 "update alternate server for database DB2_DBNAME using hostname
hostname.db2.siteA port 60000"
3. In the scenario where the primary site is functional, comment out the entry for
hostname for Site B in the /etc/hosts file on all WebSphere Application
Servers. In this case, the application will be connected to the database on
Site A.
4. In case of disaster, take over the primary HADR role on the secondary site:
db2 "takeover hadr on database DB2_DBNAME by force"
5. Update the /etc/hosts files on the WebSphere Application Server hosts to swap the IP address for the hostnames, as shown in Example 5-4.
6. The application server on the secondary site will now connect to the database
on the same site. Users who were connected to the application server on the primary site will have to relaunch the application and log in again.
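As a hedged sketch of the /etc/hosts entries referenced in steps 1 and 5 (the IP addresses and host names are assumptions):
# While the primary site is functional (Site B entry commented out, step 3)
10.10.1.21     hostname.db2.siteA
#10.20.1.21    hostname.db2.siteB
# After a failover to Site B (entries swapped, step 5)
#10.10.1.21    hostname.db2.siteA
10.20.1.21     hostname.db2.siteB
The alternate server values configured in step 2 can be confirmed with the db2 list database directory command.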
Warning: The ACR transition may allow users to connect to the secondary
site after the HADR takeover command but there may be errors and
inconsistent behavior in the user interface. Although this may hasten the
failover/recovery to the secondary site, the application servers should be
recycled as soon as possible.
Some organizations may modify the network setup or the hostname setup to
resolve the database hostname from the primary site to the secondary. In this
case the application does not need to be modified. The application has to be
restarted before the users can resume their work. Optionally, DB2 ACR can be
added to the database configuration, which will allow the application to reconnect
to the secondary site database if the primary site fails. Refer to 5.7.1, “DB2
HADR” on page 205 for more information.
Reporting may have to be reconfigured when a WAN link fails, depending on
which database the reports access. Refer to Appendix A, “Reporting” on
page 219.
When both sites are operational, the /etc/hosts files on both sites should
resolve the hostname to the primary site’s database server. If a failure of the
primary site occurs, the /etc/hosts files on the standby site can be modified to
resolve to the IP address (or service IP address if applicable) on the standby site.
Another option is to modify the DNS server entries to resolve the hostname to the
correct IP on the secondary site after failure. Because DNS entries are cached
and may have to filter through several DNS servers, this change may take a long
time, which may not be considered acceptable.
Changing the /etc/hosts file entries or DNS resolution may go against some organizations’ security policies. The operating system and network
administrators should be involved with the planning of this solution.
DB2 ACR
DB2 Automatic Client Reroute (ACR) can be implemented to automatically
reconnect the application to an alternate database host upon failover. Setting the
second site as the alternate for the first site and vice versa can allow IBM SmartCloud Control Desk to reconnect without the need for restarting the application server JVMs.
Recovery procedure
When the primary site fails, essential services such as the database will need to
be failed over to the secondary site. This is an example procedure and may differ
depending on your environment.
Database
The database should be the first priority when a primary site failure occurs. The
standby database will need to become the primary. When using DB2 HADR on
the standby site:
1. Run the following command on the standby database as the DB2 instance
administrator:
db2 takeover hadr on database DB2_DBNAME by force
2. Check the status of the database to ensure that it is now the primary:
db2pd -db DB2_DBNAME -hadr
The output should show that the HADR Role is Primary. Notice that the State
is Disconnected. This is because the original site is offline; refer to
Example 5-5.
HADR Information:
Role State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
Primary Disconnected Sync 0 0
PeerWindowEnd PeerWindow
Null (0) 120
LocalHost LocalService
ti-2021-3 55002
PrimaryFile PrimaryPg PrimaryLSN
S0000122.LOG 2653 0x00000000EDD9DB12
When the primary site becomes available again, you will need to reconnect
the HADR back to a peer state for synchronization.
3. On the primary DB2 site, start the DB2 instance by running db2start as the
instance administrator. If you are using System Automation for Multiplatforms
to manage DB2, then the DB2 services should come back automatically if the
nominal status is online.
4. As the DB2 instance administrator, enable HADR by running:
db2 deactivate database DB2_DBNAME
db2 start hadr on database DB2_DBNAME as standby
5. Run the following command and ensure that the State is back in peer. It may
take some time to go back to the peer state depending on how much data
needs to synchronize.
db2pd -db DB2_DBNAME -hadr
6. It may be desirable to make the original site the primary site again when
synchronization is complete by running a graceful takeover. To do this, run:
db2 takeover hadr on database DB2_DBNAME
If the site has been down for a long time or has to be completely rebuilt, a backup
and restore will be necessary to synchronize the databases. HADR will need to
be reconfigured on the rebuilt server as well. For more information about this
procedure, refer to “DB2 HADR configuration” on page 168.
d. Restart the application servers and clusters for the changes to take effect.
The database takeover commands will need to have been run before
attempting to start the application.
Hosts file and DNS switch
For information about this procedure, review “Using the hosts file or DNS to
speed up recovery” on page 209.
DB2 ACR
For more information, review 5.7.1, “DB2 HADR” on page 205.
Reporting
If reports have been configured to point to the standby reporting database as
explained in Appendix A, “Reporting” on page 219, then no additional
reconfiguration should be required.
Integrations
If using SIB as the JMS provider, the messages on the failed site will remain in the datastore and will be processed automatically when the site recovers from the failure.
When using WebSphere MQ, if the active multi-instance queue manager was on
the failed site, it must be started on the remaining site. Utilize the strmqm -x
SCCDMIF command on both active and standby servers for the remaining active
site.
Cron tasks, integrations, and other services running exclusively on this site will
also stop. These services will have to be brought online on the primary manually
after the failure. If configuring reporting against the standby database as defined
in Appendix A, “Reporting” on page 219, this will have to be reconfigured to point
to the primary site database connection string.
Services that are not exclusive to the secondary site should fail over automatically to the primary.
Recovery procedure
This is an example procedure for recovering from a standby site failure. The
steps may be different, depending on your environment.
Database
If the secondary site fails, the impact from a database level should be minimal
because the primary database will still be active.
When the secondary site is repaired and comes back online, the database
should be started as HADR standby if using DB2 HADR:
1. Start the DB2 instance by running db2start as the instance administrator. If
you are using System Automation for Multiplatforms to manage DB2, then the
DB2 services should come back automatically if the nominal status is online.
2. As the DB2 instance administrator, enable HADR by running:
db2 start hadr on database DB2_DBNAME as standby
If the failed site has been down for a long time or has to be completely rebuilt, a backup and restore will be necessary to synchronize the databases.
HADR will need to be reconfigured on the rebuilt server as well. For more
information about this procedure, refer to “DB2 HADR configuration” on
page 168.
Application
When the secondary site fails, there should be minimal reconfiguration required
from the application level because the primary database is still active. Any cron
tasks or services that were running exclusively on the secondary site will need to
be brought online on the primary.
Reporting
If reports have been configured to point to the standby reporting database as
explained in Appendix A, “Reporting” on page 219, then they will need to be
redirected to the primary database to function correctly. Follow the instructions in
the Appendix to reconfigure the reports back to the primary database.
Integrations
There are no specific actions required when the failed site comes back online. If
using SIB as the JMS provider, the messages stuck in the datastore will continue
to process. If using WebSphere MQ as the JMS provider, another active instance
should already be in place on the site that did not fail.
Important: Make sure that only one site has WebSphere MQ multi-instance
queue manager online.
5.10 Conclusion
Activating resources on the secondary site can help to distribute some of the
load across both sites. Knowing the implications of this topology is important
when selecting this configuration. Understanding how failures can affect each
site can help design an effective disaster recovery plan.
Part 3 Appendixes
Appendix A. Reporting
This appendix provides details about how to configure BIRT reports within IBM
SmartCloud Control Desk to run against a secondary or replicated database.
BIRT provides functions such as printing, emailing reports, scheduling reports, report usage tracking, and monitoring. Users can generate ad hoc reports and create their own reports by selecting fields, sorting, grouping, and filtering the records. Users can also create their own local queries. These reports and queries can be shared with other users, scheduled, and edited to meet business needs.
Based on the business needs, organizations may want all the reports to execute
using a separate reporting database for performance enhancements. In this
scenario the IBM SmartCloud Control Desk application is connected to the
primary production database, and the BIRT reporting engine is connected to the
secondary database. Out of the box, the BIRT reports use the default database value defined in the maximo.properties file to connect to the database. This default data source needs to be updated to the new reporting database.
Figure A-1 on page 221 displays the typical BIRT engine pointing to the
secondary reporting database. In the example, the application JVMs for the user
interface, the integration framework, and cron all execute transactions against the
primary production database. The BIRT Report Only Server (BROS) is
configured to execute reports against the secondary reporting database.
Depending on the organization’s requirements, the secondary reporting
database can be an exact mirror image of the primary database or it can be
synchronized at a regular interval. In our topology, a DB2 read-only HADR server
was deployed as the reporting database, which maintains an almost exact copy
of the primary database at all times. For more information about the DB2 HADR
setup, refer to “HADR setup” on page 63.
Figure A-1 BIRT reporting engine with secondary reporting database
To use multiple data sources with reports, we assumed in our case that BIRT designer version 3.7.1 was installed on the client workstation and that the report source code was available for modification. A local IBM SmartCloud Control Desk installation with the report source files needed to be available as well.
describe how to configure reports to execute against the primary production
transactional database and an external reporting database.
1. When using two data sources, update the address for the maximoDataSource to point to the primary transactional database.
2. Add a second data source, which points to the secondary, or external, reporting database. Use the steps described previously to add a new data source.
Figure A-3 displays the BIRT reporting setup with two different data sources.
3. Locate the compiled class used for the application report scripting from the
path /opt/IBM/SMP/maximo/reports/birt/scriptlibrary/classes.
4. Navigate to the BIRT designer folder; note that these samples use BIRT designer 3.7.1:
\birt-report-designer-all-in-one-3_7_1\eclipse\plugins\org.eclipse.birt.report.viewer_3.7.1.v20110905\birt\WEB-INF
Copy the entire classes folder from step 3 to this Eclipse directory.
reportDataSource.url=jdbc:db2://9.42.170.180:60000/maxdb75
reportDataSource.driver=com.ibm.db2.jcc.DB2Driver
reportDataSource.username=maximo
reportDataSource.password=xyzpwd
reportDataSource.schemaowner=maximo
7. If the database drivers for the databases are not loaded, they can be copied from /opt/IBM/SMP/maximo/applications/maximo/lib to
\birt-report-designer-all-in-one-3_7_1\eclipse\plugins\org.eclipse.birt.report.viewer_3.7.1.v20110905\birt\WEB-INF\lib.
8. Copy the report design file and rename it. Launch the BIRT designer and
open the report design file.
9. The default datasource is displayed under Data sources. Highlight the
datasource and click the XML Source to modify the datasource name.
Figure A-4 on page 225 displays the BIRT designer data source.
Figure A-4 BIRT designer data source
10.From the top of the XML file, search for the value maximoDataSource. The
following line will be displayed:
<script-data-source name="maximoDataSource" id="64"
extends="MaximoSystemLibrary.maximoDataSource"/>
11.Update maximoDataSource to reportDataSource. The line should look like this:
<script-data-source name="reportDataSource" id="64"
extends="MaximoSystemLibrary.maximoDataSource"/>
12.Continue to search for maximoDataSource. Update all occurrences of the line:
<property name="dataSource">maximoDataSource</property>
13.The updated line will reflect the new data source:
<property name="dataSource">reportDataSource</property>
14.After all the occurrences of the original data source are updated to the new
one, go back to the layout and save the report design file.
Configuring BIRT Report Only Server
In IBM SmartCloud Control Desk, you have an option to configure the
environment to include a BIRT Report Only Server (BROS). This server enables
you to offload report processing requirements to a different server. Enabling
BROS can balance report load processing and improve the overall system
performance. The BIRT reports can now be executed from the separate BROS
JVM, thus enhancing the UI JVM performance for the users. The BROS is used for report processing regardless of which clustered server the users are on.
12.Launch the WebSphere Administration console and navigate to the web container cookies at Application Server → MXBrosServer → Web containers → Cookies. Modify the cookie name to a unique value that is not used for any other application server in this environment. This will prevent users from being logged out of their UI session when they launch a report and close the report window. Figure A-7 displays the appropriate dialog, where we changed the Cookie name to JSESSIONBIRTID.
13.Click Apply and synchronize the node with the updates. Follow step 12 for all
the BIRT JVMs in the cluster.
14.Stop and restart the BIRT cluster. When the user executes the reports from
their UI session, it will transfer the execution of the report to the BROS and
run in a separate window. The report can be closed by closing the report
window. The UI will not be affected.
Example A-3 Sample sql to check whether the LOB data is inline
select commlogid, ADMIN_IS_INLINED(<LOB Column>) as IS_INLINED,
ADMIN_EST_INLINE_LENGTH(<LOB Column>) as EST_INLINE_LENGTH from
maximo.<Table> where ADMIN_IS_INLINED(<LOB Column>) = 0
3. Alter the table to set the inline limit on the LOB field to match the limit in
Maximo, shown in Example A-4. Repeat these commands for all the LOB
columns in the table and run a reorg command on those tables.
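As a hedged sketch of the alter and reorg commands referenced in step 3, using a hypothetical table, LOB column, and inline length (match the inline length to the attribute length defined in Maximo):
# Set the inline length on the LOB column, then reorganize the table so that
# existing LOB values are moved inline where they fit
db2 "alter table maximo.commlog alter column message set inline length 1000"
db2 reorg table maximo.commlog longlobdata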
4. Rerun the sql in step 2 and verify that all the LOB columns are inline.
This concludes the setup of the BIRT Report Only Server. By changing the reports to use the secondary reporting database and BROS, the load can be shifted away from the UI JVMs and the primary database, enhancing overall performance.
Appendix B. Integration Composer
Before you import data from an external data source into the Maximo target
database, use Integration Composer to create a mapping to transform data from
the source format to the target format. A mapping is a set of expressions that tell
Integration Composer how to create data in the target using information from a
source. For each property that you want to import, define an expression that
specifies how to transform the data for that property when Integration Composer
imports the data from the source into the target. When you execute a mapping,
Integration Composer transforms the collected data and imports it into the target.
When you first implement IT asset management, you can also use Integration
Composer's asset initialization adapter to create a baseline set of authorized IT
asset records from the deployed asset data that you imported. Authorized IT
asset data is managed in the Assets application.
Overview
Integration Composer is separately installed software that is required by the
integration adapter. To use the integration adapter, you first have to install Integration Composer and understand its basics. For more information, see:
http://pic.dhe.ibm.com/infocenter/tivihelp/v50r1/topic/com.ibm.tusc.doc/int_comp/c_ctr_ic_overview.html
Disaster Recovery consideration
While Integration Composer is an important integration for IBM SmartCloud Control Desk, it is not mandatory for the functioning of IBM SmartCloud Control Desk. Integration Composer is used to import hardware and software inventory data. It is important to keep the inventory data current in the database to provide efficient service. Care should be taken to properly back up the Integration Composer server. In the disaster recovery topology, provisions should be made for deployment of the Integration Composer server on both sites.
The Integration Composer server can be a stand-alone server on both sites. The data can be synchronized by taking snapshots of the data on the primary server
and restoring the data on the secondary server. If the disk mirroring topology is
deployed, then the data can be synchronized by mirroring the disks on which the
Integration Composer directory structure resides.
Since Integration Composer uses JDBC drivers to establish the connection to the
target IBM SmartCloud Control Desk database, the JDBC connection string
within the data source needs to be modified on the secondary site to point to the
database server on that site. In a failover scenario, the Integration Composer
may need to be modified to integrate with a discovery tool on the secondary site.
In some cases during the failover the mappings may have to be recreated.
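As a hedged illustration (the host names and port are assumptions), the change typically amounts to pointing the Integration Composer data source JDBC URL at the site-local database server:
# Primary site Integration Composer data source
jdbc:db2://hostname.db2.siteA:60000/maxdb75
# Secondary site Integration Composer data source
jdbc:db2://hostname.db2.siteB:60000/maxdb75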
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about
the topic in this document. Note that some publications referenced in this list
might be available in softcopy only.
End-to-end Automation with IBM Tivoli System Automation for Multiplatforms,
SG24-7117
IBM System Storage DS8000 Copy Services for Open Systems, SG24-6788
IBM XIV Storage System: Copy Services and Migration, SG24-7759
IBM System Storage DS Storage Manager Copy Services Guide, SG24-7822
SAN Volume Controller and Storwize V7000 Replication Family Services,
SG24-7574
You can search for, view, download or order these documents and other
Redbooks, Redpapers, Web Docs, draft and additional materials, at the following
website:
ibm.com/redbooks
Other information
The following information sources may provide additional material of value.
Implementing highly available systems with IBM Maximo:
http://pic.dhe.ibm.com/infocenter/tivihelp/v49r1/index.jsp?topic=%2Fcom.ibm.mbs.doc%2Fgp_highavail%2Fc_ctr_high_availability.html
IBM SmartCloud Control Desk, Version 7.5 product documentation Info
Center:
http://pic.dhe.ibm.com/infocenter/tivihelp/v50r1/index.jsp?topic=%2Fcom.ibm.tusc.doc%2Fic-homepage.html
Back cover

Learn how to set up high availability and disaster recovery configuration options

Design a multisite deployment with load balancing

Configure middleware components

In today’s global environment, more and more organizations need to reduce their downtime to the minimum possible and look for continuous availability of their systems. Products based on the IBM Tivoli Process Automation Engine (TPAE), such as IBM Maximo Asset Management, Maximo Industry Solutions, and IBM SmartCloud Control Desk, often play a role in such environments and thus also have continuous availability requirements. As part of that, it is important to understand the High Availability (HA) and Disaster Recovery (DR) capabilities of IBM SmartCloud Control Desk and IBM Maximo Products, and how to assure that all the components of an HA/DR solution are properly configured and tested to handle outages. By outlining some of the topologies we have tested, and the documentation we created, we hope to demonstrate how robust the IBM SmartCloud Control Desk and IBM Maximo infrastructure can be.

This IBM Redbooks publication covers alternative topologies for implementing IBM SmartCloud Control Desk and IBM Maximo in High Availability and Disaster Recovery configurations.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.