NetBackup and VCS
Many of us who have worked with NetBackup for a long time eventually have to work on
a NetBackup installation that is clustered with VCS (Veritas Cluster Server). Not
everyone who knows NetBackup necessarily knows VCS, so here is a short overview of
VCS and how it works with NetBackup.
CLUSTERS:
High-availability clusters (also known as Failover Clusters and most common for
NetBackup) are implemented primarily for the purpose of improving the availability of
services which the cluster provides. They operate by having redundant nodes, which
are then used to provide service when system components fail. The most common size
for an HA cluster is two nodes, which is the minimum requirement to provide
redundancy. HA cluster implementations attempt to use redundancy of cluster
components to eliminate single points of failure.
For NetBackup, you would usually have a two-node cluster: an active node and a failover
(passive) node.
HOW IT WORKS:
In this setup, VCS monitors the NetBackup application on the active node at all times.
If NetBackup becomes unavailable, VCS detects the failure, gracefully stops everything
on the active node, unmounts the shared volume there, mounts it on the passive node,
and starts NetBackup there. The failed node can then be worked on for recovery, and
backups are interrupted for only a few minutes.
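The failover order described above can be written down as a small shell sketch. It only prints the steps; VCS performs them itself, and the node names nodeA and nodeB are hypothetical.

```shell
#!/bin/sh
# Conceptual sketch only: prints the failover order from the text.
# Nothing here touches a cluster; node names are hypothetical.
failover_plan() {
    from=$1
    to=$2
    echo "1. stop NetBackup on $from"
    echo "2. unmount shared volume on $from"
    echo "3. mount shared volume on $to"
    echo "4. start NetBackup on $to"
}

failover_plan nodeA nodeB
```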
VCS:
Terminology:
There are three categories of VCS resources: on-off, on-only, and persistent.
- On-off: VCS can fully control the resource (start it and stop it).
- On-only: VCS can start the resource but cannot shut it down.
- Persistent: VCS only monitors the resource and cannot control it (for example, a NIC).
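As a rough sketch, the three categories correspond to the values of the Operations attribute of a resource type (OnOff, OnOnly, None). The helper name resource_category below is made up for this example.

```shell
#!/bin/sh
# Map an Operations attribute value to the category names used above.
# resource_category is a hypothetical helper, not a VCS command.
resource_category() {
    case "$1" in
        OnOff)  echo "on-off" ;;
        OnOnly) echo "on-only" ;;
        None)   echo "persistent" ;;
        *)      echo "unknown" ;;
    esac
}

resource_category None   # prints: persistent
```

On a cluster node, the input would typically come from something like `hatype -value NIC Operations`. Check the exact syntax against your VCS version before relying on it.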
Resource agent: Every resource type has an associated agent. The agent is responsible
for actions on the resource such as online, offline, and monitor.
Jeopardy: A system is in jeopardy when only one of its heartbeat connections is still
functioning. If that last heartbeat link is also lost, VCS cannot tell whether the host
has crashed or only the last heartbeat network has failed.
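One way to spot jeopardy is to look for the word "jeopardy" in the port membership listing printed by `gabconfig -a`. The sketch below assumes that output style; the sample lines are illustrative, not captured from a real cluster, and heartbeat_state is a hypothetical helper.

```shell
#!/bin/sh
# Reads `gabconfig -a` style output on stdin and reports whether any
# port is listed in jeopardy. heartbeat_state is a hypothetical helper.
heartbeat_state() {
    if grep -q 'jeopardy'; then
        echo "jeopardy"
    else
        echo "ok"
    fi
}

# Illustrative sample input; on a node you would pipe in `gabconfig -a`.
printf '%s\n' \
    'Port a gen a36e0003 membership 01' \
    'Port a gen a36e0003 jeopardy ;1' | heartbeat_state
```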
1. Cluster Communication:
Low Latency Transport (LLT) and Group Membership Services/Atomic Broadcast (GAB)
are responsible for heartbeat and cluster communication. Both are kernel modules
that are installed with VCS. LLT provides fast, high-priority internal cluster
communication. It does not run over TCP/IP; it is a separate protocol that works
directly on the network link layer. GAB runs over LLT and is primarily responsible
for cluster membership. So LLT on each node handles the communication, and GAB on
each node maintains the cluster membership.
LLT -
Configuration files:
/etc/llttab
/etc/llthosts
Commands:
lltstat
lltconfig
GAB -
Configuration file:
/etc/gabtab
Command:
gabconfig
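To give a feel for what these files contain, here is a sketch that writes illustrative copies into a scratch directory (never into /etc). The node names, cluster ID, and interface names are assumptions for a two-node Linux cluster, and write_llt_gab_samples is a made-up helper.

```shell
#!/bin/sh
# Writes sample LLT/GAB files into the directory given as $1.
# All values are illustrative; real files live in /etc and differ per site.
write_llt_gab_samples() {
    dir=$1
    # llthosts: LLT node ID to host name mapping
    printf '%s\n' '0 nodeA' '1 nodeB' > "$dir/llthosts"
    # llttab: this node's name, the cluster ID, and two heartbeat links
    printf '%s\n' \
        'set-node nodeA' \
        'set-cluster 100' \
        'link eth1 eth1 - ether - -' \
        'link eth2 eth2 - ether - -' > "$dir/llttab"
    # gabtab: start GAB and seed the cluster once 2 nodes are present
    printf '%s\n' '/sbin/gabconfig -c -n2' > "$dir/gabtab"
}

demo_dir=$(mktemp -d)
write_llt_gab_samples "$demo_dir"
grep -c '^link ' "$demo_dir/llttab"   # number of heartbeat links defined
rm -rf "$demo_dir"
```

On a running node, `lltstat -n` and `gabconfig -a` show the live LLT node state and GAB port membership for comparison.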
NOTE: LLT, GAB and HAD (the VCS engine daemon) depend on each other. At system
startup, LLT starts first, then GAB, and then HAD. HAD will not run without GAB, and
GAB will not run without LLT.
VCS startup:
VCS -
Start: Start LLT, then GAB, then HAD, in the order given in the note above.
Stop: Stop HAD, then unload GAB, and then unload LLT.
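The start and stop order can be sketched as two shell functions. This is a dry-run sketch: by default it only prints the commands, and the `-n2` seed count assumes a two-node cluster. On most systems the init scripts do this for you, so treat it as an illustration of the order rather than a procedure.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 only on a real cluster node, and only if you mean it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

vcs_start() {
    run lltconfig -c        # configure LLT from /etc/llttab
    run gabconfig -c -n2    # configure GAB, seed when 2 nodes are present
    run hastart             # start HAD
}

vcs_stop() {
    run hastop -local       # stop HAD on this node (its groups go offline)
    run gabconfig -U        # unconfigure GAB
    run lltconfig -U        # unconfigure LLT
}

vcs_start
vcs_stop
```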
Service Groups -
Online: Manually bring a specific service group online on a specific node or on all nodes.
Offline: Manually take a service group offline on a specific node or on all nodes.
Freeze: Freezing a service group tells VCS not to take any action on it. With NetBackup,
this matters when NetBackup has problems and you need to stop and start it a few times
while troubleshooting. Freeze the service group first so that VCS does not react to
the restarts.
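A minimal dry-run sketch of these service group operations with `hagrp`. The group name nbu_group is the usual default mentioned later in this article, and nodeA is a hypothetical node name.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a real cluster node to run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

GROUP=nbu_group   # usual NetBackup group name; adjust to your cluster
NODE=nodeA        # hypothetical node name

# Freeze the group, restart NetBackup by hand, then unfreeze it.
run hagrp -freeze "$GROUP"
# ... stop and start NetBackup manually here ...
run hagrp -unfreeze "$GROUP"

# Bring the group online or take it offline on one node.
run hagrp -online "$GROUP" -sys "$NODE"
run hagrp -offline "$GROUP" -sys "$NODE"
```

This freeze is temporary. A freeze that should survive a VCS restart needs the `-persistent` option and a writable cluster configuration.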
Resource -
Online: Manually online a resource
Offline: Manually offline a resource
Probe: Ask the resource agent to probe for the resource and get its current status.
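The same operations at resource level use `hares`. Again a dry-run sketch: the resource name nbu_server and node name nodeA are hypothetical, so substitute the names from your own main.cf.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

RES=nbu_server    # hypothetical resource name
NODE=nodeA        # hypothetical node name

run hares -probe "$RES" -sys "$NODE"     # ask the agent for the current state
run hares -state "$RES"                  # show the state VCS has recorded
run hares -online "$RES" -sys "$NODE"    # manually online the resource
run hares -offline "$RES" -sys "$NODE"   # manually offline the resource
```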
NetBackup in VCS:
Install NetBackup on each node the way you normally would. When the NetBackup
installation wizard asks for the EMM server name and the master server name, give the
cluster's "virtual name" on both nodes. Note that at this point nothing is placed on
the shared LUN.
Next, run the cluster configuration script:
/usr/openv/netbackup/bin/cluster/cluster_config
This script prompts for all the information it needs and then does the following:
- Creates an agent "NetBackup" and its cf file at
/usr/openv/netbackup/bin/cluster/vcs/NetBackupTypes.cf
- Creates the service group (usually nbu_group)
- Creates the resources (NIC, IP, DG, VOL, MOUNT and NETBACKUP)
- Moves the databases to the shared location
- Creates the file /usr/openv/netbackup/bin/cluster/NBU_RSP, which holds information
about the cluster configuration
A useful property of the cluster_config script is that if anything fails along the way,
it undoes everything it has done, so running the script again does not create
duplicates in the configuration.
Basic Tasks:
Create service group (hagrp -add)
Modify service group (hagrp -modify)
Delete service group (hagrp -delete)
Add resource(s) to a service group (hares -add)
Modify resources (hares -modify)
Delete resources (hares -delete)
Monitor the cluster (hastatus)
Switch over service group from one node to other (hagrp -switch)
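Several of these tasks can be strung together as one dry-run walk-through. The group demo_group, resource demo_ip, node names, and the address 192.0.2.10 are all made up for illustration. The `haconf` calls open the cluster configuration for writing and then save and close it.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# All names and the address below are hypothetical.
demo_workflow() {
    run haconf -makerw                                  # make config writable
    run hagrp -add demo_group                           # create service group
    run hagrp -modify demo_group SystemList nodeA 0 nodeB 1
    run hares -add demo_ip IP demo_group                # add an IP resource
    run hares -modify demo_ip Address 192.0.2.10        # set its address
    run haconf -dump -makero                            # save, make read-only
    run hastatus -sum                                   # cluster summary
    run hagrp -switch demo_group -to nodeB              # switch to other node
}

demo_workflow
```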
Config files:
/etc/VRTSvcs/conf/config/main.cf
/etc/VRTSvcs/conf/config/types.cf
/usr/openv/netbackup/bin/cluster/vcs/NetBackupTypes.cf
/usr/openv/netbackup/bin/cluster/NBU_RSP
Logs:
System log
/var/VRTSvcs/logs/engine_A.log
/usr/openv/netbackup/bin/cluster/AGENT_DEBUG.log
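When troubleshooting, a quick filter over the engine log can help. The sketch below assumes that log lines carry a severity tag such as "VCS ERROR" or "VCS CRITICAL". The sample lines are synthetic, written only so the example runs anywhere, and recent_errors is a hypothetical helper.

```shell
#!/bin/sh
# Show the most recent error-level lines from a VCS engine log.
# $1 = log path (defaults to the engine log), $2 = how many lines to keep.
recent_errors() {
    log=${1:-/var/VRTSvcs/logs/engine_A.log}
    count=${2:-20}
    grep -E 'VCS (ERROR|CRITICAL)' "$log" | tail -n "$count"
}

# Synthetic sample log, not real VCS output.
sample=$(mktemp)
printf '%s\n' \
    '2024/01/01 10:00:00 VCS INFO synthetic info line' \
    '2024/01/01 10:00:05 VCS ERROR synthetic error line' > "$sample"
recent_errors "$sample"
rm -f "$sample"
```

On a cluster node, calling `recent_errors` with no arguments reads /var/VRTSvcs/logs/engine_A.log.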
I hope you enjoyed reading this and that it helps you in your day-to-day work.