Efficient Data Protection With EMC Avamar Global Deduplication Software - 2010
Abstract
This white paper provides a technical overview of EMC® Avamar® backup and recovery software with
integrated global, source data deduplication technology. It includes an in-depth look at the Avamar architecture,
patented global data deduplication technology, key applications, and customer benefits.
January 2010
Copyright © 2007, 2008, 2009, 2010 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is
subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION
MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE
INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable
software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
All other trademarks used herein are the property of their respective owners.
Part Number H2681.5
Introduction
This white paper provides a technical overview of EMC Avamar backup and recovery software. It provides
a closer look at patented Avamar global, source data deduplication technology, Avamar’s architecture,
supported platforms, high availability, and ease of management. Other topics include integrity checking,
encryption, and integration with event management solutions.
Audience
This white paper is intended for backup administrators or technical staff seeking a more in-depth look at
EMC Avamar’s underlying architecture and technology.
[Figure 1 diagram labels: data segments A, B, C, D, E; "Only unique data segments are backed up";
"Data already backed up, so only unique IDs stored (20-byte pointers)"; "New data segment identified
and backed up"; "Avamar Server (stored backup data)"]
Figure 1. EMC Avamar software identifies the unique, subfile variable length data
segments that comprise the data (in this case, a PowerPoint presentation). Only a single
instance of each data segment is stored globally, across sites and servers
Data Type | Amount of Primary Data Backed Up | Amount of Data Moved Daily
Figure 2. While results will vary by data type and mix, Avamar can dramatically improve
backup performance and efficiency
General architecture
Avamar’s software consists of a number of components, including the Avamar server, Avamar
Administrator, Avamar Enterprise Manager, and Avamar client software. Avamar servers can be deployed
in either single-node or scalable multinode configurations, depending upon the amount of performance and
disk capacity required. Avamar Replicator efficiently and securely replicates Avamar servers across the
WAN between sites.
Avamar servers
EMC offers flexible Avamar server deployment options including Avamar Data Store, the Avamar Virtual
Edition virtual appliance, and the option to leverage existing certified servers. The easiest way to deploy a
physical Avamar server is via the Avamar Data Store – a scalable, all-in-one packaged solution consisting
of Avamar software preinstalled and preconfigured on EMC-certified hardware to simplify purchasing,
deployment, and service while minimizing onsite setup. For virtual environments, Avamar Virtual Edition
for VMware enables an Avamar server to be quickly deployed as a virtual appliance, leveraging an existing
ESX server and its attached disk storage. Another option is to install Avamar software on industry-
standard Intel-based servers (certified on Dell, HP, IBM) running Red Hat Enterprise Linux.
Avamar servers store client backups and manage the policies for scheduling, determining datasets, and
retention periods. An Avamar backup provides a point-in-time full copy of data that can be restored on
demand from an Avamar server. Multinode Avamar servers segregate components of the Avamar server
across multiple hardware servers for scalability and performance.
There are two primary node types for a multinode Avamar server:
• Storage Node—Stores the deduplicated backup data. Multiple Storage Nodes are configured with
multinode Avamar servers based upon performance and capacity requirements. Storage Nodes can be
added to an Avamar server over time to expand performance with no downtime required. Avamar
clients connect directly with Avamar Storage Nodes; client connections and data are load balanced
across Storage Nodes automatically without any downtime.
• Utility Node—Node dedicated to scheduling and managing background Avamar server jobs. One
utility node is configured per multinode Avamar server. Data on the Utility Node is protected by the
Avamar server. Note: Utility Nodes are not single points of access for an Avamar server; backups and
restores can still complete to connected clients, even without a Utility Node.
Other optional nodes include:
• NDMP Accelerator Node—Specialized node that works with NDMP in order to provide data
protection for NAS filers (EMC Celerra® and NetApp filers).
Avamar Administrator
The Avamar Administrator is a graphical management application that includes the following:
• Management of backup policies, including datasets, schedules, and retention
• Management of users and clients
• Centralized on-demand backups and restores
• Detailed monitoring and reporting
The Avamar Administrator can be launched directly from the Web-based Enterprise Manager user interface
without any software installation or can be installed locally on any Windows or Linux system.
Avamar Replicator
Avamar Replicator enables efficient, encrypted, and asynchronous replication of data stored in an Avamar
server to another Avamar server deployed in a remote location without the need to ship tapes. Avamar
Replicator is a scheduled process between two independent Avamar servers, providing a higher level of
UNIX
Avamar supports a variety of UNIX operating systems, including Solaris, HP-UX, SCO, FreeBSD, and
IBM AIX.
Linux
Avamar supports Red Hat and SUSE Linux operating systems.
Mac OS X
Avamar supports Mac OS X on the PowerPC and Intel platforms.
NetWare
Avamar supports Novell NetWare on Intel platforms.
Microsoft SharePoint
Avamar supports Microsoft SharePoint environments.
Oracle databases
Avamar utilizes Oracle Recovery Manager to provide fast and automated protection of Oracle databases,
while maintaining online availability. The Avamar client for Oracle can be used to provide full daily
backups of an active Oracle database while generating only a small amount of incremental storage.
DB2 databases
The Avamar client for DB2 provides fast and automated protection of DB2 databases, while maintaining
the online availability of the databases.
NAS filers
Avamar supports NDMP backups via the Avamar NDMP Accelerator Node to provide reliable and high-
performance backup and recovery for NAS filers (for example, EMC Celerra and NetApp filers). Avamar
provides fast, daily full backups for filers while requesting only a level-one (incremental) dump of data
from the filer itself, dramatically reducing backup times and network utilization (see Figure 4).
Figure 4. Avamar delivers fast, daily full backups for NAS filers
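The idea of producing a daily full backup from only a level-one (incremental) dump can be pictured with a small sketch. This paper does not describe the internal mechanism, so the following is an illustrative assumption only: a backup is modeled as a manifest mapping file paths to lists of segment IDs, and the function name synthesize_full, the paths, and the IDs are all hypothetical.

```python
def synthesize_full(previous_full, changed, deleted=()):
    """Return a new full-backup manifest (path -> list of segment IDs).

    Assumption for illustration: files absent from the incremental dump
    are unchanged, so their entries are carried over from the previous full.
    """
    manifest = dict(previous_full)   # unchanged files reuse the earlier IDs
    manifest.update(changed)         # files reported by the incremental dump
    for path in deleted:
        manifest.pop(path, None)     # files removed since the previous backup
    return manifest


monday = {"/vol/a.doc": ["id1", "id2"], "/vol/b.xls": ["id3"]}
tuesday = synthesize_full(
    monday,
    changed={"/vol/b.xls": ["id3", "id4"], "/vol/c.ppt": ["id5"]},
    deleted=["/vol/a.doc"],
)
```

In this toy model, each day's manifest is a complete point-in-time view that can be restored on its own, while only the changed files had to be read from the filer.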
VMware environments
Avamar software quickly and efficiently protects VMware environments by reducing the size of backup
data within and across virtual machines. This eliminates traditional backup bottlenecks caused by the large
amount of data that must pass through the same set of shared resources—the physical server’s CPU, NIC,
memory, and disk storage. Avamar reduces the traditional backup load—up to 200 percent weekly—to as
little as 2 percent over the same seven-day period, dramatically reducing backup times and resource
utilization.
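As a rough worked example of the figures above, the sketch below applies the two percentages to a hypothetical 10 TB of primary data. The 10 TB value and the function name are assumptions for illustration; actual results will vary by data type and mix.

```python
def weekly_backup_load_tb(primary_tb: float, weekly_percent: float) -> float:
    """Data moved in one week, expressed in TB, for a given weekly percentage."""
    return primary_tb * weekly_percent / 100


primary_tb = 10.0  # hypothetical amount of primary data on the VMware hosts

traditional_tb = weekly_backup_load_tb(primary_tb, 200)  # "up to 200 percent weekly"
deduplicated_tb = weekly_backup_load_tb(primary_tb, 2)   # "as little as 2 percent"
reduction_factor = traditional_tb / deduplicated_tb

print(f"traditional: {traditional_tb:.1f} TB/week")
print(f"deduplicated: {deduplicated_tb:.1f} TB/week")
print(f"reduction: about {reduction_factor:.0f}x")
```

Under these best-case figures, the shared CPU, NIC, memory, and disk resources of the physical server would carry roughly one hundredth of the traditional weekly backup traffic.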
Avamar can quickly protect VMware environments by installing agents on the virtual machine Guests,
proxy server. In all cases, Avamar’s powerful deduplication technology provides fast, daily full backups
while reducing required network/infrastructure bandwidth and storage (Figure 5). In addition, unlike
traditional backup solutions, Avamar can deduplicate the data stored in virtual disks (*.vmdk files),
significantly reducing storage utilization and enabling replication of virtual disks across congested WANs.
Figure 5. Avamar deduplicates backup data within and across virtual machines, at the
source, and globally, providing fast and reliable daily full backups
Global deduplication
Avamar stores only a single instance of each unique subfile variable length data segment for all protected
servers, desktops, and laptops. Each segment of data is assigned a unique ID, using the SHA-1 hash
algorithm. This 20-byte ID is unique to the data segment. Whenever an Avamar client encounters a new
data segment, it generates the unique ID and sends the ID to the Avamar server to determine if the segment
has been previously stored. The segment will only be sent to an Avamar server if it is new and unique.
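The exchange described above can be sketched in a few lines. This is a minimal illustration, not Avamar's implementation: the ToyServer class and the backup function are hypothetical names, the server is an in-memory dictionary, and the data is assumed to have already been split into variable-length segments. Only the use of a 20-byte SHA-1 ID comes from the text.

```python
import hashlib


class ToyServer:
    """Hypothetical stand-in for a backup server: maps 20-byte IDs to segments."""

    def __init__(self):
        self.segments = {}

    def has(self, segment_id: bytes) -> bool:
        return segment_id in self.segments

    def store(self, segment_id: bytes, data: bytes) -> None:
        self.segments[segment_id] = data


def backup(segments, server):
    """Send only segments the server has not seen; return (ids, bytes_sent)."""
    ids, bytes_sent = [], 0
    for data in segments:
        segment_id = hashlib.sha1(data).digest()  # SHA-1 digest is 20 bytes
        if not server.has(segment_id):            # ask the server by ID first
            server.store(segment_id, data)        # new and unique: send the data
            bytes_sent += len(data)
        ids.append(segment_id)                    # the backup records the ID either way
    return ids, bytes_sent


server = ToyServer()
first_ids, first_sent = backup([b"A", b"B", b"C", b"D"], server)
second_ids, second_sent = backup([b"A", b"B", b"C", b"D", b"E"], server)
```

In this toy run, the second backup transfers only the one segment the server has not seen before, yet its list of IDs still describes the complete data set.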
Avamar’s grid server architecture, shown in Figure 7, provides scalable performance and capacity. Every
Avamar client can connect to every Storage Node for both backup and restore, which eliminates potential
performance bottlenecks.
Furthermore, Avamar takes groups of unique IDs (for instance, all the IDs for a set of segments that make
up a file) and generates a new unique ID for that group. A request for that group ID will cascade into
requests for all the segments that make up that group. This process continues hierarchically, so Avamar can
quickly store and retrieve files, directories, entire file systems, and even databases, without the need for any
centralized index or database that can become a bottleneck to performance or scalability. It is important to
note that the Avamar Utility Node is not used as a database or index for storage or access of unique
segments. Unlike some other deduplication architectures, Avamar does not use access nodes or metadata
nodes, which can become bottlenecks to performance. Every Avamar client can connect to every Storage
Node in an Avamar Grid Server for both backup and restore. Avamar’s elegant index structure eliminates
redundancy even for indices and metadata, ensuring that the indexing component of an Avamar Grid Server
remains approximately two percent of total data storage.
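The hierarchical scheme described above resembles a hash tree, and a small sketch may help make the cascade concrete. Everything here is an assumption for illustration: the store dictionary, the function names, and the choice to derive a group ID by hashing the concatenated child IDs are not taken from Avamar documentation.

```python
import hashlib

store = {}  # hypothetical content-addressed store: ID -> (kind, payload)


def put_segment(data: bytes) -> bytes:
    """Store a data segment under its SHA-1 ID and return the ID."""
    segment_id = hashlib.sha1(data).digest()
    store[segment_id] = ("data", data)
    return segment_id


def put_group(child_ids) -> bytes:
    """Store a group of IDs under an ID derived from the children's IDs."""
    group_id = hashlib.sha1(b"".join(child_ids)).digest()
    store[group_id] = ("group", list(child_ids))
    return group_id


def fetch(object_id: bytes) -> bytes:
    """A request for a group ID cascades into requests for its members."""
    kind, payload = store[object_id]
    if kind == "data":
        return payload
    return b"".join(fetch(child) for child in payload)


# Two "files" share one segment; a "directory" groups the two files.
file_a = put_group([put_segment(b"hello "), put_segment(b"world")])
file_b = put_group([put_segment(b"hello "), put_segment(b"again")])
directory = put_group([file_a, file_b])
restored = fetch(directory)
```

A single 20-byte ID is enough to retrieve the whole directory in this sketch, and the shared segment is stored once even though two files refer to it. No separate index is consulted; the stored objects themselves carry the structure.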
Flexibility in deployment
Enterprises have a tremendous amount of flexibility in
planning long-term deployments of an Avamar server.
Avamar’s grid architecture enables an Avamar system to be
expanded one node at a time—even across dissimilar
hardware platforms from different manufacturers or with
different capacity disk drives. This allows customers to take
advantage of new, more cost-effective hardware as their
Avamar systems grow over time.
Encryption
Avamar provides comprehensive encryption capabilities, including the ability to encrypt backup data while
in transit and at rest. For enhanced security during client/server data transfers, Avamar supports SSL
encryption. SSL encryption utilizes the 128-bit or 256-bit Advanced Encryption Standard (AES) algorithm
and should be used for any external network communications where security is a significant concern. The
choice of encryption method can be made on a client-by-client basis or for an entire group of clients.
Avamar also supports the option to enable encryption of data at rest using 128-bit Blowfish encryption. By
encrypting data at rest, organizations are further protected from backup data theft or unauthorized access.
[Diagram labels: SAN/LAN; VMware Virtualization Layer; x86 Architecture; Tape; Resource Pool; vault]
Figure 10. Avamar Data Transport exports deduplicated backup data to tape for cost-
effective, long-term retention
To protect larger remote offices and data centers, data can be backed up to a local Avamar server for faster
recovery, and then replicated to another Avamar server located at the data center or remote disaster
recovery site. When an Avamar server is required, there are several deployment options available:
• EMC-certified server: Avamar software installed on an EMC-certified server running Red Hat Enterprise
Linux from vendors including Dell, HP, and IBM.
• EMC Avamar Virtual Edition for VMware: The industry’s first deduplication virtual appliance for
backup, recovery, and disaster recovery. It enables Avamar to be deployed easily, effectively, and in a
repeatable fashion on VMware ESX Server hosts, leveraging the existing server CPU and disk storage.
• EMC Avamar Data Store: An all-in-one packaged solution consisting of EMC Avamar software running
on preconfigured EMC-certified hardware. The Avamar Data Store is available in two models: a scalable
multi-node model and a single-node model. This approach simplifies purchasing, deployment, and service
while minimizing on-site setup.
The multi-node Avamar Data Store is designed for the data center where backup data is being consolidated
from multiple remote locations or to protect VMware environments and LAN/NAS servers. It can
efficiently retain the equivalent of up to several petabytes of traditional cumulative daily full backups.
The single-node Avamar Data Store is ideal for deployment at remote offices that require faster local
recovery performance. It provides up to 1 TB, 2 TB, or 3.3 TB of deduplicated backup capacity, which
under a typical traditional backup schedule could require tens of terabytes of disk or tape storage,
depending on the backup method and retention period. In addition, both models support replication, either
from the remote office to the data center for consolidation, or between data centers for disaster recovery.
Figure 11. Avamar’s intuitive, easy-to-use interface enables desktop and laptop users to
quickly recover their own data, reducing the burden on IT staff.