Data Protection Participant Guide PDF
PARTICIPANT GUIDE
Data Protection
Data Deduplication
Data Deduplication Overview
Key Benefits of Data Deduplication
Data Deduplication Method: Source-Based
Data Deduplication Method: Target-Based
Data Deduplication: Additional Information
Data Archiving
Data Archiving Overview
Backup vs. Archive
Data Archiving Operations
Use Case: Email Archiving
Concepts in Practice
Data Replication
[Figure: data replication between servers in Data Center A and Data Center B]
• Replicas are used to restore and restart operations when there is data loss.
• Data can be replicated to one or more locations.
− For example, the production data is copied from the source (primary
storage) to the target. The target can be other storage in the same data
center, storage in a different data center, or the cloud.
Use of Replicas
[Figure: uses of replicas]
Notes:
Alternative source for backup: Under normal backup operations, data is read
from the production LUNs and written to the backup device. This approach places
an additional burden on the production infrastructure because production LUNs are
simultaneously involved in production operations and servicing data for backup
operations. To avoid this situation, a replica can be created from the production
LUN and used as the source for backup operations. This method
alleviates the backup I/O workload on the production LUNs.
Fast recovery and restart: For critical applications, replicas can be taken at short,
regular intervals. This approach allows easy and fast recovery from data loss. If a
complete failure of the source (production) LUN occurs, the replication solution
enables one to restart the production operation on the replica to reduce the RTO.
Testing platform: Replicas are also used for testing new applications or upgrades.
For example, an organization may use the replica to test the production application
upgrade; if the test is successful, the upgrade may be implemented on the
production environment.
Data migration: Another use for a replica is data migration. Data migrations are
performed for various reasons such as migrating from a smaller capacity LUN to
one of a larger capacity for newer versions of the application.
Types of Replication
Snapshots can establish recovery points in a small fraction of time and can reduce
Recovery Point Objective (RPO) by supporting more frequent recovery points. If a
file is lost or corrupted, it can typically be restored from the latest snapshot data in
a few seconds.
VM Snapshot
Multiple snapshots can be created from the same source LUN for various business
requirements. Some snapshot software provides the capability of automatic
termination of a snapshot upon reaching the expiration date. The unavailability of
the source device invalidates the data on the target. The storage system-based
snapshot uses a Redirect on Write (RoW) mechanism.
RoW redirects new writes that are destined for the source LUN to a reserved LUN
in the storage pool. In RoW, a new write from source compute system is written to
a new location (redirected) inside the pool. The original data remains where it is,
and is therefore read from the original location on the source LUN and is untouched
by the RoW process.
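The Redirect on Write behavior described above can be sketched in a few lines of Python. This is a simplified, hypothetical model rather than any vendor's implementation: the class name `RowVolume`, the block map, and the in-memory pool are all assumptions made for illustration. The snapshot keeps pointers to the original blocks, and each new write lands in a new location in the pool.

```python
class RowVolume:
    """Toy model of a LUN with Redirect on Write (RoW) snapshots."""

    def __init__(self, blocks):
        # The pool holds block contents; the map points each logical
        # block number at a location in the pool.
        self.pool = list(blocks)
        self.block_map = list(range(len(blocks)))
        self.snapshots = {}

    def create_snapshot(self, name):
        # A snapshot is only a copy of the pointers, not of the data.
        self.snapshots[name] = list(self.block_map)

    def write(self, block_no, data):
        # Redirect the new write to a new location inside the pool;
        # the original data stays where it is.
        self.pool.append(data)
        self.block_map[block_no] = len(self.pool) - 1

    def read(self, block_no, snapshot=None):
        mapping = self.snapshots[snapshot] if snapshot else self.block_map
        return self.pool[mapping[block_no]]


vol = RowVolume(["a0", "b0", "c0"])
vol.create_snapshot("snap1")
vol.write(1, "b1")
print(vol.read(1))           # current view of block 1
print(vol.read(1, "snap1"))  # point-in-time view of block 1
```

Because the snapshot copies only pointers, creating it takes very little time and space, which is consistent with the short recovery-point intervals mentioned earlier.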
• A write is committed to both the source and the remote replica before it is
acknowledged to the compute system.
• Synchronous replication enables restarting business operations at a remote site
with zero data loss and provides near-zero RPO.
Notes:
In synchronous replication, writes must be committed to the source and the remote
target prior to acknowledging “write complete” to the production compute system.
Additional writes on the source cannot occur until each preceding write has been
completed and acknowledged.
This approach ensures that data is identical on the source and the target at all times.
Further, writes are transmitted to the remote site exactly in the order in which they
are received at the source. Write ordering is maintained and it ensures
transactional consistency when the applications are restarted at the remote
location. As a result, the remote images are always restartable copies.
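The write path described in these notes can be illustrated with a minimal Python sketch. The class `SyncReplicatedVolume` is hypothetical, and the network transfer is reduced to a list append, so this only shows the ordering of commit and acknowledgment, not a real replication protocol.

```python
class SyncReplicatedVolume:
    """Toy model of synchronous remote replication."""

    def __init__(self):
        self.source = []  # writes committed at the source
        self.target = []  # writes committed at the remote replica

    def write(self, data):
        # Commit to the source and to the remote target first...
        self.source.append(data)
        self.target.append(data)  # stands in for the network round trip
        # ...and only then acknowledge "write complete" to the compute system.
        return "ack"


vol = SyncReplicatedVolume()
acks = [vol.write(w) for w in ["w1", "w2", "w3"]]
print(vol.source == vol.target)
```

Since each write returns only after both copies are committed, the target holds the same writes in the same order as the source.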
With synchronous replication, application response time depends primarily on the distance and the network bandwidth
between sites. If the bandwidth provided for synchronous remote replication is less
than the maximum write workload, there are times during the day when the
response time might be excessively elongated, causing applications to time out.
The distances over which synchronous replication can be deployed depend on the
capability of an application to tolerate the extensions in response time. Typically,
synchronous remote replication is deployed for distances less than 200 kilometers
(125 miles) between the two sites.
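As a rough illustration of why distance matters, the sketch below estimates the minimum latency that one round trip to the remote site adds to each write. It assumes that light travels about 200 km per millisecond in optical fiber (a common approximation) and ignores switching, protocol, and storage processing delays. The function name `added_write_latency_ms` is made up for this example.

```python
FIBER_KM_PER_MS = 200.0  # assumption: light covers roughly 200 km per ms in fiber


def added_write_latency_ms(distance_km):
    """Minimum extra latency per write: one round trip to the remote site."""
    return 2 * distance_km / FIBER_KM_PER_MS


for km in (50, 200, 1000):
    print(km, "km ->", added_write_latency_ms(km), "ms")
```

Under these assumptions, 200 km adds at least about 2 ms to every write, and the penalty grows linearly with distance.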
Notes:
Asynchronous replication enables data to be replicated across sites that are thousands of
kilometers apart.
In asynchronous replication, compute system writes are collected into a buffer (delta
set) at the source. This delta set is transferred to the remote site at regular
intervals. Adequate buffer capacity should be provisioned to perform asynchronous
replication. Some storage vendors offer a feature called delta set extension, which
enables the delta set to be offloaded from the buffer (cache) to specially configured drives.
In asynchronous replication, RPO depends on the size of the buffer, the available
network bandwidth, and the write workload to the source. This replication can take
advantage of locality of reference (repeated writes to the same location). If the
same location is written multiple times in the buffer prior to transmission to the
remote site, only the final version of the data is transmitted. This feature conserves
link bandwidth.
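The effect of locality of reference on a delta set can be sketched as follows. This is an illustrative model only: the function `build_delta_set` and the representation of a write as a `(location, data)` pair are assumptions, not part of any product.

```python
def build_delta_set(writes):
    """Collect (location, data) writes; keep only the last write per location."""
    delta_set = {}
    for location, data in writes:
        delta_set[location] = data  # a later write overwrites an earlier one
    return delta_set


writes = [(10, "v1"), (11, "x1"), (10, "v2"), (10, "v3")]
delta = build_delta_set(writes)
print(len(writes), "writes received,", len(delta), "blocks transmitted")
```

Here four writes at the source result in only two blocks being sent to the remote site, because location 10 was overwritten twice before transmission.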
• Continuous Data Protection provides the capability to restore data and VMs to
any previous point-in-time (PIT).
• Continuous Data Protection solutions have the capability to replicate data across
heterogeneous storage systems.
• Continuous Data Protection supports both local and remote replication of data
and VMs to meet operational recovery and disaster recovery needs, respectively.
• Continuous Data Protection supports multisite replication, where the data can be
replicated to more than two sites using synchronous and asynchronous replication.
Notes:
Typically, the replica is synchronized with the source, and then the replication
process starts. After the replication starts, all the writes from the compute system to
the source (production volume) are split into two copies. One copy is sent to the
local Continuous Data Protection appliance at the source site, and the other copy is
sent to the production volume. Then the local appliance writes the data to the
journal at the source site and the data in turn is written to the local replica. If a file is
accidentally deleted, or the file is corrupted, the local journal enables organizations
to recover the application data to any PIT.
In remote replication, the local appliance at the source site sends the received write
I/O to the appliance at the remote (DR) site. Then, the write is applied to the journal
volume at the remote site. As a next step, data from the journal volume is sent to
the remote replica at predefined intervals. Continuous Data Protection operates in
either synchronous or asynchronous mode.
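The role of the journal in any-point-in-time recovery can be sketched in Python. The class `CdpJournal` is a hypothetical simplification: it assumes that journal entries arrive in timestamp order and that the replica image can be rebuilt by replaying writes onto a baseline copy.

```python
class CdpJournal:
    """Toy journal that records every write with a timestamp."""

    def __init__(self, baseline):
        self.baseline = dict(baseline)  # replica state when replication started
        self.entries = []               # (timestamp, block, data), in arrival order

    def record(self, timestamp, block, data):
        self.entries.append((timestamp, block, data))

    def recover(self, point_in_time):
        """Rebuild the volume image as of the given point in time."""
        image = dict(self.baseline)
        for timestamp, block, data in self.entries:
            if timestamp > point_in_time:
                break  # entries are assumed to be in timestamp order
            image[block] = data
        return image


journal = CdpJournal({"file": "v0"})
journal.record(1, "file", "v1")
journal.record(2, "file", "corrupted")
print(journal.recover(1))  # state just before the corruption
```

Recovering to time 1 returns the last good version of the file, which mirrors the accidental deletion or corruption scenario described in the notes.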
Knowledge Check
1. Which provides the ability to create fully populated point-in-time copies of LUNs
within a storage system or create a copy of an existing VM?
a. Clone
b. Snapshot
c. Pointer-based virtual replica
d. Full volume virtual replica
Data Backup
Backup Overview
A backup is an additional copy of production data, which is created and retained for
the sole purpose of recovering lost or corrupted data.
Backup Architecture
Notes:
The role of a backup client is to gather the data that must be backed up and send it to
the storage node. The backup client can be installed on application servers, mobile
clients, and desktops. It also sends the tracking information to the backup server.
The backup server manages the backup operations and maintains the backup
catalog, which contains information about the backup configuration and backup
metadata. The backup configuration contains information about when to run
backups, which client data to back up, and so on. The backup metadata
contains information about the backed-up data. The storage node is responsible
for organizing the client’s data and writing the data to a backup device. A storage
node controls one or more backup devices.
In most implementations, the storage node and the backup server run on the same
system. Backup devices may be attached directly or through a network to the
storage node. The storage node sends the tracking information about the data that
is written to the backup device to the backup server. Typically, this information is
used for recoveries. Backup targets include tape, disk, virtual disk library, and the
cloud.
Backup Operation
Recovery Operation
After the data is backed up, it can be restored when required. A recovery operation
restores data to its original state at a specific Point in Time (PIT). Typically, backup
applications support restoring one or more individual files, directories, or VMs.
Backup Granularities
Full Backup
• Full backup copies all data on the production volume to a backup device.
Full Backup-Restore
For example, a full backup is created every Sunday.
When data is lost in production on Monday, the most recent full backup, which
was created on the previous Sunday, is used to restore the production data.
Incremental Backup
Incremental backup copies the data that has changed since the last backup.
• The main advantage of incremental backups is that fewer files are backed up
daily, allowing for shorter backup windows.
Cumulative Backup
Cumulative (differential) backup copies the data that has changed since the last full
backup.
For example, the administrator created a full backup on Sunday and differential
backups for the rest of the week. The backup that is created on Monday would contain
all the data that has changed since Sunday. It would therefore be identical to an
incremental backup at this point. On Tuesday, however, the differential backup
would back up any data that had changed since Sunday (full backup). The
advantage that differential backups have over incremental is shorter restore times.
Restoring a differential backup never requires more than two copies. The tradeoff is
that as time progresses, a differential backup can grow to contain more data than
an incremental backup. Suppose that an administrator wants to restore the backup
from Tuesday. The administrator has to first restore the full backup that is created
on Sunday. After that, the administrator has to restore the backup created on
Tuesday.
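The difference in restore effort between incremental and cumulative (differential) backups can be made concrete with a short sketch. The function `restore_chain` and the `(day, kind)` tuples are hypothetical, and the sketch assumes that a schedule uses only one kind of backup between full backups.

```python
def restore_chain(backups, restore_day):
    """Return the backups needed to restore to restore_day.

    backups is an ordered list of (day, kind) tuples, where kind is
    "full", "incremental", or "cumulative".
    """
    days = [day for day, _ in backups]
    history = backups[: days.index(restore_day) + 1]
    # Start from the most recent full backup.
    last_full = max(i for i, (_, kind) in enumerate(history) if kind == "full")
    chain = [history[last_full]]
    after_full = history[last_full + 1:]
    if after_full and after_full[-1][1] == "cumulative":
        # A cumulative backup already holds every change since the full backup.
        chain.append(after_full[-1])
    else:
        # Incremental backups must be applied one after another.
        chain.extend(after_full)
    return chain


incremental_week = [("Sun", "full"), ("Mon", "incremental"), ("Tue", "incremental")]
cumulative_week = [("Sun", "full"), ("Mon", "cumulative"), ("Tue", "cumulative")]
print(restore_chain(incremental_week, "Tue"))
print(restore_chain(cumulative_week, "Tue"))
```

Restoring Tuesday from the incremental schedule needs three copies, while the cumulative schedule needs only the Sunday full backup and the Tuesday backup.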
Agent-Based Backup
[Figure: agent-based backup, showing application servers, backup server, and backup device]
Image-Based Backup
Image-based backup makes a copy of the virtual machine disk and configuration
that is associated with a particular VM. The backup is saved as a single entity
called a VM image.
[Figure: image-based backup (VM management server, VM snapshot, proxy server, backup server, backup device; steps: create snapshot, mount snapshot, back up data)]
Notes:
This backup is used for restoring an entire VM if there is any hardware failure or
human error. It is also possible to restore individual files and folders within a virtual
machine.
In an image-level backup, the backup software can back up VMs without installing
backup agents inside the VMs or at the hypervisor level. A proxy server performs the
backup operations, and it acts as the backup client. The proxy server offloads the
backup processing from the VMs.
A snapshot captures the configuration and virtual disk data of the target VM
and provides a point-in-time view of the VM.
The proxy server then performs backup by using the snapshot. Performing an
image-level backup of a virtual machine disk enables running a bare metal restore
of a VM.
Some vendors support a changed block tracking mechanism. This feature
identifies and tags any blocks that have changed since the last VM snapshot. This
method enables the backup application to back up only the blocks that have
changed, rather than backing up every block.
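Changed block tracking can be illustrated with a small Python model. The class `TrackedDisk` is hypothetical, and real implementations track changes inside the hypervisor or storage layer, so this only shows the idea of tagging blocks and backing up the tagged ones.

```python
class TrackedDisk:
    """Toy virtual disk that tags blocks written since the last backup."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.changed = set()  # block numbers written since the last backup

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.changed.add(block_no)  # tag the block as changed

    def backup_changed_blocks(self):
        """Return only the changed blocks, then reset the tracking."""
        delta = {n: self.blocks[n] for n in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(["a", "b", "c", "d"])
disk.write(2, "C")
first = disk.backup_changed_blocks()
second = disk.backup_changed_blocks()
print(first, second)
```

The first backup after the write copies one block out of four, and a second backup with no intervening writes copies nothing.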
[Figure: backup data sent from the data center to cloud resources]
Organizations must regularly protect the data to avoid losses, stay compliant, and
preserve data integrity. They may face IT budget and IT management challenges.
Cloud-based data protection can help address these challenges.
Knowledge Check
1. Which backup component manages the backup operations and maintains the
backup catalog?
a. Backup client
b. Backup target
c. Backup server
d. Backup device
Data Deduplication
Deduplication
Data Deduplication is the process of detecting and identifying the unique data
segments within a given set of data to eliminate redundancy.
The space reduction ratio is the ratio of the amount of data before deduplication to the amount of data after
deduplication. This ratio is typically depicted as “ratio:1” or “ratio X” (10:1 or 10 X).
For example, if 200 GB of data consumes 20 GB of storage capacity after data
deduplication, the space reduction ratio is 10:1.
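A minimal sketch of segment-level deduplication and the ratio calculation is shown below. The choice of SHA-256 fingerprints and pre-cut segments is an assumption made for illustration, since the text does not say how segments are identified, and the function names are hypothetical.

```python
import hashlib


def deduplicate(segments):
    """Store each unique segment once, keyed by its SHA-256 fingerprint."""
    store = {}
    recipe = []  # fingerprints needed to rebuild the original data
    for segment in segments:
        fingerprint = hashlib.sha256(segment).hexdigest()
        store.setdefault(fingerprint, segment)
        recipe.append(fingerprint)
    return store, recipe


def space_reduction_ratio(size_before, size_after):
    """Ratio of data before deduplication to data after deduplication."""
    return size_before / size_after


segments = [b"AAAA", b"BBBB", b"AAAA", b"AAAA"]
store, recipe = deduplicate(segments)
restored = b"".join(store[f] for f in recipe)
print(len(store), "unique segments out of", len(segments))
print(space_reduction_ratio(200, 20))  # the 200 GB to 20 GB example
```

The recipe of fingerprints is what allows the original data to be rebuilt from the unique segments, and the last line reproduces the 10:1 example from the text.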
Notes:
Many files are common across multiple systems in a data center environment.
Many users across an environment store identical files such as Word documents,
PowerPoint presentations, and Excel spreadsheets. Backups of these
systems contain many identical files. Also, many users keep multiple versions of
files that they are working on. Many of these files differ only slightly from other
versions, but are seen by backup applications as new data that must be protected.
This redundant data creates several challenges for organizations. Backing
up redundant data increases the amount of storage that is required to protect the
data, which in turn increases the storage infrastructure cost. It is important for
organizations to protect the data within a limited budget. Organizations are
running out of backup window time and facing difficulties meeting recovery
objectives. Backing up a large amount of duplicate data to a remote site or the cloud
for DR purposes is also cumbersome and requires substantial bandwidth.
• As data deduplication reduces the amount of content in the daily backup, users
can extend their retention policies. This approach can have a significant benefit to
users who require longer retention.
• By using data deduplication at the client, redundant data is removed before the
data is transferred over the network. This approach reduces the network bandwidth
that is required for sending backup data to a remote site for DR purposes.
Deduplication at Source
[Figure: deduplication at source, with a deduplication agent on the VMs]
Deduplication at Target
Knowledge Check
Data Archiving
Notes:
Data in the primary storage is actively accessed and changed. As data ages, it is
less likely to change and eventually becomes “fixed” but continues to be accessed
by applications and users. This data is called fixed data. Fixed data is growing at
over 90 percent annually. Keeping the fixed data in primary storage systems poses
several challenges.
Data archiving is the process of moving fixed data that is no longer actively
accessed to a separate lower-cost archive storage system for long-term retention
and future reference. With archiving, the capacity on expensive primary storage
can be reclaimed by moving infrequently accessed data to lower-cost archive
storage.
• The data archiving operation involves the archiving agent, the archive server
(policy engine), and the archive storage.
• The archiving agent scans primary storage to find files that meet the archiving
policy.
− The archive server indexes the files.
• Once the files have been indexed, they are moved to archive storage and small
stub files are left on the primary storage.
− The stub file contains the address of the archived file. As the size of the stub file is
small, it saves space on primary storage.
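The archiving operation above can be sketched with plain dictionaries standing in for primary and archive storage. Everything here is hypothetical: the age-based policy, the `archive://` address scheme, and the function name `archive_files` are assumptions chosen to keep the example short.

```python
def archive_files(primary, archive, min_age_days):
    """Move files older than min_age_days to archive storage, leaving stubs.

    primary maps file name -> {"age_days": int, "data": bytes};
    archive maps archive address -> bytes.
    """
    index = []
    for name, entry in list(primary.items()):
        if "stub" in entry or entry["age_days"] < min_age_days:
            continue                       # already archived, or too recent
        address = f"archive://{name}"      # hypothetical address scheme
        index.append(name)                 # the archive server indexes the file
        archive[address] = entry["data"]   # move the content to archive storage
        primary[name] = {"stub": address}  # leave a small stub behind
    return index


primary = {
    "report.doc": {"age_days": 400, "data": b"old content"},
    "draft.doc": {"age_days": 3, "data": b"new content"},
}
archive = {}
indexed = archive_files(primary, archive, min_age_days=365)
print(indexed, primary["report.doc"])
```

After the run, the old report exists on primary storage only as a stub holding the archive address, while the recent draft is left untouched.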
• Email archiving is the process of archiving email messages from the mail server
to archive storage.
− After the email is archived, it is retained for years, based on the retention
policy.
• Email archiving helps organizations respond to legal disputes and meet
government compliance requirements.
− For example, an organization must produce all email messages from all
individuals that are involved in stock sales or transfers. Failure to comply
with these requirements could cause an organization to incur penalties.
• Email archiving provides more mailbox space by moving old email messages to
archive storage.
Knowledge Check
1. Which archiving component scans primary storage to find files that meet the
archiving policy?
a. Archiving agent
b. Archiving storage
c. Archiving client
d. Archiving policy engine
Data Migration
Hypervisor-Based Migration
[Figure: VM migration from Compute System 1 to Compute System 2 over the network, with the storage system]
In this type of migration, virtual machines (VMs) are moved from one physical
compute system to another without any downtime.
VM Storage Migration
[Figure: VM storage migration between two storage systems, with the compute system connected over the network]
In a VM storage migration, VM files are moved from one storage system to another
system without any downtime or service disruption.
SAN-based Migration
[Figure: SAN-based migration, with push and pull operations between the control device and the remote device]
NAS-based Migration
• NAS-based migration moves file-level data between NAS systems over LAN or
WAN.
In this example, the new NAS system initiates the migration operation and pulls the
data directly from the old NAS system over the LAN. The key advantage of NAS to
NAS direct data migration is that there is no need for an external component (host
or appliance) to perform or initiate the migration process.
• While the files are being moved, clients can access their files non-disruptively.
− Clients can also read their files from the old location and write them back to
the new location without realizing that the physical location has changed.
• A virtualization appliance creates a virtualization layer that eliminates the
dependencies between the data that is accessed at the file level and the
location where the files are physically stored.
Knowledge Check
1. Which migration moves file-level data between file servers over LAN or WAN?
a. NAS-based
b. Byte-based
c. SAN-based
d. Block-based
Concepts in Practice
Dell EMC PowerProtect Data Manager provides software-defined data protection,
automated discovery, deduplication, operational agility, self-service, and IT
governance for physical, virtual, and cloud environments.
Dell EMC Data Protection Advisor (DPA) is a reporting and analytics platform that provides full visibility into the
effectiveness of your data protection strategy. It can automate and centralize the
collection and analysis of all data.
• Features include:
− Systems can scale to Petabyte of usable capacity.
− Cloud long-term retention and cloud DR-ready.
− Provides VMware integration.
• PowerProtect Appliance supports native Cloud DR with end-to-end
orchestration.
TimeFinder SnapVX is a local replication solution for PowerMax and VMAX All Flash
storage systems with cloud-scalable snaps and clones to protect data.
− Enables Continuous Data Protection for any point in time (PIT) recovery to
optimize RPO and RTO.
− Provides synchronous (sync) or asynchronous (async) replication policies.
• VMware vSphere High Availability (HA) leverages multiple ESXi hosts that are
configured as a cluster to provide rapid recovery from outages.
− Provides high availability for applications running in virtual machines.
− Protects against a server failure by restarting the virtual machines on other
hosts within the cluster.
− Protects against application failure by continuously monitoring a virtual
machine and resetting it if a failure is detected.
• VMware vSphere Fault Tolerance (FT) provides a higher level of availability.
− Enables users to protect any virtual machine from a host failure with no loss
of data, transactions, or connections.
− Provides continuous availability by ensuring that the states of the Primary
and Secondary VMs are identical at any point in time.
− If either the host running the Primary VM or the host running the Secondary
VM fails, an immediate and transparent failover occurs.
Dell EMC Cloud Tier provides a solution for long-term retention. It uses advanced
deduplication technology to reduce the storage footprint: only unique data is sent to the
cloud, so data lands on the cloud object storage already deduplicated.
• Dell EMC Avamar enables fast, efficient backup and recovery through its
integrated variable-length deduplication technology.
• Avamar is optimized for fast, daily full backups of physical and virtual
environments, NAS servers, enterprise applications, remote offices and
desktops/laptops.
• Dell EMC Avamar is proven backup and recovery software that delivers secure
data protection for cloud, remote offices, desktops, laptops, and data centers.
Scenario
Challenges
Requirements
Deliverables
Solutions