Cohesity Best Practice CloudArchive

Cloud archive best practice

Uploaded by

jackjieli2001
Version 1.2
November 2023

CloudArchive - Best Practices Guide

ABSTRACT
This guide provides an overview of Cohesity CloudArchive, the best practices, and recommendations
for using the CloudArchive solution.

Table of Contents
CloudArchive Introduction
    CloudArchive Versions
Archival Considerations
Performance Considerations
    Right-sizing the Cluster for Archival
    Right-sizing the Cluster for Retrieval
    File Download Considerations
    Bandwidth Throttling Considerations
    Streamlining Archival Efficiency Through Protection Group Planning
    CloudRetrieve Efficiency: Metadata-First Approach for Swift Data Recovery
Data Movement Considerations
    CloudArchive Incremental Forever
    CloudArchive Incremental with Periodic Full
Migration Considerations
Recovery Considerations
Remote S3 Compatible Targets Recommendations
Technical Support and Resources
Your Feedback
About the Authors
Document Version History

Send Feedback CloudArchive - Best Practices Guide



Figures
Figure 1: Cohesity CloudArchive
Figure 2: CloudRetrieve - CloudArchive Incremental Forever format
Figure 3: CloudRetrieve - CloudArchive Incremental with Periodic Full format
Figure 4: Recovery - File Download from CloudArchive
Figure 5: CloudArchive - Bandwidth Throttling Configuration
Figure 6: CloudArchive - Archive Jobs Awaits for Backup Completion
Figure 7: CloudRetrieve - Select/Deselect Download Snapshot
Figure 8: CloudArchive - DataMovement
Figure 9: CloudArchive - DataMovement to Colder Tiers
Figure 10: CloudArchive - Enable CSP LCM


CloudArchive Introduction

Long-term data and application retention is critical to organizations that seek to prevent data loss and
meet security, legal, and compliance requirements. The exponential growth of data volumes and the
resulting IT management demands have prompted businesses to seek out more cost-effective, reliable
data storage and protection solutions.
With Cohesity, IT organizations save time by quickly archiving data to multiple targets — public clouds,
private clouds, and any S3-compatible device, while increasing operational efficiency and lowering the
total cost of ownership (TCO).
Cohesity’s CloudArchive (CA) brings data protection and recovery together in a single coherent solution,
both on-premises and in the cloud.

Figure 1: Cohesity CloudArchive


CloudArchive Versions
Cohesity CloudArchive has two versions. Review the differences between the versions and the supported
sources before you select an archival method:

CloudArchive Incremental with            CloudArchive Incremental Forever
Periodic Full                            (available from 6.6.0a onwards)

Incremental with periodic full           Incremental Forever (does not require
(data and metadata need a full           any periodic full uploads)
upload every 90 days)

Deduplication is at Protection           Deduplication is at Storage Domain
Group level                              level

NOTE: If the sources also support CloudArchive Incremental Forever archival format, Cohesity
recommends using CloudArchive Incremental Forever archival format to reduce overall costs for storing the
CloudArchive data.


Archival Considerations

• Cohesity recommends archiving data in Incremental Forever archival (CloudArchive Incremental
Forever) format. This removes the periodic full backup requirement and provides cost-effective
backup.

• To accomplish maximum storage efficiency with CloudArchive Incremental Forever, Cohesity
recommends using a single bucket rather than multiple buckets. CloudArchive Incremental Forever
deduplicates data at the Storage Domain level. It provides greater efficiency if all data is archived to
the same external target and simplifies the management of external targets, thus reducing cloud
costs.
• Cohesity recommends using multiple buckets and Storage Domains if the workloads require different
storage efficiency settings (for example, encryption, compression, or deduplication).
o For example, if a database such as Oracle TDE is already encrypted, encrypting it again
when storing the data in Cohesity storage or an external target is unnecessary.
Therefore, you may disable encryption at the Storage Domain level in these situations.
▪ If you disable encryption only at the external target level while the Storage
Domain level encryption is still active, Cohesity will encrypt the data when storing
it on Cohesity storage. Then it will have to decrypt it again when writing to the
external target.
• For workloads with different storage efficiency settings, Cohesity recommends archiving them to
separate external targets under different Storage Domains with the efficiencies either enabled or
disabled based on the source workload.

• You can also enable or disable storage efficiency features at the target level.
o For example, Oracle databases that are encrypted provide better compression and do
not offer many deduplication benefits. CloudArchive Incremental with Periodic Full allows
you to turn off source-side deduplication and enable compression.
o Please note that you cannot turn off source-side deduplication in CloudArchive
Incremental Forever.
o Re-registering the same external target with different configuration settings is not
recommended, as it will overwrite the previous settings. To ensure optimal efficiency in
such situations, Cohesity recommends using distinct buckets for archiving purposes.
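The grouping described above can be sketched in a few lines of Python. This is only an illustration, not a Cohesity API: the EfficiencySettings class, the plan_targets function, the workload names, and the bucket labels are all hypothetical. It shows workloads that share a settings profile landing on one external target, while a workload that needs different settings (here, one already encrypted at the source) gets its own Storage Domain and bucket.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EfficiencySettings:
    """Storage efficiency settings a workload needs (hypothetical model)."""
    encrypt: bool
    compress: bool
    deduplicate: bool


def plan_targets(workloads: dict[str, EfficiencySettings]) -> dict[EfficiencySettings, list[str]]:
    """Group workloads so that each distinct settings profile maps to its own
    Storage Domain and external target (bucket)."""
    plan = defaultdict(list)
    for name, settings in workloads.items():
        plan[settings].append(name)
    return dict(plan)


# Hypothetical workloads for illustration only.
workloads = {
    "vm-backups": EfficiencySettings(encrypt=True, compress=True, deduplicate=True),
    "file-shares": EfficiencySettings(encrypt=True, compress=True, deduplicate=True),
    # Already encrypted at the source (for example, Oracle TDE), so
    # encryption is disabled at the Storage Domain level.
    "oracle-tde": EfficiencySettings(encrypt=False, compress=True, deduplicate=True),
}

plan = plan_targets(workloads)
for index, (settings, names) in enumerate(plan.items(), start=1):
    print(f"bucket-{index}: {sorted(names)} -> {settings}")
```

Workloads with identical settings stay together in one bucket, which preserves the single-bucket deduplication benefit wherever the settings allow it.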


Performance Considerations

To ensure optimal performance, follow these best practices and guidelines.

Right-sizing the Cluster for Archival


The archival process involves several steps, including reading data from the source, compressing and
deduplicating it, storing the primary copy in the Cohesity cluster, and then writing it to archival storage.
These tasks can increase disk I/O and CPU utilization, potentially affecting the cluster's overall
performance if not managed effectively. To achieve optimal cluster performance, collaborate with
Cohesity to adequately size the cluster to accommodate additional nodes for meeting throughput
requirements.
Depending on the archival schedule, the Cohesity Sizer tool will incorporate extra nodes to meet the
necessary throughput. It's important to note that the Cohesity Sizer tool will introduce additional
throughput requirements only when selecting Daily Archives or more frequent schedules. If Weekly or
Monthly archival options are chosen, there will be no adjustments to the requirements, as Cohesity
expects the cluster to effectively distribute and manage the archival tasks without adversely affecting
cluster performance. However, for guidance on sizing your cluster appropriately for archival purposes and
to ensure optimal performance, we strongly recommend contacting your Cohesity Sales Engineer.
• If you intend to implement a shorter archival interval, such as archiving logs every 10 minutes, it will
undoubtedly have a noticeable impact on the overall performance of your cluster. The continuous
archival operations may lead to conflicts with other backup jobs in the cluster. Cohesity recommends
using a dedicated cluster for archival operations in such scenarios.

Right-sizing the Cluster for Retrieval


• During a retrieval operation, data is recovered to a new cluster. CloudArchive Incremental Forever
streams the data to the chosen target as it is downloaded, whereas CloudArchive Incremental with
Periodic Full follows a two-step process: it first downloads the data to the Cohesity cluster and then
recovers it to the selected target. When retrieving data stored in CloudArchive Incremental Forever
archival format, there is no specific cluster size requirement for the retrieval cluster, because
Cohesity only holds the downloaded data temporarily while recovering it to the chosen source.
Therefore, whether you use a smaller or larger (higher storage/compute) cluster, the primary
difference lies in the speed of the recovery process.
o For example, suppose you wish to CloudRetrieve 10TB of data using a new Cohesity
Cloud Edition cluster with a size of 6TB. In that case, even though the cluster size is
smaller than the amount of data to be recovered, you can still perform the recovery
without any issues. However, the recovery speed will be slower, as Cohesity will
temporarily download the data and stream it to the requested source based on the
available cluster size.


Figure 2: CloudRetrieve - CloudArchive Incremental Forever format

• When considering a CloudRetrieve operation for data archived in the CloudArchive Incremental with
Periodic Full format, confirming that the retrieval cluster possesses sufficient space to accommodate
the data you intend to recover is crucial. In this case, Cohesity temporarily downloads the entire
dataset onto the retrieval cluster before recovering to the chosen source. Thus, when dealing with
CloudArchive in the Incremental with Periodic Full archival format, it is essential to ensure that the
retrieval cluster has ample space to accommodate the data download process.
o For example, if you wish to CloudRetrieve 10TB of data using a new Cohesity Cloud
Edition cluster, you must ensure that the cluster has more available capacity (for
example, 12 TB) than the size of the data being retrieved. With the CloudArchive
Incremental with Periodic Full archival format, Cohesity first downloads all the data to
the retrieval cluster. Then the data is recovered and sent to the source. Unlike the
CloudArchive Incremental Forever format, Cohesity does not stream the data as it is
downloaded from the external target.
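The two sizing rules for the retrieval cluster can be expressed as a small pre-flight check. The sketch below is a hypothetical helper, not part of any Cohesity tool; the function name and format labels are assumptions made for illustration. It encodes only what this section states: Incremental Forever has no specific capacity requirement, while Incremental with Periodic Full needs more free capacity than the data being retrieved.

```python
INCREMENTAL_FOREVER = "incremental_forever"
PERIODIC_FULL = "incremental_with_periodic_full"


def retrieval_cluster_fits(archive_format: str, data_tb: float, free_tb: float) -> bool:
    """Return True if a retrieval cluster with free_tb of available capacity
    can CloudRetrieve data_tb of archived data (hypothetical helper)."""
    if archive_format == INCREMENTAL_FOREVER:
        # Data is streamed to the source as it is downloaded, so capacity
        # affects recovery speed, not feasibility.
        return True
    if archive_format == PERIODIC_FULL:
        # The entire dataset is downloaded to the cluster first.
        return free_tb > data_tb
    raise ValueError(f"unknown archive format: {archive_format}")


print(retrieval_cluster_fits(INCREMENTAL_FOREVER, data_tb=10, free_tb=6))
print(retrieval_cluster_fits(PERIODIC_FULL, data_tb=10, free_tb=6))
print(retrieval_cluster_fits(PERIODIC_FULL, data_tb=10, free_tb=12))
```

The three calls mirror the 10TB examples in this section: a 6TB cluster is sufficient only for the Incremental Forever format, and a 12TB cluster is sufficient for both.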

Figure 3: CloudRetrieve - CloudArchive Incremental with Periodic Full format


File Download Considerations


When initiating a recovery process in Cohesity, you can either download the file or recover it directly to
the selected source. Opting to download a large-sized file will inevitably lead to an extended download
time, regardless of the number of nodes within your cluster.

Figure 4: Recovery - File Download from CloudArchive

This occurs because:

• Cohesity needs to compress the file for downloading.


• The download operation is not distributed across multiple nodes; instead, a single node handles the
download operation.
This can adversely affect the overall recovery speed. Therefore, Cohesity recommends performing a file-
level recovery to the source instead of downloading large files.

Bandwidth Throttling Considerations


Bandwidth throttling can be a valuable tool for managing bandwidth usage and costs when archiving data
to external targets.


Figure 5: CloudArchive - Bandwidth Throttling Configuration

Cohesity suggests implementing bandwidth throttling for external targets exclusively when it's an
essential customer requirement. It's crucial to recognize that configuring bandwidth throttling can
negatively impact archival and recovery speed.
For example, bandwidth throttling can slow down the archival process when archiving a substantial
amount of data to S3 over a slow network connection. Similarly, during data recovery from S3 with a slow
network connection, bandwidth throttling can significantly extend the time needed for the recovery
process. To optimize these processes, consider the following:

• Schedule archival and recovery jobs during off-peak hours to minimize network congestion.
• To expedite the completion of large archival and recovery tasks, break them down into smaller, more
manageable jobs.
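To judge how much a throttle limit stretches an archival or recovery window, a rough transfer-time estimate is often enough. The sketch below is simple arithmetic under stated assumptions: decimal units (1 TB = 10^12 bytes), a steady transfer rate, and no allowance for deduplication, compression, or protocol overhead. The function and parameter names are illustrative and do not correspond to Cohesity settings.

```python
from typing import Optional


def transfer_hours(data_tb: float, link_mbps: float, throttle_mbps: Optional[float] = None) -> float:
    """Estimate hours needed to move data_tb terabytes over a link.

    The effective rate is the lower of the link speed and the throttle
    limit (both in megabits per second). Assumes a steady rate.
    """
    rate_mbps = link_mbps if throttle_mbps is None else min(link_mbps, throttle_mbps)
    if rate_mbps <= 0:
        raise ValueError("effective rate must be positive")
    total_bits = data_tb * 1e12 * 8          # terabytes -> bits
    seconds = total_bits / (rate_mbps * 1e6)  # megabits/s -> bits/s
    return seconds / 3600


unthrottled = transfer_hours(10, link_mbps=1000)
throttled = transfer_hours(10, link_mbps=1000, throttle_mbps=500)
print(f"10 TB unthrottled: {unthrottled:.1f} h, throttled to 500 Mbit/s: {throttled:.1f} h")
```

In this example, halving the available bandwidth doubles the estimated transfer time, which is the trade-off to weigh against the recovery SLA before enabling throttling.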


Streamlining Archival Efficiency Through Protection Group Planning
Cohesity ensures that all local backups are complete before initiating the archival process. If a Protection
Group comprises objects of varying sizes, the archival task might be held up until the largest object is
successfully backed up. Thus, Cohesity recommends organizing similar-sized workloads within a single
Protection Group. This way, the archival process can commence promptly without waiting for the largest
object to complete its backup.
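One way to plan such groupings is to sort objects by size and start a new Protection Group whenever an object is much larger than the smallest member of the current group. The sketch below is a hypothetical planning aid, not a Cohesity feature; the function name, the object names, and the max_ratio threshold of 4 are assumptions chosen for illustration.

```python
def group_by_similar_size(sizes_gb: dict[str, float], max_ratio: float = 4.0) -> list[list[str]]:
    """Split objects into groups whose largest member is at most max_ratio
    times the size of the smallest member (hypothetical planning aid)."""
    groups: list[list[str]] = []
    smallest_in_group = None
    for name, size in sorted(sizes_gb.items(), key=lambda item: item[1]):
        if smallest_in_group is None or size > smallest_in_group * max_ratio:
            # Too large for the current group: start a new Protection Group.
            groups.append([])
            smallest_in_group = size
        groups[-1].append(name)
    return groups


# Hypothetical object sizes in GB.
objects = {"vm-a": 50, "vm-b": 80, "db-1": 2000, "vm-c": 120, "nas-1": 6000}
for number, members in enumerate(group_by_similar_size(objects), start=1):
    print(f"Protection Group {number}: {members}")
```

With these sample sizes, the three small virtual machines end up in one group and the two large objects in another, so archival of the small objects does not wait for the multi-terabyte backups to finish.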

Figure 6: CloudArchive - Archive Jobs Awaits for Backup Completion

CloudRetrieve Efficiency: Metadata-First Approach for Swift Data Recovery
During the CloudRetrieve process, once you've chosen a Protection Group, Cohesity will automatically
download both the metadata and the most recent snapshot.


Figure 7: CloudRetrieve - Select/Deselect Download Snapshot

If you're uncertain about which snapshot to use for data recovery, it's advisable to opt to download only
the metadata. You can then select the appropriate snapshot during the actual recovery workflow.
Downloading the snapshot data can be time-consuming, especially when the target is a cold-tier storage
option like AWS Glacier or Deep Archive. In such situations, it's more efficient to begin by obtaining the
metadata rather than immediately downloading the latest snapshot.
NOTE: For additional performance recommendations, open a Cohesity support ticket to receive expert
advice.


Data Movement Considerations

To ensure efficient data life cycle management (LCM) with Cohesity, follow these best practices and
guidelines.

CloudArchive Incremental Forever


• Cohesity recommends data movement with CloudArchive Incremental Forever (Available from 6.8.1
onwards). The Data Movement option allows Cohesity to own the Lifecycle Management (LCM) of
data in the external target.

• You can add multiple data movement policies so long as each time the data is being down-tiered to a
lower storage class than the previous storage class (in short, uptiering is not supported; data must
always be down-tiered).

Figure 8: CloudArchive - DataMovement

• Cohesity keeps all metadata and the latest snapshot in the original/fastest access tier. When
Cohesity recovers data from an archive, it first downloads metadata, and having metadata in the
higher access tier speeds recovery operations.

Figure 9: CloudArchive - DataMovement to Colder Tiers


• Cohesity recommends that you plan your recovery service level agreement (SLA) before enabling
Data Movement. If you need to recover data often from older snapshots, then using Data Movement
is not advisable. Depending on the size of the data and the tier, it could take a significant amount of
time to transfer the data from the cold tier to the higher tier for recovery, resulting in a long-running
task. It's essential to consider these factors as best practice and plan accordingly.
o For example, if Data Movement is configured from S3 to S3 Glacier and you need to
recover a single file from an older snapshot, the recovery will take time because the
hydration time for S3 Glacier is four hours. The total time to retrieve the file then
also depends on its size, resulting in a longer recovery operation.

NOTE: Data Movement operates at the chunk file level, down-tiering only the unreferenced chunk files. On
the other hand, CloudArchive Incremental Forever deduplicates data at the bucket level, so data may be
shared between two Protection Groups. In this scenario, data is down-tiered only if it meets all associated
down-tiering policies. If incremental archives only add new data without overwriting or deleting existing
data in the source, then the latest snapshot references all the data, and because Cohesity keeps the latest
snapshot in the first tier, no data will be down-tiered. To ensure optimal results, verify how the data
changes at the source (including additions, overwrites, and deletions) before enabling Data Movement.
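The down-tier-only rule for chained data movement policies can be checked before the policies are configured. The following is a minimal sketch, assuming an AWS-style ordering of storage classes from warmest to coldest; the TIER_RANK table and the validate_data_movement function are illustrative assumptions, not Cohesity settings or API calls.

```python
# Assumed ordering from warmest (0) to coldest; adjust for the actual target.
TIER_RANK = {"STANDARD": 0, "STANDARD_IA": 1, "GLACIER": 2, "DEEP_ARCHIVE": 3}


def validate_data_movement(initial_tier: str, policy_tiers: list[str]) -> None:
    """Raise ValueError unless every policy moves data to a colder tier
    than the one before it (uptiering is not supported)."""
    current = TIER_RANK[initial_tier]
    for tier in policy_tiers:
        rank = TIER_RANK[tier]
        if rank <= current:
            raise ValueError(f"policy to {tier} is not a down-tier step")
        current = rank


# Valid chain: each step moves to a colder tier.
validate_data_movement("STANDARD", ["STANDARD_IA", "GLACIER"])

# Invalid chain: moving from GLACIER back to STANDARD_IA is uptiering.
try:
    validate_data_movement("GLACIER", ["STANDARD_IA"])
except ValueError as error:
    print(f"rejected: {error}")
```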

CloudArchive Incremental with Periodic Full


• With CloudArchive Incremental with Periodic Full, customers can use the native Cloud Service
Providers (CSP) LifeCycle Management (LCM) to move data between cloud storage tiers to reduce
long-term archival costs.

• Cohesity recommends applying lifecycle policies only to the data (cohesity_icebox_data) and keeping
metadata and SnapTree data in the higher tier. For a restore operation, Cohesity downloads
metadata and SnapTree data to determine which data chunks need to be downloaded from the
archive. Keeping them in the higher tier speeds up the restore operation from a lifecycle
policy-enabled bucket.

Figure 10: CloudArchive - Enable CSP LCM


See the AWS documentation to learn how to set up LCM on the bucket.
• Note that the option to tier the "cohesity_icebox_data" folder is unavailable in Azure. Azure will tier
everything written in the bucket and doesn't provide any options to select a specific folder for tiering.
• Data recovery from CSP LCM-enabled buckets may take longer than expected because Cohesity is not
aware of data movement that the service provider manages through LCM. During a recovery
operation, Cohesity attempts to retrieve the data from the first tier, even though the CSP's LCM may
already have down-tiered it, so the retrieval may take longer to complete. Note that Cohesity doesn't
support file-level recovery (FLR) from CSP LCM-enabled buckets. Reconstructing FLR data is more
time-consuming than performing a full recovery, and because of the long-running operation, Cohesity
fails FLR from CSP LCM-enabled buckets.
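For AWS, a lifecycle rule scoped to the data folder might look like the sketch below. It only builds the rule as a Python dictionary in the shape accepted by the S3 lifecycle configuration API. The exact key prefix under which cohesity_icebox_data appears in a given bucket, the 30-day threshold, the rule ID, and the bucket name are assumptions to verify against the actual bucket layout and the AWS documentation before use.

```python
import json


def icebox_lifecycle_configuration(days: int,
                                   storage_class: str = "GLACIER",
                                   prefix: str = "cohesity_icebox_data") -> dict:
    """Build an S3 lifecycle configuration that transitions only objects
    under the data prefix, leaving metadata and SnapTree data untouched.

    The prefix value is an assumption; confirm it against the bucket.
    """
    return {
        "Rules": [
            {
                "ID": "tier-cohesity-icebox-data",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": storage_class}],
            }
        ]
    }


config = icebox_lifecycle_configuration(days=30)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dictionary could be
# applied to a (hypothetical) bucket with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-archive-bucket", LifecycleConfiguration=config)
```

Scoping the rule with a prefix filter is what keeps metadata and SnapTree data in the higher tier, as recommended in this section.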


Migration Considerations

To ensure a smooth migration of CloudArchive Incremental with Periodic Full archives to CloudArchive
Incremental Forever format, follow these best practices and guidelines:
• Ensure that the workload supports CloudArchive Incremental Forever format from the support matrix.

• Ensure that the external target configured for CloudArchive Incremental with Periodic Full archival
also supports CloudArchive Incremental Forever from the support matrix.
• Cohesity recommends utilizing the S3 protocol for archiving with CloudArchive Incremental Forever if
the external target supports both NFS and S3 protocols, as it can deliver better performance.
• Cohesity recommends using the existing external target. Once you prepare the external target for
CloudArchive Incremental Forever, the jobs will automatically upload a new reference full in
CloudArchive Incremental Forever format during the next scheduled periodic reference full upload.
CloudArchive continues to archive the data in CloudArchive Incremental with Periodic Full format till
the following scheduled reference full backup. The old CloudArchive Incremental with Periodic Full
jobs will get garbage collected based on the retention policies.

• After migrating the jobs to CloudArchive Incremental Forever, you may notice that compared to
CloudArchive Incremental with Periodic Full, there is a relatively higher API cost for CloudArchive
Incremental Forever. This is because CloudArchive Incremental Forever writes data in small chunks
for efficient space reclamation.

• Cohesity recommends using CloudArchive Incremental Forever as even with relatively higher API
cost, in the long run, you will still save more cost with CloudArchive Incremental Forever because of
its efficient space reclamation and deduplication features.
NOTE: If you want to immediately migrate CloudArchive Incremental with Periodic Full jobs to CloudArchive
Incremental Forever, contact support to perform a planned (expedited) migration.

For more information, see the CloudArchive Migration guide.


Recovery Considerations

To ensure a smooth data restoration operation with Cohesity, follow these best practices and
guidelines:
• Cohesity recommends planning the SLA before initiating any recoveries.

• The time to retrieve data from different storage tiers varies depending on the type of storage. For
instance, if data recovery is required from S3 Glacier, the first byte of data will take a minimum of four
hours to be available for recovery. For S3 Glacier Deep Archive, the expected recovery time is 12
hours. For the Azure archive tier, it's 15 hours.

• During the planning phase, Cohesity suggests planning for the Recovery Time Objective (RTO)
before selecting a storage class for archival purposes. If frequent recovery is expected, it's not
advisable to use colder tiers.
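The retrieval delays quoted above can be turned into a quick RTO check when choosing a storage class. The sketch below uses only the first-byte times stated in this section; the function name is hypothetical, and the result ignores the time needed to transfer and recover the data itself, so the remaining hours are an upper bound on what is left for the actual recovery.

```python
# First-byte delays in hours, as stated in this guide.
FIRST_BYTE_HOURS = {
    "S3 Glacier": 4,
    "S3 Glacier Deep Archive": 12,
    "Azure archive tier": 15,
}


def tiers_within_rto(rto_hours: float) -> dict[str, float]:
    """Return the cold tiers whose first-byte delay is below the RTO,
    mapped to the hours left for transferring and recovering the data."""
    return {
        tier: rto_hours - delay
        for tier, delay in FIRST_BYTE_HOURS.items()
        if delay < rto_hours
    }


for tier, hours_left in tiers_within_rto(13).items():
    print(f"{tier}: {hours_left} h left after the first byte is available")
```

For a 13-hour RTO, only S3 Glacier and S3 Glacier Deep Archive remain candidates, and Deep Archive leaves just one hour for the transfer, which is unlikely to be enough for large datasets.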


Remote S3 Compatible Targets Recommendations

This section outlines some recommendations and best practices that help you set up and run Cohesity
CloudArchive with S3-compatible targets.
• CloudArchive Incremental Forever does not support remote S3-compatible targets. When registering
an S3-compatible target with Incremental Forever archival format, ensure that the target is hosted in
the same data center.
• For efficient space reclamation, Cohesity periodically downloads data from the S3-compatible targets,
and with a remote S3-compatible target these frequent downloads create network congestion. Hence,
Cohesity recommends the CloudArchive Incremental with Periodic Full archival format for remote
S3-compatible targets. Always refer to the documentation to verify any changes in the support for
remote S3-compatible targets with the latest versions of Cohesity.
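The format recommendations in this guide can be summarized as a short decision function. This is a hedged sketch, not an official selection tool: the function and parameter names are hypothetical, and the support flags must come from the Cohesity support matrix for the actual source and target.

```python
INCREMENTAL_FOREVER = "CloudArchive Incremental Forever"
PERIODIC_FULL = "CloudArchive Incremental with Periodic Full"


def recommended_format(source_supports_forever: bool,
                       target_supports_forever: bool,
                       s3_compatible_target: bool = False,
                       remote_target: bool = False) -> str:
    """Pick an archival format following the recommendations in this guide."""
    if s3_compatible_target and remote_target:
        # Incremental Forever does not support remote S3-compatible targets.
        return PERIODIC_FULL
    if source_supports_forever and target_supports_forever:
        # Preferred whenever supported, to reduce overall storage costs.
        return INCREMENTAL_FOREVER
    return PERIODIC_FULL


print(recommended_format(True, True))
print(recommended_format(True, True, s3_compatible_target=True, remote_target=True))
print(recommended_format(False, True))
```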


Technical Support and Resources

Cohesity Support Portal provides you access to a robust, on-demand, and detailed knowledge base,
along with high-quality services to boost your experiences with Cohesity products.

Cohesity Product Documentation provides you access to the latest product documentation to support
your deployment of Cohesity products, including technical guides and a third-party software support
matrix for Cohesity Data Protection.

Cohesity Developer Portal provides you with ready-to-use integrations with the automation and
orchestration tools you choose to streamline operations.


Your Feedback

Was this document helpful? Send us your feedback!

About the Authors

Saran Ravi is a Staff Technical Solutions Engineer at Cohesity. In his role, Saran focuses on Cloud and
Kubernetes.
Other essential contributors include:

• Adaikkappan Arumugam, Director Product Solutions
• Anirudh Kumar, Staff 2 Engineer, Dev
• Dayanand Sharma, Director Product Management
• David Jayanathan, Field Technical Director
• James White, Principal Solutions Engineer
• Kevin Hill, Manager, Solution Architects
• Shayne Wiliams, Principal Architect
• Karan Naik, Senior Site Reliability Engineer
• Bharath Nagraj, Senior Principal Field Technical Director
• Edwin Galang, Solution Architect
• Anupam Sharma, Staff 2 Engineer, QA

Document Version History

VERSION    DATE        DOCUMENT HISTORY
1.2        Nov 2023    Content update
1.1        Apr 2023    Content update
1.0        Jan 2023    First full release


ABOUT COHESITY
Cohesity radically simplifies data management. We make it easy to protect, manage, and derive value
from data -- across the data center, edge, and cloud. We offer a full suite of services consolidated on
one multicloud data platform: backup and recovery, disaster recovery, file and object services, dev/test,
and data compliance, security, and analytics -- reducing complexity and eliminating mass data
fragmentation. Cohesity can be delivered as a service, self-managed, or provided by a Cohesity-powered
partner.
Visit our website and blog, follow us on Twitter and LinkedIn and like us on Facebook.

© 2023. Cohesity, Inc. All Rights Reserved. The information supplied herein is the confidential and proprietary information of Cohesity and may only
be used (a) by the intended recipients and (b) in conjunction with validly licensed Cohesity software and services. Find the terms of Cohesity
licenses at www.cohesity.com/agreements.

Cohesity, the Cohesity logo, SnapTree, SpanFS, DataPlatform, DataProtect, Helios, the Helios logo, DataGovern, SiteContinuity, DataHawk, and
other Cohesity marks are trademarks or registered trademarks of Cohesity, Inc. in the US and/or internationally. Other company and product
names may be trademarks of the respective companies with which they are associated. This material (a) is intended to provide you information
about Cohesity and our business and products; (b) was believed to be true and accurate at the time it was written, but is subject to change without
notice; and (c) is provided on an “AS IS” basis. Cohesity disclaims all express or implied conditions, representations, warranties of any kind.
