
TECHNICAL WHITE PAPER

February 2024

VMware vSAN™ Architecture Overview & Setup
Proof of Concept (PoC) Guide

Table of contents

Introduction
    Overview
    What’s New in vSAN 8.0
    Hardware Selection
    General Approach to Testing vSAN
Prerequisites
    Hardware Compatibility
    vSAN ESA or OSA
    vSAN ESA ReadyNode™ & ReadyNode Emulated Configuration Guidance
    vSAN OSA Hardware Guidance
    vSAN Setup Assumptions
    vSAN OSA All-Flash or Hybrid
Enabling a Single vSAN HCI Cluster
    Using Quickstart to Enable Single vSAN HCI Cluster
    Manually Enabling vSAN Services on a Cluster
    Enabling vSAN Services on a VMware Cloud Foundation™ based Cluster
Enabling vSAN Max™ - Disaggregated Storage
    vSAN Max Sizing Considerations
    Disaggregated Storage for vSAN OSA (AKA: HCI Mesh)
    Using Quickstart to Enable vSAN Max Cluster
    Manually Enabling vSAN Max on a Cluster
    Enabling vSAN Max/HCI Mesh Services on a VMware Cloud Foundation™ based Cluster
Enabling vSAN Compute Cluster
    Overview
    Enabling vSAN Compute Clusters in VMware Cloud Foundation
Enabling Two-Node and vSAN Stretched Clusters
    Overview
    Enabling vSAN Stretched Clusters in VMware Cloud Foundation
Check the vSAN Cluster Thoroughly
    Why Is This Important?
    Use Health Check to Verify vSAN Functionality
    Manually Checking against the VCG
vSAN Basics
    Deploy your first Virtual Machine
    Verifying Disk Layout of a VM on vSAN
    Configuring Fault Domains
    Creating a Snapshot
    Clone a Virtual Machine
    vMotion a Virtual Machine Between Hosts
    VMware vSphere® Storage vMotion® (Storage vMotion) VM Between Datastores
VM Storage Policies and vSAN
    Create a New VM Storage Policy
    Deploy a new VM with a New Storage Policy
    Assign a new Storage Policy to an Existing VM
    Modify a VM Storage Policy
    IOPS Limits
    Space Efficiency & Encryption Defined by Policy (vSAN ESA)
APPENDIX A: vSAN RDMA Configuration & Troubleshooting
    Overview
    Configuration Example
    RDMA troubleshooting
APPENDIX B: Cleanly Removing vSAN Configuration
    vCLS Retreat Mode
    Remove vSAN Partitions and Clear Data



Introduction
Overview
This document primarily details the overall architecture and setup of vSAN Express Storage Architecture™ (ESA) cluster
environments. vSAN Original Storage Architecture™ (OSA) environments are covered where they differ from vSAN ESA. The
specific focus of the document includes overviews of:

● Single vSAN HCI cluster deployments (ESA and OSA)


● vSAN Max™ cluster deployments (disaggregated storage)
● vSAN Compute cluster deployments
● vSAN Two-Node and Stretched Cluster deployments.

Additionally, the document includes reviews of basic functions such as vSAN storage policies and virtual machine deployment.

Separate guides are dedicated to more detailed discussions of specific vSAN related areas:

● vSAN Proof of Concept: vSAN Management, Monitoring & Hardware Testing


● vSAN Proof of Concept: vSAN Features
o Space efficiency features (e.g., compression, deduplication, RAID-5/RAID-6 erasure coding, and
Trim/Unmap)
o Encryption
o File Services
● vSAN Proof of Concept: vSAN Performance Testing
● vSAN Proof of Concept: vSAN Stretched Cluster & Two-Node Overview & Testing

What’s New in vSAN 8.0


vSAN 8.0 is a major milestone release that adds a new architecture option for new vSAN clusters, so there are now two
distinct ways to configure a vSAN cluster. vSAN 8.0 introduced ESA, which optimizes the use of certified NVMe drives in a
single tier with a dynamic approach to data placement. vSAN 8.0 Update 2 introduces vSAN Max, a disaggregated deployment
model that provides scalable, cost-effective storage for your cloud infrastructure. This approach caters to three primary use
cases: driving cost optimization for infrastructure and applications, ensuring operational simplicity with unified storage, and
facilitating rapid scaling for cloud-native applications. The classic OSA model, carried forward from vSAN 7, remains an option.

For a comprehensive overview of ESA, visit: https://core.vmware.com/vsan-esa.

Updates to the OSA include:

● Buffer device capacity increases from 600GB to 1.6TB
● For HCI Mesh, the number of server and client clusters increases from 5 to 10
● Improvements in vSAN File Services operations
● vSAN boot time optimizations


Hardware Selection
Choosing the appropriate hardware is one of the most important factors in the successful validation of vSAN. Hardware
recommendations will differ between ESA and OSA deployments.

There are many hardware variables (drivers, controller firmware versions), so be sure to choose hardware that is listed on
the VMware Compatibility Guide. See the ‘Prerequisites’ section of this guide for more information.

General Approach to Testing vSAN


Once the appropriate hardware is selected for testing, it is useful to define the use case, goals, expected results and success
metrics.

The testing lifecycle can be broken into three phases:

Day-0

The post-design phase: installation of the infrastructure components, including the hypervisor VMware ESXi™ (ESXi); the control
plane, VMware vCenter® (vCenter); physical network uplinks and upstream network devices; essential services such as DNS
and NTP; VLANs; etc.

Day-1

Setup and configuration of the required solution (in this case, vSAN).

Day-2

Operational aspects and monitoring. The most important aspects to validate when testing:

● Successful vSAN configuration and deployment


● VMs successfully deployed to vSAN Datastore
● Reliability: VMs and data remain available in the event of failure (host, disk, network, power)
● Serviceability: Maintenance of hosts, disk groups, disks, clusters
● Performance: vSAN and the selected hardware can meet application as well as business needs
● Validation: vSAN features are working as expected (File Service, Deduplication and Compression, RAID-5/6,
checksum, encryption)
● Day-2 Operations: Monitoring, management, troubleshooting, and upgrades


These can be grouped into three common types: resiliency testing, performance testing, and operational testing.

Resiliency Testing
As with any storage solution, failures can occur on hardware components at any time due to age, temperature, firmware, etc.
Such failures can occur among storage controllers, disks, nodes, and network devices among other components. As a
software solution, vSAN is designed to be resilient against these failures. In this guide, we will examine how to systematically
test against disk, host, network, and control plane failures.

Operational Testing
Understanding how the solution behaves during normal (or “day two”) operations is important to consider as part of the
evaluation. Fortunately, because vSAN is embedded within the ESXi hypervisor, many vSAN operations tasks are simply
extensions of normal VMware vSphere® (vSphere) operations. Adding hosts, migrating VMs between nodes, and cluster
creation are some of the many operations that are consistent between vSphere and vSAN.

Performance Testing
Before embarking on testing, it is important to set clear objectives and understand the performance requirements of the
environment. Close attention to details such as workload I/O profiles, latency and hotspots is needed. In this guide, we will
explore how to conduct performance tests with a consistent, methodological approach.


Prerequisites
Hardware Compatibility
Plan on testing a reasonable hardware configuration resembling a production-ready environment that suits your business
requirements. Refer to the VMware vSAN Design and Sizing Guide for information on design configurations and
considerations when deploying vSAN.

As vSAN is a software solution, it is critical to ensure that well supported, enterprise class hardware components are used. The
VMware Compatibility Guide (or “VCG”) lists components that have been tested and certified for use with vSAN. In addition,
the vSAN Hardware Compatibility List (or “HCL”) lists hardware compatibility specific to vSAN.

Note: The terms “VCG” and “HCL” may be used interchangeably (both within this guide, and in other documentation), but
essentially pertain to hardware compatibility.

Note: If using a VMware vSAN ReadyNode™ (ReadyNode) or appliance, the factory-installed hardware is guaranteed to
be compatible with vSAN. However, be mindful that BIOS updates and firmware and device driver versions may be out of
date and should be checked.

vSAN ESA or OSA


If using a certified ReadyNode or vSAN appliances with specific certified devices, then vSAN ESA (which has an optimized
data path and placement for NVMe devices) is an option.

The table below summarizes the minimum requirements for each architecture:

vSAN 8.0 ESA (minimums):

● Storage devices: 4 devices per host
● Hardware support: ReadyNodes, appliances, or build your own with certified devices (only vSAN ESA certified NVMe devices)
● Networking: 25Gbps

vSAN 8.0 OSA (minimums):

● Storage devices: 1 capacity device + 1 cache device per host
● Hardware support: ReadyNodes, appliances, or build your own with certified devices (any SATA, SAS, or NVMe certified device)
● Networking: 10Gbps

Note that 25Gbps networking (or faster) is required for vSAN ESA. The one exception is the vSAN-ESA-AF-0 ReadyNode
configuration, which allows for 10Gbps networking. It is designed for Edge, 2-node, and other small environments with
relatively few VMs, and minimal workload demands. For more information, please refer to:
https://core.vmware.com/blog/esa-all-platforms-and-all-workloads-esa-af-0
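As a quick sanity check of the networking minimums, the link speed of each physical uplink can be read from a host’s ESXi shell. A minimal check; vmnic0 below is just an example adapter name:

esxcli network nic list            # lists all vmnics with their link status and speed
esxcli network nic get -n vmnic0   # detailed view for one uplink (replace vmnic0 with your adapter)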

vSAN ESA ReadyNode™ & ReadyNode Emulated Configuration Guidance


Customers can pick a server manufacturer’s ESA certified ReadyNode platform listed on the VMware Compatibility Guide
(VCG) for vSAN ESA, and can build out a server configuration using hardware components as they are listed on the given
ReadyNode specification, emulating a ReadyNode purchased as a single SKU. This approach can help customers who
choose not to purchase a ReadyNode through an official SKU but have the same or better hardware than that found in the
desired ReadyNode classification.


Here is the direct link to the VMware Compatibility Guide vSAN ESA page:
https://www.vmware.com/resources/compatibility/search.php?deviceCategory=vsanesa

For further vSAN ESA hardware guidance, see:

● vSAN ESA ReadyNode Hardware Guidance - https://www.vmware.com/resources/compatibility/vsanesa_profile.php


● KB article on vSAN ESA architecture - https://kb.vmware.com/s/article/90343
● Support for ReadyNode Emulated Configurations in vSAN ESA Blog - https://core.vmware.com/blog/support-readynode-emulated-configurations-vsan-esa


vSAN OSA Hardware Guidance


As with ESA deployments, OSA supports both prescriptive ReadyNode and customized architectures. Below is the direct link to
the VMware Compatibility Guide vSAN OSA page:
https://www.vmware.com/resources/compatibility/search.php?deviceCategory=vsanosa

For further vSAN OSA hardware guidance, see:

● vSAN OSA Hardware Quick Reference Guide - https://www.vmware.com/resources/compatibility/vsan_profile.html?locale=
● KB article on vSAN OSA architecture - https://kb.vmware.com/s/article/52084


vSAN Setup Assumptions


This document assumes a generic vSphere deployment (one vCenter and at least four ESXi hosts). This document does not
assume the test environment is deployed using the VMware Cloud Foundation™ (VCF) model. If the VCF model is used, there
may be an impact to the information in this guide. Those impacts are noted and references to the proper VCF versions are
provided.

The list below details the setup requirements prior to using this guide:

● Hosts with supported hardware and firmware (see above)


● All hosts with ESXi 8.0 Update 2 (build number 22380479) or newer deployed
● vCenter Server 8.0 Update 2 (build number 22617221) or newer has been deployed to manage these ESXi hosts
● DNS and NTP services are available
● The hosts should not be added to any cluster
● If using the Quickstart workflow
o Each host should have a management network VMkernel
o VMware vSphere® vMotion (vMotion) and vSAN networks will be configured during the workflow
● If not using the Quickstart workflow, each host should have a management and a vMotion network VMkernel port
already configured
● A set of IP addresses
o One per ESXi host will be needed for the vSAN network VMkernel ports (the recommendation is that these
are all on the same VLAN and IPv4 or IPv6 network segment)
o One per ESXi host will be needed for the vMotion network VMkernel port (the recommendation is that these are
all on the same VLAN and IPv4 or IPv6 network segment)
● If possible, configure internet connectivity for vCenter such that the HCL database can be updated automatically.
(Internet connectivity is also a requirement to enable Customer Experience Improvement Program or CEIP)
● Optionally, for the purposes of testing Storage vMotion operations, an additional datastore type (such as NFS or
VMFS) should be presented to all hosts
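Most of these prerequisites can be spot-checked from each host’s ESXi shell before you begin. A minimal sketch, assuming SSH access to the hosts; vmk0 and vmk1 are typical (not mandatory) names for the management and vMotion VMkernel ports:

esxcli system version get              # confirm ESXi 8.0 Update 2 or newer
esxcli network ip dns server list      # confirm DNS servers are configured
esxcli network ip interface ipv4 get   # list VMkernel ports (e.g., vmk0 management, vmk1 vMotion) and their IPs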

vSAN OSA All-Flash or Hybrid


When reviewing whether to deploy all-flash or hybrid configurations (all-flash recommended), there are several factors to
consider:

● All-Flash vSAN requires at least a 10Gb Ethernet network


● Flash devices are used for both cache and capacity
● Deduplication and Compression are space-efficiency features available in all-flash configuration and not available with
hybrid configuration
● Erasure Coding (RAID 5/ RAID 6) is a space efficiency feature available on all-flash configuration only
● Flash read cache reservation is not used with all-flash configurations; reads come directly from the capacity tier SSDs
● Endurance and performance classes now become important considerations for both cache and capacity layers


Enabling a Single vSAN HCI Cluster


Using Quickstart to Enable Single vSAN HCI Cluster
Follow this section to configure a single vSAN HCI cluster via the Quickstart process (recommended).

Note:

● If you wish to manually setup a vSAN HCI Cluster, please refer to the Manually Enabling vSAN Services on a Cluster
section
● If you deployed vSAN using the bootstrap feature of vCenter deployment, you will not be able to use Quickstart to
configure your environment
● vSAN Max deployments (disaggregated storage) require a specific vSAN ESA deployment. If you plan to test vSAN
Max, please skip this section and go directly to the Enabling vSAN Max - Disaggregated Storage section
● ESA and OSA HCI deployment steps are similar

● The walkthrough below calls out where OSA and ESA steps may differ.

Initialize Cluster
Navigate to your Datacenter > Click Actions > New Cluster.

The New Cluster screen pops up and we are presented with a dialog to enable services. Provide a name for the cluster and
select vSAN from the list of services. We are also able to enable vSAN ESA (the default).

Note: Once the cluster is created with the ESA flag, it cannot be changed unless the cluster is re-created.

Be aware that:

● If targeting an OSA deployment, the “Enable vSAN ESA” checkbox must be unchecked
● For the Quickstart workflow to configure the vMotion VMkernel, vSphere DRS must be set to enabled


vSAN ESA Example:

vSAN OSA Example:


We can also setup the cluster to use a single image (thereby enabling vLCM). For more information on vLCM, see:
https://core.vmware.com/resource/introducing-vsphere-lifecycle-management-vlcm.

Quickstart - Cluster Basics


The initial cluster creation above initializes the Quickstart process. Once the cluster has been created, navigate to [vSAN
Cluster] > Configure > Quickstart. On this screen you will be able to confirm the basic services selected previously then move
to the add hosts and configuration phases.

vSAN ESA Example:

vSAN OSA Example:


Quickstart - Add Hosts


The next step is to add hosts. The process for ESA and OSA is identical: clicking the 'Add' button in the 'Add hosts' section
presents the dialog below. Multiple hosts can be added at the same time (by IP or FQDN). Additionally, if the credentials of
every host are the same, tick the checkbox above the list to quickly complete the form. If the hosts are already in vCenter, they
will appear in the ‘Existing hosts’ tab.

Note: It may be useful to leave one host out of the configuration here to later demonstrate cluster expansion.

Once the host details have been entered, click Next. You are then presented with a dialog showing the thumbprints of the
hosts. If these are as expected, tick the checkbox(es) and then click OK.


A summary will be displayed, showing the vSphere version on each host and other details. Verify the details are correct:


On the next screen, we have the option to import an ESXi image from a host (to set as the cluster’s new image). Select an
option and continue.


Finally, review and click Finish if everything is in order.

After the hosts have been added, validation will be performed automatically on the cluster. Check for any errors and
inconsistencies and re-validate if necessary.


Quickstart - Configure Cluster


The next step is to configure the vSAN HCI cluster. After clicking on Configure under Step 3: Configure Cluster, the Configure
Cluster workflow will start. For ESA deployments the first step will ask whether this is a vSAN HCI or vSAN Max deployment.
Ensure the vSAN HCI radio button is selected. OSA deployments do not have this screen. Instead, they start at the
Distributed Switches step.

Step 1: Select Cluster Type:

Step 2: Configure Distributed Switches:

The next dialog allows for the configuration of the distributed switch(es) for the cluster. Leave the default 'Number of
distributed switches' set to 1 and assign a name to the switch and port groups.


Scroll down and configure the port groups and physical adapters as needed.

Steps 3 & 4: Configure vMotion and Storage Traffic:

On the next two screens, set the VLAN and IP addresses to be used for the vMotion and vSAN for each host. The ‘autofill’
function can be used to input consecutive IP addresses.


Step 5: Configure Advanced Options:

Here we select ‘Single site cluster’ as the deployment type. For stretched clusters, refer to the stretched cluster guide.

Auto-Policy Management is a vSAN ESA specific feature that ensures resilience settings for the environment are optimally
configured automatically. Enabling this feature does not preclude the creation of custom storage policies as needed post
deployment. For this deployment Auto-Policy management is set to the default “enabled.”

Note: OSA deployments do not support Auto-Policy Management or the RDMA support features. These options will be
grayed out in an OSA deployment.

For more information please review: https://core.vmware.com/blog/auto-policy-remediation-enhancements-esa-vsan-8-u2


Ensure that an NTP server is added to the configuration (under ‘Host Options’). As vSAN is a distributed system, features may
not work as expected if there is a time drift between servers (ideally, also ensure that vCenter has the same time source).
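Time synchronization can be double-checked per host from the ESXi shell; a small, hedged sketch (output fields vary by build):

esxcli system ntp get       # shows the configured NTP servers and whether the NTP service is enabled
esxcli hardware clock get   # shows this host's clock (UTC); compare across hosts and vCenter to spot drift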


Step 6: Claim Disks:

On the next screen, select the disks to use. For a vSAN OSA cluster, the system will attempt to select disks automatically as
cache or capacity (on a vSAN ESA cluster, there is no tier selection). As the ESA example images indicate, the workflow checks
drive compatibility and warns if an issue is detected.

vSAN ESA example:


vSAN OSA Example:
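If a device you expect to claim is missing from this screen, eligibility can be checked from the host’s ESXi shell before continuing. A minimal check, run per host:

vdq -q                            # queries local devices and reports their vSAN eligibility (existing partitions block eligibility)
esxcli storage core device list   # lists devices with model, size, and SSD/NVMe attributes for comparison against the VCG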


Step 7: Review:

Finally, check everything is as expected and click Finish.

Monitor the creation in the task view and wait for the cluster to be formed.

Using Quickstart, we have created a cluster with vSAN enabled with the correct networking in place.
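As an optional cross-check, cluster membership can be confirmed from any host’s ESXi shell; the member count should match the number of hosts added through Quickstart:

esxcli vsan cluster get    # shows the vSAN cluster UUID, the local node state, and the number of member hosts
esxcli vsan network list   # shows which VMkernel interface on this host is tagged for vSAN traffic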

Manually Enabling vSAN Services on a Cluster


Note: If Quickstart was used (as per the earlier section) then this section can be skipped.

Manual vSAN enablement is available for those that do not wish to use the Quickstart process.

For this scenario, please follow the Manually Enabling vSAN instructions on the VMware Docs page linked below:
https://docs.vmware.com/en/VMware-vSphere/8.0/vsan-planning/GUID-53571374-B3E5-4767-A372-FEB7C995AF71.html

Enabling vSAN Services on a VMware Cloud Foundation™ based Cluster


VCF includes dedicated processes to automate the deployment and configuration of core infrastructure including vSAN
services. In fact, these processes are required and are the only supported methods within VCF. This applies to standing up
the initial Management Domain, a subsidiary Workload Domain, or a cluster with a Domain.

For more information on enabling vSAN services within the VCF Model please review these links:

● vSAN Storage with VCF – https://docs.vmware.com/en/VMware-Cloud-Foundation/5.1/vcf-admin/GUID-766FBED1-6FE8-46D0-99C5-62355478F2CF.html
● VCF Admin Guide main page – https://docs.vmware.com/en/VMware-Cloud-Foundation/5.1/vcf-admin/GUID-D5A44DAA-866D-47C9-B1FB-BF9761F97E36.html


● Management Domain Deployment Walkthrough – https://core.vmware.com/cloud-foundation-automation#domain-operations
● Workload Domain Deployment Walkthrough – https://core.vmware.com/cloud-foundation-automation#domain-operations
● Add a Cluster to an existing Workload Domain – https://core.vmware.com/cloud-foundation-automation#cluster-operations


Enabling vSAN Max™ - Disaggregated Storage


What is vSAN Max? It is VMware’s disaggregated storage offering, providing petabyte-scale centralized shared storage for
your vSphere clusters.

Built on the foundation of vSAN ESA, vSAN Max is a fully distributed architecture in which access to data is not funneled
through centralized I/O controllers but instead uses the full power of each node (host) in the vSAN Max cluster. The aggregated
resources of all hosts in a vSAN Max cluster contribute to the processing of I/O, so adding hosts scales capacity and
performance linearly.

For additional technical information, please review:

● Main vSAN Max informational page – https://core.vmware.com/vsan-max


● vSAN Max Design & Operations Guide – https://core.vmware.com/resource/vsan-max-design-and-operational-guidance
● vSAN Max introduction blog post – https://core.vmware.com/blog/introducing-vsan-max
● vSAN Max scalability blog post – https://core.vmware.com/blog/vsan-max-and-advantage-scalability

vSAN Max Sizing Considerations


When sizing vSAN Max deployments, consider that vSAN Max clusters support:

● A maximum of 32 ESXi hosts in the cluster (24 ESXi hosts recommended)


● A maximum of 10 compute clusters mounting to a single vSAN Max cluster
● A maximum of 128 total ESXi hosts (both within the vSAN Max cluster and the vSAN Compute clusters connecting to a
single vSAN Max datastore)

Note: Limiting the vSAN Max cluster size to 24 ESXi hosts will allow for up to 104 ESXi hosts from vSAN compute clusters to
mount the datastore, offering a 4.3:1 ratio. A vSAN Max cluster size of 32 ESXi hosts would allow for up to 96 ESXi hosts
from vSAN compute clusters to mount the datastore, offering a storage ratio of 3:1.

Disaggregated Storage for vSAN OSA (AKA: HCI Mesh)


Although vSAN Max is explicitly a vSAN ESA function, vSAN OSA deployments still support disaggregated storage.

vSAN OSA datastores can be shared between two vSAN clusters, utilizing vSAN’s native data path for cross-cluster
connections. Compute Only Clusters are also supported.

Each vSAN OSA client cluster can mount a maximum of ten remote vSAN OSA datastores. A vSAN OSA server cluster can
export its datastore up to a maximum of ten client clusters.

All vSAN features are supported except for Data-in-Transit encryption, Cloud Native Storage (including vSAN Direct),
Stretched Clusters, and 2-Node Clusters. Additionally, HCI Mesh will not support remote provisioning of File Services Shares,
iSCSI volumes, or First-Class Disks (FCDs). File Services, FCDs, and the iSCSI service can be provisioned locally on clusters
participating in a mesh topology but may not be provisioned on a remote vSAN datastore.

The same MTU sizing is required for both the Client and Server clusters.


Using Quickstart to Enable vSAN Max Cluster


vSAN Max leverages vSAN ESA; as such, the initial enablement process is very similar to the steps reviewed in the Using
Quickstart to Enable Single vSAN HCI Cluster section of this document.

Initialize Cluster
Navigate to your Datacenter > Click Actions > New Cluster.

The New Cluster screen pops up and we are presented with a dialog to enable services. Provide a name for the cluster and
select vSAN from the list of services. Ensure that vSAN ESA (the default) is selected. For the Quickstart workflow to configure
the vMotion VMkernel, vSphere DRS must be set to enabled.

We can also setup the cluster to use a single image (thereby enabling vLCM). For more information on vLCM, see:
https://core.vmware.com/resource/introducing-vsphere-lifecycle-management-vlcm.


Quickstart – Cluster Basics


The initial cluster creation above initializes the Quickstart process. Once the cluster has been created, navigate to [vSAN
Cluster] > Configure > Quickstart. On this screen you will be able to confirm the basic services selected previously then move
to the add hosts and configuration phases.

Quickstart – Add Hosts


The Adding Hosts steps for vSAN Max are identical to those for a single vSAN HCI cluster. Refer to the steps documented in
the Enable a Single vSAN HCI, Quickstart – Add Hosts section of this document.

Quickstart – Configure Cluster


The next step is to configure the vSAN Max cluster. After clicking on Configure under Step 3: Configure Cluster, the Configure
Cluster workflow will start. Ensure that vSAN Max is selected.


Step 1: Select Cluster Type

Steps 2 -7: Configuring the Cluster


The remaining steps to configure the vSAN Max cluster are identical to those for a single vSAN HCI cluster. Refer to the steps
documented in the Enable a Single vSAN HCI, Quickstart – Configure Cluster section of this document.

After the new vSAN Max cluster creation completes, navigate to [vSAN Cluster] > Configure > vSAN > Services. The screen
will show that the vSAN Max cluster is ready to provide disaggregated storage to vSAN Compute clusters.


Now navigate to [vSAN Cluster] > Configure > vSAN > Remote Datastore. This screen shows the name of the remote
datastore created by the vSAN Max cluster configuration workflow. The datastore name is a default name. If you wish to
rename this datastore please refer to the Post-Configuration – Renaming vSAN Datastore section of this document.

If you do not wish to rename the datastore, you are now ready to configure a vSAN Compute cluster and mount this datastore
to the Compute cluster. Go to the Enabling vSAN Compute Cluster section.


Post-Configuration – Renaming vSAN Datastore (optional)


Once the vSAN Max cluster creation completes, the vSAN Max datastore is ready to be shared with vSAN Compute Clusters.
The datastore will have the default name of “vsanDatastore.” If the default name is not suitable for your environment, use
these steps to rename the datastore as needed.

Navigate to [vSAN Cluster] > Datastores. Once on that screen, filter on vSAN (to make it easier to find the new datastore
otherwise one may see the local datastore for each cluster host as well). Then right-click on the vSAN datastore and select
Rename.

This will open the Rename workflow. In the workflow, rename the datastore as needed, then select OK.

The datastore will now reflect the newly assigned name.


You are now ready to configure a vSAN Compute cluster and mount this datastore to the Compute cluster. Go to the Enabling
vSAN Compute Cluster section.

Manually Enabling vSAN Max on a Cluster


Note: If Quickstart was used (as per the earlier section) then this section can be skipped.

Manual vSAN Max enablement is available for those that do not wish to use the Quickstart process.

For this scenario, please follow the vSAN Max Cluster Services Configuration instructions in the vSAN Max Design and
Operational Guidance document. Direct link to the section listed below:

https://core.vmware.com/resource/vsan-max-design-and-operational-guidance#sec32263-sub1

Enabling vSAN Max/HCI Mesh Services on a VMware Cloud Foundation™ based Cluster

VCF includes dedicated processes to automate the deployment and configuration of core infrastructure including vSAN
services. In fact, these processes are required and are the only supported methods within VCF.

As of the writing of this guide, VCF 5.1 supports HCI Mesh. For more information, please review below.

HCI Mesh with VCF - https://docs.vmware.com/en/VMware-Cloud-Foundation/5.1/vcf-admin/GUID-1F86850D-E95E-40A8-AFC5-BE58D504D739.html


Enabling vSAN Compute Cluster


Overview
A vSAN Compute cluster is simply a vSphere cluster that has a thin layer of vSAN installed for the purposes of mounting the
remote vSAN Max datastore.

The configuration process is solely manual. We do not have a Quickstart process for configuring vSAN Compute clusters. For
this scenario, please follow the vSAN Compute Cluster instructions in the vSAN Max Design and Operational Guidance
document. Direct link to the section is listed below:

https://core.vmware.com/resource/vsan-max-design-and-operational-guidance#sec32263-sub2

Prerequisites:

● The hosts in the vSAN Compute cluster will need a vSAN VMkernel configured (routable to the vSAN network used
by the target vSAN Max cluster); see the connectivity check after this list
● If the vSAN Compute cluster is initialized as a vSAN ESA cluster, it will only mount vSAN ESA/vSAN Max datastores
● If the vSAN Compute cluster is initialized as a vSAN OSA cluster, it will be able to mount vSAN OSA as well as vSAN
ESA/Max datastores
● If direct access to the datastore is not required, one can configure vSAN File Services

● For more information on configuring vSAN File Shares refer to the “vSAN Proof of Concept: vSAN Features” guide
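A quick way to validate the first prerequisite from a compute-cluster host is shown below. This is a sketch: vmk2 and 192.168.10.11 are placeholders for your vSAN VMkernel interface and a vSAN Max host’s vSAN IP.

esxcli vsan network list        # confirm a VMkernel port on this host is tagged for vSAN traffic
vmkping -I vmk2 192.168.10.11   # confirm the vSAN VMkernel can reach a vSAN Max host's vSAN IP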

Enabling vSAN Compute Clusters in VMware Cloud Foundation


VCF includes dedicated processes to automate the deployment and configuration of core infrastructure including vSAN
services. In fact, these processes are required and are the only supported methods within VCF.

As of the writing of this guide, VCF 5.1 supports vSAN Compute clusters. For more information, please review below:

● Commissioning Hosts into VCF - https://docs.vmware.com/en/VMware-Cloud-Foundation/5.1/vcf-admin/GUID-45A77DE0-A38D-4655-85E2-BB8969C6993F.html
● HCI Mesh with VCF - https://docs.vmware.com/en/VMware-Cloud-Foundation/5.1/vcf-admin/GUID-1F86850D-E95E-40A8-AFC5-BE58D504D739.html


Enabling Two-Node and vSAN Stretched Clusters


Overview
To configure a vSAN two-node or stretched cluster, select the appropriate option in [vSAN Cluster] >
Configure > Services:

Note: A witness appliance is required for two node and stretched vSAN clusters.

For more information on vSAN two-node and stretched clusters, see the “vSAN Proof of Concept: vSAN Stretched Cluster and
Two-Node Overview and Testing” guide.

Enabling vSAN Stretched Clusters in VMware Cloud Foundation


VCF includes dedicated processes to automate the deployment and configuration of core infrastructure including vSAN
services. In fact, these processes are required and are the only supported methods within VCF.

You can stretch a vSAN cluster in a workload domain across two availability zones within a region. Both availability zones must
contain an equal number of hosts to ensure failover in case any of the availability zones goes down.

The default management vSphere cluster must be stretched before a VI workload domain cluster can be stretched. This
ensures that the NSX control plane and management VMs (vCenter, NSX management cluster, and SDDC Manager) remain
accessible if the stretched cluster in the primary availability zone goes down.

You cannot stretch a cluster in the following conditions:

● The cluster uses vSAN ESA.


● The cluster has a vSAN remote datastore mounted on it.
● The cluster shares a vSAN Storage Policy with any other clusters.
● The cluster includes DPU-backed hosts.
● The cluster is enabled for Workload Management (vSphere with Tanzu).

Review this link for more information: https://docs.vmware.com/en/VMware-Cloud-Foundation/5.1/vcf-admin/GUID-7B4CC729-20BD-4CC9-B855-B38F02F74D40.html


Check the vSAN Cluster Thoroughly


Once the vSAN network has been created and vSAN is enabled, you should check that each ESXi host in the vSAN cluster is
able to communicate with all other ESXi hosts in the cluster. The easiest way to achieve this is via the vSAN Health Check.

Why Is This Important?


vSAN is dependent on the correct hardware and firmware combinations, as well as the network (configuration, reliability,
performance, etc.). One of the most frequent causes of requesting support is either an incorrect network configuration or the
network not performing as expected.
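For a quick manual check, each host can ping the vSAN VMkernel IP of every other host in the cluster. A hedged example, where vmk2 is the local vSAN VMkernel and 192.168.10.12 is another host’s vSAN IP (substitute your own values); the 8972-byte test only applies if jumbo frames (MTU 9000) are configured end to end:

vmkping -I vmk2 192.168.10.12              # basic reachability over the vSAN VMkernel
vmkping -I vmk2 -d -s 8972 192.168.10.12   # don't-fragment test to validate an MTU 9000 path (skip if using MTU 1500)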

Use Health Check to Verify vSAN Functionality


Running individual commands from one host to all other hosts in the cluster can be tedious and time-consuming. vSAN has an
integrated health check system. One of the first tasks to do after setting up any vSAN cluster is to perform a vSAN Health
Check.

To run a vSAN Health Check, navigate to [vSAN Cluster] > Monitor > vSAN > Skyline Health and click the RETEST button.

Note: The Skyline Health interface provides the option to see the health findings in both tile and list views. In this case,
the screenshots below show a “Cluster Health Score” of 99% with no unhealthy findings in both views.
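The same health findings can also be pulled from a host’s ESXi shell, which is useful for scripted checks or when the vSphere Client is unavailable. A minimal sketch:

esxcli vsan health cluster list        # summary of all vSAN health checks as seen from this host
esxcli vsan debug resync summary get   # confirms whether any objects are currently resynchronizing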


If any of the health checks fail, select the appropriate finding, and then click on the Troubleshoot button for information on
how to resolve the issue. This brings up a new screen providing details on the cause and recommended fix. The screen also
contains an Ask VMware button where appropriate, which will take you to a VMware Knowledge Base article detailing the
issues, troubleshooting steps, and potential resolutions.



Ensure that the latest version of the HCL has been downloaded and run a RETEST on the Health Check screen. Navigate to the
vSAN HCL DB up-to-date health finding, expand the finding, then click View Current Result. On the Result screen, select
GET LATEST VERSION ONLINE.

If there is no internet connectivity, download the latest JSON from https://partnerweb.vmware.com/service/vsan/all.json (see
https://kb.vmware.com/s/article/2145116 for more details) and select UPDATE FROM FILE...


The Performance Service is enabled by default. You can check its status from [vSAN Cluster] > Configure > vSAN > Services.
If it needs to be manually enabled, click the EDIT button next to Performance Service and turn it on using the defaults. The
Performance Service provides vSAN performance metrics to vCenter and other tools like Aria Operations.


To ensure everything in the cluster is optimal, the health service will also check the hardware against the VMware
Compatibility Guide (VCG) for vSAN, verify that the networking is functional, and that there are no underlying disk problems or
vSAN integrity issues.

Manually Checking against the VCG


The following commands are useful to help identify firmware and drivers in ESXi for comparison with the VCG. First, log in to
an ESXi host via SSH, then run the following commands to obtain the information from the server:

See the controller details:

esxcli vsan debug controller list

List VID DID SVID SSID of a storage controller or network adapter:

vmkchdev -l | egrep 'vmnic|vmhba'

Show which NIC driver is loaded:


esxcli network nic list

Show which storage controller driver is loaded:

esxcli storage core adapter list

Display driver version information:

vmkload_mod -s <driver-name> | grep -i version

Display NVMe driver information:

esxcli system module get -m nvme_pcie

For NVMe device info (replace X with the appropriate value):

esxcli nvme device get -A vmhbaX | egrep "Serial|Model|Firmware"


vSAN Basics
Deploy your first Virtual Machine
In this section, a VM is deployed to the vSAN datastore using the ‘vSAN default storage policy’, which stipulates a simple
RAID-1 mirror.

Note: Due to the way data is structured in vSAN ESA, it is recommended in most circumstances to define a RAID-5 policy for
VMs.

To examine the default policy settings, navigate to Menu > Shortcuts > VM Storage Policies.

From there, select vSAN Default Storage Policy. Look under the Rules tab to see the settings on the policy:


We will return to VM Storage Policies in more detail later, but when a VM is deployed with the default policy, it
should have a mirror copy of the VM data created. This second copy of the VM data is placed on storage on a
different host or fault domain to enable the VM to tolerate any single failure.

Also note that object space reservation is set to 'Thin provisioning', meaning that the object should be deployed as
“thin”. After we have deployed the VM, we will verify that vSAN adheres to both capabilities.

One final item to check before we deploy the VM is the current free capacity on the vSAN datastore. This can be
viewed from the [vSAN Cluster] > Monitor > vSAN > Capacity. In this example, it is 1.37 TB.


Make a note of the free capacity in your environment before continuing with the deploy VM exercise.
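If you prefer to record the capacity from the command line, the vSAN datastore also appears in a host’s filesystem listing. A small sketch; the default datastore name is vsanDatastore and may differ in your environment:

esxcli storage filesystem list | grep -i vsan   # shows the vSAN datastore with its size and free space
df -h                                           # alternative view of mounted volumes, including the vSAN datastore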

To deploy the VM, simply follow the steps provided in the wizard.

Select New Virtual Machine from the Actions Menu.

Select Create a new virtual machine.


Provide a name for the VM:


Select a compute resource (if DRS is enabled on the cluster, select the cluster itself, otherwise select one of the hosts):

Up to this point, the virtual machine deployment process is identical to all other virtual machine deployments. It is the next
section that may be unfamiliar: this is where a policy for the virtual machine is chosen.

As per the screenshot below, change the VM storage policy to ‘vSAN Default Storage Policy’.

Once the policy has been chosen, datastores are split into those that are either compliant or non-compliant with the selected
policy. As seen below, only the vSAN datastore can utilize the policy settings in the vSAN Default Storage Policy, so it is the
only one that shows up as Compatible in the list of datastores.


The rest of the VM deployment steps in the wizard are quite straightforward, and simply entail selecting ESXi version
compatibility (leave at default), a guest OS, and customizing hardware (no changes needed).


Verifying Disk Layout of a VM on vSAN


Physical placement of data on vSAN is defined by storage polices, which we will look at in more detail in another section.
Here, we look at how the data is placed with vSAN default policy.

Once the VM is created, select the new VM in the inventory, navigate to the Configure tab, and then select Policies. There
should be two objects shown, "VM home" and "Hard disk 1". Both should show a compliance status of Compliant meaning that
vSAN was able to deploy these objects in accordance with the policy settings.


To verify this, navigate to the [vSAN Cluster] > Monitor > Virtual Objects. Once again, both the “VM home” and “Hard disk 1”
should be displayed. Select the VM, followed by View Placement Details.

This should display a physical placement of a RAID 1 configuration with two components, each representing a mirrored copy
of the virtual disk. Note that the components are located on different hosts or fault domains, which shows that the policy
setting to tolerate one failure is being adhered to, as each host is an implicit fault domain. Fault domains can also be explicitly
defined (for instance, hosts within a single rack), so that data is resilient to the failure of an entire rack. For details on how to
create fault domains, review Managing Fault Domains in vSAN Clusters -
https://docs.vmware.com/en/VMware-vSphere/8.0/vsan-administration/GUID-8491C4B0-6F94-4023-8C7A-FD7B40D0368D.html
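The same placement information is available from a host’s ESXi shell, which can help with scripted verification. A hedged sketch; the command reports every object in the cluster, so the output can be long:

esxcli vsan debug object list   # per-object view: applied storage policy, component layout (RAID tree), and component health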


Physical Placement: vSAN OSA


In a vSAN OSA cluster, the ‘witness’ component is used to maintain a quorum on a per-object basis. For more information,
refer to the VMware vSAN Design Guide on core.vmware.com

Physical Placement: vSAN ESA


In vSAN ESA, physical placement is a little different. Data is written into two legs: writes are first ingested into a performance
leg and then coalesced and written to a capacity leg. Whilst the distribution of data on the capacity leg reflects the storage
policy setting (RAID 1 vs. RAID 5, etc.), the performance leg is always a RAID-1 mirror (and the FTT of the policy is followed).
For this reason, RAID-5 performance in vSAN ESA is at least on-par with RAID-1 performance on vSAN OSA. Thus, it is
recommended, for most workloads, to define a RAID-5 policy for VMs on vSAN ESA.

For more information on object placement in vSAN ESA, visit https://core.vmware.com/vsan-esa

Below we see how the vSAN default policy (RAID-1, FTT-1) distributes the objects (physical disk placement can also be seen
per VM, by selecting the VM and navigating to Monitor > vSAN > Physical disk placement). Also note that no witness
components are needed (as opposed to vSAN OSA) as there are enough data objects to maintain quorum:


Physical Space Requirements


The “object space reservation” policy setting defines how much space is initially reserved on the vSAN datastore for a VM's
objects. By default, it is set to "thin provisioning", implying that the VM’s storage objects are entirely “thin” and consume no
unnecessary space. Noting the free capacity in the vSAN datastore after deploying the VM, we see that it is very
close to what it was before the VM was deployed, as displayed:


Because we have not installed anything in the VM (such as a guest OS), only a tiny portion of the vSAN datastore has so far
been used, verifying that the object space reservation setting of "Thin provisioning" is working correctly (observe that the
"Virtual disks" and "VM home objects" consume less than 1GB in total, as highlighted in the "Used Capacity Breakdown"
section).

Do not delete this VM as we will use it for other tests going forward.

Configuring Fault Domains


As mentioned above, a single host is a fault domain, i.e. data is separated such that the failure of a host does not lead to data
loss. On failure, data can be rebuilt elsewhere. We can also group hosts so that data is further spread, and data is better
protected.

Fault domains can be defined at cluster creation time (see the sections above). To define (or alter) fault domains, thereafter,
navigate to [vSAN Cluster] > Configure > vSAN > Fault Domains.


Fault domains can then be created by either dragging and dropping into the ‘plus’ icon, or by ticking the appropriate hosts
and selecting Move Hosts.
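After moving hosts, membership can be double-checked from each host’s ESXi shell with a minimal query:

esxcli vsan faultdomain get   # shows the fault domain this host is currently assigned to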

Creating a Snapshot
Using the virtual machine created previously, take a snapshot of it. The snapshot can be taken when the VM is powered on or
powered off. The objectives are to see that:

● No setup is needed to make vSAN handle snapshots


● The process for creating a VM snapshot is unchanged with vSAN
● A successful snapshot delta object is created
● The policy settings of the delta object are inherited directly from the base disk object

From the VM object in vCenter, click Actions > Snapshots > Take Snapshot...


Take a Snapshot of the virtual machine created in the earlier step.

Provide a name for the snapshot and optional description.

Once the snapshot has been requested, monitor tasks and events to ensure that it has been successfully captured. Once the
snapshot creation has completed, additional actions will become available in the snapshot drop-down window. For example,
there is a new action to Revert to Latest Snapshot and another action to Manage Snapshots.


Choose the Manage Snapshots option. The following is displayed. It includes details regarding all snapshots in the chain, the
ability to delete one or all of them, as well as the ability to revert to a particular snapshot.

To see snapshot delta object information from the UI, navigate to Monitor > vSAN > Physical disk placement.


There are now three objects that are associated with that virtual machine. First is the "VM Home" namespace. "Hard disk 1" is
the base virtual disk, and "Hard disk 1 - poc-test-vm1.vmdk" is the snapshot delta. Notice that the snapshot delta inherits its
policy settings from the base disk, which needs to adhere to the vSAN Default Storage Policy.


For vSAN ESA, another object is created (with the same layout as Hard disk 1):

The snapshot can now be deleted from the VM. Monitor the VM’s tasks and ensure that it deletes successfully.


Clone a Virtual Machine


We will continue to use the same VM as before. This time make sure the VM is powered on first.

There are several different cloning operations available with vSAN. Here we will “Clone to Virtual Machine”. The cloning
operation is a straightforward click-through operation. This next screen is the only one that requires human interaction. Simply
provide the name for the newly cloned VM, and a folder if desired.


We are going to clone the VM in the vSAN cluster, so this must be selected as the compute resource.

On the “Select Storage” screen select the source datastore for the VM, “vsanDatastore”. This will all be pre-selected for you if
the VM being cloned also resides on the vsanDatastore.


Select from the available options (leave unchecked - default)

This will take you to the “Ready to complete” screen. If everything is as expected, click FINISH to commence the clone
operation. Monitor the VM tasks for the status of the clone operation.

This completes the cloning section of this guide. Do not delete the newly cloned VM, we will be using it in subsequent tests.


vMotion a Virtual Machine Between Hosts


The first step is to power-on the newly cloned virtual machine. We will migrate this VM from one vSAN host to another vSAN
host using vMotion.

Note: Take a moment to revisit the network configuration and ensure that the vMotion network is distinct from the vSAN
network. If these features share the same network, performance will not be optimal.
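One way to confirm the separation is to compare VMkernel interfaces and subnets from a host’s ESXi shell; the vSAN-tagged interface should not share a subnet with the interface carrying vMotion. A small sketch (interface names will differ per environment):

esxcli vsan network list               # identifies the VMkernel interface tagged for vSAN traffic (e.g., vmk2)
esxcli network ip interface ipv4 get   # lists every VMkernel interface with its IP and subnet for comparison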

First, determine which ESXi host the VM currently resides on. Selecting the Summary tab of the VM shows this, in the ‘Related
Objects’ window.

Next, select ‘migrate’, either from the VM ‘actions’ menu or by right-clicking on the VM:


"Migrate" allows you to migrate to a different compute resource (host), a different datastore or both at the same time. In this
initial test, we are simply migrating the VM to another host in the cluster, so this initial screen should be left at the default of
“Change compute resource only”.

Select Change compute resource only:


Then select a destination host:

Select a destination network and click Next.

Leave the default (high) on the vMotion Priority window, click Next.


At the “Ready to complete” window, click on FINISH to initiate the migration. If the migration is successful, the summary tab of
the virtual machine should show that the VM now resides on a different host.

Verify that the VM has been migrated to a new host:

This completes the “VM migration using vMotion” section of this guide. As you can see, vMotion works just great with vSAN.
Do not delete the migrated VM: we will be using it in subsequent tests.


VMware vSphere® Storage vMotion® (Storage vMotion) VM Between Datastores


This test will only be possible if there is space on a VMFS datastore (such as the boot volume) on the host housing the VM or
you have another datastore type available, such as an NFS share. The objective of this test is to successfully migrate a VM
from another datastore type into vSAN and vice versa.

Mount an NFS Datastore to the Hosts (optional)


The steps to mount an NFS datastore to multiple ESXi hosts are described in the vSphere Administrators Guide. See the Create
NFS Datastore in the vSphere Client topic for detailed steps.

Storage vMotion a VM from vSAN to another Datastore Type


Currently, the VM resides on the vSAN datastore. As we did before, launch the Migrate wizard; however, on this occasion,
move the VM from the vSAN datastore to another datastore type by selecting Change storage only.


Select destination datastore and change the VM Storage Policy to Datastore Default as the vSAN policy will not apply to a
VMFS (or NFS) datastore. Here we are migrating to the local VMFS datastore on the host:

On the “Ready to complete” screen, click FINISH to initiate the migration.

Once the migration completes, the ‘Datastores’ tab can be used to examine the datastore on which the VM resides.
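
The storage-only migration can also be scripted. Below is a minimal PowerCLI sketch, assuming a non-vSAN datastore (here named 'local-vmfs-01', a placeholder) is available to the host:

# Placeholder names - substitute your VM and destination datastore
$vm = Get-VM -Name "Clone-VM-01"
$destDatastore = Get-Datastore -Name "local-vmfs-01"

# Storage-only migration (equivalent to "Change storage only" in the wizard)
Move-VM -VM $vm -Datastore $destDatastore

# Confirm which datastore(s) now back the VM
Get-Datastore -RelatedObject (Get-VM -Name "Clone-VM-01")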


Verify that the VM has been moved to the new storage.

VM Storage Policies and vSAN


VM Storage Policies form the basis of VMware's Software-Defined Storage vision. A VM Storage Policy dictates how vSAN places data across the physical resources and governs other data services, such as the RAID type, the number of stripes and, for vSAN ESA, compression. Previously, we have deployed our VMs onto the 'vSAN Default Storage Policy'.

The actual status of the data may not reflect the policy definition (for instance, if there is a failure, or if the policy has recently
been changed). Thus, VM disks can be either compliant or non-compliant with the assigned storage policy. The latter is usually
a temporary state, until the system stabilizes. Once the rebuild (or other operation) is complete, compliance is automatically
regained.

Note: Storage policies are applied per VMDK in vSAN OSA and per VM in vSAN ESA. Further, the recommended storage
policy for vSAN ESA clusters is RAID-5 (see the vSAN features guide for more information).


Create a New VM Storage Policy


We will build a policy with RAID 1 and a stripe width of two. The VM Storage Policies can be accessed from the 'Shortcuts'
page on the vSphere client, as shown below.

Here we see the existing policies already in place, such as the ‘vSAN Default Storage Policy’ (already used to deploy VMs in
the ‘Basic vSphere Functionality’ section of this guide).


To create a new policy, click on Create.

The next step is to provide a name and an optional description for the new VM Storage Policy. Since this policy will contain a
stripe width of two, we have given it a name to reflect this. You may also give it a name to reflect that it is a vSAN policy.

The next section sets the policy structure. We select Enable rules for "vSAN" Storage to set a vSAN specific policy:


Now we get to the point where we create a set of rules. The first step is to select the availability of the objects associated with this rule. Set the failures to tolerate to '1 failure - RAID-1 (Mirroring)'.

We then set the Advanced Policy Rules. Once this is selected, the six customizable capabilities associated with vSAN are
exposed. Since this VM Storage Policy is going to have a requirement where the stripe width of an object is set to two, this is
what we select from the list of rules. It is officially called “Number of disk stripes per object”.

Note: The general recommendation is to keep the number of stripes at the default, one (unless troubleshooting
performance or for specific scenarios). Here, we are using this setting to clearly demonstrate how a storage policy affects
storage components.


The next screen shows the datastores that are compatible with the policy. In this case, only the vsanDatastore is compatible
with the policy settings.

Note: This does not mean that the vSAN datastore can successfully deploy a VM with this policy. It simply means that the
vSAN datastore understands the rules or requirements in the policy. The 'force provisioning' option will try to apply the policy without first checking whether it can be fulfilled by the cluster.

Review the settings on the next screen and click FINISH to create the policy:
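
The same policy can also be created with the PowerCLI SPBM cmdlets. The sketch below is illustrative only: the capability IDs 'VSAN.hostFailuresToTolerate' and 'VSAN.stripeWidth' are the identifiers typically used for these vSAN OSA rules, but they should be confirmed in your environment with Get-SpbmCapability:

# Build the two rules: FTT=1 (RAID-1) and a stripe width of two
$fttRule = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1
$stripeRule = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.stripeWidth") -Value 2

# Combine the rules into a rule set and create the policy
$ruleSet = New-SpbmRuleSet -AllOfRules @($fttRule, $stripeRule)
New-SpbmStoragePolicy -Name "StripeWidth=2" -Description "RAID-1, stripe width 2 (PoC)" -AnyOfRuleSets $ruleSet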


We can now go ahead and deploy a VM with this new policy and see what effect it has on the layout of the
underlying storage objects.
Note: vSAN includes some pre-defined storage policies for the vSAN File Service, named
‘FSVM_Profile_DO_NOT_MODIFY’. These policies are used internally by vSAN for vSAN file services and should not be
modified.

Deploy a new VM with a New Storage Policy


The workflow to deploy a New VM remains the same until we get to the point where the VM Storage Policy is chosen. This
time, instead of selecting the default policy, select the newly created policy, as shown below. As before, the vsanDatastore
should show up as the compatible datastore, and thus the one to which this VM should be provisioned. To illustrate clearly, we
will see how this works on a vSAN OSA cluster to begin with.


Now let's examine the layout of this virtual machine and see if the policy requirements are met, i.e. whether the storage objects of this VM have a stripe width of two. First, ensure that the VM is compliant with the policy by navigating to [VM] > Configure > Policies, as shown here:
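
Compliance can also be checked from the command line. A minimal PowerCLI sketch, using a placeholder VM name:

# Show the assigned policy and compliance status for the VM home object and each virtual disk
$vm = Get-VM -Name "Policy-Test-VM"
$vm, ($vm | Get-HardDisk) | Get-SpbmEntityConfiguration | Select-Object Entity, StoragePolicy, ComplianceStatus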

The next step is to select [vSAN Cluster] > Monitor > vSAN > Virtual Objects and check the layout of the VM's storage objects. Select the "Hard Disk 1" object and click View Placement Details:


Here, in our vSAN OSA cluster, we can see the two (RAID 0) stripe components, as defined by the policy. Each striped
component must be placed on its own physical (capacity) disk. There are enough physical disks to meet this requirement here.
However, a request for a larger stripe width would not be possible in this configuration. The stripes are across physical disks,
and not necessarily disk groups.

However, on examining the “VM Home” object, we see an apparent policy violation – there is no striping seen. This is by
design, as there is no performance gain by striping this object, so the policy setting is ignored.


It should also be noted that snapshots taken of this base disk continue to inherit the policy of the base disk, implying that the
snapshot delta objects will also be striped.

With a vSAN ESA cluster, the “Hard Disk 1” object is striped, as per the policy on the capacity leg.


Again, the “VM home” object ignores the stripe width from the policy, and the capacity leg is still only RAID 1, single stripe:

Note: The stripe width setting may have unintended consequences if used on an ESA cluster. For more information, visit:
https://core.vmware.com/blog/stripe-width-storage-policy-rule-vsan-esa

Assign a new Storage Policy to an Existing VM


You can choose to modify the VM Storage Policy mapping of an existing VM deployed on the vSAN datastore. The configuration of the objects associated with the VM will be modified to comply with the new policy. For example, if the number of failures to tolerate (FTT) is increased, new components are created and synchronized with the existing object, and the superseded components are subsequently discarded. VM Storage policies can also be applied to individual objects.

Here, we will illustrate this (on a vSAN ESA cluster) by applying a new policy to the VM, increasing the failures to tolerate to FTT=2 (keeping RAID 1 and stripes=2). Once again, we create a new policy. Below, we have named the policy "R1 FTT=2 Stripe Width=2".


First, we set the FTT value:

Then, as before, the stripe width:


Then, we navigate to our VM, select Configure > Policies and click on EDIT VM STORAGE POLICIES.

This takes you to the edit screen, where the policy can be changed. The new policy can then be selected from the drop-down list.

Once the policy is selected, click the OK button as shown above to ensure the policy gets applied to all storage objects. The
VM Storage Policy should now appear updated for all objects. Now when you revisit the Configure > Policies view, you should
see the changes in the process of taking effect (Reconfiguring) or completed.
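
The same policy change can be applied with PowerCLI. A hedged sketch using placeholder names; the VM and its hard disks are passed together so that the policy is applied to all storage objects:

# Placeholder names - substitute your VM and the new policy
$vm = Get-VM -Name "Policy-Test-VM"
$policy = Get-SpbmStoragePolicy -Name "R1 FTT=2 Stripe Width=2"

# Apply the policy to the VM home object and to every virtual disk
$vm, ($vm | Get-HardDisk) | Set-SpbmEntityConfiguration -StoragePolicy $policy

# Re-check compliance; objects may report a non-compliant state until the resync completes
$vm, ($vm | Get-HardDisk) | Get-SpbmEntityConfiguration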


Looking at the physical disk placement, the capacity leg now has three sets of stripes (as expected). Moreover, as we have
increased the FTT, the performance leg now has an extra stripe set:

Modify a VM Storage Policy


The workflow above is useful when you only need to modify the policy of one or two VMs, but if you need to change the VM Storage Policy of a significant number of VMs, this can be a little onerous. Instead, we can simply modify the policy itself. All VMs using that policy can then be brought back into compliance by reconfiguring their storage object layout to match the updated policy. We shall look at this next.

Note: Modifying or applying a new VM Storage Policy leads to additional backend IO as the objects are being
synchronized.

In this task, we shall modify an existing VM Storage policy to set the 'Object Space Reservation' parameter to 25%. This means that each storage object will now reserve 25% of the VMDK size on the vSAN datastore. Since all VMs were deployed with 40GB VMDKs and Failures to tolerate=1 failure - RAID-1 (Mirroring), the reservation will be 10 GB per VMDK (20 GB per VMDK once the mirror copy is included).

As the first step, note the amount of free space in the vSAN datastore. This would help ascertain the impact of the change in
the policy.
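
Free space can be read from the datastore summary in the vSphere Client, or with a quick PowerCLI one-liner (the datastore name below assumes the default 'vsanDatastore'):

# Record free space before the policy change so the impact of the reservation can be compared afterwards
Get-Datastore -Name "vsanDatastore" | Select-Object Name, CapacityGB, FreeSpaceGB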


Select the StripeWidth=2 policy from the list of available policies, and then select the Edit Settings option. Navigate to vSAN > Advanced Policy Rules and modify the Object space reservation setting to 25%, as shown below:

Proceed to complete the wizard with default values and click FINISH. A pop-up message requiring user input appears, detailing the number of VMs using the policy being modified. This helps ascertain the impact of the policy change; typically, such changes should be performed during a maintenance window. You can choose to enforce the policy change immediately or defer it to be applied manually at a later point. Leave it at the default, which is "Manually later", by clicking Yes as shown below:
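
The policy can also be updated programmatically. The sketch below rebuilds the rule set with an additional Object Space Reservation rule; 'VSAN.proportionalCapacity' is the capability ID typically used for this rule in vSAN OSA, and passing a replacement rule set to Set-SpbmStoragePolicy is an assumption to verify against your PowerCLI version:

# Rebuild the rule set: FTT=1, stripe width 2, plus a 25% object space reservation
$rules = @(
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1),
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.stripeWidth") -Value 2),
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.proportionalCapacity") -Value 25)
)

# Replace the rule set on the existing StripeWidth=2 policy
Set-SpbmStoragePolicy -StoragePolicy (Get-SpbmStoragePolicy -Name "StripeWidth=2") -AnyOfRuleSets (New-SpbmRuleSet -AllOfRules $rules)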


Next, select the StripeWidth=2 storage policy and click on the VM Compliance tab in the bottom pane. It will display the two VMs along with their storage objects and show that they are no longer compliant with the policy. They are in an "Out of Date" compliance state as the policy has now been changed.

You can now enforce a policy change by navigating to [VM Storage Policies] and clicking on Reapply VM Storage Policy:

When this button is clicked, the following popup appears.

When the reconfigure activity completes against the storage objects, and the compliance state is once again checked,
everything should show as Compliant.

Since we have now included an ObjectSpaceReservation value in the policy, you may notice a corresponding reduction in free capacity on the vSAN datastore.

For example, the two VMs with the new policy change have 40GB storage objects. Therefore, the 25% ObjectSpaceReservation implies 10GB is reserved per VMDK. So that's 10GB per VMDK, 1 VMDK per VM, 2 VMs equals 20GB of reserved space, right? However, since each VMDK is also mirrored, there is a total of 40GB reserved on the vSAN datastore.

IOPS Limits
vSAN incorporates a quality-of-service feature that can limit the number of IOPS an object may consume. IOPS limits are
enabled and applied via a policy setting. The setting can be used to ensure that a particular virtual machine does not consume
more than its fair share of resources or negatively impact the performance of the cluster.

Here, we demonstrate setting IOPS limits by creating a new policy (as above). We can then set the IOPS limit by navigating to
the ‘Advanced Policy Rules’ tab. In our example, we have set the IOPS limit to 1000:

The IOPS limit value is calculated as the number of 32KB IOs per second. Therefore, in this example, where we have a value of
1000, the IOPS limit is 1000x32KB=32MB/s. If I/O against the VM or VMDK should rise above the threshold, the additional I/O
will be throttled. Note that any I/O incurred by a snapshot is counted too.
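
An IOPS limit rule can also be added to a policy from PowerCLI. In the minimal sketch below, 'VSAN.iopsLimit' is the capability ID assumed for this rule; confirm it with Get-SpbmCapability before use:

# Create a policy that keeps FTT=1 and caps each object at 1000 normalized (32KB) IOPS
$fttRule = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1
$iopsRule = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.iopsLimit") -Value 1000
New-SpbmStoragePolicy -Name "IOPS-Limit-1000" -AnyOfRuleSets (New-SpbmRuleSet -AllOfRules @($fttRule, $iopsRule))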

Space Efficiency & Encryption Defined by Policy (vSAN ESA)


In vSAN ESA, compression can be enabled or disabled via VM storage policy. The policy applies to any new data, so turning off compression only affects new writes. Note that compression is turned on by default in vSAN ESA.

In the example below, we create a new policy (as above) and then navigate to the 'Storage rules' tab to configure the data services. Here, we enable encryption and disable compression:
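
The capability IDs behind the 'Storage rules' tab vary between vSAN editions and releases, so rather than guessing them, they can be listed directly. A small PowerCLI sketch to discover the vSAN capabilities available for policy rules:

# List every vSAN-namespace capability, including the data service rules such as encryption and compression
Get-SpbmCapability | Where-Object { $_.Name -like "VSAN.*" } | Select-Object Name, FriendlyName | Sort-Object Name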


Note: Turning off compression will affect new writes only. Existing data is not affected.


APPENDIX A: vSAN RDMA Configuration & Troubleshooting


Overview
RDMA (Remote Direct Memory Access) provides a more efficient network transport layer than traditional TCP connections.
For more details visit https://core.vmware.com/blog/vsan-7-update-2-rdma-support

Requirements:

● A supported physical network adapter that has RDMA RoCEv2 capabilities; see the vSAN VCG
● A switch that supports RDMA RoCEv2 and associated configuration.

Note that neither vSAN Stretched Clustering nor two-node clusters are supported with RDMA and that LACP should not
be configured on the network uplinks.

RDMA support can be enabled using the HCI Quickstart wizard.

Toggle the RDMA setting to enable it and click NEXT:


Alternatively, on an existing cluster, toggle the setting to enable RDMA and click Apply:

Configuration Example
First, enable PFC support on the switch. The example below shows the configuration steps on a Mellanox SN2100:

Enable PFC:

switch01 [standalone: master] (config) # dcb priority-flow-control enable


This action might cause traffic loss while shutting down a port with priority-flow-control mode on
Type 'yes' to confirm enable pfc globally: yes

Enable Switch PFC for priority 4


dcb priority-flow-control priority 4 enable

Assign PFC to port (ESXi uplink):

switch01 [standalone: master] (config) # interface ethernet 1/9 dcb priority-flow-control mode on force

Verify RDMA available adapter through ESXi shell:

[root@localhost:~] esxcli rdma device list

Name Driver State MTU Speed Paired Uplink Description


------- ---------- ------ ---- -------- ------------- -----------
vmrdma0 nmlx5_rdma Active 4096 100 Gbps vmnic4 MT27700 Family
vmrdma1 nmlx5_rdma Active 4096 100 Gbps vmnic5 MT27700 Family

Looking at each virtual RDMA adapter, we see details on state, MTU size (see hardware specific documentation)
and the linked adapter.


Note: To take advantage of RDMA, you must have jumbo frames enabled on the physical switch. The RDMA adapter supports an MTU of up to 4096.

Verify ESXi RDMA PFC status:

[root@localhost:~]esxcli network nic dcb status get -n vmnic4

Nic Name: vmnic4


Mode: 3 - IEEE Mode
Enabled: true
Capabilities:
Priority Group: true
Priority Flow Control: true
PG Traffic Classes: 8
PFC Traffic Classes: 8
PFC Enabled: true
PFC Configuration: 0 0 0 0 1 1 0 0

If we receive an error here, double-check the driver/firmware combination against the vSphere HCL. The vSAN Health check invokes a similar process to query the device DCB status.

Verify ESXi RDMA available protocols:

[root@localhost:~] esxcli rdma device protocol list

Device RoCE v1 RoCE v2 iWARP


------- ------- ------- -----
vmrdma0 true true false
vmrdma1 true true false

Verify vSAN Skyline Health check in vCenter:


In the screenshot below, we see an RDMA Configuration Health issue:


Here we see that PFC is not enabled on the switch:

Verify virtual RDMA adapter performance with esxtop:


SSH to one of the hosts and launch esxtop. Press ‘r’ to show the RDMA screen:


This shows us the throughput for each virtual adapter. Here we see the traffic traversing vmrdma0.

Pressing ‘n’ for the network view shows that there is minimal traffic on the VMkernel adapter:

Verify functionality of RDMA and TCP/IP on the same physical vmnic:


In the setup below, we can verify that the RDMA transport layer is not used for standard TCP/IP traffic, which is handled separately at the vmnic layer:

● Enable vSAN RDMA
● Prepare a DVS / vSwitch portgroup for VMs using the RDMA-capable adapter
● Configure two VMs with the iperf/iperf3 package installed
● Place the two VMs on different hosts
● Run one VM as the iperf server (iperf3 -s)
● Run the second VM as the client (iperf3 -c <IP address of server>)
● On each host, run esxtop and compare the network (‘n’) and RDMA (‘r’) screens during the iperf3 test

RDMA troubleshooting
First, verify that the physical network adapters support RDMA. On each host, navigate to Configure > Networking > Physical
adapters > [adapter] > RDMA:


Verify that the RDMA adapters are bound to a VMkernel interface


On each host, navigate to Configure > Networking > RDMA adapters > [vmrdma#] > Bound VMkernel Adapter:

Note: The RDMA support flag is required for the final setup to enable vSAN RDMA transport (if the RDMA flag is not
visible, then double check the hardware specification of the adapter along with driver/firmware versions and the vSphere
HCL).


vSAN RDMA uses RoCEv2 as its protocol layer. When RDMA support is not available on the physical link or in the setup, communication automatically falls back to standard TCP/IP.

esxtop provides additional fields that can be enabled through the ‘f’ key:

* A: NAME = Name of device


B: DRIVER = driver
C: STATE = State
* D: TEAM-PNIC = Team Uplink Physical NIC Name
* E: PKTTX/s = Packets Tx/s
* F: MbTX/s = Megabits Tx/s
* G: PKTRX/s = Packets Rx/s
* H: MbRX/s = Megabits Rx/s
I: %PKTDTX = % Packets Dropped (Tx)
J: %PKTDRX = % Packets Dropped (Rx)
* K: QP = Number of Queue Pairs Allocated
L: CQ = Number of Completion Queue Pairs Allocated
M: SRQ = Number of Shared Receive Queues Allocated
* N: MR = Memory Regions Allocated

Toggle fields with a-n, any other key to return

By default, only the minimum fields required for performance monitoring are enabled: MB/s, queue pairs (QP) and allocated memory region verbs (MR). For in-depth RDMA functionality, please contact your hardware vendor.

Run the following to obtain detailed adapter statistics. Check for any errors here. Queue pairs are adjusted automatically by
requirement:

[root@localhost:~] esxcli rdma device stats get -d vmrdma0


Packets received: 1576258135
Packets sent: 899769661
Bytes received: 40653333761546
Bytes sent: 1079424621290
Error packets received: 0
Error packets sent: 0
Error length packets received: 0


APPENDIX B: Cleanly Removing vSAN Configuration


vCLS Retreat Mode
On occasion, it may become necessary to remove a vSAN cluster and reset hosts to a ‘clean’ state.

To expedite the process, it is advisable to first put vCLS into retreat mode. This will delete the vCLS VMs and make it easier to
remove the vSAN datastore and put hosts into maintenance mode, etc.

To achieve this, a vCenter advanced setting, ‘config.vcls.clusters.[domain].enabled’, needs to be set to ‘false’.

The procedure to do this is detailed in the documentation here: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.resmgmt.doc/GUID-F98C3C93-875D-4570-852B-37A38878CE0F.html

To make this easier, a script (which uses govc and can be downloaded to a Linux or macOS host) is available here:
https://github.com/vmware-tanzu-experiments/vsphere-with-tanzu-proof-of-concept-samples/blob/main/VCF/vCLS.sh
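
For reference, the same setting can also be applied with PowerCLI. This is a hedged sketch only: the cluster's domain ID is read from its managed object reference, and it assumes that New-AdvancedSetting against the vCenter connection object is available in your PowerCLI version for vCenter-level settings:

# Look up the cluster's domain ID (e.g. domain-c8) and build the vCLS retreat mode setting name
$cluster = Get-Cluster -Name "vSAN-Cluster"   # placeholder cluster name
$settingName = "config.vcls.clusters.$($cluster.ExtensionData.MoRef.Value).enabled"

# Setting the value to false triggers vCLS retreat mode and removes the vCLS VMs from the cluster
New-AdvancedSetting -Entity $global:DefaultVIServer -Name $settingName -Value "false" -Confirm:$false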

Remove vSAN Partitions and Clear Data


The next step is to turn off vSAN from vCenter, under [cluster] > Configure > Services > vSAN. If for some reason this step
encounters errors, the method below may be useful.

First, open an SSH session to all hosts in the cluster and list the disks used by vSAN by using the command:

vdq -iH

The next step depends on the type of cluster:

OSA Clusters
Remove the cache device from each disk group, using the command:

esxcli vsan storage remove -s [cache device]

ESA Clusters
Remove disks from the storage pool, using the command:

esxcli vsan storagepool remove -d [device]


Next, relabel the disks:

partedUtil mklabel /vmfs/devices/disks/[disk] gpt


Again, to make this easier, a script is available to help with this:

OSA: https://github.com/vmware-tanzu-experiments/vsphere-with-tanzu-proof-of-concept-samples/blob/main/VCF/vsan-
remove-esa.sh

ESA: https://github.com/vmware-tanzu-experiments/vsphere-with-tanzu-proof-of-concept-samples/blob/main/VCF/vsan-
remove-esa.sh



Copyright © 2024 Broadcom. All rights reserved.
The term “Broadcom” refers to Broadcom Inc. and/or its subsidiaries. For more information, go to www.broadcom.com. All trademarks, trade names, service marks, and logos
referenced herein belong to their respective companies. Broadcom reserves the right to make changes without further notice to any products or data herein to improve
reliability, function, or design. Information furnished by Broadcom is believed to be accurate and reliable. However, Broadcom does not assume any liability arising out of the
application or use of this information, nor the application or use of any product or circuit described herein, neither does it convey any license under its patent rights nor the
rights of others.
Item No: vmw-bc-wp-tech-temp-uslet-word-2024 1/24
