Hyper-V R2 High-Availability DEEP DIVE! Greg Shields, MVP, vExpert Head Geek, Concentrated Technology www.ConcentratedTech.com
This slide deck was used in one of our many conference presentations. We hope you enjoy it, and invite you to use it within your own organization however you like. For more information on our company, including information on private classes and upcoming conference appearances, please visit our Web site, www.ConcentratedTech.com. For links to newly-posted decks, follow us on Twitter: @concentrateddon or @concentratdgreg. This work is copyright © Concentrated Technology, LLC
Agenda Part I Understanding Live Migration’s Role in Hyper-V HA Part II The Fundamentals of Windows Failover Clustering Part III Building a Two-Node Hyper-V Cluster with iSCSI Storage Part IV Walking through the Management of a Hyper-V Cluster Part V Adding Disaster Recovery with Multi-Site Clustering
Part I Understanding Live Migration’s Role in Hyper-V HA
Do You Really Need HA? High availability adds dramatically greater uptime for virtual machines. Protection against host failures Protection against resource overuse Protection against scheduled/unscheduled downtime High availability also adds much greater cost… Shared storage between hosts Connectivity Higher (and more expensive) software editions Not every environment needs HA!
What Really is Live Migration? Part 1:  Protection from Host Failures
What Really is Live Migration? Part 2:  Load Balancing of VM/host Resources
Comparing Quick w/ Live Migration Simply put: Migration speed is the difference. In Hyper-V’s original release, a Hyper-V virtual machine could be relocated with “a minimum” of downtime. This downtime was directly related to… …the amount of memory assigned to the virtual machine …the connection speed between virtual hosts and shared storage. Virtual machines with more assigned virtual memory, or on slower networks, took longer to complete a migration from one host to another; those with less completed it sooner. With QM, a VM with 2 GB of vRAM could take 32 seconds or longer to migrate! Downtime ensues…
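As a rough sanity check on that figure (an illustrative estimate, not a benchmark from this deck): saving 2 GB of memory state to shared storage and reading it back on the target means moving roughly 4 GB in total. Over a path that sustains about 125 MB/s (gigabit speed), that is 4,096 MB ÷ 125 MB/s ≈ 33 seconds, right in line with the 32-second figure above; a slower storage path only stretches it further.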
Comparing Quick w/ Live Migration Down/dirty details… During a Quick Migration, the virtual machine is immediately put into a “Saved” state. This state is not a power down, nor is it the same as the Paused state. In the saved state – and unlike pausing – the virtual machine releases its memory reservation on the host machine and stores the contents of its memory pages to disk. Once this has completed, the target host can take over ownership of the virtual machine and bring it back into operation.
Comparing Quick w/ Live Migration Down/dirty details… This saving of virtual machine state consumes most of the time involved in a Quick Migration. What was needed to reduce this delay was a mechanism to pre-copy the virtual machine’s memory from source to target host while the VM kept running, and at the same time log changes to memory pages that occur during the copy. These changes tend to be relatively small in quantity, making the delta copy significantly smaller and faster than the original copy. Once the initial copy has completed, Live Migration then… …pauses the virtual machine …copies the memory deltas …transfers ownership to the target host. Much faster. Effectively “zero” downtime.
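On an R2 cluster, that entire pre-copy/delta/handoff sequence is driven by a single operation. A minimal sketch using the FailoverClusters PowerShell module that ships with 2008 R2 (the VM and node names are hypothetical):

```powershell
Import-Module FailoverClusters

# Live-migrate the clustered VM "FILESRV1" to node HV02. The cluster
# performs the memory pre-copy, copies the page deltas, then transfers
# ownership -- the exact sequence described above.
Move-ClusterVirtualMachineRole -Name "FILESRV1" -Node "HV02"

# For comparison, a plain group move takes the save/move/restore
# (Quick Migration) path instead:
# Move-ClusterGroup -Name "FILESRV1" -Node "HV02"
```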
Part II The Fundamentals of Windows Failover Clustering
Why Clustering Fundamentals? Isn’t this, after all, a workshop on Hyper-V? It is, but the only way to do highly-available Hyper-V is atop Windows Failover Clustering. Many people have given clustering a pass due to early difficulties with its technologies. Microsoft did us all a disservice by making every previous version of Failover Clustering ridiculously painful to implement. Most IT pros have no experience with clustering. …but clustering doesn’t have to be hard. It just feels like it does! Doing clustering badly means doing HA Hyper-V badly!
Clustering’s Sordid History Windows NT 4.0 Microsoft Cluster Service (“wolfpack”) High-availability service that reduced availability “As the corporate expert in Windows clustering, I recommend you don’t use Windows clustering.” Windows 2000 Greater availability, scalability. Still painful. Windows 2003 Added iSCSI storage to traditional Fibre Channel. SCSI Resets still used as method of last resort (painful). Windows 2008 Eliminated use of SCSI Resets. Eliminated full-solution HCL requirement. Added Cluster Validation Wizard and pre-cluster tests. First version truly usable by IT generalists.
What’s New & Changed in 2008 x64 EE gets up to 16 nodes. Backups get VSS support. Disks can be brought online without taking dependencies offline. This allows disk extension without downtime. GPT disks are supported. Cluster self-healing. No longer reliant on disk signatures. Multiple paths for identifying “lost” or failed disks. IPv6 & DHCP support. Network Name resource now uses DNS instead of WINS. Network Name resource more resilient. Loss of an IP address need not bring the Network Name resource offline. Geo-clustering…! a.k.a. cross-subnet clustering. Cluster communications use TCP unicast and can span subnets.
So, What IS a Cluster?
So, What IS a Cluster? Quorum Drive & Storage for Hyper-V VMs
Cluster Quorum Models Ever been to a Kiwanis meeting…? A cluster “exists” because it has quorum between its members. That quorum is achieved through a voting process. Different Kiwanis clubs have different rules for quorum. Different clusters have different rules for quorum. If a cluster “loses quorum”, the entire cluster shuts down and ceases to exist. This lasts until quorum is regained. This is much different from a resource failover, which is the reason clusters are implemented. Multiple quorum models exist, for different reasons.
Node & Disk Majority Node and Disk Majority eliminates Win2003’s quorum disk as a single point of failure. Works on a “voting system”. A two-node cluster gets three votes: one for each node and one for the quorum disk. Two votes are needed for quorum. Because of this model, the loss of the quorum disk results in the loss of only one vote. Used when an even number of nodes is in the cluster. Most-deployed model in production.
Node Majority Only the cluster nodes get votes; storage gets none. Requires 3+ votes, so a minimum of three members is needed. Used when the number of cluster nodes is odd. Can use replicated storage instead of shared storage. Handy for stretch clusters.
File Share Witness Model Clustering without the nasty (expensive) shared storage! (Sort of… OK… not really…) One file server can serve as witness for multiple clusters. Can be used for non-production Hyper-V clusters. (eval/demo only) Most flexible model for stretch clusters. Eliminates issues of complete site outage.
Witness Disk Model Nodes get no votes. Only the quorum disk does. Cluster remains up as long as one node can talk to the witness disk. Effectively the same as the legacy model. Bad. SPOF. Don’t use.
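For reference, each of these models maps to one switch on a single cmdlet in the 2008 R2 FailoverClusters module. A hedged sketch (the cluster, disk, and share names are hypothetical):

```powershell
Import-Module FailoverClusters

# Node and Disk Majority -- the usual choice for an even node count:
Set-ClusterQuorum -Cluster HVCLUS -NodeAndDiskMajority "Cluster Disk 1"

# Node Majority -- odd node counts, or stretch clusters on replicated storage:
Set-ClusterQuorum -Cluster HVCLUS -NodeMajority

# Node and File Share Majority -- the witness lives on a file server:
Set-ClusterQuorum -Cluster HVCLUS -NodeAndFileShareMajority "\\WITNESS1\HVCLUS-FSW"

# Disk Only -- the legacy witness disk model above. Don't use:
# Set-ClusterQuorum -Cluster HVCLUS -DiskOnly "Cluster Disk 1"
```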
4 Steps to Cluster! Step 1:  Configure shared storage. Hardware SAN Software SAN a la StarWind iSCSI Target Software Step 2:  Attach Hyper-V Hosts to the iSCSI Target Step 3:  Configure Windows Failover Clustering Step 4:  Configure Hyper-V
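A condensed sketch of what those four steps look like from the command line (the portal address, IQN, names, and IPs are all hypothetical; the iSCSI attach uses iscsicli.exe, since 2008 R2 has no iSCSI PowerShell module, and Step 1 happens on the SAN itself):

```powershell
# Step 2: on each Hyper-V host, attach to the iSCSI target.
iscsicli QAddTargetPortal 192.168.10.50
iscsicli ListTargets
iscsicli QLoginTarget iqn.2008-08.com.starwindsoftware:san1-hvclus

# Step 3: validate the configuration, then create the cluster (run once).
Import-Module FailoverClusters
Test-Cluster -Node HV01, HV02
New-Cluster -Name HVCLUS -Node HV01, HV02 -StaticAddress 192.168.10.60

# Step 4: make an existing VM highly available.
Add-ClusterVirtualMachineRole -VirtualMachine "FILESRV1"
```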
Part III -VIDEO- Building a Two-Node Hyper-V Cluster with iSCSI Storage
Part IV Walking through the Management of a Hyper-V Cluster
Cluster Shared Volumes Hyper-V v.1 required a single VM per LUN. v.1’s clustering underpinnings weren’t aware of the files on a LUN. The “disk” was the cluster resource to fail over. Remember that only one node at a time can own a resource. v.2 adds cluster-awareness to individual volumes. This means that individual files on a LUN can be owned by different hosts. Hosts respect each other’s ownership.
Cluster Shared Volumes Because NTFS is still the file system, this means creating a meta-system of ownership information. Each cluster node checks for ownership, respects the ownership of others, and updates the info when it takes over ownership. Designed for use only by Hyper-V’s tiny number of files.
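Enabling CSV and adding a disk to it is a short scripted operation in 2008 R2; a minimal sketch, assuming a cluster named HVCLUS and a disk resource named “Cluster Disk 2” (both hypothetical):

```powershell
Import-Module FailoverClusters

# One-time: turn on Cluster Shared Volumes for the cluster.
(Get-Cluster HVCLUS).EnableSharedVolumes = "Enabled"

# Convert a clustered disk into a CSV; it then surfaces on every node
# under C:\ClusterStorage\VolumeN.
Add-ClusterSharedVolume -Name "Cluster Disk 2"
Get-ClusterSharedVolume
```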
Going Beyond Two Nodes Windows Failover Clustering gets non-linearly more complex as you add more hosts. Complexity arrives in failover options. Some critical best practices: Manage Preferred Owners & Persistent Mode options correctly. Consider carefully the effects of Failback. Resist creating hybrid clusters that support other services. Integrate SCVMM for dramatically improved management. Use disk “dependencies” as Affinity/Anti-Affinity rules. Add servers in pairs. Segregate traffic!!!
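Two of those practices are directly scriptable. A hedged sketch of setting preferred owners and an anti-affinity tag (the group and node names are hypothetical; the anti-affinity property is set here with cluster.exe, a common approach on 2008 R2):

```powershell
Import-Module FailoverClusters

# Preferred owners: keep FILESRV1 on HV01/HV02 unless both are down.
Set-ClusterOwnerNode -Group "FILESRV1" -Owners HV01, HV02

# Anti-affinity: tag both domain controller VMs so the cluster avoids
# placing them on the same node after a failover.
cluster.exe group "DC1-VM" /prop AntiAffinityClassNames="DomainControllers"
cluster.exe group "DC2-VM" /prop AntiAffinityClassNames="DomainControllers"
```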
Best Practices in Network Segregation
Best Practices in Network Segregation
-DEMO- Walking through the Management of a Hyper-V Cluster
Part V Adding Disaster Recovery with Multi-Site Clustering
What Makes a Disaster? Which of the following would you consider a disaster? A naturally-occurring event, such as a tornado, flood, or hurricane, impacts your datacenter and causes damage. That damage causes the entire processing of that datacenter to cease. A widespread incident, such as a water leakage or long-term power outage, that interrupts the functionality of your datacenter for an extended period of time. A problem with a virtual host creates a “blue screen of death”, immediately ceasing all processing on that server. An administrator installs a piece of code that causes problems with a service, shutting down that service and preventing some action from occurring on the server. An issue with power connections causes a server or an entire rack of servers to inadvertently and rapidly power down.
What Makes a Disaster? Which of the following would you consider a disaster? A naturally-occurring event, such as a tornado, flood, or hurricane, impacts your datacenter and causes damage. That damage causes the entire processing of that datacenter to cease. A widespread incident, such as a water leakage or long-term power outage, that interrupts the functionality of your datacenter for an extended period of time. A problem with a virtual host creates a “blue screen of death”, immediately ceasing all processing on that server. An administrator installs a piece of code that causes problems with a service, shutting down that service and preventing some action from occurring on the server. An issue with power connections causes a server or an entire rack of servers to inadvertently and rapidly power down. DISASTER! JUST A BAD DAY!
What Makes a Disaster? Your business’s decision to “declare a disaster” and move to “disaster operations” is a major one. The technologies used for disaster protection are different from those used for HA. More complex. More expensive. Failover and failback processes involve more thought.
What Makes a Disaster? At a very high level, disaster recovery for virtual environments is three things: A storage mechanism A replication mechanism A set of target servers to receive virtual machines and their data
What Makes a Disaster? Storage Device(s) Replication Mechanism Target Servers
Storage Device Typically, two SANs in two different locations Fibre Channel or iSCSI Usually a similar model or manufacturer. This is often necessary for the replication mechanism to function properly. The backup SAN doesn’t necessarily need to be the same size as the primary SAN. Replicated data isn’t always the full set of data.
Replication Mechanism Replication between SANs can occur… Synchronously Changes are made on one node at a time. Subsequent changes on the primary SAN must wait for an ACK from the backup SAN. Asynchronously Changes on the backup SAN will eventually be written. They are queued at the primary SAN to be transferred at intervals.
Replication Mechanism Synchronously Changes are made on one node at a time. Subsequent changes on the primary SAN must wait for an ACK from the backup SAN.
Replication Mechanism Asynchronously Changes on the backup SAN will eventually be written. They are queued at the primary SAN to be transferred at intervals.
Replication Mechanism Which Should You Choose…? Synchronous Assures no loss of data. Requires a high-bandwidth and low-latency connection. Write and acknowledgement latencies impact performance. Requires shorter distances between storage devices. Asynchronous Potential for loss of data during a failure. Leverages smaller-bandwidth connections, more tolerant of latency. No performance impact. Potential to stretch across longer distances. Your Recovery Point Objective makes this decision…
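A quick worked example of how the Recovery Point Objective drives the choice (illustrative numbers, not from this deck): if the backup SAN receives queued changes every 15 minutes, a failure just before a transfer loses up to 15 minutes of committed writes, plus whatever backlog had not finished sending. An RPO of zero therefore demands synchronous replication; an RPO of one hour comfortably permits the asynchronous interval above.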
Replication Mechanism Replication processing can occur… Storage Layer Replication processing is handled by the SAN itself. Often agents are installed on virtual hosts or machines to ensure crash consistency. Easier to set up, fewer moving parts. More scalable. Concerns about crash consistency. OS / Application Layer Replication processing is handled by software in the VM OS. This software also operates as the agent. More challenging to set up, more moving parts. More installations to manage/monitor. Scalability and cost are linear. Fewer concerns about crash consistency.
The Problem with Transactional Databases O/S crash consistency is easy to obtain. Just quiesce the file system before beginning the replication. Application crash consistency is much harder. Transactional databases like AD, Exchange, and SQL don’t quiesce when the file system does. You need to stop these databases before quiescence, or you need an agent in the VM that handles DB quiescence. Replication without crash consistency will lose data. The DB comes back in an “inconsistent” state.
Four-Step Process for VSS Step 1: A requestor, such as replication software, asks the server to invoke a shadow copy. Step 2: The VSS service accepts the request and calls an application-specific writer (SQL, Exchange, etc.) if necessary. Step 3: The application-specific writer coordinates the system shadow copy with app quiescence to ensure application consistency. Step 4: The shadow copy is created. …then the replication can start…
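You can exercise this four-step dance by hand with diskshadow.exe, the in-box VSS requestor on Server 2008/R2. A minimal sketch driven from PowerShell (the volume letter, alias, and paths are hypothetical):

```powershell
# Write a DiskShadow script that walks the four VSS steps, then run it.
@"
SET CONTEXT PERSISTENT
ADD VOLUME D: ALIAS VmStore
CREATE
EXPOSE %VmStore% S:
"@ | Set-Content C:\vss-snap.txt

# CREATE triggers the request/quiesce/snapshot sequence; EXPOSE mounts
# the shadow copy as S: so replication software can read a quiet copy.
diskshadow.exe /s C:\vss-snap.txt
```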
Target Servers & Cluster Finally, there is a set of target servers in the backup site. With Hyper-V, these servers are part of a multi-site Hyper-V cluster. A multi-site cluster is the same as a single-site cluster, except that it spans multiple sites. Some changes to management and configuration tactics are required.
Multi-Site Cluster Tactics Install servers to sites so that your primary site always contains more servers than backup sites. Eliminates some problems with quorum during site outage.
Multi-Site Cluster Tactics Leverage Node and File Share Quorum when possible. Prevents entire-site outage from impacting quorum. Enables creation of multiple clusters if necessary. Third Site for Witness Server
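In quorum terms, that means pointing the witness at a file share, ideally in a third site, so the loss of either main site cannot take quorum with it. A hedged one-liner (the names are hypothetical):

```powershell
Import-Module FailoverClusters
Set-ClusterQuorum -Cluster HVCLUS -NodeAndFileShareMajority "\\SITE3-FS\HVCLUS-FSW"
```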
Multi-Site Cluster Tactics Ensure that networking remains available when VMs migrate from the primary to the backup site. R2 clustering can now span subnets. This seems like a good thing, but only if you plan correctly for it. Remember that crossing subnets also means changing the IP address, subnet mask, gateway, etc., at the new site. This can be done automatically by using DHCP and dynamic DNS, or it must be updated manually. DNS replication is also a problem. Clients will require time to update their local cache. Consider reducing the DNS TTL or clearing the client cache.
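A hedged sketch of both mitigations (the server, zone, record name, and address are hypothetical; dnscmd ships with the Windows DNS Server tools):

```powershell
# Re-register the VM's A record with a short five-minute TTL so clients
# pick up the new-site address quickly after a cross-subnet failover.
dnscmd DC1 /RecordAdd contoso.com filesrv1 300 A 10.2.0.25

# On clients still holding the old address, flush the resolver cache.
ipconfig /flushdns
```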
 
This slide deck was used in one of our many conference presentations. We hope you enjoy it, and invite you to use it within your own organization however you like. For more information on our company, including information on private classes and upcoming conference appearances, please visit our Web site, www.ConcentratedTech.com. For links to newly-posted decks, follow us on Twitter: @concentrateddon or @concentratdgreg. This work is copyright © Concentrated Technology, LLC
Editor's Notes
  • #19: To show how to configure cluster quorum settings, create a single-node cluster. Then, right-click the cluster name and choose More Actions | Configure Cluster Quorum Settings.