How to Build a SQL Server AlwaysOn Failover Cluster Instance with SIOS DataKeeper using Azure Resource Manager

The officially recommended way for building highly available SQL Server environments in Azure is AlwaysOn Availability Groups, see here. This approach has many benefits, e.g. failover for a discrete set of user databases, flexible failover policies and read-only secondaries, but it requires SQL Enterprise edition (as described in the feature matrix of SQL Server 2014).

If you don’t need these additional capabilities and you’d like to save some money, there is an alternative way to build a highly available and very scalable 2-node cluster on top of AlwaysOn Failover Cluster Instances (FCI) using SQL Server Standard edition. ‘Hang on’, you might say, ‘doesn’t FCI require shared storage – is that possible at all in Azure?’ Actually it is, by leveraging SIOS DataKeeper from the Azure Marketplace in order to synchronize local disks attached to the cluster nodes.

This post will show you how to set up this environment in Azure Resource Manager step-by-step, using PowerShell 1.0 as well as the new Azure Portal.

As the post got fairly long, here are some shortcuts for quick navigation:

Gimme the IOPS!
Create the Domain Environment
Build the First Cluster Node
Build the Second Cluster Node
Install Windows Failover Clustering
Configure SIOS DataKeeper
Create the SQL Cluster
Create an Internal Load Balancer
Update the SQL Listener
Configure the Windows Firewalls
Validate Cluster Connection and Failover

Gimme the IOPS!

The standard approach for shared storage in Azure is Azure Files, which provides a great solution for SMB-style access across multiple clients in Azure and even on-premises applications. You could use Azure Files as shared storage for building an HA solution on top of SQL Server FCI, but keep in mind that (at the time of this writing) there is an upper limit of 1,000 IOPS (measured for an 8KB IO unit size) per share.

In Azure, the usual way to increase throughput for SQL Server OLTP workloads is striping premium storage disks together in order to aggregate their IOPS & bandwidth. By properly configuring Storage Spaces in combination with Azure Premium Storage you can achieve up to 80,000 IOPS and 2,000 MB/sec throughput for a single GS5 virtual machine, see here.

Now, if we could use this approach to scale a single database node (in terms of I/O) and engineer high availability around it, wouldn’t that be great? Well, take a look at the SIOS DataKeeper solution, which you can find in the Azure Marketplace. DataKeeper provides asynchronous or synchronous block-level replication between locally attached disks/volumes and integrates with Storage Spaces as well as Windows Server Failover Clusters (WSFC), such that replicated storage appears to the cluster as shared storage.

Although the Azure Marketplace page for SIOS implies that there is a pay-as-you-go offering for DataKeeper, you actually need to purchase a license for production use. If you just want to deploy a test cluster (as described in this post), you can request a 14-day trial on their website. The site also contains more information about DataKeeper, e.g. a product overview and a Getting Started guide (which is still based on Azure v1 classic resources and the classic portal).

Create the Domain Environment

Before we can get started with the cluster itself, we need to set up the network and domain environment. We can create the resource group, network and VM for the DC in a single go via the Azure Portal. Just create a new VM from the Windows Server 2012 R2 Datacenter template and make sure to select Resource Manager as deployment model, as we want to embed the complete cluster environment into the new ARM deployment model of Azure.


Parameters I am using in this post (for reference in subsequent chapters):

Resource Group Name sqlfci
VM Name dc1
VM Size Standard D1
Storage Account Name sqlfci1
Storage Account Type Standard-LRS
Virtual Network Name sqlfci
VNet Address Space
Subnet Name infra
Subnet Address Range
Availability Set sqlfci

If you do not change the defaults, the VM will get a public IP as well as an associated network security group (NSG), opening just port 3389 for RDP access from the Internet. The final confirmation blade should look like this:


Make Private IP Address Static

Once provisioning of dc1 is finished, you should make its IP address static, as we will install a domain controller and also need the IP to be referenced as DNS server. Open the resource group blade for sqlfci and navigate to the NIC that has been created for the VM:


Open IP Addresses in the NIC settings and set the private IP address to Static (as the VM is the first one in the subnet, it will have received the first available address in the range):
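If you prefer scripting, the same change can be made with the ARM PowerShell cmdlets. This is just a sketch — the NIC name is auto-generated during deployment, so replace "dc1-nic" with the actual name from your resource group:

```powershell
# Switch the NIC's private IP allocation from Dynamic to Static
# ("dc1-nic" is a placeholder - use the NIC name generated in your deployment)
$nic = Get-AzureRmNetworkInterface -Name "dc1-nic" -ResourceGroupName "sqlfci"
$nic.IpConfigurations[0].PrivateIpAllocationMethod = "Static"
Set-AzureRmNetworkInterface -NetworkInterface $nic
```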


Configure DNS

In order to let all nodes in the domain find each other by name, we also need to set this IP in the DNS settings of the virtual network. Navigate to the virtual network sqlfci, open DNS Servers in its settings and configure DNS as follows (make sure to use the IP address of dc1!):


Create SQL Subnet

While we are still in the VNet configuration let’s create the subnet we are going to need later for our SQL cluster nodes. Go to Settings – Subnets and add a new one as follows:

Subnet Name sql
Subnet Address Range


For the sake of simplicity I am not configuring a network security group on the subnet (which should be done in a production environment).

Install AD Domain Services

Now on to installing the Windows domain that is required by the SQL Server FCI environment. RDP into dc1 and install Active Directory Domain Services in Server Manager. Once that’s finished promote the server to a domain controller and create a new forest using sqlfci.local as root domain name (otherwise accepting defaults). After rebooting the VM, Server Manager should look like this:


Note that (again in order to keep it simple) I am not using the best practice to put the AD database on a separate data disk, as described here.
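For reference, the same setup can also be scripted. A sketch of the equivalent PowerShell (run on dc1; the VM reboots when promotion finishes):

```powershell
# Install the AD DS role plus management tools
Install-WindowsFeature AD-Domain-Services -IncludeManagementTools

# Promote the server to a domain controller, creating a new forest with
# root domain sqlfci.local (installs DNS, prompts for the DSRM password,
# and reboots the VM when done)
Install-ADDSForest -DomainName "sqlfci.local" -InstallDns
```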

Create File Share

While we’re still on the DC, let’s also create a file share. We will need it later on for building cluster quorum using a witness file share, in addition to the 2 cluster node votes. Create a wfs folder on the C: drive and create a new share under File and Storage Services – Shares in Server Manager:






In the Permissions section of the wizard, you’ll need to do 2 things: first, configure ‘Modify’ permissions for the sqlcluster object on the c:\wfs folder:





Next, you need to configure ‘Change’ permissions for sqlcluster on the share:



Finally, your list of shares should look like this:
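If you prefer PowerShell over the Server Manager wizard, a sketch of the equivalent commands (this assumes the sqlcluster computer object already exists in the domain and that the NetBIOS domain name is SQLFCI — adjust if yours differs):

```powershell
# Create the folder and grant the cluster computer account Modify on NTFS
New-Item -Path C:\wfs -ItemType Directory -Force
icacls C:\wfs /grant "SQLFCI\sqlcluster$:(OI)(CI)M"

# Share it out with Change permissions for the cluster account
New-SmbShare -Name wfs -Path C:\wfs -ChangeAccess "SQLFCI\sqlcluster$"
```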


Build the First Cluster Node

After the DC has finished installing and rebooted, we can create the first FCI node. As we are planning to leverage SIOS DataKeeper, let’s start off with the image that’s available in the Marketplace. Click on the New button in the portal and enter ‘SIOS’ in the Search the Marketplace field. This will let you select the SIOS DataKeeper Cluster Edition template:


Again, select Resource Manager as deployment model and enter the following parameters for the first node:

Resource Group Name sqlfci (same as above)
VM Name sql1
VM Size Standard DS2
Storage Account Name sqlfci2
Storage Account Type Premium-LRS
Virtual Network Name sqlfci
Subnet Name sql
Subnet Address Range
Availability Set sqlfci


Important things to note:

  • The SQL nodes will be deployed as DS machines, i.e. they can leverage premium storage
  • Make sure to make the new storage account sqlfci2 a premium account
  • The cluster nodes get provisioned into the sql subnet of the VNet
  • Also make sure to place the cluster nodes into the same availability set (sqlfci) as the domain controller, as we will use the DC as a witness file share for the cluster
  • By default you will get a public IP address associated with a Network Security Group. This is only required for RDPing into the VM and could be removed later on, as access to the cluster will happen via the private IP.

Make Private IP Address Static

Once the VM has finished provisioning, enter the Network Interfaces section of the VM, select the NIC and set the private IP address to static:


Add Data Disks

Now, let’s add 2 premium storage data disks in order to demonstrate how to increase throughput via Storage Spaces. In VM settings, navigate to Disks and attach 2 new disks (named sql1-1 and sql1-2) as follows:


Make sure to add premium storage disks (which is the default) and to switch off host caching. In this example I am using P20 disks, which have a maximum of 2,300 IOPS per disk (P30 would give you 5,000 IOPS).

Join the Domain

RDP into the sql1 VM and join it to the domain. Make sure to specify the full domain name (in our example sqlfci.local):


If the domain cannot be found check your DNS settings in the VNet that we set above.
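The domain join can also be done from an elevated PowerShell prompt on sql1 (you will be prompted for domain admin credentials):

```powershell
# Join the sqlfci.local domain and reboot to complete the join
Add-Computer -DomainName "sqlfci.local" -Credential (Get-Credential) -Restart
```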

Configure Storage Spaces

After the VM has rebooted sign in using a domain admin account. In Server Manager, open File and Storage Services and go to Storage Pools. You should see the 2 disks we added in the Physical Disks section as follows:


Now, let’s create a storage pool. Right-click on the Primordial entry in Storage Spaces and select ‘New Storage Pool…’:


Use the following parameters in the wizard:

Pool Name pool1
Physical Disks PhysicalDisk2 (sql1)
PhysicalDisk3 (sql1)
Allocation Automatic

I am leaving the default allocation mode set to ‘Automatic’. For best practices check Performance Guidance for SQL Server in Windows Azure Virtual Machines.

In the next step, let’s create a virtual disk. Right-click on the newly created pool pool1 and select ‘New Virtual Disk…’:


Use the following parameters in the wizard:

Storage Pool pool1
Disk Name disk1
Layout Simple
Size Maximum

The Storage Layout should be set to Simple, as this configuration provides maximum performance/throughput, and redundancy is already built into Azure blob storage, so we don’t have to add it here on the Windows OS level.

After the virtual disk has been configured the ‘New Volume’ wizard opens automatically. You should be able to select the disk created above and configure a new volume:

Provision to sql1
Virtual Disk disk1
Volume Size 1,022 GB
Drive Letter F:
File System NTFS
Allocation Unit Size Default
Volume Label vol1

Note that for production use, the allocation size should be configured according to best practices, as referenced above.

Now you should see a new volume as drive F: (labeled vol1) in the Windows file explorer:


Note that you could potentially add more disks to the VMs and include them into storage pools in order to increase IOPS and throughput for the locally attached storage.
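The whole pool/disk/volume sequence from the wizards above can also be scripted. A sketch, using the same names as in the wizard parameters:

```powershell
# Pool all poolable (primordial) physical disks into pool1
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "pool1" `
    -StorageSubSystemFriendlyName "Storage Spaces*" -PhysicalDisks $disks

# Simple layout (striping, no redundancy - Azure storage is already redundant),
# maximum size
New-VirtualDisk -StoragePoolFriendlyName "pool1" -FriendlyName "disk1" `
    -ResiliencySettingName "Simple" -UseMaximumSize

# Initialize the disk, partition it as F: and format as NTFS with label vol1
Get-VirtualDisk -FriendlyName "disk1" | Get-Disk |
    Initialize-Disk -PassThru |
    New-Partition -DriveLetter F -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "vol1"
```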

Build the Second Cluster Node

Now, go ahead and create the second node using exactly the same steps as described above, using sql2 as name of the VM. Make sure to use the same datacenter location, resource group and VNet/subnet. You should also use the same premium storage account for the OS and data disks to keep it simple. The second node will get the next free IP address in the subnet when using the parameters specified above.

It is very important to join the VM into the existing availability set sqlfci in order to achieve high availability and avoid more than one of the VMs being taken down during planned maintenance or an unplanned outage. The availability set configuration should look like this:


This shows that our 3 VMs are located in different fault and update domains, providing HA from an Azure platform maintenance and outage point of view. Whenever the Hyper-V hosts are patched and our VMs need to be rebooted, the platform makes sure that only one of those three VMs is affected at a time. Similarly, as the VMs are located in 3 different fault domains (think racks), an outage should not affect more than one VM, keeping the cluster quorum.

Install Windows Failover Clustering

Now, as we have the basic setup up & running, let’s activate the failover clustering feature on the cluster nodes. On both VMs sql1 and sql2, start the Add Roles and Features Wizard and enable the Failover Clustering feature:
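Alternatively, from an elevated PowerShell prompt on each node:

```powershell
# Enable Failover Clustering plus the management tools (run on sql1 and sql2)
Install-WindowsFeature Failover-Clustering -IncludeManagementTools
```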


Create the Cluster

After feature installation has finished, you can open Failover Cluster Manager on sql1 and create a new cluster by adding both nodes as shown below:


Going through the wizard accepting all defaults will take you through cluster validation, which will report a couple of network and storage related warnings. You can safely ignore those, as the validation procedure is looking for classical shared storage (which we won’t have until we configure SIOS DataKeeper later on) and also for redundant network connections (which are transparently provided by the Azure fabric).

On node sql1, create the cluster by specifying a name (e.g. sqlcluster) and accept all defaults. The wizard will finish showing a warning about a missing quorum disk, which we will also fix later on.
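Alternatively, validation and cluster creation can be done in one go from PowerShell. Specifying a static IP up front avoids the DHCP address conflict described in the next section; the address is a placeholder, so pick an unused one from the sql subnet:

```powershell
# Validate both nodes first (the storage and network warnings are expected)
Test-Cluster -Node sql1, sql2

# Create the cluster with a static IP from the sql subnet
# ($clusterIP is a placeholder - use an address not taken by any VM)
$clusterIP = "<unused IP in the sql subnet>"
New-Cluster -Name sqlcluster -Node sql1, sql2 -NoStorage -StaticAddress $clusterIP
```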


Fix Cluster IP Address

When opening the Nodes view in Failover Cluster Manager, you will see that either one of the two nodes is shown as ‘Down’ or the cluster IP address resource does not come online:


The issue here is that DHCP in the Azure VNet gives out an IP address to the cluster IP address resource that has already been associated with one of the node VMs, in my case the address of sql2.


You can easily fix that by going into the cluster IP address properties and setting the address to an unused static IP from the subnet range:


Note that this IP doesn’t get ‘reserved’ in DHCP, so you need to make sure it won’t be given to another VM in the subnet. After the IP address has been changed for the cluster object, the second node should come up successfully.
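The same fix can be applied from PowerShell. The resource names below are the cluster defaults, and the address is a placeholder for an unused IP from the sql subnet:

```powershell
# Point the cluster IP resource at an unused static address,
# then recycle the resources so the change takes effect
Get-ClusterResource "Cluster IP Address" |
    Set-ClusterParameter -Multiple @{ "Address"="<unused IP>"; "EnableDhcp"=0 }
Stop-ClusterResource "Cluster IP Address"
Start-ClusterResource "Cluster Name"
```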


As a sanity check you could also RDP into sql2 (again using a domain admin account) and check if you can connect to the cluster resource name (sqlcluster) in Failover Cluster Manager.

Configuring Cluster Quorum

In order to set up the cluster properly, we need to configure it for quorum. We’ll do that by using the \\dc1\wfs share as a witness file share (see here for quorum modes in Windows Server Failover Clustering):
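The same configuration can be done in PowerShell on one of the cluster nodes:

```powershell
# Switch the quorum model to Node and File Share Majority,
# using the witness share we created on the domain controller
Import-Module FailoverClusters
Set-ClusterQuorum -NodeAndFileShareMajority "\\dc1\wfs"
```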






Configure SIOS DataKeeper

The Windows Server Failover Cluster is now up & running, but doesn’t have any ‘shared’ storage yet. In the next step we will use DataKeeper in order to enable replication & cluster integration for the F: drives we created on both cluster nodes above.

Activate License

As we have based our cluster node VMs on the SIOS image in the Azure Marketplace, DataKeeper is already installed and just needs to be licensed (for testing, you can request a 14-day trial). On both VMs, open the SIOS ‘License Key Installer’ and import the .lic file you got from SIOS:



Once that’s done, you’ll need to set the logon account of the ‘SIOS DataKeeper’ service to a domain account with local admin rights on the VM and then start the service.



Create a DataKeeper Volume

On sql1, open the SIOS DataKeeper MMC snapin and connect to both cluster nodes sql1 and sql2:



Once done, the Server Overview Report should show both nodes with their local F: drives:


Next, you need to create a job in order to replicate data between the F: drives of both nodes.


Configure sql1 as the source node and make sure the IP address and volume (F:) are correct:


Configure sql2 as the target node and make sure the IP address and volume (F:) are correct:


Set the replication mode to ‘Synchronous’: our cluster nodes are in the same datacenter and network segment, so we want replication to happen as fast as possible and synchronously, in order to get strong consistency for the databases in the cluster.


When creating the job, you will get a message to cluster-enable the volume:


After confirming, you should see the following in the DataKeeper console window:


The volume should now also show up in Failover Cluster Manager in the Disks section:


Voilà: we have created a fully functional Windows Server Failover Cluster in Azure, including quorum configuration and cluster-enabled storage! Let’s make use of it by configuring the SQL Server Failover Cluster Instance next.

Create the SQL Cluster

As we have used the SIOS image from the Azure Marketplace, we can’t use the pay-as-you-go version of SQL Server, but rather need to install it manually on both cluster nodes. I usually use my MSDN subscription when testing; here I went ahead and downloaded the x64 Standard edition of SQL Server 2014 SP1.


Install the First SQL Node

On sql1, start the setup for SQL Server and select the failover cluster option:


Go through the wizard accepting the defaults. When asked for features to be installed, select Database Engine Services and Management Tools. In the Instance Configuration, enter a name for the cluster (I called it sqlclus). This name will be used by clients later on to connect to the cluster:


The Cluster Disk Selection page will present you the DataKeeper volume we configured above:


On the Cluster Network Configuration page, we need to disable DHCP and manually enter an unused IP address that will be used for the SQL cluster. Again, make sure this address is free and won’t be used by any other VM, as it will not be reserved by DHCP.


In Server Configuration, specify a properly configured domain account for the SQL Server Agent and Database Engine services.


In Database Engine Configuration, specify SQL Server admin accounts and configure the data, log and backup directories to point to the F: drive:


Note that this setup is greatly simplified! Usually it is a best practice to split data and logs onto separate disks, check here for best practices. Also, TempDB could go onto the temporary drive D:, which is SSD based for D(S) series VMs, see here how to do it properly. In any case, make sure that TempDB does not get stored on the F: drive, because it is not supposed to get replicated via DataKeeper! In our case, we’ll leave it on c:\TempDB, so create the directory now.

Now, start the installation process and wait for it to finish, confirmation should look like this:


Install the Second SQL Node

On sql2, start the setup for SQL Server and select the ‘Add node to SQL Server failover cluster’ option:


In Cluster Node Configuration you should see the cluster (sqlclus) we built on sql1:


Cluster Network Configuration should show the IP we configured for the SQL cluster above:


In Service Accounts make sure to enter the password for the domain account (same as on sql1).


Start the installation process and wait for it to finish, the confirmation screen should be identical to the one on sql1.

Before we go on, let’s not forget to create the c:\TempDB directory for storing the SQL TempDB, as not doing so will result in errors trying to bring up sql2 as primary node later on!

Create an Internal Load Balancer

Usually, clients connect to the cluster via a SQL listener, which transparently routes them to the currently active server node. In Azure, the listener is mapped to an internal load balancer (ILB). The ILB directs incoming traffic to the primary node in the FCI setup and also probes the nodes. Once the cluster fails over, the ILB will notice and route traffic to the new primary.

At the time of writing this post it was not possible to create load balancers in the Azure Portal, so let’s do so in PowerShell. We’ll use ARM and PowerShell version 1.0, i.e. we are going to use the new …–AzureRm… cmdlets. You can run the script from your local development machine:


$ip = ""         
$ilbName = "sqlclus"       
$protocol = "tcp"              
$sqlPort = 1433               
$probePort = 59999         
$vnetName = "sqlfci"
$groupName = "sqlfci"
$location = "North Europe"

$vnet = Get-AzureRmVirtualNetwork -Name $vnetName -ResourceGroupName $groupName
$feIP = New-AzureRmLoadBalancerFrontendIpConfig -Name "lbfe" -PrivateIpAddress $ip -SubnetId $vnet.Subnets[1].Id
$bePool = New-AzureRmLoadBalancerBackendAddressPoolConfig -Name "lbbe" 

$healthProbe = New-AzureRmLoadBalancerProbeConfig -Name "sqlprobe" -Protocol $protocol -Port $probePort -IntervalInSeconds 10 -ProbeCount 2
$lbRule = New-AzureRmLoadBalancerRuleConfig -Name "sqltraffic" -FrontendIpConfiguration $feIP -BackendAddressPool $bePool -Probe $healthProbe -Protocol $protocol -FrontendPort $sqlPort -BackendPort $sqlPort -EnableFloatingIP

New-AzureRmLoadBalancer -ResourceGroupName $groupName -Name $ilbName -Location $location -LoadBalancingRule $lbRule -BackendAddressPool $bePool -Probe $healthProbe -FrontendIpConfiguration $feIP

Important things to note:

  • Parameter $ip: needs to correspond to the SQL cluster IP configured during SQL setup
  • Parameter $ilbName: needs to correspond to the SQL cluster name (sqlclus)
  • sql subnet: As it is the 2nd subnet in the virtual network it is referenced as $vnet.Subnets[1]
  • The switch –EnableFloatingIP in the ILB rule configuration is very important for the ILB to function properly with the SQL cluster (used to be called Direct Server Return in Azure v1)
  • Parameter $location: make sure to use the same location as for the other components in your cluster.

You should now check successful creation of the ILB in the Azure Portal:


You might wonder why we didn’t add the cluster nodes into the backend address pool right from the start (you can see above that the pool is empty), but obviously that’s not (yet) possible in PowerShell. One way to add the VMs is using the Azure Portal via the Backend pool link in the ILB settings, allowing selection of the availability set and associated VMs:


As we started creating the ILB in PowerShell let’s stick with it for adding the VMs. In order to do that we need the names of the 2 SQL VM NICs, which you can find in the list of resources of the sqlfci resource group:


Now we can add the backend address pool of the ILB to the NIC configurations as follows:

$lb = Get-AzureRmLoadBalancer -Name $ilbName -ResourceGroupName $groupName
$bePool = Get-AzureRmLoadBalancerBackendAddressPoolConfig -LoadBalancer $lb

$nic1 = Get-AzureRmNetworkInterface -Name "sql1176" -ResourceGroupName $groupName
$nic1.IpConfigurations[0].LoadBalancerBackendAddressPools.Add($bePool)
$nic1 | Set-AzureRmNetworkInterface

$nic2 = Get-AzureRmNetworkInterface -Name "sql2526" -ResourceGroupName $groupName
$nic2.IpConfigurations[0].LoadBalancerBackendAddressPools.Add($bePool)
$nic2 | Set-AzureRmNetworkInterface

Update the SQL Listener

After the ILB has been created we need to update the listener. We will also do that in PowerShell. This time the script needs to be run in an elevated PowerShell session on one of the cluster nodes:

$clusterNetworkName = "Cluster Network 1"
$ipResourceName = "SQL IP Address 1 (sqlclus)" 
$ilbIP = "" 

Import-Module FailoverClusters
Get-ClusterResource $ipResourceName | 
    Set-ClusterParameter -Multiple @{"Address"="$ilbIP";"ProbePort"="59999";"SubnetMask"="255.255.255.255";"Network"="$clusterNetworkName";"OverrideAddressMatch"=1;"EnableDhcp"=0}

Important things to note:

  • Parameter $clusterNetworkName: ‘Cluster Network 1’ is the default if you haven’t changed it
  • Parameter $ipResourceName: Name of the SQL cluster IP address resource. The identifier in parentheses is the SQL cluster name (sqlclus)
  • Parameter $ilbIP: has to match the SQL cluster IP configured during SQL setup

Running the script will give you a hint that you need to take the SQL cluster IP address resource off- and online:
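Taking the resource off- and online can be done right in the same PowerShell session. The role name ‘SQL Server (sqlclus)’ is the default generated by SQL setup for our instance name:

```powershell
# Recycle the SQL IP address resource so the new probe settings take effect,
# then bring the whole SQL Server role back online
Stop-ClusterResource $ipResourceName
Start-ClusterResource $ipResourceName
Start-ClusterGroup "SQL Server (sqlclus)"
```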


Configure the Windows Firewalls

In order to allow SQL traffic and the probes from the ILB to get all the way through to their destination, we need to open ports 1433 (SQL traffic) and 59999 (ILB probes) in the Windows firewall of both sql1 and sql2.
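A quick way to do that is via PowerShell on each node (the rule names are just suggestions; restrict the rules further in production):

```powershell
# Allow inbound SQL traffic and the ILB health probe (run on sql1 and sql2)
New-NetFirewallRule -DisplayName "SQL Server" -Direction Inbound `
    -Protocol TCP -LocalPort 1433 -Action Allow
New-NetFirewallRule -DisplayName "SQL ILB Probe" -Direction Inbound `
    -Protocol TCP -LocalPort 59999 -Action Allow
```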

According to security best practices, in a production environment you should also define Network Security Group rules, especially on the SQL subnet. There are default rules blocking all incoming traffic except traffic originating from inside the VNet or from the Azure Load Balancer. You should restrict that further and narrow it down to allow access only for clients of the cluster endpoint in the virtual network.

Validate Cluster Connection and Failover

In order to test connectivity to the cluster, it’s best to use SQL Server Management Studio (SSMS) on another machine in the VNet or alternatively install it on the domain controller. From there you can connect to the cluster using the cluster name sqlclus:


Open SSMS, create a Test database and a Person table with 2 columns Id and Name:


Enter a couple of records and execute the following query:

select Id, Name from Person

select connect_time, num_reads from sys.dm_exec_connections
where session_id = @@SPID


Note that with each query execution, connect_time will remain the same while num_reads increases by one.

Now you can go ahead and execute a manual failover in Cluster Manager. Navigate to Roles, right-click the SQL cluster object and select Move – Select Node:


Use the cluster node that’s currently the secondary (sql2 in my case) and confirm:


WSFC and SQL FCI will do their thing and gently fail over to the other node:
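For reference, the same manual failover can be triggered from PowerShell (the role name is the default generated by SQL setup for our instance):

```powershell
# Move the SQL Server role to the other cluster node
Move-ClusterGroup "SQL Server (sqlclus)" -Node sql2
```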



Executing the query from above in the still-open SSMS query window will show the new connection: connect_time has changed and num_reads is back down to an initial value of 5 or so:


Now, you may have noticed that when configuring DataKeeper, we specified a source and a target for replication. So is it going one way only or is it bidirectional between the two locally attached disks (hoping for the latter)? In order to check that, let’s add a record to the Person table, fail over to the original node and see what we get.

INSERT INTO Person(Id,Name) VALUES (4, 'Maria')

If you like, you can also trigger the failover by shutting down the primary node (sql2 in my example), so let’s do it just for fun. Before shutting down the primary, open Failover Cluster Manager on the secondary and watch what’s going to happen: the cluster promotes the other node to primary within a couple of seconds, great! But what about our new record? Execute the query from above in the query window and voilà, the new record is still there!



If you are looking for a highly scalable and reliable SQL Server HA solution in Azure IaaS, based on Windows Server Failover Cluster and SQL FCI using ‘shared’ storage, this approach is for you. The combination of Azure Premium Storage, Windows Storage Spaces and SIOS DataKeeper lets you provision powerful SQL clusters achieving significant IOPS and throughput numbers.

At the same time it might also save you money, as the approach does not require SQL Enterprise edition, but can be built on top of a 2-node cluster using SQL Standard.

For support considerations and a comparison of this approach to other solutions, check this blog post from the Windows Server Core Team.

Thanks for making it all the way down to the end of this long post. I hope you enjoyed reading & trying it out!
