Monday, April 4, 2011

Sneaking around the lack of name resolution in Azure

This is one of those things that you will most likely run into if you try to run a traditional enterprise application in Azure – there is no name resolution.

Over the years, enterprise applications have evolved from using IP addresses, to relying on WINS, and now to DNS.  Using machine names gives server administrators the flexibility to replace boxes at will, simply by editing a DNS entry or giving the new server the old server’s name.  There is no need to update the configuration of the application.

Well, now we have Azure.  And although it is not Infrastructure as a Service, if you use a VM Role for some component of a historic enterprise application you do expect familiar features.

One way to handle this is to use Azure Connect.  Connect provides name resolution for the machines that participate in the virtual network.  However, you only get IPv6 endpoint addresses back.  That is useful for machine-to-machine communication but not for much else.

Another hitch is that in Azure machine names change.  Since all of the images are prepared with sysprep, they come out of sysprep with new machine names (each provisioned instance needs to be unique, right?).  All that I know is that my worker tier machines will get new names if something happens and an instance gets replaced, or if someone stops my service and then starts it again, or reimages one of my instances.  With this going on, name resolution actually doesn’t help me much.

Well, I had a situation where I absolutely needed name resolution.  And that is one piece of information that I cannot query through the Azure Service Runtime.  So I have a homegrown solution – I edit the HOSTS file on each server.

In my example I only need to know the machine names of all of the instances of a particular role, so I query the Service Runtime for that role to get the IP addresses.  I then turn around and use WinRM to query those instances and get the DNSHostName back from each of them.

I am using WinRM and not WMI because it uses a single, well-known incoming port, so I only have to define one endpoint at the Role level.  Through a bit of searching I discovered that the ports WMI uses are not so fixed, and I didn’t want to force them to be.

Here is how it goes.  Warning:  I am not securing any of this.  I use HTTP for WinRM, and I reduce the PowerShell script execution security.  If you need to be secure and tight, then you need to tighten that up.

In my sysprep unattend.xml I first change the PowerShell execution policy.  I then run a script to configure the WinRM service and client.  I then have a wait to give each instance a chance to apply the WinRM settings.  Then I query the Azure Service Runtime for the role instances and their IP addresses, and through WinRM I touch each server and get its DNSHostName.  Last, I append this information to the local HOSTS file.  This executes on each server, since they are all clones of each other after all.

Don’t forget to add the Internal Endpoint of 5985 to the Role definition in your service.  If you don’t, none of this matters.  If you secure this, use the endpoint of 5986.

Setting up WinRM on each client (insecurely, be aware of that):

<#
.SYNOPSIS
    A script to set the WinRM firewall rules and to configure the service and the
    client.  This enables unsecured communication.
.DESCRIPTION
    This script is designed to simply set up WinRM for remote unsecured communication.
    QuickConfig opens the firewall, the service settings allow remote connections,
    the client setting enables unsecured calling from the client to the remote service.
.LEGAL
    SCRIPT PROVIDED "AS IS" WITH NO WARRANTIES OR GUARANTEES OF ANY KIND, INCLUDING BUT NOT LIMITED TO
    MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.  ALL RISKS OF DAMAGE REMAINS WITH THE USER, EVEN IF THE AUTHOR,
    SUPPLIER OR DISTRIBUTOR HAS BEEN ADVISED OF THE POSSIBILITY OF ANY SUCH DAMAGE.  IF YOUR STATE DOES NOT PERMIT THE COMPLETE
    LIMITATION OF LIABILITY, THEN DELETE THIS FILE SINCE YOU ARE NOW PROHIBITED TO HAVE IT.  TEST ON NON-PRODUCTION SERVERS.
.AUTHOR
    Brian Ehlert, Citrix Labs, Redmond, WA, USA
.REFERENCES
    Thank you TechNet. For examples. And a random forum post for the single quote fix.
#>

winrm quickconfig -quiet
winrm set winrm/config/service '@{AllowUnencrypted="true"}'
winrm set winrm/config/service/auth '@{Basic="true"}'
winrm set winrm/config/client '@{AllowUnencrypted="true"}'
winrm set winrm/config/client '@{TrustedHosts="*"}'

Now, to enumerate the information from the service runtime and append it to the HOSTS file:

<#
.SYNOPSIS
    A script to set the HOSTS file for Azure VMs to allow proper name resolution.
.DESCRIPTION
    In an Azure environment name resolution might not be available or might resolve IPv6 addresses.
    The Azure RuntimeService is queried to discover the IP addresses of other role members.  And then
    WinRM is used to query the DNSHostName of the other servers.
    The results are then appended to the HOSTS file.
.LEGAL
    SCRIPT PROVIDED "AS IS" WITH NO WARRANTIES OR GUARANTEES OF ANY KIND, INCLUDING BUT NOT LIMITED TO
    MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.  ALL RISKS OF DAMAGE REMAINS WITH THE USER, EVEN IF THE AUTHOR,
    SUPPLIER OR DISTRIBUTOR HAS BEEN ADVISED OF THE POSSIBILITY OF ANY SUCH DAMAGE.  IF YOUR STATE DOES NOT PERMIT THE COMPLETE
    LIMITATION OF LIABILITY, THEN DELETE THIS FILE SINCE YOU ARE NOW PROHIBITED TO HAVE IT.  TEST ON NON-PRODUCTION SERVERS.
.AUTHOR
    Brian Ehlert, Citrix Labs, Redmond, WA, USA
.REFERENCES
    Thank you to Jason Fossen (http://blogs.sans.org/windows-security/). And to TechNet. For examples.
#>

# this is a local administrator and user name that is established in the VM
# Azure RDP access will automatically create / inject a user account that is defined
# Otherwise you need to establish a user account using your unattend.xml
$userName = "administrator"
$password = "Citrix`$2"
# Note:  I am using a plain text password.

# Add the Service Runtime snap-in to the standard Windows PowerShell command shell.
add-pssnapin microsoft.windowsazure.serviceruntime

# Take the VM Instance offline with Azure
Set-RoleInstanceStatus -Busy

$HostsFilePath = "$env:systemroot\system32\drivers\etc\hosts"

# Test the Hosts file by adding LocalHost entries
"127.0.0.1 localhost"  | add-content $HostsFilePath -force
"::1 localhost"  | add-content $HostsFilePath -force
if (-not $?) { "Error writing to hosts file!" ; return }

# Discover the other members of the Role
# A server cannot discover which Role it belongs to (the "what am I" question), so the role name must be hardcoded.
# Enumerate all of the instances of the role named MyRole
$roleMem = Get-RoleInstance -Role MyRole

# Discover the endpoint IP of each instance (the service is not necessary as it is the same for all)
foreach ($roleIn in $roleMem) {

    # Reset per instance so a stale IP from the previous instance is never reused
    $endIp = $null

    # Find the endpoint that uses the WinRM port number
    foreach ($roleInEnd in $roleIn.InstanceEndpoints.Values){
        if ($roleInEnd.IPEndpoint.Port.Equals(5985)){
            # Get the IP of the endpoint
            $endIp = $roleInEnd.IPEndpoint.Address.ToString()
        }
    }

    # Skip any instance that does not expose the WinRM endpoint
    if (-not $endIp) { continue }

    # winrm enum wmi/root/cimv2/Win32_ComputerSystem (I simply find it easier to treat this as XML)
    $remoteServer = [xml](winrm enum wmi/root/cimv2/Win32_ComputerSystem -r:$endIp -encoding:utf-8 -a:basic -u:$userName -p:$password -format:pretty)

    # Echo the entry to the console, then append it to the HOSTS file
    $hostsEntry = $endIp + "`t" + $remoteServer.Results.Win32_ComputerSystem.DNSHostName
    $hostsEntry
    $hostsEntry | add-content $HostsFilePath -force

}
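
One thing to watch: the script appends to the HOSTS file every time it runs, so if it runs more than once on the same instance the file can collect duplicate or stale lines.  Here is a minimal sketch of one way around that.  Merge-HostsEntry is a hypothetical helper name, not part of the original scripts, and it only looks at the first name on each line (aliases are ignored).

```powershell
# Hypothetical helper (not from the original scripts).
# Takes the current HOSTS lines and returns them with exactly one entry for
# $HostName: any earlier non-comment line whose first name is $HostName is dropped.
function Merge-HostsEntry {
    param(
        [string[]]$Lines,      # current contents of the HOSTS file, one line per element
        [string]$IpAddress,    # address to map
        [string]$HostName      # name to map it to
    )

    # Match "<address><whitespace><name>" at the start of a non-comment line.
    $pattern = '^\s*[^#\s]+\s+' + [regex]::Escape($HostName) + '(\s|$)'

    $kept = @()
    if ($null -ne $Lines) {
        foreach ($line in $Lines) {
            if ($line -notmatch $pattern) { $kept += $line }
        }
    }

    # Emit the surviving lines followed by the new entry.
    $kept + ($IpAddress + "`t" + $HostName)
}
```

To apply it to the real file, read the lines with Get-Content, pass them through Merge-HostsEntry, and write the result back with Set-Content.  This is untested on an Azure instance, so treat it as a starting point.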

That is it.  I use two scripts with a wait sequence in between (I use my random sleep script from a couple of articles back).  Since each machine is independent and must be able to call out to the others, it is important that the WinRM configuration happens early on every instance.
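
As an alternative to a fixed or random wait, the second script could poll each instance until its WinRM port accepts a connection.  The sketch below assumes a hypothetical helper named Wait-TcpPort (not part of the original scripts) built on the .NET TcpClient class.  Note that an open port only shows that a listener exists; it does not prove that the remaining WinRM settings have been applied, so a short extra pause may still be wise.

```powershell
# Hypothetical helper (not from the original scripts).
# Returns $true as soon as a TCP connection to $Address on $Port succeeds,
# or $false once $TimeoutSeconds has elapsed without a successful connection.
function Wait-TcpPort {
    param(
        [string]$Address,
        [int]$Port = 5985,            # WinRM over HTTP
        [int]$TimeoutSeconds = 300,
        [int]$RetrySeconds = 5
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        $client = New-Object System.Net.Sockets.TcpClient
        try {
            # Throws if the port is closed or the host is unreachable.
            $client.Connect($Address, $Port)
            return $true
        }
        catch {
            Start-Sleep -Seconds $RetrySeconds
        }
        finally {
            # Runs on both the success and the failure path.
            $client.Close()
        }
    } while ((Get-Date) -lt $deadline)

    return $false
}
```

Inside the loop over the role instances, a call such as Wait-TcpPort -Address $endIp could sit just before the winrm enum line, skipping the instance when it returns $false.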
