Only one day has gone by since I originally posted this – and I must say that this has been a very interesting adventure. The detailed discussion is in the comments. However, this is a real and valid scenario that a developer should plan for.
Here is a bit more insight into the behavior of VMs in Azure – and one more piece of evidence that VM Role is NOT a solution for Infrastructure as a Service.
Let's begin with a very simple, one-VM scenario:
With a VM Role VM – you (the developer, the person who wants to run your VM on Azure) upload a VHD into VHD-specific storage, in a specific datacenter.
You then create an application in the tool of your choice and reference the VHD in the service settings – this links your VHD to a VM definition, firewall configuration, load balancer configuration, etc.
You deploy your VM Role-centric service, sit back and wait – then test, and voila! It works.
You do stuff with your VM, its life changes, and all is happy – or is it?
Now, you – being a curious individual – click the “Reboot” button in the Azure portal. You think, cool, I am rebooting my VM – but you aren’t; you are actually resetting your service deployment. You return to your VM Role VM to find your changes missing.
This takes us into the behaviors of the Azure Fabric. On a quick note – if you want some type of persistence, you need to use Azure storage for it. Back to the issue at hand – your rolled-back VM. Let's explore a possibility for why this happens.
BTW - This behavior is the same for Web Roles and Worker Roles as well – but there it is the Azure base OS image, not yours.
Basically, what happened is no different from a revert to a previous snapshot in Hyper-V, or the old Virtual PC rollback mode. When a VM is deployed there is a base VHD (this can be an Azure base image – or your VM Role VHD) and a new differencing disk that is spawned off of it.
Your selecting reboot actually tossed out the differencing disk, which contains your latest changes, and created a new one, thus reverting your VM Role VM. This is all fine and dandy; however, my biggest question is: what are the implications for authentication mechanisms such as Active Directory? AD does not deal well with rollbacks of itself or of domain-joined machines at all.
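The base-plus-differencing-disk behavior can be sketched as a copy-on-write overlay. This is an illustrative model, not Azure code – the class and method names here are hypothetical:

```python
class DifferencingDisk:
    """Copy-on-write overlay over a read-only base image (the base VHD)."""

    def __init__(self, base):
        self.base = base    # read-only base VHD contents
        self.delta = {}     # writes land here, never in the base

    def write(self, key, value):
        self.delta[key] = value

    def read(self, key):
        # Reads prefer the overlay, falling back to the base image
        return self.delta.get(key, self.base.get(key))

    def reboot_via_portal(self):
        # What the portal "Reboot" effectively did: discard the overlay
        # and spawn a fresh differencing disk off the same base
        self.delta = {}


vm = DifferencingDisk(base={"joined_domain": False})
vm.write("joined_domain", True)   # you domain-join the machine
print(vm.read("joined_domain"))   # True
vm.reboot_via_portal()            # someone clicks "Reboot"
print(vm.read("joined_domain"))   # False - reverted to the base image
```

Every change since deployment lives only in the overlay, which is exactly why a portal "reboot" looks like a snapshot revert.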
My scenario is that you are using Azure Connect to connect back to a domain controller in your environment – you join the domain, someone clicks reboot, and your machine is no longer domain joined, or you have a mess of authentication errors. Again, this is not the Azure model.
The Azure model in this case is that your VM reboots at the fabric layer back to the base image (Microsoft recommends that you prepare it with sysprep) and re-joins your domain as a new machine – with all of its pre-installed software.
This is all about persistence and where that persistence resides. In the VMs of your service there is no persistence; persistence resides within your application and its interaction with Azure storage, or in writing back to some element within the enterprise.
This is important to understand, especially if you think of Azure as IaaS – which you need to stop doing. It is a platform. It is similar to a hypervisor, but it is not a hypervisor in your interaction with it as a developer or IT Pro.
In a nutshell, what happened is that during the reboot of my VM the Azure Fabric considered the VM unhealthy and thus provisioned a new one. It could be that the differencing disk could not be written back to the root VHD, or that “something” in my VM was not as the fabric wanted it, so it was considered bad and a new one was provisioned.
Regardless – this is valid behavior – it is behavior to understand and plan for – and it again emphasizes that if you want persistence you must design it in, writing your application state out to Azure Storage (or enterprise storage using Azure Connect) in some way in order to guarantee persistence.
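Designing persistence in looks roughly like this: write application state through to durable storage on every change, and reload it when the role (re)starts. This is a minimal sketch – `DictStore` is an in-memory stand-in for a real durable backend such as an Azure storage container, and all names here are hypothetical:

```python
import json


class DictStore:
    """In-memory stand-in for a durable store (e.g. Azure storage)."""

    def __init__(self):
        self._blobs = {}

    def put(self, name, data):
        self._blobs[name] = data

    def get(self, name):
        return self._blobs.get(name)


class AppState:
    def __init__(self, store, name="state.json"):
        self.store = store
        self.name = name
        # On startup, reload whatever state survived the last re-provision
        raw = store.get(name)
        self.data = json.loads(raw) if raw else {}

    def set(self, key, value):
        self.data[key] = value
        # Write through on every change - the VM's disk can vanish at any time
        self.store.put(self.name, json.dumps(self.data))


durable = DictStore()            # lives outside the VM, survives re-provisioning
state = AppState(durable)
state.set("processed_orders", 42)

# The fabric re-provisions the VM: local state is gone, durable state is not
state = AppState(durable)
print(state.data["processed_orders"])  # 42
```

The key design point is that the VM itself is treated as disposable; anything you care about crosses the boundary into storage the fabric does not recycle.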
All very interesting.