VMware Integrated OpenStack ‐ (Ninjas) In the Real World
At a recent VMUG I presented on our journey to date with VIO (VMware Integrated OpenStack). Here is a short write-up of that presentation.
So, firstly – what is VIO? Well, at its core it’s a shrink-wrapped flavour of OpenStack, shipped & supported by VMware, running atop ESXi hypervisors. The VIO architecture connects vSphere resources to the OpenStack Compute, Networking, Block Storage, Image Service, Identity Service, and Orchestration components. You can deploy VMware Integrated OpenStack with either VDS or NSX‐based networking.
Why use VIO and not a.n.other distro of OpenStack?
For us the biggest differentiator was enterprise stability. Basically VMware QA the deployment, ship you a virtual appliance, release roughly every 12 months & provide support for two years. We also had a big concern about the cost of day two Ops with vanilla OpenStack; VIO hugely reduces this, and it also gives us:
- integration with existing VMware tooling/skills
- the ability to treat it as a “shrink wrapped virtual appliance”
- a supported product ‐ GSS on the end of a phone (this has been invaluable as we take the first steps on our private cloud journey!)
Our Key Uses for VIO
- IaaS (self‐service)
- Enterprise Automation
- Developer Cloud
- Production Burst Capacity
- Kubernetes Integration
- Cloud scaling (On‐prem)
VIO Logical Design:
VIO basically looks like this: [Figure: VIO logical design]
There are two major things to be aware of in the initial design stage:
- It’s really important to know where the line in the sand is, i.e. where does VMware’s bit stop & OpenStack start, & how do you cope with that? In a nutshell, be aware VMware will support you all the way up to the Horizon UI; once inside user land, that’s largely outwith the realms of GSS. So it’s not a panacea, and you do need to know some OpenStack. But it’s well worth it!
- The biggest decision once you’ve decided to deploy (both fiscal and design wise 🙂 ) … is whether to run with NSX or VDS! Be aware that all of the L3 networking functionality in the OpenStack arena basically needs NSX to work as you would expect. It’s also important to note you cannot migrate between a VDS and an NSX deployment; if you change your mind it’s a complete re-install!
Other than that you’ll need a separate management cluster; we chose to build ours on a vSphere Metro Storage Cluster (vMSC). There is also a hard requirement for a dedicated vCenter (VC), owing to the rate at which new VMs/instances come online, which could otherwise saturate your existing VC(s). It’s worth noting that since we launched there is now VIO-in-a-box, an entire deployment condensed onto one node – well worth a look!
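Coming back to the NSX vs VDS decision for a moment, it can help to write your requirements down as a simple check before you commit, given that there is no migration path. The sketch below is only that: the feature labels are made up for illustration (they are not OpenStack or VMware identifiers), and the set of NSX-dependent features reflects our experience as described in this post, not an official support matrix.

```python
# Illustrative labels only; these are not OpenStack or VMware identifiers.
# Assumption: based on our experience, these features effectively need NSX.
NSX_ONLY_FEATURES = {"l3_routing", "security_groups", "load_balancing"}


def choose_network_backend(required_features):
    """Return "NSX" if any required feature depends on it, otherwise "VDS"."""
    needs_nsx = NSX_ONLY_FEATURES & set(required_features)
    return "NSX" if needs_nsx else "VDS"


# A layer 2 only use case can live on VDS...
print(choose_network_backend(["l2_networks"]))
# ...but as soon as security groups are needed, NSX is the answer.
print(choose_network_backend(["l2_networks", "security_groups"]))
```

The point is less the code and more the exercise: list what you need on day one and what you might need in a year, because changing your mind later means a complete re-install.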
Challenges with OpenStack
VMware has gone a long way, in their usual space of abstracting and simplifying the complex thing. However, no matter how well VIO deals with the infrastructure and deployment, it cannot help with the running of OpenStack itself. You have to be aware that the on-ramp to learning OpenStack can be complex, with a proliferation of projects and components, many of which come and go & most of which are not enterprise ready! The rate of change in the open source world of OpenStack can be staggering, and consequently documentation can sometimes lag behind.
Advantages of OpenStack
The biggest thing here for us is the self-service element and the empowering of developers. The other key things were:
- Fault Domains & Availability Zones
- Less people resource needed at the Ops end, thanks to self-service for devs
- Multi‐Tenancy ‐ Containers alongside Legacy Stacks from a single IaaS platform
- Standardised APIs & product ‐ i.e. not custom VMware or other vendor lock-in code
- Very large global community
Core Advantages of VIO over vanilla OpenStack
There are three areas which have proved invaluable for us with VIO over vanilla OpenStack:
- Enterprise grade stability
- Support direct with VMware – GSS on the end of a phone has been brilliant for us!
- Almost zero day two Ops costs compared to vanilla OpenStack
Those three were reason enough, but in addition we’ve also gained from massively simplified upgrade paths. VMware ship each new release as a new vApp, which installs in parallel to the old one; the config is ported over, then we flip Horizon over to the new installation. Once happy, we just trash the old instance. There is also all the standard ESXi goodness under the hood like DRS et al, & finally backups – not to be sniffed at: VIO provides a mechanism to back up both the install and the state.
Install Pros & Cons
First up, be aware of the requirement for a dedicated VC & PSC. This is owing to the rate at which new instances can be created, and the capacity for that to overload an existing VC. You’ll need a separate cluster for management – recommended at 3 nodes – we spun up a small vMSC with block-replicated storage to give us DC-level resilience in the management tier. A standard-looking deployment will contain something like 16x VMs: 3x MongoDBs, 3x Compute, 3x DB, 2x Controller, 2x DHCP Servers, 2x Load Balancers, 1x Ceilometer.
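If you want to sanity-check the management footprint before sizing that cluster, a quick tally of the VM roles listed above does the job. This is just the list from this post written out as data; your own deployment may well differ.

```python
# VM roles and counts as listed above for our deployment; yours may differ.
management_vms = {
    "MongoDB": 3,
    "Compute": 3,
    "DB": 3,
    "Controller": 2,
    "DHCP Server": 2,
    "Load Balancer": 2,
    "Ceilometer": 1,
}

# Add up every role to get the number of VMs the management cluster must host.
total_vms = sum(management_vms.values())
print(f"Management VMs to plan for: {total_vms}")
```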
How to use
OpenStack is an API-driven thing; as such the API is by far the most complete & powerful way to interact with VIO/OpenStack, followed by the CLI and then the Horizon UI, in that order of functionality!
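To give a flavour of what talking to the API directly looks like, here is a minimal sketch of requesting a token from the Identity service (Keystone v3) using only the Python standard library. The endpoint, user, project and domain values are placeholders I’ve made up, and the helper names are my own; in practice you would more likely reach for an SDK or the CLI, so treat this as illustration and not as how we run things.

```python
import json
import urllib.request


def build_token_request(auth_url, username, password, project, domain="Default"):
    """Build (but do not send) a Keystone v3 password-auth token request."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"name": domain},
                        "password": password,
                    }
                },
            },
            # Scope the token to a single project.
            "scope": {"project": {"name": project, "domain": {"name": domain}}},
        }
    }
    return urllib.request.Request(
        url=auth_url.rstrip("/") + "/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(request):
    """Send the request; Keystone returns the token in a response header."""
    with urllib.request.urlopen(request) as response:
        return response.headers["X-Subject-Token"]


# Placeholder endpoint and credentials, for illustration only.
request = build_token_request(
    "https://vio.example.com:5000", "demo-user", "not-a-real-password", "demo-project"
)
print(request.get_method(), request.full_url)
```

Calling `fetch_token(request)` against a real endpoint would return a token to pass as `X-Auth-Token` on subsequent calls to the other services.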
We’ve also invested in some additional tooling from HashiCorp, namely Terraform (for the birthing of instances) and Vault (for the storing of secrets and their retrieval via API calls). Integration with our Puppet stacks is ongoing, & on the side we’ve also stood up a sizeable Swift Object Store spanning multiple DCs. All of these together make for a fairly complete set of tooling. This is all just from the infrastructure side; our Devs are additionally using Vagrant and Packer.
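As a rough idea of what retrieving a secret from Vault via an API call involves, the sketch below builds a read request against Vault’s HTTP API and unpacks the response. It assumes the KV version 2 secrets engine mounted at `secret`, and the address, token and path are placeholders; none of this is lifted from our actual setup, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_secret_request(vault_addr, token, path, mount="secret"):
    """Build (but do not send) a read request for a KV v2 secret."""
    # Assumption: KV version 2, where reads go via /v1/<mount>/data/<path>.
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path.lstrip('/')}"
    return urllib.request.Request(
        url, headers={"X-Vault-Token": token}, method="GET"
    )


def extract_secret(response_body):
    """KV v2 nests the key/value pairs under data.data in the JSON reply."""
    return json.loads(response_body)["data"]["data"]


# Placeholder address, token and path, for illustration only.
request = build_secret_request(
    "https://vault.example.com:8200", "not-a-real-token", "vio/db"
)
print(request.full_url)

# Example reply shape, trimmed down to the parts that matter here.
sample_reply = '{"data": {"data": {"password": "example"}, "metadata": {}}}'
print(extract_secret(sample_reply))
```

Sending the request with `urllib.request.urlopen(request)` and passing the response body to `extract_secret` would complete the round trip.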
Our in house modifications, stories and lessons learned
This could/should be some of the most interesting bits – so well done if you’re still reading! Below are my own experiences:
- Do make sure you understand your use cases before you begin your journey e.g. NSX or not ‐ is it just containers you want?
- Understand the workloads you want to stand up, ephemeral vs persistent
- Networking ‐ layer 2 only is doable, but you do lose significant features such as NaaS, Security Groups/segmentation and load balancing
- We got it back – we’ve had a few events which may have been terminal and required full rebuilds for a vanilla OpenStack deployment. I should hasten to add that these have all been events that we’ve done to VIO, not VIO going awry or failing. Exemplars of those are:
- Do NOT delete your core services project! Itchy trigger fingers from an Asana “clean up projects” task caused every single project to be deleted, including the core services project that holds all of OpenStack’s internal services: the equivalent of “rm -rf / *” (this should no longer be possible, following an RFE, in releases beyond Mitaka)
- Storage violently removed from the hypervisors – a combination of a VMware ESXi bug and a SAN platform migration caused storage to be violently taken away and this corrupted our VIO DB stacks. Again ‐ a Jenkins re‐deploy from base config brought it all back.
- Be aware that whilst VMware cannot support many of the things you can do on top of OpenStack (only the VIO components – that’s understandable), you can extend it like any other OpenStack installation. We’ve built, and are using, a Swift Object Store (cross datacenter), providing us with a geographically stretched, redundant filesystem.
- Windows licensing is… tricky. Basically, when pulling together your infrastructure, if you intend to cater for Windows instances you should consider that footprint early on and decide whether or not to use Datacenter licensing.
- VIO is a very transparent black box; there is much temptation to look, and poke, at the insides. This is not always the best idea – a little knowledge is a dangerous thing.
- VMware PSO was invaluable to us in getting a functioning OpenStack environment in a short amount of time!
- Ongoing GSS Support has also been very beneficial – even in the early days whilst it was under the emerging products banner we always received great support
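On the project clean-up incident above: one cheap safeguard is to filter any bulk-delete list through a set of protected project names before acting on it. The sketch below is hypothetical: the protected names are assumptions (check what your own deployment calls its internal projects), and it only produces the filtered list; it does not call OpenStack at all.

```python
# Assumed names; check what your deployment calls its internal projects.
PROTECTED_PROJECTS = {"admin", "service"}


def projects_safe_to_delete(candidates, protected=PROTECTED_PROJECTS):
    """Drop protected projects (case-insensitively) from a clean-up list."""
    return [name for name in candidates if name.lower() not in protected]


# The clean-up task hands us everything, including the internal projects.
cleanup_list = ["dev-sandbox", "service", "Admin", "old-poc"]
print(projects_safe_to_delete(cleanup_list))
```

Putting a check like this in front of any scripted clean-up means a mistake in the task list removes a sandbox, not the platform.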
As per above, we’ve already integrated with Terraform and Vault, but we’re pursuing further integrations with Consul & Puppet. Also on the list: automated birthing, config and plumbing of NetScaler configs, devolved DNS, automatic service discovery, and an upgrade to the incoming VIO 5, based on Queens.
If you’d like to know more, please reach out to me through this site.