ESXi scripted builds via PXE/kickstart

Periodically we spin up a slew of new hypervisors. If, like me, you find yourself desiring more automation (and the uniformity that comes with it) but sit somewhere between building ESXi by hand and the scale-out Auto Deploy tooling for hundreds of systems, you may find this useful, especially if you already maintain a kickstart server.

This is my experience with scripted installations of ESXi 6.5 via kickstart and PXE booting. The beauty of this approach is that it only requires two options set in your DHCP server; the rest of the configuration is static. The net result: configure your DHCP, let the host PXE boot, and within a few minutes you have a repeatable and dependable build process.

So, how to get there?

First up you need to have the physical environmental basics taken care of, e.g. rack & stack, cable up NICs etc. Once that is in place, this process requires DNS entries populated for your new servers, plus the ESXi ISO exploded onto your TFTP server.
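For reference, "exploding" the ISO is nothing more exotic than copying its contents into a directory under the TFTP root. A minimal sketch, where the TFTP root, mount point and ISO filename are placeholders for your own environment:

    # extract the ESXi installer onto the TFTP server (run as root; paths are examples)
    mkdir -p /var/lib/tftpboot/esx-6.5 /mnt/esx-iso
    mount -o loop VMware-VMvisor-Installer-6.5.0.iso /mnt/esx-iso
    cp -r /mnt/esx-iso/* /var/lib/tftpboot/esx-6.5/
    umount /mnt/esx-iso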

N.B. this scripted install uses the "--overwritevmfs" switch, so it's imperative that you do NOT have the host in a state where it can see any storage other than its own local disks, e.g. if you are rebuilding existing hypervisors or the HBAs have previously been zoned into something else. This switch has the capacity to overwrite existing filesystems with the new VMFS install, therefore the host must ONLY see its own local disks 😉

Overview of PXE & Kickstart boot workflow:

" <a href=https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vsphere-esxi-vcenter-server-60-pxe-boot-esxi.pdf (pay attention to “/“ and UEFI !)

[1] The boot loader files on the TFTP server. This is an exemplar showing both UEFI and non-UEFI sitting side by side on the same install base.

[screenshot 1: boot loader files on the TFTP server]
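To give a flavour of what sits in that directory, the two key pieces might look roughly like the below. This is a sketch: the directory name, timeout-free minimalism and kickstart URL are my own placeholders, and the kernel= and modules= lines should be left exactly as shipped in the ISO's boot.cfg (minus the leading "/" characters, which is the trailing-slash gotcha the VMware PDF above warns about).

    # esx-6.5/pxelinux.cfg/default -- legacy BIOS clients (sketch)
    DEFAULT install
    LABEL install
      KERNEL mboot.c32
      APPEND -c esx-6.5/boot.cfg
      IPAPPEND 2

    # esx-6.5/boot.cfg -- read by both mboot.c32 (BIOS) and mboot.efi (UEFI);
    # only the lines you typically edit are shown here
    prefix=esx-6.5
    kernelopt=ks=http://your.kickstart.server/cgi-bin/ks.cgi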


[2]  The contents of the CGI script that gleans the FQDN details

[screenshot 2: the CGI script]
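The original script only exists as a screenshot, so here is a minimal sketch of the idea in shell, assuming the CGI reverse-resolves the requesting host's IP to its FQDN and stamps it into a kickstart template; the template path and the @FQDN@ token are my own placeholders:

    #!/bin/sh
    # ks.cgi -- sketch: glean the requester's FQDN and emit a templated kickstart
    FQDN=$(getent hosts "$REMOTE_ADDR" | awk '{print $2}')
    printf 'Content-type: text/plain\n\n'
    # substitute the hostname we just looked up into the kickstart template
    sed "s/@FQDN@/${FQDN}/g" /var/www/kickstarts/esx65-template.cfg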

[3]  An exemplar of a kickstart file

[screenshot 3: kickstart file]
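Again, the original is a screenshot; a cut-down sketch of an ESXi 6.5 kickstart, showing the all-important --overwritevmfs switch, might look like the following. The password, the @FQDN@ token and the firstboot steps are illustrative assumptions only:

    # esx65-template.cfg -- illustrative sketch, not the original
    vmaccepteula
    rootpw VMware1!                             # placeholder, set your own
    install --firstdisk=local --overwritevmfs   # local disks only, see the warning above
    network --bootproto=dhcp --device=vmnic0
    reboot

    %firstboot --interpreter=busybox
    # set the FQDN gleaned by the CGI script, then enable SSH for config management
    esxcli system hostname set --fqdn=@FQDN@
    vim-cmd hostsvc/enable_ssh
    vim-cmd hostsvc/start_ssh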


The boot itself

So with that all in place, all you need to do is determine whether the server you're booting is legacy BIOS or UEFI. The only difference is the boot loader we point at, so insert the relevant DHCP-fu accordingly (there's an ISC dhcpd sketch after these options if that's what you run):

legacy boot

66=your.tftpserver.com

67=esx-6.5/pxelinux.0

UEFI boot

66=your.tftpserver.com

67=esx-6.5/mboot.efi
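If your DHCP server is ISC dhcpd rather than an appliance with numbered options, the equivalent of those two settings might look something like this; the class name is a placeholder and the architecture test follows the pattern in VMware's PXE guide:

    class "pxeclients" {
        match if substring(option vendor-class-identifier, 0, 9) = "PXEClient";
        option arch code 93 = unsigned integer 16;   # client architecture
        next-server your.tftpserver.com;             # option 66
        if option arch = 00:07 {                     # UEFI client
            filename "esx-6.5/mboot.efi";            # option 67
        } else {
            filename "esx-6.5/pxelinux.0";
        }
    }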

Then simply set the next boot to PXE, reboot, and watch the little birdy fly!
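If your servers have a BMC, even that last step can be scripted. A hedged example using ipmitool, where the BMC hostname and credentials are placeholders:

    # force a one-time PXE boot, then power-cycle the box
    ipmitool -I lanplus -H newhost-bmc.example.com -U admin -P 'secret' chassis bootdev pxe
    ipmitool -I lanplus -H newhost-bmc.example.com -U admin -P 'secret' chassis power cycle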

There you have it. Effectively two DHCP settings and you can rinse and repeat bringing up dozens of systems quickly with repeatable builds. There are of course many other options, all of which have their merits. This one is perhaps most useful if you are already invested in scripted builds/kickstarts.


VMware Integrated OpenStack

VMware Integrated OpenStack ‐ (Ninjas) In the Real World

At a recent VMUG I presented on our journey with VIO (VMware Integrated OpenStack) to date; here is a short write-up of that presentation.

So, firstly – what is VIO? Well, at its core it's a shrink-wrapped flavour of OpenStack, shipped & supported by VMware, running atop ESXi hypervisors. The VIO architecture connects vSphere resources to the OpenStack Compute, Networking, Block Storage, Image Service, Identity Service, and Orchestration components. You can deploy VMware Integrated OpenStack with either VDS or NSX-based networking.

Why use VIO and not a.n.other distro of OpenStack?

For us the biggest differentiator was enterprise stability. Basically VMware QA the deployment, ship you a virtual appliance, release roughly every 12 months & support each release for two years. We also had a big concern about the cost of day-two ops for vanilla OpenStack in comparison; VIO hugely reduces this, and also brings:

  • integration with existing VMware tooling/skills
  • the ability to treat it as a "shrink-wrapped virtual appliance"
  • a supported product ‐ GSS on the end of a phone (this has been invaluable as we take the first steps on our private cloud journey!)

Our Key Uses for VIO

  • IaaS (self‐service)
  • Enterprise Automation
  • Developer Cloud
  • Production Burst Capacity
  • Kubernetes Integration
  • Cloud scaling (On‐prem)

VIO Logical Design:

VIO basically looks like this: [architecture diagram not reproduced here]

There are two major things to be aware of in the initial design stage:

  1. It's really important to know where the line in the sand is – i.e. where VMware's bit stops & OpenStack starts, and how you cope with that. In a nutshell, VMware will support you all the way up to the Horizon UI; once inside user land, that's largely outwith the realms of GSS. So it's not a panacea, and you do need to know some OpenStack. But it's well worth it!
  2. The biggest design decision once you've decided to deploy (both fiscally and design-wise 🙂 ) is whether to run with NSX or VDS! Be aware that all of the L3 networking functionality in the OpenStack arena basically needs NSX to work as you would expect. It's also important to note you cannot migrate between a VDS and an NSX deployment; if you change your mind it's a complete re-install!

Other than that, you'll need a separate management cluster; we chose to build ours on a vSphere Metro Storage Cluster (vMSC). There is also a hard requirement for a dedicated VC, owing to the burn rate of new VMs/instances coming online and saturating your existing VC(s). It's worth noting that since we launched there is now VIO-in-a-box, an entire deployment condensed onto one node – well worth a look!

Challenges with OpenStack

VMware has gone a long way in their usual space of abstracting and simplifying the complex; however, no matter how well VIO deals with the infrastructure and deployment, it cannot help you with actually running OpenStack. Be aware that the on-ramp to learning OpenStack can be steep, with a proliferation of projects and components, many of which come and go & most of which are not enterprise-ready! The rate of change in the open-source world of OpenStack can be staggering, and consequently documentation can sometimes lag behind.

Advantages of OpenStack

The biggest thing here for us is the self-service element and the empowering of developers; the other key things were:

  • Fault Domains & Availability Zones
  • Less staff time needed at the Ops end, via self-service for devs
  • Multi‐Tenancy ‐ containers alongside legacy stacks from a single IaaS platform
  • Standardised APIs & product ‐ i.e. it is not custom VMware or other vendor lock-in code
  • Very large global community

Core Advantages of VIO over vanilla OpenStack

There are three arenas in which VIO has proved invaluable for us over vanilla OpenStack; they are:

  • Enterprise grade stability
  • Support direct with VMware – GSS on the end of a phone has been brilliant for us!
  • Almost zero day two Ops costs compared to vanilla OpenStack

Those three above were reason enough, but in addition we've also gained massively simplified upgrade paths. VMware release a new candidate via a new vApp, which installs in parallel with the old; the config is ported over, then we flip Horizon over to the new installation. Once happy, we just trash the old instance. There is also all the standard ESXi goodness under the hood like DRS et al, & finally backups – not to be sniffed at, VIO provides a mechanism to back up both the install and the state.

Install Pros & Cons

First up, be aware of the requirement for a dedicated VC & PSC; this is owing to the rate at which new instances can be created and the capacity for that to overload an existing VC. You'll need a separate cluster for management – recommended at 3 nodes – we spun up a small vMSC with block-replicated storage to give us DC-level resilience in the management tier. A standard-looking deployment will contain something like 16x VMs: 3x MongoDB, 3x Compute, 3x DB, 2x Controller, 2x DHCP servers, 2x Load Balancers, 1x Ceilometer.

How to use

So OpenStack is an API-driven thing; as such the API is by far the most complete & powerful way to interact with VIO/OpenStack, followed by the CLI and then the Horizon UI – in that order of functionality!
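As a trivial illustration of that ordering, the same "give me an instance" request via the standard CLI (which simply wraps the API) might look like this; the image, flavor, network and key names are all placeholders of my own:

    openstack server create \
        --image ubuntu-16.04 \
        --flavor m1.small \
        --network dev-net \
        --key-name jenkins-key \
        demo-instance-01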

We've also invested in some additional tooling from HashiCorp, namely Terraform (for the birthing of instances) and Vault (for the storing of secrets and retrieval via API calls). Integration with our Puppet stacks is ongoing, & on the side we've also stood up a sizeable Swift Object Store spanning multiple DCs. All of these together make for a fairly complete set of tooling. This is all just from the infrastructure side; our devs are additionally using Vagrant and Packer.

Our in house modifications, stories and lessons learned

This could/should be some of the most interesting bits – so well done if you're still reading! Below are my own experiences:

  • Do make sure you understand your use cases before you begin your journey e.g. NSX or not ‐ is it just containers you want?
  • Understand the workloads you want to stand up, ephemeral vs persistent
  • Networking ‐ layer 2 is doable, but you do lose significant features such as NaaS, security groups/segmentation and load balancing
  • We got it back – we’ve had a few events which may have been terminal and required full rebuilds for a vanilla OpenStack deployment. I should hasten to add that these have all been events that we’ve done to VIO, not VIO going awry or failing. Exemplars of those are:
    • do NOT delete your core services project!  Itchy trigger fingers from an Asana "clean up projects" task caused every single project to be deleted, including the core services project that holds all of OpenStack's internal services – the equivalent of "rm -rf / *" (this should no longer be possible after an RFE in releases beyond Mitaka)
    • Storage violently removed from the hypervisors – a combination of a VMware ESXi bug and a SAN platform migration caused storage to be violently taken away and this corrupted our VIO DB stacks. Again ‐ a Jenkins re‐deploy from base config brought it all back.
  • Be aware that whilst VMware cannot support many of the things you do on top of OpenStack (only the VIO components – that's understandable), you can extend it like any other OpenStack installation; we've built, and are using, a Swift Object Store (cross-datacentre) providing us a geographically stretched, redundant store – see the short CLI example after this list
  • Windows licensing is… tricky? Basically, in pulling together your infrastructure, if you intend to cater for Windows instances you should consider that footprint early on and make a determination as to whether to use Datacenter licensing or not.
  • VIO is a very transparent black box; there is much temptation to look, and poke, at the insides. This is not always the best idea – a little knowledge is a dangerous thing.
  • VMware PSO was invaluable to us in getting a functioning OpenStack environment in a short amount of time!
  • Ongoing GSS support has also been very beneficial – even in the early days, whilst it was under the emerging products banner, we always received great support
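To give a flavour of the Swift extension mentioned in the list above, driving it with the standard client is no different to any other OpenStack cloud; the container and file names here are simply mine for illustration:

    # create a container, push an object into it, list the contents
    openstack container create esxi-build-logs
    openstack object create esxi-build-logs install-report.tar.gz
    openstack object list esxi-build-logs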

Futures

As per the above, we've already integrated with Terraform and Vault, but we're pursuing further integrations with Consul & Puppet, along with automated birthing, configuration and plumbing of NetScaler configs, devolved DNS and auto-service discovery, and an upgrade to the incoming VIO 5, based on Queens.

If you’d like to know more, please reach out to me through this site.