ESXi scripted builds via PXE/kickstart

Periodically we spin up a slew of new hypervisors. If, like me, you find yourself wanting more automation (and the uniformity that comes with it) but sit somewhere between building ESXi hosts by hand and the scale-out Auto Deploy tooling aimed at hundreds of systems, you may find this useful, especially if you already maintain a kickstart server.

These are my experiences with scripted installations of ESXi 6.5 via kickstart and PXE booting. The beauty of this approach is that it only requires two options to be set in your DHCP server; the rest of the configuration is static. The net result: configure your DHCP, let the host PXE boot, and within a few minutes you have a repeatable and dependable build process.

So, how to get there?

First up you need to have the physical environment basics taken care of, e.g. rack & stack, cable up NICs etc. Once that is in place, this process requires DNS entries populated for your new servers, plus the ESXi ISO exploded onto your TFTP server.

N.B. this scripted install uses the “--overwritevmfs” switch, so it is imperative that the host is NOT in a state where it can see any storage other than its own local disks, e.g. if you are rebuilding existing hypervisors or the HBAs have previously been zoned into something else. This switch has the capacity to overwrite existing filesystems with the new VMFS install, so the host must ONLY see its own local disks 😉

Overview of PXE & Kickstart boot workflow:

" <a href=https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vsphere-esxi-vcenter-server-60-pxe-boot-esxi.pdf (pay attention to “/“ and UEFI !)

[1] The boot loader files on the TFTP server. This is an exemplar showing UEFI and non-UEFI sitting side by side on the same install base.

[screenshot 1: boot loader files on the TFTP server]
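
As a stand-in for the screenshot, here is a sketch of what that layout might look like. The names below are illustrative: pxelinux.0 comes from syslinux, while mboot.c32 and mboot.efi ship on the ESXi ISO (mboot.efi being the EFI loader renamed).

    /tftpboot/esx-6.5/
        pxelinux.0          <- legacy BIOS loader (from syslinux)
        mboot.c32           <- multiboot module pxelinux chains into
        mboot.efi           <- UEFI loader (off the ESXi ISO, renamed)
        boot.cfg            <- shared boot config, carries the kickstart URL
        pxelinux.cfg/
            default         <- pxelinux config pointing at mboot.c32 + boot.cfg
        ...                 <- the rest of the exploded ESXi ISO contents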


[2] The contents of the CGI script that gleans the FQDN details

[screenshot 2: CGI script that gleans the FQDN]
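
Since the screenshot won’t reproduce here, a minimal sketch of what such a script might look like (ours differs in the detail, and the paths are illustrative). The idea is simply: reverse-resolve the requesting IP to its FQDN – hence the DNS prerequisite above – and serve back that host’s kickstart file:

    #!/bin/sh
    # ks.cgi - hand each PXE-booting host its own kickstart file.
    # The web server's CGI environment provides REMOTE_ADDR (the caller's IP).
    echo "Content-type: text/plain"
    echo ""
    # Reverse-resolve the IP to an FQDN - this is why the DNS entries must exist.
    FQDN=$(host "$REMOTE_ADDR" | awk '{print $NF}' | sed 's/\.$//')
    # Serve the per-host kickstart (path illustrative).
    cat "/var/www/html/kickstarts/${FQDN}.cfg"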

[3] An exemplar of a kickstart file

[screenshot 3: exemplar kickstart file]
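
Again as a stand-in for the screenshot, a minimal ESXi 6.5 kickstart sketch; every value below is illustrative rather than our production config. Note the --overwritevmfs switch from the warning above:

    # ks.cfg - minimal scripted ESXi 6.5 install (illustrative values)
    vmaccepteula
    # Wipe whatever is on the first local disk - hence the storage warning above
    install --firstdisk --overwritevmfs
    rootpw TempPassw0rd!
    # Static addressing; the CGI hands out a per-host file so these can be baked in
    network --bootproto=static --ip=192.168.1.50 --netmask=255.255.255.0 --gateway=192.168.1.1 --hostname=esx01.example.com --nameserver=192.168.1.2
    reboot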


The boot itself

So with all that in place, all you need to do is determine whether the server you’re booting uses legacy BIOS or UEFI. The only difference is the boot loader we point at; insert the relevant DHCP-fu as follows:

Legacy BIOS boot:

66 = your.tftpserver.com

67 = esx-6.5/pxelinux.0

UEFI boot:

66 = your.tftpserver.com

67 = esx-6.5/mboot.efi
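
Both loaders end up reading the same boot.cfg, which is where the kickstart URL gets wired in. Sketches of the two glue files, with the URL illustrative:

    # pxelinux.cfg/default - legacy BIOS path only
    DEFAULT install
    LABEL install
      KERNEL mboot.c32
      APPEND -c esx-6.5/boot.cfg

    # boot.cfg (shipped on the ISO) - append ks= to the existing kernelopt line;
    # per the VMware doc above you may also need prefix= set and the leading "/"
    # stripped from the module paths.
    kernelopt=ks=http://your.kickstart.server/cgi-bin/ks.cgi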

Then simply set the next boot to PXE, reboot, and watch the little birdy fly!

There you have it. Effectively two DHCP settings and you can rinse and repeat bringing up dozens of systems quickly with repeatable builds. There are of course many other options, all of which have their merits. This one is perhaps most useful if you are already invested in scripted builds/kickstarts.


VMware Integrated OpenStack

VMware Integrated OpenStack – (Ninjas) In the Real World

At a recent VMUG I presented on our journey to date with VIO (VMware Integrated OpenStack); here is a short write-up of that presentation.

So, firstly – what is VIO? Well, at its core it’s a shrink-wrapped flavour of OpenStack, shipped and supported by VMware, running atop the ESXi hypervisor. The VIO architecture connects vSphere resources to the OpenStack Compute, Networking, Block Storage, Image Service, Identity Service, and Orchestration components. You can deploy VMware Integrated OpenStack with either VDS- or NSX-based networking.

Why use VIO and not a.n.other distro of OpenStack?

For us the biggest differentiator was enterprise stability. Basically VMware QA the deployment, ship you a virtual appliance, release roughly every 12 months, and support each release for two years. We also had a big concern about the comparative cost of day-two ops for vanilla OpenStack; VIO hugely reduces this, and also brings:

  • integration with existing VMware tooling and skills
  • the ability to treat it as a “shrink-wrapped virtual appliance”
  • a supported product – GSS on the end of a phone (this has been invaluable as we take the first steps on our private cloud journey!)

Our Key Uses for VIO

  • IaaS (self‐service)
  • Enterprise Automation
  • Developer Cloud
  • Production Burst Capacity
  • Kubernetes Integration
  • Cloud scaling (On‐prem)

VIO Logical Design:

VIO basically looks like this: [diagram: VIO logical design]

There are two major things to be aware of at the initial design stage:

  1. It’s really important to know where the line in the sand is – i.e. where does VMware’s bit stop and OpenStack start, and how do you cope with that? In a nutshell, VMware will support you all the way up to the Horizon UI; once inside user land, you’re largely outwith the realms of GSS. So it’s not a panacea, and you do need to know some OpenStack. But it’s well worth it!
  2. The biggest decision once you’ve decided to deploy (both fiscally and design-wise 🙂 ) is whether to run with NSX or VDS! Be aware that all of the L3 networking functionality in the OpenStack arena basically needs NSX to work as you would expect. It’s also important to note you cannot migrate between a VDS and an NSX deployment; if you change your mind it’s a complete re-install!

Other than that you’ll need a separate management cluster; we chose to build ours on a vSphere Metro Storage Cluster (vMSC). There is also a hard requirement for a dedicated VC, owing to the burn rate of new VMs/instances coming online and saturating your existing VC(s). It’s worth noting that since we launched there is now VIO-in-a-box, an entire deployment condensed onto one node – well worth a look!

Challenges with OpenStack

VMware has gone a long way in their usual space of abstracting and simplifying the complex; however, no matter how well VIO deals with the infrastructure and deployment side, it cannot help with actually running OpenStack. Be aware that the on-ramp to learning OpenStack can be steep, with a proliferation of projects and components, many of which come and go, and most of which are not enterprise-ready! The rate of change in the open-source world of OpenStack can be staggering, and consequently documentation can sometimes lag behind.

Advantages of OpenStack

The biggest thing here for us is the self-service element and the empowering of developers; the other key gains were:

  • Fault Domains & Availability Zones
  • Less person-resource at the ops end, via self-service for devs
  • Multi-tenancy – containers alongside legacy stacks from a single IaaS platform
  • Standardised APIs and product – i.e. not custom VMware or other vendor lock-in code
  • Very large global community

Core Advantages of VIO over vanilla OpenStack

There are three arenas which have proved invaluable for us with VIO over vanilla OpenStack; they are:

  • Enterprise grade stability
  • Support direct with VMware – GSS on the end of a phone has been brilliant for us!
  • Almost zero day two Ops costs compared to vanilla OpenStack

Those three above were reason enough, but in addition we’ve also gained massively simplified upgrade paths. VMware release a new candidate as a new vApp, which installs in parallel to the old; the config is ported over, then we flip Horizon over to the new installation. Once happy, we just trash the old instance. There is also all the standard ESXi goodness under the hood like DRS et al, and finally backups – not to be sniffed at – VIO provides a mechanism to back up both the installation and its state.

Install Pros & Cons

First up, be aware of the requirement for a dedicated VC & PSC; this is owing to the rate at which new instances can be created and the capacity for that to overload an existing VC. You’ll need a separate cluster for management – recommended at 3 nodes – and we spun up a small vMSC with block-replicated storage to give us DC-level resilience in the management tier. A standard-looking deployment will contain something like 16 VMs: 3x MongoDB, 3x Compute, 3x DB, 2x Controller, 2x DHCP Server, 2x Load Balancer, 1x Ceilometer.

How to use

So OpenStack is an API-driven thing; as such the API is by far the most complete and powerful way to interact with VIO/OpenStack, followed by the CLI and then the Horizon UI – in that order of functionality!
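
To make that pecking order concrete, here’s the same operation at the top two levels – a hedged sketch, with the names, UUIDs and endpoint URL all illustrative:

    # CLI - boot an instance
    openstack server create --image ubuntu-16.04 --flavor m1.small --network dev-net web01

    # Raw API - the equivalent Nova call (token previously obtained from Keystone)
    curl -X POST "https://vio.example.com:8774/v2.1/servers" \
         -H "X-Auth-Token: ${TOKEN}" -H "Content-Type: application/json" \
         -d '{"server": {"name": "web01", "imageRef": "IMAGE_UUID", "flavorRef": "FLAVOR_UUID", "networks": [{"uuid": "NET_UUID"}]}}'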

We’ve also invested in some additional tooling from HashiCorp, namely Terraform (for the birthing of instances) and Vault (for the storing of secrets and their retrieval via API calls). Integration with our Puppet stacks is ongoing, and on the side we’ve also stood up a sizeable Swift Object Store spanning multiple DCs. Together these make for a fairly complete set of tooling – and that’s just the infrastructure side; our devs are additionally using Vagrant and Packer.

Our in-house modifications, stories and lessons learned

This could/should be some of the most interesting bits – so well done if you’re still reading! Below are my own experiences:

  • Do make sure you understand your use cases before you begin your journey, e.g. NSX or not – is it just containers you want?
  • Understand the workloads you want to stand up: ephemeral vs persistent
  • Networking – layer 2 is doable, but you do lose significant features such as NaaS, security groups/segmentation, and load balancing
  • We got it back – we’ve had a few events which might have been terminal, requiring full rebuilds, on a vanilla OpenStack deployment. I should hasten to add that these were all things we did to VIO, not VIO going awry or failing. Exemplars:
    • Do NOT delete your core services project! Itchy trigger fingers from an Asana “clean up projects” task caused every single project to be deleted, including the core services project that holds all of OpenStack’s internal services – the equivalent of “rm -rf / *” (after an RFE, this should no longer be possible in releases beyond Mitaka)
    • Storage violently removed from the hypervisors – a combination of a VMware ESXi bug and a SAN platform migration caused storage to be violently taken away, corrupting our VIO DB stacks. Again, a Jenkins re-deploy from base config brought it all back.
  • Be aware that whilst VMware cannot support much of what you do on top of OpenStack (only the VIO components – that’s understandable), you can extend it like any other OpenStack installation; we’ve built, and are using, a Swift Object Store (cross-datacentre) providing us a geographically stretched redundant filesystem.
  • Windows licensing is… tricky. Basically, in pulling together your infrastructure, if you intend to cater for Windows instances you should consider that footprint early on and make a determination as to whether to use datacenter licensing or not.
  • VIO is a very transparent black box; there is much temptation to look, and poke, at the insides. This is not always the best idea – a little knowledge is a dangerous thing.
  • VMware PSO was invaluable to us in getting a functioning OpenStack environment up in a short amount of time!
  • Ongoing GSS support has also been very beneficial – even in the early days, whilst it was under the emerging products banner, we always received great support.

Futures

As per the above, we’ve already integrated with Terraform and Vault, but we’re pursuing further integrations with Consul and Puppet; automated birthing, config and plumbing of NetScaler configs; devolved DNS and auto-service discovery; and an upgrade to the incoming VIO 5, based on Queens.

If you’d like to know more, please reach out to me through this site.

Setting an SRM VM Priority from vRO

Recently I was given a requirement to enhance a vRO workflow which adds a VM to a disaster recovery policy in SRM. By default the existing workflow added all VMs to the Priority 3 (normal) start-up order. My requirement was to allow the user to specify the start-up order.

Having a quick look at the environment, I could see that the SRM plugin was in use, which felt like a good start. However, it soon became apparent that it wasn’t ideal for me, given that the information we can get out of the plugin is limited, never mind having to manipulate that data. Looking online, it seemed that PowerShell was the common answer to automating this, but I also had a constraint of not introducing any new plugins. During my online hunt I found the SRM 6.5 API guide, which became a valuable resource. Browsing it made it apparent that the SOAP API was my only option, and I continued to refer to the guide in order to find a solution – https://www.vmware.com/support/developer/srm-api/site-recovery-manager-65-api.pdf.

I decided to write this blog because there seemed a severe lack of info on using SOAP for SRM. Continue reading

GitHub Learning Lab

This has been cross posted from my own blog vGemba.net. Go check it out.

Introduction

At a VMUG last year, during a presentation, Chris Wahl recommended that all ops people like me learn a distributed version control system such as Git. I use GitHub for my blog and for storing some files, but had still not really scratched the surface of it.

Last month GitHub released a tool called GitHub Learning Lab that is basically an app that starts a bot which leads you through some training on the use of GitHub.

Lessons

So far there are five lessons available:

  • Introduction to GitHub
  • Communicating using Markdown
  • GitHub Pages
  • Moving your project to GitHub
  • Managing merge conflicts

In the Introduction to GitHub lesson you learn about:


Continue reading

Community Mythbusters

At the recent meeting the Scottish VMUG leaders introduced everyone to the community. The leaders encouraged attendees to speak to them and to other members of the community (Colin Westwater, Craig Dalrymple and Martin Campbell) about taking part. It was nice to hear that some attendees did take up the challenge and actively seek out the members to talk to.

Here is some info to help bust those myths, for those looking to contribute.

“I need to be a triple VCDX etc. to be able to contribute.” Continue reading

Glasgow April 26th 2018 – Slide decks

Slide decks from April 2018 are available below.

Slide Decks

Atif Qadeer – Automation and NSX

Brian Gerrard and Konrad Klapa – Best Practices for vRealise Automation and Orchestrator

Cody Hosterman – Virtual Volumes Deep Dive 

Cormac Hogan – What’s Happening In the World of VMware Storage

Craig Dalrymple – Making Your 1st Restful API call to VMware

Darren Hirons – Windows 10 – Why Change the Habit of a Lifetime?

Lee Dilworth – vSAN Update 6.7 and Lessons From The Field

Michael Armstrong – VMware Hands on Labs Behind The Scenes

Rick Cronin – Wavefront Overview

vSphere 6.5 Update 1 Security Configuration Guide Released

This has been cross posted from my own blog vGemba.net. Go check it out.

Introduction

On the 12th March 2018 VMware released the latest version of the vSphere Security Configuration Guide. This is an indispensable guide for securing your vSphere infrastructure which I highly recommend all VMware admins read.

Purpose

I have been following the guide for a few iterations now. Back in the early versions there were a lot of settings, which meant an overzealous administrator could go in and potentially cause problems. For example, in the v5.1 version of the guide there were 172 settings listed over multiple sheets. In the latest version there are 68. A couple of reasons for this: some mitigations have been eradicated by code changes, and some guidance is no longer required because the software is secure by default.

Also included are some common-sense ‘best practices’. This goal of secure by default can be seen in the graphs in the blog post from VMware. In vSphere 6.5 there were 24 settings available to harden the deployment. In 6.5 Update 1 there are now 10, due to VMware building the guidelines into the code. So of the 68 guidelines, 10 are hardening settings and 58 are non-hardening (audit-only plus site-specific). Great job VMware! Continue reading

Glasgow April 26th 2018 – Details

Sorry it’s taken us so long to get a finalised agenda published. Conscious that it’s only 4 weeks until the VMUG, we’ve decided to publish the agenda as we have it at the moment. This blog post will be a living document and will be updated as we get the other session extracts.

Cormac Hogan – What’s happening in the world of VMware Storage
In my session, I plan to talk about the state of storage at VMware at the moment, which includes talking about new features that you may not be aware of that are already in our products, as well as a sneak peek at some things that “might” be appearing in some releases very soon. This will look at hyper-convergence improvements in vSAN since last year, where things currently stand with VVols, an update on IO Filters, a look at core storage enhancements in vSphere 6.5, and some vision of what “might” be coming down the line. A lot of this will focus on how to leverage different data services from the different storage products for your VMs and workloads. I’ll also talk about some of the things we are doing in persistent storage for containers, namely ‘Project Hatchway’.

Gold
Pure Storage – Virtual Volumes Deep Dive with Pure Storage – Cody Hosterman

Continue reading

VCAP7-CMA Exam Review

Recently I sat the VCAP Design exam for Cloud Management and Automation, based on vRA 7.2. Previously I had sat the version 6 exam, which was based on the traditional split of Visio-style canvas scenarios and drag-and-drop questions. I learned that this version of the exam has significant changes; in fact there are no more canvas-style questions. Most questions are multiple choice, with some drag and drop. The time allocation is also less than before: now only 130 minutes to answer 60 questions!

Study Mode

Going into study mode I felt confident, having used vRA 7.3 for some time now; however, there are still slight differences between 7.2 and 7.3 that I had to brush up on. Additionally, due to the architecture of the product I work on, we don’t need to utilise all of what vRA can offer, so I also required a refresher on things like approval policies and the vRA portal.

So, where to start? I am lucky enough to have a lab at work where we develop, so I could use that for a play around. I created a new tenant and simply clicked everywhere and anywhere to get a feel for all vRA 7 has to offer. I also completed some Hands-on Labs from VMware. They are an excellent resource and cater for all levels. From here you can also click around – no need to follow the guide :). I did focus, however, on the vRA/NSX integration labs. I much prefer these labs to reading, but I also brushed up on the design qualities that are always part of these types of exams. Having sat a few based on the DCV track, I always refer to Paul McSharry’s official guide and also the DCD 5.5 Study Pack from Jason Grierson, which is an excellent reference. I should also point out that the official exam guide here contains some really important references. Continue reading