Post on 15-Sep-2018
Callum Vessey 13461118
VMware Virtualisation Labs Blog:
Introduction:
As previously addressed in the NetApp Storage Lab Blog, I first had to set up my lab test bed network with the required VMs, including nested ESXi 5.5 hosts and a fresh NetApp Data ONTAP 7-Mode storage appliance.
Nested ESXi 5.5 Creation:
I first built 2 nested ESXi 5.5 hosts. For these to function correctly I had to change the vCPU settings to expose hardware-assisted virtualisation to the guest OS. This would normally be done in the BIOS of a physical machine before installing a hypervisor or a desktop virtualisation program such as VMware Player or VirtualBox. My nested ESXi hosts also required a minimum of 2 CPU cores, which I addressed when creating the VMs.
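For reference, exposing hardware-assisted virtualisation can also be done by editing the nested host's .vmx file directly rather than through the GUI checkbox. A sketch for ESXi 5.x as the outer hypervisor (the exact guestOS value depends on the version selected in the wizard): vhv.enable passes Intel VT-x/AMD-V through to the guest, and numvcpus satisfies the 2-core minimum.

```
vhv.enable = "TRUE"
guestOS = "vmkernel5"
numvcpus = "2"
```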
In terms of networking, my two ESXi 5.5 hosts would each require a minimum of 3 NICs: one for management and production VM traffic, one for iSCSI storage traffic and one for vMotion traffic. I also had to change the base configuration of both hosts.
vCenter Server Creation:
I built this server on a standard Windows Server 2012 R2 VM and then installed the vCenter Server packages. The installation media offers a "simple install" method but I chose not to use this as it has been known to cause issues. Instead, I installed each component individually: Single Sign-On, the vSphere Web Client, the Inventory Service and vCenter Server itself. I then installed the vSphere desktop client and logged into one of my ESXi 5.5 hosts. After ensuring everything was functioning correctly I logged into the vSphere web interface, created a datacenter and added both ESXi hosts to it.
NetApp DataONTAP 7-Mode Simulator Rebuild:
As I had already been using a storage appliance for the previous NetApp storage labs, I rebuilt a fresh vSim to avoid any issues that could arise from the earlier modifications. The new storage appliance did not need to be as large either: the disk I added to the vSim was only 130 GB (still thin provisioned) and it only required one shelf of disks.
After this was created I was ready to begin on the labs proper.
* Note - Some of the above tasks are part of the upcoming labs, but it made sense to do them all at the outset so as to have a complete, or as near as possible to complete, test bed.
Lab 1 – Installing VMware vSphere GUIs:
The first lab simply requires us to install the vSphere desktop client. As we need to create our own test lab environment for this course, I had already installed the client on my NMIT laptop, where I needed it first to create the VMs that make up the lab. Later I installed it on my vCenter Server VM and initially logged in as the root user, before setting up the vSphere web client to log in as administrator@vsphere.local and later as my domain admin (administrator@callum.local).
Lab 2 – Configuring VMware ESXi:
Task 1 and 2:
In this lab I simply logged into the vSphere client on my vCenter Server box and viewed the hardware
configuration.
Task 3 and 4:
Here I configured the DNS settings for my hosts and set up the Network Time Protocol (NTP), using the Microsoft NTP servers.
Task 5:
In the final task of this lab I configured my host to use my AD domain that I had previously set up
before starting the labs.
Lab 3 – Working with Virtual Machines:
This lab requires us to create a VM. As I had already created multiple VMs while setting up my test lab network, I feel I had already carried out the requirements of this lab.
Lab 4 – Configuring VMware vCenter Server Appliance:
*OPTIONAL
Lab 5 – Using the VMware vSphere Web Client:
The bulk of this lab was also already done during the initial set-up of my test lab network, when I navigated the vSphere web client frequently. We also make extensive use of it in later labs.
Lab 6 – Configuring VMware vCenter Single Sign-On:
For this lab I configured permissions to allow my AD admin user to have full access to my vCenter
server.
Lab 7 – Creating Folders in VMware vCenter Server:
Another short, simple lab: I created a folder structure for my VMs.
Lab 8 – Standard Switches:
Here I created a standard switch (vSwitch1) on my second NIC and added a "production" port group within that switch. At this time I did not have any VMs running on my virtualised hosts, but once I created them in later labs I added them to this port group.
Lab 9 – Accessing iSCSI Storage:
This was the first lab in which I ran into significant issues; it required a bit of research and some configuration changes to get everything working correctly.
First I had to configure my NetApp storage appliance. To do so I logged into OnCommand System Manager via my Windows 8 management box to set up the iSCSI service and create LUNs to be used as shared storage. First I licensed the iSCSI service and made sure it was running on the second NIC, to be used for iSCSI traffic. I then created 2 LUNs and mapped them to an igroup.
Task 1:
I then went back to the vSphere web client and added a VMkernel Port Group to vSwitch1. At this
time I also made sure that this switch was set to promiscuous mode.
Task 2:
Next I added an iSCSI storage adapter, configured the network port binding and finally added the IP and port of my NetApp storage appliance as the target. After this had completed I rescanned the adapter but still could not see any of my previously created LUNs in the paths tab.
After double-checking things like the initiator targets, I decided that the pre-configured NICs could be causing issues, so I deleted all of them and replaced them with freshly created NICs. As soon as I did this everything started working correctly and my LUNs appeared in the paths tab.
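For the record, the same software-iSCSI setup can be sketched from the ESXi shell rather than the web client. This is only an illustration: the vmhba/vmk names and the target address below are assumptions, not my actual values.

```
esxcli iscsi software set --enabled=true
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=192.168.1.50:3260
esxcli storage core adapter rescan --adapter=vmhba33
```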
Lab 10 – Accessing NFS Storage:
Before I could undertake this lab I had to go back to my vSim storage appliance and create an NFS
share to access as a datastore.
Task 1 and 2:
I then created a new NFS datastore and viewed information about it.
Lab 11 – Managing VMware vSphere VMFS:
Task 1 and 2:
For these tasks I changed the names of the local datastores on my 2 ESXi hosts from datastore1 and
datastore1(1) to Local-ESXi01 and Local-ESXi02. I was also required to review my shared storage
configuration so I knew where my VMFS datastore would be created.
Task 3:
I created 2 VMFS datastores (PrivateVMFS-01 and -02) on the two 30 GB LUNs I had previously created on my vSim. However, PrivateVMFS-01 was set to be 1 GB smaller than the LUN's total size.
Task 4:
This task simply required me to expand PrivateVMFS-01 to use the extra 1 GB of space available on LUN1.
Task 5 and 6:
In task 5 I deleted PrivateVMFS-02 datastore. I then extended PrivateVMFS-01 to use the now
available space on LUN2.
Lab 12 – Using Templates and Clones:
In this lab I had issues getting Sysprep to work correctly with my template. I ended up running Sysprep manually until I got it to work.
Task 1 and 2:
Here I created a basic template from one of my previously created VM’s.
Task 3 and 4:
I then created customisation specifications and finally deployed a VM from my new template. This process took quite a long time, so in the end I left it running while I attended to other matters.
Task 5:
For the final task I “hot cloned” one of my VM’s. That is to say, I cloned a running VM.
Lab 13 – Modifying a Virtual Machine:
Task 1:
In this task I increased the size of Hard Disk 1 from the original 8 GB default by 4 GB, to the reported 11.9 GB. I did so initially by increasing the size of the VM's VMDK file. I then opened a console into the VM's OS and, using the Dell utility ExtPart, increased the size of the primary partition to take up the extra available space.
As an interesting aside, I recalled that one could increase the size of an active primary partition via the Windows Disk Management feature, even on older OSes like Windows Server 2003. It would appear that is not the case, so I had to go online and download ExtPart to do so.
Task 2:
Here I simply increased the memory of my VM (CJV01-2) from 1024 MB to 1536 MB. To do this the VM must be powered off.
Task 3:
This task required me to rename my Hot-Clone01 VM to CJV01-4. I did so simply by right clicking on
the VM and clicking rename.
However, it should be noted that this only changes the display name of the VM in the inventory, not the name on the datastore. For this reason one should always be careful when initially creating and naming VMs.
Task 4:
Before undertaking this task I had to create a new LUN via my NetApp Storage Appliance, then add it
as a virtual drive to my VM (CJV01-2). I then opened a console into said VM’s OS and made sure I
could see the new unpartitioned volume. I was not required to partition the drive for actual use in
this lab.
Task 5:
In the final task of this lab I converted a thin-provisioned virtual disk to a thick-provisioned disk. Initially there was 9.11 GB of provisioned space for the VM CJV01-3 and only 1.89 GB was used. The datastore the VM was located on was PrivateVMFS-01. I then inflated the VMDK file.
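The effect of inflating can be shown with a toy calculation. The figures are the ones from this task; the function is just an illustration of the thin vs thick rule, not a vSphere API.

```python
def datastore_footprint(provisioned_gb, used_gb, thick):
    """Space a VMDK consumes on its datastore: a thin disk consumes only
    what the guest has actually written; a thick (inflated) disk consumes
    its full provisioned size."""
    return provisioned_gb if thick else used_gb

# CJV01-3: 9.11 GB provisioned, 1.89 GB used.
before = datastore_footprint(9.11, 1.89, thick=False)  # thin: 1.89 GB on disk
after = datastore_footprint(9.11, 1.89, thick=True)    # inflated: 9.11 GB on disk
```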
Lab 14 – Migrating Virtual Machines:
Task 1:
The first task required me to migrate one of my VMs and its files from one datastore to another. Before I could do this I needed a second VMFS datastore to migrate to, so via OnCommand System Manager I created a LUN and then created a new shared datastore on it.
Before migration of CJV01-4:
After migration:
Task 2:
Before I could use vMotion to transfer VMs from one ESXi host to the other, I first had to create a new virtual switch and VMkernel port group for the vMotion traffic. In doing this I also had to add an iSCSI storage adapter to my second ESXi host so that both had access to the same shared datastore. I also made a slight mistake here: I forgot to add a second vSwitch to this host to connect it to my other ESXi host.
Task 3 and 4:
In task 3 I simply had to verify that my hosts met the vSphere vMotion requirements, which they did. In task 4 I had to verify that my VMs met the same requirements. There was one complication here in that, for some reason, I could not delete Hard Disk 2 from CJV01-2 via the web client. Instead, I had to delete it using the desktop client. I assume this is something to do with the ongoing issues Chrome seems to have running the Client Integration Plugin.
Task 5:
Here I powered on my VM (CJV01-2), started a continuous ping, then migrated the VM from one host to another. This all worked as expected and the ping never dropped during the process.
Task 6:
In this task I had to migrate a VM in terms of both its datastore and the host it was running on. However, I again encountered an error relating to the datastore; fresh VMs did not seem to have this problem when migrated in this manner. Migration in general seemed rather temperamental in my nested lab environment. This is likely because the physical hosts we were using were running fairly low on resources, coupled with the extra overhead of the nested environment. Any attempt that involved migrating datastores also took an incredibly long time. Eventually I was successful in migrating both datastore and host simultaneously.
Before migration:
After migration:
Lab 15 – Managing Virtual Machines:
Task 1:
For the first task of this lab I removed a VM (CJV01-4) from my inventory. I then located the VM's files via the datastore it was in (PrivateVMFS-01). However, as this VM was originally created as Hot-Clone01, that is what the folder containing its files was called, not CJV01-4.
Task 2:
Here I simply re-registered what was CJV01-4 (originally Hot-Clone01) by adding its VMX file back to the inventory. I also gave it a fresh name – CJV01-5.
Task 3:
For this task I went further than the previous task and removed the VM (CJV01-5) from the inventory
and deleted its files from disk.
Task 4:
Here I deleted a file (iometer.exe) and took a snapshot, then deleted another file (cpubusy.vbs) and took a further snapshot.
Next I copied cpubusy.vbs back to the VM's desktop and took a final snapshot.
Task 5:
The next task required me to revert to an earlier snapshot. I reverted to the "without iometer and cpubusy" snapshot. After saying yes to the prompt, the VM rebooted to allow it to restore to the earlier state.
Neither iometer.exe nor cpubusy.vbs were on the desktop as I had removed them prior to creating
the snapshot I had just reverted to.
I then reverted to the snapshot labelled "with cpubusy". The VM power cycled again like before.
cpubusy.vbs was now back on the desktop but iometer.exe was not.
Task 6:
Here I deleted a snapshot I had previously made, specifically, “without iometer and cpubusy”.
Task 7:
In the final task of this lab I used the "delete all" function in Snapshot Manager. As its name suggests, it deletes all remaining snapshots in Snapshot Manager.
Lab 16 – Managing VMware vSphere vApps:
Task 1:
For the first task in this lab I created a vApp (CJV-vApp) and added both CJV01-2 and CJV01-3 to it. After doing so they no longer show at the top level of the VMs and Templates view; they are now visible when I expand my vApp, as expected.
I then changed the start order of the VMs in my vApp.
Task 2:
After powering on my vApp, as expected, the VMs booted in the order I set rather than simultaneously.
Task 3:
In the final task I powered off the vApp and removed both VM’s from it. I then deleted the vApp
itself.
Lab 17 – User Permissions:
Task 1:
Here I simply created a role and set privileges for said role.
Task 2:
For this task I set permissions on various inventory objects, all linked to the role I had previously created. Initially I was unsure which domain I should be using for my non-privileged user; at first I used my vsphere.local domain but on further consideration swapped to callum.local.
Task 3:
In this task I tested out the various permissions I had set.
Lab 18 – Resource Pools:
Task 1:
I had to create my own cpubusy.vbs script file to run for this lab. A stress-testing program like Prime95, OCCT or the built-in stress-testing features of AIDA64 would also have worked, but I feared they would be too intensive, as the nested VMs seem to struggle as it is. I also had to mount a network drive to access other files over the NMIT TALOS network and to get a working browser installed.
I started running the script on both VMs and changed the affinities. However, I seemed to get the same speeds with 1 or 2 cores. I used CPU-Z to make sure the scheduling affinity was working. I also considered whether cpubusy.vbs might need to run as multiple instances to stress both cores and thus have a more noticeable effect (this was the case when stress testing some of the early multi-core CPUs: multiple instances of Prime95 had to be run simultaneously).
I then realised that my scheduling affinity was not working correctly at all, and only one core was running the whole time. After further changing settings, I added more CPU sockets and cores, but no matter how many I added I could not disable any via scheduling affinity: it requires all of the CPUs to be listed in the affinity string, otherwise you cannot continue. I also added more CPU sockets to my virtualised host to make more CPUs available to my nested VMs.
Pre Scheduling Affinity:
Post Scheduling Affinity:
Task 2 and 3:
In these tasks I created two resource pools, Fin-Test and Fin-Prod, the former configured with a low CPU share and the latter with a high CPU share.
Task 4:
Here I tested the resource pools I had created in tasks 2 and 3. The number of shares for the low resource pool was 2000 and for the high pool 8000.
I then tested the 2 resource pools using the cpubusy.vbs script again. As in task 1, I observed no real difference in performance as far as the script's calculations went; they were all taking 1 second. In fact, the Fin-Test low resource pool seemed to be pulling consistently higher clocks than the Fin-Prod high resource pool. There was also no difference when swapping to high/normal and then normal/normal: the prod pool and VM2 consistently ran at a lower clock rate, and the calculations always took 1 second.
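One likely explanation for the flat results is that shares only take effect when the CPU is actually saturated; under contention, each pool's entitlement is proportional to its share value. A quick sketch of that rule (plain arithmetic, not a vSphere API):

```python
def cpu_entitlements(pools):
    """Under full CPU contention, each resource pool's fraction of host CPU
    is its share count divided by the total shares competing."""
    total = sum(pools.values())
    return {name: shares / total for name, shares in pools.items()}

# Share values from tasks 2 and 3: low preset = 2000, high preset = 8000.
split = cpu_entitlements({"Fin-Test": 2000, "Fin-Prod": 8000})
# Fin-Test would be entitled to 20% of the CPU, Fin-Prod to 80% - but only
# if the host CPU is actually oversubscribed, which cpubusy.vbs may not achieve.
```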
Lab 19 – Monitoring Virtual Machine Performance:
Task 1:
Initially I was just required to run the cpubusy.vbs script again on all VM’s.
Task 2:
I was then required to monitor CPU utilisation via the vSphere web client. However, accurate monitoring of real-time CPU usage was difficult due to the low overall performance of the nested virtual environment: lag and delayed updates made it hard to record the latest info via the web client at the time of testing.
Coming back outside normal class hours offered some improvement, as there was less demand on our physical hosts. I also found that the cpubusy.vbs script did not tax the CPU enough to really show an appreciable difference in the performance monitor, so I decided to use Prime95, specifically a blend of tests to push both CPU and RAM.
In any event, to answer the question posed at the end of this task: yes, theoretically the CPU state should go back to near idle once the stress testing/maths calculations finish.
Task 3:
Here I just undid the changes made to my VMs, ready for the next labs.
Lab 20 – Using Alarms:
Task 1:
In this task I created an alarm that monitors CPU usage: a warning if CPU usage is over 25% for 30+ seconds, and a critical alert if usage is over 50% for 5+ minutes.
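The trigger logic of that alarm can be mirrored in a few lines. This is only an illustration of the thresholds configured above, not how vCenter evaluates alarms internally:

```python
def alarm_state(cpu_pct, sustained_s):
    """Return the alarm level for a CPU reading sustained for a period,
    using the warning/critical thresholds configured in this task."""
    if cpu_pct > 50 and sustained_s >= 5 * 60:
        return "critical"
    if cpu_pct > 25 and sustained_s >= 30:
        return "warning"
    return "green"

alarm_state(60, 600)  # over 50% for 10 minutes -> "critical"
alarm_state(40, 60)   # over 25% for a minute   -> "warning"
```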
Task 2:
In the second task I created an alarm that monitors for an event.
Task 3:
Here I triggered and acknowledged alarms. I did so using the cpubusy script to trigger the CPU usage
alarm. After the alarm was triggered I acknowledged it and reset it to green.
Task 4:
Finally, I disabled all the alarms I had created in this lab.
Lab 21 – Using VMware vSphere High Availability:
Task 1 and 2:
For the initial task I created a High Availability cluster and added my two nested ESXi hosts to said
cluster.
Q1 - The host 172.16.61.94 is the master in my cluster. It was also the first host added to the cluster and, at the time, the only one with VMs on it.
Q2 - Yes, the number of VMs matches; I had 2 VMs at the time of creating the cluster and 2 are protected.
Q3 - 2 datastores are needed for heartbeating. PrivateVMFS-01 and the shared datastore are listed.
Q4 - Yes, all errors are now removed now that I have added an extra datastore and added management network capability to the vMotion switch.
Task 3:
For this task I simulated a host failure, thereby testing the HA functionality. Initially both VMs were on the master host (172.16.61.94). I then rebooted this host and the still-powered-on VMs were brought up on the secondary host (172.16.61.95), which then became the master.
Q1 - CJV01-2 and -3 are powered on; .94 is the master host. Q2 - Yes, .95 is now the master and .94 is now the slave. Since .94 went down, .95 will remain the master until it goes down.
Task 4:
Here I recorded the HA cluster resource availability/usage:
Total CPU = 5.5 GHz
Reserved CPU = 1.38 GHz
Available CPU = 4.1 GHz
Total memory = 9.9 GB
Reserved memory = 2.61 GB
Available memory = 7.38 GB
Task 5:
Q1 - 32 MHz and 78 MB for the default slot size. Yes, after giving CJV01-3 a 512 MHz reservation the minimum is now 512 MHz. After changing to a fixed slot size of 300 MHz, the CPU slot size is now 300 MHz.
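Slot-based admission control divides each host's capacity by the slot size and takes the minimum across CPU and memory. A sketch of that calculation (the host figures here are hypothetical, chosen only to illustrate; not my cluster's actual numbers):

```python
def slots_per_host(host_cpu_mhz, host_mem_mb, slot_cpu_mhz, slot_mem_mb):
    """A host provides as many slots as whichever resource runs out first."""
    return min(host_cpu_mhz // slot_cpu_mhz, host_mem_mb // slot_mem_mb)

# Fixed slot size of 300 MHz CPU / 78 MB memory against a hypothetical host:
slots = slots_per_host(2750, 8192, 300, 78)  # CPU is the limiting resource here
```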
Task 6:
In this task I learned how strict admission control works in relation to my HA cluster.
The total memory for each host is 8 GB.
Q1 - There is 8192 MB available to each host and 16 GB in the cluster. I am assuming that figure is rounded, as I would imagine slightly less is actually available to the cluster.
Q2 - 9.99 GB, 0, 9.99 GB.
Q3 - N/A slots in the cluster: no VMs are powered on, so no slots are shown since none are needed.
Please refer to the below screen shot for question info.
Q7 – There are 10 slots available now.
Task 7:
In the final task, like previous labs, I prepared my test bed for further labs. Specifically, I removed the
unnecessary Lab Servers folder and removed the VM memory reservations.
Lab 22 – Designing a Network Configuration:
OPTIONAL
Lab 23 – Configuring VMware vSphere Fault Tolerance:
*Note – from the outset this lab was incredibly frustrating and slow going; in the end it took over a week to get through it. This was largely because I could not test new configurations and options immediately: it would take long periods to reboot hosts and VMs, and this was even worse when trying to enable fault tolerance (FT) on VMs, which I would generally have to leave overnight.
Task 1:
I carried out the initial steps to prepare my VM’s for FT. This seemed ok at the time until I tried to
enable FT protection in task 3.
Task 2:
In this task I enabled vSphere FT logging on both ESXi hosts.
Task 3:
Here I tried to enable FT but got an error relating to the CPU of the VM in question, amongst other
things. I had to disable the extra core on each VM to get around this first problem.
On the second attempt to enable FT I came across an issue where my VM would almost instantly go to 28%, then stall for hours before eventually giving a nondescript error. I found the only way around this was to completely remove the virtual CD-ROM drive.
With FT finally enabled I then tried to turn on my VM. No luck. I got the following error stack:
Error stack:
Record/Replay is supported for 64-bit virtual machines only on certain CPUs. Canceling
Record/Replay.
An error was received from the ESX host while powering on VM CJV01-2.
Failed to start the virtual machine.
Unable to enter Fault Tolerance mode.
Given it seemed to be a CPU-related issue, I tried enabling EVC mode and experimented with forcing the Intel VT-x settings, but to no avail. I also tried reducing the host CPU sockets/cores down to one each. Long mode is apparently tied in here with 64-bit OSes.
At this time I also installed VMware Tools on my nested hosts in an effort to fix the issues I was having. I did so by running this command via an SSH session (SSH had to be enabled on the host first):
esxcli software vib install -v http://download3.vmware.com/software/vmw-tools/esxi_tools_for_guests/esx-tools-for-esxi-9.7.0-0.0.00000.i386.vib -f
This command only works if the nested host has an active network connection.
After changing more settings relating to exposing virtualised hardware, I decided it was easier to simply build a 32-bit version of my Server 2003 VM.
*Interesting point – given it was a 32-bit ISO I would have thought the install would have been 32-bit as well, but as 64-bit was selected (automatically) it must force the virtual CPU for that VM to be 64-bit, not 32/64-bit like the physical Intel CPU it was running on.
After hours of waiting for it to install I started again.
Also at this point I had an error with one of my virtualised ESXi hosts: the HA agent on the host was unreachable. I decided to shut down both hosts, disable HA in my cluster and try rebuilding the HA cluster from scratch. After re-enabling HA, everything started running as expected once more.
After another day of waiting for Server 2003 to install, and then waiting overnight for FT to enable, I was finally at the stage where I could power on the VM with FT enabled. Initially it did not work: it said that "true" was not a valid Boolean value for "replay.AllowBTOnly", and also that the host CPU did not support FT. In a desperate attempt I tried simply using all caps for the value. I did not think this would work, as other default values do not use all caps; however, it worked! The VM finally booted and FT came online.
The primary VM was located on my .95 host with the secondary location on my .94 host.
Task 4:
In this task I tested the FT configuration using the “test failover” function.
Task 5:
Here I observed the difference between turning off FT and disabling it. Turning FT off removes it altogether, along with all records/info, whereas disabling it just temporarily stops FT but keeps the secondary VM and all info ready for re-enabling it quickly.
Finally, I disabled FT on this VM.
Lab 24 – VMware vSphere Distributed Resource Scheduler:
Task 1:
For the first task I created a load imbalance; that is, I ran the cpubusy.vbs script on all 3 of my VMs with them all on one host.
Task 2:
Next, I enabled vSphere DRS on my lab cluster.
Task 3:
To check that DRS was functioning correctly I clicked the "Run DRS Now" button and checked the summary tab. As expected, there was an imbalance.
I then checked the DRS recommendations tab. DRS recommended moving one VM to another host and did so, but for some reason the first time it had an issue migrating the VM. I tried again and it was fine – just another strange quirk of a nested lab setup. Everything was balanced after the move.
Task 4:
In this task I created and tested a VM-to-VM affinity rule. My rule specified that 2 of my VMs had to stay on the same host. Initially, when I created the rule, they were not. After applying the rule, the DRS recommendation was to migrate one VM so that both CJV01-2 and -3 were on the same host (as per my rule). I then disabled the DRS rule.
Task 5:
Next I created and tested an anti-affinity DRS rule. This specified that VMs CJV01-2 and -3 should always be on separate hosts. As they were on the same host from my last DRS rule, the recommendation was to split them, which I let it do. After this I deleted the DRS rule.
Task 6:
In the final task of this lab I created, tested and disabled a VM-to-host affinity rule. I selected both CJV01-2 and -3 (as a DRS group) and specified that they could only run on one particular host.
I then checked the DRS recommendation and applied said recommendation.
Finally I attempted to manually migrate CJV01-2 (which had just been moved due to the DRS
recommendation) back to my other ESXi host. As expected, this was not allowed and the
compatibility panel informed me of this.
Lab 25 – VMware vSphere Update Manager:
Task 1:
For the first task I installed vSphere Update Manager. It was a very straightforward installation; I only had to provide basic info.
Task 2:
Initially I could not find the plug-in manager to be used for this task. I then realised that I should be using the desktop client and not the web client. Once found, it was very simple to download and install the plug-in.
Task 3:
This task required us to modify a few of the cluster settings relating to DRS, HA and resource allocation. The following screenshots show that there was no memory or CPU reservation set for any of my VMs.
Task 4:
Here I configured the update manager and uploaded the patches I was going to distribute to my 2
ESXi hosts.
Task 5:
In the next task I created a patch baseline, which customises which patches will be installed. I only chose a small selection of patches to demonstrate my understanding, as opposed to all the required patches, which would have taken a long time to put into effect.
Task 6:
This task required me to attach my previously created baseline and scan my host(s) to evaluate
which patches are required. The end result was that both hosts were “non-compliant” but that was
expected.
Task 7:
I then staged the patches to my hosts (makes them locally available to the hosts).
Task 8:
For the final task I remediated my host(s).
Report that was generated:
Initially I ran into an error when the VM (CJV01-4) was being migrated off the host being remediated (172.16.61.95). This was likely due to the lack of resources available to the physical host my nested environment sits on.
On the second try everything worked as it should.
Q1 – This question does not apply as I was in charge of remediating both hosts and thus did one at a
time.
Q2 – Yes the host was placed in maintenance mode.
Q3 – Yes the only VM on .95 host was migrated to .94.
Q4 – Yes the host was patched while in maintenance mode.
Q5 – Yes the host was rebooted after install.
Q6 – After the host had rebooted it was still in maintenance mode but then it cycled back out of
maintenance mode.
Q7 – N/A.
Q8 – Yes CJV01-4 was still present in the lab cluster, it was on .94 host.
Q9 – N/A.
Q10 – N/A.
And finally it was complete:
Lab 26 – Installing the VMware vCenter Server Components
As I mentioned in the introduction section of this blog, I carried out this lab at the outset when I set up my test bed environment. It would have made no sense to do this last; it would actually have made most of the earlier labs very difficult, more likely impossible. Most of my screenshots are taken from my vCenter Server via the web client.