
Presentation of bachelor project 3E

Best Practice running ESX - by Kim Fortescue

A little information about Halliburton Landmark Graphics

• Landmark Graphics hosts geological and geophysical applications for customers in Europe, in addition to hosting and storing seismic data for 40 customers in the North Sea.

• They currently have 200 physical Windows servers in addition to 150 Linux/Unix servers.

• Another 100 servers are virtualized and separated into two different virtual environments.

Why was this assignment chosen?

• Before the project started Oddmund Undheim and Torgeir Torgersen presented different assignments that we could discuss and choose from.

• Best Practice running ESX was chosen because it was considered to be closer to the basic knowledge attained in school, and it was also a subject I found very interesting.

What will the assignment try to solve?

• Landmark Graphics has two different virtual environments, and they wanted to know whether these were optimally configured according to VMware's Best Practices for Networking.

• They also wanted to explore the possibility of merging the two virtual environments into a single vCenter to make management tasks easier.

How the assignment was solved

• In the early stages of the project, the IT administration group of Landmark Graphics decided to start the process of investing in new hardware designed to meet future needs and demands. The need to merge the two virtual environments therefore shifted to solving how to migrate the virtual machines from the old, outdated environment to the new environment running on the blade servers.

– Finding ways to migrate the virtual machines was done by researching material on the internet, and amidst all the available information I assumed that every virtual machine was RAW-mapped directly to the SAN. This complicated the task at hand, but when I was eventually told they were not, the solution turned out to be quite common and flexible.

• The one part of the assignment that did not change was the need to investigate whether they were running in accordance with the Best Practice Networking guidelines from VMware. However, the focus shifted from investigating both virtual environments to just the newer one, running vSphere 4.0.

– This was also done through extensive research on the internet and in forums, as well as consulting with Oddmund and Torgeir. In the beginning the amount of information was quite overwhelming, and at first it seemed as if everything was already running according to Best Practice, but a deep dive into the vCenter settings and configuration uncovered a lot of things to correct in order to reach Best Practice. This breakthrough came a little late in the project's timeline, but nevertheless it was a very steep but good learning process.

VMware Best Practice Networking

• VMware already has well-defined Best Practices for Networking, and the ones most relevant to this project are the following (a small illustrative sketch follows this list):

– Separate network services from one another to achieve greater security or better performance.

– Keep the vMotion connection on a separate network devoted to vMotion.

– Use redundant switches for each network interface card (NIC) that is part of a vSwitch.

– Assign pNICs (physical NICs) in pairs to vSwitches to increase redundancy and, within the limitations of the chosen hardware, possibly load balancing, and also to avoid single points of failure.

• Network Best Practice

– There should be a second available switch to avoid a single point of failure and allow rerouting of packets if necessary.
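To make these guidelines a little more concrete, below is a minimal sketch in Python (with entirely hypothetical vSwitch and NIC names, not taken from the Landmark Graphics environment) that models a host's networking layout as plain data and checks it against the guidelines above: each service on its own vSwitch, and at least two pNICs behind every vSwitch.

# Sketch: model a host's virtual networking layout as plain data and check it
# against the guidelines above. All vSwitch and vmnic names are hypothetical.

host_layout = {
    "vSwitch0": {"services": ["Service Console"],          "pnics": ["vmnic0", "vmnic5"]},
    "vSwitch1": {"services": ["VMkernel vMotion"],         "pnics": ["vmnic1", "vmnic3"]},
    "vSwitch2": {"services": ["Virtual Machine Traffic"],  "pnics": ["vmnic2", "vmnic4"]},
}

def check_layout(layout):
    problems = []
    for name, vswitch in layout.items():
        # Guideline: separate network services (including vMotion) from one another.
        if len(vswitch["services"]) > 1:
            problems.append(f"{name} carries more than one service: {vswitch['services']}")
        # Guideline: assign pNICs in pairs to avoid a single point of failure.
        if len(vswitch["pnics"]) < 2:
            problems.append(f"{name} is backed by only {len(vswitch['pnics'])} pNIC(s)")
    return problems or ["Layout follows the listed networking guidelines."]

for line in check_layout(host_layout):
    print(line)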

Current configuration on IBM blade servers

A closer look

• This configuration is valid for 3 of the 4 VMware vSphere servers currently running.

• The 4th server utilizes only 3 NICs, with only one NIC assigned to virtual machine traffic.

• Service Console and Virtual Machine traffic are connected to redundant switches, but vMotion is connected to a single switch, which is not redundant.

• Every service has only one NIC assigned via a single vSwitch.

• But if we look closer at what hardware each blade has to offer, we find this:

IBM HS21 hardware

• This is an overview of the available network adapters and the NICs onboard each one.

• One blade server runs only 3 active NICs; the other 3 are configured with 4.

Hardware

• This means that 2 NICs are unused on each server, except for one server that has 3 unused NICs.

• This gives a clear indication that the current setup is not according to the Best Practice guidelines provided by VMware.

– There are single points of failure for every service, because each runs on a single NIC. It would not be hard to fix if something were to happen, but it is still not Best Practice.

– vMotion also doesn’t have a redundant network, so there’s a single point of failure in the network part as well.

– Every service is separated to increase security and performance, which is in accordance with Best Practice.

How to reach Best Practice?

• The current situation can be improved by many different solutions, including:

– Solutions that do not require investments and that improve the situation, but do not reach Best Practice:

• Utilizing 4 NICs, teamed 2 and 2 to provide redundancy and failover. The Service Console and vMotion services are merged onto a single vSwitch, because the Service Console does not require much bandwidth.

• Utilizing 6 NICs, where each service is separated onto its own vSwitch:

– 2 x NIC for Service Console

– 2 x NIC for VMkernel vMotion

– 2 x NIC for Virtual Machine Traffic

• These solutions can be found in the more extensive document that was provided at the start of the meeting.

• It is worth noting that one can start with these solutions to improve the situation and then move on to the next part. (A small sketch comparing the two layouts follows below.)
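As a rough comparison of the two interim options above, here is a minimal sketch in Python (the vSwitch and vmnic names are hypothetical, chosen only for illustration) that encodes the 4-NIC and 6-NIC layouts as data and prints which services would share a vSwitch in each case:

# Sketch: compare the two interim layouts described above.
# All vSwitch and vmnic names are hypothetical.

four_nic_layout = {
    "vSwitch0": {"services": ["Service Console", "VMkernel vMotion"], "pnics": ["vmnic0", "vmnic1"]},
    "vSwitch1": {"services": ["Virtual Machine Traffic"],             "pnics": ["vmnic2", "vmnic3"]},
}

six_nic_layout = {
    "vSwitch0": {"services": ["Service Console"],         "pnics": ["vmnic0", "vmnic1"]},
    "vSwitch1": {"services": ["VMkernel vMotion"],        "pnics": ["vmnic2", "vmnic3"]},
    "vSwitch2": {"services": ["Virtual Machine Traffic"], "pnics": ["vmnic4", "vmnic5"]},
}

def describe(title, layout):
    print(title)
    for vswitch, cfg in layout.items():
        shared = " (shared vSwitch)" if len(cfg["services"]) > 1 else ""
        print(f"  {vswitch}: {', '.join(cfg['services'])} on {', '.join(cfg['pnics'])}{shared}")

describe("4-NIC option (Service Console and vMotion merged):", four_nic_layout)
describe("6-NIC option (every service separated):", six_nic_layout)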

Not Best Practice, but not far away

Not Best Practice

• This configuration eliminates all but one single point of failure, and does not require any investments to set up.

• The NICs are already there, ready to be used, and the only step remaining to reach Best Practice would be to set up a redundant switch for the vMotion traffic.

• The vMotion traffic is still internal, which is good because it is unencrypted.

• This configuration is something that could be implemented fairly quickly, and should be. The NIC configuration will be shown in the next part, which details how to reach Best Practice.

How to reach Best Practice?

• Now we'll look at a solution that requires a minor cost to implement, but that will also make the configuration reach Best Practice.

• Here we will utilize every NIC available, team them up in pairs, and provide a redundant switch for vMotion (which is the cost part).

• Here's a picture of what it would look like if it were implemented:

Best Practice

Best Practice

• In this setup each service is separated to improve security and performance.

• Each service is assigned two NICs to improve redundancy and fault tolerance, thus avoiding single points of failure.

• What's worth noting is that even though every service has two NICs at its disposal, only virtual machine traffic has both of them active at the same time.

– vMotion and the Service Console only need 1 active NIC, where the other one is in standby mode, ready to be activated if the traffic load should increase or if the active NIC fails.

• vMotion now has a redundant network as well as a redundant NIC.

• So we have 3 different services, 3 different network adapters and 6 NICs in total. To improve redundancy even further, we run each service separated on different network adapters, as shown in the table below (a small sketch of this assignment follows the table):

Service Configuration

Service type             | Network interface card #1 | Network interface card #2 | Network adapter #
Service Console          | vmNIC 0 (active)          | vmNIC 5 (standby)         | 1 & 3
VMkernel vMotion         | vmNIC 1 (standby)         | vmNIC 3 (active)          | 1 & 2
Virtual Machine Traffic  | vmNIC 2 (active)          | vmNIC 4 (active)          | 2 & 3
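To tie the table together, here is a minimal sketch in Python that encodes the assignment above and checks the redundancy properties the setup is meant to provide: every service has two NICs, and those NICs sit on two different network adapters, so neither a single NIC failure nor a single adapter failure takes a service down. The mapping of vmNICs to adapters is inferred from the adapter column of the table; only the checking code itself is new.

# Sketch: encode the proposed NIC assignment from the table above and verify
# that no service depends on a single NIC or a single network adapter.

# Which dual-port adapter each vmNIC belongs to (inferred from the table:
# adapter 1 carries vmNIC 0 & 1, adapter 2 carries vmNIC 2 & 3, adapter 3 carries vmNIC 4 & 5).
adapter_of = {0: 1, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3}

# Per-service NIC assignment: (vmNIC number, role).
services = {
    "Service Console":         [(0, "active"), (5, "standby")],
    "VMkernel vMotion":        [(1, "standby"), (3, "active")],
    "Virtual Machine Traffic": [(2, "active"), (4, "active")],
}

for name, nics in services.items():
    adapters = {adapter_of[nic] for nic, _ in nics}
    assert len(nics) == 2, f"{name} should have two NICs"
    assert len(adapters) == 2, f"{name} should span two different adapters"
    roles = ", ".join(f"vmNIC {nic} {role}" for nic, role in nics)
    print(f"{name}: {roles} -> adapters {sorted(adapters)} (no single point of failure)")

The same kind of check could in principle be run against each of the four blades to confirm they all follow the same assignment.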

Best Practice Conclusion

• With every step taken to reach Best Practice, one would eliminate:

– Single points of failure in:

• Network adapters

• Network interface cards

• Physical switches

• You would achieve:

– Fully redundant VMware services (Service Console, vMotion & VM traffic)

– Fully redundant VMware networking

– Better security & throughput for the insecure transmission of vMotion data

– Better load balancing of the virtual machine network, which could decrease power consumption, network traffic and also CPU time.

• This solution was selected because it provided the most beneficial and cost-efficient upgrade, as it would only require one more switch to operate within VMware's Best Practice guidelines.

Merging of virtual environments

• As described early in the project, Halliburton Landmark Graphics wanted to explore the possibility of merging the old stand-alone cluster with the new blade cluster.

– This was however dismissed, as the old stand-alone servers faced retirement due to old age and the need to invest to meet future demands and needs.

– So what remained for the project was to examine the different possibilities for migrating the virtual servers currently running on the stand-alone cluster.

• Many different solutions were researched through the project's timeline, but most of them were researched under the wrong assumptions. To clarify, I assumed the virtual machines were RAW-mapped directly to the stand-alone cluster's SAN, which of course limited my options for migration. The methods found were all a bit time consuming, and some involved physically linking the SANs from each cluster directly and reconfiguring each virtual machine separately in order to get it up and running. Otherwise it would still "point" towards the old SAN and not boot up.

– So what solution ended up being the best one?

The simple solution

How is it done?

1. Establish a remote desktop connection to the vCenter server.
2. Locate the name and the location of the virtual machine you want to migrate.
3. Establish an SFTP connection to the cluster through one of the stand-alone hosts, from the local machine.
4. Copy the necessary files to a local hard drive.
5. Log on to vCenter on the new cluster and start the process of adding a new virtual machine, selecting the locally stored one.
6. The virtual machine will then be stored on the new cluster's SAN and be ready to run.
7. It really is a solid case of K.I.S.S.
8. The extra documentation will provide a flow diagram explaining how to proceed in more detail, as well as other ways of doing the same task, albeit with other software etc. (A small scripted sketch of the copy steps follows below.)
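Purely as an illustration of steps 3 and 4, here is a minimal sketch in Python using the paramiko library to open an SFTP session to a stand-alone host and copy a virtual machine's files to a local drive. The hostname, credentials, datastore path and directory names are all hypothetical placeholders, not values from the actual environment; in practice the same copy can of course be done with any graphical SFTP client, as described in the report.

# Sketch of steps 3-4: copy a VM's files from a stand-alone ESX host over SFTP.
# Host, credentials, paths and file names below are hypothetical placeholders.
import os
import paramiko

HOST = "esx-standalone-01.example.local"
USER = "root"
PASSWORD = "********"                       # use proper credential handling in practice
REMOTE_VM_DIR = "/vmfs/volumes/old-san-datastore/example-vm"
LOCAL_DIR = r"C:\vm-migration\example-vm"

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER, password=PASSWORD)

sftp = client.open_sftp()
os.makedirs(LOCAL_DIR, exist_ok=True)

# Copy every file in the VM's directory (.vmx, .vmdk, .nvram, logs, ...).
for filename in sftp.listdir(REMOTE_VM_DIR):
    remote_path = f"{REMOTE_VM_DIR}/{filename}"
    local_path = os.path.join(LOCAL_DIR, filename)
    print(f"Copying {remote_path} -> {local_path}")
    sftp.get(remote_path, local_path)

sftp.close()
client.close()

Once the files are on the local drive, steps 5 and 6 (registering the virtual machine on the new cluster) are carried out through vCenter as described above.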

Thoughts about the project

• First off, I want to say that I'm truly grateful for being given the chance to write a project report at an actual IT company and to see how everyday situations unfold at Halliburton Landmark Graphics.

• I also want to thank Oddmund and Torgeir for their patience and knowledge, which they shared willingly although I often had to ask them many times about the same thing to fully grasp the information.

• Throughout the process, the lack of well-defined Best Practices for VMware vSphere became apparent, which made gathering information more time-consuming but also more rewarding.

• The learning process has been steep and brutal, but in the end I learned a lot from it, and I hope you have learned something as well during this process.

• And also a thank you to Atle and Jostein, who provided steering and direction during the project.