Background

The purpose of this document is not to criticize or to assign blame, but to explain how things can go wrong and to outline some steps to prevent them. When a system is implemented and deployed, there is an unspoken understanding that components of that system will require some “care and feeding.” For any system to remain optimal and keep up with ever-maturing technologies, some level of maintenance is required. Throughout the life cycle of any equipment, there will be required software updates, security updates, firmware updates, OS (Operating System) updates, and equipment updates, much like the scheduled maintenance of a vehicle.

This paper presents a case study of a client that experienced a complete collapse of its IT infrastructure. The client had declined regular Roeing Quarterly Business Reviews (QBRs), and the events that followed resulted in a cascade of issues, downtime, and unexpected and unnecessary expenses. The original call was for a failed motherboard on a 6-year-old server. While searching for replacement parts, the Roeing Technical Services department purchased parts from eBay as a last resort. Once the hardware was received and installed, the dated server continued to experience major issues. At that point, the decision was made to replace it with a new server, because the client did not wish to invest more money and parts in an out-of-date, 6-year-old server.

IT Dominos

Once the new server was in and VMware was installed, it was discovered that the VMware license the client had purchased 6 years earlier was no longer supported on the new equipment. At this point, Roeing had to facilitate a tough discussion with the client about upgrading to a current version of VMware. This was an additional, non-budgeted license cost.

Once the licensing was procured, Roeing arrived onsite to install and configure the new server. At that time, the team discovered existing hardware alarms on the client’s legacy SAN (Storage Area Network). Once the Roeing team gained access to the SAN, it found multiple failed drives and a failed power supply! This led to another tough conversation and another fallen IT domino: the client was forced to order replacement drives and a power supply to stabilize the SAN before moving forward.

The next domino was the upgrade of both the client’s server VMware environment and their VMware Horizon View environment to the latest and, more importantly, supported version. The immediate issue was timing: the work took place in January, and Adobe Flash had expired on Dec. 31st. Even though the client had received notification that Flash was expiring, and its version of Horizon depended on Flash, no upgrade had been performed. This missed step prevented the Roeing team from getting into the existing environment to replicate its settings. As a result, we had to upgrade and rebuild the environment through an in-place upgrade path instead of a fresh install. Because the client was running such an outdated version of Horizon, Roeing had to perform a three-step version upgrade to reach the newest version of VMware Horizon.

As we peeled back the layers of the onion, it was clear the core was rotten. Additional issues and costs centered on:

  • TLS versions
  • Firmware on Thin Clients
  • Windows 7 images
  • Backup agents
  • Active Directory Server
  • Sophos Anti-Virus

Stability

After the dust settled and the IT infrastructure was brought up to a best-practice standard by the Roeing Technical Services team, it became evident that this scenario was worth sharing with others.

What started as a simple hardware failure quickly cascaded into a monumental issue with equally monumental and unexpected costs. The rate at which software and hardware change in today’s world means the days of “set it and forget it” are over. Most of these issues and costs could have been avoided if proper “care and feeding” of the IT infrastructure, including Roeing QBRs, had taken place!

Steve Ricketts

Director of Technical Services