This is an article that I wrote but never published back in April 2017, so at the time of publishing it is over 5 years old. Granted, a lot has changed in 5 years, but the essence of what I wrote is still incredibly relevant today: we can't rely on Monitoring and Reporting platforms to help maintain complex environments at scale.
I recently saw the TechValidate Research on Turbonomic: https://www.techvalidate.com/product-research/turbonomic/charts and it made me want to share my previous experiences as an “End User” and how the information from the report is relevant.
A few years ago I worked as a Technical Lead for a mid to large company in the UK. My role was to manage a complex and slightly chaotic infrastructure which consisted predominantly of VMware, NetApp and Cisco UCS; there was of course a whole bunch of other technology we supported, like Dell, EMC and Violin storage. As with most organisations, we were trying to keep our heads above water in terms of everyday issues, outages and upgrades, which meant that the “fun” projects like future infrastructure initiatives were often out of the question.
To manage our environment we did what most organisations do: we had numerous Monitoring and Alerting tools in place. Some of these came “free” with the products or were part of license agreements; some were legacy tools which were still ingrained in the environments. My team was mainly using vROps (then vCOPS), SCOM and NetApp's monitoring tool, but we also had to look independently at the EMC, Violin and Dell tools.
We received so many alerts and warnings that we were drowning in data – the Team started the day at 7am trying to make sense of it all. We were constantly trying to troubleshoot issues reported by End Users. Was it a problem with the VM or the Storage? Was it even to do with us? It could be the Network (it's always the network…), or was it the Application or Database? So much time was wasted reviewing Logs, Reports, Graphs and other Data, and speaking with the other Teams. And even when we could determine that the problem was with us, we still needed to fix it.
I decided enough was enough: we needed a better Monitoring and Alerting tool – a single pane of glass that could give us all the information in one place, rather than us scurrying around between tools trying to find answers. I started to research different options and came across VMTurbo. I didn’t know much about the product at the time, but it was local, so I went for a Demo. I came back into the office and said to my Boss, “We need to get this.”
It turns out we didn’t need another Monitoring tool – we needed intelligent insight into what was really happening in our environment. We needed a tool that would integrate into our Infrastructure and provide us with real-time guidance, with the option for it to act on that guidance for us. My team would be free to come in to work knowing that our environment was as good as it could be, even before the first coffee of the day.
One of the key highlights of the TechValidate research, for me, was that this experience is not uncommon. Reducing the time spent trawling through data to make decisions is a benefit to everyone.