You’re on a corporate web conferencing call using Zoom, Webex, GoTo Meeting, BlueJeans, Google Talk, Skype, etc., and all of a sudden it stops working. The complaints start to roll in, business slows, tempers flare, and the first thing operations personnel think is, “What the heck, it was working fine yesterday!” Was there a configuration change? Is a circuit down? Is the Wi-Fi controller offline? Was there a change to the firewall? Is the SaaS application down? Are we experiencing a DDoS attack, or are we the victims of malware? Then the forensics start. The team with domain expertise is assembled, which is no easy task. The mean time to detect (MTTD) clock is ticking, as is the mean time to repair (MTTR) clock.
Finding out what went wrong is a very hard problem. The SaaS dependency map might look something like this: endpoint device, Wi-Fi infrastructure, switch/router, firewall, load balancer, SD-WAN circuits, public Internet connections, SaaS provider infrastructure, and the SaaS application itself. Each of these dependencies represents a specific domain of expertise and a different vendor. In most ONUG member organizations, IT architects and engineers are only now starting to build data lakes that pull monitoring data from each domain in the SaaS dependency map. Most, if not all, are finding that building data lakes is overly complex and that the data in them can quickly become stale. Worse, even if operators have the data for every domain in the SaaS dependency map, they might not be able to correlate it, because they would need permission from each vendor or service provider to share it with the others!
Web conferencing SaaS applications are particularly difficult because they combine nearly every form of communication: mobile phone integration, text/chat, internet voice, video, file sharing, whiteboarding, screen sharing, etc. There are so many combinations and permutations of things that can go wrong, especially if you have a license for 10,000 seats or so.
So the question asked at ONUG is: what is AI good for in IT operations? It turns out the answer is SaaS applications, and web conferencing in particular. Solving this problem is one part technology and one part organization. The aim is to avoid the scenario above, with its forensic team, data lake construction, and hunt for permission to share data, and instead apply AI to correlate data and drive remediation automatically. That is, to move from measuring MTTD and MTTR to automated remediation.
With that in mind, ONUG created the AIOps for Hybrid Multi-Cloud Working Group. This working group is chaired by IT executives from Morgan Stanley, FedEx, the Veterans Administration, and Gap. The participating vendors are VMware, Juniper, and AtScale.
The first focus of the work is to create an abstraction layer that provides APIs for accessing a virtual data lake spanning multiple monitoring data repositories. That is, rather than building one large data lake into which data is pulled or sent, the data stays where it is, or is sent to an S3 bucket, and is virtualized. A cloud-based virtual data lake simplifies the process of aggregating data from multiple operational silos, with no data lake or warehouse to build.
This virtualized data lake can enforce permissions on data and apply metadata formats to normalize it. Data records will adhere to a common format, or schema, that supports the fields needed to tag and enrich data so that metrics, logs, and traces can be correlated across multiple silos. The virtual data lake construct allows machine intelligence to be applied within a specific operational domain or across multiple domains. This work is being done by AtScale.
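To make the idea of a common schema more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Record` fields, the two raw export formats, and the `correlate` helper are illustrative assumptions, not AtScale’s data model or any vendor’s API. It normalizes records from two silos (a Wi-Fi controller and a firewall) into one format, then groups them by a shared site tag and a five-minute time window, which is the kind of cross-domain correlation a common schema is meant to enable.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    """Hypothetical common schema for a monitoring record from any silo."""
    timestamp: datetime   # always timezone-aware UTC after normalization
    domain: str           # e.g. "wifi", "firewall", "saas"
    kind: str             # "metric", "log" or "trace"
    site: str             # correlation tag shared across domains
    message: str
    tags: dict = field(default_factory=dict)  # free-form enrichment tags


def from_wifi(raw: dict) -> Record:
    # Hypothetical Wi-Fi controller export: epoch seconds, AP site, event text
    return Record(
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        domain="wifi",
        kind="log",
        site=raw["ap_site"].lower(),
        message=raw["event"],
    )


def from_firewall(raw: dict) -> Record:
    # Hypothetical firewall export: ISO-8601 time, location code, rule hit
    return Record(
        timestamp=datetime.fromisoformat(raw["time"]),
        domain="firewall",
        kind="log",
        site=raw["location"].lower(),
        message=f"rule {raw['rule']} {raw['action']}",
    )


def correlate(records, window_s=300):
    """Group records that share a site and a fixed time bucket,
    keeping only groups that span more than one domain."""
    buckets = defaultdict(list)
    for r in records:
        bucket = int(r.timestamp.timestamp()) // window_s
        buckets[(r.site, bucket)].append(r)
    return {k: v for k, v in buckets.items() if len({r.domain for r in v}) > 1}


# Usage with made-up sample data: an AP reboot and a firewall deny
# at the same site two minutes apart land in the same group.
wifi = from_wifi({"ts": 1571220000, "ap_site": "NYC-HQ", "event": "AP reboot"})
fw = from_firewall({"time": "2019-10-16T10:02:00+00:00", "location": "nyc-hq",
                    "rule": "udp-8801", "action": "deny"})
other = from_wifi({"ts": 1571220000, "ap_site": "BOS-1", "event": "AP reboot"})
groups = correlate([wifi, fw, other])
```

Fixed time buckets are the simplest choice for a sketch; an event that straddles a bucket boundary would be missed, so a real correlator would likely use sliding windows or topology from the dependency map.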
Once data is virtualized and permission-based access between vendors is in place, the data can be correlated and analyzed with AI algorithms to determine the root causes of faults and to support performance monitoring, security, and predictive remediation. Two of the working group chair companies will donate access to their live data so that a proof of concept can be built. The goal is to virtualize all state data in a Zoom conferencing application’s dependency map so that when something goes wrong, AI can identify the root cause and fix it.
The initial focus for the working group is to create a pilot proof-of-concept demonstration based on a real-world, cloud-based SaaS application, using monitoring data collected across multiple domains and stored in public cloud repositories. AI-powered applications used by various operations teams will access the data stored in these repositories through a data virtualization layer that provides a unified data model, simplifying access to data collected across multiple operational silos.
At ONUG Fall in NYC on October 16-17, hosted by Cigna, the ONUG Community will start to apply machine learning and AI to a well-defined IT operational problem: keeping web conferencing SaaS applications operational. You can learn more about the AIOps for Hybrid Multi-Cloud Working Group and become a member here. At ONUG Fall there will be multiple sessions that address AIOps and over 50 proof-of-concept demonstrations to choose from. You’ll be able to see what’s practical, the effort required, and the outcomes of applying AIOps to SaaS application management.
I invite you to join us at ONUG Fall and accelerate your corporation’s journey toward being a digital enterprise.