VMware: What’s Cooking with SD-WAN?

by Tim Van Herck

May 1, 2020

Just over 20 years ago, Ronco Inventions started running infomercials for their Showtime Rotisserie BBQ Oven on shopping channels. The infomercial promoted the oven with a simple tagline, “Set it and forget it,” which became a pop culture reference.

The “Set it and forget it” concept is one that many network administrators hope to find in the products that they use to operate infrastructure. They want technology and tools that are simple to deploy and require little to no tweaking after the fact. It should just work. SD-WAN has been rapidly and widely adopted as an enduring part of the IT technology mix because it incorporates these concepts.

While this has been a very successful approach for the WAN, and similar approaches exist in other technology domains, each of these isolated approaches cannot always provide on its own the full picture that network admins require on a daily basis. For example, when a user calls the help desk to complain that his/her Zoom session is showing video artifacts, the network admin may only have an isolated view of the customer issue. When the issue comes in, a lengthy process of fault isolation starts. Is it a laptop issue? The Wi-Fi? The WAN? A peering issue? An application problem? Troubleshooting the issue this way is a process of elimination that uses up valuable time.

The Old Approach to Visibility and Correlation

In the recent past, the preferred way to bring end-to-end visibility and correlation to network troubleshooting started with building a data lake that captured and correlated all available data. This approach had limited success because of the high cost of maintaining the data warehouse and the limited machine learning (ML) and artificial intelligence (AI) capabilities available at the time to extract meaningful recommendations on where to look first.

A New Approach to Data Correlation

VMware is very proud to participate in the ONUG AIOps working group, which aims to advance an industry-standard approach to data correlation. In this approach, multiple vendors contribute technology stacks that allow an AI engine to suggest several possible root causes of, for example, the misbehaving Zoom call that was reported, each with a confidence level.

This approach steps away from the use of data lakes and instead uses the data already available in domain orchestrators, while still correlating it with other vendors' data. The idea is not to build a new library with all the information, but instead to hire a detective who can synthesize relevant data from the authoritative places where the data resides. It all starts with the questions that arise in the network admin’s mind when the help desk call comes in:

Which laptop is being used?
Which access point is he/she associated with, and what is the signal strength?
Are the switches/routers at the user location showing abnormal behavior?
How are the WAN links and paths to Zoom performing?
Is there a reported performance issue with Zoom?

Getting data from all these sources into a data lake would be an enormous task. If instead we leave the data in place and correlate it only when needed to answer a specific question, the task becomes manageable and can be accomplished without setting up a data warehouse. It can also be done rapidly, providing insight into recent events and narrowing the scope of the investigation.
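To make the “leave the data in place” idea concrete, here is a minimal Python sketch, assuming two hypothetical domain stores (an in-memory Wi-Fi event list and a WAN event list) and a hypothetical user-to-site inventory mapping. Nothing is copied into a central store: each lookup filters only its own domain's records for one question and one time window, and the answers are joined only when an incident is investigated. The record fields, names, and the 30-minute lookback are illustrative assumptions, not any vendor's API.

```python
from datetime import datetime, timedelta

# Hypothetical records; in practice each list would stay in its own domain orchestrator.
WIFI_EVENTS = [
    {"user": "alice", "ts": datetime(2020, 5, 1, 9, 55), "ap": "ap-3f-12", "rssi_dbm": -78},
    {"user": "bob", "ts": datetime(2020, 5, 1, 9, 50), "ap": "ap-1f-02", "rssi_dbm": -55},
]
WAN_EVENTS = [
    {"site": "branch-7", "ts": datetime(2020, 5, 1, 9, 57), "link": "broadband-1", "loss_pct": 0.2},
    {"site": "hq", "ts": datetime(2020, 5, 1, 9, 58), "link": "mpls-1", "loss_pct": 0.0},
]
USER_SITE = {"alice": "branch-7", "bob": "hq"}  # hypothetical inventory lookup


def wifi_for_user(user, start, end):
    """Ask the Wi-Fi domain one question: which AP and signal did this user have?"""
    return [e for e in WIFI_EVENTS if e["user"] == user and start <= e["ts"] <= end]


def wan_for_site(site, start, end):
    """Ask the WAN domain one question: how did this site's links perform?"""
    return [e for e in WAN_EVENTS if e["site"] == site and start <= e["ts"] <= end]


def investigate(user, reported_at, lookback=timedelta(minutes=30)):
    """Correlate on demand: query each domain in place and join by user, site, and time."""
    start, end = reported_at - lookback, reported_at
    site = USER_SITE[user]
    return {
        "user": user,
        "site": site,
        "wifi": wifi_for_user(user, start, end),
        "wan": wan_for_site(site, start, end),
    }


report = investigate("alice", datetime(2020, 5, 1, 10, 0))
print(report["wifi"][0]["rssi_dbm"], report["wan"][0]["link"])  # prints: -78 broadband-1
```

For a call reported at 10:00, only Alice's own Wi-Fi record and her branch's WAN record fall inside the window, which is the narrowed scope an admin would start from.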

Even when it only recommends areas for further investigation, this approach greatly reduces the time network and application administrators spend diagnosing reported issues. It also allows on-demand data gathering when key performance metrics deviate from the norm and proactive troubleshooting is warranted.
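The proactive trigger can be sketched as a simple baseline check. This minimal Python example assumes a z-score rule (flag a sample that lies more than three standard deviations from its recent history); the metric, threshold, and sample values are hypothetical, and real products may use quite different anomaly detection.

```python
from statistics import mean, stdev


def deviates(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Return True when `latest` is more than `threshold` standard deviations from the baseline."""
    if len(history) < 2:
        return False  # not enough samples to estimate a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # flat baseline: any change counts as a deviation
    return abs(latest - mu) / sigma > threshold


# Hypothetical recent jitter samples (ms) on one WAN path to a SaaS application.
jitter_ms = [7.5, 8.1, 7.9, 8.4, 7.7, 8.0]
if deviates(jitter_ms, 31.0):
    print("jitter deviates from baseline; gather correlated data now")
```

Only when the check fires would the on-demand queries across domains be run, so data is gathered when it is likely to matter rather than continuously.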

The ONUG AIOps working group is combining data from Mist access points and VMware SD-WAN through an AtScale semantic layer that can easily correlate data on demand from the existing domain orchestrators. The group aims to standardize the way data is represented, stored and accessed to facilitate rapid correlation. The correlated data then becomes a source for an AI engine to suggest the likely root cause of detected performance degradations or outages in the network, reducing the time needed to restore service.
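As a rough illustration of how correlated data could be turned into ranked suggestions, the Python sketch below scores each domain by how far its metrics exceed a healthy threshold and normalizes the scores so they sum to 1. This simple heuristic is a stand-in, not the ONUG AIOps or AtScale method; the `Signal` type, the assumption that higher values are worse, the thresholds, and the values are all hypothetical, and the resulting shares are relative weights rather than calibrated probabilities.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    domain: str       # e.g. "wifi", "wan"
    metric: str       # e.g. "retry_pct", "loss_pct"
    observed: float
    threshold: float  # hypothetical healthy limit (> 0); higher observed values are worse


def rank_root_causes(signals: list[Signal]) -> list[tuple[str, float]]:
    """Rank domains by how far their metrics exceed thresholds, normalized to sum to 1."""
    excess: dict[str, float] = {}
    for s in signals:
        ratio = max(0.0, s.observed / s.threshold - 1.0)  # 0 when within the healthy limit
        excess[s.domain] = excess.get(s.domain, 0.0) + ratio
    total = sum(excess.values())
    if total == 0:
        return []  # nothing exceeds its limit, so no suggestion
    return sorted(((d, v / total) for d, v in excess.items()), key=lambda x: x[1], reverse=True)


signals = [
    Signal("wifi", "retry_pct", 30.0, 10.0),
    Signal("wan", "jitter_ms", 12.0, 30.0),
    Signal("wan", "loss_pct", 1.5, 1.0),
]
print(rank_root_causes(signals))
```

Here the Wi-Fi retry rate exceeds its limit by a wider margin than WAN loss, so Wi-Fi is suggested first (about 0.8 versus 0.2), while the WAN jitter, which is within its limit, contributes nothing.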

As the engines become more accurate, a closed-loop system could emerge that self-heals the network based on past experience, truly getting back to the “set it and forget it” concept.

Author's Bio

Tim Van Herck

Director of Technical Product Management