When it comes to network management, there is no lack of data. Screen response times, security alerts, spin times, latency statistics – we collect, store, and process data at all layers of the network.
Being able to correlate and make sense of that data is another matter. Few are better informed about the attempts to standardize data correlation than Neal Secher. The Managing Director and Head of Network Architecture at BNY Mellon, a world-renowned leader in investment management and investment services, Secher chairs ONUG’s Network State Collection, Correlation, and Analytics working group.
“A number of years ago, I was called into a wide-scale network outage,” he emailed me the other day, around the time I was nursing my second cup of coffee. “It was reported that all communications in the entire network ‘broke’ at about the same time. After detailed troubleshooting, the problem was determined to have originated from a backward step on one of our Network Time Protocol (NTP) servers. Fast forward to today: if we were correlating events from our NTP clocks with state information from the network and security devices, we would have been able to determine the root cause more quickly and begin working on a mitigation strategy.”
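Secher’s scenario lends itself to a simple illustration. Below is a minimal sketch, in Python, of the kind of time-window correlation he describes: pairing clock-step events from NTP servers with alerts from network and security devices that fire shortly afterward. The event streams, field names, and 30-second window are all invented for illustration; a real collector would feed parsed syslog or telemetry into the same join.

```python
from datetime import datetime, timedelta

# Hypothetical, pre-parsed event streams; the field names are illustrative
# and do not come from any specific tool.
ntp_events = [
    {"time": datetime(2015, 3, 2, 9, 14, 7), "source": "ntp1",
     "event": "clock stepped backward"},
]
device_alerts = [
    {"time": datetime(2015, 3, 2, 9, 14, 9), "device": "core-sw-1",
     "alert": "BGP session reset"},
    {"time": datetime(2015, 3, 2, 9, 14, 12), "device": "fw-edge-2",
     "alert": "TLS handshake failures"},
]

def correlate(steps, alerts, window=timedelta(seconds=30)):
    """Pair each NTP step with device alerts that follow it within `window`."""
    for step in steps:
        related = [a for a in alerts
                   if timedelta(0) <= a["time"] - step["time"] <= window]
        if related:
            yield step, related

for step, alerts in correlate(ntp_events, device_alerts):
    print(f"{step['source']} '{step['event']}' may explain:")
    for a in alerts:
        print(f"  {a['time']} {a['device']}: {a['alert']}")
```

With the NTP step and the downstream alerts in the same pool, the backward step surfaces as the common ancestor of the alert storm instead of being found only after detailed troubleshooting.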
The kind of open software-defined network (SDN) taking shape in ONUG is an opportunity to address these challenges. “As we move toward open networking paradigms, we need to move beyond the sins of the past where functionality trumped visibility. Networks have traditionally been complex, and network administrators struggle to understand the applications that run on top of their infrastructure. This complicates the impact of change and lengthens the mean time to resolution during outages. We need analytics built into our networks from day one.”
NetFlow, for example, is helpful in diagnosing network problems, but just try correlating NetFlow data with the logs of firewalls, servers, and other vendor equipment. “The correlation is crucial. We gather a great deal of analytic information from our networks and applications, but that data is stored in different toolsets. We need to join these data pools together to improve mean time to resolution. On the planning side, we need to understand the hot points within the network and adjust workload placement or make capacity upgrades that will meet the changing needs of our applications.”
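To make the “join these data pools” point concrete, here is a minimal sketch that matches NetFlow-style flow records against firewall log entries on the connection tuple within a small time window. The record layouts are invented; real NetFlow exports and firewall logs vary by vendor and would need normalization first.

```python
from datetime import datetime, timedelta

# Illustrative records only; real exports differ by vendor and version.
netflow = [
    {"ts": datetime(2015, 3, 2, 10, 0, 1), "src": "10.1.1.5",
     "dst": "192.0.2.80", "dport": 443, "bytes": 1_200_000},
]
firewall_logs = [
    {"ts": datetime(2015, 3, 2, 10, 0, 2), "src": "10.1.1.5",
     "dst": "192.0.2.80", "dport": 443, "action": "deny"},
]

def join_on_tuple(flows, logs, window=timedelta(seconds=5)):
    """Match flows to firewall decisions on (src, dst, dport) within a window."""
    # Index the logs by connection tuple so the join is linear, not quadratic.
    by_tuple = {}
    for log in logs:
        by_tuple.setdefault((log["src"], log["dst"], log["dport"]), []).append(log)
    for flow in flows:
        for log in by_tuple.get((flow["src"], flow["dst"], flow["dport"]), []):
            if abs(flow["ts"] - log["ts"]) <= window:
                yield flow, log

for flow, log in join_on_tuple(netflow, firewall_logs):
    print(f"{flow['src']} -> {flow['dst']}:{flow['dport']} "
          f"({flow['bytes']} bytes) was '{log['action']}' by the firewall")
```

A 1.2 MB flow that the firewall claims to have denied is exactly the kind of contradiction that stays invisible while the two datasets live in separate toolsets.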
The Network State Collection, Correlation, and Analytics Working Group is focused on solving these issues.
Three use cases are being developed: security analytics, real-time performance analytics, and real-time topology analytics. “While there is some recent focus on the application of east-west security policy within a data center, detection of security events remains a challenge,” he says. “Analysis of network state information and trending of state data over time would enable an organization to identify the security posture of its network. Understanding that your network is under attack, as early as possible, is key to successful mitigation of that attack.”
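The “trending of state data over time” that Secher mentions can start as simply as a deviation check against a rolling baseline. The sketch below flags an hour whose inbound connection-attempt count sits far outside the historical trend; the counter, numbers, and three-sigma threshold are assumptions, and any state series (flow counts, interface drops, ARP churn) could be trended the same way.

```python
import statistics

# Hypothetical hourly counts of inbound connection attempts.
baseline = [1020, 980, 1010, 995, 1005, 990, 1015, 1000]
current = 1900

mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)
z = (current - mean) / stdev  # how many standard deviations off-trend we are

if z > 3:  # three-sigma rule of thumb; tune per network
    print(f"possible attack: {current} attempts vs. baseline {mean:.0f} (z={z:.1f})")
```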
If successful, the group’s work will help operations people everywhere. “What we’re addressing is what network analysts do on a daily basis,” says Secher. “Our goal is to provide a framework for faster detection of issues in the network, as indicated by state conditions, and to provide the ability to perform ‘what-if’ analysis on the network.”
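As a closing illustration of the ‘what-if’ analysis Secher describes, the sketch below models a toy topology as a graph and asks how reachability changes when a link fails. It assumes the third-party networkx library, and the nodes and links are invented for the example.

```python
import networkx as nx  # assumes the third-party networkx package is installed

# Toy topology; nodes and links are invented for this sketch.
net = nx.Graph()
net.add_edges_from([
    ("core1", "core2"), ("core1", "dist1"), ("core2", "dist2"),
    ("dist1", "access1"), ("dist2", "access1"),
])

def what_if_link_down(graph, link, src, dst):
    """Report how src-to-dst connectivity changes if `link` fails."""
    trial = graph.copy()
    trial.remove_edge(*link)
    if not nx.has_path(trial, src, dst):
        return f"{src} -> {dst}: unreachable without {link}"
    before = nx.shortest_path(graph, src, dst)
    after = nx.shortest_path(trial, src, dst)
    return f"{src} -> {dst}: path {before} becomes {after}"

print(what_if_link_down(net, ("core1", "dist1"), "core1", "access1"))
# core1 -> access1: path ['core1', 'dist1', 'access1'] becomes
# ['core1', 'core2', 'dist2', 'access1']
```

Running the same question across every link before a maintenance window is the planning-side payoff: the state data answers “what breaks if this fails?” before anything actually fails.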