by Jim Frey
Anyone who has followed network management in its various forms over the last couple of decades knows that true breakthroughs are a rarity. There have been plenty of ideas and innovative concepts, but few changes have made a significant difference or stood the test of time. On the data source side, SNMP was one, RMON probes were another, and NetFlow was yet another. Streaming telemetry, as a successor to SNMP, looks promising, but we’ll have to wait and see if it has lasting power.
Beyond those few significant changes in data sources, there have been plenty of attempts at breakthroughs in the network management systems (NMS) layer. In my view, these all boil down to two primary challenges: scalability and analytics. And while creative use of appliances, complex data models, rules engines, machine-assisted learning, predictive algorithms, etc. are all important elements, there’s a fundamental underlying issue from which nearly all current products on the market suffer: pre-cloud architecture.
Cloud and big data represent real opportunities for a network management breakthrough. With the arrival of cloud comes not only accelerated deployments and operational simplicity, but something else: the compute power to take on the truly big problems in data analytics. The steadily increasing maturity of cloud and big data technologies is making previously unimagined network management feats possible.
The Need for Change
Network management is a software application, the goal of which is to enable operational and planning efficiencies in order to assure service and uphold user experience. Its core functions are the ingest of network management telemetry data (mostly metadata), data storage and management, and presentation of the data to human and automation users such that they can make decisions to achieve the goals.
There is no lack of network management data on which to operate. There are many types: polling, logs, traffic flow record exports, packet capture (pcap), routing, and network performance metrics. Each can generate large to truly massive volumes of data in a modern network. Take traffic flow data as an example. Unsampled flow data can generate UDP packets roughly equivalent to 1% of total data plane traffic on a network. At 100 Gbps, that’s 1 Gbps of flow record data! Even with heavy sampling, a single large network infrastructure element can easily spit out hundreds of millions of flow records per day.
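To make the arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Python. The 1% export fraction and the 100 Gbps link come from the text; the function names and the figure of 200 million records per day (standing in for "hundreds of millions") are assumptions chosen purely for illustration.

```python
SECONDS_PER_DAY = 86_400


def flow_export_bps(link_bps, export_fraction=0.01):
    """Estimate flow-export bandwidth as a fraction of data plane traffic.

    The 1% default is the rough figure quoted in the article for unsampled flow data.
    """
    return link_bps * export_fraction


def bytes_per_day(bits_per_second):
    """Convert a sustained bit rate into bytes accumulated over one day."""
    return bits_per_second * SECONDS_PER_DAY / 8


def records_per_second(records_per_day):
    """Average ingest rate implied by a daily record count."""
    return records_per_day / SECONDS_PER_DAY


export_bps = flow_export_bps(100e9)      # 100 Gbps of data plane traffic
daily_bytes = bytes_per_day(export_bps)  # export volume if sustained all day
rate = records_per_second(200_000_000)   # hypothetical 200M records/day

print(f"export rate: {export_bps / 1e9:.1f} Gbps")
print(f"daily volume: {daily_bytes / 1e12:.1f} TB")
print(f"average ingest: {rate:.0f} records/s")
```

Sustained for a full day, that 1 Gbps of export traffic adds up to roughly 10.8 TB, which gives a sense of why retaining unsummarized flow data is a storage problem as much as an ingest problem.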
Most network management products have been able to ingest one category of data at a high rate for some time, but beyond that, there are serious architectural deficiencies. First of all, processing limitations of pre-cloud, single-server software architectures mean that products can’t ingest more than a couple of types of data at best. From a storage point of view, few network management products can store raw, detailed data at scale. Most purge the details quickly and retain only a few summaries or aggregates of the original data. From a data management point of view, because of limited memory, even summary data is usually split into siloed data tables, with very few key fields linking them into a rigid hierarchy. This makes it nearly impossible to meaningfully analyze data across different tables. Presentation has focused nearly exclusively on the UI, while most APIs are either non-native or lag far behind implemented features or data fields.
This is problematic for a number of reasons. First, by discarding all the details, the vast majority of the operational value and much of the planning value of that copious network telemetry data is lost. Second, different data types are trapped in siloed management tools. As those siloed tools pile up (many of which actually consume the same data, but can only manage it for a small range of use cases), more tools such as packet brokers are needed to distribute more and more copies of the same data to the rapidly expanding toolset. Finally, because the tools are separate, they require swivel-chair, manual reconciliation by overtaxed humans.
The fractured, siloed and summarized state of network management leaves gaping holes in visibility, reduces humans to guesswork, and roadblocks automation.
How Big Data Changes the Picture
Big data addresses many of these issues. But before we get there, we have to be frank. Some network management/OSS observers turned negative on big data due to initial stumbles. Early attempts leveraged common big data tool distributions such as Hadoop, which were designed for post-processing existing sets of data. Those limitations and a related lack of use case clarity doomed most initial big data and network analytics experiments, resulting in so-called ‘data lakes’ that required millions of dollars to be spent building analytical trawlers to fish out value. Too often, data lakes degraded into stagnant data dead seas.
Big data has matured, though. Newer open source and purpose-built engines allow for streaming ingest and unification of multiple data sets, retention of raw data details for months, full indexing of all fields, and ad hoc, multi-dimensional analyses performed in real time. Cloud-scale architectures afford the compute power to create far more sophisticated and automated anomaly detection than previously possible. Cloud-friendly native APIs make it easy to present finely sculpted datasets to external automation programs.
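As a rough illustration of why retaining raw detail matters for ad hoc, multi-dimensional analysis, consider the toy Python sketch below. It is not how any particular engine works; the records, field names (src_asn, dst_port, country, bytes), and the bytes_by helper are all hypothetical, and the AS numbers are taken from the range reserved for documentation.

```python
from collections import defaultdict

# Hypothetical raw flow records; fields and values are illustrative only.
FLOWS = [
    {"src_asn": 64500, "dst_port": 443, "country": "US", "bytes": 1200},
    {"src_asn": 64500, "dst_port": 53, "country": "DE", "bytes": 300},
    {"src_asn": 64501, "dst_port": 443, "country": "US", "bytes": 800},
    {"src_asn": 64501, "dst_port": 443, "country": "DE", "bytes": 500},
]


def bytes_by(flows, *dimensions):
    """Sum bytes grouped by any combination of fields chosen at query time."""
    totals = defaultdict(int)
    for flow in flows:
        key = tuple(flow[d] for d in dimensions)
        totals[key] += flow["bytes"]
    return dict(totals)


# A summary-only tool keeps just one pre-chosen grouping like this one...
summary_by_asn = bytes_by(FLOWS, "src_asn")

# ...while retained raw records can be regrouped along any other dimensions.
by_port_country = bytes_by(FLOWS, "dst_port", "country")

print(summary_by_asn)
print(by_port_country)
```

Once only summary_by_asn survives, a question such as "how much HTTPS traffic went to each country?" can no longer be answered, because the port and country fields were discarded at aggregation time. With the raw records still available, it is a single regrouping.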
Use case-driven big data network analytics is happening today. Global, multi-terabit networks are using big data to automatically program geo server load balancers based on network performance, traffic flow, and geoIP datasets. Ad-technology companies are using big data to monitor performance by network path, and adjust traffic paths to assure revenue. Service providers of all sizes are using big data insights to reduce transit costs and improve service performance. Big data is being harnessed to more accurately detect attacks such as DDoS and relied upon to trigger automated mitigation. The future is now.
Big data unlocks the value of network data as never before possible. As more organizations invest in big data network analytics and experience freedom from network management silos and automation roadblocks, there will be little reason to stay with previous approaches.
Author bio
Jim Frey
Jim has over twenty years of experience in the network management tools and technology sector, in roles ranging from product manager to marketing executive to industry analyst. Most recently, he was VP of Research with Enterprise Management Associates, and before that he was VP of Marketing at NetScout Systems.