Data center monitoring and analytics (M&A) requires a comprehensive metadata collection platform that leverages Big Data processing, Machine Learning (ML), and Artificial Intelligence (AI) analytics to deliver application-to-network visibility, correlation, and modeling for data center network fabrics. The platform collects network stream metadata (not actual payload data) from devices across the data center network using telemetry technologies. This data is stored in a data lake and processed with Big Data technology, a structure that allows rapid access, searching, and processing of the enormous volumes collected, on the order of tens of billions of records per second. Combined with collected network device state, this data allows the system to form a “network state” database in near real time and to record that state over time, much like a “network state DVR”. This data lake of current and recorded past network state delivers tremendous power to enhance network operations in many ways, resulting in direct business benefits.
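To make the idea concrete, the sketch below shows, purely as an illustration, the kind of flow-metadata record and time-indexed state store such a platform implies; the names FlowRecord and NetworkStateDVR, and every field in them, are hypothetical rather than part of any specific product.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import time

# Hypothetical flow-metadata record: only stream metadata, never payload bytes.
@dataclass
class FlowRecord:
    src: str               # source endpoint (e.g. VM or pod IP)
    dst: str               # destination endpoint
    ingress_device: str
    egress_device: str
    bytes_seen: int
    latency_us: float
    timestamp: float = field(default_factory=time.time)

# Toy "network state DVR": timestamped snapshots that can be replayed later.
class NetworkStateDVR:
    def __init__(self) -> None:
        self._snapshots: List[Dict] = []

    def record(self, device_state: Dict[str, str], flows: List[FlowRecord]) -> None:
        """Append one point-in-time view of device state plus observed flows."""
        self._snapshots.append({
            "ts": time.time(),
            "devices": dict(device_state),
            "flows": list(flows),
        })

    def replay(self, since_ts: float) -> List[Dict]:
        """Return all recorded snapshots newer than since_ts, oldest first."""
        return [s for s in self._snapshots if s["ts"] >= since_ts]
```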
Network Visualization
DC M&A should not just collect network telemetry and state information; it should parse, process, and present the data in visual formats that are easy to examine and comprehend. Visualizing network and flow state enables rapid, visual understanding of the health of the network and of the applications running on it. This capability greatly enhances an IT organization’s ability to quickly assess the state of the network or isolate issues should they arise.
Application to Network Visibility
Accurate tracking and mapping of application flow information to network state creates natural, intuitive correlation maps between network nodes, paths, devices, and applications. This lets IT personnel trace application issues through the network, seeing across the layers of abstraction typical of SDN and virtualized networks. IT personnel can quickly identify the relationship between application and network, shortening the cycle of accurately eliminating or confirming the network as a contributing factor in any application performance or availability issue. Faster elimination or identification of root cause creates a better resolution experience for IT staff and more productive time for end users.
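As a rough illustration of such a correlation map, the following sketch (with hypothetical names and schema) inverts per-flow path data so that each application can be traced to the fabric devices that carry its traffic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative only: map each application flow (src, dst) to the fabric devices
# it traverses, so an application issue can be narrowed to the network elements
# that actually carry its traffic.
def build_app_to_network_map(
    app_flows: Dict[str, List[Tuple[str, str]]],    # app name -> [(src, dst), ...]
    flow_paths: Dict[Tuple[str, str], List[str]],   # (src, dst) -> [device, ...]
) -> Dict[str, List[str]]:
    app_devices: Dict[str, set] = defaultdict(set)
    for app, flows in app_flows.items():
        for flow in flows:
            for device in flow_paths.get(flow, []):
                app_devices[app].add(device)
    return {app: sorted(devs) for app, devs in app_devices.items()}

# Example: if "checkout" traverses spine-2, a fault on spine-2 is an immediate
# candidate cause for a checkout latency complaint.
apps = {"checkout": [("10.0.1.5", "10.0.2.9")]}
paths = {("10.0.1.5", "10.0.2.9"): ["leaf-1", "spine-2", "leaf-4"]}
print(build_app_to_network_map(apps, paths))
```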
Proactive Monitoring
In addition to collecting, creating, and storing this network state, DC monitoring should apply analytics to the data to create direct mappings and visualizations of application and network performance. By employing machine learning algorithms, the platform can build dynamic, self-updating watermarks that represent learned “normal” operating conditions. This enables proactive monitoring: abnormal events or trends that could lead to issues are flagged as they develop. Recognizing such events or behaviors early, and flagging them for action by IT personnel before they escalate, can greatly reduce the time an organization loses to unexpected performance or service availability issues by surfacing them before they become highly impactful.
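One simple way to picture such a self-updating watermark, assuming nothing about the actual algorithms a given platform uses, is an exponentially weighted baseline per metric that flags samples falling well outside the learned band:

```python
# Minimal sketch: an exponentially weighted mean/variance baseline per metric.
# Real platforms would use far richer ML models over many metrics at once.
class AdaptiveWatermark:
    def __init__(self, alpha: float = 0.05, k: float = 3.0, warmup: int = 3) -> None:
        self.alpha = alpha      # smoothing factor: how quickly "normal" adapts
        self.k = k              # deviations beyond k standard deviations are abnormal
        self.warmup = warmup    # samples to observe before flagging anything
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, value: float) -> bool:
        """Feed one sample; return True if it looks abnormal versus the baseline."""
        self.n += 1
        if self.mean is None:                 # first sample seeds the baseline
            self.mean = value
            return False
        deviation = value - self.mean
        std = self.var ** 0.5
        abnormal = self.n > self.warmup and abs(deviation) > self.k * std
        # Update the baseline regardless, so the watermark keeps learning.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return abnormal

wm = AdaptiveWatermark()
for latency_ms in [2.1, 2.0, 2.2, 2.1, 9.5]:   # last sample spikes
    if wm.update(latency_ms):
        print(f"flag for review: {latency_ms} ms is outside the learned band")
```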
Visibility + Data Source Maximize Intent-Driven Networking
Besides acting as a standalone platform for network and application visibility and monitoring, DC M&A acts as a data source, exposing the data lake it creates to external applications and platforms. Providing complete visibility of the network’s state to the Intent Engine allows reconciliation of the network’s current operating state with its desired state, preserving intent while correcting the network’s operational state to bring it back in line with requirements and within desired operating parameters.
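A minimal sketch of this reconciliation loop, with a purely illustrative state schema, might look like the following; how a real Intent Engine represents desired state and applies remediation is beyond this sketch.

```python
# Compare the desired (intended) state against the observed state exposed by
# the monitoring platform and emit corrective actions for any drift.
def reconcile(desired: dict, observed: dict) -> list:
    actions = []
    for key, want in desired.items():
        have = observed.get(key)
        if have != want:
            actions.append(f"remediate {key}: observed={have!r}, intended={want!r}")
    return actions

desired_state = {"vlan-200 on leaf-3": "configured", "bgp peer spine-1<->leaf-3": "established"}
observed_state = {"vlan-200 on leaf-3": "configured", "bgp peer spine-1<->leaf-3": "idle"}
for action in reconcile(desired_state, observed_state):
    print(action)   # -> remediate bgp peer spine-1<->leaf-3: observed='idle', intended='established'
```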
Predictive Modeling
The next step beyond machine learning, building on proactive monitoring and intent-driven networking, is a DC M&A architecture designed to apply AI algorithms to network state data, enabling predictive modeling of the overall environment. This future capability will provide a predictive snapshot of how the network might evolve or behave given forthcoming changes or events. Such scenario modeling will let network operators and IT staff assess the potential impact of network changes or events before they happen, providing unprecedented risk-management capabilities in network planning, resource provisioning, and fault prevention.
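As a toy illustration of what-if scenario modeling, assuming nothing about the actual AI models involved, the sketch below projects link utilization forward with a naive linear trend and checks whether a planned maintenance drain would overload the absorbing link:

```python
# Illustrative what-if check: project per-link utilization forward, then model a
# planned event that shifts one link's traffic onto another, and see whether the
# projection breaches a capacity threshold before the change is made.
def project(history: list, steps: int) -> float:
    """Naive linear extrapolation of the observed trend (illustration only)."""
    if len(history) < 2:
        return history[-1]
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope * steps

link_util = {                      # recent utilization samples, fraction of capacity
    "spine1-leaf2": [0.41, 0.44, 0.47, 0.50],
    "spine2-leaf2": [0.30, 0.31, 0.33, 0.34],
}
drained, absorber = "spine1-leaf2", "spine2-leaf2"   # planned maintenance scenario

future_load = project(link_util[absorber], steps=6) + project(link_util[drained], steps=6)
if future_load > 0.8:
    print(f"planned drain of {drained} risks overloading {absorber}: projected {future_load:.0%}")
```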
Summary
Data center M&A lays the foundation for a future of network monitoring that proactively prevents and predicts network and application issues, with the goal of significantly reducing the likelihood of service outages or minimizing their impact. Starting with the seemingly simple capability of easy-to-understand visualization of network state and related application performance, the ONUG M&A WG paves the way toward a future network monitoring and modeling environment capable of heading off problems in the data center before they happen.
George Zhao
Director, OSS & Ecosystem
America Research Center
Huawei Technologies Co., Ltd.
George has 25 years of experience in networking and software architecture and has been an open source evangelist for the past four years. He is currently Director of Open Source and Ecosystem at Huawei Technologies Co., Ltd. George has participated in ONUG working groups since April 2016.
George received a bachelor’s degree in electrical engineering from McGill University and a master’s degree in computer engineering from the University of Toronto.