Artificial Intelligence (AI) is sparking innovation and transforming the data center landscape. Each new AI model requires massive buildouts, bringing greater scale and complexity with every iteration. Staying competitive demands new approaches to boost operational efficiency and control rising costs. In this blog, we’ll explore how a Network Source of Truth (NSoT), combined with a comprehensive automation strategy, can be a cornerstone of this transformation.
The scale of AI computational power, and consequently energy consumption, is expected to surge in the coming years. For instance, GPT-3, released in 2020 and later popularized by ChatGPT in late 2022, required approximately 1,000 GPUs to train. In contrast, GPT-4, launched in March 2023, reportedly demanded a staggering 25,000 GPUs, a 25-fold increase. The anticipated fifth iteration, expected in late 2024 or early 2025, is projected to require at least 20 times more computational power to deliver meaningful advancements.
Notably, data centers have made significant strides in energy efficiency, leading to substantial financial savings. However, further improvements are becoming increasingly difficult. In 2007, the average PUE (Power Usage Effectiveness) was 2.5. By 2013, this figure had dropped to 1.65. Since then, progress has slowed, with the industry average reaching just 1.58 in 2023, an improvement of less than a tenth of a point in a decade. At this rate, approaching perfect efficiency (a PUE of 1.0) could take decades, yielding only incremental gains along the way.
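To put those numbers in perspective, recall that PUE is defined as total facility energy divided by the energy delivered to IT equipment, so everything above 1.0 is overhead (cooling, power distribution, and so on):

\[
\mathrm{PUE} = \frac{E_{\text{facility}}}{E_{\text{IT}}}, \qquad \text{overhead per watt of IT load} = \mathrm{PUE} - 1
\]

By that measure, overhead fell from 1.5 W per watt of IT load in 2007 to 0.65 W in 2013, a reduction of more than half, but only to 0.58 W by 2023, roughly an 11% improvement over the following decade.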
Identifying new areas for efficiency to offset rising energy costs will be crucial in controlling operating expenses. Network management, a significant driver of a data center’s operating expenses, offers ample opportunity for refinement. Without the right solutions in place, however, network management costs will continue to rise with the growing complexity and scale of AI-driven infrastructure. Consider that individual data center racks are limited by power density, typically housing only 8-16 AI-capable GPUs per rack. This constraint necessitates interconnecting thousands of racks. Moreover, even the largest data centers will soon be unable to accommodate future AI models, requiring new interconnections and transmission media between multiple data centers. The demand for network services, such as incident and change management, rises with every additional GPU, rack, and interconnection, which in turn drives up personnel and operating costs.
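A rough back-of-the-envelope calculation, using the figures already mentioned (roughly 25,000 GPUs for a GPT-4-class training run and 8-16 GPUs per rack), shows where "thousands of racks" comes from:

\[
\frac{25{,}000\ \text{GPUs}}{16\ \text{GPUs/rack}} \approx 1{,}563\ \text{racks}
\qquad\text{to}\qquad
\frac{25{,}000\ \text{GPUs}}{8\ \text{GPUs/rack}} = 3{,}125\ \text{racks}
\]

Every one of those racks must be cabled, addressed, monitored, and kept in configuration compliance, and that workload only grows as models do.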
Moreover, scaling AI necessitates rethinking network architectures. New topologies and technologies will need to be invented and iterated upon as we search for optimal solutions. For example, the Ultra Ethernet Consortium is developing a new Ethernet specification, Ultra Ethernet, designed specifically for AI workloads. Every few years, these topologies will need to be overhauled to align with the latest best practices. Importantly, this is not just a challenge for the largest enterprises. Take Midjourney, for example, a generative AI image service with fewer than 150 employees. Despite its small size, it ranks among the world’s largest GPU consumers. In the next decade, we can expect hundreds of organizations like Midjourney to emerge, each serving its own niche market with equally demanding infrastructure needs.
Given this challenge, how can organizations keep pace with such rapid expansion? New efficiency gains can be unlocked through a network automation strategy guided by a Network Source of Truth (NSoT). An NSoT models the intended state of the network and uses this data to inform automations that drive down operating expenses, reduce business risks, and enhance agility. As AI models grow in size and complexity, an automated approach offers a way to capture value: it enables network administrators to keep pace with every additional GPU, rack, and interconnection without a proportional increase in manual effort or operating costs.
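To make the NSoT concept concrete, here is a minimal sketch in Python of how an intended-state record might drive automation. The data model, field names, and configuration syntax are illustrative assumptions, not any particular NSoT product's schema; real platforms expose far richer models and APIs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InterfaceIntent:
    """Illustrative intended-state record an NSoT might hold for one interface."""
    device: str
    name: str
    description: str
    vlan: int
    enabled: bool

def render_config(intent: InterfaceIntent) -> str:
    """Generate a device configuration snippet from the intended state."""
    lines = [
        f"interface {intent.name}",
        f" description {intent.description}",
        f" switchport access vlan {intent.vlan}",
        " no shutdown" if intent.enabled else " shutdown",
    ]
    return "\n".join(lines)

def detect_drift(intent: InterfaceIntent, observed: dict) -> list[str]:
    """Compare intended state against observed state and report differences."""
    drift = []
    if observed.get("vlan") != intent.vlan:
        drift.append(
            f"{intent.device}/{intent.name}: vlan {observed.get('vlan')} "
            f"differs from intended {intent.vlan}"
        )
    if observed.get("enabled") != intent.enabled:
        drift.append(f"{intent.device}/{intent.name}: admin state differs from intent")
    return drift

if __name__ == "__main__":
    # Hypothetical record: an access port for a GPU node on a leaf switch.
    intent = InterfaceIntent("leaf-101", "Ethernet1/1", "gpu-node-01 uplink", 100, True)
    print(render_config(intent))
    print(detect_drift(intent, {"vlan": 200, "enabled": True}))
```

Because the intent lives in structured data rather than only in device configurations, the same record can feed configuration generation, compliance checks, and drift remediation, which is where much of the operating-expense saving comes from.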
Combining AI solutions with intent-based networking is a natural fit: in both domains, data stands as the cornerstone of insight, decision-making, and innovation.
The ongoing transformation of data centers driven by AI demands scalability, efficiency, and adaptability at unprecedented levels. Leveraging a Network Source of Truth alongside a robust automation strategy equips data centers with the critical tools needed to manage this complexity, ensuring they can support AI models while maintaining high standards of performance, reliability, and operational efficiency. As AI continues to evolve, so too will the infrastructure and technologies that support it.