The conversation has changed from what AI can do to what it takes to run it. That shift is driving unprecedented focus on the foundational layers of IT, and the AI Networking Summit covers them all: Infrastructure, Networking, Automation, and Security. The summit brings these domains together in four dedicated tracks designed to help enterprises move from AI pilots to scalable, secure deployment. This is where strategy meets implementation, and where the AI-ready enterprise takes shape.
Build the foundation for enterprise AI. This track covers private infrastructure, cost models, and scaling AI workloads in production.
We know most enterprises aren’t running tens of thousands of GPUs (yet!), but AI is quickly accelerating every industry. Before you can run AI, you need a network built to carry it: smart, secure, and ready to evolve. This session cuts through the complexity with a practical look at how enterprises can get AI infrastructure up and running without disrupting existing workloads. Learn what it takes to run traditional and AI workloads seamlessly, with unified operations, automation, and real-time observability across both fabrics. A proven modernization path is your blueprint for success. No matter where you are on your AI journey, you’ll come away with practical ways to simplify, future-proof, and supercharge your data center for the Agentic Era.
AI has evolved from passive insights to active decision-making—drafting emails, generating code, and executing tasks across enterprise systems. As AI agents grow more powerful, they also introduce unpredictability, expanding attack surfaces and compliance risks. In this session, Netskope shows how to move beyond simply enabling AI to securing it at every touchpoint. We’ll outline a practical blueprint to protect AI agents end-to-end across users, applications, and data.
Learn how to maintain visibility into agent behavior, apply advanced DLP and threat protection to reduce misuse, and build an AI security strategy that balances innovation with governance.
Across the Global 2000, most AI initiatives remain stuck in experimentation and proof-of-concept phases. The reasons are clear: GPU and accelerator costs remain high, hardware obsolescence cycles are accelerating, skilled infrastructure talent is scarce, equipment lead times are unpredictable, and many enterprises lack the power, cooling, and space required to scale AI infrastructure.
This session examines how the vendor ecosystem is responding with new silicon, system architectures, networking fabrics, managed AI infrastructure, and consumption models designed to reduce risk and accelerate enterprise adoption. The discussion will focus on how enterprises can move beyond experimentation to deploy production AI infrastructure that enables true business transformation and competitive advantage.
Key Questions
Takeaways
Redesign the network for the AI era with agentic overlays, A2A fabrics, and architectures built for autonomous systems.
As AI models continue to scale, both training and inference are growing rapidly in operational importance. Training pushes the limits of compute density and interconnect scale, while inference now dominates production workloads. Together, these forces are reshaping AI system architectures.
Meeting these demands requires a next-generation networking fabric that can:
We will present the latest advancements in industry initiatives—including Ethernet Scale-Up Networking (ESUN), Scale-Up Ethernet Transport (SUE-T), and Open Cluster Design for AI—and show how Ethernet is democratizing large-scale AI deployments through insights from G42 and other AI operators.
Enterprise network and security operations centers are sitting on a goldmine of untapped efficiency — not from bleeding-edge autonomous agentic AI, but from mature, battle-tested capabilities already at their fingertips. This panel brings together practitioners who have deployed large language models, machine learning pipelines, and intelligent automation within live NOC and SOC environments to cut mean-time-to-resolution, reduce alert fatigue, and reclaim analyst capacity. Panelists will share concrete before-and-after metrics, walk through implementation playbooks that worked — and the integration pitfalls, data-quality traps, and organizational resistance that nearly derailed them. If your team is still triaging thousands of alerts manually or troubleshooting network issues with legacy runbooks, this session delivers the pragmatic roadmap to operationalize AI capabilities that are proven, available today, and delivering measurable ROI.
Agentic AI overlays promise networks that sense, reason, and act—but the real challenge is treating AI agents as first-class identities in the network. Building on ONUG’s A2A and Agentic Overlay concepts, this session explores how to standardize identity, trust, and messaging among agents, and how those agents interact with the underlying network fabric.
Key Questions:
• What should an A2A reference architecture look like in a large enterprise (broker vs. mesh, policy dialects, schemas)?
• How do we provide Zero-Trust identity and least-privilege capability routing for agents issuing network changes?
• What telemetry is needed to link agent intents → network changes → workload outcomes → cost?
• How do you avoid “agent sprawl” and conflicting policies across multiple vendors’ AI assistants?
Takeaways:
• A clear mental model of Agentic Overlays and A2A in networking.
• Governance patterns for letting agents safely orchestrate NaaS, routing, and provisioning.
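One of the questions above, least-privilege capability routing for agents that issue network changes, can be made concrete with a small sketch. Everything here is illustrative: the agent names, capability tuples, and `authorize` function are assumptions for the example, not part of any ONUG reference architecture.

```python
# Minimal sketch of capability-scoped, time-boxed authorization for AI agents.
# All identifiers are hypothetical; a real deployment would back this with an
# identity provider and signed tokens rather than an in-memory table.
import time

CAPABILITIES = {
    "noc-agent-1": [
        # (verb, resource prefix, expiry epoch): each grant is narrow and time-boxed
        ("read",  "telemetry/",   time.time() + 3600),
        ("write", "qos/site-12/", time.time() + 600),
    ],
}

def authorize(agent_id: str, verb: str, resource: str, now=None) -> bool:
    """Deny by default; allow only an unexpired capability that matches
    both the verb and the resource prefix."""
    now = time.time() if now is None else now
    for v, prefix, expires in CAPABILITIES.get(agent_id, []):
        if v == verb and resource.startswith(prefix) and now < expires:
            return True
    return False

assert authorize("noc-agent-1", "write", "qos/site-12/shaping")
assert not authorize("noc-agent-1", "write", "bgp/site-12/policy")  # out of scope
assert not authorize("unknown-agent", "read", "telemetry/links")    # no identity
```

The deny-by-default shape is the point: an agent with no matching, unexpired grant simply cannot act, which is what makes agent sprawl auditable.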
As traffic demands increased, eBay needed to scale network bandwidth without sacrificing cost efficiency or operational predictability. In this session, eBay will discuss how it used open networking and SONiC to move from 100G to 400G, enabling faster upgrades, greater hardware flexibility, and stronger long-term operational control. Attendees will gain practical insight into how standardized, vendor-agnostic networking can help modernize infrastructure while supporting scale, efficiency, and consistent network operations.
As campus networks transition from general connectivity to AI-ready infrastructure, the surge in LLM inference traffic demands a fundamental architectural shift. Localized AI applications require high throughput and ultra-low latency that legacy systems weren’t built to handle. This session explores how SONiC’s open-source framework provides the telemetry, load balancing, and agility needed to adapt campus fabrics for heavy inference workloads. Join us for a deep dive into the foundational capabilities of SONiC that turn bandwidth-heavy AI demands into seamless learning experiences.
ISP incidents remain one of the fastest ways to impact user experience and one of the hardest to prove quickly. In this session, we’ll walk through a real customer example and show how we detected an ISP issue early by comparing user-to-app path performance across regions and ISPs. We’ll also cover how “peer” telemetry (crowd signals across many environments) can help you validate whether an issue is inside your network, at an ISP, or specific to a single site.
We’ll focus on the practical workflow: spotting the anomaly, confirming it’s an ISP event, narrowing blast radius, and taking action—like rerouting users to a healthier egress/data center—to cut troubleshooting time dramatically.
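The detection step of that workflow can be sketched in a few lines: compare user-to-app latency across sites grouped by ISP, and flag an ISP only when every site using it degrades while other ISPs stay healthy. All site names, ISP names, and numbers below are illustrative, not the presenters' actual telemetry.

```python
# Hypothetical sketch of ISP-level anomaly confirmation from per-site,
# per-ISP latency samples. Data and thresholds are invented for illustration.
from statistics import median

# latency samples (ms) keyed by (site, isp), e.g. from synthetic path tests
samples = {
    ("nyc-01", "isp-a"): [32, 35, 180, 175, 190],
    ("chi-02", "isp-a"): [41, 44, 168, 172, 181],
    ("nyc-03", "isp-b"): [29, 31, 30, 33, 28],
}

BASELINE_MS = {"isp-a": 40, "isp-b": 30}  # rolling baselines, assumed known
DEGRADED_FACTOR = 2.0                     # 2x baseline counts as degraded

def classify(samples, baselines):
    """Return ISPs degraded at every site that uses them.

    If all sites on one ISP degrade while sites on other ISPs stay healthy,
    the fault is more likely the ISP than any single site or the app itself.
    """
    by_isp = {}
    for (site, isp), vals in samples.items():
        degraded = median(vals) > baselines[isp] * DEGRADED_FACTOR
        by_isp.setdefault(isp, []).append(degraded)
    return [isp for isp, flags in by_isp.items() if all(flags)]

print(classify(samples, BASELINE_MS))  # ['isp-a']
```

The "peer telemetry" idea in the session extends the same comparison across many customer environments, which strengthens the evidence that the degradation sits at the ISP rather than inside your own network.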
Networking plays a critical role in the accelerated compute fabric. The critical features that enable high-performance networking for AI workloads are being defined and ratified in UEC and ESUN (OCP). Marvell is a contributor to both standards and has developed a full portfolio of scale-out and scale-up switches powered by SONiC to enable best-in-class AI fabrics.
eBay operates one of the most advanced enterprise networks built on SONiC and large-scale automation. In this technical deep dive, Rick Casarez provides an inside look at the architecture, design principles, and operational decisions behind eBay’s production SONiC deployment.
Rick will also demonstrate how eBay is applying AI-powered LLMs to assist and automate NOC workflows, enabling engineers to investigate incidents faster, interpret telemetry, and streamline troubleshooting runbooks. The session concludes with a look at how eBay’s network architecture is evolving to support AI workloads and more autonomous network operations.
Key Topics Covered
• Architecture of eBay’s production SONiC-based network
• Design principles behind eBay’s disaggregated networking strategy
• Hardware abstraction and multi-vendor switching architecture
• Automation frameworks and infrastructure-as-code for network operations
• Telemetry, observability, and operational tooling at scale
• Applying LLMs to assist and automate NOC workflows
• AI-assisted troubleshooting, incident investigation, and operational runbooks
• Preparing enterprise networks for AI infrastructure and high-performance workloads
Who Should Attend
• Enterprise network engineers and architects evaluating or deploying SONiC
• NOC and operations leaders exploring AI-assisted operations
• Infrastructure engineers responsible for automation and network reliability
• Platform and SRE teams operating large-scale distributed infrastructure
• IT leaders preparing networks to support AI and high-performance compute environments
As enterprise networks become more critical — and more complex — the limits of traditional network performance monitoring are increasingly exposed. In today’s AI-driven digital environment, networks remain one of the most static and risk-averse layers of infrastructure, even as business demands for agility, resilience, and real-time insight continue to rise.
Join IDC’s Mark Leary for a data-driven look at how the bar for network observability has been raised, and what IT leaders and suppliers must do to meet the moment.
For most organizations, the low-hanging network issues are solved; thresholds, polling, and playbooks handle them. The remainder? Issues that cross domain boundaries, evade static rules, and cannibalize NOC time while defying existing solutions. At IBM, we believe the right AI architecture can transform this space: time-series foundation models that observe without thresholds, paired with agentic reasoning that forms root cause hypotheses across your full stack. This session covers why the architecture matters and how context engineering, composable skills, and the open Model Context Protocol (MCP) create an extensible AI pipeline enterprises can localize and govern, with human oversight at every decision point. You’ll leave with a clear framework for evaluating where AI fits in your operations model and what it takes to build trust between NOC teams and intelligent systems.
Agentic Networking is the next evolution of cloud and network operations, where intelligent, autonomous agents – powered by LLMs, real-time telemetry, and policy-driven intent – continuously observe, decide, and act across complex, multi-cloud environments. Instead of manually tuning networks or relying solely on static automation, agentic systems interpret high-level business intents (minimize latency for this workload, shift traffic for cost efficiency, isolate anomalies before impact), translate them into actionable network changes, and coordinate with other agents to optimize end-to-end performance. This creates a self-healing, self-optimizing fabric that adapts dynamically to demand, failures, and emerging threats.
We will discuss how Agentic Networking transforms the network from a static transport layer into an intelligent, collaborative control plane: accelerating operations, reducing risk, and enabling entirely new levels of agility for AI-era workloads.
Building on the NYC “From NOC to AOC” concept, this session focuses on operational trust: how to evolve from human-centric operations centers to Agentic Operations Centers (AOCs) where AI agents handle most monitoring and remediation—but humans still own accountability. It ties directly into ONUG’s 2026 focus on agentic operations you can trust, including standard identity, guardrails, and observability.
Key Questions:
• What work should be fully handed to agents in 2026 (Tier-1, Tier-2 incidents, routine changes), and what must remain human-centric?
• How do you train and certify “agent supervisors” to oversee closed-loop systems without being buried in noise?
• How do you design human-in-the-loop patterns that are fast enough for AI-era incidents yet still satisfy governance?
• What changes in org design, skills, and incentives are needed for teams to trust automation?
Takeaways:
• A target operating model for AOCs, including roles, skills, and escalation patterns.
• Examples of closed-loop NetOps/SecOps where agents deliver measurable MTTR and SLO gains.
AI factories are expensive ecosystems. Design, deployment, and workload scheduling must be automated to ensure the desired outcome. This session shows how Aviz ONES delivers the blueprint for SONiC networks and an end-to-end orchestration experience through partner integrations.
Secure the rise of AI agents with zero-trust frameworks, guardrails, and full visibility into machine-driven actions.
As enterprises adopt agentic AI, they face a new class of adversary: autonomous, goal-seeking agents that probe controls, data, and policies at machine speed. Building on ideas like “Digital Darwinism” and adversarial agents competing to optimize infrastructure, this session asks: what happens when attackers weaponize agents, and how do we respond?
Key Questions:
• How will adversarial agents change red-teaming, penetration testing, and threat modeling?
• What defensive patterns are emerging for agent endpoint protection, data controls, and PII sensitivity mitigation?
• Where do AI guardrails, content moderation, and design best practices fit in the security architecture versus at the application layer?
• How do we monitor for agent-vs-agent “arms races” inside the enterprise and prevent unintended escalation?
Takeaways:
• A taxonomy of adversarial agent threats relevant to Global 2000 environments.
• Concrete examples of guardrail policies and monitoring approaches that actually reduce risk.
This session explores the growing risk posed by quantum-enabled decryption, referred to as Q-Day, the point at which sufficiently powerful quantum computers could undermine widely used public-key cryptosystems. Attendees will examine what this shift means for today’s networks, including the “harvest now, decrypt later” risks and the operational impact of large-scale cryptographic compromise. The discussion will also look at quantum technologies as part of a forward-looking security strategy. Panelists will discuss where these approaches are most relevant, how they can complement broader migration efforts, and what organizations should be doing now to build resilient, tamper-evident communications that can withstand future quantum adversaries.
Who Should Attend:
Cybersecurity professionals, network engineers, IT security architects, and executives responsible for data protection in AI-driven environments. This session is especially valuable for those in finance, healthcare, government, and critical infrastructure sectors where encryption compromise would have severe consequences.
What You Will Learn:
Enterprises are deploying agen
In this session, Rein Security
Using the ONU
Agentic AI systems are useless without provable identity, least-privilege authorization, and strong supervision for non-human actors. This session builds directly on board use cases calling for Zero-Trust identity frameworks for agentic AI in Tier-1/Tier-2 operations, extending those concepts into a full enterprise security model.
Key Questions:
• How do you assign and manage identity, credentials, and posture for agents that can provision, patch, and reconfigure networks, applications, and data pipelines?
• What does capability-scoped, time-boxed authorization look like in practice for AI tools and agents?
• How do we create signed, auditable action trails that satisfy internal audit and regulators for “who/what acted, when, and with what policy”?
• How do you integrate agent identity into existing Zero-Trust and privileged access strategies?
Takeaways:
• A practical pattern for a Zero-Trust identity framework for agentic AI across network, cloud, and ITSM stacks.
• Controls and dashboards that make agent actions observable, provable, and reversible.
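The "signed, auditable action trail" question can be illustrated with a minimal sketch. This is an assumption-laden toy: it chains HMAC signatures over agent actions so that editing or reordering any entry breaks verification; a production system would use asymmetric signatures, per-agent keys, and a hardened log store.

```python
# Hedged sketch (all names illustrative): a tamper-evident trail of
# agent-issued actions using an HMAC chain, where each entry also signs
# the previous entry's signature.
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in for a real per-agent credential

def sign_action(prev_sig: str, agent_id: str, action: dict) -> dict:
    """Create an append-only record bound to the previous signature."""
    payload = json.dumps(
        {"prev": prev_sig, "agent": agent_id, "action": action},
        sort_keys=True,
    ).encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"agent": agent_id, "action": action, "prev": prev_sig, "sig": sig}

def verify_chain(entries) -> bool:
    """Recompute every signature; any edit or reorder fails verification."""
    prev = ""
    for e in entries:
        payload = json.dumps(
            {"prev": e["prev"], "agent": e["agent"], "action": e["action"]},
            sort_keys=True,
        ).encode()
        expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        if e["prev"] != prev or e["sig"] != expected:
            return False
        prev = e["sig"]
    return True

trail = [sign_action("", "net-agent-1", {"op": "set_acl", "target": "edge-7"})]
trail.append(sign_action(trail[-1]["sig"], "net-agent-1", {"op": "rollback", "target": "edge-7"}))
assert verify_chain(trail)          # intact chain verifies
trail[0]["action"]["target"] = "edge-9"
assert not verify_chain(trail)      # any tampering is detectable
```

The chaining is what turns a log into evidence: an auditor can answer "who/what acted, when, and with what policy" without trusting the agent that wrote the entries.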
This session presents a collaborative approach to building hybrid AI infrastructure that balances public AI services with private model deployment. As AI technology, economics, and governance rapidly evolve in unpredictable ways, the panel will discuss the need for an Enterprise AI Fabric that delivers long-term flexibility and choice. Moderated by Nick Lippis, the session will explore how aligned architectures, data strategies, and operational models can create secure, high-performance environments adaptable to changing AI consumption patterns. Attendees will gain insight into how coordinated multi-vendor ecosystems can future-proof enterprise AI infrastructure.
What You Will Learn
Who Should Attend
This session examines quantum-safe networking as an emerging approach to strengthening security for high-bandwidth connections across large-scale enterprise data center environments. We will define key concepts in quantum-safe networking – including the role of quantum phenomena such as entanglement in supporting secure, tamper-evident communications over fiber infrastructure and hybrid solutions.
With cyber risk increasing and AI workloads driving demand for large-scale data movement, the discussion will explore where conventional cryptographic approaches face operational and lifecycle challenges on high-speed links, and how quantum-safe architectures may complement existing security models. The session will also address practical adoption considerations, including readiness assessments, implementation planning, and hybrid quantum-classical deployment strategies beyond PQC transitions already underway.
Attendees will leave with a clearer understanding of the technology landscape, relevant use cases, and a pragmatic framework for evaluating quantum networking within enterprise infrastructure roadmaps.
Transform operations from NOC to AOC with trusted automation, human-in-the-loop design, and agent-driven workflows at scale.
Enterprises are investing heavily in AI, yet many initiatives stall when moving from pilot to production—not because of models or data, but the network. The connectivity layer remains fragmented, manual, and static, lacking a unified system of truth across locations, providers, and services. This session introduces a programmable connectivity layer for AI-ready infrastructure, combining location intelligence, automated discovery, quote-to-order automation, and continuous reconciliation. By transforming connectivity into a dynamic, software-driven foundation, enterprises can accelerate deployment, improve cost and performance decisions, and enable AI systems to autonomously manage and optimize infrastructure across multi-cloud and edge environments.
The phrase “network autonomy” is everywhere. But what does it actually look like in practice? In this session, we will demonstrate an interlocking AI architecture working across a real-world network to identify, triage, and resolve traditionally hidden issues in minutes, not hours. Watch foundation models detect the anomaly, the reasoning engine correlate it across topology and config state via MCP, and agentic processes propose and execute governed remediation through Ansible. Just as important, we will introduce a working model for human + AI partnership in which every AI decision is transparent, confidence-building, and overridable.
This presentation explores the collapse of the legacy, reactive NOC and the rise of the AI-driven Agentic Operations Center (AOC). It outlines how enterprises must strategically delegate operational toil to AI—automating telemetry correlation, triage, and analysis—while preserving human oversight for high-stakes decisions. The session highlights Human-in-the-Loop design, agent-to-agent collaboration, and Zero-Trust guardrails as foundations for safe closed-loop automation. Ultimately, it frames the AOC as both a technological and organizational transformation, elevating engineers from reactive troubleshooting to strategic architecture and autonomous network design.
In multitenant AI clusters, bandwidth is not the SLA—Job Completion Time (JCT) is. In this joint session, NextHop.ai and Keysight present a real-world study of how load balancing modes, ECN thresholds, and congestion control tuning directly impact JCT in shared AI Ethernet fabrics. Using production-class switching and traffic emulation, we demonstrate how overly conservative settings can double completion times and waste GPU cycles—and how systematic tuning makes JCT predictable. Attendees will gain a practical framework for designing and validating AI networks around measurable performance outcomes, not just throughput.
GenAI is powerful in network operations—but it’s probabilistic by nature, meaning it produces “most likely” answers that can vary with small prompt changes, making it risky as a direct control mechanism for repeatable, auditable change workflows. In contrast, deterministic automation (RPA in ops contexts) is built for consistent execution: enforcing config standards, running verified OS upgrade workflows, and remediating drift with predictable outcomes.
In this session, you’ll get a simple decision framework for when to use GenAI vs. deterministic automation, why trusted data (a network source of truth) is a prerequisite for either approach to work reliably at scale, and how to avoid “guardrail sprawl” when teams try to force GenAI into deterministic behavior. We’ll close with the practical operating model that balances speed with control: GenAI accelerates triage and recommendations; humans approve; deterministic automation executes changes with pre-checks and rollback plans.
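A decision framework like the one described can be sketched as a small routing function. The rules and field names below are assumptions made for illustration, not the presenters' actual model: changes that must be repeatable and auditable go to deterministic automation, while open-ended analysis goes to GenAI, with a human approving before anything executes.

```python
# Illustrative sketch: route an operations task to GenAI or to
# deterministic automation. Field names are hypothetical.

def choose_engine(task: dict) -> str:
    """Return 'deterministic' or 'genai' for a task.

    Heuristics: anything that mutates the network (config changes, OS
    upgrades, drift remediation) or must produce an exact, repeatable
    output goes to deterministic automation; triage, summarization, and
    recommendations go to GenAI, gated by human approval.
    """
    if task["mutates_network"]:
        return "deterministic"   # changes must be repeatable and auditable
    if task["needs_exact_output"]:
        return "deterministic"   # e.g. compliance-grade reports
    return "genai"               # probabilistic output is acceptable here

tasks = [
    {"name": "summarize incident", "mutates_network": False, "needs_exact_output": False},
    {"name": "os upgrade",         "mutates_network": True,  "needs_exact_output": True},
    {"name": "drift remediation",  "mutates_network": True,  "needs_exact_output": True},
]
for t in tasks:
    print(t["name"], "->", choose_engine(t))
```

The split mirrors the operating model in the abstract: GenAI accelerates triage and recommendations, humans approve, and deterministic automation executes with pre-checks and rollback plans.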
As AI agents become more capable, a new operational model is emerging: role-based agents designed to support specific operational responsibilities. IT and Network Operations teams interact with agents tailored for frontline responders, specialists, and reliability engineers. These agents combine predictive machine learning, contextual memory, and generative reasoning to summarize incidents, identify root cause, recommend validated remediation steps, and execute approved actions within policy guardrails. In this session, we explore the architecture behind role-based operational agents and how they integrate predictive intelligence, episodic memory, and cross-domain telemetry learning. The discussion previews Grokstream’s L1 Agent for frontline operations and how role-based agents may redefine incident management at scale.
The Special Programs Track highlights the breakthrough technologies and initiatives shaping the future of enterprise IT, including SONiC and quantum computing.