Using APIs for End-to-End Process Automation

Blog #3 in a Series on APIs from the ONUG O&A Working Group

Part One of this blog series on APIs for network automation laid the foundation by explaining why APIs (application programming interfaces) matter for network and end-to-end process automation. Part Two looked at implementation details and points to consider when first leveraging API calls for automation. Here in Part Three, we put APIs to work to enable end-to-end process automation and orchestration of the infrastructure.

When thinking about process automation for an IT workflow, it helps to take the perspective of the O&A Working Group, in which O&A is the layer between other management functions and the infrastructure endpoints being automated. In the reference diagram from the O&A Taxonomy document, APIs run “northbound” to other management components and “southbound” to the infrastructure endpoints. Where the API calls are integrated depends on where the workflow starts. For example, it could start from the ITSM and call an O&A solution, or it could start in O&A and make calls to both the management layer and the infrastructure layer.

Now, let’s apply this perspective to use cases and walk through the steps for integrating API calls that were explained in Part Two of this series. The following use cases describe the API integrations performed to accomplish the process automation:

Use Case #1: Synchronizing the Inventory “Source of Truth” from a CMDB with an Orchestration and Automation Solution from Gluware

In this use case, the enterprise IT organization uses a CMDB for inventory management and regards it as the source of truth. The requirement was to synchronize the inventory from the CMDB (devices under management) with the O&A solution so that various lifecycle management activities could be automated. In this example, highlighted at ONUG Spring 2021, the Gluware GluAPI, a published RESTful API, was used, and specific calls were integrated with the CMDB to “push” the required device information into the Gluware Device Manager using a POST request. Using Gluware, network discovery and device discovery were automated to provide network operations with details about the platforms running in the network. Network operations were also able to run regular reports and identify any discrepancies between the CMDB and what was actually running in the network. This is an example of a tactical API integration to synchronize information between two systems used for network management.
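The push-and-compare pattern described above can be sketched roughly as follows. This is a minimal illustration and not the GluAPI itself: the record fields (`name`, `ip`) and both helper functions are hypothetical, and the actual POST call is omitted because the real endpoint and schema are not given here.

```python
import json


def build_device_payload(cmdb_record):
    """Map one CMDB record to a JSON body for a device-creation POST.

    The field names are assumptions for illustration; a real O&A API
    defines its own schema.
    """
    body = {"name": cmdb_record["name"], "ip": cmdb_record["ip"]}
    return json.dumps(body, sort_keys=True)


def find_discrepancies(cmdb_devices, discovered_devices):
    """Compare device names in the CMDB with those discovered on the network."""
    cmdb_names = {d["name"] for d in cmdb_devices}
    discovered_names = {d["name"] for d in discovered_devices}
    return {
        # In the CMDB but not seen by discovery
        "missing_from_network": sorted(cmdb_names - discovered_names),
        # Seen by discovery but absent from the CMDB
        "missing_from_cmdb": sorted(discovered_names - cmdb_names),
    }
```

Matching on device name alone is a simplification; a production report would more likely key on a serial number or management address.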

Use Case #2: Integrating an Automated Approval Process using an ITSM with an Autonomous Port Activity Scan from Gluware

In this use case, highlighted at ONUG Spring 2021, a financial organization wanted, for security purposes, to perform automated checks of its physical switch ports and automatically ‘admin down’ ports that were not actively being used. The challenge was that the process also had to engage the ITSM, in this case ServiceNow, to automate notification and approval of the change. Using Gluware Lab, a custom API integration and workflow was created by onboarding eleven specific API calls to ServiceNow for the approval process. This was published into their Gluware instance for operational usage. The Gluware workflow periodically scans the operational state of the switch ports, identifies unused ports, and opens a change request in ServiceNow for approval. Once an administrator approves the change, the workflow automatically sets the ports to ‘admin down’.
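The scan, request, and approve sequence can be sketched in a few lines. This is only an illustration under stated assumptions: the `PortStatus` type, the change-request keys, and the `"approved"` state string are hypothetical, and a port is treated as unused if it is operationally down at scan time, which is simpler than tracking inactivity over a period.

```python
from dataclasses import dataclass


@dataclass
class PortStatus:
    device: str
    port: str
    oper_up: bool   # operational state seen by the scan
    admin_up: bool  # configured administrative state


def find_unused_ports(ports):
    """Return ports that are administratively up but operationally down."""
    return [p for p in ports if p.admin_up and not p.oper_up]


def build_change_request(unused):
    """Build a change-request body listing the ports to shut down.

    The keys are illustrative; a real ITSM defines its own fields.
    """
    return {
        "short_description": f"Admin down {len(unused)} unused switch port(s)",
        "description": "\n".join(f"{p.device} {p.port}" for p in unused),
    }


def ports_to_shut(unused, approval_state):
    """Act only once the change request has been approved."""
    return unused if approval_state == "approved" else []
```

Keeping the approval check in its own function makes it explicit that no port is touched while the change request is still pending.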

Use Case #3: API Integration with a CI/CD Pipeline to Automate adding VLANs to EVPN-VXLAN Fabric with Gluware

In this use case, highlighted at ONUG Spring 2021, Acuity Insurance automated their Juniper-based EVPN-VXLAN fabric with Gluware and required integration with their CI/CD pipeline so that software developers could add VLANs to the infrastructure when required. This implementation required standard GluAPI REST calls along with some custom calls that let the customer feed in variables (VLAN ID and name) that are integrated with their EVPN-VXLAN data model. The customer integrated these calls with the GitLab DevOps CI/CD pipeline to enable automated provisioning of the data center infrastructure and speed application deployment.
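A pipeline job that accepts a VLAN ID and name would typically validate them before calling any API. The sketch below shows one way to do that. The function name, the output keys, and the naming rule (letters, digits, underscore and hyphen, up to 32 characters) are assumptions for illustration; only the 1 to 4094 ID range comes from the 802.1Q standard.

```python
import re

# Hypothetical naming rule; real platforms have their own limits.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,32}")


def build_vlan_variables(vlan_id, name):
    """Validate pipeline inputs and return the variables for the API call."""
    is_int = isinstance(vlan_id, int) and not isinstance(vlan_id, bool)
    if not is_int or not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID must be an integer from 1 to 4094, got {vlan_id!r}")
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid VLAN name: {name!r}")
    return {"vlan_id": vlan_id, "vlan_name": name}
```

Failing fast in the pipeline keeps a bad value from ever reaching the fabric data model.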

Use Case #4: Modular Integration of ITSM and Notification Systems into any Network Automation to Realize Rapid Time to Value with Itential

In this use case, presented during Itential’s ONUG Spring 2021 Proof of Concept Session, the Network Services Team at S&P Global had a goal to accelerate their ability to build network automations and integrate them with different internal IT systems. Beyond automating a specific network task, it is useful to orchestrate other external IT systems as part of an end-to-end process, which shortens time to value. In this use case, DNS updates are automated using API integration with Infoblox. Using a modular approach to integrating external IT systems, ServiceNow was integrated into the automation workflow to create and update a change request, and notification of the automation state was integrated using Microsoft Teams. With this approach, any number of different ITSM, messaging, or other systems can be integrated quickly, allowing network teams to update automations to accommodate a rapidly changing IT ecosystem.
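One way to picture the modular approach is to have the workflow depend on small interfaces instead of on a specific ITSM or chat product. The sketch below is not Itential's implementation; every class and function name is hypothetical, and in-memory stand-ins take the place of real ServiceNow, Teams, or Infoblox clients.

```python
from typing import Protocol


class Ticketing(Protocol):
    def open_change(self, summary: str) -> str: ...
    def update_change(self, ticket_id: str, note: str) -> None: ...


class Notifier(Protocol):
    def notify(self, message: str) -> None: ...


class InMemoryTicketing:
    """Stand-in for an ITSM adapter."""

    def __init__(self):
        self.tickets = {}

    def open_change(self, summary):
        ticket_id = f"CHG{len(self.tickets) + 1:04d}"
        self.tickets[ticket_id] = [summary]
        return ticket_id

    def update_change(self, ticket_id, note):
        self.tickets[ticket_id].append(note)


class InMemoryNotifier:
    """Stand-in for a chat or email adapter."""

    def __init__(self):
        self.messages = []

    def notify(self, message):
        self.messages.append(message)


def run_dns_update(record, apply_change, ticketing, notifiers):
    """Open a change, apply the DNS update, then report the outcome."""
    ticket_id = ticketing.open_change(f"DNS update for {record}")
    try:
        apply_change(record)
        outcome = "succeeded"
    except Exception as exc:  # report any failure instead of crashing
        outcome = f"failed: {exc}"
    ticketing.update_change(ticket_id, outcome)
    for notifier in notifiers:
        notifier.notify(f"{ticket_id}: DNS update for {record} {outcome}")
    return ticket_id
```

Swapping one messaging system for another then means writing a new adapter with a `notify` method, with no change to `run_dns_update`.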

Use Case #5: Orchestration of End-to-End IT Systems to Reduce the Time to Roll Out New SD-WAN Deployments

In this use case, presented during Itential’s ONUG Spring 2020 Proof of Concept Session, a large enterprise with multiple retail locations was looking for an automated solution to increase the number of SD-WAN sites that could be successfully deployed each week in order to meet specific deadlines. To reduce time spent working manually in different IT systems, an automation was built to integrate with these systems in a single end-to-end orchestrated workflow, which reduced the time required to deploy sites and allowed the customer to meet their deployment schedule. The workflow integrates with ServiceNow as the ITSM for change requests and updates, with Netbox as a source of truth for network IP addresses for each site, with FedEx tracking to determine the delivery status of shipped hardware, and with Slack and email for real-time notification and messaging to different IT teams. Because the integrations are made at the API level, any number of vendor solutions can be automated; for this use case, Meraki was used as the SD-WAN solution for remote deployments.

Use Case #6: Self-Service Network Automation that Integrates Network Service Verification using Itential and Forward Networks

This use case was presented during Itential’s ONUG Spring 2021 Open Session. Large enterprises that see the network as critical infrastructure are adopting sophisticated network verification solutions that can test network changes before they are applied to the live network, to ensure the changes will operate correctly and will not impact existing network services. End users can request a new network service through a self-service portal in Itential, which automates opening a ticket with ServiceNow and Jira and then provides multiple notifications through Slack, Microsoft Teams, and email. Itential passes the proposed network service details to Forward Networks, and the Forward Networks solution verifies that the new service meets requirements and operates correctly. Itential automates updating the tickets with verification details, and when the proper approvals are met, Itential initiates the Forward Networks solution to make these verified changes in the live network.
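The gating logic in this flow, where a change is deployed only after verification passes and approvals are in, can be sketched as follows. All names here are hypothetical and the verification result is simply passed in as a boolean; nothing below represents the Itential or Forward Networks APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    service: str
    verified: bool = False
    approvals: set = field(default_factory=set)
    notes: list = field(default_factory=list)


def record_verification(change, passed, details):
    """Store the verification result and add it to the ticket notes."""
    change.verified = passed
    status = "passed" if passed else "failed"
    change.notes.append(f"verification {status}: {details}")


def ready_to_deploy(change, required_approvers):
    """Deploy only if verification passed and every required approver signed off."""
    return change.verified and required_approvers <= change.approvals
```

Modeling the two conditions separately mirrors the description above: verification details are written to the ticket first, and approval is a distinct, later step.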

Use Case #7: Automating Manual Security Policies from Orchestral

In this example use case, a large financial services company had been afflicted by a common enterprise-grade ailment: the inability to update security policies on time. Customer access to secure and confidential data was hampered by the speed at which the SecOps team could manually provide that access. The slow access to data caused problems for customers making business decisions and raised questions about whether the service was worthwhile. Manual processing also led to backed-up request logs, where a new request would take upwards of 4 hours to satisfy, by which time the request would no longer be relevant, leaving a disgruntled customer. The manual process ran as follows:

  1. A user request to update a security policy leads to a ServiceNow ticket being created, with an existing policy selected for manual change.
  2. SecOps team members manually enter the selected changes, then manually check the updated policies and hope they have not been updated incorrectly.
  3. SecOps team members go through the commit and push phases of deployment for each device group specified, then perform a final check to ensure the new policies are updated. The ServiceNow record is then updated and closed.

Addressing this problem with an automated approach begins with mapping the existing process and customer interactions, including the ServiceNow ticketing system, and pinpointing precisely which elements will be automated. Using firewall integrations, the next step was to orchestrate on-demand access to any secure data and to remove that access after a specified time frame. Because the automation platform could interact with both ServiceNow and the firewall platform, nothing needed to change in the existing tools and policies. Instead, the CRUD operations for the security policy updates were automated via the automation workflow engine. An end-to-end SecOps solution was written with the ability to orchestrate all CRUD operations for new or existing security policies. In doing so, the automation platform provided the Day 0, Day 1, and Day N phases of operation for the company’s new secured network.
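The idea of granting access on demand and removing it after a set time frame maps naturally onto CRUD operations over rules that carry an expiry. The sketch below uses an in-memory store as a stand-in for the firewall platform; the class, its methods, and the `user:resource` rule ID format are all hypothetical.

```python
from datetime import datetime, timedelta


class AccessPolicyStore:
    """In-memory stand-in for firewall policy CRUD (illustrative only)."""

    def __init__(self):
        self._rules = {}  # rule_id -> expiry time

    def create(self, user, resource, granted_at, duration):
        rule_id = f"{user}:{resource}"
        self._rules[rule_id] = granted_at + duration
        return rule_id

    def read(self, rule_id):
        return self._rules.get(rule_id)

    def update(self, rule_id, new_expiry):
        if rule_id not in self._rules:
            raise KeyError(rule_id)
        self._rules[rule_id] = new_expiry

    def delete(self, rule_id):
        self._rules.pop(rule_id, None)

    def revoke_expired(self, now):
        """Delete every rule whose expiry has passed and return their IDs."""
        expired = sorted(r for r, exp in self._rules.items() if exp <= now)
        for rule_id in expired:
            self.delete(rule_id)
        return expired
```

A scheduler or workflow engine would call `revoke_expired` periodically, so that removal of access does not depend on anyone remembering to close it out.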

The repetitive, time-consuming and error-prone task of providing and removing security access was replaced by a highly secure and error-proof automated workflow that reduces security policy change process time from 3-4 hours to less than a minute while dramatically improving customer experience.

Use Case #8: Event Driven Auto-Remediation from Orchestral

In this example use case, a user complains about an application not working and then realizes that there is a connectivity issue. The user must open a service ticket, which alerts the company that a critical outbound interface has gone down. The service team moves the ticket to the network team with priority 1 for immediate action, regardless of the time of day. The network team, often woken in the middle of the night to deal with the issue, must then troubleshoot and run diagnostics to discover which interface is down. Once the interface is identified, it is brought back up, the connection is verified, and the service ticket is updated and closed.

  1. A user recognizes that a service has gone down and creates a ServiceNow ticket. Service Ops assigns the ticket to NetOps with priority 1.
  2. NetOps team members manually start diagnostics and discover that an interface is down. A remediation action is performed to bring up the interface.
  3. If the remediation succeeds, the NetOps team updates the ticket; otherwise it continues running diagnostics to discover why the interface went down, and the task stays at priority 1. The ServiceNow record is then updated and closed.

Introducing a completely automated solution for this problem begins with ensuring that all of the company’s existing IT tools and practices are maintained. Next, a multi-vendor data collector gathers statistical data from all the infrastructure endpoints and publishes it to the AI engine’s infrastructure telemetry data store. The data collector also collects all syslog information from network switches, which enables the AI engine to recognize immediately when a critical outbound interface goes down. Once it does, the AI engine triggers an auto-remediation workflow without delay. The automation workflow then executes the following steps:

  1. If the router is accessible, informs the operations teams about the outage through omni-communicational chatops and indicates the start of the auto-remediation workflow.
  2. Collects the show tech information on the router before and after the remediation action.
  3. Zips the two files as the artifact of the incident.
  4. Opens a priority 5 service ticket on the ticketing system, attaches the troubleshooting artifact for further analysis, and informs the Ops team through chatops of the new incident and its number.
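The steps above can be sketched as an event handler driven by a syslog line. This is a simplified illustration, not the vendor's workflow: the regular expression assumes a Cisco IOS-style `%LINK-3-UPDOWN` message, whereas a multi-vendor collector would need one pattern per platform, and the `collect`, `bring_up`, `notify`, and `open_ticket` callables are hypothetical hooks standing in for real device, chatops, and ticketing integrations.

```python
import io
import re
import zipfile

# Assumed message format (Cisco IOS style); other vendors differ.
LINK_DOWN = re.compile(r"%LINK-3-UPDOWN: Interface (\S+), changed state to down")


def interface_down(syslog_line):
    """Return the interface name if the line reports a link going down."""
    match = LINK_DOWN.search(syslog_line)
    return match.group(1) if match else None


def zip_artifacts(before, after):
    """Bundle the two diagnostic captures into one in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("show_tech_before.txt", before)
        zf.writestr("show_tech_after.txt", after)
    return buf.getvalue()


def remediate(syslog_line, critical_interfaces, collect, bring_up, notify, open_ticket):
    """Run the auto-remediation steps for a critical interface-down event."""
    iface = interface_down(syslog_line)
    if iface is None or iface not in critical_interfaces:
        return None  # not an event this workflow handles
    notify(f"Auto-remediation started for {iface}")
    before = collect()   # diagnostics before the fix
    bring_up(iface)      # remediation action
    after = collect()    # diagnostics after the fix
    artifact = zip_artifacts(before, after)
    ticket = open_ticket(priority=5, summary=f"{iface} went down", attachment=artifact)
    notify(f"Incident {ticket} created for {iface}")
    return ticket
```

Passing the integrations in as callables keeps the handler testable without a live router, chat system, or ticketing system. The accessibility check from step 1 is left to the `collect` and `bring_up` hooks in this sketch.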

A manual process involving 2-4 hours of stressful activity for network operations teams is eliminated and replaced by 20-40 seconds of automated workflow execution.

Use Case #9: Zero Touch Provisioning from Orchestral

In this example use case, leaders at a Fortune Top 25 company had decided that it was time to switch from client-server on-prem software to providing a cloud-based Software-as-a-Service (SaaS) offering. Not only was their IT team unprepared to operate a public cloud, but they also had to delegate the procurement, installation and deployment of over 300 physical servers. Although they split into 4 sub-groups to tackle this challenge, the entire provisioning task required nearly 6 weeks to accomplish due to a simple lack of organizational agility.

The company was losing revenue while the servers sat idle during the provisioning process. Customers questioned why they could not use the services immediately and were stuck waiting, while the company was stuck waiting to collect revenue. The manual process ran as follows:

  1. The DC operator starts server provisioning and creates a ServiceNow ticket.
  2. NetOps, ServerOps and SecOps teams pass the task back and forth at different priority levels as they work through the manual new server provisioning process.
  3. NetOps and SecOps update the ServiceNow ticket and hand it to the business side. The business side now adds the task of contacting the customer to its hundreds of other tasks.

Drawing on the company’s knowledge of the most efficient way to install a single server, a workflow was outlined, built, tested and deployed on an automation workflow platform. The same applications, tools and integrations were reused so that the company could trust the automated process. The number of penetration tests the company ran depended on a figure specified by the customer, so the automation platform was set up to request, as an input, the number of tests required for the specific customer profile. All of the previously manual tasks were now automated. The automation platform brought the new server provisioning process down from 6 weeks to just 3 minutes. Including the time to physically install the servers, the company was able to receive and provision production servers within 2 days.
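The customer-specific penetration test count is an example of a workflow input that changes how many steps run. A minimal sketch of that idea follows; the function name and the step names are hypothetical placeholders and do not reflect the company's actual provisioning tasks.

```python
def provisioning_steps(server, pen_test_runs):
    """Expand a provisioning workflow for one server.

    pen_test_runs is the per-customer input requested at launch time.
    Step names are illustrative only.
    """
    if pen_test_runs < 0:
        raise ValueError("pen_test_runs must not be negative")
    steps = [
        f"install OS on {server}",
        f"configure network for {server}",
        f"apply security baseline to {server}",
    ]
    # One step per requested penetration test run
    steps += [f"penetration test {i} on {server}" for i in range(1, pen_test_runs + 1)]
    steps.append(f"update ticket for {server}")
    return steps
```

Because the count is a parameter and not hard-coded, the same workflow definition can serve every customer profile.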

Revenue that would otherwise have been lost while servers sat in house for up to 6 weeks could now be collected without delay.

Although this is the final blog in the series, the O&A Working Group will continue to look at the impact, technology and use-cases for implementing complex automation. APIs have proven to be a key enabling technology for end-to-end process automation.

Sign up to participate in the O&A Working Group here: https://onug.net/working-groups/orchestration-and-automation/

Author's Bio

Michael Haugh, Rich Martin, Dale Smith

Gluware, Itential and Orchestral