When clients talk to me about their cloud native goals, resiliency, scale, and performance are top of mind. Their main goal, however, is speed. Their customers are demanding, and those customers’ tastes and preferences constantly evolve. Every day, sophisticated customers expect new features, customizations, and privacy protections, which forces development teams to react with increasing velocity.
As one of the oldest technology companies, IBM has a company culture that is based on 100 years of experience transforming products and services. Today, IBM Cloud teams push new updates into production thousands of times a day. Our customers can draw confidence from that; if a 100-year-old company can do it, they can do it too.
One of the most common questions I get, besides what enables us to move so quickly, is how we organize our dev teams to do cloud native work. Typically, I turn that question into a suggestion: use a new project to begin re-evaluating the team’s organization, processes, and rewards.
There are usually obvious ways to get to the planned outcome faster; automating everything you can is one example. It’s important to give your teams the freedom to think about how they could do a task faster, better, and more efficiently. Changes that come from within the team have the best chance of being adopted.
The first thing I do is set up a design-thinking workshop that brings together all of the stakeholders: not just app development and operations practitioners, but also leaders from the line of business, marketing, and sales. The initial challenge is to agree on what the customer currently needs from the end-user experience. From there, the team maps out quantifiable hills for achieving a minimum viable product (MVP) to put in front of actual customers for feedback. This process for delivering and refining an MVP is what the IBM Garage method is set up to help IBM Cloud customers do.
With the MVP defined, the team moves as fast as possible toward completing it. That’s where Kubernetes comes into play: you can start to use Kubernetes as the catalyst for transformation. Besides giving dev and ops team members one common way to implement their work, Kubernetes addresses complex problems like scaling and availability as you go, so your dev team isn’t burdened with solving them all at once on the way into production. Because of all that, Kubernetes makes it practical to establish automation, which in turn creates a culture where automation is a goal wherever possible.
In our work on Kubernetes services in IBM Cloud, our team has become adept at creating links to information instead of relying on SRE interactions with customers to drive the problem-solving learning curve. It’s also a great way to make sure that your documentation is excellent and evolves with all of the fast updates to applications. You get to the point where, if you hear the same question twice, a GitHub issue is opened; then either your documentation gets updated because something wasn’t clear, or the product itself gets updated because something is confusing in the user experience. It’s a new way of using agile thinking on several levels at once. That’s where the automation culture really starts to kick in.
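The "same question twice" rule is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not the tooling the team actually uses: the `QuestionTracker` class, its normalization step, and the issue title format are all assumptions, and the sketch collects issue titles in a list rather than calling GitHub.

```python
from collections import Counter


class QuestionTracker:
    """Hypothetical helper: flags a question the second time it is heard."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold   # how many repeats trigger an issue
        self.counts = Counter()      # normalized question -> times heard
        self.issues = []             # stand-in for issues opened in GitHub

    def record(self, question: str) -> bool:
        # Normalize case, whitespace, and a trailing "?" so near-identical
        # phrasings count as the same question.
        key = " ".join(question.lower().split()).rstrip("?")
        self.counts[key] += 1
        # Open exactly one issue, at the moment the threshold is reached.
        if self.counts[key] == self.threshold:
            self.issues.append(f"Docs/UX follow-up: {key}")
            return True
        return False
```

Calling `record()` with the same question a second time returns `True` and adds one entry to `issues`; further repeats do not open duplicates. A real version would need a better notion of "same question" than string normalization.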
ChatOps is one strong example of automation. We started using ChatOps early on. Each chatbot has a specific persona and a specific job. For example, one is called Sir Topham Hat, named after a character in the Thomas the Tank Engine children’s books. The Sir Topham Hat bot helps us gate the approvals that move changes through our pipeline from dev all the way to production.
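An approval gate of this kind can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Sir Topham Hat bot itself: the stage names (including the intermediate `staging` stage), the `ApprovalGate` class, and the one-approver rule are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical stage names; the only known endpoints are dev and production.
STAGES = ["dev", "staging", "production"]


@dataclass
class ApprovalGate:
    """Tracks a build's current stage and the approvals for each promotion."""
    stage: str = "dev"
    approvals: dict = field(default_factory=dict)  # target stage -> approvers

    def approve(self, target: str, approver: str) -> None:
        # Record that someone signed off on promoting into `target`.
        self.approvals.setdefault(target, set()).add(approver)

    def promote(self) -> str:
        idx = STAGES.index(self.stage)
        if idx == len(STAGES) - 1:
            raise ValueError("already in production")
        target = STAGES[idx + 1]
        # The gate: refuse to move forward without at least one approval.
        if not self.approvals.get(target):
            raise PermissionError(f"promotion to {target} needs an approval")
        self.stage = target
        return self.stage
```

In a chat setting, `approve()` would be triggered by a message to the bot and `promote()` by the pipeline asking whether it may proceed; the point is that the pipeline cannot skip the gate.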
Other chatbots listen for conversations in Slack and recommend things like runbooks or steer you to an app that provides more information about a specific IP address. It’s been very interesting when the chatbots proactively add value to conversations or when you see the chatbots “talk” to each other, interacting to solve an issue without human intervention.
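As a rough sketch of how a listening bot might decide what to recommend, the Python below matches a chat message against known phrases and picks out IP addresses. The runbook table, file paths, and `suggest` function are hypothetical, and the sketch works on plain strings rather than the Slack API.

```python
import re

# Hypothetical phrase-to-runbook table; names and paths are illustrative only.
RUNBOOKS = {
    "disk full": "runbooks/disk-pressure.md",
    "crashloopbackoff": "runbooks/pod-crashloop.md",
    "certificate expired": "runbooks/rotate-certs.md",
}

# Loose IPv4 matcher: four dot-separated groups of 1-3 digits.
# It does not check that each group is <= 255.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def suggest(message: str) -> list[str]:
    """Return recommendations a bot could post in reply to a chat message."""
    text = message.lower()
    # Recommend a runbook for every known phrase mentioned in the message.
    tips = [f"Runbook: {path}" for phrase, path in RUNBOOKS.items() if phrase in text]
    # Offer a lookup for every IP address mentioned in the message.
    tips += [f"Lookup: details for {ip}" for ip in IP_PATTERN.findall(message)]
    return tips
```

A message that mentions neither a known phrase nor an IP address yields an empty list, so the bot stays quiet unless it has something useful to add.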
A key insight from working with the chatbots is that you need to provide workflow automation where people are actually working, and Slack is an active part of our teamwork every day. If you can bring more information at the right time into the tools that you’re already using, then you find yourself doing things like opening up issues in GitHub right from Slack. It makes coordinating the work so much more efficient.
As you know, Kubernetes manages containerized apps in pods on clusters of worker nodes. To globally manage clusters on IBM Cloud, our small team needed a way to efficiently drive changes across clusters in the tens of thousands. To do it, the team invented Razee, which is now an open source tool. Razee automates making changes to Kubernetes resources at great volume. So, while our core SRE team actually hasn’t grown much—maybe one or two people since the beginning—we’re now able to operate 24,000 clusters worldwide.
The path from doing MVPs to using ChatOps and Razee reflects a culture of automation that takes root around Kubernetes. It helps you to evaluate all the tools you’re using and the way you’re communicating so that you can see how to do things better from project to project.
I encourage you, as you push your team to move faster with cloud native tools and ways of working, to give them the freedom to really figure out what’s going to work for them. The more they do, the more likely they will be to find a way to automate the repetitious parts of their work.