July 9th, 2018
I recently joined the Google Cloud Office of the CTO (a.k.a., “OCTO”) as technical director, after five years as a chief architect at one of the world's largest insurance companies. People often ask me about the differences between these environments, to which I jokingly reply: “Ask me what’s the same; that’s a shorter answer.”
Prior to working in the insurance industry, I’d been a staff software engineer at Google for seven years. At the time I felt that all the great technology that Google and other tech giants were building could be very valuable for enterprises—even if it required them to rethink their existing assumptions and architectures. And kidding aside, having spent extensive time inside both large enterprise IT and a digital giant, I find that there are more connections and similarities between the two than one might think.
For example, when enterprises migrate their on-premises workloads to the cloud, they start by mapping their existing needs and operational models to those of the cloud providers. During this process, the OCTO team fields lots of questions about the cloud operating model and how it differs from customers’ own environments, at least at first sight. Some customers may even start to wonder whether the cloud is really a fit for their “traditional” enterprise, and not just for cloud-native startups. Part of our job is to work with IT leaders to connect the dots and show them why the cloud operating model is exactly what they need for their enterprises.
Today’s CIOs face an exciting but also quite challenging time. Digital disruption and rapid technology evolution are dramatically changing expectations for enterprise IT, forcing IT organizations to keep pace with new technology while meeting increasing demands from the business to deliver faster and at a lower cost. Speaking to numerous CIOs, we’ve found that their top agenda items tend to fall into three major buckets:
- Security: No CEO wants to be in the news for a data breach or cyber attack. Among all of a CIO’s challenges, these have the potential to not only harm the business, but also to end your career almost immediately, perhaps even with legal implications. Above all, enterprise IT systems must be kept secure.
- Uptime: Information technology is only good if it runs. Outages can also get you into the news, or at least annoy customers and cause you to miss out on revenue opportunities. No CIO likes to be called in to discuss an outage.
- Cost: While security and uptime are the main drivers, IT is still a significant cost factor in most enterprises, sometimes running into the billions of dollars. “Doing more with less” is a common theme with many CIOs as they look to embrace new capabilities while at the same time reducing operational expenses.
Cloud’s digital capabilities
Meanwhile, the key capabilities that web-scale companies and cloud providers like Google use to be successful appear different from a CIO’s IT priorities, at least on the surface:
- Speed: In the digital world it’s all about being fast. You’ve got to be able to launch new products quickly, either to be ahead of the competition or to run another round of experiments to make your product better.
- Automation: What makes digital businesses fast at scale is relentless automation. Google deploys billions of container instances every week—you can be assured that none of this is done manually.
- Feedback: Digital enterprises are charting new territory. Hence they need to make small steps, obtain feedback, and improve their product based on that feedback. Instead of running large projects with a meticulously defined target state, they start small, and improve from what they learn.
Connecting the dots
On the surface, these two sets of priorities look quite different: security, reliability, and cost vs. speed, automation, and feedback. However, cloud providers like Google successfully fend off cyber attacks on a daily basis, keep their core services almost perfectly reliable, and offer those services at a low price point, which suggests that there’s a connection here.
Security = Cloud speed + automation + feedback
Assuring cybersecurity is no longer a matter of a well-configured firewall and an intrusion prevention system. Both attack vectors and cyber defense have changed dramatically, making cybersecurity an integral part of IT operations as opposed to a bolt-on or afterthought. The most easily executed attacks often result from known vulnerabilities in outdated versions of operating systems or software frameworks. A single unpatched machine can leave the door wide open for cyber attackers. Similarly, a newly deployed piece of software could expose a security weakness that can be exploited. In these cases, being able to revert quickly to a prior known state is key to keeping your systems secure.
Automated deployments and upgrades are a critical part of keeping your environment secure because they ensure all system components are at a consistent patch level and software updates can be instantly reverted if need be. Being focused on speed ensures that these actions can be taken without any observable downtime to the user. For example, when the CPU vulnerabilities Meltdown and Spectre were identified, Google Cloud patched all its servers without any service disruption.
Lastly, cyber attacks and breaches occur constantly, with attack methods and defenses ending up in a kind of cat-and-mouse game that ups the ante almost every day. Therefore, your cyber defense can’t be merely a matter of planning; it must also be one of reacting and evolving quickly through feedback.
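To make the idea of a “consistent patch level” concrete, here is a minimal sketch of the kind of check an automated rollout could run. The fleet inventory, host names, and version numbers are all hypothetical, and the sketch assumes plain dotted numeric versions; it illustrates the principle and is not how any particular cloud provider does it.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.4.10' into (1, 4, 10) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def outdated_hosts(fleet: Dict[str, str], target: str) -> List[str]:
    """Return the hosts (sorted by name) whose installed version is below the target."""
    target_version = parse_version(target)
    return sorted(
        host
        for host, installed in fleet.items()
        if parse_version(installed) < target_version
    )


# Hypothetical inventory: host name -> installed patch level.
example_fleet = {"web-1": "1.4.2", "web-2": "1.4.10", "db-1": "1.3.9"}
```

Calling `outdated_hosts(example_fleet, "1.4.10")` flags `db-1` and `web-1` as needing the patch. Versions are compared numerically because a plain string comparison would rank "1.4.2" above "1.4.10".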
Uptime = Cloud automation + feedback
Hardware fails. Servers fail; firewalls fail; even failover systems fail—I once observed a significant on-premises outage due to the backup power supply not coming online when the first one failed. That’s why a single server or piece of hardware can rarely deliver the desired uptime. The classic response to failure has been to procure high-quality components and to build in redundancy—but both drive up the cost. Instead, constant feedback and automated deployment allow you to deploy additional instances of your software immediately in case of an actual failure. Such systems are resilient—they are designed to deal with failure and absorb it without noticeable end-user impact. This approach creates systems that have virtually no user-visible outages. For example, people visit the google.com homepage to see whether their internet connection is working because they’ve never seen that site fail.
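The combination of feedback (health checks) and automated deployment described above can be sketched as a small reconciliation loop. This is a toy in-memory simulation under stated assumptions: `Instance`, `launch_instance`, and `reconcile` are hypothetical names, and a real system would call a deployment tool instead of creating Python objects.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass
class Instance:
    name: str
    healthy: bool = True


_instance_ids = count(1)


def launch_instance() -> Instance:
    """Stand-in for an automated deployment step (hypothetical)."""
    return Instance(name=f"app-{next(_instance_ids)}")


def reconcile(instances: List[Instance], desired: int) -> List[Instance]:
    """Drop unhealthy instances and launch replacements until `desired` are running."""
    running = [inst for inst in instances if inst.healthy]
    while len(running) < desired:
        running.append(launch_instance())
    return running
```

Running `reconcile` on every health-check cycle means a failed instance is replaced without anyone being paged. The failure is absorbed, not prevented.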
Cost = Cloud automation + feedback
IT expenses are largely driven by software license costs, manual labor, and hardware. Traditional, inflexible IT environments tend to massively over-provision hardware that remains underutilized and delivers a low return on capital investment. A classic example is redundant hardware, also known as “warm standby.” The downside of that approach is that to increase system availability from somewhere around 98 percent to closer to 99.5 percent, you need to allocate twice the hardware to the application. Essentially, you double your hardware cost for an additional 1.5 percentage points of uptime, which is not a great return on investment. With automated deployment, a limited, shared pool of hardware can be used to rapidly redeploy the application in case of a failure. Better yet, let your cloud provider manage this spare pool, which brings your standby cost to zero.
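To put the availability figures above into hours, here is a small worked calculation. It uses only the two percentages from the text and assumes a 365-day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


single_server = annual_downtime_hours(98.0)   # about 175.2 hours per year
warm_standby = annual_downtime_hours(99.5)    # about 43.8 hours per year
hours_gained = single_server - warm_standby   # about 131.4 hours per year
```

In other words, doubling the hardware buys roughly 131 additional hours of uptime per year under these figures.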
Additionally, cloud computing offers a consumption-based model that allows you to pay only for the hardware you actually need. Automation allows you to rapidly scale your application infrastructure up and down depending on load, so you can optimize your cloud usage and further reduce cost. Feedback and transparency give an indication of any remaining underutilized hardware that can be decommissioned.
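The difference between provisioning for peak and paying for consumption can be illustrated with a toy calculation. The load profile, per-instance capacity, and hourly price below are made-up numbers chosen for easy arithmetic, not real cloud pricing.

```python
import math
from typing import List


def instances_needed(load: int, capacity_per_instance: int) -> int:
    """Instances required to serve `load`, keeping at least one running."""
    return max(1, math.ceil(load / capacity_per_instance))


def fixed_cost_cents(hourly_load: List[int], capacity: int, price_cents: int) -> int:
    """Cost when enough hardware for the peak hour runs all the time."""
    peak = max(instances_needed(load, capacity) for load in hourly_load)
    return peak * len(hourly_load) * price_cents


def autoscaled_cost_cents(hourly_load: List[int], capacity: int, price_cents: int) -> int:
    """Cost when the instance count follows the load hour by hour."""
    return sum(instances_needed(load, capacity) for load in hourly_load) * price_cents


# Hypothetical six-hour load profile (requests per second).
example_load = [100, 100, 400, 900, 400, 100]
fixed = fixed_cost_cents(example_load, capacity=100, price_cents=10)            # 540
autoscaled = autoscaled_cost_cents(example_load, capacity=100, price_cents=10)  # 200
```

With this profile, provisioning for the peak costs 540 cents, while following the load costs 200 cents. The gap is the underutilized hardware that feedback and transparency help you find.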
Cloud models meet enterprise goals
Once you dig a little bit deeper, you quickly notice that digital companies and traditional, large enterprises have the same goals for security, availability, and cost. However, digital companies using a cloud model have learned to achieve them using different mechanisms. The good news is that what they’ve learned and built is directly applicable to, and directly benefits, traditional enterprises.
By connecting the dots, enterprise IT leaders realize that as the external world changes, a cloud-oriented operating model is the natural way to achieve the key metrics expected from a CIO and their IT organization these days.