2021.02.28 – Get in loser, we’re going cloudbusting!
Image by Bill Hunt. The “Cloudbuster” was a device invented by Wilhelm Reich to create clouds and rain by shooting “energy” into the sky through a series of metal rods. Although Reich was paid by many desperate farmers to produce rain, the device was never proven to work.
It’s been ten years since the Office of Management and Budget (OMB) released the original Federal Cloud Computing Strategy. I had the opportunity to update this strategy two years ago when I served as the Cloud Policy Lead at OMB. Having spent 20 years in the private sector building bleeding-edge cloud infrastructure for some of the best known companies in the world, I was able to leverage my practical experience in the creation of the 2019 Federal Cloud Computing Strategy, “Cloud Smart”.
During the course of my work at OMB, I spoke with hundreds of practitioners, policy experts, and Chief Information Officers (CIOs) across government. From this vantage point, I had an intimate view into the entire Federal technology portfolio and learned that many myths about cloud computing were being accepted as truth.
In this article, I’ll debunk key myths about cloud adoption, and explain why - and when - cloud is appropriate for government. The discussion is aimed primarily at civilian Federal agencies of the United States, but the recommendations below apply to any public sector organization - and even some private organizations as well. In part two, I’ll discuss some strategies for overcoming the pitfalls discussed here. Both guides are available to download as a single PDF.
Myth 1: Cloud Is Cheaper
The main reason cited by Federal agencies to move to commercial cloud is the promise of cost savings. This myth originated with vendors and was repeated by Congress, eventually becoming a common talking point for Executive Branch agencies. Unfortunately, it is based on false premises and poor cost analyses. In practice, the government almost never saves actual money moving to the cloud - though the capabilities they gain from that investment will usually result in a greater value.
At a glance, it can appear that moving applications to the cloud may be cheaper than leaving them in a data center. But in most cases, a Federal agency will not see much, if any, cost savings from moving to the cloud. More often than not, they end up spending many times more on cloud than for comparable workloads run in their data center. Experts have known this was a myth for at least a decade, but the lobbyists and salespeople were simply louder than those who had done the math.
First, it’s important to note that most Federal agencies own outright the facilities their data centers are located in. In the 1980s and 1990s, agencies began repurposing existing office space for use as data centers, adding in advanced cooling and electrical systems to support their growing compute needs. This changes the equation for the total cost of ownership because the facilities are already built and can be run relatively cheaply, though they may be partially or fully staffed by contractors due to the constant push to outsource all work. The government has also built a few best-in-breed data centers such as the Social Security Administration’s flagship data center that can compete with some of the most efficient commercial facilities in the world, with solar collectors for electricity generation, and advanced heat management systems for reduced energy usage. 
However, these super-efficient facilities, which cost half a billion dollars each to build, represent only a handful of the more than 1,500 data centers the government owns and operates.
Second, agencies routinely run their servers and equipment well past the end-of-life to save money. There are no Federal requirements to update hardware. In fact, until recently, Federal data center requirements for efficiency measured the utilization of servers by time spent processing, which disincentivized agencies from upgrading - older hardware runs slower and thus results in a higher utilization rate for a given task than a newer, more efficient server that completes the task quickly. During a budget shortfall, an agency with a data center has the option of skipping a hardware refresh cycle or cutting staff to make up the deficit; meanwhile, an agency that is all-in on cloud loses this option, as it will have to continue paying for licenses, operations and maintenance costs. As a result, agencies will need to future-proof their plans in more innovative ways, or better communicate funding priorities to OMB and Congress.
Also, it’s important to realize that once the government does buy hardware, the government owns it outright. When you move your application to a commercial cloud, you’re paying a premium for data storage even if it’s just sitting around and not being actively used - for large amounts of data, cloud costs will quickly skyrocket. The government maintains decades’ worth of massive data sets - NASA generates terabytes of data per day, and even a tiny agency like the Small Business Administration has to maintain billions of scanned loan documents going back to its inception sixty years ago. This is why some major companies have moved away from commercial cloud and built their own infrastructure instead.
I would note that the idea of workload portability - moving a service between different cloud vendors, generally to get a lower price - is also largely a myth. 
The cost to move between services is simply too great, and the time spent in building this flexibility will not realize any savings. Moreover, every cloud vendor’s offering is just slightly different from its peers, and if you’re only using the most basic offerings which are identical - virtual servers and storage - you’re missing out on the full value that cloud offers.
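To make the earlier point about time-based utilization concrete, here is a minimal arithmetic sketch in Python. The function name and the job durations are hypothetical, chosen only for illustration; they are not taken from any Federal metric definition. It shows why a metric based on time spent processing rewards slower hardware: the same job keeps the older server busy longer, so it scores as "better utilized."

```python
def time_based_utilization(busy_hours: float, window_hours: float = 24.0) -> float:
    """Return the fraction of the measurement window spent processing."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return busy_hours / window_hours


# Hypothetical numbers: the same nightly job run on two different servers.
old_server = time_based_utilization(busy_hours=12.0)  # older, slower hardware
new_server = time_based_utilization(busy_hours=3.0)   # newer, faster hardware

print(f"old: {old_server:.1%}, new: {new_server:.1%}")
# The older server reports the higher utilization for identical work.
```

Under these assumed numbers, an agency graded on this metric would look worse after upgrading, even though the newer server finishes the work in a quarter of the time.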
Myth 2: Cloud Requires Fewer Staff
Another promise of cloud cost savings is that an agency no longer has to keep data center engineers on staff. These practitioners are usually comparatively cheap to employ in government, and rarely reach a grade above GS-13 ($76K-$99K annual salary). Agencies moving to cloud will instead employ comparatively expensive DevSecOps practitioners, site reliability engineers, and cloud-software engineers to replace them when moving applications to IaaS or PaaS. These types of staff are extremely difficult to hire into government as they make very high salaries in the private sector, well in excess of the highest end of the General Schedule pay scale (GS-15: $104K-$138K), even assuming an agency has the budget and staff slots open to create a GS-15 position in the first place. Due to the many flaws in the government hiring process, it also can be very difficult to recruit these people into government, even with the new OPM hiring authorities to streamline this process.
An agency that chooses to outsource these skills will often find that contractors may cost even more than hiring capable staff. The agency will still need to have staff with cloud experience to actively manage these contractors, and contracts will need to be carefully crafted around concrete outcomes so that the agency is not fleeced by a vendor.
Another overlooked cost here is training. New solutions aren’t always easy for agencies to adopt - whether that’s a fancy software development tool or something as simple as a video chat platform. Personally, not a day goes by that I don’t find myself explaining to a customer some aspect of Teams or Sharepoint they don’t know how to use. Agencies often must provide formal training, and of course there’s inevitably a loss of productivity while teams get up to speed on the new tools and solutions. Since many SaaS vendors roll out new features extremely rapidly, this can present a challenge for slow-to-adapt agencies. 
Although some training is provided free from vendors, this rarely suffices for all of an agency’s needs, so in most cases further training will have to be purchased.
Myth 3: Cloud Is More Secure
A constant refrain is that cloud is safer and more secure, owing to the fact that the servers are patched automatically - meaning that key security updates are installed immediately, rather than waiting for a human to make the time to roll out all of these updates. For a large enterprise, this is historically a very time-consuming manual process, which automation has improved dramatically. However, the same tools that major corporations use for patching in the cloud are largely open source and free, and they can be used in an agency’s own data center.
Moreover, it’s important to note that cloud does not remove complexity, it only hides it in places that are harder to see. When it comes to security, this is especially true, as organizations must adapt to highly-specialized security settings that are not always easily found, particularly with the IaaS offerings. These settings are also constantly changing because of the constant patching by these vendors, and all too often with little notice in the case of SaaS offerings. This “double-edged sword” has resulted in a number of high-profile cloud-related breaches over the last few years - affecting the public and private sectors alike as we learn best security practices the hard way.
Cloud vendors have also been… less than enthusiastic about meeting government security and policy requirements, unless the government is willing to pay a very high premium for the privilege of security. (I talked about this contentious relationship more in my post on Automation Principles.) For instance, as of today no major cloud vendor completely meets the government requirements for IPv6, which have been around for 15 years and which OMB recently revised to try to get them to move faster.
Myth 4: Cloud Is More Reliable
This one is less of a myth and more of an overpromise, or fundamental misunderstanding of the underlying technology. For a long time, one of the main pitches of cloud has been self-healing infrastructure - when one server or drive fails, a new one is spun up to replace it. Although this is something that can be implemented in the cloud, it’s definitely not the default. Specifically, for IaaS solutions, you have to build that into your application - and you don’t get it for free.
Relatedly, many agencies assume that any application put into the cloud will automatically scale to meet any demand. If your agency’s website gets mentioned by the President, let’s say, you wouldn’t want it to collapse due to its newfound popularity. Without building infrastructure designed to handle this, simply being “in the cloud” will not solve this problem. However, solving it in the cloud will likely be faster than waiting for physical servers to be purchased, built, shipped, and installed - assuming you have staff on-hand who can handle the tasks.
It is important to keep in mind that cloud is, by definition, ephemeral. Servers and drives are often replaced with little-to-no notice. I’ve frequently had virtual machines simply become completely unresponsive, requiring them to be rebooted or rebuilt entirely. When you’re building in the cloud, you should assume that anything could break without warning, and you should have recovery procedures in place to handle the situation. Tools like Chaos Monkey can help you test your recovery procedures.
One issue that even some of the most seasoned practitioners often miss is that all cloud providers have hard limits on the resources they are able to sell you. After all, they are just running their own data centers, and there are a fixed number of servers that they have on-hand. I have often encountered these limits in practical, seemingly-simple use cases. 
For instance, I’ve created applications which needed high-memory virtual servers, where the provider didn’t have enough instances to sell us. During the pandemic response, I also discovered that cloud-based email inboxes have hardcoded, technical limits as to the volume of mail they can receive. I had assumed we could simply buy more capacity but this was not the case, requiring a “Rube Goldberg machine” workaround of routing rules to handle the massive increase associated with a national disaster. There is no question that scalability is a huge benefit, until the practical limits become a liability because of your assumptions.
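The advice above to assume that anything can break and to have recovery procedures ready can be illustrated with a small sketch. This is one common pattern (retrying a failed call with exponentially growing delays), not a complete recovery strategy. Every name here (`TransientError`, `call_with_retries`, `make_flaky`) is hypothetical and stands in for whatever your application and provider actually expose.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying TransientError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)  # wait 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")


def make_flaky(failures: int) -> Callable[[], str]:
    """Build a stand-in operation that fails a fixed number of times."""
    remaining = [failures]

    def operation() -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientError("instance unresponsive")
        return "ok"

    return operation


# Record the delays instead of really sleeping, so the example runs instantly.
delays: List[float] = []
result = call_with_retries(make_flaky(2), sleep=delays.append)
print(result, delays)
```

Retries only help with short-lived failures. A hard capacity limit like the ones described above will keep failing no matter how long you wait, which is why the final failure is re-raised rather than swallowed.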
Myth 5: Cloud Must Be All-or-Nothing
Many organizations assume that the goal is to move everything to a commercial cloud provider. Both the Government Accountability Office and Congress have stated that the government needs to “get out of the data center business.” However, this is simply not a realistic goal in the public sector - government couldn’t afford to make such a massive move given its very restricted budgets.
We also must clarify the concept of “legacy systems,” another frequent talking point. Most Federal agencies that have been around for more than 30 years still have mainframes, and they’re often still running older programming languages such as COBOL, Fortran, and Pascal. Many major industries in the private sector still use these same technologies - most notably, the banking industry is still heavily dependent on these legacy systems. Regardless of the hype about cloud and blockchain for moving money around, 95% of credit card transactions still use a COBOL system, probably running on a mainframe behind the scenes. These systems are not going away any time soon.
Now these mainframes usually are not dusty old metal boxes that have been taking up an entire basement room for decades. Often, they’re cutting edge hardware that’s incredibly efficient - and even have all the shiny plastic and glowing lights and advanced cooling systems you’d expect to see on a gamer’s desktop computer. Dollar for dollar, modern mainframe systems can be more cost-effective than cloud for comparable workloads over their lifecycle. It’s also worth noting that they are about a thousand times less likely to be attacked or exploited than cloud-based infrastructure.
The code running on these mainframes, on the other hand, is likely to be very old, and it’s almost certainly been written such that it cannot be virtualized or moved to the cloud without rewriting partially or entirely at great expense. 
Modern programming languages come with their own risks, so finding a sustainable middle path between the ancient and bleeding-edge is important for a successful modernization effort. Due to the considerations above, the future of government infrastructure will remain a hybrid, multi-cloud environment - much to the consternation of cloud vendors.
“… I just know that something good is gonna happen”
Setting these myths aside, the best reason to use cloud is the unrivaled capabilities that these tools can unlock:
- Agility: being able to quickly spin up a server to try something new is much easier in the cloud, if you have not already created an on-premises virtualized infrastructure. Cloud.gov, an offering from the General Services Administration (GSA) that bundles many Amazon Web Services (AWS) offerings in a government-friendly “procurement wrapper,” can make this even easier for agencies.
- Scalability: the main hallmark of cloud is using this agility to quickly respond to sudden increases in requests to websites and applications. Especially during the COVID-19 pandemic, agencies have taken advantage of this functionality to deal with the dramatic increase in traffic to benefit applications and other services. However, it is critical to note that most cloud services do not scale automatically (see Myth 4).
- Distributed: most Federal agencies have staff in field offices all over the country, and of course their customers are both at home and abroad. Since the cloud is really just a series of distributed data centers around the world, this can dramatically reduce the latency between the customer and the service. For instance, agencies are using cloud-based virtual private network (VPN) solutions to securely connect their staff to internal networks. Those that have moved to cloud-based email, video chat, and document collaboration tools see an additional speed boost from staying in the same cloud for all of these services.