Cloud Strategy Guide


In part one, I discussed many of the myths around cloud use in government. In this article, I will describe critical strategies to address these myths that every organization should embrace before, during, and after moving to the cloud. These strategies are generally intended for civilian Federal agencies of the United States, but the recommendations below apply to any public sector organization - and even some private organizations as well. Both guides are available to download as a single PDF.
  1. Chapter 1 - Migrate Pragmatically
  2. Chapter 2 - Plan to Your Budget & Staff
  3. Chapter 3 - Embrace New Security Models
  4. Chapter 4 - Understand What You’re Buying
  5. Chapter 5 - Build a Family Farm
  6. Epilogue - Getting More Help

Chapter 1 - Migrate Pragmatically

The first thing to accept is that not all projects are appropriate for the cloud, and not all organizations have the skills necessary to fully take advantage of the cloud. With that as a starting point, an organization needs to come up with a way to rationalize its application portfolio, to determine what should stay on-premises and what should be modernized. As a general rule, “lift-and-shift” - moving an application without rewriting it for the cloud environment - is almost never cost-effective for Infrastructure as a Service (IaaS) offerings unless the system is already very modern in the first place. On the other hand, basic websites with mostly static content are ideal for moving into Software as a Service (SaaS) or Platform as a Service (PaaS) offerings. The CIO Council’s Application Rationalization Playbook (disclaimer: another document I worked on) is a useful starting point for this evaluation.

Specifically, an agency should work up a thorough analysis of alternatives, comparing various SaaS, PaaS, and IaaS offerings against the existing on-prem setup or a hybrid environment. A major consideration here will be the Total Cost of Ownership (TCO), which should take into account not just service costs, but also staffing, support, and training costs. However, the lowest-priced option may not always be the best choice (as I’ll be covering below). The General Services Administration (GSA) offers a service that bundles several Amazon Web Services (AWS) offerings in a government-friendly “procurement wrapper,” which can make migration even easier for agencies. It’s an excellent platform for small agencies, or for large agencies that just want to prototype a new concept quickly.

When you do start moving applications, it’s important to start tagging your assets - accounts, virtual machines, workflows, etc. - as early as possible to make accounting easier. Always include the project name and the customer organization at a minimum.
Some providers also allow you to easily isolate a project or office’s services into a resource group, which can further simplify this process. Grouping and tagging are very important for easy chargeback or showback of funds, but remember to include in these models the TCO aspects not otherwise captured - e.g., staff time and contractor resources.

I strongly recommend agencies take a very cynical stance on so-called low-code/no-code platforms, customer-relationship management tools (CRMs), and workflow management solutions. Many of you may remember the promises of “Business Intelligence” solutions in decades past, where agencies were fleeced for billions of dollars in configuration costs - these solutions are simply using a new buzzword for the same idea. They all promise to reduce costs but are often vastly more expensive than just building a tool from scratch - and the agency becomes completely locked in to a single vendor until it replaces the application entirely. The brilliant Sean Boots of the Canadian Digital Service has presented a “1-day rule” to help identify these boondoggles.
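To make the tagging guidance above concrete, here is a minimal sketch in Python of auditing an asset inventory against a baseline tagging policy. The tag keys ("project", "customer-org") and the inventory shape are illustrative assumptions, not any agency's actual standard:

```python
# Sketch: enforce a baseline tagging policy across cloud assets.
# Tag keys below are illustrative assumptions, not a government standard.

REQUIRED_TAGS = {"project", "customer-org"}

def missing_tags(asset: dict) -> set:
    """Return the required tag keys absent from an asset's tag map."""
    return REQUIRED_TAGS - set(asset.get("tags", {}))

def audit(assets: list) -> dict:
    """Map asset name -> missing tag keys, for assets out of compliance."""
    report = {}
    for asset in assets:
        gaps = missing_tags(asset)
        if gaps:
            report[asset["name"]] = gaps
    return report

# Hypothetical inventory: one compliant asset, one missing a required tag.
inventory = [
    {"name": "vm-web-01", "tags": {"project": "benefits-portal",
                                   "customer-org": "office-of-x"}},
    {"name": "vm-db-01", "tags": {"project": "benefits-portal"}},
]
```

In practice the inventory would come from the provider's tagging or resource-group APIs; a check like this can run in a scheduled job to catch untagged assets before the monthly bill arrives.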
Rationalize the application portfolio
Don’t lift-and-shift
Properly tag cloud assets
Avoid low-code/no-code/crm snake oil

Chapter 2 - Plan to Your Budget & Staff

The easiest way to avoid risks and unexpected costs is to simplify as much as possible. Civilian agencies should not be investing in bleeding-edge technology solutions - they’re too risky and expensive to maintain. Instead, pick the simplest possible solution that can be supported by your staff. The average agency should be aiming to stay well behind the “hype curve,” in the “plateau of productivity.” Since most of the complexity is hidden from the customer, SaaS and commercial-off-the-shelf (COTS) tools are less risky than PaaS and IaaS options overall (provided you follow the 1-day rule above).

This goes beyond just cloud, and applies to almost anything you’re building. Most agencies, for instance, absolutely should not be attempting to build a fancy React/Redux/GraphQL single-page application when a plain WordPress or Drupal website with a few plugins will fulfill the customer’s needs. Building native mobile applications should be completely avoided by most organizations, as these can cost millions of dollars a year just for upkeep - instead, they should build mobile-friendly, responsive websites. Given the high complexity and cost of engineering talent, any custom application or tool may not be a sustainable solution. This also means that agencies should be simplifying their requirements to the minimum necessary when comparing alternatives, not just the software itself. Avoiding “one-off” projects and special requests will save massive amounts of time and money.

Instead, agencies must be actively investing in their staff. Agencies should allocate two to three times the standard training budget for IT and technology-adjacent staff, including project managers, program managers, and acquisition professionals. Some vendors provide a limited amount of complimentary training, but inevitably agencies need more than these free offerings.
This training should include non-IT topics as well: diversity awareness, accessibility, plain-language writing, project management, agile development techniques, and budgeting and procurement. GSA offers a variety of programs covering many of these areas. This must also include hands-on training - sitting through a webinar is no replacement for actual practical engineering experience. These staff need to be given the time and flexibility to practice these skills to develop them - building small test projects and trying out tools. The best teams are constantly changing and learning, so setting aside 10% or more of the staff’s time just for practice is not unreasonable - some private sector companies set aside 20%. All of these investments will pay off richly for agencies. Also, make sure your staff is cross-trained and able to fill gaps as they occur.

As your staff begins to understand the new cloud paradigms, it will be important to modify your existing processes to handle the agility the cloud brings. Instead of slow, end-to-end, waterfall process “monorails,” set operational parameters as “guardrails.” Your acquisition process should be modified so that cloud can be purchased like a utility. You should not need to hold a Change Control Board meeting every time someone wants to create, resize, or destroy a virtual server. Plan a cost range that the entire project will fit within and review it as needs change, along with monthly or quarterly portfolio reviews to stay on top of the budget. Instead of codified “gold disk” server images maintained by your team, consider template security rules.
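The “guardrails, not monorails” idea above - a planned cost range reviewed periodically instead of per-change approvals - can be sketched as a simple spend check. The thresholds and messages here are invented for illustration, not a prescribed policy:

```python
# Sketch of a monthly spend guardrail: no Change Control Board meeting per
# virtual server, just an alert when spend drifts outside the planned range.
# Floor/ceiling values would come from the project's planned cost range.

def check_guardrail(monthly_spend: float, floor: float, ceiling: float) -> str:
    """Classify a month's cloud spend against the planned cost range."""
    if monthly_spend > ceiling:
        return "over-budget: escalate at portfolio review"
    if monthly_spend < floor:
        return "under-budget: check for stalled migrations"
    return "within guardrails"
```

A check like this, fed by the provider's billing export and run monthly or quarterly, replaces gatekeeping individual resource changes with monitoring the overall envelope.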
Simplify the requirements and architecture
No mobile apps, avoid single-page webapps
Train and cross-train your staff
Allocate time for personal development
Update processes to set guardrails instead of monorails

Chapter 3 - Embrace New Security Models

Agencies must be able to manage the security of everything they run. Going back to the previous strategy, an agency should not deploy anything it cannot manage, and that goes for security as well. This is equally true in on-premises environments, but new operating models require new security models. Both your operations and security teams will need to be familiar with just about every setting that can be changed in your cloud environment - and how to lock them down to prevent exploitation.

Organizations should no longer assume that a solution is secure just because they did an up-front initial review. The Federal government uses a security review process for services and applications known as the Authorization (or Authority) To Operate (ATO), but the implementation varies from agency to agency. Traditionally this is a series of standard security controls that are reviewed, checklist-style, by an agency once every three years. However, agencies that have excelled at cloud security have moved to Continuous Authorization, using monitoring tools to actively verify that the security controls are being met and maintained, twenty-four hours a day, seven days a week. These monitoring checks still must evolve with the products being monitored, to make sure new vulnerabilities have not appeared outside the scope of existing checks. As usual with cybersecurity, vigilance is key. Since attackers are constantly evolving their methods, tools that automate security responses should also be used whenever practical - especially the built-in, native offerings from the large vendors, which are constantly evolving to meet these threats.

Relatedly, the Federal government has been moving away from so-called “castle-and-moat” perimeter-based security methods, which only monitor network traffic.
Instead, an approach known as Zero Trust has emerged, taking a data-first methodology: protecting systems instead of just the perimeter, verifying user identities in real time, and allowing staff access to only the minimum amount of information necessary to fulfill the task at hand. In this way, when the perimeter is inevitably breached, the data assets contained within are still secure.

It also should go without saying that teams should be using multi-factor authentication (MFA) on all privileged accounts. Whether for developers or administrators, using more than just a username and password will dramatically reduce the risk of exploitation. The Federal government has “PIV cards” that are generally used on most devices, but if the vendor does not support them, implementing a token system via any of the commercially available platforms is fine: Google Authenticator, 1Password, Microsoft Authenticator, and YubiKey are all worth looking at. However, organizations should completely avoid text-message codes sent to phones, as these are easily intercepted. For public customers that will need to log in or prove their identities, all U.S. government agencies should be using the government’s shared sign-in and identity-proofing service rather than building their own.
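The commercial authenticator apps mentioned above generally implement the open HOTP/TOTP one-time-password standards (RFC 4226 and RFC 6238) rather than anything proprietary, which is why agencies are not locked in to any single token vendor. A minimal HOTP sketch in Python, checkable against the RFC 4226 test vectors:

```python
# RFC 4226 HMAC-based one-time password (the counter-based core that
# TOTP authenticator apps build on by deriving the counter from time).
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Compute an HOTP code from a shared secret and a moving counter."""
    msg = struct.pack(">Q", counter)          # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

# RFC 4226 Appendix D secret; counter 0 yields "755224".
print(hotp(b"12345678901234567890", 0))
```

This is a sketch for understanding the mechanism, not a production implementation; real deployments also need secret provisioning, counter/time-window handling, and throttling of guesses.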
Research all product configuration settings
Implement continuous monitoring, not just compliance
Use security automation tools
Leverage zero-trust practices to protect your data
Use MFA & shared identity services

Chapter 4 - Understand What You’re Buying

Cloud isn’t going to make your teeth whiter or your breath fresher or fix all of your problems, regardless of what the salespeople tell you. You need to know exactly what you’re buying. Before making an investment, make sure you fully understand what capabilities you’re purchasing and what parts you - and the vendor - will be responsible for. If your evaluation team does not have technical expertise, bring engineers into the conversation early, to sort the truth from the sales pitch. As discussed in the previous article, you may not be getting autoscaling or load balancing or other features you’ve assumed just happen “automatically” - and if available, these features definitely will not be free. You may have to build more “glue” between services than you assume, and someone will have to maintain this connective tissue.

Also keep in mind that the government cloud regions (or “govcloud,” in some vendors’ naming) provide different versions of these tools than the commercial ones. As a result, not all features or solutions will be available - so again, plan ahead. In most cases, though, civilian agencies not dealing with highly sensitive data should consider using the commercial versions whenever possible - the security differences are not so great as to be insurmountable, but the functionality limitations are huge.

Before implementing a service, do careful research on the service limits - maximum traffic, number of virtual machines, number of emails that can be sent, etc. Do not just trust what you are told by a vendor’s engineers or customer representatives - most of the time, they also do not know about these limits until you run aground on them. You should estimate your expected usage - number of site visits and/or users and/or emails, etc. - and actually spend the time to search through user forums to make sure no one has hit a limit related to what you’re doing.
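The back-of-the-envelope check described above - comparing expected usage against documented service limits - can be as simple as the sketch below. The limit values are made-up placeholders, not any vendor's actual quotas; always confirm against the provider's current documentation:

```python
# Headroom check: expected usage as a fraction of each documented limit.
# Limit values are placeholders, NOT any vendor's real quotas.

ASSUMED_LIMITS = {
    "emails_per_day": 50_000,
    "requests_per_second": 1_000,
}

def headroom(expected: dict, limits: dict = ASSUMED_LIMITS) -> dict:
    """Return usage/limit ratios; any value >= 1.0 means you will hit a cap."""
    return {k: expected[k] / limits[k] for k in expected if k in limits}
```

Running this against peak estimates (not averages) before procurement is the point: a ratio near or above 1.0 is exactly the situation where a vendor's representatives may not warn you until you run aground.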
Customer Experience (CX) is another area where the private sector has been building people-friendly interfaces into their SaaS solutions, and agencies can skip a lot of the hard work and directly benefit from the results. Metrics and feedback loops are often built in as well. Maximizing these built-in elements can radically improve an agency’s public satisfaction scores at little or no additional cost.
Validate assumptions; know your responsibilities
Consider commercial cloud instead of govcloud
Research service limits in advance
Leverage built-in CX tools

Chapter 5 - Build a Family Farm

Given that agency IT budgets continue to be cut, and staffing has not increased in 40 years, agencies are largely unprepared to completely rewrite and replace all of their legacy systems. Moreover, “IT Modernization” as a concept is an unending pursuit; as in Zeno’s paradox of Achilles chasing the tortoise, software written today is legacy tomorrow. Agencies will need to use all available funding sources to overcome their deep technical debt, prioritizing the systems that present the greatest risk: those that are unmaintained, frequently used by customers, and lacking in resilience and redundancy. Under this scrutiny, agencies may find that their public websites are a bigger risk than older backend systems.

Also, rather than replacing entire large monolithic systems, they should pull off pieces and replace them independently as resources are available. This can be done by isolating functions and building microservices, but that approach can often lead to expensive, unnecessary complexity. Agencies should not be afraid to build a newer parallel monolith adjacent to the existing one - again, keep in mind that it’s not the size that’s the concern, but the complexity and sustainability.

That all being said, the government does have major shortcomings in redundancy today, and too many systems have a single point of failure. At a minimum, agencies should be using cloud for data backup of critical systems whenever possible. I also strongly recommend agencies consider creating load-balancing and caching layers in the cloud in front of on-premises public-facing systems to deal with unexpected loads.

One final concern is automation. Many organizations begin their cloud journey with unrealistic goals for maturity.
The practice of Infrastructure as Code is incredibly popular at the moment, where we talk about treating virtual servers as “cattle, not pets.” An unprepared agency may immediately think that it needs to be using all of the most cutting-edge tools and technologies from the start, but this would be a critical mistake. Instead, following the principles relating to complexity in the sections above, agencies should aim to create a “family farm” - only automating that which they can realistically manage. For instance, there is absolutely nothing wrong with using only a few virtual machines and load balancers instead of a fully automated, configuration-driven architecture. The great thing about cloud is that you can evolve as your team grows, but it’s incredibly difficult to reduce complexity you’ve invested in if your team shrinks.
Assess technical debt by risk
Replace monoliths a piece at a time
Don’t over-automate
Use cloud backups and load balancing as soon as possible
Build a small “family farm” to start

Epilogue - Getting More Help

These strategies are a starting point towards a successful cloud rollout. If you run into trouble, want to talk shop with your peers, or would like to share your own strategies and experiences, there are several communities to engage with:
  • The Federal CIO Council Cloud and Infrastructure Community of Practice is the main Federal group for discussing these topics. However, they are currently in the process of changing their charter to allow any U.S. government staff to participate: Federal, state, and local. Membership is free.
  • The ATARC Cloud and Infrastructure Working Group is free and open to any government staff, though private sector companies must pay to be members.
  • Cloud & Coffee (presented by ATARC & MorphWorks) is a biweekly podcast hosted by Chris Oglesby and me. Each episode, we chat with a guest about their personal experience with technology modernization, and there’s a live Q&A open during the chat. Any ATARC member can participate; old episodes are publicly available on Spotify.

Cloudbusting

2021.02.28. Image: retro 80s video game (pixel art) style portrait of Kate Bush in front of a car, by Bill Hunt. “Get in loser, we’re going cloudbusting!”

The “Cloudbuster” was a device invented by Wilhelm Reich to create clouds and rain by shooting “energy” into the sky through a series of metal rods. Although Reich was paid by many desperate farmers to produce rain, the device was never proven to work.

It’s been ten years since the Office of Management and Budget (OMB) released the original Federal Cloud Computing Strategy. I had the opportunity to update this strategy two years ago when I served as the Cloud Policy Lead at OMB. Having spent 20 years in the private sector building bleeding-edge cloud infrastructure for some of the best-known companies in the world, I was able to leverage my practical experience in the creation of the 2019 Federal Cloud Computing Strategy, “Cloud Smart.” During the course of my work at OMB, I spoke with hundreds of practitioners, policy experts, and Chief Information Officers (CIOs) across government. From this vantage point, I had an intimate view into the entire Federal technology portfolio and learned that many myths about cloud computing were being accepted as truth.

In this article, I’ll debunk key myths about cloud adoption, and explain why - and when - cloud is appropriate for government. These myths generally concern civilian Federal agencies of the United States, but the points below apply to any public sector organization - and even some private organizations as well. In part two, I’ll discuss some strategies for overcoming the pitfalls discussed here. Both guides are available to download as a single PDF.

Myth 1: Cloud Is Cheaper

The main reason cited by Federal agencies to move to commercial cloud is the promise of cost savings. This myth originated with vendors and was repeated by Congress, eventually becoming a common talking point for Executive Branch agencies. Unfortunately, it is based on false premises and poor cost analyses. In practice, the government almost never saves actual money moving to the cloud - though the capabilities gained from that investment will usually result in greater value. At a glance, it can appear that moving applications to the cloud may be cheaper than leaving them in a data center. But in most cases, a Federal agency will not see much, if any, cost savings from moving to the cloud. More often than not, agencies end up spending many times more on cloud than for comparable workloads run in their data centers. Experts have known this was a myth for at least a decade, but the lobbyists and salespeople were simply louder than those who had done the math.

First, it’s important to note that most Federal agencies own outright the facilities their data centers are located in. In the 1980s and 1990s, agencies began repurposing existing office space for use as data centers, adding in advanced cooling and electrical systems to support their growing compute needs. This changes the equation for the total cost of ownership because the facilities are already built and can be run relatively cheaply, though they may be partially or fully staffed by contractors due to the constant push to outsource all work. The government has also built a few best-in-breed data centers, such as the Social Security Administration’s flagship facility, that can compete with some of the most efficient commercial facilities in the world, with solar collectors for electricity generation and advanced heat management systems for reduced energy usage.
However, these super-efficient facilities represent only a handful of the over 1,500 data centers the government owns and operates, and cost half a billion dollars each to build.

Second, agencies routinely run their servers and equipment well past end-of-life to save money. There are no Federal requirements to update hardware. In fact, until recently, Federal data center requirements for efficiency measured the utilization of servers by time spent processing, which disincentivized agencies from upgrading - older hardware runs slower and thus results in a higher utilization rate for a given task than a newer, more efficient server that completes the task quickly. During a budget shortfall, an agency with a data center has the option of skipping a hardware refresh cycle or cutting staff to make up the deficit; meanwhile, an agency that is all-in on cloud loses this option, as it will have to continue paying for licenses and operations and maintenance costs. As a result, agencies will need to future-proof their plans in more innovative ways, or better communicate funding priorities to OMB and Congress.

Also, it’s important to realize that once the government does buy hardware, it owns that hardware outright. When you move your application to a commercial cloud, you’re paying a premium for data storage even if it’s just sitting around and not being actively used - for large amounts of data, cloud costs will quickly skyrocket. The government maintains decades’ worth of massive data sets - NASA generates terabytes of data per day, and even a tiny agency like the Small Business Administration has to maintain billions of scanned loan documents going back to its inception sixty years ago. This is why some major companies have moved away from commercial cloud and built their own infrastructure instead.

I would note that the idea of workload portability - moving a service between different cloud vendors, generally to get a cheaper cost - is also largely a myth.
The cost to move between services is simply too great, and the time spent in building this flexibility will not realize any savings. Moreover, every cloud vendor’s offering is just slightly different from its peers, and if you’re only using the most basic offerings which are identical - virtual servers and storage - you’re missing out on the full value that cloud offers.

Myth 2: Cloud Requires Fewer Staff

Another promise of cloud cost savings is that an agency no longer has to keep data center engineers on staff. These practitioners are usually comparatively cheap to employ in government, rarely reaching a grade above GS-13 ($76K-$99K annual salary). Agencies moving applications to IaaS or PaaS will instead employ comparatively expensive DevSecOps practitioners, site reliability engineers, and cloud-software engineers to replace them. These types of staff are extremely difficult to hire into government, as they make very high salaries in the private sector, well in excess of the highest end of the General Schedule pay scale (GS-15: $104K-$138K) - even assuming an agency has the budget and staff slots open to create a GS-15 position in the first place. Due to the many flaws in the government hiring process, it also can be very difficult to recruit these people into government, even with the new OPM hiring authorities intended to streamline this process. An agency that chooses to outsource these skills will often find that contractors cost even more than hiring capable staff. The agency will still need staff with cloud experience to actively manage these contractors, and contracts will need to be carefully crafted around concrete outcomes so that the agency is not fleeced by a vendor.

Another overlooked cost here is training. New solutions aren’t always easy for agencies to adopt - whether that’s a fancy software development tool or something as simple as a video chat platform. Personally, a day doesn’t go by that I don’t find myself explaining to a customer some aspect of Teams or SharePoint they don’t know how to use. Agencies often must provide formal training, and of course there’s inevitably a loss of productivity while teams get up to speed on the new tools and solutions. Since many SaaS vendors roll out new features extremely rapidly, this can present a challenge for slow-to-adapt agencies.
Although some training is provided free from vendors, this rarely suffices for all of an agency’s needs, so in most cases further training will have to be purchased.

Myth 3: Cloud Is More Secure

A constant refrain is that cloud is safer and more secure, owing to the fact that the servers are patched automatically - meaning that key security updates are installed immediately, rather than waiting for a human to make the time to roll out all of these updates. For a large enterprise, this is historically a very time-consuming manual process, which automation has improved dramatically. However, the same tools that major corporations use for patching in the cloud are largely open source and free, and they can be used in an agency’s own data center.

Moreover, it’s important to note that cloud does not remove complexity; it only hides it in places that are harder to see. When it comes to security, this is especially true, as organizations must adapt to highly specialized security settings that are not always easily found, particularly in IaaS offerings. These settings are also constantly changing because of the vendors’ constant patching - all too often with little notice, in the case of SaaS offerings. This “double-edged sword” has resulted in a number of high-profile cloud-related breaches over the last few years, affecting the public and private sectors alike as we learn best security practices the hard way.

Cloud vendors have also been… less than enthusiastic about meeting government security and policy requirements, unless the government is willing to pay a very high premium for the privilege of security. (I talked about this contentious relationship more in my post on Automation Principles.) For instance, as of today no major cloud vendor completely meets the government requirements for IPv6, which have been around for 15 years and which OMB recently revised to try to get vendors to move faster.

Myth 4: Cloud Is More Reliable

This one is less of a myth and more of an overpromise, or a fundamental misunderstanding of the underlying technology. For a long time, one of the main pitches of cloud has been self-healing infrastructure - when one server or drive fails, a new one is spun up to replace it. Although this is something that can be implemented in the cloud, it’s definitely not the default. Specifically, for IaaS solutions, you have to build that into your application - and you don’t get it for free. Relatedly, many agencies assume that any application put into the cloud will automatically scale to meet any demand. If your agency’s website gets mentioned by the President, let’s say, you wouldn’t want it to collapse due to its newfound popularity. Without infrastructure designed to handle this, simply being “in the cloud” will not solve the problem. However, solving it in the cloud will likely be faster than waiting for physical servers to be purchased, built, shipped, and installed - assuming you have staff on hand who can handle the tasks.

It is important to keep in mind that cloud is, by definition, ephemeral. Servers and drives are often replaced with little to no notice. I’ve frequently had virtual machines simply become completely unresponsive, requiring them to be rebooted or rebuilt entirely. When you’re building in the cloud, you should assume that anything could break without warning, and you should have recovery procedures in place to handle the situation. Tools like Chaos Monkey can help you test your recovery procedures.

One issue that even the most seasoned practitioners often miss is that all cloud providers have hard limits on the resources they are able to sell you. After all, they are just running their own data centers, and there are a fixed number of servers that they have on hand. I have often encountered these limits in practical, seemingly simple use cases.
For instance, I’ve created applications which needed high-memory virtual servers, where the provider didn’t have enough instances to sell us. During the pandemic response, I also discovered that cloud-based email inboxes have hardcoded, technical limits as to the volume of mail they can receive. I had assumed we could simply buy more capacity but this was not the case, requiring a “Rube Goldberg machine” workaround of routing rules to handle the massive increase associated with a national disaster. There is no question that scalability is a huge benefit, until the practical limits become a liability because of your assumptions.
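The “assume anything could break” posture described above usually starts with something as simple as retrying transient failures with exponential backoff before escalating to full recovery procedures. A minimal sketch; the attempt count and delays are arbitrary choices for illustration:

```python
# Sketch: retry a flaky operation with exponential backoff.
# Real recovery procedures also need alerting, idempotent operations,
# and runbooks tested with tools like Chaos Monkey.
import time

def call_with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(); on failure, wait base_delay * 2**attempt and retry.

    Re-raises the last exception once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Only transient faults (an unresponsive VM, a throttled API) should be retried this way; hard limits like the mail-volume caps described above will fail every attempt, which is exactly why the limits need to be researched in advance.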

Myth 5: Cloud Must Be All-or-Nothing

Many organizations assume that the goal is to move everything to a commercial cloud provider. Both the Government Accountability Office and Congress have stated that the government needs to “get out of the data center business.” However, this is simply not a realistic goal in the public sector - agencies couldn’t afford to make such a massive move given their very restricted budgets.

We also must clarify the concept of “legacy systems,” another frequent talking point. Most Federal agencies that have been around for more than 30 years still have mainframes, and they’re often still running older programming languages such as COBOL, Fortran, and Pascal. Many major industries in the private sector still use these same technologies - most notably, the banking industry is still heavily dependent on these legacy systems. Regardless of the hype about cloud and blockchain for moving money around, 95% of credit card transactions still use a COBOL system, probably running on a mainframe behind the scenes. These systems are not going away any time soon.

Now, these mainframes are usually not dusty old metal boxes that have been taking up an entire basement room for decades. Often, they’re cutting-edge hardware that’s incredibly efficient - with all the shiny plastic, glowing lights, and advanced cooling systems you’d expect to see on a gamer’s desktop computer. Dollar for dollar, modern mainframe systems can be more cost-effective than cloud for comparable workloads over their lifecycle. It’s also worth noting that they are about a thousand times less likely to be attacked or exploited than cloud-based infrastructure. The code running on these mainframes, on the other hand, is likely to be very old, and it’s almost certainly been written such that it cannot be virtualized or moved to the cloud without being rewritten partially or entirely at great expense.
Modern programming languages come with their own risks, so finding a sustainable middle path between the ancient and bleeding-edge is important for a successful modernization effort. Due to the considerations above, the future of government infrastructure will remain a hybrid, multi-cloud environment - much to the consternation of cloud vendors.

“… I just know that something good is gonna happen”

Instead of these myths, the best reason to use cloud is for the unrivaled capabilities that these tools can unlock:
  • Agility: being able to quickly spin up a server to try something new is much easier in the cloud, especially if you have not already built out an on-premises virtualized infrastructure. A GSA offering that bundles many Amazon Web Services (AWS) offerings in a government-friendly “procurement wrapper” can make this even easier for agencies.
  • Scalability: the main hallmark of cloud is using this agility to quickly respond to sudden increases in requests to websites and applications. Especially during the COVID-19 pandemic, agencies have taken advantage of this functionality to deal with the dramatic increase in traffic to benefits applications and other services. However, it is critical to note that most cloud services do not scale automatically (another myth covered below).
  • Distributed: most Federal agencies have staff in field offices all over the country, and of course their customers are both at home and abroad. Since the cloud is really just a series of distributed data centers around the world, it can dramatically reduce the latency between the customer and the service. For instance, agencies are using cloud-based virtual private network (VPN) solutions to securely connect their staff to internal networks. Those that have moved to cloud-based email, video chat, and document collaboration tools see an additional speed boost from keeping all of these services in the same cloud.
Of course, we all know that “cloud is just someone else’s data center,” but the government should not be held back by fear, uncertainty, and doubt from someone else holding their data. Cloud technologies have a huge potential to improve Federal technology, when approached with a full knowledge of the complexity and costs. Cloud is not a replacement for good management, however. You can’t buy your way out of risk. Until the government invests in its workforce to make sure that IT can be planned, acquired, implemented, and maintained effectively, we will not see any improvement in the services provided to the American people. Now, Congress just needs to be convinced to fully fund some of these improvements. Next week I’ll share part two, where I will discuss several key strategies for a successful cloud implementation in a government agency.

Login for Everyone!

2021.02.18 – A little over two years ago, I was walking out of the New Executive Office Building by the White House. I immediately ran into Robin Carnahan, who said to me, “Bill, we should be able to provide Login to cities and states.” (If you haven’t met Robin, let me just make it clear for the narrative here that she’s super-smart and anything she says you should just agree with immediately, because she knows what she’s talking about.) As soon as I got back to my desk at the Office of Management and Budget (OMB), I started sending out emails to figure out why the General Services Administration (GSA) was preventing this excellent service from being used by smaller governments. For those of you who don’t know about this hidden gem, Login is a GSA solution to help solve the difficult problem of verifying that a person is who they say they are to receive a government benefit, as well as a solution for logging into government websites. It was created through the combined efforts of USDS and 18F - the two most prominent digital service teams in all of government - and is in use by many Federal agencies, providing access to government services for over 27 million people! Today, GSA has announced that Login is available for use by local and state governments! (To be clear, I had effectively nothing to do with the actual permission being granted here - sending stern emails had little effect. The victory today belongs entirely to the wonderful, amazing, fantastic team at Login and the bureaucrats who were willing to push to make it happen.) There are, however, still a few restrictions for city and state use. To be eligible, the government agencies must be using Login for a “federally funded program.” This is an arbitrary addition by GSA that, in my opinion, misinterprets the original intent of the legal authority - but I’m not a lawyer and am no longer responsible for these sorts of policy decisions. 
I am hopeful that this restriction will be removed in the future and this incredible service will be open to all who want it! Moreover, as I’ve written in the past, it is my hope that OMB will mandate the use of Login for all Federal agencies. This is already mandated by law, but OMB is not enforcing the requirement. The most expensive part of the tool is the identity verification step - however, once an identity has been proven, it does not need to be re-proven if the customer wants to use any other service that is using Login. This means that as more organizations sign up for Login, the cost to each decreases. As long as Federal agencies are allowed to maintain their own independent login systems, the costs remain high. Moreover, this presents customers with an inferior experience, as they must sign up for a new account for each website or application. It’s also important to note that most identity verification behind the scenes uses data sources that the government controls and gives to private companies, who then sell the government back its own data in the verification process at a very high premium. Eventually, it would be smarter to allow agencies to exchange the necessary information themselves, cutting out the middleperson, which would decrease the cost to almost nothing. (Congress, of course, could speed this along too with the right legislation.) I’ve heard that the Login team has also been working on a pilot to allow customers to prove their identity in-person at a government facility, which has been shown to improve the success rates of the verification process. The Department of Veterans Affairs (VA) uses such a process to help Veterans walk through the process of setting up their online accounts right in the lobby of many VA health clinics. 
The US Postal Service also performed a similar pilot several years ago, where anyone could stop by a post office and have staff review their documents, or even let their postal carrier perform the review when dropping off the day’s mail, allowing them to reach almost every single person in the country! Detractors still complain about the cost of Login, and consider that a reason not to require it, even though the cost would be reduced if it were mandated. Even so, if the Federal government agrees that this is the tool that agencies should be using, then it should be treated like a Public Good - like a library or park. To that end, Congress could pass appropriations dedicated to funding this critical program, for instance as part of President Biden’s proposal for TTS Funding. However, I would caution agencies against implementing identity requirements beyond what is absolutely necessary! The Digital Identity Guidelines from the National Institute of Standards and Technology (NIST) are the baseline that most Federal agencies use; in my personal opinion, they set too high a bar. The government must provide critical services to at-risk and economically disadvantaged groups, and by setting requirements that individuals in these groups cannot meet, agencies are not serving people equitably. For instance, the VA serves Veterans who may be homeless, may not have a credit card, may be partially or fully blind, may have trouble remembering or recalling information, may not have fingerprints, and so on. Since the standard methods of identity verification and authentication may present an impossible barrier for the very people the VA serves, it is in the best interest of these people to not implement NIST’s high standards as written. (And I told NIST the same thing.) If you’re a city or state government interested in a world-class identity solution, I’d recommend reaching out to GSA about Login! 
Even if you don’t meet the requirement mentioned above, it’s definitely worth getting in touch with GSA anyway - as we’ve learned, policies change every day.


Presenting EOPbot

2021.01.25 – If you’re like me, you may be having trouble keeping up with all the new Executive Orders and OMB Memos that the Biden Administration is putting out. To help, I’ve created a little bot to look for changes on specific pages of the White House website: @EOPbot!
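The bot itself isn’t shown here, but the core idea - polling a page on a schedule and comparing a hash of its content against the last one seen - can be sketched in a few lines of Python. This is a minimal illustration, not EOPbot’s actual code; the function names (`fingerprint`, `check_page`) are mine.

```python
import hashlib


def fingerprint(html: str) -> str:
    """Hash of the page content, used to detect changes between polls."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def check_page(url: str, html: str, seen: dict) -> bool:
    """Return True if the page at `url` has changed since the last poll.

    `seen` maps url -> last fingerprint; a real bot would persist this
    between runs (e.g. a JSON file) and tweet when this returns True.
    """
    digest = fingerprint(html)
    changed = url in seen and seen[url] != digest
    seen[url] = digest  # remember the current state for the next poll
    return changed
```

In practice you would fetch the page (e.g. with `urllib.request`), run this on a timer or cron job, and normalize the HTML first so that ads, timestamps, and other boilerplate churn don’t trigger false positives.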


Principles for Automation in Government

2020.12.20 – This article is part three in a series on IT policy recommendations. A PDF of the full recommendations may be downloaded here. Artificial Intelligence (AI), Machine Learning (ML), Robotic Process Automation (RPA)1, and other related predictive algorithm technologies continue to gain attention. However, at the moment their promises are far greater than the reality, and instead of successes we continue to see the worst of ourselves reflected back. Vendors also continue to oversell the functionality of these tools, while glossing over major expenses and difficulties, such as acquiring and tagging training data. The Trump Administration, rather than increasing scrutiny and oversight of these technologies, only sought to reduce barriers to their usage. The Biden Administration will need to create stronger protections for the American people through better governance of the usage of these solutions in government. The problem is that humans have written our biases into our processes, and automation only expedites and amplifies these biases. (The book Automating Inequality explains this better than I ever could.) As a technologist, I become concerned when I hear of government agencies implementing these technologies for decision-making, as our unequal systems will only lead to greater inequity. It’s all too easy to “blame the algorithm” to avoid liability, but it is humans who create the algorithms. Simply put, the Federal government cannot have racist chatbots. The government must not exacerbate the problem of minorities not receiving benefits they deserve. And the government should not be using tools that can reinforce existing racism and sexism while remaining willfully ignorant of these topics. Yet with all of these failures, we still see organizations running gleefully towards toxic ideas such as predictive policing and facial-recognition technology. Fundamentally, this is a question of ethics. 
Although in government we have extensive ethics laws and regulations in regard to finances and influence, there is almost no actual guidance on ethical practices in the use of technology. And in the U.S. there exists no standard code of ethics for software engineering, no Hippocratic Oath for practicing technology. However, we do have a series of regulatory proxies for ethics, in the form of security and privacy requirements aimed to protect the data of the American people.

A diagram reflecting the balance between human versus computer decision-making and impact to human life and livelihood.

By requiring a series of controls — not unlike those that we use for IT security — we can increase the safety of the usage of these tools. Similar to the current National Institute of Standards and Technology (NIST) classifications for Low, Moderate, and High security systems, artificial intelligence systems should be classified by their impact to people, and the level of automation that is allowed must be guided by that impact. And like the NIST security controls, these must be auditable and testable, to make sure systems are functioning within the expected policy parameters. For instance, a robot vacuum cleaner presents very little risk to life, but can cause some inconvenience if it misbehaves, so very few controls and little human oversight would be required. But automation in the processing of loans or other benefits may disastrously impact people’s finances, so stricter controls must be implemented and more human engagement should be required. Most notable among these controls must be explainability in decision-making by computers. When a decision is made by a machine — for instance, the denial of a benefit to a person — we must be able to see exactly how and why the decision was made, and improve the system in the future. 
This is a requirement that megacorporations have long railed against due to the potential legal liabilities they may face in having to provide such documentation, but the Administration must not yield to these private interests at the expense of The People. Another key control will be transparency in the usage of these systems: all Federal agencies must be required to notify the people when such a system is in use. This should be done both through a Federal Records Notice similar to the ones required for new information systems, and on the form, tool, or decision letter itself, so that consumers are aware of how these tools are used. Standard, plain language descriptions should be created and used government-wide. Related to that control, any system that makes a determination, on a benefit or similar, must have a process for the recipient to appeal the decision to an actual human in a timely fashion. This requirement is deliberately burdensome, as it will actively curtail many inappropriate uses in government, since overtaxed government processes won’t be able to keep up with too many denied benefits. For instance, the Veterans Benefit Appeals system is currently entirely manual and has a delay of a year or more, and some Veterans have been waiting years for appeals to be adjudicated; if a system is seeing an unreasonably large number of appeals of benefit denials, that’s a good indicator of a broken system. Moreover, the result of that appeal must become part of the determining framework after re-adjudication, and any previous adjudications or pending appeals should be automatically reconsidered retroactively. There also exists a category of uses of Artificial Intelligence that the government should entirely prohibit. The most extreme and obvious example is the creation of lethal robots for law enforcement or military usage — regardless of what benefits the Department of Defense and military vendors try to sell us. 
Although there’s little fear of a science-fiction dystopia of self-aware murderbots, major ethical considerations must still be taken into account. If we cannot trust even human officers to act ethically under political duress, we certainly cannot expect robots devoid of any empathy to protect our citizens from tyranny when they can be turned against people with the push of a button. Similarly, the government must be able to hold private companies liable for their usage of these technologies, both in government and in the private sector. If something fails, the government legally owns the risk, but that does not mean that private companies should escape blame or penalties. The increase in companies creating self-driving cars will inevitably lead to more deaths, but these companies continue to avoid any responsibility. The National Highway Traffic Safety Administration’s recommendations on autonomous vehicles do not go nearly far enough, merely making the “request that manufacturers and other entities voluntarily provide reports.” In short, the government must make a stand to protect its people instead of merely serving the interests of private companies — it cannot do both. For further reading, the governments of Canada and Colombia have released guidance on this topic, providing an excellent starting point for other governments.

  1. Some of us technologists have referred to RPA as “Steampunkification” instead of IT modernization, as the older systems are still left in place while newer tech is just stuck on top, increasing rather than decreasing the technical debt of an organization — much as Steampunks glue shiny gears onto old hats as fashion. 


Reskilling and Hiring for Technology in Government

2020.12.19 – This article is part two in a series on IT policy recommendations. A PDF of the full recommendations may be downloaded here. The nature of business is change — we move, refine, and combine goods, services, and data, which generates value — and this is true both in the public and the private sector. Technology is just one of the ways that we manage that change. Those organizations that do best at managing change are often the best equipped to deal with the relentless pace of transformation within the IT field itself. Government, however, tends to resist change because of misaligned value incentives which prioritize stability and avoid risk, though these elements do not necessarily need to be at odds with one another. Since the Reagan era, government agencies have outsourced more and more IT tasks to contractors and vendors, under the false promise of reduced risk and increased savings for taxpayers. There’s an infamous joke that we’ve done such a good job of saving money through IT over the last decade that we’ve reduced the IT budget from $2 billion to $40 billion. Yet almost all of that spending has gone to private companies, instead of increasing Federal staff and providing needed training, and the government has astonishingly little positive progress to show for it — systems and projects continue to fail. This effort has lobotomized government by eliminating subject matter experts, reducing its ability to manage change, and as a result has greatly increased — rather than reduced — the risk for Federal agencies. Agencies have tried to “buy their way out” of their risk, by leveraging vendors and IT products to “absorb” the risk. Unfortunately, government doesn’t work that way — agencies are solely responsible for risk, and if something fails, the agency, not the vendor, is the one on the hook for any lawsuits or Congressional hearings that result. 
The only practical way for agencies to deal with their risk and begin paying down the government’s massive technical debt is to hire and train experts inside of government who can address these problems directly, and begin to facilitate change management. In the Cloud Smart strategy OMB states, “to harness new capabilities and expand existing abilities to enable their mission and deliver services to the public faster … instead of ‘buy before build’, agencies will need to move to ‘solve before buy,’ addressing their service needs, fundamental requirements, and gaps in processes and skillsets.” Although there has been a major effort to hire and train cybersecurity professionals in government, technology literacy needs to be improved in all job roles. Technology will always be a core function of government, and to be successful, government must have expertise in its core functions; to do otherwise is to deliberately sabotage that success. Efforts such as GSA’s 18F Team and The US Digital Service (USDS) have proven that there is a need for this expertise, and the government must continue and expand on those efforts by teaching agencies “how to fish.” Beyond just these short-term hires via Digital Service/Schedule A and Cybersecurity/2210 to augment staff temporarily, agencies need to invest in permanently expanding their knowledge, skills, and capacity.

Increase Training Opportunities for Federal Government Employees

First, there needs to be a governmentwide approach to increasing training, starting with additional funding in the President’s budget dedicated to improving IT skills. Financial and leave award incentives could also be used to encourage staff to participate in more training outside of their immediate job roles. The Federal Cybersecurity Reskilling Academy, part of the Cloud Smart strategy, was a good start, but didn’t go far enough. It’s impossible to fully train a practitioner in everything they need to know about cybersecurity — or any other complex technology — in just a few short weeks. A real apprenticeship program, in the form of agency rotation & detail programs to place staff in more IT-mature agencies, would have a major impact by allowing staff to learn skills on the job in a hands-on way. Many of these skills are impossible to learn meaningfully from a book or seminar; in general, most technical certifications — instead of being required — should be met with skepticism. Almost all policy decisions today have some aspect of technology involved. To address the rapidly aging Federal IT infrastructure and make smart investments with taxpayer dollars, all of our leaders need to be equipped with knowledge of modern systems beyond just the sales pitches they receive from vendors. Ongoing training in technology must be made a priority and part of every Senior Executive Service (SES) performance plan.

Create a new IT Job Series

Although many technologists have been willing to work for a short term of 2–4 years in government at a massive pay cut, just out of a feeling of civic duty, this sort of “holiday labor” is not a sustainable path for long-term success. A new Administration will need to address the massive pay disparity for government IT jobs, which acts as a barrier to both hiring and retaining staff. The White House will need to direct the Office of Personnel Management (OPM) to establish a proper IT job series or extend the 2210 cybersecurity role definition, and create a special rate that reduces this gap, particularly at the top end of the scale (GS-13 through GS-15). Ideally this pay should be competitive with the private sector by locale, or as close to the standard rates as possible. And this pay must be made available to staff as they are retrained, not just to outsiders coming into government with lucrative salaries from the private sector. Without this key step, the work done to reskill our staff will be lost as they use their new skills to find better-paying employment outside of government. Also, this job series should include not only security personnel, software engineers, and graphic designers, but also non-traditional (but very important) members of government technical teams such as program & product managers, contracting officer representatives (CORs), customer experience experts, and content designers.

Leverage Modern Hiring Techniques to Bring in Skilled Personnel

Third, agencies must be directed to aggressively move away from older hiring processes and switch to techniques which evaluate whether candidates can actually do the job. OPM, in coordination with USDS, has already done a lot of work towards this, including eliminating education requirements and moving to knowledge-based hiring techniques, but agencies largely have not yet implemented this new guidance. The White House will need to apply more pressure for these changes if agencies are expected to adopt them. Initiatives such as Launch Grad and the Civic Digital Fellowship could also provide a pipeline for potential candidates with critical skills into government service.

Improve Diversity in the Senior Executive Service

Finally, major improvements must be made to the Senior Executive Service (SES) hiring process. These staff represent the senior leaders at Federal agencies - the key decision-makers whose technical knowledge must be increased as described above. Yet the lack of diversity of this group has gone woefully unaddressed even after years of critical reports. Since these SESs are on the boards that hire the other SESs, and many of these leadership roles are filled due to tacit political connections rather than the candidates’ skills, it is unlikely that diversity will improve organically from this in-group. This entire hiring process needs to be reconsidered to level the playing field. The Executive Core Qualifications (ECQs) were a good idea to set a baseline for expertise in senior management, but have largely become an expensive gatekeeping exercise. This has given rise to a cottage industry of writers who simply churn out government resumes at a price tag of thousands of dollars. I know of very few SES staff who were not either hand-picked for their first SES role or who paid to have their resume written by a professional. This limits these staff to those who can “pay to play” — either with literal dollars or political influence — severely limiting the candidate pool. On the reviewer’s end, it’s long been known that overtaxed human resources staff often just search resumes for keywords from the job posting as a means of first review, which eliminates anyone who may have missed a specific word or phrase. Government expertise and education also appear to be given a higher standing than outside experience. 
And once your ECQs have been approved, you don’t need to have them re-reviewed for each job, further narrowing the list of candidates who are considered. There is no single, easy solution to the systemic problems in this process. Expanding training opportunities for senior General Schedule employees (GS-14 and GS-15) beyond just the outdated and time-consuming Candidate Development Program would be a first step. A new Administration could make diversity a key priority in the President’s Management Agenda, setting goals for hiring and new initiatives for recruiting under the Chief Human Capital Officers Council (CHCOC).

In Closing: Countering Bias Through Diversity

Our country is changing, and so is the nature of government. Diversity is critical for all technology roles in government, not just leadership. Addressing systemic bias in the tools that agencies are implementing will require attention from all levels of staff. Our benefit systems must provide services equitably to all, but this will be impossible without acknowledging these biases. However, due to a recent Executive Order, training around bias has largely been halted in the Federal government, reducing our ability to tackle this challenge. As the government begins to close gaps around technology skills, it is critical that we’re building a workforce that reflects the people we serve, so that we can better address these issues at their root.
