Platform Engineering - Release Patterns (Part 1)
I often dream about a kind of “builder’s nirvana”: a place where the distance between our ideas and reality is so small that it enables an explosion of creativity, a frictionless stream of value creation where problems are solved easily and almost instantly.
Generative Artificial Intelligence (Gen AI) is a game changer that will contribute to this explosion in creativity and innovation.
Take v0.dev, for example. It lets you rapidly prototype Web UIs from text prompts and uploaded image examples of what you’re looking to build. It outputs functional React/Next.js code, often using shadcn/ui components, that you can simply copy and paste into your app. All you need to do is wire it up, which gives you an exceptional starting point.
Before you get too excited and start building, we need to talk about an area where the decisions you make at the beginning will heavily impact the sustainability of your project: deployment and release environment design.
The Importance of People in Release Workflow Design
Let’s make sure we’re not shooting ourselves in the foot before the race begins. Fixing this later is often too late, resulting either in a rebuild or in being locked into a subpar solution by the decisions you made at the start.
Rory Sutherland, Vice Chairman of Ogilvy Group and a pioneer of digital advertising, once stated, “The perception of time as measured by engineers is measured in seconds. In humans, it’s kind of measured in pain, boredom or irritation. If we only try to optimise using objective numerical criteria ... we’ll miss out on a lot of things.”
If you apply this observation to the areas we fail to optimise for when designing our release workflow, it’s obvious that we consistently overlook people. We almost always forget to factor in the psychology and softer metrics (e.g. communication) of the people our work ultimately impacts.
To alleviate discomfort and boredom in the People Experience (not just the Developer Experience) while reducing the average time to release, there is a simple strategy that goes a long way towards achieving both.
To demonstrate how this works, let’s first look at where most of the industry is today.
Common Challenges in Current Release Patterns
The pattern above (the “Blocking Release Environment Design”) is likely very familiar, as it is effective and widely used. The DevOps revolution championed automated testing and CI/CD for quick, easy, safe and repeatable deployment across multiple “identical” environments. As useful as this pattern is, it does present some problems.
Economist and social theorist Thomas Sowell states, “There are no solutions. There are only trade-offs.”
This pattern is technically simple to implement, but it is “people complex”. There is moderate to high communication and people management overhead with this pattern.
With this pattern, going back and forth with a kind of “Release person” or other engineers to work out when you can deploy your change to an environment can waste days. It holds up the process until you are able to tell your stakeholders that the change is ready to test, or even for you to simply test it yourself in a cloud environment.
The feedback loop is notoriously slow and painful.
Occasionally, you have to “Hotfix” straight to Production and pray to the deployment gods that everything will be okay because you are blocked from deploying to UAT or Staging by other changes.
Dev can also pose an issue when someone else deploys to Dev not long after you, either completely overwriting your version or partially breaking your code by unknowingly making a conflicting change in the codebase.
Evolution to Multiple Dev Environments
One of the most common, natural progressions to mitigate these risks is to create multiple Dev environments like below:
This pattern can work well for individuals or small teams. However, as teams grow, more long-lived Dev environments are created, blowing out Cloud bills at the end of the month.
A workaround is to implement a booking system so engineers can lock in a testing time slot on a shared Dev environment. This avoids having a costly Dev environment for every engineer.
However, even with a booking system, this pattern is not bulletproof. You face the problem of engineers booking out entire days when they may not need them, or booking only one hour and ending up needing two. This can create tension and resentment between engineers as they struggle to use the platform effectively.
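To make the overrun problem concrete, here is a minimal sketch of such a booking check in Python. The `Booking` class, the `try_book` helper and the names and times are all hypothetical, invented for illustration; a real booking system would also need persistence, cancellation and some notion of fairness.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Booking:
    """A hypothetical time slot reserved on the shared Dev environment."""
    engineer: str
    start: datetime
    end: datetime


def overlaps(a: Booking, b: Booking) -> bool:
    """Two half-open slots [start, end) clash when each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def try_book(schedule: list[Booking], request: Booking) -> bool:
    """Add the request to the schedule only if it clashes with no existing booking."""
    if any(overlaps(request, existing) for existing in schedule):
        return False
    schedule.append(request)
    return True


nine = datetime(2024, 1, 15, 9, 0)  # arbitrary example date
schedule: list[Booking] = []

# Alice books one hour; Bob books the hour straight after.
alice_ok = try_book(schedule, Booking("alice", nine, nine + timedelta(hours=1)))
bob_ok = try_book(
    schedule, Booking("bob", nine + timedelta(hours=1), nine + timedelta(hours=2))
)

# Alice's testing overruns, but the extra hour she needs now belongs to Bob.
alice_extension_ok = try_book(
    schedule, Booking("alice", nine + timedelta(hours=1), nine + timedelta(hours=2))
)
```

The check itself is trivial. The friction comes from the rejected extension: Alice now has to negotiate with Bob or wait, which is exactly the people overhead described above.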
The next prevalent natural evolution is Microservices, or Macroservices (kind of mini-monoliths):
Many avoid microservices wherever possible, turning to them only as a last resort when scaling engineering teams. Distributed systems, microservices, and event-based architectures should only be used when absolutely required.
Non-Blocking Branch Environments as an Alternative
An alternative that can take multiple teams much further than you would expect, while forcing them to get really good at the things you need to have perfected before “breaking down the monolith”, is the temporary, production-like branch environments pattern:
This non-blocking pattern is commonly referred to as “branch deployments”, where each environment is triggered by a Pull Request to the main (Production) branch. V2 Digital has been using this as our default “go-to” pattern for a long time. It is a battle-tested and proven method.
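As a rough illustration of the mechanics, the sketch below maps Pull Request events to a temporary environment. It is not V2 Digital’s implementation. The `environment_name` and `action_for` helpers and the `pr-<number>-<branch>` naming scheme are assumptions made for this example, and the event names are modelled on the `opened`, `synchronize`, `reopened` and `closed` actions of GitHub pull request webhooks.

```python
import re


def environment_name(pr_number: int, branch: str, max_length: int = 40) -> str:
    """Build a DNS-friendly name for a temporary branch environment.

    The naming scheme is hypothetical; use whatever your platform allows.
    """
    # Collapse every run of characters outside a-z and 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    name = f"pr-{pr_number}-{slug}" if slug else f"pr-{pr_number}"
    # Truncate, then drop any hyphen left dangling at the cut.
    return name[:max_length].rstrip("-")


def action_for(event: str) -> str:
    """Decide what to do with the environment for a Pull Request event."""
    if event in ("opened", "reopened", "synchronize"):
        return "deploy"   # create the environment, or update it with new commits
    if event == "closed":
        return "destroy"  # merged or abandoned: tear the environment down
    return "ignore"


print(environment_name(128, "feature/Checkout_V2"))  # pr-128-feature-checkout-v2
print(action_for("opened"))                          # deploy
print(action_for("closed"))                          # destroy
```

Because every Pull Request gets its own name and its own lifecycle, nobody has to ask permission before deploying, and the environment disappears once the change is merged or abandoned.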
If you’ve worked in software development for over a decade, you may remember the craze of async HTTP/API requests with non-blocking I/O. The “Blocking Release Environment Design” diagram above resembles a synchronous HTTP (Blocking) request. The next logical step is to adapt it so it looks like an asynchronous HTTP (Non-Blocking) request.
Ironically, breaking down the monolith into microservices to increase change throughput looks just like how we originally solved the scaling problem for synchronous (blocking) HTTP requests: “Just scale out the number of HTTP processes to increase request throughput” (i.e. build more servers). That was before we made the HTTP request process design itself more efficient (non-blocking).
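The analogy can be put into rough numbers. The sketch below uses made-up durations and two hypothetical helper functions to compare how long each change waits before it is ready for feedback: first when changes queue for one shared environment, then when each change gets its own. It deliberately ignores environment spin-up time and cost.

```python
from itertools import accumulate


def blocking_ready_times(durations: list[int]) -> list[int]:
    """Shared environment: each change waits for every change ahead of it."""
    return list(accumulate(durations))


def non_blocking_ready_times(durations: list[int]) -> list[int]:
    """Branch environments: each change only waits for its own work."""
    return list(durations)


# Days each change occupies an environment (illustrative numbers only).
durations = [2, 1, 3]

print(blocking_ready_times(durations))      # [2, 3, 6]
print(non_blocking_ready_times(durations))  # [2, 1, 3]
```

In the blocking case the one-day change is ready on day 3 because it sat behind a two-day change. In the non-blocking case it is ready after one day, regardless of what anyone else is doing.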
V2 Digital is not the only organisation to have landed on this temporary “branching” release environment pattern. Many others also prefer it as the industry moves towards a mindset more focused on Platform Engineering & Developer Experience.
Some examples of PaaS providers that have heavily invested in this pattern are Vercel & AWS Amplify with “Preview” deployments, as well as Neon, PlanetScale and Supabase with “Branching”. On the Enterprise AI & Data side of the world, we’re seeing Snowflake with “Zero Copy Clone” and Databricks with “Shallow Clones” coming in hot with this pattern as well.
Conclusion
This non-blocking pattern significantly reduces the communication and people management overhead for Software, Cloud (infrastructure) and Data change management. It also significantly reduces the time to spin up a PoC, shortens the stakeholder feedback loop, and enables engineers to have multiple changes in progress, very comfortably, with minimal context switching.
The trade-off is that it can be technically difficult and sometimes unreasonable to implement with an existing stack. Even building new can be restrictive due to the technical requirements. However, you only need to solve these problems once, then you can enjoy the benefits for the life of the project.
Part 2 of this series will cover technical considerations: what you must sacrifice and have implemented in your tech stack to enable this non-blocking pattern.
Part 3 will cover safety: all the things you need to make the release process as safe and reliable as your existing pattern, if not more so.
If you want to hear more about this in depth, with your specific situation and context in mind, contact us for a chat! We’d love to help!