Cloud and cost can be quite a polarising topic. Do it right, and you can run super lean, drive down the cost to serve, and ride the cloud innovation train. But do it wrong, treating the public cloud like a data centre, and your costs could be significantly higher than on-premises.
I have been fortunate to work for some of Australia's largest websites and two of the major public cloud vendors. When it comes to architecture, I have seen the exceptional and the questionable.
Just as a car is valued for its economy, there are often tradeoffs. Low litres (or kilowatt hours) per 100 km often goes hand in hand with low performance.
Our goal should be:
How can we increase the efficiency of our architecture without compromising other facets such as reliability, performance and operational overhead?
Broadly speaking, yes, you should be spending less.
In this multi-part series, we’re going to cover three main domains:
Operational optimisation
Infrastructure optimisation
Architectural optimisations
With such a broad range of optimisations, hopefully, something you read here will resonate with you and provide some meaningful cost-saving initiatives that you can execute in your environment. I want to show you where the opportunities for savings exist.
Public cloud is full of new architectural levers for us builders, which is amazing but can be daunting. New levers for all of us, and with the hyperscale providers releasing north of 2,000 updates per year (more than five per day), we all need to pay attention and constantly climb this cloud maturity curve.
The maths of public cloud can be complex at times: a single service may have multiple pricing dimensions. The key is to find the cost savings and invest in them.
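To make multi-dimension pricing concrete, here is a minimal sketch modelled loosely on serverless compute billing, where you pay per request and per GB-second of duration. The rates and volumes below are illustrative assumptions only; always check your provider's current price list.

```python
# Sketch: estimating a monthly serverless bill across two pricing dimensions.
# Rates are illustrative, not current list prices.

REQUEST_RATE = 0.20 / 1_000_000   # $ per request
GB_SECOND_RATE = 0.0000166667     # $ per GB-second of compute

def monthly_function_cost(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Combine the request and duration dimensions into one monthly figure."""
    request_cost = invocations * REQUEST_RATE
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * GB_SECOND_RATE
    return request_cost + compute_cost

# 50M invocations a month, 120 ms average duration, 512 MB memory
print(f"${monthly_function_cost(50_000_000, 120, 512):,.2f}")  # roughly $60.00
```

Notice how the duration dimension dominates here: halving memory or shaving milliseconds off the average duration moves the bill far more than reducing request counts.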
Serverless Cost Optimisation Tips
Cloud – A new dimension
Looking through the lens of the public cloud, it has brought us all a new dimension of flexibility and so many more building blocks. The question is, how are you constructing them?
Image 1: Are you a Lego master?
When we, as builders, talk about architecture, we will often architect around a few dimensions, some more important than others, depending on your requirements.
Commonly we will architect for availability, performance, security and function, but I would like to propose a new domain for architecture, and that is economy.
When you’re building your systems, you need to look at the economy of your architecture because today, in 2024, you have a great deal of control over it. New frameworks, tools, technologies, hosting platforms… all new, new, new.
Lifecycle Management
Your goal should be to trial and change the way a system is built during its own lifetime. As architects and developers, we must move away from this model of heavy upfront design or some finger-in-the-air predictions of what capacities a solution needs.
Instead, embrace the idea of radical change during an application lifecycle funded by cost savings.
Yes, there are degrees to which you can do this depending on whether you built the system yourself or you’re using COTS (Commercial Off The Shelf) software, but I will walk through options that you can apply to your existing stacks regarding what is possible.
How Are You Keeping Score?
Even with COTS, there are options. Have you noticed the appearance of new levers in the form of updates? Do you have a mechanism in place to be kept aware of updates? If you do, then that's great, but if you don’t, let me share with you two patterns we use at V2 Digital.
Two mechanisms you can use are feeding updates into Slack or Teams via RSS, or via serverless compute with a webhook into your messaging platform of choice.
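As a sketch of the second pattern, here is a minimal Lambda-style function, written in Python with only the standard library, that parses a provider's RSS feed and forwards items to an incoming webhook. The feed URL, webhook URL, and handler shape are placeholders for whatever your platform uses.

```python
# Sketch: a serverless function that pushes cloud provider update feeds
# into a chat channel. URLs below are placeholders -- substitute your own.
import json
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/provider-updates/feed"          # placeholder
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/X"  # placeholder

def parse_items(rss_xml: str) -> list[dict]:
    """Extract title/link pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]

def post_to_channel(item: dict) -> None:
    """POST one update to the messaging webhook as a JSON payload."""
    body = json.dumps({"text": f"{item['title']}\n{item['link']}"}).encode()
    req = urllib.request.Request(
        SLACK_WEBHOOK, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)

def handler(event, context):
    """Lambda-style entry point: fetch the feed and forward its items."""
    with urllib.request.urlopen(FEED_URL) as resp:
        for item in parse_items(resp.read().decode()):
            post_to_channel(item)  # in practice, de-duplicate against a store
```

In a real deployment you would schedule this on a timer and keep a record of already-posted items so the channel only sees new updates.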
Exposing your teams to the latest updates can often be a cue to alter your architecture whilst upskilling your internal team and building their capability.
Image 2: Slack has a built in RSS feed parser making life easy for the technologists at V2
The Basics
With the right approach, some sizeable cost savings can be made to reduce your cloud bill.
Image 3: To pass "GO" you must follow strategic moves for Cloud Cost Optimisation.
The first step is to go back to basics, to get the fundamentals right from the start. These are fundamentals in cloud and, to a degree, software development.
Understand your baseline.
You can’t improve what you can’t measure. Do you know your per-transaction cost? Do you know your cost to serve?
If you do, well done; if you don’t, how can you improve?
Measuring The Cost To Serve
There are three different approaches to determining this baseline:
Beginner
Do it by hand. Sit down with AWS Cost Explorer, Azure Cost Management or GCP Cost Management, work out your transaction rates, do some rough calculations, and be either pleasantly surprised or genuinely shocked by what comes back.
Intermediate
Gather these transaction volumes in real time from your systems. You may have instrumented using an APM (Application Performance Management) tool such as Azure Application Insights, New Relic, AWS X-Ray or Elastic APM, but you still calculate the cost by hand.
Advanced
Monitor in real time, with the calculations themselves performed in real time. Leverage a platform such as Azure Event Hubs, Amazon Kinesis or Apache Kafka to derive this continuously.
When you have this information, you can ask the question: what’s my average transaction flow versus my average infrastructure cost? Then you can put it up in the corner and say, “Development team, we need to optimise”.
This becomes your measure, and you need to make this relevant and tangible to your business stakeholders for organisational buy-in.
Image 4: Do you have a cost dashboard?
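The beginner approach above can be sketched in a few lines. The bill and transaction volume here are made-up illustrative numbers; yours would come from Cost Explorer or Cost Management and your own traffic data.

```python
# Sketch: a "beginner" cost-to-serve calculation from monthly figures.
# Numbers are illustrative -- substitute your own billing and traffic data.

def cost_to_serve(monthly_cloud_bill: float, monthly_transactions: int) -> float:
    """Average infrastructure cost per transaction."""
    return monthly_cloud_bill / monthly_transactions

bill = 42_000.00           # monthly cloud spend in dollars
transactions = 12_500_000  # orders, API calls, page views -- whatever you serve

print(f"${cost_to_serve(bill, transactions):.4f} per transaction")  # $0.0034 per transaction
```

The intermediate and advanced approaches compute exactly the same ratio; they just replace the hard-coded numbers with live feeds from your APM and billing exports.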
Operational Optimisation
Another consideration is how you are paying for public cloud. Using a credit card in a PAYG (Pay As You Go) model might be a great way to get started, but it can be expensive on both Microsoft Azure and Amazon Web Services.
Here are some approaches to investigate:
Enterprise Agreement (Azure)
Reserved Instances (Azure / AWS)
Compute Savings Plan (AWS/Azure)
EC2 Savings Plans (AWS)
SPOT Instances (Azure / AWS)
In my experience, you need to move away from paying on demand because this is the most expensive way to leverage public cloud. Compared with on-demand, savings can range from 15% to 90%. Typically, discounts apply either for commitment, giving cloud providers certainty, or, in the case of SPOT, for your ability to leverage idle unused resources.
While not groundbreaking, ‘Reserved Instances’ and ‘Savings Plans’ allow you to minimise the cost of traditional architectures. My next piece of wisdom is to have a ‘Reserved Instance / Savings Plan’ percentage target.
Some of the best organisations I have seen in the past have had up to 80% of their IaaS resources covered by ‘Reserved Instances / Savings Plans’. If you don’t have a target, I recommend you look into this.
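The coverage check itself is trivial once you have the instance-hours from your billing export. The hour figures below are illustrative, and the 80% target is the one discussed above.

```python
# Sketch: measuring Reserved Instance / Savings Plan coverage against a target.
# Hours would normally come from your billing export, not hard-coded values.

def coverage_pct(covered_hours: float, total_hours: float) -> float:
    """Percentage of IaaS instance-hours covered by RIs / Savings Plans."""
    return 100 * covered_hours / total_hours

TARGET = 80.0  # coverage target, as discussed above

covered, total = 6_200.0, 8_760.0  # e.g. instance-hours over a year
pct = coverage_pct(covered, total)
print(f"Coverage {pct:.1f}% -- {'on target' if pct >= TARGET else 'below target'}")
```

Tracking this one number over time, and alerting when it drops, is a simple way to turn a vague intention into an enforced target.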
But before you make a purchase, understand your workload. Understand the ebbs and flows of your baseline load.
A rule of thumb is to assess a workload for three months, right-sizing as you go.
Leverage Azure Monitor / Amazon CloudWatch with a combination of Azure Advisor / AWS Trusted Advisor to fine-tune your application.
Optimise The Humans – High Value vs. Low Value
Operational optimisation. How much time do you spend thinking about labour costs, and do you include them in your cost to serve? You hire people, they do ‘stuff’. The thing is, cloud practitioners can be an expensive resource.
To prove my point, according to SEEK, the average Database Administrator (DBA) in Australia earns $105,000 AUD annually.
This is just the average DBA, and none of us here would ever work with just an average DBA, so we have established that people have a cost. But let’s think about what this cost actually means.
Look through the lens of something DBAs do often: a minor database engine upgrade. This matters because we should be upgrading our databases regularly (security, features, performance).
Let’s compare Amazon RDS, a managed service for running relational databases in the cloud, with running a database engine yourself on IaaS.
| Self-Managed (IaaS) | Amazon RDS |
| --- | --- |
| Backup primary | Verify update window |
| Backup secondary | Create a change record |
| Backup server OS | Verify success in staging |
| Assemble upgrade binaries | Verify success in production |
| Create a change record | |
| Create rollback plan | |
| Rehearse in development | |
| Run against staging | |
| Run against production standby | |
| Verify | |
| Failover | |
| Run in production | |
| Verify | |
| **8 Hours Minimum** | **1 Hour** |
What’s the administrative effort of a minor database engine upgrade?
While managed services may appear more expensive on paper, the administrative cost of performing undifferentiated heavy lifting is far greater. You save time, and you receive logs and an audit trail you can attach to your change record for auditability.
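Translating that table into dollars using the SEEK figure above makes the point sharper. The 38-hour week is my assumption (the Australian standard full-time week), not a figure from the salary data.

```python
# Sketch: the labour cost of one minor database engine upgrade,
# using the SEEK average DBA salary. 38-hour week is an assumption.

ANNUAL_SALARY = 105_000                    # AUD, average DBA salary per SEEK
HOURLY_RATE = ANNUAL_SALARY / (52 * 38)    # ~ $53/hour

self_managed = 8 * HOURLY_RATE   # minimum effort on IaaS, per the table
managed_rds = 1 * HOURLY_RATE    # effort on Amazon RDS, per the table

print(f"Self-managed: ${self_managed:.0f}, RDS: ${managed_rds:.0f} per upgrade")
```

Multiply the difference by the number of database fleets and minor versions you upgrade each year, and the "cheaper" self-managed option starts to look expensive.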
You may say to me: well, we’re going to spend that money anyway; these people are not going away.
I would say that’s great, but you could invest that particular chunk of time into something of greater business value, like tuning your database (query plans, index optimisation). That is a better use of a DBA’s time with a higher-value return.
Summary
Public cloud brings a multitude of opportunities to builders and architects, providing a raft of new levers that you can pull, twist, and turn to architect for the new world.
Architectures can and should evolve, but they need to make sense. What is the cost of change?
Join me in the next part of this multi-part series as we explore the infrastructure and architectural optimisations you can make.
Contact us at V2 Digital and let us help you and your team climb the cloud maturity curve and achieve the same or better outcome at a lower cost.