Architecting for Global Scale
Introduction
One of the key advantages of using Amazon Web Services (AWS) is that you can start small and then scale up to a global level without the need for major re-architecting.
However, to avoid major efforts in the future, it’s essential to establish the right architectural foundations. Think of it like building a house – if the foundations of a single-storey house aren’t strong enough, you won’t be able to safely extend it to two storeys without significant re-work.
This article will cover the main areas to consider when architecting an application for global scale – either from day one, or by establishing solid foundations to minimise re-work needed when an application grows in the future.
Foundations for global scale
Before diving into AWS services, let’s cover some theory that will form the foundation for the rest of this article.
Why architect for global scale?
The million-dollar question! There are a number of reasons why you might want to explore multi-region architectures.
First, even if you don’t think your application will reach global scale, it never hurts to be prepared – within reason. I’m not suggesting deploying to 5 different regions if all your users are based in the UK, but solid foundational architecture decisions will set you up for more straightforward scaling in future, regardless of the magnitude.
In my view, there are two main reasons for running an application across multiple regions: data residency and network latency.
- Data residency refers to the requirement for an application’s data to be stored in a particular geographical region. This sometimes, but not always, extends to the requirement for that data to be only accessible from the same region. Sometimes data residency is a self-imposed requirement, sometimes a legal or regulatory requirement.
- Network latency is the time it takes for data to travel from its source to the user consuming it. This could be documents being downloaded, images, or even the compiled source files that make up an application. Lower latency results in a better user experience and happier users.
Reducing network latency and ensuring data residency compliance are requirements that will likely surface for most applications as they scale, so it’s important to consider the technical implications of these ahead of time.
Choosing regions
So, how do you choose the geographic regions in which to deploy and operate an application? This decision should always link back to data residency and network latency requirements.
Consider any legal and regulatory requirements that your business has, for example the General Data Protection Regulation (GDPR). Consider not only these, but also what your users or customers might expect. If all of your customers are based in the UK, would they expect their data to be stored in London, or would they be comfortable with it all being stored in the US?
Network latency is intrinsically linked to the location of your users. If 90% of your users are based in the UK and 10% in the US, then it would make sense to optimise your choice of region for the majority. If the split becomes more 50/50 in the future, exploring multi-region deployments may make more sense.
Stateful vs. stateless
If your application stores data (the vast majority do), you need to determine whether the implementation is stateful or stateless:
- Stateful applications store their data alongside the application itself (i.e. on the same instance).
- Stateless applications decouple the state from the application layer.
Identifying where your application holds state and making it stateless opens the door to global scale, while also enabling other best practices such as auto-recovery and horizontal scaling with auto-scaling groups.
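As a rough illustration (Python with boto3, using a hypothetical user-sessions DynamoDB table), a stateless service keeps session state in an external store rather than in the instance’s own memory, so any instance – in any region – can serve any request:

```python
import boto3

# Hypothetical example: session state lives in DynamoDB rather than in-process
# memory, so any instance anywhere can pick up any request. The table name and
# "session_id" partition key are assumptions made for illustration.
sessions = boto3.resource("dynamodb").Table("user-sessions")

def save_session(session_id: str, data: dict) -> None:
    # Write state to the external store instead of a module-level dict
    sessions.put_item(Item={"session_id": session_id, **data})

def load_session(session_id: str) -> dict:
    # Any instance, in any region, can read the same state back
    response = sessions.get_item(Key={"session_id": session_id})
    return response.get("Item", {})
```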
Rounding up
In summary, establishing solid foundations in your architecture is essential to minimise re-work when the time comes to scale. Considering your data residency and network latency requirements is vital when deciding in which geographic regions to deploy and operate your application. Making your application stateless opens the door to global scale and other best practices.
Next, we’ll explore how these concepts apply within AWS by looking at five different services that either set you up for future scale or enable global scale from day one.
AWS services for global deployments
Let’s start by looking at a couple of services within AWS that can help route users to the optimal deployed version of your workload.
Route53
First up is a DNS-based solution. Typical DNS records consist of a record name and a single value. For example, ubertasconsulting.com might point to 141.193.213.10. Wherever you are in the world, and whether or not the server the website is hosted on is healthy, the record name will always resolve to that IP address. For a lot of workloads, that’s entirely sufficient. But what if it’s not?
Route53 is AWS’ DNS service. Within a Route53 hosted zone, where the records for a particular domain or sub-domain are configured, you can configure the routing policy for each individual record. A routing policy determines how the value of a DNS record is resolved. There are three routing policies of interest when architecting for global scale: geolocation routing, geoproximity routing, and latency-based routing. Let’s explore these in a little more detail.
- Geolocation routing policies route DNS queries based on the location of the user; to be more precise, the location that the DNS query comes from. A great use for this is specifying that users in a particular region should always be served content from a particular resource.
- Geoproximity routing policies are often confused with geolocation policies. The easiest way to remember the difference is that geolocation policies let you specify exact mappings between user locations and resource locations, whereas geoproximity policies route users to the closest available resource. With geolocation policies, you could technically (if you really wanted) route all users from the US to a server in Europe and vice-versa. Geoproximity routing focuses on distance, and lets you apply a bias to each resource location to expand or shrink the geographic area from which traffic is routed to it.
- Latency-based routing policies are the most straightforward. Given a set of record values, Route53 will choose the one with the lowest latency for the user. Often, this will give similar results to geoproximity routing but covers cases where the closest location may not always be the lowest latency.
While all three of these routing policies will help direct users from across the globe to the right deployment of your workload, they don’t, on their own, cater for another benefit of multi-region deployments: automatic failover. You can, however, combine any of these policies with Route53 health checks to ensure that users are only directed to healthy deployments.
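As a rough sketch of how this looks in practice, the following Python (boto3) snippet creates latency-based records for the same name in two regions, each attached to a health check. The hosted zone ID, IP addresses and health check IDs are placeholders:

```python
import boto3

route53 = boto3.client("route53")

def upsert_latency_record(zone_id, name, region, ip, health_check_id):
    # Latency-based routing: multiple records share a name, each with a unique
    # SetIdentifier and a Region; Route53 returns the lowest-latency healthy one.
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "SetIdentifier": f"app-{region}",
                    "Region": region,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": ip}],
                    "HealthCheckId": health_check_id,  # only healthy records are returned
                },
            }]
        },
    )

# Placeholder zone ID, IPs and health check IDs for illustration
upsert_latency_record("Z0EXAMPLE", "app.example.com", "eu-west-2", "203.0.113.10", "hc-eu-example")
upsert_latency_record("Z0EXAMPLE", "app.example.com", "us-east-1", "198.51.100.20", "hc-us-example")
```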
Global Accelerator
Whilst DNS is one solution for directing users to their closest deployed workload, there are a couple of things to keep in mind:
- DNS is distributed by nature, and records are cached around the world. This means there can be a delay between updating a record in Route53 and user requests reflecting the change.
- A value returned from a DNS record, even if it is the IP address of the workload located closest to you, may still require data to travel a large distance over the public internet. The public internet is congested, and its performance can vary considerably with factors like the time of day.
Global Accelerator looks to solve these two problems. It provides static anycast IP addresses that act as a single, fixed entry point for between one and ten regional deployments. Traffic enters the AWS global edge network at one of over 100 network entry points across 50 countries, minimising the distance travelled over the public internet and optimising routing on the congestion-free AWS network.
As well as routing to the closest region behind the accelerator, Global Accelerator will automatically route only to healthy endpoints. There are many advanced use cases for this service; for most workloads it’s unnecessary, but it’s good to have the option to add it when you need to scale.
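If you do need it, the setup is fairly small. The sketch below (Python/boto3, with placeholder load balancer ARNs) creates an accelerator, a TCP listener on port 443, and one endpoint group per region:

```python
import boto3

# The Global Accelerator control-plane API is served from us-west-2.
ga = boto3.client("globalaccelerator", region_name="us-west-2")

accelerator = ga.create_accelerator(Name="my-app", IpAddressType="IPV4", Enabled=True)

listener = ga.create_listener(
    AcceleratorArn=accelerator["Accelerator"]["AcceleratorArn"],
    Protocol="TCP",
    PortRanges=[{"FromPort": 443, "ToPort": 443}],
)

# One endpoint group per region; traffic goes to the closest healthy group.
# The ALB ARNs below are placeholders for illustration.
for region, alb_arn in [
    ("eu-west-2", "arn:aws:elasticloadbalancing:eu-west-2:111122223333:loadbalancer/app/eu/abc"),
    ("us-east-1", "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/us/def"),
]:
    ga.create_endpoint_group(
        ListenerArn=listener["Listener"]["ListenerArn"],
        EndpointGroupRegion=region,
        EndpointConfigurations=[{"EndpointId": alb_arn, "Weight": 128}],
    )
```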
You can test for yourself how much faster AWS Global Accelerator can be using the Speed Comparison tool.
Data
Let’s move on to data. Data is one of the hardest elements of multi-region architecture. The biggest thing to remember is that the workload needs to be stateless; you’ll recall that we talked about the reasons for this earlier.
S3
S3 is AWS’ object storage service. Objects are organised into buckets, and each bucket is created in a specific region. Buckets are accessible from any region, but latency increases the further the data has to travel.
One strategy for using S3 globally is to create a bucket in each region that you operate in. You can then use cross-region replication to ensure that any objects added, deleted or changed in any of the buckets are replicated to the others. With this method, each deployment of your application would need to point to the bucket for its own region. For example, you might have my-bucket-eu-west-2 and my-bucket-us-east-1.
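A minimal sketch of one half of that setup (Python/boto3, using the example bucket names above and a placeholder replication role) might look like the following; a mirror rule on my-bucket-us-east-1 would make the replication two-way:

```python
import boto3

s3 = boto3.client("s3")

# Replicate everything from the eu-west-2 bucket to its us-east-1 counterpart.
# Versioning must already be enabled on both buckets, and the IAM role ARN is
# a placeholder that would need the appropriate replication permissions.
s3.put_bucket_replication(
    Bucket="my-bucket-eu-west-2",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [{
            "ID": "replicate-to-us-east-1",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter = replicate all objects
            "DeleteMarkerReplication": {"Status": "Enabled"},
            "Destination": {"Bucket": "arn:aws:s3:::my-bucket-us-east-1"},
        }],
    },
)
```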
A potential issue with this strategy is that there is always a small chance of a regional issue with Amazon S3. For example, if S3 were to experience an outage in eu-west-2, the workload deployment in that region would not function as desired, even if its other components were still healthy.
A solution to this issue is S3 Multi-Region Access Points (MRAPs). These are built on the same technology that powers Global Accelerator and use cross-region replication behind the scenes. An MRAP exposes a single endpoint that applications can use to interact with S3; requests are routed to the closest S3 bucket or, if that bucket’s region is unhealthy, to the next closest.
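Creating an MRAP is a control-plane operation against the s3control API. The sketch below (Python/boto3, with a placeholder account ID and the example bucket names from earlier) is one way it might look; once created, applications address the MRAP’s ARN or alias instead of an individual bucket:

```python
import boto3

# The Multi-Region Access Point control-plane API is served from us-west-2.
s3control = boto3.client("s3control", region_name="us-west-2")

# Placeholder account ID; the buckets must already exist in their regions.
s3control.create_multi_region_access_point(
    AccountId="111122223333",
    Details={
        "Name": "my-app-mrap",
        "Regions": [
            {"Bucket": "my-bucket-eu-west-2"},
            {"Bucket": "my-bucket-us-east-1"},
        ],
    },
)
```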
DynamoDB global tables
What if your data lives in a database and isn’t suitable for Amazon S3? If it’s relational data (e.g. SQL database engines like MySQL or PostgreSQL), then there’s a service just for you, but we’ll come to that. First, let’s look at the offering for non-relational data. DynamoDB is the go-to service in AWS for non-relational, or NoSQL, data.
DynamoDB has a feature called ‘Global tables’ that allows you to configure a multi-region and, crucially, multi-active database. This means workloads deployed into specific regions can read and write to that region’s DynamoDB table. Data is replicated between the regional tables with single-digit millisecond latency, giving you confidence that users will see the same data no matter where they use the application.
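Enabling this on an existing table is a single call. The sketch below (Python/boto3, with a placeholder table name) adds a us-east-1 replica to a table in eu-west-2:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="eu-west-2")

# Turn an existing eu-west-2 table into a global table by adding a replica.
# The table name is a placeholder; the table may also need DynamoDB Streams
# (new and old images) enabled before replicas can be created.
dynamodb.update_table(
    TableName="my-app-table",
    ReplicaUpdates=[{"Create": {"RegionName": "us-east-1"}}],
)
```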
If DynamoDB in a particular region were to experience an outage, you can implement logic in your application to fall back to using a different region.
Aurora Global Databases
Now, relational data. Typically, when people talk about multi-region SQL databases, they’re referring to read replicas that can be promoted in a disaster recovery scenario. They’re not wrong; it is a valid multi-region architecture. It doesn’t, however, cater for use cases where workloads are accessed globally and data needs to be written to, as well as read from, the database in every region.
What AWS services can help solve this problem? Amazon Aurora is AWS’s fully managed MySQL- and PostgreSQL-compatible database solution. Within Aurora, there is a feature called ‘Global Database’. Global Databases, by default, follow the same pattern I’ve described, where a primary database cluster handles reads and writes, and secondary clusters are read-only.
This means your application needs to be configured to write to a database in one region, and read from another. Whilst this isn’t too complicated to implement in code, it does mean you have to configure your AWS networking to allow cross-region connectivity – this could be with something like VPC peering or Transit Gateways.
What’s the alternative then? AWS launched a new feature in Aurora Global Databases called ‘write forwarding’. This allows you to configure a secondary database cluster to transparently forward any write transactions back to the primary cluster without any additional network set-up on your part. Data is written to the primary cluster first before being replicated out to any secondary clusters to ensure that the primary is always the source of truth.
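A rough sketch of adding a write-forwarding secondary cluster to an existing global database (Python/boto3, with placeholder identifiers; availability of write forwarding depends on the engine and version) might look like this:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Add a secondary cluster in us-east-1 to an existing Aurora global database,
# with write forwarding enabled so applications in this region can send writes
# to their local endpoint. Identifiers are placeholders; DB instances would
# still need to be created in the new cluster separately.
rds.create_db_cluster(
    DBClusterIdentifier="my-app-us-east-1",
    Engine="aurora-mysql",
    GlobalClusterIdentifier="my-app-global",
    EnableGlobalWriteForwarding=True,
)
```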
Rounding up
We explored how Route53 or Global Accelerator can be used to direct users to the optimal regional deployment of your application. We then looked at handling data in a multi-region architecture, beginning with S3, cross-region replication, and how Multi-Region Access Points can simplify the process. We finished with databases, specifically DynamoDB global tables and Aurora Global Databases, and the opportunities they provide.
Conclusion
We covered a lot there! Let’s recap the key takeaway points.
We looked at the importance of establishing solid foundations when architecting for AWS. It’s crucial that you consider which regions you want to deploy into and start on the right foot by making your application stateless.
We explored the AWS services that can help and that enable easier scaling when the time comes. We also reviewed the varying extents to which you can use AWS services to support you on the journey to operating at global scale.
Overall, the key point I want you to take away is that it isn’t difficult to set yourself up for global scale. AWS make it significantly easier with their managed services than if you tried to implement it all yourself. Let engineers and operations teams focus on delivering business value whilst AWS do the heavy lifting.