Fixing CloudFront's issues with static sites

2024-05-26

#aws

Let's analyze two common problems with CloudFront static websites.


CloudWhat?#

CloudFront is a CDN (content delivery network) - in other words, it caches and delivers whatever content sits behind it.

S3 is just a file store, but it does have some capabilities to host websites. This works great, honestly - you get low-latency static websites without any complications. But it lacks several features a modern website needs - HTTPS, caching and DDoS protection, to name a few.

This is where our content delivery network comes in - it brings all these features with a single service.

Then what's wrong with it?#

It does everything it says, but there are two issues I faced when setting up a static website on it.

The first one seems to be an issue faced by (almost?) everyone.

It has to do with the default root object, which is the object returned when a user requests a directory path ( /about/ ) instead of an actual object ( /about/contact.html ).

In S3, you can set an index document which gets used for all subdirectories, but in CloudFront the default root object works only for the root directory.
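As a toy model of this difference, here are two small resolver functions. They only illustrate the behavior described here and are not how either service is implemented; the function names are made up, and only trailing-slash paths are modelled.

```javascript
// Toy model: S3 website hosting applies the index document to every "directory".
function resolveS3Website(path, indexDocument) {
  return path.endsWith('/') ? path + indexDocument : path;
}

// Toy model: CloudFront applies the default root object only to the root path.
function resolveCloudFront(path, defaultRootObject) {
  return path === '/' ? '/' + defaultRootObject : path;
}

console.log(resolveS3Website('/about/', 'index.html'));  // /about/index.html
console.log(resolveCloudFront('/', 'index.html'));       // /index.html
console.log(resolveCloudFront('/about/', 'index.html')); // /about/ (unchanged)
```

In the last case CloudFront asks the origin for an object whose key is literally about/, which normally doesn't exist.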

"The behavior of CloudFront default root objects is different from the behavior of Amazon S3 index documents. When you configure an Amazon S3 bucket as a website and specify the index document, Amazon S3 returns the index document even if a user requests a subdirectory in the bucket."

Gluing up the first crack#

People have found various workarounds for this, and AWS themselves have suggested a few - let's take a look at some of these and see what would fit best for each use case.

> Let the world see#

The most commonly used fix is to make your S3 bucket public, enable S3 hosting, and then point your CloudFront distribution to the S3 website instead of the S3 bucket.

This is good because

There's some consistency in your URLs (you'll understand why I'm mentioning this later)
No extra overhead to maintain

But then again, it's not good because

Anyone who knows (or guesses) the name of your bucket can just DDoS the bucket and send your AWS bill skyrocketing ( CloudFront offers basic DDoS protection only on its own URL ).
If you have some non-public content on the bucket, selective authorization gets complex.

> You can't fool me#

If the problem is that CloudFront keeps trying to access the path as an object in S3 itself - why not do exactly that?

This involves creating a script which copies each index.html into another object with the same name as the folder. e.g. In S3 you would have both a 'folder' called about and an object called about, which has the same content as about/index.html
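The planning half of such a script could look roughly like the sketch below. The helper name folderCopies is hypothetical, and the actual copy step (via the AWS CLI or SDK) is deliberately left out. It assumes the copy is named after the folder without a trailing slash.

```javascript
// Hypothetical helper: given the object keys in a bucket, list the copies the
// workaround needs (each nested index.html -> an object named after its folder).
function folderCopies(keys) {
  return keys
    .filter(function (key) { return /.+\/index\.html$/.test(key); })
    .map(function (key) {
      return { source: key, target: key.slice(0, -'/index.html'.length) };
    });
}

var plan = folderCopies([
  'index.html',             // root page: handled by the default root object
  'about/index.html',
  'blog/post-1/index.html',
  'css/site.css'
]);
console.log(plan);
// [ { source: 'about/index.html', target: 'about' },
//   { source: 'blog/post-1/index.html', target: 'blog/post-1' } ]
```

Each pair would then be copied with the source's content type (text/html) so browsers render the extensionless object as a page.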

This is good because

It works

But then again, it's not good because

It's hacky, and relative links will get tricky
You need to do this every time you change your content
Duplicate content! (discussed below)

In short, I would just ignore this approach completely - but that's a personal preference.

> You go where I tell you#

This approach uses CloudFront Functions / Lambda@Edge to run a bit of code on each CloudFront request and point the URL to the correct object.

This is good because

Your resources are still private & secure

But then again, it's not good because

It's an additional (albeit small) component to maintain on AWS
Your URL consistency goes for a toss

This is because the function given by AWS serves the same page for both /about and /about/ without redirecting one to the other, so every page ends up reachable at two different URLs.
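To make the duplication concrete, here is a sketch of a rewrite-only handler in the style of the usual index.html rewrite. It is not copied from AWS and the name rewriteOnly is made up; it only assumes the CloudFront Functions event shape, where the request lives at event.request.

```javascript
// Rewrite-only handler: never redirects, only changes which object is fetched.
function rewriteOnly(event) {
  var request = event.request;
  var uri = request.uri;
  if (uri.endsWith('/')) {
    request.uri = uri + 'index.html';   // /about/ -> /about/index.html
  } else if (!uri.includes('.')) {
    request.uri = uri + '/index.html';  // /about  -> /about/index.html
  }
  return request;
}

console.log(rewriteOnly({ request: { uri: '/about' } }).uri);  // /about/index.html
console.log(rewriteOnly({ request: { uri: '/about/' } }).uri); // /about/index.html
```

Both viewer URLs return the same object, but the browser's address bar, the cache and analytics still see two distinct URLs.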

Though Google has stated this won't affect SEO ranking, it can still affect things like caching, URL preference in search results, analytics, etc. And we don't really know if every search engine on the internet does a good job of consolidating this duplication.

So instead of sacrificing performance and risking SEO degradation, let's look at how to fix it.

Gluing up the second crack#

There aren't many unique solutions to this problem, so let's just fix up AWS' CloudFront function ourselves.

The right way to do this is using 301 redirects.

I found a nice small gist that handles both scenarios (and some more) which helps make CloudFront hosting more robust with SPA / Static Pages.

However, if you are using CloudFront Functions instead of Lambda@Edge, you will need to adjust some of the statements to work with the event and response structures that CloudFront Functions use.
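The sketch below shows the same 301 redirect built for each event shape, to highlight the main structural differences. The function names are made up, these are response builders rather than complete handlers, and the CloudFront Functions version ignores multi-value query parameters for brevity.

```javascript
// Lambda@Edge: the request is nested under Records[0].cf, the query string is
// a plain string, and responses use `status` (a string) plus header arrays.
function redirectLambdaEdge(event) {
  var request = event.Records[0].cf.request;
  var target = request.uri + '/' + (request.querystring ? '?' + request.querystring : '');
  return {
    status: '301',
    statusDescription: 'Moved Permanently',
    headers: { location: [{ key: 'Location', value: target }] }
  };
}

// CloudFront Functions: the request is at event.request, the query string is
// an object keyed by parameter name, and responses use `statusCode` (a number)
// plus one object per header.
function redirectCloudFrontFunction(event) {
  var request = event.request;
  var query = Object.keys(request.querystring)
    .map(function (key) { return key + '=' + request.querystring[key].value; })
    .join('&');
  var target = request.uri + '/' + (query ? '?' + query : '');
  return {
    statusCode: 301,
    statusDescription: 'Moved Permanently',
    headers: { location: { value: target } }
  };
}
```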

I have hosted a gist of the CloudFront function adaptation. Here's the raw code:

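function handler(event) {
  var request = event.request;
  var oldUri = request.uri;

  // Treat a path as a file if its last segment has an extension (e.g. /css/site.css).
  var isFilePath = /\/[^/]+\.[^/]+$/.test(oldUri);

  // Redirect /about to /about/ so every page has a single canonical URL.
  if (!isFilePath && !oldUri.endsWith('/')) {
    var query = serializeQuery(request.querystring);
    var newUri = query ? oldUri + '/?' + query : oldUri + '/';
    return {
      statusCode: 301,
      statusDescription: 'Moved Permanently',
      headers: { location: { value: newUri } }
    };
  }

  // Rewrite /about/ to /about/index.html before the request reaches S3.
  request.uri = oldUri.replace(/\/$/, '/index.html');
  return request;
}

// CloudFront Functions expose the query string as an object, not a string,
// so it has to be serialized before it can be appended to the redirect URL.
function serializeQuery(query) {
  var parts = [];
  Object.keys(query).forEach(function (key) {
    var values = query[key].multiValue || [query[key]];
    values.forEach(function (item) {
      parts.push(item.value === '' ? key : key + '=' + item.value);
    });
  });
  return parts.join('&');
}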

With this we managed to improve CloudFront - S3 integrations for static web hosting.

Till the next post, Arrivederci!