Fixing CloudFront's issues with static sites
Let's analyze two common problems with CloudFront static websites.
CloudWhat?
CloudFront is AWS's CDN (content delivery network). In other words, it sits in front of your content and delivers it to users from locations close to them.
S3 is just a file store, but it can also host static websites. This works great, honestly: you get low-latency static websites without any complications. But S3 website hosting lacks most of the features required for a modern website, with HTTPS, caching and DDoS protection to name a few.
This is where our content delivery network comes in: it brings all these features in a single service.
Then what's wrong with it?
It does everything it says, but there are two issues I faced when setting up a static website on it.
The first one seems to be an issue faced by (almost?) everyone.
It has to do with the default root object, which is the object returned when a user requests a directory path (/about/) instead of an actual object (/about/contact.html).
In S3, you can set an index document that gets used for all subdirectories, but in CloudFront the default root object works only for the root directory. As the AWS documentation puts it:
"The behavior of CloudFront default root objects is different from the behavior of Amazon S3 index documents. When you configure an Amazon S3 bucket as a website and specify the index document, Amazon S3 returns the index document even if a user requests a subdirectory in the bucket."
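To make the difference concrete, here is a toy model of the two lookup rules. It does not call AWS; the function names and the "index.html" setting are assumptions made for illustration, and the S3 side is simplified to paths that end in a slash.

```javascript
// Toy model only: imitates the lookup rules described above, no AWS calls.

// S3 website endpoint: the index document applies to every "directory" path.
function s3WebsiteKey(path, indexDocument) {
    var key = path.replace(/^\//, ''); // S3 keys have no leading slash
    if (key === '' || key.endsWith('/')) {
        return key + indexDocument;
    }
    return key;
}

// CloudFront in front of a plain S3 bucket: the default root object
// is substituted only when the request is for the root path "/".
function cloudFrontKey(path, defaultRootObject) {
    if (path === '/') {
        return defaultRootObject;
    }
    return path.replace(/^\//, '');
}

console.log(s3WebsiteKey('/about/', 'index.html'));  // about/index.html
console.log(cloudFrontKey('/', 'index.html'));       // index.html
console.log(cloudFrontKey('/about/', 'index.html')); // about/ (no such object, so the request fails)
```

The last line is the whole problem: CloudFront asks S3 for an object literally named about/, which does not exist.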
Gluing up the first crack
People have found various workarounds for this, and AWS themselves have suggested a few - let's take a look at some of these and see what would fit best for each use case.
> Let the world see
The most commonly used fix is to make your S3 bucket public, enable S3 static website hosting, and then point your CloudFront distribution to the S3 website endpoint instead of the S3 bucket itself.
This is good because
- There's some consistency in your URLs (you'll understand why I'm mentioning this later)
- No extra overhead to maintain
But then again, it's not good because
- Anyone who knows (or guesses) the name of your bucket can bypass CloudFront, DDoS the bucket directly and send your AWS bill skyrocketing (CloudFront offers basic DDoS protection only on its own URL).
- If you have some non-public content on the bucket, selective authorization gets complex.
> You can't fool me
If the problem is that CloudFront keeps trying to access the path as an object in S3 itself - why not do exactly that?
This involves creating a script which copies each index.html into another object with the same name as the folder.
E.g. in S3 you would have both a 'folder' called about and an object called about, which has the same content as about/index.html
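As a rough sketch of the planning half of such a script: the helper below (a hypothetical name, not from any AWS tool) takes the list of object keys in the bucket and works out which copies are needed. The actual copying via the AWS CLI or SDK is left out on purpose.

```javascript
// Hypothetical helper: given the object keys in a bucket, list the copies
// needed so that every "<folder>/index.html" also exists as "<folder>".
function planFolderCopies(keys) {
    var suffix = '/index.html';
    return keys
        .filter(function (key) { return key.endsWith(suffix); })
        .map(function (key) {
            // Strip "/index.html" to get the folder name used as the new key.
            return { from: key, to: key.slice(0, -suffix.length) };
        });
}

var plan = planFolderCopies([
    'index.html',            // root: already covered by the default root object
    'about/index.html',
    'about/contact.html',
    'blog/2020/index.html'
]);
console.log(plan);
// [ { from: 'about/index.html', to: 'about' },
//   { from: 'blog/2020/index.html', to: 'blog/2020' } ]
```

Since the new keys have no file extension, it is worth checking that each copy keeps a text/html content type, otherwise browsers may download the page instead of rendering it.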
This is good because
- It works
But then again, it's not good because
- It's hacky, relative links will get tricky
- You need to do this every time you change your content
- Duplicate content! (discussed below)
In short, I would just ignore this approach completely - but that's a personal preference.
> You go where I tell you
AWS has suggested another way to do this.
It involves using CloudFront Functions or Lambda@Edge to run a small piece of code on each request, which rewrites the requested URL to the correct object path before it reaches S3.
This is good because
- Your resources are still private & secure
But then again, it's not good because
- It's an (albeit small) additional component to maintain on AWS
- Your URL consistency goes for a toss
This is because the function given by AWS serves the same page for both /about and /about/, so every page ends up living at two different URLs.
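The behavior is easy to see in a sketch modeled on that kind of rewrite function. This is an approximation written for illustration and the code AWS publishes may differ in detail; the events at the bottom are hand-built stand-ins for what CloudFront would pass in.

```javascript
// Approximation of an "append index.html" viewer-request function.
function handler(event) {
    var request = event.request;
    var uri = request.uri;

    if (uri.endsWith('/')) {
        // "/about/" -> "/about/index.html"
        request.uri += 'index.html';
    } else if (!uri.includes('.')) {
        // "/about" (no file extension) -> "/about/index.html"
        request.uri += '/index.html';
    }
    return request;
}

// Local check with hand-built events.
console.log(handler({ request: { uri: '/about' } }).uri);  // /about/index.html
console.log(handler({ request: { uri: '/about/' } }).uri); // /about/index.html
```

Both requests succeed and return the same object, but the browser's address bar (and every crawler and cache) still sees two distinct URLs.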
Though Google has said this won't affect SEO ranking, it can still affect things like caching (the same page gets cached under two URLs), URL preference in search results, analytics etc. And we don't really know if every search engine on the internet does a good job of consolidating this duplication.
So instead of accepting worse cache performance and possible SEO degradation, let's look at how to fix it.
Gluing up the second crack
There aren't many unique solutions to this problem, so let's just fix up AWS' CloudFront function ourselves.
The right way to do this is using 301 redirects: pick one form of the URL as the canonical one and permanently redirect the other form to it.
I found a nice small gist that handles both scenarios (and some more) which helps make CloudFront hosting more robust with SPA / Static Pages.
However, if you are using CloudFront Functions instead of Lambda@Edge, you will need to adjust some of the statements to match the event structure that CloudFront Functions use.
I have hosted a gist of the CloudFront function adaptation. Here's the raw code:
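What follows is not the gist verbatim, just a minimal sketch of the same idea for a CloudFront Functions viewer-request handler. It assumes the trailing-slash form (/about/) is the canonical URL and that any last path segment containing a dot is a real file; serializeQuery is a hypothetical helper that assumes query keys and values can be re-joined as received.

```javascript
// Sketch only. Assumption: URLs ending in "/" are the canonical form.

// Turn the querystring object on the event back into "?a=1&b=2".
function serializeQuery(querystring) {
    var parts = [];
    Object.keys(querystring || {}).forEach(function (name) {
        var entry = querystring[name];
        var values = entry.multiValue || [entry];
        values.forEach(function (item) {
            parts.push(item.value === '' ? name : name + '=' + item.value);
        });
    });
    return parts.length > 0 ? '?' + parts.join('&') : '';
}

function handler(event) {
    var request = event.request;
    var uri = request.uri;

    // Canonical form: "/about/" is served from "/about/index.html".
    if (uri.endsWith('/')) {
        request.uri = uri + 'index.html';
        return request;
    }

    // Last segment contains a dot, e.g. "/css/site.css": treat it as a file.
    var lastSegment = uri.substring(uri.lastIndexOf('/') + 1);
    if (lastSegment.indexOf('.') !== -1) {
        return request;
    }

    // "/about" -> 301 to "/about/". Leading slashes are collapsed so the
    // Location header can never become a protocol-relative URL ("//host").
    var target = uri.replace(/^\/+/, '/') + '/';
    return {
        statusCode: 301,
        statusDescription: 'Moved Permanently',
        headers: {
            location: { value: target + serializeQuery(request.querystring) }
        }
    };
}
```

With this in place only /about/ ever returns content, so caches, analytics and search engines see a single URL per page. Redirecting the other way (stripping the slash) would work just as well, as long as one form is chosen and used consistently in links.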
With this we managed to improve CloudFront - S3 integrations for static web hosting.
Till the next post, Arrivederci!