Surviving the Slashdot Effect: Caching Web Traffic with Rails and Cloudflare

So you've got a content-oriented website, maybe your own blog or something, and maybe you've (like me) decided to ignore the advice about using static site generators. You build your Rails site, and it is wonderful and beautiful and dynamic, and every page it replies with delights readers with your artisanally-crafted HTML. Maybe you've got some internal caching (Rails has you covered here), maybe it's all roundtrips to the database. But who cares! Your site is up and receiving traffic!

Then, suddenly, a storm hits. Congratulations! You've made it to the top of Reddit/Slashdot/Hacker News! You now have thousands, if not millions, of people beating down your door to read your content. But now your site is down! The link's comment thread is filling up with complaints about being hugged to death, and a few helpful souls are posting the archive.org equivalent of your link and siphoning away your traffic.

How do we fix this?

You could throw more compute resources at it -- think "scaling vertically/horizontally" -- which I'm sure your server/application host would ab$olutely love.

Or,
you could install some sort of proxy cache in front of it. The traditional answer here is to use nginx or Varnish as a caching proxy, but if you use a content delivery network (such as Cloudflare) it may be better to use that CDN's caching features instead. (Some might recommend using both your own cache and your CDN's cache, but I wouldn't advise this because troubleshooting cache issues is already difficult enough, and having multiple layers only makes debugging even more confusing. If you do this, you should understand your web application thoroughly.)

Since this site is fronted by Cloudflare, I want to make use of its page cache: it's free and comes with the service!

However, setting this up is not as simple as it may first appear: in a default configuration, Rails doesn't permit caching (the Cache-Control headers it sends don't allow for it), and as a result, nearly every request you receive bypasses the cache and gets passed directly to the app server. This is a screenshot of my Cloudflare dashboard showing the percentage of page requests cached before I applied the fixes I describe here (those peaks top out at ~10%):

Uh.... that's not very good!


Now, you can set up rules in the Cloudflare dashboard to override Rails' requested caching behavior, but this does not solve the underlying root cause: Rails is requesting no caching, because the Cache-Control request header it sets explicitly forbids it:

Cache-Control: NO CACHE!


Setting the Correct Cache-Control Headers with Rails

NOTE: The directions given here apply to Ruby on Rails version 7, though expires_in and fresh_when have existed since at least version 2, and concerns have been available since version 4.

Luckily, Rails makes changing this behavior fairly simple. We don't even need to really dive into how Cache-Control works! (Though here is a good guide if you want to know.) You simply call the expires_in and/or fresh_when functions in your controller, supplying an expiry and ensuring that you set public to true. Like this:

expires_in 1.hour, public: true
# or
fresh_when(@article, public: true)

However, setting this for every route is both tedious and a pretty egregious violation of DRY. Instead, we can set as much as we can once and then propagate it through our application using either class inheritance or composition (via ActiveSupport's concerns) feature. And while inheritance may be slightly easier, composition is a bit more modern and flexible; here we will be taking the latter approach.

To start, we will want to make a new concern and call it "Cacheable". The easiest way to do this is to simply go to the $RAILS_ROOT/app/controllers/concerns folder and create a new file, naming it cacheable.rb. In this file, we want to make one small action (called "set_cache_headers") and call expires_in within it. Here is a very basic and usable example, which also prevents page caching when a user is logged in:

# app/controllers/concerns/cacheable.rb
module Cacheable
  extend ActiveSupport::Concern

  included do
    before_action :set_cache_headers, unless: current_user
  end

  def set_cache_headers
    expires_in 1.hour, public: true
  end

end

Then, for each controller whose content you wish to cache, simply add "include Cacheable" at the top right below the class declaration. Here is an example pulled directly from this site's code, for the controller that powers the "About" feature:

# app/controllers/static_controller.rb
class StaticController < ApplicationController
  include Cacheable
  def about
    @page_title = "About"
  end
end

Once this is done you will see that Cache-Control is indeed being set correctly:

Objective achieved!


But! You are not finished yet! You may notice that while your 'cached' stats are going up, they aren't going up as much as one might think. This is because there is another component to page caching that we have not yet discussed: etags.

Setting the Correct Etag Headers with Rails

This is where things get a bit more tricky: Rails generates another header, called an Etag, that, in theory, is supposed to be unique for each page.  (For the more technically inclined, you can think of an Etag as like a SHA256 hash for your page.) But Rails, by default, makes this tag unique per request. Both your browser cache and your CDN cache read this header to determine whether a given request is a cache hit or cache miss, and so we will need to configure Rails' to set it correctly, based on our rendered page content (or other context).

Enter fresh_when, which provides further direction to Rails on how to render the correct etag header. You provide it with an object that describes what the page renders -- generally the model instance for the given page (the Rails docs use @article in their examples) -- and it generates a hash that is used for the Etag header.

Using fresh_when with Dynamic Pages

For dynamic pages, such as the example described above and in the Rails docs, simply call fresh_when and pass the it your model instance as its first parameter, inside the controller route. Like so:

# app/controllers/articles_controller.rb
def show_article
  fresh_when @article
end

When combined with aforementioned Cacheable, this is sufficient to avoid page-caching in the case of logged-in users, as the expires_in directive is never called when current_user exists, and Rails reverts to its default, zero-expiry "private" cache behavior.

If you aren't using Cacheable, as described above, you will need to consult the documentation as you need to provide additional information.

Using fresh_when with Static Pages

Static pages are a bit more tricky here, as the examples in the docs do not cover this circumstance. Once again by default the etag will always be different, so we need to pin it to something, yet appropriate data (namely, the path to the view and a timestamp for when it was last updated) isn't really available to our controller. And, like above, we don't want it to get mixed-up when the site is being viewed by a logged-in user vs. the anonymous public.

I haven't figured out a great solution here, but we can build a string using params and current_user and set that as the value of our etag, and it should work well enough for our purposes. But! You will need to manually purge the cache on your CDN when these pages are updated, or wait for them to expire. For this case, a short expiry (say, 15 minutes) is useful here.

(Note that if we can get information about the last-modified timestamp of the template to be rendered, we could include that data in the etag so it would naturally invalidate when it is updated, but I don't know of a non-hacky way to do this and some preliminary research in this area yields nothing.)

So, we craft another concern, this time called StaticCacheable, and include this in each controller serving static content. Once again, like Cacheable, this is a controller-level solution; if you need something per-action that is an exercise left up to you.

To make this concern, create a new file called static_cacheable.rb and save it to your $RAILS_ROOT/app/controllers/concerns folder, right next to cacheable.rb. Note that we will include a reference to Cacheable from within StaticCacheable, so that you only need to include StaticCacheable on your static controllers. In the action it defined we simply grab route information from params and feed that into fresh_when:

# app/controllers/concerns/static_cacheable.rb
module StaticCacheable
  extend ActiveSupport::Concern

  included do
    include Cacheable
    before_action :set_static_cache_headers
  end

  def set_static_cache_headers
    etag =  "#{params[:static]}#{params[:about]}#{current_user}"
    fresh_when etag: etag
  end
end

Note that including current_user makes the Etag different every for request, when current_user is logged in. This is because fresh_when will coerce the string representation of current_user (via an implicit .to_s), which will always vary because that string includes current_user's internal object-id (NOT its database id), which varies with each request.

Finally!

Once your Cache-Control and Etag headers are under control and correctly set, and you have correctly configured your proxy service, your site should be well-equipped to handle large volumes of traffic without falling over. Hurray!

A Quick Note About Cloudflare and Implementing This

It's worth noting that Cloudflare seems to strip the Etag header when Cache-Control renders it useless, as is the case when Cache-Control is set to private. This may seem annoying but it punches out your browser cache, presumably to ease troubleshooting.

You can still see the Etag header if you pull requests directly from your webserver (by its IP address), and it will also be visible during development. Unfortunately, it seems like you will have to rely on unit tests or Cloudflare's provided statistics to verify your cache strategy is working.

sui generis.

Lyjia's Blog

See posts by category: