Curious (Clojure) Programmer Simplicity matters

October 10, 2022

Feature flags, Middlewares, and Cloudfront caching.

Table of Contents

All (animals) are equal, but some (animals) are more equal than others
Feature flags and Middlewares
The fix
Takeaways
References

This post has also been published on the CodeScene Engineering blog.

In our staging environment, we recently found an odd issue: it was returning blank pages - for some people, sometimes. Unable to reproduce the problem on their machines, developers started suspecting an infrastructure issue.

We noticed that when the problem happened, it was caused by a missing JavaScript bundle; the server responded with 200 OK and an empty file rather than returning a 404. This prompted further investigation …

All (animals) are equal, but some (animals) are more equal than others

I heard from a couple of colleagues that they were not able to reproduce the issue on their machine, but I tried it anyway. And voila!

This was really strange - the server was returning an empty 200 OK response for any non-existent route. I was pretty sure it had been returning proper 404s until recently, so a recent change must have altered this behavior.

At least I had found a way to reproduce the problem on my machine.

Feature flags and Middlewares

I remembered seeing a small change in our Ring middleware chain lately. It was related to our new REST API feature that we had just released. I jumped into the code and focused on the API middleware.

Eventually, I stumbled upon code that looked like this:

(defn wrap-api [handler]
  (-> handler
      wrap-logging
      wrap-auth
      wrap-common
      wrap-no-cache))

wrap-no-cache was a recent addition, so I looked at it:

(defn- wrap-no-cache [handler]
  (fn [request]
    (-> (handler request)
        util/no-cache-response)))

Hmm… Could that cause a problem? I added some debug prints and it became obvious that this was the problematic piece.

Ring’s middlewares

A middleware is simply a higher-order function that wraps a request handler and adds some functionality.

A simple middleware adding a key to the request map may look like this:

(defn wrap-user [handler]
  (fn [request]
    (if-let [user-id (-> request :session :user-id)]
      (let [user (get-user-by-id user-id)]
        (handler (assoc request :user user)))
      (handler request))))

The function wrap-no-cache shown earlier is a middleware that calls util/no-cache-response, which inserts the Cache-Control header to make sure that the client doesn’t cache the response.

The trouble is that no-cache-response returns a response map even if (handler request) returns nil [1], thus turning a Not Found response into an OK response!
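To see how a nil can quietly become a response map, here is a minimal sketch. The body of no-cache-response below is an assumption (the real util/no-cache-response is not shown in this post); it only illustrates that assoc-in treats nil as an empty map.

```clojure
;; Hypothetical stand-in for util/no-cache-response; the real helper is not shown here.
(defn no-cache-response [response]
  (assoc-in response [:headers "Cache-Control"] "no-store"))

;; A response for a matched route keeps its status and gains the header.
(no-cache-response {:status 200 :headers {} :body "ok"})
;; => {:status 200, :headers {"Cache-Control" "no-store"}, :body "ok"}

;; nil (no route matched) turns into a brand-new map instead of staying nil.
(no-cache-response nil)
;; => {:headers {"Cache-Control" "no-store"}}
```

The second result is no longer nil, so nothing further out in the chain can tell that no route matched. That would be consistent with the empty 200 OK responses we observed.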

Feature flags or "why is it always me?"

We were using a feature flag (disabled by default) for the new REST API functionality. And the whole API middleware was only activated if the flag was on.

I had the flag enabled because I had been testing the REST API earlier, but my colleagues hadn’t used it yet, so everything was working for them just fine. We had also enabled the flag on staging a while ago to be able to test the feature continuously before releasing it to production. So this feature flag was at least part of the equation.
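As an illustration of how a flag can switch a whole middleware chain on or off, here is a rough sketch. The names api-enabled?, build-app and the :rest-api-enabled? key are hypothetical, and wrap-api is only a placeholder for the real chain; our actual flag mechanism is not shown in this post.

```clojure
;; Hypothetical flag lookup; the real feature-flag mechanism is not shown in this post.
(defn api-enabled? [config]
  (boolean (:rest-api-enabled? config)))

;; Placeholder for the wrap-api chain shown earlier; it tags the response
;; so that the effect of the wrapping is visible.
(defn wrap-api [handler]
  (fn [request]
    (some-> (handler request)
            (assoc :api-middleware? true))))

;; Wrap the handler only when the flag is on, so two code paths exist side by side.
(defn build-app [handler config]
  (cond-> handler
    (api-enabled? config) wrap-api))

(def base-handler (fn [_request] {:status 200 :body "hello"}))

((build-app base-handler {:rest-api-enabled? true}) {})
;; => {:status 200, :body "hello", :api-middleware? true}

((build-app base-handler {}) {})
;; => {:status 200, :body "hello"}
```

With a setup along these lines, a bug inside the wrapped chain is invisible to anyone running with the flag off, which matches what my colleagues experienced.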

But why was it sometimes working, for some people, on staging? That was still strange and demanded a closer look.

Cloudfront caching

Mind your layers

Modern development might be tricky because there are typically many layers to be aware of, for instance:

your application code
database
3rd party services
the app (web) server
proxies and load balancers
firewalls
CDNs

These layers are useful: they add robustness and decouple concerns. Debugging, however, becomes more challenging.

Cloudfront

We host all our infrastructure on AWS, and in front of it, there’s Cloudfront - a managed CDN service. It’s the usual suspect when it comes to caching problems.

My colleague noticed that with curl everything looked fine, but in the browser we were getting empty pages, intermittently.

We knew that the empty responses were due to non-existent bundle files being requested and our app returning an OK response instead of Not Found. But why did (some) clients keep requesting those non-existent files?

Here’s a summary of the first hypothesis trying to explain the behavior:

During a deployment of a new app version, the (two) web application nodes are updated one by one.
The client requests the /index.html file from the server and it gets served by the node (B) that already has the latest version of the app.
The client parses the HTML response, and requests the other resources, including JS and CSS bundles - but this time, they are served from the node (A) still having the older version of the app.
The bundle files that the new version of the app is using are not available on the node hosting the older version - it would normally yield 404 but due to the bug in our code, it returns 200 OK.
Such an empty response breaks the app and it manifests as a blank page in the browser.

Having said that, it was still unclear to me why clients would keep requesting the old bundles, even after the application deployment was finished. Even though wrap-no-cache was broken, the responses it returned should have been non-cacheable (no-store value in the Cache-Control header).

Fortunately, a colleague of mine (thanks Simon!) shed some light on it. He noticed we had a special nginx configuration for JS & CSS files:

# CSS and Javascript, Fonts
location ~* ^(?!/docs)/.*\.(?:css|js|woff|woff2|ttf|eot|otf)$ {
    expires         1y;
    access_log      off;
    add_header      Cache-Control "public";
    proxy_pass      http://127.0.0.1:5000;
}

This finally explained it! While the app was returning a no-store Cache-Control header, the caching behavior was overridden by nginx and the cache expiration time was set to 1 year. So even if the user did a hard refresh in their browser, Cloudfront still served the cached version of the non-existent/empty JS & CSS resources.

The fix

Fixing the problem was really simple - just use some-> to make sure nil is not unexpectedly turned into a map:

(defn- wrap-no-cache [handler]
  (fn [request]
    (some-> (handler request)
            util/no-cache-response)))
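The difference between the two versions can be checked at the REPL. This is a sketch under assumptions: no-cache-response is a hypothetical stand-in for util/no-cache-response, and handler is a toy handler that knows a single route and returns nil for everything else.

```clojure
;; Hypothetical stand-in for util/no-cache-response.
(defn no-cache-response [response]
  (assoc-in response [:headers "Cache-Control"] "no-store"))

;; Original version: -> always calls no-cache-response, even with nil.
(defn wrap-no-cache-broken [handler]
  (fn [request]
    (-> (handler request)
        no-cache-response)))

;; Fixed version: some-> short-circuits and returns nil when the handler does.
(defn wrap-no-cache-fixed [handler]
  (fn [request]
    (some-> (handler request)
            no-cache-response)))

;; Toy handler: one known route, nil for anything else.
(defn handler [request]
  (when (= "/api/ping" (:uri request))
    {:status 200 :headers {} :body "pong"}))

((wrap-no-cache-broken handler) {:uri "/missing.js"})
;; => {:headers {"Cache-Control" "no-store"}}

((wrap-no-cache-fixed handler) {:uri "/missing.js"})
;; => nil

((wrap-no-cache-fixed handler) {:uri "/api/ping"})
;; => {:status 200, :headers {"Cache-Control" "no-store"}, :body "pong"}
```

A check like the second call, asserting that an unknown route still yields nil after wrapping, is cheap to keep as a regression test.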

Takeaways

Feature flags are nice, but remember: they multiply the number of paths through the application and the number of configurations to test, making it harder to spot and reproduce problems.
There are often multiple contributing factors under the hood of a tricky problem, not a single "root cause".

References

1. Ring interprets nil as Not Found and returns a 404 response in that case.

Tags: clojure-ring clojure http caching
