Curious (Clojure) Programmer: Weekly Bits 07/2022 - clojure.set/union gotcha, memory "leaks" using eval, EC2 Instance Connect, and CloudFront vs Host header

Clojure

Verbose namespace loading

Nice tip for "debugging" clojure namespace loading from hiredman on slack: https://clojurians.slack.com/archives/C03S1KBA2/p1646782587521059

;; turn it on everywhere
(alter-var-root #'clojure.core/*loading-verbosely* (constantly true))

Note: there’s also :verbose option for require

Clojure gotcha: clojure.set/union

It’s relatively well-known that clojure.set/union produces garbage when its inputs are not sets, but it can easily cause surprises and bad behaviors.

I found a production bug caused by duplicates not being removed when one of the inputs was a vector. This is a simple demonstration:

  (set/union [1 2 3] #{3 4 5})
  ;; => [1 2 3 4 3 5]

In our case, it was harder to spot because the vector was produced in a completely different namespace.

Be careful!

Or check Andy Fingerhut’s library: funjible.

Learn Reagent

Progressing with the course, I started playing with Firebase.

I also learned about #js as an alternative to clj→js. (note that #js is "shallow" - it does not recursively convert the whole data structure).

Clojure: memory leaks using eval

I noticed a StackOverflow question and it got me curious. In there, they claim that continuous calls to eval lead to OutOfMemoryError due to classes produced by the evaluation. Moreover, they say that such classes are not garbage collected: By itself, JVM will not get rid of new classes.

This is not correct. GC works the same way as for any other object. It will gargabe collect classes if they are no longer reachable. I suspect that what they might be seeing is nREPL keeping instances of DynamicClassLoader alive: "In a nREPL client, instances of DynamicClassLoader keep piling up."

I confirmed that this is not an issue with a regular deps.edn based project. See my stackoverflow answer and the link that contains some profiling details (and images of better quality :) ):

Profiling memory of Clojure code calling 'eval' in a loop

AWS & Cloud

EC2 Instance Connect & SSM Manager

A great article from the Cloudonaut crew: Connect to your EC2 instance using SSH the modern way.

I jumped on the wagon and started using this technique which allows me to get rid of our bastion hosts altogether.

IAM Policies and Terraform variable interpolation

While working on the transition to EC2 Instance Connect, I found that interpolating variables inside IAM policy documents requires special attention with Terrafrom.

Instead of the usual ${aws:username} you need to use double-dollar sign, that is $${aws:username}:

module "ssm-sessions-policy" {
  source      = "terraform-aws-modules/iam/aws//modules/iam-policy"
  ...
  policy      = <<EOF
  ...
                "Resource": [
                "arn:aws:ssm:*:*:session/$${aws:username}-*"
            ]

Setting up a new CloudFront distribution and the Host header

I made a brand new CloudFront distribution for one of our public-facing web applications that was still not using CloudFront. The advantage is that we can use the same WAF rules we use everywhere and of course improved performance and security in general.

I learned, that it is important to pass the Host header to the origin server to make redirects work. For instance, our public facing server lives at portal.codescene.com - this is a CloudFront distribution that points to origin-portal.codescene.com which is a Beanstalk load balancer.

Without passing the Host header, redirects made by the application were broken, because they were redirecting to origin-portal.codescene.com, instead of portal.com. It wouldn’t be a problem if we "relative" redirects work properly - but they don’t: many web servers still rewrite relative redirect URLs to absolute URLs because of how former HTTP redirects spec was defined (Wikipedia: HTTP location).

Checkout AWS support article: How do I configure CloudFront to forward the Host header to the origin?

MISC

On the complexity of Kafka and alternative messaging systems

There was a good discussion on Clojurians slack about Kafka and alternative messaging systems.

In particular, how RabbitMQ can be easily used for consuming messages even after they have been published (assuming the consumers were offline at the time):

lukasz: one of the primary use cases is exactly that: accepting webhooks and processing them out of band.
I forget how many millions of msg/s we’re processing right now with a mid-size managed instance in AWS. You can do that with Rabbit just fine - set your queues to be durable and persistent. In fact, that’s how all our consumers operate - we have open sourced our framework for safe RMQ consumers: https://github.com/nomnom-insights/nomnom.bunnicula/

— lukasz on Clojuarians slack

Weekly Bits 07/2022 - clojure.set/union gotcha, memory "leaks" using eval, EC2 Instance Connect, and CloudFront vs Host header