Curious (Clojure) Programmer Simplicity matters

Menu

May 3, 2022

Bits & Pieces 09/2022 - April

Table of Contents

Clojure
Java / JVM
Architecture
- Architecture: The Hard Parts (a book by Neal Ford and Mark Richards)
AWS & Cloud
- CloudFront’s Server-Timing headers
- Serving content only to logged-in users with CloudFront Signed Cookies
Security
- Critical JDK vulnerability in the ECDSA verification algorithm (CVE-2022-21449)
- Antivirus or not?
Reading (Books)
Writing (Blog, Articles)
- Writing effective technical documentation
- Typing exercises
MISC
Links

This time, it is more of a "Monthly bits" piece because I didn’t have time to publish regular weekly updates in April.

Clojure

Broken HTTP/S redirects, friend, and ring

One of our on-premise customers reported a problem with redirects when running CodeScene behind a proxy server terminating HTTPS. In such cases, we’d always recommended our clients to configure their proxy server to rewrite redirect URLs.

However, my colleague wasn’t satisfied with this solution and looked at the http routing machinery we use. They found that friend, a Clojure authentication library, enforced absolute URLs for redirects even though we configured it to use absolute URLs.

I found that ring does the same thing, but its behavior can be customized.

A long time ago, the old HTTP spec mandated using absolute URLs for redirects. This is no longer valid and it’s completely fine to use relative URLs for redirects. See https://en.wikipedia.org/wiki/HTTP_location.

I submitted a pull request to friend’s repo to get rid of enforcing absolute URLs for redirects and configured ring-defaults in our app properly:

(defn not-enforce-absolute-redirects [defaults]
  (assoc-in defaults [:responses :absolute-redirects] false))

(def my-app
  (-> handler
      (wrap-defaults (-> site-defaults not-enforce-absolute-redirects))))

This fixed the customer’s problem.

REPL meets the businesss

A programmer shared a nice experience report on Clojurians slack about using the REPL when talking to a business person:

Their eyes widened as I was able to use Calva’s Shift + Option + Enter keyboard shortcut to evaluate each “step” of the threading macro to see the result. Creating “verbs” as functions and stringing them together with threading macros—evaluating each “step” in the REPL—is a powerful “scratchpad/whiteboard” for showing and explaining things to non-technical people.
Had they asked a few months ago, I would have used Python in Jupyter Notebook.And they were in awe of Clojure/VS Code/Calva, saying things like: “What is that tool you’re using?! That’s amazing. I need something like that.
”Reminder to self: Show, don’t tell. And don’t assume that non-technical people won’t understand what’s going on

clj-ddd-example

didibus published clj-ddd-example:

An example implementation of Domain Driven Design in Clojure with lots of information about DDD and its concepts and how it’s implemented in Clojure along the way.
discussion on clojureverse: Domain Driven Design (DDD) in Clojure, an example implementation

Moving files between file systems

I had troubles with the me.raynes.fs/rename function not moving files across file system boundary and simply returning false.

This is actually a problem with the underlying java.io.File method renameTo. It’s one of the many shortcomings of the legacy Java file I/O.

The solution was to use the Java NIO.2 API:

(defn rename-file
  "Moves source file to the target file while respecting `replace-existing?`, by default true.
  `source` and `target` are expected to be a string, java.io.File or java.nio.file.Path.
  This uses java.nio.file.Files/move to overcome shortcomings of old `java.io.File/renameTo` method,
  notably when moving files between different file systems:
  - https://docs.oracle.com/javase/tutorial/essential/io/legacy.html
  - https://www.baeldung.com/java-path-vs-file"
  ([source target]
   (rename-file source target true))
  ([source target replace-existing?]
   (Files/move (.toPath (io/file source))
               (.toPath (io/file target))
               (into-array (if replace-existing?
                             [StandardCopyOption/REPLACE_EXISTING]
                             [])))))

The new function moves the file properly (and if it does not, it will throw an exception).

Serializing & De-serializing Clojure objects

2008 discussion with a couple of comments from Rich Hickey to understand the difference between print-method and print-dup:

Rich: Yes, please consider print/read. It is readable text, works with a lot of data structures, and is extensible.
As part of AOT I needed to enhance print/read to store constants of many kinds, and restore faithfully - This led to a new multimethod - print-dup, for high-fidelity printing.
You can get print-dup behavior by binding print-dup

I have a full example here: https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/compiler/reader-and-printer.clj#L34-L91. Here’s a snippet of it:

(binding [*print-dup* true]
  (dorun
   (map prn
        [[1 2 3]
         {4 5 6 7}
         (java.util.ArrayList. [8 9])
,,,
;; prints this:
;; [1 2 3]
;; #=(clojure.lang.PersistentArrayMap/create {4 5, 6 7})
;; #=(java.util.ArrayList. [8 9])

lsp4clj

Eric Dallo announced a new lib lsp4clj that should help you create any LSP for any language in Clojure.

Charred - new JSON & CSV parsing library with zero dependencies and very fast

Chris Nuernber introduced a brand new library for parsing JSON and CSV. It’s very fast and importantly, doesn’t have any dependency on Jackson.

See Clojurians slack discussion.

Verbose reloading of current clojure namespace

A useful tip from Clojurians slack shows how to reload and print all dependencies of a namespace:

(require (ns-name *ns*) :reload :verbose)

Pipe buffers

I experimented a bit with pipe buffers.

Pipes are the most common way to read output of a sub-process. It is important to read the sub-processe’s output promptly, though. If you fail to do so, the sub-process will block if its output is larger than max pipe buffer size, which is typically 64 KB.

Java / JVM

OutOfMemoryError: unable to create a new native thread

Sometimes, you may experiences a rather odd "OutOfMemoryError" with the detail error message saying something like this:

Exception in thread "async-dispatch-4" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached

This is probably a symptom of something else. It could be insufficient system memory, but it’s more likely that you are hitting the 'max number of open files' limit.

Here are some things to try: How to solve java.lang.OutOfMemoryError: unable to create new native thread

check threads-max (system-wide limit on max number of threads): sysctl kernel.threads-max
check ulimit -n (number of open files)
count processes & threads: ps -elfT | wc -l
count threads for given process: ps -p $(pgrep java) -lfT | wc -l
check PID limit: sysctl kernel.pid_max

Basic Java/Linux monitoring script:

As a follow-up to the previous OOM topic, I came up with a basic Java/Linux monitoring script.

Run it like this:

nohup watch -n 20 './watch.sh >> monitor.log' &> /dev/null &

Prevent Java from executing sub-processes

I answered this StackOverflow question: How to prevent a Java application from executing processes on GNU/Linux?

In short, I don’t think this is easily achievable. Security Manager used to be a tool to solve such problems but it’s deprecated for removal. If you try to limit the number of processes executed by the JVM process you will run into serious issues because threads themselves are counted as processes.

[9816123.415s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 0k, detached.

JVM Shutdown Sequence

java.lang.Shutdown class is a useful place to learn about what happens when the JVM process exits.

An orderly shutdown sequence will call Shutdown#exit whose body is straightforward:

            beforeHalt();
            runHooks();
            halt(status);

For the application programmer, the important method is runHook - it calls all the registered shutdown hooks.

Architecture

Architecture: The Hard Parts (a book by Neal Ford and Mark Richards)

I’ve had the book for some time but didn’t find time to read it. This month, I listened to a Thoughtworks' podcast about the book where they give a useful summary of it:

It’s all about trade off analysis
Two (three) main things in distributed analysis
- sizing (granularity of) services
- Communication/ wiring
- And then data management (you cannot treat sw and data architecture separately these days)
Book What every programmer should know about object oriented design is useful for understanding various types of cohesion
- especially the distinction between the static and dynamic coupling
Architecture kata: SysOps squad stories
Rebecca: Architecture give us (is?) a framework for thinking about and working with complexity
Data architecture
Historically, there was a sharp distinction between analytics (data warehouses) and operational data
- Today, we see the need for bringing analytical capabilities back to applications (in real time)
  - We have three main types: data warehouses, data lakes, data meshes
- There’s still a key distinction between them:
  - operational workloads - the code defines a procedure and stores it’s state (code-first)
  - analytics - data drives the behavior, like in Machine learning (data-first)

I also checked another podcast’s episode about NoSQL with Martin Fowler. He offered a good piece of advice:

Using Database as an integration point is horrible
Amazon did it right by integrating via external APIs

AWS & Cloud

CloudFront’s Server-Timing headers

Amazon CloudFront now supports Server Timing headers

Server Timing headers provide detailed performance information, such as whether the content was served from the cache when the request was received, how the request was routed to the CloudFront edge location, and how much time elapsed during each stage of the connection and response process.

Example:

Server-Timing: cdn-upstream-layer;desc="REC",cdn-upstream-dns;dur=0,cdn-upstream-connect;dur=195,cdn-upstream-fbl;dur=366,cdn-cache-miss,cdn-pop;desc="IAD89-C3",cdn-rid;desc="bjEUzYyv7e3FyYoK93Tw0MNYhNV2zVTMbjFO8g-Tr5aEW108VkzM-w=="

This can help you diagnose CloudFront errors such as mysterious OriginCommError.

To enable the Server-Timing header, create (or edit) a response headers policy.

Serving content only to logged-in users with CloudFront Signed Cookies

Another useful article/guide from Cloudonaut. They demonstrate how CloudFront can be used to restrict parts of your website only to authenticated users.

Security

Critical JDK vulnerability in the ECDSA verification algorithm (CVE-2022-21449)

A nasty vulnerability was disclosed by ForgeRock: see https://neilmadden.blog/2022/04/19/psychic-signatures-in-java/: You could produce a fake signature which Java would accept as a valid signature for any message and for any public key!

Action:

If you are running one of the vulnerable versions then an attacker can easily forge some types of SSL certificates and handshakes (allowing interception and modification of communications), signed JWTs, SAML assertions or OIDC id tokens, and even WebAuthn authentication messages
If you have deployed Java 15, Java 16, Java 17, or Java 18 in production then you should stop what you are doing and immediately update to install the fixes in the April 2022 Critical Patch Update.
The very first check in the ECDSA verification algorithm is to ensure that r and s are both >= 1.
- Guess which check Java forgot?

Antivirus or not?

I’ve got an interesting question about usage of anti-virus software on cloud servers. I asked a question on the DevOps Engineers slack.

The end result is that not many people use antivirus software on their server infrastructure and some consider it to be even more risky that not running an antivirus at all (it’s a complicated low-level continuously scanning all the system binaries)
A few people mentioned solutions like ClamAV and Falcon platform

Reading (Books)

Perfect Software and Other Illusions about testing

A solid book by Garry Weinberg.

A lot of it is about busting various myths about software testing (such that you can find all the bugs via a perfect testing process). They keep saying that testing is about providing information to managers so they can make decisions to mitigate risk. It also offers an advice on how to give such information and how to receive feedback.

Recommended!

List in Small Pieces - book club

Chris Houser is looking for people joing his book club reading the Lisp in Small Pieces book: https://chouser.us/lisp2022/

The Art of War

I decided to order The Art of War: Complete Texts and Commentaries. I’ve stumbled on it every now and then in the past years - for instance, I liked the selection of quotes in The Art of Scalability.

It’s time to give it a shot.

Writing (Blog, Articles)

Writing effective technical documentation

This short talk offers good advice on how write good technical documentation. My summary:

Start with what the reader needs (don’t include stuff just because it’s "related" or "you know about it")
Write less
Write the outline first - helps you to write less too and focus on what the reader needs
"Rubber ducking" - talking helps to clarify thoughts, debug your understanding
- you can write it down to be captured in the docs later
- yields friendly conversational stuffi
- read it out loud
write readably - not one big pile of text
- use headings, lists, short paragraphs (one idea)
- put most important bits first
it’s not just about documentation - think how to make the underlying thing better (fix the thing instead of "writing around the problem")

Typing exercises

I started doing some typing exercises. My goal is to achieve 120 wpm at 99% accuracy but so far it’s been challenging :). I think I’ll need at least a few months to achieve that.

The websites I’m using:

MISC

Searching in Chrome tabs

Are you struggling with many tabs and finding the right one quickly? Just use Cmd + Shift + A (or Ctrl + Shift + A). It’s powerful!

Leadership Without Management: Scaling Organizations by Scaling Engineers (Bryan Cantrill)

I watched this talk. It was fun and re-freshing. As always, Bryan’s energy is incredible.

The key quotation for me:

We wanna solve hard commercially relevant problem with people that inspire us achieving a mission that we believe in.

— Bryan Cantrill

The word "We" here refers to the mythical 10x/top engineers.

The Quest for Low-Code: 9 paths, some of which actually work (Gregor Hohpe)

A very good article on the topic of "low-code" tools and techniques. One of the key ideas, for me, was how to measure amount of code: We shouldn’t count the amount of code in lines but in cognitive load.

Therefore, when minimizing code we should strive to reduce the cognitive load primarily, not LOC.
We also need to reduce maintenance and operations (like most SaaS products do).

Cool website: https://pudding.cool/

The Pudding makes data fun. As one example, check their Film Dialogue.

Use difftastic for git diffs

difftastic is a cool diffing tool and it supports Clojure! You can use it with git log easily:

GIT_EXTERNAL_DIFF=difft git log -p --ext-diff

Carrier-grade NAT

Until know, I somehow managed to not know about this technique used by some ISPs (Internet Service Providers). It is often used for mitigating IPv4 address exhaustion. The idea is to translate end users' IP addresses to a single shared IP address. Hundreds users may end up with the same public IP address!

This can cause problems with services blocking/whitelisting access by IP address:

People that shouldn’t be blocked are blocked because a client sharing the same public IP is misbehaving and thus gets blocked.
People that should be blocked gain access to an internal service of a company because they share an IP address of the company’s employee.

Linux - SysRq (Magic System Request Key)

I learned about this "magic" facility of Linux OS when reading about Processes in an Uninterruptible Sleep (D) State. It can list all the processes in this 'D' state with associated kernel stack traces. It might be very useful for debugging tricky issues.

You trigger it simply:

echo w > /proc/sysrq-trigger

Then check the system’s logs:

dmesg -T
...
[Fri Apr 22 11:11:40 2022] sysrq: Show Blocked State
[Fri Apr 22 11:11:40 2022]   task                        PC stack   pid father
[Fri Apr 22 11:11:40 2022] analysis-schedu D    0  4460   2556 0x00000320
[Fri Apr 22 11:11:40 2022] Call Trace:
[Fri Apr 22 11:11:40 2022]  __schedule+0x2e3/0x740
...

Note: The whole The Linux kernel user’s and administrator’s guide might be quite useful!

Linux namespaces

Namespaces are a fundamental building block for Linux containers. They give us an illusion of:

superuser inside the container (User namespace),
isolated file system (Mount namespace),
process isolation (PID namespace),
separate network (Network namespace)

I think it’s worth getting familiar with them For that, Michael Kerrisk’s talk Containers unplugged: Linux namespaces is an excellent starting point.

System call tracing with strace (Michael Kerrisk)

Another great talk by Michael Kerrisk and relatively new one (2018) is about strace. It’s an impressive and very useful tool. I used it numerous times to debug tricky issues. It’s also useful for learning about the Linux operating system.

The talk gives you a solid base to start using the tool on your own. You can find slides here and all Michael’s presentations at https://man7.org/conf/index.html.

Explore CSV files with VisiData

I talked about VisiData before and I think it’s a great tool. I still struggle with it a lot, mostly because I use it only every now and then. Accidentaly, I found how to open CSV files with non-standard delimiters like semicolons:

vd --csv-delimiter ";" my-ebanking-export.csv

Flamescope - Flamegraph on steroids

It’s like an interactive heatmap - you can select an arbitrary continuous time-slice of the captured profile, and visualize it as a flame graph.

Xv6, a simple Unix-like teaching operating system

This is recommeneded for teaching/learning operating systems in the book Operating Systems: Three Easy Pieces.

How can we run a command stored in a variable?

I needed this recently in our Dockerfile to avoid duplication when using tini

mycmd=(java)
if test "$$" = "1"; then
      mycmd=(tini -- java)
fi
exec "${mycmd[@]}" -XshowSettings:vm ...

This is also a good reminder to not run your application as PID 1 inside the container: https://gds-way.cloudapps.digital/manuals/programming-languages/docker.html#running-programs-as-process-id-pid-1

Links

A quick recap of some of the links mentioned in this post:

obsolete version of HTTP spec mandated using absolute URLs for redirects: see https://en.wikipedia.org/wiki/HTTP_location
My basic Java/Linux monitoring script.
Don’t run your application as PID 1 inside the container: https://gds-way.cloudapps.digital/manuals/programming-languages/docker.html#running-programs-as-process-id-pid-1
Amazon CloudFront now supports Server Timing headers
legacy Java file I/O
Java Security Manager is deprecated for removal.
The Quest for Low-Code: 9 paths, some of which actually work (Gregor Hohpe)
lsp4clj - create any LSP for any language in Clojure
Charred - new JSON & CSV parsing library with zero dependencies and very fast
Film Dialogue - one of the Pudding’s pages.
Carrier-grade NAT
You could produce a fake signature which Java would accept as a valid signature for any message and for any public key!**
Processes in an Uninterruptible Sleep (D) State - use Linux SysRq to get details about such processes
The Linux kernel user’s and administrator’s guide
Michael Kerrisk’s talk Containers unplugged: Linux namespaces
System call tracing with strace (Michael Kerrisk) - slides
I experimented a bit with pipe buffers.
Flamescope - Flamegraph on steroids
Xv6, a simple Unix-like teaching operating system - recommended in Operating Systems: Three Easy Pieces
How can we run a command stored in a variable?
Chris Houser started a book club reading Lisp in Small Pieces: https://chouser.us/lisp2022/
Serving content only to logged-in users with CloudFront Signed Cookies
clj-ddd-example
Leadership Without Management: Scaling Organizations by Scaling Engineers (Bryan Cantrill)

Tags: architecture security clojure aws linux writing weekly-bits

« Weekly Bits 10/2022 - Clojure, Cider 1.4, AWS architectures, Lisp in Small Pieces, and The Fieldstone Method Weekly Bits 08/2022 - Clojure keyword deserialization, 'reified', difftastic, and JDK 18 release »