Curious (Clojure) Programmer Simplicity matters

June 13, 2022

Java in Docker: Pid 1 and git zombies - who's the reaper?

  • Prelude: Riddles in the dark
  • Act one: How I "fixed" the PID 1 problem
    • In a "fixing mood"
  • Act two: Zombies are coming…
    • Zombies: neither dead nor alive
    • Orphans?
    • Why is it a problem, after all?
  • Act two: Init
    • The roles of the init process
  • Intermezzo: Subprocess execution in Java
    • process reaper
  • Act Three:
  • Finale: Rest in peace (or rather "The fix"??)
  • Rest in peace? (Key takeaways)??
  • Epilog: If you didn’t fix it, it ain’t fixed
  • Further reading

You can’t see them, but they can find you.

You can’t kill them, but they will kill you.

— process reaper

Prelude: Riddles in the dark

On a Saturday night, I found myself deep in strace logs, inspecting syscalls made by our (CodeScene) on-prem app. A customer had reported "git zombies wandering around" on their machine, and it had been worrying me. I couldn’t resist the temptation - I had to find out what was going on.

Act one: How I "fixed" the PID 1 problem

Months ago, when enabling JFR monitoring for our application running in a Docker container, I noticed that the app wasn’t responding to signals: if you called docker stop <codescene-container>, it would hang for 10 seconds and then the whole container was SIGKILL-ed.

This is a classic problem when your application runs as a subprocess of a shell such as bash. The shell won’t pass signals on to the subprocess, so the subprocess never gets a chance to respond to them.

docker stop first sends SIGTERM and waits for 10 seconds. If the container hasn’t stopped by then, it sends SIGKILL, which does its dirty job and immediately kills all the processes running in the container.
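The escalation described above can be imitated in a few lines of shell. This is only a rough sketch of the behavior, not how Docker implements it: stop_with_grace is a made-up helper name, and polling once per second is an assumption.

```shell
#!/bin/sh
# Rough imitation of the `docker stop` escalation: SIGTERM, grace period, SIGKILL.
# stop_with_grace is a hypothetical helper, not a Docker command.
stop_with_grace() {
    pid=$1
    grace=${2:-10}   # seconds to wait before escalating

    # If the process is already gone there is nothing to do.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    while [ "$waited" -lt "$grace" ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        # Note that an unreaped zombie still counts as existing.
        kill -0 "$pid" 2>/dev/null || { echo "stopped by SIGTERM"; return 0; }
        sleep 1
        waited=$((waited + 1))
    done

    # Grace period is over: SIGKILL cannot be caught or ignored.
    kill -KILL "$pid" 2>/dev/null
    echo "escalated to SIGKILL"
}
```

For example, `stop_with_grace 12345 10` would give a (hypothetical) process 12345 ten seconds to exit before killing it. A process that never sees or handles SIGTERM always ends up on the SIGKILL branch, which is exactly what was happening to our app.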

In a "fixing mood"

I was aware of the "signal-passing problem" and decided to fix it. After a brief study and asking on a forum, I believed I had come up with a simple solution: use exec to run our Java/JVM process as PID 1 in the container, effectively replacing the shell wrapper. This way, signals would be delivered directly to the CodeScene process, and the JVM could respond to SIGTERM and other signals as expected.

The fix was simple - prepend exec to the java command in the start.sh file used as the ENTRYPOINT in the Dockerfile:

exec java ...
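A quick way to convince yourself that exec replaces the shell instead of spawning a child is to print the PID before and after. This is a minimal sketch; pids_around_exec is a hypothetical helper name, and a second sh stands in for the java process.

```shell
#!/bin/sh
# Prints the PID of a shell, then the PID of the program it exec-s.
# pids_around_exec is a hypothetical helper used only for illustration.
pids_around_exec() {
    # The outer sh prints its own PID, then exec replaces it with a new sh.
    # \$\$ is escaped so that the *inner* shell expands it, not the outer one.
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}
```

Calling `pids_around_exec` should print the same PID twice: the exec-ed program takes over the process of the shell that launched it. By the same logic, if start.sh was started as PID 1 in the container, the JVM launched with exec becomes PID 1 itself.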

Act two: Zombies are coming…

Zombies: neither dead nor alive

Orphans?

Why is it a problem, after all?

Who’s the father of zombies?

Act two: Init

The roles of the init process

Intermezzo: Subprocess execution in Java

process reaper

Act Three:

Finale: Rest in peace (or rather "The fix"??)

In the end, the quickest solution was to add the --init flag to docker run, or init: true for docker-compose. However, we cannot rely on customers doing that work for us. So we decided to add tini straight into our Docker image - that way it works in environments that don’t support the init flag (like Kubernetes) and doesn’t require any work or awareness from the end user.

As a bonus, I also remapped exit code 143[1] to 0. 143 is the exit code Java returns when it receives the SIGTERM signal.
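How the remapping is wired up is not shown here, so the following is only an illustration of the idea in plain shell, not necessarily the setup used in the CodeScene image. run_remapping_sigterm is a hypothetical wrapper that runs a command and turns exit status 143 into 0, leaving every other status untouched.

```shell
#!/bin/sh
# Run a command and report "terminated by SIGTERM" (status 143) as success.
# run_remapping_sigterm is a hypothetical helper used only for illustration.
run_remapping_sigterm() {
    rc=0
    "$@" || rc=$?          # capture the command's exit status
    if [ "$rc" -eq 143 ]; then
        return 0           # 143 = 128 + 15 (SIGTERM): treat as a clean stop
    fi
    return "$rc"           # anything else is passed through unchanged
}
```

For instance, `run_remapping_sigterm sh -c 'kill -TERM $$'` returns 0, because the inner shell is terminated by SIGTERM and the wrapper sees status 143, while `run_remapping_sigterm sh -c 'exit 3'` still returns 3.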

The best solution I find

Rest in peace? (Key takeaways)??

Epilog: If you didn’t fix it, it ain’t fixed

I intended this to be a lead but perhaps better just as a conclusion.

Remember the last time you learned about a bug, or a "suboptimal solution", but you were too lazy (didn’t have time?) to fix it?

This one was a reminder for me: when you ignore a bug, it doesn’t fix itself. It will come back, with no mercy. Don’t ignore problems; fix them when you find them (or at least plan for the fix).

Further reading

  • Running programs as process ID (PID) 1

  • Riddles in the Dark

  • Intermezzo


1. Why 143? Java uses the same convention as Bash: 128 + signal number. SIGTERM is signal 15, so 128 + 15 = 143.

Copyright © 2022 Juraj Martinka
