Curious (Clojure) Programmer: Weekly Bits & Pieces 02/2020 (4.5.

Clojure

Clojurians slack excerpts

Frankline Apiyo May 4th at 7:01 PM What are some reasons for using fully qualified keywords as map keys? bfabry: spec, more precise automated refactoring, ability to add new information to a map without worrying about clashing key names
hiredman [about core.async usage] it is better to use >! and <! ops to communicate to impose ordering over trying to do it via close!

Random Clojure bits

I spent a frustrating hour debugging a large function using name as a map key and also the clojure.core/name function (to convert a string to a keyword) at the same time. It was pretty obvious once I removed most of the code but mysterious while trying to debug the 70+ lines monster.

(defn- handle-login-callback [req]
(let [{:keys [id login name] :as user-info} {:id 123 :login "jumarko" :name "Juraj Martinka"}
provider-id :github
provider-name (name provider-id)]
provider-name))

#_(handle-login-callback {})
;;=>
Unhandled java.lang.ClassCastException
class java.lang.String cannot be cast to class clojure.lang.IFn (java.lang.String is in module
java.base of loader 'bootstrap'; clojure.lang.IFn is in unnamed module of loader 'app')

The solution here is to either rename the extracted key or use the fully qualified version: clojure.core/name. I went with clojure.core.name.

JVM

Andrei Pangin (my favorite JVM expert) shared a snippet of Java code that can be used to find out how much memory has been allocated by a single method: https://stackoverflow.com/questions/61539760/benchmarking-jvm-memory-consumption-similarly-to-how-it-is-done-by-the-android-o

I translated it to Clojure:

(defn allocated-bytes [f]
(let [thread-mbean (java.lang.management.ManagementFactory/getThreadMXBean)
thread-id (.getId (Thread/currentThread))
start (.getThreadAllocatedBytes thread-mbean thread-id)]
(f)
(- (.getThreadAllocatedBytes thread-mbean thread-id)
start)))

(allocated-bytes #())
;; => 20336
(allocated-bytes (fn []))
;; => 20176
;; notice that JOL shown 29480 bytes for vector of 1000 numbers so this looks close
(allocated-bytes (fn [] (vec (range 1000000))))
;; => 29436912

You can find this snippet in my clojure-experiments repository.

Projects Update

Python for Data Science

I continue reading the Python for Data Analysis book. I’ve just started the chapter 5 about Pandas and learned about the fundamental concepts like Series and DataFrame.

Series

Series is like a 1-dimensional array / list with optional labels (index) for each value. By default you the values are labeled from 0 to n-1, but you can specify your own index. A natural way to do this is to simply use a dict to instantiate our Series object:

import pandas as pd
sdata = {'Ohio':35000, 'Texas':71000, 'Oregon':16000, 'Utah':5000}
states_ser = Series(sdata)
states_ser
#=>
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

# or you can specify (perhaps a slightly different) index explicitly to override dict keys ordering
# - notice that 'California' yields NaN and 'Utah' is omitted
states = ['California', 'Ohio', 'Oregon', 'Texas']
states_ser_idx = Series(sdata, index=states)
states_ser_idx
#=>
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

DataFrame

DataFrame can be visualized as a 2D table.

SICP

I made little progress last week - spent around 30 minutes on chapter 2.4.2. They mention a good example of mixing multiple representations of complex numbers, rectangular and polar, within the same program, and how that fits into broader Stratified design approach (see page 140 in the book and also Lisp: A Language for Stratified Design

You can find relevant code here in my clojure-experiments repository.

Computer Systems

I spent a quick 20-minute session by reading the chapter 3 (about machine code & assembler). I learned about the differences between the (frequently used) ATT (AT&T) assembly syntax and the Intel’s assembly format: https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

A few notable differences: - Instructions with multiple operands have arguments in reversed order; e.g. the first argument of the mov instruction is destination, not source - There aren’t different variants of a single instruction for different argument sizes (such as movl vs movq in ATT format); instead, they are distinquished by special syntax like in mov QWORD PTR [rbx], rax.

By default, gcc outputs ATT assembly but you can make it use Intel format via -masm=intel:

gcc -Og -S  -masm=intel mstore.c -o mstore.intel.s

Work (CodeScene)

Azure Containers

Last week I talked about performance issues when running CodeScene on Azure Container instances with mounted files shares.

I decided to perform more experiments:

I tried a premium file share with 5 TB quota. According to Azure this should yield much better performance: But I haven’t noticed any real improvement over the basic 100 GB file share. An analysis was still 10x slower.
I evaluated an alternative approach with using a file share only for the database (H2) file and storing the rest (cloned repositories and analysis results) on a container’s host file system. This is much better and the analysis running time was close to what I observe on my laptop.

Learning

Rapid Learner

I’ve been working on week 3 (Practice) which emphasizes importance of real work. That is, if you want to really learn something in depth, you need to actually solve real problems. For things like Math and Physics this means working on problem sets; for programming it means to actually write code.

Week 3 lessons highlight topics and techniques like:

distributed practice (aka spaced repetition)
active recall (testing yourself without looking at the source material)
failure of transfer (inherent difficulties in applying what you learned in real world)
direct practice (working directly on the skill you want to improve; e.g. using the foreign language you want to learn)
- also toy practice (e.g. Skype foreign language tutoring) as an easier form of direct practice when you’re not yet ready for real work

Personal (Hobbies)

There’s a very special type of rock climbing available in Czech republic - sandstone trad climbing. There are usually long distances between fixed protection points or they might be missing altogether. It’s not your typical limestone climbing.

We visited Český Ráj - Hruboskalsko to get a feeling of it. It was nice and we’re still alive :).

Weekly Bits & Pieces 02/2020 (4.5. - 11.5.)