Ylan Segal

The REPL: Issue 61 - September 2019

Building A Relational Database Using Kafka

Robert Yokota explores building a relational database on top of Kafka. It follows his previous article on creating an in-memory cache backed by Kafka. RDBMS systems are commonly thought of as keeping track of tables and rows. The semantics of SQL reinforce the concept of rows being updatable. In practice, though, most implementations use an immutable log under the hood. That is what makes transactions possible, each with its own consistent view of the world. Kafka can be thought of as an “exposed” MVCC system, and the current state of the data can be derived by consuming the messages in a topic. The article is interesting in that it assembles a relational database by using different existing open-source projects.
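The core idea, that the current state of a table can be derived by replaying an ordered log, can be sketched in a few lines of Ruby (a toy illustration of my own, not Kafka's actual API):

```ruby
# Toy illustration: derive current table state by replaying an ordered
# log of keyed update messages. The last message for each key wins,
# much like a compacted Kafka topic.
log = [
  { key: 1, value: { name: 'Alice', balance: 100 } },
  { key: 2, value: { name: 'Bob',   balance: 50 } },
  { key: 1, value: { name: 'Alice', balance: 75 } }, # later update to "row" 1
]

# Replaying in offset order leaves the latest value per key: the
# current state of each "row".
current_state = log.each_with_object({}) do |message, table|
  table[message[:key]] = message[:value]
end

current_state.each { |key, row| puts "row #{key}: #{row}" }
```

An update is just another message appended to the log; consumers that replay from the beginning always converge on the same state.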

3 Key Ideas Behind The Erlang Thesis

Yiming Chen summarizes Joe Armstrong’s thesis: “Making reliable distributed systems in the presence of software errors”. The 3 key ideas identified: concurrency-oriented programming, abstracting concurrency, and the let-it-fail philosophy. Armstrong is Erlang’s creator, and his thesis has been very influential in the Erlang and Elixir communities.

The REPL: Issue 60 - August 2019

Issue 60! I’ve been posting my favorite links to tech articles every month for the last 5 years! I’ve linked to 163 in that time (not including the links in this post). And, now that I am looking back… I realize that I’ve made a mistake and I re-used #53 for the 2018-12 and 2019-12 issues. ¯\_(ツ)_/¯

Engineers Don’t Solve Problems

This article by Dean Chahim is not about software engineering or computer science. It’s about Mexico City’s infrastructure and the decades-long battle to prevent flooding in the city. The article struck a chord with me: Mexico City is my hometown, and it’s where I went to university to obtain my degree in Civil Engineering. The article illustrates how engineers make trade-offs that might have far-reaching consequences, and are not immune from political and socio-economic influence. There are lessons there for all engineers.

How to Build Good Software

Software has characteristics that make it hard to build with traditional management techniques; effective development requires a different, more exploratory and iterative approach.

Li Hongyi writes a thoughtful article on why software projects are not the same as other engineering projects, and require different management techniques. Successful software projects are very iterative and oscillate between cycles of discovery and consolidation.

Arcs of Seniority

Stevan Popovic breaks down engineering seniority into a few factors: independence, authority, design, and influence. Over one’s career, each of these develops in an engineer, and each marks a different type of seniority. As expected, not everyone reaches the same maturity in all factors at once. Each senior engineer has their own mix. The illustrations in the article are particularly helpful.

Spring Hopes Eternal

I have a love-hate relationship with spring, Rails' application pre-loader. On one hand, it speeds up the feedback loop when doing TDD: faster-running specs promote running them more often, which promotes writing code in smaller increments, and so forth. On the other hand, it is dark magic: in its quest to be unobtrusive, it starts automatically and barely reports that it is being used at all. Occasionally it loses track of which code it needs to reload, causing much confusion for the user, as the code executing is different from the version saved on disk.

For a while, I disabled its use altogether by setting the DISABLE_SPRING environment variable. I found it tolerable while working on smaller Rails apps, but not on the giant Rails monolith I use every day:

# spec/example_spec.rb
require 'rails_helper'

RSpec.describe 'A spec' do
  it 'states the obvious' do
    expect(1).to eq(1)
  end
end

Let’s time with and without spring:

$ time bin/rspec spec/example_spec.rb
Running via Spring preloader in process 26118

Randomized with seed 15334
.

Finished in 0.05529 seconds (files took 3.86 seconds to load)
1 example, 0 failures

Randomized with seed 15334

bin/rspec spec/example_spec.rb  0.27s user 0.11s system 7% cpu 5.050 total

$ bin/spring stop
Spring stopped.

$ export DISABLE_SPRING=yes_please

$ time bin/rspec spec/example_spec.rb
Randomized with seed 42078
.

Finished in 0.09926 seconds (files took 9.99 seconds to load)
1 example, 0 failures

Randomized with seed 42078

bin/rspec spec/example_spec.rb  11.03s user 1.35s system 98% cpu 12.547 total

Running with spring takes 0.27 seconds. Running without takes 11.03. Can I have my cake and eat it too?

Git Hooks

I don’t have conclusive evidence, but I’ve noticed that code loading issues creep into spring when changing git branches. Git provides a mechanism to hook into its events and run an arbitrary script. Putting it all together, I created a git hook that stops spring:

#!/usr/bin/env bash
# Copy this script to .git/hooks/post-checkout
# Make it executable (chmod +x .git/hooks/post-checkout)

# The hook is given three parameters: the ref of the previous HEAD,
# the ref of the new HEAD (which may or may not have changed),
# and a flag indicating whether the checkout was a branch checkout
# (changing branches, flag=1) or a file checkout
# (retrieving a file from the index, flag=0).
if [[ "$3" == "1" ]] && [[ "$1" != "$2" ]]; then
  # Stop spring, if we have the binstub for it

  spring_command="bin/spring"

  [[ -x "$spring_command" ]] || exit

  echo "Git Hook: Stopping spring"
  exec $spring_command stop
fi

I’ve been using the above hook for weeks. I haven’t encountered a code loading issue yet.

The REPL: Issue 59 - July 2019

View-centric performance optimization for database-backed web applications

This post is a walk-through of the academic paper of the same title. Keeping page-load time low continues to be important, but it has become an increasingly challenging task due to the ever-growing amount of data stored in back-end systems. The authors created a view-centric development environment that provides intuitive information about the cost of each HTML element on a page, and highlights performance-enhancing opportunities. The goal is to make it easier to explore functionality and performance trade-offs.

Interestingly, the development environment, Panorama, targets the Ruby on Rails framework specifically. I look forward to trying it out soon.

Zanzibar: Google’s Consistent, Global Authorization System

This paper includes a thorough description of the architecture behind Zanzibar, a global system for storing and evaluating access control lists internal to Google. As a highly distributed system, it builds on top of other Google technology, like Spanner, Google's globally distributed database. In particular, I was very interested in the consistency model and how they provide guarantees around external consistency so that the causal ordering of events is maintained. It achieves this by providing clients with tokens after write operations (called zookies): when a client makes a subsequent request with that token, the system guarantees that any results are at least as fresh as the timestamp encoded in the zookie.
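The token mechanism can be sketched in a few lines of Ruby (my own simplification for illustration; Zanzibar's actual interfaces and timestamp handling are far more involved):

```ruby
# Toy sketch of a consistency token ("zookie"): a write returns a token
# encoding its timestamp, and a later read is only served from a
# replica whose state is at least that fresh.
Zookie = Struct.new(:timestamp)

class Replica
  attr_reader :applied_at

  def initialize(applied_at)
    @applied_at = applied_at # timestamp of the last write this replica applied
  end

  # Serve the read only if this replica has applied all writes up to
  # the timestamp encoded in the client's zookie.
  def fresh_enough?(zookie)
    applied_at >= zookie.timestamp
  end
end

zookie = Zookie.new(100)  # handed to the client after its write at t=100
stale  = Replica.new(90)  # lags behind the client's own write
fresh  = Replica.new(120) # has applied the write (and more)

puts stale.fresh_enough?(zookie) # stale replica must be skipped
puts fresh.fresh_enough?(zookie) # results at least as fresh as the write
```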

The paper has a lot more, including how they architect for performance with caching layers, and a purpose-built indexing system for deeply nested recursive permission structures.

Fast Feedback Loops

One of the reasons that I love TDD is that it promotes fast feedback. You write a line, execute the tests, and see the results. I write outside-in TDD most of the time. Occasionally, I don’t have a clear idea of what tests to write, or I am doing exploratory coding.

For example, lately I’ve found myself writing a fair amount of raw SQL queries (without an ORM). SQL is finicky, and produces notoriously hard-to-decipher errors. As a consequence, I like to build up SQL in small increments, and execute the work-in-progress statement often, to see it and its output alongside each other. My workflow looks something like this:

What is going on? I selected some SQL, executed it in psql, and appended the commented-out output to the same selection. After inspection, I can change the statement and repeat.

Why?

The benefit I get from this workflow is that I can iterate in small steps, get feedback on what the current code does, and continue accordingly. This workflow is heavily inspired by Ruby’s xmpfilter and the newer seeing_is_believing. Both tools take Ruby code as input, execute it, and then record all (or some) of the evaluated code as comments alongside the code.
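For example, running a snippet through one of these tools yields annotations along these lines (a small illustration; the exact formatting varies by tool and version):

```ruby
# Ruby code annotated in the style of xmpfilter/seeing_is_believing:
# each evaluated line gets its value appended as a comment.
a = [1, 2, 3]            # => [1, 2, 3]
b = a.map { |n| n * 2 }  # => [2, 4, 6]
b.sum                    # => 12
```

The annotated file is still valid Ruby, so the edit-execute-annotate cycle can be repeated indefinitely.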

How?

This workflow is made possible by leveraging the pipe Atom package, which I previously described. It allows sending the current selection in Atom to any Unix command (or series of piped commands) and replaces the selection with the output.

Building on top of that, I wanted a Unix command (that I called io, for lack of imagination) that would output both the original input and the commented-out output:

#!/usr/bin/env bash
# Prints stdin and executes it in the given program, commenting the output.

set -euo pipefail

# Determine which comment pattern to use
case $1 in
  psql)
    comment='-- ' ;;
  *)
    comment='# ' ;;
esac

grep -v "^$comment" /dev/stdin | tee >("$@" | sed "s/^/$comment/")

The case statement selects the correct comment prefix. It is customary in many Unix tools to treat a line starting with # as a comment. psql is different, in that it uses a -- prefix. I haven’t needed support for anything else, but it’s easily extensible.

The meat of the execution breaks down like this:

# Reads /dev/stdin and removes any lines starting with a comment
grep -v "^$comment" /dev/stdin

# The comment-less input is now sent to tee.
# tee will redirect the input to a file and to stdout.
| tee

# Instead of a file, we give tee a sub-shell as a file descriptor, using
# process substitution.
>( )

# That subshell will execute the rest of the arguments passed to io
# as a command
"$@"

# the output is piped to sed, to add the comment prefix to every line
| sed "s/^/$comment/"

The result is that the final output is what we have been looking for: The original input without comments, plus the executed input with comments added.

From my point of view, this is a great example of the Unix philosophy: Composing utilities to create new functionality. I took advantage of the flexibility in input/output redirection and process substitution to improve my development workflow.