Ylan Segal

The REPL: Issue 58 - June 2019

Per-project Postgres

In this post, Jamey Sharp elaborates on a neat technique to run different versions of Postgres on a per-project basis. I learned that you can run Postgres on a Unix socket only, without having a port open, which removes the need to manage ports for each version of Postgres. The technique also has the advantage of keeping the project's data inside the project directory structure. It illustrates the power and flexibility of Unix tools.

How to do distributed locking

Martin Kleppmann writes about distributed locks in general, and in particular the merits of Redlock, a Redis-based distributed-lock algorithm. Kleppmann breaks down the reasons to use a distributed lock, its characteristics, and how Redlock in particular depends on timing assumptions that can fail. I found this to be great technical writing. The post came about when Kleppmann was researching his book, Designing Data-Intensive Applications. I finished that book a few days ago, and hope to write a review soon. I can't recommend it enough.

The REPL: Issue 57 - May 2019

We Can Do Better Than SQL

Simple SQL statements can read almost like English. With just a bit of complexity (e.g. more than one join) they can quickly become almost impossible to discern. In this post Elvis Pranskevichus critiques SQL's shortcomings compellingly. He then introduces EdgeQL, a query language designed to fix those shortcomings. This is the first time I've heard of it or EdgeDB.

Is High Quality Software Worth the Cost?

With his traditional knack for analysis and synthesis, Martin Fowler describes how the familiar trade-off of quality and cost that is intuitive in the physical world doesn't quite hold for software. Software projects are constantly evolving, with requirements changing. Internal quality determines the speed at which features can be delivered. Disregarding internal quality leads to software projects where it becomes almost impossible to continue making changes. I can't recommend this article enough.

Chasing a Segmentation Fault

Recently, I chased down a segmentation fault occurring in one of our production servers. A segmentation fault cannot be triggered by code that is written completely in Ruby, barring a bug in Ruby itself. The VM manages the memory, making it impossible to access memory in violation of the OS rules.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system the software has attempted to access a restricted area of memory.

Not surprisingly, I tracked it down to a gem with a C extension: a database driver for MS SQL. The issue can be reproduced by attempting to read results from a connection after it has been closed. I didn't expect to be able to read the results, but I expected an exception to be raised, not the whole process to crash. I reported the bug.

Interestingly, I can also reproduce the segfault by running the garbage collector (GC) manually. The way we interact with the gem is by instantiating a TinyTds::Client object, and executing some SQL. It then returns a TinyTds::Result object. The segfault is triggered when (1) the client object is no longer in scope (thus eligible for garbage collection), (2) the GC runs, and (3) then the result object is used. Since normally the GC runs at Ruby’s pleasure, we see non-deterministic segfaults in production, with varying stack traces.

The gem hasn’t been fixed yet, but I believe I can solve our particular issue by re-organizing my code so that the client and result objects are always in scope at the same time. The most expedient solution was to read all the results into Ruby as soon as possible. I was concerned that this would increase the memory usage. This was a good opportunity to use benchmark-memory.

require 'benchmark/memory'

Benchmark.memory do |x|
  x.report('with workaround') { Existing.run  }
  x.report('without workaround') { Workaround.run }

  x.compare!
end

Its output is handy.

Calculating -------------------------------------
     with workaround     1.338B memsize (    19.558M retained)
                        17.715M objects (     6.234k retained)
                        50.000  strings (    50.000  retained)
  without workaround     1.397B memsize (    12.303k retained)
                        18.828M objects (   109.000  retained)
                        50.000  strings (    19.000  retained)

Comparison:
     with workaround: 1337820417 allocated
  without workaround: 1396586267 allocated - 1.04x more

Looks like there was no need for concern.
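For reference, the shape of the workaround described above (read everything into Ruby while the client is still in scope) can be sketched roughly like this. `FakeClient`, `FakeResult`, and `fetch_rows` are hypothetical stand-ins invented for illustration; they are not TinyTds classes, and the fake raises an exception where the real driver would crash.

```ruby
# Hypothetical stand-ins for a driver's client and result objects.
# They only model "a result that depends on its client still being alive".
class FakeClient
  attr_reader :closed

  def initialize
    @closed = false
  end

  def execute(_sql)
    FakeResult.new(self, [{ id: 1 }, { id: 2 }])
  end

  def close
    @closed = true
  end
end

class FakeResult
  def initialize(client, rows)
    @client = client
    @rows = rows
  end

  # Reading is only safe while the owning client is still open.
  def each(&block)
    raise IOError, 'client is gone' if @client.closed

    @rows.each(&block)
  end
end

# Workaround shape: copy every row into plain Ruby objects while the client
# is still in scope, and hand back only those objects.
def fetch_rows(sql)
  client = FakeClient.new
  rows = []
  client.execute(sql).each { |row| rows << row }
  rows
ensure
  client&.close
end

rows = fetch_rows('SELECT id FROM things')
```

The caller never sees a result object, so nothing it holds can outlive the client. The cost is that all rows live in memory at once, which is what the benchmark above was checking.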

RailsConf 2019 Talk: Bug-Driven Development

Last week I had the pleasure of attending RailsConf 2019 for the first time, and the honor of being a speaker. My talk was titled “Bug-Driven Development”. On the surface, it's a war story about fixing a particularly nasty bug. At a deeper level, it is about software design evolution. Software is an iterative endeavor, perpetually in a state of flux. Requirements change, new features are added, external APIs are deprecated, scaling demands adjustments. In the talk, I try to thread the needle between a specific bug fix and the broader applicability of design patterns, proper abstractions, and the role of testing. My goal was for audience members to be able to see both the proverbial forest and the trees: to connect the ivory-tower, abstract design concepts with the day-to-day practice of writing code, test-driven development, and fixing bugs.

RailsConf itself was a great experience. The organizers did a great job, from both an attendee's and a speaker's perspective. Attending the conference gave me an opportunity to learn about new topics, dive deeper on familiar ones, connect with other Rubyists, and hopefully contribute my own grain of sand to the community from which I have gained so much. This may have been my first RailsConf, but I am sure I'll come back in the future.

Last but not least, I am grateful for all the support I received from Procore. Its continuous drive toward engineering excellence is inspiring and contagious. Its commitment to, and investment in, the personal development of the engineering team made this talk possible.


The REPL: Issue 56 - April 2019

Technical Debt Is Like Tetris

Eric Higgins makes an analogy that resonates: technical debt is like Tetris. Its accumulation can always be seen, but it is not necessarily detrimental until there is enough of it. At some point it can make it impossible to make progress, or even to keep the game going. I guess starting a new game is equivalent to rewriting your software :-).

Storing UTC is not a silver bullet

One doesn't have to program for long before encountering the sharp edges of date, time, and time zone handling. For a lot of use cases, storing date/time in UTC is a good solution. In this post, Jon Skeet explains why it doesn't solve all the possible problems. The biggest takeaway is that time zone definitions change constantly, and if your code doesn't account for that explicitly, it will do so implicitly, sometimes with unexpected results.
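To make that takeaway concrete, here is a minimal sketch of an event whose zone rules change after it was saved. The date and both offsets are made up for illustration, and fixed offsets stand in for a real time zone database.

```ruby
# Hypothetical scenario: an event is scheduled for 09:00 local time. When it
# is saved, the zone's rules say the offset on that date will be +02:00.
# Later the rules change and the offset on that date becomes +01:00.
OFFSET_WHEN_SAVED = '+02:00'
OFFSET_AFTER_RULE_CHANGE = '+01:00'

local_when_saved = Time.new(2022, 3, 27, 9, 0, 0, OFFSET_WHEN_SAVED)

# Approach 1: store only the UTC instant computed with the old rules.
stored_utc = local_when_saved.getutc

# Reading it back under the new rules silently moves the event by an hour.
shown_after_change = stored_utc.getlocal(OFFSET_AFTER_RULE_CHANGE)

# Approach 2: store what the user actually asked for (the local wall-clock
# time) and derive the instant with whatever rules are current.
stored_local = { year: 2022, month: 3, day: 27, hour: 9, min: 0 }
instant_after_change = Time.new(
  stored_local[:year], stored_local[:month], stored_local[:day],
  stored_local[:hour], stored_local[:min], 0,
  OFFSET_AFTER_RULE_CHANGE
)

puts "UTC-only storage shows:  #{shown_after_change.strftime('%H:%M %:z')}"
puts "Local-time storage shows: #{instant_after_change.strftime('%H:%M %:z')}"
```

With UTC-only storage the event no longer starts at 09:00 on the attendee's wall clock. Storing the local time keeps it at 09:00, at the price of recomputing the instant whenever the rules change.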

Using GNU Stow to manage your dotfiles

I manage my Unix dotfiles with care, keep them in source control, and take them with me from machine to machine. I spend a lot of time in my Unix environment and my configuration allows me to be productive. Until recently, I managed my dotfiles with a hand-crafted script that created symlinks. In the past I tried a few different solutions for the same problem, but nothing worked well for files, directories, and arbitrary locations alike. In this post, Brandon Invergo explains how to leverage a new-to-me utility: GNU Stow. Its purpose is to manage symlinks, usually in the context of activating different versions of the same software on the same system. It works great for dotfiles!
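The underlying idea, mirroring a package directory into a target directory as symlinks, can be sketched in a few lines of Ruby. This is a simplified illustration and not how Stow itself is implemented: `link_package` is a hypothetical helper, it creates absolute links, and it does no conflict handling or unlinking.

```ruby
require 'fileutils'
require 'pathname'
require 'tmpdir'

# Link every file under package_dir into target_dir at the same relative
# path, creating intermediate directories as needed.
def link_package(package_dir, target_dir)
  package = Pathname.new(package_dir)
  target = Pathname.new(target_dir)

  package.find.select(&:file?).map do |source|
    link = target + source.relative_path_from(package)
    FileUtils.mkdir_p(link.dirname.to_s)
    FileUtils.ln_s(source.to_s, link.to_s)
    link
  end
end

# Try it in a throwaway directory; the layout and file names are made up.
Dir.mktmpdir do |root|
  package = File.join(root, 'dotfiles', 'git')
  home = File.join(root, 'home')
  FileUtils.mkdir_p([package, home])
  File.write(File.join(package, '.gitconfig'), "[user]\n  name = Example\n")

  links = link_package(package, home)
  links.each { |link| puts "#{link} -> #{link.readlink}" }
end
```

The real files stay in one version-controlled directory, and the home directory only holds links to them.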