-
The REPL: Issue 105 - May 2023
The Statistics Handbook
I’ve been taking a statistics course on Coursera. The lectures and exercises are great, but I was really missing a textbook that I can come back to and reference. I was happy to find this gem: free and available to download.
The definitive guide to Arel, the SQL manager for Ruby
Recently, I’ve been doing more complicated SQL queries in Rails, for which the ActiveRecord API is not enough. Enter Arel, a relational algebra library on which ActiveRecord is built, that allows more flexibility when using Rails. This is a great guide to using it. Arel is considered a private API in Rails. I’ve found it to be very stable, but be mindful when using it.
Introducing Tobox
This gem attempts to solve the write-to-multiple-databases problem when using background processing libraries in Ruby (e.g. Sidekiq). In effect, this is an event system, but without describing it heavily as such. The problem is well described, and stated more concisely, in Pattern: Transactional Outbox. The gem is new. I can’t comment on its maturity or stability. The author is right in pointing out that Rails’s ActiveJob DSL allows easy backgrounding of jobs, but ignores transactionality and the dual-write problem that might exist. In fact, I was talking about this with some co-workers recently. One of the benefits of using GoodJob is that since the queue storage is in the same database, we can ignore this problem, as long as we are using a transaction.
-
GoodJob Bulk Enqueue
A common pattern in Rails applications is to queue similar jobs for a collection of objects. For example:
post.watchers.find_each do |user|
  NotifyOfChanges.perform_later(user, post)
end
The above will generate one INSERT SQL statement for each job queued. I recently noticed that GoodJob introduced a bulk enqueue feature. It allows using a single INSERT statement for all those jobs, similar to Rails’s #insert_all:
GoodJob::Bulk.enqueue do
  post.watchers.find_each do |user|
    NotifyOfChanges.perform_later(user, post)
  end
end
Let’s see what the performance is locally:
class NoOpJob < ApplicationJob
  def perform
  end
end

require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(:time => 10, :warmup => 5)

  x.report('Single Inserts') {
    ApplicationRecord.transaction do
      500.times { NoOpJob.perform_later }
    end
  }

  x.report('Bulk Inserts') {
    ApplicationRecord.transaction do
      GoodJob::Bulk.enqueue do
        500.times { NoOpJob.perform_later }
      end
    end
  }

  x.compare!
end
$ rails runner benchmark.rb
Running via Spring preloader in process 46655
Warming up --------------------------------------
      Single Inserts     1.000  i/100ms
        Bulk Inserts     1.000  i/100ms
Calculating -------------------------------------
      Single Inserts      0.833  (± 0.0%) i/s -      9.000  in  10.823196s
        Bulk Inserts      4.746  (± 0.0%) i/s -     48.000  in  10.155956s

Comparison:
        Bulk Inserts:        4.7 i/s
      Single Inserts:        0.8 i/s - 5.70x slower
Locally, we can see a significant performance boost due to fewer round trips to the database. But using bulk enqueue can be even more impactful than that. Production systems typically see much more concurrent load than my local machine. When the queueing is wrapped in a transaction, it can be very disruptive. Long-running transactions can slow the whole system down. Bulk inserting records is a great way to keep transactions short, and the GoodJob feature provides an easy way to do that, while keeping the semantics of the code the same.
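To make the round-trip difference concrete, here is a toy model in plain Ruby. ToyQueue and its methods are hypothetical names made up for this sketch. It is not how GoodJob implements its bulk feature, only an illustration of buffering jobs inside a block and flushing them as one multi-row INSERT.

```ruby
# Toy model of bulk enqueueing. NOT GoodJob's implementation: ToyQueue,
# #enqueue and #bulk are hypothetical names used only for illustration.
class ToyQueue
  attr_reader :statements

  def initialize
    @statements = [] # SQL strings we would have sent to the database
    @buffer = nil    # non-nil only while inside a #bulk block
  end

  # Outside a bulk block, every job costs one INSERT (one round trip).
  def enqueue(job_class, *args)
    row = [job_class, args]
    if @buffer
      @buffer << row
    else
      insert([row])
    end
  end

  # Inside a bulk block, jobs are buffered and flushed as a single INSERT.
  # If the block raises, nothing is inserted. Nesting is not handled here.
  def bulk
    @buffer = []
    yield
    insert(@buffer) unless @buffer.empty?
  ensure
    @buffer = nil
  end

  private

  # Builds one statement with a "(?, ?)" placeholder group per row.
  def insert(rows)
    placeholders = (["(?, ?)"] * rows.size).join(", ")
    @statements << "INSERT INTO jobs (job_class, arguments) VALUES #{placeholders}"
  end
end

single = ToyQueue.new
500.times { |i| single.enqueue("NoOpJob", i) }

bulk = ToyQueue.new
bulk.bulk { 500.times { |i| bulk.enqueue("NoOpJob", i) } }

puts "single: #{single.statements.size} statements"
puts "bulk:   #{bulk.statements.size} statement"
```

The calling code inside the block is unchanged, which is the property that makes this kind of API easy to adopt.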
-
The REPL: Issue 104 - April 2023
Making A Network Call: Mitigate The Risk
Nate Berkopec, well known for his Ruby/Rails performance work, offers some good advice on mitigating the performance risk of making network calls: make calls in background jobs whenever possible, set aggressive network timeouts, and use circuit breakers to fail fast when you detect a system is misbehaving.
I’m not saying this is easy, I’m saying it’s necessary.
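For readers who have not seen a circuit breaker before, here is a minimal sketch in plain Ruby. The CircuitBreaker class, its parameters, and the injected clock are all assumptions made for this example, not code from the article or from any gem, and the sketch is not thread-safe.

```ruby
# A minimal circuit breaker sketch. CircuitBreaker is a hypothetical class
# written for illustration; it is not thread-safe and not from any library.
class CircuitBreaker
  class OpenError < StandardError; end

  def initialize(failure_threshold: 3, reset_after: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold # failures before the circuit opens
    @reset_after = reset_after             # cool-down in seconds
    @clock = clock                         # injectable so it can be tested
    @failures = 0
    @opened_at = nil
  end

  # Runs the block unless the circuit is open, in which case it fails fast
  # without calling the remote system at all.
  def call
    raise OpenError, "circuit open, failing fast" if open?

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    reset
    result
  end

  # Open means: the threshold was reached and the cool-down has not elapsed.
  # Once it elapses, the next call is let through as a trial.
  def open?
    return false if @opened_at.nil?

    @clock.call - @opened_at < @reset_after
  end

  private

  def record_failure
    @failures += 1
    @opened_at = @clock.call if @failures >= @failure_threshold
  end

  def reset
    @failures = 0
    @opened_at = nil
  end
end

breaker = CircuitBreaker.new(failure_threshold: 2, reset_after: 30)
3.times do
  begin
    # Stand-in for a network call that keeps failing.
    breaker.call { raise IOError, "simulated network failure" }
  rescue IOError, CircuitBreaker::OpenError => e
    puts "#{e.class}: #{e.message}"
  end
end
```

The first two calls reach the failing block. The third is rejected by the breaker itself, so the misbehaving system is not contacted again until the cool-down passes.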
Makefile Tutorial By Example
make is tried and true technology. I don’t write Makefiles often. When I do, having a mental model of how make treats dependencies helps make the whole enterprise more efficient and enjoyable. This guide has plenty of material to get you started.
Pure sh Bible
A very ingenious collection of recipes for sh that avoid using new processes. Some of the syntax is clever, but terrifying to read. Case in point:
trim_string() {
    trim=${1#${1%%[![:space:]]*}}
    trim=${trim%${trim##*[![:space:]]}}
    printf '%s\n' "$trim"
}
-
The REPL: Issue 103 - March 2023
What Is ChatGPT Doing … and Why Does It Work?
Stephen Wolfram explains what ChatGPT is doing. This article is very technical and on the long side. I found it quite enlightening to learn what LLMs are doing. A more accessible article can be found, but I hate its name: “normie” is an objectionable term. It’s condescending, as if you are not “in on something”.
I wish people would stop insisting that Git branches are nothing but refs
The article covers the friction between the mental model of using git and the actual implementation. I’ve read a ton of complaints about git’s UX, the leaky abstraction, and all that. It’s true: using git without knowing some of the plumbing is hard and doesn’t make much sense. In my experience, what I’ve learned over the years from articles, and especially from the Building Git book, has made me much better at it, because I now understand the internals.
In any case, the author seems correct to me: saying that branches are just refs is not helpful. A branch is a moving ref, which implies a series of commits. That is how most people think and talk about branches.
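The two views can be put side by side in a toy model. ToyRepo and Commit below are hypothetical names for this sketch and have nothing to do with git’s real object format: the refs hash is the "just a ref" view, while log shows the series of commits that people usually mean by a branch.

```ruby
# Toy model of branches as moving refs. ToyRepo and Commit are hypothetical
# and only illustrate the idea; this is not how git stores objects.
Commit = Struct.new(:id, :parent_id, :message)

class ToyRepo
  attr_reader :refs

  def initialize
    @commits = {} # id => Commit
    @refs = {}    # branch name => commit id (the "just a ref" view)
    @next_id = 0
  end

  # Committing on a branch creates a commit whose parent is the current tip,
  # then moves the ref forward to the new commit.
  def commit(branch, message)
    @next_id += 1
    id = "c#{@next_id}"
    @commits[id] = Commit.new(id, @refs[branch], message)
    @refs[branch] = id
    id
  end

  # Creating a branch only copies a ref; no commits are created.
  def branch(new_name, from:)
    @refs[new_name] = @refs.fetch(from)
  end

  # The series of commits a branch implies: walk parents from the tip.
  def log(branch)
    ids = []
    id = @refs[branch]
    while id
      ids << id
      id = @commits.fetch(id).parent_id
    end
    ids
  end
end

repo = ToyRepo.new
repo.commit("main", "initial")
repo.commit("main", "add readme")
repo.branch("feature", from: "main")
repo.commit("feature", "start feature")

puts repo.refs.inspect
puts repo.log("feature").inspect
puts repo.log("main").inspect
```

Each ref holds a single commit id, yet asking for the log of a branch returns its whole history, which is the mismatch the article is pointing at.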
-
TIL: rails restart
I first started writing Rails in 2010. Today I learned on the Ruby on Rails Blog that you can restart a running server in development with:
$ bin/rails restart
Up until today, I always quit my server (ctrl-c) and restarted it when I wanted to pick up a change that won’t be hot-reloaded (e.g. a change to an initializer). This works, but it is slower, especially when I am using foreman to start a fleet of processes (e.g. webpacker, background workers).
Learning is a life-long process.