-
The REPL: Issue 106 - June 2023
How First Principles Thinking Fails
Thinking from first principles is not a silver bullet. When reasoning from first principles, you may fail if:
- You have flawed assumptions.
- You make a mistake in one of your inference steps (an inference step is a link in the chain of reasoning of your argument).
- You start from the wrong set of principles/axioms/base facts.
- You reason upward from the correct base, but end up at a ‘useless’ level of abstraction.
I’ve been thinking a lot about the third one. If you don’t know certain facts, and you don’t know that you don’t know them, you can reach wrong conclusions.
For example, if you only know Newtonian physics, you can calculate the orbits of planets very precisely, but you will come to the wrong conclusion about Mercury’s orbit. You need to know about Relativity to get that orbit correct.
In essence, the “unknown unknowns” can get you.
Speed matters: Why working quickly is more important than it seems
If you are fast at something, of course you can do more of that thing in a given time. The author proposes that you also perceive a lower cost to doing that thing, which in itself lowers the barrier to doing more of it. The conclusion is that if you want to do more of something, you should try getting faster at it.
How do you get faster at something? In running (the sport), you get faster by doing speed work, sprints, and the like. You also get faster by running longer and longer distances at a relatively slow pace. The physiology is interesting, but not my point. In guitar playing, a very effective technique for learning a fast solo or transition is to use a metronome: slow the section down, say to 50% speed, practice until you are proficient at it, then increase to 60%, and so on, until you can play it at 100%.
While I agree that doing something faster promotes you doing more of it, it is not always intuitive how to get faster.
I’m an ER doctor. Here’s how I’m already using ChatGPT to help treat patients
This resonates with me: in its current form, ChatGPT is already very usable for generating text that you can verify is correct. In this example, the doctor can read the text and tell whether it is empathetic, as he requested, and accurate. It saved him from having to type it, and maybe even did something he could not: communicate with more empathy. In any case, the doctor was 100% capable of evaluating the text produced.
In my personal use of LLMs, they can and will suggest code snippets. That saves me the time of reading the API documentation of several classes. I can evaluate the result for correctness and, even more importantly, I will adapt it for my use case, which will include tests for formally verifying that the code indeed works as expected.
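As a sketch of that verify-with-tests workflow (the helper and test here are hypothetical, not from any real LLM session): suppose a model suggested a small string helper; a quick unit test confirms it before I trust it.

```ruby
require "minitest/autorun"

# Hypothetical helper an LLM might suggest for turning titles into URL slugs.
def slugify(title)
  title.downcase.strip.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
end

# The test is mine: it formally verifies the suggestion works as expected.
class SlugifyTest < Minitest::Test
  def test_strips_punctuation_and_whitespace
    assert_equal "hello-world", slugify("  Hello, World!  ")
  end
end
```

If the suggestion is wrong, the test fails and I fix or discard it; either way, the evaluation burden stays with me, not the model.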
I am not sure about the drunken-intern analogy: I probably would have phrased it as a sleep-deprived intern instead. The intern heuristic is useful, though. Getting back to my code usage: it is useful to think of LLMs as an intern that produces code very fast, but whose output I have to evaluate. “Hey intern, how do I add a custom header to a web request using Ruby’s ‘rest’ gem?”. I am capable of evaluating the result and making whatever corrections are needed. Time saved.
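I won’t guess at the ‘rest’ gem’s exact API, but the kind of answer I would expect back looks like this sketch using the standard library’s `Net::HTTP` instead (the URL and header name are made up):

```ruby
require "net/http"
require "uri"

# Build a GET request and attach a custom header before sending it.
uri = URI("https://api.example.com/widgets")
request = Net::HTTP::Get.new(uri)
request["X-Custom-Header"] = "my-value" # headers are set like hash entries

# Actually sending it would be:
#   Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(request) }
puts request["X-Custom-Header"]
```

The point stands either way: I can read this, compare it against the docs if anything looks off, and correct it, faster than writing it cold.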
-
Surprise in Arel's API
`arel` – a relational algebra library – is the Ruby library that powers `ActiveRecord`. It provides a lower-level abstraction for working with SQL than `ActiveRecord`, and is typically used when a particular query is not possible with `ActiveRecord`. Over the years, I’ve reached for it on occasion. The definitive guide to Arel, the SQL manager for Ruby provides good information if you haven’t used it before.

I was recently working with a complicated query, and spent way more time than necessary because I assumed some things about its API that turned out not to be true.
Let’s first observe how `ActiveRecord::Relation` objects behave:

```ruby
ar_relation = User.where.not(active_thru: nil)
ar_relation.to_sql
# => SELECT "users".* FROM "users" WHERE "users"."active_thru" IS NOT NULL

ar_relation.where(id: 5).to_sql
# => SELECT "users".* FROM "users" WHERE "users"."active_thru" IS NOT NULL AND "users"."id" = 5

ar_relation.to_sql
# => SELECT "users".* FROM "users" WHERE "users"."active_thru" IS NOT NULL
```
Notice how calling a second `where` on a relation returns a new relation without modifying the original. This allows composing relations in Rails to great effect.

However, that is not true of `Arel::SelectManager` objects:

```ruby
table = User.arel_table
arel_manager = table.where(table[:active_thru].not_eq(nil))

arel_manager.to_sql
# => SELECT FROM "users" WHERE "users"."active_thru" IS NOT NULL

arel_manager.where(table[:id].eq(5)).to_sql
# => SELECT FROM "users" WHERE "users"."active_thru" IS NOT NULL AND "users"."id" = 5

arel_manager.to_sql
# => SELECT FROM "users" WHERE "users"."active_thru" IS NOT NULL AND "users"."id" = 5
```
Notice how adding a new `where` clause modified the original object.

Caveat Emptor.
-
The REPL: Issue 105 - May 2023
The Statistics Handbook
I’ve been taking a statistics course on Coursera. The lectures and exercises are great, but I was really missing a textbook that I could come back to and reference. I was happy to find this gem: free and available to download.
The definitive guide to Arel, the SQL manager for Ruby
Recently, I’ve been writing more complicated SQL queries in Rails, for which the `ActiveRecord` API is not enough. Enter `Arel`, the relational algebra library on which `ActiveRecord` is built, which allows more flexibility when using Rails. This is a great guide to using it. Note that `Arel` is considered a private API in Rails; I’ve found it to be very stable, but be mindful when using it.
Introducing Tobox
This gem attempts to solve the write-to-multiple-databases problem when using background-processing libraries in Ruby (e.g. Sidekiq). In effect, this is an event system, without heavily describing itself as such. The problem is stated more concisely in Pattern: Transactional Outbox. The gem is new, so I can’t comment on its maturity or stability. The author is right in pointing out that Rails’s ActiveJob DSL allows easy backgrounding of jobs, but ignores the transactionality and dual-write problems that might exist. In fact, I was talking about this with some co-workers recently. One of the benefits of using GoodJob is that, since the queue storage is in the same database, we can ignore this problem as long as we use a transaction.
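A toy sketch of the dual-write hazard being addressed (all names hypothetical; the database and queue are modeled as plain arrays): the database write and the broker enqueue are two separate commits, so a failure between them leaves the two stores disagreeing.

```ruby
# In a real system `db` would be e.g. Postgres and `queue` a Redis-backed
# broker. A broker failure after the DB write leaves an order saved with
# no notification job enqueued.
def handle_order(db, queue, broker_down: false)
  db << :order_saved                 # commit 1: the database
  raise "broker down" if broker_down # crash between the two writes
  queue << :notify_job               # commit 2: the message broker
end

db, queue = [], []
handle_order(db, queue, broker_down: true) rescue nil
db    # has the order saved
queue # empty: the two stores now disagree
```

A database-backed queue like GoodJob sidesteps this because both writes go to the same database and can commit atomically inside one transaction; a transactional outbox achieves the same by writing the outgoing event to the database first and relaying it to the broker afterwards.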
-
GoodJob Bulk Enqueue
A common pattern in Rails applications is to queue similar jobs for a collection of objects. For example:

```ruby
post.watchers.find_each do |user|
  NotifyOfChanges.perform_later(user, post)
end
```
The above will generate one `INSERT` SQL statement for each job queued. I recently noticed that GoodJob introduced a bulk enqueue feature. It allows using a single `INSERT` statement for all those jobs, similar to Rails’s `#insert_all`:

```ruby
GoodJob::Bulk.enqueue do
  post.watchers.find_each do |user|
    NotifyOfChanges.perform_later(user, post)
  end
end
```
Let’s see what the performance is locally:
```ruby
class NoOpJob < ApplicationJob
  def perform
  end
end

require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(:time => 10, :warmup => 5)

  x.report('Single Inserts') {
    ApplicationRecord.transaction do
      500.times { NoOpJob.perform_later }
    end
  }

  x.report('Bulk Inserts') {
    ApplicationRecord.transaction do
      GoodJob::Bulk.enqueue do
        500.times { NoOpJob.perform_later }
      end
    end
  }

  x.compare!
end
```
```
$ rails runner benchmark.rb
Running via Spring preloader in process 46655
Warming up --------------------------------------
      Single Inserts     1.000 i/100ms
        Bulk Inserts     1.000 i/100ms
Calculating -------------------------------------
      Single Inserts      0.833 (± 0.0%) i/s -      9.000 in  10.823196s
        Bulk Inserts      4.746 (± 0.0%) i/s -     48.000 in  10.155956s

Comparison:
        Bulk Inserts:        4.7 i/s
      Single Inserts:        0.8 i/s - 5.70x slower
```
Locally, we can see a significant performance boost due to fewer round trips to the database. But using bulk enqueue can be even more impactful than that. Production systems typically see much more concurrent load than my local machine does. When the queueing is wrapped in a transaction, it can be very disruptive: long-running transactions can slow the whole system down. Bulk-inserting records is a great way to keep transactions short, and the GoodJob feature provides an easy way to do that while keeping the semantics of the code the same.
-
The REPL: Issue 104 - April 2023
Making A Network Call: Mitigate The Risk
Nate Berkopec, well known for his Ruby/Rails performance work, offers good advice for mitigating the performance risk of making network calls: make calls in background jobs whenever possible, set aggressive network timeouts, and use circuit breakers to fail fast when you detect that a system is misbehaving.
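To illustrate the circuit-breaker idea, here is a from-scratch sketch (not any particular gem’s API): after a threshold of consecutive failures the breaker opens and fails fast, instead of letting every caller wait on a sick dependency.

```ruby
class CircuitBreaker
  class OpenError < StandardError; end

  def initialize(threshold: 3, reset_after: 30)
    @threshold = threshold     # consecutive failures before opening
    @reset_after = reset_after # seconds before allowing a trial call again
    @failures = 0
    @opened_at = nil
  end

  # Wraps a network call; raises OpenError immediately while the circuit is open.
  def call
    raise OpenError, "failing fast: circuit is open" if open?
    result = yield
    @failures = 0 # a success closes the circuit
    @opened_at = nil
    result
  rescue OpenError
    raise
  rescue StandardError
    @failures += 1
    @opened_at = Time.now if @failures >= @threshold
    raise
  end

  def open?
    @opened_at && (Time.now - @opened_at) < @reset_after
  end
end
```

Combined with aggressive timeouts on the wrapped call itself, a misbehaving downstream costs each caller a fast exception rather than a hung request.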
I’m not saying this is easy, I’m saying it’s necessary.
Makefile Tutorial By Example
`make` is tried and true technology. I don’t write `Makefile`s often. When I do, having a mental model of how `make` treats dependencies helps make the whole enterprise more efficient and enjoyable. This guide has plenty of material to get you started.
Pure sh Bible
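The mental model in question: a target is rebuilt only when it is older than one of its prerequisites. A minimal sketch (file names invented):

```make
# app is relinked only if main.o or util.o is newer than it;
# each .o is recompiled only if its .c file changed.
app: main.o util.o
	cc -o app main.o util.o

main.o: main.c
	cc -c main.c

util.o: util.c
	cc -c util.c

# 'clean' names no real file, so mark it phony to make it always run.
.PHONY: clean
clean:
	rm -f app main.o util.o
```

Touch `util.c` and only `util.o` and `app` get rebuilt; that timestamp comparison between target and prerequisites is the whole dependency model.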
A very ingenious collection of recipes for `sh` that avoid spawning new processes. Some of the syntax is clever, but terrifying to read. Case in point:

```sh
trim_string() {
    # Strip leading whitespace, then trailing whitespace, using only
    # parameter expansion: no subshells, no external commands.
    trim=${1#${1%%[![:space:]]*}}
    trim=${trim%${trim##*[![:space:]]}}
    printf '%s\n' "$trim"
}
```