• Background long-running git hooks

    A script that I’ve been using for years stopped working as expected after I upgraded bash and git. I use ctags to navigate code in my editor (currently Atom). To automate the generation of the tags file, I run the ctags executable from git hooks (post-commit, post-merge, and post-checkout), which fits well with my development workflow.

    Some of the projects I work with are quite large, and the ctags invocation can take longer than 30 seconds. To avoid waiting that long on each commit, I background the invocation. The hook, which had worked for years, looked like this:

    #!/usr/bin/env bash
    # Regenerate ctags
    
    # Only run one ctags process for this directory at a time.
    # Otherwise the ctags file is corrupted
    (lockfile .ctags.lock; \
     ctags -R --exclude='*.js' --exclude='*.h' --exclude='*.cpp' &> /dev/null ; \
     rm -f .ctags.lock) &
    

    The lockfile usage prevents multiple copies of ctags from running at the same time, which can happen when the hook is invoked often (like when committing multiple times in quick succession). The (..) invokes the commands inside in a sub-shell, and the & at the end tells bash to background the work and continue.

    I’ve been using this for years without issue, until I recently upgraded both git and bash on my machine. The invocation above continued to generate the tags as expected, but instead of backgrounding the work, the git hook would block until ctags finished.

    I could not find anything related to that in either git's or bash's release notes. StackOverflow provided several tips about using nohup or disown, but neither helped.

    Eventually, what did work was redirecting the output of the sub-shell, instead of redirecting the output of ctags alone:

    (lockfile .ctags.lock; \
      ctags -R --exclude='*.js' --exclude='*.h' --exclude='*.cpp' ;\
      rm -f .ctags.lock) &> /dev/null &
    

    When the sub-shell is instantiated, its stdout and stderr are connected to the parent process (i.e. the git hook). My best guess is that after the upgrade, the hook invocation now waited until the sub-shell exited, because its std{out,err} were connected to the sub-shell's. With the new invocation, the (..) &> /dev/null disconnects the whole sub-shell's output streams from the hook's, by redirecting them to /dev/null. The hook's process can then safely close its own std{out,err} and exit.
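
    To see the effect outside of git, here is a minimal sketch (my own illustration, assuming the guess above is right and the caller reads the hook's output until EOF). Piping each variant into cat emulates such a caller; only the variant whose backgrounded sub-shell inherits the script's stdout makes the caller wait:

    #!/usr/bin/env bash
    # Illustration only: emulate a caller that, like git, reads the
    # hook's output until EOF before continuing.

    # The backgrounded sub-shell inherits our stdout and keeps it open.
    slow_hook() { (sleep 5) & }

    # The sub-shell's std{out,err} point at /dev/null instead.
    fast_hook() { (sleep 5) &> /dev/null & }

    time { slow_hook | cat; }   # ~5 seconds: cat waits for EOF
    time { fast_hook | cat; }   # returns immediately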

    Read on →

  • The REPL: Issue 92 - April 2022

    The Dunning-Kruger Effect is Autocorrelation

    This article is fascinating. The argument is that the well-known Dunning-Kruger effect (i.e. unskilled people overestimate their skill) is not a psychological effect. Rather, it is a statistical mistake. It is an artifact of autocorrelation: comparing a variable to itself.

    Refactoring Ruby with Monads

    Tom Stuart is a great, clear writer. This article does an excellent job of introducing the usefulness of monads, explaining them from the ground up, without mathematical pretentiousness.

    Ruby Shell-Out Flow Chart

    Ruby supports many ways of shelling out to external commands. This excellent flow chart from a StackOverflow answer tells you which one to use. I am reposting it mainly so that I can find it again easily!


    Read on →

  • Testing Unix Utilities With RSpec

    I maintain a series of small unix scripts to make my daily work more effective. I approach the development of these utilities like I do my other software: using Outside-In Test-Driven Development. I use rspec to write my tests, even if the code itself is written in bash, zsh, or ruby. Let's see a few examples.

    Testing Output

    Some of my utilities are similar to pure functions: They always return the same output for the same input, and they don't have side effects (i.e. they don't change anything else in the system).

    One of my most often used utilities is jira_ticket_number. Given a string, it extracts the Jira ticket number from it. I typically don't call it directly, but use it in other scripts. In my typical workflow, I'll create a branch for the ticket I am working on, and include the ticket number in the name (e.g. ys/CF-8176_rework_request_sweeper). This is useful in a few ways. I use it in another utility, jira, to construct and open a URL to the ticket. This saves me several clicks. I also use it to automatically prepend the ticket number to new commit messages via a custom git prepare-commit-msg hook.
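
    The implementation is not the interesting part; it can be little more than a grep one-liner. The following is a rough sketch of the idea, not necessarily the actual script, assuming the usual Jira PROJECT-123 ticket format:

    #!/usr/bin/env bash
    # Sketch of jira_ticket_number (illustrative, not the real implementation):
    # print the first Jira-style ticket number found in the argument or stdin.
    input="${1:-$(cat)}"
    grep -oE '[A-Z]+-[0-9]+' <<< "$input" | head -n 1

    For example, jira_ticket_number ys/CF-8176_rework_request_sweeper prints CF-8176.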

    The specs for jira_ticket_number:

    Read on →

  • The REPL: Issue 91 - March 2022

    One Way Smart Developers Make Bad Strategic Decisions

    So now, when I hear about top-down standardization efforts, I get a little worried because I know that trying to generalize across many problems is a fraught endeavor. What’s better is getting to know a specific problem by working collaboratively and embedding with the people who have the most tacit knowledge of the problem. Standardization and top-down edicts fail when they miss or ignore the implicit understandings of people close to the problem

    Hints for writing Unix tools

    General good advice on how to design unix tools. I summarize it as: “Design your unix tools to be composable”.

    The Code Review Pyramid

    The graphic speaks for itself: Spend more time at the bottom than at the top. Automate what is possible.

    Read on →

  • Finding Broken Links

    HTML powers the web, in great part by providing a way to link to other content. Every website maintainer dreads broken links: those that, when followed, lead to a document that is no longer there.

    I remember that when I first learned to hand-write HTML (yes, last century) I used a Windows utility called Xenu’s Link Sleuth. It allowed me to check my site for broken links. I don’t use Windows anymore, but wget turns out to have everything I need.

    Based on an article by Digital Ocean, I created a script that checks for broken internal1 links:

    #!/usr/bin/env bash
    # Finds broken links for a site
    #
    # Usage
    # find_broken_links http://localhost:3000
    
    ! wget --spider --recursive --no-directories --no-verbose "$1" 2>&1 | grep -B1 -E '(broken link!|failed:)'
    

    It uses wget to spider (or crawl) a given URL and recursively check all links. All output is redirected and filtered to print only the broken links or other failures. The ! before the invocation inverts the pipeline's exit status: grep returns a non-zero (error) code when it finds no matches, but in this case no matches means no broken links, which we consider a success.

    Running it against this blog found 3 broken links!

    Now, my Makefile has a test target:

    test:
      find_broken_links http://127.0.0.1:4000
    

    I run it before every deployment (including posting this very post), to ensure I have not introduced a bad link :-)

    1. By default, wget will not spider links on other hosts, but it can be configured with --span-hosts to do so, to also check that external links are still valid. While I consider a broken internal link something that I must fix, a broken external link is something that another website operator broke. Their URL is no longer valid, but I don't necessarily want to do anything about it.
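
    If you do want to check external links as well, a variant could look like this sketch: it only adds wget's --span-hosts flag to the same pipeline (note that the recursive crawl will then also follow links on the external sites themselves, so you may want to bound it with --level):

    #!/usr/bin/env bash
    # Sketch: same check as above, but also spider links on other hosts.
    ! wget --spider --recursive --span-hosts --no-directories --no-verbose "$1" 2>&1 | grep -B1 -E '(broken link!|failed:)'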

    Read on →