-
The REPL: Issue 123 - November 2024
Plugging the Postgres Upgrade Hole
Upgrading Postgres is hard. I’ve felt the pain, even upgrading local databases. The article outlines why that is the case. Maybe there is some hope for the future.
What is a Staff Engineer?
It uses a new-to-me framework for evaluating a job on 4 levels: core technical skill, product management, project management, and people management. It then talks about what a Staff engineer does at each of these levels.
How we made a Ruby method 200x faster
A tale of performance optimization by using a profiler. How to focus on the important parts of a flame graph still seems a bit like a dark art. And not the key point of the article, but the refactor looks really nice!
-
A Rails Migration Foot Gun
I recently discovered a foot gun when writing rails migrations.
Rails runs migrations inside a transaction by default, for those databases that support it (e.g. Postgres). It also provides a way to disable that behavior if you so choose, by using `disable_ddl_transaction!`. That can be useful, for example, for creating a large index concurrently, which is not supported inside a transaction. It looks like this:

```ruby
class FootGun < ActiveRecord::Migration[7.2]
  disable_ddl_transaction!

  def change
    create_table :foot_guns
  end
end
```
So far, so good. However, because of how `disable_ddl_transaction!` is implemented, there is also a `disable_ddl_transaction` method defined. That is an accessor that checks whether the migration should be run in a transaction or not. But it can be used by mistake:

```ruby
class FootGun < ActiveRecord::Migration[7.2]
  disable_ddl_transaction # This doesn't do anything!!!

  def change
    create_table :foot_guns
  end
end
```
The migration looks like it is disabling the transaction, but it's actually not. It's also a hard mistake to catch, because the output Rails prints when running the migration is the same in both cases:
```
$ rails db:migrate
== 20241116193728 FootGun: migrating ==========================================
-- create_table(:foot_guns)
   -> 0.0137s
== 20241116193728 FootGun: migrated (0.0158s) =================================
```
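To see why the no-op version slips through, here is a minimal, hypothetical sketch (not Rails' actual source; the class names are made up) of how a bang-writer/reader accessor pair can produce this trap:

```ruby
# Hypothetical sketch of an accessor pair like Rails' -- the class
# names and implementation details here are made up for illustration.
class Migration
  # The bang method is the writer: it opts the migration out of the
  # wrapping transaction.
  def self.disable_ddl_transaction!
    @disable_ddl_transaction = true
  end

  # The bare method is the reader the framework consults. Calling it
  # in a class body reads the flag and discards the result: a no-op.
  def self.disable_ddl_transaction
    @disable_ddl_transaction || false
  end
end

class Oops < Migration
  disable_ddl_transaction # reads the flag, changes nothing
end

class Intended < Migration
  disable_ddl_transaction! # actually sets the flag
end

Oops.disable_ddl_transaction     # => false: still runs in a transaction
Intended.disable_ddl_transaction # => true
```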
I'd love for `disable_ddl_transaction` not to exist at all, so that a `NameError` would be raised and this mistake would be impossible to make.
-
The REPL: Issue 122 - October 2024
Waiting for PostgreSQL 18 – Add temporal PRIMARY KEY and UNIQUE constraints
In this article, and a follow-up, we learn about upcoming changes in Postgres 18 that will make temporal modeling much easier. A welcome change. Maybe soon after that, we can get libraries in popular web frameworks to leverage it.
Rightward assignment in Ruby
It's now possible to use rightward (`=>`) assignment in Ruby. The tweet talks about using it in "pipelines":

```ruby
rand(100)
  .then { _1 * 2 }
  .then { _1 - 3 } => value
value # => 7
```
I am very fond of pipelines like that, but feel that the `=>` is not very visible. What I want to write is:

```ruby
rand(100)
  .then { _1 * 2 }
  .then { _1 - 3 }
  => value
```
But that doesn't work, because the parser balks. I can use a `\`, but that makes it awkward:

```ruby
rand(100)
  .then { _1 * 2 }
  .then { _1 - 3 } \
  => value
value # => 87
```
Goodhart’s Law Isn’t as Useful as You Might Think
> When a measure becomes a target, it ceases to be a good measure
Long dive into concepts from operations research that go deeper than the pithy “law” and explain the mechanisms at play.
-
Postgres default values as a backfill method
Often, I want to add a new column to a Postgres table with a default value for new records, but I also want existing records to have a different value. Changing the Postgres default value can make this a very fast operation.
Let's see an example. Let's assume we have a `songs` table, and we want to add a `liked` column. Existing records need to have the value set to `false`, while new records have it set to `true`.

Table and initial data setup:
```sql
CREATE TABLE songs (
    name character varying NOT NULL
);
-- CREATE TABLE
-- Time: 16.084 ms

INSERT INTO songs(name) VALUES ('Stairway To Heaven');
-- INSERT 0 1
-- Time: 0.590 ms

SELECT * FROM songs;
--         name
-- --------------------
--  Stairway To Heaven
-- (1 row)
--
-- Time: 0.652 ms
```
Now, let's add the new column with a default value of `false`. That is not our end goal, but it will add that value to existing records[^1]:

```sql
ALTER TABLE songs ADD COLUMN liked boolean DEFAULT false;
-- ALTER TABLE
-- Time: 3.745 ms

SELECT * FROM songs;
--         name        | liked
-- --------------------+-------
--  Stairway To Heaven | f
-- (1 row)
--
-- Time: 0.672 ms

ALTER TABLE songs ALTER COLUMN liked SET NOT NULL;
-- ALTER TABLE
-- Time: 1.108 ms
```
Now, if we change the default value to `true`, and insert a new record:

```sql
ALTER TABLE songs ALTER COLUMN liked SET DEFAULT true;
-- ALTER TABLE
-- Time: 4.664 ms

INSERT INTO songs(name) VALUES ('Hotel California');
-- INSERT 0 1
-- Time: 1.447 ms

SELECT * FROM songs;
--         name        | liked
-- --------------------+-------
--  Stairway To Heaven | f
--  Hotel California   | t
-- (2 rows)
--
-- Time: 0.791 ms
```
As we can see, the schema is in the shape we want, with the correct data stored in it, without needing a "traditional" backfill that modifies each existing row. The default-value method is much faster, since Postgres doesn't need to update each record: it just tracks which default value was in effect when each row was created. 👍🏻
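For Rails users, the same sequence can be expressed in a migration. This is a hypothetical sketch (the class name is made up, and it assumes the `songs` table already exists); `add_column` with a default and `change_column_default` are the standard Active Record helpers:

```ruby
# Hypothetical migration mirroring the SQL above.
class AddLikedToSongs < ActiveRecord::Migration[7.2]
  def up
    # Existing rows get `false`, without a table rewrite (Postgres 11+).
    add_column :songs, :liked, :boolean, default: false, null: false
    # New rows get `true` from here on.
    change_column_default :songs, :liked, from: false, to: true
  end

  def down
    remove_column :songs, :liked
  end
end
```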
-
[^1]: Stairway To Heaven is excellent. I'm not implying that I don't like it. I do. It's an anthem.
-
The REPL: Issue 121 - September 2024
OAuth from First Principles
The article explains how the problem of sharing access between servers evolves into OAuth, as one starts solving the security issues in the naive implementation.
The “email is authentication” pattern
For some people, logging into a website means using the "forgot your password" flow every time they want to log in. They do so in lieu of other schemes, like reusing the same password, using a password manager, or using a password-generation scheme.
Are people well informed about the options? From the website's perspective it doesn't matter much: essentially, having access to an email address grants you access to the website. As long as that is the case, we might as well use "magic links" for authentication and do away with passwords altogether.
In fact, in many places, email is now also used as 2-factor authentication. If the website has a “forgot my password” flow via email, then 2-factor via email only adds the illusion of security.
Solid Queue 1.0 released
I'm happy about this development: Rails should definitely have a canonical queue implementation. I'm also interested in its performance because of the `FOR UPDATE SKIP LOCKED` usage. I plan on evaluating it in the future vs GoodJob. I noticed a few things about the announcement:

- 37signals' production setup claims 20M jobs per day with 800 workers. That seems like a lot of workers, but it's hard to judge without the context of what those workers are doing.
- They are using a separate database for the queue. While I get that it alleviates some of the performance concerns with the main database, you also lose transactionality between your jobs and the rest of your writes. To me, transactionality is one of the main selling points of a db-based queueing system. I've chased many production issues where using a separate data store for the jobs causes the queue workers to look for records that are not visible, either temporarily due to a race condition, or permanently because of a rollback. Using 2 separate databases also means that each Rails process (web or worker) needs a connection to each database.
- In the announcement there is a link to a recently fixed Postgres-only issue, which made me realize that Solid Queue has concurrency controls built in, and uses `INSERT ... ON CONFLICT DO NOTHING` to enforce them. That is clever, and more efficient than checking for the existence of the concurrency key before inserting.
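The difference between the two approaches can be sketched in plain Ruby. This is a hypothetical illustration (the method names are made up), with a `Set` standing in for the table that holds concurrency keys:

```ruby
require "set"

# Check-then-insert: two separate steps. Between the check and the
# insert, another worker can claim the same key (a race window).
def claim_by_checking(keys, key)
  return false if keys.include?(key)
  keys.add(key)
  true
end

# Insert-or-nothing: a single step, analogous to
# INSERT ... ON CONFLICT DO NOTHING. Set#add? returns nil when the
# key was already present, so a claim succeeds at most once.
def claim_atomically(keys, key)
  !keys.add?(key).nil?
end

keys = Set.new
claim_atomically(keys, "mailer/42") # => true: first claim wins
claim_atomically(keys, "mailer/42") # => false: duplicate is a no-op
```

In a real database the atomic version also pushes the race handling down to the unique index, instead of relying on application-level locking.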