• The REPL: Issue 123 - November 2024

    Plugging the Postgres Upgrade Hole

    Upgrading Postgres is hard. I’ve felt the pain, even when upgrading local databases. The article outlines why that is the case. Maybe there is some hope for the future.

    What is a Staff Engineer?

    It uses a new-to-me framework for evaluating a job on 4 levels: core technical skill, product management, project management, and people management. It then talks about what a Staff engineer does at each of these levels.

    How we made a Ruby method 200x faster

    A tale of performance optimization by using a profiler. How to focus on the important parts of a flame graph still seems a bit like a dark art. And not the key point of the article, but the refactor looks really nice!

    Read on →

  • A Rails Migration Foot Gun

    I recently discovered a foot gun when writing rails migrations.

    Rails runs migrations inside a transaction by default, for those databases that support it (e.g. Postgres). It also provides a way to disable it if you so choose, by using disable_ddl_transaction!. That can be useful, for example, for creating a large index concurrently, which is not supported inside a transaction. It looks like this:

    class FootGun < ActiveRecord::Migration[7.2]
      disable_ddl_transaction!
    
      def change
        create_table :foot_guns
      end
    end
    

    So far, so good. However, because of how disable_ddl_transaction! is implemented, there is also a disable_ddl_transaction method defined. That is an accessor that checks whether the migration should be run in a transaction or not. But it can be used by mistake:

    class FootGun < ActiveRecord::Migration[7.2]
      disable_ddl_transaction # This doesn't do anything!!!
    
      def change
        create_table :foot_guns
      end
    end
    

    The migration looks like it is disabling the transaction, but it’s actually not. It’s also a hard mistake to catch, because the output Rails prints when running the migration is the same in both cases:

    $ rails db:migrate
    == 20241116193728 FootGun: migrating ==========================================
    -- create_table(:foot_guns)
       -> 0.0137s
    == 20241116193728 FootGun: migrated (0.0158s) =================================
    

    I’d love for disable_ddl_transaction not to exist at all, so that a NameError would be raised, making this mistake impossible.
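
    Until then, one way to guard against it is a check in CI that flags the bang-less call in migration files. A minimal sketch — the helper name and regex are mine, not anything Rails provides:

```ruby
# Hypothetical lint: flag a bare disable_ddl_transaction call (no "!"),
# which reads like it disables the transaction but silently does nothing.
def bare_disable_ddl_lines(source)
  source.each_line.with_index(1)
        .select { |line, _| line.match?(/^\s*disable_ddl_transaction\s*(#|$)/) }
        .map { |_, number| number }
end

good = <<~RUBY
  class FootGun < ActiveRecord::Migration[7.2]
    disable_ddl_transaction!
  end
RUBY

bad = <<~RUBY
  class FootGun < ActiveRecord::Migration[7.2]
    disable_ddl_transaction # This doesn't do anything!!!
  end
RUBY

bare_disable_ddl_lines(good) # => []
bare_disable_ddl_lines(bad)  # => [2]
```

    Running a check like this over db/migrate/ in CI would catch the mistake before it ships.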

    Read on →

  • The REPL: Issue 122 - October 2024

    Waiting for PostgreSQL 18 – Add temporal PRIMARY KEY and UNIQUE constraints

    In this article, and a follow up we learn about upcoming changes to Postgres 18 that will make temporal modeling much easier. A welcome change. Maybe soon after that we can get libraries to leverage it in popular web frameworks.

    Rightward assignment in Ruby

    It’s now possible to use rightward (=>) assignment in Ruby. The tweet talks about using it in “pipelines”:

    rand(100)
      .then { _1 * 2 }
      .then { _1 - 3 } => value
    
    value # => 7
    

    I am very fond of pipelines like that, but feel that the => is not very visible. What I want to write is:

    rand(100)
      .then { _1 * 2 }
      .then { _1 - 3 }
      => value
    

    But that doesn’t work, because the parser balks. I can use a trailing \, but that makes it awkward:

    rand(100)
      .then { _1 * 2 }
      .then { _1 - 3 } \
      => value
    
    value # => 87
    
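    Worth noting: the right-hand side of => is a pattern, so rightward assignment can also destructure. A small example of my own (values are illustrative):

```ruby
# Rightward assignment takes a pattern on the right, so it can
# destructure a hash or an array in one step (Ruby 3.1+):
{ name: "Stairway To Heaven", liked: false } => { name:, liked: }
name  # => "Stairway To Heaven"
liked # => false

[1, 2, 3] => [first, *rest]
rest # => [2, 3]
```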

    Goodhart’s Law Isn’t as Useful as You Might Think

    when a measure becomes a target, it ceases to be a good measure

    Long dive into concepts from operations research that go deeper than the pithy “law” and explain the mechanisms at play.

    Read on →

  • Postgres default values as a backfill method

    Often, I want to add a new column to a Postgres table with a default value for new records, but I also want existing records to have a different value. Changing the column’s default value in Postgres makes this a very fast operation.

    Let’s see an example. Assume we have a songs table, and we want to add a liked column. Existing records should have the value set to false, while new records should have it set to true.

    Table and initial data setup:

    CREATE TABLE songs (
      name character varying NOT NULL
    );
    -- CREATE TABLE
    -- Time: 16.084 ms
    
    INSERT INTO songs(name) VALUES ('Stairway To Heaven');
    -- INSERT 0 1
    -- Time: 0.590 ms
    
    SELECT * FROM songs;
    --         name
    -- --------------------
    --  Stairway To Heaven
    -- (1 row)
    --
    -- Time: 0.652 ms
    

    Now, let’s add the new column with a default value of false. That is not our end-goal, but it will add that value to existing records1:

    ALTER TABLE songs
    ADD COLUMN liked boolean DEFAULT false;
    -- ALTER TABLE
    -- Time: 3.745 ms
    
    SELECT * FROM songs;
    --         name        | liked
    -- --------------------+-------
    --  Stairway To Heaven | f
    -- (1 row)
    --
    -- Time: 0.672 ms
    
    ALTER TABLE songs
    ALTER COLUMN liked SET NOT NULL;
    -- ALTER TABLE
    -- Time: 1.108 ms
    

    Now, if we change the default value to true, and insert a new record:

    ALTER TABLE songs ALTER COLUMN liked SET DEFAULT true;
    -- ALTER TABLE
    -- Time: 4.664 ms
    
    INSERT INTO songs(name) VALUES ('Hotel California');
    -- INSERT 0 1
    -- Time: 1.447 ms
    
    SELECT * FROM songs;
    --
    --         name        | liked
    -- --------------------+-------
    --  Stairway To Heaven | f
    --  Hotel California   | t
    -- (2 rows)
    --
    -- Time: 0.791 ms
    

    As we can see, the schema is in the shape that we want, with the correct data stored in it, without needing a “traditional” backfill that modifies each existing row. The default-value method is much faster: since Postgres 11, adding a column with a constant default doesn’t rewrite the table; the default is stored in the catalog and applied to existing rows when they are read. 👍🏻

    1. Stairway To Heaven is excellent. I’m not implying that I don’t like it. I do. It’s an anthem. 

    Read on →

  • The REPL: Issue 121 - September 2024

    OAuth from First Principles

    The article explains how the problem of sharing access between servers evolves into OAuth, as one starts solving the security issues in the naive implementation.

    The “email is authentication” pattern

    For some people, logging into a website means using the “forgot your password” flow every time they want to log in. They do so in lieu of other schemes, like reusing the same password, using a password manager, or using a password-generation scheme.

    Are people well informed about the options? From the website’s perspective it doesn’t matter much: essentially, having access to an email address grants you access to the website. As long as that is the case, we might as well use “magic links” for authentication and do away with passwords altogether.

    In fact, in many places, email is now also used as 2-factor authentication. If the website has a “forgot my password” flow via email, then 2-factor via email only adds the illusion of security.
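
    A magic-link flow can be as simple as a signed, expiring token embedded in the emailed link. A minimal sketch — the function names and the 15-minute expiry are my assumptions, not any specific library’s API:

```ruby
require "openssl"
require "securerandom"

SECRET = SecureRandom.hex(32) # server-side signing key; illustrative only

# Build the token that goes into the emailed "magic link".
def issue_magic_token(email, expires_at: Time.now.to_i + 900)
  payload = "#{email}|#{expires_at}"
  "#{payload}|#{OpenSSL::HMAC.hexdigest("SHA256", SECRET, payload)}"
end

# When the link is clicked, check the signature and expiry;
# return the email on success, nil otherwise.
def verify_magic_token(token, now: Time.now.to_i)
  email, expires_at, signature = token.split("|", 3)
  expected = OpenSSL::HMAC.hexdigest("SHA256", SECRET, "#{email}|#{expires_at}")
  return nil unless OpenSSL.secure_compare(expected, signature.to_s)
  return nil if now > expires_at.to_i
  email
end

token = issue_magic_token("reader@example.com")
verify_magic_token(token)                            # => "reader@example.com"
verify_magic_token(token, now: Time.now.to_i + 3600) # => nil (expired)
```

    A production version would also make tokens single-use (e.g. by storing a nonce), but the core idea is the same: possession of the email inbox is the credential.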

    Solid Queue 1.0 released

    I’m happy about this development: Rails should definitely have a canonical queue implementation. I’m also interested in its performance because of the FOR UPDATE SKIP LOCKED usage. I plan on evaluating it against GoodJob in the future. I noticed a few things about the announcement:

    37signals claims their production setup processes 20M jobs per day with 800 workers. That seems like a lot of workers, but it’s hard to judge without context on what those workers are doing.

    They are using a separate database for the queue. While I get that it alleviates some of the performance concerns with the main database, you also lose transactionality between your jobs and the rest of your writes. To me, transactionality is one of the main selling points of using a db-based queueing system. I’ve chased many production issues where using a separate data store for the jobs ends up causing the queue workers to look for records that are not visible, either temporarily due to a race condition, or permanently because of a rollback. Using 2 separate databases also means that each Rails process (web or worker) needs a connection to each database.

    In the announcement there is a link to a recently fixed Postgres-only issue, which made me realize that Solid Queue has concurrency controls built in, and uses INSERT ON CONFLICT DO NOTHING to enforce them. That is clever, and more efficient than checking for the existence of the concurrency key before inserting.
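
    As I understand it, the pattern relies on a unique index on the concurrency key, so that claiming a slot is a single race-safe statement. A sketch — the table and column names are hypothetical, not Solid Queue’s actual schema:

```sql
-- A unique constraint on the concurrency key makes the insert race-safe.
CREATE TABLE semaphores (
  concurrency_key character varying NOT NULL UNIQUE
);

-- The first job inserts its key and wins the slot...
INSERT INTO semaphores (concurrency_key)
VALUES ('process-order-42')
ON CONFLICT (concurrency_key) DO NOTHING;
-- INSERT 0 1  (row inserted: this job may run)

-- ...while a concurrent job with the same key is skipped, with no error
-- and no separate existence check beforehand.
INSERT INTO semaphores (concurrency_key)
VALUES ('process-order-42')
ON CONFLICT (concurrency_key) DO NOTHING;
-- INSERT 0 0  (conflict: another job already holds the key)
```

    The caller only needs to look at the reported row count to know whether it acquired the slot.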

    Read on →