Ylan Segal

The REPL: Issue 27 - October 2016

Karafka

Karafka is a framework that simplifies the development of Apache Kafka-based Ruby applications. It looks like a Rails-like abstraction that removes some of the boilerplate and decisions around how to structure a Kafka application. I don’t know if it’s ready for production, but it’s worth keeping an eye on.

MiniTest is not “Just Ruby”, it is “Just Rails”

Victor Shepelev writes his opinion about RSpec and MiniTest and how they differ. I don’t subscribe to all the author’s opinions or conclusions, but I do prefer RSpec, and I have never found the “It’s just Ruby” argument for MiniTest very convincing. If anything, I find that having a distinct shape, structure and feel for tests is a net positive. It promotes shifting from “This is the part that specifies behavior” to “This is the part that implements behavior” in a cleaner way.

Be Kind

Being a good and kind person pays dividends. I love this story. You should read it.

Subtleties of Xargs on Mac and Linux

xargs is one of my go-to tools in Unix. It reads lines from stdin and executes another command with each line as an argument. It’s very useful to glue commands together.
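
As a minimal illustration (the input lines are made up), xargs collects the lines from stdin and hands them to echo as arguments:

$ printf 'one\ntwo\nthree\n' | xargs echo
one two three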

Its default behavior differs between the Mac (or BSD) and Linux in a subtle way. On the Mac, if there is no input from stdin, it will not execute the command. On Linux, it will execute it without any arguments.

As an example, let’s say that we want to use rubocop (a Ruby syntax checker and linter) to check only RSpec files in a project. We can write something like this:

$ find . -name '*_spec.rb' | xargs rubocop

On a project that has two spec files, expanding the above example:

$ find . -name '*_spec.rb'
./spec/one_spec.rb
./spec/two_spec.rb

xargs will execute the equivalent of:

 $ rubocop ./spec/one_spec.rb ./spec/two_spec.rb

The subtlety in behavior comes in when no files are found. To illustrate, let’s see the difference in a trivial example:

$ uname
Linux
$ echo "" | xargs echo "Hello"
Hello

$ uname
Darwin
$ echo "" | xargs echo "Hello"
$

On Linux, xargs will execute the utility; on a Mac it will not. The Linux version can be configured to have the same behavior as the Mac:

$ uname
Linux
$ echo "" | xargs --no-run-if-empty echo "Hello"
$

Unfortunately, the --no-run-if-empty option is not recognized on the Mac:

$ uname
Darwin
$ echo "" | xargs --no-run-if-empty echo "Hello"
xargs: illegal option
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr]
             [-L number] [-n number [-x]] [-P maxprocs] [-s size]
             [utility [argument ...]]

Why is this important? In the original example, if no files are found, rubocop will not be invoked at all on the Mac, but will be invoked with no arguments on Linux. In my case, that is unwanted behavior because rubocop will then check all files in the project.
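
One way to sidestep the difference altogether (a sketch I have not tested on every platform) is to drop xargs and let find batch the matched files onto the command line itself with the -exec ... {} + form, which, as I understand it, does not run the command when nothing matches:

$ find . -name '*_spec.rb' -exec rubocop {} +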

Conclusion

When writing bash scripts that are intended to run on different Unix versions, make sure you understand and test the behavior of the commands used; they sometimes differ in subtle ways.

The REPL: Issue 26 - September 2016

Microservices – Please, don’t

Sean Kelly writes a cautionary post about microservices, organized around debunking five fallacies he has encountered: they keep the code cleaner, they are easy, they are faster, they are simple for engineers, and they scale better.

The science of encryption: prime numbers and mod n arithmetic

While looking into Apache Milagro, I found a link to this short paper on the math behind public-key cryptography. It’s a great introduction, or refresher, to the mathematics that makes the secure web work. The paper itself has no author information, but the URL suggests that it was written by Kathryn Mann at the University of California at Berkeley.

Concurrency in Ruby 3 with Guilds

Olivier Lacan has a great explanation of Koichi Sasada’s recent proposal for bringing better parallelism to Ruby 3. The proposal is to introduce a new abstraction, called Guilds, implemented in terms of existing Threads and Fibers, but able to actually execute in parallel because of stronger guarantees around accessing shared state. In particular, guilds won’t be able to access objects in other guilds without explicitly transferring them via channels. It’s exciting to think about Ruby’s performance not being bound by the Global Interpreter Lock (GIL).

Goodbye StartSSL, Hello Let's Encrypt

Mozilla is considering taking action against two Certificate Authorities, WoSign and StartCom, after an investigation into improper behavior, including not disclosing that WoSign bought StartCom outright.

As I wrote about earlier, this blog used a StartCom TLS certificate, under their StartSSL brand, which was free. At the time, the only reason I didn’t pick Let’s Encrypt was that its certificates expire every 3 months. However, given the contents of the report, I would much rather use an organization that wants to make the web better – not exploit it.

Obtaining and installing the new certificate turned out to be an easy process.

Obtaining A Certificate From Let’s Encrypt

I used the certbot client to obtain a certificate. On my Mac, I installed it via Homebrew:

$ brew install certbot

certbot can request and install the certificate if it’s executed on the same machine that runs the web server. In my case, I just wanted the certificates to be generated and downloaded locally.

$ sudo certbot certonly --manual

During the in-terminal process, certbot will ask for the intended domain and instruct you to make some specified content available at a particular URL in that domain. This is to prove that you, the person requesting the TLS certificate, are an administrator of that domain. For me, this involved copying one new file to my hosting service.
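
For reference, the challenge boils down to serving a small text file at a well-known path on the domain. The file name and contents below are placeholders; certbot prints the real values during the run:

$ mkdir -p .well-known/acme-challenge
$ echo "CHALLENGE_CONTENT_FROM_CERTBOT" > .well-known/acme-challenge/TOKEN_FROM_CERTBOT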

After that, the certificate is issued immediately and available locally at /etc/letsencrypt/live/ylan.segal-family.com (your domain will vary).
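
That directory should contain symlinks to the individual pieces, along these lines (the exact listing may vary):

$ ls /etc/letsencrypt/live/ylan.segal-family.com
cert.pem  chain.pem  fullchain.pem  privkey.pem

fullchain.pem and privkey.pem are the two files used in the next step.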

Installing The New Certificate

The last time I installed a certificate, I had to open a support ticket at Nearly Free Speech. Since then, they have made the process automated and available from the control panel. The instructions are to paste the certificate (including the full cert chain) and the private key into the provided form, all in the same field:

$ cat fullchain.pem privkey.pem | pbcopy

A few seconds later, the new certificate was installed and being served on this domain.

Conclusion

The Let’s Encrypt process ended up being simpler than StartSSL, since there was no need to manually create the private key and certificate signing request: It’s all done with one command.

Redirecting to an External Server May Leak Tokens in Headers

While working on an HTTP API that serves binary files to client applications, I came upon some unexpected behavior.

Imagine that we have a /file/:id endpoint that, instead of responding with the binary, redirects to an external storage service, like AWS S3. Our endpoint is also protected, so that users need an access token. A typical request/response cycle:

$ curl --include --header "Authorization: Bearer SECRET_TOKEN" http://localhost:3000/file/12345
HTTP/1.1 302 Found
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Location: https://example.com/some-path
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
X-Request-Id: 8025fbf8-8513-401b-8ebc-32752cfd7c59
X-Runtime: 0.002428
Transfer-Encoding: chunked

<html><body>You are being <a href="https://example.com/some-path">redirected</a>.</body></html>

Now, let’s instruct curl to follow redirects and be more verbose so that we can see the headers sent in the requests, as well as the responses. I’ll omit some output (with ...) for clarity.

$ curl --verbose --location --header "Authorization: Bearer SECRET_TOKEN" http://localhost:3000/file/12345
> GET /file/12345 HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.43.0
> Accept: */*
> Authorization: Bearer SECRET_TOKEN
>
< HTTP/1.1 302 Found
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Location: https://example.com/some-path
< Content-Type: text/html; charset=utf-8
< Cache-Control: no-cache
< X-Request-Id: 4c2254fb-d5a5-46d2-9a8b-9cbfecc8b2ec
< X-Runtime: 0.002537
< Transfer-Encoding: chunked
<

> GET /some-path HTTP/1.1
> Host: example.com
> User-Agent: curl/7.43.0
> Accept: */*
> Authorization: Bearer SECRET_TOKEN
>
< HTTP/1.1 404 Not Found
...

curl, as requested, followed the redirect response, but in doing so, it included the original Authorization header in the request to another domain.[1] We have just leaked our secret and handed a third party a valid token to access our system. To be fair, after some thought, I think it’s reasonable for curl to interpret the header as one to be sent on all requests, since we are also telling it to follow redirects. From the manual:

WARNING: headers set with this option will be set in all requests - even after redirects are followed, like when told with -L, --location. This can lead to the header being sent to other hosts than the original host, so sensitive headers should be used with caution combined with following redirects.

Who does that?

curl’s behavior (sending explicitly set headers on redirects) was also observed in some other User Agents, notably the library used by one of our client applications. However, it doesn’t seem to be universal. For example, httpie does not leak the header:

$ http --verbose --follow http://localhost:3000/file/12345 "Authorization: Bearer SECRET_TOKEN"
GET /file/12345 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Authorization: Bearer SECRET_TOKEN
Connection: keep-alive
Host: localhost:3000
User-Agent: HTTPie/0.9.6



HTTP/1.1 302 Found
Cache-Control: no-cache
Content-Type: text/html; charset=utf-8
Location: https://example.com/some-path
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Request-Id: a53eb96b-f58c-4eb0-bbbd-bdca3eee8cc6
X-Runtime: 0.002384
X-XSS-Protection: 1; mode=block

GET /some-path HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: example.com
User-Agent: HTTPie/0.9.6



HTTP/1.1 404 Not Found
...

As you can see, the Authorization header is conspicuous by its absence in the second request.

Mitigation

Since we can’t predict the behavior of all the User Agents that will use our API, we can design the API differently on the server.

Use Token As a Parameter

If we are using OAuth2 (which my example implies, because of the use of a Bearer token), the specification allows the token to be passed as a URI Query Parameter named access_token. Since that makes it part of the original URL, it will certainly not be included by any client that follows redirection. However, I have seen this use flagged as risky by several security audits. One of the objections is that parameters in URLs are commonly written to logs and expose tokens unnecessarily.
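
For illustration, assuming the endpoint accepted the query parameter (hypothetical here, like the endpoint itself), the request carries no Authorization header at all, so there is nothing for the User Agent to forward on a redirect:

$ curl --location "http://localhost:3000/file/12345?access_token=SECRET_TOKEN"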

The OAuth2 specification also allows a Form-Encoded Body Parameter, also named access_token. This avoids making the token part of the URL, and it still won’t be sent on any redirect. However, the request must have an application/x-www-form-urlencoded content type, which may conflict with the rest of the application wanting application/json or similar.
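
Purely as a sketch, and assuming the endpoint accepted a form-encoded POST (a change from the GET endpoint in my example), such a request might look like this; --data makes curl send an application/x-www-form-urlencoded body:

$ curl --location --data "access_token=SECRET_TOKEN" http://localhost:3000/file/12345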

Use Basic Authentication

Basic Authentication is a method for a User Agent to provide credentials to the server (usually username and password). Most User Agents have good support for it and understand that its use is limited to the original URL.

$ curl --verbose --location --user SECRET_TOKEN: http://localhost:3000/file/12345
> GET /file/12345 HTTP/1.1
> Host: localhost:3000
> Authorization: Basic U0VDUkVUX1RPS0VOOg==
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 302 Found
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Location: https://example.com/some-path
< Content-Type: text/html; charset=utf-8
< Cache-Control: no-cache
< X-Request-Id: f3211d6c-4e77-448b-a44d-7ad080fe5d3f
< X-Runtime: 0.002391
< Transfer-Encoding: chunked
<

> GET /some-path HTTP/1.1
> Host: example.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
...

The U0VDUkVUX1RPS0VOOg== in the Authorization header above is the secret, Base64 encoded:

$ echo U0VDUkVUX1RPS0VOOg== | base64 --decode
SECRET_TOKEN:
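
Going the other direction shows how curl builds that value from the --user argument: the “username” (our token), followed by a colon and an empty password:

$ echo -n "SECRET_TOKEN:" | base64
U0VDUkVUX1RPS0VOOg==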

Don’t Redirect At All

Of course, redirecting is not the only option: your endpoint can act as a proxy, reading the contents from the external server and passing them along to the client. The penalty is that the client’s connection to your server stays open longer, consuming more compute resources and transferring more data than a redirect would.

Conclusion

Be careful when redirecting to external servers while using header-based authentication: some clients may forward those headers along to a third party.


  [1] We can ignore the 404 response. This is a made-up example, and it’s irrelevant how the external server actually responded.