-
Postgres Updatable Views
Postgres has support for creating views that are themselves updatable, as if they are tables. By default, this is only possible for simple views, as defined in the documentation. More advanced views, can also be updatable, with a bit more work. Lets examine how.
For the sake of this example1, lets say we have a
user_last_logins
table that captures the last date and time that the user has logged into a system:CREATE TABLE user_last_logins ( user_id integer NOT NULL, logged_in_at timestamp NOT NULL ); -- CREATE TABLE -- Time: 10.102 ms CREATE UNIQUE INDEX idex_user_id_on_user_last_logins ON user_last_logins(user_id); -- CREATE INDEX -- Time: 7.783 ms
Somewhere in the application code, when the user logs in, we update the time that a user has logged in.
INSERT INTO user_last_logins(user_id, logged_in_at) VALUES (1, NOW()), (2, NOW()), (3, NOW()); -- INSERT 0 3 -- Time: 5.007 ms SELECT * FROM user_last_logins; -- user_id | logged_in_at -- ---------+---------------------------- -- 1 | 2024-04-29 14:36:11.765157 -- 2 | 2024-04-29 14:36:11.765157 -- 3 | 2024-04-29 14:36:11.765157 -- (3 rows) -- -- Time: 0.481 ms UPDATE user_last_logins SET logged_in_at = NOW() WHERE user_id = 1; -- UPDATE 1 -- Time: 5.463 ms SELECT * FROM user_last_logins; -- user_id | logged_in_at -- ---------+---------------------------- -- 2 | 2024-04-29 14:36:11.765157 -- 3 | 2024-04-29 14:36:11.765157 -- 1 | 2024-04-29 14:36:22.268137 -- (3 rows) -- -- Time: 0.973 ms
Notice how the timestamp for the row with
user_id = 1
was updated.Now lets imagine that our requirements change. We are now set to record the time of every login by users, as opposed to the last time. We also have different client applications that might not be on the same schedule, and would like to keep the current database functionality intact. Our first order of business is to create our new table and populated with the information we have at hand:
BEGIN; CREATE TABLE user_logins ( user_id integer NOT NULL, logged_in_at timestamp NOT NULL ); INSERT INTO user_logins (user_id, logged_in_at) SELECT * FROM user_last_logins; COMMIT; -- BEGIN -- Time: 0.129 ms -- CREATE TABLE -- Time: 4.720 ms -- INSERT 0 3 -- Time: 0.842 ms -- COMMIT -- Time: 0.339 ms SELECT * from user_logins; -- -- user_id | logged_in_at -- ---------+---------------------------- -- 2 | 2024-04-29 14:36:11.765157 -- 3 | 2024-04-29 14:36:11.765157 -- 1 | 2024-04-29 14:36:22.268137 -- (3 rows) -- -- Time: 0.816 ms
The new table has the same structure as the old table, but has no unique index on
user_id
, because we want to allow multiple rows for each user. I am aware that I could have renamed the table instead of creating a new one and copying data. I choose to do it this way for didactical purposes.It’s now possible to insert new records for each users:
INSERT INTO user_logins VALUES (1, NOW()); -- INSERT 0 1 -- Time: 1.613 ms SELECT * from user_logins; -- user_id | logged_in_at -- ---------+---------------------------- -- 2 | 2024-04-29 14:36:11.765157 -- 3 | 2024-04-29 14:36:11.765157 -- 1 | 2024-04-29 14:36:22.268137 -- 1 | 2024-04-29 14:37:23.696786 -- (4 rows) -- -- Time: 0.505 ms
We still have a problem: There are clients to this application that can’t change their code right away. Let’s tackle first the clients that want to read an up-to-date
user_last_logins
relation:BEGIN; DROP TABLE user_last_logins; CREATE OR REPLACE VIEW user_last_logins AS SELECT user_id, MAX(logged_in_at) as logged_in_at FROM user_logins GROUP BY 1; COMMIT; -- BEGIN -- Time: 0.103 ms -- DROP TABLE -- Time: 3.219 ms -- CREATE VIEW -- Time: 6.703 ms -- COMMIT -- Time: 1.338 ms SELECT * FROM user_last_logins; -- user_id | logged_in_at -- ---------+---------------------------- -- 3 | 2024-04-29 14:36:11.765157 -- 2 | 2024-04-29 14:36:11.765157 -- 1 | 2024-04-29 14:37:23.696786 -- (3 rows) -- -- Time: 1.273 ms
We’ve created a view that produced the same information that used to be in
user_last_logins
from the underlyinguser_logins
table, with the same guarantees that auser_id
will only show up in a single row. Read clients can continue working without a hitch. However, write clients won’t be able to update as before:UPDATE user_last_logins SET logged_in_at = NOW() WHERE user_id = 1; -- ERROR: cannot update view "user_last_logins" -- DETAIL: Views containing GROUP BY are not automatically updatable. -- HINT: To enable updating the view, provide an INSTEAD OF UPDATE trigger or an unconditional ON UPDATE DO INSTEAD rule. -- Time: 0.975 ms
Conceptually, we know that the underlying table has the same structure, and that the insert to the view can be forwarded to the underlying table. The error even gives us a hint at what to do: Use and
INSTEAD OF UPDATE
trigger.BEGIN; CREATE OR REPLACE FUNCTION instead_function_insert_user_last_logins() RETURNS TRIGGER AS $BODY$ BEGIN INSERT INTO user_logins(user_id, logged_in_at) VALUES (NEW.user_id, NEW.logged_in_at); RETURN NEW; END; $BODY$ LANGUAGE plpgsql; CREATE TRIGGER instead_trigger_user_last_logins INSTEAD OF UPDATE ON user_last_logins FOR EACH ROW EXECUTE PROCEDURE instead_function_insert_user_last_logins(); COMMIT; -- BEGIN -- Time: 0.077 ms -- CREATE FUNCTION -- Time: 6.508 ms -- CREATE TRIGGER -- Time: 2.049 ms -- COMMIT -- Time: 0.427 ms
We’ve defined a trigger that runs on each attempted update of
user_last_logins
and instead runs a function. The body of the function plainly inserts instead intouser_logins
. And becauseuser_logins
powers theuser_last_logins
view, the insert appears to work as requested.UPDATE user_last_logins SET logged_in_at = NOW() WHERE user_id = 1; -- UPDATE 1 -- Time: 6.153 ms SELECT * FROM user_last_logins; -- user_id | logged_in_at -- ---------+---------------------------- -- 3 | 2024-04-29 14:36:11.765157 -- 2 | 2024-04-29 14:36:11.765157 -- 1 | 2024-04-29 14:38:39.369368 -- (3 rows) -- -- Time: 0.696 ms SELECT * FROM user_logins; -- user_id | logged_in_at -- ---------+---------------------------- -- 2 | 2024-04-29 14:36:11.765157 -- 3 | 2024-04-29 14:36:11.765157 -- 1 | 2024-04-29 14:36:22.268137 -- 1 | 2024-04-29 14:37:23.696786 -- 1 | 2024-04-29 14:38:39.369368 -- (5 rows) -- -- Time: 0.826 ms
Now our existing clients can migrate to use
user_logins
at their own pace. For this particular case, we only dealt withINSTEAD OF UPDATE
, but in the same vein we could have defined anINSTEAD OF INSERT
trigger.Conclusions
The
INSTEAD OF
trigger allows us a complex view to remain updatable. In this made-up-case this allows views to function as an abstraction for tables that no longer exist, and de-couples the need from client code changing at the same time as the database schema.-
I used Postgres v12.5 for this example. ↩
-
-
The REPL: Issue 115 - March 2024
Bash Debugging
Another gem by Julia Evans. I always learn something from her zines, and this is no exception: You can have a step debugger in bash!
How Figma’s Databases Team Lived to Tell the Scale
Very interesting article on how Figma managed to scale Postgres. Having a DB proxy that checks the queries and routes to the correct shard (and even aggregates among shards) is wild.
The use of logical partitioning to prove their work before doing actual physical partitioning is very clever.
Before going out and building a custom replication scheme, remember that there are out-of-the-box solutions out there that most organizations are better choosing over custom solutions.
jnv: interactive JSON filter using jq
jnv
looks like a nice tool for interactively exploring the a JSON file. -
The REPL: Issue 114 - February 2024
Testing ActiveRecord Transactions the Right Way
Very handy recipes for testing
ActiveRecord
transactions.Useful Uses of cat
Using
cat
to start a pipelines is about composing commands: It makes it easier to build pipelines in steps. Technically, you could be adding an extra process that you don’t need, but in day-to-day unix pipe operations, the performance is never an issue.Data Looks Better Naked
Nice visual animation of how removing stuff improves the design. The pie chart in particular was great!
-
Pipelines With Result Objects
In a previous post – On Class Structure – we discussed how to organize Ruby classes so that the logic is not buried in private methods, and instead is clear to readers what the class is responsible for doing. In this post, I want to expand on that a bit more and introduce result objects.
Let’s resume with our example:
class FindScore DEFAULT_SCORE = 0 URL = 'http://example.com'.freeze def initialize(user, http_client = HTTParty) @user = user @http_client = http_client end def call make_api_request(@user, @http_client) .then { parse_response(_1) } .then { extract_score(_1) } end private def make_api_request(user, http_client = @http_client, url = URL) http_client.post( url, body: { user: user.id } ) end def parse_response(response) JSON.parse(response.body) end def extract_score(response_body, default = DEFAULT_SCORE) response_body.fetch("score", default) end end
That class doesn’t handle any errors. Each of the private methods can fail in different ways. For the sake of examples, lets say that we can encounter HTTP errors in
make_api_request
, the response may fail to be valid JSON or the response might have a different JSON shape than what we expect. One way to handle them is via exceptions or checking for specific conditions, and then ensuring that the value passed along is what the next step in our pipeline expects:class FindScore DEFAULT_SCORE = 0 URL = 'http://example.com'.freeze def initialize(user, http_client = HTTParty) @user = user @http_client = http_client end def call make_api_request(@user, @http_client) .then { parse_response(_1) } .then { extract_score(_1) } end private def make_api_request(user, http_client = @http_client, url = URL) response = http_client.post( url, body: { user: user.id } ) response.ok? ? response.body : "{}" end def parse_response(response) JSON.parse(response.body) rescue JSON::ParserError {} end def extract_score(response_body, default = DEFAULT_SCORE) response_body.fetch("score", default) end end
In that version,
#make_api_request
checks for a correct response, passing the response body to#parse_response
. If the response is not successful however, it returns"{}"
, which is JSON that will be parsable by that response. In a similar manner, parsing JSON might raiseJSON::ParseError
.#parse_response
rescues the exception, and returns a hash, as expected by#extract_score
.The code is now more resilient: It can handle some errors and recover from them by returning a value that can be used in the next method. However, these errors are being swallowed. What if we wanted to add some logging or metrics for each error, so we can understand our system better? One way, is to add a logging statement on the error branch of each method. I prefer another way, using result objects.
For our purposes a result object can either be a success or an error. In either case, it wraps another value, and it has some methods that act differently in each case. This object is known as a result monad, but let’s now dwell on that. Our result object will make it easier to write pipelines of method calls, without sacrificing error handling.
A very minimal implementation looks like this:
class Ok def initialize(value) @value = value end def and_then yield @value end def value_or(_other) @value end end class Error def initialize(error) @error = error end def and_then self end def value_or yield @error end end
The polymorphic interface for
Ok
andError
has two methods:#and_then
which is used to pipeline operations, and#value_or
which is used to unwrap the value. Let’s see some examples:Ok.new(1) .and_then { |n| Ok.new(n * 2) } # => 1 * 2 = 2 .and_then { |n| Ok.new(n + 1) } # => 2 + 1 = 3 .value_or(:error) # => 3 Ok.new(1) .and_then { |n| Ok.new(n * 2) } # => 1 * 2 = 3 .and_then { |n| Error.new("something went wrong") } .and_then { |n| Ok.new(n + 1) } # => Never called .and_then { |n| raise "Hell" } # => Never called either .value_or { :error } # => :error
A chain of
#and_then
calls continue much like#then
does, expecting a result object as a return value. However, if the return value at any point is anError
, subsequent blocks will not execute, and instead will continue returning the same result object. We then have a powerful way of constructing pipelines. Error handling can be left to the end.Our class with error handling, can now be written as:
class FindScore DEFAULT_SCORE = 0 URL = 'http://example.com'.freeze def initialize(user, http_client = HTTParty) @user = user @http_client = http_client end def call make_api_request(@user, @http_client) .and_then { parse_response(_1) } .and_then { extract_score(_1) } .value_or { |error_message| log.error "FindScore failed for #{@user}: #{error_message}" DEFAULT_SCORE } end private def make_api_request(user, http_client = @http_client, url = URL) response = http_client.post( url, body: { user: user.id } ) response.ok? ? Ok.new(response.body) : Error.new("HTTP Status Code: #{response.status_code}") end def parse_response(body) Ok.new(JSON.parse(body)) rescue JSON::ParserError => ex Error.new(ex.to_s) end def extract_score(parsed_json) score = parsed_json["score"] score.present? ? Ok.new(score) : Error.new("Score not found in response") end end
Now, each method is responsible for returning either an
Ok
or andError
. The#call
method is responsible for constructing the overall pipeline and handling the failure (i.e. returning a DEFAULT_SCORE), and with a single line, it also logs all errors.This technique is quite powerful. The result objects are not limited to private class methods. Public methods can return them just as well. The
Ok
andError
implementation is quite minimal as a demonstration for this post. There are full-featured libraries out there (e.g. dry-rb), or you can roll your own pretty easily and expand the API to suit your needs (e.g.#ok?
,#error?
,#value!
,#error
,#fmap
).As I concluded in my previous post, writing Ruby classes so that the class is read in the same order as the operations will be performed leads to more legible code. Adding result objects enhances those same goals, and makes error conditions a first-class concern.
-
The REPL: Issue 113 - January 2024
Tech Companies Are Irrational Pop Cultures
I agree with the author that there is a lot of Pop Culture in software companies, in the sense that they forget about the past, and there is a bias for “newer is better”. Thus, we get all the articles advising to choose “boring” technology with a proven track record.
There also does seem to be a good amount of contagion in the current round of layoffs. Companies are firing people, even if that they are doing well. I disagree that it is irrational. I dislike that characterization. It seems like a crutch for failing to understand the motivation for the people making the decisions. I believe that company executives do know that layoffs are bad for morale and create some problems down the line. There are some pretty smart people in company management. I think that they are making those decisions in spite of knowing that there are real downsides. Maybe the pressure from boards or investors is too much. Even if it is a case of copying what others are doing, it need not be irrational. There is an incentive to go with the flow: It’s safe. No one ever got fired for buying IBM. If things go wrong, you wont be blamed for making the same decision everyone else made.
Anti-Pattern: Iteratively Building a Collection
Mike Burns writes about how iteratively building a collection is an anti-pattern:
What follows are some lengthy method definitions followed by rewrites that are not only more concise but also more clear in their intentions.
It resonates with me that the pattern should be avoided. Brevity and clarity are great, but I think minimize mutation is a better reason to avoid building collections iteratively. Written in a functional style, your code handles less mutation of data structures, which means that it handles less state. Handling state is were a lot of complexity hides, and the source of many bugs. In fact, in Joe Armstrong’s estimation:
State is the root of all evil. In particular functions with side effects should be avoided.
The style of Ruby that the article encourages removes the state handling from your code. 👍🏻
[Is It Possible for My Internet Router to Wear Out?][routes]
Every few years, my routes start acting up in strange ways. Some devices function great, while others seem to have intermittent downloads. This articles confirms my suspicions. Router just wear out:
In general, routers can and do fail. The primary cause of failure for consumer grade equipment is heat stress. Most consumer grade hardware runs far too hot and have respectively poor air circulation compared to their ventilation needs.
To increase ventilation, I’ve started raising my router from the surface it’s on with a Lego structure that increases airflow from the bottom. It seems to improve heat dissipation by the imprecise measure of “it feels cooler to my touch”. 🤷🏻♂️