Git Monorepo Improved Performance

Oct 20, 2022 • Ylan Segal • git

git recently shipped some performance improvements when working with large repositories, as announced on the GitHub blog.

I tested it in a large repository. With the default configuration:

$ time git status
On branch master
Your branch is behind 'origin/master' by 686 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)

nothing to commit, working tree clean
git status  0.40s user 8.55s system 429% cpu 2.082 total

I then enabled fsmonitor and untrackedcache:

$ git config core.fsmonitor true
$ git config core.untrackedcache true

And ran it twice, to warm up the cache:

$ time git status
On branch master
Your branch is behind 'origin/master' by 686 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)

nothing to commit, working tree clean
git status  0.38s user 1.43s system 159% cpu 1.141 total

$ time git status
On branch master
Your branch is behind 'origin/master' by 686 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)

nothing to commit, working tree clean
git status  0.13s user 0.03s system 92% cpu 0.178 total

The improvement is significant: wall-clock time drops from about 2.1 s to under 200 ms, which users generally perceive as instantaneous. I’m thrilled!
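As a quick check on the size of that improvement, here is a minimal sketch that divides the first wall-clock total by the last one. The numbers are copied from the runs above; `speedup` is a made-up helper name for illustration, not part of git.

```shell
# Wall-clock totals copied from the `time git status` runs above (seconds).
before=2.082   # default configuration
after=0.178    # fsmonitor + untrackedcache, second (warm) run

# speedup is a hypothetical helper: it prints how many times faster
# the second measurement is compared to the first.
speedup() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1fx\n", b / a }'
}

speedup "$before" "$after"
```

With these two measurements it prints 11.7x. This is a single sample on one machine, so treat it as a rough indication rather than a benchmark.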


The REPL: Issue 97 - September 2022

Oct 3, 2022 • Ylan Segal • the repl, git, postgres, elixir

Signing Git Commits with Your SSH Key

SSH keys are more common than GPG keys, by far. I don’t know many developers who have GPG keys, but all of them have SSH keys, if only to use GitHub. However, support for SSH signatures seems a bit rough at the moment.

Transactionally Staged Job Drains in Postgres

The article explains well how background jobs that run outside of a db transaction can have several categories of problems. However, job queues driven by relational databases sometimes don’t scale well when compared to other queues. For example, see DelayedJob or Que vs. Sidekiq. The article presents a pattern that keeps the transactionality, but regains much of the performance by using a staging table for jobs, which drains into the actual job queue that will do the work.

Understanding GenStage back-pressure mechanism

Really concise explanation of what back-pressure means in Elixir, and how it can keep the capacity of the system from being exceeded.

The REPL: Issue 96 - August 2022

Sep 9, 2022 • Ylan Segal • the repl, unix

Your Makefiles are wrong

make is a very powerful build tool, but it has sharp edges. In this post Jacob Davis-Hansson explains some best practices to improve the experience. The key insight is that each make target, by default, is supposed to generate a file, and execution is determined by laying out dependencies between files.
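To make the file-target idea concrete, here is a minimal sketch, assuming make is installed. The file names in.txt and out.txt are made up for illustration and do not come from the linked post.

```shell
# Work in a scratch directory so nothing in the current project is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# in.txt and out.txt are hypothetical file names for this sketch.
echo "hello" > in.txt

# The target out.txt depends on in.txt; the recipe line must start with a tab.
printf 'out.txt: in.txt\n\tcp in.txt out.txt\n' > Makefile

first_run=$(LC_ALL=C make)    # out.txt does not exist yet, so the recipe runs
second_run=$(LC_ALL=C make)   # out.txt is not older than in.txt, so nothing runs

echo "$first_run"
echo "$second_run"
```

The first run echoes and executes the cp recipe. The second run reports that out.txt is up to date, because make compares the modification times of the target and its dependency instead of blindly re-running the recipe.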

Why are you so busy?

as long as you are doing your work well and continuously working on the next most important thing prioritised by the business, any pressure to deliver beyond what your team is capable of is objectively unreasonable.

Tom Lingham writes about being busy in software engineering teams. The quote above gets at the crux of the problem: You can only do so much. Asking for more means that you need to work more or take shortcuts. Both of those lead to unsustainable work. The appropriate response is to push back and have the tough conversations.


This Blog: 10 Years Later

Aug 23, 2022 • Ylan Segal • writing

Ten years ago I published my first post on this blog. Let’s look back.

I’ve published 210 posts:

$ ls src/_posts | wc -l
     210

Relatively evenly over the years:

$ ls src/_posts | cut -f1 -d'-' | sort | histogram | sort
           2012    11 ######################
           2013    12 ########################
           2014    19 ######################################
           2015    30 ############################################################
           2016    22 ############################################
           2017    21 ##########################################
           2018    18 ####################################
           2019    21 ##########################################
           2020    20 ########################################
           2021    22 ############################################
           2022    14 ############################

(See my post on World Player Age if you are curious about histogram)
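If you don't have histogram at hand, a rough equivalent of the per-year count can be sketched with standard tools. `count_by_year` is a made-up helper name, and it assumes Jekyll-style file names that start with the date, such as 2022-08-23-some-title.md.

```shell
# count_by_year is a hypothetical helper: it prints "<year> <number of posts>"
# for every year found in the leading YYYY- part of the file names.
count_by_year() {
  ls "$1" | cut -f1 -d'-' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Only run against the posts directory when it exists.
if [ -d src/_posts ]; then
  count_by_year src/_posts
fi
```

It prints plain counts without the bars, which is enough to eyeball the distribution.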

And even more so if we count by month:

$ ls src/_posts | cut -f2 -d'-' | sort | histogram | sort
             01    19 ####################################################
             02    17 ###############################################
             03    17 ###############################################
             04    18 ##################################################
             05    22 ############################################################
             06    17 ###############################################
             07    16 ############################################
             08    19 ####################################################
             09    19 ####################################################
             10    16 ############################################
             11    16 ############################################
             12    14 #######################################

I’ve written almost 87,000 words (roughly 193 pages):

$ wc -w src/_posts/*
   86919 total

Some of the blog posts I am most proud of are also the longest (wc -w src/_posts/* | sort -r | head -n 10):

Word Count	Post
3183	Deployments With Schema Migrations
2153	Bug Driven Design
1966	Scratching An Itch With A Ge,
1881	I Also Built a CLI Application With Crystal
1672	Avro Schema Evolution
1557	Enforcing Style
1293	Bitemporal Data
1236	Abstractions With Database Views
1213	This Blog Is Now Delivered Over TLS

And if you create a word cloud out of the categories, I write about these topics:

Category Cloud

Here is to the next 10 years!


The REPL: Issue 95 - July 2022

Aug 8, 2022 • Ylan Segal • the repl

The Bullshit Web

Nick Heer laments the state of the current web. Modem speeds in the 1990s topped out at 56K (I started connecting to the internet on a 14.4K connection). Connection speeds are now orders of magnitude faster, and yet web pages still feel slow. It’s the bloat. The embedded videos you don’t watch, the trackers, the ads. It’s all bullshit.

Failed #SquadGoals

Jeremiah Lee writes about the famed Spotify model. It was copied and talked about widely. It turns out, not even Spotify used that model. At least not to the extent that was implied in the original whitepaper. Remember that what companies say they are doing in their blog posts and whitepapers might not be exactly what they are doing, and when they move on to something else, they rarely go back to talk about the mistakes that they made in the first place.

Soft Deletion Probably Isn’t Worth It

Brandur contends that soft-deletion is usually not worth it. It’s rarely used, complicates the model, and on top of that breaks foreign keys. I agree with all that. Especially losing foreign keys. As an alternative, he proposes using a generic deleted_records table, storing most of the columns in JSON, and populating it on deletion. It preserves foreign keys, and preserves auditability for customer support. He doesn’t mention it, but it strikes me that it can easily be partitioned for scalability.

There is another alternative I’ve written about: Temporal Modeling. The issue with temporal modeling is that it also loses foreign key constraints as they are implemented in typical relational databases. The database can still enforce them via constraints on date ranges, but it requires a lot more work. I wish there was a Postgres extension that was temporal modeling aware and simplified constraint generation.

