• Background long-running git hooks

    A script that I’ve been using for years stopped working as expected after I upgraded bash and git. I use ctags to navigate code in my editor (currently Atom). To automate the generation of the tags file, I run the ctags executable from git hooks (post-commit, post-merge, and post-checkout), which fits well with my development workflow.

    Some of the projects I work with are quite large, and the ctags invocation can take longer than 30 seconds. To avoid waiting that long on each commit, I background the invocation. The hook, which had worked for years, looked like this:

    #!/usr/bin/env bash
    # Regenerate ctags
    
    # Only run one ctags process for this directory at a time.
    # Otherwise the ctags file is corrupted
    (lockfile .ctags.lock; \
     ctags -R --exclude='*.js' --exclude='*.h' --exclude='*.cpp' &> /dev/null ; \
     rm -f .ctags.lock) &
    

    The lockfile usage prevents multiple copies of ctags from running at the same time, which can happen when the hook is invoked often (like when committing multiple times in quick succession). The (..) invokes the commands inside in a sub-shell, and the & at the end tells bash to background the work and continue.

    I’ve been using this for years without issue, until I recently upgraded both git and bash on my machine. The invocation above continued to generate the tags as expected, but instead of backgrounding the work, the git hook would block until ctags finished.

    I could not find anything related to that in either git's or bash's release notes. StackOverflow provided several tips about using nohup or disown, but neither helped.

    Eventually, what did work was redirecting the output of the sub-shell, instead of redirecting the output of ctags alone:

    (lockfile .ctags.lock; \
      ctags -R --exclude='*.js' --exclude='*.h' --exclude='*.cpp' ;\
      rm -f .ctags.lock) &> /dev/null &
    

    When the sub-shell is instantiated, its stdout and stderr are connected to the parent process (i.e. the git hook). My best guess is that after the upgrade, the hook invocation now waited until the sub-shell exited, because its std{out,err} were connected to the sub-shell's. With the new invocation, the (..) &> /dev/null disconnects the whole sub-shell's output streams from the hook's, by redirecting them to /dev/null. The hook's process can then safely close its own std{out,err} and exit.
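
    To see the effect outside of git, here is a minimal sketch (my own illustration, assuming the guess above is right and the caller reads the hook's output until EOF). Piping each variant into cat emulates such a caller; only the variant whose backgrounded sub-shell inherits the script's stdout makes the caller wait:

    #!/usr/bin/env bash
    # Illustration only: emulate a caller that, like git, reads the
    # hook's output until EOF before continuing.

    # The backgrounded sub-shell inherits our stdout and keeps it open.
    slow_hook() { (sleep 5) & }

    # The sub-shell's std{out,err} point at /dev/null instead.
    fast_hook() { (sleep 5) &> /dev/null & }

    time { slow_hook | cat; }   # ~5 seconds: cat waits for EOF
    time { fast_hook | cat; }   # returns immediately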

    Read on →

  • The REPL: Issue 92 - April 2022

    The Dunning-Kruger Effect is Autocorrelation

    This article is fascinating. The argument is that the well-known Dunning-Kruger effect (i.e. unskilled people overestimate their skill) is not a psychological effect. Rather, it is a statistical mistake. It is an artifact of autocorrelation: comparing a variable to itself.

    Refactoring Ruby with Monads

    Tom Stuart is a great, clear writer. This article does an excellent job of introducing the usefulness of monads, explaining them from the ground up, without mathematical pretentiousness.

    Ruby Shell-Out Flow Chart

    Ruby supports many ways of shelling out to external commands. This excellent flow chart from a StackOverflow answer tells you which one to use. I am reposting it mainly so that I can find it again easily!


    Read on →

  • Testing Unix Utilities With RSpec

    I maintain a series of small unix scripts to make my daily work more effective. I approach the development of these utilities like I do my other software: using Outside-In Test-Driven Development. I use rspec to write my tests, even if the code itself is written in bash, zsh, or ruby. Let's see a few examples.

    Testing Output

    Some of my utilities are similar to pure functions: They always return the same output for the same input, and they don't have side effects (i.e. they don't change anything else in the system).

    One of my most often used utilities is jira_ticket_number. Given a string, it extracts the Jira ticket number from it. I typically don't call it directly, but use it in other scripts. In my typical workflow, I'll create a branch for the ticket I am working on, and include the ticket number in the name (e.g. ys/CF-8176_rework_request_sweeper). This is useful in a few ways. I use it in another utility, jira, to construct and open a URL to the ticket. This saves me several clicks. I also use it to automatically prepend the ticket number to new commit messages via a custom git prepare-commit-msg hook.
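
    The implementation is not the interesting part; it can be little more than a grep one-liner. The following is a rough sketch of the idea, not necessarily the actual script, assuming the usual Jira PROJECT-123 ticket format:

    #!/usr/bin/env bash
    # Sketch of jira_ticket_number (illustrative, not the real implementation):
    # print the first Jira-style ticket number found in the argument or stdin.
    input="${1:-$(cat)}"
    grep -oE '[A-Z]+-[0-9]+' <<< "$input" | head -n 1

    For example, jira_ticket_number ys/CF-8176_rework_request_sweeper prints CF-8176.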

    The specs for jira_ticket_number:

    Read on →

  • The REPL: Issue 91 - March 2022

    One Way Smart Developers Make Bad Strategic Decisions

    So now, when I hear about top-down standardization efforts, I get a little worried because I know that trying to generalize across many problems is a fraught endeavor. What’s better is getting to know a specific problem by working collaboratively and embedding with the people who have the most tacit knowledge of the problem. Standardization and top-down edicts fail when they miss or ignore the implicit understandings of people close to the problem

    Hints for writing Unix tools

    General good advice on how to design unix tools. I summarize it as: “Design your unix tools to be composable”.

    The Code Review Pyramid

    The graphic speaks for itself: Spend more time at the bottom than at the top. Automate what is possible.

    Read on →

  • Finding Broken Links

    HTML powers the web, in great part by providing a way to link to other content. Every website maintainer dreads broken links: those that, when followed, lead to a document that is no longer there.

    I remember that when I first learned to hand-write HTML (yes, last century) I used a Windows utility called Xenu’s Link Sleuth. It allowed me to check my site for broken links. I don’t use Windows anymore, but wget turns out to have everything I need.

    Based on an article by Digital Ocean, I created a script that checks for broken internal1 links:

    #!/usr/bin/env bash
    # Finds broken links for a site
    #
    # Usage
    # find_broken_links http://localhost:3000
    
    ! wget --spider --recursive --no-directories --no-verbose "$1" 2>&1 | grep -B1 -E '(broken link!|failed:)'
    

    It uses wget to spider (or crawl) a given URL and recursively check all links. All output is redirected and filtered to print only the broken links or other failures. The ! before the invocation inverts the pipeline's exit status: grep returns a non-zero (error) code when it finds no matches, but in this case no matches means no broken links, which we consider a success.

    Running it against this blog found 3 broken links!

    Now, my Makefile has a test target:

    test:
      find_broken_links http://127.0.0.1:4000
    

    I run it before every deployment (including posting this very post), to ensure I have not introduced a bad link :-)

    1. By default, wget will not spider links on other hosts, but it can be configured with --span-hosts to do so, to also check that external links are still valid. While I consider a broken internal link something that I must fix, a broken external link is something that another website operator broke. Their URL is no longer valid, but I don't necessarily want to do anything about it.
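
    If you do want to check external links as well, a variant could look like this sketch: it only adds wget's --span-hosts flag to the same pipeline (note that the recursive crawl will then also follow links on the external sites themselves, so you may want to bound it with --level):

    #!/usr/bin/env bash
    # Sketch: same check as above, but also spider links on other hosts.
    ! wget --spider --recursive --span-hosts --no-directories --no-verbose "$1" 2>&1 | grep -B1 -E '(broken link!|failed:)'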

    Read on →