• On Class Structure

    Don’t let short methods obscure a class’s purpose.

    There is a pattern of writing Ruby classes that I see a lot: Classes with small methods, most of them private. For example:

    class FindScore
      DEFAULT_SCORE = 0
      URL = 'http://example.com'.freeze
    
      def initialize(user)
        @user = user
      end
    
      def call
        score
      end
    
      private
    
      def score
        response_body.fetch("score", DEFAULT_SCORE)
      end
    
      def response_body
        @response_body ||= JSON.parse(response.body)
      end
    
      def response
        HTTParty.post(
          URL,
          body: { user_id: @user.id }
        )
      end
    end
    

    Each method is small and easy to understand on its own. However, I find that this code obscures the class’s purpose. Figuring out what FindScore does feels like reading backwards, like solving a mystery. I see that call returns the score… which is extracted from the response body, with a default if it wasn’t there… the response body is memoized, obtained from the response, and parsed from JSON… the response itself is obtained by making a web request. Now I can unravel the mystery: we are making a web request to obtain a user’s score, it returns JSON which we parse, and then we extract the score from that. The sequence of operations is the reverse of how the code is written.

    Notice how, as I built up my mental image of what the class is doing, I was also dealing with low-level details like the default score value or memoization. And this is a fairly simple class.

    For the last few years, I’ve been writing classes in a different style:

    class FindScore
      DEFAULT_SCORE = 0
      URL = 'http://example.com'.freeze
    
      def initialize(user, http_client = HTTParty)
        @user = user
        @http_client = http_client
      end
    
      def call
        make_api_request(@user, @http_client)
          .then { parse_response(_1) }
          .then { extract_score(_1) }
      end
    
      private
    
      def make_api_request(user, http_client = @http_client, url = URL)
        http_client.post(
          url,
          body: { user_id: user.id }
        )
      end
    
      def parse_response(response)
        JSON.parse(response.body)
      end
    
      def extract_score(response_body, default = DEFAULT_SCORE)
        response_body.fetch("score", default)
      end
    end
    

    Let’s ask the same question: What does FindScore do? It makes an API request, then it parses that response, then it extracts the score. That is it! That is the high-level overview of the class, all clearly laid out in #call. Now, I can deal with the details of each method if I am interested in knowing more.

    Notice that the private methods are as small as in the first class: they are one-liners. The major difference is in how we sequence those methods, and that makes a huge difference. We are now telling the story of this class at a high level of abstraction. Additionally, the private methods interact only with their arguments (and the constants in the class). That makes them easier to reason about. I’ve also decided to inject http_client in the initializer. It makes it clear which collaborators this class deals with: a user and an http_client. It gives the reader an initial hint of what is to come. I expect most callers will use the default, but injecting all collaborators makes the class easier to test too.
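
    Injecting the collaborators pays off when testing. Here is a minimal test sketch, assuming Minitest; FakeHTTPClient and its canned response are hypothetical test doubles, not part of the class above:

    require "minitest/autorun"
    require "json"
    # (assumes the FindScore class above has been loaded)
    
    class FakeHTTPClient
      Response = Struct.new(:body)
    
      def post(_url, body:)
        # Canned response; a real client would make a network call here.
        Response.new(JSON.generate("score" => 42))
      end
    end
    
    class FindScoreTest < Minitest::Test
      def test_returns_the_score_from_the_api
        user = Struct.new(:id).new(1)
    
        score = FindScore.new(user, FakeHTTPClient.new).call
    
        assert_equal 42, score
      end
    end
    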


    Let’s imagine that we decide that we need to cache the score instead of making a web request every time. In the first style, we would probably add caching like this:

    def score
      Cache.fetch("score-for-#{@user.id}") do
        response_body.fetch("score", DEFAULT_SCORE)
      end
    end
    

    It’s expedient, but the fact that the score is cached is now again hidden in a private method.

    In my alternate version, we can make it a part of the story:

    def initialize(user, http_client = HTTParty, cache_client = Cache)
      @user = user
      @http_client = http_client
      @cache_client = cache_client
    end
    
    def call
      cache(@cache_client, @user) do
        make_api_request(@user)
          .then { parse_response(_1) }
          .then { extract_score(_1) }
      end
    end
    
    private
    
    def cache(cache_client, user, &block)
      cache_client.fetch("score-for-#{user.id}", &block)
    end
    

    I’ve added a few more lines than in the first implementation, in order to keep the story being told front and center and the details at a lower level of abstraction.

    I’ve come to believe that this storytelling, procedural way of writing classes is more legible and digestible for readers. The class is organized in the same order as the sequence of operations it codifies. It reminds me a lot of Unix pipes.

    Don’t let short methods obscure a class’s purpose. Inject your collaborators. Write method calls in the same order as the operations they are performing.

    Read on →

  • The REPL: Issue 108 - August 2023

    Eventual Business Consistency

    Kent Beck talks about bi-temporal modeling. It’s a topic I’m very interested in. I am glad that Kent Beck is talking about this: he has a great readership and might make this a more mainstream technique. I am not sure about renaming it to “Eventual Business Consistency”.

    However, I think part of the reason it hasn’t become more popular, given the benefits it brings, is just the name. Hence my proposed rebranding to “eventual business consistency”.

    I don’t see any rationale for this assertion. It doesn’t ring true for me. Bi-temporal data/modeling seems like a fine name. Programmers regularly talk about polymorphism, inheritance, dependency injection, concurrency, parallelism. As far as I can tell, bi-temporal doesn’t seem different from other technical jargon. I fail to see why it’s a disadvantage.

    If I had to guess, I think that he was right in the first place:

    Part of the reason it hasn’t taken off is because of the additional complexity it imposes on programmers.

    Bi-temporal modeling adds a lot of complexity. Most queries are “current” state queries, where NOW() can be used for both the validity range and the transaction range. The complexity comes from primary keys and foreign keys now needing to account for the ranges. It’s solvable, but most databases (Postgres, MySQL) don’t have first-class support for modeling like this. It could probably be addressed with extensions or application frameworks; I believe that could actually bring more usage.
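
    As a rough sketch of what that looks like in practice (table, column, and model names are mine, not from Kent Beck’s post), a “current state” query against a bi-temporal table might read like this in ActiveRecord:

    class Score < ApplicationRecord
      # Illustrative bi-temporal columns: valid_from/valid_to bound the
      # validity range, recorded_from/recorded_to bound the transaction range.
      scope :current, -> {
        now = Time.current
        where("valid_from <= :now AND (valid_to IS NULL OR :now < valid_to)", now: now)
          .where("recorded_from <= :now AND (recorded_to IS NULL OR :now < recorded_to)", now: now)
      }
    end
    
    # Most day-to-day queries only ever ask for the current state:
    Score.current.find_by(user_id: 42)
    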

    Another barrier to entry is that most applications are not designed with bi-temporal data in mind. Adding bi-temporal data to an existing model is more complicated and requires migrating data (and, obviously, not all historical data has been kept).

    Just normal web things.

    I nodded along while reading. Users have expectations of how the web should work and what they can do with it: copy text, open a new window, etc. Websites shouldn’t break that! Sometimes websites are really apps that have a different UX paradigm (e.g. a photo editor). Most of the websites that are coded as “apps” – and break web conventions – could easily be standard CRUD web apps. Sigh.

    Play in a sandbox in production

    Andy Croll advises using rails console --sandbox in production, to avoid making unintended data changes.

    The “why not” section is missing that opening a rails console with --sandbox opens a transaction that is rolled back after the console is closed. Long-running transactions can cause whole-system performance degradation when there is high load on the system.

    When should you worry about this? It depends on your system. I’ve worked on systems where traffic was relatively low and it wouldn’t be a problem. I’ve also worked on systems where a long-running transaction of only 1 or 2 minutes caused request queueing that would bring the whole system down.

    Is there an alternative? Yes: open a rails console using a read-only database connection (to a read replica, or configured to be read-only against the same database). That is not as easy as --sandbox, but it can be as simple as setting a Postgres variable to make the connection read-only.
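
    As a sketch of that last option (assuming Postgres; the model and attribute are hypothetical), run inside a production rails console:

    # Make every new transaction on this connection read-only.
    # default_transaction_read_only is a standard Postgres setting.
    ActiveRecord::Base.connection.execute("SET default_transaction_read_only = on")
    
    User.count                        # reads work as usual
    User.first.update!(name: "oops")  # writes now raise instead of changing data
    # => ActiveRecord::StatementInvalid (PG::ReadOnlySqlTransaction)
    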

    Read on →

  • Unexpected Rails N+1 when using #without

    I recently noticed an unexpected N+1 in a Rails project when using #without (aliased from #excluding).

    The feature is a page that lists all available programs, and a list of participants other than the current user. In its basic form it’s equivalent to:

    # Controller:
    programs = Program.all.includes(:participants)
    # Program Load (2.0ms)  SELECT "programs".* FROM "programs"
    # ProgramParticipant Load (1.0ms)  SELECT "program_participants".* FROM "program_participants" WHERE "program_participants"."program_id" IN ($1, $2)  [["program_id", 1], ["program_id", 2]]
    # Person Load (0.5ms)  SELECT "people".* FROM "people" WHERE "people"."id" IN ($1, $2, $3)  [["id", 4], ["id", 2], ["id", 3]]
    
    # View
    programs.map do |program|
      program.participants.without(current_user).map { _1.first_name }.join(", ")
    end
    # Person Load (1.5ms)  SELECT "people".* FROM "people" INNER JOIN "program_participants" ON "people"."id" = "program_participants"."participant_id" WHERE "program_participants"."program_id" = $1 AND "people"."id" != $2  [["program_id", 1], ["id", 4]]
    # Person Load (0.6ms)  SELECT "people".* FROM "people" INNER JOIN "program_participants" ON "people"."id" = "program_participants"."participant_id" WHERE "program_participants"."program_id" = $1 AND "people"."id" != $2  [["program_id", 2], ["id", 4]]
    # => ["Gabriel", "Gabriel, Alex"]
    

    Notice that participants (in the people table) are being loaded again, seemingly ignoring the includes in the controller.

    The N+1 was not present before this app was upgraded to Rails 7.0. That is key. We can see in the changelog that ActiveRecord::Relation#excluding was implemented (though it is not mentioned in the guide as a notable change). Before that, excluding (or without) was implemented in Enumerable, which didn’t create the N+1. In fact, using that method – by calling to_a on the relation first – returns us to the desired behavior:

    programs.map do |program|
      program.participants.to_a.without(current_user).map { _1.first_name }.join(", ")
    end
    # => ["Gabriel", "Alex, Gabriel"]
    # --> Same result, no extra query!
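    

    The distinction is between ActiveRecord query methods, which spawn new SQL, and Enumerable methods, which operate on the already-loaded records. Any in-memory filtering avoids the extra queries; for example, a sketch equivalent to the view code above:

    programs.map do |program|
      # reject is an Enumerable method, so it runs over the preloaded
      # records instead of issuing a new query per program
      program.participants.reject { _1 == current_user }.map(&:first_name).join(", ")
    end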
    

    Conclusion

    Typically, doing more work in the database and less in Ruby brings performance improvements. In this specific case, the optimization prevented using already loaded data, which resulted in many more queries and overall worse performance. Catching these errors when upgrading Rails is difficult, because the functionality was actually not affected.

    Read on →

  • The REPL: Issue 107 - July 2023

    The Day FedEx Delivered Its Promise

    The story is compelling. A small tweak has a big payoff. We’d all like to believe that we can do that in our own lives. It also strikes me as apocryphal, but I haven’t actually checked. The takeaway is that incentives matter, and changing incentives changes behavior.

    The more interesting question is: how do you find the correct incentives? Getting lucky is one way. Is there a systematic methodology to design and measure incentives? It also reminds me of the adage “You optimize what you measure”. The measuring itself becomes an incentive.

    Responding to “Are bugs and slow delivery ok?”

    This article responds to another article about when it’s OK to ship buggy software. I think the original was a marketing ploy by that author, setting up a false dichotomy so that you would agree that you do need good quality. This author then misunderstands it, I think, but it doesn’t matter. The claim is that it is, in fact, OK to ship slowly and with bugs because:

    I’ve seen (and wrote) some terrible quality code. Really bad stuff. Untested, most of it. In nearly every place I’ve worked at. I’ve seen enormous amounts of time wasted with testing for, or fixing, bugs.

    You know what I haven’t seen? not once in 15 years? A company going under.

    There are lots of ways a company can perform badly without “going under”. It is a false dichotomy, akin to dismissing all diseases because they are not deadly, glossing over the nuance that you can still suffer a lot without dying.

    In any case, it’s hard to think that a company like Apple would be the most valuable company in the world if they embraced shipping buggy software or hardware. A company can limp along and compensate, but not excel. Quickbooks comes to mind. Their software is the worst. Everyone I know that uses it hates it, but they are profitable. Why? Because they have captured the accountants market, and they make their clients use it. They survive like that, but I don’t think they actually thrive.

    Online Data Type Change in PostgreSQL

    This article does a good job of sequencing how to change a column type in Postgres without locking the whole table. The temp table for keeping track of what needs to be backfilled works, but it can also be done without one.

    What was not discussed at all is what happens to the application code in the meantime. Rails applications will see the new column and register it. The trigger takes care of writing the data for that column, but when we rename the columns in the last step (or delete them), we are changing the column information out from under Rails, which will cause exceptions. Solvable problems, if you are looking for them in the first place.
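
    For example, one way to keep a running app from tripping over the in-flight columns is to have Rails ignore them until the cutover is done. A hedged sketch; the model and column names are illustrative, not from the article:

    class Account < ApplicationRecord
      # Hide the transitional column from Rails until the rename/drop is done;
      # remove this once the migration has fully cut over.
      self.ignored_columns += ["balance_new"]
    end
    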

    Read on →

  • Spring and Bundler

    I was trying to update Rails in one of my projects recently. The command I ran was not quite right, so I wanted to discard the changes to Gemfile.lock.

    I kept doing:

    git restore Gemfile.lock
    

    But git reported the file as still dirty, and the unwanted changes were still there. I couldn’t understand what was going on!

    Eventually, I noticed that:

    git restore Gemfile Gemfile.lock
    

    worked. Then it hit me: a spring server was still running and, apparently, running bundle install whenever the Gemfile changed, which was regenerating my unwanted changes as soon as I restored them with git.

    I guess for some use cases this quiet behavior helps. In my case, I wanted to run a particular bundle update:

    $ bundle update rails --conservative --patch --strict
    

    Not a plain bundle install quietly run for me by spring. Sigh.

    Read on →