• The REPL: Issue 110 - October 2023

    Postgres Goodies in Ruby on Rails 7.1

    Rails 7.1 is out with some very interesting features for Postgres users. Composite primary key support in particular caught my eye: When partitioning tables, using a composite primary key that includes the partition key is a best practice. Now Rails supports composite primary keys in models and associations (through query_constraints), ensuring that the partition key is always used when reading from the table.
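
    As a rough sketch (the models and columns below are my own illustration, not from the release notes):

    # Assumes an orders table partitioned by tenant_id, with a
    # composite primary key of (tenant_id, id).
    class Order < ApplicationRecord
      query_constraints :tenant_id, :id
    end

    class OrderItem < ApplicationRecord
      # Associations declare the composite key too, so lookups
      # always include the partition key.
      belongs_to :order, query_constraints: [:tenant_id, :order_id]
    end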

    Improved support for CTEs is also welcome!
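
    For example, relations now support a with method (the query below is my own illustration):

    # Generates: WITH "active_users" AS (SELECT ...) SELECT ...
    User.with(active_users: User.where(active: true))
        .from("active_users AS users")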

    Writing Object Shape friendly code in Ruby

    It turns out that the way you structure classes (and more precisely, how you initialize instance variables) in Ruby 3.2 and later has performance implications. Ben Sheldon discusses how to structure classes to take advantage of those optimizations. It is an interesting demonstration of how code style and the Ruby interpreter interact.
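
    The gist, as I understand it (my own illustration, not the article's): instances that define the same instance variables in the same order share an object shape, which keeps the interpreter's caches effective.

    # Shape-friendly: every instance defines the same ivars, in the
    # same order, so all instances share one shape.
    class ShapelyUser
      def initialize(name)
        @name = name
        @nickname = nil # defined eagerly, even when unused
      end
    end

    # Shape-unfriendly: @nickname only exists on some instances,
    # creating divergent shapes and less effective inline caches.
    class LazyUser
      def initialize(name)
        @name = name
      end

      def nickname=(value)
        @nickname = value
      end
    end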

    The article doesn’t mention how much this impacts performance. I wonder: On a typical web request, how much can structuring your classes this way actually save?

    The TLDR on Ruby’s new TLDR testing framework

    It’s called TLDR and it blows up if your tests take more than 1.8 seconds to run.

    Testing is a near and dear topic to me. I have not tried this new framework, but I have some initial thoughts:

    • 1.8s is not a lot of time for a whole test suite.
    • Tests that fast need to avoid database interactions at all costs. In my experience that leads to heavy mocking, which in turn can lead to unit tests passing while the components break on integration.
    • TLDR seems incompatible with large systems (e.g. a majestic monolith). For good or bad.
    • Pushing the envelope can lead to some great ideas. For example:

    TLDR automatically prepends the most-recently modified test file to the beginning of the suite

    This is brilliant. I have a script that guesses which test files to run on a branch based on what changed in git. After reading this, I immediately incorporated ordering the files by modification date.
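
    In sketch form (simplified from my script; the base branch and test runner here are assumptions):

    # Guess which test files to run from what changed in git,
    # ordering the most recently modified files first.
    changed = `git diff --name-only main`.split("\n")
    specs = changed.grep(%r{\Aspec/.*_spec\.rb\z}).select { |f| File.exist?(f) }
    specs.sort_by! { |f| -File.mtime(f).to_i }
    system("bundle", "exec", "rspec", *specs) unless specs.empty?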

    Read on →

  • Book Review: A Philosophy of Software Design

    by John Ousterhout

    This book focuses on software design, identified as a continuous process that spans the complete lifecycle of a software system. The first part of the book proposes that the main issue in software design lies in managing complexity.

    If a software system is hard to understand and modify, then it is complicated; if it is easy to understand and modify, then it is simple.

    The rest of the book is a collection of principles, techniques, and heuristics to help remove or hide complexity, replacing it with a simpler design.

    It is easier to tell whether a design is simple than it is to create a simple design.

    Probably the most salient piece of advice is that “modules should be deep”: A module is deep when it offers its callers a narrow interface that provides a lot of functionality, abstracting away the details of the implementation.

    Adding garbage collection to a system actually shrinks its overall interface, since it eliminates the interface for freeing objects. The implementation of a garbage collector is quite complex, but that complexity is hidden from programmers.

    Overall, I found the book worthwhile, especially its attitude that the overall design of a system is constantly shifting: Individual programmers add to or remove from the complexity in small increments every time they make changes to the system. Cutting corners too often will leave the code in a state that is hard to recover from.

    My own attitudes to software design align well with Ousterhout’s, except for comments and tests. The author uses comments as a design aid: writing interface comments first, before implementing any code, so that they guide the design. This gets the programmer thinking about how the module will be used, instead of how it will be implemented. As for tests:

    The problem with test-driven development is that it focuses attention on getting specific features working, rather than finding the best design.

    I wholeheartedly agree with the goal of writing comments first: Outside-in thinking results in better design. Focusing on how a module will be used from a caller’s perspective improves the module’s API. Sometimes comments can serve that purpose, but I think the author misses that test-driven development (TDD) accomplishes that purpose as well. When you write your tests first, by definition you are forced to think about how the module will be called, because the test itself uses it! In fact, TDD works best when you start writing tests in the outermost layer of your system and work your way inwards. It takes some time getting used to, because the outermost test won’t pass until the innermost implementation is complete. The gain is that those tests inform the design through the layers.

    As for the criticism that TDD is too focused on getting specific features working: I think that describes a “shallow” TDD. TDD is typically a red-green-refactor loop. Red: write a failing test. Green: make it pass. Refactor: improve the design. I would agree with Ousterhout if we stopped at red-green, but the last step, the refactor, is what makes it complete: Red improves the API design, Green makes it correct, Refactor improves the internal design.
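
    To make the loop concrete, here is a toy example of my own (not from the book), using Minitest:

    require "minitest/autorun"

    # Red: this test is written first and fails until Discount exists.
    # Writing the call first forces thinking about the API from the
    # caller's point of view.
    class DiscountTest < Minitest::Test
      def test_applies_percentage_discount
        assert_equal 90.0, Discount.new(10).apply(100.0)
      end
    end

    # Green: the simplest implementation that passes. Refactor: with
    # the test green, the internals can be reshaped safely.
    class Discount
      def initialize(percent)
        @percent = percent
      end

      def apply(amount)
        amount - (amount * @percent / 100.0)
      end
    end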


    Read on →

  • The REPL: Issue 109 - September 2023

    YJIT is juicy!

    I’ve seen some recent posts on social media about the great performance of Ruby + YJIT. It’s time to give it a try!

    I got it working locally with asdf:

    $ asdf install rust 1.72.1
    $ export ASDF_RUST_VERSION=1.72.1
    $ export RUBY_CONFIGURE_OPTS=--enable-yjit
    
    $ asdf install ruby 3.2.1
    $ asdf shell ruby 3.2.1
    $ ruby --yjit -v
    ruby 3.2.1 (2023-02-08 revision 31819e82c8) +YJIT [arm64-darwin22]
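
    Once built, Ruby can confirm at runtime that the JIT is active:

    # Ruby 3.2+ exposes RubyVM::YJIT; enabled? reports whether the
    # JIT is active in the current process.
    puts RubyVM::YJIT.enabled? # => true when started with --yjit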
    

    Why You Might Not Want to Run Rails App:update

    The author points out that rails app:update should be used with caution, because it might make unwanted changes to your application or remove manually added configuration. Fair enough. What I don’t understand is the remedy: not using it at all! That is what version control is for! I’ve upgraded multiple apps, multiple times, using rails app:update. In every case, before committing the changes to version control, I inspect each one and make an informed decision about whether or not to keep it.

    Read on →

  • On Class Structure

    Don’t let short methods obscure a class’s purpose.

    There is a pattern of writing Ruby classes that I see a lot: Classes with small methods, most of them private. For example:

    class FindScore
      DEFAULT_SCORE = 0
      URL = 'http://example.com'.freeze
    
      def initialize(user)
        @user = user
      end
    
      def call
        score
      end
    
      private
    
      def score
        response_body.fetch("score", DEFAULT_SCORE)
      end
    
      def response_body
        @response_body ||= JSON.parse(response.body)
      end
    
      def response
        HTTParty.post(
          URL,
          body: { user_id: @user.id }
        )
      end
    end
    

    Each method is small and easy to understand on its own. However, I find that this code obscures the class’s purpose. Figuring out what FindScore does feels like reading backwards, like solving a mystery. I see that call returns the score… which is extracted from the response body, with a default if it wasn’t there… the response body is memoized, obtained from the response, and parsed from JSON… the response itself is obtained by making a web request. Now I can unravel the mystery: We are making a web request to obtain a user’s score; it returns JSON, which we parse, and then we extract the score from that. The sequence of operations is the reverse of how the code is written.

    Notice how, as I built up my mental image of what the class is doing, I was also dealing with low-level details like the default score value or memoization. And this is a fairly simple class.

    For the last few years, I’ve been writing classes in a different style:

    class FindScore
      DEFAULT_SCORE = 0
      URL = 'http://example.com'.freeze
    
      def initialize(user, http_client = HTTParty)
        @user = user
        @http_client = http_client
      end
    
      def call
        make_api_request(@user, @http_client)
          .then { parse_response(_1) }
          .then { extract_score(_1) }
      end
    
      private
    
      def make_api_request(user, http_client = @http_client, url = URL)
        http_client.post(
          url,
          body: { user_id: user.id }
        )
      end
    
      def parse_response(response)
        JSON.parse(response.body)
      end
    
      def extract_score(response_body, default = DEFAULT_SCORE)
        response_body.fetch("score", default)
      end
    end
    

    Let’s ask the same question: What does FindScore do? It makes an API request, then it parses that response, then it extracts the score. That is it! That is the high-level overview of the class, all clearly laid out in #call. Now, I can deal with the details of each method if I am interested in knowing more.

    Notice that the private methods are as small as in the first class: They are one-liners. The major difference is in how we sequence those methods, and that makes a huge difference: We are now telling the story of this class at a high level of abstraction. Additionally, the private methods interact only with their arguments (and the constants in the class), which makes them easier to reason about. I’ve also decided to inject http_client in the initializer. It makes it clear which collaborators this class is dealing with: a user and an http_client. It gives the reader an initial hint of what is to come. I expect most callers will use the default, but injecting all collaborators makes the class easier to test too, as sketched below.
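
    For instance, a test can pass in a stand-in client, without any mocking library (a sketch using Minitest; FakeClient is hypothetical):

    require "minitest/autorun"

    # A stand-in that responds to .post the way HTTParty does.
    class FakeClient
      Response = Struct.new(:body)

      def self.post(_url, body:)
        Response.new('{"score": 42}')
      end
    end

    class FindScoreTest < Minitest::Test
      def test_extracts_score_from_response
        user = Struct.new(:id).new(1)
        assert_equal 42, FindScore.new(user, FakeClient).call
      end
    end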


    Let’s imagine that we decide that we need to cache the score instead of making a web request every time. In the first style, we would probably add caching like this:

    def score
      Cache.fetch("score-for-#{@user.id}") do
        response_body.fetch("score", DEFAULT_SCORE)
      end
    end
    

    It’s expedient, but the fact that the score is cached is now again hidden in a private method.

    In my alternate version, we can make it a part of the story:

    def initialize(user, http_client = HTTParty, cache_client = Cache)
      @user = user
      @http_client = http_client
      @cache_client = cache_client
    end
    
    def call
      cache(@cache_client, @user) do
        make_api_request(@user)
          .then { parse_response(_1) }
          .then { extract_score(_1) }
      end
    end
    
    private
    
    def cache(cache_client, user, &block)
      cache_client.fetch("score-for-#{user.id}", &block)
    end
    

    I’ve added a few more lines compared to the first implementation, in order to keep the story being told front and center, with the details at a lower level of abstraction.

    I’ve come to believe that this storytelling, procedural style of writing classes is more legible and digestible for readers. It is organized in the same order as the sequence of operations it codifies. It reminds me a lot of Unix pipes.

    Don’t let short methods obscure a class’s purpose. Inject your collaborators. Write method calls in the same order as the operations they are performing.

    Read on →

  • The REPL: Issue 108 - August 2023

    Eventual Business Consistency

    Kent Beck talks about bi-temporal modeling. It’s a topic I’m very interested in. I am glad that Kent Beck is talking about this: He has a large readership and might make this a more mainstream technique. I am not sure about renaming it to “Eventual Business Consistency”.

    However, I think part of the reason it hasn’t become more popular, given the benefits it brings, is just the name. Hence my proposed rebranding to “eventual business consistency”.

    I don’t see any rationale for this assertion. It doesn’t ring true for me. Bi-temporal data/modeling seems like a fine name. Programmers regularly talk about polymorphism, inheritance, dependency injection, concurrency, parallelism. As far as I can tell, bi-temporal doesn’t seem different from other technical jargon. I fail to see why the name is a disadvantage.

    If I had to guess, I think that he was right in the first place:

    Part of the reason it hasn’t taken off is because of the additional complexity it imposes on programmers.

    Bi-temporal modeling adds a lot of complexity. Most queries are “current” state queries, where NOW() can be used for both the validity range and the transaction range. The complexity comes from primary keys and foreign keys now needing to account for the ranges. It’s solvable, but most databases (Postgres, MySQL) don’t have first-class support for this kind of modeling. That could probably be addressed with extensions or application frameworks, which I believe could bring more usage.
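
    To illustrate the “current” state query (my own sketch; the model and range columns are assumptions, using Postgres range types):

    # valid_range: when the fact is true in the real world.
    # txn_range: when the database believed the fact.
    # @> is Postgres' range-contains-element operator.
    Price.where("valid_range @> now() AND txn_range @> now()")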

    Another barrier to entry is that most applications are not designed with bi-temporal data in mind. Adding bi-temporal data to an existing model is more complicated and requires migrating data (and obviously not all historical data has been kept).

    Just normal web things.

    I nodded along while reading. Users have expectations of how the web should work and what they can do with it: copy text, open a new window, etc. Websites shouldn’t break that! Sometimes websites are really apps that have a different UX paradigm (e.g. a photo editor). But most of the websites that are coded as “apps” – and break web conventions – could easily be standard CRUD web apps. Sigh.

    Play in a sandbox in production

    Andy Croll advises using rails console --sandbox in production, to avoid making unintended data changes.

    The “why not” section fails to mention that opening a rails console with --sandbox opens a transaction that is rolled back when the console is closed. Long-running transactions can degrade performance across the whole system when it is under high load.

    When should you worry about this? It depends on your system. I’ve worked on systems where traffic was relatively low and this wouldn’t be a problem. I’ve also worked on systems where a long-running transaction of only 1 or 2 minutes caused request queueing that would bring the whole system down.

    Is there an alternative? Yes: Open a rails console using a read-only database connection (to a read replica, or configured to be read-only against the same database). That is not as easy as --sandbox, but it can be as simple as setting a Postgres variable to make the connection read-only, as sketched below.
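
    A minimal sketch with ActiveRecord and Postgres (assuming nothing else about the setup):

    # Make the session's transactions read-only from this point on;
    # any write will raise an error.
    ActiveRecord::Base.connection.execute(
      "SET default_transaction_read_only = on"
    )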

    Read on →