-
Book Review: A Philosophy of Software Design
by John Ousterhout
This book focuses on software design, treated as a continuous process that spans the complete lifecycle of a software system. The first part of the book proposes that the main issue in software design is managing complexity.
If a software system is hard to understand and modify, then it is complicated; if it is easy to understand and modify, then it is simple.
The rest of the book is a collection of principles, techniques, and heuristics to help remove or hide that complexity, replacing it with a simpler design.
It is easier to tell whether a design is simple than it is to create a simple design.
Probably the most salient piece of advice is that “modules should be deep”: a module is deep when it offers its callers a narrow interface that provides a lot of functionality and abstracts away the details of the implementation.
Adding garbage collection to a system actually shrinks its overall interface, since it eliminates the interface for freeing objects. The implementation of a garbage collector is quite complex, but that complexity is hidden from programmers.
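The garbage collector is the book’s example of a deep interface. To make the same idea concrete in Ruby, here is a small sketch of my own (the `ScoreCache` class is entirely hypothetical): one narrow public method hiding expiration, storage, and locking.

```ruby
# A hypothetical "deep" module: the public interface is a single method,
# but expiration, storage, and locking details all stay hidden behind it.
class ScoreCache
  def initialize(ttl: 300)
    @ttl = ttl
    @store = {}        # key => [value, expires_at]
    @lock = Mutex.new
  end

  # The entire interface: fetch a value, computing and caching it on a miss.
  # Callers never see eviction or synchronization.
  def fetch(key)
    @lock.synchronize do
      value, expires_at = @store[key]
      return value if value && expires_at > Time.now

      fresh = yield
      @store[key] = [fresh, Time.now + @ttl]
      fresh
    end
  end
end

cache = ScoreCache.new(ttl: 60)
cache.fetch("user-42") { 100 } # => 100 (computed and cached)
cache.fetch("user-42") { 999 } # => 100 (served from the cache)
```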
Overall, I found the book worthwhile, especially its attitude that the overall design of a system is constantly shifting: individual programmers add to or remove from its complexity in small increments every time they change the system. Cutting corners too often will leave the code in a state that is hard to recover from.
My own attitudes to software design align well with Ousterhout’s, except when it comes to comments and tests. The author uses comments as a design aid: writing interface comments first, before implementing any code, so that they guide the design. This gets the programmer thinking about how the module will be used, instead of how it will be implemented. As for tests:
The problem with test-driven development is that it focuses attention on getting specific features working, rather than finding the best design.
I wholeheartedly agree with the goal of writing comments first: outside-in thinking results in better design, and focusing on how a module will be used from a caller’s perspective improves the module’s API. Comments can serve that purpose, but I think the author misses that test-driven development (TDD) accomplishes it as well. When you write your tests first, you are by definition forced to think about how the module will be called, because the test itself uses it! In fact, TDD works best when you start writing tests in the outermost layer of your system and work your way inwards. It takes some getting used to, because the outermost test won’t pass until the innermost implementation is complete, but the gain is that those tests inform the design through the layers.

As for the criticism that TDD is too focused on getting specific features working, I think that describes a “shallow” TDD. TDD is typically a red-green-refactor loop. Red: write a failing test. Green: make it pass. Refactor: improve the design. I would agree with Ousterhout if we stopped at red-green, but the last step, the refactor, is what makes it complete: red improves the API design, green makes it correct, refactor improves the internal design.
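As a small illustration of that loop (my own sketch in Minitest; the `Greeter` class is hypothetical, not from the book):

```ruby
require "minitest/autorun"

# Red: the test is written first, which forces me to decide how the
# module will be called before any implementation exists.
class GreeterTest < Minitest::Test
  def test_greets_by_name
    assert_equal "Hello, Ada!", Greeter.new("Ada").call
  end
end

# Green: the smallest implementation that makes the test pass.
# Refactor: with the test as a safety net, the internals can now be
# reshaped (extracting methods, renaming) without changing the API.
class Greeter
  def initialize(name)
    @name = name
  end

  def call
    "Hello, #{@name}!"
  end
end
```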
-
The REPL: Issue 109 - September 2023
YJIT is juicy!
I’ve seen some recent posts on social media about the great performance of Ruby + YJIT. It’s time to give it a try!
I got it working locally with asdf:

```shell
$ asdf install rust 1.72.1
$ export ASDF_RUST_VERSION=1.72.1
$ export RUBY_CONFIGURE_OPTS=--enable-yjit
$ asdf install ruby 3.2.1
$ asdf shell ruby 3.2.1
$ ruby --yjit -v
ruby 3.2.1 (2023-02-08 revision 31819e82c8) +YJIT [arm64-darwin22]
```
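As an extra sanity check (my own addition, not part of the setup above), YJIT status can also be confirmed from inside a running process:

```ruby
# Prints true when the process was started with --yjit
# (or YJIT was enabled at runtime).
puts RubyVM::YJIT.enabled?
```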
Why You Might Not Want to Run Rails App:update
The author points out that `rails app:update` should be used with caution, because it might make unwanted changes to your application or remove manually added configuration. Fair enough. What I don’t understand is the remedy: not using it at all! That is what version control is for. I’ve upgraded multiple apps, multiple times, using `rails app:update`. In every case, before committing the changes to version control, I inspect each one and make an informed decision about whether to keep it.
-
On Class Structure
Don’t let short methods obscure a class’s purpose.
There is a pattern of writing Ruby classes that I see a lot: classes with small methods, most of them `private`. For example:

```ruby
class FindScore
  DEFAULT_SCORE = 0
  URL = 'http://example.com'.freeze

  def initialize(user)
    @user = user
  end

  def call
    score
  end

  private

  def score
    response_body.fetch("score", DEFAULT_SCORE)
  end

  def response_body
    @response_body ||= JSON.parse(response.body)
  end

  def response
    HTTParty.post(
      URL,
      body: { user_id: @user.id }
    )
  end
end
```
Each method is small and easy to understand on its own. However, I find that this code obscures the class’s purpose. Figuring out what `FindScore` does feels like reading backwards, like solving a mystery. I see that `call` returns the score… which is extracted from the response body, with a default if it wasn’t there… the response body is memoized, obtained from the response, and parsed from JSON… the response itself is obtained by making a web request. Now I can unravel the mystery: we are making a web request to obtain a user’s score, it returns JSON which we parse, and then we extract the score from that. The sequence of operations is the reverse of how the code is written.

Notice how, as I built up my mental image of what the class is doing, I was also dealing with low-level details like the default score value and the memoization. And this is a fairly simple class.
For the last few years, I’ve been writing classes in a different style:
```ruby
class FindScore
  DEFAULT_SCORE = 0
  URL = 'http://example.com'.freeze

  def initialize(user, http_client = HTTParty)
    @user = user
    @http_client = http_client
  end

  def call
    make_api_request(@user, @http_client)
      .then { parse_response(_1) }
      .then { extract_score(_1) }
  end

  private

  def make_api_request(user, http_client = @http_client, url = URL)
    http_client.post(
      url,
      body: { user: user.id }
    )
  end

  def parse_response(response)
    JSON.parse(response.body)
  end

  def extract_score(response_body, default = DEFAULT_SCORE)
    response_body.fetch("score", default)
  end
end
```
Let’s ask the same question: What does `FindScore` do? It makes an API request, then it parses that response, then it extracts the score. That is it! That is the high-level overview of the class, all clearly laid out in `#call`. Now I can dig into the details of each method if I am interested in knowing more.

Notice that the private methods are as small as in the first class: they are one-liners. The major difference is in how we sequenced those methods, and that makes a huge difference. We are now telling the story of this class at a high level of abstraction. Additionally, the private methods interact only with their arguments (and the constants in the class), which makes them easier to reason about. I’ve also decided to inject `http_client` in the initializer. It makes it clear which collaborators this class is dealing with: a `user` and an `http_client`. That gives the reader an initial hint of what is to come. I expect most callers will use the default, but injecting all collaborators makes the class easier to test too.
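A quick illustration of that last point – a hypothetical Minitest test, assuming the second `FindScore` above is loaded; the fake client and its response shape are my own inventions:

```ruby
require "minitest/autorun"
require "json"

# A minimal fake that quacks like the injected http_client.
class FakeHTTPClient
  Response = Struct.new(:body)

  def post(_url, body:)
    Response.new(JSON.generate("score" => 42))
  end
end

class FindScoreTest < Minitest::Test
  FakeUser = Struct.new(:id)

  def test_returns_the_score_from_the_api_response
    # No network and no stubbing library: just pass the fake collaborator in.
    finder = FindScore.new(FakeUser.new(1), FakeHTTPClient.new)
    assert_equal 42, finder.call
  end
end
```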
Let’s imagine that we decide that we need to cache the score instead of making a web request every time. In the first style, we would probably add caching like this:
```ruby
def score
  Cache.fetch("score-for-#{@user.id}") do
    response_body.fetch("score", DEFAULT_SCORE)
  end
end
```
It’s expedient, but the fact that the score is cached is once again hidden inside a private method.
In my alternate version, we can make it a part of the story:
```ruby
def initialize(user, http_client = HTTParty, cache_client = Cache)
  @user = user
  @http_client = http_client
  @cache_client = cache_client
end

def call
  cache(@cache_client, @user) do
    make_api_request(@user)
      .then { parse_response(_1) }
      .then { extract_score(_1) }
  end
end

private

def cache(cache_client, user, &block)
  cache_client.fetch("score-for-#{user.id}", &block)
end
```
I’ve added a few more lines compared to the first implementation, in order to keep the story being told front and center and the details at a lower level of abstraction.
I’ve come to believe that this story-telling, procedural style of writing classes is more legible and digestible for readers. It is organized in the same order as the sequence of operations it codifies. It reminds me a lot of Unix pipes.
Don’t let short methods obscure a class’s purpose. Inject your collaborators. Write method calls in the same order as the operations they are performing.
-
The REPL: Issue 108 - August 2023
Eventual Business Consistency
Kent Beck talks about bi-temporal modeling, a topic I’m very interested in. I am glad that Kent Beck is writing about it: he has a large readership and might make this a more mainstream technique. I am not sure about renaming it to “Eventual Business Consistency”, though:
However, I think part of the reason it hasn’t become more popular, given the benefits it brings, is just the name. Hence my proposed rebranding to “eventual business consistency”.
I don’t see any rationale for this assertion, and it doesn’t ring true for me. Bi-temporal data/modeling seems like a fine name. Programmers regularly talk about polymorphism, inheritance, dependency injection, concurrency, and parallelism. As far as I can tell, “bi-temporal” is no different from other technical jargon, and I fail to see why the name itself is a disadvantage.
If I had to guess, I think that he was right in the first place:
Part of the reason it hasn’t taken off is because of the additional complexity it imposes on programmers.
Bi-temporal modeling adds a lot of complexity. Most queries are “current state” queries, where `NOW()` can be used for both the validity range and the transaction range. The complexity comes from primary keys and foreign keys now needing to account for the ranges. It’s solvable, but most databases (Postgres, MySQL) don’t have first-class support for this kind of modeling. That could probably be addressed with extensions or application frameworks, which I believe would actually drive more usage.

Another barrier to entry is that most applications are not designed with bi-temporal data in mind. Adding bi-temporal data to an existing model is more complicated and requires migrating data (and obviously not all historical data has been kept).
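To sketch what the two ranges look like in practice (a plain-Ruby illustration of my own, not from Kent Beck’s post): each fact carries a validity range (when it was true in the real world) and a transaction range (when the system believed it), and the common “current state” query pins both ranges to now.

```ruby
# A bi-temporal fact: valid_* is real-world time, recorded_* is database time.
Score = Struct.new(:user_id, :value, :valid_from, :valid_to,
                   :recorded_from, :recorded_to, keyword_init: true)

FOREVER = Time.utc(9999)

scores = [
  # Originally recorded on Jan 1: the score is 10.
  Score.new(user_id: 1, value: 10,
            valid_from: Time.utc(2023, 1, 1), valid_to: FOREVER,
            recorded_from: Time.utc(2023, 1, 1), recorded_to: Time.utc(2023, 2, 1)),
  # Correction recorded on Feb 1: we now know the score was 12 all along.
  Score.new(user_id: 1, value: 12,
            valid_from: Time.utc(2023, 1, 1), valid_to: FOREVER,
            recorded_from: Time.utc(2023, 2, 1), recorded_to: FOREVER)
]

# The common "current state" query: what is true now, as we know it now?
now = Time.now.utc
current = scores.find do |s|
  (s.valid_from...s.valid_to).cover?(now) &&
    (s.recorded_from...s.recorded_to).cover?(now)
end
puts current.value # => 12
```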
Just normal web things.
I nodded along while reading. Users have expectations of how the web should work and what they can do with it: copy text, open a link in a new window, etc. Websites shouldn’t break that! Sometimes websites really are apps with a different UX paradigm (e.g. a photo editor), but most of the websites that are coded as “apps” – and break web conventions – could easily be standard CRUD web apps. Sigh.
Play in a sandbox in production
Andy Croll advises using `rails console --sandbox` in production, to avoid making unintended data changes.

The “why not” section is missing that opening a Rails console with `--sandbox` opens a transaction that is rolled back when the console is closed. Long-running transactions can cause whole-system performance degradation when there is high load on the system.

When should you worry about this? It depends on your system. I’ve worked on systems where traffic was relatively low and it wouldn’t have been a problem. I’ve also worked on systems where a long-running transaction of only one or two minutes caused request queueing that would bring the whole system down.
Is there an alternative? Yes: open a Rails console using a read-only database connection (to a read replica, or configured to be read-only against the same database). That is not as easy as `--sandbox`, but it can be as simple as setting a Postgres variable to make the connection read-only.
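For example (a sketch of what I mean, assuming Postgres and an ActiveRecord connection; `User` stands in for any model and the exact setup will vary):

```ruby
# In a production Rails console: make this session's transactions read-only
# at the Postgres level, so any accidental write fails fast.
ActiveRecord::Base.connection.execute("SET default_transaction_read_only = on")

User.count # reads still work
# Writes now raise ActiveRecord::StatementInvalid with a Postgres error
# along the lines of "cannot execute UPDATE in a read-only transaction".
```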
-
Unexpected Rails N+1 when using #without
I recently noticed an unexpected N+1 in a Rails project when using `#without` (aliased from `#excluding`).

The feature is a page that lists all available programs, each with a list of participants other than the current user. In its basic form, it’s equivalent to:
```ruby
# Controller:
programs = Program.all.includes(:participants)
# Program Load (2.0ms)  SELECT "programs".* FROM "programs"
# ProgramParticipant Load (1.0ms)  SELECT "program_participants".* FROM "program_participants" WHERE "program_participants"."program_id" IN ($1, $2)  [["program_id", 1], ["program_id", 2]]
# Person Load (0.5ms)  SELECT "people".* FROM "people" WHERE "people"."id" IN ($1, $2, $3)  [["id", 4], ["id", 2], ["id", 3]]

# View
programs.map do |program|
  program.participants.without(current_user).map { _1.first_name }.join(", ")
end
# Person Load (1.5ms)  SELECT "people".* FROM "people" INNER JOIN "program_participants" ON "people"."id" = "program_participants"."participant_id" WHERE "program_participants"."program_id" = $1 AND "people"."id" != $2  [["program_id", 1], ["id", 4]]
# Person Load (0.6ms)  SELECT "people".* FROM "people" INNER JOIN "program_participants" ON "people"."id" = "program_participants"."participant_id" WHERE "program_participants"."program_id" = $1 AND "people"."id" != $2  [["program_id", 2], ["id", 4]]
# => ["Gabriel", "Gabriel, Alex"]
```
Notice that the participants (in the `people` table) are being loaded again, seemingly ignoring the `includes` in the controller.

The N+1 was not present before this app was upgraded to Rails 7.0. That is key. We can see in the changelog the introduction of `ActiveRecord::Relation#excluding` (not mentioned in the guides as a notable change). Before that, `excluding` (or `without`) was implemented in `Enumerable`, which didn’t create the N+1. In fact, using that method – by calling `to_a` on the relation first – returns us to the desired behavior:

```ruby
programs.map do |program|
  program.participants.to_a.without(current_user).map { _1.first_name }.join(", ")
end
# => ["Gabriel", "Alex, Gabriel"]
# --> Same result, no extra query!
```
Conclusion
Typically, doing more work in the database and less in Ruby brings performance improvements. In this specific case, the optimization prevented using already-loaded data, which resulted in many more queries and overall worse performance. Catching these errors when upgrading Rails is difficult, because the functionality itself was not affected, only the performance.