Ylan Segal

Bug-Driven Design

The Bug

The other day, I got a bug report from our customer service team. One of our newest customer’s was using our public API and was reporting inconsistencies for our endpoint that lists account users. The issue was that for some users, their role in the account and their active flag was being reported incorrectly – compared with the value visible through our web interface. This was surprising: That endpoint has been in place for years and is used regularly by many of our customers. Nonetheless, the customer support team had verified the report from the customer and I had examples of the JSON output from their API interactions.

The code base for this projects is a mono-Rails application. I’ve worked on it for more than five years. It has evolved significantly throughout the years (including the addition of the public API in question). It has been worked on by more than a dozen engineers. The root cause of the bug could be anywhere. My go-to move in this kind of situation is to reproduce in the development environment, following the rule-of-thumb that reproducing an issue is half the battle. In this case, reproducing the bug was extremely painful. I spot checked the API call for some of my users in my development environment and – with QA’s help – tested on a staging environment. I finally found some instances of the issue, but it wasn’t at all clear to me what the root cause was.

Next, I took a dive through the code. I followed the code starting from the controller action in question and followed the calls all the way to the database and back. Eventually, I found the problem in a decorator class of all things. In this particular project, we use classes to serialize responses into hashes and then let rails serialize those into JSON. The code is functionally similar to this1:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
class UsersController < ApplicationController
  def index
    @users = #... somehow load users.
    render json: @users.map { |u| UserDecorator.new(u).to_h(current_account) }
  end
end

class UserDecorator
  def initialize(user)
    @user = user
  end

  def to_h(account)
    user_account = user.user_accounts.detect { |ua| ua.account_id = account.id }

    {
      id: user.id,
      first_name: user.given_name,
      last_name: user.family_name,
      email_address: user.primary_email,
      role: user_account.role,
      enabled: user_account.enabled?
    }
  end

  private

  attr_reader :user
end

The above code does some mapping between our internal names for some fields and what we decided to call them in the public API. It also includes some information from the UserAccount model. Spot the bug yet? I didn’t.

The user, account and user_account models in question are typical ActiveRecord models. The relevant pieces look like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
class Account < ActiveRecord::Base
  has_many :user_accounts
  has_many :users, nil, through: :user_accounts
end

class UserAccount < ActiveRecord::Base
  belongs_to :user
  belongs_to :account
end

class User < ActiveRecord::Base
  has_many :user_accounts
  has_many :accounts, nil, through: :user_accounts
end

If you are a keen reader, you probably noticed the same thing I did: In the decorator code user.user_accounts will return an ActiveRecord::Relation. Why is the code filtering for that relation in Ruby? Why is the work being done after loading the ActiveRecord objects instead of filtering in the much-more efficient database? While I was pondering at, it finally hit me: The block inside of detect should be using a ==

1
2
3
user_account = user.user_accounts.detect { |ua| ua.account_id = account.id }
# Should really be
user_account = user.user_accounts.detect { |ua| ua.account_id == account.id }

What is happening? Instead of finding the user_account that corresponds to the passed in account, the code block is assigning a new account_id to the UserAccount. The assignment in ActiveRecord returns the same assigned value, the account id in this case. That value is truthy, so the detect call always returns the first user_account read from the database. By default, ActiveRecord will load the records in an association ordered by the primary key (i.e. id). In our case, this made the detection of the code particularly tricky. When using auto-incrementing ids – the default Rails case – the first account returned will also be the first account inserted: I had considered that as a possible cause for the bug, but rejected it because it didn’t fit my observations on production or staging servers. What I neglected to consider was that our models do not use auto-increment ids: They use UUIDs, stored in the database as strings and generated by SecureRandom. This throws-off any ordering:

1
2
3
SecureRandom.uuid # => "34a22bb9-e648-4264-ad13-3fbb07e9efc6"
SecureRandom.uuid # => "8b355635-ab2a-4b3e-936d-cbec824ab48d"
SecureRandom.uuid # => "39b2bc99-ee56-41e1-9e08-cc2cd65bc374"

The Fix

At this point, I was ready to create a failing test case that exposes the bug: We need a user with multiple UserAccounts (and thus multiple Accounts), we need the UserAccount to have different values for role and enabled?, and we need for them to be loaded in the correct order from the database so that we can expose the bug. My tests looks something like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
describe UserDecorator do
  subject { described_class.new(user) }
  # This project uses FactoryGirl for model factories, instead of Rails' fixtures.
  let(:user) { FactoryGirl.create(:user, accounts_count: 2) }
  let(:user_account) {
    # We need the last user_account to be the one that we check for
    user.user_accounts.last.tap { |user_account|
      user_account.update!(role: role, enabled: true)
    }
  }
  let(:account) { user_account.account }
  let(:role) { :admin }

  before {
    # We need the first user account to have different values, so that it exposes the bug
    user.user_accounts.first.tap { |user_account|
      user_account.update!(role: :other, enabled: false)
    }
  }

  it "returns the correct role name for the user_account" do
    expect(subject.to_h[:role]).to be == role
  end

  it "returns the correct enabled value for the user_account" do
    expect(subject.to_h[:enabled]).to be == true
  end
end

With the bug exposed, I was committed a fix and opened a pull request, figuring I would fix the bug and get better performance by limiting the number of ActiveRecord objects loaded from the database.

1
2
3
4
# This:
user_account = user_accounts.detect { |ua| ua.account_uid = account.uid }
# Became
user_account = user_accounts.where(account: account).first

I wasn’t quite done. One of my co-workers pointed out that there was a reason that the code was using Ruby to filter the user_accounts. The previous commit message – the one that brought the Ruby filtering – had a helpful message that I neglected to read:

1
2
3
4
5
Find correct user_account in ruby for performance...

The user_accounts are already preloaded, but ActiveRecord goes
to the database again, which causes an N+1 query. In this case,
doing the filtering in Ruby is more efficient

I shouldn’t have assumed that the person that coded the filtering in Ruby was doing it wrong. In this case, there seem to be a good reason behind the unusual code. By the way, the author of that previous commit was me in 2015. I was smart enough to leave a note for my future self, but not humble enough to read it.

Design Implications

The bug is now fixed, yet I am not quite comfortable with the code. Two things bother me: - The specs needs a relatively complicated setup to expose the bug - The production code is loading more objects from the database, only to discard them in Ruby. This somehow is faster.

Why is a our test complicated? A decorator is supposed to be a simple class. It really is about serializing an object into JSON. The class is called a UserDecorator. It’s instantiated with User, which makes total sense. However, we have two collaborators that snuck-up on the serialization. An instance of Account, being passed in and a UserAccount that we find based on the other two. Where does the account come from? Originally, I glossed over how the users are loaded. The code is a bit complicated because it searches the database based on the parameters passed in. However, in it’s simplest form and removing a few layers of abstraction, it’s functionally equivalent to:

1
2
3
4
5
6
class UsersController < ApplicationController
  def index
    @users = SearchUsers.new(params).perform.preload(:user_accounts)
    render json: @users.map { |u| UserDecorator.new(u).to_h(current_account) }
  end
end

It’s not particularly important how SearchUsers works, but it’s sufficient to know that #perform returns an ActiveRecord::Relation that will load User records upon execution. The preloading here is what causes the user_accounts perviously alluded to in the commit message to be already loaded. If we stop to think about it, something else tickles my code nose: If we are searching for users within an account, and then pre-loading all user_accounts for those users, we are loading some data for other accounts. This is unacceptable. On requests that are account-scoped, like this one, data for other other accounts should never be loaded from the database.

Coming back to our decorator: Why are we even passing a parameter to the method? If #to_h is supposed to just construct a hash, let’s make the code just do that:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
class UserDecorator
  def initialize(user, account)
    @user = user
    @account = account
  end

  def to_h
    {
      id: user.id,
      first_name: user.given_name,
      last_name: user.family_name,
      email_address: user.primary_email,
      role: user_account.role,
      enabled: user_account.enabled?
    }
  end

  private

  attr_reader :user, :account

  def user_account
    @user_account ||= user.user_accounts.detect { |ua| ua.account_id == account.id }
  end

This is better: It moves the searching for a user account out on it’s own method. It also now needs a user and an account to initialize the decorator. A User and an Account together are… a UserAccount. And at this point, is where the flaws in the original design become evident. We are searching for, decorating and listing users, but working in an account scope. This concept already exists in our system, in the form of the a UserAccount and we should be leveraging it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
class UsersController < ApplicationController
  def index
    @users = SearchUserAccounts.new(params).perform
    render json: @users.map { |u| UserAccountDecorator.new(u).to_h }
  end
end

class UserAccountDecorator
  def initialize(user_account)
    @user_account = user_account
  end

  def to_h
    {
      id: user.id,
      first_name: user.given_name,
      last_name: user.family_name,
      email_address: user.primary_email,
      role: user_account.role,
      enabled: user_account.enabled?
    }
  end

  private

  attr_reader :user_account

  def user
    user_account.user
  end
end

This is markedly better. We are not loading extra objects from the database anymore. The only UserAccounts loaded are the ones pertaining to the account in scope. The decorator now doesn’t know or care about how the user_accounts are loaded. As always, we need to be careful that we don’t introduce N+1 queries when the user_account reaches out to the user. A judicious user of ActiveRecord::Relation#includes can take care of that.

What about the complicated spec setup? Well, we got rid of the UserDecorator and it’s specs. Sarah Mei writes about testing serving many purposes or – as she calls them – factors. One of those is informing our design. Our specs above clearly did that: The complicated setup lead to the realization that we were dealing with the wrong data model.

Another test factor is to prevent future regressions. The same issue we had before is unlikely to crop-up again in the new design. However, production bugs are to be taken seriously – especially ones where data is being improperly reported for an account. Where should this regression-prevention spec live? We could move it to the controller and make it an integration spec of sorts. I expect the setup to be at least as complicated as the one shown above – and quite possible much worse. Rails controllers are notoriously difficult to test. I would gladly pay that price before laving no regression spec in place, though.

Fortunately, there is a better place for our complicated spec: SearchUserAccounts. As the name implies, this class' responsibility is to load data from our database based on the input in the params. I haven’t shown that code at all, but I hope it’s clear that it deals with the ActiveRecord querying API. This is the correct level to test that only the correct data has been loaded. The regression spec then becomes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
describe SearchUserAccounts do
  subject { described_class.new(params) }
  let(:params) { { account: account } }
  let(:account) { user_account.account }
  let(:user_account) { user.user_accounts.first }
  let(:user) { FactoryGirl.create(:user, accounts_count: 2) }

  let(:other_user_account) { user.user_accounts.last }

  it "only loads user_accounts belonging to the scoped account" do
    aggregate_failures do
      expect(subject.perform).to include(user_account)
      expect(subject.perform).not_to include(other_user_account)
    end
  end
end

The spec is now simpler and testing at the correct level of abstraction.

Conclusion

Fixing bugs is a part of programming. Sometimes, they are simple and emerge because of our failure to consider all inputs or internal state. I hope that the takeaway for you, the reader, is that sometimes bugs can expose subtle design deficiencies. Improving the design can do much more than just fix the bug: It can bring simplicity to both the production code and it’s tests.


  1. All the code in this post is inspired by actual production code in a project I work on. It has been simplified considerably to make it easier to follow, protect copyright, and make it easier for the reader to see the proverbial forest for the trees.

Ruby Stdlib: Base64

Base64 is a widely-used mechanism to represent binary data in an ASCII string format. There are a few different Base64 schemes that share most of the implementation. The encoding strategy consists of choosing 64 characters that are common to most other string encodings and are also printable. For example, MIME’s Base64 implementation uses A-Z, a-z, and 0-9 for the first 62 characters. Other variations share this property but differ in the characters chosen for the last two values and an extra one for padding. Each base64 digit represents exactly 6 bits of data.

Wikipedia’s Base64 article has a great explanation of the details of encoding and decoding from Base64.

Base64 is typically used to send binary across channels that are text based like email, JSON Web Tokens, SAML Requests and Response, and many more.

Ruby includes the base64 package in its standard library, with support for RFC-2045, RFC-4648 and “RFC-4648 Base 64 Encoding with URL and Filename Safe Alphabet”.

It’s usage is straight forward:

1
2
3
4
5
6
7
require "base64"

encoded = Base64.encode64("I'd rather be a hammer than a nail")
# => "SSdkIHJhdGhlciBiZSBhIGhhbW1lciB0aGFuIGEgbmFpbA==\n"

Base64.decode64(encoded)
# => "I'd rather be a hammer than a nail"

Base64 is a module. It can be called directly, like in the previous example or included in other classes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
require "base64"

class MyEncoder
  include Base64

  def initialize(binary)
    @binary = binary
  end

  def encode
    urlsafe_encode64(@binary)
  end
end

MyEncoder.new("I'd rather be a hammer than a nail").encode
# => "SSdkIHJhdGhlciBiZSBhIGhhbW1lciB0aGFuIGEgbmFpbA=="

Notice that the urlsafe_encode64 returns slightly different results than encode64. See the Ruby documentation for details.

You can create your own modules with functions that can be included or called directly, like Base64 does. Use module_function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
module Greeter
  def hello
    "Hello, World!"
  end

  module_function :hello
end

class Person
  include Greeter

  def greet
    hello
  end
end


Greeter.hello # => "Hello, World!"
Person.new.greet # => "Hello, World!"

Under the hood, Base64 relies on Array#pack (documentation) and String#unpack1 (documentation) which do the heavy lifting. Both of these methods are implemented in C:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
pry(main)> show-source Array#pack

From: pack.c (C Method):
Owner: Array
Visibility: public
Number of lines: 621

static VALUE
pack_pack(int argc, VALUE *argv, VALUE ary)
{
  # ... many lines removed
}

pry(main)> show-source String#unpack1

From: pack.c (C Method):
Owner: String
Visibility: public
Number of lines: 5

static VALUE
pack_unpack1(VALUE str, VALUE fmt)
{
    return pack_unpack_internal(str, fmt, UNPACK_1);
}

The REPL: Issue 37 - August 2017

The fallacies of web application performance

Is performance only a production concern? Are threads enough for multi-core concurrency? Are there cost-free solutions to solve performance? José Valim answers these en some other questions in this post. José is the creator of Elixir’s Phoenix framework and was part of the Rails core team. I’ve found most of his writing to be worth my time. This is no exception.

Developing with Kafka and Rails Applications

Sam Goldman explains how Blue Apron uses Ruby on Rails to work with Apache Kafka. Part of the article touches on which gems they use to process Kafka streams. The other portion describes how to setup a local development environment. Docker is leveraged effectively to make a complicated setup something easy to spin up locally: The final product has 4 different services: Zookeper, a Kafka broker, A schema registry, and a REST proxy for Kafka.

An Intro to Compilers

Nicole Orchard writes an introductory post on how compilers work. Specifically those leveraging the LLVM toolchain – used by Swift, most Mac gcc compilers, Crystal and many more. It takes a simple “Hello, Compiler!” program through the 3 phases: Front-end, Optimizer and Back-end. Short and sweet.

Book Review: Understanding the Four Rules of Simple Design

Understanding the Four Rules of Simple Design by Corey Haines is a book about how to approach software design from a perspective of his years of the authors involvement in Code Retreats. A Code Retreat is a day-long practice session for software developers where they can explore different ways of building software by practicing deliberately without the pressure of having to deliver production code. I’ve previously written about my experience in a code retreat.

The book uses the same base example that code retreats do: Conway’s Game of Life. This example is specifically chosen because the rules are simple enough to understand quickly, yet it possible to write an implementation in many different ways with interesting tradeoffs.

The 4 rules of simple design, first enumerated by Kent Beck are presented in simplified form as:

  1. Test Passes
  2. Express Intent
  3. No Duplication (DRY)
  4. Small

Each of this rules is expanded on in detail with plenty of examples. One of my favorite quotes:

In the end, most design guidelines are best internalized and applied subconsciously.

This books converges on many of the better patterns that I like about the Ruby community: Outside-in test-driven development, writing small intention revealing methods, consciously think about what each object’s public API and avoiding over-designing for a future that may not materialize. I enjoyed reading it very much.

Links:

Testing a Puts Method

When I code long-running tasks, I often want to see some sort of progress report in my terminal to let me know that my code is still running. Let’s take a simple example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
class ThumbnailCreator
  def process
    images.each_with_index do |image, index|
      # ...
      puts "Processed #{index + 1} images" if index % 10 == 0
    end
  end

  private

  def images
    # ... somehow find eligible images for processing
  end
end

The above code will print a new line to the console every 10th image processed. While this approach works, it is also hard to test and causes undesired output when running my tests. Can we do better? Where does the puts method comes from:

1
2
3
4
5
6
7
8
9
10
11
pry(main)> show-doc ThumbnailCreator#puts

From: io.c (C Method):
Owner: Kernel
Visibility: private
Signature: puts(*arg1)
Number of lines: 3

Equivalent to

    $stdout.puts(obj, ...)

pry makes it easy to trace the source of that method the Kernel module. Furthermore, it lets us know that Kernel#puts is equivalent to calling $stdout.puts. $stdout is a global ruby constant, which holds the current standard output. We can make that explicit in our code:

1
2
3
4
5
6
7
8
class ThumbnailCreator
  def process
    images.each_with_index do |image, index|
      # ...
      $stdout.puts "Processed #{index} images" if index % 10 == 0
    end
  end
end

Adding an explicit receiver for the puts makes the code a bit longer and more verbose – usually things that rubyists shun. It also makes it clear that our class is collaborating with $stdout, a different object. Once we realize that, it follows that we can also make this collaboration configurable through dependency injection.

1
2
3
4
5
6
7
8
9
10
11
12
class ThumbnailCreator
  def initialize(out = $stdout)
    @out = out
  end

  def process
    images.each_with_index do |image, index|
      # ...
      @out.puts "Processed #{index} images" if index % 10 == 0
    end
  end
end

All existing code that use our class continue to work as before: The default value for out will ensure that by default, we continue printing to $stdout. However, in our tests, we can now inject a different collaborator. What can we use?

So far, we’ve used only one method on out. Ruby will happily let us inject any object that we want, as long as it implements puts in a compatible manner (in terms of arity). However, there is a risk that our tests can become too coupled to our implementation by only passing an object that implements the narrowest of interfaces. Ruby’s stdlib includes a class that we can use: StringIO

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
$ ri StringIO

= StringIO < Data

------------------------------------------------------------------------------
= Includes:
(from ruby core)
  Enumerable
  IO::generic_readable
  IO::generic_writable

(from ruby core)
------------------------------------------------------------------------------
Pseudo I/O on String object.

Commonly used to simulate `$stdio` or `$stderr`

=== Examples

  require 'stringio'

  io = StringIO.new
  io.puts "Hello World"
  io.string #=> "Hello World\n"
------------------------------------------------------------------------------

Our tests can now use and verify the collaborator:

1
2
3
4
5
6
7
8
9
10
11
12
require "rspec"

describe ThumbnailCreator do
  subject { described_class.new(out) }
  let(:out) { StringIO.new }

  it "shows progress while processing images" do
    subject.process

    expect(out.string).to match(/Processed/)
  end
end

Conclusion

Often classes collaborate implicitly with other objects. Making the collaboration explicit allows us to use dependency injection as a way to configure behavior, resulting in a more modular design. Our initial motivation to test our code resulted in a better design, at little cost.