Ylan Segal

Ruby Stdlib: Base64

Base64 is a widely used mechanism to represent binary data in an ASCII string format. There are a few different Base64 schemes that share most of the implementation. The encoding strategy consists of choosing 64 characters that are common to most other string encodings and are also printable. For example, MIME’s Base64 implementation uses A-Z, a-z, and 0-9 for the first 62 characters. Other variations share this property but differ in the characters chosen for the last two values and in the extra character used for padding. Each Base64 digit represents exactly 6 bits of data, so every 3 bytes of input become 4 encoded characters.
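To make the 6-bits-per-digit idea concrete, here is a minimal sketch that encodes exactly three bytes by hand. The `ALPHABET` constant and the `encode_three_bytes` helper are hypothetical names for illustration only; they are not part of Ruby's library, and the sketch ignores padding entirely.

```ruby
require "base64"

# The standard alphabet: index 0..63 maps to one printable character.
ALPHABET = [*("A".."Z"), *("a".."z"), *("0".."9"), "+", "/"].join

# Hypothetical helper: encodes exactly 3 bytes (24 bits) as four 6-bit digits.
def encode_three_bytes(bytes)
  raise ArgumentError, "expected exactly 3 bytes" unless bytes.bytesize == 3

  b0, b1, b2 = bytes.bytes
  bits = (b0 << 16) | (b1 << 8) | b2 # pack the 3 bytes into one 24-bit number

  # Take 6 bits at a time, most significant group first, and look each one up.
  [18, 12, 6, 0].map { |shift| ALPHABET[(bits >> shift) & 0b111111] }.join
end

encode_three_bytes("Man")      # => "TWFu"
Base64.strict_encode64("Man")  # => "TWFu"
```

Inputs whose length is not a multiple of 3 are where the padding character comes in, which this sketch deliberately leaves out.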

Wikipedia’s Base64 article has a great explanation of the details of encoding and decoding from Base64.

Base64 is typically used to send binary data across text-based channels such as email, JSON Web Tokens, SAML requests and responses, and many more.

Ruby includes the base64 package in its standard library, with support for RFC-2045, RFC-4648 and “RFC-4648 Base 64 Encoding with URL and Filename Safe Alphabet”.

Its usage is straightforward:

require "base64"

encoded = Base64.encode64("I'd rather be a hammer than a nail")
# => "SSdkIHJhdGhlciBiZSBhIGhhbW1lciB0aGFuIGEgbmFpbA==\n"

Base64.decode64(encoded)
# => "I'd rather be a hammer than a nail"

Base64 is a module. It can be called directly, as in the previous example, or included in other classes:

require "base64"

class MyEncoder
  include Base64

  def initialize(binary)
    @binary = binary
  end

  def encode
    urlsafe_encode64(@binary)
  end
end

MyEncoder.new("I'd rather be a hammer than a nail").encode
# => "SSdkIHJhdGhlciBiZSBhIGhhbW1lciB0aGFuIGEgbmFpbA=="

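Notice that urlsafe_encode64 returns slightly different results than encode64: it uses the URL- and filename-safe alphabet (- and _ instead of + and /) and does not append a trailing newline. See the Ruby documentation for details.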
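The differences between the encoders are easier to see with input chosen to hit the last two characters of the alphabet. A small sketch (the `binary` value is made up for illustration):

```ruby
require "base64"

# Three bytes whose 6-bit groups are 62, 63, 63, 62: the characters that
# differ between the standard and URL-safe alphabets.
binary = "\xFB\xFF\xFE".b

Base64.encode64(binary)         # => "+//+\n"
Base64.strict_encode64(binary)  # => "+//+"
Base64.urlsafe_encode64(binary) # => "-__-"

# encode64 also inserts a newline after every 60 encoded characters.
Base64.encode64("a" * 46).count("\n")        # => 2
Base64.strict_encode64("a" * 46).count("\n") # => 0

Base64.urlsafe_decode64("-__-") == binary    # => true
```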

You can create your own modules with functions that can be included or called directly, like Base64 does. Use module_function:

module Greeter
  def hello
    "Hello, World!"
  end

  module_function :hello
end

class Person
  include Greeter

  def greet
    hello
  end
end


Greeter.hello # => "Hello, World!"
Person.new.greet # => "Hello, World!"
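One detail worth knowing: module_function makes the method available on the module itself, and the instance-level copy that gets mixed in becomes private. The sketch below repeats the Greeter example to show the consequence:

```ruby
module Greeter
  def hello
    "Hello, World!"
  end

  module_function :hello
end

class Person
  include Greeter

  def greet
    hello # fine: private methods can be called with an implicit receiver
  end
end

Greeter.hello                                    # => "Hello, World!"
Person.new.greet                                 # => "Hello, World!"
Person.new.respond_to?(:hello)                   # => false
Person.private_instance_methods.include?(:hello) # => true

begin
  Person.new.hello
rescue NoMethodError => e
  e.class # => NoMethodError, because hello is private on instances
end
```

In other words, including the module gives a class access to the helpers internally without adding them to its public API.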

Under the hood, Base64 relies on Array#pack (documentation) and String#unpack1 (documentation) which do the heavy lifting. Both of these methods are implemented in C:

pry(main)> show-source Array#pack

From: pack.c (C Method):
Owner: Array
Visibility: public
Number of lines: 621

static VALUE
pack_pack(int argc, VALUE *argv, VALUE ary)
{
  /* ... many lines removed ... */
}

pry(main)> show-source String#unpack1

From: pack.c (C Method):
Owner: String
Visibility: public
Number of lines: 5

static VALUE
pack_unpack1(VALUE str, VALUE fmt)
{
    return pack_unpack_internal(str, fmt, UNPACK_1);
}
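The Base64 encoding itself is exposed through the "m" directive of those two methods. As a rough sketch, assuming the stdlib module is a thin wrapper around that directive (which is how it is implemented in the Ruby versions I have looked at), the following comparisons hold:

```ruby
require "base64"

text = "I'd rather be a hammer than a nail"

# "m" is the Base64 directive; "m0" disables the line feeds.
[text].pack("m")  == Base64.encode64(text)        # => true
[text].pack("m0") == Base64.strict_encode64(text) # => true

encoded = [text].pack("m0")
encoded.unpack1("m") == text                      # => true
```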

The REPL: Issue 37 - August 2017

The fallacies of web application performance

Is performance only a production concern? Are threads enough for multi-core concurrency? Are there cost-free solutions to solve performance? José Valim answers these and some other questions in this post. José is the creator of Elixir’s Phoenix framework and was part of the Rails core team. I’ve found most of his writing to be worth my time. This is no exception.

Developing with Kafka and Rails Applications

Sam Goldman explains how Blue Apron uses Ruby on Rails to work with Apache Kafka. Part of the article touches on which gems they use to process Kafka streams. The other portion describes how to set up a local development environment. Docker is leveraged effectively to make a complicated setup easy to spin up locally. The final product has 4 different services: ZooKeeper, a Kafka broker, a schema registry, and a REST proxy for Kafka.

An Intro to Compilers

Nicole Orchard writes an introductory post on how compilers work. Specifically those leveraging the LLVM toolchain – used by Swift, most Mac gcc compilers, Crystal and many more. It takes a simple “Hello, Compiler!” program through the 3 phases: Front-end, Optimizer and Back-end. Short and sweet.

Book Review: Understanding the Four Rules of Simple Design

Understanding the Four Rules of Simple Design by Corey Haines is a book about how to approach software design, written from the perspective of the author’s years of involvement in Code Retreats. A Code Retreat is a day-long practice session for software developers where they can explore different ways of building software by practicing deliberately without the pressure of having to deliver production code. I’ve previously written about my experience in a code retreat.

The book uses the same base example that code retreats do: Conway’s Game of Life. This example is specifically chosen because the rules are simple enough to understand quickly, yet it is possible to write an implementation in many different ways with interesting tradeoffs.

The 4 rules of simple design, first enumerated by Kent Beck, are presented in simplified form as:

  1. Tests Pass
  2. Express Intent
  3. No Duplication (DRY)
  4. Small

Each of these rules is expanded on in detail with plenty of examples. One of my favorite quotes:

In the end, most design guidelines are best internalized and applied subconsciously.

This book converges on many of the better patterns that I like about the Ruby community: outside-in test-driven development, writing small intention-revealing methods, consciously thinking about each object’s public API, and avoiding over-designing for a future that may not materialize. I enjoyed reading it very much.

Testing a Puts Method

When I code long-running tasks, I often want to see some sort of progress report in my terminal to let me know that my code is still running. Let’s take a simple example:

class ThumbnailCreator
  def process
    images.each_with_index do |image, index|
      # ...
      puts "Processed #{index + 1} images" if index % 10 == 0
    end
  end

  private

  def images
    # ... somehow find eligible images for processing
  end
end

The above code will print a new line to the console every 10th image processed. While this approach works, it is also hard to test and causes undesired output when running my tests. Can we do better? Where does the puts method come from?

pry(main)> show-doc ThumbnailCreator#puts

From: io.c (C Method):
Owner: Kernel
Visibility: private
Signature: puts(*arg1)
Number of lines: 3

Equivalent to

    $stdout.puts(obj, ...)

pry makes it easy to trace the source of that method to the Kernel module. Furthermore, it lets us know that Kernel#puts is equivalent to calling $stdout.puts. $stdout is a global Ruby variable that holds the current standard output. We can make that explicit in our code:

class ThumbnailCreator
  def process
    images.each_with_index do |image, index|
      # ...
      $stdout.puts "Processed #{index + 1} images" if index % 10 == 0
    end
  end
end
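Because Kernel#puts simply delegates to $stdout, the equivalence can be observed by temporarily reassigning the global. A minimal sketch, where `capture_stdout` is a hypothetical helper and not something Ruby provides:

```ruby
require "stringio"

# Hypothetical helper: swaps $stdout for a StringIO while the block runs,
# returns whatever was written, and always restores the original stream.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original
end

captured = capture_stdout { puts "Processed 10 images" }
captured # => "Processed 10 images\n"
```

Swapping a global works, but it affects every caller in the process, which is one more reason to prefer an explicit receiver.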

Adding an explicit receiver for puts makes the code a bit longer and more verbose, usually things that Rubyists shun. It also makes it clear that our class is collaborating with $stdout, a different object. Once we realize that, it follows that we can also make this collaboration configurable through dependency injection.

class ThumbnailCreator
  def initialize(out = $stdout)
    @out = out
  end

  def process
    images.each_with_index do |image, index|
      # ...
      @out.puts "Processed #{index + 1} images" if index % 10 == 0
    end
  end
end

All existing code that uses our class continues to work as before: the default value for out ensures that we keep printing to $stdout. However, in our tests, we can now inject a different collaborator. What can we use?

So far, we’ve used only one method on out. Ruby will happily let us inject any object that we want, as long as it implements puts in a compatible manner (in terms of arity). However, there is a risk that our tests can become too coupled to our implementation by only passing an object that implements the narrowest of interfaces. Ruby’s stdlib includes a class that we can use: StringIO

$ ri StringIO

= StringIO < Data

------------------------------------------------------------------------------
= Includes:
(from ruby core)
  Enumerable
  IO::generic_readable
  IO::generic_writable

(from ruby core)
------------------------------------------------------------------------------
Pseudo I/O on String object.

Commonly used to simulate `$stdio` or `$stderr`

=== Examples

  require 'stringio'

  io = StringIO.new
  io.puts "Hello World"
  io.string #=> "Hello World\n"
------------------------------------------------------------------------------

Our tests can now use and verify the collaborator:

require "rspec"

describe ThumbnailCreator do
  subject { described_class.new(out) }
  let(:out) { StringIO.new }

  it "shows progress while processing images" do
    subject.process

    expect(out.string).to match(/Processed/)
  end
end
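To see the injected collaborator at work outside of RSpec, here is a minimal self-contained sketch. Passing `images` into the constructor and the made-up file names are assumptions for illustration only; the class in this post finds its images through a private method.

```ruby
require "stringio"

class ThumbnailCreator
  # `images` is injected only to keep this sketch runnable on its own.
  def initialize(images, out = $stdout)
    @images = images
    @out = out
  end

  def process
    @images.each_with_index do |_image, index|
      # ... the real thumbnail work would happen here
      @out.puts "Processed #{index + 1} images" if index % 10 == 0
    end
  end
end

out = StringIO.new
images = Array.new(25) { |i| "image-#{i}.png" } # hypothetical input
ThumbnailCreator.new(images, out).process

out.string
# => "Processed 1 images\nProcessed 11 images\nProcessed 21 images\n"
```

Nothing is printed to the terminal, and the progress messages are available as a plain string to assert against.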

Conclusion

Often classes collaborate implicitly with other objects. Making the collaboration explicit allows us to use dependency injection as a way to configure behavior, resulting in a more modular design. Our initial motivation to test our code resulted in a better design, at little cost.

The REPL: Issue 36 - July 2017

Is Ruby Too Slow For Web-Scale?

Nate Berkopec writes a long post about Ruby performance and how it affects web applications. Notwithstanding the click-bait title, Nate brings up that raw performance might not be as significant as many teams would like to think. Many of us work on applications that receive only a modest amount of traffic. In these organizations, the trade-off between engineering productivity and server costs tilts towards productivity.

Five ways to paginate in Postgres, from the basic to the exotic

Most web applications encounter a need to paginate results into multiple page loads. Joe Nelson works his way from the simplest implementations (LIMIT and OFFSET) to the more complex. He discusses the benefits and drawbacks of each. The techniques described cover most of the typical web application needs. The more demanding requirements, like stable page loads that return the same results even if elements are added or deleted from the collection, require more exotic solutions. They are usually expensive to compute.

An engineer’s guide to cloud capacity planning

Patrick McKenzie writes a great guide on how to plan server capacity in the cloud. He covers decoupling the application from knowledge of its deployment environment, advises automating provisioning and deployment, and covers how to estimate capacity and what to focus on as traffic grows. This is another great article by Increment.