Ylan Segal

Unicorn vs. Puma: Round 3

MRI Ruby has gotten a lot faster since I ran my last benchmark, so it’s time for an update.

Methodology

The benchmark consists of hitting a single endpoint on a Rails (4.1.1) app in production mode for 30 seconds. The endpoint reads a set amount of posts from a Postgres database and then renders them in an HTML view using ERB, without any view caching.
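Stripped of Rails, the uncached ERB rendering step amounts to something like the following sketch (the posts and template here are hypothetical stand-ins; the actual view is not shown in this post):

```ruby
require "erb"

# Hypothetical stand-ins for the posts read from Postgres.
posts = [{ title: "First post" }, { title: "Second post" }]

# Render an HTML list with ERB. With no caching, every request
# re-evaluates the template, just like the benchmarked endpoint.
template = ERB.new("<ul><% posts.each do |post| %><li><%= post[:title] %></li><% end %></ul>")
html = template.result(binding)
puts html
```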

The purpose of the benchmark is to get an idea of the performance characteristics under varying load for MRI and jRuby. Unicorn was chosen for MRI because it uses the Unix fork model for its processes, which is pretty much the de facto way to do concurrency in MRI. Puma was chosen for jRuby because it bills itself as a very efficient threading server (although recent versions can mix forking workers and threading). Threading is the de facto way to do concurrency on the JVM.

Of course, there are many parameters that can be tweaked in the server configuration. No benchmark is perfect, but I believe it’s a good indication of what type of performance differences can be seen in the two versions of Ruby.

Here are the details:

Unicorn

  • Unicorn 4.8.3
  • Ruby 2.1.2
  • Configuration: 4 Workers, Timeout 30 seconds
  • Maximum observed memory usage: 450 MB

Puma

  • Puma 2.8.2
  • jRuby 1.7.12
  • Configuration: default (maximum 16 threads)
  • Maximum observed memory usage: 482 MB

The number of Unicorn workers was chosen to match the amount of memory used by Puma. In both cases, the observed memory stays below a 1X dyno from Heroku (but not by much).
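For reference, the Unicorn settings above correspond to a config file along these lines (a sketch, not the exact file used for the benchmark):

```ruby
# config/unicorn.rb (illustrative)
worker_processes 4   # chosen to match Puma's memory footprint
timeout 30           # kill workers that take longer than 30 seconds
```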

The benchmark was run with Apache’s Benchmarking Tool, at varying levels of concurrency:

$ ab -c $USER_COUNT -t 30 $URL

Results

Both servers perform similarly in the number of requests they can handle per second. Unicorn seems to ramp up on par with its number of workers and then plateau. Even though more users are hitting the endpoint concurrently, Unicorn just handles 4 at a time. Puma seems to increase in capacity with more users, although there is a sharp drop-off[1] at the end, when reaching 64 concurrent users.

With regard to average (or 50th percentile) response time, it looks like both servers, surprisingly, perform exactly the same! The response times are significantly slower when the server is under heavier load, but are still acceptable.

The 95th and 99th percentile graphs paint a different story, though: Unicorn’s response times grow more pronounced as concurrency increases, which means that for some users, it might easily fall into unacceptable levels.

How significant is this? For example, let’s take the 32 concurrent users case: Puma’s 50th percentile response is 62 ms against Unicorn’s 64 ms. Not very different. However, when we look at the 95th percentile, Puma comes in at 147 ms, which is 2.3 times its average. Unicorn comes in at 175 ms, 2.7 times its average. Looking at the 99th percentile, Puma’s response is 2.69 times its average response; Unicorn’s is a more dramatic 4.15 times. You should care about percentiles, not just the average response time.
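The multiples above are just the percentile response time divided by the median (truncated to one decimal place in the text); a quick sanity check in Ruby, using the 32-user numbers from above:

```ruby
# Ratio of a given percentile response time to the median (50th percentile).
def slowdown(percentile_ms, median_ms)
  (percentile_ms.to_f / median_ms).round(2)
end

puts slowdown(147, 62)  # Puma, 95th percentile
puts slowdown(175, 64)  # Unicorn, 95th percentile
```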

Conclusion

Since the benchmark was last run, MRI Ruby has gotten much faster (the last version benchmarked was 1.9.3); however, running a Rails app in jRuby still offers better performance characteristics under high load.


  1. I do not know what that drop-off means, and it didn’t seem to be there last year. However, I re-ran the benchmark many times, and got consistent results.

The Illusion of Security

I recently refinanced my car loan with a local credit union. The refinance process is pretty easy and mostly handled over the phone, until it’s time to sign the paperwork, for which they requested an email address. A few minutes later I got an email from the credit union notifying me that I had a secure email waiting at the other side of a link. Upon clicking, I was taken to a Barracuda Networks site, which required an email and password for access. As I had not established a password in the past, I just needed to type a new one and confirm it in another box. Easy.

Killing Me Softly

Every once in a while, a process gets stuck and doesn’t want to respond. I usually just find the process id by using ps and then run kill -9 <pid>. Why? Cargo-culting, mostly.

Recently, a friend and co-worker shared with me a little bash function that attempts to send less destructive signals to the process first, to allow it time to clean up after itself. Eventually, it ends up sending the KILL signal, equivalent to -9.

function mercy_kill() {
  pid=$1
  # Try the gentler signals first, escalating to KILL as a last resort.
  for signal in TERM INT HUP KILL; do
    cmd="kill -s ${signal} $pid"
    echo $cmd
    eval $cmd
    # Give the process up to 2 seconds (20 x 0.1s) to exit before escalating.
    for i in {0..19}; do
      if [ $(ps -p $pid | wc -l) -lt 2 ]; then
        echo "pid $pid no longer exists"
        return 0
      fi
      sleep 0.1
    done
  done
}

Use Multiple Ruby Engines in the Same Project

One of the biggest pains of using jRuby is its slow startup time.

Even for a trivial Rails application, the startup is really painful:

$ rvm current
jruby-1.7.11
$ time rails runner "puts 'Hello'"
Hello
rails runner "puts 'Hello'"  24.82s user 0.83s system 223% cpu 11.457 total

Compare to the same project running on MRI:

$ rvm current
ruby-2.1.1
$ time rails runner "puts 'Hello'"
Hello
rails runner "puts 'Hello'"  1.14s user 0.19s system 98% cpu 1.355 total

MRI is more than 20 times faster in CPU time (and about 8 times faster in wall-clock time)!
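One way to get the best of both (a sketch; the version names come from this post, and your rvm setup may differ): keep jRuby as the project default, but run quick one-off commands under MRI.

```shell
# Project default stays on jRuby (picked up by rvm on cd):
echo "jruby-1.7.11" > .ruby-version

# Run a one-off task under MRI without switching the default:
rvm ruby-2.1.1 do bundle exec rails runner "puts 'Hello'"
```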

The REPL: Issue 1

Today, I am starting a new feature for this blog. I am calling it The REPL. It’s pretty much a link page of interesting reading I have done around the web in the last week (or, more likely, since the last issue). Of course, this is not a new idea, but I still think there might be some value to it. I will try to avoid this becoming an echo chamber and instead focus on material that has got me thinking about software engineering.

The Circuit Breaker Pattern

Martin Fowler explains the circuit breaker pattern. Coincidentally, at work we have been discussing using something like this to build fault tolerance into our interactions with other services. Netflix has a library (in Java) for this sort of thing and has blogged about its use. Embracing that failure will happen, and properly preparing for it, turns how you design your code on its head.
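The core of the pattern fits in a few lines of Ruby. This is a minimal sketch with illustrative names and thresholds, not what Fowler or Netflix ship; a production library handles much more (timeouts, metrics, thread safety):

```ruby
# A minimal circuit breaker: count consecutive failures, and once a
# threshold is reached, fail fast until a cool-off period has elapsed.
class CircuitBreaker
  class OpenError < StandardError; end

  def initialize(threshold: 3, cool_off: 30)
    @threshold = threshold  # consecutive failures before tripping
    @cool_off = cool_off    # seconds before allowing a trial call
    @failures = 0
    @opened_at = nil
  end

  # Wrap a call to a remote service; fail fast while the circuit is open.
  def call
    raise OpenError, "circuit is open" if open?
    result = yield
    @failures = 0           # a success closes the circuit again
    result
  rescue OpenError
    raise
  rescue StandardError
    @failures += 1
    @opened_at = Time.now if @failures >= @threshold
    raise
  end

  private

  def open?
    return false unless @opened_at
    if Time.now - @opened_at > @cool_off
      @opened_at = nil      # cool-off elapsed: allow a trial call
      @failures = 0
      false
    else
      true
    end
  end
end
```

While the circuit is open, callers get an immediate `OpenError` instead of waiting on a service that is already struggling, which is the whole point of the pattern.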

Using Interactors To Clean Up Rails

The fellows at Grouper explain how they are using the interactor gem to extract business logic from controllers and models. Again, this is a pattern that we adopted at work not too long ago. DHH gave it some flak in the Hacker News comments, but it has given our team a convention for where and how to code business logic.
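For the curious, here is the shape of an interactor sketched in plain Ruby (the gem itself adds a shared context, `fail!`, and organizers on top of this; the class and its names here are hypothetical):

```ruby
# One class per business action, with a single public entry point that
# returns a result object instead of raising on expected failures.
class PublishPost
  Result = Struct.new(:success, :message)

  def self.call(title:)
    new(title: title).call
  end

  def initialize(title:)
    @title = title
  end

  def call
    return Result.new(false, "title can't be blank") if @title.strip.empty?
    # ...persist the post, notify subscribers, etc.
    Result.new(true, "published #{@title.inspect}")
  end
end
```

Controllers then shrink to calling the interactor and branching on the result, which is where the convention pays off.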

Store Data Not Types

A cautionary tale on why it’s important to set clear boundaries between your system and the libraries and frameworks that you use.