Ylan Segal

The REPL: Issue 10 - May 2015

Lessons Learned In Software Development

Henrik Warne compiles a list of rules-of-thumb he has learned throughout his career. Great nuggets of information. If you find some of these obvious, it’s probably because you have already internalized them.

Do Not Disclose Your Salary To Recruiters

Salary negotiations are something that many software engineers (and people in general) don’t give much thought to. However, their effect on your career is huge. Learning to negotiate and dealing with recruiters are useful skills to have.

Why Learning Rails Is Hard

Brook Riggio presents a great mind-map of the skills he believes a Rails developer needs to be competent in. As he mentions, if anything, after reading it I was ready to add many items to the list. Web systems can get complicated really quickly.

Using a Ruby Class To Write Functional Code

With a clear style and building sequentially, Pat Shaughnessy explains how to leverage some functional programming concepts in an object-oriented language. I’ve had a lot of success implementing code in this manner. Makes it easy to read, easy to change.

Experiment: Use Rbenv Instead of Rvm

I have been using rvm to manage my rubies for almost 5 years, mostly without problems. Throughout the years though, the number of features keeps growing in an attempt to do more for the user. Two weeks ago I was dealing with a cryptic stack trace related to X509 certificates when doing some cryptographic operations in JRuby 1.7.19. I wasn’t really sure what the culprit was, but the rvm documentation suggests that rvm itself can fix the issue. That seemed weird to me and also, it didn’t work. I was stuck with a JRuby installation that could not read the certificate from *https://www.google.com*.

Methodology

Under the assumption that the culprit of my problem was rvm, I decided to try one of the alternatives: rbenv. Luckily this is a blog, because when spoken, the names of the two tools sound infuriatingly similar. Switching to rbenv was relatively easy. The steps I followed are those outlined in brentertz’s gist.

Installing rubies was straightforward. I usually need a few versions of MRI on hand, going back to 1.9.3, and JRuby as well. All were installed without problems and worked fine.

My team had some scripts that assumed rvm was installed, but it was trivial to add support for rbenv, like so:

# Switch to the requested ruby with whichever version manager is installed
if which rvm &> /dev/null; then
  rvm --create use "${version}"
fi
if which rbenv &> /dev/null; then
  rbenv shell "${version}"
fi

In addition, I like having the current ruby version in my prompt, because I switch between versions often, even while working on the same project. My custom zsh theme needed to be adjusted as well. Using the same trick as above, I created a bash function that does the right thing:

# Somewhere that gets sourced on shell init, like .profile
ruby_version()
{
  if which rbenv &> /dev/null; then
    rbenv version | cut -f1 -d ' '
  elif which rvm-prompt &> /dev/null; then
    rvm-prompt i v g
  fi
}

Results

Most of our projects have been around for a while, so they are set up to use gemsets, because that was what rvm encouraged (and maybe still does, I don’t know). rbenv’s philosophy, on the other hand, is that they are unnecessary when using bundler. So far, not using gemsets has not had negative effects for me. I have also noticed that my shell feels snappier when navigating directories: I attribute that to rvm hooking into cd, which is not done by rbenv.

So far, I have been happy with rbenv and believe that it is a simpler tool that does enough for the job at hand, but no more. And remember that X509 issue? It turns out it was not really related to rvm at all: it was caused by duplicate certificates derived from the OS X keychain that were being picked up by JRuby and that the underlying Java classes objected to. That issue got solved by getting certs from the curl website and pointing JRuby to use those.

Book Review: Architecting the Cloud

Architecting the Cloud: Design Decisions for Cloud Computing Service Models, by Michael J. Kavis, describes cloud computing in general and the different service models that are prevalent today in particular. It explores the differences and trade-offs between Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). I consider the book a good introduction to the considerations for cloud computing for those who are used to more traditional data-center deployments.

The author includes a section on worst practices: things that do not translate well when moving to the cloud, with recommendations on how to avoid them. I found the most useful chapter to be the one on disaster recovery: a good overview of different strategies for becoming fault-tolerant in the cloud and embracing resiliency.

The REPL: Issue 9 - April 2015

Does Organization Matter?

Uncle Bob makes a useful analogy between code organization and the physical organization of, say, your desk or a library. Organization matters. Sometimes all we need is a small amount of organization; sometimes we need the Dewey Decimal System.

Why (and How) I Wrote My Academic Book in Plain Text

Most developers appreciate the benefits of plain text files since they play so well with other tools, like source control, grep, find, etc. W. Caleb McDaniel makes a great case for using plain text for more than programming code. In his case, he composes his academic writing in plain text and uses open source tools at the end to convert it to industry-standard proprietary formats. Awesome.

The Quality Wheel

A big part of effective communication is sharing the same terminology. It helps with context and allows us to be more specific. Jessitron proposes expanding our vocabulary around what “Quality Software” means. Instead of saying a piece of code is “good” or “clean”, how about saying it’s “configurable” and “readable”?

Adding an Index to Mongo Can Change Query Results

While trying to optimize some slow queries in a MongoDB database, I ran into a concerning surprise: adding an index can alter the results returned by a query against the same dataset.

Demonstration

Suppose we have a collection that looks like this (all samples are from a mongo shell):

> db.example.find()
{
  "_id" : ObjectId("5542ef97b08a749f8e8e4f0d"),
  "title" : "Pink Floyd",
  "rating" : 1
}
{
  "_id" : ObjectId("5542efa2b08a749f8e8e4f0e"),
  "title" : "Led Zeppelin",
  "rating" : 2
}
{
  "_id" : ObjectId("5542efb3b08a749f8e8e4f0f"),
  "title" : "Aerosmith",
  "rating" : null
}
{
  "_id" : ObjectId("5542efbab08a749f8e8e4f10"),
  "title" : "Metallica"
}

Note that some documents have a numeric rating, one has a null value and one does not have the field.

Suppose we query for all documents with a rating of 1 or null:

> db.example.find({rating: { $in: [1, null]}})
{
  "_id" : ObjectId("5542ef97b08a749f8e8e4f0d"),
  "title" : "Pink Floyd",
  "rating" : 1
}
{
  "_id" : ObjectId("5542efb3b08a749f8e8e4f0f"),
  "title" : "Aerosmith",
  "rating" : null
}
{
  "_id" : ObjectId("5542efbab08a749f8e8e4f10"),
  "title" : "Metallica"
}

The Metallica document is returned, even though it does not have a rating field.
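The matching rule at work in that query can be modelled in a few lines of plain JavaScript. This is only a sketch of the semantics observed above, not MongoDB's implementation; `matchesIn` and the `docs` array are hypothetical names used for illustration.

```javascript
// Hypothetical model of how {field: {$in: [...]}} treats null.
// Assumption: a null in the $in list matches both an explicit null
// and a missing field, as the query output above shows.
function matchesIn(doc, field, values) {
  const present = Object.prototype.hasOwnProperty.call(doc, field);
  const value = present ? doc[field] : null; // a missing field is treated like null
  return values.includes(value);
}

// The same four documents as in the example collection, minus _id.
const docs = [
  { title: "Pink Floyd", rating: 1 },
  { title: "Led Zeppelin", rating: 2 },
  { title: "Aerosmith", rating: null },
  { title: "Metallica" },
];

const titles = docs
  .filter((d) => matchesIn(d, "rating", [1, null]))
  .map((d) => d.title);

console.log(titles); // [ 'Pink Floyd', 'Aerosmith', 'Metallica' ]
```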

Suppose that we want to optimize this collection, so we add a sparse index on the rating field and re-run our query:

> db.example.ensureIndex({rating: 1}, {sparse: true})
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 2,
  "ok" : 1
}
> db.example.find({rating: { $in: [1, null]}})
{
  "_id" : ObjectId("5542efb3b08a749f8e8e4f0f"),
  "title" : "Aerosmith",
  "rating" : null
}
{
  "_id" : ObjectId("5542ef97b08a749f8e8e4f0d"),
  "title" : "Pink Floyd",
  "rating" : 1
}

The Metallica document is gone. Surprised? I definitely was.

Thoughts

The behavior may seem a bit contrived, but I actually encountered it while trying to optimize a production database. This example just boils it down to something trivial to reproduce. I should mention that if the index is created without the sparse option, the results are correct. The sparse option saves space in the index itself by only creating an entry for documents that have the field. A non-sparse index creates an entry for every document and uses null as the value when the field is missing.
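The difference between the two kinds of index can be illustrated with a small in-memory model. This is a rough sketch in plain JavaScript under the description given here, not how MongoDB stores or scans indexes; `buildIndex`, `findViaIndex`, and `bands` are hypothetical names, and result ordering is not modelled.

```javascript
// Hypothetical in-memory model of a single-field index; not MongoDB code.
// A sparse index only gets an entry for documents that have the field.
// A non-sparse index gets an entry for every document, with null
// standing in for a missing field.
function buildIndex(collection, field, { sparse }) {
  const entries = [];
  for (const doc of collection) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (sparse && !hasField) continue; // no entry at all for this document
    entries.push({ key: hasField ? doc[field] : null, doc });
  }
  return entries;
}

// Answer an $in-style query by looking only at the index entries.
function findViaIndex(index, values) {
  return index.filter((e) => values.includes(e.key)).map((e) => e.doc.title);
}

const bands = [
  { title: "Pink Floyd", rating: 1 },
  { title: "Led Zeppelin", rating: 2 },
  { title: "Aerosmith", rating: null },
  { title: "Metallica" },
];

const sparseHits = findViaIndex(buildIndex(bands, "rating", { sparse: true }), [1, null]);
const fullHits = findViaIndex(buildIndex(bands, "rating", { sparse: false }), [1, null]);

console.log(sparseHits); // [ 'Pink Floyd', 'Aerosmith' ]
console.log(fullHits);   // [ 'Pink Floyd', 'Aerosmith', 'Metallica' ]
```

In this model the Metallica document never makes it into the sparse index, so a query answered from that index alone cannot return it.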

In my opinion, the above-described behavior is awful. It is up to the database engine to decide which index to use. A sparse index may be useful in fewer queries than a non-sparse index. However, my expectation of indexes is that they are all about performance: trading off disk space and insert time for query time. The existence of an index should never change the result set for the same query and dataset.
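That expectation can be written down as a simple check to run on results captured before and after an index is added. The helper below is a minimal sketch in plain JavaScript; `sameResultSet` is a hypothetical name, and the two arrays are the results from the demonstration reduced to their `_id` values as strings.

```javascript
// Hypothetical helper: do two result sets contain the same documents,
// ignoring order? Documents are compared by their _id rendered as a string.
function sameResultSet(before, after) {
  const ids = (list) => list.map((d) => String(d._id)).sort();
  const a = ids(before);
  const b = ids(after);
  return a.length === b.length && a.every((id, i) => id === b[i]);
}

// Results of the same query without the index and with the sparse index.
const withoutIndex = [
  { _id: "5542ef97b08a749f8e8e4f0d" },
  { _id: "5542efb3b08a749f8e8e4f0f" },
  { _id: "5542efbab08a749f8e8e4f10" },
];
const withSparseIndex = [
  { _id: "5542efb3b08a749f8e8e4f0f" },
  { _id: "5542ef97b08a749f8e8e4f0d" },
];

console.log(sameResultSet(withoutIndex, withSparseIndex)); // false
```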