• Gotcha using Oj to generate JSON

    Oj is a Ruby gem that bills itself as a faster way to generate JSON, mainly through the use of a C extension. I recently found it was generating unexpected results.

    I was looking into a report that one of our endpoints was generating unusually large JSON payloads. In particular, timestamps were being serialized to a very verbose (and not very useful) format:

    {
      "created_at": {
        "^o": "ActiveSupport::TimeWithZone",
        "utc": {
          "^t": 1639339673.031328000
        },
        "time": null,
        "time_zone": {
          "^o": "ActiveSupport::TimeZone",
          "name": "UTC",
          "utc_offset": null,
          "tzinfo": {
            "^o": "TZInfo::DataTimezone",
            "info": {
              "^o": "TZInfo::ZoneinfoTimezoneInfo",
              "identifier": "Etc/UTC",
              "offsets": {
                "^#1": [0, {
                  "^o": "TZInfo::TimezoneOffset",
                  "utc_offset": 0,
                  "std_offset": 0,
                  "abbreviation": ":UTC",
                  "utc_total_offset": 0
                }]
              },
              "transitions": [],
              "previous_offset": {
                "^o": "TZInfo::TimezoneOffset",
                "utc_offset": 0,
                "std_offset": 0,
                "abbreviation": ":UTC",
                "utc_total_offset": 0
              },
              "transitions_index": null
            }
          }
        },
        "period": null
      }
    }
    

    I quickly saw that the controller was invoking Oj directly, and that is the root of the problem. The library has a Rails compatibility mode, but it is not the default:

    ts = Time.zone.now
    
    ts.to_json
    # => "\"2021-12-12T20:10:56Z\""
    
    Oj.dump(ts)
    # => "{\"^o\":\"ActiveSupport::TimeWithZone\",\"utc\":{\"^t\":1639339856.001998000},\"time\":{\"^t\":1639339856.001998000},\"time_zone\":{\"^o\":\"ActiveSupport::TimeZone\",\"name\":\"UTC\",\"utc_offset\":null,\"tzinfo\":{\"^o\":\"TZInfo::DataTimezone\",\"info\":{\"^o\":\"TZInfo::ZoneinfoTimezoneInfo\",\"identifier\":\"Etc/UTC\",\"offsets\":{\"^#1\":[0,{\"^o\":\"TZInfo::TimezoneOffset\",\"utc_offset\":0,\"std_offset\":0,\"abbreviation\":\":UTC\",\"utc_total_offset\":0}]},\"transitions\":[],\"previous_offset\":{\"^o\":\"TZInfo::TimezoneOffset\",\"utc_offset\":0,\"std_offset\":0,\"abbreviation\":\":UTC\",\"utc_total_offset\":0},\"transitions_index\":null}}},\"period\":{\"^o\":\"TZInfo::TimezonePeriod\",\"start_transition\":null,\"end_transition\":null,\"offset\":{\"^o\":\"TZInfo::TimezoneOffset\",\"utc_offset\":0,\"std_offset\":0,\"abbreviation\":\":UTC\",\"utc_total_offset\":0},\"utc_total_offset_rational\":null}}"
    
    Oj.dump(ts, mode: :rails)
    # => "\"2021-12-12T20:10:56Z\""
    

    Adding mode: :rails to the Oj call fixed the unexpected payload size issue.
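
    If every caller is expected to produce Rails-compatible JSON, the mode can also be set globally instead of at each call site. A minimal sketch, assuming an initializer is an appropriate place for it:

    # config/initializers/oj.rb
    require "oj"

    # Make :rails the default mode for every Oj.dump call, so that
    # ActiveSupport objects (like TimeWithZone) serialize the same way
    # Rails' own #to_json does.
    Oj.default_options = { mode: :rails }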

    The fact that we had a production endpoint generating unexpected JSON for months tells me two things:

    • There is no test coverage that checks the generated JSON against a known schema (a sketch of such a test follows this list)
    • Consumers of this internal endpoint have no use for the timestamps that were being sent down: there is no code that recognizes that data structure.
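
    For what it’s worth, the schema check from the first point doesn’t take much code. Here is a minimal sketch using the json-schema gem (the endpoint and field names are hypothetical):

    # spec/requests/orders_spec.rb
    require "json-schema"

    RSpec.describe "GET /orders/:id" do
      it "serializes timestamps as strings" do
        # Only the shape we care about: created_at must be a date-time
        # string, not a nested object dump.
        schema = {
          "type" => "object",
          "properties" => {
            "created_at" => { "type" => "string", "format" => "date-time" },
          },
          "required" => ["created_at"],
        }

        get "/orders/1"

        # validate! raises (failing the test) if the payload drifts
        # from the agreed-upon schema.
        expect(JSON::Validator.validate!(schema, response.body)).to be(true)
      end
    end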


  • The REPL: Issue 87 - November 2021

    A terrible schema from a clueless programmer

    A very experienced engineer tells a story about a horrible database design. The kicker is that the terrible design was hers, when she was younger and didn’t know any better.

    We’ve all been there. This is how we learn, especially since many software engineers don’t have the opportunity to be mentored and guided by more experienced engineers.

    RegexLearn

    This is a step-by-step tutorial for learning regular expressions. It is well explained, has plenty of examples, and feels like a smooth on-ramp to regex.

    The History of Command Palettes: How Typing Commands Became The Norm Again

    So much this: Typing commands is better than clicking your mouse. Command palettes help with discoverability.

    In fact, one of my “must-have” Alfred extensions is Menu Bar Search. It adds the command-p behavior to any program by searching the text of all the menus (using accessibility access). I use it a lot, in all sorts of programs that don’t include such functionality natively (e.g. Quickbooks, Firefox).

    One thing not mentioned is that a shell typically also stores history, which helps you rediscover commands you’ve typed before. I use my history all the time and search through it with a fuzzy finder.


  • Conditionally setting your gitconfig, or not

    In Conditionally setting your gitconfig, Marcus Crane solves a problem that many of us have: Different git configuration for personal and work projects. His solution includes adding conditional configuration, like so:

    [includeIf "gitdir:~/work/"]
      path = ~/.work.gitconfig
    

    I’ve been taking a different approach. According to the git-scm configuration page, git looks for system configuration first, then the user’s personal configuration (~/.gitconfig or ~/.config/git/config), and then the project-specific configuration.

    In my personal configuration, I typically set my name, but don’t set my email.

    [user]
      name = Ylan Segal
    

    On first interaction with a repository, git makes it evident that an email is needed:

    $ git commit
    Author identity unknown
    
    *** Please tell me who you are.
    
    Run
    
      git config --global user.email "you@example.com"
      git config --global user.name "Your Name"
    
    to set your account's default identity.
    Omit --global to set the identity only in this repository.
    
    fatal: no email was given and auto-detection is disabled
    

    I then use git config user.email "ylan@...." to set a project-specific email. I don’t use the --global option. I want to make that choice each time I start interacting with a new repo.

    As they say, there are many ways to skin a cat.


  • The REPL: Issue 86 - October 2021

    Bitcoin is a Ponzi

    I’ve been thinking about this a lot: Bitcoin (and other crypto) seems like a Ponzi scheme. Is it? Jorge Stolfi makes the argument that it is, and I find it compelling.

    Understanding How Facebook Disappeared from the Internet

    Interesting explanation of how BGP and DNS work, how it is possible for a company like Facebook to disappear completely off the internet, and what it looked like to Cloudflare, one of the biggest content-delivery networks on the internet.

    We Tried Baseball and It Didn’t Work

    An allegory? Sarcasm? Humorous pastiche? You decide.

    I call it satire. Like the funniest satire, it resonates because it has a kernel of truth, exaggerated to absurdity. It is squarely aimed at those who criticize Agile, TDD, or any other discipline without actually understanding it.


  • The Bike Shed Podcast Feedback

    I recently listened to The Bike Shed, Episode 313 and sent some feedback to the hosts:


    Hello Chris & Steph! I am a long-time listener of the podcast. Thank you for taking the time each week to record it.

    On The Bike Shed, Episode 313 you discussed a failure mode in which a Sidekiq job is enqueued inside a transaction. The job gets processed before the transaction commits, so the job encounters an unexpected database state. The job eventually succeeds when retried after the transaction commits.

    The proposed solution is to enqueue the job after the transaction commits. This certainly fixes that particular failure mode, but it also makes different failure modes possible. Imagine the transaction commits, but the Sidekiq job cannot be enqueued for whatever reason (e.g. network partition, buggy code, the server node’s process runs out of memory). In this instance, you will fail to process your order. Is this better? You might not even notice that no job was enqueued. You can add code to check for that condition, of course.
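
    To make that failure window concrete, here is an illustrative sketch (the model and job names are hypothetical):

    # Enqueue-after-commit: fixes the "job sees uncommitted data" race,
    # but opens a new one.
    ActiveRecord::Base.transaction do
      order.update!(status: "paid")
    end
    # <- If the process dies right here, or Redis is unreachable, the
    #    order is marked paid but no job is ever enqueued.
    ProcessOrderJob.perform_later(order)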

    In the original configuration, there are other failure modes as well. For example, the write to the database succeeds, the job is enqueued, but then the transaction fails to commit (for whatever reason). Then you have a job that won’t succeed on retries. To analyze all failure modes, you need to assume that any leg of the network communication can fail.

    The main problem you are running up against is that you are trying to write to two databases (Postgres and Redis) in a single conceptual transaction. This is known as the “Dual Write” problem. Welcome to distributed systems. You can read a more thorough explanation by Thorben Janssen.

    The approach outlined in that article – embracing async communication – is one way to solve the issue. For smaller Rails apps, there is another approach: Don’t use two databases! If you use a Postgres-based queue like GoodJob or even DelayedJob, you don’t have this problem: enqueuing the job is transactional, meaning that either everything writes (the records and the job) or nothing does. That is a very powerful guarantee. I try to hold on to it as much as possible.
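
    To sketch what that guarantee looks like in practice, assuming GoodJob as the ActiveJob backend (again with hypothetical model and job names):

    # GoodJob stores jobs in a Postgres table and uses the same database
    # connection as your models, so enqueuing inside a transaction means
    # the job row commits (or rolls back) together with everything else.
    ActiveRecord::Base.transaction do
      order = Order.create!(status: "pending")
      ProcessOrderJob.perform_later(order) # inserts a row into good_jobs
    end
    # After the block: either both the order and the job exist, or
    # neither does. No dual write.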

    I hope you’ve found this helpful.

    Thanks again for the podcast.
