Book Review: Building Git
Building Git by James Coglan
git is a widely successful version control system, used in software companies large and small. Its distributed nature changed software development in many ways. In Building Git, James Coglan re-implements a subset of git’s functionality from the ground up, using Ruby, which has a large standard library and is higher-level than the original C.
git itself is a large project with a lot of functionality. The book covers a lot of ground, in a step-by-step fashion. Each line of code is explained both conceptually and syntactically.
Some of the most interesting things I learned:
Objects in git are represented by a hash of their content. They are also immutable. Since their content never changes, neither does their hash. This powers git’s ability to capture the state of every change in the project, without having multiple copies of the same content. I found this to be a clever use of cryptography.
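To make the content-addressing idea concrete, here is a minimal sketch in Python (not the book's Ruby code). It assumes the classic SHA-1 object format, where the ID is the hash of a short header followed by the content. The function name `git_object_id` is made up for this illustration.

```python
import hashlib


def git_object_id(kind: str, content: bytes) -> str:
    """Return the SHA-1 ID for `content` stored as an object of type `kind`.

    Assumes the classic format: "<kind> <size>\\0" followed by the raw bytes.
    """
    header = f"{kind} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# The same bytes always produce the same ID, so identical content is stored once.
print(git_object_id("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the ID is derived only from the bytes, two files with the same content share one object, no matter where they live in the project or when they were committed.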
My interactions with git are typically around file diffs. I assumed that is what git stored. In fact, it stores blobs with the content of files, trees with the hierarchy of files and folders in the project (including their names and modes), and commits with metadata about the tree. Every commit in git points to a full representation of the project. Along the way I got to dive deeper into nested tree structures.
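The relationship between blobs, trees, and commits can be sketched in a few lines of Python. This is a simplified illustration, not the book's implementation: the helper names are invented, the author string is a placeholder, and entries are sorted by plain name (real git treats directory names slightly differently when sorting).

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """SHA-1 of "<kind> <size>\\0" + body (classic object format assumed)."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialize (mode, name, object_id_hex) entries: "<mode> <name>\\0<20 raw bytes>"."""
    body = b""
    for mode, name, oid in sorted(entries, key=lambda entry: entry[1]):
        body += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(oid)
    return body


def commit_body(tree_oid, parent_oids, author, message) -> bytes:
    """Serialize a commit: a pointer to one tree, its parents, and metadata."""
    lines = [f"tree {tree_oid}"]
    lines += [f"parent {parent}" for parent in parent_oids]
    lines += [f"author {author}", f"committer {author}", "", message]
    return "\n".join(lines).encode("utf-8")


# Hypothetical project with two files; only README changes between commits.
license_id = object_id("blob", b"MIT\n")
readme_v1 = object_id("blob", b"hello\n")
readme_v2 = object_id("blob", b"hello, world\n")

tree_v1 = object_id("tree", tree_body([("100644", "README", readme_v1),
                                       ("100644", "LICENSE", license_id)]))
tree_v2 = object_id("tree", tree_body([("100644", "README", readme_v2),
                                       ("100644", "LICENSE", license_id)]))

author = "A. Hacker <hacker@example.com> 1700000000 +0000"  # placeholder identity
commit_1 = object_id("commit", commit_body(tree_v1, [], author, "First commit\n"))
commit_2 = object_id("commit", commit_body(tree_v2, [commit_1], author, "Expand README\n"))

print(tree_v1 != tree_v2)  # True: a changed blob changes the tree that lists it
```

Each commit names a complete tree, yet the unchanged LICENSE blob is shared between both trees, which is how full snapshots stay cheap.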
Unix-related content: Writing files atomically, lock files, file modes, guarantees (or lack thereof) of system calls, etc. In later chapters, the distributed nature of git is implemented by taking advantage of the fact that ssh can pipe input and output between a local and a remote process. This lets us build abstractions that hide all the complexity involved in implementing network protocols.
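As a rough illustration of the lock-file technique for atomic writes, here is a small Python sketch. It is an assumption about the general pattern, not a copy of the book's code: the names `write_atomically` and `LockDenied` are invented, and the `.lock` suffix follows the convention git itself uses.

```python
import contextlib
import os
import tempfile


class LockDenied(Exception):
    """Raised when another writer already holds the lock file."""


def write_atomically(path: str, data: bytes) -> None:
    lock_path = path + ".lock"
    try:
        # O_EXCL makes creation fail if the lock exists, so only one writer wins.
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        raise LockDenied(f"unable to create {lock_path}: file exists") from None
    try:
        with os.fdopen(fd, "wb") as lock_file:
            lock_file.write(data)
            lock_file.flush()
            os.fsync(lock_file.fileno())
        # Renaming over the target swaps the old content for the new in one step.
        os.replace(lock_path, path)
    except BaseException:
        # On failure, release the lock so later writers are not blocked.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(lock_path)
        raise


with tempfile.TemporaryDirectory() as repo_dir:
    head = os.path.join(repo_dir, "HEAD")
    write_atomically(head, b"ref: refs/heads/main\n")
    with open(head, "rb") as f:
        print(f.read())  # b'ref: refs/heads/main\n'
```

Readers either see the old file or the new one, never a half-written mix, and a second writer fails fast instead of silently clobbering the first.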
Graph theory: Generating diffs is not quite as straightforward as I imagined. Coglan does not shy away from explaining the theory, or from rolling up the proverbial sleeves and writing the algorithm implementation.
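To give a feel for the problem, here is a deliberately simple line diff in Python based on a longest-common-subsequence table. This is a swapped-in textbook technique chosen for brevity, not necessarily the algorithm the book implements, and `diff_lines` is a name made up for this sketch.

```python
def diff_lines(a, b):
    """Return a list of (tag, line) edits turning `a` into `b`.

    Tags: " " for unchanged, "-" for deleted from a, "+" for inserted from b.
    """
    n, m = len(a), len(b)
    # lcs[i][j] holds the length of the longest common subsequence of a[i:] and b[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the top-left, keeping matches and emitting edits.
    edits = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            edits.append((" ", a[i]))
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            edits.append(("-", a[i]))
            i += 1
        else:
            edits.append(("+", b[j]))
            j += 1
    edits.extend(("-", line) for line in a[i:])
    edits.extend(("+", line) for line in b[j:])
    return edits


for tag, line in diff_lines(["a", "b", "c"], ["a", "c", "d"]):
    print(tag + line)
# prints:
#  a
# -b
#  c
# +d
```

The table costs time and memory proportional to the product of the two file lengths, which is one reason production tools reach for more refined graph-search approaches.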
git optimizes object transfer by calculating deltas between objects and transferring them in packs. Those packs are sometimes stored as-is on disk. git then deals with reading objects from its database from either “loose” or “packed” storage.
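The "loose" half of that storage is simple enough to sketch in Python: each object is zlib-compressed and stored in a file named after its hash. The function names below are invented for illustration, the write is not atomic for the sake of brevity, and packs and deltas are left out entirely.

```python
import hashlib
import os
import tempfile
import zlib


def loose_path(objects_dir: str, oid: str) -> str:
    # The first two hex characters become a directory, the rest the file name.
    return os.path.join(objects_dir, oid[:2], oid[2:])


def write_loose(objects_dir: str, kind: str, content: bytes) -> str:
    """Store an object as a loose file and return its ID."""
    store = f"{kind} {len(content)}".encode("ascii") + b"\x00" + content
    oid = hashlib.sha1(store).hexdigest()
    path = loose_path(objects_dir, oid)
    if not os.path.exists(path):  # objects are immutable, so never rewrite one
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(store))
    return oid


def read_loose(objects_dir: str, oid: str):
    """Read a loose object back as (kind, content)."""
    with open(loose_path(objects_dir, oid), "rb") as f:
        store = zlib.decompress(f.read())
    header, _, content = store.partition(b"\x00")
    kind, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError(f"corrupt object {oid}: size mismatch")
    return kind, content


with tempfile.TemporaryDirectory() as objects_dir:
    oid = write_loose(objects_dir, "blob", b"hello world\n")
    print(read_loose(objects_dir, oid))  # ('blob', b'hello world\n')
```

A reader that supports both formats would look for the loose file first and fall back to searching pack files when it is missing.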
Each chapter includes extensive references for the topics covered, as good as I’ve seen in any technical book.
I enjoyed – and learned a lot from – this book.
Find me on Twitter at @ylansegal and Mastodon at @email@example.com