In my previous posts about bi-temporal data, I dealt with a lot of queries whose WHERE clauses operated on dates. For example:
SELECT employee_id, committee_id FROM committee_membership WHERE valid_from <= '2020-05-02' AND '2020-05-02' < valid_up_to AND tx_applicable_from <= NOW() AND NOW() < tx_applicable_up_to
The underlying schema looks like this:
CREATE table committee_membership ( employee_id int NOT NULL, committee_id int NOT NULL, valid_from date NOT NULL, valid_up_to date NOT NULL, tx_applicable_from date NOT NULL, tx_applicable_up_to date NOT NULL )
The four dates in the table share the same structure. There are two prefixes, valid and tx_applicable, and two suffixes, from and up_to. This structure hints that the dates represent two different concepts: an interval in time that delineates validity and an interval that delineates applicability.
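As a minimal sketch of how the two intervals work together, the query above can be wrapped in a helper that takes both points in time as parameters. This uses Python's built-in sqlite3 module only to make the example runnable; the function name `memberships_as_of`, the sample rows, and the `9999-12-31` sentinel for "still open" are assumptions for illustration, not part of the original schema. Both intervals are treated as half-open, matching the `<=` and `<` comparisons in the query.

```python
import sqlite3

FOREVER = "9999-12-31"  # assumed sentinel meaning "interval still open"


def memberships_as_of(conn, valid_on, known_at):
    """Memberships valid on `valid_on`, as the system knew them at `known_at`."""
    query = """
        SELECT employee_id, committee_id
        FROM committee_membership
        WHERE valid_from <= :valid_on AND :valid_on < valid_up_to
          AND tx_applicable_from <= :known_at AND :known_at < tx_applicable_up_to
        ORDER BY employee_id, committee_id
    """
    return conn.execute(query, {"valid_on": valid_on, "known_at": known_at}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE committee_membership (
        employee_id int NOT NULL,
        committee_id int NOT NULL,
        valid_from date NOT NULL,
        valid_up_to date NOT NULL,
        tx_applicable_from date NOT NULL,
        tx_applicable_up_to date NOT NULL
    )
""")
# Hypothetical sample data: employee 2's membership was recorded late (May 10).
conn.executemany(
    "INSERT INTO committee_membership VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, "2020-01-01", "2020-06-01", "2020-01-01", FOREVER),
        (2, 10, "2020-05-02", "2020-09-01", "2020-05-10", FOREVER),
    ],
)

# On May 5 the system only knew about employee 1.
early_view = memberships_as_of(conn, "2020-05-02", "2020-05-05")
# By June 1 it knew about both memberships for the same valid date.
later_view = memberships_as_of(conn, "2020-05-02", "2020-06-01")
```

Here `early_view` holds only employee 1, while `later_view` holds both employees, even though the valid date is the same in both calls. ISO-formatted date strings are used because they compare correctly as text in SQLite.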
Craig Baumunk presents at the NJ SQL Server User Group on bi-temporal data. He goes over the differences between non-temporal, valid temporal, and transaction temporal modeling, and the different types of problems that each solves. He makes the case for why bi-temporal modeling is superior to the previous approaches and what the implications are. The presentation is from 2011, but it is still as relevant as ever. Note that the presentation is broken up into 7 different parts.
Jeremy Beard covers the importance of bi-temporal data modeling, and the type of problems that it can solve. Using a credit score example, he builds up the modeling bit by bit in an intuitive way. The second portion of the article focuses specifically on the implementation in Cloudera EDH, which I don’t use.
Jeremy Keith writes on how to think about design principles. Sometimes, design principles can be truisms that are less than useful (e.g. Make it usable.). Expressing principles as a set of priorities makes them more useful and actionable (e.g. Usability, even over profitability). As an example, he presents the HTML design principles as:
Users, even over authors. Authors, even over implementors. Implementors, even over specifiers. Specifiers, even over theoretical purity.
In non-temporal data, deletions are literal: Specific rows or columns are deleted, because only the current state is modeled. In bi-temporal data, the equivalent operation is modeled by inserting new facts.
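One possible way to model such a deletion, sketched against the committee_membership table from the earlier posts: instead of removing the row, the currently known row is closed on the transaction axis and a replacement fact with a shortened validity interval is inserted. The helper name `end_membership`, the `9999-12-31` sentinel, and the use of SQLite's implicit `rowid` are assumptions made for this illustration, not necessarily how the original posts implement it.

```python
import sqlite3

FOREVER = "9999-12-31"  # assumed sentinel meaning "interval still open"

SCHEMA = """
CREATE TABLE committee_membership (
    employee_id int NOT NULL,
    committee_id int NOT NULL,
    valid_from date NOT NULL,
    valid_up_to date NOT NULL,
    tx_applicable_from date NOT NULL,
    tx_applicable_up_to date NOT NULL
)
"""


def end_membership(conn, employee_id, committee_id, ended_on, recorded_at):
    """Logically delete a membership from `ended_on` onward; no row is removed."""
    current = conn.execute(
        """
        SELECT rowid, valid_from FROM committee_membership
        WHERE employee_id = ? AND committee_id = ?
          AND valid_from <= ? AND ? < valid_up_to
          AND tx_applicable_up_to = ?
        """,
        (employee_id, committee_id, ended_on, ended_on, FOREVER),
    ).fetchall()
    for rowid, valid_from in current:
        # Stop treating the old fact as current knowledge, but keep it for history.
        conn.execute(
            "UPDATE committee_membership SET tx_applicable_up_to = ? WHERE rowid = ?",
            (recorded_at, rowid),
        )
        # Insert the new fact: the membership was valid only up to `ended_on`.
        if valid_from < ended_on:
            conn.execute(
                "INSERT INTO committee_membership VALUES (?, ?, ?, ?, ?, ?)",
                (employee_id, committee_id, valid_from, ended_on, recorded_at, FOREVER),
            )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO committee_membership VALUES (?, ?, ?, ?, ?, ?)",
    (1, 10, "2020-01-01", FOREVER, "2020-01-01", FOREVER),
)
# Employee 1 left the committee on June 1; the system learned of it on June 15.
end_membership(conn, 1, 10, "2020-06-01", "2020-06-15")

history = conn.execute(
    "SELECT valid_up_to, tx_applicable_from, tx_applicable_up_to "
    "FROM committee_membership ORDER BY tx_applicable_from"
).fetchall()
```

After the call, `history` contains two rows: the original open-ended fact, now closed on the transaction axis at June 15, and the new fact whose validity ends on June 1. A query as of June 10 still sees the membership as open-ended, which is exactly what the system believed at that time.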
Bi-temporal data refers to a modeling technique to store and retrieve data that changes on two different axes. The valid time axis refers to the range of time in which data is valid. The transaction time axis refers to when the system recorded the data. Keeping track of both exposes a very rich data model.
In the simplest data modeling, a system keeps track of the current state. Let’s assume that we are working with a system that keeps track of company personnel. In its people table, it will hold things like first and last names, date of birth, and social security numbers. At first sight, these seem like invariant facts, but upon closer inspection we can see that in reality they are not. People often change their name when their marital status changes, and birth date and social security number are technically facts that don’t change, but often need to be corrected. These two very different reasons for change are often conflated, or worse, not accounted for. Bi-temporal data provides a way to deal with both.
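The two kinds of change can be illustrated with a small in-memory sketch. The `NameFact` class, the `last_name_as_of` helper, and the names and dates below are all hypothetical. A correction replaces what the system believed without changing when the fact was valid, while a real-world change splits the validity interval.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # assumed sentinel meaning "interval still open"


@dataclass(frozen=True)
class NameFact:
    last_name: str
    valid_from: date   # when the name started being true in the real world
    valid_up_to: date  # exclusive end of validity
    tx_from: date      # when the system started believing this fact
    tx_up_to: date     # exclusive end of that belief


def last_name_as_of(facts, valid_on, known_at):
    """Last name valid on `valid_on`, as the system believed at `known_at`."""
    for fact in facts:
        if (fact.valid_from <= valid_on < fact.valid_up_to
                and fact.tx_from <= known_at < fact.tx_up_to):
            return fact.last_name
    return None


hired = date(2015, 3, 1)
corrected = date(2015, 4, 10)
married = date(2019, 6, 1)
recorded_marriage = date(2019, 6, 20)

facts = [
    # Initial entry with a typo, later superseded by a correction.
    NameFact("Smyth", hired, FOREVER, hired, corrected),
    # Correction: same validity, new transaction interval.
    NameFact("Smith", hired, FOREVER, corrected, recorded_marriage),
    # Real-world change: validity is split at the marriage date.
    NameFact("Smith", hired, married, recorded_marriage, FOREVER),
    NameFact("Jones", married, FOREVER, recorded_marriage, FOREVER),
]

before_correction = last_name_as_of(facts, date(2015, 3, 15), date(2015, 3, 20))
after_correction = last_name_as_of(facts, date(2015, 3, 15), date(2015, 5, 1))
after_marriage = last_name_as_of(facts, date(2019, 6, 10), date(2019, 7, 1))
```

Here `before_correction` is "Smyth" and `after_correction` is "Smith" for the same valid date, because only the system's knowledge changed. `after_marriage` is "Jones", while a query for a 2018 date at the same knowledge time would still return "Smith".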
Like so much of the content on Martin Fowler’s website, this article, by Zhamak Dehghani, is a well-thought-out description of how to reason about breaking a monolith into services. Some choice quotes:
Every increment must leave us in a better place in terms of the architecture goal.
In the context of leaving both the old way and the new way in place:
If the team stops here and pivots into building some other service or feature, they leave the overall architecture in a state of increased entropy. At this point the teams are actually further away from their overall goal of making changes faster. Any new developer to the monolith code needs to deal with two code paths, increased cognitive load of understanding the code, and slower process of changing and testing it.
It’s well worth the read.
Many of the blog posts written by large engineering organizations don’t apply to smaller organizations. While they are still interesting, reading how Google and Amazon handle load doesn’t necessarily translate into practical advice. This post by Damir Svrtan and Sergii Makagon in the Netflix Engineering Blog is different. They describe how they went about rapidly building a new service, meant to integrate with a variety of other services, even in the face of unknown requirements. Their solution: Hexagonal Architecture.
The idea of Hexagonal Architecture is to put inputs and outputs at the edges of our design. Business logic should not depend on whether we expose a REST or a GraphQL API, and it should not depend on where we get data from — a database, a microservice API exposed via gRPC or REST, or just a simple CSV file.
In particular, the way they defined their core concepts resonated with me. They put most of their code into Entities (domain objects), Repositories (read and write data), and Interactors (orchestration classes – i.e. service classes, use case objects).
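To make the three concepts concrete, here is a minimal sketch of how they might fit together. It is not taken from the Netflix post; the `Employee`, `EmployeeRepository`, `InMemoryEmployeeRepository`, and `RenameEmployee` names are invented for illustration. The point is that the interactor depends only on the repository interface, so the storage mechanism can be swapped without touching business logic.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Employee:
    """Entity: a plain domain object with no knowledge of storage or transport."""
    employee_id: int
    name: str


class EmployeeRepository(Protocol):
    """Repository interface: the only way the core reads and writes data."""

    def find(self, employee_id: int) -> Optional[Employee]: ...

    def save(self, employee: Employee) -> None: ...


class InMemoryEmployeeRepository:
    """One possible adapter; a database or API client could replace it."""

    def __init__(self) -> None:
        self._rows: Dict[int, Employee] = {}

    def find(self, employee_id: int) -> Optional[Employee]:
        return self._rows.get(employee_id)

    def save(self, employee: Employee) -> None:
        self._rows[employee.employee_id] = employee


class RenameEmployee:
    """Interactor: orchestrates one use case using entities and a repository."""

    def __init__(self, repository: EmployeeRepository) -> None:
        self._repository = repository

    def __call__(self, employee_id: int, new_name: str) -> Employee:
        employee = self._repository.find(employee_id)
        if employee is None:
            raise LookupError(f"no employee with id {employee_id}")
        renamed = Employee(employee.employee_id, new_name)
        self._repository.save(renamed)
        return renamed


repository = InMemoryEmployeeRepository()
repository.save(Employee(1, "Ada"))
rename_employee = RenameEmployee(repository)
result = rename_employee(1, "Grace")
```

A REST handler, a GraphQL resolver, or a command-line script could each call `rename_employee` the same way, which is the property the quoted passage describes.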
I’ve been doing a lot of research into multi-service architectures, and I’ve seen many references to how entity services are an anti-pattern. Michael Nygard has a previous article describing just that. Designing services to avoid the anti-pattern is sometimes easier said than done. This post walks the reader through how to avoid the pitfalls, using a concrete example that models services based on the business lifecycle instead of just focusing on the data that they store.