xargs is one of my go-to tools in Unix. It reads items from stdin and executes another command with those items as arguments. It’s very useful for gluing commands together.
Its default behavior differs between the Mac (or BSD) and Linux in a subtle way. On the Mac, if there is no input from stdin, it will not execute the command. On Linux, it will execute it once, without any arguments.
As an example, let’s say that we want to use rubocop (a Ruby syntax checker and linter) to check only RSpec files in a project. We can write something like this:
$ find . -name '*_spec.rb' | xargs rubocop
On a project that has two spec files, the above example expands as follows:
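The original file names aren’t shown here, so assume two hypothetical files, `./spec/a_spec.rb` and `./spec/b_spec.rb`. We can simulate find’s output with printf and substitute echo for rubocop to see exactly how xargs expands the command:

```shell
# Simulate find printing two spec files; xargs collects them
# into a single invocation (echo shows the expansion):
printf './spec/a_spec.rb\n./spec/b_spec.rb\n' | xargs echo rubocop
# → rubocop ./spec/a_spec.rb ./spec/b_spec.rb
```

Both file names end up as arguments to a single rubocop run.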
Why is this important? In the original example, if no files are found, rubocop will not be invoked at all on the Mac, but will be invoked with no arguments on Linux. In my case, that is unwanted behavior because rubocop will then check all files in the project.
When writing bash scripts that are intended to run on different Unix versions, be careful that you understand and test the behavior of the Unix commands used: they sometimes have subtle differences in behavior.
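One way to get the BSD behavior on Linux is GNU xargs’ `-r` (`--no-run-if-empty`) flag, which skips the command entirely when stdin is empty. Note that `-r` is a GNU extension, not POSIX, though recent BSD implementations accept it as a no-op since that is already their default:

```shell
# On GNU xargs, empty input still runs the command once:
printf '' | xargs echo "checking:"      # prints "checking:" on Linux
# The -r flag skips the command when there is no input:
printf '' | xargs -r echo "checking:"   # prints nothing
```

With that flag, the original pipeline becomes `find . -name '*_spec.rb' | xargs -r rubocop` and behaves the same on both systems.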
Sean Kelly writes a cautionary post about microservices, organized around debunking 5 fallacies that he has encountered about them: they keep the code cleaner, they are easy, they are faster, they are simple for engineers, and they scale better.
While looking into Apache Milagro, I found a link to this short paper on the math behind public-key cryptography. It’s a great introduction, or refresher, to the mathematics that makes the secure web work. The paper itself has no author information, but the URL suggests that it was written by Kathryn Mann at the University of California at Berkeley.
Olivier Lacan has a great explanation of Koichi Sasada’s recent proposal for bringing better parallelism to Ruby 3. The proposal is to introduce a new abstraction, called Guilds, that is implemented in terms of existing Threads and Fibers but can actually execute in parallel, because it has stronger guarantees around accessing shared state. In particular, guilds won’t be able to access objects in other guilds without explicitly transferring them via channels. It’s exciting to think about Ruby’s performance not being bound by the Global Interpreter Lock (GIL).
Mozilla is considering taking action against two Certificate Authorities, WoSign and StartCom, after an investigation into improper behavior, including not disclosing that WoSign bought StartCom outright.
As I wrote about earlier, this blog used a StartCom TLS certificate, under their StartSSL brand, which was free. At the time, the only reason I didn’t pick Let’s Encrypt was that its certificates expire every 3 months. However, given the contents of the report, I would much rather use an organization that wants to make the web better – not exploit it.
Obtaining and installing the new certificate turned out to be an easy process.
Obtaining A Certificate From Let’s Encrypt
I used the certbot client to obtain a certificate. On my Mac, I installed it via Homebrew:
$ brew install certbot
certbot can request and install the certificate if it’s executed on the same machine that runs the web server. In my case, I just wanted the certificates to be generated and downloaded locally.
$ sudo certbot certonly --manual
During the in-terminal process, certbot will ask for the intended domain and instruct you to make some specified content available at a particular URL in that domain. This proves that the person requesting the TLS certificate is an administrator for that domain. For me, this involved copying one new file to my hosting service.
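The exact token values come from certbot’s prompt; the ones below are placeholders. For a site served from a webroot directory, the step amounts to something like:

```shell
# Hypothetical values; certbot prints the real token and content.
TOKEN='EXAMPLE-TOKEN'
CONTENT='EXAMPLE-TOKEN.EXAMPLE-KEY-THUMBPRINT'

# Create the challenge file where the web server will serve it,
# i.e. at http://<domain>/.well-known/acme-challenge/<token>:
mkdir -p webroot/.well-known/acme-challenge
printf '%s' "$CONTENT" > "webroot/.well-known/acme-challenge/$TOKEN"
```

Once Let’s Encrypt fetches that URL and sees the expected content, validation succeeds.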
After that, the certificate is issued immediately and available locally at /etc/letsencrypt/live/ylan.segal-family.com (your domain will vary).
Installing The New Certificate
The last time I installed a certificate, I had to open a support ticket at Nearly Free Speech. Since then, they have made the process automated and available from the control panel. The instructions are to paste the certificate (including the full cert chain) and the private key into the provided form, all into the same field:
$ cat fullchain.pem privkey.pem | pbcopy
A few seconds later, the new certificate was installed and being served on this domain.
The Let’s Encrypt process ended up being simpler than StartSSL, since there was no need to manually create the private key and certificate signing request: It’s all done with one command.
While working on an HTTP API that serves binary files to client applications, I came upon some unexpected behavior.
Imagine that we have a /file/:id endpoint, but that instead of responding with the binary, it redirects to an external storage service, like AWS S3. Our endpoint is also protected, so that users need an access token. A typical request/response cycle:
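The original transcript isn’t reproduced here, so the sketch below uses a hypothetical api.example.com endpoint and S3 URL. With curl told to follow redirects (`-L`) and to send a bearer token, the exchange looks roughly like this (`>` marks request headers, `<` response headers):

```
$ curl -v -L -H 'Authorization: Bearer SECRET-TOKEN' \
    https://api.example.com/file/123
> GET /file/123 HTTP/1.1
> Host: api.example.com
> Authorization: Bearer SECRET-TOKEN
< HTTP/1.1 302 Found
< Location: https://bucket.s3.amazonaws.com/123

> GET /123 HTTP/1.1
> Host: bucket.s3.amazonaws.com
> Authorization: Bearer SECRET-TOKEN
< HTTP/1.1 404 Not Found
```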
curl, as requested, followed the redirect response, but in doing so, it included the original Authorization header in the request to another domain1. We have just leaked our secret and given a third party a valid token to access our system. To be fair, after some thought, I think it’s reasonable for curl to interpret that the header is to be sent in all requests, since we are also telling it to follow redirects. From the manual:
WARNING: headers set with this option will be set in all requests - even after redirects are followed, like when told with -L, --location. This can lead to the header being sent to other hosts than the original host, so sensitive headers should be used with caution combined with following redirects.
Who does that?
curl’s behavior (sending explicitly set headers on redirects) was also observed in some other User Agents, notably the library used by one of our client applications. However, it doesn’t seem to be universal. For example, httpie does not leak the header:
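A hypothetical equivalent of the earlier request, using httpie’s `--follow` flag and header syntax (the host and token are placeholders; `--verbose` prints the request headers):

```
$ http --verbose --follow https://api.example.com/file/123 \
    Authorization:'Bearer SECRET-TOKEN'
GET /file/123 HTTP/1.1
Host: api.example.com
Authorization: Bearer SECRET-TOKEN
...

GET /123 HTTP/1.1
Host: bucket.s3.amazonaws.com
...
```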
As you can see, the Authorization header is conspicuous for its absence in the second request.
Since we can’t predict the behavior of all User Agents that are going to use our API, we can design our APIs differently on the server.
Use Token As a Parameter
If we are using OAuth2 (which my example implies, because of the use of a Bearer token), the specification allows for the token to be passed as a URI Query Parameter named access_token. Since that makes it part of the original URL, it will certainly not be included by any client that follows redirection. However, I have seen this usage flagged as risky by several security audits. One of the objections is that parameters in URLs are commonly written to logs, exposing tokens unnecessarily.
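Under this scheme, the hypothetical request from before would carry the token in the URL instead of a header:

```
$ curl -L 'https://api.example.com/file/123?access_token=SECRET-TOKEN'
```

As long as the server’s Location header doesn’t echo the token back, the redirected request to the storage service carries no credentials at all.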
The OAuth2 specification also allows a Form-Encoded Body Parameter, likewise named access_token. This gets around the fact that the token is part of the URL, and it won’t be sent on any redirect. However, the request must have an application/x-www-form-urlencoded content type, which may conflict with the rest of the application wanting it to be application/json or similar.
Use Basic Authentication
Basic Authentication is a method for a User Agent to provide credentials to the server (usually username and password). Most User Agents have good support for it and understand that its use is limited to the original URL.
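curl, for example, only sends credentials given via `-u` to the host in the original URL; its `--location-trusted` flag exists precisely to override that and allow credentials to flow to redirect targets. A sketch against the same hypothetical endpoint:

```
$ curl -L -u apiuser:secret https://api.example.com/file/123
# The Authorization header derived from -u goes to api.example.com,
# but is dropped on the cross-host redirect unless --location-trusted
# is explicitly passed.
```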
Of course, redirecting is not the only option: your endpoint can act as a proxy, reading the contents from the external server and passing them along to the client. The penalty is that the client connection to your server will stay open longer, consume more computational resources, and transfer more data than a redirect.
Be careful when redirecting to external servers while using header-based authentication. Some clients may forward those headers along to a third party.
We can ignore the 404 response. This is a made up example, and it’s irrelevant how the external server actually responded.↩
Gary Bernhardt writes a great article on types, type systems and the differences in typing between programming languages. He clarifies some of the adjectives commonly associated with types: static, dynamic, weak, strong. It’s a very interesting read, as are some of the comments in the gist. Gary has also re-started his Destroy All Software screencast series: I haven’t watched any of the new ones, but I learned a lot from the old ones.
Troy Hunt explores the services that CloudFlare provides as a content delivery network (CDN), in particular with respect to SSL (or, more properly, TLS). As with most interesting things in life, it’s not black and white: CloudFlare is not evil – as some recent blog posts claim – and provides valuable services, but users need to be aware of what the security guarantees are, or more importantly, what they are not. Security is hard and nuanced. The more you know…
In the last few weeks I have been reading a lot about data pipelines. Many companies have been moving all their data from centralized databases to distributed systems, which present a set of challenges. In particular: how to make the data produced in one system available to other systems in a robust and consistent manner. In this article, Jay Kreps explains the Log in detail – the underlying abstraction necessary to understand database systems, replication, transactions, etc. The Log, in this context, refers to a storage abstraction that is an append-only, totally-ordered sequence of records, ordered by time. The article is long, but thorough and absolutely worth your time. Many of the concepts are similar to what is described in a post about Apache Samza, also an enlightening read.