Ylan Segal

The REPL: Issue 27 - October 2016

Karafka

Karafka is a framework that simplifies the development of Apache Kafka-based Ruby applications. It looks like a Rails-like abstraction that removes some of the boilerplate and decisions around how to structure a Kafka application. I don’t know if it’s ready for production, but it’s worth keeping an eye on.

MiniTest is not “Just Ruby”, it is “Just Rails”

Victor Shepelev writes his opinion about RSpec and MiniTest and how they differ. I don’t subscribe to all the author’s opinions or conclusions, but I do prefer RSpec, and I have never found the “It’s just Ruby” argument for MiniTest very convincing. If anything, I find that having a distinct shape, structure and feel for tests is a net positive. It promotes shifting from “This is the part that specifies behavior” to “This is the part that implements behavior” in a cleaner way.

Be Kind

Being a good and kind person pays dividends. I love this story. You should read it.

Subtleties of Xargs on Mac and Linux

xargs is one of my go-to tools in Unix. It reads lines from stdin and executes another command with each line as an argument. It’s very useful to glue commands together.
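
As a minimal illustration (the input lines are made up), xargs collects the lines from stdin and hands them to echo as arguments:

$ printf 'one\ntwo\nthree\n' | xargs echo
one two three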

Its default behavior differs between the Mac (or BSD) and Linux in a subtle way. On the Mac, if there is no input from stdin, it will not execute the command. On Linux, it will execute it without any arguments.

As an example, let’s say that we want to use rubocop (a Ruby syntax checker and linter) to check only RSpec files in a project. We can write something like this:

$ find . -name '*_spec.rb' | xargs rubocop

On a project that has two spec files, expanding the above example:

$ find . -name '*_spec.rb'
./spec/one_spec.rb
./spec/two_spec.rb

xargs will execute the equivalent of:

 $ rubocop ./spec/one_spec.rb ./spec/two_spec.rb

The subtlety in behavior comes in when no files are found. To illustrate, let’s see the difference in a trivial example:

$ uname
Linux
$ echo "" | xargs echo "Hello"
Hello

$ uname
Darwin
$ echo "" | xargs echo "Hello"
$

On Linux, xargs will execute the utility; on a Mac it will not. The Linux version can be configured to have the same behavior as the Mac:

$ uname
Linux
$ echo "" | xargs --no-run-if-empty echo "Hello"
$

Unfortunately, the --no-run-if-empty option is not recognized on the Mac:

$ uname
Darwin
$ echo "" | xargs --no-run-if-empty echo "Hello"
xargs: illegal option
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr]
             [-L number] [-n number [-x]] [-P maxprocs] [-s size]
             [utility [argument ...]]

Why is this important? In the original example, if no files are found, rubocop will not be invoked at all on the Mac, but will be invoked with no arguments on Linux. In my case, that is unwanted behavior because rubocop will then check all files in the project.
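
One way to sidestep the difference altogether (a sketch I have not tested on every platform) is to drop xargs and let find batch the matched files onto the command line itself with the -exec ... {} + form, which, as I understand it, does not run the command when nothing matches:

$ find . -name '*_spec.rb' -exec rubocop {} +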

Conclusion

When writing bash scripts that are intended to run on different Unix versions, make sure you understand and test the behavior of the commands used; they sometimes differ in subtle ways.

The REPL: Issue 26 - September 2016

Microservices – Please, don’t

Sean Kelly writes a cautionary post about microservices, organized around debunking five fallacies he has encountered: they keep the code cleaner, they are easy, they are faster, they are simple for engineers, and they scale better.

The science of encryption: prime numbers and mod n arithmetic

While looking into Apache Milagro, I found a link to this short paper on the math behind public-key cryptography. It’s a great introduction, or refresher, to the mathematics that makes the secure web work. The paper itself has no author information, but the URL suggests that it was written by Kathryn Mann at the University of California at Berkeley.

Concurrency in Ruby 3 with Guilds

Olivier Lacan has a great explanation of Koichi Sasada’s recent proposal for bringing better parallelism to Ruby 3. The proposal is to introduce a new abstraction, called Guilds, implemented in terms of existing Threads and Fibers, but able to actually execute in parallel because of stronger guarantees around accessing shared state. In particular, guilds won’t be able to access objects in other guilds without explicitly transferring them via channels. It’s exciting to think about Ruby’s performance not being bound by the Global Interpreter Lock (GIL).

Goodbye StartSSL, Hello Let's Encrypt

Mozilla is considering taking action against two Certificate Authorities, WoSign and StartCom, after an investigation into improper behavior, including not disclosing that WoSign bought StartCom outright.

As I wrote about earlier, this blog used a StartCom TLS certificate, under their StartSSL brand, which was free. At the time, the only reason I didn’t pick Let’s Encrypt was that its certificates expire every 3 months. However, given the contents of the report, I would much rather use an organization that wants to make the web better – not exploit it.

Obtaining and installing the new certificate turned out to be an easy process.

Obtaining A Certificate From Let’s Encrypt

I used the certbot client to obtain a certificate. On my Mac, I installed it via Homebrew:

$ brew install certbot

certbot can request and install the certificate if it’s executed on the same machine that runs the web server. In my case, I just wanted the certificates to be generated and downloaded locally.

$ sudo certbot certonly --manual

During the in-terminal process, certbot will ask for the intended domain and instruct you to make some specified content available at a particular URL in that domain. This is to prove that you, the person requesting the TLS certificate, are an administrator of that domain. For me, this involved copying one new file to my hosting service.
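
For reference, the challenge boils down to serving a small text file at a well-known path on the domain. The file name and contents below are placeholders; certbot prints the real values during the run:

$ mkdir -p .well-known/acme-challenge
$ echo "CHALLENGE_CONTENT_FROM_CERTBOT" > .well-known/acme-challenge/TOKEN_FROM_CERTBOT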

After that, the certificate is issued immediately and available locally at /etc/letsencrypt/live/ylan.segal-family.com (your domain will vary).
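
That directory should contain symlinks to the individual pieces, along these lines (the exact listing may vary):

$ ls /etc/letsencrypt/live/ylan.segal-family.com
cert.pem  chain.pem  fullchain.pem  privkey.pem

fullchain.pem and privkey.pem are the two files used in the next step.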

Installing The New Certificate

The last time I installed a certificate, I had to open a support ticket at Nearly Free Speech. Since then, they have made the process automated and available from the control panel. The instructions are to paste the certificate (including the full cert chain) and the private key into the provided form, all in the same field:

$ cat fullchain.pem privkey.pem | pbcopy

A few seconds later, the new certificate was installed and being served on this domain.

Conclusion

The Let’s Encrypt process ended up being simpler than StartSSL, since there was no need to manually create the private key and certificate signing request: It’s all done with one command.

Redirecting to an External Server May Leak Tokens in Headers

While working on an HTTP API that serves binary files to client applications, I came upon some unexpected behavior.

Imagine that we have a /file/:id endpoint that, instead of responding with the binary, redirects to an external storage service, like AWS S3. Our endpoint is also protected, so that users need an access token. A typical request/response cycle:

$ curl --include --header "Authorization: Bearer SECRET_TOKEN" http://localhost:3000/file/12345
HTTP/1.1 302 Found
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Location: https://example.com/some-path
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
X-Request-Id: 8025fbf8-8513-401b-8ebc-32752cfd7c59
X-Runtime: 0.002428
Transfer-Encoding: chunked

<html><body>You are being <a href="https://example.com/some-path">redirected</a>.</body></html>

Now, let’s instruct curl to follow redirects and be more verbose so that we can see the headers sent in the requests, as well as the responses. I’ll omit some output (with ...) for clarity.

$ curl --verbose --location --header "Authorization: Bearer SECRET_TOKEN" http://localhost:3000/file/12345
> GET /file/12345 HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.43.0
> Accept: */*
> Authorization: Bearer SECRET_TOKEN
>
< HTTP/1.1 302 Found
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Location: https://example.com/some-path
< Content-Type: text/html; charset=utf-8
< Cache-Control: no-cache
< X-Request-Id: 4c2254fb-d5a5-46d2-9a8b-9cbfecc8b2ec
< X-Runtime: 0.002537
< Transfer-Encoding: chunked
<

> GET /some-path HTTP/1.1
> Host: example.com
> User-Agent: curl/7.43.0
> Accept: */*
> Authorization: Bearer SECRET_TOKEN
>
< HTTP/1.1 404 Not Found
...

curl, as requested, followed the redirect response, but in doing so, it included the original Authorization header in the request to another domain.[1] We have just leaked our secret and handed a third party a valid token to access our system. To be fair, after some thought, I think it’s reasonable for curl to interpret the header as one to be sent on all requests, since we are also telling it to follow redirects. From the manual:

WARNING: headers set with this option will be set in all requests - even after redirects are followed, like when told with -L, --location. This can lead to the header being sent to other hosts than the original host, so sensitive headers should be used with caution combined with following redirects.

Who does that?

curl’s behavior (sending explicitly set headers on redirects) was also observed in some other User Agents, notably the library used by one of our client applications. However, it doesn’t seem to be universal. For example, httpie does not leak the header:

$ http --verbose --follow http://localhost:3000/file/12345 "Authorization: Bearer SECRET_TOKEN"
GET /file/12345 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Authorization: Bearer SECRET_TOKEN
Connection: keep-alive
Host: localhost:3000
User-Agent: HTTPie/0.9.6



HTTP/1.1 302 Found
Cache-Control: no-cache
Content-Type: text/html; charset=utf-8
Location: https://example.com/some-path
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Request-Id: a53eb96b-f58c-4eb0-bbbd-bdca3eee8cc6
X-Runtime: 0.002384
X-XSS-Protection: 1; mode=block

GET /some-path HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: example.com
User-Agent: HTTPie/0.9.6



HTTP/1.1 404 Not Found
...

As you can see, the Authorization header is conspicuous by its absence in the second request.

Mitigation

Since we can’t predict the behavior of all the User Agents that will use our API, we can design the API differently on the server.

Use Token As a Parameter

If we are using OAuth2 (which my example implies, because of the use of a Bearer token), the specification allows the token to be passed as a URI Query Parameter named access_token. Since that makes it part of the original URL, it will certainly not be included by any client that follows redirection. However, I have seen this use flagged as risky by several security audits. One of the objections is that parameters in URLs are commonly written to logs and expose tokens unnecessarily.
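
For illustration, assuming the endpoint accepted the query parameter (hypothetical here, like the endpoint itself), the request carries no Authorization header at all, so there is nothing for the User Agent to forward on a redirect:

$ curl --location "http://localhost:3000/file/12345?access_token=SECRET_TOKEN"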

The OAuth2 specification also allows a Form-Encoded Body Parameter, also named access_token. This avoids making the token part of the URL, and it still won’t be sent on any redirect. However, the request must have an application/x-www-form-urlencoded content type, which may conflict with the rest of the application wanting application/json or similar.
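
Purely as a sketch, and assuming the endpoint accepted a form-encoded POST (a change from the GET endpoint in my example), such a request might look like this; --data makes curl send an application/x-www-form-urlencoded body:

$ curl --location --data "access_token=SECRET_TOKEN" http://localhost:3000/file/12345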

Use Basic Authentication

Basic Authentication is a method for a User Agent to provide credentials to the server (usually username and password). Most User Agents have good support for it and understand that its use is limited to the original URL.

$ curl --verbose --location --user SECRET_TOKEN: http://localhost:3000/file/12345
> GET /file/12345 HTTP/1.1
> Host: localhost:3000
> Authorization: Basic U0VDUkVUX1RPS0VOOg==
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 302 Found
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Location: https://example.com/some-path
< Content-Type: text/html; charset=utf-8
< Cache-Control: no-cache
< X-Request-Id: f3211d6c-4e77-448b-a44d-7ad080fe5d3f
< X-Runtime: 0.002391
< Transfer-Encoding: chunked
<

> GET /some-path HTTP/1.1
> Host: example.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
...

The U0VDUkVUX1RPS0VOOg== in the Authorization header above is the secret, Base64 encoded:

$ echo U0VDUkVUX1RPS0VOOg== | base64 --decode
SECRET_TOKEN:
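
Going the other direction shows how curl builds that value from the --user argument: the “username” (our token), followed by a colon and an empty password:

$ echo -n "SECRET_TOKEN:" | base64
U0VDUkVUX1RPS0VOOg==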

Don’t Redirect At All

Of course, redirecting is not the only option: your endpoint can act as a proxy, reading the contents from the external server and passing them along to the client. The penalty is that the client’s connection to your server stays open longer, consuming more compute resources and transferring more data than a redirect would.

Conclusion

Be careful when redirecting to external servers while using header-based authentication: some clients may forward those headers along to a third party.


  [1] We can ignore the 404 response. This is a made-up example, and it’s irrelevant how the external server actually responded.