Forced https Redirects Considered Harmful

I don't remember where I first saw the admonition that “not everything that does HTTP is a browser” – but I'd like to underscore it here. One corollary to this is:

Please do not unconditionally redirect to https!

People may have good reasons to choose unencrypted http, and sometimes they don't get to choose at all. That is true in particular for embedded systems (where https may be prohibitively large) and for machines where you cannot upgrade the SSL libraries, so that sooner or later the server no longer considers any of the ciphers you know safe.

Case in point: I have a command line program to query bahn.de (python3 version)…

Update (2022-09-04)

After many years of relative stability, the Bahn web page has significantly changed its markup, which broke this script. There is a new bahnconn now.

…which screen-scrapes the HTML pages that Deutsche Bahn's connection service hands out. I know bahn.de has a proper API, too, and I'm sure it would be a lot faster if I used it, but alas, my experiments with it were unpromising, with what's on the web working much better; perhaps I'll try again next time they change their HTML. But that's beside the point here.

The point is: In contrast to browsers capable of rendering bahn.de's HTML/javascript combo, this script runs on weak hardware like my Nokia N900. Unfortunately, the N900 is more or less frozen at the state of something like Debian Lenny, because its kernel has proprietary components that (or so I think) deal with actually doing phone calls, and hence I can't upgrade it beyond 2.6.29. That means I'm more or less stuck with Python 2.5 and an OpenSSL of that time; sure, I could start building a lot of that stuff from source, but eventually the libc is too old, and newer libcs require at least kernel 2.6.32. Since about a year ago, this old Python and OpenSSL have had no ciphers left that the bahn.de server accepts. But it redirects me to https nevertheless, and hence the whole thing breaks. For no good reason at all.

You see, encryption buys me nothing when querying train connections. The main privacy breach here is bahn.de storing the request, and there I'm far better off with my script, as that (at least if more people used it) is a lot more anonymous than my browser with all the cookies I let Deutsche Bahn put into it and all the javascript goo they feed it. I furthermore see zero risk in letting random people snoop my train routes individually and now and then. The state can, regrettably, ask Deutsche Bahn directly ever since the Ottokatalog of about 2002. There is less than zero risk of someone manipulating the bahn.de responses to get me on the wrong trains.

Now, I admit that when lots of people do lots of queries in the presence of adversarial internet service providers and other wire goblins, this whole reasoning will work out differently, and so it's probably a good idea to nudge unsuspecting muggles towards https. Well: That's easy to do without breaking things for wizards wishing to do http.

Doing it right

The mechanism for that is the Upgrade-Insecure-Requests request header that essentially all muggle browsers now send when they would prefer an encrypted response (don't confuse it with the upgrade-insecure-requests CSP directive). Redirecting only when that header is present does not lock out old clients while still giving muggles some basic semblance of crypto.
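As a rough illustration of the decision a server has to make, here is a minimal Python sketch. It is not how Apache or nginx implement this; the function name upgrade_redirect and its parameters are made up for this example. It redirects only when the request came in over plain http and the client sent Upgrade-Insecure-Requests: 1:

```python
def upgrade_redirect(scheme, host, path, headers):
    """Return (status, response_headers) for a redirect, or None to serve as-is.

    scheme is "http" or "https"; headers maps lower-cased request header
    names to their values.  All names here are hypothetical.
    """
    wants_upgrade = headers.get("upgrade-insecure-requests", "").strip() == "1"
    if scheme == "http" and wants_upgrade:
        return 307, {
            # Tell caches that the response depends on this request header.
            "Vary": "Upgrade-Insecure-Requests",
            "Location": "https://%s%s" % (host, path),
        }
    # Old clients, scripts and embedded devices keep getting plain http.
    return None


if __name__ == "__main__":
    print(upgrade_redirect("http", "example.org", "/a?b=1",
                           {"upgrade-insecure-requests": "1"}))
    print(upgrade_redirect("http", "example.org", "/a?b=1", {}))
```

The check on scheme mirrors the guard against redirect loops: a request that already arrived over https is never redirected, whatever headers it carries.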

And it's not hard to do, either. In Apache, you add:

<If "%{req:Upgrade-Insecure-Requests} == '1'">
  Header always set Vary Upgrade-Insecure-Requests
  Redirect 307 "/" "https://<your domain>/"
</If>

rather than the unconditional redirect you'd otherwise have; I suppose you can parameterise this rule so you don't even have to edit in your domain, but since I'm migrating towards nginx on my servers, I'm too lazy to figure out how. Oh, and you may need to enable mod_headers; on Debian, that would be a2enmod headers.

In nginx, you can have something like:

set $do_http_upgrade "$https$http_upgrade_insecure_requests";
location / {

  # (whatever you otherwise configure)

  if ($do_http_upgrade = "1") {
     add_header Vary Upgrade-Insecure-Requests;
     return 307 https://$host$request_uri;
  }
}

in your server block. The trick with the intermediate do_http_upgrade variable makes sure we don't redirect if we already are on https; browsers shouldn't send the header on https connections, but I've seen redirect loops without this trick (origin).
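To check that a configuration like this behaves as intended, one can request a page over plain http once with and once without the header and compare the results. The following is a sketch using only Python's standard library; the function name probe is made up, and the host you pass in is whatever server you are testing:

```python
import http.client


def probe(host, path="/", port=80, send_upgrade=False):
    """Return (status, Location header or None) for one plain-http GET.

    http.client does not follow redirects, so a 307 is reported as such.
    """
    headers = {}
    if send_upgrade:
        headers["Upgrade-Insecure-Requests"] = "1"
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Against a server set up as described, probe(host, send_upgrade=True) should come back with a 307 and an https Location, while probe(host) should return whatever the page normally answers over http.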

Browser considerations

Me, I am by now taking it as a sign of quality if a server doesn't force https redirects and instead honours upgrade-insecure-requests. For instance, that way I can watch what some server speaks with the Javascript it executes on my machine without major hassle, and that's something that gives me a lot of peace of mind (but of course it's rather rare these days). In celebration of servers doing it right, I've configured my browser – luakit – to not send upgrade-insecure-requests; where I consider https a benefit rather than a liability for my privacy, I can remember switching to it myself, thank you.

The way to do that is to drop a file no_https_upgrade_wm.lua into .config/luakit containing:

local _M = {}

-- Strip the Upgrade-Insecure-Requests header from every outgoing request.
luakit.add_signal("page-created",
    function(page)
        page:add_signal("send-request", function(p, _, headers)
            if headers["Upgrade-Insecure-Requests"] then
                headers["Upgrade-Insecure-Requests"] = nil
            end
        end)
    end)

return _M

(or fetch the file here). And then, in your rc.lua, write something like:

require_web_module("no_https_upgrade_wm")

...and for bone-headed websites?

In today's internet, it's quite likely that a given server will stink. As a matter of fact, since 1995, the part of the internet that stinks has consistently grown 20 percentage points[1] faster than the part that doesn't stink, which means that by now, essentially the entire internet stinks even though there's much more great stuff in it than there was in 1995: that's the miracle of exponential growth.

But at least for escaping forced https redirects, there is a simple fix: you can always run a reverse proxy to enable http on https-only services. I'm not 100% sure just how legal that is, but as long as you simply hand through traffic and it's not some page where cleartext on the wire can realistically hurt worse than the cleartext on the server side, I'd claim you're ethically in the green. So, to make the Deutsche Bahn connection finder work with Python 2.5, all that was necessary was a suitable host name, an nginx, and a config file like this:

server {
  listen 80;
  server_name bahnauskunft.tfiu.de;

  location / {
    proxy_pass https://reiseauskunft.bahn.de;
    proxy_set_header Host $host;
  }
}
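
On the client side, the script then only needs to talk to the proxy's host name over plain http instead of the original https one. Here is a sketch of such a URL rewrite in Python 3 (so it will not run as-is on Python 2.5); via_http_proxy and the PROXIES mapping are names invented for this illustration, and the example path is made up:

```python
from urllib.parse import urlsplit, urlunsplit

# Maps https-only hosts to the plain-http reverse proxy in front of them.
PROXIES = {
    "reiseauskunft.bahn.de": "bahnauskunft.tfiu.de",
}


def via_http_proxy(url):
    """Rewrite url to go through the http reverse proxy, if one is known."""
    parts = urlsplit(url)
    proxy_host = PROXIES.get(parts.hostname)
    if proxy_host is None:
        # No proxy configured for this host: leave the URL alone.
        return url
    return urlunsplit(
        ("http", proxy_host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(via_http_proxy("https://reiseauskunft.bahn.de/some/path?x=1"))
```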

[1] This figure is of course entirely made up<ESC>3bC only a conservative guess.
