PSA: netsurf 3 does not accept cookies from localhost

As I have already pointed out in April, I consider simple and compact web browsers a matter of freedom (well, Freedom as in speech, actually), and although there's been a bit of talk about ladybird lately, my favourite in this category still is netsurf, which apparently to this date is lean enough to be runnable on vintage 1990 Atari TT machines. I'll freely admit I have not tried it, but the code is there.

Yesterday, however, netsurf drove me crazy for a while: I was developing a web site, making sure it works with netsurf. This website has a cookie-based persistent login feature, and that didn't work. I sent my Set-Cookie headers all right – ngrep is your friend if you want to be sure, somewhat like this:

sudo ngrep -i -d lo cookie port 8888

Ngrep also clearly showed that netsurf really did not send any Cookie headers, so the problem wasn't on the cookie header parsing side of my program, either.

But why did the cookies disappear? Cookie policy? Ha: netsurf does accept a cookie from Google, and crunching this would be the first thing any reasonable policy would do. Did I perhaps fail to properly adhere to the standards (which is another thing netsurf tends to uncover)? Hm: looking up the cookie syntax spec gave me some confidence that I was doing the right thing. Is my Max-Age ok? Sure, it is.

The answer to this riddle: netsurf does not store cookies if it cannot sort them into a hierarchy of host names, and it never can do that for host names without dots (as in localhost, for instance). Given the ill-thought-out Domain attribute one can set for cookies (see the spec linked above if you want to shudder), I even have a solid amount of sympathy for that behaviour.

But given that that is something that will probably bite a lot of people caring about freedom enough to bother with netsurf, I am still a bit surprised that my frantic querying of search engines on that matter did not bring up the slightly unconventional cookie handling of netsurf. Let us hope this post's title will change that. Again, netsurf 3 will not store cookies for not only localhost but any host name without dots in it. Which is a bit inconvenient for development, and hence despite my sympathy I am considering a bug report.

Meanwhile, I've worked around the problem by adding:

127.0.0.1 victor.local.de

to my /etc/localhost (the name really doesn't matter as long as it will never clash with an actual site you want to talk to and it contains one or more dots) and access the site I'm developing as http://victor.local.de. Presto: my cookie comes back from netsurf all right.

A Debugging Session

So, how did I figure this riddle out? The great thing about Debian and halfway compact software like netsurf is that it makes it reasonably simple to figure out such (mis-) features. Since I firmly believe that the use of debuggers is a very basic skill everyone touching a computer should have, let me give a brief introduction here.

First, you need to get the package's source. Make sure it matches the version of the program that you actually run; to do that, copy the deb line in /etc/apt/sources.list for the repository the package comes from (note that this could be the security repo if you got updates from there). In the copied line, replace deb with deb-src. In my case, that would be:

deb-src https://deb.debian.org/debian bullseye main

On a freshly installed Debian, it's likely you already have a line like this; consider commenting out the deb-src lines when not working with source code, as that will make your apt operations a bit faster.

After an apt update, I can now pull the source. To keep your file system tidy, I put all such sources into children of a given directory, perhaps /usr/src if you're old-school, or ~/src if not:

cd
mkdir -p src/netsurf
cd src/netsurf
apt-get source netsurf-gtk

I'm creating the intermediate netsurf directory because apt-get source creates four items in the directory, and in case you're actually building a package (which you could, based on this), more entries will follow; keeping all that mess outside of src helps a lot. Note that apt-get source does not need any special privileges. You really should run it as yourself.

By the way, this is the first part where monsters like webkit make this kind of thing really strenuous: libwebkit sources (which still are missing much over a full browser) pull 26 megabytes of archive expanding to a whopping 300 Megabytes of source-ish goo.

To go on, enter the directory that apt-get source created; in my case, that was netsurf-3.10. You can now look around, and something like:

find . -name "*.c" | xargs grep "set-cookie"

quickly brought me to a file called netsurf/content/urldb.c (yeah, you can use software like rgrep for „grep an entire tree“; but then the find/xargs combo is useful for many other tasks, too).

Since I still suspected a problem when netsurf parses my set-cookie header, the function urldb_parse_cookie in there caught my eye. It's not pretty that that function is such an endless beast of hand-crafted C (rather than a few lines of lex[1]), but it's relatively readable C, and they are clearly trying to accomodate some of the horrible practices out there (which is probably the reason they're not using lex), so just looking at the code cast increasing doubts on my hypothesis of some minor standards breach on my end.

In this way, idly browsing the source code went nowhere, and I decided I needed to see the thing in action. In order to not get lost in compiled machine code while doing that, one needs debug symbols, i.e., information that tells a debugger what compiled stuff resulted from what source code. Modern Debians have packages with these symbols in an extra repository; you can guess the naming scheme from the apt.sources string one has to use for bullseye:

deb http://debug.mirrors.debian.org/debian-debug bullseye-debug main

After another round of apt update, you can install the package netsurf-gtk-dbgsym (i.e., just append a -dbgsym to the name of the package that contains the program you want to debug). Once that's in, you can run the GNU debugger gdb:

gdb netsurf

which will drop you into a command line prompt (there's also a cool graphical front-end to gdb in Debian, ddd, but for little things like this I've found plain gdb to be less in my way). Oh, and be sure to do that in the directory with the extracted sources; only then can gdb show you the source lines (ok: you could configure it to find the sources elsewhere, but that's rarely worth the effort).

Given we want to see what happens in the function urldb_parse_cookie, we tell gdb to come back to us when the program enters that function, and then to start the program:

(gdb) break urldb_parse_cookie
Breakpoint 1 at 0x1a1c80: file content/urldb.c, line 1842.
(gdb) run
Starting program: /usr/bin/netsurf

With that, netsurf's UI comes up and I can go to my cookie-setting page. When I try to set the cookie, gdb indeed stops netsurf and asks me what to do next:

Thread 1 "netsurf" hit Breakpoint 1, urldb_parse_cookie (url=0x56bcbcb0,
    cookie=0xffffbf54) at content/urldb.c:1842
1842  {
(gdb) n
1853    assert(url && cookie && *cookie);

n (next) lets me execute the next source line (which I did here). Other basic commands include print (to see values), list (to see code), s (to step into functions, which n will just execute as one instruction), and cont (which resumes execution).

In this particular debugging session, everything went smoothly, except I needed to skip over a loop that was boring to watch stepping through code. This is exactly what gdb's until command is for: typing it at the end of the loop will fast forward over the loop execution and then come back once the loop is finished (at which point you can see what its net result is).

But if the URL parsing went just fine: Why doesn't netsurf send back my cookie?

Well, tracing on after the function returned eventually lead to this:

3889      suffix = nspsl_getpublicsuffix(dot);
(gdb)
3890      if (suffix == NULL) {

and a print(suffifx) confirmed: suffix for localhost is NULL. Looking at the source code (you remember the list command, and I usually keep the source open in an editor window, too) confirms that this makes netsurf return before storing the freshly parsed cookie, and a cookie not stored is a cookie not sent back to the originating site. Ha!

You do not want to contemplate how such a session would look like with a webkit browser or, worse, firefox or chromium, not to mention stuff you don't have the source code for. So: Give your users freedom and let them use your web pages with simple browsers like netsurf.

[1]I'm rather sure lex – which basically generates finite state machines – would be enough for parsing set-cookie headers, as their grammar is probably regular (rather than context-free). Don't quote me on this, though: I've just briefly glanced over what's in the spec, and I may have missed a trap.

Zitiert in: SPARQL 2: Improvising a client

Kategorie: edv

Letzte Ergänzungen