Yesterday, however, netsurf drove me crazy for a while: I was developing
a web site, making sure it works with netsurf. This website has a
cookie-based persistent login feature, and that didn't work. I sent my
Set-Cookie headers all right – ngrep is your friend if you want to be
sure.
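If ngrep is not at hand, a few lines of Python give a rough second opinion from the client side. This is a minimal sketch and a different technique from sniffing the wire: the function name set_cookie_headers is made up for illustration, and it only shows what the server sends in reply to a plain GET, not what a browser then does with it.

```python
import urllib.request


def set_cookie_headers(url: str) -> list[str]:
    """Return the raw values of all Set-Cookie headers in the response to a GET."""
    with urllib.request.urlopen(url) as response:
        # response.headers behaves like an email.message.Message;
        # get_all() returns None when the header is absent.
        return response.headers.get_all("Set-Cookie") or []


# Hypothetical usage against a local development server:
#   for value in set_cookie_headers("http://localhost:8080/login"):
#       print("Set-Cookie:", value)
```

An empty list here would point at the server; a non-empty one, as in my case, points at the browser.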
But why did the cookies disappear? Cookie policy? Ha: netsurf does
accept a cookie from Google, and rejecting that would be the first thing
any reasonable policy would do. Did I perhaps fail to properly adhere
to the standards (which is another thing netsurf tends to uncover)? Hm:
looking up the cookie syntax spec gave me some confidence that I was
doing the right thing. Is my Max-Age ok? Sure, it is.
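One way to cross-check the syntax is to let a library that knows the cookie grammar build a comparable header and compare it with what the site emits. The following sketch uses Python's http.cookies module; the cookie name, value, and lifetime are invented for illustration and are not taken from the site discussed here.

```python
from http.cookies import SimpleCookie

# Build a cookie with an explicit lifetime (all values are hypothetical).
cookie = SimpleCookie()
cookie["session"] = "abc123"
cookie["session"]["max-age"] = 86400  # lifetime in seconds (one day)
cookie["session"]["path"] = "/"

# output() renders the cookie as a complete "Set-Cookie: ..." header line.
header = cookie.output()
print(header)
```

If the header a site sends has the same overall shape (name=value, then attributes separated by semicolons), the syntax is unlikely to be the culprit.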
The answer to this riddle: netsurf does not store cookies if it cannot
sort them into a hierarchy of host names, and it never can do that for
host names without dots (as in localhost, for instance). Given the
ill-thought-out Domain attribute one can set for cookies (see the spec
linked above if you want to shudder), I even have a solid amount of
sympathy for that behaviour.
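Reduced to its observable effect, the rule can be modelled in a couple of lines. This is only a sketch of the behaviour described here, not netsurf's actual code: the function name is hypothetical, and the model says nothing about what else netsurf checks for host names that do contain dots.

```python
def dropped_by_dotless_rule(host: str) -> bool:
    """Model of the described behaviour: a host name without any dot
    cannot be sorted into a host name hierarchy, so its cookies are
    not stored. Assumption: this captures only that one rule."""
    return "." not in host


# Example host names (made up, apart from localhost).
for host in ["localhost", "devbox", "dev.example.org"]:
    verdict = "dropped" if dropped_by_dotless_rule(host) else "not dropped by this rule"
    print(f"{host}: {verdict}")
```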
But given that this will probably bite a lot of people who care
enough about freedom to bother with netsurf, I am still a bit
surprised that my frantic querying of search engines on the matter did
not bring up netsurf's slightly unconventional cookie handling.
Let us hope this post's title will change that. Again, netsurf 3 will
not store cookies for any host name without dots in it, localhost
included. That is a bit inconvenient for development, and hence, despite
my sympathy, I am considering a bug report.
A Debugging Session
So, how did I figure this riddle out? The great thing about
Debian and halfway compact software like netsurf is that it makes it
reasonably simple to figure out such (mis-)features.
Since I firmly believe that the use of debuggers is a very basic skill
everyone touching a computer should have, let me give a brief
introduction here.
First, you need to get the package's source. Make sure it matches the
version of the program that you actually run; to do that, copy the
deb line in /etc/apt/sources.list for the repository the package
comes from (note that this could be the security repo if you got
updates from there). In the copied line, replace deb with
deb-src. In my case, that would be:
deb-src https://deb.debian.org/debian bullseye main
On a freshly installed Debian, it's likely you already have a line like
this; consider commenting out the deb-src lines when not working with
source code, as that will make your apt operations a bit faster.
After an apt update, I can now pull the source. To keep my file
system tidy, I put all such sources into children of a single
directory, perhaps /usr/src if you're old-school, or ~/src if
not:
cd
mkdir -p src/netsurf
cd src/netsurf
apt-get source netsurf-gtk
I'm creating the intermediate netsurf directory because apt-get
source creates four items in the current directory, and in case you're
actually building a package (which you could, based on this), more
entries will follow; keeping all that mess out of src itself helps a
lot. Note that apt-get source does not need any special privileges.
You really should run it as yourself.
By the way, this is the first point where monsters like webkit make this
kind of thing really strenuous: the libwebkit sources (which still lack
much of what makes a full browser) pull 26 megabytes of archive expanding
to a whopping 300 megabytes of source-ish goo.
To go on, enter the directory that apt-get source created; in my
case, that was netsurf-3.10. You can now look around, and something
like:
find . -name "*.c" | xargs grep "set-cookie"
quickly brought me to a file called netsurf/content/urldb.c (yeah,
you can use software like rgrep for „grep an entire tree“; but then the
find/xargs combo is useful for many other tasks, too).
Since I still suspected a problem when netsurf parses my set-cookie
header, the function urldb_parse_cookie in there caught my eye.
It's not pretty that that function is such an endless beast of
hand-crafted C (rather than a few lines of lex), but it's
relatively readable C, and they are clearly trying to accommodate some of
the horrible practices out there (which is probably the reason they're
not using lex), so just looking at the code cast increasing doubts on my
hypothesis of some minor standards breach on my end.
In this way, idly browsing the source code went nowhere, and I decided I
needed to see the thing in action. In order to not get lost in
compiled machine code while doing that, one needs debug symbols, i.e.,
information that tells a debugger what compiled stuff resulted from what
source code. Modern Debians have packages with these symbols in an
extra repository; you can guess the naming scheme from the
sources.list line one has to use for bullseye:
deb http://debug.mirrors.debian.org/debian-debug bullseye-debug main
After another round of apt update, you can install the package
netsurf-gtk-dbgsym (i.e., just append a -dbgsym to the name of
the package that contains the program you want to debug). Once that's
in, you can run the GNU debugger gdb:
gdb netsurf
which will drop you into a command line prompt (there's also a cool
graphical front-end to gdb in Debian, ddd, but for little things like
this I've found plain gdb to be less in my way). Oh, and be sure to do
that in the directory with the extracted sources; only then can gdb show
you the source lines (ok: you could configure it to find the sources
elsewhere, but that's rarely worth the effort).
Given we want to see what happens in the function
urldb_parse_cookie, we tell gdb to come back to us when the program
enters that function, and then to start the program:
(gdb) break urldb_parse_cookie
Breakpoint 1 at 0x1a1c80: file content/urldb.c, line 1842.
(gdb) run
Starting program: /usr/bin/netsurf
With that, netsurf's UI comes up and I can go to my cookie-setting page.
When I try to set the cookie, gdb indeed stops netsurf and asks me what
to do next:
Thread 1 "netsurf" hit Breakpoint 1, urldb_parse_cookie (url=0x56bcbcb0,
cookie=0xffffbf54) at content/urldb.c:1842
1842 {
(gdb) n
1853 assert(url && cookie && *cookie);
n (next) lets me execute the next source line (which I did here).
Other basic commands include print (to see values), list (to see
code), s (to step into functions, which n will just execute as
one instruction), and cont (which resumes execution).
In this particular debugging session, everything went smoothly, except
that I needed to skip over a loop that was boring to step through.
This is exactly what gdb's until command is for: typing it at
the end of the loop will fast forward over the remaining iterations and
then come back once the loop is finished (at which point you can see what
its net result is).
But if the cookie parsing went just fine, why doesn't netsurf send back my
cookie?
Well, tracing on after the function returned eventually led to this:
3889 suffix = nspsl_getpublicsuffix(dot);
(gdb)
3890 if (suffix == NULL) {
and a print(suffix) confirmed: suffix for localhost is NULL.
Looking at the source code (you remember the list command, and I usually
keep the source open in an editor window, too) confirms that this makes
netsurf return before storing the freshly parsed cookie, and a cookie
not stored is a cookie not sent back to the originating site. Ha!
You do not want to contemplate what such a session would look like with a
webkit browser or, worse, firefox or chromium, not to mention stuff you
don't have the source for …