Tag Messages

This tag collects solutions to computer problems by diagnostics that programmes emit. Since I'm entering error messages into search engines a lot more often than I care to admit, this is part of me giving back. And since these search engines are international, articles with this tag are generally in English.

  • Another Bookworm Regression: D-bus, X11 Displays, purple-remote, Oh My!

    When I reported on what broke when I upgraded to Debian bookworm, I overlooked that my jabber presence management (where I'm offline at night and on weekends) no longer worked. Figuring out why and fixing it was a dive into D-Bus and X11 that may read like a noir detective novel, at least if you are somewhat weird. Let me write it up for your entertainment and perhaps erudition.

    First off, against the March post, I have migrated to pidgin as my XMPP (“jabber”) client; at its core, presence management still involves a script in /etc/network/if-*.d where I used to call something like:

    su $DESKTOP_USER -c "DISPLAY=:0 purple-remote getstatus"
    

    whenever a sufficiently internetty network interface went up or down, where DESKTOP_USER contains the name under which I'm running my X session (see below for the whole script with the actual presence-changing commands).

    Purple-remote needs to run as me because it should use my secrets rather than root's. But it was the DISPLAY=:0 thing that told purple-remote how to connect to the pidgin instance to interrogate and control. As most boxes today, mine is basically a single-user machine (at least as far as “in front of the screen” goes), and hence guessing the “primary” X display is simple and safe.

    Between X11 and the D-Bus

    That purple-remote needed the DISPLAY environment variable was actually almost a distraction from the start. There are many ways for Unix programs to talk to each other, and DISPLAY might have pointed towards 1980ies-style X11 inter-client communication. But no, the purple-remote man page alreads says:

    This program uses DBus to communicate with Pidgin/Finch.

    Correctly spelled D-Bus, this is one of the less gruesome things to come out of the freedesktop.org cauldron, although it is still riddled with unnecessarily long strings, unnecessarily deep hierarchies, and perhaps even unnecessary use of XML (though I feel sympathies in particular for that last point).

    But that's not what this post is about. I'm writing this because after upgrading to Debian bookworm, purple-remote no longer worked when used from my if-up.d script. Executing the command in a root shell (simulating how it would be called from ifupdown) showed this:

    # DESKTOP_USER=anselm su $DESKTOP_USER -c "DISPLAY=:0 purple-remote getstatus"
    No existing libpurple instance detected.
    

    A quick glance at the D-Bus Specification gives a hint at how this must have worked: dbus-launch – which is usually started by your desktop environment, and my case by a:

    export $(dbus-launch --exit-with-x11)
    

    in ~/.xinitrc – connects to the X server and leaves a “property” (something like a typed environment variable attached to an X11 window) named _DBUS_SESSION_BUS_ADDRESS in, ah… for sure the X server's root window [careful: read on before believing this]. As the property's value, a D-Bus client would find a path like:

    unix:path=/tmp/dbus-1cAbvsX6FD,guid=795a0d...
    

    and it could open that socket to talk to all other D-Bus clients started within the X session.

    Via apropos to xprop to Nowhere

    So… Does that property exist in the running X server? Hm. Can I figure that out without resorting to C programming? Let's ask the man page system:

    $ apropos property
    [..lots of junk...]
    xprop (1)            - property displayer for X
    [...]
    

    Typing in man xprop told me I was on the right track:

    $ man xprop
    
    SYNOPSIS
         xprop  […] [format [dformat] atom]*
    
    SUMMARY
      The xprop utility is for displaying window and font properties in an
      X server.
    
    OPTIONS
      […]
      -root   This argument specifies that X's root window is the target win‐
              dow.   This  is  useful  in situations where the root window is
              completely obscured.
    

    So, let's see:

    $ xprop -root _DBUS_SESSION_BUS_ADDRESS
    _DBUS_SESSION_BUS_ADDRESS:  not found.
    

    Hu? Has dbus-launch stopped setting the property? Let's inspect Debian's change log; a major change like that would have to be noted there, wouldn't it? Let's first figure out which package to look at; the documentation then is in /usr/share/doc/<packagename>:

    $ dpkg -S dbus-launch
    dbus-x11: /usr/bin/dbus-launch
    $ zless /usr/share/doc/dbus-x11/changelog.Debian.gz
    

    Looking for “property” or “BUS_ADDRESS” in there doesn't yield anything; that would make it unlikely that the property was somehow dropped intentionally. I have to admit I had halfway expected that, with something like “for security reasons”. But then if someone can read your root window's properties, access to your session bus is probably the least of your problems.

    Still, perhaps someone is slowly dismantling X11 support on grounds that X11 is kinda uncool? Indeed, you can build dbus-launch without X11 support. If the Debian maintainers built it that way, the respective strings should be missing in the binary, but:

    $ strings `which dbus-launch` | grep _DBUS_SESSION
    _DBUS_SESSION_BUS_PID
    _DBUS_SESSION_BUS_ADDRESS
    _DBUS_SESSION_BUS_SELECTION_
    

    No, that's looking good; dbus-launch should still set the properties.

    Skimming the Docs is Not Reading the Docs.

    If I did not see the property a moment ago, perhaps I have used xprop the wrong way? Well, actually: I didn't read the D-Bus spec properly, because what it really says is this:

    For the X Windowing System, the application must locate the window owner of the selection represented by the atom formed by concatenating:

    • the literal string "_DBUS_SESSION_BUS_SELECTION_"
    • the current user's username
    • the literal character '_' (underscore)
    • the machine's ID

    – and then find the _DBUS_SESSION_BUS_PID on the window owning that selection. The root window thing was my own fantasy.

    If you bothered to skim the ICCCM document I linked to above, you may recognise the pattern: that's just conventional X inter-client communication – no wonder everyone prefers D-Bus.

    This is beyond what I'd like to do in the shell (though I wouldn't be surprised if xdotool had a hack to make that feasible). I can at least establish that dbus-launch still produces what the spec is talking about, because the “atoms” – a sort of well-known string within the X server and as a concept probably part of why folks are trying to replace X11 with Wayland – are all there:

    $ xlsatoms | grep DBUS
    488   _DBUS_SESSION_BUS_SELECTION_anselm_d162...
    489   _DBUS_SESSION_BUS_ADDRESS
    490   _DBUS_SESSION_BUS_PID
    

    The Next Suspect: libdbus

    Given that, dbus-launch clearly is exonerated as the thing that broke. The next possible culprit is purple-remote. It turns out that's a python program:

    $ grep -i dbus `which purple-remote`
    import dbus
        obj = dbus.SessionBus().get_object("im.pidgin.purple.PurpleService", "/im/pidgin/purple/PurpleObject")
    purple = dbus.Interface(obj, "im.pidgin.purple.PurpleInterface")
                data = dbus.Interface(obj, "org.freedesktop.DBus.Introspectable").\
    

    So, this is using the python dbus module. Let's see if its changelog says anything about dropping X11 support:

    $ zless /usr/share/doc/python3-dbus/changelog.Debian.gz
    

    Again, nothing for X11, property, or anything like that. Perhaps we should have a brief look at the code:

    $ cd /some/place/for/source
    $ apt-get source python3-dbus
    […]
    dpkg-source: info: extracting dbus-python in dbus-python-1.3.2
    […]
    $ cd dbus-python-1.3.2/
    

    You will see that the python source is in a subdirectory called dbus. Let's see if that talks about our property name:

    $ find . -name "*.py" | xargs grep _DBUS_SESSION_BUS_ADDRESS
    $
    

    No[1]. Interestingly, there's no mention of X11 either. Digging a bit deeper, however, I found a C module dbus_bindings next to the python code in dbus. While it does not contain promising strings (X11, property, SESSION_BUS…) either, that lack made me really suspicious, since at least the environment variable name should really be visible in the source. The answer is in the package's README: “In addition, it uses libdbus” – so, that's where the connection is being made?

    Another Red Herring

    That's a fairly safe bet. Let's make sure we didn't miss something in the libdbus changelog:

    $ zless /usr/share/doc/libdbus-1-3/changelog.Debian.gz
    

    You will have a déjà-vu if you had a look at dbus-x11's changelog above: the two packages are built from the same source and hence share a Debian changelog. Anyway, again there are no suspicious entries. On the contrary: An entry from September 2023 (red-hot by Debian stable standards!) says:

    dbus-user-session: Copy XDG_CURRENT_DESKTOP to activation environment. Previously this was only done if dbus-x11 was installed. This is needed by various freedesktop.org specifications…

    I can't say I understand much of what this says, but it definitely doesn't look as if they had given up on X11 just yet. But does that library still contain the property names?

    $ dpkg -L libdbus-1-3
    […]
    /lib/i386-linux-gnu/libdbus-1.so.3
    […]
    $ strings /lib/i386-linux-gnu/libdbus-1.so.3 | grep SESSION_BUS
    DBUS_SESSION_BUS_ADDRESS
    $
    

    No, it doesn't. That's looking like a trace of evidence: the name of the environment variable is found, but there's nothing said of the X11 property. If libdbus evaluated that property, it would stand to reason that it would embed its name somewhere (though admittedly there are about 1000 tricks with which it would still do the right thing without the literal string in its binary).

    Regrettably, that's another red herring. Checking the libdbus from the package in bullseye (i.e., the Debian version before bookworm) does not yield the property …

  • How to Pin a Wifi Access Point in Debian – and Why You Probably Don't Want to in Lufthansa Planes

    A vertical gradient from black to light blue, lots of unfilled template variables in double curly braces in white.

    That's what you see in Lufthansa's onboard wifi when you don't let just about anyone execute client-side Javascript on your machine. See below for a more useful URI in the onboard wifi.

    I have already confessed I was flying recently (albeit only in German). What was new versus the last time I've been in a plane five years ago[1]: Not only did wifi signals apparently no longer confuse the aircraft's navigation systems but there was actually an onboard wifi network with no less than seven access points within my machine's range.

    Somewhat surprisingly, I had a hard time getting a connection that would not break after a few seconds. I'll confess that's not the first time I've had trouble connecting to fancy networks recently, where the syslog contained cryptic messages like:

    kernel: wlan0: deauthenticated from <redacted> (Reason: 252=<unknown>)
    kernel: wlan0: disassociated from <redacted> (Reason: 1=UNSPECIFIED)
    

    In all these cases, there were a lot of access points with the same ESSID around, and so I suspect whatever selects the access points is currently broken on my machine; it chooses really weak access points and then gets badly mangled signals. While I'm waiting for this to heal by itself, I am resorting to manually picking and pinning the access points. In case you use ifupdown to manage your wifi, perhaps this little story is useful for you, too.

    The first part is to pick an access point. To do that, I ignore the warning of the authors of iw (from the eponymous package) not to parse its output and run:

    sudo iw wlan0 scan | egrep "^BSS|signal: .*dBm|SSID:"
    

    Nachtrag (2023-11-02)

    Well, in non-plane situations it's wise to get the SSIDs, too, so you see which APs actually are for the network you want to join. Hence, I've updated the grep in the command line above.

    The output of this looked like this on the plane I was in:

    BSS 00:24:a8:83:37:93(on wlan0)
            signal: -68.00 dBm
    BSS 00:24:a8:ac:1d:93(on wlan0)
            signal: -41.00 dBm
    BSS 00:24:a8:83:37:82(on wlan0)
            signal: -62.00 dBm
    BSS 00:24:a8:ac:1d:82(on wlan0)
            signal: -48.00 dBm
    BSS 00:24:a8:83:37:91(on wlan0)
            signal: -60.00 dBm
    BSS 00:24:a8:83:76:53(on wlan0)
            signal: -76.00 dBm
    BSS 00:24:a8:83:77:e2(on wlan0)
            signal: -82.00 dBm
    

    The things after the “BSS” are the MAC addresses of the access points, the numbers after signal is some measure for the power that reaches the machine's antenna[2] from that access point, where less negative means more power. So, with the above output you want to pick the access point 00:24:a8:ac:1d:93.

    With ifupdown, you do that by editing the stanza for that Wifi and add a wireless-ap line; for me, this then looks like:

    iface roam inet dhcp
      wireless-essid Telekom_FlyNet
      wireless-ap 00:24:a8:ac:1d:93
    

    – and this yields a stable connection.

    I must say, however, that the services on that network (I'm too stingy for actual internet access, of course) are a bit lacking, starting with the entirely botched non-Javascript fallback (see above). At least there is http://services.inflightpanasonic.aero/inflight/services/flightdata/v1/flightdata where you will see some basic telemetry in JSON. Or wait: it's actually perimetry if you see speed, height, and other stuff for the plane you're on.

    Fetching the numbers from the json you will save a lot of power versus the web page that becomes extremely network-chatty and CPU-hoggy (on webkit, at least) once you let Lufthansa execute Javascript. I'm afraid I have too much flight shame (and hence too little use for it) to cobble something nice together with that API and qmapshack. But it certainly looks like a fun project.

    [1]Ah wait… now that I think again, I seem to remember that during one of my last sinful travels there has already been a plane that had on-board Wifi. But it certainly is a nicer story with the little lie of news when coming back after five years.
    [2]Since “dBm” stands for „decibel milliwatt“, you could compute that power as 10s ⁄ 10  W. I'd not trust the absolute numbers, as they would indicate here that one access point is a factor of ten thousand stronger than another one, which sounds implausible primarily because I'd be surprised if the circuitry of the Wifi card could deal with such a high dynamic range. And “I'm getting 0.0001 milliwatts from the AP“ is a statement in dire need of interpretation anyway (e.g., “in the carrier? Bolometric?”). But let's not go there.
  • mdns-scan complains that IP_ADD_MEMBERSHIP failed

    Last weekend I had to use a video projector via Google cast or chromecast or whatever it's called this month – it was mounted at the ceiling and was unreachable by cables.

    What I could work out about Google cast from a few web searches sounded like it should be simple: encode what's on the local screen to a video and then transmit that to some more or less bespoke endpoint through – I think – Secure Reliable Transport, a UDP-based protocol for which there's a Debian package called srt-tools.

    Whether or not that's roughly right, what I failed to answer is: Where do you transmit to? It seems the way to figure that out is to ask zeroconf alias Bonjour the right questions, and that in turn seems to require multicasting DNS-ish requests and then collecting responses from devices that reply to these multicasts. Aw! If only avahi – the usual mDNS implementation on Linux – wasn't among the first things I purge from machines if I find it.

    While trying to nevertheless cobble something together that would tell me where to send my stream to I got an interesting error message when I experimentally ran mdns-scan:

    IP_ADD_MEMBERSHIP failed: No such device
    

    This was while I was connected to the projector's built-in Wifi access point. And I didn't have the foggiest idea what the thing was saying. Search engines didn't bring up satisfying explanations (although there was some unspecific mumbling about “routes”). So, I straced the thing to see what syscalls it does before giving up:

    $ strace mdns-scan
    [the dynamic linker in action]
    ugetrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
    munmap(0xf7f13000, 132486)              = 0
    socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
    setsockopt(3, SOL_IP, IP_MULTICAST_TTL, [255], 4) = 0
    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    bind(3, {sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("224.0.0.251")}, 16) = 0
    setsockopt(3, SOL_IP, IP_ADD_MEMBERSHIP, {imr_multiaddr=inet_addr("224.0.0.251"), imr_interface=inet_addr("0.0.0.0")}, 12) = -1 ENODEV (No such device)
    write(2, "IP_ADD_MEMBERSHIP failed: No suc"..., 41IP_ADD_MEMBERSHIP failed: No such device
    ) = 41
    close(3)                                = 0
    exit_group(1)                           = ?
    

    – failing after so few syscalls is actually brilliant. Try strace on your average web browser or even your python interpreter and you know what I mean.

    And while I don't really know much about multicasting, this gave me an idea what was going on. You see, the projector hadn't set a default route. My box's routing table was simply:

    $ ip route
    192.168.2.0/24 dev wlan0 proto kernel scope link src 192.168.2.32
    

    I guess that's rather typical, and that's why I'm writing this: I'd expect other people trying Google cast or Airplay to projectors may run into that same problem.

    The reason this is a problem is that mdns-scan wants to (I think; don't believe me without researching further) subscribe to the address 224.0.0.251 via some network interface. That particular IP address looks less crazy than it is, because it's a multicast address, which makes it mean something rather special, and this one is special special because it basically means “I want to see multicast DNS packets floating around on the local network” (and send them to everyone using the same router as I do). Saying this means that the machine has to have an idea where to send packets to, and with the routing table the projector's DHCP server set up, it felt it didn't know that. I have to admit I have not quite worked out just why it felt that, but I'm rather confident that's why the setsockopt above failed.

    In that special situation – when you are not connected to the internet anyway – it is safe to just set the default route to the projector:

    $ ip route add default via 192.168.2.1 dev wlan0
    

    (where you will probably have to change the IP address to whatever your projector binds to; it's almost certainly the address of the DHCP server on the projector's access point, which you'll find in your syslog). This is enough to let mdns-scan do its multicasting magic, and indeed it now yielded a handle for the chromecasting service.

    But that still was not enough to figure out where to srt-live-transmit my video stream to, and hence I didn't get to improvise screen mirroring in the time I had before the event started. I eventually projected using a Windows (well: at least not Android…) box with a silly chromecast dongle that came with the projector and had some nasty driver software for the dongle on some built-in USB mass storage.

    Regrettably (or fortunately, if you want), I don't have access to the device any more, so I cannot continue my chromecast hacking. If you are aware of a little script that does the zeroconf queries and connection setup in a few lines, please let me know: It would be nice to be prepared when next I encounter one of these beasts.

  • OpenSSL, Syslog, and Unexpected Consequences of Usrmerge: Upgrading to bookworm

    A few weeks after the release of Debian bookworm, I have recently dist-upgraded my main, ah well, workstation, too. As mentioned in my bullseye upgrade post, that box's file system is ancient, and the machine does many things in perhaps unusual ways, which includes still booting with sysvinit rather than systemd for quite a few reasons. Hence, it always brings up the some interesting upgrade probl^H^H^H^H^Hchallenges. While for bullseye, the main… um… challenge for me was the migration to python3, this time the big theme was dropped crypto engines.

    Rsyslogd, wmnet

    Much more harmless than those, but immediately visible after the upgrade, was that my syslog display remained empty. The direct reason was that the rsyslog daemon was not running. The reason for that, in turn, was that there was not even a init script for it in /etc/init.d, let alone rc.d links to it. But the rsyslogd package was installed. What would the purpose be of installing a daemon package without an init script?

    The Debian bug tracker had something like an answer: the maintainer took it out, presumably to shed files they considered cruft in the age of systemd. Although I have to concur with Naranyan's remark in the bug report that rsyslog will typically be in place exactly when systemd (with its own log daemon) is not, at least that bug (#1037039) offers the (simple) fix: Install the orphan-sysvinit-scripts package.

    Something a bit harder to explain is that the nice wmnet applet for monitoring transfers on network interfaces came up blank after the upgrade. This is fixed by passing a -n option to it, which tells it to draw into a normal window rather than something suitable for the Windowmaker dock. Wmnet (as perhaps other Windowmaker applets, too) tries to guess where to draw based on some divination. Perhaps my window manager sawfish started to pretend it's Windowmaker in bookworm? Or indicated to wmnet in some other way it was living in a Windowmaker dock? Hm: Given that the last changelog entry on sawfish itself is from 2014 (which I consider a sign of quality), that seems unlikely, but then I can't bring myself to investigate more closely.

    The usr Merge and Bashism in the Woodwork

    Although I had merged the root and usr file systems on that box last time I migrated to a new machine, I had postponed doing the usrmerge thing (i.e., making the content of /bin and /usr/bin identical) on the box until the last possible moment – that is, the bookworm installation – because I had a hunch some hack I may have made 20 years ago would blow up spectacularly.

    None did. Except… it turned out I had linked /bin/sh to /bin/bash for some long-forgotten and presumably silly reason; if you had asked me before the upgrade, I'd have confidently claimed that of course all my little glue scripts are executed by Debian's parsimonious dash rather than the relatively lavish bash. Turns out: they weren't.

    With the installation of the usrmerge package during the bookworm dist-upgrade that is over. /bin/sh is now dash as I had expected it to be all the time. I have to admit I am a bit disappointed that I do not notice any difference in system snappiness at all.

    But I did notice that plenty of my scripts were now failing because they contained a bashism: Comparison for string equality in POSIX-compliant [ ... ] constructs is not the C-like == but the SQL-like = even though bash accepts both. I don't know when I forgot this (or, for that matter, whether I ever knew it), but a dozen or so of my (often rather deeply embedded) shell scripts started to fail with messages like:

    script name: 22: [: tonline: unexpected operator
    

    So, repeat after me: In shell scripts, compare strings with = and numbers with -eq. And I have to admit that this experience made me a bit more sympathetic to the zero shell paradigm behind systemd. But for the record: I still think the advantages of having hooks for shell scripts almost everywhere overall outweigh these little annoyances.

    The OpenSSL Upgrade

    With the bookworm upgrade, a fair number of hashes and ciphers were declared “legacy” in openssl, which means that in the default configuration, it will reject them. That had a few rather disruptive consequences: For one, I needed to update a few snake-oil certificates I had generated for playing with https on my box.

    Also, fetchmail failed for a POP server I had configured with a message like:

    fetchmail: <hostname> SSL connection failed.
    fetchmail: socket error while fetching from <whatever>
    

    I was puzzled for a while until I realised that the recipe said:

    with proto TLS1
    

    That was probably valuable in, like, 2004, to suppress ancient (relatively) easily breakable SSL versions, but by now it didn't let fetchmail negotiate crypto that was still allowed by openssl. Removing the proto TLS1 fixed that problem.

    The most unnerving breakage, however, was that my preferred disk crypto, encfs (cf. this advocacy in German), broke for some volumes I had created more than a decade ago: they failed to mount because openssl now refuses (I think) the blowfish cipher. I fiddled around a bit with re-enabling legacy algorithms as per Debian bug 1014193 but quickly lost my patience with the slightly flamboyant semantics of openssl.cnf. To my surprise, downgrading to encfs_1.9.5-1+b2_i386.deb from bullseye (by briefly re-adding the sources.list lines) let me mount the old volumes again. I then simply created new encfs volumes and rsync -av-ed from the old decrypted volume into the new decrypted volume. Finally, after unmounting everything encfs, I overwrote the old encrypted volumes with the new encrypted volumes and upgraded back to bookworm encfs.

    Since I can't explain why downgrading encfs would have fixed the problem as I've analysed it and hence suspect that a part of my analysis (and fix) is wrong, I'd strongly recommend to run:

    encfsctl info <encrypted volume>
    

    on each encfs directory you have before the upgrade. If it says something like:

    Filesystem cipher: "ssl/blowfish", version 2:1:1 (using 3:0:2)
    

    or even just:

    Version 5 configuration; created by EncFS 1.2.5 (revision 20040813)
    

    (where I have not researched the version where encfs defaults became acceptable for bookworm openssl; 1.9 is ok, at any rate), copy over the decrypted content into a newly created encfs container; it's quick and easy.

    Relatedly, bookworm ssh also disallows a few crypto methods now deemed insecure by default, in particular SHA-1 hashes for host keys. Now, I have to connect to a few hosts I cannot upgrade (either because I'm not root or because they are stuck on some ancient kernel because of proprietary kernel components). For these, when trying to connect I now get messages like this:

    Unable to negotiate with 192.168.20.21 port 22: no matching host key type found. Their offer: ssh-rsa,ssh-dss
    

    You could reasonably argue I should discard boxes of that type. On the other hand, nobody will spend 50'000 Euro to eavesdrop on my communications with these machines[1] – that's the current estimate for producing a hash collision for an ssh host key, which this is about. Hence, I'm happy to risk man-in-the-middle attacks for these machines.

    To deal with such situations, openssh lets you selectively re-allow SHA-1 hashes on RSA host keys. Helpfully, /usr/share/doc/openssh-client/NEWS.Debian.gz gives a recipe to save those hosts; put host stanzas like:

    Host ancient-and-unupdatable.some.domain
      HostKeyAlgorithms=+ssh-rsa
      PubkeyAcceptedKeyTypes +ssh-rsa
    

    into ~/.ssh/config (and do read ssh_config (5) if you are not sure what I'm talking about, regardless of whether or not you have this particular problem). Incidentally, just to save that one machine where you forgot to update your ancient DSA public key, you can for a brief moment change the second line to:

    PubkeyAcceptedKeyTypes +ssh-rsa,ssh-dsa
    

    If you don't have an RSA key yet, create one (ssh-genkey -t rsa) – RSA keys work even on the most venerable openssh installations that don't yet know about the cool ed25519 keys. Connect to the server, install the RSA public key, and re-remove the ssh-dsa part in the config again.

    Kudos to the openssh maintainers for keeping compatibility even in crypto over more than 20 years. And shame on many others – including me – who don't manage to do that even in non-crypto software.

    Terrible Font Rendering in Swing

    One of the more unexpected breakages after the upgrade was that some Java Swing (a once-popular GUI toolkit) applications suddenly had terribly jagged fonts, such as my beloved TOPCAT:

    Part of a screenshot of a menu with horribly jaggy letters

    I cannot exactly say why this looks so terrible[2]. Perhaps in the age of 300 dpi displays font hinting – which is supposed to avoid overly jagged pixelisation when rendering vector fonts at low resolutions – has become out of fashion, perhaps OpenJDK now …

  • Fixing “libqca-ossl is missing”

    In all honesty, I don't expect many people who might profit from this post will ever see the message below. But since common web searches don't yield anything for it (yet), I figure I should at least document it can happen. I also want to praise kwallet's author(s) because whatever went wrong yielded what turned out to be a rather useful error message rather than a spectacular crash:

    createDLGroup failed: maybe libqca-ossl is missing
    

    Here's what lead up to it: in Debian bookworm, my old Mastodon client tootle started crashing when viewing images. Its development has moved to a new client called Tuba, and even though that is not packaged yet I figured I might as well move on now rather than fiddle with tootle. Tuba, however, needs a password manager more sophisticated than the PGP-encrypted text file I use otherwise. So I bit the bullet and installed kwalletmanager; among the various password managers, it seemed to have the most reasonable dependencies.

    With that, Tuba can do the oauth dance it needs to be able to communicate with the server. But when it tries to save the oauth token it gets from the Mastodon instance, I got the error message above. Tuba can still talk to the the server, but once the session is over, the oauth token is lost, and the next time I start Tuba, I have to do the oauth dance again.

    Fixing the error seemed simple enough:

    $ apt-file search libqca-ossl
    libqca-qt5-2-plugins: /usr/lib/i386-linux-gnu/qca-qt5/crypto/libqca-ossl.so
    $ sudo apt install libqca-qt5-2-plugins
    

    – as I said: kwallet's is a damn good error message. Except the apt install has not fixed the problem (which is why I bother to write this post). That's because kwalletmanager starts a daemon, and that daemon is not restarted just because the plugins are installed.

    Interestingly, just killing that daemon didn't seem to fix the problem; instead, I had to hit “Close“ in kwalletmanager explicitly and then kill the daemon (as in killall kwalletd):

    Screenshot: kdewallet with a close button and two (irrelevant) tabs.

    I give you that last part sounds extremely unlikely, and it's possible that I fouled something up the first time I (thought I) killed kwalletd. But if you don't want to do research of your own: Just hit Close and relax.

    You could also reasonably ask: Just what is this “ossl” thing? Well… I have to admit that password wallets rank far down in my list of interesting software categories, and hence I just gave up that research once nothing useful came back when I asked Wikipedia about OSSL.

  • What to do when github eats 100% CPU in luakit

    I can't help it: As probably just about every other programming life form on this planet I have to be on github now and then. Curse the network effect and all those taking part in it (which would by now include me).

    Anyway, that's why the last iteration of luakit bug #972 (also on github. Sigh) bit me badly: as long as the browser is on a github page, it will spend a full 100% of a CPU on producing as many error messages as it can, each reading:

    https://github.githubassets.com/<alphabet soup>1:8116:
    CONSOLE JS ERROR Unhandled Promise Rejection:
    TypeError: undefined is not an object (evaluating 'navigator.clipboard.read')
    

    Github being a commercial entity I figured it's a waste of time trying to fill in a bug report. And the problem didn't fix itself, either.

    So, I went to fix it (in a fashion) with userscript. Since the problem apparently is that some github code doesn't properly catch a missing (or blacklisted) clipboard API in a browser (and I still consider blacklisting that API an excellent idea), I figured things should improve when I give github something similar enough to an actual clipboard. It turns out it does not need to be terribly similar at all. So, with a few lines of Javascript, while github still sucks, at least it doesn't eat my CPU any more.

    What do you need to do? Just create a userscript like this (for luakit; other browsers will have other ways):

    cd
    mkdir -p .local/share/luakit/scripts
    cat > .local/share/luakit/scripts/github.user.js
    

    Then paste the following piece of Javascript into the terminal:

    // ==UserScript==
    // @name          clipboard-for-github
    // @namespace     http://blog.tfiu.de
    // @description   Fix github's 100% CPU usage due to unhandled clipboard errors
    // @include       https://github.com*
    // ==/UserScript==
    navigator.clipboard = Object()
    navigator.clipboard.read = function() {
            return "";
    }
    

    As usual with this kind of thing, at least have a quick glance at what this code does; these four lines of source code sufficient here at least are easy to review. Finish off with a control-D, go to a luakit window and say :uscripts-reload.

    If you then go to, say bug #972, your CPU load should stay down. Of course, as long as github blindly tries to use the navigator.clipboard object for „copy link“-type operations, these still won't work. But that's now github's problem, not mine.

    And anyway: Give up Github.

  • How to Block a USB Port on Smart Hubs in Linux

    Lots of computer components (a notebook computer with its cover removed

    Somewhere beneath the fan on the right edge of this image there is breakage. This post is about how to limit the damage in software until I find the leisure to dig deeper into this pile of hitech.

    My machine (a Lenovo X240) has a smart card reader built in, attached to its internal USB. I don't need that device, but until a while ago it did not really hurt either. Yes, it may draw a bit of power, but I'd be surprised if that were more than a few milliwatts or, equivalently, one level of screen backlight brightness; at that level, not even I will bother.

    However, two weeks ago the thing started to become flaky, presumably because the connecting cable is starting to rot. The symptom is that the USB stack regularly re-registers the device, spewing a lot of characters into the syslog, like this:

    Aug 20 20:31:51 kernel: usb 1-1.5: USB disconnect, device number 72
    Aug 20 20:31:51 kernel: usb 1-1.5: new full-speed USB device number 73 using ehci-pci
    Aug 20 20:31:52 kernel: usb 1-1.5: device not accepting address 73, error -32
    Aug 20 20:31:52 kernel: usb 1-1.5: new full-speed USB device number 74 using ehci-pci
    Aug 20 20:31:52 kernel: usb 1-1.5: New USB device found, idVendor=058f, idProduct=9540, bcdDevice= 1.20
    Aug 20 20:31:52 kernel: usb 1-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=0
    Aug 20 20:31:52 kernel: usb 1-1.5: Product: EMV Smartcard Reader
    Aug 20 20:31:52 kernel: usb 1-1.5: Manufacturer: Generic
    Aug 20 20:31:53 kernel: usb 1-1.5: USB disconnect, device number 74
    Aug 20 20:31:53 kernel: usb 1-1.5: new full-speed USB device number 75 using ehci-pci
    [as before]
    Aug 20 20:32:01 kernel: usb 1-1.5: new full-speed USB device number 76 using ehci-pci
    Aug 20 20:32:01 kernel: usb 1-1.5: New USB device found, idVendor=058f, idProduct=9540, bcdDevice= 1.20
    Aug 20 20:32:01 kernel: usb 1-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=0
    [as before]
    Aug 20 20:32:02 kernel: usb 1-1.5: USB disconnect, device number 76
    

    And that's coming back sometimes after a few seconds, sometimes after a few 10s of minutes. Noise in the syslog is never a good thing (even when you don't scroll syslog on the desktop), as it will one day obscure something one really needs to see, and given that device registrations involve quite a bit of computation, this also is likely to become relevant power-wise. In short: this has to stop.

    One could just remove the device physically or at least unplug it. Unfortunately, in this case that is major surgery, which in particular would involve the removal of the CPU heat sink. For that I really want to replace the thermal grease, and I have not been to a shop that sells that kind of thing for a while. So: software to the rescue.

    With suitable hubs – the X240's internal hub with the smart card reader is one of them – the tiny utility uhubctl lets one cut power to individual ports. Uhubctl regrettably is not packaged yet; you hence have to build it yourself. I'd do it like this:

    sudo apt install git build-essential libusb-dev
    git clone https://github.com/mvp/uhubctl
    cd uhubctl
    prefix=/usr/local/ make
    sudo env prefix=/usr/local make install
    

    After that, you have a program /usr/local/sbin/uhubctl that you can run (as root or through sudo, as it needs elevated permissions) and that then tells you which of the USB hubs on your system support power switching, and it will also tell you about devices connected. In my case, that looks like this:

    $ sudo /usr/local/sbin/uhubctl
    Current status for hub 1-1 [8087:8000, USB 2.00, 8 ports, ppps]
      Port 1: 0100 power
      [...]
      Port 5: 0107 power suspend enable connect [058f:9540 Generic EMV Smartcard Reader]
      [...]
    

    This not only tells me the thing can switch off power, it also tells me the flaky device sits on port 5 on the hub 1-1 (careful inspection of the log lines above will reconfirm this finding). To disable it (that is, power it down), I can run:

    $ sudo /usr/local/sbin/uhubctl -a 0 -l 1-1 -p 5
    

    (read uhubctl --help if you don't take my word for it).

    Unfortunately, we are not done yet. The trouble is that the device will wake up the next time anyone touches anything in the wider vicinity of that port, as for instance run uhubctl itself. To keep the system from trying to wake the device up, you also need to instruct the kernel to keep its hands off. For our port 5 on the hub 1-1, that's:

    $ echo disabled > /sys/bus/usb/devices/1-1.5/power/wakeup
    

    or rather, because you cannot write to that file as a normal user and I/O redirection is done by your shell and hence wouldn't be influenced by sudo:

    $ echo disabled | sudo tee /sys/bus/usb/devices/1-1.5/power/wakeup
    

    That, indeed, shuts the device up.

    Until the next suspend/resume cycle that is, because these settings do not survive across one. To solve that, arrange for a script to be called after resume. That's simple if you use the excellent pm-utils. In that case, simply drop the following script into /etc/pm/sleep.d/30killreader (or so) and chmod +x the file:

    #!/bin/sh
    case "$1" in
      resume|thaw)
        echo disabled > /sys/bus/usb/devices/1-1.5/power/wakeup
        /usr/local/sbin/uhubctl -a 0 -l 1-1 -p 5
        ;;
    esac
    exit 0
    

    If you are curious what is going on here, see /usr/share/doc/pm-utils/HOWTO.hooks.gz.

    However, these days it is rather unlikely that you are still leaving suspend and hibernate to pm-utils; instead, on your box this will probably be handled by systemd-logind. You could run pm-utils next to that, I suppose, if you tactfully configured the host of items with baroque names like HandleLidSwitchExternalPower in logind.conf, but, frankly, I wouldn't try that. Systemd's reputation for wanting to manage it all is not altogether undeserved.

    I have tried to smuggle in my own code into logind's wakeup procedures years ago in systemd's infancy and found it hard if not impossible. I'm sure it is simpler now. If you know a good way to make logind run a script when resuming: Please let me know. I promise to amend this post for the benefit of people running systemd (which, on some suitable boxes, does include me).

  • SPARQL 2: Improvising a client

    A Yak on a mountain path, watching the observer

    There is still a lot of hair on the Yak I am shaving in this little series of posts on SPARQL. All the Yaks shown in the series lived on the Valüla Mountain in Vorarlberg, Austria.

    This picks up my story on figuring out whether birthdays are dangerous using SPRAQL on Wikidata. You can probably skip this part if you're only interested in writing SPARQL queries to Wikidata and are happy with the browser form they give you. But you shouldn't. On both accounts.

    At the end of part one, I, for one, was unhappy about the Javascript-based UI at Wikidata and had decided I wanted a user interface that would let me edit my queries in a proper editor (in particular, locally on my machine, giving me the freedom to choose my tooling).

    My browser's web inspector quickly showed me that the non-Javascript web UI simply sent a query argument to https://query.wikidata.org/sparql. That's easy to do using curl, except I want to read the argument from a file (that is, the one I am editing in my vi). Helpfully, curl's man page informs on the --form option:

    This enables uploading of binary files etc. To force the 'content' part to be a file, prefix the file name with an @ sign. To just get the content part from a file, prefix the file name with the symbol <. The difference between @ and < is then that @ makes a file get attached in the post as a file upload, while the < makes a text field and just get the contents for that text field from a file.

    Uploads, Multipart, Urlencoded, Oh My!

    In this case, Wikidata probably does not expect actual uploads in the query argument (and the form does not submit it in this way), so < it ought to be.

    To try it, I put:

    SELECT ?p ?o
    WHERE {
      wd:Q937 ?p ?o.
    }
    LIMIT 5
    

    (the query for everything Wikidata says about Albert Einstein, plus a LIMIT clause so I only pull five triples, both to reduce load on Wikidata and to reduce clutter in my terminal while experimenting) into a file einstein.rq. And then I typed:

    curl --form query=<einstein.rq https://query.wikidata.org/sparql
    

    into my shell. Soberingly, this gives:

    Not writable.
    

    Huh? I was not trying to write anything, was I? Well, who knows: Curl, in its man page, says that using --form does a POST with a media type of multipart/form-data, which many web components (mistakenly, I would argue) take as a file upload. Perhaps the remote machinery shares this misconception?

    Going back to the source of https://query.wikidata.org/, it turns out the form there does a GET, and the query parameter hence does not get uploaded in a POST but rather appended to the URL. Appending to the URL isn't trivial with curl (I think), but curl's --data option at least POSTs the parameters in application/x-www-form-urlencoded, which is what browsers do when you don't have uploads. It can read from files, too, using @<filename>. Let's try that:

    curl --data query=@einstein.rq https://query.wikidata.org/sparql
    

    Oh bother. That returns a lenghty message with about a ton of Java traceback and an error message in its core:

    org.openrdf.query.MalformedQueryException: Encountered " <LANGTAG> "@einstein "" at line 1, column 1.
    Was expecting one of:
        "base" ...
        "prefix" ...
        "select" ...
        "construct" ...
        "describe" ...
        "ask" ...
    

    Hu? Apparently, my query was malformed? Helpfully, Wikidata says what query it saw: queryStr=@einstein.rq. So, curl did not make good on its promise of putting in the contents of einstein.rq. Reading the man page again, this time properly, I have to admit I should have expected that: “if you start the data with the letter @“, it says there (emphasis mine). But haven't I regularly put in query parameters in this way in the past?

    Sure I did, but I was using the --data-urlencode option, which is what actually simulates a browser and has a slightly different syntax again:

    curl --data-urlencode query@einstein.rq https://query.wikidata.org/sparql
    

    Ha! That does the trick. What comes back is a bunch of XML, starting with:

    <sparql xmlns='http://www.w3.org/2005/sparql-results#'>
      <head>
        <variable name='p'/>
        <variable name='o'/>
      </head>
      <results>
        <result>
          <binding name='p'>
            <uri>http://schema.org/version</uri>
          </binding>
          <binding name='o'>
            <literal datatype='http://www.w3.org/2001/XMLSchema#integer'>1692345626</literal>
          </binding>
        </result>
    

    Making the Output Friendlier: Turtle?

    Hm. That's not nice to read. I thought: Well, there's Turtle, a nice way to write RDF triples in plain text. In RDF land, people rather regularly support the HTTP accept header, a wildly underused and cool feature of HTTP that lets a client say what kind of data it would like to get (see Content negotiation in the Wikipedia). So, I thought, perhaps I can tell Wikidata to produce Turtle using accept?

    This plan looks like this when translated to curl:

    curl --header "accept: text/turtle" \
      --data-urlencode query@einstein.rq https://query.wikidata.org/sparql
    

    Only, the output does not change, Wikidata ignores my request.

    Thinking again, it is well advised to do so (except it could have produced a 406 Not Acceptable response, but that would probably be even less useful). The most important thing to remember from part one is that RDF talks about triples of subject, predicate, and object. In SPARQL, you have a SELECT clause, which means a result row in general will not consist of subject, predicate, and object. Hence, the service couldn't possibly return results in Turtle: What does not consist of RDF triples canot be serialised as RDF triples.

    Making the Output Friendlier: XSLT!

    But then what do I do instead to improve result readability? For quick and (relatively) easy XML manipulation on the command line, I almost always recommend xmlstarlet. While I give you its man page has ample room for improvement, and compared to writing XSL stylesheets, the command line options of xmlstarlet sel (use its -h option for explanations) are somewhat obscure, but it just works and is compact.

    If you inspect the response from Wikidata, you will notice that the results come in result elements, which for every variable in your SELECT clause have one binding element, which in turn has a name attribute and then some sort of value in its content; for now, I'll settle for fetching either uri or literal (again, part one has a bit more on what that might mean). What I need to tell xmlstarlet thus is: “Look for all result elements and produce one output record per such element. Within each, make a name/value pair from a binding's name attribute and any uri or literal element you find.” In code, I furthermore need to add an XML prefix definition (that's totally orthogonal to RDF prefixes). With the original curl and a pipe, this results in:

    curl --data-urlencode query@einstein.rq https://query.wikidata.org/sparql \
    | xmlstarlet sel -T -N s="http://www.w3.org/2005/sparql-results#" -t \
      -m //s:result --nl -m s:binding -v @name -o = -v s:uri -v s:literal --nl
    

    Phewy. I told you xmlstarlet sel had a bit of an obscure command line. I certainy don't want to type that every time I run a query. Saving keystrokes that are largely constant across multiple command invocations is what shell aliases are for, or, because this one would be a bit long and fiddly, shell functions. Hence, I put the following into my ~/.aliases (which is being read by the shell in most distributions, I think; in case of doubt, ~/.bashrc would work whenever you use bash):

    function wdq() {
      curl -s --data-urlencode "query@$1" https://query.wikidata.org/sparql
      | xmlstarlet sel -T -N s="http://www.w3.org/2005/sparql-results#" -t \
        -m //s:result --nl -m s:binding -v @name -o = -v s:uri -v s:literal --nl
    }
    

    (notice the $1 instead of the constant file name here). With an exec bash – my preferred way to get a shell to reflecting the current startup scripts –, I can now type:

    wdq einstein.rq | less
    

    and get a nicely paged output like:

    p=http://schema.org/version
    o=1692345626
    
    p=http://schema.org/dateModified
    o=2022-07-31T01:52:04Z
    
    p=http://schema.org/description
    o=ލިޔުންތެރިއެއް
    
    p=http://schema.org/description
    o=ಗಣಿತಜ್ಞ
    
    p=http://schema.org/description
    o=भौतिकशास्त्रातील नोबेल पारितोषिकविजेता शास्त्रज्ञ.
    

    We will look at how to filter out descriptions in languagues one can't read, let alone speak, in the next instalment.

    For now, I'm reasonably happy with this, except of course I'll get many queries wrong initially, and then Wikidata does not return XML at all. In that case, xmlstarlet produces nothing but an unhelpful error message of its own, because it …

  • 'Failed to reset ACL' with elogind: Why?

    As I've blogged the other day, I like having my machine's syslog on the screen background so I notice when the machine is unwell and generally have some idea what it thinks it is doing. That also makes me spot milder distress signals like:

    logind-uaccess-command[30337]: Failed to reset ACL on /dev/bus/usb/002/061: Operation not supported
    

    I've ignored those for a long time since, for all I can see, logind-like software does nothing that on a normal machine sudo and a few judicious udev rules couldn't do just as well – and are doing on my box. The only reason there's elogind (a logind replacement that can live without systemd) on my box is because in Debian, kio – which in bullseye 270 packages depend upon – depends upon something like logind. The complaints in the syslog thus came from software I consider superfluous and I'd rather not have at all, which I felt was justification enough to look the other way.

    But then today curiosity sneaked in: What is going on there? Why would whatever elogind tries break on my box?

    Well, the usual technique of pasting relevant parts of the error message into some search engine leads to elogind PR #47 (caution: github will run analytics on your request). This mentions that the message results from a udev rule that tries to match hotplugged devices with users occupying a “seat”[1]. The rule calls some binary that would make sure that the user on the “seat” has full access to the device without clobbering system defaults (e.g., that members of the audio group can directly access the sound hardware) – and to keep the others out[2]. The Unix user/group system is not quite rich enough for this plan, and hence a thing called POSIX ACLs would be used for it, a much more complicated and fine-grained way of managing file system access rights.

    Well, the udev rules mentioned in the bug indeed live on my box, too, namely in /lib/udev/rules.d/73-seat-late.rules, which has the slightly esoteric:

    TAG=="uaccess", ENV{MAJOR}!="", RUN{program}+="/lib/elogind/elogind-uaccess-command %N $env{ID_SEAT}"
    

    I frankly have not researched what exactly adds the uaccess tag that this rule fires on, and when it does that, but clearly it does happen in Debian bullseye. Hence, this rule fires, and thus the failing elogind-uaccess-command is started.

    But why does it fail? Well, let's see what it is trying to do. The great thing about Debian is that as long as you have a (proper) deb-src line in your /etc/apt/sources.list, you can quickly fetch the source code of anything on your box:

    cd /usr/src  # well, that's really old-school.  These days, you'll
                 # probably have your sources somewhere else
    mkdir elogind # apt-get source produces a few files
    cd elongind   # -- keep them out of /usr/src proper
    apt-get source elogind
    cd <TAB>  # there's just one child directory
    

    To see where the source of the elongind-uaccess-command would be, I could have used a plain find, but in cases like these I'm usually lazy and just recursively grep for sufficiently specific message fragments, as in:

    find . -name "*.c" | xargs grep "reset ACL"
    

    This brings up src/uaccess-command/uaccess-command.c, where you'll find:

    k = devnode_acl(path, true, false, 0, false, 0);
    if (k < 0) {
             log_full_errno(errno == ENOENT ? LOG_DEBUG : LOG_ERR, k, "Failed to reset ACL on %s: %m", path);
             if (r >= 0)
                     r = k;
     }
    

    Diversion: I like the use of the C ternary operator to emit a debug or error message depending on whether or not things failed because the device file that should have its ACL adapted does not exist.

    So, what fails is a function called devnode_acl, which does not have a manpage but can be found in login/logind-acl.c. There, it calls a function acl_get_file, and that has a man page. Quickly skimming it would suggest the prime suspect for failures would be the file system, as that may simply not support POSIX ACLs (which, as I just learned, aren't really properly standardised). Well, does it?

    An apropos acl brings up the chacl command that would let me try acls out from the shell. And indeed:

    $ chacl -l /dev/bus/usb/001/003
    chacl: cannot get access ACL on '/dev/bus/usb/001/003': Operation not supported
    

    Ah. That in fact fails. To remind myself what file system we are talking about, I ran mount | grep "/dev " (the trailing blank on the search pattern is important), which corrected my memory from “it's a tmpfs” to “it's a devtmpfs”; while it turns out that the difference between the two does not matter for the problem at hand, your average search engine will bring up the vintage 2009 patch at https://lwn.net/Articles/345480/ (also from the abysses from which systemd came) when asked for “devtmpfs acl”, and a quick skim of that patch made me notice:

    #ifdef CONFIG_TMPFS_POSIX_ACL
    (something)
    

    This macro comes from the kernel configuration. Now, I'm still building the kernel on my main machine myself, and looking at the .config in my checkout of the kernel sources confirms that I have been too cheap to enable POSIX ACLs on my tmpfses (for a machine with, in effect, just a single user who's only had contact with something like POSIX ACLs ages ago on an AFS, that may be understandable).

    Well, I've enabled it and re-built my kernel, and I'm confident that after the next reboot the elogind messages will be gone. And who knows, perhaps the thing may actually save me a custom udev rule or two in the future because it automagically grants me access to whatever I plug in.

    Then again: Given there's now an API for Javascript from the web to read USB devices (I'm not making this up) and at least so far I'm too lazy to patch that out of my browsers… perhaps giving me (and hence these browsers) that sort of low-level access is not such a good idea after all?

    [1]See Multiseat on Wikipedia if you have no idea what I'm talking about. If you've read that you can probably see why I consider logind silly for “normal” computers with either a single user or lots of users coming in through the network.
    [2]Mind you, that in itself is totally reasonable: it would suck if everyone on a machine could read the USB key you've just plugged into a terminal; except that it's a rare configuration these days to have multiple persons share a machine that anyone but an administrator could plug anything into.
  • Mutt says: “error encrypting data: Unusable public key”

    Today, I replied to an encypted mail, and right after the last “yes, go ahead, send this stuff already”, my mail client mutt showed an error:

    error encrypting data: Unusable public key
    

    Hu? What would “unusable” mean here? The message when all PGP keys are expired looks quite a bit different. And indeed, the key in question was not expired at all:

    $ gpg --list-keys person@example.net
    pub   rsa4096/0xDEEEEEEEEEEEEEEE 2015-03-21 [SCA] [expires: 2023-02-01]
          FINGERPINTWITHHELDFINGERPRINTWITHHELDFIN
    uid                   [  full  ] Person <person@example.net>
    

    – this should do for another year or so. Or should it?

    Feeding the message to a search engine brings up quite a few posts, most of them from times when keyservers would mess up subkeys, i.e., the cryptographic material that is used to actually encrypt stuff (as opposed to the main key that usually just authenticates these subkeys).

    This obviously did not apply here, since keyservers have long been fixed in this respect. But subkeys were the right hint. If you compare the output above with what such a command will output for the feedback key for this blog:

    $ gpg --list-keys zuengeln@tfiu.de
    pub   rsa3072/0x6C4D6F3882AF70AD 2021-01-28 [SC]
          60505502FB15190B10DBF1436C4D6F3882AF70AD
    uid                   [ultimate] Das Engelszüngeln-Blog <zuengeln@tfiu.de>
    sub   rsa3072/0x3FCFC394D8DF7140 2021-01-28 [E]
    

    you'll notice that the Person's key above does not have a sub line, i.e., there are no subkeys.

    How can that happen? Gnupg won't create such a thing without serious amounts of coercion, and such a key is largely useless.

    Well, it turns out it doesn't happen. The subkeys are there, gnupg just hides them because that's what it does with expired subkeys by default. If you override that default, you'll get:

    $  gpg --list-options show-unusable-subkeys --list-keys person@example.net
    pub   rsa4096/0xDEEEEEEEEEEEEEEE 2015-03-21 [SCA] [expires: 2023-02-01]
          FINGERPINTWITHHELDFINGERPRINTWITHHELDFIN
    uid                   [  full  ] Person <person@example.net>
    sub   rsa4096/0xEEEEEEEEEEEEEEEE 2015-02-01 [E] [expired: 2020-01-31]
    sub   elg4096/0xEEEEEEEEEEEEEEEE 2020-02-01 [E] [expired: 2022-01-31]
    

    So, that's the actual meaning of the error message about „Unusable public key“: “No usable subkey”.

    What's a fix for that? Well, for all I know you cannot force gnupg to encrypt for an expired key, so the way to temporarily fix things (for instance, to tell people make their keys permanent[1]) is to turn the clock. There's the nice program faketime that just changes the time for whatever runs below it. That's great because on modern computers, changing the system time has all kinds of ugly side effects (not to mention you'd have to kill the ntpd that your computer quite likely runs to keep your computer's clock synchronised with the rest of the world).

    Since I'm using mutt as a mailer, I'd use faketime like this:

    faketime 2022-01-31 mutt
    

    I'm fairly confident this would work with, say, thunderbird as well, though it might be a problem if the times of an X server and client are dramatically different.

    But that's really no substitute for an updated key: In most people's mailboxes, such mails will be way down in the swamp of rotting mails from one month ago[2] And mail servers sometimes refuse to transport mail that's so far from the past.

    Then again, to my own surprise, everytime I had to go to such extremes because I didn't have a non-expired key, the recipients eventually noticed.

    [1]Let me again advertise non-expiring keys. The main arguments for these are that (a) essentially nobody directly attacks keys, so it really doesn't matter if a key is used for a decade or more, and (b) PGP is hard enough for muggles even without auto-destructing keys. The net effect of expiring keys on privacy is thus negative, because they keep people off using PGP and even trying to understand crypto. And you can always revoke keys, in particular when we have educated people to now and then sync their keyring with keywervers.
    [2]As a side note: While inbox zero sounds to much like one of those market-radical self-improvement fads to me, I've been religious about less-than-a-page inbox for the past decade or so and found it did improve a relevant part of my life.
  • Wakealarm: Device or resource busy

    The other day I wanted a box doing regular (like, daily) file system backups and really not much else to switch off while idle and then wake up for the next backup. Easy, I thought, install the nvram-wakeup package and that's it.

    Alas, nvram-backup mumbled something about an unsupported BIOS that sounded suspiciously like a lot of work that would benefit almost nobody, as the box in question houses an ancient Supermicro board that's probably not very common any more.

    So, back to the roots. Essentially any x86 box has an rtc that can wake it up, and Linux has had an interface to that forever: Cat a unix timestamp (serialised to a decimal number) into /sys/class/rtc/rtc0/wakealarm, as discussed in the kernel documentation's sysfs-class-rtc file:

    (RW) The time at which the clock will generate a system wakeup event. This is a one shot wakeup event, so must be reset after wake if a daily wakeup is required. Format is seconds since the epoch by default, or if there's a leading +, seconds in the future, or if there is a leading +=, seconds ahead of the current alarm.

    That doesn't tell the full story, though. You see, I could do:

    BACKUP_AT="tomorrow 0:30"
    echo `date '+%s' -d "$BACKUP_AT"` > /sys/class/rtc/rtc0/wakealarm
    

    once, and the box came back, but when I then tried it again, the following happened:

    echo `date '+%s' -d "$BACKUP_AT"` > /sys/class/rtc/rtc0/wakealarm
    bash: echo: write error: Device or resource busy
    

    Echoing anything with + or += did not work either; I have not tried to ascertain why, but suspect that's functionality for more advanced RTC chips.

    Entering the error message into a search engine did bring up a lkml thread from 2007, but on lmkl.iu.edu the thread ends with an open question: How do you disable the wakealarm? Well: the obvious guess of echo "" does not work. My second guess, however, did the trick: You reset the kernel wakealarm by writing a 0 into it:

    echo 0 > /sys/class/rtc/rtc0/wakealarm
    

    – after which it is ready to be written to again.

    And now that I've written this post I notice that the 2007 thread indeed goes on, as on narkive, and a bit further down, Tino summed up this entire article as:

    Please note that you have to disable the old alarm first, if you want
    to set a new alarm. Otherwise, you get an error. Example:
    
    echo 12345 > /sys/class/rtc/rtc0/wakealarm
    echo 0 > /sys/class/rtc/rtc0/wakealarm
    echo 23456 > /sys/class/rtc/rtc0/wakealarm
    

    Ah well. Threading is an important feature in mail clients, even if they're just archives.

  • Fixing "No sandbox user" the Right Way

    I'm setting up an ancient machine – a Pentium M box with a meme 256 MB of RAM – with current Debian bullseye, and I'm impressed that that still works: this machine is almost 20 years old. Hats off to the Debian folks.

    But that's not really my story. Instead, this is about fixing what's behind the message:

    No sandbox user '_apt' on the system, can not drop privileges
    

    from apt. As you probably have just done, my first reaction was to feed that message to a search engine.

    Quite a few pages were returned, and all I looked at suggested to simply create the user using one of the many ways a Debian box has for that. That is not totally unreasonable, but it does not really address the underlying cause, and hence I thought I should do better.

    The immediately underlying cause is that for whatever deeper reason a maintainer script – shell scripts that Debian packages run after installing packages or before removing them – has not properly run; that is usually the place where packages create users and do similar housekeeping. Just creating the user may or may not be enough, depending on what else the maintainer script would have done.

    Hence, the better way to fix things is to re-run the maintainer script, as that would either run the full routine or at least give an error message that lets you figure out the deeper cause of the problem. Dpkg runs the maintainer script(s) automatically when you re-install the package in question.

    But what is that “package in question” that should have created the user? You could guess, and in this particular case your guess would quite likely be right, but a more generally applicable technique is to simply see what script should have created the user. That's not hard to do once you know that the maintainer scripts are kept (next to other package metadata) in /var/lib/dpkg/info/; so, with GNU grep's -r (recursive) option, you can run:

    grep -lr "_apt" /var/lib/dpkg/info/
    

    which gives the names of all files containing _apt in files below that directory. On my box, that is:

    /var/lib/dpkg/info/python3-apt.md5sums
    /var/lib/dpkg/info/libperl5.32:i386.symbols
    /var/lib/dpkg/info/apt.postinst
    /var/lib/dpkg/info/python3-apt.list
    

    Ah-ha! The string is mentioned in the post-installation script of the apt package. Peeking inside this file, you see:

    if [ "$1" = 'configure' ]; then
            # add unprivileged user for the apt methods
            adduser --force-badname --system --home /nonexistent  \
                --no-create-home --quiet _apt || true
    fi
    

    So: this really tries to create the user when the package is being configured, but it ignores any errors that may occur in the process (the || true). That explains why the system installation went fine and I got the warnings later (rather than a hard error during the installation).

    Just re-configuring the apt package would therefore be enough to either fix things or at least see an error message. But really, unless it's a huge package I tend to save on brain cycles and just run apt reinstall, which in this particular case leads to the somewhat funky command line:

    apt reinstall apt
    

    For me, this fixed the problem – and I've not bothered to fathom why the user creation failed during initial system setup. If you've seen the same problem and still have a record of the installation, perhaps you could investigate and file a bug if necessary?

  • Mailman3: "Cannot connect to SMTP server localhost on port 25"

    I've been a fairly happy mailman user for about 20 years, and I ran mailman installations for about a decade in the 2000s.

    Over the last week or so, I've spent more time setting up a mailman3 list off and on than I've spent with mailman guts in all the years before, which includes recovery form one or two bad spam attacks. Welcome to the brave new world of frameworks and microservices.

    Perhaps the following words of warning can help other mailman3 deployers to not waste quite as much time.

    Badly Misleading Error Messages

    Most importantly, whatever you do, never call mailman as root. This will mess up permissions and lead to situations really hard to debug. In particular, the error message from the post's title:

    Cannot connect to SMTP server localhost on port 25
    

    apparently can have many reasons (or so the recipes you find on the net suggest), few of which have anything to do with SMTP, but one clearly is when mailman can't read or write to queue files or templates or whatever and bombs out while trying to submit mail.

    Morale: Don't claim too much when writing error messages in your programs.

    Unfortunately, I've fixed the thing accidentally, so I can't say what exactly broke. The take away still is that, in Debian (other installations' mailman users might be called something else) you run mailman like this:

    sudo -u list mailman
    

    However, I can now say how to go about debugging problems like these, at least when you can afford a bit of mailman unavailability. First, stop the mailman3 daemon, because you want to run the thing in the foreground. Then set a breakpoint in deliver.py by inserting, right after def deliver(mlist, msg, msgdata), something like:

    import pdb; pdb.set_trace()
    

    Assuming Debian packaging, you will find that file in /usr/lib/python3/dist-packages/mailman/mta.

    Of course, you'll now need to talk to the debugger, so you'll have to run mailman in the foreground. To do that, call (perhaps adapting the path):

    sudo -u list /usr/lib/mailman3/bin/master
    

    From somewhere else, send the mail that should make it to the mail server, and you'll be dropped into the python debugger, where you can step until where the thing actually fails. Don't forget to remove the PDB call again, as it will itself cause funky errors when it triggers in the daemonised mailman. Of course, apt reinstall mailman3 will restore the original source, too.

    Template Management Half-Broken

    When I overrode the welcome message for a mailing list, the subscription notifications to the subscribing users came out empty.

    This time, there was at least something halfway sensible in the log:

    requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://localhost/postorius/api/templates/list/kal.sofo-hd.de/list:user:notice:welcome
    

    Until you read up on the mailman3 system of managing templates (which, roughly, is: store URIs from where to pull them), it's a bit mystifying why mailman should even try this URI. Eventually, one can work out that when you configure these templates from Postorius, it will take the URI at which mailman should see it, Postorius, from POSTORIUS_TEMPLATE_BASE_URL in /etc/mailman/mailman-web.py. This is preconfigured to the localhost URI, which proabably only rarely is right.

    To fix it, change that setting to:

    POSTORIUS_TEMPLATE_BASE_URL = 'http://<your postorious vserver>/postorius/api/templates/'
    

    Of course it'll still not work because the old, wrong, URI is still in mailman's configuration. So, you'll have to go back to the template configuration in Postorius and at least re-save the template. Interestingly, that didn't seem to fix it for me for reasons I've not bothered to fathom. What worked was deleting the template and re-adding it. Sigh.

    As soon as you have more than one template, I expect it's faster to change the URIs directly in mailman's database, which isn't hard, as seen in the next section.

    [Incidentally: does anyone know what the dire warnings in the docs about not using sqlite3 on “production” systems actually are about?]

    Disable Emergency Moderation After Moving

    Basically because I was hoping to get a more controlled migration, I had set one list on the old server to emergency moderation before pulling the config.pck. Don't do that, because at least as of now mailman3 has the notion of emergency moderation but makes it hard to switch it on or off. I eventually resorted to directly touching mailman's config database (if you've configured mailman to use something else than sqlite, the shell command is different, but the query should be the same):

    $ sudo -u list sqlite3 /var/lib/mailman3/data/mailman.db
    [on the sqlite prompt:]
    update mailinglist set emergency=0 where list_id='<your list id>';
    

    Note that <your list id> has a dot instead of the at, so if your list is mylist@example.org, its id is mylist.example.org.

    Oh No, CSRF Token

    The list I cared about most could be joined from an external web site, transparently posting to mailman2's cgi-bin/mailman/subscribe (oh! CGI! How am I missing you in the age of uwsgi and Django!). Looking at its counterpart for modern mailman3, the first thing I noted is that there's a CSRF token in it – if you've not encountered them before, it's a couple of bytes the originating server puts into a web form to prevent what Postorius' authors feels is Cross Site Request Forgery.

    Of course, what I wanted was exactly that: Post to Postorius from a different web site. I don't feel that's forgery, very frankly.

    I didn't see an obvious way to turn it off, and I was a bit curious about mailman3's own http API, so I wrote a few lines of code to do this; the API part itself was straightforward enough, something like:

    result = requests.post(
      getConfig("mailmanAPI")+"/members", {
        'list_id': getConfig("mailmanListname"),
        'subscriber': toSubscribe,
        'pre_verified': False,
        'pre_confirmed': False,
        'pre_approved': True,},
      auth=(getConfig("mailmanAPIUser"),
        getConfig("mailmanAPIPassword")),
      timeout=1)
    

    – but of course it sucks a bit that subscribing someone requires the same privilege level as, say, creating a mailing list or changing its description. And all that just to work around CSRF prevention. Sigh.

    On top of that, I've tried K-SAT on the pre_X booleans to try and see if anything gives me the tried and tested workflow of “let folks enter a mail address, send a confirmation link there, subscribe them when it's being clicked“. No luck. Well, let's hope the pranksters don't hit this server until I figure out how to do this.


    Hm. I think I'm a bit too locked in into mailman to migrate away, but I have to say I wish someone would port mailman2 to python3 and thus let mailman2 hang on essentially forever. It did all a mailing list manager needs to do as far as I am concerned, and while it wasn't pretty with the default browser stylesheets, even now, almost a decade into mailman3, it works a whole lot more smoothly.

    Or perhaps there's a space for a new mailing list manager with a trivially deployable web interface not requiring two separate database connections? Perhaps such a thing exists already?

    Well, summing up, the central migration advice again: Mind the sudo option in

    sudo -u list mailman import21 my-list@example.org config.pck
    
  • OpenSSL: get_name: no start line?

    As part of my DIY mail server project, the other day I put a POP3 server on that box – solid-pop3d if you want to know –, and since that server doesn't have SSL built in, I configured stunnel to provide that, re-using a certificate I get for mail.tfiu.de's https server from letsencrypt. Trivial configuration:

    [spop3d]
    accept=995
    connect=110
    cert=/etc/stunnel/mail.pem
    

    And bang!, an error message from stunnel:

    [ ] Loading private key from file: /etc/stunnel/mail.pem
    [!] error queue: 140B0009: error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib
    [!] SSL_CTX_use_PrivateKey_file: 909006C: error:0909006C:PEM routines:get_name:no start line
    

    One of my least favourite pastimes is figuring out cryptic OpenSSL error messages, and so I immediately fed this to $SEARCH_ENGINE. The responses were, let's say, lacking rigour, and so I thought I might use this blog to give future message googlers an explanation of what the problem was in my case.

    What OpenSSL was saying here simply was: there's no private key in the PEM.

    Where would the fun be if OpenSSL had said that itself?

    In case this doesn't immediately tell you how to fix things: “PEM files” in today's computing [1] are typically bundles of a “key” (that's the pair of public and secret key in sensible language), a “certificate” (that's a signed public key in sensible language), and possibly intermediate certificates that user agents may need to figure out that the signature on the certificate is any good, based on what certificate authorities they trust.

    All these things almost always come in base64 encoded ASCII these days (that's the actual meaning of “PEM“), which is nice because you can create your “PEM file” with cat if you've got the other parts. For instance, in my dealings with letsencrypt, I'm creating the key using:

    openssl genrsa 4096 > $SERVERNAME.key
    

    Then I build a certificiate signing request in some way that's immaterial here, and finally call the great acme-tiny something like:

    acme-tiny --account-key ./account.key --csr ./"$SERVERNAME".csr \
            --acme-dir /var/www/acme-challenge\
             > ./"$SERVERNAME".crt
    

    Letsencrypt also hands out the the intermediate certificates at a well-known URI, so I pull that, too:

    curl https://letsencrypt.org/certs/lets-encrypt-x3-cross-signed.pem \
            > intermediate.pem
    

    With that, all I have to do to make the “PEM file” is:

    cat $SERVERNAME.crt intermediate.pem > $SERVERNAME.pem  # not
    

    That was basically what I had in my certificate updating script, and that is what caused the error in my case. Spot it? Right, I failed to cat the key file in. I should have written:

    cat $SERVERNAME.key $SERVERNAME.crt intermediate.pem > $SERVERNAME.pem
    

    So – if you're seeing this error message, while I can't say why your key pair is missing in the PEM, I'd strongly suspect it is. Diagnosis: look for

    -----BEGIN RSA PRIVATE KEY-----

    somewhere in the file (and make sure all the dashes are present if you see something that looks like that and you're still seeing the odd OpenSSL message).

    [1]

    I've had to look that up myself: PEM actually has nothing to do with all kinds of cryptographic material cat-ed together into one file. Rather, it stands for Privacy-Enhanced Mail, something the IETF tried to establish in the early 1990ies where today (regrettably) S/MIME sits and what we could all mercifully forget if people finally just adopted PGP already.

    RFC 1421 – where a good deal of PEM is defined – was published in 1993 and still talks about BITNET! Oh wow. While this sort of PEM is dead, it did pioneer the ASCII-armoring of X.509 material. Of course, ASCII-armoring as such had been around for many years at that time – let me just mention uuencode, the cornerstone of software distribution on Usenet –, and PGP had even used base64 for crypto stuff, but all these (sensibly) steered clear of X.509.

    And ASCII-armored X.509 is PEM's legacy, as acknowledged by RFC 7468 (published in 2015, more than 20 years after the original PEM). Of course, RFC 7468 doesn't mention the .pem extension, let alone anything about the practice of assembling multiple kinds of cryptographic material in files with that extension.

Seite 1 / 1

Letzte Ergänzungen