DIY – “Do It Yourself” – is mostly about computers here, and mostly about operating services oneself that these days are typically bought from some platform (using some currency not necessarily monetary in the narrower sense). Indeed, I claim that if digital self-determination means anything (or: can mean anything, given the mad complexity of modern computers in both software and hardware), it will be this: can you run the services you want to use yourself? Or can, perhaps, your friends do that, providing truly social media, from human to human, not mediated via markets but in direct, well, social interaction?

  • ssh-Based git Upstreams with Multiple Identities

    A screenshot showing a grid of many grey and a few black boxes, with a legend “70 contributions in the last 12 months”.

    This is what I'd like to contain: Activity graphs on version control platforms. It should not be too easy to get a global graph of my activities, and in particular not across my various activities in work and life, even where these are, taken by themselves, public.

    I have largely given up on self-hosting version control systems for the part of my stuff that the general public might want to look at. In this game, it's platforms[1] exclusively nowadays, as there is no way anyone would find the material anywhere else.

    That, of course, is a severe privacy problem, even when the platform itself is relatively benevolent (e.g., codeberg). For people like me – spending a significant part of their lives at the keyboard and version-controlling a significant part of the keystrokes – the commit history says a lot about their lives, quite likely a lot more than they care to disclose publicly.

    As a partial mitigation, I am at least using different accounts for different functions: work, hacking, politics, the lot. The partial commit histories are decidedly less telling than a global commit history would be.

    However, this turns out to be trickier than you might expect, which is why I am writing this post.

    First off, the common:

    git config user.name "Anselm Flügel"
    git config user.email zuengeln@tfiu.de

    does not have anything to do with the platform accounts. It only changes the authorships in the log entries, and you are completely free to put anything at all there. The account used for pushing – and hence the source of the platforms' user history and activity images (see above) – is entirely unrelated to the commits' user.name entries. Instead, the account name is, in effect, encoded in the URI of the remote; and that is where things become subtle.

    Because, you see, there is no useful user name in:

    $ git remote get-url origin
    git@codeberg.org:AnselmF/crapicity.git

    The AnselmF in there is part of the repo path; you can push into other people's repos if they let you, so that cannot be the source of the user name. And the “git@” at the start, while it looks like a user name and actually is one, is the same for everyone.

    So, how do github, codeberg and their ilk figure out which account to do a push under? Well: They use the ssh key that you uploaded into your profile. Since each ssh key can only be assigned to one account, the platforms can deduce the account from the fingerprint of the public key that ssh presents on connecting.

    Historical note: the big Debian OpenSSL disaster 16 years ago – where Debian boxes would only generate a very small number of distinct secret keys (thus making them non-secret) – was uncovered in exactly this way: just as the early github phased in this scheme, impossibly many keys from different people turned out to have the same fingerprint. Matt Palmer recently related how, in his work at github, he tracked this back to Debian's broken random number generator at the time.

    In practice, this means that when you want multiple accounts on a single platform, you need to create, along with each new account, a new ssh key associated with it, preferably with a name that roughly matches its intended use:

    cd ~/.ssh
    ssh-keygen -t ed25519 -f id_anselm

    This will leave the public key in ~/.ssh/id_anselm.pub; the contents of that file will go into (say) codeberg's SSH key text box (look for the “Keys” section in your “Settings”).

    This still is not enough: ssh will by default try all the keys you have in ~/.ssh in a deterministic order. This means that you will still always be the same user as long as you use a remote URL like git@codeberg.org:AnselmF/crapicity.git – namely, the user whose public key is tried first. To change this, you must configure ssh to use your account-specific key for some bespoke remote URIs. The (I think) simplest way to do that is to invent a hostname in your ~/.ssh/config, like this:

    Host codeberg-anselm
            HostName codeberg.org
            User git
            IdentitiesOnly yes
            IdentityFile ~/.ssh/id_anselm

    This lets you choose your upstream identity using the authority part of the remote URI; use (in this case) codeberg-anselm rather than codeberg.org to work with your new account. Of course the URIs you paste from codeberg (or github or whatever) will not know about this. Hence, you will normally have to manually configure the remote URI, with a (somewhat hypothetical) command sequence like this:

    git clone git@codeberg.org:AnselmF/crapicity.git # pasted URI
    git remote set-url origin git@codeberg-anselm:AnselmF/crapicity.git

    After that, you will push and pull using the new account.

    [1]Well, at this point it is, if I have to be absolutely honest, one platform largely, but I outright refuse to acknowledge that.
  • Holidays in remind: Now Nationwide

    In front of a colourful shop window, a colourful figure of Jesus stands on a small pedestal draped in a linen cloth; below it, plenty of scattered greenery and flowers.

    Perhaps a post on holiday dates does not strictly need an illustration. But where else would I put this competition between the Corpus Christi cult (Walldürn, 2014) and modern shop window decoration?

    In my post on holidays in remind I said:

    With little effort, this should be adaptable to the situation in other federal states. Whoever does that is welcome to send the results here. As a great friend of the holiday in and of itself, I would be delighted to maintain a repository of holiday files here.

    Well, it turns out there is actually no need to crowdsource anything of the sort: there is a rather useful overview of the holidays in the Astronomische Grundlagen für den Kalender, which in turn is quickly translated into Python (meaning: the errors are mine). The result: remind-feiertage.
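    Most of the movable holidays are offsets from Easter Sunday, which remind computes with EASTERDATE. In Python, the same computation can be sketched with the well-known anonymous Gregorian Easter algorithm (the function name is my invention; the actual script may do this differently):

```python
from datetime import date

def easter_date(year):
    """Return the date of (western, Gregorian) Easter Sunday.

    This is the "anonymous Gregorian algorithm"; the magic numbers
    come from the Metonic cycle and the Gregorian leap year rules.
    """
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, rest = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, rest + 1)
```

    Good Friday is then easter_date(y) - timedelta(days=2), Whit Monday easter_date(y) + timedelta(days=50), and so on, matching the [ostern±n] offsets in the files below.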

    This is a Python script that runs without further dependencies and takes one or more abbreviations of federal states:

    $ python remind-feiertage.py
    Usage: remind-feiertage.py land {land}.
    Gibt remind-Feiertagsdateien für deutsche Länder aus.
    Länderkürzel: BW BY BE BB HB HH HE MV NDS NRW RLP SA SH SL SN TH.
    (Erklärung: SL=Saarland, SN=Sachsen, SA=Sachsen-Anhalt)

    If you pass all the abbreviations, all the holiday files come out. So you can also simply cut and paste the data for your federal state from here:

    $ python remind-feiertage.py BW BY BE BB HB HH HE MV NDS NRW RLP SA SH SL SN TH
    ============= BB =============
    # Feiertage in BB
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM Oct 31 MSG Reformationstag
    ============= BE =============
    # Feiertage in BE
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM Mar 8 MSG Frauentag
    ============= BW =============
    # Feiertage in BW
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM Jan 6 MSG Epiphanias
    REM [ostern+60] MSG Fronleichnam
    REM Nov 1 MSG Allerheiligen
    ============= BY =============
    # Feiertage in BY
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM Jan 6 MSG Epiphanias
    REM [ostern+60] MSG Fronleichnam
    REM Aug 15 MSG M. Himmelfahrt
    REM Oct 31 MSG Reformationstag
    ============= HB =============
    # Feiertage in HB
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM Oct 31 MSG Reformationstag
    ============= HE =============
    # Feiertage in HE
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM [ostern+60] MSG Fronleichnam
    ============= HH =============
    # Feiertage in HH
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM Oct 31 MSG Reformationstag
    ============= MV =============
    # Feiertage in MV
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM Mar 8 MSG Frauentag
    REM Oct 31 MSG Reformationstag
    ============= NDS =============
    # Feiertage in NDS
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM Oct 31 MSG Reformationstag
    ============= NRW =============
    # Feiertage in NRW
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM [ostern+60] MSG Fronleichnam
    REM Oct 31 MSG Reformationstag
    ============= RLP =============
    # Feiertage in RLP
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM [ostern+60] MSG Fronleichnam
    REM Oct 31 MSG Reformationstag
    ============= SA =============
    # Feiertage in SA
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM Jan 6 MSG Epiphanias
    REM Oct 31 MSG Reformationstag
    ============= SH =============
    # Feiertage in SH
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM Oct 31 MSG Reformationstag
    ============= SL =============
    # Feiertage in SL
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM [ostern+60] MSG Fronleichnam
    REM Aug 15 MSG M. Himmelfahrt
    REM Oct 31 MSG Reformationstag
    ============= SN =============
    # Feiertage in SN
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM [ostern+60] MSG Fronleichnam
    REM Oct 31 MSG Reformationstag
    REM Wednesday Nov 16 MSG Buß+Bettag
    ============= TH =============
    # Feiertage in TH
    # CC0; siehe auch https://codeberg.org/AnselmF/remind-feiertage
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM May 1 MSG Maifeiertag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM Oct 3 MSG Nationalfeiertag
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM [ostern+60] MSG Fronleichnam
    REM Sep 20 MSG Weltkindertag
    REM Oct 31 MSG Reformationstag

    Hints on how to use this with remind are in the Baden-Württemberg post.

    For clarity, and as my utmost concession to search engine optimisation, let me spell out the federal state abbreviations:

  • Holidays in Baden-Württemberg for the Appointment Manager remind

    Screenshot of a terminal with a blue background. Shown is the command line remind -cu+2 ~/.reminders 2024-03-24 and an ASCII calendar in which Good Friday and Easter Monday are marked.

    Granted: in real life I mostly look at my remind calendar as a Tk widget or in HTML, but in a pinch ASCII will do, too – for instance when, as now, I want to show off my holidays.

    When I recently migrated to Debian bookworm, I finally had to part with the GPE calendar[1], because after long years as an orphaned package it had, after all, picked up a conflict with something important. It was high time anyway to migrate to something more sensible for managing appointments. In my case: remind. This now feels – together with tkremind (also packaged for Debian) and a:

    reminders = subprocess.run(["remind", "-pp", "-c+3",
        os.path.expanduser("~/.reminders")],
      capture_output=True).stdout
    reminders_html = subprocess.run(["rem2html", "-tableonly"],
      capture_output=True, input=reminders).stdout

    in the Python script that produces my daily summary in HTML – as if it could last for the next 20 years.

    With that feeling, I finally wanted to configure the display of holidays, something I had procrastinated on year after year with the GPE calendar until its bitter end. Alas, neither Google nor Duckduckgo could come up with anything useful for a query like "remind" Feiertage "Baden-Württemberg".

    To change that, I am writing this post. Specifically, I have just written the following remind file with the legal holidays in Baden-Württemberg:

    # Feiertage in Baden-Württemberg (Stand 2024)
    # Verteilt unter CC0.
    SET ostern EASTERDATE($Uy)
    REM Jan 1 MSG Neujahr
    REM Jan 6 MSG Epiphania
    REM May 1 MSG Kampftag
    REM Oct 3 MSG Nationalfeiertag
    REM Nov 1 MSG Allerheiligen
    REM Dec 25 MSG Weihnachten 1
    REM Dec 26 MSG Weihnachten 2
    REM [ostern-2] MSG Karfreitag
    REM [ostern+1] MSG Ostermontag
    REM [ostern+39] MSG Himmelfahrt
    REM [ostern+50] MSG Pfingstmontag
    REM [ostern+60] MSG Fronleichnam

    With little effort, this should be adaptable to the situation in other federal states. Whoever does that is welcome to send the results here. As a great friend of the holiday in and of itself, I would be delighted to maintain a repository of holiday files here.

    How do I use this? Well, I have a directory for all sorts of stuff that is to stay somewhere in my home for a longer while, but not exactly at its root: ~/misc. That is where these holidays now live, as bawue.rem.

    The actual appointments are – as you may already have guessed from the Python above, and, with great pleasure, XDG-non-conformantly – in a file ~/.reminders. And that file now contains:

    INCLUDE /usr/share/remind/lang/de.rem
    DO misc/bawue.rem

    The first line produces German-language labels; the DO (rather than INCLUDE) in the second line is important so that remind resolves the path relative to the location of the reminders file.

    And with that, I will never again schedule work appointments on public holidays. There.

    [1]GPE here stands for the long-forgotten GPE Palmtop Environment; hence, the GPE calendar had already been smelling rather ripe for a decade.
  • Select And Merge Pages From Lots Of PDFs Using pdftk

    For most of my ad-hoc PDF manipulation needs (cut and paste pages, watermark, fill forms, attach files, decrypt, etc.), I rely on pdftk: fast, Debian-packaged (in pdftk-java), and as reliable as can be expected given the swamp of half-baked PDF writers. So, when I recently wanted to create a joint PDF from the first pages of about 50 other PDFs, I immediately started thinking along the lines of ls and perhaps a cat -b (which would number the lines and thus the files) and then pdftk.

    Why cat -b? Well, to do cut-and-merge with pdftk, you have to come up with a command line like:

    pdftk A=input1.pdf B=input2.pdf cat A1-4 B5-8 output merged.pdf

    This would produce a document merged.pdf from pages 1 through 4 of input1.pdf and pages 5 through 8 of input2.pdf. I hence need to produce a “handle” for each input file, for which something containing the running number would appear an obvious choice.

    My initial plan had therefore been to turn lines like 1 foo.pdf from ls | cat -b into doc1=foo.pdf with a dash of sed and go from there. If I were more attentive than I am, I would immediately have realised that this won't fly: with handles containing digits, pdftk would have no robust way to tell whether doc12 means “page 12 from doc”, “page 2 from doc1”, or “all pages from doc12”. Indeed, pdftk's man page says:

    Input files can be associated with handles, where a handle is one or more upper-case letters[.]

    Oh dang. I briefly meditated whether I could cook up unique sequences of uppercase handles (remember, I had about 50 files, so just single uppercase letters wouldn't have done it) using a few shell hacks. But I then decided[1] that's beyond my personal shell script limit and calls for a more systematic programming language like, umm, python[2].

    The central function in the resulting little program is something that writes integers using uppercase letters only. Days later, I can't explain why I have not simply exploited the fact that there are a lot more uppercase letters than there are decimal digits, and hence making uppercase labels from integers is solvable using string.translate. A slightly overcompact rendering of that would be:

    DIGIT_TO_LETTER = {ascii: chr(ascii+17) for ascii in range(48, 58)}
    def int_to_uppercase(i):
      return str(i).translate(DIGIT_TO_LETTER)

    (if you don't remember the ASCII table: 48 is the ASCII code for zero, and 48+17 is 65, which is the ASCII code for the uppercase A).

    But that's not what I did, perhaps because of professional deformation (cf. my crusade against base-60). Instead, I went for a base-26 representation using uppercase letters only, just like the common base-16 (“hex”) representation that, however, uses 0-9 and A-F and thus is unsuitable here. With this, you would count as follows (where the more significant “digits” are on the right rather than on the western-conventional left, because it doesn't matter and saves a reverse):

    A, B, C, D ... X,  Y,  Z,  AB, BB, CB ... ZB, AC, BC ...
    0, 1, 2, 3 ... 23, 24, 25, 26, 27, 28 ... 51, 52, 53 ...

    I freely admit I was at first annoyed that my handles went from Z to AB (rather than AA). It did take me longer than I care to confess here to realise that's because A is the zero here, and just like 01 is the same as 1 decimal[3], AA is equal to A (and BA equal to B) in that system. Consequently, my function for unique handles didn't produce AA even though I hadn't realised the problem when writing the function – there's nothing as practical as a good theory.
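    To see why the handles really are unique, it may help to write down the inverse mapping; a little sketch (the function name is my own) that decodes such a handle back into the integer it encodes:

```python
def handle_to_int(hdl):
    """Invert the base-26 handle encoding described above.

    The least significant "digit" comes first, and the letter A
    plays the role of zero.
    """
    return sum((ord(ch) - ord("A")) * 26**pos
               for pos, ch in enumerate(hdl))
```

    In particular, handle_to_int("AA") == handle_to_int("A") == 0, which is exactly why a correct encoder never emits AA.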

    With that function, the full ad-hoc script to pick page one (that's encoded in the f"{hdl}1", in case you want other page ranges) from all files matching /some/dir/um*.pdf looks like this:

    import glob
    import subprocess

    def make_handle(ind):
        """returns a pdftk handle for a non-negative integer.

        This is a sequence of one or more uppercase letters.
        """
        hdl = []
        while True:
            hdl.append(chr(ind%26+65))
            ind = ind//26
            if not ind:
                break
        return "".join(hdl)

    sources = [(make_handle(ind), name)
      for ind, name in enumerate(sorted(glob.glob("/some/dir/um*.pdf")))]
    subprocess.check_call(["pdftk"]+[f"{hdl}={name}" for hdl, name in sources]+
        ["cat"]+[f"{hdl}1" for hdl, _ in sources]+
        ["output", "output.pdf"])

    Looking back, not only are the massively silly base-26 handles unnecessarily complicated. Had I realised from the beginning that I would be using python in the end, I would probably have gone for pdfrw right away; while the complexity in terms of Debian dependencies is roughly the same (“one over what you'll already have”), avoiding a subprocess call is almost always a win[4].

    But these misgivings are one reason why I wrote this post: This is a compact illustration of the old programmers' wisdom to “Plan to throw one away – you will anyway”. Except that for tiny little ad-hoc scripts like this, a bit of baroque adornment and an extra process do not hurt, and the code above ought to work just fine if you need to produce a PDF document from some fixed page range of a few dozen or hundred other PDF documents.

    [1]Decided foolishly, by the way, as tr 0123456789 ABCDEFGHIJ immediately turns a sequence of distinct integers into a sequence of distinct uppercase-only strings.
    [2]I don't feel too good about being in the mainstream for a change, but I can prove that I'd have chosen python long before it became fashionable.
    [3]Not in Python, though, where 01 thankfully is a syntax error, and not necessarily in C, where you may be surprised to see that, for instance, 077 works out to 63 decimal. I would rank this particular folly among the most questionable design decisions in the history of programming languages.
    [4]That, and my growing suspicion that “you'll already have a Java runtime on your box” is quickly becoming a rather daring assumption. Once the assumption is plain wrong, pdftk stops being a cheap dependency, as it will pull in a full JRE.
  • Saner Timestamps With DIT: In Pelican and Beyond

    The other day Randall Munroe posted XKCD 2867:

    This lament about time calculus struck me as something of a weird (pun alarm) synchronicity, as one evening or two before that I had written a few lines of flamboyant time-related code.

    Admittedly, I was concerned neither with “sin to ask” nor with “impossible to know”: both are a consequence of the theory of relativity, which literally states that (against Newton) there is no absolute time, and hence when two clocks are in two different places, even synchronising them once is deep science.

    Sold on Decimal Internet Time

    No, my coding was exclusively about the entirely unnecessary trouble of having to account for time zones, daylight savings time, factors of 60, 24, sometimes 30, 31, 29, or 28, and quite a few other entirely avoidable warts in our time notation. Civil time on Earth is not complicated because of physics. On human scales of time, space, velocity, gravitation, and precision, it is not particularly hard to define an absolute time even though it physically does not exist.

    Rather, civil time calculations are difficult because of the (pun alarm) Byzantine legacy from Babylon – base-60 and base-12, seven-day weeks, moon calendar – exacerbated by misguided attempts of patching that legacy up for the railway age (as in: starting in 1840, by and large done about 1920). On top of that, these patches don't work particularly well even for rail travel. I speak from recent experience in this particular matter.

    Against this backdrop I was almost instantly sold on DIT, the Decimal Internet Time apparently derived from a plan a person named Anarkat (the Mastodon link on the spec page is gone now) proposed: Basically, you divide the common day in what currently is the time zone UTC-12 into 10 parts and write the result in decimal. Call the integer part “Dek” and the first two digits after the dot “Sim”. That's a globally valid timestamp precise to about a (Babylonian) minute. For example, in central Europe what's now 14:30 (or 15:30 during daylight savings time; sigh!) would be 0.62 in DIT, and so would Babylonian 13:30 in the UK or 8:30 in Boston, Mass. This may look like a trivial simplification, but makes a universe of a difference in how much less painful time calculations become.
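    To make the arithmetic concrete, here is a minimal sketch of turning a timezone-aware datetime into a Dek.Sim string (the function name and the choice of truncation rather than rounding are my assumptions, not part of the spec):

```python
from datetime import datetime, timedelta, timezone

# DIT counts the day in the zone that is currently UTC-12
DIT_ZONE = timezone(timedelta(hours=-12))

def to_dit(dt):
    """Format an aware datetime as D.SS (Dek, dot, two Sim digits).

    DIT is simply the fraction of the current UTC-12 day, written
    in decimal; this sketch truncates rather than rounds.
    """
    local = dt.astimezone(DIT_ZONE)
    seconds = local.hour * 3600 + local.minute * 60 + local.second
    sims = seconds * 1000 // 86400   # whole Sims elapsed today
    return f"{sims // 100}.{sims % 100:02d}"
```

    Fed 14:30 CET (or 13:30 in the UK, or 8:30 in Boston), this indeed comes out as 0.62.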

    I admit I'd much rather have based time keeping on the second (the SI unit of time), but I have to give Anarkat that the day is much more important in most people's lives than the second. Thus, their plan obviously is a lot saner for human use than any I would have come up with (“let's call the kilosecond kes and use that instead of an hour…”)[1].

    If you use pelican…

    Since I think that this would be a noticeably better world if we adopted DIT (clearly, in a grassrootsy step-by-step process), I'd like to do a bit of propaganda for it. Well, a tiny bit perhaps, but I am now giving the timestamps of the posts on this blog in StarDIT, which is an extension of DIT where you count the days in a (Gregorian, UTC-12) year and number the years from the “Holocene epoch”, which technically means “prepend a one to the Gregorian year number” (in other words, add 10'000 to “AD”).

    Like DIT itself, with sufficient adoption StarDIT would make some people's lives significantly simpler, in this case in particular historians (no year 0 problem any more!). I would like that a lot, too, as all that talk about “Domini” doesn't quite cater to my enlightened tastes.
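    The date part is easy to sketch, too, as far as I read the convention (Holocene year, a colon, and the day-of-year count, all evaluated in UTC-12; the helper name is mine):

```python
from datetime import datetime, timedelta, timezone

def to_stardit_date(dt):
    """Return the StarDIT date part, e.g. "12023:351".

    Holocene year (Gregorian year plus 10000), a colon, and the
    day-of-year number, both taken in the UTC-12 zone.
    """
    local = dt.astimezone(timezone(timedelta(hours=-12)))
    return f"{local.year + 10000}:{local.timetuple().tm_yday}"
```

    Note that near midnight UTC-12 the StarDIT day (and even year) can differ from the local Gregorian one, which is the point of evaluating everything in one zone.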

    How do I produce the starDITs? Well, I first wrote a rather trivial extension for my blog engine, pelican, which adds an attribute starDIT to posts. You will find it as ditdate.py in my pelican plugins repo on codeberg. Activate it by copying the file into your blog's plugins directory and adding "ditdate" to the PLUGINS list in your pelicanconf.py. You can then use the new attribute in your templates. In mine, there is something like:

    <a href="http://blog.tfiu.de/mach-mit-bei-dit.html">DIT</a>
    <abbr class="StarDIT">{{ article.starDIT[:-4] }}</abbr>
    (<abbr class="date">{{ article.date.strftime("%Y-%m-%d") }}</abbr>)

    If you don't use pelican…

    I have also written a Python module to convert between datetimes and DITs which shows a Tkinter UI when called as a program:

    A small grey window on top of some bright background; sans-serif letters say 12023:351 (small) 1.08.5 (large).

    I have that on my desktop now. And since alarmingly many people these days use a web browser as their primary execution platform, I have also written some HTML/Javascript to have the DIT on a web page and its title (also hosted here).

    Both of these things are in my dit-py repo on codeberg, available under CC0: do with them whatever you want. (Almost) anything furthering the cause of DIT is – or so I think I have argued above – very likely progress overall.

    [1]If you speak German or trust automatic translation, I have a longer elaboration of DIT aspects I don't like in a previous blogpost.
  • Join In With DIT

    [In case you're coming here from an English-language article, see here]

    A small grey window on top of some bright background; sans-serif letters say 12023:351 (small) 1.08.5 (large).

    Here, my DIT clock shows the time (and the date) in my sawfish dock. No, this is not Star Trek nonsense. Rather, I hope that something of this kind will in time become the must-have accessory: whoever does not have one may no longer say “digitalisation” [well: fortunately, nobody who might want such a thing has the means to enforce such a ban].

    Out of the Babylonian Confusion!

    After 3000 years, there are not that many reasons left to be cross with the great warlords of Babylon and their Mesopotamian colleagues. With the Babylonian clergy, things are different: not only do sexagesimal coordinates, in astronomy for instance, go back to them, but so does all the crooked stuff with factors of 60 or 24 or 7 that we still, entirely without need[1], struggle with in our timekeeping.

    The Mesopotamian priests bear no blame for the nuisance that is time zones and the associated daylight-saving-time misery, but I have wanted to get rid of those for ages as well, and not only out of recent personal affliction.

    So the Decimal Internet Time (DIT) took my heart (almost) by storm: a proposal to write the time by dividing the day into tenths. This hour substitute is called a Dek (from Dekatag) and corresponds to almost two and a half (namely 24/10) Babylonian hours.

    Even for fairly coarse times, Deks will usually not be enough, which is why they are divided into a hundred Sims (from “decimal minute”). Such a Sim corresponds to 86 seconds, so it is quite close to a Babylonian minute. That would probably be the unit for appointments: “lunch at nine point seventy-five”, or, for all I care, “twenty-five before zero”, since waiting some 100 seconds should not be a problem for anyone, and trains do not run much more precisely than that even in Switzerland. But because it is decimal, it would also be no problem to simply stop after the tens: “I'll leave at 7.8”, a statement accurate to about a quarter of an hour – very much on a human scale, in my book.

    I find this entirely plausible; if it seems odd to you by comparison, that is, I must tell you, very much parallel to the aversion of people raised on imperial units to saying something like “one metre eighty-five” when “six foot two inches” is soo much more intuitive.

    To get a feeling for decimal time, I can offer the following quick reference for German (CET) habits:

    DIT   CET, in words
    0     noon (13:00)
    1.5   afternoon (~16:30)
    2     early evening (~18:00)
    3     evening (20:00)
    4.5   midnight
    6     unholy hour (3:30)
    7.5   morning (7:00)
    9     late morning (10:30)

    Deseks: Perhaps Not So Useful

    I am less enthusiastic about DIT's smallest unit of time, the decimal second, Desek or Sek for short; that is a day/100,000, compared with a day/86,400 for the SI second.

    As an SI zealot, I would really have preferred to build the whole decimal timekeeping on the second and to establish the kilosecond (roughly a quarter of an hour) as the hour replacement. Though I admit that DIT's choice of the day as the reference is, for human use, a better plan than the kilosecond (of which there are 86.4 in a day, which is admittedly awkward).

    But for purely human use (appointments, daily schedules, timetables…), times on the scale of seconds usually play no role, and so I would simply have left out the Deseks and said: whoever needs more precision should go to the physicists and take the second from them. That a Sim consists of quite exactly 86.4 of these SI seconds is a cute curiosity rather than a practical difficulty, and in any case not noticeably more annoying than the 60 seconds of a Babylonian minute.

    And no, redefining the physical second as day/100,000 is not worth the effort; the Earth's rotation has long been too imprecise for that, and besides, we don't want the leap-second nonsense any more anyway. The second is physics; it need not have anything to do with human times. In that sense: it would be nicer if there were no Desek, but I don't want to quarrel over it either.

    Good riddance, time zones

    Besides the use of the decimal system, DIT's second great advance is that it runs uniformly across the whole world. In DIT, there are no time zones any more.

    More as an aside, this is arranged so that Babylonian 12 o'clock, i.e. noon, or 5 Deks in DIT, does indeed roughly coincide with the culmination of the Sun (a naive definition of noon) in the current UTC-12 time zone (the “earliest” one there is). But, contrary to the somewhat anti-British-sounding sentiment in the DIT spec, that really does not matter. All that matters is that DIT clocks all over the world show the same value. Allow me to update my recent fantasy for DIT:

    Would it really be a problem if people living in Kazakhstan considered 2 Deks a good time for lunch, while folks in New York tucked into theirs at around seven and a half Deks? I bet everyone would get used to it quickly. It is certainly simpler than the daylight-saving mantra “spring forward, fall back”.

    Admittedly: had I designed DIT, I would have dropped the reference point 12 Babylonian hours away from UTC, since all decent timestamps are already in UTC anyway. If one departs from UTC for DIT, date and time become entangled when converting those decent timestamps to DIT: in the transition from Babylonian time to DIT, the date may change as well.

    That is a complication with no discernible benefit; it is no privilege that the Sun culminates at 5 Deks, and so the attempt to “favour” as few people as possible is silly. But so be it.

    The Date to Go with the Time: StarDIT

    In particular, all that ceases to matter once you write the date in DIT as well. For that, there is an extension of DIT towards larger time spans, called StarDIT in the proposal. Is society nerdy enough yet to get away with a name like that? I don't know.

    While we are at names: the I in DIT, “Internet”, is not really serious either. I would perhaps rather read it as “International”; internationalism is and remains one of the more likeable isms.

    In the StarDIT plan, at any rate, the date consists of the (Gregorian) year relative to a slightly de-Christianised epoch, plus the running day number within the year, separated by a colon; for today, that would be something like 12023:350. Whoever wants weeks takes the tens part of the day number and writes an x after it; right now we are thus in week 35x.
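    Assuming that reading (Holocene year, i.e. Gregorian year plus 10,000, then a colon and the day number within the year; weeks are the tens digit of the day number with a trailing x), a quick sketch:

```python
from datetime import date

def stardit(d: date) -> str:
    """Format a Gregorian date StarDIT-style, as I read the proposal:
    Holocene year, colon, day-of-year.  A sketch, not a reference
    implementation."""
    return f"{d.year + 10000}:{d.timetuple().tm_yday}"

def stardit_week(d: date) -> str:
    # ten-day "weeks": the tens part of the day number plus a trailing x
    return f"{d.timetuple().tm_yday // 10}x"

print(stardit(date(2023, 12, 16)))       # 12023:350
print(stardit_week(date(2023, 12, 16)))  # 35x
```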

    Ten-day weeks carry a certain risk that five working days turn into eight; an analogous effect broke the neck of the French revolutionary calendar (in my telling of history). But right now we need to talk about drastic working-time reduction anyway, to somehow get the still-growing CO₂ emissions under control. The transition to DIT could well go together with an interim model of still five days of wage labour, but then also five days of self-determination (“weekend”), before wage labour shrinks further, of course.

    Cute, though not terribly practical for everyday use, is the DIT idea of replacing the Christian epoch (for my own reservations cf. footnote 1 here) with the Holocene year. That is just the normal Gregorian year, except that the count starts at 9,999 BCE (that is: simply add 10,000 to CE years).

    It is surely splendid if people stop spreading what are ultimately pious fairy tales through labels like “BC” or “AD”, and it is also great if the year-zero problem (there is no year 0: the current year count jumps straight from 1 BC to AD 1, which is also why the DIT reference epoch is slightly off) disappears completely, at least from post-Mesolithic historiography. But is that a good deal if you have to pay for it with an extra digit in year numbers? Five is, haha, not necessarily trumps.

    On the other hand: the StarDIT year is trivial to compute from the familiar year number, and realistically, even with DIT in everyday life, people would keep saying “twenty-three” or “the twenties” rather than “twelve thousand and twenty-three” or “the twelve-thousand-twenties”. In that sense: they have my blessing.

    Implementation: Python and Javascript

    Now, to lead by example, I want to get a feeling for DIT myself. To that end, I have written a Python module that supports converting Python datetimes to and from DIT. It is so little code that I would rather not tempt anyone to pull it in as a dependency. That is also why I have not pushed it to PyPI; just look into my codeberg repo. My suggested procedure is copy-paste (or simply drop the module into your own source tree).

    The module also works as a program; just put it on your path and make it executable. It will then display a DIT clock in a Tkinter window. I have added that to my sawfish dock (see the image at the top).

    I have also written a piece of Javascript that can compute and display DITs. It is embedded in the file dit.html in the repo, or reachable at https://blog.tfiu.de/media/2023/dit.html. People who (quite unlike me) make heavy use of tabs in their browsers can open that page and, with some luck (namely, if the browser runs the Javascript even …

  • Another Bookworm Regression: D-bus, X11 Displays, purple-remote, Oh My!

    When I reported on what broke when I upgraded to Debian bookworm, I overlooked that my jabber presence management (where I'm offline at night and on weekends) no longer worked. Figuring out why and fixing it was a dive into D-Bus and X11 that may read like a noir detective novel, at least if you are somewhat weird. Let me write it up for your entertainment and perhaps erudition.

    First off, contrary to what the March post says, I have since migrated to pidgin as my XMPP (“jabber”) client; at its core, presence management still involves a script in /etc/network/if-*.d where I used to call something like:

    su $DESKTOP_USER -c "DISPLAY=:0 purple-remote getstatus"

    whenever a sufficiently internetty network interface went up or down, where DESKTOP_USER contains the name under which I'm running my X session (see below for the whole script with the actual presence-changing commands).

    Purple-remote needs to run as me because it should use my secrets rather than root's. But it was the DISPLAY=:0 thing that told purple-remote how to connect to the pidgin instance to interrogate and control. As most boxes today, mine is basically a single-user machine (at least as far as “in front of the screen” goes), and hence guessing the “primary” X display is simple and safe.

    Between X11 and the D-Bus

    That purple-remote needed the DISPLAY environment variable was actually almost a distraction from the start. There are many ways for Unix programs to talk to each other, and DISPLAY might have pointed towards 1980s-style X11 inter-client communication. But no, the purple-remote man page already says:

    This program uses DBus to communicate with Pidgin/Finch.

    Correctly spelled D-Bus, this is one of the less gruesome things to come out of the freedesktop.org cauldron, although it is still riddled with unnecessarily long strings, unnecessarily deep hierarchies, and perhaps even unnecessary use of XML (though I feel sympathies in particular for that last point).

    But that's not what this post is about. I'm writing this because after upgrading to Debian bookworm, purple-remote no longer worked when used from my if-up.d script. Executing the command in a root shell (simulating how it would be called from ifupdown) showed this:

    # DESKTOP_USER=anselm su $DESKTOP_USER -c "DISPLAY=:0 purple-remote getstatus"
    No existing libpurple instance detected.

    A quick glance at the D-Bus Specification gives a hint at how this must have worked: dbus-launch – which is usually started by your desktop environment, and in my case by a:

    export $(dbus-launch --exit-with-x11)

    in ~/.xinitrc – connects to the X server and leaves a “property” (something like a typed environment variable attached to an X11 window) named _DBUS_SESSION_BUS_ADDRESS in, ah… for sure the X server's root window [careful: read on before believing this]. As the property's value, a D-Bus client would find a path like:


    and it could open that socket to talk to all other D-Bus clients started within the X session.
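    In case you are curious what such an address looks like structurally: per the D-Bus specification, it is a transport name, a colon, and comma-separated key=value pairs. Here is a toy parser (the example address is made up; escaping and lists of semicolon-separated addresses are not handled):

```python
def parse_dbus_address(address: str) -> tuple[str, dict[str, str]]:
    """Split a D-Bus address like 'unix:path=...,guid=...' into its
    transport and parameters.  Sketch only, not a spec-complete parser."""
    transport, _, params = address.partition(":")
    pairs = dict(p.split("=", 1) for p in params.split(",") if p)
    return transport, pairs

print(parse_dbus_address("unix:path=/tmp/dbus-test,guid=0123abcd"))
```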

    Via apropos to xprop to Nowhere

    So… Does that property exist in the running X server? Hm. Can I figure that out without resorting to C programming? Let's ask the man page system:

    $ apropos property
    [..lots of junk...]
    xprop (1)            - property displayer for X

    Typing in man xprop told me I was on the right track:

    $ man xprop
         xprop  […] [format [dformat] atom]*
      The xprop utility is for displaying window and font properties in an
      X server.
      -root   This argument specifies that X's root window is the target win‐
              dow.   This  is  useful  in situations where the root window is
              completely obscured.

    So, let's see:

    $ xprop -root _DBUS_SESSION_BUS_ADDRESS
    _DBUS_SESSION_BUS_ADDRESS:  not found.

    Huh? Has dbus-launch stopped setting the property? Let's inspect Debian's change log; a major change like that would have to be noted there, wouldn't it? Let's first figure out which package to look at; the documentation then is in /usr/share/doc/<packagename>:

    $ dpkg -S dbus-launch
    dbus-x11: /usr/bin/dbus-launch
    $ zless /usr/share/doc/dbus-x11/changelog.Debian.gz

    Looking for “property” or “BUS_ADDRESS” in there doesn't yield anything; that would make it unlikely that the property was somehow dropped intentionally. I have to admit I had halfway expected that, with something like “for security reasons”. But then if someone can read your root window's properties, access to your session bus is probably the least of your problems.

    Still, perhaps someone is slowly dismantling X11 support on grounds that X11 is kinda uncool? Indeed, you can build dbus-launch without X11 support. If the Debian maintainers built it that way, the respective strings should be missing in the binary, but:

    $ strings `which dbus-launch` | grep _DBUS_SESSION

    No, that's looking good; dbus-launch should still set the properties.

    Skimming the Docs is Not Reading the Docs.

    If I did not see the property a moment ago, perhaps I have used xprop the wrong way? Well, actually: I didn't read the D-Bus spec properly, because what it really says is this:

    For the X Windowing System, the application must locate the window owner of the selection represented by the atom formed by concatenating:

    • the literal string "_DBUS_SESSION_BUS_SELECTION_"
    • the current user's username
    • the literal character '_' (underscore)
    • the machine's ID

    – and then find the _DBUS_SESSION_BUS_PID on the window owning that selection. The root window thing was my own fantasy.

    If you bothered to skim the ICCCM document I linked to above, you may recognise the pattern: that's just conventional X inter-client communication – no wonder everyone prefers D-Bus.

    This is beyond what I'd like to do in the shell (though I wouldn't be surprised if xdotool had a hack to make that feasible). I can at least establish that dbus-launch still produces what the spec is talking about, because the “atoms” – a sort of well-known string within the X server and as a concept probably part of why folks are trying to replace X11 with Wayland – are all there:

    $ xlsatoms | grep DBUS
    488   _DBUS_SESSION_BUS_SELECTION_anselm_d162...

    The Next Suspect: libdbus

    Given that, dbus-launch clearly is exonerated as the thing that broke. The next possible culprit is purple-remote. It turns out that's a python program:

    $ grep -i dbus `which purple-remote`
    import dbus
        obj = dbus.SessionBus().get_object("im.pidgin.purple.PurpleService", "/im/pidgin/purple/PurpleObject")
    purple = dbus.Interface(obj, "im.pidgin.purple.PurpleInterface")
                data = dbus.Interface(obj, "org.freedesktop.DBus.Introspectable").\

    So, this is using the python dbus module. Let's see if its changelog says anything about dropping X11 support:

    $ zless /usr/share/doc/python3-dbus/changelog.Debian.gz

    Again, nothing for X11, property, or anything like that. Perhaps we should have a brief look at the code:

    $ cd /some/place/for/source
    $ apt-get source python3-dbus
    dpkg-source: info: extracting dbus-python in dbus-python-1.3.2
    $ cd dbus-python-1.3.2/

    You will see that the python source is in a subdirectory called dbus. Let's see if that talks about our property name:

    $ find . -name "*.py" | xargs grep _DBUS_SESSION_BUS_ADDRESS

    No[1]. Interestingly, there's no mention of X11 either. Digging a bit deeper, however, I found a C module dbus_bindings next to the python code in dbus. While it does not contain promising strings (X11, property, SESSION_BUS…) either, that lack made me really suspicious, since at least the environment variable name should really be visible in the source. The answer is in the package's README: “In addition, it uses libdbus” – so, that's where the connection is being made?

    Another Red Herring

    That's a fairly safe bet. Let's make sure we didn't miss something in the libdbus changelog:

    $ zless /usr/share/doc/libdbus-1-3/changelog.Debian.gz

    You will have a déjà-vu if you had a look at dbus-x11's changelog above: the two packages are built from the same source and hence share a Debian changelog. Anyway, again there are no suspicious entries. On the contrary: An entry from September 2023 (red-hot by Debian stable standards!) says:

    dbus-user-session: Copy XDG_CURRENT_DESKTOP to activation environment. Previously this was only done if dbus-x11 was installed. This is needed by various freedesktop.org specifications…

    I can't say I understand much of what this says, but it definitely doesn't look as if they had given up on X11 just yet. But does that library still contain the property names?

    $ dpkg -L libdbus-1-3
    $ strings /lib/i386-linux-gnu/libdbus-1.so.3 | grep SESSION_BUS

    No, it doesn't: the name of the environment variable shows up, but there is nothing about the X11 property. That looks like a trace of evidence. If libdbus evaluated that property, it would stand to reason that its name would be embedded somewhere (though admittedly there are about 1000 tricks with which it could still do the right thing without the literal string in its binary).

    Regrettably, that's another red herring. Checking the libdbus from the package in bullseye (i.e., the Debian version before bookworm) does not yield the property …

  • Unexpected Consequences at the Toilet

    I am a great fan of stories in which people do something that, a few twisted corners later, brings about something else entirely that was at least not obviously to be expected, somewhat like the acacias that withered because they were protected from elephants by electric fences.

    Considerably more profane is the story of the towel dispensers at our institute. Until April of this year, we had machines there that, with a rather ingenious mechanism, unrolled very long washable cloth towels. That was great on the one hand, because your hands actually got dry and not much paper was thrown away after a single use. On the other hand, each towel equivalent was effectively used only once, so the rolls had to be changed frequently. With logistics, washing and drying, the ecological balance of the cloth towels was probably not appreciably better than that of the more common paper towels.

    To put that ecological balance in terms of the carbon footprint: hand-drying is one of Mike Berners-Lee's[1] prime examples of behaviours that many people consider important but that play almost no role in their actual footprint. For drying one's hands once, he estimates, depending on method:

    3 g CO2e Dyson Airblade [although such an Airblade is more likely one of those devices whose manufacturing effort vanishes next to the energy consumed during operation, I would have liked to read something about embodied emissions, particularly for the hi-tech solution]

    10 g CO2e one paper towel

    20 g CO2e standard electric drier

    If I dry my hands at the institute 600 times a year (i.e. about three times per working day), that is at worst the equivalent of one kilo of cheese either way (which Berners-Lee estimates at 12 kg CO₂e). For comparison, with the rule-of-thumb figure for Germany's emissions (2/3 Gt/a), the mean footprint per inhabitant comes out at something like 8,000 kg. So for ordinary people, the towels are well below one part in a thousand, and even eco-minded folks in this country would have to contract pretty nasty chronic diarrhoea to reach even half a part in a thousand of their footprint through workplace hand-drying.
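    To make those estimates reproducible (the per-drying figures are Berners-Lee's; the population figure of about 83 million for Germany is my own insertion):

```python
# Berners-Lee's per-drying estimates in grams of CO2e
methods = {"Dyson Airblade": 3, "one paper towel": 10, "standard electric drier": 20}
DRYINGS_PER_YEAR = 600   # about three per working day

for name, grams in methods.items():
    print(f"{name}: {DRYINGS_PER_YEAR * grams / 1000:.1f} kg CO2e per year")

# mean per-capita footprint from the 2/3 Gt/a rule of thumb, ~83e6 inhabitants
print(f"average footprint: {2 / 3 * 1e12 / 83e6:.0f} kg CO2e per year")
```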

    Be that as it may: the lovely cloth-towel dispensers have since disappeared, perhaps for cost reasons, perhaps because the supplier no longer offers them, perhaps actually because, all told, they ended up closer to the footprint of Berners-Lee's “standard electric drier”. Instead, the university has procured this pretty much across the board:

    A paper towel dispenser with a glued-on bilingual notice: “Because of the danger of clogging: flush down only toilet paper, NO towel paper and the like! Thank you!”

    New towel dispensers at Heidelberg University, with loving Denglish. I don't know who wrote it, but “constipation” (blockage of the bowel, rather than clogging) and “wash down” (to gulp down a drink, rather than flush) does smell of subtle insider humour with a faecal background. I suspect guerrilla communication.

    As the glued-on note suggests, this ecologically perhaps second-order step had serious consequences elsewhere, at any rate unexpected by me: not a month after the cloth rolls were abolished, foul-smelling sewage was forced up out of the ground-floor toilet, and for 24 hours anyone who had to go, or wanted water for tea, had to walk to the neighbouring building.

    Diagnosis: flushed-down paper towels had completely blocked the sewage pipe. With the old cloth towels, that would not have happened. Put differently: it did not happen.

    Incidentally, I too have drawn a consequence, even if my CO₂ footprint sadly remains practically unaffected either way. In return, my consequence yields plenty of Hitchhiker bonus points: I have hung a private towel in my office, which I now use many times, carrying it along to the toilet each time (“every day is Towel Day”). Because I already had the towel, I now dry my hands with it almost CO₂-free.

    Only: where in an office would a hook for hanging towels come from? Well, at some point during a terribly boring telecon I took apart an old hard disk from the SATA era. Do not, under any circumstances, throw those things away before you have salvaged the magnets of the head actuator, because they are magnificent. For instance, together with a paper clip and a radiator, they make a fine towel hook that has been serving me well for two months now:

    Photo of the contraption described in the running text.
    [1] Berners-Lee, M. (2011): How Bad Are Bananas. Vancouver: Greystone. ISBN 978-1-55365-832-0
  • Taming an LTE card in Linux

    When I wrote my request for help on how to do voice calls with a PCI-attached cellular modem I realised that writing a bit about how I'm dealing with the IP part of that thing might perhaps be mildly entertaining to some subset of the computer-literate public. After all, we are dealing with rather serious privacy issues here. So, let's start with these:

    Controlling registration

    Just like almost any other piece of mobile phone tech, an LTE card with a SIM inserted will by default try to register with the network operator's infrastructure when it is switched on (or resumed, more likely, in the case of a notebook part). If this is successful, it will create a data point in the logs there, which in turn will be stored for a few days or, depending on the human rights situation in the current jurisdiction (as in: is telecom data retention in effect?), for up to two years. This record links you with a time (at which you used your computer) and a location (at which you presumably were at that point). That's fairly sensitive data by any measure.

    So: You don't want to create these records unless you really want network. But how do you avoid registration? There are various possible ways, but I found the simplest and probably most robust one is to use Linux's rfkill framework, which is in effect a refined version of airline mode. To make that convenient, I am defining two aliases:

    alias fon="sudo rfkill unblock wwan"
    alias keinfon="sudo rfkill block wwan"

    (“keinfon” is “no phone” in German); put these into your .bashrc, or perhaps into .aliases if your .bashrc includes that file.

    Since I consider rfkill a relatively unlikely target for privilege escalation, I have appended , NOPASSWD: /usr/sbin/rfkill to my user's line in a file below /etc/sudoers.d, but that's of course optional.

    With that, when I want to use internet over LTE, I type fon, wait a few seconds for the registration to happen and then bring up the interface. When done, I bring down the interface and say keinfon. It would probably be more polite to the service operators if I de-registered from the infrastructure before that, but for all I can see only marginally so; they'll notice you're gone at the next PLU. I don't think there are major privacy implications either way.

    It might be wiser to do the block/unblock routine in pre-up and post-down scripts in /etc/network/interfaces, but since registration is slow and I rather regularly find myself reconnecting while on the cell network, I'd consider that over-automation. And, of course, I still hope that one day I can do GSM voice calls over the interface, in which case the card shouldn't be blocked just because nobody is using it for an internet connection.
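    For the record, if you did prefer the automated route despite these caveats, a sketch of the hooks in /etc/network/interfaces might look like this (untested; the sleep to paper over the slow registration is a guess):

```
iface o2 inet ppp
  provider o2
  pre-up rfkill unblock wwan && sleep 15
  post-down rfkill block wwan
```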

    Phone Status

    In case I forget the keinfon, I want to be warned about my gear leaking all the data to o2 (my network operator). I hence wrote a shell script display-phone-status.sh like this:

    if /usr/sbin/rfkill list | grep -A3 "Wireless WAN" | grep 'blocked: yes' > /dev/null; then
      echo "WWAN blocked."
    else
      /usr/games/xcowsay -t 10 -f "Steve Italic 42" --at 0,520 --image ~/misc/my-icons/telephone.xpm 'Ich petze gerade!'
    fi

    You will want to change the notification, for instance because you won't have the nice icon and may not find the font appropriate. The German in there means “I'm squealing on you.” Here's how this works out:

    Screenshot: an old-style telephone with a baloon saying „Ich petze gerade“

    I execute that at every wakeup, which is a bit tricky because xcowsay needs to know the display. If you still run pm-utils and are curious how I'm doing that, poke me and I'll write a post.


    Mainly because tooling for MBIM and other more direct access methods felt fairly painful last time I looked, I am still connecting through PPP, even though that's extremely silly over an IP medium like LTE. Part of the reason I'm writing this post is that duckduckgo currently returns nothing useful if you look for “o2 connection string” or something like that. I tried yesterday because, surprisingly, while the internet connection worked over GSM, when connected over LTE (see below on how I'm controlling that), executing the good old:

    AT+CGDCONT=1, "IPV4V6", "internet"

    would get me an ERROR. That command – basically specifying the protocol requested and the name of an „access point“ (and no, I have never even tried to figure out what role that „access point“ might have had even in GSM) – admittedly seems particularly silly in LTE, where you essentially have an internet connection right from the start. I'm pretty sure it didn't use to hurt LTE connections three years ago, though. Now it does, and so that's my chat script for o2 (put it into /etc/ppp/chat-o2 with the peer definition below):

    TIMEOUT 5
    '' "ATZ"
    OK 'ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0'
    OK "\d\dATD*99#"
    CONNECT ""

    You can probably do without almost all of this and just run ATD*99# if you're stingy; but over the past 15 years of using cellular modems in one way or another, each piece of configuration was useful at one time. I'm not claiming they are now.

    Similarly, my /etc/ppp/peers/o2 configuration file might contain a bit of cruft:

    remotename any
    user thing
    connect "/usr/sbin/chat -v -f /etc/ppp/chat-o2"
    lcp-echo-interval 300
    lcp-echo-failure 10

    I'd expect the liberal LCP configuration at the bottom of the file is still very beneficial in the o2 network.

    To manage the network, I use normal Debian ifupdown with this stanza in /etc/network/interfaces:

    iface o2 inet ppp
      provider o2

    To bring up the interface, I have an icon on my desktop that executes sudo ifup o2.


    To see what's going through a network connection, I have a script monitor in /etc/network/if-up.d; this is unconditionally executed once an interface comes up. A case statement brings up wmnet instances with parameters somewhat adapted to the respective interfaces:

    case $IFACE in
    wlan* )
      su - anselm -c 'DISPLAY=:0 nohup wmwave -r 200' > /dev/null 2>&1 &
      su - anselm -c "DISPLAY=:0 nohup wmnet -l -x 1000000 -d 200000 -r green -t red -W $IFACE" > /dev/null 2>&1 &
      ;;
    o2 | n900)
      su - anselm -c "DISPLAY=:0 nohup wmnet -l -x 1000000 -d 200000 -r green -t red -W ppp0" > /dev/null 2>&1 &
      ;;
    esac

    The complicated su logic is necessary because again, the little window maker dockapps need access to the X display.

    That whole part is a bit weak, not only because of the hard-coded user name and DISPLAY (these are fairly safe bets for a personal computer), but also because it relies on the window manager being configured to place the dockapps at predictable positions.

    More importantly, ifupdown executes the script too early: To ifupdown, the interface is up when the pppd is up. But it is only then that pppd starts to negotiate, and these negotiations fail quite easily (e.g., when you're in a dead zone, and there are plenty of those with o2). If that happens, you have an essentially dead wmnet on the desktop. I clean up rather unspecifically in /etc/network/if-down.d/monitor:

    case $IFACE in
    wlan* )
      killall wmwave
      killall wmnet
      ;;
    o2 | n900)
      killall wmnet
      ;;
    esac
    exit 0

    The implicit assumption here is that the computer will only have one wireless network connection at a time.

    Modem Configuration

    I used to have to do a bit of modem configuration in the olden days. It's rarer these days, but I thought I might as well publish the source of a program I wrote back then to encapsulate that configuration. I still find it useful now and then to choose between the access methods LTE (fast, but perhaps smaller cells and hence less stable) and GSM (slow, but perhaps more robust with larger cells and better coverage), which this script can do if your card supports the AT+XACT command. While I would guess that includes many Sierra modems, I have no idea how many there may be. Anyway, here's what that ought to look like (and perhaps the most relevant piece of information is the <home>, which means there's an infrastructure connection, as opposed to, for instance, <offline>):

    $ modemconfig.py -a LTE
    Modem is >home<
    Using LTE
    Running on band BAND_LTE_1

    If you think you might find this kind of thing useful: It's on https://codeberg.org/AnselmF/sierra-config, and it's trivial to install.

  • Help wanted: PCM telephony on a Sierra EM7345

    For fairly silly reasons I would like to do voice calls using a Sierra Wireless EM7345 4G LTE wireless modem built into a Lenovo Thinkpad X240. However, I am stuck between a lack of documentation and the horrors of proprietary firmware blobs to the extent that I am even unsure whether that's possible without reprogramming the whole device. If you can help me, I'd appreciate any sort of hand-holding. If you think you know someone who might be able to help me, I'd appreciate if you pointed them to this post.

    The analog stuff

    What works nicely is calling voice numbers. After a minicom -D /dev/ttyACM0, I can do:

    AT DT 062210000000
    AT DT 062210000000;

    The first command is attempting a data connection that fails because it's a real telephone at the other end. The semicolon in the second command says “do voice”. It actually makes the remote phone ring a few seconds after the modem said OK. I can pick up the call on that remote phone, too, but fairly unsurprisingly there is just silence at the computer end, and whatever I say at either end goes nowhere. The eventual NO CARRIER is when I hang up the phone.

    The other way round works as well: seeing the good old RING and typing ATA like in the good old days warmed my heart. Hanging up with an ATH was fun, too. But when no sound is being transported through the Sierra card, these games quickly become boring.

    As usual for entities like Sierra, they don't hand their documentation to just anyone (such as me). I still happen to have a PDF titled “MP 700 Series GPS Rugged Wireless Modem AT Command Reference”, which pertains to a different but reasonably similar device. There, it says:

    If your account allows for it, you can attach a headset to your modem and use it as a mobile phone. You require a 4-wire headset with a 2.5 mm connector, to use your modem as a phone. (This plugs into the Audio connector on the back of the modem. You may need an extension cable if the modem is installed in the trunk. Contact your service provider to determine what extension cables are supported.)

    Well… The small EM 7345 certainly does not have a 2.5 mm connector. There aren't even soldering pads visible without peeling off stickers:

    Photo of the interior of a computer with some small extension cards.  One of them has a big sticker on it saying it's a Sierra device.

    The Sierra modem as fitted into a Lenovo X240: Certainly no 2.5 mm connectors visible.

    There is also no trace of the Sierra card in the ALSA mixer, and neither is there an ALSA card they could have added as a USB audio device. Hence, at this point I believe getting out any sort of analog-ish audio is unrealistic.

    Go digital

    However, what if I could pull PCM bytes or perhaps GSM-encoded audio from the device in some way? A thread in the Sierra forum seems to indicate it could work, but then trails off into mumbling about firmware versions. Some further mindless typing into search engines suggested to me that a “version 6” of the firmware should be able to do PCM voice (in some way not discussed in useful detail). Version 6 sounds a bit menacing to me in that my device says:


    I faintly remember having once tried to update the firmware and eventually giving up after some quality time with WINE. Against that background, skipping five major versions sounds particularly daring (“the Evel Knievel upgrade: what could possibly go wrong except 40 to 50 broken bones”). But then Sierra's support page doesn't even acknowledge the 7345's existence any more.

    While Sierra itself does not give its documentation to the unwashed masses, on some more or less shady page I found documentation on the AT commands of one of its successors, the EM7355. That one appears to have a lot of PCM-related commands. In particular:

    Note: To enable audio on an audio-capable device, use the “ISVOICEN” customization for AT!CUSTOM (see page 32 for details).

    Regrettably, on my box:


    Actually, it would seem that none of the various Sierra-proprietary AT commands starting with a bang are present in my firmware.

    That's where I stand. Does anyone have deeper insights into whether I could have GSM voice calls on that board without reverse-engineering the whole firmware?

    A tale of two cards

    In case you are wondering why I would even want to do GSM telephony with my computer… Well, I have a 4.99 Euro/month 1 GB+telephony flat rate with Winsim (turn off Javascript to avoid their broken cookie banner). While I can recommend Winsim for telephone support far better than you'd expect at that price (of course: the network coverage isn't great, it's just a Telefonica reseller, and forget about using the e-mail support), they'll charge you another five Euro or so monthly for a second SIM card in that plan, whereas you can get a SIM card for free if you get a second pre-paid contract.

    I'm not sure what reasoning is behind two contracts with two cards being cheaper than one contract with two cards, but then telephony prices stopped making any sense a long time ago.

    Since my phone can only do UMTS and GSM (i.e., only GSM these days in Germany) and I have the LTE modem inside the computer anyway, I recently transferred the SIM with the flat rate into the LTE modem, so my garden office now has a faster internet connection than when I'm using the phone as a modem. Consequently, I now have another (pre-paid) card in the phone. The net effect is that I could do telephone calls for free on the computer if I could just figure out the audio part – whereas naive VoIP doesn't really work in much of the network because of packet loss, latencies, low bandwidth and so on – and I pay 9 ct per minute for GSM telephony on the phone.

    I'll give you that's probably not a sufficient reason to sink hours of research into the stupid Sierra card. But I'd also have the BIGGEST PHONE ON THE WHOLE TRAIN if I could just pull it off!

    Addendum (2023-06-21)

    Well, on the “new firmware” part, I found https://lists.freedesktop.org/archives/libqmi-devel/2018-August/002951.html. And oh my, of course Intel don't publish the sources of their firmware-flashing thingy. That's extremely bad for me, because they don't bother to build i386 binaries, and now I have to dual-arch in half a Linux system:

    ldd /opt/intel/platformflashtoollite/bin/platformflashtoollite
            linux-vdso.so.1 (0x00007ffcf43ed000)
            libdldrapi.so => /opt/intel/platformflashtoollite/lib/libdldrapi.so (0x00007f61d4000000)
            libCore.so => /opt/intel/platformflashtoollite/lib/libCore.so (0x00007f61d3c00000)
            libNetwork.so => /opt/intel/platformflashtoollite/lib/libNetwork.so (0x00007f61d3800000)
            libDeviceManager.so => /opt/intel/platformflashtoollite/lib/libDeviceManager.so (0x00007f61d3400000)
            libLogger.so => /opt/intel/platformflashtoollite/lib/libLogger.so (0x00007f61d3000000)
            libJson.so => /opt/intel/platformflashtoollite/lib/libJson.so (0x00007f61d2c00000)
            libDldrManager.so => /opt/intel/platformflashtoollite/lib/libDldrManager.so (0x00007f61d2800000)
            libUtilityWidgets.so => /opt/intel/platformflashtoollite/lib/libUtilityWidgets.so (0x00007f61d2400000)
            libQt5Xml.so.5 => /opt/intel/platformflashtoollite/lib/libQt5Xml.so.5 (0x00007f61d2000000)
            libQt5Widgets.so.5 => /opt/intel/platformflashtoollite/lib/libQt5Widgets.so.5 (0x00007f61d1600000)
            libQt5Gui.so.5 => /opt/intel/platformflashtoollite/lib/libQt5Gui.so.5 (0x00007f61d0c00000)
            libQt5Network.so.5 => /opt/intel/platformflashtoollite/lib/libQt5Network.so.5 (0x00007f61d0800000)
            libQt5Script.so.5 => /opt/intel/platformflashtoollite/lib/libQt5Script.so.5 (0x00007f61d0200000)
            libxfstk-dldr-api.so => /opt/intel/platformflashtoollite/lib/libxfstk-dldr-api.so (0x00007f61cfe00000)
            libPlatformUtils.so => /opt/intel/platformflashtoollite/lib/libPlatformUtils.so (0x00007f61cfa00000)
            libQt5Core.so.5 => /opt/intel/platformflashtoollite/lib/libQt5Core.so.5 (0x00007f61cf200000)
            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f61d3ebc000)
            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f61d4530000)
            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f61d322c000)
            libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f61d450c000)
            librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f61d4502000)
            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f61d44fc000)
            /lib64/ld-linux-x86-64.so.2 (0x00007f61d457b000)
            libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f61d2e33000)
            libUSBScan.so => /opt/intel/platformflashtoollite/lib/libUSBScan.so (0x00007f61cee00000)
            libgobject-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0 (0x00007f61d44a0000)
            libgthread-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgthread-2.0.so.0 (0x00007f61d449b000)
            libglib-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 (0x00007f61d3ad1000)
            libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f61d4486000)
            libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f61d36bd000)
            libGL.so.1 => /usr/lib/x86_64-linux-gnu/libGL.so.1 (0x00007f61d3e35000)
            libusb-0.1.so.4 => /lib/x86_64-linux-gnu/libusb-0.1.so.4 (0x00007f61cea00000)
            libboost_program_options.so.1.46.1 => /opt/intel/platformflashtoollite/lib/libboost_program_options.so.1.46.1 (0x00007f61ce600000)
            libicui18n.so.54 => /opt/intel/platformflashtoollite/lib/libicui18n.so.54 (0x00007f61ce000000)
            libicuuc.so.54 => /opt/intel/platformflashtoollite/lib/libicuuc.so.54 (0x00007f61cdc00000)
            libicudata.so.54 => /opt/intel/platformflashtoollite/lib/libicudata.so.54 (0x00007f61cc000000)
            libudev.so.0 => /usr/lib/x86_64-linux-gnu/libudev.so.0 (0x00007f61d447d000)
            libffi.so.7 => /usr/lib/x86_64-linux-gnu/libffi.so.7 (0x00007f61d4471000)
            libpcre.so.3 …
  • Feedback and Addenda in Pelican Posts

    Screenshot: a (relatively) rude comment and a reply, vaguely reminiscent of classic slashdot style.

    Blog comments may be dead out there; here, I'd like to at least pretend they're still alive, and thus I've written a pelican plugin to properly mark them up.

    When I added a feedback form to this site about a year ago, I also created a small ReStructuredText (RST) extension for putting feedback items into the files I feed to my blog engine Pelican. The extension has been sitting in my pelican plugins repo on codeberg since then, but because there has not been a lot of feedback on either it or the posts here (sigh!), that was about it.

    But occasionally a few interesting (or at least provocative) pieces of feedback did come in, and I thought it a pity that basically nobody would notice them[1] – or, (cough) much worse, my witty replies.

    At the same time, I had quite a few addenda to older articles, and I felt some proper markup for them (plus better chances for people to notice they're there) would be nice. After a bit of consideration, I figured the use cases are similar enough, and I started extending the feedback plugin to cover addenda, too. So, you can pull the updated plugin from codeberg now. People running it on their sites would certainly encourage me to add it to the upstream's plugin collection (after some polishing, that is).

    Usage is simple – after copying the file to your plugins folder and adding "rstfeedback" to PLUGINS in pelicanconf.py, you write:

    .. feedback::
        :author: someone or other
        :date: 2022-03-07
        Example, yadda.

    for some feedback you got (you can nest these for replies) or:

    .. addition::
      :date: 2022-03-07
      Example, yadda.

    for some addition you want to make to an article; always put in a date in ISO format.

    In both cases a structured div element is generated in the HTML, which you can style in some way; the module comment shows how to get what's shown in the opening figure.

    The extension also adds a template variable LAST_FEEDBACK_ITEMS containing a list of the last ten changes to old posts. Each item is an instance of some ad-hoc class with attributes url, kind (feedback or addendum), the article title, and the date. On this blog, I'm currently formatting it like this in my base template:

    <h2>Letzte Ergänzungen</h2>
    <ul class="feedback">
    {% for feedback_item in LAST_FEEDBACK_ITEMS %}
            <li><a href="{{ SITEURL }}/{{ feedback_item.url }}">{{ feedback_item.kind }} zu „{{ feedback_item.title }}“</a> ({{ feedback_item.date }})</li>
    {% endfor %}
    </ul>

    As of this post, this block is at the very bottom of the page, but I plan to give it a more prominent place at least on wide displays real soon now. Let's see when I feel like a bit of CSS hackery.
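    The “instance of some ad-hoc class” that LAST_FEEDBACK_ITEMS contains can be pictured like this (a sketch of the shape only; FeedbackItem is a hypothetical name, the plugin's actual class is anonymous and ad-hoc):

```python
from dataclasses import dataclass

@dataclass
class FeedbackItem:
    url: str    # link target relative to SITEURL
    kind: str   # "Kommentar" (feedback) or "Nachtrag" (addendum)
    title: str  # the article's title
    date: str   # ISO date, e.g. "2022-03-07"
```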


    First of all, I have not localised the plugin, and for now it generates German strings for “Kommentar” (comment), “Nachtrag” (addendum) and “am” (on). This is relatively easy to fix, in particular because I can tell an article's language from within the extension from the article metadata. True, that doesn't help for infrastructure pages, but then these won't have additions anyway. If this found a single additional user, I'd happily put in support for your preferred language(s) – I should really be doing English for this one.

    This will only work with pages written in ReStructuredText; no markdown here, sorry. Since in my book RST is so much nicer and better defined than markdown and at the same time so easy to learn, I can't really see much of a reason to put in the extra effort. Also, legacy markdown content can be converted to RST using pandoc reasonably well.

    If you don't give a slug in your article's metadata, the plugin uses the post's title to generate a slug like pelican itself does by default. If you changed that default, the links in the LAST_FEEDBACK_ITEMS will be wrong. This is probably easy to fix, but I'd have to read a bit more of pelican's code to do it.

    I suppose the number of recent items – now hardcoded to be 10 – should become a configuration variable, which again ought to be easy to do. A more interesting (but also more involved) additional feature could be to have per-year (say) collections of such additions. Let's say I'm thinking about it.

    Oh, and error handling sucks. That would actually be the first thing I'd tackle if other people took the rstfeedback plugin up. So… If you'd like to have these or similar things in your Pelican – don't hesitate to use the feedback form (or even better your mail client) to make me add some finish to the code.

    [1]I made nginx write logs (without IP addresses, of course) for a while recently, and the result was that there's about a dozen human visitors a day here, mostly looking at rather recent articles, and so chances are really low anyone will ever see comments on old articles without some extra exposure.
  • Ach Bahn, Part 12: The “Digital” 49-Euro Ticket

    Photo of an old-fashioned telephone with an anonymised barcode on its display

    The happy end of this article: I have the 49-Euro ticket on machines under my control (besides the N900 in the picture, also on my proper computer).

    I bought a 49-Euro ticket from Deutsche Bahn. Following freiheitsfoo's critique, I probably shouldn't have, but the 9-Euro ticket was a lot of fun, and monthly cancellation and all that… so I suppressed the fact that Wissing had talked about it being “digital”, which with less IT-affine people usually means “it's in my phone”, or rather “Google does that for me” (and hence, for me: “forget it”). But since a Bahn FAQ explained how to get the “ticket into the app”, my converse conclusion was that the ticket initially is not in the “app” and therefore usable for me. Because of that fallacy, Bahn got my 49 Euro and I got a heap of trouble.

    Because after payment, what arrived was not the usual PDF with the QR code – which really would be no problem for Bahn – but a silly text telling me to “open” the ticket in my “DB Navigator”.

    Digitisation: two hours of work between purchase and receipt

    Well: that program (“app”) is, at least officially, only available with a Google id and only on relatively few types of hardware, and that's why I don't have it. So I gnashed my teeth and first sent a question to the contact address that was at least given (credit where it's due: a perfectly normal standard e-mail) – but all that came back was a cheerful acknowledgement of receipt:

    At the moment, the high volume of orders may lead to delays. If it takes a little longer: we have not forgotten you, we will get back to you. [the German original also features a broken apostrophe character]

    Well – by then May will be over, and I won't need the information any more.

    This morning I gnashed my teeth more loudly and thought: well, since I already let Android in with the Bahn-Bonus nonsense, maybe I can add the “DB Navigator” on top – after all, I only need it once a month to download the QR code. So I went back to the somewhat dubious[1] apkpure.com. There are a few things called “DB Navigator” there, but none of them come as an apk; they come as an xapk. Huh?

    A bit of research reveals xapk to be a sort of informal standard from the Android pirate community, in which composite packages – which Google presumably delivers from its app store via dependencies – come in a zip file. The Navigator xapk from apkpure in particular contains two packages with arm64 in their file names, and I began to suspect that this would not be much fun without dedicated phone hardware.
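    Since an xapk is just a zip file, you can inspect one without any Android tooling. A hedged sketch (the split-package naming is my guess at what such a bundle contains, not a statement about apkpure's actual file layout):

```python
import zipfile

def list_member_apks(xapk_path: str) -> list[str]:
    # An .xapk bundles a base .apk plus split/config .apks in one zip archive
    with zipfile.ZipFile(xapk_path) as bundle:
        return sorted(n for n in bundle.namelist() if n.endswith(".apk"))

def non_arm64_apks(xapk_path: str) -> list[str]:
    # Filter out the packages with "arm64" in the file name
    return [n for n in list_member_apks(xapk_path) if "arm64" not in n]
```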

    Indeed, after a few experiments with pm install (that's how you install packages from the Android shell) and the non-arm64 packages, all of which ended in useless and/or cryptic error messages, I gave up.

    Digitisation: data transfer by photo

    Instead, I unpacked a Google-administered (but not Google-registered, i.e., no Play Store) phone that a kind person once left me, tapped away all the Google harassment, ultra-bravely slapped on a pirate xapk installer – which is now surely sending all my credentials to some kids in Vladivostok (luckily that only concerns my Bahn credentials; still: thanks, Bahn) –, tapped away more Google harassment, slapped the stupid “DB Navigator” from apkpure onto it, tapped away more Google harassment, and indeed: Bahn finally gave me the ticket I had bought:

    Photo of a mobile phone with a 49-Euro ticket in the DB Navigator

    Just by way of justification: I have scrambled the QR code, lest Bahn revoke the hard-won ticket right away.

    Welcome to digital capitalism, where you first get to tinker and fiddle for two hours and give some kids from Vladivostok access to your (throwaway) computer so that you actually receive the stuff you just bought. Almost as great as online retail.

    But there was a second problem: how do I get the hard-won QR code out of the Android silo? I quickly decided that I had no appetite whatsoever for finding out where the box stores its screenshots. My head had already exploded while locating the Chrome downloads during my Android-x86 experiments. I had even less appetite for installing an sshd for data transfer on a phone that I had signed over to the kids from Vladivostok minutes before.

    And so – long live digitisation! – I took the photo above, pulled it from the camera onto a proper computer and de-skewed it there. And thus I now have a PNG with the QR code.

    On the N900

    That, in turn, has the advantage that I can keep using my good old Corona vaccination certificate script for the Nokia N900 (cf. the photo above). It worked well throughout the 3G times: it pulls the PNG onto the screen, turns the backlight up to very bright, and undoes all of that after 45 seconds – with it, I almost always got through checkpoints faster and with less hassle than people with the official app.

    If you still have an N900 with a sufficiently original Maemo, you may find it useful, too (it assumes you have put the certificate into your home directory as 49-euro.png):

    /usr/bin/dbus-send --print-reply --dest=com.nokia.image_viewer /com/nokia/image_viewer com.nokia.image_viewer.mime_open string:file:///home/user/49-euro.png
    /usr/bin/dbus-send --print-reply --system --dest=org.freedesktop.Hal /org/freedesktop/Hal/devices/computer_backlight org.freedesktop.Hal.Device.LaptopPanel.SetBrightness int32:255
    sleep 45
    /usr/bin/dbus-send --print-reply --system --dest=org.freedesktop.Hal /org/freedesktop/Hal/devices/computer_backlight org.freedesktop.Hal.Device.LaptopPanel.SetBrightness int32:20
    killall image-viewer

    If that lives in /home/user/mybin/passhow.sh, it goes well together with a file covpass.desktop in the directory /usr/share/applications/hildon containing something like this:

    [Desktop Entry]

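    As a hedged reconstruction of such a file (the Exec path and icon name are from the text; the remaining keys are standard freedesktop guesses and may not match what old Maemo actually wants):

```ini
[Desktop Entry]
Type=Application
Name=49-Euro-Ticket
Exec=/home/user/mybin/passhow.sh
Icon=covpass
```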
    In the desktop file, I mention an icon called “covpass”. For it to display something other than a blue square, you have to write a pretty PNG (mine is still a stylised coronavirus, which, I find, fits the silly 49-Euro ticket rather well, too) named covpass.png to /opt/usr/share/icons/hicolor/scalable/apps.

    To make the desktop notice this file: sudo killall hildon-desktop – upstart (yes, that is still alive in old Maemo) will then automatically restart it.

    [1]“Dubious” is a positive word in this context, because with Google I am certain that they act against my interests. With apkpure, on the other hand, I can still have doubts (Latin: dūbium, n).
  • Browsing Peace and Privacy With dnsmasq

    Screenshot of the dnsmasq extra configuration page in freetz

    You can even have the DNS-based adblocking discussed here in your whole network if your router runs dnsmasq (it probably does) and you can edit its configuration (you probably can't). As shown here, with freetz you can.

    I'm not a big fan of in-browser adblocking. For one, I have my doubts about several of the extensions – Adblock Plus, for instance, comes from a for-profit, though I grant you this critique might be partisan. Also, I like to switch browsers freely and certainly don't want to maintain block lists for each of them, and finally quite a few clients other than browsers may render HTML and hence ads.

    At least with the pages I want (and don't want) to read, there's a much lighter alternative: DNS-based adblocking. You see, on the relatively few commercial pages I occasionally have reason to visit, ads, tracking pixels, and nasty javascript typically are served from a rather small set of domains – doubleclick.net, googleadservices.com, and a few more like these. If I can make my computer resolve these names to – that is, my computer in IPv4, or yours, if you type that address –, everything your browser would pull from these servers is instantly gone in everything rendering HTML.

    So, how do you do that? Well, you first make sure that your computer does the name resolution itself[1]. On Debian, you do that by installing the packages resolvconf (without a second e; in a systemd environment I think you want to use systemd-resolved instead) and dnsmasq; that's really all, and that ought to work out of the box in all reasonably common situations:

    $ sudo apt install resolvconf dnsmasq

    You will probably have to bring your network down and up again for this to take effect.

    Once that's done, you can tell dnsmasq what names to resolve to what. The man page dnsmasq(8) documents what to do under the --address option – you could actually configure dnsmasq through command line options exclusively –, where you can read:

    -A, --address=/<domain>[/<domain>...]/[<ipaddr>]

    Specify an IP address to return for any host in the given domains. […] A common use of this is to redirect the entire doubleclick.net domain to some friendly local web server to avoid banner ads. The domain specification works in the same was [sic, as of bullseye] as for --server […]

    – and from the documentation of --server you learn that <domain> is interpreted as a suffix (if you will), such that if you give an address for, say, google.com, it will also be used for foo.google.com or foo.bar.google.com.

    But where do these address expressions go? Well, at least in Debian, dnsmasq will read (essentially, see the README in there) any file you drop into /etc/dnsmasq.d and add its content to its configuration. Having configuration snippets in different files really helps maintenance and dist-upgrades in general; in this case, it also helps distributing the blacklist, as extra configuration that may be inappropriate on a different host is kept in some other file.

    I tend to prefix snippet names with numbers in case order might one day matter. So, I have a file /etc/dnsmasq.d/10spamreduce.conf containing:


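    A minimal sketch of such a snippet, using only the domains mentioned in this post (my actual list is somewhat longer):

```
# /etc/dnsmasq.d/10spamreduce.conf: resolve ad and tracking domains
# to localhost so nothing is ever fetched from them
address=/doubleclick.net/
address=/googleadservices.com/
address=/fonts.gstatic.com/
address=/facebook.com/
```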
    When you do the same thing, you should restart dnsmasq and then see the effect like this:

    $ sudo service dnsmasq restart
    $ dig +short fonts.gstatic.com

    As you can see, I have also included some trackers and other sources of annoyance in my address list. Of course, if you actually want to read Facebook (ugh) or need to pull Google's fonts (ughugh), you'll have to adapt that list a bit.

    In case you have interesting and useful contributions to this list: Please do write in!

    [1]Regrettably, with things like DNS over HTTPS, it could be that your browser actually will not use your computer's DNS resolver. Adblocking hence is one extra reason to disable DoH when you see it.
  • Work-Life Balance and Privacy with Bash, D-Bus, gajim and ifupdown

    A small screenshot showing an offline icon

    Sunday morning: my gajim is automatically offline. This post explains how I'm doing that.

    I still consider XMPP the open standard for “chat” (well, instant messaging), and I have been using Psi as an XMPP client for almost 20 years now. However, since Psi has occasionally crashed on me recently (as in: at least since Bullseye), presumably on receiving some message, I consider it a certainty that it is remotely exploitable. Given its large codebase, I don't think I want to fix whatever is wrong myself, and I don't think there are still people maintaining Psi.

    I therefore migrated to gajim last week; after all, one of the nice things about open standards is that there are usually multiple implementations. This, however, made me update an ancient hack to automatically manage my status so that I'm XMPP-offline when it's nobody's business whether or not my machine is on.

    In this post, I'd like to tell you how that works, hoping it may be useful to solve other (but similar; for instance: get offline when doing talks) problems, too.

    Not Always Online

    First off, the major reason I'm not much of a fan of synchronous messaging (which IM is, and email is not) is that it requires some sort of “presence” notification: something needs to know whether I am online, and where I can be reached. At least in XMPP, additionally all your contacts get to know that, too.[1]

    While I admit that can be useful at times, during the night and on weekends I really don't want to publish when my computer is on and when it's not. Hence, I have so far told my Psi, and I am now telling my gajim, not to automatically re-connect on weekends or between 20:00 and 7:00. That I can specify this perhaps somewhat unique preference at all illustrates how great it is to have shell integration everywhere. The ingredients are:

    • ifupdown, Debian's native network management. If you're using systemd or NetworkManager or something, I think these use other hooks [if you've tried it, let me know so I can update this].
    • D-Bus, a framework to communicate between programs sitting on a common X11 display (though with gajim, D-Bus becomes somewhat hidden).
    • the shell, which lets you write little ad-hoc programlets and duct-tape together all the small utilities that have accumulated in Unix since the early 1970s (here: logger, date, and egrep).

    Inter-Process Communication with D-Bus

    The first thing I want to do is take gajim offline before a network interface goes down. That way, people don't have to wait for timeouts to see I am unavailable (unless someone pulls the cable or the Wifi disappears – without a network, gajim can't sign off). That means I have to control a running gajim from the outside, and the standard way to do that these days is through D-Bus, a nifty, if somewhat over-complicated, way of calling functions within programs from other programs.

    One of these other programs is qdbus, which lets you inspect what listens on your sessions's (or, with an option, system's) D-Bus and what functions you can call where. For instance:

    $ qdbus org.gajim.Gajim /org/gajim/Gajim
    method void org.gtk.Actions.SetState(QString action_name, QDBusVariant value, QVariantMap platform_data)

    In Psi, with a bit of fiddling, a generic D-Bus tool was enough to switch the state. Since there's a QDBusVariant in the arguments gajim's SetState method wants according to the qdbus output, I don't think I could get away with that after the migration – qdbus does not seem to be able to generate that kind of argument.

    Enter gajim-remote

    But gajim comes with a D-Bus wrapper of its own, gajim-remote, and with that, you can run something like:

    gajim-remote change_status offline

    Except that won't work out of the box. That's because gajim comes with remote control disabled by default.

    To enable it, go to Preferences → Advanced, click Advanced Configuration Editor there, and then look for the remote_control configuration item. I have no idea why they've hidden that eminently useful setting so well.

    Anyway, once you've done that, you should be able to change your status with the command above and:

    gajim-remote change_status online

    ifupdown's Hooks

    I now need to arrange for these commands to be executed when network interfaces go up and down. These days, it would probably be smart to go all the way and run a little daemon listening to D-Bus events, but let me be a bit less high-tech, because last time I looked, something like that required actual and non-trivial programming.

    In contrast, if you are using ifupdown to manage your machine's network interfaces (and I think you should), all it takes is a bit of shell scripting. That's because ifupdown executes the scripts in /etc/network/if-up.d once a connection is up, and the ones in /etc/network/if-down.d before it brings a connection down in a controlled fashion. These scripts see a few environment variables that tell them what's going on (see interfaces(5) for a full list), the most important of which are IFACE (the name of the interface being operated on), and MODE, which would be start or stop, depending on what ifupdown is doing.

    The idea is to execute my change_status commands from these scripts. To make that a bit more manageable, I have a common script for both if-up.d and if-down.d. I have created a new subdirectory /etc/network/scripts for such shared ifupdown scripts, and I have placed the following file in there as jabber:

    DESKTOP_USER=anselmf

    logger Jabber: $MODE $IFACE $LOGICAL

    # State management of gajim
    case $MODE in
    start)
      case $IFACE in
      eth* | wlan* | n900)
        if ! date +'%w/%H' | grep '[1-5]/\(0[789]\|1[0-9]\)' > /dev/null; then
          exit 0
        fi
        su - $DESKTOP_USER -c 'DISPLAY=:0 gajim-remote change_status online "Got net"' > /dev/null || exit 0
        ;;
      esac
      ;;
    stop)
      case $IFACE in
      eth* | wlan* | n900)
        if [ tonline = "t`su $DESKTOP_USER -c 'DISPLAY=:0 gajim-remote get_status'`" ]; then
          su - $DESKTOP_USER -c "DISPLAY=:0 gajim-remote change_status offline 'Losing network'" || exit 0
          sleep 0.5
        fi
        ;;
      esac
      ;;
    esac
    After chmod +x-ing this file, I made symbolic links like this:

    ln -s /etc/network/scripts/jabber /etc/network/if-down.d/
    ln -s /etc/network/scripts/jabber /etc/network/if-up.d/

    – and that should basically be it (once you configure DESKTOP_USER).
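    For reference, the date +'%w/%H' | grep test in the script encodes “Monday through Friday, 07:00 to 19:59” – the times when going online is allowed. The same condition cross-checked in Python (an illustration only, not part of the setup):

```python
from datetime import datetime

def may_go_online(now: datetime) -> bool:
    # %w yields 0 = Sunday … 6 = Saturday; the grep pattern
    # [1-5]/\(0[789]\|1[0-9]\) matches weekdays 1-5 at hours 07-19
    weekday = int(now.strftime("%w"))
    return 1 <= weekday <= 5 and 7 <= now.hour <= 19
```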

    Addendum (2023-12-02)

    Let me admit that this never really worked terribly well with gajim, mainly because – I think – its connections don't time out, and so once a status update had failed for one reason or another, gajim would be in a sort of catatonic state. That's one of the reasons I moved on to pidgin, whose state management in turn broke when upgrading to Debian bookworm. My current script is near the bottom of this December 2023 post.

    Debugging Admin Scripts

    Because it is a mouthful, let me comment a bit about what is going on:

    logger Jabber: $MODE $IFACE $LOGICAL

    logger is a useful program for when you have scripts started deep within the bowels of your system. It writes messages to syslog, which effectively lets you do printf debugging of your scripts. Once everything works for a script like this, you will probably want to comment the logger lines out.

    Note that while developing scripts of this kind, it is usually better to just get a normal shell, set the environment variables (or pass the arguments) that you may have obtained through logger, and then run them interactively, possibly with a -x option (print all statements executed) passed to sh. For instance:

    $ MODE=start IFACE=wlan0 sh -x /etc/network/scripts/jabber
    + DESKTOP_USER=anselmf
    + logger Jabber: start wlan0
    + case $MODE in
    + case $IFACE in
    + date +%w/%H
    + grep '[1-5]/\(0[789]\|1[0-9]\)'
    + exit 0

    – that way, you see exactly what commands are executed, and you don't have to continually watch /var/log/syslog (or journalctl, if that's what you have), let alone bring network interfaces up and down all the time.
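Incidentally, the date/grep pipeline visible in that trace encodes "Monday through Friday, 07:00–19:59". For the record, the same window test could be written in Python like this (a hypothetical helper, not part of the script above):

```python
from datetime import datetime

def in_online_window(now):
    """True on weekdays (Mon-Fri) between 07:00 and 19:59 -- the same
    window that date +'%w/%H' | grep '[1-5]/\\(0[789]\\|1[0-9]\\)' matches.
    Note date +%w counts Sunday as 0, so [1-5] is Monday through Friday,
    just like isoweekday() <= 5 here (Monday=1 ... Sunday=7)."""
    return now.isoweekday() <= 5 and 7 <= now.hour <= 19
```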

    Case Statements in Bourne's Legacy

    The main control structure in the script is:

    case $MODE in

    Case statements are one of the more powerful features of descendants of the Bourne shell. Read about them in the excellent ABS in case you are a bit mystified by the odd syntax and the critically important ;; lines.

    The particular case construct here is there so I can use the same script for if-up.d and if-down.d: it dispatches on whatever is in MODE. In case MODE is something other than start or stop, we silently do nothing. That is not always a good idea – programs failing without complaints are a major reason for the lack of hair on my head –, but since this isn't really user-callable, it's probably an acceptable behaviour.

    General rule of thumb, though: Be wary of case .. esac without a *) branch (which gives the commands executed when nothing else matches).

  • A New Metric for Web Pages: Crapicity

    Screenshot of a web page with a large "Crapicity" banner above an input line and a bar chart that looks like a lognormal distribution.

    The crapicity of the web pages I have linked to from here (plus a few more): works with netsurf!

    Shortly after the web lost its academic innocence, at the end of the 1990s, I wrote the UNiMUT Schwobifying Proxy, a little script that made the entire web, ahem, experienceable in Swabian. That actually earned me my fifteen minutes of fame around 2002, including 200,000 hits per day (an enormous figure for the net of that time), coverage on heise.de, spiegel.de and, frankly most flattering to me, in Forschung aktuell (well, OK: Computer und Kommunikation) on Deutschlandfunk.

    A secretly related web experiment was the Dummschwätzranking of 1998, which – not entirely facetiously – rated the density of (then freshly fashionable) buzzwords in web pages; people are interested[1] in that to this day.

    Both toys are effectively shut down now, partly because the commercial internet and SEO blow up such things legally or practically, but partly also because they relied on what people see in their browsers being roughly what is in the HTML such a program receives from the web servers. Sadly, in the course of the javascriptification of the web, that is less and less the case.

    Now, people's habituation to "just let everyone whose writing you want to read execute code on your machine" is certainly the far more toxic consequence of the post-web-1 megatrend. But it is still a pity that, at least on the commercial web, nobody writes into their pages any more what the fat Javascript browsers will eventually display.

    And because that is a pity, I have written a postmodern successor to the Dummschwätzranking: the Crapicity machine (sorry, not localised, but linguistically rather spartan anyway). Its metric, namely the crapicity, or c7y for short: the ratio of the total length of the page with all embedded markup, Javascript and CSS (i.e., not counting external Javascript, CSS and so forth) to the length of the readable text (the characters you get to read on screen without Javascript). With Python and the wonderful BeautifulSoup module this is quickly computed:

    from bs4 import BeautifulSoup

    def compute_crapicity(doc):
      """returns the crapicity of html in doc.

      doc really should be a str -- but if len() and BeautifulSoup() return
      something sensible with it, you can get away with something else, too.
      """
      parsed = BeautifulSoup(doc, "html.parser")
      content_length = max(len(parsed.text), 1)
      return len(doc)/content_length

    Around this terse function I have knitted almost 700 lines that store the results in an SQLite database and provide a web interface. As a Debian-plus-one-file program it should run in many places without much fuss – if you like, you can get the software on codeberg.
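If you want to play with the metric without installing BeautifulSoup, the idea can be approximated with nothing but the standard library's html.parser. This is a rough sketch of the same computation, not the code the c7y machine actually runs:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible-ish text of an HTML document, skipping the
    contents of script and style elements (roughly BeautifulSoup's .text)."""
    def __init__(self):
        super().__init__()
        self.chunks, self.skip_depth = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)

def crapicity_approx(doc):
    """total page length divided by readable-text length, floored at 1 char"""
    extractor = TextExtractor()
    extractor.feed(doc)
    content_length = max(len("".join(extractor.chunks)), 1)
    return len(doc) / content_length
```

On a page that is mostly text this stays near one; wrap the same text in kilobytes of script and it explodes, which is the whole point of the metric.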

    The web interface at https://blog.tfiu.de/c7y has the advantage that the scores accumulate. Feel free to play with it and pass it on – I'd find it cute to have perhaps 10,000 pages in there. I have already piped through close to 200 web resources myself, mostly links from this blog.

    By and large, the result is what everyone probably expected: the commercial net stinks; old stuff and techie pages are usually quite OK. However, I have not yet fully debugged the current front-runner, a Reddit page with a c7y of over 17,000: surprisingly, that page is readable even in netsurf and without Javascript. How it pulls that off – well, I have not found out yet in 800 kB of confusion, and the page's source looks so terrible that the score is surely deserved.

    I suppose all youtube pages currently sit at c7y=8222; there, nothing at all is visible without Javascript, so that score fits perfectly. taz.de (currently 892) is perhaps treated a bit unfairly, since the page actually works quite well without Javascript, too. Possibly BeautifulSoup is to blame here. Thoroughly deserved, on the other hand, are the 682 of nina.no – it is empty without Javascript. A Twitter page sits at 413, Bandcamp at 247.

    Reasonable pages, in contrast, lie between a little above one (minimal markup) and ten (e.g., little text with a lot of CSS). Dishonorable mention: uni-heidelberg.de scores 177 despite its academic pedigree. The page is actually halfway readable in normal browsers[2]. The bad score is mostly due to embedded SVGs, so it is a tiny bit unfair. But honestly: whoever bloats a few hundred characters of text to a whopping 680k for a bit of glitz has earned a high crapicity, even if the page itself is not outright broken. If you absolutely must have that much glitz, use external images – I don't have to download those if I don't want to.

    If you find interesting crapicities: the comment box awaits you.

    [1]Well, OK, much of that interest recognizably came from SEO circles; that was one of the reasons I disabled submitting links to the Dummschwätzranking.
    [2]Defined as: anything except the monsters Firefox, Chrome, Webkit, and their derivatives.
  • A QR Code Scanner for the Desktop

    Screenshot: Two windows.  One contains a photo of a QR code, the other two buttons, one for opening the URI parsed from the QR code, the other for canceling and scanning on.

    qropen.py in action: Here, it has scanned a QR code on a chocolate wrapper and asks for confirmation that you actually want to open the URI contained (of course, it doesn't come with the glitzy background).

    When I was investigating the SARS-2 vaccination certificates last year (sorry: that post is in German), I played a bit with QR codes. A cheap by-product of this was a little Python program scanning QR codes (and other symbologies). I cannot say I am a huge fan of these things – I'd take short-ish URIs without cat-on-the-keyboard strings like “?id=508” any day –, but sometimes I get curious, and then this thing comes in handy given that my telephone cannot deal with QR codes.

    Yesterday, I put that little one-file script, qropen.py, on codeberg, complemented by a readme that points to the various ways you could hack the code. I'll gratefully accept merge requests, perhaps regarding a useful CLI for selecting cameras – you see, with an external camera, something like this starts being actually useful, as when I used it to cobble together a vaccination certificate checker in such a setup. Or perhaps doing something smart with parsed EAN codes (so far, they just end up on stdout)?

    On that last point, I will admit that with the camera on my Thinkpad X240, most product EAN codes do not scan well. The underlying reason has been the biggest challenge with this program even for QR codes: Laptop cameras generally have a wide field of view with a fixed focus.

    The wide field of view means that you have to bring the barcodes pretty close to the camera in order to have the features be something like three pixels wide (which is what zbar is most fond of). At that small distance, the fixed focus means that the object is severely out of focus and hence the edges are so blurry that zbar again does not like them.
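To see why the wide field of view hurts, here is a back-of-the-envelope calculation with a pinhole-camera model; the field of view, resolution, and module size used below are plausible assumptions, not measured X240 specs:

```python
import math

def feature_px(feature_mm, distance_mm, hfov_deg=60.0, sensor_px=1280):
    """How many pixels a feature of feature_mm covers at distance_mm,
    for a pinhole camera with horizontal field of view hfov_deg imaged
    onto sensor_px horizontal pixels."""
    # width of the scene the camera sees at that distance
    scene_width_mm = 2 * distance_mm * math.tan(math.radians(hfov_deg) / 2)
    return feature_mm * sensor_px / scene_width_mm
```

With these numbers, a 0.33 mm EAN module at a comfortable 20 cm covers less than two pixels; to reach the roughly three pixels zbar likes you have to come within about 12 cm – which is exactly where a fixed-focus lens is already blurring badly.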

    Qropen.py tries to mitigate that by unsharp masking and potentially steep gammas. But when the lines are very thin to begin with – as with EAN stripes –, that does not really help. Which means that QR codes, perhaps somewhat surprisingly given their higher information content, in general work a lot better for qropen.py than the simple and ancient EAN codes.
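For the record, here is the principle of those two tricks on a toy one-dimensional "image" – this is an illustration of the technique, not qropen.py's actual code: subtract a blurred copy to exaggerate edges, then apply a steep gamma to push mid greys apart.

```python
def box_blur(row):
    """1-d box blur with a 3-wide kernel, edges clamped.
    Pixel values are floats in [0, 1]."""
    n = len(row)
    return [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3
            for i in range(n)]

def unsharp_mask(row, amount=1.0):
    """Original plus amount * (original - blurred): edge contrast grows."""
    blurred = box_blur(row)
    return [min(max(p + amount * (p - b), 0.0), 1.0)
            for p, b in zip(row, blurred)]

def apply_gamma(row, gamma=3.0):
    """A steep gamma pushes mid greys towards black; pure white stays white."""
    return [p ** gamma for p in row]
```

The catch the post describes shows up immediately: if a stripe is narrower than the blur kernel (i.e., one blurry pixel wide), there is no clean edge left for the mask to amplify.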

    There clearly is a moral to this part of the story. I'm just not sure which (beyond the triviality that EANs were invented for lasers rather than cameras).

  • Making Linux React to Power Gain and Loss

    Photo of a mains switch built into a power socket

    This is what this post is about: having a single switch for monitor, amplifier, and computer.

    I use an oldish notebook with a retired monitor and an amplifier I picked up from kerbside junk to watch TV („consume visual media“, if you prefer), and I want all that hardware to be switched on and off using a single power switch (see, um… Figure 1).

    Given that the notebook's battery still is good for many minutes, it's a perfectly reasonable stand-in for a UPS. Hence, my problem is quite like the one in the ancient days when big-iron servers had UPSes with just enough juice to let them shut down in an orderly fashion when the power grid was failing. This used to involve daemons watching a serial line coming in from the UPS. Today, with ACPI in almost every x86 box and batteries in many of them, it's quite a bit simpler.

    This post shows how to automatically power (up and) down with acpid. If you run systemd, you probably will have to do a few extra tweaks to keep it from interfering. Please write in if you figure them out or if things just work.

    Make the Box Wake Up On Power

    The first half is to configure the machine to wake up when mains power returns. Notebooks typically don't do that out of the box, but most ACPI firmwares can be configured that way. On my box, a Thinkpad X230 controlled by a legacy BIOS rather than UEFI, it is a setting in the BIOS setup pages[1]. If you boot through UEFI, you may be able to do this configuration from within Linux (please write in if you can provide details on that).

    Having said that, let me, perhaps only loosely relatedly, mention /proc/acpi/wakeup, which may play a role in this for you (although it does not on the X230). If you cat this file, you will see something like:

    LID       S4    *enabled   platform:PNP0C0D:00
    SLPB      S3    *enabled   platform:PNP0C0E:00
    IGBE      S4    *enabled   pci:0000:00:19.0
    EXP3      S4    *disabled  pci:0000:00:1c.2
    XHCI      S3    *enabled   pci:0000:00:14.0
    EHC1      S3    *enabled   pci:0000:00:1d.0
    EHC2      S3    *enabled   pci:0000:00:1a.0
    HDEF      S4    *disabled  pci:0000:00:1b.0

    Whatever is enabled here will wake the machine up, sometimes depending on whether it is hibernating or just suspended. There are various events that could cause a wakeup, such as when the lid is opened (in the ACPI lingo used here, LID), when a Wake-on-LAN packet arrives (IGBE), when the sleep/power button is pressed (SLPB), or when someone puts in a signal via USB (XHCI, EHC1, EHC2; typically, that would be a keyboard)[2]. To change this, you echo the respective string into the file, which toggles the enabledness:

    $ echo LID | sudo tee /proc/acpi/wakeup
    $ cat /proc/acpi/wakeup | grep LID
    LID       S4    *disabled  platform:PNP0C0D:00

    If there's no obvious BIOS setting for waking up the machine on power, look for something like PWR in /proc/acpi/wakeup. Incidentally, disabling wakeup sources here may actually conserve battery power when hibernating.
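In case you want to look at that table programmatically – say, from a monitoring script – parsing it is straightforward. A hypothetical helper (not something any of the scripts in this post use):

```python
def parse_wakeup(text):
    """Maps device names from /proc/acpi/wakeup to True (enabled) or
    False (disabled).  Lines that don't look like table rows are skipped."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("*enabled", "*disabled"):
            states[fields[0]] = fields[2] == "*enabled"
    return states
```

Call it as parse_wakeup(open("/proc/acpi/wakeup").read()) to get something like {"LID": True, "EXP3": False, ...}.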

    Make the Box Hibernate on Mains Loss

    The second half is that the machine should go into hibernation when I flip the central power switch. A straightforward way to get there is to talk to the acpid. It seems it is still standard on PC-style hardware even when there is systemd.

    So, let us configure it to call an appropriate script when it switches to battery mode (i.e., the power has gone). You can do that sufficiently well by writing:

    # /etc/acpi/events/battery
    # Called when AC power goes away and we switch to battery
    event=battery.*
    action=/etc/acpi/to-battery.sh

    to /etc/acpi/events/battery. The Debian-distributed acpid already has that file, but it calls the script power.sh, which, as delivered, does something entirely different; you could modify power.sh to your liking, but it's cleaner to use a different, custom script, for instance, because it is less hassle on dist-upgrades. Disclaimer: This will fire too often, namely both on power up and down. However, at least on my hardware that doesn't hurt, and it doesn't seem acpid generates different events for battery in/out.

    Then create the script /etc/acpi/to-battery.sh. I've written this there:

    #!/bin/sh
    # /etc/acpi/to-battery.sh -- hibernate unless mains power is (still or
    # again) present
    sleep 2
    if [ `cat /sys/class/power_supply/AC/online` -eq 1 ]; then
      exit 0
    fi

    # x230 specials; you probably won't need them
    buslist="pci i2c"
    for bus in $buslist; do
      for i in /sys/bus/$bus/devices/*/power/control; do
          echo on > $i
      done
    done

    sync
    logger "powerbutton-acpi-support enter"
    echo platform > /sys/power/disk
    echo disk > /sys/power/state
    # execution continues here after the wakeup
    logger "powerbutton-acpi-support leave"
    (sleep 12; ntpdate pool.ntp.org) &
    # this is just an example of an extra hack for resetting a TV
    # card that would be broken after the wakeup.
    (sleep 2; logger reloading tv; /usr/local/bin/uhubctl -l 1-1 -a cycle) &

    This thing first waits two seconds and then ensures AC is really gone before doing anything else; this is because on my box I occasionally received spurious power loss notifications, and hibernating the box just when something interesting was on TV has interrupted the rare moments of enjoyable programming a few times too often. Besides, this will catch cases where the battery event is generated by power coming back.

    After that, I'm running a few specials where I enable power management on the PCI and I²C busses of the machine. That has been broken for some reason or another on at least one kernel version or another on this particular box. I've left it in the script above as an inspiration for how you could intervene if something doesn't quite work and needs some fiddling.

    It then proceeds to sync the disk, just in case something goes wrong on suspend or resume and eventually does a low-level hibernation. You could probably use pm-hibernate or some systemd facility just as well, but I personally have found the direct operation of /sys/power to be at the same time the least hassle and the least fragile option (at least if you're prepared to write a few lines of script like the bus loop in my example).

    The last two commands – an NTP update and a hack to reset a USB device that is confused after a wakeup – are executed as part of the wakeup (but in background shells so the box is quickly responsive again). Adapt to your needs.

    Enjoy – and conserve energy by centrally pulling power from all the greedy little wall plug transformers.

    [1]On the X230, to change it I had to press Enter immediately after power-up, then F1, and then navigate to “Power On with AC Attach“ in the Config pane – but regrettably, there's nothing even resembling a standard there, and given this is tech supposedly obsolete since, what, 15 years, I don't think there will ever be one.
    [2]In case you're wondering what HDEF is: Well, it's audio, according to other things ACPI. What I don't know is how to make the audio hardware send a wakeup signal. I've tried plugging in a headphone, and that didn't work. If you know more… well, again, totally feel free to write in.
  • Will Thomas Watson Be Proved Right?

    This year's book-fair edition of Forschung aktuell on Deutschlandfunk was mostly about books on the future as such. In his introduction, Ralf Krauter said:

    The history of humankind is accordingly full of wildly wrong predictions. Perhaps the best known – or one of my favourites – you may already know. Tom Watson, the former IBM chief, once said in 1943: I think there is a world market for maybe five computers. It turned out differently, as we all know, but it goes to show: even the predictions of absolute experts are to be taken with a grain of salt.

    When I heard that, I spontaneously wanted to object that the verdict on Thomas Watson's prediction is still out. Of course there will be hundreds of billions of microprocessors for the foreseeable future, but most of them do things that would formerly have been done by discrete controllers or analogue electronics – they serve precisely not as the universal programmable devices Watson will have had in mind when he made his estimate.

    Many others are built into the late heirs of the teletypes and terminals of the mainframe era: the mobile phones, which today are often little more than input/output devices for a handful of large computers. Here you have to squint a little: if we grant Watson that he was talking about giant mainframes with many, many CPUs, then today's "clouds" of Google, Facebook, Microsoft, and Alibaba are each essentially one computer in Watson's sense. In this count – in which routers and end-user devices do not count, while each "hyperscaler's" data centres count as one computer – five or ten computers do indeed share the bulk of the computer use of a large part of humanity.

    The more dominant the model becomes in which dumb clients under the control of Apple or Google ("smartphones") deliver services running on a small number of infrastructures ("cloud"), the closer we come again to Watson's estimate. Of course, there still are proper computers in all sorts of hands. But how much people have already become used to the notion of owning only terminals, no longer computers, may be illustrated by an anecdote from my last train trip.

    I am sitting in the train from Würzburg to Bamberg; it is still in the station. I cannot help glancing at the screen next to me, and I am delighted to see someone typing halfway serious mathematics, using, of course, the great TeX. My delight dims somewhat at second glance, because the person is using Overleaf, a system in which you edit in a web browser while the TeX run happens on a server, which then sends the formatted pages back as images.

    I must admit I have never even begun to understand Overleaf. I ran TeX on my Atari ST, which had a thousandth of the RAM of the smallest machines sold today, and whose disk relates similarly to the size of the smallest SD card you can buy in a drugstore today. Granted, my Atari ST would be overwhelmed by today's huge LaTeX packages, but there is no factor of 1000 between the TeX of then and the TeX of now. LaTeX really is available everywhere, and there are well-maintained, working distributions. If you want to write together with others, there is a well-oiled git infrastructure to fall back on. Most importantly: the people of the vi tribe can edit with it, those of the emacs faction with their favourite editor; nobody is forced onto Overleaf's clunky stuff.

    But back to the anecdote:

    So, although he could perfectly well run TeX on his own machine, my neighbour just clicks around somewhat helplessly as we leave the station and, along with the WLAN, Overleaf's brain disappears. He then tries to re-establish the umbilical cord to the Overleaf computer with his phone, but (unluckily for Mr Watson) mobile network coverage in Germany is organised along market lines. Out in the flat country, where too few customers want to do Overleaf, the umbilical cord stays severed. Eventually he gives up and puts his non-computer away. Luckily for him, at least one game on his phone still works without an internet connection…

    That the present world apparently converges towards Thomas Watson's predictions is probably no coincidence. Watson was a gifted salesman, and he was talking about the market. At the latest since the WWW reached broad strata of the population, the internet, too, has been shaped more by gifted salespeople and "the market" than by tinkerers or nerds.

    If you consider a world with five computers a dystopia and do not want to leave the internet to "the market": it is not that hard, because Unix and the net at least still carry much of the heritage of the scientists and tinkerers who created them. See, for instance, the tags Fediverse and DIY on this blog.

  • Blog Extensions on Codeberg

    Screenshot of a browser window showing http://localhost:6070/foo and a fortune cookie in glorious ASCII.

    This post takes an odd bend to become an apology for CGI (as in common gateway interface) scripts. This is the netsurf browser communicating with the CGI shell script at the foot of this post.

    I have written a few plugins and extensions for this blog, and I have discussed them in a few posts (e.g., feedback form, tag explanations, cited-by links, or the search engine). The code implementing these things has been strewn across the various posts. I have to admit that having that code attached to just a few blog posts has always felt somewhat too early-90iesy to me.

    Now that I have created my Codeberg account, I have finally copied together all the various bits and pieces to create a repository on Codeberg that you are welcome to clone if you're running pelican or perhaps some other static blog engine. And of course I appreciate merge requests with improvements.

    There is one major file in there I have not previously discussed here: cgiserver.py. You see, I'm a big fan of CGI scripts. They're reasonably simple to write, trivial to deploy, and I have CGIs that have been working with minimal maintenance for more than 20 years. Sure, pulling up an interpreter for every request is not terribly efficient, but for your average CGI that is perhaps called a dozen times per day (depending on how many web crawlers find it interesting) this really doesn't matter. And that's why both the feedback script and the search engine are written as CGIs.

    However, in contrast to apache, nginx (which serves this blog) does not support CGI scripts. I even by and large agree with their rationale for that design decision. Still, I would like to run CGIs, and that's why I've written the cgiserver. It is based on Python's built-in HTTP server and certainly will not scale – but for your average blog (or similar site) it should be just fine. And I don't think it has any glaring security problems (that you don't introduce with your CGIs, that is).

    Installation is almost trivial: put the file somewhere (the in-source sysvinit script assumes /var/www/blog-media/cgiserver.py, but there's absolutely no magic about this), and then run it with a port number (it will only bind to localhost; the default in the sysvinit script is 6070) and a directory into which you put your CGI scripts (the sysvinit script assumes /var/www/blog-media/cgi).
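If you just want to see the principle without fetching my script, the standard library can already do most of this. The following sketch is not cgiserver.py itself (which adds a bit more); it serves CGIs from a cgi-bin subdirectory of the working directory, bound to localhost only. Note that CPython has deprecated CGIHTTPRequestHandler as of 3.13 – one more reason a small self-maintained server is attractive:

```python
from http.server import HTTPServer, CGIHTTPRequestHandler

class LocalCGIHandler(CGIHTTPRequestHandler):
    # scripts live in ./cgi-bin relative to the working directory
    cgi_directories = ["/cgi-bin"]

def run(port=6070):
    # bind to localhost only, like cgiserver.py does
    server = HTTPServer(("127.0.0.1", port), LocalCGIHandler)
    server.serve_forever()

if __name__ == "__main__":
    run()
```

With this running, an executable script in ./cgi-bin/foo answers at http://localhost:6070/cgi-bin/foo.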

    When you have a cgi script foo, you can dump it in this directory, make it executable and then run it by retrieving http://localhost:6070/foo. In case you have nothing else, you can try a shell script like:

    #!/bin/sh
    echo "content-type: text/plain"
    echo
    /usr/games/fortune

    (which of course only works in this form if you have something like fortunes-en installed on a Debian box). That should be enough to give you something like the screenshot opening this post. Even more than 25 years after I have written my first CGI, I am still amazed how simple this is.

    Disclaimer: Writing CGI scripts that take input such that they are not trivially exploitable is higher art. So… don't do it, except as a game. Oh, and to debug your scripts, simply let cgiserver run in a terminal – that way, you will see what your scripts emit on stderr. Note, however, that the way the sysvinit script starts cgiserver, it will run as nobody; if things work when you start cgiserver yourself but not when it's running as a daemon, that's the most likely reason.

  • Maintaining Static Blogs Using git push

    local                server
    main  --- push --->   main
                            | (merge)
                       published --- make publish --->  nginx
    Fig 1.  Our scheme in classic ASCII art.

    In my post on how I'm using pelican – the static blog engine that formats this site –, I had described that on a make install, I would do a local build (make publish) and then rsync the result to the production site. Since about June, I no longer do that, because the way pelican works – it touches every generated file every time – is not a good match for rsync. With a growing site, this means a substantial amount of data (well: a few megabytes for me at this time) is being transferred. What's a few megabytes these days, you ask? Well, ever since UMTS has been shut down, on the road all I have is GPRS (i.e., 10 kB/s with a bit of luck), and then a few megabytes is a lot.

    I hence finally changed things to benefit from the fact that I keep the text content in a version control system. For a post without media, all that needs to be transferred are a few kilobytes for a git push. Here is how that is done (assuming a Debian-like setup).

    First, unless your source directory already is under git version control, in there run:

    git init
    git add Makefile content plugins pelicanconf.py publishconf.py theme tasks.py
    git commit -am "Migrating into git"

    You will probably also want to have a .gitignore, and then probably several other files on top, but that's beside the current point.

    Two Repos, Two Branches

    The rough plan is to have a complete, checked-out git repository on the server side (ahem: see Figure 1). It is updated from your local repo through pushes. Since you cannot push into a checked-out branch, the server-side repository has a branch published checked out, while your authoring happens in the main (traditionally called master) branch. After every push, main is merged into published, and then pelican's site generation runs.

    A word of warning: these merges will fail when you force-push. Don't do that. If you do, you will have to fix the breakage on the server side, either by dropping and re-creating the published branch, or by massaging all places that a force-pushed commit changed.

    To set this up, on the web server do (adapting to your site and taste if you don't like the path):

    sudo mkdir -p /var/blog/source
    sudo chown `id -u` /var/blog/source # you'll be pushing as yourself
    cd /var/blog/source
    # create a git repo you can push into
    git init
    # go away from the main/master branch so you can push into it
    git checkout -b published

    Then, in your local git repository for the blog, add the repository you just created as a remote named prod and push the main branch (this assumes you have the main branch checked out):

    git remote add prod ssh://USER@SERVER.NAME//var/blog/source
    git push prod

    On the remote server, you are still on the published branch, and hence you will not see what you have just pushed. You have to merge main using:

    git merge main

    (or master, if that's still the name of your main branch). You should now see whatever you have put into your local git above. If that's true, you can say make publish and see your publishable site in the output subdirectory. If it's not true, start debugging by making sure your main branch on the server side really contains what you think you have pushed.

    Automating the Publication

    This completes the basic setup. What is still missing is automation. That we can do with a git hook (see the githooks man page for more information on that nifty stuff) that is installed on the server side into /var/blog/source/.git/hooks/post-update. This file contains a shell script that is executed each time commits are pushed into a repository once git has updated everything. In this case, it is almost trivial, except for some bookkeeping and provisions for updating the search engine (all lines with BLOG_ROOT in them; delete these when you have not set that up):

    # This hook merges the main branch, builds the web page, and does
    # housekeeping.
    # This *assumes* we have the published branch checked out.  It should
    # probably check that one day.
    set -e
    unset GIT_DIR # this is important, since we're manipulating the
       # working tree, which is a bit uncommon in a post-update hook.
    cd ..
    git merge master
    make publish
    BLOG_DIR=$BLOG_ROOT/source/output $BLOG_ROOT/media/cgi/blogsearch

    Do not forget to chmod +x that file, or git will ignore it.

    Again on the local side, you have to modify your install target to something like:

    # adapt the paths!
    rsync:
    	rsync --info=progress2 -av /var/www-local/blog-media/ blog.tfiu.de:/var/blog/media/

    install: rsync
    	-git commit -a
    	git push -u prod master

    (the - in front of the git commit is because git returns non-zero if there is nothing to commit; in the present case, you may still want to push, perhaps because previous commits have not been pushed, and hence we tell make to not bother about the status of git commit).

    With this path and the separate media directory still updated through rsync (cf. the previous post on this), an nginx config would have to contain lines like:

    location / {
      root /var/blog/source/output;
    }
    location /media/ {
      alias /var/blog/media/;
    }

    This setup has worked nicely and without a flaw in the past few months. It makes a lot more sense than my previous setup, not least because any junk that may accumulate in my local output directory while I'm fooling around will not propagate to the production server. So: If you work with pelican or a similar static blog generator, I'd say this is the way to partial bliss.

  • Bahnauskuft auf antiken Geräten – und auf Codeberg

    Foto: Altes Mobiltelefon mit Terminal, das eine etwas kryptische Bahnauskunft zeigt

    Bahnauskunft von 2022 auf einem Nokia N900 von 2009: Es braucht inzwischen etwas Mühe, um das gebastelt zu kriegen.

    Als die Bahn-Webseite nicht mehr ordentlich auf kompakten Browsern wie dillo funktionierte und auch nicht per WAP– also Mitte der 2010er Jahre –, habe ich mir ein ein kleines Skript geschrieben, das die wesentlichen Infos zur Zugauskunft aus dem HTML herausklaubte und dann in einem einfachen Kommandozeilen-Interface darstellte. Das war, worum es im letzten Sommer bei meinem Rant gegen Zwangs-Redirects umittelbar ging. Der weitere Hintergrund: Ich will Zugauskünfte von meinem alten Nokia N900 aus bekommen (und im Übrigen seit der Abschaltung von UMTS auch über eine 2G-Funkverbindung, also etwas wie 10 kB/s)[1].

    After that went well for a surprisingly long time – at least by the standards of programs that pick apart the HTML of web sites – it broke recently in the course of the current disimprovement of the Bahn site. On top of that, the Javascript sauce on bahn.de has become so impenetrable in the process that any desire to maintain the script has left me for good. In this situation, a talk on the Bahn APIs that someone gave at the Gulasch-Programmiernacht 2019 came in just right. So: the video of it.

    In this video I learned that the “unpromising” in my rant a year ago,

    I know bahn.de has a proper API, too, and I'm sure it would be a lot faster if I used it, but alas, my experiments with it were unpromising [...],

    has a deep background: Deutsche Bahn, it turns out, has no API at all for timetable queries.

    What there is instead: the HAFAS API, on which the trip planner of the Bahn app itself is built. And it turns out that people have already figured out, with a great deal of diligence, how it works – for instance in pyhafas.

    With pyhafas, I can replace all the horrible HTML parsing in the old bahnconn.py with a few calls into pyhafas. But alas: pyhafas is thoroughly modern Python, and since it can do far more than bahnconn.py would need, backporting it to Python 2.5 would be a serious project – and Python 2.5 is all my N900 has. Besides, I am an avowed fan of one-module-plus-stdlib programs: they need no installation and run on anything that can digest Python in some fashion, for instance jython or the like – which becomes questionable at the latest when dependencies contain C code.

    So I cribbed from pyhafas the things bahnconn urgently needs and put together a minimal, Python-2.5-compatible implementation. The result: a new bahnconn. Grab it if you want train information on older devices. I have not tried it on Atari TTs, but I can well imagine it is still usable even there.


    Just as I was about to simply dump the code here on the blog again, I decided this might be a good occasion to finally take a second look at Codeberg.

    So far, for all somewhat longer-lived or larger code (that is: not just stuff dumped onto the blog), I have run my own Subversion repository, entirely DIY. Whatever came along in recent years went into git+ssh+cgit.

    Of course, nobody has looked at any of that any more; not even search engines bother with such things, now that all the code in the world ends up on github. For that reason, and also because I really do not want monsters like gitea and gitlab on my machine (though: cgit is fine and would suffice for publication at the subversion level), I have more or less resigned myself to the thought that my stuff may be better off on a public platform.

    On Github I already spend much too much of my work life, and the outfit is decidedly too close to surveillance capitalism. True, I know enough projects and companies that give them money, so they could surely run a conventionally capitalist business model; but even there my faith fails me. On top of that, Microsoft has caused me so much grief in my life that I do not want to drive yet more customers to them (or rather, to their subsidiary).

    Codeberg, on the other hand, is run by a non-profit association and generally does a lot of things right, down to surfacing Javascript exceptions (why doesn't everyone do that?), so the site is not just silently broken when I forbid Local Storage (gitea, the software Codeberg is built on as far as I can tell, unfortunately still cannot do without it).

    That gitea cramp aside, everything worked nicely yesterday; nothing about the sign-up procedure was nasty or unreasonable. Codeberg hereby receives, for now, the Anselm Flügel Seal of Approval. I think I will move more code there. And give some serious thought to donating.

    [1]Well, yes: of course the bloated Bahn web site with all the nonsense on it annoyed me on the desktop, too, and already before the current disimprovement.
  • How to Block a USB Port on Smart Hubs in Linux

    Lots of computer components (a notebook computer with its cover removed)

    Somewhere beneath the fan on the right edge of this image there is breakage. This post is about how to limit the damage in software until I find the leisure to dig deeper into this pile of hitech.

    My machine (a Lenovo X240) has a smart card reader built in, attached to its internal USB. I don't need that device, but until a while ago it did not really hurt either. Yes, it may draw a bit of power, but I'd be surprised if that were more than a few milliwatts or, equivalently, one level of screen backlight brightness; at that level, not even I will bother.

    However, two weeks ago the thing started to become flaky, presumably because the connecting cable is starting to rot. The symptom is that the USB stack regularly re-registers the device, spewing a lot of characters into the syslog, like this:

    Aug 20 20:31:51 kernel: usb 1-1.5: USB disconnect, device number 72
    Aug 20 20:31:51 kernel: usb 1-1.5: new full-speed USB device number 73 using ehci-pci
    Aug 20 20:31:52 kernel: usb 1-1.5: device not accepting address 73, error -32
    Aug 20 20:31:52 kernel: usb 1-1.5: new full-speed USB device number 74 using ehci-pci
    Aug 20 20:31:52 kernel: usb 1-1.5: New USB device found, idVendor=058f, idProduct=9540, bcdDevice= 1.20
    Aug 20 20:31:52 kernel: usb 1-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=0
    Aug 20 20:31:52 kernel: usb 1-1.5: Product: EMV Smartcard Reader
    Aug 20 20:31:52 kernel: usb 1-1.5: Manufacturer: Generic
    Aug 20 20:31:53 kernel: usb 1-1.5: USB disconnect, device number 74
    Aug 20 20:31:53 kernel: usb 1-1.5: new full-speed USB device number 75 using ehci-pci
    [as before]
    Aug 20 20:32:01 kernel: usb 1-1.5: new full-speed USB device number 76 using ehci-pci
    Aug 20 20:32:01 kernel: usb 1-1.5: New USB device found, idVendor=058f, idProduct=9540, bcdDevice= 1.20
    Aug 20 20:32:01 kernel: usb 1-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=0
    [as before]
    Aug 20 20:32:02 kernel: usb 1-1.5: USB disconnect, device number 76

    And that keeps coming back, sometimes after a few seconds, sometimes after a few tens of minutes. Noise in the syslog is never a good thing (even when you don't scroll syslog on the desktop), as it will one day obscure something one really needs to see, and given that device registrations involve quite a bit of computation, this is also likely to become relevant power-wise. In short: this has to stop.

    One could just remove the device physically or at least unplug it. Unfortunately, in this case that is major surgery, which in particular would involve the removal of the CPU heat sink. For that I really want to replace the thermal grease, and I have not been to a shop that sells that kind of thing for a while. So: software to the rescue.

    With suitable hubs – the X240's internal hub with the smart card reader is one of them – the tiny utility uhubctl lets one cut power to individual ports. Uhubctl regrettably is not packaged yet; you hence have to build it yourself. I'd do it like this:

    sudo apt install git build-essential libusb-dev
    git clone https://github.com/mvp/uhubctl
    cd uhubctl
    prefix=/usr/local/ make
    sudo env prefix=/usr/local make install

    After that, you have a program /usr/local/sbin/uhubctl that you can run (as root or through sudo, as it needs elevated permissions) and that then tells you which of the USB hubs on your system support power switching, and it will also tell you about devices connected. In my case, that looks like this:

    $ sudo /usr/local/sbin/uhubctl
    Current status for hub 1-1 [8087:8000, USB 2.00, 8 ports, ppps]
      Port 1: 0100 power
      Port 5: 0107 power suspend enable connect [058f:9540 Generic EMV Smartcard Reader]

    This not only tells me the thing can switch off power, it also tells me the flaky device sits on port 5 on the hub 1-1 (careful inspection of the log lines above will reconfirm this finding). To disable it (that is, power it down), I can run:

    $ sudo /usr/local/sbin/uhubctl -a 0 -l 1-1 -p 5

    (read uhubctl --help if you don't take my word for it).

    Unfortunately, we are not done yet. The trouble is that the device will wake up the next time anything touches anything in the wider vicinity of that port – for instance, running uhubctl itself. To keep the system from trying to wake the device up, you also need to instruct the kernel to keep its hands off. For our port 5 on the hub 1-1, that's:

    $ echo disabled > /sys/bus/usb/devices/1-1.5/power/wakeup

    or rather, because you cannot write to that file as a normal user and I/O redirection is done by your shell and hence wouldn't be influenced by sudo:

    $ echo disabled | sudo tee /sys/bus/usb/devices/1-1.5/power/wakeup

    That, indeed, shuts the device up.

    Until the next suspend/resume cycle that is, because these settings do not survive across one. To solve that, arrange for a script to be called after resume. That's simple if you use the excellent pm-utils. In that case, simply drop the following script into /etc/pm/sleep.d/30killreader (or so) and chmod +x the file:

    #!/bin/sh
    case "$1" in
        resume|thaw)
            echo disabled > /sys/bus/usb/devices/1-1.5/power/wakeup
            /usr/local/sbin/uhubctl -a 0 -l 1-1 -p 5
            ;;
    esac
    exit 0

    If you are curious what is going on here, see /usr/share/doc/pm-utils/HOWTO.hooks.gz.

    However, these days it is rather unlikely that you are still leaving suspend and hibernate to pm-utils; instead, on your box this will probably be handled by systemd-logind. You could run pm-utils next to that, I suppose, if you tactfully configured the host of items with baroque names like HandleLidSwitchExternalPower in logind.conf, but, frankly, I wouldn't try that. Systemd's reputation for wanting to manage it all is not altogether undeserved.

    I tried to smuggle my own code into logind's wakeup procedures years ago, in systemd's infancy, and found it hard if not impossible. I'm sure it is simpler now. If you know a good way to make logind run a script when resuming: Please let me know. I promise to amend this post for the benefit of people running systemd (which, on some suitable boxes, does include me).
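For what it's worth, systemd itself (strictly speaking systemd-sleep rather than logind) runs executables from /usr/lib/systemd/system-sleep/ around suspend and hibernate, passing "pre" or "post" as the first argument; see systemd-suspend.service(8). I have not tried this on the X240 myself, so take the following as an untested sketch along the lines of the pm-utils hook above rather than a recipe:

```shell
#!/bin/sh
# Untested sketch of a systemd system-sleep hook; install it executable as,
# say, /usr/lib/systemd/system-sleep/30killreader. systemd-sleep calls it
# with $1 = "pre" before and $1 = "post" after suspend or hibernate.
case "$1" in
    post)
        echo disabled > /sys/bus/usb/devices/1-1.5/power/wakeup
        /usr/local/sbin/uhubctl -a 0 -l 1-1 -p 5
        ;;
esac
```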

  • PSA: netsurf 3 does not accept cookies from localhost

    As I have already pointed out in April, I consider simple and compact web browsers a matter of freedom (well, Freedom as in speech, actually), and although there's been a bit of talk about ladybird lately, my favourite in this category still is netsurf, which apparently to this date is lean enough to be runnable on vintage 1990 Atari TT machines. I'll freely admit I have not tried it, but the code is there.

    Yesterday, however, netsurf drove me crazy for a while: I was developing a web site, making sure it works with netsurf. This website has a cookie-based persistent login feature, and that didn't work. I sent my Set-Cookie headers all right – ngrep is your friend if you want to be sure, somewhat like this:

    sudo ngrep -i -d lo cookie port 8888

    Ngrep also clearly showed that netsurf really did not send any Cookie headers, so the problem wasn't on the cookie header parsing side of my program, either.

    But why did the cookies disappear? Cookie policy? Ha: netsurf does accept a cookie from Google, and crunching this would be the first thing any reasonable policy would do. Did I perhaps fail to properly adhere to the standards (which is another thing netsurf tends to uncover)? Hm: looking up the cookie syntax spec gave me some confidence that I was doing the right thing. Is my Max-Age ok? Sure, it is.

    The answer to this riddle: netsurf does not store cookies if it cannot sort them into a hierarchy of host names, and it never can do that for host names without dots (as in localhost, for instance). Given the ill-thought-out Domain attribute one can set for cookies (see the spec linked above if you want to shudder), I even have a solid amount of sympathy for that behaviour.

    But given that this is something that will probably bite a lot of people who care about freedom enough to bother with netsurf, I am still a bit surprised that my frantic querying of search engines on the matter did not bring up netsurf's slightly unconventional cookie handling. Let us hope this post's title will change that. Again: netsurf 3 will not store cookies, not only for localhost, but for any host name without dots in it. Which is a bit inconvenient for development, and hence, despite my sympathy, I am considering a bug report.
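Netsurf's real check goes through its public-suffix machinery, but the observable rule can be re-enacted as a toy shell function (purely illustrative; this is not code netsurf runs):

```shell
# Toy stand-in for netsurf's observable behaviour: a cookie is only
# stored when the host name contains at least one dot.
would_store_cookie() {
    case "$1" in
        *.*) echo "stored"  ;;
        *)   echo "dropped" ;;
    esac
}

would_store_cookie localhost         # prints "dropped"
would_store_cookie victor.local.de   # prints "stored"
```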

    Meanwhile, I've worked around the problem by adding a line like

    127.0.0.1 victor.local.de

    to my /etc/hosts (the name really doesn't matter as long as it will never clash with an actual site you want to talk to and it contains one or more dots) and accessing the site I'm developing as http://victor.local.de. Presto: my cookie comes back from netsurf all right.

    A Debugging Session

    So, how did I figure this riddle out? The great thing about Debian and halfway compact software like netsurf is that it makes it reasonably simple to figure out such (mis-) features. Since I firmly believe that the use of debuggers is a very basic skill everyone touching a computer should have, let me give a brief introduction here.

    First, you need to get the package's source. Make sure it matches the version of the program that you actually run; to do that, copy the deb line in /etc/apt/sources.list for the repository the package comes from (note that this could be the security repo if you got updates from there). In the copied line, replace deb with deb-src. In my case, that would be:

    deb-src https://deb.debian.org/debian bullseye main

    On a freshly installed Debian, it's likely you already have a line like this; consider commenting out the deb-src lines when not working with source code, as that will make your apt operations a bit faster.

    After an apt update, I can now pull the source. To keep your file system tidy, I put all such sources into children of a given directory, perhaps /usr/src if you're old-school, or ~/src if not:

    mkdir -p src/netsurf
    cd src/netsurf
    apt-get source netsurf-gtk

    I'm creating the intermediate netsurf directory because apt-get source creates four items in the directory, and in case you're actually building a package (which you could, based on this), more entries will follow; keeping all that mess outside of src helps a lot. Note that apt-get source does not need any special privileges. You really should run it as yourself.

    By the way, this is the first part where monsters like webkit make this kind of thing really strenuous: the libwebkit sources (which still fall well short of a full browser) pull 26 megabytes of archive expanding to a whopping 300 megabytes of source-ish goo.

    To go on, enter the directory that apt-get source created; in my case, that was netsurf-3.10. You can now look around, and something like:

    find . -name "*.c" | xargs grep "set-cookie"

    quickly brought me to a file called netsurf/content/urldb.c (yeah, you can use software like rgrep for „grep an entire tree“; but then the find/xargs combo is useful for many other tasks, too).

    Since I still suspected a problem when netsurf parses my set-cookie header, the function urldb_parse_cookie in there caught my eye. It's not pretty that that function is such an endless beast of hand-crafted C (rather than a few lines of lex[1]), but it's relatively readable C, and they are clearly trying to accommodate some of the horrible practices out there (which is probably the reason they're not using lex), so just looking at the code cast increasing doubts on my hypothesis of some minor standards breach on my end.

    In this way, idly browsing the source code went nowhere, and I decided I needed to see the thing in action. In order to not get lost in compiled machine code while doing that, one needs debug symbols, i.e., information that tells a debugger what compiled stuff resulted from what source code. Modern Debians have packages with these symbols in an extra repository; you can guess the naming scheme from the sources.list line one has to use for bullseye:

    deb http://debug.mirrors.debian.org/debian-debug bullseye-debug main

    After another round of apt update, you can install the package netsurf-gtk-dbgsym (i.e., just append a -dbgsym to the name of the package that contains the program you want to debug). Once that's in, you can run the GNU debugger gdb:

    gdb netsurf

    which will drop you into a command line prompt (there's also a cool graphical front-end to gdb in Debian, ddd, but for little things like this I've found plain gdb to be less in my way). Oh, and be sure to do that in the directory with the extracted sources; only then can gdb show you the source lines (ok: you could configure it to find the sources elsewhere, but that's rarely worth the effort).

    Given we want to see what happens in the function urldb_parse_cookie, we tell gdb to come back to us when the program enters that function, and then to start the program:

    (gdb) break urldb_parse_cookie
    Breakpoint 1 at 0x1a1c80: file content/urldb.c, line 1842.
    (gdb) run
    Starting program: /usr/bin/netsurf

    With that, netsurf's UI comes up and I can go to my cookie-setting page. When I try to set the cookie, gdb indeed stops netsurf and asks me what to do next:

    Thread 1 "netsurf" hit Breakpoint 1, urldb_parse_cookie (url=0x56bcbcb0,
        cookie=0xffffbf54) at content/urldb.c:1842
    1842  {
    (gdb) n
    1853    assert(url && cookie && *cookie);

    n (next) lets me execute the next source line (which I did here). Other basic commands include print (to see values), list (to see code), s (to step into functions, which n will just execute as one instruction), and cont (which resumes execution).

    In this particular debugging session, everything went smoothly, except I needed to skip over a loop that was boring to watch stepping through code. This is exactly what gdb's until command is for: typing it at the end of the loop will fast forward over the loop execution and then come back once the loop is finished (at which point you can see what its net result is).

    But if the URL parsing went just fine: Why doesn't netsurf send back my cookie?

    Well, tracing on after the function returned eventually led to this:

    3889      suffix = nspsl_getpublicsuffix(dot);
    3890      if (suffix == NULL) {

    and a print(suffix) confirmed: suffix for localhost is NULL. Looking at the source code (you remember the list command, and I usually keep the source open in an editor window, too) confirms that this makes netsurf return before storing the freshly parsed cookie, and a cookie not stored is a cookie not sent back to the originating site. Ha!

    You do not want to contemplate what such a session would look like with a webkit browser or, worse, firefox or chromium, not to mention stuff you don't have the source for …

  • Quick RST Previews for Posts in Pelican

    In January, I described how I use this blog's engine, pelican, and how I have a “development” and a “production” site (where I will concede any time that it's exceedingly silly to talk about “production” in this context). Part of that was a trivial script, remake.sh, that I would run while writing and revising a post to format it without doing too much unnecessary work. This script was running between a couple and a couple of dozen times until I was happy with an article.

    What the script did was call pelican asking to only write the document being processed. When pelican was instructed to cache work on the other articles, that was enough to keep build times around a second on my box; but as the number of posts on this blog approaches 200, build times ended up on the totally wrong side of that second, and I thought: “Well, why don't I run, perhaps, rst2html for formatting while revising?” That would be, essentially, instantaneous.

    But pelican does a lot more than rst2html. Especially, having the plugins and the templating available is a good thing when inspecting a post. So, I got to work and figured out how pelican builds a document. The result is a script build-one that only looks at a single (ReStructuredText) article – which it gets from its command line – and ignores everything else.

    This is fast enough to be run whenever I save the current file. Therefore, in my pelican directory I now have, together with the script, the following .vimrc enabling just that (% expands to the file currently edited in vim):

    augroup local
      autocmd BufWritePost *.rst !python build-one %
    augroup END

    I've briefly considered whether I should also add some trick to automatically reload a browser window when saving but then figured that's probably overdoing things: In all likelihood I want to scroll around in the rendered document, and hence I will have to focus it anyway. And once I am focusing it anyway, the effort saved by not having to press r feels negligible.

    The script does have an actual drawback, though: Since pelican does not get to scan the file system with build-one, it cannot do file name substitution (as in {filename}2022-05-26.rst) and will instead warn whenever it sees one of these. Since, as described in January, my static files are not managed by pelican, that is not a serious problem in my setup, except that I have to watch out for broken substitutions when doing a final make html (or the make install).

    Insights into Pelican

    It took me a bit to figure out how the various parts of pelican fit together at least to the extent of letting me format a ReStructuredText document with the jinja templates. Let me therefore briefly discuss what the script does.

    First, to make pelican do anything remotely resembling what it will do on make html, you have to load its settings; since I assume I am running in pelican's directory and this is building a “draft” version, I can simply do:

    settings = pelican.read_settings("pelicanconf.py")

    With that, I already know where to write to, which lets me construct a writer object; that will later arrange for actually placing the files. I can also construct a reader for my ReStructuredText files (and you would have to change that if you are writing in Markdown); these readers decouple the Article class from input formats:

    writer = writers.Writer(settings["OUTPUT_PATH"], settings)
    reader = readers.RstReader(settings)

    With that, I have to delve deep into pelican's transformation machinery, which consists of various generators – for articles, static files, pages, whatever. The constructors of these generator classes (which are totally unrelated to Python generators) take a lot of arguments, and I cannot say I investigated why they insist on having them passed in when I fill them with data from settings anyway (as does pelican itself); but then I suspect these extra arguments are important for non-Article generators. I only need to generate a single article, and so stereotypically writing:

    artgen = generators.ArticlesGenerator(
        settings.copy(), settings,
        settings["PATH"], settings["THEME"], settings["OUTPUT_PATH"])

    does the trick for me.

    Article generators will usually collect the articles to generate by looking at the file system. I don't want that; instead, I want to construct an Article instance myself and then restrict the generator's action to that.

    The Article class needs to be constructed with content and metadata, which happen to be what readers return. So, to construct an Article from the RST file passed in in source_path, I need to say:

    content, metadata = reader.read(source_path)
    art = contents.Article(content, metadata,
        source_path=source_path, settings=settings)

    After all that preparation, all that is left to do is overwrite any misguided ideas the article generator might have on what I would like to have processed and then let it run:

    artgen.translations = []
    artgen.articles = [art]
    artgen.generate_articles(
        functools.partial(writer.write_file, relative_urls=True))

    (you can probably do without currying the writer's write_file method to make it produce relative URLs, but I'm a fan of relative URLs and of almost anything in functools).
