Tag blog

  • Feedback and Addenda in Pelican Posts

    Screenshot: a (relatively) rude comment and a reply, vaguely reminiscent of classic slashdot style.

    Blog comments may be dead out there; here, I'd like to at least pretend they're still alive, and thus I've written a pelican plugin to properly mark them up.

    When I added a feedback form to this site about a year ago, I also created a small ReStructuredText (RST) extension for putting feedback items into the files I feed to my blog engine Pelican. The extension has been sitting in my pelican plugins repo on codeberg since then, but because there has not been a lot of feedback on either it or the posts here (sigh!), that was about it.

    But occasionally a few interesting (or at least provocative) pieces of feedback did come in, and I thought it was a pity that basically nobody would notice them[1] or, (cough) much worse, my witty replies.

    At the same time, I had quite a few addenda to older articles, and I felt some proper markup for them (plus better chances for people to notice they're there) would be nice. After a bit of consideration, I figured the use cases are similar enough, and I started extending the feedback plugin to cover addenda, too. So, you can pull the updated plugin from codeberg now. People running it on their sites would certainly be encouragement for me to add it to the upstream plugin collection (after some polishing, that is).

    Usage is simple – after copying the file to your plugins folder and adding "rstfeedback" to PLUGINS in pelicanconf.py, you write:

    .. feedback::
        :author: someone or other
        :date: 2022-03-07
    
        Example, yadda.
    

    for some feedback you got (you can nest these for replies) or:

    .. addition::
      :date: 2022-03-07
    
      Example, yadda.
    

    for some addition you want to make to an article; always put in a date in ISO format.

    In both cases a structured div element is generated in the HTML, which you can style in some way; the module comment shows how to get what's shown in the opening figure.

    The extension also adds a template variable LAST_FEEDBACK_ITEMS containing a list of the last ten changes to old posts. Each item is an instance of some ad-hoc class with attributes url, kind (feedback or addendum), the article title, and the date. On this blog, I'm currently formatting it like this in my base template:

    <h2>Letzte Ergänzungen</h2>
    <ul class="feedback">
    {% for feedback_item in LAST_FEEDBACK_ITEMS %}
            <li><a href="{{ SITEURL }}/{{ feedback_item.url }}">{{ feedback_item.kind }} zu „{{ feedback_item.title }}“</a> ({{ feedback_item.date }})</li>
    {% endfor %}
    </ul>
    

    As of this post, this block is at the very bottom of the page, but I plan to give it a more prominent place at least on wide displays real soon now. Let's see when I feel like a bit of CSS hackery.

    Caveats

    First of all, I have not localised the plugin, and for now it generates German strings for “Kommentar” (comment), “Nachtrag” (addendum) and “am” (on). This is relatively easy to fix, in particular because I can tell an article's language from within the extension from the article metadata. True, that doesn't help for infrastructure pages, but then these won't have additions anyway. If this found a single additional user, I'd happily put in support for your preferred language(s) – I should really be doing English for this one.

    This will only work with pages written in ReStructuredText; no markdown here, sorry. Since in my book RST is so much nicer and better defined than markdown and at the same time so easy to learn, I can't really see much of a reason to put in the extra effort. Also, legacy markdown content can be converted to RST using pandoc reasonably well.

    If you don't give a slug in your article's metadata, the plugin uses the post's title to generate a slug like pelican itself does by default. If you changed that default, the links in the LAST_FEEDBACK_ITEMS will be wrong. This is probably easy to fix, but I'd have to read a bit more of pelican's code to do it.

    I suppose the number of recent items – now hardcoded to be 10 – should become a configuration variable, which again ought to be easy to do. A more interesting (but also more involved) additional feature could be to have per-year (say) collections of such additions. Let's say I'm thinking about it.

    Oh, and error handling sucks. That would actually be the first thing I'd tackle if other people took the rstfeedback plugin up. So… If you'd like to have these or similar things in your Pelican – don't hesitate to use the feedback form (or even better your mail client) to make me add some polish to the code.

    [1]I made nginx write logs (without IP addresses, of course) for a while recently, and the result was that there's about a dozen human visitors a day here, mostly looking at rather recent articles, and so chances are really low anyone will ever see comments on old articles without some extra exposure.
  • Blog Extensions on Codeberg

    Screenshot of a browser window showing http://localhost:6070/foo and a fortune cookie in glorious ASCII.

    This post takes an odd turn to become an apology for CGI (as in common gateway interface) scripts. This is the netsurf browser communicating with the CGI shell script at the foot of this post.

    I have written a few plugins and extensions for this blog, and I have discussed them in a few posts (e.g., feedback form, tag explanations, cited-by links, or the search engine). The code implementing these things has been strewn across the various posts. I have to admit that having that code attached to just a few blog posts has always felt somewhat too early-nineties to me.

    Now that I have created my Codeberg account, I have finally copied together all the various bits and pieces to create a repository on Codeberg that you are welcome to clone if you're running pelican or perhaps some other static blog engine. And of course I appreciate merge requests with improvements.

    There is one major file in there I have not previously discussed here: cgiserver.py. You see, I'm a big fan of CGI scripts. They're reasonably simple to write, trivial to deploy, and I have CGIs that have been working with minimal maintenance for more than 20 years. Sure, pulling up an interpreter for every request is not terribly efficient, but for your average CGI that is perhaps called a dozen times per day (depending on how many web crawlers find it interesting) this really doesn't matter. And that's why both the feedback script and the search engine are written as CGIs.

    However, in contrast to apache, nginx (which serves this blog) does not support CGI scripts. I even by and large agree with their rationale for that design decision. Still, I would like to run CGIs, and that's why I've written the cgiserver. It is based on Python's built-in HTTP server and certainly will not scale – but for your average blog (or similar site) it should be just fine. And I don't think it has any glaring security problems (that you don't introduce with your CGIs, that is).
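
    In case you are curious what such a thing can look like: here is a minimal sketch built on the standard library's CGIHTTPRequestHandler. It is not the cgiserver.py from the repository (that one, among other things, serves scripts directly at the URL root, while this sketch wants them under a /cgi/ prefix), just an illustration of how little is needed:

    # Sketch only, not the repo's cgiserver.py: a localhost-only CGI server
    # on top of Python's built-in HTTP machinery.
    # Run as e.g.: python3 cgisketch.py 6070 /var/www/blog-media
    import http.server
    import os
    import sys

    def main():
        port, parent_dir = int(sys.argv[1]), sys.argv[2]
        os.chdir(parent_dir)  # scripts are expected in <parent_dir>/cgi

        class Handler(http.server.CGIHTTPRequestHandler):
            # anything below these directories is executed as a CGI
            cgi_directories = ["/cgi"]

        # bind to localhost only; the reverse proxy is the public face
        http.server.HTTPServer(("127.0.0.1", port), Handler).serve_forever()

    if __name__ == "__main__":
        main()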

    Installation is almost trivial: put the file somewhere (the in-source sysvinit script assumes /var/www/blog-media/cgiserver.py, but there's absolutely no magic about this), and then run it with a port number (it will only bind to localhost; the default in the sysvinit script is 6070) and a directory into which you put your CGI scripts (the sysvinit script assumes /var/www/blog-media/cgi).

    When you have a cgi script foo, you can dump it in this directory, make it executable and then run it by retrieving http://localhost:6070/foo. In case you have nothing else, you can try a shell script like:

    #!/bin/sh
    echo "content-type: text/plain"
    echo
    /usr/games/fortune
    

    (which of course only works in this form if you have something like fortunes-en installed on a Debian box). That should be enough to give you something like the screenshot opening this post. Even more than 25 years after I wrote my first CGI, I am still amazed at how simple this is.

    Disclaimer: Writing CGI scripts that take input such that they are not trivially exploitable is a higher art. So… don't do it, except as a game. Oh, and to debug your scripts, simply let cgiserver run in a terminal – that way, you will see what your scripts emit on stderr. Note, however, that the way the sysvinit script starts cgiserver, it will run as nobody; if things work when you start cgiserver yourself but not when it's running as a daemon, that's the most likely reason.

  • Maintaining Static Blogs Using git push

    local                server
    
    main  --- push --->   main
                            |
                            | (merge)
                            |
                            v
                       published --- make publish --->  nginx
    
    Fig 1.  Our scheme in classic ASCII art.
    

    In my post on how I'm using pelican – the static blog engine that formats this site – I described that on a make install, I would do a local build (make publish) and then rsync the result to the production site. Since about June, I no longer do that, because the way pelican works – it touches every generated file every time – is not a good match for rsync. With a growing site, this means a substantial amount of data (well: a few megabytes for me at this time) is being transferred. What's a few megabytes these days, you ask? Well, ever since UMTS was shut down, on the road all I have is GPRS (i.e., 10 kB/s with a bit of luck), and then a few megabytes is a lot.

    I hence finally changed things to benefit from the fact that I keep the text content in a version control system. For a post without media, all that needs to be transferred are a few kilobytes for a git push. Here is how that is done (assuming a Debian-like setup).

    First, unless your source directory is already under git version control, run the following in there:

    git init
    git add Makefile content plugins pelicanconf.py publishconf.py theme tasks.py
    git commit -am "Migrating into git"
    

    You will probably also want to have a .gitignore, and then probably several other files on top, but that's beside the current point.

    Two Repos, Two Branches

    The rough plan is to have a complete, checked-out git repository on the server side (ahem: see Figure 1). It is updated from your local repo through pushes. Since you cannot push into a checked-out branch, the server-side repository has a branch published checked out, while your authoring happens in the main (traditionally called master) branch. After every push, main is merged into published, and then pelican's site generation runs.

    A word of warning: these merges will fail when you force-push. Don't do that. If you do, you will have to fix the breakage on the server side, either by dropping and re-creating the published branch, or by massaging all places that a force-pushed commit changed.

    To set this up, on the web server do (adapting to your site and taste if you don't like the path):

    sudo mkdir -p /var/blog/source
    sudo chown `id -u` /var/blog/source # you'll be pushing as yourself
    cd /var/blog/source
    # create a git repo you can push into
    git init
    # go away from the main/master branch so you can push into it
    git checkout -b published
    

    Then, in your local git repository for the blog, add the repository you just created as a remote named prod and push the main branch (this assumes you have the main branch checked out):

    git remote add prod ssh://USER@SERVER.NAME//var/blog/source
    git push prod
    

    On the remote server, you are still on the published branch, and hence you will not see what you have just pushed. You have to merge main using:

    git merge main
    

    (or master, if that's still the name of your main branch). You should now see whatever you have put into your local git above. If that's true, you can say make publish and see your publishable site in the output subdirectory. If it's not true, start debugging by making sure your main branch on the server side really contains what you think you have pushed.

    Automating the Publication

    This completes the basic setup. What is still missing is automation. That we can do with a git hook (see the githooks man page for more information on that nifty stuff) that is installed on the server side into /var/blog/source/.git/hooks/post-update. This file contains a shell script that is executed each time commits are pushed into a repository once git has updated everything. In this case, it is almost trivial, except for some bookkeeping and provisions for updating the search engine (all lines with BLOG_ROOT in them; delete these when you have not set that up):

    #!/bin/sh
    # This hook merges the main branch, builds the web page, and does
    # housekeeping.
    #
    # This *assumes* we have the published branch checked out.  It should
    # probably check that one day.
    
    set -e
    
    unset GIT_DIR # this is important, since we're manipulating the
       # working tree, which is a bit uncommon in a post-update hook.
    cd ..
    BLOG_ROOT=/var/blog
    
    git merge master
    make publish
    BLOG_DIR=$BLOG_ROOT/source/output $BLOG_ROOT/media/cgi/blogsearch
    

    Do not forget to chmod +x that file, or git will ignore it.

    Back on the local side, you have to modify your install target to something like:

    rsync:
      # adapt the paths!
      rsync --info=progress2 -av /var/www-local/blog-media/ blog.tfiu.de:/var/blog/media/

    install: rsync
      -git commit -a
      git push -u prod master
    

    (the - in front of the git commit is because git returns non-zero if there is nothing to commit; in the present case, you may still want to push, perhaps because previous commits have not been pushed, and hence we tell make to not bother about the status of git commit).

    With this path and the separate media directory still updated through rsync (cf. the previous post on this), an nginx config would have to contain lines like:

    location / {
      root /var/blog/source/output;
    }
    
    location /media/ {
      alias /var/blog/media/;
    }
    

    This setup has worked nicely and without a flaw in the past few months. It makes a lot more sense than my previous setup, not least because any junk that may accumulate in my local output directory while I'm fooling around will not propagate to the published server. So: if you work with pelican or a similar static blog generator, I'd say this is the way to partial bliss.

  • Quick RST Previews for Posts in Pelican

    In January, I described how I use this blog's engine, pelican, and how I have a “development” and a “production” site (where I will concede any time that it's exceedingly silly to talk about “production” in this context). Part of that was a trivial script, remake.sh, that I would run while writing and revising a post to format it without doing too much unnecessary work. This script was running between a couple and a couple of dozen times until I was happy with an article.

    What the script did was call pelican asking to only write the document being processed. When pelican was instructed to cache work on the other articles, that was enough to keep build times around a second on my box; but as the number of posts on this blog approaches 200, build times ended up on the totally wrong side of that second, and I thought: “Well, why don't I run, perhaps, rst2html for formatting while revising?” That would be, essentially, instantaneous.

    But pelican does a lot more than rst2html. Especially, having the plugins and the templating available is a good thing when inspecting a post. So, I got to work and figured out how pelican builds a document. The result is a script build-one that only looks at a single (ReStructuredText) article – which it gets from its command line – and ignores everything else.

    This is fast enough to be run whenever I save the current file. Therefore, in my pelican directory I now have, together with the script, the following .vimrc enabling just that (% expands to the file currently edited in vim):

    augroup local
      au!
      autocmd BufWritePost *.rst !python build-one %
    augroup END
    

    I've briefly considered whether I should also add some trick to automatically reload a browser window when saving, but then figured that's probably overdoing things: In all likelihood I want to scroll around in the rendered document, and hence I will have to focus it anyway. If I do that anyway, the effort spent on saving myself a press of r after focusing feels misplaced.

    The script does have an actual drawback, though: Since pelican does not get to scan the file system with build-one, it cannot do file name substitution (as in {filename}2022-05-26.rst) and will instead warn whenever seeing one of these. Since, as described in January, my static files are not managed by pelican, that is not a serious problem in my setup, except I have to watch out for broken substitutions when doing a final make html (or the make install).

    Insights into Pelican

    It took me a bit to figure out how the various parts of pelican fit together at least to the extent of letting me format a ReStructuredText document with the jinja templates. Let me therefore briefly discuss what the script does.

    First, to make pelican do anything remotely resembling what it will do on make html, you have to load its settings; since I assume I am running in pelican's directory and this is building a “draft” version, I can simply do:

    settings = pelican.read_settings("pelicanconf.py")
    

    With that, I already know where to write to, which lets me construct a writer object; that will later arrange for actually placing the files. I can also construct a reader for my ReStructuredText files (and you would have to change that if you are writing in Markdown); these readers decouple the Article class from input formats:

    writer = writers.Writer(settings["OUTPUT_PATH"], settings)
    reader = readers.RstReader(settings)
    

    With that, I have to delve deep into pelican's transformation machinery, which consists of various generators – for articles, static files, pages, whatever. The constructors of these generator classes (which are totally unrelated to Python generators) take a lot of arguments, and I cannot say I investigated why they insist on having them passed in when I fill them with data from settings anyway (as does pelican itself); but then I suspect these extra arguments are important for non-Article generators. I only need to generate a single article, and so stereotypically writing:

    artgen = generators.ArticlesGenerator(
        settings.copy(), settings,
        settings["PATH"], settings["THEME"], settings["OUTPUT_PATH"])
    

    does the trick for me.

    Article generators will usually collect the articles to generate by looking at the file system. I don't want that; instead, I want to construct an Article instance myself and then restrict the generator's action to that.

    The Article class needs to be constructed with content and metadata, which happen to be what readers return. So, to construct an Article from the RST file passed in as source_path, I need to say:

    content, metadata = reader.read(source_path)
    art = contents.Article(content, metadata,
        source_path=source_path, settings=settings)
    

    After all that preparation, all that is left to do is overwrite any misguided ideas the article generator might have on what I would like to have processed and then let it run:

    artgen.translations = []
    artgen.articles = [art]
    artgen.generate_articles(
        functools.partial(writer.write_file, relative_urls=True))
    

    (you can probably do without the currying of the writer's write_file method that makes it create relative URLs, but I'm a fan of relative URLs and of almost anything in functools).
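
    For reference, here are the fragments above assembled into a complete script. This is a sketch of build-one rather than its verbatim source (the real thing has a bit more error handling around it):

    #!/usr/bin/env python3
    # Sketch: format a single ReStructuredText article with pelican's own
    # machinery, ignoring everything else in the content directory.
    import functools
    import sys

    import pelican
    from pelican import contents, generators, readers, writers

    def build_one(source_path):
        settings = pelican.read_settings("pelicanconf.py")

        writer = writers.Writer(settings["OUTPUT_PATH"], settings)
        reader = readers.RstReader(settings)

        artgen = generators.ArticlesGenerator(
            settings.copy(), settings,
            settings["PATH"], settings["THEME"], settings["OUTPUT_PATH"])

        content, metadata = reader.read(source_path)
        art = contents.Article(content, metadata,
            source_path=source_path, settings=settings)

        artgen.translations = []
        artgen.articles = [art]
        artgen.generate_articles(
            functools.partial(writer.write_file, relative_urls=True))

    if __name__ == "__main__":
        build_one(sys.argv[1])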

  • A Feedback Form in Pelican

    I realise that the great days of discussions on blogs are over, as Sam Hartman blogged the other day – at least for now. Still, I'd like to make it somewhat more straightforward to send me feedback on the posts here than having to get the contact address and dropping me a mail. Hence, I've written a little Python script, feedback, that lets people comment from within their web browsers.

    Nachtrag (2022-10-07)

    Don't take it from here; rather, see https://codeberg.org/AnselmF/pelican-ext

    While the script itself is perfectly general and ought to work with any static blog engine, the form template I give in the module docstring is geared towards pelican and jinja, although only in very few places.

    To make it work, this needs to become a CGI (the template assumes it will show up in /bin/feedback according to the server configuration). The notes on deployment from my post on the search engine apply here, too, except that in addition the host has to be able to deliver mail. Most Unix boxes do locally, but whether anyone reads such mail is a different question.

    Is it ethical to check “ok to publish” by default?

    To configure where it sends mail to (by default, that's root, which may make sense if you have your own VM), you can set the CONTACT_ADDRESS environment variable (see the search engine post in case you're unsure how to do that for a web context). If your machine is set up to deliver mail to remote addresses – be it with a full mail server or using a package like nullmailer – you could use your “normal” mail address here. In that case, you should probably inform people in your privacy policy that their comments will be sent by unencrypted mail, in particular if that “normal” e-mail is handled by one of the usual rogues (Google still gets about half of the mail I send – sigh).

    If you look below (or perhaps right if you run your browser full-screen), you will see that there is a checkbox “feel free to publish” that, right now, I have checked by default. I had some doubts about that in terms of creepy antipatterns. Of course I am as annoyed by most contemporary cookie banners as anyone else, which in violation of the GDPR usually have practical defaults – sure: not what you get when you say “configure” – set at the maximum creep level the operators believe they can get away with. On the other hand, defaults should also be expectable, and I'd guess the expectation when someone fills out a reply form on a blog is that the response will be published with the article. If you disagree: well, the comment form is there for you.

    In terms of spam protection, I do an incredibly dumb textcha. Even if this script got deployed to a few dozen sites (which of course would be charming), I cannot see some spam engine bothering to figure it out; since it just sends a mail to the operator, there is basically nothing to be gained from spamming using the CGI. I fully expect this will be enough to keep out the dumb spambots that blindly send whatever forms they can find – it has worked on many similar services.
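
    Just to illustrate how dumb such a textcha can be (the field name and expected answer here are made up rather than what the script actually ships):

    # Hypothetical sketch, not the feedback script itself: check a
    # fixed-answer question that came in with the form data.
    import cgi

    EXPECTED_ANSWER = "four"   # made-up answer to "two plus two, in words?"

    def textcha_ok(form: cgi.FieldStorage) -> bool:
        return form.getfirst("textcha", "").strip().lower() == EXPECTED_ANSWER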

    Security Considerations

    The feedback script does at least two things that could be exploited:

    1. It enters remotely controlled values into rendered HTML and
    2. It calls a local binary with content controlled by the remote user.

    In case (1) – that's when I put the URI of the originating article into the reply message to send people back to where they came from – this can potentially be exploited in cross-site attacks. Suppose you trust my site to only execute benign javascript (I grant you that's close to an oxymoron these days); someone could then trick you into clicking on a link that goes to my site but executes their, presumably adversarial, javascript.

    Bugs aside, the script is resilient against that, as it properly escapes any user input that gets copied into the output. That is thanks to my “micro templating” that I keep around to paste into such one-script wonders. Have a look into the code if you're interested in how that works. And totally feel free to paste that into any Python code producing HTML or XML templated in any way – sure, it's not jinja or stan, but it has covered 80% of my templating needs at much less than 20% of the effort (counted in code lines of whatever dependency you'd pull in otherwise), which is a good deal in my book.

    Case (2) is probably a lot more interesting. In the evaluate_form function, I am doing:

    mail_text = MAIL_TEMPLATE.format(**locals())
    

    Code like this usually is reason for alarm, as far too many text formats can be used to execute code or cause other havoc – the cross-site thing I've discussed for HTML above being one example, the totally bizarre Excel CSV import exploit another (where I really cannot see how this doesn't immediately bring every Windows machine on this planet to a grinding halt). In this case, people could for example insert \ncc: victim@address into anything that gets into headers naively and turn the form into a spam engine.

    There are exactly 10000 lines in Python's email module in version 3.9.

    In addition, there is a concrete risk of creating some way of locally executing code, as the filled-out template is then used as input for a local program – in this case, whatever you use as sendmail. In theory, I'm pretty sure this is not a problem here, as no user-controlled input goes into the headers. If you change this, either sanitise the input, probably by clamping everything down to printable ASCII and normalising whitespace, or parse the headers yourself. The message content, on the other hand, gets properly MIME-encapsulated. In practice, I can't say I trust Python's email package too much; by Python stdlib standards, it feels not terribly mature and is probably less widely used than one may think.
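
    The kind of clamping I have in mind would be something like the following sketch (not code from the feedback script; you would only need it if you let user input anywhere near the headers):

    # Sketch: clamp a header-bound value to printable ASCII and collapse all
    # whitespace so nobody can smuggle in additional header lines.
    import re

    def sanitise_header_value(value: str) -> str:
        value = "".join(c for c in value if 32 <= ord(c) < 127)
        return re.sub(r"\s+", " ", value).strip()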

    But that's a risk I'm willing to take; even if someone spots a problem in the email module, shodan or a similar service still has no way to automatically figure out that it is in use in this form, and my page's insignificance makes it extremely unlikely that someone will do a targeted attack on day 0. Or even day 10.

    But then, perhaps this is a good occasion to read through email's source code? Fun fact: in python 3.9, a find . -name "*.py" | xargs wc -l gives exactly 10000 lines. And my instinct that headers are the trickiest part is probably right, too: 3003 of those are in _header_value_parser.py.

  • View with Netsurf

    A screenshot of a browser window

    An early version of this post rendered in netsurf.

    I believe about the worst threat to software freedom these days is web browsers. That is not only because they already are, for many people out there, a more relevant application platform than their primary operating system, and because almost everything that gets run in them is extremely non-Free software. I've been linking to a discussion of this problem from these pages since this blog's day one as part of my quip on “best viewed with javascript disabled”.

    No, they are also a threat because the “major” browser engines are so humongous that they in effect lock out most platforms (which simply don't have enough power to run them). And they are a threat because the sheer size and complexity of their code bases make it essentially impossible for an individual to fix almost any relevant bug in them related to rendering, javascript execution, or network interactions.

    That is why I am so grateful to the authors and maintainers of both dillo (Debian: dillo) and netsurf (Debian: netsurf-gtk, mainly), small browsers with maintainable code bases. While dillo is really basic and is missing so much of CSS and modern HTML that on today's web even many non-adversarial sites become barely usable, netsurf is usually just fine for websites respecting user rights.

    Flex layouts and the article elements: The good part of 20 years of web development after the Web 1.0.

    I have to admit I nevertheless only use it in very specific contexts, mostly because luakit with its vi-like key bindings and lua extensibility usually wins out in the end, even though I don't trust the webkit rendering engine for two cents[1]. And that is why I hadn't noticed that this blog rendered a lot worse than it should have in netsurf. This is particularly shameful because it was mostly due to liberties I had taken with web standards that I should not have taken. Apologies: Netsurf was right and I was wrong.

    I have improved that quite a bit this morning. Given I am using flex layouts quite liberally here, and these don't work in Debian stable's netsurf, the rendered pages do look quite a bit different in netsurf than on the “major” browsers. But the fallbacks are ok as far as I am concerned. Since flex layouts are among the few “innovations” in the post-Web 1.0 ecosystem that are actually a good idea, I gladly accept these fallbacks. Let me stress again that it is a feature of a friendly web rather than a bug that pages look different in different user agents.

    Dillo, regrettably, is another matter because of the stupid^Wunderconsidered colour games I'm playing here. As things are right now, the light background below text like this one sits on an HTML5 article element, which dillo ignores. Hence, the text is black on dark green, which, well, may be barely readable but really is deeply sub-optimal. Since I consider the article element and its brethren real progress in terms of markup (the other positive “innovation” post Web-1.0), I will not change that markup just to make this render better in dillo. I may finally re-think the silly dark green background soon-ish, though.

    [1]If you feel like this, too, let's team up and massage luakit's front end to work with netsurf's rendering engine. Given the close entanglement of luakit with the webkitgtk API, this certainly will result in a very different program, and almost certainly there would be no way to re-use luakit extensions. Still, I could very well see such a thing become my main browser.
  • Giving in to Network Effects

    In my first Fediverse notes, I mused that I'd choose a larger community if I had to choose again.

    Well, after watching the Fediverse for a little while, I figured that while I may not actually have to choose again, I really want to. My old community, social.dev-wiki.de, had about 650 profiles that had posted 4500 toots between them. This undoubtedly counts as small, and that has the double effect that not terribly many toots are coming in on the federated feed (I can't bring myself to write “timeline”) because people on the instance don't follow too many others, and that toots I produce don't get distributed very far because there are not many instances with people following someone on that small instance. A double negative network effect.

    This is particularly unwelcome when globally searching for hashtags (as I did last Sunday when I thought the local elections in Saarland might reflect in the Fediverse). Sure, I can help fix that by starting to follow accounts from other instances, but that's a bit of a chicken-and-egg thing, since in my own instance's feeds I don't even see those other accounts. Perhaps sitting on the public feed of the “flagship” instance (mastodon.social has about 640'000 profiles) for a while might have helped.

    Fediverse relays are a honking great idea.

    But it also felt odd to be behind the most active profile on that instance, and so I decided to compromise. That is, I don't give in to the pressures of the network effect altogether. I am therefore not switching to the flagship instance (which does feel a bit central-ish). But I am switching to troet.cafe, which boasts 2600 users (a factor of 4 over my old instance) with 150'000 posts (a factor of 30) between them. Plus, it uses a “relay” that somewhat mitigates the problem outlined above by essentially creating a sub-federation of smaller instances exchanging public toots regardless of whether people on them follow each other.

    So, I made the move today.

    It is nice that Mastodon has built-in support for moving; there is an “Import and Export” item in the settings menu that guides one reasonably clearly through the process and transfers the followers (small deal on my profile at the moment). It is then that it gets a little lame.

    You see, I'd have expected that when I get an archive of my profile under “Export” I ought to be able to import it again under the new profile's “Import”. But that is not how it works; it seems the downloaded archive cannot be uploaded, and whatever is in there is either lost (the toots) or needs to be manually restored (profile pictures). Instead, what can be uploaded are CSVs of followed people, block lists and the like. And these are not in the archive one downloads but need to be downloaded from the old profile and re-uploaded to the new profile one by one.

    Is this really the way it's supposed to be? Have I missed something?

    Ok, one doesn't move every day, but if I keep being a Fedinaut, I will probably move again one day – to my own instance. It would be nice if by then there were smoother migration paths.

  • Now on the Fediverse

    Mastodon logo

    AGPL (copyright)

    While I believe that RSS (or rather Atom a.k.a. RFC 4287) is a great standard for subscribing to media like blogs[1], I strongly suspect that virtually nobody pulls my RSS feed. I'm almost tempted to log for a while to ascertain that. Then again, just based on how few people still run RSS aggregators (me, I'm using a quick self-written hack based on python3-feedparser) I am already quite confident the RSS mainly sits idly on my server.

    At least outside of my bubble, I guess what RSS was designed for has been superseded by the timelines of Facebook, Twitter, and their less shopworn ilk. As a DIY zealot, of course none of that is an option for me. What is an option in this field (and what certainly can do with a bit more public attention) is what these days is commonly called the Fediverse, that is, various sites, servers and client programs in the rough vicinity of microblogging, held together by W3C's ActivityPub protocol.

    What this technobabble means in practice: If you already are in the Fediverse, you can follow @Anselm@social.dev-wiki.de and get a toot whenever I post something here (note, however, that most posts will be in German).

    If you're not in the Fediverse yet, well, choose a community[2] – if I had to choose again, I'd probably take a larger community, as that increases one's initial audience: other communities will, for all I understand, only carry your public toots (i.e., messages) if someone in them has subscribed someone from your community –, get a client – I'm using tootle as a GUI and toot for the CLI – and add my Fediverse id.

    To accommodate tooting about new posts, I have made two changes to my pelican tooling: for one, post.py3 now writes a skeleton toot for the new post, like this:

    with open("next-toot", "w", encoding="utf-8") as f:
      f.write(f"{headline} – https://blog.tfiu.de/{slug}.html\n#zuengeln\n")
    

    And I have a new Makefile target:

    toot:
      (cat next-toot; echo "Post?"; read x)
      toot post < next-toot
    

    That way, when an idea of what the toot for an article should contain strikes me while I'm writing the post, I edit next-toot, and after I've run my make install, I do a make toot to notify the Fediverse.

    A side benefit: if you'd like to comment publicly and don't want to use the mail contact below, you can now do that through Mastodon and company.

    [1]That it is a great standard is already betrayed by the fact that its machine-readable specification is in Relax NG rather than XML schema.
    [2]This article is tagged DIY although I'm not running a Mastodon (or other ActivityPub server) instance myself because, well, I could do that. I don't, for now, because Mastodon is not packaged for Debian (and for all I can tell neither are alternative ActivityPub servers). Looking at Mastodon's source I can understand why. Also, I won't rule out that the whole Fediverse thing will be a fad for me (as was identi.ca around 2009), and if I bother to set up unpackaged infrastructure, I need to be dead sure it's worth it.
  • Explaining Tags in Pelican

    Right after I had celebrated the first anniversary of this blog with the post on my Pelican setup, I decided to write another plugin I've been planning to write for a while: taginfo.py.

    Nachtrag (2022-10-07)

    Don't take it from here; rather, see https://codeberg.org/AnselmF/pelican-ext

    This is for:

    Blog screenshot

    that is, including explanations on tag pages, telling people what the tag is supposed to mean.

    To use taginfo, put the file into your plugins folder, add taginfo to the PLUGINS list in your pelicanconf.py, and then create a folder taginfo next to your content folder. In there, for each tag you want to comment, create a file <tagname>.rstx (or just rst). Such a file has to contain reStructuredText, where pelican's extensions (e.g., {filename} links) do not work (yet). I suppose it wouldn't be hard to support them; if you're interested in this plugin, feel free to poke me in case you'd like to see the extra pelican markup.

    To make the descriptions visible, you need to change your tag.html template (typically in theme/templates/tag.html) in order to arrange for tag.make_description() to be called when rendering the document. Me, I'm doing it like this:

    {% block content_title %}
    <h1>Tag <em>{{ tag }}</em></h1>
    <div id="taginfo">
            {{ tag.make_description() }}
    </div>
    {% endblock %}
    

    (And I still find jinja templates exceptionally ugly).

  • How I'm Using Pelican

    I started this blog on January 14th last year. To celebrate the anniversary, I thought I could show how I'm using pelican (the blog engine I'm using); perhaps it'll help other people using it or some other static blog generator.

    Posting and Writing

    First, I structure my content subdirectory (for now) such that each article has the ISO-formatted date as its name, which makes that source name rather predictable (for linking using pelican's {filename} replacement), short, and gives the natural sort order sensible semantics.

    Also, I want to start each post from a template, and so among the first things I did was write a little script to automate name generation and template instantiation. Over the past year, that script has evolved into post.py3.

    Nachtrag (2022-03-15)

    I've changed a few things in the meantime; in particular, I am now opening a web browser because I got tired of hunting for the URI when it was scrolled off the screen before I first had something to open, and to make that work smoothly, I'm building the new post right after creating its source.

    It sits next to pelican's Makefile and is in the blog's version control. With this, starting this post looked like this:

    $ ./post.py3 "How I'm Using Pelican"
    http://blog/how-i-m-using-pelican.html
    remake.sh output/how-i-m-using-pelican.html
    

    Nachtrag (2022-05-26)

    The output is now a bit different, and now I do open the browser window – see below.

    What the thing printed is the URL the article will be seen under (I've considered using the webbrowser module to automatically open it, but for me just pasting the URL into my “permanent” blog browser window works better). The second line gives a command to build the document for review. This remake.sh script has seen a bit of experimentation while I tried to make the specification of what to remake more flexible. I've stopped that, and now it's just:

    #!/bin/bash
    pelican --write-selected "$1"
    

    When you add:

    CACHE_CONTENT = True
    LOAD_CONTENT_CACHE = True
    CONTENT_CACHING_LAYER = 'generator'
    

    to your pelicanconf.py, rebuilding just the current article should be relatively quick (about 1 s on my box). Since I like to proofread on the formatted document, that's rather important to me.

    Nachtrag (2022-05-26)

    N…no. This part I'm now doing very differently. See Quick RST Previews.

    If you look at post.py3's code, you will see that it also fixes the article's slug, i.e., the path part of the URL. I left this to Pelican for a while, but it annoyed me that even minor changes to a blog title would change the article's URI (and hence also the remake statement). I was frankly tempted to not bother having elements of the title in the slug at all, as I consider this practice SEO, and I am a fanatical enemy of SEO. But then I figured producing shorter URIs isn't worth that much, in particular when I'd like them to be unique and easy to pronounce. In the end I kept the title-based slugs.

    The script also picks the local file name as per the above consideration with some disambiguation if there's multiple posts on one day (which has only happened once in the past year). Finally, the script arranges for adding the new post to the version control system. Frankly, from where I stand now, I'd say I had overestimated the utility of git for blogging. But then, a git init is cheap, and who knows when that history may become useful.
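
    To give an idea of what such a script boils down to, here is a stripped-down, hypothetical sketch; the real post.py3 additionally fills in a proper template, disambiguates same-day posts, talks to git, and by now opens the browser:

    # Sketch of a post starter in the spirit of post.py3 -- not the real script.
    import datetime
    import re
    import sys

    def make_slug(title):
        # lower-case, runs of non-alphanumerics become dashes -- roughly
        # what pelican's default slugification does
        return re.sub("[^a-z0-9]+", "-", title.lower()).strip("-")

    def start_post(title):
        date = datetime.date.today().isoformat()
        slug = make_slug(title)
        with open(f"content/{date}.rst", "w", encoding="utf-8") as f:
            f.write(f"{title}\n{'#'*len(title)}\n\n"
                f":date: {date}\n:slug: {slug}\n:status: published\n\n")
        print(f"http://blog/{slug}.html")
        print(f"remake.sh output/{slug}.html")

    if __name__ == "__main__":
        start_post(sys.argv[1])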

    I'm not using pelican's draft feature. I experimented with it for a while, but I found it's a complication that's not worth anything given I'm always finishing a post before starting the next. That means that what otherwise would be the transition from draft to published for me is the make install. The big advantage of starting with status:published is that under normal circumstances, an article never changes its URI.

    Local Server Config and Media

    Another pelican feature I'm not using is attaching static files. I have experimented with that initially, but when the first larger binary files came in, I realised they really shouldn't be under version control. Also, I never managed to work out a smooth and non-confusing way to have pelican copy these files predictably anyway.

    What I ended up doing is have an unversioned, web-published directory that contains all non-article (“media”) files. On my local box, that's in /var/www/blog-media, and to keep a bit of order in there, the files sit in per-year subdirectories (you'll spot that in the link to the script above). The blog directory with the sources and the built documents, on the other hand, is within my home. To assemble all this, I have an /etc/apache2/sites-enabled/007-blog.conf containing:

    <VirtualHost *:80>
      ServerName blog
      DocumentRoot /home/anselm/blog/output
    
      Alias /media /var/www/blog-media
    
      ProxyPass /bin/ http://localhost:6070/
    
      <Directory "/home/anselm/blog/output">
        AllowOverride None
        Options Indexes FollowSymLinks
        Require all granted
      </Directory>
    
      <Directory ~ "/\.git">
        Require all denied
      </Directory>
    </VirtualHost>
    

    which needs something like:

    127.0.0.1 localhost blog
    

    in your /etc/hosts so the system knows what the ServerName means. The ProxyPass statement in there is for CGIs, which of course apache could do itself; more on this in some future post. And I'm blocking the access to git histories for now (which do exist in my media directory) because I consider them fairly personal data.

    Deployment

    Nachtrag (2022-07-10)

    I'm now doing this quite a bit differently because I have decided the procedure described here is a waste of bandwidth (which matters when all you have is GPRS). See Maintaining Static Blogs Using git push.

    When I'm happy with a post, I remake the whole site and push it to the publishing box (called sosa here). I have added an install target to pelican's Makefile for that:

    install: publish
      rsync --exclude .xapian_db -av output/ sosa:/var/blog/generated/
      rsync -av /var/www/blog-media/ sosa:/var/blog/media/
      ssh sosa "BLOG_DIR=/var/blog/generated/ /var/blog/media/cgi/blogsearch"
    

    As you can see, on the target machine there's a directory /var/blog belonging to me, and I'm putting the text content into the generated and the media files into the media subdirectory. The exclude option to the rsync and the call to blogsearch is related to my local search: I don't want the local index on the published site so I don't have to worry about keeping it current locally, and the call to blogsearch updates the index after the upload.

    The publication site uses nginx rather than apache. Its configuration (/etc/nginx/sites-enabled/blog.conf) looks like this (TLS config removed):

    server {
      include snippets/acme.conf;
      listen 80;
      server_name blog.tfiu.de;
    
      location / {
        root /var/blog/generated/;
      }
    
      location /media/ {
        alias /var/blog/media/;
      }
    
      location /bin/ {
        proxy_pass http://localhost:6070;
        proxy_set_header Host $host;
      }
    
      location ~ \.git/ {
        deny all;
      }
    }
    

    – again, the clause for /bin is related to local search and other scripting.

    Extensions

    Nachtrag (2022-10-07)

    Don't take the code from here; rather, see https://codeberg.org/AnselmF/pelican-ext

    In addition to my local search engine discussed elsewhere, I have also written two pelican plugins. I have not yet tried to get them into pelican's plugin collection because… well, because of the usual mixture of doubts. Words of encouragement will certainly help to overcome them.

    For one, again related to searching, it's articlemtime.py. This is just a few lines making sure the time stamps on the formatted articles match those of their input files. That is very desirable to limit re-indexing to just the changed articles. It might also have advantages for, for instance, external search engines or harvesters working with the HTTP if-modified-since header; but then these won't see changes in the non-article material on the respective pages (e.g., the tag cloud). Whether or not that is an advantage I can't tell.
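
    The core of the idea is no more than this sketch (the actual plugin additionally has to hook into pelican's plugin machinery, which I am not reproducing here):

    # Sketch: give a rendered article the modification time of its source,
    # so unchanged articles keep their old timestamps across rebuilds.
    import os

    def copy_mtime(source_path, output_path):
        st = os.stat(source_path)
        os.utime(output_path, (st.st_atime, st.st_mtime))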

    Links to blog posts

    The citedby plugin in action: These are the articles that cite this post right now.

    The other custom extension I wrote when working on something like the third post in total, planning to revisit it later since it has obvious shortcomings. However, it has been good enough so far, and rather than doing it properly and then writing a post of its own, I'm now mentioning it here. It's citedby.py, and it adds links to later articles citing an article. I think this was known as a pingback in the Great Days of Blogs, though this is just within the site; whatever the name, I consider this kind of thing eminently useful when reading an old post, as figuring out how whatever was discussed unfolded later is half of your average story.

    The way I'm currently doing it is admittedly not ideal. Essentially, I'm keeping a little sqlite database with the cited-citing pairs. This is populated when writing the articles (and pulls the information from the rendered HTML, which perhaps is a bit insane, too). This means, however, that a newly-made link will only …

  • Stemming for the Search Engine

    First off, here is a quick reference for the search syntax on this site (the search form links here):

    • Phrase searches ("this is a phrase")
    • Exclusions (-dontmatch)
    • Matches only when two words appear within 10 tokens of each other (matches NEAR appear)
    • Trailing wildcard as in file patterns (trail*)
    • Searches don't use stemming by default, but stem for German when introduced with l:de and for English when introduced with l:en
    • See also the Xapian syntax.

    If you only came here for the search syntax, that's it, and you can stop reading here.

    Otherwise, if you have read the previous post on my little search engine, you will remember I was a bit unhappy that I completely ignored the language of the posts and had wanted to support stemming so that you can find, ideally, documents containing any of "search", "searches", "searching", and "searched" when searching for any of these. Being able to do that (without completely ruining precision) is obviously language-dependent, which means the first step to make it happen is to properly declare the language of your posts.

    As discussed in the previous post, my blogsearch script only looks at elements with the CSS class indexable, and so I decided to have the language declaration there, too. In my templates, I hence now use:

    <div class="indexable" lang="{{ article.lang }}">
    

    or:

    <div class="indexable" lang="{{ page.lang }}">
    

    as appropriate.

    This is interpreted by the indexer rather straightforwardly by pulling the value out of the attribute and asking xapian for a stemmer for the named language. That works for at least most European two-letter country codes, because those happen to coincide with what's legal in HTML's lang universal attribute. It does not work for the more complex BCP 47 language tags like de-AT (where no actually existing stemmer would give results different from plain de anyway) or even sr-Latn-RS (for which, I think, no stemmer exists).
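
    On the indexing side, that amounts to little more than this sketch (the names are mine, not necessarily blogsearch's; the indexer is the xapian TermGenerator from the previous post):

    # Sketch: pick a stemmer from the lang attribute of the indexable
    # element, falling back to no stemming if xapian does not know the
    # language.
    import xapian

    def stemmer_for(indexable_node):
        lang = (indexable_node.get("lang") or "").strip()
        try:
            return xapian.Stem(lang)
        except xapian.InvalidArgumentError:
            return xapian.Stem("none")   # the no-op stemmer

    # later: indexer.set_stemmer(stemmer_for(content))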

    On searching, I was worried that enabling stemming would break unstemmed searches, but xapian's indexes are clever enough that that's not a problem. But I still cannot stem queries by default, because it is hard to guess their language from just a word or two. Hence, I have defined a query syntax extension: if you prefix your query with l:whatever, blogsearch will try to construct a xapian stemmer from whatever. If that fails, you'll get an error; if it succeeds, it will stem the query in that language.

    As an aside, I considered for a moment whether it is a terribly good idea to hand through essentially unfiltered user input to a C++ API like xapian's. I eventually settled for just making it a bit harder to craft buffer overflows by saying:

    lang = parts[0][2:30]
    

    – that is, I'm only allowing through up to 28 characters of language code. Not that I expect that anything in between my code and xapian's core has an overflow problem, but this is a cheap defensive measure that would also limit the amount of code someone could smuggle in in case some vulnerability did sneak in. Since it's essentially free, I'd say that's reasonable defensive programming.
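
    Putting the prefix handling and that clamping together, the query side looks roughly like this (again a sketch with my own names rather than the blogsearch source):

    # Sketch: stem the query only when the user asked for it with an l: prefix.
    import xapian

    def parse_query(raw, db):
        qp = xapian.QueryParser()
        qp.set_database(db)   # needed for trailing-wildcard expansion
        if raw.startswith("l:"):
            prefix, _, raw = raw.partition(" ")
            qp.set_stemmer(xapian.Stem(prefix[2:30]))   # raises if unknown
            qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
        return qp.parse_query(
            raw, xapian.QueryParser.FLAG_DEFAULT | xapian.QueryParser.FLAG_WILDCARD)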

    In closing, I do not think stemmed searches will be used a lot, and as usual with these very simple stemmers, they leave a lot to be desired from a linguistic point of view. Compare, for instance, a simple search for going with the result l:en going to see where this is supposed to go (and compare with the result when stemming as German). And then compare with l:en went, which should return the same as l:en going in an ideal world but of course doesn't: Not with the simple snowball stemmer that xapian employs.

    I'm still happy the feature's there, and I'm sure I'll need it one of these days.

    And again, if you need a CGI that can index and query your static HTML collection with low deployment effort: you're welcome.

  • Der hundertste Post

    Ten months ago I wrote the first article for this blog, and lo and behold: with this one it's now 100 posts.

    That would be a good excuse for a few statistics, but since I'm generally an enemy of metrics collected without a concrete question in mind (it's a bit like statistical testing: if you don't know up front what you're testing for, you're doing it wrong), I'll merely confirm to myself that my posts are much longer than I actually want them to be. In total, by the count of wc -l on the source files, there are almost 93000 words in these articles. For an error estimate: xapian (cf. below) counts only 89000.

    The lengths of the articles, in wc words, are distributed like this:

    Histogram with a clump between 200 and 1000 and an outlier at 3000

    I don't really know why I can't keep things shorter. Or won't. The overlong post with 3244 words, by the way, is the one on configuring a mail server – and that again is a good example of the questionable nature of metrics, because first, English has almost no compounds and is therefore at a disadvantage when counting words, and second, that article contains quite a lot of material that is really read by computers, and that should really count differently from natural-language text.

    Oh well, and I cannot resist one more: how many different words (“paradigms”) actually occur here? That, of course, is nonsense too, because the definition of when two words are different (“when the tokens belong to different paradigms”) is anything but trivial. For instance, I would claim that the German words Worte and Wörter have practically nothing to do with each other, whereas auf, schaute and aufschauen had better all form a single paradigm together (along with all sorts of other things).

    But never mind, these are only metrics, so it's all rubbish anyway. And the data already exists, which is always a plus for the use of, and love for, figures of merit. After all, I have the xapian index over this blog, and with it I can simply write a few lines of Python:

    import xapian
    db = xapian.Database("output/.xapian_db")
    print(sum(1 for w in db.allterms()))
    

    (Note the elegant length determination with constant memory use – db.allterms() is, after all, an iterator.)

    That gives me – I still do not stem – 16540. Sure, this 16540 for the number of different words is especially meaningless even by the lax standards of metrics, because it is a wild mixture of German and English.

    All the more fun, then, to compare it with the 100'000 words that are eventually supposed to be in the Goethe-Wörterbuch once it is finished. A quick web search unfortunately turned up nothing on what the corresponding estimates for Thomas Mann look like. For once that I would actually like to compare figures of merit…

  • A Local Search Engine for Pelican-based Blogs

    As the number of posts on this blog approaches 100, I figured some sort of search functionality would be in order. And since I'm wary of “free” commercial services and Free network search does not seem to go anywhere[1], the only way to offer it that is both practical and respectful of the digital rights of my readers is to have a local search engine. True, having a search engine running somewhat defeats the purpose of a static blog, except that there's a lot less code necessary for doing a simple search than for running a CMS, and of course you still get to version-control your posts.

    I have to admit that the “less code” argument is a bit relative given that I'm using xapian as a full-text indexer here. But I've long wanted to play with it, and it seems reasonably well-written and well-maintained. I have hence written a little CGI script enabling search over static collections of HTML files, which means in particular pelican blogs. In this post, I'll tell you first a few things about how this is written and then how you'd run it yourself.

    Using Xapian: Indexing

    At its core, xapian is not much more than an inverted index: Essentially, you feed it words (“tokens”), and it will generate a database pointing from each word to the documents that contain it.

    The first thing to understand when using xapian is that it doesn't really have a model of what exactly a document is; the example indexer code, for instance, indexes a text file such that each paragraph is treated as a separate document. All xapian itself cares about is a string (“data”, but usually rather metadata) that you associate with a bunch of tokens. This pair receives a numeric id, and that's it.

    There is a higher-level thing called omega built on top of xapian that does identify files with xapian documents and can crawl and index a whole directory tree. It also knows (to some extent) how to pull tokens from a large variety of file types. I've tried it, and I wasn't happy; since pelican creates all those ancillary HTML files for tags, monthly archives, and whatnot, when indexing with omega, you get lots of really spurious matches as soon as people enter a term that's in an article title, and entering a tag or a category will yield almost all the files.

    So, I decided to write my own indexer, also with a view to later extending it to language detection (this blog has articles in German and English, and they eventually should be treated differently). The core is rather plain in Python:

    import fnmatch
    import os

    # walk the rendered pelican output and index every HTML file in it
    for dir, children, names in os.walk(document_dir):
      for name in fnmatch.filter(names, "*.html"):
        path = os.path.join(dir, name)
        doc = index_html(indexer, path, document_dir)
    

    That's enough for iterating over all HTML files in a pelican output directory (which document_dir should point to).

    In the code, there's a bit of additional logic in the do_index function. This code enables incremental indexing, i.e., only re-indexing a file if it has changed since the last indexing run (pelican fortunately manages the file timestamps properly).

    Nachtrag (2021-11-13)

    It didn't, actually; see the search engine update post for how to fix that.

    What I had to learn the hard way is that since xapian has no built-in relationship between what it considers a document and an operating system file, I need to explicitly remove the previous document matching a particular file. The function get_indexed_paths produces a suitable data structure for that from an existing database.
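    For illustration, such a helper might look roughly like this (a sketch assuming the json metadata described in the next paragraph, not necessarily what the script actually does):

    import json
    import xapian

    def get_indexed_paths(db):
        """returns a dict mapping document paths to xapian docids."""
        path_to_docid = {}
        # iterating the posting list of the empty term visits every document
        for item in db.postlist(""):
            meta = json.loads(db.get_document(item.docid).get_data())
            path_to_docid[meta["path"]] = item.docid
        return path_to_docid

    # re-indexing a changed file then boils down to something like
    # db.replace_document(path_to_docid[path], index_html(indexer, path, document_dir))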

    The indexing also defines my document model; as said above, as far as xapian is concerned, a document is just some (typically metadata) string under user control (plus the id and the tokens, obviously). Since I want structured metadata, I need to structure that string, and these days, json is the least involved way to put structured data into a flat string. That explains the first half of the function that actually indexes one single document, the path of which is passed in as f_name:

    def index_html(indexer, f_name, document_dir):
      with open(f_name, encoding="utf-8") as f:
        soup = bs4.BeautifulSoup(f, "lxml")
      doc = xapian.Document()
      meta = {
        "title": soup_to_text(soup.find("title")),
        "path": remove_prefix(f_name, document_dir),
        "mtime": os.path.getmtime(f_name),}
      doc.set_data(json.dumps(meta))
    
      content = soup.find(class_="indexable")
      if not content:
        # only add terms if this isn't some index file or similar
        return doc
      print(f"Adding/updating {meta['path']}")
    
      indexer.set_document(doc)
      indexer.index_text(soup_to_text(content))
    
      return doc
    

    – my metadata thus consists of a title, a path relative to pelican's output directory, and the last modification time of the file.

    The other tricky part in here is that I only index children of the first element with an indexable class in the document. That's the key to keeping out all the tags, archive, and category files that pelican generates. But it means you will have to touch your templates if you want to adapt this to your pelican installation (see below). All other files are entered into the database, too, in order to avoid needlessly re-scanning them, but no tokens are associated with them, and hence they will never match a useful query.

    Nachtrag (2021-11-13)

    When you add the indexable class to your templates, also declare the language in order to support stemming; this would look like lang="{{ page.lang }}" (substituting article for page as appropriate).

    There is a big lacuna here: the recall, i.e., the ratio between the number of documents actually returned for a query and the number of documents that should (in some sense) match, really suffers in both German and English if you don't do stemming, i.e., fail to strip off grammatical suffixes from words.

    Stemming is of course highly language-dependent. Fortunately, pelican's default metadata includes the language. Less fortunately, my templates don't communicate that metadata yet – but that would be quick to fix. The actual problem is that when I stem my documents, I'll also have to stem the incoming queries. Will I stem them for German or for English?

    I'll think about that problem later and for now don't stem at all; if you remember that I don't stem, you can simply append an asterisk to your search term; that's not exactly the same thing, but ought to be good enough in many cases.
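    For reference, should I (or you) ever want stemming after all, the xapian side of it is very little code; roughly like this, with the language hard-coded – which is exactly the problem discussed above:

    import xapian

    stemmer = xapian.Stem("german")   # or "english", or chosen per article

    # at indexing time
    indexer = xapian.TermGenerator()
    indexer.set_stemmer(stemmer)
    indexer.set_stemming_strategy(xapian.TermGenerator.STEM_SOME)

    # at query time – the same stemmer has to be applied to the query, too
    qp = xapian.QueryParser()
    qp.set_stemmer(stemmer)
    qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)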

    Using Xapian: Searching

    Running searches using xapian is relatively straightforward: You open the database, parse the query, get the set of matches and then format the metadata you put in during indexing into links to the matches. In the code, that's in cgi_main; one could do paging here, but I figure spitting out 100 matches will be plenty, and distributing 100 matches on multiple HTML pages is silly (unless you're trying to optimise your access statistics; since I don't take those, that doesn't apply to me).
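    Stripped of the CGI and HTML glue, the xapian part of that could look roughly like this (a sketch rather than a verbatim quote from cgi_main):

    import json
    import xapian

    def run_query(db_path, query_string, max_matches=100):
        """yields (title, path, percent) for documents matching query_string."""
        db = xapian.Database(db_path)
        query = xapian.QueryParser().parse_query(query_string)
        enquire = xapian.Enquire(db)
        enquire.set_query(query)
        for match in enquire.get_mset(0, max_matches):
            meta = json.loads(match.document.get_data())
            yield meta["title"], meta["path"], match.percent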

    The part with the query parser deserves a second look, because xapian supports a fairly rich query language; the features I consider most useful are:

    • Phrase searches ("this is a phrase")
    • Exclusions (-dontmatch)
    • Matches only when two words appear within 10 tokens of each other (matches NEAR appear)
    • Trailing wildcard as in file patterns (trail*)

    That last feature needs to be explicitly enabled, and since I find it somewhat unexpected that keyword arguments are not supported here, and perhaps even that the flag constant sits on the QueryParser object, here's how enabling wildcards in xapian looks in code:

    qp = xapian.QueryParser()
    parsed = qp.parse_query(query, qp.FLAG_WILDCARD)
    

    Deploying this on your Pelican Installation

    You can re-use my search script on your site relatively easily. It's one file, and if you're running apache or something else that can run CGIs[2], getting it to run at all is close to trivial: install your equivalents of the Debian python3-xapian, python3-bs4, and python3-lxml packages. Perhaps you also need to explicitly allow CGI execution on your web server; in Debian's apache, that would be a2enmod cgi, elsewhere you may need to arrange for mod_cgi or its equivalent to be loaded in some other way.

    Then you need to dump blogsearch somewhere in the file system.

    Nachtrag (2022-10-07)

    Don't take it from here; rather, see https://codeberg.org/AnselmF/pelican-ext

    While Debian has a default CGI directory defined, I'd suggest putting blogsearch somewhere next to your blog; I keep everything together in /var/blog (say), have the generated output in /var/blog/generated and would then keep the script in a directory /var/blog/cgi. Assuming this and apache, you'd then have something like:

    DocumentRoot /var/blog/generated
    ScriptAlias /bin /var/blog/cgi
    

    in your configuration, presumably in a VirtualHost definition. In addition, you will have to tell the script where your pelican directory is. It expects that information in the environment variable BLOG_DIR; so, for apache, add:

    SetEnv BLOG_DIR /var/blog/generated
    

    to the VirtualHost.

    After restarting your web server, the script would be ready (with the configuration above …

  • Math with ReStructuredText and Pelican

    I recently wrote a piece on estimating my power output from CO₂ measurements (in German) and for the first time in this blog needed to write at least some not entirely trivial math. Well: I was seriously unhappy with the way formulae came out.

    Ugly math of course is very common as soon as you leave the lofty realms of LaTeX. This blog is made with ReStructuredText (RST) in pelican. Now, RST at least supports the math interpreted text role (“inline”) and directive (“block” or in this case rather “displayed”) out of the box. To my great delight, the input syntax is a subset of LaTeX's, which remains the least cumbersome way to input typeset math into a computer.

    But as I said, once I saw how the formulae came out in the browser, my satisfaction went away: there was really bad spacing, fractions weren't there, and things were hard to read.

    In consequence, when writing the post I'm citing above, rather than reading the docutils documentation to research whether the ugly rendering was a bug or a non-feature, I wrote a footnote:

    Sorry für die hässlichen Formeln. Vielleicht schreibe ich mal eine Erweiterung für ReStructuredText, die die ordentlich mit TeX formatiert. Oder zumindest mit MathML. Bis dahin: Danke für euer Verständnis.

    (Sorry for the ugly formulae. Perhaps one of these days I'll write an RST extension that properly formats using TeX. Or at least MathML. Until then: thanks for your understanding.)

    This despite the documentation clearly saying, just two lines below the example that was all I had initially bothered to look at:

    For HTML, the math_output configuration setting (or the corresponding --math-output command line option) selects between alternative output formats with different subsets of supported elements.

    Following the link at least would have told me that MathML was already there, saving me some public embarrassment.

    Anyway, when yesterday I thought I might as well have a look at whether someone had already written any of the code I was talking about in the footnote, rather than properly reading the documentation I started operating search engines (shame on me).

    Only when those led me to various sphinx and pelican extensions and I peeked into their source code did I finally end up at the docutils documentation again. And I noticed that the default math rendering was so ugly just because I hadn't bothered to include the math.css stylesheet. Oh, the miracles of reading documentation!

    With this, the default math rendering suddenly turns from “ouch” to “might just do”.

    But since I now had seen that docutils supports MathML, and since I have wanted to have a look at it at various times in the past 20 years, I thought I might as well try it, too. It is fairly straightforward to turn it on; just say:

    [html writers]
    math_output: MathML
    

    in your ~/.docutils (or perhaps via a pelican plugin).

    I have to say I am rather underwhelmed by how my webkit renders it. Here's what the plain docutils stylesheet works out to in my current luakit:

    Screenshot with ok formulae.

    And here's how it looks like via MathML:

    Screenshot with less ok formulae.

    For my tastes, the spacing is quite a bit worse in the MathML case; additionally, the Wikipedia article on MathML mentions that Internet Explorer never supported it (which perhaps wouldn't bother me too much) and that Chromium withdrew support at some point (what?). Anyway: plain docutils with the proper css is the clear winner here in my book.

    I've not evaluated mathjax, which is another option in docutils math_output and is what pelican's render_math plugin uses. Call me a luddite, but I'll file requiring people to let me execute almost arbitrary code on their box just so they see math into the big folder labelled “insanities of the modern Web”.

    So, I can't really tell whether mathjax would approach TeX's quality, but the other two options clearly lose out against real TeX, which using dvipng would render the example to:

    Screenshot with perfect formulae

    – the spacing is perfect, though of course the inline equation has a terrible break (which is not TeX's fault). It hence might still be worth hacking a pelican extension that collects all formulae, returns placeholder image links for them and then finally does a big dvipng run to create these images. But then this will mean dealing with a lot of files, which I'm not wild about.

    What I'd like to ideally use for the small PNGs we are talking about here would be inline images using the data scheme, as in:

    <img src="data:image/png;base64,AAA..."/>
    

    But since I would need to create the data string when docutils calls my extension function, in that scheme I cannot collect all the math rendering for a single run of LaTeX and dvipng. That in turn would mean either starting a new TeX and dvipng process for each piece of math, which really sounds bad, or hacking some wild pipeline involving both, which doesn't sound like a terribly viable proposition either.

    While considering this, I remembered that matplotlib can render quite a bit of TeX math, too, and it lets me do that without any fiddling with external executables. So, I whipped up this piece of Python:

    import base64
    import io
    import matplotlib
    from matplotlib import mathtext

    matplotlib.rcParams["mathtext.fontset"] = "cm"

    def render_math(tex_fragment):
        """returns self-contained HTML for a fragment of TeX (inline) math.
        """
        res = io.BytesIO()
        mathtext.math_to_image(f"${tex_fragment}$",
          res, dpi=100, format="png")
        encoded = base64.b64encode(res.getvalue()).decode("ascii")
        return (f'<img src="data:image/png;base64,{encoded}"'
            f' alt="{tex_fragment}" class="math-png"/>')

    if __name__=="__main__":
        # note the raw string: the backslashes must reach TeX unmangled
        print(render_math(r"\int_0^\infty \sin(x)^2\,dx"))
    

    This prints the HTML with the inline formula, which with the example provided looks like this: \int_0^\infty \sin(x)^2\,dx – ok, there's a bit too much cropping, I'd have to sneak in some transparency, there are no display styles as far as I can tell, and clearly one would have to think hard about CSS rules to make plausible choices for scale and baseline – but in case my current half-satisfaction with docutils' text choices wears off: this is what I will try to use in a docutils extension.
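    If I do go down that road, the docutils side would presumably be a custom role wrapping render_math. A sketch (untested, and the role name math-png is made up):

    from docutils import nodes
    from docutils.parsers.rst import roles

    def math_png_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
        # render_math is the function from the listing above
        return [nodes.raw("", render_math(text), format="html")], []

    roles.register_local_role("math-png", math_png_role)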

  • Vom Töten und Massenschlachten

    This morning, Deutschlandfunk interviewed IMI veteran Tobias Pflüger (Informationen am Morgen, 14 September), and the way the interviewer tried to get Tobias to apologise for a “lack” of bellicosity was predictably outrageous. Conversely, though, Tobias was rather tame, at least compared with the militant pacifism I usually know him for; well, he was appearing as deputy chair of the Left Party, and that party evidently wants to govern[1].

    That – as, of course, does the ultra-cynical melodrama around the “local staff” whom the Bundeswehr “rescued” from the same Afghanistan to which the government had had at least 167 people deported in the preceding months – in turn gives me the pretext to finally formulate a few political counterparts to the Tucholsky classic “soldiers are [if you insist: potential] murderers” that I have long wanted to put down somewhere (even though I acknowledge that they are probably not very original and have certainly been formulated quite similarly, many times, by pacifists, anarchists, and anarchist pacifists; I should probably read more by such people).

    First claim: a government that keeps a military is willing to kill people for its own power.

    In case someone does not find that immediately obvious, let me give a few steps of derivation. First, a military simply exists to wage war or to suppress uprisings. I admit that the Bundeswehr has also repaired dykes, drilled wells, and helped out in vaccination centres. But it has always been exceptionally bad at that, up to and including its inability to staple the vaccination paperwork correctly and in reasonably tidy stacks. That is to be expected, because both the shooting and the obeying are clearly a hindrance in non-lethal missions. Anyone who wanted to keep personnel on hand for “humanitarian” missions would obviously not buy rifles and weapons and spend a lot of money teaching people how to use them (rather than excavators, drills, and paper clips).

    Hence, a military is about personnel for operating weapons of war, and that means for waging war (counter-insurgency being the special case of civil war).

    But what is war? On the one hand, war is a government's attempt to replace another government, either with itself (“war of conquest”) or with one less obstructive to its own exercise of power (“nation building”). And either completely or only in part of the other government's sphere of power.

    On the other hand, war is a government's attempt to hold on to its own power against another government or against parts of the population (in a civil war). Whichever way round it may be in a given case: it is solely about expanding or preserving power.

    Even if, against all evidence (the Bundeswehr currently has weapons in, how many is it?, twenty or so other countries), you credit your own government with being in the role of the power-preserving, quasi-defending government: it could prevent a great deal of bloodshed by simply resigning and saying that the “attacking” government is welcome to try whether it can do better. There would then be no war, and whether governance on the ground would really be substantially worse is by no means certain. I, for one, would probably welcome it if Switzerland took over government in Baden. Or Luxembourg: as far as I am concerned, they could go ahead and conquer me, because they are not thaaat much more unethical or more of a tax haven than my present government, and I believe their social security system is actually a bit more impressive.

    Oh, and as long as it isn't Macron governing right now, an invasion from France would not obviously be a step backwards either, provided the transfer of power happens sufficiently peacefully. I'm trying to learn a bit of French at the moment anyway.

    So: governments that maintain a military thereby clearly announce that they are willing to kill for their power. At the very least, the soldiers of the other governments.

    But it gets worse: as I argued in my fury over the German government's refusal to join the nuclear weapons ban treaty TPNW, nuclear weapons can only be used to kill hundreds of thousands or millions of subjects of a (well: usually) different government. There simply are no other credible scenarios for their use.

    Hence, whoever wants to drop the bomb is willing to turn cities into slaughterhouses for their own power. Every German government in my lifetime has been keen on “nuclear sharing” and thus had that will. The second claim I want to make here follows immediately: anyone living in the FRG is governed by people who will erase cities for their power.

    It would therefore already be a great civilisational step forward if the next government could bring itself to state that, yes, it can imagine killing a few hundred, a thousand, or ten thousand people to preserve its power (that is: it does not simply dissolve the Bundeswehr, which would of course be the most pleasing outcome), but that preserving its own power would still not justify incinerating cities by the dozen (which is why it would end the spectre of nuclear sharing and join the TPNW).

    I'm betting against it.

    [1] Since I am fiercely determined never to be governed with my consent, and thus never want to give my vote to the future government – which so far has not been hard for me – I unfortunately could not give my vote to the Left. Representative democracy really is complicated sometimes, because of course I do want the Left in parliament: where would we be today without their parliamentary questions?
  • Michel Foucault vs. Corona

    Since I was talking about the GEW the other day: another reason why I started a blog 20 years after it was hip was a telecon in April of last year, and the GEW.

    Well, all right, it wasn't the telecon directly, and really not the GEW either.

    What I did have back then, though, was my first teaching situation in the narrower sense via telecon, and shortly afterwards an epiphany about why teaching via video conference feels so shitty. I then wrote an article about it which, encouraged by GEW colleagues, I would have liked to place in B&W (that's the magazine sent monthly to all members in Baden-Württemberg) – that's how brilliant I thought it was. Ahem.

    Well, what can I say, the editors were sceptical, to put it mildly. I have some sympathy for that, because last June things were surely heated when it came to computer-mediated teaching, and interjections linking video conferences to lurid torture scenes would certainly not have been helpful.

    Still, I thought it was a pity. But I didn't really have a place to put something like that.

    Now I do. And so: „Wider das Panopticon – Michel Foucault und der Unterricht via Videokonferenz“ (Against the Panopticon – Michel Foucault and teaching via video conference).

  • Antisprache: Digitalisierung

    When people talk to each other, this can have various reasons. They can chat amicably, they can insult each other, they can try to sell each other stuff – but they can also conduct a discourse, that is, exchange, develop, or critique ideas. For the latter function, a language that is clear and precise is very helpful, one in which, in particular, terms have traceable “signifieds” (i.e., sets of denoted “objects of perception or thought”) in the real world.

    Often enough, though, speakers have no interest in precisely that clarity and precision – especially when communication runs from the top down. Rule works better when it is not quite so clear to the ruled that their will and their interests take a back seat. Then, suddenly, terms are helpful that confuse thoughts rather than clarify them, that scatter information rather than convey it. “Globalisation” is one example; “employer”, “responsibility”, “terrorism”, or “learning outcome assessment” are others.

    For terms that work this way, I at some point came up with the term anti-language: just as antimatter and matter, brought together, react to produce radiation, anti-language and language react to produce... oh, I would love to say “irradiation” here because it fits so well, but no: ultimately, confusion.

    The piece of anti-language that (perhaps together with “populism”) has had the steepest career over the past few years is “Digitalisierung” (digitalisation). The term has gone almost uncriticised, at least not from the perspective of what it actually is and whether everything that is supposed to fall under it belongs together in any way at all. Once again, I cannot lie: one motivation for this blog was to rant about it publicly for once.

    In fact, the countless things subsumed under “digitalisation” (the “extension of the concept”, as the semanticist in me says) simply do not belong together. Not even “well, something with computers” covers, say, automation in industry, habituating people to externally controlled delivery channels for media and goods (“smartphones”, “smart TVs”), computer use in education and training, expansion of network access, sensors of all kinds in political and social repression, the Wikipedia, continuous recording of heart rate and body temperature, open access in science, and “autonomous” cars (which in turn is only a small slice of what has already been cloaked as “digitalisation”). Because there genuinely are friendly and useful things among all that, my occasional attempt at a definition does not hold up either: “digitalisation is when somebody wants other people to have to use computers”.

    If all of this has nothing to do with one another, why would anyone want to throw all these things into one pot, give it a stir, and write “digitalisation” on it? And why does the word arrive now of all times, when practically everything that seriously benefits from the use of computers has long since been computerised?

    As is common with anti-language, various interests combine here, and at the beginning there is usually an ultimately political interest in camouflage. Anyone who says “digitalisation” defines the use of computers as an inevitability, and that is mighty convenient when you are talking to people whose work is being intensified by it, who are being monitored more closely, who lose their income, or who quite simply have no desire to have yet another device around them that they do not understand. “Digitalisation” sounds like something that happens, not like something that somebody does.

    A further hint that “digitalisation” might have something to do with pushing the use of IT through against unwilling subordinates is, incidentally, that the term is so big in the German-speaking world (and that there is, say, no “digitisation” in a comparable role in English): there is a comparatively broad awareness of data protection here (praised be the census boycott of the 1980s!), and the clearer it is what exactly people are now supposed to do with computers, the more resistance there is.

    The talk of “digitalisation” can thus also be understood as the reaction of the various authorities to the (temporary?) failure of electronic health cards and identity cards, and to the regular setbacks in camera surveillance at the bakery counter and keystroke counting on the office computer.

    Making it easier to push through “unpopular measures” (more surveillance, more complication, crashing refrigerators) by fogging up the actual reasons and interests is a general criterion of anti-language. Where apparently no real actor is pushing something through, but an inexplicable zeitgeist is blowing, these “measures” no longer need to be justified either. This is particularly drastic in schools at the moment, because nobody really knows what to do with computers in school – beyond “in physics we make a slow-motion film and compute instantaneous velocities from the individual frames”, I have so far not heard much that is credible. Well, ok, and right now as video phones, of course, but outside a pandemic that obviously makes no sense for anyone involved.

    “Digitalisation”, like much other anti-language, has a booster: Trojan semantics. Here, stuff that really nobody wants is wrapped in a shell of the popular. For example, in many people's minds “digitalisation” is associated with the (for them) positive thought of their mobile phone and the many pleasant hours they spend with it.

    Anyone who now wants to push through aggressively customer-hostile technology such as time-resolving electricity meters (“smart meters”) can hope for less resistance from the future victims if these “smart meters” come floating in on a feel-good cloud of meaning made of TikTok and Tinder. They are not a data protection disaster, they come with digitalisation, they are merely a small price you have to pay for the great possibilities your smartphone offers you.

    This also bears on the question asked above, why the talk of “digitalisation” swelled up precisely when everything computers can sensibly do was already being done by them: if the industry wants to keep growing, its customers must ask even less than before what the autonomously re-ordering refrigerator is actually good for. “Digitalisation” would then be the simple message: do not ask why, because everyone is doing digitalisation now, and if you do not, you are a naysayer who will soon be left terribly behind.

    That is certainly not entirely wrong. But it is not the whole truth either, for which I recently found a wonderful piece of evidence. And it is so good that it is material for another post.

  • Klar: Corona

    I cannot lie: one of the reasons I am showing up with this blog right now is that since December I have been patting myself on the back quite vigorously over the precision of my corona predictions: how satisfying it would have been to be able to point to something public documenting my predictions from early November. When the “soft lockdown” started, I had been telling people, somewhat through gritted teeth: of course the numbers will not go down as long as the workplaces[1] are not massively scaled down, and since nobody dares to do that, ever more bizarre measures will be taken as a consequence.

    And so it has come to pass, right up to the tilting at windmills against the sledgers in the Sauerland and the nightly curfews here in Baden-Württemberg (which, as far as I can tell, exactly nobody enforces).

    Well, now I have my next opportunity. One of the relatively little-noticed phenomena right now is that the number of corona ICU patients has been falling consistently since 4 January. I am fairly convinced that this essentially reflects the shutdown of practically everything (workplaces, shops, schools) in the week before Christmas; and that would also suggest that ICU occupancy lags the infection process by somewhat less than the generally assumed three weeks.

    In fact, since September I have been running an ad-hoc script every day that extracts the current DIVI numbers from the day's RKI report and then plots them logarithmically (i.e., exponential development is a straight line). That looks roughly like this:

    Plot: ICU occupancy 9/2020-1/2021
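    (For what it's worth, the plotting part of such a script is little more than a log-scaled matplotlib plot; here is a minimal sketch, with a made-up divi.csv of date, occupied, and ventilated columns standing in for my actual extraction code:)

    import csv
    import datetime
    import matplotlib.pyplot as plt

    dates, occupied, ventilated = [], [], []
    with open("divi.csv") as f:
        for row in csv.DictReader(f):
            dates.append(datetime.date.fromisoformat(row["date"]))
            occupied.append(int(row["occupied"]))
            ventilated.append(int(row["ventilated"]))

    # log scale, so exponential growth or decline shows up as a straight line
    plt.semilogy(dates, occupied, label="ICU occupancy")
    plt.semilogy(dates, ventilated, label="of which ventilated")
    plt.legend()
    plt.show()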

    The ruler overlaid on the plot above is a little Python hack I made especially for this (I will surely write something about it at some point), and it shows: for almost two weeks now we have had an exponential decline in ICU occupancy – including among ventilated patients, which is what the lower line shows; its parallel course, incidentally, is a fairly reliable indication that no variant that is more aggressive in terms of disease course is circulating in large numbers.

    The bad news: if you look at the slope, you get a halving time of something like six weeks. That will not be enough, especially since – and here comes my prediction – this development will probably be broken soon, because at least in my surroundings the Christmas quiet was over by 11 January at the latest, and in federal states without a holiday on the 6th probably even earlier. Assuming two and a half weeks between infection events and the ICU response, the dream of declining infections should thus be over around the middle of next week, one way or the other.

    And while I am talking about corona numbers: I make this occupancy plot because I am fairly sure that of all the numbers the RKI is currently disseminating, only the DIVI numbers are anywhere close to what they claim to say, even though Peter Antes, whose judgement I value highly, recently voiced doubts about them too, doubts I do not quite understand for now: the two “odd” blips I see really are harmless.

    That the infection numbers are problematic has by now become a commonplace; granted, if you could really determine the time of transmission in significant numbers, a visible weekend effect would certainly show up (since broad transmission is currently probably dominated by work and the commute), but even that would not cause the wild spikes we have become accustomed to over the past months.

    But ok – I would have predicted at any time that the weekend would be visible in data of this kind, even with 24/7 health authorities. What I could not have predicted, for the life of me, is the jaggedness of this curve:

    Plot: corona deaths per day, from JHU

    What you see here are the reported deaths per day (this time not from the RKI but from Johns Hopkins, though it looks no different at the RKI). Both from the movie cliché (“time of death: January thirteenth, twelve thirty-three”) and from my own experience doing civilian service on an intensive care unit, I would have assumed that dates of death are, as a rule, reliable. And as clear as it is that more people die during folk festivals and that motorcycle deaths should show a distinct weekend signal: corona most certainly does not take weekends off.

    So: DIVI rules.

    And I really need to rant against dark mode one of these days.

    [1] As a self-confessed enemy of the car, I have to admit that the biggest wow effect of the whole corona story was when VW halted production last March. That I lived to see that... The second biggest wow effect, by the way, was that the quite noticeable reduction in car traffic in March and April was not quickly reflected in the mortality figures.
  • Engelszüngeln?

    More than 20 years after the first blog – and yes, I am old enough to remember the early days of slashdot – I am now (perhaps) starting a blog. Why?

    Well, I can claim to have blogged before the term existed. At the end of 1996 I started writing UNiMUT aktuell, which corresponded pretty exactly to the later notion of a blog: articles that were, well, born online and that were indeed, even back then, entered into a web form: wow, I wrote a CMS! Over the years the thing acquired many great features, from built-in explanations of abbreviations to backlinks, as can be seen at the bottom of, for instance, my all-time favourite article Ideologieproduktion in der Prüfungsordnung, or of a – as I claim without any modesty – clear-sighted and much-cited contribution on the Bologna disaster with the visionary title Attenti a la Rossa (2002).

    In short: until 2006 I had a playing field of that sort on which I could let off steam. Why I then gave it up belongs in another post. But since then I have repeatedly wished I again had a place for rants, particularly ones that do not quite fit on datenschmutz.

    I hope to post examples of those every now and then in the coming weeks.

    We'll see.

    Meanwhile, thought of the day: today's taz reports that Nico Semsrott has left (or wants to leave) Die PARTEI because Martin Sonneborn wore a T-shirt saying „Au Widelsehen, Amelika“ and apparently reacted inappropriately to the criticism. That, too, was a reminder of my UNiMUT days, because in 1994 I caught a fair amount of criticism for a rather similar joke: an article about the awarding of a “Landeslehrpreis” (state teaching prize) traded in the same clichés. Anyone reading that article may understand how we came up with it at the time. And now I wonder whether the line from the following issue – “the critic who felt UNiMUT could do without borderline-discriminatory teasers is of course right” – really got me off the hook...
