How I'm Using Pelican

I started this blog on January 14th last year. To celebrate the anniversary, I thought I'd show how I'm using pelican (the engine this blog runs on); perhaps it'll help other people using it or some other static blog generator.

Posting and Writing

First, I structure my content subdirectory (for now) such that each article's source file has the ISO-formatted date as its name. That makes the source name rather predictable (handy for linking using pelican's {filename} replacement), keeps it short, and gives the natural sort order sensible semantics.

Also, I want to start each post from a template, and so among the first things I did was write a little script to automate name generation and template instantiation. Over the past year, that script has evolved into post.py3 [Update 2022-03-15: I've changed a few things in the meantime; in particular, I am now opening a web browser because I got tired of hunting for the URI when it was scrolled off the screen before I first had something to open, and to make that work smoothly, I'm building the new post right after creating its source]. It sits next to pelican's Makefile and is in the blog's version control. With this, starting this post looked like this:

$ ./post.py3 "How I'm Using Pelican"
http://blog/how-i-m-using-pelican.html
remake.sh output/how-i-m-using-pelican.html
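For illustration, here is a minimal sketch of what a script along the lines of post.py3 might do; the template text, the paths, and the slug rules are my assumptions for this sketch, not the actual script's code:

```python
#!/usr/bin/env python3
# Hypothetical sketch in the spirit of post.py3; the template text and
# the slug rules are assumptions, not the actual script's code.
import datetime
import re
import sys
from pathlib import Path

TEMPLATE = "{title}\n{underline}\n\n:date: {date}\n:slug: {slug}\n\n"

def slugify(title):
    # Lower-case and replace every run of non-alphanumerics with a
    # hyphen, which reproduces slugs like how-i-m-using-pelican.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def make_post(title, content_dir="content"):
    today = datetime.date.today()
    slug = slugify(title)
    source = Path(content_dir) / (today.isoformat() + ".rst")
    source.write_text(TEMPLATE.format(
        title=title, underline="#" * len(title),
        date=today.isoformat(), slug=slug))
    # Print the URL to paste into the browser and the rebuild command.
    print("http://blog/{}.html".format(slug))
    print("remake.sh output/{}.html".format(slug))
    return source

if __name__ == "__main__" and len(sys.argv) > 1:
    make_post(sys.argv[1])
```

The real post.py3 does more (version control, browser opening, disambiguation), but the core is just this: derive a slug and a date-based file name, instantiate a template, and report where the result will appear.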

[Update 2022-05-26: the output is now a bit different, and now I do open the browser window – see below]

What the thing printed is the URL the article will be seen under (I've considered using the webbrowser module to automatically open it, but for me just pasting the URL into my “permanent” blog browser window works better). The second line gives a command to build the document for review. This remake.sh script has seen a bit of experimentation while I tried to make the specification of what to remake more flexible. I've stopped that, and now it's just:

#!/bin/bash
pelican --write-selected "$1"

When you add:

CACHE_CONTENT = True
LOAD_CONTENT_CACHE = True
CONTENT_CACHING_LAYER = 'generator'

to your pelicanconf.py, rebuilding just the current article should be relatively quick (about 1 s on my box). Since I like to proofread on the formatted document, that's rather important to me.

[Update 2022-05-26: N…no. This part I'm now doing very differently. See Quick RST Previews]

If you look at post.py3's code, you will see that it also fixes the article's slug, i.e., the path part of the URL. I left this to Pelican for a while, but it annoyed me that even minor changes to a blog title would change the article's URI (and hence also the remake statement). I was frankly tempted to not bother having elements of the title in the slug at all, as I consider this practice SEO, and I am a fanatical enemy of SEO. But then I figured producing shorter URIs isn't worth that much, in particular when I'd like them to be unique and easy to pronounce. In the end I kept the title-based slugs.

The script also picks the local file name as per the above consideration, with some disambiguation if there are multiple posts on one day (which has only happened once in the past year). Finally, the script arranges for adding the new post to the version control system. Frankly, from where I stand now, I'd say I had overestimated the utility of git for blogging. But then, a git init is cheap, and who knows when that history may become useful.
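The disambiguation could be as simple as probing for existing files and appending a counter; a hypothetical sketch (the content/<ISO date>.rst layout is assumed, not taken from the actual script):

```python
# Hypothetical sketch of the file-name disambiguation: probe for an
# existing source for the day and append a counter if needed. The
# layout (content/<ISO date>.rst) is an assumption for this sketch.
import datetime
from pathlib import Path

def source_name(content_dir, date=None):
    date = date or datetime.date.today()
    stem = date.isoformat()
    candidate = Path(content_dir) / (stem + ".rst")
    counter = 2
    while candidate.exists():
        candidate = Path(content_dir) / ("{}-{}.rst".format(stem, counter))
        counter += 1
    return candidate
```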

I'm not using pelican's draft feature. I experimented with it for a while, but I found it's a complication that's not worth anything given I'm always finishing a post before starting the next. That means that what otherwise would be the transition from draft to published for me is the make install. The big advantage of starting with status:published is that under normal circumstances, an article never changes its URI.

Local Server Config and Media

Another pelican feature I'm not using is attaching static files. I experimented with it initially, but when the first larger binary files came in, I realised they really shouldn't be under version control. Also, I never managed to work out a smooth and non-confusing way to have pelican copy these files predictably anyway.

What I ended up doing is having an unversioned, web-published directory that contains all non-article (“media”) files. On my local box, that's in /var/www/blog-media, and to keep a bit of order in there, the files sit in per-year subdirectories (you'll spot that in the link to the script above). The blog directory with the sources and the built documents, on the other hand, is within my home. To assemble all this, I have an /etc/apache2/sites-enabled/007-blog.conf containing:

<VirtualHost *:80>
  ServerName blog
  DocumentRoot /home/anselm/blog/output

  Alias /media /var/www/blog-media

  ProxyPass /bin/ http://localhost:6070/

  <Directory "/home/anselm/blog/output">
    AllowOverride None
    Options Indexes FollowSymLinks
    Require all granted
  </Directory>

  <Directory ~ "/\.git">
    Require all denied
  </Directory>
</VirtualHost>

which needs something like:

127.0.0.1 localhost blog

in your /etc/hosts so the system knows what the ServerName means. The ProxyPass statement in there is for CGIs, which of course apache could do itself; more on this in some future post. And I'm blocking access to git histories for now (which do exist in my media directory) because I consider them fairly personal data.

Deployment

[Update 2022-07-10: I'm now doing this quite a bit differently because I have decided the procedure described here is a waste of bandwidth (which matters when all you have is GPRS). Poke me for a post on what I'm doing now.]

When I'm happy with a post, I remake the whole site and push it to the publishing box (called sosa here). I have added an install target to pelican's Makefile for that:

install: publish
  rsync --exclude .xapian_db -av output/ sosa:/var/blog/generated/
  rsync -av /var/www/blog-media/ sosa:/var/blog/media/
  ssh sosa "BLOG_DIR=/var/blog/generated/ /var/blog/media/cgi/blogsearch"

As you can see, on the target machine there's a directory /var/blog belonging to me, and I'm putting the text content into the generated and the media files into the media subdirectory. The exclude option to the rsync and the call to blogsearch are related to my local search: I don't want the local index on the published site so I don't have to worry about keeping it current locally, and the call to blogsearch updates the index after the upload.

The publication site uses nginx rather than apache. Its configuration (/etc/nginx/sites-enabled/blog.conf) looks like this (TLS config removed):

server {
  include snippets/acme.conf;
  listen 80;
  server_name blog.tfiu.de;

  location / {
    root /var/blog/generated/;
  }

  location /media/ {
    alias /var/blog/media/;
  }

  location /bin/ {
    proxy_pass http://localhost:6070;
    proxy_set_header Host $host;
  }

  location ~ \.git/ {
    deny all;
  }
}

– again, the clause for /bin is related to local search and other scripting.

Extensions

In addition to my local search engine discussed elsewhere, I have also written two pelican plugins. I have not yet tried to get them into pelican's plugin collection because… well, because of the usual mixture of doubts. Words of encouragement will certainly help to overcome them.

The first, again related to searching, is articlemtime.py. This is just a few lines making sure the time stamps on the formatted articles match those of their input files. That is very desirable to limit re-indexing to just the changed articles. It might also have advantages for, for instance, external search engines or harvesters working with the HTTP if-modified-since header; but then these won't see changes in the non-article material on the respective pages (e.g., the tag cloud). Whether or not that is an advantage I can't tell.
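The core operation of such a plugin is just copying the source file's modification time onto the rendered file. Here is a standalone sketch of that single step; the pelican plugin hooks themselves, and how source and output paths are paired up, are omitted:

```python
# Standalone sketch of the idea behind articlemtime.py: give a rendered
# document the modification time of its source. The pelican plugin
# hooks are omitted; this shows only the time-stamp copying itself.
import os

def sync_mtime(source_path, output_path):
    st = os.stat(source_path)
    # os.utime wants (atime, mtime); keep the source's values for both.
    os.utime(output_path, (st.st_atime, st.st_mtime))
```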

[Figure: the citedby plugin in action – a “Links to blog posts” box listing the articles that cite this post right now.]

I wrote the other custom extension when working on something like the third post in total, planning to revisit it later since it has obvious shortcomings. However, it has been good enough so far, and rather than doing it properly and then writing a post of its own, I'm now mentioning it here. It's citedby.py, and it adds links to later articles citing an article. I think this was known as a pingback in the Great Days of Blogs, though here it's just within the site; whatever the name, I consider this kind of thing eminently useful when reading an old post, as figuring out how whatever was discussed unfolded later is half of your average story.

The way I'm currently doing it is admittedly not ideal. Essentially, I'm keeping a little sqlite database with the cited-citing pairs. It is populated when writing the articles (and pulls the information from the rendered HTML, which perhaps is a bit insane, too). This means, however, that a newly-made link will only appear after a second generating pass.
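To make the mechanics concrete, here is a rough, self-contained sketch of a cited-citing store of that kind; the schema, the regular expression for pulling links out of rendered HTML, and the function names are invented for illustration and not taken from citedby.py:

```python
# Rough sketch of a cited-citing store; schema and link extraction are
# invented for illustration, not taken from the actual citedby.py.
import re
import sqlite3

# Assumed link shape: internal hrefs like href="/some-slug.html".
LINK_RE = re.compile(r'href="/([a-z0-9-]+)\.html"')

def record_citations(db_path, citing_slug, rendered_html):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS citations"
                 " (cited TEXT, citing TEXT, UNIQUE(cited, citing))")
    for cited_slug in LINK_RE.findall(rendered_html):
        conn.execute("INSERT OR IGNORE INTO citations VALUES (?, ?)",
                     (cited_slug, citing_slug))
    conn.commit()
    conn.close()

def cited_by(db_path, slug):
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT citing FROM citations WHERE cited=?",
                        (slug,)).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

Since the database is only filled while articles are being written, a link recorded in pass one can only be rendered into its target's page in pass two – which is exactly the two-pass behaviour described above.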

Another weak point is that the thing does not notice when a post changes its slug, which means that when you change slugs, you will get zombie citing-cited pairs, and they will show as broken links on the rendered pages. It's easy enough to escape from that – rm content/.citedby.db; make clean html; make clean html – but that's certainly really clumsy.

I think one could do a lot better without undue effort by plugging into the parsing stage of the articles. But on the other hand, given I very typically generate the local site a few times before building the new published site, this two-pass-thing really is not a problem for me in practice, and unless people tell me they want to use this, too, it's unlikely I will put a lot of additional effort into it.

Editor configuration

There are a few vim extensions for dealing with reStructuredText. I'm not using any of them, mainly since I'm not a big fan of syntax highlighting or other attempts to make a computer seem smart. What I did find useful was automatically making hyperlink targets from their anchors (which is a bit tedious in reStructuredText). For that, I somewhat adapted autolink (which didn't quite work for me because it doesn't deal with links spread over multiple lines, and I'm religious about having source code lines no longer than 80 characters).

The thing has the simple purpose of adding .. _anchor text: below a paragraph containing markup like `anchor text`_ when I hit \am. To enable this, my .vimrc contains an augroup for reStructuredText:

augroup rst
  au!
  autocmd BufRead,BufNewFile *.rstx,*.rst set tw=72 fo=tc2 et comments=""
  autocmd BufRead,BufNewFile *.rstx,*.rst nnoremap <Leader>am :call Rest_createlink()<CR>
augroup END

For convenience, I define the Rest_createlink function referenced in place (i.e., in the .vimrc); and this longish source concludes this post:

" from: https://www.vim.org/scripts/script.php?script_id=4023
" Insert a search result for a ReST link definition.
" Insert a link definition like .. _foo:
function! Rest_createlink()
    " Find a link: `foo`_
    call search('`\_.\{,80}`_', 'bcW')
    " Get the text of the link.
    execute "normal! l\"my/`_\<cr>"
    let key = @m
    let key = substitute(key, '\n', ' ', 'g')
    " Insert the link definition after the current paragraph.
    call s:after_paragraph()
    execute "normal! o.. _".key.": \<esc>"
    call s:blank_line_if_next_does_not_match('\v^\s*\.\.')
endfunction

function! s:after_paragraph()
    execute "normal! }"
    if line('.') == line('$')
        " Last line in buffer. Make a blank line.
        execute "normal! o\<esc>"
    endif
endfunction

" Make a blank line after the current one if the next line exists and does not
" match a regex (i.e., another link definition).
function! s:blank_line_if_next_does_not_match(pat)
    if line('.') != line('$')
        let nextline = getline(line('.')+1)
        if match(nextline, a:pat) == -1
            execute "normal! o\<esc>k$"
        endif
    endif
endfunction

