Engelszüngeln: Maintaining Static Blogs Using git push

local                server

main  --- push --->   main
                        |
                        | (merge)
                        |
                        v
                   published --- make publish --->  nginx

Fig 1.  Our scheme in classic ASCII art.


In my post on how I'm using pelican (the static blog engine that formats this site), I described that on a make install, I would do a local build (make publish) and then rsync the result to the production site. Since about June, I no longer do that: pelican touches every generated file on every build, which is not a good match for rsync. With a growing site, this means a substantial amount of data (well: a few megabytes for me at this time) is transferred on each update. What's a few megabytes these days, you ask? Well, ever since UMTS has been shut down, all I have on the road is GPRS (i.e., 10 kB/s with a bit of luck), and then a few megabytes is a lot.

I hence finally changed things to benefit from the fact that I keep the text content in a version control system. For a post without media, all that needs to be transferred are a few kilobytes for a git push. Here is how that is done (assuming a Debian-like setup).

First, unless your source directory is already under git version control, run in there:

git init
git add Makefile content plugins pelicanconf.py publishconf.py theme tasks.py
git commit -am "Migrating into git"

You will probably also want to have a .gitignore, and then probably several other files on top, but that's beside the current point.
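
As a minimal sketch of such a .gitignore: the entries below assume pelican's default names for the output and cache directories plus Python's usual bytecode droppings. Adjust them to whatever your pelicanconf.py actually says.

```shell
# Run this in the blog's source directory.  The entries are assumptions
# based on pelican's defaults (OUTPUT_PATH "output", CACHE_PATH "cache").
cat > .gitignore <<'EOF'
output/
cache/
__pycache__/
*.pyc
EOF
```

Keeping output/ out of version control matters for the scheme described here, since the site is going to be generated on the server.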

Two Repos, Two Branches

The rough plan is to have a complete, checked-out git repository on the server side (ahem: see Figure 1). It is updated from your local repo through pushes. Since git, by default, refuses pushes into the branch that is checked out in the receiving repository, the server-side repository has a branch published checked out, while your authoring happens in the main (traditionally called master) branch. After every push, main is merged into published, and then pelican's site generation runs.

A word of warning: these merges will fail when you force-push. Don't do that. If you do, you will have to fix the breakage on the server side, either by dropping and re-creating the published branch, or by massaging all places that a force-pushed commit changed.
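
Should you ever have to drop and re-create published after a force-push, something like the following might do on the server. It is a sketch rather than something this setup has been exercised with; reset_published is a name made up for this example, and it throws away anything that was committed only on published.

```shell
# Hypothetical helper: point the published branch at whatever main
# currently is, discarding the old published history.
# Usage: reset_published /var/blog/source
reset_published() {
    repo=$1
    # -B resets the branch if it exists (and creates it otherwise),
    # then checks it out, all in one step.
    git -C "$repo" checkout -B published main
}
```

Untracked files such as the generated output directory are left alone by this, so a make publish by hand afterwards brings the site back in line with main.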

To set this up, on the web server do (adapting to your site and taste if you don't like the path):

sudo mkdir -p /var/blog/source
sudo chown `id -u` /var/blog/source # you'll be pushing as yourself
cd /var/blog/source
# create a git repo you can push into
git init
# go away from the main/master branch so you can push into it
git checkout -b published

Then, in your local git repository for the blog, add the repository you just created as a remote named prod and push the main branch (this assumes you have the main branch checked out):

git remote add prod ssh://USER@SERVER.NAME//var/blog/source
git push prod

On the remote server, you are still on the published branch, and hence you will not see what you have just pushed. You have to merge main using:

git merge main

(or master, if that's still the name of your main branch). You should now see whatever you have put into your local git above. If that's true, you can say make publish and see your publishable site in the output subdirectory. If it's not true, start debugging by making sure your main branch on the server side really contains what you think you have pushed.
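
One way to do that check from the local side is to compare commit ids: git ls-remote reports what a branch on the remote points to, and git rev-parse does the same for the local branch. The sketch below wraps this in a shell function; pushed_ok is a hypothetical name, and prod and main are the remote and branch assumed throughout this post.

```shell
# Hypothetical helper: succeed if the remote's branch points to the
# same commit as the local branch of the same name.
# Usage (in the local blog repo): pushed_ok prod main
pushed_ok() {
    remote=$1
    branch=$2
    local_id=$(git rev-parse "refs/heads/$branch") || return 2
    # ls-remote prints "<commit id><TAB><ref name>"; keep the id only.
    remote_id=$(git ls-remote "$remote" "refs/heads/$branch" | cut -f1)
    if [ "$local_id" = "$remote_id" ]; then
        echo "$remote/$branch is at $local_id, same as here"
    else
        echo "mismatch: local $local_id, $remote has ${remote_id:-nothing}" >&2
        return 1
    fi
}
```

If this reports a mismatch, the push did not arrive; if it reports a match and the server's working tree still looks stale, the merge into published is the place to look.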

Automating the Publication

This completes the basic setup. What is still missing is automation. That we can do with a git hook (see the githooks man page for more information on that nifty stuff) that is installed on the server side into /var/blog/source/.git/hooks/post-update. This file contains a shell script that is executed each time commits are pushed into a repository once git has updated everything. In this case, it is almost trivial, except for some bookkeeping and provisions for updating the search engine (all lines with BLOG_ROOT in them; delete these when you have not set that up):

#!/bin/sh
# This hook merges the main branch, builds the web page, and does
# housekeeping.
#
# This *assumes* we have the published branch checked out.  It should
# probably check that one day.

set -e

unset GIT_DIR # this is important, since we're manipulating the
   # working tree, which is a bit uncommon in a post-update hook.
cd ..
BLOG_ROOT=/var/blog

git merge main  # or master, if that is what your main branch is called
make publish
BLOG_DIR=$BLOG_ROOT/source/output $BLOG_ROOT/media/cgi/blogsearch

Do not forget to chmod +x that file, or git will ignore it.
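
The check that the hook's comment promises could look roughly like the following. It is only a sketch (on_branch is a made-up name) and not part of the hook as shown above.

```shell
# Hypothetical guard for the post-update hook: succeed only if the
# working tree in the current directory has the given branch checked out.
on_branch() {
    # symbolic-ref fails on a detached HEAD; treat that as "no branch".
    current=$(git symbolic-ref --short -q HEAD) || current=""
    [ "$current" = "$1" ]
}

# In the hook, after "unset GIT_DIR" and "cd ..", one would then write
# something like:
#   on_branch published || { echo "not on published; not merging" >&2; exit 1; }
```

Bailing out before the merge means a mistakenly checked-out main on the server results in an error message during the push rather than in a confusing merge.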

Back on the local side, you have to change your install target to something like:

rsync:
	# adapt the paths!
	rsync --info=progress2 -av /var/www-local/blog-media/ blog.tfiu.de:/var/blog/media/

install: rsync
	-git commit -a
	# use master instead of main if that is what your branch is called
	git push -u prod main

The - in front of git commit is there because git commit returns non-zero if there is nothing to commit. In the present case you may still want to push, perhaps because previous commits have not been pushed yet, and hence we tell make to ignore the exit status of git commit.
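
If ignoring the exit status wholesale feels too coarse, an alternative is to ask git first whether there is anything to commit. The following is a sketch of that idea as a shell function, not what the Makefile above does; commit_if_changed is a hypothetical name.

```shell
# Hypothetical helper: commit changes to tracked files, but only if
# there are any.  That way "nothing to commit" does not count as a
# failure, while genuine commit errors still do.
commit_if_changed() {
    # "git diff --quiet HEAD" exits 0 when tracked files match HEAD.
    if git diff --quiet HEAD; then
        echo "nothing to commit"
    else
        git commit -a "$@"
    fi
}
```

Called without arguments, this opens the editor just like a plain git commit -a would; extra arguments such as -m "New post" are passed through to git commit.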

With this path and the separate media directory still updated through rsync (cf. the previous post on this), an nginx config would have to contain lines like:

location / {
  root /var/blog/source/output;
}

location /media/ {
  alias /var/blog/media/;
}

This setup has worked nicely and without a flaw for the past few months. It makes a lot more sense than my previous setup, not least because any junk that may accumulate in my local output directory while I'm fooling around will not propagate to the published server. So: If you work with pelican or a similar static blog generator, I'd say this is the way to partial bliss.
