A Feedback Form in Pelican

I realise that the great days of discussions on blogs are over, as Sam Hartman blogged the other day – at least for now. Still, I'd like to make sending me feedback on the posts here somewhat more straightforward than having to look up the contact address and drop me a mail. Hence, I've written a little Python script, feedback, that lets people comment from within their web browsers.

While the script itself is perfectly general and ought to work with any static blog engine, the form template I give in the module docstring is geared towards pelican and jinja, although only in very few places.

To make it work, this needs to become a CGI (the template assumes it will show up in /bin/feedback according to the server configuration). The notes on deployment from my post on the search engine apply here, too, except that in addition the host has to be able to deliver mail. Most Unix boxes do locally, but whether anyone reads such mail is a different question.

Is it ethical to check “ok to publish” by default?

To configure where it sends mail to (by default, that's root, which may make sense if you have your own VM), you can set the CONTACT_ADDRESS environment variable (see the search engine post in case you're unsure how to do that for a web context). If your machine is set up to deliver mail to remote addresses – be it with a full mail server or using a package like nullmailer – you could use your “normal” mail address here. In that case, you probably should inform people in your privacy policy that their comments will be sent by unencrypted mail, in particular if that “normal” e-mail is handled by one of the usual rogues (Google still gets about half of the mail I send – sigh).
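As a minimal sketch of how a CGI script might pick up that setting: the function name contact_address is hypothetical and not taken from the feedback script; only the CONTACT_ADDRESS variable and the root default come from the description above.

```python
import os


def contact_address(environ=os.environ):
    """Return the feedback recipient, falling back to root.

    An unset or blank CONTACT_ADDRESS both yield the default.
    """
    return environ.get("CONTACT_ADDRESS", "").strip() or "root"
```

Passing the environment as a parameter keeps the function easy to try out without touching the real process environment, for instance contact_address({"CONTACT_ADDRESS": "me@example.org"}).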

If you look below (or perhaps to the right if you run your browser full-screen), you will see a checkbox “feel free to publish” that, right now, I have checked by default. I had some doubts about that, given how close it comes to creepy antipatterns. Of course I am as annoyed as anyone else by most contemporary cookie banners, which, in violation of the GDPR, usually have their practical defaults (that is, not what you get when you say “configure”) set at the maximum creep level the operators believe they can get away with. On the other hand, defaults should also match expectations, and I'd guess that someone filling out a reply form on a blog expects the response to be published with the article. If you disagree: well, the comment form is there for you.

In terms of spam protection, I do an incredibly dumb textcha. Even if this script got deployed to a few dozen sites (which of course would be charming), I cannot see some spam engine bothering to figure it out; since it just sends a mail to the operator, there is basically nothing to be gained from spamming using the CGI. I fully expect this will be enough to keep out the dumb spambots that blindly send whatever forms they can find – it has worked on many similar services.
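To illustrate how little is needed for such a textcha, here is a sketch under my own assumptions: the question, the accepted answers, and the function name are all made up for illustration and are not the ones the feedback script uses.

```python
# Hypothetical question and answers; any trivial question would do.
TEXTCHA_QUESTION = "What is the name of the planet we live on?"
TEXTCHA_ANSWERS = {"earth", "terra"}


def textcha_passed(answer):
    """Return True if the submitted answer matches one of the accepted ones.

    Case and surrounding whitespace are ignored; a missing form field
    (None) counts as a wrong answer.
    """
    return (answer or "").strip().lower() in TEXTCHA_ANSWERS
```

A bot that blindly fills every field with its payload fails this check, which is all the protection the paragraph above argues is needed.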

Security Considerations

The feedback script does at least two things that could be exploited:

  1. It enters remotely controlled values into rendered HTML and
  2. It calls a local binary with content controlled by the remote user.

In case (1) – that's when I put the URI of the originating article into the reply message to send people back to where they came from – this can potentially be exploited in cross-site attacks. Suppose you trust my site to only execute benign JavaScript (I grant you that's close to an oxymoron these days); someone could then trick you into clicking on a link that goes to my site but executes their, presumably adversarial, JavaScript.

Bugs aside, the script is resilient against that, as it properly escapes any user input that gets copied into the output. That is thanks to my “micro templating” that I keep around to paste into such one-script wonders. Have a look at the code if you're interested in how that works. And totally feel free to paste that into any Python code producing HTML or XML templated in any way – sure, it's not jinja or stan, but it has covered 80% of my templating needs at much less than 20% of the effort (counted in code lines of whatever dependency you'd pull in otherwise), which is a good deal in my book.
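This is not the micro templating from the script, but the underlying idea of escaping on output can be sketched with the standard library's html.escape. The template text and the names REPLY_TEMPLATE and render_reply are assumptions made for this example.

```python
import html

# Hypothetical reply page fragment; the real template looks different.
REPLY_TEMPLATE = (
    "<p>Thanks for your feedback.</p>\n"
    '<p><a href="{back_link}">Back to the article</a></p>'
)


def render_reply(back_link):
    """Fill the remotely controlled link into the reply HTML.

    quote=True also escapes double and single quotes, so the value
    cannot break out of the href attribute.
    """
    return REPLY_TEMPLATE.format(back_link=html.escape(back_link, quote=True))
```

Note that escaping only keeps the value from introducing new markup. It does nothing about a link that is itself hostile (say, a javascript: URI), so a script may additionally want to accept only http(s) or site-relative links.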

Case (2) is probably a lot more interesting. In the evaluate_form function, I am doing:

mail_text = MAIL_TEMPLATE.format(**locals())

Code like this usually is reason for alarm, as far too many text formats can be used to execute code or cause other havoc – the cross-site thing I've discussed for HTML above being one example, the totally bizarre Excel CSV import exploit another (where I really cannot see how this doesn't immediately bring every Windows machine on this planet to a grinding halt). In this case, people could for example insert \ncc: victim@address into anything that gets into headers naively and turn the form into a spam engine.
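To make the header injection concrete, here is a deliberately naive sketch. NAIVE_TEMPLATE, naive_mail and header_names are invented for this demonstration; as far as described here, the real script does not put user input into headers.

```python
# A template that (unwisely) lets a user-controlled value into a header.
NAIVE_TEMPLATE = "To: root\nSubject: {subject}\n\n{body}\n"


def naive_mail(subject, body):
    """Fill the template without any sanitising."""
    return NAIVE_TEMPLATE.format(subject=subject, body=body)


def header_names(raw_mail):
    """Return the lower-cased header names found before the first blank line."""
    head = raw_mail.split("\n\n", 1)[0]
    return [line.split(":", 1)[0].lower() for line in head.split("\n")]


hostile_subject = "Nice post\ncc: victim@address"
print(header_names(naive_mail(hostile_subject, "hello")))
# prints ['to', 'subject', 'cc']
```

The embedded newline ends the Subject header early, and whatever follows is read as a header of its own by the program receiving the message.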

There are exactly 10000 lines in Python's email module in version 3.9.

In addition, there is a concrete risk of creating some way of executing code locally, as the filled-out template is then used as input for a local program – in this case, whatever you use as sendmail. In theory, I'm pretty sure this is not a problem here, as no user-controlled input goes into the headers. If you change this, either sanitise the input, probably by clamping everything down to printable ASCII and normalising whitespace, or parse the values yourself. The message content, on the other hand, gets properly MIME-encapsulated. In practice, I can't say I trust Python's email package too much: by Python stdlib standards, it feels not terribly mature and is probably less widely used than one may think.
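If you do want user input in a header, the clamping described above could look roughly like the following sketch. The helper names, the 200-character limit, and the sendmail path are assumptions; the feedback script itself may build and hand over its message differently.

```python
import re
import subprocess
from email.message import EmailMessage


def clean_header_value(value, max_length=200):
    """Normalise whitespace, then drop everything but printable ASCII.

    Newlines and carriage returns count as whitespace, so they collapse
    to single blanks and can no longer start a new header line.
    """
    collapsed = re.sub(r"\s+", " ", value).strip()
    return re.sub(r"[^\x20-\x7e]", "", collapsed)[:max_length]


def build_message(recipient, subject, body):
    """Build a mail whose only user-controlled header is the cleaned subject."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = clean_header_value(subject)
    # The body may contain arbitrary text; set_content picks a suitable
    # charset and transfer encoding for it.
    msg.set_content(body)
    return msg


def send(msg, sendmail="/usr/sbin/sendmail"):
    """Pipe the serialised message to a local sendmail-compatible binary.

    The path is an assumption; -t makes sendmail take the recipients
    from the message headers.
    """
    subprocess.run([sendmail, "-t"], input=msg.as_bytes(), check=True)
```

Passing the command as a list rather than a shell string means no user-controlled text is ever interpreted by a shell; it only travels through the program's standard input.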

But that's a risk I'm willing to take; even if someone spots a problem in the email module, shodan or a similar service still has no way to automatically figure out that it is in use in this form, and my page's insignificance makes it extremely unlikely that someone will do a targeted attack on day 0. Or even day 10.

But then, perhaps this is a good occasion to read through email's source code? Fun fact: in Python 3.9, a find . -name "*.py" | xargs wc -l gives exactly 10000 lines. And my instinct that headers are the trickiest part is probably right, too: 3003 of those are in _header_value_parser.py.
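For those who would rather count from within Python, here is a rough equivalent of that shell pipeline. The function name count_py_lines is made up, the numbers will differ between Python versions, and a last line without a trailing newline is counted here although wc -l would not count it.

```python
import email
from pathlib import Path


def count_py_lines(package_dir):
    """Return {relative path: line count} for all *.py files below package_dir."""
    counts = {}
    for path in sorted(Path(package_dir).rglob("*.py")):
        with open(path, "rb") as source:
            counts[str(path.relative_to(package_dir))] = sum(1 for _ in source)
    return counts


# Locate the installed email package and count its source lines.
counts = count_py_lines(Path(email.__file__).parent)
print(sum(counts.values()), counts.get("_header_value_parser.py"))
```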

Category: edv