I realise that the great days of discussions on blogs are over, as Sam Hartman blogged the other day – at least for now. Still, I'd like to make it somewhat more straightforward to send me feedback on the posts here than having to get the contact address and dropping me a mail. Hence, I've written a little Python script, feedback, that lets people comment from within their web browsers. [Update 2022-10-07: Don't take it from here; rather, see https://codeberg.org/AnselmF/pelican-ext]
While the script itself is perfectly general and ought to work with any static blog engine, the form template I give in the module docstring is geared towards pelican and jinja, although only in very few places.
To make it work, this needs to become a CGI (the template assumes it will show up in /bin/feedback according to the server configuration). The notes on deployment from my post on the search engine apply here, too, except that in addition the host has to be able to deliver mail. Most Unix boxes do locally, but whether anyone reads such mail is a different question.
Is it ethical to check “ok to publish” by default?
If you look below (or perhaps right if you run your browser full-screen), you will see that there is a checkbox “feel free to publish“ that, right now, I have checked by default. I had some doubts about that in terms of creepy antipatterns. Of course I am as annoyed by most contemporary cookie banners as anyone else, which in violation of the GDPR usually have practical defaults – sure: not what you get when you say “configure” – set at the maximum creep level the operators believe they can get away with. On the other hand, defaults should also be expectable, and I'd guess the expectation when someone fills out a reply form on a blog is that the response will be published with the article. If you disagree: well, the comment form is there for you.
In terms of spam protection, I do an incredibly dumb textcha. Even if this script got deployed to a few dozen sites (which of course would be charming), I cannot see some spam engine bothering to figure it out; since it just sends a mail to the operator, there is basically nothing to be gained from spamming using the CGI. I fully expect this will be enough to keep out the dumb spambots that blindly send whatever forms they can find – it has worked on many similar services.
The feedback script does at least two things that could be exploited:
- It enters remotely controlled values into rendered HTML and
- It calls a local binary with content controlled by the remote user.
Bugs aside, the script is resilient against that, as it properly escapes any user input that gets copied into the output. That is thanks to my “micro templating“ that I keep around to paste into such one-script wonders. Have a look into the code if you're interested in how that works. And totally feel free to paste that into any Python code producing HTML or XML templated in any way – sure, it's not jinja or stan, but it has covered 80% of my templating needs at much less than 20% of the effort (counted in code lines of whatever dependency you'd pull in otherwise), which is a good deal in my book.
Case (2) is probably a lot more interesting. In the evaluate_form function, I am doing:
mail_text = MAIL_TEMPLATE.format(**locals())
Code like this usually is reason for alarm, as far too many text formats can be used to execute code or cause other havoc – the cross-site thing I've discussed for HTML above being one example, the totally bizarre Excel CSV import exploit another (where I really cannot see how this doesn't immediately bring every Windows machine on this planet to a grinding halt). In this case, people could for example insert \ncc: victim@address into anything that gets into headers naively and turn the form into a spam engine.
There are exactly 10000 lines if Python's email module in version 3.9.
In addition, there is a concrete risk creating some way of locally executing code, as the template being filled out is then used as an input for a local program – in this case, whatever you use as sendmail. In theory, I'm pretty sure this is not a problem here, as no user-controlled input goes into the headers. If you change this, either sanitise the input, probably by clamping everything down to printable ASCII and normalising whitespace, or by parsing them yourself. The message content, on the other hand, gets properly MIME-encapsulated. In practice, I can't say I trust Python's email package too much, as by Python stdlib standards, it feels not terribly mature and is probably less widely used than one may think.
But that's a risk I'm willing to take; even if someone spots a problem in the email module, shodan or a similar service still has no way to automatically figure out that it is in use in this form, and my page's insignificance makes it extremely unlikely that someone will do a targeted attack on day 0. Or even day 10.
But then, perhaps this is a good occasion to read through email's source code? Fun fact: in python 3.9, a find . -name "*.py" | xargs wc -l gives exactly 10000 lines. And my instinct that headers are the trickiest part is probably right, too: 3003 of those are in _header_value_parser.py.