Entering Wild Unicode in Vim

Screenshot of a vi session showing the unicode PILE OF POO symbol

This is what this post is about: being able to type PILE OF POO (a.k.a. U+1f4a9) in vim in…tuitively. For certain notions of intuition.

As a veteran of writing texts in TeX, I've long tended not to bother with “interesting” characters (like the quotes I've just used, or the plethora of funny characters one has for writing math) in non-TeX texts. That is, until I started writing a lot of material not (directly) formatted using TeX, such as this post. And until reasonably robust Unicode tooling became widely available.[1]

I enter most of my text in vim, and once I decided I wanted the exotic Unicode characters, I experimented with various ways to marry Unicode and vim. What I ended up with is somewhat inspired by TeX, where one enters funny characters through macros: a backslash and a few reasonably suggestive letters, as perhaps \sigma for σ or \heartsuit for ♡.

In vim, I get something similar from insert-mode abbreviations (that's what iab, short for :iabbrev, defines) combined with Unicode escapes. I've found that the abbreviations don't get in my way elsewhere if I end them with a slash. And so I now have quite a few definitions like:

iab <expr> scissors/ "\u2702"

in my ~/.vimrc. This lets me type scissors/␣ to get ✂ (and blank/␣ to get the visible blank after the scissors). This works reasonably well for me. Only when the abbreviation is not bounded by blanks do I have to briefly leave insert mode, because vim only expands an abbreviation of this kind when it is preceded by a blank, the start of the line, or the start of the current insertion. For instance, for y▶y, where the symbol directly abuts the letters, I have to type y<ESC>aarrleft/ <BACKSPACE>y. I don't know about people who didn't grow up with TeX, but in my world such a thing passes as really natural, and for me it easily beats the multibyte keymaps that, I think, the vim authors would recommend here.
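In case iab is unfamiliar, here is a small annotated sketch of what such a line does. The <expr> flag makes vim evaluate the right-hand side as an expression, so the double-quoted string "\u2702" turns into the character itself and the vimrc can stay plain ASCII. The second definition, with the made-up name scissorslit/, is only there for contrast: without <expr>, the glyph has to appear literally in the file.

```vim
" Insert-mode abbreviation with an expression as its right-hand side.
" <expr> makes vim evaluate "\u2702" as a double-quoted string, which
" yields the character U+2702 (✂).
iabbrev <expr> scissors/ "\u2702"

" Hypothetical alternative without <expr>: the right-hand side is inserted
" verbatim, so the glyph itself must be in the file, and vim must read the
" file as UTF-8 (scriptencoding applies to the lines that follow it).
scriptencoding utf-8
iabbrev scissorslit/ ✂
```

For the abutting case, CTRL-] in insert mode (see :help i_CTRL-]) triggers an abbreviation without inserting a character, so y<ESC>aarrleft/<C-]>y should produce y▶y without the <BACKSPACE>.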

And how do I figure out the Unicode code points (i.e., the stuff after the \u)? Well, there is the unicode(1) command (Debian package unicode), which sounds cool but in practice only finds what I'm looking for about every other time: it's hard to come up with good search terms for characters whose names one doesn't know.
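Once a candidate glyph is in a buffer (pasted from a browser, say), vim itself can reveal its code point: ga on the character shows it in decimal, hex and octal. Going one step further, here is a small sketch that prints a ready-to-paste iab line for the character under the cursor. The function name UnicodeAbbrevFor is made up for this example, and it assumes 'encoding' is utf-8, so that char2nr() returns the Unicode code point.

```vim
" Hypothetical helper: print an iab line for the character under the cursor.
" Usage: put the cursor on the glyph, then :call UnicodeAbbrevFor('scissors')
function! UnicodeAbbrevFor(name) abort
  " Grab the (possibly multibyte) character at the cursor's byte column.
  let l:char = matchstr(getline('.'), '\%' . col('.') . 'c.')
  let l:nr = char2nr(l:char)
  " \u accepts at most four hex digits, \U up to eight.
  if l:nr > 0xFFFF
    let l:escape = printf('\U%x', l:nr)
  else
    let l:escape = printf('\u%04x', l:nr)
  endif
  echo printf('iab <expr> %s/ "%s"', a:name, l:escape)
endfunction
```

With the cursor on ✂, :call UnicodeAbbrevFor('scissors') echoes iab <expr> scissors/ "\u2702".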

In practice, most of the time I browse the various Unicode blocks linked from the Wikipedia Unicode page. In my experience, going by the block titles is a good way to visually hunt for the glyphs I'm after. The result is the following list of abbreviations; if you come up with interesting new ones, do send them in and I will update this list:

iab <expr> deg/ "\u00B0"
iab <expr> pm/ "\u00B1"
iab <expr> squared/ "\u00B2"
iab <expr> cubed/ "\u00B3"
iab <expr> times/ "\u00D7"
iab <expr> half/ "\u00BD"
iab <expr> acirc/ "\u00E2"
iab <expr> egrave/ "\u00E8"
iab <expr> idia/ "\u00EF"
iab <expr> subtwo/ "\u2082"
iab <expr> euro/ "\u20ac"
iab <expr> trademark/ "\u2122"
iab <expr> heart/ "\u2764"
iab <expr> smile/ "\u263A"
iab <expr> arrow/ "\u2192"
iab <expr> emptyset/ "\u2205"
iab <expr> bullet/ "\u2022"
iab <expr> intersects/ "\u2229"
iab <expr> scissors/ "\u2702"
iab <expr> umbrella/ "\u2614"
iab <expr> peace/ "\u262e"
iab <expr> point/ "\u261b"
iab <expr> dots/ "\u2026"
iab <expr> mdash/ "\u2014"
iab <expr> sum/ "\u2211"
iab <expr> sqrt/ "\u221a"
iab <expr> approx/ "\u2248"
iab <expr> neq/ "\u2260"
iab <expr> radio/ "\u2622"
iab <expr> hazmat/ "\u2623"
iab <expr> pick/ "\u26cf"
iab <expr> eject/ "\u23cf"
iab <expr> check/ "\u2713"
iab <expr> alpha/ "\u03b1"
iab <expr> beta/ "\u03b2"
iab <expr> gamma/ "\u03b3"
iab <expr> delta/ "\u03b4"
iab <expr> epsilon/ "\u03b5"
iab <expr> zeta/ "\u03b6"
iab <expr> eta/ "\u03b7"
iab <expr> theta/ "\u03b8"
iab <expr> kappa/ "\u03ba"
iab <expr> lambda/ "\u03bb"
iab <expr> mu/ "\u03bc"
iab <expr> nu/ "\u03bd"
iab <expr> Delta/ "\u0394"
iab <expr> Xi/ "\u039e"
iab <expr> pi/ "\u03c0"
iab <expr> rho/ "\u03c1"
iab <expr> sigma/ "\u03c3"
iab <expr> chi/ "\u03c7"
iab <expr> supp/ "\u207A"
iab <expr> supm/ "\u207B"
iab <expr> tripleeq/ "\u2261"
iab <expr> cdot/ "\u22c5"
iab <expr> powern/ "\u207f"
iab <expr> chevopen/ "\u00BB"
iab <expr> chevclose/ "\u00AB"
iab <expr> element/ "\u2208"
iab <expr> notelement/ "\u2209"
iab <expr> subset/ "\u2282"
iab <expr> superset/ "\u2283"
iab <expr> blank/ "\u2423"
iab <expr> block/ "\u2588"
iab <expr> achtel/ "\u266a"
iab <expr> clef/ "\U1d11e"
iab <expr> arrleft/ "\u25B6"
iab <expr> arrdown/ "\u25BC"
iab <expr> poop/ "\U1f4a9"
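With a list this long, one could also keep the table as data and generate the abbreviations in a loop. This is a sketch of that alternative, not how the list above is written; only a few entries are shown, the variable name s:unicode_abbrevs is made up, and 'encoding' is assumed to be utf-8.

```vim
" Name -> code point table; extend with the remaining entries as needed.
let s:unicode_abbrevs = {
      \ 'deg': 0x00B0,
      \ 'scissors': 0x2702,
      \ 'clef': 0x1D11E,
      \ 'poop': 0x1F4A9,
      \ }

for [s:name, s:nr] in items(s:unicode_abbrevs)
  " nr2char() turns the number straight into the character, so the
  " four-versus-eight hex digit question never comes up.
  " :execute joins its arguments with spaces, e.g. "iabbrev deg/ °".
  execute 'iabbrev' s:name . '/' nr2char(s:nr)
endfor
```

Whether this beats one line per abbreviation is a matter of taste; it does keep the vimrc ASCII-only without needing any escapes.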

One last thing you should know: quite a few interesting Unicode characters lie outside what is known as the “Basic Multilingual Plane”, which is a pompous way of saying: within the first 65536 code points. Outside it are most emojis (please don't torture me with those), but also the timeless PILE OF POO character, whose rendering in the Hack font is shown in the opening image. Code points above U+FFFF (65535) need more than four hex digits, but vim's \u escape reads at most four; to make vim grok those, you need to say \U, which takes up to eight, rather than \u.
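The difference is easy to check at vim's command line or from a sourced file: strchars() counts characters, and the \u form of the escape splits PILE OF POO into two. A short sketch, assuming 'encoding' is utf-8:

```vim
" \u reads at most four hex digits: U+1F4A, then a literal "9". Prints 2.
echo strchars("\u1f4a9")
" \U reads up to eight hex digits: the single character U+1F4A9. Prints 1.
echo strchars("\U1f4a9")
" Same character as building it from the number directly. Prints 1 (true).
echo "\U1f4a9" ==# nr2char(0x1F4A9)
```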

[1] Which, I grant you, has been the case for about ten years now, so it's not like any of this is bleeding-edge.