I received an email over lunch on Saturday from a kind stranger informing me that users of an XMPP-based chat server I operate have been sending spam to this person’s own system. I confess I’d been letting the server essentially run itself for many years, so when I visited the sub-basement of the jmac.org filesystem where it prints all its logs and flipped the lights on, I recoiled at the writhing mass of obvious non-human users which constituted the bulk of its activity, doing god knows what. Thus did I spend my afternoon in an unplanned but quite engaging digital sanitation exercise.
First, following my interlocutor’s advice — and after I mailed them back with a swift apology and an oath to make things right — I deactivated the server’s default, unfettered “in-band registration” feature. When active, this allows any person or process to sidle up to an XMPP server and create a new account in a single step. When I first set up this Jabber server in 2004 to use as the basis for a tremendously ambitious project, this seemed like a natural thing to want. We know better now, of course, but in all the time since, I confess I never thought back to it. So, I shut and painted over that door in the config file. This immediately stopped the unchecked mechanized immigration, but now I had to figure out what to do about all the critters who’d moved in since.
My next step, then, involved separating the soft, squishy sheep from the cold robo-goats. Once I had puzzled out how ejabberd’s command-line interface works nowadays, I asked it for a dump of all the server’s registered users, and got back around 42,000 names. Most of them resembled long strings of gibberish: obvious robots. But I had no desire to pick out the humans from such a long list on sight alone, and I knew that trying to do it via pattern-matching would surely catch a lot of false positives.
So, I changed my focus from the static user-list to the Jabber server’s logs, which recorded in great detail all its activity over the last two weeks. (Normally the logs would cover a longer stretch of time, but the robots toiling in darkness kept the server so constantly busy that its automated log-rotation moved much faster than normal, foreshortening its calendar-coverage.) I decided to call it a win if I could eliminate all the robots responsible for recent abusive behavior, and not worry at present about the cold scrap-pile of inactive bot-accounts, no matter how huge.
Eyeballing the logs, three facts became clear:
The robots connected from a wide variety of IP addresses, meaning that my server likely had at least one botnet aimed at it, and that merely blocking access to my machine based on incoming IP would probably not work.
The robots often opened several simultaneous, overlapping connections, keeping each one open for only a few seconds at a time. (I assume that each such connection stayed active just long enough to inject a payload of spam at its targets, then immediately exit.)
When logging in, the robots preferred to capitalize the first letter of their usernames. For example, if one held the account “abaca”, then it would always log in with the literal string “Abaca”, using a capital “A”.
Blocked by the first fact and guided by the latter two, I wrote a Perl script that analyzed the logs and printed the names of every account which, over the last two weeks, had at least three times connected for one minute or less using a leading-capital-letter login name. Within moments, I had a list of around 800 account names which displayed a fascinatingly oblique sort of homogeny. Here are a few excerpts of the output file (which I’d sorted into alphabetical order):
All the bots’ names looked like this. What strikes me is how they tried. They could have chosen random strings of characters; indeed, most of the inert usernames in my initial file of 42,000 usernames are stuff like “!!dy42” or “14khuhrg9” (both real examples). Instead, they made the effort to pass as human by following some formula to programmatically generate names that are on-sight pronounceable to an English-language reader. I imagine the robots somehow feeling pleased with their nametags, looking forward to mingling invisibly with humans, paying no mind that not a one of them matches a name any real human has ever carried. They further assert their friendly, perfectly-normal humanity by always carefully capitalizing their names when logging in, because of course that is how humans always write their names down, yes?
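For the curious, the filter described above (at least three connections of a minute or less, each made under a leading-capital login name) can be sketched in a few lines. This is a Python illustration, not the original Perl script, and it assumes the log has already been parsed into hypothetical `(login, connect_time, disconnect_time)` records; the real ejabberd log format is not shown here and would need its own parser. The function name and thresholds are likewise just illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta


def suspicious_accounts(sessions, max_duration=timedelta(minutes=1), min_hits=3):
    """Return sorted account names that repeatedly made short, capitalized logins.

    `sessions` is an iterable of (login, connect_time, disconnect_time) tuples.
    This record shape is an assumption made for the sketch.
    """
    hits = Counter()
    for login, start, end in sessions:
        is_short = (end - start) <= max_duration
        is_capitalized = login[:1].isupper()
        if is_short and is_capitalized:
            # The account itself is lowercase; only the login string is capitalized.
            hits[login.lower()] += 1
    return sorted(name for name, count in hits.items() if count >= min_hits)


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1, 12, 0, 0)
    demo = [
        ("Abaca", t0, t0 + timedelta(seconds=5)),
        ("Abaca", t0, t0 + timedelta(seconds=4)),
        ("Abaca", t0, t0 + timedelta(seconds=9)),
        ("alice", t0, t0 + timedelta(seconds=5)),
    ]
    print(suspicious_accounts(demo))
```

Requiring all three signals at once (short session, capitalized login, repetition) is what keeps a human who happens to capitalize their name, or who gets disconnected quickly once or twice, off the list.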
I am not sure whether this personification of the robot accounts as hapless alien infiltrators made my erasing them more or less pleasurable, but I rewrote the script to wrap all those names in account-deletion shell commands and ran it anyway. Having committed this atrocity, I find myself keeping the poor critters alive just a little bit longer, in a way; in writing this post and revisiting that list, I find I enjoy saying their names out loud. With their reliance on plosives and open-vowel endings, to my Anglophonic ear they have a pleasant, vaguely African lilt to them. They could be characters in an intriguing, otherworldly novel, and even as I type this I wonder if the master of these robots borrowed the services of an existing fantasy-story-name-generator tool.
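Wrapping a list of names in deletion commands is a small transformation, sketched below in Python. The exact commands used in the cleanup are not shown in this post; the sketch assumes ejabberd's `ejabberdctl unregister <user> <host>` form, and the host `example.org` is a placeholder rather than the real server name. Each argument is shell-quoted, since usernames like “!!dy42” contain characters the shell would otherwise interpret.

```python
import shlex


def deletion_commands(usernames, host):
    """Build one shell command per account, quoting each argument.

    Assumes the `ejabberdctl unregister <user> <host>` form; verify it
    against your own ejabberd version before running the output.
    """
    return [
        f"ejabberdctl unregister {shlex.quote(user)} {shlex.quote(host)}"
        for user in usernames
    ]


if __name__ == "__main__":
    # "example.org" is a placeholder host for illustration only.
    for command in deletion_commands(["abaca", "!!dy42"], "example.org"):
        print(command)
```

Printing the commands to a file and reading them over before piping them to a shell leaves a chance to spot any real human who slipped onto the list.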
And that is the end of the story of the funny little robots who wore silly man-masks and got away with it for a while. (While spewing spam across the internet from the safety of my own server. Sorry. I fixed it!)