You have just read a blog post written by Jason McIntosh.
Today I learned a practical lesson on how, when investigating software bugs, simplifying the problem space as much as possible matters — and so does thoroughly exploring every angle offered by that simplified space, no matter how unlikely it seems.
A client wished to test a new version of a certain system that another system of theirs, one under my control, regularly contacts through a simple, HTTP-based API. We found a strange problem: the test system rejected all attempts at communication from the system I oversee (let’s call it “MySystem”), always sending back HTTP 403 responses, signifying an access-permission failure of some kind. However, it was happy to talk to every other computer on the internet. It would even greet manual pokes sent through a web browser. From my point of view, the test system was pleased to receive messages from everyone in the world except for MySystem, the one system that my client required it to listen to.
Step one of investigating any mystery like this often means writing a program that replicates the problem as simply as possible. And so, I knit up a stand-alone script that invoked the same code libraries and techniques that MySystem uses to make web requests. Even though they came from my own computer, the remote system helpfully rejected these requests as well. Progress, of a sort!
As coincidence would have it, I had recently begun reading brian d foy’s new book Mojolicious Web Clients, which opens with several simple examples of tiny command-line web-requesters that print the full text of their work to the terminal as they go. Out of curiosity, I aimed one of these little ready-to-go programs at the target, from the very same computer — and the remote server welcomed the request, sending back the correct response.
At this point I had two simple test programs. One used the thoroughly modern web toolkit for the Perl programming language called Mojolicious (“Mojo” for short); the other used the decades-old LWP, which the similarly venerable MySystem employs. The former could make requests of the target with no problem, and the latter had all its requests summarily rejected instead. The requests came from the same machine, and asked for the same URL, using the same HTTP method. What was going on?
I had no access to the server’s logs, so I didn’t know what complaint it might have had with the LWP-based requests. Nothing to do, then, but dig into this problem myself, seeing what difference existed between the requests the two toolkits sent to the remote server. Time for another simple script that tried the trick with both libraries, printing out the full text of the request made, and the HTTP code of the response. The result looked like this (if you’ll pardon my obvious obfuscation of the client’s URL):
    Trying with Mojo:
    GET /some/path HTTP/1.1
    User-Agent: Mojolicious (Perl)
    Host: api.jmacs-client.com
    Content-Length: 0
    Accept-Encoding: gzip

    Result with Mojo: 200
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Trying with LWP:
    GET https://api.jmacs-client.com/some/path
    User-Agent: libwww-perl/6.43

    Result with LWP: 403
(Note that an HTTP 200 code means a successful request.)
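For the curious, here is a minimal sketch of what a comparison script along these lines could look like. It is not the script I actually ran: the subroutine names are made up for illustration, the URL comes from the command line, and it assumes both Mojolicious and LWP (plus LWP::Protocol::https for HTTPS targets) are installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::UserAgent;
use LWP::UserAgent;

# Hypothetical helper: GET the URL with Mojo, then print the request
# as it was sent and the response code (or a note if none arrived).
sub try_mojo {
    my ($url) = @_;
    my $ua = Mojo::UserAgent->new;
    my $tx = $ua->build_tx(GET => $url);
    $tx = $ua->start($tx);
    print "Trying with Mojo:\n", $tx->req->to_string, "\n";
    my $code = $tx->res->code // 'no response';
    print "Result with Mojo: $code\n";
    return $code;
}

# Hypothetical helper: the same thing with LWP.
sub try_lwp {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get($url);
    # $res->request is the request that produced this response,
    # including the headers LWP added by default.
    print "Trying with LWP:\n", $res->request->as_string, "\n";
    print "Result with LWP: ", $res->code, "\n";
    return $res->code;
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    try_mojo($url);
    print 'x' x 50, "\n";
    try_lwp($url);
}
```

Printing the request after it has been sent matters here, since both toolkits fill in their default headers late in the process.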
All right, some obvious differences between the two requests, then. Mojo seems to attach several more headers to outbound HTTP requests, by default, than LWP does; my program specified no headers in either case. I saw that Mojo also split its reported target URL into a Host header and a path, compared to LWP’s simpler-looking GET of the full URL, and didn’t know offhand what any of that meant. And, of course, the User-Agent headers are different, with the two toolkits taking the opportunity to identify themselves by their full names.
Mojo’s additional headers were easy enough to try with LWP, so I started my experimentation there, adding the three extra headers (Host, Content-Length, and Accept-Encoding) to the LWP request and running the program again. Same result, other than seeing those new headers printed out. This made the different ways that Mojo and LWP phrased the GET line more suspicious, and I frowned; that would be harder to experiment with, requiring research into the reasons for the dissimilar displays.
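Adding those headers on the LWP side might look something like the sketch below. The URL is the same obfuscated placeholder as in the output above, and the example stops short of sending anything: it uses LWP's prepare_request method to apply the agent's default headers so the finished request can be printed and inspected.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# Placeholder URL, matching the obfuscated one in the output above.
my $url = 'https://api.jmacs-client.com/some/path';

# Build a GET request carrying the three headers Mojo sent by default.
my $req = HTTP::Request->new(
    GET => $url,
    [
        'Host'            => 'api.jmacs-client.com',
        'Content-Length'  => 0,
        'Accept-Encoding' => 'gzip',
    ],
);

my $ua = LWP::UserAgent->new;

# prepare_request() fills in the agent's default headers (including
# User-Agent) without sending anything over the network.
$ua->prepare_request($req);
print $req->as_string;

# To actually send it:
#   my $res = $ua->request($req);
#   print $res->code, "\n";
```

One caveat if you do send a request like this: a server that honors the Accept-Encoding header may reply with a compressed body, in which case the response's decoded_content method is the one to reach for.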
But before moving on to that deeper layer, and just to satisfy myself that I really had made the headers of the two requests as similar as I could, I tried the program again with the LWP request claiming a User-Agent of Mojolicious (Perl). In my experience*, the value of the User-Agent header never has any mechanical effect on server behavior; it exists merely as a way for particular clients to leave a “calling card” in server logs, if they wish: just a bit of information that might prove useful, from time to time, when reviewing logs by hand. So I expected no change; I just wanted to tidy up and make both sets of headers truly identical, for aesthetic reasons as much as any other, before continuing the investigation.
And, as you have no doubt already predicted, this change made both requests succeed.
Quite surprised and confused, I experimented with different values of User-Agent. Within a few minutes, I’d deduced that the remote server rejected all requests whose User-Agent value contained, anywhere within it, the substring libwww-perl. Removing or modifying that substring let the request succeed. This accounted for the success of the Mojo request, and for the failure of MySystem’s requests, which used a lightly edited version of LWP’s default value for this header.
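To make the deduced rule concrete, here is a small model of it in plain Perl. This is only a guess at the behavior as seen from outside, since I never saw the server's configuration: the function name is hypothetical, the match is assumed to be case-sensitive, and the second agent string is an invented stand-in for a "lightly edited" default.

```perl
use strict;
use warnings;

# Hypothetical model of the remote server's observed behavior: 403 when
# the User-Agent contains "libwww-perl" anywhere, 200 otherwise.
# Assumption: the match is case-sensitive; the real rule is unknown.
sub status_for_agent {
    my ($agent) = @_;
    return index($agent // '', 'libwww-perl') >= 0 ? 403 : 200;
}

for my $agent (
    'libwww-perl/6.43',             # LWP's default, as seen above
    'MySystem libwww-perl/6.43',    # invented "lightly edited" default
    'Mojolicious (Perl)',           # Mojo's default, as seen above
    'libwww/6.43',                  # substring modified
) {
    printf "%-28s => %d\n", $agent, status_for_agent($agent);
}
```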
It seemed, at this point, that the customer’s web server intentionally rejected with prejudice any request made by a program using the LWP toolkit, or at least any that didn’t bother to change outgoing User-Agent values to something other than the default. Since at this point I had enough information to formulate a short question, I presented the problem to the sages at the #perl channel on Freenode. And as they often do, the denizens answered accurately within moments: some corners of the cybersecurity world have recommended blocking all requests from agents identifying as libwww-perl, and some servers duly accept this advice. This stance sees this substring as a flag flown by a filthy bot, one that didn’t even bother to set a non-default name for itself. Not exactly flying the Jolly Roger, but not troubling to provide a nation (or project URL) of origin, either, and therefore deserving of suspicion to the point of summary dismissal.
A most unexpected outcome! In my whole career as a web programmer, I’d never imagined the User-Agent header employed by automated scripts for any purposes other than curiosity (when reviewing my own server logs) or amusement (when writing a program that would go mark up some other server’s log). As a younger hacker I took pleasure in always setting the agent string to something unique to my project, imagining the calling cards my software left in server logs around the world; more recently, I have seldom bothered. What a surprise, then, to at last encounter a good reason to change the value to something other than the default.
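Changing the value in LWP takes one constructor argument. The sketch below uses a made-up product name and project URL as placeholders. It also shows a quirk worth knowing about: when the agent string ends with a space, LWP appends its default libwww-perl token, which would put the blocked substring right back. Whether that is how MySystem ended up with its own agent string, I can't say from here.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical product name and project URL; substitute your own.
my $ua = LWP::UserAgent->new(
    agent => 'MySystem/1.0 (+https://example.com/mysystem)',
);
print $ua->agent, "\n";

# Caveat: if the agent string ends with a space, LWP appends its default
# "libwww-perl/#.###" token, bringing the blocked substring back.
my $appended = LWP::UserAgent->new(agent => 'MySystem/1.0 ');
print $appended->agent, "\n";
```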
Or, you know, to just carry on with my plans to retire LWP from all my Perl-based projects going forward and just use Mojo instead, since that apparently works just as well…
See also: A little jaunt through the bitwise.
* As several readers have pointed out to me, this speaks to my particular experience as one who has never needed to care much about website appearances during the start of the smartphone era, or the waning years of MSIE 6, both times when serving different content depending upon user agent was de rigueur.