I thought I would be able to write a script that would parse my own Apache logs and output a daily summary of this blog’s subscriber count. Unfortunately, a few hours of research today suggests that this does not seem feasible, and I find myself rather discouraged at the state of bloggers’ ability to self-report right now.

Initially, discovering this old bash script by Marco Arment heartened me; while out of date (leaning heavily on an assumption of Google Reader still existing, which it doesn’t), it still performed operations in line with my own observations from skimming my logs. So, I set out to write a simple Perl program that would do something similar with 2015-style, post-Google Reader data.

Very soon, I discovered that several aggregators’ feed fetchers, most notably Feedly’s, do not follow the Google Reader-style convention of placing an “N subscribers” substring into their user-agent identifiers when making HTTP requests, as more community-minded aggregators (e.g. NewsBlur and Feed Wrangler) do. Here’s a (truncated) example of NewsBlur’s fetcher visiting my site this morning:

blog.jmac.org:80 192.34.62.14 - - [15/Feb/2015:05:39:14 -0500] "GET /atom.xml HTTP/1.1" 304 - "-" "NewsBlur Feed Fetcher - 3 subscribers - http://www.newsblur.com/site/5827127/jmacorg-blog"

And here is a subsequent request from Feedly:

blog.jmac.org:80 65.19.138.33 - - [15/Feb/2015:05:44:11 -0500] "GET /atom.xml HTTP/1.1" 304 - "-" "Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)"

You can see the difference. NewsBlur tells me not only how many human subscribers it represented at the time of its visit (three of ‘em), but also shares the precise URI representing this blog’s ID within its own service. Feedly, on the other hand, only links to its own FAQ webpage.
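Pulling that number out of a log line is the easy part. Here is a minimal Python sketch of the idea, assuming the user-agent is the last double-quoted field on the line, as it is in the two examples above. The function and constant names are my own inventions for illustration, not taken from Marco Arment's script.

```python
import re

# Assumption: the user-agent is the last double-quoted field of the log line.
QUOTED_FIELD = re.compile(r'"([^"]*)"')
# The Google Reader-style convention: "<number> subscriber(s)" in the agent.
SUBSCRIBER_COUNT = re.compile(r"\b(\d+) subscribers?\b")

NEWSBLUR_LINE = (
    'blog.jmac.org:80 192.34.62.14 - - [15/Feb/2015:05:39:14 -0500] '
    '"GET /atom.xml HTTP/1.1" 304 - "-" '
    '"NewsBlur Feed Fetcher - 3 subscribers - '
    'http://www.newsblur.com/site/5827127/jmacorg-blog"'
)
FEEDLY_LINE = (
    'blog.jmac.org:80 65.19.138.33 - - [15/Feb/2015:05:44:11 -0500] '
    '"GET /atom.xml HTTP/1.1" 304 - "-" '
    '"Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)"'
)


def subscriber_count(log_line):
    """Return the subscriber count a fetcher reports, or None if it reports none."""
    fields = QUOTED_FIELD.findall(log_line)
    if not fields:
        return None
    match = SUBSCRIBER_COUNT.search(fields[-1])
    return int(match.group(1)) if match else None


print(subscriber_count(NEWSBLUR_LINE))  # 3
print(subscriber_count(FEEDLY_LINE))    # None
```

The second result is the whole problem in miniature: nothing in Feedly's user-agent gives the parser anything to count.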

Feedly’s log-line also states “like FeedFetcher-Google”, as if the thing it tries to reassuringly compare itself to hasn’t been offline for well over a year. Writing this now, I find it hard not to read this detail as a further sign of basic disregard for community standards on Feedly’s part. Sadly, I suspect they can afford to not care; circumstantial evidence suggests that they scored big from Google Reader’s demise. I know from personal experience that Feedly offered an attractive alternative for RSS-reading refugees in 2013, and various apps such as Reeder and NewsBlur made migration to Feedly very easy.

To be clear, I state here that I certainly cannot blame anyone for using Feedly, because I use it myself, every day. I just haven’t had any reason to look at it from a writer’s standpoint, rather than a reader’s, before today. I welcome visits from Feedly’s fetcher, even as I feel disappointed in this particular behavior.

As this 2013 article by Alex Knight of FeedPress notes, Feedly has been ignoring this ad-hoc but quite beneficial standard for years, and also ignores any requests to change its behavior. (That article also puts Digg, another frequent and equally welcome visitor to my blog, into the same camp.) After I read Feedly’s FAQ, I started to write to their customer support email address to ask if they had any plans to make their user-agent strings friendlier, but the spirit left me before I could continue; it seemed like a waste of time. I wrote this blog post instead.

So, here are my choices for getting accurate blog reader counts:

  • Go ahead and write my script, counting only what I can count and saying ¯\_(ツ)_/¯ to the others, even knowing that Feedly alone probably accounts for a significant proportion of my readership.
  • Write an extended version of my script that combines log analysis with horrible screen-scraping of subscriber data from the websites of Feedly and Digg et al, which would mean digging through oceans of weird JavaScript hacks on their end while knowing that they might change any of it at any time.
  • As above, but learn Feedly’s baroque developer API, which does offer subscription counts for feeds — but this would involve registering my hinky little script as an app with the service, and handling all their authentication stuff somehow, and yeah, no (and I don’t even want to look at Digg’s API).
  • Subscribe to a paid service like FeedPress to handle my RSS feeds for me, requiring both a recurring cost and significant reconfiguration hassle, in exchange for accurate subscriber counts — all because some other random service wants to be coy about writing this number into outgoing HTTP headers.
  • Look, just forget it, it’s not that important.
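For the record, the first option would not take much code. The following is a rough Python sketch under several assumptions of mine: the feed lives at /atom.xml (as in the log lines above), each fetcher's most recent report on a given day is the one to trust, and two requests come from the same fetcher if their user-agents match once the number is masked out. All names here are hypothetical, and the third sample line is made up for the demonstration.

```python
import re
from collections import defaultdict

DAY = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")          # e.g. 15/Feb/2015
USER_AGENT = re.compile(r'"([^"]*)"\s*$')            # last quoted field
COUNT = re.compile(r"\b(\d+) subscribers?\b")
FEED_REQUEST = '"GET /atom.xml '                     # assumption: my feed path


def daily_summary(lines):
    """Map each day to (countable subscribers, fetchers that report nothing)."""
    reported = defaultdict(dict)  # day -> {masked user-agent: latest count}
    silent = defaultdict(set)     # day -> user-agents with no count at all
    for line in lines:
        day, agent = DAY.search(line), USER_AGENT.search(line)
        if FEED_REQUEST not in line or not (day and agent):
            continue
        day, agent = day.group(1), agent.group(1)
        count = COUNT.search(agent)
        if count:
            # Mask the number so repeat visits share one key; later lines win.
            key = COUNT.sub("N subscribers", agent)
            reported[day][key] = int(count.group(1))
        else:
            silent[day].add(agent)
    days = set(reported) | set(silent)
    return {d: (sum(reported[d].values()), sorted(silent[d])) for d in days}


SAMPLE_LOG = [
    'blog.jmac.org:80 192.34.62.14 - - [15/Feb/2015:05:39:14 -0500] '
    '"GET /atom.xml HTTP/1.1" 304 - "-" "NewsBlur Feed Fetcher - 3 subscribers '
    '- http://www.newsblur.com/site/5827127/jmacorg-blog"',
    'blog.jmac.org:80 65.19.138.33 - - [15/Feb/2015:05:44:11 -0500] '
    '"GET /atom.xml HTTP/1.1" 304 - "-" "Feedly/1.0 '
    '(+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)"',
    # Made-up later visit, to show that the latest report replaces the earlier one.
    'blog.jmac.org:80 192.34.62.14 - - [15/Feb/2015:09:00:00 -0500] '
    '"GET /atom.xml HTTP/1.1" 304 - "-" "NewsBlur Feed Fetcher - 4 subscribers '
    '- http://www.newsblur.com/site/5827127/jmacorg-blog"',
]

for day, (total, uncounted) in daily_summary(SAMPLE_LOG).items():
    print(day, total, "countable;", len(uncounted), "fetcher(s) not saying")
```

The list of silent fetchers is the ¯\_(ツ)_/¯ part: it shows who visited without saying how many readers they stand for, which is exactly the gap this option cannot close.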

I feel a bit sad about the whole situation and would rather just remain ignorant for the time being, instead of sinking any more time, money or effort into this problem. While I would love to know exactly how many readers I have, the fact remains that I tell myself I primarily write for myself, and here I suppose I have an opportunity to act like I mean it.

