Bise remains in the very-alpha phase of itch-scratching personal software projects where I watch, over a period of weeks, to see how it really wants to be used. I’ve a couple of interesting shifts in relevant project philosophy to report and discuss since my initial release announcement from earlier this month.
In a social chatroom recently, I wondered out loud how AWStats, the high-level web-traffic analyzer I’ve used for a long time, arrives at its table of user visit-times. Each row of this table reports the number of remote users who visited the analyzed website for a given length of time: less than a minute, between one and five minutes, and so on. Since AWStats uses webserver access logs as its sole input source, and these logs’ entries record remote users’ activity only in terms of the resources they request, I found it entirely unclear how it claims to know how long a user stuck around reading or otherwise absorbing the requested media.
A colleague wiser than me revealed that AWStats simply measures visit-length from a given IP based on the bounding times around a discrete cluster of requests from that IP. So if the logs show that a certain visitor arrived and actively clicked around for 30 minutes (perhaps with a few minutes passing between clicks), AWStats considers that a half-hour visit. If another person stops by the front page and makes no further requests, then AWStats marks that as a five-second visit — regardless of whether the visitor really did bounce elsewhere immediately, or whether they lingered an hour to leisurely read the page’s text before moving on.
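To make that rule concrete, here is a rough Python sketch of the clustering idea as my colleague described it. It is not AWStats’s own code: the function name, the input shape (pairs of IP and timestamp), and the 30-minute idle gap that closes a visit are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed idle gap that separates one visit from the next.
# This value is illustrative, not AWStats's actual setting.
SESSION_GAP = timedelta(minutes=30)


def visit_lengths(requests, gap=SESSION_GAP):
    """Group (ip, timestamp) pairs into visits and return (ip, duration) pairs.

    A visit is a run of requests from one IP in which no two consecutive
    requests are more than `gap` apart. Its duration is the time between
    the first and last request of that run.
    """
    by_ip = defaultdict(list)
    for ip, ts in requests:
        by_ip[ip].append(ts)

    visits = []
    for ip, stamps in by_ip.items():
        stamps.sort()
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > gap:
                # Too long since the last request: close the current visit.
                visits.append((ip, prev - start))
                start = ts
            prev = ts
        visits.append((ip, prev - start))
    return visits


t0 = datetime(2018, 1, 1, 12, 0)
log = [
    ("203.0.113.5", t0),                          # clicks around for half an hour
    ("203.0.113.5", t0 + timedelta(minutes=10)),
    ("203.0.113.5", t0 + timedelta(minutes=30)),
    ("198.51.100.7", t0 + timedelta(minutes=2)),  # a single front-page request
]
for ip, length in visit_lengths(log):
    print(ip, length)
```

In this sketch the lone front-page request comes out as a zero-length visit, which lands in the shortest bucket no matter how long the person actually spent reading.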
This strikes me at first blush as quite misleading, but on further reflection I can concede its greater utility for websites other than text-heavy blogs. More to the point, all the sites I design and maintain professionally present the user with goal-driven activity-flows, where they visit in order to accomplish some specific task: buying a ticket for a cruise, say, or viewing a list of one’s past orders. For these websites, we can quite reasonably expect that a typical user wouldn’t linger for a long time on any one page, and thus we can consider them to have departed soon after their final request of the server, just as AWStats does.
In the same conversation, my friend offered the observation that measuring visits by IP address — the technique used both by AWStats and Bise — unavoidably invites misinterpretation no matter how one reads it. Today more than ever, a single human user is likely as not to come at a given website from a multitude of IP addresses. An individual may have a home IP address and an office address, and either might change by way of DHCP, giving us a bunch of IPs already — but now we must also account for one or more mobile IPs, depending upon any number of factors (not at all limited to the number of mobile devices that person might use). I must concede that this fact discourages the simple and naive interpretation of a set of visits from a single IP address as representing exactly one human reader of my blog.
But I also feel that this does not therefore render Bise’s output useless. This interpretive thinking did remind me of a facet of Bise’s design philosophy that I had apparently forgotten by the time I wrote its documentation, such that it gets no mention there (an oversight I plan to correct presently). Namely, while Bise does do its best given its input (again, plain old logfiles) to estimate the number of unique regular visitors a website has, one should read the numbers it offers as a score, more than a definitive turnstile-clicker person-count. It’s a slightly abstracted number, based on but not transparently indicative of objective reality. The number’s size suggests the size of your audience, in terms of about how many extra chairs you’d want to set up should you expect them all to visit your house at once. The number’s change over time reflects the growth of your readership, with rate and degree of change both represented.
Thinking of Bise’s output-numbers in this way also leads me to conclude that the program works best when you set its regular_interval_days number to match your blog’s average time between new posts, measured in days, rather than the naive single-day default that I’d initially thought fit most purposes.
This configuration-file setting tells Bise the minimum number of days that should elapse between a given IP address’s earliest and most recent visit (within the weeks-long time frame of recent log entries that Bise considers) in order to count that IP as a “regular” visitor. In Bise’s first release, I set this to 1, meaning that any IP that hit the site and then hit it again at least 24 hours later would increment the blog’s “score” by 1.
But for Fogknife, to which I post around once per week, I’ve increased this setting to 7. So now, in order to tick the score up, a given IP address must return to the website at least a full week after its earliest known visit within Bise’s consideration window. As of this month, this reduces the “All visitors - regular” row in my Bise output table from 260 to 175. (That counts around 110 feed-readers, with the remaining 60 regulars split between those who stop by the front page weekly to see what’s new, and those who reload specific internal pages for reasons I can only speculate about.)
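The rule behind that drop can be sketched in a few lines of Python. This is not Bise’s actual code, only an illustration of the rule as described above: the function name and the input shape (pairs of IP and timestamp, already limited to the consideration window) are hypothetical, and the sample data is made up.

```python
from datetime import datetime, timedelta


def count_regulars(visits, regular_interval_days):
    """Count IPs whose earliest and latest visits are at least N days apart.

    `visits` is an iterable of (ip, timestamp) pairs, assumed to be already
    restricted to the window of recent log entries under consideration.
    """
    first_seen = {}
    last_seen = {}
    for ip, ts in visits:
        if ip not in first_seen or ts < first_seen[ip]:
            first_seen[ip] = ts
        if ip not in last_seen or ts > last_seen[ip]:
            last_seen[ip] = ts

    threshold = timedelta(days=regular_interval_days)
    return sum(
        1 for ip in first_seen
        if last_seen[ip] - first_seen[ip] >= threshold
    )


day0 = datetime(2018, 1, 1, 9, 0)
sample = [
    ("203.0.113.5", day0),                       # returns the next day
    ("203.0.113.5", day0 + timedelta(days=1)),
    ("198.51.100.7", day0),                      # returns eight days later
    ("198.51.100.7", day0 + timedelta(days=8)),
    ("192.0.2.44", day0),                        # never returns
]
print(count_regulars(sample, 1))  # two IPs qualify with a one-day interval
print(count_regulars(sample, 7))  # only one qualifies with a seven-day interval
```

The next-day returnee counts as a regular under the old default but not under the weekly setting, which is the kind of visitor that falls out of the score when the interval goes from 1 to 7.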
Envisioning my audience here as a cozy lecture hall with a bit fewer than a couple hundred seats set up feels right for a modest blog like this. I find myself quite willing to count any IP address that goes to the trouble of visiting twice across seven days as a reader, or at least a sufficiently readerlike entity. Moreover, this certainly seems more meaningful than the thousands of unique IPs that AWStats reports as monthly visitors. That is the raw-traffic figure I referenced at the end of my 2017 project-review post, and writing it up probably helped needle me into creating Bise. The pride of reporting that puffed-up number was followed almost immediately by suspicion that it hid a deeper, and rather more humble, truth.