Andrew Gray (shimgray) wrote,

Wikipedia pageviews

This is, perhaps, the geekiest thing I have ever posted to LJ. You may wish to look away now. There are graphs and pageview-based psephology.

Recently, someone hacked together something which sits on top of the Wikimedia servers and counts the number of pageviews to given pages. This is a lot more revolutionary than it sounds - yes, hit counters are stone-age tech, but previous attempts to implement one had foundered on a complete lack of resources and log-processing capability.

(Top-ten website, run by three guys and a lot of string. Go give money or it'll vanish. I digress.)

Anyhow, it's here. There are two interesting things to bring out, at a first pass:

Overall popularity.

A top-articles list, covering three weeks in February, is here. Slightly under 10,000 articles, cutoff at an average of one hit/minute.
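For a sense of scale, the cutoff is easy to turn into a raw number. This is only a back-of-envelope sketch: it assumes "three weeks" means exactly 21 days, and the function name is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60

def cutoff_views(days, hits_per_minute=1.0):
    """Total views needed to average `hits_per_minute` over `days` days."""
    return int(days * MINUTES_PER_DAY * hits_per_minute)

# Assumption: the three-week window is exactly 21 days.
print(cutoff_views(21))  # 30240
```

So an article needed roughly 30,000 views over the period to make the list.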

The top entries are mostly obvious (the main page) or oddities; of the oddities, 'wiki' & 'wikipedia' are rather significant, because they remind us just how many people use search engines as proxies for remembering a site address. Search, click first entry, go from there. Valentine's Day & Chinese New Year are both obviously "current events" - this is, after all, in February. 'Canine reproduction' is probably a data glitch.

After that, we settle down into a steady diet of sex, pop culture, and Very Basic Reference Material. This is what the Internet looks like when distilled down, people.

The ones that leap out at me are #7 and #9 - Barack Obama and John McCain. Hillary Clinton is, when we look hard, at #77, with about a quarter to a third of the pageviews of her rival Obama - it's hard to be precise because of the way the stats treat alternate headings. What does that tell us? I really don't know; the two day-by-day graphs look pretty much the same (Obama; Clinton) and both articles have about the same number of inbound links (~2000 to Clinton, ~2500 to Obama).

The articles on their campaigns are vastly different - 125k Obama to 4k Clinton! - whilst it's merely a 2:1 ratio for the articles on their political positions (Obama; Clinton). I guess if it indicates anything, it's that more people will happily admit to themselves they have no idea who Obama is, but they seem to feel confident they already know plenty about Clinton...

These figures roughly hold back to mid-December, before which we don't have the data. Anyone want to trace the rise and fall of the R. candidates through the same metrics?

Time distribution.

We saw above, with the Clinton/Obama graphs, a sharp spike on Super Tuesday, a general weekly cycle, and the odd news blip.

If we take an entirely mundane article, we can see the weekly pattern quite clearly - silver; pig. When we look at "pop culture" topics, they're either evened out (Roswell UFO incident), or show the inverse effect, spiking at the weekends (Sims 2; Disney Channel). The explanation for that is pretty clear - people at work look for different things from people who aren't. We can also get periodicity linked to other things; television schedules, for example (Torchwood, peaking very sharply every Thursday).
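If you wanted to put a number on that weekly pattern rather than eyeball a graph, one rough approach is to compare average weekday traffic with average weekend traffic. A minimal sketch, assuming you have a plain list of daily view counts and know the date of the first one; the function name and the sample figures are invented for illustration and are not taken from the real stats.

```python
from datetime import date, timedelta
from statistics import mean

def weekday_weekend_ratio(start, daily_views):
    """Mean Mon-Fri views divided by mean Sat-Sun views.

    Above 1 suggests a "work" topic, below 1 a "leisure" topic.
    Needs at least one weekday and one weekend day in the data.
    """
    weekday, weekend = [], []
    for offset, views in enumerate(daily_views):
        day = start + timedelta(days=offset)
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        (weekend if day.weekday() >= 5 else weekday).append(views)
    return mean(weekday) / mean(weekend)

# Made-up data: two weeks starting on a Monday, busy during the week.
sample = ([1000] * 5 + [500] * 2) * 2
print(weekday_weekend_ratio(date(2008, 2, 4), sample))  # 2.0
```

This won't catch something like the Thursday peak, which would need a per-day-of-week breakdown instead of a two-way split.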

And, perhaps most interestingly, the normal cycle can be swamped by directed traffic on the wiki itself. Let's look at the daily front-page articles - on Feb 11th, it was Peru; on Feb 12th, Constitution of Belarus; on Feb 13th, Irish phonology.

The first one shows both the normal weekly cycle and a very strong spike - 37k rather than the expected 7k - and then a couple of mildly elevated days after that before it returns to normal. The second shows virtually no traffic, then 18k views, three moderately elevated days, then back to almost nothing; the third is much the same, except with a peak of 28k.
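The size of the Peru spike relative to a normal day is simple arithmetic, sketched here using only the two figures quoted above (the function name is made up). The other two articles have no usable baseline figure, so the same ratio can't sensibly be computed for them.

```python
def spike_factor(observed, baseline):
    """How many times the expected daily traffic was actually seen."""
    return observed / baseline

# Peru on Feb 11th: about 37k views against an expected 7k or so.
print(round(spike_factor(37_000, 7_000), 1))  # 5.3
```

In other words, a day on the front page was worth a bit over five normal days of traffic.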

I am going to have to fight the urge to spend hours trying to tweak interesting stuff out of these numbers. If I do succumb, I'll try to spare you it.