WikiCharts

From Meta, a Wikimedia project coordination wiki
Jump to: navigation, search

WikiCharts was a tool written by User:LeonWeber (and User:Duesentrieb) for finding the most popular (most viewed) pages on a wiki. This was done using javascript (source code) for relaying the name of the viewed page to the toolserver. The script is only executed with a probability of 1/counter_factor, to avoid overloading the toolserver. The data gathered using the javascript code is collected into a log file, IP addresses are not recorded. An entry of the log looks something like this:

[25/Aug/2006:10:54:35 +0000] "GET /index.png?ns=User&title=Anneke%20Wolf&factor=601&wiki=dewiki HTTP/1.1"

On the toolserver (Zedler) I have an awk script which inserts the data from the log into a mysql database every hour.

The DB schema:

+----------+----------+---------------+------------+---------------+
| stats_id | stats_ns | stats_title   | stats_hits | stats_samples |
+----------+----------+---------------+------------+---------------+
|   177616 | User     | Anneke%20Wolf |        601 |             1 |
+----------+----------+---------------+------------+---------------+

stats_hits is the number of occurrences in the log multiplied by the factor, stats_samples is the number of occurrences in the log.

The insert script:

<?
$wikis = array(
        'dewiki' => '[1-9][0-9][0-9]',
        'enwiktionary' => '[1-9][0-9]',
        'enwikiversity' => '[1-9]'
        );
foreach( $wikis as $wiki => $factor_regexp )
{
        $cmd = "/usr/local/bin/gawk 
        'BEGIN { print 
                \"prepare pageviews_$wiki from \\\"INSERT INTO u_leon.wikicharts_cur_$wiki 
                SET stats_ns = ?, stats_title = ?, stats_hits = ?, stats_samples = 1 
                ON DUPLICATE KEY  UPDATE stats_hits=stats_hits+?, stats_samples=stats_samples+?;\\\";\\n\"
        }
        /\/index.png\?ns=.*\&title=.*\&factor=$factor_regexp\&wiki=$wiki/
        {a[$4]++}END{for(i in a) { 
                x=i;sub(/^.*\/index.png\?ns=/,\"\", x); 
                b=x; sub(/^.*&title=/,\"\", b);
                y=x;sub(/^.*&factor=/,\"\", y);
                sub(/&wiki=.*/, \"\", y);
                sub(/&title=.*/,\"\",x);
                sub(/&factor=.*/, \"\",b);
                sub(/\?.*/, \"\", x);
                gsub( \"/\\\"/\", \"%22\", x);
                gsub( \"\\\\\", \"%5C\", x);
                gsub( \"/\\\"/\", \"%22\", b);
                gsub( \"\\\\\", \"%5C\", b);
                print \"set @title = \\\"\"b\"\\\"; set @ns = \\\"\"x\"\\\";
                        set @hits = \"a[i]*y\"; set @factor = \"y\"; set @samples = \"a[i]\";
                        execute pageviews_$wiki using @ns, @title, @factor, @hits, @samples;\" 
        } }' /tmp/pgcount/access.log";
        $cmd .= " | /usr/local/mysql/bin/mysql";
        echo shell_exec( $cmd );
}
        $cmd = "cat /tmp/pgcount/access.log >> /home/leon/access.log; echo '' > /tmp/pgcount/access.log; echo '' > /tmp/pgcount/error.log;";
        echo shell_exec( $cmd );
?>

This works, it's fast and I understand it :)

The web page is created by a PHP-script, its source could be viewed there: [1].

I can enable this tool for any wiki that requests it. All you need is an admin on that wiki and about ten minutes on IRC for us to determine the right setup.

The live source[edit]

... could be viewed there: [2]

Current successors[edit]