User:Brooke Vibber/Dump build split
Current plan: split the dump build into four threads: one for enwiki, one for the next few largest wikis, one for a few dozen more medium-to-large wikis, and a fourth for everything else.
This should spread the run times out more evenly, make better use of the database servers, etc.
Currently going to run:
- thread 1 (enwiki) on srv31
- thread 2 (large) on benet
- thread 3 (medium) on srv31
- thread 4 (small) on benet
ZOMG
Handy splitter tool
Attempts to break up the database list into similar-sized chunks. Not totally successful. ;)
<?php
// Input: dbsizes.csv, tab-separated "<revision count>\t<database name>" lines
// with a header row. Databases are assigned in file order, so the resulting
// split depends on how the file is sorted.
$total = 0;
$counts = array();
$threads = 4;
$fudge = 1.0; // multiplier on the per-thread target; 1.0 fills up to the ideal count

foreach( file("dbsizes.csv") as $line ) {
    list( $revs, $db ) = explode( "\t", trim( $line ) );
    if( $db == "Database" ) continue; // skip the header row
    //echo "$db: $revs\n";
    $counts[] = array( "db" => $db, "revs" => intval( $revs ) );
    $total += intval( $revs );
}

$perthread = intval( $total / $threads );
echo "Total: $total\n";
echo "Desired threads: $threads\n";
echo "Ideal count per thread: $perthread\n";

// Fill each thread in turn until it reaches the target revision count
// or the database list runs out.
$assignments = array();
$dbindex = 0;
for( $i = 0; $i < $threads; $i++ ) {
    $assignments[$i] = array();
    $dbcount = 0;
    $revcount = 0;
    while( $revcount < $perthread * $fudge && $dbindex < count( $counts ) ) {
        $revcount += $counts[$dbindex]["revs"];
        $assignments[$i][] = $counts[$dbindex];
        $dbindex++;
        $dbcount++;
    }
    echo "Thread $i: $dbcount databases, $revcount revisions\n";
}

// List each thread's databases alphabetically.
foreach( $assignments as $i => $dbs ) {
    echo "\n# Thread $i\n";
    usort( $dbs, 'sortDatabases' );
    foreach( $dbs as $item ) {
        echo $item["db"] . "\n";
    }
}

// Comparison callback for usort(): order by database name.
function sortDatabases( $a, $b ) {
    return strcmp( $a["db"], $b["db"] );
}
?>
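The splitter above fills threads one after another in file order. For comparison, here is a minimal sketch of a different heuristic: take the databases largest-first and give each one to whichever thread currently has the fewest revisions. This is not what the tool does; `balanceDatabases()` and the sample wiki names and sizes are made up for illustration, and the code assumes PHP 7 or later (for `<=>` and the type hints). No assignment can help with a single database that is bigger than the ideal per-thread count on its own.

```php
<?php
// Hypothetical alternative to the in-order splitter: assign each database,
// largest first, to the thread that currently has the fewest revisions.
function balanceDatabases( array $counts, int $threads ): array {
    // Sort descending by revision count so the big wikis are placed first.
    usort( $counts, function ( $a, $b ) {
        return $b["revs"] <=> $a["revs"];
    } );

    $assignments = array_fill( 0, $threads, array() );
    $loads = array_fill( 0, $threads, 0 );

    foreach ( $counts as $item ) {
        // Least-loaded thread; array_search() returns the first key on ties.
        $target = array_search( min( $loads ), $loads );
        $assignments[$target][] = $item["db"];
        $loads[$target] += $item["revs"];
    }
    return array( "assignments" => $assignments, "loads" => $loads );
}

// Made-up sample data, not real revision counts.
$sample = array(
    array( "db" => "awiki", "revs" => 50 ),
    array( "db" => "bwiki", "revs" => 30 ),
    array( "db" => "cwiki", "revs" => 20 ),
    array( "db" => "dwiki", "revs" => 12 ),
    array( "db" => "ewiki", "revs" => 8 ),
);

$result = balanceDatabases( $sample, 2 );
foreach ( $result["loads"] as $i => $load ) {
    $dbcount = count( $result["assignments"][$i] );
    echo "Thread $i: $dbcount databases, $load revisions\n";
}
```

With the sample data, the two threads end up with 62 and 58 revisions.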
Suggested splits from the tool
Total: 113956291
Desired threads: 4
Ideal count per thread: 28489072
Thread 0: 1 databases, 48078833 revisions
Thread 1: 4 databases, 29382691 revisions
Thread 2: 40 databases, 28577478 revisions
Thread 3: 635 databases, 7917289 revisions
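As a quick check on these figures, the sketch below recomputes the ideal per-thread count from the reported total and shows how far each thread is from it. The numbers are copied from the tool output; the variable names are illustrative, and `intdiv()` assumes PHP 7 or later. Thread 0 (enwiki alone) comes out at roughly 1.7 times the ideal, while thread 3 gets well under a third of it.

```php
<?php
// Figures copied from the splitter output above.
$total = 113956291;
$threads = 4;
$loads = array( 48078833, 29382691, 28577478, 7917289 );

// Same truncating division the splitter uses for its per-thread target.
$ideal = intdiv( $total, $threads );
echo "Ideal count per thread: $ideal\n";
echo "Sum of thread loads: " . array_sum( $loads ) . "\n";

// Show each thread's load as a percentage of the ideal.
foreach ( $loads as $i => $revs ) {
    printf( "Thread %d: %d revisions, %.0f%% of ideal\n", $i, $revs, 100 * $revs / $ideal );
}
```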
Thread 0
- enwiki
Thread 1
- dewiki
- frwiki
- nlwiki
- plwiki
Thread 2
- arwiki
- bgwiki
- bgwiktionary
- cawiki
- commonswiki
- cswiki
- dawiki
- dewiktionary
- enwikibooks
- enwikinews
- enwikiquote
- enwiktionary
- eowiki
- eswiki
- etwiki
- fiwiki
- frwiktionary
- hewiki
- hrwiki
- huwiki
- idwiki
- iowiktionary
- itwiki
- ltwiki
- metawiki
- nowiki
- plwiktionary
- ptwiki
- rowiki
- ruwiki
- sep11wiki
- skwiki
- slwiki
- sourceswiki
- srwiki
- svwiki
- trwiki
- ukwiki
- viwiki
- zhwiki
Thread 3
- everything else!