User talk:Shaihulud

Hello, someone told me that you are creating the SQL dumps using scripts. I'd like to ask whether it would be possible to improve this and produce smaller dumps with a couple of techniques. The first idea is to create diffs between the uncompressed files, so that one only has to fetch the diff and patch the previous table with it. This may require some changes to the SQL format (I am not sure yet), but the diff should be much smaller than the complete table. The second idea is to sort the old table first by title, then by namespace, and only third by id, instead of just by id; that way similar texts (different versions of an article) end up close to each other, which should improve the compression ratio considerably (a rough sketch of this follows below). --Koethnig 20:49, 12 Oct 2004 (UTC)
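
A rough sketch of the second idea, assuming the old-table rows have already been parsed out of the SQL dump into (id, namespace, title, text) tuples; the tuple layout, the parsing step and the toy data are only illustrative, not the real dump format. Sorting by (title, namespace, id) puts all versions of one article next to each other before compression:

 import bz2
 
 def sort_rows_for_compression(rows):
     """Sort by title, then namespace, then id, so that successive
     versions of the same article sit next to each other."""
     return sorted(rows, key=lambda r: (r[2], r[1], r[0]))
 
 def compressed_size(rows):
     """Serialize the rows naively and return the bzip2-compressed size in bytes."""
     blob = "\n".join("\t".join(str(field) for field in row) for row in rows)
     return len(bz2.compress(blob.encode("utf-8")))
 
 if __name__ == "__main__":
     # Toy rows: (id, namespace, title, text); the two versions of "Apple"
     # are far apart in id order but adjacent after re-sorting.
     rows = [
         (1, 0, "Apple", "An apple is the fruit of the apple tree."),
         (2, 0, "Banana", "A banana is an elongated, edible fruit."),
         (3, 0, "Apple", "An apple is the round fruit of the apple tree."),
     ]
     print("sorted by id:          ", compressed_size(sorted(rows)))
     print("sorted by title/ns/id: ", compressed_size(sort_rows_for_compression(rows)))

The toy data is far too small to show a gain; the benefit would only appear on real old-table data, where the compressor sees long runs of nearly identical article texts back to back.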

Hello again. I started coding and played around a bit with some old and cur tables. The idea of creating diffs that can be applied directly as updates seems to be a bad one. For example, on the German cur tables it shrinks the data to no better than about 1/3 of the original size, since more than 1/3 of all articles are modified (even within 3 days). For the old tables the factor should be much better, and I believe that with other techniques the factor for the cur tables could be pushed to about 1/30, while the diff for the old tables would be at most twice the size of the diff for cur. But first I will try to write a tool which reduces the size of the old table in general, without creating diffs. So I need to write one tool which shrinks the dump and another which rebuilds it (see the sketch below). --Koethnig 01:29, 16 Oct 2004 (UTC)
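
A minimal sketch of that reduce/rebuild tool pair, assuming the old-table revisions come in as (title, id, text) records; the record layout is hypothetical and only stands in for whatever the real dump parser would produce. The reducer keeps the first version of each article in full and stores every later version as a delta against its predecessor; the rebuilder re-expands the deltas:

 import difflib
 
 def reduce_revisions(revisions):
     """Keep the first revision of each title in full and replace every later
     revision by an ndiff delta against the previous revision of that title."""
     previous = {}                                    # title -> last full text seen
     reduced = []
     for title, rev_id, text in sorted(revisions):    # sorting groups revisions per title
         if title not in previous:
             reduced.append((title, rev_id, "full", text))
         else:
             delta = list(difflib.ndiff(previous[title].splitlines(keepends=True),
                                        text.splitlines(keepends=True)))
             reduced.append((title, rev_id, "delta", delta))
         previous[title] = text
     return reduced
 
 def rebuild_revisions(reduced):
     """Invert reduce_revisions(): re-expand every delta back to the full text."""
     rebuilt = []
     for title, rev_id, kind, payload in reduced:
         if kind == "full":
             text = payload
         else:
             text = "".join(difflib.restore(payload, 2))  # 2 = the "new" side of the diff
         rebuilt.append((title, rev_id, text))
     return rebuilt

The ndiff format is verbose (it carries both sides of the comparison, which is also why difflib.restore can invert it without needing the previous text), so a real tool would swap in a compact delta encoding and stream the dump instead of holding it in memory, but the reduce/rebuild round trip would have the same shape.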