Data dumps/2006 notes
This page is kept for historical interest. Any policies mentioned may be obsolete. If you want to revive the topic, you can use the talk page or start a discussion on the community forum.
Clusters
The wikis hosted in our Korean cluster will have a separate host, at http://download-yaseo.wikimedia.org/
Reporting
The backup runner script will generate some pretty HTML pages showing status as each file completes, so it should be easier to see what's done, what's in progress, and what failed.
I'm about to code up this part, shouldn't be too hard I hope. :)
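As a rough illustration of the kind of status page described above, here is a minimal sketch. It is not the backup runner's actual code: the function name `render_status`, the three status labels, and the example database name `examplewiki` are all assumptions made for this example.

```python
from html import escape

# Hypothetical status labels; the real script may use different ones.
STATUSES = ("done", "in-progress", "failed")


def render_status(dbname, date, items):
    """Render a simple HTML status page.

    items is a list of (filename, status) pairs, one per dump file.
    """
    rows = []
    for name, status in items:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        # Escape file names so unusual characters cannot break the markup.
        rows.append(f'<li class="{status}">{escape(name)}: {status}</li>')
    title = f"{escape(dbname)} dump {escape(date)}"
    return (
        f"<html><head><title>{title}</title></head><body>\n"
        f"<h1>{title}</h1>\n<ul>\n" + "\n".join(rows) + "\n</ul>\n</body></html>"
    )


page = render_status(
    "examplewiki",
    "20060125",
    [
        ("examplewiki-20060125-all-titles-in-ns0.gz", "done"),
        ("examplewiki-20060125-pages-type.xml.bz2", "in-progress"),
        ("examplewiki-20060125-abstract.xml.gz", "failed"),
    ],
)
```

The page would be regenerated each time a file finishes, so a reader always sees the current state of the run.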
File layout
This basic layout of file generation is complete in the script:
- public/
  - dbname/
    - YYYYMMDD/
      - dbname-YYYYMMDD-all-titles-in-ns0.gz
        - list of page names for BBC
      - dbname-YYYYMMDD-table.gz
        - SQL table dumps
      - dbname-YYYYMMDD-pages-type.xml.bz2
      - dbname-YYYYMMDD-pages-type.xml.7z
        - XML page text dumps
      - dbname-YYYYMMDD-abstract.xml.gz
        - page extracts for Yahoo
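The naming scheme above can be expressed as a small path-building helper. This is only a sketch of the convention, not code from the backup script: the function name `dump_path` and the database name `examplewiki` are hypothetical, and the suffix stands for whatever follows the date in the file name (for example `abstract.xml.gz`).

```python
from pathlib import PurePosixPath


def dump_path(dbname: str, date: str, suffix: str, root: str = "public") -> PurePosixPath:
    """Build root/dbname/YYYYMMDD/dbname-YYYYMMDD-suffix."""
    # The layout uses an eight-digit YYYYMMDD directory name.
    if len(date) != 8 or not date.isdigit():
        raise ValueError(f"expected a YYYYMMDD date, got {date!r}")
    return PurePosixPath(root) / dbname / date / f"{dbname}-{date}-{suffix}"


print(dump_path("examplewiki", "20060125", "abstract.xml.gz"))
# prints public/examplewiki/20060125/examplewiki-20060125-abstract.xml.gz
```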
Static URLs
There will probably also be a directory of symbolic links, giving each file a static URL that always points to its latest version. It will likely look like this:
- public/
  - dbname/
    - latest/
      - dbname-all-titles-in-ns0.gz
        - list of page names for BBC
      - dbname-table.gz
        - SQL table dumps
      - dbname-pages-type.xml.bz2
      - dbname-pages-type.xml.7z
        - XML page text dumps
      - dbname-abstract.xml.gz
        - page extracts for Yahoo
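One way the `latest/` links could be maintained is sketched below. This is an assumption about how such a step might work, not the script's actual behaviour: the function name `link_latest` is hypothetical, the links are made relative so the tree can be moved, and each link is created under a temporary name and then renamed into place so readers never see a missing file. It assumes a POSIX filesystem with symlink support.

```python
import os
from pathlib import Path


def link_latest(public_root, dbname, date):
    """Point public_root/dbname/latest/dbname-<suffix> at the files of one dated run."""
    dated = Path(public_root) / dbname / date
    latest = Path(public_root) / dbname / "latest"
    latest.mkdir(parents=True, exist_ok=True)

    prefix = f"{dbname}-{date}-"
    created = []
    for src in sorted(dated.iterdir()):
        if not src.name.startswith(prefix):
            continue  # skip anything that does not follow the naming scheme
        # Drop the date from the file name: dbname-YYYYMMDD-x becomes dbname-x.
        link = latest / f"{dbname}-{src.name[len(prefix):]}"
        tmp = link.with_name(link.name + ".tmp")
        if tmp.is_symlink() or tmp.exists():
            tmp.unlink()
        # Relative target, e.g. ../20060125/dbname-20060125-abstract.xml.gz
        os.symlink(os.path.join("..", date, src.name), tmp)
        # Renaming over the old link replaces it in a single step.
        os.replace(tmp, link)
        created.append(link)
    return created
```

Calling `link_latest("public", "examplewiki", "20060125")` after a run completes would repoint every link in `public/examplewiki/latest/` at that run's files.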
Images/uploads
Not yet included; this may change in the near future.