Requests for dumps

From Meta, a Wikimedia project coordination wiki
Ask on this page for dumps that you need but that are not available from download.wikimedia.org, or that you cannot reasonably download from that site because of bandwidth limitations (on your end or on Wikimedia's end). Please specify which wikis you want dumps for and what data you want in the dump. If you want dumps shipped to you on some type of media, please note the kind of media, and provide a way for someone to get in touch with you to coordinate shipping (you will have to pay for materials and shipping).

If you want an unusual set of data in a dump, please provide the query necessary to complete your request - it'll be slower if the developer concerned has to write the query.

Note: No one seems to watch this page. The odds of you even getting a reply are slim to none. Please follow the XML Data Dumps mailing list, by reading the archives or subscribing, for up-to-date news about the dumps; you can also make inquiries about them there.


Static HTML of small Wikimedia wikis

Out-of-the-box viewing of the Wikibooks database is a bit limited at the moment. How about a static HTML dump for these and other small but valuable wikis?

Dewiki request

Would it be possible to get the images for dewiki as a tarball? This is for a DVD distribution project and would be a one-off.

Request for a dump of the revision table of the SQL database

Hey, I'd love to get a dump of the revision table. It would obviously be much more manageable than the full history, and it would make it easier to count how many edits have been made to each page. Thanks! Please write on my talk page if this is done!

--Monk (this is a link to my talk page)

Dump of zhwiki

Hi, guys,

The latest dump of zhwiki is from June 25, 2008, almost four months ago. Could you start a new dump of zhwiki?

Thanks!

Hi, it has been four months since the last dump of zhwiki. I hope the next dump can start soon. Thanks!

Alternative fuels

I am creating a wiki on alternative fuels and would like a dump of all alternative-fuel data. Thank you for your help :) You can contact me at wiki@e85wikipedia.com --Jbyrd843 06:00, 8 October 2008 (UTC)

Enwiki with edit history, preferably an earlier version

Hello, we are doing research on the evolutionary nature of Wikipedia from a computational perspective. Would it be possible to get a version of Wikipedia with the edit history up to that point? An early version from 2003 or 2004 would be superb, since its smaller size would make it easier to work with, but other versions would do as well. The copy would need to include the edit history, all internal links, and category information. Of course, I would be ready to cover the expenses of making the copy and sending it. Please contact me (chenli@uiuc.edu). Thanks a lot.

How to dump my own wiki database so that WikiTaxi can read it?

Could you give me instructions for dumping my personal wiki database so that I can load it into the WikiTaxi application? I need to provide the xxxx_pages_articles.xml.bz2 file, and I don't know exactly how to do this (with dumpBackup.php? Which option?) Thank you for your help. (guenola@newaccess.ch)


Hi,

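try (from the wiki's maintenance directory):
php dumpBackup.php --current > /tmp/dump.xml
bzip2 /tmp/dump.xml

This leaves /tmp/dump.xml.bz2. --current exports only the latest revision of each page, which is closer to what a pages-articles file contains; --full also exports every older revision.

You may have to set up AdminSettings.php first (it can be created from AdminSettings.sample).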

But I don't know how to add pictures and other files...


Frank

The latest full enwiki dump (01/03/08) has gone 404?

What happened to the last full enwiki dump? About a week ago it was available here: [1]. Does anyone know if it's coming back? --Andicat 12:22, 28 May 2008 (UTC)

English Wikipedia: Page dump with edit history for one category

Would it be possible to request a page dump of English Wikipedia's Society and Social Sciences categories with the complete edit histories but with.

The download bandwidth is not an issue.

Thanks.

Dump of English Wikipedia

I was wondering if anyone could provide the following download on a DVD: pages-articles.xml.bz2 (3.2 GB).

Of course, I'm willing to pay for the postage, media and handling. Contact me at kwhartmaNOSPAM@artsci.wustl.edu. I'm on Wikipedia as OglaHai as well.

Thanks. 4.244.144.29 00:54, 14 February 2008 (UTC)


The enwiki dump of 4/25/08 failed. The last one available is from 3/15/08. Does anybody know if another attempt will be occurring soon? Is there a way of getting incremental updates so that the one from 3/15/08 can be augmented? Thank you very much.

Dump of wikiprotein.org

Please make a dump of wikiprotein.org. unsigned comment from 155.250.129.24

Hi,
wikiprotein.org isn't a Wikimedia project.
Try finding the best contact on that wiki, or mail omegawiki <at> gmail.com.
--Dereckson 15:58, 10 December 2007 (UTC)
See also http://www.wikiproteins.org/Development. A database dump of the Community content (based on GEMET data) is generated daily. The SwissProt and UMLS datasets are not dumped, but I believe this is a licensing issue rather than a technical one. László 12:05, 5 March 2008 (UTC)

Dump of ru.wikipedia.org images

Please make a dump (if possible, split into parts smaller than 300 MB) of the images (the actual images, not just their descriptions) used by ru.wikipedia.org. This is needed for several wiki projects in Russian.

Thanks!


Dump for Wiktionary

Is it possible to have a dump of just the page content and meta tags? To clarify: I want a MySQL or XML dump containing only page content with meta tags, and nothing else. I want to be able to add to those meta tags.

Thanks

Dump of WikiSpecies

I'm interested in getting a dump of species with taxonomic and vernacular names. Is there any way to cross-reference it with Wikipedia articles on specific species? I'd really like to have the conservation status of each species, but that doesn't seem to appear in WikiSpecies.



Dump of Romanian Wikipedia articles

I am a natural language processing researcher and need access to a dump of the Romanian Wikipedia. I would prefer only the political articles, but the entire set of Romanian articles would do as well. The last dump had 25 errors, and the file pages-articles.xml.bz2 was not created. I would greatly appreciate your support. 11:00, 16 February 2006

Full history dump of all languages (from late 2006)

Full-history dumps of all languages are large, and some are too large for the dumps to complete successfully. The last working full-history dump of en:wp is currently from mid-August (another is underway; let's hope this one completes without trouble...). I would like a copy of the latest full-history dump of each language of each project on a hard disk, along with a list of the date of each dump. If you have such data (the dates don't have to be the same for each dump) and want to help, let me know and I will send you a drive and return packaging. See also below. +sj | help with translation |+ 03:37, 1 November 2006 (UTC)

Indeed; there is no working pages-meta-history.xml.bz2 in either of the dates available: one failed, one is still in progress. I suggest that the dumping system be improved to not delete a completed dump until a new, successful dump exists to take its place, to prevent this situation happening.--152.78.61.53 10:32, 2 May 2007 (UTC)

Image dumps (from November 2005)

I would like image dumps with metadata, for instance the Wikimedia image dumps from November 2005 from each project and from Commons. More recent dumps would be excellent, but I don't know whether any have been made.

The Commons dump alone is 290 GB. It will be cheaper, faster, and less of a strain on the dumps server for me to ship a 500 GB disk to someone who could copy the dump. If you have access to a dump and could make me a copy, please get in touch by email to work out the details. +sj | help with translation |+ 03:20, 1 November 2006 (UTC)

Make random-subset dumps available from download.wikimedia.org

Many researchers just want a randomized subset of articles, or users, or revisions to run tests and experiments and programs on. At the moment, they either have to screenscrape what they want, or download the entire dump of the project they are interested in -- for en:wp, this is difficult to do regularly or efficiently.

Please consider adding subsets of en:wp and other wikis to the default collection of dumps: for instance, all revisions and users and metadata associated with 1000 random articles; or all metadata and revisions of articles touched by 1000 random users, along with their userspace pages. Here randomness could mean almost-even distribution across the set of all users with at least one edit, or articles that are not blank or redirects. Deciding how to randomize and what to select is not trivial; if you are interested in doing this, please ping the wiki-research-l list for more specific details on what good random subsets would include. +sj | help with translation |+
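One way to draw such a subset in a single pass over a dump, without knowing in advance how many pages it holds, is reservoir sampling. The Python sketch below uses the classic Algorithm R; the `reservoir_sample` name and the `(title, is_redirect, text_length)` records in the demo are illustrative assumptions, not part of any dump format or Wikimedia tool.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Pick k items uniformly at random from a stream of unknown length.

    Algorithm R: keep the first k items, then let item i (0-based) replace
    a random slot with probability k / (i + 1).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = item
    return sample


if __name__ == "__main__":
    # Hypothetical page records: (title, is_redirect, text_length).
    pages = [("Apple", False, 1200), ("Apples", True, 20),
             ("Banana", False, 900), ("Blank page", False, 0),
             ("Cherry", False, 450)]
    # Only non-blank, non-redirect articles are eligible.
    eligible = (p for p in pages if not p[1] and p[2] > 0)
    print(reservoir_sample(eligible, 2, random.Random(42)))
```

Sampling users instead of articles works the same way; only the stream fed to the function changes, and the memory needed stays proportional to k rather than to the size of the dump.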

Latest enwiki dump for which pagelinks.sql.gz was successfully generated

Hi, I've been looking for the latest enwiki dump that has page-to-page link records, but I couldn't find any at http://download.wikimedia.org/enwiki/ (every dump shows "failed" status for it). Where can I find such an instance?

I'm looking for the same thing - could anyone point me to it? Martinp23 17:16, 14 May 2007 (UTC)

Full-history articles-only dump

It would be useful for research purposes to have a dump of all the content pages with full history, but without all the 'fluff' such as talk and user pages. This would fill the middle ground between the current-version-only dumps and the gargantuan history-of-everything dump.--152.78.61.53 10:33, 2 May 2007 (UTC)

Regular Wikipedia XML dump seems to be corrupted

I have been trying to download pages-articles.xml.bz2, but it downloads a file of only about 700 bytes, and bzip2 says it is corrupted. Would it be possible to redo the dump, or to point to an older but working version of the XML dump? There is currently no version available.

Are you downloading this file [2]? You're probably looking for this one [3]. --69.91.62.221 22:19, 20 May 2007 (UTC)
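A download that is truncated, or that turns out to be a small non-bzip2 file such as an HTML error page, can be caught before any XML processing by decompressing the whole file once. A minimal Python sketch using only the standard `bz2` module; `bz2_is_complete` is a hypothetical helper name.

```python
import bz2
import io
import sys
from typing import IO, Union


def bz2_is_complete(source: Union[str, IO[bytes]],
                    chunk_size: int = 1 << 20) -> bool:
    """Return True if `source` decompresses cleanly to the end of the stream.

    Reads in chunks, so even multi-gigabyte dumps are checked in constant
    memory. Accepts a path or an open binary file object.
    """
    try:
        with bz2.open(source, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except EOFError:
        # The data stops before the bzip2 end-of-stream marker (truncated).
        return False
    except OSError:
        # Not bzip2 data, a corrupted block, or an unreadable file.
        return False


if __name__ == "__main__":
    # A small non-bzip2 file, e.g. an HTML error page, fails the check.
    print(bz2_is_complete(io.BytesIO(b"<html>404 Not Found</html>")))  # False
    for path in sys.argv[1:]:
        status = "OK" if bz2_is_complete(path) else "truncated or corrupt"
        print(path, status)
```

Run it as `python check_bz2.py pages-articles.xml.bz2` (script name hypothetical); a file of a few hundred bytes that fails the check is almost certainly not the dump you wanted.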

Dump schedule

Is there any way to know what's in the queue to be dumped? The status page at http://download.wikimedia.org/backup-index.html is useful for knowing what's currently being dumped and what was recently dumped, but doesn't let us know what's coming up. Adding that information, as well as some historic dump dates, would let us estimate when any particular dump will be available.

I realize this isn't exactly a dump request, but this was the closest page I could find to ask this, and I figured that the people who have to fulfill dump requests would probably be interested in preventing at least some pointless requests by letting would-be requestors know that their favorite dump file is already scheduled to be generated later that day (and more importantly, would know how to output the queue that the dumper is using).

Oh, and after watching the page for a while, I now see that it's just a big loop. It should probably say that somewhere.

English Wikipedia Dump of July 16

I don't understand how the English Wikipedia dump of July 16 [4] has a pages-meta-history.xml.bz2 file of just 5 GB. I remember that the previous version had a page edit history file in .bz2 format (named complete-page-edit-history.xml.bz2) of 85 GB. So, does the new meta file contain something other than the complete page edit history?

Help dumps

I must not be the first to ask about dumping specific help/editing/meta pages. I would really like it if it were possible to download these pages, and to reuse them in a wiki I'm setting up for my company, instead of writing my own help pages.

All available enwiki dumps failed to dump pages and their histories

The two dumps currently available: 2007-09-08 and 2007-08-02 have both failed to dump 'All pages with complete edit history' in both the 7z and bz2 formats. There are no other enwiki dumps available in the web directory. Is there another source for the latest enwiki dumps that have the articles and their histories? Is there something terribly wrong going on with the dumping process for enwiki? Thanks. ivan 22:22, 30 September 2007 (UTC)

And the October enwiki dump was cancelled/failed too.

I would like to get the current-pages dump for my offline Palm reader, but the backups seem to be very random and inconsistent. I hope they don't have to be relied upon! Could someone please say whether there is a problem with enwiki? Is the database corrupt? Alun Liggins --77.96.111.181 23:19, 5 October 2007 (UTC)

Image thumbnails

It would be a great idea to have just the thumbnails of images available. That might at least be feasible for now.


I would easily pay for a complete dump (especially for images). Private download or mailed hard disk.

DEWIKI

The last dump was many days ago, and something failed. A fresh, clean, error-free one would be great. Thanks.


Philosopedia.org

Please make a dump of philosopedia.org. We are moving to a new server; the current wiki uses MyISAM tables, and we want to import the data into InnoDB. Thank you very much.


pages-meta-history.xml.bz2 for University Research

Hi, we're looking to create a local full-history dump of the English Wikipedia for research purposes at the Center for the Study of Digital Libraries at Texas A&M University. However, we have been unable to locate any version of pages-meta-history.xml.bz2. If a dump can be attempted, or if someone who has a copy could let us get hold of it, we'd be very grateful. Thanks.


I would like to reiterate this request from the University of Nottingham, UK. We are desperately trying to download a recent dump of Wikipedia with revision history (pages-meta-history.xml.bz2), and none seems to be available. Even an old dump would be of use, but again we cannot find one. If anyone knows where one is available, we would be extremely grateful. Otherwise I would like to formally request that the latest dump be reattempted as soon as possible. Regards.

I would also like to request the full revision history (pages-meta-history.xml.bz2). The current links are "in progress" or broken. Even older versions would be acceptable. Thanks. --Dhardy 21:57, 5 February 2008 (UTC)

Hi, I'm also in need of pages-meta-history.xml.bz2 for research at University of Waterloo. If anyone has any complete dump, even if old, please post a link below. Thanks.

Full history dump for research purposes

Hi, I'd be interested in full dumps (pages-meta-history) of the en and de wikipedias for research purposes. Since these large XML-dumps apparently keep failing, I'd like to know whether MySQL-Dumps might be available or there is any other chance to get hold of the full data. Chassel 08:03, 2 December 2007 (UTC)

Another request for enwiki full history/pages-meta-history.xml.bz2

Also needed for research purposes. The last full dump I have is from last year and is of course now horribly outdated. Could we either get: 1) a new dump (run to completion!) or 2) at least a copy of the aforementioned mid-August dump?

DEWIKI

There is currently no DE Dump available. Can you please make a dump of it?

True. There is none. Maybe the Zeno lobbyists have won the toss to promote their proprietary software ;) I too would love to see a dewiki dump, because it has been a long time since we had a working one. -andy
Seconding that request. It says: "2007-11-14 19:08:09 dewiki: dump aborted" but there is also no way to access the previous dump. Is someone able to at least put back the previous version of http://download.wikimedia.org/dewiki/ in the meantime? Thanks!

84.149.74.117 09:21, 8 December 2007 (UTC)

Latest dumps (all current pages and all main pages) at [5]. Rich Farmbrough 08:32 24 December 2007 (GMT).

ESWIKI

I just want the Spanish Wikipedia talk pages, but I would be really happy to have at least the enormous full-pages version. Could you please make a dump of it? Thank you!


Is it possible to make an updated static HTML dump? The process takes too long on a single machine. Thanks.

Abstracts XML

The abstracts XML file is not currently compressed - is there a reason for this? I would like to download and process the abstracts file on a regular basis (i.e. each time a new version is available) and obviously it would be better if this file was compressed so it can be downloaded faster. Thanks.

Image Tarballs

Is it possible to create the image tarballs again? Especially the German ones ;-) Thanks!

Request for information on how to obtain the entire Wikipedia on hard media

I'd like to know if it's possible to obtain the entire English Wikipedia (with full revision histories) on a hard disk, and if so, how much this would cost and what options are available, i.e. do I send the hard disk, are images available, can you set it up so that it's active, etc. This is for possible use on a private intranet for a non-profit organization. You may email me via the "Email this user" option. Thank you. Johntrask 18:37, 20 January 2008 (UTC)

Template Dump

Is it possible to get a dump of all the templates from enwiki? Please email me at noreply@moronicgaming.com (no use spamming it, I only read emails from people I know). Thanks, Moronic.

Wikitravel Dump

Can you generate a dump of the Wikitravel site, or tell me where I can get a successfully generated enwiki dump? Please email me at lefajardo@scaa.com.gt Thanks, Lester.

Wikitravel does not provide dumps. This is probably because Wikitravel is a for-profit organisation owned by Internet Brands. Users have been requesting dumps for years. Some got so infuriated that they set up their own site, www.wikivoyage.com. They detail more of the history at http://www.wikivoyage.org/general/About_us
Sigh. en:Wikitravel is not a for-profit organization. It is not even an organization. Internet Brands, the server host, is a for-profit organization, and it is they, not Wikitravel, which refuses to provide dumps. And yes, this is a point of frustration, but it is possible to generate your own dumps through creative use of the special:export function. --76.193.168.97 17:08, 20 April 2009 (UTC)
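The Special:Export route mentioned above can be scripted for a batch of pages. The Python sketch below only builds the POST requests; it assumes the export form accepts a newline-separated `pages` field and a `curonly` flag (the field names MediaWiki's export form has used, but worth checking against the target wiki's version), and the index.php URL is a placeholder. Keep batches small and pause between them so the script does not hammer the server.

```python
from typing import Iterable, Iterator, List
from urllib.parse import urlencode
from urllib.request import Request


def export_request(index_php: str, titles: Iterable[str],
                   current_only: bool = True) -> Request:
    """Build a POST request for a MediaWiki Special:Export submission.

    Assumes the form fields 'pages' (one title per line) and 'curonly'.
    """
    fields = {
        "title": "Special:Export",
        "action": "submit",
        "pages": "\n".join(titles),
    }
    if current_only:
        fields["curonly"] = "1"  # latest revision only; omit for full history
    body = urlencode(fields).encode("utf-8")
    return Request(index_php, data=body, method="POST")


def batches(titles: List[str], size: int) -> Iterator[List[str]]:
    """Split a title list into consecutive chunks of at most `size` titles."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


if __name__ == "__main__":
    # Placeholder URL; substitute the wiki's real index.php path.
    for batch in batches(["Paris", "London", "Berlin"], 2):
        req = export_request("https://wiki.example.org/index.php", batch)
        print(req.get_method(), req.full_url, req.data.decode("ascii"))
        # To actually send it: urllib.request.urlopen(req), then sleep
        # between batches.
```

Building the request separately from sending it makes it easy to inspect exactly what would be posted before pointing the script at a live wiki.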

Wikitravel Dump

Could the Wikitravel dump be made publicly available for download on http://download.wikimedia.org? It's quite useful to have a travel wiki that is portable. Otherwise, could you please get in touch with me at jklown.10.redline@spamgourmet.com to organise a download or media transfer? Thanks, JKlown

Static HTML for EN/ES/FR

I've been watching http://static.wikipedia.org/ and it claims to have been updated 2008-02, but it only seems to have gotten to "eml". Did we get stuck on English? I'd really like to have static HTML (article space only) for English, French and Spanish Wikipedias. Is there anything I can do to help? Are any older dumps available? Thanks, Bovlb 22:48, 22 May 2008 (UTC)

To answer my last question, I found dumps from over a year ago here. I'll work with those for now, but I'm looking forward to updates. Bovlb 05:35, 28 May 2008 (UTC)

This wikitech message from 2008-03-06 says, "We have a new server for the static HTML dumps which should be set up and generating new dumps within a week." The issue has been raised on the list a couple of times since then, but I can't see any reply.

I note to my surprise that the static HTML dumps (from April 2007) seem to be all pages, not just articles. Is there a demand for that? Bovlb 23:19, 30 May 2008 (UTC)

English Wikipedia Ireland dump

I would like a dump of all Irish articles on the English Wikipedia. I would also like a dump of all templates. If anyone can help me, please do so.

Simple English Wiktionary

Could we get a dump of the Simple English Wiktionary? It should take only a few seconds of computer time. The last two dumps were unsuccessful or aborted. Thank you.--Brett 15:47, 1 June 2008 (UTC)

Wikipedia Sports Category

Hi, could you please give me a dump of the entire Wikipedia sports section? Thanks

All Wikipedia templates

Could you please give me a dump of all Wikipedia templates? Regards

me too!

92.50.71.43 10:55, 9 June 2012 (UTC)

7-zip

Please provide pages-articles.xml.7z for enwiki. Vi2 09:08, 20 June 2008 (UTC)

Dumps in Russian

Good morning. I would like to ask you about the Russian dumps. My company used them regularly, but they have not been produced since June. Is there any service that could produce them for us (maybe for a fee)? We don't need everything, only ruwiki and its tables. Lizukin 10:45, 30 September 2008 (UTC)

Dump of ukwiki

Dear administrators! Could you please make a new dump of the Ukrainian Wikipedia? The last dump was made on 11th of June 2008. Thank you for help. --Gutsul 10:57, 4 November 2008 (UTC)

Dumps of articles' history summary

On the ASE project we use articles' history pages to spot articles that need attention. It works very well, but for years now we have been hitting the web servers HARD. We crawl en.wikipedia.org, which is bad. We would like to move on and use XML dumps.

All we need is the information on the first page of each article's history tab: a chronological list of editors. That's all; from it we find patterns that quite accurately identify articles, which volunteers then check and fix. In particular, we don't look at the articles' content at all. So, which dump should we use? pages-meta-history is overkill: it contains the articles' content, whereas we only need the history summary, and it is impossible to download over my Southeast Asian country's very slow internet connection. Thanks! Nicolas1981 09:30, 17 November 2008 (UTC)

Dump of Wikimedia Commons

Dear administrators! I want to download the images from Wikimedia Commons, but I found that there is no dump file for Wikimedia Commons. Is that right? Could you please tell me how to get a dump file for Wikimedia Commons, or create a new one? Thank you for your help.

137.132.250.12 08:07, 17 December 2008 (UTC)maggie

Dumps of images have been disabled. Stifle 15:44, 6 June 2009 (UTC)
There is discussion about how to remedy this. It is a 'temporary' measure. 18.85.46.244 15:08, 19 July 2009 (UTC)

Dumps of abstract.xml

It looks like the abstract.xml dumps haven't been working since around the end of May. Is this by design? If they are no longer going to be dumped, can someone explain how I can still get article abstracts without having to download all of the page data? I will if I have to, but I imagine there is a better way. :) Thanks!

Request for a dump of all articles about Kenya and associated templates

Hi, I am interested in an English Wikipedia database dump of all articles about the country Kenya [6]: its politics, society, geopolitics, etc. Everything about Kenya, plus the associated templates. I hope I will get help. Contact me at kivuva@kenya.or.ke Thanks September 7th 2009

Request for a dump of CFD-wiki

Hi, I need a dump of the community project CFD-wiki, a project based on MediaWiki. I don't know which category to look under in the overall dump downloads. Contact me at krishnanveeraraaghavan@gmail.com. Thanks

Banned Users Dump

Hello, I am working on a research project to detect sock-puppetry automatically. We are building a model based on the full history dump, but in order to perform evaluation we will need some sort of ground truth. I believe access to the banned users dump which is listed as "Data for blocks of IP addresses, ranges, and users. (private)" will be ideal for this purpose. I understand that this might be sensitive information and am ready to comply with whatever is necessary in order to obtain it. I am planning to use it for research purposes only. Please, email me at petko@cs.ucsb.edu . Thanks

Request for a frwiki dump

No dump made today for frwiki, error saying “2010-06-09 21:36:14 frwiki (new): missing status record” ? Can this be checked? — A2 21:38, 9 June 2010 (UTC)

Request for a Wikinews dump

The address download.wikimedia.org has been down for almost a month now. I find this revolting. The enwikinews dump of 20101101 is now in progress at [7].

Up-to-date English Wiktionary dump

The last dump of the English Wiktionary was in October. It would be great if we could get an up-to-date dump :D Tdomhan 11:39, 20 December 2010 (UTC)