Talk:Data dump torrents

I think much of the Wikimedia effort should go toward uploading XML torrents

I tried downloading it without a torrent, and the download broke many times, so I had to start the process over and over again. Just a piece of advice! MahdiTheGuidedOne (talk) 22:57, 11 July 2012 (UTC)

Different tracker for enwiki-20120902-pages-articles.xml.bz2

Burnbit's tracker has been erratic, and right now their entire website is down. I've been thinking of setting up an additional tracker (I had some spare resources available to run one) and that downtime pushed me to give it a go. It's at tracker.lbft.net:6969 and it's restricted to authorised torrents (currently only enwiki-20120902-pages-articles.xml.bz2).

If anyone has any objection, or if I've stepped on anybody's toes by doing this, feel free to replace the torrent I created and linked with a different one (and I'll stop seeding mine and start seeding yours). If I've done anything wrong with the torrent or tracker, please let me know.

In case anything happens to my tracker (which I don't anticipate), the torrent will continue working: it's web seeded from dumps.wikimedia.org like the Burnbit torrents, I added OpenBitTorrent's tracker as a backup, and modern BitTorrent clients do reasonably well on DHT alone.

-- lbft (talk) 10:00, 3 September 2012 (UTC)

Updated dewiki, frwiki, itwiki and nlwiki torrents

Since people seem to be using that enwiki torrent and Burnbit's still down, I've also added torrents for the latest dumps of the other wikis on the page (dewiki-20120829-pages-articles.xml.bz2, frwiki-20120827-pages-articles.xml.bz2, itwiki-20120831-pages-articles.xml.bz2.torrent and nlwiki-20120824-pages-articles.xml.bz2.torrent). I was able to find the original Burnbit torrent files for these, so I just added my tracker and OpenBitTorrent's to them. That means that people who already had those torrents should at least be able to see peers in the new torrents over DHT.

If you were downloading one of those torrents and you don't see any seeds/peers, make sure you enable DHT in your client and, if your client supports it, add http://tracker.lbft.net:6969/announce and udp://tracker.openbittorrent.com:80 to the list of trackers - or just remove the torrent from your client, download and add the torrent file again and tell it to verify the existing file.

--lbft (talk) 09:34, 4 September 2012 (UTC)
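
For anyone curious why adding trackers to an existing torrent file keeps it compatible with the original swarm, here is a minimal Python sketch. It assumes a single-file torrent encoded in canonical bencode; the helper names (bdecode, bencode, add_trackers, info_hash) are hypothetical and written for illustration only, not taken from any tool mentioned above. The point it shows is that the tracker list lives outside the "info" dictionary, so the info hash that peers use over DHT does not change.

```python
import hashlib

def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at offset i; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":                                  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                                  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                                  # dictionary: d<key><value>...e
        i += 1
        d = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            d[key], i = bdecode(data, i)
        return d, i + 1
    colon = data.index(b":", i)                    # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length

def bencode(obj) -> bytes:
    """Encode ints, bytes, lists and dicts (keys sorted, as bencode requires)."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in sorted(obj.items())) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")

def add_trackers(torrent_bytes: bytes, trackers) -> bytes:
    """Return a copy of the torrent with extra trackers appended as new tiers."""
    meta, _ = bdecode(torrent_bytes)
    tiers = meta.get(b"announce-list", [])
    if not tiers and b"announce" in meta:
        tiers = [[meta[b"announce"]]]              # keep the original tracker first
    known = {url for tier in tiers for url in tier}
    for url in trackers:
        encoded = url.encode()
        if encoded not in known:
            tiers.append([encoded])
            known.add(encoded)
    meta[b"announce-list"] = tiers
    return bencode(meta)

def info_hash(torrent_bytes: bytes) -> str:
    """SHA-1 of the re-encoded info dict (matches the real hash for canonical input)."""
    meta, _ = bdecode(torrent_bytes)
    return hashlib.sha1(bencode(meta[b"info"])).hexdigest()
```

Calling add_trackers(data, ["http://tracker.lbft.net:6969/announce", "udp://tracker.openbittorrent.com:80"]) on the bytes of a torrent file should give a file with the same info_hash as before, which is why clients holding the older file can still find the same peers.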

Thanks. Will try that! :) Jesse V. (talk) 03:38, 16 September 2012 (UTC)

Burnbit's back up again, both their website and their tracker, but I'm not entirely comfortable relying solely on them - so with the latest dump, nlwiki-20120913-pages-articles.xml.bz2, I've added my tracker and OpenBitTorrent to their torrent. If you prefer the original Burnbit torrent, it's linked too. --lbft (talk) 00:53, 18 September 2012 (UTC)

frwiki-20130104-pages-articles.xml.bz2

This torrent is web seeded from dumps.wikimedia.your.org (an official dumps mirror) only, since BurnBit must've got a corrupted download from dumps.wikimedia.org for this torrent. The md5sums file shows that the file should have an MD5 hash of 7602a371059be5dfb5e10e05d9211736, but the dumps.wikimedia.org torrent has a hash of 75b2fe4dfd0146c2b7c38f3c2cf491a6 instead.

Basically, the frwiki-20130104-pages-articles.xml.bz2 torrent linked on Data dump torrents is fine, but if you go stick the dumps.wikimedia.org URL into BurnBit's website the torrent it sends you to is broken. -- lbft (talk) 08:07, 6 January 2013 (UTC)
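
The comparison described above can be repeated locally once a dump has been downloaded. This is a minimal Python sketch, assuming the file is on disk and the expected value has been copied by hand from the md5sums file; the function names are hypothetical.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected_hex):
    """Return True if the file's MD5 equals the expected hash (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For the dump discussed here, the call would be matches_expected("frwiki-20130104-pages-articles.xml.bz2", "7602a371059be5dfb5e10e05d9211736"), with the local path adjusted to wherever the file was saved.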

Alternatives to burnbit.com?

Is there a safe alternative to burnbit? I ask because burnbit is (a) failing to process the latest enwiki dump, and (b) getting multiple spam windows past my McAfee Antivirus protection and causing my Firefox to crash. Could the WMF host the torrent files that go with these dumps? -- John of Reading (talk) 12:31, 8 February 2015 (UTC)

Kiwix uses MirrorBrain. You could send a patch to enable such a system on dumps.wikimedia.org. If you are interested, you can probably copy something from Kiwix and ask Kelson if something is missing.[1] --Nemo 08:10, 9 February 2015 (UTC)
Well, I followed your links but I'm not much wiser! Are you saying that MirrorBrain is already used by one WMF project, so could reasonably be extended/adapted/redeployed (??) to do another task? I'm not competent to send a patch myself. Where could I propose that WMF take this on? -- John of Reading (talk) 16:27, 9 February 2015 (UTC)
I'm saying that MirrorBrain is used by some Wikimedia folks, but not by WMF. This was already proposed to WMF, at bugzilla:27653. If someone offered to do the work, I expect WMF would accept.
Otherwise, maybe there are other such services, but I don't know. Sorry, this is all I know. --Nemo 17:00, 9 February 2015 (UTC)
OK, thank you. I'll keep watching this page in case someone has better luck with the latest enwiki dump. -- John of Reading (talk) 18:08, 9 February 2015 (UTC)

Request for enwiki pages-meta-current

Would it be possible to add enwiki-20160305-pages-meta-current.xml.bz2 (and future files) to this page as well? Thanks! GoingBatty (talk) 12:52, 8 March 2016 (UTC)

@GoingBatty: Apparently not. When I tried it, burnbit.com responded immediately with "Sorry! but that file is too big". Perhaps there's another site out there that accepts bigger torrents? -- John of Reading (talk) 18:00, 8 March 2016 (UTC)

Burnbit doesn't support https

All the URLs at //dumps.wikimedia.org/enwiki/20160407/ are HTTPS, which burnbit.com refuses to accept: "The URL given is invalid! Only HTTP urls are supported by most of the torrent clients." Is this a recent change by dumps.wikimedia.org or by burnbit? Is there a workaround? -- John of Reading (talk) 13:03, 16 April 2016 (UTC)

(More) Apparently the workaround is to add ".your" in the middle of the domain name, http://dumps.wikimedia.your.org - though burnbit took over a week to process the file. -- John of Reading (talk) 05:42, 28 April 2016 (UTC)
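
The workaround above amounts to a small URL rewrite. Here is a minimal Python sketch, assuming the mirror keeps the same path layout as dumps.wikimedia.org (not verified here); the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

OFFICIAL_HOST = "dumps.wikimedia.org"
MIRROR_HOST = "dumps.wikimedia.your.org"  # mirror host named in the comment above

def to_http_mirror(url):
    """Rewrite a dumps.wikimedia.org URL to a plain-HTTP URL on the mirror.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != OFFICIAL_HOST:
        return url
    return urlunsplit(("http", MIRROR_HOST, parts.path, parts.query, parts.fragment))
```

For example, to_http_mirror("https://dumps.wikimedia.org/enwiki/20160407/") gives "http://dumps.wikimedia.your.org/enwiki/20160407/", which is the form burnbit was reported to accept.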

2016-04-07 data dump

I am using the data dumps to measure the size of Wikipedia over time. Because of a compilation error, this data dump is approximately 10% smaller than it should be, so I will ignore it for my purposes. Johnny Au (talk) 12:44, 5 May 2016 (UTC)

EnWiki

I can't seem to access the most recent dump. Can anyone assist?

400 Lux (talk) 15:49, 23 July 2016 (UTC)

Someone has created a torrent for the 20160720 dump, and I've added the link here. The original torrent for the 20160701 dump was corrupt and has been deleted. I've asked Burnbit to create a new one and will add it here if it works properly this time. -- John of Reading (talk) 16:48, 23 July 2016 (UTC)
The most recent torrents can take a few days to complete. Get the 2nd most recent torrent. Chuckr30 (talk) 14:37, 21 October 2016 (UTC)

Burnbit is apparently dead.

According to this, burnbit.com has been down for more than a month. Almost all of the links on this page are dead. Brightgalrs (talk) 20:41, 3 October 2016 (UTC)

Yes, burnbit is dead, and so is Mononova.org. I just tried both. TBP sucks as it does a popup every 1-2 minutes. Where else can I upload Wikipedia torrents? https://newtorrentzeu.com/ doesn't seem to support uploading torrents. Chuckr30 (talk) 14:33, 21 October 2016 (UTC)

ruwiki-20150806-pages-articles-multistream.xml.bz2

Does anyone have this dump? Ivan386 (talk) 01:43, 19 November 2016 (UTC)

I have ruwiki-20150806-pages-articles4.xml.bz2, but the latest dumps are https://dumps.wikimedia.org/ruwiki/20161120/ --AVRS (talk) 20:08, 23 November 2016 (UTC)

Burnbit

I removed all the dead links; they have been dead for some time. I would like to see new dumps via torrent somewhere. 400 Lux (talk) 17:17, 10 December 2016 (UTC)

archive.org has some torrents. What do you like torrents for? Maybe you can seed yourself from some server, if the problem is a slow home connection. Nemo 20:12, 30 December 2016 (UTC)
Torrents save time by allowing users to get the data faster than the server can write. This is important when we are squeezing productivity from every second. 400 Lux (talk) 22:33, 2 January 2017 (UTC)
Which server? Do you mean the 2 MiB/s throttle from dumps.wikimedia.org? Nemo 15:09, 3 January 2017 (UTC)
Yes, that example will illustrate. 400 Lux (talk) 02:31, 18 January 2017 (UTC)

Torrents v dump files

@Razorblack: Since this page is specifically about torrents, I'm not convinced by your addition of links to the dump files themselves. -- John of Reading (talk) 19:22, 3 February 2017 (UTC)

@John of Reading: Oh! Good point. I had actually forgotten this is specifically a torrents page... Please feel free to remove those two links if you think it's appropriate, or I can do it too. The ideal solution would be to create a torrent and post it here I suppose, but I don't have the upload resources to seed it myself consistently, and I don't think it would find other reliable seeders either. (Maybe I'm wrong though). I'm not sure what the right thing to do is. The page will remain pretty barren. -- User:Razorblack
@Razorblack: Yes, I fear this page will stay rather empty, unless the Wikimedia dumps server starts hosting torrent files with its dump files. -- John of Reading (talk) 21:20, 3 February 2017 (UTC)

Torrents at tools.wmflabs.org

I've just discovered tools.wmflabs.org/dump-torrents/, which has a torrent for just about every big dump file from December 2016 onwards. -- John of Reading (talk) 16:40, 21 September 2017 (UTC)

Nice, but no seeders. -- GreenC (talk) 04:02, 17 February 2018 (UTC)
@GreenC: There are explicit web seeds in the torrent files; is your client not downloading from them? Mahir256 (talk) 04:56, 17 February 2018 (UTC)
@Mahir256: There are seeders, but no one is sending anything. For example, svwiki-20170401-pages-articles.xml.bz2.torrent, using Tixati as a client. Tixati may in fact be the problem. But then I noticed archive.org in the seed list, and I was able to find the image I needed from 2015 and download it directly. -- GreenC (talk) 06:07, 17 February 2018 (UTC)

enwiki-20150901-pages-articles.xml.bz2

Anyone have this one (or thereabouts)? -- GreenC (talk) 04:05, 17 February 2018 (UTC)

Discovered Internet Archive has it all: https://archive.org/download/enwiki-20150901/enwiki-20150901-pages-articles.xml.bz2 -- GreenC (talk) 06:07, 17 February 2018 (UTC)

Translate tags

If someone could fix the translate tags and explain to me how they work, that would be great, thanks. --Thibaut120094 (talk) 15:27, 6 December 2018 (UTC)