Grants:IdeaLab/Commons tarballs seedbox
- Assumption: mirroring our data is imperative.
- Fact: our media mirrors are poor, especially for Wikimedia Commons, but the Internet Archive (IA) lends us a big hand.
- Problem: as of 2013, IA bandwidth was congested, and wiki researchers reported difficulty downloading such datasets from it.
- The network was improved in July 2014. From the 2013 ISC Annual Report: «In 2013 we expanded our relationship with our largest Hosted@ partner, Internet Archive to include the provision of 3rd party Internet Transit Services.»
- WMF cannot offer mirrors because it would be a waste of its expensive storage space (source: Ariel Glenn).
- Almost all the mirror services in the world were contacted and none could help.
- Idea: the Internet Archive runs on cheap 3 TB disks, often bought as discounted external HDDs. Computational power and home bandwidth are basically free nowadays. Make a seedbox.
- Alternatives: w:en:User:Emijrp/Wikipedia Archive#Help seed the garden of knowledge.
Assemble a 30 TB seedbox for less than 1500 € and maintain it for free.
Setup and example cost:
- Raspberry Pi (FreeNAS), 30 €
- 10 × 3 TB external HDDs, connected via USB or SATA in a RAID, 1100 €
- a high-quality 80 Plus PSU and a case, donated by me
- some power and data cables, a few tens of €
- power and a 10/10 Mb/s fiber connection donated by me (my home is powered by the Alps' dams, by the way), or a 1000/1000 Mb/s Ethernet port in a friendly university lab.
- the hardware can be bought via WMIT to save VAT and so that it does not become a personal asset,
- if the experiment fails, the hard disks can be donated to the Internet Archive,
- we don't need redundancy and the like: it's just a copy of something we maintain elsewhere,
- the torrent protocol will take care of keeping the copies in good shape (unless there are hardware failures),
- if computational power doesn't suffice, I can donate an old computer or two to the cause,
- poor disk read speed is OK: our bandwidth won't be higher than that anyway,
- despite all my efforts I never manage to use up my bandwidth, but Milan is the most fiber-rich city in Europe, so if there's a need I can get a 100/100 Mb/s connection for a little more out of my own pocket,
- a researcher can't afford to spend more than a few weeks downloading all the data, but if the torrents are better seeded we can find more home seeders for small chunks of the data.
Discarded alternatives:
- virtual servers: don't even talk about it,
- AWS S3: about 2300 $/month.
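The figures in the lists above can be sanity-checked with a little arithmetic. The sketch below is only illustrative: `cables_eur = 50` is an assumed stand-in for "a few tens of €", terabytes are taken as decimal (10^12 bytes), and the link is assumed to run at full speed all the time, which a real home connection will not do. The function name `transfer_days` is made up for this example.

```python
# Figures quoted in the lists above.
raspberry_pi_eur = 30
disks_eur = 1100
cables_eur = 50          # ASSUMPTION: stand-in for "a few tens of euros"
disk_count = 10
disk_size_tb = 3
s3_usd_per_month = 2300  # the AWS S3 estimate quoted above

hardware_eur = raspberry_pi_eur + disks_eur + cables_eur
capacity_tb = disk_count * disk_size_tb  # raw capacity, ignoring any RAID overhead
s3_usd_per_year = s3_usd_per_month * 12


def transfer_days(size_tb: float, link_mbit_s: float) -> float:
    """Days needed to send size_tb terabytes over a fully used link of link_mbit_s Mb/s."""
    bits = size_tb * 1e12 * 8              # decimal terabytes to bits
    seconds = bits / (link_mbit_s * 1e6)   # Mb/s to bit/s
    return seconds / 86400                 # seconds to days


print(f"one-off hardware cost: {hardware_eur} EUR for {capacity_tb} TB raw")
print(f"AWS S3 over one year:  {s3_usd_per_year} USD")
for link in (10, 100, 1000):
    days = transfer_days(capacity_tb, link)
    print(f"{link:>4} Mb/s uplink: {days:6.1f} days to send one full copy")
```

Under these assumptions a single 10 Mb/s uplink would need roughly 278 days to send one full 30 TB copy and a 100 Mb/s uplink about 28 days, so a download window of a few weeks presumably depends on the faster links or on several seeders sharing the load.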