Distributed Media Storage/Software Requirements

Current statistics

As of August 15, 2005:

  • File count: 3087418 (~ 3 million)
  • Total size: 197021665938 bytes (almost 200 GB)
  • Mean object size: 63815 bytes
  • Maximum object size: 23610681 bytes (~ 23 MB)
  • Image uploads (writes): < 10 / min
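
The derived figures follow directly from the first two raw numbers; a quick Python sketch of the arithmetic:

    # Raw counts from the statistics above.
    file_count = 3087418
    total_bytes = 197021665938

    mean_size = total_bytes / file_count
    print("Mean object size: %.0f bytes" % mean_size)   # ~64 kB
    print("Total size: %.0f GB" % (total_bytes / 1e9))  # ~197 GB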

File distribution per project

/var/zwinger/htdocs/mediawiki: 6
/var/zwinger/htdocs/wikimedia.de: 14
/var/zwinger/htdocs/wikimedia.org: 114
/var/zwinger/htdocs/wikibooks.org: 19847
/var/zwinger/htdocs/foundation: 2797
/var/zwinger/htdocs/wikisource.org: 76
/var/zwinger/htdocs/sources: 20
/var/zwinger/htdocs/meta: 19382
/var/zwinger/htdocs/commons: 904398
/var/zwinger/htdocs/chr-x: 11
/var/zwinger/htdocs/boards: 2230
/var/zwinger/htdocs/textbook: 32
/var/zwinger/htdocs/wiktionary.org: 3509
/var/zwinger/htdocs/sep11: 37
/var/zwinger/htdocs/wikinews.org: 1344
/var/zwinger/htdocs/wikipedia.org: 2030720
/var/zwinger/htdocs/quote: 136
/var/zwinger/htdocs/wikiquote.org: 463
/var/zwinger/htdocs/wikimedia: 32007
/var/zwinger/htdocs/nostalgia: 8
/var/zwinger/htdocs/grants: 6
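
A breakdown like the one above can be produced by counting the files under each project directory; a minimal sketch, assuming the /var/zwinger/htdocs upload tree shown in the listing:

    import os

    UPLOAD_ROOT = "/var/zwinger/htdocs"

    # Count regular files under each per-project directory.
    for project in sorted(os.listdir(UPLOAD_ROOT)):
        path = os.path.join(UPLOAD_ROOT, project)
        if not os.path.isdir(path):
            continue
        count = sum(len(names) for _, _, names in os.walk(path))
        print("%s: %d" % (path, count))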

System characteristics

  • Many more reads than writes (stats?)
  • Caching is performed by a layer of Squid HTTP caches in front, so peak load is kept away from the image storage system
  • Maximum file size: ~ 200 MB
  • More or less append-only. Deletes do occur, but can be batched. Overwrites could store the new data under a different key (see the sketch below).
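
One way to treat overwrites as appends is to derive a fresh key for every upload, for instance by including a content hash; the key scheme and helper below are illustrative assumptions, not part of the requirements:

    import hashlib

    def object_key(project, name, data):
        # Hypothetical key scheme: including a content hash means an
        # overwrite of "name" lands under a new key, so the underlying
        # store stays append-only; the old key can be deleted later in
        # a batch.
        digest = hashlib.sha1(data).hexdigest()
        return "%s/%s/%s" % (project, digest, name)

    print(object_key("commons", "Example.jpg", b"first upload"))
    print(object_key("commons", "Example.jpg", b"new version"))  # different key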

Design assumptions

  • The system consists of a large number of individual, relatively unreliable components on relatively cheap, non-redundant hardware. Therefore individual component failures will occur frequently, and the system must be designed to tolerate a reasonable number of individual node failures while maintaining data and service availability (see the replication sketch after this list).
  • The mean object size is on the order of 64 KiB, and the number of files is on the order of millions. This means that the system must be optimized for a large number of small files.
  • As the system implements BLOB storage, practically all read requests will fetch the entire object from start to end.
  • The typical workload of the system consists of many more reads than writes.
  • The system can be regarded as, and optimized for, append-only operation.
  • Low latency is more important than high throughput, as the majority of requests are for small files (e.g. thumbnails), each tying up (memory) resources for the duration of the request.
  • Peak read load is levelled by a cluster of HTTP proxy caches in front of the system. This reduces the amount of caching that needs to be performed by the system itself.
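
The failure-tolerance assumption above implies storing each object on several nodes. A minimal sketch of one possible placement scheme, rendezvous hashing with N replicas per key; the node names, replica count, and hash choice are illustrative assumptions:

    import hashlib

    NODES = ["storage1", "storage2", "storage3", "storage4", "storage5"]
    REPLICAS = 3  # an object survives up to REPLICAS - 1 node failures

    def replica_nodes(key):
        # Rank all nodes by a per-(node, key) hash and take the top
        # REPLICAS (rendezvous / highest-random-weight hashing): when
        # a node fails, only the objects it held move elsewhere.
        def weight(node):
            return hashlib.sha1(("%s:%s" % (node, key)).encode()).hexdigest()
        return sorted(NODES, key=weight, reverse=True)[:REPLICAS]

    print(replica_nodes("wikipedia.org/Example.jpg"))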