Talk:Q3 and Q4 2004 hardware order worksheet

From Meta, a Wikimedia project coordination wiki

As discussed on the 20th of June (Mav is on holiday, so exact numbers are not available), our total in the bank should be around 24000 dollars. There were 5400 left in mid-May. There was a refund of 9000 for a server, plus the trophy award of 9000 dollars. Anthere 05:21, 21 Jun 2004 (UTC)

I just worked out a provisional hardware budget for Q3 2004 based on our past spending patterns. I found that we will need to spend between $11000 (assuming we follow last quarter's growth rate) and $73000 (assuming a repeat of the growth rate of the quarter before that - probably caused by the installation of the new server farm). Both of those figures, however, are probably outliers and thus do not represent any trend.
  • Averaging all growth rates since Q3 2002 gives an increase in traffic (hits) per quarter of 90.98%, meaning a Q3 2004 budget of between $26000 and $32000.
  • Removing the last two quarters gives a 68.72% growth rate, meaning a $20000 to $24000 budget.
  • Also removing an earlier outlier (Q1 2003's dismal 1.42% growth) gives an average growth rate of 85.55%, meaning a $24000 to $29000 budget.
My spreadsheet is still a bit messy but I think the estimate is sound. Please check:
I'm still working on getting these data in a more presentable form. The finance department also needs to come together so we can work on an annual budget and 5 year master plan. I could do this myself if our only expenses will be hardware related, but I imagine that we will want to expand the operations of the foundation. Thus we need to form policy on what we want to spend money on. I most certainly cannot do that alone.
--mav 08:56, 2 Jul 2004 (UTC)
Computing the needed money from what we've spent seems to be an odd approach to me. It doesn't take into account that we have to buy different kinds of hardware and that growth and prices of these differ. Estimating the budget from a "What additional hardware do we need"-approach will provide a better estimation. -- JeLuF 12:43, 10 Jul 2004 (UTC)
It is a measure of growth vs dollars spent (budgets use past spending patterns in order to predict future spending patterns). How the developers spend the money is up to them. If it turns out that I have budgeted too much money, then we will have a surplus that can be put into a rainy day fund or used for special projects. If there is a consistent surplus then the budget would need to be adjusted. --mav 20:56, 11 Jul 2004 (UTC)
Good budgets use more than past spending patterns to predict the future, particularly when we've only been spending money for two quarters. We can and should work out a more reliable relationship between traffic growth and required hardware layout; I don't know the details of our current traffic shape and exactly how it impacts servers, but there are many subtleties worth identifying. See Wikimedia budget/Hardware efficiencies. +sj+ 19:12, 13 Jul 2004 (UTC)

After many hours checking and rechecking my numbers, I've found that my previous preliminary estimate of $25K to $30K for a quarter 3 budget was a bit off. The problem was with my estimate of the amount of money we previously spent per 1000 hits over the last year. I figured *total* previous hits and then applied that to projected *increases* in hits in the future. Thus my estimate was too optimistic.

After recalculating for past *increases* in traffic I've found that if our traffic increases 90.53% over last quarter by the end of this quarter we will have to buy $29K to $35K worth of new servers to keep up with the increase (if we don't have any traffic increase, then we won't have to spend a dime unless a server has to be replaced). This figure includes a Moore's Law adjustment which assumes we will be able to serve ~17% more hits per new server dollar than last quarter (I'm still not 100% confident that that is a valid assumption - any ideas?). The previous $25K to $30K estimate *did not* include a Moore's Law adjustment.
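The corrected approach above (apply the past cost per 1000 *added* hits to the projected *increase*, then discount for Moore's Law) can be sketched roughly as follows. This is only an illustration, not the spreadsheet's actual formula: the function name, the parameter names and any inputs you feed it are hypothetical; only the ~17% efficiency figure comes from the text above.

```python
def new_server_spend(current_hits, growth_rate, cost_per_1000_added_hits,
                     efficiency_gain=0.17):
    """Rough estimate of one quarter's spending on new servers.

    current_hits: hits served last quarter (hypothetical input)
    growth_rate: projected fractional growth, e.g. 0.9053 for 90.53%
    cost_per_1000_added_hits: past dollars spent per 1000 *additional* hits
    efficiency_gain: assumed extra hits served per new server dollar (~17%)
    """
    # Only the increase in traffic requires new hardware.
    added_hits = current_hits * growth_rate
    unadjusted = added_hits / 1000 * cost_per_1000_added_hits
    # Each new dollar is assumed to serve (1 + efficiency_gain) times as many hits.
    return unadjusted / (1 + efficiency_gain)
```

With zero growth the function returns zero, which matches the "won't have to spend a dime" case (server replacement is not modelled here).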

Other possibilities:

% growth       35.22%      254.90%     20.00%     50.00%      100.00%
$/1000 hits    $11,274.81  $81,595.03  $6,402.22  $16,005.54  $32,011.09
Average        $13,609.25  $98,489.19  $7,727.79  $19,319.47  $38,638.95

NOTE: The 35.22% figure is the actual growth from Q1 to Q2 and the 254.90% figure is the actual growth from Q4 2003 to Q1 2004. The average quarter-to-quarter growth over the last couple of years, however, is about 90%. But who knows if this quarter will follow that or do something unpredictable like the last two quarters.
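The dollar figures in the table above appear to scale linearly with the growth percentage, so other scenarios can be approximated from the 100% column. The sketch below makes that assumption; the function and constant names are made up for illustration, and only the two 100% figures are taken from the table.

```python
def scale_budget(growth_pct, budget_at_100_pct):
    """Scale a budget linearly with growth (assumed relationship)."""
    return budget_at_100_pct * growth_pct / 100.0

# Values copied from the 100.00% column of the table.
FIRST_ROW_AT_100 = 32011.09
AVERAGE_ROW_AT_100 = 38638.95

for pct in (35.22, 254.90, 20.00, 50.00, 100.00):
    first = scale_budget(pct, FIRST_ROW_AT_100)
    average = scale_budget(pct, AVERAGE_ROW_AT_100)
    print(f"{pct:7.2f}%  ${first:>10,.2f}  ${average:>10,.2f}")
```

The 20% and 50% columns come out to within a cent of the table; the 35.22% and 254.90% columns differ by a dollar or so, presumably because the displayed percentages are rounded.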


I have a small question. If you want a system such as the one Peter proposes in this article (with automated server replicating, remote consoles, etc) why not go for an IBM blade solution - which has pretty much everything you need for a big project like this (integrated switches/fibre channel, integrated full remote management, lots of computing power in a small rackspace, etc).

An IBM blade chassi (14 blades @ 7U height) costs 2700 USD, and each blade costs around 2700 USD with 2 x 1.6 GHz Power CPUs. More expensive, but also more reliable and a lot easier to manage. /Magnus

chassis lysdexia 00:23, 9 Nov 2004 (UTC)

A shocking projection

For the grant proposal I projected 90% quarterly compounded hit growth into 2006. The results are a bit shocking due to exponential growth.

Year    annual budget
2004    $110K - $132K
2005    $660K - $797K
2006    $4200K - $5078K

This assumes business as usual (meaning we continue to use the same kind of commodity-grade rack-mount servers and do not have any real-time mirrors - migrating to 'big-iron' servers and making use of a worldwide squid/mirror system would mitigate this).

But there are still probably more errors in my estimates and I doubt we can continue exponential growth for too long into the future (at least not at the same absurd rate - 30% hit growth compounded quarterly would be much more manageable but still exponential).
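To give a feel for how quickly quarterly compounding runs away, here is a small sketch comparing 90% and 30% quarterly hit growth. It covers hit growth only and does not attempt to reproduce the dollar figures in the table above, which depend on other assumptions in the spreadsheet; the function name is made up for illustration.

```python
def compound(quarterly_rate, quarters):
    """Growth multiplier after compounding a quarterly rate."""
    return (1 + quarterly_rate) ** quarters

for rate in (0.90, 0.30):
    one_year = compound(rate, 4)
    two_years = compound(rate, 8)
    print(f"{rate:.0%} per quarter: x{one_year:.2f} after one year, "
          f"x{two_years:.2f} after two years")
```

At 90% per quarter, hits multiply by about 13 in a year; at 30% per quarter, by a little under 3.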

Please see:


--mav 21:27, 11 Jul 2004 (UTC)

A caution on big iron: it'll probably raise costs. The way to scale inexpensively is lots of commodity servers. That's why so many supercomputers and calculation setups today use shelves of commodity PCs - they are very cheap to buy. For now we're going second best in rackmount: dual CPU 1U boxes. Databases scale similarly - lots of query servers with a lot of RAM and cheap ATA disks for reads, bigger iron behind them for writes. We're in the very early stages of this migration, both in terms of hardware and modifying the software to support it without being unduly bothered by replication lag. It's a well-proven route, with companies like Yahoo, Google and Sabre doing it. At present we're just getting to the point where we're shifting from one big database server to big ones behind lots of smaller ones. Cost per query should go down from now on. Jamesday 20:10, 20 Jul 2004 (UTC)
Good point. We will still need to make sure our boxen each become more powerful (more CPUs, more RAM, faster connections, faster hard disks, 64-bit OS on each box, etc). Doing that will generate more bang for our buck. --Daniel Mayer 22:52, 20 Jul 2004 (UTC)
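The read/write split Jamesday describes (many cheap query servers for reads, bigger iron for writes) could look something like the sketch below at the application level. This is purely illustrative and is not how the site's software does it: the class, the host names and the crude "starts with SELECT" test are all assumptions, and replication lag is ignored entirely.

```python
import itertools

class QueryRouter:
    """Send writes to one primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads if no replicas are configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def route(self, sql):
        # Crude classification: anything that is not a SELECT counts as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary

# Hypothetical host names, for illustration only.
router = QueryRouter("db-master", ["db-read-1", "db-read-2"])
print(router.route("SELECT * FROM page"))
print(router.route("UPDATE page SET page_touched = 0"))
```

Reads rotate round-robin over the replicas, so adding another cheap read server is just one more entry in the list.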

Just wondering how many NFS operations per second are done on the system? Possibly something like a Netapp filer might be an option as an NFS server. E.g. this one on ebay for $3000 can do around 3900 NFS ops per second.

--SimonLyall, 08:43 Jul 13 UTC

I've used Netapp filers in the past, and while I agree that they're very nice pieces of hardware, they're probably not worth the cost in general. It might be worth it to buy used ones on the cheap like the one above, but it's probably even more worth it to simply buy our own hardware and build the equivalent of a Netapp. They're basically just fast drives in RAID with lots of fast RAM and NVRAM, running FreeBSD. --Wclark 15:57, 26 Jul 2004 (UTC)

Remote console during boot

I believe the second half of this statement may be wrong:

Using an APC remote switch, the servers can be turned off an on, but access to the systems is not possible during boot.

Access to the systems during boot is possible via a remote serial console setup. Instructions for setting this up can be found here:

Remote Serial Console HOWTO

I've used configurations like this in the past, and they've worked just fine for controlling the machine remotely during bootup. To really get the full functionality desired, we'd want to make sure that the BIOS on the machines supports serial consoles as well, as you can then monitor the entire boot sequence remotely. That's discussed at this page:

Optionally configure the BIOS

--Wclark 16:11, 26 Jul 2004 (UTC)

The problem isn't doing it, but doing it for many servers with admins who are anywhere from 10 miles away to the other side of the world, depending on the time of day. A remote console server handles this, connecting to the serial ports and making what's on them visible from anywhere on the net. Jamesday 09:06, 19 Sep 2004 (UTC)

I'd like to recommend some specific hardware for remote console and power if I might, based on my own sysadmin experiences (I'm not affiliated in any way with the manufacturer). Cyclades makes a really nice line of terminal servers that are based on embedded Linux running on a PowerPC architecture. They're very flexible and extensible (it's Linux) and secure. Cyclades also makes some remote power stuff which gets attached to an unused port on their terminal server and integrates with it for server management. The models I used were the TS-Series terminal servers (I don't see much benefit in their more expensive 'ACS' versions of the same stuff). I haven't actually tried the power management stuff from them, only read about it.

(Please excuse any errors of style here, this is my first post to any sort of wiki ever)

--Blblack 19:22, 28 Jul 2004 (UTC)

Thanks for the suggestions. Jamesday 09:06, 19 Sep 2004 (UTC)