Upgrade to MySQL 4.0

From Meta, a Wikimedia project coordination wiki

Wikipedia server used to be running MySQL 3.somethingorother. In May 2003 we upgraded to 4.0.12.

Advantages:

  • Faster fulltext indexing
  • Fulltext boolean search mode could be used in place of our current, broken one not using this yet, should do so
  • Allegedly generally faster. 'Query cache' might help?
  • Can sort on indexes in reverse order! Could eliminate ugly inverse_timestamp field, would make some other special pages faster that do inverse order sorts. yay! still have inverse_timestamp field, but can remove it

Disadvantages:

  • Still not entirely trusted...
    • 4.0.12 has been released and declared stable. Lee will be trying out the software under it soon on his test box. seems to work ok
  • With our one server, we don't really have a way to test mysql 4 under production conditions short of taking the current server offline or dangerously stressing the server with double the load (Eek!) If we got separate database and web servers, the new mysql4 could be put on the other server and tested out more thoroughly. Just a though.
    • A second server is now on order.
  • We would need to maintain 3.x compatibility code for some third-party sites. hm

Alternatives:

  • Switch to some other database entirely. Magnify all above disadvantages. :) How important is the Wikipedia software compared to Wikipedia itself? If scalable performance is only acheivable via switching to a different database (PostgreSQL?), would that be the right choice to make?
    • That would be fine and dandy, if a) it will actually help and b) someone with PostreSQL experience, time, and commitment is actually going to produce the necessary code to adapt it. I hope you're volunteering!
  • I could certainly take a look, but I lack time, sadly. It still think it might be worthing giving it a look at. PostgreSQL has reverse sorting (of indexes) and auto-optimizing indexes, which could be helpful. It is known to perform better than MySQL under a heavy concurrent load (although not usually better under a single user load). I could volunteer my server (dual PII 233, 256MB RAM) for testing of the SQL, if you like.
    • I'm sure it's worth looking into, but it would require some re-engineering to adapt the software, while MySQL 4.0 will give us some advantages for a fairly straightforward drop-in upgrade. MySQL 4.0 is a short-term tactic, while thinking about PostgreSQL or other technologies is a long-term strategy.
  • Have we considered using LDAP for the storage of this data? It might work extremely well for the way this data is structured. It could work even better if you consider the multi-lingual aspect of Wikipedia.
    • About all I know about LDAP is how to spell it. ;) If someone could elaborate on what it would mean to use LDAP rather than an SQL-based relational database, I'd love to hear about it.
      • If you're familiar with database concepts, LDAP is basically a hierarchal database, much like a filesystem or the Windows registry. Its predecessor (X.500) was designed to hold information where you would have an DN (article name) and it would have various attributes. Below is an example:
cn=Marumari, ou=users, ou=en, o=wikipedia (think of each comma as a seperate directory)
cn: Marumari
email: nick@twoevils.org
email: this_is_not_an_email_address@wikipedia.org
homepage: http://www.twoevils.org
If you wanted to see if I exist, you just open my DN, and see if it is there. It is very quick to do searches on attributes, and would likely be faster than a traditional database for that. Could also be useful for integrating namespaces (especially for users!).
OpenLDAP is extremely slow compared to even the slowest SQL databases I've seen. I doubt pushing stuff into LDAP would solve any of Wikipedia's performance problems. --Sesse
  • Another option is file based storage of this data.

See Move Text to Filesystem.