User:Brooke Vibber/Sphinx search

From Meta, a Wikimedia project coordination wiki

At the MySQL Users Conference I got cornered by the folks doing the Sphinx search engine. They claim it's significantly faster than Lucene at both indexing and searching, and that it returns more relevant results.

The good:

  • Allegedly faster
  • C++
  • GPL
  • Has an "XML pipe" input for the indexer, theoretically ideal to plug in a transformed XML data dump

The bad:

  • Fairly immature code; had to hack the Makefile and headers to get it to compile
  • XML input is fake: it doesn't use a real parser, so you have to special-case your output carefully, and it is likely to break.
  • Only has support for English and Russian currently (stemming, case folding)
  • Little to no documentation
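As a rough illustration of the "transformed XML data dump" idea and of what special-casing the output might involve, here is a minimal Python sketch. It streams a MediaWiki-style dump and writes one flat, fully escaped record per page, one element per line. The output element names (document, id, title, body) and the helper names are assumptions made up for this sketch, not taken from Sphinx documentation; check what the indexer actually accepts before relying on it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def local_name(tag):
    """Strip a namespace prefix such as '{http://...}page' from a tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(dump_stream):
    """Yield (page_id, title, text) tuples from a MediaWiki-style XML dump."""
    page_id = title = text = None
    for _event, elem in ET.iterparse(dump_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "id" and page_id is None:
            # The first <id> inside a <page> is the page id; later ones
            # belong to the revision and the contributor.
            page_id = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield page_id, title or "", text or ""
            page_id = title = text = None
            elem.clear()  # free memory; dumps are large


def write_documents(pages, out):
    """Write one predictable record per page (element names are hypothetical)."""
    for page_id, title, text in pages:
        # Collapse all whitespace so every element stays on a single line,
        # which is friendlier to a consumer that is not a real XML parser.
        flat = " ".join(text.split())
        out.write("<document>\n")
        out.write(f"<id>{int(page_id)}</id>\n")
        out.write(f"<title>{escape(title)}</title>\n")
        out.write(f"<body>{escape(flat)}</body>\n")
        out.write("</document>\n")
```

To use it as a filter, something like write_documents(iter_pages(sys.stdin.buffer), sys.stdout) should do. Keeping the layout this rigid is deliberate: the less variation in the output, the less there is for a hand-rolled reader to trip over.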

First impressions

Testing the unreleased 0.9.6 code, which includes various improvements over the older 0.9.5 release.

Haven't been able to get the daemon and the sample PHP client talking to each other yet. More to follow...

  • 2006-05-02 update: the binary protocol currently sends data in host byte order and is hard-coded to expect little-endian, so it doesn't work on my (big-endian) G5 Mac. ;)
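To show why host byte order breaks things across architectures, here is a small Python sketch. It packs a 32-bit integer the way a big-endian host would, reads it back with a little-endian assumption, and then shows the usual fix of agreeing on network byte order at both ends. The value and the helper names are made up for illustration; nothing here reflects the actual Sphinx wire format.

```python
import struct

# Hypothetical 32-bit field from a reply, e.g. a match count.
value = 1000

# What a big-endian host (such as a PowerPC G5) writes in host byte order.
big_endian_bytes = struct.pack(">I", value)

# What a little-endian host (such as x86) writes in host byte order.
little_endian_bytes = struct.pack("<I", value)

# A reader hard-coded to little-endian gets garbage from the big-endian peer.
misread = struct.unpack("<I", big_endian_bytes)[0]


def encode_u32(n):
    """Encode an unsigned 32-bit integer in network (big-endian) byte order."""
    return struct.pack("!I", n)


def decode_u32(data):
    """Decode an unsigned 32-bit integer from network byte order."""
    return struct.unpack("!I", data)[0]


print(big_endian_bytes.hex(), little_endian_bytes.hex())
print("misread as little-endian:", misread)
print("round trip in network order:", decode_u32(encode_u32(value)))
```

With a fixed wire order, both daemon and client convert at the boundary and the host architecture stops mattering.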