Use another search engine

Our use of MySQL's full-text search is something of a hack and has had performance problems, so it has sometimes been suggested that we use an alternative search engine. Some possibilities include:

Google

A few times in the past the internal search has been disabled entirely and we've pointed people at the wonder that is http://google.com/ .

Pro:

It can do full-text searches of pages on a given domain (i.e. ours)
It's fast
It doesn't use any of our server power (apart from occasionally spidering the site, which is fairly well behaved and which it does anyway)
Handling of non-ASCII characters mostly works

Con:

We have no control over its workings
The index only updates monthly or so
Won't distinguish namespaces; articles may not have priority over e.g. talk pages
Searches web pages, not wiki pages; tends to put a lot of interface gunk into the summaries
Takes users out of our interface
Search results are censored (this doesn't only apply to Google)
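
For illustration, pointing users at Google amounts to building a query restricted to our domain with Google's `site:` operator. The following is only a minimal sketch: the helper name `google_site_search_url` is hypothetical, and `en.wikipedia.org` is used purely as an example domain.

```python
from urllib.parse import urlencode


def google_site_search_url(domain: str, terms: str) -> str:
    """Build a Google search URL restricted to one domain.

    The helper name and parameters are illustrative; only the `site:`
    operator and the `q` query parameter come from Google itself.
    """
    query = f"site:{domain} {terms}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_site_search_url("en.wikipedia.org", "R Smith"))
```

Nothing in such a query can express a namespace restriction, which is one of the cons listed above.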

ht://Dig

ht://Dig is a web indexing and searching system for a website or set of websites.

Pro:

Proven technology
Can be configured to search per namespace
Open source (GPL)
Results can be presented within our user interface

Con:

It needs to periodically spider and index the entire site(s). This puts a load on the servers and the index will lag behind editing work.
No UTF-8 support yet.

Jakarta Lucene

"Jakarta Lucene is a high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform."

Pro:

Since we would run it and it's open source, we can tweak the indexing scheme to our liking and present results within our user interface
?

Con:

Need to run a Java VM, which may eat up extra memory
?

Sphider

"Sphider a lightweight search engine in PHP"

Pro:

Free
Simple install
Fast full-text indexing with Google-like results
Possibility to exclude common words from being indexed

Con:

Uses some CPU time
Uses MySQL to store indexes

OmniFind Yahoo Edition

http://omnifind.ibm.yahoo.net/

Pro:

Free
Simple install
Full site indexing
Attachment/file indexing
Fully customizable search interface
Fast full text returns with cache available

Con:

Hostile to shared hosting: CPU intensive
Based on WebSphere: memory intensive
Requires tweaking to not recursively index the same pages over and over

?

moved from Search Engine:

Search engine

---Ideas---

The MySQL full text search is often switched off due to performance issues.

In its default configuration it also has a stop-word list that can make for a poor search: for example, try to find R Smith amongst the Smiths.
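
To make the problem concrete, here is a small sketch of how stop-word and short-word filtering can strip terms out of a query. The stop list and the minimum word length below are illustrative assumptions, not MySQL's actual configuration (MySQL's MyISAM full-text index does have a minimum word length setting, `ft_min_word_len`, alongside its stop-word list), and `index_terms` is a hypothetical helper.

```python
import re

# Illustrative values only: not MySQL's real stop list or configuration.
STOP_WORDS = {"the", "of", "and", "a", "in"}
MIN_WORD_LEN = 4


def index_terms(text: str) -> list[str]:
    """Return the words a filtering full-text index would keep."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= MIN_WORD_LEN and w not in STOP_WORDS]


print(index_terms("R Smith"))     # the initial "R" is discarded
print(index_terms("The Smiths"))  # "the" is discarded
```

With the initial dropped, a search for R Smith is indistinguishable from a search for Smith.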

---Possible solutions---

The search engine should be a separate database server so as not to impact the main servers (load spreading).
Create the search index ourselves (optimising the search terms and data tables).
The search index can be created at edit save time (slows responsiveness).
Alternatively, the index can be created in the background (delays the index, but improves responsiveness and gives the casual browser the best performance).
Another available time slot is when a user spell-checks an article (medium impact).
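
The trade-off between indexing at save time and indexing in the background can be sketched as follows. This is a toy in-memory inverted index, not a proposal for the real schema; the class and method names (`SearchIndex`, `save_page`, `run_background_pass`) are hypothetical.

```python
import re
from collections import defaultdict, deque


class SearchIndex:
    """Toy inverted index: word -> set of page titles."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.page_words: dict[str, set[str]] = {}
        self.pending: deque[tuple[str, str]] = deque()

    def _index_now(self, title: str, text: str) -> None:
        # Remove the page's old words first so re-saves stay consistent.
        for word in self.page_words.get(title, set()):
            self.postings[word].discard(title)
        words = set(re.findall(r"\w+", text.lower()))
        for word in words:
            self.postings[word].add(title)
        self.page_words[title] = words

    def save_page(self, title: str, text: str, background: bool = True) -> None:
        if background:
            # Fast save: the index lags until the next background pass.
            self.pending.append((title, text))
        else:
            # Index at save time: always fresh, but the save is slower.
            self._index_now(title, text)

    def run_background_pass(self) -> None:
        # Drain the queue of saved-but-unindexed pages.
        while self.pending:
            self._index_now(*self.pending.popleft())

    def search(self, word: str) -> set[str]:
        return set(self.postings.get(word.lower(), set()))


index = SearchIndex()
index.save_page("Smith", "R Smith was a writer")
print(index.search("smith"))  # not yet indexed: the queue has not been drained
index.run_background_pass()
print(index.search("smith"))  # now found
```

In a real deployment the background pass would run on the separate search server rather than in the same process as the save.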

A search engine written from scratch could be optimised for our requirements and might therefore be faster.

See also User:archivist

In the few lists of full-text search options that I have seen, there is no mention of Windows Desktop Search. I agree that it is Windows-based, but it has a .NET interface as well as a COM interface, so it is definitely something that could be looked into. Other advantages:

It has a GUI for administration, and many parameters can also be changed from the console
It can be throttled up or down as needed
It is Unicode compliant
It supports various file formats, including HTML, RTF and plain text, and has add-ons for file types such as PDF
It provides an interface for creating your own filter for your own file type
It supports 23 languages

It sounds like a good option to me.