Over the past two years the author has developed an advanced website and catalogue search system used by the public library in Waalre, a small town near Eindhoven (NL). To a large extent the development has been guided by feedback on the system's use, as recorded in extensive log files, and by ideas drawn from innovative developments on the internet.
Several of the experiences gained may be useful to the Wiki community, in particular:
- Patrons of the library turned out to make many spelling mistakes when typing a search phrase into the catalogue system. Most strikingly, a user's misspellings were generally very persistent (i.e. repeated over and over). Initially this meant that on average 35% of searches returned a null result, because no match was found.
After the deployment of algorithms that correct the most common error types (a function that works in the background), the null result percentage dropped to well below 5%.
- The same algorithms are also used to offer the user suggestions for an extended search, even when search results are returned.
- Searching on keywords, an important searchable index next to title and author, turned out to suffer from ambiguity. The so-called title descriptions (the basis for the information included in any library catalogue) contain human-defined keywords, but users may well search with synonyms instead. The system was therefore extended with an automatic thesaurus facility (the thesaurus itself is generated automatically as well), helping the user to maximize the number of meaningful hits.
- The logs, especially the null results, can be used by the librarians to find out which books patrons are apparently looking for but are not available in the holdings.
- For approximately a year all book loans have been recorded in a log file. This log file is used to offer the user suggestions for alternative authors or titles. The mechanism is broadly similar to the buying suggestions offered by many e-shops.
- More recently the integration of the catalogue system with other information sources has been extended. If one or more of the search words entered by the user in the catalogue system are also present in other (selected) websites or in the searchable index of Wikipedia, an automatic link (no retyping required) is provided for further browsing.
How can Wikipedia benefit from these experiences?
Re 1 (misspellings):
- For an ordinary user (just looking for information), being invited to start a new article after entering a misspelled search phrase is a disappointment.
- Some of the algorithms that correct misspellings in the catalogue system are generic; others (such as converting singular nouns to plural and vice versa) are language dependent, but not too difficult to adapt to the other major languages.
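The split between generic and language-dependent correction can be illustrated with a minimal Python sketch. It is not the catalogue system's actual algorithm: the term list, the function names and the similarity cutoff are assumptions made for illustration, English plural rules stand in for the language-dependent part, and the generic part uses the standard library's `difflib` string similarity.

```python
import difflib

# Hypothetical index of searchable terms; a real system would load these
# from its title, author and keyword indexes.
INDEX_TERMS = ["gardening", "garden", "history", "histories", "novel", "novels"]


def number_variants(word):
    """Naive English singular/plural variants (the language-dependent part)."""
    variants = {word}
    if word.endswith("ies"):
        variants.add(word[:-3] + "y")      # histories -> history
    elif word.endswith("s"):
        variants.add(word[:-1])            # novels -> novel
    if word.endswith("y"):
        variants.add(word[:-1] + "ies")    # history -> histories
    else:
        variants.add(word + "s")           # novel -> novels
    return variants


def correct(word, terms=INDEX_TERMS, cutoff=0.8):
    """Return index terms that plausibly match a possibly misspelled word."""
    word = word.lower()
    if word in terms:
        return [word]
    # Language-dependent step: try singular/plural variants first.
    hits = sorted(v for v in number_variants(word) if v in terms)
    if hits:
        return hits
    # Generic step: fall back to plain string similarity.
    return difflib.get_close_matches(word, terms, n=3, cutoff=cutoff)


print(correct("gardenning"))
print(correct("historys"))
```

Porting such a scheme to another language would mean replacing only `number_variants`; the similarity fallback is language neutral.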
Re 2 (suggestions):
- Wikipedia might include the same mechanisms to help users to find the information they want.
Re 3 (thesaurus):
- At present, human-defined synonyms are more or less implemented via the redirect mechanism, but this feature could be extended and improved.
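How a thesaurus can widen a search is sketched below in Python. The thesaurus content and the function name are hypothetical, and the sketch only covers query expansion; how the catalogue system generates its thesaurus automatically is not described here and is not reproduced.

```python
# Hypothetical thesaurus: each keyword maps to a set of synonyms.
THESAURUS = {
    "car": {"automobile"},
    "automobile": {"car"},
}


def expand_query(words, thesaurus=THESAURUS):
    """Return the search words followed by their synonyms, without duplicates."""
    expanded = []
    for word in words:
        # Keep the original word first, then its synonyms in a stable order.
        for term in [word, *sorted(thesaurus.get(word, ()))]:
            if term not in expanded:
                expanded.append(term)
    return expanded


print(expand_query(["car", "repair"]))
```

A redirect resolves one title to one target, whereas an expansion of this kind lets a single search word match several equivalent keywords at once.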
Re 4 (librarian's info):
- Equivalent logs in Wikipedia might be used to derive market-driven (i.e. user-driven) lists of desired articles.
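Deriving such a list from a search log could look roughly like the Python sketch below. The log format, a sequence of (query, number of hits) pairs, is an assumption; neither the catalogue system's nor Wikipedia's actual log layout is implied.

```python
from collections import Counter


def most_wanted(log_entries, top=10):
    """Count the queries that returned no results, most frequent first.

    log_entries is an iterable of (query, hit_count) pairs
    (a hypothetical log format).
    """
    misses = Counter(
        query.strip().lower()
        for query, hits in log_entries
        if hits == 0
    )
    return misses.most_common(top)


log = [
    ("Quantum Cooking", 0),
    ("quantum cooking ", 0),
    ("gardening", 12),
    ("runes", 0),
]
print(most_wanted(log))
```

In practice the queries would first pass through spelling correction, so that persistent misspellings do not show up as demand for articles that already exist.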
Re 5 (alternatives):
- Wikipedia could log search requests (those not followed by an edit) to build up a database of users (IPs) and viewed pages. The catalogue system uses additional properties of authors and books to ensure that its suggestions are in some way interrelated (a patron may borrow a book by Danielle Steel and a book on gardening, which obviously have no relation to each other); the category classification schemes (and portals) of Wikipedia can be used for the same purpose.
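A minimal Python sketch of this idea follows: pages viewed by the same user are counted as candidates, and a candidate is kept only if it shares at least one category with the page being viewed. The data layout and all names are hypothetical, and the sketch is not the catalogue system's own suggestion mechanism.

```python
from collections import Counter, defaultdict


def suggest(views, categories, page, top=3):
    """Suggest pages that were viewed together with `page` and share a category.

    views      -- iterable of (user, page) pairs (hypothetical log format)
    categories -- dict mapping each page to a set of category names
    """
    pages_by_user = defaultdict(set)
    for user, viewed in views:
        pages_by_user[user].add(viewed)

    own_categories = categories.get(page, set())
    counts = Counter()
    for pages in pages_by_user.values():
        if page not in pages:
            continue
        for other in pages - {page}:
            # Keep only candidates related through at least one category.
            if categories.get(other, set()) & own_categories:
                counts[other] += 1

    # Most frequent first; ties broken alphabetically for a stable result.
    return sorted(counts, key=lambda p: (-counts[p], p))[:top]


views = [
    ("u1", "Rose"), ("u1", "Tulip"), ("u1", "Danielle Steel"),
    ("u2", "Rose"), ("u2", "Tulip"), ("u2", "Compost"),
    ("u3", "Tulip"),
]
categories = {
    "Rose": {"Gardening"},
    "Tulip": {"Gardening"},
    "Compost": {"Gardening"},
    "Danielle Steel": {"Novelists"},
}
print(suggest(views, categories, "Rose"))
```

The category check is what keeps the Danielle Steel page out of the suggestions for a gardening page, even though one user viewed both.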
Re 6 (integration):
- Information sources such as Wikipedia could be integrated more easily with other search systems, such as a library catalogue, if an SQL SELECT query could be run directly on the database (preferably on a table of all title words). The present integration with the catalogue system required setting up an offline, locally stored database of the nl version of Wikipedia.
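The kind of query meant here can be sketched in Python with an in-memory SQLite database. The table name `title_words`, its columns and the sample rows are assumptions made for illustration; they do not reflect the actual Wikipedia database schema or the offline database used by the catalogue system.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory table of title words (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE title_words (word TEXT, title TEXT)")
    rows = [
        ("eindhoven", "Eindhoven"),
        ("eindhoven", "PSV Eindhoven"),
        ("psv", "PSV Eindhoven"),
        ("waalre", "Waalre"),
    ]
    conn.executemany("INSERT INTO title_words VALUES (?, ?)", rows)
    return conn


def matching_titles(conn, search_words):
    """Return article titles containing at least one of the search words."""
    if not search_words:
        return []
    # One placeholder per word keeps the query parameterised.
    placeholders = ",".join("?" for _ in search_words)
    sql = (
        "SELECT DISTINCT title FROM title_words "
        f"WHERE word IN ({placeholders}) ORDER BY title"
    )
    params = [word.lower() for word in search_words]
    return [row[0] for row in conn.execute(sql, params)]


conn = build_demo_db()
print(matching_titles(conn, ["Waalre", "psv"]))
```

Each returned title could then be turned into a link shown next to the catalogue results, so that the user does not have to retype the search words.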