Consolidating search results when query text is in Khmer
Implement an algorithm that normalises the character sequence of search terms and Wikipedia content written in Khmer (ខ្មែរ) script.
Created on 03:42, 15 January 2018 (UTC)


Who are the people you want to introduce Wikipedia to?

People who predominantly use their native Khmer language to access information.

In what languages do they search for information, online or otherwise?

Khmer ខ្មែរ

In what ways does this group communicate with each other?

Facebook. Even business communication is done this way here.

What are some reasons this group would use Wikipedia? How would they benefit from it, or what would they find useful?

After the educated and intellectual classes were deliberately massacred between 1975 and 1979, Cambodia has continued to suffer from a lack of reliable information.

Project idea

What language Wikipedia projects will you promote to new readers?

Khmer ខ្មែរ

How will you communicate with new readers? Will you be communicating with them online, in-person, or both?

Enabling search results to be effective in Wikipedia

Describe your idea to engage new readers. How might it be implemented? What will you tell people about Wikipedia?

From the request I posted to Google here:!topic/websearch/fenCCXsoZY4;context-place=topicsearchin/websearch/category$3Adesktop---other-please-specify

Khmer is the language and script used by the people of Cambodia.

Depending on the keystroke sequence used, the same word, with the same correct spelling, shows different and incomplete search results.

The Khmer script has been adapted to Unicode in a way that makes learning to type in Khmer easy, because it is forgiving about keystroke order. This matters for two reasons: different people type the same-looking word in different sequences, and the QWERTY keyboard behaves slightly differently across operating systems. Keymaps are not identical, but all are capable of creating the required words.

For example, a word meaning 'eat' is ញ៉ាំ, pronounced nyarm. It can be written with any of the following keystroke sequences. Note that the script on the left looks the same regardless. Note also that uppercase is achieved by SHIFT + keystroke, so the symbol " is created by SHIFT + '.

ញ៉ាំ J"am
ញុាំ JuaM
ញាុំ JauM
ញំុា JMua
ញំាុ JMau

Even though the scripts on the left look the same, if they are pasted into Google Search as search terms, each version will generate completely different results. I am guessing that this is because Google indexes and searches based on the Unicode code point sequence, not the resulting rendered script.
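To make the guess above concrete, here is a small Python sketch that prints the code points behind the five variants listed earlier. The code point sequences are my reading of those variants as transcribed here, not a verified keymap, so treat the mapping from keystrokes to code points as an assumption.

```python
import unicodedata

# Assumed code point sequences for the five variants of the word for 'eat'.
# Keys are the keystroke sequences quoted in the text above.
variants = {
    'J"am': "\u1789\u17c9\u17b6\u17c6",
    "JuaM": "\u1789\u17bb\u17b6\u17c6",
    "JauM": "\u1789\u17b6\u17bb\u17c6",
    "JMua": "\u1789\u17c6\u17bb\u17b6",
    "JMau": "\u1789\u17c6\u17b6\u17bb",
}


def code_points(s):
    """Return a readable list of code points, e.g. 'U+1789 U+17B6'."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


for keys, word in variants.items():
    print(f"{keys:5} {code_points(word)}")

# A search engine comparing raw strings sees this many different words.
distinct = len(set(variants.values()))
print("distinct raw strings:", distinct)

# Standard Unicode normalisation (NFC) may not be enough on its own,
# so also report how many strings remain distinct after applying it.
distinct_nfc = len({unicodedata.normalize("NFC", w) for w in variants.values()})
print("distinct after NFC:", distinct_nfc)
```

If the strings stay distinct, any index keyed on the raw text will treat them as unrelated words, which would match the behaviour described above.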

This is a significant problem for the Khmer people because all available Internet searches fail to provide comprehensive and ordered results. This is exacerbated by the fact that very few, if any, Khmer people know that results are omitted due to keystroke order.

While not a simple task, an algorithm could be built on the same rules that Khmer Unicode rendering follows. It would normalise the character sequence of Khmer text when indexing, and the same algorithm would then normalise Khmer search terms before they are matched against the index.
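As a rough illustration of the idea, the Python sketch below sorts the marks that follow a consonant into one fixed order and applies the same function to both indexed text and queries. The ordering classes in `mark_class` are a hypothetical simplification of my own, not the actual Khmer rules in the Unicode Standard, and the function deliberately ignores subscript consonants (COENG sequences).

```python
def mark_class(ch):
    """Hypothetical ordering class for a few Khmer marks (an assumption)."""
    cp = ord(ch)
    if cp in (0x17C9, 0x17CA):   # consonant shifters: MUUSIKATOAN, TRIISAP
        return 0
    if 0x17B6 <= cp <= 0x17C5:   # dependent vowel signs
        return 1
    if 0x17C6 <= cp <= 0x17C8:   # NIKAHIT, REAHMUK, YUUKALEAPINTU
        return 2
    return None                  # everything else is left where it is


def normalise_khmer(text):
    """Sort each run of known marks into a fixed order; keep other text as-is."""
    out = []
    run = []

    def flush():
        # Order by class first, then by code point so the result is deterministic.
        run.sort(key=lambda c: (mark_class(c), ord(c)))
        out.extend(run)
        run.clear()

    for ch in text:
        if mark_class(ch) is not None:
            run.append(ch)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)


# Index a page under one typing order, then search with another.
index = {}
for word, page in [("\u1789\u17b6\u17bb\u17c6", "page-A")]:
    index.setdefault(normalise_khmer(word), []).append(page)

query = "\u1789\u17c6\u17bb\u17b6"
hits = index.get(normalise_khmer(query), [])
print(hits)
```

This only unifies variants that use the same code points in a different order. A variant typed with a different code point, such as a shifter sign in place of a vowel sign, would need an additional substitution rule that this sketch does not attempt.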

The result of implementing such an algorithm whenever Khmer script is detected would be a significant improvement in the availability of relevant, quality information to Khmer people. This is particularly important if you recall that educated Cambodian people were systematically killed between 1975 and 1979, leaving an extreme knowledge gap that remains a significant problem for Cambodia.
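Detecting Khmer script can be done by checking code point ranges. The sketch below uses the Khmer block (U+1780 to U+17FF) and the Khmer Symbols block (U+19E0 to U+19FF); the function name `contains_khmer` is just an illustrative choice.

```python
def contains_khmer(text):
    """Return True if any character falls in the Khmer or Khmer Symbols blocks."""
    return any(
        0x1780 <= ord(ch) <= 0x17FF or 0x19E0 <= ord(ch) <= 0x19FF
        for ch in text
    )


print(contains_khmer("\u1789\u17c9\u17b6\u17c6"))
print(contains_khmer("eat"))
```

A search pipeline could call a check like this first and only apply Khmer normalisation to text that needs it.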

How will you know if this project is successful? What are some outcomes that you can share after the project is completed?

Searches in Khmer on Wikipedia will not end up with "No Results" as often. This is a very easy metric to track. The number of Khmer searches on Wikipedia should also grow faster than it does now.
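The "No Results" metric could be computed along these lines. This is only a sketch: the log format of (query, result count) pairs and the sample entries are made up for illustration and do not reflect how Wikipedia actually records searches.

```python
def zero_result_rate(log):
    """Share of searches that returned no results, given (query, count) pairs."""
    if not log:
        return 0.0
    empty = sum(1 for _, count in log if count == 0)
    return empty / len(log)


# Made-up sample logs, before and after normalisation is introduced.
before = [("q1", 0), ("q2", 3), ("q3", 0), ("q4", 1)]
after = [("q1", 2), ("q2", 3), ("q3", 0), ("q4", 1)]

print(zero_result_rate(before))
print(zero_result_rate(after))
```

Comparing the rate for Khmer queries over a period before and after the change would show whether fewer searches come back empty.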

Do you think you can implement this idea? What support do you need?

No. This is a technical issue that needs to be resolved by Wikipedia. It would be great if, once they have achieved this, they released their solution as open source so that Google and other search engine providers could use it.

About the idea creator

I am an Australian living in Cambodia, and I am concerned about the difficulties Cambodians face when accessing reliable information.



