I am an ML and NLP enthusiast from Bangladesh. I love working with data and drawing insights from it. I did my Bachelor's in Computer Science and Engineering at Shahjalal University of Science and Technology, Bangladesh, and my Master's in Computer Science at the University of Waterloo, Canada. After graduating, I worked as a Machine Learning Engineer for about a year before joining the Wikimedia Foundation as a Data Analyst and Researcher, where I have taken on several roles along the way.
- I am currently working with the Research Team to improve link recommendation in all Wikipedia languages. This work includes fixing mwtokenizer so that it can parse all languages, improving the existing language-dependent link recommendation models, and then creating a language-agnostic link recommendation model that will replace the 200+ language-dependent models deployed at present.
- I worked with the Research Team as a Research Data Scientist (NLP) to develop copyediting as a structured task. To raise and maintain the standard of Wikipedia articles, it is important to ensure that articles are free of typos and spelling or grammatical errors. While there are ongoing efforts to automatically detect "commonly misspelled" words in English Wikipedia, most other languages have been left behind. The goal was to find ways to detect errors in articles in all languages automatically. I wrote a program that automatically curates lists of commonly misspelled words in 100+ languages using Wiktionary. The coverage of these lists was compared with existing misspelling lists in 2-3 languages, and the lists were then used to detect misspellings in as many Wikipedia languages as possible.
- Previously, I worked with the Search and Analytics team to find ways to scale the Wikidata Query Service (WDQS) by analyzing the queries being made to it. The analysis results are on my User:AKhatun subpages, and the work is tracked on the Phabricator work board (WDQS Analysis).
- I worked on the Abstract Wikipedia project to identify central Scribunto modules across all the wikis. This work is leading to the creation of a central repository of functions that can be used in a language-independent manner in the future. See our work on Phabricator and GitHub.