Grants:Programs/Wikimedia Community Fund/Rapid Fund/Improving the Citation Generation Output of Nigerian Media Sites (ID: 23528923)
Applicant details
- Main Wikimedia username. (required)
Vanderwaalforces
- Organization
N/A
- If you are a group or organization leader, board member, president, executive director, or staff member at any Wikimedia group, affiliate, or Wikimedia Foundation, you are required to self-identify and present all roles. (required)
N/A
- Describe all relevant roles with the name of the group or organization and description of the role. (required)
Main proposal
- 1. State the title of your proposal. This will also be the Meta-Wiki page title.
Improving the Citation Generation Output of Nigerian Media Sites
- 2. and 3. Proposed start and end dates for the proposal.
2025-11-01 - 2025-12-31
- 4. What is your tech project about, and how do you plan to build the product?
Include the following points in your answer:
- Project goal and problem you solve
- Product strategy or project roadmap
- Technical approach (infrastructure, tech stack, key tools and services)
- Integrations or dependencies (if any)
As an experienced Wikipedian I’ve seen this happen over and over: editors open VisualEditor’s citation tool, paste a Nigerian newspaper URL, and expect a clean, templated {{cite news}} reference. Instead, they get a mess. VisualEditor’s automatic citation mode (Citoid) returns {{cite web}} instead of {{cite news}}, drops or garbles the author name, fails to populate |newspaper=, mis-parses dates, and ignores ISSNs even when the paper lists them. This isn’t editors making mistakes; it’s the system. Citoid depends on Zotero translators to extract structured metadata. Where translators don’t exist or don’t identify the page as a newspaperArticle, Citoid can’t produce the correct template or fields. The result: thousands of Nigerian-news citations across Wikimedia projects are inconsistent, incomplete, and require manual correction, a huge and avoidable drain on editor time and on the quality of verifiability for Nigerian topics.
Project goal & problem solved
Build and deploy site-specific Zotero translators for the 50 most-cited Nigerian newspaper websites so that Citoid/VisualEditor returns correctly formatted {{cite news}} references from a single URL lookup, automatically populating |last=, |first=, |newspaper=, |date=, |url=, and |issn=. This closes the metadata-extraction gap, eliminates repetitive manual fixes, and raises citation quality and consistency.
Project roadmap (Nov 1 – Dec 31)
Deliver tested translators + integration so VE’s automatic citation produces the right output.
- Preparation (Nov 1–3)
- Finalise the list of 50 target sites (to be attached). Set up the dev environment (Zotero + translation-server in Docker), repo forks, and a test-case template.
- Prototype and pipeline (Nov 4–10)
- Build 3 prototype translators for the highest-priority outlets.
- Create a standard translator template, test harness, and a reproducible workflow.
- Bulk development (Nov 11 – Dec 31)
- Work cadence: ~5 translators per week (code + at least 2 representative testcases).
- Submit translators as PRs to the Zotero translators repository.
- Integration and end-to-end testing
- Coordinate with Citoid maintainer (Mvolz (WMF)) and Zotero maintainers to ensure translation-server pulls new translators.
- Run VE end-to-end tests for each site (3 sample articles) and capture evidence.
- Wrap-up: documentation and final report.
Technical approach
- Translators: Zotero translator JavaScript files using detectWeb()/doWeb() patterns. Each translator will explicitly set itemType = "newspaperArticle" and populate item.creators (using ZU.cleanAuthor()), publicationTitle, date, ISSN (where present; most of these papers list one), section (if present), and url.
- Prioritised extraction order: schema.org JSON-LD → OpenGraph / Dublin Core meta tags → visible DOM selectors (byline spans, time elements).
- Testing and runtime: Zotero translation-server (Docker) for local headless runs and for running translator testcases. Each translator includes HTML snapshots + expected output.
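As a minimal illustration of this approach (not a real Zotero translator, which runs inside Zotero’s runtime with ZU helpers and a parsed document), the URL-based classification and the JSON-LD → OpenGraph fallback order can be sketched in plain JavaScript over a raw HTML string; all function names and URL patterns here are hypothetical:

```javascript
// Illustrative sketch only: real translators use the parsed DOM and
// Zotero utilities (ZU), not regexes. All patterns here are hypothetical.

// Classify a page the way detectWeb() would, by URL shape alone.
function detectByUrl(url) {
  if (/\/(news|politics|sport)\/.+/.test(url)) return "newspaperArticle";
  return false;
}

// Extraction priority: schema.org JSON-LD first, OpenGraph meta second.
function extractMetadata(html) {
  const ld = html.match(
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/i
  );
  if (ld) {
    try {
      const data = JSON.parse(ld[1]);
      if (data["@type"] === "NewsArticle") {
        return {
          itemType: "newspaperArticle",
          title: data.headline,
          date: data.datePublished,
          author: data.author && data.author.name,
        };
      }
    } catch (e) {
      // Malformed JSON-LD: fall through to meta tags.
    }
  }
  const og = (prop) => {
    const m = html.match(
      new RegExp(`<meta[^>]*property=["']og:${prop}["'][^>]*content=["']([^"']+)["']`, "i")
    );
    return m ? m[1] : null;
  };
  return { itemType: "webpage", title: og("title"), date: null, author: null };
}
```

A real translator would additionally fall back to Dublin Core meta tags and visible byline/time selectors, per the extraction order above.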
- Version control and PRs: GitHub forks and PRs to Zotero’s translators repo. Include testcases in PR to speed review.
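For context on the local testing loop described above, the translation-server can be run with Docker and queried over HTTP; a sketch (the image name and port follow the upstream translation-server README, and the article URL is a placeholder):

```shell
# Start Zotero's translation-server locally (listens on port 1969).
docker run -d -p 1969:1969 zotero/translation-server

# Ask the server to translate a URL; it returns Zotero JSON items.
# Replace the placeholder URL with a real article from a target site.
curl -d 'https://example.ng/news/some-story' \
     -H 'Content-Type: text/plain' \
     http://127.0.0.1:1969/web
```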
Integrations and dependencies
- Zotero translators repo (upstream) — translators should be merged; PR review timelines are external.
- Wikimedia translation-server / Citoid pipeline — Citoid uses a translation-server set. After a translator is available upstream (or via temporary bundle), the translation-server must pull the new translators.
- Site stability — translators depend on site HTML. For sites that redesign often or use client-side rendering with no server-side metadata, some translators may break and need patching. I will flag such sites, provide fallback options (print view, canonical pages), and document remediation.
Deliverables
- At least 50 Zotero translator JS files submitted as PRs.
- 5. What is the expected impact of your project, and how will you measure success?
Include the following points in your answer:
- Milestones and progress tracking
- Project impact and success metrics
Milestones & progress tracking
- Prep (Nov 1–3): final 50 list, dev env, tracking board. Track: “list done” checkbox.
- Prototype (Nov 4–10): 3 prototype translators + test harness. Track: PRs opened (3), local tests green.
- Bulk development (Nov 11–Dec 31): ~5 translators/week. Weekly tracking entries showing: translators completed, in QA, PRed, merged. Use a GitHub project or spreadsheet as dashboard.
- Integration and E2E (depending on Zotero/Citoid reviewers/maintainers): request translation-server pull; run VE E2E tests (2 URLs/site). Track: translation-server ticket number.
- Wrap-up: docs + final report. Track: docs published and report uploaded.
Tracking cadence: weekly progress update on the tracking board with PR links, test matrix status, and blockers.
Project impact & success metrics
- Deliverable count: 50 translators submitted (PR links or documented bundle). — Measured by PR list / bundle URL.
- Functional acceptance: ≥90% (45/50) of domains have PRs open with working testcases; or, once the translation-server has been updated on both ends (Zotero and Citoid), the measure becomes how many domains produce correct {{cite news}} output for 1–2 representative URLs each (fields: last, first/creators, newspaper, issn, date, url). — Measured by test matrix pass/fail.
- Integration success: the translation-server used by Citoid pulls the translators (or a confirmed temporary pull). — Measured by ticket confirmation from Zotero and Citoid maintainers.
- Upstream adoption: target ≥30 translators merged into Zotero upstream during project window. — Measured by PR merge count.
Success = 50 translators submitted, ≥45 domains pass the 1–2-URL tests, translation-server integration confirmed, test evidence provided, and basic community validation collected.
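Concretely, the per-domain pass/fail check compares generated output against the expected template fields. A hypothetical sketch of that mapping (the helper name and field order are illustrative, not Citoid’s actual code):

```javascript
// Hypothetical check helper: map a Zotero-style newspaperArticle item
// to the {{cite news}} wikitext the test matrix expects. This mirrors
// Citoid's field mapping only loosely, for illustration.
function toCiteNews(item) {
  const c = item.creators && item.creators[0];
  const parts = [
    c && `last=${c.lastName}`,
    c && `first=${c.firstName}`,
    `title=${item.title}`,
    `newspaper=${item.publicationTitle}`,
    `date=${item.date}`,
    item.ISSN && `issn=${item.ISSN}`,  // only emitted when the site lists an ISSN
    `url=${item.url}`,
  ].filter(Boolean);
  return `{{cite news |${parts.join(" |")}}}`;
}
```

A domain passes when the generated string matches the hand-checked expected citation for each of its representative URLs.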
- 6. Who is your target audience, and how have you confirmed there is demand for this project? How did you engage with the Wikimedia community?
Include the following points in your answer:
- Project demand and target audience description
- Links to interaction(s) with Wikimedia community
- Evidence from community consultation such as the [Community Wishlist]
I conceived this project personally and will be working on it single-handedly. I confirmed demand through firsthand experience as an active Wikipedian, regularly encountering broken Nigerian news citations. After deployment I may also request feedback from editors, targeting initial confirmations or bug reports from ≥5 active Nigerian editors within 2 weeks post-deploy.
For context, I have successfully PR'd and had my changes merged for two Nigerian newspapers; see here: Pull Request #3459 and Pull Request #3460.
- 7. How will your team predict and manage potential user security and privacy risks, and what risks do you currently see?
Include the following points in your answer:
- The level of in-house or consulted security and privacy expertise you will have available to you during delivery of this project
- How your development, testing, and deployment processes mitigate the introduction of unnecessary security or privacy risks
This project works only with public article metadata (no private data collection). Identified risks are low and mitigated by sandboxing, sanitising test artefacts, secure code practices, and coordination with Zotero/Citoid maintainers.
- 8. Who is on your team, and what is your experience?
Include the following points in your answer:
- Your experience as a developer, relevant past projects
- Wikimedia SUL (developer), Gerrit, Github, Gitlab or other relevant public account handles
- Other team members, their roles and expertise
I am the sole person working on this project.
As a developer, I have developed a userscript on English Wikipedia which allows users to detect unattributed translations from other language Wikipedias: User:Vanderwaalforces/checkTranslationAttribution.js
I run a wiki bot, VWF bot, on English Wikipedia. The bot runs on Wikimedia Toolforge, fetching AI-generated images uploaded on commonswiki and used in enwiki articles.
I have successfully PR'd and gotten merged translators for 2 Nigerian newspapers, see here: Pull Request #3459 and Pull Request #3460.
Wikimedia SUL (developer): Vanderwaalforces / vwf
Github: [1]
- 9. How will the project be maintained long-term?
Include the long-term maintenance plan with maintainer(s) in your answer. If you expect the long-term maintenance to incur expenses, please list those and the plan for long-term expense coverage.
Primary maintainer (short term)
- Myself — primary maintainer during the grant window and for an initial 3-month handover period after deployment to triage breakages.
Long-term maintainers (after report)
- Zotero upstream maintainers: accept/merge translator PRs — once translators are merged upstream, Zotero community reduces maintenance burden.
- Citoid/translation-server operators: handle server pulls and advise on deployment; will be contacted for any production updates.
Expected long-term expenses
- My time: after all upstream merges happen, I will monitor breakages for about 3 months; any resulting expenses will be requested via a microgrant if needed.
- 10. Under what license will your code be released, and how will you ensure the product is well documented?
Include the following points in your answer:
- Code license and compatibility with Wikimedia projects
- Documentation plan
- Code license: I will release all Zotero translator files under GNU Affero General Public License v3 (AGPL-3.0 or later) and include a license header in each translator file. Zotero’s developer docs explicitly encourage using AGPL (the same license used by Zotero) so translators can be distributed and modified by the Zotero project.
- Why AGPL: AGPL is the Zotero-recommended license and matches the translation-server / Zotero codebase licensing, avoiding distribution/compatibility problems when translators are pulled into Zotero/translation-server and consumed by Citoid. (Translation-server and Zotero core use AGPLv3 in their repositories.)
- Compatibility note: using AGPL for translators is the recommended path to ensure Zotero/translation-server can legally distribute the code consumed by Citoid; the on-wiki documentation will explain how translators map to Citoid/{{cite news}} behaviour so Wikimedia ops can safely integrate them.
- Documentation plan:
- Per-translator header: each translator will include a top-of-file license header, brief description, contributor name, supported domain(s), detectWeb() logic notes, and example usage.
- Test cases included: every translator PR will include 2 representative HTML testcases.
- On-wiki user-facing docs: a concise on-wiki guide showing how VisualEditor/Citoid uses translators already exists (Creating Zotero translators); I will add instructions to the project page on how editors can report broken sites.
- PRs: every change will be delivered via small GitHub PRs.
- Maintenance notes: a one-page “how to fix a translator” checklist (selectors to check, where to find JSON-LD/OpenGraph meta, how to update testcases) so volunteers can triage breakages quickly.
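For reference, Zotero embeds these testcases directly in each translator file between BEGIN/END TEST CASES markers. A minimal fragment with placeholder URL and field values (abridged; a real testcase lists every extracted field) might look like:

```javascript
/** BEGIN TEST CASES **/
var testCases = [
  {
    "type": "web",
    "url": "https://example.ng/news/some-story",  // placeholder URL
    "items": [
      {
        "itemType": "newspaperArticle",
        "title": "Example headline",
        "publicationTitle": "Example Daily",
        "date": "2025-11-01",
        "creators": [
          { "firstName": "Ada", "lastName": "Obi", "creatorType": "author" }
        ]
      }
    ]
  }
];
/** END TEST CASES **/
```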
- 11. Will your project depend on or contribute to third-party tools or services?
Yes.
- Depend on: Zotero translators repository and translation-server (for running translators).
- Depend on: GitHub (forks, PRs, issue tracking).
- Depend on: Docker (translation-server local testing) and minimal Node/npm tooling for test runners.
- Contribute to: Zotero translators upstream (PRs) and public GitHub repo with translators, tests, and docs.
- 12. Is there anything else you’d like to share about your project? (optional)
Budget
- 13. Upload your budget for this proposal or indicate the link to it. (required)
- 14. and 15. What is the amount you are requesting for this proposal? Please provide the amount in your local currency. (required)
7,664,497.50 NGN
- 16. Convert the amount requested into USD using the Oanda converter. This is done only to help you assess the USD equivalent of the requested amount. Your request should be between 500 - 5,000 USD.
4998.95 USD
- We/I have read the Application Privacy Statement, WMF Friendly Space Policy and Universal Code of Conduct.
Yes
Endorsements and Feedback
Please add endorsements and feedback to the grant discussion page only. Endorsements added here will be removed automatically.
Community members are invited to share meaningful feedback on the proposal and include reasons why they endorse the proposal. Consider the following:
- Stating why the proposal is important for the communities involved and why they think the strategies chosen will achieve the results that are expected.
- Highlighting any aspects they think are particularly well developed: for instance, the strategies and activities proposed, the levels of community engagement, outreach to underrepresented groups, addressing knowledge gaps, partnerships, the overall budget and learning and evaluation section of the proposal, etc.
- Highlighting if the proposal focuses on any interesting research, learning or innovation, etc. Also if it builds on learning from past proposals developed by the individual or organization, or other Wikimedia communities.
- Analyzing if the proposal is going to contribute in any way to important developments around specific Wikimedia projects or Movement Strategy.
- Analysing if the proposal is coherent in terms of the objectives, strategies, budget, and expected results (metrics).
This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds, where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.
