Learning patterns/Extracting usernames listed on a wiki page

From Meta, a Wikimedia project coordination wiki
This page is currently a draft. More information pertaining to this may be available on the talk page.



Extracting usernames listed on a wiki page
Problem: You need to extract a clean list of usernames from a wiki page.
Solution: Copy and paste the list from the page into this Google Sheet.
Creator: Abittaker (WMF)
Created on: 3 June 2016


What problem does this solve?

Copying usernames from a wiki page, or from that page's wiki markup, is time-consuming because the extra characters between the usernames must be removed before the names can be used with the Magic Button, with other tools such as X! Tools' range contributions, or in a SQL query.

What is the solution?

This Google Sheet parses the names into a clean list that can be pasted into a metrics tool or saved as a .csv file. Make a copy of the Google Sheet for yourself (so the original remains intact) and paste the text that contains the usernames into your copy. You can copy names either from the text of a page:

[Screenshot: an event page, captioned as the event page]

or from the wiki markup of that page, found under the "Edit source" tab:

[Screenshot: the wiki markup of the same event page, captioned as the source]

The function in the Google Sheet should be able to parse the usernames from either kind of text. It is designed (perhaps imperfectly) to extract usernames from whatever characters surround or separate them.
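The spreadsheet's own regular expression is not reproduced on this page. As a rough illustration of the same idea, here is a minimal Python sketch that assumes the input is wiki markup in which usernames appear as [[User:Name]] or [[User talk:Name]] links. The pattern, the helper names normalize and usernames_from_markup, and the normalization rules are hypothetical, not taken from the sheet, and copied plain page text is not handled.

```python
import re

# Hypothetical pattern (not the spreadsheet's): matches [[User:Name...]] and
# [[User talk:Name...]] links. The captured name stops at "|", "]", "/", "#"
# or a newline, so [[User:Name/sandbox]] yields "Name".
USER_LINK = re.compile(r"\[\[\s*user(?:[ _]talk)?\s*:\s*([^|\]/#\n]+)", re.IGNORECASE)


def normalize(name: str) -> str:
    """Assumed MediaWiki-style cleanup: underscores and runs of whitespace
    become single spaces, and the first letter is uppercased."""
    name = re.sub(r"[\s_]+", " ", name).strip()
    return name[:1].upper() + name[1:]


def usernames_from_markup(wikitext: str) -> list[str]:
    """Return unique usernames from User:/User talk: links, in first-seen order."""
    seen: dict[str, None] = {}  # dict keys keep insertion order and drop duplicates
    for match in USER_LINK.finditer(wikitext):
        name = normalize(match.group(1))
        if name:
            seen.setdefault(name, None)
    return list(seen)


if __name__ == "__main__":
    # Hypothetical sample markup, for illustration only.
    sample = (
        "* [[User:Abittaker (WMF)|Abittaker]] ([[User talk:Abittaker (WMF)|talk]])\n"
        "* [[user:eGalvez_(WMF)]]"
    )
    # Prints one name per line, ready to paste into a metrics tool.
    for name in usernames_from_markup(sample):
        print(name)
```

Deduplicating through dictionary keys keeps the names in the order they first appear on the page, which makes the output easy to compare against the original list by eye.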

Full link: https://docs.google.com/spreadsheets/d/1f_Sk-2tBieysGT_BTQB693W7BxIcL2bQuS7Lw3KVtHE/edit?usp=sharing

Things to consider

I made this for my own use and tested it against as many usernames as I could. However, usernames vary widely, so please check the formula's output to make sure it is correct. If you find a line that parses incorrectly, please leave a note on the talk page so I can try to fix it.

The regular expression that parses the text can be found inside the function in the spreadsheet.

Different tools also need usernames presented in different ways. Wikimetrics works with bare usernames, without any mark-up, but Quarry, for example, needs them formatted differently. You can edit the spreadsheet to add the mark-up each tool needs; e.g., here is an example of usernames edited for Quarry queries.
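One common Quarry format is a SQL list of quoted strings. The sketch below assumes the clean names are destined for a MariaDB IN (...) clause, such as WHERE actor_name IN (...) on the Wiki Replicas, and that names are stored with spaces rather than underscores, as in MediaWiki's actor table. quarry_in_list is a hypothetical helper, not part of the spreadsheet.

```python
def quarry_in_list(usernames: list[str]) -> str:
    """Render usernames as a parenthesized list of SQL string literals.

    Assumes MariaDB's default quoting rules: a backslash is an escape
    character, so backslashes are doubled first, then each single quote
    is doubled.
    """
    if not usernames:
        # "IN ()" is not valid SQL, so refuse an empty list.
        raise ValueError("need at least one username")
    literals = []
    for name in usernames:
        escaped = name.replace("\\", "\\\\").replace("'", "''")
        literals.append(f"'{escaped}'")
    return "(" + ", ".join(literals) + ")"


if __name__ == "__main__":
    names = ["Abittaker (WMF)", "EGalvez (WMF)"]
    # Hypothetical query shape; adjust the table and columns to your needs.
    print("SELECT actor_id, actor_name FROM actor WHERE actor_name IN "
          + quarry_in_list(names) + ";")
```

Escaping is done by hand because Quarry accepts plain SQL text rather than parameterized queries; names containing apostrophes are the case most likely to break a hand-typed list.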

Related patterns


Endorsements

  • I'm going to use this right now. Thought you should know! :) EGalvez (WMF) (talk) 20:48, 26 July 2016 (UTC)