Talk:Wikimedia Rust developers user group

From Meta, a Wikimedia project coordination wiki

Getting started

Welcome! Are there other goals people would like to accomplish or tasks to work on? What would you like to see this group do? Legoktm (talk) 07:13, 12 February 2021 (UTC)

@Legoktm Is this a Wikimedia user group? If so, has this group applied to AffCom for recognition yet? Thanks. --MarcoAurelio (talk) 11:03, 2 March 2021 (UTC)
@MarcoAurelio: that's the plan. I was going to wait until after our first meeting (which I'm behind on scheduling...) before starting the application process, though. Legoktm (talk) 18:08, 2 March 2021 (UTC)
@Legoktm Thanks for your reply. I'll categorise the main page accordingly then. Makes sense to wait until the first meeting :-) Good luck. --MarcoAurelio (talk) 18:19, 2 March 2021 (UTC)

First meeting

I'd like to have our first realtime meeting around the end of February (in about 2 weeks). Is there a preference for a video/audio call like on https://meet.wmcloud.org/? Or would people be more comfortable with doing it on IRC? Any day of the week/timezone preference? Legoktm (talk) 07:18, 12 February 2021 (UTC)

I'm OK with both. DCaro (WMF) (talk) 08:27, 12 February 2021 (UTC)
Personally, I'd prefer a video call over IRC. --Magnus Manske (talk) 12:41, 17 February 2021 (UTC)

Advertising this group

So far at:

Other places? @Enterprisey: maybe there are some Rust channels you're in (I recall something about Discord) you could spread it in? Legoktm (talk) 07:54, 12 February 2021 (UTC)

Yeah; they're primarily about Rust, but I'll certainly post where I can. Enterprisey (talk) 23:35, 12 February 2021 (UTC)
Perhaps we could create a server just for this. Firestar464 (talk) 02:45, 23 February 2021 (UTC)

Remote Hackathon

Anybody interested in collaboration? Max Semenik (talk) 16:13, 1 April 2021 (UTC)

I want to work on dump processing with data analytics tools. Stealth project on GitHub: Wikidumptools, spoiler:
[Screenshot: wikidumptools search]
Any form of collaboration or general talk about Rust would be great. --Count Count (talk) 10:39, 11 May 2021 (UTC)
@MaxSem: FYI --Count Count (talk) 13:58, 11 May 2021 (UTC)

What are you working on? (January 2022)

I vastly underestimated how much time I'd have to contribute to organizing this group last year, my apologies. Let's try to kick off 2022 properly :)

So, what are you working on related to Rust this month? Do you want help with something, or are you looking for something to do? Want some feedback or just want to show off? Legoktm (talk) 06:27, 12 January 2022 (UTC)

I'll start! This month I'm continuing to port some of my Python/PHP bots to Rust [1], [2]. I've also been hacking away at automatically combining API queries; see this ticket for details. I want to keep improving that and would welcome help/contributions. Legoktm (talk) 06:33, 12 January 2022 (UTC)
I am currently investigating how Rust can improve the archiving and bare ref citation fixing I do on enwiki.
One thing I would like to know is whether there are any tools for processing the database dumps in Rust. I use the database dumps for both tasks, and I expect a Rust tool would be faster than my current setup.
Another thing that would be interesting is same-page concurrency: for example, taking one page and analyzing multiple links at the same time instead of going through them in order, one after another. Figuring out how to do that could be helpful.
Enterprisey has also been very understanding of my work, which I appreciate. When my Rust skills improve I'll have to start contributing. Rlink2 (talk) 03:35, 13 January 2022 (UTC)
Rlink2, for the dumps, I personally use a fork of the parse_wiki_text crate and the partial XML dump parser crate and my SQL dump parser crate. I also have a crate that translates the XML dump into CBOR and other formats, which could be refactored into a general XML dump parser. My crate grabs all fields from the XML dump files, at least pages-articles.xml and pages-meta-current.xml (not sure if I've tested it on pages-meta-history.xml), whereas the partial XML dump parser retrieves only the more interesting fields, like the title and wikitext. If you find parse_wiki_text useful, maybe I could publish a new crate on crates.io, because the original maintainer has disappeared, and anybody interested in the group here can be added as maintainers so that hopefully it's never abandoned again. Erutuon (talk) 19:27, 13 January 2022 (UTC)
Republishing the parse_wiki_text crate seems like a good idea since people are clearly using it. You're more than welcome to add it to the mwbot-rs organization to help increase the bus factor. Though maybe we can come up with a better name that doesn't have so_many_underscores :)
Do you also have a copy of the parse_mediawiki_dump repository? If not, we can download it from crates.io and import that into Git. Legoktm (talk) 08:24, 14 January 2022 (UTC)
I do have a copy of parse_mediawiki_dump and it's all in a fork on GitHub. Erutuon (talk) 04:49, 15 January 2022 (UTC)
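To give a feel for what streaming over an XML dump looks like, here is a minimal sketch that uses only the standard library rather than any of the crates mentioned above. It assumes each <title> element sits on a single line, which is how the export format is usually laid out, and it does not decode XML entities such as &amp;. The function names `title_in_line` and `page_titles` and the sample input are made up for illustration; a real XML parser is the more robust choice.

```rust
use std::io::{self, BufRead, Cursor};

/// Returns the text between `<title>` and `</title>` if both appear on `line`.
/// Assumption: the whole element is on one line; entities are left undecoded.
fn title_in_line(line: &str) -> Option<&str> {
    let start = line.find("<title>")? + "<title>".len();
    let end = line[start..].find("</title>")? + start;
    Some(&line[start..end])
}

/// Streams `reader` line by line and collects every page title found,
/// so the whole dump never has to be held in memory at once.
fn page_titles<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut titles = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if let Some(title) = title_in_line(&line) {
            titles.push(title.to_string());
        }
    }
    Ok(titles)
}

fn main() -> io::Result<()> {
    // Tiny made-up sample in the general shape of a pages-articles.xml export.
    let sample = "<mediawiki>\n  <page>\n    <title>Rust</title>\n    <text>...</text>\n  </page>\n  <page>\n    <title>Talk:Rust</title>\n  </page>\n</mediawiki>\n";
    for title in page_titles(Cursor::new(sample))? {
        println!("{title}");
    }
    Ok(())
}
```

For a real dump, the `Cursor` would be replaced by a `BufReader` around the (decompressed) dump file.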
For concurrency you can use tokio to do something like:
let links = vec![...];
let mut handles = vec![];
for link in links {
    // Spawn a new task for each link
    handles.push(tokio::spawn(async move {
        do_something(link).await
    }));
}

for handle in handles {
    // Wait for each spawned task to finish
    let result = handle.await.unwrap();
    // Do something with each result...
}
I've been meaning to write a blog post on how best to do this in bots... Legoktm (talk) 08:19, 14 January 2022 (UTC)
As promised, I published "Building fast Wikipedia bots in Rust" on my blog which steps through building a fast+concurrent bot. Legoktm (talk) 08:34, 21 January 2022 (UTC)
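For anyone who wants to try the same spawn-then-wait pattern without adding a dependency, here is a rough sketch that swaps tokio tasks for scoped threads from the standard library (available since Rust 1.63). This is a different technique from the tokio snippet above: it uses one OS thread per link rather than async tasks, so it is only reasonable for a modest number of links. `check_link` and `check_all` are hypothetical names, and `check_link` is a stand-in for real work such as an HTTP request.

```rust
use std::thread;

/// Hypothetical stand-in for real per-link work (e.g. an HTTP request).
/// Here it just returns the length of the link text.
fn check_link(link: &str) -> usize {
    link.len()
}

/// Runs `check_link` on every link in its own scoped thread and
/// returns the results in the same order as the input.
fn check_all(links: &[&str]) -> Vec<usize> {
    thread::scope(|s| {
        // Spawn every thread first so the work overlaps...
        let handles: Vec<_> = links
            .iter()
            .map(|&link| s.spawn(move || check_link(link)))
            .collect();
        // ...then wait for each one in turn.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let links = ["https://example.org/a", "https://example.org/bb"];
    let results = check_all(&links);
    for (link, result) in links.iter().zip(&results) {
        println!("{link} -> {result}");
    }
}
```

Collecting the handles before joining matters here: joining inside the same loop that spawns would run the links one at a time.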
I might finally make a search engine for translations in English Wiktionary. Various people have mentioned various translation-related searches that they'd like to do, like finding all translations to a given language, but it hasn't been possible in any kind of systematic way, though I do have a template search engine that can sometimes kind of work. This would involve parsing the translation sections in English entries from the XML dump, figuring out a database schema, and writing an executable to generate it when each dump comes out. Translation sections contain a translation header template displaying one of the definitions of the English word and under it various templates that link to non-English words with that meaning. I will start using my program that generates the CBOR template dumps. Erutuon (talk) 19:27, 13 January 2022 (UTC)
The database is pretty well filled out and the website (source) has a couple of search queries available (translations using a given language code, all translations listing a given word). Erutuon (talk) 03:06, 23 January 2022 (UTC)
Hello, I am here because Legoktm pinged me via IRC. (Thank you!) Unfortunately, I did not have any particular purpose when I joined the IRC channel; I had just heard that Rust is something new and good. That's why I'm posting so late. But just now I've decided to write a bot for creating pages on WikiApiary based on Miraheze's complete wiki list, just for fun, and I think it would be good to write my first Rust program using the mwbot-rs library instead of Python or Node. Lens0021 (talk) 10:48, 29 January 2022 (UTC)