
Africa Growth Pilot/Online self-paced course/Module 2/Contributing to Wikisource

From Meta, a Wikimedia project coordination wiki

But how did this text get here? How can we, as volunteers, contribute to getting more texts here? The answer is that there is a beginner's guide linked here, and there's also a video tutorial linked here. I encourage you to read or watch, whichever you prefer, and jump right in.

I will give a very quick demonstration of what that looks like. You can see here on Wikisource that there is a current collaborations box. I'm going to zoom in a bit. In this box there is a monthly challenge, and we can click on it to be taken to the October 2023 monthly challenge. We can see that 72 different works, 72 different books or texts, are part of this month's challenge. There are deliberately a lot of them so that we can all find something that interests us.

These 72 works contain almost 26,000 pages, so there's plenty of work for everyone. If we scroll down, we can see short texts, under 50 pages, that are awaiting proofreading; for example, a text by the famous economist Thomas Malthus. But we can also just look at whatever is new this month and waiting to be proofread.

What should we pick? Ooh, The Necromancer, or The Tale of the Black Forest! That sounds nice. So we click this text and are taken to an index page, a page that tells us how much of this work is already done. You can see that all 227 of its pages are red: nobody has even started proofreading this text. You can click on any of these red pages. Maybe you'll want to start with the first page, but you can really click on any one, say page number five, and you are taken to the proofreading interface. I'm going to zoom out a little bit so you have a better view. This interface shows, side by side, the scan of the text on the right and the computer-recognized output on the left.

You can see here, for example, that to a human looking at this scan, it is quite obvious that the word "necromancer" at the top is not part of the text. It's the title of the book, and it appears at the top of every page. We know this if we've ever used a book. But the computer doesn't know this, so it included it as part of the recognized text. It should actually go in this box here, the header box, because that's where it belongs. The same goes for the page number. This five here is not part of the text either. It's what you might call a paratext: something that's part of the book but not really part of the *text* of the book. A page number is just an aid for the reader. So we remove these from the body and move them to the header.

We can also see some mistakes, and I'm deliberately showing you a little bit of how they're fixed. Looking at the first word, I hope we can all easily see that it is "length", the simple English word "length". But the computer failed to recognize it and produced the non-existent word "tenoth" instead. That's clearly wrong. We have an edit box right here, so we just change this T and this O, and there we have the word "length". After that we have the word "he". That's correct.

But then we have this funny word. What is this word? It's not "alfo", as the computer thought. It's "also", the simple English word "also". It's just that this 18th-century text still uses the old-style "long s" (ſ) in the middle of a word. You can see it elsewhere too, in this word "first" and in this word "still". Here you have the word "ascertain", with letters that look like an "f" but are actually an "s".

So if we know and understand this, we can correct all of it. We can correct this "alfo" to "also", and this "moft beloved" to "most beloved", and so on. This is the kind of thing the computer is not smart enough to do. In this particular example, which I really did pick at random, there's quite a lot to correct. Sometimes there will be almost nothing to correct. Either way, we compare the text the computer has created against the scan and correct it to match what the scan says.

Once we have gone through this whole text, which I'm not doing now in order to save time, we can move this little colored control here to the yellow position. You can see that the yellow position means "proofread". Remember, the page was red. Once I mark it yellow, it means it has been proofread. Why yellow and not green? Because each page on Wikisource is supposed to be checked by at least two people. One person proofreads and corrects everything they see; then a second person goes over the proofread text, makes sure everything looks good, and validates it, turning it green. That way we achieve better quality for the texts. Since I don't want to spend the time actually correcting this whole page now, I'm not going to mark it.

After you have corrected the whole page, you move it to the yellow state, click "Publish page", and that's it, you're done. Then you can move on to the next page, so you can work page by page. If some of you are friends or want to edit together, you can each take a page and together make shorter work of proofreading the book. And as I said, just this month there are 72 books to work on, and there are thousands and thousands more waiting for proofreading on Wikisource. I'm sure you can find material that interests you, so that you can enjoy your work.

All this takes is some patience and attention to detail. You don't have to worry about citing sources; you don't have to worry about notability or deletions. The text is given, and all you need to do is make sure it is correct after the computerized recognition. In terms of atmosphere, remember I mentioned that different projects have different atmospheres. Wikisource is a very chill, relaxed project. There's very little conflict or friction, because there's very little to argue about on Wikisource. So if that sounds like something you want to pursue, watch this tutorial or read the beginner's guide.