Community Wishlist Survey 2022/Archive/Croping pictures from PDF & DjVu files in Wikisource without needing to re-upload them

From Meta, a Wikimedia project coordination wiki

Cropping pictures from PDF & DjVu files in Wikisource without needing to re-upload them

Functionality exists

Discussion

I don't understand computer programming any more (I was a whizz at it in the 1980s), so sorry if this is a silly proposal. When I upload a DjVu or PDF file to Commons, it contains every image in the book. Is there any way that the images could be extracted and cropped automatically from the pages, under the same licence and description as the book upload? 03:44, 23 January 2022 (UTC)

Good proposal. I am not aware of any existing automatic cropper; CropTool does crop PDFs and DjVus, but with help from the user. Full-page images can be linked to like so: [[File:image name|thumb|page=pageno]]. The steps would be: 1. figuring out where the images are, 2. cropping out the empty space and text, 3. uploading the image.
This is how I see it, though there are probably other methods: 1. The machine roughly knows where the images are thanks to OCR. 2. ImageMagick, which we use, can automatically crop out an empty area. 3. Re-uploading with the same licence is definitely possible.--Snævar (talk) 10:10, 23 January 2022 (UTC)
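To illustrate step 2 above, here is a minimal Python sketch of cropping away the empty margin around an image. It is not how CropTool or ImageMagick work internally; it only mimics the general idea behind a trim operation on a tiny grayscale page held as a list of rows. The names trim_bounds and crop, the white background value of 255, and the tolerance of 10 are all assumptions made up for this example.

```python
from typing import List, Optional, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def trim_bounds(pixels: List[List[int]], background: int = 255,
                tolerance: int = 10) -> Optional[Box]:
    """Bounding box of every pixel that differs from the background
    by more than `tolerance`. Returns None for an empty page."""
    def is_ink(value: int) -> bool:
        return abs(value - background) > tolerance

    # Rows and columns that contain at least one non-background pixel.
    rows = [y for y, row in enumerate(pixels) if any(is_ink(v) for v in row)]
    if not rows:
        return None
    width = len(pixels[0])
    cols = [x for x in range(width) if any(is_ink(row[x]) for row in pixels)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop(pixels: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the given box out of the page."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


# Hypothetical 5x4 "scanned page": white margin around a small dark image.
page = [
    [255, 255, 255, 255, 255],
    [255, 250,  40,  60, 255],
    [255,  30,  20, 255, 255],
    [255, 255, 255, 255, 255],
]
box = trim_bounds(page)
if box is not None:
    print(box, crop(page, box))
```

The tolerance is there because scans are rarely pure white, so near-white pixels such as 250 are treated as background unless they fall inside the box anyway. A real page would also need step 1 (telling pictures apart from text), which this sketch does not attempt.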
And oh, AlwynapHuw, please fill in the proposal. This wishlist gets translated into languages other than English, and that translation does not include the discussion.--Snævar (talk) 10:11, 23 January 2022 (UTC)
  • @AlwynapHuw: As Snævar says, extracting images from PDFs and DjVus can be done with the CropTool. This makes sure that the resulting files have the correct metadata. It doesn't detect the bounds of the images automatically, but that's usually a manual process that requires some editorial control. --SWilson (WMF) (talk) 05:35, 25 January 2022 (UTC)