Community Wishlist Survey 2023/Multimedia and Commons/Add status messages to assembling and publishing stages of upload process


Add status messages to assembling and publishing stages of upload process

Problem: Upload to Commons consists of three stages: uploading, assembling, and publishing. During the assembling and publishing stages (which can take minutes each), the server does not send progress reports to the upload tool (such as Upload Wizard).
Proposed solution: The upload tool can query the server for progress reports (JSON) during assembling and publishing.
Who would benefit: Users, uploaders, and developers, who would get better error reports for failed uploads.
More comments:
Phabricator tickets: T309094
Proposer: C.Suthorn (talk) 17:44, 28 January 2023 (UTC)
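For illustration only, the proposed solution could look roughly like this from the upload tool's side. This is a minimal sketch, not an existing API: the `step` and `percent` fields, the `ProgressReport` class, and the `parse_progress` and `describe` helpers are all hypothetical names invented here to show what a JSON progress report during assembling or publishing might carry.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressReport:
    """Hypothetical progress report for one poll of a running upload."""
    stage: str                       # e.g. "assembling" or "publishing"
    step: Optional[str] = None       # hypothetical: what the server is doing right now
    percent: Optional[float] = None  # hypothetical: progress within the stage


def parse_progress(raw: str) -> ProgressReport:
    """Parse a JSON progress report; missing fields are simply left empty."""
    upload = json.loads(raw).get("upload", {})
    return ProgressReport(
        stage=upload.get("stage", "unknown"),
        step=upload.get("step"),
        percent=upload.get("percent"),
    )


def describe(report: ProgressReport) -> str:
    """Build a one-line status message an upload tool could show to the user."""
    parts = [report.stage]
    if report.step:
        parts.append(report.step)
    if report.percent is not None:
        parts.append(f"{report.percent:.0f}%")
    return ", ".join(parts)


# Example payload (assumed shape, not taken from a real server response).
example = '{"upload": {"stage": "assembling", "step": "concatenating chunks", "percent": 23}}'
print(describe(parse_progress(example)))
```

With a report of this kind, a failed upload could at least be filed with the last step that was reported before the failure.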

Discussion

This already exists, and can be retrieved via the API: mw:API:Upload#Additional notes. -FASTILY 22:37, 28 January 2023 (UTC)

No. What you are linking to are the results of a (failed) upload.

"getting EXIF from file"

"inserting XXX in database table 1"

"inserting XXX in database table 2"

"checking for malicious code in uploaded file"

"adding file to database"

"updating file counter"

"writing EXIF to database"

"creating file description page"

"moving uploaded file to file system"

"updating "patrolled" entry"

or whatever "assembling" and "publishing" actually do, while you wait 5 minutes for your upload to appear, until it does not for some reason. C.Suthorn (talk) 00:41, 29 January 2023 (UTC)

You should spend some time familiarizing yourself with how chunked uploads work. "publishing" and "assembling" are meaningful statuses. A robust API client will be making use of this endpoint. -FASTILY 01:15, 29 January 2023 (UTC)

I should not need to do this. The upload process should work. It is not meaningful if the upload is stuck for 5 minutes in "assembling" and, when you poll for information, you get neither a progress report like "7%, 23%, 51%" nor any information about what the server is actually doing. Only ever "assembling", "assembling", "assembling". BTW: Have you tried in the last 6 months to upload a file of 600MB, 1.2GB or 4GiB with the Upload Wizard, Rillke's tool and your own upload tool? Did you succeed on the first try, the second try, at all? C.Suthorn (talk) 12:23, 29 January 2023 (UTC)

This feels like a solution in search of a problem? Error reports need a unique identifier for the upload process that can be correlated with system logs (not sure if this exists but you should at least get a unique error ID which is close enough), not random status labels. --Tgr (talk) 00:43, 1 February 2023 (UTC)
@Tgr have you uploaded a file of more than 1.2GB in the last 6 months? About "unique identifiers": In the past I have created phab tasks with the available identifiers from failed uploads, but either these are not helpful, or there is no interest in fixing uploading to MW. C.Suthorn (talk) 08:19, 1 February 2023 (UTC)
@C.Suthorn let me rephrase: this suggestion feels like the politician's syllogism to me. Commons large file upload is broken; here is a random feature that wouldn't make it any less broken; it is something so we must do it. There are all kinds of things that might help; I'm not convinced that trying to get the user interested in what specific technical steps are being taken in the background is one of those things. (A single end-to-end progress bar is nice, when the total timespan of the operation can be reasonably well estimated. For file uploads, that's probably not the case.) Tgr (talk) 20:19, 1 February 2023 (UTC)

Wrt phab tasks, I think it's mostly the latter: it isn't anyone's job to deal with upload bugs, and it's both more complicated and arguably less productive than other kinds of technical improvements so few people spend time on it. Figuring out the immediate reason an upload failed is usually not hard (and often something mundane along the lines of "the file was too big and something timed out / ran out of memory"). Fixing uploads that ended up in some half-broken state does tend to be hard, but that would require an entirely different kind of logging. Tgr (talk) 20:24, 1 February 2023 (UTC)
I had asked whether you have uploaded a very large file recently. Apparently only 15 users in total uploaded webm, ogv or tif files from 4.2GB to 4.0GiB between 2017 and 2023. Half of those were before 2020, and some were done via server-side upload. Since 31 March 2022, only @PantheraLeo1359531 and I have uploaded such files: I with my own tool, PantheraLeo1359531 with bigChunkedUpload (and it must have been agony to upload 10 parts of a video, because bigChunkedUpload only ever uploads one file at a time, if it works at all). Why are such files hardly ever uploaded? Because it works with almost no tool (most recently not even via server-side upload, see @Urbanec). One reason is probably timeouts, deadlocks and livelocks. But these keep occurring with smaller files as well. And when that happens, the user (often a newcomer) usually gives up, and a possibly important file is lost to Commons forever. Yes, it would be nice if it were possible to then call a developer who looks into the logs, finds the problem, identifies the cause and then fixes it, or at least documents it in phab so that some developer does so later. Unfortunately, my experience (which I have paid for with a lot of my own lifetime) says that it does not work that way. Of course most users could not care less about what goes wrong and which error messages are displayed. But not all of them. And if a message from the assembling or publishing stage is then reported, there is at least a starting point that developers can work with, even when the logs are long gone. Everyone will benefit, because these errors occur with small files too, much more rarely, but they do occur. C.Suthorn (talk) 22:02, 1 February 2023 (UTC)

The stages you mentioned above (well, the ones that actually exist, and it depends how loosely I interpret what you are saying) should already be reported by the API. E.g. you would get an error code of stashfailed, but the error info would be different depending on the cause. I suspect you never see them, because that is not where the upload is failing. Anyways, chunked upload is really fragile and could use lots of love. Bawolff (talk) 06:37, 2 February 2023 (UTC)

There are three stages (or four, if you count "queueing" as a stage): "uploading", "assembling" and "publishing". While "uploading" you can poll for information and get the number of bytes already uploaded. If you poll while "assembling" or "publishing" you only get "assembling" or "publishing" as the answer. After "assembling" has finished you get the metadata. If the upload fails for a valid reason ("user has no right", "database is down"), then you get this reason and that is fine. But if the upload fails even though it should have succeeded (and it actually will succeed if you try once more, with luck, or a thousand times more, with less luck), you don't get any status at all. You say chunked upload is really fragile. I don't think so. I think it actually is very stable, but it fails under specific circumstances. As I have written before: Upload Wizard will succeed with files of up to 600MB, it may succeed with 600MB to 1.2GB, and it will always fail with 1.2GB+. Rillke's tool (and my own tool) will (at the moment) upload 4.0GiB in most cases on the first try (because chunked upload is actually stable). However, both in Rillke's tool and in my tool you can see that "assembling" and "publishing" both take minutes, and while you are waiting you only get the status "still assembling"/"still publishing" until it either fails without any status message or it finishes with either a published file or an error like (filename not allowed, already uploaded, user has no right, ...).

And I repeat: Upload a file of 4GiB (or at least more than 1.2GB) and see what happens. It will succeed with Rillke's tool and with mine, but fail with every other tool without any information why it failed.

And then: Why am I actually discussing this? My own uploads work, so what do I care if the uploads of other users fail? C.Suthorn (talk) 09:52, 2 February 2023 (UTC)
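The polling behaviour described in this thread can be sketched as follows. This is a hedged illustration rather than any tool's actual code: it assumes each status check returns a dictionary shaped like `{"upload": {"result": "Poll", "stage": "assembling"}}` while work is in progress, and anything other than `"Poll"` once the upload has finished or failed. The function name `poll_upload` and the `fetch_status` callable (which would wrap the real API request) are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


def poll_upload(
    fetch_status: Callable[[], Dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Poll until the server stops answering "Poll" or the timeout expires.

    Returns the final result plus the history of (result, stage) pairs seen,
    so a client can report the last known stage when an upload fails.
    """
    history: List[Tuple[str, str]] = []
    deadline = clock() + timeout
    while clock() < deadline:
        upload = fetch_status().get("upload", {})
        result = upload.get("result", "")
        stage = upload.get("stage", "")
        history.append((result, stage))
        if result != "Poll":
            # Finished: either success or an error reported by the server.
            return result, history
        sleep(interval)
    # The server kept answering "Poll" for the whole timeout window.
    return "Timeout", history


# Usage with canned responses instead of a real server (assumed shapes).
responses = iter([
    {"upload": {"result": "Poll", "stage": "assembling"}},
    {"upload": {"result": "Poll", "stage": "publishing"}},
    {"upload": {"result": "Success"}},
])
final, seen = poll_upload(lambda: next(responses), sleep=lambda s: None)
print(final, seen)
```

In this sketch the history only ever contains the coarse stage names, which is the limitation the proposal is about: between two polls that both say "assembling", the client learns nothing new.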

I agree with the points that C.Suthorn made. The upload process of larger files sometimes fails. For example, I upload some public domain textures that are sometimes larger (above 1 GiB) with recurring errors (server didn't respond within a timespan, ...). This sometimes takes a lot of time. And the point is that some files (especially videos and more detailed meshes) will have file sizes above 1 GiB more often in the near future, as recording systems get more capable (especially videos in 4K longer than 5-10 minutes). And it is sad that it is sometimes buggy, which leads some users to give up. I think it is very important to support uploads of larger files (for example as suggested), and also higher file size limits, to be prepared for the future. --PantheraLeo1359531 (talk) 15:23, 2 February 2023 (UTC)

Addendum: While uploading, the following error occurs: FAILED: stashfailed: Could not connect to storage backend "local-swift-codfw". --PantheraLeo1359531 (talk) 15:43, 2 February 2023 (UTC)

Voting

Support We should have status processes in uploading large files. Thingofme (talk) 03:26, 12 February 2023 (UTC)
Support Libcub (talk) 06:03, 12 February 2023 (UTC)
Support Mauricio V. Genta (talk) 07:57, 12 February 2023 (UTC)
Support Jusore (talk) 21:31, 12 February 2023 (UTC)
Support cyrfaw (talk) 13:40, 17 February 2023 (UTC)
Support Hans5958 (talk) 05:35, 20 February 2023 (UTC)
Support Althair (talk) 04:21, 23 February 2023 (UTC)