Uploadable Bots

From Meta, a Wikimedia project coordination wiki

The following is a way to implement w:Wikipedia:Bot policy. See w:Wikipedia talk:Bot policy for further discussion and comparison.

Uploadable Bot-Scripts

Process of creating script:

  1. Go to Special:Bot. Click "Create New Bot".
  2. Name the Bot and add usage description. Write or paste the script. Click "Add Bot".
  3. (Bot is added. User is sent to its page.)
  4. Click "Run Bot".
  5. (Bot's DATA section is interpreted.)
  6. Fill out prompts. Upload any required databases. Click "Continue".
  7. (Bot's CODE loop is executed)
  8. Review Change-log, click "Save Changes" or "Cancel".

The script should run asynchronously with respect to the user, who should be able to view the bot's progress by returning to the bot's page. The bot's changes should appear in articles immediately, as the bot processes its script. The execution should be both cancelable and undoable from the bot's page. A change-log should be reachable from the bot's page but should not appear in the list of recently updated pages (unless the viewer of recent updates elects to view bot activity). One entry in the list of recently updated pages should exist for each bot per hour (linking to the bot's change-log). Use of external bots should be forbidden. Rlee0001 09:47 Oct 28, 2002 (UTC)
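The "one entry per bot per hour" rule above could be sketched roughly as follows. This is only an illustration in Python, not part of the proposal: the `Edit` record, its fields, and the `hourly_entries` function are hypothetical names, and a real implementation would read from the wiki's own change tables.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Edit:
    bot: str          # hypothetical bot name
    title: str        # article that was edited
    when: datetime    # time of the edit


def hourly_entries(edits):
    """Collapse bot edits into one summary entry per (bot, hour)."""
    buckets = defaultdict(list)
    for e in edits:
        # Truncate the timestamp to the start of its hour.
        hour = e.when.replace(minute=0, second=0, microsecond=0)
        buckets[(e.bot, hour)].append(e.title)
    # One (bot, hour, edit count) tuple per bucket,
    # sorted by bot name and then by hour.
    return [
        (bot, hour, len(titles))
        for (bot, hour), titles in sorted(buckets.items())
    ]
```

Each returned tuple would correspond to a single line in the list of recently updated pages, linking to the bot's change-log for that hour.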

Idea is nice, but very dangerous. Uploading scripts... Jeronimo
I'm thinking of a proprietary language, with commands like IfExists, CreateArticle, AppendArticle, ReplaceArticle, MoveArticle, InsertIntoOrderedList, InsertIntoUnorderedList, InsertSection, UpdateProperty, OpenDatabase, CloseDatabase, MoveTo, RecordCount, MoveNext, MoveFirst, MoveLast, MovePrevious, SaveSetting, LoadSetting, OpenArticle, CloseArticle, RemoveArticle, ClearArticle, RedirectArticle, CreateRegion, MoveRegion, PositionRegion, BoldRegion, ItalicRegion, IndentRegion, InsertOrderedList, InsertUnorderedList, AddProperty, InsertCode, CodeRegion, InsertTable, InsertImage, InsertLink, UpdateLink, GetFirstLink, GetNextLink, GetFirstSection, GetNextSection, GetSectionTitle, GetLinkTarget, GetLinkText, SetLinkTarget, SetLinkText, SetSectionTitle, SetSectionText, RemoveLink, RemoveSection, RemoveTable, RemoveCode, RemoveUnorderedList, RemoveOrderedList, and so on and so on... Perhaps:
OnError {
  EmitLog ("Unable to Initialize, Quitting.");
  Return (1);
}
RecordSet CanadianCities;
Article ThisCity;
CanadianCities = OpenDatabase("CanCits.mdb", "CityTable");

ForEach (CanadianCity, CanadianCities) {
  OnError {
    EmitLog ("Skipping article: [[" & CanadianCity.CityName & ", " & _
      CanadianCity.Province & "]]. Unknown error working with article.");
    MoveNext;
  }
  IfArticleExists (CanadianCity.CityName & ", " & CanadianCity.Province) {
    EmitLog ("Skipping article: [[" & CanadianCity.CityName & ", " & _
      CanadianCity.Province & "]]. Article already exists.");
    MoveNext;
  }
  CreateArticle (CanadianCity.CityName & ", " & CanadianCity.Province);
  ThisCity = OpenArticle (CanadianCity.CityName & ", " & _
    CanadianCity.Province);
  ThisCity.AppendWikiText ("" & CanadianCity.CityName & ", " & _
    CanadianCity.Province & " has a population of " & _
    CanadianCity.Population & " and a total land area of " & _
    CanadianCity.LandArea & ".");
  ThisCity.Save ("Creating stub, added population and land area.");
  ThisCity.Close;
  MoveNext;
}
Return (0);
Just an idea. Rlee0001 21:40 Oct 28, 2002 (UTC)
This makes it a little safer, but not much; I can still upload a database with all article names and overwrite everything with "Wikipedia!" or so. Even if you let the user only fill in the last part of your code (where a new article is created), it's still kinda dangerous. In fact, I'm afraid having this possibility will attract more "vandals" than allowing only self-built bots would (because that means somebody will actually have to do some work). Jeronimo
Well, I was thinking that the execution of the bot would be "undoable" as one single edit. Obviously each individual article can also be reverted/edited, but if a bot made a mistake, as long as the bot's log were in a server-readable format, any user could revert the bot's entire run by simply opening the log for the run and clicking "Revert All" (or whatever). The only exception would be if an article were modified after the bot modified it: the undo option would have no effect on that particular article because of possible edit conflicts. In other words, each "bot" should have a "History" page which lists all the runs that the bot has made. Possibly like this:
Opening one of the logs above would give you a "Modification List" exactly like the Recently Updated Pages page, so that people can see which articles were affected, see the DIFF between article versions, and so on. And of course, Revert All becomes possible. Furthermore, because the script is server-side, the 'administration' (or whoever) can grant or deny any user access to the bot feature on a per-run basis, a per-bot basis, or indefinitely. Further, the server can enforce a limit on how many edits any given run can make. This forces users to break runs down into sections, so that if a catastrophic error were found, it would only affect a limited number of articles (say, 1000). Between runs, a bot can save settings and restore settings (like the last record number used). Rlee0001 09:23 Oct 29, 2002 (UTC)
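A rough sketch of how "Revert All" and the per-run edit limit might fit together, written in Python purely for illustration. Everything here is an assumption rather than part of the proposal: the in-memory `wiki` dictionary, the `Article` and `RunLog` types, and the function names are hypothetical, and the sketch assumes a run touches each article at most once.

```python
from dataclasses import dataclass, field

MAX_EDITS_PER_RUN = 1000  # the example limit suggested above


@dataclass
class Article:
    revisions: list = field(default_factory=list)  # list of (author, text)


@dataclass
class RunLog:
    bot: str
    # title -> index of the revision the bot wrote in this run
    touched: dict = field(default_factory=dict)


class EditLimitReached(Exception):
    pass


def bot_edit(wiki, log, title, text):
    """Record one bot edit, refusing once the per-run cap is hit."""
    if len(log.touched) >= MAX_EDITS_PER_RUN:
        raise EditLimitReached(log.bot)
    article = wiki.setdefault(title, Article())
    article.revisions.append((log.bot, text))
    log.touched[title] = len(article.revisions) - 1


def revert_all(wiki, log):
    """Undo a run; skip articles edited after the bot touched them."""
    reverted, skipped = [], []
    for title, rev_index in log.touched.items():
        article = wiki[title]
        if rev_index != len(article.revisions) - 1:
            skipped.append(title)   # someone edited later: leave it alone
            continue
        article.revisions.pop()     # drop the bot's revision
        if not article.revisions:
            del wiki[title]         # the bot created the page: remove it
        reverted.append(title)
    return reverted, skipped
```

The `skipped` list is what a "Revert All" page could show to the user as articles needing manual attention because of later edits.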
Sounds like a lot of work to implement, but a good idea! In this way, we tackle the problem at the root, and there should be no need for "tagging" or anything. Jeronimo
Do you think I should make this Proposal #4, or is it even practical to implement? I can't even imagine how hard it would be to write a server-side interpreter. I mean, you'd have to address usage of database locks, CPU cycles, bandwidth limits, numbers of concurrent threads, etc... And if you want to be able to run it asynchronously, that would make it even more difficult to code and almost impossible in PHP alone. You'd pretty much have to hand the script over to a separate process which does the processing, I think. Possibly even a separate server if lots of bots need to run concurrently. Do you think the plan addresses all the current issues with remote bots? Does it raise issues (besides the implementation process) that cannot be addressed with proper policies and usage limitations? Rlee0001 09:50 Oct 30, 2002 (UTC)
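To make the asynchronous, cancelable run a little more concrete, here is a minimal Python sketch. It uses a background thread as a stand-in for the separate process mentioned above, and the `BotRun` class and its methods are hypothetical names, not an existing API.

```python
import threading


class BotRun:
    """Runs a list of work items in a background thread; can be cancelled."""

    def __init__(self, items, handler):
        self.items = list(items)
        self.handler = handler              # called once per work item
        self.done = 0                       # progress counter for the bot's page
        self._cancel = threading.Event()
        self._thread = threading.Thread(target=self._work)

    def _work(self):
        for item in self.items:
            if self._cancel.is_set():       # cancellation is checked between items
                break
            self.handler(item)
            self.done += 1

    def start(self):
        self._thread.start()

    def cancel(self):
        self._cancel.set()

    def wait(self):
        self._thread.join()
```

The user's request would only call `start()` and return; the bot's page could then poll `done` for progress and call `cancel()` if asked to stop. Resource limits such as database locks and CPU time are not addressed by this sketch.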

Hmm, I don't think I like the idea of running scripts on our server... and implementation details aside, it doesn't solve any of the problems of malicious remote bots that work through the web interface except with a stern warning. En-masse rollback should be possible in general, not just for known and approved bots. Automatic grouping of large numbers of page creations on Recentchanges should help on the RC-flooding front. A scripting-friendly remote interface to encourage well-behaved bots might be a good idea as well. --Brion 10:26 Oct 30, 2002 (UTC)

"it doesn't solve any of the problems..." No, but I don't think these problems needn't be solved. If we can guarantee that bots behave well (several of the measures you mention would help a lot - and maybe running scripts serverside is not such a good idea indeed), I think there's no reason not to include them. Malicious bots can be stopped quickly and easily, and are maybe even easier to spot and stop than a malicious user or IP. Jeronimo
It would work just as effectively if the <strikeout>server</strikeout>data-center-side bots were placed on a different machine. It would not load the network, you could feed real-time load information to the bot-server, and as a last resort, pull the plug. Operating the "bots" on their own host, with reboots between runs, would allow current client-side bots to be run at the data-center with little or no modification; the only need for a client-side bot would be for performing test-runs. --Martin Rudat(T|@|C) 10:58, 28 February 2006 (UTC)

To reach a positive solution, if someone else views something as a problem which you don't, you should accept it as a problem. E.g., I don't care about Esperanto[1], but I know that others do, and so I attempt to consider their concerns as if they were my own.

My alternative is to convince them to see my point of view.

I do not consider ignoring their opinion an option.

--The Cunctator

[1] well, I do a little, since I'm a Harry Harrison fan, but for the sake of argument.

You mean like you did with my opinion? Rlee0001 05:54 Oct 31, 2002 (UTC)
What do you mean? I asked for clarification. --The Cunctator
  1. Adds tens of thousands of entries to Wikipedia that are unlikely to see a human edit any time soon (in fact, we could probably extrapolate the nearly exact rate at which they will get edited by seeing how many have been edited so far)
    I don't see how any solution to bots will ever fix this. This is inherent in any mass edit.
    One solution is to ban mass edits. Alternative solutions, including those proposed, involve explicitly marking such entries as imported, and presenting them as unwritten entries for the sake of editing (but not searching).
  2. Artificially inflates the perceived activity of Wikipedia
    Ditto
    Actually, this is different. This is primarily an interface issue, namely the article count, etc.
  3. Can be perceived as tilting (and possibly could tilt) the purpose of Wikipedia away from being an encyclopedia and towards being a gazetteer / Sports Trivia Reference / etc. This is also a problem with hand-generated imports from other resources.
    How is this a problem at all? Is it our job to decide how readers will use information?
    Yes. See w:Wikipedia is not a dictionary.
  4. Danger of abuse by "vandal-bots", or just "clueless-bots". A bot running out of control could potentially cause heavy server load or even a w:denial of service attack.
    My solution completely solves this concern. With an undo option, and by bringing the bot to the server, how bots are used and who can use them can be decided either by an admin or by some sort of consensus.
    Well, it doesn't completely solve it (unless someone writes some amazing quality-control code), but yes, your proposal does as well in this regard as any of the others. Perhaps even better.
    Unless the number of bots requesting permission to be run is enormous, a vote could be called, or admin approval could be required, before a bot is allowed to leave the queue. From what I've read about the number of wiki bots around, quite a few of them are either simple invocations of pywikipedia or custom code that gets run periodically; only the remainder would really require close attention. --Martin Rudat(T|@|C) 10:58, 28 February 2006 (UTC)
  5. General complaints about interference with normal contributor operations, esp. Special:RecentChanges.
    By bringing bots server-side, a single entry can be added to the Special:RecentChanges log instead of thousands. I.e.:
    That doesn't seem directly advisable, though it certainly could work.
    And concerns about "server hogging" disappear, as the network isn't used for individual edits. Rlee0001 05:54 Oct 31, 2002 (UTC)
    Yes.
    (Network bandwidth is probably a non-issue here. 30,000 ram-bot edits are a drop in the bucket. --Brion)
    Here's a question for you: do you see this as one-or-the-other? Or could you imagine a set-up with both external bots and server processes? One must take into consideration that people will always want to write bots, and the hard-banning of them, rather than some middle ground, will encourage mischief (or at least cluelessness).
    I could imagine a place for server processes, but it's such a non-trivial problem to set up that the mind boggles. At least this one. It would be very impressive if someone got it working.
    It's just that it's a good basic design principle (a central OO principle, for example) that functionalities have a single interface (encapsulation, etc.). Problems arise if you have two alternate and equivalently powerful methods of manipulating entries. Again, it's possible to code something that obviates this problem, but it's a scary task.
    Perhaps pywikipedia could be extended to include its own mini-language (as proposed above), and be used both externally and internally for implementing bots? Or is this not what you are referring to? --Martin Rudat(T|@|C)
--The Cunctator