Talk:Objective Revision Evaluation Service

From Meta, a Wikimedia project coordination wiki

2 hours downtime today during a deploy

Hey folks,

We experienced some major downtime during a deploy today due to a version issue. See the post mortem for more details. Sorry for the trouble. We've got a set of tasks specified to make sure this doesn't happen again (see phab:T111806, phab:T111826, phab:T111828).

On the bright side, we are now running ORES 0.4.0 and revscoring 0.5.0 in production. These versions include some substantial improvements in performance and a substantial increase in the flexibility with which we can handle multi-lingual wikis. --EpochFail (talk) 17:07, 8 September 2015 (UTC)

ORES revert models super negative

Extreme negativity in 'revert' model. A screenshot of ScoredRevisions that shows extreme negativity in the ORES revert model.

I've just become aware of this. I'm digging into the cause. I'll post updates here as I go. --EpochFail (talk) 16:02, 12 September 2015 (UTC)

Here's a diff of feature extraction for the two different versions:

feature                                     edit 1, old             edit 1, new             edit 2, old             edit 2, new
log(diff.added_symbolic_chars_ratio + 1)    2.9637098329855847      2.9637098329855847      5.170483995038151       5.170483995038151
log(diff.chars_added + 1)                   2.9444389791664403      2.9444389791664403      3.713572066704308       3.713572066704308
log(diff.chars_removed + 1)                 3.5263605246161616      3.5263605246161616      0.0                     0.0
diff.longest_repeated_char_added            2                       2                       2                       2
diff.longest_token_added                    2                       2                       16                      16
log(diff.markup_chars_added + 1)            0.0                     0.0                     1.6094379124341003      1.6094379124341003
log(diff.markup_chars_removed + 1)          0.0                     0.0                     0.0                     0.0
log(diff.numeric_chars_added + 1)           0.0                     0.0                     1.6094379124341003      1.6094379124341003
log(diff.numeric_chars_removed + 1)         0.0                     0.0                     0.0                     0.0
diff.proportion_of_chars_added              0.005968169761273209    0.005968169761273209    1.0                     1.0
diff.proportion_of_chars_removed            0.010887495875948531    0.010887495875948531    0.0                     0.0
diff.proportion_of_markup_chars_added       0.0                     0.0                     0.1                     0.1
diff.proportion_of_numeric_chars_added      0.0                     0.0                     0.1                     0.1
diff.proportion_of_symbolic_chars_added     0.6666666666666666      0.6666666666666666      0.175                   0.175
diff.proportion_of_uppercase_chars_added    0.0                     0.0                     0.125                   0.125
log(diff.segments_added + 1)                1.9459101490553132      1.9459101490553132      0.6931471805599453      0.6931471805599453
log(diff.segments_removed + 1)              1.9459101490553132      1.9459101490553132      0.0                     0.0
log(diff.symbolic_chars_added + 1)          2.5649493574615367      2.5649493574615367      2.0794415416798357      2.0794415416798357
log(diff.symbolic_chars_removed + 1)        2.772588722239781       2.772588722239781       0.0                     0.0
log(diff.uppercase_chars_added + 1)         0.0                     0.0                     1.791759469228055       1.791759469228055
log(diff.uppercase_chars_removed + 1)       0.0                     0.0                     0.0                     0.0
log(english.diff.words_added + 1)           0.0                     0.0                     1.3862943611198906      1.3862943611198906
log(english.diff.words_removed + 1)         0.0                     0.0                     1.3862943611198906      1.3862943611198906
diff.bytes_changed + 1                      -14                     -14                     41                      41
diff.bytes_changed_ratio                    -0.0049488617617947876  -0.0049488617617947876  40.0                    40.0
page.is_content_namespace                   True                    True                    False                   False
parent_revision.was_same_user               True                    True                    False                   False
log(english.parent_revision.words + 1)      6.115892125483034       6.115892125483034       0.0                     0.0
log(user.age + 1)                           19.176634902080707      0.0                     19.46581301576379       0.0
user.is_anon                                False                   False                   False                   False
user.is_bot                                 False                   False                   False                   False
added_badwords_ratio                        0.0                     0.0                     0.0                     0.0
added_informals_ratio                       0.0                     0.0                     0.0                     0.0
added_misspellings_ratio                    0.0                     0.0                     33.33333333333333       33.33333333333333
log(english.diff.badwords_added + 1)        0.0                     0.0                     0.0                     0.0
log(english.diff.badwords_removed + 1)      0.0                     0.0                     0.0                     0.0
log(english.diff.informals_added + 1)       0.0                     0.0                     0.0                     0.0
log(english.diff.informals_removed + 1)     0.0                     0.0                     0.0                     0.0
log(english.diff.misspellings_added + 1)    0.0                     0.0                     0.6931471805599453      0.6931471805599453
log(english.diff.misspellings_removed + 1)  0.0                     0.0                     0.6931471805599453      0.6931471805599453
proportion_of_badwords_added                0.0                     0.0                     0.0                     0.0
proportion_of_badwords_removed              0.0                     0.0                     0.0                     0.0
proportion_of_informals_added               0.0                     0.0                     0.0                     0.0
proportion_of_informals_removed             0.0                     0.0                     0.0                     0.0
proportion_of_misspellings_added            0.0                     0.0                     0.3333333333333333      0.3333333333333333
proportion_of_misspellings_removed          0.0                     0.0                     0.3333333333333333      0.3333333333333333
reverted                                    False                   False                   False                   False

It looks like the real issue is in getting information about the user who made the edit so that we can calculate age. --EpochFail (talk) 16:28, 12 September 2015 (UTC)
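A comparison like the one above can be automated. The following is a minimal sketch, not the tooling actually used here: the helper name changed_features is hypothetical, and only four of the columns from the first old/new pair are copied in to keep the example short.

```python
# Hypothetical helper: report which features changed between two extractions.
FEATURES = [
    "log(english.parent_revision.words + 1)",
    "log(user.age + 1)",
    "user.is_anon",
    "user.is_bot",
]

# Values copied from the first old/new pair above (subset of columns).
old = [6.115892125483034, 19.176634902080707, False, False]
new = [6.115892125483034, 0.0, False, False]


def changed_features(names, old_values, new_values):
    """Return (name, old, new) for every feature whose value differs."""
    if not (len(names) == len(old_values) == len(new_values)):
        raise ValueError("feature names and value rows must be the same length")
    return [
        (name, o, n)
        for name, o, n in zip(names, old_values, new_values)
        if o != n
    ]


for name, o, n in changed_features(FEATURES, old, new):
    print(f"{name}: {o} -> {n}")
```

For this subset, the only feature reported is log(user.age + 1), which is consistent with the user-age diagnosis above.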

And we're back. After cleaning up that error and re-deploying the code, it looks like ORES has stopped being so sensitive about things. --EpochFail (talk) 18:49, 12 September 2015 (UTC)

ORES HTTPS down. HTTP still up.

ORES is served from Wikimedia Labs, a shared cloud computing service for Wikimedia tech stuff. Regretfully, the SSL certificate expired and was not renewed in time. If you try to access ORES through HTTPS, this will cause a security error (a warning in your browser or a failure if loaded from a gadget in Wikipedia). You can still access the service through HTTP without issue. See T112608 - *.wmflabs.org https certificate expired (tools.wmflabs.org) for details. I'm told that a new certificate will be issued by "next business day". I'll post again when I know ORES HTTPS is back online. --EpochFail (talk) 05:03, 15 September 2015 (UTC)

Looks like the new cert is applied. HTTPS back online. --EpochFail (talk) 16:12, 15 September 2015 (UTC)

More Artificial Intelligence for your quality control/curation work.

Support table:
context edit quality article quality
damaging goodfaith reverted wp10
arwiki Arabic Wikipedia Yes check.svg
cswiki Czech Wikipedia Yes check.svg Yes check.svg Yes check.svg
dewiki German Wikipedia Yes check.svg
enwiki English Wikipedia Yes check.svg Yes check.svg Yes check.svg Yes check.svg
enwiktionary English Wiktionary Yes check.svg
eswiki Spanish Wikipedia Yes check.svg
eswikibooks Spanish Wikibooks Yes check.svg
etwiki Estonian Wikipedia Yes check.svg Yes check.svg Yes check.svg
fawiki Persian Wikipedia Yes check.svg Yes check.svg Yes check.svg
fiwiki Finnish Wikipedia Yes check.svg
frwiki French Wikipedia Yes check.svg Yes check.svg
hewiki Hebrew Wikipedia Yes check.svg Yes check.svg Yes check.svg
huwiki Hungarian Wikipedia Yes check.svg
idwiki Indonesian Wikipedia Yes check.svg
itwiki Italian Wikipedia Yes check.svg
kowiki Korean Wikipedia Yes check.svg
nlwiki Dutch Wikipedia Yes check.svg Yes check.svg Yes check.svg
nowiki Norwegian Wikipedia Yes check.svg
plwiki Polish Wikipedia Yes check.svg Yes check.svg Yes check.svg
ptwiki Portuguese Wikipedia Yes check.svg Yes check.svg Yes check.svg
rowiki Romanian Wikipedia Yes check.svg
ruwiki Russian Wikipedia Yes check.svg Yes check.svg Yes check.svg Yes check.svg
svwiki Swedish Wikipedia Yes check.svg
trwiki Turkish Wikipedia Yes check.svg Yes check.svg Yes check.svg
ukwiki Ukrainian Wikipedia Yes check.svg
viwiki Vietnamese Wikipedia Yes check.svg
wikidatawiki Wikidata Yes check.svg Yes check.svg Yes check.svg

Hey folks. Today, we (the revision scoring team) are announcing a major breakthrough in our work to democratize efficient quality control and curation systems for Wikimedia projects. We've added basic support for our "reverted" (how likely is it that this edit will need to be reverted?) model to 14 wikis that span 13 languages. We've also extended support for our article quality "wp10" (what assessment class would this version of an article get?) to English and French Wikipedia. We'd like to invite you (especially tool developers) to test the service and consider using it in your next project. en:WP:Huggle has already adopted ORES and there are a number of tools and user scripts available that make use of our predictions.

A note of caution: These machine learning models will help find damaging edits and help assess the quality of articles, but they are inherently imperfect. They will make mistakes. Please use caution and keep this in mind. We want to learn from these mistakes to make the system better and to reduce the biases that have been learned by the models. Please report any interesting false positives and let us know of any trends that you see in the mistakes that ORES makes.

See also:

Happy hacking! --EpochFail (talk) 20:30, 14 November 2015 (UTC)

Here are a few tables showing the revisions with the highest scores for each wiki. They might help users who patrol old revisions on their wiki:
Helder 11:59, 15 November 2015 (UTC)

API documentation needed

What do B, C, FA and GA stand for? KLeduc (WMF) (talk) 16:41, 16 November 2015 (UTC)

Done in diff=14662557&oldid=14607146 --EpochFail (talk) 17:08, 16 November 2015 (UTC)

Checklist for itwiki setup.

Per Elitre's request, I'm writing up a list of to-do's for ORES support in Italian Wikipedia.

Edit quality models
  1. Set up reverted model based on past reverted edits. Done
    Note that this model is imperfect and biased because it predicts a "revert" and not "damage". The damaging and goodfaith models are more directly applicable to quality control work, but they require a Wiki labels campaign.
  2. Set up a Wiki labels edit quality campaign (see Phab:T114505)
    1. Form translations [1] Done
    2. UI translations [2] Done
    3. Set up Wikipedia:Labels project page (see en:Wikipedia:Labels for reference -- native speakers needed)
    4. Load a sample of edits for labeling (EpochFail will do this)
Article quality models

Do Italian Wikipedians label articles by their quality level like en:Wikipedia:Version 1.0 Editorial Team/Assessment? If so, I need someone to explain how it works and what templates/categories are applicable. If no such ratings are done consistently in itwiki, I can set up a Wiki labels campaign for assessing article quality and use that to train a model. So either:

  1. Describe how Italian Wikipedia does article quality assessment.

or

  1. Work out what quality assessments you would like ORES to predict (doesn't have to be a single scale!) and I'll set up a Wiki labels campaign.

--EpochFail (talk) 16:44, 18 November 2015 (UTC)

@EpochFail: Hi, thanks for your work. I would like to ask you how the training of the neural network works (how many edits are analyzed and which ones, and which metrics are used), for the itwiki results we currently get through http://ores.wmflabs.org/scores/itwiki/. Thanks. --Rotpunkt (talk) 13:04, 19 November 2015 (UTC)
Rotpunkt, we're not using a neural network strategy in these models. We're using a support vector machine. For an overview of ORES, see ORES/What. That page includes a basic overview of supervised learning. Now for your specific questions:
  • How many edits are analyzed? We train the reverted, damaging and goodfaith models with 20,000 observations. See our Makefile and the quarry query for sampling edits from the last full year of Italian Wikipedia.
  • Which metrics are used? We detect reverts using the identity revert strategy. The feature list for predicting damage in Italian Wikipedia can be found in the feature_lists file: itwiki.py. Note that the file imports from enwiki.py. This is because (1) most vandalism predictors use the same feature set and (2) there's English language vandalism everywhere we look.
I hope that this answers your questions. Please let me know if you have any proposals for how we should extend ORES/What. --EpochFail (talk) 14:29, 19 November 2015 (UTC)
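For readers unfamiliar with the term: an identity revert is commonly understood as an edit that restores a page to exactly the content of an earlier revision, which can be detected by comparing content checksums. The sketch below is a simplified illustration of that idea and not the team's actual implementation; the function name and the radius parameter (how far back to look) are assumptions made for the example.

```python
import hashlib


def sha1(text):
    """Checksum of a revision's text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def detect_identity_reverts(revisions, radius=15):
    """
    revisions: list of (rev_id, text) tuples in chronological order.
    Returns a list of (reverting_id, reverted_ids, reverted_to_id).
    """
    checksums = [(rev_id, sha1(text)) for rev_id, text in revisions]
    reverts = []
    for i, (rev_id, checksum) in enumerate(checksums):
        # Look back at most `radius` revisions for an identical earlier state,
        # skipping the immediate parent so at least one edit is undone.
        start = max(0, i - radius)
        for j in range(i - 2, start - 1, -1):
            if checksums[j][1] == checksum:
                reverted = [rid for rid, _ in checksums[j + 1:i]]
                reverts.append((rev_id, reverted, checksums[j][0]))
                break
    return reverts


history = [
    (1, "Hello"),
    (2, "Hello world"),
    (3, "VANDALISM"),
    (4, "Hello world"),  # restores the content of revision 2
]
print(detect_identity_reverts(history))
```

In this toy history, revision 4 is reported as reverting revision 3 back to revision 2. Note that such a check says nothing about who reverted whom or why.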
Thanks a lot, it's more clear now (although I will need some time to understand a bit of the entire system :) ). Just for information, in revscoring I found this script: [3]. Is it already used? If so, maybe the badwords and informals could be improved, and in this way the ORES results would be better? Where did you find these Italian words? --Rotpunkt (talk) 16:05, 19 November 2015 (UTC)
Rotpunkt Indeed! italian.py contains the assets we use for detecting badwords/informals and doing other language-specific feature extraction for Italian. These words come from an automated strategy for identifying words added by reverted edits, but not non-reverted edits. See Research:Revision scoring as a service/Word lists/it. We had a native speaker (DarTar) curate the list for us. If you can extend these lists for us, I'd be happy to incorporate that into the model and increase our accuracy/fitness. :) --EpochFail (talk) 16:18, 19 November 2015 (UTC)
@EpochFail: Ok! I will talk about it with other users in itwiki. Today I have added a new gadget so that many users (patrollers mainly) can start to familiarize themselves with ORES and report false positives (I found ScoredRevisions the easiest tool to start playing with ORES). --Rotpunkt (talk) 17:42, 19 November 2015 (UTC)
Sounds good. FYI: He7d3r, ScoredRevisions is getting some love. :) --EpochFail (talk) 18:09, 19 November 2015 (UTC)

@EpochFail and He7d3r: In itwiki we have started to report false positives in this page: it:Progetto:Patrolling/ORES (do you need some translations or can DarTar help you?). In that page I have created some subsections:

  • Generici: for generic false positives
  • Correzioni in pagine di disambiguazione: for false positives in disambiguation pages (revision scoring seems influenced by edits in disambiguation pages)
  • Correzioni verbo avere: false positives related to the Italian verb "have" (why?)
  • Annullamenti o rimozioni di vandalismo: false positives that are removal of vandalism. IMHO these are the most annoying false positives :)

--Rotpunkt (talk) 12:17, 23 November 2015 (UTC)

Hi Rotpunkt! So, I think I figured out at least one of those issues. Right now, the model for itwiki borrows English language words for vandalism detection (since so much vandalism is in English cross-wiki). It turns out that "ha", "haha" and "hahaha" are all informal expressions that are common to English language vandalism, but "ha" is a totally legitimate word in Italian. I'll experiment with dropping English informals from the list of features for itwiki and re-scoring some of those revisions.
For the removal of vandalism, that seems to be a common problem that I think is related to the way that we were detecting reverts. Essentially, we don't care who or how a revision gets reverted when making this prediction, but vandals will often revert the patrollers! We've since implemented a way to minimize this effect and it seems that we're getting positive results from it. When I run that model on the example on the page you linked, we get consistently lower "reverted" probabilities. I should be able to get this model deployed within the next week.
For the other issues, we'll need to look and think harder. The notes you are helping us gather are invaluable. Thank you! --EpochFail (talk) 01:09, 25 November 2015 (UTC)
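The "ha" collision described above is easy to reproduce. The sketch below is only an illustration: the three patterns are made up for the example and are not the actual revscoring word lists, and count_matches is a hypothetical helper.

```python
import re

# Illustrative patterns only; NOT the real English informals list.
ENGLISH_INFORMALS = [r"ha(?:ha)*", r"lol", r"hello"]


def count_matches(patterns, text):
    """Count whole-word, case-insensitive matches of any pattern in text."""
    regex = re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)
    return sum(1 for _ in regex.finditer(text))


# A perfectly ordinary Italian sentence ("ha" = "has").
italian_sentence = "Il museo ha riaperto e ha una nuova sala."

print(count_matches(ENGLISH_INFORMALS, italian_sentence))  # two "informal" hits
print(count_matches([r"lol", r"hello"], italian_sentence))  # none once "ha" is dropped
```

A feature that counts English informals therefore fires on legitimate Italian text, which matches the false positives reported for the verb "avere".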
Nice, we look forward to these improvements, thanks to you and your team for this wonderful project :) Sometimes we also notice false negatives. Is it useful for you if we report false negatives (edits which should be reverted with score for "reverted" below 50%) in a dedicated section of it:Progetto:Patrolling/ORES? --Rotpunkt (talk) 07:40, 25 November 2015 (UTC)
Rotpunkt, yes those are worth noting too. Thank you!  :) --EpochFail (talk) 14:03, 25 November 2015 (UTC)
BTW, I just started extracting features and building new models. I should have an update on my two notes above within the next 5-6 hours. (I'm rebuilding all the models -- not just itwiki). --EpochFail (talk) 14:59, 25 November 2015 (UTC)
@EpochFail: Hi! Unfortunately we still have the problem of false positives related to the Italian verb "have". For example today: it:Special:Diff/77186648 (98%), it:Special:Diff/77186644 (94%) or yesterday: it:Special:Diff/77173988 (100%). --Rotpunkt (talk) 11:02, 14 December 2015 (UTC)
Looks like the new model didn't make it up in our most recent push. I'll try to get it on ORES today. Thanks for the ping. --Halfak (WMF) (talk) 13:59, 14 December 2015 (UTC)

I got the new model up on our staging server. But it still looks like there is something wrong. Here's a few of the problematic revisions that you noted:

So.... it looks like I made a mistake and accidentally added the English informals back into the model. *sigh* Sorry for the delay. I'll go fix this and get it on the staging server. I'll ping when that is ready. --Halfak (WMF) (talk) 17:46, 14 December 2015 (UTC)

OK. Just retrained the models and it looks like I've found "ha" in the Italian informals too! I've made a change to the code and I'll be rebuilding again once it is merged. --Halfak (WMF) (talk) 19:20, 14 December 2015 (UTC)
...and we're building. I should be back in a couple hours with another update. --Halfak (WMF) (talk) 19:26, 14 December 2015 (UTC)
Nice, thanks a lot for your work! --Rotpunkt (talk) 20:06, 14 December 2015 (UTC)
I've got some new models trained, but it seems their predictions are still high.
I'm going to get this model deployed and keep working on the issue. I have a bunch of new features (sources of signal for the prediction) in the works that might help. --EpochFail (talk) 23:03, 14 December 2015 (UTC)
Rotpunkt The new model is live on ores.wmflabs.org. I've got some ideas for how to improve this further. I'll be digging into those ideas over the next week. I'll report back with what I learn. --Halfak (WMF) (talk) 15:04, 15 December 2015 (UTC)
Ok, I'll stay tuned, thanks! --Rotpunkt (talk) 20:12, 15 December 2015 (UTC)

Hi Rotpunkt. Sorry for the long wait. We've been doing a lot of infrastructural work around ORES, so I wasn't able to look at this as quickly as I'd hoped. ... So, I've been experimenting with different modeling strategies. It seems that we can get a little bit better statistical "fitness" with a en:gradient boosting (GB) model than the old linear en:support vector machine (SVM) model. Here are the new scores that I get for these three edits:

We don't have this deployed yet. I'm a little bit concerned that, while the GB type of model has higher fitness, its probabilities are scaled differently than the probabilities for the SVM model. Deploying the GB model now might be surprising to tool devs who rely on ORES and have hard-coded thresholds. It turns out that a lot of other wikis' models performed better with a change in modeling strategy too, so I'm hoping to do an announcement and to switch all of the models at the same time. I don't have a timeline for this yet as I haven't made the announcement, but we could do this as early as next week if we decide that it is urgent. --Halfak (WMF) (talk) 21:38, 24 March 2016 (UTC)

@Halfak Nice job, thanks from itwiki! --Rotpunkt (talk) 15:58, 25 March 2016 (UTC)

Other Wikis

Is it possible to add other wikis to this service? In particular, I'm looking to see how it would work for en.wikiversity in finding damaging edits. -- Dave Braunschweig (talk) 17:32, 2 December 2015 (UTC)

Hi Dave Braunschweig, sorry for the delay. It is possible to get this up and running on other wikis. I've filed a request. See Phab:T121397. You can subscribe there for updates, but I'll also ping here once we have something together. --Halfak (WMF) (talk) 14:01, 14 December 2015 (UTC)

Media coverage for Revscoring/ORES

Pageviews to ORES and Revscoring docs. A graph of daily pageviews for m:Objective Revision Evaluation Service and m:Research:Revision scoring as a service shows a sudden burst in interest after a post on the Wikimedia blog.

Hey folks! Over the past couple of weeks, I have been working with the WMF Communications department and User:DarTar to write a blog post about our project. After going through a few iterations, the comms team got kind of excited about the potential for media attention, so we reached out to a couple of reporters that we knew. Well, coverage of the project has blown up. I've lost count of how many interviews I have given. I've started a list of media articles that cover our project on the main revscoring project page. See Research talk:Revision scoring as a service#Media coverage for Revscoring/ORES --Halfak (WMF) (talk) 17:12, 2 December 2015 (UTC)


Automatically generated documentation

Mockup of a swagger UI for ORES that contains details about model fitness

So, I've been looking into Swagger API specifications and swagger UI for describing the paths of the ORES service. See the phab task: Phab:T119271. So, it occurs to me that we could include most of the details that are currently on-wiki in the swagger specification. Since we can generate the swagger spec on demand from ORES, that means we can make sure that the spec is always up to date with the models that ORES hosts. We can also probably have plots of the ROC and PR curves in the docs. That would be very cool. To capture this idea, I made a quick mockup. See the thumb on the right. The two plots next to the "reverted" and "damaging" models are supposed to represent ROC and PR curves. --Halfak (WMF) (talk) 16:50, 9 December 2015 (UTC)

Hey folks. We got the first iteration of this online today. No fancy graphs and we don't have a section per wiki, but we do have some nice, clean documentation. See http://ores.wmflabs.org/v1/ and http://ores.wmflabs.org/v2/. --EpochFail (talk) 13:43, 22 March 2016 (UTC)

3 hours of downtime today

Hey folks. We had 3 hours of downtime today. This happened because our redis node ran out of disk space. I've posted a full incident report. See wikitech:Incident documentation/20151216-ores. We're back online and running at full capacity now. Thanks to Yuvipanda for waking up to help get the problem resolved. --Halfak (WMF) (talk) 17:03, 16 December 2015 (UTC)

RevisionNotFound

Hi, I am developing a bot that monitors recent changes, and for some of them I request the revision score from ores.wmflabs.org. I make only 4-5 requests per minute. Sometimes, a couple of times each hour, I get a response like this one:

  "12345678": {
    "reverted": {
      "error": {
        "message": "RevisionNotFound: Could not locate revision. It may have been deleted.",
        "type": "RevisionNotFound"
      }
    }
  }

If I repeat the request after some time, it's OK. Is it related to database replication lag? Do I have to wait a few seconds before querying http://tools.wmflabs.org? It happens only a couple of times each hour. --Rotpunkt (talk) 10:55, 18 December 2015 (UTC)

It turns out that there were a few reasons that an error message like this might be returned. We've since deployed a new version of ORES and updated the error messaging strategy. The new error messages should make it clear what resource is actually missing. Hopefully this helps. --EpochFail (talk) 14:02, 23 March 2016 (UTC)
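For bot authors who see this kind of transient error, one option is to retry after a short delay. The following is a minimal sketch under stated assumptions: score_with_retry is a hypothetical helper, fetch_score stands for whatever function the bot already uses to call ORES and parse the JSON, the retry count and delay are arbitrary, and the shape of a successful score ({"prediction": ...}) is assumed for illustration. Only the error structure is taken from the response quoted above.

```python
import time


def score_with_retry(fetch_score, rev_id, model="reverted",
                     retries=3, delay=5.0, sleep=time.sleep):
    """
    Call fetch_score(rev_id) and retry while ORES reports RevisionNotFound.

    fetch_score must return the parsed JSON response: a dict keyed by the
    revision ID as a string, as in the error example above.
    """
    for attempt in range(retries + 1):
        response = fetch_score(rev_id)
        result = response[str(rev_id)][model]
        error = result.get("error")
        if error is None:
            return result
        # Give up on other error types, or once the retries are used up.
        if error.get("type") != "RevisionNotFound" or attempt == retries:
            raise RuntimeError(error.get("message", "unknown ORES error"))
        sleep(delay)


# Usage with a stand-in fetcher that fails once and then succeeds.
calls = []


def fake_fetch(rev_id):
    calls.append(rev_id)
    if len(calls) == 1:
        return {str(rev_id): {"reverted": {"error": {
            "message": "RevisionNotFound: Could not locate revision. "
                       "It may have been deleted.",
            "type": "RevisionNotFound"}}}}
    return {str(rev_id): {"reverted": {"prediction": False}}}


result = score_with_retry(fake_fetch, 12345678, sleep=lambda seconds: None)
print(result, len(calls))
```

Passing the sleep function in as a parameter keeps the helper easy to test without real waiting.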

Maintenance scheduled: Downtime expected @ 1400 UTC March 23rd

Hey folks,

As part of setting up production servers, we need to do a change that will affect our hosts in labs (via puppet). This will require a brief downtime event that should last less than 15 minutes.

We'll do this at 1400 UTC tomorrow, Wednesday, March 23rd. I'll post an update once we're done.

--EpochFail (talk) 13:36, 22 March 2016 (UTC)

Starting now. --EpochFail (talk) 14:01, 23 March 2016 (UTC)
Eek! I forgot to ping when it ended! We were done within 1 hour because of some issues with https-only. The good news is that the web & worker nodes were robust to a redis outage.
The models look good statistically, but please let us know how they actually work in practice. --EpochFail (talk) 00:04, 25 March 2016 (UTC)

Verifying ORES integrity

I just had a quick chat with NSchaad about how we verify that ORES is working. I figured I'd copy and paste the small set of notes I gave him here.

Verifying "up"
We use icinga to verify that ORES is online and to send us notifications if it is not online. icinga will send HTTP requests to ORES once every two minutes.
What about flower?
We use flower to monitor the celery workers during a deployment. We currently have an instance of flower running on ores-web-03. You'll need to ssh tunnel to get to it. It runs on port 5555.
  • Here's my ssh command to set up the port forwarding: ssh -L 5555:localhost:5555 ores-web-03.eqiad.wmflabs
  • Then direct your browser to http://localhost:5555 and you should get the flower UI
Graphite in labs
  • We send usage metrics to a statsd instance in labs. You can look at the metrics @ http://graphite.wmflabs.org. Look for "ores" under metrics.

--EpochFail (talk) 18:49, 25 March 2016 (UTC)

Updates to ORES service & BREAKING CHANGE on April 7th

Hey folks, we have a couple of announcements for you today. First is that ORES has a large set of new functionality that you might like to take advantage of. We’ll also want to talk about a breaking change that’s coming on April 7th.

New functionality

Scoring UI

Sometimes you just want to score a few revisions in ORES and remembering the URL structure is hard. So, we’ve built a simple scoring user interface that will allow you to more easily score a set of edits.

New API version

We’ve been consistently getting requests to include more information in ORES’ responses. In order to make space for this new information, we needed to change the structure of responses. But we wanted to do this without breaking the tools that are already using ORES. So, we’ve developed a versioning scheme that will allow you to take advantage of new functionality when you are ready. The same old API will continue to be available at https://ores.wmflabs.org/scores/, but we’ve added two additional paths on top of this.

Swagger documentation

Curious about the new functionality available in “v2” or maybe what the change was from “v1”? We’ve implemented a structured description of both versions of the scoring API using swagger – which is becoming a de facto standard for this sort of thing. Visit https://ores.wmflabs.org/v1/ or https://ores.wmflabs.org/v2/ to see the Swagger user-interface. Visit https://ores.wmflabs.org/v1/spec/ or https://ores.wmflabs.org/v2/spec/ to get the specification in a machine-readable format.

Feature values & injection

Have you wondered what ORES uses to make its predictions? You can now ask ORES to show you the list of “feature” statistics it uses to score revisions. For example, https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892/?features will return the score with a mapping of feature values used by the “wp10” article quality model in English Wikipedia to score 34567892. You can also “inject” features into the scoring process to see how that affects the prediction. E.g., https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892?features&feature.wikitext.revision.chars=10000
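Tool developers may prefer to build these URLs programmatically. The helper below is a sketch based only on the two example URLs above; build_score_url and its parameters are hypothetical names, and the helper always emits the trailing slash before the query string, as in the first example.

```python
from urllib.parse import quote, urlencode

BASE = "https://ores.wmflabs.org/v2/scores"


def build_score_url(context, model, rev_id, features=False, injected=None):
    """
    Build a v2 scoring URL.

    features: if True, add the bare "features" flag to request feature values.
    injected: optional dict of feature name -> value to inject into scoring.
    """
    url = f"{BASE}/{quote(context)}/{quote(model)}/{int(rev_id)}/"
    params = []
    if features or injected:
        # Injection is shown together with the "features" flag in the example.
        params.append("features")
    if injected:
        params.append(urlencode(injected))
    return url + ("?" + "&".join(params) if params else "")


print(build_score_url("enwiki", "wp10", 34567892, features=True))
print(build_score_url("enwiki", "wp10", 34567892,
                      injected={"feature.wikitext.revision.chars": 10000}))
```

The "features" flag is appended by hand because it carries no value; passing it through urlencode would produce "features=" instead.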

Breaking change – new models

We’ve been experimenting with new learning algorithms to make ORES work better and we’ve found that we get better results with gradient boosting and random forest strategies than we do with the current linear svc models. We’d like to get these new, better models deployed as soon as possible, but with the new algorithm comes a change in the range of probabilities returned by the model. So, when we deploy this change, any tool that uses hard-coded thresholds on ORES’ prediction probabilities will suddenly start behaving strangely. Regretfully, we haven’t found a way around this problem, so we’re announcing the change now and we plan to deploy this BREAKING CHANGE on April 7th. Please subscribe to the AI mailing list or watch this page to catch announcements of future changes and new functionality.

In order to make sure we don’t end up in the same situation the next time we want to change an algorithm, we’ve included a suite of evaluation statistics with each model. The filter_rate_at_recall(0.9), filter_rate_at_recall(0.75), and recall_at_fpr(0.1) thresholds represent three critical thresholds (should review, needs review, and definitely damaging – respectively) that can be used to automatically configure your wiki tool. So, come breaking change, we strongly recommend basing your thresholds on these statistics in the future. We’ll be working to submit patches to tools that use ORES in the next week to implement this flexibility.
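As an illustration of basing thresholds on these statistics rather than hard-coding them, here is a minimal sketch. Everything about the data shape is an assumption: the model_stats dict, its "threshold" key, and the numeric values are invented placeholders for the example and do not reflect the real ORES response or any real model. Only the three statistic names and their meanings come from the announcement above.

```python
# Hypothetical shape and made-up values; fetch the real statistics from ORES.
model_stats = {
    "filter_rate_at_recall(0.9)": {"threshold": 0.09},
    "filter_rate_at_recall(0.75)": {"threshold": 0.30},
    "recall_at_fpr(0.1)": {"threshold": 0.85},
}

# Most severe level first, so the first match wins.
LEVELS = [
    ("definitely damaging", "recall_at_fpr(0.1)"),
    ("needs review", "filter_rate_at_recall(0.75)"),
    ("should review", "filter_rate_at_recall(0.9)"),
]


def review_level(probability, stats):
    """Map a damaging probability to a review level, or None if below all."""
    for label, stat_name in LEVELS:
        if probability >= stats[stat_name]["threshold"]:
            return label
    return None


for p in (0.92, 0.5, 0.1, 0.01):
    print(p, review_level(p, model_stats))
```

With this pattern, a change of learning algorithm only requires re-reading the statistics, not editing constants in the tool.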

--EpochFail (talk) 09:47, 3 April 2016 (UTC)

Proposed section

I want to add this section. Let's talk before :)

Adding new language

If you want to add support for a new language, first file a bug in Phabricator for language utilities. Here's an example from Dutch. Then review the "bad words" we generate for the language and translate the interface messages. After that, we will implement a basic model, and we can have Wikilabels ready for more advanced ones.

What do you think? Amir (talk) 12:25, 1 May 2016 (UTC)

I'm a fan. I think that we should have a step by step process for each model. Regretfully, it's a little bit complicated because it's a dependency graph. E.g. we need language assets in order to build the reverted, goodfaith and damaging models, but the latter two also need a fully completed Wiki labels campaign. The wp10 model requires a labeled data strategy which might come via consultation or through a Wiki labels campaign. Just thinking about how to give someone an overview of a process like this and I thought of "Instructions for Bot Operators" at en:Wikipedia:Bots/Requests for approval. We might borrow some insights from there, though I think we want to avoid the verbosity. --EpochFail (talk) 15:10, 1 May 2016 (UTC)
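The dependency graph described above can be written down and ordered mechanically. This is only a sketch: the step names are informal labels chosen for the example, and it relies on graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each key depends on every item in its set. Step names are illustrative.
dependencies = {
    "reverted model": {"language assets"},
    "damaging model": {"language assets", "Wiki labels campaign"},
    "goodfaith model": {"language assets", "Wiki labels campaign"},
    "wp10 model": {"labeled article quality data"},
}

# static_order() yields every step after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())

for step in order:
    print(step)
```

Any valid ordering lists the language assets before the three edit quality models and the Wiki labels campaign before damaging and goodfaith, which could serve as the skeleton of a step-by-step guide.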

Minor downtime today

Labs had a DNS blip that caused ORES to be down for a few minutes (between 14:59 and 15:04 UTC) today. Everything seems to be back to normal now.

From #wikimedia-labs:

[10:32:25] <YuviPanda> halfak: temp dns blip
[10:32:36] <halfak> Gotcha.  Thanks YuviPanda 
[10:32:57] <halfak> was it big enough to warrant a write-up?
[10:33:13] <halfak> If not, I'll just post "temp DNS blib" to my ORES users and call it good.
[10:33:39] <YuviPanda> halfak: probably not, since we're doing a bunch of DNS stuff in the next few days to shore up DNS
[10:34:20] <halfak> kk

--Halfak (WMF) (talk) 15:37, 4 May 2016 (UTC)

Capturing features regarding markup removal

While reviewing some edits for ORES, I came to the conclusion that ORES produces false negatives when the user messes with markup, i.e., removing a markup character. So this section is intended to solve this problem.


--Ghassanmass (talk) 00:02, 13 July 2016 (UTC)

Hi Ghassanmass. Sorry to not notice your post until now. Can you link to a few examples of edits that are misclassified? --EpochFail (talk) 18:18, 23 August 2016 (UTC)

Hello, halfak, I was able to reproduce the error while doing the opposite, "fixing the markup removal", so now we are dealing with a false positive using arwiki reverted "0.2.2". RevId=20953320. --Ghassanmass (talk) 13:01, 5 September 2016 (UTC)

Damaging model for frwiki

Hi there! Community Tech is in the process of porting Copypatrol to work on French wikipedia. We make use of ORES damaging scores in the tool to indicate edits with higher probability of being plagiarized. Can we have them turned on for frwiki as well? It'd be very helpful. Thank you. -- NKohli (WMF) (talk) 11:15, 3 November 2016 (UTC)

[Cross-post] Including new filter interface in ORES review tool

The new filtering interface demo

Hey folks,

I made a post at mw:Topic:Tflhjj5x1numzg67 about including the new advanced filtering interface that the Collaboration Team is working on in the ORES beta feature. See the original post and add any discussion points there. --EpochFail (talk) 23:05, 18 November 2016 (UTC)

Get me up to speed please

Due to the urgency of seriously improving the triage of its massive daily intake of new articles, I have been heavily involved in the immediate action of improving the tools we already have: Page Curation/New Pages Feed, and to make them more attractive to users. I am therefore not up to speed with ORES, but I certainly appreciate its potential.

Is there someone with whom I can have a short discussion over Skype who can quickly bring me up to date with ORES? Please bear in mind that while I am a linguist and communication expert (I worked on some of the first semantic search systems nearly 20 years ago), I have a severely limited knowledge of AI and IT. Perhaps someone can e-mail me first at en:user:Kudpung. Thanks. Kudpung (talk) 09:48, 27 November 2016 (UTC)

It took a while, but I'm very grateful to Aaron (EpochFail) for discussing this at length with me over Skype last night, Friday 3 February.

Whilst I welcome any form of AI that will help preserve what is now a seriously dwindling public respect for the quality of Wikipedia content, before deploying (or even developing) ORES for Page Curation, we need to establish why the patroller community is largely resistant to adopting the New Pages Feed and its Page Curation Tool as the default process for controlling new intake. The reasons are actually quite clear but on its own admission the Foundation no longer regards addressing them as a priority.

One important way to address this and significantly reduce the fire hose of trash is to educate new users the instant they register, on what they can and cannot insert in the Wikipedia. A proper welcome page has never been developed and a 2011 attempt by some employees to create one (Article Creation Work Flow) as part of the Page Curation process was thwarted by internal events within the WMF. This was the other half of the Page Curation project which was begun by the Foundation in answer to the community's overwhelming demand for en:WP:ACTRIAL (which might now soon have to become a reality).

AI is not a panacea. It should assist but not seek to replace the dysfunctional human aspect of the triage of new pages, or be a palliative for the improvement of the parts of the Curation GUI that the Foundation will not prioritise. New Page Patrolling is the only firewall against unwanted new pages. Not only is it now a very serious issue, but it should be the Foundation's single top priority before anything else of any kind. --The preceding unsigned comment was added by Kudpung (talk)

If any of those working on ORES could also have a brief conversation with me over Skype sometime in the near future, I'd appreciate it. I'd like to discuss the potential of ORES to identify paid editing rings/patterns. Please shoot me an email. ~ Rob13Talk 04:31, 23 February 2017 (UTC)
Hi BU Rob13. We don't have paid editing patterns on our radar in the near future, but there are some related things we're working on and other things that can be done to increase our focus on COI editing.
  1. We've been working on en:PCFGs -- a natural language processing strategy that allows us to score sentences by whether or not they "look" a certain way. E.g. we can score a sentence as likely to have come from a "Featured Article" vs. coming from an article deleted because it's promotional. You could use such a model to see if a user's past edits swung towards promotional/spammy tone. We haven't been able to deploy this language modeling strategy because of operational concerns (grammar parsers use too much memory). But we're working on some alternative grammar parsing strategies that may allow us to get this type of model deployed. See my last comment in Phab:T120170 and Phab:T157041
  2. The best way that we can make progress on this is to find someone who wants to do the data analysis and modeling work that is involved with setting up these models. I'm always recruiting grad students and other researchers to pick up these projects. I haven't found someone who is specifically interested in paid editing/COI/spam to take a look at this. I think something that we could do in order to gather some attention is to create a dataset of known paid editing rings. Is there a list that could be referenced? --EpochFail (talk) 19:58, 25 February 2017 (UTC)
There are categories on enwiki of existing paid editing rings. Let me talk to some admins who deal with this primarily and see if I can get you some lists. Administrators can safely be used as a "negative" for paid editing to compare against the "positive" lists. Let me talk to a friend who's a CS grad student. He may or may not be interested in contributing. ~ Rob13Talk 05:21, 26 February 2017 (UTC)

How does one use this?

There does not appear to be any set of instructions for using this software. Is there a way to run it within a project on en:WP to get a table of ORES ratings vs human ratings? · · · Peter (Southwood) (talk): 07:48, 29 March 2017 (UTC)

Peter. Find a diff that you'd like to be scored. E.g. en:Special:Diff/772810534. Then you provide that ID to ORES. E.g. https://ores.wikimedia.org/v2/scores/enwiki/damaging/772810534 and you get a prediction. (0.961 true "damaging") ORES is not intended to be an interface for humans, but rather an interface for tools, gadgets, and bots. The idea is that ORES handles the hard part of making a good prediction and other tool developers connect to ORES to apply it to a specific use-case. I'm not aware of any tools that allow ORES to run within a specific WikiProject. --EpochFail (talk) 14:03, 29 March 2017 (UTC)
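For tool developers, the lookup described in the reply above can be sketched in Python. This is a minimal sketch, not an official client: the base URL is copied from the example link, while the function names and the assumption that the JSON response contains a nested "probability" mapping are illustrative guesses. The parser searches for that key rather than hard-coding a response layout.

```python
import json
import urllib.request

# Base URL copied from the example link in the reply above.
ORES_BASE = "https://ores.wikimedia.org/v2/scores"


def score_url(wiki, model, rev_id):
    """Build the scoring URL for one revision (hypothetical helper)."""
    return f"{ORES_BASE}/{wiki}/{model}/{rev_id}"


def find_probability(payload):
    """Return the first nested "probability" mapping, or None if absent.

    The exact response layout is an assumption, so this walks the
    decoded JSON depth-first instead of indexing fixed keys.
    """
    if isinstance(payload, dict):
        prob = payload.get("probability")
        if isinstance(prob, dict):
            return prob
        for value in payload.values():
            found = find_probability(value)
            if found is not None:
                return found
    return None


def fetch_probability(wiki, model, rev_id, timeout=10):
    """Request a score over HTTP and extract its probability mapping."""
    with urllib.request.urlopen(score_url(wiki, model, rev_id), timeout=timeout) as resp:
        return find_probability(json.load(resp))
```

Calling fetch_probability("enwiki", "damaging", 772810534) would request the same URL as the example link. A tool that compares ORES ratings with human ratings could loop over a list of revision IDs and store each result next to the human assessment.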

WikiProject assessments vs. external reviewers vs. ORES

Cross-posting from en:Wikipedia:Village_pump_(miscellaneous)#WikiProject assessments vs. external reviewers vs. ORES

Hey folks, I have been collaborating with some researchers who are publishing a dataset of externally reviewed Wikipedia articles (the sample was taken back in 2006). I'd like to take the opportunity to compare the prediction quality of m:ORES' article quality model with these external reviewers, but in order to get a good picture of the situation, it would also be very helpful to get a set of Wikipedian assessments for the same dataset. So, I have gathered all of the versions of externally reviewed articles in en:User:EpochFail/ORES_audit and I'm asking for your help to gather assessments. There are 90 old revisions of articles that I need your help assessing. I don't think this will take long, but I need to borrow your judgement here to make sure I'm not biasing things.

To help out, see en:User:EpochFail/ORES_audit.

ORES is a generalized machine prediction service that helps catch vandalism, measure the development of articles, and support student editors. The more we know about how ORES performs against important baselines, the better use we can make of it to measure Wikipedia and direct wiki work. --EpochFail (talk) 22:08, 6 April 2017 (UTC)

Translations

Hello

I'm sending messages to many communities concerning the Edit Review Improvements and the related filters. Most of these communities haven't started the process to get ORES, and I suggest that they start it so they can have the new prediction filters.

ORES pages are in English only, which may be a blocker for people who want to know more. Can I mark the pages for translation?

Thanks, Trizek (WMF) (talk) 12:53, 21 April 2017 (UTC)