Grants talk:IdeaLab/Search recent changes: Difference between revisions

From Meta, a Wikimedia project coordination wiki
Latest comment: 8 years ago by DGarry (WMF) in topic Right of first refusal

Revision as of 17:46, 30 March 2016

Right of first refusal

@WMoran (WMF): you and your staff are most welcome to do this, as are volunteers. EllenCT (talk) 20:52, 16 March 2016 (UTC)

@EllenCT: Hey! Thanks for the proposal. Discovery will not be able to work on this, for two reasons.

  1. The proposal is outside our current plans for our work in the next fiscal year (July 2016 - June 2017). This isn't to say that the proposal isn't valuable, but given the limited resources the team needs to prioritise our work based on where we think we can see the biggest rewards for our efforts.
  2. Based on my limited understanding of the technology at play, this proposal is actually quite a large undertaking. Our search index is already very large (on the order of terabytes), and it only takes into account the current state of the pages on-wiki; adding in every previous revision of every page would likely cause the index to grow to an unmanageable size. There may be ways to mitigate this problem and allow the project to succeed, but that investigation would take time that Discovery does not have.

That said, I think this is a promising project, so best of luck! --Dan Garry, Wikimedia Foundation (talk) 20:29, 28 March 2016 (UTC)

@DGarry (WMF): thank you for your comments. How would you estimate the index size and work requirements if recent changes for the English Wikipedia were made available for the past two months only? EllenCT (talk) 01:43, 29 March 2016 (UTC)
@EllenCT: I don't know, and as mentioned above I don't have spare capacity to investigate. That said, off-hand, it seems like limiting how far back you store the data would be a good approach to limit the hardware requirements. --Dan Garry, Wikimedia Foundation (talk) 17:45, 30 March 2016 (UTC)

Maybe not public-unauthenticated?

While this would be a useful tool for the reasons identified (and perhaps others), it might also enable abuse (in particular, stalking someone with a contrary PoV). It might be appropriate, for example, to restrict not-logged-in users to only a random sample of the past few hours, or expose a log of search-term ranges and keywords to the public, or to administrators. — The preceding unsigned comment was added by DavidLeeLambert (talk)

@DavidLeeLambert: interesting! Can you describe in more detail the scenario you envision? EllenCT (talk) 03:09, 19 March 2016 (UTC)