Knowledge Engine/FAQ

Goals

What is the goal of the Knowledge Engine

1. I would ask a rather simple question: what is the ultimate goal of the Knowledge Engine project? I am interested not in the goal of the Knowledge Engine grant (12 months and a minor fraction of the budget) but in that of the entire project (4 * 18 months = 6 years), so what do we expect to be accomplished in 6 years if the Knowledge Engine project is successful? Thanks — NickK (talk) 21:24, 17 February 2016 (UTC)[reply]

I can answer in my personal capacity as a board member - I don't speak for Lila nor for the board. First, it's really important to understand that the idea of "Knowledge Engine" is dead, replaced by a broader concept of "Discovery". The ultimate goal of the project is, to me and I think to all people who have seriously considered it, not yet clear. This is really important to understand in an atmosphere where there is a lot of concern that there is some kind of secret master plan. Here's some stuff that we all know. We know that our internal search engine sucks. We know that the http://www.wikipedia.org 'home page' isn't very useful - it is more or less exactly as useless as it has been for more than 10 years. We know that the ways that people are finding things online are changing. We know that mobile is having a huge impact. We know that improvements to search - some low hanging fruit - will require some engineering but won't be that hard. So in my view, the Knight grant is a first step in figuring out what we should do about "discovery" in the long run. There is not at this time enough research to really have formulated a longer term plan, although of course many people will have great ideas about where we should be heading.--Jimbo Wales (talk) 16:34, 20 February 2016 (UTC)

@Jimbo Wales: Thank you for your answer. I am really surprised to find out that Knowledge Engine is dead as the title of this page says "Knowledge Engine/FAQ", not "Discovery/FAQ". This does not add any clarity, hence I have to ask three additional questions:

1. Is Discovery just a new name for the Knowledge Engine project, or is it a new project with different goals?
2. What is "Discovery" about? The word "discovery" can mean anything from a search engine (discovering websites) to exploring indigenous languages in Sub-Saharan Africa (discovering languages). I do see some small projects (improving search, improving the wikipedia.org portal), but what is the general idea of Discovery, and what kind of activities does it cover?
3. Is the Knowledge Engine timeframe (4 stages * 18 months) still relevant?

Thank you in advance for your (or Lila's, or someone else's) answers to these questions — NickK (talk) 01:55, 21 February 2016 (UTC)

NickK, a lot of this has been answered elsewhere (see here for example). The answer to 1 is messy - Namewise, KE transitioned to "Search and Discovery" and then just "Discovery". Lila and the WMF board have not been clear about what was happening under those various names, nor what things they were committed to over the last year; we are hoping to get disclosure of that. They are saying now that Discovery has only modest, preliminary goals. For #2, what Discovery is doing, see Discovery's FAQ here. About #3, WMF management and the board have not spoken to the longer timeline. They have talked a lot about Discovery being preliminary but they just stay silent on the longer arc or say that there is no longer term strategy. It is hard to believe that any organization as serious as WMF, with serious people on its board, would not have a long term plan, nor that a tech executive like Lila would operate without the context of a long-term strategy, and I and many others are doing what we can to try to elicit disclosure of that too. Maybe when Lila returns, she will speak to that. She has not written anything here, at her FAQ/AMA, since the 18th, and I am looking forward to her coming back. That is where things stand, as far as I understand them. Jytdog (talk) 04:30, 21 February 2016 (UTC)

Thanks @Jytdog: for your comment. I have read MaxSem's answer before asking my question here: it is quite clear that Discovery team is a bunch of people who are doing cool and useful stuff at the moment but it is not clear what is the long-term goal of the KE/Discovery project. It would be very surprising to me as well to see Lila launch a project without clear goals and with very vague strategy, that's why I hope to get clear answers to my questions — NickK (talk) 12:25, 21 February 2016 (UTC)[reply]

Yep. Jytdog (talk) 21:21, 21 February 2016 (UTC)

2. I have posted a bunch of questions on the Discussion page of the Discovery team's FAQ, here. At a high level, I would like the WMF to clearly - in relatively plain English without technobabble - lay out the vision for what the KE will do, what kind of results it will produce, and how those results relate to existing WP content. I also want to understand how all this relates to WMF's commitment to making existing WP content more available to the public. I am looking for something as clear as the following, which is what I understand the vision to be: people will enter queries at wikipedia.org, and then the KE will query Wikidata and whatever linked datasources there are, and then will construct WP-article-like content that it will present as the result, on the fly, per query. Like this. Completely bypassing existing WP content. That is what the technobabble, "an open channel beyond an encyclopedia", seems to mean. It seems that the role of the editing community will be to curate Wikidata. To me this means that WMF intends to walk away from the Wikipedia-that-exists and remake it as something completely different. Without even talking to the community about that. Please do explain how the KE will work, and how it relates to existing WP content, and what the role of the editing community would be in that vision. I also would like to understand what the WMF's commitment is going forward, to Wikipedia-as-it-is. Thanks. Jytdog (talk) 22:03, 17 February 2016 (UTC)

You are wrong about what is currently planned. KE, as you describe it, is dead, and has been dead for a long time. Using Wikidata to improve search results is a great idea, but I personally think it would be folly to imagine that using Wikidata to generate WP-article-like content on the fly is likely to be a viable thing, ever. But some low hanging fruit strikes me as very straightforward. Not every search is a simple search for a concept like "Elvis Presley". Sometimes people search for "When did Elvis die" or "Pictures of Elvis" or "Tours of Elvis home". You can see how Wikidata, Wikimedia Commons, or Wikivoyage might be intelligently useful for such a search. It's worth exploring such ideas, and not just casually, but with some real investment: what would it take to improve search? what would the result look like? what existing open source technologies can be leveraged to do this cheaply? what would we have to develop from scratch and how hard would it be?--Jimbo Wales (talk) 16:34, 20 February 2016 (UTC)

I hear you on the importance of improving search and I hear it that Discovery is working on that. I realize it would have been more helpful had I framed my question to encompass what was envisioned, as well as what is envisioned now. Some of us are looking for a disclosure of what was, as well as what is, and in a way that makes sense out of everything that has happened, including whatever it was that upset Doc James so much. This is not something that we can just let go of. I hope you can come to understand that. So -- when you write, "KE, as you describe it, is dead", it ~seems~ that you are saying that what I am describing was actually the vision for a while. Is that the case? Jytdog (talk) 20:36, 20 February 2016 (UTC)[reply]

It will be difficult to search every Wikimedia project from the Wikipedia search box. A first step could be the solution Firefox uses for choosing between different search engines. And maybe this way we could create an option for open content outside Wikimedia to integrate into the community. I would like to see GLAM institutions integrate. As far as I understand this proposal, WMF has to build an API for the search box, because there is no deep integration into our wikis. That shouldn't cost many millions. --Molarus (talk) 13:40, 22 February 2016 (UTC)

3. Are you thinking of delivering machine-created pages or snippets, or generating simple, Wikidata-based articles "on the fly", in response to English and/or non-English queries? (I'm asking because of the reference to Reasonator and Autodesc here, and this statement by Denny on the occasion of Wikipedia's 15th birthday: "I want us to think about ways how to achieve a billion articles. We need tools and workflows that go well beyond Wikidata and Content Translation to really achieve that goal. Ways to allow to create and maintain a knowledge base which abstracts from natural language, and ways to generate articles in any of our supported languages on the fly. This generators have to be as community-editable and creatable as the content itself, as anything else won't scale for our means.") --Andreas JN466 21:03, 17 February 2016 (UTC)[reply]

I think Denny's statement is super interesting but don't confuse the open discussion of ideas by board members (including me) as actual plans of the Foundation. If you're interested in Denny's ideas, I recommend talking to him - he's very smart and an expert on Wikidata.--Jimbo Wales (talk) 16:34, 20 February 2016 (UTC)[reply]

It has made it into some plans by Lila. Would be good for her to elaborate on what she was thinking. Doc James (talk · contribs · email) 17:09, 20 February 2016 (UTC)

When Jytdog asked you about this on Wikipedia, you said, "Much of what you have said is mistaken, and unnecessarily paranoid. No offense intended, so let me go through this step by step, line by line. First the idea that Wikidata could be used to 'construct articles' with 'no need for editors to edit actual article content' is pretty absurd from a technological point of view. Major breakthroughs in AI would be necessary. That isn't what is intended at all, obviously." and "There is absolutely no plan that I have ever heard from anyone that articles should be built on the fly from Wikidata."

I can only think of two explanations.

1. You are out of the loop and had never heard about these ideas until you read this page (yet you took the liberty of being condescending and patronising to Jytdog, who had heard of them).
2. You were aware that these ideas were being considered, and that development work has been done on them, but decided to withhold that information and misdirect Jytdog instead.

Is there a third option, and if so, what is it? Because if either of these explanations is true, then there is little point in anyone talking to you until you either inform yourself, or undergo something akin to a Damascene conversion (a pretty rare and unlikely type of event). Andreas JN466 21:32, 20 February 2016 (UTC)[reply]

How does such a big project fit into WMF strategy?

Compared to other software teams at the WMF this is a big project; the $250,000 grant only covers a small part of it. Presumably it wouldn't have been greenlighted unless management and trustees thought it had a significant chance of doing something important for the mission. What is the strategic benefit hoped for from the Knowledge Engine? If it was going to be a rival to Google then it could have ensured that those of our readers who used it still saw the fundraising banners that are the WMF's life blood and the edit buttons that are the community's life blood. But as improved search within the projects, what big thing does it achieve, other than reinforcing Wikipedia's less popular sister projects such as Wiktionary? WereSpielChequers (talk) 10:21, 18 February 2016 (UTC)

Seconding the question -- Tito Dutta (talk) 19:03, 18 February 2016 (UTC)

I've given my personal answer up above. I see many many problems with our current software, and I wouldn't personally put "Discovery" at the top of the list - I think fixing longstanding problems with the editing environment and providing tools that editors want and need is a higher priority. However, I think that Discovery, broadly considered, is important - particularly in the long run - and I support the idea of moving forward with some seriousness at figuring out what we can do without a lot of cost. This is where I see the Knight grant as playing a useful role - a starting point.--Jimbo Wales (talk) 16:37, 20 February 2016 (UTC)[reply]

Evolution of plans

Answered

Max Semenik, in a comment to the WMF blog post, says:
To clarify:
- Yes, there were plans of making an internet search engine. I don’t understand why we’re still trying to avoid giving a direct answer about it.
- There has never been any actual technical work on this project.
- The whole project didn’t live long and was ditched soon after the Search team was created, after FY15/16 budget was finalized, and it did not have the money allocated for such work (umm, was it in April? in such case, this should have been soon after the leaked document was created).
- I don’t think anybody but the certain champion of the project has considered competing with Google with any degree of seriousness.
- The scrapping was finalized in summer, after said champion and WMF parted ways.
- However, ideas and wording from that search engine plan made their way to numerous discovery team documents and were never fully expelled.
- Speaking of team name, “Discovery” is not about stage one from that leaked plan. The team was initially called “Search” then almost immediately after realizing it also works on non-search projects (like maps) it was renamed to Search and Discovery then just Discovery. At the time of the second renaming, we already had no plans of actually doing any internet search engine work.
- In the hindsight, I think our continued use of Knowledge Engine name is misleading and should have ended when internet search engine plans were ditched.
- No, we’re really not working on internet search engine.
- And will not work in the future.
- For shizzle.
Is this an accurate summary of what happened? Andreas JN466 01:18, 18 February 2016 (UTC)
This timeline is logical and comprehensible - it's also direct and honest. It demonstrates the evolution of an idea into a project that is, now, quite sensible/viable. Given that, what was the need for all the obfuscation and denials (both at the start, and also since)?? Wittylama (talk) 12:36, 18 February 2016 (UTC)

My recollection of events is close to Max’s.

Some things to point out: commercial search has not been a consideration since conception. The Knowledge Engine (KE) as described in the grant is limited to free, open sources of knowledge. The words “making an internet search engine” read to me as too broad. From my point of view, we already have a specialized search engine, though it is highly underutilized. This is www.wikipedia.org, and our current focus with the KE grant is to make it highly usable and highly efficient at surfacing Wikimedia content. Our focus is on integrating and highlighting the high quality, community driven content across our projects. We will evaluate surfacing non-Wikimedia sourced free, open knowledge (e.g. OpenStreetMap) on a case-by-case basis collaboratively and analytically with contributors to the Wikimedia movement on the product portal. To ensure the proper course of action going forward, deep analysis and lively discussion, based on real data from relevant tests, should precede any step forward.

I don’t agree we’ve officially “ditched” a project (Search/Discovery/KE); instead, we’ve made adjustments to an idea as our thinking and learning evolved, and analysis of feedback continued. We discussed our nascent, early idea with Knight Foundation, and in discussions with them and others since, we have integrated feedback and course corrected accordingly. I’d like to thank Wes Moran for helping us narrow down and discover our path to Discovery. I’d also like to thank our many outstanding team members at the Foundation who helped us keep our donor informed, and supportive, of these changes as they were made.

Mistakes are a natural part of evolution, and I am sincere in addressing those mistakes by creating a clear idea of what our Discovery team is working on now, and going forward. To do this, I’d like to spend more of my own, and the Foundation’s, time and resources looking forward - and less looking backward.

It is not that I do not sincerely regret mistakes of the past, especially surrounding communication and collaboration, but simply that we are unable to actually focus on our day-to-day commitments and duties to the mission when we are continually spending energy on matters which do not change nor solve today’s (and tomorrow’s) challenges. Additionally, and perhaps more importantly, we must begin to regroup and recharge, so we are strong, united and ready for the next set of challenges that we will face as an organization and as a movement. LilaTretikov (WMF) (talk) 18:26, 18 February 2016 (UTC)[reply]
Hi Lila. It is understandable that there are course corrections, and it is good to hear that WMF has been in dialogue with the Knight Foundation about course corrections. That does happen. I have worked with research contracts a lot, and the KF contract is like others in requiring that changes to the scope of the work being done be authorized in writing by the funder (paragraph 1 of the Basic Grant Conditions). The vision and scope in the now-published KF grant is the most reliable thing we have about the vision and scope - WMF is accountable to the KF via that contract in a way that WMF is accountable to no one else. It would help build trust around what you are saying, if you would make public the Knight Foundation's approval of the course correction and what was agreed to. Please let us know if you will consider doing that. I understand you will need the consent of the Knight Foundation, but if you agree to try to do this, please keep us abreast of what unfolds. Thanks. Jytdog (talk) 22:41, 18 February 2016 (UTC)
If the Knowledge Engine's scope eventually goes beyond the basket of Wikimedia projects, what criteria will a source have to fulfil in order to be included in the Knowledge Engine's search results? Will it be open-access sources only (i.e. excluding sources like the New York Times or Nature)? Andreas JN466 21:03, 17 February 2016 (UTC)[reply]
Hello Andreas. What do you think the criteria should be? We are very early in the development of these ideas. When we are ready to consider other sources we want to have the community involved in the decision process. I believe personally that not only should the sources be open-access, but they should be in agreement with our other values, neutral point of view, free license, etc. CKoerner (WMF) (talk) 17:54, 18 February 2016 (UTC)[reply]
CKoerner (WMF), I don't harbour any illusions that I as an individual can have a significant influence on the WMF's decision here. What I do know is that readers interested in finding knowledge online don't care whether a New York Times article they're reading is CC-licensed or "All rights reserved." They can learn from the NYT article just the same as from a Wikipedia page, and are typically blissfully unaware of the difference in licensing. The only one who really cares about the difference is Silicon Valley, because they can earn money from free content (by slapping ads on it or incorporating it in commercial products). So to me, as an observer, this is kind of a shibboleth to tell whether WMF is serving the reader's interests, or Silicon Valley's business interests. --Andreas JN466 12:13, 22 February 2016 (UTC)[reply]

To give another example, CKoerner (WMF), the Stanford Encyclopedia of Philosophy (SEP) is a superb, freely accessible educational resource, generally held to be far superior to Wikipedia's coverage of this topic area, but it's "All rights reserved". To me, SEP would be a prime candidate for inclusion in a knowledge engine designed to direct readers to worthwhile educational resources. How would you or Lila see that case? --Andreas JN466 15:22, 22 February 2016 (UTC)[reply]
You can't have a significant influence? Pshaw! That's how this whole crazy thing works! Individuals like yourself giving a hoot. :) Your examples are exactly what we have to discuss if we look to introduce new, non-Wikimedia, content (I hate that word) into our corner of the world. I'd like to think that people such as yourself, and many of the staff I've met, take things like licensing and copyright seriously. Just because folks outside the movement don't care doesn't mean we should give in/up! It's part of our values and deeply ingrained in our culture. Whatever we surface in search results would have to be congruent with that. CKoerner (WMF) (talk) 15:52, 22 February 2016 (UTC)
Chris, I don't agree that we would "give in/up" by pointing a user to an article in the Stanford Encyclopedia of Philosophy. On the contrary, I think we'd be betraying the Wikimedia vision statement ("Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment.") by withholding that option from them. Andreas JN466 16:56, 22 February 2016 (UTC)[reply]

Hi Chris. I'm an en.Wikipedia editor and on the board of Wiki Project Med Foundation, an independent nonprofit whose aim is, among other things, to improve the reliability of Wikimedia's medicine offering.
Regarding your comment "Just because folks outside the movement don't care [about copyright and licensing] doesn't mean we should give in/up! It's part of our values and deeply ingrained in our culture. Whatever we surface in search results would have to be congruent with that." The WMF's present mission is "to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally." So, what you say fits with that.
But my vision, and I think that of most of my comrades, and the stated vision of the WMF, is of a world in which every single human being can freely share in the sum of all knowledge. That's it. I'm not wedded to (and that shared vision doesn't lock us into) an ideology that says only free-to-reuse data is valid data. So, there's a kind of disconnect here, a dissonance, between the WMF's mission and its and our vision, that becomes apparent when we discuss search and discovery. It is entirely possible - and I would think desirable - for the WMF to rethink the present formulation of its mission statement if that statement is impeding our shared vision when the WMF expands its scope to embrace search and discovery. --Anthonyhcole (talk) 13:12, 24 February 2016 (UTC)[reply]
Apologies for putting such a narrow focus on things. My intent was to alleviate any concerns that the foundation would suddenly, without consultation, include content from sources not in line with our mission or values. Say TMZ or The Inquirer as two purposefully terrible examples :). I should not have surfaced my personal opinion (which is being influenced by these thoughtful responses!) as part of the answer to this FAQ. CKoerner (WMF) (talk) 15:00, 24 February 2016 (UTC)
Hi, I'm not sure what's the plan for this page (so please feel free to move this question as it suits you best; I just followed Lila's invitation to add questions here directly). Lila wrote on February 16 to wikimedia-l that she is "not considering" a "Google-scale" search engine, and that "[g]oing after general search engine traffic and users is inconsistent with our mission." In light of this, I'm somewhat wondering why the mock-up shown at w:Wikipedia:Wikipedia Signpost/2016-02-10/Special report includes non-Wikimedia and, indeed, non-open source search results. Does that mean there was a change of plans at some point? If so, can you elaborate on the considerations behind that? Or was the slide taken out of context? (If so, what is the proper context?) Thanks in advance, — Pajz (talk) 19:42, 17 February 2016 (UTC)[reply]
@Pajz: Thanks for the question. In agile product development teams, mockups are often intentionally made to explore, push boundaries, and evaluate ideas that would require significant investment to come to fruition. For example, a mockup is quite easy to make compared to a functional prototype or a more full product, so making mockups of ideas is a (relatively) cheap way to see how an idea fits together, and whether or not it works. It's perfectly normal for product development teams to make mockups that get discarded or put on a backburner, and for other work to be prioritised instead. Even mockups that do get implemented normally change in minor ways during implementation, due to (for example) technical concerns that were not known until implementation began. Mockups are not immutable commitments of future plans. Regarding the second part of your question, there was a plan to make a more general-purpose internet search engine; that plan was scrapped and no actual technical work ever took place on it (see this blog post reply by Max Semenik for more on that). Hopefully this answers your questions. Let me know if I can provide further information. --Dan Garry, Wikimedia Foundation (talk) 01:40, 20 February 2016 (UTC)
@DGarry (WMF): I understand why this mock-up was made. I do not understand why this mock-up was shown to the Knight Foundation in April 2015.
My interpretation of this mock-up is that the Knowledge Engine is not going to be a full-internet search engine, but that it will be a search engine that includes selected non-free data sources. Is this interpretation of that particular mock-up accurate?

Was the possibility of including non-free data sources in the KE under consideration in April 2015? Was this consideration serious enough that it warranted showing this to the Knight Foundation? Is the possibility of including non-free sources in the KE still under consideration today?
—Ruud 13:10, 20 February 2016 (UTC)

Pending

A number of journalists have commented on the apparent mismatch between how the project is characterised in the Knight Foundation grant agreement, and more recent statements about what the project is and is not. Examples from the press: [1][2][3][4] In the view of all of these writers, this discrepancy has caused "confusion". How did that mismatch come about? --Andreas JN466 21:08, 17 February 2016 (UTC)[reply]
+1. Most specifically: if (as has been asserted in various places) the ambition to create a broad, general search engine ended with the departure of Damon Sicore (June 2015), why did the language of general search engines survive into the final grant proposal (September 2015), and the language used by the Knight Foundation itself on its web site? -Pete F (talk) 17:36, 24 February 2016 (UTC)[reply]
Pretty simple really: if you are Knight investing in early research, you want a broadly worded grant so that you do not crush thinking into looking at the promising. And if you are the WMF, you want any restricted grant to be as unrestricted as possible. Alanscottwalker (talk) 19:22, 25 February 2016 (UTC)
How do you feel about usability experiments where ordinary people recruited through, e.g. neighborhood postbill flyers or Craigslist gigs ads, come in to some office without any Wikimedia or Wikipedia branding (but with usability studies branding), are offered payment for their agreement to be observed answering questions requiring general reference information with a web browser on low-end phones, laptops, and tablets, with their behavior analyzed to identify deficiencies in Wikimedia search and default search engines' interface with Wikimedia projects? EllenCT (talk) 15:39, 21 February 2016 (UTC) Do you agree with Jimbo below? Is this a way to salvage the effort? EllenCT (talk) 08:41, 2 March 2016 (UTC)[reply]
I strongly support usability experiments with various classes or categories of users on a variety of devices. We have a poor understanding, mostly anecdotal and not sufficiently systematic, about how people find us, what they are looking for when they do, and also importantly how they fail to find us when they are looking for what we provide. I just went to Wikivoyage and looked at recent changes. Someone just edited the entry on Lapu-Lapu, the section about arriving by plane. Then I went to Google and searched for "Lapu-Lapu by plane" - we are the 8th link, and the ones above us are all selling plane tickets. I think that's interesting, but what I don't know is whether people who do searches like that are looking for knowledge resources or if they are shopping. This is a totally random example. I think we - in the communities - would like to know more, and I hope that the Foundation, which is uniquely positioned to have the resources to do this kind of research, does a lot of it and shares it widely.--Jimbo Wales (talk) 15:55, 21 February 2016 (UTC)

Financial questions

Answered

Hello, I am interested in the costs of "Discovery". I understand that names and goals have changed over the last months. Putting the past aside and seeing it from this moment, I would like to know whether the project/team/program "Discovery" is permanent or whether it will exist for a definite time. What will "Discovery" cost within the first year, and what in the (projected) years to come? And how much do you expect will be paid by the Knight Foundation in total? Ziko (talk) 01:03, 18 February 2016 (UTC)[reply]
I am checking specific numbers as we put them into the plan and will post below with the similar question from SarahV. LilaTretikov (WMF) (talk) 07:11, 18 February 2016 (UTC)

While it is difficult to describe any product team as “permanent” (we cluster our work where there are projects and development needs), the need to focus on content discovery is unlikely to go away. The need for robust and effective search capabilities will certainly remain, but the amount of resourcing it requires and the amount of innovation may vary over time; often we make these changes within the year boundary. There have long been requests to put more resources into improving our search. It is fair to say it is a long-term project and a long-term team. The size and scope of the team or efforts will depend on the initial testing and feedback we receive. Any further plans and budget requests will be posted as part of the annual planning process. The team is modest in comparison to the Reading and Editing teams. If the Discovery team continues at its current levels of resourcing, FY 2015-2016 costs will be USD $2.4 million. The Knight Foundation grant has provided USD $250,000 towards our Discovery project and is not expected to provide further funding as of today. LilaTretikov (WMF) (talk) 18:48, 18 February 2016 (UTC)
The last estimate I have seen was 32 million over about 6 years, but that estimate was described as conservative. I also believe that it only included the team itself and not the efforts required by other departments. Doc James (talk · contribs · email) 19:14, 19 February 2016 (UTC)[reply]
The grant application to the Knight Foundation says that the "Search Engine by Wikipedia" budget for 2015–2016 is $2.4 million, and that this was approved by the Board of Trustees. [5] What was the date of the Board meeting at which this was approved, and how was the project described at that meeting? SarahSV ^talk 22:34, 17 February 2016 (UTC)[reply]
- Noting that I've requested this information on Wikimedia-l too (12 February and 17 February). SarahSV ^talk 22:41, 17 February 2016 (UTC)
And also Pine on February 12, and me on February 15. Wittylama (talk) 11:24, 18 February 2016 (UTC)
If it was "approved" this occurred before Wikimania in Mexico. Doc James (talk · contribs · email) 19:23, 19 February 2016 (UTC)
While I won't comment on the precise budget for the Discovery program, I will note that the Board of Trustees approved the 2015-16 Annual Plan on 28 June 2015.[6] I do not know how reasonable it is to assume that the Board had a full departmental breakdown of spending, as none of the public documentation includes that kind of information. I will note, however, that it was approved two days before the close of the fiscal year. Risker (talk) 18:43, 20 February 2016 (UTC)[reply]
This has been addressed by Sj, who was on the Board at the time:

First, the Board did not specially approve a $2.5M budget for 'stage 1' of the grant proposal. Financial approvals by the board happen in bulk every May-June when it reviews the annual plan. The board approved the 2015-16 annual plan, with funding for the entire staff, in June. This included funding for the discovery team. The board had not seen any version of this grant at that point; the team's budget was funded based on its public projects and targets. The budget mentioned in the grant proposal seems to be the annual budget for that team. [7]

SarahSV ^talk 00:41, 27 February 2016 (UTC)

Impact of open sourcing

If the Wikimedia community engages in "public curation of relevance", then search engines like Google and Bing will (via the API) be able to take full advantage of any insights gained in order to improve their own products, right? (For example, if Discovery focuses on open-access sources, then Google would be able to add an "open-access" option to their search engine to complement the "Web", "News" etc. options and use the Wikimedia results; or they would be able to add snippets from the best sources Wikimedia volunteers identify to their Knowledge Graph, wouldn't they?) --Andreas JN466 21:03, 17 February 2016 (UTC)[reply]
This is correct, and not unlike our current approach to serving open knowledge. All of our volunteers’ contributions are freely used and reused today. This is a fundamental tenet of our values. What a highly usable discovery portal would do, however, is strengthen and unite the individual elements that compose our most enduring and widely-recognized asset: our brand. Wikimedia maintaining a strong, trustworthy brand is vital to both our mission, and to the open knowledge movement at large. LilaTretikov (WMF) (talk) 00:51, 18 February 2016 (UTC)[reply]
Lila, in my view WMF's biggest asset is the good will it has with the volunteer community, which actually creates and maintains the content under the brand. Yes, among the various bits of IP that WMF actually controls, the trademark and logo are by far its strongest assets. But lose the community and that brand becomes worth very little. How much is the Nupedia brand worth today? What do you say to this? Jytdog (talk) 01:07, 18 February 2016 (UTC)
Jytdog, Wikimedia refers to the movement in total, not to the Wikimedia Foundation. :) It encompasses volunteers as well as staff. --Maggie Dennis (WMF) (talk) 13:56, 18 February 2016 (UTC)
Mdennis (WMF), what I was reacting to was Lila's statement that in her view the biggest asset is the brand. It is how for-profit executives think - I understand that. It doesn't make it less disappointing to hear. Jytdog (talk) 20:41, 18 February 2016 (UTC)

Thank you. The board and the mailing list recently discussed charging re-users for API usage – an idea I tend to be in favour of as long as no one is able to buy "premium" access, because such payments would involve a recognition of value. I don't think unpaid volunteer work and donor-supported bandwidth costs should be expended so that Google, Bing, Apple etc. can get commercially valuable relevance data for free that will add billions to their ad or product revenue. Is this idea of charging for API usage still being entertained? --Andreas JN466 11:42, 18 February 2016 (UTC)[reply]
If you think there was something Google, Microsoft or Apple could do to add billions to their ad or product revenue, don't you think they would already be doing it? It's not as if those companies aren't holding billions in overseas bank accounts that could be spent on such a project. Geni (talk) 18:51, 18 February 2016 (UTC)
As Google Knol demonstrated, there are some things you can't buy; volunteer engagement is one of them. Andreas JN466 02:58, 19 February 2016 (UTC)
Again you claimed billions. What would they need volunteers for? Geni (talk) 05:22, 19 February 2016 (UTC)
Human curation of relevance supplementing their algorithms, specifically with respect to open-access sources they can reproduce freely on their own pages. As for billions, "Google's advertising business drove its sales strength in the fourth quarter [of 2015]. Revenue from advertising climbed to $19.08 billion, up 17 percent from the previous year, as sales for Google websites specifically rose 20 percent. The Internet giant's aggregate paid clicks, a key advertising metric, increased 31 percent from the previous year, beating consensus expectations of about 22 percent ... The Knowledge Graph panel (as well as the answer box) is one of the product improvements driving that increase in profitability. It trains users' eyes to pay most attention to the parts of the screen (top and top right) where the paying ads are.

So the question is whether human curation of search engine relevance – which isn't something the community has traditionally considered one of its main tasks – is worth volunteers' while. Traditionally, the community has highlighted good sources – open access or no – by citing them. So, when we speak of the Wikimedia movement getting into relevance curation for search engines (Wikimedia's own and others), what I'm trying to understand is, what's in it for the Wikimedia movement and volunteer community? If commercial players derive substantial benefits from relevance curation, while imposing huge overheads on donor-supported Wikimedia bandwidth, maybe it's right that they pay for the privilege? Andreas JN466 13:48, 19 February 2016 (UTC)[reply]

What is meant by Google being able to reduce the success of the project?

The grant application says under "Key challenges that could disrupt the project" (p. 13):

Risks: Two challenges could disrupt the project: 1. Third-party influence or interference. Google, Yahoo or another big commercial search engine could suddenly devote resources to a similar project, which could reduce the success of the project. This is the biggest challenge, and an external one. ... The way to mitigate the first challenge: Proceed with the search engine project as deliberately as possible – which is what the Wikimedia Foundation is doing.

This is one of the passages that people have wondered about, because it's not clear how Google or Yahoo would be in a position to reduce the success of a Wikimedia search project. SarahSV ^talk 06:56, 18 February 2016 (UTC)[reply]

Former VP of Engineering Damon Sicore, who as far as I know conceived the 'knowledge engine', shopped the idea around in secret (to the point of GPG-encrypting emails about it) with the idea that Google/etc form an 'existential threat' to Wikipedia in the long term by co-opting our traffic, potentially reducing the inflow of new contributors via the 'reader -> editor' pipeline. More ambitiously there was some talk about trying to capture more total web search mindshare/user-share... obviously since Google/etc have butt-tons of money they can much more effectively grab the user share, making our potential project unpopular until it gets canned... I guess? Given the secrecy at that stage, I assumed Damon was just a bit ... 'colorfully' paranoid about things like Google hiring people away or organizing their offerings to more thoroughly hide us... obviously if we'd gone through with a giant search engine it would have been public knowledge before we *did* it, so it never made much sense to me to hide it other than in coordinating an initial organizational/PR 'blitz'. It kind of feels like Lila stayed in 'KE is secret project' mode while everyone else moved away from it, but again I've not been in the loop for this stuff... --brion (talk) 17:50, 18 February 2016 (UTC)[reply]

Brion, thanks for the reply. The secrecy has been the most puzzling thing. Improving internal search is a great idea. Extending it to other sources is also interesting, and I like the idea of using wikipedia.org. But the secrecy has made people think that some other, huge thing is being planned. SarahSV ^talk 02:07, 19 February 2016 (UTC)[reply]

It is important, most likely, that people know that Damon's secrecy was not something that was known to me or the rest of the board. I've only yesterday been sent, by a longtime member of staff who prefers to remain anonymous, the document that Damon was passing around GPG-encrypted with strict orders to keep it top secret. Apparently, he (and he alone, as far as I can tell) really was advocating for taking a run at Google. The idea got no traction, and certainly was never part of any board deliberations. Damon is no longer with us, and the project did not move in that direction, so for me, I think of it as more of a "brain storming" concept that he was pushing but which didn't go anywhere. I don't know Damon (met him once or twice) so I have no idea why all the cloak and dagger.--Jimbo Wales (talk) 16:43, 20 February 2016 (UTC)[reply]

Thanks Brion, that seems about right. The issue I have had is not so much the idea itself. It is the purposeful lack of transparency around the idea. Building an open search engine and worrying about Google finding out is a big enough thing and one we were talking about spending 32 M over 6 years on. Now 32 M is nothing for Google but it is a fair bit for our movement. Doc James (talk · contribs · email) 02:28, 19 February 2016 (UTC)

SarahSV, I am one of them. The "open channel beyond an encyclopedia" from the scope of the Knight Foundation grant, that would be accessed from wikipedia.org, remains a great point of concern for me - that "beyond" = "bypassing." I really do want to hear what the WMF board's vision is for promoting and better enabling the Wikipedia-that-exists (the encyclopedia) even as it works to provide "knowledge-graph-like" results through wikipedia.org (the "channel beyond"). I want to hear a vision that opens both of those boxes at the same time. That set of questions is at the top of the list now and I am looking forward to Lila addressing them. Jytdog (talk) 05:14, 19 February 2016 (UTC)

Jytdog, I think "beyond" just means "as well as" in this context. SarahSV ^talk 05:28, 19 February 2016 (UTC)

Maybe, but that is quite a strain on "beyond". :) I would grant you just poor word choice, but Denny's vision as expressed by Andreas away up above is about walking away from WP-as-it-exists and toward that, this video shows the discovery team playing with it just last summer, and Approach 6 discussed here says it too. I would be happy if you are right (or it was just bad word choice), or opening up only one box at a time, but I am looking forward to Lila's answer about how the two fit. Jytdog (talk) 05:57, 19 February 2016 (UTC)

Branding

Answered

Was the recent seizure of the Wikipedia.org portal related to any of the Knowledge Engine plans? --Yair rand (talk) 14:51, 18 February 2016 (UTC)[reply]
Yes this is where the Knowledge Engine is to "live". And I think many of us agree it could be improved. But I personally believe those improvements should be done with community involvement. Doc James (talk · contribs · email) 03:23, 19 February 2016 (UTC)[reply]

Nothing was seized, stolen, or locked away. We had a discussion, in public (https://phabricator.wikimedia.org/T110070), with the community members who were involved in the maintenance of this portal. Those who actively contributed to the upkeep of the portal agreed with the decision to move.

By moving the Wikipedia.org portal (not specific to any language Wikipedia) we are able to standardize the maintenance of all portals to be the same (fewer tools and processes to muck with) and provide a better foundation for future improvements, such as A/B testing, understanding JavaScript support, and further enhancements to the portal experience.

The process for updating the portal remains the same: contributors make suggestions for edits and they are implemented by volunteers and Wikimedia Foundation staff when needed. The difference is that it's no longer the sole portal managed by a wiki page containing more HTML than wikitext that was inflexible and updated by hand (for the counts of articles). CKoerner (WMF) (talk) 20:13, 19 February 2016 (UTC)

Pending

The various sketches show the search engine being hosted on wikipedia.org and branded as Wikipedia. However, there are two problems:
1. Wikipedia is an encyclopedia that has search capabilities, and is not a search engine (even just for Wikimedia sites) backed by an encyclopedia. The sole purpose of Wikipedia is to build an encyclopedia and any deviation from this purpose will dilute the brand. (This does not include adding non-intrusive elements to Wikipedia's search results page that highlight sister project content.)
2. The contents of wikipedia.org are controlled by the community.
Both of these become a non-concern if this project is given a separate brand with a separate domain (e.g. Wikisearch or Wikimedia Search?). Is that going to happen and will the look and feel of the landing page be controlled by the community? MER-C (talk) 13:20, 18 February 2016 (UTC)[reply]
The contents of wikipedia.org are no longer controlled by the community. It was seized by the WMF some months ago, and they have no plans to give it back. --Yair rand (talk) 15:29, 18 February 2016 (UTC)
The plan as I read it is to expand what the name "Wikipedia" means. Thus Wikipedia would no longer be just a wiki-based encyclopedia, it would also be this new knowledge engine which is planned to go at www.wikipedia.org. As mentioned, that page currently gets "200 to 300 million pageviews" per month.[8]

We all realize that we have one primary brand and that is "Wikipedia". I guess the question is should we try to leverage this brand for all the sister sites? We would then have Wikipedia Commons, Wikipedia Voyage, Wikipedia Source, which could still be abbreviated WikiVoyage, WikiSource etc. It is an interesting idea and one that deserves discussion. Doc James (talk · contribs · email) 03:18, 19 February 2016 (UTC)
I disagree, for reasons outlined at Wikimedia Foundation Board noticeboard#Build or rebuild. Layering over the Wikipedia model, not changing it, seems to me the way forward. Rogol Domedonfors (talk) 22:42, 19 February 2016 (UTC)

I entirely agree with James. I think we'd be much more successful with our non-Wikipedia projects, if they made use of the Wikipedia brand. I don't think a significant part of our readers would be able to tell whether Wikivoyage or Wikitravel is the wiki affiliated with the people behind Wikipedia (i.e. Wikimedia). "Wikipedia Travel", "Wikipedia Books", "Wikipedia News" etc. would be much more obvious --Tobias talk · contrib 12:17, 20 February 2016 (UTC)[reply]
I am tentatively supportive but felt that such a major change should only be carried out following discussion and if there was significant community support. Doc James (talk · contribs · email) 15:43, 20 February 2016 (UTC)

Curation

Pending

You have pitched the noncommercial nature of the KE pretty hard. If Wikidata will remain something that "anyone can edit" and editors' privacy will be as strictly protected as it is on en-wiki, how will the integrity of Wikidata be maintained? Please open the "privacy" box and the "integrity" box fully and at the same time, when you answer the question. If you are not aware of it, SEO companies are already writing articles in their trade magazines about how to manipulate Wikidata to benefit their clients. Thanks. Jytdog (talk) 22:18, 17 February 2016 (UTC)
I am concerned about the "curation" element of the Knowledge Engine proposals. While I am assured that current plans do not involve asking the volunteer community to do more or different work, I would like to have the same degree of confidence that no proposals will be considered for curation that would rely on volunteer effort without a very clear community consensus that that effort is likely to be forthcoming, either from within the existing community or with a very clear pathway to developing new parts of the community to support the curation required. Rogol Domedonfors (talk) 22:42, 19 February 2016 (UTC)

Miscellaneous

Answered

Would you please explain why you are putting out statements to the media, saying: "What are we not doing? We’re not building a global crawler search engine." As far as I know, nobody has ever said WMF was trying to build a crawler. It seems to me that you are trying to divert people from the real issue - namely that the vision is that Wikipedia's search results will be "better" for certain kinds of queries than the results people can get through commercial search engines (this is the argument made to the Knight Foundation), and that also having better intra-WMF site searches will help keep people who are already in our domain, in our domain. All of that is clearly competing with Google and other commercial search engines for certain kinds of searches, and keeping users from leaving us for them. Would you please address why you are diverting people from the point with this "crawler" stuff, and not addressing this plainly? Thanks. Jytdog (talk) 04:56, 18 February 2016 (UTC)
Crawler is a technique used to implement a broad commercial search and index the entire web. This is not, and has not been, a goal of the WMF. I appreciate your question, and your understanding that a “crawler” is not our goal -- however, responses and inquiries indicate some have interpreted we are doing this. So reinforcing our position on this distinction seemed important in the blog.

I do not see the Wikipedia portal producing necessarily better results than a commercial search engine, but rather, results of a different nature. In my eyes, the distinction is really made at user motivation. When a user opens a search engine, are they looking to find something, or learn something? Users who’d like to find something can easily use one of the many available search engines and they will be returned a mix of organic and sponsored results, and will then eventually narrow down their search and find what they are looking for. Users who’d like to learn something pretty much have to do the same thing. And this is precisely why I’d like to improve www.wikipedia.org, to make learning-motivated searches much more efficient than generalized searches. For example, if a search for Dr. Faust in Google (on mobile) yields the local doctor in my area (followed by a Wikipedia entry), in Wikipedia I will simply find Goethe’s Faust.

Thus, your assertion that “for certain kinds of searches” we would be competing with other search engines seems correct, and is true even today. We are improving our existing search portal to empower motivated, inspired learners and knowledge seekers to learn from a trusted, free, non-proprietary, openly available knowledge source. Also, a source which makes search and learning an irresistible journey.

Currently, we receive 200 to 300 million pageviews on our www.wikipedia.org portal page, per month. Google receives 100 billion, of which half come from mobile. We are a 58.5 million dollar non-profit, while Google is a 67.39 billion dollar commercial corporation. Google is a broad range commercial search engine, one product of many, owned by the Alphabet holding company. We are not competing for market share on generalized searches. We are simply trying to help keep our movement of free, open knowledge alive and relevant as times and user behaviors change. LilaTretikov (WMF) (talk) 09:17, 18 February 2016 (UTC)
Thanks for replying but the beginning is really off-target. What I wrote was that almost nobody has been writing that WMF is building a crawler; the thrust of the concerns is not about that. Putting the statement out there, that WMF is not building a crawler, seems to be a pretty classic diversionary/spin tactic as far as I can see. Which is not the kind of behavior I look for from the WMF. But it seems you don't want to address that. Glad we are on the same page about the areas where we always have competed and will keep competing with commercial search engines. Jytdog (talk) 20:46, 18 February 2016 (UTC)[reply]
A survey has shown very low staff morale at the Foundation and there have been a number of departures, notably in "Community Engagement", including people resigning without even having another job to go to. Is the Knowledge Engine the cause of this, and if so is it because of the way it was hidden from the community or something else? WereSpielChequers (talk) 09:57, 18 February 2016 (UTC)

Hello again WereSpielChequers. I joined the Foundation as part of community engagement less than a month ago. I was fortunate enough to attend a few Wikimedia Foundation supported events over the last few years and thought, “Hot damn they are doing some cool things.” I left a comfortable, steady job in healthcare IT to do this work because it excites me. As a volunteer I wasn’t able to dedicate as much time as I would have liked to improving the work of the movement.

Yes, some folks at the Foundation are frustrated. Some have left out of that frustration and burnout. Our leadership - hell the whole organization - is facing some incredible challenges as we grow and the social impact of Wikipedia continues to be ingrained in cultures around the world. It’s not easy work. I’m empathetic to anyone who is feeling down.

There isn’t a single person I’ve met at the Foundation that doesn’t consider themselves an equal part of the movement - including caring an absurd amount about doing the right thing - which is often the hardest kind of work.

So, short answer to your question. The term Knowledge Engine or more specifically the Knight Foundation Grant is not the only cause of low morale, but the amount of secrecy and lack of clear communication around it definitely contributed. This FAQ, the folks from the Foundation here answering questions, and the direct and hopefully illuminating information is proof that we are learning and moving forward. CKoerner (WMF) (talk) 19:00, 18 February 2016 (UTC)

Hi User:CKoerner (WMF), I'd like to extend on that a bit. First off, it's exciting to have you at WMF, you've made a great first impression, and you've been hired into a critical role in a challenging position. That said, it was indeed the Knowledge Engine that prompted both WMF Fundraising and Doc James to raise deep concerns about transparency, integrity, accountability, community awareness, fiscal responsibility, and strategic alignment within and between WMF and the Community.

It was Doc James's fierce investigations into what was actually proposed to our Board and then actually approved by Knight Foundation that led concerns about executive leadership to pretty much explode. It was our Fundraising team's denied insistence that we fairly and accurately represent our motives and our intentions to a) Knight Foundation; b) WMF's Board; c) WMF staff, including the Discovery team itself; and d) the community and our many affiliates--that revealed deep breaches of trust and serious gaps in information among these parties. It was the continued lack of transparency around the published grant's explicitly described scope that worsened the belief and furthered evidence that staff were having something 'put upon' or 'pulled over' us.

Indeed there are many other concerning issues that have brought WMF morale to a crippled state, but Knowledge Engine was both the catalyst for and in many ways the epitome of those concerns. Having been in the CE department since Lila arrived (or its pre-reorg equivalent), I can say that very much it has been an anchor for discontent and drowning morale--albeit not the only weight. Jake Ocaasi (WMF) (talk) 18:07, 19 February 2016 (UTC)

Thanks for the welcome Ocaasi (WMF). You are not the only staff member to reach out to me on my poor interpretation and summary. I apologize for presuming to speak too simply for such a complex matter. CKoerner (WMF) (talk) 20:25, 19 February 2016 (UTC)

Pending

The WMF is very good at economically hosting large heavily used databases on the Internet. It runs a top ten website in terms of popularity, and it does so on a budget smaller than most top thousand websites. But it doesn't have a good record as a software house. AFT, Mediaviewer, Flow and Visual Editor have cost a lot of money, damaged relations between the WMF and the community, and left a suspicion that even when the WMF is, as with V/E, trying to write software that the community wants, it is capable of doing so in a way that does more harm than good. With that track record is it wise to go for another major software investment that doesn't align well with the movement's needs? WereSpielChequers (talk) 10:33, 18 February 2016 (UTC)
We believe improving search does align quite well with the movement’s needs; it was repeatedly mentioned in our 2015 strategy consultation, where it was one of the top 15 themes of comments, especially by anonymous responders. Facilitating finding the content our movement produces and curates is a pretty critical component of disseminating it effectively. --CKoerner (WMF) (talk) 20:16, 19 February 2016 (UTC)[reply]
Hi CKoerner. Going back to my own submission in 2015 I think it was my point 7, that's higher than fifteenth for me. So I wouldn't dispute that there is some alignment with community needs, but it doesn't align well with community needs. For example why give search more resources than say GLAM or Education? Especially given the WMF's sub optimal record in software development? WereSpielChequers (talk) 20:59, 19 February 2016 (UTC)[reply]
In this statement about the Knowledge Engine, Lila Tretikov answered the question "Why should the community and staff support this decision of our board and leadership?" with the following: "I would hope that for staff, the answer to this question is clear." It's not clear to me, so I'd like to request a clarification, especially in light of (1) the record dissatisfaction among staff, (2) the high level of staff resignations and medical leaves, and (3) the widespread allegations of unfair treatment by the hierarchy. Cenarium (talk) 02:05, 24 February 2016 (UTC)