Grants talk:IEG/Understanding the English Wikipedia Category System
- 1 Few questions
- 2 Eligibility confirmed, round 1 2014
- 3 Prior work?
- 4 Interesting proposal
- 5 Related Work
- 6 Phases
- 7 Number of category links clicked
- 8 Aggregated feedback from the committee for Optimizing Wikimedia Category Systems
- 9 Round 1 2014 Decision
- 10 Location for worklog
- 11 Guiltiness?
- 12 How does this connect to...
Few questions
Thanks for sharing your idea and how it can improve the user experience. After reading 95% of your idea, I have a few questions:
- What do you think about Wikidata and the future of how we categorize? The proposed taxonomy system is a future step for Wikidata (or will be, I don't know), because it is related to the semantic web (subject + predicate + object).
- Could this idea be extended to other Wikipedias?
- About "Categorization guidelines", do you want to write a new guideline or just a recommendation?
- About "Usage of category links", what tool(s) will you use to measure the use of category links? (Just curious.)
[I reformatted your list of questions from bulleted to numbered for easier referencing.]
- Loading and managing common data centrally makes so much sense. Although there are some upfront development costs, once the functionality is implemented, many sites benefit when data is added, changed, or deleted. One analogy is database normalization: Wikidata helps us achieve a deeper level of "normalization" in the Wikimedia infosphere.
- I certainly hope so!
- I will take the standard Wikimedia (and public scholarship) approach: I will work with Wikimedia communities throughout the project. We as a community can decide whether it makes more sense for me to just provide my analysis of the categorization guidelines, to make recommendations about the guidelines, to work with other Wikimedians to draft proposed categorization guideline revisions, etc.
- I will use various existing Wikimedia community statistics and tools (such as the API) for the project, as well as doing my own programming.
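The API mentioned above can, for example, list the categories a page belongs to through the MediaWiki Action API's `prop=categories` module. The following is a minimal sketch only, not the project's actual tooling: the helper names `category_query_url` and `categories_from_response` are hypothetical, the network request is left out, and continuation (`continue` tokens for very large results) is not handled.

```python
from urllib.parse import urlencode

# Action API endpoint for the English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def category_query_url(title: str) -> str:
    """Build an Action API URL listing the visible categories of one page (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "categories",   # categories the page belongs to
        "titles": title,
        "clshow": "!hidden",    # skip hidden (maintenance) categories
        "cllimit": "max",       # as many categories per request as allowed
        "format": "json",
        "formatversion": "2",   # "pages" is returned as a list, not a dict keyed by page ID
    }
    return API_ENDPOINT + "?" + urlencode(params)


def categories_from_response(data: dict) -> list[str]:
    """Collect category titles from a formatversion=2 JSON response (hypothetical helper)."""
    titles = []
    for page in data.get("query", {}).get("pages", []):
        # Pages with no (visible) categories simply lack the "categories" key.
        for cat in page.get("categories", []):
            titles.append(cat["title"])
    return titles
```

Fetching the URL (with any HTTP client) and passing the decoded JSON to `categories_from_response` yields a flat list of titles such as `"Category:..."`, which could then be aggregated across many pages.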
Eligibility confirmed, round 1 2014
This Individual Engagement Grant proposal is under review!
We've confirmed your proposal is eligible for round 1 2014 review. Please feel free to ask questions here on the talk page and make changes to your proposal as discussions continue during this community comments period.
The committee's formal review for round 1 2014 begins on 21 April 2014, and grants will be announced in May. See the schedule for more details.
Prior work?
Did you look into things like Commons:User:Multichill/Next generation categories, Commons:User:Multichill/Commons Wikidata roadmap, Commons:User:CategorizationBot, CommonSense, category's main topic (P301), topic's main category (P910), category combines topics (P971)? Multichill (talk) 18:55, 8 April 2014 (UTC)
- Yes, I am familiar with those. Phase 1 is mostly exploratory/descriptive research. In later phases these and other proposals that have been made would come to bear. Libcub (talk) 03:45, 10 April 2014 (UTC)
- I also recommend commenting on those things. You found the Beyond Categories discussion and messaged there. My first thought is that the part of your research looking at category intersections would not be well received, because no one who has commented there has ever liked this system. It has always been forced on the community by the infrastructure, and while there are best practices for using it, I would not want this horrible system studied or anyone else to have to learn it; many people wish for Wikidata to supplant it entirely.
- If you address the intersection and Wikidata concerns, I would comment on other parts of your proposal if you ping me. Another issue you raise is how to handle categories for topics that are debatable, for example LGBT people. Perhaps you have seen the extensive discussions about historical figures being categorized as LGBT when the designation "LGBT" is a modern descriptor. The current consensus is that descriptors in categories are allowed when the descriptors are present in the article, which moves this issue out of the domain of specialized categorization rules. While I anticipate that category migration to Wikidata would raise new aspects of this issue, I do not immediately see how researching current practices will inform a future practice which, as you note, will have interlanguage implications. There just is not much interlanguage data to be had about categories, and without Wikidata it would be difficult to research this.
- Nice to meet you. Have you attended or will you attend these classes at UW? They are relevant. Blue Rasberry (talk) 00:16, 9 April 2014 (UTC)
- I am indeed aware that there are many opinions on the category systems in WMF wikis. (That is one reason why researching ways to improve them is such an interesting project!) There have been discussions on the English Wikipedia that include strong support. I posted in the major places that seemed relevant, whether I shared the predominant opinions on the page or not, because I really do want feedback from all sides. Yes, I am familiar with the English Wikipedia discussions on categories such as LGBT people, and I am indeed participating in the Community Data Science Workshops that are being held here at UW. I would say the first one was quite successful! Libcub (talk) 04:02, 10 April 2014 (UTC)
Interesting proposal
I have a love-hate relationship with categories, but of course I use them heavily, as I think every typical Wikipedian "hobby editor" out there does. They are still the only way to find new material that gets added to the project. I think you address several interesting issues in your proposal, and I never thought of taking a step back and analyzing what we already have. Like many, I have latched on to Wikidata as a way to move forward from all of the problems I personally have with categories. However, I also see lots of problems with Wikidata, especially the complexity of search combined with property creation and management. There is a lot to be said for the searchability of categories, despite their other drawbacks. So yes, I would definitely be interested in this research, though I am uncertain what it would deliver besides interesting reading. That said, I think you need to take the plunge and report along the way. We don't know how valuable the category tree is until we know what it looks like in the bigger picture and how it's currently used. Jane023 (talk) 16:22, 9 April 2014 (UTC)
Related Work
It is great to see proposed work relating to the category system. I would suggest the following pieces of related work be considered for inclusion in this proposal:
Voss, J. (2006). Collaborative thesaurus tagging the Wikipedia way. Wikimetrics Research Papers, 1(1). Retrieved from http://arxiv.org/abs/cs/0604036.
Kittur, A., Chi, E., and Suh, B. (2009). What's in Wikipedia? Mapping topics and conflict using socially annotated category structure. In Proceedings of the 27th International Conference on Human Factors in Computing Systems, 1509–1512.
Hecht, B. J. (2007). Utilizing Wikipedia as a spatiotemporal knowledge repository. M.A. thesis, University of California, Santa Barbara.
Akdag Salah, A., Gao, C., Suchecki, K., and Scharnhorst, A. (2011). Generating ambiguities: Mapping category names of Wikipedia to UDC class numbers. In G. Lovink and N. Tkacz (Eds.), Critical Point of View: A Wikipedia Reader, 63–77. Amsterdam: Institute of Network Cultures.
Holloway, T., Bozicevic, M., and Börner, K. (2007). Analyzing and visualizing the semantic coverage of Wikipedia and its authors. Complexity, 12, 30–40.
Thornton, K., and McDonald, D. W. (2012). Tagging Wikipedia: Collaboratively creating a category system. In Proceedings of the 17th ACM International Conference on Supporting Group Work (GROUP '12), 219–228. New York, NY: ACM.
- I had a fair amount of internal debate on how to handle connections to prior work. In most grant proposals of course, that is fundamental. When I was working on another IEG proposal for the first round last year, I didn't see other proposals that cited references. And the ones I looked at were very practical, with near-term impact. I was hesitant this go-round to make it seem that my proposal is "just research" and not sufficiently related to the real WMF world. I didn't want potential endorsers to see the references to scholarly articles, and be bored or scared off. Which I suppose is silly. Perhaps I have internalized some expert-phobia from part of the Wikimedia community. I see that others this round have included citations. I think you are right; I will add references. Conveniently, the 2 authors of the last work you list are in my building. :-) Libcub (talk) 04:35, 10 April 2014 (UTC)
- Let me put the question more bluntly: why on earth should you be given money to produce yet another study on categories? Thanks, Nemo 16:17, 12 April 2014 (UTC)
- Sorry, that really doesn't answer the question. Every study was (or claimed to be) different from the previous ones and useful for reconsidering the system. --Nemo 06:52, 15 April 2014 (UTC)
- Back to related work:
Classifying Taxonomic Relations between Pairs of Wikipedia Articles (IJCNLP 2013) may be of interest. Jodi.a.schneider (talk) 15:50, 10 May 2015 (UTC)
Phases
The idea of researching categories on Wikipedia is interesting. It seems to me that most of the value in this proposal would be in phase 2 and phase 5. Is there a reason that phase 1 as you have designed it is necessary before phase 2 and phase 5? My impression is that most of the phases could be done independently. --Pine✉ 03:52, 16 April 2014 (UTC)
- I agree that the order that I have the phases in is not crucial, and I am open to considering other orders. One reason for beginning with Phase 1 is that it is the cheapest and the shortest phase. I thought it would be a good way for me to establish my credibility as a funded researcher within the Wikimedia community. Also, I generally believe it is useful to understand the basics of a system before looking at how it is used. That can help guide the questions and direction of future phases. I also hope that the data uncovered in phase 1 will prove useful to other researchers. Libcub (talk) 02:17, 21 April 2014 (UTC)
- Yes, Phase 2 is certainly the bit I would be more interested in. Despite working extensively on categories on both the English Wikipedia and Commons for several years, I still have no idea how much people actually use them! That does seem like the more logical place to start, and it's ground that has been less well-trodden in the past (echoing Nemo's concerns above). the wub "?!" 23:10, 19 April 2014 (UTC)
Number of category links clicked
Do you have any idea how to count the number of category links clicked? As far as I know, there has not been a public dataset that lets us measure something like "X people clicked through to page A directly after visiting page B." I would like to know how RCom and the WMF Analytics team view the feasibility of this. whym (talk) 04:47, 20 April 2014 (UTC)
- Would you perhaps want to drop that part of this proposal or make it optional? If my understanding of the current server infrastructure is correct, collecting these numbers would require developing new tracking code (or modifying existing code) and adding it to MediaWiki, which would likely need support from someone in WMF Engineering. I'm not sure the cost of that software development and negotiation would be worth it. If it is feasible, though, it would certainly make a useful dataset (especially if you are going to share it freely). whym (talk) 10:18, 22 April 2014 (UTC)
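To make the measurement whym describes concrete, here is a minimal sketch of the counting step, assuming a hypothetical stream of (source page, target page) pairs meaning "a reader viewed the target directly after the source". As noted above, no such public dataset existed at the time; producing those pairs is the hard part, and the function name `count_category_clicks` is purely illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_category_clicks(transitions: Iterable[Tuple[str, str]]) -> Counter:
    """Count article -> category page transitions (hypothetical input format).

    Each item is a (source_title, target_title) pair from an assumed
    click-through log; the log itself is not something MediaWiki provides here.
    """
    counts = Counter()
    for source, target in transitions:
        # Count only moves from a non-category page onto a category page;
        # category-to-category browsing is deliberately excluded.
        if target.startswith("Category:") and not source.startswith("Category:"):
            counts[(source, target)] += 1
    return counts
```

Summing the counter's values gives a total number of category-link click-throughs, while grouping by target would show which categories readers actually reach from articles.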
Aggregated feedback from the committee for Optimizing Wikimedia Category Systems
Thank you for submitting this proposal. The committee is now deliberating based on these scoring results, and WMF is proceeding with its due diligence. You are welcome to continue making updates to your proposal pages during this period. Funding decisions will be announced by the end of May. --ΛΧΣ21 23:54, 12 May 2014 (UTC)
Round 1 2014 Decision
Congratulations! Your proposal has been selected for an Individual Engagement Grant.
The committee has recommended this proposal and WMF has approved funding for the full amount of your request, $9,750.
Comments regarding this decision:
We appreciate your interest in engaging closely with the community to produce research that will better facilitate the community’s work and decision-making. We look forward to seeing the project progress over the coming months!
- You will be contacted to sign a grant agreement and set up a monthly check-in schedule.
- Review the information for grantees.
- Use the new buttons on your original proposal to create your project pages.
- Start work on your project!
Location for worklog
Guiltiness?
At Grants:IEG/Understanding the English Wikipedia Category System#Additional information on the research and development agenda: "Facilitate the work of category editors, by (...) Improving the categorization guiltiness by rewording areas that are confusing." There's a typo, right? Or how should I understand this? --Francis Schonken (talk) 15:59, 18 July 2014 (UTC)
- Yes, that is indeed a typo--thanks for pointing it out to me! I've corrected it. Libcub (talk) 16:55, 18 July 2014 (UTC)
How does this connect to...
Above (and in the proposal) there's some language on how this could connect to Wikidata. This is beyond the current first phase, I suppose. I don't see any data being gathered (yet) that directly checks whether current Wikidata data (on interwiki links) is correctly represented in en.wikipedia pages. This would extend the scope of the project beyond en.wikipedia, and for that reason is probably left out at this stage (please correct me if that is not the view this first phase of the project is working from).
On the other hand I see at least two fields that would be interesting to gather data on with regard to possible future developments (as exemplified in Grants:IEG/Understanding the English Wikipedia Category System#Intended impact), and that would keep within en.wikipedia, and so within the scope of this project stage. So I'd like to ask these questions: how does this relate to...
Persondata: (Of course, persondata is limited to the main biographical article on a person.) Nonetheless, would it be worthwhile to check, for some biographical articles, whether the content of the persondata qualifiers matches the categorization qualifiers?
Could a future development be, for instance, that persondata-like invisible content is used for what the proposal calls facets, and can thus be used to compose intersection categories?
Navboxes: (See for instance en:Wikipedia:Categories, lists, and navigation templates#Navigation templates.) Some articles may have few categories but nonetheless have extensive links to related topics via navboxes at the bottom of the page (see for instance en:Crazy for You (musical)). It might be worth examining whether, for instance, extensive navigation provided by navboxes is an indicator or predictor of a lack (or excess?) of linking via categories. Also: do the two systems usually overlap in the information they provide, or are they rather complementary as far as linking to related articles is concerned?
For future development, it might be relevant to know whether readers and editors would perceive the structured display of categories (as in the examples provided at Grants:IEG/Understanding the English Wikipedia Category System#Detailed example) as a category/navbox hybrid that resolves (or fuels eternal discussion about) the tension between the categorization and navbox approaches. Note that categorization of persons is identified as the single most contentious area in categorization. I suppose the tensions among categories, lists and navigation templates come second.
- Correct, the Wikidata piece is currently in Phases 4 and 5. I think studying metadata in Wikipedia beyond the category system (persondata, infoboxes, lists, etc.) would certainly be worthwhile. I doubt I will have time to do that within this project phase, but if you or others do, that would be great! I would be more than happy to talk with anybody who might take this on. Or I might be able to build that into future projects. There is certainly no lack of relevant things to study! Libcub (talk) 02:47, 24 July 2014 (UTC)
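The navbox question above (do navboxes and categories overlap, or are they complementary?) could be framed as a set comparison per article. This is a minimal sketch under stated assumptions: the two input sets (pages linked from an article's navboxes, and pages sharing at least one category with it) are assumed to have been extracted already, the names `compare_navigation` and `LinkOverlap` are hypothetical, and Jaccard similarity is just one possible overlap measure, not one proposed in the discussion.

```python
from dataclasses import dataclass


@dataclass
class LinkOverlap:
    shared: set[str]          # pages reachable through both navboxes and categories
    navbox_only: set[str]     # pages reachable only through navboxes
    category_only: set[str]   # pages reachable only through categories
    jaccard: float            # |shared| / |union|; 0.0 when both sets are empty


def compare_navigation(navbox_links: set[str], category_neighbours: set[str]) -> LinkOverlap:
    """Compare the related pages offered by navboxes vs. categories (hypothetical helper)."""
    shared = navbox_links & category_neighbours
    union = navbox_links | category_neighbours
    # 1.0 means both systems point to exactly the same pages; 0.0 means fully complementary.
    jaccard = len(shared) / len(union) if union else 0.0
    return LinkOverlap(
        shared=shared,
        navbox_only=navbox_links - category_neighbours,
        category_only=category_neighbours - navbox_links,
        jaccard=jaccard,
    )
```

For example, navbox links {"A", "B", "C"} and category neighbours {"B", "C", "D"} share two of four distinct pages, giving a Jaccard similarity of 0.5. Computed over many articles, the distribution of this value would give a rough answer to the overlap-versus-complementarity question.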