Grants:IEG/Understanding the English Wikipedia Category System


This project is funded by an Individual Engagement Grant.




status: selected

project: Understanding the English Wikipedia Category System

project contact: pjweiss at uw.edu

participants:

  grantees: Paul J. Weiss (Libcub on en:WP, etc.)

  advisors: Jevin West, Jin Ha Lee, & Allyson Carlyle, all faculty here at the University of Washington's Information School

summary: Investigate the nature of the English Wikipedia's category system, as the first step in designing ways to optimize category systems throughout WMF wikis.

engagement target: English Wikipedia

strategic priority: Improving Quality

total amount requested: 9750 USD

2014 round 1

Project idea

What is the problem you're trying to solve?

For the first four years of Wikipedia (2001-2004), the category system did not exist.[1] We are now approaching the 10th anniversary of the implementation of the category system. This seems like an opportune time to devote resources toward understanding the category system, its current uses, and ways we might improve upon it.

What is your solution?

Basic research. There has been some investigation into using WMF wiki category systems outside of WMF projects, such as in automatic categorization in other large textual databases and in taxonomy development. However, there has been remarkably little research looking at WMF wikis' category systems as they exist and are used in WMF wikis themselves, by their readers and editors.

Project goals

The research and development agenda

I envision a multi-phase research and development agenda with the ultimate goal of designing ways to optimize the category systems in WMF wikis.

  • Phase 1. Determine the nature of the category system in the English Wikipedia.
  • Phase 2. Investigate how readers and editors utilize the category system in the English Wikipedia.
  • Phase 3. Investigate the category systems in other language Wikipedias and in other WMF projects.
  • Phase 4. Explore the value and feasibility of using Wikidata as the basis for the category system across WMF wikis. If deemed appropriate by the community, work with the community to develop and implement this.
  • Phase 5. Utilize user-centered design methodologies to prototype various enhancements to the category system to improve the user experience. If deemed appropriate by the community, work with the community to develop and implement such enhancements.

This project

This project is phase 1 of the broader research and development agenda: To determine the nature of the category system in the English Wikipedia.

  • Generate basic statistics on various aspects of the category system.
  • Develop predictors for selected variables related to the category system.
  • Develop taxonomies of some aspects of the category system.
  • Analyze similarities and differences in the categorization of pages across namespaces.


Project plan

Scope

[For the sake of brevity, in this section, "Wikipedia" is used to mean "the English Wikipedia" unless otherwise noted. Much of this section applies to many of the various language Wikipedias, but I have not yet examined other Wikipedias to know the extent of that.]

Understanding the nature of a system is a crucial first step in optimizing it. What are the basic aspects of Wikipedia's category system?

Number of categories per page

The number of categories per page varies widely. For example, in the main article space:

This is true even for similar topics:

Are there predictors of the number of categories assigned per page? Is there a relationship between the number of categories assigned to an article and the article's length, its number of edits, or its age?
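
As a minimal sketch of how such a relationship might be checked, the following computes summary statistics and a Pearson correlation between article length and number of categories. It assumes per-article counts have already been extracted; the sample rows and the helper name `pearson` are illustrative assumptions, not project data or results.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical sample: (article length in bytes, number of categories).
# These values are made up for illustration only.
sample = [(1200, 2), (5400, 5), (9800, 7), (23000, 12), (41000, 15)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

lengths = [length for length, _ in sample]
counts = [count for _, count in sample]

print("mean categories per page:", mean(counts))
print("median categories per page:", median(counts))
print("correlation with length:", round(pearson(lengths, counts), 3))
```

The same function could be applied to number of edits or article age in place of length. A correlation alone would not establish a usable predictor, but it could indicate which variables are worth modeling.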

Deliverables for Phase 1:

  • Statistics for numbers of categories per page (overall, by namespace, etc.).
  • One or more predictors of numbers of categories per page.

Metric for Phase 1:

  • To assess the accuracy of the predictor(s): Given a random sample of Wikipedia pages, the accuracy rate of the predictor(s).
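
One possible reading of "accuracy rate" is sketched below: the share of sampled pages for which a predictor's estimate falls within a chosen tolerance of the actual category count. The tolerance, the field names, and `toy_predictor` are all assumptions made for illustration; the proposal does not define them.

```python
import random

def accuracy_rate(pages, predict, tolerance=1):
    """Share of pages whose predicted count is within `tolerance` of the actual count."""
    if not pages:
        raise ValueError("sample is empty")
    hits = sum(1 for p in pages if abs(predict(p) - p["categories"]) <= tolerance)
    return hits / len(pages)

# Hypothetical pages; titles, lengths, and counts are illustrative.
pages = [
    {"title": "A", "length": 1000, "categories": 2},
    {"title": "B", "length": 4000, "categories": 4},
    {"title": "C", "length": 9000, "categories": 12},
    {"title": "D", "length": 16000, "categories": 9},
]

def toy_predictor(page):
    # Placeholder rule: one category, plus one per 2,000 bytes of length.
    return 1 + page["length"] // 2000

rng = random.Random(0)            # fixed seed so the sample is reproducible
sample = rng.sample(pages, k=3)   # random sample without replacement
print(accuracy_rate(sample, toy_predictor))
```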

Utility in later phases: Once we know the numbers of categories assigned to pages and their distribution, we can begin to look at:

  • How best to display lists of categories, and whether there are different optimal displays for lists of different lengths.
  • Whether there is a number of categories per page beyond which the utility of the categorization decreases. Can too many categories be detrimental?
  • Identifying pages that are under-categorized compared with similar pages, so that editors can focus on those.
  • What the impacts are of the number of assigned categories per page. For example, are readers more likely to click on a category that is in a short list or a long list?
  • Whether the number of categories assigned per page is in line with the categorization guideline documents.

Structure of category names

Category names can consist of 1, 2, 3, or more words. At the same time, category names can concern a single concept or multiple concepts:

Some of the multi-concept categories fill the gap that exists because of the lack of automated category intersection in Wikipedia. American Quakers, for example, can be seen as the intersection of the categories American people and Quakers, in other words, people who are both American and Quaker.

A particular aspect or attribute of a type of thing is known as a facet. For persons, "American" would be a possible value of the nationality facet, while "Quakers" could be a value of the religion facet.
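
The intersection reading of American Quakers can be sketched with sets, and the same information can be expressed as facet values. The member names below are placeholders, and the helper `with_facets` is a hypothetical illustration rather than an existing Wikipedia feature.

```python
# Hypothetical category membership; the member names are placeholders.
american_people = {"Person A", "Person B", "Person C"}
quakers = {"Person B", "Person C", "Person D"}

# The multi-concept category corresponds to the set intersection.
american_quakers = american_people & quakers

# The same information expressed as facet values on each person.
people = {
    "Person A": {"nationality": "American"},
    "Person B": {"nationality": "American", "religion": "Quakers"},
    "Person C": {"nationality": "American", "religion": "Quakers"},
    "Person D": {"religion": "Quakers"},
}

def with_facets(records, **wanted):
    """Names of records whose facets match every requested facet value."""
    return {
        name
        for name, facets in records.items()
        if all(facets.get(key) == value for key, value in wanted.items())
    }

print(sorted(american_quakers))
print(sorted(with_facets(people, nationality="American", religion="Quakers")))
```

Both approaches select the same people here. The faceted form stores each value once, whereas the intersection form needs a separately maintained category for every combination.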

What is the distribution of words and concepts per category name? How strong is the correlation between them? What kinds of facets are utilized in category assignments for various types of article topics? What is the distribution of particular facets of category names?
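
The word-count part of the first question is straightforward to compute, as in the sketch below, which uses category names mentioned in this proposal. Counting concepts or facets per name would require human judgment or a taxonomy, so it is not attempted here; the function name is an assumption.

```python
from collections import Counter

# Category names taken from examples in this proposal.
names = [
    "Needlework",
    "Quakers",
    "American people",
    "American Quakers",
    "Philosophical phrases",
    "Pop culture words of Bantu origin",
    "Pages with ISBN errors",
]

def words_per_name(category_names):
    """Map word count -> number of category names with that many words."""
    return Counter(len(name.split()) for name in category_names)

distribution = words_per_name(names)
for n_words in sorted(distribution):
    print(n_words, distribution[n_words])
```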

Deliverables for Phase 1:

  • A taxonomy of category facets currently in use.
  • Statistics for numbers and types of words, concepts, and facets per category name.
  • One or more predictors of number of facets per category name.

Metrics for Phase 1:

  • To assess the completeness of the taxonomy of facets: Given a random sample of Wikipedia categories, the percentage of the categories whose facets have been identified by this project.
  • To assess the usefulness of the concept of facets:
    • Utilization of the term "facet" and terms for specific facets in the categorization guideline documents and on talk pages.
    • Survey of editors.
  • To assess the accuracy of the predictor(s): Given a random sample of Wikipedia categories, the accuracy rate of the predictor(s).

Utility in later phases: Once we understand the structures of category names, we can begin to look at:

  • How best to display categories with multiple words, concepts, and/or facets. Are single-facet categories easier for users to understand? Do long category names need to be displayed differently from short category names for maximum user understanding?
  • How the lack of automated category intersection features impacts the navigability of Wikipedia's category system.
  • Whether there are new facets not utilized in Wikipedia that might prove useful.
  • Whether editors might apply more category facets and categories in a faceted environment.

Category scope

The scope or object of categories varies. Typically, content categories assigned to articles apply to the topic of the article, such as the category Needlework assigned to the Crochet article. But content categories sometimes instead apply to the terms in the name of the article page, such as the category Pop culture words of Bantu origin. Sometimes a content category, such as Philosophical phrases, is a hybrid, partially applying to the article topic, and partially applying to the article name. Administrative categories tend to apply to the article-as-document, rather than the topic of the article: Articles with links needing disambiguation from August 2013, Articles containing Persian-language text, Pages with ISBN errors. Are there scopes of categories other than these? What is the frequency of the different category scopes across categories and articles?

Deliverables for Phase 1:

  • A taxonomy of category scopes currently in use.
  • Statistics on category scopes (overall, by namespace, etc.).

Metrics for Phase 1:

  • To assess the completeness of the taxonomy of scopes: Given a random sample of Wikipedia categories, the percentage that match the scopes identified by this project.
  • To assess the usefulness of the concept of scope for categories:
    • Utilization of the term "scope" and terms for specific scopes in the categorization guideline documents and on talk pages.
    • Survey of editors.

Utility in later phases: Once we know the types of category scopes and their frequencies, we can begin to look at:

  • How best to display categories of different scopes.
  • The utility of hybrid categories, such as Philosophical phrases. Are they better handled in a different way, such as by
    • Splitting the hybrid category into single scope categories (Philosophical concepts and Phrases)
    • Dropping one or more scopes altogether (retaining Philosophical concepts while dropping indication of phrases).
  • How best to explain the different scopes and how they can be helpful.
  • What the impacts are of the different scopes of categories. For example, are users more likely to click on a category that applies to the article topic rather than one that applies to the article name?
  • Whether the scopes of categories are in line with the categorization guideline documents.
  • Whether there are new category scopes not currently utilized in Wikipedia that might prove useful.

Page-category relationships

What types of semantic relationships exist between pages and their assigned categories? Two common page-category relationships are is-a (hyponym/hypernym) and whole/part (meronym/holonym). For example:

There are many other types as well. For example:

Cause & effect: Arson & Fire
Object of study & discipline: Tooth & Animal anatomy
Person & area known for or active in: Bayard Rustin & Community organizing
Proximate: Gulf of Mexico & Mexico-United States border

What are these other types? What is the frequency of each type?
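
Once page-category pairs have been labeled, the frequency question reduces to a tally. The sketch below uses the four examples listed above; reading each example as "page, then category" is an assumption, and `relationship_shares` is a hypothetical helper.

```python
from collections import Counter

# (page, category, relationship) triples built from the examples above.
# Treating the first item as the page and the second as the category
# is an assumption about how the examples are meant to be read.
pairs = [
    ("Arson", "Fire", "cause & effect"),
    ("Tooth", "Animal anatomy", "object of study & discipline"),
    ("Bayard Rustin", "Community organizing", "person & area known for or active in"),
    ("Gulf of Mexico", "Mexico-United States border", "proximate"),
]

def relationship_shares(triples):
    """Fraction of page-category pairs that fall under each relationship type."""
    counts = Counter(relationship for _, _, relationship in triples)
    total = sum(counts.values())
    return {relationship: n / total for relationship, n in counts.items()}

for relationship, share in relationship_shares(pairs).items():
    print(f"{relationship}: {share:.0%}")
```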

Deliverables for Phase 1:

  • A taxonomy of page-category relationships currently in use.
  • Statistics for page-category relationships (overall, by namespace, etc.).

Metrics for Phase 1:

  • To assess the completeness of the taxonomy of relationships: Given a random sample of Wikipedia pages, the percentage of page-category pairs whose relationships match the relationships identified by this project.
  • To assess the usefulness of the concept of page-category relationships:
    • Utilization of the term "relationship" and terms for specific relationships in the categorization guideline documents and on talk pages.
    • Survey of editors.

Utility in later phases: Once we understand page-category relationships, we can begin to look at:

  • How best to display categories of different relationships to a page.
  • How best to explain the different relationships and how they can be helpful.
  • The impacts of the different page-category relationships. For example, are users more likely to click on a category that is in an is-a relationship with the page topic than one that is in a whole-part relationship?
  • Identifying pages that are lacking categories of relationships that are present on similar pages, so that editors can focus on those.
  • Whether there are new page-category relationships not utilized in Wikipedia that might prove useful.
  • If page-category relationships were made explicit to editors, whether editors might apply categories in a wider variety of relationship types to a page.

Categorization of pages across namespaces

In addition to the aspects discussed above, how is the application of the category system to article pages the same as or different from the application to category pages or pages in other namespaces? There are numerous instances of an article and a category having the same name, or the same name except for singular vs. plural (so-called "eponymous categories"). Even in this case, sometimes that category is assigned to the article, and sometimes not. For example, note these category assignments:

Even when multiple categories are assigned to both an article and a category of the same name, it is not uncommon for the category assignments to be mutually exclusive. Either page type can have more categories than the other:

What is the nature of these differences? What other differences are there in the application of categories to pages in different namespaces?
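
One way to quantify such differences is to compare the two sets of assigned categories directly. The sketch below reports shared and unshared categories plus a Jaccard overlap score; the category names are placeholders, and the choice of Jaccard similarity is an assumption rather than a method named in this proposal.

```python
def compare_assignments(article_cats, category_cats):
    """Compare categories assigned to an article and to its same-named category page."""
    article_cats, category_cats = set(article_cats), set(category_cats)
    shared = article_cats & category_cats
    union = article_cats | category_cats
    return {
        "shared": shared,
        "article_only": article_cats - category_cats,
        "category_only": category_cats - article_cats,
        # Jaccard overlap: 0.0 means mutually exclusive, 1.0 means identical.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }

# Hypothetical assignments for one topic; the names are placeholders.
result = compare_assignments(
    article_cats={"Cat A", "Cat B", "Cat C"},
    category_cats={"Cat C", "Cat D"},
)
print(result["shared"], result["jaccard"])
```

A score of 0.0 would correspond to the mutually exclusive case described above.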

Deliverables for Phase 1:

  • An analysis of additional similarities and differences in the categorization of pages across namespaces.
  • Statistics that arise from that analysis.
  • Possibly predictors, if they arise from the analysis.

Metric for Phase 1:

  • To assess the accuracy of the predictor(s): Given a random sample of topics with both an article and a category page, the accuracy rate of the predictor(s).

Utility in later phases: Once we know how categorization works across namespaces, we can begin to look at:

  • How best to display categories on pages in different namespaces.
  • The impacts of the similarities and differences. For example, is categorization in a particular namespace more consistent than in another namespace? Why? Are the differences confusing to editors or readers?
  • Whether the differences in categorization across namespaces are in line with the categorization guideline documents.
  • How best to explain the differences.

Categorization guidelines

There are fairly extensive guideline documents on assigning categories in Wikipedia. Nevertheless, and unsurprisingly, there is substantial inconsistency in application. The primary categorization guideline document in Wikipedia contains over 5600 words, and is not the most user-friendly document, especially for new Wikipedians.

Categorization of persons is perhaps the single most contentious categorization issue in the Wikipedia community, and it has its own main 4500-word guideline document, as well as one specifically on ethnicity, gender, religion, and sexuality. Issues these guideline documents address include: Should every person be categorized by his or her birth and death years? Should LGBT categories be assigned to ancient people, who may have lived in cultures where the basic concept of sexual orientation did not exist? Should LGBT categories be assigned to people who are not publicly out? How much evidence does one need to apply a category to a person? How important does an aspect of a person have to be in order to assign it as a category?

What percentage of articles accurately adhere to the categorization guideline documents? What is the nature of individual guidelines, and to what extent are they being followed? What have been the controversies about the guideline documents? What types of changes to the guideline documents have editors asked for?

Deliverables for Phase 1:

  • A taxonomy of the types of individual categorization guidelines.
  • Statistics for how often a sample of individual categorization guidelines are adhered to (overall, by namespace, etc.).
  • One or more predictors of category guideline non-adherence.

Metrics for Phase 1:

  • To assess the accuracy of the predictor(s): Given a random sample of Wikipedia pages, the accuracy rate of the predictor(s).
  • To assess the usefulness of individual categorization guideline types:
    • Utilization of the term "type" and terms for individual guideline types in the categorization guideline documents and on talk pages.
    • Survey of editors.

Utility in later phases: Once we analyze the categorization guideline documents and their adherence, we can begin to look at:

  • What the impacts of categorization guideline documents are on readers and editors.
  • Whether editors would work more on categories if they understood the guideline documents better or if the guideline documents were shorter.
  • Identifying pages that might contain much guideline non-adherence, so that editors can focus on those.
  • How best to display category guideline documents.
  • How best to structure category guideline documents.
  • How to promote the adherence to category guideline documents.
  • How best to explain individual guidelines in category guideline documents.
  • What kinds of training material might be helpful.
  • Whether there are new ways available to resolve past guideline conflicts.
  • Whether the categorization guideline documents follow Wikipedia policy and guideline document best practices.

Usage of category links

How frequently are category links followed? What percentage of clicked internal Wikipedia links are category links?

Deliverable for Phase 1:

  • Statistics for category link usage (overall, by namespace, etc.).
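
Given a list of the internal pages that readers navigated to, the percentage question can be answered with a simple count. The sketch below relies on the fact that category page titles on the English Wikipedia begin with the "Category:" prefix; the click log itself and the function name are hypothetical, and how such a log would be obtained is not addressed here.

```python
def category_click_share(clicked_titles):
    """Percentage of clicked internal links whose target is a category page."""
    if not clicked_titles:
        return 0.0
    category_clicks = sum(1 for title in clicked_titles if title.startswith("Category:"))
    return 100.0 * category_clicks / len(clicked_titles)

# Hypothetical click targets, for illustration only.
clicks = ["Crochet", "Category:Needlework", "Knitting", "Category:Quakers", "Talk:Crochet"]
print(category_click_share(clicks))
```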

Utility in later phases: Once we understand the usage of category links, we can begin to look at:

  • The utility of category links compared with other types of links. What value do readers and editors get from category links? For what purposes (navigation, understanding concept relationships, etc.) do they use them?
  • How to promote the various uses of categories.
  • The impacts of links for proposed (red-linked) categories.
  • How best to display category links to increase usage.

Budget

Total amount requested

9750 USD

Budget breakdown

The only budget item is my time to do the research: 15 hours/week at $25/hour, which works out to 26 weeks of work (15 × 25 × 26 = 9750 USD).

Intended impact

Detailed example

Here is one example of the possible value of this research.

Currently Wikipedia displays categories in a single paragraph:

One simple option for a display that might be more readable for some people would be to list the categories vertically:

However, what we learn from this research could help us design category displays that are even more useful to Wikipedia readers.

We could, for example, display the categories by page-category relationship:

We could also facet the categories, with individual categories appearing under as many facet headings as they have facets:

Atomizing the categories before faceting would lead to each category appearing in only one facet:

Page-category relationships, faceting, and category atomization could be used together:

These new ways to display category data might be more understandable and usable for readers and editors. Making page-category relationships explicit and faceting categories could help readers and editors contextualize article content and improve navigation.
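
The difference between faceting with and without atomization can be sketched as a grouping step. The mapping below follows the American Quakers example from earlier in this proposal; the facet labels, the `CATEGORY_FACETS` table, and the `facet_display` function are hypothetical illustrations, not an existing MediaWiki feature.

```python
from collections import defaultdict

# Hypothetical mapping from category name to (facet, value) pairs.
CATEGORY_FACETS = {
    "American Quakers": [("nationality", "American"), ("religion", "Quakers")],
    "American people": [("nationality", "American")],
    "Quakers": [("religion", "Quakers")],
}

def facet_display(categories, atomize=False):
    """Group a page's categories under facet headings.

    Without atomization, a multi-facet category is listed under each of its
    facets. With atomization, it is split into single-facet values first.
    """
    groups = defaultdict(list)
    for name in categories:
        # Categories missing from the table fall under an "unclassified" heading.
        for facet, value in CATEGORY_FACETS.get(name, [("unclassified", name)]):
            entry = value if atomize else name
            if entry not in groups[facet]:
                groups[facet].append(entry)
    return dict(groups)

print(facet_display(["American Quakers"]))
print(facet_display(["American Quakers"], atomize=True))
```

In the first call the full category name appears under both headings; in the second, each heading shows only the value that belongs to it.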

Beyond displays, encoding this additional information about categories could open additional ways to process and use category data, both by human users of Wikipedia, as well as by computers, bringing Wikipedia closer to the promised land of the Semantic Web.

Target audience

Readers and editors of the English Wikipedia. The foundational data gathered in this phase would also likely prove useful to other WMF wiki research projects, such as those investigating visualization of the category structure.

Community engagement

I plan to engage the English Wikipedia editor and reader communities in deeper ways than have been utilized in the past. Being a preeminent open content and open collaboration project, Wikipedia is in a unique position to lead in the area of open research and public scholarship.

My current vision for including open research activities in the project would start with an on-wiki page for the project, either a subpage of my user page, or in the project namespace. Page content would be updated frequently, and would include project updates, challenges I run into, and data analysis results as I get them, to keep the process of the project transparent. But much more importantly, it would include interactions between me as the lead researcher on the one hand, and other researchers and interested members of the editor community on the other. I plan to use this feature throughout the project to get community input on such things as priority setting, making tasks as efficient as possible, interpretation of results, use and implications of results, and where to disseminate findings. I hope that this can help to demystify the research process. Ideally a few other researchers and editors would be interested in taking on explicit roles on the project. I am especially interested in mentoring a small number of interested editors or readers who are not professional researchers, but are interested in participating in the project, to perform (probably small and relatively simple) research tasks in the project.

In a sense, this level of community engagement would serve as an ongoing form of mini open peer review, as any blind spots or errors may be spotted by others, who can notify me of them right on the project page, and in a larger sense serve to keep the project vital and meaningful for the community.

I want to enact at least some parts of the project as co-creations between me and the broader Wikipedia community. I want interested affiliates to feel that their time with the project is worthwhile. I want to earn the trust and respect of my co-creators and vice versa.

I am quite aware that adding open research components to the project is much more complex and time-consuming than might seem at first glance. I will continually adapt my community engagement practices to balance getting the research done with involving the community. I am confident that the members of my UW advisory committee will help ensure that I maintain such a balance. :-)

I do plan to submit articles on the results of this project to academic journals, so I will need to keep publication constraints, such as intellectual property issues, in mind. For instance, can authorship of a paper be something along the lines of "Paul J. Weiss and Members of the English Wikipedia community"? Or would I list every person who contributed to the project, as is done in areas such as high-energy physics? If so, would I use their Wikipedia profile names? If we were to use real names, how could I verify that a name really is the person behind a profile name? Will relevant journals accept a paper with diffuse authorship? Will they balk if the credentials of some contributors are not academic? What other publisher constraints on publication do I need to watch?

I hope to learn from open research activities in this project things that could be useful to future open Wikimedia research efforts. I will continue to confer with colleagues at the University of Washington's Simpson Center for the Humanities, which is a leader in the public scholarship movement.

Fit with strategy

The overall research and development agenda fits with four of the WMF strategic priorities:

  • Increase reach. Leveraging the power of Wikidata to run the category systems in WMF wikis gives us more bang for the buck. When a user in one language's Wikipedia adds an existing category to a page, that new piece of content can be spread to other language Wikipedias. For new categories, which would require translation, the bar for translating a category is significantly lower than for translating an article. These aspects could in particular be of benefit to the smaller WMF wikis; the scaffolding that the category system provides can jumpstart article creation and translation.
  • Increase participation. Optimizing and promoting the category system has the potential to add a new, low-risk on-ramp for potential Wikimedians. Anecdotally there are potential editors who are too afraid to directly edit article content, but would consider doing category work. Also, making category work more efficient would give existing WMF wiki editors more time for other contributions. Two of the operational initiatives under this priority in the strategic plan are:
    • Facilitate community efforts to create organizational models and structures that support the Wikimedia projects.
    • Support volunteer initiatives that fuel the growth of communities and projects around the world, including meet-ups, public outreach activities and other volunteer innovations.
I believe that the community engagement strategy described in the previous section would further these two initiatives, building bridges between the Wikipedia research community and the Wikipedia editor community. I hope that some citizens of Researchville and Editorburg will consider extended visits to the other community.
  • Improve quality. Optimizing and promoting the category system would likely improve the quality of category work in WMF wikis.
  • Encourage innovation. A re-envisioned, modern category system could spark previously unthought-of innovations. A measure of relatedness of two people who have Wikipedia articles? Subcategories as checklists for various purposes? One of the operational initiatives under this priority in the strategic plan is "Foster a healthy community of researchers interested in analyzing Wikimedia, provide access to relevant data, and highlight important questions to be addressed." The community engagement strategy above could lead to editors becoming involved in Wikimedia research in various ways, enabling the adaptation of the concept of "citizen science" to the Wikimedia community.

Phase 1 primarily lays the groundwork for the above. However, characterizing the category system better, on its own, could increase awareness and discussion among editors, which could lead to improvements in efficiency and quality.

Sustainability

I hope that successful completion of Phase 1 would lead to future phases being funded. If Phase 1 is the only phase that actually gets carried out, there is no inherent need for additional, post-grant work. I would hope that the findings in Phase 1 would provide fodder for editors doing category work and for other researchers.

Measures of success

Deliverables

  • Basic statistics on various aspects of the category system
    • Number of categories per page
    • Structure of category names
    • Category scope
    • Page-category relationships
    • Categorization of pages across namespaces
    • Categorization guidelines
    • Usage of category links
  • Predictors for
    • Number of categories per page
    • Structure of category names
    • Categorization of pages across namespaces
  • Taxonomies of the following that are currently in use
    • Category facets
    • Category scopes
    • Page-category relationships
  • Analysis of the similarities and differences in the categorization of pages across namespaces

Metrics

  • To assess the accuracy of predictors: Given a random sample of Wikipedia pages or categories (as appropriate), the accuracy rate of the predictors.
  • To assess the completeness of taxonomies: Given a random sample of Wikipedia pages or categories (as appropriate), the percentage of the categories whose types have been identified by this project.
  • To assess the usefulness of concepts new to the Wikipedia category system:
    • Utilization of the term for the new concept and terms for specific types in the categorization guideline documents and on talk pages.
    • Surveys of editors.

Additional information on the research and development agenda

Optimizing the category system could help WMF wiki readers and editors in a number of ways.

  • Get users to the content they want, whether for reading or editing, more:
    • Quickly, by improving wiki navigation.
    • Accurately, by increased consistency in how the category system is applied.
    • Often, by making users more aware of the category system and what it can do for them.
    • Fluidly, by all the above-mentioned methods.
  • Facilitate the work of category editors, by
    • Improving the categorization guidelines by rewording areas that are confusing.
    • Providing suggested category facets to consider.
    • Providing displays of pages on similar topics and their categories.
    • Improving the visibility of the category system and demystifying it, which could increase editor activity on categories.
    • More fully recognizing the category system as a valuable aspect of Wikipedia, which could motivate more editors to work on it.
  • Increase more directly collaborative work by increasing the visibility of and respect for the category system. Category work is by its nature a more holistic activity than article work; it basically requires collaboration and coordination on one level or another.
  • Help readers to think about various aspects of their topic in a more systematic way, by displaying categories in more helpful ways.
  • Get more bang for the buck for the work put into the category system, by all the previously mentioned methods, and by leveraging other identified strengths.
  • Previously unthought-of innovations may become evident as we develop a deeper understanding of the category system.

As indicators of the importance of categories to editors of the English Wikipedia, the main categorization guideline document gets more than 8,000 page views per month and has received over 1500 edits. There are in addition 6 specialized guideline documents, 2 FAQs, and 10 essays. WikiProject Categories has a respectable 156 listed members.

The value of this work reaches beyond WMF wikis. The category system, as implemented in the English Wikipedia and many other WMF wikis, is distinctive in several ways:

  • It categorizes article pages and content category pages in the same scheme.
  • It differentiates reader-facing content categories from editor-facing administrative categories, via the concept of hidden categories, which were enabled in 2008 (in addition to attempting to have mutually exclusive content and administrative categories).
  • Category pages are basically as editable as article pages.
  • The community has developed several guideline documents for assigning categories.

These aspects are not common in other category systems, and therefore less is known about how they are used and how successful they are. Because of these distinctive features, investigating the WMF category systems will aid not only WMF wiki readers and editors, but also those working with other category systems. This is yet another opportunity for WMF to lead in the broader infosphere.

Specific phases

  • Phase 2. Investigate how readers and editors utilize the category system in the English Wikipedia. What percentage of readers are even aware of categories in Wikipedia? What percentage utilize the category system? Why/how do they utilize it? How useful do they feel that it is? What changes would they like to see? Is the category system utilized more heavily by Wikipedia editors than readers? Are there significant differences between the two groups in how the system is used? Surveys, focus groups, interviews, and observation would be the likely methods used here. I feel that the category system is currently underutilized. When we know how and why readers and editors do and do not utilize the category system, we can market it better and increase usage. These user studies could help in prioritizing Wikipedia feature development.
  • Phase 3. Investigate the category systems in other language Wikipedias and in other WMF projects. How does the category system in the English Wikipedia compare with other WMF wikis? Are there aspects in other WMF wikis that are worth exploring for use in the English Wikipedia? There certainly are differences now. As one basic example, the name of one category in the English Wikipedia is American people, while the same category in the Commons is named People of the United States. Due to scope issues, some categories that work in one language will not work in another (such as the English Wikipedia category Pop culture words of Bantu origin); this type of category is not inherently helpful interlinguistically, since the name of the article on the same topic in another language may be neither of Bantu origin, nor considered to be a pop culture word. Using the frameworks developed in Phases 1 and 2 can help the efficiency and effectiveness of this phase.
  • Phase 4. Explore the value and feasibility of using Wikidata as the basis for the category system across WMF wikis. If deemed appropriate by the community, work with the community to develop and implement this. Migrating the category system to Wikidata could provide more standardization to category names and relationships (facilitating navigation across WMF wikis). It could also significantly improve the efficiency and effectiveness of editors' category work. And it could lead to a higher degree of non-English functionality in single-language WMF projects such as the Commons.
  • Phase 5. Utilize user-centered design methodologies to prototype various enhancements to the category system to improve the user experience. If deemed appropriate by the community, work with the community to develop and implement such enhancements. Can we increase the visibility of the category system? Its utility to readers and editors? The number of category links clicked on? The number of articles viewed per session?

Participant(s)

Grantee: Paul J. Weiss (Libcub on en:WP, etc.). I have been a Wikimedian for over 6 years, with over 3200 edits in the English Wikipedia, and a few contributions to other projects. My bachelor's is in linguistics, my master's is in library & information studies, and I am now in the PhD program in information science at the University of Washington's Information School. I have spent 28 years as a librarian in the field, working primarily in the cataloging and metadata spheres. All of these contribute to my ability to conduct this research. UW's iSchool has a strong history with WMF-related research and grants, which I will be honored to continue.

Advisors: UW faculty members Jevin West, Jin Ha Lee, and Allyson Carlyle. They are the three members of my PhD Advisory Committee, at the University of Washington's Information School. Their collective areas of expertise relevant to this project include data science, big data, knowledge organization, information visualization, metadata, network science, and information retrieval.

References

  1. "History of Wikipedia", English Wikipedia. Retrieved on 18 June 2007.

Discussion

Community Notification

I have posted notifications about this proposal at these places:

en:Wikipedia

Talk:Categorization
Talk:Category intersection
Talk:Category names
Talk:Overcategorization
Talk:WikiProject Categories
Village pump (idea lab)

Meta

Categorization
Help talk:Category
Talk:Beyond categories

Wikidata

Wikidata:Wikidata:Project chat

Comments

  • I suspect that you will find the category system to be a mix of a tagging system, some tree structures, some non-tree structures, parts of an ontology, excessively detailed stuff never used anywhere, broken stuff, and some missing bits and pieces; that is, in short, a mess. ;-) --Purodha Blissenbach (talk) 15:41, 17 April 2014 (UTC)

Endorsements

Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. Other feedback, questions or concerns from community members are also highly valued, but please post them on the talk page of this proposal.


  • I assume this project to be useful research. It is imho suited to be a case study about "How to design software that no one understands" or "How to better not design software" in several respects and may aid the decision to abandon the category system in favour of clearer semantics, as e.g. available with the Semantic MediaWiki extension. --Purodha Blissenbach (talk) 15:41, 17 April 2014 (UTC)
  • Strong Support. Very necessary research. Lately I've been wondering why we both add persondata AND categories such as "1905 births". Fundamental research in this topic can go a long way and the proposer has the background necessary to do this. Jodi.a.schneider (talk) 19:37, 19 April 2014 (UTC)
  • Endorse - I look forward to any conclusions you can draw from this antiquated tool so that we don't create the same mess on Wikidata. Jane023 (talk) 20:32, 22 April 2014 (UTC)
  • endorse with strong support. Seems like very useful research. I'd be happy to help on my own nickel, I've been very involved with categorization and challenges of ghettoization, non-diffusing categories, and have done a little work with Magnus Manske on category intersection. Drop a note if you'd like to chat further as you get into your research.--Obiwankenobi (talk) 21:03, 22 April 2014 (UTC)
  • Endorse with strong support. Well-structured categories can be very powerful, and from reading over the various category proposals and discussion pages, it's clear this would be a great area to work on. I'm also selfishly interested because categories are a pain point for a variety of MediaWiki installations I've run (and I'm trying to contribute more in the MediaWiki community going forward). --Catrionajay (talk) 23:52, 22 April 2014 (UTC)
  • congratulations - there are many misinterpretations of what categorization is about - any simplification of the process and understanding of the process is well worth being an IEG grantee sats (talk) 06:27, 1 June 2014 (UTC)
  • endorse: I like the potential of where this could go, maybe even returning all content, whether it's on wp, commons, species, source etc., when looking at a category. First we need to know more; this is a good step. Gnangarra (talk) 07:07, 1 June 2014 (UTC)