Research talk:The Construction and Application of Personality Profile Based on User Behavior in Wikipedia

Two pilot surveys

I see two pilot surveys referenced but they are not described. What are the pilot surveys about? --EpochFail (talk) 23:10, 24 September 2018 (UTC)

First, previous research has not attempted to administer a Big 5 questionnaire on Wikipedia, and we are not sure what other information we need to obtain in addition to the personality data when conducting the study. In addition, we want to use stratified sampling to collect data from all five personalities, but we are not sure whether this method is effective. So we want to improve our survey through the two pilot surveys. -- Ml Anrew Li (talk) 05:14, 27 October 2018 (UTC)

I'm not sure this answers my question. What questions are in the two pilot surveys? --EpochFail (talk) 16:05, 19 November 2018 (UTC)

The purpose of the first pilot survey is to learn what data we need to obtain from the study, because in addition to the personality data we may also need other data, such as the names of the entries that the user has edited. The purpose of the second pilot survey is to test whether our stratified sampling method can reach users with all five different personalities. Running two pilot surveys is the more conservative plan. Fortunately, we may only need to conduct one pilot survey. -- Ml Anrew Li (talk) 07:28, 28 November 2018 (UTC)

Again, what are the questions in the survey? It's common practice to include the survey itself in a review like this one. Do you have a university ethics review board that you've shown your study details to? We're looking for the same type of information. --EpochFail (talk) 16:27, 4 December 2018 (UTC)

Because we lack experience in issuing questionnaires in an OKC (online knowledge community) and may end up with too little data for one or two of the five personalities, the purpose of the pilot survey is to familiarize ourselves with the process of issuing wiki questionnaires and to ensure the richness and completeness of the dataset.

In addition, we have just passed the review of the Ethics Committee. The supporting materials are as follows:

https://docs.google.com/document/d/10pH6VzFjNNZfMTIQRVD6XT2vGsf9WyIasbMqmNnTNAk/edit?usp=sharing --Ml Anrew Li (talk) 07:41, 26 June 2019 (UTC)

How will you correlate user personality with article quality dynamics?

It seems like you'd need to gather the personality profiles of everyone involved with the construction of an article and then run a multi-level statistical analysis to make sense of the article/user dynamics. How is that going to work? --EpochFail (talk) 23:12, 24 September 2018 (UTC)

According to trait theory, a person's personality is relatively stable, so we don't plan to limit the scope of the questionnaire to a certain article. In addition, we intend to use an ensemble learning algorithm to train a personality prediction model, and then use the model to predict the personalities of the users who edit different articles. -- Ml Anrew Li (talk) 05:14, 27 October 2018 (UTC)

So, you're planning to build a personality prediction model and apply that to all of the editors of an article? --EpochFail (talk) 16:06, 19 November 2018 (UTC)

Yes, we are going to do this. -- Ml Anrew Li (talk) 07:28, 28 November 2018 (UTC)

How many people will need to complete your survey?

--EpochFail (talk) 23:12, 24 September 2018 (UTC)

We plan to collect 300 valid questionnaires to build our personality prediction model, so we may issue more than 2,000 questionnaires to get the data considering previous Wikipedia surveys. -- Ml Anrew Li (talk) 05:14, 27 October 2018 (UTC)

This is too large of a number. I think you can expect to reach out to ~1000 people and get ~100 responses (if you are lucky) without getting substantial pushback from the community. It's a burden to have a bunch of researchers surveying their population all of the time, so there's some strong feeling about limiting the scope of such projects.

What previous surveys are you referring to? --EpochFail (talk) 16:09, 19 November 2018 (UTC)

OK, I think you’re right. The following are the references:

[1] Lai C Y, Yang H L. The reasons why people continue editing Wikipedia content – task value confirmation perspective[M]. Taylor & Francis, Inc. 2014.

[2] Ortigosa, Alvaro, Rosa M. Carro, and José Ignacio Quiroga. "Predicting user personality by mining social interactions in Facebook." Journal of Computer and System Sciences 80.1 (2014): 57-71. -- Ml Anrew Li (talk) 07:28, 28 November 2018 (UTC)

I can't access the Lai et al. study because it is behind a paywall that my University doesn't subscribe to. So I can't comment on the number of survey respondents that they were able to get. I see that the study information hasn't been updated to reflect the smaller numbers that I proposed. Please make your plans clear in the study description including specific projections in the number of survey requests to be posted. --EpochFail (talk) 16:31, 4 December 2018 (UTC)

Isn't there an inherent bias because you're only working with people willing to risk their privacy by answering the survey, supposedly truthfully, and doesn't that skew the results unreasonably? (I'm tired of being a lab rat. Where do I vote against this kind of thing, or is it pointless?) —[AlanM1(talk)]— 01:20, 16 December 2018 (UTC)

At present, most research on personality uses questionnaires to obtain personality data, and a questionnaire survey can only cover those who fill out the questionnaires. As for your question about inherent bias, I am sorry that we cannot give an accurate answer. However, we have passed the audit of the university's ethics committee, so we still hope to conduct our research. --Ml Anrew Li (talk) 07:41, 26 June 2019 (UTC)

Ml Anrew Li, I'm posting again in the same thread because I don't see this basic question addressed in your response here or in the study description. Please explain how many editors will need to complete your personality survey in order for you to complete your study. We will need to know how many recruitment messages you will plan on posting. If your original proposal to have 300 people complete your survey is necessary, I fear that you will never be able to attain so many responses without substantially disrupting the work of editors on Wikipedia. --EpochFail (talk) 14:23, 26 June 2019 (UTC)

The figure of 300 people comes from the paper referenced above. If this is very difficult, I think 100 people is acceptable too. --Ml Anrew Li (talk) 13:20, 29 June 2019 (UTC)

Avi_gan & Nennes recently ran a survey study asking Wikipedians to do a similar amount of work in support of their research. Could you tell us the response rate you saw?

If I remember correctly, the response rate was below 10% -- which means that we'd expect 100 responses to require 1000 postings. That seems like too many. I think 300 postings and an expected 30 responses would be the upper end of what would be reasonable for a big wiki. Maybe you can attempt this survey in another wiki. English Wikipedia is a bit over-surveyed. Other wikis might be more likely to have a higher response rate. --EpochFail (talk) 16:30, 1 July 2019 (UTC)

Yes, our response rate was very low (~5%) even with incentives to fill out the survey. --avi_gan (talk) 14:39, 2 July 2019 (UTC)
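
As a rough check on the figures quoted in this thread, the relationship between recruitment postings, response rate, and expected responses can be sketched in a few lines of Python. The function names are illustrative only and are not part of the study; the rates are the approximate ones mentioned above (about 10% and about 5%), and real response rates will vary.

```python
def expected_responses(postings: int, rate_percent: int) -> int:
    """Expected number of completed surveys, rounded down."""
    if not 0 < rate_percent <= 100:
        raise ValueError("rate_percent must be in (0, 100]")
    return postings * rate_percent // 100


def postings_needed(target_responses: int, rate_percent: int) -> int:
    """Smallest number of postings expected to yield the target (ceiling division)."""
    if not 0 < rate_percent <= 100:
        raise ValueError("rate_percent must be in (0, 100]")
    return -(-target_responses * 100 // rate_percent)


if __name__ == "__main__":
    # Figures discussed in the thread: 300 postings at ~10%, 100 responses at ~10% and ~5%.
    print(expected_responses(300, 10))
    print(postings_needed(100, 10))
    print(postings_needed(100, 5))
```

Integer percentages are used here so the arithmetic stays exact; at a 5% rate, every additional 100 responses costs roughly 2,000 more postings, which is the trade-off being debated.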

EpochFail, I think 300 posts are too few to support our research. Considering the normal operation of English Wikipedia and the feasibility of our research, I think 700 posts are acceptable. In addition, our team has done a lot of basic work in the English version of Wikipedia, so we still don't plan to switch to other versions of Wikipedia. --Ml Anrew Li (talk) 06:54, 10 July 2019 (UTC)

I think 700 posts is totally unacceptable. --EpochFail (talk) 13:34, 12 July 2019 (UTC)

@EpochFail:, no offense, but according to the general experience of the research, more than 30 samples are the basic requirements of the study, and we do need about 700 samples if response rate is 5%. In addition, we did some work on the English Wikipedia, such as downloading offline data, writing or publishing the following papers:

(1) Qiu J, Zuo M, Yang S, et al. A qualitative knowledge representation model and application for crisis events[C]. Procedia Computer Science, 2018, 126: 1828-1836.

(2) Jiangnan Q, Chunling W, Miao C. The influence of cognitive conflict on the result of collaborative editing in Wikipedia[J]. Behaviour & Information Technology, 2014, 33(12): 1361-1370.

(3) Qiu J, Liwei Xu, Zuo M, Jingxian Wang. OKC-Enabled online knowledge integration: Role of group heterogeneity and group interaction process. Information Technology and People (SSCI). Under review.

(4) Jiangnan Q, Zuo M, Jingxian Wang, Hellen. Different effects of individual cognition and team interaction on the performance of self-organizing groups in online knowledge communities. Information Systems Journal (SSCI). Under review.

(5) Jiangnan Q, Chengjie Cai, Zuo M, Jingxian Wang, Hellen. Group Heterogeneity versus Group Interaction: Examining knowledge ordering in Online Knowledge Community. In preparation.

(6) Jiangnan Q, Zuo M, Meihui Zhang, Jingguo Wang. Understanding Knowledge Heterogeneity and Cognitive Conflicts in Online Knowledge Community: How Group Cooperation and Competition Matters. In preparation.

(7) Jiangnan Q, Zuo M, Yiru Wang. Understanding characteristics of human-robot collaboration in online knowledge community and their impact on working performance. In preparation.

Data usage

Ml_Anrew_Li how will you store personal information collected in the survey? How do you plan to use this data? How will you maintain privacy and/or anonymity of respondents? Do you plan to publish the data, and in what form? Has your study been reviewed by an IRB? As is, you don't answer these questions at all. And I don't think anyone should respond to your survey without at least some assurances. Jtmorgan (talk) 17:52, 18 October 2018 (UTC)

Thanks for your reminder. We plan to use Google Forms to send and collect questionnaires, but I hope you could tell us if there is a better way. In addition, we intend to collect the user's username on Wikipedia without collecting information such as their real name. We don't plan to publish the data, but if someone wants to get it, we will do the necessary processing to protect privacy. --Ml Anrew Li (talk) 05:14, 27 October 2018 (UTC)

What is "the necessary processing to protect privacy"? I think it's good to specify these things in advance so that Wikipedians can know just how much they are opening themselves up by giving you their information. As Jtmorgan asks, I'd like to know if your study has been reviewed by an IRB. --EpochFail (talk) 16:12, 19 November 2018 (UTC)

If someone wants to get a piece of data, we will encrypt the user name and the names of the edited entries and retain the user's personality data. I'm sorry that our research has not yet applied for IRB review. -- Ml Anrew Li (talk) 07:28, 28 November 2018 (UTC)
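
For readers wondering what such processing could look like, here is a minimal sketch of one possible reading of the reply above. It is an assumption, not the researchers' stated method: it swaps in keyed hashing (HMAC-SHA256) rather than encryption, and the record layout and field names (`user`, `edited_entries`, `personality`) are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, key: bytes) -> dict:
    """Hash the username and entry titles; keep the personality scores as they are."""
    return {
        "user": pseudonymize(record["user"], key),
        "edited_entries": [pseudonymize(t, key) for t in record["edited_entries"]],
        "personality": dict(record["personality"]),
    }


if __name__ == "__main__":
    # The key must stay secret and must not be shared along with the data.
    key = secrets.token_bytes(32)
    # Hypothetical record, for illustration only.
    record = {
        "user": "ExampleUser",
        "edited_entries": ["Example article"],
        "personality": {"openness": 3.5},
    }
    print(pseudonymize_record(record, key))
```

This is pseudonymization, not anonymization: whoever holds the key can recompute the mapping, and the retained scores or editing patterns may still identify a person, so a step like this would not by itself settle the privacy questions raised in this thread.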

I think that this review process should probably pause until you have applied for the relevant IRB approval. We need the same information that your local IRB (or equivalent) needs. --EpochFail (talk) 16:32, 4 December 2018 (UTC)

If you intend to release this survey without IRB approval, and you are unable or unwilling to address the issues of data privacy that EpochFail has raised, then I intend to notify the Village Pump on English Wikipedia that this survey does not follow best practices for ethical research. I will recommend that nobody take the survey, at very least. Your irresponsible approach to research concerns me and may erode community trust in future good-faith researchers who follow best practices. Please halt your study and address the concerns that EpochFail and I have raised. Jtmorgan (talk) 23:50, 4 December 2018 (UTC)

^^ Ml_Anrew_Li Jtmorgan (talk) 23:51, 4 December 2018 (UTC)

As promised, I have posted this notification. Jtmorgan (talk) 21:00, 12 December 2018 (UTC)

Follow up question, is Dàgōng aware that you are planning on conducting research on human subjects without applying for IRB approval? Have you been given a waiver, or have you simply not applied? GMG ^talk 20:24, 12 December 2018 (UTC)

I have submitted an application for review, and it should take a little time. But if the application is passed, where should I send the documents? Please give me an address? --Ml Anrew Li (talk) 09:05, 30 December 2018 (UTC)

We just passed the audit of the university's ethics committee. And we still hope to continue our research very much.

The supporting materials are as follows:

https://docs.google.com/document/d/10pH6VzFjNNZfMTIQRVD6XT2vGsf9WyIasbMqmNnTNAk/edit?usp=sharing --Ml Anrew Li (talk) 07:41, 26 June 2019 (UTC)

Restarting this discussion

Hey folks,

It appears that the researchers (led by Ml Anrew Li) have received a stamp of approval from their university for the proposed research project. The document in question (see the Google Drive link here) provides some details about the study and contains a visible stamp.

@Allthingsgo, Xiplus, Xaosflux, Jtmorgan, GreenMeansGo, and AlanM1: you were all involved in past discussions, so I am pinging to make sure you are aware of this recent development. --EpochFail (talk) 14:21, 26 June 2019 (UTC)

Personally, I'm familiar with IRB review documents at US universities. In those review processes, much more information is necessary for review to be completed. E.g., How will subjects be recruited? What protections are provided for their data? What risks will subjects incur through participating and how will the researchers minimize those risks? As far as I can tell, this document does not suggest that any of these basic questions were asked. So that gives me some hesitation, but I don't think that we should necessarily stop this research because the University's requirements for ethical research seem to be lax.

I'm concerned about one part of the study description on the document. The researchers wrote: "[...] we will attempt to reveal the personality differences of users in different quality or types of articles to help improve the quality of different types of articles in the Wikipedia community." I'm not sure it is clear to me how building a model of personality types of editors helps improve the quality of articles in Wikipedia. Because this makes a claim about benefiting Wikipedia and such a claim could make us consider engaging in some risk (e.g. user privacy) for such a potential benefit, this benefit should be clearly explained. What is the theoretical connection between personality types and the quality of articles? How might this help? --EpochFail (talk) 14:21, 26 June 2019 (UTC)

@EpochFail: Thanks for the ping.

I'm not an academic, so pardon me if the following are insensitive or normally understood in that environment. I am just another somewhat-educated "JoeUser" with a few decades of experience in the world.

The inherent sample bias issue I mentioned above was not resolved. Have there been studies that address this problem, which would seem to be inherent in many similar types of studies?
Has it been addressed how to either create a culturally-neutral questionnaire or to somehow post-process the responses to make them comparable? It would seem that personality-related questions would be particularly susceptible to this sort of question.
I don't want to be a lab rat. I'm tired of people trying to benefit from what I do and say without my consent and without paying me. I don't think I, or society, get enough value in return. While this is purportedly an educational endeavor, there's nothing (I know of) to stop misuse of the (possibly-flawed) published results by anyone, including Wikimedia (to govern future policy and development decisions). I don't want to have to live with the result of responses by what I believe might be an inherently-biased sample.
What guarantees are there that a Chinese university will follow the laws and privacy expectations of Americans, British, EU, etc.? In my experience, the cost of entering into cross-border contractual relationships necessary to enforce these expectations is high, usually accompanied by significant financial reward to at least offset it, which will not be the case here; not to mention the difficulty (legally and politically) of addressing any breach. While we should assume good faith here, we should not ignore history.
(ed) Can you be more specific about what you hope to discover and how that would help Wikipedia? We generally want to focus on content and actions and disregard editor personalities.

Thanks. —[AlanM1(talk)]— 23:04, 26 June 2019 (UTC)

I'm not sure this document in particular really provides us with a lot of additional useful information. I'm not sure the issue of security has been resolved, and I would feel much better personally if this was done to the satisfaction of those such as User:EpochFail, who is more technically competent than I am at evaluating such things. I'm not sure I see how this data set, once gathered, could be effectively de-pseudo-anonymized, and I note that a number of the questions are of a highly personal nature, which could be embarrassing to users if made public (e.g., "I often feel inferior to others," "Sometimes I feel completely worthless," "I often feel helpless").

On a tangentially related note, maybe this exists and I'm unaware, but it seems like it might be useful at some point to develop some kind of best practice recommendations regarding the frequency and volume of on-wiki notifications related to research. This is not the first time I've seen researchers suggest schedules that are fairly wildly out of step with what the community would consider disruptive or realistic. GMG ^talk 12:30, 27 June 2019 (UTC)

The question we want to study is "when different types of articles (know-why and know-what), or articles of the same type, are of different quality, which contributor personalities bring more improvement to the quality of the articles, and which have a bad influence on the quality of the articles?" We think answering this is beneficial to improving article quality and to the development of Wikipedia. --Ml Anrew Li (talk) 13:20, 29 June 2019 (UTC)
Our research does not focus on contributors; we just want to obtain a better-performing personality learner. To get it, some labels are required. --Ml Anrew Li (talk) 13:20, 29 June 2019 (UTC)
The questions in this questionnaire were not written by us; they come from a questionnaire that is currently among the more effective at measuring the Big Five personality traits. If you have any good security solutions, please share them, and we will do our best to protect the privacy of these users. --Ml Anrew Li (talk) 13:20, 29 June 2019 (UTC)

@Allthingsgo, Xiplus, Xaosflux, Jtmorgan, GreenMeansGo, AlanM1, and EpochFail: Hello, guys, thanks for your comments. In order to make our research more valuable, we would like to ask for your opinions:

1. What can we do to protect the privacy of wikipedians better?

2. What would we need to change in order to make this study worthwhile to English Wikipedians?

We will pay attention to your opinions.