Community Insights/Community Insights 2020 Report/Collaboration, Diversity & Inclusion (2020)/Statistical endnotes

Appendix. Methodological and statistical endnotes

All those who completed at least 50 percent of the Community Insights survey were weighted based on the monthly edit count used for sampling and compared across groups to understand differences. As many as 1517 and as few as 441 individuals responded to the various question sets that capture our social climate factors. Participant counts for the different question sets vary because some sets were randomized to shorten survey length while others were presented to every survey participant (See Appendix: 2019 Descriptive statistics). All responses were collected on a 5-point Likert scale of agreement with an option to respond “unsure.” All responses within each question set were scored such that a higher score is positive; some items were reverse scored where appropriate (as noted), and the item scores were averaged to produce each factor score presented (See also Appendix: Methodological and Statistical Endnotes).
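The scoring described above (reverse scoring followed by averaging) can be sketched as follows. This is a minimal illustration, not the report's actual scoring code: the 1 to 5 numeric coding, the item names, and the choice to treat "unsure" as a missing value are all assumptions made for the example.

```python
from statistics import mean

SCALE_MAX = 5  # assumed coding: 1 = strongly disagree ... 5 = strongly agree


def factor_score(responses, reverse_items=frozenset()):
    """Average one participant's item scores into a single factor score.

    responses: dict mapping a (hypothetical) item name to an int 1..5,
               or None when the participant answered "unsure".
    reverse_items: names of negatively worded items to reverse score.
    Returns None when every item was answered "unsure".
    """
    scored = []
    for item, value in responses.items():
        if value is None:  # assumption: "unsure" is treated as missing
            continue
        if item in reverse_items:
            value = SCALE_MAX + 1 - value  # maps 1<->5, 2<->4, 3 stays 3
        scored.append(value)
    return mean(scored) if scored else None


# Hypothetical participant: q2 is negatively worded, q3 was "unsure".
example = {"q1": 4, "q2": 2, "q3": None}
score = factor_score(example, reverse_items={"q2"})
```

Reverse scoring before averaging keeps the direction of every item aligned, so a higher factor score is always the more positive result.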

To examine the potential effects of requiring at least 50 percent completion for inclusion in the final analysis, a Mann-Whitney U-test was used to determine if there were differences in factor scores between those who did and did not complete at least half the survey.^[1]^[2] A nonparametric test was used because of the non-normal distribution of the data, which could not be corrected via statistical transformation. Importantly, in two cases there was a detectable difference in factor scores between those who did and did not complete at least 50% of the survey. Those differences were explored to understand their nature, although ultimately those who did not complete at least half, and thus had provided no demographic information, were excluded from the final analysis. Factors on which the analyzed sample reported significantly higher scores were Engagement (n = 2572; mean rank of 1734.70 compared to 1494.12 for those who did not make it to the half-way point, n = 784; U = 1152778.0, p = .000) and Non-Discrimination (n = 2468; mean rank of 1399.81 compared to 1312.15 for those who did not make it to the half-way point, n = 311; U = 407985.5, p = .048).
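For readers unfamiliar with the quantities reported for these comparisons, the sketch below shows how mean ranks and the U statistic are derived from two groups of scores. It is an illustration in plain Python with made-up data, not the software used for the report, and it omits the p-value calculation.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    ordered = sorted(values)
    counts = Counter(ordered)
    first_pos = {}
    for pos, value in enumerate(ordered):
        first_pos.setdefault(value, pos)  # 0-based position of first occurrence
    # A value first seen at 0-based position p with c copies occupies
    # 1-based ranks p+1 .. p+c, whose average is p + (c + 1) / 2.
    return [first_pos[v] + (counts[v] + 1) / 2 for v in values]


def mann_whitney_u(group_a, group_b):
    """Compute mean ranks and both U statistics for two independent groups."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    rank_sum_b = sum(ranks[n_a:])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return {
        "mean_rank_a": rank_sum_a / n_a,
        "mean_rank_b": rank_sum_b / n_b,
        "u_a": u_a,
        "u_b": n_a * n_b - u_a,
    }


# Hypothetical factor scores for completers (a) and non-completers (b).
result = mann_whitney_u([4.0, 4.5, 5.0, 3.5], [3.0, 3.5, 2.5])
```

Statistical packages differ in which of the two U values they print, so a reported U should be read together with the mean ranks to see which group scored higher.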

Along with the following statistical tests, means and medians are also reported in the data tables for clarity, even though the assumptions of normality and absence of outliers are violated.^[3] In some cases, t-test results are also reported alongside nonparametric test results to ease interpretation.

Gender Gap

Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Kruskal-Wallis test was conducted to determine if there were differences in the distribution of contributor groups based on geography. Geographic distributions were not similar for all groups, as assessed by visual inspection of a boxplot. Distributions were statistically significantly different between the different contributor groups, χ2(4) = 74.746, p = .000, N = 1580. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences between Europe (mean rank = 741.06) and Africa (mean rank = 1094.58) (p = .000), Asia & Pacific (mean rank = 869.08) (p = .000), as well as North America (mean rank = 852.25) (p = .001); as well as between South America (mean rank = 718.64) and North America (p = .003), Asia & Pacific (p = .000), and Africa (p = .000); as well as between North America and Africa (p = .000), and between Asia & Pacific and Africa (p = .001), but not between any other group combination.
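The χ2 values reported throughout this appendix are Kruskal-Wallis H statistics, which are compared against a chi-square distribution with (number of groups − 1) degrees of freedom, so χ2(4) corresponds to five groups. The sketch below is a plain-Python illustration with made-up data, not the report's analysis code. The helper `chi2_sf_even_df` is a name introduced here; it uses a closed form that only holds for even degrees of freedom.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    ordered = sorted(values)
    counts = Counter(ordered)
    first_pos = {}
    for pos, value in enumerate(ordered):
        first_pos.setdefault(value, pos)
    return [first_pos[v] + (counts[v] + 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Return (tie-corrected H, degrees of freedom) for a list of groups."""
    pooled = [v for group in groups for v in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # Tie correction; it is zero (undefined H) if every value is identical.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical scores for three groups with no ties.
h_stat, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Check against a value reported later in this appendix:
# chi2(4) = 10.352 for Movement Leadership, reported p = .035.
p_reported_case = chi2_sf_even_df(10.352, 4)
```

A p-value printed as .000 reflects rounding to three decimals; the same calculation applied to the larger statistics in this appendix gives values below .001 rather than exactly zero.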

Geo Gap

Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Kruskal-Wallis test was again conducted to determine if there were differences in demographic representativeness between contributor groups. While there was no significant difference for language fluency, there were differences for male representativeness. Specifically, ratios of male participants were not similar for all groups, as assessed by visual inspection of a boxplot. The proportion of participants identifying as male was statistically significantly different between the different contributor groups, χ2(4) = 71.580, p = .000, N = 1572. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences between Movement organizing Admins (mean rank = 836.05) and Movement Organizers (mean rank = 635.03) (p = .000) as well as Editors (mean rank = 818.74) (p = .000) and non-organizing Admins (mean rank = 850.08) (p = .001); as well as between Movement Organizers and Developers (mean rank = 712.25) (p = .014), and to a lesser extent between Developers and Movement organizing Admins (p = .085); but not between any other group combination.

Audience differences in Collaborative Engagement

Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Kruskal-Wallis^[4] test was conducted to determine if Collaborative Engagement factor scores differed, based on mean ranks,^[5] between contributor groups: Editors, On-wiki Admins, Developers, Movement Organizers, and Movement Organizing Admins. (Note: n-values vary by item; see Appendix: 2019 Descriptive statistics for n-values, means, and medians.)

Distributions of Engagement scores were not similar for all groups, as assessed by visual inspection of a boxplot. Engagement scores were statistically significantly different between the different contributor types, χ2(4) = 79.153, p = .000, N = 1673. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons.^[6] This post hoc analysis revealed statistically significantly lower Engagement scores among Developers (mean rank = 632.08) compared to Movement Organizers (mean rank = 974.04) (p = .000), and Movement organizing Admins (mean rank = 1125.66) (p = .000). Editors (mean rank = 805.51) also tended to score higher than Developers (p = .003) and lower than Movement organizing Admins (p = .000). Engagement was also lower among non-organizing on-wiki Admins (mean rank = 775.20) compared to Movement Organizing Admins (p = .000). There were no significant differences between Movement Organizers and Movement organizing Admins, Developers and non-organizing Admins, or Admins and Editors.
Distributions of Feelings of Belonging scores were not similar for all groups, as assessed by visual inspection of a boxplot. Feelings of Belonging scores were statistically significantly different between the different contributor groups, χ2(4) = 25.823, p = .000, N = 1600. Subsequently, pairwise comparisons were performed using Dunn's procedure with a Bonferroni correction for multiple comparisons.^[6] This post hoc analysis revealed statistically significant differences in Feelings of Belonging scores between Developers (mean rank = 714.34) and Movement Organizers (mean rank = 891.52) (p = .010), and Movement organizing Admins (mean rank = 943.66) (p = .005), as well as between Movement Organizers and Editors (mean rank = 775.33) (p = .003), and Editors and Movement organizing Admins (p = .008), but not between Developers and Editors, or non-organizing Admins (mean rank = 822.08) or any other group combination.
Distributions of Fairness scores were not similar for all groups, as assessed by visual inspection of a boxplot. Fairness scores were statistically significantly different between the different contributor groups, χ2(4) = 30.581, p = .000, N = 1531. Subsequently, pairwise comparisons were performed using Dunn's procedure with a Bonferroni correction for multiple comparisons.^[6] This post hoc analysis revealed statistically significant differences in Fairness scores between Movement Organizers (mean rank = 695.65) and Movement organizing Admins (mean rank = 971.04) (p = .000), as well as Movement organizing Admins and Editors (mean rank = 767.42) (p = .000), and Movement organizing Admins and Developers (mean rank = 703.73) (p = .000), but not between non-organizing Admins (mean rank = 825.27) and any other group.
Distributions of Movement Leadership scores were not similar for all groups, as assessed by visual inspection of a boxplot. Movement Leadership scores were statistically significantly different between the different contributor groups, χ2(4) = 10.352, p = .035, N = 1630. Subsequently, pairwise comparisons were performed using Dunn's procedure with a Bonferroni correction for multiple comparisons.^[6] This post hoc analysis revealed statistically significant differences in Movement Leadership scores between Developers (mean rank = 693.38) and Movement Organizers (mean rank = 864.95) (p = .013), and to a lesser extent, Developers and Editors (mean rank = 816.72) (p = .086), but not between any other group combination.
Distributions of Movement Strategy scores were not similar for all groups, as assessed by visual inspection of a boxplot. Movement Strategy scores were statistically significantly different between the different contributor groups, χ2(4) = 114.43, p = .000, N = 1337. Subsequently, pairwise comparisons were performed using Dunn's procedure with a Bonferroni correction for multiple comparisons.^[6] This post hoc analysis revealed statistically significantly lower scores among Developers (mean rank = 632.31) compared to Movement Organizers (mean rank = 873.33) (p = .000), and Movement organizing Admins (mean rank = 843.28) (p = .000). Editors (mean rank = 601.24) also tended to score lower than Movement Organizers (p = .000), and Movement organizing Admins (p = .002). Movement Strategy was also lower among non-organizing on-wiki Admins (mean rank = 650.49) compared to Movement Organizing Admins (p = .011), but not between any other group pairs.
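The pairwise comparisons above follow the usual pattern of Dunn's procedure: each pair of groups is compared through the difference in mean ranks, and the resulting p-values are multiplied by the number of pairs (Bonferroni). The sketch below illustrates that pattern in plain Python with made-up data and hypothetical group labels. It is not the report's analysis code, and it assumes the adjusted p-value is the raw two-sided p-value times the number of comparisons, capped at 1.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    ordered = sorted(values)
    counts = Counter(ordered)
    first_pos = {}
    for pos, value in enumerate(ordered):
        first_pos.setdefault(value, pos)
    return [first_pos[v] + (counts[v] + 1) / 2 for v in values]


def dunn_bonferroni(groups):
    """groups: dict of label -> scores. Returns {(a, b): (z, adjusted_p)}."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    mean_rank, start = {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size

    # Variance term of Dunn's z statistic, with the correction for ties.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n_total * (n_total + 1) / 12 - ties / (12 * (n_total - 1))

    n_pairs = len(names) * (len(names) - 1) // 2  # 10 pairs for 5 groups
    results = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(variance * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_two_sided = math.erfc(abs(z) / math.sqrt(2))
        results[(a, b)] = (z, min(1.0, p_two_sided * n_pairs))
    return results


# Hypothetical scores for three groups.
results = dunn_bonferroni({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
```

With five contributor groups there are ten pairwise comparisons, so a raw p-value must be below .005 to remain under .05 after the Bonferroni adjustment.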

Year-over-year differences in Collaborative Engagement

Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Mann-Whitney U-test was used to determine if there were differences in factor scores between 2018 and 2019 data.^[1] (Note: n-values vary by indicator and year and are specified in the parenthetical notes.) Year-over-year analysis included only the editors group, for which we are able to apply partial propensity score matching to weight for better representation based on the higher tendency for our more active editors to both start and complete the survey. When compared to 2018, an independent samples U-test found contributors less likely to report experiencing high levels of Fairness (mean = 3.60, N = 1672, t = -2.124, p = 0.034; U = 773692.5, p = .001), Feelings of Belonging (mean = 3.43, N = 1735, t = -2.241, p = 0.025; U = 912252, p = .000), and Movement Leadership (mean = 3.34, N = 1766, t = -3.798, p = 0.000; U = 632879, p = .000). While these may indicate a true difference, we recommend caution as we also have undergone changes to the sample metadata and, while we have worked to ensure alignment as best as possible, the threshold for reaching 50% or more completion in order to be included in the analysis was much more reliable with the 2019 data. Due to this difference, we may have been more, or less, conservative with retention for analysis than intended. It is unknown what effect this may also have had on the comparison. (See also Appendix: Changes from 2018 to 2019.)

Audience differences in Diversity & Inclusion

Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Kruskal-Wallis test was again conducted to determine if there were differences in Diversity & Inclusion factor scores between contributor groups as follows:

Distributions of Inclusive Culture scores were not similar for all groups, as assessed by visual inspection of a boxplot. Inclusive Culture scores were statistically significantly different between the different contributor groups, χ2(4) = 29.929, p = .000, N = 1439. Subsequently, pairwise comparisons were performed using Dunn's procedure with a Bonferroni correction for multiple comparisons.^[6] This post hoc analysis revealed statistically significant differences in Inclusive Culture scores between Movement organizing Admins (mean rank = 902.12) (p = .000) and Movement Organizers (p = .014) compared to Editors (mean rank = 689.38) as well as between Movement Organizing Admins and non-organizing Admins (mean rank = 660.72) (p = .017), and between Movement organizing Admins and Developers (mean rank = 720.65) (p = .000), but not between any other group combination.
Distributions of Non-Discrimination scores were not similar for all groups, as assessed by visual inspection of a boxplot. Non-Discrimination scores were statistically significantly different between the different contributor groups, χ2(4) = 45.532, p = .000, N = 1600. Subsequently, pairwise comparisons were performed using Dunn's procedure with a Bonferroni correction for multiple comparisons.^[6] This post hoc analysis revealed statistically significant differences in Non-Discrimination scores between Editors (mean rank = 843.88) and Developers (mean rank = 690.40) (p = .003), as well as between Editors and Movement Organizers (mean rank = 671.88) (p = .000), but not between any other group combination.
Distributions of Individual Commitment to Diversity scores were not similar for all groups, as assessed by visual inspection of a boxplot. Individual Commitment to Diversity scores were statistically significantly different between the different contributor groups, χ2(4) = 13.351, p = .01, N = 524. Subsequently, pairwise comparisons were performed using Dunn's procedure with a Bonferroni correction for multiple comparisons.^[6] This post hoc analysis revealed statistically significant differences in Individual Commitment to Diversity between Movement organizing Admins and Editors (mean rank = 248.30) (p = .036), but not between any other group combination.

Year-over-year differences in Diversity & Inclusion

Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Mann-Whitney U-test was used to determine if there were differences in factor scores between 2018 and 2019 data.^[1] (Note: n-values vary by indicator and year and are specified in the parenthetical notes.) Once again, year-over-year analysis included only the editors group, for which we are able to apply partial propensity score matching to weight for better representation based on the higher tendency for our more active editors to both start and complete the survey. When compared to 2018, an independent samples U-test found overall that contributors reported lower levels of Non-Discrimination (mean = 4.27, n = 1732, t = -8.923, p = 0.000; mean rank of 1052.08 compared to 1161.62 in 2018, U = 727937.5, p = .000) and Inclusive Culture (mean = 3.43, n = 1544, t = -5.686, p = 0.419; mean rank of 869.61 compared to 1006.53 in 2018, U = 727937.5, p = .000), and higher levels of Inclusive Interactions (mean = 3.67, n = 549, t = -0.809, p = 0.416; mean rank of 736.48 compared to 517.31 in 2018, U = 154913.5, p = .000). While this may indicate a true difference, we recommend caution as we also have undergone changes to the sample metadata and, while we have worked to ensure alignment as best as possible, the threshold for reaching 50% or more completion in order to be included in the analysis was much more reliable with the 2019 data. Due to this difference, we may have been more, or less, conservative with retention for analysis than intended. It is unknown what effect this may also have had on the comparison. (See also Appendix: Changes from 2018 to 2019.)

Supplement 1

Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Mann-Whitney U-test was used to determine if there were differences in Collaborative Engagement factor scores between males and non-males. (Note: n-values vary by indicator and are specified in the parenthetical notes.) When compared to non-males, an independent samples U-test found overall that males reported higher levels along six of the nine Collaborative Engagement factors including: Engagement (mean = 3.48, mean rank of 634.11 for males compared to mean = 3.20 and mean rank = 533.22 for non-males, U = 129900.0, p = .000); Feelings of Belonging (mean = 3.27, mean rank of 603.84 for males compared to mean = 3.01 and mean rank = 510.70 for non-males, U = 115525.5, p = .000); Problem Solving and Negotiating (mean = 4.10 compared to mean = 3.82 for non-males, U = 12408.5, p = .015); Fairness (mean = 3.65 for males compared to mean = 3.42 for non-males, U = 104336.0, p = .000); Movement Leadership (mean = 3.38 and mean rank of 612.38 for males compared to mean = 3.16 and mean rank = 527.03 for non-males, U = 120177.5, p = .001); and lastly, Awareness of Others (mean = 3.31 and mean rank of 206.86 for males compared to mean = 3.10 and mean rank = 173.69 for non-males, U = 3.1, p = .031).
Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Mann-Whitney U-test was also used to determine if there were differences in Diversity & Inclusion factor scores between males and non-males. When compared to non-males, an independent samples U-test found overall that males reported higher levels of Non-Discrimination (mean = 4.40 and mean rank of 612.41 for males compared to mean = 4.01 and mean rank of 480.57 for non-males, U = 123817.0, p = .000); Individual Commitment to Diversity (mean = 4.07 and mean rank of 197.59 for males compared to mean = 3.70 and mean rank of 150.65 for non-males, U = 13272.0, p = .001); and Inclusive Culture (mean = 3.46 and mean rank of 528.0 for males compared to mean = 3.27 and mean rank of 466.25 for non-males, U = 87446.5, p = .009).
Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Mann-Whitney U-test was used to determine if there were differences in Collaborative Engagement factor scores between English-fluent and non-English-fluent contributors. As the means and mean ranks below show, an independent samples U-test found overall that English-fluent contributors reported higher levels along some Collaborative Engagement factors compared to their non-English-fluent counterparts. This included factors of: Feelings of Belonging (mean = 3.47 and mean rank of 607.94 for English-fluent compared to mean = 3.34 and mean rank of 541.38 for those who were not English fluent, U = 144574.0, p = .004); Movement Leadership (mean = 3.36 and mean rank of 614.74 for English-fluent compared to mean = 3.27 and mean rank of 562.45 for those who were not English fluent, U = 144904.0, p = .024); Collaborative Intention (mean = 3.44 and mean rank of 212.27 for English-fluent compared to mean = 3.37 and mean rank of 187.95 for those who were not English fluent, U = 18078.5, p = .067); and Awareness of Self (mean = 2.76 and mean rank of 199.74 for English-fluent compared to mean = 2.52 and mean rank of 160.42 for those who were not English fluent, U = 17054.4, p = .001).
Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Mann-Whitney U-test was used to determine if there were differences in Diversity & Inclusion factor scores between English-fluent and non-English-fluent contributors. When compared to non-English-fluent contributors, an independent samples U-test found overall that English-fluent contributors reported significantly higher scores for Inclusive Culture (mean = 3.47 and mean rank of 533.85 for English-fluent compared to mean = 3.31 and mean rank of 474.75 for those who were not English fluent, U = 110313.5, p = .005); and somewhat elevated scores for Inclusive Interactions (mean = 3.71 and mean rank of 188.89 for English-fluent compared to mean = 3.56 and mean rank of 165.87 for those who were not English fluent, U = 12685.0, p = .087).
Education was highly skewed toward more years of formal education among contributors overall, compared to the population (See Appendix. Factors by years of formal education). While there were not many direct effects predicted by age or education, Generalized Linear Modeling was used to assess the full factorial model of age and education. For Collaborative Engagement, the age*education interaction predicted differences along four factors: Awareness of Others (F = 1.57; p = 0.009; R Squared = .242; Adjusted R Squared = .088), Movement Leadership (F = 1.51; p = 0.006; R Squared = .131; Adjusted R Squared = .044), Movement Strategy (F = 1.40; p = .021; R Squared = .123; Adjusted R Squared = .035), and Fairness (F = 1.40; p = .034; R Squared = .105; Adjusted R Squared = .016).
Generalized Linear Modeling was again used to examine a full factorial model for Diversity & Inclusion, where age*education predicted differences in three factors: Leadership Commitment to Diversity (F = 1.90; p = .001; R Squared = .281; Adjusted R Squared = .133), Individual Commitment to Diversity (F = 1.50; p = .017; R Squared = .237; Adjusted R Squared = .079), and Inclusive Interactions (F = 1.47; p = .027; R Squared = .223; Adjusted R Squared = .071).
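The gap between R Squared and Adjusted R Squared in the results above reflects the penalty that the adjustment applies for the number of model terms, which is large in a full factorial model. A minimal sketch of the standard adjustment formula follows. The sample size and number of terms in the example are hypothetical, since the report does not state them for these models.

```python
def adjusted_r_squared(r_squared, n_obs, n_params):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n_obs: number of observations (n).
    n_params: number of model terms excluding the intercept (p).
    """
    residual_df = n_obs - n_params - 1
    if residual_df <= 0:
        raise ValueError("need more observations than model terms")
    return 1 - (1 - r_squared) * (n_obs - 1) / residual_df


# Hypothetical model: R^2 = 0.25 from 101 observations and 20 terms.
adjusted = adjusted_r_squared(0.25, n_obs=101, n_params=20)
```

When the number of terms is large relative to the sample, a moderate R Squared can shrink to a small adjusted value, which is the pattern seen in the figures reported here.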

Supplement 2

Due to the non-normal distribution of the data which could not be corrected via statistical transformation, again a Kruskal-Wallis test was conducted to determine if there were differences in the distribution of key demographics among contributor groups based on geography.
1. English Fluency distributions were not similar for all groups, as assessed by visual inspection of a boxplot. Distributions were statistically significantly different between the different contributor groups, χ2(4) = 14.112, p = .007, N = 1549. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in English Fluency between Europe-based contributors (mean rank = 793.79) and those in North America (mean rank = 712.70) (p = .017), but not between any other group combination. While Africa seems to be similar to North America in distribution, there were too few observations to reach statistical power in that case.
2. Age distributions were not similar for all groups, as assessed by visual inspection of a boxplot. Distributions were statistically significantly different between contributor groups, χ2(4) = 163.487, p = .000, N = 1469. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences between Asia & Pacific (mean rank = 532.88) and North America (mean rank = 766.03) (p = .000) as well as Europe (mean rank = 849.18) (p = .000) as well as between South America (mean rank = 552.41) and North America (p = .000) as well as Europe (p = .000), and between Europe and Africa (mean rank = 595.44) (p = .001), but not between any other group combination.
3. Education distributions were not similar for all groups, as assessed by visual inspection of a boxplot. Distributions were statistically significantly different between the different contributor groups, χ2(4) = 35.631, p = .000, N = 1506. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences between Asia & Pacific (mean rank = 629.33) and North America (mean rank = 774.00) (p = .000), South America (mean rank = 821.66) (p = .000), and Europe (mean rank = 784.73) (p = .000) but not between any other group combination.
Due to the non-normal distribution of the data which could not be corrected via statistical transformation, again a Kruskal-Wallis test was conducted to determine if there were differences in Collaborative Engagement factor scores between contributor groups as follows:
1. Distributions of Awareness of Others scores were not similar for all groups, as assessed by visual inspection of a boxplot. Awareness of Others scores were statistically significantly different between contributors from different continents, χ2(4) = 14.99, p = .005. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Awareness of Others scores between Africa (mean rank = 71.40) and all other continents: North America (mean rank = 194.19) (p = .018), South America (mean rank = 213.99) (p = .003), Europe (mean rank = 201.61) (p = .004), and Asia & Pacific (mean rank = 215.42) (p = .002), but not between any other group combination.
2. Distributions of Engagement scores were not similar for all groups, as assessed by visual inspection of a boxplot. Engagement scores were statistically significantly different between the different contributor groups, χ2(4) = 31.42, p = .000. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Engagement scores between Africa (mean rank = 987.11) (p = .000) and North America (mean rank = 680.39) (p = .007) compared to Asia & Pacific (mean rank = 555.17), as well as South America (mean rank = 615.42) (p = .000), Europe (mean rank = 620.02) (p = .000) and North America compared to Africa (p = .006), but not between any other group combination.
3. Distributions of Fairness scores were not similar for all groups, as assessed by visual inspection of a boxplot. Fairness scores were statistically significantly different between the different contributor groups, χ2(4) = 22.288, p = .000. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Fairness scores between Asia & Pacific (mean rank = 491.79) (p = .000) and, to some extent, South America (mean rank = 514.71) (p = .080) compared to Europe (mean rank = 596.20), but not between any other group combination.
4. Distributions of Movement Leadership scores were not similar for all groups, as assessed by visual inspection of a boxplot. Movement Leadership scores were statistically significantly different between the different contributor groups, χ2(4) = 10.384, p = .034. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed no statistically significant differences in Movement Leadership scores between any group combinations.
5. Distributions of Movement Strategy scores were not similar for all groups, as assessed by visual inspection of a boxplot. Movement Strategy scores were statistically significantly different between the different contributor groups, χ2(4) = 19.368, p = .001. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Movement Strategy scores between Africa (mean rank = 684.85) and Europe (mean rank = 465.84) (p = .041) as well as between Asia & Pacific (mean rank = 522.82) and North America (mean rank = 419.79) (p = .011), but not between any other group combination.
Due to the non-normal distribution of the data which could not be corrected via statistical transformation, again a Kruskal-Wallis test was conducted to determine if there were differences in Diversity & Inclusion factor scores between contributor groups as follows:
1. Distributions of Non-Discrimination scores were not similar for all groups, as assessed by visual inspection of a boxplot. Non-Discrimination scores were statistically significantly different between the different contributor groups, χ2(4) = 10.231, p = .037. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Non-Discrimination scores between Europe (mean rank = 611.64) and Asia & Pacific (mean rank = 546.93) (p = .047), but not between any other group combination.
2. Distributions of Individual Commitment to Diversity scores were not similar for all groups, as assessed by visual inspection of a boxplot. Individual Commitment to Diversity scores were statistically significantly different between the different contributor groups, χ2(4) = 14.720, p = .005. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Individual Commitment to Diversity scores between Africa (mean rank = 346.75) compared to South America (mean rank = 179.33) (p = .032), Europe (mean rank = 182.01) (p = .026), and marginally, Asia & Pacific (mean rank = 191.52) (p = .053), but not between any other group combination.
Due to the non-normal distribution of the data, which could not be corrected via statistical transformation, a Kruskal-Wallis test was again conducted to determine if there were differences in Collaborative Engagement factor scores between contributor groups with different Wikimedia home spaces. This included six categories of participants: those with home project spaces of Wikipedia, Commons, Wikidata, or Other Wikimedia online projects, plus Developers and Organizers. For charting purposes, the Commons and Wikidata home groups have been combined with Other Wikimedia online projects to simplify the visualization, while the analysis examined each of the groups.
1. Distributions of Engagement scores were not similar for all groups, as assessed by visual inspection of a boxplot. Engagement scores were statistically significantly different between the different contributor groups, χ2(5) = 66.598, p = .000. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Engagement scores between Developers (mean rank = 734.89) compared to active editors whose home project was Commons (mean rank = 903.51) (p = .047) and Organizers (mean rank = 1036.99) (p = .000) as well as between editors whose home project was Commons and those whose home project was Wikipedia (mean rank = 764.31) (p = .011), as well as between editors whose home project was Wikipedia (p = .000) or other Wikimedia (mean rank = 781.06) (p = .000) and Organizers, but not between any other group combination.
2. Distributions of Feelings of Belonging scores were not similar for all groups, as assessed by visual inspection of a boxplot. Feelings of Belonging scores were statistically significantly different between the different contributor groups, χ2(5) = 11.548, p = .042. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Feelings of Belonging scores between those whose home project was Wikipedia (mean rank = 752.27) and Organizers (mean rank = 868.30) (p = .020), but not between any other group combination.
3. Distributions of Fairness scores were not similar for all groups, as assessed by visual inspection of a boxplot. Fairness scores were statistically significantly different between the different contributor groups, χ2(5) = 32.801, p = .000. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Fairness scores between those whose home project was Wikipedia (mean rank = 700.97) and those whose home project was Other Wikimedia (mean rank = 826.27) (p = .033), Commons (mean rank = 836.63) (p = .006) or Organizers (mean rank = 812.18) (p = .033) as well as between Developers (mean rank = 670.71) and those whose home project was Other Wikimedia (p = .033) or Commons (p = .035), but not between any other group combination.
4. Distributions of Movement Strategy scores were not similar for all groups, as assessed by visual inspection of a boxplot. Movement Strategy scores were statistically significantly different between the different contributor groups, χ2(5) = 86.860, p < .001. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Movement Strategy scores between Organizers (mean rank = 858.16) and those whose home project was Wikipedia (mean rank = 602.40) (p < .001), Commons (mean rank = 598.48) (p < .001), or other Wikimedia projects (mean rank = 560.46) (p < .001), as well as between Developers (mean rank = 759.13) and those whose home project was Wikipedia (p = .001), Commons (p = .026), or other Wikimedia projects (p < .001), but not between any other group combination.
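The omnibus statistic reported in each item above can be illustrated with a minimal sketch. It computes the Kruskal-Wallis H statistic (with the usual tie correction) and the per-group mean ranks using only the Python standard library. The function names and the three-group demo data are hypothetical and are not the survey data; this is not necessarily how the report's figures were produced. H is compared against a chi-square distribution with k - 1 degrees of freedom (5 for six groups); the p-value lookup is omitted here, and a library routine such as `scipy.stats.kruskal` would supply it.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, df, mean_ranks) for a dict of group name -> list of scores."""
    pooled = list(chain.from_iterable(groups.values()))
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted_sum = 0.0
    mean_ranks = {}
    start = 0
    for name, scores in groups.items():
        group_ranks = ranks[start:start + len(scores)]
        start += len(scores)
        rank_sum = sum(group_ranks)
        mean_ranks[name] = rank_sum / len(scores)
        weighted_sum += rank_sum ** 2 / len(scores)
    h = 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    h /= 1 - tie_term / (n_total ** 3 - n_total)
    return h, len(groups) - 1, mean_ranks


# Hypothetical factor scores for three made-up groups (not survey data).
demo = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
h_stat, df, mean_ranks = kruskal_wallis(demo)
print(round(h_stat, 3), df, mean_ranks)
```

The mean ranks returned here play the same role as the "mean rank" values quoted in the endnotes: they summarize where each group sits in the pooled ranking.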
Because the data were non-normally distributed and could not be corrected via statistical transformation, a Kruskal-Wallis test was again conducted to determine whether Diversity & Inclusion factor scores differed between contributor groups with different Wikimedia home spaces. This included six categories of participants: those with home project spaces of Wikipedia, Commons, Wikidata, or Other Wikimedia online projects, as well as Developers and Organizers. For charting purposes, the Commons and Wikidata home groups have been combined with Other Wikimedia online projects to simplify the visualization, while the analysis examined each of the groups.
1. Distributions of Non-Discrimination scores were not similar for all groups, as assessed by visual inspection of a boxplot. Non-Discrimination scores were statistically significantly different between the different contributor groups, χ2(5) = 48.812, p < .001. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in Non-Discrimination scores between Developers (mean rank = 646.73) and those whose home space was Wikipedia (mean rank = 782.36) (p = .007), Commons (mean rank = 833.71) (p = .002), or other Wikimedia projects (mean rank = 890.83) (p < .001), as well as between Organizers (mean rank = 654.20) and those whose home space was Wikipedia (p = .001), Commons (p = .001), or other Wikimedia projects (p < .001). Lastly, the difference between those whose home space was Wikipedia and those whose home space was another Wikimedia project was also significant (p = .011); no other group comparison was significant.
2. Distributions of Individual Commitment to Diversity scores were not similar for all groups, as assessed by visual inspection of a boxplot. Individual Commitment to Diversity scores were statistically significantly different between the different contributor groups, χ2(5) = 13.627, p = .018. Subsequently, pairwise comparisons were performed using Dunn's (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed only one statistically significant difference in Individual Commitment to Diversity scores, between those who make Wikipedia their home (mean rank = 247.69) and Organizers (mean rank = 308.79) (p = .013).
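The pairwise step described in these items, Dunn's procedure with a Bonferroni correction, can be sketched as follows. The sketch works from summary values (mean ranks and group sizes) and uses only the Python standard library. The function name, the group labels, and the demo numbers are hypothetical; the `tie_term` argument stands for the sum of t^3 - t over tied score values and defaults to zero (no ties). This is an illustration of the general technique, not a reconstruction of the report's analysis.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def dunn_bonferroni(mean_ranks, sizes, tie_term=0.0):
    """Dunn's pairwise z tests on mean ranks with Bonferroni-adjusted p-values.

    mean_ranks: dict of group name -> mean rank in the pooled ranking
    sizes:      dict of group name -> number of respondents in the group
    tie_term:   sum of (t^3 - t) over tied values; 0.0 assumes no ties
    Returns a dict mapping (group_a, group_b) -> (z, adjusted_p).
    """
    n_total = sum(sizes.values())
    pairs = list(combinations(mean_ranks, 2))
    # Variance of ranks, reduced when ties are present.
    variance = n_total * (n_total + 1) / 12 - tie_term / (12 * (n_total - 1))
    results = {}
    for a, b in pairs:
        se = sqrt(variance * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_ranks[a] - mean_ranks[b]) / se
        p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
        # Bonferroni: multiply by the number of comparisons, cap at 1.
        results[(a, b)] = (z, min(1.0, p_two_sided * len(pairs)))
    return results


# Hypothetical summary values for three made-up groups (not survey data).
demo_mean_ranks = {"A": 2.0, "B": 5.0, "C": 8.0}
demo_sizes = {"A": 3, "B": 3, "C": 3}
pairwise = dunn_bonferroni(demo_mean_ranks, demo_sizes)
for pair, (z, p_adj) in pairwise.items():
    print(pair, round(z, 3), round(p_adj, 3))
```

With six contributor groups there are 15 pairwise comparisons, so under this correction each unadjusted p-value would be multiplied by 15 before being compared with .05.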
A multinomial logistic regression was run to determine the effect of dominant category identification as Male, English-fluent, and online Wikimedia space on the experiences of Collaborative Engagement and Diversity & Inclusion factors.^[7] Across the Collaborative Engagement and Diversity & Inclusion factors, several factors were found to vary by orientation to these three categories. Specifically:
1. Awareness of Others WALDχ2(7) = 18.418, p = .001. There was a direct effect for those who identified with non-dominant online project spaces to report lower scores for Awareness of Others than other contributors, -0.436 (95% CI, -0.919 to 0.047; WALDχ2(1) = 6.791, p = .009). There was also a three-way interaction effect for Wikipedia*English-Fluent*Male, 3.248 (95% CI, 0.805 to 5.69; WALDχ2(1) = 6.791, p = .009), in which this effect may become additive.
2. Self-Awareness WALDχ2(7) = 19.653, p = .006. There was a direct effect for those who identified with non-dominant online project spaces to report higher Self-Awareness than other contributors, 0.54 (95% CI, 0.052 to 1.037; WALDχ2(1) = 4.693, p = .030). There was also an interaction effect for English-Fluency, -1.33 (95% CI, -2.363 to -0.305; WALDχ2(1) = 6.462, p = .011), in which this project home space difference is reversed for those not fluent in English.
3. Collaborative Intention WALDχ2(7) = 20.622, p = .004. There was a direct effect for those who identified with the dominant online project space of Wikipedia to report lower scores in Collaborative Intention than other contributors, -0.552 (95% CI, -1.014 to -0.09; WALDχ2(1) = 5.479, p = .019). There was also a direct effect of English-fluency, -0.532 (95% CI, -1.024 to -0.04; WALDχ2(1) = 4.486, p = .034), and a three-way interaction effect for Wikipedia*English-Fluent*Male, 2.771 (95% CI, 0.459 to 5.083; WALDχ2(df) = 5.516, p = .019), in which these effects may become additive.
4. Feelings of Belonging WALDχ2(7) = 27.898, p < .001. The direct effect for those who identified with the dominant online project space of Wikipedia did not reach significance, while the direct effect of being non-Male predicted lower scores on Feelings of Belonging, -0.424 (95% CI, -0.805 to -0.043; WALDχ2(1) = 4.753, p = .029). No interaction effects were significant.
5. Problem-Solving & Negotiating WALDχ2(7) = 22.574, p = .002. There was no direct effect of identifying with the dominant online project space of Wikipedia or of English-fluency on Problem-Solving & Negotiating; however, there was a direct effect of identifying as non-male to predict lower scores, -0.537 (95% CI, -1.659 to 0.586; WALDχ2(1) = 3.921, p = .048). There was also a three-way interaction effect for Wikipedia*English-Fluent*Male, 3.659 (95% CI, 1.164 to 6.154; WALDχ2(1) = 8.26, p = .004), in which the effects of non-dominant status can become additive.
6. Movement Leadership WALDχ2(7) = 22.102, p = .002. The direct effect for those who identified with the dominant online project space of Wikipedia did not reach significance, while the direct effect of being non-Male predicted lower scores on Movement Leadership, -0.434 (95% CI, -0.794 to -0.074; WALDχ2(1) = 5.57, p = .018). The three-way interaction approached, but did not reach, significance.
7. Non-Discrimination WALDχ2(7) = 52.64, p < .001. There was a direct effect for those who identified with the dominant online project space of Wikipedia to score lower in Non-Discrimination, 0.564 (95% CI, 0.206 to 0.922; WALDχ2(1) = 9.524, p = .002), as well as for those who identified as male to score higher, -0.893 (95% CI, -1.277 to -0.51; WALDχ2(1) = 20.821, p < .001). There were no additional interaction effects.
8. Inclusive Culture WALDχ2(7) = 22.684, p = .002. There was a direct effect for those who identified as male to report higher scores in terms of Inclusive Culture, -0.448 (95% CI, -0.828 to -0.069; WALDχ2(1) = 5.368, p = .021). There was also an interaction effect for project space*English-literacy, -0.703 (95% CI, -1.323 to -0.083; WALDχ2(1) = 4.935, p = .026), in which the diversity gap is amplified.
9. Individual Commitment to Diversity WALDχ2(7) = 21.245, p = .003. There was a direct effect for those who identified as male to report higher scores in terms of Individual Commitment to Diversity, -0.866 (95% CI, -1.513 to -0.219; WALDχ2(1) = 6.89, p = .009). There was also an interaction effect for project space*English-literacy, in which the diversity gap is amplified, that approached but did not reach significance, -0.89 (95% CI, -1.91 to 0.129; WALDχ2(df) = 2.932, p = .087).
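Each coefficient in the list above is reported with a 95% confidence interval, a single-degree-of-freedom Wald chi-square, and a p-value. A minimal sketch of how these three quantities relate is given below. It assumes the intervals are symmetric Wald intervals (estimate ± 1.96 × standard error); the source does not state how they were computed, so this is an assumption. The function name `wald_from_ci` is hypothetical. The example input is the Movement Leadership coefficient for being non-Male (-0.434; 95% CI, -0.794 to -0.074).

```python
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal, about 1.96.
Z_95 = NormalDist().inv_cdf(0.975)


def wald_from_ci(estimate, ci_low, ci_high):
    """Recover SE, Wald chi-square (1 df) and p from a symmetric 95% CI.

    Assumes ci_low and ci_high equal estimate -/+ Z_95 * SE.
    """
    se = (ci_high - ci_low) / (2 * Z_95)
    z = estimate / se
    wald = z ** 2  # chi-square with 1 degree of freedom
    # A 1-df chi-square p-value equals the two-sided normal p-value for z.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, wald, p


# Movement Leadership, direct effect of being non-Male (figures from the list).
se, wald, p = wald_from_ci(-0.434, -0.794, -0.074)
print(f"SE = {se:.3f}, Wald = {wald:.2f}, p = {p:.3f}")
```

Under that assumption the recomputed values (Wald of about 5.58, p of about .018) agree with the reported WALDχ2(1) = 5.57, p = .018 up to rounding. An interval that spans zero corresponds to p > .05 under the same assumption.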

References

1. Mann, H. B.; Whitney, D. R. (1947). "On a test of whether one of two random variables is stochastically larger than the other". The Annals of Mathematical Statistics, 18(1), 50-60.
2. Sheskin, D. J. (2011). Handbook of parametric and nonparametric statistical procedures (5th ed.). Boca Raton, FL: Chapman & Hall/CRC Press.
3. Laerd Statistics (2015). "Statistical tutorials and software guides". Laerd Statistics.
4. Kruskal, W. H.; Wallis, W. A. (1952). "Use of ranks in one-criterion variance analysis". Journal of the American Statistical Association, 47(260), 583-621.
5. Lehmann, E. L. (2006). Nonparametrics: Statistical methods based on ranks. New York: Springer.
6. Dunn, O. J. (1964). "Multiple comparisons using rank sums". Technometrics, 6, 241-252.
7. Kleinbaum, D. G.; Klein, M. (2010). Logistic regression (3rd ed.). New York: Springer.

[:2-1] Mann, H.B.; Whitney, D. R. (1947). "On a test of whether one of two-random variables is stochastically larger than the other.". The Annals of Mathematical Statistics, 18(1), 50-60.

[2] Sheskin, D. J. (2011). Handbook of parametric and nonparametric statistical procedures (5th ed.). Boca Raton, FL, USA.: Chapman & Hall/CRC Press.

[3] Laerd Statistics (2015). "Statistical tutorials and software guides". Laerd Statistics.

[4] Kruskal, W. H.; Wallis, W. A. (1964). "Use of ranks in one-criterion variance analysis". Journal of the American Statistical Association, 47(260), 583-621.

[5] Lehmann, E. L. (2006). Nonparametrics: Statistical methods based on ranks. New York: Springer.

[:7-6] ↑ ^a ^b ^c ^d ^e ^f ^g ^h Dunn, O. J. (1964). "Multiple comparisons using rank sums". Technometrics, 6, 241-252.

[7] Kleinbaum, D. G.; Klein, M. (2010). Logistic regression (3rd ed.). New York: Springer.

[1]

[2]

[3]

[4]

[5]

[6]

[7]