File:Similarity analysis and clustering of modules.pdf

From Meta, a Wikimedia project coordination wiki

Original file (2,133 × 148,614 pixels, file size: 429 KB, MIME type: application/pdf)

This is a file from the Wikimedia Commons. The description on its description page there is copied below.


English: This report is part of the Abstract_Wikipedia data science project (Abstract_Wikipedia/Data). The final project is hosted in and the code is in GitHub (

The report tests various clustering algorithms and analyzes their results to find a suitable match for our task: determining which modules are similar and are therefore possible candidates for merging.

It also contains a brief literature review of code similarity detection and a list of possible candidates for improving the clustering with better algorithms.
Source: Own work
Author: Aisha Khatun


I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.


Notebook containing similarity analysis and various methods of clustering for Scribunto modules.

Items portrayed in this file


2 March 2021

File history

current: 07:02, 8 March 2021; 2,133 × 148,614 (429 KB); Aisha Khatun; Uploaded own work with UploadWizard

There are no pages that use this file.