Research:Revision scoring as a service/Revscoring library
Feature extraction garden
When supporting an ecosystem of multiple models that use similar features, it is important that features (1) are well defined and (2) do not duplicate work. The "Feature dependencies" figure depicts a set of example features and their dependencies on datasources and on other features. By using a dependency injection strategy to specify and actualize the relationships between features and datasources, we make it easy to develop new features on top of existing features and datasources. We also minimize the work the system must perform when building feature sets for a large number of different models, since a datasource or feature shared by several models only needs to be computed once.
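As a minimal sketch of the dependency injection idea, consider the Python below. It is not the Revscoring library's actual API, and the `Dependent`, `injected`, and `solve` names are hypothetical. Each feature or datasource declares only what it depends on, and a small solver supplies those values. Every result is cached by name, so a dependency shared by several features is computed once per revision:

```python
from typing import Any, Callable, Dict, Tuple


class Dependent:
    """A named value computed by `process` from the values of `depends_on`.

    Hypothetical class for illustration; not the Revscoring API.
    """

    def __init__(self, name: str, process: Callable[..., Any],
                 depends_on: Tuple["Dependent", ...] = ()):
        self.name = name
        self.process = process
        self.depends_on = depends_on


def injected(name: str) -> Dependent:
    """A root datasource whose value must be supplied by the caller."""
    def missing() -> Any:
        raise KeyError(f"datasource {name!r} was not provided")
    return Dependent(name, missing)


def solve(dependent: Dependent, cache: Dict[str, Any]) -> Any:
    """Compute `dependent`, resolving its dependencies first and reusing cached values."""
    if dependent.name in cache:
        return cache[dependent.name]
    args = [solve(dep, cache) for dep in dependent.depends_on]
    value = dependent.process(*args)
    cache[dependent.name] = value
    return value


# Count how often the shared intermediate datasource is actually computed.
tokenize_calls = 0


def tokenize(text: str) -> list:
    global tokenize_calls
    tokenize_calls += 1
    return text.split()


revision_text = injected("revision.text")
revision_words = Dependent("revision.words", tokenize, (revision_text,))

# Two features that share the same intermediate datasource.
word_count = Dependent("feature.word_count", len, (revision_words,))
long_word_count = Dependent(
    "feature.long_word_count",
    lambda words: sum(1 for w in words if len(w) > 6),
    (revision_words,),
)

cache = {"revision.text": "Dependency injection avoids duplicated tokenization work"}
values = [solve(f, cache) for f in (word_count, long_word_count)]
print(values, tokenize_calls)  # prints: [6, 4] 1
```

Both features depend on `revision.words`, yet `tokenize` runs only once. Starting a fresh cache for each revision keeps values from one revision out of the next. The recursive solver assumes the dependency graph is acyclic.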
Example Makefile-style dependency expression for MisspellingRatioDifferential

    WordsAdded: RevisionDiff
        <parse revision diff> \
        return count

    MisspellingsAdded: RevisionDiff Dictionary
        <parse revision diff and use Dictionary to find misspellings> \
        return count

    PreviousWords: ParsedPreviousRevisionText
        <parse non-markup content> \
        return count

    PreviousMisspellings: ParsedPreviousRevisionText Dictionary
        <parse non-markup content and use Dictionary to find misspellings> \
        return count

    MisspellingRatioDifferential: WordsAdded MisspellingsAdded PreviousWords PreviousMisspellings
        return (MisspellingsAdded/WordsAdded) / \
               ((MisspellingsAdded/WordsAdded) + (PreviousMisspellings/PreviousWords))
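The rules above can be translated into a short Python sketch. The Makefile-style prerequisites become a dependency table, and `graphlib.TopologicalSorter` (Python 3.9+) orders the evaluation so that every target is computed exactly once, after its prerequisites. Several details are assumptions for illustration: the tokenizer, the toy dictionary, and the simplification of `RevisionDiff` to the text it adds. `extract` and `PROCESSORS` are hypothetical names.

```python
import re
from graphlib import TopologicalSorter

# Dependency table mirroring the Makefile-style rules: target -> prerequisites.
DEPENDENCIES = {
    "WordsAdded": ("RevisionDiff",),
    "MisspellingsAdded": ("RevisionDiff", "Dictionary"),
    "PreviousWords": ("ParsedPreviousRevisionText",),
    "PreviousMisspellings": ("ParsedPreviousRevisionText", "Dictionary"),
    "MisspellingRatioDifferential": (
        "WordsAdded", "MisspellingsAdded", "PreviousWords", "PreviousMisspellings",
    ),
}


def tokens(text):
    """Naive word tokenizer (assumption: lowercase alphabetic runs only)."""
    return re.findall(r"[a-z]+", text.lower())


def ratio_differential(words_added, misspellings_added,
                       previous_words, previous_misspellings):
    added_rate = misspellings_added / words_added
    previous_rate = previous_misspellings / previous_words
    # Raises ZeroDivisionError if a word count is 0 or both rates are 0.
    return added_rate / (added_rate + previous_rate)


PROCESSORS = {
    "WordsAdded": lambda diff: len(tokens(diff)),
    "MisspellingsAdded": lambda diff, dictionary: sum(
        1 for word in tokens(diff) if word not in dictionary),
    "PreviousWords": lambda text: len(tokens(text)),
    "PreviousMisspellings": lambda text, dictionary: sum(
        1 for word in tokens(text) if word not in dictionary),
    "MisspellingRatioDifferential": ratio_differential,
}


def extract(datasources):
    """Evaluate every target once, in dependency order, given the root datasources."""
    values = dict(datasources)
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        if name in values:
            continue  # root datasource supplied by the caller
        if name not in PROCESSORS:
            raise KeyError(f"datasource {name!r} was not provided")
        args = [values[dep] for dep in DEPENDENCIES[name]]
        values[name] = PROCESSORS[name](*args)
    return values


values = extract({
    # Assumption: the diff is reduced to the text it adds.
    "RevisionDiff": "the cta sat on",
    "ParsedPreviousRevisionText": "the dog was very hapy and the cat sat on the mat",
    "Dictionary": {"the", "cat", "sat", "on", "mat", "dog", "was", "very", "and"},
})
print(values["MisspellingRatioDifferential"])
```

With 1 misspelling in 4 added words (rate 1/4) and 1 in 12 previous words (rate 1/12), this prints a value close to 0.75. Because the formula has the form a / (a + b), it equals 0.5 when the added text is misspelled at the same rate as the previous revision, and it grows toward 1 as the added text's rate rises relative to the previous rate. It is undefined when no words are added or when both rates are zero.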