Grants:IEG/Lua libs for behavior-driven development/Midpoint

Welcome to this project's midpoint report! This report shares progress and learnings from the Individual Engagement Grantee's first 3 months.

Summary

In a few short sentences or bullet points, give the main highlights of what happened with your project so far.

  • Got the baseline extension up and running. Tracking categories work, together with logging and page indicators.
  • Got interactive testing up and running. It sits in a separate pane below the current debug pane.
  • Started on the expectations.

Methods and activities

How have you setup your project, and what work has been completed so far?

Describe how you've setup your experiment or pilot, sharing your key focuses so far and including links to any background research or past learning that has guided your decisions. List and describe the activities you've undertaken as part of your project to this point.


Dev environment
  • A MediaWiki-Vagrant instance with codeeditor, geshi, memcached, scribunto, and wikieditor (setup described at Help:Spec/Vagrant)
  • The code editor is Visual Studio Code with some additional extensions
  • Style checking, etc., is in place
  • Code is on GitHub (sorry)
Functional
  1. Create proper entries at the statistics page for source modules and their test modules
    Not done. Used logging and tracking categories instead; see #Statistics about status changes.
  2. Create a bare minimum portal about Spec-style testing on MediaWiki
    Partially done as Help:Spec; it must still be extended.
  3. Create a bare minimum extension for Spec-style testing
    Partially done; code at GitHub: jeblad/spec.
    The code must be moved to Wikimedia's repository before localization starts.
  4. Extend the extension with test doubles
    Not done so far
  5. Extend the extension with spies
    Not done so far
  6. Extend the extension with coverage of public interfaces
    Not done so far
Socializing
  1. Maintain a list of non-techie updates at Help:Spec/Rollout
    Maintained pages about the extension, but not a rollout page. (Shouldn't that rather be a rollout to the sites with full functionality?)
  2. At startup, add a note about the existence of Help:Spec to the community portals of the biggest Wikimedia projects
    Done for some projects.
  3. Mark introduction and core pages at Help:Spec for translation as early as possible
    Done for some pages, but several of them are not yet stable.
  4. Write a monthly newsletter for the wikitech mailing list, mostly about technical progress
    Not done
  5. Write a non-techie newsletter for the community portals when the extension has its first tests running on-wiki for the examples
    Not there yet

Midpoint outcomes

What are the results of your project or any experiments you’ve worked on so far?

Please discuss anything you have created or changed (organized, built, grown, etc) as a result of your project to date.

The outcomes of the tests are reduced to an overall status. That result is used to set various bells and whistles. Internally, statuses are exchanged as text strings and picked up by registered callbacks on various hooks. An alternative would be to pull more in through RequestContexts, but that would give the code slightly more dependencies.
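
As a rough illustration of the reduction step, here is a minimal Lua sketch. The status names and their precedence are assumptions made for illustration; the extension's actual values and reduction code may differ.

  -- Fold individual test outcomes into one overall status.
  -- Status names and their precedence are assumed for this sketch.
  local order = { unknown = 0, good = 1, pending = 2, failed = 3 }

  local function overallStatus( outcomes )
      local overall = 'unknown'
      for _, outcome in ipairs( outcomes ) do
          if order[outcome] and order[outcome] > order[overall] then
              overall = outcome
          end
      end
      return overall
  end

  -- overallStatus( { 'good', 'good', 'failed' } ) --> 'failed'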

Background processing

The extension first tries to find a module at one of several locations, and when it finds a match it tries to invoke it. The result is then parsed and an overall status determined. Once the status is found, extension data is set together with page properties. The extension data is then used for further message passing internally in the extension.
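
The lookup itself can be pictured with a small Lua sketch that would run in a Scribunto environment. The candidate subpage names are hypothetical and only illustrate the idea of probing several locations; the extension's real search order may differ.

  -- Probe a few candidate subpages for a module's tests.
  -- The subpage names are assumptions, not the extension's actual locations.
  local candidates = { '/spec', '/testcases', '/tests' }

  local function findTestPage( moduleName )
      for _, suffix in ipairs( candidates ) do
          local title = mw.title.new( moduleName .. suffix )
          if title and title.exists then
              return title
          end
      end
      return nil
  end

  -- findTestPage( 'Module:Example' ) --> the first existing candidate title, or nil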

UI changes during view

A module that has tests, with its page indicator and tracking category.

Page indicator

At the top of the pages for the module and the tests there is a Help:Page indicator. It shows the overall status of the outcome of the tests.

Styling is done through CSS, which includes coloring and an icon. The icon should be imported through @embed, and not "hardcoded" in the style sheet. The set of icons is somewhat limited, and it is not clear whether the icons should follow the state or be simplified even further.

The text is fully localizable.

Tracking category

At the bottom of the pages for the module there is a Help:Tracking category. It shows the overall status of the outcome of the tests. The test pages themselves do not get a tracking category for the test outcome.

The text is fully localizable, as usual for tracking categories.

UI changes during edit

A test for a module, with its page indicator and the test panel below the debug panel.

Test panel

The test panel is below the debug panel and is constructed the same way. A single button is used to run the tests, and another to clear reports from previous test runs. There is a slight difference in how tests are run for the tester and the testee, but that should not be visible to the developer. It is possible to run the tests from the tester while editing the testee, and to run the saved testee code while editing the tester. With this solution it is not possible to edit both the tester and the testee at the same time, even though that would be technically feasible.

The texts are fully localizable.

Changes to logging

Changes in status trigger a logging event. Those changes might come as part of an edit, but might also come later. Because it is not necessarily obvious who should be blamed, the user logging the event is a system user. It would be possible to run the tests during save and then use that user for logging purposes, but the code would be somewhat involved. It is much easier to just use the render call created after the redirect.

The text is fully localizable.

Finances

Please take some time to update the table in your project finances page. Check that you’ve listed all approved and actual expenditures as instructed. If there are differences between the planned and actual use of funds, please use the column provided there to explain them.

As this is an individual grant, the previous part is, according to email from WMF, used as salary for the grant holder.

Hours used on this project, as of this writing, are:

  • July: 108.75 hours
  • August: 159.50 hours
  • Total: 268.25 hours

A three-month full-time project would be about 480 hours, so in terms of hours I am slightly past the halfway point: 50% of full time would be 240 hours, slightly less than my 268.25 hours.

More fine grained statistics are available upon request.

Then, answer the following question here: Have you spent your funds according to plan so far? Please briefly describe any major changes to budget or expenditures that you anticipate for the second half of your project.

There are no changes to the previous plan that I know of so far.

Learning

The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you are taking enough risks to learn something really interesting! Please use the below sections to describe what is working and what you plan to change for the second half of your project.

What are the challenges

What challenges or obstacles have you encountered? What will you do differently going forward? Please list these as short bullet points.

Technical
  • Generalization of some processes to make them extensible. Planning for alternate test methods, in particular, has taken a lot of time.
  • Figuring out which objects are defined and valid at each event, i.e. the life span of the core objects (a.k.a. the "globals")
  • Some of the odd areas of extension setup, and how this is done in extension.json
  • Requirements on other extensions; these are undefined for the moment
  • Missing docs on ResourceLoader, especially on how to inline code and styles
Localization
  • How to plan for, and document, localization. The extension has a lot of repeated documentation.
Documentation
  • Part of the MediaWiki documentation is slightly outdated and incomplete. This is pretty well known.
  • The code editor (VS Code) can't properly parse a lot of the MediaWiki libs, making it necessary to look up definitions manually
  • Style tools for Lua are missing; well, there is really nothing at all.
  • A description of the Lua sandboxing is missing, that is, how one and the same lib on a single page can't share its structures with itself.
  • A description of how and why a Lua lib is set up in PHP, what can trigger circular references, and how to avoid them.
  • The motivation for writing bug reports when I'm the only one reading them is kind of low.
Socializing
  • It is difficult to create interest in something that isn't fully working. ;)

What is working well

What have you found works best so far? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your finding in the form of a link to a learning pattern.

  • Your learning pattern link goes here

For now, only a few points:

  • Do use extensive testing, even if you are the only one coding the project. Code breaks. You make changes in an unrelated part of the code, and they propagate to a part you aren't working on at the moment. If it takes too long before you observe the bug, you lose context, and when you lose context it takes a long time to figure out what's wrong.
  • Do follow style manuals; they make the code easier to read. The most important developer is you, and you must be able to read your own code. If you can't read it, nobody else can. Automated tools are nice, but unfortunately there is no such tool for Lua.
  • Comments are a lie. If you need comments to explain your code, then you should probably refactor it. Often, but not always, a comment marks a point where code should be moved out to a separate function. Avoid saying what's obvious; let the code talk.
  • Use inheritance, but note that inheritance does not imply classes. Composition is often preferable to classes. Remember that Lua modules export tables, which are instances; that is a kind of composite class that can be instantiated (a minimal sketch follows after this list).
  • Code for reuse, but don't waste time on code you will never use. Start with something minimal and extend the libs as necessary. Ask yourself whether the code is general and solves the problem at hand. Would you reuse the code in another project if it were not your code?
  • Remember to make the code testable; it takes a lot of time to fix messy code later on. It is not strictly necessary to run a tight test-develop loop; I often use spike-stabilize and then enter the test-develop loop. During the spike I try to get an idea of what the code should do, and during stabilization I refactor and make the code testable.
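
A minimal sketch of the composition point above, with made-up names; it only shows how a Lua module can export a table and reuse behaviour by delegation rather than by class inheritance.

  -- Composition instead of class inheritance.
  local reporter = {}

  function reporter.report( self )
      return self.name .. ': ' .. self.status
  end

  local function newSubject( name )
      -- compose: borrow the reporting behaviour rather than inheriting from a class
      return { name = name, status = 'unknown', report = reporter.report }
  end

  local subject = newSubject( 'Module:Example' )
  -- subject:report() --> 'Module:Example: unknown'

  return { newSubject = newSubject }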

Next steps and opportunities

What are the next steps and opportunities you’ll be focusing on for the second half of your project? Please list these as short bullet points. If you're considering applying for a 6-month renewal of this IEG at the end of your project, please also mention this here.

As of this writing:
  • Derived classes for Adapt, that is, Expectation and Subject. The latter two will probably be merged.
  • Classes for Describe, Context, and It (see the usage sketch after this list)
  • Creating a few formatters for the TAP report, probably a minimal set of compact (minimalistic text style), full (text style), and vivid (HTML style)
  • Creating the access points from the mw lib
  • Move from GitHub to Gerrit, approximately in weeks 36-37
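
To give a feel for where this is heading, the sketch below shows how a spec might read once Describe, It, and the expectations exist. The names, the call style, and the tiny stand-in implementations are assumptions for illustration only, not the final API.

  -- Hypothetical stand-ins so the sketch is self-contained.
  local function it( label, fn )
      local ok, err = pcall( fn )
      return { label = label, status = ok and 'good' or 'failed', err = err }
  end

  local function describe( label, fn )
      return { label = label, results = { fn() } }
  end

  local function expect( actual )
      return {
          toBe = function( _, expected )
              assert( actual == expected, tostring( actual ) .. ' ~= ' .. tostring( expected ) )
          end
      }
  end

  -- A spec for a trivial testee.
  local function hello( name ) return 'Hello, ' .. name .. '!' end

  local report = describe( 'hello()', function()
      return it( 'greets by name', function()
          expect( hello( 'world' ) ):toBe( 'Hello, world!' )
      end )
  end )
  -- report.results[1].status --> 'good'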

Grantee reflection

We’d love to hear any thoughts you have on how the experience of being an IEGrantee has been so far. What is one thing that surprised you, or that you particularly enjoyed from the past 3 months?

This turned into more of a "good ideas from the previous work" piece, and less about my own experiences.

Statistics about status changes

After some testing (and a lot of deep thought) I found that posting information about the status of testing on the statistics special page is simply posting for the wrong audience. Not that we can't (or shouldn't) post overall statistics there, but we won't get the attention of the most concerned users there. What we want is a DarnSimple™ way to identify whom to annoy and whom to pat on the back when modules change state from good to bad or the other way around. I was somewhat influenced by a video from GTAC 2013, Keynote: Evolution from Quality Assurance to Test.

After rethinking, I ended up with a solution where a module has a test, and that test posts an overall indication of the result on the module page, as a page indicator. The obvious page indicators are "good" and "bad", or "failed". The next obvious thing was to add tracking categories: if the tests for a module are good, the module is added to a tracking category for good modules. In addition, I soon figured out that logging would be very useful, because a module can use some other module, and that other module can make our module break even if it does not break itself. So a logging feature was added.

Notifications to watching users

Alerts to the users that have a specific module on their watchlist seem like a pretty obvious next step. This makes the blame for a good or failed module visible to those involved. There are, however, some users who don't respond well to messages that only go to themselves, and we need some way for others to receive the messages. One idea would be to deliver bundled notices to admins. See 32:20 to 36:10 in the GTAC video.

There is a difference between those who want to watch because they can fix upcoming problems in the Lua modules, and those who are mostly interested in the impact on articles. The latter would probably be admins in general, and they are probably more interested in statistical changes, that is, an increase in failures that correlates with use of a specific module or template. The problem seems similar to watching users, but it has a very different implementation.

A special page for related changes

It would be possible to make a kind of special page that analyzes the changes made to other modules, and thereby identifies which edit made the module break. Such a tool could be linked from the tracking log of tested modules. That would be an awesome tool, but it is not part of the plan for this project.

The idea is to take the timestamps for the good and failed events, and then check all revisions of the pages required by both the tester and the testee, and by the pages they require in turn. The whole set is then checked for edits; one of those edits can be the cause of the failed tests. It is also possible that the failure is initiated from pages that are loaded in some other way.
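
A rough sketch of that analysis, in plain Lua with faked data access; a real implementation would live in a special page and query the revision history directly.

  -- Given the last 'good' and the first 'failed' timestamps for a module,
  -- pick out revisions of its required pages that fall in between.
  local function suspects( goodTime, failedTime, requiredPages, revisionsOf )
      local hits = {}
      for _, page in ipairs( requiredPages ) do
          for _, rev in ipairs( revisionsOf( page ) ) do
              if rev.timestamp > goodTime and rev.timestamp <= failedTime then
                  table.insert( hits, { page = page, rev = rev } )
              end
          end
      end
      return hits  -- candidate edits that may have caused the failure
  end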

Such a page should probably include the state changes for the other related pages. It is highly likely that several modules will have broken due to a single change.

A special page for prioritisation

It could be interesting to have a special page for prioritising failing modules. The same holds for other similar problems. The idea is that there is some event, and if the count within some timeframe (probably the last N events) is high, the page (module) is listed at the top of a special page. Typically, a module with a lot of failures is fragile and should be fixed.

If failures on pages using the module can be attributed to the correct module, then the impact of a fragile module can be a factor, and this will help developers identify the modules that carry a high risk.

Ranking by the number of pages embedding the module will also say a lot about potential risk, and could help in this respect.
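
One possible shape for such a ranking, as a small Lua sketch; the event representation and the window of the last N events are assumptions for illustration.

  -- Rank modules by the number of 'failed' events among their last N status events.
  local function rankFragile( eventsByModule, windowSize )
      local ranking = {}
      for moduleName, events in pairs( eventsByModule ) do
          local failures = 0
          for i = math.max( 1, #events - windowSize + 1 ), #events do
              if events[i] == 'failed' then
                  failures = failures + 1
              end
          end
          table.insert( ranking, { module = moduleName, failures = failures } )
      end
      table.sort( ranking, function( a, b ) return a.failures > b.failures end )
      return ranking
  end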

Flaky tests

We will probably run into the problem of flaky tests, that is, tests that do not return the expected result, often because the test data is live in some way or another. Perhaps the most common cause is that the user does not provide proper test data but uses the live site for testing. Those tests will generate a lot of log entries, and possibly user alerts, and we need some way to identify and inhibit them.

At some point it could be important to test against hermetic instances.

Similar problems

It is likely that ideas from this project can be reused for other similar problems. It is, for example, possible to track changes in readability: if the readability drops below a threshold, the event is logged, the page is categorized, and watching users are notified.

Another area where the idea could be repurposed is failing templates. Currently the page where an instance of the template is used (the aggregate) is marked as failed, but not the template itself. By tracking the failing templates, and the number of failures, it would be possible to prioritize which templates need fixing.

Dual panel editor

It could be nice to change the present editor into a dual panel editor, with one panel for the testee and one panel for the tester code. This would make it possible to develop without saving on each iteration, and such a setup would make it possible to save only versions that are ready for w:continuous delivery. A change to do testing in a dual panel editor is pretty straightforward.

This is probably obvious for those who know the code, but the present solution uses the debug facility to create a query using a single table and a statement to be evaluated. From one page the tester lib is used as the table, and from another page the testee lib is used. Because there is no obvious way to tell each page where the other code is (i.e., it is in the text field of another tab or window), we can't simply create a lib with two tables. If both panels are within one web page, we can do this and then evaluate both the tester and the testee without saving them.