Grants:IEG/Magic expression

From Meta, a Wikimedia project coordination wiki

status: withdrawn

Individual Engagement Grants

project:

Magic expression


project contact:

pastakhov(_AT_)yandex.ru

participants:



grantees: Pavel Astakhov


summary:

Developing friendly and fast expressions of Magic Words





2014 round 1

Project idea

What is the problem you're trying to solve?

At the moment there is only one way to insert a value or a function's result, such as the time, site details, or page names, into a wiki page: Magic words. Every MediaWiki extension that wants to enhance the wikitext parser with helpful functions must use this technology.

Expressions of Magic words are used for many purposes, but they have serious limitations in three areas:

  • usability
  • unification
  • performance

What is your solution?

I propose to use the technology of Magic expression to solve these problems.

A Magic expression does the same job as an expression of Magic words, but it is easier to read, offers more capabilities and runs much faster.

My first prototype was the extension Foxway. The current implementation is the extension PhpTags.

This is not a way to embed arbitrary code in wiki pages, because the code is never executed directly. It only looks like code, for the sake of readability.

I would like to bring the successful HTML + PHP model to MediaWiki as wiki markup + Magic expression, where MediaWiki extensions are used as functions and/or objects.

Usability

A Magic expression looks like a scripting language.

Expressions of Magic words have complicated template structures. A simple example:

{{ #vardefine: r | 0 }}{{ #vardefine: i | 0 }}{{
  #while:
  | {{ #ifexpr: {{ #var: i }} < 100 | true }}
  | {{ #vardefine: i | {{ #expr: {{ #var: i }} + 1 }} }}
    {{ #vardefine: j | 0 }}
    {{
      #while:
      | {{ #ifexpr: {{ #var: j }} < 1000 | true }}
      | {{ #vardefine: j | {{ #expr: {{ #var: j }} + 1 }} }}
        {{ #vardefine: r | {{ #expr: ({{ #var: r }} + ( {{ #var: i }} * {{ #var: j }} ) % 100 ) % 47 }} }}
    }}
}}

Real examples: en:Template:Citation/core, User:Church_of_emacs/Template_love.

I also came across another example that I could not make sense of:

{{{{{{##{#}#}FODJFODJOIJ}#P#IP{{{I###ifthis}:}}}}}}{#switch}}{{{{{{}{}}}}}{}F}D}SF}}}}}SD}DF}D}FS}}}}}}}}}

If the simple example above is written in the style of a scripting language, it becomes easier to understand, write and maintain:

$r = 0;
$i = 0;
while ( $i < 100 ) {
    $i++;
    $j = 0;
    while ( $j < 1000 ) {
        $j++;
        $r = ($r + ($i * $j) % 100) % 47;
    }
}

And of course, a scripting language offers far more capabilities.

Unification

Magic expression has a single standard for parsing and passing parameters, which makes work easier for both editors and extension developers.

Extensions such as EasyTimeline, Maps, Semantic Forms, mw:Extension:Graph, QPoll and many others each use their own markup.

Again, they have complicated template structures, and in addition:

  • Users need to learn and get used to each of them
  • Developers need to invent their own markup parser for parameters
  • Integrating visual editing tools requires even greater effort

With Magic expression, users need to learn only one syntax, developers do not have to worry about parsing parameters, and the code editor serves as the visual editing tool.

In Bugzilla there is bug 54221, "Support for text/syntax/markup driven or WYSIWYG editable charts, diagrams, graphs, flowcharts etc." As an example, to make an extension Vega, a developer would only need to describe the expected parameters and pass them from Magic expression to JavaScript.

Example of how it might look for editors:

$data = [12, 23, 47, 6, 52, 19];
echo new Vega( "Arc", ['data'=>$data, 'width'=>400, 'height'=>400] );
// $vega = new Vega( "Arc" );
// $vega->data = $data;
// $vega->width = 400;
// $vega->height = 400;


Magic expression lets one extension pass data to another, which makes loose coupling possible. This gives editors full freedom and simplifies the development of extensions.

Today every extension basically does the same thing: it gets information, processes it and displays it. You can control this process only with the tools that the extension itself provides. If you want something more, the extension has to be enhanced, and this must be done separately for each extension. That is not always possible for the extension's authors, so you have to clone the extension and add the functionality to the clone. All these extensions need to be maintained, which takes a lot of resources. As a result, editors' freedom is severely limited.

Many examples could be given, and one of them is very close at hand: Grants:IEG/Rewrite Extension:QPoll to provide polls and quizes with Lua backend.

For mw:Extension:Quiz there are two other extensions, mw:Extension:QPoll and mw:Extension:QuizTabulate, that enhance its functionality. This has not been very successful.

I think each extension should do one thing and do it well. This is the Unix way, and it is the way of freedom.

Editors must be able to request data using one extension, process it in other extensions, and display it using any of the available ones. Magic expression makes this possible.

HTML forms should be displayed by an extension Form; the extension Quiz could then simply take an array of questions, prepare it and pass it to the extension Form. At the same time, editors would be free to choose where the array of questions comes from, to edit the form, and to choose any method of processing the results at their sole discretion. Quiz extension developers would not have to worry about many things that now consume their resources and keep them from concentrating on the essentials.

Example of how it might look for editors:

$quiz = new Quiz( 'Questions:Pagename' );
$form = $quiz->getForm();
$form->Option = 'some value';
echo $form;

Performance

Not everything in the current implementation is optimized yet, so the test results are not clear-cut.

Brute force test. Magic expression is 32 times faster than an expression of Magic words, and 323 times slower than pure PHP.

The two examples below perform the same calculation in the loop body, each with its own technology. The Magic words version runs fewer iterations (20 × 100 = 2 000 instead of 100 × 1 000 = 100 000), so the results are compared per loop.

Expression of Magic words source:

{{ #vardefine: r | 0 }}{{ #vardefine: i | 0 }}{{
  #while:
  | {{ #ifexpr: {{ #var: i }} < 20 | true }}
  | {{ #vardefine: i | {{ #expr: {{ #var: i }} + 1 }} }}
    {{ #vardefine: j | 0 }}
    {{
      #while:
      | {{ #ifexpr: {{ #var: j }} < 100 | true }}
      | {{ #vardefine: j | {{ #expr: {{ #var: j }} + 1 }} }}
        {{ #vardefine: r | {{ #expr: ({{ #var: r }} + ( {{ #var: i }} * {{ #var: j }} ) % 100 ) % 47 }} }}
    }}
}}

Magic expression source:

$r = 0;
$i = 0;
while ( $i < 100 ) {
    $i++;
    $j = 0;
    while ( $j < 1000 ) {
        $j++;
        $r = ($r + ($i * $j) % 100) % 47;
    }
}

Results:

Expression of Magic words did 2 000 loops in 7.423 seconds.
That is 0.003712 seconds per loop.

Magic expression did 100 000 loops in 11.643 seconds.
That is 0.000116 seconds per loop.

Pure PHP did 100 000 loops in 0.036 seconds.

Variable handling in Magic expression is not optimized yet. I expect optimizing it to double the performance.

In any case, the test shows that processing the magic words in one scope is much faster than processing them separately.

I hope that nobody uses Magic expression in this way. The test only shows the degree of optimization. For such expensive calculations you should write an extension and use it as a function in Magic expression.
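The ratios quoted for the brute force test can be rechecked from the raw timings. The following is a small sketch in plain PHP that only redoes the arithmetic; the array layout and variable names are illustrative and are not part of PhpTags.

```php
<?php
// Timings copied from the brute force test above.
$runs = [
    'Magic words'      => ['loops' => 2000,   'seconds' => 7.423],
    'Magic expression' => ['loops' => 100000, 'seconds' => 11.643],
    'Pure PHP'         => ['loops' => 100000, 'seconds' => 0.036],
];

// Time per loop iteration, so that runs of different length are comparable.
$perLoop = [];
foreach ($runs as $name => $run) {
    $perLoop[$name] = $run['seconds'] / $run['loops'];
}

$wordsVsExpression = $perLoop['Magic words'] / $perLoop['Magic expression'];
$expressionVsPhp   = $perLoop['Magic expression'] / $perLoop['Pure PHP'];
$wordsVsPhp        = $perLoop['Magic words'] / $perLoop['Pure PHP'];

printf("Magic words / Magic expression: %.0f\n", $wordsVsExpression); // about 32
printf("Magic expression / pure PHP:    %.0f\n", $expressionVsPhp);   // about 323
printf("Magic words / pure PHP:         %.0f\n", $wordsVsPhp);        // about 10 000
```

The comparison is per loop because the Magic words run covers 2 000 iterations while the other two cover 100 000.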

Big script test. Magic expression is about 10 times slower than pure PHP (8 times in the run below, counting only the runtime).

This test is closer to the real use of Magic expression. To measure performance, I took the code from the unit tests. You can see the code here.

Performance result:

Pure PHP: 0.005 sec
PhpTags time usage: 0.280 sec
          Compiler: 0.240 sec
           Runtime: 0.040 sec
PhpTags / PHP = 0.280/0.005 = 56
PhpTags\Runtime / PHP = 0.040 / 0.005 = 8

Compilation time can be neglected since the compilation will occur rarely, mostly when saving a wiki page.
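Why compilation can be rare is easiest to see with a cache in front of the compiler. The sketch below is only an illustration under assumptions: the class `CompileCache`, the in-memory array and the stand-in compiler are hypothetical and do not describe how PhpTags stores its compiled instructions.

```php
<?php
// Hypothetical cache: compiled instructions keyed by a hash of the source text.
final class CompileCache {
    /** @var array<string, array> compiled instructions by source hash */
    private $entries = [];
    /** @var callable the compiler to call on a cache miss */
    private $compiler;
    /** @var int how many times the compiler actually ran */
    public $compilations = 0;

    public function __construct(callable $compiler) {
        $this->compiler = $compiler;
    }

    public function get(string $source): array {
        $key = hash('sha256', $source);
        if (!isset($this->entries[$key])) {
            // Cache miss: compile once and remember the result.
            $this->entries[$key] = call_user_func($this->compiler, $source);
            $this->compilations++;
        }
        return $this->entries[$key];
    }
}

// Stand-in compiler (an assumption): splits the source on whitespace.
$cache = new CompileCache(function (string $source): array {
    return preg_split('/\s+/', trim($source));
});

$source = '$foo = 1 + 2; echo $foo;';
$first  = $cache->get($source); // compiles
$second = $cache->get($source); // served from the cache
echo $cache->compilations, "\n";
```

A real deployment would need a store that survives between requests; the array here only lives for one script run.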

I cannot run a similar test for Magic words, but in the first test they were about 10,000 times slower than pure PHP. So Magic expression is very many times faster than expressions of Magic words.

Compared to Lua scripting

I know that the WMF uses Lua scripting to improve performance, but in my opinion it does not solve all the problems and it adds new ones. I am not a Wikipedia editor, and to me the use of Lua scripting looks like a migration of MediaWiki from PHP to Lua.

I do not see any sense in it. To me it is reinventing the wheel and a step backwards.

Some modules simply implement functions that already exist in PHP, for example en:Module:String. Human resources are spent on writing them, and these functions are slower than the similar functions in PHP. With Magic expression this is very cheap, for example mw:Extension:PhpTags_Functions.

The rest of the modules are used like templates in which the calculations are performed quickly. Wiki pages continue to use complicated template structures, and the overall performance still depends heavily on the performance of templates (Magic words). The number of pages included in one another grows, which makes them harder to understand and slows performance.

Many modules are used on a large number of pages, so modifying them causes a high load on the server. This happens every time functionality is extended or an error is corrected: even if the change does not affect the millions of pages that use the module, they must all be rebuilt. MediaWiki extensions can be changed without putting stress on the server. And of course you can use unit tests and code review.

Unlike Lua scripting, Magic expression uses only PHP, so it does not complicate portability and adds no overhead.

Many people may mistakenly believe that Lua is faster than PHP and, since Magic expression is far slower than PHP, that this technology is not even worth their attention. But that is not so in this case.

I took the module Location_map and, based on it, wrote almost the same script using Magic expression. Performance was comparable.

How I tested:

I put the same maps on a page and looked at the limit report.

4 maps, Lua (en:User talk:Pastakhov, using Module:Location_map):

<!-- 
NewPP limit report
Parsed by mw1082
CPU time usage: 0.428 seconds
Real time usage: 0.467 seconds
Preprocessor visited node count: 384/1000000
Preprocessor generated node count: 1122/1500000
Post‐expand include size: 20066/2048000 bytes
Template argument size: 75/2048000 bytes
Highest expansion depth: 5/40
Expensive parser function count: 3/500
Lua time usage: 0.139/10.000 seconds
Lua memory usage: 1.17 MB/50 MB
-->

4 maps, Magic expression (Location_map at test.foxway.org, using Template:Location_map):

<!-- 
NewPP limit report
CPU time usage: 0.124 seconds
Real time usage: 0.130 seconds
Preprocessor visited node count: 116/1000000
Preprocessor generated node count: 400/1000000
Post‐expand include size: 600/2097152 bytes
Template argument size: 0/2097152 bytes
Highest expansion depth: 4/40
Expensive parser function count: 0/100
PhpTags time usage: 0.020 sec
          Compiler: 0.000 sec
           Runtime: 0.020 sec
-->

40 maps, Lua (en:User talk:Pastakhov/40):

<!-- 
NewPP limit report
Parsed by mw1169
CPU time usage: 2.124 seconds
Real time usage: 2.258 seconds
Preprocessor visited node count: 2548/1000000
Preprocessor generated node count: 10818/1500000
Post‐expand include size: 200510/2048000 bytes
Template argument size: 75/2048000 bytes
Highest expansion depth: 5/40
Expensive parser function count: 3/500
Lua time usage: 0.614/10.000 seconds
Lua memory usage: 1.47 MB/50 MB
-->

40 maps, Magic expression (Location_map_40 at test.foxway.org):

<!-- 
NewPP limit report
CPU time usage: 0.956 seconds
Real time usage: 0.960 seconds
Preprocessor visited node count: 1271/1000000
Preprocessor generated node count: 3956/1000000
Post‐expand include size: 6150/2097152 bytes
Template argument size: 0/2097152 bytes
Highest expansion depth: 4/40
Expensive parser function count: 0/100
PhpTags time usage: 0.272 sec
          Compiler: 0.000 sec
           Runtime: 0.272 sec
-->

Yes, there is uncertainty but it's better than nothing.

So:

                     Scribunto (Lua)          Magic expression (PhpTags)  Summary
CPU time usage       4 maps: 0.428 seconds    4 maps: 0.124 seconds       MW with PhpTags is 3.45 times faster than MW with Scribunto
                     40 maps: 2.124 seconds   40 maps: 0.956 seconds      MW with PhpTags is 2.22 times faster than MW with Scribunto
Solution time usage  4 maps: 0.139 seconds    4 maps: 0.020 seconds       PhpTags is 6.95 times faster than Scribunto
                     40 maps: 0.614 seconds   40 maps: 0.272 seconds      PhpTags is 2.26 times faster than Scribunto

Of course, the comparison was made on different servers, but I am sure that the uncertainty is not large, because the blank pages [1] and [2] have the same generation time (0.008 seconds).
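The summary figures can be re-derived from the limit reports themselves. The sketch below pulls the timings out of the report text with a regular expression; the helper name `limitReportSeconds` is hypothetical, and the two strings are shortened excerpts of the 4-map reports quoted above.

```php
<?php
// Return the first number that follows "<label>:" in a limit report, or null.
function limitReportSeconds(string $report, string $label): ?float {
    $pattern = '/^\s*' . preg_quote($label, '/') . ':\s*([0-9.]+)/m';
    return preg_match($pattern, $report, $m) === 1 ? (float)$m[1] : null;
}

// Excerpts of the two 4-map reports (Scribunto first, PhpTags second).
$scribunto = "CPU time usage: 0.428 seconds\nLua time usage: 0.139/10.000 seconds\n";
$phpTags   = "CPU time usage: 0.124 seconds\nPhpTags time usage: 0.020 sec\n";

// Whole-page ratio: MediaWiki with Scribunto against MediaWiki with PhpTags.
$cpuRatio = limitReportSeconds($scribunto, 'CPU time usage')
          / limitReportSeconds($phpTags, 'CPU time usage');

// Ratio of the time spent inside each scripting solution only.
$solutionRatio = limitReportSeconds($scribunto, 'Lua time usage')
               / limitReportSeconds($phpTags, 'PhpTags time usage');

printf("CPU time ratio:      %.2f\n", $cpuRatio);      // about 3.45
printf("Solution time ratio: %.2f\n", $solutionRatio); // about 6.95
```

The same two calls applied to the 40-map reports give the second row of each pair in the summary.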

I do not intend Magic expression for calculations of this kind. All such calculations should be performed in pure PHP, and Magic expression is intended to manage that process. This will be much faster and easier to use.

Of course, nothing forbids using Magic expression in this way. Moreover, with PHP this code could be made simpler, and thus possibly much faster and clearer; here I tried to stick to the original solution.

As an alternative, you can always move this code to pure PHP and then use it in Magic expression, for example as a function.

How Magic expression works

There are a compiler class and a runtime class. The compiler class recognizes the code of a Magic expression and translates it into a set of instructions for the runtime class: an array of simple commands such as add, multiply, check a condition and call an extension. The runtime class walks through the array and simply executes the commands.

All commands are safe and do not execute any arbitrary code.

Consider as an example this simple Magic expression (Live Demo)

$foo = cos( M_PI + 1 * 2 );
echo $foo;

The compiler class translates this into an array:

  1. 1 * 2
  2. call extension to get M_PI
  3. result of 2 + result of 1
  4. call extension to get cos( result of 3 )
  5. set $variables["foo"] result of 4
  6. return string of $variables["foo"]

The runtime class runs:

  1. 1 * 2 ( 2 )
  2. finds which extension registered a hook for the constant M_PI and calls it ( FoundExtension::onConstantHook('M_PI') returns 3.1415926535898 )
  3. 3.1415926535898 + 2 ( 5.1415926535898 )
  4. finds which extension registered a hook for the function cos and calls it ( FoundExtension::onFunctionHook('cos', 5.1415926535898 ) returns 0.41614683654714 )
  5. $variables["foo"] = 0.41614683654714
  6. return string of $variables["foo"] to wiki parser ( 0.41614683654714 )
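The six steps above can be sketched as a tiny interpreter in plain PHP. This is an illustration only: the instruction format, the `run` function and the two whitelist arrays are assumptions made for this sketch and are not the actual PhpTags compiler output or hook API.

```php
<?php
// Hypothetical instruction list for: $foo = cos( M_PI + 1 * 2 ); echo $foo;
// An operand is either a literal or ['ref' => n], meaning "result of step n".
$program = [
    1 => ['op' => 'mul',   'args' => [1, 2]],
    2 => ['op' => 'const', 'name' => 'M_PI'],
    3 => ['op' => 'add',   'args' => [['ref' => 2], ['ref' => 1]]],
    4 => ['op' => 'call',  'name' => 'cos', 'args' => [['ref' => 3]]],
    5 => ['op' => 'set',   'var' => 'foo', 'args' => [['ref' => 4]]],
    6 => ['op' => 'echo',  'var' => 'foo'],
];

// Whitelists stand in for the hooks that extensions would register.
$constants = ['M_PI' => M_PI];
$functions = ['cos' => 'cos'];

function run(array $program, array $constants, array $functions): string {
    $results = [];
    $variables = [];
    $output = '';
    // Resolve an operand: a reference to an earlier step, or a literal value.
    $value = function ($operand) use (&$results) {
        return is_array($operand) ? $results[$operand['ref']] : $operand;
    };
    foreach ($program as $step => $ins) {
        $args = array_map($value, $ins['args'] ?? []);
        switch ($ins['op']) {
            case 'mul':   $results[$step] = $args[0] * $args[1]; break;
            case 'add':   $results[$step] = $args[0] + $args[1]; break;
            case 'const': $results[$step] = $constants[$ins['name']]; break;
            case 'call':
                // Only functions present in the whitelist can ever be called.
                if (!isset($functions[$ins['name']])) {
                    throw new RuntimeException("Unknown function: {$ins['name']}");
                }
                $results[$step] = call_user_func_array($functions[$ins['name']], $args);
                break;
            case 'set':   $variables[$ins['var']] = $args[0]; break;
            case 'echo':  $output .= (string)$variables[$ins['var']]; break;
            default:
                throw new RuntimeException("Unknown command: {$ins['op']}");
        }
    }
    return $output;
}

$result = run($program, $constants, $functions);
echo $result, "\n";
```

The point of the design is visible in the `switch`: the runtime can only perform the commands it knows and call what has been registered, so the text of the wiki page is never evaluated as PHP.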

Example with a class:

$foo = new Foo( "name", ['data' => 'value'] );
$foo->property = 123;
echo $foo;

The compiler class translates this into an array:

  1. call extension to init Foo
  2. set $variables["foo"] result of 1
  3. call extension to set property "property" with 123
  4. return string of $variables['foo']

The runtime class runs:

  1. finds which extension registered a hook for the class "Foo" and calls it ( FoundExtension::onConstructorHook("Foo", arguments) returns a special class like stdClass )
  2. $variables["foo"] = stdClass
  3. finds which extension registered a hook for the class "Foo" and calls it ( FoundExtension::onPropertyHook("property", stdClass, value ) returns void )
  4. return string of $variables["foo"] to wiki parser ( stdClass::__toString() )
    1. stdClass calls the extension ( FoundExtension::onEchoHook( stdClass ) ) and returns the string.

The extension MyExtension:

$wgHooks['PhpTagsRuntimeFirstInit'][] = 'MyExtension::initializeRuntime';

class MyExtension extends PhpTags\BaseHooks {
  public static function initializeRuntime() {
    \PhpTags\Runtime::setClassHook( 'Foo', 'MyExtension' );
  }

  public static function onConstructorHook( $name, $arguments ) {
    return new PhpTags\stdClass( $name, $arguments );
  }

  public static function onPropertyHook( $name, $class, $value ) {
    return $class->setProperty( $name, $value );
  }

  public static function onEchoHook( $class ) {
    return "Hello, my data is " . $class->data . " and property is " . $class->property;
  }
}

Feel free to experiment with PhpTags in the Sandbox namespace at test.foxway.org, but keep in mind that most of the features are not implemented yet.

What I have done

mw:Extension:PhpTags:

  • Operators ( Arithmetic, Assignment, Bitwise, Comparison, Incrementing/Decrementing, Logical, String, Array )
  • Variables ( limited by page, static, global )
  • Control Structures ( if, else, elseif, while, foreach, break, continue )
  • Extensions can define static constants
  • Extensions can define functions
  • An extension that implements some native PHP functions ( Array, Math, PCRE, Variable handling Functions )

Project goals

Currently, extensions work according to this scheme: [figure]. I suggest using this one instead: [figure].

I suggest making MediaWiki extensions universal building blocks, and thus giving editors the ability to build what they want using Magic expression.

This solution does not rule out the use of Magic words; it only extends the functionality and can improve performance.

Integration of existing extensions is very simple.

Example of an infobox for the page Paris

This example shows a universal extension Infobox as a class Infobox for Magic expression. It is just one of the possible variants, and you can always hide all or part of it in a template.

$infobox = new Infobox( 'Paris' ); 
// new Infobox( [ 'title' => 'Paris' ] );
// $infobox = new Infobox(); $infobox->title = 'Paris';

$image = new Image( 
  'Paris montage2.jpg',
  ['width'=>275, 'alt'=>'Paris montage. Clicking on an image in the picture causes ...']
);

$image->addMap('rect', [0, 0, 1200, 441], '[[Le Louvre]]');
$image->addMap('rect', [0, 1260, 618, 1398], '[[Champs de Mars]]');
...

$infobox->addRow( 
  [ 
    $image,
   'Clockwise: Pyramid of the [[Louvre]],
 [[Arc de Triomphe]], Looking towards [[La Défense]],
 Skyline of Paris on the [[Seine]] river with the [[Pont des Arts]] bridge,
 and the [[Eiffel Tower]] - clickable image'
  ]
);

$infobox->addRow( 
  [ 
    new Image('Flag of Paris.svg', ['width'=>100]),
    new Image('Grandes Armes de Paris.svg', ['width'=>120]),
    "''[[Fluctuat nec mergitur]]''<br />(Latin: \"It is tossed by the waves, but does not sink\")",
    'align'=>'center'
  ]
);

...
echo $infobox;

The final implementation will depend only on the imagination of extension developers. You can combine these two options or come up with others; there are no limits to creativity.

  • No need for included pages. Magic expression is many times faster than they are.
  • Extending the functionality does not require a wasteful rebuild of a large number of pages.
  • The upper level is in one place and the lower level in another, which makes things clear
  • The upper level is available for tracing
  • The lower level is available for unit tests, code review and debugging

Once again, because many people get confused: there is no direct code execution.

Part 2: The Project Plan

Project plan

Scope

Activities

I will spend most of the time coding and gathering feedback from the community.

Month 1:

  • fix a bug associated with the array pointer
  • fix passing variables by reference to functions
  • finalize the constant declaration
  • add an execution time limit
  • add the ability to restrict access to unnecessary functions
  • add the constructs not yet implemented ( switch, for, do, Heredoc, Nowdoc ... )

After this point it will be possible to try using it in the real world.

Months 2, 3 and 4:

  • improve performance of variables
  • add String and Date/Time Functions
  • use cache for Compiler
  • develop and implement objects for Magic expression
  • make extensions for testing, for demonstration and as examples for other developers
  • experiments with the code editor
  • experiments with the tracer

After this point it will be possible to adapt existing extensions to Magic expression and to create new ones.

Months 5 and 6:

  • find and fix bugs (more phpunit tests)
  • gather feedback
  • documentation

I will spend the remaining free time on:

  • adapting existing extensions to Magic expression
  • creating extensions to retrieve data from MediaWiki and display data in different ways
  • improving the code editor
  • improving the tracer (debugger)
  • improving the performance (add callback to array functions)

Budget

Total amount requested

$30,000

Budget breakdown

I will use the funds received for software development. I am willing to work on this project for a lower amount, but in that case I cannot commit the scheduled amount of time to it and cannot guarantee the implementation of all these functions. In addition, although the code editor and the tracer (debugger) are not part of this project, I am ready to share these funds with other developers if they help with the development of the client side (in JavaScript).

Intended impact

Target audience

Community engagement

Fit with strategy

Sustainability

Measures of success

Participant(s)

Discussion

Community Notification

Please paste a link below to where the relevant communities have been notified of this proposal, and to any other relevant community discussions.

Endorsements

Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. Other feedback, questions or concerns from community members are also highly valued, but please post them on the talk page of this proposal.

  • Community member: add your name and rationale here.