Wikimedia Blog/Converting wiki pages to blog posts
From Meta, a Wikimedia project coordination wiki
How to convert a wiki page on a Wikimedia wiki into a WordPress post on the Wikimedia blog
- Open the printable version of the wiki page (usually available via the "Print/Export" link in the sidebar);
- Copy the HTML of the content you want to publish (i.e. leave out the sidebar navigation and other page furniture);
- Remove the table of contents (everything between <table id="toc" class="toc"> and </table> at the beginning);
- Pass the result through the Python cleanup script:
python blogfix.py wikipage.html > wikipageclean.html
where blogfix.py is:
#!/usr/bin/python
# Short script to take the HTML from a Wikimedia wiki and turn it into HTML suitable for posting to the Wikimedia blog
# Original author: [[user:RobLa]], modified by [[user:guillom]] and [[user:HaeB]]

import os
import sys
import re

wikidomain = 'meta.wikimedia.org'
# wikidomain = 'www.mediawiki.org'

if len(sys.argv)>1:
    for arg in sys.argv[1:]:
        with open(arg) as f:
            for line in f:
                # turn section headings into plain headings with an id;
                # the closing tag uses a backreference so that h5 headings match too
                old = r'<h([2345])><span class="editsection">\[<a href="[^\"]*" title="[^\"]*">edit</a>]</span> <span class="mw-headline" id="([^\"]*)">([^\<]*)</span></h\1>'
                new = r'<h\1 id="\2">\3</h\1>'
                m = re.sub(old, new, line.strip())

                # make local wiki links absolute
                old = r'href="/wiki'
                new = r'href="https://'+wikidomain+'/wiki'
                m = re.sub(old, new, m)

                # strip MediaWiki-specific link and headline attributes
                old = r'( rel="nofollow" class="external text"| class="external text" rel="nofollow"| rel="nofollow" class="external autonumber"| rel="nofollow" class="external free"| class="external free" rel="nofollow"| class="mw-headline"| class="extiw")'
                new = r''
                m = re.sub(old, new, m)

                old = r'( class="external text"| class="external free"| class="external autonumber")' # matches external links to Wikimedia sites without Nofollow
                new = r''
                m = re.sub(old, new, m)

                # thumbnail layout from http://bits.wikimedia.org/meta.wikimedia.org/load.php?debug=false&lang=en&modules=ext.wikihiero|mediawiki.legacy.commonPrint%2Cshared|skins.vector&only=styles&skin=vector&*
                old = r'class="thumb tright"'
                new = r'style="text-align:center;border:1px solid #ccc;margin:2px;float:right;clear:right;margin:0.5em 0 0.8em 1.4em;"'
                m = re.sub(old, new, m)

                old = r'class="thumb tleft"'
                new = r'style="text-align:center;border:1px solid #ccc;margin:2px;float:left;clear:left;margin:0.5em 1.4em 0.8em 0;"'
                m = re.sub(old, new, m)

                old = r'class="thumbinner" style="'
                new = r'style="padding: 3px !important; border: 1px solid rgb(204, 204, 204); text-align: center; overflow: hidden; font-size: 94%; background-color: white; '
                m = re.sub(old, new, m)

                old = r'class="thumbimage"'
                new = r'style="border:1px solid #ccc;"'
                m = re.sub(old, new, m)

                old = r'class="thumbcaption"'
                new = r'style="border:none;text-align:left;line-height:1.4em;padding:3px !important;font-size:94%;"'
                m = re.sub(old, new, m)

                old = r'class="magnify"'
                new = r'style="float:right;border:none !important;background:none !important;"'
                m = re.sub(old, new, m)

                # protocol-relative URLs
                old = r'href="//'
                new = r'href="http://'
                m = re.sub(old, new, m)

                old = r'src="//'
                new = r'src="http://'
                m = re.sub(old, new, m)

                # escape magic word that would generate an archive list of blog postings
                old = r'\[archives\]'
                new = r'[<!-- -->archives]'
                m = re.sub(old, new, m)

                # print(...) with a single argument behaves the same under Python 2 and 3
                print(m)
else:
    print("usage: blogfix.py foo.html > bar.html")
    print(" (where 'foo.html' is a file containing the HTML output from "+wikidomain+", and 'bar.html' is the cleaner one)")
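The manual step of removing the table of contents could also be scripted. The following is only a sketch and not part of blogfix.py: the function name strip_toc is hypothetical, and it assumes the table of contents is marked up exactly as <table id="toc" class="toc"> and contains no nested table. Because the table spans several lines, it works on the whole file rather than line by line.

```python
import re

# Matches the first TOC table, across line breaks, plus trailing whitespace.
# Assumption: the TOC itself contains no nested <table> element.
TOC_PATTERN = re.compile(r'<table id="toc" class="toc">.*?</table>\s*', re.DOTALL)


def strip_toc(html):
    """Return html with the first table of contents removed, if there is one."""
    return TOC_PATTERN.sub('', html, count=1)


if __name__ == '__main__':
    sample = (
        '<p>Intro</p>\n'
        '<table id="toc" class="toc">\n'
        '<tr><td><ul><li>1 Section</li></ul></td></tr>\n'
        '</table>\n'
        '<h2 id="Section">Section</h2>\n'
    )
    print(strip_toc(sample))
```

Only the first match is removed, so ordinary tables elsewhere in the page are left alone.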
Known issues
- Images work fine, including the link to the image description page (although one may want to modify it to point directly to Commons, rather than to the local image description page on the wiki the page was converted from). Depending on taste, one may also want to remove the "magnify" icon on thumbnails, or otherwise adapt the wiki layout for the blog. Also, while the URLs of the embedded thumbnails should be reasonably stable, the location of certain icons that MediaWiki adds automatically from bits.wikimedia.org can go stale over time (e.g. the aforementioned magnify icon: MW 1.17, MW 1.20wmf2).
- Embedding videos does not work; remove "<button ...</button>" to get a thumbnail with a link to the actual video instead.
- Section headings that include links may not be converted correctly, leaving a section edit link. Remove each '<span class="editsection">...</span>' to correct this.
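For the last issue in the list, the leftover section edit links could be stripped with a small helper instead of by hand. This is a sketch under stated assumptions, not part of blogfix.py: the name strip_editsection is hypothetical, and the pattern assumes the editsection span contains only a link and no nested span.

```python
import re

# Non-greedy match up to the first closing </span>, plus the whitespace after it.
# Assumption: the editsection span holds only "[<a ...>edit</a>]", no nested span.
EDITSECTION = re.compile(r'<span class="editsection">.*?</span>\s*')


def strip_editsection(line):
    """Remove any section edit link left over in a line of HTML."""
    return EDITSECTION.sub('', line)


if __name__ == '__main__':
    heading = (
        '<h2><span class="editsection">[<a href="/w/index.php?title=X" '
        'title="Edit section: Foo">edit</a>]</span> '
        '<span id="Foo"><a href="/wiki/Foo">Foo</a></span></h2>'
    )
    print(strip_editsection(heading))
```

The heading keeps its inner span and link; only the edit link is dropped.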
If the wiki page was a joint work by several authors (check the edit history of the wiki page for the actual list), you may consider adding a footer which lists them, e.g. in order to honor the attribution requirements of a CC-BY-SA license. Example:
<hr /><em>This article was written by Mark Bergsma, Tomasz Finc, Danese Cooper, Alolita Sharma, CT Woo, Rob Lanphier & Guillaume Paumier. See <a title="revision history" href="http://www.mediawiki.org/w/index.php?title=Wikimedia_engineering_report/2011/April&action=history">full revision history</a>. A <a href="http://www.mediawiki.org/wiki/Wikimedia_engineering_report/2011/April" title="report on mediawiki.org">wiki version</a> is also available.</em>
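A footer in this style could be generated rather than typed by hand. The sketch below assumes Python 3 (for html.escape); the function name attribution_footer and its parameters are hypothetical, and the wording simply mirrors the example above. Unlike the hand-written example, it escapes ampersands in names and URLs as &amp;.

```python
from html import escape


def attribution_footer(authors, history_url, wiki_url):
    """Build an HTML attribution footer from a list of author names."""
    if len(authors) < 2:
        names = ''.join(authors)
    else:
        # "A, B & C" style, as in the hand-written example
        names = ', '.join(authors[:-1]) + ' & ' + authors[-1]
    return (
        '<hr /><em>This article was written by {names}. '
        'See <a title="revision history" href="{history}">full revision history</a>. '
        'A <a href="{wiki}" title="wiki version">wiki version</a> '
        'is also available.</em>'
    ).format(
        names=escape(names),
        history=escape(history_url),
        wiki=escape(wiki_url),
    )


if __name__ == '__main__':
    print(attribution_footer(
        ['Author One', 'Author Two', 'Author Three'],  # placeholder names
        'https://example.org/w/index.php?title=Some_page&action=history',
        'https://example.org/wiki/Some_page',
    ))
```

The author list still has to be taken from the edit history of the wiki page; the helper only formats it.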