Talk:PDF Export

From Meta, a Wikimedia project coordination wiki

This is a really strange way to get PDFs. Here's the code, which I fixed and stripped of the hard-coded links (you no longer have to edit the code):

 $PDFFile = $_SERVER["DOCUMENT_ROOT"] . '/printouts/' . str_replace("'","_",
   str_replace(" ","_",$wgTitle->getText())) . ".pdf";
 $PDFExec = "/usr/bin/htmldoc --webpage -f " . $PDFFile;
 $addedText = "";
 
 $SaveText = $wgOut->mBodytext;
 $wgOut->mBodytext = ""; 
 
 $i = strpos($SaveText,"{");
 // strpos() returns false (not -1) when nothing is found
 while ($i !== false && $SaveText != "") {
   $j = strpos($SaveText,"}");
   if ($j === false || $j <= $i) break;
   $multi_art = explode('|',substr($SaveText, $i+1, $j-$i-1));
   if (strlen($SaveText) > $j+1)
     $SaveText = substr($SaveText, $j+1);
   else
     $SaveText = "";
   $NewBodyText = "";
   foreach ($multi_art as $one_art) {
     $wgOut->mBodytext = "";
     $art = trim($one_art);
     $addedText .= "Creating: (" . $art . ")<br>";
     $PDFTitle = Title::newFromURL( $art );
     $PDFArticle = new Article($PDFTitle);
     $PDFArticle->view();
     $bodyText = str_replace('<img src="/stylesheets/images/magnify-clip.png" width="15" height="11" alt="Enlarge" />',
                 '',
                 str_replace('<div class="editsection" style="float:right;margin-left:5px;">[',
                 '',
                 str_replace('>edit</a>]</div>',
                 '></a>',
                 $wgOut->mBodytext)));
     $NewBodyText .= "<h1>" . $art . "</h1><hr>" . str_replace('<a href="/index.php',
                     '<a href="http://' . $_SERVER["SERVER_NAME"] . ':' . $_SERVER["SERVER_PORT"] . '/index.php',
                     str_replace('<img src="/images/thumb',
                     '<img src="' . $_SERVER["DOCUMENT_ROOT"] . '/images/thumb',
                     $bodyText));


   }
   $h = fopen("/tmp/" . str_replace("'","_",str_replace(" ","_",$art)) . ".htm" ,"w");
   fwrite($h,"<html><body>");
   fwrite($h,$NewBodyText);
   fwrite($h,"</body></html>");
   fclose($h);
   $PDFExec .= " " . "/tmp/" . str_replace("'","_",str_replace(" ","_",$art)) . ".htm";
   $i = strpos($SaveText,"{");
 }
 
 exec($PDFExec, $results);
 foreach ($results as $line)
   $addedText .= $line . "<br>";
 
 $addedText .= "<br><a href='http://" . $_SERVER["SERVER_NAME"] . ':' . $_SERVER["SERVER_PORT"] . '/printouts/' .
   str_replace("'","_",str_replace(" ","_",$wgTitle->getText())) . ".pdf'>" .
   $wgTitle->getText() . ".pdf</a>";
 $wgOut->mBodytext = "";
 
 $wgArticle->view();
 $wgOut->addHTML($addedText);
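For reference, the nested str_replace() calls above only map spaces and apostrophes to underscores when building the .htm/.pdf filenames. The same transform in shell (the page title here is a made-up example):

```shell
# Sanitize a page title the way the script does: spaces and
# apostrophes both become underscores (title is a made-up example).
title="Chris's Test Page"
safe=$(printf '%s' "$title" | tr " '" "__")
echo "$safe.htm"   # -> Chris_s_Test_Page.htm
```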

No tmp directory

Are there any preconditions? Something like special caching? I get the error "fopen(/tmp/Print_Testfile1.htm): failed to open stream: No such file or directory". I don't have a tmp directory (I am using MediaWiki 1.4.0 with PHP 4.3.10 (Apache) and MySQL 4.1.9-max on Windows XP for testing). I created the directory manually, but where is the right place? When are the .htm files written into this directory? Thanks, Lukas


Change the code to be "c:/tmp/etc..." or somesuch - the coding I did was for Linux. Also make sure permissions allow the default internet user (INET_USER or something like that) to read/write to that folder. --MHart 13:37, 10 August 2005 (UTC)

I can't get this to work

I can't get this to work. The instructions are not very clear. I get a similar error to Lukas above:

PHP Warning: fclose(): supplied argument is not a valid stream resource in C:\Program Files\Apache Group\Apache2\htdocs\wiki\printarticles.php on line 262, referer: http://localhost/wiki/index.php/Pdf
The system cannot find the path specified.

I am not entirely clear how this implementation works. I don't want people to have to format their pages specially in order to print.

I would love to have a 'create as PDF' link on each and every page on my wiki. All I really need is a pure HTML page showing just the page content, similar to the 'printable' page generated by some skins, but without the CSS. That way I could use HTMLDoc quite easily. Is this possible? -Chris
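One possibility: MediaWiki can already return just the parsed page content, without the skin, via action=render on index.php. A sketch (hostname and page name are placeholders; the wget/htmldoc lines are left commented since they need those tools installed):

```shell
# Build the URL that returns bare page-content HTML (no skin/CSS);
# SERVER and PAGE are placeholder values.
SERVER="localhost"
PAGE="Main_Page"
URL="http://$SERVER/wiki/index.php?title=$PAGE&action=render"
echo "$URL"
# With wget and htmldoc available, the rest would be roughly:
#   wget -q -O "/tmp/$PAGE.htm" "$URL"
#   htmldoc --webpage -f "/tmp/$PAGE.pdf" "/tmp/$PAGE.htm"
```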

If you're using Windows

Then /tmp doesn't exist. You'll need to find the code that references /tmp and change it to your Windows temp directory, e.g. C:\tmp.
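A small sketch of picking the temp directory without hard-coding /tmp (on Windows the TMP/TEMP environment variables are normally set; the snippet itself is POSIX shell, so it only illustrates the fallback idea):

```shell
# Fall back through TMP, TEMP, then /tmp; create it if missing
# and confirm it is writable.
WIKITMP="${TMP:-${TEMP:-/tmp}}"
mkdir -p "$WIKITMP"
[ -w "$WIKITMP" ] && echo "temp dir OK: $WIKITMP"
```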

Instructions for current MediaWiki version 1.6.6

The hook line is no longer present in the current version's index.php...

1.5beta3

I can't seem to get it to work in 1.5b3. I see the following error in the Apache error logs:

sh: -c: line 1: syntax error near unexpected token `;'
sh: -c: line 1: `/usr/bin/htmldoc --webpage -f /path/to/mediawiki-1.5beta3/printouts/Some_Url.pdf /tmp/Some_Url.htm /tmp/var_tocShowText_=_"show";_var_tocHideText_=_"hide";_showTocToggle();.htm'

I also see the following on generated PrintArticles.php page:

Creating: (Some Url)
Creating: (var tocShowText = "show"; var tocHideText = "hide"; showTocToggle();)

My guess? In version 1.5b3, MediaWiki had

{var tocShowText = "show"; var tocHideText = "hide"; showTocToggle();}

somewhere in the page, but I know diddly about PHP. I created the PrintArticles.php script from 1.5b3's index.php. I then copied PrintArticles.php to my old install of MediaWiki, and it works fine there.
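That guess fits how the script finds articles: it takes the first { and the first } in the rendered body with strpos(), and the 1.5b3 skin emits an inline script containing braces before the article list. A shell re-enactment of the same "first { to first }" extraction (the body string is a made-up miniature):

```shell
# A fake page body: an inline script with braces precedes the real
# {Page_One|Page_Two} list, so the naive scan grabs the script text.
body='<script>{var tocShowText = "show";}</script>{Page_One|Page_Two}'
# Strip everything up to the first {, then from the first } onward:
first=$(printf '%s' "$body" | sed 's/^[^{]*{//; s/}.*$//')
echo "Creating: ($first)"   # -> Creating: (var tocShowText = "show";)
```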


The PrintArticles.php page is a modified, stripped-down index.php and probably needs to be redone with each new version of MediaWiki's index.php file. The error is most likely caused by the old index.php stuff that's at the top of the PrintArticles page. --MHart 13:40, 10 August 2005 (UTC)
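Separately, the sh syntax error in the 1.5b3 report also points at a quoting problem: the .htm filenames are appended to $PDFExec unquoted, so a ; or " in a generated name gets interpreted by the shell (PHP's escapeshellarg() exists for exactly this). A shell illustration with a made-up filename:

```shell
# A filename containing shell metacharacters, like the one in the
# error log above (made-up example):
f='/tmp/var_tocShowText_=_"show";_x.htm'
# Passed unquoted, the ; would split the command line; single-quoting
# the argument keeps it as one word:
sh -c "printf '%s\n' '$f'"
```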


Any plans to make this into an extension or provide steps to integrate into 1.6.x? Also, does this method support table output? --Dandavis 15:45, 26 April 2006 (UTC)


I'm using MediaWiki 1.6.3 and can't find the location the instructions are referencing. They say "Near the bottom (between the big switch/case and $wgOut->output();), add the script below", but there is no switch/case or $wgOut->output() there.

HTMLDoc not working with MediaWiki-generated pages?

I've tried using HTMLDoc in the manner described. No problems invoking the software to convert a page to PDF, but HTMLDoc itself seems to choke on the pages generated by MediaWiki (we're using v1.6.3). The error message is:

 Unable to parse HTML element on line 10.

I get the same result if I try to convert a MediaWiki page manually.

The offending line 10 is:

 <style type="text/css" media="screen,projection">/*<![CDATA[*/ @import "/mediawiki/skins/monobook/main.css?7"; /*]]>*/</style>

If I remove the line, it just stalls on a similar statement further down. Has anyone else successfully converted MediaWiki pages using HTMLDoc?
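One workaround worth trying: strip the style elements before handing the page to HTMLDoc, since the CDATA-wrapped @import lines are what it trips over. A sed sketch that handles one-line style elements only (the input here is the offending MonoBook line itself):

```shell
# Remove single-line <style>...</style> blocks; multi-line styles
# would need a more careful filter.
line='<style type="text/css" media="screen,projection">/*<![CDATA[*/ @import "/mediawiki/skins/monobook/main.css?7"; /*]]>*/</style>'
printf '%s\n' "$line" | sed 's|<style[^>]*>.*</style>||g'
```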

By the way, the HTMLDoc documentation suggests interesting ways to run the software as a CGI script (by URL or from within PHP). This would remove the need to store a temporary PDF and would serve the PDF to the user right in the browser.

Different output than the "Printable" version...

With the extension installed, it produces output that is styled differently than the default "Printable Version". Is there a special tweak to the style sheet that needs to be done to get it to appear properly? The two variations are here: Printable Version (default) PDF Export Version --156.99.249.151 20:08, 9 August 2006 (UTC)

mozilla2ps

I had great-looking results with mozilla2ps. Installing it was a bit tricky, at least for me, as I have only limited knowledge of XULRunner and Mozilla technologies in general. The resulting output is the same as what you would get by printing the page directly. I only used it externally, but I can't see a reason why it could not be integrated into MediaWiki in a similar way to HTMLDoc. --Jussi 11:41, 15 September 2006 (UTC)

PHP error

Call to a member function on a non-object in /usr/home/admin/domains/<<domain>>/public_html/mediawiki-1.6.10/PrintArticles.php on line 183

PrintArticles.php:

<?php
/**
 * Main wiki script; see docs/design.txt
 * @package MediaWiki
 */
$wgRequestTime = microtime();

# getrusage() does not exist on the Microsoft Windows platforms, catching this
if ( function_exists ( 'getrusage' ) ) {
	$wgRUstart = getrusage();
} else {
	$wgRUstart = array();
}

unset( $IP );
@ini_set( 'allow_url_fopen', 0 ); # For security...

if ( isset( $_REQUEST['GLOBALS'] ) ) {
	die( '<a href="http://www.hardened-php.net/index.76.html">$GLOBALS overwrite vulnerability</a>');
}

# Valid web server entry point, enable includes.
# Please don't move this line to includes/Defines.php. This line essentially
# defines a valid entry point. If you put it in includes/Defines.php, then
# any script that includes it becomes an entry point, thereby defeating
# its purpose.
define( 'MEDIAWIKI', true );

# Load up some global defines.
require_once( './includes/Defines.php' );

# LocalSettings.php is the per-site customization file. If it does not exist,
# the wiki installer needs to be launched or the generated file moved from
# ./config/ to ./
if( !file_exists( 'LocalSettings.php' ) ) {
	$IP = '.';
	require_once( 'includes/DefaultSettings.php' ); # used for printing the version
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>
	<head>
		<title>MediaWiki <?php echo $wgVersion ?></title>
		<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />
		<style type='text/css' media='screen, projection'>
			html, body {
				color: #000;
				background-color: #fff;
				font-family: sans-serif;
				text-align: center;
			}

			h1 {
				font-size: 150%;
			}
		</style>
	</head>
	<body>
		<img src='skins/common/images/mediawiki.png' alt='The MediaWiki logo' />

		<h1>MediaWiki <?php echo $wgVersion ?></h1>
		<div class='error'>
		<?php
		if ( file_exists( 'config/LocalSettings.php' ) ) {
			echo( 'To complete the installation, move <tt>config/LocalSettings.php</tt> to the parent directory.' );
		} else {
			echo( 'Please <a href="config/index.php" title="setup">setup the wiki</a> first.' );
		}
		?>

		</div>
	</body>
</html>
<?php
	die();
}

# Include this site's settings
require_once( './LocalSettings.php' );
# Prepare MediaWiki
require_once( 'includes/Setup.php' );

# Initialize MediaWiki base class
require_once( "includes/Wiki.php" );
$mediaWiki = new MediaWiki();

wfProfileIn( 'main-misc-setup' );
OutputPage::setEncodings(); # Not really used yet

# Query string fields
$action = $wgRequest->getVal( 'action', 'view' );
$title = $wgRequest->getVal( 'title' );

#
# Send Ajax requests to the Ajax dispatcher.
#
if ( $wgUseAjax && $action == 'ajax' ) {
	require_once( 'AjaxDispatcher.php' );

	$dispatcher = new AjaxDispatcher();
	$dispatcher->performAction();

	exit;
}

$wgTitle = $mediaWiki->checkInitialQueries( $title,$action,$wgOut, $wgRequest, $wgContLang );
if ($wgTitle == NULL) {
	unset( $wgTitle );
}

wfProfileOut( 'main-misc-setup' );

# Setting global variables in mediaWiki
$mediaWiki->setVal( 'Server', $wgServer );
$mediaWiki->setVal( 'DisableInternalSearch', $wgDisableInternalSearch );
$mediaWiki->setVal( 'action', $action );
$mediaWiki->setVal( 'SquidMaxage', $wgSquidMaxage );
$mediaWiki->setVal( 'EnableDublinCoreRdf', $wgEnableDublinCoreRdf );
$mediaWiki->setVal( 'EnableCreativeCommonsRdf', $wgEnableCreativeCommonsRdf );
$mediaWiki->setVal( 'CommandLineMode', $wgCommandLineMode );
$mediaWiki->setVal( 'UseExternalEditor', $wgUseExternalEditor );
$mediaWiki->setVal( 'DisabledActions', $wgDisabledActions );

$wgArticle = $mediaWiki->initialize ( $wgTitle, $wgOut, $wgUser, $wgRequest );
#$mediaWiki->finalCleanup ( $wgDeferredUpdateList, $wgLoadBalancer, $wgOut );

$PDFFile = $_SERVER["DOCUMENT_ROOT"] . '/printouts/' . str_replace("'","_",
       str_replace(" ","_",$wgTitle->getText())) . ".pdf";
putenv("HTMLDOC_NOCGI=1"); 
$PDFExec = "/usr/bin/htmldoc --webpage -f " . $PDFFile;
$addedText = "";

$SaveText = $wgOut->mBodytext;
$wgOut->mBodytext = "";

$i = strpos($SaveText,"{");
// strpos() returns false (not -1) when nothing is found
while ($i !== false && $SaveText != "") {
  $j = strpos($SaveText,"}");
  if ($j === false || $j <= $i) break;
  $multi_art = explode('|',substr($SaveText, $i+1, $j-$i-1));
  if (strlen($SaveText) > $j+1)
    $SaveText = substr($SaveText, $j+1);
  else
    $SaveText = "";
  $NewBodyText = "";
  foreach ($multi_art as $one_art) {
    $wgOut->mBodytext = "";
    $art = trim($one_art);
    $addedText .= "Creating: (" . $art . ")<br>";
    $PDFTitle = Title::newFromURL( $art );
    $PDFArticle = new Article($PDFTitle);
    $PDFArticle->view();
    $bodyText = str_replace('<img src="/stylesheets/images/magnify-clip.png" width="15" height="11" alt="Enlarge" />',
                '',
                str_replace('<div class="editsection" style="float:right;margin-left:5px;">[',
                '',
                str_replace('>edit</a>]</div>',
                '></a>', 
                $wgOut->mBodytext)));
    $NewBodyText .= "<h1>" . $art . "</h1><hr>" . str_replace('<a href="/index.php',
                    '<a href="http://' . $_SERVER["SERVER_NAME"] . '/index.php',
                    str_replace('<img src="/images/thumb',
                    '<img src="' . $_SERVER["DOCUMENT_ROOT"] . '/images/thumb',
                    $bodyText));
  }
  $h = fopen("/tmp/" . str_replace("'","_",str_replace(" ","_",$art)) . ".htm" ,"w");
  fwrite($h,"<html><body>");
  fwrite($h,$NewBodyText);
  fwrite($h,"</body></html>");
  fclose($h);
  $PDFExec .= " " . "/tmp/" . str_replace("'","_",str_replace(" ","_",$art)) . ".htm";
  $i = strpos($SaveText,"{");
}

exec($PDFExec, $results);
foreach ($results as $line)
  $addedText .= $line . "<br>";

$addedText .= "<br><a href='http://" . $_SERVER["SERVER_NAME"] . '/printouts/' .
              str_replace("'","_",str_replace(" ","_",$wgTitle->getText())) . ".pdf'>" . 
              $wgTitle->getText() . ".pdf</a>";
$wgOut->mBodytext = "";

$wgArticle->view();
$wgOut->addHTML($addedText);

# Not sure when $wgPostCommitUpdateList gets set, so I keep this separate from finalCleanup
$mediaWiki->doUpdates( $wgPostCommitUpdateList );

$mediaWiki->restInPeace( $wgLoadBalancer );
?>

HTMLDoc is installed.