PDF doc search II
This article is considered of unknown usefulness and may be a candidate for deletion. If you want to revive discussion regarding the subject, you may try using the talk page or start a discussion at Meta:Babel.
Introduction
I edited includes/SpecialSearch.php to make uploaded PDFs searchable. This contribution is only suitable for a small number of PDFs, because every search rescans all PDF files. When creating the PDFs, do not enable text and image compression (e.g. in Acrobat Distiller): the scripts can only find text that is stored uncompressed in the file. SpecialSearch.php starts two shell scripts, one of which uses gawk to convert the PDFs into strings. These strings can then be searched from the wiki search form. Put the scripts searchAllPDFs.sh and searchPDF.sh in your MediaWiki includes/ folder. Tested under MediaWiki 1.54 and LAMP.

See PDF doc search for the older, unmaintained code.
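To illustrate why compression matters: the search only sees text that appears in the file as plain `(text)Tj` or `[(text)]TJ` operators at the end of a line. The following is a minimal sketch for checking a PDF before uploading it. The function name `has_plain_text_ops` is hypothetical and not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original contribution):
# succeed if the file contains uncompressed text-showing operators.
has_plain_text_ops() {
    # -a treats the binary PDF as text, -q only sets the exit status.
    grep -a -q -e ')Tj$' -e ')]TJ$' "$1"
}

# Small demonstration with a hand-written content stream fragment.
sample=$(mktemp)
printf 'BT\n/F1 12 Tf\n(Hello World)Tj\nET\n' > "$sample"
if has_plain_text_ops "$sample"; then
    echo "searchable"
else
    echo "not searchable"
fi
rm -f "$sample"
```

A PDF written with compressed streams would report "not searchable" here, and the wiki search would not find its text either.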
Authors
Bernd Flunger and Armin Lanzinger User:elai
Installation
- In SpecialSearch.php, copy and paste the contribution from the section Edit SpecialSearch.php.
- In the pasted code, edit the shell_exec(...) line that calls includes/searchAllPDFs.sh: change /home/xyz/mediawiki/images to your real server path.
- Edit Special:Allmessages.
- In the MediaWiki namespace you have to add two messages, Pdfmatches and Nopdfmatches:
  http://xyz/mediawiki/index.php/MediaWiki:Nopdfmatches
  http://xyz/mediawiki/index.php/MediaWiki:Pdfmatches
  For English, enter "Matches in PDFs" in Pdfmatches and "No matches in PDFs" in Nopdfmatches.
  For German, enter "Übereinstimmungen in PDFs" in Pdfmatches and "Keine Übereinstimmungen in PDFs" in Nopdfmatches.
- User rights: the files searchAllPDFs.sh and searchPDF.sh have to be readable and executable by your Apache user.
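The user-rights step can be checked from a shell. This is a minimal sketch, assuming a hypothetical helper `check_pdf_scripts` that takes the includes/ directory as its argument. It only reports problems and does not change any permissions.

```shell
#!/bin/bash
# Hypothetical helper: report any of the two scripts that the current
# user cannot read and execute. Returns 0 only if both are usable.
check_pdf_scripts() {
    local dir=$1 status=0 script
    for script in searchAllPDFs.sh searchPDF.sh; do
        if [ ! -r "$dir/$script" ] || [ ! -x "$dir/$script" ]; then
            echo "not readable/executable: $dir/$script"
            status=1
        fi
    done
    return $status
}

# Example call (path as used in this article):
#   check_pdf_scripts /home/xyz/mediawiki/includes
```

Run it as the web server user to get a meaningful answer, for example with `sudo -u www-data bash` (the user name www-data is an assumption and differs between distributions).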
Edit SpecialSearch.php
Put the content after line 199:
(Note: the contribution starts with the ### line.)
(Note: replace www.xyz.com with your domain and replace /home/xyz/ with your MediaWiki path.)
if( $titleMatches ) {
    if( $titleMatches->numRows() ) {
        $wgOut->addWikiText( '==' . wfMsg( 'titlematches' ) . "==\n" );
        $wgOut->addHTML( $this->showMatches( $titleMatches ) );
    } else {
        $wgOut->addWikiText( '==' . wfMsg( 'notitlematches' ) . "==\n" );
    }
}

######################################
######################################
#####
##### contribution PDF doc search II
#####
$s = $term;
# escapeshellarg() keeps the search term from being interpreted by the shell.
$output = shell_exec('includes/searchAllPDFs.sh -i ' . escapeshellarg($s) . ' /home/xyz/mediawiki/images');
# One matching path per line; the trailing newline leaves an empty last entry.
$output_array = preg_split('/[\n\r]+/', $output);
$n = count($output_array);
$n = $n - 1;
$PDFi = 0;
$realn = 0;
$ausgabe_array = array();
while($PDFi<=$n) {
    $filename = basename($output_array[$PDFi]);
    $dirname = dirname($output_array[$PDFi]);
    # Skip files below directories whose path contains 'archiv' or 'temp'.
    $archive = strpos($dirname, 'archiv');
    $temp = strpos($dirname, 'temp');
    if ( ($temp===false) && ($archive===false) ) {
        $dirname = str_replace("/home/xyz/mediawiki/images", "", $dirname);
        # "Bild:" is the image namespace of a German wiki.
        $ausgabe_array[$realn] = "<li>(<a href=\"http://www.xyz.com/mediawiki/index.php/Bild:$filename\">Description</a>) <a href=\"http://www.xyz.com/mediawiki/images$dirname/$filename\">$filename</a></li>";
        $realn++;
    }
    $PDFi++;
}
# The empty last entry was counted as well, so subtract it again.
$realn--;
$PDFi = 0;
if ($realn>0) {
    $wgOut->addWikiText( '==' . wfMsg( 'pdfmatches' ) . "==\n" );
    # Escape the search term before echoing it back into the page.
    $PDFsuchabfrage = "<p></p>\n Your search term was found <b>$realn</b> times. Your search term: <b>" . htmlspecialchars($s) . "</b> ";
    $wgOut->addHTML( "$PDFsuchabfrage<br /> " );
    # Open and close the list only when there is something to list.
    $wgOut->addHTML( " <ol start='1'>" );
    while($PDFi<$realn) {
        $wgOut->addHTML( "$ausgabe_array[$PDFi]\n" );
        $PDFi++;
    }
    $wgOut->addHTML( "</ol>" );
} else {
    $wgOut->addWikiText( '==' . wfMsg( 'nopdfmatches' ) . "==\n" );
}
###############################
###############################
searchAllPDFs.sh
#!/bin/bash
#---------------------------------------------------
# Directory that contains searchPDF.sh, relative to the working directory.
bindir=includes
#---------------------------------------------------
# Optional first argument -i: ignore case (passed through to grep).
if [ "$1" = "-i" ]
then
    ic=$1
    shift
fi
#---------------------------------------------------
what=$1
if [ "$what" = "" ]
then
    echo ""
    echo "usage: $0 [-i] <what> [<where>]"
    echo "  -i      ... ignore case"
    echo "  <what>  ... what to search for"
    echo "  <where> ... where (directory) to search"
    echo ""
    exit
fi
shift
#---------------------------------------------------
#curdir=$(pwd)
dir=$1
if [ "$dir" = "" ]
then
    dir="./"
fi
#---------------------------------------------------
# Print every PDF for which searchPDF.sh exits with status 0.
# $ic is left unquoted on purpose so that it disappears when empty.
find "$dir" -name "*pdf" -exec "${bindir}/searchPDF.sh" $ic "$what" {} \; -print
#---------------------------------------------------
#cd $curdir
#---------------------------------------------------
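The script relies on `find` treating `-exec` as a test: `-print` only runs for files where the executed command exits with status 0. The sketch below shows that mechanism in isolation. It uses plain text files and `grep -q` as a stand-in for searchPDF.sh, and the function name `list_matching_files` is hypothetical.

```shell
#!/bin/bash
# Hypothetical stand-in for searchAllPDFs.sh: list the *.txt files below
# a directory whose content matches a pattern.
list_matching_files() {
    local dir=$1 pattern=$2
    # -exec acts as a test: -print is reached only when grep exits with 0.
    find "$dir" -type f -name '*.txt' -exec grep -q -- "$pattern" {} \; -print
}

# Demonstration: only a.txt contains the word "needle".
demo=$(mktemp -d)
printf 'a needle in here\n' > "$demo/a.txt"
printf 'only hay\n' > "$demo/b.txt"
list_matching_files "$demo" needle   # prints the path of a.txt only
rm -rf "$demo"
```

Because the command is started once per file, the run time grows with the number of files, which is one reason the contribution is meant for small collections.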
searchPDF.sh
#!/bin/bash
#----------------------------------------------------------
# Optional first argument -i: ignore case (passed through to grep).
if [ "$1" = "-i" ]
then
    ic=$1
    shift
fi
#----------------------------------------------------------
what=$1
file=$2
#----------------------------------------------------------
#strings $file | grep $ic $what |grep $ic -v "/$what" 1>/dev/null 2>&1
# Extract the text of the Tj and TJ operators, then grep it for the term.
gawk 'BEGIN{
    n=0
}
{
    # Line of the form "(text)Tj": strip "(" and ")Tj".
    if($0 ~ /[)]Tj$/){
        n++
        txt=substr($0,2,length($0)-4)
        print txt
        next
    }
    # Line of the form "[(te)-20(xt)]TJ": strip "[(" and ")]TJ".
    if($0 ~ /[)]\]TJ$/){
        n++
        #printf ">>>"$0"<<<"
        txt=substr($0,3,length($0)-6)
        # Placeholders protect escaped parentheses and backslashes.
        ka="@klammer-auf@"
        kz="@klammer-zu@"
        kb="@backslash@"
        gsub(/\\\(/,ka,txt)
        gsub(/\\\)/,kz,txt)
        gsub(/\\\\/,kb,txt)
        # Remove the ")...(" gaps between the string fragments.
        gsub("[)][^(]+[(]","",txt)
        gsub(ka,"(",txt)
        gsub(kz,")",txt)
        gsub(kb,"\\",txt)
        print txt
        next
    }
}
END{
    if(n>0){
        exit 0
    }else{
        exit 1
    }
}' "$file" | grep $ic -- "$what" > /dev/null
#----------------------------------------------------------
# The exit status of grep decides whether the file counts as a match.
if [ $? = 0 ]
then
    exit 0
else
    exit 1
fi
#----------------------------------------------------------
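To see what the gawk part does, it helps to feed it a few hand-written lines. The following is a simplified sketch of the same extraction that works with any POSIX awk. It deliberately omits the placeholder handling for escaped parentheses and backslashes, and the function name `extract_tj` is hypothetical.

```shell
#!/bin/bash
# Hypothetical, simplified version of the text extraction in searchPDF.sh.
# Reads PDF content stream lines on stdin and prints the contained text.
extract_tj() {
    awk '
    /[)]Tj$/ {
        # "(text)Tj": drop the leading "(" and the trailing ")Tj".
        print substr($0, 2, length($0) - 4)
        next
    }
    /[)]\]TJ$/ {
        # "[(a)-20(b)]TJ": drop "[(" and ")]TJ", then remove each ")...(" gap.
        txt = substr($0, 3, length($0) - 6)
        gsub(/[)][^(]+[(]/, "", txt)
        print txt
    }'
}

printf '(Hello World)Tj\n[(Hel)-20(lo)]TJ\nET\n' | extract_tj
# prints:
# Hello World
# Hello
```

The second input line shows why the gaps have to be removed: a word can be split into several fragments with kerning values in between, and a search for "Hello" would otherwise miss it.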