Changeset 5129


Timestamp: 09/24/09 14:05:52
Author: AndrewJones
Message: Item1363: small fixes and docs update
Location: trunk/SearchEngineKinoSearchAddOn
Files: 1 added, 4 edited

--- trunk/SearchEngineKinoSearchAddOn/data/System/KinoSearch.txt (r4915)
+++ trunk/SearchEngineKinoSearchAddOn/data/System/KinoSearch.txt (r5129)
@@ -33,5 +33,5 @@
    * ==text:kino== or just ==kino==
    * ==text:"search engine"== or just =="search engine"==
-   * ==author:MarkusHesse== — note that to search for a wiki author, use their WikiName
+   * ==author:%WIKINAME%== to search for a wiki author, use their WikiName
    * ==form:WebFormName== to get all topics with that form attached.
    * ==type:doc== to get all attachments of given type
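
The query forms in this hunk (text:, author:, form:, type:, or a bare word or quoted phrase) all follow a field:value pattern, and unprefixed terms search the topic text. The Perl sketch below splits such a query into field/value pairs to make that pattern concrete. The helper parse_query and the rule that unprefixed terms map to the text field are assumptions drawn from the documented examples. This is not the add-on's actual query handling.

```perl
use strict;
use warnings;

# Hypothetical helper: split a query such as
#   author:AndrewJones "search engine" type:doc kino
# into [field, value] pairs. Terms without a "field:" prefix are assumed
# to belong to the "text" field, mirroring ==text:kino== vs ==kino==.
sub parse_query {
    my ($query) = @_;
    my @terms;
    # Each term: optional "field:" prefix, then a "quoted phrase" or a
    # run of non-space characters. /gc continues where the last match ended.
    while ( $query =~ /\G\s*(?:(\w+):)?(?:"([^"]*)"|(\S+))/gc ) {
        my $field = defined $1 ? $1 : 'text';
        my $value = defined $2 ? $2 : $3;
        push @terms, [ $field, $value ];
    }
    return @terms;
}

for my $term ( parse_query('author:AndrewJones "search engine" type:doc kino') ) {
    printf "%-6s => %s\n", @$term;
}
```

Quoted phrases are kept as one value, so "search engine" stays a single term rather than two words.
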

--- trunk/SearchEngineKinoSearchAddOn/data/System/SearchEngineKinoSearchAddOn.txt (r4923)
+++ trunk/SearchEngineKinoSearchAddOn/data/System/SearchEngineKinoSearchAddOn.txt (r5129)
@@ -5,11 +5,11 @@
 %TOC%
 
-[[http://www.rectangular.com/kinosearch/][KinoSearch]] is a Perl implementation of Lucene search engine (implemented in Java). This is the base of this indexed search engine for Foswiki. With !KinoSearch you create an index over all webs including attachments like Word, Excel and PDF. Based on that you get a really fast search over all topics and the attachments. You need this add-on if:
-   * your wiki has grown so big, that normal search is too slow or
+[[http://www.rectangular.com/kinosearch/][KinoSearch]] is a Perl implementation of the [[http://lucene.apache.org/java/docs/][Apache Lucene]] search engine (implemented in Java). This is the implementation of this indexed search engine for Foswiki. With !KinoSearch you create an index over all webs including attachments like Word, Excel and PDF. Based on that you get a really fast search over all topics and the attachments. You need this add-on if:
+   * your wiki has grown so big, that default search is too slow or
    * you want to do search not only on the topics but also the attachments.
 
 ---++ Screenshot
 
-     <img src="%ATTACHURLPATH%/KinoSearchResult.jpg" alt="KinoSearchResult.jpg"  />
+<img src="%ATTACHURLPATH%/KinoSearchResult.jpg" alt="KinoSearchResult.jpg" />
 
 ---++ Usage
@@ -19,16 +19,18 @@
 ---+++ Searching With Kinosearch
 
-The ==kinosearch== script uses a template called ==kinosearch.tmpl== to render the results. You can override it in the same way as any other templates (i.e. create ==kinosearch.yourskin.tmpl==).
-
-There is also the *KinoSearch* topic with a form ready to use with the =kinosearch= script.
-
-If you have enabled the SearchEngineKinoSearchPlugin, you can use the rest handler either from a URL (this works only for a smaller wiki), or the command line. The syntax is identical to the =kinosearch= script.
+The =kinosearch= script uses a template called =kinosearch.tmpl= to render the results. You can override it in the same way as any other templates (i.e. create =kinosearch.yourskin.tmpl=, =Set SKIN = yourskin,pattern=).
+
+There is also the *[[KinoSearch]]* topic with a form ready to use with the =kinosearch= script.
+
+If you have enabled the SearchEngineKinoSearchPlugin, you can use the rest handler instead. The syntax is identical to the =kinosearch= script.
    * =%SCRIPTURL{rest}%/SearchEngineKinoSearchPlugin/search=
-   * =cd foswiki/bin ; ./rest SearchEngineKinoSearchPlugin.search=
+   * =cd foswiki/bin ; ./rest <nop>SearchEngineKinoSearchPlugin.search=
+
+__Note:__ Rest handlers currently require the user to be authenticated. If you want unauthenticated users to search, use the =kinosearch= script instead.
 
-The following form submits text to the =kinosearch= script. The installation instructions are detailed below.
+The following form submits a query to the =kinosearch= script. The installation instructions are detailed below.
 
 <form action="%SCRIPTURLPATH%/kinosearch%SCRIPTSUFFIX%/%INTURLENCODE{"%INCLUDINGWEB%"}%/">
-   <input type="text" name="search" size="32" class="foswikiInputField" /> <input type="submit" value="Search text" class="foswikiSubmit" /> | [[%SYSTEMWEB%.KinoSearch][Help]]
+   <input type="text" name="search" size="32" class="foswikiInputField" /> <input type="submit" value="Search text" class="foswikiSubmit" /><span class="foswikiSeparator"> | </span>[[%SYSTEMWEB%.KinoSearch][Help]]
 </form>
 
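
The form in this hunk sends its single search field to the kinosearch script, with the web as part of the path. The Perl sketch below assembles the same kind of GET URL by hand, as a rough picture of the resulting request. The base path /foswiki/bin, the empty script suffix and the web name Main are hypothetical stand-ins for %SCRIPTURLPATH%, %SCRIPTSUFFIX% and %INCLUDINGWEB%. The web name is assumed to need no encoding.

```perl
use strict;
use warnings;

# Percent-encode a value for a query string: RFC 3986 unreserved
# characters are kept, everything else becomes %XX (of its UTF-8 bytes).
sub url_encode {
    my ($s) = @_;
    utf8::encode($s);    # operate on bytes
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build the request the search form would send.
# The web name is inserted as-is (assumed to be a plain name like "Main").
sub kinosearch_url {
    my (%args) = @_;
    return sprintf '%s/kinosearch%s/%s/?search=%s',
      $args{script_url_path}, $args{script_suffix}, $args{web},
      url_encode( $args{query} );
}

my $url = kinosearch_url(
    script_url_path => '/foswiki/bin',    # stands in for %SCRIPTURLPATH%
    script_suffix   => '',                # stands in for %SCRIPTSUFFIX%
    web             => 'Main',            # stands in for %INCLUDINGWEB%
    query           => 'author:AndrewJones "search engine"',
);
print "$url\n";
```

A browser submitting the form encodes spaces as +. Standard CGI query parsing decodes %20 to the same character, so both forms reach the script as a space.
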
@@ -45,4 +47,6 @@
 
 The reason this feature is experimental, is that kinosearch does not do partial matching, so searching for =TAG= will not match text like =%TAG{"something"}%=, only instances where the word TAG is seperated by whitespace. Foswiki's SEARCH expects total partial matching.
+
+__Note:__ This currently only works for Foswiki =1.0.x=.
 
 ---+++ RSS Feeds
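
A small comparison makes the limitation described in this hunk concrete. The sketch follows only the rule stated in the text, that terms are separated by whitespace, and contrasts it with the substring matching Foswiki's SEARCH expects. It does not reproduce KinoSearch's real tokenizer. token_match and substring_match are hypothetical names.

```perl
use strict;
use warnings;

# Whole-token match, splitting on whitespace as the text above describes.
sub token_match {
    my ( $text, $word ) = @_;
    my $count = grep { $_ eq $word } split /\s+/, $text;
    return $count;
}

# Substring match anywhere in the text, as Foswiki's SEARCH expects.
sub substring_match {
    my ( $text, $word ) = @_;
    return index( $text, $word ) >= 0 ? 1 : 0;
}

for my $text ( 'Render %TAG{"something"}% here.', 'A TAG on its own.' ) {
    printf "%-33s token:%d substring:%d\n", $text,
      token_match( $text, 'TAG' ), substring_match( $text, 'TAG' );
}
```

Only the second line yields a token match. The substring search finds TAG in both.
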
@@ -66,5 +70,5 @@
 ---+++ Updating the Index
 
-The ==kinoupdate== script uses the web's ==.changes== files to know about topic modifications. Also, a ==.kinoupdate== file is used on each web directory storing the last timestamp the script was run on it. So when this script is executed, it first checks if there are any topic updates since last execution. The most recent topic updates are removed from the index and then reindexed again.
+The =kinoupdate= script uses the web's =.changes= files to know about topic modifications. Also, a =.kinoupdate= file is used on each web directory storing the last timestamp the script was run on it. So when this script is executed, it first checks if there are any topic updates since last execution. The most recent topic updates are removed from the index and then reindexed again.
    * =cd foswiki/kinosearch/bin ; ./kinoupdate=
 
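
The update strategy described in this hunk reduces to a filter. Take the change records for a web, keep the topics modified after the timestamp stored by the previous run, and reindex each of them once. Below is a minimal Perl sketch that assumes the change records are already available as [topic, epoch time] pairs. This changeset does not show the real layout of the .changes and .kinoupdate files. topics_to_reindex is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: given the timestamp of the previous run and a list
# of [topic, epoch_time] change records, return each topic changed since
# that run exactly once, in order of first appearance.
sub topics_to_reindex {
    my ( $last_run, @changes ) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
      map  { $_->[0] }
      grep { $_->[1] > $last_run } @changes;
}

# Example data; the values are invented for illustration.
my $last_run = 1_253_790_000;
my @changes  = (
    [ 'WebHome',    1_253_780_000 ],    # before the last run: skipped
    [ 'KinoSearch', 1_253_795_000 ],
    [ 'WebHome',    1_253_796_000 ],
    [ 'KinoSearch', 1_253_797_000 ],    # repeated: listed once
);

# Each topic printed would be removed from the index and indexed again.
print join( ', ', topics_to_reindex( $last_run, @changes ) ), "\n";
```
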
@@ -84,18 +88,18 @@
 
 By default, the following file types are indexed:
-  * =.txt=
-  * =.html=
-  * =.xml=
-  * =.doc=
-  * =.docx=
-  * =.xls=
-  * =.xlsx=
-  * =.ppt=
-  * =.pptx=
-  * =.pdf=
+   * =.txt=
+   * =.html=
+   * =.xml=
+   * =.doc=
+   * =.docx=
+   * =.xls=
+   * =.xlsx=
+   * =.ppt=
+   * =.pptx=
+   * =.pdf=
 
 You can change this with the =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{IndexExtensions}= setting in =configure=.
 
-If you add other file extensions, they are treated as ASCII files. If needed, you can add more specialised stringifiers for further document types ( see [[%TOPIC%#Indexing_further_document_types][Indexing further document types]]).
+If you add other file extensions, they are treated as ASCII files. If needed, you can add more specialised stringifiers for further document types (see [[%TOPIC%#Indexing_further_document_types][Indexing further document types]]).
 
 ---+++ Indexing of Form Fields
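
This hunk states two rules. Only listed extensions are indexed, and a listed extension without a specialised stringifier is read as ASCII. Both can be sketched as a lookup with a fallback. The extension list stands in for the IndexExtensions setting, and the stringifier names are placeholders, not the add-on's class names. Lower-casing the extension reflects the capital-letter-suffix problem listed under v 1.16 in the Change History. Whether the add-on handles it this way is an assumption.

```perl
use strict;
use warnings;

# Hypothetical extension list standing in for the IndexExtensions setting
# (".csv" plays the role of an extension added by an administrator).
my @index_extensions = qw(.txt .doc .pdf .ppt .docx .pptx .csv);

# Placeholder names for a few specialised stringifiers; not the add-on's
# real class names and not a complete list.
my %stringifier_for = (
    '.doc'  => 'DOC',
    '.pdf'  => 'PDF',
    '.ppt'  => 'PPT',
    '.docx' => 'DOCX',
    '.pptx' => 'PPTX',
);

# Returns the stringifier to use, 'ASCII' for a listed extension without a
# specialised stringifier, or undef if the file is not indexed at all.
sub pick_stringifier {
    my ($filename) = @_;
    my ($ext) = $filename =~ /(\.[^.]+)\z/ or return;
    $ext = lc $ext;    # "Report.PDF" is handled like "report.pdf"
    return unless grep { $_ eq $ext } @index_extensions;
    return $stringifier_for{$ext} // 'ASCII';
}

for my $file (qw(Report.PDF notes.txt export.csv image.png)) {
    my $s = pick_stringifier($file);
    printf "%-11s %s\n", $file, defined $s ? $s : 'not indexed';
}
```
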
@@ -105,5 +109,94 @@
 __Note__: With =kinoupdate= only the form fields that existed at the time the initial index was created are indexed. Thus if you add a form or if you add a new field to an existing form, you should create a new index with =kinoindex=.
 
----++ Indexing Further Document Types
+---++ Add-On Installation Instructions
+
+---+++ Backend for Indexing Word 2003 Documents
+
+To index Word 2003 Documents (=.doc=) you will need to install one of the following:
+
+   * =antiword= (recommended)
+   * =abiword=
+   * =wvWare=
+
+You can then select the tool to use in =configure=.
+
+---+++ Backend for PDF
+
+To index =.pdf= files you need to install =xpdf-utils=.
+
+---+++ Backend for PPT
+
+To index =.ppt= files you need to install =ppthtml=.
+
+---+++ Backends for DOCX, PPTX
+
+To index these file types, you will need to install the following tools from Sourceforge:
+   * [[http://sourceforge.net/projects/docx2txt/][docx2txt]] for =.docx=
+   * [[http://sourceforge.net/projects/pptx2txt/][pptx2txt]] for =.pptx=
+
+Then set the command path to these tools in =configure=.
+
+---+++ Instaling the !AddOn
+
+%$INSTALLER_INSTRUCTIONS%
+
+---+++ Configuration
+
+There are a number of settings that need to be set in =configure=.
+
+You only need to enable the SearchEngineKinoSearchPlugin if you are using the =rest= handlers, or the =%<nop>KINOSEARCH%= macro.
+
+---+++ Test of the Installation
+
+   * Test if the installation was successful:
+      * Check that =antiword=, =abiword= or =wvHtml= is in place: Type =antiword=,  =abiword= or =wvHtml= on the prompt and check that the command exists.
+      * Check that =pdftotext= is in place: Type =pdftotext= on the prompt and check that the command exists.
+      * Check that =ppthtml= is in place: Type =ppthtml= on the prompt and check that the command exists.
+      * Change the working directory to the =kinosearch/bin= Foswiki installation directory.
+      * Run =./kinoindex=
+      * Once finished, open a browser window and point it to the =[[System.KinoSearch]]= topic.
+      * Just type a query and check the results.
+
+---+++ Test of Stringification with =ks_test=
+
+Some users report problems with the stringification: The =kinoindex= scipts fails, takes too long on attachments or =kinosearch= does not yield correct results. Some times this may result from installation errors esp. of the installation of the backends for the stringification.
+
+=ks_test= give you the opportunity to test the stringification in advance.
+
+Usage: =ks_test stringify file_name=
+
+(I plan to extend ks_test, but at the moment the only possible second parameter is stringify).
+
+In the result you see, which stringifier is used and the result of the stringification.
+
+Example:
+
+<verbatim>
+/path/to/foswiki/kinosearch/bin$ ./ks_test stringify /path/to/foswiki/SearchEngineKinoSearchAddOn/test/unit/SearchEngineKinoSearchAddOn/attachement_examples/Simple_example.doc
+
+Used stringifier: Foswiki::Contrib::SearchEngineKinoSearchAddOn::StringifyPlugins::DOC_antiword
+
+Stringified text:
+
+  Simple example  Keyword: dummy  Umlaute: Grober, Uberschall, Anderung
+</verbatim>
+
+You see that the stringifier DOC_antiword is used and the resulting
+text seems to be O.K.
+
+---+++ Upgrading From the TWiki Version
+
+If you previously used the TWiki version (< 1.21) of this !AddOn (either on TWiki or on Foswiki) then you will need to move your settings from [[%LOCALSITEPREFS%]] into =configure=.
+
+Also the following settings have been renamed, for consistency:
+
+   * =$Foswiki::cfg{KinoSearchLogDir}= __-->__ =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{LogDirectory}=
+   * =$Foswiki::cfg{KinoSearchIndexDir}= __-->__ =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{IndexDirectory}=
+
+---++ Further Development
+
+There is certainly a lot more that can be done with this Add-on, such as adding more stringifiers, improving the performance and making it more robust. See Foswiki:Tasks/SearchEngineKinoSearchAddOn for currently open tasks.
+
+---+++ Indexing Further Document Types
 
 The indexing of attached documents is realised in two steps:
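
The installation test added in this hunk asks you to type each backend command at the prompt. The same check can be scripted: the Perl sketch below searches PATH for the commands the instructions name. find_in_path is a hypothetical helper. On Windows it would also need to try the .exe suffix, which it does not do.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of an executable named
# $command found in one of the PATH directories, or nothing if absent.
sub find_in_path {
    my ($command) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $command );
        return $candidate if -f $candidate && -x $candidate;
    }
    return;
}

# Backends named in the installation instructions. Only one of the Word
# backends (antiword, abiword, wvHtml) is needed; docx2txt and pptx2txt are
# configured by path in configure, so they are not looked up here.
for my $tool (qw(antiword abiword wvHtml pdftotext ppthtml)) {
    my $path = find_in_path($tool);
    printf "%-10s %s\n", $tool, defined $path ? $path : 'not found';
}
```
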
@@ -111,5 +204,5 @@
    1 this ASCII string is indexed with <nop>KinoSearch. This is the normal way in all index applications.
 
-To index different types of documents, it is necessary to have specialised stringifiers, i.e. classes to extract the ASCII text out of the document.  In this add-on, a plug-in mechanism is implemented, so that additional stringifiers can be added without changing the existing code. All stringifier plugins are stored in the directory =lib/Foswiki/Contrib/KinoSearch/StringifierPlugins=.
+To index different types of documents, it is necessary to have specialised stringifiers, i.e. classes to extract the ASCII text out of the document.  In this add-on, a plug-in mechanism is implemented, so that additional stringifiers can be added without changing the existing code. All stringifier plugins are stored in the directory =lib/Foswiki/Contrib/SearchEngineKinoSearchAddOn/StringifierPlugins=.
 
 You can add new stringifier plugins by just adding new files here. The minimum things to be implemented are:
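
The plugin requirement quoted in this part of the document is a stringForFile($filename) method that returns the text. The sketch below illustrates it with a minimal stringifier for a plain-text type. Everything apart from the method name is an assumption: the package name, the constructor, whether the method is called on an object or on the class, and how the add-on discovers and registers plugins. Those requirements fall in lines this changeset does not show.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stringifier for plain-text ".log" attachments. Only the
# stringForFile($filename) method is taken from the documentation; the
# package name, the constructor and the (unshown) registration with the
# add-on are assumptions.
{
    package My::Stringifier::LOG;

    sub new {
        my ($class) = @_;
        return bless {}, $class;
    }

    # Return the text to index for $filename, or '' if it cannot be read.
    sub stringForFile {
        my ( $self, $filename ) = @_;
        open( my $fh, '<', $filename ) or return '';
        local $/;                     # slurp the whole file
        my $text = <$fh>;
        close $fh;
        $text = '' unless defined $text;
        $text =~ s/\s+/ /g;           # collapse runs of whitespace
        $text =~ s/\A\s+|\s+\z//g;    # trim both ends
        return $text;
    }
}

# Usage: stringify a temporary file.
my ( $fh, $file ) = tempfile( SUFFIX => '.log', UNLINK => 1 );
print {$fh} "first line\n\nsecond   line\n";
close $fh;
my $text = My::Stringifier::LOG->new->stringForFile($file);
print "$text\n";
```
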
@@ -118,90 +211,5 @@
    * The plugin must implement the method =$text = stringForFile ($filename)=
 
-Then you should add to the list in =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{IndexExtensions}= in =configure=. Now the defined document type should be indexed and the new stringifier should be used.
-
-NOTE: If you just extend the list without having a special stringifier in place, this document type is treaded like an ASCII file. For binary document types, this may lead to problems (inpropper search results, long indexing times and potential indexing break downs).
-
----++ Add-On Installation Instructions
-
----+++ Backend for Indexing Word 2003 Documents
-
-Install a backend to stringify Word documents if you want to index Word documents. For this either install antiword, abiword or !wvWare.
-
-__Note:__ This add-on comes with stringifiers for all three of them. Select the right stringifier is in =configure=.
-
----+++ Backend for PDF
-
-To index =.pdf= files you need to install =xpdf-utils=.
-
----+++ Backend for PPT
-
-To index =.ppt= files you need to install =ppthtml=.
-
----+++ Backends for DOCX, PPTX
-
-To index these file types, you will need to install the following tools from Sourceforge:
-   * [[http://sourceforge.net/projects/docx2txt/][docx2txt]] for =.docx=
-   * [[http://sourceforge.net/projects/pptx2txt/][pptx2txt]] for =.pptx=
-
-Then set the command path to these tools in =configure=.
-
-_Note for Windows_: For Windows, make sure you have a C-compiler in place. This is normally part of Visual Studio.
-
----+++ Instaling the !AddOn
-
-%$INSTALLER_INSTRUCTIONS%
-
----+++ Configuration
-
-There are a number of settings that need to be set in =configure=.
-
-You only need to enable the SearchEngineKinoSearchPlugin if you are using the =rest= handlers, or the =%<nop>KINOSEARCH%= macro.
-
----+++ Test of the Installation
-
-   * Test if the installation was successful:
-      * Check that =antiword=, =abiword= or =wvHtml= is in place: Type =antiword=,  =abiword= or =wvHtml= on the prompt and check that the command exists.
-      * Check that =pdftotext= is in place: Type =pdftotext= on the prompt and check that the command exists.
-      * Check that =ppthtml= is in place: Type =ppthtml= on the prompt and check that the command exists.
-      * Change the working directory to the ==kinosearch/bin== Foswiki installation directory.
-      * Run =./kinoindex=
-      * Once finished, open a browser window and point it to the ==System/KinoSearch== topic.
-      * Just type a query and check the results.
-
----+++ Test of Stringification with =ks_test=
-
-Some users report problems with the stringification: The =kinoindex= scipts fails, takes too long on attachments or =kinosearch= does not yield correct results. Some times this may result from installation errors esp. of the installation of the backends for the stringification.
-
-=ks_test= give you the opportunity to test the stringification in advance.
-
-Usage: =ks_test stringify file_name=
-
-(I plan to extend ks_test, but at the moment the only possible second parameter is stringify).
-
-In the result you see, which stringifier is used and the result of the stringification.
-
-Example:
-
-<verbatim>
-/path/to/foswiki/kinosearch/bin$ ./ks_test stringify /path/to/foswiki/SearchEngineKinoSearchAddOn/test/unit/SearchEngineKinoSearchAddOn/attachement_examples/Simple_example.doc
-
-Used stringifier: Foswiki::Contrib::SearchEngineKinoSearchAddOn::StringifyPlugins::DOC_antiword
-
-Stringified text:
-
-  Simple example  Keyword: dummy  Umlaute: Grober, Uberschall, Anderung
-</verbatim>
-
-You see that the stringifier DOC_antiword is used and the resulting
-text seems to be O.K.
-
----+++ Upgrading From the TWiki Version
-
-If you previously used the TWiki version (< 1.21) of this !AddOn (either on TWiki or on Foswiki) then you will need to move your settings from [[%LOCALSITEPREFS%]] into =configure=.
-
-Also the following settings have been renamed, for consistency:
-
-   * =$Foswiki::cfg{KinoSearchLogDir}= => =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{LogDirectory}=
-   * =$Foswiki::cfg{KinoSearchIndexDir}= => =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{IndexDirectory}=
+All the stringifiers have unit tests associated with them, and we would encourage you to provide unit tests for any you wish to contribute. See Foswiki:Development/UnitTests for more information on unit testing.
 
 ---++ Add-On Info
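
The NOTE removed in this hunk warns that a binary type indexed through the ASCII fallback can give bad results or break indexing. One inexpensive guard is Perl's built-in -B file test, which guesses from the first block of a file whether it is binary. The sketch below applies it under that assumption. ascii_fallback is a hypothetical helper and not the add-on's own robustness check.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical guard around the ASCII fallback: read a file as text only if
# it is non-empty and Perl's -B heuristic does not classify it as binary
# (-B inspects the first block; a NUL byte or too many unusual characters
# make it return true).
sub ascii_fallback {
    my ($filename) = @_;
    return if !-s $filename;    # empty or missing: nothing to index
    return if -B $filename;     # looks binary: skip instead of indexing
    open( my $fh, '<', $filename ) or return;
    local $/;                   # slurp mode
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Usage: one plain-text file and one file full of binary bytes.
my ( $tfh, $txt ) = tempfile( UNLINK => 1 );
print {$tfh} "plain words\n";
close $tfh;

my ( $bfh, $bin ) = tempfile( UNLINK => 1 );
binmode $bfh;
print {$bfh} "\x00\x01\x02PK\x03\x04" x 64;
close $bfh;

for my $file ( $txt, $bin ) {
    my $text = ascii_fallback($file);
    print defined $text ? "indexed as text\n" : "skipped\n";
}
```
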
@@ -211,13 +219,13 @@
 -->
 
-|  Author(s): | Foswiki:Main.MarkusHesse and Foswiki:Main.SvenDowideit |
-|  Copyright:  | &copy; 2009, Foswiki:Main.MarkusHesse; &copy; 2009, Foswiki Contributors |
+|  Author(s): | Foswiki:Main.MarkusHesse, Foswiki:Main.SvenDowideit & Foswiki:Main.AndrewJones |
+|  Copyright:  | &copy; 2007, Foswiki:Main.MarkusHesse; &copy; 2009, Foswiki Contributors |
 |  Version: | %$VERSION% |
 |  Change History: | <!-- versions below in reverse order -->&nbsp; |
-|  Sep 2009:  | v 1.21 Foswikitask:Item1363 - port to Foswiki; add stringifiers for =.docx=, =.pptx= and =.xlsx=; change the =kinosearch= script to work with [[Foswiki:Development.FoswikiStandAlone][FSA]]; Moved settings into =configure=; Commands now set in =configure=; Replaced =system()= calls with =Foswiki::Sandbox->sysCommand()=; updated and simplified docs; Foswikitask:Item8246 - fix checking of access controls -- Foswiki:Main.AndrewJones, Foswiki:Main.WillNorris |
+|  24 Sep 2009:  | v 1.21, Foswikitask:Item1363: port to Foswiki -- Foswiki:Main.WillNorris. add stringifiers for =.docx=, =.pptx= and =.xlsx=; change the =kinosearch= script to work with [[Foswiki:Development.FoswikiStandAlone][FSA]]; Moved settings into =configure=; Commands now set in =configure=; Replaced =system()= calls with =Foswiki::Sandbox->sysCommand()=; Handle passworded MS Office files; Make the index more robust if it somehow encounters binary files; Can now specify skipped topics; updated and simplified docs; added doc for TipsContrib; update templates; Foswikitask:Item8246: fix checking of access controls -- Foswiki:Main.AndrewJones |
 |  06 Nov 2008:  | v 1.20, minor revert to stop crash |
 |  05 Nov 2008:  | v 1.19, fixes for (nex)twiki/trunk |
-|  20 Aug 2008:  | v 1.18, added Integrated SEARCH, SearchEngineKinoSearchPlugin, restHandlers, updated code and tests -- TWiki:Main.SvenDowideit |
-|  6 Aug 2008:   | v 1.17, TWikibug:Item5717: persist use form choices, TWikibug:Item5647: cope better with attachment problems -- TWiki:Main.SvenDowideit |
+|  20 Aug 2008:  | v 1.18, added Integrated SEARCH, SearchEngineKinoSearchPlugin, restHandlers, updated code and tests -- Foswiki:Main.SvenDowideit |
+|  6 Aug 2008:   | v 1.17, TWikibug:Item5717: persist use form choices, TWikibug:Item5647: cope better with attachment problems -- Foswiki:Main.SvenDowideit |
 |  4 Jun 2008:   | v 1.16, TWikibug:Item5646: Problem with attachments with capital letter suffix |
 |  12 May 2008:  | v 1.15, TWikibug:Item5579, TWikibug:Item5580, TWikibug:Item5619: Problem with ALLOWWEBVIEW and Forms fixed |
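
The v 1.21 entry in this hunk notes that system() calls were replaced with Foswiki::Sandbox->sysCommand(). The underlying concern is running converters such as pdftotext or antiword on attachment names without letting a shell interpret those names. The sketch below shows that idea with Perl's list-form pipe open. It is not the Foswiki::Sandbox API, and run_capture is a hypothetical name. The list form of a pipe open may also be unavailable on some Windows builds of Perl.

```perl
use strict;
use warnings;

# Run a program with an explicit argument list and capture its output.
# With the list form of a pipe open no shell is involved, so characters in
# file names (spaces, quotes, ";") reach the program unchanged.
sub run_capture {
    my ( $program, @args ) = @_;
    open( my $out, '-|', $program, @args )
      or die "cannot run $program: $!";
    local $/;                   # read all output at once
    my $output = <$out>;
    close $out;                 # waits for the child and sets $?
    $output = '' unless defined $output;
    return ( $output, $? >> 8 );
}

# Usage: perl itself ($^X) stands in for a converter such as pdftotext and
# simply echoes its argument back.
my $name = 'Q3 report; "final".pdf';
my ( $text, $exit ) = run_capture( $^X, '-e', 'print $ARGV[0]', $name );
print "exit=$exit output=$text\n";
```

The file name comes back byte for byte, including the semicolon and quotes, because no shell ever parses it.
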

--- trunk/SearchEngineKinoSearchAddOn/lib/Foswiki/Contrib/SearchEngineKinoSearchAddOn/Index.pm (r5116)
+++ trunk/SearchEngineKinoSearchAddOn/lib/Foswiki/Contrib/SearchEngineKinoSearchAddOn/Index.pm (r5129)
@@ -496,5 +496,5 @@
             foreach my $field (@fields) {
                 my $name = $field->{"name"};
-                if ( defined($fldNames{$name}) &&  $fldNames{$name}) {
+                if ( %fldNames &&  $fldNames{$name}) {
                     my $value = $field->{"value"};
                     next if (!defined($value)); #field not there.
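
Taken on its own, each condition in this hunk selects the same fields as the other. Looking up a missing key yields undef, which is already false, so the defined() guard was redundant. %fldNames is false only when the hash is empty, and in that case every lookup is false as well. The Perl sketch below checks both forms over a few cases. old_check and new_check are hypothetical wrappers. The rest of Index.pm is not shown, so this hunk alone cannot tell whether the surrounding code relies on the change for some other reason.

```perl
use strict;
use warnings;

# The two versions of the field test from Index.pm, wrapped as hypothetical
# helpers that take the field-name hash by reference.
sub old_check {
    my ( $fldNames, $name ) = @_;
    return ( defined( $fldNames->{$name} ) && $fldNames->{$name} ) ? 1 : 0;
}

sub new_check {
    my ( $fldNames, $name ) = @_;
    return ( %$fldNames && $fldNames->{$name} ) ? 1 : 0;
}

my @cases = (
    [ {},                              'Status' ],    # empty hash
    [ { Status => 1 },                 'Status' ],    # field selected
    [ { Status => 0 },                 'Status' ],    # present but false
    [ { Status => 1 },                 'Owner' ],     # not listed
    [ { Status => 1, Owner => undef }, 'Owner' ],     # listed as undef
);

for my $case (@cases) {
    my ( $fields, $name ) = @$case;
    printf "%-6s old=%d new=%d\n", $name,
      old_check( $fields, $name ), new_check( $fields, $name );
}
```
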

--- trunk/SearchEngineKinoSearchAddOn/lib/Foswiki/Contrib/SearchEngineKinoSearchAddOn/MANIFEST (r4913)
+++ trunk/SearchEngineKinoSearchAddOn/lib/Foswiki/Contrib/SearchEngineKinoSearchAddOn/MANIFEST (r5129)
@@ -3,4 +3,5 @@
 data/System/SearchEngineKinoSearchAddOn.txt 0664 Documentation
 data/System/KinoSearch.txt 0664 End user documentation
+data/System/TipTopicForKinoSearch.txt 0664 Topic for TipsContrib
 pub/System/SearchEngineKinoSearchAddOn/KinoSearchResult.jpg 0664 Screenshot
 pub/System/SearchEngineKinoSearchAddOn/KinoSEARCH.jpg 0664 Screenshot
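
Each MANIFEST line in this hunk pairs a file path with an octal permission mode and a free-text description. Below is a minimal Perl sketch that splits such lines. The three-field layout is inferred only from the lines shown here. parse_manifest_line is a hypothetical name and is not part of the Foswiki build tools.

```perl
use strict;
use warnings;

# Parse MANIFEST lines of the form "path mode description...", as seen in
# the hunk above. The layout is inferred from those lines only.
sub parse_manifest_line {
    my ($line) = @_;
    my ( $path, $mode, $description ) = split ' ', $line, 3;
    return unless defined $mode && $mode =~ /\A0[0-7]{3}\z/;
    return {
        path        => $path,
        mode        => oct $mode,              # "0664" -> numeric mode
        description => $description // '',
    };
}

my $entry = parse_manifest_line(
    'data/System/TipTopicForKinoSearch.txt 0664 Topic for TipsContrib');
printf "%s mode %o: %s\n", @{$entry}{qw(path mode description)};
```
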