Changeset 5129
- Timestamp: 09/24/09 14:05:52 (3 years ago)
- Location: trunk/SearchEngineKinoSearchAddOn
- Files: 1 added, 4 edited
  - data/System/KinoSearch.txt (modified) (1 diff)
  - data/System/SearchEngineKinoSearchAddOn.txt (modified) (9 diffs)
  - data/System/TipTopicForKinoSearch.txt (added)
  - lib/Foswiki/Contrib/SearchEngineKinoSearchAddOn/Index.pm (modified) (1 diff)
  - lib/Foswiki/Contrib/SearchEngineKinoSearchAddOn/MANIFEST (modified) (1 diff)
trunk/SearchEngineKinoSearchAddOn/data/System/KinoSearch.txt
r4915 r5129
33 33     * ==text:kino== or just ==kino==
34 34     * ==text:"search engine"== or just =="search engine"==
- 35     * ==author:MarkusHesse== — note that to search for a wiki author, use their WikiName
+ 35     * ==author:%WIKINAME%== to search for a wiki author, use their WikiName
36 36     * ==form:WebFormName== to get all topics with that form attached.
37 37     * ==type:doc== to get all attachments of given type
trunk/SearchEngineKinoSearchAddOn/data/System/SearchEngineKinoSearchAddOn.txt
r4923 r5129
5 5  %TOC%
6 6
- 7  [[http://www.rectangular.com/kinosearch/][KinoSearch]] is a Perl implementation of Lucene search engine (implemented in Java). This is the base of this indexed search engine for Foswiki. With !KinoSearch you create an index over all webs including attachments like Word, Excel and PDF. Based on that you get a really fast search over all topics and the attachments. You need this add-on if:
- 8     * your wiki has grown so big, that normal search is too slow or
+ 7  [[http://www.rectangular.com/kinosearch/][KinoSearch]] is a Perl implementation of the [[http://lucene.apache.org/java/docs/][Apache Lucene]] search engine (implemented in Java). This is the implementation of this indexed search engine for Foswiki. With !KinoSearch you create an index over all webs including attachments like Word, Excel and PDF. Based on that you get a really fast search over all topics and the attachments. You need this add-on if:
+ 8     * your wiki has grown so big, that default search is too slow or
9 9     * you want to do search not only on the topics but also the attachments.
10 10
11 11  ---++ Screenshot
12 12
- 13  <img src="%ATTACHURLPATH%/KinoSearchResult.jpg" alt="KinoSearchResult.jpg"/>
+ 13  <img src="%ATTACHURLPATH%/KinoSearchResult.jpg" alt="KinoSearchResult.jpg" />
14 14
15 15  ---++ Usage
… …
19 19  ---+++ Searching With Kinosearch
20 20
- 21  The ==kinosearch== script uses a template called ==kinosearch.tmpl== to render the results. You can override it in the same way as any other templates (i.e. create ==kinosearch.yourskin.tmpl==).
- 22
- 23  There is also the *KinoSearch* topic with a form ready to use with the =kinosearch= script.
- 24
- 25  If you have enabled the SearchEngineKinoSearchPlugin, you can use the rest handler either from a URL (this works only for a smaller wiki), or the command line. The syntax is identical to the =kinosearch= script.
+ 21  The =kinosearch= script uses a template called =kinosearch.tmpl= to render the results. You can override it in the same way as any other templates (i.e. create =kinosearch.yourskin.tmpl=, =Set SKIN = yourskin,pattern=).
+ 22
+ 23  There is also the *[[KinoSearch]]* topic with a form ready to use with the =kinosearch= script.
+ 24
+ 25  If you have enabled the SearchEngineKinoSearchPlugin, you can use the rest handler instead. The syntax is identical to the =kinosearch= script.
26 26     * =%SCRIPTURL{rest}%/SearchEngineKinoSearchPlugin/search=
- 27     * =cd foswiki/bin ; ./rest SearchEngineKinoSearchPlugin.search=
+ 27     * =cd foswiki/bin ; ./rest <nop>SearchEngineKinoSearchPlugin.search=
+ 28
+ 29  __Note:__ Rest handlers currently require the user to be authenticated. If you want unauthenticated users to search, use the =kinosearch= script instead.
28 30
- 29  The following form submits text to the =kinosearch= script. The installation instructions are detailed below.
+ 31  The following form submits a query to the =kinosearch= script. The installation instructions are detailed below.
30 32
31 33  <form action="%SCRIPTURLPATH%/kinosearch%SCRIPTSUFFIX%/%INTURLENCODE{"%INCLUDINGWEB%"}%/">
- 32  <input type="text" name="search" size="32" class="foswikiInputField" /> <input type="submit" value="Search text" class="foswikiSubmit" /> | [[%SYSTEMWEB%.KinoSearch][Help]]
+ 34  <input type="text" name="search" size="32" class="foswikiInputField" /> <input type="submit" value="Search text" class="foswikiSubmit" /><span class="foswikiSeparator"> | </span>[[%SYSTEMWEB%.KinoSearch][Help]]
33 35  </form>
34 36
… …
45 47
46 48  The reason this feature is experimental, is that kinosearch does not do partial matching, so searching for =TAG= will not match text like =%TAG{"something"}%=, only instances where the word TAG is seperated by whitespace. Foswiki's SEARCH expects total partial matching.
+ 49
+ 50  __Note:__ This currently only works for Foswiki =1.0.x=.
47 51
48 52  ---+++ RSS Feeds
… …
66 70  ---+++ Updating the Index
67 71
- 68  The ==kinoupdate== script uses the web's ==.changes== files to know about topic modifications. Also, a ==.kinoupdate== file is used on each web directory storing the last timestamp the script was run on it. So when this script is executed, it first checks if there are any topic updates since last execution. The most recent topic updates are removed from the index and then reindexed again.
+ 72  The =kinoupdate= script uses the web's =.changes= files to know about topic modifications. Also, a =.kinoupdate= file is used on each web directory storing the last timestamp the script was run on it. So when this script is executed, it first checks if there are any topic updates since last execution. The most recent topic updates are removed from the index and then reindexed again.
69 73     * =cd foswiki/kinosearch/bin ; ./kinoupdate=
70 74
… …
84 88
85 89  By default, the following file types are indexed:
- 86     * =.txt=
- 87     * =.html=
- 88     * =.xml=
- 89     * =.doc=
- 90     * =.docx=
- 91     * =.xls=
- 92     * =.xlsx=
- 93     * =.ppt=
- 94     * =.pptx=
- 95     * =.pdf=
+ 90     * =.txt=
+ 91     * =.html=
+ 92     * =.xml=
+ 93     * =.doc=
+ 94     * =.docx=
+ 95     * =.xls=
+ 96     * =.xlsx=
+ 97     * =.ppt=
+ 98     * =.pptx=
+ 99     * =.pdf=
96 100
97 101  You can change this with the =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{IndexExtensions}= setting in =configure=.
98 102
- 99  If you add other file extensions, they are treated as ASCII files. If needed, you can add more specialised stringifiers for further document types ( see [[%TOPIC%#Indexing_further_document_types][Indexing further document types]]).
+ 103  If you add other file extensions, they are treated as ASCII files. If needed, you can add more specialised stringifiers for further document types (see [[%TOPIC%#Indexing_further_document_types][Indexing further document types]]).
100 104
101 105  ---+++ Indexing of Form Fields
… …
105 109  __Note__: With =kinoupdate= only the form fields that existed at the time the initial index was created are indexed. Thus if you add a form or if you add a new field to an existing form, you should create a new index with =kinoindex=.
106 110
- 107  ---++ Indexing Further Document Types
+ 111  ---++ Add-On Installation Instructions
+ 112
+ 113  ---+++ Backend for Indexing Word 2003 Documents
+ 114
+ 115  To index Word 2003 Documents (=.doc=) you will need to install one of the following:
+ 116
+ 117     * =antiword= (recommended)
+ 118     * =abiword=
+ 119     * =wvWare=
+ 120
+ 121  You can then select the tool to use in =configure=.
+ 122
+ 123  ---+++ Backend for PDF
+ 124
+ 125  To index =.pdf= files you need to install =xpdf-utils=.
+ 126
+ 127  ---+++ Backend for PPT
+ 128
+ 129  To index =.ppt= files you need to install =ppthtml=.
+ 130
+ 131  ---+++ Backends for DOCX, PPTX
+ 132
+ 133  To index these file types, you will need to install the following tools from Sourceforge:
+ 134     * [[http://sourceforge.net/projects/docx2txt/][docx2txt]] for =.docx=
+ 135     * [[http://sourceforge.net/projects/pptx2txt/][pptx2txt]] for =.pptx=
+ 136
+ 137  Then set the command path to these tools in =configure=.
+ 138
+ 139  ---+++ Instaling the !AddOn
+ 140
+ 141  %$INSTALLER_INSTRUCTIONS%
+ 142
+ 143  ---+++ Configuration
+ 144
+ 145  There are a number of settings that need to be set in =configure=.
+ 146
+ 147  You only need to enable the SearchEngineKinoSearchPlugin if you are using the =rest= handlers, or the =%<nop>KINOSEARCH%= macro.
+ 148
+ 149  ---+++ Test of the Installation
+ 150
+ 151     * Test if the installation was successful:
+ 152     * Check that =antiword=, =abiword= or =wvHtml= is in place: Type =antiword=, =abiword= or =wvHtml= on the prompt and check that the command exists.
+ 153     * Check that =pdftotext= is in place: Type =pdftotext= on the prompt and check that the command exists.
+ 154     * Check that =ppthtml= is in place: Type =ppthtml= on the prompt and check that the command exists.
+ 155     * Change the working directory to the =kinosearch/bin= Foswiki installation directory.
+ 156     * Run =./kinoindex=
+ 157     * Once finished, open a browser window and point it to the =[[System.KinoSearch]]= topic.
+ 158     * Just type a query and check the results.
+ 159
+ 160  ---+++ Test of Stringification with =ks_test=
+ 161
+ 162  Some users report problems with the stringification: The =kinoindex= scipts fails, takes too long on attachments or =kinosearch= does not yield correct results. Some times this may result from installation errors esp. of the installation of the backends for the stringification.
+ 163
+ 164  =ks_test= give you the opportunity to test the stringification in advance.
+ 165
+ 166  Usage: =ks_test stringify file_name=
+ 167
+ 168  (I plan to extend ks_test, but at the moment the only possible second parameter is stringify).
+ 169
+ 170  In the result you see, which stringifier is used and the result of the stringification.
+ 171
+ 172  Example:
+ 173
+ 174  <verbatim>
+ 175  /path/to/foswiki/kinosearch/bin$ ./ks_test stringify /path/to/foswiki/SearchEngineKinoSearchAddOn/test/unit/SearchEngineKinoSearchAddOn/attachement_examples/Simple_example.doc
+ 176
+ 177  Used stringifier: Foswiki::Contrib::SearchEngineKinoSearchAddOn::StringifyPlugins::DOC_antiword
+ 178
+ 179  Stringified text:
+ 180
+ 181  Simple example Keyword: dummy Umlaute: Grober, Uberschall, Anderung
+ 182  </verbatim>
+ 183
+ 184  You see that the stringifier DOC_antiword is used and the resulting
+ 185  text seems to be O.K.
+ 186
+ 187  ---+++ Upgrading From the TWiki Version
+ 188
+ 189  If you previously used the TWiki version (< 1.21) of this !AddOn (either on TWiki or on Foswiki) then you will need to move your settings from [[%LOCALSITEPREFS%]] into =configure=.
+ 190
+ 191  Also the following settings have been renamed, for consistency:
+ 192
+ 193     * =$Foswiki::cfg{KinoSearchLogDir}= __-->__ =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{LogDirectory}=
+ 194     * =$Foswiki::cfg{KinoSearchIndexDir}= __-->__ =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{IndexDirectory}=
+ 195
+ 196  ---++ Further Development
+ 197
+ 198  There is certainly a lot more that can be done with this Add-on, such as adding more stringifiers, improving the performance and making it more robust. See Foswiki:Tasks/SearchEngineKinoSearchAddOn for currently open tasks.
+ 199
+ 200  ---+++ Indexing Further Document Types
108 201
109 202  The indexing of attached documents is realised in two steps:
… …
111 204     1 this ASCII string is indexed with <nop>KinoSearch. This is the normal way in all index applications.
112 205
- 113  To index different types of documents, it is necessary to have specialised stringifiers, i.e. classes to extract the ASCII text out of the document. In this add-on, a plug-in mechanism is implemented, so that additional stringifiers can be added without changing the existing code. All stringifier plugins are stored in the directory =lib/Foswiki/Contrib/KinoSearch/StringifierPlugins=.
+ 206  To index different types of documents, it is necessary to have specialised stringifiers, i.e. classes to extract the ASCII text out of the document. In this add-on, a plug-in mechanism is implemented, so that additional stringifiers can be added without changing the existing code. All stringifier plugins are stored in the directory =lib/Foswiki/Contrib/SearchEngineKinoSearchAddOn/StringifierPlugins=.
114 207
115 208  You can add new stringifier plugins by just adding new files here. The minimum things to be implemented are:
… …
118 211     * The plugin must implement the method =$text = stringForFile ($filename)=
119 212
- 120  Then you should add to the list in =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{IndexExtensions}= in =configure=. Now the defined document type should be indexed and the new stringifier should be used.
- 121
- 122  NOTE: If you just extend the list without having a special stringifier in place, this document type is treaded like an ASCII file. For binary document types, this may lead to problems (inpropper search results, long indexing times and potential indexing break downs).
- 123
- 124  ---++ Add-On Installation Instructions
- 125
- 126  ---+++ Backend for Indexing Word 2003 Documents
- 127
- 128  Install a backend to stringify Word documents if you want to index Word documents. For this either install antiword, abiword or !wvWare.
- 129
- 130  __Note:__ This add-on comes with stringifiers for all three of them. Select the right stringifier is in =configure=.
- 131
- 132  ---+++ Backend for PDF
- 133
- 134  To index =.pdf= files you need to install =xpdf-utils=.
- 135
- 136  ---+++ Backend for PPT
- 137
- 138  To index =.ppt= files you need to install =ppthtml=.
- 139
- 140  ---+++ Backends for DOCX, PPTX
- 141
- 142  To index these file types, you will need to install the following tools from Sourceforge:
- 143     * [[http://sourceforge.net/projects/docx2txt/][docx2txt]] for =.docx=
- 144     * [[http://sourceforge.net/projects/pptx2txt/][pptx2txt]] for =.pptx=
- 145
- 146  Then set the command path to these tools in =configure=.
- 147
- 148  _Note for Windows_: For Windows, make sure you have a C-compiler in place. This is normally part of Visual Studio.
- 149
- 150  ---+++ Instaling the !AddOn
- 151
- 152  %$INSTALLER_INSTRUCTIONS%
- 153
- 154  ---+++ Configuration
- 155
- 156  There are a number of settings that need to be set in =configure=.
- 157
- 158  You only need to enable the SearchEngineKinoSearchPlugin if you are using the =rest= handlers, or the =%<nop>KINOSEARCH%= macro.
- 159
- 160  ---+++ Test of the Installation
- 161
- 162     * Test if the installation was successful:
- 163     * Check that =antiword=, =abiword= or =wvHtml= is in place: Type =antiword=, =abiword= or =wvHtml= on the prompt and check that the command exists.
- 164     * Check that =pdftotext= is in place: Type =pdftotext= on the prompt and check that the command exists.
- 165     * Check that =ppthtml= is in place: Type =ppthtml= on the prompt and check that the command exists.
- 166     * Change the working directory to the ==kinosearch/bin== Foswiki installation directory.
- 167     * Run =./kinoindex=
- 168     * Once finished, open a browser window and point it to the ==System/KinoSearch== topic.
- 169     * Just type a query and check the results.
- 170
- 171  ---+++ Test of Stringification with =ks_test=
- 172
- 173  Some users report problems with the stringification: The =kinoindex= scipts fails, takes too long on attachments or =kinosearch= does not yield correct results. Some times this may result from installation errors esp. of the installation of the backends for the stringification.
- 174
- 175  =ks_test= give you the opportunity to test the stringification in advance.
- 176
- 177  Usage: =ks_test stringify file_name=
- 178
- 179  (I plan to extend ks_test, but at the moment the only possible second parameter is stringify).
- 180
- 181  In the result you see, which stringifier is used and the result of the stringification.
- 182
- 183  Example:
- 184
- 185  <verbatim>
- 186  /path/to/foswiki/kinosearch/bin$ ./ks_test stringify /path/to/foswiki/SearchEngineKinoSearchAddOn/test/unit/SearchEngineKinoSearchAddOn/attachement_examples/Simple_example.doc
- 187
- 188  Used stringifier: Foswiki::Contrib::SearchEngineKinoSearchAddOn::StringifyPlugins::DOC_antiword
- 189
- 190  Stringified text:
- 191
- 192  Simple example Keyword: dummy Umlaute: Grober, Uberschall, Anderung
- 193  </verbatim>
- 194
- 195  You see that the stringifier DOC_antiword is used and the resulting
- 196  text seems to be O.K.
- 197
- 198  ---+++ Upgrading From the TWiki Version
- 199
- 200  If you previously used the TWiki version (< 1.21) of this !AddOn (either on TWiki or on Foswiki) then you will need to move your settings from [[%LOCALSITEPREFS%]] into =configure=.
- 201
- 202  Also the following settings have been renamed, for consistency:
- 203
- 204     * =$Foswiki::cfg{KinoSearchLogDir}= => =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{LogDirectory}=
- 205     * =$Foswiki::cfg{KinoSearchIndexDir}= => =$Foswiki::cfg{SearchEngineKinoSearchAddOn}{IndexDirectory}=
+ 213  All the stringifiers have unit tests associated with them, and we would encourage you to provide unit tests for any you wish to contribute. See Foswiki:Development/UnitTests for more information on unit testing.
206 214
207 215  ---++ Add-On Info
… …
211 219  -->
212 220
- 213  | Author(s): | Foswiki:Main.MarkusHesse and Foswiki:Main.SvenDowideit |
- 214  | Copyright: | © 2009, Foswiki:Main.MarkusHesse; © 2009, Foswiki Contributors |
+ 221  | Author(s): | Foswiki:Main.MarkusHesse, Foswiki:Main.SvenDowideit & Foswiki:Main.AndrewJones |
+ 222  | Copyright: | © 2007, Foswiki:Main.MarkusHesse; © 2009, Foswiki Contributors |
215 223  | Version: | %$VERSION% |
216 224  | Change History: | <!-- versions below in reverse order --> |
- 217  | Sep 2009: | v 1.21 Foswikitask:Item1363 - port to Foswiki; add stringifiers for =.docx=, =.pptx= and =.xlsx=; change the =kinosearch= script to work with [[Foswiki:Development.FoswikiStandAlone][FSA]]; Moved settings into =configure=; Commands now set in =configure=; Replaced =system()= calls with =Foswiki::Sandbox->sysCommand()=; updated and simplified docs; Foswikitask:Item8246 - fix checking of access controls -- Foswiki:Main.AndrewJones, Foswiki:Main.WillNorris |
+ 225  | 24 Sep 2009: | v 1.21, Foswikitask:Item1363: port to Foswiki -- Foswiki:Main.WillNorris. add stringifiers for =.docx=, =.pptx= and =.xlsx=; change the =kinosearch= script to work with [[Foswiki:Development.FoswikiStandAlone][FSA]]; Moved settings into =configure=; Commands now set in =configure=; Replaced =system()= calls with =Foswiki::Sandbox->sysCommand()=; Handle passworded MS Office files; Make the index more robust if it somehow encounters binary files; Can now specify skipped topics; updated and simplified docs; added doc for TipsContrib; update templates; Foswikitask:Item8246: fix checking of access controls -- Foswiki:Main.AndrewJones |
218 226  | 06 Nov 2008: | v 1.20, minor revert to stop crash |
219 227  | 05 Nov 2008: | v 1.19, fixes for (nex)twiki/trunk |
- 220  | 20 Aug 2008: | v 1.18, added Integrated SEARCH, SearchEngineKinoSearchPlugin, restHandlers, updated code and tests -- TWiki:Main.SvenDowideit |
- 221  | 6 Aug 2008: | v 1.17, TWikibug:Item5717: persist use form choices, TWikibug:Item5647: cope better with attachment problems -- TWiki:Main.SvenDowideit |
+ 228  | 20 Aug 2008: | v 1.18, added Integrated SEARCH, SearchEngineKinoSearchPlugin, restHandlers, updated code and tests -- Foswiki:Main.SvenDowideit |
+ 229  | 6 Aug 2008: | v 1.17, TWikibug:Item5717: persist use form choices, TWikibug:Item5647: cope better with attachment problems -- Foswiki:Main.SvenDowideit |
222 230  | 4 Jun 2008: | v 1.16, TWikibug:Item5646: Problem with attachments with capital letter suffix |
223 231  | 12 May 2008: | v 1.15, TWikibug:Item5579, TWikibug:Item5580, TWikibug:Item5619: Problem with ALLOWWEBVIEW and Forms fixed |
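Reviewer sketch, not part of the changeset: the "Test of the Installation" steps in SearchEngineKinoSearchAddOn.txt ask the admin to type each backend command at the prompt and check that it exists. That check could be scripted roughly as follows. The helper names `check_any` and `check_tools` are hypothetical, and only the command names the documentation itself mentions (`antiword`, `abiword`, `wvHtml`, `pdftotext`, `ppthtml`) are assumed.

```shell
#!/bin/sh
# Sketch only: check_any and check_tools are hypothetical helper names.

# Succeed if at least one of the named commands is on PATH. The Word 2003
# backends are alternatives, so finding one of them is enough.
check_any() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
            return 0
        fi
    done
    echo "missing: none of $* is on PATH"
    return 1
}

# Report every named command and fail if any of them is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

check_any antiword abiword wvHtml || echo "install one Word backend before indexing .doc files"
check_tools pdftotext ppthtml || echo "install the missing backends before running ./kinoindex"
```

`command -v` only confirms that a command can be found; it does not verify that the tool actually converts a document, which is what `ks_test stringify file_name` is documented to test.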
trunk/SearchEngineKinoSearchAddOn/lib/Foswiki/Contrib/SearchEngineKinoSearchAddOn/Index.pm
r5116 r5129
496 496  foreach my $field (@fields) {
497 497      my $name = $field->{"name"};
- 498      if ( defined($fldNames{$name}) && $fldNames{$name}) {
+ 498      if ( %fldNames && $fldNames{$name}) {
499 499          my $value = $field->{"value"};
500 500          next if (!defined($value)); #field not there.
trunk/SearchEngineKinoSearchAddOn/lib/Foswiki/Contrib/SearchEngineKinoSearchAddOn/MANIFEST
r4913 r5129
3 3  data/System/SearchEngineKinoSearchAddOn.txt 0664 Documentation
4 4  data/System/KinoSearch.txt 0664 End user documentation
+ 5  data/System/TipTopicForKinoSearch.txt 0664 Topic for TipsContrib
5 6  pub/System/SearchEngineKinoSearchAddOn/KinoSearchResult.jpg 0664 Screenshot
6 7  pub/System/SearchEngineKinoSearchAddOn/KinoSEARCH.jpg 0664 Screenshot
