X-Git-Url: https://git.stg.codes/stg.git/blobdiff_plain/0c9c28efcd43f53ac54aa60b2dfefa69c70dbadf..6b6d9b29e9e9e91f79507a8bf193fb30de311dcc:/doc/help/xslt/webhelp/docs/content/ch03s02.html diff --git a/doc/help/xslt/webhelp/docs/content/ch03s02.html b/doc/help/xslt/webhelp/docs/content/ch03s02.html new file mode 100644 index 00000000..c4ba8729 --- /dev/null +++ b/doc/help/xslt/webhelp/docs/content/ch03s02.html @@ -0,0 +1,124 @@ + + + + +Search

Search

Design overview of the search mechanism.

Search is implemented entirely on the client side; no server is involved. When a user enters a query, JavaScript running in the browser processes it and displays the matching results by comparing the query against a generated 'index', which also resides in the client-side web browser.

The search mechanism has two main parts:

  • Indexing: First, the content in the docs/content folder is traversed and the words in it are indexed. This is done by nw-cms.jar, which you can invoke with the ant index command from the root of the webhelp directory. You can recompile it and rebuild the jar file with ant build-indexer. The indexer has extended support for English, German, and French: text in these languages is stemmed first to obtain the root word, and the root word is then indexed. For CJK (Chinese, Japanese, Korean) languages, it uses bi-gram tokenizing to break up the words, because CJK languages do not put spaces between words.

    Running ant index generates five output files:

    • htmlFileList.js - This contains an array named fl, which stores details of all the files indexed by the indexer.

    • htmlFileInfoList.js - This includes some metadata about the indexed files in an array named fil. Each entry holds the file name, the file's (HTML) title, and a summary of the content. The format looks like this:
      fil["4"]= "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented.";

    • index-*.js (three index files) - These three files store the actual index of the content. The index is added to an array named w.

    + +
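    The fil entry format shown above can be read back by splitting on the "@@@" separator. The following is only a minimal sketch, not code from WebHelp: parseFileInfo is a hypothetical helper name, and the field order (file name, title, summary) is taken from the sample entry.

```javascript
// Hypothetical helper (not part of WebHelp): split one `fil` entry into its parts.
// Assumes the "fileName@@@title@@@summary" layout shown in the sample entry.
function parseFileInfo(entry) {
  const parts = entry.split("@@@");
  const fileName = parts[0];
  const title = parts.length > 1 ? parts[1] : "";
  // Re-join the remainder in case the summary itself contains the separator.
  const summary = parts.slice(2).join("@@@");
  return { fileName, title, summary };
}

// Sample data copied from the entry format described in the text.
const fil = [];
fil["4"] = "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented.";

const info = parseFileInfo(fil["4"]);
console.log(info.fileName + " | " + info.title);
```

    Keeping the three fields in one delimited string keeps the generated file small; the cost is that the separator must not appear in file names or titles.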

  • Querying: Query processing happens entirely on the client side. The following JavaScript files handle it:

    • nwSearchFnt.js - This handles the user query and returns the search results. It tokenizes the query words, drops unnecessary punctuation and common words, applies stemming if the DocBook language supports it, and so on.

    • {$indexer-language-code}_stemmer.js - This includes the stemming library. nwSearchFnt.js calls the stemmer method in this file for stemming. For example: var stem = stemmer(foobar);

    +
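    The query steps described above (tokenizing, dropping punctuation and common words, then stemming) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in nwSearchFnt.js: prepareQuery and STOP_WORDS are hypothetical names, the stop-word list is invented for the example, and the stemmer function here is a toy stand-in for the stemmer(word) function that the language-specific stemmer file provides.

```javascript
// Hypothetical stop-word list; the list WebHelp actually uses is not shown in this document.
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "to", "in", "is"]);

// Toy stand-in for the stemmer(word) function from {$indexer-language-code}_stemmer.js.
// It only strips a trailing "ing" or "s"; it is NOT the Snowball algorithm.
function stemmer(word) {
  if (word.length > 5 && word.endsWith("ing")) return word.slice(0, -3);
  if (word.length > 3 && word.endsWith("s")) return word.slice(0, -1);
  return word;
}

// Hypothetical pipeline: lower-case, strip punctuation, split on whitespace,
// drop empty tokens and common words, then stem if a stemmer was supplied.
function prepareQuery(query, stem) {
  return query
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, " ") // replace punctuation with spaces
    .split(/\s+/)
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w))
    .map((w) => (stem ? stem(w) : w));
}

console.log(prepareQuery("Indexing the files, and searching!", stemmer));
// prints [ 'index', 'file', 'search' ]
```

    Passing the stemmer as an argument mirrors the idea that stemming is applied only when the configured language supports it; calling prepareQuery(query, null) skips that step.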

+
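The bi-gram tokenizing mentioned for CJK text under Indexing can be illustrated with a short sketch. The indexer itself is written in Java and its implementation may differ; the bigrams function below is a hypothetical name used only to show the idea of overlapping two-character tokens, and it assumes the input is a run of CJK characters without whitespace.

```javascript
// Illustrative bi-gram tokenizer (hypothetical; not the indexer's own code).
// Assumes `text` is a run of CJK characters with no whitespace.
function bigrams(text) {
  // Array.from splits by code point, so characters outside the BMP stay intact.
  const chars = Array.from(text);
  if (chars.length < 2) return chars; // zero or one character: nothing to pair
  const tokens = [];
  for (let i = 0; i < chars.length - 1; i++) {
    tokens.push(chars[i] + chars[i + 1]); // each character paired with its successor
  }
  return tokens;
}

console.log(bigrams("東京都庁"));
// prints [ '東京', '京都', '都庁' ]
```

Because every adjacent pair becomes a token, a query tokenized the same way can match text without knowing where the word boundaries are.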

New Stemmers

Adding a new stemmer is straightforward.

Currently, only English, French, and German stemmers are integrated into WebHelp, but the code is extensible, so you can add a new stemmer in a few steps.

What you need:

  • You'll need two versions of the stemmer: one written in JavaScript and another in Java. Fortunately, Snowball contains Java stemmers for a number of popular languages, and these are already included with the package. You can see the full list in Adding support for other (non-CJKV) languages. If your language is listed there, you then have to find a JavaScript version of the stemmer. New stemmers are generally added at the Snowball Stemmers in other languages location. If a JavaScript stemmer for your language is available, download it. Otherwise, you can write a new stemmer in JavaScript fairly easily using the Snowball algorithm. The algorithms are at Snowball.

  • Then, name the JS stemmer exactly like this: {$language-code}_stemmer.js. For example, for Italian (it), name it it_stemmer.js. Then copy it to the docbook-webhelp/template/content/search/stemmers/ folder. (This assumes docbook-webhelp is the root folder for webhelp.)

    Note

    Make sure you change the webhelp.indexer.language property in build.properties to your language.

    + +

  • Now two easy changes are needed for the indexer.

    • Open docbook-webhelp/indexer/src/com/nexwave/nquindexer/IndexerTask.java in a text editor and add your language code to the supportedLanguages String array.

      Example 3.1. Add new language to supportedLanguages array

      Change the array from:

      +private String[] supportedLanguages= {"en", "de", "fr", "cn", "ja", "ko"}; 
      +    //currently extended support available for
      +    // English, German, French and CJK (Chinese, Japanese, Korean) languages only.
      +

      To:

      +private String[] supportedLanguages= {"en", "de", "fr", "cn", "ja", "ko", "it"}; 
      +  //currently extended support available for
      +  // English, German, French, CJK (Chinese, Japanese, Korean), and Italian languages only.
      +                    

    • Now open docbook-webhelp/indexer/src/com/nexwave/nquindexer/SaxHTMLIndex.java and find the code where it initializes the stemmer (search for SnowballStemmer stemmer;). Add code there to initialize the stemmer object for your language, as the example shows. The class names are at: docbook-webhelp/indexer/src/com/nexwave/stemmer/snowball/ext/.

      Example 3.2. Initialize the correct stemmer based on the webhelp.indexer.language specified

      +      SnowballStemmer stemmer;
      +      if (indexerLanguage.equalsIgnoreCase("en")) {
      +          stemmer = new EnglishStemmer();
      +      } else if (indexerLanguage.equalsIgnoreCase("de")) {
      +          stemmer = new GermanStemmer();
      +      } else if (indexerLanguage.equalsIgnoreCase("fr")) {
      +          stemmer = new FrenchStemmer();
      +      } else if (indexerLanguage.equalsIgnoreCase("it")) { // If the language code is "it" (Italian)
      +          stemmer = new italianStemmer(); // Initialize the stemmer to an italianStemmer object.
      +      } else {
      +          stemmer = null; // No stemmer for this language
      +      }
      +

+

That's all. Now run ant build-indexer to compile and build the Java code. Then run ant webhelp to generate the output from your DocBook file. For any questions, contact us or email the docbook mailing list.