]> git.stg.codes - stg.git/blob - doc/xslt/webhelp/docs/content/ch03s02.html
Implemented new command line options parser.
[stg.git] / doc / xslt / webhelp / docs / content / ch03s02.html
1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3 <html xmlns="http://www.w3.org/1999/xhtml" xmlns:exsl="http://exslt.org/common" xmlns:ng="http://docbook.org/docbook-ng" xmlns:db="http://docbook.org/ns/docbook"><head>
4 <meta http-equiv="X-UA-Compatible" content="IE=7" />
5 <title>Search</title><meta name="generator" content="DocBook XSL Stylesheets V1.76.1" /><link rel="home" href="index.html" title="README: Web-based Help from DocBook XML" /><link rel="up" href="ch03.html" title="Chapter 3. Developer Docs" /><link rel="prev" href="ch03s01.html" title="Design" /><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><script type="text/javascript">
6             //The id for tree cookie
7             var treeCookieId = "treeview-897";
8             var language = "en";
9             var w = new Object();
10             //Localization
11             txt_filesfound = 'Results';
12             txt_enter_at_least_1_char = "You must enter at least one character.";
13             txt_browser_not_supported = "Your browser is not supported. Use of Mozilla Firefox is recommended.";
14             txt_please_wait = "Please wait. Search in progress...";
15             txt_results_for = "Results for: ";
16         </script><style type="text/css">
17             input {
18             margin-bottom: 5px;
19             margin-top: 2px;
20             }
21
22             .folder {
23             display: block;
24             height: 22px;
25             padding-left: 20px;
26             background: transparent url(../common/jquery/treeview/images/folder.gif) 0 0px no-repeat;
27             }
28             </style><link rel="shortcut icon" href="../favicon.ico" type="image/x-icon" /><link rel="stylesheet" type="text/css" href="../common/css/positioning.css" /><link rel="stylesheet" type="text/css" href="../common/jquery/theme-redmond/jquery-ui-1.8.2.custom.css" /><link rel="stylesheet" type="text/css" href="../common/jquery/treeview/jquery.treeview.css" /><script type="text/javascript" src="../common/jquery/jquery-1.4.2.min.js"></script><script type="text/javascript" src="../common/jquery/jquery-ui-1.8.2.custom.min.js"></script><script type="text/javascript" src="../common/jquery/jquery.cookie.js"></script><script type="text/javascript" src="../common/jquery/treeview/jquery.treeview.min.js"></script><script type="text/javascript" src="search/htmlFileList.js"></script><script type="text/javascript" src="search/htmlFileInfoList.js"></script><script type="text/javascript" src="search/nwSearchFnt.js"></script><script type="text/javascript" src="search/stemmers/en_stemmer.js"><!--//make this scalable to other languages as well.--></script><script type="text/javascript" src="search/index-1.js"></script><script type="text/javascript" src="search/index-2.js"></script><script type="text/javascript" src="search/index-3.js"></script></head><body><div id="header"><img style="margin-right: 2px; height: 59px; padding-right: 25px; padding-top: 8px" align="right" src="../common/images/logo.png" alt="Company Logo" /><h1 align="center">Search<br />Chapter 3. Developer Docs</h1><div id="navheader" align="right"><table><tr><td style="height: 28px; width: 16px;"><a id="showHideButton" onclick="showHideToc();" class="pointLeft" title="Hide TOC tree">.
29                             </a></td><td><img src="../common/images/highlight-blue.gif" alt="H" height="25px" onclick="toggleHighlight()" id="showHideHighlight" style="cursor:pointer" title="Toggle search result highlighting" /></td><td><a accesskey="p" href="ch03s01.html">Prev</a>
30                                         |
31                                         <a accesskey="u" href="ch03.html">Up</a></td></tr></table></div></div><div id="content"><div class="section" title="Search"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="id36124495"></a>Search</h2></div></div></div><p class="summary">Overview design of Search mechanism.</p><p>
32         The searching is a fully client-side implementation of querying texts for
33         content searching, and no server is involved. That means when a user enters a query, 
34         it is processed by JavaScript inside the browser, and displays the matching results by
35         comparing the query with a generated 'index', which too reside in the client-side web browser.
36         
37         Mainly the search mechanism has two parts.
38         </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Indexing: First we need to traverse the content in the docs/content folder and index 
39               the words in it. This is done by <code class="filename">nw-cms.jar</code>. You can invoke it by  
40             <code class="code">ant index</code> command from the root of webhelp of directory. You can recompile it 
41               again and build the jar file by <code class="code">ant build-indexer</code>. Indexer has some extensive 
42               support for such as stemming of words. Indexer has extensive support for English, German,
43               French languages. By extensive support, what I meant is that those texts are stemmed
44               first, to get the root word and then indexes them. For CJK (Chinese, Japanese, Korean)
45               languages, it uses bi-gram tokenizing to break up the words. (CJK languages does not have 
46               spaces between words.)                
47             </p><p>
48               When we run <code class="code">ant index</code>, it generates five output files:
49                 </p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p><code class="filename">htmlFileList.js</code> - This contains an array named <code class="code">fl</code> which stores details 
50                     all the files indexed by the indexer.  
51                     </p></li><li class="listitem"><p><code class="filename">htmlFileInfoList.js</code> - This includes some meta data about the indexed files in an array 
52                       named <code class="code">fil</code>. It includes details about file name, file (html) title, a summary 
53                       of the content.Format would look like, 
54       <code class="code">fil["4"]= "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented.";</code> 
55                     </p></li><li class="listitem"><p><code class="filename">index-*.js</code> (Three index files) - These three files actually stores the index of the content. 
56                       Index is added to an array named <code class="code">w</code>.</p></li></ul></div><p>
57               
58             </p></li><li class="listitem"><p>
59               Querying: Query processing happens totally in client side. Following JavaScript files handles them.
60               </p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p><code class="filename">nwSearchFnt.js</code> - This handles the user query and returns the search results. It does query 
61                     word tokenizing, drop unnecessary punctuations and common words, do stemming if docbook language 
62                     supports it, etc.</p></li><li class="listitem"><p><code class="filename">{$indexer-language-code}_stemmer.js</code> - This includes the stemming library. 
63                     <code class="filename">nwSearchFnt.js</code> file calls <code class="code">stemmer</code> method in this file for stemming.
64                     ex: <code class="code">var stem = stemmer(foobar);</code>                    
65                   </p></li></ul></div><p>
66             </p></li></ul></div><p>
67       </p><div class="section" title="New Stemmers"><div class="titlepage"><div><div><h3 class="title"><a id="id36124646"></a>New Stemmers</h3></div></div></div><p class="summary">Adding new Stemmers is very simple.</p><p>Currently, only English, French, and German stemmers are integrated in to WebHelp. But the code is
68         extensible such that you can add new stemmers easily by few steps.</p><p>What you need:
69         </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>You'll need two versions of the stemmer; One written in JavaScript, and another in Java. But fortunately, 
70               Snowball contains Java stemmers for number of popular languages, and are already included with the package. 
71               You can see the full list in <a class="ulink" href="ch02s04.html" target="_top">Adding support for other (non-CJKV) languages</a>. 
72               If your language is listed there,
73               Then you have to find javascript version of the stemmer. Generally, new stemmers are getting added in to
74               <a class="ulink" href="http://snowball.tartarus.org/otherlangs/index.html" target="_top">Snowball Stemmers in other languages</a> location.
75               If javascript stemmer for your language is available, then download it. Else, you can write a new stemmer in 
76               JavaScript using SnowBall algorithm fairly easily. Algorithms are at  
77               <a class="ulink" href="http://snowball.tartarus.org/" target="_top">Snowball</a>. 
78             </p></li><li class="listitem"><p>Then, name the JS stemmer exactly like this: <code class="filename">{$language-code}_stemmer.js</code>. For example, 
79               for Italian(it), name it as, <code class="filename">it_stemmer.js</code>. Then, copy it to the 
80               <code class="filename">docbook-webhelp/template/content/search/stemmers/</code> folder. (I assumed 
81               <code class="filename">docbook-webhelp</code> is the root folder for webhelp.)
82               </p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Make sure you changed the <code class="code">webhelp.indexer.language</code> property in <code class="filename">build.properties</code>
83                 to your language.
84                 </p></div><p>
85
86             </p></li><li class="listitem"><p>Now two easy changes needed for the indexer.</p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p>Open <code class="filename">docbook-webhelp/indexer/src/com/nexwave/nquindexer/IndexerTask.java</code> in 
87                   a text editor and add your language code to the <code class="code">supportedLanguages</code> String Array. </p><div class="example"><a id="id36124759"></a><p class="title"><strong>Example 3.1. Add new language to supportedLanguages array</strong></p><div class="example-contents"><p>
88                     change the Array from,
89 </p><pre class="programlisting">
90 private String[] supportedLanguages= {"en", "de", "fr", "cn", "ja", "ko"}; 
91     //currently extended support available for
92     // English, German, French and CJK (Chinese, Japanese, Korean) languages only.
93 </pre><p>
94                     To,</p><pre class="programlisting">
95 private String[] supportedLanguages= {"en", "de", "fr", "cn", "ja", "ko", <span class="emphasis"><em>"it"</em></span>}; 
96   //currently extended support available for
97   // English, German, French, CJK (Chinese, Japanese, Korean), and Italian languages only.
98                     </pre></div></div><br class="example-break" /></li><li class="listitem"><p>
99                   Now, open <code class="filename">docbook-webhelp/indexer/src/com/nexwave/nquindexer/SaxHTMLIndex.java</code> and 
100                   add the following line to the code where it initializes the Stemmer (Search for 
101                   <code class="code">SnowballStemmer stemmer;</code>). Then add code to initialize the stemmer Object in your language. 
102                   It's self understandable. See the example. The class names are at: 
103                   <code class="filename">docbook-webhelp/indexer/src/com/nexwave/stemmer/snowball/ext/</code>.
104                 </p><div class="example"><a id="id36124809"></a><p class="title"><strong>Example 3.2. initialize correct stemmer based on the <code class="code">webhelp.indexer.language</code> specified</strong></p><div class="example-contents"><pre class="programlisting">
105       SnowballStemmer stemmer;
106       if(indexerLanguage.equalsIgnoreCase("en")){
107            stemmer = new EnglishStemmer();
108       } else if (indexerLanguage.equalsIgnoreCase("de")){
109           stemmer= new GermanStemmer();
110       } else if (indexerLanguage.equalsIgnoreCase("fr")){
111           stemmer= new FrenchStemmer();
112       }
113 <span class="emphasis"><em>else if (indexerLanguage.equalsIgnoreCase("it")){ //If language code is "it" (Italian)
114           stemmer= new italianStemmer();  //Initialize the stemmer to <code class="code">italianStemmer</code> object.
115       } </em></span>      
116       else {
117           stemmer = null;
118       }
119 </pre></div></div><br class="example-break" /></li></ul></div></li></ul></div><p>
120         </p><p>That's all. Now run <code class="code">ant build-indexer</code> to compile and build the java code. 
121           Then, run <code class="code">ant webhelp</code> to generate the output from your docbook file. 
122           For any questions, contact us or email to the docbook mailing list 
123           <code class="email">&lt;<a class="email" href="mailto:docbook-apps@lists.oasis-open.org">docbook-apps@lists.oasis-open.org</a>&gt;</code>.
124         </p></div></div><script type="text/javascript" src="../common/main.js"></script><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="ch03s01.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="ch03.html">Up</a></td><td width="40%" align="right"> </td></tr><tr><td width="40%" align="left" valign="top"> </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> </td></tr></table></div></div><div><div id="leftnavigation" style="padding-top:3px; background-color:white;"><div id="tabs"><ul><li><a href="#treeDiv"><em>Contents</em></a></li><li><a href="#searchDiv"><em>Search</em></a></li></ul><div id="treeDiv"><img src="../common/images/loading.gif" alt="loading table of contents..." id="tocLoading" style="display:block;" /><div id="ulTreeDiv" style="display:none"><ul id="tree" class="filetree"><li><span class="file"><a href="ch01.html">Introduction</a></span></li><li><span class="file"><a href="ch02.html">Using the package</a></span><ul><li><span class="file"><a href="ch02s01.html">Generating webhelp output</a></span></li><li><span class="file"><a href="ch02s02.html">Using and customizing the output</a></span><ul><li><span class="file"><a href="ch02s02.html#id36124136">Recommended Apache configurations</a></span></li></ul></li><li><span class="file"><a href="ch02s03.html">Building the indexer</a></span></li><li><span class="file"><a href="ch02s04.html">Adding support for other (non-CJKV) languages</a></span></li></ul></li><li><span class="file"><a href="ch03.html">Developer Docs</a></span><ul><li><span class="file"><a href="ch03s01.html">Design</a></span></li><li id="webhelp-currentid"><span class="file"><a href="ch03s02.html">Search</a></span><ul><li><span class="file"><a href="ch03s02.html#id36124646">New Stemmers</a></span></li></ul></li></ul></li></ul></div></div><div id="searchDiv"><div id="search"><form onsubmit="Verifie(ditaSearch_Form);return false" name="ditaSearch_Form" class="searchForm" id="ditaSearch_Form"><fieldset class="searchFieldSet"><legend>Search</legend><center><input id="textToSearch" name="textToSearch" type="text" class="searchText" /> &nbsp; <input onclick="Verifie(ditaSearch_Form)" type="button" class="searchButton" value="Go" id="doSearch" /></center></fieldset></form></div><div id="searchResults"><center></center></div></div></div></div></div></body></html>