X-Git-Url: https://git.stg.codes/stg.git/blobdiff_plain/bfec9cc7ab5a396f7662090b208691ec59a69f1b..2f1753cc3e240fa497a87873ed19fe3f11e22331:/doc/xslt/webhelp/docsrc/readme.xml diff --git a/doc/xslt/webhelp/docsrc/readme.xml b/doc/xslt/webhelp/docsrc/readme.xml new file mode 100755 index 00000000..4d191906 --- /dev/null +++ b/doc/xslt/webhelp/docsrc/readme.xml @@ -0,0 +1,928 @@ + + + + README: Web-based Help from DocBook XML + + + + Permission is hereby granted, free of charge, to any person + obtaining a copy of this software and associated documentation files + (the Software), to deal in the Software without + restriction, including without limitation the rights to use, copy, + modify, merge, publish, distribute, sublicense, and/or sell copies of + the Software, and to permit persons to whom the Software is furnished to + do so, subject to the following conditions: + + The above copyright notice and this permission notice shall + be included in all copies or substantial portions of the + Software. + + + + Except as contained in this notice, the names of individuals + credited with contribution to this software shall not be used in + advertising or otherwise to promote the sale, use or other + dealings in this Software without prior written authorization from + the individuals in question. + + + + Any stylesheet derived from this Software that is publicly + distributed will be identified with a different name and the + version strings in any derived Software will be changed so that no + possibility of confusion between the derived package and this + Software will exist. + + + + + Warranty: + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
+ IN NO EVENT SHALL DAVID CRAMER, KASUN GAJASINGHE, OR ANY OTHER CONTRIBUTOR BE LIABLE FOR + ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF + CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + + + This package is maintained by Kasun Gajasinghe, kasunbg AT + gmail DOT com and David Cramer, david AT thingbag DOT + net. + + This package also includes the following software written and + copyrighted by others: + + Files in template/common/jquery are + copyrighted by JQuery + under the MIT License. The file + jquery.cookie.js is Copyright (c) 2006 Klaus + Hartl and is released under the MIT license. + + + jquery + + + + + Some files in the template/content/search and indexer directories were originally + part of N. Quaine's htmlsearch DITA plugin. The htmlsearch DITA + plugin is available from the files + page of the DITA-users yahoogroup. The htmlsearch plugin + was released under a BSD-style license. See + indexer/license.txt for details. + htmlsearch + + DITA + + htmlsearch plugin + + + + + Stemmers from the Snowball + project, released under a BSD license. + + + + Code from the Apache + Lucene search engine, released under the Apache + 2.0 license, provides support for tokenizing + Chinese, Japanese, and Korean content. + + + Webhelp for DocBook was developed as a Google Summer of Code project. + + + + + 2008-2010 + + Kasun Gajasinghe + + David Cramer + + + + David + + Cramer + + dcramer AT motive DOT com + + david AT thingbag DOT net + + + + Kasun + + Gajasinghe + + kasunbg AT gmail DOT com + + + August 2010 + + + + + + + + Overview of the package. + + + + Introduction + + A common requirement for technical publications groups is to produce a Web-based help + format that includes a table of contents pane, a search feature, and an index similar to what + you get from the Microsoft HTML Help (.chm) format or Eclipse help.
If the content is help for + a Web application that is not exposed to the Internet or requires that the user be logged in, + then it is impossible to use services like Google to add search. + features + + + Features + + Full text search. + search + features + + + + Stemming support for English, French, and German. Stemming support can be added + for other languages by implementing a stemmer. + search + stemming + + + + Support for Chinese, Japanese, and Korean using code from the Lucene search + engine. + + + Search highlighting shows where the searched-for term appears in the results. + Use the H button to toggle the highlighting off and on. + + search + highlighting + + + + Search results can include brief descriptions of the target. + search + descriptions + + + + + + Table of contents pane with a collapsible TOC tree. + + + Auto-synchronization of the content pane and the TOC. + + + TOC and search panes implemented without the use of a frameset. + + + An Ant build.xml file to generate output. You can use this + build file by importing it into your own, or use it as a model for integrating this + output format into your own build system. + + + + Possible future enhancements + + Move webhelp-specific parameters and gentext strings into the base DocBook stylesheets. + + + + Use tabindex attributes to control the tab + order in the output. The Contents and Search tabs should be first and second, then the + search box and button, then the table of contents items, and so on. + + + Add "Expand all" and "Collapse all" buttons to the table of contents. + + + Add other search options: + + + Add an option to use Lucene for server-side searches with table of contents + state persisted on the server. + + + Add a simple form that uses a Google site:my.domain.com based search. + + + + + Sort search results by relevance. + + + Support wildcard characters in the search query. + + + Parameterize the width of the TOC pane OR make the TOC pane resizable by the + user.
+ + + Automate search results summary text: + + + Automatically use the first non-heading content as the summary in the search + results. + + + Automatically limit the size of the search description to about 140 + characters. + + + + + Support boolean operators in search. + + + Parameterize the list of files to exclude from indexing. Currently, index.html and ix01.html (the + legal notice and index topics) are hard-coded as excluded. It should be smarter and automatically skip the + index file even if it's not named ix01.html. + + + Improve performance by moving the table of contents div out of each page and into a + separate JavaScript file which then adds it to the page. + + + Add to the indexer the ability to specify a list of files or file patterns not to + index. Currently it does not index index.html or + ix01.html, which is generally appropriate, but it should be up to + the user to decide. + + + Add an index tab populated by a separate JavaScript file. Include a param/property + that allows the content creator to disable the index. + + + Add functionality to the build.xml file so that when a property + is set, the build generates a PDF version of the document and includes a link to it from + the header. + + + Add breadcrumbs so that users know which topics they have visited. + + + Consider using more advanced Lucene analyzers for Chinese and Japanese than the + CJKAnalyzer. + + + + + + Using the package + + The following sections describe how to install and + use the package on Windows. + +
+ + Installation instructions + + + + Generating webhelp output + + + To install the package on Windows + + + The examples in this procedure assume a Windows installation, + but the process is the same in other environments, + mutatis mutandis. + + + + If necessary, install Java 1.6 or + higher. + + + + Confirm that Java is installed and in your + PATH by typing the following at a command prompt: + java -version + + To build the indexer, you must have the JDK. + + + + + + + If necessary, install Apache Ant 1.6.5 + or higher. + + + + Unzip the Ant binary distribution to a convenient location + on your system. For example: c:\Program + Files. + + + + Set the environment variable ANT_HOME to + the top-level Ant directory. For example: c:\Program + Files\apache-ant-1.7.1. + See How To Manage + Environment Variables in Windows XP for information + on setting environment variables. + + + + + Add the Ant bin directory to your + PATH. For example: c:\Program + Files\apache-ant-1.7.1\bin. + + + + Confirm that Ant is installed by typing the following at a + command prompt: ant -version + + + If you see a message about the file + tools.jar being missing, you can safely + ignore it. + + + + + + + Download Saxon + 6.5.x and unzip the distribution to a convenient location on your file system. + You will use the path to saxon.jar in a later step. + The build.xml has only been tested with Saxon 6.5, though + it could be adapted to work with other XSLT processors. However, when you generate + output, the Saxon jar must not be in your + CLASSPATH. + + + + + In a text editor, edit the + build.properties file in the webhelp directory + and make the changes indicated by the comments:# The path (relative to the build.xml file) to your input document. +# To use your own input document, create a build.xml file of your own +# and import this build.xml. +input-xml=docsrc/readme.xml + +# The directory in which to put the output files. +# This directory is created if it does not exist.
+output-dir=docs + +# If you are using a customization layer that imports webhelp.xsl, use +# this property to point to it. +stylesheet-path=${ant.file.dir}/xsl/webhelp.xsl + +# If your document has image directories that need to be copied +# to the output directory, you can list patterns here. +# See the Ant documentation for fileset for documentation +# on patterns. +#input-images-dirs=images/**,figures/**,graphics/** + +# By default, the ant script assumes your images are stored +# in the same directory as the input-xml. If you store your +# image directories in another directory, specify it here +# and uncomment this line. +#input-images-basedir=/path/to/image/location + +# Modify this so that it points to your copy of the Saxon 6.5 jar. +xslt-processor-classpath=/usr/share/java/saxon-6.5.5.jar + +# For non-ns version only, this validates the document +# against a dtd. +validate-against-dtd=true + +# Set this to false if you don't need a search tab. +webhelp.include.search.tab=true + +# indexer-language tells the search indexer which language +# the DocBook document is written in. It is used to select the correct +# stemmer and the punctuation rules, which differ from language to language. +# See the documentation for details. en=English, fr=French, de=German, +# zh=Chinese, ja=Japanese etc. +webhelp.indexer.language=en + + + + Test the package by running the command ant webhelp + -Doutput-dir=test-output at the command line in the webhelp directory. It should + generate a copy of this documentation in the test-output + directory. Type start test-output\index.html to open the output in a + browser. Once you have confirmed that the process worked, you can delete the test-output directory. + The Saxon 6.5 jar should not be in your + CLASSPATH when you generate the webhelp output. If you have any + problems, try running ant with an empty CLASSPATH.
+ + + + + To process your own document, simply refer to this package + from another build.xml in an arbitrary location on + your system: + + + + Create a new build.xml file that + defines the name of your source file, the desired output + directory, and imports the build.xml from + this package. For example: <project> + <property name="input-xml" value="path-to/yourfile.xml"/> + <property name="input-images-dirs" value="images/** figures/** graphics/**"/> + <property name="output-dir" value="path-to/desired-output-dir"/> + <import file="path-to/docbook-webhelp/build.xml"/> +</project> + + + + From the directory containing your newly created + build.xml file, type ant + webhelp to build your document. + + The Saxon 6.5 jar should not be in your + CLASSPATH when you generate the webhelp output. If you have any + problems, try running ant with an empty CLASSPATH. + + + + + +
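Before running ant webhelp, you may want to confirm that build.properties parses and names an indexer language that the indexer recognizes. The Java sketch below is not part of the package. It assumes that the supported codes are the ones in the supportedLanguages array of IndexerTask.java (en, de, fr, cn, ja, ko). It also does not expand Ant references such as ${ant.file.dir}.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class CheckBuildProperties {
    // Assumption: mirrors the supportedLanguages array in IndexerTask.java.
    static final List<String> SUPPORTED = Arrays.asList("en", "de", "fr", "cn", "ja", "ko");

    // java.util.Properties reads the "# comment" and "key=value" lines used in build.properties.
    static Properties load(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True only if webhelp.indexer.language is present and is one of the codes above.
    static boolean languageSupported(Properties props) {
        String lang = props.getProperty("webhelp.indexer.language");
        return lang != null && SUPPORTED.contains(lang.trim());
    }

    public static void main(String[] args) {
        // Inline sample for illustration. To check a real file, pass
        // java.nio.file.Files.newBufferedReader(java.nio.file.Paths.get("build.properties")) to load().
        String sample = "# The directory in which to put the output files.\n"
                + "output-dir=docs\n"
                + "xslt-processor-classpath=/usr/share/java/saxon-6.5.5.jar\n"
                + "webhelp.indexer.language=en\n";
        Properties props = load(new StringReader(sample));
        System.out.println("output-dir = " + props.getProperty("output-dir"));
        System.out.println("saxon jar = " + props.getProperty("xslt-processor-classpath"));
        System.out.println("language supported: " + languageSupported(props));
    }
}
```

IndexerTask.java spells Chinese as cn, but the comment in build.properties suggests zh. Check which code your copy of the indexer expects.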
+ +
+ Using and customizing the output + + To deep link to a topic inside the help set, simply link directly + to the page. This help system uses no frameset, so nothing further is + necessary. + See Chunking into + multiple HTML files in Bob Stayton's DocBook XSL: The + Complete Guide for information on controlling output file + names and which files are chunked in DocBook. + + + When you perform a search, the results can include brief + summaries. These are populated in one of two ways: + + By adding role="summary" to a + para or phrase in the + chapter or section. + + + + By adding an abstract to the + chapterinfo or sectioninfo + element. + + + + To customize the look and feel of the help, study the following + CSS files: + + docs/common/css/positioning.css: This + positions the page's divs. For + example, it causes the leftnavigation div to appear + on the left, the header on top, and so on. Edit this file if you need to + change the relative positions, widths, or heights + of these elements. + + + + docs/common/jquery/theme-redmond/jquery-ui-1.8.2.custom.css: + This is the theme, which supplies colors and other visual styling. It is the + default theme that ships with jqueryui, unchanged. You + can choose any other theme from the jqueryui site (themes are listed on the + right navigation bar), replace the CSS theme folder + (theme-redmond) with it, and change the XSL to point to the new + CSS file. + + + + docs/common/jquery/treeview/jquery.treeview.css: + This styles the TOC tree. Generally, you don't have to edit this + file. + + + +
+ Recommended Apache configurations + + If you are serving a long document from an Apache web server, we + recommend you make the following additions or changes to your + httpd.conf or .htaccess + file. The notes after the listing explain each group of settings. The Header + directives require mod_headers, and the DEFLATE filter requires mod_deflate. AddDefaultCharset UTF-8 # + + # 480 weeks + <FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$"> # + Header set Cache-Control "max-age=290304000, public" + </FilesMatch> + + # 2 DAYS + <FilesMatch "\.(xml|txt)$"> + Header set Cache-Control "max-age=172800, public, must-revalidate" + </FilesMatch> + + # 2 HOURS + <FilesMatch "\.(html|htm)$"> + Header set Cache-Control "max-age=7200, must-revalidate" + </FilesMatch> + + # compress text, html, javascript, css, xml: + AddOutputFilterByType DEFLATE text/plain # + AddOutputFilterByType DEFLATE text/html + AddOutputFilterByType DEFLATE text/xml + AddOutputFilterByType DEFLATE text/css + AddOutputFilterByType DEFLATE application/xml + AddOutputFilterByType DEFLATE application/xhtml+xml + AddOutputFilterByType DEFLATE application/rss+xml + AddOutputFilterByType DEFLATE application/javascript + AddOutputFilterByType DEFLATE application/x-javascript + + # Or, compress certain file types by extension: + <Files *.html> + SetOutputFilter DEFLATE + </Files> + + + See Odd + characters in HTML output in Bob Stayton's book + DocBook XSL: The Complete Guide for more + information about this setting. + + + + These lines and those that follow cause the browser to + cache various resources such as bitmaps and JavaScript files. + Because the search index is stored in JavaScript files, caching them + can leave your users with a stale search index after you update your document. + + + + These lines cause the server to compress HTML, CSS, + and JavaScript files and the browser to uncompress them to + improve download performance. + + +
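The max-age values above are in seconds. As a quick check of the arithmetic, this Java snippet derives each value from the lifetime named in the corresponding comment. It is an illustration only and is not part of the package. You can use the same approach to compute a value for a different lifetime. For example, you might choose a shorter lifetime for .js files because of the stale-index issue noted above.

```java
import java.time.Duration;

public class CacheMaxAge {
    // Lifetimes named in the comments of the Apache configuration above, in seconds.
    static final long STATIC_ASSETS = Duration.ofDays(7L * 480).getSeconds(); // "480 weeks"
    static final long XML_AND_TEXT = Duration.ofDays(2).getSeconds();         // "2 DAYS"
    static final long HTML_PAGES = Duration.ofHours(2).getSeconds();          // "2 HOURS"

    public static void main(String[] args) {
        // Print each value in the form used by the Header set Cache-Control lines.
        System.out.println("max-age=" + STATIC_ASSETS); // max-age=290304000
        System.out.println("max-age=" + XML_AND_TEXT);  // max-age=172800
        System.out.println("max-age=" + HTML_PAGES);    // max-age=7200
    }
}
```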
+
+ +
+ Building the indexer + + To build the indexer, you must have the + JDK version 1.5 or higher installed and the ANT_HOME + environment variable set. Run ant build-indexer to recompile + nw-cms.jar. + + + ANT_HOME + + + + indexer + + building + +
+ +
+ Adding support for other (non-CJKV) languages + + To support stemming for a language, the search mechanism requires + a stemmer implemented in both Java and JavaScript. The Java version is + used by the indexer and the JavaScript version is used to stem the + user's input on the search form. Currently the search mechanism supports + stemming for English, French, and German. In addition, Java stemmers are included + for the following languages. Therefore, to support these languages, you + mainly need to implement the stemmer in JavaScript and add it to the + template. The indexer also needs two small code changes, described in New Stemmers. If you do undertake this task, please consider contributing + the JavaScript version back to this project and to Martin + Porter's project. + + Danish + + + + Dutch + + + + Finnish + + + + Hungarian + + + + Italian + + + + Norwegian + + + + Portuguese + + + + Romanian + + + + Russian + + + + Spanish + + + + Swedish + + + + Turkish + + +
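Why must the Java and JavaScript stemmers agree? The index stores stems rather than the original words, so a query word matches only if it is reduced to the same stem. The sketch below illustrates this with a deliberately naive suffix-stripping stemmer. That stemmer is hypothetical and is not a Snowball stemmer. The sketch builds the index with the stemmer, then searches once with the same stemmer and once without stemming the query.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

public class StemConsistency {
    // Deliberately naive suffix stripper (hypothetical; NOT a Snowball stemmer).
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "es", "s"}) {
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Index time: map each stem to the set of pages containing a word with that stem.
    static Map<String, Set<String>> index(Map<String, String> pages, UnaryOperator<String> stem) {
        Map<String, Set<String>> idx = new TreeMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            for (String word : page.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (word.isEmpty()) {
                    continue;
                }
                idx.computeIfAbsent(stem.apply(word), k -> new TreeSet<>()).add(page.getKey());
            }
        }
        return idx;
    }

    // Query time: the query word must be reduced the same way to find a match.
    static Set<String> search(Map<String, Set<String>> idx, String query, UnaryOperator<String> stem) {
        return idx.getOrDefault(stem.apply(query.toLowerCase(Locale.ROOT)), Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, String> pages = new TreeMap<>();
        pages.put("ch01.html", "Indexing the pages");
        pages.put("ch02.html", "Search results");
        Map<String, Set<String>> idx = index(pages, StemConsistency::toyStem);

        System.out.println(search(idx, "indexes", StemConsistency::toyStem)); // [ch01.html]
        System.out.println(search(idx, "indexes", w -> w));                   // []
    }
}
```

With matching stemmers, the query "indexes" finds ch01.html, which contains only "indexing". With an unstemmed query it finds nothing. A JavaScript stemmer that disagrees with its Java counterpart would fail in the same way.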
+
+ + + Developer Docs + + This chapter provides an overview of how webhelp is implemented. + + The table of contents and search panes are implemented as divs and + rendered as if they were the left pane in a frameset. As a result, the + page must save the state of the table of contents and the search in + cookies when you navigate away from a page. When you load a new page, the + page reads these cookies and restores the state of the table of contents + tree and search. The result is that the help system behaves exactly as if + it were a frameset. + +
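The cookie round trip described above (save the tree state when leaving a page, restore it on the next page) can be modeled in a few lines. The sketch is written in Java rather than the browser-side JavaScript that the package actually uses. Its encoding is an assumption for illustration only: one '1' or '0' per TOC node, in document order. This is not necessarily the format that the TreeView plugin or the search pane writes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TocState {
    // Hypothetical encoding: one character per TOC node in document order,
    // '1' if the node is expanded and '0' if it is collapsed.
    static String encode(List<Boolean> expanded) {
        StringBuilder sb = new StringBuilder(expanded.size());
        for (boolean open : expanded) {
            sb.append(open ? '1' : '0');
        }
        return sb.toString();
    }

    // Restore the state when the next page loads. Nodes the cookie does not
    // cover (for example, if the TOC grew) default to collapsed.
    static List<Boolean> decode(String cookie, int nodeCount) {
        List<Boolean> states = new ArrayList<>(nodeCount);
        for (int i = 0; i < nodeCount; i++) {
            states.add(i < cookie.length() && cookie.charAt(i) == '1');
        }
        return states;
    }

    public static void main(String[] args) {
        // Leaving page A with the first and third nodes expanded...
        String cookie = encode(Arrays.asList(true, false, true));
        System.out.println(cookie); // 101
        // ...and rebuilding a four-node tree on page B.
        System.out.println(decode(cookie, 4)); // [true, false, true, false]
    }
}
```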
+ Design + An overview of webhelp page structure. + The DocBook WebHelp page layout is built entirely with CSS + and does not use a frameset. The page can be divided into three main sections: + + + Header: The header is a separate div that includes the company logo, + navigation buttons (prev, next, etc.), the page title, and the heading of the parent topic. + + + + Content: This includes the content of the documentation. This part is + generated by the + DocBook XSL chunking customization, with some further CSS styling applied from + positioning.css. + + + + + Left Navigation: This includes the table of contents and the Search tab. It + is styled with jQuery UI. + + + Tabbed Navigation: The navigation pane is organized into two tabs: + the Contents tab and the Search tab. The tabs are implemented with the + jQuery Tabs plugin. + + + + + Table of Contents (TOC) tree: When the chunked HTML is built from the + DocBook file, the table of contents is generated as an unordered list (a list + made from <ul> and <li> tags). When the page loads in the browser, + styling is applied to the list to produce the tree you see. The TOC + tree is styled by a jQuery plugin called + + TreeView. The tree is generated with the following JavaScript code: + + +// Generate the tree +$("#tree").treeview({ +    collapsed: true, +    animated: "medium", +    control: "#sidetreecontrol", +    persist: "cookie" +}); + + + + + + Search Tab: This includes the search feature. + + + + + + +
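The TOC markup that TreeView styles is a nested unordered list. As an illustration only, the Java sketch below renders a small hypothetical TOC into nested <ul>/<li> elements. The Entry class and the file names ch03s01.html and ch03s02.html are invented for the example. The sketch also assumes, as the $("#tree") selector suggests, that the outermost list carries id="tree". The real markup is produced by the DocBook XSL stylesheets and may include extra attributes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TocRenderer {
    // Hypothetical TOC node: a title, the chunk file it links to, and child entries.
    static final class Entry {
        final String title;
        final String href;
        final List<Entry> children = new ArrayList<>();

        Entry(String title, String href) {
            this.title = title;
            this.href = href;
        }

        Entry add(Entry child) {
            children.add(child);
            return this;
        }
    }

    // Minimal escaping so titles such as "A & B" stay well-formed.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Emit nested <ul>/<li> markup; only the outermost list gets id="tree".
    static void render(List<Entry> entries, StringBuilder out, boolean root) {
        out.append(root ? "<ul id=\"tree\">" : "<ul>");
        for (Entry e : entries) {
            out.append("<li><a href=\"").append(escape(e.href)).append("\">")
               .append(escape(e.title)).append("</a>");
            if (!e.children.isEmpty()) {
                render(e.children, out, false);
            }
            out.append("</li>");
        }
        out.append("</ul>");
    }

    public static void main(String[] args) {
        Entry chapter = new Entry("Developer Docs", "ch03.html")
                .add(new Entry("Design", "ch03s01.html"))
                .add(new Entry("Search", "ch03s02.html"));
        StringBuilder html = new StringBuilder();
        render(Collections.singletonList(chapter), html, true);
        System.out.println(html);
    }
}
```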
+ +
+ Search + An overview of the search mechanism. + + Search is implemented entirely on the client side; no server is involved. When a user enters a query, + it is processed by JavaScript inside the browser, which displays the matching results by + comparing the query with a generated index that is also loaded in the browser. + + The search mechanism has two main parts: + + + Indexing: First, the content in the docs/content folder is traversed and the words in it + are indexed. This is done by nw-cms.jar, which you can invoke with the + ant index command from the root of the webhelp directory. To recompile the indexer + and rebuild the jar file, run ant build-indexer. For English, German, and + French, the indexer first stems the text, reducing each word to its root, and then + indexes the stems. For CJK (Chinese, Japanese, Korean) + languages, which do not put spaces between words, it uses bi-gram tokenizing to break up the text. + + + When you run ant index, it generates five output files: + + + htmlFileList.js - This contains an array named fl that stores details of + all the files indexed by the indexer. + + + + htmlFileInfoList.js - This contains metadata about the indexed files in an array + named fil: the file name, the file (HTML) title, and a summary + of the content. An entry looks like this: + fil["4"]= "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented."; + + + + + index-*.js (three index files) - These three files store the index of the content + in an array named w. + + + + + + + + + Querying: Query processing happens entirely on the client side, in the following JavaScript files: + + + nwSearchFnt.js - This handles the user query and returns the search results. It tokenizes the + query, drops unnecessary punctuation and common words, and stems the remaining words if the + document's language supports stemming. + + + {$indexer-language-code}_stemmer.js - This contains the stemming library. + nwSearchFnt.js calls the stemmer function in this file to stem each word, + for example: var stem = stemmer("foobar"); + + + + + + + + +
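The following Java sketch makes the order of the query-side steps concrete. It is not a port of nwSearchFnt.js. The stop-word list and the tokenizing regular expression are assumptions. The stemmer is passed in as a function that stands in for the stemmer() call in {$indexer-language-code}_stemmer.js. The bigrams method shows overlapping bi-gram splitting of the kind the indexer uses for CJK text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

public class QueryPipeline {
    // Assumed stop-word list for illustration; the real list lives in the search scripts.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // Lower-case the query, split on runs of anything that is not a letter or digit
    // (this drops punctuation), skip stop words, and stem each remaining token.
    static List<String> processQuery(String query, UnaryOperator<String> stemmer) {
        List<String> terms = new ArrayList<>();
        for (String token : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            terms.add(stemmer.apply(token));
        }
        return terms;
    }

    // CJK text has no spaces between words, so split it into overlapping
    // two-character bi-grams: "ABC" -> ["AB", "BC"]. Works on code points.
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        if (cps.length == 1) {
            grams.add(text); // a single character is kept as-is
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            grams.add(new String(cps, i, 2));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Identity function as a stand-in for the language's stemmer.
        System.out.println(processQuery("The indexing, of pages!", w -> w)); // [indexing, pages]
        // Three CJK characters (written as escapes) become two overlapping bi-grams.
        System.out.println(bigrams("\u6771\u4eac\u90fd"));
    }
}
```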
+ New Stemmers + Adding new stemmers is simple. + Currently, only English, French, and German stemmers are integrated into WebHelp, but the code is + extensible, so you can add a new stemmer in a few steps. + What you need: + + + You need two versions of the stemmer: one written in JavaScript and another in Java. Fortunately, + Snowball provides Java stemmers for a number of popular languages, and these are already included with the package. + You can see the full list in Adding support for other (non-CJKV) languages. + If your language is listed there, + you only need to find a JavaScript version of the stemmer. New stemmers are regularly added to the + Snowball Stemmers in other languages location. + If a JavaScript stemmer for your language is available, download it. Otherwise, you can write a new stemmer in + JavaScript from the Snowball algorithm fairly easily. The algorithms are available at + Snowball. + + + + Name the JavaScript stemmer exactly {$language-code}_stemmer.js. For example, + for Italian (it), name it it_stemmer.js. Then copy it to the + docbook-webhelp/template/content/search/stemmers/ folder. (This assumes that + docbook-webhelp is the root folder of webhelp.) + + Make sure you change the webhelp.indexer.language property in build.properties + to your language. + + + + + + + + Next, make two small changes to the indexer. + + + Open docbook-webhelp/indexer/src/com/nexwave/nquindexer/IndexerTask.java in + a text editor and add your language code to the supportedLanguages String array. + + Add new language to supportedLanguages array + + Change the array from: + +private String[] supportedLanguages= {"en", "de", "fr", "cn", "ja", "ko"}; + //currently extended support available for + // English, German, French and CJK (Chinese, Japanese, Korean) languages only.
+ + To: + +private String[] supportedLanguages= {"en", "de", "fr", "cn", "ja", "ko", "it"}; + //currently extended support available for + // English, German, French, CJK (Chinese, Japanese, Korean), and Italian languages only. + + + + + + + Now open docbook-webhelp/indexer/src/com/nexwave/nquindexer/SaxHTMLIndex.java, + find the code that initializes the stemmer (search for + SnowballStemmer stemmer;), and add a branch that initializes the stemmer object for your language, + as in the example below. The stemmer class names are in + docbook-webhelp/indexer/src/com/nexwave/stemmer/snowball/ext/. + + + initialize correct stemmer based on the <code>webhelp.indexer.language</code> specified + + SnowballStemmer stemmer; + if (indexerLanguage.equalsIgnoreCase("en")) { + stemmer = new EnglishStemmer(); + } else if (indexerLanguage.equalsIgnoreCase("de")) { + stemmer = new GermanStemmer(); + } else if (indexerLanguage.equalsIgnoreCase("fr")) { + stemmer = new FrenchStemmer(); + } else if (indexerLanguage.equalsIgnoreCase("it")) { // If the language code is "it" (Italian), + stemmer = new ItalianStemmer(); // initialize the stemmer to an ItalianStemmer object. + } else { + stemmer = null; + } + + + + + + + + That's all. Now run ant build-indexer to compile and build the Java code. + Then run ant webhelp to generate the output from your DocBook file. + If you have any questions, contact us or email the docbook-apps mailing list at + docbook-apps@lists.oasis-open.org. + +
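As more languages are added, the if/else chain in SaxHTMLIndex.java grows by one branch per language. One alternative design is a lookup table from language code to a stemmer factory. The self-contained Java sketch below shows that idea; it is not the package's code. The Stemmer interface and the toy lambdas are hypothetical stand-ins for SnowballStemmer and its subclasses such as EnglishStemmer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

public class StemmerRegistry {
    // Hypothetical stand-in for the package's SnowballStemmer class.
    interface Stemmer {
        String stem(String word);
    }

    private static final Map<String, Supplier<Stemmer>> FACTORIES = new HashMap<>();

    static {
        // In SaxHTMLIndex.java these would be () -> new EnglishStemmer(), and so on.
        // The lambdas below are toy stemmers for illustration only.
        FACTORIES.put("en", () -> word -> word.endsWith("s") ? word.substring(0, word.length() - 1) : word);
        FACTORIES.put("it", () -> word -> word); // placeholder for an Italian stemmer
    }

    // Case-insensitive lookup, like equalsIgnoreCase in the original chain.
    // Returns null for a language with no stemmer, like the final else branch.
    static Stemmer forLanguage(String code) {
        Supplier<Stemmer> factory = FACTORIES.get(code.toLowerCase(Locale.ROOT));
        return factory == null ? null : factory.get();
    }

    public static void main(String[] args) {
        System.out.println(forLanguage("EN").stem("stemmers")); // stemmer
        System.out.println(forLanguage("ja") == null);          // true
    }
}
```

With this design, supporting a new language means adding one put(...) entry, plus the supportedLanguages change described above. An unknown code still yields null, as in the else branch of the original.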
+
+
+