A common requirement for technical publications groups is to produce a Web-based help format that includes a table of contents pane, a search feature, and an index, similar to what you get from the Microsoft HTML Help (.chm) format or Eclipse help. If the content is help for a Web application that is not exposed to the Internet or that requires the user to be logged in, public search services such as Google cannot crawl it, so they cannot be used to add search.
Features
Stemming support for English, French, and German. Stemming support can be added for other languages by implementing a stemmer.
Support for Chinese, Japanese, and Korean using code from the Lucene search engine.
Search highlighting shows where the searched-for term appears in the results. A button toggles the highlighting off and on.
Search results can include brief descriptions of the target.
Table of contents pane with collapsible TOC tree.
Auto-synchronization of content pane and TOC.
TOC and search pane implemented without the use of a frameset.
An Ant build.xml file to generate output. You can use this build file by importing it into your own or use it as a model for integrating this output format into your own build system.
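To make the stemming feature above concrete, here is a minimal sketch of why stemming matters for search: index terms and query terms are both reduced to a common stem, so a query for "searched" finds a page that contains "Searching". The suffix stripper below is deliberately naive and is an assumption made for illustration only; it is not the stemmer the webhelp indexer uses, and the names naive_stem, build_index, and search are hypothetical.

```python
from typing import Dict, Set

# Toy illustration only: a naive English suffix stripper, NOT a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order


def naive_stem(word: str) -> str:
    """Lowercase the word and strip one common suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each stemmed word to the set of page names that contain it."""
    index: Dict[str, Set[str]] = {}
    for name, text in pages.items():
        for word in text.split():
            index.setdefault(naive_stem(word), set()).add(name)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Stem the query the same way the index was stemmed, then look it up."""
    return index.get(naive_stem(query), set())


if __name__ == "__main__":
    pages = {
        "ch01.html": "Searching the index",
        "ch02.html": "Installing the tools",
    }
    print(search(build_index(pages), "searched"))
```

The key design point is that the same stemming function is applied at index time and at query time; supporting another language then amounts to swapping in a different stemmer.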
Possible future enhancements
Move webhelp-specific parameters and gentext strings into base DocBook stylesheets.
Use tabindex attributes to control the tab order in the output. The Contents and Search tabs should be first and second, then the search box and button, then the table of contents items, and so on.
Add "Expand all" and "Collapse all" buttons to the table of contents.
Add other search options:
Add an option to use Lucene for server-side searches with table of contents state persisted on the server.
Add a simple form that uses a Google site:my.domain.com based search.
Sort search results based on relevance.
Support wild card characters in the search query.
Parameterize the width of the TOC pane, or make the TOC pane resizable by the user.
Automate search results summary text:
Automatically use the first non-heading content as the summary in the search results.
Automatically limit the size of the search description to something like 140 characters.
Support boolean operators in search.
Parameterize the list of files to exclude from indexing. Currently it's hard-coded that we don't index index.html and ix01.html (the legal notice and index topics). It should be smarter and automatically not index the index file even if it's not named ix01.html.
Improve performance by moving the table of contents div out of each page and into a separate JavaScript file which then adds it to the page.
Add to the indexer the ability to specify a list of files or file patterns not to index. Currently it does not index index.html or ix01.html, which is generally appropriate, but it should be up to the user to decide.
Add an index tab populated by a separate JavaScript file. Include a param/property that allows the content creator to disable the index.
Add functionality to the build.xml file so that when a property is set, the build generates a PDF version of the document and includes a link to it from the header.
Add breadcrumbs so that users can see which topics they have visited.
Consider using more advanced Lucene indexers for Chinese and Japanese than the CJKAnalyzer.
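Two of the enhancements listed above, user-specified exclusion patterns for the indexer and automatic 140-character summaries, can be sketched in a few lines. This is only an illustration of the idea under stated assumptions, not the indexer's actual code: the function names files_to_index and summarize, the use of shell-style patterns, and the "..." truncation marker are all hypothetical choices.

```python
import fnmatch
from typing import Iterable, List

# The two files the indexer currently skips, used here as the default patterns.
DEFAULT_EXCLUDES = ("index.html", "ix01.html")


def files_to_index(
    filenames: Iterable[str],
    exclude_patterns: Iterable[str] = DEFAULT_EXCLUDES,
) -> List[str]:
    """Return the file names that match none of the shell-style exclude patterns."""
    patterns = list(exclude_patterns)
    return [
        name
        for name in filenames
        if not any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    ]


def summarize(text: str, limit: int = 140) -> str:
    """Collapse whitespace and cut the text at a word boundary within `limit` characters."""
    collapsed = " ".join(text.split())
    if len(collapsed) <= limit:
        return collapsed
    # Reserve three characters for the marker, then back up to the last full word.
    cut = collapsed[: limit - 3].rsplit(" ", 1)[0]
    return cut + "..."


if __name__ == "__main__":
    print(files_to_index(["index.html", "ch01.html", "ix01.html", "apa.html"]))
    print(files_to_index(["ch01.html", "ix02.html"], ["ix*.html"]))
    print(summarize("word " * 50))
```

With patterns rather than fixed names, a content creator could exclude an index file that is not named ix01.html, for example by passing "ix*.html".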