Offline Search Support

Packagers can include search indexes to support browser- and filesystem-based searching in packages.

Overview

One major disadvantage of offline browsing is that users cannot use the features provided by the TD search engine, such as full-text or faceted search. However, some document formats can use a specially formatted index file (or files) to enable search in the archive's viewer.

The PackageDocument object has a textContents property that can be used to extract the textual contents of the file without any markup or styling, primarily for building a full-text index.

Building the search index

The Freemarker template packagers/common/search-index.ftl prepares lunr.js indexes (in multiple languages) of the text content in package documents.

In addition to the normal package variables, this page uses the following Freemarker variables to communicate with the template process.

docsToIndex
A sequence of package documents to index. This can be used to exclude certain package documents, such as large PDF documents, from indexing. Defaults to the full list of package documents.
lunrLangScripts
A sequence of lunr-languages source file names that must be loaded in the browser to support searching.

This template uses <@package.generate> to load the required libraries from lunr.js and lunr-languages into the Nashorn script engine, then runs the lunr indexer to build search indexes for the specified package documents. The documents are prepared for indexing as follows:

  1. Identify each document's language from its metadata, preferably from the locale field, otherwise from the lang field.
  2. Determine whether the language can be indexed. Since lunr-languages does not handle regional variants, only the 2-letter language code is considered. For Chinese, only codes indicating Simplified Chinese (zh-CN, zh-SG, zh-HANS) are handled.
  3. Group the documents by language, for more efficient indexing.
  4. Load the JavaScript source files needed to handle those languages.
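
The language selection and grouping steps above can be sketched in JavaScript as follows. This is only an illustration under stated assumptions, not the code in search-index.js: the function names, the locale and lang properties on the document objects, and the list of supported languages are all hypothetical. It is written in ES5 style so that it would also be acceptable to an engine such as Nashorn.

```javascript
// Hypothetical set of languages for which an indexer is available.
var SUPPORTED = { en: true, de: true, es: true, fr: true, ja: true, zh: true };

// Codes treated as Simplified Chinese, per the rule described above.
var SIMPLIFIED_CHINESE = { "zh-cn": true, "zh-sg": true, "zh-hans": true };

// Reduce a locale such as "fr_CA" or "zh-Hans" to an indexable 2-letter
// code, or return null when the document's language cannot be indexed.
function indexLanguage(doc) {
  // Prefer the locale field, fall back to lang (assumed field names).
  var raw = String(doc.locale || doc.lang || "").toLowerCase().replace(/_/g, "-");
  var base = raw.split("-")[0];
  if (base === "zh") {
    return SIMPLIFIED_CHINESE[raw] === true ? "zh" : null;
  }
  return base.length === 2 && SUPPORTED[base] === true ? base : null;
}

// Group documents by indexable language, omitting unhandled languages.
function groupByLanguage(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    var lang = indexLanguage(doc);
    if (lang === null) {
      return; // skip documents in unhandled languages
    }
    if (!groups[lang]) {
      groups[lang] = [];
    }
    groups[lang].push(doc);
  });
  return groups;
}
```

With this sketch, documents with locales en_US and en-GB land in the same "en" group, while a zh-TW document is left out of every group.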

After the documents have been grouped by language (omitting documents in unhandled languages), the index is built. The indexing function is defined in packagers/scripts/server/search-index.js.

The result of the indexing process is a JavaScript snippet that defines a structure containing the lunr.js indexes (in all applicable languages) and the minimal document metadata needed to navigate to documents returned from searches. This snippet is printed to the template content.
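
To make the shape of such a snippet concrete, here is a minimal sketch of how one could be assembled. Everything named here is an assumption for illustration: the variable name packageSearchData, the indexes and documents keys, and the id, title and href document fields are hypothetical, and the actual structure emitted by search-index.js may differ. The indexes are assumed to be passed in already serialized, for example via the toJSON method of a lunr index.

```javascript
// Build the text of a script that defines the search data.
// indexesByLang: language code -> serialized lunr index (plain object).
// docs: the documents that were indexed.
function buildSearchSnippet(indexesByLang, docs) {
  var data = { indexes: indexesByLang, documents: {} };
  docs.forEach(function (doc) {
    // Keep only what a results list needs to link to the document.
    data.documents[doc.id] = { title: doc.title, href: doc.href };
  });
  // Escape "<" so the text stays safe inside an inline <script> element.
  var json = JSON.stringify(data).replace(/</g, "\\u003c");
  return "var packageSearchData = " + json + ";";
}
```

In the browser, a serialized index stored this way could be restored with lunr.Index.load before searching.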

Indexing a large multilingual document collection can take several minutes. Since TD version 4.2, indexing progress is updated in the package status dialog. (Prior to 4.2, only the start of indexing was shown in the status dialog.) Indexing Simplified Chinese documents is particularly slow.

Browser searching

The utility template helpers/searchHelpers.ftl writes the required <script> elements into the HTML page heads, and adds a search input control and a page division for displaying search results.

The script scripts/browser-search.js executes the search and formats the results. The results show a snippet of the document with search hits highlighted. Since lunr.js will in many cases return only the stemmed token that scored the hits, the script tries to find complete words that match the token. This process is not completely accurate, but it normally shows several valid variants of the stem token that produced the hit. In some cases no match is found, and the snippet is empty.
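
The idea of expanding a stemmed token back into complete words can be sketched as below. This is not the code in browser-search.js: the function names are hypothetical, and simple prefix matching is assumed as the matching rule. Because a stem is not always a prefix of the original word, this approach can also come back with nothing, which corresponds to the empty snippet case mentioned above.

```javascript
// Return the distinct words in text that start with the stemmed token.
function wordsForStem(text, stem) {
  var seen = Object.create(null);
  var result = [];
  var prefix = stem.toLowerCase();
  text.split(/[^\p{L}\p{N}]+/u).forEach(function (word) {
    var key = word.toLowerCase();
    if (key !== "" && key.indexOf(prefix) === 0 && !seen[key]) {
      seen[key] = true;
      result.push(word);
    }
  });
  return result;
}

function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Return an HTML snippet around the first hit, with matching words wrapped
// in <mark>. radius is the number of pieces (words and separators) kept on
// each side of the first hit. Returns "" when no word matches the stem.
function highlightSnippet(text, stem, radius) {
  var prefix = stem.toLowerCase();
  if (prefix === "") {
    return "";
  }
  // The capturing group keeps the separators in the resulting array.
  var pieces = text.split(/([^\p{L}\p{N}]+)/u);
  var isHit = function (piece) {
    return piece.toLowerCase().indexOf(prefix) === 0;
  };
  var first = -1;
  for (var i = 0; i < pieces.length; i++) {
    if (isHit(pieces[i])) {
      first = i;
      break;
    }
  }
  if (first === -1) {
    return "";
  }
  var start = Math.max(0, first - radius);
  var end = Math.min(pieces.length, first + radius + 1);
  return pieces.slice(start, end).map(function (piece) {
    var safe = escapeHtml(piece);
    return isHit(piece) ? "<mark>" + safe + "</mark>" : safe;
  }).join("");
}
```

For the stem "index", this sketch treats "Indexing", "indexer" and "index" as variants of the same hit and highlights each of them where they fall inside the snippet window.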