Offline Search Support
Overview
One major disadvantage of offline browsing is that users cannot use the features of the TD search engine, such as full-text or faceted search. However, some document formats may want one or more specially formatted index files to enable search in the archive's viewer.
The PackageDocument object has a textContents property that can be used to extract the textual contents of the file without any markup or styling, primarily for building a full-text index.
Building the search index
The Freemarker template packagers/common/search-index.ftl prepares lunr.js indexes (in multiple languages) of the text content in package documents.
In addition to the normal package variables, the following Freemarker variables are used to communicate with the template process.
- docsToIndex: A sequence of package documents to index. It can be used to exclude certain package documents, such as large PDF documents, from indexing. Defaults to the list of package documents.
- lunrLangScripts: A sequence of lunr-languages source file names that must be loaded in the browser to support searching.
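How the packager fills lunrLangScripts is not described here. As a rough illustration of what the sequence might contain, the hypothetical helper lunrScriptsFor below derives file names from a list of indexed language codes. It assumes the lunr-languages naming scheme (a shared lunr.stemmer.support.js plus one lunr.<code>.js per language) and that English, which lunr.js supports natively, needs no extra file. Some languages, such as Japanese, also need a separate tokenizer file, which this sketch ignores.

```javascript
// Hypothetical helper (not part of the TD packager): list the lunr-languages
// files a browser would need for the given indexed language codes.
// Assumes the lunr-languages naming scheme lunr.<code>.js plus the shared
// lunr.stemmer.support.js; English is built into lunr.js itself.
function lunrScriptsFor(langCodes) {
  // De-duplicate while keeping first-seen order, and drop English.
  const langs = [...new Set(langCodes)].filter((code) => code !== "en");
  if (langs.length === 0) {
    return []; // English only: lunr.js alone is enough
  }
  // The stemmer support file must be loaded before any language file.
  return ["lunr.stemmer.support.js", ...langs.map((code) => `lunr.${code}.js`)];
}

// Returns ["lunr.stemmer.support.js", "lunr.de.js", "lunr.fr.js"]
const exampleScripts = lunrScriptsFor(["en", "de", "fr", "de"]);
```

Returning an empty list for English-only collections keeps the page from loading stemmer support it never uses.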
This template uses <@package.generate> to load the required libraries from lunr.js and lunr-languages into the Nashorn script engine, then runs the lunr indexer to build search indexes for the specified package documents. For each document to be indexed:
- Identify the document language from the document metadata locale field (preferred) or the lang field.
- Determine whether the language can be indexed. Since lunr-languages does not handle regional variants, only the 2-letter language code is considered. For Chinese, only codes indicating Simplified Chinese (zh-CN, zh-SG, zh-HANS) are handled.
- Group documents by language, for more efficient indexing.
- Load the JavaScript source files needed to handle each language.
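The language steps above can be sketched as follows. This is a minimal, hypothetical version: the locale and lang property names, the accepted-language set, and the functions indexLanguage and groupByLanguage are assumptions for illustration, not the code in search-index.ftl or search-index.js.

```javascript
// Hypothetical sketch of the per-document language steps.
// The field names `locale` and `lang` and the SUPPORTED set are assumptions
// for illustration, not the packager's actual data model.
const SUPPORTED = new Set(["en", "de", "fr", "es", "it", "ja", "zh"]);
const SIMPLIFIED_CHINESE = new Set(["zh-cn", "zh-sg", "zh-hans"]);

// Returns the 2-letter code to index the document under, or null if its
// language cannot be indexed.
function indexLanguage(doc) {
  // Prefer the locale field, fall back to lang; accept "en_US" or "en-US".
  const tag = (doc.locale || doc.lang || "").replace(/_/g, "-").toLowerCase();
  if (!tag) return null;
  const primary = tag.split("-")[0];
  if (primary === "zh") {
    // Chinese is handled only for tags that indicate Simplified Chinese.
    return SIMPLIFIED_CHINESE.has(tag) ? "zh" : null;
  }
  // Regional variants are dropped: "en-US" and "en-GB" both index as "en".
  return primary.length === 2 && SUPPORTED.has(primary) ? primary : null;
}

// Groups indexable documents by language; documents in unhandled languages
// are omitted.
function groupByLanguage(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const lang = indexLanguage(doc);
    if (lang === null) continue;
    if (!groups.has(lang)) groups.set(lang, []);
    groups.get(lang).push(doc);
  }
  return groups;
}
```

For example, documents with locale en_US and lang EN land in the same "en" group, while a document tagged zh-TW is omitted because the tag does not indicate Simplified Chinese.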
After the documents are grouped by language (omitting documents in unhandled languages), the indexes are built. The indexing function is defined in packagers/scripts/server/search-index.js.
The result of the indexing process is a JavaScript snippet that defines a structure containing the lunr.js indexes (in all applicable languages) and minimal document metadata to support navigation to documents returned by searches. This snippet is printed to the template content.
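The exact structure of the snippet is defined by search-index.js. The sketch below only illustrates the general idea under assumed names: buildSearchSnippet, searchData, and the id/title/href metadata fields are all hypothetical. If the bundled lunr.js is a 2.x release, an index object serializes through JSON.stringify (it defines toJSON) and can be restored in the browser with lunr.Index.load; plain objects stand in for indexes here so the sketch runs without lunr.

```javascript
// Hypothetical sketch: wrap per-language indexes and minimal document
// metadata in a JavaScript snippet that a page can load with a <script>
// element. The variable name `searchData` and the metadata fields are
// invented for illustration.
function buildSearchSnippet(indexesByLang, docs) {
  const data = {
    // One serialized index per language code, e.g. { en: {...}, de: {...} }.
    indexes: indexesByLang,
    // Only what a results list needs to link to a document; the full text
    // is not repeated here.
    docs: docs.map((d) => ({ id: d.id, title: d.title, href: d.href })),
  };
  // Escape "<" so a value containing "</script>" cannot end the element early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return "var searchData = " + json + ";\n";
}
```

Keeping only navigation metadata in the snippet keeps the file small; the document text itself is only needed at indexing time and, for snippets, in the documents the browser already has.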
Indexing a large multilingual document collection can take several minutes. Since TD version 4.2, indexing progress is updated in the package status dialog. (Before 4.2, the status dialog shows only the start of indexing.) Indexing Simplified Chinese documents is particularly slow.
Browser searching
The utility template helpers/searchHelpers.ftl writes the required <script> elements into the HTML page heads and adds a search input control and a page division for displaying search results.
The script scripts/browser-search.js executes the search and formats the results. Each result shows a snippet of the document with the search hits highlighted.
Because lunr.js often returns only the stemmed token that scored the hits, the script tries to find complete words in the document that match the token. This process is not completely accurate, but it normally shows several valid variants of the stemmed token that produced the hit. In some cases no match is found, and the snippet is empty.
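The word-matching step can be approximated as below. This is a hypothetical sketch, not the logic in browser-search.js: it treats any word that begins with the stemmed token as a variant, which over-matches in some cases and misses irregular forms in others, and it returns an empty string when no whole word is found, mirroring the empty-snippet case.

```javascript
// Hypothetical sketch: expand a stemmed token (as returned by lunr.js) to the
// whole words in the text that start with it, then build an HTML snippet with
// those words wrapped in <mark>.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Matches words (Unicode letters/digits) that begin with the stem.
function stemRegExp(stem) {
  return new RegExp(
    "(?<![\\p{L}\\p{N}])" + escapeRegExp(stem) + "[\\p{L}\\p{N}]*",
    "giu"
  );
}

// Distinct whole words in the text that start with the stem.
function findWordsForStem(text, stem) {
  if (!stem) return [];
  return [...new Set(Array.from(text.matchAll(stemRegExp(stem)), (m) => m[0]))];
}

// Snippet around the first matching word, with every matching word that lies
// fully inside the window highlighted. Returns "" when nothing matches.
function highlightSnippet(text, stem, radius = 40) {
  if (!stem) return "";
  const matches = [...text.matchAll(stemRegExp(stem))];
  if (matches.length === 0) return "";
  const first = matches[0];
  const start = Math.max(0, first.index - radius);
  const end = Math.min(text.length, first.index + first[0].length + radius);
  let out = "";
  let pos = start;
  for (const m of matches) {
    const mEnd = m.index + m[0].length;
    if (m.index < start || mEnd > end) continue; // word not fully in window
    out += escapeHtml(text.slice(pos, m.index)) + "<mark>" + escapeHtml(m[0]) + "</mark>";
    pos = mEnd;
  }
  return out + escapeHtml(text.slice(pos, end));
}
```

For example, with the stem "connect", the text "Connected devices connect via connections." yields highlighted Connected, connect and connections, while "reconnect" would not match because the stem must start the word.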