Publications or ebooks created with HTML Executable come with a built-in search engine that lets end users search the entire publication for specific words or expressions. When compiling your publication or ebook, HTML Executable parses all HTML pages and PDF documents and collects keywords from them. These keywords are then indexed, and the index is stored in the publication's data. Because the keywords are indexed in advance, a search query completes in seconds.
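The indexing step can be pictured as an inverted index: a table that maps each keyword to the pages containing it, so a query becomes a lookup rather than a scan of every page. The Python sketch below is illustrative only. The page names and the simple tokenizer are assumptions, and this is not HTML Executable's actual implementation (which relies on lunr.js).

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (simplified tokenizer)."""
    return re.findall(r"\w+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: keyword -> set of pages containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page, text in pages.items():
        for word in tokenize(text):
            index[word].add(page)
    return dict(index)

def lookup(index: dict[str, set[str]], word: str) -> set[str]:
    """Answer a one-word query with a dictionary lookup; no page is re-read."""
    return index.get(word.lower(), set())

# Hypothetical pages, already reduced to plain text.
pages = {
    "intro.html": "Welcome to the user guide",
    "setup.html": "Setup guide for new users",
}
index = build_index(pages)
print(sorted(lookup(index, "guide")))  # ['intro.html', 'setup.html']
print(len(pages), len(index))          # 2 9  (pages, unique words)
```

The last line computes the same kind of figures the compilation log reports: the number of pages indexed and the number of unique words found.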
The compilation log reports the number of pages and unique words that HTML Executable found while indexing.
End users can access the search panel by clicking the "Search" button or by selecting "Navigate|Show Search" in the application.
Enabling the search engine can increase the size of the output EXE file; the increase depends on the number of HTML pages and PDF documents you compile. If you do not want to include a search engine in your publication, turn on the "Disable the search engine" option.
When a search is complete, the application lists the results. Each result displays the page's title and an extract with the found keyword(s) or expression(s).
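The extract shown under each result can be approximated by cutting a window of text around the first occurrence of the keyword. This Python sketch is an assumption about how such an extract might be built; the window size (`radius`) and the '...' markers are hypothetical choices, not HTML Executable settings.

```python
import re

def extract(text: str, keyword: str, radius: int = 20) -> str:
    """Return the first case-insensitive match of `keyword` with up to
    `radius` characters of context on each side; '...' marks cut text."""
    match = re.search(re.escape(keyword), text, re.IGNORECASE)
    if match is None:
        return ""
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix

page_text = "HTML Executable builds a search index at compile time."
print(extract(page_text, "index", radius=10))  # ... a search index at compil...
```

A real implementation would probably also snap the window to word boundaries so the extract does not end mid-word.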
Search results are automatically sorted by relevance. To rank them, HTML Executable counts the number of occurrences of the search terms in each page and then assigns each page a relevance percentage.
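The occurrence-based ranking described above can be sketched as follows. The exact formula HTML Executable uses is not documented here (lunr.js also computes its own scores), so expressing each page's count as a percentage of the best-matching page is an assumption made for illustration.

```python
import re

def relevance(pages: dict[str, str], terms: list[str]) -> list[tuple[str, int]]:
    """Rank pages by how often the search terms occur in them.

    Assumption: the percentage is each page's occurrence count relative to
    the best-matching page.
    """
    wanted = {t.lower() for t in terms}
    counts: dict[str, int] = {}
    for page, text in pages.items():
        hits = sum(1 for w in re.findall(r"\w+", text.lower()) if w in wanted)
        if hits:
            counts[page] = hits
    if not counts:
        return []
    best = max(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(page, round(100 * hits / best)) for page, hits in ranked]

pages = {
    "a.html": "search engine search index",
    "b.html": "the search panel",
    "c.html": "unrelated page",
}
print(relevance(pages, ["search"]))  # [('a.html', 100), ('b.html', 50)]
```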
Some keywords can be automatically excluded from the index, so searching for them returns no results. In addition to some common words, you can add your own sensitive keywords to the exclusion list: press Add and enter the keyword. Conversely, to remove keywords from the exclusion list, select them and click Remove. You can import and export keyword exclusion lists as XML files with the XML Tools button, which lets you edit them manually in any XML editor.
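Conceptually, the exclusion list is a filter applied while keywords are collected, which is why excluded words never produce results. In this Python sketch the default word list is hypothetical; it does not reproduce the list that ships with HTML Executable, and the set operations merely mimic the Add and Remove buttons.

```python
import re

# Hypothetical default exclusion list (not HTML Executable's actual list).
DEFAULT_EXCLUDED = {"the", "a", "and", "of"}

def index_keywords(text: str, excluded: set[str]) -> set[str]:
    """Return the keywords that would be indexed, skipping excluded words."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in excluded}

excluded = set(DEFAULT_EXCLUDED)
excluded.add("confidential")  # similar to pressing Add
excluded.discard("and")       # similar to selecting a keyword and clicking Remove

keywords = index_keywords("The report and the confidential annex", excluded)
print(sorted(keywords))  # ['and', 'annex', 'report']
```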
HTML Executable's search engine can restrict indexing to the content of a specific HTML tag. By default, it indexes the content of the 'body' tag. However, if your pages use a template with various frames in which the same words appear on every page of your website, those words can skew the search results. To avoid this, specify the tag that contains the content unique to each page. For instance, if the content of your pages sits inside a 'div' HTML tag with the ID 'content', enter `<div id="content"` in HTML Executable. HTML Executable then indexes only the content between the `<div id="content">` tag and its matching closing `</div>` tag, which makes search results more accurate and focused.
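The scoped indexing above amounts to collecting text only between the tag whose opening text starts with the given prefix and its matching closing tag. Because nested elements of the same name (a 'div' inside the content 'div') must not end the scope early, the sketch counts nesting depth. It uses Python's standard `html.parser` module and is an illustration under these assumptions, not HTML Executable's parser. In particular, it assumes the prefix must match the raw opening tag exactly, including case and spacing.

```python
from html.parser import HTMLParser

class ScopedTextExtractor(HTMLParser):
    """Collect text only inside the first element whose raw start tag begins
    with `prefix` (e.g. '<div id="content"'), up to its matching end tag."""

    def __init__(self, prefix: str):
        super().__init__()
        self.prefix = prefix
        self.tag = None    # name of the scoped element once found, e.g. 'div'
        self.depth = 0     # nesting depth of that tag name inside the scope
        self.done = False  # True once the matching end tag has been seen
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            # Compare against the start tag exactly as written in the source.
            if (self.get_starttag_text() or "").startswith(self.prefix):
                self.tag, self.depth = tag, 1
        elif tag == self.tag:
            self.depth += 1  # nested element with the same name

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

def scoped_text(html: str, prefix: str) -> str:
    parser = ScopedTextExtractor(prefix)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())  # normalize whitespace

page = """<body>
<div id="menu">Home About Contact</div>
<div id="content"><h1>Search</h1><div class="note">Indexed text</div></div>
<div id="footer">Copyright</div>
</body>"""
print(scoped_text(page, '<div id="content"'))  # Search Indexed text
```

The menu and footer text, which would repeat on every page of a template, never reaches the index.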
The search engine is Unicode-enabled. When parsing HTML pages, HTML Executable takes into account the encoding format and the charset defined in each HTML document. All keywords are converted to and stored in UTF-8.
If an HTML file does not define a charset, you can specify the default HTML charset to use (UTF-8 by default).
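The charset handling above boils down to decoding each file with its declared charset, falling back to a default, and re-encoding the text as UTF-8. The regular expression used here to find the declared charset is a deliberate simplification (browsers follow a more involved encoding-sniffing algorithm), and the function name is hypothetical.

```python
import re

# Simplified detection of <meta charset="..."> or <meta ... content="...; charset=...">.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)

def to_utf8(raw: bytes, default_charset: str = "utf-8") -> bytes:
    """Decode an HTML file with its declared charset (or the default charset
    when none is declared), then re-encode the text as UTF-8."""
    match = META_CHARSET.search(raw)
    charset = match.group(1).decode("ascii") if match else default_charset
    return raw.decode(charset).encode("utf-8")

latin1_page = '<meta charset="iso-8859-1"><p>Café</p>'.encode("iso-8859-1")
print(to_utf8(latin1_page).decode("utf-8"))  # <meta charset="iso-8859-1"><p>Café</p>
```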
HTML Executable uses lunr.js as its search engine. According to its documentation:
At the most basic level, a search query can consist of a single term, such as 'hello'. A query can also contain multiple terms, which are combined with an OR operator. For example, the query 'hello world' retrieves documents that contain either 'hello' or 'world', and documents that contain both rank higher.
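The OR behavior can be illustrated with a toy ranking in which a document matches if it contains any query term and scores one point per distinct term it contains. lunr.js computes a more elaborate relevance score, so treat this Python sketch only as a picture of why documents containing both terms come first.

```python
import re

def or_search(pages: dict[str, str], query: str) -> list[str]:
    """Return pages containing ANY query term, pages that contain more
    distinct terms first (ties broken by page name)."""
    terms = set(query.lower().split())
    scored = []
    for page, text in pages.items():
        matched = len(terms & set(re.findall(r"\w+", text.lower())))
        if matched:
            scored.append((matched, page))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page for _, page in scored]

pages = {
    "a.html": "hello there",
    "b.html": "hello world",
    "c.html": "world map",
    "d.html": "nothing here",
}
print(or_search(pages, "hello world"))  # ['b.html', 'a.html', 'c.html']
```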
You can add wildcards ('*') to terms to stand for any number of unspecified characters. A wildcard can appear anywhere within a term, and a term can contain more than one wildcard. While wildcards broaden the range of documents found, they can slow queries down, especially when a wildcard is placed at the beginning of a term.
By default, HTML Executable starts searching as soon as the end user types a query, and it always appends a wildcard to the end of the query entered.
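Wildcard matching and the default search-as-you-type behavior can be sketched by translating '*' into a regular expression. This Python version simply scans a small vocabulary. Index structures organized by prefix can quickly narrow down terms that begin with known characters, but not terms that begin with a wildcard, which is one plausible reason leading wildcards are slower. The function names are hypothetical.

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a term in which '*' stands for any run of characters
    (including none) into an anchored regular expression."""
    parts = (re.escape(part) for part in term.lower().split("*"))
    return re.compile("^" + ".*".join(parts) + "$")

def match_terms(vocabulary: list[str], term: str) -> list[str]:
    pattern = wildcard_to_regex(term)
    return [word for word in vocabulary if pattern.match(word)]

def as_you_type(vocabulary: list[str], typed: str) -> list[str]:
    """Mimic the default behavior: a wildcard is appended to the typed text."""
    return match_terms(vocabulary, typed + "*")

vocabulary = ["help", "hello", "yellow", "shell", "world"]
print(match_terms(vocabulary, "*ell*"))  # ['hello', 'yellow', 'shell']
print(as_you_type(vocabulary, "hel"))    # ['help', 'hello']
```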
You can limit terms to specific fields. For instance, with 'title:hello', only documents with 'hello' in the title field will match. Using a field that isn't in the index will result in an error.
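A 'field:term' query restricts matching to one field and fails when the field is unknown. In the Python sketch below, the document fields 'title' and 'body' and the error message are assumptions for illustration; the sketch does not reproduce lunr.js's parser or its exact error.

```python
def field_search(docs: list[dict[str, str]], query: str) -> list[dict[str, str]]:
    """Answer a 'field:term' query, raising an error for a field that is
    not part of the index (mirroring the behavior described above)."""
    field, sep, term = query.partition(":")
    if not sep:
        raise ValueError("expected a query of the form 'field:term'")
    indexed_fields = set().union(*(doc.keys() for doc in docs))
    if field not in indexed_fields:
        raise ValueError(f"unrecognised field '{field}'")
    term = term.lower()
    return [doc for doc in docs if term in doc.get(field, "").lower().split()]

# Hypothetical documents with 'title' and 'body' fields.
docs = [
    {"title": "Hello guide", "body": "Getting started"},
    {"title": "Setup", "body": "hello again"},
]
print([d["title"] for d in field_search(docs, "title:hello")])  # ['Hello guide']
try:
    field_search(docs, "author:hello")
except ValueError as err:
    print(err)  # unrecognised field 'author'
```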
HTML Executable's search facility supports modifiers for terms, including edit distance and boost. Boosting a term (e.g., 'foo^5') increases the ranking of documents that match that term. Edit distance enables fuzzy matching: for example, 'hello~2' matches documents containing terms within an edit distance of 2 of 'hello'. To keep queries fast, avoid large edit distance values.
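Both modifiers can be illustrated with a small Python sketch. Fuzzy matching is measured here with the classic Levenshtein distance (insertions, deletions, substitutions); lunr.js's own fuzzy matching may count edits differently, for example by also treating a transposition as a single edit. Scoring a term as "matching words times boost" is a simplification chosen for clarity, and only the order word, '~N', '^N' is parsed.

```python
import re

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

# 'word', optional '~distance', optional '^boost' (in that order, for simplicity).
TERM = re.compile(r"^(\w+)(?:~(\d+))?(?:\^(\d+))?$")

def term_score(query_term: str, words: list[str]) -> int:
    """Count words within the allowed edit distance, multiplied by the boost."""
    match = TERM.match(query_term)
    if match is None:
        raise ValueError(f"unsupported term: {query_term!r}")
    word = match.group(1).lower()
    max_distance = int(match.group(2) or 0)
    boost = int(match.group(3) or 1)
    hits = sum(1 for w in words if edit_distance(word, w) <= max_distance)
    return hits * boost

words = ["hello", "jello", "help", "world"]
print(term_score("hello~1", words))                # 2  (hello, jello)
print(term_score("hello~2", words))                # 3  (adds help)
print(term_score("foo^5", ["foo", "bar", "foo"]))  # 10
```

Each extra unit of edit distance admits many more candidate words, which is why large values slow queries down.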
Terms can have a presence modifier. By default, the presence of a term in a document is optional, but you can make it required or prohibited. Prefix the term with '+' to require its presence (e.g., '+foo bar' searches for documents that must contain 'foo' and may contain 'bar'). Prefix with '-' to prohibit its presence (e.g., '-foo bar' searches for documents that cannot contain 'foo' but may contain 'bar').
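The '+' and '-' presence modifiers can be sketched as set tests on each document's words. The rule implemented here is a simplified reading of the description above, not lunr.js's exact algorithm: a document must contain every required term and no prohibited term, and when there are no required terms it must contain at least one plain term (unless the query has no plain terms at all).

```python
def presence_search(docs: dict[str, set[str]], query: str) -> list[str]:
    """Apply '+' (required) and '-' (prohibited) presence modifiers."""
    required, prohibited, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            prohibited.add(token[1:])
        else:
            optional.add(token)
    results = []
    for name, words in docs.items():
        if not required <= words or prohibited & words:
            continue  # a required term is missing, or a prohibited term is present
        if required or not optional or optional & words:
            results.append(name)
    return results

docs = {"a": {"foo", "bar"}, "b": {"bar"}, "c": {"foo"}, "d": {"baz"}}
print(presence_search(docs, "+foo bar"))  # ['a', 'c']
print(presence_search(docs, "-foo bar"))  # ['b']
```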
To escape special characters, use the backslash ('\'). This lets you include characters in searches that would otherwise be interpreted as modifiers. For instance, 'foo\~2' searches for the term "foo~2" instead of applying an edit distance of 2 to the search term "foo".
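The escaping rule can be shown with a tiny tokenizer that treats an unescaped '~' or '^' as the start of a modifier and a backslash as "take the next character literally". This Python sketch is hypothetical and covers only this one rule, not the full lunr.js query syntax.

```python
from typing import Optional

def split_term(term: str) -> tuple[str, Optional[str]]:
    """Split a query term into (literal text, modifier).

    An unescaped '~' or '^' starts a modifier such as '~2' or '^5';
    a backslash makes the following character part of the literal text.
    """
    text: list[str] = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "\\" and i + 1 < len(term):
            text.append(term[i + 1])  # escaped character is kept literally
            i += 2
        elif ch in "~^":
            return "".join(text), term[i:]
        else:
            text.append(ch)
            i += 1
    return "".join(text), None

print(split_term("foo~2"))    # ('foo', '~2')
print(split_term(r"foo\~2"))  # ('foo~2', None)
```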
When the end user opens a page from the search results, the keywords that were searched for can be highlighted. Keywords are highlighted in PDF documents too.
C:\Program Files (x86)\HTML Executable 2023\Resources\Chromium\
However, only experienced users should attempt these modifications. Make backups before changing anything so that you do not lose the original files or data.
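The keyword highlighting mentioned above can be pictured as wrapping each match in a marker element. This Python sketch is not how HTML Executable highlights keywords; the `<mark>` element and the whole-word, case-insensitive matching are assumptions. It escapes the text first so that characters such as '&' cannot break the resulting HTML.

```python
import html
import re

def highlight(text: str, keywords: list[str]) -> str:
    """HTML-escape `text`, then wrap whole-word, case-insensitive matches
    of the keywords in <mark> elements."""
    escaped = html.escape(text)
    if not keywords:
        return escaped
    alternatives = "|".join(re.escape(html.escape(k)) for k in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(r"<mark>\1</mark>", escaped)

print(highlight("Search the index & search again", ["search"]))
# <mark>Search</mark> the index &amp; <mark>search</mark> again
```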
If you get an "out of memory" error while compiling your publication, try enabling the "Keep the search index data outside the EXE file" option available in Output Format. The error means that compilation has reached the memory limit available to 32-bit programs (2 GB). In that case, HTML Executable cannot keep your search index in memory and must store it as a file on the hard disk instead.