Search Engine

Applies to HTML Viewer and IE Browser publications.

About the built-in search engine

Publications or ebooks created with HTML Executable come with a built-in search engine that allows end users to search for specific words or expressions through the entire publication in seconds. When compiling your publication or ebook, HTML Executable parses all HTML pages and PDF documents, collecting keywords from them. These keywords are then indexed and the result is stored in the publication’s data. Since keywords are indexed, it only takes seconds for a search query to be completed.

In the compilation log, HTML Executable reports the number of pages and unique words that were found during indexing.
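The indexing step described above can be pictured as building an inverted index, that is, a table mapping each keyword to the pages that contain it. The following Python sketch is only an illustration of that general technique, not HTML Executable's actual implementation; the build_index function, the sample page names and the simple \w+ word splitting are all assumptions made for the example.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercased keyword to the set of page names containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        # Assumption: a keyword is any run of letters, digits or underscores.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index

# Hypothetical sample pages (name -> extracted text).
pages = {
    "intro.htm": "Red apple and green apple",
    "pears.htm": "A green pear",
}
index = build_index(pages)

# Comparable to the figures reported in the compilation log.
print(len(pages), "pages,", len(index), "unique words")
# A lookup is a single dictionary access, which is why queries are fast.
print(sorted(index["green"]))
```

Because the lookup cost does not depend on the length of the pages, most of the work is done once at compilation time rather than at each query.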

End users can access the search panel by clicking the “Search” button or by selecting “Navigate|Show Search” in the application. You can also use HEScript commands to integrate the search engine into your HTML pages: see StartFullSearch.

Configuring the search engine

Enabling the search engine can result in a larger publication file (how much larger depends on the number of HTML pages and PDF documents you compile). If you do not want to include a search engine in your publication, turn on the following option: “Disable the search engine”.

PDF documents can be indexed too, even if the built-in PDF viewer is deactivated.

When a search is complete, the publication lists the results. Each result shows the page’s title and a URL that end users can click to open the page. If you prefer to keep your URLs secret, you can hide page URLs from the search results. In this case, a “(click)” URL is displayed instead.

Search results are automatically sorted by relevance. To do this, HTML Executable counts the number of occurrences of the search terms in each page, then assigns a percentage of relevance. If you do not wish to display this relevance data, you can set the HESearchRelevanceNoDisplay global variable’s value to 0. Pages are still sorted by relevance in that case.
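As a rough illustration of relevance sorting, the Python sketch below counts how often the search terms occur in each page and converts the counts to percentages. How HTML Executable actually derives its percentage is not documented here; scaling against the best-matching page is an assumption, as are the rank_by_relevance function and the sample pages.

```python
import re

def rank_by_relevance(pages, terms):
    """Return (page, percent) pairs sorted from most to least relevant."""
    terms = [t.lower() for t in terms]
    counts = {}
    for name, text in pages.items():
        words = re.findall(r"\w+", text.lower())
        hits = sum(words.count(t) for t in terms)
        if hits:
            counts[name] = hits
    if not counts:
        return []
    # Assumption: the page with the most occurrences scores 100%.
    best = max(counts.values())
    ranked = [(name, round(100 * hits / best)) for name, hits in counts.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Hypothetical sample pages (name -> extracted text).
pages = {
    "a.htm": "apple apple apple pie",
    "b.htm": "one apple",
    "c.htm": "no fruit here",
}
print(rank_by_relevance(pages, ["apple"]))
```

Pages without any occurrence are left out of the result list entirely, and hiding the percentage would not change the order of the remaining pages.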

[Screenshot: search]

Some keywords may be automatically excluded from the index, so they return no results if end users search for them. In addition to some common words, you may add your own sensitive keywords to the exclusion list: press Add and specify the keyword. Conversely, you can remove keywords from the exclusion list by selecting them and clicking Remove. Keyword exclusion lists can be imported from and exported to XML files using the XML Tools button, so you can edit them manually in any XML editor.

Finally, if your compiled website uses frames, you may need to specify in which frame a page whose URL was clicked on should be displayed. Use the SearchFrameTarget property to indicate it. This only applies to IE browser publications, however.

Support for Unicode

The search engine is Unicode-enabled. When parsing HTML pages, HTML Executable takes into account the encoding format and the charset defined in HTML documents. All keywords are natively converted and stored in UTF-8 format.

The “Use default word delimiters based on Unicode character categories” option should never be turned off; otherwise, the search engine falls back to the word delimiters defined in the Environment Options.

The “Automatically split CJK characters” (Chinese, Japanese and Korean) option allows an improved search for East Asian languages.
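Splitting matters because Chinese and Japanese text is usually written without spaces, so whitespace-based delimiters would treat a whole sentence as one keyword. The Python sketch below shows one common approach: each CJK character becomes its own token while other text is still split into words. This is an assumption about the general technique, not a description of HTML Executable's internals, and the character ranges used are a simplified subset.

```python
import re

# Assumption: a simplified set of ranges (CJK ideographs, kana, Hangul syllables).
CJK = "\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af"

# Either a single CJK character, or a run of word characters that are not CJK.
TOKEN = re.compile(rf"[{CJK}]|[^\W{CJK}]+")

def tokenize(text):
    """Split text into keywords, emitting every CJK character separately."""
    return TOKEN.findall(text)

print(tokenize("search 搜索引擎 2018"))
```

With per-character tokens, a query for two adjacent characters can match inside a longer run of text that contains no delimiters.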

By default, publications and ebooks offer a search box so that end users can type their query directly into the field, and perform a search by clicking the “Magnifier” icon or pressing ENTER:

[Screenshot: search box]

To hide the search box, turn on the “Do not show the search box” option.

About searches

The search engine supports queries containing logical operators, as major web search engines do: + (AND), - (NOT), OR, * and ? (wildcards) and double quotes.

In particular, “?” stands for exactly one character, whereas the asterisk, “*”, stands for zero or more characters in a keyword.
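The difference between the two wildcards can be illustrated by translating a pattern into a regular expression. The Python sketch below is a minimal model under stated assumptions: matching is case-insensitive, wildcards only stand for word characters, and the wildcard_to_regex and matches helpers are hypothetical names, not part of HTML Executable.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a keyword with * and ? wildcards into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")   # zero or more characters
        elif ch == "?":
            parts.append(r"\w")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, word):
    """True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None

for pattern, word in [("app*", "app"), ("app?", "app"), ("app?", "apps")]:
    print(pattern, word, matches(pattern, word))
```

Note that the asterisk pattern also matches the bare prefix, while the question mark pattern requires one more character to be present.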

Examples:

  • red apple returns pages that contain both red and apple.

  • "red apple" returns pages that contain the exact expression “red apple”.

  • red OR apple returns pages that contain red or apple.

  • red -apple returns pages that contain “red” but do not contain “apple”.

  • app* returns pages that contain any word beginning with app. The wildcard operator can also be placed between characters, like this: char*s. You may use up to 3 wildcards anywhere in your query.
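The behavior of the examples above can be modeled with a small query evaluator. The Python sketch below is an approximation written for illustration only: it assumes that terms are combined with AND by default, that OR separates alternative groups of terms, and it leaves wildcards out. The page_matches function is a hypothetical name and does not reflect how HTML Executable parses queries internally.

```python
import re

# An optional + or - sign followed by a quoted phrase or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def page_matches(query, text):
    """Return True if the page text satisfies the query."""
    text = text.lower()
    words = set(re.findall(r"\w+", text))

    def has(term, is_phrase):
        if is_phrase:
            # The phrase must appear as consecutive whole words.
            return re.search(rf"\b{re.escape(term)}\b", text) is not None
        return term in words

    groups = [[]]  # OR-separated groups of (negated, term, is_phrase)
    for sign, phrase, word in TOKEN.findall(query):
        if word == "OR" and not sign:
            groups.append([])
            continue
        term = (phrase or word).lower()
        groups[-1].append((sign == "-", term, bool(phrase)))

    # A page matches if every term of at least one group is satisfied.
    return any(
        group and all(has(term, is_phrase) != negated
                      for negated, term, is_phrase in group)
        for group in groups
    )

print(page_matches("red -apple", "a red car"))
```

In this model a leading + is accepted but changes nothing, since terms are already required unless they are separated by OR.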

When a page from a search result is opened, keywords that were searched for may be highlighted. You can modify the text style for highlighted words. For PDF documents, keywords are highlighted too.

The HTML Viewer engine supports highlighting one keyword only.

Customizing the display of search results

It is possible to customize how search results are formatted: go to the Application Behavior => Language page, and under Resource Strings, you can modify these four resource strings:

  • SSearchResHTMLTableStart: HTML tags that start the HTML table which will contain the search results.

  • SSearchResHTMLCellFormat: HTML tags that define a single table cell and its contents. The four %s parameters are required: do not remove them (enclose a parameter in an HTML comment <!-- --> if you do not want it to be visible). The 1st %s parameter is replaced by the result’s index; the 2nd one by the page’s title; the 3rd one by the URL that displays the page when clicked; and the last one by either the filename of the page or the “click” word (see above). If you want to use the percent symbol (for example 3%), write it twice: 3%% (so that the formatting routine does not mistake it for a parameter like %s).

  • SSearchResHTMLCellFormatAltern: same as SSearchResHTMLCellFormat, but used for every other search result. Thus, you can get alternating backgrounds for search results (see the screenshot above).

  • SSearchResHTMLTableEnd: HTML tags that end the HTML table.
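To show how the four strings fit together, the Python sketch below assembles a result table from hypothetical templates. Python's % operator stands in for the application's own formatting routine because it handles %s and %% in a similar way; the template values, the render function and the choice of which rows use the alternate format are assumptions, and the real default resource strings may differ.

```python
# Hypothetical values for the four resource strings.
TABLE_START = "<table>"
CELL = ('<tr><td width="100%%">%s. <b>%s</b> '
        '<a href="%s">%s</a></td></tr>')
CELL_ALT = ('<tr><td width="100%%" bgcolor="#eeeeee">%s. <b>%s</b> '
            '<a href="%s">%s</a></td></tr>')
TABLE_END = "</table>"

def render(results):
    """Build the results table from (title, url, label) tuples."""
    rows = []
    for i, (title, url, label) in enumerate(results):
        # Assumption: the alternate format is used for the 2nd, 4th, ... result.
        template = CELL_ALT if i % 2 else CELL
        # Parameter order: index, title, URL, then filename or "(click)".
        rows.append(template % (i + 1, title, url, label))
    return TABLE_START + "".join(rows) + TABLE_END

html = render([
    ("Introduction", "intro.htm", "intro.htm"),
    ("Pears", "pears.htm", "(click)"),
])
print(html)
```

Each doubled %% in a template comes out as a single percent sign, and dropping one of the four %s parameters would make the formatting call fail, which is why the parameters must all be kept.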

Customizing the search query

You can programmatically modify the query of the end user with the UserMain.OnModifySearchRequest HEScript event.

Large search index

If you get an “out of memory” error while compiling your publication, try enabling the “Keep the search index data outside the EXE file” option available in Output Format. The error means that you have reached the memory limit available to 32-bit programs (2 GB). In that case, HTML Executable cannot hold your search index in memory and must store it as a file on the hard disk.


Copyright G.D.G. Software 2018. All rights reserved