Thunderstone Search Appliance Manual

Language Analysis

If Enable is set to Y, walked pages are processed through the Language Analysis Module (LAM), which must be obtained and installed separately. This module helps support searching in languages such as Chinese, Japanese and Korean, where there is often no whitespace to delimit one "word" (logogram, or group of characters) from another, which makes searching difficult. The LAM inserts spaces between words in the text of such pages, so that ordinary non-wildcard searches match better. At search time, users' queries are also passed through the module, so that they can match the processed pages' text.
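The idea of segmenting both the walked text and the query can be illustrated with a small Python sketch. This is not the LAM's algorithm or API: the greedy longest-match approach, the tiny `TOY_DICTIONARY`, and the `segment` and `matches` functions are all hypothetical, chosen only to show why inserting spaces lets ordinary word searches match.

```python
# Toy illustration only: NOT the LAM's algorithm or API.
# TOY_DICTIONARY is a hypothetical word list assumed for this sketch.
TOY_DICTIONARY = {"東京", "大学", "図書館"}
MAX_WORD_LEN = max(len(word) for word in TOY_DICTIONARY)


def segment(text):
    """Insert spaces between words using greedy longest-match."""
    tokens = []
    # Existing whitespace already separates chunks; segment each chunk.
    for chunk in text.split():
        i = 0
        while i < len(chunk):
            # Try the longest candidate first; fall back to one character.
            for length in range(min(MAX_WORD_LEN, len(chunk) - i), 0, -1):
                candidate = chunk[i:i + length]
                if length == 1 or candidate in TOY_DICTIONARY:
                    tokens.append(candidate)
                    i += length
                    break
    return " ".join(tokens)


def matches(query, page_text):
    """Segment both the query and the page, then compare whole words."""
    page_words = set(segment(page_text).split())
    return all(word in page_words for word in segment(query).split())


print(segment("東京大学図書館"))          # 東京 大学 図書館
print(matches("大学", "東京大学図書館"))  # True
```

Because the query goes through the same `segment` step as the page text, a query word lines up with the space-delimited words produced at walk time; without segmentation, the whole page string would be a single unbroken token.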

Language

A two-letter ISO 639 language code "hint" for the LAM. If all or most of the walked data is in a single language, entering that language's code here (e.g. ja for Japanese) will help the LAM process the data better. The default is empty (no hint). Added in Texis version 6.00.1294975881 20110113.

Preserve 7-bit

Whether to preserve the separation of all-7-bit tokens, i.e. tokens made up entirely of 7-bit (ASCII) characters. Sometimes the LAM will separate alphanumeric tokens that are not language words, e.g. part numbers, causing search problems. Setting this to Y will attempt to preserve the separation (or lack thereof) of all-7-bit tokens as they appear in the walked text.


Copyright © Thunderstone Software     Last updated: Mar 21 2023