Thunderstone Search Appliance Manual
Contents
Overview
Features
Technical Support
Installation
How to unpack and install the Search Appliance
Console Menu
Front Panel LCD
Customizing the Search Appliance's Appearance
Operation
Running the Administrative Interface
First Time Run: Quick Start
Step 1: Create an Account
Step 2: Create a Profile
Step 3: Walk the Profile
Last Step: Search
Administrative Interface Overview
Basic Walk Settings
All Walk Settings
Search Settings
List/Edit URLs
Browse URLs by Folder
List Duplicates
Test Fetch
Test Search
Query Log
Replication Tools
Results Cache
SOAP Tools
Integration Tools
Best Bet Groups
Status
Search
Profiles
Dashboard
System
System Information
Document Usage Overview
Log Viewer
Test Network and Servers
Task Monitor
Thesaurus
Client Certificates
Static Content
DBWalker
Connectors
Network Shares
OneBox Providers
System Wide Settings
AWS Tools
Update Software
RAID Array Management
SSL/HTTPS Certificates
Webmin System Management
Backup Appliance Settings
Restore Appliance Settings
System Replication Queue
System Replication Target Status
Accounts & Groups
Access Control Lists
Extra Downloads
Upload Thunderstone Updates Manually
Support Connection
Support Command
Repair Tools
Check Version Upgrade Actions
Re-output XSL files
Re-schedule walks
Docs
Basic Walk Settings
Walk Summary
Notes
Base URL(s)
Robots
Robots Crawl-delay
Allow Extensions
Exclude Extensions
Exclusions
Walk Delay
Parallelism
Verbosity
Disable Starting Walks
Rewalk Type
New
Refresh All
Refresh
Singles Only
Rewalk Type Summary Table
Rewalk Schedule
Action Buttons
Advanced Walk Settings
Watch URL
End of Walk Email
Attach Logs
Categories
Categories Type
DBWalker
URL File
URL URL
Single Page
Page File
Page URL
Strip Queries
Keep Query Vars
Ignore Query Vars
Sort Query Vars
Lower Query Var Values
Ignore Case
Host Aliases
Host Aliases from robots.txt
Extra Domains
Extra Networks
Extra URLs REX
Exclusion REX
Exclusion Prefix
RSS Feeds
Exclude by Field
Additional Fields
Data from Field
Data From Field Example - Using Description for Title
Data From Field Example - Using PublishDate for Last Modified Date
Data From Field Example - Grabbing Price from Meta
Data From Field Example - Grabbing Price from Text
Data From Field Example - Subfetch to use PDF Contents for a Web Page
Required REX
Required Prefix
Max Page Size
Max Pages
Max Bytes
Max Depth
Max URL Size
Max Requests
Max Connection Lifetime
Page Timeout
Meta Tags
Standard Meta
All Meta
Storage Charset
Source Default Charset
XML UTF-8
Keep Links
Remove Common
Ignore Selectors
Ignore HTML Strings
Keep Selectors
Keep HTML Strings
Ignore Characters
Plugin Split
Language Analysis
CJK Mode
Unknown File Formats
PDF Title Action
Word Definition
Text Search Mode
Attribute Compare Mode
Index Fields
Compound Index Fields
Extra Indexes
Spell-check Dictionaries
Primer Type
Primer URLs
Submitting the Form Directly: Custom Primer URL
Filling Out the Form: Custom Primer Variables
Checking for Bad Logins: Bad Login MM Query
Multiple Primers: Base URL MM Query
Following additional links with the !FOLLOW_LINK token
Unprimer URLs
Submitting the Form Directly: Custom Unprimer URL
Filling Out the Form: Custom Unprimer Variables
Checking for Bad Logins: Bad Login MM Query
Multiple Unprimers: Base URL MM Query
Following additional links with the !FOLLOW_LINK token
Login Info
Proxy Auto-Config URL
Proxy
Proxy Login Info
Client Certificate
Cookie Source Path
Cookie Jar
Strict Cookie Paths
Off-Site Pages
Off-Site Components
Stay Under
Prevent Duplicates
Respect Canonical URLs
Duplicate Check Fields
Store Refs
Inline Iframes
Max Components
Execute JavaScript
Fetch JavaScript
JavaScript String Links
Debug JavaScript
JavaScript Memory
JavaScript Timeout
AJAX Crawlable URLs
Walk Trace Settings
Audit Log
Performance Logging
Batch Locks
URL Protocols
HTTP Version
SSL Client Protocols
SSL Client Ciphers
SSL Use SNI
SSL Allow Unsafe Renegotiation
IP Protocols
Network Share Access Method
Network Share Protocols
File URL Get Owner Headers
Authentication Schemes
Embedded Security
Body Storage Method
Multiple Fetches
Follow Cross-Site Links
Max Redirects
Empty Form Redirects
Execute Walked Dataload
Index Name
DNS Mode
User Agent
Robots.txt Agents
Mime Types
Custom Headers
Respect Expires Header
Cache Content
Default Refresh Time
Minimum Refresh Time
Maximum Refresh Time
Maximum Process Size
Always Refresh Listing Page
Maximum Load Average
Replication Settings
Send Data
Send Settings
Batch Rows
Batch Size
Batch Idle
Log Replication
Search Settings
Notes
Query Logging
Rotate Schedule
Email
Result Order
Results Style
Allow RSS
Format XSL Output
XSL File
Abstract Style
Abstract Length
Max Title Length
Max URL Display Length
Results per Page
Max User Results per Page
Page Links Shown
Results per Site
Allow site: syntax
Allow link: syntax
Results Width
Box Color
Show File Icons
Show Thunderstone logo on results
Show Advanced Search
Query Autocomplete
Max Completions
Results Highlighting
Context Highlighting
PDF Query Highlighting
PDF Highlighting Format
Font
Display Charset
Top HTML and Bottom HTML
Enable Sherlock
Best Bet Match Mode
Top Best Bet Title
Right Best Bet Title
Top Best Bet Group
Right Best Bet Group
Top Best Bet Box Color
Right Best Bet Box Color
Top Best Bet Border Style
Right Best Bet Border Style
Right Best Bet Box Width
Authorization Method
Login Cookies
Login URL
Additional CAS Setup
Basic/NTLM/file Cookie Type
Login Verification URL
Authorization Target
Unauthorized Result Query
Username Fixup
Examples
Max Docs to Auth-Check
Successful Auth Result Limit
Total Auth Timeout
Allow Authorization URL
Authorization Caching
Authorization Debug Log
Show Authorization Info
Enable Spell Check
Suggest Time Limit
Number of Suggestions
Synonyms
Main Thesaurus
Secondary Thesaurus
Translate Boolean
Quotes for Literal
Allow the @ Operator
Allow Linear
Allow "NOT" Logic
Allow Post-Processing
Allow Wildcards
Allow Leading Wildcards
Single-Word Wildcards
Allow WITHIN Operators
Require All Words
Resolve Phrase Noise Words
Phrase Word Processing
Keep Noise Words
Noise List
Search Timeout
Show Error Messages
Debug SQL Level
Debug Metamorph Level
Search Trace Settings
Fast Result Counts
Proximity
Language Characters
Word Forms
Custom Suffix List
Custom Suffix Default Removal
Custom Suffix Min Length
Word Ordering
Word Proximity
Database Frequency
Document Frequency
Position in Text
Depth in Site
Date Bias
Ranked Rows
XML Export Variables
File URL Format
Redirect Format
Phishing Protection
Prevent Find Similar Fetch
Decode Displayed URLs
Results Caching
Max Cache Entry Age
Max Cache Size
Min Search Time
Visible
System Wide Settings
System Alert Email
Admin Theme
Admin Logo
Home Page
Enter At Search
Default Profile
Favicon.ico
Robots.txt
Cluster Members
API Logging
Task Monitor Logging
Google Connector Logging
Audit Logging
Console Password
OS Login Banner
Admin Banner
Login Expiration
Disable Starting All Walks
Update Software
HTTP Proxy Server
Proxy Username
Proxy Password
System Replication Settings
Allow Receiving
Log All Replication
Enable HTTPS Server
Require HTTPS for Direct Admin
Require HTTPS for Proxy Admin
Admin Access IPs
HTTPS/SSL Protocols
HTTPS/SSL Ciphers
Honor Cipher Order
Enable SNMP service
SNMP Community Name
SNMP Location Value
SNMP Contact Value
SNMP Access IPs
Syslog Forwarding Targets
Administration Interface Options
<title> order
<title> max profile length
Experimental Features
Results Authorization
Results Authorization Walk Settings
Results Authorization Search Settings
Meta Search - Search multiple profiles as one
Profile Creation
Meta Search Walk Settings
Search Settings
Access Control
User Groups
Object hierarchy
Access Control Lists
Determining Effective Rights
Required Rights for Admin Actions
Walk and Search Settings
Starting and stopping a walk
Best Bets
List/Edit URLs
List Duplicates
Walk Status
Query Log
Profiles
Accounts
User Groups
Access Control
Maintenance
Running the Search Interface
Procedures and Examples
Searching your Index
Similarity Searching
Using the Thesaurus Feature
Getting Software Updates
Page Exclusion, Robots.txt, and Meta-robots
Indexing Other Sites
Indexing Individual Pages
Reindexing on a Schedule
Checking for Web Server Errors
Removing Pages from the Database
Troubleshooting missing content URLs
Erasing the Entire Database
Using Multiple Databases
Integrating Search with your Site
Link to the Appliance
Embed a search box
Request XML search results
Issuing a Query Programmatically
Search Parameters
XML Elements in Search Results
Invoking Query Autocomplete
Invoke the search SOAP API
Search Result RSS Feeds
OpenSearch Support
Using Best Bets
Quick Creation
Fully Customized
Using Access Control
Initial Lockdown
Example: User with Complete Control on One Profile
Example: User with Look and Feel Control on All Profiles
Indexing File Servers
Replication
Replication Overview
Procedure - Replicating One Profile
Set up the Sender Profile
Create the Receiver Profile
Procedure - Separate Hot Backup Machine
Configure the Backup Machine
Configure the Main Machine
Synchronize Pre-existing Profiles
Making Backup Live on Main Failure
Using Circular Replication
Setup
Notes and Limitations
Dataload API
Submitting Content
Uploading a binary file
Combining the two: binary files with custom fields
Additional Fields
Refs and Errors
Setting Best Bet Groups
Setting Best Bets
Reply Format
Dataload SOAP API
Additional Fields
Overview
Populating
Sorting
Searching
DBWalker
Overview
Configuration Overview
DBWalker Output Overview
DBWalker Authentication Overview
Obtaining DBWalker
Managing DBWalker
DBWalker Global Options
Managing DBWalker Configurations
Managing DBWalker Stylesheets
Adding Configurations to Profiles
SOAP API
SOAP Overview
SOAP API vs. XML Output
Getting the WSDL
Global vs. per-profile WSDLs
Configuring the SOAP Interface
Dataload SOAP API
C# example project
SOAP Links for Languages
SOAP API search Reference
search
moreLikeThis
matchInfo
showParents
getCompletions
SOAP API dataload reference
dataload
SOAP API admin Reference
login
listProfiles
getDocumentUsageOverview
getProfileStatus
addProfile
deleteProfile
getSettings
setSettings
getQueryLogRaw
pauseWalk
stopWalk
startWalk
getTask
getTasks
getProfileErrors
getProfileLog
setParametricFields
getBestBetGroups
saveBestBetGroup
deleteBestBetGroup
getBestBets
saveBestBets
deleteBestBets
getThesauruses
setThesaurus
deleteThesaurus
Thunderstone ISAPI Proxy Module
Overview
Requirements
Installing the Proxy Module
Post-Install Setup
Grant "Trust for Delegation" to the proxy machine
Configuring Internet Explorer for Passing Credentials
Configuring the Search Appliance
Add the Proxy Machine to Cluster Members
Make the Target Profiles Visible
Enable Results Authorization for the Target Profile
Manually Configuring the Proxy Module
Troubleshooting the Proxy Module Authentication
Review Installation Steps
Machine names and SPNs
DelegConfig Diagnostic Tool
Proxy Module
conf/texis.ini
Section
Auth Proxy
conf/texis.ini
Section
Security Best Practices
Reference
REX Syntax
Expressions
Repetition Operators
RE2 Syntax
\<nomatch\> Syntax
REX Caveats and Commentary
Some Useful REX Expressions
REX Replace Syntax
Supported File Formats
Database and File Usage
Walk Database Tables and Fields
Options Table Fields
Customizing the Search
Customizing the Walker
Search Interface Help
Forming a Query
Query Rules of Thumb
Overview of Query Abilities
Controlling Proximity
Ranking Factors
Keywords Phrases and Wild-cards
Applying Search Logic
Natural Language Query
Using the Special Pattern Matchers
Invoking Thesaurus Expansion
Using Word Forms
Controlling Proximity
Interpreting Search Results
Viewing Match Info
Finding Similar Documents
Showing Document Parents
Third-Party Software
Antiword
Aspell
Catdoc xls2csv
Cole library
iconv
libpst
libxml2
Libxslt
Libexslt
JDBC drivers
Oracle JDBC driver
JTDS JDBC driver
PostgreSQL JDBC driver
MySQL JDBC driver
ppt2html, msg2html
SSL/HTTPS plugin
unrar
unzip
zlib
SpiderMonkey (JavaScript-C) Engine
PDF/anytotx plugin
JANSSON
thttpd - throttling HTTP server
RedHat Linux
CentOS Linux
MagnificPopup
Webmin
Java
OpenSSL RPM
RAID utilities
LCDpoc
GNU General Public License
GNU Lesser General Public License
GNU Library General Public License
Netscape Public License
UnixUtils
PuTTY
MIT Kerberos
Cyrus SASL
Copyright © Thunderstone Software
Last updated: Mar 21 2023