MatchMaker
Features, Specifications, and Requirements
MatchMaker is a high performance fault-tolerant data matching and search
platform. Its unique focus on structured data allows very advanced data handling
capabilities at very high speeds ideal for directories, e-commerce applications,
enterprise security and data quality applications, and much more. MatchMaker is
also a full-featured server application with advanced configuration GUIs, data
connectivity support, connectivity for distributed architectures, benchmarking and
monitoring tools, and advanced APIs.
MatchMaker offers a range of options and extensions. There are also
prepackaged applications developed around its powerful data handling
capabilities.
This document details all its features, options, extensions, and system
requirements.
System Requirements
Hardware requirements
MatchMaker is a memory-intensive application and should run on a
system with DDR RAM and a high FSB clock rate. CPU power is less
important, but should be around 1 GHz on such systems in order
to make the best use of the DDR RAM.
Operating system requirements
Binary support is currently provided for:
Windows NT family: Windows 2000, Windows XP, Windows Server 2003.
Linux: gcc 3.2 on Intel.
Solaris: Solaris 8 and 9 on SPARC.
Customer-specific executables and libraries are available on
demand for all major systems or platforms.
Data compatibility
MatchMaker can be paired with any database or structured data
format. Its proprietary in-memory index and incremental update
capabilities provide complete flexibility.
Indexing features also allow full-text data and unstructured
documents to be indexed for lookups from MatchMaker.
Exorbyte Inc.
7400 SW Barnes rd, Ste 743
Portland OR 97225 - USA
[Link]
T +1 503 616 4007
F +1 503 914 5937
9/12/2008
Unicode and languages
MatchMaker is Unicode compliant and language independent.
The data matching techniques available in MatchMaker rely
primarily on mathematical algorithms that work in all languages.
Transliteration
MatchMaker offers standard support for transliteration from Hangul
(Korean) and from Kanji, Katakana and Hiragana (Japanese) to Roman
character sets. This allows building indexes that can be accessed
alternatively through Katakana, Hiragana or Romaji (Roman) input,
even when the input contains errors. Additional languages can be
added on request, so MatchMaker can be used with Asian-language
interfaces with no changes to existing systems.
Architecture
Scalability of system
The recall engine can be clustered to increase throughput by using
multiple processors on a single machine or multiple machines.
Latency scales well on large datasets, increasing approximately
linearly with data size. Support is included for splitting
very large datasets to reduce memory requirements and latency.
Availability
The system remains available during data update. A drop in
available bandwidth during the update equal to one over the
number of recall engine services deployed is the only measurable
consequence of an index update.
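As a rough illustration of the statement above, the share of throughput that remains during an index update can be computed from the number of recall engine services. This is a minimal sketch that assumes the services are updated one at a time and share the load evenly; the function name is hypothetical and not part of MatchMaker.

```python
from fractions import Fraction


def remaining_capacity(num_services: int) -> Fraction:
    """Fraction of normal throughput left while one service is updating.

    Assumption (not stated in the product documentation): services are
    updated one at a time and each handles an equal share of the load.
    """
    if num_services < 1:
        raise ValueError("at least one recall engine service is required")
    # The drop equals one over the number of deployed services.
    return 1 - Fraction(1, num_services)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "services ->", float(remaining_capacity(n)))
```

Under these assumptions, deploying more recall engine services makes the temporary drop during an update proportionally smaller.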
Error-recovery
There is support for recovery from network errors and any forced
program termination within the run-time system components.
File history
New dataset builds can be reverted to one backup of the
compiled files and runtime configuration.
The files carry a time-stamp ID, and only the most recent files and
runtime configuration are kept available.
Context storage
For constructing powerful interactive multi-field servers (patents
pending).
Dictionary size
Any size of database can be searched. In MatchMaker there is
virtually no limit to the size of the dictionary (your data indexed
by MatchMaker).
Data updates
For data without time sensitive changes, complete data can be
fetched and recompiled. Incremental update is supported through
a difference file mechanism and up to two levels of delta servers
to index the changes while the main index recompiles.
ODBC and CSV Data Import
MatchMaker can import data from ODBC and CSV sources. Data can
thus be imported directly into MatchMaker, either from running
databases or from CSV files, without intermediate extraction steps.
File mapping
File mapping on Windows and Unix systems allows faster loading,
sharing of memory and use by multiple processes.
Thread safety
Thread safety is offered after initialization (optional).
Release Policy
Releases are numbered with major, minor and build components.
Minor numbers indicate feature changes, and major numbers indicate
the introduction of major new features. A change of build number
alone indicates only bug fixes. The minor release interval is
between 5 and 8 months.
MatchMaker Components
exMinistrator
MatchMaker's configuration manager includes wizards, graphical
support, a logical arrangement of parameters in individual
configuration panels, and links to online help that describes all
parameters.
It allows you to:
Create instances of exTractor (see below) components and
schedule the build process:
o exTractor compiles data
o exTractor signals exTributor
o exTributor signals exSight
o exSight fetches data
Install and configure services for exStream, exTributor and
exSight
Stop and start all active services, to allow installation of
software updates
exTractor
exTractor is used to configure the build and recall settings for a
given data source. It provides compiled database storage that
combines all functionality into one consistent data structure, and
it handles more than 30 fields that can each use a different
approximate matching technique.
There are four main branches in the tree view:
Project (etp): contains the links to the other files
Extract (mbc): contains the data source and optional data
extraction settings
Build (mbc): contains the settings needed to compile the data
Session (ses): contains only run-time settings
exTributor
exTributor is a query receptor and dispatcher, required for each
MatchMaker index, that can communicate with a web server
through the MatchMaker API.
exSight
The SESSION exSight engine is the query processor which handles
interactive multi-field search sessions.
The server stores the state of each open search session up to a
licensed limit.
The session has the following parts:
o Input fields with status, modes, etc.
o Field candidates
o Record Suggestions with selectable columns
o Context Selection
All field results have a status with respect to the context (in,
out, unknown)
exSampler
exTractor links to exSampler for quickly testing the settings:
Standalone process
Run locally for testing configuration
exSpector
Interactive test client for each data set (browse the Build or
Runtime system, simulate API calls).
exSpeed and Monitor
Additional benchmarking tool and server performance monitor, with
dynamic analysis and visualization of queries in real time. An
engineer can thus examine search latency. Storage requirements
for the index file and queries can also be analyzed graphically.
This helps developers find bugs such as memory leaks, and much
more.
Optional MatchMaker Components
exStream
A scalability-enabling system that broadcasts between different
portions of data through several instances of exTributor and a
remote client through the MatchMaker API.
MatchMaker Extensions
MatchMaker Search
A specific implementation and configuration of MatchMaker
offering:
Web-based user GUIs allowing searching of data, display of
results, and search engine-style implementations with the
world-class approximate result quality and relevance of
MatchMaker.
SearchNavigator
A specific implementation and configuration of MatchMaker
offering:
Web-based user GUI allowing interactive Web 2.0-style
suggestions. The user gets a number of suggestions in an
interactive JavaScript layer with each character typed,
straight from the full index. This can be used as a search
interface enhancement or as a tool for decision support.
FlexForm
A specific implementation and configuration of MatchMaker
offering:
Web-based user GUI allowing interactive Web 2.0-style
advanced query configuration. The user gets a number of
suggestions in an interactive JavaScript layer with each
character typed. Once the user chooses a suggestion, the query
term is stored under a given query field and suggestions for
the next logical field are offered, again working straight from
the full index. This can be used as a search interface
enhancement, a tool for decision support, a product
configurator, or a simplified combination of advanced and
simple query interfaces.
MatchMaker OCR
A specific implementation and configuration of MatchMaker
offering:
automated matching of OCR-processed (optical character
recognition) content against a database of normalized content
(addresses, words, forms, administrative processing, etc.),
queuing of queries falling under a relevance metric threshold,
Web-based GUIs allowing the manual processing of queued
queries by human operators with interactive multi-stage
decision support through MatchMaker Search and
SearchNavigator.
MatchMaker-based Applications
MatchMaker Data Quality Server (Q1 2009)
A specific implementation and configuration of MatchMaker
offering:
automated matching of queries against a database of
structured content,
queuing of queries falling under a relevance metric threshold,
sophisticated Web-based GUIs allowing the manual
processing of queued queries by human operators with
interactive multi-stage decision support through MatchMaker
Search and SearchNavigator,
a template language allowing the creation of complex GUIs
supporting all the search methods available in this
configuration.
MatchMaker Directory Platform (Q2 2009)
A specific implementation and configuration of MatchMaker for
the online directory industry (Yellow Pages, local search, vertical
directories, etc.), usable as a white-label directory platform
offering:
a preconfigured MatchMaker implementation specially geared
at searching name and address data,
a SearchNavigator-enhanced search interface for optimal
usability,
a fully functional white label directory web site including
search results pages featuring maps, targeted ads, premium
listings, categorical browsing,
a unique set of partner applications for content management,
data sales management, ad serving, etc.
Recall Engine Features
String Recall
MatchMaker contains several string matching algorithms, such as
the Levenshtein edit-distance algorithm for unlimited fault
tolerance, a finite automaton structure (even faster recall, limited
to 3 edit operations), and the longest common subsequence (LCS)
algorithm. The Levenshtein edit distance is scaled by the weighted
query and entry length and is calculated with one of three
structures: edit, for unlimited fault tolerance; auto, for faster
recall limited to 3 edit operations; or words, for word-order
independent recall. Special algorithms such as the SUBSET and
SUPERSET algorithms match word flipping, for example
breakfast -> fast-break.
Four modes are available: exact, for exact recall; approx, for
whole-word approximate access; complete, for approximate prefix
matching; and detect, for in-word matching. All string recall is
performed in 8-bit character sets. The edit method also supports
wildcard search with the glob characters '*' and '?' in exact mode.
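To make the two named measures concrete, the following is a minimal textbook sketch of the Levenshtein edit distance and the longest common subsequence length. It is not MatchMaker's indexed implementation, and the length-scaled similarity at the end is only an assumed example of scaling by query and entry length, since the actual weighting is not specified here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def scaled_similarity(query: str, entry: str) -> float:
    """Assumed scaling: 1 minus distance divided by the longer length."""
    longest = max(len(query), len(entry))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(query, entry) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(lcs_length("ABCBDAB", "BDCABA"))
    print(scaled_similarity("kitten", "sitting"))
```

Scaling by length keeps a fixed number of typing errors from penalizing long entries as heavily as short ones, which is one plausible reason for such a normalization.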
Error correction
The following search term and error correction operations are
supported:
Insertion of single characters.
Deletion of single characters.
Substitution of single characters.
Transposition of adjacent characters.
Global transposition of characters and of parts of words.
Global character mapping, umlaut expansion, de-accentuation,
etc.
Local character alternatives ([g9]).
Wildcards and glob-style matching ('?', '*').
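The first four operations in the list above correspond to what is commonly called the optimal string alignment (restricted Damerau-Levenshtein) distance. The sketch below is a generic illustration of that distance, followed by glob-style matching using Python's standard fnmatch module, whose '?', '*' and bracket syntax happens to resemble the notation shown above. Neither part is MatchMaker's own engine, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def osa_distance(a: str, b: str) -> int:
    """Distance counting insertion, deletion, substitution and
    transposition of adjacent characters (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # adjacent transposition, e.g. "ra" <-> "ar"
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def glob_match(text: str, pattern: str) -> bool:
    """Case-sensitive glob match with '?', '*' and [..] alternatives."""
    return fnmatchcase(text, pattern)


if __name__ == "__main__":
    print(osa_distance("serach", "search"))
    print(glob_match("portland", "port*"))
    print(glob_match("9ood", "[g9]ood"))
```

Counting a swap of two neighboring characters as a single operation matters for typed queries, where such swaps are among the most common mistakes.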
Multi-field Access
Weighted multi-field result intersection with re-evaluation and
support for local aliases. In addition to field weighting, fields
can be marked as mandatory or optional. Three index-to-base
reference modes are available: normal; interval, for indexes with
many repeating values; and group, for multi-word keyword-type
indexes. Multiple fields are combined using conjunction, with the
option of disjunction one level below. One field may be used as a
bias field, meaning that scores from this field are used to push
results up rather than contributing to the conjunction.
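The interplay of weights, mandatory fields and a bias field can be sketched as follows. The scoring formula here (a weighted mean, a cut-off for mandatory fields, and an additive bias term) is purely an assumption chosen for illustration; the actual combination logic is not documented in this text, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldScore:
    score: float            # similarity in [0, 1] from a single-field lookup
    weight: float = 1.0     # relative importance of the field
    mandatory: bool = False


def combine(fields, bias=0.0, threshold=0.5, bias_weight=0.1):
    """Return a combined record score, or None if the record is rejected.

    Assumptions: a mandatory field must reach `threshold`; the remaining
    score is a weighted mean; the bias only pushes the result up.
    """
    for field in fields.values():
        if field.mandatory and field.score < threshold:
            return None
    total_weight = sum(field.weight for field in fields.values())
    if total_weight == 0:
        return None
    base = sum(f.score * f.weight for f in fields.values()) / total_weight
    return base + bias_weight * bias


if __name__ == "__main__":
    record = {
        "name": FieldScore(0.9, weight=2.0, mandatory=True),
        "city": FieldScore(0.6),
    }
    print(combine(record))
    print(combine(record, bias=1.0))
```

Keeping the bias outside the weighted mean mirrors the description above: it can reorder results that already qualify, but cannot make a record qualify on its own.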
Cross Field Search
MatchMaker offers a fast Cross Field Search Module which parses
the query string and assigns its parts to the correct fields. The
result set contains all similar results from each field. With this
splitting approach, searching with a single query string becomes
straightforward. The queries are interpreted as follows: "Is there
any field whose contents are approximately equal to the Nth part of
the query?"
Combined InField Methods
MatchMaker enables the processing of fields with mixed contents.
One field may, for instance, contain a date value sometimes and,
in some cases, normal strings. MatchMaker can be configured to
automatically call the appropriate method based on the content
type. This means MatchMaker activates the appropriate
comparison method depending on the data, not on the query (i.e.
date comparison is applied to date entries, while string
comparison is applied to all other entries).
Words2 Methods
Several multi-word lookup methods (with word swaps, free factor
search and special compression scheme) handle multi-word
search terms that contain interchangeable words. There is
automatic relevance ranking, single word aliasing, single word
biasing and other approaches. For fast and fault-tolerant
extraction of such keywords from long text, MatchMaker also
supports a quick scanning function with approximate lookup. This
allows labeling text entries that contain relevant keywords, which
can then be used to support SearchNavigator.
Approximate grep
Function for approximate scanning of files.
Phonetic matching
Phonetic matching goes beyond SOUNDEX (SOUNDEX, METAPHONE and
Exorbyte's own phonetics are all available) and additionally uses
edit-distance features.
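For readers unfamiliar with SOUNDEX, the following is a minimal sketch of the standard American Soundex code, which reduces a name to its first letter plus three digits. It illustrates only the baseline technique named above; METAPHONE and Exorbyte's own phonetics are not shown, and this is not MatchMaker code.

```python
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, zero padded."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    last_code = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with equal codes
        code = _CODES.get(c, "")
        if code and code != last_code:
            result += code
        last_code = code  # a vowel resets the code, allowing repeats
    return (result + "000")[:4]


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Ashcraft", "Pfister"):
        print(name, soundex(name))
```

Names that share a code are treated as phonetically equivalent, which is coarse; combining such codes with edit distance, as described above, allows finer ranking among candidates.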
Flag Attributes or Option Attributes
Highly structured data often contain yes/no attributes (flag
attributes) and numerical attributes with limited value sets (option
attributes). Such data attributes can now be efficiently
compressed by MatchMaker, so that hundreds of these attributes
can be combined and queried approximately. This enables a single
query to return all entries whose attribute sets most closely match
those in the query, without having to test each attribute set
separately.
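One common way to combine many yes/no attributes and query them approximately is to pack them into a bit set and rank entries by the number of differing bits. The sketch below illustrates that general idea only; MatchMaker's actual compression scheme is not described in this text, and the attribute names and functions are hypothetical.

```python
def pack_flags(flags: dict, names: list) -> int:
    """Pack yes/no attributes into one integer, one bit per attribute."""
    bits = 0
    for position, name in enumerate(names):
        if flags.get(name, False):
            bits |= 1 << position
    return bits


def mismatches(a: int, b: int) -> int:
    """Number of attributes on which two packed sets differ."""
    return bin(a ^ b).count("1")


def rank(query: dict, entries: dict, names: list) -> list:
    """Entry keys ordered from closest to most distant attribute set."""
    packed_query = pack_flags(query, names)
    scored = [(mismatches(packed_query, pack_flags(flags, names)), key)
              for key, flags in entries.items()]
    return [key for _, key in sorted(scored)]


if __name__ == "__main__":
    names = ["wifi", "parking", "pool"]
    entries = {
        "A": {"wifi": True, "parking": True},
        "B": {"pool": True},
        "C": {"wifi": True},
    }
    print(rank({"wifi": True, "parking": True, "pool": True},
               entries, names))
```

Because the comparison is a single XOR over the packed value, hundreds of attributes can be compared in one step instead of testing each attribute separately.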
Automatic re-evaluation
For all query modes for building ordinary SQL data servers.
Geographic recall (GPS)
A geographic recall method is available for two-dimensional
approximate geometric recall. The radius and the sharpness of such
a lookup can be configured. Results can be scored and ranked as
for any other string distance recall.
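A radius and sharpness setting can be pictured as follows: compute the great-circle distance between the query point and an entry, then turn it into a score that falls off around the radius. The haversine formula is standard, but the scoring curve and both parameter semantics are assumptions made for this sketch, not MatchMaker's documented behavior.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def geo_score(distance_km, radius_km, sharpness=2.0):
    """Assumed scoring curve: 1.0 at the center, 0.5 at the radius.

    A larger `sharpness` makes the fall-off around the radius steeper.
    """
    if radius_km <= 0:
        raise ValueError("radius_km must be positive")
    return 1.0 / (1.0 + (distance_km / radius_km) ** sharpness)


if __name__ == "__main__":
    d = haversine_km(45.52, -122.68, 45.49, -122.80)
    print(round(d, 2), round(geo_score(d, radius_km=10.0), 3))
```

A score in the range 0 to 1 can then be combined with string-based scores from other fields in the same ranking.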
Date Comparison Module
Data fields with calendar date information can be compared using
many different standard formats. Comparison is approximate with
respect to both spelling and time distance, and the sharpness of
time retrieval is configurable.
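The time-distance part of such a comparison can be sketched with the standard library: parse the date from one of several formats, then score by the number of days between query and entry. The list of formats, the scoring curve and the `sharpness_days` parameter are assumptions for illustration, and approximate spelling of dates is not covered here.

```python
from datetime import date, datetime

# Assumed set of accepted formats; a real deployment would configure its own.
FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def parse_date(text):
    """Return a date parsed from the first matching format, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def date_score(query, entry, sharpness_days=30.0):
    """Assumed curve: 1.0 for equal dates, 0.5 at `sharpness_days` apart."""
    q, e = parse_date(query), parse_date(entry)
    if q is None or e is None:
        return 0.0
    distance = abs((q - e).days)
    return 1.0 / (1.0 + distance / sharpness_days)


if __name__ == "__main__":
    print(parse_date("12.09.2008"))
    print(date_score("2008-09-12", "9/20/2008"))
```

A smaller `sharpness_days` value makes the score drop faster as dates move apart, which corresponds to a sharper time retrieval.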
Number Range
Data fields with simple numbers can be used with range
comparison functions that allow for approximate queries such as
"value is approximately greater than", where the accuracy can be
configured.
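An approximate "greater than" can be modeled as a soft threshold rather than a hard cut. The sketch below uses a logistic curve, which is an assumption made for illustration; the function name and the meaning given to `accuracy` are hypothetical and not taken from MatchMaker.

```python
import math


def approx_greater_than(value, bound, accuracy):
    """Score in (0, 1): near 1 well above `bound`, near 0 well below it.

    `accuracy` is the assumed width of the transition zone around the
    bound; smaller values behave more like a strict comparison.
    """
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    x = (value - bound) / accuracy
    x = max(-60.0, min(60.0, x))  # keep math.exp within a safe range
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for v in (80, 95, 100, 105, 120):
        print(v, round(approx_greater_than(v, bound=100, accuracy=5), 3))
```

With a soft threshold, a value just below the bound still receives a reduced score instead of being excluded outright, so it can be ranked rather than lost.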
Other recall methods
Counter, for trivial unique-key handling; bias, for including
popularity or other additional weighting information in record
scoring; premise, for evaluating premise ranges; and a
configurable Tcl method for custom recall.
Fast combination of results
From different modules using weights and relevance from each
single query result.
Very fast word-by-word comparison
Routines using similar techniques as the indexing version for
almost all modes, including localization of the actual match.
Plausibility evaluation
For a priori plausibility and a posteriori evaluation of
corrections made.
Aliases
MatchMaker supports alias handling (synonyms, acronyms,
abbreviations, fixed skip words, white lists, black lists, etc.) in
different ways: local aliases that apply only to a single record,
or global aliases that are valid for the whole data set. Systematic
aliases, for instance terms in a different language, are also
supported through alternative fields; that relationship is treated
as a logical OR.
Search Profiles
MatchMaker allows custom search profiles. Different clients,
depending on their roles, may use different interfaces to connect
to the same database (weighting differences, connection logic,
rescaling, thresholds, extra fields, etc.). All of these settings
can be configured separately for each client type and saved
under a given profile name.
Access Rights Management / View Module
MatchMaker offers a very fast filtering method that restricts
results to "allowed" candidates, whether they are individual
entries, whole branches of a category tree, or other subsets. The
method allows easy implementation of view or role concepts that are
already taken into account during the search on the server.
Character Encodings
Queries and data can use any standard encoding scheme (e.g. ISO
8859 or Unicode UTF-8). Internally, Unicode data is handled by
mapping UTF-8 strings into an 8-bit character set. Configurable
character mappings are applied before all comparisons so that sets
of similar characters are unified before comparison. The
mappings also allow single characters to be mapped to
character strings of up to 4 characters.
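A character mapping of this kind can be sketched with Python's built-in translation tables, where one character may map to a short string. The particular mapping below (German umlaut expansion and a few accents) is an assumed example; MatchMaker's configurable mappings and its internal 8-bit representation are not reproduced here.

```python
# Assumed example mapping: each key is a single character and each
# replacement is at most 4 characters long, as described above.
CHARACTER_MAP = {
    "\u00e4": "ae",  # a with umlaut
    "\u00f6": "oe",  # o with umlaut
    "\u00fc": "ue",  # u with umlaut
    "\u00df": "ss",  # sharp s
    "\u00e9": "e",   # e with acute accent
    "\u00e8": "e",   # e with grave accent
}
_TABLE = str.maketrans(CHARACTER_MAP)


def normalize(text: str) -> str:
    """Lower-case the text and unify similar characters before comparing."""
    return text.lower().translate(_TABLE)


if __name__ == "__main__":
    print(normalize("M\u00fcller") == normalize("Mueller"))
    print(normalize("Caf\u00e9"))
```

Applying the same normalization to both the query and the indexed data means that spelling variants of the same name compare as equal before any approximate matching takes place.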
Full Text Search Module
MatchMaker has a full text indexing engine for documents on a
file by file basis. For each word in the document collection
MatchMaker tracks the position of the word within each
document, the frequency of the word in each document, overall
frequency and other metrics. These measurements all become
available as ranking criteria for subsequent query processing.
MatchMaker's full text module supports phrase detection,
exclusion words, inclusion words, approximate or exact recall on a
per-word basis, word combination and word splits on demand, single
word aliasing, single word biasing, custom skip words, automatic
skip words, wildcard search, suffix and prefix search, and more.
The full text module can also generate teasers (text
surrounding the matched words in the original document) on
exact or approximate matches.
The full text module can be integrated with standard web
crawlers. Crawlers can be controlled from MatchMaker to directly
import documents of different types.
Integration and API
OCR-Extensions
The OCR version of MatchMaker has special recall functionality for
handling the character guesses of OCR recognition engines. It also
has additional logic for labeling the status of a given result with
respect to its suitability for automatic update and/or display to a
human operator for manual verification.
Server Side Scripting
MatchMaker supports Tcl server-side scripting to replace standard
search functionality with custom logic that can read all input
strings and modes, manipulate the recall session, and construct
results by merging several query results and/or post-processing
the results. Within scripts, ten extra DETECT recall modes are
available for selecting special algorithmic modes and
re-evaluation models.
Scripting also allows custom filters on single fields,
manipulation of complex queries, modification of results by merging
several query results, and post-processing of results. Additional
resources, such as special comparison methods (string matching
algorithms), can be used from scripts. Script writing is supported
by template generators for each function type.
Application Programming Interface (API)
Communication with exTributor or exStream is done via an API
(called MMI).
This client-side programming interface for building powerful
applications can be used from almost any programming language.
The underlying unified client interface is plain text over TCP/IP
sockets, allowing access from any language with socket support.
Native APIs and example code are available for C++, COM, Java, PHP,
Tcl and Python.
A new port can optionally be opened for each query.
All data is ASCII with tab separators for key/value pairs.
Full interface specifications, code examples, and documentation
are available to support developers who write their own client
libraries.
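To show what a plain-text, tab-separated exchange over a TCP socket can look like from a client, here is a minimal sketch. The framing (one key and value per line, separated by a tab, with a blank line ending the message), the key names, and the host and port are all assumptions made for illustration; the real MMI wire format is defined in the interface specification and is not reproduced here.

```python
import socket


def encode_message(pairs: dict) -> bytes:
    """Encode key/value pairs as ASCII lines of the form key<TAB>value.

    Assumed framing: one pair per line, a blank line ends the message.
    """
    lines = []
    for key, value in pairs.items():
        if any(ch in key or ch in value for ch in "\t\n"):
            raise ValueError("keys and values must not contain tab or newline")
        lines.append(f"{key}\t{value}")
    return ("\n".join(lines) + "\n\n").encode("ascii")


def decode_message(data: bytes) -> dict:
    """Decode a message produced by encode_message."""
    pairs = {}
    for line in data.decode("ascii").splitlines():
        if not line:
            break  # blank line marks the end of the message
        key, _, value = line.partition("\t")
        pairs[key] = value
    return pairs


def send_query(host: str, port: int, pairs: dict, timeout: float = 5.0) -> dict:
    """Send one query over a new TCP connection and read one reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_message(pairs))
        received = b""
        while b"\n\n" not in received:
            chunk = conn.recv(4096)
            if not chunk:
                break
            received += chunk
    return decode_message(received)


if __name__ == "__main__":
    # Hypothetical field names; no server is contacted here.
    print(encode_message({"name": "smith", "city": "portland"}))
```

Because the exchange is plain ASCII text over a socket, an equivalent client can be written in any language that provides sockets, which is the property the description above emphasizes.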