=head1 pgestraier - PostgreSQL full-text search using Hyper Estraier
This package is essentially composed of two different parts:
=over 4
=item search function
PostgreSQL function to search Hyper Estraier full-text index, using
full-text queries and attribute filtering to return user-specified
table of results.
This function can mimic SQL C<LIMIT>, C<OFFSET> and C<ORDER BY>
functionality much faster than using those SQL constructs on search
results.
=item trigger function
PostgreSQL trigger function to keep Hyper Estraier in sync with PostgreSQL.
It triggers after insert, update or delete and update full-text index
accordingly.
=back
Both functions are written in C, while test framework and supporting
utilities are written in perl.
You can use just one of those functions. If you want just to search existing
Hyper Estraier index or generate it off-line (after nightly batch jobs, for
example), just use search function.
On the other hand, if you want just to keep your Hyper Estraier index in
sync with PostgreSQL data, you can use just trigger function to achieve that.
=head1 Why is it written?
Aside from providing single query language (SQL) to RDBMS and full text index
(using any language that has PostgreSQL client libraries), real power is
hidden in ability to join results from full text index and structured data
in RDBMS.
For simple real-life example which address problem
C<< WHERE name LIKE '%foo%' OR surname LIKE '%foo%' >>
is slow see L<Tutorial> and L<pgest-index> documentation.
=head1 How to install
Installation should be simple. However, you will have to have following
software already installed before you try this functions:
=over
=item *
PostgreSQL (tested with versions 7.4, 8.0 and 8.1) with development libraries
=item *
Hyper Estraier (tested with various versions, recommended 1.2.4 or newer)
with development headers
=item *
working C compiler (tested with gcc)
=back
If you want to use helper script to create consistency triggers to keep
Hyper Estraier in sync with PostgreSQL database, you will also need:
=over
=item *
working perl installation
=item *
perl modules C<Search::Estraier>, C<DBI> and C<DBD::Pg>
=back
To run tests you will also need:
=over
=item *
perl module C<Test::More>
=item *
C<trivia.list.gz> from Internet Movie Database in C<data/> directory.
You can download it from L<http://www.imdb.com/interfaces>
=item *
PostgreSQL database C<test> with permissions for current user
=item *
Hyper Estraier C<estmaster> running with permissions for C<admin> user
to create C<trivia> node.
=back
If you have all that, you should be able to type
make
and see sample results. You will be asked your password once (via sudo) to
install C<pgest.so> shared library in system-wide location so that PostgreSQL
could access it.
=head2 Create sample index using Hyper Estraier perl bindings
Perl bindings for Hyper Estraier are available at CPAN:
L<http://search.cpan.org/~dpavlin/Search-Estraier/>
After installing C<Search::Estraier> you can create index using following commands:
cd data
make index
cd ..
To run tests (which require that you have estcmd in your $PATH) issue
make test
See also included file C<test.sql> for more examples of usage.
=head1 Usage of search function pgest from SQL
C<pgest> PostgreSQL function tries to mimic usage of normal database tables (with support for attribute filtering, limit and offset) in following way:
SELECT
-- columns to return (defined later)
id,title,size
FROM pgest(
-- node URI, login, password and depth of search
'http://localhost:1978/node/trivia', 'admin', 'admin', 42,
-- query
'blade runner',
-- additional attributes, use NULL or '' to disable
-- multiple attributes conditions can be separated by {{!}}
'@title ISTRINC blade',
-- order results by
'@title STRA',
-- limit, use NULL or 0 to disable
null,
-- offset, use NULL or 0 to disable
null,
-- attributes to return as columns
ARRAY['@id','@title','@size']
) AS (
-- specify names and types of returned attributes
id text, title text, size text
);
You should note that Hyper Estraier uses UTF-8 encoding, while your
PostgreSQL installation might use different encoding. To fix that, use
C<convert> function in PostgreSQL to convert encodings.
=head2 Using index via C<estmaster> server process
This is default and recommended way to use C<pgest> functionality. In this
case, C<pgest> will use node API and access index through C<estmaster>
process which should be running on (local or remote) machine.
This will remove database opening overhead, at a cost of (small) additional network
traffic. However, you can have Hyper Estraier C<estmaster> process running on
different machine or update index while doing searches, so benefits of this
approach are obvious.
=head2 Accessing database directly
B<Please note that direct access to database is depreciated.> As such, it's
not stated in example, and it's kept just for backward compatibility, but it
will probably be removed in future versions of C<pgest>.
If you want to access database directly (without running C<estmaster> process), you
have to replace node URI, login, password and depth with full path to database file.
Have in mind that C<postgres> user under which PostgreSQL is running must
have read permission on Hyper Estraier database files.
This will work a bit faster on really small indexes. However, when your
index grows bigger, you might consider using node API to remove overhead of
database opening on each query.
=head1 Usage of trigger function pgest_trigger from SQL
Let's first say that I really suggest that you use C<dbi-index.pl> helper script to
create triggers because it already supports following steps automatically:
=over
=item begin transaction
Transaction is needed to catch updates which might happen while creation
of full-text index is in progress (and on huge collections this can take a while,
just like normal index creation in PostgreSQL).
=item insert all existing data in full-text index
This will be done directly from PostgreSQL database to Hyper Estraier index.
This is somewhat faster than waiting for trigger to fire for each existing
row.
=item create insert, update and delete triggers
Which will keep data in sync later
=item commit transaction
=back
If you still want to do that manually, you will need to know format of
C<pgest_trigger> function:
CREATE TRIGGER pgest_trigger_insert AFTER INSERT
ON table FOR EACH ROW
EXECUTE PROCEDURE pgest_trigger(
-- node URI, login and password
'http://localhost:1978/node/trivia', 'admin', 'admin',
-- name of primary key column
'id',
-- names of all other columns to index (one or more)
'column', 'another_one', 'and_another'
)
You have to create triggers for C<UPDATE> and C<DELETE> in similar way.
=head1 Who wrote this?
Hyper Estraier is written by Mikio Hirabayashi.
PostgreSQL is written by hackers calling themselves PostgreSQL Global
Development Group.
This small C functions are written by L<Dobrica Pavlinusic|http://www.rot13.org/~dpavlin/>, dpavlin@rot13.org.
=head1 See also
=over
=item *
L<Tutorial> - how to create first full-text index in under 10 minutes!
=item *
L<ChangeLog> - what has changed since last version
=item *
L<pgest-index> - helper script to create index and triggers
=item *
L<pgFoundry|http://pgfoundry.org/projects/pgestraier/> hosts home page of this project
=item *
L<Hyper Estraier user guide|http://hyperestraier.sourceforge.net/uguide-en.html#searchcond>
has a documentaton about query format. C<pgestraier> is using noraml queries (with
C<AND>, C<OR> etc.) and not simplified queryies (with C<|>).
=back