Diff of /trunk/PlusPlus.pm

-revision 20 by dpavlin, Sun Dec  5 21:06:48 2004 UTC
+revision 21 by dpavlin, Sun Dec  5 22:24:09 2004 UTC
 Line 13 
 use BerkeleyDB;
  =head1 NAME
- SWISH::PlusPlus - Perl extension SWISH++
+ SWISH::PlusPlus - Perl extension for full-text indexer SWISH++ with properties support
  =head1 SYNOPSIS
 Line 23 
 SWISH::PlusPlus - Perl extension SWISH++
  =head1 DESCRIPTION
  This is perl module to use SWISH++ indexer by Paul J. Lucas. SWISH++ is
- rewrite of swish-e in C++ which is extremly fast (thank to mmap), but without
+ rewrite of swish-e in C++ which is extremely fast (due to mmap usage and
- support for properties (which this module tries to fix).
+ clever language heuristics), but without support for properties (which this
+ module tries to fix).
- Implementation of this module is crafted after L<Plucene::Simple> and it
- should be easy to replace Plucene with this module for increased
+ Implementation of API is something in-between C<SWISH::API> and
- performance. However, this module is not plug-in replacement.
+ C<Plucene::Simple>. It should be easy to replace Plucene or swish-e with
+ this module for increased performance. However, this module is not plug-in
+ replacement.
  =head1 METHODS
  =head2 new
- Create new indexing object.
+ Create new instance for index.
    my $i = SWISH::PlusPlus->new(
          index_dir => '/path/to/index',
-Line 45 
 Create new indexing object.
+Line 47 
 Create new indexing object.
          use_stopwords => 1,
    );
- Options to new are following:
+ Options are described below:
  =over 5
  =item C<index_dir>
- Path to directory in which index will be created.
+ Path to directory in which index and meta database will be created.
  =item C<index>
-Line 71 
 C<STDERR> prefixed by C<##>.
+Line 73 
 C<STDERR> prefixed by C<##>.
  =item C<meta_in_body>
  This option (off by default) enables to search content of meta fields
- without specifing them (like they are in body of document). This will
+ without specifying them (like they are in body of document). This will
- somewhat increate index size.
+ somewhat increase index size.
  =item C<use_stopwords>
-Line 119 
 sub new {
+Line 121 
 sub new {
  =head2 check_bin
- Check if swish++ binaries specified in L<new> are available and verify
+ Check if SWISH++ binaries specified in L<new> are available and verify
  version signature.
    if ($i->check_bin) {
-Line 130 
 It will also setup property
+Line 132 
 It will also setup property
    $i->{'version'}
- which you can examine to see version.
+ which you can examine to see numeric version (something like C<6.0.4>).
  =cut
-Line 161 
 sub check_bin {
+Line 163 
 sub check_bin {
  Quick way to add simple data to index.
-   $i->index_document($key, $data);
+   $i->index_document($path, $data);
    $i->index_document( 42 => 'meaning of life' );
+ C<$path> value is really a path, so you probably don't want to use
+ directory separators (slashes, /) in it.
  =cut
  sub index_document {
-Line 183 
 sub index_document {
+Line 188 
 sub index_document {
  =head2 add
- Add document with metadata to index.
+ Add document with meta-data to index.
    $i->add(
          path => 'path/to/document',
-Line 207 
 sub add {
+Line 212 
 sub add {
          return 1;
  }
  =head2 search
- Search your index.
+ Search your index using any valid SWISH++ query.
+   my @results = $i->search("swish query");
-   my @results = $i->search("swhish query");
+ Returns array with elements like this:
- Returns array with result IDs.
+   {
+    rank => 10,                  # rank of result
+    path => 'path to result',    # path to result
+    size => 999,                 # size in bytes
+    title => 'title of result'   # title meta property
+   }
  =cut
-Line 285 
 sub property {
+Line 298 
 sub property {
  =head2 finish_update
- This method will close index.
+ This method will close index binary and enable search. Searching is not
+ available while indexing is in process.
    $i->finish_update;
- It will be called on DESTROY when $i goes out of scope.
+ Usually, you don't need to call this method directly. It will be called on
+ DESTROY when $i goes out of scope or when you first call search in session
+ if indexing was started.
  =cut
-Line 308 
 sub DESTROY {
+Line 324 
 sub DESTROY {
  =head1 PRIVATE METHODS
- Private methods implement internals for creating temporary file needed for
+ Private methods implement internals for creating temporary files needed for
- swish++. You should have no need to call them directly, and they are here
+ SWISH++. You should have no need to call them directly, and they are here
  just to have documentation.
  =head2 _init_indexer
-Line 359 
 sub _init_indexer {
+Line 375 
 sub _init_indexer {
          return $self->{'_index_fh'};
  }
- =head2 _tie_meta_db
- Open BerkeleyDB database with meta properties.
-   $i->_tie_meta_db(DB_CREATE);
-   $i->_tie_meta_db(DB_RDONLY);
- }
- =cut
- sub _tie_meta_db  {
-         my $self = shift;
-         my $flags = shift || confess "need DB_CREATE or DB_RDONLY";
-         return if ($self->{'_meta_db_flags'} && $self->{'_meta_db_flags'} == $flags);
-         print STDERR "## _tie_meta_db($flags)\n" if ($self->{'debug'});
-         $self->_untie_meta_db;
-         $self->{'_meta_db_flags'} = $flags;
-         my $file = $self->{'index_dir'}.'/meta.db';
-         tie %{$self->{'meta_db'}}, "BerkeleyDB::Hash",
-                 -Filename => $file,
-                 -Flags    => $flags
-         or confess "cannot open $file: $! $BerkeleyDB::Error\n" ;
-         return 1;
- }
- =head2 _untie_meta_db
- Close BerkeleyDB database with meta properties.
-   $i->_untie_meta_db
- =cut
- sub _untie_meta_db {
-         my $self = shift;
-         return unless ($self->{'meta_db'});
-         print STDERR "## _untie_meta_db\n" if ($self->{'debug'});
-         untie %{$self->{'meta_db'}} || confess "can't untie!";
-         undef $self->{'meta_db'};
-         undef $self->{'_meta_db_flags'};
-         return 1;
- }
  =head2 _create_doc
- Create temporary file and pass it's name to swish++
+ Create temporary file and pass its name to SWISH++
    $i->_create_doc(
          path => 'path/to/store/in/index',
-Line 497 
 sub _close_index {
+Line 459 
 sub _close_index {
          return 1;
  }
+ =head2 _tie_meta_db
+ Open BerkeleyDB database with meta properties.
+   $i->_tie_meta_db(DB_CREATE);
+   $i->_tie_meta_db(DB_RDONLY);
+ }
+ =cut
+ sub _tie_meta_db  {
+         my $self = shift;
+         my $flags = shift || confess "need DB_CREATE or DB_RDONLY";
+         return if ($self->{'_meta_db_flags'} && $self->{'_meta_db_flags'} == $flags);
+         print STDERR "## _tie_meta_db($flags)\n" if ($self->{'debug'});
+         $self->_untie_meta_db;
+         $self->{'_meta_db_flags'} = $flags;
+         my $file = $self->{'index_dir'}.'/meta.db';
+         tie %{$self->{'meta_db'}}, "BerkeleyDB::Hash",
+                 -Filename => $file,
+                 -Flags    => $flags
+         or confess "cannot open $file: $! $BerkeleyDB::Error\n" ;
+         return 1;
+ }
+ =head2 _untie_meta_db
+ Close BerkeleyDB database with meta properties.
+   $i->_untie_meta_db;
+ =cut
+ sub _untie_meta_db {
+         my $self = shift;
+         return unless ($self->{'meta_db'});
+         print STDERR "## _untie_meta_db\n" if ($self->{'debug'});
+         untie %{$self->{'meta_db'}} || confess "can't untie!";
+         undef $self->{'meta_db'};
+         undef $self->{'_meta_db_flags'};
+         return 1;
+ }
 ;
  __END__
-Line 508 
 None by default.
+Line 524 
 None by default.
  =head2 Debian
- Debian version of swish++ is often old (version 5 at moment of this writing
+ Debian version of SWISH++ is often old (version 5 at moment of this writing
  while version 6 is available in source code), so this module by default
  uses executable names B<index> and B<search> for self-compiled version
  instead of one from Debian package. See L<new> how to specify Debian
-Line 516 
 default binaries B<index++> and B<search
+Line 532 
 default binaries B<index++> and B<search
  =head2 SWISH++
- Aside from very good rewrite in C++, SWISH++ is fatster because it has
+ Aside from very good rewrite in C++, SWISH++ is faster because it uses
  clever heuristics about which data in input files are words to index and
  which are not. It's based on English language and might be best choice if
- you plan to install large amount of long text documents.
+ you plan to index large amount of long text documents.
  However, if you plan to index all data from structured storage (e.g. RDBMS)
  you might want B<all> words from data to end up in index as opposed to just
-Line 527 
 those which look like English words. Thi
+Line 543 
 those which look like English words. Thi
  don't plan to index English texts with this module.
  With distribution build versions of SWISH++ you might have problems with
- disepearing words. To overcome this problem, you will have to compile and
+ disappearing words. To overcome this problem, you will have to compile and
  configure SWISH++ yourself (because language characteristics are
  compilation-time option).
-Line 543 
 configuration is needed for B<date test>
+Line 559 
 configuration is needed for B<date test>
  doesn't recognize 2004-12-05 as date. Have in mind that your index size
  might explode.
+ =head1 BUGS
+ Currently there is no way to specify which meta data will be stored as
+ properties. B<This will be fixed very soon>.
+ There is no garbage collection on temporary files created for SWISH++. This
+ means that one run of indexer will take additional disk space for temporary
+ files, which will be removed at end. There should be some way to remove
+ files after they are indexed by SWISH++. However, at this early stage of
+ development it's just not supported yet. Have plenty of disk space!
  =head1 SEE ALSO
- C<swish++> web site L<http://homepage.mac.com/pauljlucas/software/swish/>
+ SWISH++ web site L<http://homepage.mac.com/pauljlucas/software/swish/>
  =head1 AUTHOR
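
Note on the revised C<search> documentation in v.21: the new POD says that search returns an array of hash references with rank, path, size and title keys. The snippet below is a minimal sketch of consuming such a structure. The sample data and the format_result helper are hypothetical and not part of SWISH::PlusPlus, and the explicit sort is there only because the documentation shown here does not say in which order results are returned.

```perl
use strict;
use warnings;

# Hypothetical sample data shaped like the documented return value of
# search(); a real call would be: my @results = $i->search("swish query");
my @results = (
    { rank => 3,  path => 'docs/b.txt', size => 2048, title => 'Second' },
    { rank => 10, path => 'docs/a.txt', size => 999,  title => 'First'  },
);

# format_result is an illustrative helper, not part of SWISH::PlusPlus.
# It renders one result hash as a single line of text.
sub format_result {
    my ($r) = @_;
    return sprintf '%3d  %s  (%d bytes)  %s',
        $r->{rank}, $r->{path}, $r->{size}, $r->{title};
}

# Print the highest-ranked results first.
for my $r (sort { $b->{rank} <=> $a->{rank} } @results) {
    print format_result($r), "\n";
}
```

Keeping the formatting in a separate sub makes it easy to swap the sample array for a real result list once an index is available.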

Legend: lines prefixed with - were removed from v.20; lines prefixed with + were added in v.21.