
Diff of /trunk/README


revision 106 by dpavlin, Mon May 24 13:44:01 2004 UTC revision 107 by dpavlin, Tue Jul 13 12:45:55 2004 UTC
# Line 102  DESCRIPTION Line 102  DESCRIPTION
102      addition a query and a display module must be chosen.      addition a query and a display module must be chosen.
103    
104    Access    Access
   
105      The access module defines which documents are members of a database.      The access module defines which documents are members of a database.
106      Usually an access module is a tied hash, whose keys are the Ids of the      Usually an access module is a tied hash, whose keys are the Ids of the
107      documents (did = document id) and whose values are the documents      documents (did = document id) and whose values are the documents
108      themselves. The indexing process loops over the keys using `FIRSTKEY'      themselves. The indexing process loops over the keys using "FIRSTKEY"
109      and `NEXTKEY'. Documents are retrieved with `FETCH'.      and "NEXTKEY". Documents are retrieved with "FETCH".
110    
111      By convention access modules should be members of the `WAIT::Document'      By convention access modules should be members of the "WAIT::Document"
112      hierarchy. Have a look at the `WAIT::Document::Split' module to get the      hierarchy. Have a look at the "WAIT::Document::Split" module to get the
113      idea.      idea.
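
The tied-hash convention described above can be sketched roughly as follows. This is a minimal illustration only, not code from the WAIT distribution: the package name `WAIT::Document::InMemory` and the sample documents are hypothetical, and a real access module would typically read its documents from files or a database rather than from memory.

```perl
use strict;
use warnings;

# Hypothetical access module (name is an assumption, not part of WAIT):
# documents are held in memory and keyed by their document id (did).
package WAIT::Document::InMemory;

sub TIEHASH {
    my ($class, %docs) = @_;
    # Keep a sorted key list so that iteration order is predictable.
    my $self = { docs => {%docs}, keys => [ sort keys %docs ], pos => 0 };
    return bless $self, $class;
}

# Retrieve one document by its did.
sub FETCH {
    my ($self, $did) = @_;
    return $self->{docs}{$did};
}

# Restart the iteration and return the first did.
sub FIRSTKEY {
    my ($self) = @_;
    $self->{pos} = 0;
    return $self->{keys}[ $self->{pos}++ ];
}

# Return the next did, or undef once all keys have been visited.
sub NEXTKEY {
    my ($self, $lastkey) = @_;
    return $self->{keys}[ $self->{pos}++ ];
}

package main;

tie my %collection, 'WAIT::Document::InMemory',
    doc1 => "AU: Pfeifer, U.\nTI: An example title\n",
    doc2 => "AU: Fuhr, N.\nTI: Another title\n";

# An indexer would walk the keys (FIRSTKEY/NEXTKEY) and FETCH each document.
my @seen;
for my $did (keys %collection) {
    push @seen, $did;
    print "$did: ", length($collection{$did}), " characters\n";
}
```

Only the read side of the tie interface is implemented here, since the text mentions only `FIRSTKEY`, `NEXTKEY` and `FETCH`.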
114    
115    Parse    Parse
   
116      The task of the parse module is to split the documents into logical      The task of the parse module is to split the documents into logical
117      parts via the `split' method. E.g. the `WAIT::Parse::Nroff' splits      parts via the "split" method. E.g. the "WAIT::Parse::Nroff" splits
118      manuals piped through nroff(1) into the sections *name*, *synopsis*,      manuals piped through nroff(1) into the sections *name*, *synopsis*,
119      *options*, *description*, *author*, *example*, *bugs*, *text*, *see*,      *options*, *description*, *author*, *example*, *bugs*, *text*, *see*,
120      and *environment*. Here is the implementation of `WAIT::Parse::Base'      and *environment*. Here is the implementation of "WAIT::Parse::Base"
121      which handles documents with a pretty simple tagged format:      which handles documents with a pretty simple tagged format:
122    
123        AU: Pfeifer, U.; Fuhr, N.; Huynh, T.        AU: Pfeifer, U.; Fuhr, N.; Huynh, T.
# Line 150  DESCRIPTION Line 148  DESCRIPTION
148      we need a second method (*tag*) which marks the regions of the document      we need a second method (*tag*) which marks the regions of the document
149      with tags for the different attributes. This tagged form is used by the      with tags for the different attributes. This tagged form is used by the
150      display module to highlight search terms in the documents. Besides the      display module to highlight search terms in the documents. Besides the
151      tags for the attributes, the method might assign the special tags `_b'      tags for the attributes, the method might assign the special tags "_b"
152      and `_i' for indicating bold and italic regions.      and "_i" for indicating bold and italic regions.
153    
154        sub tag {        sub tag {
155          my @result;          my @result;
# Line 172  DESCRIPTION Line 170  DESCRIPTION
170          return @result;               # we don't go for speed          return @result;               # we don't go for speed
171        }        }
172    
173      Obviously one could implement `split' via `tag'. The reason for having      Obviously one could implement "split" via "tag". The reason for having
174      two functions is speed. We need to call `split' for each document when      two functions is speed. We need to call "split" for each document when
175      indexing a collection. Therefore speed is essential. On the other hand,      indexing a collection. Therefore speed is essential. On the other hand,
176      `tag' is called in order to display a single document and may be a      "tag" is called in order to display a single document and may be a
177      little slower. It may care about tagging bold and italic regions. See      little slower. It may care about tagging bold and italic regions. See
178      `WAIT::Parse::Nroff' for how this might decrease performance.      "WAIT::Parse::Nroff" for how this might decrease performance.
179    
180    Filter definition    Filter definition
   
181      From the Information Retrieval perspective, the hardest part of the      From the Information Retrieval perspective, the hardest part of the
182      system is the filter module. The database administrator defines for each      system is the filter module. The database administrator defines for each
183      attribute, how the contents should be processed before it is stored in      attribute, how the contents should be processed before it is stored in
# Line 196  DESCRIPTION Line 193  DESCRIPTION
193    
194              [ 'isotr', 'isolc', 'split2', 'stop', 'Stem']              [ 'isotr', 'isolc', 'split2', 'stop', 'Stem']
195    
196      The function `isotr' replaces unknown characters by blanks. `isolc'      The function "isotr" replaces unknown characters by blanks. "isolc"
197      transforms to lower case. `split2' splits into words and removes words      transforms to lower case. "split2" splits into words and removes words
198      shorter than two characters. `stop' removes the freeWAIS-sf stopwords      shorter than two characters. "stop" removes the freeWAIS-sf stopwords
199      and `Stem' applies the Porter algorithm for computing the stem of the      and "Stem" applies the Porter algorithm for computing the stem of the
200      words.      words.
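
To make the pipeline idea concrete, here is a small sketch of how such a list of filter names could be applied in order. The filter bodies are simplified stand-ins written for this illustration and are not the WAIT filter implementations: `isotr` here only knows ASCII letters and digits, the stopword list is a tiny made-up sample rather than the freeWAIS-sf list, and `Stem` is left out because a faithful Porter stemmer would not fit in a short example. The helper `run_pipeline` is likewise hypothetical.

```perl
use strict;
use warnings;

# Tiny illustrative stopword list (an assumption, not the freeWAIS-sf list).
my %stop = map { $_ => 1 } qw(the of and a is);

# Stand-in filters; each takes a list of strings and returns a list.
my %filter = (
    # Replace every character outside a-z, A-Z, 0-9 with a blank.
    isotr  => sub { map { (my $s = $_) =~ tr/a-zA-Z0-9/ /c; $s } @_ },
    # Transform to lower case.
    isolc  => sub { map { lc($_) } @_ },
    # Split into words and drop words shorter than two characters.
    split2 => sub { grep { length($_) >= 2 } map { split(' ', $_) } @_ },
    # Drop stopwords.
    stop   => sub { grep { !$stop{$_} } @_ },
);

# Apply the named filters from left to right.
sub run_pipeline {
    my ($pipeline, @data) = @_;
    for my $name (@$pipeline) {
        my $f = $filter{$name} or die "unknown filter '$name'";
        @data = $f->(@data);
    }
    return @data;
}

my @terms = run_pipeline([qw(isotr isolc split2 stop)],
                         'The Design of a Probabilistic IR-System!');
print join(' ', @terms), "\n";   # prints "design probabilistic ir system"
```

Each stage consumes the output of the previous one, so the order of names in the list matters: for example, the stopword lookup here only works because lower-casing and word splitting have already happened.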
201    
202      The filter definition for a collection defines a set of pipelines for      The filter definition for a collection defines a set of pipelines for
