Revision 337, Thu Jun 10 19:22:40 2004 UTC, by dpavlin
new trunk for webpac v2

How to look up a value in my output?


You might want to use this feature if you are trying to display something
that is related to the current record.

All lookups are modelled around the key => value(s) idea, so you can store
any value attached to a unique key. Both the key and the value can be built
from fields of any import format and from fixed values (delimiters, prefixes
etc.)

First, it's important that the database which creates the key => value data
is specified before the database which uses those values in all2xml.conf.

Second, that usually means that you will need two database configurations in
all2xml.conf which point to the same database if you want to look up records
from the same database. I would suggest having two import_xml/ files: one
which just stores lookup keys and values (and thus runs faster) and another
which creates output for swish and the indexer and just uses the lookup.


1. Lookup to other database (using type="lookup_key" and lookup="1")

For example (from import_xml/isis_hidra_ths.xml), the thesaurus has terms
with unique identifiers in field 900, and we want those terms for display.

The bibliographic database (import_xml/isis_hidra_bib.xml) has just a field
which holds the value of field 900 from the thesaurus entry. While that's
enough to create links in search results (using links and format, see
doc/links.txt), we would like to display the term from the thesaurus and not
the value of field 900.

In the first step, we store fields from the thesaurus (as the value) that
relate to field 900 of that entry (which is the key) using the following XML
(in import_xml/isis_hidra_ths.xml):

    <IDths name="ID" order="300">
        <isis type="lookup_key">900</isis>
    </IDths>

    <SubjectIndex name="Predmetno kazalo" order="301">
        <isis type="lookup_val">[5624] 562a</isis>
    </SubjectIndex>

This will create a lookup which you might write like this:

    900 => "[5624] 562a"

Quotes are added to denote that the value is a single entry.
We also have to specify in all2xml.conf something like:

    lookup_newfile=/data/webpac/thes.lookup

which will create a new lookup file.

For the bibliographic database, which will do lookups into the previously
created file, all2xml.conf must have:

    lookup_open=/data/webpac/thes.lookup

and then in import_xml/ we use:

    <isis lookup="1">6013</isis>

The value of field 6013 must exactly match field 900 (which is the key) from
the thesaurus. You can, however, add an arbitrary prefix or suffix to store
unrelated keys and values in the same lookup.
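
The two steps above can be pictured with a small Python sketch. It is only
an illustration: webpac itself is written in Perl and keeps the lookup in a
file, while here a plain dict stands in for that file. The sample records,
the function names and the fallback to the raw id for unknown keys are all
assumptions made for this example.

```python
# Illustrative sketch only; a dict stands in for the on-disk lookup file.

def build_lookup(thesaurus_records):
    """First pass: map field 900 (the key) to the display value."""
    lookup = {}
    for rec in thesaurus_records:
        # Mirrors the value template "[5624] 562a" from the XML above.
        lookup[rec["900"]] = "[{}] {}".format(rec["5624"], rec["562a"])
    return lookup


def resolve(bib_record, lookup):
    """Second pass: replace the id stored in 6013 with the thesaurus term."""
    key = bib_record["6013"]
    # The key must match exactly. Falling back to the raw id for unknown
    # keys is an assumption of this sketch, not documented behaviour.
    return lookup.get(key, key)


# Hypothetical sample records, invented for this sketch.
thesaurus = [
    {"900": "T0001", "5624": "01", "562a": "Agriculture"},
    {"900": "T0002", "5624": "02", "562a": "Forestry"},
]
lookup = build_lookup(thesaurus)
term = resolve({"6013": "T0002"}, lookup)
missing = resolve({"6013": "T9999"}, lookup)
```

The order of the two passes is the reason the thesaurus database has to come
first in all2xml.conf: the lookup must be complete before any bibliographic
record is resolved against it.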


1.1 NOTE about memory usage:

These lookups are created on disk. The default configuration also creates
a memory cache for faster indexing, which you can turn off by changing the
line

    my $use_lhash_cache = 1;

in all2xml.pl to

    my $use_lhash_cache = 0;

You probably won't need to do that, so it's not a configuration option.
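
To show what such a switch trades off, here is a minimal Python sketch of a
disk-backed lookup with an optional in-memory cache. It is not how
all2xml.pl implements its cache (that code is not shown in this document);
the class name, the use_cache flag and the disk_reads counter are
hypothetical and exist only for this example.

```python
import dbm.dumb
import os
import tempfile


class CachedLookup:
    """Disk-backed key => value store with an optional in-memory cache."""

    def __init__(self, path, use_cache=True):
        self.db = dbm.dumb.open(path, "c")      # "c": create if missing
        self.cache = {} if use_cache else None  # None means cache is off
        self.disk_reads = 0

    def put(self, key, value):
        self.db[key] = value                    # stored as UTF-8 bytes
        if self.cache is not None:
            self.cache[key] = value

    def get(self, key):
        if self.cache is not None and key in self.cache:
            return self.cache[key]              # served from memory
        self.disk_reads += 1
        value = self.db[key].decode("utf-8")
        if self.cache is not None:
            self.cache[key] = value
        return value

    def close(self):
        self.db.close()


# Temporary location so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "thes.lookup")

cached = CachedLookup(path, use_cache=True)
cached.put("900", "[5624] 562a")
cached.get("900")
cached.get("900")
cached_reads = cached.disk_reads
cached.close()

uncached = CachedLookup(path, use_cache=False)
uncached.get("900")
uncached.get("900")
uncached_reads = uncached.disk_reads
uncached.close()
```

With the cache on, repeated reads never reach the disk at the cost of
holding every value in memory; with it off, every read goes to the file.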


2. Lookup that has to store more than one value

While the lookups described above are sufficient when you want to store just
one value associated with one key, they don't quite help us if we need to
have more than one value for each key.

A typical example is displaying narrower terms in a thesaurus. Each narrower
term has the id of its parent term (which is enough to display the narrower
term), but we would also like to display all sibling terms with each term.

So, under the key of the parent term we'll store the keys of all terms which
are siblings. But we would also like to display terms and not term numbers.
That requires first finding all sibling terms (a lookup returning one or more
term ids) and then looking up the names of those returned terms for display.

This is usually called indirect lookup, and is much hated by CS majors in
their freshman year. Later, it becomes so natural that you think it's the
only way to solve the problem. So, you are stuck with it :-)

Since lookups can return more than one value, and we would like to use format
to create links, this lookup is implemented as filter="mem_lookup". Let's
look at an example.

    <LookupThesNT name="lookup for thesaurus narrow term">
        <!--
        Store the value of field 250a (for display) under a key
        composed of the prefix "d:" and the value of field 900.
        This is a one key - one value lookup.
        -->
        <isis filter="mem_lookup" type="display">d:900 => 250a</isis>

        <!--
        Now, for each entry generate the parent ID (using fields
        5614, 5624, 4611, with the prefix "a:" added, as the key)
        and use the value of field 900 as the value.
        That will create a lookup which can (and will) have
        more than one value for each key (because a parent
        term has more than one child).
        -->
        <isis filter="mem_lookup" type="display">a:5614:5624:4611 => 900</isis>

    </LookupThesNT>

So, after we index the database with an import_xml file which has the
mem_lookup filter (which won't create any output for swish or the index), we
have just two lookups stored in memory (that's where the name mem_lookup
comes from):

    d:900 => 250a

    a:5614:5624:4611 => 900 900 900 900 900 ...

The actual key of the second ("a:") lookup can have the form a:5614,
a:5614:5624 or a:5614:5624:4611 depending on the record (micro-thesaurus
terms have just 5614, and descriptors have 5614 and 5624 or all of them,
depending on level).
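
As a rough Python sketch of those two in-memory lookups: one dict holds a
single value per key, the other collects a list of values per key. The
sample records and helper names are invented for illustration, and joining
the parent fields with ":" is an assumption read off the key forms listed
above.

```python
from collections import defaultdict


def parent_key(rec):
    """Build "a:5614[:5624[:4611]]" from the parts the record has."""
    parts = [rec[f] for f in ("5614", "5624", "4611") if rec.get(f)]
    return "a:" + ":".join(parts)


def build_mem_lookup(records):
    names = {}                    # d:<900>    => 250a (one value per key)
    children = defaultdict(list)  # a:<parent> => [900, 900, ...]
    for rec in records:
        names["d:" + rec["900"]] = rec["250a"]
        children[parent_key(rec)].append(rec["900"])
    return names, children


# Hypothetical thesaurus entries, invented for this sketch.
records = [
    {"900": "10", "250a": "Cereals", "5614": "251"},
    {"900": "11", "250a": "Legumes", "5614": "251"},
    {"900": "20", "250a": "Wheat", "5614": "251", "5624": "10"},
]
names, children = build_mem_lookup(records)
```

Here children["a:251"] ends up holding two ids because two entries share the
same parent, which is exactly the one key, many values case.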

Now, let's display some of those lookups.

First, we can display the ids of all fields which are children of field 251:

    <isis type="display" filter="mem_lookup">[a:251]</isis>

That's not very useful, because we would like to display terms, and not
ids, possibly separated by " * ".

    <isis type="display" filter="mem_lookup" delimiter=" * ">[d:[a:251]]</isis>
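
The nested form [d:[a:251]] is the indirect lookup described earlier: the
inner lookup returns ids, and each id is then looked up again to get its
name. A minimal Python sketch follows, with hypothetical data and with the
parent key passed in ready-made (how webpac substitutes field values into
the key is not covered here):

```python
def indirect_lookup(children, names, key, delimiter=" * "):
    """Resolve ids under `key`, then map every id to its display name."""
    ids = children.get(key, [])                    # inner step: [a:...]
    terms = [names["d:" + i] for i in ids
             if "d:" + i in names]                 # outer step: [d:...]
    return delimiter.join(terms)


# Hypothetical in-memory lookups, invented for this sketch.
children = {"a:251": ["10", "11"]}
names = {"d:10": "Cereals", "d:11": "Legumes"}

ids_only = " * ".join(children["a:251"])
display = indirect_lookup(children, names, "a:251")
no_children = indirect_lookup(children, names, "a:999")
```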

That's great. But let's link those fields using a format:

    <format name="IDths"><![CDATA[
    <a href="?rm=results&show_full=1&f=IDths&v=%s">%s</a>
    ]]></format>

    <isis type="display" format_name="IDths" format_delimiter=";;" filter="mem_lookup" delimiter=" * ">[a:251];;[d:[a:251]]</isis>
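
One way to read this tag: the part before ";;" supplies the ids, the part
after it supplies the terms, each id/term pair fills the two %s slots of
the format, and the results are joined with the delimiter. The Python
sketch below assumes exactly that pairing; it is an interpretation of the
example, not webpac's code, and the sample values are invented.

```python
# The format string is copied from the <format name="IDths"> example.
FORMAT = '<a href="?rm=results&show_full=1&f=IDths&v=%s">%s</a>'


def format_links(ids, terms, fmt=FORMAT, delimiter=" * "):
    """Fill the two %s slots once per (id, term) pair and join the links."""
    return delimiter.join(fmt % (i, t) for i, t in zip(ids, terms))


# Hypothetical values as the two lookups might return them.
links = format_links(["10", "11"], ["Cereals", "Legumes"])
```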


There is only one problem left. Since we want to display just the child
records of the current record, we have to use three different tags to
display child records (for field, micro-thesaurus and term). However, that
means that a term will also display all child fields and child
micro-thesaurus terms, which isn't what's needed.

But each record also has its own level written in 901a, so we can select
just the correct child entries using something like:

    <isis type="display" format_name="IDths" format_delimiter=";;" filter="mem_lookup" delimiter=" * ">eval{"901a" eq "Područje"}[a:251];;[d:[a:251]]</isis>
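
In effect the eval{} prefix acts as a guard: the tag produces output only
for records whose 901a equals the given level. The Python sketch below
shows that idea with made-up data. It assumes that a false condition yields
no output and that 251 refers to a field of the current record; neither is
spelled out in this document.

```python
def render_children(record, level, children, names, delimiter=" * "):
    """Return child terms only when the record's 901a equals `level`."""
    if record.get("901a") != level:
        return ""  # guard failed: this tag contributes nothing
    ids = children.get("a:" + record["251"], [])
    return delimiter.join(names["d:" + i] for i in ids)


# Hypothetical lookups and records, invented for this sketch.
children = {"a:251": ["10", "11"]}
names = {"d:10": "Cereals", "d:11": "Legumes"}

shown = render_children({"901a": "Područje", "251": "251"},
                        "Područje", children, names)
hidden = render_children({"901a": "Deskriptor", "251": "251"},
                         "Područje", children, names)
```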