Contents of /trunk/doc/lookup.txt

Revision 222 - (show annotations)
Sun Feb 8 15:44:28 2004 UTC (20 years, 1 month ago) by dpavlin
Documentation describing usage of lookups

1 How do I look up a value in my output?
2
3
4 You might want to use this feature if you want to display something that is
5 related to the current record.
6
7 All lookups are modelled around the key => value(s) idea, so you can store any
8 value attached to a unique key. Both the key and the value can contain fields
9 from any import format as well as fixed strings (delimiters, prefixes, etc.).
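
To make the key => value(s) idea concrete, here is a minimal Python sketch of
such a store. It is only an illustration, not how all2xml.pl implements it; the
names lookup, store and fetch and the sample keys are made up.

```python
from collections import defaultdict

# Hypothetical in-memory model: every key maps to a list of values.
lookup = defaultdict(list)

def store(key, value):
    """Attach one more value to a key."""
    lookup[key].append(value)

def fetch(key):
    """Return all values stored under a key (empty list if there are none)."""
    return lookup.get(key, [])

# A one key - one value entry, and a key that collects several values.
store("d:251", "Some term")
store("a:250", "251")
store("a:250", "252")
```

A key with a single value is simply a list of length one, so the same
structure can serve both kinds of lookup.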
10
11 First, the database that creates the key => value data must be specified in
12 all2xml.conf before the database that uses those values.
13
14 Second, that usually means you will need two database configurations in
15 all2xml.conf which point to the same database if you want to look up records
16 from that same database. I would suggest having two import_xml/ files: one
17 which just stores lookup keys and values (and thus executes faster) and
18 another which creates output for swish and the indexer and just uses the
19 lookup.
20
21
22 1. Lookup into another database (using type="lookup_key" and lookup="1")
23
24 For example (from import_xml/isis_hidra_ths.xml), the thesaurus has terms which
25 have unique identifiers in field 900, and we want to display those terms.
26
27 The bibliographic database (import_xml/isis_hidra_bib.xml) has just a field
28 which holds the value of field 900 from the thesaurus entry. While that's enough to create
29 links in search results (using links and format, see doc/links.txt), we would
30 like to display the term from the thesaurus and not the value of field 900.
31
32 In the first step, we store the fields from the thesaurus (as the value) that relate to
33 field 900 for that entry (which is the key) using the following XML (in
34 import_xml/isis_hidra_ths.xml):
35
36 <IDths name="ID" order="300">
37 <isis type="lookup_key">900</isis>
38 </IDths>
39
40 <SubjectIndex name="Predmetno kazalo" order="301">
41 <isis type="lookup_val">[5624] 562a</isis>
42 </SubjectIndex>
43
44 This will create a lookup which you might write like this:
45
46 900 => "[5624] 562a"
47
48 Quotes are added to denote that the value is a single entry.
49 We also have to specify something like this in all2xml.conf:
50
51 lookup_newfile=/data/webpac/thes.lookup
52
53 This will create a new lookup file.
54
55 For the bibliographic database, which will do lookups into the previously created file,
56 all2xml.conf must have:
57
58 lookup_open=/data/webpac/thes.lookup
59
60 and then in import_xml/ we use:
61
62 <isis lookup="1">6013</isis>
63
64 The value of field 6013 must exactly match field 900 (which is the key) from
65 thesaurus. You can, however, add an arbitrary prefix or suffix to store unrelated
66 keys and values in the same lookup.
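
The two steps above (store under key 900, then resolve from the bibliographic
record) can be sketched in Python as follows. This is a rough model under
stated assumptions, not the actual all2xml.pl code: it assumes that field
codes in a template are replaced by the record's values and every other
character is kept as-is, and all record values are invented sample data.

```python
import re

# A field code is three digits plus an optional subfield character,
# e.g. "900", "5624" or "562a" (assumed notation).
FIELD = re.compile(r"\d{3}[0-9a-z]?")

def fill(template, record):
    """Replace each field code in the template with the record's value."""
    return FIELD.sub(lambda m: record.get(m.group(0), ""), template)

def build_lookup(records, key_template, value_template):
    """First pass: store one value per key (what lookup_key/lookup_val describe)."""
    return {fill(key_template, r): fill(value_template, r) for r in records}

def resolve(lookup, record, field):
    """Second pass: use the content of `field` as the key; None if no exact match."""
    return lookup.get(record.get(field, ""))

# Invented sample data: one thesaurus entry and one bibliographic record.
thesaurus = [{"900": "T0001", "5624": "A.01", "562a": "Agriculture"}]
bib_record = {"6013": "T0001"}

thes_lookup = build_lookup(thesaurus, "900", "[5624] 562a")
term = resolve(thes_lookup, bib_record, "6013")
```

Using a key template such as "x:900" instead of "900" would add a prefix to
every key, which is how unrelated lookups can share one file in this model.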
67
68
69 1.1 NOTE about memory usage:
70
71 These lookups are created on disk. The default configuration also creates an
72 in-memory cache for faster indexing, which you can turn off by changing the line
73
74 my $use_lhash_cache = 1;
75
76 in all2xml.pl to
77
78 my $use_lhash_cache = 0;
79
80 You probably won't need to do that, so it's not a configuration option.
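
The trade-off between the disk lookup and the memory cache can be pictured
with this Python sketch. It is a hypothetical model built on Python's dbm
module, not a description of what all2xml.pl does internally; the class
DiskLookup, its use_cache flag and the sample data are made up.

```python
import dbm
import os
import tempfile

class DiskLookup:
    """Hypothetical disk-backed key => value store with an optional memory cache."""

    def __init__(self, path, use_cache=True):
        self.db = dbm.open(path, "c")   # create the file if it is missing
        self.cache = {} if use_cache else None

    def store(self, key, value):
        self.db[key.encode("utf-8")] = value.encode("utf-8")

    def fetch(self, key):
        if self.cache is not None and key in self.cache:
            return self.cache[key]      # served from memory, no disk access
        raw_key = key.encode("utf-8")
        value = self.db[raw_key].decode("utf-8") if raw_key in self.db else None
        if self.cache is not None:
            self.cache[key] = value     # remember the answer for next time
        return value

    def close(self):
        self.db.close()

with tempfile.TemporaryDirectory() as tmp:
    lookup = DiskLookup(os.path.join(tmp, "thes.lookup"))
    lookup.store("T0001", "Agriculture")
    first = lookup.fetch("T0001")       # read from disk, then cached
    second = lookup.fetch("T0001")      # answered from the cache
    cached_keys = list(lookup.cache)
    lookup.close()
```

With use_cache=False every fetch goes to disk, which saves memory at the cost
of slower indexing.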
81
82
83 2. Lookup that has to store more than one value
84
85 While the lookups described above are sufficient when you want to store just one
86 value associated with one key, they don't quite help if we need to have
87 more than one value for each key.
88
89 A typical example is displaying narrower terms in a thesaurus.
90 Each narrower term has the id of its parent term (which is enough to display
91 the narrower term), but we would also like to display all sibling terms with each
92 term.
93
94 So, under the key of the parent term we'll store the keys of all sibling terms.
95 But we would also like to display terms and not term numbers. That requires
96 first finding all sibling terms (a lookup returning one or more term ids)
97 and then looking up the names of the returned terms for display.
98
99 It's usually called an indirect lookup, and is much hated by CS majors in their
100 freshman year. Later, it becomes so natural that you think it's the only way
101 to solve a problem. So, you are stuck with it :-)
102
103 Since lookups can return more than one value, and we would like to use a format
104 to create links, this lookup is implemented as filter="mem_lookup". Let's
105 look at an example.
106
107 <LookupThesNT name="lookup for thesaurus narrow term">
108 <!--
109 Store value of field 250a (for display) in key composed
110 of prefix "d:" and value of field 900.
111 This is one key - one value lookup.
112 -->
113 <isis filter="mem_lookup" type="display">d:900 => 250a</isis>
114
115 <!--
116 Now, for each entry generate parent ID (using fields
117 5614, 5624, 4611 add prefix "a:" to it as a key)
118 and value of field 900 for value.
119 That will create a lookup which can (and will) have
120 more than one value for each key (because a parent
121 term can have more than one child).
122 -->
123 <isis filter="mem_lookup" type="display">a:5614:5624:4611 => 900</isis>
124
125 </LookupThesNT>
126
127 So, after we index the database with an import_xml file which has the mem_lookup filter (which won't
128 create any output for swish or the index), we have just two lookups stored in memory (that's
129 where the name mem_lookup comes from):
130
131 d:900 => 250a
132
133 a:5614:5624:4611 => 900 900 900 900 900 ...
134
135 The actual key of the second ("a:") lookup can have the form a:5614, a:5614:5624 or
136 a:5614:5624:4611 depending on the record (micro-thesaurus terms have just 5614,
137 and descriptors have 5614 and 5624 or all of them, depending on level).
138
139 Now, let's display some of those lookups.
140
141 First, we can display all ids of fields which are children of field 251:
142
143 <isis type="display" filter="mem_lookup">[a:251]</isis>
144
145 That's not very useful, because we would like to display terms, and not
146 ids, possibly separated by " * ".
147
148 <isis type="display" filter="mem_lookup" delimiter=" * ">[d:[a:251]]</isis>
149
150 That's great. But let's link those fields using a format:
151
152 <format name="IDths"><![CDATA[
153 <a href="?rm=results&show_full=1&f=IDths&v=%s">%s</a>
154 ]]></format>
155
156 <isis type="display" format_name="IDths" format_delimiter=";;" filter="mem_lookup" delimiter=" * ">[a:251];;[d:[a:251]]</isis>
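
The nested lookups, the delimiter and the format can be modelled with the
Python sketch below. It is one plausible reading of the examples above, not
the real mem_lookup code: it assumes that the text inside brackets is already
the final key, that the innermost brackets are resolved first, and that the
parts separated by format_delimiter are paired up value by value. The names
expand and render and the data in mem are made up.

```python
import re

# Matches an innermost [key] expression (no brackets inside).
INNER = re.compile(r"\[([^\[\]]+)\]")

def expand(expr, mem):
    """Resolve all [key] lookups in expr; return one string per combination of values."""
    m = INNER.search(expr)
    if m is None:
        return [expr]
    results = []
    for value in mem.get(m.group(1), []):
        results.extend(expand(expr[:m.start()] + value + expr[m.end():], mem))
    return results

def render(expr, mem, delimiter, fmt=None, format_delimiter=";;"):
    """Join the resolved values, optionally feeding them through a %s format."""
    if fmt is None:
        return delimiter.join(expand(expr, mem))
    columns = [expand(part, mem) for part in expr.split(format_delimiter)]
    return delimiter.join(fmt % row for row in zip(*columns))

# Invented lookup content.
mem = {"a:251": ["300", "301"], "d:300": ["Crops"], "d:301": ["Livestock"]}
fmt = '<a href="?rm=results&show_full=1&f=IDths&v=%s">%s</a>'

ids = render("[a:251]", mem, " * ")
names = render("[d:[a:251]]", mem, " * ")
linked = render("[a:251];;[d:[a:251]]", mem, " * ", fmt)
```

Here ids is the plain list of child ids, names is the list of terms, and
linked wraps each term in a link whose v= parameter is the matching id.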
157
158
159 There is only one problem left. Since we want to display just the child records
160 of the current record, we have to use three different tags to display child
161 records (for field, micro-thesaurus and term). However, that means that
162 a term will also display all child fields and child micro-thesaurus terms, which
163 isn't what we want.
164
165 But each record also has its own level written in 901a, so we can select
166 just the correct child entries using something like:
167
168 <isis type="display" format_name="IDths" format_delimiter=";;" filter="mem_lookup" delimiter=" * ">eval{"901a" eq "Područje"}[a:251];;[d:[a:251]]</isis>
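
The effect of the eval{} condition can be pictured with this short Python
sketch: the tag produces output only for records whose level in 901a matches.
It is a simplified model, not the actual filter code; it assumes the "a:" key
is just the record's own id (true only for top-level entries), and the
function name, the level name "other" and the sample data are invented.

```python
def children_line(record, mem, wanted_level, delimiter=" * "):
    """Return the child terms of a record, or "" if its level does not match."""
    if record.get("901a") != wanted_level:
        return ""  # the condition failed, so this tag displays nothing
    child_ids = mem.get("a:" + record["900"], [])
    names = [name for cid in child_ids for name in mem.get("d:" + cid, [])]
    return delimiter.join(names)

# Invented lookup content and records.
mem = {"a:251": ["300", "301"], "d:300": ["Crops"], "d:301": ["Livestock"]}
field_record = {"900": "251", "901a": "Područje"}
other_record = {"900": "251", "901a": "other"}

shown = children_line(field_record, mem, "Područje")
hidden = children_line(other_record, mem, "Područje")
```

With one such tag per level, each record displays only the children that
belong to its own level.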
169
