This is repository of my old source code which isn't updated any more. Go to git.rot13.org for current projects!
ViewVC logotype

Contents of /openisis/current/doc/RdbConv.txt

Parent Directory Parent Directory | Revision Log Revision Log

Revision 237 - (show annotations)
Mon Mar 8 17:43:12 2004 UTC (16 years, 7 months ago) by dpavlin
File MIME type: text/plain
File size: 5821 byte(s)
initial import of openisis 0.9.0 vendor drop

1 Representing ISIS data in a relational database
3 If you consider setting up a table with columns "author", "title"
4 and so on, don't do it. The following structure gives you at least
5 some of the benefits of ISIS, because it's shaped more closely
6 to the ISIS data model.
8 * table structure
10 It is not very difficult to think of a table structure
11 for lossless representation of the contents of an ISIS masterfile:
12 $
14 mfn number, -- master file number
15 fld number, -- counting fields in this masterfile record
16 tag number, -- the field tag
17 val varchar(2048) -- field's content
18 )
19 $
20 Notes:
21 - a row in this table does NOT hold an ISIS record,
22 but one field of an ISIS record
23 - each ISIS record is represented by all table rows
24 with the same mfn
25 - you may omit the column "fld" if you don't care for the ordering
26 - you may choose other datatypes for the field tag,
27 for example number(3) or char(3),
28 depending on the compatibility required (e.g. MARC)
29 - you may choose other dimensioning for the val column.
30 ISIS can hold up to 32KB (signed short) in one field,
31 Oracle has a maximum of 4KB on the varchar type.
33 * accessing the data
35 Retrieving row 123 is easily done with
36 $
37 SELECT tag, val WHERE mfn = 123 ORDER BY fld
38 $
39 It is suggested that you fill the val column with native
40 ISIS data as of "MPL" proof mode. Then you have to extract
41 subfields and perform other substitutions for output.
42 This can be achieved for example using the v-method of the
43 > /jdoc/org/openisis/Field.html Field
44 Java class (it's a static method of about 100 lines,
45 which you may grab and port to your system).
48 An alternative would be to pre-split the subfields
49 into a structure like
50 $
52 mfn number,
53 fld number,
54 tag number,
55 sub char(1), -- subfield code
56 val varchar(2048) -- subfield's content
57 )
58 $
59 A typical keywords field like
60 $
61 <deltas><humid zones><tropical zones>
62 $
63 should be split into three subfields using any subfield code, say '>'.
65 The advantage of the pre-split approach is that it makes both
66 display and indexing easier.
67 The disadvantage is, that you'll loose some information,
68 albeit not too much.
70 * searching
72 Searching is also relatively easy, as long as you ask simple questions ;).
73 To get a list of the mfns with a field 24 containing Twain, use
74 $
75 SELECT DISTINCT(mfn) FROM Mst WHERE val = 'Twain' AND tag = 24
76 $
77 For this to work efficiently, you have to put an index on val.
78 You may than search for 'Twain' with or without the restriction
79 on a certain tag, just like in ISIS.
80 You can also use right truncation and restrict on a couple of tags:
81 $
82 SELECT DISTINCT(mfn) FROM Mst WHERE val LIKE 'Twain%' AND tag IN (201,401)
83 $
84 Right truncation should still use the index on val,
85 thus be reasonable efficient (depending on how smart your RDB is).
86 With the pre-split structure as above, this applies to any subfield.
87 If you want to find Twain anywhere within the field's values, you need
88 $
90 $
91 This will perform a pretty costly full table scan,
92 so you may go fetch yourself a coffee meanwhile.
95 While you may use SQLs OR predicate to combine several terms as
96 alternatives, the ISIS AND does NOT map to SQLs AND,
97 but to an INTERSECTion on the mfn lists.
98 It's actually pretty much the same as ISIS does internally (but more costly).
100 * building an index
102 Still, this is lightyears away from the possibilities of an ISIS db.
103 The index on val does not do word splits, does not use stopwords,
104 is not case insensitive, does not apply character or word conversion
105 and so on.
107 To come anywhere near the features of ISIS, you have to do just
108 what ISIS does: set up a separate "Inverted File".
109 $
110 CREATE TABLE Index (
111 val varchar(30), -- index entry
112 mfn number, -- record containing entry
113 tag number, -- tag of field containing entry
114 occ number, -- field is occ'th occurence of field tag in record
115 pos number -- entry is pos'th word in field
116 )
117 $
118 To fill this table, you could of course go and reimplement
119 all the features of an ISIS .FST in, say, Oracle's PL/SQL
120 and recreate one record's entries from within an INSERT/UPDATE trigger.
122 Much easier, given you had a true ISIS database in the first place,
123 is to load the .LK1 and .LK2 text files created during full index update.
124 Some ISIS tools also allow you to create these files separately.
126 Searching is performed against the Index table
127 (which of course needs an RDB index at least on val),
128 not the Mst table.
129 The latter is only used to retrieve the full content for a given mfn,
130 or for full table scans.
132 * so why then use ISIS in the first place?
134 There is a long list of reasons, here go a few of them:
135 - You don't want to reimplement the .FST, the full formatting language,
136 all the nifty details of the various indexing techniques and so on.
137 Actually, if you DO want, you're very welcome in the OpenIsis team ;)
138 - Not only the index building, but also data entry is done much more
139 comfortable using ISIS tools.
140 - Any queries but the most simple examples given above will not only
141 be somewhat complex to write in SQL, but will also execute much slower
142 than in a native ISIS implementation (consider publishing with OpenIsis).
143 - Especially the NEAR operators can't easily be expressed in SQL.
144 Although the index structure given above contains the necessary data
145 in occ and pos, advanced SQL features like nested queries are needed
146 in the appropriate SQL query. You need to have an advanced (read:
147 expensive) RDBMS and a deep and thorough understanding of it's
148 query optimizer and hinting system for this to run in any
149 reasonable time.
150 - So why then use a RDBMS in the first place?
152 See also
153 > whatabout a comparision of ISIS and other database systems
154 and
155 > unirec notes on the flexibility of the ISIS data model.
156 ---
157 $Id: RdbConv.txt,v 1.4 2002/11/26 16:57:19 kripke Exp $

  ViewVC Help
Powered by ViewVC 1.1.26