/[webpac]/openisis/current/doc/RdbConv.txt
This is repository of my old source code which isn't updated any more. Go to git.rot13.org for current projects!
ViewVC logotype

Annotation of /openisis/current/doc/RdbConv.txt

Parent Directory Parent Directory | Revision Log Revision Log


Revision 237 - (hide annotations)
Mon Mar 8 17:43:12 2004 UTC (20 years, 1 month ago) by dpavlin
File MIME type: text/plain
File size: 5821 byte(s)
initial import of openisis 0.9.0 vendor drop

1 dpavlin 237 Representing ISIS data in a relational database
2    
3     If you consider setting up a table with columns "author", "title"
4     and so on, don't do it. The following structure gives you at least
5     some of the benefits of ISIS, because it's shaped more closely
6     to the ISIS data model.
7    
8     * table structure
9    
10     It is not very difficult to think of a table structure
11     for lossless representation of the contents of an ISIS masterfile:
12     $
13     CREATE TABLE Mst (
14     mfn number, -- master file number
15     fld number, -- counting fields in this masterfile record
16     tag number, -- the field tag
17     val varchar(2048) -- field's content
18     )
19     $
20     Notes:
21     - a row in this table does NOT hold an ISIS record,
22     but one field of an ISIS record
23     - each ISIS record is represented by all table rows
24     with the same mfn
25     - you may omit the column "fld" if you don't care for the ordering
26     - you may choose other datatypes for the field tag,
27     for example number(3) or char(3),
28     depending on the compatibility required (e.g. MARC)
29     - you may choose other dimensioning for the val column.
30     ISIS can hold up to 32KB (signed short) in one field,
31     Oracle has a maximum of 4KB on the varchar type.
32    
33     * accessing the data
34    
35     Retrieving row 123 is easily done with
36     $
37     SELECT tag, val WHERE mfn = 123 ORDER BY fld
38     $
39     It is suggested that you fill the val column with native
40     ISIS data as of "MPL" proof mode. Then you have to extract
41     subfields and perform other substitutions for output.
42     This can be achieved for example using the v-method of the
43     > /jdoc/org/openisis/Field.html Field
44     Java class (it's a static method of about 100 lines,
45     which you may grab and port to your system).
46    
47    
48     An alternative would be to pre-split the subfields
49     into a structure like
50     $
51     CREATE TABLE Mst (
52     mfn number,
53     fld number,
54     tag number,
55     sub char(1), -- subfield code
56     val varchar(2048) -- subfield's content
57     )
58     $
59     A typical keywords field like
60     $
61     <deltas><humid zones><tropical zones>
62     $
63     should be split into three subfields using any subfield code, say '>'.
64    
65     The advantage of the pre-split approach is that it makes both
66     display and indexing easier.
67     The disadvantage is, that you'll loose some information,
68     albeit not too much.
69    
70     * searching
71    
72     Searching is also relatively easy, as long as you ask simple questions ;).
73     To get a list of the mfns with a field 24 containing Twain, use
74     $
75     SELECT DISTINCT(mfn) FROM Mst WHERE val = 'Twain' AND tag = 24
76     $
77     For this to work efficiently, you have to put an index on val.
78     You may than search for 'Twain' with or without the restriction
79     on a certain tag, just like in ISIS.
80     You can also use right truncation and restrict on a couple of tags:
81     $
82     SELECT DISTINCT(mfn) FROM Mst WHERE val LIKE 'Twain%' AND tag IN (201,401)
83     $
84     Right truncation should still use the index on val,
85     thus be reasonable efficient (depending on how smart your RDB is).
86     With the pre-split structure as above, this applies to any subfield.
87     If you want to find Twain anywhere within the field's values, you need
88     $
89     SELECT DISTINCT(mfn) FROM Mst WHERE val LIKE '%Twain%'
90     $
91     This will perform a pretty costly full table scan,
92     so you may go fetch yourself a coffee meanwhile.
93    
94    
95     While you may use SQLs OR predicate to combine several terms as
96     alternatives, the ISIS AND does NOT map to SQLs AND,
97     but to an INTERSECTion on the mfn lists.
98     It's actually pretty much the same as ISIS does internally (but more costly).
99    
100     * building an index
101    
102     Still, this is lightyears away from the possibilities of an ISIS db.
103     The index on val does not do word splits, does not use stopwords,
104     is not case insensitive, does not apply character or word conversion
105     and so on.
106    
107     To come anywhere near the features of ISIS, you have to do just
108     what ISIS does: set up a separate "Inverted File".
109     $
110     CREATE TABLE Index (
111     val varchar(30), -- index entry
112     mfn number, -- record containing entry
113     tag number, -- tag of field containing entry
114     occ number, -- field is occ'th occurence of field tag in record
115     pos number -- entry is pos'th word in field
116     )
117     $
118     To fill this table, you could of course go and reimplement
119     all the features of an ISIS .FST in, say, Oracle's PL/SQL
120     and recreate one record's entries from within an INSERT/UPDATE trigger.
121    
122     Much easier, given you had a true ISIS database in the first place,
123     is to load the .LK1 and .LK2 text files created during full index update.
124     Some ISIS tools also allow you to create these files separately.
125    
126     Searching is performed against the Index table
127     (which of course needs an RDB index at least on val),
128     not the Mst table.
129     The latter is only used to retrieve the full content for a given mfn,
130     or for full table scans.
131    
132     * so why then use ISIS in the first place?
133    
134     There is a long list of reasons, here go a few of them:
135     - You don't want to reimplement the .FST, the full formatting language,
136     all the nifty details of the various indexing techniques and so on.
137     Actually, if you DO want, you're very welcome in the OpenIsis team ;)
138     - Not only the index building, but also data entry is done much more
139     comfortable using ISIS tools.
140     - Any queries but the most simple examples given above will not only
141     be somewhat complex to write in SQL, but will also execute much slower
142     than in a native ISIS implementation (consider publishing with OpenIsis).
143     - Especially the NEAR operators can't easily be expressed in SQL.
144     Although the index structure given above contains the necessary data
145     in occ and pos, advanced SQL features like nested queries are needed
146     in the appropriate SQL query. You need to have an advanced (read:
147     expensive) RDBMS and a deep and thorough understanding of it's
148     query optimizer and hinting system for this to run in any
149     reasonable time.
150     - So why then use a RDBMS in the first place?
151    
152     See also
153     > whatabout a comparision of ISIS and other database systems
154     and
155     > unirec notes on the flexibility of the ISIS data model.
156     ---
157     $Id: RdbConv.txt,v 1.4 2002/11/26 16:57:19 kripke Exp $

  ViewVC Help
Powered by ViewVC 1.1.26