/[webpac]/openisis/current/doc/PatchWork.txt
This is repository of my old source code which isn't updated any more. Go to git.rot13.org for current projects!
ViewVC logotype

Contents of /openisis/current/doc/PatchWork.txt

Parent Directory Parent Directory | Revision Log Revision Log


Revision 237 - (show annotations)
Mon Mar 8 17:43:12 2004 UTC (20 years, 1 month ago) by dpavlin
File MIME type: text/plain
File size: 7709 byte(s)
initial import of openisis 0.9.0 vendor drop

1 principles of a patchworked database:
2 how OO-style inheritance works for data
3
4
5 * references
6
7 It is quite common for records in a database to refer in some way to other
8 records. In fact, it seems to be such an interesting property,
9 that the main classification of databases is along the lines of how
10 they handle references:
11 - in a hierarchical DB, every record has 1 or 0 parents,
12 and any number of childs
13 - in a networked DB, every record has a variable number of references
14 (typically fix for a given type of record)
15 - in a relational DB, every record has any number of references,
16 which are determined by queries. References are constructed on the fly;
17 if they are backed by some index and the query optimizer is able to
18 figure that out, this works even for larger DBs.
19
20 While basically every model can be emulated on top of every other,
21 it is a matter of performance and convenience (i.e. performance counted
22 by hours of programming) how well a given model of data access supports
23 a given pattern of data usage (that's why you should stop thinking that
24 RDBMS solve all problems).
25
26
27 The magic working behind references is always the same:
28 either a (more or less) physical key of the referred record
29 or some logical key is stored in the referring record,
30 the latter requiring translation by an index (typically B-Tree).
31 In general, the key might as well be a RDF URI,
32 resolved by a http or other server.
33
34
35 The "physical" key is known as "master file number" (MFN) in ISIS databases,
36 as "ROWID" or similar in most RDBMS. Using a ROWID is a gross violation
37 of the principles of relational databases (Codd's axioms),
38 and most RDBMS cannot retain a ROWID reference via export/import
39 (i.e. DBs using such references can neither be reliably exported nor backed up,
40 short to a DB shutdown and file backup, as stated by a major vendor).
41
42
43 Hierarchical and networked DBs, on the other hand, have much stronger support
44 for the fast physical key, but typically still allow for logical keys to
45 reach via an index on any record. They just don't sport an elaborate
46 and "standardized" query language like SQL giving convenient access
47 to what does and doesn't work. It's up to the application programmer,
48 rather than the query optimizer designer, to follow the paths of well
49 supported references.
50
51
52 ISIS databases are clearly in the latter (older) branch of the family of
53 databases, believing more in the application programmer's knowledge
54 than in artificial intelligence. Besides the conceptually simple and ultra-fast
55 MFN reference, it also has a very flexible indexing mechanism,
56 allowing for references to even be based on fulltext.
57
58
59 We want to present a view on data, that has some interesting properties
60 and (while it can be emulated on any DBMS) is particularily well supported
61 on ISIS DBs:
62
63
64 * the patch relation
65
66 We propose a relation between records that transcedes a mere reference
67 (independent of whether the underlying reference is based on a physical
68 or logical key):
69 - there is a patch operator
70 that constructs a new record from referring and referred-to record
71 - there is a diff operator
72 that constructs a new record from referring and referred-to record,
73 in such a way that the patch operator, applied to the latter and the result,
74 will construct the referring record
75
76 Basically, a very small record can tell a story like
77 - I am related to that record
78 - I am very similar to that record, but
79 - I have these and those fields added/changed/deleted
80
81
82 Everybody who is accustomed to tools like RCS or CVS (based on RCS)
83 will get the idea of such a relation:
84 depending on what is most commonly needed, either one or the other version
85 can be stored much more compactly as just the diff to the other one.
86 Or, if both are referred to oftenly, you can store them both at full
87 life size, but put only the differences in the index.
88
89
90 If the idea of applying diff and patch to database records seems strange,
91 have a look at
92 > Serialized a plain text representation for ISIS records.
93 This paper does include a means to store patch "scripts" even at database
94 master file level, thereby providing very efficient support for
95 cross-referring patch records.
96
97
98 However, the patch relation can as well very efficiently
99 be implemented on top of a standard ISIS database.
100 Some meta data entries (per field in the FDT) might be used to set
101 convenient defaults for interpretation of references and patches:
102 - to which database an entry in field x referres to
103 - index tag and/or prefix to be used on lookup
104 - whether field occurences in the patch record are additive or overriding
105
106 In the absence of more specific instructions,
107 the default patch operation is used by reading the referring record
108 as a series of set statements against it's base:
109 every field (tag) which is present in the referring record
110 overrides all fields of the same tag in the base.
111 Should there be multiple bases defined, they are by default added
112 (i.e. concatenated). A multiplikative operation is modeled
113 by chaining inheritance.
114
115
116 In a way, every reference can be regarded as defining a patch relation:
117 If you resolve the reference in a table join, you are actually creating
118 a new virtual record, comprised of some fields of the referred and referring
119 records. However, in the much more flexible ISIS data model,
120 the creation of a new record is much more flexibel.
121
122
123 * inheritance by patching
124
125 One way to use patching is to think of a patch relation as being an
126 inheritance relation: not by class, but by object!
127 A record can inherit data fields from one or several other records,
128 which still exists as independent entities.
129
130
131 When the inheritance is resolved, some data from the parents is copied
132 to the new virtual record, some is dropped or overwritten.
133 An inheritance relation does not only apply to logically subordinate child
134 records, it can also be used for versioning, with a successor inheriting
135 from it's predecessor, while both are still available as independent entities.
136
137
138 * example: the serial killer
139
140 Management of serials is known as a notoriuosly hard problem.
141 So let's have a look at how we could make a data model using
142 record inheritance.
143
144
145 We start out simple, with
146 - one record for the serial,
147 - one for each issue, inheriting from the serial
148 - one for each copy, inheriting from the issue
149
150 The issue adds a number, date, table of contents and so on to the serial.
151 The copy adds some id to the issue, and maybe a note that it's in a poor
152 state, since somebody poured coffee over it.
153
154
155 But then things can get a little bit more complicated:
156 - if the serial changes it's name or other attributes,
157 a successor may inherit from the original one,
158 and following issues inherit from the successor
159 (while earlier issues still belong to the original)
160 - if two serials are joined, a successor may inherit from both
161 - sometimes two otherwise independent serials are printed
162 together, so the actual issues inherit from both
163 - an issue might have a reprint,
164 which inherits from the original issue,
165 changing the date and adding a new preface
166 - if somebody ripped an article out of a given copy,
167 that copy might adjust it's issues table of contents
168 - if copies of several issues are bound together,
169 you actually have only one copy, inheriting from several issues
170
171
172 To summarize, a lot of complicated situations can be modelled in a quite
173 natural fashion when thinking in terms of inheritance through patch relations,
174 and these, in turn can be easily and efficiently be implemented based
175 on ISIS record and database structures.
176
177 If you think of modelling that in a relational database,
178 you will probably find that you end up with
179 > RdbConv mimicking
180 the ISIS record.
181
182 ---
183 $Id: PatchWork.txt,v 1.2 2003/06/23 14:43:24 kripke Exp $

  ViewVC Help
Powered by ViewVC 1.1.26