1 |
principles of a patchworked database: |
2 |
how OO-style inheritance works for data |
3 |
|
4 |
|
5 |
* references |
6 |
|
7 |
It is quite common for records in a database to refer in some way to other |
8 |
records. In fact, it seems to be such an interesting property, |
9 |
that the main classification of databases is along the lines of how |
10 |
they handle references: |
11 |
- in a hierarchical DB, every record has 1 or 0 parents, |
12 |
and any number of childs |
13 |
- in a networked DB, every record has a variable number of references |
14 |
(typically fix for a given type of record) |
15 |
- in a relational DB, every record has any number of references, |
16 |
which are determined by queries. References are constructed on the fly; |
17 |
if they are backed by some index and the query optimizer is able to |
18 |
figure that out, this works even for larger DBs. |
19 |
|
20 |
While basically every model can be emulated on top of every other, |
21 |
it is a matter of performance and convenience (i.e. performance counted |
22 |
by hours of programming) how well a given model of data access supports |
23 |
a given pattern of data usage (that's why you should stop thinking that |
24 |
RDBMS solve all problems). |
25 |
|
26 |
|
27 |
The magic working behind references is always the same: |
28 |
either a (more or less) physical key of the referred record |
29 |
or some logical key is stored in the referring record, |
30 |
the latter requiring translation by an index (typically B-Tree). |
31 |
In general, the key might as well be a RDF URI, |
32 |
resolved by a http or other server. |
33 |
|
34 |
|
35 |
The "physical" key is known as "master file number" (MFN) in ISIS databases, |
36 |
as "ROWID" or similar in most RDBMS. Using a ROWID is a gross violation |
37 |
of the principles of relational databases (Codd's axioms), |
38 |
and most RDBMS cannot retain a ROWID reference via export/import |
39 |
(i.e. DBs using such references can neither be reliably exported nor backed up, |
40 |
short to a DB shutdown and file backup, as stated by a major vendor). |
41 |
|
42 |
|
43 |
Hierarchical and networked DBs, on the other hand, have much stronger support |
44 |
for the fast physical key, but typically still allow for logical keys to |
45 |
reach via an index on any record. They just don't sport an elaborate |
46 |
and "standardized" query language like SQL giving convenient access |
47 |
to what does and doesn't work. It's up to the application programmer, |
48 |
rather than the query optimizer designer, to follow the paths of well |
49 |
supported references. |
50 |
|
51 |
|
52 |
ISIS databases are clearly in the latter (older) branch of the family of |
53 |
databases, believing more in the application programmer's knowledge |
54 |
than in artificial intelligence. Besides the conceptually simple and ultra-fast |
55 |
MFN reference, it also has a very flexible indexing mechanism, |
56 |
allowing for references to even be based on fulltext. |
57 |
|
58 |
|
59 |
We want to present a view on data, that has some interesting properties |
60 |
and (while it can be emulated on any DBMS) is particularily well supported |
61 |
on ISIS DBs: |
62 |
|
63 |
|
64 |
* the patch relation |
65 |
|
66 |
We propose a relation between records that transcedes a mere reference |
67 |
(independent of whether the underlying reference is based on a physical |
68 |
or logical key): |
69 |
- there is a patch operator |
70 |
that constructs a new record from referring and referred-to record |
71 |
- there is a diff operator |
72 |
that constructs a new record from referring and referred-to record, |
73 |
in such a way that the patch operator, applied to the latter and the result, |
74 |
will construct the referring record |
75 |
|
76 |
Basically, a very small record can tell a story like |
77 |
- I am related to that record |
78 |
- I am very similar to that record, but |
79 |
- I have these and those fields added/changed/deleted |
80 |
|
81 |
|
82 |
Everybody who is accustomed to tools like RCS or CVS (based on RCS) |
83 |
will get the idea of such a relation: |
84 |
depending on what is most commonly needed, either one or the other version |
85 |
can be stored much more compactly as just the diff to the other one. |
86 |
Or, if both are referred to oftenly, you can store them both at full |
87 |
life size, but put only the differences in the index. |
88 |
|
89 |
|
90 |
If the idea of applying diff and patch to database records seems strange, |
91 |
have a look at |
92 |
> Serialized a plain text representation for ISIS records. |
93 |
This paper does include a means to store patch "scripts" even at database |
94 |
master file level, thereby providing very efficient support for |
95 |
cross-referring patch records. |
96 |
|
97 |
|
98 |
However, the patch relation can as well very efficiently |
99 |
be implemented on top of a standard ISIS database. |
100 |
Some meta data entries (per field in the FDT) might be used to set |
101 |
convenient defaults for interpretation of references and patches: |
102 |
- to which database an entry in field x referres to |
103 |
- index tag and/or prefix to be used on lookup |
104 |
- whether field occurences in the patch record are additive or overriding |
105 |
|
106 |
In the absence of more specific instructions, |
107 |
the default patch operation is used by reading the referring record |
108 |
as a series of set statements against it's base: |
109 |
every field (tag) which is present in the referring record |
110 |
overrides all fields of the same tag in the base. |
111 |
Should there be multiple bases defined, they are by default added |
112 |
(i.e. concatenated). A multiplikative operation is modeled |
113 |
by chaining inheritance. |
114 |
|
115 |
|
116 |
In a way, every reference can be regarded as defining a patch relation: |
117 |
If you resolve the reference in a table join, you are actually creating |
118 |
a new virtual record, comprised of some fields of the referred and referring |
119 |
records. However, in the much more flexible ISIS data model, |
120 |
the creation of a new record is much more flexibel. |
121 |
|
122 |
|
123 |
* inheritance by patching |
124 |
|
125 |
One way to use patching is to think of a patch relation as being an |
126 |
inheritance relation: not by class, but by object! |
127 |
A record can inherit data fields from one or several other records, |
128 |
which still exists as independent entities. |
129 |
|
130 |
|
131 |
When the inheritance is resolved, some data from the parents is copied |
132 |
to the new virtual record, some is dropped or overwritten. |
133 |
An inheritance relation does not only apply to logically subordinate child |
134 |
records, it can also be used for versioning, with a successor inheriting |
135 |
from it's predecessor, while both are still available as independent entities. |
136 |
|
137 |
|
138 |
* example: the serial killer |
139 |
|
140 |
Management of serials is known as a notoriuosly hard problem. |
141 |
So let's have a look at how we could make a data model using |
142 |
record inheritance. |
143 |
|
144 |
|
145 |
We start out simple, with |
146 |
- one record for the serial, |
147 |
- one for each issue, inheriting from the serial |
148 |
- one for each copy, inheriting from the issue |
149 |
|
150 |
The issue adds a number, date, table of contents and so on to the serial. |
151 |
The copy adds some id to the issue, and maybe a note that it's in a poor |
152 |
state, since somebody poured coffee over it. |
153 |
|
154 |
|
155 |
But then things can get a little bit more complicated: |
156 |
- if the serial changes it's name or other attributes, |
157 |
a successor may inherit from the original one, |
158 |
and following issues inherit from the successor |
159 |
(while earlier issues still belong to the original) |
160 |
- if two serials are joined, a successor may inherit from both |
161 |
- sometimes two otherwise independent serials are printed |
162 |
together, so the actual issues inherit from both |
163 |
- an issue might have a reprint, |
164 |
which inherits from the original issue, |
165 |
changing the date and adding a new preface |
166 |
- if somebody ripped an article out of a given copy, |
167 |
that copy might adjust it's issues table of contents |
168 |
- if copies of several issues are bound together, |
169 |
you actually have only one copy, inheriting from several issues |
170 |
|
171 |
|
172 |
To summarize, a lot of complicated situations can be modelled in a quite |
173 |
natural fashion when thinking in terms of inheritance through patch relations, |
174 |
and these, in turn can be easily and efficiently be implemented based |
175 |
on ISIS record and database structures. |
176 |
|
177 |
If you think of modelling that in a relational database, |
178 |
you will probably find that you end up with |
179 |
> RdbConv mimicking |
180 |
the ISIS record. |
181 |
|
182 |
--- |
183 |
$Id: PatchWork.txt,v 1.2 2003/06/23 14:43:24 kripke Exp $ |