1 |
dpavlin |
237 |
principles of a patchworked database: |
2 |
|
|
how OO-style inheritance works for data |
3 |
|
|
|
4 |
|
|
|
5 |
|
|
* references |
6 |
|
|
|
7 |
|
|
It is quite common for records in a database to refer in some way to other |
8 |
|
|
records. In fact, it seems to be such an interesting property, |
9 |
|
|
that the main classification of databases is along the lines of how |
10 |
|
|
they handle references: |
11 |
|
|
- in a hierarchical DB, every record has 1 or 0 parents, |
12 |
|
|
and any number of childs |
13 |
|
|
- in a networked DB, every record has a variable number of references |
14 |
|
|
(typically fix for a given type of record) |
15 |
|
|
- in a relational DB, every record has any number of references, |
16 |
|
|
which are determined by queries. References are constructed on the fly; |
17 |
|
|
if they are backed by some index and the query optimizer is able to |
18 |
|
|
figure that out, this works even for larger DBs. |
19 |
|
|
|
20 |
|
|
While basically every model can be emulated on top of every other, |
21 |
|
|
it is a matter of performance and convenience (i.e. performance counted |
22 |
|
|
by hours of programming) how well a given model of data access supports |
23 |
|
|
a given pattern of data usage (that's why you should stop thinking that |
24 |
|
|
RDBMS solve all problems). |
25 |
|
|
|
26 |
|
|
|
27 |
|
|
The magic working behind references is always the same: |
28 |
|
|
either a (more or less) physical key of the referred record |
29 |
|
|
or some logical key is stored in the referring record, |
30 |
|
|
the latter requiring translation by an index (typically B-Tree). |
31 |
|
|
In general, the key might as well be a RDF URI, |
32 |
|
|
resolved by a http or other server. |
33 |
|
|
|
34 |
|
|
|
35 |
|
|
The "physical" key is known as "master file number" (MFN) in ISIS databases, |
36 |
|
|
as "ROWID" or similar in most RDBMS. Using a ROWID is a gross violation |
37 |
|
|
of the principles of relational databases (Codd's axioms), |
38 |
|
|
and most RDBMS cannot retain a ROWID reference via export/import |
39 |
|
|
(i.e. DBs using such references can neither be reliably exported nor backed up, |
40 |
|
|
short to a DB shutdown and file backup, as stated by a major vendor). |
41 |
|
|
|
42 |
|
|
|
43 |
|
|
Hierarchical and networked DBs, on the other hand, have much stronger support |
44 |
|
|
for the fast physical key, but typically still allow for logical keys to |
45 |
|
|
reach via an index on any record. They just don't sport an elaborate |
46 |
|
|
and "standardized" query language like SQL giving convenient access |
47 |
|
|
to what does and doesn't work. It's up to the application programmer, |
48 |
|
|
rather than the query optimizer designer, to follow the paths of well |
49 |
|
|
supported references. |
50 |
|
|
|
51 |
|
|
|
52 |
|
|
ISIS databases are clearly in the latter (older) branch of the family of |
53 |
|
|
databases, believing more in the application programmer's knowledge |
54 |
|
|
than in artificial intelligence. Besides the conceptually simple and ultra-fast |
55 |
|
|
MFN reference, it also has a very flexible indexing mechanism, |
56 |
|
|
allowing for references to even be based on fulltext. |
57 |
|
|
|
58 |
|
|
|
59 |
|
|
We want to present a view on data, that has some interesting properties |
60 |
|
|
and (while it can be emulated on any DBMS) is particularily well supported |
61 |
|
|
on ISIS DBs: |
62 |
|
|
|
63 |
|
|
|
64 |
|
|
* the patch relation |
65 |
|
|
|
66 |
|
|
We propose a relation between records that transcedes a mere reference |
67 |
|
|
(independent of whether the underlying reference is based on a physical |
68 |
|
|
or logical key): |
69 |
|
|
- there is a patch operator |
70 |
|
|
that constructs a new record from referring and referred-to record |
71 |
|
|
- there is a diff operator |
72 |
|
|
that constructs a new record from referring and referred-to record, |
73 |
|
|
in such a way that the patch operator, applied to the latter and the result, |
74 |
|
|
will construct the referring record |
75 |
|
|
|
76 |
|
|
Basically, a very small record can tell a story like |
77 |
|
|
- I am related to that record |
78 |
|
|
- I am very similar to that record, but |
79 |
|
|
- I have these and those fields added/changed/deleted |
80 |
|
|
|
81 |
|
|
|
82 |
|
|
Everybody who is accustomed to tools like RCS or CVS (based on RCS) |
83 |
|
|
will get the idea of such a relation: |
84 |
|
|
depending on what is most commonly needed, either one or the other version |
85 |
|
|
can be stored much more compactly as just the diff to the other one. |
86 |
|
|
Or, if both are referred to oftenly, you can store them both at full |
87 |
|
|
life size, but put only the differences in the index. |
88 |
|
|
|
89 |
|
|
|
90 |
|
|
If the idea of applying diff and patch to database records seems strange, |
91 |
|
|
have a look at |
92 |
|
|
> Serialized a plain text representation for ISIS records. |
93 |
|
|
This paper does include a means to store patch "scripts" even at database |
94 |
|
|
master file level, thereby providing very efficient support for |
95 |
|
|
cross-referring patch records. |
96 |
|
|
|
97 |
|
|
|
98 |
|
|
However, the patch relation can as well very efficiently |
99 |
|
|
be implemented on top of a standard ISIS database. |
100 |
|
|
Some meta data entries (per field in the FDT) might be used to set |
101 |
|
|
convenient defaults for interpretation of references and patches: |
102 |
|
|
- to which database an entry in field x referres to |
103 |
|
|
- index tag and/or prefix to be used on lookup |
104 |
|
|
- whether field occurences in the patch record are additive or overriding |
105 |
|
|
|
106 |
|
|
In the absence of more specific instructions, |
107 |
|
|
the default patch operation is used by reading the referring record |
108 |
|
|
as a series of set statements against it's base: |
109 |
|
|
every field (tag) which is present in the referring record |
110 |
|
|
overrides all fields of the same tag in the base. |
111 |
|
|
Should there be multiple bases defined, they are by default added |
112 |
|
|
(i.e. concatenated). A multiplikative operation is modeled |
113 |
|
|
by chaining inheritance. |
114 |
|
|
|
115 |
|
|
|
116 |
|
|
In a way, every reference can be regarded as defining a patch relation: |
117 |
|
|
If you resolve the reference in a table join, you are actually creating |
118 |
|
|
a new virtual record, comprised of some fields of the referred and referring |
119 |
|
|
records. However, in the much more flexible ISIS data model, |
120 |
|
|
the creation of a new record is much more flexibel. |
121 |
|
|
|
122 |
|
|
|
123 |
|
|
* inheritance by patching |
124 |
|
|
|
125 |
|
|
One way to use patching is to think of a patch relation as being an |
126 |
|
|
inheritance relation: not by class, but by object! |
127 |
|
|
A record can inherit data fields from one or several other records, |
128 |
|
|
which still exists as independent entities. |
129 |
|
|
|
130 |
|
|
|
131 |
|
|
When the inheritance is resolved, some data from the parents is copied |
132 |
|
|
to the new virtual record, some is dropped or overwritten. |
133 |
|
|
An inheritance relation does not only apply to logically subordinate child |
134 |
|
|
records, it can also be used for versioning, with a successor inheriting |
135 |
|
|
from it's predecessor, while both are still available as independent entities. |
136 |
|
|
|
137 |
|
|
|
138 |
|
|
* example: the serial killer |
139 |
|
|
|
140 |
|
|
Management of serials is known as a notoriuosly hard problem. |
141 |
|
|
So let's have a look at how we could make a data model using |
142 |
|
|
record inheritance. |
143 |
|
|
|
144 |
|
|
|
145 |
|
|
We start out simple, with |
146 |
|
|
- one record for the serial, |
147 |
|
|
- one for each issue, inheriting from the serial |
148 |
|
|
- one for each copy, inheriting from the issue |
149 |
|
|
|
150 |
|
|
The issue adds a number, date, table of contents and so on to the serial. |
151 |
|
|
The copy adds some id to the issue, and maybe a note that it's in a poor |
152 |
|
|
state, since somebody poured coffee over it. |
153 |
|
|
|
154 |
|
|
|
155 |
|
|
But then things can get a little bit more complicated: |
156 |
|
|
- if the serial changes it's name or other attributes, |
157 |
|
|
a successor may inherit from the original one, |
158 |
|
|
and following issues inherit from the successor |
159 |
|
|
(while earlier issues still belong to the original) |
160 |
|
|
- if two serials are joined, a successor may inherit from both |
161 |
|
|
- sometimes two otherwise independent serials are printed |
162 |
|
|
together, so the actual issues inherit from both |
163 |
|
|
- an issue might have a reprint, |
164 |
|
|
which inherits from the original issue, |
165 |
|
|
changing the date and adding a new preface |
166 |
|
|
- if somebody ripped an article out of a given copy, |
167 |
|
|
that copy might adjust it's issues table of contents |
168 |
|
|
- if copies of several issues are bound together, |
169 |
|
|
you actually have only one copy, inheriting from several issues |
170 |
|
|
|
171 |
|
|
|
172 |
|
|
To summarize, a lot of complicated situations can be modelled in a quite |
173 |
|
|
natural fashion when thinking in terms of inheritance through patch relations, |
174 |
|
|
and these, in turn can be easily and efficiently be implemented based |
175 |
|
|
on ISIS record and database structures. |
176 |
|
|
|
177 |
|
|
If you think of modelling that in a relational database, |
178 |
|
|
you will probably find that you end up with |
179 |
|
|
> RdbConv mimicking |
180 |
|
|
the ISIS record. |
181 |
|
|
|
182 |
|
|
--- |
183 |
|
|
$Id: PatchWork.txt,v 1.2 2003/06/23 14:43:24 kripke Exp $ |