Revision 237, Mon Mar 8 17:43:12 2004 UTC, by dpavlin: initial import of openisis 0.9.0 vendor drop

principles of a patchworked database:
how OO-style inheritance works for data


* references

It is quite common for records in a database to refer in some way to other
records. In fact, it seems to be such an interesting property
that the main classification of databases is along the lines of how
they handle references:
- in a hierarchical DB, every record has 1 or 0 parents
and any number of children
- in a networked DB, every record has a variable number of references
(typically fixed for a given type of record)
- in a relational DB, every record has any number of references,
which are determined by queries. References are constructed on the fly;
if they are backed by some index and the query optimizer is able to
figure that out, this works even for larger DBs.

While basically every model can be emulated on top of every other,
it is a matter of performance and convenience (i.e. performance counted
in hours of programming) how well a given model of data access supports
a given pattern of data usage (that's why you should stop thinking that
RDBMS solve all problems).


The magic working behind references is always the same:
either a (more or less) physical key of the referred record
or some logical key is stored in the referring record,
the latter requiring translation by an index (typically a B-Tree).
In general, the key might as well be an RDF URI,
resolved by an HTTP or other server.
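
As a minimal sketch of the two kinds of keys (in Python, with made-up
records): the dict key plays the role of the physical key, and a second dict
merely stands in for the index. The names `master`, `issn_index`, `by_mfn`
and `by_issn`, as well as the titles and ISSNs, are invented for illustration.

```python
# Hypothetical master file: the dict key plays the role of the MFN.
master = {
    1: {"title": "Annals of Patchwork", "issn": "0000-0001"},
    2: {"title": "Journal of Diffs", "issn": "0000-0002"},
}

# Logical key -> physical key. A real system would use a B-Tree here.
issn_index = {record["issn"]: mfn for mfn, record in master.items()}


def by_mfn(mfn):
    """Follow a physical reference: direct access, no index involved."""
    return master[mfn]


def by_issn(issn):
    """Follow a logical reference: translate it via the index first."""
    return by_mfn(issn_index[issn])


print(by_mfn(2)["title"])
print(by_issn("0000-0001")["title"])
```

Both calls end up at the same kind of record; the logical reference just pays
for one extra lookup, and in exchange survives a renumbering of the records.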


The "physical" key is known as "master file number" (MFN) in ISIS databases,
as "ROWID" or similar in most RDBMS. Using a ROWID is a gross violation
of the principles of relational databases (Codd's axioms),
and most RDBMS cannot retain a ROWID reference via export/import
(i.e. DBs using such references can neither be reliably exported nor backed up,
short of a DB shutdown and file backup, as stated by a major vendor).


Hierarchical and networked DBs, on the other hand, have much stronger support
for the fast physical key, but typically still allow any record to be
reached by a logical key via an index. They just don't sport an elaborate
and "standardized" query language like SQL giving convenient access
to what does and doesn't work. It's up to the application programmer,
rather than the query optimizer designer, to follow the paths of well
supported references.


ISIS databases are clearly in the latter (older) branch of the family of
databases, believing more in the application programmer's knowledge
than in artificial intelligence. Besides the conceptually simple and ultra-fast
MFN reference, they also have a very flexible indexing mechanism,
allowing for references to even be based on fulltext.


We want to present a view on data that has some interesting properties
and (while it can be emulated on any DBMS) is particularly well supported
on ISIS DBs:


* the patch relation

We propose a relation between records that transcends a mere reference
(independent of whether the underlying reference is based on a physical
or logical key):
- there is a patch operator
that constructs a new record from referring and referred-to record
- there is a diff operator
that constructs a new record from referring and referred-to record,
in such a way that the patch operator, applied to the latter and the result,
will construct the referring record

Basically, a very small record can tell a story like
- I am related to that record
- I am very similar to that record, but
- I have these and those fields added/changed/deleted
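
To make the two operators concrete, here is a minimal sketch in Python.
It rests on assumptions that are not part of the proposal itself: a record is
a dict mapping a field tag to its list of occurrences, an empty list in a
patch record means "delete this field", and the tags and values are made up.

```python
def diff(referring, referred):
    """Build a patch record that turns `referred` into `referring`."""
    patch_record = {}
    for tag, values in referring.items():
        if referred.get(tag) != values:
            patch_record[tag] = list(values)   # field added or changed
    for tag in referred:
        if tag not in referring:
            patch_record[tag] = []             # field deleted
    return patch_record


def patch(referred, patch_record):
    """Apply a patch record to `referred`, giving a new record."""
    result = {tag: list(values) for tag, values in referred.items()}
    for tag, values in patch_record.items():
        if values:
            result[tag] = list(values)
        else:
            result.pop(tag, None)
    return result


base = {10: ["Journal of Diffs"], 20: ["monthly"], 30: ["ed. A"]}
full = {10: ["Journal of Diffs"], 20: ["weekly"], 40: ["new note"]}

small = diff(full, base)
print(small)
print(patch(base, small) == full)
```

The record `small` carries only the changed, added and deleted fields, yet
patching the base with it gives back the full record. A real patch record
would also hold the reference to its base, which is left out here.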


Everybody who is accustomed to tools like RCS or CVS (based on RCS)
will get the idea of such a relation:
depending on what is most commonly needed, either one or the other version
can be stored much more compactly as just the diff to the other one.
Or, if both are referred to often, you can store them both at full
life size, but put only the differences in the index.


If the idea of applying diff and patch to database records seems strange,
have a look at
> Serialized a plain text representation for ISIS records.
That paper includes a means to store patch "scripts" even at database
master file level, thereby providing very efficient support for
cross-referring patch records.


However, the patch relation can just as well be implemented very efficiently
on top of a standard ISIS database.
Some meta data entries (per field in the FDT) might be used to set
convenient defaults for interpretation of references and patches:
- which database an entry in field x refers to
- index tag and/or prefix to be used on lookup
- whether field occurrences in the patch record are additive or overriding
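
Such per-field defaults could look roughly like the following Python sketch.
This is not the actual FDT format: the class `FieldMeta`, its attribute
names, the tags and the "ISSN=" prefix are all hypothetical, and treating a
missing prefix as "look up by MFN" is an assumption made here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldMeta:
    ref_db: Optional[str] = None      # database the field refers to, if any
    ref_prefix: Optional[str] = None  # index prefix for lookup; None: by MFN (assumed)
    additive: bool = False            # True: occurrences add; False: they override


# Hypothetical entries, keyed by field tag.
FIELD_META = {
    90: FieldMeta(ref_db="serials"),
    91: FieldMeta(ref_db="serials", ref_prefix="ISSN="),
    65: FieldMeta(additive=True),
}


def meta_for(tag):
    """Unknown tags fall back to the defaults: no reference, overriding."""
    return FIELD_META.get(tag, FieldMeta())


print(meta_for(91))
print(meta_for(10).additive)
```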

In the absence of more specific instructions,
the default patch operation reads the referring record
as a series of set statements against its base:
every field (tag) which is present in the referring record
overrides all fields of the same tag in the base.
Should there be multiple bases defined, they are by default added
(i.e. concatenated). A multiplicative operation is modeled
by chaining inheritance.
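
The default operation can be sketched in a few lines of Python, again
assuming that a record is a dict from tag to a list of occurrences. The
function name `resolve` and all sample records are made up, and the last
call is only one possible reading of "chaining".

```python
def resolve(referring, bases):
    """Default patch: add up the bases, then let the referring record override."""
    merged = {}
    for base in bases:
        for tag, values in base.items():
            merged.setdefault(tag, []).extend(values)   # bases are concatenated
    for tag, values in referring.items():
        merged[tag] = list(values)                      # same tag: override all
    return merged


serial_a = {10: ["Serial A"], 20: ["monthly"]}
serial_b = {10: ["Serial B"], 30: ["supplement"]}
joint = {40: ["joint issue 7"]}

print(resolve(joint, [serial_a, serial_b]))

# Chaining: the result of one resolution serves as the base of the next.
reprint = {20: ["quarterly"]}
print(resolve(reprint, [resolve(joint, [serial_a, serial_b])]))
```

With two bases, field 10 ends up with both titles, while the chained record
replaces field 20 and keeps everything else.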


In a way, every reference can be regarded as defining a patch relation:
if you resolve the reference in a table join, you are actually creating
a new virtual record, composed of some fields of the referred and referring
records. However, the much more flexible ISIS data model
leaves much more freedom in how the new record is created.


* inheritance by patching

One way to use patching is to think of a patch relation as being an
inheritance relation: not by class, but by object!
A record can inherit data fields from one or several other records,
which still exist as independent entities.


When the inheritance is resolved, some data from the parents is copied
to the new virtual record, some is dropped or overwritten.
An inheritance relation does not only apply to logically subordinate child
records, it can also be used for versioning, with a successor inheriting
from its predecessor, while both are still available as independent entities.


* example: the serial killer

Management of serials is known as a notoriously hard problem.
So let's have a look at how we could make a data model using
record inheritance.


We start out simple, with
- one record for the serial,
- one for each issue, inheriting from the serial
- one for each copy, inheriting from the issue

The issue adds a number, date, table of contents and so on to the serial.
The copy adds some id to the issue, and maybe a note that it's in a poor
state, since somebody poured coffee over it.


But then things can get a little bit more complicated:
- if the serial changes its name or other attributes,
a successor may inherit from the original one,
and following issues inherit from the successor
(while earlier issues still belong to the original)
- if two serials are joined, a successor may inherit from both
- sometimes two otherwise independent serials are printed
together, so the actual issues inherit from both
- an issue might have a reprint,
which inherits from the original issue,
changing the date and adding a new preface
- if somebody ripped an article out of a given copy,
that copy might adjust its issue's table of contents
- if copies of several issues are bound together,
you actually have only one copy, inheriting from several issues
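
A few of these cases can be played through in a small Python sketch.
Everything specific in it is an assumption: field 99 as the place where a
record names its base MFNs, the other tags, the sample data, and the absence
of cycles in the inheritance chain.

```python
BASE = 99  # hypothetical tag: MFN(s) of the record(s) inherited from

db = {
    1: {10: ["Journal of Diffs"], 20: ["monthly"]},                      # serial
    2: {BASE: ["1"], 30: ["No. 7"], 35: ["On patching", "On diffing"]},  # issue
    3: {BASE: ["1"], 30: ["No. 8"], 35: ["On merging"]},                 # issue
    4: {BASE: ["2"], 40: ["copy 0042"], 35: ["On patching"]},            # damaged copy
    5: {BASE: ["2", "3"], 40: ["bound volume 0007"]},                    # bound copies
}


def resolve(mfn):
    """Build the virtual record for `mfn` by resolving its bases first."""
    record = db[mfn]
    merged = {}
    for base_mfn in record.get(BASE, []):
        for tag, values in resolve(int(base_mfn)).items():
            merged.setdefault(tag, []).extend(values)
    for tag, values in record.items():
        if tag != BASE:
            merged[tag] = list(values)
    return merged


print(resolve(4))
print(resolve(5))
```

Record 4 is the copy with an article ripped out: it overrides the table of
contents (field 35) of its issue. Record 5 is the bound volume and collects
the numbers and contents of both issues. With this naive concatenation it
also gets the serial's fields once per issue, so a real implementation would
probably want to fold such duplicates.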


To summarize, a lot of complicated situations can be modelled in a quite
natural fashion when thinking in terms of inheritance through patch relations,
and these, in turn, can be implemented easily and efficiently based
on ISIS record and database structures.

If you think of modelling that in a relational database,
you will probably find that you end up with
> RdbConv mimicking
the ISIS record.

---
$Id: PatchWork.txt,v 1.2 2003/06/23 14:43:24 kripke Exp $
