Annotation of /openisis/current/doc/whatabout.txt


Revision 237
Mon Mar 8 17:43:12 2004 UTC (20 years, 1 month ago) by dpavlin
File MIME type: text/plain
File size: 13521 byte(s)
initial import of openisis 0.9.0 vendor drop

1 dpavlin 237 * what makes ISIS ISIS ?
2    
3     Andrew Giles-Peters raised the important question
4     "What is it about ISIS that makes it ISIS?"
5    
6    
7     So here are some thoughts on this topic from the OpenIsis team:
8    
9     - As a database used for bibliographic data (among others),
10     ISIS must be able to store and retrieve records as exchanged via
11     ISO2709 efficiently and with no or minimal loss of information.
12     - Besides the ability to retrieve records by number,
13     ISIS must support an indexing mechanism which is essentially
14     "function based", that is, index entries are not the immediate
15     field values, but rather the values of a "view" derived by
16     some computation are indexed.
17     - ISIS must efficiently support typical query elements commonly
18     used on bibliographic databases, like looking up a value without
19     regard for the field or in several fields at once, and specifying
20     the distance within which search terms should occur.
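
The second and third requirements can be illustrated with a small sketch.
It is not OpenIsis code: the record layout, the tag numbers and all function
names are assumptions made up for this example. Index entries are derived by
a "view" function instead of being the raw field values, and each posting
keeps the field and word position so that any-field, per-field and
distance queries can be answered.

```python
from collections import defaultdict

# Hypothetical record layout: MFN -> list of (tag, value) pairs.
records = {
    1: [(100, "Giles-Peters, Andrew"), (245, "What makes ISIS ISIS")],
    2: [(100, "Kripke, Saul"), (245, "Naming and necessity")],
}

def words_view(tag, value):
    """A 'view' function: derive index terms (uppercased words) from a field."""
    cleaned = "".join(c if c.isalnum() else " " for c in value)
    return [w.upper() for w in cleaned.split()]

def build_index(records, view):
    """Map each derived term to postings of (mfn, tag, word position)."""
    index = defaultdict(list)
    for mfn, fields in records.items():
        for tag, value in fields:
            for pos, term in enumerate(view(tag, value)):
                index[term].append((mfn, tag, pos))
    return index

def lookup(index, term, tags=None):
    """MFNs containing term, in any field or only in the given tags."""
    return sorted({m for m, t, _ in index.get(term, ())
                   if tags is None or t in tags})

def near(index, a, b, distance):
    """MFNs where a and b occur in the same field within `distance` words."""
    hits = set()
    for m1, t1, p1 in index.get(a, ()):
        for m2, t2, p2 in index.get(b, ()):
            if (m1, t1) == (m2, t2) and abs(p1 - p2) <= distance:
                hits.add(m1)
    return sorted(hits)

index = build_index(records, words_view)
print(lookup(index, "ISIS"))                # any field
print(lookup(index, "KRIPKE", tags={245}))  # restricted to one field
print(near(index, "WHAT", "ISIS", 2))       # distance query
```

Swapping `words_view` for another function (whole field, first subfield,
a computed key) changes what gets indexed without touching the records.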
21    
22     Since these are minimal requirements,
23     they would not stop anybody from adding tons of features on top.
24     For example, it's relatively easy to store ISO2709 data in a
25     relational database like Sybase (used by OCLC/Pica),
26     each record covering several rows (mfn, field number, field occ, value),
27     then compute a second similar table for the index and so on.
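
A minimal sketch of that row layout follows. It uses SQLite from the Python
standard library as a stand-in for Sybase, and the table name, the sample
record and the helper `to_rows` are assumptions for illustration only.

```python
import sqlite3

# Hypothetical record: MFN 1 with a repeated field 700.
record = {"mfn": 1, "fields": [(245, "What makes ISIS ISIS"),
                               (700, "Giles-Peters, Andrew"),
                               (700, "OpenIsis team")]}

def to_rows(record):
    """Flatten one record into (mfn, field number, field occ, value) rows."""
    seen = {}
    rows = []
    for tag, value in record["fields"]:
        occ = seen.get(tag, 0) + 1   # occurrence counter per field number
        seen[tag] = occ
        rows.append((record["mfn"], tag, occ, value))
    return rows

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE master "
           "(mfn INTEGER, tag INTEGER, occ INTEGER, value TEXT)")
db.executemany("INSERT INTO master VALUES (?, ?, ?, ?)", to_rows(record))

# Reassembling a single record already takes a multi-row query.
rows = db.execute(
    "SELECT tag, occ, value FROM master WHERE mfn = ? ORDER BY rowid", (1,)
).fetchall()
print(rows)
```

The index table mentioned above would have the same shape, with derived
terms in place of the field values.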
28    
29    
30     However, there is the word "efficiently",
31     which practically turns out to put some restrictions on the
32     feature-load, especially when combined with:
33    
34     - ISIS must be widely usable even in the face of *very* low budgets.
35     Therefore, not only must the software itself be available for at most
36     a nominal fee, but it also must not require very new, very powerful
37     or otherwise expensive hardware and systems.
38     Even very large catalogs should get by with moderate system costs.
39    
40     The OCLC/Pica system for example requires one to spend
41     hundreds of thousands of dollars for powerful Sun machines.
42    
43    
44     * end of story ?
45    
46     Still, it would be very nice if more areas of application could
47     be explored for ISIS, both for the librarians in order to be able
48     to use their favourite DB (i.e. ISIS) for a broader range of
49     tasks and also to expand the user community, possibly leading
50     to more support for everybody.
51    
52     One important question is whether ISIS needs some fundamental changes
53     deep in its guts, or whether it already has everything that's needed
54     to build a broad range of sophisticated solutions on top of it.
55     As you might expect, we are pretty well convinced of the latter.
56    
57    
58     * file formats
59    
60     Just as it doesn't harm a database much to be exported to and
61     imported from ISO2709, there is not much of a problem with different
62     file formats, as long as conversion tools exist.
63     As you know, CISIS/Unix DBs are incompatible with WinIsis/DOS DBs,
64     but may be converted via ISO files.
65     As long as the basic data structures are the same,
66     lossless conversion is just a matter of tools.
67     It's even less of a problem if the software itself can read
68     several file formats (like openisis does).
69     You wouldn't care much whether your word processor is reading
70     a .doc or an .rtf file, would you?
71     We did an interesting and very successful study implementing
72     an ISIS-like DB in pure Java using a plaintext masterfile
73     very similar to the Mbox mailfolder format
74     (we hope to be able to release the code soon).
75     Likewise there is no reason why one should not be able to read
76     directly from an ISO2709 file.
77     Besides convertible masterfile formats, one might well use other
78     formats for xref and index, which always can be reconstructed as needed.
79     There are several reasons to do so, like improved performance or robustness.
80     So I don't think ISIS is defined in terms of detailed file formats,
81     but rather in terms of the basic data structures.
82    
83     One problem that might come to mind when talking about file formats
84     is the limits. While the maximum number of records per DB as well
85     as the maximum total file sizes are bypassed relatively easily
86     by logically joining several databases, the maximum record size of
87     about 32K is a limit which might be unacceptable for some applications
88     (although it can partly be resolved by deploying external files,
89     as OCLC/Pica does to circumvent Sybase's varchar limits).
90     Raising this limit would clearly restrict lossless conversion to one way,
91     from small to large DB. Where a large DB model is needed,
92     all parties developing ISIS software should agree on one format
93     to allow for as-painless-as-possible interoperability.
94    
95    
96     * so what kind of database is ISIS ?
97    
98     Classical database theory basically distinguishes ISAM,
99     network, hierarchical and relational database systems.
100     ISIS is strongly related to ISAM DBs; however, its flexible
101     indexing is rarely paralleled by any of these systems
102     and its non-flat data model is targeted by hierarchical DBs
103     only (in greater generality and with much higher costs).
104    
105     - Although direct joins by MFN shouldn't be too costly,
106     ISIS is not the database of choice when several records
107     typically need to be combined in queries or transactions.
108     However, in many application cases, only one ISIS record is
109     needed as opposed to several relational table rows.
110     In such situations, ISIS is even an excellent and efficient
111     transaction (OLTP) database (since safe writing of an ISIS
112     record is much simpler than other DB's undo/redo logs).
113     - ISIS is not the database of choice when records are updated
114     by the hour. However, where only about 10% of records are
115     changed between two (monthly, weekly or daily) runs of backup
116     and compactification, the space overhead is not a big problem.
117     Where old versions of data need to be retained anyway
118     (as often needed and supported, for example, by postgres history),
119     you would hardly find a more efficient solution.
120     - ISIS is not the database of choice when it comes to high volume online
121     analytical processing (querying statistics on several dimensions, OLAP).
122     However, after reading some database books and Oracle manuals,
123     one learns that OLAP requires a well designed ("star schema")
124     database separate from the transactional one, anyway.
125     - ISIS does not, in itself, provide any concurrency control
126     (actual implementations do, to some extent). This doesn't
127     hurt when running a read-only multi-user catalogue,
128     a stand-alone application and in some insert-only situations.
129     For distributed multi-client update, there are mechanisms based
130     on timestamps or stored procedures that need to be supported
131     by some ISIS server to come.
132    
133    
134     While these data models are strongly tied to the logical
135     nature and physical organisation of the data,
136     newer notions like that of an 'object oriented' or 'XML'
137     database rather describe a way to use and access a database.
138     Actually OO or XML DBs are usually based on one of
139     the above mentioned systems (mostly relational ones).
140     For the most part, using a DB as OO or XML storage requires nothing
141     but some libraries and optionally precompilers for C++ or Java
142     -- these can be built on top of existing ISIS without changing it,
143     and ISIS will be an excellent choice for many applications.
144     Some aspects of increased functionality and performance will
145     require some sort of "stored procedures" running inside the database.
146     In the case of an XML DB they are used, for example, to decompose structures,
147     in the OO case they might need some sort of "magic switch" (method
148     overriding) to perform differently for some records than for others.
149     We believe that all this magic can be achieved based on ISIS.
150     The concepts of an ISIS database server and a scripting language as an
151     alternative to formatting exits are to be discussed elsewhere ...
152    
153     First we want to shed some more light on the great flexibility the
154     ISIS database system has by its very nature.
155    
156    
157     * ISIS is a mail database
158    
159     Looking at http://www.faqs.org/rfcs/rfc822.html (or its updates)
160     one will find many similarities between ISO2709 records and internet mails,
161     which are, after all, essentially a series of header names and values.
162     After assigning numbers to the 100 or 200 most commonly used headers
163     and some sort of subfield encoding (e.g. "^nname^vvalue",
164     "name<TAB>value" or simply "name: value") to store other header lines
165     with a special field number, mails are easily and very efficiently
166     stored in an ISIS database. Given the enormous number of communication,
167     groupware and workflow systems that are nowadays built upon standard plain
168     internet mails (typically using a set of special mail headers),
169     this is a very large area to be served by ISIS databases.
170     The above mentioned Mbox-style implementation of ISIS tends towards
171     that direction, building upon the javax.mail standard.
172     IMAP mail servers could greatly benefit from the powerful indexing
173     and retrieval system of ISIS databases.
174     If the mail sending application also allows selecting special headers
175     from an entry form prepared by a skilled librarian with thesauri
176     and systematics, an institution or company could really come to a new
177     way of using mail as a system of qualified, living information.
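
The header mapping described above might look roughly like this. The tag
numbers, the catch-all field 900 and the function name are assumptions for
illustration; only the "^nname^vvalue" encoding is taken from the text.
Parsing is done with Python's standard `email` package, not with the
javax.mail based implementation mentioned above.

```python
from email import message_from_string

# Hypothetical tag assignments; the text does not fix any numbers.
HEADER_TAGS = {"from": 1, "to": 2, "subject": 3, "date": 4}
OTHER_HEADER_TAG = 900   # special field for headers without a number

RAW = """\
From: anna@example.org
To: bob@example.org
Subject: new acquisitions
X-Thesaurus-Term: cataloguing

Hello Bob.
"""

def mail_to_fields(raw):
    """Turn the headers of an RFC 822 style message into (tag, value) fields."""
    msg = message_from_string(raw)
    fields = []
    for name, value in msg.items():
        tag = HEADER_TAGS.get(name.lower())
        if tag is not None:
            fields.append((tag, value))
        else:
            # Unnumbered header: name goes to subfield ^n, value to ^v.
            fields.append((OTHER_HEADER_TAG, f"^n{name}^v{value}"))
    return fields

for tag, value in mail_to_fields(RAW):
    print(tag, value)
```

A real mapping would also have to escape any "^" inside header values,
which this sketch does not attempt.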
178    
179    
180     * ISIS is a multimedia database
181    
182     After all, a mail has not only headers, but also a body.
183     A plaintext body of reasonable length (some KB, as sent by nice people)
184     fits without problem in a field whose number means "body".
185     A multipart body is easily decomposed to a series of body fields.
186     Whether larger or non-plaintext bodies are stored within or outside
187     the masterfile is a matter of the actual implementation and doesn't
188     need to be discussed here; both approaches have their pros and cons.
189     Anyway, the MIME standard, up and running since 1992,
190     allows for storage and transmission of anything that uses bytes,
191     and is easily integrated with ISIS databases
192     (we partly did it, code to be released).
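
As a rough sketch of decomposing a multipart body into a series of body
fields, again using Python's standard `email` package: the field number 800
and the function name are assumptions, and non-plaintext parts are simply
skipped here, since the text leaves open where they should be stored.

```python
from email.message import EmailMessage

BODY_TAG = 800   # hypothetical field number meaning "body"

# Build a small multipart message: a text body plus a text attachment.
msg = EmailMessage()
msg["Subject"] = "report"
msg.set_content("See the attached notes.")
msg.add_attachment("line one\nline two\n", subtype="plain",
                   filename="notes.txt")

def body_fields(message):
    """Return one (BODY_TAG, text) field per plaintext part of the message."""
    fields = []
    for part in message.walk():
        if part.is_multipart():
            continue                      # containers carry no text themselves
        if part.get_content_type() == "text/plain":
            fields.append((BODY_TAG, part.get_content().strip()))
    return fields

for tag, text in body_fields(msg):
    print(tag, repr(text))
```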
193    
194    
195     * ISIS is an XML database
196    
197     Likewise XML, which basically is text, can be stored in an ISIS database
198     (subject to the implementation's maximum record length).
199     Add some formatting exits to address the XML node content via
200     a DOM-style a.b.c notation as used in JavaScript, use them in your FST
201     and you will for sure have one of the world's best indexed and fastest
202     XML databases -- most others are using a relational DB as basis.
203     So indexing, retrieving and displaying XML data is more or less
204     simply a matter of some formatting functions.
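
What such a formatting function could do is sketched below. This is not an
actual formatting exit: the function name `select`, the sample document and
the choice to treat the root element as implicit are assumptions for
illustration.

```python
import xml.etree.ElementTree as ET

XML = ("<record><title>ISIS basics</title>"
       "<author><name>Anna</name><name>Bob</name></author></record>")

def select(xml_text, path):
    """Return the text of all nodes addressed by a dotted path
    such as 'author.name'. The root element is implicit."""
    nodes = [ET.fromstring(xml_text)]
    for step in path.split("."):
        # Descend one level: keep every child whose name matches this step.
        nodes = [child for node in nodes for child in node.findall(step)]
    return [(node.text or "").strip() for node in nodes]

print(select(XML, "title"))
print(select(XML, "author.name"))
```

The returned values are what would be handed to the index, one entry per
matching node.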
205    
206     However, when thinking about data entry forms, for example,
207     the dark side of the force shows up:
208     Even with a very sophisticated database system that is able to make
209     sense of XML DTDs, data entry is potentially much more complicated.
210     XML was meant to provide arbitrary complexity in the first place.
211     And when it comes to DTDs like that of XHTML, which will carry just about
212     the same content as any HTML page, one easily understands that reasonable
213     automatic processing becomes nearly impossible -- that's the reason why
214     HTML pages are largely beefed up with headers (Dublin Core and others).
215     If you really desperately need it, it's good to have it,
216     but otherwise using it might be asking for trouble.
217    
218     When having to work with XML structures for one reason or another,
219     typically because they should be imported or exported,
220     one should think of a mapping between XML and ISIS structures.
221     In many situations XML structures are shallow and can be ISIfied by
222     simply mapping the first level of sister nodes to ISIS fields and the
223     second level to subfields (may require repeated subfield support).
224     In other situations a closer look at the data structure may reveal
225     that it is not well designed with regard to Ockham's razor but contains
226     totally unnecessary depth which may be collapsed to the first case.
227     Actually, during several years of work with XML structures as
228     suggested by several "standards", I rarely found a reasonable
229     structure which cannot be mapped to a field-subfield schema.
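
The shallow mapping can be sketched as follows. The element names, tag
numbers and subfield codes are assumptions for illustration; repeated
elements simply become repeated fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical assignments of element names to tags and subfield codes.
FIELD_TAGS = {"title": 245, "author": 700}
SUBFIELD_CODES = {"name": "a", "role": "e"}

XML = """
<record>
  <title>ISIS basics</title>
  <author><name>Anna</name><role>editor</role></author>
  <author><name>Bob</name></author>
</record>
"""

def xml_to_fields(xml_text):
    """Map first-level nodes to fields and second-level nodes to subfields."""
    fields = []
    for node in ET.fromstring(xml_text):      # first level -> fields
        tag = FIELD_TAGS[node.tag]
        children = list(node)
        if children:                          # second level -> subfields
            value = "".join(
                f"^{SUBFIELD_CODES[c.tag]}{(c.text or '').strip()}"
                for c in children)
        else:
            value = (node.text or "").strip()
        fields.append((tag, value))
    return fields

for tag, value in xml_to_fields(XML):
    print(tag, value)
```

Anything deeper than two levels is ignored by this sketch; that is the case
where the structure would first have to be collapsed.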
230    
231    
232     But even if you really need XML structures "as is",
233     they can be stored very efficiently in ISIS (see Struct),
234     with all the benefits of the flexible index
235     (cf. unirec, the universal ISIS record).
239     Anyway, Dublin Core metadata or other RDF (resource description framework)
240     headers are conveniently stored in ISIS just like mail headers.
241     Maybe, as this schema was created to suit the needs of the very
242     old science of bibliographic knowledge management, much of that
243     experience was built into it.
244    
245     On the other hand, XML's ancestor SGML was conceived for a document's body,
246     not the head, and I guess it still has its place in spite of
247     the programming industry's hype. The use of XML for structuring documents
248     that are meant to be read by humans rather than machines of course
249     is perfectly reasonable. Transparent access to file based data associated
250     with a record and an XML add-on to the formatting language could aid
251     in converting extracts of document contents to metadata accessible
252     in the ISIS database and/or its index.
253    
254    
255     To wrap it up, I'd suggest looking at XML as an optional add-on to ISIS
256     rather than an integral part. ISIS already has all the functionality
257     needed to support any reasonable use of XML. ISIS data can much more
258     efficiently contain XML structures than the other way round.
259    
260    
261     * ISIS is a database for document/content management systems
262    
263     It follows that ISIS may very well support the needs of
264     systems for XML documents or website content in XML or HTML.
265     With increasing experience with such systems, people tend to
266     understand that content metadata should be organized according
267     to bibliographic principles. (Not that surprising, is it?)
268     In cooperation with oc4science.org there are projects at German
269     universities to integrate publishing, document management and website CMS,
270     based on an (Open)ISIS DB and directed by the librarian.
271    
272    
273     -----------
274     $Id: whatabout.txt,v 1.8 2003/02/14 17:30:33 kripke Exp $
