Contents of /openisis/current/doc/whatabout.txt

Revision 237 - Mon Mar 8 17:43:12 2004 UTC by dpavlin
File MIME type: text/plain
File size: 13521 byte(s)
initial import of openisis 0.9.0 vendor drop

* what makes ISIS ISIS?

Andrew Giles-Peters raised the important question
"What is it about ISIS that makes it ISIS?"

So here are some thoughts on this topic from the OpenIsis team:

- As a database used for bibliographic data (among others),
  ISIS must be able to store and retrieve records as exchanged via
  ISO2709 efficiently and with no or minimal loss of information.
- Besides the ability to retrieve records by number,
  ISIS must support an indexing mechanism which is essentially
  "function based", that is, index entries are not the raw
  field values but the values of a "view" derived from them
  by some computation.
- ISIS must efficiently support query elements typically
  used on bibliographic databases, like looking up a value without
  regard to the field or in several fields at once, and specifying
  the maximum distance within which search terms should occur.
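
The indexing and query requirements can be made concrete with a small Python sketch. It is not how OpenIsis stores its index: records are assumed to be plain lists of (tag, value) pairs keyed by MFN, and the hypothetical function words_upper plays the role of an FST line that derives the indexed "view" from a field value. Because every posting keeps field tag, occurrence and word position, field-restricted and distance queries fall out naturally.

```python
import re
from collections import defaultdict

# Hypothetical sample data: MFN -> list of (field tag, value) pairs;
# a repeated tag would be a further occurrence of that field.
records = {
    1: [(24, "Database design for libraries"), (70, "Smith, J.")],
    2: [(24, "Design of library database systems"), (70, "Jones, K.")],
}


def words_upper(value):
    """Hypothetical index function: the indexed 'view' is the list of
    upper-cased words of a field value, not the raw value itself."""
    return [w.upper() for w in re.findall(r"\w+", value)]


def build_index(records, view):
    """Inverted index: term -> postings (mfn, tag, occurrence, word position)."""
    index = defaultdict(list)
    for mfn, fields in records.items():
        occurrence = defaultdict(int)
        for tag, value in fields:
            occurrence[tag] += 1
            for pos, term in enumerate(view(value)):
                index[term].append((mfn, tag, occurrence[tag], pos))
    return index


def search(index, term, tags=None):
    """MFNs containing term in any field (tags=None) or in the given fields."""
    return {mfn for mfn, tag, _, _ in index.get(term, [])
            if tags is None or tag in tags}


def near(index, t1, t2, max_dist):
    """MFNs where t1 and t2 occur in the same field occurrence,
    at most max_dist words apart."""
    hits = set()
    for m1, tag1, occ1, p1 in index.get(t1, []):
        for m2, tag2, occ2, p2 in index.get(t2, []):
            if (m1, tag1, occ1) == (m2, tag2, occ2) and abs(p1 - p2) <= max_dist:
                hits.add(m1)
    return hits


idx = build_index(records, words_upper)
print(sorted(search(idx, "DATABASE")))             # [1, 2]
print(sorted(near(idx, "DATABASE", "DESIGN", 1)))  # [1]
```

Swapping words_upper for another function (stemming, whole-field terms, subfield extraction) changes what is indexed without touching the stored records, which is the point of "function based" indexing.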

Since these are minimal requirements,
they would not stop anybody from adding tons of features on top.
For example, it's relatively easy to store ISO2709 data in a
relational database like Sybase (used by OCLC/Pica),
with each record spanning several rows (mfn, field number, field occ, value),
then compute a second similar table for the index, and so on.
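
A minimal sketch of that relational layout, using Python's built-in sqlite3 module as a stand-in for Sybase; the table and column names are made up for illustration. Even this toy shows the trade-off: one ISIS record becomes several rows that have to be collected again on every read.

```python
import sqlite3

# Hypothetical record: MFN 7 with a title (tag 24) and two occurrences
# of an author field (tag 70).
mfn, fields = 7, [(24, "Library automation"), (70, "Smith, J."), (70, "Jones, K.")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE field (mfn INTEGER, tag INTEGER, occ INTEGER, "
             "value TEXT, PRIMARY KEY (mfn, tag, occ))")


def store(conn, mfn, fields):
    """Spread one record over several rows: (mfn, field number, field occ, value)."""
    occ = {}
    for tag, value in fields:
        occ[tag] = occ.get(tag, 0) + 1
        conn.execute("INSERT INTO field VALUES (?, ?, ?, ?)",
                     (mfn, tag, occ[tag], value))


def load(conn, mfn):
    """Reassemble a record; ordering by (tag, occ) does not preserve
    the original interleaving of different tags."""
    cur = conn.execute("SELECT tag, value FROM field WHERE mfn = ? "
                       "ORDER BY tag, occ", (mfn,))
    return cur.fetchall()


store(conn, mfn, fields)
print(load(conn, 7))
```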

However, there is the word "efficiently",
which in practice puts some restrictions on the
feature load, especially when combined with:

- ISIS must be widely usable even with *very* low budgets.
  Therefore, not only must the software itself be available for at most
  a nominal fee, it must also not require very new, very powerful
  or otherwise expensive hardware and systems.
  Even very large catalogs should get by with moderate system costs.

The OCLC/Pica system, for example, requires one to spend
hundreds of thousands of dollars on powerful Sun machines.


* end of story?

Still, it would be very nice if more areas of application could
be explored for ISIS, both for the librarians, in order to be able
to use their favourite DB (i.e. ISIS) for a broader range of
tasks, and to expand the user community, possibly leading
to more support for everybody.

One important question is whether ISIS needs some fundamental changes
deep in its guts, or whether it already has everything that's needed
to build a broad range of sophisticated solutions on top of it.
As you might expect, we are pretty well convinced of the latter.


* file formats

Just as it doesn't harm a database much to be exported to and
imported from ISO2709, there is not much of a problem with different
file formats, as long as conversion tools exist.
As you know, CISIS/Unix DBs are incompatible with WinIsis/DOS DBs,
but may be converted via ISO files.
As long as the basic data structures are the same,
lossless conversion is just a matter of tools.
It's even less of a problem if the software itself can read
several file formats (like OpenIsis does).
You don't care much whether your word processor is reading
a .doc or an .rtf file, do you?
We did an interesting and very successful study implementing
an ISIS-like DB in pure Java using a plaintext masterfile
very similar to the Mbox mail folder format
(we hope to be able to release the code soon).
Likewise, there is no reason why one should not be able to read
directly from an ISO2709 file.
Besides convertible masterfile formats, one might well use other
formats for the xref and index, which can always be reconstructed as needed.
There may be good reasons to do so, such as improved performance or robustness.
So I don't think ISIS is defined in terms of detailed file formats,
but rather in terms of the basic data structures.
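
To illustrate that a conversion only has to preserve the basic data structure (a record as an ordered list of numbered fields), here is a simplified ISO 2709 writer and reader in Python. It follows the general layout of the standard (24-character leader, a directory of tag/length/offset entries, hex 1E field and hex 1D record terminators) but leaves the implementation-defined leader positions blank and does not reproduce any line wrapping or separator conventions that particular ISIS tools may use, so treat it as a sketch, not a compatible exporter.

```python
FT = b"\x1e"  # field terminator (hex 1E)
RT = b"\x1d"  # record terminator (hex 1D)


def to_iso2709(fields, encoding="utf-8"):
    """Serialize (tag, value) fields into one ISO 2709 record.
    Assumes tags 1..999 and fields shorter than 10000 bytes,
    since the directory uses 3-digit tags and 4-digit lengths."""
    directory, data = b"", b""
    for tag, value in fields:
        body = value.encode(encoding) + FT
        directory += b"%03d%04d%05d" % (tag, len(body), len(data))
        data += body
    base = 24 + len(directory) + 1      # leader + directory + its terminator
    length = base + len(data) + 1       # ... + data + record terminator
    # Leader: record length, 5 implementation-defined blanks, indicator and
    # subfield identifier lengths (0 here), base address, 3 blanks, entry map.
    leader = (b"%05d" % length + b" " * 5 + b"00"
              + b"%05d" % base + b" " * 3 + b"4500")
    return leader + directory + FT + data + RT


def from_iso2709(rec, encoding="utf-8"):
    """Parse one ISO 2709 record back into (tag, value) fields."""
    base = int(rec[12:17])
    directory = rec[24:base - 1]        # without the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        tag = int(directory[i:i + 3])
        flen = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        value = rec[base + start:base + start + flen - 1]  # drop the terminator
        fields.append((tag, value.decode(encoding)))
    return fields


fields = [(24, "Database design"), (70, "Smith, J.")]
rec = to_iso2709(fields)
print(rec[:24])                       # the leader
print(from_iso2709(rec) == fields)    # True
```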

One problem that might come to mind when talking about file formats
is the limits. While the maximum number of records per DB as well
as the maximum total file sizes are relatively easily bypassed
by logically joining several databases, the maximum record size of
about 32K is a limit which might be unacceptable for some applications
(although it can partly be resolved by deploying external files,
as OCLC/Pica does to circumvent Sybase's varchar limits).
Raising this limit would clearly restrict lossless conversion to one direction,
from small to large DBs. Where a large DB model is needed,
all parties developing ISIS software should agree on one format
to allow for as-painless-as-possible interoperability.
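
Joining databases logically can be as simple as mapping one global MFN range onto the MFN ranges of the parts. The following Python sketch assumes each part is a mapping with dense MFNs starting at 1; the class name JoinedDB is invented for illustration. It does nothing about the per-record size limit, which is a property of each record rather than of the database.

```python
import bisect


class JoinedDB:
    """Hypothetical read-only view presenting several databases as one.
    Each database is assumed to be a mapping with dense MFNs 1..n."""

    def __init__(self, dbs):
        self.dbs = dbs
        self.starts = []           # global MFN of each database's first record
        next_mfn = 1
        for db in dbs:
            self.starts.append(next_mfn)
            next_mfn += len(db)
        self.count = next_mfn - 1  # total number of records

    def locate(self, mfn):
        """Map a global MFN to (database index, local MFN)."""
        if not 1 <= mfn <= self.count:
            raise KeyError(mfn)
        i = bisect.bisect_right(self.starts, mfn) - 1
        return i, mfn - self.starts[i] + 1

    def read(self, mfn):
        i, local = self.locate(mfn)
        return self.dbs[i][local]


part1 = {1: "record A1", 2: "record A2"}
part2 = {1: "record B1", 2: "record B2", 3: "record B3"}
joined = JoinedDB([part1, part2])
print(joined.count, joined.read(3), joined.locate(5))  # 5 record B1 (1, 3)
```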


* so what kind of database is ISIS?

Classical database theory basically distinguishes ISAM,
network, hierarchical and relational database systems.
ISIS is strongly related to ISAM DBs; however, its flexible
indexing is rarely matched by any of these systems,
and its non-flat data model is addressed by hierarchical DBs
only (in greater generality and at much higher cost).

- Although direct joins by MFN shouldn't be too costly,
  ISIS is not the database of choice when several records
  typically need to be combined in queries or transactions.
  However, in many application cases, only one ISIS record is
  needed, as opposed to several relational table rows.
  In such situations, ISIS is even an excellent and efficient
  transaction (OLTP) database (since safely writing an ISIS
  record is much simpler than maintaining other DBs' undo/redo logs).
- ISIS is not the database of choice when records are updated
  by the hour. However, where only about 10% of records are
  changed between two (monthly, weekly or daily) runs of backup
  and compaction, the space overhead is not a big problem.
  Where old versions of data need to be retained anyway
  (as often needed and supported, for example, by Postgres history),
  you would hardly find a more efficient solution.
- ISIS is not the database of choice when it comes to high-volume online
  analytical processing (querying statistics on several dimensions, OLAP).
  However, after reading some database books and Oracle manuals,
  one learns that OLAP requires a well-designed ("star schema")
  database separate from the transactional one anyway.
- ISIS does not, in itself, provide any concurrency control
  (actual implementations do, to some extent). This doesn't
  hurt when running a read-only multi-user catalogue,
  a stand-alone application, or in some insert-only situations.
  For distributed multi-client updates, there are mechanisms based
  on timestamps or stored procedures that need to be supported
  by some future ISIS server.
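
The update behaviour described in these points (new versions written without overwriting old ones, space reclaimed by periodic compaction, and a timestamp-style check against concurrent updates) can be modelled in a few lines of Python. This is a toy under stated assumptions: an in-memory list stands in for the master file, a dict for the cross-reference file, and an integer version counter for a timestamp; it does not reflect the actual OpenIsis or CDS/ISIS file layout or locking.

```python
class MasterFile:
    """Toy in-memory model (not the real .mst/.xrf layout): record versions
    are only ever appended, and the xref maps each MFN to the position of
    its current version."""

    def __init__(self):
        self.log = []   # append-only list of (mfn, version, fields)
        self.xref = {}  # mfn -> index of the current version in self.log

    def read(self, mfn):
        """Return (version, fields) of the current version, or None."""
        pos = self.xref.get(mfn)
        if pos is None:
            return None
        _, version, fields = self.log[pos]
        return version, fields

    def write(self, mfn, fields, expected_version=None):
        """Append a new version, then switch the xref pointer; the old
        version is never overwritten. The version counter stands in for
        the timestamp of an optimistic concurrency check."""
        current = self.read(mfn)
        version = current[0] if current else 0
        if expected_version is not None and expected_version != version:
            raise RuntimeError("record %d changed since it was read" % mfn)
        self.log.append((mfn, version + 1, fields))
        self.xref[mfn] = len(self.log) - 1
        return version + 1

    def overhead(self):
        """Fraction of stored versions that are no longer current."""
        return 1 - len(self.xref) / len(self.log) if self.log else 0.0

    def compact(self):
        """Rewrite the log keeping only current versions, as a periodic
        compaction run would."""
        new_log = []
        for mfn in sorted(self.xref):
            new_log.append(self.log[self.xref[mfn]])
            self.xref[mfn] = len(new_log) - 1
        self.log = new_log


mst = MasterFile()
v = mst.write(1, [(24, "Old title")])
mst.write(2, [(24, "Another record")])
mst.write(1, [(24, "New title")], expected_version=v)
print(mst.overhead())   # one of three stored versions is obsolete
mst.compact()
print(mst.read(1))      # (2, [(24, 'New title')])
```

A writer that read version 1 and tries to store its edit after someone else has saved version 2 gets an error instead of silently overwriting the newer data.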


While these data models are strongly tied to the logical
nature and physical organisation of the data,
newer notions like that of an 'object oriented' or 'XML'
database rather describe a way to use and access a database.
Actually, OO or XML DBs are usually based on one of
the above mentioned systems (mostly relational ones).
For the most part, using a DB as OO or XML storage requires nothing
but some libraries and optionally precompilers for C++ or Java
-- these can be built on top of existing ISIS without changing it,
and ISIS will be an excellent choice for many applications.
Some aspects of increased functionality and performance will
require some sort of "stored procedures" running inside the database.
In the case of an XML DB they are used, for example, to decompose structures;
in the OO case they might need some sort of "magic switch" (method
overriding) to perform differently for some records than for others.
We believe that all this magic can be achieved based on ISIS.
The concepts of an ISIS database server and a scripting language as an
alternative to formatting exits are to be discussed elsewhere ...

First we want to shed some more light on the great flexibility the
ISIS database system has by its very nature.


* ISIS is a mail database

Looking at http://www.faqs.org/rfcs/rfc822.html (or its updates),
one will find many similarities between ISO2709 records and internet mails,
which are, after all, essentially a series of header names and values.
After assigning numbers to the 100 or 200 most commonly used headers
and choosing some sort of subfield encoding (e.g. "^nname^vvalue",
"name<TAB>value" or simply "name: value") to store other header lines
with a special field number, mails are easily and very efficiently
stored in an ISIS database. Given the enormous number of communication,
groupware and workflow systems that are nowadays built upon standard plain
internet mails (typically using a set of special mail headers),
this is a very large area to be served by ISIS databases.
The above mentioned Mbox-style implementation of ISIS tends in
that direction, building upon the javax.mail standard.
IMAP mail servers could greatly benefit from the powerful indexing
and retrieval system of ISIS databases.
If, in addition, the mail-sending application lets users select special
headers from an entry form prepared by a skilled librarian with thesauri
and classification schemes, an institution or company could really arrive
at a new way of using mail as a system of qualified, living information.
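
Here is a minimal Python sketch of the header mapping, using the standard library's email parser. The tag numbers, the catch-all field 99 and the choice of the "^nname^vvalue" subfield encoding (one of the options listed above) are all illustrative assumptions.

```python
from email import message_from_string

# Hypothetical field numbers for a few common headers; a real application
# would number the 100 or 200 most frequently used ones.
HEADER_TAGS = {"from": 1, "to": 2, "cc": 3, "subject": 4,
               "date": 5, "message-id": 6}
OTHER_HEADER_TAG = 99  # hypothetical catch-all field for all other headers


def mail_to_fields(raw):
    """Turn the header lines of an RFC 822 message into (tag, value) fields."""
    msg = message_from_string(raw)
    fields = []
    for name, value in msg.items():          # headers in their original order
        tag = HEADER_TAGS.get(name.lower())  # header names are case-insensitive
        if tag is not None:
            fields.append((tag, value))
        else:
            # Keep name and value as subfields ^n and ^v; a real encoding
            # would also have to escape '^' occurring inside header values.
            fields.append((OTHER_HEADER_TAG, "^n%s^v%s" % (name, value)))
    return fields


raw = ("From: alice@example.org\n"
       "To: bob@example.org\n"
       "Subject: New acquisitions\n"
       "X-Classification: 020 Library science\n"
       "\n"
       "See the attached list.\n")
for tag, value in mail_to_fields(raw):
    print(tag, value)
```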


* ISIS is a multimedia database

After all, a mail has not only headers but also a body.
A plaintext body of reasonable length (a few KB, as sent by nice people)
fits without problem in a field whose number means "body".
A multipart body is easily decomposed into a series of body fields.
Whether larger or non-plaintext bodies are stored within or outside
the masterfile is a matter of the actual implementation and doesn't
need to be discussed here; both approaches have their pros and cons.
Anyway, the MIME standard, up and running since 1992,
allows for storage and transmission of anything that uses bytes,
and is easily integrated with ISIS databases
(we partly did it, code to be released).
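
As a rough illustration, a multipart mail can be walked with Python's email package and turned into a series of body fields. Field 900, the subfield codes ^t (content type), ^b (inline text) and ^l (size of a part kept elsewhere) and the 8 KB inline limit are hypothetical; since where large parts live is an implementation decision, this sketch only records their type and size.

```python
from email import message_from_bytes

BODY_TAG = 900            # hypothetical field number meaning "body"
MAX_INLINE = 8 * 1024     # hypothetical limit for text stored in the record


def body_fields(raw):
    """Decompose a (possibly multipart) mail body into a series of body fields."""
    msg = message_from_bytes(raw)
    fields = []
    for part in msg.walk():                 # depth-first over all MIME parts
        if part.is_multipart():
            continue                        # containers have no content of their own
        payload = part.get_payload(decode=True) or b""   # undoes base64 etc.
        ctype = part.get_content_type()
        if ctype.startswith("text/") and len(payload) <= MAX_INLINE:
            text = payload.decode(part.get_content_charset() or "us-ascii",
                                  errors="replace")
            fields.append((BODY_TAG, "^t%s^b%s" % (ctype, text)))
        else:
            # Large or binary part: only type and size are recorded here;
            # where the bytes themselves live is left to the implementation.
            fields.append((BODY_TAG, "^t%s^l%d" % (ctype, len(payload))))
    return fields


raw = (b"MIME-Version: 1.0\n"
       b'Content-Type: multipart/mixed; boundary="XX"\n'
       b"\n"
       b"--XX\n"
       b"Content-Type: text/plain; charset=us-ascii\n"
       b"\n"
       b"Hello\n"
       b"--XX\n"
       b"Content-Type: image/png\n"
       b"Content-Transfer-Encoding: base64\n"
       b"\n"
       b"iVBORw0KGgo=\n"
       b"--XX--\n")
for tag, value in body_fields(raw):
    print(tag, repr(value))
```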


* ISIS is an XML database

Likewise, XML, which basically is text, can be stored in an ISIS database
(within the implementation's maximum record length).
Add some formatting exits to address the XML node content via
a DOM-style a.b.c notation as used in JavaScript, use them in your FST,
and you will surely have one of the world's best indexed and fastest
XML databases -- most others are using a relational DB as their basis.
So indexing, retrieving and displaying XML data is more or less
simply a matter of some formatting functions.
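
A hypothetical formatting exit of that kind might look like the following Python sketch, built on xml.etree.ElementTree. The function name xml_path and the convention that the first path step must match the root element are assumptions; each returned string could then be handed to the index like any other FST output.

```python
import xml.etree.ElementTree as ET


def xml_path(field_value, path):
    """Hypothetical formatting exit: return the text of all nodes reached
    by a dotted path such as 'book.author.name', starting at the root."""
    root = ET.fromstring(field_value)
    names = path.split(".")
    if names[0] != root.tag:
        return []
    nodes = [root]
    for name in names[1:]:
        # Step one level down, keeping every matching child (repeated nodes
        # behave like repeated field occurrences).
        nodes = [child for node in nodes for child in node.findall(name)]
    return [(n.text or "").strip() for n in nodes]


stored = ("<book><title>Cataloguing rules</title>"
          "<author><name>Smith</name></author>"
          "<author><name>Jones</name></author></book>")
print(xml_path(stored, "book.author.name"))  # ['Smith', 'Jones']
```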

However, when thinking about data entry forms, for example,
the dark side of the force shows up:
even with a very sophisticated database system able to make
sense of XML DTDs, things are potentially much more complicated.
XML was meant to provide arbitrary complexity in the first place.
And when it comes to DTDs like that of XHTML, which will carry just about
the same content as any HTML page, one easily understands that reasonable
automatic processing becomes nearly impossible -- that's the reason why
HTML pages are largely beefed up with headers (Dublin Core and others).
If you really desperately need it, it's good to have it,
but otherwise using it might be asking for trouble.

When having to work with XML structures for one reason or another,
typically because they should be imported or exported,
one should think of a mapping between XML and ISIS structures.
In many situations XML structures are shallow and can be ISIfied by
simply mapping the first level of sister nodes to ISIS fields and the
second level to subfields (which may require repeated subfield support).
In other situations a closer look at the data structure may reveal
that it is not well designed with regard to Ockham's razor but contains
totally unnecessary depth which may be collapsed to the first case.
Actually, during several years of work with XML structures as
suggested by several "standards", I rarely found a reasonable
structure which could not be mapped to a field-subfield schema.
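
The shallow case (first-level sister nodes to fields, second-level nodes to subfields) can be sketched in Python as follows. The element names, field tags and subfield codes are made up for the example; repeated first-level elements become repeated field occurrences, repeated second-level elements become repeated subfields, names missing from the mappings raise KeyError, and anything nested deeper than two levels is ignored by this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping of first-level element names to ISIS field tags
# and of second-level names to subfield codes.
FIELD_TAGS = {"title": 24, "author": 70, "publisher": 62}
SUBFIELD_CODES = {"surname": "a", "forename": "b", "name": "a", "place": "c"}


def isify(xml_text):
    """Map a shallow XML record to (tag, value) fields with ^x subfields."""
    fields = []
    for node in ET.fromstring(xml_text):      # first level -> fields
        tag = FIELD_TAGS[node.tag]
        children = list(node)
        if not children:
            fields.append((tag, (node.text or "").strip()))
        else:                                  # second level -> subfields
            value = "".join("^%s%s" % (SUBFIELD_CODES[c.tag],
                                       (c.text or "").strip())
                            for c in children)
            fields.append((tag, value))
    return fields


doc = ("<record><title>Open source ISIS</title>"
       "<author><surname>Smith</surname><forename>J.</forename></author>"
       "<author><surname>Jones</surname><forename>K.</forename></author>"
       "<publisher><name>Unesco</name><place>Paris</place></publisher></record>")
print(isify(doc))
```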


But even if you really need XML structures "as is",
they can be stored very efficiently (see Struct)
in ISIS, with all the benefits of the flexible index
(cf. unirec, the universal ISIS record).
Anyway, Dublin Core metadata or other RDF (resource description framework)
headers are conveniently stored in ISIS just like mail headers.
Maybe this is because this schema was created to suit the needs of the very
old science of bibliographic knowledge management, and much of that
experience was built into it.

On the other hand, XML's ancestor SGML was conceived for a document's body,
not the head, and I guess it still has its place in spite of the
programming industry's hype. The use of XML for structuring documents
that are meant to be read by humans rather than machines of course
is perfectly reasonable. Transparent access to file-based data associated
with a record and an XML add-on to the formatting language could aid
in converting extracts of document contents to metadata accessible
in the ISIS database and/or its index.


To wrap it up, I'd suggest looking at XML as an optional add-on to ISIS
rather than an integral part. ISIS already has all the functionality
needed to support any reasonable use of XML. ISIS data can much more
efficiently contain XML structures than the other way round.


* ISIS is a database for document/content management systems

It follows that ISIS may very well support the needs of
systems for XML documents or website content in XML or HTML.
With increasing experience with such systems, people tend to
understand that content metadata should be organized according
to bibliographic principles. (Not that surprising, is it?)
In cooperation with oc4science.org there are projects at German
universities to integrate publishing, document management and website CMS,
based on an (Open)ISIS DB and directed by the librarian.


-----------
$Id: whatabout.txt,v 1.8 2003/02/14 17:30:33 kripke Exp $
