* what makes ISIS ISIS ?

Andrew Giles-Peters raised the important question
"What is it about ISIS that makes it ISIS?"


So here are some thoughts on this topic from the OpenIsis team:

- As a database used for bibliographic data (among others),
ISIS must be able to store and retrieve records as exchanged via
ISO2709 efficiently and with no or minimal loss of information.
- Besides the ability to retrieve records by number,
ISIS must support an indexing mechanism which is essentially
"function based"; that is, the index entries are not the immediate
field values, but rather the values of a "view" derived from them
by some computation.
- ISIS must efficiently support typical query elements commonly
used on bibliographic databases, like looking up a value without
regard for the field or in several fields at once, and specifying
the maximum distance at which search terms should occur.
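
To make the indexing and query requirements a little more concrete, here is a minimal Python sketch of a "function based" index. Terms come from a view function rather than from the raw field values. The view used here, a hypothetical `words_view`, splits a field into lowercase words, loosely in the spirit of an FST. Each posting keeps (mfn, tag, occurrence, word position). That is enough for field-independent lookup, lookup restricted to several fields, and a simple proximity condition. Tag numbers, record contents and all function names are illustrative assumptions, not the OpenIsis API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple, Optional, Set, Tuple

Record = List[Tuple[int, str]]   # (tag, value) pairs; tags may repeat


class Posting(NamedTuple):
    mfn: int   # master file number of the record
    tag: int   # field tag the term was extracted from
    occ: int   # occurrence of that tag within the record (1-based)
    pos: int   # word position within that field occurrence


def words_view(value: str) -> List[str]:
    """Hypothetical view function: the lowercase words of a field value."""
    words = (w.strip(".,;:").lower() for w in value.split())
    return [w for w in words if w]


def build_index(records: Dict[int, Record],
                view: Callable[[str], List[str]]) -> Dict[str, List[Posting]]:
    """Index the values computed by `view`, not the raw field values."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for mfn, record in records.items():
        occ: Dict[int, int] = defaultdict(int)
        for tag, value in record:
            occ[tag] += 1
            for pos, term in enumerate(view(value)):
                index[term].append(Posting(mfn, tag, occ[tag], pos))
    return index


def search(index: Dict[str, List[Posting]], term: str,
           tags: Optional[Iterable[int]] = None) -> Set[int]:
    """Look a term up regardless of field, or only within the given tags."""
    wanted = None if tags is None else set(tags)
    return {p.mfn for p in index.get(term, [])
            if wanted is None or p.tag in wanted}


def near(index: Dict[str, List[Posting]], a: str, b: str,
         max_dist: int) -> Set[int]:
    """Records where a and b occur in the same field occurrence,
    at most max_dist words apart (in either order)."""
    return {pa.mfn
            for pa in index.get(a, [])
            for pb in index.get(b, [])
            if (pa.mfn, pa.tag, pa.occ) == (pb.mfn, pb.tag, pb.occ)
            and abs(pa.pos - pb.pos) <= max_dist}


records = {
    1: [(24, "Information retrieval systems"), (70, "Smith, J.")],
    2: [(24, "Database systems"), (69, "information retrieval")],
}
idx = build_index(records, words_view)
print(sorted(search(idx, "retrieval")))                # [1, 2]
print(sorted(search(idx, "retrieval", tags=[24])))     # [1]
print(sorted(near(idx, "information", "systems", 2)))  # [1]
```

Swapping `words_view` for another function, such as one that indexes whole field values or subfields, changes what is indexed without touching the stored records. That separation is the point of function-based indexing.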

Since these are minimal requirements,
they would not stop anybody from adding tons of features on top.
For example, it's relatively easy to store ISO2709 data in a
relational database like Sybase (used by OCLC/Pica),
with each record spread over several rows (mfn, field number, field occ, value),
and then compute a second, similar table for the index and so on.
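
The relational mapping just described can be sketched with Python's built-in `sqlite3` module standing in for Sybase. Each field occurrence becomes one row (mfn, field number, field occ, value), and a second, similar table is computed for the index. The table and column names and the word-splitting rule are illustrative assumptions, not OCLC/Pica's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE field (           -- one row per field occurrence
        mfn   INTEGER NOT NULL,    -- record number
        tag   INTEGER NOT NULL,    -- field number
        occ   INTEGER NOT NULL,    -- occurrence of the tag in the record
        value TEXT    NOT NULL,
        PRIMARY KEY (mfn, tag, occ)
    );
    CREATE TABLE idx (             -- the second, similar table for the index
        term  TEXT    NOT NULL,
        mfn   INTEGER NOT NULL,
        tag   INTEGER NOT NULL,
        occ   INTEGER NOT NULL
    );
    CREATE INDEX idx_term ON idx (term);
""")


def store(mfn, record):
    """Store a record given as a list of (tag, value) pairs."""
    occ = {}
    for tag, value in record:
        occ[tag] = occ.get(tag, 0) + 1
        conn.execute("INSERT INTO field VALUES (?, ?, ?, ?)",
                     (mfn, tag, occ[tag], value))


def rebuild_index():
    """Recompute the index table with one entry per distinct lowercase word."""
    conn.execute("DELETE FROM idx")
    rows = conn.execute("SELECT mfn, tag, occ, value FROM field").fetchall()
    for mfn, tag, occ, value in rows:
        for word in sorted(set(value.lower().split())):
            conn.execute("INSERT INTO idx VALUES (?, ?, ?, ?)",
                         (word, mfn, tag, occ))


def load(mfn):
    """Reassemble a record. Ordering by (tag, occ) loses the original
    field order; an extra sequence column would be needed to keep it."""
    rows = conn.execute("SELECT tag, value FROM field WHERE mfn = ? "
                        "ORDER BY tag, occ", (mfn,))
    return [(tag, value) for tag, value in rows]


store(1, [(24, "Information retrieval"), (70, "Smith, J."), (70, "Doe, A.")])
rebuild_index()
print(load(1))
hits = conn.execute("SELECT DISTINCT mfn FROM idx WHERE term = ?",
                    ("retrieval",)).fetchall()
print(hits)   # [(1,)]
```

Even this toy version shows the cost the next paragraph points at. Each record read becomes a multi-row query, and each index rebuild touches every row.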


However, there is the word "efficiently",
which in practice puts some restrictions on the
feature load, especially when combined with:

- ISIS must be widely usable even in the face of *very* low budgets.
Therefore, not only must the software itself be available for at most
a nominal fee, but it also must not require very new, very powerful
or otherwise expensive hardware and systems.
Even very large catalogs should get by with moderate system costs.

The OCLC/Pica system, for example, requires one to spend
hundreds of thousands of dollars on powerful Sun machines.


* end of story ?

Still, it would be very nice if more areas of application could
be explored for ISIS, both for the librarians, so that they can
use their favourite DB (i.e. ISIS) for a broader range of
tasks, and to expand the user community, possibly leading
to more support for everybody.

One important question is whether ISIS needs some fundamental changes
deep in its guts, or whether it already has everything that's needed
to build a broad range of sophisticated solutions on top of it.
As you might expect, we are pretty well convinced of the latter.


* file formats

Just as it doesn't harm a database much to be exported to and
imported from ISO2709, there is not much of a problem with different
file formats, as long as conversion tools exist.
As you know, CISIS/Unix DBs are incompatible with WinIsis/DOS DBs,
but may be converted via ISO files.
As long as the basic data structures are the same,
lossless conversion is just a matter of tools.
It's even less of a problem if the software itself can read
several file formats (like OpenIsis does).
You wouldn't care much whether your word processor is reading
a .doc or an .rtf file, would you?
We did an interesting and very successful study implementing
an ISIS-like DB in pure Java using a plaintext masterfile
very similar to the Mbox mail folder format
(we hope to be able to release the code soon).
Likewise, there is no reason why one should not be able to read
directly from an ISO2709 file.
Besides convertible masterfile formats, one might well use other
formats for the xref and index files, which can always be reconstructed as needed.
There are several reasons to do so, such as improved performance or robustness.
So I don't think ISIS is defined in terms of detailed file formats,
but rather in terms of the basic data structures.
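
Reading records directly from an ISO2709 file is indeed not hard. The following minimal Python sketch writes and reads records with the usual ISO2709 layout:

- a 24-byte leader;
- a directory of 12-byte entries (3-digit tag, 4-digit length, 5-digit start);
- the field terminator (0x1E) and the record terminator (0x1D).

It assumes numeric tags and no indicator or subfield-code bytes; ISIS exports commonly carry subfields as ^a inside the data. It also assumes UTF-8 text and records concatenated without line breaks. ISO files produced by CDS/ISIS are often wrapped into 80-character lines, which this sketch does not handle.

```python
FT, RT = b"\x1e", b"\x1d"   # ISO 2709 field and record terminators


def write_record(fields, encoding="utf-8"):
    """Build one ISO 2709 record from (tag, value) pairs (numeric tags)."""
    directory, data, pos = [], [], 0
    for tag, value in fields:
        chunk = value.encode(encoding) + FT
        # directory entry: 3-digit tag, 4-digit length, 5-digit start position
        directory.append(b"%03d%04d%05d" % (tag, len(chunk), pos))
        data.append(chunk)
        pos += len(chunk)
    base = 24 + 12 * len(fields) + 1          # leader + directory + its FT
    body = b"".join(directory) + FT + b"".join(data) + RT
    # leader: record length, status 'n', 4 blanks, indicator and subfield
    # code counts 0, base address of data, 3 blanks, entry map "4500"
    leader = b"%05dn    00%05d   4500" % (24 + len(body), base)
    return leader + body


def read_records(stream, encoding="utf-8"):
    """Yield each record in `stream` as a list of (tag, value) pairs."""
    offset = 0
    while offset < len(stream):
        length = int(stream[offset:offset + 5])
        rec = stream[offset:offset + length]
        base = int(rec[12:17])
        directory = rec[24:base - 1]          # strip the directory's FT
        fields = []
        for i in range(0, len(directory), 12):
            tag = int(directory[i:i + 3])
            flen = int(directory[i + 3:i + 7])
            start = base + int(directory[i + 7:i + 12])
            raw = rec[start:start + flen - 1]  # strip the field's FT
            fields.append((tag, raw.decode(encoding)))
        yield fields
        offset += length


iso = (write_record([(24, "Information retrieval"), (70, "Smith, J.")])
       + write_record([(24, "Database systems")]))
for record in read_records(iso):
    print(record)
```

Since the reader yields the same (tag, value) lists a masterfile holds, an ISO file can serve as a read-only masterfile as long as the basic data structures match.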

One problem that might come to mind when talking about file formats
is that of limits. While the maximum number of records per DB, as well
as the maximum total file size, can be bypassed relatively easily
by logically joining several databases, the maximum record size of
about 32K is a limit which might be unacceptable for some applications
(although it can partly be resolved by deploying external files,
as OCLC/Pica does to circumvent Sybase's varchar limits).
Raising this limit would clearly restrict lossless conversion to one direction,
from small to large DB. Where a large DB model is needed,
all parties developing ISIS software should agree on one format
to allow for as-painless-as-possible interoperability.
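
Logically joining several databases can be as simple as mapping a global record number onto (database, local MFN). The sketch below is a hypothetical illustration in Python, with in-memory dictionaries standing in for real masterfiles. Its size check uses an assumed limit of 32767 bytes for the "about 32K" mentioned above; the exact figure depends on the implementation.

```python
from bisect import bisect_right

MAX_RECORD_BYTES = 32767   # assumed limit; the exact value is implementation-specific


class JoinedDB:
    """Hypothetical logical join of several databases into one MFN range."""

    def __init__(self, parts):
        # parts: list of dicts {local_mfn: record}, local MFNs starting at 1
        self.parts = parts
        self.offsets = []          # first global MFN of part i is offsets[i] + 1
        total = 0
        for part in parts:
            self.offsets.append(total)
            total += max(part, default=0)
        self.size = total          # highest global MFN

    def locate(self, mfn):
        """Map a global MFN to (part index, local MFN)."""
        if not 1 <= mfn <= self.size:
            raise KeyError(mfn)
        i = bisect_right(self.offsets, mfn - 1) - 1
        return i, mfn - self.offsets[i]

    def read(self, mfn):
        i, local = self.locate(mfn)
        return self.parts[i][local]


def check_size(record, encoding="utf-8"):
    """Reject a record whose field data alone exceeds the assumed limit
    (directory and leader overhead are ignored here)."""
    size = sum(len(value.encode(encoding)) for _, value in record)
    if size > MAX_RECORD_BYTES:
        raise ValueError(f"record of {size} bytes exceeds {MAX_RECORD_BYTES}")
    return size


db = JoinedDB([
    {1: [(24, "A1")], 2: [(24, "A2")], 3: [(24, "A3")]},
    {1: [(24, "B1")], 2: [(24, "B2")]},
])
print(db.size)        # 5
print(db.locate(4))   # (1, 1)
print(db.read(5))     # [(24, 'B2')]
```

Note the asymmetry. The record-count limit disappears behind a thin mapping layer, while the record-size limit is a property of each file format and cannot be mapped away.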


* so what kind of database is ISIS ?

Classical database theory basically distinguishes ISAM,
network, hierarchical and relational database systems.
ISIS is strongly related to ISAM DBs; however, its flexible
indexing is rarely paralleled by any of these systems,
and its non-flat data model is matched only by hierarchical DBs
(in greater generality and at much higher cost).

- Although direct joins by MFN shouldn't be too costly,
ISIS is not the database of choice when several records
typically need to be combined in queries or transactions.
However, in many application cases, only one ISIS record is
needed as opposed to several relational table rows.
In such situations, ISIS is even an excellent and efficient
transaction (OLTP) database, since safely writing an ISIS
record is much simpler than maintaining other DBs' undo/redo logs.
- ISIS is not the database of choice when records are updated
by the hour. However, where only about 10% of records are
changed between two (monthly, weekly or daily) runs of backup
and compaction, the space overhead is not a big problem.
Where old versions of data need to be retained anyway
(as often needed and supported, for example, by postgres history),
you would hardly find a more efficient solution.
- ISIS is not the database of choice when it comes to high-volume online
analytical processing (querying statistics on several dimensions, OLAP).
However, after reading some database books and Oracle manuals,
one learns that OLAP requires a well-designed ("star schema")
database separate from the transactional one anyway.
- ISIS does not, in itself, provide any concurrency control
(actual implementations do, to some extent). This doesn't
hurt when running a read-only multi-user catalogue,
a stand-alone application, or in some insert-only situations.
For distributed multi-client updates, there are mechanisms based
on timestamps or stored procedures that will need to be supported
by a future ISIS server.
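
The points about safe writing and space overhead can be illustrated with a toy append-only masterfile. An update first appends the new version of the record, and only then switches the record's cross-reference (xref) pointer to it. A crash between the two steps simply leaves the old version in effect. Superseded versions accumulate as waste until a compaction run copies only the current versions. This is a simplified in-memory Python sketch of the general idea, not the actual CDS/ISIS or OpenIsis file layout. The class and method names are hypothetical. In a real system, the pointer switch itself must be a single atomic write to the xref file.

```python
import io
import json


class AppendOnlyMaster:
    """Toy masterfile: versions are appended, the xref maps MFN -> offset."""

    def __init__(self):
        self.mst = io.BytesIO()   # stands in for the master file
        self.xref = {}            # stands in for the cross-reference file

    def write(self, mfn, record):
        data = json.dumps([mfn, record]).encode("utf-8") + b"\n"
        offset = self.mst.seek(0, io.SEEK_END)
        self.mst.write(data)      # 1. append the new version ...
        self.xref[mfn] = offset   # 2. ... and only then switch the pointer

    def read(self, mfn):
        self.mst.seek(self.xref[mfn])
        return json.loads(self.mst.readline())[1]

    def waste(self):
        """Bytes occupied by superseded versions."""
        live = 0
        for offset in self.xref.values():
            self.mst.seek(offset)
            live += len(self.mst.readline())
        return len(self.mst.getvalue()) - live

    def compact(self):
        """Rewrite the file keeping only the current version of each record."""
        fresh = AppendOnlyMaster()
        for mfn in sorted(self.xref):
            fresh.write(mfn, self.read(mfn))
        self.mst, self.xref = fresh.mst, fresh.xref


m = AppendOnlyMaster()
m.write(1, [[24, "Draft title"]])
m.write(2, [[24, "Other record"]])
m.write(1, [[24, "Final title"]])   # old version of record 1 stays in the file
print(m.read(1))      # [[24, 'Final title']]
print(m.waste() > 0)  # True
m.compact()
print(m.waste())      # 0
```

If old versions must be retained anyway, keeping a list of previous offsets per MFN instead of compacting them away would provide a simple history.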


While these data models are strongly tied to the logical
nature and physical organisation of the data,
newer notions like that of an 'object oriented' or 'XML'
database rather describe a way to use and access a database.
Actually, OO or XML DBs are usually based on one of
the above-mentioned systems (mostly relational ones).
For the most part, using a DB as OO or XML storage requires nothing
but some libraries and optionally precompilers for C++ or Java
-- these can be built on top of existing ISIS without changing it,
and ISIS will be an excellent choice for many applications.
Some aspects of increased functionality and performance will
require some sort of "stored procedures" running inside the database.
In the case of an XML DB, they are used, for example, to decompose structures;
in the OO case, they might need some sort of "magic switch" (method
overriding) to behave differently for some records than for others.
We believe that all this magic can be achieved based on ISIS.
The concepts of an ISIS database server and a scripting language as an
alternative to formatting exits are to be discussed elsewhere ...

First we want to shed some more light on the great flexibility the
ISIS database system has by its very nature.


* ISIS is a mail database

Looking at http://www.faqs.org/rfcs/rfc822.html (or its updates),
one will find many similarities between ISO2709 records and internet mails,
which are, after all, essentially a series of header names and values.
After assigning numbers to the 100 or 200 most commonly used headers
and choosing some sort of subfield encoding (e.g. "^nname^vvalue",
"name<TAB>value" or simply "name: value") to store other header lines
with a special field number, mails are easily and very efficiently
stored in an ISIS database. Given the enormous number of communication,
groupware and workflow systems that are nowadays built upon standard plain
internet mails (typically using a set of special mail headers),
this is a very large area to be served by ISIS databases.
The above-mentioned Mbox-style implementation of ISIS tends in
that direction, building upon the javax.mail standard.
IMAP mail servers could greatly benefit from the powerful indexing
and retrieval system of ISIS databases.
If the mail-sending application also lets users select special headers
from an entry form prepared by a skilled librarian with thesauri
and classification schemes, an institution or company could really arrive at a new
way of using mail as a system of qualified, living information.
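
Mapping a mail's headers onto ISIS fields, as described above, takes only a few lines with Python's standard `email` package. The tag numbers in `HEADER_TAGS` and the catch-all tag 99 are hypothetical choices for illustration. Headers without an assigned number are stored in that special field using the "^nname^vvalue" subfield encoding suggested above.

```python
from email import message_from_string
from typing import List, Tuple

# Hypothetical numbering of common headers; a real schema would cover 100-200.
HEADER_TAGS = {"from": 1, "to": 2, "cc": 3, "subject": 4, "date": 5,
               "message-id": 6, "in-reply-to": 7, "references": 8}
OTHER_HEADER_TAG = 99   # hypothetical field for all remaining headers


def mail_to_record(raw: str) -> List[Tuple[int, str]]:
    """Turn a mail's headers into (tag, value) fields, keeping their order."""
    msg = message_from_string(raw)
    record = []
    for name, value in msg.items():
        tag = HEADER_TAGS.get(name.lower())
        if tag is not None:
            record.append((tag, value))
        else:   # other headers: "^nname^vvalue" in a special field
            record.append((OTHER_HEADER_TAG, f"^n{name}^v{value}"))
    return record


raw = """From: alice@example.org
To: bob@example.org
Subject: Thesaurus update
X-Classification: 025.4

Please review the new descriptors.
"""
for field in mail_to_record(raw):
    print(field)
```

A custom header like the `X-Classification` shown here is exactly where a librarian-prepared entry form would put its controlled terms. An FST could then index the ^v subfield of field 99 like any other field.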


* ISIS is a multimedia database

After all, a mail not only has headers, but also a body.
A plaintext body of reasonable length (some KB, as sent by nice people)
fits without problem in a field whose number means "body".
A multipart body is easily decomposed into a series of body fields.
Whether larger or non-plaintext bodies are stored within or outside
the masterfile is a matter of the actual implementation and doesn't
need to be discussed here; both approaches have their pros and cons.
Anyway, the MIME standard, up and running since 1992,
allows for storage and transmission of anything that uses bytes,
and is easily integrated with ISIS databases
(we partly did it, code to be released).
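
Decomposing a multipart body into a series of body fields can lean on the standard `email` package as well. `walk()` visits every MIME part, and each part that is not itself a multipart container becomes one field. In this Python sketch, the tag number and the subfield layout (^t for the content type, ^v for the content) are hypothetical. Storing binary parts as base64 text inside the masterfile is just one of the possible choices mentioned above.

```python
import base64
from email import message_from_string
from typing import List, Tuple

BODY_TAG = 100   # hypothetical field number meaning "body"


def body_fields(raw: str) -> List[Tuple[int, str]]:
    """One body field per leaf MIME part: ^t content type, ^v content."""
    msg = message_from_string(raw)
    fields = []
    for part in msg.walk():
        if part.is_multipart():
            continue   # multipart containers carry no content of their own
        payload = part.get_payload(decode=True) or b""
        if part.get_content_maintype() == "text":
            text = payload.decode(part.get_content_charset() or "us-ascii")
        else:          # keep non-text parts as base64 text inside the field
            text = base64.b64encode(payload).decode("ascii")
        fields.append((BODY_TAG, f"^t{part.get_content_type()}^v{text}"))
    return fields


raw = """MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain; charset="us-ascii"

Hello, see attachment.
--XX
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

AAEC
--XX--
"""
for field in body_fields(raw):
    print(field)
```

Large binary parts would quickly hit the record size limit, which is one argument for storing them outside the masterfile and keeping only a reference in the field.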


* ISIS is an XML database

Likewise XML, which basically is text, can be stored in an ISIS database
(subject to the implementation's maximum record length).
Add some formatting exits to address the XML node content via
a DOM-style a.b.c notation as used in JavaScript, use them in your FST,
and you will surely have one of the world's best indexed and fastest
XML databases -- most others use a relational DB as their basis.
So indexing, retrieving and displaying XML data is more or less
simply a matter of some formatting functions.
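
A formatting exit that addresses XML node content with a dotted a.b.c path might look roughly like the following Python sketch, which uses `xml.etree.ElementTree`. The function name `xml_path` is hypothetical, and the sketch assumes the whole XML document sits in a single field value. How such an exit is hooked into an FST is implementation-specific and not shown. Each returned string would become one index entry or one displayed value.

```python
import xml.etree.ElementTree as ET
from typing import List


def xml_path(field_value: str, path: str) -> List[str]:
    """Return the text of all nodes matching a dotted path like 'book.author.name'.

    The first component must name the root element; each further component
    selects all child elements of that name, so repeated elements yield
    several values, much like repeatable fields."""
    root = ET.fromstring(field_value)
    first, *rest = path.split(".")
    if root.tag != first:
        return []
    nodes = [root]
    for name in rest:
        nodes = [child for node in nodes for child in node.findall(name)]
    return [(node.text or "").strip() for node in nodes]


doc = """<book>
  <title>Managing ISIS</title>
  <author><name>Smith</name></author>
  <author><name>Doe</name></author>
</book>"""
print(xml_path(doc, "book.author.name"))   # ['Smith', 'Doe']
print(xml_path(doc, "book.title"))         # ['Managing ISIS']
```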

However, when thinking about data entry forms, for example,
the dark side of the force shows up:
even with a very sophisticated database system able to make
sense of XML DTDs, things get potentially much more complicated.
XML was meant to provide arbitrary complexity in the first place.
And when it comes to DTDs like that of XHTML, which will carry just about
the same content as any HTML page, one easily understands that reasonable
automatic processing becomes nearly impossible -- that's the reason why
HTML pages are largely beefed up with headers (Dublin Core and others).
If you really desperately need it, it's good to have it,
but otherwise using it might be looking for trouble.

When having to work with XML structures for one reason or another,
typically because they must be imported or exported,
one should think of a mapping between XML and ISIS structures.
In many situations XML structures are shallow and can be ISIfied by
simply mapping the first level of sister nodes to ISIS fields and the
second level to subfields (this may require repeated subfield support).
In other situations a closer look at the data structure may reveal
that it is not well designed with regard to Ockham's razor but contains
totally unnecessary depth which may be collapsed to the first case.
Actually, during several years of work with XML structures as
suggested by several "standards", I rarely found a reasonable
structure which cannot be mapped to a field-subfield schema.
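
The shallow mapping described above sends first-level sister nodes to fields and second-level nodes to subfields. It can be sketched in a few lines of Python. Both mapping tables (element name to tag, element name to subfield code) are hypothetical; a real application would derive them from its own schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical mapping tables: element name -> ISIS tag / subfield code.
FIELD_TAGS = {"title": 24, "author": 70, "publisher": 26}
SUBFIELD_CODES = {"name": "a", "affiliation": "b", "place": "c", "year": "d"}


def isify(xml_text: str) -> List[Tuple[int, str]]:
    """Map first-level child elements to fields and second-level ones to
    ^x subfields. Repeated elements yield repeated fields or subfields;
    attributes and mixed-content tail text are ignored in this sketch."""
    record = []
    for node in ET.fromstring(xml_text):
        tag = FIELD_TAGS[node.tag]         # KeyError if the mapping lacks it
        children = list(node)
        if not children:
            record.append((tag, (node.text or "").strip()))
        else:
            record.append((tag, "".join(
                "^" + SUBFIELD_CODES[c.tag] + (c.text or "").strip()
                for c in children)))
    return record


doc = """<record>
  <title>Managing ISIS</title>
  <author><name>Smith, J.</name><affiliation>UNESCO</affiliation></author>
  <author><name>Doe, A.</name></author>
  <publisher><place>Paris</place><year>1999</year></publisher>
</record>"""
for field in isify(doc):
    print(field)
```

Repeated `author` elements become repeated occurrences of field 70. A repeated second-level element would produce a repeated subfield code inside one field, which is why repeated subfield support may be needed.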


But even if you really need XML structures "as is",
they can be stored very
> Struct efficiently
in ISIS, with all the benefits of the flexible index
(cf.
> unirec the universal ISIS record)
.
Anyway, Dublin Core metadata or other RDF (resource description framework)
headers are conveniently stored in ISIS just like mail headers.
Maybe, as this schema was created to suit the needs of the very
old discipline of bibliographic knowledge management, much of that
discipline's experience was built into it.

On the other hand, XML's ancestor SGML was conceived for a document's body,
not its head, and I guess that is still where it has its place, in spite of
the programming industry's hype. The use of XML for structuring documents
that are meant to be read by humans rather than machines is, of course,
perfectly reasonable. Transparent access to file-based data associated
with a record and an XML add-on to the formatting language could aid
in converting extracts of document contents to metadata accessible
in the ISIS database and/or its index.


To wrap it up, I'd suggest looking at XML as an optional add-on to ISIS
rather than an integral part. ISIS already has all the functionality
needed to support any reasonable use of XML. ISIS data can contain
XML structures much more efficiently than the other way round.


* ISIS is a database for document/content management systems

It follows that ISIS may very well support the needs of
systems for XML documents or website content in XML or HTML.
With increasing experience with such systems, people tend to
understand that content metadata should be organized according
to bibliographic principles. (Not that surprising, is it?)
In cooperation with oc4science.org, there are projects at German
universities to integrate publishing, document management and website CMS,
based on an (Open)ISIS DB and directed by the librarian.


-----------
$Id: whatabout.txt,v 1.8 2003/02/14 17:30:33 kripke Exp $