The universal ISIS record


Users of CDS/ISIS are accustomed to stuffing not only bibliographic,
but all sorts of data into ISIS records. The 1994 edition of ANSI/NISO
Z39.2 ("Information Interchange Format", alias ISO2709), after which ISIS
records are modelled (more or less), contained a "reduction of references
to 'bibliographic' data, because the standard is used for many other types".

For example, CDS/ISIS uses an ISIS database to hold the various texts
for language-specific versions. Various control files like the syspar,
FDT and FST are well suited for storage in ISIS records.
Some implementations of CDS/ISIS probably use ISIS records internally
to hold that data, but none seems to be able to read/write them to/from
ISIS master files, ISO2709 or a choice of text formats, including XML.


* nesting

But then, there is more. I already mentioned that e-mail and many
simple XML structures are conveniently stored in ISIS records.
But even data which does not so obviously follow a tag-value-list
scheme can easily be fit into such a record. Consider what happens
when ASN.1/BER-encoded structures are sent down the wire to a Z39.50 server,
for example structured Type-1 queries to locate bibliographic information:
they are turned into a series of tags and values. "Twain OR Clemens" is
sent as an "OR" field followed by two term fields valued Twain and Clemens.

We can do that same serialization trick, of course:
embedding one structure (i.e. a record, i.e. a tag-value-list)
into another by simply inserting its fields.
That way we can achieve nesting of arbitrary depth, no less than with XML.

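To make the trick concrete, here is a minimal sketch in Python
(not OpenIsis code). The tag numbers OR and TERM and the helper
flatten are made up for this illustration; a node is assumed to be
either a plain string (a term) or a tuple (operator tag, left, right).

```python
# Hypothetical tag numbers, chosen only for this illustration.
OR, TERM = 1, 2

def flatten(node):
    """Turn a nested query into a flat list of (tag, value) fields.

    A node is either a plain string (a search term) or a tuple
    (operator_tag, left, right).
    """
    if isinstance(node, str):
        return [(TERM, node)]
    op, left, right = node
    # the operator field comes first, then the fields of both children
    return [(op, "")] + flatten(left) + flatten(right)

# "Twain OR (Clemens OR Sawyer)": one structure embedded in another
query = (OR, "Twain", (OR, "Clemens", "Sawyer"))
fields = flatten(query)
for tag, value in fields:
    print(tag, value)
```

The inner OR node is embedded simply by inserting its fields at the
place where a single term field would otherwise go.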
One problem that comes to mind: how do we tell the boundaries?
- for structures with a fixed number of fields, like the "OR" node
  having two children, boundaries are implicit.
- the length (number of children) may be given with the opening field.
  This may be inconvenient and/or error-prone, if not computed automatically.
- a closing item like "</>" may be used, e.g. a reserved field tag.

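The second and third option might be sketched as follows, again in
Python and under stated assumptions: the tags OPEN, CLOSE and ITEM
are hypothetical, and a child is either a single ITEM field or a
nested structure. This is an illustration only, not the scheme
OpenIsis actually uses.

```python
# Hypothetical tags, for illustration only.
OPEN, CLOSE, ITEM = 10, 11, 20

def read_counted(fields, pos=0):
    """Read one item; the opening field's value holds the child count.

    Returns (item, next_pos).
    """
    tag, value = fields[pos]
    if tag != OPEN:
        return value, pos + 1
    children = []
    pos += 1
    for _ in range(int(value)):
        child, pos = read_counted(fields, pos)
        children.append(child)
    return children, pos

def read_closed(fields, pos=0):
    """Read one item; a structure ends at a reserved CLOSE field.

    Returns (item, next_pos). A missing CLOSE raises IndexError.
    """
    tag, value = fields[pos]
    if tag != OPEN:
        return value, pos + 1
    children = []
    pos += 1
    while fields[pos][0] != CLOSE:
        child, pos = read_closed(fields, pos)
        children.append(child)
    return children, pos + 1  # skip the closing field

counted = [(OPEN, "2"), (ITEM, "a"), (OPEN, "1"), (ITEM, "b")]
closed = [(OPEN, ""), (ITEM, "a"), (OPEN, ""), (ITEM, "b"),
          (CLOSE, ""), (CLOSE, "")]
print(read_counted(counted)[0])
print(read_closed(closed)[0])
```

Both readers recover the same nested list. The counted variant needs
fewer fields, but the count has to be kept correct whenever a child
is added or removed; the closing-field variant costs one extra field
per structure.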
These approaches are now
> Struct discussed in more detail.


Based on such a schema, not only queries can be expressed and stored
as records, but also formats, with proper nesting of IFs, loops and so on.
This approach has several advantages:
- formats may be specified in any of several external representations,
  including XML
- the variants of the CDS/ISIS formatting language with different
  names for the same functions can be supported using input filters
- formats can be stored, retrieved and exchanged using standard means

On the other hand, the formatting language could be augmented
to support substructures. A straightforward extension, relatively easy
to use and to implement, would be a Pascal-style WITH r DO.
The current OpenIsis bindings, especially Tcl as the preferred formatting
language, contain such support.


* external representation

Besides CDS/ISIS master files and ISO2709 files,
there are several text-based formats suitable
for storing or exchanging ISIS records.

Most follow a name=value style and use separators like '=',
':' and linebreaks, with different quoting rules.
Among these are
- RFC 822 emails
- Java properties
- Windows-style .INI files
- character or tabulator separated values (csv/tsv)
  (think of the TAB as subfield delimiter)

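As a rough illustration of the name=value family, the following
Python sketch reads lines using either '=' or ':' as separator into
a list of fields. The function parse_lines is hypothetical, and the
sketch deliberately ignores quoting rules, comments and continuation
lines, which differ between the formats listed above.

```python
import re

# name, optional blanks, '=' or ':', optional blanks, rest of line
LINE = re.compile(r"\s*([^=:\s]+)\s*[=:]\s*(.*)")

def parse_lines(text):
    """Split 'name=value' or 'name: value' lines into (name, value) pairs.

    Repeated names are kept in order, as ISIS fields are repeatable.
    Lines without a separator are skipped.
    """
    fields = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            fields.append((match.group(1), match.group(2).rstrip()))
    return fields

sample = "author: Twain, Mark\nauthor=Clemens, Samuel\n\ntitle = Tom Sawyer"
for name, value in parse_lines(sample):
    print(name, "->", value)
```

A list of pairs is used instead of a dictionary so that repeated
fields and their order survive the conversion.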
Then there are XML/HTML/SGML and finally freestyle languages,
like the query or formatting language, where item boundaries
are determined by context.

Conversion would typically be based on one or more FDTs,
mapping between names and numbers.
Such a mapping, when used with formats, could also enable the
use of symbolic field names like author instead of v24.

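A name/number mapping of this kind could look like the Python sketch
below. The table FDT here is a plain dictionary made up for the
example, not a real FDT file: only author = 24 is taken from the text
above, the title tag is an arbitrary assumption.

```python
# Hypothetical FDT excerpt: field name -> tag number.
# Only author = 24 comes from the text; title = 30 is invented.
FDT = {"author": 24, "title": 30}
BY_TAG = {tag: name for name, tag in FDT.items()}

def to_tags(named):
    """Replace symbolic field names by their tag numbers."""
    return [(FDT[name], value) for name, value in named]

def to_names(tagged):
    """Replace tag numbers by names; unknown tags stay numeric (as text)."""
    return [(BY_TAG.get(tag, str(tag)), value) for tag, value in tagged]

record = [("author", "Twain, Mark"), ("title", "Tom Sawyer")]
print(to_tags(record))
print(to_names([(24, "Twain, Mark"), (99, "unmapped")]))
```

With several FDTs, one such table per database would be needed, and
an unknown name in to_tags raises KeyError rather than guessing a tag.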
The "plain" representation preferably used by OpenIsis
is described in the paper on
> Serialized
records.

-----
$Id: unirec.txt,v 1.6 2003/04/07 13:12:43 kripke Exp $