The universal ISIS record

Users of CDS/ISIS are accustomed to stuffing not only bibliographic data
but all sorts of data into ISIS records. The 1994 edition of ANSI/NISO
Z39.2 ("Information Interchange Format", the counterpart of ISO2709),
after which ISIS records are modelled (more or less), included a "reduction
of references to 'bibliographic' data, because the standard is used for
many other types".

For example, CDS/ISIS uses an ISIS database to hold the various texts
for language-specific versions. Control files such as the syspar,
FDT and FST are also well suited for storage in ISIS records.
Some implementations of CDS/ISIS probably use ISIS records internally
to hold that data, but none seems to be able to read or write it from
or to ISIS master files, ISO2709 or a choice of text formats, including XML.


* nesting

But then, there is more. I already mentioned that e-mail and many
simple XML structures are conveniently stored in ISIS records.
But even data which does not so obviously follow a tag-value-list
scheme can easily be fitted into such a record. Consider what happens
when ASN.1/BER-encoded structures are sent down the wire to a Z39.50 server,
for example structured Type-1 queries to locate bibliographic information:
they are turned into a series of tags and values. "Twain OR Clemens" is
sent as an "OR" field followed by two term fields valued Twain and Clemens.

We can do the same serialization trick, of course:
embedding one structure (i.e. record, i.e. tag-value list)
into another by simply inserting its fields.
That way we can achieve nesting of arbitrary depth, no less than with XML.
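
As an illustration, here is a minimal Python sketch of that trick, assuming records are modelled as lists of (tag, value) pairs. The tag numbers TAG_OP and TAG_TERM and the functions serialize and parse are hypothetical; they are not part of CDS/ISIS, OpenIsis or Z39.50. The operator is written first, followed by its operands, and because every operator here is binary, the decoder needs no boundary markers.

```python
# Hypothetical tag numbers for this sketch only.
TAG_OP = 1    # operator node; the value is the operator name
TAG_TERM = 2  # leaf node; the value is the search term

# Fixed arity per operator: this is what makes boundaries implicit.
ARITY = {"AND": 2, "OR": 2}


def serialize(query, out=None):
    """Flatten a query tree into (tag, value) fields, operator first."""
    if out is None:
        out = []
    if isinstance(query, str):
        out.append((TAG_TERM, query))
    else:
        op, *operands = query
        if len(operands) != ARITY[op]:
            raise ValueError(f"{op} expects {ARITY[op]} operands")
        out.append((TAG_OP, op))
        for operand in operands:
            serialize(operand, out)
    return out


def parse(fields, pos=0):
    """Rebuild the tree starting at fields[pos]; return (tree, next_pos)."""
    tag, value = fields[pos]
    pos += 1
    if tag == TAG_TERM:
        return value, pos
    operands = []
    for _ in range(ARITY[value]):  # read exactly as many subtrees as the arity says
        operand, pos = parse(fields, pos)
        operands.append(operand)
    return (value, *operands), pos


fields = serialize(("OR", "Twain", "Clemens"))
print(fields)         # [(1, 'OR'), (2, 'Twain'), (2, 'Clemens')]
print(parse(fields))  # (('OR', 'Twain', 'Clemens'), 3)
```

Nesting works the same way at any depth: ("AND", ("OR", "Twain", "Clemens"), "Huckleberry") becomes five fields and parses back to the same tree.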

One problem that comes to mind: how do we tell where an embedded structure ends?
- for structures with a fixed number of fields, like the "OR" node
having two children, the boundaries are implicit.
- the length (number of children) may be given with the opening field.
This may be inconvenient and/or error-prone if not computed automatically.
- a closing item like "</>" may be used, e.g. a reserved field tag.
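
The second and third approaches can be sketched in Python as follows, again assuming records are lists of (tag, value) pairs where a value may itself be a list, i.e. an embedded record. STRUCT_TAGS, END_TAG and all tag numbers are made-up conventions for this sketch; the decoder is assumed to know from some schema (say, an FDT) which tags open an embedded record.

```python
# Hypothetical conventions for this sketch (not defined by CDS/ISIS or OpenIsis).
STRUCT_TAGS = {10}  # tags that open an embedded record; writers must not use them for plain fields
END_TAG = 0         # reserved tag used as the closing item ("</>")


def flatten_counted(record):
    """The opening field carries the number of child items of the embedded record."""
    out = []
    for tag, value in record:
        if isinstance(value, list):
            out.append((tag, str(len(value))))
            out.extend(flatten_counted(value))
        else:
            out.append((tag, value))
    return out


def unflatten_counted(fields, pos=0, count=None):
    """Rebuild nesting; count=None reads up to the end of the field list."""
    record = []
    while pos < len(fields) and (count is None or len(record) < count):
        tag, value = fields[pos]
        pos += 1
        if tag in STRUCT_TAGS:
            child, pos = unflatten_counted(fields, pos, int(value))
            record.append((tag, child))
        else:
            record.append((tag, value))
    if count is not None and len(record) < count:
        raise ValueError("embedded record truncated")
    return record, pos


def flatten_closed(record):
    """An embedded record is followed by a reserved closing field."""
    out = []
    for tag, value in record:
        if isinstance(value, list):
            out.append((tag, ""))
            out.extend(flatten_closed(value))
            out.append((END_TAG, ""))
        else:
            out.append((tag, value))
    return out


def unflatten_closed(fields, pos=0, nested=False):
    """Rebuild nesting by reading up to the matching closing field."""
    record = []
    while pos < len(fields):
        tag, value = fields[pos]
        pos += 1
        if tag == END_TAG:
            if not nested:
                raise ValueError("unbalanced closing field")
            return record, pos
        if tag in STRUCT_TAGS:
            child, pos = unflatten_closed(fields, pos, nested=True)
            record.append((tag, child))
        else:
            record.append((tag, value))
    if nested:
        raise ValueError("missing closing field")
    return record, pos


rec = [(24, "Twain"), (10, [(11, "Samuel"), (12, "Clemens")])]
print(flatten_counted(rec))  # [(24, 'Twain'), (10, '2'), (11, 'Samuel'), (12, 'Clemens')]
print(flatten_closed(rec))   # [(24, 'Twain'), (10, ''), (11, 'Samuel'), (12, 'Clemens'), (0, '')]
print(unflatten_counted(flatten_counted(rec))[0] == rec)  # True
print(unflatten_closed(flatten_closed(rec))[0] == rec)    # True
```

Note that the count refers to child items, not to flattened fields, which is why it is best computed by the writer rather than maintained by hand.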

These approaches are now
> Struct discussed in more detail.

Based on such a schema, not only queries but also formats, with proper
nesting of IFs, loops and so on, can be expressed and stored as records.
This approach has a couple of advantages:
- formats may be specified in any of several external representations,
including XML
- the variants of the CDS/ISIS formatting language that use different
names for the same functions can be supported by input filters
- formats can be stored, retrieved and exchanged by standard means

On the other hand, the formatting language could be extended
to support substructures. A straightforward extension, relatively easy
to use and to implement, would be a PASCAL-style WITH r DO.
The current OpenIsis bindings, especially Tcl as the preferred formatting
language, contain such support.
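
A WITH r DO can be imitated in a host language quite directly. The following Python sketch is only an illustration and does not reflect the actual OpenIsis Tcl bindings: the class Formatter, its methods v and with_, and the record layout (lists of (tag, value) pairs, an embedded record being a list value) are all hypothetical. As with Pascal's WITH, field references inside the block are resolved against the embedded record first and fall back to the enclosing one.

```python
from contextlib import contextmanager


class Formatter:
    """Toy format context over records of (tag, value) fields.

    A value may itself be a list of fields, i.e. an embedded record.
    """

    def __init__(self, record):
        self._scopes = [record]  # innermost scope is last

    def v(self, tag):
        """All occurrences of field `tag`, like vNN in a format.

        Searches the innermost scope first, then the enclosing ones,
        so inner fields shadow outer ones (as with Pascal's WITH).
        """
        for rec in reversed(self._scopes):
            values = [value for t, value in rec if t == tag]
            if values:
                return values
        return []

    @contextmanager
    def with_(self, tag, occ=0):
        """WITH r DO: make occurrence `occ` of embedded record `tag` current."""
        sub = self.v(tag)[occ]
        if not isinstance(sub, list):
            raise TypeError(f"field {tag} does not hold an embedded record")
        self._scopes.append(sub)
        try:
            yield self
        finally:
            self._scopes.pop()  # leave the WITH block even on errors


rec = [(24, "Twain"), (10, [(11, "Samuel"), (12, "Clemens")])]
fmt = Formatter(rec)
with fmt.with_(10):
    print(fmt.v(11), fmt.v(12), fmt.v(24))  # ['Samuel'] ['Clemens'] ['Twain']
print(fmt.v(12))  # []
```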


* external representation

Besides CDS/ISIS master files and ISO2709 files,
there are a couple of text-based formats suitable
for storing or exchanging ISIS records.

Most follow a name=value style and use separators like '=',
':' and line breaks, with different quoting rules.
Among these are
- RFC 822 e-mails
- Java properties
- Windows-style .INI files
- character- or tab-separated values (csv/tsv)
(think of the TAB as a subfield delimiter)
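
Reading the name=value styles is mostly a matter of splitting lines. The Python sketch below, under simplifying assumptions, reads an RFC 822-style header block into an ordered list of (name, value) pairs; with sep="=" it also covers the simplest Java-properties or .INI lines. It deliberately ignores the quoting and escaping rules of the individual formats, comments and .INI sections, and it unfolds continuation lines by joining them with a single space. The function name parse_pairs is made up for illustration.

```python
def parse_pairs(text, sep=":"):
    """Read 'name<sep>value' lines into an ordered list of (name, value) pairs.

    Repeated names are kept as separate pairs, like repeatable ISIS fields.
    A line starting with a space or TAB continues the previous value
    (RFC 822-style folding). Reading stops at the first empty line, which
    in an e-mail separates the header from the body.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            break  # end of the header block
        if line[0] in " \t":
            if not pairs:
                raise ValueError("continuation line before the first field")
            name, value = pairs[-1]
            pairs[-1] = (name, value + " " + line.strip())
            continue
        name, found, value = line.partition(sep)
        if not found:
            raise ValueError(f"no {sep!r} in line {line!r}")
        pairs.append((name.strip(), value.strip()))
    return pairs


mail = "From: Mark Twain\nTo: Samuel\n Clemens\nSubject: pen names\n\nbody"
print(parse_pairs(mail))
# [('From', 'Mark Twain'), ('To', 'Samuel Clemens'), ('Subject', 'pen names')]
print(parse_pairs("author=Twain\nauthor=Clemens", sep="="))
# [('author', 'Twain'), ('author', 'Clemens')]
```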

Then there are XML/HTML/SGML and, finally, free-style languages
like the query or formatting language, where item boundaries
are determined by context.

Conversion would typically be based on one or more FDTs,
mapping between field names and tag numbers.
Such a mapping, when used with formats, could also enable the
use of symbolic field names like author instead of v24.
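
A minimal Python sketch of such an FDT-based conversion, assuming the FDT has already been read into a plain dictionary from tag number to field name. Only author = 24 is taken from the text above; the other FDT entries, like the names to_record, to_pairs and resolve_names, are hypothetical. resolve_names does naive whole-word substitution and would also touch words inside quoted literals, so a real implementation would work on parsed formats.

```python
import re

# Hypothetical FDT excerpt: tag number -> symbolic field name.
# Only 24 = author comes from the text; the rest is made up.
FDT = {24: "author", 30: "title", 40: "year"}
NAME_TO_TAG = {name: tag for tag, name in FDT.items()}


def to_record(pairs):
    """Map (name, value) pairs to (tag, value) fields using the FDT."""
    record = []
    for name, value in pairs:
        if name in NAME_TO_TAG:
            tag = NAME_TO_TAG[name]
        elif name.isdigit():
            tag = int(name)  # accept plain numeric tags not listed in the FDT
        else:
            raise KeyError(f"field name not in FDT: {name!r}")
        record.append((tag, value))
    return record


def to_pairs(record):
    """Inverse direction: (tag, value) fields back to (name, value) pairs."""
    return [(FDT.get(tag, str(tag)), value) for tag, value in record]


def resolve_names(fmt):
    """Rewrite symbolic field names in a format, e.g. author -> v24.

    Naive whole-word substitution: it does not skip quoted literals.
    """
    def repl(match):
        word = match.group(0)
        return f"v{NAME_TO_TAG[word]}" if word in NAME_TO_TAG else word
    return re.sub(r"\b[A-Za-z_]\w*\b", repl, fmt)


rec = to_record([("author", "Twain"), ("author", "Clemens"), ("99", "x")])
print(rec)            # [(24, 'Twain'), (24, 'Clemens'), (99, 'x')]
print(to_pairs(rec))  # [('author', 'Twain'), ('author', 'Clemens'), ('99', 'x')]
print(resolve_names("author, ' (', year, ')'"))  # v24, ' (', v40, ')'
```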

The "plain" representation, as preferably used by OpenIsis,
is described in the paper on
> Serialized
records.

-----
$Id: unirec.txt,v 1.6 2003/04/07 13:12:43 kripke Exp $