Revision 237
Mon Mar 8 17:43:12 2004 UTC by dpavlin
initial import of openisis 0.9.0 vendor drop

The universal ISIS record



Users of CDS/ISIS are accustomed to stuffing not only bibliographic,
but all sorts of data into ISIS records. The 1994 edition of ANSI/NISO
Z39.2 ("Information Interchange Format" alias ISO2709), after which ISIS
records are modelled (more or less), contained a "reduction of references
to 'bibliographic' data, because the standard is used for many other types".

For example, CDS/ISIS uses an ISIS database to hold the various texts
for language-specific versions. Various control files like the syspar,
FDT and FST are well suited for storage in ISIS records.
Probably some implementations of CDS/ISIS internally use ISIS records
to hold that data, but none seems to be able to read/write those to/from
ISIS master files, ISO2709 or a choice of text formats, including XML.


* nesting

But then, there is more. I already mentioned that e-mail and many
simple XML structures are conveniently stored in ISIS records.
But even data which does not all that obviously follow a tag-value-list
scheme can be fit easily into such a record. Consider what is happening
when ASN.1/BER-encoded structures are sent down the wire to a Z39.50 server,
for example structured Type-1 queries to locate bibliographic information:
They are turned into a series of tags and values. "Twain OR Clemens" is
sent as an "OR" field followed by two term fields valued Twain and Clemens.

We can do that same serialization trick, of course:
embedding one structure (i.e. record i.e. tag-value-list)
into another by simply inserting the fields.
That way we can achieve nesting of arbitrary depth no less than with XML.
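
As an illustration of this flattening, here is a minimal Python sketch.
It is not OpenIsis code: the function name flatten, the symbolic tags
"OR", "AND" and "TERM", and the tuple layout of a query node are all
assumptions made for the example. Every operator is taken to have exactly
two children, so no boundary markers are needed.

```python
def flatten(node):
    """Turn a nested query into a flat list of (tag, value) fields.

    A node is either a string (a search term) or a tuple
    (operator, left, right). The tags used here are hypothetical.
    """
    if isinstance(node, str):
        return [("TERM", node)]
    op, left, right = node
    # The children's fields are simply inserted after the opening field.
    return [(op, "")] + flatten(left) + flatten(right)


simple = flatten(("OR", "Twain", "Clemens"))
nested = flatten(("AND", ("OR", "Twain", "Clemens"), "Finn"))
```

Here simple holds an "OR" field followed by two term fields, and nested
shows that an inner structure needs nothing beyond its own fields placed
in sequence.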

One problem that comes to mind: how do we tell the boundaries?
- for structures with a fixed number of fields, like the "OR" node
  having two children, boundaries are implicit.
- the length (number of children) may be given with the opening field.
  This may be inconvenient and/or error-prone, if not computed automatically.
- a closing item like "</>" may be used, e.g. a reserved field tag.
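
The last two options can be sketched as a pair of small recursive readers
in Python. Again this is only an illustration under assumptions: the tag
"TERM" for leaves, the reserved closing tag "/" and both function names
are hypothetical, and a record is modelled as a list of (tag, value) pairs.

```python
END = "/"  # hypothetical reserved tag that closes a structure


def parse_counted(fields, pos=0):
    """Read one node; an opening field's value is its number of children."""
    tag, value = fields[pos]
    pos += 1
    if tag == "TERM":
        return value, pos
    children = []
    for _ in range(int(value)):
        child, pos = parse_counted(fields, pos)
        children.append(child)
    return (tag, children), pos


def parse_closed(fields, pos=0):
    """Read one node; a structure runs until the matching END field."""
    tag, value = fields[pos]
    pos += 1
    if tag == "TERM":
        return value, pos
    children = []
    while fields[pos][0] != END:
        child, pos = parse_closed(fields, pos)
        children.append(child)
    return (tag, children), pos + 1  # step over the closing field


counted = [("OR", "2"), ("TERM", "Twain"), ("TERM", "Clemens")]
closed = [("AND", ""), ("OR", ""), ("TERM", "Twain"), ("TERM", "Clemens"),
          (END, ""), ("TERM", "Finn"), (END, "")]
tree_counted, end_counted = parse_counted(counted)
tree_closed, end_closed = parse_closed(closed)
```

Both readers return the node together with the position of the next
unread field. With the counted variant a wrong count silently shifts all
following boundaries, which is the error-proneness mentioned above; with
the closing tag a missing END runs off the end of the list and raises an
IndexError.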

These approaches are now
> Struct discussed in more detail.


Based on such a schema, not only queries can be expressed and stored
as records, but also formats, with proper nesting of IFs, loops and so on.
This approach has a couple of advantages:
- formats may be specified in any of a couple of external representations
  including XML
- the variants of the CDS/ISIS formatting language with different
  names for the same functions can be supported using input filters
- formats can be stored, retrieved and exchanged using standard means

On the other hand, the formatting language could be augmented
to support substructures. A straightforward and relatively easy to
use and implement extension would be a PASCAL-style WITH r DO.
The current OpenIsis bindings, especially Tcl as preferred formatting language,
contain such support.


* external representation

Besides CDS/ISIS master files and ISO2709 files,
there are a couple of text based formats suitable
to store or exchange ISIS records.

Most follow a name=value style and are using separators like '=',
':' and linebreaks, with different quoting rules.
Among these are
- RFC 822 emails
- Java properties
- Windows-style .INI files
- character or tabulator separated values (tsv/csv)
  (think of the TAB as subfield delimiter)
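
To make the name=value idea concrete, here is a rough Python sketch of
reading such lines into a record. It deliberately ignores the differing
quoting and line-continuation rules of the formats listed above, and the
function names and sample data are made up for the example.

```python
def parse_line(line):
    """Split one 'name=value' or 'name: value' line at the first separator."""
    for i, ch in enumerate(line):
        if ch in "=:":
            return line[:i].strip(), line[i + 1:].strip()
    raise ValueError("no separator in line: %r" % line)


def parse_record(text):
    """Return the non-empty lines of text as an ordered list of fields."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


def subfields(value):
    """Treat TAB as the subfield delimiter within one field value."""
    return value.split("\t")


record = parse_record("author=Twain, Mark\ntitle: Tom Sawyer\n")
parts = subfields("Hartford\tAmerican Publishing\t1876")
```

The record is kept as a list rather than a dictionary, so repeated
fields and their order survive, as they would in an ISIS record.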

Then there are XML/HTML/SGML and finally freestyle languages,
like the query or formatting language, where item boundaries
are determined depending on context.

Conversion would typically be based on one or more FDTs,
mapping between names and numbers.
Such a mapping, when used with formats, could also enable the
use of symbolic field names like author instead of v24.
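
A small Python sketch of such a mapping follows. The FDT is reduced to a
plain dictionary here, which is an assumption for the example and not the
actual FDT file layout; only the pairing of author with 24 comes from the
text above, the title entry and all function names are invented.

```python
import re

# Hypothetical excerpt of an FDT: symbolic name -> field number.
FDT = {"author": 24, "title": 70}


def to_numeric(record):
    """Replace symbolic names in a list of (name, value) fields by tags."""
    return [(FDT[name], value) for name, value in record]


def resolve_format(fmt):
    """Rewrite known symbolic names in a format string as vNN selectors.

    Naive on purpose: a name inside a quoted literal would be rewritten too.
    """
    def repl(match):
        name = match.group(0)
        return "v%d" % FDT[name] if name in FDT else name
    return re.sub(r"\b[A-Za-z_]\w*\b", repl, fmt)


numeric = to_numeric([("author", "Twain, Mark"), ("title", "Tom Sawyer")])
fmt = resolve_format("author, ' : ', title")
```

A real filter would have to tokenize the format properly instead of
using a regular expression, so that literals are left untouched.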

The "plain" representation as preferably used by OpenIsis
is described in the paper on
> Serialized
records.

-----
$Id: unirec.txt,v 1.6 2003/04/07 13:12:43 kripke Exp $