* some notes on the relation of XML and ISIS


XML is in widespread use as a lingua franca
for gluing software components together.
Several tools for this can be found at xml.apache.org.

What is missing here is an efficient, easy-to-use
way of storing XML data. Only the most trivial cases
are easily mapped onto the relational data model,
which uses flat records consisting of a fixed number
of fields. The data structures modelled in XML
typically have a variable number of children.
Hierarchical databases like ADABAS C are well suited
and actually used by Software AG in their Tamino XML DB,
but aren't widely and freely available.


* ISIS to XML

ISIS records can be easily and canonically converted to XML.
Anything up to the first subfield delimiter is the body (a text node),
subfields are attributes
(strictly speaking, this is valid XML only for non-repeated subfields).
Other special subdivisions of field content, like the typical
<key word>, may be split into real child nodes.

The result (as generated by make pdemo) may look like:
$
<isisrec id="148">
  <v69>
    <key>Educational Psychology</key>
    <key>universities</key>
    <key>Kenya</key>
  </v69>
  <v70>Okatcha, F.M.M.O.</v70>
  <v30 a="1 p."/>
  <v24>Personal statement</v24>
  <v26 c="1976"/>
  <v12 p="Tbilisi, USSR" d="1976">Symposium on the Psychological Bases of Programmed Learning </v12>
</isisrec>
$

Instead of tag numbers and subfield characters,
symbolic names from the FDT may be used.
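
A minimal Python sketch of this mapping, under assumptions not stated above:
fields arrive as (tag, value) pairs, '^' followed by a one-character code
delimits subfields, and record_to_xml is a hypothetical helper name, not part
of any ISIS tool. A repeated subfield code simply overwrites the earlier
attribute here, and text outside <key word> markers is dropped.

```python
import re
import xml.etree.ElementTree as ET


def record_to_xml(mfn, fields):
    """Convert one record to XML (hypothetical helper, illustration only).

    fields: list of (tag number, raw value) pairs; subfields are assumed
    to be delimited by '^' followed by a one-character code.
    """
    rec = ET.Element("isisrec", id=str(mfn))
    for tag, raw in fields:
        # Everything up to the first subfield delimiter is the body.
        body, *subfields = raw.split("^")
        elem = ET.SubElement(rec, "v%d" % tag)
        for sub in subfields:
            if sub:
                # Subfield code becomes the attribute name.
                # A repeated code overwrites the earlier value.
                elem.set(sub[0], sub[1:])
        # <key word> markers become real child nodes.
        keys = re.findall(r"<([^<>]*)>", body)
        if keys:
            for key in keys:
                ET.SubElement(elem, "key").text = key
        elif body:
            elem.text = body
    return ET.tostring(rec, encoding="unicode")


if __name__ == "__main__":
    print(record_to_xml(148, [
        (69, "<Educational Psychology><universities><Kenya>"),
        (70, "Okatcha, F.M.M.O."),
        (30, "^a1 p."),
    ]))
```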


* XML to ISIS

XML data structures can be
easily and efficiently mapped to the data model of ISO2709.

The general conversion (based on a SAX parser) works as follows:
- When encountering an opening tag, look up its name in the FDT.
  If there is no FDT provided, create one on the fly.
  If the FDT does not contain the tag name,
  create a new entry using tag number max(100,1+highest tag in FDT).
  Create a field using the tag number found and field value '+'.
- When encountering an attribute, look up its name in the
> Meta metadata
  Create a new subfield entry if needed using code 'a'
  or 1+highest code used (for this tag).
  Append a subfield using the code found.
- When encountering an empty tag (the current element ends with />),
  change the starting '+' to '-'.
- When encountering a text node, add a field using tag number 0
  with the node's body as value.
- When encountering a closing tag, look up its name as for opening tags,
  add a field with an empty value.
- As an additional optimization, most text nodes can be eliminated
  by using the initial value of a node to represent an immediately
  following text node.
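
The steps above (without the final optimization) can be sketched in Python
with the standard xml.sax module. This is an illustration, not the converter's
actual code: IsisHandler and xml_to_isis are hypothetical names, subfield codes
are kept in a plain per-tag dict, and whitespace-only text nodes are dropped.
Because SAX does not report whether an element was written as <x/>, any
element with no content at all is treated as an empty tag.

```python
import xml.sax


class IsisHandler(xml.sax.ContentHandler):
    """Collect [tag, value] fields from SAX events (hypothetical sketch)."""

    def __init__(self, fdt=None):
        super().__init__()
        self.fdt = dict(fdt or {})  # element name -> tag number
        self.subfields = {}         # tag number -> {attribute name: code}
        self.fields = []            # resulting list of [tag, value]
        self._text = []             # buffered character data
        self._open = []             # field indexes of currently open elements

    def _tag(self, name):
        # Unknown name: new entry with max(100, 1 + highest tag in FDT).
        if name not in self.fdt:
            self.fdt[name] = max(100, 1 + max(self.fdt.values(), default=0))
        return self.fdt[name]

    def _code(self, tag, attr):
        # Unknown attribute: code 'a', or 1 + highest code used for this tag.
        codes = self.subfields.setdefault(tag, {})
        if attr not in codes:
            codes[attr] = chr(1 + ord(max(codes.values()))) if codes else "a"
        return codes[attr]

    def _flush_text(self):
        # Assumption: surrounding whitespace is not significant.
        text = "".join(self._text).strip()
        self._text = []
        if text:
            self.fields.append([0, text])

    def startElement(self, name, attrs):
        self._flush_text()
        tag = self._tag(name)
        value = "+"
        for attr, val in attrs.items():
            value += "^" + self._code(tag, attr) + val
        self._open.append(len(self.fields))
        self.fields.append([tag, value])

    def endElement(self, name):
        self._flush_text()
        start = self._open.pop()
        if start == len(self.fields) - 1:
            # Nothing followed the opening field: change '+' to '-'.
            field = self.fields[start]
            field[1] = "-" + field[1][1:]
        else:
            # Closing tag: a field with an empty value.
            self.fields.append([self._tag(name), ""])

    def characters(self, content):
        # SAX may deliver one text node in several chunks.
        self._text.append(content)


def xml_to_isis(xml_bytes, fdt=None):
    """Return (fields, fdt) for an XML document given as bytes."""
    handler = IsisHandler(fdt)
    xml.sax.parseString(xml_bytes, handler)
    return handler.fields, handler.fdt


if __name__ == "__main__":
    fields, fdt = xml_to_isis(b'<r p="x"><n> A </n><e/></r>')
    for tag, value in fields:
        print(tag, value)
```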

For example, look at RDF (
> http://www.w3.org/RDF
,
> http://archive.dstc.edu.au/RDU/reports/RDF-Idiot
).
A structure like
$
<DC:Creator parseType="Resource">
  <vCard:FN> Dr Jacky J Crystal </vCard:FN>
  <vCard:TITLE> Director </vCard:TITLE>
  <vCard:EMAIL> jacky@dstc.com.au </vCard:EMAIL>
  <vCard:ROLE> Researcher </vCard:ROLE>
</DC:Creator>
$
canonically maps to
$
100 +^aResource
101 +
0 Dr Jacky J Crystal
101
102 +
...
$
or, with text-node elimination, to
$
100 +^aResource
101 -Dr Jacky J Crystal
102 -Director
...
100
$
using about half the bytes it takes to store the original.
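
The text-node elimination can be sketched as a pass over the canonical field
list. This is one reading of the optimization, not a confirmed implementation:
fields are assumed to be (tag, value) tuples, the folded text is placed between
the sign and the first subfield, and an element left with nothing but its text
is collapsed to a '-' field. eliminate_text_nodes is a hypothetical name.

```python
def eliminate_text_nodes(fields):
    """Fold a text node that directly follows an opening field into it.

    fields: list of (tag, value) tuples in the canonical form, where tag 0
    marks a text node and an empty value marks a closing tag.
    """
    out = []
    i = 0
    while i < len(fields):
        tag, value = fields[i]
        opens = tag != 0 and value.startswith("+")
        if opens and i + 1 < len(fields) and fields[i + 1][0] == 0:
            text = fields[i + 1][1]
            sign, subfields = value[0], value[1:]
            i += 2
            # If the element closes right after its text, no closing
            # field is needed and the sign becomes '-'.
            if i < len(fields) and fields[i] == (tag, ""):
                sign = "-"
                i += 1
            out.append((tag, sign + text + subfields))
        else:
            out.append((tag, value))
            i += 1
    return out


if __name__ == "__main__":
    canonical = [
        (100, "+^aResource"),
        (101, "+"), (0, "Dr Jacky J Crystal"), (101, ""),
        (102, "+"), (0, "Director"), (102, ""),
        (100, ""),
    ]
    for tag, value in eliminate_text_nodes(canonical):
        print(tag, value)
```

Text containing the subfield delimiter would need escaping, which this sketch
does not attempt.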

Had they made an attribute of everything that can be an attribute
(not substructured, not repeatable) instead of a child,
it would read (with explicitly assigned subfield codes)
much more efficiently like
$
100 ^pResource^fDr Jacky J Crystal^tDirector^ejacky@dstc.com.au^rResearcher
$

Also see
> unirec
and
> Struct

---
$Id: xmlisis.txt,v 1.7 2003/06/23 14:43:42 kripke Exp $