* some notes on the relation of XML and ISIS


XML is in widespread use as a lingua franca
for gluing software components together.
Several tools for this can be found at xml.apache.org.

What is missing here is an efficient, easy-to-use
way of storing XML data. Only the most trivial cases
are easily mapped onto the relational data model,
which uses flat records consisting of a fixed number
of fields. The data structures modelled in XML
typically have a variable number of children.
Hierarchical databases like ADABAS C are well suited to this
and are actually used by Software AG in their Tamino XML DB,
but aren't widely and freely available.


* ISIS to XML

ISIS records can be easily and canonically converted to XML.
Anything up to the first subfield delimiter becomes the body (a text node);
subfields become attributes
(strictly speaking, XML allows this only for non-repeated subfields,
since an attribute name may occur only once per element).
Other special subdivisions of field content, like the typical
<key word>, may be split into real child nodes.

The result (as generated by make pdemo) may look like:
$
<isisrec id="148">
<v69>
<key>Educational Psychology</key>
<key>universities</key>
<key>Kenya</key>
</v69>
<v70>Okatcha, F.M.M.O.</v70>
<v30 a="1 p."/>
<v24>Personal statement</v24>
<v26 c="1976"/>
<v12 p="Tbilisi, USSR" d="1976">Symposium on the Psychological Bases of Programmed Learning </v12>
</isisrec>
$

Instead of tag numbers and subfield characters,
symbolic names from the FDT may be used.
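
The mapping described in this section can be sketched in a few lines of Python.
This is only an illustration, not the code behind make pdemo: the function
names are hypothetical, '^' is assumed to be the subfield delimiter (as in the
examples in these notes), and the element names v<tag>, key and isisrec are
simply taken from the sample output above.

```python
import re
import xml.etree.ElementTree as ET


def field_to_element(tag, value):
    """Convert one ISIS field (tag number, raw value) to an XML element."""
    elem = ET.Element(f"v{tag}")
    # Everything before the first '^' is the body, the rest are subfields.
    body, *subfields = value.split("^")
    for sf in subfields:
        if sf:
            # First character is the subfield code, the remainder its value.
            # A repeated code overwrites the earlier one, because an XML
            # attribute cannot occur twice on the same element.
            elem.set(sf[0], sf[1:])
    # Assumption: <key word> groups in the body become <key> child nodes.
    keys = re.findall(r"<([^<>]*)>", body)
    if keys:
        for key in keys:
            ET.SubElement(elem, "key").text = key
    elif body:
        elem.text = body
    return elem


def record_to_xml(mfn, fields):
    """Convert a record, given as (tag, value) pairs, to an XML string."""
    rec = ET.Element("isisrec", id=str(mfn))
    for tag, value in fields:
        rec.append(field_to_element(tag, value))
    return ET.tostring(rec, encoding="unicode")


if __name__ == "__main__":
    print(record_to_xml(148, [
        (69, "<Educational Psychology><universities><Kenya>"),
        (70, "Okatcha, F.M.M.O."),
        (30, "^a1 p."),
    ]))
```

Using symbolic names from the FDT would only require replacing the v<tag>
and subfield-code names by a lookup.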


* XML to ISIS

XML data structures can be
easily and efficiently mapped to the data model of ISO 2709.

The general conversion (based on a SAX parser) works as follows:
- When encountering an opening tag, look up its name in the FDT.
If there is no FDT provided, create one on the fly.
If the FDT does not contain the tag name,
create a new entry using tag number max(100,1+highest tag in FDT).
Create a field using the tag number found and field value '+'.
- When encountering an attribute, look up its name in the
> Meta metadata
Create a new subfield entry if needed using code 'a'
or 1+highest code used (for this tag).
Append a subfield using the code found.
- When encountering an empty tag (the tag ends with />),
change the starting '+' to '-'.
- When encountering a text node, add a field using tag number 0
with the node's body as value.
- When encountering a closing tag, look up its name as for opening tags
and add a field with an empty value.
- As an additional optimization, most text nodes can be eliminated
by using the initial value of a node to represent an immediately
following text node.
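
The steps above can be sketched with Python's xml.sax module. This is a
minimal illustration under stated assumptions, not the actual converter:
the class and function names are hypothetical, the FDT is always created on
the fly, whitespace-only text nodes are dropped and other text is stripped,
and eliminated text is placed directly after the '+' or '-' marker. Since SAX
does not report whether a tag was written as <x/>, an element that closes
with nothing but (eliminated) text after its opening tag is treated as empty.

```python
import xml.sax


class IsisHandler(xml.sax.ContentHandler):
    def __init__(self, eliminate_text=False):
        super().__init__()
        self.eliminate_text = eliminate_text
        self.fdt = {}      # element name -> tag number
        self.codes = {}    # tag number -> {attribute name -> subfield code}
        self.fields = []   # entries: [tag, marker, body, subfields]
        self.text = []     # buffered character data
        self.fresh = None  # field just opened, nothing emitted after it yet

    def _tag(self, name):
        if name not in self.fdt:
            highest = max(self.fdt.values(), default=0)
            self.fdt[name] = max(100, 1 + highest)
        return self.fdt[name]

    def _code(self, tag, attr):
        codes = self.codes.setdefault(tag, {})
        if attr not in codes:
            codes[attr] = chr(1 + max(map(ord, codes.values()))) if codes else "a"
        return codes[attr]

    def _flush_text(self):
        text = "".join(self.text).strip()
        self.text = []
        if not text:
            return
        if self.eliminate_text and self.fresh is not None and not self.fresh[2]:
            self.fresh[2] = text                  # fold into the opening field
        else:
            self.fields.append([0, "", text, ""])  # explicit text node, tag 0
            self.fresh = None

    def startElement(self, name, attrs):
        self._flush_text()
        tag = self._tag(name)
        subs = "".join("^" + self._code(tag, a) + v for a, v in attrs.items())
        self.fresh = [tag, "+", "", subs]
        self.fields.append(self.fresh)

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        self._flush_text()
        if self.fresh is not None:
            self.fresh[1] = "-"   # empty tag: change '+' to '-'
        else:
            self.fields.append([self._tag(name), "", "", ""])
        self.fresh = None


def convert(xml_text, eliminate_text=False):
    """Return the ISIS fields for xml_text as 'tag value' strings."""
    handler = IsisHandler(eliminate_text)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return [f"{t} {m}{b}{s}".rstrip() for t, m, b, s in handler.fields]
```

Calling convert(text) gives the plain mapping; convert(text, eliminate_text=True)
applies the text-node optimization from the last step.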

For example, look at RDF (
> http://www.w3.org/RDF
,
> http://archive.dstc.edu.au/RDU/reports/RDF-Idiot
).
A structure like
$
<DC:Creator parseType="Resource">
<vCard:FN> Dr Jacky J Crystal </vCard:FN>
<vCard:TITLE> Director </vCard:TITLE>
<vCard:EMAIL> jacky@dstc.com.au </vCard:EMAIL>
<vCard:ROLE> Researcher </vCard:ROLE>
</DC:Creator>
$
canonically maps to
$
100 +^aResource
101 +
0 Dr Jacky J Crystal
101
102 +
...
$
or, with text-node elimination, to
$
100 +^aResource
101 -Dr Jacky J Crystal
102 -Director
...
100
$
using about half the bytes it takes to store the original.

If they had made an attribute of everything that can be an attribute
(not substructured, not repeatable) instead of a child,
it would read (with explicitly assigned subfield codes)
much more efficiently like
$
100 ^pResource^fDr Jacky J Crystal^tDirector^ejacky@dstc.com.au^rResearcher
$

Also see
> unirec
and
> Struct

---
$Id: xmlisis.txt,v 1.7 2003/06/23 14:43:42 kripke Exp $