About cases and trunks: La Maleta and the Malete Object Model.

* DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT

This document describes two data structures:
- la maleta (suitcase or Malete Array, MA)
  is a flexible and lightweight two-dimensional array
  which can be represented (stored, exchanged) as,
  and provides an interface to, a Malete Record
- el maletero (car boot, trunk or Malete Object, MO)
  is an extended maleta, supporting a DOM-style tree of contained "objects".
  The term "object" here, like in the somewhat mislabeled DOM,
  relates to structure, not to behaviour (methods).


* la maleta - the Malete Array

While the actual implementation of a maleta (e.g. by means of an actual
two-dimensional array) is not part of this specification,
the concepts are probably easiest to understand by thinking in terms
of a Malete Record as described in
> RecStruct
.

In the first dimension, there is a list of fields (pairs of tags and values).
Every field's value is typically structured into subfields.
The value of the first field (index 0), the header, is considered special:
it typically contains the record's id and/or control information.
The other fields (the body) constitute the record's contents.

A maleta is considered to be a field (its field 0) augmented by a body.
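
As a rough illustration of this layout, the following Python sketch models a record as a list of (tag, value) pairs whose first element is the header. The type names, the helper functions and the sample data are assumptions made for this example only; the specification deliberately leaves the actual implementation open.

```python
from typing import List, Tuple

Field = Tuple[int, str]          # (tag, value) pair
Record = List[Field]             # index 0 is the header, the rest is the body

def header(record: Record) -> Field:
    """Field 0, which typically carries the id and/or control information."""
    return record[0]

def body(record: Record) -> Record:
    """All remaining fields, i.e. the record's contents."""
    return record[1:]

# Hypothetical sample data; tags and values are made up for illustration.
record: Record = [(0, "42"), (24, "^aTitle^bSubtitle"), (70, "Some Author")]
```

Here `header(record)` yields the special first field, and `body(record)` yields the two content fields.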


Like any array-like data structure, the maleta uses an index expression
to address its parts for either reading them or assigning to them.
Only values can be assigned to; tags can only be inserted or deleted.

Like the PHP array, it has a built-in cursor (in the first dimension)
for the concept of a current field.

Like the Perl array, it supports slices (addressing multiple parts at once)
in both dimensions.


Here we describe the textual representation of an index,
which implementations will typically parse into an internal representation.

The parts of an index are all optional, but must appear in this order:
- a field spec selecting one or more fields by tag or position
- a subfield spec selecting one or more subfields by id or position
- a range spec selecting an offset and length
- a key spec selecting a key to match
Every part has an operator and a value (id).
An index may address multiple fields or multiple subfields.
Addressing both at once depends on the implementation supporting nested lists.
Implementations may ignore whitespace in index expressions.


The field spec uses a numerical value as tag or position.
Addressing a single field sets the cursor to that field:
- '-' first:
  reset to the head and move to the next element (having tag=id, if given).
- '+' next:
  move to the next element (having tag=id, if given) without resetting.
- (none) current:
  no change if no id is given or the cursor is on a field with tag=id;
  otherwise behaves like first.
- '@' index:
  selects the i-th element, using id as index (0 is the head).

Addressing multiple fields:
- '--' loop:
  loops over the elements having tag=id, returning a list of the individual results.
  Without an id, the list contains alternating tags and values.
- '++' end:
  loops at the end; useful for appending fields.
- '@@' values:
  loops, returning a list of values.
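
The single-field cursor operators can be sketched in Python as follows. This is only one possible reading of the rules above: the class and method names are hypothetical, and leaving the cursor untouched when no matching field is found is an assumption, since the text does not specify that case.

```python
class Cursor:
    """Sketch of the built-in cursor over a list of (tag, value) fields."""

    def __init__(self, fields):
        self.fields = fields   # index 0 is the head (header field)
        self.pos = 0

    def _seek(self, start, tag):
        # Move to the first field at or after `start` that has the given tag
        # (any field if tag is None). Assumption: no match leaves pos as is.
        for i in range(start, len(self.fields)):
            if tag is None or self.fields[i][0] == tag:
                self.pos = i
                return i
        return None

    def first(self, tag=None):      # '-' operator
        return self._seek(1, tag)   # reset to head, then move to next

    def next(self, tag=None):       # '+' operator
        return self._seek(self.pos + 1, tag)

    def current(self, tag=None):    # no operator
        if tag is None or self.fields[self.pos][0] == tag:
            return self.pos
        return self.first(tag)

    def at(self, i):                # '@' operator
        if 0 <= i < len(self.fields):
            self.pos = i
            return i
        return None
```

With fields tagged 0, 24, 70, 24 in that order, `first(24)` lands on position 1 and a following `next(24)` on position 3.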

The subfield spec defaults to none, selecting the entire field value.
- '^' subfield:
  selects the value (with the id cut off) of the subfield with this id.
  The id may also be the pseudo subfield '&', selecting the tag,
  or '@', selecting the cursor position.
- '?' test:
  returns a boolean 0/1 telling whether the field and/or subfield (with id) exists.
- '!' break:
  returns the field or (with id) subfield value if it exists;
  otherwise breaks processing.
- '#' position:
  with a number, selects the i-th subfield value, including any id.

Addressing multiple subfields:
- '^^' subfields:
  returns a list of subfield values (with the id cut off) for the given id, or for all.
  Without an id, the list contains alternating ids and values.
- '##' position:
  returns a list of unmodified values (i.e. a split on the subfield delimiter).
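
A minimal Python sketch of the '^', '^^' and '##' selectors follows. It assumes that the subfield delimiter is '^', that a subfield id is the single character after the delimiter, and that text before the first delimiter carries no id; none of this is stated explicitly above, and the function names are hypothetical.

```python
def split_raw(value, delim="^"):
    """'##': unmodified pieces, i.e. a plain split on the delimiter."""
    return value.split(delim)

def subfield(value, sid, delim="^"):
    """'^': value of the first subfield with this id, with the id cut off."""
    for part in value.split(delim)[1:]:
        if part[:1] == sid:
            return part[1:]
    return ""   # empty value if not found

def subfields(value, sid=None, delim="^"):
    """'^^': all values for the given id, or alternating ids and values."""
    parts = value.split(delim)[1:]
    if sid is not None:
        return [p[1:] for p in parts if p[:1] == sid]
    out = []
    for p in parts:
        out += [p[:1], p[1:]]
    return out
```

For the made-up value `"10^aTitle^bSub^aOther"`, `subfields(value, "a")` collects both a subfields, while `split_raw(value)` keeps the leading `"10"` and the ids.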

A range spec can have one or both, in that order, of the following:
- '*' offset:
  cuts off the first offset bytes (not characters)
- '.' length:
  cuts to the first length bytes
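
Because ranges count bytes rather than characters, a sketch in Python has to work on the encoded value. The function name and the choice of UTF-8 are assumptions for illustration; the text above does not name an encoding.

```python
def apply_range(value, offset=0, length=None, encoding="utf-8"):
    """Apply '*' offset and '.' length to a value, counting bytes."""
    data = value.encode(encoding)[offset:]     # '*': drop the first bytes
    return data if length is None else data[:length]   # '.': keep the first bytes
```

The result is returned as bytes, since a byte range may cut through a multi-byte character and then cannot be decoded safely.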

A key spec is part of setting the cursor: it keeps moving to the next field
while the selected data does not match the specified key.
When used with a test or break, it applies to the data (not the boolean
result) and, with an empty field spec, returns false or breaks
instead of moving to the next field.
- '==' exact:
  checks for an exact match
- '=%' prefix:
  checks for a prefix match
- '=:' contains:
  checks for a substring
- '=~' expr:
  evaluates the key as a regular expression (optional extension)
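
The four key operators and the "next while not matching" behaviour might be sketched like this in Python. The table and function names are hypothetical, and using Python's `re.search` for '=~' is an assumption; the text does not say which regular expression dialect an implementation would use.

```python
import re

# One predicate per key operator; each takes the selected data and the key.
MATCHERS = {
    "==": lambda data, key: data == key,
    "=%": lambda data, key: data.startswith(key),
    "=:": lambda data, key: key in data,
    "=~": lambda data, key: re.search(key, data) is not None,
}

def seek_key(values, start, op, key):
    """Advance from `start` while the data does not match; None if exhausted."""
    match = MATCHERS[op]
    for i in range(start, len(values)):
        if match(values[i], key):
            return i
    return None
```

Here `values` stands for the data already selected from each candidate field by the field, subfield and range specs.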

Index expressions are independent of any metadata.
In particular, they know nothing about fixed subfields
and only check for the delimiter character.
Fixed subfields may be accessed using ranges.

However, a helper procedure can be set to rewrite bad expressions, e.g.
turning field and subfield names into tags, subfield identifiers and ranges.


Minimal implementation requirements:
- tags may be limited to the range 0 to 65534, inclusive
- position ('#'), offset ('*') and length ('.') may be limited
  to the range 0 to 255, inclusive

* array operators

Basic operations on maletas are
- getting a single index:
  returns the value or list (an empty value or list if not found)
- getting multiple indexes:
  A failing test or break stops processing.
  A positive test does not produce an output value.
  Returns a list of the values returned by each index
  (unless there is only one non-test index).

In Tcl, get is the default operation. Examples, assuming a maleta called v:
$
v 24 ;# select the current (or first) field 24
v 24^a ^b ;# list of the a and b subfields of current field 24
foreach {i v} [v ^^] { puts "$i=$v" } ;# list all subfields of current
v --24 ;# list of all 24 values
v td^width ;# helper should rewrite this to 100^w
v -24?a:foo .2 ;# the MARC indicators of first 24 field where ^a contains foo
$v->get("-24?a:foo", ".2"); # more verbose in PHP, Perl
v.get("-24?a:foo .2"); // no varargs in Java, split at blanks
$

Assignment (set), like retrieval, takes any number of string parameters.
Implementations should also support passing multiple values in one
parameter as a list, maleta or serialized record.
Depending on the environment, this may require a different or overloaded
method.

An index addressing a single value (i.e. not a test) takes the next parameter
as its new value. If the addressed item does not exist, it is created.
Assigning no value (there is no next parameter) deletes the item.

If multiple items are addressed, all following parameters (or the elements
of a single list parameter) are applied in turn.
Excess parameters create new items; missing parameters delete items.
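
The rule for assigning to multiple fields can be sketched in Python as below. The function name is hypothetical, and appending newly created fields at the end of the record is an assumption; the text only says that excess parameters create new items.

```python
def set_all(fields, tag, values):
    """Assign `values` in turn to all body fields having `tag`."""
    pending = list(values)
    out = [fields[0]]                        # the header is left untouched
    for t, v in fields[1:]:
        if t != tag:
            out.append((t, v))               # unrelated field, keep as is
        elif pending:
            out.append((t, pending.pop(0)))  # next value replaces this field
        # else: no value left, so this field is deleted
    out.extend((tag, v) for v in pending)    # excess values create new fields
    return out
```

Applied to a record with two 24 fields and the values foo, bar and baz, the result has exactly three 24 fields; with an empty value list, all 24 fields are removed.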


Tcl uses a '=' parameter as assignment operator, '=@' to assign from a list.
Examples:
$
v ^a = $a ^b = $b ;# set current subfields a and b to the variables
v --24 = foo bar baz ;# rec now has exactly three 24 fields
v --24 =@ {foo bar baz} ;# same
$v->set("--24", "foo", "bar", "baz");
$

Insertion is a variant of assignment addressing newly created items.


* el maletero - the Malete Object

A maletero (or MO) is a maleta where every field is itself a maletero,
i.e. can have a body. Its body fields are called children.

A maletero corresponds to a region (contiguous sequence of fields)
in a plain Malete record by means of counted or delimited structures.


Maleteros come in three flavours:
- list (plain vanilla):
  The maletero behaves exactly like a maleta.
  All children are treated as simple fields, regardless of their tag.
  The MO maps one-to-one to its record.
  This is the most efficient mode where no complex children are needed.
- struct (+ strawberry, chocolate):
  Children with non-negative tags are treated as simple.
  A child with a negative tag -n corresponds to a region spanning n fields.
  This includes one field for the child's tag and header
  and any fields its children correspond to in turn.
  When looping or setting the cursor,
  counted subrecords are recognized and their body is skipped over.
- mom (with fruit and liquor):
  In this DOM-style mode, only fields with tag 0 are simple (text nodes).
  Every child with a positive tag corresponds to a delimited structure.
  An implementation may or may not support counted structures in mom mode.
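
For the struct flavour, skipping over counted subrecords can be sketched in Python as follows. The function name and the sample record are hypothetical; the sketch only relies on the rule that a negative tag -n marks a region spanning n fields, including the field carrying that tag.

```python
def top_level_children(fields):
    """Positions of the direct children of a struct-mode record."""
    positions = []
    i = 1                                 # position 0 is the header
    while i < len(fields):
        tag = fields[i][0]
        positions.append(i)
        i += -tag if tag < 0 else 1       # a tag of -n skips the n-field region
    return positions

# Made-up record: the field at position 2 opens a region of 3 fields (2..4).
fields = [(0, "h"), (10, "a"), (-3, "sub"), (11, "b"), (12, "c"), (13, "d")]
```

In this sample, the fields at positions 3 and 4 belong to the counted subrecord and are therefore not visited when looping at the top level.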

A maletero provides object handles to its parent and children,
either by modifying the current handle or by creating new handles.
New handles can be based on a copy of the corresponding record
or refer to a region in the same record. The latter may not be supported
by all implementations, or may make the objects immutable
to avoid conflicting concurrent modifications.


* implementations

A basic implementation may provide only a maleta,
which is sufficient for traditional CDS/ISIS style record access.

A complete implementation may provide only a maletero, which can be used
as a maleta (just as in English "trunk" means both car boot and suitcase).

A particularly efficient implementation may provide both separately.


The initial implementation is a Tcl extension (written in C),
optionally augmented by a Tcl module (written in Tcl).
The abstract model, however, can be implemented similarly in other languages.

---
$Id: MOM.txt,v 1.3 2004/05/03 13:04:36 kripke Exp $