0.9.9e/doc/MOM.txt

About cases and trunks: La Maleta and the Malete Object Model.

*       DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT

This document describes two data structures:
-       la maleta (suitcase or Malete Array MA)
        is a flexible and lightweight two-dimensional array
        which can be represented (stored, exchanged) as,
        and provides an interface to, a Malete Record
-       el maletero (car boot, trunk or Malete Object MO)
        is an extended maleta, supporting a DOM-style tree of contained "objects".
        The term "object" here, like in the somewhat mislabeled DOM,
        relates to structure, not to behaviour (methods).


*       la maleta - the Malete Array

While the actual implementation of a maleta (e.g. by means of an actual
two dimensional array) is not part of this specification,
the concepts are probably easiest to understand by thinking in terms
of a Malete Record as described in
>       RecStruct
.

In the first dimension, there is a list of fields (pairs of tags and values).
Every field's value is typically structured into subfields.
The first (index 0) field's value (header) is considered special;
it typically contains the record's id a/o control information.
The other fields (body) constitute the record's contents.

A maleta is considered to be a field (its field 0) augmented by a body.
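The following minimal Python sketch models a maleta as a plain list of (tag, value) pairs, with index 0 as the header and the rest as the body. It is not the Malete implementation. The names maleta and split_subfields are hypothetical, and using '^' as the subfield delimiter is an assumption borrowed from CDS/ISIS conventions; the actual record format is defined in RecStruct.

```python
# Hypothetical model of a maleta: a list of (tag, value) pairs.
# Index 0 is the header field; everything after it is the body.
DELIM = "^"  # assumed subfield delimiter (CDS/ISIS style), not taken from RecStruct


def split_subfields(value):
    """Split a field value into (id, text) pairs.

    Text before the first delimiter is returned with the empty id.
    """
    pieces = value.split(DELIM)
    result = [("", pieces[0])] if pieces[0] else []
    for piece in pieces[1:]:
        if piece:  # skip empty pieces such as a trailing delimiter
            result.append((piece[0], piece[1:]))
    return result


maleta = [
    (0, "rec42"),              # header: record id a/o control information
    (24, "10^afoo bar^bbaz"),  # body field with indicators and two subfields
    (70, "^aSmith"),
]
header, body = maleta[0], maleta[1:]
print(header)                         # (0, 'rec42')
print(split_subfields(maleta[1][1]))  # [('', '10'), ('a', 'foo bar'), ('b', 'baz')]
```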


Like any array-like data structure, the maleta uses an index expression
to address its parts for either reading them or assigning to them.
Only values can be assigned to; tags can only be inserted or deleted.

Like the PHP array, it has a builtin cursor (in the first dimension)
for the concept of a current field.

Like the Perl array, it supports slices (addressing multiple parts at once)
in both dimensions.


Here we describe the textual representation of an index,
which implementations will typically parse into an internal representation.

The parts of an index are all optional, but must appear in this order:
-       a field spec selecting one or more fields by tag or position
-       a subfield spec selecting one or more subfields by id or position
-       a range spec selecting an offset and length
-       a key spec selecting a key to match
Every part has an operator and a value (id).
An index may address multiple fields or multiple subfields.
Addressing multiple fields and multiple subfields at once
requires the implementation to support nested lists.
Implementations may ignore whitespace in index expressions.
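To illustrate the textual syntax, here is a hypothetical parser that splits a single index into its four parts, trying longer operators first. parse_index and the Index class are not part of any Malete API. The sketch assumes one-character subfield ids, digits after '#', and a key that extends to the end of the expression. It does not resolve names; a rewriting helper would have to do that first.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Operators are tried longest first, so '--' is not read as '-'.
FIELD_OPS = ("--", "++", "@@", "-", "+", "@")
SUB_OPS = ("^^", "##", "^", "?", "!", "#")
KEY_OPS = ("==", "=%", "=:", "=~")
DIGITS = re.compile(r"\d+")


@dataclass
class Index:
    field_op: str = ""             # "" means the current field
    tag: Optional[int] = None      # tag, or position for '@'
    sub_op: str = ""               # "" means the entire field value
    sub_id: Optional[str] = None   # subfield id, or digits for '#'
    offset: Optional[int] = None   # '*'
    length: Optional[int] = None   # '.'
    key_op: str = ""
    key: Optional[str] = None


def _op(s, pos, ops):
    """Return the first operator in ops found at s[pos:], or ""."""
    return next((op for op in ops if s.startswith(op, pos)), "")


def _num(s, pos):
    """Parse an optional unsigned number at s[pos:]."""
    m = DIGITS.match(s, pos)
    return (int(m.group()), m.end()) if m else (None, pos)


def parse_index(text):
    s, ix = text.strip(), Index()
    ix.field_op = _op(s, 0, FIELD_OPS)
    ix.tag, pos = _num(s, len(ix.field_op))
    ix.sub_op = _op(s, pos, SUB_OPS)
    pos += len(ix.sub_op)
    if ix.sub_op in ("#", "##"):
        m = DIGITS.match(s, pos)
        if m:
            ix.sub_id, pos = m.group(), m.end()
    elif ix.sub_op and pos < len(s) and s[pos] not in "*.=":
        ix.sub_id, pos = s[pos], pos + 1  # assumption: one-character ids
    if s.startswith("*", pos):
        ix.offset, pos = _num(s, pos + 1)
    if s.startswith(".", pos):
        ix.length, pos = _num(s, pos + 1)
    ix.key_op = _op(s, pos, KEY_OPS)
    if ix.key_op:
        ix.key, pos = s[pos + 2:], len(s)  # the key runs to the end
    if pos != len(s):
        raise ValueError(f"cannot parse index {text!r} at position {pos}")
    return ix


ix = parse_index("-24?a=:foo")
print(ix.field_op, ix.tag, ix.sub_op, ix.sub_id, ix.key_op, ix.key)  # - 24 ? a =: foo
```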


The field spec uses a numerical value as tag or position.
Addressing a single field sets the cursor to that field:
-       '-' first:
        reset to the head and move to the next field (having tag=id, if given).
-       '+' next:
        move to the next field (having tag=id, if given) without resetting.
-       (none) current:
        no change with no id or if the cursor is on tag=id, else like first.
-       '@' index:
        selects the ith field, using id as the index (0 is the head).
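The four single-field operators can be sketched as methods of a small cursor object. This is a hypothetical Python model over (tag, value) pairs, not an actual Malete class. The text above does not specify what happens to the cursor when no field is found; this sketch simply leaves it past the end.

```python
class Cursor:
    """Hypothetical sketch of the builtin cursor over a list of
    (tag, value) fields, where position 0 is the head."""

    def __init__(self, fields):
        self.fields = fields
        self.pos = 0  # start on the head

    def next(self, tag=None):
        """'+': move to the next field (with this tag, if given)."""
        for i in range(self.pos + 1, len(self.fields)):
            if tag is None or self.fields[i][0] == tag:
                self.pos = i
                return self.fields[i]
        self.pos = len(self.fields)  # assumption: not found leaves the cursor past the end
        return None

    def first(self, tag=None):
        """'-': reset to the head, then move to the next matching field."""
        self.pos = 0
        return self.next(tag)

    def current(self, tag=None):
        """No operator: stay if there is no tag or the cursor is on it, else first."""
        on_field = self.pos < len(self.fields)
        if on_field and (tag is None or self.fields[self.pos][0] == tag):
            return self.fields[self.pos]
        return self.first(tag)

    def index(self, i):
        """'@': select the i-th field, 0 being the head."""
        if 0 <= i < len(self.fields):
            self.pos = i
            return self.fields[i]
        return None


c = Cursor([(0, "hdr"), (24, "a"), (70, "b"), (24, "c")])
print(c.first(24), c.next(24), c.current(70))  # (24, 'a') (24, 'c') (70, 'b')
```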

Addressing multiple fields:
-       '--' loop:
        loops over the fields having tag=id, returning a list of the individual results.
        Without an id, the list contains alternating tags and values.
-       '++' end:
        loops at the end; useful for appending fields.
-       '@@' values:
        loops, returning a list of values.
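Here is a sketch of the multiple-field operators '--' and '@@' over the same hypothetical (tag, value) model. The optional select function stands in for the rest of the index (for example a subfield spec) applied to each field. Restricting loops to the body is an assumption. '++' is omitted, since it matters mainly for assignment.

```python
def loop(fields, tag=None, select=None):
    """'--': with a tag, the results of select() for every body field with
    that tag; without a tag, a flat list alternating tags and values."""
    body = fields[1:]  # assumption: loops cover the body, not the header
    if tag is None:
        flat = []
        for t, v in body:
            flat.extend([t, v])
        return flat
    select = select or (lambda v: v)
    return [select(v) for t, v in body if t == tag]


def values(fields, tag=None):
    """'@@': a plain list of values (of all body fields, or those with tag)."""
    return [v for t, v in fields[1:] if tag is None or t == tag]


rec = [(0, "hdr"), (24, "10foo"), (70, "bar"), (24, "20baz")]
print(loop(rec, 24))                   # ['10foo', '20baz']
print(loop(rec))                       # [24, '10foo', 70, 'bar', 24, '20baz']
print(loop(rec, 24, lambda v: v[:2]))  # ['10', '20']
print(values(rec))                     # ['10foo', 'bar', '20baz']
```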

The subfield spec defaults to none, selecting the entire field value.
-       '^' subfield:
        selects the value (with id cut) of the subfield with this id.
        Id may also be the pseudo subfield '&', selecting the tag,
        or '@', selecting the cursor position.
-       '?' test:
        returns boolean 0/1 whether the field a/o subfield (with id) exists.
-       '!' break:
        returns the field or (with id) subfield value if it exists, else breaks processing.
-       '#' position:
        with a number, selects the ith subfield value, including any id.
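The single-subfield operators might look as follows. This again assumes '^' as the subfield delimiter. The Break exception and all function names are hypothetical. Positions for '#' are assumed to count the pieces of a split on the delimiter, with 0 being any text before the first delimiter.

```python
DELIM = "^"  # assumed subfield delimiter


class Break(Exception):
    """Raised by '!' to stop processing the remaining indexes."""


def sub_get(value, sub_id, tag=None, pos=None):
    """'^id': text of the first subfield with this id, id cut off.
    The pseudo ids '&' and '@' return the tag and the cursor position."""
    if sub_id == "&":
        return None if tag is None else str(tag)
    if sub_id == "@":
        return None if pos is None else str(pos)
    for piece in value.split(DELIM)[1:]:
        if piece[:1] == sub_id:
            return piece[1:]
    return None


def sub_test(value, sub_id=None):
    """'?': 1 if the field (value is not None) a/o the subfield exists, else 0."""
    if value is None:
        return 0
    return 1 if sub_id is None or sub_get(value, sub_id) is not None else 0


def sub_break(value, sub_id=None):
    """'!': like a get, but raise Break instead of returning nothing."""
    result = value if value is None or sub_id is None else sub_get(value, sub_id)
    if result is None:
        raise Break()
    return result


def sub_pos(value, i):
    """'#i': the i-th piece of a split on the delimiter, id included."""
    pieces = value.split(DELIM)
    return pieces[i] if 0 <= i < len(pieces) else None


v = "10^afoo^bbar"
print(sub_get(v, "a"), sub_get(v, "&", tag=24), sub_test(v, "c"), sub_pos(v, 2))
# foo 24 0 bbar
```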

Addressing multiple subfields:
-       '^^' subfields:
        returns a list of subfield values (with id cut) for the given id or all.
        Without an id, the list contains alternating ids and values.
-       '##' position:
        returns a list of unmodified values (i.e. a split on subfield delimiter).
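The multiple-subfield operators can be sketched in the same hypothetical style. The text above does not say whether text before the first delimiter belongs to the '^^' result. This sketch leaves it out of '^^' and keeps it in '##'.

```python
DELIM = "^"  # assumed subfield delimiter


def sub_all(value, sub_id=None):
    """'^^': with an id, the texts (id cut) of all subfields with that id;
    without an id, a flat list alternating ids and texts."""
    pieces = [p for p in value.split(DELIM)[1:] if p]
    if sub_id is not None:
        return [p[1:] for p in pieces if p[0] == sub_id]
    flat = []
    for p in pieces:
        flat.extend([p[0], p[1:]])
    return flat


def sub_split(value):
    """'##': the unmodified pieces of a split on the delimiter."""
    return value.split(DELIM)


v = "10^afoo^bbar^abaz"
print(sub_all(v, "a"))  # ['foo', 'baz']
print(sub_all(v))       # ['a', 'foo', 'b', 'bar', 'a', 'baz']
print(sub_split(v))     # ['10', 'afoo', 'bbar', 'abaz']
```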

A range spec can have one or both, in that order, of the following:
-       '*' offset:
        cuts off the first offset bytes (not characters)
-       '.' length:
        keeps only the first length bytes
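Because ranges count bytes, not characters, a sketch has to work on an encoded value. UTF-8 is an assumption here, and apply_range is a hypothetical name. The point is only that a range may cut through a multi-byte character.

```python
def apply_range(value, offset=None, length=None, encoding="utf-8"):
    """'*offset' cuts off the first offset bytes, then '.length' keeps at
    most length bytes. Works on bytes, not characters."""
    data = value.encode(encoding)  # assumption: records are UTF-8 text
    if offset is not None:
        data = data[offset:]
    if length is not None:
        data = data[:length]
    return data


print(apply_range("10^afoo", length=2))  # b'10'  (e.g. MARC indicators)
print(apply_range("abcdef", 2, 3))       # b'cde'
print(apply_range("é1", offset=1))       # b'\xa91'  (cut inside a character)
```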

A key spec is part of setting the cursor: it keeps doing a next while the
selected data does not match the specified key. When used with a test or break,
it applies to the data (not the boolean result), and, with an empty field spec,
returns false or breaks instead of moving to the next field.
-       '==' exact:
        checks for exact match
-       '=%' prefix:
        checks for prefix match
-       '=:' contains:
        checks for substring
-       '=~' expr:
        evaluate key as regular expression (optional extension)
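Key matching, and the resulting "do a next while it does not match" loop, can be sketched as shown below. Using Python's re.search (match anywhere) for the '=~' extension is an assumption. The names key_match, seek and the crude subfield_a helper are also assumptions. subfield_a assumes '^' as the subfield delimiter.

```python
import re


def key_match(data, op, key):
    """Match selected data against a key using one of the key operators."""
    if op == "==":
        return data == key
    if op == "=%":
        return data.startswith(key)
    if op == "=:":
        return key in data
    if op == "=~":  # optional extension; re.search semantics is an assumption
        return re.search(key, data) is not None
    raise ValueError(f"unknown key operator {op!r}")


def seek(fields, start, tag, select, op, key):
    """Keep doing a 'next' from position start while the selected data of the
    field with tag does not match; return the position found, or None."""
    for i in range(start + 1, len(fields)):
        t, v = fields[i]
        if t != tag:
            continue
        data = select(v)
        if data is not None and key_match(data, op, key):
            return i
    return None


def subfield_a(value):
    """Crude helper: text of the first '^a' subfield, or None."""
    if "^a" not in value:
        return None
    return value.split("^a", 1)[1].split("^", 1)[0]


rec = [(0, "hdr"), (24, "10^abar"), (24, "20^afoo x^bq")]
print(seek(rec, 0, 24, subfield_a, "=:", "foo"))  # 2
```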

Index expressions are independent of any metadata.
In particular, they know nothing about fixed subfields
and only check for the subfield delimiter character.
Fixed subfields may be accessed using ranges.

However, a helper procedure can be set to rewrite expressions that use names,
e.g. turning field and subfield names into tags, subfield identifiers and ranges.
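A rewriting helper could be as simple as a table lookup. The following sketch is hypothetical. The tables FIELD_NAMES and SUBFIELD_NAMES contain only the td/width mapping from the "v td^width" example in the array operators section. Fixed subfields, which would become ranges, are not handled.

```python
import re

# Hypothetical metadata tables; a real helper would read field definitions.
FIELD_NAMES = {"td": 100}
SUBFIELD_NAMES = {(100, "width"): "w"}

FIELD = re.compile(r"(--|\+\+|@@|-|\+|@)?([A-Za-z]\w*)(.*)")
SUBNAME = re.compile(r"\^([A-Za-z]\w+)(.*)")


def rewrite(expr):
    """Rewrite field and subfield names in an index expression into a tag
    and a one-character subfield id; leave other expressions unchanged."""
    m = FIELD.fullmatch(expr)
    if not m or m.group(2) not in FIELD_NAMES:
        return expr  # already numeric, or an unknown name
    op, name, rest = m.group(1) or "", m.group(2), m.group(3)
    tag = FIELD_NAMES[name]
    s = SUBNAME.fullmatch(rest)
    if s and (tag, s.group(1)) in SUBFIELD_NAMES:
        rest = "^" + SUBFIELD_NAMES[(tag, s.group(1))] + s.group(2)
    return f"{op}{tag}{rest}"


print(rewrite("td^width"))      # 100^w
print(rewrite("-td^width=:x"))  # -100^w=:x
print(rewrite("24^a"))          # 24^a
```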


Minimal implementation requirements:
-       tags may be limited to the range 0 to 65534, inclusive
-       position ('#'), offset ('*') and length ('.') may be limited
        to the range 0 to 255, inclusive

*       array operators

Basic operations on maletas are:
-       getting a single index:
        returns the value or list (an empty value or list if not found).
-       getting multiple indexes:
        a failing test or break stops processing,
        and a positive test does not produce an output value.
        Returns a list of the values returned by each index
        (unless there is only one non-test index).
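The combination rules for multiple indexes can be sketched independently of the index syntax. Here each index is a hypothetical (kind, thunk) pair. Two choices are assumptions: keeping the results collected before a failing test or break, and returning an empty string when a single value index is never reached.

```python
class Break(Exception):
    """Raised by a '!' index whose value does not exist."""


def get_many(indexes):
    """Combine the results of several indexes, each given as (kind, thunk)
    with kind 'test', 'break' or 'value'. The thunks are hypothetical
    stand-ins for evaluating one index expression."""
    non_tests = sum(1 for kind, _ in indexes if kind != "test")
    results = []
    for kind, thunk in indexes:
        if kind == "test":
            if not thunk():
                break  # a failing test stops processing
            continue   # a positive test produces no output
        try:
            results.append(thunk())
        except Break:
            break      # so does a break
    if non_tests == 1:  # a single non-test index yields a plain value
        return results[0] if results else ""
    return results


# "-24?a=:foo .2": a test followed by one value index
print(get_many([("test", lambda: True), ("value", lambda: "10")]))         # 10
print(repr(get_many([("test", lambda: False), ("value", lambda: "10")])))  # ''
```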

In Tcl, get is the default operation. Examples, assuming a maleta called v:
$
 v 24 ;# select the current (or first) field 24
 v 24^a ^b ;# list of the a and b subfields of the current field 24
 foreach {i v} [v ^^] { puts "$i=$v" } ;# list all subfields of current
 v --24 ;# list of all 24 values
 v td^width ;# helper should rewrite this to 100^w
 v -24?a=:foo .2 ;# the MARC indicators of the first 24 field where ^a contains foo
 $v->get("-24?a=:foo", ".2"); # more verbose in PHP, Perl
 v.get("-24?a=:foo .2"); // one string, split at blanks (no varargs before Java 5)
$

Assignment (set), like retrieval, takes any number of string parameters.
Implementations should also support passing multiple values in one
parameter as a list, maleta or serialized record.
Depending on the environment, this may require a different or overloaded
method.

An index addressing a single value (i.e. not a test) takes the next parameter
as its new value. If the addressed item does not exist, it is created.
Assigning no value (i.e. there is no next parameter) deletes the item.

If multiple items are addressed, all following parameters (or the elements
of a single list parameter) are applied in turn.
Excess parameters create new items; missing parameters delete items.
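Here is a sketch of the multiple-item rule for an assignment like "--24 = foo bar baz", using the same hypothetical (tag, value) list. The text above does not specify where excess values are inserted. This sketch appends them at the end of the record.

```python
def set_loop(fields, tag, new_values):
    """'--tag = v1 v2 ...': assign new_values to the body fields with tag in
    turn; excess values create new fields, missing values delete fields."""
    positions = [i for i in range(1, len(fields)) if fields[i][0] == tag]
    for i, v in zip(positions, new_values):
        fields[i] = (tag, v)
    for v in new_values[len(positions):]:
        fields.append((tag, v))  # assumption: new fields go at the end
    for i in reversed(positions[len(new_values):]):
        del fields[i]  # delete from the back so earlier positions stay valid
    return fields


rec = [(0, "hdr"), (24, "x"), (70, "y")]
print(set_loop(rec, 24, ["foo", "bar", "baz"]))
# [(0, 'hdr'), (24, 'foo'), (70, 'y'), (24, 'bar'), (24, 'baz')]
```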


Tcl uses a '=' parameter as assignment operator, '=@' to assign from a list.
Examples:
$
 v ^a = $a ^b = $b ;# set current subfields a and b to the variables
 v --24 = foo bar baz ;# the record now has exactly three 24 fields
 v --24 =@ {foo bar baz} ;# same
 $v->set("--24", "foo", "bar", "baz");
$

Insertion is a variant of assignment addressing newly created items.


*       el maletero - the Malete Object

A maletero (or MO) is a maleta where every field is itself a maletero,
i.e. can have a body. Its body fields are called children.

A maletero corresponds to a region (contiguous sequence of fields)
in a plain Malete record by means of counted or delimited structures.


Maleteros come in three flavours:
-       list (plain vanilla):
        The maletero behaves exactly like a maleta.
        All children are treated as simple fields, regardless of their tag.
        The MO maps one-to-one to its record.
        This is the most efficient mode when no complex children are needed.
-       struct (+ strawberry, chocolate):
        Children with non-negative tags are treated as simple.
        A child with a negative tag -n corresponds to a region spanning n fields.
        This includes one field for the child's tag and header
        and any fields its children correspond to in turn.
        When looping or setting the cursor,
        counted subrecords are recognized and their body is skipped over.
-       mom (with fruit and liquor):
        In this DOM-style mode, only fields with tag 0 are simple (text nodes).
        Every child with a positive tag corresponds to a delimited structure.
        An implementation may or may not support counted structures in mom mode.
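In struct mode, skipping counted children needs nothing but the field count. The sketch below assumes that the field with tag -n is itself the first of the n fields. How the child's real tag and header are encoded in that field is left open, so the value is just a placeholder. The name children is hypothetical.

```python
def children(body):
    """Yield the top-level children of a struct-mode body as lists of fields.

    Assumption: a field with tag -n starts a counted child whose region
    spans n fields including that first field; nested children lie
    inside the region, so skipping n fields skips them as well."""
    i = 0
    while i < len(body):
        tag = body[i][0]
        span = -tag if tag < 0 else 1  # non-negative tags are simple fields
        if i + span > len(body):
            raise ValueError("counted child runs past the end of the record")
        yield body[i:i + span]
        i += span


body = [(24, "a"), (-3, "child header"), (10, "x"), (20, "y"), (70, "b")]
for child in children(body):
    print(child)
# [(24, 'a')]
# [(-3, 'child header'), (10, 'x'), (20, 'y')]
# [(70, 'b')]
```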

A maletero provides object handles to its parent and children,
either by modifying the current handle or by creating new handles.
New handles can be based on a copy of the corresponding record
or use a region in the same record. The latter may not be supported
by all implementations, or may make the objects immutable
to avoid conflicting concurrent modifications.


*       implementations

A basic implementation may provide only a maleta,
which is sufficient for traditional CDS/ISIS style record access.

A complete implementation may provide only a maletero, which can be used
as a maleta (just as in English "trunk" means both car boot and suitcase).

A particularly efficient implementation may provide both separately.


The initial implementation is a Tcl extension (written in C),
optionally augmented by a Tcl module (written in Tcl).
The abstract model, however, can be similarly implemented in other languages.

---
        $Id: MOM.txt,v 1.3 2004/05/03 13:04:36 kripke Exp $