current/doc/Meta.txt

traditional field definition and OpenIsis data definition metadata.


*       the traditional FDT

According to the CDS/ISIS manual,
the field definition table is displayed like
$
44  Serie                          300 X R vz
$
and contains for each field tag:
-       a field description (up to 30 characters)
-       maximum length (1 to 1650)
-       field type:
        X for alphanum, A for strictly alpha (not including space),
        N for numeric (decimal digits), P for pattern (see below).
-       repeatability indicator
        R if the field is repeatable, N otherwise
-       format (subfield list or pattern)
        For a P field, this gives a COBOL-PIC-style pattern
        consisting of X (alnum), A (alpha), 9 (num) and literal letters,
        e.g. 99-999/AA to allow for input like 35-674/XE.
        For other field types, this is a list of legal subfields,
        e.g. vz to allow for input like foo^vbar^zbaz.
        Patterns are not supported by winisis as of Version 1.4.
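
As an illustration of the P field patterns described above, the following
Python sketch turns a pattern into a regular expression. It is not taken
from any ISIS implementation; the function names are made up, and it assumes
that X means ASCII letters or digits, A means ASCII letters, 9 means decimal
digits, and every other character must match literally.

```python
import re

# Assumed character classes for the three pattern letters.
CLASSES = {"X": "[A-Za-z0-9]", "A": "[A-Za-z]", "9": "[0-9]"}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a COBOL-PIC-style pattern into a compiled regex."""
    # Anything that is not X, A or 9 is treated as a literal character.
    parts = [CLASSES.get(ch, re.escape(ch)) for ch in pattern]
    return re.compile("".join(parts))


def matches(pattern: str, value: str) -> bool:
    """True if the whole value conforms to the pattern."""
    return pattern_to_regex(pattern).fullmatch(value) is not None
```

With these assumptions, matches("99-999/AA", "35-674/XE") is true, while
a value such as "35-674/X1" is rejected because the last position must be
a letter.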

The actual file format looks like
$
Series                        vz                  44 300 0 1
$
which is 30 characters name, 20 characters subfield/pattern,
the field tag, maximum length, a number encoding the type
(0 alphanum, 1 alpha, 2 numeric, 3 pattern),
and a number 1 for repeatable / 0 for non repeatable fields.
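
The column layout above could be read with something like the following
Python sketch. The class and function names are hypothetical, and it assumes
that the four numbers after column 50 are separated by whitespace, which the
text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class FdtEntry:
    name: str                 # field description, columns 1-30
    subfields_or_pattern: str  # columns 31-50
    tag: int
    max_length: int
    type_code: int            # 0 alphanum, 1 alpha, 2 numeric, 3 pattern
    repeatable: bool


def parse_fdt_line(line: str) -> FdtEntry:
    """Split one field line of an FDT file into its parts."""
    name = line[:30].rstrip()
    fmt = line[30:50].rstrip()
    # Assumption: the remaining numbers are whitespace separated.
    tag, max_length, type_code, rep = (int(x) for x in line[50:].split())
    return FdtEntry(name, fmt, tag, max_length, type_code, rep == 1)
```

Applied to the example line, this gives the name "Series", the subfield
list "vz", tag 44, maximum length 300, type 0 and repeatable set.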


Moreover, the FDT defines the available
worksheets W (.fmt), printformats F (.pft) and field selections S (.fst).
Each such definition is on a line starting with the
key letter and a colon, and followed by (blank padded)
6 character fields of file basenames.
$
F:THES  THES1 THES2 
$
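
A line of this kind can be taken apart as in the following Python sketch.
The function name is made up, and the mapping from key letter to file
extension is simply the one listed above.

```python
from typing import List, Tuple

# Key letters and the file extensions they stand for.
EXTENSIONS = {"W": ".fmt", "F": ".pft", "S": ".fst"}


def parse_names_line(line: str) -> Tuple[str, List[str]]:
    """Return (extension, basenames) for a W:, F: or S: line."""
    line = line.rstrip("\r\n")
    if len(line) < 2 or line[0] not in EXTENSIONS or line[1] != ":":
        raise ValueError("not a W:, F: or S: line")
    rest = line[2:]
    # Basenames are blank padded to 6 characters each.
    names = [rest[i:i + 6].strip() for i in range(0, len(rest), 6)]
    return EXTENSIONS[line[0]], [n for n in names if n]
```

For the example line this returns ".pft" together with the basenames
THES, THES1 and THES2.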


*       principles of OpenIsis metadata

Definition of data is done in the database's
>       Options
record, rather than in a separate file.
Data definition is aimed at being quite general,
supporting a wide range of formats according to different application needs.
It includes several features that are also found in IsisMARC's FDT21
by Ernesto Spinak, but uses substructures rather than a separate database.


Data definition does not only define fields, but also subfields,
subrecords, code values (enums) and types in a uniform approach.
Basically, for every data element,
there is one (heavily structured) field describing the element,
which may, however, refer to or be referred to by other fields.


It has the following subfields, separated by TAB:
-       i id
        numerical tag of the field or type described
-       s sub
        subfield identifier of the element defined (repeatable).
        Not present in a (top level) field definition.
        Number base 36 (i.e. 0..9,a=10,b=11..z=35).
        If the subid does not start with a number base 36,
        the element can be referred to by its position among the subfield
        declarations for the same parent (in base 36, starting from 0).
        If there is a number, not followed by any characters,
        the subfield is identified by this number.
        If subid is empty, or the number is followed by a ')',
        the field is unidentified in its parent.
        If the number is followed by any other character like '=',
        the field is identified by its name,
        delimited by this character (or any single non-id char).
-       e       end
        end of this element (i.e., for a toplevel field, the initial part).
        Consists of a leading delimiter char and some optional flags.
        If empty, is the standard subfield delimiter (a TAB).
        If absent, and the field has fixed length (type 3 or 4), it has
        no delimiting character, but just stops after its length.
        Else a subfield inherits its parent's end (defaulting to TAB),
        and for a toplevel there are no subfields (you may still define
        subfields, but the field value itself will include them).
        Traditional ISIS data should use e^.
        An initial numerical value (decimal digits) is interpreted as ASCII code
        of the delimiter char (TAB is 9).
        If end contains a (literal) blank, any blanks after the delimiter
        are consumed.
        If end contains a quote ("), the delimiter is not recognized within quotes.
        If end contains a backslash (\), the backslash character can be used
        to escape the delimiter, a quote (if specified) and itself.
-       n name
        technical name of the element, preferably in English.
        Must be a lowercase identifier (matching regular expression [a-z_][a-z_0-9]*)
        unique throughout this data definition.
-       b base
        basetype. tag or name of element to copy definitions from.
-       t type
        numerical primary type of data, describing the initial part of the element
        (up to the first subfield delimiter, if any). default is 0.
-       l length
        numerical (max) length. Absent = ~ = any.
        Note that 0 does NOT mean any, but empty, i.e. the element is a flag.
        For fixed subrecords, this gives the number of children.
        For other types, it is the length in characters (not bytes!).
        For types 3 and 4, this is a fixed length.
-       f format
        format pattern, highly dependent on type.
        if empty, indicates fixed length as given by l.
        for type 3, if the format is shorter than l,
        the last character is repeated to fill length l.
        if nonempty, can be used to imply type 3 and a fixed length.
        f is repeatable, allowing alternative formats.
        for subrecord( header)s, this gives a legal child as
        tag_or_name[,[min][,max]],
        where min defaults to 0 and max to unlimited.
        for most other types, this is the traditional pattern format (see above).
-       m min
        Minimum # of occurrences. Absent = 0. Empty = 1.
-       r       repeat
        Maximum # of occurrences. Absent = 1. Empty = ~ = any.
        Use r0 for fields that should not directly be included in their parent,
        but are referenced as basetypes or children.
-       v       value
        default value, code for enums
-       x xref
        crossreference for field (see below) (repeatable)
-       d description
        descriptive text in the database's lead language.
        translations are maintained separately.
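
To make the rules for the e (end) subfield more concrete, here is a Python
sketch of how a present end specification might be interpreted. It is only
one reading of the description above, not the OpenIsis parser: the names are
hypothetical, the absent case (which depends on type and parent) is left
out, and a leading run of decimal digits is always taken as an ASCII code.

```python
from dataclasses import dataclass


@dataclass
class EndSpec:
    delimiter: str
    skip_blanks: bool = False  # blanks after the delimiter are consumed
    quotes: bool = False       # delimiter is ignored inside "..."
    escapes: bool = False      # backslash escapes delimiter, quote, itself


def parse_end(spec: str) -> EndSpec:
    """Interpret the contents of a present (possibly empty) e subfield."""
    if spec == "":
        return EndSpec("\t")  # empty means the standard delimiter TAB
    i = 0
    while i < len(spec) and spec[i] in "0123456789":
        i += 1
    if i > 0:
        # Assumption: leading digits give the delimiter's ASCII code.
        delimiter, flags = chr(int(spec[:i])), spec[i:]
    else:
        delimiter, flags = spec[0], spec[1:]
    return EndSpec(delimiter, " " in flags, '"' in flags, "\\" in flags)
```

Under this reading, parse_end("^") describes traditional ISIS subfields,
and parse_end("9") is equivalent to the empty specification.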


*       types of fields

| code | C-name | name | description
| 0    | FTX    | any | actually any characters
| 1    | FTA    | alpha | STRICTLY alpha
| 2    | FTN    | numeric | only digits or signs (+-)
| 3    | FTP    | pattern | given by COBOL-style pattern
| 4    | FTB    | boolean | 0 or 1 (was 13)
| 5    | FTE    | enum | one of the codes listed for this field (was 12)
| 8    | FTI    | iso | alphanum using delimiter ASCII 31 (obsolete, was 10)
| 9    | FTT    | table | alphanum using delimiter ASCII 9 (TAB) (obsolete, was 14)
| 13   | FTO    | operator | fixed subrecord
| 14   | FTR    | record | embraced subrecord (structure)
| 15   | FTS    | sequence | counted subrecord (array)
| 16   | FTV    | value | an enum value


*       field names

It is common practice to use long field descriptions
like "Corporate Bodies" or "Govt. Publications No.",
which may be nice looking but are not well suited for technical use.

Therefore the OpenIsis metadata contains an extra field name
in its metadata record.
When sourcing metadata from an FDT file,
OpenIsis will derive field names by lowercasing the descriptions
and replacing all runs of non-alphanums by a single '_',
yielding "govt_publications_no".
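
The derivation can be sketched in Python as follows. This is not the
OpenIsis code; the function name is made up. Note that a plain replacement
would leave a trailing '_' for "Govt. Publications No.", so the sketch
assumes that leading and trailing underscores are dropped in order to
reproduce the example result.

```python
import re


def derive_field_name(description: str) -> str:
    """Derive a technical field name from an FDT field description."""
    # Lowercase, then collapse each run of non-alphanumerics into one '_'.
    name = re.sub(r"[^a-z0-9]+", "_", description.lower())
    # Assumption: underscores at either end are stripped.
    return name.strip("_")
```

With this, "Govt. Publications No." becomes "govt_publications_no" and
"Corporate Bodies" becomes "corporate_bodies".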


*       subrecords

>       Struct  subrecords
are introduced by one of the types 13 to 15.
All those introducing fields may contain any subfields.

Type 14 records may (after the leading + or -) also contain an
initial string of type any, whose length may be restricted to the given length
(typically used to hold the contents of an initial text node).


*       crossreferences

Crossreferences can be used for several purposes, including
-       specifying referential integrity constraints
-       especially authority, coding and terminology relations
-       specifying inheritance relations in the
>       PatchWork

However, all these purposes are more or less the same,
depending just on how you look at it. Any flavour of reference may be used
-       on data entry,
        to ensure that a reference can be resolved according to a given cardinality
-       on data retrieval,
        to enrich the data of a referring record with its referred-to records

Basically, the contents of an element for which a crossref is defined,
may, should or must be either an MFN (row number) of a record or,
after slight modification, a (prefix of a) valid index entry,
in both cases usually in another database (table).


The crossref for a field specifies, separated by semicolon ';',
-       database of reference
-       cardinality as used in regular expressions
        '*' for any, '+' for at least one, '?' at most one, or min,max.
        Empty means exactly one.
-       prefix for index lookup
        Anything after the second ';'
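
A crossref value of this shape might be split up as in the following Python
sketch. The names are hypothetical, None stands for "no upper limit", and
for the min,max form it assumes that a missing min means 0 and a missing
max means unlimited.

```python
from typing import NamedTuple, Optional


class CrossRef(NamedTuple):
    database: str
    min_count: int
    max_count: Optional[int]  # None means unlimited
    prefix: str


def parse_xref(spec: str) -> CrossRef:
    """Split 'database;cardinality;prefix' into its parts."""
    # Only the first two ';' separate parts; the prefix may contain more.
    parts = spec.split(";", 2)
    parts += [""] * (3 - len(parts))
    database, card, prefix = parts
    if card == "":
        lo, hi = 1, 1          # empty means exactly one
    elif card == "*":
        lo, hi = 0, None
    elif card == "+":
        lo, hi = 1, None
    elif card == "?":
        lo, hi = 0, 1
    else:
        lo_s, _, hi_s = card.partition(",")
        lo = int(lo_s) if lo_s else 0
        hi = int(hi_s) if hi_s else None
    return CrossRef(database, lo, hi, prefix)
```

For instance, a made-up value such as "authors;*;AU=" would refer to a
database named authors, allow any number of matches, and look up index
entries starting with "AU=".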

If a value found in the index is delimited by the same delimiter
as the item in question (TAB strongly recommended),
it is considered a match and the whole index entry
is considered the field of reference.
Otherwise, the index entry's record is looked up, and the field
with the entry's tag is the field of reference.

Thus, resolving the reference is largely governed by the index
(and you can do all sorts of tricks there), not by the referrer.


---
        $Id: Meta.txt,v 1.8 2003/06/30 09:48:40 kripke Exp $