traditional field definition and OpenIsis data definition metadata.


* the traditional FDT

According to the CDS/ISIS manual,
the field definition table is displayed like
$
44 Serie 300 X R vz
$
and contains for each field tag:
- a field description (up to 30 characters)
- maximum length (1 to 1650)
- field type:
  X for alphanumeric, A for strictly alphabetic (not including space),
  N for numeric (decimal digits), P for pattern (see below).
- repeatability indicator:
  R if the field is repeatable, N otherwise
- format (subfield list or pattern):
  For a P field, this gives a COBOL-PIC-style pattern
  consisting of X (alphanumeric), A (alphabetic), 9 (digit) and literal characters,
  e.g. 99-999/AA to allow for input like 35-674/XE.
  For other field types, this is a list of legal subfields,
  e.g. vz to allow for input like foo^vbar^zbaz.
Patterns are not supported by WinISIS as of version 1.4.
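The following minimal Python sketch shows how an input value could be checked against such a pattern or subfield list. It rests on several assumptions. X, A and 9 are always placeholders, since the manual excerpt above mentions no way to write them literally. A value must have exactly the pattern's length. "Alphanumeric" means ASCII letters and digits. Subfield codes compare case-insensitively. `matches_pattern` and `uses_only_subfields` are hypothetical names, not CDS/ISIS functions.

```python
import string

ALPHA = set(string.ascii_letters)
DIGITS = set(string.digits)
ALNUM = ALPHA | DIGITS


def matches_pattern(pattern: str, value: str) -> bool:
    """Check value against a COBOL-PIC-style pattern: X alnum, A alpha, 9 digit, else literal."""
    if len(pattern) != len(value):  # assumption: the pattern fixes the exact length
        return False
    for p, c in zip(pattern, value):
        if p == "X":
            ok = c in ALNUM
        elif p == "A":
            ok = c in ALPHA  # strictly alphabetic, so a space is rejected
        elif p == "9":
            ok = c in DIGITS
        else:
            ok = c == p  # any other pattern character must appear literally
        if not ok:
            return False
    return True


def uses_only_subfields(allowed: str, value: str) -> bool:
    """Check that every ^x subfield code in value is listed in allowed (e.g. 'vz')."""
    for part in value.split("^")[1:]:
        # Assumption: subfield codes are compared case-insensitively.
        if not part or part[0].lower() not in allowed.lower():
            return False
    return True


print(matches_pattern("99-999/AA", "35-674/XE"))   # True
print(uses_only_subfields("vz", "foo^vbar^zbaz"))  # True
```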

The actual file format looks like
$
Series vz 44 300 0 1
$
that is, a 30-character name, a 20-character subfield list or pattern,
the field tag, the maximum length, a number encoding the type
(0 alphanumeric, 1 alpha, 2 numeric, 3 pattern),
and 1 for repeatable or 0 for non-repeatable fields.
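Because the name and the subfield/pattern columns have fixed widths, a name containing blanks can still be read back. The following Python sketch splits such a line. It assumes that the four trailing numbers are separated by whitespace. `parse_fdt_line` and `FdtEntry` are hypothetical names.

```python
from typing import NamedTuple

# Type codes as listed in the file format description above.
FDT_TYPES = {0: "alphanumeric", 1: "alpha", 2: "numeric", 3: "pattern"}


class FdtEntry(NamedTuple):
    name: str
    subfields_or_pattern: str
    tag: int
    max_length: int
    field_type: int
    repeatable: bool


def parse_fdt_line(line: str) -> FdtEntry:
    """Split one FDT field line: 30 chars name, 20 chars subfields/pattern, then numbers."""
    name = line[:30].strip()
    subfields = line[30:50].strip()
    # Assumption: tag, max length, type and repeatability are whitespace-separated.
    tag, max_length, field_type, repeatable = (int(tok) for tok in line[50:].split())
    return FdtEntry(name, subfields, tag, max_length, field_type, repeatable == 1)


# Rebuild the example line with explicit padding, since the column widths matter.
line = "Series".ljust(30) + "vz".ljust(20) + "44 300 0 1"
entry = parse_fdt_line(line)
print(entry, FDT_TYPES[entry.field_type])
```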


Moreover, the FDT defines the available
worksheets W (.fmt), print formats F (.pft) and field selections S (.fst).
Each such definition is on a line starting with the
key letter and a colon, followed by blank-padded
6-character fields holding file basenames.
$
F:THES THES1 THES2
$
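Because each field is exactly 6 characters wide, a format line can be split without relying on separators. The following is a Python sketch. It assumes the first field starts right after the colon. `parse_format_line` is a hypothetical helper.

```python
def parse_format_line(line: str) -> tuple[str, list[str]]:
    """Return the key letter (W, F or S) and the file basenames of an FDT format line."""
    key, sep, rest = line.partition(":")
    if not sep or key not in ("W", "F", "S"):
        raise ValueError(f"not a W/F/S line: {line!r}")
    # Assumption: blank-padded 6-character fields start directly after the colon.
    names = [rest[i:i + 6].strip() for i in range(0, len(rest), 6)]
    return key, [name for name in names if name]


line = "F:" + "THES".ljust(6) + "THES1".ljust(6) + "THES2"
print(parse_format_line(line))  # ('F', ['THES', 'THES1', 'THES2'])
```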


* principles of OpenIsis metadata

Data is defined in the database's
> Options
record, rather than in a separate file.
Data definition is aimed at being quite general,
supporting a wide range of formats according to different application needs.
It includes several features also found in IsisMARC's FDT21
by Ernesto Spinak, but uses substructures rather than a separate database.


Data definition covers not only fields, but also subfields,
subrecords, code values (enums) and types, in a uniform way.
Basically, for every data element,
there is one (heavily structured) field describing the element,
which may, however, refer to or be referred to by other fields.


This describing field has the following subfields, separated by TAB:
- i id
  numerical tag of the field or type described
- s sub
  subfield identifier of the element defined (repeatable).
  Not present in a (top level) field definition.
  Numbers are in base 36 (i.e. 0..9, a=10, b=11, ..., z=35).
  If the subid does not start with a base 36 number,
  the element can be referred to by its position among the subfield
  declarations for the same parent (in base 36, starting from 0).
  If there is a number not followed by any characters,
  the subfield is identified by this number.
  If the subid is empty, or the number is followed by a ')',
  the field is unidentified in its parent.
  Given any other character, such as '=', the field is identified by its name,
  delimited by this character (or any single non-id character).
- e end
  end of this element (for a toplevel field, the end of its initial part).
  Consists of a leading delimiter character and some optional flags.
  If empty, it is the standard subfield delimiter (a TAB).
  If absent, and the field has fixed length (type 3 or 4), it has
  no delimiting character, but just stops after its length.
  Otherwise, a subfield inherits its parent's end (defaulting to TAB),
  and a toplevel field has no subfields (you may still define
  subfields, but the field value itself will include them).
  Traditional ISIS data should use e^.
  An initial numerical value (decimal digits) is interpreted as the ASCII code
  of the delimiter character (TAB is 9).
  If end contains a (literal) blank, any blanks after the delimiter
  are consumed.
  If end contains a quote ("), the delimiter is not recognized within quotes.
  If end contains a backslash (\), the backslash character can be used
  to escape the delimiter, a quote (if the quote flag is given) and itself.
- n name
  technical name of the element, preferably in English.
  Must be a lowercase identifier (matching the regular expression [a-z_][a-z_0-9]*),
  unique throughout this data definition.
- b base
  basetype: tag or name of the element to copy definitions from.
- t type
  numerical primary type of data, describing the initial part of the element
  (up to the first subfield delimiter, if any). Default is 0.
- l length
  numerical (maximum) length. Absent = ~ = any.
  Note that 0 does NOT mean any, but empty, i.e. the element is a flag.
  For fixed subrecords, this gives the number of children.
  For other types, it is the length in characters (not bytes!).
  For types 3 and 4, this is a fixed length.
- f format
  format pattern, highly dependent on type.
  If empty, indicates fixed length as given by l.
  For type 3, if the format is shorter than l,
  the last character is repeated to fill length l.
  If nonempty, can be used to imply type 3 and a fixed length.
  f is repeatable, allowing alternative formats.
  For subrecords (or their headers), this gives a legal child as
  tag_or_name[,[min][,max]],
  where min defaults to 0 and max to unlimited.
  For most other types, this is the traditional pattern format (see above).
- m min
  Minimum number of occurrences. Absent = 0. Empty = 1.
- r repeat
  Maximum number of occurrences. Absent = 1. Empty = ~ = any.
  Use r0 for fields that should not be directly included in their parent,
  but are referenced as basetypes or children.
- v value
  default value, or the code for enums
- x xref
  crossreference for the field (see below) (repeatable)
- d description
  descriptive text in the database's lead language.
  Translations are maintained separately.
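The following Python sketch makes the defaults for t, l, m and r concrete by splitting a definition field and resolving them. It assumes that each TAB-separated chunk starts with its one-letter subfield code. It uses None for "~"/any. It treats an empty l like an absent one, because the text above does not cover that case. `parse_definition` and the sample field are hypothetical, not OpenIsis code.

```python
ANY = None  # stands for "~": any length / unlimited occurrences

# s, f and x are documented as repeatable subfields.
REPEATABLE = {"s", "f", "x"}


def _count(raw, absent, empty):
    """Resolve a length/occurrence subfield using its documented defaults."""
    if raw is None:
        return absent
    if raw == "":
        return empty
    if raw == "~":
        return ANY
    return int(raw)


def parse_definition(field: str) -> dict:
    """Split a definition field into subfields and apply the defaults for t, l, m, r.

    Assumption: each TAB-separated chunk starts with its one-letter subfield code.
    """
    raw = {}
    for chunk in field.split("\t"):
        if not chunk:
            continue  # skip empty chunks, e.g. from a leading TAB
        code, value = chunk[0], chunk[1:]
        if code in REPEATABLE:
            raw.setdefault(code, []).append(value)
        else:
            raw[code] = value
    d = dict(raw)
    if "i" in raw:
        d["i"] = int(raw["i"])
    d["t"] = int(raw["t"]) if raw.get("t") else 0        # default type 0
    # Assumption: an empty l is treated like an absent one (any length).
    d["l"] = _count(raw.get("l"), absent=ANY, empty=ANY)
    d["m"] = _count(raw.get("m"), absent=0, empty=1)      # minimum occurrences
    d["r"] = _count(raw.get("r"), absent=1, empty=ANY)    # maximum occurrences
    return d


d = parse_definition("i44\tnseries\tl300\tr\tdSeries")
print(d["i"], d["n"], d["l"], d["m"], d["r"])  # 44 series 300 0 None
```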
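The end subfield can be read as a small splitting rule. The following Python sketch interprets an end spec and splits a value with it. It honours only the blank, quote and backslash flags described for e. Fixed-length fields without a delimiter are not covered. Two assumptions apply: quote characters are kept in the resulting elements, and only characters after the delimiter count as flags. `parse_end` and `split_elements` are hypothetical names.

```python
def parse_end(spec: str) -> tuple[str, set[str]]:
    """Return (delimiter, flags) for an end spec such as '^' or '9'."""
    if spec == "":
        return "\t", set()  # empty end: the standard TAB delimiter
    digits = len(spec) - len(spec.lstrip("0123456789"))
    if digits:
        # Leading decimal digits give the delimiter's ASCII code (9 = TAB).
        delim, flags = chr(int(spec[:digits])), spec[digits:]
    else:
        delim, flags = spec[0], spec[1:]
    # Assumption: only characters after the delimiter count as flags.
    return delim, {f for f in flags if f in ' "\\'}


def split_elements(value: str, spec: str) -> list[str]:
    """Split value at the delimiter, honouring the blank, quote and backslash flags."""
    delim, flags = parse_end(spec)
    quoting, escaping, skip_blanks = '"' in flags, "\\" in flags, " " in flags
    parts, current = [], []
    in_quotes = False
    i = 0
    while i < len(value):
        c = value[i]
        if escaping and c == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            if nxt in (delim, "\\") or (quoting and nxt == '"'):
                current.append(nxt)  # escaped character is taken literally
                i += 2
                continue
        if quoting and c == '"':
            in_quotes = not in_quotes
            current.append(c)  # assumption: quotes are kept in the element
        elif c == delim and not in_quotes:
            parts.append("".join(current))
            current = []
            if skip_blanks:
                while i + 1 < len(value) and value[i + 1] == " ":
                    i += 1  # consume blanks after the delimiter
        else:
            current.append(c)
        i += 1
    parts.append("".join(current))
    return parts


print(split_elements("foo^vbar^zbaz", "^"))  # ['foo', 'vbar', 'zbaz']
print(split_elements('x;"a;b";c', ';"'))     # ['x', '"a;b"', 'c']
```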
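Two of the f rules can be sketched in Python as well. The first pads a type 3 format to length l by repeating its last character. The second reads a child declaration of the form tag_or_name[,[min][,max]]. This sketch leaves unchanged any format that is already at least l characters long, because the text only covers shorter ones. `expand_format`, `parse_child` and `Child` are hypothetical names.

```python
from typing import NamedTuple, Optional


def expand_format(fmt: str, length: int) -> str:
    """Type 3: repeat the last format character until the format is `length` long."""
    if not fmt or len(fmt) >= length:
        return fmt  # assumption: empty or long-enough formats are left unchanged
    return fmt + fmt[-1] * (length - len(fmt))


class Child(NamedTuple):
    ref: str                  # tag or name of the child element
    min_count: int            # defaults to 0
    max_count: Optional[int]  # defaults to None, i.e. unlimited


def parse_child(spec: str) -> Child:
    """Parse a subrecord child declaration 'tag_or_name[,[min][,max]]'."""
    ref, _, rest = spec.partition(",")
    lo, _, hi = rest.partition(",")
    return Child(ref, int(lo) if lo else 0, int(hi) if hi else None)


print(expand_format("99-9", 6))  # 99-999
print(parse_child("author,1"))   # Child(ref='author', min_count=1, max_count=None)
```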
* types of fields

| code | C-name | name     | description
| 0    | FTX    | any      | actually any characters
| 1    | FTA    | alpha    | STRICTLY alpha
| 2    | FTN    | numeric  | only digits or signs (+-)
| 3    | FTP    | pattern  | given by COBOL-style pattern
| 4    | FTB    | boolean  | 0 or 1 (was 13)
| 5    | FTE    | enum     | one of the codes listed for this field (was 12)
| 8    | FTI    | iso      | alphanum using delimiter ASCII 31 (obsolete, was 10)
| 9    | FTT    | table    | alphanum using delimiter ASCII 9 (TAB) (obsolete, was 14)
| 13   | FTO    | operator | fixed subrecord
| 14   | FTR    | record   | embraced subrecord (structure)
| 15   | FTS    | sequence | counted subrecord (array)
| 16   | FTV    | value    | an enum value
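The following Python sketch maps the codes to their C-names and checks a value's initial part against the simple primary types. It assumes that "STRICTLY alpha" means ASCII letters only. Pattern and subrecord types are left to other code. The enum member names follow the C-name column. `FieldType` and `check_primary` themselves are hypothetical.

```python
import string
from enum import IntEnum
from typing import Collection


class FieldType(IntEnum):
    FTX = 0   # any
    FTA = 1   # alpha
    FTN = 2   # numeric
    FTP = 3   # pattern
    FTB = 4   # boolean
    FTE = 5   # enum
    FTI = 8   # iso (obsolete)
    FTT = 9   # table (obsolete)
    FTO = 13  # operator: fixed subrecord
    FTR = 14  # record: embraced subrecord
    FTS = 15  # sequence: counted subrecord
    FTV = 16  # an enum value


def check_primary(ftype: int, value: str, codes: Collection[str] = ()) -> bool:
    """Validate the initial part of an element for the simple primary types."""
    t = FieldType(ftype)  # raises ValueError for unknown codes
    if t is FieldType.FTX:
        return True
    if t is FieldType.FTA:
        # Assumption: "strictly alpha" means ASCII letters, no blanks.
        return all(c in string.ascii_letters for c in value)
    if t is FieldType.FTN:
        return all(c in string.digits or c in "+-" for c in value)
    if t is FieldType.FTB:
        return value in ("0", "1")
    if t is FieldType.FTE:
        return value in codes  # codes listed for this field
    raise NotImplementedError(f"{t.name} is not covered by this sketch")


print(check_primary(2, "-42"), check_primary(1, "two words"))  # True False
```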


* field names

It is common practice to use long field descriptions
like "Corporate Bodies" or "Govt. Publications No.",
which may look nice but are not well suited for technical use.

Therefore, the OpenIsis metadata record for a field contains an extra field name.
When sourcing metadata from an FDT file,
OpenIsis derives field names by lowercasing the descriptions
and replacing all runs of non-alphanumeric characters by a single '_',
yielding, for example, "govt_publications_no".
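The derivation takes only a few lines of Python. Stripping leading and trailing underscores is an assumption inferred from the example: a plain replacement would leave "govt_publications_no_" because of the final period. Non-ASCII letters are treated as non-alphanumerics here, which is also an assumption. The second helper checks the result against the identifier rule for subfield n. A description starting with a digit fails that rule, and the text does not say how OpenIsis handles that case. `derive_name` and `is_valid_name` are hypothetical helpers.

```python
import re


def derive_name(description: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics (ASCII sense) into '_'."""
    name = re.sub(r"[^a-z0-9]+", "_", description.lower())
    # Assumption: leading/trailing '_' are dropped, as the example result implies.
    return name.strip("_")


def is_valid_name(name: str) -> bool:
    """Check the identifier rule given for subfield n."""
    return re.fullmatch(r"[a-z_][a-z_0-9]*", name) is not None


print(derive_name("Govt. Publications No."))  # govt_publications_no
print(derive_name("Corporate Bodies"))        # corporate_bodies
```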


* subrecords

> Struct
subrecords are introduced by one of the types 13 to 15.
All of these introducing fields may contain any subfields.

Type 14 records may (after the leading + or -) also contain an
initial string of any characters, whose length may be restricted to the given length.
(This is typically used to hold the contents of an initial text node.)


* crossreferences

Crossreferences can be used for several purposes, including
- specifying referential integrity constraints
- especially authority, coding and terminology relations
- specifying inheritance relations in the
> PatchWork

However, all these purposes are more or less the same,
depending only on how you look at it. Any flavour of reference may be used
- on data entry,
  to ensure that a reference can be resolved according to a given cardinality
- on data retrieval,
  to enrich the data of a referring record with its referred-to records

Basically, the contents of an element for which a crossref is defined
may, should or must be either an MFN (row number) of a record or,
after slight modification, a (prefix of a) valid index entry,
in both cases usually in another database (table).


The crossref for a field specifies, separated by semicolons ';',
- the database of reference
- the cardinality, as used in regular expressions:
  '*' for any, '+' for at least one, '?' for at most one, or min,max.
  Empty means exactly one.
- the prefix for index lookup:
  anything after the second ';'.
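The following Python sketch reads such a crossref and checks a hit count against its cardinality. Three conventions are taken from regular-expression {m,n} syntax and are assumptions here: an empty min in min,max means 0, an empty max means unlimited, and a lone number n means exactly n. Missing parts are treated as empty. `parse_xref`, `CrossRef` and `satisfies` are hypothetical names, and "authors" and "AU=" are made-up values.

```python
from typing import NamedTuple, Optional


class CrossRef(NamedTuple):
    database: str
    min_hits: int
    max_hits: Optional[int]  # None means unlimited
    prefix: str


def parse_cardinality(spec: str) -> tuple[int, Optional[int]]:
    """Map '*', '+', '?', 'min,max' or '' (exactly one) to a (min, max) pair."""
    spec = spec.strip()
    if spec == "":
        return 1, 1
    if spec == "*":
        return 0, None
    if spec == "+":
        return 1, None
    if spec == "?":
        return 0, 1
    lo, sep, hi = spec.partition(",")
    if not sep:
        # Assumption: a lone number n means exactly n, as in a regex {n}.
        return int(lo), int(lo)
    return (int(lo) if lo else 0), (int(hi) if hi else None)


def parse_xref(xref: str) -> CrossRef:
    """Split 'database;cardinality;prefix'; the prefix is everything after the second ';'."""
    database, _, rest = xref.partition(";")
    cardinality, _, prefix = rest.partition(";")
    lo, hi = parse_cardinality(cardinality)
    return CrossRef(database, lo, hi, prefix)


def satisfies(ref: CrossRef, hits: int) -> bool:
    """True if the number of resolved references is within the cardinality."""
    return hits >= ref.min_hits and (ref.max_hits is None or hits <= ref.max_hits)


ref = parse_xref("authors;+;AU=")
print(ref, satisfies(ref, 0), satisfies(ref, 3))
```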

If a value found in the index is delimited by the same delimiter
as the item in question (TAB strongly recommended),
it is considered a match, and the whole index entry is
considered the field of reference.
Otherwise, the index entry's record is looked up, and the field
with the entry's tag is the field of reference.

Thus resolving the reference is largely governed by the index
(where you can do all sorts of tricks), not by the referrer.


---
$Id: Meta.txt,v 1.8 2003/06/30 09:48:40 kripke Exp $