IIF, MARC and Z39.50


IIF is the "Information Interchange Format", a record serialization format
specified in ISO standard 2709, also published as ANSI
> http://www.niso.org/standards/resources/Z39-2.pdf Z39.2.
IIF is mostly a plaintext format: almost all information is encoded
using ASCII characters (no binary numbers), and the only control characters
used are byte values 29 (record terminator RT), 30 (field terminator FT)
and 31 (subfield delimiter).


> http://www.loc.gov/marc/ MARC
("MAchine-Readable Cataloging") is actually a family of largely incompatible
standards (
> http://www.loc.gov/marc/marcdocz.html USMARC
,
> http://www.ifla.org/VI/3/p1996-1/sec-uni.htm UNIMARC
, UKMARC, ...) that evolved from MARC I (1965).
While the main concern of the MARC standards is to specify actual data models
(assigning tags and subfield codes, which can be used perfectly well in
Malete, CDS/ISIS or other databases), they also specify a variant of IIF as
the suggested common format for data exchange, which we refer to here as "MARC".
(This file syntax seems to be mostly the same for all MARC standards.)


> ftp://ftp.loc.gov/pub/z3950/official/part1.txt Z39.50
is a network protocol to search and retrieve records.
It supports various query "languages", the most commonly used of which
is called the Type-1 query. Type-1 is similar to the queries supported
by Malete and CDS/ISIS, but much more general and complex.
Terms can be searched for in any indexed field or with restriction
to one or more "attributes".

Attributes are basically the tags used in the index, which are almost always
different from those used in records. While it is common for records to use
any of the various MARCs or even completely different formats, the attributes
used in bibliographical systems are typically those specified by the Bib-1
attribute set (e.g. assigning 4 to title).
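
To make the attribute idea concrete, here is a minimal Python sketch that
writes Type-1 style queries in PQF (prefix query format), the textual notation
used by Index Data's YAZ tools. PQF is not mentioned above and is only assumed
here as a convenient way to write such queries down. Attribute type 1 is the
Bib-1 "use" attribute; apart from 4 (title), the values in the table are
further Bib-1 use values added for illustration, and the helper names
pqf_term and pqf_and are made up.

```python
# Bib-1 "use" attribute values. 4 (title) is the one named in the text;
# the others are further Bib-1 values added for illustration.
BIB1_USE = {"title": 4, "isbn": 7, "subject": 21, "author": 1003}


def pqf_term(use, term):
    """Return a PQF term restricted to one Bib-1 use attribute."""
    # Hypothetical helper: no escaping is attempted, so refuse awkward terms.
    if '"' in term or "\\" in term:
        raise ValueError("quotes and backslashes are not handled in this sketch")
    return '@attr 1=%d "%s"' % (BIB1_USE[use], term)


def pqf_and(left, right):
    """Combine two PQF subqueries with a boolean AND (prefix notation)."""
    return "@and %s %s" % (left, right)


if __name__ == "__main__":
    query = pqf_and(pqf_term("title", "moby dick"),
                    pqf_term("author", "melville"))
    print(query)  # @and @attr 1=4 "moby dick" @attr 1=1003 "melville"
```

The point is only that the number after "1=" is an index attribute, not a
record tag: the server decides which record fields feed attribute 4.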


Z39.50 allows a client to select a record format from various conversions
supported by a server. When a MARC format is selected,
the data is actually transmitted serialized according to IIF.


* IIF and MARC serialized records

IIF specifies a serialization for records. Like the Malete record data file,
an IIF file is simply a stream of such records; there is no additional
file header.

A record has
- a 24 byte leader, containing 16 bytes of structural data
and 8 bytes of application data (x, imported as "MARC leader").
The format for MARC is LLLLLxxxxx22BBBBBxxx4500.
The Ls and Bs are the total record length (including the leader and a
terminating RT) and the start of data (field values, after an FT terminating
the dictionary).
The first '2' denotes that every field starts with two indicator bytes
(control fields with tags 001 to 009 carry no indicators),
the second is the subfield identifier length including the delimiter char.
- a "dictionary" array with one entry per field containing a 3 byte tag,
and n and m bytes for length and offset.
n and m are digits at leader offset 20 and 21; MARC uses 4 and 5.
In general IIF, leader byte 22 may specify a number of
implementation-defined entry bytes.
- the actual field values, each terminated by the FT character.
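
The layout described above can be read back with a few lines of Python.
This is a minimal sketch, not Malete's importer: the function name
parse_record and the hand-built sample record are assumptions, and the field
length in a dictionary entry is taken to include the terminating FT, as in
MARC.

```python
RT, FT, SF = b"\x1d", b"\x1e", b"\x1f"  # byte values 29, 30 and 31


def parse_record(rec):
    """Split one serialized record into its leader and (tag, value) pairs."""
    leader = rec[:24]
    total = int(leader[0:5])     # LLLLL: record length incl. leader and RT
    base = int(leader[12:17])    # BBBBB: start of the field values
    n = int(leader[20:21])       # digits used for a field length (MARC: 4)
    m = int(leader[21:22])       # digits used for a field offset (MARC: 5)
    extra = int(leader[22:23])   # implementation-defined entry bytes
    if len(rec) != total or rec[-1:] != RT:
        raise ValueError("bad record length or missing RT")
    if rec[base - 1:base] != FT:
        raise ValueError("dictionary is not terminated by FT")
    size = 3 + n + m + extra
    directory = rec[24:base - 1]
    if len(directory) % size:
        raise ValueError("dictionary length does not match entry size")
    fields = []
    for i in range(0, len(directory), size):
        entry = directory[i:i + size]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:3 + n])          # includes the trailing FT
        offset = int(entry[3 + n:3 + n + m])  # relative to base
        value = rec[base + offset:base + offset + length]
        if value[-1:] != FT:
            raise ValueError("field %s is not terminated by FT" % tag)
        fields.append((tag, value[:-1]))
    return leader, fields


# Hand-built sample: control number "42" and one field with two blank
# indicators and a subfield "a".
sample = (b"00063nam  2200049   4500"
          b"001000300000" b"245001000003" + FT
          + b"42" + FT
          + b"  " + SF + b"aTitle" + FT
          + RT)

if __name__ == "__main__":
    for tag, value in parse_record(sample)[1]:
        print(tag, value)
```

Since every record states its own length in the first five bytes, a file can
be split into records without scanning for RT characters.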

Contrary to folklore, MARC does NOT use a '$' as subfield delimiter,
nor a '#' for unused indicators. Rather, the examples in the specs
use a '$' to REPRESENT the subfield delimiter control character 31 (^_),
and a '#' to REPRESENT a blank. The RT (29, ^]) is sometimes represented as '\'
and the FT (30, ^^) as '^' or '@'.
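
The difference between the stored bytes and their printed representation can
be shown with a small Python sketch. The function name show_field is made up,
decoding as Latin-1 is an assumption chosen only because it maps every byte to
some character, and tags 001 to 009 are treated as control fields without
indicators.

```python
SF = "\x1f"  # the real subfield delimiter, byte value 31


def show_field(tag, value):
    """Render one raw field value the way spec examples print it."""
    text = value.decode("latin-1")
    if tag < "010":
        # Control fields have neither indicators nor subfields.
        return "%s %s" % (tag, text)
    # Only the two indicator positions get '#' for a blank;
    # blanks inside the data stay blanks.
    indicators = text[:2].replace(" ", "#")
    return "%s %s%s" % (tag, indicators, text[2:].replace(SF, "$"))


if __name__ == "__main__":
    print(show_field("245", b"1 \x1faMoby Dick\x1fcMelville"))
    # 245 1#$aMoby Dick$cMelville
```

Going the other way, from a printed example back to bytes, is not safe in
general, since a literal '$' in the data cannot be told apart from a
delimiter.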


* Malete IIF import and export

The malete tool provides two rather simplistic
> CmdLine commands
iifimp and iifexp.

The command-specific options are:
- Ffile
specify the full filename for the IIF file.
Default is the basename of the Malete database with extension .iif.
On UNIX, a filename '-' selects stdin/out.
- Nomarc (literally)
do not assume the MARC structure 22/450 on import. Requires proper IIF data.
- P[iic]
on export, prepend indicators ii and, where needed, subfield c.
A single -P uses two blanks as indicators and subfield '0'.
Suggested to produce at least syntactically correct MARC.
- Rid (literally)
on import, use a numeric control number (1st field, if it has tag 1)
as record id. Note that on export, the record id is always used as
control number unless the record already has one,
since this is required not only by MARC, but by IIF.
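
What an export has to produce can be sketched in Python as follows. This is
an illustration of the record layout under stated assumptions, not the code
behind iifexp: the function name build_record is made up, the eight
application bytes of the leader are simply left blank, and field values are
expected to already carry their indicators and subfield delimiters.

```python
RT, FT = b"\x1d", b"\x1e"  # byte values 29 and 30


def build_record(fields, record_id):
    """Serialize (tag, value) pairs as one MARC-style IIF record."""
    fields = list(fields)
    # Use the record id as control number unless the record has one already.
    if not fields or fields[0][0] != "001":
        fields.insert(0, ("001", str(record_id).encode("ascii")))
    directory = b""
    data = b""
    for tag, value in fields:
        body = value + FT
        if len(body) > 9999:
            raise ValueError("field %s does not fit a 4 digit length" % tag)
        # Entry: 3 byte tag, 4 digit length, 5 digit offset into the data.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    base = 24 + len(directory) + 1   # leader + dictionary + its FT
    total = base + len(data) + 1     # ... + field values + RT
    if total > 99999:
        raise ValueError("record does not fit a 5 digit length")
    leader = b"%05d     22%05d   4500" % (total, base)
    return leader + directory + FT + data + RT


if __name__ == "__main__":
    rec = build_record([("245", b"  \x1faTitle")], 42)
    print(rec[:24])  # b'00063     2200049   4500'
```

A real MARC leader would fill the blank application bytes with record status,
type and similar codes, which depend on the MARC flavour in use.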

* creating proper IIF from WinIsis

In Database-Export, set the subfield separator to \031 and
output line length to 0.

If the fields do not contain valid MARC data, use a reformatting FST like
$
001 0 MFN
044 0 |00^a|,v44
024 0 |00^a|,v24
026 0 |00|,v26
070 0 (|00^a|,v70/)
$
Make sure that
- the first output field is tag 1 containing some unique id
- every field starts with two indicator characters
(really should be blank, but that would be stripped during export)
- the indicators are followed by a delimiter and subfield identifier
Still, the output is not 100% correct, since WinIsis sets the
number of indicators and the identifier length to 0, where MARC specifies 2.
However, many other MARC processors, including zebraidx, ignore these settings.
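
For processors that do check these two leader bytes, the export can be
patched afterwards. The following Python sketch assumes that the file is a
plain concatenation of records with nothing in between (which is what an
output line length of 0 is meant to give); the function name fix_leaders is
made up.

```python
def fix_leaders(data):
    """Set indicator count and identifier length to '2' in every record."""
    out = bytearray(data)
    pos = 0
    while pos < len(out):
        # The first five bytes of each record hold its total length.
        total = int(bytes(out[pos:pos + 5]))
        if total < 24 or pos + total > len(out):
            raise ValueError("bad record length at offset %d" % pos)
        out[pos + 10:pos + 12] = b"22"  # leader bytes 10 and 11
        pos += total
    return bytes(out)


if __name__ == "__main__":
    # Two empty records as WinIsis might write them, with "00" in the leader.
    empty = b"00026     0000025   4500" + b"\x1e" + b"\x1d"
    fixed = fix_leaders(empty + empty)
    print(fixed[:24])
```

Only two bytes per record are overwritten in place, so record lengths and
dictionary offsets stay valid.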


* making MARC data available via Z39.50

MARC records can easily be made available using indexdata's
> http://www.indexdata.dk/zebra/ zebra.

If records in your IIF file use tags and subfields conforming to, say, USmarc,
simply check out the test/usmarc example in the zebra distribution.
Put your data in the records subdir and run "zebraidx update records; zebrasrv".

If your data was exported from WinIsis, you may want to put a line
"encoding Cp850" in the .abs file.


You must use recordType: grs.marc.something, meaning that it's general
structured data in some marc file format.
The sample usmarc.abs uses the "marc usmarc.mar" statement,
and usmarc.mar (in the zebra/tab directory) contains "reference USmarc",
stating that the marc input actually IS in USmarc.
This need not be true; it just means that the records will be served
as is if a client asks for USmarc.
However, only the tags listed in "elm" statements in the .abs files
will be indexed.


Note that zebra's indexing support is not as flexible as that of CDS/ISIS:
you can only select fields or subfields to be indexed in one of a couple
of modes (like word or phrase). To take full advantage of sophisticated
CDS/ISIS FSTs, include them in your export reformatting FST.
Use some otherwise unused field tags to hold the index terms and "elm"
statements to map them to bib-1 attributes.
Omit those fields from the display mapping.


To keep the data in its native format (say CDS), change the elm
statements to map the fields to be indexed to the corresponding bib-1
attributes for searching, e.g. "elm 024 Conference-name !",
and, instead of using the "marc usmarc.mar" statement,
create one or more maptabs to map the full record to
USmarc and/or other presentation formats as applicable.
Check out the gils-usmarc.map example in the zebra/tab directory.


Consult the
> http://www.indexdata.dk/zebra/doc/ zebra documentation
for details.


* links
- ISO2709 "Information Interchange Format", a.k.a. ANSI/NISO
> http://www.niso.org/standards/resources/Z39-2.pdf Z39.2
- Machine Readable Catalogues
> http://www.loc.gov/marc/specifications/specrecstruc.html (US) MARC 21
,
> http://lcweb.loc.gov/marc/ overview
, references at the
> http://www.oasis-open.org/cover/marc.html Cover Pages
- Z39.50
> ftp://ftp.loc.gov/pub/z3950/official/ official spec
, overview at
> http://www.oclcpica.org/content/45/pdf/z3950_handbook_paper.pdf OCLC|Pica
, links at
> http://www.indexdata.dk/technologies/z3950/ indexdata
, makers of excellent free Z39.50 software.
- Uncle Aung's
> http://uncleaung.com/zisis/ Zisis

---
$Id: IIF.txt,v 1.5 2004/09/23 11:44:04 kripke Exp $