IIF, MARC and Z39.50


IIF is the "Information Interchange Format", a record serialization format
specified in ISO standard 2709, also published as ANSI/NISO
> http://www.niso.org/standards/resources/Z39-2.pdf Z39.2.
IIF is mostly a plaintext format, in that almost all information is encoded
using ASCII characters (no binary numbers) and the only control characters
used are byte values 29 (record terminator RT), 30 (field terminator FT)
and 31 (subfield delimiter).


> http://www.loc.gov/marc/ MARC
("MAchine Readable Catalogue") is actually a family of largely incompatible
standards (
> http://www.loc.gov/marc/marcdocz.html USMARC
,
> http://www.ifla.org/VI/3/p1996-1/sec-uni.htm UNIMARC
, UKMARC, ...) that evolved from MARC I (1965).
While the main concern of the MARC standards is to specify actual data models
(assigning tags and subfield codes, which can be used perfectly well in
Malete, CDS/ISIS or other databases), they also specify a variant of IIF as
the suggested common format for data exchange, which we refer to here as "MARC".
(This file syntax seems to be mostly the same for all MARC standards.)


> ftp://ftp.loc.gov/pub/z3950/official/part1.txt Z39.50
is a network protocol to search and retrieve records.
It supports various query "languages", the most commonly used of which
is called Type-1 query. Type-1 is similar to the queries supported
by Malete and CDS/ISIS, but much more general and complex.
Terms can be searched for in any indexed field or restricted
to one or more "attributes".

Attributes are basically the tags used in the index, which are almost always
different from those used in records. While it is common for records to use
any of the various MARCs or even completely different formats, the attributes
used in bibliographic systems are typically those specified by the Bib-1
attribute set (e.g. assigning 4 to title).
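
To make the idea of attributes a bit more tangible, here is a minimal Python
sketch of a single search term restricted by attributes. It relies only on
Bib-1 attribute type 1 being the "use" attribute and on use value 4 meaning
title (1003 for author and 1016 for "any" are further well-known values).
The names Term, title_term and to_pqf are made up for this illustration, and
the prefix notation printed here is the one accepted by Index Data's YAZ
tools; it is merely a convenient way to write the query down and is not part
of Z39.50 itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Bib-1 attribute type 1 is the "use" attribute: which index to search.
USE = 1
# A few Bib-1 use values; 4 (title) is the one mentioned in the text.
BIB1_USE = {"title": 4, "author": 1003, "any": 1016}


@dataclass
class Term:
    """One search term plus the (type, value) attributes restricting it."""
    text: str
    attributes: List[Tuple[int, int]] = field(default_factory=list)


def title_term(text: str) -> Term:
    """Build a term restricted to the title index (use attribute 4)."""
    return Term(text, [(USE, BIB1_USE["title"])])


def to_pqf(term: Term) -> str:
    """Render the term in prefix notation; embedded quotes are not handled."""
    attrs = "".join("@attr %d=%d " % (t, v) for t, v in term.attributes)
    return attrs + '"' + term.text + '"'


if __name__ == "__main__":
    print(to_pqf(title_term("machine readable")))
```

A term without attributes is simply searched in whatever the server
considers its default index.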


Z39.50 allows a client to select a record format from the various conversions
supported by a server. When a MARC format is selected,
the data is actually transmitted serialized according to IIF.


* IIF and MARC serialized records

IIF specifies a serialization for records. Like the Malete record data file,
an IIF file is simply a stream of such records; there is no additional
file header.
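
Since there is no file header, a reader can cut such a stream into records
just by looking for the record terminator. The following Python sketch does
that for a buffer held in memory; the function name split_records is
hypothetical, and the sketch assumes that byte 29 never occurs inside field
data.

```python
RT = b"\x1d"  # record terminator, byte value 29


def split_records(stream: bytes):
    """Split a buffer into raw records, each still ending with its RT."""
    records = []
    start = 0
    while True:
        end = stream.find(RT, start)
        if end < 0:
            break
        records.append(stream[start:end + 1])
        start = end + 1
    # Anything but whitespace after the last RT is an incomplete record.
    if stream[start:].strip():
        raise ValueError("trailing bytes after the last record terminator")
    return records


if __name__ == "__main__":
    print(len(split_records(b"first\x1dsecond\x1d")))
```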

A record has
- a 24 byte leader, containing 16 bytes of structural data
  and 8 bytes of application data (x, imported as "MARC leader").
  The format for MARC is LLLLLxxxxx22BBBBBxxx4500.
  The Ls are the total record length (including the leader and the
  terminating RT), the Bs the start of the data (the field values, which
  follow the FT terminating the dictionary).
  The first '2' denotes that every field starts with two indicator bytes,
  the second is the subfield identifier length including the delimiter char.
- a "dictionary" array (the "directory" of the standards) with one entry
  per field, containing a 3 byte tag followed by n and m bytes for the
  field's length and offset.
  n and m are the digits at leader offsets 20 and 21; MARC uses 4 and 5.
  In general IIF, leader byte 22 may specify a number of
  implementation defined bytes per entry.
- the actual field values, each terminated by the FT character.
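
The layout above can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not the Malete implementation: the
names parse_record and build_record are hypothetical, the field length in a
dictionary entry is assumed to count the trailing FT, and the offset is
assumed to be relative to the start of data (both as in MARC 21).
The builder leaves the application bytes of the leader blank.

```python
RT, FT = b"\x1d", b"\x1e"   # record and field terminators (bytes 29, 30)


def parse_record(raw: bytes):
    """Split one record into its 24 byte leader and a list of (tag, value)."""
    leader = raw[:24]
    total = int(leader[0:5])        # LLLLL: length including leader and RT
    base = int(leader[12:17])       # BBBBB: where the field values start
    n = int(leader[20:21])          # digits per field length (MARC: 4)
    m = int(leader[21:22])          # digits per field offset (MARC: 5)
    extra = int(leader[22:23])      # implementation defined bytes (MARC: 0)
    if total != len(raw) or raw[-1:] != RT:
        raise ValueError("length or record terminator mismatch")
    if raw[base - 1:base] != FT:
        raise ValueError("dictionary is not terminated by FT")
    width = 3 + n + m + extra       # size of one dictionary entry
    dictionary = raw[24:base - 1]
    if len(dictionary) % width:
        raise ValueError("dictionary size is not a multiple of the entry size")
    fields = []
    for pos in range(0, len(dictionary), width):
        entry = dictionary[pos:pos + width]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:3 + n])            # assumed to count the FT
        offset = int(entry[3 + n:3 + n + m])    # assumed relative to base
        value = raw[base + offset:base + offset + length]
        if value[-1:] != FT:
            raise ValueError("field " + tag + " is not terminated by FT")
        fields.append((tag, value[:-1]))
    return leader, fields


def build_record(fields):
    """Serialize (tag, value) pairs using the MARC layout 22/4500."""
    dictionary, data = b"", b""
    for tag, value in fields:
        dictionary += b"%s%04d%05d" % (tag.encode("ascii"), len(value) + 1, len(data))
        data += value + FT
    base = 24 + len(dictionary) + 1             # leader, dictionary, FT
    total = base + len(data) + 1                # values plus the closing RT
    # Application bytes (the x positions) are left blank in this sketch.
    leader = (b"%05d" % total) + b" " * 5 + b"22" + (b"%05d" % base) + b" " * 3 + b"4500"
    return leader + dictionary + FT + data + RT


if __name__ == "__main__":
    sample = build_record([("001", b"42"), ("245", b"  " + b"\x1f" + b"aTitle")])
    print(parse_record(sample)[1])
```

Reading n, m and the entry size from the leader rather than hard-coding
4, 5 and 0 keeps the parser usable for IIF data that is not MARC.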

Contrary to folklore, MARC does NOT use a '$' as subfield delimiter,
nor a '#' for unused indicators. Rather, the examples in the specs
use a '$' to REPRESENT the subfield delimiter control character 31 (^_),
and a '#' to REPRESENT a blank. The RT (29, ^]) is sometimes represented as '\'
and the FT (30, ^^) as '^' or '@'.
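
The display convention can be sketched as a one-way rendering from raw bytes
to printable text. The function names below are hypothetical, decoding as
Latin-1 is merely a convenient assumption that maps every byte to one
character, and replacing blanks by '#' only within the indicators is a choice
made for this sketch. Going the other way is not safe in general, since a
real '$' or '#' may occur in the data.

```python
def field_to_display(value: bytes, indicators: int = 2) -> str:
    """Render a raw field value in the notation used by printed examples."""
    text = value.decode("latin-1")                 # assumption: one byte per char
    head = text[:indicators].replace(" ", "#")     # blank indicators shown as '#'
    body = text[indicators:].replace("\x1f", "$")  # delimiter 31 shown as '$'
    return head + body


def record_to_display(raw: bytes) -> str:
    """Show a raw record with '$', '^' and a backslash for bytes 31, 30, 29."""
    text = raw.decode("latin-1")
    return text.replace("\x1f", "$").replace("\x1e", "^").replace("\x1d", "\\")


if __name__ == "__main__":
    raw_field = b"1 " + b"\x1f" + b"aTitle" + b"\x1f" + b"bSubtitle"
    print(field_to_display(raw_field))
```

For fields without indicators, such as the control number, pass indicators=0.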


* Malete IIF import and export

The malete tool provides two rather simplistic
> CmdLine commands
iifimp and iifexp.

The command specific options are:
- -Ffile
  specify the full filename of the IIF file.
  Default is the basename of the Malete database with extension .iif.
  On UNIX, a filename '-' selects stdin/stdout.
- -Nomarc (literally)
  do not assume the MARC structure 22/450 on import. Requires proper IIF data.
- -P[iic]
  on export, prepend indicators ii and, where needed, subfield c.
  A single -P uses two blanks as indicators and subfield '0'.
  Suggested to produce at least syntactically correct MARC.
- -Rid (literally)
  on import, use a numeric control number (1st field, if it has tag 1)
  as record id. Note that on export, the record id is always used as
  control number unless the record already has one,
  since this is required not only by MARC, but by IIF.


* creating proper IIF from WinIsis

In Database-Export, set the subfield separator to \031 (byte value 31) and
output line length to 0.

If the fields do not contain valid MARC data, use a reformatting FST like
$
001 0 MFN
044 0 |00^a|,v44
024 0 |00^a|,v24
026 0 |00|,v26
070 0 (|00^a|,v70/)
$
Make sure that
- the first output field is tag 1 containing some unique id
- every field starts with two indicator characters
  (these really should be blanks, but blanks would be stripped during export)
- the indicators are followed by a delimiter and subfield identifier

Still, the output is not 100% correct, since WinIsis sets the
number of indicators and the identifier length to 0, where MARC specifies 2.
However, many MARC processors, including zebraidx, ignore these settings.
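
For a processor that does care about these settings, the checklist and the
leader fix can be sketched in Python as follows. Both function names are
hypothetical. The sketch assumes the record is already available as raw
bytes and, for the check, as a list of (tag, value) pairs; the first field
(tag 1) is exempt from the indicator check, matching the FST above, which
writes the bare MFN there.

```python
def fix_leader(raw: bytes) -> bytes:
    """Return the record with leader bytes 10 and 11 set to '2', as in MARC."""
    if len(raw) < 24:
        raise ValueError("too short to hold a 24 byte leader")
    return raw[:10] + b"22" + raw[12:]


def check_fields(fields):
    """List violations of the checklist for [(tag, value bytes), ...]."""
    problems = []
    if not fields or fields[0][0].lstrip("0") != "1":
        problems.append("first field is not tag 1")
    for tag, value in fields[1:]:
        # Expect two indicators, the delimiter (byte 31) and an identifier.
        if len(value) < 4 or value[2:3] != b"\x1f":
            problems.append("field " + tag + " lacks indicators or subfield")
    return problems


if __name__ == "__main__":
    print(check_fields([("001", b"42"), ("044", b"00" + b"\x1f" + b"aText")]))
```

Only the two leader bytes are overwritten, so record length and all offsets
stay valid.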


* making MARC data available via Z39.50

MARC records can easily be made available using Index Data's
> http://www.indexdata.dk/zebra/ zebra.

If the records in your IIF file use tags and subfields conforming to, say, USmarc,
simply check out the test/usmarc example in the zebra distribution.
Put your data in the records subdir and run "zebraidx update records; zebrasrv".

If your data was exported from WinIsis, you may want to put a line
"encoding Cp850" in the .abs file.


You must use recordType: grs.marc.something, meaning that it's general
structured data in some marc file format.
The sample usmarc.abs uses the "marc usmarc.mar" statement,
and usmarc.mar (in the zebra/tab directory) contains "reference USmarc",
stating that the marc input actually IS in USmarc.
This need not be true; it just means that the records will be served
as is if a client asks for USmarc.
However, only the tags listed in "elm" statements in the .abs file
will be indexed.


Note that zebra's indexing support is not as flexible as that of CDS/ISIS:
you can only select fields or subfields to be indexed in one of a couple
of modes (like word or phrase). To take full advantage of sophisticated
CDS/ISIS FSTs, include them in your export reformatting FST.
Use some otherwise unused field tags to hold the index terms and "elm"
statements to map them to bib-1 attributes.
Omit those fields from the display mapping.


To keep the data in its native format (say CDS), change the elm
statements so that they map the fields to be indexed to the corresponding
bib-1 attributes for searching, e.g. "elm 024 Conference-name !",
and, instead of using the "marc usmarc.mar" statement,
create one or more maptabs to map the full record to USmarc
and/or other presentation formats as applicable.
Check out the gils-usmarc.map example in the zebra/tab directory.


Consult the
> http://www.indexdata.dk/zebra/doc/ zebra documentation
for details.


* links
- ISO2709 "Information Interchange Format", a.k.a. ANSI/NISO
> http://www.niso.org/standards/resources/Z39-2.pdf Z39.2
- Machine Readable Catalogues
> http://www.loc.gov/marc/specifications/specrecstruc.html (US) MARC 21
,
> http://lcweb.loc.gov/marc/ overview
, references at the
> http://www.oasis-open.org/cover/marc.html Cover Pages
- Z39.50
> ftp://ftp.loc.gov/pub/z3950/official/ official spec
, overview at
> http://www.oclcpica.org/content/45/pdf/z3950_handbook_paper.pdf OCLC|Pica
, links at
> http://www.indexdata.dk/technologies/z3950/ indexdata
, makers of excellent free Z39.50 software.
- Uncle Aung's
> http://uncleaung.com/zisis/ Zisis

---
$Id: IIF.txt,v 1.5 2004/09/23 11:44:04 kripke Exp $