Revision 604, Mon Dec 27 21:49:01 2004 UTC, by dpavlin
import of new openisis release, 0.9.9e

IIF, MARC and Z39.50


IIF is the "Information Interchange Format", a record serialization format
specified in ISO standard 2709, also published as ANSI
> http://www.niso.org/standards/resources/Z39-2.pdf Z39.2.
IIF is mostly a plaintext format, in that almost all information is encoded
using ASCII characters (no binary numbers) and the only control characters
used are byte values 29 (record terminator RT), 30 (field terminator FT)
and 31 (subfield delimiter).


> http://www.loc.gov/marc/ MARC
("MAchine Readable Catalogue") is actually a family of largely incompatible
standards (
> http://www.loc.gov/marc/marcdocz.html USMARC
,
> http://www.ifla.org/VI/3/p1996-1/sec-uni.htm UNIMARC
, UKMARC, ...) that evolved from MARC I (1965).
While the main concern of the MARC standards is to specify actual data models
(assigning tags and subfield codes, which can be used perfectly well in
Malete, CDS/ISIS or other databases), they also specify a variant of IIF as
the suggested common format for data exchange, which we refer to here as "MARC".
(This file syntax seems to be mostly the same for all MARC standards.)


> ftp://ftp.loc.gov/pub/z3950/official/part1.txt Z39.50
is a network protocol to search and retrieve records.
It supports various query "languages", the most commonly used of which
is called the Type-1 query. Type-1 is similar to the queries supported
by Malete and CDS/ISIS, but much more general and complex.
Terms can be searched for in any indexed field or with restriction
to one or more "attributes".

Attributes are basically the tags used in the index, which are almost always
different from those used in records. While it is common for records to use
any of the various MARCs or even completely different formats, the attributes
used in bibliographical systems are typically those specified by the Bib-1
attribute set (e.g. assigning 4 to title).


Z39.50 allows a client to select a record format from various conversions
supported by a server. When a MARC format is selected,
the data is actually transmitted serialized according to IIF.


* IIF and MARC serialized records

IIF specifies a serialization for records. Like the Malete record data file,
an IIF file is simply a stream of such records; there is no additional
file header.
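
Since there is no file header, a reader can split the stream using nothing
but the five digit length at the start of each record. The following is a
minimal Python sketch of that idea, not the Malete implementation; the name
iter_records is made up for illustration, and the input is assumed to contain
no line breaks between records.

```python
import io

RT = b"\x1d"  # record terminator, byte value 29


def iter_records(stream):
    """Yield each raw record from a binary stream of concatenated IIF records."""
    while True:
        head = stream.read(5)  # first five leader bytes: total record length
        if not head:
            return  # clean end of stream
        if len(head) < 5 or not head.isdigit():
            raise ValueError("bad record length field: %r" % head)
        length = int(head)
        if length < 24:
            raise ValueError("record shorter than a leader")
        record = head + stream.read(length - 5)
        if len(record) != length or not record.endswith(RT):
            raise ValueError("truncated or unterminated record")
        yield record


# An empty MARC-style record: 24 byte leader, FT closing the dictionary, RT.
empty = b"00026     2200025   4500" + b"\x1e" + RT
records = list(iter_records(io.BytesIO(empty * 2)))
print(len(records))
```

The sketch checks the terminator of every record, so a wrong length field is
reported instead of silently shifting all following records.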

A record has
- a 24 byte leader, containing 16 bytes of structural data
  and 8 bytes of application data (x, imported as "MARC leader").
  The format for MARC is LLLLLxxxxx22BBBBBxxx4500.
  The Ls are the total record length (including the leader and a terminating RT),
  the Bs are the start of data (the field values, after an FT terminating the dictionary).
  The first '2' denotes that every field starts with two indicator bytes,
  the second is the subfield identifier length including the delimiter char.
- a "dictionary" array (called the directory in ISO 2709) with one entry
  per field containing 3 bytes of tag, and n and m bytes for length and offset.
  n and m are the digits at leader offsets 20 and 21; MARC uses 4 and 5.
  In general IIF, leader byte 22 may specify a number of implementation
  defined entry bytes.
- the actual field values, each terminated by the FT character.
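
The layout described above can be walked with a few slices. This is a
Python sketch under stated assumptions, not the Malete code: parse_record is
a hypothetical name, offsets are taken relative to the start of data, and the
field length in a dictionary entry is assumed to count the trailing FT (as it
does in MARC 21).

```python
FT = b"\x1e"  # field terminator, byte value 30
RT = b"\x1d"  # record terminator, byte value 29


def parse_record(raw):
    """Split one serialized record into (leader, [(tag, value), ...])."""
    leader = raw[:24]
    total = int(leader[0:5])    # LLLLL: total record length
    base = int(leader[12:17])   # BBBBB: start of data
    n = int(leader[20:21])      # digits used for the field length
    m = int(leader[21:22])      # digits used for the field offset
    extra = int(leader[22:23])  # implementation defined bytes per entry
    if total != len(raw) or raw[-1:] != RT:
        raise ValueError("length field and record terminator do not match")
    entry = 3 + n + m + extra
    directory = raw[24:base - 1]  # the byte at base-1 is the closing FT
    if len(directory) % entry:
        raise ValueError("dictionary size is not a multiple of the entry size")
    fields = []
    for i in range(0, len(directory), entry):
        tag = directory[i:i + 3].decode("ascii")
        length = int(directory[i + 3:i + 3 + n])
        offset = int(directory[i + 3 + n:i + 3 + n + m])
        value = raw[base + offset:base + offset + length]
        if value.endswith(FT):  # assumption: length includes the FT
            value = value[:-1]
        fields.append((tag, value))
    return leader, fields


# Hand-built sample: control number 42 and one field with indicators "00".
sample = (b"00063     2200049   4500"
          b"001000300000" b"245001000003" b"\x1e"
          b"42\x1e" b"00\x1faTitle\x1e" b"\x1d")
leader, fields = parse_record(sample)
print(fields)
```

For the sample, the dictionary holds two 12 byte entries, so the data starts
at 24 + 24 + 1 = 49, which is the value in the leader.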

Contrary to folklore, MARC does NOT use a '$' as subfield delimiter,
nor a '#' for unused indicators. Rather, the examples in the specs
use a '$' to REPRESENT the subfield delimiter control character 31 (^_),
and a '#' to REPRESENT a blank. The RT (29, ^]) is sometimes represented as '\'
and the FT (30, ^^) as '^' or '@'.
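
To make the distinction concrete, here is a small Python sketch that turns
the real bytes of a field into the display notation used in the specs.
The function name show_field and the choice of '^' for the FT are
illustrative only; the data is assumed to be in a single byte encoding.

```python
# Display stand-ins for the three control characters (data never contains these
# stand-ins in place of the control bytes).
DISPLAY = {0x1f: "$", 0x1e: "^", 0x1d: "\\"}


def show_field(value, indicators=2):
    """Render a raw field value in the notation used by printed examples."""
    text = value.decode("latin-1")  # assumption: one byte per character
    ind = text[:indicators].replace(" ", "#")  # blank indicators shown as '#'
    return (ind + text[indicators:]).translate(DISPLAY)


print(show_field(b"  \x1faTitle\x1e"))
```

Going the other way (turning '$' in a printed example back into byte 31) is
only safe for hand-typed examples, since real data may contain a literal '$'.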


* Malete IIF import and export

The malete tool provides two rather simplistic
> CmdLine commands
iifimp and iifexp.

The command specific options are:
- Ffile
  specify the full filename of the IIF file.
  Default is the basename of the Malete database with extension .iif.
  On UNIX, a filename '-' selects stdin/out.
- Nomarc (literally)
  do not assume the MARC structure 22/450 on import. Requires proper IIF data.
- P[iic]
  on export, prepend indicators ii and, where needed, subfield c.
  A single -P uses two blanks as indicators and subfield '0'.
  Suggested to produce at least syntactically correct MARC.
- Rid (literally)
  on import, use a numeric control number (1st field, if it has tag 1)
  as record id. Note that on export, the record id is always used as
  control number unless the record already has one,
  since this is required not only by MARC, but by IIF.
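
What prepending indicators and a subfield means for the serialized record
can be sketched in Python as follows. This is not how iifexp is implemented;
build_record is a hypothetical helper, "where needed" is read here as "the
value does not already start with a subfield delimiter", and the prefix is
applied to every field alike, which may differ from what malete does.

```python
FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"


def build_record(fields, indicators=b"  ", subfield=b"0"):
    """Serialize (tag, value) pairs as one MARC-style (22/450) record."""
    directory, data = b"", b""
    for tag, value in fields:
        if len(tag) != 3:
            raise ValueError("tags must be three characters")
        if not value.startswith(SF):  # assumption: this is "where needed"
            value = SF + subfield + value
        body = indicators + value + FT
        if len(body) > 9999:
            raise ValueError("field too long for a 4 digit length")
        # entry: 3 bytes tag, 4 digits length, 5 digits offset into the data
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    base = 24 + len(directory) + 1   # leader + dictionary + closing FT
    total = base + len(data) + 1     # + record terminator
    leader = b"%05d     22%05d   4500" % (total, base)
    return leader + directory + FT + data + RT


rec = build_record([("245", b"Title")])
print(rec[:24])
```

With the defaults, which mirror a single -P, the value "Title" is stored as
two blanks, delimiter, '0' and the text, so the field occupies 10 bytes
including its FT.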

* creating proper IIF from WinIsis

In Database-Export, set the subfield separator to \031 and
output line length to 0.

If the fields do not contain valid MARC data, use a reformatting FST like
$
001 0 MFN
044 0 |00^a|,v44
024 0 |00^a|,v24
026 0 |00|,v26
070 0 (|00^a|,v70/)
$
Make sure that
- the first output field is tag 1 containing some unique id
- every field starts with two indicator characters
  (really should be blank, but that would be stripped during export)
- the indicators are followed by a delimiter and subfield identifier
Still, the output is not 100% correct, since WinIsis sets the
number of indicators and the identifier length to 0, where MARC specifies 2.
However, many other MARC processors, including zebraidx, ignore these settings.
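
For a processor that does check these settings, the two leader digits could
be patched after export. A minimal Python sketch, assuming the field data
already carries two indicators and a two byte subfield identifier as arranged
by the FST above; fix_leader is a made-up name.

```python
def fix_leader(record):
    """Set indicator count and identifier length (leader bytes 10, 11) to 2."""
    if len(record) < 24:
        raise ValueError("record shorter than a leader")
    # Same length as before, so record length and start of data stay valid.
    return record[:10] + b"22" + record[12:]


# Empty record as WinIsis would flag it: both settings are 0.
exported = b"00026     0000025   4500\x1e\x1d"
print(fix_leader(exported)[:24])
```

Only the leader changes; the patch does not add indicators to the data, so it
is wrong for records whose fields lack them.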


* making MARC data available via Z39.50

MARC records can easily be made available using indexdata's
> http://www.indexdata.dk/zebra/ zebra.

If records in your IIF file use tags and subfields conforming to, say, USmarc,
simply check out the test/usmarc example in the zebra distribution.
Put your data in the records subdir and run "zebraidx update records; zebrasrv".

If your data was exported from WinIsis, you may want to put a line
"encoding Cp850" in the .abs file.
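
What that encoding setting is about can be seen with a few bytes in Python.
The sketch assumes the export really is in DOS code page 850, and
cp850_to_utf8 is an illustrative helper, not part of zebra or Malete.

```python
def cp850_to_utf8(raw):
    """Re-encode bytes from DOS code page 850 as UTF-8."""
    return raw.decode("cp850").encode("utf-8")


raw = b"M\x81ller"  # byte 0x81 is a u with diaeresis in code page 850
text = raw.decode("cp850")
print(text)
```

Read as Latin-1, the same byte would be a control character, which is why
the code page has to be declared somewhere before the data is indexed.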


You must use recordType: grs.marc.something, meaning that it's general
structured data in some marc file format.
The sample usmarc.abs uses the "marc usmarc.mar" statement,
and usmarc.mar (in the zebra/tab directory) contains "reference USmarc",
stating that the marc input actually IS in USmarc.
This need not be true; it just means that the records will be served
as is if a client asks for USmarc.
However, only the tags listed in "elm" statements in the .abs files
will be indexed.


Note that zebra's indexing support is not as flexible as that of CDS/ISIS:
you can only select fields or subfields to be indexed in one of a couple
of modes (like word or phrase). To take full advantage of sophisticated
CDS/ISIS FSTs, include them in your export reformatting FST.
Use some otherwise unused field tags to hold the index terms and "elm"
statements to map them to bib-1 attributes.
Omit those fields from the display mapping.


To keep the data in its native format (say CDS), change the elm
statements to map the fields to be indexed to the corresponding bib-1 attributes
for searching, e.g. "elm 024 Conference-name !",
and, instead of using the "marc usmarc.mar" statement,
create one or more maptabs to map the full record to one or more
USmarc and/or other presentation formats as applicable.
Check out the gils-usmarc.map example in the zebra/tab directory.


Consult the
> http://www.indexdata.dk/zebra/doc/ zebra documentation
for details.


* links
- ISO2709 "Information Interchange Format", a.k.a. ANSI/NISO
> http://www.niso.org/standards/resources/Z39-2.pdf Z39.2
- Machine Readable Catalogues
> http://www.loc.gov/marc/specifications/specrecstruc.html (US) MARC 21
,
> http://lcweb.loc.gov/marc/ overview
, references at the
> http://www.oasis-open.org/cover/marc.html Cover Pages
- Z39.50
> ftp://ftp.loc.gov/pub/z3950/official/ official spec
, overview at
> http://www.oclcpica.org/content/45/pdf/z3950_handbook_paper.pdf OCLC|Pica
, links at
> http://www.indexdata.dk/technologies/z3950/ indexdata
, makers of excellent free Z39.50 software.
- Uncle Aung's
> http://uncleaung.com/zisis/ Zisis

---
$Id: IIF.txt,v 1.5 2004/09/23 11:44:04 kripke Exp $
