* what is isis

Isis is a simple yet powerful database system that has had a large installed
base since the 1980s. Because it is well suited to bibliographic data, it is
commonly used in libraries, and because it is very low cost, especially in
those running on a low budget.

* introduction to the isis db

An isis DB is a list of rows with no fixed structure, each identified
by a unique number, the rowid (a.k.a. mfn). Each row is a list of
fields, and each field has a number (the tag) and a string value. Within a
row there may be zero, one or more fields with a given tag. While a
field's value is usually a textual representation of data in some
character encoding (commonly one of the IBM/DOS code pages),
it may actually contain arbitrary bytes. This is closely modelled
after the ISO 2709 "Information Interchange Format" (IIF, a.k.a. ANSI/NISO
> http://www.niso.org/standards/resources/Z39-2.pdf Z39.2
).
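As a rough illustration of the row model described above, a minimal Python sketch might look like the following. The names Row, Field and get, as well as the sample tags and values, are hypothetical and not part of any isis API; values are kept as bytes because a field may hold arbitrary bytes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    tag: int      # numeric field tag
    value: bytes  # usually encoded text, but arbitrary bytes are allowed


@dataclass
class Row:
    mfn: int  # unique row id
    fields: List[Field] = field(default_factory=list)

    def get(self, tag: int) -> List[bytes]:
        """Return the values of all fields with the given tag, in order."""
        return [f.value for f in self.fields if f.tag == tag]


# Hypothetical sample row: tag 70 occurs twice, tag 99 not at all.
row = Row(mfn=1, fields=[
    Field(24, b"An example title"),
    Field(70, b"First Author"),
    Field(70, b"Second Author"),
])

# Text is commonly stored in an IBM/DOS code page; cp850 is assumed here.
authors = [v.decode("cp850") for v in row.get(70)]
```

Looking up a tag always yields a list, so "zero, one or more" occurrences are handled uniformly by the caller.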

* subfields

There is a convention to encode multiple subfields in one field by
introducing each with a '^' followed by one character tagging the subfield.
So the field value '^afoo^bbar^bbaz' represents a field having one 'a'
subfield with value 'foo' and two 'b' subfields, 'bar' and 'baz'. Another
separator character may be used; e.g. ASCII character 31 ("Unit
Separator") is used in the
> http://www.loc.gov/marc/specifications/specrecstruc.html MARC standard.
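The subfield convention can be sketched in a few lines of Python. The function name split_subfields is hypothetical, and ignoring any text before the first separator is an assumption of this sketch, not something the convention prescribes.

```python
from typing import List, Tuple


def split_subfields(value: str, sep: str = "^") -> List[Tuple[str, str]]:
    """Split a field value into (subfield tag, subfield value) pairs."""
    pairs = []
    # Text before the first separator belongs to no subfield and is skipped here.
    for chunk in value.split(sep)[1:]:
        if chunk:  # ignore a separator with nothing after it
            pairs.append((chunk[0], chunk[1:]))
    return pairs


subfields = split_subfields("^afoo^bbar^bbaz")
# The same data using ASCII 31 (Unit Separator) as the separator character.
same = split_subfields("\x1fafoo\x1fbbar\x1fbbaz", sep="\x1f")
```

A list of pairs is returned rather than a dictionary so that repeated subfields, such as the two 'b' subfields above, keep their order and are not overwritten.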

* formatting

There is a formatting language with, roughly speaking, literal text, field
and subfield variables, if-else branches (on field existence) and for loops
(over field repetitions).
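To give a feel for these four constructs, here is a hedged sketch of what a format does, written in plain Python rather than in the formatting language itself, whose syntax is not shown here. The tags 24 and 70, the sample values and the function format_row are all hypothetical.

```python
from typing import Dict, List

# Hypothetical row: tag -> list of values (one entry per repetition).
row: Dict[int, List[str]] = {
    24: ["An example title"],
    70: ["First Author", "Second Author"],
}


def format_row(row: Dict[int, List[str]]) -> str:
    out = []
    out.append("Title: ")             # literal text
    out.append(row.get(24, [""])[0])  # field variable (first occurrence)
    out.append("\n")
    if 70 in row:                     # branch on field existence
        for author in row[70]:        # loop over field repetitions
            out.append("Author: " + author + "\n")
    else:
        out.append("Author: unknown\n")
    return "".join(out)


text = format_row(row)
```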

* indexing

An index is built by converting a row into a list of words (optionally
applying formats) and storing every word, qualified by the position
of its occurrence in the row, in a B+-tree (which is actually spread
over six files). Searching for a word or word prefix is possible with or
without qualifying the position (field). Since all fields can be
combined into one index, it is usually not necessary (but possible) to
set up multiple indexes.
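The idea can be sketched as follows, with a sorted in-memory list and binary search standing in for the on-disk B+-tree. This is only an illustration under that assumption; the sample rows, the uppercasing of words and the names build_index and search are hypothetical and do not reflect the actual file format.

```python
import bisect
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical rows: mfn -> list of (tag, text) fields.
rows: Dict[int, List[Tuple[int, str]]] = {
    1: [(24, "Database systems"), (70, "Smith")],
    2: [(24, "Data structures"), (70, "Jones")],
    3: [(24, "Smith on libraries")],
}


def build_index(rows):
    """Return a sorted list of (word, mfn, tag, word position in field)."""
    entries = []
    for mfn, fields in rows.items():
        for tag, text in fields:
            for pos, word in enumerate(re.findall(r"\w+", text.upper())):
                entries.append((word, mfn, tag, pos))
    entries.sort()
    return entries


def search(index, prefix: str, tag: Optional[int] = None) -> Set[int]:
    """Find the mfns of rows containing a word starting with prefix."""
    prefix = prefix.upper()
    hits = set()
    # Binary search to the first entry >= prefix, then scan while it matches.
    start = bisect.bisect_left(index, (prefix,))
    for word, mfn, field_tag, _pos in index[start:]:
        if not word.startswith(prefix):
            break
        if tag is None or field_tag == tag:
            hits.add(mfn)
    return hits


index = build_index(rows)
```

With this sample data, search(index, "smith") finds rows 1 and 3, while search(index, "smith", tag=70) narrows the result to row 1, because all fields share one index and the tag is only a qualifier on each entry.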

* queries

A query language allows word lookups to be combined using the and, or
and not (without) operators. This is very similar to the "Type-1" query of
> ftp://ftp.loc.gov/pub/z3950/official/part1.txt Z39.50.
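Since each word lookup yields a set of rowids, the three operators correspond to set intersection, union and difference. The following sketch shows this in Python; the postings table and the lookup function are hypothetical stand-ins for a real index, and the query syntax itself is not modelled.

```python
from typing import Dict, Set

# Hypothetical word -> set of mfns table standing in for the index.
postings: Dict[str, Set[int]] = {
    "DATABASE": {1, 4},
    "LIBRARY": {1, 2, 3},
    "XML": {3, 4},
}


def lookup(word: str) -> Set[int]:
    """Return the mfns of all rows containing the word (empty if unknown)."""
    return postings.get(word.upper(), set())


# library AND database
both = lookup("library") & lookup("database")
# library OR xml
either = lookup("library") | lookup("xml")
# library NOT xml (rows with "library" but without "xml")
without = lookup("library") - lookup("xml")
```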

* usage

While isis lacks most features of an RDBMS, such as complex relations
between different entities, its flexibility comes in handy for many
catalogues and directories with highly varying records and a single
level of substructure, which today are usually modelled as XML
documents rather than table rows. In other words, isis is an ideal
store for many XML applications. The flexible indexing mechanism
combines the best of full text searching and structured retrieval.