Announcing Malete, the database engine powering OpenIsis 1.0


* from 0.9 to 1.0

Based on the 0.9 engine and especially its Tcl binding,
we had a system complete enough to do very intensive application testing
of all concepts, handling bibliographical and terminological
as well as general industrial data.
With this experience at hand, we spent the second half of 2003
giving our then two-year-old software a complete overhaul,
in order to create a basis to last.


Following the traditional beliefs of Unix design, we figured out
that the best and most stable combination of robustness/performance
with flexibility/convenience can be achieved by clearly separating
- a general purpose database system
  which is very simple in order to be fast and robust and to lay
  a solid ground for flexibility, but is itself meant to be
  accessed by other software (or geeks) rather than humans.
  While this engine is based on the Z39.2 record model
  (even supporting record leaders as used by MARC), it makes no special
  provisions to support bibliographical data or CDS/ISIS legacy,
  but rather tries to make this model appealing for general purpose
  database usage. This engine is called Malete (Kurdish for "our house").
  Malete includes a database core library, a generic server and
  access libraries for various programming languages.
- a CDS/ISIS-style application
  or, actually, like Winisis, a framework for applications.
  This is targeted at CDS/ISIS users and librarians in general.
  It provides support for conversion from and to a variety of
  known file formats including MARC, high level indexing,
  references (authority files, coded data), forms and so on.

In other words, for retrieval you will rarely need more than the
Malete engine (plus some formatting for presentation, which is
usually done in a web programming language like PHP),
while for data entry you want a convenient graphical user interface
providing all sorts of lookups and checks.


* technical changes

- multiprocessing
  For a variety of reasons (detailed elsewhere) we postponed support
  for multi-threading (at least until the ongoing move towards
  compiler supported thread local storage is stable and widely available).
  Instead, writing by multiple processes is supported, based on
  file locking. Fast and consistent caching even for processes with
  a very short lifetime (like CGI scripts) is achieved by replacing
  the former explicit caching with memory mapping.
- platform independence
  Now both record and index data file formats are identical across
  platforms (i.e. the same even on big-endian machines like Suns and Macs).
  Only the pointer and tree files are platform dependent,
  but they are rebuilt from the data as needed.
- generalized record format
  A record with n fields is now a series of n+1 tag-value pairs.
  The tag of the first pair is the negative total length, -(n+1),
  and its value is a record "header" consisting of the record id (MFN)
  and optional leader data as used by MARC. Obviously, such a series
  can be part of a larger record, meaning records can be easily nested.
- simplified serialized format
  The serialized (textual) representation of a record,
  used both in the masterfile and the server communications protocol,
  has dropped low-level support for field values containing newlines.
  Where needed, the application must apply proper encoding
  (but tools for that are provided).
- simple transaction support
  Updates to a record can optionally be qualified with the position
  from which the record was last read, making the update fail
  if the record has been modified in the meantime. Reads can be done in
  "consistent snapshot" mode, reflecting the state of the database
  at one given point in time.
- unified message interface
  The server communications protocol has been simplified and straightened out.
  The masterfile is now only a special case of this protocol and thus
  can be sent directly to a server. Conceptually every record is a
  message saying "write me".
- ucspi based server
  The server is designed to run under tcpserver, meaning it can take
  advantage of all of its features like access control, basic client
  authentication, IPv6, SSL encryption and so on.
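The generalized record format from the list above can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the names make_record and children are hypothetical, the header is treated as an opaque string, and Python lists stand in for whatever layout the C library really uses. It shows how the negative count in the first tag lets a reader step over, or hand on, a nested record as one unit.

```python
def make_record(header, items):
    """Build a record as a flat list of (tag, value) pairs.

    items may hold plain (tag, value) pairs or nested records (lists
    built by make_record). The first pair is the header; its tag is the
    negated total number of pairs, header included.
    """
    body = []
    for item in items:
        if isinstance(item, list):   # nested record: splice its pairs in unchanged
            body.extend(item)
        else:
            body.append(item)
    return [(-(len(body) + 1), header)] + body


def children(rec):
    """Yield the top-level members of rec: plain pairs, or nested records."""
    total = -rec[0][0]
    i = 1
    while i < total:
        tag, _ = rec[i]
        if tag < 0:                  # a negative tag starts a nested record
            size = -tag
            yield rec[i:i + size]    # the count alone delimits the subrecord
            i += size
        else:
            yield rec[i]
            i += 1


# Hypothetical tags and values, chosen only for the example.
inner = make_record("7", [(100, "a"), (200, "b")])
outer = make_record("42", [(10, "title"), inner, (20, "x")])
```

Here the outer record has two fields of its own plus a nested record of three pairs, so its first tag is -6. A slice copies in Python; the sketch only demonstrates that no parsing beyond the count is needed to find where the nested record ends.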

For more details see
> Diff09

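The simple transaction support listed among the technical changes amounts to an optimistic check on updates plus snapshot reads. The toy model below is a sketch, not the Malete API: it assumes an append-only log in which a record's "position" is simply its index, and ToyStore with its methods is a hypothetical name invented for the example.

```python
class ToyStore:
    """In-memory stand-in for a masterfile with position-qualified updates."""

    def __init__(self):
        self.log = []      # append-only list of (mfn, data) versions
        self.latest = {}   # mfn -> position of the newest version

    def read(self, mfn):
        """Return (data, position) of the newest version of a record."""
        pos = self.latest[mfn]
        return self.log[pos][1], pos

    def write(self, mfn, data, expected_pos=None):
        """Append a new version and return its position.

        If expected_pos is given and the record has been written since
        that position was read, nothing is stored and None is returned.
        """
        if expected_pos is not None and self.latest.get(mfn) != expected_pos:
            return None
        self.log.append((mfn, data))
        self.latest[mfn] = len(self.log) - 1
        return self.latest[mfn]

    def snapshot_read(self, mfn, horizon):
        """Return the newest version written before position horizon."""
        for pos in range(min(horizon, len(self.log)) - 1, -1, -1):
            if self.log[pos][0] == mfn:
                return self.log[pos][1]
        return None


store = ToyStore()
store.write(1, "v1")
data, pos = store.read(1)                          # a client reads version 1
store.write(1, "v2")                               # someone else updates meanwhile
stale = store.write(1, "v3", expected_pos=pos)     # qualified update is refused
horizon = len(store.log)                           # fix a point in time
fresh = store.write(1, "v3", expected_pos=store.read(1)[1])
```

After the last line, a snapshot read at the saved horizon still sees "v2", while a plain read sees "v3".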

* applications and components

OpenIsis 1.x will provide the following applications and components
(probably not all in 1.0, but 1.1 should be fairly complete):

- the Malete database server
  for Linux and other UNIX-like systems, written in ISO C.
  This is aimed at minimal functionality at maximum performance.
  Intended usage is for high volume read-only processing and
  read/write with application controlled indexing.
  On UNIX, the server will be multi-process based.
  On Windows, use of multiple processes is restricted to read-only mode.
- Malete and OpenIsis command line tools
  for all systems, providing several tools including conversion
  from and to legacy CDS/ISIS file formats.
- Java, Perl and PHP libraries
  to contact a server, each written completely in the respective language.
  These are aimed at tight language integration,
  leveraging the application language's strengths and the programmer's skills.
  They will run on all systems supported by each language.
- a Tcl extension and library
  where the library acts similarly to those for other languages
  (but based on a record implemented in C) and the extension basically
  provides the server interface in-process.
- an application server
  for all systems (i.e. including Windows),
  providing database and http service, based on Tcl with or without Tk.
  While this will not achieve the high throughput of a purely C-based
  server, the Tcl layer can add virtually arbitrary functionality.
  Intended usage is for read/write with server controlled indexing
  and integrated http applications based on Tcl server pages.
  Servers based on other languages are waiting for volunteers.
- a Tk based GUI
  for all systems. It can run standalone or act as server and/or client.
- the OpenIsis Tcl library
  providing support for CDS/ISIS-style applications, e.g. indexing
  similar to FSTs.
- the OpenIsis application
  targeted at users from the CDS/ISIS community, esp. librarians,
  to provide interoperability with existing ISIS databases and support
  for bibliographic formats in a user friendly way.
  Written in Tcl/Tk as a sister of OpenMLCM.


* Malete modules

The Malete database system is structured in the following modules:

- core
  basic C library for handling, storing and retrieving simple records.
- pw
  "patchwork" framework for high level database services based on message
  passing. Some designs are borrowed from the Lisp and Smalltalk languages.
- tool
  helper functions and command line tool including
  communication utilities and standalone server
- java, perl and php
  client modules
- tcl
  extension and base library
- app
  the Tcl based application server
- gui
  a generic Tk graphical user interface


On top of this, the OpenIsis 1.x application set contains:

- old
  compatibility functions and command line tool
- isis
  the OpenIsis library and graphical user interface


* ISAM core

This implements a variant of ISAM (indexed sequential access method)
based on the ideas of Z39.2 (IIF) and Z39.50 (Type-1 queries).
It provides a fully open and unprotected interface
for unrestricted access at maximum performance.
The core library is not fully self-contained,
but requires a few functions like stream I/O to be provided
by each environment.
It makes only very limited use of metadata,
dealing with "physical" aspects like file names, locks and character sets.

- util
  basic lists, sessions, output buffers and other utilities
- system
  services like file I/O and time
- charset
  recoding and collation
- storage
  set of functions for database file access (master file and B-tree)


* patchwork

The patchwork C library wraps the ISAM core into an extensible
framework for high level database services,
based on passing records as request and response messages to server objects.
It provides a fully abstract and generic method call interface
plus a couple of database objects.

An object dispatches messages by checking their type and other parameters
and taking appropriate action, including forwarding to parent objects.
This is known as the "pure object oriented" approach,
as these objects don't have any other interface but the message dispatcher,
and in particular no directly accessible data.
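The dispatching idea described above might look roughly like the following Python sketch. Everything specific in it is an assumption made for illustration: the class names, the message types "write", "read" and "query", and the "ok"/"error" replies are invented here, and a message is modelled as a list of (tag, value) pairs whose first pair is a 0-tagged, tab-separated leader.

```python
class Dispatcher:
    """An object whose only public interface is send(); its data stays private."""

    def __init__(self, parent=None):
        self._parent = parent
        self._handlers = {}   # message type -> handler method

    def send(self, message):
        msg_type = message[0][1].split("\t")[0]
        handler = self._handlers.get(msg_type)
        if handler is not None:
            return handler(message)
        if self._parent is not None:          # not ours: forward to the parent
            return self._parent.send(message)
        return [(0, "error\tunknown message type " + msg_type)]


class Base(Dispatcher):
    """Toy record store reachable only through messages."""

    def __init__(self):
        super().__init__()
        self._records = {}
        self._handlers["write"] = self._write
        self._handlers["read"] = self._read

    def _write(self, message):
        mfn = message[0][1].split("\t")[1]
        self._records[mfn] = message[1:]
        return [(0, "ok\t" + mfn)]

    def _read(self, message):
        mfn = message[0][1].split("\t")[1]
        return [(0, "ok\t" + mfn)] + self._records.get(mfn, [])


class Query(Dispatcher):
    """Handles "query" itself and forwards everything else to its parent."""

    def __init__(self, parent):
        super().__init__(parent)
        self._handlers["query"] = self._query

    def _query(self, message):
        # toy query: look one record up by sending the parent a read message
        mfn = message[0][1].split("\t")[1]
        return self._parent.send([(0, "read\t" + mfn)])


top = Query(Base())
written = top.send([(0, "write\t1"), (24, "Hello")])   # forwarded to Base
found = top.send([(0, "query\t1")])                    # handled by Query
unknown = top.send([(0, "bogus")])                     # nobody handles it
```

Note that Query never touches the store's dictionary; it can only ask its parent by sending another message, which is the point of having no directly accessible data.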

- struct
  higher level operators on ISIS records a la IIF (Z39.2/ISO 2709)
  based on metadata, including various substructures
- base
  dispatcher wrapping the ISAM core.
  Based on the 0.9 server, but with some modifications to allow
  for the most efficient message passing.
- query
  dispatcher for ISIS/Z39.50 Type-1 style queries
- server
  dispatcher providing record relations, views and other magic


* design guidelines

requirements:
- flexible and efficient buffered pushing of output.
  Pulling is not used on lower levels;
  every environment will solicit input on the outermost level as appropriate.
- flexible and efficient construction, manipulation and passing
  of records, especially embedded subrecords in the patchwork.


principles:
- everything is a list.
  Similar to Java's String and StringBuffer,
  there is the immutable "Rec" and the mutable "List".
- uniform stream output.
  Conceptually, all output is a list. There is only one (output) "Stream",
  which may be backed by memory buffers, files or other channels like a GUI
  window, so even diagnostic output can be captured.
- negative counted subrecords.
  The patchwork uses negative counted embedding, since this allows
  embedded records to be passed on without any modification or copying.
- low tag usage.
  Besides reserving all negative tags for embedding, only a minimal number
  of tags should be defined. Instead, subfielding will be used extensively.
  The patchwork message header uses tag 0, containing the message type
  as an indicator, followed by any number of simple options and
  parameters, resembling a command line (see below).
  Alphabetic keywords and mnemonics are favoured over numbers.
- leader
  There has always been some out-of-band data on records, like their mfn.
  This is now generalized in the concept of a record leader (see below).
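The first two principles above (a mutable "List" that yields an immutable "Rec", and a single output "Stream" with interchangeable backing) can be sketched in Python as follows. The class and method names are hypothetical, and the "tag, tab, value, newline" line layout is assumed for illustration rather than taken from the Malete sources.

```python
import io


class List:
    """Mutable builder, in the spirit of Java's StringBuffer."""

    def __init__(self, header=""):
        self.header = header
        self.fields = []

    def add(self, tag, value):
        self.fields.append((tag, value))
        return self              # allow chaining

    def freeze(self):
        """Return an immutable Rec: a tuple whose first tag is the negative count."""
        return ((-(len(self.fields) + 1), self.header),) + tuple(self.fields)


class Stream:
    """Single output abstraction: callers push records, the sink varies."""

    def __init__(self, sink):
        self._sink = sink        # any callable that accepts a str

    def push(self, rec):
        for tag, value in rec:
            self._sink(f"{tag}\t{value}\n")


rec = List("42").add(24, "Hello").add(70, "World").freeze()

memory = io.StringIO()           # a memory buffer as backing
Stream(memory.write).push(rec)

captured = []                    # or any other channel, e.g. a list of lines
Stream(captured.append).push(rec)
```

The same push call serves both sinks, which is how diagnostic output could be redirected to a window or a buffer without the producer knowing.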


implementation notes:
- immutable lists
  are just the same as a record embedded by negative counting,
  i.e. an array of fields, with the tag of the first being the negative
  total field count.
- record leader
  The first field of an embedded record (the one whose tag is the negative
  count) carries leader-like meta info in its value;
  for database records this is the (optional) mfn plus a MARC leader.
  Since there should not be a difference between the representation of
  embedded and first level records, every record has a leader.
- message leader
  A record representing a message also has a leader.
  Where the message is not embedded, it is sent as a leading 0-tagged field.
  Since message leaders start with an alphabetic character,
  the 0 and tab are omitted in the textual representation.
  Message leaders use tabs as separators and start with a word
  indicating the message type to the dispatcher.
  The following subfields are parameters, with or without identifiers.
- getopt command lines
  A command line of the form "command -aopt1 -bopt2 arg1 arg2" can be
  easily and canonically wrapped into one field by removing the '-'
  option indicator and identifying the non-option args as subfield '@'.
  A command line interface thus maps easily, and without the need
  for looking up meta information, to a message leader,
  from which the method identified by "command" can fetch options
  using a getopt-like utility. System and db parameters are likewise
  stored in the options file.
- message body
  Most messages use only one type of record parameter;
  however, special embedded records like indexing instructions
  can be recognized by their leader, where applicable.
  Where a message contains parameter fields (first level, not leader subfields),
  it must use positive tags for them, preferably low numbers.
- direct embedding
  Where a message has no parameter fields,
  i.e. no parameters besides its leader's subfields and embedded records,
  and there is only one parameter record, the message may,
  as a convenient shorthand, allow the embedded record's leader
  (mfn and db for database records) to be specified as message options
  and have its own leader immediately followed by the record data.
  In other words, the message sort of embeds itself in its parameter
  record's leader (and has to remove itself before passing it on).
  This is the form used by masterfile metalines (with omitted 0).
- system options
  can be specified on the command line or in a system options file.
  There is a global options list (e.g. verbosity) and there are per-db
  options (like file paths and readonly). The command line format is
  "-aglob1 -bglob2 dbname -xdb1 -ydb2 [... dbname ...]".
  The system options file contains (the textual representation of a
  record with) 0-tagged fields, one for each db, wrapped up like
  "dbname xdb1 ydb2" (with tabs). Those options are NOT stored
  in each db's .opt file or meta record.
- database metadata
  contained in the db's .m0d file is basically a chained
  message to the core engine, mostly configuring the "transmission format"
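The mapping between getopt-style command lines and message leaders described in the notes above can be sketched in a few lines of Python. The function names are hypothetical, and two details are assumptions: each non-option argument becomes its own '@' subfield, and options are single letters directly followed by their value.

```python
def wrap(argv):
    """Wrap a command line into one tab-separated leader field."""
    parts = [argv[0]]                     # the command names the message type
    for arg in argv[1:]:
        if arg.startswith("-") and len(arg) > 1:
            parts.append(arg[1:])         # drop the '-' option indicator
        else:
            parts.append("@" + arg)       # non-option args go to subfield '@'
    return "\t".join(parts)


def get_option(leader, letter, default=None):
    """getopt-like lookup: value of the first subfield starting with letter."""
    for sub in leader.split("\t")[1:]:
        if sub[:1] == letter:
            return sub[1:]
    return default


def get_args(leader):
    """Return the non-option arguments in order."""
    return [sub[1:] for sub in leader.split("\t")[1:] if sub[:1] == "@"]


def parse_line(line):
    """Split one textual line into (tag, value).

    A line starting with a letter is read as a 0-tagged message leader
    whose leading "0" and tab were omitted.
    """
    if line[:1].isalpha():
        return 0, line
    tag, _, value = line.partition("\t")
    return int(tag), value


leader = wrap("command -aopt1 -bopt2 arg1 arg2".split())
```

No metadata lookup is needed in either direction: the option letter is the subfield identifier, and the first word selects the method.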


> http://uk.travel.yahoo.com/t/wc/germany/berlin/nightlife/malete.html Malete

---
$Id: OverView.txt,v 1.4 2004/06/15 13:17:37 kripke Exp $