Announcing Malete, the database engine powering OpenIsis 1.0


* from 0.9 to 1.0

Based on the 0.9 engine and especially its Tcl binding,
we had a system complete enough to do very intensive application testing
of all concepts, handling bibliographical and terminological
as well as general industrial data.
With those experiences at hand we spent the second half of 2003
giving our then two-year-old software a complete overhaul,
in order to create a basis to last.


Following the traditional beliefs of Unix design, we concluded
that the best and most stable combination of robustness/performance
with flexibility/convenience can be achieved by clearly separating
- a general purpose database system
which is very simple in order to be fast and robust and to lay
a solid ground for flexibility, but is itself meant to be
accessed by other software (or geeks) rather than humans.
While this engine is based on the Z39.2 record model
(even supporting record leaders as used by MARC), it makes no special
provisions to support bibliographical data or CDS/ISIS legacy,
but rather tries to make this model appealing for general purpose
database usage. This engine is called Malete (Kurdish for "our house").
Malete includes a database core library, a generic server and
access libraries for various programming languages.
- a CDS/ISIS-style application
or, actually, like Winisis, a framework for applications.
This is targeted at CDS/ISIS users and librarians in general.
It provides support for conversion from and to a variety of
known file formats including MARC, high level indexing,
references (authority files, coded data), forms and so on.

In other words, for retrieval you will rarely need more than the
Malete engine (plus some formatting for presentation, which is
usually done in a web programming language like PHP),
while for data entry you want a convenient graphical user interface
providing all sorts of lookups and checks.


* technical changes

- multiprocessing
For a variety of reasons (detailed elsewhere) we postponed support
for multi-threading (at least until the ongoing move towards
compiler-supported thread-local storage is stable and widely available).
Instead, write access by multiple processes is enabled based on
file locking. Fast and consistent caching even for processes with
a very short lifetime (like CGI scripts) is achieved by replacing
the former explicit caching with memory mapping.
- platform independence
Both record and index data file formats are now identical across
platforms (i.e. the same even on big-endian machines like Suns and Macs).
Only the pointer and tree files are platform dependent,
but they are rebuilt from the data as needed.
- generalized record format
A record with n fields is now a series of n+1 tag-value pairs.
The tag of the first pair is the negative total length -(n+1),
and its value is a record "header" consisting of the record id (MFN)
and optional leader data as used by MARC. Such a series
can be part of a larger record, meaning records can be easily nested.
- simplified serialized format
The serialized (textual) representation of a record,
used both in the masterfile and the server communications protocol,
has dropped low-level support for field values containing newlines.
Where needed, the application must apply a proper encoding
(tools for that are provided).
- simple transaction support
Updates to a record can optionally be qualified with the position
from which the record was last read, making the update fail
if the record has been modified in the meantime. Reads can be done in
"consistent snapshot" mode, reflecting the state of the database
at one given point in time.
- unified message interface
The server communications protocol has been simplified and straightened out.
The masterfile is now only a special case of this protocol and thus
can be sent directly to a server. Conceptually, every record is a
message saying "write me".
- ucspi based server
The server is designed to run under tcpserver, meaning it can take
advantage of all of its features like access control, basic client
authentication, IPv6, SSL encryption and so on.
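
To illustrate the simplified serialized format, the following Python sketch writes one field per line as tag, tab, value, and escapes newlines in values before writing. The backslash escaping and all function names are assumptions made for this example; they are not the encoding tools shipped with Malete, and the tag-tab-value line layout is inferred from the message leader notes later in this document.

```python
def encode_value(value):
    # Escape the escape character first, then the newline itself
    return value.replace("\\", "\\\\").replace("\n", "\\n")


def decode_value(text):
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # "\n" becomes a newline, any other escaped character is literal
            out.append("\n" if text[i + 1] == "n" else text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def serialize(fields):
    # fields: list of (tag, value) pairs; one output line per field
    return "".join(f"{tag}\t{encode_value(value)}\n" for tag, value in fields)


def parse(text):
    fields = []
    for line in text.split("\n"):
        if not line:
            continue
        tag, _, value = line.partition("\t")
        fields.append((int(tag), decode_value(value)))
    return fields
```

Because the encoding happens before serialization, the low-level format can treat every newline as a field separator without inspecting values.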

For more details see
> Diff09
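
The optional position check and the "consistent snapshot" reads described under simple transaction support can be sketched as follows. This is a toy in-memory model in Python, not Malete's storage code: the class `TinyStore`, its methods and the use of a log index as "position" are all assumptions chosen for illustration.

```python
class StaleUpdate(Exception):
    """Raised when a record changed since the caller last read it."""


class TinyStore:
    """Append-only toy store; a record's position is its index in the log."""

    def __init__(self):
        self.log = []      # every version ever written, in order
        self.latest = {}   # record id -> position of the current version

    def read(self, rec_id):
        pos = self.latest[rec_id]
        return pos, self.log[pos]

    def write(self, rec_id, value, expected_pos=None):
        # The check is optional: it only runs when a position is given
        if expected_pos is not None and self.latest.get(rec_id) != expected_pos:
            raise StaleUpdate(rec_id)
        self.log.append(value)
        self.latest[rec_id] = len(self.log) - 1
        return self.latest[rec_id]

    def snapshot(self):
        # Freeze the id -> position map; later writes do not affect it
        frozen = dict(self.latest)
        return lambda rec_id: self.log[frozen[rec_id]]
```

A writer that passes the position it got from `read` fails with `StaleUpdate` if another writer got in first, while a reader holding a snapshot keeps seeing the versions that were current when the snapshot was taken.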


* applications and components

OpenIsis 1.x will provide the following applications and components
(probably not all in 1.0, but 1.1 should be fairly complete):

- the Malete database server
for Linux and other UNIX-like systems, written in ISO C.
This is aimed at minimal functionality at maximum performance.
Intended usage is high-volume read-only processing and
read/write with application-controlled indexing.
On UNIX, the server will be multi-process based.
On Windows, use of multiple processes is restricted to read-only mode.
- Malete and OpenIsis command line tools
for all systems, providing several tools including conversion
from and to legacy CDS/ISIS file formats.
- Java, Perl and PHP libraries
to contact a server, each written completely in the respective language.
These are aimed at tight language integration,
leveraging the application language's strengths and the programmer's skills.
They will run on all systems supported by each language.
- a Tcl extension and library
where the library acts similarly to those for other languages
(but based on a record implemented in C) and the extension basically
provides the server interface in-process.
- an application server
for all systems (i.e. including Windows),
providing database and HTTP service, based on Tcl with or without Tk.
While this will not achieve the high throughput of a purely C-based
server, the Tcl layer can add virtually arbitrary functionality.
Intended usage is read/write with server-controlled indexing
and integrated HTTP applications based on Tcl server pages.
Servers based on other languages are waiting for volunteers.
- a Tk based GUI
for all systems. Can run standalone or act as server and/or client.
- the OpenIsis Tcl library
providing support for CDS/ISIS-style applications, e.g. indexing
similar to FSTs.
- the OpenIsis application
targeted at users from the CDS/ISIS community, esp. librarians,
to provide interoperability with existing ISIS databases and support
for bibliographic formats in a user-friendly way.
Written in Tcl/Tk as a sister of OpenMLCM.



* Malete modules

The Malete database system is structured in the following modules:

- core
basic C library for handling, storing and retrieving simple records.
- pw
"patchwork" framework for high level database services based on message
passing. Some designs are borrowed from the Lisp and Smalltalk languages.
- tool
helper functions and command line tool including
communication utilities and standalone server
- java, perl and php
client modules
- tcl
extension and base library
- app
the Tcl based application server
- gui
a generic Tk graphical user interface


On top of this, the OpenIsis 1.x application set contains:

- old
compatibility functions and command line tool
- isis
the OpenIsis library and graphical user interface


* ISAM core

This implements a variant of ISAM (indexed sequential access method)
based on the ideas of Z39.2 (IIF) and Z39.50 (Type-1 queries).
It provides a fully open and unprotected interface
for unrestricted access at maximum performance.
The core library is not fully self-contained,
but requires a few functions like stream I/O to be provided
by each environment.
It makes only very limited use of metadata,
dealing with "physical" aspects like file names, locks and character sets.

- util
basic list, sessions, output buffers and other utilities
- system
services like file I/O and time
- charset
recoding and collation
- storage
set of functions for database file access (master file and b-tree)


* patchwork

The patchwork C library wraps the ISAM core into an extensible
framework for high level database services,
based on passing records as request and response messages to server objects.
It provides a fully abstract and generic method call interface
plus a couple of database objects.

An object dispatches messages by checking their type and other parameters
and taking appropriate action, including forwarding to parent objects.
This is known as the "pure object oriented" approach,
as these objects have no interface other than the message dispatcher
and, in particular, no directly accessible data.
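
The dispatch-and-forward idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the patchwork C API: the class names, the `_on_<type>` handler naming, the `ok`/`error` replies and the message layout (a list of tag-value pairs whose tag 0 field holds the type and tab-separated parameters) are assumptions for the example.

```python
class Dispatcher:
    """An object whose only entry point is dispatch()."""

    def __init__(self, parent=None):
        self._parent = parent

    def dispatch(self, message):
        # The tag 0 field is the header: message type, then parameters
        header = next(value for tag, value in message if tag == 0)
        msg_type, *params = header.split("\t")
        fields = [(tag, value) for tag, value in message if tag != 0]
        handler = getattr(self, "_on_" + msg_type, None)
        if handler is not None:
            return handler(params, fields)
        if self._parent is not None:
            return self._parent.dispatch(message)  # forward unchanged
        return [(0, "error\tunknown message type: " + msg_type)]


class Base(Dispatcher):
    """Toy storage object; its data is reachable only through messages."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._records = {}

    def _on_write(self, params, fields):
        self._records[params[0]] = fields
        return [(0, "ok\t" + params[0])]

    def _on_read(self, params, fields):
        return [(0, "ok\t" + params[0])] + self._records.get(params[0], [])


class Query(Dispatcher):
    """Handles only "ping"; everything else goes to the parent."""

    def _on_ping(self, params, fields):
        return [(0, "pong")]
```

With `Query(parent=Base())`, a `write` or `read` message sent to the query object is answered by the base object, and the reply is again a record, so callers never touch the stored data directly.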

- struct
higher level operators on ISIS records a la IIF (Z39.2/ISO 2709)
based on metadata, including various substructures
- base
dispatcher wrapping the ISAM core.
Based on the 0.9 server, but with some modifications to allow
for the most efficient message passing.
- query
dispatcher for ISIS/Z39.50 Type-1 style queries
- server
dispatcher providing record relations, views and other magic


* design guidelines

requirements:
- flexible and efficient buffered pushing of output.
Pulling is not used on lower levels;
every environment will solicit input on the outermost level as appropriate.
- flexible and efficient construction, manipulation and passing
of records, especially embedded subrecords in the patchwork.


principles:
- everything is a list.
Similar to Java's String and StringBuffer,
there is the immutable "Rec" and the mutable "List".
- uniform stream output.
Conceptually, all output is a list. There is only one (output) "Stream",
which may be backed by memory buffers, files or other channels like a GUI
window, so even diagnostic output can be captured.
- negative counted subrecords.
The patchwork uses negative counted embedding, since this allows
embedded records to be passed on without any modification or copying.
- low tag usage.
Besides reserving all negative tags for embedding, only a minimal number
of tags should be defined. Instead, subfielding will be used extensively.
The patchwork message header uses tag 0, containing the message type
as an indicator, followed by any number of simple options and
parameters, resembling a command line (see below).
Alphabetic keywords and mnemonics are favoured over numbers.
- leader
There has always been some out-of-band data on records, like their MFN.
This is now generalized in the concept of a record leader (see below).
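
Negative counted embedding can be made concrete with a small Python model. A record is a flat list of tag-value pairs whose first tag is the negative total pair count, and an embedded record is simply a run of pairs inside a larger one. The helper names `make_record` and `top_level` and the sample tags are hypothetical; note also that a Python slice copies the pairs, so this sketch shows the layout but not the no-copy property, which depends on handing on a pointer into the same array.

```python
def make_record(header, fields):
    """Build a record: one header pair followed by the given pairs.

    The header tag is the negative total pair count, i.e. -(n+1) for n
    pairs in fields (pairs of embedded records are counted too).
    """
    fields = list(fields)
    return [(-(len(fields) + 1), header)] + fields


def top_level(record):
    """Yield plain (tag, value) pairs and embedded records in order."""
    i = 1  # skip the record's own header pair
    while i < len(record):
        tag, value = record[i]
        if tag < 0:
            size = -tag
            yield record[i:i + size]  # the embedded record, unmodified
            i += size
        else:
            yield (tag, value)
            i += 1
```

For example, embedding `make_record("7", [(24, "title"), (70, "author")])` between two plain fields of an outer record needs nothing more than list concatenation, and `top_level` can skip over the embedded part using only its leading negative tag.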


implementation notes:
- immutable lists
are just the same as a record embedded by negative counting,
i.e. an array of fields, with the tag of the first being the negative
total field count.
- record leader
The first field of embedded records contains leader-like meta info;
for database records this is the (optional) MFN plus a MARC leader.
Since there should not be a difference between the representation of
embedded and first level records, every record has a leader.
- message leader
A record representing a message also has a leader.
Where the message is not embedded, it is sent as a leading 0-tagged field.
Since message leaders start with an alphabetic character,
the 0 and tab are omitted in the textual representation.
Message leaders use tabs as separators and start with a word
indicating the message type to the dispatcher.
The subfields that follow are parameters, with or without identifiers.
- getopt command lines
A command line of the form "command -aopt1 -bopt2 arg1 arg2" can be
easily and canonically wrapped into one field by removing the '-'
option indicator and marking the non-option args as subfield '@'.
A command line interface thus maps easily, and without the need
for looking up meta information, to a message leader,
from which the method identified by "command" can fetch options
using a getopt-like utility. System and db parameters are likewise
stored in the options file.
- message body
Most messages use only one type of record parameter;
however, special embedded records like indexing instructions
can be recognized by their leader, where applicable.
Where a message contains parameter fields (first level, not leader subfields),
it must use positive tags for them, preferably low numbers.
- direct embedding
Where a message has no parameter fields,
i.e. no parameters besides its leader's subfields and embedded records,
and there is only one parameter record, the message may,
as a convenient shorthand, allow the embedded record's leader
(mfn and db for database records) to be specified as message options
and have its leader immediately followed by the record data.
In other words, the message sort of embeds itself in its parameter
record's leader (and has to remove itself before passing the record on).
This is the form used by masterfile metalines (with the 0 omitted).
- system options
can be specified on the command line or in a system options file.
There is a global options list (e.g. verbosity) and per-db options
(like file paths and readonly). The command line format is
"-aglob1 -bglob2 dbname -xdb1 -ydb2 [... dbname ...]".
The system options file contains (the textual representation of a
record with 0-tagged) fields, one for each db, wrapped up like
"dbname xdb1 ydb2" (with tabs). Those options are NOT stored
in each db's .opt file or meta record.
- database metadata
contained in the db's .m0d file is basically a chained
message to the core engine, mostly configuring the "transmission format"
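
The getopt-style wrapping described in the implementation notes above can be sketched as follows: the '-' is dropped from options, non-option arguments get the '@' identifier, and the pieces are joined with tabs into one leader field. The function names are made up for this example, and values that themselves contain tabs are not handled.

```python
def wrap_command_line(argv):
    """Turn ["command", "-aopt1", "arg1"] into a tab-separated leader."""
    subfields = [argv[0]]  # the message type comes first
    for arg in argv[1:]:
        if arg.startswith("-") and len(arg) > 1:
            subfields.append(arg[1:])    # "-aopt1" -> "aopt1"
        else:
            subfields.append("@" + arg)  # non-option argument
    return "\t".join(subfields)


def get_opt(leader, key):
    """Return the values of all subfields with one-character id key."""
    return [sf[1:] for sf in leader.split("\t")[1:] if sf[:1] == key]
```

Since each subfield starts with its one-character identifier, a handler can look up options with `get_opt(leader, "a")` or collect the positional arguments with `get_opt(leader, "@")` without consulting any metadata.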


> http://uk.travel.yahoo.com/t/wc/germany/berlin/nightlife/malete.html Malete

---
$Id: OverView.txt,v 1.4 2004/06/15 13:17:37 kripke Exp $