Contents of /openisis/0.9.9e/doc/OverView.txt


Revision 604
Mon Dec 27 21:49:01 2004 UTC (19 years, 6 months ago) by dpavlin
File MIME type: text/plain
File size: 13354 byte(s)
import of new openisis release, 0.9.9e

Announcing Malete, the database engine powering OpenIsis 1.0


* from 0.9 to 1.0

Based on the 0.9 engine and especially its Tcl binding,
we had a system complete enough to do very intensive application testing
of all concepts, handling bibliographical and terminological
as well as general industrial data.
With those experiences at hand we spent the second half of 2003
giving our then two-year-old software a complete overhaul,
in order to create a basis to last.


In line with the traditional beliefs of Unix design, we figured out
that the best and most stable combination of robustness/performance
with flexibility/convenience can be achieved by clearly separating
- a general purpose database system
  which is very simple in order to be fast and robust and lay
  a solid ground for flexibility, but itself is meant to be
  accessed by other software (or geeks) rather than humans.
  While this engine is based on the Z39.2 record model
  (even supporting record leaders as used by MARC), it makes no special
  provisions to support bibliographical data or CDS/ISIS legacy,
  but rather tries to make this model appealing to general purpose
  database usage. This engine is called Malete (Kurdish for "our house").
  Malete includes a database core library, generic server and
  access libraries for various programming languages.
- a CDS/ISIS-style application
  or, actually, like Winisis, a framework for applications.
  This is targeted at CDS/ISIS users and librarians in general.
  It provides support for conversion from and to a variety of
  known file formats including MARC, high level indexing,
  references (authority files, coded data), forms and so on.

In other words, for retrieval you will rarely need more than the
Malete engine (plus some formatting for presentation, which is
usually done in a web programming language like PHP),
while for data entry you want a convenient graphical user interface
providing all sorts of lookups and checks.


* technical changes

- multiprocessing
  For a variety of reasons (detailed elsewhere) we postponed support
  for multi-threading (at least until the ongoing move towards
  compiler-supported thread-local storage is stable and widely available).
  Instead, write access by multiple processes is supported, based on
  file locking. Fast and consistent caching even for processes with
  very short lifetimes (like CGI scripts) is achieved by replacing
  the former explicit caching with memory mapping.
- platform independence
  Now both record and index data file formats are identical across
  platforms (i.e. the same even on big-endian machines like Suns and Macs).
  Only the pointer and tree files are platform dependent,
  but are rebuilt from the data as needed.
- generalized record format
  A record with n fields is now a series of n+1 tag-value pairs.
  The tag of the first pair is the negative total length -n-1
  and its value is a record "header" consisting of the record id (MFN)
  and optional leader data as used by MARC. Obviously, such a series
  can be part of a larger record, meaning records can be easily nested.
- simplified serialized format
  The serialized (textual) representation of a record,
  used both in the masterfile and the server communications protocol,
  has dropped low-level support for field values containing newlines.
  Where needed, the application must apply proper encoding
  (but tools for that are provided).
- simple transaction support
  Updates to a record can optionally be qualified with the position
  from which the record was last read, so that the update fails
  if the record has been modified in the meantime. Reads can be done in
  "consistent snapshot" mode, reflecting the state of the database
  at one given point in time.
- unified message interface
  The server communications protocol has been simplified and straightened out.
  The masterfile now is only a special case of this protocol and thus
  can be directly sent to a server. Conceptually every record is a
  message saying "write me".
- ucspi based server
  The server is designed to run under tcpserver, meaning it can take
  advantage of all of its features like access control, basic client
  authentication, IPv6, SSL encryption and so on.
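
The generalized record format described in the list above can be illustrated with a small sketch. It is written in Python for brevity and is not Malete's C code: the names make_record and children, the use of a bare MFN string as the header value, and the example tags 10, 24 and 70 are all assumptions made for this illustration. It shows only the counting rule, i.e. that the first pair carries the negative total pair count, so a nested record can be found and passed on as an unmodified slice.

```python
# Hypothetical sketch of the negative-counted record layout; all names are illustrative.
from typing import List, Tuple, Union

Field = Tuple[int, str]          # one (tag, value) pair


def make_record(header: str, fields: List[Field]) -> List[Field]:
    """Prefix n fields with a header pair whose tag is -(n + 1)."""
    return [(-(len(fields) + 1), header)] + list(fields)


def children(record: List[Field]) -> List[Union[Field, List[Field]]]:
    """List the top-level members of a record: plain fields and embedded records."""
    count = -record[0][0]
    if count < 1 or count > len(record):
        raise ValueError("first tag must be the negative pair count")
    members: List[Union[Field, List[Field]]] = []
    i = 1                          # skip the record's own header pair
    while i < count:
        tag = record[i][0]
        if tag < 0:                # negative tag: an embedded record of -tag pairs
            members.append(record[i:i - tag])
            i += -tag
        else:                      # ordinary field
            members.append(record[i])
            i += 1
    return members


# A record with two fields (header value "42" stands in for the MFN) ...
book = make_record("42", [(24, "Title"), (70, "Author")])
# ... embedded unchanged in a larger record next to one ordinary field.
outer = make_record("1", [(10, "note")] + book)
```

Here book starts with tag -3 and outer with tag -5; children(outer) yields the ordinary field followed by the embedded record exactly as it was built.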

For more details see
> Diff09
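
The "simple transaction support" item above amounts to an optimistic update check plus snapshot reads. The following Python sketch shows one way such a rule can work. It is an illustration only: the Store class, its method names and the in-memory append-only log are assumptions, and "position" is modelled as a list index rather than a masterfile offset.

```python
# Hypothetical in-memory model; Malete's real storage and API are not shown here.
class StaleUpdate(Exception):
    """Raised when a record changed after the caller last read it."""


class Store:
    def __init__(self):
        self._log = []       # append-only list of (mfn, data) versions
        self._latest = {}    # mfn -> position of the newest version in _log

    def read(self, mfn):
        """Return (position, data) of the newest version."""
        pos = self._latest[mfn]
        return pos, self._log[pos][1]

    def write(self, mfn, data, read_pos=None):
        """Append a new version; fail if read_pos is given but outdated."""
        if read_pos is not None and self._latest.get(mfn) != read_pos:
            raise StaleUpdate(f"record {mfn} was modified since position {read_pos}")
        self._log.append((mfn, data))
        self._latest[mfn] = len(self._log) - 1
        return self._latest[mfn]

    def snapshot(self):
        """A snapshot is simply the log length at this point in time."""
        return len(self._log)

    def read_at(self, mfn, snapshot):
        """Return the newest version that existed when the snapshot was taken."""
        for pos in range(snapshot - 1, -1, -1):
            if self._log[pos][0] == mfn:
                return self._log[pos][1]
        raise KeyError(mfn)
```

A writer that passes the position it got from read() succeeds only if nobody else wrote the record in between, while read_at() keeps returning the old data for an earlier snapshot.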


* applications and components

OpenIsis 1.x will provide the following applications and components
(probably not all in 1.0, but 1.1 should be fairly complete):

- the Malete database server
  for Linux and other UNIX-like systems, written in ISO C.
  This is aimed at minimal functionality at maximum performance.
  Intended usage is for high-volume read-only processing and
  read/write with application-controlled indexing.
  On UNIX, the server will be multi-process based.
  On Windows, use of multiple processes is restricted to read-only mode.
- Malete and OpenIsis command line tools
  for all systems, providing several tools including conversion
  from and to legacy CDS/ISIS file formats.
- Java, Perl and PHP libraries
  to contact a server, all written completely in the respective language.
  These are aimed at tight language integration,
  leveraging the application language's strengths and programmer's skills.
  Will run on all systems supported by each language.
- a Tcl extension and library
  where the library acts similarly to those for other languages
  (but based on a C implemented record) and the extension basically
  provides the server interface in process.
- an application server
  for all systems (i.e. including Windows),
  providing database and HTTP service, based on Tcl with or without Tk.
  While this will not achieve the high throughput of a purely C-based
  server, the Tcl layer can add virtually arbitrary functionality.
  Intended usage is for read/write with server-controlled indexing
  and integrated HTTP applications based on Tcl server pages.
  Servers based on other languages are waiting for volunteers.
- a Tk based GUI
  for all systems. Can run standalone or act as server and/or client.
- the OpenIsis Tcl library
  providing support for CDS/ISIS-style applications, e.g. indexing
  similar to FSTs.
- the OpenIsis application
  targeted towards users from the CDS/ISIS community, esp. librarians,
  to provide interoperability with existing ISIS databases and support
  for bibliographic formats in a user friendly way.
  Written in Tcl/Tk as a sister of OpenMLCM.



* Malete modules

The Malete database system is structured in the following modules:

- core
  basic C library for handling, storing and retrieving simple records.
- pw
  "patchwork" framework for high level database services based on message
  passing. Some designs are borrowed from the Lisp and Smalltalk languages.
- tool
  helper functions and command line tool including
  communication utilities and standalone server
- java, perl and php
  client modules
- tcl
  extension and base library
- app
  the Tcl based application server
- gui
  a generic Tk graphical user interface


On top of this, the OpenIsis 1.x application set contains:

- old
  compatibility functions and command line tool
- isis
  the OpenIsis library and graphical user interface


* ISAM core

This implements a variant of ISAM (indexed sequential access method)
based on the ideas of Z39.2 (IIF) and Z39.50 (Type-1 queries).
It provides a fully open and unprotected interface
for unrestricted access at maximum performance.
The core library is not fully self-contained,
but will require a few functions like stream I/O to be provided
by each environment.
It makes only very limited use of metadata,
dealing with "physical" aspects like file names, locks and character sets.

- util
  basic list, sessions, output buffers and other utilities
- system
  services like file IO and time
- charset
  recoding and collation
- storage
  set of functions for database file access (master file and b-tree)


* patchwork

The patchwork C library wraps the ISAM core into an extensible
framework for high level database services,
based on passing records as request and response messages to server objects.
It provides a fully abstract and generic method call interface
plus a couple of database objects.

An object dispatches messages by checking their type and other parameters
and taking appropriate action, including forwarding to parent objects.
This is known as the "pure object oriented" approach,
as these objects don't have any other interface but the message dispatcher,
and in particular no directly accessible data.
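
As a rough illustration of this dispatch style, the sketch below models server objects whose only public entry point is a send method taking a record (a list of tag-value pairs) and returning a record. It is written in Python rather than C, and every name in it is an assumption: the classes ServerObject, Base and Query, the message types "write", "read" and "query", and the "ok"/"error" reply leaders are invented for the example and are not the patchwork API.

```python
# Hypothetical sketch of message dispatching with forwarding to a parent object.
class ServerObject:
    def __init__(self, parent=None):
        self._parent = parent

    def send(self, message):
        """Dispatch on the first word of the tab-separated leader (tag 0 field)."""
        msg_type = message[0][1].split("\t", 1)[0]
        handler = getattr(self, "_on_" + msg_type, None)
        if handler is not None:
            return handler(message)
        if self._parent is not None:
            return self._parent.send(message)      # not ours: forward upwards
        return [(0, "error\tunknown message type " + msg_type)]


class Base(ServerObject):
    """Stands in for the object wrapping the storage core."""

    def __init__(self):
        super().__init__()
        self._records = {}                         # private, reachable only via messages

    def _on_write(self, message):
        mfn = message[0][1].split("\t")[1]
        self._records[mfn] = list(message[1:])
        return [(0, "ok\t" + mfn)]

    def _on_read(self, message):
        mfn = message[0][1].split("\t")[1]
        if mfn not in self._records:
            return [(0, "error\tno such record " + mfn)]
        return [(0, "ok\t" + mfn)] + self._records[mfn]


class Query(ServerObject):
    """Handles only 'query' itself and forwards everything else."""

    def _on_query(self, message):
        return [(0, "ok\tquery")]
```

With query = Query(parent=Base()), a "write" or "read" message sent to query is answered by the parent, while an unknown type falls through the whole chain and yields an error record.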

- struct
  higher level operators on ISIS records a la IIF (Z39.2/ISO2709)
  based on meta data, including various substructures
- base
  dispatcher wrapping the ISAM core.
  Based on the 0.9 server, but with some modifications to allow
  for most efficient message passing.
- query
  dispatcher for ISIS/Z39.50 Type-1 style queries
- server
  dispatcher providing record relations, views and other magic


* design guidelines

requirements:
- flexible and efficient buffered pushing of output.
  Pulling is not used on lower levels;
  every environment will solicit input on the outermost level as appropriate.
- flexible and efficient construction, manipulation and passing
  of records, especially embedded subrecords in the patchwork.


principles:
- everything is a list.
  Similar to Java's String and StringBuffer,
  there is the immutable "Rec" and the mutable "List".
- uniform stream output.
  Conceptually, all output is a list. There is only one (output) "Stream",
  which may be backed by memory buffers, files or other channels like a GUI
  window, so even diagnostic output can be captured.
- negative counted subrecords.
  The patchwork uses negative counted embedding, since this allows
  embedded records to be passed on without any modification or copying.
- low tag usage.
  Besides reserving all negative tags for embedding, only a minimal amount
  of tags should be defined. Instead subfielding will be used extensively.
  The patchwork message header uses tag 0, containing the message type
  as an indicator, followed by any number of simple options and
  parameters, resembling a command line (see below).
  Alphabetic keywords and mnemonics are favoured over numbers.
- leader
  There has always been some out-of-band data on records like their mfn.
  This is now generalized in the concept of a record leader (see below).
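
The "uniform stream output" principle above, combined with the requirement of buffered pushing, can be pictured with the following minimal Python sketch. It is not the Malete Stream: the class layout, the bufsize parameter and the idea of passing the backing channel as a callable are assumptions chosen to keep the example short.

```python
# Hypothetical sketch: one push-style Stream type, different backing channels.
import io


class Stream:
    """Buffers pushed text and hands it to a sink, i.e. any callable taking a str."""

    def __init__(self, sink, bufsize=4096):
        self._sink = sink
        self._bufsize = bufsize
        self._parts = []
        self._size = 0

    def push(self, text):
        """Queue text; flush automatically once the buffer is full."""
        self._parts.append(text)
        self._size += len(text)
        if self._size >= self._bufsize:
            self.flush()

    def flush(self):
        """Write everything buffered so far to the backing channel."""
        if self._parts:
            self._sink("".join(self._parts))
            self._parts.clear()
            self._size = 0


# The same Stream class backed by memory, here used to capture diagnostics.
captured = io.StringIO()
diagnostics = Stream(captured.write)
diagnostics.push("warning: index is being rebuilt\n")
diagnostics.flush()
```

Backing the stream with a file or a GUI widget only means passing a different callable, for example the write method of an open file object.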


implementation notes:
- immutable lists
  are just the same as a record embedded by negative counting,
  i.e. an array of fields, with the tag of the first being the negative
  total field count.
- record leader
  The first field of an embedded record carries leader-like meta info;
  for database records this is (optional) mfn plus a MARC leader.
  Since there should not be a difference between the representation of
  embedded and first level records, every record has a leader.
- message leader
  A record representing a message also has a leader.
  Where the message is not embedded, it is sent as a leading 0-tagged field.
  Since message leaders start with an alphabetic character,
  the 0 and tab are omitted in the textual representation.
  Message leaders use tabs as separators and start with a word
  indicating the message type to the dispatcher.
  Following subfields are parameters, with or without identifiers.
- getopt command lines
  A command line of the form "command -aopt1 -bopt2 arg1 arg2" can be
  easily and canonically wrapped into one field by removing the '-'
  option indicator and identifying the non-option args as subfield '@'.
  A command line interface thus maps easily, and without the need
  for looking up meta information, to a message leader,
  from which the method identified by "command" can fetch options
  using a getopt-like utility. System and db parameters are likewise
  stored in the options file.
- message body
  Most messages use only one type of record parameters;
  however, special embedded records like indexing instructions
  can be recognized by their leader, where applicable.
  Where a message contains parameter fields (first level, not leader subfields),
  it must use positive tags for that, preferably using low numbers.
- direct embedding
  Where a message has no parameter fields,
  i.e. no parameters besides its leader's subfields and embedded records,
  and there is only one parameter record, the message may,
  as a convenient shorthand, allow the embedded record's leader
  (mfn and db for database records) to be specified as message options
  and have its own leader immediately followed by the record data.
  In other words, the message sort of embeds itself in its parameter
  record's leader (and has to remove itself before passing it on).
  This is the form used by masterfile metalines (with omitted 0).
- system options
  can be specified on the command line or in a system options file.
  There is a global options list (e.g. verbosity) and per db options
  (like file paths and readonly). The command line format is
  "-aglob1 -bglob2 dbname -xdb1 -ydb2 [... dbname ...]".
  The system options file contains (the textual representation of a
  record with) 0-tagged fields, one for each db, wrapped up like
  "dbname xdb1 ydb2" (with tabs). Those options are NOT stored
  in each db's .opt file or meta record.
- database metadata
  contained in the db's .m0d file is basically a chained
  message to the core engine, mostly configuring the "transmission format".
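
The "getopt command lines" note above can be made concrete with a short Python sketch. The exact field layout is an interpretation, not taken from the Malete sources: it assumes that the wrapped field is tab-separated like a message leader, that each option keeps its one-letter identifier as the first character of its subfield, and that non-option arguments get '@' as their identifier. The function names are invented for the example.

```python
# Hypothetical sketch of wrapping a getopt-style command line into one leader field.
def wrap_command_line(argv):
    """Turn ['command', '-aopt1', '-bopt2', 'arg1', 'arg2'] into a single field."""
    subfields = [argv[0]]                     # leading word: the message type
    for arg in argv[1:]:
        if arg.startswith("-") and len(arg) > 1:
            subfields.append(arg[1:])         # drop the '-' option indicator
        else:
            subfields.append("@" + arg)       # non-option arg becomes subfield '@'
    return "\t".join(subfields)


def get_option(leader, letter, default=None):
    """getopt-like lookup: value of the first subfield identified by letter."""
    for sub in leader.split("\t")[1:]:
        if sub[:1] == letter:
            return sub[1:]
    return default


def get_args(leader):
    """All non-option arguments, in their original order."""
    return [sub[1:] for sub in leader.split("\t")[1:] if sub.startswith("@")]


leader = wrap_command_line("command -aopt1 -bopt2 arg1 arg2".split())
```

Under these assumptions no metadata lookup is needed in either direction: the receiving method can call get_option(leader, "a") or get_args(leader) directly on the field it was sent.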


> http://uk.travel.yahoo.com/t/wc/germany/berlin/nightlife/malete.html Malete

---
$Id: OverView.txt,v 1.4 2004/06/15 13:17:37 kripke Exp $