Revision 604 (Mon Dec 27 21:49:01 2004 UTC) by dpavlin:
import of new openisis release, 0.9.9e

Announcing Malete, the database engine powering OpenIsis 1.0


* from 0.9 to 1.0

Based on the 0.9 engine and especially its Tcl binding,
we had a system complete enough for very intensive application testing
of all concepts, handling bibliographical and terminological
as well as general industrial data.
With that experience at hand, we spent the second half of 2003
giving our then two-year-old software a complete overhaul
in order to create a lasting foundation.


Following the traditional beliefs of Unix design, we found
that the best and most stable combination of robustness/performance
with flexibility/convenience can be achieved by clearly separating
- a general purpose database system
  which is very simple in order to be fast and robust and to lay
  a solid ground for flexibility, but which is meant to be
  accessed by other software (or geeks) rather than by humans.
  While this engine is based on the Z39.2 record model
  (even supporting record leaders as used by MARC), it makes no special
  provisions to support bibliographical data or CDS/ISIS legacy,
  but rather tries to make this model appealing for general purpose
  database usage. This engine is called Malete (Kurdish for "our house").
  Malete includes a database core library, a generic server and
  access libraries for various programming languages.
- a CDS/ISIS-style application
  or, actually, like Winisis, a framework for applications.
  This is targeted at CDS/ISIS users and librarians in general.
  It provides support for conversion from and to a variety of
  known file formats including MARC, high level indexing,
  references (authority files, coded data), forms and so on.

In other words, for retrieval you will rarely need more than the
Malete engine (plus some formatting for presentation, which is
usually done in a web programming language like PHP),
while for data entry you want a convenient graphical user interface
providing all sorts of lookups and checks.


* technical changes

- multiprocessing
  For a variety of reasons (detailed elsewhere) we postponed support
  for multi-threading, at least until the ongoing move towards
  compiler-supported thread-local storage is stable and widely available.
  Instead, writing by multiple processes is supported based on
  file locking. Fast and consistent caching, even for processes with
  very short lifetimes (like CGI scripts), is achieved by replacing
  the former explicit caching with memory mapping.
- platform independence
  Both the record and the index data file formats are now identical across
  platforms (i.e. the same even on big-endian machines like Suns and Macs).
  Only the pointer and tree files are platform dependent,
  but they are rebuilt from the data as needed.
- generalized record format
  A record with n fields is now a series of n+1 tag-value pairs.
  The tag of the first field is the negative total length -n-1
  and the value is a record "header" consisting of the record id (MFN)
  and optional leader data as used by MARC. Obviously, such a series
  can be part of a larger record, meaning records can be easily nested.
- simplified serialized format
  The serialized (textual) representation of a record,
  used both in the masterfile and the server communications protocol,
  has dropped low-level support for field values containing newlines.
  Where needed, the application must apply proper encoding
  (but tools for that are provided).
- simple transaction support
  Updates to a record can optionally be qualified with the position
  from which the record was last read, making the update fail
  if the record has been modified in the meantime. Reads can be done in
  "consistent snapshot" mode, reflecting the state of the database
  at one given point in time.
- unified message interface
  The server communications protocol has been simplified and straightened out.
  The masterfile is now only a special case of this protocol and thus
  can be sent directly to a server. Conceptually, every record is a
  message saying "write me".
- ucspi based server
  The server is designed to run under tcpserver, meaning it can take
  advantage of all its features like access control, basic client
  authentication, IPv6, SSL encryption and so on.
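
The generalized record format and its negative counted nesting can be illustrated with a small in-memory sketch. It assumes a hypothetical Field struct holding a tag and a NUL-terminated value, and it stores only the MFN as the header value; the actual Malete structures and header layout are not described here. The point is that an embedded record is just a pointer into the enclosing array, so it can be examined or passed on without copying.

```c
#include <assert.h>

/* Hypothetical in-memory field: a tag and a NUL-terminated value.
   (Sketch only; not Malete's actual data structure.) */
typedef struct {
    int tag;          /* negative (-n-1) on the first field of a (sub)record */
    const char *val;  /* header value (here just the MFN) or field value */
} Field;

/* Total number of tag-value pairs a record occupies, header included. */
static int rec_len(const Field *rec)
{
    assert(rec[0].tag < 0);  /* every record starts with its -n-1 header */
    return -rec[0].tag;
}

/* Count the top-level fields of a record. An embedded record counts as
   one field; its pairs are skipped in one step using its own header tag. */
static int top_level_fields(const Field *rec)
{
    int end = rec_len(rec), count = 0;
    for (int i = 1; i < end; count++)
        i += rec[i].tag < 0 ? -rec[i].tag : 1;
    return count;
}

/* Example: record with MFN 7 (fields 24 and 70) nested inside
   the record with MFN 1, which also has fields 10 and 20. */
static const Field outer[] = {
    { -6, "1" },            /* outer header: 6 pairs in total */
    { 10, "outer field" },
    { -3, "7" },            /* embedded header: 3 pairs */
    { 24, "Title" },
    { 70, "Author" },
    { 20, "another outer field" },
};

/* The embedded record is simply a pointer into the outer array;
   it can be handed on without copying. */
static const Field *const inner = &outer[2];
```

Here top_level_fields(outer) yields 3 (fields 10 and 20 plus the embedded record), while top_level_fields(inner) yields 2.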

For more details see
> Diff09
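
The position-qualified update from "simple transaction support" is a form of optimistic concurrency control. The sketch below assumes a hypothetical append-only store in which every write puts the new version of a record at a fresh position; the Store type and the store_read/store_write functions are illustrative only, not Malete's API, and the file locking between processes is left out.

```c
#include <assert.h>
#include <string.h>

#define MAX_RECS 16

/* Hypothetical stand-in for a masterfile: every write appends the new
   version at the end, so each version of a record has its own position. */
typedef struct {
    long end;               /* next free position */
    long pos[MAX_RECS];     /* position of the current version, -1 if none */
    char val[MAX_RECS][64]; /* current value of each record */
} Store;

static void store_init(Store *s)
{
    s->end = 0;
    for (int i = 0; i < MAX_RECS; i++)
        s->pos[i] = -1;
}

/* Read record mfn; *pos receives the position it was read from. */
static const char *store_read(const Store *s, int mfn, long *pos)
{
    assert(mfn >= 0 && mfn < MAX_RECS);
    *pos = s->pos[mfn];
    return s->pos[mfn] < 0 ? NULL : s->val[mfn];
}

/* Write record mfn. With expect >= 0 the write is qualified: it fails
   (returns -1) if the record is no longer at the position it was read
   from, i.e. it has been modified meanwhile. expect = -1 writes anyway. */
static int store_write(Store *s, int mfn, const char *val, long expect)
{
    assert(mfn >= 0 && mfn < MAX_RECS);
    if (expect >= 0 && s->pos[mfn] != expect)
        return -1;
    strncpy(s->val[mfn], val, sizeof s->val[mfn] - 1);
    s->val[mfn][sizeof s->val[mfn] - 1] = '\0';
    s->pos[mfn] = s->end;                      /* new version, new position */
    s->end += (long)strlen(s->val[mfn]) + 1;
    return 0;
}
```

Two clients that read record 1 from the same position can no longer silently overwrite each other: the first qualified write succeeds, the second fails and has to re-read.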


* applications and components

OpenIsis 1.x will provide the following applications and components
(probably not all in 1.0, but 1.1 should be fairly complete):

- the Malete database server
  for Linux and other UNIX-like systems, written in ISO C.
  It aims at minimal functionality with maximum performance.
  Intended usage is high volume read-only processing and
  read/write with application controlled indexing.
  On UNIX, the server will be multi-process based.
  On Windows, use of multiple processes is restricted to read-only mode.
- Malete and OpenIsis command line tools
  for all systems, providing several tools including conversion
  from and to legacy CDS/ISIS file formats.
- Java, Perl and PHP libraries
  to contact a server, all written completely in the respective language.
  These are aimed at tight language integration,
  leveraging the application language's strengths and programmer's skills.
  They will run on all systems supported by each language.
- a Tcl extension and library
  where the library acts similar to those for other languages
  (but based on a C implemented record) and the extension basically
  provides the server interface in process.
- an application server
  for all systems (i.e. including Windows),
  providing database and http service, based on Tcl with or w/o Tk.
  While this will not achieve the high throughput of a purely C-based
  server, the Tcl layer can add virtually arbitrary functionality.
  Intended usage is read/write with server controlled indexing
  and integrated http applications based on Tcl server pages.
  Servers based on other languages are waiting for volunteers.
- a Tk based GUI
  for all systems. It can run standalone or act as server and/or client.
- the OpenIsis Tcl library
  providing support for CDS/ISIS-style applications, e.g. indexing
  similar to FSTs.
- the OpenIsis application
  targeted towards users from the CDS/ISIS community, esp. librarians,
  to provide interoperability with existing ISIS databases and support
  for bibliographic formats in a user friendly way.
  Written in Tcl/Tk as a sister of OpenMLCM.



* Malete modules

The Malete database system is structured in the following modules:

- core
  basic C library for handling, storing and retrieving simple records.
- pw
  "patchwork" framework for high level database services based on message
  passing. Some designs are borrowed from the Lisp and Smalltalk languages.
- tool
  helper functions and command line tool including
  communication utilities and standalone server
- java, perl and php
  client modules
- tcl
  extension and base library
- app
  the Tcl based application server
- gui
  a generic Tk graphical user interface


On top of this, the OpenIsis 1.x application set contains:

- old
  compatibility functions and command line tool
- isis
  the OpenIsis library and graphical user interface


* ISAM core

This implements a variant of ISAM (indexed sequential access method)
based on the ideas of Z39.2 (IIF) and Z39.50 (Type-1 queries).
It provides a fully open and unprotected interface
for unrestricted access at maximum performance.
The core library is not fully self-contained,
but will require a few functions like stream I/O to be provided
by each environment.
It makes only very limited use of metadata,
dealing with "physical" aspects like file names, locks and character sets.

- util
  basic lists, sessions, output buffers and other utilities
- system
  services like file IO and time
- charset
  recoding and collation
- storage
  set of functions for database file access (master file and b-tree)


* patchwork

The patchwork C library wraps the ISAM core into an extensible
framework for high level database services,
based on passing records as request and response messages to server objects.
It provides a fully abstract and generic method call interface
plus a couple of database objects.

An object dispatches messages by checking their type and other parameters
and taking appropriate action, including forwarding to parent objects.
This is known as the "pure object oriented" approach,
as these objects don't have any other interface but the message dispatcher,
and in particular no directly accessible data.
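
A minimal sketch of such a dispatcher in C follows. It assumes a hypothetical Msg that carries only the type word of the message leader plus one argument, and hypothetical message types "read" and "ping"; real patchwork messages are full records. Each object exposes nothing but its dispatch function, and whatever it does not understand it forwards to its parent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message: the type word of the message leader plus one argument. */
typedef struct {
    const char *type;
    const char *arg;
} Msg;

typedef struct Obj Obj;

/* The dispatcher is an object's only interface.
   It returns 1 if the message was handled, 0 otherwise. */
typedef int (*Dispatch)(Obj *self, const Msg *m, char *out, size_t outsz);

struct Obj {
    Dispatch dispatch;
    Obj *parent;       /* where unhandled messages are forwarded */
    const char *name;  /* state that, by convention, only the dispatcher reads */
};

/* Forward a message to the parent object, if there is one. */
static int forward(Obj *self, const Msg *m, char *out, size_t outsz)
{
    return self->parent ? self->parent->dispatch(self->parent, m, out, outsz) : 0;
}

/* Root object: understands only "ping". */
static int root_dispatch(Obj *self, const Msg *m, char *out, size_t outsz)
{
    if (strcmp(m->type, "ping") == 0) {
        snprintf(out, outsz, "pong from %s", self->name);
        return 1;
    }
    return forward(self, m, out, outsz);
}

/* Database object: handles "read" itself and forwards everything else. */
static int db_dispatch(Obj *self, const Msg *m, char *out, size_t outsz)
{
    if (strcmp(m->type, "read") == 0) {
        snprintf(out, outsz, "%s: record %s", self->name, m->arg);
        return 1;
    }
    return forward(self, m, out, outsz);
}
```

A "ping" sent to a database object whose parent is the root object is answered by the root, while a message neither of them understands comes back unhandled.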

- struct
  higher level operators on ISIS records a la IIF (Z39.2/ISO2709)
  based on meta data, including various substructures
- base
  dispatcher wrapping the ISAM core.
  Based on the 0.9 server, but with some modifications to allow
  for the most efficient message passing.
- query
  dispatcher for ISIS/Z39.50 Type-1 style queries
- server
  dispatcher providing record relations, views and other magic


* design guidelines

requirements:
- flexible and efficient buffered pushing of output.
  Pulling is not used on lower levels;
  every environment will solicit input on the outermost level as appropriate.
- flexible and efficient construction, manipulation and passing
  of records, especially embedded subrecords in the patchwork.


principles:
- everything is a list.
  Similar to Java's String and StringBuffer,
  there is the immutable "Rec" and the mutable "List".
- uniform stream output.
  Conceptually, all output is a list. There is only one (output) "Stream",
  which may be backed by memory buffers, files or other channels like a GUI
  window, so even diagnostic output can be captured.
- negative counted subrecords.
  The patchwork uses negative counted embedding, since this allows
  embedded records to be passed on without any modification or copying.
- low tag usage.
  Besides reserving all negative tags for embedding, only a minimal number
  of tags should be defined. Instead, subfielding will be used extensively.
  The patchwork message header uses tag 0, containing the message type
  as an indicator, followed by any number of simple options and
  parameters, resembling a command line (see below).
  Alphabetic keywords and mnemonics are favoured over numbers.
- leader
  There has always been some out-of-band data on records, like their mfn.
  This is now generalized in the concept of a record leader (see below).
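
The push-only, uniform output stream can be sketched as a small buffer with a pluggable sink. The Stream, Sink and MemBuf names and the 16-byte buffer are assumptions for illustration, not Malete's actual types. Producers only ever push; the same stream code can serve a file, a memory buffer capturing diagnostics, or any other channel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output Stream: a small buffer plus a sink that receives
   the buffered bytes. Producers only push; they never pull. */
typedef struct Stream Stream;
typedef void (*Sink)(Stream *s, const char *data, size_t len);

struct Stream {
    Sink sink;      /* where buffered bytes go: file, memory, GUI window, ... */
    void *ctx;      /* sink-specific target, e.g. a FILE* */
    size_t used;    /* bytes currently buffered */
    char buf[16];   /* deliberately tiny so that flushing is visible */
};

static void stream_flush(Stream *s)
{
    if (s->used) {
        s->sink(s, s->buf, s->used);
        s->used = 0;
    }
}

/* Append bytes, handing the buffer to the sink whenever it fills up. */
static void stream_push(Stream *s, const char *data, size_t len)
{
    while (len) {
        size_t room = sizeof s->buf - s->used;
        size_t n = len < room ? len : room;
        memcpy(s->buf + s->used, data, n);
        s->used += n;
        data += n;
        len -= n;
        if (s->used == sizeof s->buf)
            stream_flush(s);
    }
}

/* Sink backed by a FILE*, e.g. stdout or a log file. */
static void file_sink(Stream *s, const char *data, size_t len)
{
    fwrite(data, 1, len, (FILE *)s->ctx);
}

/* Sink that captures output in memory (truncating when full). */
typedef struct {
    char text[256];
    size_t len;
} MemBuf;

static void mem_sink(Stream *s, const char *data, size_t len)
{
    MemBuf *m = s->ctx;
    size_t room = sizeof m->text - 1 - m->len;
    if (len > room)
        len = room;
    memcpy(m->text + m->len, data, len);
    m->len += len;
    m->text[m->len] = '\0';
}
```

For example, a Stream initialized as { file_sink, stdout, 0, "" } writes to the terminal, while one using mem_sink with a MemBuf as ctx collects exactly the same output in memory.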


implementation notes:
- immutable lists
  are just the same as a record embedded by negative counting,
  i.e. an array of fields, with the tag of the first being the negative
  total field count.
- record leader
  The first field of embedded records (the one with the negative tag)
  contains leader-like meta info;
  for database records this is (optional) mfn plus a MARC leader.
  Since there should not be a difference between the representation of
  embedded and first level records, every record has a leader.
- message leader
  A record representing a message also has a leader.
  Where the message is not embedded, it is sent as a leading 0-tagged field.
  Since message leaders start with an alphabetic character,
  the 0 and tab are omitted in the textual representation.
  Message leaders use tabs as separators and start with a word
  indicating the message type to the dispatcher.
  The following subfields are parameters, with or without identifiers.
- getopt command lines
  A command line of the form "command -aopt1 -bopt2 arg1 arg2" can be
  easily and canonically wrapped into one field by removing the '-'
  option indicator and identifying the non-option args as subfield '@'.
  A command line interface thus maps easily, and without the need
  for looking up meta information, to a message leader,
  from which the method identified by "command" can fetch options
  using a getopt-like utility. System and db parameters are likewise
  stored in the options file.
- message body
  Most messages use only one type of record parameter;
  however, special embedded records like indexing instructions
  can be recognized by their leader, where applicable.
  Where a message contains parameter fields (first level, not leader subfields),
  it must use positive tags for them, preferably low numbers.
- direct embedding
  Where a message has no parameter fields,
  i.e. no parameters besides its leader's subfields and embedded records,
  and there is only one parameter record, the message may,
  as a convenient shorthand, allow specifying the embedded record's leader
  (mfn and db for database records) as message options and have its leader
  immediately followed by the record data.
  In other words, the message in a sense embeds itself in its parameter
  record's leader (and has to remove itself before passing it on).
  This is the form used by masterfile metalines (with the 0 omitted).
- system options
  can be specified on the command line or in a system options file.
  There is a global options list (e.g. verbosity) and per db options
  (like file paths and readonly). The command line format is
  "-aglob1 -bglob2 dbname -xdb1 -ydb2 [... dbname ...]".
  The system options file contains (the textual representation of a
  record with 0-tagged) fields, one for each db, wrapped up like
  "dbname xdb1 ydb2" (with tabs). Those options are NOT stored
  in each db's .opt file or meta record.
- database metadata
  contained in the db's .m0d file is basically a chained
  message to the core engine, mostly configuring the "transmission format".
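
The getopt command line mapping can be sketched in C. The sketch follows the rule stated above literally: drop the '-' of each option (so its first character becomes the subfield identifier), mark non-option arguments as subfield '@', and join everything with tabs. It assumes option values are attached as in "-aopt1" and ignores getopt details such as "--" or separate option arguments; leader_from_argv and leader_get are hypothetical helper names, not part of Malete.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: wrap a getopt-style command line into one
   tab-separated leader field, e.g.
   {"command", "-aopt1", "-bopt2", "arg1", "arg2"} becomes
   "command\taopt1\tbopt2\t@arg1\t@arg2".
   Returns the field length, or -1 if it does not fit into out. */
static int leader_from_argv(int argc, char *const argv[], char *out, size_t outsz)
{
    size_t len = 0;
    if (outsz == 0)
        return -1;
    for (int i = 0; i < argc; i++) {
        const char *a = argv[i];
        const char *id = "";              /* extra subfield identifier */
        if (i > 0) {
            if (a[0] == '-' && a[1] != '\0')
                a++;                      /* option: drop the '-' */
            else
                id = "@";                 /* non-option argument */
        }
        size_t sep = i > 0, idlen = strlen(id), alen = strlen(a);
        if (len + sep + idlen + alen >= outsz)   /* keep room for the NUL */
            return -1;
        if (sep)
            out[len++] = '\t';
        memcpy(out + len, id, idlen);
        len += idlen;
        memcpy(out + len, a, alen);
        len += alen;
    }
    out[len] = '\0';
    return (int)len;
}

/* Hypothetical getopt-like lookup: return the value of the first subfield
   whose identifier is id ('@' for plain arguments) and store its length
   in *vlen; NULL if there is none. The command word itself is skipped. */
static const char *leader_get(const char *leader, char id, size_t *vlen)
{
    const char *p = strchr(leader, '\t');
    while (p) {
        const char *sub = p + 1;          /* identifier character */
        const char *next = strchr(sub, '\t');
        if (*sub == id && id != '\0') {
            *vlen = (next ? (size_t)(next - sub) : strlen(sub)) - 1;
            return sub + 1;
        }
        p = next;
    }
    return NULL;
}
```

Looking up 'b' in the resulting leader yields "opt2", and looking up '@' yields the first plain argument, "arg1".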


> http://uk.travel.yahoo.com/t/wc/germany/berlin/nightlife/malete.html Malete

---
$Id: OverView.txt,v 1.4 2004/06/15 13:17:37 kripke Exp $
