Annotation of /openisis/0.9.9e/doc/Protocol.txt

Revision 604
Mon Dec 27 21:49:01 2004 UTC (19 years, 4 months ago) by dpavlin
File MIME type: text/plain
File size: 20039 byte(s)
import of new openisis release, 0.9.9e

1     The Malete server protocol.
2    
3    
4     * introduction
5    
6     The Malete server is based on passing of messages, which are represented
7     as records. The only interface to the server can be regarded as a single
8     function "send", which takes a record as parameter and returns a record.
9     The result record itself is a valid message.
10     This "send" can be actually invoked in one of two ways:
11    
12     - by having the server in process
13     i.e. by actually calling the C function "send",
14     possibly via some wrapper to interface another programming language.
15     This is the way the Malete Tcl extension works.
16     - via some bytestream
17     This can be regarded as just one of the wrappers, interfacing a
18     bytestream by deserializing message records from the bytestream
19     and serializing result records to the bytestream.
20     The standard server process uses stdin and stdout and thus can
21     be invoked by executing it from pipes or by contacting it via TCP,
22     when running from
23     > http://openisis.org/Doc/UcspiSsl tcpserver.
24     As a special case, the record data file itself is such a bytestream,
25     however only containing simple write messages.
26    
27     The server maintains a session state bound to a bytestream,
28     e.g. one TCP connection.
29    
30    
31     * messages and data
32    
33     In Malete, every record has a "header", which is the value of the first field.
34     The header specifies which message the record represents,
35     with the following fields ("body") containing parameter data for the message.
36    
37     Recall that
38     - the first field's tag denotes the number of fields in the record
39     - a "data record" is a record that can be written to a database.
40     This requires a record id (MFN), which, however, can be 0
41     to denote an append with the next available id.
42     - for a data record read from or written to the database,
43     the header will/must be empty or start with a digit.
44     The general format is 'rid[@pos][*TAB*leader]'.
45     Rid is the record id (MFN), which on write may be 0 to append a new record.
46     Pos is the optional old position to guard an updating write
47     against concurrent changes.
48     Leader contains arbitrary data like e.g. a MARC leader,
49     a record key or a message header.
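The header format 'rid[@pos][*TAB*leader]' can be taken apart as in the following minimal Python sketch. The names DataHeader and parse_data_header are hypothetical and not part of Malete; pos is kept as a string because its exact syntax is not specified here, and an empty header is read as record id 0 (an append).

```python
from typing import NamedTuple, Optional


class DataHeader(NamedTuple):
    """Parts of a data record header (hypothetical helper type)."""
    rid: int               # record id (MFN); 0 means "append"
    pos: Optional[str]     # optional old position guarding an update
    leader: Optional[str]  # optional arbitrary leader data


def parse_data_header(header: str) -> DataHeader:
    """Split 'rid[@pos][TAB leader]' into its parts."""
    if header == "":
        # An empty header is treated like an append (rid 0).
        return DataHeader(0, None, None)
    ids, tab, leader = header.partition("\t")
    rid_text, at, pos = ids.partition("@")
    if not (rid_text.isascii() and rid_text.isdigit()):
        raise ValueError("data header must start with a numeric record id")
    return DataHeader(int(rid_text), pos if at else None, leader if tab else None)
```

For example, parse_data_header("42@1024\tkey") yields rid 42, pos "1024" and leader "key".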
50    
51     Proper message headers are not empty and do not start with a digit.
52     The first token of a message header (up to a *TAB* or end of value)
53     is the message name, optionally qualified by a message target,
54     i.e. an object to receive the message (usually a database).
55    
56    
57     However, messages and data are converted into each other canonically:
58     - If a data record header is encountered where a message is expected,
59     it is treated as a write message as if 'W*TAB*' were prepended
60     (which obviously will write just this record).
61     Even the empty message (a record with 0 fields) is a valid message
62     and will append an empty record when sent to a database.
63     - If a message is treated as data, its header is treated as leader
64     as if '0*TAB*' were prepended.
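The two canonical conversions can be sketched in a few lines of Python. The function names are hypothetical; the sketch only looks at the first character of the header, as described above.

```python
def is_data_header(header: str) -> bool:
    """A data header is empty or starts with an ASCII digit."""
    return header == "" or header[0] in "0123456789"


def as_message(header: str) -> str:
    """Data headers become write messages by prepending 'W' and a TAB."""
    return "W\t" + header if is_data_header(header) else header


def as_data(header: str) -> str:
    """Message headers become the leader of an append (record id 0)."""
    return header if is_data_header(header) else "0\t" + header
```

With this, as_message("7\tkey") gives 'W*TAB*7*TAB*key', while as_data("Q\tfoo") gives '0*TAB*Q*TAB*foo'.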
65    
66    
67     * message targets are objects
68    
69     A server processes messages by first looking up a target object by
70     inspecting and stripping an initial addressing part of the message header
71     (or resorting to some default) and then passing the message to this object.
72     (Actually, even this dispatching is done by an object, the session).
73    
74    
75     In general, objects are free in how they process messages.
76     For example, an object might represent a (session on a) remote server,
77     and simply pass every message there. Objects using the same processing
78     function are said to be in the same "class". Commonly processing functions
79     handle only some known messages and pass anything else on to the function
80     of another class, which is called "inheriting from this class".
81    
82    
83     Objects to which messages can be sent are
84     - a structure
85     is a collection of other (child) objects like databases (tables).
86     It does basically nothing but pass messages to its children.
87     It may support a listing of the known children.
88     The structure interface may be implemented locally or as a remote server.
89     - a database (table)
90     supports reading and writing of record and query data.
91     A database is a structure; it may support children, e.g. to provide views.
92     - a session
93     is a structure representing the connection to a (local or remote) server.
94     It passes messages to the server's children (like databases) and maintains
95     some state, called the environment.
96    
97    
98     Any object should recognize
99     - the comment '#'
100     a special message used to pass additional info (echo/error)
101     - rooting '.'
102     the message is passed to the session as is.
103     A session strips the '.' and processes the rest as usual.
104     - options '=' (optional extension)
105     to get or set values of object options (not implemented).
106     - messages starting with other special characters like '|' and ';'
107     are reserved for future special processing
108    
109    
110     A structure in addition recognizes
111     - child addressing '.'
112     if the message name starts with a letter and contains a dot '.',
113     everything up to the dot is taken as the name of a child.
114     After stripping the child qualifier, the message is sent to the child.
115     With no additional message, the child's existence is tested
116     and returned in a comment.
117     The qualification can contain several dots, which are processed from left.
118     Therefore, 'a.b.c' means to send message 'b.c' to target 'a',
119     which could be for example a remote server, which in turn is expected
120     to somehow dispatch message 'c' to its local child 'b'.
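Child addressing, as described above, strips one qualifier at a time from the left. A minimal Python sketch of a single dispatch step follows; split_child is a hypothetical name, and the sketch assumes that only the message name (the part before the first TAB) is inspected.

```python
from typing import Optional, Tuple


def split_child(header: str) -> Tuple[Optional[str], str]:
    """Return (child name, remaining message) for one dispatch step.

    If the message name does not start with an ASCII letter or has no
    dot, no child is addressed and the header is returned unchanged.
    """
    name, tab, rest = header.partition("\t")
    first = name[:1]
    if first.isascii() and first.isalpha() and "." in name:
        child, _, remainder = name.partition(".")
        return child, remainder + tab + rest
    return None, header
```

Applied to 'a.b.c' this returns child 'a' and message 'b.c'; the target 'a' would then repeat the step on its own side.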
121    
122     A session also supports:
123     - default path (optional extension)
124     Similar to a current working directory, a default path can be set
125     as session option '@', which is then lexically prepended to any
126     unrooted request to the session. (not implemented).
127    
128    
129     The standard messages a database should recognize are
130     - the write message W
131     writing one or more records to a database
132     - the read message R
133     reading records by record id
134     - the query message Q
135     to search the query data (btree index)
136     - the index message X
137     to write index data
138    
139    
140    
141     Standard message and object names always start with an ASCII letter.
142     As a convention, message names should start uppercase and
143     object names lowercase.
144    
145     Every message returns an error comment message in case of error
146     or another message as specified (possibly the empty message).
147    
148    
149     The body of a message (i.e. the fields following the header)
150     may define a fixed or variable number of parameter fields
151     or one or more records, which are in turn, depending on the message,
152     used as message or data records (generally regardless of their contents):
153     - header only:
154     The message is not using any fields or records as parameter.
155     Such messages treat any body as embedded records (see below) specifying
156     one or more chained messages, which are then processed in turn.
157     A possible but currently unused generalization of this is
158     a fixed number of parameter fields.
159     - parameter list:
160     the contents of following fields is interpreted by the message itself.
161     Many messages use only one type of parameter fields and ignore their tags.
162     - embedded records:
163     Each of the records begins with a proper header field,
164     with the tag being its negative length (including the header).
165     A tag of 0 is treated as using all available fields.
166     Should such a tag be positive or specify a length
167     exceeding the number of available fields, the result is undefined;
168     expect either an error or a record using all available fields.
169     - immediate record:
170     Some messages also support a short form, where they do not themselves
171     take all of their header, but only chop off some initial part of it,
172     using the remaining message as record.
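The embedded records form can be illustrated with a short Python sketch that cuts a body into records by the negative length tags. Fields are modelled as (tag, value) pairs, which is an assumption of this sketch, and for the undefined cases it chooses to raise an error, one of the two behaviours mentioned above.

```python
from typing import List, Tuple

Field = Tuple[int, str]


def split_embedded(fields: List[Field]) -> List[List[Field]]:
    """Split body fields into embedded records.

    Each record starts with a header field whose tag is the negative
    number of fields in the record (header included); tag 0 means
    "all remaining fields".
    """
    records = []
    i = 0
    while i < len(fields):
        tag = fields[i][0]
        remaining = len(fields) - i
        if tag == 0:
            length = remaining
        elif tag < 0:
            length = -tag
        else:
            raise ValueError("embedded record header has a positive tag")
        if length > remaining:
            raise ValueError("embedded record exceeds available fields")
        records.append(fields[i:i + length])
        i += length
    return records
```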
173    
174    
175     * write
176    
177     The write message takes one of two forms:
178     - short write (immediate record):
179     The header is of the form 'W*TAB*rid[@pos][*TAB*leader]',
180     and the following fields are the body of a record to write.
181     This message writes the record with header 'rid[@pos][*TAB*leader]'
182     and the body as given by the following fields.
183     It returns a short read message with the record id written.
184     - long write (embedded records):
185     The header is a single 'W'. The body contains any number of embedded records.
186     Long write returns a long read message with the record ids written.
187     With an empty body, long write can be used to test the existence
188     and writeability of a database.
189    
190     Note that there is no special support for deleting records;
191     writing empty records has the same effect.
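As an illustration, both write forms can be assembled as (header, fields) pairs in Python. The function names and the (tag, value) field model are assumptions of this sketch; the negative length tags follow the embedded records convention described earlier.

```python
from typing import Iterable, List, Optional, Tuple

Field = Tuple[int, str]


def short_write(rid: int, body: Iterable[Field],
                pos: Optional[str] = None,
                leader: Optional[str] = None) -> Tuple[str, List[Field]]:
    """Build 'W TAB rid[@pos][TAB leader]' with the record body as fields."""
    header = f"W\t{rid}"
    if pos is not None:
        header += f"@{pos}"
    if leader is not None:
        header += f"\t{leader}"
    return header, list(body)


def long_write(records: Iterable[Tuple[str, Iterable[Field]]]) -> Tuple[str, List[Field]]:
    """Build a 'W' message whose body holds embedded records.

    Each record is (data header, body fields); its header field is
    tagged with the negative record length, header included.
    """
    fields: List[Field] = []
    for record_header, body in records:
        body = list(body)
        fields.append((-(len(body) + 1), record_header))
        fields.extend(body)
    return "W", fields
```

A deletion would then simply be short_write(rid, []), i.e. writing an empty record.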
192    
193    
194     * read
195    
196     Like write, the read message takes one of two forms,
197     both returning a long write for the retrieved records:
198     - short read (header only):
199     The header is of the form 'R*TAB*rid[*TAB*count]'.
200     It reads count (default 1) records starting at record rid.
201     A count of 0 reads any records as available and within the read limit.
202     Note that a read of record 0 retrieves the metadata.
203     - long read (parameter list):
204     The header is a single 'R'.
205     The following fields contain one record id each.
206    
207     Note that
208     - the number of records read at once is limited by the session option 'r'
209     - read might retrieve older versions of records,
210     if the database has a snapshot position set
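The two read forms could be built as follows in Python. This is only a sketch: the function names are hypothetical, and the long form uses tag 0 for the id fields on the assumption that the tags of parameter fields are ignored.

```python
from typing import Iterable, List, Optional, Tuple

Field = Tuple[int, str]


def short_read(rid: int, count: Optional[int] = None) -> Tuple[str, List[Field]]:
    """Build 'R TAB rid[TAB count]'; the server default for count is 1."""
    header = f"R\t{rid}"
    if count is not None:
        header += f"\t{count}"
    return header, []


def long_read(rids: Iterable[int]) -> Tuple[str, List[Field]]:
    """Build an 'R' message with one record id per body field."""
    return "R", [(0, str(rid)) for rid in rids]
```

For instance, short_read(0) asks for the metadata record, and short_read(1, 0) asks for as many records as the read limit allows, starting at record 1.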
211    
212    
213     * query
214    
215     The query message is of the form 'Q[*TAB*query]',
216     where query is an expression in the
217     > Query Malete query language.
218     With parameters, the query message creates a new query as the current.
219     With or without parameters, the query message returns an echo
220     of the estimated remaining result set size, followed by a long write
221     containing the next 'r' records from the current query set
222     (subject to a snapshot like read).
223    
224     The query can contain two parts, separated by a '?':
225     - an index based search defining a result set.
226     If it is empty, the search result set is the entire database.
227     - a filter to be applied on record retrieval.
228     If no filter is specified (i.e. no '?'), only record ids are returned.
229     An empty filter selects every record with all fields.
230     Other filters will select records and/or fields.
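The distinction between "no filter" and "empty filter" is easy to get wrong, so here is a small Python sketch of it. The names are hypothetical, and the sketch assumes that the first '?' in the expression is the separator, which the query language itself may define more precisely.

```python
from typing import Optional, Tuple


def split_query(query: str) -> Tuple[str, Optional[str]]:
    """Split a query into (search, filter).

    filter is None when no '?' is present (only record ids are
    returned) and '' when the filter is empty (all records with all
    fields are returned).
    """
    search, sep, record_filter = query.partition("?")
    return search, (record_filter if sep else None)


def query_message(query: Optional[str] = None) -> str:
    """Build 'Q[TAB query]'; without a query the current one continues."""
    return "Q" if query is None else "Q\t" + query
```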
231    
232     In future versions, one or both parts might be specified as embedded
233     records. By now, however, the query message is header only.
234    
235    
236     Note that
237     - the session keeps a total of 'q' queries with the query expression,
238     the cursor (offset of next record to retrieve) and search result set.
239     If a query expression is only a reference '#n' to an open query,
240     this query is used from its current position without establishing
241     a new query.
242     - the size of a search result set is limited by the session option 's'.
243     This limit applies also to any intermediate result, thus the
244     actual set might be much smaller or even empty due to the limit.
245     Some search expressions might allow larger set sizes,
246     especially the empty one does (since no record ids need to be stored).
247    
248     The returned echo contains several numbers:
249     - estimated number of remaining records, including the ones just read.
250     This number may be wrong for a number of reasons; in particular, it does
251     not account for filtering. However, if it equals the number of returned
252     records, it is safe to assume that there are no more records.
253     This number is the primary echo code; if it is negative,
254     the rest of the echo is some error message.
255     - number of the query, by which it can be referenced.
256     These numbers are per database.
257     - truncation record id. If not 0, this is a record id where the search
258     was truncated due to the result set size limit.
259     Future versions might support transparent continuation after truncation.
260    
261    
262     * terms
263    
264     The terms message has one of the forms
265     - 'T*TAB*from*TAB*to'
266     Selects terms greater than or equal to the first parameter and less than the second.
267     Where the second parameter is empty, no upper bound is used.
268     - 'T*TAB*prefix'
269     Selects terms with the parameter as prefix.
270     Using a prefix ABC is just a shorthand for from ABC to ABD.
271     - 'T*TAB*from*TAB*to*TAB*tag'
272     Like the first form, but restrict matches to the given tag (number).
273    
274    
275     Terms are returned as a list (record with 0-tagged fields),
276     where each field value is a count of hits of the term,
277     followed by a *TAB* and the term.
278     The list is limited to the result set size.
279     The full index can be looped by using the last returned term
280     as from parameter for the next invocation.
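The prefix shorthand and the returned list format can be sketched in Python as follows. All names are hypothetical. The upper bound is computed by incrementing the last character's code point, which matches the ABC/ABD example above but assumes plain code point ordering; a database with its own collation may order terms differently.

```python
from typing import Iterable, List, Optional, Tuple


def prefix_bounds(prefix: str) -> Tuple[str, str]:
    """Turn a prefix into (from, to); an empty 'to' means no upper bound."""
    if not prefix:
        return "", ""
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


def terms_message(start: str, stop: str = "", tag: Optional[int] = None) -> str:
    """Build 'T TAB from TAB to[TAB tag]'."""
    header = f"T\t{start}\t{stop}"
    if tag is not None:
        header += f"\t{tag}"
    return header


def parse_terms(values: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'count TAB term' field values into (term, count) pairs."""
    result = []
    for value in values:
        count, _, term = value.partition("\t")
        result.append((term, int(count)))
    return result
```

To walk the full index, a client would call terms_message with the last term of parse_terms as the new from parameter until no further terms come back.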
281    
282    
283     When not restricting to a tag, the hit count is just the number of all
284     index entries for the selected terms. This may be higher than the number
285     of matched records, where a term has multiple hits for the same records.
286    
287     With a restriction to a tag, the count is the actual number of records
288     (even where a term has multiple entries for the same record and tag).
289     If the database uses the traditional fulltext index format (the default),
290     tag 0 selects any tag, else tag 0 selects actual tag 0 entries (unique keys).
291    
292    
293     * index
294    
295     The index message 'X' takes a parameter list of data and control fields.
296     Control fields have tag 0 and change the way the data fields are processed.
297     All other fields contain index data. During processing of the message,
298     a position counter is maintained which is incremented by one for every word
299     (in word or split mode), to the next multiple of the field step (default 65536)
300     for every field (1 in word mode), and reset to 0 on tag change.
301    
302    
303     Every control field contains one or more instructions
304     (as always, separated by TABs):
305     - f[pos]
306     sets default (full field) indexing mode where every data field contains
307     one index entry. The position is set to the given or 0 and then
308     incremented to the field step.
309     - w[pos]
310     Like field mode, but incrementing the position by one.
311     - s[pos]
312     Split mode, where each data field is split into words according
313     to collation info.
314     If the index has no collation info, all characters but the well-known
315     ASCII non-letters are assumed to be word characters.
316     - a[pos]
317     set add mode (default)
318     - d[pos]
319     set delete mode: following index entries are deleted.
320     - m[mode]
321     mode 'H' selects traditional conversion of angle brackets:
322     <a[=b]> is replaced by b (or nothing).
323     mode 'P' or none turns this off.
324     - p*pfx*
325     prepend prefix pfx to index entries
326     - r*id*
327     set record id (defaults to the session's last written record)
328     - [+|-]*tag*
329     where tag is a number, stops processing of the field and treats
330     everything after the next *TAB* as data field with *tag*.
331     With a leading + or -, set mode to add or del, resp.
332    
333     Control instructions may also be part of the message header.
334     The index message echoes a count of the index entries made.
335    
336    
337     * comment
338    
339     The comment message '#' is used to augment other messages.
340     It is header only (executing any body) of the form '#*TAB*code[*TAB*message]',
341     where code is a number.
342     A nonnegative code indicates a success, typically some count.
343     A negative code indicates some sort of error (-1..-10) or notification.
344     Message is arbitrary.
345     This message copies itself to the result.
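A client will usually want to separate the numeric code from the free text. The following Python sketch does that; the names are hypothetical, and treating exactly the range -1..-10 as errors is this sketch's reading of the sentence above.

```python
from typing import Tuple


def parse_comment(header: str) -> Tuple[int, str]:
    """Split '# TAB code[TAB message]' into (code, message)."""
    parts = header.split("\t", 2)
    if parts[0] != "#" or len(parts) < 2:
        raise ValueError("not a comment message with a code")
    message = parts[2] if len(parts) > 2 else ""
    return int(parts[1]), message


def is_error(code: int) -> bool:
    """Codes -1..-10 are read as errors, other negatives as notifications."""
    return -10 <= code <= -1
```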
346    
347    
348     * options
349    
350     Some objects have options, which can be given as subfields
351     in some configuration header for the object and be set and retrieved
352     using the '=' message. The '=' message echoes a comment containing
353     some or all options as subfields.
354    
355     - a single '=' echoes all options
356     - '=' immediately followed by option characters echoes these options
357     - additional subfields set options and, after a single '=', echo these.
358    
359    
360     * special message processing
361    
362     optional extensions
363    
364     There are more special messages envisioned which are used to control or
365     modify the processing of one or more other messages.
366     Given here is a rough sketch as a guide for future implementation,
367     however, this may not yet be implemented and is still subject to change.
368    
369    
370     The pipe '|' reuses the result created by one message as or for another message.
371     It scans its header for occurrences of '*TAB*|' (i.e. tab-separated subfields
372     with subfield code '|'), each of which starts a new submessage.
373     Iteratively, the part of the header up to the next submessage is processed
374     as a message, creating a result.
375    
376     Then if the next submessage
377     (the part of the header starting with the next character after the pipe
378     and extending to the character before the next '*TAB*|' or end of header)
379     - is empty,
380     the result is processed as message.
381     This is convenient to immediately execute the read returned by a query.
382     - starts with a *TAB*,
383     the submessage (including the *TAB*) is appended to the result's header,
384     and the result is processed as message.
385     - else,
386     the result's header is echoed to the final (not the next intermediate)
387     result and then replaced by the submessage before processing.
388    
389     As a special case, if the pipe message header did not contain any '*TAB*|',
390     it is treated as with '*TAB*|' at end, i.e. the only submessage's result
391     is executed (mimicking the effect of backticks).
392    
393     In a long form, where the pipe message header is only the '|',
394     the submessages are embedded records in the body.
395     Here, in each step, any body fields of the following submessage
396     are prepended to the result before execution.
397    
398    
399     The composition ';' processes several messages, appending to the same result.
400     In the long form, submessages are embedded records.
401     In the compact form, the header is split into submessages as for the pipe.
402     (Details to specify).
403    
404    
405     * serialization
406    
407     Messages can be represented in byte streams according to the following rules:
408     - Field values (including the header) MUST NOT contain a newline character,
409     else the results are undefined. Where an application must be prepared
410     to handle newlines, it must take care of encoding them (see below).
411     - If the message header is empty, no header is printed
412     - else if the message is a regular message (not starting with a digit),
413     the header is printed followed by a newline.
414     - else 'W*TAB*' is printed followed by the header and a newline.
415     - All body fields are printed as the tag followed by a *TAB*,
416     the value and a newline.
417     - A single newline is printed to terminate the message.
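The rules above translate almost line by line into code. Here is a minimal Python sketch, assuming a message is modelled as a header string plus a list of (tag, value) fields; the function name is hypothetical, and the sketch rejects newlines instead of producing undefined output.

```python
from typing import Iterable, Tuple


def serialize(header: str, fields: Iterable[Tuple[int, str]]) -> str:
    """Serialize one message, always writing tag and TAB for body fields."""
    fields = list(fields)
    for value in [header] + [value for _, value in fields]:
        if "\n" in value:
            raise ValueError("values must not contain newlines; encode them first")
    lines = []
    if header:
        if header[0] in "0123456789":
            lines.append("W\t" + header)  # data header: make the write explicit
        else:
            lines.append(header)          # regular message header
    for tag, value in fields:
        lines.append(f"{tag}\t{value}")
    # Every line ends with a newline; one extra newline ends the message.
    return "".join(line + "\n" for line in lines) + "\n"
```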
418    
419    
420     On deserialization, if a message starts with a number (digit or -sign),
421     this is the tag of the first body field, and an empty header is to
422     be assumed (equivalent to a 'W*TAB*0' append message).
423    
424     For all body fields, the deserialization must be done in the following steps:
425     - take an initial '-' sign and any digits as tag, defaulting to 0
426     - skip one following *TAB* character
427     - use anything up to a newline as value
428     Consequently, on serialization:
429     - a tag of 0 may and commonly will be omitted
430     - where a value does not start with a TAB,
431     the TAB may be omitted
432     - where a value does not start with a '-', digit or TAB,
433     both a 0 tag and the TAB may be omitted
434     - where values containing newlines are used unencoded,
435     they will in most cases result in following 0 tagged fields.
436     However, omitting the TAB is considered bad style.
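The deserialization steps can be sketched in Python as follows. The names are hypothetical; the sketch reads a single message from a string and stops at the first empty line, and a lone '-' without digits is read as tag 0.

```python
import re
from typing import List, Tuple

# Optional '-' sign, optional digits, one optional TAB, then the value.
_FIELD = re.compile(r"(-?)([0-9]*)\t?(.*)", re.DOTALL)


def parse_field(line: str) -> Tuple[int, str]:
    """Parse one body line into (tag, value), defaulting the tag to 0."""
    sign, digits, value = _FIELD.fullmatch(line).groups()
    tag = int(sign + digits) if digits else 0
    return tag, value


def deserialize(text: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Read one message: lines up to the first empty line."""
    lines = []
    for line in text.split("\n"):
        if line == "":
            break
        lines.append(line)
    header = ""
    # A first line starting with a digit or '-' is already a body field.
    if lines and lines[0][0] not in "0123456789-":
        header = lines.pop(0)
    return header, [parse_field(line) for line in lines]
```

Note that a body line such as 'plain' comes back as (0, 'plain'), which is how the omitted tag and TAB are tolerated.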
437    
438    
439     The record data ("master") file is simply a stream of data record messages,
440     using headerless mode where possible (i.e. appends of leaderless records).
441    
442    
443     Some easy common encodings are suggested to deal with newline characters:
444     - in "field mode",
445     discard newlines by replacing them with spaces or tabs.
446     - in "text mode",
447     newlines are replaced with vertical tabs VT (ASCII 11, ^K).
448     This may be reversed to restore newline-separated lines if needed,
449     but e.g. on printing the VT will have the desired effect.
450     - in "binary mode",
451     newlines are replaced by a VT followed by a byte value 1,
452     if the newline is followed by a byte value 0 or 1, else by a single VT.
453     A VT is replaced by a VT and a 0 byte.
454     - as an "ultra robust binary mode", use BASE64.
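Text mode and binary mode can be sketched in Python on byte strings. The function names are hypothetical. The decoder is derived from the encoding rules above: a VT followed by 0 is a literal VT, a VT followed by 1 is a newline, and any other VT is a newline on its own.

```python
NL, VT = 0x0A, 0x0B


def encode_text(data: bytes) -> bytes:
    """Text mode: newlines become VT; original VTs cannot be told apart."""
    return data.replace(b"\n", b"\x0b")


def encode_binary(data: bytes) -> bytes:
    """Binary mode: fully reversible newline-free encoding."""
    out = bytearray()
    for i, byte in enumerate(data):
        if byte == NL:
            out.append(VT)
            # Disambiguate when the next original byte is 0 or 1.
            if i + 1 < len(data) and data[i + 1] in (0, 1):
                out.append(1)
        elif byte == VT:
            out += bytes((VT, 0))
        else:
            out.append(byte)
    return bytes(out)


def decode_binary(data: bytes) -> bytes:
    """Reverse encode_binary."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != VT:
            out.append(data[i])
            i += 1
            continue
        following = data[i + 1] if i + 1 < len(data) else None
        if following == 0:      # VT 0 -> literal VT
            out.append(VT)
            i += 2
        elif following == 1:    # VT 1 -> newline before a 0 or 1 byte
            out.append(NL)
            i += 2
        else:                   # lone VT -> newline
            out.append(NL)
            i += 1
    return bytes(out)
```

For input without the bytes 0, 1 or 11, encode_text and encode_binary produce identical output, which is why binary mode is a reasonable default.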
455    
456     The advantages of text mode over binary mode are
457     - it is slightly faster than the binary translation
458     - the serialized records do not need more space
459     (whereas the binary serialization might need twice the space)
460    
461     The binary mode has the advantage of not losing vertical tab characters that
462     might have been contained in the original field values.
463     It is fully transparent and can be used to store any binary data like images
464     with an average overhead of 0.4% (as compared to +33% with BASE64 encoding).
465     Note that for a plain text not containing control characters 0, 1 or 11,
466     text and binary mode have the same results, thus it is reasonably safe
467     for client libraries to use binary mode by default on all communication.
468    
469     However, BASE64 has the advantage of even surviving a character set recoding,
470     thus is more robust for databases which may be exchanged internationally.
471     Also the overhead of BASE64 is fixed to 33% (4 bytes for every 3),
472     while the binary mode has a worst case of +100% (on all VTs).
473    
474     ---
475     $Id: Protocol.txt,v 1.12 2004/06/15 11:11:16 kripke Exp $
