
Annotation of /openisis/current/doc/Serialized.txt


Revision 237
Mon Mar 8 17:43:12 2004 UTC (20 years, 1 month ago) by dpavlin
File MIME type: text/plain
File size: 10001 byte(s)
initial import of openisis 0.9.0 vendor drop

1 dpavlin 237 ISIS records serialized
2    
3     Serialization means to convert the internal representation
4     of an ISIS record to a sequence of bytes ("octets") suitable
5     to be stored in a file or transferred via a network.
6    
7     The serialization format described here is used by OpenIsis
8     for both database master files and network communications.
9    
10    
11     * design goals
12    
13     The serialization format should be
14     - easy to use
15     for programmers and tool writers
16     - efficient
17     in execution time and space used
18     - robust
19     a broken masterfile should be fixed using a text editor
20     - versatile
21     can be used for a variety of applications using a variety of tools
22     - without limits
23     in number and size of records and fields
24    
25    
26     * basic format
27    
28     In general, a record is serialized by
29     - serializing meta information
30     - serializing the fields in order
31     - appending a blank line
32    
33     Fields are serialized as
34     - the field tag printed using ASCII decimal digits
35     (optionally preceded by a minus sign, if negative tags are allowed)
36     - a (horizontal) TAB character (ASCII value 9, ^I)
37     - the field value
38     - a newline character (ASCII value 10, ^J)
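
The field and record layout above can be sketched in Python as follows. This is a minimal illustration, not OpenIsis code: the names serialize_field and serialize_record are hypothetical, and field values are assumed to contain no newline characters (newline handling is a separate concern, see "newline conventions").

```python
def serialize_field(tag: int, value: bytes) -> bytes:
    """Serialize one field as: decimal tag, TAB, value, newline."""
    if b"\n" in value:
        # Assumption of this sketch: newlines were already translated.
        raise ValueError("value contains a newline; translate it first")
    return str(tag).encode("ascii") + b"\t" + value + b"\n"


def serialize_record(fields) -> bytes:
    """Serialize (tag, value) pairs in order, then append a blank line."""
    return b"".join(serialize_field(t, v) for t, v in fields) + b"\n"
```

For instance, serialize_record([(24, b"foo"), (-1, b"x")]) yields the two field lines followed by one empty line.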
39    
40     Metadata is serialized in the same way,
41     using special tags according to the needs of the environment.
42     Two situations are distinguished:
43     - "soft" metadata, which may and should be accessible as part of the record.
44     This is encoded by convention using negative tags.
45     An example of this is HTTP and other MIME-style communication,
46     where the MIME headers like "User-agent" or "Date" are encoded
47     in such a way, while content data like GET or POST parameters
48     should be mapped to positive IDs.
49     - "hard" metadata, which must not interfere with the record contents
50     in order for the environment to work properly.
51     This is encoded using a single non-digit character instead of tag digits.
52     An example of this is the MFN in a master file and information
53     regarding record deletion or update.
54    
55     The final blank line may be omitted,
56     where only a single record is contained in an otherwise delimited byte sequence.
57    
58     A reader should support a lazy mode,
59     allowing the TAB to be omitted, where unambiguous
60     (the field value does not start with a digit or a TAB).
61     Writers, however, are strongly urged to write the TAB.
62    
63     Tags with leading zeros are allowed (typically with %03d)
64     and must be interpreted as decimal, not as octal (atoi does this).
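
A lazy reader for a single field line might look like the sketch below. The function name parse_field_line and the regular expression are assumptions made for illustration. Lines carrying "hard" metadata (a non-digit character in place of the tag) are deliberately rejected here.

```python
import re

# Optional minus sign, decimal digits, optional TAB, then the value.
_FIELD = re.compile(rb"(-?[0-9]+)\t?(.*)", re.DOTALL)


def parse_field_line(line: bytes):
    """Return (tag, value) for one field line without its trailing newline."""
    m = _FIELD.fullmatch(line)
    if m is None:
        raise ValueError("not a field line")
    # Base 10 is given explicitly, so "024" is read as 24, never as octal.
    tag = int(m.group(1).decode("ascii"), 10)
    return tag, m.group(2)
```

Without the TAB, a value starting with a digit would be absorbed into the tag, which is why writers should always emit the TAB.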
65    
66    
67     * newline conventions
68    
69     Two ways are supported to deal with newline characters in field values:
70     - in "text mode", newlines are replaced with vertical tabs (ASCII 11, ^K)
71     on serialization and vice versa on deserialization.
72     - in "binary mode", newlines are replaced by newline-TAB sequences.
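
Both translations are simple byte substitutions on the field value. The following sketch uses hypothetical function names and operates on a single value, before the terminating newline of the field line is added (or after it is stripped).

```python
NEWLINE = b"\n"      # ASCII 10
VTAB = b"\x0b"       # ASCII 11, vertical tab


def encode_text(value: bytes) -> bytes:
    """Text mode: newlines become vertical tabs."""
    return value.replace(NEWLINE, VTAB)


def decode_text(data: bytes) -> bytes:
    """Text mode: vertical tabs become newlines again."""
    return data.replace(VTAB, NEWLINE)


def encode_binary(value: bytes) -> bytes:
    """Binary mode: each newline is followed by a TAB (continuation line)."""
    return value.replace(NEWLINE, NEWLINE + b"\t")


def decode_binary(data: bytes) -> bytes:
    """Binary mode: drop the TAB that follows each newline."""
    return data.replace(NEWLINE + b"\t", NEWLINE)
```

Note that decode_text(encode_text(v)) differs from v whenever v itself contains a vertical tab, while the binary pair round-trips any byte sequence.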
73    
74     As an alternative to these protocol-level modes one may choose "field mode"
75     at the application level: simply claim that you are not interested in
76     newlines and replace any by tab or space (w/o ever converting back).
77    
78    
79     The advantages of text mode over binary mode are
80     - it is slightly faster than the binary translation
81     - the serialized records do not need more space
82     than the internal representation
83     (whereas the binary serialization might need nearly twice as much
84     in worst case)
85     - it is easily used with line-oriented utilities like grep or sed,
86     since each field is contained within one line
87    
88     The binary mode (which resembles MIME continuation lines) has the
89     advantage of not losing vertical tab characters that might have
90     been contained in the original field values.
91     It is fully transparent and can be used to store any binary data like images
92     with an average overhead of 0.4%
93     (as compared to +33% needed by BASE64 encoding).
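
The overhead figures can be checked with a small calculation. The sketch assumes that every byte value occurs equally often in the data, which is an assumption of this example and not a statement from the format itself; it also ignores the few bytes needed for tag and TAB.

```python
import base64


def encode_binary(value: bytes) -> bytes:
    """Binary mode translation: newline becomes newline-TAB."""
    return value.replace(b"\n", b"\n\t")


def overhead(original: bytes, encoded: bytes) -> float:
    """Relative growth of the encoded form over the original."""
    return len(encoded) / len(original) - 1.0


# Every byte value equally often; the length is a multiple of 3,
# so BASE64 needs no padding.
sample = bytes(range(256)) * 3

binary_overhead = overhead(sample, encode_binary(sample))      # 1/256
base64_overhead = overhead(sample, base64.b64encode(sample))   # 1/3
worst_case = overhead(b"\n" * 100, encode_binary(b"\n" * 100)) # 1.0
```

One byte in 256 is a newline under this assumption, giving about 0.4%; a value made only of newlines doubles in size, which is the worst case mentioned for binary mode.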
94    
95    
96     The OpenIsis server automatically detects binary mode,
97     if the client uses a continuation line.
98    
99    
100     * masterfile format
101    
102     A basic masterfile consists of a blank-line separated series of records.
103    
104     The first record is the "controlling record",
105     containing descriptive information such as the newline convention,
106     the subfield separator and the character encoding.
107     All of this is optional; the masterfile might just start with a blank line.
108    
109     The MFNs are then assigned implicitly in order, starting from 1.
110     There is no distinction between
111     empty (consecutive blanklines) and deleted records.
112     There is absolutely no redundant information contained.
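
Reading such a basic masterfile takes only a few lines. The sketch below assumes a text mode file without any meta lines (the state after compression); the function name and return shape are hypothetical.

```python
def read_basic_masterfile(data: bytes):
    """Return (control_lines, {mfn: [field lines]}) for a basic masterfile."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()              # produced by the final newline, not a blank line
    records, current = [], []
    for line in lines:
        if line == b"":          # a blank line terminates the current record
            records.append(current)
            current = []
        else:
            current.append(line)
    if current:                  # final blank line was omitted
        records.append(current)
    control = records[0] if records else []
    # MFNs are implicit: the record after the controlling record is MFN 1.
    by_mfn = {mfn: rec for mfn, rec in enumerate(records[1:], start=1)}
    return control, by_mfn
```

An empty list for some MFN stands for an empty or deleted record; the two cannot be told apart.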
113    
114     Masterfile compression creates this state
115     (however, it may choose to use special meta lines for long ranges
116     of deleted records, see below -- but those are inefficient in
117     the Xref, anyway).
118     Such a masterfile can be very easily created by any tool (like Perl and such).
119     The Xref file can be recreated easily, quickly and reliably.
120    
121    
122     When writing to the database, information is ALWAYS appended to the end.
123     There is NO OVERWRITING of any data, ever, period.
124     That way data CAN'T BE DESTROYED by any operation
125     (one could advise the operating system to set a mandatory read lock on that),
126     and all changes are easily traced using tail -f.
127    
128     A binary mode masterfile starts with a "broken" continuation line
129     containing a single TAB.
130    
131    
132     * basic mode writing
133    
134     In metalines, all numbers are given in decimal digits
135     and multiple items are separated by TABs.
136     The optional timestamps are an arbitrary prefix of generalized time format
137     YYYYMMDDhhmmssttt... as of 'date +%Y%m%d%H%M%S' plus milliseconds ttt,
138     optionally followed by up to 14 characters to create a unique request id.
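
A timestamp of this shape could be produced as sketched below. The function name is hypothetical, and the use of UTC is an assumption of this example; the text above does not name a time zone.

```python
from datetime import datetime, timezone


def make_timestamp(now=None, request_id: str = "") -> str:
    """Return YYYYMMDDhhmmssttt, optionally followed by a request id."""
    if len(request_id) > 14:
        raise ValueError("request id is limited to 14 characters")
    if now is None:
        now = datetime.now(timezone.utc)   # assumption: UTC
    millis = now.microsecond // 1000
    return now.strftime("%Y%m%d%H%M%S") + f"{millis:03d}" + request_id
```

The request id should contain neither TAB nor newline, since TABs separate the items of a meta line.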
139    
140     Write operations use the following meta line (preceding record data):
141     - W mfn [oldpos [timestamp]]
142     followed by new record data denotes a write of record mfn.
143     Depending on the needs of the environment,
144     the byte offset of the last version might be added in order to
145     support access to old versions (e.g. for delayed index update).
146     oldpos may be given as position[.length[.fields]]
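
Parsing a W line might be sketched as follows, assuming the operator and all items are separated by single TABs. WriteMeta and parse_write_meta are hypothetical names; the line is passed without its trailing newline.

```python
from typing import NamedTuple, Optional


class WriteMeta(NamedTuple):
    mfn: int
    position: Optional[int] = None
    length: Optional[int] = None
    fields: Optional[int] = None
    timestamp: Optional[str] = None


def parse_write_meta(line: str) -> WriteMeta:
    """Parse 'W mfn [oldpos [timestamp]]' with TAB-separated items."""
    items = line.split("\t")
    if items[0] != "W" or not 2 <= len(items) <= 4:
        raise ValueError("not a W meta line")
    mfn = int(items[1], 10)
    oldpos = [None, None, None]          # position, length, fields
    if len(items) >= 3:
        parts = items[2].split(".")
        if len(parts) > 3:
            raise ValueError("oldpos has at most three parts")
        for i, part in enumerate(parts):
            oldpos[i] = int(part, 10)
    timestamp = items[3] if len(items) == 4 else None
    return WriteMeta(mfn, oldpos[0], oldpos[1], oldpos[2], timestamp)
```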
147    
148     A W with no data following is mostly equivalent to a delete.
149     If an otherwise written mfn is higher than the highest known,
150     the highest known (and thus the implicit counter) is set to this.
151    
152     A lazy reader should not require meta lines to be followed by a blank line,
153     where unambiguous.
154     For writers, however, the blank line is strongly recommended.
155    
156    
157     A reasonable size limit on metalines is 127(+newline), since
158     - 22 = 1+1+20 for operator, tab and mfn (64 bits ~ 20 digits)
159     - 43 = 1+20+1+10+1+10 for tab position.length.fields
160     - 32 = 1+17+14 for tab, milliseconds time and id
161     totals to 97, so we have some room left.
162    
163    
164     * advanced mode
165    
166     For advanced space efficiency at the cost of increased read access time,
167     the following lines may be used:
168     - D mfn [oldpos [timestamp]]
169     a record entry consisting of only a D line denotes the deletion of
170     record number mfn.
171     Basically equivalent to a write with no data following,
172     but a little bit more explicit.
173     - I mfn [timestamp]
174     (set id) override the implicit MFN counter,
175     e.g. after a series of deleted rows or just to be explicit.
176     Basically equivalent to a write with no old pos,
177     but a little bit more explicit.
178     Recommended when appending records after some deletes.
179     - C mfn oldpos [timestamp]
180     introduces a series of patch commands specifying how record mfn
181     was changed.
182    
183     Software writing the masterfile will typically choose to
184     write full updates and MUST provide a switch to force this,
185     in order to be compatible with basic mode readers.
186    
187     However, the patch command language is particularly useful
188     with server operations, to avoid the need for read-write
189     sequences with some sort of locking.
190    
191    
192     * the patch language
193    
194     The patch commands are lines starting with special
195     characters like +, -, ~ and so on,
196     followed by (an optional TAB and) field addresses, TAB and field data.
197    
198    
199     The simplest case is the '+' command,
200     meaning that its data is to be appended to the record.
201     The + and TAB may be omitted
202     (both, in order to not be confused with a continuation line).
203     In other words, the add command may look exactly like an ordinary
204     field line.
205    
206     A series of '=' commands works exactly like the set operation in an
207     OpenIsis Tcl record. Especially, field indexes and subfields are supported.
208     The '-' command resembles the del operation.
209    
210     A detailed description of the "patch language" is to be done.
211    
212    
213     example
214     $
215     C 1234
216     = 24 foo
217     = 24 bar
218     25 baz
219     $
220     changes record 1234 by setting the first two occurrences of
221     field 24 to foo and bar, respectively, deleting any other occurrences
222     of field 24, and appending a field 25 with value baz.
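
Since the patch language is not specified in detail, the following is only a sketch of the behaviour that this one example describes. The function apply_patch and its data shapes are hypothetical; in particular, appending '=' values for which no existing occurrence is left is an assumption. Field indexes, subfields and the '-' command are not covered.

```python
def apply_patch(record, commands):
    """record: list of (tag, value); commands: list of (op, tag, value).

    Only '=' (set) and '+' (append) are handled in this sketch.
    """
    sets = {}    # tag -> new values, in command order
    adds = []    # fields to append at the end
    for op, tag, value in commands:
        if op == "=":
            sets.setdefault(tag, []).append(value)
        elif op == "+":
            adds.append((tag, value))
        else:
            raise ValueError("unsupported patch command: %r" % op)
    used = {tag: 0 for tag in sets}
    result = []
    for tag, value in record:
        if tag not in sets:
            result.append((tag, value))      # untouched field
        elif used[tag] < len(sets[tag]):
            result.append((tag, sets[tag][used[tag]]))
            used[tag] += 1
        # else: surplus occurrence of a set field is deleted
    for tag, values in sets.items():
        # Assumption: set values without an existing occurrence are appended.
        result.extend((tag, v) for v in values[used[tag]:])
    result.extend(adds)
    return result
```

Applied to a record with three occurrences of field 24, the commands of the example leave two of them (foo and bar), keep all other fields, and append field 25.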
223    
224    
225     * the pointer file
226    
227     The pointer file is an array of n-byte (n >= 6) entries,
228     the ith entry referencing mfn i, similar to the traditional .XRF (crossref).
229    
230     The n=k+l+m bytes specify two or three numbers (in native byte order)
231     of up to 8 bytes each:
232     - the first k bytes (k >= 4) give the position of the record
233     (or its last update or change entry)
234     - the next l bytes (l >= 2) give the length of the record
235     (excluding the last field's terminating newline and following blank line)
236     - the final m bytes (m >= 0) give the number of fields.
237     If m is 0, or all bits in a field number are set (for large records),
238     the reader has to determine the number of fields by inspecting the record.
239    
240     The first six bytes of the first entry describe the detailed layout.
241     Four bytes are the "magic number" containing the ASCII characters "ISIX".
242     Two bytes are the number (m*256 + l*16 + k) in native byte order.
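
The header and entry layout can be sketched with Python's struct module and int.to_bytes. All function names are hypothetical; the "=" prefix selects native byte order without padding. Decoding k and l from four bits each relies on the stated limit of 8 bytes per number.

```python
import struct
import sys

MAGIC = b"ISIX"


def pack_header(k: int, l: int, m: int) -> bytes:
    """Six bytes: magic number, then m*256 + l*16 + k in native byte order."""
    return struct.pack("=4sH", MAGIC, m * 256 + l * 16 + k)


def unpack_header(data: bytes):
    """Return (k, l, m) from the first six bytes of the pointer file."""
    magic, code = struct.unpack("=4sH", data[:6])
    if magic != MAGIC:
        raise ValueError("bad magic number or unsupported byte order")
    return code & 0xF, (code >> 4) & 0xF, code >> 8


def pack_entry(position, length, nfields, k, l, m) -> bytes:
    """One n-byte entry (n = k+l+m); nfields is ignored when m is 0."""
    entry = position.to_bytes(k, sys.byteorder)
    entry += length.to_bytes(l, sys.byteorder)
    if m:
        entry += nfields.to_bytes(m, sys.byteorder)
    return entry


def unpack_entry(entry: bytes, k, l, m):
    """Return (position, length, nfields); nfields is None when m is 0."""
    position = int.from_bytes(entry[:k], sys.byteorder)
    length = int.from_bytes(entry[k:k + l], sys.byteorder)
    nfields = int.from_bytes(entry[k + l:k + l + m], sys.byteorder) if m else None
    return position, length, nfields
```

With k/l/m = 4/3/1 the layout number is 1*256 + 3*16 + 4 = 308, and the entry for mfn i starts at byte offset i*8.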
243    
244     The minimum case k=4, l=2 imposes the limits of traditional ISIS.
245     Actually the lower limits are not enforced;
246     in a very specialised application one might want to use k/l/m = 3/1/0.
247     Recommended are at least eight bytes as k/l/m = 4/3/1 or 4/4/0
248     or, with large file support, 5/3/0.
249    
250    
251     However, when any of the limits is reached,
252     or an unsupported combination or byte order is found,
253     the Xref can easily be recreated with greater values.
254     The 12 byte pointer with k/l/m = 6/4/2 will be enough for
255     even gigantic databases of a quarter petabyte (262,144 gigabytes).
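
The size limits follow directly from the number of position bytes k, as this short calculation shows. The helper name is hypothetical, and gigabytes are counted in units of 2^30 bytes.

```python
def max_masterfile_bytes(k: int) -> int:
    """Number of distinct byte positions addressable with k position bytes."""
    return 256 ** k


GIB = 2 ** 30

limit_k4 = max_masterfile_bytes(4) // GIB   # 4 gigabytes
limit_k5 = max_masterfile_bytes(5) // GIB   # 1024 gigabytes
limit_k6 = max_masterfile_bytes(6) // GIB   # 262144 gigabytes, a quarter petabyte
```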
256    
257     The number of fields is redundant, but as an optimization
258     may make life a little bit easier for a reader.
259     If the pointer structure has m>0 (typically m=1),
260     a value of 0 must be stored if the number of fields exceeds the representable
261     range and a reader should be prepared to figure it out itself in that case.
262    
263    
264     ---
265     $Id: Serialized.txt,v 1.8 2003/05/30 13:26:34 kripke Exp $
