/[webpac]/openisis/current/doc/Views.txt
Revision 237
Mon Mar 8 17:43:12 2004 UTC by dpavlin
File MIME type: text/plain
File size: 10096 byte(s)
initial import of openisis 0.9.0 vendor drop

Using views in OpenIsis.

A "view", like a VIEW in SQL, creates new, typically temporary records based on
existing ones by means of some transformation like selecting a subset of the
available fields (a projection), retagging fields or manipulating field values.


As a general concept, a view can be implemented using any algorithm
in any of the available programming languages to create new records
(and need not refer only to record contents, but may also access other
resources like files).

In a more narrow sense, however, a view is a special kind of transformation
defined by a "view record". The fields of a view record have tags
as they should appear in the target, typically some valid tags of the source
plus, for example, index control tags, if the view describes indexing.


In the following, the term "alphanumeric" denotes any ASCII letter or digit,
or any non-ASCII character.
"Word character" denotes any alphanumeric, hyphen '-' or underscore '_'.


The value of a view record field can have one of several forms:
- if it is empty,
  the tag is passed to the source record's v command (see below).
- if it starts with a %,
  the rest of the value (without the %) is passed to the source record's v command.
  If the tag is not 0, '=tag;' is prepended.
- if the value starts with any word character,
  it is used literally.
- if it starts with a quote,
  the rest of the value is used literally (without the quote).
  If the value's last character is a quote, it is discarded.
- if it starts with an @,
  the rest of the value names a view to be included.
- if it starts with an &,
  the rest of the value is the name of an extension exit to call.
- if it starts with a {,
  the rest of the value is a script to be executed in the host language
  (after stripping an optional } as last character).
- any other form
  (i.e. starting with other ASCII punctuation) is reserved for future use.

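The dispatch on the first character can be sketched in Python roughly as
follows. This is only an illustration, not the OpenIsis implementation: the
function names and the returned (kind, payload) pairs are hypothetical, and it
assumes that "a quote" means either ' or ".

```python
def is_word_char(ch):
    # Word character as defined above: ASCII letter or digit,
    # any non-ASCII character, hyphen or underscore.
    return ord(ch) > 127 or ch.isalnum() or ch in "-_"


def classify_view_value(tag, value):
    """Return a hypothetical (kind, payload) pair for one view record field."""
    if value == "":
        # empty value: the tag itself becomes the v command format
        return ("v", str(tag))
    head, rest = value[0], value[1:]
    if head == "%":
        # '=tag;' is prepended unless the tag is 0
        prefix = "" if str(tag) == "0" else "=%s;" % tag
        return ("v", prefix + rest)
    if is_word_char(head):
        return ("literal", value)
    if head in "\"'":  # assumption: both quote characters count
        if rest and rest[-1] in "\"'":
            rest = rest[:-1]
        return ("literal", rest)
    if head == "@":
        return ("include", rest)
    if head == "&":
        return ("exit", rest)
    if head == "{":
        if rest.endswith("}"):
            rest = rest[:-1]
        return ("script", rest)
    # other ASCII punctuation is reserved for future use
    return ("reserved", value)
```

For instance, classify_view_value(24, "%^a") yields ("v", "=24;^a").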

Example: the view
$
24
70
$
is a simple projection selecting fields 24 and 70 from the source.

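As an illustration only, such a projection could behave like the following
Python sketch. It assumes a record is modelled as a list of (tag, value) pairs
and that output follows the order of the view's fields; both are assumptions
made for the example, and project is a hypothetical helper name.

```python
def project(record, view_tags):
    """Keep only the fields whose tag is listed in the view, in view order."""
    return [(tag, value)
            for wanted in view_tags
            for tag, value in record
            if tag == wanted]


source = [(24, "Title"), (26, "Publisher"), (70, "Smith"), (70, "Jones")]
target = project(source, [24, 70])
```

Here target contains the field 24 and both occurrences of field 70, while
field 26 is dropped.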

* the v command

The v command is described here as an abstract command.
It is available in the C-API as well as from the language bindings,
possibly with language-specific variations.

It resembles the core concepts of traditional formatting,
including access to and looping over fields and subfields,
selecting substrings and attaching optional literals.
It is sort of the record's printf.
Like printf, and unlike traditional formatting,
it supports neither flow control nor screen rendering.


It takes a source and target record plus a string specifying a format.
Depending on the language environment, the source and/or target may be implicit.

If the format starts with '=tag;', where tag is a tag,
this gives the tag used in the target and as default.
Otherwise, tags from the source are used in the target and the default is *.

The first (next) character is then checked for an encoding mode, see below.

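A minimal Python sketch of this prefix handling, assuming the encoding mode
characters ?, !, & and % listed further below. The name split_format_prefix
and its return shape are hypothetical and not part of any OpenIsis binding.

```python
ENCODING_MODES = "?!&%"


def split_format_prefix(fmt):
    """Split off an optional '=tag;' prefix and an optional encoding mode.

    Returns (target_tag, encoding_mode, remaining_format); the first two
    are None when absent.
    """
    target_tag = None
    if fmt.startswith("="):
        end = fmt.find(";")
        if end != -1:
            target_tag = fmt[1:end]
            fmt = fmt[end + 1:]
    encoding = None
    if fmt and fmt[0] in ENCODING_MODES:
        encoding, fmt = fmt[0], fmt[1:]
    return target_tag, encoding, fmt
```

With this sketch, '=24;&70^a' splits into target tag 24, encoding mode & and
the remaining format 70^a.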

The format is a series of output specifications,
each consisting of a field tag (word characters, either numerical or by field name),
selectors and modifiers. The special tag * selects all fields.
Each spec may contain several subspecs, separated by commas,
using the same child context (otherwise, specs and subspecs are the same).
So the format is spec[;spec...], and a spec is subspec[,subspec...].


The general operation of the v command is to loop over the record
until the last occurrence has been seen for all tags.
In the nth repetition, for each tag in any spec,
the (n+i)th occurrence of a field with this tag is used,
where i is an offset given by an occurrence selector,
and it is determined whether this is the last occurrence.
For every iteration, a new output field is started,
and the format is processed as follows:
- loop over the (main) specifications
- loop over children (or use the given field)
- loop over subspecs
- loop over subfields (or use the whole field)
- apply decoding
- apply substring
- apply encoding
- attach literals
- append the result to the target record

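The outer loop over occurrences might be pictured like this in Python. The
sketch assumes a record is a list of (tag, value) pairs and that all
occurrence offsets i are 0; it covers only the repetition logic, not the
processing of specs. iterate_occurrences is a hypothetical name.

```python
def iterate_occurrences(record, tags):
    """Yield, for each repetition n, a dict mapping every tag to its
    nth occurrence, or None once that tag has no more occurrences."""
    by_tag = {t: [value for tag, value in record if tag == t] for t in tags}
    # repeat until the last occurrence has been seen for all tags
    rounds = max((len(values) for values in by_tag.values()), default=0)
    for n in range(rounds):
        yield {t: (values[n] if n < len(values) else None)
               for t, values in by_tag.items()}


record = [(24, "Title"), (70, "Smith"), (70, "Jones")]
rounds = list(iterate_occurrences(record, [24, 70]))
```

In this example there are two repetitions: the first sees the title and
"Smith", the second sees no field 24 and "Jones".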

Each spec starts with an optional decoding mode,
optionally followed by a tag,
optionally followed by a child selector,
optionally followed by a subfield selector,
optionally followed by string modifiers,
optionally intermingled with occurrence selectors and literals:
- , starts a new subspec
- ; starts a new spec with default context reset to the last tag seen
- . starts a child selector
- ^% start a subfield selector
- ([ start an occurrence selector
- /~"'`|+ start a literal
- : starts a substring selector
- & calls an extension
- { evaluates a script


* encoding mode

One of the following operators as first character of the format
can select an output "encoding":
- ? outputs a 1 if the selected entity exists, 0 otherwise
- ! the opposite of ?
- & applies HTML encoding
- % applies URL encoding

The test encodings ?! inhibit normal processing;
they return immediately after checking the first occurrence of the first tag.
For example, using a default of all tags (*), the format consisting
solely of a '?' checks whether a record is empty.

More special characters (but not the '*') may be designated in the future,
so a format should always start with a tag (possibly an explicit *).

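The effect of the four encoding modes can be approximated in Python as below.
This is a sketch under assumptions: a missing entity is represented as None,
and the exact character sets escaped by the HTML and URL encodings are those
of the Python standard library, which may differ from what OpenIsis does.

```python
import html
from urllib.parse import quote


def apply_encoding(mode, value):
    """Apply one encoding mode to a selected value (None = does not exist)."""
    if mode == "?":
        return "1" if value is not None else "0"
    if mode == "!":
        return "0" if value is not None else "1"
    if value is None:
        return ""
    if mode == "&":
        return html.escape(value)   # HTML encoding
    if mode == "%":
        return quote(value)         # URL encoding
    return value                    # no encoding mode given
```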

* decoding mode

An uppercase character before the tag may denote a decoding mode:
$
- H heading mode:
  ^x is replaced by ';' for x=a, ',' for x=b..i, '.' for others
  angle brackets are removed (>< replaced by '; '), <a> or <a=b> evaluates to a

- D data mode:
  in addition to heading mode, if there is no explicit literal after this field,
  append ' ' if it ends in "punctuation", or '. ' otherwise.

- X index mode
  like heading, but <a> evaluates to nothing and <a=b> to b

- M traditional
  For compatibility, specs reading MHx or MDx (x = L or U) set heading
  or data mode, respectively, as default processing (before substringing).
  The case directive is ignored.
$

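Heading and data mode, read literally from the rules above, could be sketched
in Python as follows. Several points are assumptions for the sake of the
example: subfield characters are compared case-insensitively, "punctuation"
is taken to be ASCII punctuation, and the function names are hypothetical.

```python
import re
import string

_ANGLE = re.compile(r"<([^<>=]*)(?:=([^<>]*))?>")


def _subfield_mark(match):
    x = match.group(1).lower()
    if x == "a":
        return ";"
    if "b" <= x <= "i":
        return ","
    return "."


def decode_heading(value):
    # <a> and <a=b> both evaluate to a. Each result is wrapped in private
    # markers first, so that adjacent groups ("><") can be joined by '; '.
    value = _ANGLE.sub(lambda m: "\x00" + m.group(1) + "\x01", value)
    value = value.replace("\x01\x00", "; ")
    value = value.replace("\x00", "").replace("\x01", "")
    # ^x becomes ';', ',' or '.' depending on x
    return re.sub(r"\^(.)", _subfield_mark, value)


def decode_data(value):
    # data mode: heading mode plus a trailing ' ' or '. '
    text = decode_heading(value)
    if text and text[-1] in string.punctuation:
        return text + " "
    return text + ". "
```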

* child selector

If a tag is immediately followed by a dot '.' and an optional tag,
the field context is switched, for this spec and following specs separated by ',',
to loop over the children with the given tag.
The tag defaults to 0, selecting text nodes in the canonical XML representation.
A * selects all children, a second . recursively selects all children.


* subfield selectors

The primary subfield selector is the hat '^', followed by one character.
It can produce multiple items, like repetitions of a subfield or keywords.

If the selector character is
- alphanumeric,
  it selects the (repetitions of the) subfield tagged with this character.
- an opening pairing brace,
  i.e. one of '(','{','[' or the angle bracket '<',
  it selects words between pairs of this brace (commonly keywords).
- a *,
  it selects the part up to the first subfield delimiter.
- a space,
  it selects naive words as sequences of alphanumerics.
- a ),
  it selects parts between TABs (array mode).
- other punctuation
  like / or |, it selects parts between pairs of this character.

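The hat selector cases can be mimicked with a few regular expressions. The
Python sketch below is an approximation for illustration: it assumes '^' is
the subfield delimiter, matches subfield characters case-sensitively, and
uses the hypothetical name hat_select.

```python
import re

PAIRS = {"(": ")", "{": "}", "[": "]", "<": ">"}


def hat_select(value, ch):
    """Return the list of items selected by '^' followed by ch."""
    if ord(ch) > 127 or ch.isalnum():
        # all repetitions of the subfield tagged with ch
        return re.findall(r"\^" + re.escape(ch) + r"([^^]*)", value)
    if ch == "*":
        # the part up to the first subfield delimiter
        return [value.split("^", 1)[0]]
    if ch == " ":
        # naive words: runs of alphanumerics
        return re.findall(r"[^\W_]+", value)
    if ch == ")":
        # array mode: parts between TABs
        return value.split("\t")
    if ch in PAIRS:
        close = re.escape(PAIRS[ch])
        return re.findall(re.escape(ch) + "([^" + close + "]*)" + close, value)
    # other punctuation: parts between pairs of this character
    mark = re.escape(ch)
    return re.findall(mark + "(.*?)" + mark, value)
```

For example, hat_select("<water> and <soil>", "<") returns the two keywords
water and soil.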

The percent sign '%' (think printf) works basically like the hat, but
- removes quotes surrounding values
- by default treats the TAB as subfield delimiter
- if followed by a punctuation character or space,
  treats this plus surrounding whitespace as delimiter,
  not separating within quotes.
- if followed by a ),
  (optionally after another punctuation character) goes to array mode,
  that is, no subfield indicator is stripped from the values
- if followed by multiple word characters,
  (including '-' and '_', optionally after an initial punctuation character)
  searches for subfields starting with that sequence followed by '=' or ':'

Examples:
- '^)' splits at TABs
- '%)' splits at TABs with quote removal
- '%a' selects a sequence following a TAB and 'a'
- '%,)' splits a line of comma separated values
- '%;*' selects the primary value of a MIME property
- '%;charset' selects the charset attribute of a MIME property

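To make the '%,)' and '%;charset' examples concrete, here is a rough Python
sketch of quote-aware splitting and lookup by name. It assumes double quotes
only and a single-character delimiter; percent_split and percent_named are
hypothetical helper names, not OpenIsis functions.

```python
def unquote(text):
    # strip surrounding whitespace, then one pair of surrounding quotes
    text = text.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]
    return text


def percent_split(value, delim="\t"):
    """Split at delim, but not inside quotes; remove surrounding quotes."""
    parts, current, quoted = [], [], False
    for ch in value:
        if ch == '"':
            quoted = not quoted
        if ch == delim and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [unquote(part) for part in parts]


def percent_named(parts, name):
    """Find the part starting with name followed by '=' or ':'."""
    for part in parts:
        if part.startswith(name) and part[len(name):len(name) + 1] in ("=", ":"):
            return unquote(part[len(name) + 1:])
    return None


mime = percent_split('text/html; charset="utf-8"', ";")
```

Here mime[0] corresponds to '%;*' and percent_named(mime, "charset") to
'%;charset'.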

* occurrence selector

By default, all occurrences of fields, children and subfields are used.
One or multiple occurrences can be selected explicitly following a tag,
child selector or subfield selector using brackets [] (counting from 1)
or parentheses (counting from 0) like (i) or (i..j).

- If i is omitted, it defaults to the first (1 or 0, respectively).
- If j is omitted, it defaults to the last.

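The two counting conventions can be illustrated with a small Python sketch.
It assumes that ranges are inclusive, that a bare (i) or [i] selects a single
occurrence and that an empty selector selects the first one; these points are
interpretations, and select_occurrences is a hypothetical name.

```python
def select_occurrences(items, selector):
    """Apply a selector such as '[2]', '(0..1)', '[..3]' or '(1..)'."""
    opener, body = selector[0], selector[1:-1]
    base = 1 if opener == "[" else 0   # brackets count from 1, parentheses from 0
    if ".." in body:
        low, high = body.split("..", 1)
        first = int(low) - base if low else 0
        last = int(high) - base if high else len(items) - 1
    else:
        first = int(body) - base if body else 0
        last = first
    return items[max(first, 0):last + 1]


fields = ["a", "b", "c", "d"]
```

With these fields, '[2]' selects b while '(2)' selects c.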

Alternatively, occurrences may be selected by contents.
The general format is an optional subfield selector,
followed by a comparison operator, followed by a literal.
Only occurrences where the field or specified subfield matches
the literal according to the comparison are selected.
Parentheses select all such occurrences,
while brackets select the first match
and default to the first occurrence if none matches.

Operators are
- = for equality
- ~ for contains
- * for starts with
- + for ends with
The equality operator may be omitted where unambiguous.
If some key subfield is known to occur at the start or end of the field,
it is probably more efficient to test for +^zen than for ^z=en.

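Selection by contents might look like this in Python. The sketch compares
whole values only (no subfield selector) and uses the hypothetical names
matches and select_by_content.

```python
def matches(op, value, literal):
    if op == "=":
        return value == literal
    if op == "~":
        return literal in value
    if op == "*":
        return value.startswith(literal)
    if op == "+":
        return value.endswith(literal)
    raise ValueError("unknown operator: %r" % op)


def select_by_content(items, op, literal, all_matches):
    """all_matches=True models parentheses, False models brackets."""
    hits = [value for value in items if matches(op, value, literal)]
    if all_matches:
        return hits
    if hits:
        return hits[:1]
    # brackets default to the first occurrence if none matches
    return items[:1]


titles = ["Der Titel^zde", "The title^zen", "Another title^zen"]
```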

* literals

Each tag, child or subfield selector may be followed by one or more literals.
Every literal but the / extends to the next occurrence of the same
special character by which it is introduced.
This special character may be escaped using a backslash.
A literal backslash may be escaped as two (but need not be, except at the end).

The special character governs when and where the literal is output:
- " before the first occurrence
  (of the entity in question; i.e. field, child or subfield)
- ' before each
- ` after each
- | in between (after each but the last)
- + after the last
- / this single-character literal starts a new output field after each occurrence
- ~ this literal is used if the given entity does NOT occur

Literals are not subject to the string modifiers.

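The placement rules for literals can be sketched in Python as below. The
literals are passed as a dict keyed by their introducing character; the / literal
is left out, and the relative order of the "after each" and "in between"
literals is an assumption. attach_literals is a hypothetical name.

```python
def attach_literals(values, literals):
    """Join the selected values, placing literals around them."""
    if not values:
        # ~ is used if the entity does not occur
        return literals.get("~", "")
    out = [literals.get('"', "")]            # before the first occurrence
    for n, value in enumerate(values):
        out.append(literals.get("'", ""))    # before each
        out.append(value)
        out.append(literals.get("`", ""))    # after each
        if n < len(values) - 1:
            out.append(literals.get("|", ""))  # in between
    out.append(literals.get("+", ""))        # after the last
    return "".join(out)
```

For example, two values a and b with the literals |, | and +.+ (that is,
", " in between and "." after the last) give "a, b.".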

* substring selector

Introduced by a colon ':', it has the form :l or :o.l, where o and
l are integers denoting an offset and length to cut from the currently
selected value.

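In Python terms the substring selector amounts to a slice. The sketch assumes
that the offset is zero-based and counted in characters, which the text above
does not state; substring is a hypothetical name.

```python
def substring(value, selector):
    """Apply ':l' or ':o.l' to a value."""
    body = selector[1:]              # drop the leading ':'
    if "." in body:
        offset_text, length_text = body.split(".", 1)
        offset, length = int(offset_text), int(length_text)
    else:
        offset, length = 0, int(body)
    return value[offset:offset + length]
```

Under these assumptions, ':4' applied to "OpenIsis" gives "Open" and ':4.4'
gives "Isis".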

* extension exits

An exit is a C function (i.e., using the C calling convention) in a dynamic library.
TODO: describe interface.


* script evaluation

If a scripting environment like Tcl is available,
a {} block may contain a script to be evaluated.
TODO: describe interface.


---
$Id: Views.txt,v 1.3 2003/06/02 07:49:08 kripke Exp $
