
Contents of /openisis/current/doc/Views.txt


Revision 237 - (show annotations)
Mon Mar 8 17:43:12 2004 UTC (20 years ago) by dpavlin
File MIME type: text/plain
File size: 10096 byte(s)
initial import of openisis 0.9.0 vendor drop

1 Using views in OpenIsis.
2
3 A "view", like a VIEW in SQL, creates new, typically temporary records based on
4 existing ones by means of some transformation like selecting a subset of the
5 available fields (a projection), retagging fields or manipulating field values.
6
7
8 As a general concept, a view can be implemented using any algorithm
9 in any of the available programming languages to create new records
10 (and need not refer only to record contents, but may also access other
11 resources like files).
12
13 In a more narrow sense, however, a view is a special kind of transformation
14 defined by a "view record". The fields of a view record have tags
15 as they should appear in the target, typically some valid tags of the source
16 plus, for example, index control tags, if the view describes indexing.
17
18
19 In the following, the term "alphanumeric" denotes any ASCII letter or digit,
20 or any non-ASCII character.
21 "Word character" denotes any alphanumeric, hyphen '-' or underscore '_'.
22
23
24 The value can have one of several forms:
25 - if it is empty,
26 the tag is passed to the source record's v command (see below).
27 - if it starts with a %,
28 the rest of the value (w/o the %) is passed to the source record's v command.
29 If the tag is not 0, '=tag;' is prepended.
30 - if the value starts with any word character,
31 it is used literally.
32 - if it starts with a quote,
33 the rest of the value is used literally (w/o the quote).
34 If the value's last character is a quote, it is discarded.
35 - if it starts with an @,
36 the rest of the value names a view to be included
37 - if it starts with an &,
38 the rest of the value is the name of an extension exit to call
39 - if it starts with an {,
40 the rest of the value is a script to be executed in the host language
41 (after stripping an optional } as last character)
42 - any other form
43 (i.e. starting with other ASCII punctuation) is reserved for future use
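
The rules above amount to a dispatch on the first character of the value.
The following Python sketch shows one way to express that dispatch. It is
not OpenIsis code: the names classify_view_value and is_word_char, the
returned kind labels, integer tags, and the treatment of both ' and " as
"a quote" are assumptions made for illustration.

```python
import string

ASCII_ALNUM = set(string.ascii_letters + string.digits)
QUOTES = ("'", '"')  # assumption: the text only says "a quote"


def is_word_char(ch):
    """Alphanumeric (ASCII letter/digit or any non-ASCII char), '-' or '_'."""
    return ch in ASCII_ALNUM or ord(ch) > 127 or ch in "-_"


def classify_view_value(tag, value):
    """Return a (kind, payload) pair for one field of a view record.

    tag is assumed to be an integer; kind is a hypothetical label.
    """
    if value == "":
        # empty value: the tag itself is passed to the v command
        return ("v", str(tag))
    first, rest = value[0], value[1:]
    if first == "%":
        # rest goes to the v command, with '=tag;' prepended unless tag is 0
        prefix = "" if tag == 0 else "=%s;" % tag
        return ("v", prefix + rest)
    if is_word_char(first):
        return ("literal", value)
    if first in QUOTES:
        # drop the leading quote and an optional trailing quote
        if rest and rest[-1] in QUOTES:
            rest = rest[:-1]
        return ("literal", rest)
    if first == "@":
        return ("include", rest)
    if first == "&":
        return ("exit", rest)
    if first == "{":
        # strip an optional closing brace
        if rest.endswith("}"):
            rest = rest[:-1]
        return ("script", rest)
    # any other leading punctuation is reserved for future use
    return ("reserved", value)
```

For instance, a view field with tag 70 and value %^a would be handed to the
v command as the format =70;^a.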
44
45 Example: the view
46 $
47 24
48 70
49 $
50 is a simple projection selecting fields 24 and 70 from the source.
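
As a rough illustration of what such a projection produces, the sketch below
models a record as a Python list of (tag, value) pairs. This representation
and the function name project are assumptions, not the OpenIsis API. Output
follows the order of the view's fields, which is also an assumption.

```python
def project(record, tags):
    """Keep only the fields whose tag is listed, in the order of the view."""
    result = []
    for tag in tags:
        for field_tag, value in record:
            if field_tag == tag:
                result.append((field_tag, value))
    return result


# hypothetical source record with a repeated field 70
source = [(24, "A title"), (26, "A publisher"), (70, "First author"),
          (70, "Second author")]
projected = project(source, [24, 70])
```

Here projected holds field 24 followed by both occurrences of field 70, and
field 26 is dropped.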
51
52
53 * the v command
54
55 is described here as an abstract command.
56 It is available in the C-API as well as from the language bindings,
57 possibly with language specific variations.
58
59 It resembles the core concepts of traditional formatting,
60 including access to and looping over fields and subfields,
61 selecting substrings and attaching optional literals.
62 It is sort of the record's printf.
63 Like printf, and unlike traditional formatting,
64 it neither supports flow control nor screen rendering.
65
66
67 It takes a source and target record plus a string specifying a format.
68 Depending on the language environment, the source and/or target may be implicit.
69
70 If the format starts with '=tag;', where tag is a tag,
71 this gives the tag used in the target and as default.
72 Otherwise, tags from the source are used in the target and default is *.
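
A minimal Python sketch of recognizing the optional '=tag;' prefix follows.
The function name split_target_tag and the returned triple are hypothetical,
and Python's \w is used as an approximation of "word character".

```python
import re

# '=' followed by word characters (approximated by \w and '-') and ';'
_TARGET_PREFIX = re.compile(r"=([\w-]+);")


def split_target_tag(fmt):
    """Return (target_tag, default_tag, rest_of_format).

    target_tag is None when source tags are to be reused in the target.
    """
    match = _TARGET_PREFIX.match(fmt)
    if match:
        tag = match.group(1)
        return tag, tag, fmt[match.end():]
    return None, "*", fmt
```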
73
74 The first (next) character is then checked for an encoding mode, see below.
75
76
77 The format is a series of output specifications,
78 consisting of a field tag (word characters, either numerical or by field name),
79 selectors and modifiers. The special tag * selects all fields.
80 Each spec may contain several subspecs, separated by commas,
81 using the same child context (otherwise, specs and subspecs are the same).
82 So the format is spec[;spec...], and a spec is subspec[,subspec...].
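
The two-level structure can be sketched in Python as below. This is a
deliberately naive split: it ignores literals, which may themselves contain
';' or ',', so a real parser would have to scan character by character. The
name split_format is an assumption.

```python
def split_format(fmt):
    """Split a format into specs (at ';') and each spec into subspecs (at ',').

    Naive: does not protect ';' or ',' that appear inside literals.
    """
    return [spec.split(",") for spec in fmt.split(";") if spec]
```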
83
84
85 The general operation of the v command is to loop over the record
86 until the last occurrence has been seen for all tags.
87 In the nth repetition, for each tag in any spec,
88 the (n+i)th occurrence of a field with this tag is used,
89 where i is an offset given by an occurrence selector.
90 It is also determined whether this is the last occurrence.
91 For every iteration, a new output field is started,
92 and the format is processed as follows:
93 - loop over the (main) specifications
94 - loop over children (or use the given field)
95 - loop over subspecs
96 - loop over subfields (or use the whole field)
97 - apply decoding
98 - apply substring
99 - apply encoding
100 - attach literals
101 - append the result to the target record
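
The outer loop over occurrences can be sketched as follows. This is a much
reduced model and not the actual implementation: it assumes a record is a
list of (tag, value) pairs, takes plain tags instead of full specs, uses an
offset of 0 for every tag, and leaves out children, subfields, decoding,
substrings, encoding and literals. The names occurrences and v_loop are
hypothetical.

```python
def occurrences(record, tag):
    """All values of fields with the given tag, in record order."""
    return [value for field_tag, value in record if field_tag == tag]


def v_loop(record, tags, target_tag):
    """Start one output field per repetition, until every tag is exhausted."""
    per_tag = [occurrences(record, tag) for tag in tags]
    rounds = max((len(values) for values in per_tag), default=0)
    target = []
    for n in range(rounds):
        # in the nth repetition, use the nth occurrence of each tag (if any)
        parts = [values[n] for values in per_tag if n < len(values)]
        target.append((target_tag, "".join(parts)))
    return target
```

With a record holding two occurrences of field 10 and one of field 20, two
output fields result: the first combines both tags, the second holds only
the remaining occurrence of field 10.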
102
103
104 Each spec starts with an optional decoding mode,
105 optionally followed by a tag,
106 optionally followed by a child selector,
107 optionally followed by a subfield selector,
108 optionally followed by string modifiers,
109 optionally intermingled with occurrence selectors and literals:
110 - , starts a new subspec
111 - ; starts a new spec with default context reset to the last tag seen
112 - . starts a child selector
113 - ^% start a subfield selector
114 - ([ start an occurrence selector
115 - /~"'`|+ start a literal
116 - : starts a substring selector
117 - & calls an extension
118 - { evaluates a script
119
120
121 * encoding mode
122
123 One of the following operators as first character of the format
124 can select an output "encoding":
125 - ? outputs a 1 if the selected entity exists, 0 otherwise
126 - ! the opposite of ?
127 - & applies HTML encoding
128 - % applies URL encoding
129
130 The test encodings ?! inhibit normal processing;
131 they immediately return after checking the first occurrence of the first tag.
132 For example, using a default of all tags (*), the format consisting
133 solely of a '?' checks whether a record is empty.
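
The four encoding operators can be pictured with the Python sketch below.
The function name apply_encoding and the use of None for a missing entity
are assumptions, and the standard library's html.escape and
urllib.parse.quote stand in for whatever escaping rules OpenIsis actually
applies.

```python
import html
from urllib.parse import quote


def apply_encoding(mode, value):
    """Encode one selected value; value is None if the entity does not exist."""
    exists = value is not None
    if mode == "?":
        return "1" if exists else "0"
    if mode == "!":
        return "0" if exists else "1"
    if not exists:
        return ""
    if mode == "&":
        return html.escape(value)      # HTML encoding
    if mode == "%":
        return quote(value, safe="")   # URL encoding
    return value                       # no encoding mode given
```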
134
135 More special characters (but not the '*') may be designated in the future,
136 so a format should always start with a tag (possibly explicit *).
137
138
139 * decoding mode
140
141 An uppercase character before the tag may denote a decoding mode:
142 $
143 - H heading mode:
144 ^x is replaced as ';' for x=a, ',' for x=b..i, '.' for others
145 angle brackets are removed (>< replaced by '; '), <a> or <a=b> evaluates to a
146
147 - D data mode:
148 in addition to heading mode, if there is no explicit literal after this field,
149 append ' ', if it ends in "punctuation", or '. ' else.
150
151 - X index mode
152 like heading, but <a> evaluates to nothing and <a=b> to b
153
154 - M traditional
155 For compatibility, specs reading MHx or MDx (x = L or U) set heading
156 or data mode, resp., as default processing (before substringing).
157 The case directive is ignored.
158 $
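
The H, D and X rules can be sketched in Python as below. This follows the
description above literally and is not derived from the OpenIsis sources.
Assumptions: the subfield code is compared case-insensitively, "punctuation"
means ASCII punctuation, data mode always sees no explicit literal, and the
names heading_punct and decode are hypothetical.

```python
import re
import string


def heading_punct(code):
    """Punctuation that replaces the subfield delimiter ^x in heading mode."""
    code = code.lower()
    if code == "a":
        return ";"
    if "b" <= code <= "i":
        return ","
    return "."


def decode(value, mode="H"):
    """Apply heading (H), data (D) or index (X) decoding to a field value."""
    # replace every ^x by its punctuation
    text = re.sub(r"\^(.)", lambda m: heading_punct(m.group(1)), value)
    # adjacent bracket groups '><' are joined by '; '
    text = text.replace("><", ">; <")
    if mode == "X":
        pick = lambda m: m.group(2) or ""   # <a> -> nothing, <a=b> -> b
    else:
        pick = lambda m: m.group(1)         # <a> or <a=b> -> a
    text = re.sub(r"<([^<>=]*)(?:=([^<>]*))?>", pick, text)
    if mode == "D":
        # data mode: terminate the field, assuming no explicit literal follows
        ends_in_punct = bool(text) and text[-1] in string.punctuation
        text += " " if ends_in_punct else ". "
    return text
```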
159
160
161 * child selector
162
163 If a tag is immediately followed by a dot '.' and an optional tag,
164 the field context is switched, for this spec and following specs separated by ',',
165 to loop over the children with the given tag.
166 The tag defaults to 0, selecting text nodes in the canonical XML representation.
167 A * selects all children, a second . recursively selects all children.
168
169
170 * subfield selectors
171
172 The primary subfield selector is the hat '^', followed by one character.
173 It can produce multiple items, like repetitions of a subfield or keywords.
174
175 If the selector character is
176 - alphanumeric
177 select the (repetitions of the) subfield tagged with this character.
178 - an opening pairing brace
179 i.e. one of '(','{','[' or the angle bracket '<',
180 words between pairs of this brace are selected (commonly keywords).
181 - a *
182 selects the part up to the first subfield delimiter
183 - a space
184 selects naive words as sequences of alphanumerics
185 - a )
186 selects parts between TABs (array mode)
187 - other punctuation
188 like / or | selects parts between pairs of this character
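
A Python sketch of the hat selector cases listed above follows. It assumes
'^' is the subfield delimiter in the stored field value, approximates
"alphanumeric" with Python's own character classes, and uses the
hypothetical names hat_select and between.

```python
import re

CLOSERS = {"(": ")", "{": "}", "[": "]", "<": ">"}


def between(field, opener, closer):
    """All parts enclosed by an opener/closer pair."""
    pattern = (re.escape(opener) + "([^" + re.escape(closer) + "]*)"
               + re.escape(closer))
    return re.findall(pattern, field)


def hat_select(field, selector):
    """Return the list of items selected by '^' plus one selector character."""
    if selector.isalnum():
        # all repetitions of the subfield tagged with this character
        return re.findall(r"\^" + re.escape(selector) + r"([^^]*)", field)
    if selector in CLOSERS:
        # words between pairs of this brace (commonly keywords)
        return between(field, selector, CLOSERS[selector])
    if selector == "*":
        # the part up to the first subfield delimiter
        return [field.split("^", 1)[0]]
    if selector == " ":
        # naive words: runs of letters and digits
        return re.findall(r"[^\W_]+", field)
    if selector == ")":
        # array mode: parts between TABs
        return field.split("\t")
    # other punctuation: parts between pairs of this character
    return between(field, selector, selector)
```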
189
190
191 The percent sign '%' (think printf) works basically like the hat, but
192 - removes quotes surrounding values
193 - by default treats the TAB as subfield delimiter
194 - if followed by a punctuation character or space,
195 treats this plus surrounding whitespace as delimiter,
196 not separating within quotes.
197 - if followed by a ),
198 (optionally after another punctuation) goes to array mode,
199 that is there is no subfield indicator stripped from the values
200 - if followed by multiple word characters,
201 (including '-' and '_', optionally after an initial punctuation)
202 searches for subfields starting with that sequence followed by '=' or ':'
203
204 Examples:
205 - '^)' splits at TABs
206 - '%)' splits at TABs with quote removal
207 - '%a' selects a sequence following a TAB and 'a'
208 - '%,)' splits a line of comma separated values
209 - '%;*' selects the primary value of a MIME property
210 - '%;charset' selects the charset attribute of a MIME property
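
Two of these examples, '%,)' and '%;charset', can be mimicked with the
Python sketch below. It only handles double quotes, and the function names
split_quoted, unquote and named_part are hypothetical, not part of any
OpenIsis binding.

```python
def unquote(text):
    """Remove one pair of surrounding double quotes, if present."""
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return text[1:-1]
    return text


def split_quoted(value, delim="\t"):
    """Split at delim, but not within quotes; then remove quotes."""
    parts, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == delim and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    if delim != "\t":
        # an explicit delimiter also swallows surrounding whitespace
        parts = [part.strip() for part in parts]
    return [unquote(part) for part in parts]


def named_part(value, name, delim=";"):
    """Find the part starting with name followed by '=' or ':'."""
    for part in split_quoted(value, delim):
        for sep in "=:":
            if part.startswith(name + sep):
                return unquote(part[len(name) + 1:].strip())
    return None
```

For a MIME property such as text/html; charset="utf-8", the first part of
split_quoted(value, ";") corresponds to '%;*' and named_part(value,
"charset") to '%;charset'.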
211
212
213 * occurrence selector
214
215 By default, all occurrences of fields, children and subfields are used.
216 One or multiple occurrences can be selected explicitly following a tag,
217 child selector or subfield selector using brackets [] (counting from 1)
218 or parentheses (counting from 0) like (i) or (i..j).
219
220 - If i is omitted, it defaults to the first (1 or 0, resp.).
221 - If j is omitted, it defaults to the last.
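
The two counting conventions can be sketched in Python as follows. The
sketch assumes that ranges are inclusive, that brackets accept the same
i..j form as parentheses, and that a selector without '..' picks a single
occurrence. The name select_by_index is hypothetical.

```python
import re

_INDEX_SELECTOR = re.compile(r"([\[(])(\d*)(?:(\.\.)(\d*))?[\])]")


def select_by_index(items, selector):
    """Select occurrences by [i], [i..j] (from 1) or (i), (i..j) (from 0)."""
    match = _INDEX_SELECTOR.fullmatch(selector)
    if not match:
        raise ValueError("not an index selector: %r" % selector)
    base = 1 if match.group(1) == "[" else 0
    first = int(match.group(2)) if match.group(2) else base
    if match.group(3) is None:
        last = first                      # single occurrence
    elif match.group(4):
        last = int(match.group(4))
    else:
        last = len(items) - 1 + base      # j omitted: up to the last
    start = max(first - base, 0)
    return items[start:last - base + 1]
```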
222
223 Alternatively, occurrences may be selected by contents.
224 The general format is an optional subfield selector,
225 followed by a comparison operator, followed by a literal.
226 Only occurrences where the field or specified subfield matches
227 the literal according to the comparison are selected.
228 Parentheses select all such occurrences,
229 while brackets select the first match
230 and default to the first occurrence if none matches.
231
232 Operators are
233 - = for equality
234 - ~ for contains
235 - * for starts with
236 - + for ends with
237 The equality operator may be omitted where unambiguous.
238 If some key subfield is known to occur at the start or end of a field,
239 it is probably more efficient to test for +^zen than for ^z=en.
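
Selection by contents can be sketched in Python as below. The sketch
compares against whole values and leaves out the optional subfield
selector. The names OPERATORS and select_by_content and the all_matches
flag (True for parentheses, False for brackets) are assumptions.

```python
OPERATORS = {
    "=": lambda value, literal: value == literal,           # equality
    "~": lambda value, literal: literal in value,           # contains
    "*": lambda value, literal: value.startswith(literal),  # starts with
    "+": lambda value, literal: value.endswith(literal),    # ends with
}


def select_by_content(items, op, literal, all_matches=True):
    """Select occurrences whose value matches the literal.

    all_matches=True models parentheses (all matching occurrences);
    all_matches=False models brackets (first match, else first occurrence).
    """
    matches = [value for value in items if OPERATORS[op](value, literal)]
    if all_matches:
        return matches
    if matches:
        return matches[:1]
    return items[:1]
```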
240
241
242 * literals
243
244 Each tag, child or subfield selector may be followed by one or more literals.
245 Every literal but the / extends to the next occurrence of the same
246 special character by which it is introduced.
247 This special character may be escaped using a backslash.
248 A literal backslash may be escaped as two (but need not, except at the end).
249
250 The special character governs when and where the literal is output:
251 - " before the first occurrence
252 (of the entity in question; i.e. field, child or subfield)
253 - ' before each
254 - ` after each
255 - | in between (after each but the last)
256 - + after the last
257 - / this single-character literal starts a new output field after each occurrence
258 - ~ this literal is used if the given entity does NOT occur
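
The placement rules can be pictured with the Python sketch below, which
takes the selected occurrences and one optional literal per position. The
'/' literal is left out, and the relative order of the "after each" literal
and the "in between" or "after the last" literal is an assumption. The
function and parameter names are hypothetical.

```python
def render(occurrences, before_first="", before_each="", after_each="",
           between="", after_last="", if_absent=""):
    """Join occurrences with literals placed as governed by " ' ` | + ~."""
    if not occurrences:
        return if_absent                  # ~ literal
    out = [before_first]                  # " literal
    last = len(occurrences) - 1
    for n, value in enumerate(occurrences):
        out.append(before_each)           # ' literal
        out.append(value)
        out.append(after_each)            # ` literal
        out.append(between if n < last else after_last)  # | or + literal
    return "".join(out)
```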
259
260 Literals are not subject to the string modifiers.
261
262
263 * substring selector
264
265 Introduced by a colon ':', it has the form :l or :o.l, where o and
266 l are integers denoting an offset and length to cut from the currently
267 selected value.
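
A small Python sketch of the two forms follows. It assumes that the offset
counts from 0 and that a missing offset means 0; the text does not say
either way. The name substring is hypothetical.

```python
import re


def substring(value, selector):
    """Apply a ':l' or ':o.l' substring selector to a value."""
    match = re.fullmatch(r":(?:(\d+)\.)?(\d+)", selector)
    if not match:
        raise ValueError("not a substring selector: %r" % selector)
    offset = int(match.group(1) or 0)     # assumed zero-based
    length = int(match.group(2))
    return value[offset:offset + length]
```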
268
269
270 * extension exits
271
272 An exit is a C-function (i.e., using C calling convention) in a dynamic library.
273 TODO: describe interface.
274
275
276 * script evaluation
277
278 If a scripting environment like Tcl is available,
279 a {} block may contain a script to be evaluated.
280 TODO: describe interface.
281
282
283 ---
284 $Id: Views.txt,v 1.3 2003/06/02 07:49:08 kripke Exp $
