Using views in OpenIsis.

A "view", like a VIEW in SQL, creates new, typically temporary records based on
existing ones by means of some transformation, like selecting a subset of the
available fields (a projection), retagging fields or manipulating field values.
|
|
As a general concept, a view can be implemented using any algorithm
in any of the available programming languages to create new records
(and need not refer only to record contents, but may also access other
resources like files).

In a narrower sense, however, a view is a special kind of transformation
defined by a "view record". The fields of a view record have tags
as they should appear in the target, typically some valid tags of the source
plus, for example, index control tags, if the view describes indexing.
|
|
In the following, the term "alphanumeric" denotes any ASCII letter or digit,
or any non-ASCII character.
"Word character" denotes any alphanumeric, hyphen '-' or underscore '_'.
|
|
The value of a view record field can have one of several forms:
- if it is empty,
the tag is passed to the source record's v command (see below).
- if it starts with a %,
the rest of the value (without the %) is passed to the source record's v command.
If the tag is not 0, '=tag;' is prepended.
- if it starts with a word character,
it is used literally.
- if it starts with a quote,
the rest of the value (without the quote) is used literally.
If the value's last character is a quote, it is discarded.
- if it starts with an @,
the rest of the value names a view to be included.
- if it starts with an &,
the rest of the value is the name of an extension exit to call.
- if it starts with a {,
the rest of the value is a script to be executed in the host language
(after stripping an optional } as last character).
- any other form
(i.e. starting with other ASCII punctuation) is reserved for future use.

Example: the view
$
24
70
$
is a simple projection selecting fields 24 and 70 from the source.
|
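The value forms listed above can be summarized as a small classifier. The following Python sketch is not OpenIsis code: it models a record as a list of (tag, value) pairs, and the names `classify_view_value` and `project` are hypothetical. It assumes a leading quote may be either `"` or `'`, and it only executes the simplest case, the projection example above, where every view field has an empty value and therefore acts as a v format consisting of just its tag.

```python
from typing import List, Tuple

Record = List[Tuple[int, str]]  # hypothetical model: (tag, value) pairs in order


def is_alnum(ch: str) -> bool:
    # "alphanumeric": an ASCII letter or digit, or any non-ASCII character
    return not ch.isascii() or ch.isalnum()


def is_word_char(ch: str) -> bool:
    return is_alnum(ch) or ch in "-_"


def classify_view_value(tag: int, value: str) -> Tuple[str, str]:
    """Map one view-record field to (kind, payload) following the forms above."""
    if value == "":
        return ("v", str(tag))                    # the tag itself is the v format
    first = value[0]
    if first == "%":
        fmt = value[1:]
        if tag != 0:
            fmt = "=" + str(tag) + ";" + fmt      # target tag prefix
        return ("v", fmt)
    if is_word_char(first):
        return ("literal", value)
    if first in "\"'":                            # assumption: either quote character
        rest = value[1:]
        if rest.endswith(first):
            rest = rest[:-1]
        return ("literal", rest)
    if first == "@":
        return ("include_view", value[1:])
    if first == "&":
        return ("exit", value[1:])
    if first == "{":
        script = value[1:]
        if script.endswith("}"):
            script = script[:-1]
        return ("script", script)
    return ("reserved", value)                    # other ASCII punctuation


def project(source: Record, view: Record) -> Record:
    """Apply a view whose fields all have empty values (a plain projection)."""
    target: Record = []
    for tag, value in view:
        kind, fmt = classify_view_value(tag, value)
        if kind != "v" or fmt != str(tag):
            raise NotImplementedError("this sketch only handles plain projections")
        # assumption: view fields are applied in turn, each copying every
        # occurrence of its tag in source order
        target.extend((t, v) for t, v in source if t == tag)
    return target
```

With a source of fields 10, 24, 70, 24, the projection yields both occurrences of 24 before 70, because the sketch applies each view field in turn.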
|
* the v command

The v command is described here as an abstract command.
It is available in the C-API as well as from the language bindings,
possibly with language-specific variations.

It resembles the core concepts of traditional formatting,
including access to and looping over fields and subfields,
selecting substrings and attaching optional literals.
It is, in a sense, the record's printf.
Like printf, and unlike traditional formatting,
it supports neither flow control nor screen rendering.
|
|
It takes a source and a target record plus a string specifying a format.
Depending on the language environment, the source and/or target may be implicit.

If the format starts with '=tag;', where tag is a field tag,
this gives the tag used in the target and the default tag.
Otherwise, tags from the source are used in the target and the default is *.

The first (next) character is then checked for an encoding mode, see below.
|
|
The format is a series of output specifications,
each consisting of a field tag (word characters, either numerical or a field name),
selectors and modifiers. The special tag * selects all fields.
Each spec may contain several subspecs, separated by commas,
that share the same child context (otherwise, specs and subspecs are the same).
So the format is spec[;spec...], and a spec is spec[,subspec...].
|
|
The general operation of the v command is to loop over the record
until the last occurrence has been seen for all tags.
In the nth repetition, for each tag in any spec,
the (n+i)th occurrence of a field with this tag is used,
where i is an offset given by an occurrence selector,
and it is determined whether this is the last occurrence.
For every iteration, a new output field is started,
and the format is processed as follows:
- loop over the (main) specifications
- loop over children (or use the given field)
- loop over subspecs
- loop over subfields (or use the whole field)
- apply decoding
- apply substring
- apply encoding
- attach literals
- append the result to the target record
|
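The outer repetition described above can be illustrated with a reduced model. This Python sketch uses a hypothetical function name and models records as lists of (tag, value) pairs. It only covers plain tags with an optional occurrence offset i; children, subspecs, subfields, decoding, substrings, encoding and literals are left out, and the values selected in one repetition are simply concatenated into one output field.

```python
from typing import Dict, List, Tuple


def v_loop(record: List[Tuple[int, str]], specs: List[Tuple[int, int]]) -> List[str]:
    """One output field per repetition, concatenating the selected values.

    record: hypothetical (tag, value) pairs; specs: (tag, i) pairs, where i is
    the occurrence-selector offset (0 when no selector is given).
    """
    occurrences: Dict[int, List[str]] = {}
    for tag, value in record:
        occurrences.setdefault(tag, []).append(value)
    # repeat until the last occurrence has been seen for every tag
    rounds = max((len(occurrences.get(tag, [])) - i for tag, i in specs), default=0)
    fields = []
    for n in range(rounds):
        parts = []
        for tag, i in specs:
            occ = occurrences.get(tag, [])
            if 0 <= n + i < len(occ):         # the (n+i)th occurrence, if any
                parts.append(occ[n + i])
        fields.append("".join(parts))         # a new output field per repetition
    return fields
```

For a record with fields 24 "a", 70 "x", 24 "b", the specs 24 and 70 produce two output fields, "ax" and "b", since field 70 has no second occurrence.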
|
Each spec starts with an optional decoding mode,
optionally followed by a tag,
optionally followed by a child selector,
optionally followed by a subfield selector,
optionally followed by string modifiers,
optionally intermingled with occurrence selectors and literals:
- , starts a new subspec
- ; starts a new spec with default context reset to the last tag seen
- . starts a child selector
- ^% start a subfield selector
- ([ start an occurrence selector
- /~"'`|+ start a literal
- : starts a substring selector
- & calls an extension
- { evaluates a script
|
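Because literals and selectors may themselves contain ',' or ';', splitting a format into specs and subspecs has to skip over them. The sketch below is one possible tokenizer, not the OpenIsis parser, and its function names are hypothetical. It assumes that a literal ends at the next unescaped copy of its opening character, that '(', '[' and '{' extend to the next closing character without nesting, and that '^' or '%' is followed by exactly one selector character kept verbatim (so a longer selector such as '%;charset' is carried along as ordinary characters). It also recognizes the optional '=tag;' prefix and a leading encoding character.

```python
import re

LITERAL_CHARS = "\"'`|+~"          # literals that run to the next same character
CLOSERS = {"(": ")", "[": "]", "{": "}"}


def skip_literal(fmt, pos):
    """Index just past the literal introduced by fmt[pos]."""
    delim = fmt[pos]
    i = pos + 1
    while i < len(fmt):
        if fmt[i] == "\\" and i + 1 < len(fmt):
            i += 2                  # skip the escaped character
        elif fmt[i] == delim:
            return i + 1
        else:
            i += 1
    return len(fmt)                 # unterminated literal runs to the end


def split_format(fmt):
    """Return (target_tag, encoding, specs); each spec is a list of subspecs."""
    target_tag, encoding, pos = None, None, 0
    m = re.match(r"=([\w-]+);", fmt)
    if m:                            # optional '=tag;' prefix
        target_tag, pos = m.group(1), m.end()
    if pos < len(fmt) and fmt[pos] in "?!&%":
        encoding, pos = fmt[pos], pos + 1
    specs, subspecs, buf = [], [], []
    while pos < len(fmt):
        ch = fmt[pos]
        if ch in ";,":
            subspecs.append("".join(buf))
            buf = []
            if ch == ";":            # ';' also closes the current spec
                specs.append(subspecs)
                subspecs = []
            pos += 1
            continue
        if ch in "^%":
            end = pos + 2            # assumption: one selector character, kept verbatim
        elif ch in CLOSERS:
            close = fmt.find(CLOSERS[ch], pos + 1)
            end = len(fmt) if close < 0 else close + 1
        elif ch in LITERAL_CHARS:
            end = skip_literal(fmt, pos)
        else:
            end = pos + 1
        buf.append(fmt[pos:end])
        pos = end
    subspecs.append("".join(buf))
    specs.append(subspecs)
    return target_tag, encoding, specs
```

For example, in '24(^z=en)|; |' the ';' inside the | literal does not start a new spec.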
|
* encoding mode

One of the following operators as the first character of the format
can select an output "encoding":
- ? outputs 1 if the selected entity exists, 0 otherwise
- ! the opposite of ?
- & applies HTML encoding
- % applies URL encoding

The test encodings ?! inhibit normal processing;
they return immediately after checking the first occurrence of the first tag.
For example, using a default of all tags (*), the format consisting
solely of a '?' checks whether a record is empty.

More special characters (but not the '*') may be designated in the future,
so a format should always start with a tag (possibly an explicit *).
|
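A minimal sketch of the four encodings, assuming the selected entity is given as a list of output strings (empty if it does not exist); `apply_encoding` is a hypothetical name. The exact characters escaped by OpenIsis are not specified here, so the sketch borrows Python's `html.escape` (escaping &, < and >) and `urllib.parse.quote` with no safe characters. Both choices are assumptions and may differ from the real implementation, for example in how spaces are encoded.

```python
import html
from urllib.parse import quote


def apply_encoding(mode, values):
    """values: output strings of the selected entity (empty list if absent)."""
    if mode == "?":
        return "1" if values else "0"        # test only, no normal output
    if mode == "!":
        return "0" if values else "1"
    text = "".join(values)
    if mode == "&":
        # assumption: only &, < and > need escaping
        return html.escape(text, quote=False)
    if mode == "%":
        # assumption: every reserved character is percent-encoded, space as %20
        return quote(text, safe="")
    return text
```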
|
* decoding mode

An uppercase character before the tag may denote a decoding mode:
$
- H heading mode:
^x is replaced by ';' for x=a, ',' for x=b..i and '.' for others;
angle brackets are removed (>< is replaced by '; '), and <a> or <a=b> evaluates to a

- D data mode:
in addition to heading mode, if there is no explicit literal after this field,
' ' is appended if it ends in "punctuation", '. ' otherwise.

- X index mode:
like heading, but <a> evaluates to nothing and <a=b> to b

- M traditional:
for compatibility, specs reading MHx or MDx (x = L or U) set heading
or data mode, respectively, as default processing (before substringing).
The case directive is ignored.
$
|
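The heading, data and index modes can be sketched as a few regular-expression substitutions. This is an illustration under assumptions, not the OpenIsis code, and `decode` is a hypothetical name: subfield codes are matched case-insensitively, the ^x replacement is applied literally (also to a delimiter at the very start of a field), and "punctuation" is taken to mean Python's `string.punctuation`. The M compatibility mode is not modeled.

```python
import re
import string


def _subfield_mark(match):
    code = match.group(1).lower()          # assumption: subfield codes ignore case
    if code == "a":
        return ";"
    if "b" <= code <= "i":
        return ","
    return "."


def _heading_value(match):
    return match.group(1).split("=", 1)[0]      # <a> and <a=b> give a


def _index_value(match):
    inner = match.group(1)
    return inner.split("=", 1)[1] if "=" in inner else ""   # <a=b> gives b, <a> nothing


def decode(mode, value, literal_follows=False):
    """Apply decoding mode H, D or X to one field value; other modes pass through."""
    if mode not in ("H", "D", "X"):
        return value
    out = re.sub(r"\^(.)", _subfield_mark, value)
    out = out.replace("><", ">; <")             # adjacent bracket groups become "..; .."
    out = re.sub(r"<([^<>]*)>", _index_value if mode == "X" else _heading_value, out)
    if mode == "D" and not literal_follows:
        # assumption: "punctuation" means string.punctuation
        out += " " if out and out[-1] in string.punctuation else ". "
    return out
```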
|
* child selector

If a tag is immediately followed by a dot '.' and an optional tag,
the field context is switched, for this spec and the following subspecs separated by ',',
to loop over the children with the given tag.
The tag defaults to 0, selecting text nodes in the canonical XML representation.
A * selects all children; a second . recursively selects all children.
|
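The child selector is easiest to picture on a tree. The sketch below uses a hypothetical `Node` class standing in for the canonical XML representation, with tag "0" marking text nodes. It reads "a second ." as walking descendants at any depth (in document order) while still filtering by tag, which is an interpretation rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a node of the canonical XML representation."""
    tag: str                      # "0" marks a text node
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def select_children(node, tag="0", recursive=False):
    """Children of node with the given tag ("*" matches any tag).
    With recursive=True, descendants at any depth are included (interpretation)."""
    result = []
    for child in node.children:
        if tag == "*" or child.tag == tag:
            result.append(child)
        if recursive:
            result.extend(select_children(child, tag, recursive=True))
    return result
```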
|
* subfield selectors

The primary subfield selector is the hat '^', followed by one character.
It can produce multiple items, like repetitions of a subfield or keywords.

If the selector character is
- alphanumeric,
it selects the (repetitions of the) subfield tagged with this character.
- an opening pairing brace,
i.e. one of '(', '{', '[' or the angle bracket '<',
words between pairs of this brace are selected (commonly keywords).
- a *,
it selects the part up to the first subfield delimiter.
- a space,
it selects naive words as sequences of alphanumerics.
- a ),
it selects parts between TABs (array mode).
- other punctuation,
like / or |, it selects parts between pairs of this character.
|
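The hat selector rules above can be approximated as follows. The sketch assumes '^' as the subfield delimiter (as in the ^z=en example below), matches subfield codes case-insensitively, and uses non-nested, non-greedy matching for brace and punctuation pairs; `select_hat` is a hypothetical name.

```python
import re

BRACE_PAIRS = {"(": ")", "{": "}", "[": "]", "<": ">"}
# "alphanumeric" = ASCII letter or digit, or any non-ASCII character
NAIVE_WORD = re.compile(r"[A-Za-z0-9\u0080-\U0010FFFF]+")


def _between_pairs(value, opener, closer):
    # non-greedy, non-nested: text between each opener and the next closer
    return re.findall(re.escape(opener) + "(.*?)" + re.escape(closer), value)


def select_hat(value, sel, delim="^"):
    """Items selected by '^' followed by the character sel (assumes '^' delimits subfields)."""
    if sel == "*":
        return [value.split(delim, 1)[0]]      # part before the first delimiter
    if sel == " ":
        return NAIVE_WORD.findall(value)        # naive words
    if sel == ")":
        return value.split("\t")                # array mode: parts between TABs
    if sel in BRACE_PAIRS:
        return _between_pairs(value, sel, BRACE_PAIRS[sel])
    if not sel.isascii() or sel.isalnum():
        # assumption: subfield codes are compared case-insensitively
        return [part[1:] for part in value.split(delim)[1:]
                if part[:1].lower() == sel.lower()]
    return _between_pairs(value, sel, sel)      # other punctuation, e.g. / or |
```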
|
The percent sign '%' (think printf) works basically like the hat, but
- removes quotes surrounding values
- by default treats the TAB as subfield delimiter
- if followed by a punctuation character or space,
treats this plus surrounding whitespace as delimiter,
not separating within quotes.
- if followed by a ),
(optionally after another punctuation character) goes to array mode,
that is, no subfield indicator is stripped from the values
- if followed by multiple word characters,
(including '-' and '_', optionally after an initial punctuation character)
searches for subfields starting with that sequence followed by '=' or ':'

Examples:
- '^)' splits at TABs
- '%)' splits at TABs with quote removal
- '%a' selects a sequence following a TAB and 'a'
- '%,)' splits a line of comma-separated values
- '%;*' selects the primary value of a MIME property
- '%;charset' selects the charset attribute of a MIME property
|
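Two of the '%' behaviors, quote-aware splitting and named lookup, are sketched below for the '%,)', '%;*' and '%;charset' examples. Assumptions: only double quotes count as quotes, whitespace around the delimiter is stripped except for the default TAB delimiter, and the name must be followed directly by '=' or ':'. Subfield indicator stripping (the non-array mode) and the '%a' form are not covered, and the function names are hypothetical.

```python
def strip_quotes(s):
    # remove one pair of surrounding double quotes (assumption: only " counts)
    if len(s) >= 2 and s[0] == s[-1] == '"':
        return s[1:-1]
    return s


def split_outside_quotes(value, delim="\t"):
    """Split at delim, never inside "...", removing surrounding quotes.
    For a punctuation or space delimiter, whitespace around it is dropped."""
    parts, buf, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == delim and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    if delim != "\t":
        parts = [p.strip() for p in parts]
    return [strip_quotes(p) for p in parts]


def select_named(value, delim, name):
    """Values of the parts that start with name followed by '=' or ':'."""
    found = []
    for part in split_outside_quotes(value, delim):
        if part.startswith(name) and part[len(name):len(name) + 1] in ("=", ":"):
            found.append(strip_quotes(part[len(name) + 1:].strip()))
    return found
```

Under these assumptions, '%;*' corresponds to the first element returned by `split_outside_quotes(value, ";")`.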
|
* occurrence selector

By default, all occurrences of fields, children and subfields are used.
One or multiple occurrences can be selected explicitly following a tag,
child selector or subfield selector using brackets (counting from 1)
or parentheses (counting from 0), as in [i], (i) or (i..j).

- If i is omitted, it defaults to the first (1 or 0, respectively).
- If j is omitted, it defaults to the last.

Alternatively, occurrences may be selected by contents.
The general format is an optional subfield selector,
followed by a comparison operator, followed by a literal.
Only occurrences where the field or specified subfield matches
the literal according to the comparison are selected.
Parentheses select all such occurrences,
while brackets select the first match
and default to the first occurrence if none matches.

Operators are
- = for equality
- ~ for contains
- * for starts with
- + for ends with
The equality operator may be omitted where unambiguous.
If some key subfield is known to occur at the start or end of the field,
it is probably more efficient to test for +^zen than for ^z=en.
|
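Both kinds of occurrence selection are sketched below over a plain list of occurrence strings. `select_by_index` handles only the bracket forms shown above, and `select_by_content` takes the operator and literal already parsed; the optional subfield selector before the operator is not modeled. Both names are hypothetical.

```python
import operator

# comparison operators for selection by contents
COMPARE = {
    "=": operator.eq,            # equality
    "~": operator.contains,      # contains: literal in value
    "*": str.startswith,         # starts with
    "+": str.endswith,           # ends with
}


def select_by_index(occs, selector):
    """selector such as "[2..3]" (counting from 1) or "(0..1)" (counting from 0)."""
    base = 1 if selector.startswith("[") else 0
    inner = selector[1:-1]
    if ".." in inner:
        lo, hi = inner.split("..", 1)
        i = int(lo) if lo else base                       # omitted i: first
        j = int(hi) if hi else base + len(occs) - 1       # omitted j: last
    else:
        i = j = int(inner)
    return occs[max(i - base, 0):max(j - base + 1, 0)]


def select_by_content(occs, op, literal, brackets):
    """(...) keeps all matches; [...] keeps the first match, else the first occurrence."""
    matches = [o for o in occs if COMPARE[op](o, literal)]
    if brackets:
        return matches[:1] or occs[:1]
    return matches
```

The "+" test with the literal "^zen" corresponds to the suggested +^zen form for a key subfield at the end of the field.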
|
* literals

Each tag, child or subfield selector may be followed by one or more literals.
Every literal except the / extends to the next occurrence of the same
special character by which it is introduced.
This special character may be escaped using a backslash.
A literal backslash may be escaped as two backslashes (but need not be, except at the end).

The special character governs when and where the literal is output:
- " before the first occurrence
(of the entity in question, i.e. field, child or subfield)
- ' before each
- ` after each
- | in between (after each but the last)
- + after the last
- / this single-character literal starts a new output field after each occurrence
- ~ this literal is used if the given entity does NOT occur

Literals are not subject to the string modifiers.
|
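The escaping and placement rules for literals are sketched below. `parse_literal` reads one literal starting at its special character, and `render` places already parsed literals around the occurrences of one entity; both names are hypothetical. Assumptions: when both ` and | apply to the same occurrence, ` is emitted first, and the / literal (which starts a new output field) is recognized by the parser but not modeled by `render`.

```python
def parse_literal(fmt, pos):
    """Parse the literal whose special character is fmt[pos].
    Returns (kind, text, index just past the literal)."""
    kind = fmt[pos]
    if kind == "/":
        return kind, "", pos + 1               # single-character literal
    text, i = [], pos + 1
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\" and i + 1 < len(fmt) and fmt[i + 1] in (kind, "\\"):
            text.append(fmt[i + 1])            # escaped special character or backslash
            i += 2
        elif ch == kind:
            return kind, "".join(text), i + 1  # closing special character
        else:
            text.append(ch)                    # includes a lone backslash
            i += 1
    return kind, "".join(text), i              # unterminated: runs to the end


def render(occurrences, literals):
    """Place parsed literals (kind -> text) around one entity's occurrences.
    Assumption: ` is emitted before | for the same occurrence; / is not modeled."""
    if not occurrences:
        return literals.get("~", "")           # ~ : entity does not occur
    out = [literals.get('"', "")]              # " : before the first occurrence
    last = len(occurrences) - 1
    for n, value in enumerate(occurrences):
        out.append(literals.get("'", ""))      # ' : before each
        out.append(value)
        out.append(literals.get("`", ""))      # ` : after each
        if n < last:
            out.append(literals.get("|", ""))  # | : in between
    out.append(literals.get("+", ""))          # + : after the last
    return "".join(out)
```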
|
* substring selector

Introduced by a colon ':', it has the form :l or :o.l, where o and
l are integers denoting an offset and length to cut from the currently
selected value.
|
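A sketch of the substring selector under two assumptions the text does not settle: the offset o counts from 0, and a length running past the end simply stops at the end of the value. `substring` is a hypothetical name.

```python
import re


def substring(value, spec):
    """Apply ':l' or ':o.l' to value (assumption: offset counts from 0)."""
    m = re.fullmatch(r":(?:(\d+)\.)?(\d+)", spec)
    if not m:
        raise ValueError("not a substring selector: " + spec)
    offset = int(m.group(1) or 0)      # o is optional and defaults to 0
    length = int(m.group(2))
    return value[offset:offset + length]
```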
|
* extension exits

An exit is a C function (i.e., using the C calling convention) in a dynamic library.
TODO: describe interface.
|
|
* script evaluation

If a scripting environment like Tcl is available,
a {} block may contain a script to be evaluated.
TODO: describe interface.
|
|
---
$Id: Views.txt,v 1.3 2003/06/02 07:49:08 kripke Exp $