Using views in OpenIsis.

A "view", like a VIEW in SQL, creates new, typically temporary records based on
existing ones by means of some transformation like selecting a subset of the
available fields (a projection), retagging fields or manipulating field values.


As a general concept, a view can be implemented using any algorithm
in any of the available programming languages to create new records
(and need not only refer to record contents, but may also access other
resources like files).

In a narrower sense, however, a view is a special kind of transformation
defined by a "view record". The fields of a view record have tags
as they should appear in the target, typically some valid tags of the source
plus, for example, index control tags, if the view describes indexing.


In the following, the term "alphanumeric" denotes any ASCII letter or digit,
or any non-ASCII character.
"Word character" denotes any alphanumeric, hyphen '-' or underscore '_'.


The value of a view record field can have one of several forms:
- if it is empty,
  the tag is passed to the source record's v command (see below).
- if it starts with a %,
  the rest of the value (without the %) is passed to the source record's v command.
  If the tag is not 0, '=tag;' is prepended.
- if it starts with any word character,
  it is used literally.
- if it starts with a quote,
  the rest of the value is used literally (without the quote).
  If the value's last character is a quote, it is discarded.
- if it starts with an @,
  the rest of the value names a view to be included.
- if it starts with an &,
  the rest of the value is the name of an extension exit to call.
- if it starts with a {,
  the rest of the value is a script to be executed in the host language
  (after stripping an optional } as last character).
- any other form
  (i.e. starting with other ASCII punctuation) is reserved for future use.

Example: the view
$
24
70
$
is a simple projection selecting fields 24 and 70 from the source.
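
The value forms listed above can be sketched as a small classifier.
This is only an illustration in Python, not OpenIsis code: the function
names and the returned kind labels are made up for this sketch, and it
assumes that "a quote" means either ' or " and that a closing quote must
match the opening one.

```python
def is_word_char(ch):
    # "alphanumeric" = ASCII letter or digit, or any non-ASCII character;
    # a word character is an alphanumeric, '-' or '_'.
    return ch in "-_" or ord(ch) > 127 or ch.isalnum()


def classify_view_value(tag, value):
    """Return a (kind, payload) pair for one field of a view record.

    The kind labels are hypothetical names used only in this sketch.
    """
    if value == "":
        # empty value: the tag itself is the format for the v command
        return ("format", str(tag))
    first, rest = value[0], value[1:]
    if first == "%":
        # pass the rest to the v command, prepending '=tag;' unless tag is 0
        prefix = "" if str(tag) == "0" else "=%s;" % tag
        return ("format", prefix + rest)
    if is_word_char(first):
        return ("literal", value)
    if first in ("'", '"'):
        # assumption: a trailing quote is dropped only if it matches
        if rest.endswith(first):
            rest = rest[:-1]
        return ("literal", rest)
    if first == "@":
        return ("include", rest)
    if first == "&":
        return ("exit", rest)
    if first == "{":
        if rest.endswith("}"):
            rest = rest[:-1]
        return ("script", rest)
    # other leading characters are reserved for future use
    return ("reserved", value)
```

With this sketch, the projection example corresponds to
classify_view_value(24, "") and classify_view_value(70, ""), each of which
yields a format consisting of just the tag.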


* the v command

The v command is described here as an abstract command.
It is available in the C-API as well as from the language bindings,
possibly with language-specific variations.

It resembles the core concepts of traditional formatting,
including access to and looping over fields and subfields,
selecting substrings and attaching optional literals.
It is sort of the record's printf.
Like printf, and unlike traditional formatting,
it neither supports flow control nor screen rendering.


It takes a source and target record plus a string specifying a format.
Depending on the language environment, the source and/or target may be implicit.

If the format starts with '=tag;', where tag is a tag,
this gives the tag used in the target and as default.
Otherwise, tags from the source are used in the target and the default is *.

The first (next) character is then checked for an encoding mode, see below.
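
How the head of a format is taken apart can be sketched as follows.
This is a Python illustration under assumptions, not the OpenIsis parser:
the function name and the returned tuple are invented here, and the
regular expression only approximates "word characters" as defined above.

```python
import re

# '=tag;' prefix; [\w-] only approximates the document's word characters
_TARGET_PREFIX = re.compile(r"=([\w-]+);")


def parse_format_head(fmt):
    """Split a format into (target_tag, default_tag, encoding, rest)."""
    target_tag, default_tag = None, "*"
    match = _TARGET_PREFIX.match(fmt)
    if match:
        # the given tag is used in the target and as the default
        target_tag = default_tag = match.group(1)
        fmt = fmt[match.end():]
    encoding = None
    # the first (next) character may select an encoding mode
    if fmt[:1] in ("?", "!", "&", "%"):
        encoding, fmt = fmt[0], fmt[1:]
    return target_tag, default_tag, encoding, fmt
```

A target tag of None stands for "use the tags from the source" in this
sketch.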


The format is a series of output specifications,
consisting of a field tag (word characters, either numerical or by field name),
selectors and modifiers. The special tag * selects all fields.
Each spec may contain several subspecs, separated by commas,
using the same child context (otherwise, specs and subspecs are the same).
So the format is spec[;spec...], and a spec is subspec[,subspec...].


The general operation of the v command is to loop over the record
until the last occurrence has been seen for all tags.
In the nth repetition, for each tag in any spec,
the (n+i)th occurrence of a field with this tag is used,
where i is an offset given by an occurrence selector,
and it is determined whether this is the last occurrence.
For every iteration, a new output field is started,
and the format is processed as follows:
- loop over the (main) specifications
  - loop over children (or use the given field)
    - loop over subspecs
      - loop over subfields (or use the whole field)
        - apply decoding
        - apply substring
        - apply encoding
        - attach literals
- append the result to the target record
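
The outer repetition can be illustrated with a heavily simplified sketch.
It is written in Python under strong assumptions and is not the actual
implementation: records are modelled as lists of (tag, value) pairs, the
function name is invented, every occurrence offset i is taken as 0, and
children, subfields, decoding, encoding and literals are left out.

```python
def v_loop(record, tags, target_tag):
    """Emit one output field per repetition, using the nth occurrence
    of each requested tag until all tags are exhausted."""
    occurrences = {
        tag: [value for (field_tag, value) in record if field_tag == tag]
        for tag in tags
    }
    # repeat until the last occurrence has been seen for all tags
    repetitions = max((len(v) for v in occurrences.values()), default=0)
    target = []
    for n in range(repetitions):
        # a new output field is started for every iteration
        parts = [
            occurrences[tag][n]
            for tag in tags
            if n < len(occurrences[tag])
        ]
        target.append((target_tag, "".join(parts)))
    return target
```

For a record with one field 24 and two fields 70, this yields two output
fields: the first combines the first occurrences of both tags, the second
holds only the second occurrence of 70.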


Each spec starts with an optional decoding mode,
optionally followed by a tag,
optionally followed by a child selector,
optionally followed by a subfield selector,
optionally followed by string modifiers,
optionally intermingled with occurrence selectors and literals:
- , starts a new subspec
- ; starts a new spec with default context reset to the last tag seen
- . starts a child selector
- ^% start a subfield selector
- ([ start an occurrence selector
- /~"'`|+ start a literal
- : starts a substring selector
- & calls an extension
- { evaluates a script


* encoding mode

One of the following operators as first character of the format
can select an output "encoding":
- ? outputs a 1 if the selected entity exists, 0 otherwise
- ! the opposite of ?
- & applies HTML encoding
- % applies URL encoding

The test encodings ?! inhibit normal processing;
they return immediately after checking the first occurrence of the first tag.
For example, using a default of all tags (*), the format consisting
solely of a '?' checks whether a record is empty.

More special characters (but not the '*') may be designated in the future,
so a format should always start with a tag (possibly explicit *).
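
The four encoding modes can be pictured with Python's standard library.
This is only a sketch: the function name is invented, and the exact
escaping rules OpenIsis applies for HTML and URL encoding are assumed to
resemble html.escape and urllib.parse.quote, which may not hold in detail.

```python
from html import escape
from urllib.parse import quote


def apply_encoding(mode, values):
    """values holds the selected strings; an empty list means the
    selected entity does not exist."""
    if mode == "?":
        return "1" if values else "0"
    if mode == "!":
        return "0" if values else "1"
    text = "".join(values)
    if mode == "&":
        return escape(text)          # HTML encoding
    if mode == "%":
        return quote(text, safe="")  # URL encoding
    return text                      # no encoding mode given
```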


* decoding mode

An uppercase character before the tag may denote a decoding mode:
$
- H heading mode:
  ^x is replaced by ';' for x=a, ',' for x=b..i, '.' for others
  angle brackets are removed (>< replaced by '; '), <a> or <a=b> evaluates to a

- D data mode:
  in addition to heading mode, if there is no explicit literal after this field,
  append ' ', if it ends in "punctuation", or '. ' else.

- X index mode
  like heading, but <a> evaluates to nothing and <a=b> to b

- M traditional
  For compatibility, specs reading MHx or MDx (x = L or U) set heading
  or data mode, resp., as default processing (before substringing).
  The case directive is ignored.
$


* child selector

If a tag is immediately followed by a dot '.' and an optional tag,
field context is switched, for this spec and following specs separated by ',',
to loop over the children with the given tag.
Tag defaults to 0, selecting text nodes in the canonical XML representation.
A * selects all children, a second . recursively selects all children.


* subfield selectors

The primary subfield selector is the hat '^', followed by one character.
It can produce multiple items, like repetitions of a subfield or keywords.

If the selector character is
- alphanumeric,
  it selects the (repetitions of the) subfield tagged with this character.
- an opening pairing brace,
  i.e. one of '(', '{', '[' or the angle bracket '<',
  it selects the words between pairs of this brace (commonly keywords).
- a *,
  it selects the part up to the first subfield delimiter.
- a space,
  it selects naive words as sequences of alphanumerics.
- a ),
  it selects parts between TABs (array mode).
- other punctuation
  like / or |, it selects parts between pairs of this character.
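
The two simplest hat selectors, an alphanumeric subfield character and
the *, can be sketched like this. The Python function below is an
illustration only: its name is invented, and it assumes that subfields
are introduced by a '^' followed by a one-character subfield tag.

```python
def select_subfield(value, selector):
    """Return the list of items selected from one field value."""
    head, *subfields = value.split("^")
    if selector == "*":
        # the part up to the first subfield delimiter
        return [head]
    # all repetitions of the subfield tagged with this character
    return [sf[1:] for sf in subfields if sf[:1] == selector]
```

The brace, space, ) and punctuation selectors are not covered by this
sketch.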


The percent sign '%' (think printf) works basically like the hat, but
- removes quotes surrounding values
- by default treats the TAB as subfield delimiter
- if followed by a punctuation character or space,
  treats this plus surrounding whitespace as delimiter,
  not separating within quotes.
- if followed by a ),
  (optionally after another punctuation) goes to array mode,
  that is, no subfield indicator is stripped from the values
- if followed by multiple word characters,
  (including '-' and '_', optionally after an initial punctuation)
  searches for subfields starting with that sequence followed by '=' or ':'

Examples:
- '^)' splits at TABs
- '%)' splits at TABs with quote removal
- '%a' selects a sequence following a TAB and 'a'
- '%,)' splits a line of comma-separated values
- '%;*' selects the primary value of a MIME property
- '%;charset' selects the charset attribute of a MIME property


* occurrence selector

By default, all occurrences of fields, children and subfields are used.
One or multiple occurrences can be selected explicitly following a tag,
child selector or subfield selector using brackets [] (counting from 1)
or parentheses (counting from 0) like (i) or (i..j).

- If i is omitted, it defaults to the first (1 or 0, resp.).
- If j is omitted, it defaults to the last.
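
The two counting conventions can be compared in a short Python sketch.
It is not taken from OpenIsis; the function name is invented, and it
assumes that j is inclusive and that a selector without '..' picks
exactly one occurrence.

```python
def select_occurrences(items, selector):
    """Apply a selector such as '[2]', '[2..3]', '(1..)' or '[..2]'."""
    if selector[:1] + selector[-1:] not in ("[]", "()"):
        raise ValueError("not an occurrence selector: %r" % selector)
    base = 1 if selector[0] == "[" else 0  # brackets count from 1
    i, dots, j = selector[1:-1].partition("..")
    first = int(i) - base if i else 0      # i defaults to the first
    if not dots:
        last = first                       # single occurrence
    else:
        last = int(j) - base if j else len(items) - 1  # j defaults to last
    if first < 0:
        raise ValueError("index below the first occurrence")
    return items[first:last + 1]
```

So '[2]' and '(1)' address the same occurrence, differing only in where
counting starts.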

Alternatively, occurrences may be selected by contents.
The general format is an optional subfield selector,
followed by a comparison operator, followed by a literal.
Only occurrences where the field or specified subfield matches
the literal according to the comparison are selected.
Parentheses select all such occurrences,
while brackets select the first match
and default to the first occurrence if none matches.

Operators are
- = for equality
- ~ for contains
- * for starts with
- + for ends with
The equality operator may be omitted where unambiguous.
If some key subfield is known to occur at the start or end of the field,
it is probably more efficient to test for +^zen than for ^z=en.
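
Selection by contents can be sketched as follows. This Python fragment is
an illustration under assumptions: the names are invented, the comparison
is applied to the whole field value only (the optional subfield selector
is left out), and the operator is passed separately instead of being
parsed from a selector string.

```python
_COMPARISONS = {
    "=": lambda value, literal: value == literal,
    "~": lambda value, literal: literal in value,
    "*": lambda value, literal: value.startswith(literal),
    "+": lambda value, literal: value.endswith(literal),
}


def select_by_content(occurrences, op, literal, brackets=False):
    """Parentheses (brackets=False) select all matches; brackets select
    the first match and fall back to the first occurrence."""
    test = _COMPARISONS[op]
    matches = [value for value in occurrences if test(value, literal)]
    if not brackets:
        return matches
    if matches:
        return matches[:1]
    return occurrences[:1]
```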


* literals

Each tag, child or subfield selector may be followed by one or more literals.
Every literal but the / extends to the next occurrence of the same
special character by which it is introduced.
This special character may be escaped using a backslash.
A literal backslash may be escaped as two (but need not, except at the end).

The special character governs when and where the literal is output:
- " before the first occurrence
  (of the entity in question; i.e. field, child or subfield)
- ' before each
- ` after each
- | in between (after each but the last)
- + after the last
- / this single-character literal starts a new output field after each occurrence
- ~ this literal is used if the given entity does NOT occur

Literals are not subject to the string modifiers.
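
Where each kind of literal ends up can be shown with a small Python
sketch. It is an illustration only: the function name is invented,
literals are passed as a dict keyed by their introducing character, the
relative order of the "after each" and "in between" literals is an
assumption, and the / literal is not handled.

```python
def attach_literals(values, literals):
    """Join the selected values, placing literals around them."""
    if not values:
        # '~' literal: used if the entity does not occur
        return literals.get("~", "")
    out = [literals.get('"', "")]            # before the first occurrence
    last = len(values) - 1
    for n, value in enumerate(values):
        out.append(literals.get("'", ""))    # before each
        out.append(value)
        out.append(literals.get("`", ""))    # after each
        # '|' after each but the last, '+' after the last
        out.append(literals.get("|" if n < last else "+", ""))
    return "".join(out)
```

For two values "a" and "b" with the literals "Authors: " before the
first, "; " in between and "." after the last, this produces
"Authors: a; b.".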


* substring selector

Introduced by a colon ':', it has the form :l or :o.l, where o and
l are integers denoting an offset and length to cut from the currently
selected value.
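
The two forms can be sketched in Python as below. This is only an
illustration: the function name is invented, and it assumes that the
offset counts characters from 0 and that :l cuts from the start of the
value.

```python
def apply_substring(value, selector):
    """Apply a substring selector of the form ':l' or ':o.l'."""
    if not selector.startswith(":"):
        raise ValueError("not a substring selector: %r" % selector)
    offset, dot, length = selector[1:].partition(".")
    if not dot:
        # form ':l': only a length is given, offset assumed to be 0
        offset, length = "0", offset
    start = int(offset)
    return value[start:start + int(length)]
```

With these assumptions, ':4' applied to "OpenIsis" gives "Open" and
':4.4' gives "Isis".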


* extension exits

An exit is a C-function (i.e., using C calling convention) in a dynamic library.
TODO: describe interface.


* script evaluation

If a scripting environment like Tcl is available,
a {} block may contain a script to be evaluated.
TODO: describe interface.


---
$Id: Views.txt,v 1.3 2003/06/02 07:49:08 kripke Exp $