* supported features

As of version 0.8.6, openisis supports only a *very* limited
subset of the formatting language.
It should, however, be sufficient for the most important purposes:
building indexes and basic preprocessing.

- all kinds of literals (',",|)
- all modes (P,D,H)
- Vn[]^c[] field selector including repeated subfields
- implicit and explicit loops

The rest of this text describes a merger of formatting
features from both WinISIS and CISIS/wwwisis,
which openisis may attempt to support one day.


* formatting basics

Formatting in openisis is separated into two tasks
that used to be mixed up in traditional ISIS software.
- record processing
  In openisis, execution of a ("print"-)format actually transforms
  one or more records into one new ("output") record.
  It loops and selects fields, applies Mxx modes and REFs other records,
  but screen formatting directives are simply added as special fields.
  In terms of relational databases, a format defines a view.
- screen rendering
  It is then the task of separate rendering engines to turn those
  printformat fields into well-indented plaintext or HTML or Postscript
  or TeX or Windows GDI commands or you name it.


* elements of formatting

A formatting expression is a series of literals and functions.
Functions use zero, one or more data items as parameters.
Functions that expect one of their parameters to the left are called operators.

* input types

The expected type of a function parameter can be one of:
- s  string (any value, auto-stringified and auto-concatenated)
- n  numeric expression (including string, boolean)
- o  output (auto-stringified, but not concatenated)
- v  variable or iterator
- r  row iterator (list of rowids)

Immediately after the function name, the following can be used:
- c  a single character
- a  alphanumeric bareword (identifier in the C language)
- i  integer literal
- x  anything


* output types

The output of an expression is zero, one or more fields,
each of which can be string, numeric or value.
The type of the last field determines which operators are recognized.

Field tags are positive for values (i.e. the output of value iterators),
zero for literals and negative in the range -1 .. -999 for printformats.
Other negative tags are only temporary:
value iterators are finally evaluated to emit values,
conditional literals and numbers are later eliminated or
changed to zero, resulting in a literal.


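The tag ranges above can be sketched as a small classifier. This is only
an illustration in Python, not openisis code; the function name
classify_tag and the returned labels are made up for this example.

```python
def classify_tag(tag):
    """Classify an output field by its tag, following the ranges in the text.

    The name and the returned labels are hypothetical, chosen for illustration.
    """
    if tag > 0:
        return "value"          # output of a value iterator
    if tag == 0:
        return "literal"
    if -999 <= tag <= -1:
        return "printformat"
    return "temporary"          # any other negative tag is resolved later


for tag in (24, 0, -1, -999, -1000):
    print(tag, classify_tag(tag))
```
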
* context

During processing, we have the following context:
- o  output record (not changed)
- r  input record/s, db and loop; changed by REF, LR and loops
- f  format, mode and variables; changed by @, Mcc
- x  frame: function, signature, parameter type and position;
     new frames are opened for functions, operators and blocks


* show stoppers: ( ) , .. ]

During processing, the format string is scanned left to right
and fields are appended to the output record as encountered.
Function "calls" can be explicit (the parameter list is enclosed in parentheses)
or implicit, taking only one parameter (operators and literals like V24, X3).

An opening parenthesis following an operator or literal function makes the
frame explicit. A parameter type of i or a is changed to n or s, resp.
Otherwise a ( starts
- in numeric context: an anonymous (n) arithmetic parenthesis
- in string context: an S(s_) function
- in output context inside a loop: an indentation function
- otherwise: an explicit loop

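The decision about what an opening parenthesis starts can be written down
as a plain lookup. The following Python sketch only restates the rules
above; the function names, the argument names and the returned labels are
assumptions made for this example, not openisis identifiers.

```python
# i and a parameter types become n and s once the frame is made explicit.
PROMOTED = {"i": "n", "a": "s"}


def promote(param_type):
    """Return the parameter type to use inside an explicit frame."""
    return PROMOTED.get(param_type, param_type)


def open_paren(after_operator_or_literal, context, in_loop):
    """Say what a '(' starts.

    context is one of the type letters from the text ("n", "s", "o", ...);
    all names and labels here are hypothetical.
    """
    if after_operator_or_literal:
        return "explicit frame"
    if context == "n":
        return "arithmetic parenthesis"
    if context == "s":
        return "S(s_) function"
    if context == "o" and in_loop:
        return "indentation function"
    return "explicit loop"


print(open_paren(False, "o", False))
```
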
A closing parenthesis closes whatever was opened by the last opening one.
A comma ',' or range '..' closes a parameter,
which in turn closes an operator or saturated implicit loop.
If there was no parameter to close, an empty string or the number 0 is emitted,
thus [..] is equivalent to [0..0].


* functions and type coercion

Every function has a signature which denotes the types of the
expected parameters.
Functions and switches are put on the stack, and the type expected
from the following expressions is set according to their signature.
Whenever a comma or the closing parenthesis is encountered,
the fields added for this parameter are converted to the expected type.
This is not much work for numeric or boolean types,
since expressions are evaluated while parsing.
Where a string is expected, we might have a record (i.e. multiple strings),
which is then collapsed as with the S function:
all printformats are discarded and other strings are concatenated.
When a function sees its closing ')', it replaces the parameters
pushed on the output record by the function value calculated from them.
The return value is tagged with the first data tag encountered,
even from records evaluating to empty.

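The collapsing step can be illustrated with a few lines of Python.
This is a sketch under assumptions, not the openisis implementation:
fields are modelled as (tag, text) pairs, "data tag" is taken to mean a
positive value tag, and the result falls back to tag 0 when no data tag
was seen. The names is_printformat and collapse are made up.

```python
def is_printformat(tag):
    """Printformats carry tags in the range -1 .. -999."""
    return -999 <= tag <= -1


def collapse(fields):
    """Collapse a list of (tag, text) fields into one tagged string.

    Printformat fields are dropped and all other texts are concatenated.
    Assumption: a "data tag" is a positive tag; 0 is used if none occurs.
    """
    text = "".join(value for tag, value in fields if not is_printformat(tag))
    first_data = next((tag for tag, _ in fields if tag > 0), 0)
    return first_data, text


record = [(24, "foo"), (-1, ""), (0, ", "), (26, "bar")]
print(collapse(record))
```
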
The signature is denoted by a string, where
- the 1st char gives the type of the left hand operand ('_' for none)
- the 2nd char is a digit giving the number of required params,
  or '_' for specials
- the following chars give the types of each parameter in turn
- a trailing '_' denotes that the last type may be repeated


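A signature string of this shape is easy to take apart. The Python sketch
below is an illustration only; the Signature type, the parse_signature
function and the example strings "_2sn", "_1s_" and "s1n" are invented
here and are not taken from the openisis sources.

```python
from typing import NamedTuple, Optional, Tuple


class Signature(NamedTuple):
    left: Optional[str]       # type of the left hand operand, None for '_'
    required: Optional[int]   # number of required params, None for specials
    params: Tuple[str, ...]   # parameter types in order
    repeat_last: bool         # True if the last type may be repeated


def parse_signature(sig):
    """Split a signature string into its parts (hypothetical helper)."""
    if len(sig) < 2:
        raise ValueError("signature needs at least two characters")
    left = None if sig[0] == "_" else sig[0]
    required = None if sig[1] == "_" else int(sig[1])
    rest = sig[2:]
    repeat_last = rest.endswith("_")
    if repeat_last:
        rest = rest[:-1]
    return Signature(left, required, tuple(rest), repeat_last)


print(parse_signature("_2sn"))   # e.g. a two-parameter function like LEFT(s,n)
print(parse_signature("s1n"))    # e.g. a string operator with one numeric param
```
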
* record loops: "x" |x| + (s_) WHILE n (s_) CONTINUE BREAK OCC IOCC

A loop is either started explicitly by a '(' in record context,
or implicitly by a conditional literal or a V value iterator.
An explicit loop is closed by the matching ')',
while an implicit one ends when saturated (after the first iterator),
with a comma ',' or with any other V iterator.
The loop content is then repeatedly executed while incrementing
the OCC counter from 1 (and possibly subject to a preceding WHILE).
On each turn there is a "last" flag (initially true), which is cleared by any
iterator that expects to have more fields (this may even be the case if
there was no OCCth field, e.g. if the OCCth field didn't have some subfield).
Consequently, if there are no iterators, the first turn is the last.
Iterators also set or clear a "had" state (initially unknown).
During each execution of the loop, we're then in string type context:
- a "" conditional emits its contents only on OCC=1
  if before the first iterator, else on last=true
- a || conditional emits its contents on had!=false
- a + undoes an immediately preceding conditional on OCC==1
  and sets had to false if last is true
- an iterator adds the OCCth of (the selected) occurrences (see below)
- if an iterator has no OCCth occurrence, an immediately preceding
  || is undone and had is set to false
- other tokens are processed normally


* field iterator and operators: Vi Di Ni [n_] ^c

A field clause is always evaluated in a loop over OCC.
The iterator can be modified by the range and subfield operators.
Which occurrences are in question depends on the selected options:
field occ range, subfield selector and subfield range.
The integer list n_ may be given as x..y, where a missing x is 1 and
a missing y or the keyword LAST means up to the last occurrence.
If no range is specified, all occurrences are used, as with [..].
If a subfield is specified, that subfield is selected.
If a subfield range is given, those occurrences of the subfield are used:
if no range is specified for the field, subfield occs are relative
to the record, else relative to each individual field occ,
so use V71[..]^a[1] to get the first occ of a in each occ of V71.
With 71=^afoo^ax, 71=^abar^ay and 71=^abaz^az,
V71^a[1..3] will give the first three total occurrences of subfield a,
i.e. foo, x and bar, whereas in CISIS it would give foo, bar and baz.

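The difference between record-relative and field-relative subfield ranges
can be replayed on the example data above. The following Python sketch is
not openisis code: the helper names parse_range, subfields and select, the
list-of-strings record model and the treatment of a single index such as
[1] as the range 1..1 are assumptions made for illustration.

```python
def parse_range(spec, last):
    """Turn 'x..y' into an inclusive (lo, hi) pair.

    A missing x is 1; a missing y or the keyword LAST means `last`.
    Assumption: a single index such as '1' stands for 1..1.
    """
    lo_text, sep, hi_text = spec.partition("..")
    lo = int(lo_text) if lo_text.strip() else 1
    if not sep:
        hi = lo
    elif hi_text.strip() in ("", "LAST"):
        hi = last
    else:
        hi = int(hi_text)
    return lo, min(hi, last)


def subfields(value, code):
    """Return all values of subfield `code` within one field occurrence."""
    return [part[1:] for part in value.split("^")[1:] if part[:1] == code]


def select(occs, code, field_range=None, sub_range=None):
    """Pick subfield occurrences from the occurrences of one field."""
    if field_range is None:
        # No field range: subfield occs are counted across the whole record.
        found = [s for value in occs for s in subfields(value, code)]
        if sub_range is not None:
            lo, hi = parse_range(sub_range, len(found))
            found = found[lo - 1:hi]
        return found
    # Field range given: subfield occs are counted within each field occ.
    lo, hi = parse_range(field_range, len(occs))
    found = []
    for value in occs[lo - 1:hi]:
        subs = subfields(value, code)
        if sub_range is not None:
            sub_lo, sub_hi = parse_range(sub_range, len(subs))
            subs = subs[sub_lo - 1:sub_hi]
        found.extend(subs)
    return found


v71 = ["^afoo^ax", "^abar^ay", "^abaz^az"]
print(select(v71, "a", sub_range="1..3"))              # V71^a[1..3]
print(select(v71, "a", field_range="..", sub_range="1"))  # V71[..]^a[1]
```

The first call counts subfield a across the record and yields foo, x and
bar; the second counts within each occurrence and yields foo, bar and baz.
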
* other values and operators on values: Ei Si :=

* operators on string: *n .n (n) (n,n)

The * and . string operators modify the top (the last pushed field)
by removing the first n chars or all but the first n chars, resp.
A () indentation operator is a somewhat late indentation printformat,
which exchanges itself with the previous field.

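In terms of ordinary string slicing, the two operators behave as in this
small Python sketch. The function names star and dot are made up, and
clamping a negative n to 0 is an assumption, since the text does not say
what a negative count should do.

```python
def star(text, n):
    """*n: remove the first n chars of the top field."""
    return text[max(n, 0):]


def dot(text, n):
    """.n: remove all but the first n chars of the top field."""
    return text[:max(n, 0)]


print(star("openisis", 4))
print(dot("openisis", 4))
```
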
* arithmetic operators on numbers: * / + -
* relational operators on numbers: = <> < <= > >=
* relational operators on strings: = <> < <= > >= :
* boolean operators on numbers: AND NOT OR

* literals: 123 123.45 'x' "x" |x| /*x*/ !cxc
* switches: IF n THEN o ELSE o FI SELECT s CASE s: o ELSECASE o ENDSEL
* format: @a MPL MPU MHL MHU MDL MDU

* db: DB MSTNAME MFN[i] L(s) NPST(s) NPOST(s) LR(s) LR(s[,n,n]) REF(r,o)
* external db: L->a(s) LR->a(s) NPOST->a(s) REF->a(r,o)

* printformats
  # / % { } !cxc () QC QJ B I UL Ci Xi Fi FSi CLi NEWLINE(s) LW(n) PICT(s)
  M(n[,n]) TAB[i] BOX[i] NP[i] NC[i] BPICT(s[,n])
  FONTS(a_) COLS(s_) LINK(s_)

* other functions
  &a(s_) CAT(s) GETENV(s) PUTENV(s) SYSTEM(s) DATE[i] PROC(s)
  DATETIME DATEONLY VAL(s) RMAX(n_) RMIN(n_) RSUM(n_) RAVR(n_)
  LEFT(s,n) RIGHT(s,n) SS(n,n,s) MID(s,n,n) REPLACE(s,s,s) INSTR(s,s) SIZE(s)
  F(n) F(n,n) F(n,n,n) TYPE(s) TYPE(s,s) S(s_) LAST
  NOCC(s_) P(s_) A(s_)

* unsupported syntax
  L([s]s) NPOST([s]s) REF([s]j,...) -- hopeless, use WinISIS notation


* list of tokens by syntax

- empty tokens
  stopper: , .. ) ] THEN ELSE FI CASE ELSECASE ENDSEL CONTINUE BREAK
  state:   MPL MPU MHL MHU MDL MDU
  values:  DB MSTNAME OCC IOCC # / % { } QC QJ B I UL DATETIME DATEONLY LAST
- immediate literal
  i MFN[i] DATE[i] TAB[i] BOX[i] NP[i] NC[i] Ci Xi Fi FSi CLi Vi Di Ni Ei Si
  @a L->a(s) LR->a(s) NPOST->a(s) REF->a(r,o)
  ^c 'x' "x" |x| /*x*/ !cxc
- syntax blocks (jump & run)
  o ( o ) WHILE n ( o ) REF( r, o )
  IF n THEN o ELSE o FI SELECT s CASE s: o ELSECASE o ENDSEL
- operators
  [n..n] ^c (n[,n]) := . * / + - = <> < <= > >= : AND OR NOT
- all others are properly braced functions
- ambiguous tokens
  S F              might be Si, Fi or S(), F()
  / + *            might be arithmetic or string operators or a newline
  = <> < <= > >=   might compare strings or numbers
  :=               assigns number to Ei, string else
  (                opens one or the other frame ...


* tokenizing & processing

* read a token and literal
- get (longest matching) token
- if token accepts a literal, get the literal
- if token accepts an opening (, get it
- resolve S/F ambiguity syntactically depending on presence of i literal

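The reading steps above can be sketched in Python as follows. This is a
toy illustration, not the openisis tokenizer: the token table is a small
hand-picked subset, the names TOKENS, TAKES_INT and read_token are made
up, only integer literals are handled, and consuming an opening
parenthesis is left out.

```python
import re

# A small, hand-picked subset of tokens (assumption, for illustration only).
TOKENS = ["<=", "<>", "<", ">=", ">", ":=", ":", "=", "..", ".",
          "FS", "F", "S", "V", "MFN", "MPL"]
TAKES_INT = {"V", "S", "F", "FS"}    # tokens that accept an i literal
BY_LENGTH = sorted(TOKENS, key=len, reverse=True)
DIGITS = re.compile(r"\d+")


def read_token(text, pos=0):
    """Read one token at pos; return (kind, literal, new_pos)."""
    # Trying longer tokens first gives the longest match.
    for tok in BY_LENGTH:
        if text.startswith(tok, pos):
            break
    else:
        raise ValueError("no token at position %d" % pos)
    pos += len(tok)
    literal = None
    if tok in TAKES_INT:
        match = DIGITS.match(text, pos)
        if match:
            literal = int(match.group())
            pos = match.end()
    # S and F are ambiguous: with an integer literal they are Si / Fi,
    # without one they are the functions S() / F().
    if tok in ("S", "F"):
        kind = tok + "i" if literal is not None else tok + "()"
    else:
        kind = tok
    return kind, literal, pos


for sample in ("<=3", "FS12", "S3", "S(V24)", "V24^a"):
    print(sample, read_token(sample))
```
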
* process
- if token is possibly an operator of higher precedence,
  check operator ambiguities,
  coerce field according to the operator's wishes
  and open the operator frame
- else
  coerce field according to frame context
- if we had a field in o-context or token is a stopper,
  close the parameter
- if the frame is implicit and saturated or token is a frame closer,
  close the frame and start over processing
- add the token or literal