* supported features

As of version 0.8.6, openisis supports only a *very* limited
subset of the formatting language.
It should, however, be sufficient for the most important purposes:
building indexes and basic preprocessing.

- all kinds of literals (',",|)
- all modes (P,D,H)
- Vn[]^c[] field selector including repeated subfields
- implicit and explicit loops

The rest of this text describes a merger of formatting
features from both WinISIS and CISIS/wwwisis,
which openisis may attempt to support one day.


* formatting basics

Formatting in openisis is separated into two tasks
that used to be mixed up in traditional ISIS software:
- record processing
  In openisis, execution of a ("print"-)format actually transforms
  one or more records into one new ("output") record.
  It loops and selects fields, applies Mxx modes and REFs other records,
  but screen formatting directives are simply added as special fields.
  In terms of relational databases, a format defines a view.
- screen rendering
  It is then the task of separate rendering engines to turn those
  printformat fields into well-indented plain text, HTML, PostScript,
  TeX, Windows GDI commands, or you name it.


* elements of formatting

A formatting expression is a series of literals and functions.
Functions take zero, one or more data items as parameters.
Functions that expect one of their parameters on the left are called operators.

* input types

The expected type of a function parameter can be one of:
- s string (any type, auto-stringified and auto-concatenated)
- n numeric expression (including string, boolean)
- o output (auto-stringified, but not concatenated)
- v variable or iterator
- r row iterator (list of rowids)

Immediately after the function name, the following can be used:
- c a single character
- a alphanumeric bareword (identifier in the C language)
- i integer literal
- x anything


* output types

The output of an expression is zero, one or more fields,
each of which can be a string, a number or a value.
The type of the last field determines which operators are recognized.

Field tags are positive for values (i.e. the output of value iterators),
zero for literals and negative in the range -1 .. -999 for printformats.
Other negative tags are only temporary:
value iterators are finally evaluated to emit values, while
conditional literals and numbers are later eliminated or
changed to zero, resulting in a literal.


* context

During processing, we have the following context:
o output record (not changed)
r input record(s), db and loop; changed by REF, LR and loops
f format, mode and variables; changed by @ and Mcc
x frame: function, signature, parameter type and position
  new frames are opened for functions, operators and blocks


* show stoppers: ( ) , .. ]

During processing, the format string is scanned left to right
and fields are appended to the output record as they are encountered.
Function "calls" can be explicit (the parameter list is enclosed in parentheses)
or implicit, taking only one parameter (operators and literal functions like V24, X3).

An opening parenthesis following an operator or literal function makes the
frame explicit. A parameter type of i or a is then changed to n or s, respectively.
Otherwise a ( starts
- in numeric context: an anonymous (n) arithmetic parentheses
- in string context: an S(s_) function
- in output context inside a loop: an indentation function
- otherwise: an explicit loop

A closing parenthesis closes whatever was opened by the last opening one.
A comma ',' or range '..' closes a parameter,
which in turn closes an operator or a saturated implicit loop.
If there was no parameter to close, an empty string or 0 number is emitted;
thus [..] is equivalent to [0..0].


* functions and type coercion

Every function has a signature which denotes the types of the
expected parameters.
Functions and switches are put on the stack, and the type expected
from the following expressions is set according to their signature.
Whenever a comma or the closing parenthesis is encountered,
the fields added for this parameter are converted to the expected type.
This is not much work for numeric or boolean types,
since expressions are evaluated while parsing.
Where a string is expected, we might have a record (i.e. multiple strings),
which is then collapsed as with the S function:
all printformats are discarded and the other strings are concatenated.
When a function sees its closing ')', it replaces the parameters
pushed onto the output record by the function value calculated from them.
The return value is tagged with the first data tag encountered,
even from records evaluating to empty.

The signature is denoted by a string, where
- the 1st char gives the left-hand operand ('_' for none)
- the 2nd char is the number of required params as a digit, or '_' for specials
- the following chars give the types of each parameter in turn
- a trailing '_' denotes that the last type may be repeated


* record loops: "x" |x| + (s_) WHILE n (s_) CONTINUE BREAK OCC IOCC

A loop is either started explicitly by a '(' in record context,
or implicitly by a conditional literal or a V value iterator.
An explicit loop is closed by the matching ')',
while an implicit one ends when saturated (after the first iterator),
with a comma ',' or with any other V iterator.
The loop content is then executed repeatedly while incrementing
the OCC counter from 1 (possibly subject to a preceding WHILE).
On each turn there is a "last" flag (initially true), which is cleared by any
iterator that expects to have more fields (this may happen even if
there was no OCCth field, e.g. if the OCCth field didn't have some subfield).
Consequently, if there are no iterators, the first turn is the last.
Iterators also set or clear a "had" state (initially unknown).
During each execution of the loop, we are in string type context:
- a "" conditional emits its contents only on OCC=1
  if before the first iterator, else on last=true
- a || conditional emits its contents on had!=false
- a + undoes an immediately preceding conditional on OCC==1
  and sets had to false if last is true
- an iterator adds the OCCth of (the selected) occurrences (see below)
- if an iterator has no OCCth occurrence, an immediately preceding
  || is undone and had is set to false
- other tokens are processed normally


* field iterator and operators: Vi Di Ni [n_] ^c
A field clause is always evaluated in a loop over OCC.
The iterator can be modified by the range and subfield operators.
Which occurrences are used depends on the selected options:
field occ range, subfield selector and subfield range.
The integer list n_ may be given as x..y, where a missing x means 1 and
a missing y or the keyword LAST means up to the last occurrence.
If no range is specified, all occurrences are used, as with [..].
If a subfield is specified, that subfield is selected.
If a subfield range is given, those occurrences of the subfield are used:
if no range is specified for the field, subfield occs are counted relative
to the record, else relative to each individual field occ,
so use V71[..]^a[1] to get the first occ of ^a in each occ of V71.
With 71=^afoo^ax, 71=^abar^ay and 71=^abaz^az,
V71^a[1..3] gives the first three occurrences of subfield a in the whole record,
i.e. foo, x and bar, whereas in CISIS it would give foo, bar and baz.

* other values and operators on values: Ei Si :=

* operators on string: *n .n (n) (n,n)
The * and . string operators modify the top (the last pushed field)
by removing the first n chars or all but the first n chars, respectively.
The () indentation operator is a somewhat late indentation printformat,
which exchanges itself with the previous field.

* arithmetic operators on numbers: * / + -
* relational operators on numbers: = <> < <= > >=
* relational operators on strings: = <> < <= > >= :
* boolean operators on numbers: AND NOT OR

* literals: 123 123.45 'x' "x" |x| /*x*/ !cxc
* switches: IF n THEN o ELSE o FI SELECT s CASE s: o ELSECASE o ENDSEL
* format: @a MPL MPU MHL MHU MDL MDU

* db: DB MSTNAME MFN[i] L(s) NPST(s) NPOST(s) LR(s) LR(s[,n,n]) REF(r,o)
* external db: L->a(s) LR->a(s) NPOST->a(s) REF->a(r,o)

* printformats
# / % { } !cxc () QC QJ B I UL Ci Xi Fi FSi CLi NEWLINE(s) LW(n) PICT(s)
M(n[,n]) TAB[i] BOX[i] NP[i] NC[i] BPICT(s[,n])
FONTS(a_) COLS(s_) LINK(s_)

* other functions
&a(s_) CAT(s) GETENV(s) PUTENV(s) SYSTEM(s) DATE[i] PROC(s)
DATETIME DATEONLY VAL(s) RMAX(n_) RMIN(n_) RSUM(n_) RAVR(n_)
LEFT(s,n) RIGHT(s,n) SS(n,n,s) MID(s,n,n) REPLACE(s,s,s) INSTR(s,s) SIZE(s)
F(n) F(n,n) F(n,n,n) TYPE(s) TYPE(s,s) S(s_) LAST
NOCC(s_) P(s_) A(s_)

* unsupported syntax
L([s]s) NPOST([s]s) REF([s]j,...) -- hopeless, use WinISIS notation



* list of tokens by syntax

- empty tokens
  stopper: , .. ) ] THEN ELSE FI CASE ELSECASE ENDSEL CONTINUE BREAK
  state: MPL MPU MHL MHU MDL MDU
  values: DB MSTNAME OCC IOCC # / % { } QC QJ B I UL DATETIME DATEONLY LAST
- immediate literal
  i MFN[i] DATE[i] TAB[i] BOX[i] NP[i] NC[i] Ci Xi Fi FSi CLi Vi Di Ni Ei Si
  @a L->a(s) LR->a(s) NPOST->a(s) REF->a(r,o)
  ^c 'x' "x" |x| /*x*/ !cxc
- syntax blocks (jump & run)
  o ( o ) WHILE n ( o ) REF( r, o )
  IF n THEN o ELSE o FI SELECT s CASE s: o ELSECASE o ENDSEL
- operators
  [n..n] ^c (n[,n]) := . * / + - = <> < <= > >= : AND OR NOT
- all others are properly braced functions
- ambiguous tokens
  S F might be Si, Fi or S(), F()
  / + * might be arithmetic or string operators or a newline
  = <> < <= > >= might compare strings or numbers
  := assigns a number to Ei, a string otherwise
  ( opens one or the other frame ...



* tokenizing & processing

* read a token and literal
- get the (longest matching) token
- if the token accepts a literal, get the literal
- if the token accepts an opening (, get it
- resolve the S/F ambiguity syntactically, depending on the presence of an i literal

* process
- if the token is possibly an operator of higher precedence,
  check operator ambiguities,
  coerce the field according to the operator's wishes
  and open the operator frame
- else
  coerce the field according to the frame context
- if we had a field in o-context or the token is a stopper,
  close the parameter
- if the frame is implicit and saturated or the token is a frame closer,
  close the frame and start processing over
- add the token or literal