/openisis/current/doc/formatting.txt

Revision 237, Mon Mar 8 17:43:12 2004 UTC, by dpavlin
File MIME type: text/plain
File size: 10148 byte(s)
initial import of openisis 0.9.0 vendor drop

* supported features

As of version 0.8.6, openisis supports only a *very* limited
subset of the formatting language.
It should, however, be sufficient for the most important purposes:
building indexes and basic preprocessing.

- all kinds of literals (', ", |)
- all modes (P, D, H)
- Vn[]^c[] field selector, including repeated subfields
- implicit and explicit loops

The rest of this text describes a merger of the formatting
features of both WinISIS and CISIS/wwwisis,
which openisis may attempt to support one day.


* formatting basics

Formatting in openisis is separated into two tasks
that used to be mixed up in traditional ISIS software:
- record processing
  In openisis, executing a ("print") format actually transforms
  one or more records into one new ("output") record.
  It loops over and selects fields, applies Mxx modes and REFs other records,
  but screen formatting directives are simply added as special fields.
  In terms of relational databases, a format defines a view.
- screen rendering
  It is then the task of separate rendering engines to turn those
  printformat fields into well-indented plain text, HTML, PostScript,
  TeX, Windows GDI commands or you name it.


* elements of formatting

A formatting expression is a series of literals and functions.
Functions take zero, one or more data items as parameters.
Functions that expect one of their parameters on the left are called operators.

* input types

The expected type of a function parameter can be one of:
- s string (any type, auto-stringified and auto-concatenated)
- n numeric expression (including string, boolean)
- o output (auto-stringified, but not concatenated)
- v variable or iterator
- r row iterator (list of rowids)

Immediately after the function name, the following can be used:
- c a single character
- a alphanumeric bareword (identifier in the C language)
- i integer literal
- x anything


* output types

The output of an expression is zero, one or more fields,
which can be string, numeric or value.
The type of the last field determines which operators are recognized.

Field tags are positive for values (i.e. the output of value iterators),
zero for literals and negative in the range -1 .. -999 for printformats.
Other negative tags are only temporary:
value iterators are finally evaluated to emit values,
conditional literals and numbers are later eliminated or
changed to zero, resulting in a literal.
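
To make the tag ranges concrete, here is a minimal Python sketch that classifies an integer tag by the convention just described. The names `tag_kind` and `is_final` and the label strings are hypothetical and not taken from the openisis sources.

```python
def tag_kind(tag: int) -> str:
    """Classify an output-record field tag by the ranges described above."""
    if tag > 0:
        return "value"        # output of a value iterator
    if tag == 0:
        return "literal"
    if -999 <= tag <= -1:
        return "printformat"  # screen formatting directive for a renderer
    return "temporary"        # must be resolved before the output is final


def is_final(tags: list[int]) -> bool:
    """True once no temporary (below -999) tags remain in the record."""
    return all(tag_kind(t) != "temporary" for t in tags)
```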


* context

During processing, we have the following context:
o output record (not changed)
r input record/s, db and loop; changed by REF, LR and loops
f format, mode and variables; changed by @, Mcc
x frame: function, signature, parameter type and position;
  new frames are opened for functions, operators and blocks


* show stoppers: ( ) , .. ]

During processing, the format string is scanned left to right
and fields are appended to the output record as they are encountered.
Function "calls" can be explicit (the parameter list is enclosed in parentheses)
or implicit, taking only one parameter (operators and literals like V24, X3).

An opening parenthesis following an operator or literal function makes the
frame explicit. A parameter type of i or a is changed to n or s, respectively.
Otherwise a ( starts
- in numeric context: an anonymous (n) arithmetic parenthesis
- in string context: an S(s_) function
- in output context inside a loop: an indentation function
- otherwise: an explicit loop

A closing parenthesis closes whatever was opened by the last opening one.
A comma ',' or range '..' closes a parameter,
which also closes an operator or a saturated implicit loop.
If there was no parameter to close, an empty string or the number 0 is emitted;
thus [..] is equivalent to [0..0].
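
The choice of frame for an opening '(' and the default for an empty parameter can be modelled as follows. This is an illustrative Python sketch that assumes the parser already knows its current context and whether it is inside a loop; `Ctx`, `open_paren` and `close_parameter` are hypothetical names, not openisis functions.

```python
from enum import Enum


class Ctx(Enum):
    NUMERIC = "n"
    STRING = "s"
    OUTPUT = "o"


def open_paren(after_operator_or_literal: bool, ctx: Ctx, in_loop: bool) -> str:
    """Decide which frame an opening '(' starts, following the rules above."""
    if after_operator_or_literal:
        # the pending operator/literal frame becomes explicit (i/a params become n/s)
        return "explicit frame"
    if ctx is Ctx.NUMERIC:
        return "arithmetic parenthesis"
    if ctx is Ctx.STRING:
        return "S(s_) function"
    if ctx is Ctx.OUTPUT and in_loop:
        return "indentation function"
    return "explicit loop"


def close_parameter(fields: list, expected: Ctx) -> list:
    """Close a parameter on ',' '..' or ')'; an empty one yields 0 or ""."""
    if fields:
        return fields
    return [0] if expected is Ctx.NUMERIC else [""]
```

In this model, both empty bounds of [..] close with `close_parameter([], Ctx.NUMERIC)`, which yields 0 for each, matching the [0..0] equivalence.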


* functions and type coercion

Every function has a signature which denotes the types of the
expected parameters.
Functions and switches are put on the stack, and the type expected
from the following expressions is set according to their signature.
Whenever a comma or the closing parenthesis is encountered,
the fields added for this parameter are converted to the expected type.
This is not much work for numeric or boolean types,
since expressions are evaluated while parsing.
Where a string is expected, we might have a record (i.e. multiple strings),
which is then collapsed as with the S function:
all printformats are discarded and the other strings are concatenated.
When a function sees its closing ')', it replaces the parameters
pushed on the output record by the function value calculated from them.
The return value is tagged with the first data tag encountered,
even from records evaluating to empty.
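
A possible Python sketch of this string coercion, with fields modelled as hypothetical (tag, value) pairs. The text does not define "data tag" precisely; the sketch assumes it means the first positive (value) tag and falls back to 0, the literal tag. `collapse_to_string` is a hypothetical name.

```python
def collapse_to_string(fields: list[tuple[int, object]]) -> tuple[int, str]:
    """Collapse (tag, value) fields into one string field, as with S(...).

    Printformat fields (tags -1 .. -999) are discarded; all other values
    are stringified and concatenated. The result carries the first
    positive tag seen (assumed to be the "data tag"), else 0.
    """
    tag = 0
    parts = []
    for t, value in fields:
        if tag == 0 and t > 0:
            tag = t  # kept even if this field contributes an empty string
        if -999 <= t <= -1:
            continue  # screen formatting directive: dropped in string context
        parts.append(str(value))
    return tag, "".join(parts)
```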

The signature is denoted by a string, where
- the 1st char gives the left-hand operand ('_' for none)
- the 2nd char is a digit giving the number of required params, or '_' for specials
- the following chars give the type of each parameter in turn
- a trailing '_' denotes that the last type may be repeated
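
A small Python sketch of a parser for this signature encoding. The strings used in the checks, such as "_2sn" for something shaped like LEFT(s,n) or "s1n" for an operator taking a string on its left, are hypothetical illustrations of the encoding, not signatures copied from openisis; `Signature` and `parse_signature` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signature:
    left: Optional[str]      # type of the left-hand operand, None if not an operator
    required: Optional[int]  # number of required params, None for "specials"
    types: str               # parameter types in order
    repeat_last: bool        # trailing '_': the last type may repeat

    def param_type(self, index: int) -> str:
        """Type expected for the parameter at 0-based position `index`."""
        if index < len(self.types):
            return self.types[index]
        if self.repeat_last and self.types:
            return self.types[-1]
        raise IndexError(f"too many parameters: {index + 1}")


def parse_signature(sig: str) -> Signature:
    if len(sig) < 2:
        raise ValueError("a signature needs at least two characters")
    left = None if sig[0] == "_" else sig[0]
    required = None if sig[1] == "_" else int(sig[1])
    types = sig[2:]
    repeat_last = types.endswith("_")
    if repeat_last:
        types = types[:-1]
    return Signature(left, required, types, repeat_last)
```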


* record loops: "x" |x| + (s_) WHILE n (s_) CONTINUE BREAK OCC IOCC

A loop is either started explicitly by a '(' in record context,
or implicitly by a conditional literal or a V value iterator.
An explicit loop is closed by the matching ')',
while an implicit one ends when saturated (after the first iterator),
with a comma ',' or with any other V iterator.
The loop content is then repeatedly executed while incrementing
the OCC counter from 1 (possibly subject to a preceding WHILE).
On each turn there is a "last" flag (initially true), which is cleared by any
iterator that expects to have more fields (this may even be the case if
there was no OCCth field, e.g. if the OCCth field didn't have some subfield);
consequently, if there are no iterators, the first turn is the last.
Iterators also set or clear a "had" state (initially unknown).
During each execution of the loop, we're then in string type context:
- a "" conditional emits its contents only on OCC=1
  if it is before the first iterator, else on last=true
- a || conditional emits its contents on had!=false
- a + undoes an immediately preceding conditional on OCC==1
  and sets had to false if last is true
- an iterator adds the OCCth of the (selected) occurrences (see below)
- if an iterator has no OCCth occurrence, an immediately preceding
  || is undone and had is set to false
- other tokens are processed normally
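
The turn-by-turn rules can be traced with a simplified Python model. It is one reading of the rules above, not openisis code: it only handles "" and || conditionals, the + modifier and V iterators over whole field occurrences (a dict from tag to occurrence list), and it omits WHILE, CONTINUE, BREAK and subfields. The token encoding and the name `run_loop` are hypothetical.

```python
def run_loop(tokens: list[tuple], fields: dict[int, list[str]]) -> str:
    """Run one simplified record loop and return the concatenated output.

    Hypothetical token encoding:
      ("dq", text)   a "text" conditional literal
      ("bar", text)  a |text| repeatable literal
      ("plus",)      the + modifier
      ("v", tag)     a Vtag iterator over whole field occurrences
    """
    first_iter = next((i for i, t in enumerate(tokens) if t[0] == "v"), len(tokens))
    out: list[str] = []
    occ = 1
    while True:
        last = True   # cleared by any iterator that has more occurrences
        had = None    # None means "unknown"
        prev = None   # (kind, index in out it emitted or None) of the previous token
        for i, tok in enumerate(tokens):
            kind, emitted = tok[0], None
            if kind == "dq":
                before_iter = i < first_iter
                if (before_iter and occ == 1) or (not before_iter and last):
                    emitted = len(out)
                    out.append(tok[1])
            elif kind == "bar":
                if had is not False:
                    emitted = len(out)
                    out.append(tok[1])
            elif kind == "plus":
                # undo an immediately preceding conditional on the first turn
                if occ == 1 and prev and prev[0] in ("dq", "bar") and prev[1] is not None:
                    out.pop(prev[1])
                if last:
                    had = False
            elif kind == "v":
                occs = fields.get(tok[1], [])
                if len(occs) > occ:
                    last = False  # more occurrences to come
                if occ <= len(occs):
                    emitted = len(out)
                    out.append(occs[occ - 1])
                    had = True
                else:
                    # no OCCth occurrence: undo a preceding |..| and clear had
                    if prev and prev[0] == "bar" and prev[1] is not None:
                        out.pop(prev[1])
                    had = False
            prev = (kind, emitted)
        if last:
            break
        occ += 1
    return "".join(out)
```

Under this reading, both V10+|; | and |; |+V10 put the separator only between occurrences: with occurrences a, b and c they produce "a; b; c".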


* field iterator and operators: Vi Di Ni [n_] ^c

A field clause is always evaluated in a loop over OCC.
The iterator can be modified by the range and subfield operators.
Which occurrences are in question depends on the selected options:
field occurrence range, subfield selector and subfield range.
The integer list n_ may be given as x..y, where a missing x means 1 and
a missing y or the keyword LAST means up to the last occurrence.
If no range is specified, that means all, like [..].
If a subfield is specified, that subfield is selected.
If a subfield range is given, those occurrences of the subfield are used:
if no range is specified for the field, subfield occurrences are relative
to the record, else relative to each individual field occurrence,
so use V71[..]^a[1] to get the first occurrence of a in each occurrence of v71.
With 71=^afoo^ax, 71=^abar^ay and 71=^abaz^az,
V71^a[1..3] will give the first three total occurrences of subfield a,
i.e. foo, x and bar, whereas in CISIS it would give foo, bar and baz.
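
The following Python sketch reproduces the V71 example. It assumes that fields are stored as raw strings like ^afoo^ax, that subfield codes match case-sensitively, and that a subfield given without a range selects all its occurrences. `pick`, `subfields` and `occurrences` are hypothetical names. A range is an (x, y) pair in which None (or 0, matching [..] = [0..0]) marks a missing bound.

```python
from typing import Optional

Range = tuple[Optional[int], Optional[int]]  # (x, y); None or 0 = missing


def pick(items: list, rng: Optional[Range]) -> list:
    """Select 1-based inclusive x..y; missing x is 1, missing y is the last."""
    if rng is None:
        return list(items)
    x, y = rng
    lo = x or 1
    hi = y or len(items)
    return items[lo - 1:hi]


def subfields(field: str, code: str) -> list[str]:
    """Values of subfield `code` in a field like '^afoo^ax'."""
    return [part[1:] for part in field.split("^")[1:] if part[:1] == code]


def occurrences(fields: list[str], field_range: Optional[Range] = None,
                code: Optional[str] = None,
                sub_range: Optional[Range] = None) -> list[str]:
    """Occurrences selected by V..[field_range]^code[sub_range]."""
    if code is None:
        return pick(fields, field_range)
    if field_range is None:
        # no field range: subfield occurrences are counted over the whole record
        every = [v for f in fields for v in subfields(f, code)]
        return pick(every, sub_range)
    # with a field range: the subfield range applies within each selected field
    return [v for f in pick(fields, field_range)
            for v in pick(subfields(f, code), sub_range)]
```

Here V71^a[1..3] corresponds to `occurrences(v71, code="a", sub_range=(1, 3))` and V71[..]^a[1] to `occurrences(v71, field_range=(None, None), code="a", sub_range=(1, 1))`.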

* other values and operators on values: Ei Si :=

* operators on string: *n .n (n) (n,n)
The * and . string operators modify the top (the last pushed field)
by removing the first n chars or all but the first n chars, respectively.
An () indentation operator is a somewhat late indentation printformat,
which swaps places with the previous field.
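
A minimal Python sketch of the two string operators and of the late () indentation swap. Fields are plain strings here, and `star`, `dot`, `apply_to_top` and `push_late_indent` are hypothetical names.

```python
def star(s: str, n: int) -> str:
    """*n : drop the first n characters."""
    return s[max(n, 0):]


def dot(s: str, n: int) -> str:
    """.n : keep only the first n characters."""
    return s[:max(n, 0)]


def apply_to_top(fields: list[str], op, n: int) -> None:
    """String operators act on the top, i.e. the last pushed field."""
    fields[-1] = op(fields[-1], n)


def push_late_indent(fields: list[str], indent: str) -> None:
    """(n): push an indentation printformat, then swap it with the previous field."""
    fields.append(indent)
    if len(fields) >= 2:
        fields[-1], fields[-2] = fields[-2], fields[-1]
```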

* arithmetic operators on numbers: * / + -
* relational operators on numbers: = <> < <= > >=
* relational operators on strings: = <> < <= > >= :
* boolean operators on numbers: AND NOT OR

* literals: 123 123.45 'x' "x" |x| /*x*/ !cxc
* switches: IF n THEN o ELSE o FI SELECT s CASE s: o ELSECASE o ENDSEL
* format: @a MPL MPU MHL MHU MDL MDU

* db: DB MSTNAME MFN[i] L(s) NPST(s) NPOST(s) LR(s) LR(s[,n,n]) REF(r,o)
* external db: L->a(s) LR->a(s) NPOST->a(s) REF->a(r,o)

* printformats
# / % { } !cxc () QC QJ B I UL Ci Xi Fi FSi CLi NEWLINE(s) LW(n) PICT(s)
M(n[,n]) TAB[i] BOX[i] NP[i] NC[i] BPICT(s[,n])
FONTS(a_) COLS(s_) LINK(s_)

* other functions
&a(s_) CAT(s) GETENV(s) PUTENV(s) SYSTEM(s) DATE[i] PROC(s)
DATETIME DATEONLY VAL(s) RMAX(n_) RMIN(n_) RSUM(n_) RAVR(n_)
LEFT(s,n) RIGHT(s,n) SS(n,n,s) MID(s,n,n) REPLACE(s,s,s) INSTR(s,s) SIZE(s)
F(n) F(n,n) F(n,n,n) TYPE(s) TYPE(s,s) S(s_) LAST
NOCC(s_) P(s_) A(s_)

* unsupported syntax
L([s]s) NPOST([s]s) REF([s]j,...) -- hopeless, use WinISIS notation


* list of tokens by syntax

- empty tokens
  stopper: , .. ) ] THEN ELSE FI CASE ELSECASE ENDSEL CONTINUE BREAK
  state: MPL MPU MHL MHU MDL MDU
  values: DB MSTNAME OCC IOCC # / % { } QC QJ B I UL DATETIME DATEONLY LAST
- immediate literal
  i MFN[i] DATE[i] TAB[i] BOX[i] NP[i] NC[i] Ci Xi Fi FSi CLi Vi Di Ni Ei Si
  @a L->a(s) LR->a(s) NPOST->a(s) REF->a(r,o)
  ^c 'x' "x" |x| /*x*/ !cxc
- syntax blocks (jump & run)
  o ( o ) WHILE n ( o ) REF( r, o )
  IF n THEN o ELSE o FI SELECT s CASE s: o ELSECASE o ENDSEL
- operators
  [n..n] ^c (n[,n]) := . * / + - = <> < <= > >= : AND OR NOT
- all others are properly braced functions
- ambiguous tokens
  S F might be Si, Fi or S(), F()
  / + * might be arithmetic or string operators or a newline
  = <> < <= > >= might compare strings or numbers
  := assigns a number to Ei, a string otherwise
  ( opens one or the other frame ...


* tokenizing & processing

* read a token and literal
- get the (longest matching) token
- if the token accepts a literal, get the literal
- if the token accepts an opening (, get it
- resolve the S/F ambiguity syntactically, depending on the presence of an i literal
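
The read step can be sketched in Python as a longest-match scan over a token table, followed by an optional immediate integer literal and the S/F resolution. The table is only a small subset of the tokens listed above, matching is assumed to be case-insensitive, and `TOKENS`, `TAKES_INT` and `read_token` are hypothetical names.

```python
from typing import Optional

# small subset of the token list; longest tokens are tried first
TOKENS = sorted(
    ["MPL", "MPU", "MHL", "MHU", "MDL", "MDU", "MFN", "MSTNAME", "DB",
     "NPOST", "NPST", "NP", "NC", "NOCC", "..", ":=", "<=", ">=", "<>",
     "(", ")", ",", "[", "]", "^", ".", "*", "/", "+", "-", "=", "<", ">",
     "V", "D", "N", "E", "S", "F", "X", "C"],
    key=len, reverse=True)

# tokens that take an immediate integer literal (Vi Di Ni Ei Si Fi Xi Ci)
TAKES_INT = {"V", "D", "N", "E", "S", "F", "X", "C"}


def read_token(text: str, pos: int) -> tuple[str, Optional[int], int]:
    """Return (token, integer literal or None, position after them)."""
    upper = text.upper()  # assumption: tokens are case-insensitive
    for tok in TOKENS:
        if upper.startswith(tok, pos):
            end = pos + len(tok)
            break
    else:
        raise ValueError(f"unknown token at {pos}: {text[pos:pos + 8]!r}")
    literal = None
    if tok in TAKES_INT:
        digits_end = end
        while digits_end < len(text) and text[digits_end].isdigit():
            digits_end += 1
        if digits_end > end:
            literal = int(text[end:digits_end])
            end = digits_end
    if tok in ("S", "F"):
        # S/F ambiguity: with an i literal it is Si/Fi, else the S()/F() function
        tok += "i" if literal is not None else "()"
    return tok, literal, end
```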

* process
- if the token is possibly an operator of higher precedence,
  check operator ambiguities,
  coerce the field according to the operator's wishes
  and open the operator frame
- else
  coerce the field according to the frame context
- if we had a field in o-context or the token is a stopper,
  close the parameter
- if the frame is implicit and saturated or the token is a frame closer,
  close the frame and start over processing
- add the token or literal
