As of version 0.8.6, openisis supports only a *very* limited
subset of the formatting language.
It should, however, be sufficient for the most important purposes:
building indexes and basic preprocessing.

Supported so far:
- all kinds of literals (',",|)
- Vn[]^c[] field selector including repeated subfields
- implicit and explicit loops

The rest of this text describes a kind of merger of formatting
features from both WinISIS and CISIS/wwwisis,
which openisis may attempt to support one day.

Formatting in openisis is separated into two tasks
that used to be mixed up in traditional ISIS software.

In openisis, execution of a ("print"-)format actually transforms
one or more records into one new ("output") record.
It loops and selects fields, applies Mxx modes, and REFs other records,
but screen formatting directives are simply added as special fields.
In terms of relational databases, a format defines a view.

It is then the task of separate rendering engines to turn those
printformat fields into well-indented plaintext or HTML or PostScript
or TeX or Windows GDI commands or you name it.

* elements of formatting

A formatting expression is a series of literals and functions.
Functions use zero, one or more data items as parameters.
Functions that expect one of the parameters to the left are called operators.

The expected type of a function parameter can be one of:
- s string (auto-concatenated auto-stringified any)
- n numeric expression (including string, boolean)
- o output (auto-stringified, but not concatenated)
- v variable or iterator
- r row iterator (list of rowids)

Immediately after the function name, the following can be used:
- c a single character
- a alphanumeric bareword (identifier in the C language)

The output of an expression is zero, one or more fields,
which can be string, numeric or value.
The type of the last field determines which operators are recognized.

Field tags are positive for values (i.e. the output of value iterators),
zero for literals and negative in the range -1 .. -999 for printformats.
Other negative tags are only temporary:
value iterators are finally evaluated to emit values,
conditional literals and numbers are later eliminated or
changed to zero, resulting in a literal.
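
The tag ranges above can be written down as a small classifier.
This is only a sketch in Python, not taken from the openisis sources;
the name classify_tag and the returned labels are made up for illustration.

```python
def classify_tag(tag: int) -> str:
    """Classify an output field by its tag, following the ranges in the text.

    The function name and the label strings are hypothetical.
    """
    if tag > 0:
        return "value"          # output of a value iterator
    if tag == 0:
        return "literal"
    if -999 <= tag <= -1:
        return "printformat"    # left for the rendering engine
    return "temporary"          # resolved or eliminated before the record is final


if __name__ == "__main__":
    for t in (24, 0, -1, -999, -1000):
        print(t, classify_tag(t))
```

A renderer would presumably only ever see the first three kinds,
since temporary tags are gone by the time the output record is complete.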

During processing, we have the following context:
- o output record (not changed)
- r input record/s, db and loop; changed by REF, LR and loops
- f format, mode and variables; changed by @, Mcc
- x frame: function, signature, parameter type and position;
  new frames are opened for functions, operators and blocks

* show stoppers: ( ) , .. ]

During processing the format string is scanned left to right
and fields are appended to the output record as encountered.
Function "calls" can be explicit (the parameter list is enclosed in parentheses),
or implicit, taking only one parameter (operators and literals like V24, X3).

An opening parenthesis following an operator or literal function makes the
frame explicit. A parameter type of i or a is changed to n or s, resp.

- in numeric context: an anonymous (n) arithmetic parenthesis
- in string context: an S(s_) function
- in output context inside a loop: an indentation function
- otherwise starts an explicit loop

A closing parenthesis closes whatever was opened by the last opening one.
A comma ',' or range '..' closes a parameter,
which closes an operator or saturated implicit loop.
If there was no parameter to close, an empty string or the number 0 is emitted,
thus [..] is equivalent to [0..0].

* functions and type coercion

Every function has a signature which denotes the types of its parameters.

Functions and switches are put on the stack; the type expected
from the following expressions is set according to their signature.
Whenever a comma or the closing parenthesis is encountered,
the fields added for this parameter are converted to the expected type.
This is not much work for numeric or boolean types,
since expressions are evaluated while parsing.
Where a string is expected, we might have a record (i.e. multiple strings),
which is then collapsed as with the S function:
all printformats are discarded and other strings are concatenated.
When a function sees its closing ')', it replaces the parameters
pushed on the output record by the function value calculated from them.
The return value is tagged with the first data tag encountered,
even from records evaluating to empty.
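
The collapsing step for string parameters might look roughly like this.
It is a Python sketch under assumptions: fields are modelled as (tag, text)
pairs, the printformat range -1 .. -999 is taken from the tag description
earlier in this text, and the name collapse is invented here.

```python
def collapse(fields):
    """Collapse a run of (tag, text) fields into one string, S-function style.

    Printformat fields (tags -1 .. -999) are dropped; everything else is
    stringified and concatenated in order. Hypothetical helper, not the
    openisis implementation.
    """
    return "".join(str(text) for tag, text in fields if not (-999 <= tag <= -1))


if __name__ == "__main__":
    # Tag 24 is a data field, 0 a literal, -5 stands in for some printformat.
    params = [(24, "foo"), (-5, "ignored"), (0, ", "), (24, "bar")]
    print(collapse(params))
```

Tagging of the result (the "first data tag encountered" rule) is left out of the sketch.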

The signature is denoted by a string, where
- the 1st char gives the left-hand operand ('_' for none)
- the 2nd char is a digit giving the number of required params, or '_' for specials
- the following chars give the type of each parameter in turn
- a trailing '_' denotes that the last type may be repeated
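
To make the signature string concrete, here is a small Python sketch of a parser for it.
The layout follows the four rules above; the example strings ("_2ro", "_0s_", "s1n")
are guesses at what signatures for REF(r,o), S(s_) and the string operator *n could
look like, not values taken from openisis. All names are invented for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Signature(NamedTuple):
    left: Optional[str]        # type of the left-hand operand, None if '_'
    required: Optional[int]    # number of required params, None for specials
    types: Tuple[str, ...]     # declared parameter types, in order
    repeat_last: bool          # True if the last type may be repeated


def parse_signature(sig: str) -> Signature:
    """Split a signature string into its parts (hypothetical helper)."""
    if len(sig) < 2:
        raise ValueError("signature needs at least two characters")
    left = None if sig[0] == "_" else sig[0]
    required = None if sig[1] == "_" else int(sig[1])
    rest = sig[2:]
    repeat_last = rest.endswith("_")
    if repeat_last:
        rest = rest[:-1]
    return Signature(left, required, tuple(rest), repeat_last)


def expected_type(sig: Signature, position: int) -> Optional[str]:
    """Type expected for the parameter at 0-based position, or None if there is none."""
    if position < len(sig.types):
        return sig.types[position]
    if sig.repeat_last and sig.types:
        return sig.types[-1]
    return None


if __name__ == "__main__":
    for s in ("_2ro", "_0s_", "s1n"):
        print(s, parse_signature(s))
```

With such a table, the coercion at each comma only needs expected_type for the current parameter position.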

* record loops: "x" |x| + (s_) WHILE n (s_) CONTINUE BREAK OCC IOCC

A loop is either started explicitly by a '(' in record context,
or implicitly by a conditional literal or a V value iterator.
An explicit loop is closed by the matching ')',
while an implicit one ends when saturated (after the first iterator),
with a comma ',' or any other V iterator.
The loop content is then repeatedly executed while incrementing
the OCC counter from 1 (and possibly subject to a preceding WHILE).
On each turn there is a "last" flag (initially true), which is cleared by any
iterator that expects to have more fields (this may even be the case if
there was no OCCth field, e.g. if the OCCth field didn't have some subfield).
Consequently, if there are no iterators, the first turn is the last.
Iterators also set or clear a "had" state (initially unknown).
During each execution of the loop, we're then in string type context:
- a "" conditional emits its contents only on OCC=1
  if before the first iterator, else on last=true
- a || conditional emits its contents on had!=false
- a + undoes an immediately preceding conditional on OCC==1
  and sets had to false if last is true
- an iterator adds the OCCth of (the selected) occurrences (see below)
- if an iterator has no OCCth occurrence, an immediately preceding
  || is undone and had is set to false
- other tokens are processed normally
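
The loop rules above are easier to follow when played through.
The following Python sketch is one possible reading of them, nothing more:
the item kinds ("cond" for "", "rep" for ||, "plus", "field"), the record layout
(tag mapped to a list of occurrences) and the name run_loop are all assumptions,
and WHILE, subfields and "other tokens" are left out. Cases the rules do not
spell out are simply whatever falls out of the code.

```python
def run_loop(items, record):
    """Execute one record loop over `items` and return the concatenated output.

    items:  list of (kind, arg) with kind in "cond", "rep", "plus", "field"
    record: dict mapping a field tag to its list of occurrences
    Hypothetical model of the rules in the text, not openisis code.
    """
    out = []
    occ = 0
    while True:
        occ += 1
        last, had, seen_iter = True, None, False
        prev = None  # (kind, index in out) of a conditional the previous item emitted
        for kind, arg in items:
            if kind in ("cond", "rep"):
                if kind == "cond":
                    emit = last if seen_iter else occ == 1
                else:
                    emit = had is not False
                prev = (kind, len(out)) if emit else None
                if emit:
                    out.append(arg)
            elif kind == "plus":
                if occ == 1 and prev is not None:
                    del out[prev[1]]      # undo the conditional just emitted
                if last:
                    had = False
                prev = None
            elif kind == "field":
                seen_iter = True
                occs = record.get(arg, [])
                if occ <= len(occs):
                    out.append(occs[occ - 1])
                    had = True
                else:
                    if prev is not None and prev[0] == "rep":
                        del out[prev[1]]  # undo an immediately preceding ||
                    had = False
                if len(occs) > occ:
                    last = False          # this iterator expects more fields
                prev = None
        if last:
            return "".join(out)


if __name__ == "__main__":
    rec = {70: ["a", "b", "c"]}
    # Roughly  "Authors: "V70+|; |
    print(run_loop([("cond", "Authors: "), ("field", 70), ("plus", None), ("rep", "; ")], rec))
    # Roughly  |; |+V70
    print(run_loop([("rep", "; "), ("plus", None), ("field", 70)], rec))
```

Both formats produce a "; " only between occurrences: in the suffix form the + clears
had on the last turn, in the prefix form it undoes the separator on the first turn.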

* field iterator and operators: Vi Di Ni [n_] ^c
A field clause is always evaluated in a loop over OCC.
The iterator can be modified by the range and subfield operators.
Which occurrences are in question depends on the selected options:
field occ range, subfield selector and subfield range.
The integer list n_ may be given as x..y, where a missing x is 1 and
a missing y or the keyword LAST means up to the last occurrence.
If no range is specified, that means all, like [..].
If a subfield is specified, that subfield is selected.
If a subfield range is given, those occurrences of the subfield are used:
if no range is specified for the field, subfield occs are relative
to the record, else relative to each individual field occ,
so use V71[..]^a[1] to get the first occ of a in each occ of v71.
With 71=^afoo^ax, 71=^abar^ay and 71=^abaz^az,
V71^a[1..3] will give the first three total occurrences of subfield a,
i.e. foo, x and bar, whereas in CISIS it would give foo, bar and baz.
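
The difference between the two subfield range modes can be shown with the example
record from the text. This is a Python sketch; the helpers subfields and select and
the way ranges are passed as (first, last) tuples are assumptions made for the
illustration, with last=None standing for "up to the last occurrence".

```python
def subfields(value, code):
    """All occurrences of subfield `code` in one field value such as '^afoo^ax'."""
    return [part[1:] for part in value.split("^")[1:] if part[:1] == code]


def cut(items, rng):
    """Apply a 1-based inclusive (first, last) range; None means no range given."""
    if rng is None:
        return items
    first, last = rng
    return items[first - 1:last]


def select(field_occs, code, field_range=None, sub_range=None):
    """Select subfield occurrences as described in the text (hypothetical helper).

    Without a field range, the subfield range counts across the whole record;
    with one, it counts within each selected field occurrence.
    """
    if field_range is None:
        flat = [s for occ in field_occs for s in subfields(occ, code)]
        return cut(flat, sub_range)
    result = []
    for occ in cut(field_occs, field_range):
        result.extend(cut(subfields(occ, code), sub_range))
    return result


if __name__ == "__main__":
    v71 = ["^afoo^ax", "^abar^ay", "^abaz^az"]
    print(select(v71, "a", sub_range=(1, 3)))                            # V71^a[1..3]
    print(select(v71, "a", field_range=(1, None), sub_range=(1, 1)))     # V71[..]^a[1]
```

The first call yields foo, x, bar and the second foo, bar, baz, matching the example above.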

* other values and operators on values: Ei Si :=

* operators on string: *n .n (n) (n,n)
The * and . string operators modify the top (the last pushed field)
by removing the first n chars or all but the first n chars, resp.
A () indentation operator is a somewhat late indentation printformat,
which exchanges itself with the previous field.
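
In terms of plain string slicing, the * and . operators amount to the following.
This is a Python sketch; the function names are invented, and treating a negative n
as 0 is an assumption the text does not cover.

```python
def drop_first(s: str, n: int) -> str:
    """Effect of *n: remove the first n chars of the top field."""
    return s[max(n, 0):]


def keep_first(s: str, n: int) -> str:
    """Effect of .n: remove all but the first n chars of the top field."""
    return s[:max(n, 0)]


if __name__ == "__main__":
    top = "openisis"
    print(drop_first(top, 4))                  # like V24*4
    print(keep_first(top, 4))                  # like V24.4
    print(keep_first(drop_first(top, 4), 2))   # like V24*4.2
```

Applied in sequence, the two give a substring: skip n chars, then keep the next m.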

* arithmetic operators on numbers: * / + -
* relational operators on numbers: = <> < <= > >=
* relational operators on strings: = <> < <= > >= :
* boolean operators on numbers: AND NOT OR

* literals: 123 123.45 'x' "x" |x| /*x*/ !cxc
* switches: IF n THEN o ELSE o FI SELECT s CASE s: o ELSECASE o ENDSEL
* format: @a MPL MPU MHL MHU MDL MDU

* db: DB MSTNAME MFN[i] L(s) NPST(s) NPOST(s) LR(s) LR(s[,n,n]) REF(r,o)
* xternal db: L->a(s) LR->a(s) NPOST->a(s) REF->a(r,o)

# / % { } !cxc () QC QJ B I UL Ci Xi Fi FSi CLi NEWLINE(s) LW(n) PICT(s)
M(n[,n]) TAB[i] BOX[i] NP[i] NC[i] BPICT(s[,n])
FONTS(a_) COLS(s_) LINK(s_)

&a(s_) CAT(s) GETENV(s) PUTENV(s) SYSTEM(s) DATE[i] PROC(s)
DATETIME DATEONLY VAL(s) RMAX(n_) RMIN(n_) RSUM(n_) RAVR(n_)
LEFT(s,n) RIGHT(s,n) SS(n,n,s) MID(s,n,n) REPLACE(s,s,s) INSTR(s,s) SIZE(s)
F(n) F(n,n) F(n,n,n) TYPE(s) TYPE(s,s) S(s_) LAST

L([s]s) NPOST([s]s) REF([s]j,...) -- hopeless, use winisis notation

* list of tokens by syntax

stopper: , .. ) ] THEN ELSE FI CASE ELSECASE ENDSEL CONTINUE BREAK
state: MPL MPU MHL MHU MDL MDU
values: DB MSTNAME OCC IOCC # / % { } QC QJ B I UL DATETIME DATEONLY LAST

i MFN[i] DATE[i] TAB[i] BOX[i] NP[i] NC[i] Ci Xi Fi FSi CLi Vi Di Ni Ei Si
@a L->a(s) LR->a(s) NPOST->a(s) REF->a(r,o)
^c 'x' "x" |x| /*x*/ !cxc
- syntax blocks (jump & run)
o ( o ) WHILE n ( o ) REF( r, o )
IF n THEN o ELSE o FI SELECT s CASE s: o ELSECASE o ENDSEL

[n..n] ^c (n[,n]) := . * / + - = <> < <= > >= : AND OR NOT
- all others are properly braced functions

S F might be Si, Fi or S(), F()
/ + * might be arithmetic or string operators or a newline
= <> < <= > >= might compare strings or numbers
:= assigns number to Ei, string else
( opens one or the other frame ...

* tokenizing & processing

* read a token and literal
- get (longest matching) token
- if token accepts a literal, get the literal
- if token accepts an opening (, get it
- resolve S/F ambiguity syntactically depending on presence of i literal
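
The "longest matching token" step matters because several tokens are prefixes of others
(ELSE and ELSECASE, NP and NPOST, < and <=). A minimal Python sketch of that step follows;
the token table is only a small subset picked from the lists in this text, and the name
next_token as well as the assumption of upper-case input are made up for the example.

```python
# A few tokens from the lists above where one is a prefix of another.
TOKENS = ["ELSE", "ELSECASE", "NP", "NPOST", "MFN", "MPL", "<", "<=", "<>", ":", ":="]


def next_token(fmt, pos=0):
    """Return the longest token from TOKENS that starts at fmt[pos], or None."""
    best = None
    for tok in TOKENS:
        if fmt.startswith(tok, pos) and (best is None or len(tok) > len(best)):
            best = tok
    return best


if __name__ == "__main__":
    for text in ("ELSECASE 'x'", "NPOST(V1)", "<=3", "'x'"):
        print(repr(text), "->", next_token(text))
```

Fetching the literal, the optional opening parenthesis and the S/F decision would then
follow as separate steps after the token is known.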

- if token is possibly an operator of higher precedence,
  check operator ambiguities,
  coerce field according to the operator's wishes
  and go opening the operator frame

  coerce field according to frame context
- if we had a field in o-context or token is a stopper,

- if the frame is implicit and saturated or token is a frame closer,
  close the frame and start over processing
- add the token or literal