+++ /dev/null
-* supported features
-
-As of version 0.8.6, openisis supports only a *very* limited
-subset of the formatting language.
-It should however be sufficient for the most important purposes:
-building indexes and basic preprocessing.
-
-- all kinds of literals (',",|)
-- all modes (P,D,H)
-- Vn[]^c[] field selector including repeated subfields
-- implicit and explicit loops
-
-The rest of this text describes a merger of the formatting
-features of WinISIS and CISIS/wwwisis,
-which openisis may attempt to support one day.
-
-
-* formatting basics
-
-Formatting in openisis is separated into two tasks
-that used to be mixed up in traditional ISIS software:
-- record processing
- In openisis, execution of a ("print"-)format actually transforms
- one or more records into one new ("output") record.
- It loops over and selects fields, applies Mxx modes and REFs other records,
- while screen formatting directives are simply added as special fields.
- In terms of relational databases, a format defines a view.
-- screen rendering
- Separate rendering engines then turn those printformat fields
- into well-indented plain text, HTML, PostScript, TeX,
- Windows GDI commands or any other output.
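-
-A minimal Python sketch of this split, assuming an output record is a
-list of (tag, value) pairs and using a hypothetical tag -1 for the "/"
-newline printformat (the actual tag assignments are not given here).
-format_record plays the part of a format like (V24/), and the two
-renderers turn the very same output record into plain text or HTML;
-none of these names are openisis APIs.
-
-```python
-import html
-
-# Hypothetical printformat tag for a newline "/"; the real tag values
-# in the -1 .. -999 printformat range are not specified here.
-NEWLINE = -1
-
-
-def format_record(record):
-    """Record processing: turn an input record into an output record.
-    record maps field tags to lists of occurrences; the result is a list
-    of (tag, value) fields, with the screen directive stored as a field."""
-    out = []
-    for value in record.get(24, []):
-        out.append((24, value))       # value from a V24 iterator keeps tag 24
-        out.append((NEWLINE, ""))     # "/" stored as a printformat field
-    return out
-
-
-def render_plaintext(out):
-    """One rendering engine: printformats become plain-text layout."""
-    parts = []
-    for tag, value in out:
-        if tag == NEWLINE:
-            parts.append("\n")
-        elif tag >= 0:
-            parts.append(value)
-    return "".join(parts)
-
-
-def render_html(out):
-    """Another engine rendering the very same output record as HTML."""
-    parts = []
-    for tag, value in out:
-        if tag == NEWLINE:
-            parts.append("<br>")
-        elif tag >= 0:
-            parts.append(html.escape(value))
-    return "".join(parts)
-```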
-
-
-* elements of formatting
-
-A formatting expression is a series of literals and functions.
-Functions use zero, one or more data items as parameters.
-Functions that take one of their parameters from the left are called operators.
-
-* input types
-
-The expected type of a function parameter can be one of:
-- s string (any type, auto-stringified and auto-concatenated)
-- n numeric expression (including string, boolean)
-- o output (auto-stringified, but not concatenated)
-- v variable or iterator
-- r row iterator (list of rowids)
-
-Immediately after the function name, the following can be used:
-- c a single character
-- a alphanumeric bareword (identifier in the C language)
-- i integer literal
-- x anything
-
-
-* output types
-
-The output of an expression is zero, one or more fields,
-which can be string, numeric or value.
-The type of the last field determines which operators are recognized.
-
-Field tags are positive for values (i.e. the output of value iterators),
-zero for literals and negative in the range -1 .. -999 for printformats.
-Other negative tags are only temporary:
-value iterators are finally evaluated to emit values,
-and conditional literals and numbers are later either eliminated
-or changed to tag zero, resulting in a literal.
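-
-The tag ranges above can be expressed as a small classifier. This is a
-sketch only: it checks the ranges stated here and does not know which
-temporary tag numbers openisis actually uses; is_final is a hypothetical
-helper.
-
-```python
-def classify_tag(tag):
-    """Classify an output-record field tag by the ranges described above."""
-    if tag > 0:
-        return "value"          # output of a value iterator, e.g. V24 -> 24
-    if tag == 0:
-        return "literal"
-    if -999 <= tag <= -1:
-        return "printformat"
-    return "temporary"          # e.g. pending iterators or conditionals
-
-
-def is_final(record):
-    """True once no temporary tags remain in the (tag, value) list."""
-    return all(classify_tag(tag) != "temporary" for tag, _ in record)
-```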
-
-
-* context
-
-During processing, we have the following context:
-o output record (not changed)
-r input record/s, db and loop; changed by REF, LR and loops
-f format, mode and variables; changed by @, Mcc
-x frame: function, signature, parameter type and position
- new frames are opened for functions, operators and blocks
-
-
-* show stoppers: ( ) , .. ]
-
-During processing, the format string is scanned left to right
-and fields are appended to the output record as they are encountered.
-Function "calls" can be explicit (the parameter list is enclosed in parentheses)
-or implicit, taking only one parameter (operators and literal functions like V24, X3).
-
-An opening parenthesis following an operator or literal function makes the
-frame explicit. A parameter type of i or a is then changed to n or s, respectively.
-Otherwise, a ( starts
-- in numeric context: an anonymous (n) arithmetic parentheses
-- in string context: an S(s_) function
-- in output context inside a loop: an indentation function
-- otherwise: an explicit loop
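-
-A sketch of the decision taken for an opening parenthesis, following the
-rules above. The labels for the preceding token ('operator',
-'literal_function') and the function names are invented for
-illustration.
-
-```python
-def explicit_param_type(t):
-    """An explicit frame widens immediate-literal types: i -> n, a -> s."""
-    return {"i": "n", "a": "s"}.get(t, t)
-
-
-def open_paren(prev_kind, context, in_loop):
-    """Decide what a '(' starts.
-    prev_kind: 'operator' or 'literal_function' if the '(' directly follows
-               one of those, else None (labels invented for this sketch).
-    context:   expected type of the current parameter: 'n', 's' or 'o'."""
-    if prev_kind in ("operator", "literal_function"):
-        return "explicit frame"
-    if context == "n":
-        return "arithmetic parentheses"
-    if context == "s":
-        return "S() function"
-    if context == "o" and in_loop:
-        return "indentation"
-    return "explicit loop"
-```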
-
-A closing parenthesis closes whatever was opened by the last opening one.
-A comma ',' or a range '..' closes a parameter,
-which in turn closes an operator or a saturated implicit loop.
-If there was no parameter to close, an empty string or the number 0 is emitted;
-thus [..] is equivalent to [0..0].
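-
-A sketch of parameter closing, assuming the tokens between '[' and ']'
-have already been evaluated to numbers. It shows why [..] yields the
-bounds (0, 0); close_parameter and range_bounds are hypothetical names.
-
-```python
-def close_parameter(emitted, expected):
-    """Close one parameter; if nothing was emitted for it, supply an empty
-    string, or the number 0 where a number is expected."""
-    if emitted:
-        return emitted
-    return [0] if expected == "n" else [""]
-
-
-def range_bounds(inner):
-    """Split the tokens between '[' and ']' at '..' into numeric parameters.
-    The closing ']' is a stopper too, so it closes the last parameter."""
-    params, current = [], []
-    for tok in inner:
-        if tok == "..":
-            params.append(close_parameter(current, "n")[0])
-            current = []
-        else:
-            current.append(tok)
-    params.append(close_parameter(current, "n")[0])
-    return tuple(params)
-```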
-
-
-* functions and type coercion
-
-Every function has a signature which denotes the types of the
-expected parameters.
-Functions and switches are pushed on a stack, and the type expected
-from the following expressions is set according to their signature.
-Whenever a comma or the closing parentheses is encountered,
-the fields added for this parameter are converted to the expected type.
-This is not much work for numeric or boolean types,
-since expressions are evaluated while parsing.
-Where a string is expected, we might have a record (i.e. multiple strings),
-which is then collapsed as with the S function:
-all printformats are discarded and other strings are concatenated.
-When a function sees its closing ')', it replaces the parameters
-pushed on the output record with the function value calculated from them.
-The return value is tagged with the first data tag encountered,
-even from records that evaluate to empty.
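-
-A sketch of the string coercion and of the function value replacing its
-parameters, assuming fields are (tag, value) pairs and that a "data tag"
-means a positive (value) tag; the fallback to tag 0 when no data field
-was seen is an assumption, not stated above.
-
-```python
-def is_printformat(tag):
-    return -999 <= tag <= -1
-
-
-def collapse(fields):
-    """Collapse (tag, value) fields to one string, as the S function does:
-    printformats are discarded, all other values are concatenated."""
-    return "".join(str(value) for tag, value in fields if not is_printformat(tag))
-
-
-def first_data_tag(fields):
-    """Tag of the first data field (tag > 0), even if its value is empty.
-    Falling back to 0 (literal) when there is none is an assumption."""
-    for tag, _ in fields:
-        if tag > 0:
-            return tag
-    return 0
-
-
-def call_function(func, params):
-    """Replace a function's string parameters by its (tag, value) result.
-    params: one list of (tag, value) fields per parameter."""
-    value = func(*(collapse(p) for p in params))
-    return first_data_tag([f for p in params for f in p]), value
-```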
-
-The signature is denoted by a string, where
-- the 1st char gives the left hand operand ('_' for none)
-- the 2nd char is a digit giving the number of required parameters, or '_' for specials
-- the following chars give the types of the parameters in turn
-- a trailing '_' denotes that the last type may be repeated
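-
-A sketch of decoding such signature strings. The example encodings
-"_2sn" for LEFT(s,n), "__s_" for S(s_) and "s1i" for a .n style
-operator are hypothetical; the actual signature strings are not listed
-here.
-
-```python
-from dataclasses import dataclass
-from typing import Optional
-
-
-@dataclass
-class Signature:
-    left: Optional[str]      # type of the left-hand operand, None for '_'
-    required: Optional[int]  # number of required parameters, None for '_'
-    params: str              # parameter types in order
-    repeat_last: bool        # trailing '_': the last type may be repeated
-
-
-def parse_signature(sig):
-    """Decode a signature string as described above."""
-    left = None if sig[0] == "_" else sig[0]
-    required = None if sig[1] == "_" else int(sig[1])
-    params = sig[2:]
-    repeat_last = params.endswith("_")
-    if repeat_last:
-        params = params[:-1]
-    return Signature(left, required, params, repeat_last)
-
-
-def expected_type(sig, index):
-    """Type expected for the 0-based parameter index, or None if too many."""
-    if index < len(sig.params):
-        return sig.params[index]
-    if sig.repeat_last and sig.params:
-        return sig.params[-1]
-    return None
-```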
-
-
-* record loops: "x" |x| + (s_) WHILE n (s_) CONTINUE BREAK OCC IOCC
-
-A loop is either started explicitly by a '(' in record context,
-or implicitly by a conditional literal or a V value iterator.
-An explicit loop is closed by the matching ')',
-while an implicit one ends when saturated (after the first iterator),
-at a comma ',' or at any other V iterator.
-The loop content is then executed repeatedly while incrementing
-the OCC counter from 1 (possibly subject to a preceding WHILE).
-On each turn there is a "last" flag (initially true), which is cleared by any
-iterator that expects to have more fields. This can happen even if there was
-no OCCth occurrence to emit, e.g. if the OCCth field lacked the selected subfield.
-Consequently, if there are no iterators, the first turn is the last.
-Iterators also set or clear a "had" state (initially unknown).
-During each execution of the loop, we are then in string type context:
-- a "" conditional emits its contents only on OCC=1
- if it precedes the first iterator, otherwise only when last=true
-- a || conditional emits its contents when had!=false
-- a + undoes an immediately preceding conditional on OCC==1
- and sets had to false if last is true
-- an iterator adds the OCCth of the (selected) occurrences (see below)
-- if an iterator has no OCCth occurrence, an immediately preceding
- || is undone and had is set to false
-- other tokens are processed normally
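-
-The loop rules above can be simulated in a few lines. This Python sketch
-covers only "", || and + literals and plain field iterators (no WHILE,
-subfields or nested loops) and follows the listed rules literally; the
-item encoding is invented for illustration. The usual |; |+V24 pattern
-then puts the separator only between occurrences.
-
-```python
-def run_loop(items, fields):
-    """Execute loop content repeatedly, incrementing OCC from 1.
-    items:  ("cond", text) for "x", ("rep", text) for |x|, ("plus",) for +,
-            and ("iter", tag) for a field iterator such as V24.
-    fields: maps a field tag to its list of occurrences."""
-    out = []
-    first_iter = next((i for i, it in enumerate(items) if it[0] == "iter"),
-                      len(items))
-    occ = 1
-    while True:
-        last, had = True, None          # had=None stands for "unknown"
-        prev = None                     # (kind, emitted) of the preceding item
-        for pos, item in enumerate(items):
-            kind, emitted = item[0], False
-            if kind == "cond":
-                before = pos < first_iter
-                if (before and occ == 1) or (not before and last):
-                    out.append(item[1])
-                    emitted = True
-            elif kind == "rep":
-                if had is not False:
-                    out.append(item[1])
-                    emitted = True
-            elif kind == "plus":
-                if occ == 1 and prev and prev[0] in ("cond", "rep") and prev[1]:
-                    out.pop()           # undo the immediately preceding conditional
-                if last:
-                    had = False
-            elif kind == "iter":
-                occs = fields.get(item[1], [])
-                if occ <= len(occs):
-                    out.append(occs[occ - 1])
-                    had = True
-                    if occ < len(occs):
-                        last = False    # this iterator expects more fields
-                else:
-                    if prev and prev[0] == "rep" and prev[1]:
-                        out.pop()       # undo an immediately preceding |x|
-                    had = False
-            prev = (kind, emitted)
-        if last:
-            return "".join(out)
-        occ += 1
-```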
-
-
-* field iterator and operators: Vi Di Ni [n_] ^c
-
-A field clause is always evaluated in a loop over OCC.
-The iterator can be modified by the range and subfield operators.
-Which occurrences are used depends on the selected options:
-field occurrence range, subfield selector and subfield range.
-The integer list n_ may be given as x..y, where a missing x means 1 and
-a missing y or the keyword LAST means up to the last occurrence.
-If no range is specified, all occurrences are used, as with [..].
-If a subfield is specified, that subfield is selected.
-If a subfield range is given, those occurrences of the subfield are used:
-if no range is specified for the field, subfield occurrences are counted
-relative to the record, otherwise relative to each individual field occurrence,
-so use V71[..]^a[1] to get the first occurrence of a in each occurrence of V71.
-With 71=^afoo^ax, 71=^abar^ay and 71=^abaz^az,
-V71^a[1..3] will give the first three occurrences of subfield a in the record,
-i.e. foo, x and bar, whereas in CISIS it would give foo, bar and baz.
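-
-A sketch of the occurrence selection described above, with fields given
-as strings like '^afoo^ax'. Treating a bound of 0 like a missing one
-(so that [..], i.e. [0..0], selects everything) and representing LAST as
-None are assumptions of this sketch.
-
-```python
-def subfields(field, code):
-    """Occurrences of subfield `code` in one field occurrence like '^afoo^ax'.
-    Text before the first '^' is ignored in this sketch."""
-    parts = field.split("^")[1:]
-    return [p[1:] for p in parts if p and p[0].lower() == code.lower()]
-
-
-def take(items, rng):
-    """Apply a 1-based inclusive range (x, y). A missing bound (None, or 0
-    as produced by [..]) means 1 for x and the last occurrence for y."""
-    if rng is None:
-        return list(items)
-    x, y = rng
-    return items[(x or 1) - 1:(y or len(items))]
-
-
-def select(occs, code, field_range=None, sub_range=None):
-    """Subfield occurrences picked by V71[field_range]^a[sub_range]."""
-    if field_range is None:
-        # no field range: subfield occurrences are counted over the record
-        pool = [s for f in occs for s in subfields(f, code)]
-        return take(pool, sub_range)
-    # field range given: counted within each selected field occurrence
-    return [s for f in take(occs, field_range)
-            for s in take(subfields(f, code), sub_range)]
-```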
-
-* other values and operators on values: Ei Si :=
-
-* operators on string: *n .n (n) (n,n)
-
-The * and . string operators modify the top (the last pushed field)
-by removing the first n chars or all but the first n chars, respectively.
-The () indentation operator is a somewhat late indentation printformat
-that exchanges itself with the previous field.
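-
-A sketch of the two string operators and of the indentation exchange on a
-list of (tag, value) fields. The INDENT tag is hypothetical, and the
-meaning of the numeric arguments is not described here, so they are only
-stored.
-
-```python
-# Hypothetical printformat tag for an () indentation field.
-INDENT = -2
-
-
-def star(fields, n):
-    """*n: remove the first n characters of the top (last pushed) field."""
-    tag, value = fields[-1]
-    fields[-1] = (tag, value[n:])
-    return fields
-
-
-def dot(fields, n):
-    """.n: remove all but the first n characters of the top field."""
-    tag, value = fields[-1]
-    fields[-1] = (tag, value[:n])
-    return fields
-
-
-def indent(fields, *args):
-    """(n) or (n,n): push an indentation printformat, then exchange it
-    with the previous field so that it precedes that field."""
-    fields.append((INDENT, args))
-    if len(fields) >= 2:
-        fields[-1], fields[-2] = fields[-2], fields[-1]
-    return fields
-```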
-
-* arithmetic operators on numbers: * / + -
-* relational operators on numbers: = <> < <= > >=
-* relational operators on strings: = <> < <= > >= :
-* boolean operators on numbers: AND NOT OR
-
-* literals: 123 123.45 'x' "x" |x| /*x*/ !cxc
-* switches: IF n THEN o ELSE o FI SELECT s CASE s: o ELSECASE o ENDSEL
-* format: @a MPL MPU MHL MHU MDL MDU
-
-* db: DB MSTNAME MFN[i] L(s) NPST(s) NPOST(s) LR(s) LR(s[,n,n]) REF(r,o)
-* external db: L->a(s) LR->a(s) NPOST->a(s) REF->a(r,o)
-
-* printformats
-# / % { } !cxc () QC QJ B I UL Ci Xi Fi FSi CLi NEWLINE(s) LW(n) PICT(s)
-M(n[,n]) TAB[i] BOX[i] NP[i] NC[i] BPICT(s[,n])
-FONTS(a_) COLS(s_) LINK(s_)
-
-* other functions
-&a(s_) CAT(s) GETENV(s) PUTENV(s) SYSTEM(s) DATE[i] PROC(s)
-DATETIME DATEONLY VAL(s) RMAX(n_) RMIN(n_) RSUM(n_) RAVR(n_)
-LEFT(s,n) RIGHT(s,n) SS(n,n,s) MID(s,n,n) REPLACE(s,s,s) INSTR(s,s) SIZE(s)
-F(n) F(n,n) F(n,n,n) TYPE(s) TYPE(s,s) S(s_) LAST
-NOCC(s_) P(s_) A(s_)
-
-* unsupported syntax
-L([s]s) NPOST([s]s) REF([s]j,...) -- hopeless, use WinISIS notation instead
-
-
-
-* list of tokens by syntax
-
-- empty tokens
- stopper: , .. ) ] THEN ELSE FI CASE ELSECASE ENDSEL CONTINUE BREAK
- state: MPL MPU MHL MHU MDL MDU
- values: DB MSTNAME OCC IOCC # / % { } QC QJ B I UL DATETIME DATEONLY LAST
-- immediate literal
- i MFN[i] DATE[i] TAB[i] BOX[i] NP[i] NC[i] Ci Xi Fi FSi CLi Vi Di Ni Ei Si
- @a L->a(s) LR->a(s) NPOST->a(s) REF->a(r,o)
- ^c 'x' "x" |x| /*x*/ !cxc
-- syntax blocks (jump & run)
- o ( o ) WHILE n ( o ) REF( r, o )
- IF n THEN o ELSE o FI SELECT s CASE s: o ELSECASE o ENDSEL
-- operators
- [n..n] ^c (n[,n]) := . * / + - = <> < <= > >= : AND OR NOT
-- all others are properly braced functions
-- ambiguous tokens
- S F might be Si, Fi or S(), F()
- / + * might be arithmetic or string operators or a newline
- = <> < <= > >= might compare strings or numbers
- := assigns a number to Ei, a string otherwise
- ( opens one of several kinds of frame, depending on context
-
-
-* tokenizing & processing
-
-* read a token and literal
-- get (longest matching) token
-- if token accepts a literal, get the literal
-- if token accepts an opening (, get it
-- resolve the S/F ambiguity syntactically, depending on the presence of an i literal
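-
-A sketch of the read step, using only a small hypothetical subset of the
-token table. Matching is case-insensitive and picks the longest token;
-S and F become Si/Fi when an integer literal follows, otherwise S()/F().
-
-```python
-import re
-
-# Hypothetical subset of the token table: token -> immediate literal it
-# accepts ('i' integer, 'c' single character, None for nothing).
-TOKENS = {
-    "MFN": None, "MPL": None, "MPU": None, "SIZE": None,
-    "V": "i", "D": "i", "N": "i", "X": "i", "S": "i", "F": "i",
-    "^": "c", "..": None, ".": None, ":=": None, "<=": None, ">=": None,
-    "<>": None, "=": None, "<": None, ">": None, "+": None, "*": None,
-    "/": None, "(": None, ")": None, ",": None, "[": None, "]": None,
-}
-QUOTES = "'\"|"
-
-
-def read_token(src, pos):
-    """Read the token starting at src[pos] (not whitespace).
-    Returns (name, literal, has_paren, new_pos)."""
-    ch = src[pos]
-    if ch in QUOTES:                          # 'x' "x" |x|
-        end = src.index(ch, pos + 1)
-        return ch, src[pos + 1:end], False, end + 1
-    m = re.match(r"\d+(\.\d+)?", src[pos:])   # 123 or 123.45
-    if m:
-        text = m.group()
-        return "NUM", float(text) if "." in text else int(text), False, pos + m.end()
-    upper = src[pos:].upper()
-    candidates = [t for t in TOKENS if upper.startswith(t)]
-    if not candidates:
-        raise SyntaxError(f"unknown token at position {pos}")
-    name = max(candidates, key=len)           # longest matching token
-    pos += len(name)
-    literal = None
-    if TOKENS[name] == "i":
-        m = re.match(r"\d+", src[pos:])
-        if m:
-            literal, pos = int(m.group()), pos + m.end()
-    elif TOKENS[name] == "c" and pos < len(src):
-        literal, pos = src[pos], pos + 1
-    has_paren = name[0].isalpha() and src.startswith("(", pos)
-    if has_paren:
-        pos += 1
-    if name in ("S", "F"):                    # Si/Fi versus S()/F()
-        name += "i" if literal is not None else "()"
-    return name, literal, has_paren, pos
-
-
-def tokenize(src):
-    tokens, pos = [], 0
-    while True:
-        while pos < len(src) and src[pos].isspace():
-            pos += 1
-        if pos >= len(src):
-            return tokens
-        name, literal, has_paren, pos = read_token(src, pos)
-        tokens.append((name, literal, has_paren))
-```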
-
-* process
-- if the token is possibly an operator of higher precedence,
- check operator ambiguities,
- coerce the field as the operator requires
- and open the operator frame
-- else
- coerce the field according to the frame context
-- if we had a field in o-context or token is a stopper,
- close the parameter
-- if the frame is implicit and saturated or token is a frame closer,
- close the frame and start over processing
-- add the token or literal
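-
-A much reduced Python sketch of the processing loop, handling only
-function frames: operators, precedence, type ambiguities, loops and
-printformats are left out, and the function table and (name, arg) token
-shape are invented for illustration. A function not followed by '(' opens
-an implicit frame that closes as soon as it has one field; ')' closes an
-explicit frame; ',' closes a parameter, and an empty one becomes '' or 0.
-
-```python
-from dataclasses import dataclass, field
-from typing import Optional
-
-# Hypothetical function table: name -> (parameter types, implementation).
-# A trailing '_' means the last type may be repeated.
-FUNCS = {
-    "LEFT": ("sn", lambda s, n: s[:n]),
-    "SIZE": ("s", len),
-    "S": ("s_", lambda *parts: "".join(parts)),
-}
-
-
-@dataclass
-class Frame:
-    name: Optional[str]                            # None for the output frame
-    explicit: bool                                 # True if opened with '('
-    params: list = field(default_factory=list)     # closed parameters
-    current: list = field(default_factory=list)    # fields of the open one
-
-
-def close_param(frame):
-    """Coerce the open parameter's fields to the expected type."""
-    types = FUNCS[frame.name][0]
-    base, i = types.rstrip("_"), len(frame.params)
-    if i < len(base):
-        expected = base[i]
-    elif types.endswith("_"):
-        expected = base[-1]
-    else:
-        raise SyntaxError(f"too many parameters for {frame.name}")
-    if expected == "s":
-        value = "".join(str(v) for v in frame.current)
-    else:  # 'n': an empty parameter becomes 0
-        value = int("".join(str(v) for v in frame.current) or 0)
-    frame.params.append(value)
-    frame.current = []
-
-
-def close_frame(stack):
-    """Pop the top frame and hand its function value to the enclosing frame."""
-    frame = stack.pop()
-    close_param(frame)
-    emit(stack, FUNCS[frame.name][1](*frame.params))
-
-
-def emit(stack, value):
-    """Add a field; an implicit frame is saturated after one parameter."""
-    stack[-1].current.append(value)
-    if not stack[-1].explicit:
-        close_frame(stack)
-
-
-def process(tokens):
-    """tokens: (name, arg) pairs; name is a function, ',', ')' or 'LIT'."""
-    stack = [Frame(None, explicit=True)]
-    for name, arg in tokens:
-        if name in FUNCS:
-            stack.append(Frame(name, explicit=arg))  # arg: does '(' follow?
-        elif name in (",", ")"):
-            if len(stack) == 1:
-                raise SyntaxError(f"unexpected {name!r}")
-            if name == ",":
-                close_param(stack[-1])
-            else:
-                close_frame(stack)
-        else:
-            emit(stack, arg)
-    return stack[0].current
-```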