+++ /dev/null
-ISIS records serialized
-
-Serialization means to convert the internal representation
-of an ISIS record to a sequence of bytes ("octets") suitable
-to be stored in a file or transferred via a network.
-
-The serialization format described here is used by OpenIsis
-for both database master files and network communications.
-
-
-* design goals
-
-The serialization format should be
-- easy to use
- for programmers and tool writers
-- efficient
- in execution time and space used
-- robust
- a broken masterfile should be fixed using a text editor
-- versatile
- can be used for a variety of applications using a variety of tools
-- without limits
- in number and size of records and fields
-
-
-* basic format
-
-In general, a record is serialized by
-- serializing meta information
-- serializing the fields in order
-- appending a blank line
-
-Fields are serialized as
-- the field tag printed using ASCII decimal digits
- (optionally preceeded by a minus sign, if negative tags are allowed)
-- a (horizontal) TAB character (ASCII value 9, ^I)
-- the field value
-- a newline character (ASCII value 10, ^J)
-
-Metadata is serialized in the same way,
-using special tags according to the needs of the environment.
-Two situations are distinguished:
-- "soft" metadata, which may and should be accessible as part of the record.
- This is encoded by convention using negative tags.
- An example of this is HTTP and other MIME-style communication,
- where the MIME headers like "User-agent" or "Date" are encoded
- in such a way, while content data like GET or POST parameters
- should be mapped to positive IDs.
-- "hard" metadata, which must not interfere with the record contents
- in order for the environment to work properly.
- This is encoded using a single non-digit character instead of tag digits.
- An example of this is the MFN in a master file and information
- regarding record deletion or update.
-
-The final blankline may be omitted,
-where only a single record is contained in an otherwise delimited byte sequence.
-
-A reader should support a lazy mode,
-allowing the TAB to be omitted, where unambigous
-(the field value does not start with a digit or a TAB).
-Writers, however, are strongly urged to write the TAB.
-
-Tags with leading zeros are allowed (typically with %03d)
-and must not be interpreted as octal using atoi.
-
-
-* newline conventions
-
-Two ways are supported to deal with newline characters in field values:
-- in "text mode", newlines are replaced with vertical tabs (ASCII 11, ^K)
- on serialization and vice versa on deserialization.
-- in "binary mode", newlines are replaced as newline-TAB sequences.
-
-As an alternative to these protocol-level modes one may choose "field mode"
-at the application level: simply claim that you are not interested in
-newlines and replace any by tab or space (w/o ever converting back).
-
-
-The advantages of text mode over binary mode are
-- it is slightly faster than the binary translation
-- the serialized records do not need more space
- than the internal representation
- (whereas the binary serialization might need nearly twice as much
- in worst case)
-- it is easily used with line-oriented utilities like grep or sed,
- since each field is contained within one line
-
-The binary mode (which resembles MIME continuation lines) has the
-advantage of not loosing vertical tab characters that might have
-been contained in the original field values.
-It is fully transparent and can be used to store any binary data like images
-with an average overhead of 0.4%
-(as compared to +33% needed by BASE64 encoding).
-
-
-The OpenIsis server automatically detects binary mode,
-if the client uses a continuation line.
-
-
-* masterfile format
-
-A basic masterfile consists of a blank-line separated series of records.
-
-The first record is the "controlling record",
-containing descriptive information such as the newline convention,
-the subfield separator and the character encoding.
-All of this is optional; the masterfile might just start with a blank line.
-
-The MFNs are then assigned implicitly in order, starting from 1.
-There is no distinction between
-empty (consecutive blanklines) and deleted records.
-There is absolutely no redundant information contained.
-
-Masterfile compression creates this state
-(however, it may choose to use special meta lines for long ranges
-of deleted records, see below -- but those are inefficient in
-the Xref, anyway).
-Such a masterfile can be very easily created by any tool (like Perl and such).
-The Xref file can be easily, fast and reliably recreated.
-
-
-When writing to the database, information is ALWAYS appended to the end.
-There is NO OVERWRITING of any data, ever, period.
-That way data CANT BE DESTROYED by any operation
-(one could advise the operating system to set a mandatory read lock on that),
-and all changes are easily traced using tail -f.
-
-A binary mode masterfile starts with a "broken" continuation line
-containing a single TAB.
-
-
-* basic mode writing
-
-In metalines, all numbers are given in decimal digits
-and multiple items are separated by TABs.
-The optional timestamps are an arbitrary prefix of generalized time format
-YYYYMMDDhhmmssttt... as of 'date +%Y%m%d%H%M' plus milliseconds ttt,
-optionally followed by up to 14 characters to create a unique request id.
-
-Write operations use the following meta line (preceeding record data):
-- W mfn [oldpos [timestamp]]
- followed by new record data denotes a write of record mfn.
- Depending on the needs of the environment,
- the byte offset of the last version might be added in order to
- support access to old versions (e.g. for delayed index update).
- oldpos may be given as position[.length[.fields]]
-
-A W with no data following is mostly equivalent to a delete.
-If an otherwise written mfn is higher than the highest known,
-the highest known (and thus the implicit counter) is set to this.
-
-A lazy reader should not require meta lines to be followed by a blank line,
-where unambigous.
-For writers, however, the blank line is strongly recommended.
-
-
-A reasonable size limit on metalines is 127(+newline), since
-- 22 = 1+1+20 for operator, tab and mfn (64 bits ~ 20 digits)
-- 43 = 1+20+1+10+1+10 for tab position.length.fields
-- 32 = 1+17+14 for tab, milliseconds time and id
-totals to 97, so we have some room left.
-
-
-* advance mode
-
-For advanced space efficiency at the cost of read increased access time,
-the following lines may be used:
-- D mfn [oldpos [timestamp]]
- a record entry consisting of only a D line denotes the deletion of
- record number mfn.
- Basically equivalent to a write with no data following,
- but a little bit more explicit.
-- I mfn [timestamp]
- (set id) override the implicit MFN counter,
- e.g. after a series of deleted rows or just to be explicit.
- Basically equivalent to a write with no old pos,
- but a little bit more explicit.
- Recommended when appending records after some deletes.
-- C mfn oldpos [timestamp]
- introduces a series of patch commands specifying how record mfn
- was changed.
-
-Software writing the masterfile will typically choose to
-write full updates and MUST provide a switch to forcedly do so,
-in order to be compatible with basic mode readers.
-
-However, the patch command language is particularily useful
-with server operations, to avoid the need for read-write
-sequences with some sort of locking.
-
-
-* the patch language
-
-The patch commands are lines starting with special
-characters like +, -, ~ and so on,
-followed by (an optional TAB and) field addresses, TAB and field data.
-
-
-The simplest case is the '+' command,
-meaning that it's data is to be appended to the record.
-The + and TAB may be omitted
-(both, in order to not be confused with a continuation line).
-In other words, the add command may look exactly like an ordinary
-field line.
-
-A series of '=' commands works exactly like the set operation in an
-OpenIsis Tcl record. Especially, field indexes and subfields are supported.
-The '-' command resembles the del operation.
-
-A detailled description of the "patch language" is to be done.
-
-
-example
-$
-C 1234
-= 24 foo
-= 24 bar
-25 baz
-$
-changes record 1234 by setting the first to occurences of
-field 24 to foo and baz, respectively, deleting any other occurences
-of field 24, and appending a field 25 with value baz.
-
-
-* the pointer file
-
-The pointer file is an array of n-Byte (n >= 6) entries,
-the ith entry referencing mfn i, similar to the traditional .XRF (crossref).
-
-The n=k+l+m bytes specify two or three numbers (in native byte order)
-of up to 8 bytes each:
-- the first k bytes (k >= 4) give the position of the record
- (or it's last update or change entry)
-- the next l bytes (l >= 2) give the length of the record
- (excluding the last field's terminating newline and following blank line)
-- the final m bytes (m >= 0) give the number of fields.
- If m is 0, or all bits in a field number are set (for large records),
- the reader has to determine the number fo fields by inspecting the record.
-
-The first six bytes of the first entry describe the detailled layout.
-Four bytes are the "magic number" containing the ASCII characters "ISIX".
-Two bytes are the number (m*256 + l*16 + k) in native byte order.
-
-The minimum case k=4, l=2 imposes the limits of traditional ISIS.
-Actually the lower limits are not enforced;
-in a very specialised application one might want to use k/l/m = 3/1/0.
-Recommended are at least eight bytes as k/l/m = 4/3/1 or 4/4/0
-or, with large file support, 5/3/0.
-
-
-However, when any of the limits is reached,
-or an unsupported combination or byte order is found,
-the Xref can easily be recreated with greater values.
-The 12 byte pointer with k/l/m = 6/4/2 will be enough for
-even gigantic databases of a quarter Petabyte (262.144 Gigabyte).
-
-The number of fields is redundant, but as an optimization
-may make live a little bit easier for a reader.
-If the pointer structure has m>0 (typically m=1),
-a value of 0 must be stored if the number of fields exceeds the representable
-range and a reader should be prepared to figure it out itself in that case.
-
-
----
- $Id: Serialized.txt,v 1.8 2003/05/30 13:26:34 kripke Exp $