--- /dev/null
+ISIS records serialized
+
+Serialization means to convert the internal representation
+of an ISIS record to a sequence of bytes ("octets") suitable
+to be stored in a file or transferred via a network.
+
+The serialization format described here is used by OpenIsis
+for both database master files and network communications.
+
+
+* design goals
+
+The serialization format should be
+- easy to use
+ for programmers and tool writers
+- efficient
+ in execution time and space used
+- robust
+ a broken masterfile should be fixable using a text editor
+- versatile
+ can be used for a variety of applications using a variety of tools
+- without limits
+ in number and size of records and fields
+
+
+* basic format
+
+In general, a record is serialized by
+- serializing meta information
+- serializing the fields in order
+- appending a blank line
+
+Fields are serialized as
+- the field tag printed using ASCII decimal digits
+ (optionally preceded by a minus sign, if negative tags are allowed)
+- a (horizontal) TAB character (ASCII value 9, ^I)
+- the field value
+- a newline character (ASCII value 10, ^J)
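+
+The two lists above can be sketched in a few lines of Python.
+This is only an illustration, not OpenIsis code; the names
+serialize_field and serialize_record are made up here, and field
+values are assumed to contain no newline.
+
```python
TAB = b"\t"
NL = b"\n"


def serialize_field(tag: int, value: bytes) -> bytes:
    """Tag as ASCII decimal digits, TAB, field value, newline."""
    # Assumption of this sketch: newlines inside values are handled
    # elsewhere, so they are rejected here.
    if NL in value:
        raise ValueError("newline in field value")
    return str(tag).encode("ascii") + TAB + value + NL


def serialize_record(fields, meta=()) -> bytes:
    """Meta information first, then the fields in order, then a blank line."""
    out = [serialize_field(tag, value) for tag, value in meta]
    out += [serialize_field(tag, value) for tag, value in fields]
    out.append(NL)  # the blank line terminating the record
    return b"".join(out)
```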
+
+Metadata is serialized in the same way,
+using special tags according to the needs of the environment.
+Two situations are distinguished:
+- "soft" metadata, which may and should be accessible as part of the record.
+ This is encoded by convention using negative tags.
+ An example of this is HTTP and other MIME-style communication,
+ where the MIME headers like "User-agent" or "Date" are encoded
+ in such a way, while content data like GET or POST parameters
+ should be mapped to positive IDs.
+- "hard" metadata, which must not interfere with the record contents
+ in order for the environment to work properly.
+ This is encoded using a single non-digit character instead of tag digits.
+ An example of this is the MFN in a master file and information
+ regarding record deletion or update.
+
+The final blank line may be omitted
+where only a single record is contained in an otherwise delimited byte sequence.
+
+A reader should support a lazy mode,
+allowing the TAB to be omitted where unambiguous
+(the field value does not start with a digit or a TAB).
+Writers, however, are strongly urged to write the TAB.
+
+Tags with leading zeros are allowed (typically written with %03d)
+and must be read as decimal, never as octal
+(in C, parse them with atoi rather than strtol with base 0).
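+
+A lazy reader along these lines might look as follows in Python.
+This is a sketch under assumptions: parse_field is a hypothetical
+name, and the line is expected without its terminating newline.
+The TAB is consumed if present; the tag is always read as decimal.
+
```python
import re

# optional minus sign, decimal digits, optional TAB, then the value
_FIELD = re.compile(rb"(-?[0-9]+)(\t?)(.*)", re.DOTALL)


def parse_field(line: bytes):
    """Split one serialized field line into (tag, value)."""
    m = _FIELD.fullmatch(line)
    if m is None:
        raise ValueError("not a field line: %r" % line)
    # Base 10 is given explicitly so that leading zeros stay harmless.
    tag = int(m.group(1).decode("ascii"), 10)
    return tag, m.group(3)
```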
+
+
+* newline conventions
+
+Two ways are supported to deal with newline characters in field values:
+- in "text mode", newlines are replaced with vertical tabs (ASCII 11, ^K)
+ on serialization and vice versa on deserialization.
+- in "binary mode", newlines are replaced by newline-TAB sequences.
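+
+Both conventions amount to simple byte replacements. The following
+Python sketch (function names are illustrative only) shows them
+side by side, including the fact that text mode cannot give back
+a vertical tab that was part of the original value.
+
```python
NL, VT, TAB = b"\n", b"\x0b", b"\t"


def encode_text(value: bytes) -> bytes:
    """Text mode: newline becomes vertical tab."""
    return value.replace(NL, VT)


def decode_text(data: bytes) -> bytes:
    """Text mode: every vertical tab becomes a newline again."""
    return data.replace(VT, NL)


def encode_binary(value: bytes) -> bytes:
    """Binary mode: newline becomes newline-TAB (a continuation line)."""
    return value.replace(NL, NL + TAB)


def decode_binary(data: bytes) -> bytes:
    """Binary mode: drop the TAB that follows each newline."""
    return data.replace(NL + TAB, NL)
```
+
+In binary mode every value survives the round trip unchanged; in
+text mode, decode_text(encode_text(b"a\x0bb")) yields b"a\nb".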
+
+As an alternative to these protocol-level modes one may choose "field mode"
+at the application level: simply claim that you are not interested in
+newlines and replace any by tab or space (without ever converting back).
+
+
+The advantages of text mode over binary mode are
+- it is slightly faster than the binary translation
+- the serialized records do not need more space
+ than the internal representation
+ (whereas the binary serialization might need nearly twice as much
+ in the worst case)
+- it is easily used with line-oriented utilities like grep or sed,
+ since each field is contained within one line
+
+The binary mode (which resembles MIME continuation lines) has the
+advantage of not losing vertical tab characters that might have
+been contained in the original field values.
+It is fully transparent and can be used to store any binary data like images
+with an average overhead of 0.4%
+(as compared to +33% needed by BASE64 encoding).
+
+
+The OpenIsis server automatically detects binary mode,
+if the client uses a continuation line.
+
+
+* masterfile format
+
+A basic masterfile consists of a blank-line separated series of records.
+
+The first record is the "controlling record",
+containing descriptive information such as the newline convention,
+the subfield separator and the character encoding.
+All of this is optional; the masterfile might just start with a blank line.
+
+The MFNs are then assigned implicitly in order, starting from 1.
+There is no distinction between
+empty (consecutive blank lines) and deleted records.
+There is absolutely no redundant information contained.
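+
+Reading such a file takes little code. The Python sketch below
+assumes a basic masterfile with numeric tags only, no meta lines
+and no newline decoding; read_basic_masterfile is a made-up name.
+It returns the controlling record and a dictionary from MFN to
+fields, leaving out empty (deleted) records.
+
```python
def read_basic_masterfile(data: bytes):
    """Return (control_record, {mfn: [(tag, value), ...]})."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # nothing follows the last newline
    records, current = [], []
    for line in lines:
        if line == b"":  # a blank line closes the current record
            records.append(current)
            current = []
        else:
            tag, _, value = line.partition(b"\t")
            current.append((int(tag), value))
    if current:  # the final blank line was omitted
        records.append(current)
    control = records[0] if records else []
    # MFNs are implicit: the n-th record after the controlling one is n.
    by_mfn = {mfn: rec for mfn, rec in enumerate(records[1:], 1) if rec}
    return control, by_mfn
```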
+
+Masterfile compression creates this state
+(however, it may choose to use special meta lines for long ranges
+of deleted records, see below -- but those are inefficient in
+the Xref, anyway).
+Such a masterfile can be very easily created by any tool (like Perl and such).
+The Xref file can be recreated easily, quickly and reliably.
+
+
+When writing to the database, information is ALWAYS appended to the end.
+There is NO OVERWRITING of any data, ever, period.
+That way data CAN'T BE DESTROYED by any operation
+(one could advise the operating system to set a mandatory read lock on that),
+and all changes are easily traced using tail -f.
+
+A binary mode masterfile starts with a "broken" continuation line
+containing a single TAB.
+
+
+* basic mode writing
+
+In metalines, all numbers are given in decimal digits
+and multiple items are separated by TABs.
+The optional timestamps are an arbitrary prefix of generalized time format
+YYYYMMDDhhmmssttt... (as printed by 'date +%Y%m%d%H%M%S', plus milliseconds ttt),
+optionally followed by up to 14 characters to create a unique request id.
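+
+As an illustration, a full-length timestamp could be produced like
+this in Python. The helper name make_timestamp is an assumption of
+this sketch, as is the choice of passing the time in explicitly.
+
```python
from datetime import datetime


def make_timestamp(now: datetime, request_id: str = "") -> str:
    """YYYYMMDDhhmmss plus milliseconds, optionally followed by an id."""
    if len(request_id) > 14:
        raise ValueError("request id is limited to 14 characters")
    millis = now.microsecond // 1000
    return now.strftime("%Y%m%d%H%M%S") + "%03d" % millis + request_id
```
+
+Any prefix of the 17 digit result (for example the first 8 digits
+for a plain date) is a valid timestamp as well.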
+
+Write operations use the following meta line (preceding record data):
+- W mfn [oldpos [timestamp]]
+ followed by new record data denotes a write of record mfn.
+ Depending on the needs of the environment,
+ the byte offset of the last version might be added in order to
+ support access to old versions (e.g. for delayed index update).
+ oldpos may be given as position[.length[.fields]]
+
+A W with no data following is mostly equivalent to a delete.
+If an otherwise written mfn is higher than the highest known,
+the highest known (and thus the implicit counter) is set to this.
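+
+A reader might take a W line apart as sketched below in Python.
+The names WriteMeta and parse_write_meta are hypothetical; the
+sketch assumes the operator, like the other items, is separated
+by a TAB, and it keeps the timestamp as an uninterpreted string.
+
```python
from typing import NamedTuple, Optional


class WriteMeta(NamedTuple):
    mfn: int
    position: Optional[int] = None
    length: Optional[int] = None
    fields: Optional[int] = None
    timestamp: Optional[str] = None


def parse_write_meta(line: bytes) -> WriteMeta:
    """Parse 'W mfn [oldpos [timestamp]]' with TAB separated items."""
    items = line.decode("ascii").split("\t")
    if items[0] != "W" or not 2 <= len(items) <= 4:
        raise ValueError("not a W meta line: %r" % line)
    mfn = int(items[1])
    pos = length = fields = None
    if len(items) >= 3:
        # oldpos is position[.length[.fields]]
        parts = [int(p) for p in items[2].split(".")]
        if len(parts) > 3:
            raise ValueError("bad oldpos: %r" % items[2])
        pos, length, fields = parts + [None] * (3 - len(parts))
    timestamp = items[3] if len(items) == 4 else None
    return WriteMeta(mfn, pos, length, fields, timestamp)
```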
+
+A lazy reader should not require meta lines to be followed by a blank line
+where unambiguous.
+For writers, however, the blank line is strongly recommended.
+
+
+A reasonable size limit on metalines is 127(+newline), since
+- 22 = 1+1+20 for operator, tab and mfn (64 bits ~ 20 digits)
+- 43 = 1+20+1+10+1+10 for tab position.length.fields
+- 32 = 1+17+14 for tab, milliseconds time and id
+totals to 97, so we have some room left.
+
+
+* advanced mode
+
+For advanced space efficiency at the cost of increased read access time,
+the following lines may be used:
+- D mfn [oldpos [timestamp]]
+ a record entry consisting of only a D line denotes the deletion of
+ record number mfn.
+ Basically equivalent to a write with no data following,
+ but a little bit more explicit.
+- I mfn [timestamp]
+ (set id) override the implicit MFN counter,
+ e.g. after a series of deleted rows or just to be explicit.
+ Basically equivalent to a write with no old pos,
+ but a little bit more explicit.
+ Recommended when appending records after some deletes.
+- C mfn oldpos [timestamp]
+ introduces a series of patch commands specifying how record mfn
+ was changed.
+
+Software writing the masterfile will typically choose to
+write full updates and MUST provide a switch to force this,
+in order to be compatible with basic mode readers.
+
+However, the patch command language is particularly useful
+with server operations, to avoid the need for read-write
+sequences with some sort of locking.
+
+
+* the patch language
+
+The patch commands are lines starting with special
+characters like +, -, ~ and so on,
+followed by (an optional TAB and) field addresses, TAB and field data.
+
+
+The simplest case is the '+' command,
+meaning that its data is to be appended to the record.
+The + and TAB may be omitted
+(both, in order to not be confused with a continuation line).
+In other words, the add command may look exactly like an ordinary
+field line.
+
+A series of '=' commands works exactly like the set operation in an
+OpenIsis Tcl record. Especially, field indexes and subfields are supported.
+The '-' command resembles the del operation.
+
+A detailed description of the "patch language" is yet to be written.
+
+
+example
+$
+C 1234
+= 24 foo
+= 24 bar
+25 baz
+$
+changes record 1234 by setting the first two occurrences of
+field 24 to foo and bar, respectively, deleting any other occurrences
+of field 24, and appending a field 25 with value baz.
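+
+Since the patch language is not specified in detail, the following
+Python sketch covers only what this example describes: a series
+of '=' commands for one tag and plain append lines. The function
+apply_patch and its argument layout are made up for illustration.
+Appending '=' values for which no occurrence exists is a guess,
+not documented behaviour.
+
```python
def apply_patch(record, commands):
    """record: list of (tag, value); commands: list of (op, tag, value)."""
    set_values = {}  # tag -> new values, one per occurrence
    appended = []
    for op, tag, value in commands:
        if op == "=":
            set_values.setdefault(tag, []).append(value)
        elif op == "+":
            appended.append((tag, value))
        else:
            raise ValueError("unsupported command: %r" % op)
    used = {tag: 0 for tag in set_values}
    result = []
    for tag, value in record:
        if tag not in set_values:
            result.append((tag, value))  # untouched field
        elif used[tag] < len(set_values[tag]):
            result.append((tag, set_values[tag][used[tag]]))
            used[tag] += 1
        # any further occurrence of a set tag is deleted
    for tag, values in set_values.items():
        # assumption: values without an occurrence to replace are appended
        result.extend((tag, v) for v in values[used[tag]:])
    result.extend(appended)
    return result
```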
+
+
+* the pointer file
+
+The pointer file is an array of n-Byte (n >= 6) entries,
+the ith entry referencing mfn i, similar to the traditional .XRF (crossref).
+
+The n=k+l+m bytes specify two or three numbers (in native byte order)
+of up to 8 bytes each:
+- the first k bytes (k >= 4) give the position of the record
+ (or its last update or change entry)
+- the next l bytes (l >= 2) give the length of the record
+ (excluding the last field's terminating newline and following blank line)
+- the final m bytes (m >= 0) give the number of fields.
+ If m is 0, or all bits in a field number are set (for large records),
+ the reader has to determine the number of fields by inspecting the record.
+
+The first six bytes of the first entry describe the detailed layout.
+Four bytes are the "magic number" containing the ASCII characters "ISIX".
+Two bytes are the number (m*256 + l*16 + k) in native byte order.
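+
+The layout can be illustrated with Python's struct module. This is
+a sketch under assumptions: the function names are invented, the
+header is taken to occupy entry 0 (so that mfn i sits at byte
+offset i*n), and the rest of the header entry is padded with zero
+bytes.
+
```python
import struct
import sys

MAGIC = b"ISIX"


def pack_header(k: int, l: int, m: int) -> bytes:
    """First entry: magic number plus m*256 + l*16 + k, native order."""
    n = k + l + m
    if n < 6:
        raise ValueError("entry too short to hold the 6 byte header")
    head = struct.pack("=4sH", MAGIC, m * 256 + l * 16 + k)
    return head.ljust(n, b"\0")  # padding is an assumption


def unpack_header(data: bytes):
    """Return (k, l, m) as found in the first six bytes."""
    magic, code = struct.unpack_from("=4sH", data)
    if magic != MAGIC:
        raise ValueError("not a pointer file")
    return code & 15, (code >> 4) & 15, code >> 8


def pack_entry(k, l, m, pos, length, nfields=0) -> bytes:
    order = sys.byteorder
    return (pos.to_bytes(k, order) + length.to_bytes(l, order)
            + (nfields.to_bytes(m, order) if m else b""))


def unpack_entry(k, l, m, data: bytes, mfn: int):
    """Return (position, length, nfields) for record mfn."""
    n = k + l + m
    entry = data[mfn * n:(mfn + 1) * n]
    order = sys.byteorder
    pos = int.from_bytes(entry[:k], order)
    length = int.from_bytes(entry[k:k + l], order)
    nfields = int.from_bytes(entry[k + l:], order) if m else None
    return pos, length, nfields
```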
+
+The minimum case k=4, l=2 imposes the limits of traditional ISIS.
+Actually the lower limits are not enforced;
+in a very specialised application one might want to use k/l/m = 3/1/0.
+Recommended are at least eight bytes as k/l/m = 4/3/1 or 4/4/0
+or, with large file support, 5/3/0.
+
+
+However, when any of the limits is reached,
+or an unsupported combination or byte order is found,
+the Xref can easily be recreated with greater values.
+The 12 byte pointer with k/l/m = 6/4/2 will be enough for
+even gigantic databases of a quarter Petabyte (262,144 Gigabytes).
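+
+The arithmetic behind this figure, assuming binary units
+(1 Gigabyte taken as 2^30 bytes), can be checked with a few lines
+of Python; max_addressable is just an illustrative helper.
+
```python
GIB = 2 ** 30


def max_addressable(nbytes: int) -> int:
    """Number of distinct values an unsigned field of nbytes can hold."""
    return 2 ** (8 * nbytes)


# k = 6 position bytes cover 2**48 bytes, i.e. 262144 such Gigabytes.
assert max_addressable(6) // GIB == 262144
# k = 4 position bytes cover 4 such Gigabytes.
assert max_addressable(4) // GIB == 4
```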
+
+The number of fields is redundant, but as an optimization
+may make life a little bit easier for a reader.
+If the pointer structure has m>0 (typically m=1),
+a value of 0 must be stored if the number of fields exceeds the representable
+range and a reader should be prepared to figure it out itself in that case.
+
+
+---
+ $Id: Serialized.txt,v 1.8 2003/05/30 13:26:34 kripke Exp $