Notes on Filesystem Layout
--------------------------

These notes describe what mkcramfs generates.  Kernel requirements are
a bit looser, e.g. it doesn't care if the <file_data> items are swapped
around (though it does care that directory entries (inodes) in a given
directory are contiguous, as this is used by readdir).

All data is in little-endian format; user-space tools and the kernel do
byte swapping on big-endian systems.  (See section `Byte Order' below.)

<filesystem>:
        <superblock>
        <directory_structure>
        <data>

<superblock>: struct cramfs_super (see cramfs_fs.h).

<directory_structure>:
        For each file:
                struct cramfs_inode (see cramfs_fs.h).
                Filename.  Not generally null-terminated, but it is
                 null-padded to a multiple of 4 bytes.

The order of inode traversal is described as "width-first" (not to be
confused with breadth-first); i.e. like depth-first but listing all of
a directory's entries before recursing down its subdirectories: the
same order as `ls -AUR' (but without the /^\..*:$/ directory header
lines); put another way, the same order as `find -type d -exec
ls -AU1 {} \;'.

Beginning in 2.4.7, directory entries are sorted.  This optimization
allows cramfs_lookup to return more quickly when a filename does not
exist, speeds up user-space directory sorts, etc.

<data>:
        One <file_data> for each file that's either a symlink or a
         regular file of non-zero st_size.

<file_data>:
        nblocks * <block_pointer>
         (where nblocks = (st_size - 1) / blksize + 1)
        nblocks * <block>
        padding to multiple of 4 bytes

The i'th <block_pointer> for a file stores the byte offset of the
*end* of the i'th <block> (i.e. one past the last byte, which is the
same as the start of the (i+1)'th <block> if there is one).  The first
<block> immediately follows the last <block_pointer> for the file.
<block_pointer>s are each 32 bits long.

The order of <file_data>'s is a depth-first descent of the directory
tree, i.e. the same order as `find -size +0 \( -type f -o -type l \)
-print'.

<block>: The i'th <block> is the output of zlib's compress function
applied to the i'th blksize-sized chunk of the input data.
(For the last <block> of the file, the input may of course be smaller.)
Each <block> may be a different size.  (See <block_pointer> above.)
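
The block-pointer arithmetic described above can be sketched in C as
follows.  This is only an illustration, not code from mkcramfs or the
kernel: the names cramfs_nblocks, block_extent and cramfs_block_extent
are hypothetical, the pointer table is assumed to have been read and
converted to host byte order already, and all offsets are assumed to be
measured from the start of the filesystem image.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks for a file of non-zero size (formula from the text). */
static uint32_t cramfs_nblocks(uint32_t st_size, uint32_t blksize)
{
    assert(st_size > 0 && blksize > 0);
    return (st_size - 1) / blksize + 1;
}

/* Hypothetical helper type: where one compressed block lives. */
struct block_extent {
    uint32_t start; /* byte offset of the first byte of the block */
    uint32_t len;   /* compressed length in bytes */
};

/*
 * table_offset: byte offset of the file's first block pointer.
 * ptrs:         the nblocks block pointers, in host byte order.
 *
 * Pointer i holds the END of block i.  Block 0 starts right after the
 * pointer table (nblocks pointers of 4 bytes each); every later block
 * starts where the previous one ended.
 */
static struct block_extent cramfs_block_extent(uint32_t table_offset,
                                               const uint32_t *ptrs,
                                               uint32_t nblocks,
                                               uint32_t i)
{
    struct block_extent e;

    assert(i < nblocks);
    e.start = (i == 0) ? table_offset + 4 * nblocks : ptrs[i - 1];
    e.len = ptrs[i] - e.start;
    return e;
}
```

For example, with a pointer table at offset 100 holding the two
pointers {150, 170}, block 0 occupies bytes 108..149 and block 1
occupies bytes 150..169.
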
<block>s are merely byte-aligned, not generally u32-aligned.


Holes
-----

This kernel supports cramfs holes (i.e. [efficient representation of]
blocks in uncompressed data consisting entirely of NUL bytes), but by
default mkcramfs doesn't test for & create holes, since cramfs in
kernels up to at least 2.3.39 didn't support holes.  Run mkcramfs
with -z if you want it to create files that can have holes in them.


Byte Order
----------

When defining the cramfs filesystem, the two options for byte order were
`always use little-endian' (like ext2fs) or `writer chooses
endianness; kernel adapts at runtime'.  Little-endian wins because of
code simplicity and little CPU overhead even on big-endian machines.

While cramfs has always been defined to be little-endian, this
implementation originally required that cramfs filesystems be written
and read with architectures of the same endianness; big-endian machines
would write and read cramfs filesystems with big-endian byte order (the
"incorrect" byte order for cramfs filesystems).

Now, only little-endian cramfs filesystems are supported for both
little-endian and big-endian machines.  If you need to support
big-endian cramfs filesystems for a legacy application on a big-endian
machine, you could remove the byte-swapping, but it would probably be
better to write a one-time byte order conversion program.


Tools
-----

The cramfs user-space tools, including mkcramfs and cramfsck, are
located at .


Future Development
==================

Block Size
----------

(Block size in cramfs refers to the size of input data that is
compressed at a time.  It's intended to be somewhere around
PAGE_CACHE_SIZE for cramfs_readpage's convenience.)

The superblock ought to indicate the block size that the fs was
written for, since comments in indicate that PAGE_CACHE_SIZE may grow
in future (if I interpret the comment correctly).

Currently, mkcramfs #define's PAGE_CACHE_SIZE as 4096 and uses that
for blksize, whereas Linux-2.3.39 uses its PAGE_CACHE_SIZE, which in
turn is defined as PAGE_SIZE (which can be as large as 32KB on arm).
This discrepancy is a bug, though it's not clear which should be
changed.

One option is to change mkcramfs to take its PAGE_CACHE_SIZE from .
Personally I don't like this option, but it does require the least
amount of change: just change `#define PAGE_CACHE_SIZE (4096)' to
`#include '.  The disadvantage is that the generated cramfs cannot
always be shared between different kernels, not even necessarily
kernels of the same architecture if PAGE_CACHE_SIZE is subject to
change between kernel versions (currently possible with arm and ia64).

The remaining options try to make cramfs more sharable by choosing a
block size.  The options are:

 1. Always 4096 bytes.

 2. Writer chooses blocksize; kernel adapts but rejects blocksize >
    PAGE_CACHE_SIZE.

 3. Writer chooses blocksize; kernel adapts even to blocksize >
    PAGE_CACHE_SIZE.

It's easy enough to change the kernel to use a smaller value than
PAGE_CACHE_SIZE: just make cramfs_readpage read multiple blocks.

The cost of option 1 is that kernels with a larger PAGE_CACHE_SIZE
value don't get as good compression as they can.

The cost of option 2 relative to option 1 is that the code uses
variables instead of #define'd constants.  The gain is that people
with kernels having larger PAGE_CACHE_SIZE can make use of that if
they don't mind their cramfs being inaccessible to kernels with
smaller PAGE_CACHE_SIZE values.

Option 3 is easy to implement if we don't mind being CPU-inefficient:
e.g. get readpage to decompress to a buffer of size MAX_BLKSIZE (which
must be no larger than 32KB) and discard what it doesn't need.
Getting readpage to read into all the covered pages is harder.

The main advantage of option 3 over options 1 and 2 is better compression.
The cost is greater complexity.  Probably not worth it, but I hope
someone will disagree.  (If it is implemented, then I'll re-use that
code in e2compr.)

Another cost of options 2 and 3 over option 1 is making mkcramfs use a
different block size, but that just means adding and parsing a -b
option.


Inode Size
----------

Given that cramfs will probably be used for CDs etc. as well as just
silicon ROMs, it might make sense to expand the inode a little from
its current 12 bytes.  Inodes other than the root inode are followed
by filename, so the expansion doesn't even have to be a multiple of 4
bytes.
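
To make the current sizes concrete, here is a small C sketch of how
much space one non-root directory entry takes today, given the 12-byte
inode and the filename that is null-padded to a multiple of 4 bytes.
The names CRAMFS_INODE_SIZE, cramfs_padded_namelen and
cramfs_dirent_size are hypothetical and are not taken from cramfs_fs.h.

```c
#include <assert.h>
#include <stddef.h>

/* The "current 12 bytes" mentioned above; hypothetical constant name. */
#define CRAMFS_INODE_SIZE 12u

/* Stored filename length: rounded up to the next multiple of 4 bytes. */
static size_t cramfs_padded_namelen(size_t namelen)
{
    return (namelen + 3) & ~(size_t)3;
}

/* Bytes taken by one non-root entry: the inode, then the padded name. */
static size_t cramfs_dirent_size(size_t namelen)
{
    assert(namelen > 0);
    return CRAMFS_INODE_SIZE + cramfs_padded_namelen(namelen);
}
```

For a 5-character filename this gives 12 + 8 = 20 bytes, and for a
4-character filename 12 + 4 = 16 bytes.
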