fs/cramfs/README

Notes on Filesystem Layout
--------------------------

These notes describe what mkcramfs generates.  Kernel requirements are
a bit looser, e.g. the kernel doesn't care if the <file_data> items
are swapped around (though it does care that directory entries
(inodes) in a given directory are contiguous, as this is used by
readdir).

All data is in little-endian format; user-space tools and the kernel
do byte-swapping on big-endian systems.  (See section `Byte Order'
below.)

<filesystem>:
        <superblock>
        <directory_structure>
        <data>

<superblock>: struct cramfs_super (see cramfs_fs.h).

<directory_structure>:
        For each file:
                struct cramfs_inode (see cramfs_fs.h).
                Filename.  Not generally null-terminated, but it is
                 null-padded to a multiple of 4 bytes.
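
As a small illustration of the filename rule above, the following
Python sketch pads a name to a multiple of 4 bytes without adding a
terminator.  The helper name pad_name is hypothetical and is not part
of mkcramfs.

```python
def pad_name(name: bytes) -> bytes:
    """Null-pad a filename to a multiple of 4 bytes.

    No terminator is added: a name whose length is already a multiple
    of 4 is stored as-is, which is why names are "not generally
    null-terminated".
    """
    padded_len = (len(name) + 3) & ~3  # round up to a multiple of 4
    return name.ljust(padded_len, b"\0")
```

For example, pad_name(b"abc") gains one NUL byte, while
pad_name(b"abcd") is returned unchanged.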

The order of inode traversal is described as "width-first" (not to be
confused with breadth-first); i.e. like depth-first but listing all of
a directory's entries before recursing down its subdirectories: the
same order as `ls -AUR' (but without the /^\..*:$/ directory header
lines); put another way, the same order as `find -type d -exec
ls -AU1 {} \;'.
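
The traversal order can be sketched in Python as below.  This is only
an illustration: the nested-dict `tree' is a hypothetical stand-in for
a real directory (a dict for a subdirectory, None for a file), and the
names are visited in sorted order purely to make the result
deterministic.

```python
def width_first(tree, path=""):
    """Yield entry paths in "width-first" order.

    `tree` maps each entry name to a nested dict (a subdirectory) or
    to None (any other kind of file).
    """
    names = sorted(tree)
    # First list every entry of this directory...
    for name in names:
        yield path + name
    # ...and only then recurse into each subdirectory in turn.
    for name in names:
        if isinstance(tree[name], dict):
            yield from width_first(tree[name], path + name + "/")
```

With tree = {"a": {"x": None, "d": {"deep": None}}, "b": None,
"c": {"y": None}} this yields a, b, c, a/d, a/x, a/d/deep, c/y.  A
breadth-first walk would list c/y before a/d/deep.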

Beginning in Linux 2.4.7, directory entries are sorted.  This
optimization allows cramfs_lookup to return more quickly when a
filename does not exist, speeds up user-space directory sorts, etc.

<data>:
        One <file_data> for each file that's either a symlink or a
         regular file of non-zero st_size.

<file_data>:
        nblocks * <block_pointer>
         (where nblocks = (st_size - 1) / blksize + 1, using integer
          division)
        nblocks * <block>
        padding to multiple of 4 bytes

The i'th <block_pointer> for a file stores the byte offset of the
*end* of the i'th <block> (i.e. one past the last byte, which is the
same as the start of the (i+1)'th <block> if there is one).  The first
<block> immediately follows the last <block_pointer> for the file.
<block_pointer>s are each 32 bits long.
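
The pointer arithmetic above can be sketched in Python as follows.
The function names are hypothetical, and the sketch assumes that
pointer values are measured from the same origin as data_offset (the
start of the image); check that against the actual tools before
relying on it.

```python
import struct


def nblocks(st_size: int, blksize: int) -> int:
    """Number of blocks for a file of non-zero size."""
    return (st_size - 1) // blksize + 1


def block_extents(image: bytes, data_offset: int, st_size: int, blksize: int):
    """Return a (start, end) byte range for each compressed block.

    data_offset is where the file's <file_data> begins in the image.
    """
    n = nblocks(st_size, blksize)
    # Each pointer is a 32-bit little-endian value holding the END of
    # its block.
    pointers = struct.unpack_from("<%dI" % n, image, data_offset)
    # The first block immediately follows the last pointer.
    start = data_offset + 4 * n
    extents = []
    for end in pointers:
        extents.append((start, end))
        start = end  # the next block begins where this one ends
    return extents
```

The length of block i is simply end minus start, so no separate
length field is needed.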

The <file_data> items are ordered by a depth-first descent of the
directory tree, i.e. the same order as `find -size +0 \( -type f -o
-type l \) -print'.

<block>: The i'th <block> is the output of zlib's compress function
applied to the i'th blksize-sized chunk of the input data.
(For the last <block> of the file, the input may of course be smaller.)
Each <block> may be a different size.  (See <block_pointer> above.)
<block>s are merely byte-aligned, not generally u32-aligned.
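
To make the <file_data> layout concrete, here is a rough Python sketch
that builds one from a file's contents.  It is not mkcramfs itself:
pack_file_data is a hypothetical name, Python's zlib.compress stands
in for zlib's compress function, and the sketch assumes that
base_offset (where the <file_data> will sit in the image) is already a
multiple of 4 and that pointers count from the start of the image.

```python
import struct
import zlib


def pack_file_data(data: bytes, blksize: int, base_offset: int) -> bytes:
    """Build pointers + compressed blocks + padding for non-empty data."""
    # Compress each blksize-sized chunk independently; the last chunk
    # may be shorter.
    chunks = [data[i:i + blksize] for i in range(0, len(data), blksize)]
    blocks = [zlib.compress(chunk) for chunk in chunks]

    # Blocks start right after the pointer table; each pointer records
    # the END offset of its block.
    end = base_offset + 4 * len(blocks)
    pointers = []
    for block in blocks:
        end += len(block)
        pointers.append(end)

    out = struct.pack("<%dI" % len(pointers), *pointers) + b"".join(blocks)
    # Blocks are only byte-aligned, so pad the whole item to 4 bytes.
    return out + b"\0" * (-len(out) % 4)
```

Because every chunk is compressed on its own, a reader can decompress
any single block without touching its neighbours.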


Holes
-----

This kernel supports cramfs holes (i.e. [efficient representation of]
blocks in uncompressed data consisting entirely of NUL bytes), but by
default mkcramfs doesn't test for or create holes, since cramfs in
kernels up to at least 2.3.39 didn't support holes.  Run mkcramfs
with -z if you want it to create files that can have holes in them.
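
A minimal Python sketch of the idea follows.  It assumes that a hole
is stored as a zero-length <block> (so its pointer equals the previous
one); that is my understanding of the implementation rather than
something these notes state, and the function names are hypothetical.

```python
import zlib


def compress_block(chunk: bytes, make_holes: bool) -> bytes:
    """Compress one chunk; optionally emit a hole for all-NUL input."""
    if make_holes and not any(chunk):
        return b""  # assumed hole representation: an empty block
    return zlib.compress(chunk)


def expand_block(block: bytes, expected_len: int) -> bytes:
    """Inverse of compress_block for a chunk of expected_len bytes."""
    if not block:
        return bytes(expected_len)  # a hole reads back as NUL bytes
    return zlib.decompress(block)
```

With make_holes false, an all-NUL chunk is still compressed normally,
which matches the default behaviour described above.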


Byte Order
----------

When defining the cramfs filesystem, the two options for byte order were
`always use little-endian' (like ext2fs) or `writer chooses endianness;
kernel adapts at runtime'.  Little-endian wins because of code
simplicity and little CPU overhead even on big-endian machines.

While cramfs has always been defined to be little-endian, this
implementation originally required that cramfs filesystems be written
and read with architectures of the same endianness; big-endian machines
would write and read cramfs filesystems with big-endian byte order (the
"incorrect" byte order for cramfs filesystems).

Now, only little-endian cramfs filesystems are supported for both
little-endian and big-endian machines.  If you need to support
big-endian cramfs filesystems for a legacy application on a big-endian
machine, you could remove the byte-swapping, but it would probably be
better to write a one-time byte order conversion program.
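
A conversion program would first need to tell the two byte orders
apart.  The Python sketch below does only that step.  It assumes the
superblock starts with a 32-bit magic number whose value is 0x28cd3d45
(as I recall it from the kernel headers; please verify) and that the
superblock sits at sb_offset in the image; image_byte_order is a
hypothetical name.

```python
import struct

CRAMFS_MAGIC = 0x28CD3D45  # assumed value; check against the headers


def image_byte_order(image: bytes, sb_offset: int = 0) -> str:
    """Return "little", "big" or "unknown" for a cramfs image."""
    # Explicit "<" and ">" formats make the result independent of the
    # byte order of the machine running this code.
    (as_le,) = struct.unpack_from("<I", image, sb_offset)
    if as_le == CRAMFS_MAGIC:
        return "little"
    (as_be,) = struct.unpack_from(">I", image, sb_offset)
    if as_be == CRAMFS_MAGIC:
        return "big"  # legacy image written on a big-endian machine
    return "unknown"
```

Reading every multi-byte field with an explicit little-endian format,
as done here for the magic, is the user-space equivalent of the
byte-swapping the kernel performs on big-endian systems.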


Tools
-----

The cramfs user-space tools, including mkcramfs and cramfsck, are
located at <http://sourceforge.net/projects/cramfs/>.


Future Development
==================

Block Size
----------

(Block size in cramfs refers to the size of input data that is
compressed at a time.  It's intended to be somewhere around
PAGE_CACHE_SIZE for cramfs_readpage's convenience.)

The superblock ought to indicate the block size that the fs was
written for, since comments in <linux/pagemap.h> indicate that
PAGE_CACHE_SIZE may grow in the future (if I interpret the comment
correctly).

Currently, mkcramfs #define's PAGE_CACHE_SIZE as 4096 and uses that
for blksize, whereas Linux-2.3.39 uses its PAGE_CACHE_SIZE, which in
turn is defined as PAGE_SIZE (which can be as large as 32KB on arm).
This discrepancy is a bug, though it's not clear which should be
changed.

One option is to change mkcramfs to take its PAGE_CACHE_SIZE from
<asm/page.h>.  Personally I don't like this option, but it does
require the least amount of change: just change `#define
PAGE_CACHE_SIZE (4096)' to `#include <asm/page.h>'.  The disadvantage
is that the generated cramfs cannot always be shared between different
kernels, not even necessarily kernels of the same architecture if
PAGE_CACHE_SIZE is subject to change between kernel versions
(currently possible with arm and ia64).

The remaining options try to make cramfs more sharable by choosing a
block size.  The options are:

  1. Always 4096 bytes.

  2. Writer chooses blocksize; kernel adapts but rejects blocksize >
     PAGE_CACHE_SIZE.

  3. Writer chooses blocksize; kernel adapts even to blocksize >
     PAGE_CACHE_SIZE.

It's easy enough to change the kernel to use a smaller value than
PAGE_CACHE_SIZE: just make cramfs_readpage read multiple blocks.
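
What "read multiple blocks" means can be sketched in Python as below.
This is a model of the index arithmetic only, not kernel code; the
function name is hypothetical, and the sketch assumes that blksize
divides the page size evenly.

```python
def blocks_for_page(page_index: int, page_size: int, blksize: int,
                    nblocks: int) -> range:
    """Indices of the blocks that together fill one page."""
    if blksize > page_size:
        # This is the case that option 2 rejects.
        raise ValueError("blksize is larger than the page size")
    if page_size % blksize:
        raise ValueError("sketch assumes blksize divides the page size")
    per_page = page_size // blksize
    first = page_index * per_page
    # The last page of a file may be covered by fewer blocks.
    return range(first, min(first + per_page, nblocks))
```

For a 16KB page and 4096-byte blocks, page 1 of a 10-block file is
filled from blocks 4 to 7, and page 2 from blocks 8 and 9.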

The cost of option 1 is that kernels with a larger PAGE_CACHE_SIZE
value don't get as good compression as they could.

The cost of option 2 relative to option 1 is that the code uses
variables instead of #define'd constants.  The gain is that people
with kernels having larger PAGE_CACHE_SIZE can make use of that if
they don't mind their cramfs being inaccessible to kernels with
smaller PAGE_CACHE_SIZE values.

Option 3 is easy to implement if we don't mind being CPU-inefficient:
e.g. get readpage to decompress to a buffer of size MAX_BLKSIZE (which
must be no larger than 32KB) and discard what it doesn't need.
Getting readpage to read into all the covered pages is harder.
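
The CPU-inefficient variant of option 3 might look like the following
Python sketch.  It is a user-space model, not kernel code: read_page
is a hypothetical name, blksize is assumed to be a multiple of the
page size, and zero-filling a short final page is my assumption about
what a page reader would want.

```python
import zlib

MAX_BLKSIZE = 32 * 1024  # upper bound mentioned in the text


def read_page(blocks, blksize: int, page_index: int, page_size: int) -> bytes:
    """Return one page by decompressing the whole block that covers it.

    `blocks` is a list of zlib-compressed blocks, one per blksize-sized
    chunk of the file.
    """
    if blksize > MAX_BLKSIZE:
        raise ValueError("blksize exceeds MAX_BLKSIZE")
    block_index, offset = divmod(page_index * page_size, blksize)
    # Decompress the entire block, then discard everything except the
    # requested page; this is the wasted work the text refers to.
    buf = zlib.decompress(blocks[block_index])
    page = buf[offset:offset + page_size]
    return page.ljust(page_size, b"\0")
```

With 16KB blocks and 4KB pages, reading four consecutive pages
decompresses the same block four times, which is why filling all the
covered pages in one pass would be preferable.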

The main advantage of option 3 over options 1 and 2 is better
compression.  The cost is greater complexity.  Probably not worth it,
but I hope someone will disagree.  (If it is implemented, then I'll
re-use that code in e2compr.)

Another cost of options 2 and 3 over option 1 is making mkcramfs use a
different block size, but that just means adding and parsing a -b
option.


Inode Size
----------

Given that cramfs will probably be used for CDs etc. as well as just
silicon ROMs, it might make sense to expand the inode a little from
its current 12 bytes.  Inodes other than the root inode are followed
by a filename, so the expansion doesn't even have to be a multiple of
4 bytes.
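
For reference, the current 12 bytes can be unpacked as in the Python
sketch below.  The field widths (mode 16, uid 16, size 24, gid 8,
namelen 6, offset 26 bits) and the convention that namelen and offset
are stored in units of 4 bytes are my reading of cramfs_fs.h, not
something stated in these notes, so treat them as assumptions and
check the header.

```python
import struct


def unpack_inode(raw: bytes) -> dict:
    """Decode a 12-byte little-endian cramfs inode (assumed layout)."""
    w0, w1, w2 = struct.unpack("<3I", raw[:12])
    return {
        "mode": w0 & 0xFFFF,        # low 16 bits of word 0
        "uid": w0 >> 16,            # high 16 bits of word 0
        "size": w1 & 0xFFFFFF,      # low 24 bits of word 1
        "gid": w1 >> 24,            # high 8 bits of word 1
        "namelen": (w2 & 0x3F) * 4, # low 6 bits, in units of 4 bytes
        "offset": (w2 >> 6) * 4,    # high 26 bits, in units of 4 bytes
    }
```

Under these assumptions every bit of the three words is already in
use, so a larger uid, gid or size field would require growing the
inode.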