Documentation for /proc/sys/vm/*        Kernel version 2.4.28
=============================================================

(c) 1998, 1999, Rik van Riel <riel@nl.linux.org>

(c) 2004, Marc-Christian Petersen <m.c.p@linux-systeme.com>
  - Removed documentation for knobs that no longer exist,
    having been removed early in the 2.4 series
  - Corrected values for bdflush
  - Documented missing tunables
  - Documented aa-vm tunables

For general info and legal blurb, please look in README.
=============================================================

This file contains the documentation for the sysctl files in
/proc/sys/vm and is valid for Linux kernel v2.4.28.

The files in this directory can be used to tune the operation
of the virtual memory (VM) subsystem of the Linux kernel, and
three of the files (bdflush, max-readahead, min-readahead)
also have some influence on disk usage.

Default values and initialization routines for most of these
files can be found in mm/vmscan.c, mm/page_alloc.c and
other files under mm/.

Currently, these files are in /proc/sys/vm:
- bdflush
- block_dump
- kswapd
- laptop_mode
- max-readahead
- min-readahead
- max_map_count
- overcommit_memory
- page-cluster
- pagetable_cache
- vm_anon_lru
- vm_cache_scan_ratio
- vm_gfp_debug
- vm_lru_balance_ratio
- vm_mapped_ratio
- vm_passes
- vm_vfs_scan_ratio

=============================================================

bdflush:
--------

This file controls the operation of the bdflush kernel
daemon. The source code to this struct can be found in
fs/buffer.c. It currently contains 9 integer values,
of which 6 are actually used by the kernel.

nfract:       The first parameter governs the maximum
              number of dirty buffers in the buffer
              cache. Dirty means that the contents of the
              buffer still have to be written to disk (as
              opposed to a clean buffer, which can just be
              forgotten about). Setting this to a high
              value means that Linux can delay disk writes
              for a long time, but it also means that it
              will have to do a lot of I/O at once when
              memory becomes short. A low value will
              spread out disk I/O more evenly, at the cost
              of more frequent I/O operations. The default
              value is 30%, the minimum is 0%, and the
              maximum is 100%.

ndirty:       The second parameter (ndirty) gives the
              maximum number of dirty buffers that bdflush
              can write to the disk at one time. A high
              value will mean delayed, bursty I/O, while a
              small value can lead to memory shortage when
              bdflush isn't woken up often enough. The
              default value is 500 dirty buffers, the
              minimum is 1, and the maximum is 50000.

dummy2:       The third parameter is not used.

dummy3:       The fourth parameter is not used.

interval:     The fifth parameter, interval, is the minimum
              rate at which kupdate will wake and flush.
              The value is in jiffies (clockticks); the
              number of jiffies per second is normally 100
              (Alpha is 1024). Thus, x*HZ is x seconds. The
              default value is 5 seconds, the minimum is 0
              seconds, and the maximum is 10,000 seconds.

age_buffer:   The sixth parameter, age_buffer, governs the
              maximum time Linux waits before writing out a
              dirty buffer to disk. The value is in jiffies.
              The default value is 30 seconds, the minimum
              is 1 second, and the maximum is 10,000 seconds.

sync:         The seventh parameter, nfract_sync, governs
              the percentage of buffer cache that is dirty
              before bdflush activates synchronously. This
              can be viewed as the hard limit before
              bdflush forces buffers to disk. The default
              is 60%, the minimum is 0%, and the maximum
              is 100%.

stop_bdflush: The eighth parameter, nfract_stop_bdflush,
              governs the percentage of buffer cache that
              is dirty which will stop bdflush. The default
              is 20%, the minimum is 0%, and the maximum
              is 100%.

dummy5:       The ninth parameter is not used.

So the default is: 30 500 0 0 500 3000 60 20 0 for 100 HZ.
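
The nine values are read and written as a single line. As a quick
sketch (paths and defaults as documented above; the jiffies
arithmetic assumes HZ=100, so scale accordingly on Alpha):

```shell
# Show the current nine bdflush values, if this kernel exposes them.
if [ -r /proc/sys/vm/bdflush ]; then
    cat /proc/sys/vm/bdflush
fi

# Convert the interval and age_buffer defaults from jiffies to seconds.
HZ=100                           # 1024 on Alpha
interval_secs=$((500 / HZ))      # default interval   = 500 jiffies
age_buffer_secs=$((3000 / HZ))   # default age_buffer = 3000 jiffies
echo "interval=${interval_secs}s age_buffer=${age_buffer_secs}s"

# As root, all nine values are written back in one go, e.g. the defaults:
# echo "30 500 0 0 500 3000 60 20 0" > /proc/sys/vm/bdflush
```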
=============================================================

block_dump:
-----------

It can happen that the disk keeps spinning up and you don't
quite know why or what causes it. The laptop mode patch has a
little helper for that as well. When set to 1, it will dump
info to the kernel message buffer about which process caused
the I/O. Be careful when playing with this setting. It is
advisable to shut down syslog first! The default is 0.
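
A hedged sketch of that procedure (root required, on a 2.4 kernel
with the laptop mode patch; stop syslog first, as advised, or the
logger's own writes will show up in the trace):

```shell
# Enable block_dump, wait a bit, then inspect the kernel message
# buffer for lines naming the processes that caused disk I/O.
if [ -w /proc/sys/vm/block_dump ]; then
    echo 1 > /proc/sys/vm/block_dump
    sleep 10
    dmesg | tail -n 20
    echo 0 > /proc/sys/vm/block_dump   # always switch it back off
fi
```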
=============================================================

kswapd:
-------

Kswapd is the kernel swapout daemon. That is, kswapd is that
piece of the kernel that frees memory when it gets fragmented
or full. Since every system is different, you'll probably
want some control over this piece of the system.

The numbers in this file correspond to the numbers in the
struct pager_daemon {tries_base, tries_min, swap_cluster};
tries_base and swap_cluster probably have the largest
influence on system performance.

tries_base    The maximum number of pages kswapd tries to
              free in one round is calculated from this
              number. Usually this number will be divided
              by 4 or 8 (see mm/vmscan.c), so it isn't as
              big as it looks.
              When you need to increase the bandwidth to/
              from swap, you'll want to increase this
              number.

tries_min     This is the minimum number of times kswapd
              tries to free a page each time it is called.
              Basically it's just there to make sure that
              kswapd frees some pages even when it's being
              called with minimum priority.

swap_cluster  This is the number of pages kswapd writes in
              one turn. You want this large so that kswapd
              does its I/O in large chunks and the disk
              doesn't have to seek often, but you don't
              want it to be too large since that would
              flood the request queue.

The default value is: 512 32 8.
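
To make the swap_cluster figure concrete (the 4KB page size is an
i386 assumption; other architectures differ):

```shell
# Read the current kswapd tuning, if present:
# tries_base tries_min swap_cluster.
if [ -r /proc/sys/vm/kswapd ]; then
    cat /proc/sys/vm/kswapd
fi

# With the default swap_cluster of 8 and 4KB pages, kswapd writes
# 8 * 4 = 32KB per turn.
page_kb=4
swap_cluster=8
chunk_kb=$((swap_cluster * page_kb))
echo "kswapd I/O chunk: ${chunk_kb}KB"

# As root, raise tries_base to increase swap bandwidth, e.g.:
# echo "1024 32 8" > /proc/sys/vm/kswapd
```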
=============================================================

laptop_mode:
------------

Setting this to 1 switches the VM (and block layer) to laptop
mode. Leaving it at 0 makes the kernel work like before. When
in laptop mode, you also want to extend the intervals
described in Documentation/laptop-mode.txt.
See the laptop-mode.sh script for how to do that.

The default value is 0.
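
A minimal sketch of switching it on by hand (the bdflush values
below are purely illustrative; the laptop-mode.sh script does this,
plus the interval adjustments, for you):

```shell
# Switch the VM and block layer into laptop mode (root required).
if [ -w /proc/sys/vm/laptop_mode ]; then
    echo 1 > /proc/sys/vm/laptop_mode
fi
# Per Documentation/laptop-mode.txt you will also want to stretch the
# bdflush interval and age_buffer so the disk can stay spun down, e.g.:
# echo "30 500 0 0 6000 60000 60 20 0" > /proc/sys/vm/bdflush
```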
=============================================================

max-readahead:
--------------

This tunable affects how early the Linux VFS will fetch the
next block of a file into memory. File readahead values are
determined on a per-file basis in the VFS and are adjusted
based on the behavior of the application accessing the file.
Any time the current position being read in a file plus the
current readahead value results in the file pointer pointing
to the next block in the file, that block will be fetched
from disk. By raising this value, the Linux kernel will allow
the readahead value to grow larger, resulting in more blocks
being prefetched from disk for applications that access files
in a predictable, uniform linear fashion. This can result in
performance improvements, but can also result in excess (and
often unnecessary) memory usage. Lowering this value has the
opposite effect. By forcing readaheads to be less aggressive,
memory may be conserved at a potential performance impact.

The default value is 31.
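
To put the page count in byte terms (assuming 4KB pages, an i386
assumption):

```shell
# The default ceiling of 31 pages corresponds, with 4KB pages, to:
max_readahead=31
page_kb=4
window_kb=$((max_readahead * page_kb))
echo "max readahead window: ${window_kb}KB"

# As root, allow larger readahead for streaming workloads, e.g.:
# echo 127 > /proc/sys/vm/max-readahead
```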
=============================================================

min-readahead:
--------------

Like max-readahead, min-readahead affects the readahead
value; it places a floor on it. Raising this number forces a
file's readahead value to be unconditionally higher, which
can bring about performance improvements, provided that all
file access in the system is predictably linear from the
start to the end of a file. This of course results in higher
memory usage from the pagecache. Conversely, lowering this
value allows the kernel to conserve pagecache memory, at a
potential performance cost.

The default value is 3.
=============================================================

max_map_count:
--------------

This file contains the maximum number of memory map areas a
process may have. Memory map areas are used as a side-effect
of calling malloc, directly by mmap and mprotect, and also
when loading shared libraries.

While most applications need less than a thousand maps,
certain programs, particularly malloc debuggers, may consume
lots of them, e.g. up to one or two maps per allocation.

The default value is 65536.
=============================================================

overcommit_memory:
------------------

This value contains a flag to enable memory overcommitment.
When this flag is 0, the kernel checks before each malloc()
to see if there's enough memory left. If the flag is nonzero,
the system pretends there's always enough memory.

This feature can be very useful because there are a lot of
programs that malloc() huge amounts of memory "just-in-case"
and don't use much of it. The default value is 0.

Look at mm/mmap.c::vm_enough_memory() for more information.
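
A quick sketch of inspecting and flipping the flag (writing
requires root):

```shell
# Show the current overcommit setting, if available.
if [ -r /proc/sys/vm/overcommit_memory ]; then
    cat /proc/sys/vm/overcommit_memory
fi
# Pretend there is always enough memory, which suits programs that
# malloc() far more than they touch:
# echo 1 > /proc/sys/vm/overcommit_memory
```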
=============================================================

page-cluster:
-------------

The Linux VM subsystem avoids excessive disk seeks by reading
multiple pages on a page fault. The number of pages it reads
is dependent on the amount of memory in your machine.

The number of pages the kernel reads in at once is equal to
2^page-cluster. Values above 2^5 don't make much sense for
swap because we only cluster swap data in 32-page groups.
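
The arithmetic, with an illustrative (not necessarily default)
setting of 4, and assuming 4KB pages:

```shell
# Pages read per fault = 2^page-cluster; with 4KB pages:
page_cluster=4                  # example value; the real one is in /proc
pages=$((1 << page_cluster))    # 2^page_cluster
kb=$((pages * 4))
echo "2^${page_cluster} = ${pages} pages = ${kb}KB per cluster"
# Values above 5 are pointless for swap: 2^5 = 32 pages is the
# swap clustering group size mentioned above.
```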
=============================================================

pagetable_cache:
----------------

The kernel keeps a number of page tables in a per-processor
cache (this helps a lot on SMP systems). The cache size for
each processor will be between the low and the high value.

On a low-memory, single CPU system you can safely set these
values to 0 so you don't waste memory. On SMP systems the
cache is used so that the system can do fast pagetable
allocations without having to acquire the kernel memory lock.

For large systems, the settings are probably fine. For normal
systems they won't hurt a bit. For small systems (<16MB RAM)
it might be advantageous to set both values to 0.

The default value is: 25 50.
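
For a small single-CPU machine, the suggestion above amounts to
(root required):

```shell
# Disable the per-CPU pagetable cache on a low-memory UP box by
# setting both the low and the high watermark to zero.
if [ -w /proc/sys/vm/pagetable_cache ]; then
    echo "0 0" > /proc/sys/vm/pagetable_cache   # low high
fi
# On SMP systems, leave the defaults ("25 50") alone.
```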
=============================================================

vm_anon_lru:
------------

Selects whether to immediately insert anon pages in the LRU.
Immediately means as soon as they're allocated during the
page faults. If this is set to 0, they're inserted only after
the first swapout.

Having anon pages immediately inserted in the LRU allows the
VM to know better when it's worthwhile to start swapping
anonymous RAM; it will start to swap earlier and it should
swap smoother and faster, but it will decrease scalability
on machines with more than 16 CPUs by an order of magnitude.
Big SMP/NUMA systems definitely can't take a hit on a global
spinlock at every anon page allocation.

Low-RAM machines that swap all the time will want to turn
this on (i.e. set it to 1).

The default value is 0.
=============================================================

vm_cache_scan_ratio:
--------------------

is how much of the inactive LRU queue we will scan in one go.
A value of 6 for vm_cache_scan_ratio implies that we'll scan
1/6 of the inactive lists during a normal aging round.

The default value is 6.
=============================================================

vm_gfp_debug:
-------------

is a debug knob: when __alloc_pages fails, the kernel dumps
a stack trace. This will mostly happen during OOM conditions
(hopefully ;).

The default value is 0.
=============================================================

vm_lru_balance_ratio:
---------------------

controls the balance between active and inactive cache. The
bigger vm_lru_balance_ratio is, the easier the active cache
will grow, because we'll rotate the active list slowly. A
value of 2 means we'll go towards a balance where roughly
1/3 of the cache is inactive and 2/3 is active.

The default value is 2.
=============================================================

vm_mapped_ratio:
----------------

controls the pageout rate: the smaller it is, the earlier
we'll start to page out.

The default value is 100.
=============================================================

vm_passes:
----------

is the number of VM passes before failing the memory
balancing. Take into account that 3 passes are needed for a
flush/wait/free cycle and that we only scan
1/vm_cache_scan_ratio of the inactive list at each pass.

The default value is 60.
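
Combining this with vm_cache_scan_ratio (defaults as documented
above; this ignores the 3 passes a flush/wait/free cycle costs):

```shell
vm_passes=60
vm_cache_scan_ratio=6
# Each pass scans 1/6 of the inactive list, so before giving up the
# VM can sweep the whole inactive list roughly this many times:
full_sweeps=$((vm_passes / vm_cache_scan_ratio))
echo "up to ${full_sweeps} full inactive-list sweeps"
```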
=============================================================

vm_vfs_scan_ratio:
------------------

is what proportion of the VFS queues we will scan in one go.
A value of 6 for vm_vfs_scan_ratio implies that 1/6th of the
unused-inode, dentry and dquot caches will be freed during a
normal aging round.

Big fileservers (NFS, SMB etc.) probably want to set this
value lower.

The default value is 6.
=============================================================