<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN"[]>

<book id="LKLockingGuide">

<title>Unreliable Guide To Locking</title>

<firstname>Paul</firstname>
<othername>Rusty</othername>
<surname>Russell</surname>

<email>rusty@rustcorp.com.au</email>

<holder>Paul Russell</holder>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.

You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA.

For more details see the file COPYING in the source
distribution of Linux.
<title>Introduction</title>

Welcome to Rusty's Remarkably Unreliable Guide to Kernel
Locking issues. This document describes the locking systems in
the Linux Kernel as we approach 2.4.

It looks like <firstterm linkend="gloss-smp"><acronym>SMP</acronym>
</firstterm> is here to stay; so everyone hacking on the kernel
these days needs to know the fundamentals of concurrency and locking
for SMP.
<title>The Problem With Concurrency</title>

(Skip this if you know what a Race Condition is).

In a normal program, you can increment a counter like so:

<programlisting>
very_important_count++;
</programlisting>

If two instances of this code run at the same time (on two
different CPUs, say), this is what they would expect to happen:
<table>
<title>Expected Results</title>

<tgroup cols=2 align=left>
<thead>
<row>
<entry>Instance 1</entry>
<entry>Instance 2</entry>
</row>
</thead>

<tbody>
<row>
<entry>read very_important_count (5)</entry>
<entry></entry>
</row>
<row>
<entry>add 1 (6)</entry>
<entry></entry>
</row>
<row>
<entry>write very_important_count (6)</entry>
<entry></entry>
</row>
<row>
<entry></entry>
<entry>read very_important_count (6)</entry>
</row>
<row>
<entry></entry>
<entry>add 1 (7)</entry>
</row>
<row>
<entry></entry>
<entry>write very_important_count (7)</entry>
</row>
</tbody>
</tgroup>
</table>
This is what might happen:
<table>
<title>Possible Results</title>

<tgroup cols=2 align=left>
<thead>
<row>
<entry>Instance 1</entry>
<entry>Instance 2</entry>
</row>
</thead>

<tbody>
<row>
<entry>read very_important_count (5)</entry>
<entry></entry>
</row>
<row>
<entry></entry>
<entry>read very_important_count (5)</entry>
</row>
<row>
<entry>add 1 (6)</entry>
<entry></entry>
</row>
<row>
<entry></entry>
<entry>add 1 (6)</entry>
</row>
<row>
<entry>write very_important_count (6)</entry>
<entry></entry>
</row>
<row>
<entry></entry>
<entry>write very_important_count (6)</entry>
</row>
</tbody>
</tgroup>
</table>
This overlap, where what actually happens depends on the
relative timing of multiple tasks, is called a race condition.
The piece of code containing the concurrency issue is called a
critical region. Especially since Linux started running
on SMP machines, race conditions became one of the major issues
in kernel design and implementation.

The solution is to recognize when these simultaneous accesses
occur, and use locks to make sure that only one instance can
enter the critical region at any time. There are many
friendly primitives in the Linux kernel to help you do this.
And then there are the unfriendly primitives, but I'll pretend
they don't exist.
<title>Two Main Types of Kernel Locks: Spinlocks and Semaphores</title>

There are two main types of kernel locks. The fundamental type
is the spinlock
(<filename class=headerfile>include/asm/spinlock.h</filename>),
which is a very simple single-holder lock: if you can't get the
spinlock, you keep trying (spinning) until you can. Spinlocks are
very small and fast, and can be used anywhere.

The second type is a semaphore
(<filename class=headerfile>include/asm/semaphore.h</filename>): it
can have more than one holder at any time (the number decided at
initialization time), although it is most commonly used as a
single-holder lock (a mutex). If you can't get a semaphore,
your task will put itself on the queue, and be woken up when the
semaphore is released. This means the CPU will do something
else while you are waiting, but there are many cases when you
simply can't sleep (see <xref linkend="sleeping-things">), and so
have to use a spinlock instead.
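As a rough sketch of the two (the names <varname>my_lock</varname>,
<varname>my_sem</varname> and the surrounding functions are hypothetical):

<programlisting>
static spinlock_t my_lock = SPIN_LOCK_UNLOCKED;	/* single-holder lock */
static DECLARE_MUTEX(my_sem);			/* semaphore, count of 1 */

void frob_shared_data(void)
{
	spin_lock(&my_lock);
	/* critical region: keep it short, and never sleep here */
	spin_unlock(&my_lock);
}

void frob_sleepable_data(void)
{
	down(&my_sem);		/* may sleep while waiting */
	/* critical region: sleeping is allowed here */
	up(&my_sem);
}
</programlisting>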
Neither type of lock is recursive: see
<xref linkend="techniques-deadlocks">.
<sect1 id="uniprocessor">
<title>Locks and Uniprocessor Kernels</title>

For kernels compiled without <symbol>CONFIG_SMP</symbol>, spinlocks
do not exist at all. This is an excellent design decision: when
no-one else can run at the same time, there is no reason to
have a lock at all.

You should always test your locking code with <symbol>CONFIG_SMP</symbol>
enabled, even if you don't have an SMP test box, because it
will still catch some (simple) kinds of deadlock.

Semaphores still exist, because they are required for
synchronization between <firstterm linkend="gloss-usercontext">user
contexts</firstterm>, as we will see below.
<title>Read/Write Lock Variants</title>

Both spinlocks and semaphores have read/write variants:
<type>rwlock_t</type> and <structname>struct rw_semaphore</structname>.
These divide users into two classes: the readers and the writers. If
you are only reading the data, you can get a read lock, but to write to
the data you need the write lock. Many people can hold a read lock,
but a writer must be the sole holder.

This means much smoother locking if your code divides up
neatly along reader/writer lines. All the discussions below
also apply to read/write variants.
<sect1 id="usercontextlocking">
<title>Locking Only In User Context</title>

If you have a data structure which is only ever accessed from
user context, then you can use a simple semaphore
(<filename class=headerfile>include/asm/semaphore.h</filename>) to
protect it. This is the most trivial case: you initialize the
semaphore to the number of resources available (usually 1), and call
<function>down_interruptible()</function> to grab the semaphore, and
<function>up()</function> to release it. There is also a
<function>down()</function>, which should be avoided, because it
will not return until the semaphore is released, even if a signal
is received.
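A minimal sketch of this pattern (<varname>foo_sem</varname> and
<function>foo_operation()</function> are made-up names):

<programlisting>
static DECLARE_MUTEX(foo_sem);	/* initialized to 1: a mutex */

int foo_operation(void)
{
	/* down_interruptible() returns non-zero if a signal arrived */
	if (down_interruptible(&foo_sem))
		return -EINTR;
	/* ... touch the data only user context ever sees ... */
	up(&foo_sem);
	return 0;
}
</programlisting>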
Example: <filename>linux/net/core/netfilter.c</filename> allows
registration of new <function>setsockopt()</function> and
<function>getsockopt()</function> calls, with
<function>nf_register_sockopt()</function>. Registration and
de-registration are only done on module load and unload (and boot
time, where there is no concurrency), and the list of registrations
is only consulted for an unknown <function>setsockopt()</function>
or <function>getsockopt()</function> system call. The
<varname>nf_sockopt_mutex</varname> is perfect to protect this,
especially since the setsockopt and getsockopt calls may well
sleep.
<sect1 id="lock-user-bh">
<title>Locking Between User Context and BHs</title>

If a <firstterm linkend="gloss-bh">bottom half</firstterm> shares
data with user context, you have two problems. Firstly, the current
user context can be interrupted by a bottom half, and secondly, the
critical region could be entered from another CPU. This is where
<function>spin_lock_bh()</function>
(<filename class=headerfile>include/linux/spinlock.h</filename>) is
used. It disables bottom halves on that CPU, then grabs the lock.
<function>spin_unlock_bh()</function> does the reverse.

This works perfectly for <firstterm linkend="gloss-up"><acronym>UP
</acronym></firstterm> as well: the spin lock vanishes, and this macro
simply becomes <function>local_bh_disable()</function>
(<filename class=headerfile>include/asm/softirq.h</filename>), which
protects you from the bottom half being run.
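A sketch of both sides (all names here are hypothetical): user context
takes the `_bh' variant, while the bottom half itself, which cannot be
interrupted by another bottom half on the same CPU, takes the plain lock:

<programlisting>
static spinlock_t foo_lock = SPIN_LOCK_UNLOCKED;

/* User context: block local BHs, then grab the lock. */
void foo_user(void)
{
	spin_lock_bh(&foo_lock);
	/* ... modify shared data ... */
	spin_unlock_bh(&foo_lock);
}

/* Bottom half: plain spin_lock() protects against other CPUs. */
void foo_bh(void)
{
	spin_lock(&foo_lock);
	/* ... read or modify shared data ... */
	spin_unlock(&foo_lock);
}
</programlisting>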
<sect1 id="lock-user-tasklet">
<title>Locking Between User Context and Tasklets/Soft IRQs</title>

This is exactly the same as above, because
<function>local_bh_disable()</function> actually disables all
softirqs and <firstterm linkend="gloss-tasklet">tasklets</firstterm>
on that CPU as well. It should really be
called `local_softirq_disable()', but the name has been preserved
for historical reasons. Similarly,
<function>spin_lock_bh()</function> would be called
`spin_lock_softirq()' in a perfect world.
<title>Locking Between Bottom Halves</title>

Sometimes a bottom half might want to share data with
another bottom half (especially remember that timers are run
off a bottom half).

<sect2 id="lock-bh-same">
<title>The Same BH</title>

Since a bottom half is never run on two CPUs at once, you
don't need to worry about your bottom half being run twice
at once, even on SMP.

<sect2 id="lock-bh-different">
<title>Different BHs</title>

Since only one bottom half ever runs at a time, you
don't need to worry about race conditions with other bottom
halves. Beware that things might change under you, however,
if someone changes your bottom half to a tasklet. If you
want to make your code future-proof, pretend you're already
running from a tasklet (see below), and doing the extra
locking. Of course, if it's five years before that happens,
you're gonna look like a damn fool.
<sect1 id="lock-tasklets">
<title>Locking Between Tasklets</title>

Sometimes a tasklet might want to share data with another
tasklet, or a bottom half.

<sect2 id="lock-tasklets-same">
<title>The Same Tasklet</title>

Since a tasklet is never run on two CPUs at once, you don't
need to worry about your tasklet being reentrant (running
twice at once), even on SMP.

<sect2 id="lock-tasklets-different">
<title>Different Tasklets</title>

If another tasklet (or bottom half, such as a timer) wants
to share data with your tasklet, you will both need to use
<function>spin_lock()</function> and
<function>spin_unlock()</function> calls.
<function>spin_lock_bh()</function> is
unnecessary here, as you are already in a tasklet, and
none will be run on the same CPU.
<sect1 id="lock-softirqs">
<title>Locking Between Softirqs</title>

Often a <firstterm linkend="gloss-softirq">softirq</firstterm> might
want to share data with itself, a tasklet, or a bottom half.

<sect2 id="lock-softirqs-same">
<title>The Same Softirq</title>

The same softirq can run on the other CPUs: you can use a
per-CPU array (see <xref linkend="per-cpu">) for better
performance. If you're going so far as to use a softirq,
you probably care about scalable performance enough
to justify the extra complexity.

You'll need to use <function>spin_lock()</function> and
<function>spin_unlock()</function> for shared data.

<sect2 id="lock-softirqs-different">
<title>Different Softirqs</title>

You'll need to use <function>spin_lock()</function> and
<function>spin_unlock()</function> for shared data, whether it
be a timer (which can be running on a different CPU), bottom half,
tasklet or the same or another softirq.
<chapter id="hardirq-context">
<title>Hard IRQ Context</title>

Hardware interrupts usually communicate with a bottom half,
tasklet or softirq. Frequently this involves putting work in a
queue, which the BH/softirq will take out.

<sect1 id="hardirq-softirq">
<title>Locking Between Hard IRQ and Softirqs/Tasklets/BHs</title>

If a hardware irq handler shares data with a softirq, you have
two concerns. Firstly, the softirq processing can be
interrupted by a hardware interrupt, and secondly, the
critical region could be entered by a hardware interrupt on
another CPU. This is where <function>spin_lock_irq()</function> is
used. It is defined to disable interrupts on that cpu, then grab
the lock. <function>spin_unlock_irq()</function> does the reverse.

This works perfectly for UP as well: the spin lock vanishes,
and this macro simply becomes <function>local_irq_disable()</function>
(<filename class=headerfile>include/asm/smp.h</filename>), which
protects you from the softirq/tasklet/BH being run.

<function>spin_lock_irqsave()</function>
(<filename>include/linux/spinlock.h</filename>) is a variant
which saves whether interrupts were on or off in a flags word,
which is passed to <function>spin_unlock_irqrestore()</function>. This
means that the same code can be used inside a hard irq handler (where
interrupts are already off) and in softirqs (where the irq
disabling is required).
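A sketch of the `irqsave' variant (assuming a hypothetical
<varname>foo_lock</varname>), safe whether or not the caller already
has interrupts disabled:

<programlisting>
unsigned long flags;

spin_lock_irqsave(&foo_lock, flags);	/* disable irqs, remember old state */
/* ... critical region: safe from local irqs and other CPUs ... */
spin_unlock_irqrestore(&foo_lock, flags); /* restore previous irq state */
</programlisting>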
<chapter id="common-techniques">
<title>Common Techniques</title>

This section lists some common dilemmas and the standard
solutions used in the Linux kernel code. If you use these,
people will find your code simpler to understand.

If I could give you one piece of advice: never sleep with anyone
crazier than yourself. But if I had to give you advice on
locking: <emphasis>keep it simple</emphasis>.

Be reluctant to introduce new locks.

Strangely enough, this is the exact reverse of my advice when
you <emphasis>have</emphasis> slept with someone crazier than yourself.
<sect1 id="techniques-no-writers">
<title>No Writers in Interrupt Context</title>

There is a fairly common case where an interrupt handler needs
access to a critical region, but does not need write access.
In this case, you do not need to use
<function>read_lock_irq()</function>, but only
<function>read_lock()</function> everywhere (since if an interrupt
occurs, the irq handler will only try to grab a read lock, which
won't deadlock). You will still need to use
<function>write_lock_irq()</function>.

Similar logic applies to locking between softirqs/tasklets/BHs
which never need a write lock, and user context:
<function>read_lock()</function> and
<function>write_lock_bh()</function>.
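A sketch of the interrupt-handler case (names hypothetical); the
softirq/tasklet/BH case is the same with the `_bh' variants:

<programlisting>
static rwlock_t foo_rwlock = RW_LOCK_UNLOCKED;

/* Interrupt handler: read-only, so read_lock() cannot deadlock. */
void foo_irq_reader(void)
{
	read_lock(&foo_rwlock);
	/* ... read shared data ... */
	read_unlock(&foo_rwlock);
}

/* User context: writers must keep the irq handler out. */
void foo_writer(void)
{
	write_lock_irq(&foo_rwlock);
	/* ... modify shared data ... */
	write_unlock_irq(&foo_rwlock);
}
</programlisting>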
<sect1 id="techniques-deadlocks">
<title>Deadlock: Simple and Advanced</title>

There is a coding bug where a piece of code tries to grab a
spinlock twice: it will spin forever, waiting for the lock to
be released (spinlocks, rwlocks and semaphores are not
recursive in Linux). This is trivial to diagnose: not a
stay-up-five-nights-talk-to-fluffy-code-bunnies kind of
problem.

For a slightly more complex case, imagine you have a region
shared by a bottom half and user context. If you use a
<function>spin_lock()</function> call to protect it, it is
possible that the user context will be interrupted by the bottom
half while it holds the lock, and the bottom half will then spin
forever trying to get the same lock.

Both of these are called deadlock, and as shown above, it can
occur even with a single CPU (although not on UP compiles,
since spinlocks vanish on kernel compiles with
<symbol>CONFIG_SMP</symbol>=n; you'll still get data corruption
in the second example).

This complete lockup is easy to diagnose: on SMP boxes the
watchdog timer or compiling with <symbol>DEBUG_SPINLOCKS</symbol> set
(<filename>include/linux/spinlock.h</filename>) will show this up
immediately when it happens.
A more complex problem is the so-called `deadly embrace',
involving two or more locks. Say you have a hash table: each
entry in the table is a spinlock, and a chain of hashed
objects. Inside a softirq handler, you sometimes want to
alter an object from one place in the hash to another: you
grab the spinlock of the old hash chain and the spinlock of
the new hash chain, and delete the object from the old one,
and insert it in the new one.

There are two problems here. First, if your code ever
tries to move the object to the same chain, it will deadlock
with itself as it tries to lock it twice. Secondly, if the
same softirq on another CPU is trying to move another object
in the reverse direction, the following could happen:
<table>
<title>Consequences</title>

<tgroup cols=2 align=left>
<thead>
<row>
<entry>CPU 1</entry>
<entry>CPU 2</entry>
</row>
</thead>

<tbody>
<row>
<entry>Grab lock A -> OK</entry>
<entry>Grab lock B -> OK</entry>
</row>
<row>
<entry>Grab lock B -> spin</entry>
<entry>Grab lock A -> spin</entry>
</row>
</tbody>
</tgroup>
</table>
The two CPUs will spin forever, waiting for the other to give up
its lock. It will look, smell, and feel like a crash.
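A sketch of the buggy move operation described above (the hash table
and all names are hypothetical):

<programlisting>
/* BAD: if another CPU moves an object the other way, both spin
   forever. Also deadlocks against itself if old == new. */
spin_lock(&hash_lock[old_bucket]);
spin_lock(&hash_lock[new_bucket]);
/* ... unlink from old chain, link into new chain ... */
spin_unlock(&hash_lock[new_bucket]);
spin_unlock(&hash_lock[old_bucket]);
</programlisting>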
<sect2 id="techs-deadlock-prevent">
<title>Preventing Deadlock</title>

Textbooks will tell you that if you always lock in the same
order, you will never get this kind of deadlock. Practice
will tell you that this approach doesn't scale: when I
create a new lock, I don't understand enough of the kernel
to figure out where in the 5000 lock hierarchy it will fit.

The best locks are encapsulated: they never get exposed in
headers, and are never held around calls to non-trivial
functions outside the same file. You can read through this
code and see that it will never deadlock, because it never
tries to grab another lock while it has that one. People
using your code don't even need to know you are using a
lock.

A classic problem here is when you provide callbacks or
hooks: if you call these with the lock held, you risk simple
deadlock, or a deadly embrace (who knows what the callback
will do?). Remember, the other programmers are out to get
you, so don't do this.
<sect2 id="techs-deadlock-overprevent">
<title>Overzealous Prevention Of Deadlocks</title>

Deadlocks are problematic, but not as bad as data
corruption. Code which grabs a read lock, searches a list,
fails to find what it wants, drops the read lock, grabs a
write lock and inserts the object has a race condition.

If you don't see why, please stay the fuck away from my code.
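For the doubters, a sketch of the racy pattern just described (names
hypothetical): between dropping the read lock and gaining the write
lock, another CPU can insert the same object:

<programlisting>
read_lock(&list_lock);
obj = search_list(key);
read_unlock(&list_lock);
/* RACE WINDOW: another CPU can insert `key' right here. */
if (!obj) {
	write_lock(&list_lock);
	add_to_list(new_obj);	/* may now be a duplicate */
	write_unlock(&list_lock);
}
</programlisting>

The fix is to search again once you hold the write lock (or to take
the write lock to begin with).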
<title>Per-CPU Data</title>

A great technique for avoiding locking which is used fairly
widely is to duplicate information for each CPU. For example,
if you wanted to keep a count of a common condition, you could
use a spin lock and a single counter. Nice and simple.

If that was too slow [it's probably not], you could instead
use a counter for each CPU [don't], then none of them need an
exclusive lock [you're wasting your time here]. To make sure
the CPUs don't have to synchronize caches all the time, align
the counters to cache boundaries by appending
`__cacheline_aligned' to the declaration
(<filename class=headerfile>include/linux/cache.h</filename>).
[Can't you think of anything better to do?]

They will need a read lock to access their own counters,
however. That way you can use a write lock to grant exclusive
access to all of them at once, to tally them up.
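A sketch of such per-CPU counters (hypothetical names), padded out so
each CPU's counter lives in its own cache line:

<programlisting>
struct foo_counter {
	long count;
} __cacheline_aligned;		/* pad each element to a cache line */

static struct foo_counter foo_counters[NR_CPUS];

/* Each CPU touches only its own element. */
void count_foo(void)
{
	foo_counters[smp_processor_id()].count++;
}
</programlisting>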
<title>Big Reader Locks</title>

A classic example of per-CPU information is Ingo's `big
reader' locks
(<filename class=headerfile>linux/include/brlock.h</filename>). These
use the Per-CPU Data techniques described above to create a lock which
is very fast to get a read lock, but agonizingly slow for a write
lock.

Fortunately, there are a limited number of these locks
available: you have to go through a strict interview process
to get one.
<sect1 id="lock-avoidance-rw">
<title>Avoiding Locks: Read And Write Ordering</title>

Sometimes it is possible to avoid locking. Consider the
following case from the 2.2 firewall code, which inserted an
element into a single linked list in user context:

<programlisting>
new->next = i->next;
i->next = new;
</programlisting>
Here the author (Alan Cox, who knows what he's doing) assumes
that the pointer assignments are atomic. This is important,
because networking packets would traverse this list on bottom
halves without a lock. Depending on their exact timing, they
would either see the new element in the list with a valid
<structfield>next</structfield> pointer, or it would not be in the
list yet. A lock is still required against other CPUs inserting
or deleting from the list, of course.

Of course, the writes <emphasis>must</emphasis> be in this
order, otherwise the new element appears in the list with an
invalid <structfield>next</structfield> pointer, and any other
CPU iterating at the wrong time will jump through it into garbage.
Because modern CPUs reorder, Alan's code actually read as follows:
<programlisting>
new->next = i->next;
wmb();
i->next = new;
</programlisting>
The <function>wmb()</function> is a write memory barrier
(<filename class=headerfile>include/asm/system.h</filename>): neither
the compiler nor the CPU will allow any writes to memory after the
<function>wmb()</function> to be visible to other hardware
before any of the writes before the <function>wmb()</function>.

As i386 does not do write reordering, this bug would never
show up on that platform. On other SMP platforms, however,
it will.

There is also <function>rmb()</function> for read ordering: to ensure
any previous variable reads occur before following reads. The simple
<function>mb()</function> macro combines both
<function>rmb()</function> and <function>wmb()</function>.
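A minimal sketch of the read-side guarantee (variables hypothetical):

<programlisting>
a = first_thing;	/* this read is guaranteed to complete... */
rmb();
b = second_thing;	/* ...before this read is issued */
</programlisting>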
Some atomic operations are defined to act as a memory barrier
(ie. as per the <function>mb()</function> macro, but if in
doubt, be explicit).
<!-- Rusty Russell 2 May 2001, 2.4.4 -->

Spinlock operations act as partial barriers: operations after
gaining a spinlock will never be moved to precede the
<function>spin_lock()</function> call, and operations before
releasing a spinlock will never be moved after the
<function>spin_unlock()</function> call.
<!-- Manfred Spraul <manfreds@colorfullife.com>
     24 May 2000 2.3.99-pre9 -->
<sect1 id="lock-avoidance-atomic-ops">
<title>Avoiding Locks: Atomic Operations</title>

There are a number of atomic operations defined in
<filename class=headerfile>include/asm/atomic.h</filename>: these
are guaranteed to be seen atomically from all CPUs in the system, thus
avoiding races. If your shared data consists of a single counter, say,
these operations might be simpler than using spinlocks (although for
anything non-trivial using spinlocks is clearer).

Note that the atomic operations do not, in general, act as memory
barriers. Instead you can insert a memory barrier before or
after <function>atomic_inc()</function> or
<function>atomic_dec()</function> by inserting
<function>smp_mb__before_atomic_inc()</function>,
<function>smp_mb__after_atomic_inc()</function>,
<function>smp_mb__before_atomic_dec()</function> or
<function>smp_mb__after_atomic_dec()</function>
respectively. The advantage of using those macros instead of
<function>smp_mb()</function> is that they are cheaper on some
architectures.
<!-- Sebastian Wilhelmi <seppi@seppi.de> 2002-03-04 -->
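A sketch of the pattern (the counter is hypothetical):

<programlisting>
static atomic_t foo_count = ATOMIC_INIT(0);

atomic_inc(&foo_count);		/* atomic, but implies no barrier */

smp_mb__before_atomic_dec();	/* order earlier writes before the dec */
atomic_dec(&foo_count);
</programlisting>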
<sect1 id="ref-counts">
<title>Protecting A Collection of Objects: Reference Counts</title>

Locking a collection of objects is fairly easy: you get a
single spinlock, and you make sure you grab it before
searching, adding or deleting an object.

The purpose of this lock is not to protect the individual
objects: you might have a separate lock inside each one for
that. It is to protect the <emphasis>data structure
containing the objects</emphasis> from race conditions. Often
the same lock is used to protect the contents of all the
objects as well, for simplicity, but they are inherently
orthogonal (and many other big words designed to confuse).

Changing this to a read-write lock will often help markedly if
reads are far more common than writes. If not, there is
another approach you can use to reduce the time the lock is
held: reference counts.

In this approach, an object has an owner, who sets the
reference count to one. Whenever you get a pointer to the
object, you increment the reference count (a `get' operation).
Whenever you relinquish a pointer, you decrement the reference
count (a `put' operation). When the owner wants to destroy
it, they mark it dead, and do a put.

Whoever drops the reference count to zero (usually implemented
with <function>atomic_dec_and_test()</function>) actually cleans
up and frees the object.

This means that you are guaranteed that the object won't
vanish underneath you, even though you no longer have a lock
on it.

Here's some skeleton code:
<programlisting>
void create_foo(struct foo *x)
{
	atomic_set(&x->use, 1);
	spin_lock_bh(&list_lock);
	... insert in list ...
	spin_unlock_bh(&list_lock);
}

struct foo *get_foo(int desc)
{
	struct foo *ret;

	spin_lock_bh(&list_lock);
	... find in list, setting ret ...
	if (ret) atomic_inc(&ret->use);
	spin_unlock_bh(&list_lock);

	return ret;
}

void put_foo(struct foo *x)
{
	if (atomic_dec_and_test(&x->use))
		kfree(x);
}

void destroy_foo(struct foo *x)
{
	spin_lock_bh(&list_lock);
	... remove from list ...
	spin_unlock_bh(&list_lock);
	put_foo(x);
}
</programlisting>
<sect2 id="helpful-macros">
<title>Macros To Help You</title>

There are a set of debugging macros tucked inside
<filename class=headerfile>include/linux/netfilter_ipv4/lockhelp.h</filename>
and <filename class=headerfile>listhelp.h</filename>: these are very
useful for ensuring that locks are held in the right places to protect
the infrastructure.
<sect1 id="sleeping-things">
<title>Things Which Sleep</title>

You can never call the following routines while holding a
spinlock, as they may sleep. This also means you need to be in
user context.

<itemizedlist>
<listitem>
<para>
Accesses to <firstterm linkend="gloss-userspace">userspace</firstterm>:
<itemizedlist>
<listitem><para><function>copy_from_user()</function></para></listitem>
<listitem><para><function>copy_to_user()</function></para></listitem>
<listitem><para><function>get_user()</function></para></listitem>
<listitem><para><function>put_user()</function></para></listitem>
</itemizedlist>
</para>
</listitem>

<listitem>
<para>
<function>kmalloc(GFP_KERNEL)</function>
</para>
</listitem>

<listitem>
<para>
<function>down_interruptible()</function> and
<function>down()</function>
</para>
<para>
There is a <function>down_trylock()</function> which can be
used inside interrupt context, as it will not sleep.
<function>up()</function> will also never sleep.
</para>
</listitem>
</itemizedlist>
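A sketch of using <function>down_trylock()</function> from somewhere
you cannot sleep (the semaphore name is hypothetical):

<programlisting>
/* down_trylock() returns 0 if it got the semaphore. */
if (down_trylock(&foo_sem) == 0) {
	/* ... we hold the semaphore, and never slept ... */
	up(&foo_sem);	/* up() never sleeps either */
} else {
	/* someone else holds it: back off, don't spin or sleep */
}
</programlisting>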
<function>printk()</function> can be called in
<emphasis>any</emphasis> context, interestingly enough.

<title>The Fucked Up Sparc</title>

Alan Cox says <quote>the irq disable/enable is in the register
window on a sparc</quote>. Andi Kleen says <quote>when you do
restore_flags in a different function you mess up all the
register windows</quote>.

So never pass the flags word set by
<function>spin_lock_irqsave()</function> and brethren to another
function (unless it's declared <type>inline</type>). Usually no-one
does this, but now you've been warned. Dave Miller can never do
anything in a straightforward manner (I can say that, because I have
pictures of him and a certain PowerPC maintainer in a compromising
position).
<sect1 id="racing-timers">
<title>Racing Timers: A Kernel Pastime</title>

Timers can produce their own special problems with races.
Consider a collection of objects (list, hash, etc) where each
object has a timer which is due to destroy it.

If you want to destroy the entire collection (say on module
removal), you might do the following:

<programlisting>
/* THIS CODE BAD BAD BAD BAD: IF IT WAS ANY WORSE IT WOULD USE
   HUNGARIAN NOTATION */
spin_lock_bh(&list_lock);

while (list) {
	struct foo *next = list->next;
	del_timer(&list->timer);
	kfree(list);
	list = next;
}

spin_unlock_bh(&list_lock);
</programlisting>

Sooner or later, this will crash on SMP, because a timer can
have just gone off before the <function>spin_lock_bh()</function>,
and it will only get the lock after we
<function>spin_unlock_bh()</function>, and then try to free
the element (which has already been freed!).
This can be avoided by checking the result of
<function>del_timer()</function>: if it returns
<returnvalue>1</returnvalue>, the timer has been deleted.
If <returnvalue>0</returnvalue>, it means (in this
case) that it is currently running, so we can do:

<programlisting>
retry:
	spin_lock_bh(&list_lock);

	while (list) {
		struct foo *next = list->next;
		if (!del_timer(&list->timer)) {
			/* Give timer a chance to delete this */
			spin_unlock_bh(&list_lock);
			goto retry;
		}
		kfree(list);
		list = next;
	}

	spin_unlock_bh(&list_lock);
</programlisting>
Another common problem is deleting timers which restart
themselves (by calling <function>add_timer()</function> at the end
of their timer function). Because this is a fairly common case
which is prone to races, you should use <function>del_timer_sync()</function>
(<filename class=headerfile>include/linux/timer.h</filename>)
to handle this case. It returns the number of times the timer
had to be deleted before we finally stopped it from adding itself
back.
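A sketch of such a self-restarting timer and its teardown (names
hypothetical):

<programlisting>
static void foo_timer_fn(unsigned long data)
{
	struct foo *f = (struct foo *)data;

	/* ... do periodic work ... */
	f->timer.expires = jiffies + HZ;
	add_timer(&f->timer);	/* re-arms itself every second */
}

/* Teardown: del_timer() alone can lose the race above;
   del_timer_sync() keeps deleting until the timer is really gone
   and cannot re-add itself. */
del_timer_sync(&f->timer);
</programlisting>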
<chapter id="references">
<title>Further reading</title>

<filename>Documentation/spinlocks.txt</filename>:
Linus Torvalds' spinlocking tutorial in the kernel sources.

Unix Systems for Modern Architectures: Symmetric
Multiprocessing and Caching for Kernel Programmers:

Curt Schimmel's very good introduction to kernel level
locking (not written for Linux, but nearly everything
applies). The book is expensive, but really worth every
penny to understand SMP locking. [ISBN: 0201633388]
<chapter id="thanks">
<title>Thanks</title>

Thanks to Telsa Gwynne for DocBooking, neatening and adding
style.

Thanks to Martin Pool, Philipp Rumpf, Stephen Rothwell, Paul
Mackerras, Ruedi Aschwanden, Alan Cox, Manfred Spraul and Tim
Waugh for proofreading, correcting, flaming, commenting.

Thanks to the cabal for having no influence on this document.
<glossary id="glossary">
<title>Glossary</title>

<glossentry id="gloss-bh">
<glossterm>bh</glossterm>

Bottom Half: for historical reasons, functions with
`_bh' in them often now refer to any software interrupt, e.g.
<function>spin_lock_bh()</function> blocks any software interrupt
on the current CPU. Bottom halves are deprecated, and will
eventually be replaced by tasklets. Only one bottom half will be
running at any time.

<glossentry id="gloss-hwinterrupt">
<glossterm>Hardware Interrupt / Hardware IRQ</glossterm>

Hardware interrupt request. <function>in_irq()</function> returns
<returnvalue>true</returnvalue> in a hardware interrupt handler (it
also returns true when interrupts are blocked).

<glossentry id="gloss-interruptcontext">
<glossterm>Interrupt Context</glossterm>

Not user context: processing a hardware irq or software irq.
Indicated by the <function>in_interrupt()</function> macro
returning <returnvalue>true</returnvalue> (although it also
returns true when interrupts or BHs are blocked).

<glossentry id="gloss-smp">
<glossterm><acronym>SMP</acronym></glossterm>

Symmetric Multi-Processor: kernels compiled for multiple-CPU
machines. (CONFIG_SMP=y).

<glossentry id="gloss-softirq">
<glossterm>softirq</glossterm>

Strictly speaking, one of up to 32 enumerated software
interrupts which can run on multiple CPUs at once.
Sometimes used to refer to tasklets and bottom halves as
well (ie. all software interrupts).

<glossentry id="gloss-swinterrupt">
<glossterm>Software Interrupt / Software IRQ</glossterm>

Software interrupt handler. <function>in_irq()</function> returns
<returnvalue>false</returnvalue>; <function>in_softirq()</function>
returns <returnvalue>true</returnvalue>. Tasklets, softirqs and
bottom halves all fall into the category of `software interrupts'.

<glossentry id="gloss-tasklet">
<glossterm>tasklet</glossterm>

A dynamically-registrable software interrupt,
which is guaranteed to only run on one CPU at a time.

<glossentry id="gloss-up">
<glossterm><acronym>UP</acronym></glossterm>

Uni-Processor: Non-SMP. (CONFIG_SMP=n).

<glossentry id="gloss-usercontext">
<glossterm>User Context</glossterm>
The kernel executing on behalf of a particular
process or kernel thread (given by the <function>current</function>
macro). Not to be confused with userspace. Can be interrupted by
software or hardware interrupts.
<glossentry id="gloss-userspace">
<glossterm>Userspace</glossterm>

A process executing its own code outside the kernel.