mm: make mm->pinned_vm an atomic64 counter
author Davidlohr Bueso <dave@stgolabs.net>
Thu, 7 Feb 2019 00:58:44 +0000 (11:58 +1100)
committer Stephen Rothwell <sfr@canb.auug.org.au>
Fri, 8 Feb 2019 09:30:57 +0000 (20:30 +1100)
commit 14a81014e6f70dd22006855c047282e6d2e4aea4
tree 55cfa242d7b6ddb70863b6b6523b92389e1fc72a
parent 1b0dc1d077e01e2ef49405a1b0661c2a39bbab0b

Patch series "mm: make pinned_vm atomic and simplify users", v3.

This series aims to provide cleanups to users that pin pages (mostly
infiniband) by converting the counter to atomic -- note that Daniel Jordan
also has patches
(http://lkml.kernel.org/r/20181105165558.11698-8-daniel.m.jordan@oracle.com)
for the locked_vm counterpart and vfio.

Apart from removing one source of mmap_sem write-side acquisitions, this
lets us drop a lot of code that defers work when the lock cannot be
acquired, and lets drivers avoid mmap_sem altogether by also converting
gup to gup_fast() and letting the mm handle it.  Users that call
gup_longterm() of course still run under at least the reader side of
mmap_sem.

On a similar topic, and as a potential follow-up, it would be nice to
resurrect Peter's VM_PINNED idea, since the broken semantics that occurred
after bc3e53f682 ("mm: distinguish between mlocked and pinned pages") are
still present.  Encapsulating internal mm logic via mm[un]pin(), so that
drivers need not know about mm internals, and playing nice with
compaction would also be wins.

[1] https://lkml.org/lkml/2018/11/5/854

This patch (of 6):

Taking a sleeping lock _only_ to increment a variable is overkill, yet
pretty much all users do this.  Furthermore, some drivers (e.g. infiniband
and scif) that need pinned semantics go to considerable trouble to defer
the (un)accounting of pinned pages to a workqueue when mmap_sem cannot be
acquired.

By making the counter atomic we no longer need to hold the mmap_sem and
can simplify some of the code around it for pinned_vm users.  The counter
is 64-bit so that we need not worry about overflows, such as those caused
by rdma input controlled from userspace.
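
As a rough illustration of the lock-free accounting described above, the
following is a minimal userspace sketch using C11 atomics.  It is not the
kernel code: struct mm_model, pin_account(), pin_unaccount() and
pinned_pages() are hypothetical names, and the limit check with rollback is
an assumption about how a caller might use such a counter.  In the kernel
the counter would be an atomic64_t manipulated with the atomic64_*()
helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mm_struct; only the counter is modelled. */
struct mm_model {
    _Atomic int64_t pinned_vm;  /* number of pages currently pinned */
};

/*
 * Account npages as pinned without taking any lock.  If the new total
 * exceeds limit, undo the addition and report failure.
 */
static bool pin_account(struct mm_model *mm, int64_t npages, int64_t limit)
{
    /* atomic_fetch_add() returns the old value; add npages for the new total. */
    int64_t new_pinned = atomic_fetch_add(&mm->pinned_vm, npages) + npages;

    if (new_pinned > limit) {
        atomic_fetch_sub(&mm->pinned_vm, npages);
        return false;
    }
    return true;
}

/* Drop the accounting for npages previously pinned. */
static void pin_unaccount(struct mm_model *mm, int64_t npages)
{
    atomic_fetch_sub(&mm->pinned_vm, npages);
}

/* Read the current pinned page count. */
static int64_t pinned_pages(struct mm_model *mm)
{
    return atomic_load(&mm->pinned_vm);
}
```

Because each update is a single atomic operation, no caller has to sleep
on a lock or defer the accounting to a workqueue.  The trade-off in this
sketch is that a concurrent reader may briefly observe a total above the
limit before a failed attempt is rolled back.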

Link: http://lkml.kernel.org/r/20190206175920.31082-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
drivers/infiniband/core/umem.c
drivers/infiniband/hw/hfi1/user_pages.c
drivers/infiniband/hw/qib/qib_user_pages.c
drivers/infiniband/hw/usnic/usnic_uiom.c
drivers/misc/mic/scif/scif_rma.c
fs/proc/task_mmu.c
include/linux/mm_types.h
kernel/events/core.c
kernel/fork.c
mm/debug.c