linux
7 years agoBtrfs: skip checksum verification if IO error occurs
Liu Bo [Fri, 14 Apr 2017 01:11:48 +0000 (18:11 -0700)]
Btrfs: skip checksum verification if IO error occurs

Currently dio read also goes to verify checksum if -EIO has been returned,
although it usually fails on checksum, it's not necessary at all, we could
directly check if there is another copy to read.

And with this, the behavior of dio read is now consistent with that of
buffered read.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ use bool for uptodate ]
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: tolerate errors if we have retried successfully
Liu Bo [Wed, 17 May 2017 21:42:00 +0000 (15:42 -0600)]
Btrfs: tolerate errors if we have retried successfully

With raid1 profile, dio read isn't tolerating IO errors if read length is
less than the stripe length (64K).

Our bio didn't get split in btrfs_submit_direct_hook() if (dip->flags &
BTRFS_DIO_ORIG_BIO_SUBMITTED) is true and that happens when the read
length is less than 64k.  In this case, if the underlying device returns
error somehow, bio->bi_error has recorded that error.

If we could recover the correct data from another copy in profile raid1/10/5/6,
with btrfs_subio_endio_read() returning 0, bio would have the correct data in
its vector, but bio->bi_error is not updated accordingly so that the following
dio_end_io(dio_bio, bio->bi_error) makes directIO think this read has failed.

This fixes the problem by setting bio's error to 0 if a good copy has been
found.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: pass bytes to btrfs_bio_alloc
David Sterba [Fri, 2 Jun 2017 16:35:36 +0000 (18:35 +0200)]
btrfs: pass bytes to btrfs_bio_alloc

Most callers of btrfs_bio_alloc convert from bytes to sectors. Hide that
in the helper and simplify the logic in the callsers.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: opencode trivial compressed_bio_alloc, simplify error handling
David Sterba [Fri, 2 Jun 2017 16:01:51 +0000 (18:01 +0200)]
btrfs: opencode trivial compressed_bio_alloc, simplify error handling

compressed_bio_alloc is now a trivial wrapper around btrfs_bio_alloc, no
point keeping it. The error handling can be simplified, as we know
btrfs_bio_alloc will never fail.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: remove redundant parameters from btrfs_bio_alloc
David Sterba [Fri, 2 Jun 2017 15:55:44 +0000 (17:55 +0200)]
btrfs: remove redundant parameters from btrfs_bio_alloc

All callers pass gfp_flags=GFP_NOFS and nr_vecs=BIO_MAX_PAGES.

submit_extent_page adds __GFP_HIGH that does not make a difference in
our case as it allows access to memory reserves but otherwise does not
change the constraints.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: sink gfp parameter to btrfs_bio_clone
David Sterba [Fri, 2 Jun 2017 15:48:13 +0000 (17:48 +0200)]
btrfs: sink gfp parameter to btrfs_bio_clone

All callers pass GFP_NOFS.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: btrfs_io_bio_alloc never fails, skip error handling
David Sterba [Fri, 2 Jun 2017 15:38:30 +0000 (17:38 +0200)]
btrfs: btrfs_io_bio_alloc never fails, skip error handling

Update direct callers of btrfs_io_bio_alloc that do error handling, that
we can now remove.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: btrfs_bio_clone never fails, skip error handling
David Sterba [Fri, 2 Jun 2017 15:38:30 +0000 (17:38 +0200)]
btrfs: btrfs_bio_clone never fails, skip error handling

Update direct callers of btrfs_bio_clone that do error handling, that we
can now remove.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: btrfs_bio_alloc never fails, skip error handling
David Sterba [Fri, 2 Jun 2017 15:38:30 +0000 (17:38 +0200)]
btrfs: btrfs_bio_alloc never fails, skip error handling

Update direct callers of btrfs_bio_alloc that do error handling, that we
can now remove.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: bioset allocations will never fail, adapt our helpers
David Sterba [Fri, 2 Jun 2017 15:26:26 +0000 (17:26 +0200)]
btrfs: bioset allocations will never fail, adapt our helpers

Christoph pointed out that bio allocations backed by a bioset will never
fail.  As we always use a bioset for all bio allocations, we can skip
the error handling.  This patch adjusts our low-level helpers, the
cascaded changes to all callers will come next.

CC: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: switch to kvmalloc and GFP_KERNEL in lzo/zlib alloc_workspace
David Sterba [Wed, 31 May 2017 15:21:15 +0000 (17:21 +0200)]
btrfs: switch to kvmalloc and GFP_KERNEL in lzo/zlib alloc_workspace

The compression workspace buffers are larger than a page so we use
vmalloc, unconditionally. This is not always necessary as there might be
contiguous memory available.

Let's use the kvmalloc helpers that will try kmalloc first and fallback
to vmalloc. For that they require GFP_KERNEL flags. As we now have the
alloc_workspace calls protected by memalloc_nofs in the critical
contexts, we can safely use GFP_KERNEL.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: switch kmallocs to GFP_KERNEL in lzo/zlib alloc_workspace
David Sterba [Wed, 31 May 2017 15:21:15 +0000 (17:21 +0200)]
btrfs: switch kmallocs to GFP_KERNEL in lzo/zlib alloc_workspace

As alloc_workspace is now protected by memalloc_nofs where needed,
we can switch the kmalloc to use GFP_KERNEL.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: add memalloc_nofs protections around alloc_workspace callback
David Sterba [Wed, 31 May 2017 15:14:56 +0000 (17:14 +0200)]
btrfs: add memalloc_nofs protections around alloc_workspace callback

The workspaces are preallocated at the beginning where we can safely use
GFP_KERNEL, but in some cases the find_workspace might reach the
allocation again, now in a more restricted context when the bios or
pages are being compressed.

To avoid potential lockup when alloc_workspace -> vmalloc would silently
use the GFP_KERNEL, add the memalloc_nofs helpers around the critical
call site.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: adjust includes after vmalloc removal
David Sterba [Wed, 31 May 2017 17:44:31 +0000 (19:44 +0200)]
btrfs: adjust includes after vmalloc removal

As we don't use vmalloc/vzalloc/vfree directly in ctree.c, we can now
use the proper header that defines kvmalloc.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: use GFP_KERNEL in init_ipath
David Sterba [Wed, 31 May 2017 17:32:09 +0000 (19:32 +0200)]
btrfs: use GFP_KERNEL in init_ipath

Now that init_ipath is called either from a safe context or with
memalloc_nofs protection, we can switch to GFP_KERNEL allocations in
init_path and init_data_container.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: scrub: add memalloc_nofs protection around init_ipath
David Sterba [Wed, 31 May 2017 17:21:38 +0000 (19:21 +0200)]
btrfs: scrub: add memalloc_nofs protection around init_ipath

init_ipath is called from a safe ioctl context and from scrub when
printing an error.  The protection is added for three reasons:

* init_data_container calls vmalloc and this does not work as expected
  in the GFP_NOFS context, so this silently does GFP_KERNEL and might
  deadlock in some cases
* keep the context constraint of GFP_NOFS, used by scrub
* we want to use GFP_KERNEL unconditionally inside init_ipath or its
  callees

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: send: use kvmalloc in iterate_dir_item
David Sterba [Wed, 31 May 2017 16:40:02 +0000 (18:40 +0200)]
btrfs: send: use kvmalloc in iterate_dir_item

We use a growing buffer for xattrs larger than a page size, at some
point vmalloc is unconditionally used for larger buffers. We can still
try to avoid it using the kvmalloc helper.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: replace opencoded kvzalloc with the helper
David Sterba [Wed, 31 May 2017 16:40:02 +0000 (18:40 +0200)]
btrfs: replace opencoded kvzalloc with the helper

The logic of kmalloc and vmalloc fallback is opencoded in
several places, we can now use the existing helper.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: lzo: compressed data size must be less then input size
Timofey Titovets [Mon, 29 May 2017 23:18:04 +0000 (02:18 +0300)]
Btrfs: lzo: compressed data size must be less then input size

Logic already skips if compression makes data bigger, let's sync lzo
with zlib and also return error if compressed size is equal to
input size.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: simplify code with bio_io_error
Guoqing Jiang [Fri, 2 Jun 2017 08:08:50 +0000 (16:08 +0800)]
btrfs: simplify code with bio_io_error

bio_io_error was introduced in the commit 4246a0b63bd8f56a1469b
("block: add a bi_error field to struct bio"), so use it to simplify
code.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: use memalloc_nofs and kvzalloc() for free space tree bitmaps
Omar Sandoval [Mon, 5 Jun 2017 07:12:31 +0000 (00:12 -0700)]
Btrfs: use memalloc_nofs and kvzalloc() for free space tree bitmaps

First, instead of open-coding the vmalloc() fallback, use the new
kvzalloc() helper. Second, use memalloc_nofs_{save,restore}() instead of
GFP_NOFS, as vmalloc() uses some GFP_KERNEL allocations internally which
could lead to deadlocks.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: use generic slab for for btrfs_transaction
David Sterba [Tue, 28 Mar 2017 10:06:05 +0000 (12:06 +0200)]
btrfs: use generic slab for for btrfs_transaction

Observing the number of slab objects of btrfs_transaction, there's just
one active on an almost quiescent filesystem, and the number of objects
goes to about ten when sync is in progress. Then the nubmer goes down to
1.  This matches the expectations of the transaction lifetime.

For such use the separate slab cache is not justified, as we do not
reuse objects frequently. For the shortlived transaction, the generic
slab (size 512) should be ok. We can optimistically expect that the 512
slabs are not all used (fragmentation) and there are free slots to take
when we do the allocation, compared to potentially allocating a whole new
page for the separate slab.

We'll lose the stats about the object use, which could be added later if
we really need them.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: scrub: embed scrub_wr_ctx into scrub context
David Sterba [Tue, 16 May 2017 17:10:32 +0000 (19:10 +0200)]
btrfs: scrub: embed scrub_wr_ctx into scrub context

The structure scrub_wr_ctx is not used anywhere just the scrub context,
we can move the members there. The tgtdev is renamed so it's more clear
that it belongs to the "wr" part.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: scrub: use fs_info::sectorsize and drop it from scrub context
David Sterba [Tue, 16 May 2017 17:10:41 +0000 (19:10 +0200)]
btrfs: scrub: use fs_info::sectorsize and drop it from scrub context

As we now have the node/block sizes in fs_info, we can use them and can
drop the local copies.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: add statx support
Yonghong Song [Fri, 12 May 2017 22:07:43 +0000 (15:07 -0700)]
Btrfs: add statx support

Return enhanced file attributes from the btrfs, including:
  (1). inode creation time as stx_btime, and
  (2). Certain BTRFS_INODE_xxx flags are mapped to stx_attributes flags.

Example output:
[root@localhost ~]# cat t.sh
touch t
chattr +aic t
~/linux/samples/statx/test-statx t
chattr -aic t
touch t
echo "========================================"
~/linux/samples/statx/test-statx t
/bin/rm t
[root@localhost ~]# ./t.sh
statx(t) = 0
results=fff
     Size: 0               Blocks: 0          IO Block: 4096    regular file
Device: 00:1c           Inode: 63962       Links: 1
Access: (0644/-rw-r--r--)  Uid:     0   Gid:     0
Access: 2017-05-11 16:03:13.999856591-0700
Modify: 2017-05-11 16:03:13.999856591-0700
Change: 2017-05-11 16:03:14.000856663-0700
   Birth: 2017-05-11 16:03:13.999856591-0700
Attributes: 0000000000000034 (........ ........ ........ ........ ........ ........ ........ .-ai.c..)
========================================
statx(t) = 0
results=fff
  Size: 0               Blocks: 0          IO Block: 4096    regular file
Device: 00:1c           Inode: 63962       Links: 1
Access: (0644/-rw-r--r--)  Uid:     0   Gid:     0
Access: 2017-05-11 16:03:14.006857097-0700
Modify: 2017-05-11 16:03:14.006857097-0700
Change: 2017-05-11 16:03:14.006857097-0700
  Birth: 2017-05-11 16:03:13.999856591-0700
Attributes: 0000000000000000 (........ ........ ........ ........ ........ ........ ........ .---.-..)
[root@localhost ~]#

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: lzo: fix typo in error message after failed deflate
Timofey Titovets [Thu, 25 May 2017 18:12:19 +0000 (21:12 +0300)]
Btrfs: lzo: fix typo in error message after failed deflate

Fix copy paste typo in debug message for lzo.c, lzo is not deflate.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: btrfs_wait_tree_block_writeback can be void return
Jeff Layton [Thu, 25 May 2017 10:39:52 +0000 (06:39 -0400)]
btrfs: btrfs_wait_tree_block_writeback can be void return

Nothing checks its return value.

Is it safe to skip checking return value of btrfs_wait_tree_block_writeback?

Liu Bo: I think yes, it's used in walk_log_tree which is called in two
places, free_log_tree and log replay.  For free_log_tree, it waits for
any running writeback of the extent buffer under freeing to finish in
case we need to access the eb pointer from page->private, and it's OK to
not check the return value, while for log replay, it's doesn't wait
because wc->wait is not set. So neither cares about the writeback error.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
[ added more explanation to changelog, from Liu Bo ]
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: remove __BTRFS_LEAF_DATA_SIZE
Nikolay Borisov [Mon, 22 May 2017 10:16:11 +0000 (13:16 +0300)]
btrfs: remove __BTRFS_LEAF_DATA_SIZE

__BTRFS_LAF_DATA_SIZE is used only by BTRFS_LEAF_DATA_SIZE. Make the
latter subsume the former.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: rename btrfs_leaf_data to BTRFS_LEAF_DATA_OFFSET
Nikolay Borisov [Mon, 29 May 2017 06:43:43 +0000 (09:43 +0300)]
btrfs: rename btrfs_leaf_data to BTRFS_LEAF_DATA_OFFSET

Commit 5f39d397dfbe ("Btrfs: Create extent_buffer interface
for large blocksizes") refactored btrfs_leaf_data function to take
extent_buffer rather than struct btrfs_leaf. However, as it turns out the
parameter being passed is never used. Furthermore this function no longer
returns the leaf data but rather the offset to it. So rename the function
to BTRFS_LEAF_DATA_OFFSET to make it consistent with other BTRFS_LEAF_*
helpers and turn it into a macro.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
[ removed () from the macro ]
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: reduce arguments for decompress_bio ops
Anand Jain [Fri, 26 May 2017 07:44:59 +0000 (15:44 +0800)]
btrfs: reduce arguments for decompress_bio ops

struct compressed_bio pointer can be used instead.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: btrfs_decompress_bio() could accept compressed_bio instead
Anand Jain [Fri, 26 May 2017 07:44:58 +0000 (15:44 +0800)]
btrfs: btrfs_decompress_bio() could accept compressed_bio instead

Instead of sending each argument of struct compressed_bio, send
the compressed_bio itself.

Also by having struct compressed_bio in btrfs_decompress_bio()
it would help tracing.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: Refactor update_space_info
Nikolay Borisov [Mon, 22 May 2017 06:35:50 +0000 (09:35 +0300)]
btrfs: Refactor update_space_info

Following the factoring out of the creation code udpate_space_info can
only be called for already-existing space_info structs. As such it
cannot fail.  Remove superfluous error handling and make the function
return void.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: Separate space_info create/update
Nikolay Borisov [Mon, 22 May 2017 06:35:49 +0000 (09:35 +0300)]
btrfs: Separate space_info create/update

Currently the struct space_info creation code is intermixed in the
udpate_space_info function. There are well-defined points at which the
we actually want to create brand-new space_info structs (e.g. during
mount of the filesystem as well as sometimes when adding/initialising
new chunks). In such cases update_space_info is called with 0 as the
bytes parameter. All of this makes for spaghetti code.

Fix it by factoring out the creation code in a separate
create_space_info structure. This also allows to simplify the internals.
Also remove BUG_ON from do_alloc_chunk since the callers handle errors.
Furthermore it will make the update_space_info function not fail,
allowing us to remove error handling in callers. This will come in a
follow up patch.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: let btrfs_print_leaf print more about block group
Liu Bo [Fri, 26 May 2017 00:08:12 +0000 (18:08 -0600)]
Btrfs: let btrfs_print_leaf print more about block group

This adds chunk_objectid and flags, with flags we can recognize whether
the block group is about data or metadata.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: skip commit transaction if we don't have enough pinned bytes
Liu Bo [Fri, 19 May 2017 17:39:15 +0000 (11:39 -0600)]
Btrfs: skip commit transaction if we don't have enough pinned bytes

We commit transaction in order to reclaim space from pinned bytes because
it could process delayed refs, and in may_commit_transaction(), we check
first if pinned bytes are enough for the required space, we then check if
that plus bytes reserved for delayed insert are enough for the required
space.

This changes the code to the above logic.

Fixes: b150a4f10d87 ("Btrfs: use a percpu to keep track of possibly pinned bytes")
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reported-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: scrub: simplify cleanup of wr_ctx in scrub_free_ctx
David Sterba [Tue, 16 May 2017 17:10:29 +0000 (19:10 +0200)]
btrfs: scrub: simplify cleanup of wr_ctx in scrub_free_ctx

We don't need to take the mutex and zero out wr_cur_bio, as this is
called after the scrub finished.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: scrub: inline helper scrub_free_wr_ctx
David Sterba [Tue, 16 May 2017 17:10:26 +0000 (19:10 +0200)]
btrfs: scrub: inline helper scrub_free_wr_ctx

The helper scrub_free_wr_ctx is used only once and fits into
scrub_free_ctx as it continues sctx shutdown, no need to keep it
separate.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: scrub: inline helper scrub_setup_wr_ctx
David Sterba [Tue, 16 May 2017 17:10:23 +0000 (19:10 +0200)]
btrfs: scrub: inline helper scrub_setup_wr_ctx

The helper scrub_setup_wr_ctx is used only once and fits into
scrub_setup_ctx as it continues intialization, no need to keep it
separate.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: remove root usage from can_overcommit
Jeff Mahoney [Wed, 17 May 2017 15:38:36 +0000 (11:38 -0400)]
btrfs: remove root usage from can_overcommit

can_overcommit using the root to determine the allocation profile
is the only use of a root in the call graph below reserve_metadata_bytes.

It turns out that we only need to know whether the allocation is for
the chunk root or not -- and we can pass that around as a bool instead.

This allows us to pull root usage out of the reservation path all the
way up to reserve_metadata_bytes itself, which uses it only to compare
against fs_info->chunk_root to set the bool.  In turn, this eliminates
a bunch of races where we use a particular root too early in the mount
process.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: cleanup root usage by btrfs_get_alloc_profile
Jeff Mahoney [Wed, 17 May 2017 15:38:35 +0000 (11:38 -0400)]
btrfs: cleanup root usage by btrfs_get_alloc_profile

There are two places where we don't already know what kind of alloc
profile we need before calling btrfs_get_alloc_profile, but we need
access to a root everywhere we call it.

This patch adds helpers for btrfs_{data,metadata,system}_alloc_profile()
and relegates btrfs_system_alloc_profile to a static for use in those
two cases.  The next patch will eliminate one of those.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: fix bool type in btrfs_page_exists_in_range
David Sterba [Thu, 11 May 2017 23:02:22 +0000 (01:02 +0200)]
btrfs: fix bool type in btrfs_page_exists_in_range

We use only a simple bool indicator, int is not a problem here.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: remove unused member list from btrfs_end_io_wq
David Sterba [Thu, 13 Apr 2017 17:11:04 +0000 (19:11 +0200)]
btrfs: remove unused member list from btrfs_end_io_wq

The end io work queue items have been tracked by the work queues since
"Btrfs: Add async worker threads for pre and post IO checksumming"
(8b7128429235d9bd72cfd5e) (2008).

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: remove unused members dir_path from recorded_ref
David Sterba [Thu, 13 Apr 2017 17:11:04 +0000 (19:11 +0200)]
btrfs: remove unused members dir_path from recorded_ref

The two members do not seem to be used since the initial commit.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: remove unused member list from async_submit_bio
David Sterba [Thu, 13 Apr 2017 17:11:04 +0000 (19:11 +0200)]
btrfs: remove unused member list from async_submit_bio

The list used to track checksums in the early version (2.6.29), but I
was able not pinpoint the commit that stopped using it. Everything
apparently works without it for a long time.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: remove unused member err from reada_extent
David Sterba [Thu, 13 Apr 2017 17:11:04 +0000 (19:11 +0200)]
btrfs: remove unused member err from reada_extent

Seems to be unused since the initial commit, we ignore readahead errors
anyway, the full read will handle that if necessary.

Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: Remove unnecessary branching in free-space-tree.c
Sahil Kang [Wed, 17 May 2017 10:33:45 +0000 (03:33 -0700)]
btrfs: Remove unnecessary branching in free-space-tree.c

Both btrfs_create_free_space_tree and btrfs_clear_free_space_tree
contain:

  if (ret)
          return ret;

  return 0;

The if statement is only false when ret equals zero, and since we return
zero in such cases, we can safely remove the branching.

Signed-off-by: Sahil Kang <sahil.kang@asilaycomputing.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: hardcode GFP_NOFS for btrfs_bio_clone_partial
Liu Bo [Tue, 16 May 2017 17:57:14 +0000 (10:57 -0700)]
Btrfs: hardcode GFP_NOFS for btrfs_bio_clone_partial

We only pass GFP_NOFS to btrfs_bio_clone_partial, so lets hardcode it.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: work around maybe-uninitialized warning
Arnd Bergmann [Thu, 18 May 2017 13:33:29 +0000 (15:33 +0200)]
Btrfs: work around maybe-uninitialized warning

A rewrite of btrfs_submit_direct_hook appears to have introduced a warning:

fs/btrfs/inode.c: In function 'btrfs_submit_direct_hook':
fs/btrfs/inode.c:8467:14: error: 'bio' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Where the 'bio' variable was previously initialized unconditionally, it
is now set in the "while (submit_len > 0)" loop that would never execute
if submit_len is zero.

Assuming this cannot happen in practice, we can avoid the warning
by simply replacing the while{} loop with a do{}while() loop so
the compiler knows that it will always be entered at least once.

Fixes changes introduced in "Btrfs: use bio_clone_bioset_partial to
simplify DIO submit".

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: unify naming of btrfs_io_bio
Liu Bo [Mon, 17 Apr 2017 22:00:28 +0000 (15:00 -0700)]
Btrfs: unify naming of btrfs_io_bio

All dio endio functions are using io_bio for struct btrfs_io_bio, this
makes btrfs_submit_direct to follow this convention.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: check-integrity use bvec_iter
Liu Bo [Fri, 14 Apr 2017 23:11:52 +0000 (16:11 -0700)]
Btrfs: check-integrity use bvec_iter

Some check-integrity code depends on bio->bi_vcnt, this changes it to use
bio segments because some bios passing here may not have a reliable
bi_vcnt.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: record error if one block has failed to retry
Liu Bo [Tue, 16 May 2017 00:20:07 +0000 (17:20 -0700)]
Btrfs: record error if one block has failed to retry

In the nocsum case of dio read endio, it returns immediately if an error
gets returned when repairing, which leaves the rest blocks unrepaired.  The
behavior is different from how buffered read endio works in the same case.
This changes it to record error only and go on repairing the rest blocks.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: change how we iterate bios in endio
Liu Bo [Mon, 15 May 2017 22:33:27 +0000 (15:33 -0700)]
Btrfs: change how we iterate bios in endio

Since dio submit has used bio_clone_fast, the submitted bio may not have a
reliable bi_vcnt, for the bio vector iterations in checksum related
functions, bio->bi_iter is not modified yet and it's safe to use
bio_for_each_segment, while for those bio vector iterations in dio read's
endio, we now save a copy of bvec_iter in struct btrfs_io_bio when cloning
bios and use the helper __bio_for_each_segment with the saved bvec_iter to
access each bvec.

Also for dio reads which don't get split, we also need to save a copy of
bio iterator in btrfs_bio_clone to let __bio_for_each_segments to access
each bvec in dio read's endio.  Note that it doesn't affect other calls of
btrfs_bio_clone() because they don't need to use this iterator.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: use bio_clone_bioset_partial to simplify DIO submit
Liu Bo [Tue, 16 May 2017 16:51:39 +0000 (09:51 -0700)]
Btrfs: use bio_clone_bioset_partial to simplify DIO submit

Currently when mapping bio to limit bio to a single stripe length, we
split bio by adding page to bio one by one, but later we don't modify
the vector of bio at all, thus we can use bio_clone_fast to use the
original bio vector directly.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: new helper btrfs_bio_clone_partial
Liu Bo [Tue, 16 May 2017 00:43:31 +0000 (17:43 -0700)]
Btrfs: new helper btrfs_bio_clone_partial

This adds a new helper btrfs_bio_clone_partial, it'll allocate a cloned
bio that only owns a part of the original bio's data.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: use bio_clone_fast to clone our bio
Liu Bo [Tue, 4 Apr 2017 19:23:25 +0000 (12:23 -0700)]
Btrfs: use bio_clone_fast to clone our bio

For raid1 and raid10, we clone the original bio to the bios which are then
sent to different disks.

Right now we use bio_clone_bioset to create a clone bio with iterating
bi_io_vec to initialize it.  This changes it to use bio_clone_fast()
which creates a clone bio but only copies the bi_io_vec pointer
instead of iterating bi_io_vec.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: don't pass the inode through clean_io_failure
Josef Bacik [Fri, 5 May 2017 15:57:15 +0000 (11:57 -0400)]
Btrfs: don't pass the inode through clean_io_failure

Instead pass around the failure tree and the io tree.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: remove inode argument from repair_io_failure
Josef Bacik [Fri, 5 May 2017 15:57:14 +0000 (11:57 -0400)]
btrfs: remove inode argument from repair_io_failure

Once we remove the btree_inode we won't have an inode to pass anymore,
just pass the fs_info directly and the inum since we use that to print
out the repair message.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: replace tree->mapping with tree->private_data
Josef Bacik [Fri, 5 May 2017 15:57:13 +0000 (11:57 -0400)]
Btrfs: replace tree->mapping with tree->private_data

For extent_io tree's we have carried the address_mapping of the inode
around in the io tree in order to pull the inode back out for calling
into various tree ops hooks.  This works fine when everything that has
an extent_io_tree has an inode.  But we are going to remove the
btree_inode, so we need to change this.  Instead just have a generic
void * for private data that we can initialize with, and have all the
tree ops use that instead.  This had a lot of cascading changes but
should be relatively straightforward.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor reordering of the callback prototypes ]
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: Add quota_override knob into sysfs
Sargun Dhillon [Thu, 11 May 2017 21:18:03 +0000 (21:18 +0000)]
btrfs: Add quota_override knob into sysfs

This patch adds the read-write attribute quota_override into sysfs.
Any process which has CAP_SYS_RESOURCE can set this flag to on, and
once it is set to true, processes with CAP_SYS_RESOURCE can exceed
the quota.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor changelog edits ]
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE
Sargun Dhillon [Thu, 11 May 2017 21:17:33 +0000 (21:17 +0000)]
btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE

This patch introduces the quota override flag to btrfs_fs_info, and a
change to quota limit checking code to temporarily allow for quota to be
overridden for processes with CAP_SYS_RESOURCE.

It's useful for administrative programs, such as log rotation, that may
need to temporarily use more disk space in order to free up a greater
amount of overall disk space without yielding more disk space to the
rest of userland.

Eventually, we may want to add the idea of an operator-specific quota,
operator reserved space, or something else to allow for administrative
override, but this is perhaps the simplest solution.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor changelog edits ]
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: Convert fs_info->free_chunk_space to atomic64_t
Nikolay Borisov [Thu, 11 May 2017 06:17:46 +0000 (09:17 +0300)]
btrfs: Convert fs_info->free_chunk_space to atomic64_t

The ->free_chunk_space variable is used to track the unallocated space
and access to it is protected by a spinlock, which is not used for
anything else.  Make the code a bit self-explanatory by switching the
variable to an atomic64_t type and kill the spinlock.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
[ not a performance critical code, use of atomic type is ok ]
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: add framework to handle device flush error as a volume
Anand Jain [Fri, 5 May 2017 23:17:54 +0000 (07:17 +0800)]
btrfs: add framework to handle device flush error as a volume

This adds comments to the flush error handling part of the code, and
hopes to maintain the same logic with a framework which can be used to
handle the errors at the volume level.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: remove obsolete FIXMEs in qgroup ioctls
Daichou [Mon, 8 May 2017 02:10:02 +0000 (10:10 +0800)]
Btrfs: remove obsolete FIXMEs in qgroup ioctls

These FIXMEs were already addressed in 2013. All functions check for
qgroup existence:

* btrfs_add_qgroup_relation
* btrfs_ioctl_qgroup_create
* btrfs_limit_qgroup
* btrfs_del_qgroup_relation

Signed-off-by: Daichou <tommy0705c@gmail.com>
[ enhance and reformat changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: cleanup unused qgroup trace event
Anand Jain [Fri, 5 May 2017 02:09:36 +0000 (10:09 +0800)]
btrfs: cleanup unused qgroup trace event

Commit 81fb6f77a026 (btrfs: qgroup: Add new trace point for
qgroup data reserve) added the following events which aren't used.
  btrfs__qgroup_data_map
  btrfs_qgroup_init_data_rsv_map
  btrfs_qgroup_free_data_rsv_map
So remove them.

CC: quwenruo@cn.fujitsu.com
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoBtrfs: remove an unused variable
Dan Carpenter [Tue, 2 May 2017 09:25:27 +0000 (12:25 +0300)]
Btrfs: remove an unused variable

"item" is never used.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agobtrfs: kmap() can't fail
Fabian Frederick [Tue, 25 Apr 2017 18:11:02 +0000 (20:11 +0200)]
btrfs: kmap() can't fail

Remove NULL test on kmap() as it will always return a valid pointer.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoLinux 4.12-rc6
Linus Torvalds [Mon, 19 Jun 2017 14:19:37 +0000 (22:19 +0800)]
Linux 4.12-rc6

7 years agomm: larger stack guard gap, between vmas
Hugh Dickins [Mon, 19 Jun 2017 11:03:24 +0000 (04:03 -0700)]
mm: larger stack guard gap, between vmas

Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications.  For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Mon, 19 Jun 2017 08:50:09 +0000 (16:50 +0800)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "Stream of fixes has slowed down, only a few this week:

   - Some DT fixes for Allwinner platforms, and addition of a clock to
     the R_CCU clock controller that had been missed.

   - A couple of small DT fixes for am335x-sl50"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
  ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
  ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
  ARM: dts: am335x-sl50: Fix card detect pin for mmc1
  arm64: allwinner: h5: Remove syslink to shared DTSI
  ARM: sunxi: h3/h5: fix the compatible of R_CCU

7 years agoMerge tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git...
Olof Johansson [Mon, 19 Jun 2017 03:42:21 +0000 (20:42 -0700)]
Merge tag 'sunxi-fixes-for-4.12' of https://git./linux/kernel/git/sunxi/linux into fixes

Allwinner fixes for 4.12

A few fixes around the PRCM support that got in 4.12 with a wrong
compatible, and a missing clock in the binding.

* tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
  ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
  arm64: allwinner: h5: Remove syslink to shared DTSI
  ARM: sunxi: h3/h5: fix the compatible of R_CCU

Signed-off-by: Olof Johansson <olof@lixom.net>
7 years agoMerge tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm/linux/kernel...
Olof Johansson [Mon, 19 Jun 2017 01:55:12 +0000 (18:55 -0700)]
Merge tag 'omap-for-v4.12/fixes-sl50' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

Two fixes for am335x-sl50 to fix a boot time error
for claiming SPI pins, and to fix a SDIO card detect
pin for production version of the device.

* tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
  ARM: dts: am335x-sl50: Fix card detect pin for mmc1

Signed-off-by: Olof Johansson <olof@lixom.net>
7 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Mon, 19 Jun 2017 00:25:05 +0000 (09:25 +0900)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio bugfix from Michael Tsirkin:
 "It turns out balloon does not handle IOMMUs correctly. We should fix
  that at some point, for now let's just disable this configuration"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_balloon: disable VIOMMU support

7 years agoMerge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 19 Jun 2017 00:20:25 +0000 (09:20 +0900)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Two driver bugfixes"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: ismt: fix wrong device address when unmap the data buffer
  i2c: rcar: use correct length when unmapping DMA

7 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Mon, 19 Jun 2017 00:01:01 +0000 (09:01 +0900)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:

 - Three highmem fixes:
    + Fixed mapping initialization
    + Adjust the pkmap location
    + Ensure we use at most one page for PTEs

 - Fix makefile dependencies for .its targets to depend on vmlinux

 - Fix reversed condition in BNEZC and JIALC software branch emulation

 - Only flush initialized flush_insn_slot to avoid NULL pointer
   dereference

 - perf: Remove incorrect odd/even counter handling for I6400

 - ftrace: Fix init functions tracing

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: .its targets depend on vmlinux
  MIPS: Fix bnezc/jialc return address calculation
  MIPS: kprobes: flush_insn_slot should flush only if probe initialised
  MIPS: ftrace: fix init functions tracing
  MIPS: mm: adjust PKMAP location
  MIPS: highmem: ensure that we don't use more than one page for PTEs
  MIPS: mm: fixed mappings: correct initialisation
  MIPS: perf: Remove incorrect odd/even counter handling for I6400

7 years agovirtio_balloon: disable VIOMMU support
Michael S. Tsirkin [Tue, 13 Jun 2017 17:56:44 +0000 (20:56 +0300)]
virtio_balloon: disable VIOMMU support

virtio balloon bypasses the DMA API entirely so does not support the
VIOMMU right now.  It's not clear we need that support, for now let's
just make sure we don't pretend to support it.

Cc: stable@vger.kernel.org
Cc: Wei Wang <wei.w.wang@intel.com>
Fixes: 1a937693993f ("virtio: new feature to detect IOMMU device quirk")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
7 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 18 Jun 2017 09:49:12 +0000 (18:49 +0900)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Two fixlets for x86:

   - Handle WARN_ONs proper with the new UD based WARN implementation

   - Disable 1G mappings when 2M mappings are disabled by kmemleak or
     debug_pagealloc. Otherwise 1G mappings might still be used,
     confusing the debug mechanisms"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Disable 1GB direct mappings when disabling 2MB mappings
  x86/debug: Handle early WARN_ONs proper

7 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 18 Jun 2017 09:46:51 +0000 (18:46 +0900)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Three fixlets for timers:

   - Two hot-fixes for the alarmtimer based posix timers, which prevent
     a nasty DOS by self rescheduling timers. The proper cleanup of that
     mess is queued for 4.13

   - Make a function static"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/broadcast: Make tick_broadcast_setup_oneshot() static
  alarmtimer: Rate limit periodic intervals
  alarmtimer: Prevent overflow of relative timers

7 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 18 Jun 2017 09:45:17 +0000 (18:45 +0900)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
 "Two small fixes for the schedulre core:

   - Use the proper switch_mm() variant in idle_task_exit() because that
     code is not called with interrupts disabled.

   - Fix a confusing typo in a printk"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()
  sched/fair: Fix typo in printk message

7 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 18 Jun 2017 09:42:31 +0000 (18:42 +0900)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "Three fixes for the perf user space side:

   - Fix the probing of precise_ip level, which got broken recently for
     x86.

   - Unbreak the ARCH=x86_64 build

   - Report module before trying to unwind into the module code, which
     avoids broken stack frames displayed"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf unwind: Report module before querying isactivation in dwfl unwind
  perf tools: Fix build with ARCH=x86_64
  perf evsel: Fix probing of precise_ip level for default cycles event

7 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 18 Jun 2017 09:40:41 +0000 (18:40 +0900)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "Add a missing resource release to an error path"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Release resources in __setup_irq() error path

7 years agoMerge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 18 Jun 2017 09:38:42 +0000 (18:38 +0900)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull objtool fix from Thomas Gleixner:
 "A single fix which adds fortify_panic to the list of no return
  functions"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Add fortify_panic as __noreturn function

7 years agoMerge tag 'led_fixes_for_4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 17 Jun 2017 23:51:35 +0000 (08:51 +0900)]
Merge tag 'led_fixes_for_4.12-rc6' of git://git./linux/kernel/git/j.anaszewski/linux-leds

Pull LED fixes from Jacek Anaszewski:
 "Two LED fixes:

   - fix signal source assignment for leds-bcm6328

   - revert patch that intended to fix LED behavior on suspend but it
     had a side effect preventing suspend at all due to uevent being
     sent on trigger removal"

* tag 'led_fixes_for_4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  Revert "leds: handle suspend/resume in heartbeat trigger"
  leds: bcm6328: fix signal source assignment for leds 4 to 7

7 years agoMerge tag 'usb-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 17 Jun 2017 23:39:54 +0000 (08:39 +0900)]
Merge tag 'usb-4.12-rc6' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small gadget and xhci USB fixes for 4.12-rc6.

  Nothing major, but one of the gadget patches does fix a reported oops,
  and the xhci ones resolve reported problems. All have been in
  linux-next with no reported issues"

* tag 'usb-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
  usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
  usb: xhci: Fix USB 3.1 supported protocol parsing
  USB: gadget: fix GPF in gadgetfs
  usb: gadget: composite: make sure to reactivate function on unbind

7 years agoMerge tag 'staging-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 17 Jun 2017 23:36:30 +0000 (08:36 +0900)]
Merge tag 'staging-4.12-rc6' of git://git./linux/kernel/git/gregkh/staging

Pull staging and IIO fixes from Greg KH:
 "Here are some small staging and IIO driver fixes for 4.12-rc6.

  Nothing huge, just a few small driver fixes for reported issues. All
  have been in linux-next with no reported issues"

* tag 'staging-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  Staging: rtl8723bs: fix an error code in isFileReadable()
  iio: buffer-dmaengine: Add missing header buffer_impl.h
  iio: buffer-dma: Add missing header buffer_impl.h
  iio: adc: meson-saradc: fix potential crash in meson_sar_adc_clear_fifo
  iio: adc: mxs-lradc: Fix return value check in mxs_lradc_adc_probe()
  iio: imu: inv_mpu6050: add accel lpf setting for chip >= MPU6500
  staging: iio: ad7152: Fix deadlock in ad7152_write_raw_samp_freq()

7 years agoMerge tag 'ceph-for-4.12-rc6' of git://github.com/ceph/ceph-client
Linus Torvalds [Sat, 17 Jun 2017 23:23:02 +0000 (08:23 +0900)]
Merge tag 'ceph-for-4.12-rc6' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A fix for an old ceph ->fh_to_* bug from Luis and two timestamp fixups
  from Zheng, prompted by the ongoing y2038 work"

* tag 'ceph-for-4.12-rc6' of git://github.com/ceph/ceph-client:
  ceph: unify inode i_ctime update
  ceph: use current_kernel_time() to get request time stamp
  ceph: check i_nlink while converting a file handle to dentry

7 years agoMerge tag 'xfs-4.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 17 Jun 2017 08:34:41 +0000 (17:34 +0900)]
Merge tag 'xfs-4.12-fixes-4' of git://git./fs/xfs/xfs-linux

Pull xfs fix from Darrick Wong:
 "One more bugfix for you for 4.12-rc6 to fix something that came up in
  an earlier rc:

   - Fix some bogus ASSERT failures on CONFIG_SMP=n and CONFIG_XFS_DEBUG=y"

* tag 'xfs-4.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix spurious spin_is_locked() assert failures on non-smp kernels

7 years agoMerge branch 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 17 Jun 2017 08:30:07 +0000 (17:30 +0900)]
Merge branch 'ufs-fixes' of git://git./linux/kernel/git/viro/vfs

Pull ufs fixes from Al Viro:
 "Fix assorted ufs bugs: a couple of deadlocks, fs corruption in
  truncate(), oopsen on tail unpacking and truncate when racing with
  vmscan, mild fs corruption (free blocks stats summary buggered, *BSD
  fsck would complain and fix), several instances of broken logics
  around reserved blocks (starting with "check almost never triggers
  when it should" and then there are issues with sufficiently large
  UFS2)"

[ Note: ufs hasn't gotten any loving in a long time, because nobody
  really seems to use it. These ufs fixes are triggered by people
  actually caring now, not some sudden influx of new bugs.  - Linus ]

* 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ufs_truncate_blocks(): fix the case when size is in the last direct block
  ufs: more deadlock prevention on tail unpacking
  ufs: avoid grabbing ->truncate_mutex if possible
  ufs_get_locked_page(): make sure we have buffer_heads
  ufs: fix s_size/s_dsize users
  ufs: fix reserved blocks check
  ufs: make ufs_freespace() return signed
  ufs: fix logics in "ufs: make fsck -f happy"

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 17 Jun 2017 08:26:53 +0000 (17:26 +0900)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "A couple of fixes; a leak in mntns_install() caught by Andrei (this
  cycle regression) + d_invalidate() softlockup fix - that had been
  reported by a bunch of people lately, but the problem is pretty old"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: don't forget to put old mntns in mntns_install
  Hang/soft lockup in d_invalidate with simultaneous calls

7 years agoMerge tag 'pci-v4.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Fri, 16 Jun 2017 21:53:20 +0000 (06:53 +0900)]
Merge tag 'pci-v4.12-fixes-2' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - fix another PCI_ENDPOINT build error (merged for v4.12)

 - fix error codes added to config accessors for v4.12

* tag 'pci-v4.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: endpoint: Select CRC32 to fix test build error
  PCI: Make error code types consistent in pci_{read,write}_config_*

7 years agoMerge tag 'fbdev-v4.12-rc6' of git://github.com/bzolnier/linux
Linus Torvalds [Fri, 16 Jun 2017 21:51:25 +0000 (06:51 +0900)]
Merge tag 'fbdev-v4.12-rc6' of git://github.com/bzolnier/linux

Pull fbdev fixes from Bartlomiej Zolnierkiewicz:

 - fix udlfb driver to stop spamming logs (Mike Gerow)

 - add missing endianness conversions in smscufx & udlfb drivers (Johan
   Hovold)

 - fix few gcc warnings/errors (Arnd Bergmann)

* tag 'fbdev-v4.12-rc6' of git://github.com/bzolnier/linux:
  video: fbdev: udlfb: drop log level for blanking
  video: fbdev: via: remove possibly unused variables
  video: fbdev: add missing USB-descriptor endianness conversions
  video: fbdev: avoid int-in-bool-context warning

7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Fri, 16 Jun 2017 21:49:34 +0000 (06:49 +0900)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "5 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: correct the comment when reclaimed pages exceed the scanned pages
  userfaultfd: shmem: handle coredumping in handle_userfault()
  mm: numa: avoid waiting on freed migrated pages
  swap: cond_resched in swap_cgroup_prepare()
  mm/memory-failure.c: use compound_head() flags for huge pages

7 years agomm: correct the comment when reclaimed pages exceed the scanned pages
zhongjiang [Fri, 16 Jun 2017 21:02:40 +0000 (14:02 -0700)]
mm: correct the comment when reclaimed pages exceed the scanned pages

Commit e1587a494540 ("mm: vmpressure: fix sending wrong events on
underflow") declared that reclaimed pages exceed the scanned pages due
to the thp reclaim.

That is incorrect because THP will be spilt to normal page and loop
again, which will result in the scanned pages increment.

[akpm@linux-foundation.org: tweak comment text]
Link: http://lkml.kernel.org/r/1496824266-25235-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhongjiang <zhongjiang@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agouserfaultfd: shmem: handle coredumping in handle_userfault()
Andrea Arcangeli [Fri, 16 Jun 2017 21:02:37 +0000 (14:02 -0700)]
userfaultfd: shmem: handle coredumping in handle_userfault()

Anon and hugetlbfs handle FOLL_DUMP set by get_dump_page() internally to
__get_user_pages().

shmem as opposed has no special FOLL_DUMP handling there so
handle_mm_fault() is invoked without mmap_sem and ends up calling
handle_userfault() that isn't expecting to be invoked without mmap_sem
held.

This makes handle_userfault() fail immediately if invoked through
shmem_vm_ops->fault during coredumping and solves the problem.

The side effect is a BUG_ON with no lock held triggered by the
coredumping process which exits.  Only 4.11 is affected, pre-4.11 anon
memory holes are skipped in __get_user_pages by checking FOLL_DUMP
explicitly against empty pagetables (mm/gup.c:no_page_table()).

It's zero cost as we already had a check for current->flags to prevent
futex to trigger userfaults during exit (PF_EXITING).

Link: http://lkml.kernel.org/r/20170615214838.27429-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: <stable@vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: numa: avoid waiting on freed migrated pages
Mark Rutland [Fri, 16 Jun 2017 21:02:34 +0000 (14:02 -0700)]
mm: numa: avoid waiting on freed migrated pages

In do_huge_pmd_numa_page(), we attempt to handle a migrating thp pmd by
waiting until the pmd is unlocked before we return and retry.  However,
we can race with migrate_misplaced_transhuge_page():

    // do_huge_pmd_numa_page                // migrate_misplaced_transhuge_page()
    // Holds 0 refs on page                 // Holds 2 refs on page

    vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
    /* ... */
    if (pmd_trans_migrating(*vmf->pmd)) {
            page = pmd_page(*vmf->pmd);
            spin_unlock(vmf->ptl);
                                            ptl = pmd_lock(mm, pmd);
                                            if (page_count(page) != 2)) {
                                                    /* roll back */
                                            }
                                            /* ... */
                                            mlock_migrate_page(new_page, page);
                                            /* ... */
                                            spin_unlock(ptl);
                                            put_page(page);
                                            put_page(page); // page freed here
            wait_on_page_locked(page);
            goto out;
    }

This can result in the freed page having its waiters flag set
unexpectedly, which trips the PAGE_FLAGS_CHECK_AT_PREP checks in the
page alloc/free functions.  This has been observed on arm64 KVM guests.

We can avoid this by having do_huge_pmd_numa_page() take a reference on
the page before dropping the pmd lock, mirroring what we do in
__migration_entry_wait().

When we hit the race, migrate_misplaced_transhuge_page() will see the
reference and abort the migration, as it may do today in other cases.

Fixes: b8916634b77bffb2 ("mm: Prevent parallel splits during THP migration")
Link: http://lkml.kernel.org/r/1497349722-6731-2-git-send-email-will.deacon@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoswap: cond_resched in swap_cgroup_prepare()
Yu Zhao [Fri, 16 Jun 2017 21:02:31 +0000 (14:02 -0700)]
swap: cond_resched in swap_cgroup_prepare()

I saw need_resched() warnings when swapping on large swapfile (TBs)
because continuously allocating many pages in swap_cgroup_prepare() took
too long.

We already cond_resched when freeing page in swap_cgroup_swapoff().  Do
the same for the page allocation.

Link: http://lkml.kernel.org/r/20170604200109.17606-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm/memory-failure.c: use compound_head() flags for huge pages
James Morse [Fri, 16 Jun 2017 21:02:29 +0000 (14:02 -0700)]
mm/memory-failure.c: use compound_head() flags for huge pages

memory_failure() chooses a recovery action function based on the page
flags.  For huge pages it uses the tail page flags which don't have
anything interesting set, resulting in:

> Memory failure: 0x9be3b4: Unknown page state
> Memory failure: 0x9be3b4: recovery action for unknown page: Failed

Instead, save a copy of the head page's flags if this is a huge page,
this means if there are no relevant flags for this tail page, we use the
head pages flags instead.  This results in the me_huge_page() recovery
action being called:

> Memory failure: 0x9b7969: recovery action for huge page: Delayed

For hugepages that have not yet been allocated, this allows the hugepage
to be dequeued.

Fixes: 524fca1e7356 ("HWPOISON: fix misjudgement of page_action() for errors on mlocked pages")
Link: http://lkml.kernel.org/r/20170524130204.21845-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Fri, 16 Jun 2017 20:57:54 +0000 (05:57 +0900)]
Merge tag 'powerpc-4.12-6' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Three small fixes for recently merged code:

   - remove a spurious WARN_ON when a PCI device has no of_node, it's
     allowed in some circumstances for there to be no of_node.

   - fix the offset for store EOI MMIOs in the XIVE interrupt
     controller.

   - fix non-const WARN_ONs which were becoming BUGs due to them losing
     BUGFLAG_WARNING in a recent cleanup patch.

  Thanks to: Alexey Kardashevskiy, Alistair Popple, Benjamin
  Herrenschmidt"

* tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path
  powerpc/xive: Fix offset for store EOI MMIOs
  powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node

7 years agoMerge tag 'perf-urgent-for-mingo-4.12-20170616' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Fri, 16 Jun 2017 19:33:48 +0000 (21:33 +0200)]
Merge tag 'perf-urgent-for-mingo-4.12-20170616' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Fix probing of precise_ip level for default cycles event, that
  got broken recently on x86_64 when its arch code started
  considering invalid requesting precise samples when not sampling
  (i.e. when attr.sample_period == 0).

  This also fixes another problem in s/390 where the precision
  probing with sample_period == 0 returned precise_ip > 0, that
  then, when setting up the real cycles event (not probing) would
  return EOPNOTSUPP for precise_ip > 0 (as determined previously
  by probing) and sample_period > 0.

  These problems resulted in attr_precise not being set to the
  highest precision available on x86.64 when no event was specified,
  i.e. the canonical:

perf record ./workload

  would end up using attr.precise_ip = 0. As a workaround this would
  need to be done:

perf record -e cycles:P ./workload

  And on s/390 it would plain not work, requiring using:

        perf record -e cycles ./workload

  as a workaround.  (Arnaldo Carvalho de Melo)

- Fix perf build with ARCH=x86_64, when ARCH should be transformed
  into ARCH=x86, just like with the main kernel Makefile and
  tools/objtool's, i.e. use SRCARCH. (Jiada Wang)

- Avoid accessing uninitialized data structures when unwinding with
  elfutils's libdw, making it more closely mimic libunwind's unwinder.
  (Milian Wolff)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoperf unwind: Report module before querying isactivation in dwfl unwind
Milian Wolff [Fri, 2 Jun 2017 14:37:53 +0000 (16:37 +0200)]
perf unwind: Report module before querying isactivation in dwfl unwind

The PC returned by dwfl_frame_pc() may map into a not-yet-reported
module. We have to report it before we continue unwinding. But when we
query for the isactivation flag in dwfl_frame_pc, libdw will actually do
one more unwinding step internally which can then break and lead to
missed frames or broken stacks.

With libunwind we get e.g.:

~~~~~
  heaptrack_gui  2228 135073.400474:     613969 cycles:
          108c8e [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          1093bc [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          109e7b QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
          1470ff [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          147f67 QSystemLocale::query (/usr/lib/libQt5Core.so.5.8.0)
          109fbf QLocalePrivate::updateSystemPrivate (/usr/lib/libQt5Core.so.5.8.0)
          10aa27 QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
          1e02c3 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          2113bb [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          211505 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          1b5df0 QFileInfo::exists (/usr/lib/libQt5Core.so.5.8.0)
           92eb2 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
           93423 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
           93d2a QLibraryInfo::location (/usr/lib/libQt5Core.so.5.8.0)
          2170af [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          297c53 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
          1589e8 QApplicationPrivate::init (/usr/lib/libQt5Widgets.so.5.8.0)
           78622 main (/home/milian/projects/compiled/other/bin/heaptrack_gui)
           20439 __libc_start_main (/usr/lib/libc-2.25.so)
           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)

  heaptrack_gui  2228 135073.401156:     569521 cycles:
          131633 QString::endsWith (/usr/lib/libQt5Core.so.5.8.0)
          1a0701 QDir::cleanPath (/usr/lib/libQt5Core.so.5.8.0)
          21b82d [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          1b3727 QFileInfo::canonicalFilePath (/usr/lib/libQt5Core.so.5.8.0)
          2780c7 QFactoryLoader::update (/usr/lib/libQt5Core.so.5.8.0)
          279525 QFactoryLoader::QFactoryLoader (/usr/lib/libQt5Core.so.5.8.0)
           e5bd0 QPlatformIntegrationFactory::create (/usr/lib/libQt5Gui.so.5.8.0)
           f5a1c QGuiApplicationPrivate::createPlatformIntegration (/usr/lib/libQt5Gui.so.5.8.0)
           f650c QGuiApplicationPrivate::createEventDispatcher (/usr/lib/libQt5Gui.so.5.8.0)
          298524 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
          1589e8 QApplicationPrivate::init (/usr/lib/libQt5Widgets.so.5.8.0)
           78622 main (/home/milian/projects/compiled/other/bin/heaptrack_gui)
           20439 __libc_start_main (/usr/lib/libc-2.25.so)
           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
~~~~~

Note the two frames 1589e8 and 78622 in the first sample. These are
missing when unwinding with libdw. The second sample's breakage is
more obvious:

~~~~~
  heaptrack_gui  2228 135073.400474:     613969 cycles:
          108c8e [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          1093bc [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          109e7b QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
          1470ff [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          147f67 QSystemLocale::query (/usr/lib/libQt5Core.so.5.8.0)
          109fbf QLocalePrivate::updateSystemPrivate (/usr/lib/libQt5Core.so.5.8.0)
          10aa27 QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
          1e02c3 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          2113bb [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          211505 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          1b5df0 QFileInfo::exists (/usr/lib/libQt5Core.so.5.8.0)
           92eb2 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
           93423 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
           93d2a QLibraryInfo::location (/usr/lib/libQt5Core.so.5.8.0)
          2170af [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          297c53 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
           20439 __libc_start_main (/usr/lib/libc-2.25.so)
           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)

heaptrack_gui  2228 135073.401156:     569521 cycles:
          131633 QString::endsWith (/usr/lib/libQt5Core.so.5.8.0)
          1a0701 QDir::cleanPath (/usr/lib/libQt5Core.so.5.8.0)
          21b82d [unknown] (/usr/lib/libQt5Core.so.5.8.0)
          1b3727 QFileInfo::canonicalFilePath (/usr/lib/libQt5Core.so.5.8.0)
          2780c7 QFactoryLoader::update (/usr/lib/libQt5Core.so.5.8.0)
          279525 QFactoryLoader::QFactoryLoader (/usr/lib/libQt5Core.so.5.8.0)
           e5bd0 QPlatformIntegrationFactory::create (/usr/lib/libQt5Gui.so.5.8.0)
          723dbf [unknown] ([unknown])
~~~~~

This patch fixes this issue and the libdw unwinder mimicks the libunwind
behavior more closely.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20170602143753.16907-2-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoMerge tag 'configfs-for-4.12' of git://git.infradead.org/users/hch/configfs
Linus Torvalds [Fri, 16 Jun 2017 09:45:47 +0000 (18:45 +0900)]
Merge tag 'configfs-for-4.12' of git://git.infradead.org/users/hch/configfs

Pull configfs updates from Christoph Hellwig:
 "A fix from Nic for a race seen in production (including a stable tag).

  And while I'm sending you this I'm also sneaking in a trivial new
  helper from Bart so that we don't need inter-tree dependencies for the
  next merge window"

* tag 'configfs-for-4.12' of git://git.infradead.org/users/hch/configfs:
  configfs: Introduce config_item_get_unless_zero()
  configfs: Fix race between create_link and configfs_rmdir