<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-obmc-linux/drivers/dax, branch dev-4.13</title>
<subtitle>Talos™ II Linux sources for OpenBMC</subtitle>
<id>https://git.raptorcs.com/git/talos-obmc-linux/atom?h=dev-4.13</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-obmc-linux/atom?h=dev-4.13'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/'/>
<updated>2017-10-05T07:47:24+00:00</updated>
<entry>
<title>dax: remove the pmem_dax_ops-&gt;flush abstraction</title>
<updated>2017-10-05T07:47:24+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2017-09-01T01:47:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=bfc0ab41a82c8b7e68aae241b19bd716dd678521'/>
<id>urn:sha1:bfc0ab41a82c8b7e68aae241b19bd716dd678521</id>
<content type='text'>
commit c3ca015fab6df124c933b91902f3f2a3473f9da5 upstream.

Commit abebfbe2f731 ("dm: add -&gt;flush() dax operation support") is
buggy. A DM device may be composed of multiple underlying devices and
all of them need to be flushed. That commit just routes the flush
request to the first device and ignores the other devices.

It could be fixed by adding more complex logic to the device mapper. But
there is only one implementation of the method pmem_dax_ops-&gt;flush - that
is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
don't need the pmem_dax_ops-&gt;flush abstraction at all, we can call
arch_wb_cache_pmem() directly from dax_flush() because dax_dev-&gt;ops-&gt;flush
can't ever reach anything different from arch_wb_cache_pmem().

It should be also pointed out that for some uses of persistent memory it
is needed to flush only a very small amount of data (such as 1 cacheline),
and it would be overkill if we go through that device mapper machinery for
a single flushed cache line.

Fix this by removing the pmem_dax_ops-&gt;flush abstraction and call
arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
mapper code that forwards the flushes.

Fixes: abebfbe2f731 ("dm: add -&gt;flush() dax operation support")
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Merge tag 'for-4.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm</title>
<updated>2017-07-28T19:17:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-07-28T19:17:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=1731a47444e7bf66d5cf9415bbd6ac5e3903d719'/>
<id>urn:sha1:1731a47444e7bf66d5cf9415bbd6ac5e3903d719</id>
<content type='text'>
Pull device mapper fixes from Mike Snitzer:

 - a few DM integrity fixes that improve performance. One that address
   inefficiencies in the on-disk journal device layout. Another that
   makes use of the block layer's on-stack plugging when writing the
   journal.

 - a dm-bufio fix for the blk_status_t conversion that went in during
   the merge window.

 - a few DM raid fixes that address correctness when suspending the
   device and a validation fix for validation that occurs during device
   activation.

 - a couple DM zoned target fixes. Important one being the fix to not
   use GFP_KERNEL in the IO path due to concerns about deadlock in
   low-memory conditions (e.g. swap over a DM zoned device, etc).

 - a DM DAX device fix to make sure dm_dax_flush() is called if the
   underlying DAX device is operating as a write cache.

* tag 'for-4.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm, dax: Make sure dm_dax_flush() is called if device supports it
  dm verity fec: fix GFP flags used with mempool_alloc()
  dm zoned: use GFP_NOIO in I/O path
  dm zoned: remove test for impossible REQ_OP_FLUSH conditions
  dm raid: bump target version
  dm raid: avoid mddev-&gt;suspended access
  dm raid: fix activation check in validate_raid_redundancy()
  dm raid: remove WARN_ON() in raid10_md_layout_to_format()
  dm bufio: fix error code in dm_bufio_write_dirty_buffers()
  dm integrity: test for corrupted disk format during table load
  dm integrity: WARN_ON if variables representing journal usage get out of sync
  dm integrity: use plugging when writing the journal
  dm integrity: fix inefficient allocation of journal space
</content>
</entry>
<entry>
<title>dm, dax: Make sure dm_dax_flush() is called if device supports it</title>
<updated>2017-07-26T19:55:44+00:00</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2017-07-26T13:35:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=273752c9ff03eb83856601b2a3458218bb949e46'/>
<id>urn:sha1:273752c9ff03eb83856601b2a3458218bb949e46</id>
<content type='text'>
Currently dm_dax_flush() is not being called, even if underlying dax
device supports write cache, because DAXDEV_WRITE_CACHE is not being
propagated up to the DM dax device.

If the underlying dax device supports write cache, set
DAXDEV_WRITE_CACHE on the DM dax device.  This will cause dm_dax_flush()
to be called.

Fixes: abebfbe2f7 ("dm: add -&gt;flush() dax operation support")
Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
</entry>
<entry>
<title>device-dax: fix sysfs duplicate warnings</title>
<updated>2017-07-19T00:49:14+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-07-19T00:49:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=bbb3be170ac2891526ad07b18af7db226879a8e7'/>
<id>urn:sha1:bbb3be170ac2891526ad07b18af7db226879a8e7</id>
<content type='text'>
Fix warnings of the form...

     WARNING: CPU: 10 PID: 4983 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
     sysfs: cannot create duplicate filename '/class/dax/dax12.0'
     Call Trace:
      dump_stack+0x63/0x86
      __warn+0xcb/0xf0
      warn_slowpath_fmt+0x5a/0x80
      ? kernfs_path_from_node+0x4f/0x60
      sysfs_warn_dup+0x62/0x80
      sysfs_do_create_link_sd.isra.2+0x97/0xb0
      sysfs_create_link+0x25/0x40
      device_add+0x266/0x630
      devm_create_dax_dev+0x2cf/0x340 [dax]
      dax_pmem_probe+0x1f5/0x26e [dax_pmem]
      nvdimm_bus_probe+0x71/0x120

...by reusing the namespace id for the device-dax instance name.

Now that we have decided that there will never by more than one
device-dax instance per libnvdimm-namespace parent device [1], we can
directly reuse the namepace ids. There are some possible follow-on
cleanups, but those are saved for a later patch to simplify the -stable
backport.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-December/008266.html

Fixes: 98a29c39dc68 ("libnvdimm, namespace: allow creation of multiple pmem...")
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Reported-by: Dariusz Dokupil &lt;dariusz.dokupil@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>device-dax: fix 'passing zero to ERR_PTR()' warning</title>
<updated>2017-07-17T18:43:58+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-07-12T20:42:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=43fe51e11c194a6576634585f81ba33e104194a5'/>
<id>urn:sha1:43fe51e11c194a6576634585f81ba33e104194a5</id>
<content type='text'>
Dan Carpenter reports:

    The patch 7b6be8444e0f: "dax: refactor dax-fs into a generic provider
    of 'struct dax_device' instances" from Apr 11, 2017, leads to the
    following static checker warning:

        drivers/dax/device.c:643 devm_create_dev_dax()
        warn: passing zero to 'ERR_PTR'

Fix the case where we inadvertently leak 0 to ERR_PTR() by setting at
every error case, and make it clear that 'count' is never 0.

Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux</title>
<updated>2017-07-08T02:38:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-07-08T02:38:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750'/>
<id>urn:sha1:088737f44bbf6378745f5b57b035e57ee3dc4750</id>
<content type='text'>
Pull Writeback error handling updates from Jeff Layton:
 "This pile represents the bulk of the writeback error handling fixes
  that I have for this cycle. Some of the earlier patches in this pile
  may look trivial but they are prerequisites for later patches in the
  series.

  The aim of this set is to improve how we track and report writeback
  errors to userland. Most applications that care about data integrity
  will periodically call fsync/fdatasync/msync to ensure that their
  writes have made it to the backing store.

  For a very long time, we have tracked writeback errors using two flags
  in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
  writeback error occurs (via mapping_set_error) and are cleared as a
  side-effect of filemap_check_errors (as you noted yesterday). This
  model really sucks for userland.

  Only the first task to call fsync (or msync or fdatasync) will see the
  error. Any subsequent task calling fsync on a file will get back 0
  (unless another writeback error occurs in the interim). If I have
  several tasks writing to a file and calling fsync to ensure that their
  writes got stored, then I need to have them coordinate with one
  another. That's difficult enough, but in a world of containerized
  setups that coordination may even not be possible.

  But wait...it gets worse!

  The calls to filemap_check_errors can be buried pretty far down in the
  call stack, and there are internal callers of filemap_write_and_wait
  and the like that also end up clearing those errors. Many of those
  callers ignore the error return from that function or return it to
  userland at nonsensical times (e.g. truncate() or stat()). If I get
  back -EIO on a truncate, there is no reason to think that it was
  because some previous writeback failed, and a subsequent fsync() will
  (incorrectly) return 0.

  This pile aims to do three things:

   1) ensure that when a writeback error occurs that that error will be
      reported to userland on a subsequent fsync/fdatasync/msync call,
      regardless of what internal callers are doing

   2) report writeback errors on all file descriptions that were open at
      the time that the error occurred. This is a user-visible change,
      but I think most applications are written to assume this behavior
      anyway. Those that aren't are unlikely to be hurt by it.

   3) document what filesystems should do when there is a writeback
      error. Today, there is very little consistency between them, and a
      lot of cargo-cult copying. We need to make it very clear what
      filesystems should do in this situation.

  To achieve this, the set adds a new data type (errseq_t) and then
  builds new writeback error tracking infrastructure around that. Once
  all of that is in place, we change the filesystems to use the new
  infrastructure for reporting wb errors to userland.

  Note that this is just the initial foray into cleaning up this mess.
  There is a lot of work remaining here:

   1) convert the rest of the filesystems in a similar fashion. Once the
      initial set is in, then I think most other fs' will be fairly
      simple to convert. Hopefully most of those can in via individual
      filesystem trees.

   2) convert internal waiters on writeback to use errseq_t for
      detecting errors instead of relying on the AS_* flags. I have some
      draft patches for this for ext4, but they are not quite ready for
      prime time yet.

  This was a discussion topic this year at LSF/MM too. If you're
  interested in the gory details, LWN has some good articles about this:

      https://lwn.net/Articles/718734/
      https://lwn.net/Articles/724307/"

* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  btrfs: minimal conversion to errseq_t writeback error reporting on fsync
  xfs: minimal conversion to errseq_t writeback error reporting
  ext4: use errseq_t based error handling for reporting data writeback errors
  fs: convert __generic_file_fsync to use errseq_t based reporting
  block: convert to errseq_t based writeback error tracking
  dax: set errors in mapping when writeback fails
  Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
  mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
  fs: new infrastructure for writeback error handling and reporting
  lib: add errseq_t type and infrastructure for handling it
  mm: don't TestClearPageError in __filemap_fdatawait_range
  mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
  jbd2: don't clear and reset errors after waiting on writeback
  buffer: set errors in mapping at the time that the error occurs
  fs: check for writeback errors after syncing out buffers in generic_file_fsync
  buffer: use mapping_set_error instead of setting the flag
  mm: fix mapping_set_error call in me_pagecache_dirty
</content>
</entry>
<entry>
<title>Merge tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm</title>
<updated>2017-07-07T16:44:06+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-07-07T16:44:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=b6ffe9ba46016f8351896ccee33bebcd0e5ea7c0'/>
<id>urn:sha1:b6ffe9ba46016f8351896ccee33bebcd0e5ea7c0</id>
<content type='text'>
Pull libnvdimm updates from Dan Williams:
 "libnvdimm updates for the latest ACPI and UEFI specifications. This
  pull request also includes new 'struct dax_operations' enabling to
  undo the abuse of copy_user_nocache() for copy operations to pmem.

  The dax work originally missed 4.12 to address concerns raised by Al.

  Summary:

   - Introduce the _flushcache() family of memory copy helpers and use
     them for persistent memory write operations on x86. The
     _flushcache() semantic indicates that the cache is either bypassed
     for the copy operation (movnt) or any lines dirtied by the copy
     operation are written back (clwb, clflushopt, or clflush).

   - Extend dax_operations with -&gt;copy_from_iter() and -&gt;flush()
     operations. These operations and other infrastructure updates allow
     all persistent memory specific dax functionality to be pushed into
     libnvdimm and the pmem driver directly. It also allows dax-specific
     sysfs attributes to be linked to a host device, for example:
     /sys/block/pmem0/dax/write_cache

   - Add support for the new NVDIMM platform/firmware mechanisms
     introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
     namespace label format, extensions to the address-range-scrub
     command set, new error injection commands, and a new BTT
     (block-translation-table) layout. These updates support inter-OS
     and pre-OS compatibility.

   - Fix a longstanding memory corruption bug in nfit_test.

   - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
     capable.

   - Miscellaneous fixes and small updates across libnvdimm and the nfit
     driver.

  Acknowledgements that came after the branch was pushed: commit
  6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks'
  sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
  &lt;toshi.kani@hpe.com&gt;"

* tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
  libnvdimm, namespace: record 'lbasize' for pmem namespaces
  acpi/nfit: Issue Start ARS to retrieve existing records
  libnvdimm: New ACPI 6.2 DSM functions
  acpi, nfit: Show bus_dsm_mask in sysfs
  libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
  acpi, nfit: Enable DSM pass thru for root functions.
  libnvdimm: passthru functions clear to send
  libnvdimm, btt: convert some info messages to warn/err
  libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
  libnvdimm: fix the clear-error check in nsio_rw_bytes
  libnvdimm, btt: fix btt_rw_page not returning errors
  acpi, nfit: quiet invalid block-aperture-region warnings
  libnvdimm, btt: BTT updates for UEFI 2.7 format
  acpi, nfit: constify *_attribute_group
  libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
  libnvdimm, pmem, dax: export a cache control attribute
  dax: convert to bitmask for flags
  dax: remove default copy_from_iter fallback
  libnvdimm, nfit: enable support for volatile ranges
  libnvdimm, pmem: fix persistence warning
  ...
</content>
</entry>
<entry>
<title>fs: new infrastructure for writeback error handling and reporting</title>
<updated>2017-07-06T11:02:25+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@redhat.com</email>
</author>
<published>2017-07-06T11:02:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=5660e13d2fd6af1903d4b0b98020af95ca2d638a'/>
<id>urn:sha1:5660e13d2fd6af1903d4b0b98020af95ca2d638a</id>
<content type='text'>
Most filesystems currently use mapping_set_error and
filemap_check_errors for setting and reporting/clearing writeback errors
at the mapping level. filemap_check_errors is indirectly called from
most of the filemap_fdatawait_* functions and from
filemap_write_and_wait*. These functions are called from all sorts of
contexts to wait on writeback to finish -- e.g. mostly in fsync, but
also in truncate calls, getattr, etc.

The non-fsync callers are problematic. We should be reporting writeback
errors during fsync, but many places spread over the tree clear out
errors before they can be properly reported, or report errors at
nonsensical times.

If I get -EIO on a stat() call, there is no reason for me to assume that
it is because some previous writeback failed. The fact that it also
clears out the error such that a subsequent fsync returns 0 is a bug,
and a nasty one since that's potentially silent data corruption.

This patch adds a small bit of new infrastructure for setting and
reporting errors during address_space writeback. While the above was my
original impetus for adding this, I think it's also the case that
current fsync semantics are just problematic for userland. Most
applications that call fsync do so to ensure that the data they wrote
has hit the backing store.

In the case where there are multiple writers to the file at the same
time, this is really hard to determine. The first one to call fsync will
see any stored error, and the rest get back 0. The processes with open
fds may not be associated with one another in any way. They could even
be in different containers, so ensuring coordination between all fsync
callers is not really an option.

One way to remedy this would be to track what file descriptor was used
to dirty the file, but that's rather cumbersome and would likely be
slow. However, there is a simpler way to improve the semantics here
without incurring too much overhead.

This set adds an errseq_t to struct address_space, and a corresponding
one is added to struct file. Writeback errors are recorded in the
mapping's errseq_t, and the one in struct file is used as the "since"
value.

This changes the semantics of the Linux fsync implementation such that
applications can now use it to determine whether there were any
writeback errors since fsync(fd) was last called (or since the file was
opened in the case of fsync having never been called).

Note that those writeback errors may have occurred when writing data
that was dirtied via an entirely different fd, but that's the case now
with the current mapping_set_error/filemap_check_error infrastructure.
This will at least prevent you from getting a false report of success.

The new behavior is still consistent with the POSIX spec, and is more
reliable for application developers. This patch just adds some basic
infrastructure for doing this, and ensures that the f_wb_err "cursor"
is properly set when a file is opened. Later patches will change the
existing code to use this new infrastructure for reporting errors at
fsync time.

Signed-off-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
</entry>
<entry>
<title>libnvdimm, pmem, dax: export a cache control attribute</title>
<updated>2017-06-29T16:29:50+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-06-27T04:28:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=6e0c90d691cd5d90569f5918ab03eb76c81f9c6e'/>
<id>urn:sha1:6e0c90d691cd5d90569f5918ab03eb76c81f9c6e</id>
<content type='text'>
The dax_flush() operation can be turned into a nop on platforms where
firmware arranges for cpu caches to be flushed on a power-fail event.
The ACPI 6.2 specification defines a mechanism for the platform to
indicate this capability so the kernel can select the proper default.
However, for other platforms, the administrator must toggle this setting
manually.

Given this flush setting is a dax-specific mechanism we advertise it
through a 'dax' attribute group hanging off a host device. For example,
a 'pmem0' block-device gets a 'dax' sysfs-subdirectory with a
'write_cache' attribute to control response to dax cache flush requests.
This is similar to the 'queue/write_cache' attribute that appears under
block devices.

Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>dax: convert to bitmask for flags</title>
<updated>2017-06-29T16:29:49+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-06-28T00:59:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=9a60c3ef577beb0376704808949f2c1f8fb0672c'/>
<id>urn:sha1:9a60c3ef577beb0376704808949f2c1f8fb0672c</id>
<content type='text'>
In preparation for adding more flags, convert the existing flag to a
bit-flag.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
</feed>
