<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-op-linux/fs/notify/mark.c, branch v4.7</title>
<subtitle>Talos™ II Linux sources for OpenPOWER</subtitle>
<id>https://git.raptorcs.com/git/talos-op-linux/atom?h=v4.7</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-op-linux/atom?h=v4.7'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/'/>
<updated>2016-05-20T02:12:14+00:00</updated>
<entry>
<title>fsnotify: avoid spurious EMFILE errors from inotify_init()</title>
<updated>2016-05-20T02:12:14+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-05-20T00:08:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=35e481761cdc688dbee0ef552a13f49af8eba6cc'/>
<id>urn:sha1:35e481761cdc688dbee0ef552a13f49af8eba6cc</id>
<content type='text'>
Inotify instance is destroyed when all references to it are dropped.
That not only means that the corresponding file descriptor needs to be
closed but also that all corresponding instance marks are freed (as each
mark holds a reference to the inotify instance).  However marks are
freed only after SRCU period ends which can take some time and thus if
user rapidly creates and frees inotify instances, number of existing
inotify instances can exceed max_user_instances limit although from user
point of view there is always at most one existing instance.  Thus
inotify_init() returns EMFILE error which is hard to justify from user
point of view.  This problem is exposed by LTP inotify06 testcase on
some machines.

We fix the problem by making sure all group marks are properly freed
while destroying inotify instance.  We wait for SRCU period to end in
that path anyway since we have to make sure there is no event being
added to the instance while we are tearing down the instance.  So it
takes only some plumbing to allow for marks to be destroyed in that path
as well and not from a dedicated work item.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reported-by: Xiaoguang Wang &lt;wangxg.fnst@cn.fujitsu.com&gt;
Tested-by: Xiaoguang Wang &lt;wangxg.fnst@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fsnotify: turn fsnotify reaper thread into a workqueue job</title>
<updated>2016-02-19T00:23:24+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@poochiereds.net</email>
</author>
<published>2016-02-17T21:11:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=0918f1c309b86301605650c836ddd2021d311ae2'/>
<id>urn:sha1:0918f1c309b86301605650c836ddd2021d311ae2</id>
<content type='text'>
We don't require a dedicated thread for fsnotify cleanup.  Switch it
over to a workqueue job instead that runs on the system_unbound_wq.

In the interest of not thrashing the queued job too often when there are
a lot of marks being removed, we delay the reaper job slightly when
queueing it, to allow several to gather on the list.

Signed-off-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
Tested-by: Eryu Guan &lt;guaneryu@gmail.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Eric Paris &lt;eparis@parisplace.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"</title>
<updated>2016-02-19T00:23:24+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@poochiereds.net</email>
</author>
<published>2016-02-17T21:11:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=13d34ac6e55b8284c592c67e166ac614b3c4c1d7'/>
<id>urn:sha1:13d34ac6e55b8284c592c67e166ac614b3c4c1d7</id>
<content type='text'>
This reverts commit c510eff6beba ("fsnotify: destroy marks with
call_srcu instead of dedicated thread").

Eryu reported that he was seeing some OOM kills kick in when running a
testcase that adds and removes inotify marks on a file in a tight loop.

The above commit changed the code to use call_srcu to clean up the
marks.  While that does (in principle) work, the srcu callback job is
limited to cleaning up entries in small batches and only once per jiffy.
It's easily possible to overwhelm that machinery with too many call_srcu
callbacks, and Eryu's reproduer did just that.

There's also another potential problem with using call_srcu here.  While
you can obviously sleep while holding the srcu_read_lock, the callbacks
run under local_bh_disable, so you can't sleep there.

It's possible when putting the last reference to the fsnotify_mark that
we'll end up putting a chain of references including the fsnotify_group,
uid, and associated keys.  While I don't see any obvious ways that that
could occurs, it's probably still best to avoid using call_srcu here
after all.

This patch reverts the above patch.  A later patch will take a different
approach to eliminated the dedicated thread here.

Signed-off-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
Reported-by: Eryu Guan &lt;guaneryu@gmail.com&gt;
Tested-by: Eryu Guan &lt;guaneryu@gmail.com&gt;
Cc: Jan Kara &lt;jack@suse.com&gt;
Cc: Eric Paris &lt;eparis@parisplace.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fsnotify: destroy marks with call_srcu instead of dedicated thread</title>
<updated>2016-01-15T00:00:49+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@poochiereds.net</email>
</author>
<published>2016-01-14T23:16:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=c510eff6bebaa244e577b8f499e86606b5e5d4c7'/>
<id>urn:sha1:c510eff6bebaa244e577b8f499e86606b5e5d4c7</id>
<content type='text'>
At the time that this code was originally written, call_srcu didn't
exist, so this thread was required to ensure that we waited for that
SRCU grace period to settle before finally freeing the object.

It does exist now however and we can much more efficiently use call_srcu
to handle this.  That also allows us to potentially use srcu_barrier to
ensure that they are all of the callbacks have run before proceeding.
In order to conserve space, we union the rcu_head with the g_list.

This will be necessary for nfsd which will allocate marks from a
dedicated slabcache.  We have to be able to ensure that all of the
objects are destroyed before destroying the cache.  That's fairly

Signed-off-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
Cc: Eric Paris &lt;eparis@parisplace.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fsnotify: get rid of fsnotify_destroy_mark_locked()</title>
<updated>2015-09-04T23:54:41+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.com</email>
</author>
<published>2015-09-04T22:43:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=4712e722f91457e60723b9cef6265a74290efba9'/>
<id>urn:sha1:4712e722f91457e60723b9cef6265a74290efba9</id>
<content type='text'>
fsnotify_destroy_mark_locked() is subtle to use because it temporarily
releases group-&gt;mark_mutex.  To avoid future problems with this
function, split it into two.

fsnotify_detach_mark() is the part that needs group-&gt;mark_mutex and
fsnotify_free_mark() is the part that must be called outside of
group-&gt;mark_mutex.  This way it's much clearer what's going on and we
also avoid some pointless acquisitions of group-&gt;mark_mutex.

Signed-off-by: Jan Kara &lt;jack@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fsnotify: remove mark-&gt;free_list</title>
<updated>2015-09-04T23:54:41+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.com</email>
</author>
<published>2015-09-04T22:43:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=925d1132a03e33cb8f29a0057300d023b4f1be23'/>
<id>urn:sha1:925d1132a03e33cb8f29a0057300d023b4f1be23</id>
<content type='text'>
Free list is used when all marks on given inode / mount should be
destroyed when inode / mount is going away.  However we can free all of
the marks without using a special list with some care.

Signed-off-by: Jan Kara &lt;jack@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()</title>
<updated>2015-08-07T01:39:41+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.com</email>
</author>
<published>2015-08-06T22:46:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=8f2f3eb59dff4ec538de55f2e0592fec85966aab'/>
<id>urn:sha1:8f2f3eb59dff4ec538de55f2e0592fec85966aab</id>
<content type='text'>
fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
drops mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and thus the next
entry pointer we have cached may become stale and we dereference free
memory.

Fix the problem by first moving marks to free to a special private list
and then always free the first entry in the special list.  This method
is safe even when entries from the list can disappear once we drop the
lock.

Signed-off-by: Jan Kara &lt;jack@suse.com&gt;
Reported-by: Ashish Sangwan &lt;a.sangwan@samsung.com&gt;
Reviewed-by: Ashish Sangwan &lt;a.sangwan@samsung.com&gt;
Cc: Lino Sanfilippo &lt;LinoSanfilippo@gmx.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()"</title>
<updated>2015-07-21T23:06:53+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-07-21T23:06:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=d725e66c06ab440032f49ef17e960896d0ec6d49'/>
<id>urn:sha1:d725e66c06ab440032f49ef17e960896d0ec6d49</id>
<content type='text'>
This reverts commit a2673b6e040663bf16a552f8619e6bde9f4b9acf.

Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms:

 "Thanks for report! You are right that my patch introduces a race
  between fsnotify kthread and fsnotify_destroy_group() which can result
  in leaking inotify event on group destruction.

  I haven't yet decided whether the right fix is not to queue events for
  dying notification group (as that is pointless anyway) or whether we
  should just fix the original problem differently...  Whenever I look
  at fsnotify code mark handling I get lost in the maze of locks, lists,
  and subtle differences between how different notification systems
  handle notification marks :( I'll think about it over night"

and after thinking about it, Jan says:

 "OK, I have looked into the code some more and I found another
  relatively simple way of fixing the original oops.  It will be IMHO
  better than trying to fixup this issue which has more potential for
  breakage.  I'll ask Linus to revert the fsnotify fix he already merged
  and send a new fix"

Reported-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
Requested-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()</title>
<updated>2015-07-17T23:39:54+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2015-07-17T23:24:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=a2673b6e040663bf16a552f8619e6bde9f4b9acf'/>
<id>urn:sha1:a2673b6e040663bf16a552f8619e6bde9f4b9acf</id>
<content type='text'>
fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and we dereference free
memory in the loop there.

Fix the problem by keeping mark_mutex held in
fsnotify_destroy_mark_locked().  The reason why we drop that mutex is that
we need to call a -&gt;freeing_mark() callback which may acquire mark_mutex
again.  To avoid this and similar lock inversion issues, we move the call
to -&gt;freeing_mark() callback to the kthread destroying the mark.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reported-by: Ashish Sangwan &lt;a.sangwan@samsung.com&gt;
Suggested-by: Lino Sanfilippo &lt;LinoSanfilippo@gmx.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fsnotify: remove destroy_list from fsnotify_mark</title>
<updated>2014-12-13T20:42:53+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2014-12-13T00:58:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=37d469e7673a663cbf38360beb1eaa3224c9d272'/>
<id>urn:sha1:37d469e7673a663cbf38360beb1eaa3224c9d272</id>
<content type='text'>
destroy_list is used to track marks which still need waiting for srcu
period end before they can be freed.  However by the time mark is added to
destroy_list it isn't in group's list of marks anymore and thus we can
reuse fsnotify_mark-&gt;g_list for queueing into destroy_list.  This saves
two pointers for each fsnotify_mark.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: Heinrich Schuchardt &lt;xypron.glpk@gmx.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
