<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-op-linux/arch/x86/kernel/cpu/resctrl, branch master</title>
<subtitle>Talos™ II Linux sources for OpenPOWER</subtitle>
<id>https://git.raptorcs.com/git/talos-op-linux/atom?h=master</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-op-linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/'/>
<updated>2020-02-08T21:26:41+00:00</updated>
<entry>
<title>Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2020-02-08T21:26:41+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-02-08T21:26:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=c9d35ee049b40f1d73e890bf88dd55f83b1e9be8'/>
<id>urn:sha1:c9d35ee049b40f1d73e890bf88dd55f83b1e9be8</id>
<content type='text'>
Pull vfs file system parameter updates from Al Viro:
 "Saner fs_parser.c guts and data structures. The system-wide registry
  of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
  the horror switch() in fs_parse() that would have to grow another case
  every time something got added to that system-wide registry.

  New syntax types can be added by filesystems easily now, and their
  namespace is that of functions - not of system-wide enum members. IOW,
  they can be shared or kept private and if some turn out to be widely
  useful, we can make them common library helpers, etc., without having
  to do anything whatsoever to fs_parse() itself.

  And we already get that kind of requests - the thing that finally
  pushed me into doing that was "oh, and let's add one for timeouts -
  things like 15s or 2h". If some filesystem really wants that, let them
  do it. Without somebody having to play gatekeeper for the variants
  blessed by direct support in fs_parse(), TYVM.

  Quite a bit of boilerplate is gone. And IMO the data structures make a
  lot more sense now. -200LoC, while we are at it"

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
  tmpfs: switch to use of invalfc()
  cgroup1: switch to use of errorfc() et.al.
  procfs: switch to use of invalfc()
  hugetlbfs: switch to use of invalfc()
  cramfs: switch to use of errofc() et.al.
  gfs2: switch to use of errorfc() et.al.
  fuse: switch to use errorfc() et.al.
  ceph: use errorfc() and friends instead of spelling the prefix out
  prefix-handling analogues of errorf() and friends
  turn fs_param_is_... into functions
  fs_parse: handle optional arguments sanely
  fs_parse: fold fs_parameter_desc/fs_parameter_spec
  fs_parser: remove fs_parameter_description name field
  add prefix to fs_context-&gt;log
  ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
  new primitive: __fs_parse()
  switch rbd and libceph to p_log-based primitives
  struct p_log, variants of warnf() et.al. taking that one instead
  teach logfc() to handle prefices, give it saner calling conventions
  get rid of cg_invalf()
  ...
</content>
</entry>
<entry>
<title>fs_parse: fold fs_parameter_desc/fs_parameter_spec</title>
<updated>2020-02-07T19:48:37+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2019-09-07T11:23:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=d7167b149943e38ad610191ecbb0800c78bbced9'/>
<id>urn:sha1:d7167b149943e38ad610191ecbb0800c78bbced9</id>
<content type='text'>
The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>fs_parser: remove fs_parameter_description name field</title>
<updated>2020-02-07T19:48:36+00:00</updated>
<author>
<name>Eric Sandeen</name>
<email>sandeen@sandeen.net</email>
</author>
<published>2019-12-06T16:45:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=96cafb9ccb153f6a82ff2c9bde68916d9d65501e'/>
<id>urn:sha1:96cafb9ccb153f6a82ff2c9bde68916d9d65501e</id>
<content type='text'>
Unused now.

Signed-off-by: Eric Sandeen &lt;sandeen@redhat.com&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-01-31T19:05:33+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-01-31T19:05:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=b70a2d6b29f7c5b621bf83f903f26fee5fe28efc'/>
<id>urn:sha1:b70a2d6b29f7c5b621bf83f903f26fee5fe28efc</id>
<content type='text'>
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - three fixes and a cleanup for the resctrl code

   - a HyperV fix

   - a fix to /proc/kcore contents in live debugging sessions

   - a fix for the x86 decoder opcode map"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/decoder: Add TEST opcode to Group3-2
  x86/resctrl: Clean up unused function parameter in mkdir path
  x86/resctrl: Fix a deadlock due to inaccurate reference
  x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup
  x86/resctrl: Fix use-after-free when deleting resource groups
  x86/hyper-v: Add "polling" bit to hv_synic_sint
  x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y
</content>
</entry>
<entry>
<title>Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-01-28T20:00:29+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-01-28T20:00:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=4244057c3da1dde07c0f338ea32fed6e79d6a657'/>
<id>urn:sha1:4244057c3da1dde07c0f338ea32fed6e79d6a657</id>
<content type='text'>
Pull x86 resource control updates from Ingo Molnar:
 "The main change in this tree is the extension of the resctrl procfs
  ABI with a new file that helps tooling to navigate from tasks back to
  resctrl groups: /proc/{pid}/cpu_resctrl_groups.

  Also fix static key usage for certain feature combinations and
  simplify the task exit resctrl case"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Add task resctrl information display
  x86/resctrl: Check monitoring static key in the MBM overflow handler
  x86/resctrl: Do not reconfigure exiting tasks
</content>
</entry>
<entry>
<title>x86/resctrl: Clean up unused function parameter in mkdir path</title>
<updated>2020-01-20T16:00:41+00:00</updated>
<author>
<name>Xiaochen Shen</name>
<email>xiaochen.shen@intel.com</email>
</author>
<published>2020-01-08T16:28:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=32ada3b9e04c6f6d4916967bd8bbe2450ad5bc5e'/>
<id>urn:sha1:32ada3b9e04c6f6d4916967bd8bbe2450ad5bc5e</id>
<content type='text'>
Commit

  334b0f4e9b1b ("x86/resctrl: Fix a deadlock due to inaccurate reference")

changed the argument to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
within mkdir_rdt_prepare(). That change resulted in an unused function
parameter to mkdir_rdt_prepare().

Clean up the unused function parameter in mkdir_rdt_prepare() and its
callers rdtgroup_mkdir_mon() and rdtgroup_mkdir_ctrl_mon().

Signed-off-by: Xiaochen Shen &lt;xiaochen.shen@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lkml.kernel.org/r/1578500886-21771-5-git-send-email-xiaochen.shen@intel.com
</content>
</entry>
<entry>
<title>x86/resctrl: Fix a deadlock due to inaccurate reference</title>
<updated>2020-01-20T15:57:53+00:00</updated>
<author>
<name>Xiaochen Shen</name>
<email>xiaochen.shen@intel.com</email>
</author>
<published>2020-01-08T16:28:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=334b0f4e9b1b4a1d475f803419d202f6c5e4d18e'/>
<id>urn:sha1:334b0f4e9b1b4a1d475f803419d202f6c5e4d18e</id>
<content type='text'>
There is a race condition which results in a deadlock when rmdir and
mkdir execute concurrently:

$ ls /sys/fs/resctrl/c1/mon_groups/m1/
cpus  cpus_list  mon_data  tasks

Thread 1: rmdir /sys/fs/resctrl/c1
Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1

3 locks held by mkdir/48649:
 #0:  (sb_writers#17){.+.+}, at: [&lt;ffffffffb4ca2aa0&gt;] mnt_want_write+0x20/0x50
 #1:  (&amp;type-&gt;i_mutex_dir_key#8/1){+.+.}, at: [&lt;ffffffffb4c8c13b&gt;] filename_create+0x7b/0x170
 #2:  (rdtgroup_mutex){+.+.}, at: [&lt;ffffffffb4a4389d&gt;] rdtgroup_kn_lock_live+0x3d/0x70

4 locks held by rmdir/48652:
 #0:  (sb_writers#17){.+.+}, at: [&lt;ffffffffb4ca2aa0&gt;] mnt_want_write+0x20/0x50
 #1:  (&amp;type-&gt;i_mutex_dir_key#8/1){+.+.}, at: [&lt;ffffffffb4c8c3cf&gt;] do_rmdir+0x13f/0x1e0
 #2:  (&amp;type-&gt;i_mutex_dir_key#8){++++}, at: [&lt;ffffffffb4c86d5d&gt;] vfs_rmdir+0x4d/0x120
 #3:  (rdtgroup_mutex){+.+.}, at: [&lt;ffffffffb4a4389d&gt;] rdtgroup_kn_lock_live+0x3d/0x70

Thread 1 is deleting control group "c1". Holding rdtgroup_mutex,
kernfs_remove() removes all kernfs nodes under directory "c1"
recursively, then waits for sub kernfs node "mon_groups" to drop active
reference.

Thread 2 is trying to create a subdirectory "m1" in the "mon_groups"
directory. The wrapper kernfs_iop_mkdir() takes an active reference to
the "mon_groups" directory but the code drops the active reference to
the parent directory "c1" instead.

As a result, Thread 1 is blocked on waiting for active reference to drop
and never release rdtgroup_mutex, while Thread 2 is also blocked on
trying to get rdtgroup_mutex.

Thread 1 (rdtgroup_rmdir)   Thread 2 (rdtgroup_mkdir)
(rmdir /sys/fs/resctrl/c1)  (mkdir /sys/fs/resctrl/c1/mon_groups/m1)
-------------------------   -------------------------
                            kernfs_iop_mkdir
                              /*
                               * kn: "m1", parent_kn: "mon_groups",
                               * prgrp_kn: parent_kn-&gt;parent: "c1",
                               *
                               * "mon_groups", parent_kn-&gt;active++: 1
                               */
                              kernfs_get_active(parent_kn)
kernfs_iop_rmdir
  /* "c1", kn-&gt;active++ */
  kernfs_get_active(kn)

  rdtgroup_kn_lock_live
    atomic_inc(&amp;rdtgrp-&gt;waitcount)
    /* "c1", kn-&gt;active-- */
    kernfs_break_active_protection(kn)
    mutex_lock

  rdtgroup_rmdir_ctrl
    free_all_child_rdtgrp
      sentry-&gt;flags = RDT_DELETED

    rdtgroup_ctrl_remove
      rdtgrp-&gt;flags = RDT_DELETED
      kernfs_get(kn)
      kernfs_remove(rdtgrp-&gt;kn)
        __kernfs_remove
          /* "mon_groups", sub_kn */
          atomic_add(KN_DEACTIVATED_BIAS, &amp;sub_kn-&gt;active)
          kernfs_drain(sub_kn)
            /*
             * sub_kn-&gt;active == KN_DEACTIVATED_BIAS + 1,
             * waiting on sub_kn-&gt;active to drop, but it
             * never drops in Thread 2 which is blocked
             * on getting rdtgroup_mutex.
             */
Thread 1 hangs here ----&gt;
            wait_event(sub_kn-&gt;active == KN_DEACTIVATED_BIAS)
            ...
                              rdtgroup_mkdir
                                rdtgroup_mkdir_mon(parent_kn, prgrp_kn)
                                  mkdir_rdt_prepare(parent_kn, prgrp_kn)
                                    rdtgroup_kn_lock_live(prgrp_kn)
                                      atomic_inc(&amp;rdtgrp-&gt;waitcount)
                                      /*
                                       * "c1", prgrp_kn-&gt;active--
                                       *
                                       * The active reference on "c1" is
                                       * dropped, but not matching the
                                       * actual active reference taken
                                       * on "mon_groups", thus causing
                                       * Thread 1 to wait forever while
                                       * holding rdtgroup_mutex.
                                       */
                                      kernfs_break_active_protection(
                                                               prgrp_kn)
                                      /*
                                       * Trying to get rdtgroup_mutex
                                       * which is held by Thread 1.
                                       */
Thread 2 hangs here ----&gt;             mutex_lock
                                      ...

The problem is that the creation of a subdirectory in the "mon_groups"
directory incorrectly releases the active protection of its parent
directory instead of itself before it starts waiting for rdtgroup_mutex.
This is triggered by the rdtgroup_mkdir() flow calling
rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the
parent control group ("c1") as argument. It should be called with kernfs
node "mon_groups" instead. What is currently missing is that the
kn-&gt;priv of "mon_groups" is NULL instead of pointing to the rdtgrp.

Fix it by pointing kn-&gt;priv to rdtgrp when "mon_groups" is created. Then
it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
instead. And then it operates on the same rdtgroup structure but handles
the active reference of kernfs node "mon_groups" to prevent deadlock.
The same changes are also made to the "mon_data" directories.

This results in some unused function parameters that will be cleaned up
in follow-up patch as the focus here is on the fix only in support of
backporting efforts.

Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Suggested-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Xiaochen Shen &lt;xiaochen.shen@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com
</content>
</entry>
<entry>
<title>x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup</title>
<updated>2020-01-20T15:56:11+00:00</updated>
<author>
<name>Xiaochen Shen</name>
<email>xiaochen.shen@intel.com</email>
</author>
<published>2020-01-08T16:28:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=074fadee59ee7a9d2b216e9854bd4efb5dad679f'/>
<id>urn:sha1:074fadee59ee7a9d2b216e9854bd4efb5dad679f</id>
<content type='text'>
There is a race condition in the following scenario which results in an
use-after-free issue when reading a monitoring file and deleting the
parent ctrl_mon group concurrently:

Thread 1 calls atomic_inc() to take refcount of rdtgrp and then calls
kernfs_break_active_protection() to drop the active reference of kernfs
node in rdtgroup_kn_lock_live().

In Thread 2, kernfs_remove() is a blocking routine. It waits on all sub
kernfs nodes to drop the active reference when removing all subtree
kernfs nodes recursively. Thread 2 could block on kernfs_remove() until
Thread 1 calls kernfs_break_active_protection(). Only after
kernfs_remove() completes the refcount of rdtgrp could be trusted.

Before Thread 1 calls atomic_inc() and kernfs_break_active_protection(),
Thread 2 could call kfree() when the refcount of rdtgrp (sentry) is 0
instead of 1 due to the race.

In Thread 1, in rdtgroup_kn_unlock(), referring to earlier rdtgrp memory
(rdtgrp-&gt;waitcount) which was already freed in Thread 2 results in
use-after-free issue.

Thread 1 (rdtgroup_mondata_show)  Thread 2 (rdtgroup_rmdir)
--------------------------------  -------------------------
rdtgroup_kn_lock_live
  /*
   * kn active protection until
   * kernfs_break_active_protection(kn)
   */
  rdtgrp = kernfs_to_rdtgroup(kn)
                                  rdtgroup_kn_lock_live
                                    atomic_inc(&amp;rdtgrp-&gt;waitcount)
                                    mutex_lock
                                  rdtgroup_rmdir_ctrl
                                    free_all_child_rdtgrp
                                      /*
                                       * sentry-&gt;waitcount should be 1
                                       * but is 0 now due to the race.
                                       */
                                      kfree(sentry)*[1]
  /*
   * Only after kernfs_remove()
   * completes, the refcount of
   * rdtgrp could be trusted.
   */
  atomic_inc(&amp;rdtgrp-&gt;waitcount)
  /* kn-&gt;active-- */
  kernfs_break_active_protection(kn)
                                    rdtgroup_ctrl_remove
                                      rdtgrp-&gt;flags = RDT_DELETED
                                      /*
                                       * Blocking routine, wait for
                                       * all sub kernfs nodes to drop
                                       * active reference in
                                       * kernfs_break_active_protection.
                                       */
                                      kernfs_remove(rdtgrp-&gt;kn)
                                  rdtgroup_kn_unlock
                                    mutex_unlock
                                    atomic_dec_and_test(
                                                &amp;rdtgrp-&gt;waitcount)
                                    &amp;&amp; (flags &amp; RDT_DELETED)
                                      kernfs_unbreak_active_protection(kn)
                                      kfree(rdtgrp)
  mutex_lock
mon_event_read
rdtgroup_kn_unlock
  mutex_unlock
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [1].
   */
  atomic_dec_and_test(&amp;rdtgrp-&gt;waitcount)
  &amp;&amp; (flags &amp; RDT_DELETED)
    /* kn-&gt;active++ */
    kernfs_unbreak_active_protection(kn)
    kfree(rdtgrp)

Fix it by moving free_all_child_rdtgrp() to after kernfs_remove() in
rdtgroup_rmdir_ctrl() to ensure it has the accurate refcount of rdtgrp.

Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Suggested-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Xiaochen Shen &lt;xiaochen.shen@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-3-git-send-email-xiaochen.shen@intel.com
</content>
</entry>
<entry>
<title>x86/resctrl: Fix use-after-free when deleting resource groups</title>
<updated>2020-01-20T15:45:43+00:00</updated>
<author>
<name>Xiaochen Shen</name>
<email>xiaochen.shen@intel.com</email>
</author>
<published>2020-01-08T16:28:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=b8511ccc75c033f6d54188ea4df7bf1e85778740'/>
<id>urn:sha1:b8511ccc75c033f6d54188ea4df7bf1e85778740</id>
<content type='text'>
A resource group (rdtgrp) contains a reference count (rdtgrp-&gt;waitcount)
that indicates how many waiters expect this rdtgrp to exist. Waiters
could be waiting on rdtgroup_mutex or some work sitting on a task's
workqueue for when the task returns from kernel mode or exits.

The deletion of a rdtgrp is intended to have two phases:

  (1) while holding rdtgroup_mutex the necessary cleanup is done and
  rdtgrp-&gt;flags is set to RDT_DELETED,

  (2) after releasing the rdtgroup_mutex, the rdtgrp structure is freed
  only if there are no waiters and its flag is set to RDT_DELETED. Upon
  gaining access to rdtgroup_mutex or rdtgrp, a waiter is required to check
  for the RDT_DELETED flag.

When unmounting the resctrl file system or deleting ctrl_mon groups,
all of the subdirectories are removed and the data structure of rdtgrp
is forcibly freed without checking rdtgrp-&gt;waitcount. If at this point
there was a waiter on rdtgrp then a use-after-free issue occurs when the
waiter starts running and accesses the rdtgrp structure it was waiting
on.

See kfree() calls in [1], [2] and [3] in these two call paths in
following scenarios:
(1) rdt_kill_sb() -&gt; rmdir_all_sub() -&gt; free_all_child_rdtgrp()
(2) rdtgroup_rmdir() -&gt; rdtgroup_rmdir_ctrl() -&gt; free_all_child_rdtgrp()

There are several scenarios that result in use-after-free issue in
following:

Scenario 1:
-----------
In Thread 1, rdtgroup_tasks_write() adds a task_work callback
move_myself(). If move_myself() is scheduled to execute after Thread 2
rdt_kill_sb() is finished, referring to earlier rdtgrp memory
(rdtgrp-&gt;waitcount) which was already freed in Thread 2 results in
use-after-free issue.

Thread 1 (rdtgroup_tasks_write)        Thread 2 (rdt_kill_sb)
-------------------------------        ----------------------
rdtgroup_kn_lock_live
  atomic_inc(&amp;rdtgrp-&gt;waitcount)
  mutex_lock
rdtgroup_move_task
  __rdtgroup_move_task
    /*
     * Take an extra refcount, so rdtgrp cannot be freed
     * before the call back move_myself has been invoked
     */
    atomic_inc(&amp;rdtgrp-&gt;waitcount)
    /* Callback move_myself will be scheduled for later */
    task_work_add(move_myself)
rdtgroup_kn_unlock
  mutex_unlock
  atomic_dec_and_test(&amp;rdtgrp-&gt;waitcount)
  &amp;&amp; (flags &amp; RDT_DELETED)
                                       mutex_lock
                                       rmdir_all_sub
                                         /*
                                          * sentry and rdtgrp are freed
                                          * without checking refcount
                                          */
                                         free_all_child_rdtgrp
                                           kfree(sentry)*[1]
                                         kfree(rdtgrp)*[2]
                                       mutex_unlock
/*
 * Callback is scheduled to execute
 * after rdt_kill_sb is finished
 */
move_myself
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [1] or [2].
   */
  atomic_dec_and_test(&amp;rdtgrp-&gt;waitcount)
  &amp;&amp; (flags &amp; RDT_DELETED)
    kfree(rdtgrp)

Scenario 2:
-----------
In Thread 1, rdtgroup_tasks_write() adds a task_work callback
move_myself(). If move_myself() is scheduled to execute after Thread 2
rdtgroup_rmdir() is finished, referring to earlier rdtgrp memory
(rdtgrp-&gt;waitcount) which was already freed in Thread 2 results in
use-after-free issue.

Thread 1 (rdtgroup_tasks_write)        Thread 2 (rdtgroup_rmdir)
-------------------------------        -------------------------
rdtgroup_kn_lock_live
  atomic_inc(&amp;rdtgrp-&gt;waitcount)
  mutex_lock
rdtgroup_move_task
  __rdtgroup_move_task
    /*
     * Take an extra refcount, so rdtgrp cannot be freed
     * before the call back move_myself has been invoked
     */
    atomic_inc(&amp;rdtgrp-&gt;waitcount)
    /* Callback move_myself will be scheduled for later */
    task_work_add(move_myself)
rdtgroup_kn_unlock
  mutex_unlock
  atomic_dec_and_test(&amp;rdtgrp-&gt;waitcount)
  &amp;&amp; (flags &amp; RDT_DELETED)
                                       rdtgroup_kn_lock_live
                                         atomic_inc(&amp;rdtgrp-&gt;waitcount)
                                         mutex_lock
                                       rdtgroup_rmdir_ctrl
                                         free_all_child_rdtgrp
                                           /*
                                            * sentry is freed without
                                            * checking refcount
                                            */
                                           kfree(sentry)*[3]
                                         rdtgroup_ctrl_remove
                                           rdtgrp-&gt;flags = RDT_DELETED
                                       rdtgroup_kn_unlock
                                         mutex_unlock
                                         atomic_dec_and_test(
                                                     &amp;rdtgrp-&gt;waitcount)
                                         &amp;&amp; (flags &amp; RDT_DELETED)
                                           kfree(rdtgrp)
/*
 * Callback is scheduled to execute
 * after rdt_kill_sb is finished
 */
move_myself
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [3].
   */
  atomic_dec_and_test(&amp;rdtgrp-&gt;waitcount)
  &amp;&amp; (flags &amp; RDT_DELETED)
    kfree(rdtgrp)

If CONFIG_DEBUG_SLAB=y, Slab corruption on kmalloc-2k can be observed
like following. Note that "0x6b" is POISON_FREE after kfree(). The
corrupted bits "0x6a", "0x64" at offset 0x424 correspond to
waitcount member of struct rdtgroup which was freed:

  Slab corruption (Not tainted): kmalloc-2k start=ffff9504c5b0d000, len=2048
  420: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkjkkkkkkkkkkk
  Single bit error detected. Probably bad RAM.
  Run memtest86+ or a similar memory test tool.
  Next obj: start=ffff9504c5b0d800, len=2048
  000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

  Slab corruption (Not tainted): kmalloc-2k start=ffff9504c58ab800, len=2048
  420: 6b 6b 6b 6b 64 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkdkkkkkkkkkkk
  Prev obj: start=ffff9504c58ab000, len=2048
  000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

Fix this by taking reference count (waitcount) of rdtgrp into account in
the two call paths that currently do not do so. Instead of always
freeing the resource group it will only be freed if there are no waiters
on it. If there are waiters, the resource group will have its flags set
to RDT_DELETED.

It will be left to the waiter to free the resource group when it starts
running and finding that it was the last waiter and the resource group
has been removed (rdtgrp-&gt;flags &amp; RDT_DELETED) since. (1) rdt_kill_sb()
-&gt; rmdir_all_sub() -&gt; free_all_child_rdtgrp() (2) rdtgroup_rmdir() -&gt;
rdtgroup_rmdir_ctrl() -&gt; free_all_child_rdtgrp()

Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Suggested-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Xiaochen Shen &lt;xiaochen.shen@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-2-git-send-email-xiaochen.shen@intel.com
</content>
</entry>
<entry>
<title>x86/resctrl: Add task resctrl information display</title>
<updated>2020-01-20T15:19:10+00:00</updated>
<author>
<name>Chen Yu</name>
<email>yu.c.chen@intel.com</email>
</author>
<published>2020-01-15T09:28:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=e79f15a4598c1f3f3f7f3319ca308c63c91fdaf2'/>
<id>urn:sha1:e79f15a4598c1f3f3f7f3319ca308c63c91fdaf2</id>
<content type='text'>
Monitoring tools that want to find out which resctrl control and monitor
groups a task belongs to must currently read the "tasks" file in every
group until they locate the process ID.

Add an additional file /proc/{pid}/cpu_resctrl_groups to provide this
information:

1)   res:
     mon:

resctrl is not available.

2)   res:/
     mon:

Task is part of the root resctrl control group, and it is not associated
to any monitor group.

3)  res:/
    mon:mon0

Task is part of the root resctrl control group and monitor group mon0.

4)  res:group0
    mon:

Task is part of resctrl control group group0, and it is not associated
to any monitor group.

5) res:group0
   mon:mon1

Task is part of resctrl control group group0 and monitor group mon1.

Signed-off-by: Chen Yu &lt;yu.c.chen@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Tested-by: Jinshi Chen &lt;jinshi.chen@intel.com&gt;
Link: https://lkml.kernel.org/r/20200115092851.14761-1-yu.c.chen@intel.com
</content>
</entry>
</feed>
