<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-op-linux/drivers/gpu/drm/i915/gt, branch master</title>
<subtitle>Talos™ II Linux sources for OpenPOWER</subtitle>
<id>https://git.raptorcs.com/git/talos-op-linux/atom?h=master</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-op-linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/'/>
<updated>2020-02-14T03:04:46+00:00</updated>
<entry>
<title>Merge tag 'drm-intel-next-fixes-2020-02-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes</title>
<updated>2020-02-14T03:04:46+00:00</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@redhat.com</email>
</author>
<published>2020-02-14T03:03:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=6f4134b30b6ee33e2fd4d602099e6c5e60d0351a'/>
<id>urn:sha1:6f4134b30b6ee33e2fd4d602099e6c5e60d0351a</id>
<content type='text'>
drm/i915 fixes for v5.6-rc2

Most of these were aimed at a "next fixes" pull already during the merge
window, but there were issues with the baseline I used, which resulted
in a lot of issues in CI. I've regenerated this stuff piecemeal now,
adding gradually to it, and it seems healthy now.

Due to the issues this is much bigger than I'd like. But it was
obviously necessary to take the time to ensure it's not garbage...

Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;

From: Jani Nikula &lt;jani.nikula@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/878sl6yfrn.fsf@intel.com
</content>
</entry>
<entry>
<title>drm/i915/execlists: Reclaim the hanging virtual request</title>
<updated>2020-02-12T15:03:53+00:00</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2020-01-22T14:02:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=a2f90f4ff3746c92896e2b7af8763d6fe5206dbc'/>
<id>urn:sha1:a2f90f4ff3746c92896e2b7af8763d6fe5206dbc</id>
<content type='text'>
If we encounter a hang on a virtual engine, as we process the hang the
request may already have been moved back to the virtual engine (we are
processing the hang on the physical engine). We need to reclaim the
request from the virtual engine so that the locking is consistent and
local to the real engine on which we will hold the request for error
state capturing.

v2: Pull the reclamation into execlists_hold() and assert that cannot be
called from outside of the reset (i.e. with the tasklet disabled).
v3: Added selftest
v4: Drop the reference owned by the virtual engine

Fixes: ad18ba7b5eeb ("drm/i915/execlists: Offline error capture")
Testcase: igt/gem_exec_balancer/hang
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Mika Kuoppala &lt;mika.kuoppala@linux.intel.com&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-2-chris@chris-wilson.co.uk
(cherry picked from commit 989df3a7bd2abe566521e61d1aebf603eb013b7f)
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/i915/execlists: Take a reference while capturing the guilty request</title>
<updated>2020-02-12T15:02:53+00:00</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2020-01-22T14:02:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=317e0395cc230736e474e499e01992bcdace5a73'/>
<id>urn:sha1:317e0395cc230736e474e499e01992bcdace5a73</id>
<content type='text'>
Thanks to preempt-to-busy, we leave the request on the HW as we submit
the preemption request. This means that the request may complete at any
moment as we process HW events, and in particular the request may be
retired as we are planning to capture it for a preemption timeout.

Be more careful while obtaining the request to capture after a
preemption timeout, and check to see if it completed before we were able
to put it on the on-hold list. If we do see it did complete just before
we capture the request, proclaim the preemption-timeout a false positive
and pardon the reset as we should hit an arbitration point momentarily
and so be able to process the preemption.

Note that even after we move the request to be on hold it may be retired
(as the reset to stop the HW comes after), so we do require to hold our
own reference as we work on the request for capture (and all of the
peeking at state within the request needs to be carefully protected).

Fixes: c3f1ed90e6ff ("drm/i915/gt: Allow temporary suspension of inflight requests")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-1-chris@chris-wilson.co.uk
(cherry picked from commit 4ba5c086a1d8e38d6927967ae1a3271a6ab7a927)
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/i915/execlists: Offline error capture</title>
<updated>2020-02-12T14:55:58+00:00</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2020-01-16T18:47:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=ad18ba7b5eebf58209a898de8519f6bb2280620b'/>
<id>urn:sha1:ad18ba7b5eebf58209a898de8519f6bb2280620b</id>
<content type='text'>
Currently, we skip error capture upon forced preemption. We apply forced
preemption when there is a higher priority request that should be
running but is being blocked, and we skip inline error capture so that
the preemption request is not further delayed by a user controlled
capture -- extending the denial of service.

However, preemption reset is also used for heartbeats and regular GPU
hangs. By skipping the error capture, we remove the ability to debug GPU
hangs.

In order to capture the error without delaying the preemption request
further, we can do an out-of-line capture by removing the guilty request
from the execution queue and scheduling a worker to dump that request.
When removing a request, we need to remove the entire context and all
descendants from the execution queue, so that they do not jump past.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/738
Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Mika Kuoppala &lt;mika.kuoppala@linux.intel.com&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk
(cherry picked from commit 748317386afb235e11616098d2c7772e49776b58)
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/i915/gt: Allow temporary suspension of inflight requests</title>
<updated>2020-02-12T14:55:58+00:00</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2020-01-16T18:47:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=c3f1ed90e6ffbf4e22010522351779f920e53d0d'/>
<id>urn:sha1:c3f1ed90e6ffbf4e22010522351779f920e53d0d</id>
<content type='text'>
In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).

Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.

v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!

References: https://gitlab.freedesktop.org/drm/intel/issues/738
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk
(cherry picked from commit 32ff621fd74496f0c33644125fb69ff175859b1f)
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/i915: Keep track of request among the scheduling lists</title>
<updated>2020-02-12T14:55:58+00:00</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2020-01-16T18:47:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=9e2750fc80b5cc606365201132d49fed00570dd1'/>
<id>urn:sha1:9e2750fc80b5cc606365201132d49fed00570dd1</id>
<content type='text'>
If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list or in the queue.

v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.

Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Mika Kuoppala &lt;mika.kuoppala@linux.intel.com&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
(cherry picked from commit 672c368f9398042b629740cc9816e8e939eff2db)
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/i915/gt: Acquire ce-&gt;active before ce-&gt;pin_count/ce-&gt;pin_mutex</title>
<updated>2020-02-12T11:24:45+00:00</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2020-01-27T15:28:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=5b92415e64e145e7da60420ead66b62aa41917bf'/>
<id>urn:sha1:5b92415e64e145e7da60420ead66b62aa41917bf</id>
<content type='text'>
Similar to commit ac0e331a628b ("drm/i915: Tighten atomicity of
i915_active_acquire vs i915_active_release") we have the same race of
trying to pin the context underneath a mutex while allowing the
decrement to be atomic outside of that mutex. This leads to the problem
where two threads may simultaneously try to pin the context and the
second not notice that they needed to repin the context.

&lt;2&gt; [198.669621] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:387!
&lt;4&gt; [198.669703] invalid opcode: 0000 [#1] PREEMPT SMP PTI
&lt;4&gt; [198.669712] CPU: 0 PID: 1246 Comm: gem_exec_create Tainted: G     U  W         5.5.0-rc6-CI-CI_DRM_7755+ #1
&lt;4&gt; [198.669723] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
&lt;4&gt; [198.669776] RIP: 0010:timeline_advance+0x7b/0xe0 [i915]
&lt;4&gt; [198.669785] Code: 00 48 c7 c2 10 f1 46 a0 48 c7 c7 70 1b 32 a0 e8 bb dd e7 e0 bf 01 00 00 00 e8 d1 af e7 e0 31 f6 bf 09 00 00 00 e8 35 ef d8 e0 &lt;0f&gt; 0b 48 c7 c1 48 fa 49 a0 ba 84 01 00 00 48 c7 c6 10 f1 46 a0 48
&lt;4&gt; [198.669803] RSP: 0018:ffffc900004c3a38 EFLAGS: 00010296
&lt;4&gt; [198.669810] RAX: ffff888270b35140 RBX: ffff88826f32ee00 RCX: 0000000000000006
&lt;4&gt; [198.669818] RDX: 00000000000017c5 RSI: 0000000000000000 RDI: 0000000000000009
&lt;4&gt; [198.669826] RBP: ffffc900004c3a64 R08: 0000000000000000 R09: 0000000000000000
&lt;4&gt; [198.669834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88826f9b5980
&lt;4&gt; [198.669841] R13: 0000000000000cc0 R14: ffffc900004c3dc0 R15: ffff888253610068
&lt;4&gt; [198.669849] FS:  00007f63e663fe40(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000
&lt;4&gt; [198.669857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&lt;4&gt; [198.669864] CR2: 00007f171f8e39a8 CR3: 000000026b1f6005 CR4: 00000000003606f0
&lt;4&gt; [198.669872] Call Trace:
&lt;4&gt; [198.669924]  intel_timeline_get_seqno+0x12/0x40 [i915]
&lt;4&gt; [198.669977]  __i915_request_create+0x76/0x5a0 [i915]
&lt;4&gt; [198.670024]  i915_request_create+0x86/0x1c0 [i915]
&lt;4&gt; [198.670068]  i915_gem_do_execbuffer+0xbf2/0x2500 [i915]
&lt;4&gt; [198.670082]  ? __lock_acquire+0x460/0x15d0
&lt;4&gt; [198.670128]  i915_gem_execbuffer2_ioctl+0x11f/0x470 [i915]
&lt;4&gt; [198.670171]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
&lt;4&gt; [198.670181]  drm_ioctl_kernel+0xa7/0xf0
&lt;4&gt; [198.670188]  drm_ioctl+0x2e1/0x390
&lt;4&gt; [198.670233]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]

Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: ac0e331a628b ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release")
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20200127152829.2842149-1-chris@chris-wilson.co.uk
(cherry picked from commit e5429340bfa2dc43a07c3329e0c30cdae4cc0b35)
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/i915/execlists: Leave resetting ring to intel_ring</title>
<updated>2020-02-11T09:49:16+00:00</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2020-01-15T17:58:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=a754012b9f2323a5d640da7eb7b095ac3b8cd012'/>
<id>urn:sha1:a754012b9f2323a5d640da7eb7b095ac3b8cd012</id>
<content type='text'>
We need to allow concurrent intel_context_unpin, which means avoiding
doing destructive operations like intel_ring_reset(). This was already
fixed for intel_ring_unpin() in commit 0725d9a31869 ("drm/i915/gt: Make
intel_ring_unpin() safe for concurrent pint"), but I overlooked that
execlists_context_unpin() also made the same mistake.

Reported-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: 0725d9a31869 ("drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint")
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Reviewed-by: Mika Kuoppala &lt;mika.kuoppala@linux.intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20200115175829.2761329-1-chris@chris-wilson.co.uk
(cherry picked from commit f3c0efc9fe7a4e61544034f525348a3aa86ac5aa)
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/i915/gt: Use the BIT when checking the flags, not the index</title>
<updated>2020-02-10T12:45:43+00:00</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2020-01-15T12:25:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=1b5af53781654706e1a4ee479274f8a3e3f74c01'/>
<id>urn:sha1:1b5af53781654706e1a4ee479274f8a3e3f74c01</id>
<content type='text'>
In converting over to using set_bit()/test_bit(), when manually
inspecting the rq-&gt;fence.flags, we need to use BIT().

Fixes: e1c31fb5dde3 ("drm/i915: Merge i915_request.flags with i915_request.fence.flags")
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Cc: Matthew Auld &lt;matthew.auld@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20200115122509.2673075-1-chris@chris-wilson.co.uk
(cherry picked from commit 72ff2b8d5f2dcb09bfa37b902c23311eec426496)
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/i915/selftests: Add a mock i915_vma to the mock_ring</title>
<updated>2020-02-10T12:45:37+00:00</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2020-01-14T16:00:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=1fdea0cb0dba0d42ffcfb619b349c1a2afa2492e'/>
<id>urn:sha1:1fdea0cb0dba0d42ffcfb619b349c1a2afa2492e</id>
<content type='text'>
Add a i915_vma to the mock_engine/mock_ring so that the core code can
always assume the presence of ring-&gt;vma.

Fixes: 8ccfc20a7d56 ("drm/i915/gt: Mark ring-&gt;vma as active while pinned")
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Reviewed-by: Mika Kuoppala &lt;mika.kuoppala@linux.intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20200114160030.2468927-1-chris@chris-wilson.co.uk
(cherry picked from commit b63b4feaef7363d2cf46dd76bb6e87e060b2b0de)
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
</content>
</entry>
</feed>
