summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/i915/gt/intel_hangcheck.c
Commit message (Collapse)AuthorAgeFilesLines
* drm/i915/gt: Replace hangcheck by heartbeatsChris Wilson2019-10-231-361/+0
| | | | | | | | | | | | | | | Replace sampling the engine state every so often with a periodic heartbeat request to measure the health of an engine. This is coupled with the forced-preemption to allow long running requests to survive so long as they do not block other users. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-5-chris@chris-wilson.co.uk
* drm/i915: Pass in intel_gt at some for_each_engine sitesTvrtko Ursulin2019-10-181-2/+2
| | | | | | | | | | | | | | Where the function, or code segment, operates on intel_gt, we need to start passing it instead of i915 to for_each_engine(_masked). This is another partial step in migration of i915->engines[] to gt->engines[]. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191017094500.21831-2-tvrtko.ursulin@linux.intel.com
* drm/i915: Make for_each_engine_masked work on intel_gtTvrtko Ursulin2019-10-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Medium term goal is to eliminate the i915->engine[] array and to get there we have recently introduced equivalent array in intel_gt. Now we need to migrate the code further towards this state. This next step is to eliminate usage of i915->engines[] from the for_each_engine_masked iterator. For this to work we also need to use engine->id as index when populating the gt->engine[] array and adjust the default engine set indexing to use engine->legacy_idx instead of assuming gt->engines[] indexing. v2: * Populate gt->engine[] earlier. * Check that we don't duplicate engine->legacy_idx v3: * Work around the initialization order issue between default_engines() and intel_engines_driver_register() which sets engine->legacy_idx for now. It will be fixed properly later. v4: * Merge with forgotten v2.5. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191017161852.8836-1-tvrtko.ursulin@linux.intel.com
* drm/i915/gt: Prefer local path to runtime powermanagementChris Wilson2019-10-071-2/+2
| | | | | | | | | | | | | Avoid going to the base i915 device when we already have a path from gt to the runtime powermanagement interface. The benefit is that it looks a bit more self-consistent to always be acquiring the gt->uncore->rpm for use with the gt->uncore. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191007154531.1750-1-chris@chris-wilson.co.uk
* drm/i915: Don't disable interrupts for intel_engine_breadcrumbs_irq()Sebastian Andrzej Siewior2019-09-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function intel_engine_breadcrumbs_irq() is always invoked from an interrupt handler and for that reason it invokes (as an optimisation) only spin_lock() for locking assuming that the interrupts are already disabled. The function intel_engine_signal_breadcrumbs() is provided to disable interrupts while the former function is invoked so that assumption is also true for callers from preemptible context. On PREEMPT_RT local_irq_disable() really disables interrupts and this forbids to invoke spin_lock() which becomes a sleeping spinlock. This is also problematic with `threadirqs' in conjunction with irq_work. With force threading the interrupt handler, the handler is invoked with disabled BH but with interrupts enabled. This is okay and the lock itself is never acquired in IRQ context. This changes with irq_work (signal_irq_work()) which _still_ invokes intel_engine_breadcrumbs_irq() from IRQ context. Lockdep should see this and complain. Acquire the locks in intel_engine_breadcrumbs_irq() with _irqsave() suffix and let all callers invoke intel_engine_breadcrumbs_irq() directly instead using intel_engine_signal_breadcrumbs(). Reported-by: Clark Williams <williams@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190926105644.16703-2-bigeasy@linutronix.de
* drm/i915: Refactor instdone loops on new subslice functionsStuart Summers2019-08-231-1/+2
| | | | | | | | | | Refactor instdone loops to use the new intel_sseu_has_subslice function. Signed-off-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190823160307.180813-10-stuart.summers@intel.com
* drm/i915/gt: Use intel_gt as the primary object for handling resetsChris Wilson2019-07-121-26/+42
| | | | | | | | | | | | | Having taken the first step in encapsulating the functionality by moving the related files under gt/, the next step is to start encapsulating by passing around the relevant structs rather than the global drm_i915_private. In this step, we pass intel_gt to intel_reset.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190712192953.9187-1-chris@chris-wilson.co.uk
* drm/i915/hangcheck: Look at instdone for all enginesTvrtko Ursulin2019-07-041-3/+0
| | | | | | | | | | | | | | | | It seems intel_engine_get_instdone is able to get instdone for all engines but intel_hangcheck.c/subunits_stuck decides to ignore it for non render. We can just drop the check in subunits_stuck since the checks on unavailable fields will always return stuck, which when bitwise and with the potential unstuck instdone is harmless. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190703144116.15593-1-tvrtko.ursulin@linux.intel.com
* drm/i915: update rpm_get/put to use the rpm structureDaniele Ceraolo Spurio2019-06-141-2/+2
| | | | | | | | | | | | | | The functions where internally already only using the structure, so we need to just flip the interface. v2: rebase Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190613232156.34940-7-daniele.ceraolospurio@intel.com
* drm/i915: Tidy engine mask types in hangcheckTvrtko Ursulin2019-06-071-3/+3
| | | | | | | | We can use intel_engine_mask_t to align with the rest of the codebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190607082557.31670-2-tvrtko.ursulin@linux.intel.com
* Revert "drm/i915: Expand subslice mask"Jani Nikula2019-05-291-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 1ac159e23c2c ("drm/i915: Expand subslice mask"), which kills ICL due to GEM_BUG_ON() sanity checks before CI even gets a chance to do anything. The commit exposes an issue in commit 1e40d4aea57b ("drm/i915/cnl: Implement WaProgramMgsrForCorrectSliceSpecificMmioReads"), which will also need to be addressed. There's a proposed fix [1], but considering the seeming uncertainty with the fix as well as the size of the regressing commit (in this context, the one that actually brings down ICL), this warrants a revert to get ICL working, and gives us time to get all of this right without rushing. Even if this means shooting the messenger. <3>[ 9.426327] intel_sseu_get_subslices:46 GEM_BUG_ON(slice >= sseu->max_slices) <4>[ 9.426355] ------------[ cut here ]------------ <2>[ 9.426357] kernel BUG at drivers/gpu/drm/i915/gt/intel_sseu.c:46! <4>[ 9.426371] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI <4>[ 9.426377] CPU: 1 PID: 364 Comm: systemd-udevd Not tainted 5.2.0-rc2-CI-CI_DRM_6159+ #1 <4>[ 9.426385] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019 <4>[ 9.426444] RIP: 0010:intel_sseu_get_subslices+0x8a/0xe0 [i915] <4>[ 9.426452] Code: d5 76 b7 e0 48 8b 35 9d 24 21 00 49 c7 c0 07 f0 72 a0 b9 2e 00 00 00 48 c7 c2 00 8e 6d a0 48 c7 c7 a5 14 5b a0 e8 36 3c be e0 <0f> 0b 48 c7 c1 80 d5 6f a0 ba 30 00 00 00 48 c7 c6 00 8e 6d a0 48 <4>[ 9.426468] RSP: 0018:ffffc9000037b9c8 EFLAGS: 00010282 <4>[ 9.426475] RAX: 000000000000000f RBX: 0000000000000000 RCX: 0000000000000000 <4>[ 9.426482] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88849e346f98 <4>[ 9.426490] RBP: ffff88848a200000 R08: 0000000000000004 R09: ffff88849d50b000 <4>[ 9.426497] R10: 0000000000000000 R11: ffff88849e346f98 R12: ffff88848a209e78 <4>[ 9.426505] R13: 0000000003000000 R14: ffff88848a20b1a8 R15: 0000000000000000 <4>[ 9.426513] FS: 00007f73d5ae8680(0000) GS:ffff88849fc80000(0000) knlGS:0000000000000000 <4>[ 9.426521] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 9.426527] CR2: 0000561417b01260 CR3: 0000000494764003 CR4: 0000000000760ee0 <4>[ 9.426535] PKRU: 55555554 <4>[ 9.426538] Call Trace: <4>[ 9.426585] wa_init_mcr+0xd5/0x110 [i915] <4>[ 9.426597] ? lock_acquire+0xa6/0x1c0 <4>[ 9.426645] icl_gt_workarounds_init+0x21/0x1a0 [i915] <4>[ 9.426694] ? i915_driver_load+0xfcf/0x18a0 [i915] <4>[ 9.426739] gt_init_workarounds+0x14c/0x230 [i915] <4>[ 9.426748] ? _raw_spin_unlock_irq+0x24/0x50 <4>[ 9.426789] intel_gt_init_workarounds+0x1b/0x30 [i915] <4>[ 9.426835] i915_driver_load+0xfd7/0x18a0 [i915] <4>[ 9.426843] ? lock_acquire+0xa6/0x1c0 <4>[ 9.426850] ? __pm_runtime_resume+0x4f/0x80 <4>[ 9.426857] ? _raw_spin_unlock_irqrestore+0x4c/0x60 <4>[ 9.426863] ? _raw_spin_unlock_irqrestore+0x4c/0x60 <4>[ 9.426870] ? lockdep_hardirqs_on+0xe3/0x1b0 <4>[ 9.426915] i915_pci_probe+0x29/0xa0 [i915] <4>[ 9.426923] pci_device_probe+0x9e/0x120 <4>[ 9.426930] really_probe+0xea/0x3c0 <4>[ 9.426936] driver_probe_device+0x10b/0x120 <4>[ 9.426942] device_driver_attach+0x4a/0x50 <4>[ 9.426948] __driver_attach+0x97/0x130 <4>[ 9.426954] ? device_driver_attach+0x50/0x50 <4>[ 9.426960] bus_for_each_dev+0x74/0xc0 <4>[ 9.426966] bus_add_driver+0x13f/0x210 <4>[ 9.426971] ? 0xffffffffa083b000 <4>[ 9.426976] driver_register+0x56/0xe0 <4>[ 9.426982] ? 0xffffffffa083b000 <4>[ 9.426987] do_one_initcall+0x58/0x300 <4>[ 9.426994] ? do_init_module+0x1d/0x1f6 <4>[ 9.427001] ? rcu_read_lock_sched_held+0x6f/0x80 <4>[ 9.427007] ? kmem_cache_alloc_trace+0x261/0x290 <4>[ 9.427014] do_init_module+0x56/0x1f6 <4>[ 9.427020] load_module+0x24d1/0x2990 <4>[ 9.427032] ? __se_sys_finit_module+0xd3/0xf0 <4>[ 9.427037] __se_sys_finit_module+0xd3/0xf0 <4>[ 9.427047] do_syscall_64+0x55/0x1c0 <4>[ 9.427053] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 9.427059] RIP: 0033:0x7f73d5609839 <4>[ 9.427064] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 <4>[ 9.427082] RSP: 002b:00007ffdf34477b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 <4>[ 9.427091] RAX: ffffffffffffffda RBX: 00005559fd5d7b40 RCX: 00007f73d5609839 <4>[ 9.427099] RDX: 0000000000000000 RSI: 00007f73d52e8145 RDI: 000000000000000f <4>[ 9.427106] RBP: 00007f73d52e8145 R08: 0000000000000000 R09: 00007ffdf34478d0 <4>[ 9.427114] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000 <4>[ 9.427121] R13: 00005559fd5c90f0 R14: 0000000000020000 R15: 00005559fd5d7b40 <4>[ 9.427131] Modules linked in: i915(+) mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_codec snd_hwdep e1000e snd_hda_core ghash_clmulni_intel ptp snd_pcm cdc_ether usbnet mii pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc <4>[ 9.427254] ---[ end trace af3eeb543bd66e66 ]--- [1] http://patchwork.freedesktop.org/patch/msgid/20190528200655.11605-1-chris@chris-wilson.co.uk References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6159/fi-icl-u2/pstore0-1517155098_Oops_1.log References: 1e40d4aea57b ("drm/i915/cnl: Implement WaProgramMgsrForCorrectSliceSpecificMmioReads") Fixes: 1ac159e23c2c ("drm/i915: Expand subslice mask") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Manasi Navare <manasi.d.navare@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Yunwei Zhang <yunwei.zhang@intel.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190529082150.31526-1-jani.nikula@intel.com
* drm/i915: Expand subslice maskStuart Summers2019-05-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the subslice_mask runtime parameter is stored as an array of subslices per slice. Expand the subslice mask array to better match what is presented to userspace through the I915_QUERY_TOPOLOGY_INFO ioctl. The index into this array is then calculated: slice * subslice stride + subslice index / 8 v2: fix spacing in set_sseu_info args use set_sseu_info to initialize sseu data when building device status in debugfs rename variables in intel_engine_types.h to avoid checkpatch warnings v3: update headers in intel_sseu.h v4: add const to some sseu_dev_info variables use sseu->eu_stride for EU stride calculations v5: address review comments from Tvrtko and Daniele v6: remove extra space in intel_sseu_get_subslices return the correct subslice enable in for_each_instdone add GEM_BUG_ON to ensure user doesn't pass invalid ss_mask size use printk formatted string for subslice mask v7: remove string.h header and rebase Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190524154022.13575-6-stuart.summers@intel.com
* drm/i915/hangcheck: Replace hangcheck.seqno with RING_HEADChris Wilson2019-05-081-4/+4
| | | | | | | | | | | | | | After realising we need to sample RING_START to detect context switches from preemption events that do not allow for the seqno to advance, we can also realise that the seqno itself is just a distance along the ring and so can be replaced by sampling RING_HEAD. v2: Bonus comment for the mystery separate CS_STALL before MI_USER_INTERRUPT Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190508080704.24223-1-chris@chris-wilson.co.uk
* drm/i915/hangcheck: Track context changesChris Wilson2019-05-031-3/+9
| | | | | | | | | | | | | | Given sufficient preemption, we may see a busy system that doesn't advance seqno while performing work across multiple contexts, and given sufficient pathology not even notice a change in ACTHD. What does change between the preempting contexts is their RING, so take note of that and treat a change in the ring address as being an indication of forward progress. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-1-chris@chris-wilson.co.uk
* drm/i915: Invert the GEM wakeref hierarchyChris Wilson2019-04-241-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current scheme, on submitting a request we take a single global GEM wakeref, which trickles down to wake up all GT power domains. This is undesirable as we would like to be able to localise our power management to the available power domains and to remove the global GEM operations from the heart of the driver. (The intent there is to push global GEM decisions to the boundary as used by the GEM user interface.) Now during request construction, each request is responsible via its logical context to acquire a wakeref on each power domain it intends to utilize. Currently, each request takes a wakeref on the engine(s) and the engines themselves take a chipset wakeref. This gives us a transition on each engine which we can extend if we want to insert more powermangement control (such as soft rc6). The global GEM operations that currently require a struct_mutex are reduced to listening to pm events from the chipset GT wakeref. As we reduce the struct_mutex requirement, these listeners should evaporate. Perhaps the biggest immediate change is that this removes the struct_mutex requirement around GT power management, allowing us greater flexibility in request construction. Another important knock-on effect, is that by tracking engine usage, we can insert a switch back to the kernel context on that engine immediately, avoiding any extra delay or inserting global synchronisation barriers. This makes tracking when an engine and its associated contexts are idle much easier -- important for when we forgo our assumed execution ordering and need idle barriers to unpin used contexts. In the process, it means we remove a large chunk of code whose only purpose was to switch back to the kernel context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
* drm/i915: Move GraphicsTechnology files under gt/Chris Wilson2019-04-241-0/+334
Start partitioning off the code that talks to the hardware (GT) from the uapi layers and move the device facing code under gt/ One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to subdivide that header and body further (and split out the submission code from the ringbuffer and logical context handling). This patch aims to be simple motion so git can fixup inflight patches with little mess. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
OpenPOWER on IntegriCloud