From 87229ad9de079cb12ee09a3dc16113c390b729d5 Mon Sep 17 00:00:00 2001 From: Dave Airlie Date: Wed, 19 Sep 2012 11:12:41 +1000 Subject: drm: micro optimise cache flushing We hit this a lot with i915 and although we'd like to engineer things to hit it a lot less, this commit at least makes it consume a few less cycles. from something containing movzwl 0x0(%rip),%r10d to add %r8,%rdx I only noticed it while using perf to profile something else. Reviewed-by: Chris Wilson Signed-off-by: Dave Airlie --- drivers/gpu/drm/drm_cache.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'drivers/gpu/drm/drm_cache.c') diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c index 08758e061478..3dbc7f17eb11 100644 --- a/drivers/gpu/drm/drm_cache.c +++ b/drivers/gpu/drm/drm_cache.c @@ -37,12 +37,13 @@ drm_clflush_page(struct page *page) { uint8_t *page_virtual; unsigned int i; + const int size = boot_cpu_data.x86_clflush_size; if (unlikely(page == NULL)) return; page_virtual = kmap_atomic(page); - for (i = 0; i < PAGE_SIZE; i += boot_cpu_data.x86_clflush_size) + for (i = 0; i < PAGE_SIZE; i += size) clflush(page_virtual + i); kunmap_atomic(page_virtual); } -- cgit v1.2.1 From 9da3da660d8c19a54f6e93361d147509be3fff84 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Fri, 1 Jun 2012 15:20:22 +0100 Subject: drm/i915: Replace the array of pages with a scatterlist Rather than have multiple data structures for describing our page layout in conjunction with the array of pages, we can migrate all users over to a scatterlist. One major advantage, other than unifying the page tracking structures, this offers is that we replace the vmalloc'ed array (which can be up to a megabyte in size) with a chain of individual pages which helps reduce memory pressure. The disadvantage is that we then do not have a simple array to iterate, or to access randomly. The common case for this is in the relocation processing, which will typically fit within a single scatterlist page and so be almost the same cost as the simple array. For iterating over the array, the extra function call could be optimised away, but in reality is an insignificant cost of either binding the pages, or performing the pwrite/pread. v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the trivial compile error from rebasing. Signed-off-by: Chris Wilson Signed-off-by: Daniel Vetter --- drivers/gpu/drm/drm_cache.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) (limited to 'drivers/gpu/drm/drm_cache.c') diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c index 3dbc7f17eb11..4a4274b348b6 100644 --- a/drivers/gpu/drm/drm_cache.c +++ b/drivers/gpu/drm/drm_cache.c @@ -100,6 +100,31 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages) } EXPORT_SYMBOL(drm_clflush_pages); +void +drm_clflush_sg(struct sg_table *st) +{ +#if defined(CONFIG_X86) + if (cpu_has_clflush) { + struct scatterlist *sg; + int i; + + mb(); + for_each_sg(st->sgl, sg, st->nents, i) + drm_clflush_page(sg_page(sg)); + mb(); + + return; + } + + if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0) + printk(KERN_ERR "Timed out waiting for cache flush.\n"); +#else + printk(KERN_ERR "Architecture has no drm_cache.c support\n"); + WARN_ON_ONCE(1); +#endif +} +EXPORT_SYMBOL(drm_clflush_sg); + void drm_clflush_virt_range(char *addr, unsigned long length) { -- cgit v1.2.1