summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* vfio: powerpc/spapr: Rework groups attachingAlexey Kardashevskiy2015-06-111-16/+24
| | | | | | | | | | | | | | This is to make extended ownership and multiple groups support patches simpler for review. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* vfio: powerpc/spapr: Moving pinning/unpinning to helpersAlexey Kardashevskiy2015-06-111-20/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a pretty mechanical patch to make next patches simpler. New tce_iommu_unuse_page() helper does put_page() now but it might skip that after the memory registering patch applied. As we are here, this removes unnecessary checks for a value returned by pfn_to_page() as it cannot possibly return NULL. This moves tce_iommu_disable() later to let tce_iommu_clear() know if the container has been enabled because if it has not been, then put_page() must not be called on TCEs from the TCE table. This situation is not yet possible but it will after KVM acceleration patchset is applied. This changes code to work with physical addresses rather than linear mapping addresses for better code readability. Following patches will add an xchg() callback for an IOMMU table which will accept/return physical addresses (unlike current tce_build()) which will eliminate redundant conversions. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* vfio: powerpc/spapr: Disable DMA mappings on disabled containerAlexey Kardashevskiy2015-06-111-0/+6
| | | | | | | | | | | | | | | | At the moment DMA map/unmap requests are handled irrespective to the container's state. This allows the user space to pin memory which it might not be allowed to pin. This adds checks to MAP/UNMAP that the container is enabled, otherwise -EPERM is returned. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* vfio: powerpc/spapr: Move locked_vm accounting to helpersAlexey Kardashevskiy2015-06-111-19/+63
| | | | | | | | | | | | | | | | | | | | | | | | | There moves locked pages accounting to helpers. Later they will be reused for Dynamic DMA windows (DDW). This reworks debug messages to show the current value and the limit. This stores the locked pages number in the container so when unlocking the iommu table pointer won't be needed. This does not have an effect now but it will with the multiple tables per container as then we will allow attaching/detaching groups on fly and we may end up having a container with no group attached but with the counter incremented. While we are here, update the comment explaining why RLIMIT_MEMLOCK might be required to be bigger than the guest RAM. This also prints pid of the current process in pr_warn/pr_debug. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* vfio: powerpc/spapr: Use it_page_sizeAlexey Kardashevskiy2015-06-111-13/+13
| | | | | | | | | | | | | | | | This makes use of the it_page_size from the iommu_table struct as page size can differ. This replaces missing IOMMU_PAGE_SHIFT macro in commented debug code as recently introduced IOMMU_PAGE_XXX macros do not include IOMMU_PAGE_SHIFT. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* vfio: powerpc/spapr: Check that IOMMU page is fully contained by system pageAlexey Kardashevskiy2015-06-111-0/+16
| | | | | | | | | | | | | | | | | | This checks that the TCE table page size is not bigger that the size of a page we just pinned and going to put its physical address to the table. Otherwise the hardware gets unwanted access to physical memory between the end of the actual page and the end of the aligned up TCE page. Since compound_order() and compound_head() work correctly on non-huge pages, there is no need for additional check whether the page is huge. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driverAlexey Kardashevskiy2015-06-113-72/+67
| | | | | | | | | | | | | | | | | | | | | | This moves page pinning (get_user_pages_fast()/put_page()) code out of the platform IOMMU code and puts it to VFIO IOMMU driver where it belongs to as the platform code does not deal with page pinning. This makes iommu_take_ownership()/iommu_release_ownership() deal with the IOMMU table bitmap only. This removes page unpinning from iommu_take_ownership() as the actual TCE table might contain garbage and doing put_page() on it is undefined behaviour. Besides the last part, the rest of the patch is mechanical. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/iommu: Always release iommu_table in iommu_free_table()Alexey Kardashevskiy2015-06-111-3/+5
| | | | | | | | | | | | | | | At the moment iommu_free_table() only releases memory if the table was initialized for the platform code use, i.e. it had it_map initialized (which purpose is to track DMA memory space use). With dynamic DMA windows, we will need to be able to release iommu_table even if it was used for VFIO in which case it_map is NULL so does the patch. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/iommu: Put IOMMU group explicitlyAlexey Kardashevskiy2015-06-113-8/+20
| | | | | | | | | | | | | | | | | | | | So far an iommu_table lifetime was the same as PE. Dynamic DMA windows will change this and iommu_free_table() will not always require the group to be released. This moves iommu_group_put() out of iommu_free_table(). This adds a iommu_pseries_free_table() helper which does iommu_group_put() and iommu_free_table(). Later it will be changed to receive a table_group and we will have to change less lines then. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv/ioda: Clean up IOMMU group registrationAlexey Kardashevskiy2015-06-111-20/+8
| | | | | | | | | | | | | The existing code has 3 calls to iommu_register_group() and all 3 branches actually cover all possible cases. This replaces 3 calls with one and moves the registration earlier; the latter will make more sense when we add TCE table sharing. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/iommu/powernv: Get rid of set_iommu_table_base_and_groupAlexey Kardashevskiy2015-06-114-19/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The set_iommu_table_base_and_group() name suggests that the function sets table base and add a device to an IOMMU group. The actual purpose for table base setting is to put some reference into a device so later iommu_add_device() can get the IOMMU group reference and the device to the group. At the moment a group cannot be explicitly passed to iommu_add_device() as we want it to work from the bus notifier, we can fix it later and remove confusing calls of set_iommu_table_base(). This replaces set_iommu_table_base_and_group() with a couple of set_iommu_table_base() + iommu_add_device() which makes reading the code easier. This adds few comments why set_iommu_table_base() and iommu_add_device() are called where they are called. For IODA1/2, this essentially removes iommu_add_device() call from the pnv_pci_ioda_dma_dev_setup() as it will always fail at this particular place: - for physical PE, the device is already attached by iommu_add_device() in pnv_pci_ioda_setup_dma_pe(); - for virtual PE, the sysfs entries are not ready to create all symlinks so actual adding is happening in tce_iommu_bus_notifier. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/eeh/ioda2: Use device::iommu_group to check IOMMU groupAlexey Kardashevskiy2015-06-112-25/+6
| | | | | | | | | | | | | | | This relies on the fact that a PCI device always has an IOMMU table which may not be the case when we get dynamic DMA windows so let's use more reliable check for IOMMU group here. As we do not rely on the table presence here, remove the workaround from pnv_pci_ioda2_set_bypass(); also remove the @add_to_iommu_group parameter from pnv_ioda_setup_bus_dma(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* mtd: powernv: Add powernv flash MTD abstraction driverCyril Bur2015-06-113-0/+294
| | | | | | | | | | | | | | Powerpc powernv platforms allow access to certain system flash devices through a firmwarwe interface. This change adds an mtd driver for these flash devices. Minor updates from Jeremy Kerr and Joel Stanley. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Reviewed-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Add trace point for tracking hash pte faultAneesh Kumar K.V2015-06-102-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This enables us to understand how many hash fault we are taking when running benchmarks. For ex: -bash-4.2# ./perf stat -e powerpc:hash_fault -e page-faults /tmp/ebizzy.ppc64 -S 30 -P -n 1000 ... Performance counter stats for '/tmp/ebizzy.ppc64 -S 30 -P -n 1000': 1,10,04,075 powerpc:hash_fault 1,10,03,429 page-faults 30.865978991 seconds time elapsed NOTE: The impact of the tracepoint was not noticeable when running test. It was within the run-time variance of the test. For ex: without-patch: -------------- Performance counter stats for './a.out 3000 300': 643 page-faults # 0.089 M/sec 7.236562 task-clock (msec) # 0.928 CPUs utilized 2,179,213 stalled-cycles-frontend # 0.00% frontend cycles idle 17,174,367 stalled-cycles-backend # 0.00% backend cycles idle 0 context-switches # 0.000 K/sec 0.007794658 seconds time elapsed And with-patch: --------------- Performance counter stats for './a.out 3000 300': 643 page-faults # 0.089 M/sec 7.233746 task-clock (msec) # 0.921 CPUs utilized 0 context-switches # 0.000 K/sec 0.007854876 seconds time elapsed Performance counter stats for './a.out 3000 300': 643 page-faults # 0.087 M/sec 649 powerpc:hash_fault # 0.087 M/sec 7.430376 task-clock (msec) # 0.938 CPUs utilized 2,347,174 stalled-cycles-frontend # 0.00% frontend cycles idle 17,524,282 stalled-cycles-backend # 0.00% backend cycles idle 0 context-switches # 0.000 K/sec 0.007920284 seconds time elapsed Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add gitignore file for the new DSCR testsAnshuman Khandual2015-06-071-0/+7
| | | | | | | This patch adds .gitignore for all the newly added DSCR tests. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add thread based stress test for DSCR sysfs interfacesMichael Ellerman2015-06-072-1/+82
| | | | | | | | | | This patch adds a test to update the system wide DSCR value repeatedly and then verifies that any thread on any given CPU on the system must be able to see the same DSCR value whether its is being read through the problem state based SPR or the privilege state based SPR. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add test for all DSCR sysfs interfacesAnshuman Khandual2015-06-072-1/+98
| | | | | | | | | | This test continuously updates the system wide DSCR default value in the sysfs interface and makes sure that the same is reflected across all the sysfs interfaces for each individual CPUs present on the system. Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add test for DSCR inheritence across fork & execAnshuman Khandual2015-06-072-1/+118
| | | | | | | | | | | This patch adds a test case to verify that the changed DSCR value inside any process would be inherited to it's child across the fork and exec system call. Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add test for DSCR value inheritence across forkAnshuman Khandual2015-06-072-1/+97
| | | | | | | | | | | This patch adds a test to verify that the changed DSCR value inside any process would be inherited to it's child process across the fork system call. Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add test for DSCR SPR numbersAnshuman Khandual2015-06-072-1/+62
| | | | | | | | | | | | This patch adds a test which verifies that the DSCR privilege and problem state SPR read & write accesses while making sure that the results are always the same irrespective of which SPR number is being used. Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add test for explicitly changing DSCR valueAnshuman Khandual2015-06-072-1/+72
| | | | | | | | | | | This patch adds a test which modifies the DSCR using mtspr instruction and verifies the change using mfspr instruction. It uses both the privilege state SPR as well as the problem state SPR for the purpose. Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add test for system wide DSCR defaultAnshuman Khandual2015-06-074-1/+267
| | | | | | | | | | | | This patch adds a test case for the system wide DSCR default value, which when changed through it's sysfs interface must be visible to all threads reading DSCR either through the privilege state SPR or the problem state SPR. The DSCR value change should be immediate as well. Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/dscr: Add documentation for DSCR supportAnshuman Khandual2015-06-072-0/+85
| | | | | | | | | | | This patch adds a new documentation file explaining the DSCR support on powerpc platforms. This explains DSCR related data structure, code paths and also available user interfaces. Any further functional changes to the DSCR support in the kernel should definitely update the documentation here. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/dscr: Add some in-code documentationAnshuman Khandual2015-06-072-0/+47
| | | | | | | | This patch adds some in-code documentation to the DSCR related code to make it more readable without having any functional change to it. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/kernel: Rename PACA_DSCR to PACA_DSCR_DEFAULTAnshuman Khandual2015-06-074-5/+5
| | | | | | | | | PACA_DSCR offset macro tracks dscr_default element in the paca structure. Better change the name of this macro to match that of the data element it tracks. Makes the code more readable. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/kernel: Remove the unused extern dscr_defaultAnshuman Khandual2015-06-071-1/+0
| | | | | | | | | | | The process context switch code no longer uses dscr_default variable from the sysfs.c file. The variable became unused when we started storing the CPU specific DSCR value in the PACA structure instead. This patch just removes this extern declaration. It was originally added by the following commit. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Fix handling of DSCR related facility unavailable exceptionAnshuman Khandual2015-06-071-5/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently DSCR (Data Stream Control Register) can be accessed with mfspr or mtspr instructions inside a thread via two different SPR numbers. One being the user accessible problem state SPR number 0x03 and the other being the privilege state SPR number 0x11. All access through the privilege state SPR number get emulated through illegal instruction exception. Any access through the problem state SPR number raises one facility unavailable exception which sets the thread based dscr_inherit bit and enables DSCR facility through FSCR register thus allowing direct access to DSCR without going through this exception in the future. We set the thread.dscr_inherit bit whether the access was with mfspr or mtspr instruction which is neither correct nor does it match the behaviour through the instruction emulation code path driven from privilege state SPR number. User currently observes two different kind of behaviour when accessing the DSCR through these two SPR numbers. This problem can be observed through these two test cases by replacing the privilege state SPR number with the problem state SPR number. (1) http://ozlabs.org/~anton/junkcode/dscr_default_test.c (2) http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c This patch fixes the problem by making sure that the behaviour visible to the user remains the same irrespective of which SPR number is being used. Inside facility unavailable exception, we check whether it was cuased by a mfspr or a mtspr isntrucction. In case of mfspr instruction, just emulate the instruction. In case of mtspr instruction, set the thread based dscr_inherit bit and also enable the facility through FSCR. All user SPR based mfspr instruction will be emulated till one user SPR based mtspr has been executed. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Reset default context for vPHB on releaseMichael Neuling2015-06-071-0/+1
| | | | | | | | When we release the device, we should also invalidate the default context. With this cxl_get_context() will return null after removal. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/eeh: Fix trivial error in eeh_restore_dev_state()David Gibson2015-06-071-1/+1
| | | | | | | | | | | | | | | Commit 28158cd "powerpc/eeh: Enhance pcibios_set_pcie_reset_state()" introduced a fix for a problem where certain configurations could lead to pci_reset_function() destroying the state of PCI devices other than the one specified. Unfortunately, the fix has a trivial bug - it calls pci_save_state() again, when it should be calling pci_restore_state(). This corrects the problem. Cc: Gavin Shan <gwshan@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Add opal-prd channelJeremy Kerr2015-06-058-2/+535
| | | | | | | | | | | | | This change adds a char device to access the "PRD" (processor runtime diagnostics) channel to OPAL firmware. Includes contributions from Vaidyanathan Srinivasan, Neelesh Gupta & Vishal Kulkarni. Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Expose OPAL APIs required by PRD interfaceJeremy Kerr2015-06-051-0/+4
| | | | | | | | The (upcoming) opal-prd driver needs to access the message notifier and xscom code, so add EXPORT_SYMBOL_GPL macros for these. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Merge common platform device initialisationJeremy Kerr2015-06-051-15/+6
| | | | | | | | | opal_ipmi_init and opal_flash_init are equivalent, except for the compatbile string. Merge these two into a common opal_pdev_init function. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/config: Enable bnx2x on ppc64 and pseries defconfigsAnton Blanchard2015-06-042-0/+2
| | | | | Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: convert OPAL codes returned by sysparam callsCédric Le Goater2015-06-042-4/+9
| | | | | | | | The opal_{get,set}_param calls return internal error codes which need to be translated in errnos in Linux. Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Add AFU virtual PHB and kernel APIMichael Neuling2015-06-036-4/+822
| | | | | | | | | | | | | | | | | | | | | | | | | This patch does two things. Firstly it presents the Accelerator Function Unit (AFUs) behind the POWER Service Layer (PSL) as PCI devices on a virtual PCI Host Bridge (vPHB). This in in addition to the PSL being a PCI device itself. As part of the Coherent Accelerator Interface Architecture (CAIA) AFUs can provide an AFU configuration. This AFU configuration recored is architected to be the same as a PCI config space. This patch sets discovers the AFU configuration records, provides AFU config space read/write functions to these configuration records. It then enumerates the PCI bus. It also hooks in PCI ops where appropriate. It also destroys the vPHB when the physical card is removed. Secondly, it add an in kernel API for AFU to use CXL. AFUs must present a driver that firstly binds as a PCI device. This PCI device can then be using to do CXL specific operations (that can't sit in the PCI ops) using this API. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Export file ops for use by APIMichael Neuling2015-06-032-9/+23
| | | | | | | | | | | | The cxl kernel API will allow drivers other than cxl to export a file descriptor which has the same userspace API. These file descriptors will be able to be used against libcxl. This exports those file ops for use by other drivers. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Move include file cxl.h -> cxl-base.hMichael Neuling2015-06-0311-12/+12
| | | | | | | | | | | | | This moves the current include file from cxl.h -> cxl-base.h. This current include file is used only to pass information between the base driver that needs to be built into the kernel and the cxl module. This is to make way for a new include/misc/cxl.h which will contain just the kernel API for other driver to use Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Cleanup MakefileMichael Neuling2015-06-031-1/+2
| | | | | | | | Cleanup Makefile by fixing line wrapping. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Rework context lifetimesMichael Neuling2015-06-032-3/+3
| | | | | | | | | | | | | | This reworks contexts lifetimes a bit to enable the kernel API where we may want to reuse contexts. Here we will want to start and stop contexts without freeing them. Start context does the get pid & ctx so stop context will need to do the puts. Here we move put pid & ctx to the detach context path which will become part of the stop context path. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Configure PSL for kernel contexts and merge codeMichael Neuling2015-06-031-28/+35
| | | | | | | | | | | | | This updates AFU directed and dedicated modes for contexts attached to the kernel. The SR (similar to the MSR in the core) calculation is getting quite complex and is duplicated in AFU directed and dedicated modes. This patch also merges this SR calculation for these modes. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Split afu_register_irqs() functionMichael Neuling2015-06-032-7/+25
| | | | | | | | | Split the afu_register_irqs() function so that different parts can be useful elsewhere. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Only check pid for userspace contextsMichael Neuling2015-06-031-15/+19
| | | | | | | | | We only need to check the pid attached to this context for userspace contexts. Kernel contexts can skip this check. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Export some symbolsMichael Neuling2015-06-032-5/+10
| | | | | | | | | | Export some symbols which will soon be used elsewhere in this driver. Now they are global we rename them so to avoid collisions. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: cxl_afu_reset() -> __cxl_afu_reset()Michael Neuling2015-06-034-8/+8
| | | | | | | | | Rename cxl_afu_reset() to __cxl_afu_reset() to we can reuse this function name in the API. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Rework detach context functionsMichael Neuling2015-06-032-7/+14
| | | | | | | | | Rework __detach_context() and cxl_context_detach() so we can reuse them in the kernel API. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Add cookie parameter to afu_release_irqs()Michael Neuling2015-06-034-5/+5
| | | | | | | | | | | | | Add cookie parameter to afu_release_irqs() so that we can pass in a different cookie than the context structure. This will be useful for other kernel drivers that want to call this but get their own cookie back in the interrupt handler. Update all existing call sites. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Dump debug info on the AFU configuration recordMichael Neuling2015-06-031-1/+11
| | | | | | | | | Now that we parse the AFU Configuration record, dump some info on it when in debug mode. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Fix error path on probeMichael Neuling2015-06-031-0/+1
| | | | | | | | | | | | | | When probing we call pci_enable_device() but don't call pci_disable_device() on fail. This causes refcounting issues in the PCI subsystem if a second driver tries to bind to the same device. This patch adds the pci_disable_device() to the probe error path. This error path is hit when this cxl driver tries to bind to AFUs (on the vPHB) rather than the physical device. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Re-order card init to check the VSEC earlierIan Munsie2015-06-031-15/+15
| | | | | | | | | | | | | | | | When we expose AFUs as virtual PCI devices, they may look like the physical CAPI PCI card. ie they may have the same vendor/device IDs. We want to avoid these AFUs binding to this driver and any init this driver may do. Re-order card init to check the VSEC earlier before assigning BARs or activating CXL. Also change the dev used in early prints as the adapter struct may not be inited at this earlier stage. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Remove unnecessarily verbose print in cxl_remove()Michael Neuling2015-06-031-2/+0
| | | | | | Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
OpenPOWER on IntegriCloud