path: root/llvm/lib/Target/PowerPC
Commit message | Author | Age | Files | Lines
...
* [Target] remove TargetRecip class; 2nd try | Sanjay Patel | 2016-10-20 | 2 | -50/+23
  This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef.

  This version also renames a pair of functions:
    getRecipEstimateDivEnabled()
    getRecipEstimateSqrtEnabled()
  as suggested by Eric Christopher.

  original commit msg:

  [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

  This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions.

  This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this.

  If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings.

  As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place.

  Differential Revision: https://reviews.llvm.org/D25440
  llvm-svn: 284746
* Do a sweep over move ctors and remove those that are identical to the default. | Benjamin Kramer | 2016-10-20 | 1 | -7/+0
  All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go.

  No functionality change intended.

  llvm-svn: 284721
* revert r284495: [Target] remove TargetRecip class | Sanjay Patel | 2016-10-18 | 2 | -23/+50
  There's something wrong with the StringRef usage while parsing the attribute string.

  llvm-svn: 284513
* [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering | Sanjay Patel | 2016-10-18 | 2 | -50/+23
  This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions.

  This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this.

  If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings.

  As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place.

  Differential Revision: https://reviews.llvm.org/D25440
  llvm-svn: 284495
* [PPC] Shorter sequence to load 64bit constant with same hi/lo words | Guozhi Wei | 2016-10-14 | 1 | -2/+23
  This is a patch to implement pr30640.

  When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into the high word of the same register.

  This optimization caused a failure of test case bperm.ll because of a suboptimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and ORs them together. This optimization lowers the cost of loading the 64bit constant mask used in the AND method, and causes a different code sequence. But the ROTATE method is actually better in this test case. The reason is that in the ROTATE method the final OR operation can be avoided, since rldimi can insert the rotated bits into the target register directly. So this patch also enhances SelectAndParts64 to prefer the ROTATE method when the two methods have the same cost and there are multiple bit groups that need to be ORed together.

  Differential Revision: https://reviews.llvm.org/D25521
  llvm-svn: 284276
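
The rldimi trick from the commit above can be modelled in a few lines of Python. This is only a sketch of the instruction semantics, not the LLVM code: the helper names (rotl64, rldimi, materialize_same_halves) are made up for illustration, and the model assumes the mask does not wrap (mb <= 63 - sh).

```python
MASK64 = (1 << 64) - 1


def rotl64(value, shift):
    """Rotate a 64-bit value left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def rldimi(ra, rs, sh, mb):
    """Model of rldimi: rotate rs left by sh, then insert it into ra under a mask.

    The mask covers IBM bit positions mb..(63 - sh), where bit 0 is the most
    significant bit. Bits outside the mask keep their old value from ra.
    """
    mask = 0
    for bit in range(mb, 63 - sh + 1):
        mask |= 1 << (63 - bit)
    return (rotl64(rs, sh) & mask) | (ra & ~mask & MASK64)


def has_same_halves(imm):
    """True when the high and low 32-bit words of a 64-bit constant match."""
    return (imm >> 32) == (imm & 0xFFFFFFFF)


def materialize_same_halves(imm):
    """Build imm from its low word plus one rldimi (hypothetical helper)."""
    reg = imm & 0xFFFFFFFF          # stands in for the usual 32-bit load sequence
    return rldimi(reg, reg, 32, 0)  # copy the low word into the high word
```

With sh = 32 and mb = 0 the mask selects exactly the high word, so the low word is left in place and a rotated copy of it lands in the upper half.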
* [PPCMIPeephole] Fix splat elimination | Tim Shen | 2016-10-12 | 1 | -3/+5
  Summary:
  In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
    B = Splat A
    C = Splat B
  =>
    C = Splat A
  because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
    B = Splat A
    C = Splat B
  =>
    B = Splat A
    C = COPY B

  Fixes PR30663.

  Reviewers: echristo, iteratee, kbarton, nemanjai
  Subscribers: mehdi_amini, llvm-commits
  Differential Revision: https://reviews.llvm.org/D25493
  llvm-svn: 283961
* Revert r283690, "MC: Remove unused entities." | Peter Collingbourne | 2016-10-10 | 1 | -1/+1
  llvm-svn: 283814
* Move the global variables representing each Target behind accessor function | Mehdi Amini | 2016-10-09 | 7 | -23/+38
  This avoids "static initialization order fiasco"

  Differential Revision: https://reviews.llvm.org/D25412
  llvm-svn: 283702
* MC: Remove unused entities. | Peter Collingbourne | 2016-10-09 | 1 | -1/+1
  llvm-svn: 283691
* Target: Remove unused entities. | Peter Collingbourne | 2016-10-09 | 1 | -1/+1
  llvm-svn: 283690
* Revert "Revert "Add a static_assert to enforce that parameters to llvm::format() are not totally unsafe"" | Mehdi Amini | 2016-10-07 | 1 | -1/+2
  This reverts commit r283510 and reapply r283509, with updates to clang-tools-extra as well.

  llvm-svn: 283525
* Target: Remove unused patterns and transforms. NFC. | Peter Collingbourne | 2016-10-07 | 1 | -10/+0
  llvm-svn: 283515
* Revert "Add a static_assert to enforce that parameters to llvm::format() are not totally unsafe" | Mehdi Amini | 2016-10-06 | 1 | -2/+1
  This reverts commit r283509, clang is hitting the assert.

  llvm-svn: 283510
* Add a static_assert to enforce that parameters to llvm::format() are not totally unsafe | Mehdi Amini | 2016-10-06 | 1 | -1/+2
  Summary:
  I had for the second time today a bug where llvm::format("%s", Str) was called with Str being a StringRef. The Linux and MacOS bots were fine, but because Windows has a different calling convention, it printed garbage.

  Instead we can catch this at compile-time: I believe it is never expected to call a C vararg printf-like function with a non-scalar type.

  Reviewers: bogner, Bigcheese, dexonsmith
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D25266
  llvm-svn: 283509
* [Target] move reciprocal estimate settings from TargetOptions to TargetLowering | Sanjay Patel | 2016-10-04 | 2 | -19/+19
  The motivation for the change is that we can't have pseudo-global settings for codegen living in TargetOptions because that doesn't work with LTO.

  Ideally, these reciprocal attributes will be moved to the instruction-level via FMF, metadata, or something else. But making them function attributes is at least an improvement over the current state.

  The ingredients of this patch are:
  - Remove the reciprocal estimate command-line debug option.
  - Add TargetRecip to TargetLowering.
  - Remove TargetRecip from TargetOptions.
  - Clean up the TargetRecip implementation to work with this new scheme.
  - Set the default reciprocal settings in TargetLoweringBase (everything is off).
  - Update the PowerPC defaults, users, and tests.
  - Update the x86 defaults, users, and tests.

  Note that if this patch needs to be reverted, the related clang patch checked in at r283251 should be reverted too.

  Differential Revision: https://reviews.llvm.org/D24816
  llvm-svn: 283252
* [Power9] Exploit D-Form VSX Scalar memory ops that target full VSX register set | Nemanja Ivanovic | 2016-10-04 | 3 | -6/+81
  This patch corresponds to review:

  The newly added VSX D-Form (register + offset) memory ops target the upper half of the VSX register set. The existing ones target the lower half. In order to unify these and have the ability to target all the VSX registers using D-Form operations, this patch defines Pseudo-ops for the loads/stores which are expanded post-RA. The expansion then chooses the correct opcode based on the register that was allocated for the operation.

  llvm-svn: 283212
* [Power9] Part-word VSX integer scalar loads/stores and sign extend instructions | Nemanja Ivanovic | 2016-10-04 | 18 | -157/+582
  This patch corresponds to review: https://reviews.llvm.org/D23155

  This patch removes the VSHRC register class (based on D20310) and adds exploitation of the Power9 sub-word integer loads into VSX registers as well as vector sign extensions. The new instructions are useful for a few purposes:
  - Int to Fp conversions of 1 or 2-byte values loaded from memory
  - Building vectors of 1 or 2-byte integers with values loaded from memory
  - Storing individual 1 or 2-byte elements from integer vectors

  This patch implements all of those uses.

  llvm-svn: 283190
* [PowerPC] Account for the ELFv2 function prologue during branch selection | Hal Finkel | 2016-10-03 | 2 | -2/+18
  The PPC branch-selection pass, which performs branch relaxation, needs to account for the padding that might be introduced to satisfy block alignment requirements. We were assuming that the first block was at offset zero (i.e. had the alignment of the function itself), but under the ELFv2 ABI, a global entry function prologue is added to the first block, and it is a two-instruction sequence (i.e. eight-bytes long). If the function has 16-byte alignment, the fact that the first block is eight bytes offset from the start of the function is relevant to calculating where padding will be added in between later blocks.

  Unfortunately, I don't have a small test case.

  llvm-svn: 283086
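
To see why the eight-byte prologue matters, here is a small Python model of how block start offsets depend on where the first block begins. It is a sketch under simple assumptions, not the pass itself: layout_blocks and padding_before are hypothetical names, and blocks are reduced to (size, alignment) pairs in bytes.

```python
def padding_before(offset, alignment):
    """Bytes of padding needed to bring offset up to a multiple of alignment."""
    return (-offset) % alignment


def layout_blocks(blocks, start_offset=0):
    """Return the start offset of every block.

    blocks is a list of (size_in_bytes, alignment_in_bytes) pairs.
    start_offset is where the first block's code begins relative to the
    aligned start of the function (8 if a two-instruction prologue precedes it).
    """
    offsets = []
    offset = start_offset
    for index, (size, alignment) in enumerate(blocks):
        if index > 0:
            # Later blocks are padded up to their own alignment.
            offset += padding_before(offset, alignment)
        offsets.append(offset)
        offset += size
    return offsets


blocks = [(12, 4), (8, 16)]
print(layout_blocks(blocks))      # first block assumed at offset zero
print(layout_blocks(blocks, 8))   # first block shifted by an 8-byte prologue
```

In this example the second block moves from offset 16 to offset 32 once the prologue is counted, so any branch distance computed under the offset-zero assumption would be off by 16 bytes.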
* [PowerPC] Refactor soft-float support, and enable PPC64 soft float | Hal Finkel | 2016-10-02 | 5 | -26/+43
  This change enables soft-float for PowerPC64, and also makes soft-float disable all vector instruction sets for both 32-bit and 64-bit modes. This latter part is necessary because the PPC backend canonicalizes many Altivec vector types to floating-point types, and so soft-float breaks scalarization support for many operations. Both for embedded targets and for operating-system kernels desiring soft-float support, it seems reasonable that disabling hardware floating-point also disables vector instructions (embedded targets without hardware floating point support are unlikely to have Altivec, etc. and operating system kernels desiring not to use floating-point registers to lower syscall cost are unlikely to want to use vector registers either). If someone needs this to work, we'll need to change the fact that we promote many Altivec operations to act on v4f32.

  To make it possible to disable Altivec when soft-float is enabled, hardware floating-point support needs to be expressed as a positive feature, like the others, and not a negative feature, because target features cannot have dependencies on the disabling of some other feature. So +soft-float has now become -hard-float.

  Fixes PR26970.

  llvm-svn: 283060
* Use StringRef in Pass/PassManager APIs (NFC) | Mehdi Amini | 2016-10-01 | 4 | -15/+11
  llvm-svn: 283004
* [Power9] Builtins for ELF v.2 API conformance - back end portion | Nemanja Ivanovic | 2016-09-27 | 3 | -31/+78
  This patch corresponds to review: https://reviews.llvm.org/D24396

  This patch adds support for the "vector count trailing zeroes", "vector compare not equal" and "vector compare not equal or zero" instructions as well as the "scalar count trailing zeroes" instructions. It also changes the vector negation to use XXLNOR (when VSX is enabled) so as not to increase register pressure (previously this was done with a splat immediate of all ones followed by an XXLXOR). This was done because the altivec.h builtins (patch to follow) use vector negation and the use of an additional register for the splat immediate is not optimal.

  llvm-svn: 282478
* [Power9] Exploit move and splat instructions for build_vector improvement | Nemanja Ivanovic | 2016-09-23 | 5 | -5/+96
  This patch corresponds to review: https://reviews.llvm.org/D21135

  This patch exploits the following instructions:
    mtvsrws
    lxvwsx
    mtvsrdd
    mfvsrld
  in order to improve some build_vector and extractelement patterns.

  llvm-svn: 282246
* [PowerPC] Sign extend sub-word values for atomic comparisons | Nemanja Ivanovic | 2016-09-22 | 1 | -2/+11
  Atomic comparison instructions use the sub-word load instruction on Power8 and up but the value is not sign extended prior to the signed word compare instruction. This patch adds that sign extension.

  llvm-svn: 282182
* [PPC] Set SP after loading data from stack frame, if no red zone is present | Krzysztof Parzyszek | 2016-09-22 | 1 | -50/+195
  Follow-up to r280705: Make sure that the SP is only restored after all data is loaded from the stack frame, if there is no red zone.

  This completes the fix for https://llvm.org/bugs/show_bug.cgi?id=26519.

  Differential Revision: https://reviews.llvm.org/D24466
  llvm-svn: 282174
* [PowerPC] Remove LE patterns matching generic stores/loads to VSX permuting ops | Nemanja Ivanovic | 2016-09-22 | 1 | -5/+10
  This patch corresponds to: https://reviews.llvm.org/D21409

  The LXVD2X, LXVW4X, STXVD2X and STXVW4X instructions permute the two doublewords in the vector register when in little-endian mode. Custom code ensures that the necessary swaps are inserted for these. This patch simply removes the possibility that a load/store node will match one of these instructions in the SDAG as that would not insert the necessary swaps.

  llvm-svn: 282144
* [Power9] Add exploitation of non-permuting memory ops | Nemanja Ivanovic | 2016-09-22 | 5 | -21/+68
  This patch corresponds to review: https://reviews.llvm.org/D19825

  The new lxvx/stxvx instructions do not require the swaps to line the elements up correctly. In order to select them over the lxvd2x/lxvw4x instructions which require swaps, the patterns for the old instruction have a predicate that ensures they won't be selected on Power9 and newer CPUs.

  llvm-svn: 282143
* Fix a hidden use of grabbing the Mangler from the AsmPrinter and update accordingly. | Eric Christopher | 2016-09-16 | 1 | -4/+4
  llvm-svn: 281748
* Place the lowered phi instruction(s) before the DEBUG_VALUE entry | Keith Walker | 2016-09-16 | 1 | -1/+1
  When a phi node is finally lowered to a machine instruction it is important that the lowered "load" instruction is placed before the associated DEBUG_VALUE entry describing the value loaded.

  Renamed the existing SkipPHIsAndLabels to SkipPHIsLabelsAndDebug to more fully describe that it also skips debug entries. Then used the "new" function SkipPHIsAndLabels when the debug information should not be skipped when placing the lowered "load" instructions so that it is placed before the debug entries.

  Differential Revision: https://reviews.llvm.org/D23760
  llvm-svn: 281727
* Move the Mangler from the AsmPrinter down to TLOF and clean up the TLOF API accordingly. | Eric Christopher | 2016-09-16 | 2 | -5/+2
  llvm-svn: 281708
* Finish renaming remaining analyzeBranch functions | Matt Arsenault | 2016-09-14 | 2 | -4/+4
  llvm-svn: 281535
* Make analyzeBranch family of instruction names consistent | Matt Arsenault | 2016-09-14 | 2 | -3/+3
  analyzeBranch was renamed to use lowercase first, rename the related set to match.

  llvm-svn: 281506
* AArch64: Use TTI branch functions in branch relaxation | Matt Arsenault | 2016-09-14 | 2 | -4/+11
  The main change is to return the code size from InsertBranch/RemoveBranch.

  Patch mostly by Tim Northover

  llvm-svn: 281505
* getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI | Sanjay Patel | 2016-09-14 | 1 | -1/+1
  llvm-svn: 281495
* getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI | Sanjay Patel | 2016-09-14 | 2 | -6/+6
  llvm-svn: 281493
* getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI | Sanjay Patel | 2016-09-14 | 1 | -5/+3
  llvm-svn: 281490
* getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI | Sanjay Patel | 2016-09-14 | 1 | -1/+1
  llvm-svn: 281489
* Fix code-gen crash on Power9 for insert_vector_elt with variable index (PR30189) | Nemanja Ivanovic | 2016-09-14 | 2 | -2/+16
  This patch corresponds to review: https://reviews.llvm.org/D24021

  In the initial implementation of this instruction, I forgot to account for variable indices. This patch fixes PR30189 and should probably be merged into 3.9.1 (I'll open a bug according to the new instructions).

  llvm-svn: 281479
* Adding missing directive for Power9. | Nemanja Ivanovic | 2016-09-14 | 1 | -1/+1
  There is currently no codegen for Power9 that depends on the directive so this is NFC for now but will be important in the future. This was missed in r268950 so I'm adding it now.

  llvm-svn: 281473
* [CodeGen] Split out the notions of MI invariance and MI dereferenceability. | Justin Lebar | 2016-09-11 | 2 | -11/+19
  Summary:
  An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable.

  This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired.

  Reviewers: chandlerc, tstellarAMD
  Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
  Differential Revision: https://reviews.llvm.org/D23371
  llvm-svn: 281151
* [PowerPC] Fix address-offset folding for plain addi | Hal Finkel | 2016-09-07 | 1 | -15/+38
  When folding an addi into a memory access that can take an immediate offset, we were implicitly assuming that the existing offset was zero. This was incorrect. If we're dealing with an addi with a plain constant, we can add it to the existing offset (assuming that doesn't overflow the immediate, etc.), but if we have anything else (i.e. something that will become a relocation expression), we'll go back to requiring the existing immediate offset to be zero (because we don't know what the requirements on that relocation expression might be - e.g. maybe it is paired with some addis in some relevant way).

  On the other hand, when dealing with a plain addi with a regular constant immediate, the alignment restrictions (from the TOC base pointer, etc.) are irrelevant.

  I've added the test case from PR30280, which demonstrated the bug, but also demonstrates a missed optimization opportunity (i.e. we don't need the memory accesses at all).

  Fixes PR30280.

  llvm-svn: 280789
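
The folding rule described above can be sketched as a small decision function. This is an illustrative Python model only: fold_addi is a hypothetical name, a plain constant is modelled as an int and anything that would become a relocation expression as a string, and the only legality check modelled is the signed 16-bit immediate range (the "etc." in the commit message covers conditions not shown here).

```python
IMM16_MIN = -(1 << 15)
IMM16_MAX = (1 << 15) - 1


def fold_addi(existing_offset, addi_operand):
    """Return the folded offset for the memory access, or None if folding is refused.

    existing_offset: immediate already present on the memory access.
    addi_operand: an int for a plain constant, or a str standing in for
                  something that will become a relocation expression.
    """
    if isinstance(addi_operand, int):
        # Plain constant: add it to the existing offset if the sum still fits.
        combined = existing_offset + addi_operand
        return combined if IMM16_MIN <= combined <= IMM16_MAX else None
    # Relocation expression: only fold when there is no existing offset,
    # since its requirements are unknown.
    return addi_operand if existing_offset == 0 else None


print(fold_addi(8, 16))            # constants combine
print(fold_addi(32760, 16))        # sum no longer fits in 16 bits
print(fold_addi(4, "sym@toc@l"))   # non-zero offset plus a relocation
```

The old behaviour corresponds to ignoring existing_offset entirely, which is what produced wrong addresses when it was not zero.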
* [PPC] Claim stack frame before storing into it, if no red zone is present | Krzysztof Parzyszek | 2016-09-06 | 1 | -25/+91
  Unlike PPC64, PPC32/SVR4 does not have a red zone. In the absence of it there is no guarantee that this part of the stack will not be modified by any interrupt. To avoid this, make sure to claim the stack frame first before storing into it.

  This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.

  Differential Revision: https://reviews.llvm.org/D24093
  llvm-svn: 280705
* [PowerPC] During branch relaxation, recompute padding offsets before each iteration | Hal Finkel | 2016-09-04 | 1 | -7/+39
  We used to compute the padding contributions to the block sizes during branch relaxation only at the start of the transformation. As we perform branch relaxation, we change the sizes of the blocks, and so the amount of inter-block padding might change. Accordingly, we need to recompute the (alignment-based) padding in between every iteration on our way toward the fixed point.

  Unfortunately, I don't have a test case (and none was provided in the bug report), and while this obviously seems needed, algorithmically, I don't have any way of generating a small and/or non-fragile regression test.

  llvm-svn: 280626
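
A toy fixed-point loop makes the point concrete. The Python below is a simplified model and not the PPC pass: block_offsets and relax_branches are hypothetical names, every branch is assumed to be the last 4-byte instruction of its source block, and relaxing a branch is assumed to grow that block by 4 bytes.

```python
def block_offsets(sizes, alignments):
    """Start offset of each block, with alignment padding inserted as needed."""
    offsets, offset = [], 0
    for size, alignment in zip(sizes, alignments):
        offset += (-offset) % alignment  # padding, recomputed on every call
        offsets.append(offset)
        offset += size
    return offsets


def relax_branches(sizes, alignments, branches, max_distance, grow_by=4):
    """Iterate to a fixed point; returns (final sizes, indices of relaxed branches).

    branches is a list of (source_block, target_block) pairs.
    """
    sizes = list(sizes)
    relaxed = set()
    changed = True
    while changed:
        changed = False
        # Recompute offsets (and therefore padding) before each iteration.
        offsets = block_offsets(sizes, alignments)
        for index, (source, target) in enumerate(branches):
            if index in relaxed:
                continue
            branch_address = offsets[source] + sizes[source] - 4
            if abs(offsets[target] - branch_address) > max_distance:
                relaxed.add(index)
                sizes[source] += grow_by
                changed = True
    return sizes, sorted(relaxed)


sizes = [16, 96, 8, 200, 4]
alignments = [4, 4, 16, 4, 4]
branches = [(0, 2), (1, 4)]
print(relax_branches(sizes, alignments, branches, max_distance=100))
```

In this example, relaxing the second branch grows block 1 by 4 bytes, but the 16-byte alignment of block 2 turns that into a 16-byte shift, which pushes the first branch out of range. Padding computed only once at the start would miss that.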
* [PowerPC] Zero-extend constants in FastISel | Hal Finkel | 2016-09-04 | 1 | -1/+6
  As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which are illegal types promoted to i32 on PowerPC, is a choice constrained by assumptions within the infrastructure. Specifically, the logic in FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI operands will be zero extended, and so, at least when materializing constants that are PHI operands, we must do the same.

  The rest of our fast-isel implementation does not appear to depend on the fact that we were sign-extending i8/i16 constants, and all other targets also appear to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we had been doing this only for i1 constants, and sign-extending the others).

  Fixes PR27721.

  llvm-svn: 280614
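
The difference between the two extension choices is easy to see on a concrete value. The helpers below are an illustrative Python sketch (zero_extend and sign_extend are hypothetical names), showing what an i8 constant looks like after promotion to 32 bits under each policy.

```python
def zero_extend(value, from_bits):
    """Keep the low from_bits bits and fill the rest with zeros."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits, to_bits=32):
    """Replicate the sign bit of a from_bits-wide value up to to_bits bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


i8_minus_one = 0xFF
print(hex(zero_extend(i8_minus_one, 8)))   # what the PHI analysis expects
print(hex(sign_extend(i8_minus_one, 8)))   # what fast-isel used to materialize
```

For constants with the top bit clear the two agree, so the mismatch only shows up for values such as an i8 with bit 7 set.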
* [PowerPC] Support asm parsing for bc[l][a][+-] mnemonics | Hal Finkel | 2016-09-03 | 5 | -0/+70
  PowerPC assembly code in the wild, so it seems, has things like this:

    bc+ 12, 28, .L9

  This is a bit odd because the '+' here becomes part of the BO field, and the BO field is otherwise the first operand. Nevertheless, the ISA specification does clearly say that the +- hint syntax applies to all conditional-branch mnemonics (that test either CTR or a condition register, although not the forms which check both), both basic and extended, so this is supposed to be valid.

  This introduces some asm-parser-only definitions which take only the upper three bits from the specified BO value, and the lower two bits are implied by the +- suffix (via some associated aliases).

  Fixes PR23646.

  llvm-svn: 280571
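
The BO handling described above amounts to a mask and an OR. The Python sketch below is only a model of that bit manipulation: bo_with_hint is a hypothetical name, and the hint encodings (0b11 for '+', 0b10 for '-') are an assumption taken from the Power ISA's two-bit "at" hint field rather than from the commit message. It covers the condition-register-testing BO forms, where the hint occupies the low two bits.

```python
# Assumed hint encodings for the two low BO bits.
HINT_BITS = {
    "+": 0b11,  # branch very likely taken
    "-": 0b10,  # branch very likely not taken
}


def bo_with_hint(bo, suffix):
    """Combine the upper three bits of a 5-bit BO value with the +/- hint."""
    upper = bo & 0b11100
    return upper | HINT_BITS[suffix]


# "bc+ 12, 28, .L9": BO 12 (0b01100) with the taken hint
print(bin(bo_with_hint(12, "+")))
print(bin(bo_with_hint(12, "-")))
```

Any low bits the programmer wrote in the BO operand are discarded, which matches the commit's statement that only the upper three bits are taken from the specified value.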
* [PowerPC] Add asm parser/disassembler support for hrfid,nap,slbmfev | Hal Finkel | 2016-09-02 | 3 | -0/+26
  These few book-III instructions are used by the Linux kernel.

  Partially fixes PR24796.

  llvm-svn: 280560
* [PowerPC] Add support for the extended dcbf form and mnemonics | Hal Finkel | 2016-09-02 | 3 | -3/+46
  dcbf has an optional hint-like field, add support for the extended form and the associated mnemonics (dcbfl and dcbflp).

  Partially fixes PR24796.

  llvm-svn: 280559
* [PowerPC] For larger offsets, when possible, fold offset into addis toc@ha | Hal Finkel | 2016-09-02 | 2 | -2/+34
  When we have an offset into a global, etc. that is accessed relative to the TOC base pointer, and the offset is larger than the minimum alignment of the global itself and the TOC base pointer (which is 8-byte aligned), we can still fold the @toc@ha into the memory access, but we must update the addis instruction's symbol reference with the offset as the symbol addend. When there is only one use of the addi to be folded and only one use of the addis that would need its symbol's offset adjusted, then we can make the adjustment and fold the @toc@l into the memory access.

  llvm-svn: 280545
* [PowerPC] hasAndNotCompare should return true | Hal Finkel | 2016-09-02 | 1 | -0/+4
  As Sanjay suggested when he added the hook, PPC should return true from hasAndNotCompare. We have an efficient negated 'and' on PPC (which can feed a compare).

  Fixes PR27203.

  llvm-svn: 280457
* [PowerPC] Add a pattern for a runtime bit check | Hal Finkel | 2016-09-02 | 1 | -0/+40
  Following a suggestion by Sanjay, we should lower:

    %shl = shl i32 1, %y
    %and = and i32 %x, %shl
    %cmp = icmp eq i32 %and, %shl
    ret i1 %cmp

  into:

    subfic r4, r4, 32
    rlwnm r3, r3, r4, 31, 31

  Add this pattern and some associated patterns for the 64-bit case and the not-equal case.

  Fixes PR27356.

  llvm-svn: 280454
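
The equivalence between the IR and the two-instruction sequence can be checked with a small Python model. This is a sketch of the arithmetic only, with hypothetical function names; it assumes x is a 32-bit value, y is in the range 0..31, and that rlwnm uses only the low five bits of its rotate amount.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, shift):
    """Rotate a 32-bit value left; only the low five bits of shift are used."""
    shift &= 31
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def bit_check_ir(x, y):
    """The IR pattern: (x & (1 << y)) == (1 << y)."""
    shl = (1 << y) & MASK32
    return int((x & shl) == shl)


def bit_check_ppc(x, y):
    """Model of: subfic r4, r4, 32 ; rlwnm r3, r3, r4, 31, 31."""
    r4 = (32 - y) & MASK32      # subfic: r4 = 32 - y
    return rotl32(x, r4) & 1    # rotate bit y down to the LSB, keep only it


for x in (0, 1, 0x80000000, 0xDEADBEEF, 0xFFFFFFFF):
    for y in range(32):
        assert bit_check_ir(x, y) == bit_check_ppc(x, y)
```

Rotating left by 32 - y is the same as rotating right by y, so the tested bit ends up in the least significant position, and the 31, 31 mask (IBM bit numbering) keeps just that bit.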
* [PowerPC] Don't apply the PPC64 address-formation peephole for offsets greater than 7 | Hal Finkel | 2016-09-02 | 1 | -2/+7
  When applying our address-formation PPC64 peephole, we are reusing the @ha TOC addis value with the low parts associated with different offsets (i.e. different effective symbol addends). We were assuming this was okay so long as the offsets were less than the alignment of the global variable being accessed. This ignored the fact, however, that the TOC base pointer itself need only be 8-byte aligned. As a result, what we were doing is legal only for offsets less than 8 regardless of the alignment of the object being accessed.

  Fixes PR28727.

  llvm-svn: 280441