path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* AMDGPU: Support v2i16/v2f16 packed operations | Matt Arsenault | 2017-02-27 | 11 | -63/+378
  llvm-svn: 296396
* [ARM] don't transform an add(ext Cond), C to select unless there's a setcc of the condition | Sanjay Patel | 2017-02-27 | 1 | -1/+1
  The transform in question claims to be doing:
    // fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c))
  ...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node for the sext/zext patterns.

  This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(), so I was seeing infinite loops with my draft of a patch applied.

  The changes in select_const.ll look positive (fewer instructions).

  The change in arm-and-tst-peephole.ll is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's not affected by this code change.

  Differential Revision: https://reviews.llvm.org/D30355
  llvm-svn: 296389
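The fold quoted in this commit message can be checked on plain integers. The following is a minimal sketch, not the DAG combine itself: the function names are hypothetical, and unsigned 32-bit arithmetic is assumed so that wraparound is well defined. It also shows why an add of a zero-extended condition can be viewed as a select of two constants, which is the form the two transforms would otherwise keep converting back and forth.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the quoted fold: (add (select cc, 0, c), x).
inline uint32_t addOfSelect(bool cc, uint32_t c, uint32_t x) {
  uint32_t sel = cc ? 0u : c;
  return sel + x;
}

// Right-hand side of the quoted fold: (select cc, x, (add x, c)).
inline uint32_t selectOfAdd(bool cc, uint32_t c, uint32_t x) {
  return cc ? x : x + c;
}

// add (zext cc), c written directly ...
inline uint32_t addOfZext(bool cc, uint32_t c) {
  return static_cast<uint32_t>(cc) + c;
}

// ... and the same value expressed as a select of two constants.
inline uint32_t selectOfConstants(bool cc, uint32_t c) {
  return cc ? c + 1u : c;
}

// Hypothetical self-check: both forms agree for either condition value.
inline void checkFolds(uint32_t c, uint32_t x) {
  for (int i = 0; i < 2; ++i) {
    bool cc = (i == 1);
    assert(addOfSelect(cc, c, x) == selectOfAdd(cc, c, x));
    assert(addOfZext(cc, c) == selectOfConstants(cc, c));
  }
}
```

Because both rewrites preserve the value, only a profitability or pattern check (here, the presence of a setcc) keeps a combiner from applying them in alternation.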
* AMDGPU: Add some of the new gfx9 VOP3 instructions | Matt Arsenault | 2017-02-27 | 1 | -0/+12
  llvm-svn: 296382
* [X86][SSE] Attempt to extract vector elements through target shuffles | Simon Pilgrim | 2017-02-27 | 1 | -0/+97
  DAGCombiner already supports peeking through shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.

  This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB; there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where it's worth it.

  Differential Revision: https://reviews.llvm.org/D30176
  llvm-svn: 296381
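The idea of extracting an element through a shuffle can be illustrated on ordinary arrays. This is a toy model under stated assumptions, not the X86 lowering code: the lane count, the mask convention (indices 0..N-1 select from the first input, N..2N-1 from the second, negative means undefined) and all names are hypothetical.

```cpp
#include <array>
#include <cassert>

constexpr int N = 4;                  // lanes per vector (assumed)
using Vec = std::array<int, N>;
using Mask = std::array<int, N>;      // -1 marks an undefined lane

// Materialize the shuffle: every output lane picks one input lane.
inline Vec shuffle(const Vec &a, const Vec &b, const Mask &mask) {
  Vec out{};
  for (int i = 0; i < N; ++i) {
    int m = mask[i];
    out[i] = (m < 0) ? 0 : (m < N ? a[m] : b[m - N]);
  }
  return out;
}

// Peek through the shuffle: read the wanted lane straight from its source
// vector without building the shuffled vector. Returns false if undefined.
inline bool extractThroughShuffle(const Vec &a, const Vec &b,
                                  const Mask &mask, int idx, int &out) {
  assert(idx >= 0 && idx < N);
  int m = mask[idx];
  if (m < 0)
    return false;
  out = (m < N) ? a[m] : b[m - N];
  return true;
}
```

For any defined lane, `extractThroughShuffle` yields the same value as indexing the result of `shuffle`, which is the property that makes the shortcut safe.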
* AMDGPU: Support inlineasm for packed instructions | Matt Arsenault | 2017-02-27 | 1 | -1/+42
  Add packed types as legal so they may be used with inlineasm. Keep all operations expanded for now.
  llvm-svn: 296379
* AMDGPU: Don't fold immediate if clamp/omod are set | Matt Arsenault | 2017-02-27 | 2 | -8/+13
  Doesn't fix any practical problems because clamp/omod are currently folded after peephole optimizer.
  llvm-svn: 296375
* AMDGPU: Fold omod into instructions | Matt Arsenault | 2017-02-27 | 3 | -6/+146
  llvm-svn: 296372
* AMDGPU: Add f16 to shader calling conventions | Matt Arsenault | 2017-02-27 | 1 | -3/+3
  Mostly useful for writing tests for f16 features.
  llvm-svn: 296370
* AMDGPU: Add VOP3P instruction format | Matt Arsenault | 2017-02-27 | 23 | -86/+879
  Add a few instructions that are not VOP3P but are related to packed operations. Includes a hack with dummy operands for the benefit of the assembler.
  llvm-svn: 296368
* [Hexagon] Defs and clobbers can overlap | Krzysztof Parzyszek | 2017-02-27 | 1 | -5/+4
  llvm-svn: 296365
* [X86] Use APInt instead of SmallBitVector for tracking undef elements from getTargetConstantBitsFromNode and getConstVector. | Craig Topper | 2017-02-27 | 1 | -25/+25
  Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than that number of elements and therefore cause a malloc.

  APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt.

  Reviewers: RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D30392
  llvm-svn: 296355
* [X86] Use APInt instead of SmallBitVector for tracking Zeroable elements in shuffle lowering | Craig Topper | 2017-02-27 | 1 | -63/+57
  Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than that number of elements and therefore cause a malloc.

  APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt.

  Reviewers: RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D30390
  llvm-svn: 296354
* [X86] Fix SmallVector sizes in constant pool shuffle decoding to avoid heap allocation | Craig Topper | 2017-02-27 | 1 | -5/+5
  Some of the vectors were too small to avoid heap allocation. In one case the vector was oversized.

  Differential Revision: https://reviews.llvm.org/D30387
  llvm-svn: 296353
* [X86] Use APInt instead of SmallBitVector for tracking undef elements in constant pool shuffle decoding | Craig Topper | 2017-02-27 | 1 | -10/+10
  Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than that number of elements and therefore cause a malloc.

  APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt.

  This will incur a minor increase in stack usage due to APInt storing the bit count separately from the data bits unlike SmallBitVector, but that should be ok.

  Reviewers: RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D30386
  llvm-svn: 296352
* AArch64InstPrinter: rewrite of printSysAlias | Sjoerd Meijer | 2017-02-27 | 3 | -316/+163
  This is a cleanup/rewrite of the printSysAlias function. This was not using the tablegen instruction descriptions, but was "manually" decoding the instructions. This has been replaced with calls to lookup_XYZ_ByEncoding tablegen calls.

  This revealed several problems. First, instruction IVAU had the wrong encoding. This was cancelled out by the parser that incorrectly matched the wrong encoding. Second, instruction CVAP was missing from the SystemOperands tablegen descriptions, so this has been added. And third, the required target features were not captured in the tablegen descriptions, so support for this has also been added.

  Differential Revision: https://reviews.llvm.org/D30329
  llvm-svn: 296343
* [ARM] LSL #0 is an alias of MOV | John Brawn | 2017-02-27 | 2 | -12/+39
  Currently we handle this correctly in arm, but in thumb we don't, which leads to an unpredictable instruction being emitted for LSL #0 in an IT block and SP not being permitted in some cases when it should be.

  For the thumb2 LSL we can handle this by making LSL #0 an alias of MOV in the .td file, but for thumb1 we need to handle it in checkTargetMatchPredicate to get the IT handling right. We also need to adjust the handling of MOV rd, rn, LSL #0 to avoid generating the 16-bit encoding in an IT block. We should also adjust it to allow SP in the same way that it is allowed in MOV rd, rn, but I haven't done that here because it looks like it would take quite a lot of work to get right.

  Additionally, correct the selection of the 16-bit shift instructions in processInstruction, where it was checking if the two registers were equal when it should have been checking if they were low. It appears that previously this code was never executed and the 16-bit encoding was selected by default, but the other changes I've done here have somehow made it start being used.

  Differential Revision: https://reviews.llvm.org/D30294
  llvm-svn: 296342
* AArch64AsmParser: don't try to parse “[1]” for non-vector register operands | Sjoerd Meijer | 2017-02-27 | 1 | -25/+0
  There are no instructions that have "[1]" as part of the assembly string; FMOVXDhighr is out of date. This removes dead code.

  Differential Revision: https://reviews.llvm.org/D30165
  llvm-svn: 296327
* [AMDGPU] Runtime metadata fixes: | Konstantin Zhuravlyov | 2017-02-27 | 5 | -32/+79
  - Verify that runtime metadata is actually valid runtime metadata when assembling, otherwise we could accept the following when assembling, but ocl runtime will reject it:
      .amdgpu_runtime_metadata { amd.MDVersion: [ 2, 1 ], amd.RandomUnknownKey, amd.IsaInfo: ...
  - Make IsaInfo optional, and always emit it.

  Differential Revision: https://reviews.llvm.org/D30349
  llvm-svn: 296324
* [X86] Check for less than 0 rather than explicit compare with -1. NFC | Craig Topper | 2017-02-27 | 1 | -2/+3
  llvm-svn: 296321
* [X86] Fix execution domain for cmpss/sd instructions. | Craig Topper | 2017-02-26 | 1 | -0/+8
  llvm-svn: 296293
* [AVX-512] Fix execution domain for scalar commutable min/max instructions. | Craig Topper | 2017-02-26 | 1 | -1/+1
  llvm-svn: 296292
* [AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps. | Craig Topper | 2017-02-26 | 1 | -0/+1
  llvm-svn: 296291
* [AVX-512] Fix the execution domain for AVX-512 integer broadcasts. | Craig Topper | 2017-02-26 | 1 | -0/+1
  llvm-svn: 296290
* [AVX-512] Disable the redundant patterns in the VPBROADCASTBr_Alt and VPBROADCASTWr_Alt instructions. NFC | Craig Topper | 2017-02-26 | 1 | -14/+16
  llvm-svn: 296289
* [AVX-512] Fix execution domain for VPMADD52 instructions. | Craig Topper | 2017-02-26 | 1 | -0/+2
  llvm-svn: 296288
* [AVX-512] Fix the execution domain for VSCALEF instructions. | Craig Topper | 2017-02-26 | 1 | -0/+4
  llvm-svn: 296286
* [AVX-512] Fix execution domain of scalar VRANGE/REDUCE/GETMANT with sae. | Craig Topper | 2017-02-26 | 1 | -0/+1
  llvm-svn: 296285
* [X86] Fix the execution domain for scalar SQRT intrinsic instruction. | Craig Topper | 2017-02-26 | 1 | -2/+2
  llvm-svn: 296284
* Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." | Nirav Dave | 2017-02-26 | 2 | -6/+1
  This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.
  llvm-svn: 296279
* vec perm can go down either pipeline on P8. | Eric Christopher | 2017-02-26 | 1 | -1/+1
  No observable changes, spotted while looking at the scheduling description.
  llvm-svn: 296277
* [APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied) | Simon Pilgrim | 2017-02-25 | 2 | -8/+7
  The current pattern for extracting bits in a range is typically:
    Mask.lshr(BitOffset).trunc(SubSizeInBits);
  This can be particularly slow for large APInts (MaskSizeInBits > 64), as they require the allocation of memory for the temporary variable.

  This is another of the compile time issues identified in PR32037 (see also D30265).

  This patch adds the APInt::extractBits() helper method, which avoids the temporary memory allocation.

  Differential Revision: https://reviews.llvm.org/D30336
  llvm-svn: 296272
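The cost difference described here can be sketched without LLVM. The following toy model assumes a wide integer stored as little-endian 64-bit words in a `std::vector` and a result of at most 64 bits; the function name and the storage layout are assumptions for illustration and are not APInt's implementation. It reads only the one or two words that overlap the requested range, instead of shifting the whole value into a temporary and truncating it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Extract `numBits` bits (1..64) starting at `bitPosition` from a wide
// integer stored as little-endian 64-bit words. No wide temporary is built.
inline uint64_t extractBitsSketch(const std::vector<uint64_t> &words,
                                  unsigned numBits, unsigned bitPosition) {
  assert(numBits >= 1 && numBits <= 64);
  assert(bitPosition + numBits <= words.size() * 64);

  unsigned loWord = bitPosition / 64;
  unsigned loBit = bitPosition % 64;

  // Bits available in the first overlapping word.
  uint64_t result = words[loWord] >> loBit;

  // If the range crosses a word boundary, loBit is nonzero, so the shift
  // amount below is in 1..63 and the next word exists (asserted above).
  if (loBit + numBits > 64)
    result |= words[loWord + 1] << (64 - loBit);

  // Drop anything above the requested width.
  if (numBits < 64)
    result &= (uint64_t(1) << numBits) - 1;
  return result;
}
```

A shift-then-truncate formulation would have to allocate and fill a second word array of the full width before discarding most of it.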
* [AVX-512] Fix the execution domain for scalar FMA instructions. | Craig Topper | 2017-02-25 | 1 | -1/+2
  llvm-svn: 296271
* [AVX-512] Fix the execution domain on some instructions. | Craig Topper | 2017-02-25 | 1 | -4/+13
  llvm-svn: 296270
* [AVX-512] Remove unnecessary masked versions of VCVTSS2SD and VCVTSD2SS using the scalar register class. | Craig Topper | 2017-02-25 | 1 | -24/+11
  We only have patterns for the masked intrinsics.
  llvm-svn: 296264
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. | Nirav Dave | 2017-02-25 | 2 | -1/+6
  Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

  * Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner because it separates non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited.

  This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations, which appears to require more expensive constant generation).

  Some minor peephole optimizations to deal with improved SubDAG shapes (listed below).

  Additional Minor Changes:
  1. Finishes removing unused AliasLoad code
  2. Unifies the chain aggregation in the merged stores across code paths
  3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
  6. Forward load-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
  7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
  8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.

  Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:

  CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  llvm-svn: 296252
* Minor code cleanup. NFC. | Junmo Park | 2017-02-25 | 1 | -1/+1
  llvm-svn: 296207
* [WebAssembly] Add support for using a wasm global for the stack pointer. | Dan Gohman | 2017-02-24 | 5 | -42/+112
  This replaces the __stack_pointer variable which was allocated in linear memory.
  llvm-svn: 296201
* [Hexagon] Undo shift folding where it could simplify addressing mode | Krzysztof Parzyszek | 2017-02-24 | 1 | -3/+75
  For example, avoid (single shift):
    r0 = and(##536870908,lsr(r0,#3))
    r0 = memw(r1+r0<<#0)
  in favor of (two shifts):
    r0 = lsr(r0,#5)
    r0 = memw(r1+r0<<#2)
  llvm-svn: 296196
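The two instruction sequences in this commit message compute the same byte offset, which can be checked with 32-bit arithmetic. The sketch below models only the index computation (not the memory access), and the function names are hypothetical. The constant 536870908 is 0x1FFFFFFC, so masking the value shifted right by 3 clears its two low bits, which is the same as shifting right by 5 and then left by 2.

```cpp
#include <cassert>
#include <cstdint>

// Single-shift form: r0 = and(##536870908, lsr(r0, #3)), then scale by <<#0.
inline uint32_t offsetSingleShift(uint32_t r0) {
  return (536870908u & (r0 >> 3)) << 0;
}

// Two-shift form: r0 = lsr(r0, #5), then scale by <<#2 in the addressing mode.
inline uint32_t offsetTwoShifts(uint32_t r0) {
  return (r0 >> 5) << 2;
}

// Hypothetical self-check over a few representative inputs.
inline void checkOffsetsAgree() {
  const uint32_t samples[] = {0u, 1u, 31u, 32u, 100u, 0x12345678u,
                              0x80000000u, 0xFFFFFFFFu};
  for (uint32_t r0 : samples)
    assert(offsetSingleShift(r0) == offsetTwoShifts(r0));
}
```

The second form lets the left shift be absorbed by the scaled addressing mode, so the large immediate mask is no longer needed.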
* [WebAssembly] Basic support for Wasm object file encoding. | Dan Gohman | 2017-02-24 | 23 | -140/+588
  With the "wasm32-unknown-unknown-wasm" triple, this allows writing out simple wasm object files, and is another step in a larger series toward migrating from ELF to general wasm object support. Note that this code and the binary format itself are still experimental.
  llvm-svn: 296190
* [Hexagon] Prettify code in HexagonDAGToDAGISel::Select | Krzysztof Parzyszek | 2017-02-24 | 1 | -47/+13
  llvm-svn: 296187
* AMDGPU: Replace FMAD with FMA when denormals are enabled. | Wei Ding | 2017-02-24 | 4 | -1/+20
  Differential Revision: http://reviews.llvm.org/D29958
  llvm-svn: 296186
* Revert "Correct register pressure calculation in presence of subregs" | Stanislav Mekhanoshin | 2017-02-24 | 2 | -22/+0
  This reverts commit r296009. It broke one out-of-tree target and also does not account for all partial lines added or removed when calculating PressureDiff.
  llvm-svn: 296182
* [WebAssembly] Handle f16 in fast-isel. | Dan Gohman | 2017-02-24 | 1 | -0/+2
  llvm-svn: 296172
* [Target/MIPS] Kill dead code, no functional change intended. | Davide Italiano | 2017-02-24 | 1 | -11/+0
  Hopefully placates gcc with -Werror.
  llvm-svn: 296153
* Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt subrange | Simon Pilgrim | 2017-02-24 | 2 | -7/+8
  The current pattern for extracting bits in a range is typically:
    Mask.lshr(BitOffset).trunc(SubSizeInBits);
  This can be particularly slow for large APInts (MaskSizeInBits > 64), as they require the allocation of memory for the temporary variable.

  This is another of the compile time issues identified in PR32037 (see also D30265).

  This patch adds the APInt::extractBits() helper method, which avoids the temporary memory allocation.

  Differential Revision: https://reviews.llvm.org/D30336
  llvm-svn: 296147
* [PowerPC] Use subfic instruction for subtract from immediate | Nemanja Ivanovic | 2017-02-24 | 1 | -0/+4
  Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate. The corresponding pattern already exists for 32-bit integers.

  Committing on behalf of Hiroshi Inoue.

  Differential Revision: https://reviews.llvm.org/D29387
  llvm-svn: 296144
* [PowerPC] Use rldicr instruction for AND with an immediate if possible | Nemanja Ivanovic | 2017-02-24 | 1 | -0/+13
  Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that clear bits from the right-hand side.

  Committing on behalf of Hiroshi Inoue.

  Differential Revision: https://reviews.llvm.org/D29388
  llvm-svn: 296143
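The kind of mask this commit targets can be recognized with a short bit trick. The sketch below is an illustration in plain C++, not the PowerPC selection pattern: the function names are hypothetical, and it assumes 64-bit values. A mask qualifies when it is a run of ones starting at the most significant bit, in which case AND-ing with it is the same as clearing the low n bits.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `mask` is a nonzero run of ones starting at bit 63, and
// stores the number of cleared low bits in `n`.
inline bool isClearRightMask(uint64_t mask, unsigned &n) {
  if (mask == 0)
    return false;
  uint64_t low = ~mask;            // must have the form 2^n - 1
  if ((low & (low + 1)) != 0)
    return false;
  n = 0;
  while (low != 0) {               // count the set bits of 2^n - 1
    ++n;
    low >>= 1;
  }
  return true;
}

// Clear the low `n` bits of `x`; equivalent to AND-ing with such a mask.
inline uint64_t clearRight(uint64_t x, unsigned n) {
  assert(n < 64);
  return (x >> n) << n;
}
```

For a qualifying mask, `clearRight(x, n)` and `x & mask` agree for every `x`, which is why a single clear-right instruction can stand in for loading the immediate and AND-ing.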
* [APInt] Add APInt::extractBits() method to extract APInt subrange | Simon Pilgrim | 2017-02-24 | 2 | -8/+7
  The current pattern for extracting bits in a range is typically:
    Mask.lshr(BitOffset).trunc(SubSizeInBits);
  This can be particularly slow for large APInts (MaskSizeInBits > 64), as they require the allocation of memory for the temporary variable.

  This is another of the compile time issues identified in PR32037 (see also D30265).

  This patch adds the APInt::extractBits() helper method, which avoids the temporary memory allocation.

  Differential Revision: https://reviews.llvm.org/D30336
  llvm-svn: 296141
* Recommit "[mips] Fix atomic compare and swap at O0." | Simon Dardis | 2017-02-24 | 7 | -151/+400
  This time with the missing files.

  Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs in between an ll and sc, the store can cause the atomic sequence to fail.

  This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos.

  This resolves PR/32020.

  Thanks to James Cowgill for reporting the issue!

  Reviewers: slthakur
  Differential Revision: https://reviews.llvm.org/D30257
  llvm-svn: 296134
* Revert "[mips] Fix atomic compare and swap at O0." | Simon Dardis | 2017-02-24 | 6 | -59/+151
  This reverts r296132. I forgot to include the tests.
  llvm-svn: 296133