summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [IfConversion] Add testcases [NFC]Mikael Holmen2017-09-206-0/+211
| | | | | | | These tests should have been included in r310697 / D34099 but apparently I missed them. llvm-svn: 313737
* AMDGPU: Match load d16 hi instructionsMatt Arsenault2017-09-205-16/+525
| | | | | | | | | | | | Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves. llvm-svn: 313716
* [AMDGPU] Port of HSAIL inlinerStanislav Mekhanoshin2017-09-202-15/+157
| | | | | | Differential Revision: https://reviews.llvm.org/D36849 llvm-svn: 313714
* AMDGPU: Match store d16_hi instructionsMatt Arsenault2017-09-203-6/+601
| | | | llvm-svn: 313712
* [MIRPrinter] Print empty successor lists when they cannot be guessedQuentin Colombet2017-09-192-1/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This re-applies commit r313685, this time with the proper updates to the test cases. Original commit message: Unreachable blocks in the machine instr representation are these weird empty blocks with no successors. The MIR printer used to not print empty lists of successors. However, the MIR parser now treats non-printed list of successors as "please guess it for me". As a result, the parser tries to guess the list of successors and given the block is empty, just assumes it falls through the next block (if any). For instance, the following test case used to fail the verifier. The MIR printer would print entry / \ true (def) false (no list of successors) | split.true (use) The MIR parser would understand this: entry / \ true (def) false | / <-- invalid edge split.true (use) Because of the invalid edge, we get the "def does not dominate all uses" error. The fix consists in printing empty successor lists, so that the parser knows what to do for unreachable blocks. rdar://problem/34022159 llvm-svn: 313696
* Revert "[MIRPrinter] Print empty successor lists when they cannot be guessed"Quentin Colombet2017-09-191-48/+0
| | | | | | | | | This reverts commit r313685. I thought I had ran ninja check, but apparently I didn't... Need to update a bunch of mir tests. llvm-svn: 313686
* [MIRPrinter] Print empty successor lists when they cannot be guessedQuentin Colombet2017-09-191-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unreachable blocks in the machine instr representation are these weird empty blocks with no successors. The MIR printer used to not print empty lists of successors. However, the MIR parser now treats non-printed list of successors as "please guess it for me". As a result, the parser tries to guess the list of successors and given the block is empty, just assumes it falls through the next block (if any). For instance, the following test case used to fail the verifier. The MIR printer would print entry / \ true (def) false (no list of successors) | split.true (use) The MIR parser would understand this: entry / \ true (def) false | / <-- invalid edge split.true (use) Because of the invalid edge, we get the "def does not dominate all uses" error. The fix consists in printing empty successor lists, so that the parser knows what to do for unreachable blocks. rdar://problem/34022159 llvm-svn: 313685
* [AMDGPU] Prevent post-RA scheduler from breaking memory clausesStanislav Mekhanoshin2017-09-1923-87/+119
| | | | | | | | | The pre-RA scheduler does load/store clustering, but post-RA scheduler undoes it. Add mutation to prevent it. Differential Revision: https://reviews.llvm.org/D38014 llvm-svn: 313670
* [SystemZ] Fix truncstore + bswap codegen bugUlrich Weigand2017-09-191-0/+17
| | | | | | | | | | | | | SystemZTargetLowering::combineSTORE contains code to transform a combination of STORE + BSWAP into a STRV type instruction. This transformation is correct for regular stores, but not for truncating stores. The routine neglected to check for that case. Fixes a miscompilation of llvm-objcopy with clang, which caused test suite failures in the SystemZ multistage build bot. llvm-svn: 313669
* [PowerPC Peephole] Constants into a join add, use ADDI over LI/ADD.Tony Jiang2017-09-191-0/+60
| | | | | | | | | | Two blocks prior to the join each perform an li and the the join block has an add using the initialized register. Optimize each predecessor block to instead use addi and delete the li's and add. Differential Revision: https://reviews.llvm.org/D36734 llvm-svn: 313639
* [AArch64] Extend tests of loads and stores of register pairsEvandro Menezes2017-09-192-2/+45
| | | | | | Include instances of FP register pairs. llvm-svn: 313638
* [globalisel] Add a G_BSWAP instruction and support bswap using it.Daniel Sanders2017-09-191-0/+60
| | | | llvm-svn: 313633
* [X86][SSE] Add 'redundant pand' test case from PR34620Simon Pilgrim2017-09-191-0/+19
| | | | llvm-svn: 313632
* [x86] regenerate checks; NFCSanjay Patel2017-09-191-13/+76
| | | | llvm-svn: 313631
* [globalisel] Add support for intrinsic_voidDaniel Sanders2017-09-191-0/+29
| | | | llvm-svn: 313629
* [globalisel] Add support for intrinsic_w_chain.Daniel Sanders2017-09-191-0/+27
| | | | | | This maps directly to G_INTRINSIC_W_SIDE_EFFECTS. llvm-svn: 313627
* [x86] Lowering Mask Set1 intrinsics to LLVM IRJina Nahias2017-09-1912-262/+2275
| | | | | | | | This patch, together with a matching clang patch (https://reviews.llvm.org/D37668), implements the lowering of X86 mask set1 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37669 llvm-svn: 313625
* [ARM] Use ADDCARRY / SUBCARRYRoger Ferrer Ibanez2017-09-193-15/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is a preparatory step for D34515. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 - fixes PR34564 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 313618
* Test commit.Andrei Elovikov2017-09-191-1/+0
| | | | llvm-svn: 313617
* AMDGPU: Run internalize symbols at -O0Matt Arsenault2017-09-191-15/+48
| | | | | | | | The relocations used for externally visible functions aren't supported, so the direct call emitted ends up hitting a linker error. llvm-svn: 313616
* [X86][Skylake] Adding the scheduling information for the SkylakeClient targetGadi Haber2017-09-1914-850/+1033
| | | | | | | | | | | | | | This patch adds the instruction scheduling information for the SkylakeClient (SKL) architecture target by adding the file X86SchedSkylakeClient.td located under the X86 Target. We used the scheduling information retrieved from the Skylake architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction. The patch continues the scheduling replacement and insertion effort started with the SNB target in r307529 and r310792 and for HSW in r311879. Please expect some performance fluctuations due to code alignment effects. Reviewers: craig.topper, zvi, chandlerc, igorb, aymanmus, RKSimon, delena Differential Revision: https://reviews.llvm.org/D37294 llvm-svn: 313613
* [X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing ↵Craig Topper2017-09-1929-549/+555
| | | | | | table. llvm-svn: 313610
* bpf: add inline-asm supportYonghong Song2017-09-181-0/+54
| | | | | | Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 313593
* [DAGCombiner] fold assertzexts separated by truncSanjay Patel2017-09-188-36/+32
| | | | | | | | | | | | | If we have an AssertZext of a truncated value that has already been AssertZext'ed, we can assert on the wider source op to improve the zext-y knowledge: assert (trunc (assert X, i8) to iN), i1 --> trunc (assert X, i1) to iN This moves a fold from being Mips-specific to general combining, and x86 shows improvements. Differential Revision: https://reviews.llvm.org/D37017 llvm-svn: 313577
* AMDGPU: Start selecting s_xnor_{b32, b64}Konstantin Zhuravlyov2017-09-181-0/+83
| | | | | | Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565
* [DAG, x86] allow store merging before and after legalization (PR34217)Sanjay Patel2017-09-183-77/+34
| | | | | | | | | | | | | | | | | rL310710 allowed store merging to occur after legalization to catch stores that are created late, but this exposes a logic hole seen in PR34217: https://bugs.llvm.org/show_bug.cgi?id=34217 We will miss merging stores if the target lowers vector extracts into target-specific operations. This patch allows store merging to occur both before and after legalization if the target chooses to get maximum merging. I don't think the potential regressions in the other tests are relevant. The tests are for correctness of weird IR constructs rather than perf tests, and I think those are still correct. Differential Revision: https://reviews.llvm.org/D37987 llvm-svn: 313564
* [X86] Make sure we still emit zext for GR32 to GR64 when the source of the ↵Craig Topper2017-09-181-0/+42
| | | | | | | | | | | | zext is AssertZext The AssertZext we might see in this case is only giving information about the lower 32 bits. It isn't providing information about the upper 32 bits. So we should emit a zext. This fixes PR28540. Differential Revision: https://reviews.llvm.org/D37729 llvm-svn: 313563
* [x86] add tests for PR34217; NFCSanjay Patel2017-09-181-1/+214
| | | | llvm-svn: 313548
* [X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for 256-bit vector compare ↵Simon Pilgrim2017-09-181-21/+9
| | | | | | | | results. As commented on D37849, AVX1 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts llvm-svn: 313547
* [x86] regenerate checks; NFCSanjay Patel2017-09-181-16/+17
| | | | llvm-svn: 313545
* [LoopVectorizer] Add more testcases for PR33804.Manoj Gupta2017-09-182-0/+96
| | | | | | | | | | | | | | | | Summary: Add test cases when float <-> pointer types conversion is triggered in presence of load instructions. Reviewers: Ayal, srhines, mkuper, rengolin Reviewed By: rengolin Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D37967 llvm-svn: 313544
* [SelectionDAG] Add BITCAST handling to ComputeNumSignBits for splatted sign ↵Simon Pilgrim2017-09-186-1041/+701
| | | | | | | | | | | | bits. For cases where we are BITCASTing to vectors of smaller elements, then if the entire source was a splatted sign (src's NumSignBits == SrcBitWidth) we can say that the dst's NumSignBit == DstBitWidth, as we're just splitting those sign bits across multiple elements. We could generalize this but at the moment the only use case I have is to peek through bitcasts to vector comparison results. Differential Revision: https://reviews.llvm.org/D37849 llvm-svn: 313543
* [X86] Fix two more places to prefer VPERMQ/PD over VPERM2X128 when AVX2 is ↵Craig Topper2017-09-1815-449/+407
| | | | | | | | | | | | enabled The shuffle combining and lowerVectorShuffleAsLanePermuteAndBlend were both still trying to use VPERM2XF128 for unary shuffles when AVX2 is enabled. VPERM2X128 takes two inputs meaning when we use it for a unary shuffle one of those inputs is left undefined creating a false dependency on whatever register gets allocated there. If we have VPERMQ/PD we should prefer those since they only have a single input. Differential Revision: https://reviews.llvm.org/D37947 llvm-svn: 313542
* [X86][SSE] Improve support for vselect(Cond, 0, X) -> ANDN(Cond, X)Simon Pilgrim2017-09-182-23/+11
| | | | | | | | As discussed on PR28925 and D37849. Differential Revision: https://reviews.llvm.org/D37975 llvm-svn: 313532
* [X86][SSE] Add vselect with zero tests (PR28925)Simon Pilgrim2017-09-181-0/+67
| | | | llvm-svn: 313529
* [X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs.Nikolai Bozhenov2017-09-1815-234/+386
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Subregister liveness tracking is not implemented for X86 backend, so sometimes the whole super register is said to be live, when only a subregister is really live. That might happen if the def and the use are located in different MBBs, see added fixup-bw-isnt.mir test. However, using knowledge of the specific instructions handled by the bw-fixup-pass we can get more precise liveness information which this change does. Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper Reviewed By: craig.topper Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D37559 llvm-svn: 313524
* [X86][Codegen] adding masked gathers tests for avx2Mohammed Agabaria2017-09-181-0/+915
| | | | | | | | | related to patch: https://reviews.llvm.org/D35772 adding llvm gathers test before gathers codegen support. Differential Revision: https://reviews.llvm.org/D37800 llvm-svn: 313516
* [X86] Teach the execution domain fixing tables to use movlhps inplace of ↵Craig Topper2017-09-1841-360/+360
| | | | | | | | unpcklpd for the packed single domain. MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter. llvm-svn: 313509
* [X86] Teach execution domain fixing to convert between FP and int unpack ↵Craig Topper2017-09-1842-743/+573
| | | | | | instructions. llvm-svn: 313508
* [X86] Teach execution domain fixing to convert between VPERMILPS and VPSHUFD.Craig Topper2017-09-1840-486/+381
| | | | llvm-svn: 313507
* [X86] Teach shuffle lowering to use MOVLHPS/MOVHLPS for lowering v4f32 unary ↵Craig Topper2017-09-171-2/+2
| | | | | | shuffles with SSE1 only. llvm-svn: 313504
* [X86] Add a couple more unary shuffles to the sse1 shuffle test.Craig Topper2017-09-171-0/+18
| | | | | | These can be implemented with movlhps and movhlps. llvm-svn: 313503
* Adding test cases for PR34629 & PR34634.Jatin Bhateja2017-09-172-0/+119
| | | | | | Differential Revision: https://reviews.llvm.org/D37962 llvm-svn: 313490
* [GlobalISel][X86] Legalize i1 G_ADD/G_SUB/G_MUL/G_XOR/G_OR/G_AND instructions.Igor Breger2017-09-1711-15/+293
| | | | llvm-svn: 313483
* [GlobalISel][X86] Use correct physical register in mir tests.NFC.Igor Breger2017-09-1719-124/+122
| | | | llvm-svn: 313479
* [GlobalISel][X86] G_FCONSTANT support.Igor Breger2017-09-174-7/+173
| | | | | | | | | | | | | | Summary: G_FCONSTANT support, port the implementation from X86FastIsel. Reviewers: zvi, delena, guyblank Reviewed By: delena Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D37734 llvm-svn: 313478
* [x86] enable storeOfVectorConstantIsCheap() target hookSanjay Patel2017-09-162-65/+59
| | | | | | | | | | | | | | This allows vector-sized store merging of constants in DAGCombiner using the existing code in MergeConsecutiveStores(). All of the twisted logic that decides exactly what vector operations are legal and fast for each particular CPU are handled separately in there using the appropriate hooks. For the motivating tests in merge-store-constants.ll, we already produce the same vector code in IR via the SLP vectorizer. So this is just providing a backend backstop for code that doesn't go through that pass (-O1). More details in PR24449: https://bugs.llvm.org/show_bug.cgi?id=24449 (this change should be the last step to resolve that bug) Differential Revision: https://reviews.llvm.org/D37451 llvm-svn: 313458
* [X86] Add isel patterns to be able to fold loads into VPERM2F128 even when ↵Craig Topper2017-09-161-2/+2
| | | | | | | | | | the load is on the first input to the SDNode. We just need to toggle bits 1 and 5 of the immediate and swap the sources. The peephole pass could trigger commuting/folding for this later, but its easy enough to fix in isel. Disable the peephole pass on the main vperm2x128 test so we know we're doing this through isel. llvm-svn: 313455
* [X86] Remove unused check lines that got left behind when I moved tests to ↵Craig Topper2017-09-161-15/+0
| | | | | | the instrinsic upgrade file and regenerated. llvm-svn: 313454
* [X86] Remove the vperm2f128 test file I just added in r313450.Craig Topper2017-09-161-229/+0
| | | | | | I missed the we already had a pretty thorough test file for these instructions. llvm-svn: 313451
OpenPOWER on IntegriCloud