summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [DebugInfo] Handle endianness when moving debug info for split integer ↵Bjorn Pettersson2017-10-031-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | values (reapplied) Summary: Take the target's endianness into account when splitting the debug information in DAGTypeLegalizer::SetExpandedInteger. This patch fixes so that, for big-endian targets, the fragment expression corresponding to the high part of a split integer value is placed at offset 0, in order to correctly represent the memory address order. I have attached a PPC32 reproducer where the resulting DWARF pieces for a 64-bit integer were incorrectly reversed. Original patch was reverted due to using -stop-after=isel in the test case (but that is only working when AMDGPU target is included in the llc build). The test case has now been updated to use -stop-before=expand-isel-pseudos instead. Patch by: dstenb Reviewers: JDevlieghere, aprantl, dblaikie Reviewed By: JDevlieghere, aprantl, dblaikie Subscribers: nemanjai Differential Revision: https://reviews.llvm.org/D38172 llvm-svn: 314781
* [X86][SSE] Add support for shuffle combining from PACKSS/PACKUSSimon Pilgrim2017-10-031-16/+4
| | | | | | Mentioned in D38472 llvm-svn: 314777
* [X86][SSE] Add support for PACKSS/PACKUS constant foldingSimon Pilgrim2017-10-033-68/+48
| | | | | | Pulled out of D38472 llvm-svn: 314776
* [X86] Provide the LSDA pointer with RIP relative addressing if necessaryMartin Storsjo2017-10-031-1/+1
| | | | | | | | | | | | | This makes sure the LSDA pointer isn't truncated to 32 bit. Make LowerINTRINSIC_WO_CHAIN a member function instead of a static function, so that it can use the getGlobalWrapperKind method. This solves the second half of the issues mentioned in PR34720. Differential Revision: https://reviews.llvm.org/D38343 llvm-svn: 314767
* [Legalizer] Add support for G_OR NarrowScalar.Quentin Colombet2017-10-031-11/+48
| | | | | | | | | | | | | | Legalize bitwise OR: A = BinOp<Ty> B, C into: B1, ..., BN = G_UNMERGE_VALUES B C1, ..., CN = G_UNMERGE_VALUES C A1 = BinOp<Ty/N> B1, C2 ... AN = BinOp<Ty/N> BN, CN A = G_MERGE_VALUES A1, ..., AN llvm-svn: 314760
* [PowerPC] Revert r314666.Tim Shen2017-10-021-68/+0
| | | | | | | | | See https://reviews.llvm.org/D38172. I tried to XFAIL it, but sometimes XPASS triggers the bot. Simply revert it. llvm-svn: 314739
* [PowerPC] Temporarily disable the test introduced by r314666Tim Shen2017-10-021-0/+2
| | | | | | See https://reviews.llvm.org/D38172 for details. llvm-svn: 314732
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Geoff Berry2017-10-0283-347/+326
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issues addressed since original review: - Avoid bug in regalloc greedy/machine verifier when forwarding to use in an instruction that re-defines the same virtual register. - Fixed bug when forwarding to use in EarlyClobber instruction slot. - Fixed incorrect forwarding to register definitions that showed up in explicit_uses() iterator (e.g. in INLINEASM). - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 314729
* AMDGPU: Fix potentially incorrectly matching check linesMatt Arsenault2017-10-021-8/+8
| | | | | | | | These check lines are supposed to make sure the new d16 load instructions aren't used, but the expected instruction name is a prefix of the incorrect instruction name. llvm-svn: 314714
* Add support for Myriad ma2x8x series of CPUsWalter Lee2017-10-021-0/+14
| | | | | | | | | | | | Summary: Also add support for some older Myriad CPUs that were missing. Reviewers: jyknight Subscribers: fedor.sergeev Differential Revision: https://reviews.llvm.org/D37552 llvm-svn: 314705
* Eliminate ftrunc if source is know to be roundedStanislav Mekhanoshin2017-10-021-0/+92
| | | | | | Differential Revision: https://reviews.llvm.org/D38421 llvm-svn: 314688
* [X86][SSE] Add PACKSS/PACKUS constant folding tests Simon Pilgrim2017-10-023-16/+214
| | | | llvm-svn: 314682
* Regenerate test (missing broadcast constant comments). NFCI.Simon Pilgrim2017-10-021-8/+8
| | | | | | Still avoiding the floating point comments to prevent linux/windows discrepancies. llvm-svn: 314681
* Regenerate test (missing broadcast constant comments). NFCI.Simon Pilgrim2017-10-021-3/+3
| | | | llvm-svn: 314680
* Regenerate test. NFCI.Simon Pilgrim2017-10-021-2/+0
| | | | llvm-svn: 314679
* [Debug info] Handle endianness when moving debug info for split integer valuesBjorn Pettersson2017-10-021-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Take the target's endianness into account when splitting the debug information in DAGTypeLegalizer::SetExpandedInteger. This patch fixes so that, for big-endian targets, the fragment expression corresponding to the high part of a split integer value is placed at offset 0, in order to correctly represent the memory address order. I have attached a PPC32 reproducer where the resulting DWARF pieces for a 64-bit integer were incorrectly reversed. Patch by: dstenb Reviewers: JDevlieghere, aprantl, dblaikie Reviewed By: JDevlieghere, aprantl, dblaikie Subscribers: nemanjai Differential Revision: https://reviews.llvm.org/D38172 llvm-svn: 314666
* [PowerPC] support ZERO_EXTEND in tryBitPermutationHiroshi Inoue2017-10-021-0/+23
| | | | | | | | | | | | | | | | | | | This patch add a support of ISD::ZERO_EXTEND in PPCDAGToDAGISel::tryBitPermutation to increase the opportunity to use rotate-and-mask by reordering ZEXT and ANDI. Since tryBitPermutation stops analyzing nodes if it hits a ZEXT node while traversing SDNodes, we want to avoid ZEXT between two nodes that can be folded into a rotate-and-mask instruction. For example, we allow these nodes t9: i32 = add t7, Constant:i32<1> t11: i32 = and t9, Constant:i32<255> t12: i64 = zero_extend t11 t14: i64 = shl t12, Constant:i64<2> to be folded into a rotate-and-mask instruction. Such case often happens in array accesses with logical AND operation in the index, e.g. array[i & 0xFF]; Differential Revision: https://reviews.llvm.org/D37514 llvm-svn: 314655
* [X86][LLVM]Expanding Supports lowerInterleaved{store|load}() in ↵Michael Zuckerman2017-10-021-855/+408
| | | | | | | | | | | | | | | | | | | X86InterleavedAccess (VF64 stride 3-4) I continue to support different VF interleaved and in this pass for this patch, I added the vf64 stride3 support for both load and store. I also added support fot the stride4 store. Reviewers: 1. zvi 2. dorit 3. igorb 4. guyblank Differential Revision: https://reviews.llvm.org/D37687 Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b llvm-svn: 314651
* [Hexagon] Check vector elements for equivalence in the ↵Ron Lieberman2017-10-021-0/+86
| | | | | | | | | | | | | HexagonVectorLoopCarriedReuse pass If the two instructions being compared for equivalence have corresponding operands that are integer constants, then check their values to determine equivalence. Patch by Suyog Sarda! llvm-svn: 314642
* [Hexagon] Patch to Extract i1 element from vector of i1Ron Lieberman2017-10-021-0/+9
| | | | | | | This patch extracts 1 element from vector consisting of elements of size 1 bit at given index. llvm-svn: 314641
* [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMemCraig Topper2017-10-013-5/+5
| | | | | | | | | | | | | | | | | | | Summary: Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable. For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem. I believe this supercedes D38025 which was trying to switch the register&register form back to pre-PR22995. Reviewers: aymanmus, RKSimon, zvi Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38120 llvm-svn: 314639
* [X86][SSE] Add faux shuffle combining support for PACKUSSimon Pilgrim2017-10-012-22/+6
| | | | llvm-svn: 314631
* [X86][AVX2] Simplify PACKUS combine testSimon Pilgrim2017-10-011-3/+3
| | | | | | Trying to use a AND mask is tricky as after legalization its nigh impossible for computeKnownBits to do anything with it llvm-svn: 314630
* [X86][SSE] Improve shuffle combining of PACKSS instructions.Simon Pilgrim2017-10-012-14/+6
| | | | | | Support unary packing and fix the faux shuffle mask for vectors larger than 128 bits. llvm-svn: 314629
* [X86][SSE] Add shuffle combining tests with PACKSS/PACKUSSimon Pilgrim2017-10-012-0/+156
| | | | llvm-svn: 314628
* pre-commit adding test for broadcastm patternJina Nahias2017-10-011-0/+263
| | | | | | | Differential Revision: https://reviews.llvm.org/D38312 Change-Id: Ifbc4189549f2f59995019a86f85f989c04e4d37d llvm-svn: 314626
* Adding test for interleved, case stride 4 vf64 store<NFC>.Michael Zuckerman2017-10-011-0/+309
| | | | | Change-Id: I9ea62aac81b763c83d26613dca6fcd846997a017 llvm-svn: 314621
* Fix typo. NFCXin Tong2017-10-011-1/+1
| | | | llvm-svn: 314615
* Revert "Fix typo [NFC]"Xin Tong2017-10-011-1/+1
| | | | | | | | This reverts commit e60b5028619be1c81bd039d63a0627dac32d38f9. Incorrectly include changes that are not typo fix. llvm-svn: 314614
* Fix typo [NFC]Xin Tong2017-10-011-1/+1
| | | | llvm-svn: 314613
* Regenerate mul combine tests to update broadcast comment.Simon Pilgrim2017-09-301-1/+1
| | | | llvm-svn: 314607
* [X86][SSE] Fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1Simon Pilgrim2017-09-301-24/+4
| | | | | | Remove sign extend in register style pattern if the sign is already extended enough llvm-svn: 314599
* [AVX-512] Add patterns to make fp compare instructions commutable during isel.Craig Topper2017-09-301-2/+255
| | | | llvm-svn: 314598
* [X86][SSE] Add vector truncation cases inspired by PR34773Simon Pilgrim2017-09-301-0/+804
| | | | | | We should be using PACKSS/PACKUS more aggressively when we know the state of the upper bits llvm-svn: 314597
* Code refactoring for the interleaved code <NFC>Michael Zuckerman2017-09-301-57/+57
| | | | | Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1 llvm-svn: 314596
* [X86][SKX] Added codegen regression test for avx512 instructions scheduling.NFC.Gadi Haber2017-09-302-0/+17757
| | | | | | | | | | | | | | NFC. Added code gen regression tests for avx512 instructions scheduling called avx512-schedule.ll and avx512-shuffle-schedule.ll. This patch is in preparation of a larger patch of adding all SKX instruction scheduling and therefore the scheduling for the avx512 instructions are still missing. Reviewers: zvi, delena, RKSimon, igorb Differential Revision: https://reviews.llvm.org/D38035 Change-Id: I792762763127a921b9e13684b58af03646536533 llvm-svn: 314594
* [X86] Support v64i8 mulhu/mulhsCraig Topper2017-09-302-2970/+94
| | | | | | | | Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results. Differential Revision: https://reviews.llvm.org/D38307 llvm-svn: 314584
* [AMDGPU] Set fast-math flags on functions given the optionsStanislav Mekhanoshin2017-09-291-0/+33
| | | | | | | | | | | | | | | | We have a single library build without relaxation options. When inlined library functions remove fast math attributes from the functions they are integrated into. This patch sets relaxation attributes on the functions after linking provided corresponding relaxation options are given. Math instructions inside the inlined functions remain to have no fast flags, but inlining does not prevent fast math transformations of a surrounding caller code anymore. Differential Revision: https://reviews.llvm.org/D38325 llvm-svn: 314568
* CodeGen: Fix pointer info in expandUnalignedLoad/StoreYaxun Liu2017-09-291-0/+24
| | | | | | | | | | | | Currently expandUnalignedLoad/Store uses place holder pointer info for temporary memory operand in stack, which does not have correct address space. This causes unaligned private double16 load/store to be lowered to flat_load instead of buffer_load for amdgcn target. This fixes failures of OpenCL conformance test basic/vload_private/vstore_private on target amdgcn---amdgizcl. Differential Revision: https://reviews.llvm.org/D35361 llvm-svn: 314566
* AMDGPU: VALU carry-in and v_cndmask condition cannot be EXECNicolai Haehnle2017-09-294-30/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | The hardware will only forward EXEC_LO; the high 32 bits will be zero. Additionally, inline constants do not work. At least, v_addc_u32_e64 v0, vcc, v0, v1, -1 which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero. The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine s_mov_b64 s[0:1], exec v_cndmask_b32_e64 v0, v1, v2, s[0:1] into v_mov_b32 v0, v3 but it's not particularly high priority. Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.* llvm-svn: 314522
* [SystemZ] implement shouldCoalesce()Jonas Paulsson2017-09-291-0/+18
| | | | | | | | | | | | | | | | | | | Implement shouldCoalesce() to help regalloc avoid running out of GR128 registers. If a COPY involving a subreg of a GR128 is coalesced, the live range of the GR128 virtual register will be extended. If this happens where there are enough phys-reg clobbers present, regalloc will run out of registers (if there is not a single GR128 allocatable register available). This patch tries to allow coalescing only when it can prove that this will be safe by checking the (local) interval in question. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D37899 https://bugs.llvm.org/show_bug.cgi?id=34610 llvm-svn: 314516
* [X86] Improve codegen for inverted overflow checking intrinsics.Amara Emerson2017-09-291-24/+12
| | | | | | | | Adds a new combine for: xor(setcc cc, val), 1 --> setcc (invert(cc), val) Differential Revision: https://reviews.llvm.org/D38161 llvm-svn: 314514
* [mips] Reordering callseq* nodes to be linearAleksandar Beserminji2017-09-297-10/+11
| | | | | | | | | | | | | Fix nested callseq* nodes by moving callseq_start after the arguments calculation to temporary registers, so that callseq* nodes in resulting DAG are linear. Recommitting r314497. This version does not contain test which fails when compiler is not build in debug mode. Differential Revision: https://reviews.llvm.org/D37328 llvm-svn: 314507
* Revert "[mips] Reordering callseq* nodes to be linear"Aleksandar Beserminji2017-09-297-66/+10
| | | | | | | | | Added test relies on the compiler being built in debug mode, which may not be the case. This reverts commit r314497. llvm-svn: 314506
* [X86][SSE] Added more tests for vector multiplications as utility for D37896Simon Pilgrim2017-09-291-0/+302
| | | | | | | | | | | | Added additional tests for vector multiplications with multipliers that are: * powers of 2 displaced by 1, * product of a power of 2 displaced by one with another power of 2. Patch by @pacxx (Michael Haidl) Differential Revision: https://reviews.llvm.org/D38350 llvm-svn: 314504
* [AMDGPU] calling conventions for AMDPAL OS typeTim Renouf2017-09-298-1/+141
| | | | | | | | | | | | | | | Summary: This commit adds comments on how the AMDPAL OS type overloads the existing AMDGPU_ calling conventions used by Mesa, and adds a couple of new ones. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37752 llvm-svn: 314502
* [AMDGPU] AMDPAL scratch buffer supportTim Renouf2017-09-291-1/+46
| | | | | | | | | | | | | | | | | | | | | | | Summary: Added support for scratch (including spilling) for OS type amdpal: generates code to set up the scratch descriptor if it is needed. With amdpal, the scratch resource descriptor is loaded from offset 0 of the global information table. The low 32 bits of the address of the global information table is passed in s0. Added amdgpu-git-ptr-high function attribute to hard-wire the high 32 bits of the address of the global information table. If the function attribute is not specified, or is 0xffffffff, then the backend generates code to use the high 32 bits of pc. The documentation for the AMDPAL ABI will be added in a later commit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye Differential Revision: https://reviews.llvm.org/D37483 llvm-svn: 314501
* [Triple] Add AMDPAL operating system typeTim Renouf2017-09-291-0/+10
| | | | | | | | | | | | | | | | | | Summary: This operating system type represents the AMDGPU PAL runtime, and will be required by the AMDGPU backend in order to generate correct code for this runtime. Currently it generates the same code as not specifying an OS at all. That will change in future commits. Patch from Tim Corringham. Subscribers: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D37380 llvm-svn: 314500
* [mips] Reordering callseq* nodes to be linearAleksandar Beserminji2017-09-297-10/+66
| | | | | | | | | | Fix nested callseq* nodes by moving callseq_start after the arguments calculation to temporary registers, so that callseq* nodes in resulting DAG are linear. Differential Revision: https://reviews.llvm.org/D37328 llvm-svn: 314497
* [X86] Don't select (cmp (and, imm), 0) to testwCraig Topper2017-09-283-8/+42
| | | | | | | | | | | | | | | | | Summary: X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters. I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs. Reviewers: RKSimon, zvi, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38273 llvm-svn: 314474
OpenPOWER on IntegriCloud