path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
* [X86] Add urem vector test for non-uniform pow2 constants | Simon Pilgrim | 2017-07-26 | 1 | -4/+17
    llvm-svn: 309104
* [X86] Regenerated urem pow2 tests on 32/64 bit targets | Simon Pilgrim | 2017-07-26 | 1 | -36/+79
    llvm-svn: 309103
* [X86] Regenerated umul overflow tests on 32/64 bit targets | Simon Pilgrim | 2017-07-26 | 1 | -14/+48
    llvm-svn: 309102
* [ARM] GlobalISel: Map G_GLOBAL_VALUE to GPR | Diana Picus | 2017-07-26 | 1 | -0/+19
    A G_GLOBAL_VALUE is basically a pointer, so it should live in the GPR.
    llvm-svn: 309101
* [X86][AVX] Regenerated and cleaned up AVX1 intrinsic tests. | Simon Pilgrim | 2017-07-26 | 7 | -718/+989
    Cleaned up triple settings, added 32-bit/64-bit targets where useful, added broadcast comments.
    llvm-svn: 309100
* [X86][AVX2] Regenerated and cleaned up broadcast tests. | Simon Pilgrim | 2017-07-26 | 2 | -40/+42
    llvm-svn: 309099
* [X86][AVX512] Regenerated and added 32-bit targets to select tests | Simon Pilgrim | 2017-07-26 | 1 | -106/+256
    llvm-svn: 309098
* [X86][AVX] Regenerated and cleaned up masked gather/scatter tests. | Simon Pilgrim | 2017-07-26 | 1 | -32/+8
    Remove unused KNL checks and triple settings, added broadcast comments.
    llvm-svn: 309097
* [X86][AVX] Regenerate lzcnt test. | Simon Pilgrim | 2017-07-26 | 1 | -36/+36
    Tidied up triples and checks.
    llvm-svn: 309095
* [X86][FMA] Regenerate test with broadcast comments. | Simon Pilgrim | 2017-07-26 | 2 | -13/+13
    llvm-svn: 309093
* [ARM] GlobalISel: Mark G_GLOBAL_VALUE as legal | Diana Picus | 2017-07-26 | 1 | -0/+26
    llvm-svn: 309090
* [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess. | Michael Zuckerman | 2017-07-26 | 1 | -136/+59
    This patch expands the support of lowerInterleavedStore to 32 x i8 with stride 4.
    LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=4, VF=32), and we plan to include more patterns in the future.
    To reach our goal of "more patterns", we include two mask creators. The first function creates a shuffle mask equivalent to the unpacklo/unpackhi instructions. The other creates a mask equivalent to a concat of two half vectors (high/low).
    The goal of the patch is to optimize the following sequence. At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3, each holding 32 chars:
        c0, c1, ..., c31
        m0, m1, ..., m31
        y0, y1, ..., y31
        k0, k1, ..., k31
    These need to be transposed/interleaved and stored like so:
        c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ...
    Reviewers: dorit, Farhana, RKSimon, guyblank, DavidKreitzer
    Differential Revision: https://reviews.llvm.org/D34601
    llvm-svn: 309086
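The store layout described in r309086 can be illustrated with a small Python model. This is only a sketch of the stride-4 interleave the commit message describes, not the shuffle sequence that X86InterleavedAccess emits; the function name `interleave4` and the string labels are assumptions made for illustration.

```python
def interleave4(c, m, y, k):
    """Interleave four equal-length sequences element by element (stride 4)."""
    if not (len(c) == len(m) == len(y) == len(k)):
        raise ValueError("all four inputs must have the same length")
    out = []
    for lanes in zip(c, m, y, k):
        # One element from each input per group: c_i, m_i, y_i, k_i.
        out.extend(lanes)
    return out


# Four hypothetical "registers" of 32 labelled bytes each (VF=32).
c = [f"c{i}" for i in range(32)]
m = [f"m{i}" for i in range(32)]
y = [f"y{i}" for i in range(32)]
k = [f"k{i}" for i in range(32)]

stored = interleave4(c, m, y, k)
```

The result holds 128 elements, starting with `c0 m0 y0 k0 c1 m1 y1 k1`, which is the memory order the commit message asks for.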
* [X86] Prevent selecting masked aligned load instructions if the load should be non-temporal | Craig Topper | 2017-07-26 | 1 | -0/+115
    Summary: The aligned load predicates don't suppress themselves if the load is non-temporal the way the unaligned predicates do. For the most part this isn't a problem, because the aligned predicates are mostly used for instructions that only load, and the non-temporal loads have priority over those. The exception is masked loads.
    Reviewers: RKSimon, zvi
    Reviewed By: RKSimon
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D35712
    llvm-svn: 309079
* [AArch64] Add a test for float argument passing to win64 vararg functions | Martin Storsjo | 2017-07-25 | 1 | -0/+26
    The existing tests only tested how a va_start is lowered.
    Differential Revision: https://reviews.llvm.org/D35540
    llvm-svn: 309015
* [PowerPC] Pretty-print CR bits the way the binutils disassembler does | Nemanja Ivanovic | 2017-07-25 | 6 | -26/+30
    This patch just adds printing of CR bit registers in a more human-readable form akin to that used by the GNU binutils.
    Differential Revision: https://reviews.llvm.org/D31494
    llvm-svn: 309001
* [PowerPC] - Recommit r304907 now that the issue has been fixed | Nemanja Ivanovic | 2017-07-25 | 5 | -6/+567
    This is just a recommit since the issue that the commit exposed is now resolved.
    llvm-svn: 308995
* [X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914) | Simon Pilgrim | 2017-07-25 | 1 | -288/+122
    D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914).
    Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).
    This patch should be considered for the 5.0.0 release branch as well.
    Differential Revision: https://reviews.llvm.org/D35830
    llvm-svn: 308986
* [X86] Regenerate test. | Simon Pilgrim | 2017-07-25 | 1 | -3/+5
    llvm-svn: 308981
* [X86] Regenerate test with broadcast comments. | Simon Pilgrim | 2017-07-25 | 1 | -3/+3
    llvm-svn: 308980
* [X86] Add 24-byte memcmp tests (PR33914) | Simon Pilgrim | 2017-07-25 | 3 | -17/+304
    llvm-svn: 308963
* Fix endianness bug in DAGCombiner::visitTRUNCATE and visitEXTRACT_VECTOR_ELT | Francois Pichet | 2017-07-25 | 1 | -0/+55
    Summary: Do not assume little endian architecture in DAGCombiner::visitTRUNCATE and DAGCombiner::visitEXTRACT_VECTOR_ELT.
    PR33682
    Reviewers: hfinkel, sdardis, RKSimon
    Reviewed By: sdardis, RKSimon
    Subscribers: uabelho, RKSimon, sdardis, llvm-commits
    Differential Revision: https://reviews.llvm.org/D34990
    llvm-svn: 308960
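Why endianness matters for this kind of combine can be shown with a short Python model. It assumes that reinterpreting a vector as one wide integer follows memory order, so the element holding the least significant bits is the first one on little-endian targets and the last one on big-endian targets. The helper name `low_element_index` is hypothetical and is not a DAGCombiner function.

```python
import struct


def low_element_index(num_elts, little_endian):
    """Index of the vector element that holds the least significant bits
    when the whole vector is reinterpreted as a single wide integer."""
    return 0 if little_endian else num_elts - 1


value = 0x11223344

# The same 32-bit value viewed as a vector of four bytes in memory order.
le_bytes = list(struct.pack("<I", value))  # little-endian layout
be_bytes = list(struct.pack(">I", value))  # big-endian layout

# Truncating the integer to 8 bits must pick the element with the low bits.
le_trunc = le_bytes[low_element_index(4, little_endian=True)]
be_trunc = be_bytes[low_element_index(4, little_endian=False)]
```

Both `le_trunc` and `be_trunc` equal `value & 0xFF`; always picking element 0 would give the wrong byte for the big-endian layout.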
* [AArch64] Reserve a 16 byte aligned amount of fixed stack for win64 varargs | Martin Storsjo | 2017-07-25 | 2 | -11/+65
    Create a dummy 8 byte fixed object for the unused slot below the first stored vararg.
    Alternative ideas tested but skipped: One could try to align the whole fixed object to 16, but I haven't found how to add an offset to the stack frame used in LowerWin64_VASTART.
    If only the size of the fixed stack object is padded but not the offset, via MFI.CreateFixedObject(alignTo(GPRSaveSize, 16), -(int)GPRSaveSize, false), PrologEpilogInserter crashes due to "Attempted to reset backwards range!".
    This fixes misconceptions about where registers are spilled, since AArch64FrameLowering.cpp assumes the offset from fixed objects is aligned to 16 bytes (and the Win64 case there already manually aligns the offset to 16 bytes).
    This fixes cases where local stack allocations could overwrite callee saved registers on the stack.
    Differential Revision: https://reviews.llvm.org/D35720
    llvm-svn: 308950
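The arithmetic behind the dummy 8 byte object in r308950 can be sketched in Python. `align_to` mirrors the round-up behaviour of the `alignTo` call quoted in the commit message; `vararg_padding` and the example sizes are assumptions for illustration, not code from the patch.

```python
def align_to(value, align):
    """Round value up to the next multiple of align (align must be positive)."""
    if align <= 0:
        raise ValueError("align must be positive")
    return (value + align - 1) // align * align


def vararg_padding(gpr_save_size):
    """Bytes needed to pad the GPR save area up to a 16 byte multiple."""
    return align_to(gpr_save_size, 16) - gpr_save_size


# Hypothetical example: seven 8-byte registers saved for varargs.
odd_save_size = 7 * 8
# Hypothetical example: eight 8-byte registers saved for varargs.
even_save_size = 8 * 8
```

With an odd number of 8-byte slots, `vararg_padding` returns 8, which matches the size of the dummy object the commit creates; with an even number it returns 0.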
* [Hexagon] Recognize C4_cmpneqi, C4_cmpltei and C4_cmplteui in NewValueJump | Krzysztof Parzyszek | 2017-07-24 | 1 | -0/+48
    llvm-svn: 308914
* Adding base test for interleave store VF16 and expand the test for AVX512 | Michael Zuckerman | 2017-07-24 | 1 | -81/+245
    This patch doesn't modify any non-test file.
    llvm-svn: 308909
* [X86][AVX512] Add patterns for masked AVX512 floating point compare instructions that were missing. | Ayman Musa | 2017-07-24 | 1 | -129/+4503
    The patterns were missed by D33188; adding them for completeness. Also updating the test.
    Differential Revision: https://reviews.llvm.org/D35179
    llvm-svn: 308868
* [AVR] Remove the instrumentation pass | Dylan McKay | 2017-07-23 | 1 | -62/+0
    I have a much better way of running integration tests now.
    https://github.com/dylanmckay/avr-test-suite
    llvm-svn: 308857
* [AVR] Improve the 'icall-func-pointer-correct-addr-space.ll' test | Dylan McKay | 2017-07-23 | 1 | -8/+20
    Patch by Carl Peto.
    llvm-svn: 308856
* [CodeGen][X86] Fuchsia supports sincos* libcalls and sin+cos->sincos optimization | Petr Hosek | 2017-07-23 | 1 | -0/+2
    Patch by Roland McGrath
    Differential Revision: https://reviews.llvm.org/D35748
    llvm-svn: 308854
* [AArch64] Add test for function alignment for an optsize function (NFC). | Florian Hahn | 2017-07-23 | 1 | -17/+24
    Reviewers: dblaikie, t.p.northover, rengolin
    Reviewed By: rengolin
    Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls
    Differential Revision: https://reviews.llvm.org/D35620
    llvm-svn: 308852
* [AArch64] Redundant Copy Elimination - remove more zero copies. | Chad Rosier | 2017-07-23 | 1 | -0/+565
    This patch removes unnecessary zero copies in BBs that are targets of b.eq/b.ne and we know the result of the compare instruction is zero. For example,
        BB#0:
          subs w0, w1, w2
          str w0, [x1]
          b.ne .LBB0_2
        BB#1:
          mov w0, wzr ; <-- redundant
          str w0, [x2]
        .LBB0_2
    Differential Revision: https://reviews.llvm.org/D35075
    llvm-svn: 308849
* [X86] Add patterns for memory forms of SARX/SHLX/SHRX with careful complexity adjustment to keep shift by immediate using the legacy instructions. | Craig Topper | 2017-07-23 | 1 | -9/+9
    These patterns were only missing to favor using the legacy instructions when the shift was a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns.
    llvm-svn: 308834
* [DAG] Fix typo preventing some stores merges to truncated stores. | Nirav Dave | 2017-07-23 | 2 | -8/+5
    Check the actual memory type stored and not the extended value size when considering if truncated store merge is worthwhile.
    Reviewers: efriedma, RKSimon, spatel, jyknight
    Reviewed By: efriedma
    Subscribers: llvm-commits, nhaehnle
    Differential Revision: https://reviews.llvm.org/D35623
    llvm-svn: 308833
* RA: Remove another assert on empty intervals | Matt Arsenault | 2017-07-22 | 1 | -0/+34
    This case is similar to the one fixed in r308808, except when rematerializing.
    Fixes bug 33884.
    llvm-svn: 308813
* RA: Remove assert on empty live intervals | Matt Arsenault | 2017-07-21 | 1 | -0/+40
    This is possible if there is an undef use when splitting the vreg during spilling.
    Fixes bug 33620.
    llvm-svn: 308808
* Remove Bitrig: LLVM Changes | Erich Keane | 2017-07-21 | 1 | -4/+0
    Bitrig code has been merged back to OpenBSD, thus the OS has been abandoned.
    Differential Revision: https://reviews.llvm.org/D35707
    llvm-svn: 308799
* AMDGPU: Implement memory model | Konstantin Zhuravlyov | 2017-07-21 | 13 | -34/+1863
    llvm-svn: 308781
* [Hexagon] Add inline-asm constraint 'a' for modifier register class | Krzysztof Parzyszek | 2017-07-21 | 1 | -0/+16
    For example
        asm ("memw(%0++%1) = %2" : : "r"(addr),"a"(mod),"r"(val) : "memory")
    llvm-svn: 308761
* [mips] Support -membedded-data and fix a related bug | Simon Dardis | 2017-07-21 | 1 | -6/+22
    -membedded-data changes the location of constant data from the .sdata to the .rodata section. Previously it was (incorrectly) always located in the .rodata section.
    Reviewers: atanasyan
    Differential Revision: https://reviews.llvm.org/D35686
    llvm-svn: 308758
* [SystemZ] test update | Jonas Paulsson | 2017-07-21 | 1 | -1/+1
    test/CodeGen/SystemZ/loop-01.ll was incorrectly updated by r308729.
    llvm-svn: 308736
* [SystemZ, LoopStrengthReduce] | Jonas Paulsson | 2017-07-21 | 2 | -2/+83
    This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory.
    In order to achieve this, the following common code changes were made:
      * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers.
      * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores.
    SystemZ changes:
      * isLSRCostLess() implemented with Insns first, and without ImmCost.
      * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers.
    Review: Ulrich Weigand, Quentin Colombet
    https://reviews.llvm.org/D35262
    https://reviews.llvm.org/D35049
    llvm-svn: 308729
* [X86][SSE] Add extra (sra (sra x, c1), c2) -> (sra x, (add c1, c2)) test case | Simon Pilgrim | 2017-07-21 | 1 | -0/+30
    We should be able to handle the case where some c1+c2 elements exceed max shift and some don't by performing a clamp after the sum.
    llvm-svn: 308724
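The clamped fold mentioned in r308724 can be checked with a small Python model. It assumes 32-bit signed lanes and a maximum legal shift amount of `bits - 1`; the helper names are hypothetical and this models only the arithmetic, not the DAG combine.

```python
def sra(x, c, bits=32):
    """Arithmetic shift right of a signed value that fits in `bits` bits."""
    assert 0 <= c < bits, "shift amount must be below the bit width"
    return x >> c  # Python's >> on ints is an arithmetic shift


def sra_twice(x, c1, c2, bits=32):
    """The original pattern: (sra (sra x, c1), c2)."""
    return sra(sra(x, c1, bits), c2, bits)


def sra_folded(x, c1, c2, bits=32):
    """The folded pattern: one shift by the sum, clamped to bits - 1."""
    return sra(x, min(c1 + c2, bits - 1), bits)


# Per-lane example where one sum exceeds the maximum shift and one does not.
lanes = [-1000, 1000]
c1 = [20, 3]
c2 = [20, 4]
twice = [sra_twice(x, a, b) for x, a, b in zip(lanes, c1, c2)]
folded = [sra_folded(x, a, b) for x, a, b in zip(lanes, c1, c2)]
```

Clamping is safe for arithmetic shifts because shifting by `bits - 1` already leaves only copies of the sign bit, so any larger combined amount gives the same result.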
* [X86][SSE] Add pre-AVX2 support for (i32 bitcast(v32i1)) -> 2xMOVMSK | Simon Pilgrim | 2017-07-21 | 4 | -871/+88
    Currently we only support (i32 bitcast(v32i1)) using the AVX2 VPMOVMSKB ymm instruction. This patch adds support for splitting pre-AVX2 targets into 2 x (V)PMOVMSKB xmm instructions and merging the integer results.
    In future we could probably generalize this to handle more cases.
    Differential Revision: https://reviews.llvm.org/D35303
    llvm-svn: 308723
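The split-and-merge described in r308723 can be modelled in Python. The sketch assumes each v32i1 element is represented by the sign bit of a byte lane, as the byte mask instruction reads it; `movmsk16` and `bitcast_v32i1_to_i32` are hypothetical names, not LLVM functions.

```python
def movmsk16(lanes):
    """Model of a 16-lane byte mask: bit i is the sign bit of byte lane i."""
    assert len(lanes) == 16
    mask = 0
    for i, byte in enumerate(lanes):
        mask |= ((byte >> 7) & 1) << i
    return mask


def bitcast_v32i1_to_i32(lanes):
    """Build a 32-bit mask from two 16-lane halves and merge the results."""
    assert len(lanes) == 32
    lo = movmsk16(lanes[:16])
    hi = movmsk16(lanes[16:])
    return lo | (hi << 16)


# Lanes 0, 17 and 31 are "true" (all bits set), the rest are "false".
lanes = [0xFF if i in (0, 17, 31) else 0x00 for i in range(32)]
result = bitcast_v32i1_to_i32(lanes)
```

Here `result` is `0x80020001`: the low half contributes bit 0, and the high half contributes bits 17 and 31 after the shift by 16.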
* [AVX-512] Fix a bug that prevented some non-temporal loads from using the movntdqa instruction. | Craig Topper | 2017-07-21 | 1 | -30/+9
    The bitconverts here had an input type of 128-bits and an output type of 256 bits. The input type should also have been 256 bits.
    llvm-svn: 308702
* Recommit: GlobalISel: select G_EXTRACT and G_INSERT instructions on AArch64. | Tim Northover | 2017-07-20 | 1 | -0/+54
    It revealed a bug in the Localizer pass which has now been fixed. This includes the fix for SUBREG_TO_REG committed separately last time.
    llvm-svn: 308688
* GlobalISel: stop localizer putting constants before EH_LABELs | Tim Northover | 2017-07-20 | 1 | -0/+48
    If the localizer pass puts one of its constants before the label that tells the unwinder "jump here to handle your exception" then control-flow will skip it, leaving uninitialized registers at runtime. That's bad.
    llvm-svn: 308687
* [NVPTX] Add lowering of i128 params. | Artem Belevich | 2017-07-20 | 3 | -0/+93
    The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
    Currently, NVPTX can't deal with the 128 bit integers:
      * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition"
      * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
    [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
    Differential Revision: https://reviews.llvm.org/D34555
    Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
    llvm-svn: 308675
* Add an ID field to StackObjects | Matt Arsenault | 2017-07-20 | 11 | -60/+202
    On AMDGPU SGPR spills are really spilled to another register. The spiller creates the spills to new frame index objects, which are used as placeholders.
    This will eventually be replaced with a reference to a position in a VGPR to write to and the frame index deleted. It is most likely not a real stack location that can be shared with another stack object.
    This is a problem when StackSlotColoring decides it should combine a frame index used for a normal VGPR spill with a real stack location and a frame index used for an SGPR.
    Add an ID field so that StackSlotColoring has a way of knowing the different frame index types are incompatible.
    llvm-svn: 308673
* [X86] Adding ISel tests for strided-shuffles with non-zero offset. NFC. | Zvi Rackover | 2017-07-20 | 3 | -0/+3641
    llvm-svn: 308672
* [SPARC] Clean up the support for disabling fsmuld and fmuls instructions. | James Y Knight | 2017-07-20 | 2 | -13/+38
    Summary: Also enable no-fsmuld for sparcv7 (which doesn't have the instruction).
    The previous code which used a post-processing pass to do this was unnecessary; disabling the instruction is entirely sufficient.
    Reviewers: jacob_hansen, ekedaigle
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D35576
    llvm-svn: 308661
* [X86] Allow masks with more than 6 bits set on the x << (y & mask) optimization for the 64-bit memory shifts. | Craig Topper | 2017-07-20 | 1 | -1/+0
    llvm-svn: 308657
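The reasoning behind the x << (y & mask) optimization in r308657 can be sketched in Python. The model assumes a 64-bit x86 variable shift, where the hardware only looks at the low 6 bits of the count, so an explicit mask that keeps all six low bits does not change the result. The helper names are hypothetical and this is not the instruction selection code.

```python
MASK64 = (1 << 64) - 1


def shl64_hw(x, count):
    """Model of a 64-bit shift left: only the low 6 bits of count are used."""
    return (x << (count & 63)) & MASK64


def mask_is_redundant(mask):
    """An explicit 'and' on the count can be dropped when the mask keeps
    all six low bits, regardless of which higher bits are also set."""
    return mask & 63 == 63


# A mask with more than 6 bits set, as in the commit title.
wide_mask = 0xFF
with_and = [shl64_hw(5, y & wide_mask) for y in range(256)]
without_and = [shl64_hw(5, y) for y in range(256)]
```

Both lists are identical, which is why a mask such as `0xFF` can be treated like the exact 6-bit mask `0x3F`; a narrower mask such as `0x1F` does change the count and must be kept.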