path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914) | Simon Pilgrim | 2017-07-25 | 1 | -2/+2
  D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining, which resulted in regressions for some optimized libc memcmp implementations (PR33914).
  Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).
  This patch should be considered for the 5.0.0 release branch as well.
  Differential Revision: https://reviews.llvm.org/D35830
  llvm-svn: 308986
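To make the term "load pair" concrete, here is a minimal C++ sketch of an equality-only comparison of exactly 16 bytes done with two load pairs, where one pair is a load from each buffer. It illustrates the general idea only; the names `load64` and `equal16` are hypothetical, and this is not the IR or machine code that the expansion actually produces.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: an unaligned 8-byte load, written with memcpy so that
// it is well-defined C++.
static std::uint64_t load64(const unsigned char *p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Equality-only comparison of exactly 16 bytes using two load pairs.
// Each pair is one 8-byte load from `a` and one 8-byte load from `b`.
bool equal16(const void *a, const void *b) {
  assert(a != nullptr && b != nullptr);
  const auto *pa = static_cast<const unsigned char *>(a);
  const auto *pb = static_cast<const unsigned char *>(b);
  std::uint64_t diff0 = load64(pa) ^ load64(pb);         // load pair 1
  std::uint64_t diff1 = load64(pa + 8) ^ load64(pb + 8); // load pair 2
  return (diff0 | diff1) == 0; // any differing bit makes the result non-zero
}
```

With a cap of 2 load pairs, a comparison of this shape is the largest that would be written out inline; bigger sizes would go through the library call instead.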
* [Sparc] invalid adjustments in TLS_LE/TLS_LDO relocations removed | Fedor Sergeev | 2017-07-25 | 1 | -8/+7
  Summary:
  Some SPARC TLS relocations were applying nontrivial adjustments to a zero value, leading to unexpected non-zero values in ELF and then to Solaris linker failures.
  This patch removes these adjustments.
  Fixes PR33825.
  Reviewers: rafael, asb, jyknight
  Subscribers: joerg, jyknight, llvm-commits
  Differential Revision: https://reviews.llvm.org/D35567
  llvm-svn: 308978
* X86 Asm uses assertions instead of proper diagnostic. This patch fixes that. | Andrew V. Tischenko | 2017-07-25 | 1 | -23/+57
  Differential Revision: https://reviews.llvm.org/D35115
  llvm-svn: 308972
* This patch enables the usage of constant Enum identifiers within Microsoft style inline assembly statements. | Matan Haroush | 2017-07-25 | 1 | -21/+55
  Differential Revision:
  https://reviews.llvm.org/D33277
  https://reviews.llvm.org/D33278
  llvm-svn: 308966
* [ARM] Enable partial and runtime unrolling | Sam Parker | 2017-07-25 | 2 | -0/+35
  Enable runtime and partial loop unrolling of simple loops without calls on M-class cores. The thresholds are calculated based on whether the target is Thumb or Thumb-2.
  Differential Revision: https://reviews.llvm.org/D34619
  llvm-svn: 308956
* [AArch64] Reserve a 16 byte aligned amount of fixed stack for win64 varargs | Martin Storsjo | 2017-07-25 | 1 | -2/+5
  Create a dummy 8 byte fixed object for the unused slot below the first stored vararg.
  Alternative ideas tested but skipped: One could try to align the whole fixed object to 16, but I haven't found how to add an offset to the stack frame used in LowerWin64_VASTART.
  If only the size of the fixed stack object is padded but not the offset, via MFI.CreateFixedObject(alignTo(GPRSaveSize, 16), -(int)GPRSaveSize, false), PrologEpilogInserter crashes due to "Attempted to reset backwards range!".
  This fixes misconceptions about where registers are spilled, since AArch64FrameLowering.cpp assumes the offset from fixed objects is aligned to 16 bytes (and the Win64 case there already manually aligns the offset to 16 bytes).
  This fixes cases where local stack allocations could overwrite callee saved registers on the stack.
  Differential Revision: https://reviews.llvm.org/D35720
  llvm-svn: 308950
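The arithmetic behind the dummy 8 byte object can be sketched as follows. This is a standalone illustration, not LLVM code: `alignUp` is a hypothetical stand-in for the `alignTo` call mentioned in the message, `paddingBelowVarargs` is an invented name, and the sketch assumes the GPR save area is always a multiple of 8 bytes (one slot per unused 64-bit argument register).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an align-to helper: round `value` up to the next
// multiple of `align`, where `align` must be a power of two.
std::uint64_t alignUp(std::uint64_t value, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "power of two expected");
  return (value + align - 1) & ~(align - 1);
}

// Bytes needed below the first stored vararg so that the whole fixed area
// becomes a multiple of 16. Under the 8-byte-slot assumption this is 0 or 8.
std::uint64_t paddingBelowVarargs(std::uint64_t gprSaveSize) {
  assert(gprSaveSize % 8 == 0 && "assumed: one 8-byte slot per register");
  return alignUp(gprSaveSize, 16) - gprSaveSize;
}
```

For example, a 24 byte save area (three registers) would get 8 bytes of padding, while a 32 byte save area would get none.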
* Revert "[X86][InlineAsm][Ms Compatibility]Prefer variable name over a register when the two collides" | Reid Kleckner | 2017-07-24 | 1 | -12/+1
  This reverts r308867 and r308866.
  It broke the sanitizer-windows buildbot on C++ code similar to the following:

      namespace cl { }
      void f() {
        __asm {
          mov al, cl
        }
      }

      t.cpp(4,13): error: unexpected namespace name 'cl': expected expression
          mov al, cl
                  ^

  In this case, MSVC parses 'cl' as a register, not a namespace.
  llvm-svn: 308926
* [Hexagon] Recognize C4_cmpneqi, C4_cmpltei and C4_cmplteui in NewValueJump | Krzysztof Parzyszek | 2017-07-24 | 1 | -1/+25
  llvm-svn: 308914
* [AArch64] Adjust the cost model for Exynos M1 and M2 | Evandro Menezes | 2017-07-24 | 1 | -4/+3
  Fine tune the resources in a couple of ASIMD loads.
  llvm-svn: 308904
* AMDGPU: Fix allocating pseudo-registers | Matt Arsenault | 2017-07-24 | 2 | -2/+6
  There's no need for these to be part of a class since they are immediately replaced.
  New unreachable hit in existing tests.
  llvm-svn: 308903
* [X86][AVX512] Add patterns for masked AVX512 floating point compare instructions that were missing. | Ayman Musa | 2017-07-24 | 1 | -1/+52
  These patterns were missed by D33188; adding them for completeness. Also updates the test.
  Differential Revision: https://reviews.llvm.org/D35179
  llvm-svn: 308868
* [X86][InlineAsm][Ms Compatibility]Prefer variable name over a register when the two collides | Coby Tayree | 2017-07-24 | 1 | -1/+12
  On MS-style, the following snippet:

      int eax;
      __asm mov eax, ebx

  should load ebx into the location denoted by the variable eax.
  This patch sees to it.
  Currently, a reg-to-reg move is emitted instead.
  clang: D34740
  Differential Revision: https://reviews.llvm.org/D34739
  llvm-svn: 308866
* [AVR] Remove the instrumentation pass | Dylan McKay | 2017-07-23 | 4 | -226/+0
  I have a much better way of running integration tests now.
  https://github.com/dylanmckay/avr-test-suite
  llvm-svn: 308857
* [CodeGen][X86] Fuchsia supports sincos* libcalls and sin+cos->sincos optimization | Petr Hosek | 2017-07-23 | 1 | -3/+6
  Patch by Roland McGrath
  Differential Revision: https://reviews.llvm.org/D35748
  llvm-svn: 308854
* [AArch64] Redundant Copy Elimination - remove more zero copies. | Chad Rosier | 2017-07-23 | 1 | -65/+160
  This patch removes unnecessary zero copies in BBs that are targets of b.eq/b.ne and we know the result of the compare instruction is zero. For example,

      BB#0:
        subs w0, w1, w2
        str w0, [x1]
        b.ne .LBB0_2
      BB#1:
        mov w0, wzr  ; <-- redundant
        str w0, [x2]
      .LBB0_2

  Differential Revision: https://reviews.llvm.org/D35075
  llvm-svn: 308849
* [X86] Add some hasSideEffects=0 flags. | Craig Topper | 2017-07-23 | 2 | -1/+4
  llvm-svn: 308835
* [X86] Add patterns for memory forms of SARX/SHLX/SHRX with careful complexity adjustment to keep shift by immediate using the legacy instructions. | Craig Topper | 2017-07-23 | 2 | -6/+59
  These patterns were only missing to favor using the legacy instructions when the shift was a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns.
  llvm-svn: 308834
* [X86] Add nopq instruction which is a rex encoded version of nopl for gas compatibility. | Craig Topper | 2017-07-22 | 1 | -0/+4
  llvm-svn: 308818
* [X86] Add register form of NOPL and NOPW for assembler/disassembler. | Craig Topper | 2017-07-22 | 1 | -0/+5
  Fixes PR32805.
  llvm-svn: 308817
* AMDGPU: Remove leftover td file | Matt Arsenault | 2017-07-22 | 2 | -16/+0
  All of the instructions were moved out of this a while ago, so it's just a useless comment now.
  llvm-svn: 308815
* Remove Bitrig: LLVM Changes | Erich Keane | 2017-07-21 | 1 | -1/+0
  Bitrig code has been merged back to OpenBSD, thus the OS has been abandoned.
  Differential Revision: https://reviews.llvm.org/D35707
  llvm-svn: 308799
* X86InterleaveAccess: A fix for bug33826 | Farhana Aleen | 2017-07-21 | 1 | -13/+18
  Reviewers: DavidKreitzer
  Differential Revision: https://reviews.llvm.org/D35638
  llvm-svn: 308784
* AMDGPU: Implement memory model | Konstantin Zhuravlyov | 2017-07-21 | 8 | -6/+591
  llvm-svn: 308781
* [PPC] Add Defs = [CARRY] to MIR SRADI_32 | Guozhi Wei | 2017-07-21 | 1 | -1/+1
  MIR SRADI uses instruction template XSForm_1rc, which declares Defs = [CARRY]. But MIR SRADI_32 uses instruction template XSForm_1, which doesn't declare such an implicit definition. With patch D33720 this causes wrong code generation for perl.
  This patch adds the implicit definition.
  Differential Revision: https://reviews.llvm.org/D35699
  llvm-svn: 308780
* AMDGPU: Introduce maybeAtomic instruction flag | Konstantin Zhuravlyov | 2017-07-21 | 5 | -3/+17
  Testing is in the follow up change
  llvm-svn: 308779
* AMDGPU: Preserve undef flag in eliminateFrameIndex | Matt Arsenault | 2017-07-21 | 1 | -10/+9
  Fixes verifier errors in some call tests. Not sure why we haven't run into this before.
  The test is split into a separate patch, to land once call support is committed.
  llvm-svn: 308774
* AMDGPU: Partially fix improper reliance on memoperands | Matt Arsenault | 2017-07-21 | 1 | -17/+26
  There are 2 more places doing this, but I'm not sure what they are doing, and they don't make any sense to me.
  llvm-svn: 308770
* AMDGPU: Don't track lgkmcnt for global_/scratch_ instructions | Matt Arsenault | 2017-07-21 | 3 | -9/+17
  llvm-svn: 308766
* AMDGPU: Fix getMemOpBaseRegImmOfs for flat with offsets | Matt Arsenault | 2017-07-21 | 1 | -3/+13
  llvm-svn: 308762
* [Hexagon] Add inline-asm constraint 'a' for modifier register class | Krzysztof Parzyszek | 2017-07-21 | 1 | -2/+10
  For example

      asm ("memw(%0++%1) = %2" : : "r"(addr),"a"(mod),"r"(val) : "memory")

  llvm-svn: 308761
* [mips] Support -membedded-data and fix a related bug | Simon Dardis | 2017-07-21 | 1 | -2/+15
  -membedded-data changes the location of constant data from the .sdata to the .rodata section. Previously it was (incorrectly) always located in the .rodata section.
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D35686
  llvm-svn: 308758
* AMDGPU: Add instruction definitions for some scratch_* instructions | Matt Arsenault | 2017-07-21 | 5 | -19/+108
  Omit atomics for now since they probably aren't useful.
  llvm-svn: 308747
* [mips] Enable IAS by default for Android MIPS64 | Petar Jovanovic | 2017-07-21 | 1 | -0/+4
  Follow up to r306280 in Clang.
  Enable IAS by default for Android MIPS64 (uses N64 ABI).
  Differential Revision: https://reviews.llvm.org/D35482
  llvm-svn: 308742
* [AMDGPU][MC][GFX9] Added support of VOP3 'op_sel' modifier | Dmitry Preobrazhensky | 2017-07-21 | 10 | -40/+343
  See bug 33591: https://bugs.llvm.org//show_bug.cgi?id=33591
  Reviewers: vpykhtin, artem.tamazov, SamWot, arsenm
  Differential Revision: https://reviews.llvm.org/D35424
  llvm-svn: 308740
* [SystemZ, LoopStrengthReduce] | Jonas Paulsson | 2017-07-21 | 26 | -61/+164
  This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory.
  In order to achieve this, the following common code changes were made:
    * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers.
    * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores.
  SystemZ changes:
    * isLSRCostLess() implemented with Insns first, and without ImmCost.
    * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers.
  Review: Ulrich Weigand, Quentin Colombet
  https://reviews.llvm.org/D35262
  https://reviews.llvm.org/D35049
  llvm-svn: 308729
* [X86][SSE] Add pre-AVX2 support for (i32 bitcast(v32i1)) -> 2xMOVMSK | Simon Pilgrim | 2017-07-21 | 1 | -3/+15
  Currently we only support (i32 bitcast(v32i1)) using the AVX2 VPMOVMSKB ymm instruction. This patch adds support for splitting pre-AVX2 targets into 2 x (V)PMOVMSKB xmm instructions and merging the integer results.
  In future we could probably generalize this to handle more cases.
  Differential Revision: https://reviews.llvm.org/D35303
  llvm-svn: 308723
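The "split and merge" step can be modelled in scalar C++ as below. This is only an illustration of how two 16-bit byte masks combine into one 32-bit result; `moveMask16` and `moveMask32` are hypothetical names that emulate the sign-bit gathering in plain code rather than using intrinsics, and the merge order (low half in the low 16 bits) is an assumption consistent with a bitcast of v32i1 to i32.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of a byte move-mask over one 128-bit half: bit i of the result
// is the most significant bit of byte i.
std::uint32_t moveMask16(const std::uint8_t *bytes) {
  assert(bytes != nullptr);
  std::uint32_t mask = 0;
  for (int i = 0; i < 16; ++i)
    mask |= static_cast<std::uint32_t>(bytes[i] >> 7) << i;
  return mask;
}

// 32-lane mask built from two 16-lane masks and merged as integers.
std::uint32_t moveMask32(const std::array<std::uint8_t, 32> &v) {
  std::uint32_t lo = moveMask16(v.data());      // lanes 0..15
  std::uint32_t hi = moveMask16(v.data() + 16); // lanes 16..31
  return lo | (hi << 16);
}
```

The merge is a shift and an OR, which is why splitting into two 128-bit operations is a cheap fallback when the 256-bit instruction is unavailable.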
* [AVX-512] Fix a bug that prevented some non-temporal loads from using the movntdqa instruction. | Craig Topper | 2017-07-21 | 1 | -3/+3
  The bitconverts here had an input type of 128 bits and an output type of 256 bits. The input type should also have been 256 bits.
  llvm-svn: 308702
* [AArch64] Adjust the cost model for Exynos M1 and M2 | Evandro Menezes | 2017-07-20 | 1 | -10/+16
  Add the cost for the EXT instructions and explicitly add the cost for a few instructions that were implied by the coarse model.
  llvm-svn: 308697
* Recommit: GlobalISel: select G_EXTRACT and G_INSERT instructions on AArch64. | Tim Northover | 2017-07-20 | 1 | -1/+49
  It revealed a bug in the Localizer pass which has now been fixed. This includes the fix for SUBREG_TO_REG committed separately last time.
  llvm-svn: 308688
* [NVPTX] Add lowering of i128 params. | Artem Belevich | 2017-07-20 | 3 | -11/+32
  The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
  Currently, NVPTX can't deal with the 128 bit integers:
    * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition"
    * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
  [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
  Differential Revision: https://reviews.llvm.org/D34555
  Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
  llvm-svn: 308675
* AMDGPU: Rename _RTN atomic instructions | Matt Arsenault | 2017-07-20 | 2 | -26/+26
  Move the _RTN to the end of the name. It reads better if the other addressing mode components line up with the non-RTN version. It is also more convenient to define saddr variants of FLAT atomics to have the RTN last, and it is good to have a consistent naming scheme.
  llvm-svn: 308674
* Add an ID field to StackObjects | Matt Arsenault | 2017-07-20 | 1 | -0/+2
  On AMDGPU, SGPR spills are really spilled to another register. The spiller creates the spills to new frame index objects, which are used as placeholders.
  Each placeholder will eventually be replaced with a reference to a position in a VGPR to write to, and the frame index deleted. It is most likely not a real stack location that can be shared with another stack object.
  This is a problem when StackSlotColoring decides it should combine a frame index used for a normal VGPR spill with a real stack location and a frame index used for an SGPR.
  Add an ID field so that StackSlotColoring has a way of knowing the different frame index types are incompatible.
  llvm-svn: 308673
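A rough model of how such an ID could be consulted when deciding whether two frame indexes may share a slot is sketched below. None of these names come from LLVM: `StackObjectSketch`, `canShareSlot`, `mergedSize` and the ID values are hypothetical, and the real coloring pass considers much more than this single check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ID values: 0 for an ordinary stack slot, non-zero for a
// target-specific kind such as an SGPR-spill placeholder.
enum : std::uint8_t { kDefaultStack = 0, kSGPRSpillPlaceholder = 1 };

// Hypothetical stand-in for a frame-index record.
struct StackObjectSketch {
  std::uint64_t size;
  std::uint8_t stackID;
};

// Two objects are only candidates for sharing a slot if their IDs match.
bool canShareSlot(const StackObjectSketch &a, const StackObjectSketch &b) {
  return a.stackID == b.stackID;
}

// Size of a shared slot; callers are expected to check compatibility first.
std::uint64_t mergedSize(const StackObjectSketch &a, const StackObjectSketch &b) {
  assert(canShareSlot(a, b) && "incompatible frame index kinds");
  return a.size > b.size ? a.size : b.size;
}
```

In this model a VGPR spill slot and an SGPR placeholder are never merged, which is the incompatibility the message describes.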
* Changed EOL back to LF. NFC. | Artem Belevich | 2017-07-20 | 2 | -7832/+7832
  llvm-svn: 308671
* [COFF, ARM64, CodeView] Add support to emit CodeView debug info for ARM64 COFF | Mandeep Singh Grang | 2017-07-20 | 4 | -1/+18
  Reviewers: compnerd, ruiu, rnk, zturner
  Reviewed By: rnk
  Subscribers: majnemer, aemerson, aprantl, javed.absar, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D35518
  llvm-svn: 308665
* [SPARC] Clean up the support for disabling fsmuld and fmuls instructions. | James Y Knight | 2017-07-20 | 9 | -279/+25
  Summary:
  Also enable no-fsmuld for sparcv7 (which doesn't have the instruction).
  The previous code which used a post-processing pass to do this was unnecessary; disabling the instruction is entirely sufficient.
  Reviewers: jacob_hansen, ekedaigle
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D35576
  llvm-svn: 308661
* Implement LaneBitmask::getNumLanes and LaneBitmask::getHighestLane | Krzysztof Parzyszek | 2017-07-20 | 2 | -3/+2
  This should eliminate most uses of countPopulation and Log2_32 on the lane mask values.
  llvm-svn: 308658
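The two accessors correspond to a population count and a floor log2 of the mask. The following standalone sketch shows that relationship with portable loops; `LaneMaskSketch`, `numLanes` and `highestLane` are hypothetical names, and the 32-bit mask width is an assumption, so this should not be read as the actual `LaneBitmask` class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a lane mask, assuming one bit per lane in 32 bits.
struct LaneMaskSketch {
  std::uint32_t bits;

  // Number of lanes present: the population count of the mask.
  unsigned numLanes() const {
    unsigned n = 0;
    for (std::uint32_t b = bits; b != 0; b &= b - 1) // clear lowest set bit
      ++n;
    return n;
  }

  // Index of the highest lane present: floor(log2(bits)) for a non-zero mask.
  unsigned highestLane() const {
    assert(bits != 0 && "no lanes set");
    unsigned lane = 0;
    for (std::uint32_t b = bits >> 1; b != 0; b >>= 1)
      ++lane;
    return lane;
  }
};
```

Wrapping the bit tricks in named accessors lets callers ask the question they mean instead of applying generic bit-math helpers to the raw mask value.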
* [X86] Allow masks with more than 6 bits set on the x << (y & mask) optimization for the 64-bit memory shifts. | Craig Topper | 2017-07-20 | 1 | -1/+1
  llvm-svn: 308657
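The reasoning behind this kind of optimization can be sketched in C++: x86 64-bit shifts only use the low 6 bits of the count, so an explicit AND on the count is redundant whenever the mask keeps all of those 6 bits, even if it also keeps higher bits. The function names below are hypothetical, and the exact condition the patch checks is not shown in the log, so treat this as a model of the idea rather than the pattern itself.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kCountMask64 = 63; // low 6 bits of the shift count

// Models a 64-bit shift that, like the hardware, ignores count bits above 5.
std::uint64_t hardwareShl64(std::uint64_t x, std::uint64_t count) {
  return x << (count & kCountMask64);
}

// The explicit AND is redundant if it preserves every bit the shift reads.
bool andMaskIsRedundant(std::uint64_t mask) {
  return (mask & kCountMask64) == kCountMask64;
}

// x << (y & mask), as written in the source program.
std::uint64_t shlWithMask(std::uint64_t x, std::uint64_t y, std::uint64_t mask) {
  return hardwareShl64(x, y & mask);
}

// The same shift with the AND dropped; only valid for redundant masks.
std::uint64_t shlDroppingMask(std::uint64_t x, std::uint64_t y, std::uint64_t mask) {
  assert(andMaskIsRedundant(mask));
  return hardwareShl64(x, y);
}
```

A mask such as 0xFF has more than 6 bits set but is still redundant under this test, whereas 31 is not, because it clears a bit the shift would otherwise read.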
* Use LaneBitmask::getLane in a few more places | Krzysztof Parzyszek | 2017-07-20 | 1 | -1/+1
  llvm-svn: 308655
* AMDGPU: Add encoding for carryless add/sub instructions | Matt Arsenault | 2017-07-20 | 5 | -1/+69
  llvm-svn: 308639
* AMDGPU: Add encodings for global atomics | Matt Arsenault | 2017-07-20 | 1 | -25/+190
  llvm-svn: 308638