summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [Hexagon] Handle cases when the aligned stack pointer is missingKrzysztof Parzyszek2017-06-262-9/+19
| | | | llvm-svn: 306288
* [SystemZ] Add a check against zero before calling getTestUnderMaskCond()Jonas Paulsson2017-06-261-0/+2
| | | | | | | | | | | | | | Csmith discovered that this function can be called with a zero argument, in which case an assert for this triggered. This patch also adds a guard before the other call to this function since it was missing, although the test only covers the case where it was discovered. Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll. Review: Ulrich Weigand llvm-svn: 306287
* AMDGPU: Whitespace fixesMatt Arsenault2017-06-264-6/+6
| | | | llvm-svn: 306265
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handlingMatt Arsenault2017-06-266-30/+34
| | | | | | | | | | | | | | This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264
* [X86][SSE] Remove unused memopfsf32_128/memopfsf64_128 scalar memopsSimon Pilgrim2017-06-251-10/+0
| | | | | | The 'scalar' simd bitops were dropped a while ago llvm-svn: 306248
* Strip trailing whitespace. NFCI.Simon Pilgrim2017-06-252-2/+2
| | | | llvm-svn: 306247
* [GlobalISel][X86] Support vector type G_EXTRACT selection.Igor Breger2017-06-251-0/+104
| | | | | | | | | | | | | | | | Summary: Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33957 llvm-svn: 306240
* [AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2Dorit Nuzman2017-06-252-0/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize. This patch starts to add costs for AVX2 for most prominent cases of interleaved accesses (stride 3,4 chars, for now). Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads; There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow). Note 2: The costs in this patch do not reflect port pressure penalties which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, that may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the abovementioned work to extend X86InterleavedAccess pass). Differential Revision: https://reviews.llvm.org/D34023 llvm-svn: 306238
* fix trivial typos in comment, NFCHiroshi Inoue2017-06-241-1/+1
| | | | llvm-svn: 306211
* Simplify the processFixupValue interface. NFC.Rafael Espindola2017-06-246-28/+15
| | | | llvm-svn: 306202
* Remove a processFixupValue hack.Rafael Espindola2017-06-242-35/+32
| | | | | | | | | | | The intention of processFixupValue is not to redefine the semantics of MCExpr. It is odd enough that a expression lowers to a PCRel MCExpr or not depending on what it looks like. At least it is a local hack now. I left a fix for anyone trying to figure out what producers should be producing a different expression. llvm-svn: 306200
* [WebAssembly] Fix build after r306177Derek Schuff2017-06-241-10/+14
| | | | llvm-svn: 306190
* Remove redundant argument.Rafael Espindola2017-06-2414-22/+23
| | | | llvm-svn: 306189
* Move Value adjustment to applyFixup. NFC.Rafael Espindola2017-06-231-2/+1
| | | | llvm-svn: 306178
* ARM: move some logic from processFixupValue to applyFixup.Rafael Espindola2017-06-2314-43/+74
| | | | | | | | | | | | processFixupValue is called on every relaxation iteration. applyFixup is only called once at the very end. applyFixup is then the correct place to do last minute changes and value checks. While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped because of thumb2. We now do it again, but use the thumb2 range. llvm-svn: 306177
* Reland r306095: [mips] Fix reg positions in the aui/daui instructionsPetar Jovanovic2017-06-231-3/+3
| | | | | | | | | | | | | | | | After fixing (r306173) a failing test in the lld test suite (r306173), reland r306095. Original commit message: [mips] Fix register positions in the aui/daui instructions Swapped the position of the rt and rs register in the aui/daui instructions for mips32r6 and mips64r6. With this change, the format of the generated instructions complies with specifications and GCC. Patch by Milos Stojanovic. llvm-svn: 306174
* [AArch64][Falkor] Remove some non-existent opcodes from sched detail ↵Geoff Berry2017-06-231-12/+12
| | | | | | regexes. NFC. llvm-svn: 306170
* [MSP430] Fix data layout string.Vadzim Dambrouski2017-06-231-2/+6
| | | | | | | | | | | | | | | | Summary: Without this patch some types have incorrect size and/or alignment according to the MSP430 EABI. Reviewers: asl, awygle Reviewed By: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34561 llvm-svn: 306159
* Revert "[Hexagon] Handle decreasing of stack alignment in frame lowering"Krzysztof Parzyszek2017-06-234-51/+2
| | | | | | This breaks passing of aligned function arguments. llvm-svn: 306145
* [AArch64] Prefer Bcc to CBZ/CBNZ/TBZ/TBNZ when NZCV flags can be set for "free".Chad Rosier2017-06-236-3/+387
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a conditional branch (Bcc), when the NZCV flags can be set for "free". This is preferred on targets that have more flexibility when scheduling Bcc instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are equal). This can reduce register pressure and is also the default behavior for GCC. A few examples: add w8, w0, w1 -> cmn w0, w1 ; CMN is an alias of ADDS. cbz w8, .LBB_2 -> b.eq .LBB0_2 ; single def/use of w8 removed. add w8, w0, w1 -> adds w8, w0, w1 ; w8 has multiple uses. cbz w8, .LBB1_2 -> b.eq .LBB1_2 sub w8, w0, w1 -> subs w8, w0, w1 ; w8 has multiple uses. tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2 In looking at all current sub-target machine descriptions, this transformation appears to be either positive or neutral. Differential Revision: https://reviews.llvm.org/D34220. llvm-svn: 306144
* [X86] Fix SP adjustment in stack probes emitted on 32-bit Windows.whitequark2017-06-231-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit r306010 adjusted the condition as follows: - if (Is64Bit) { + if (!STI.isTargetWin32()) { The intent was to preserve the behavior on all Windows platforms but extend the behavior on 64-bit Windows platforms to every other one. (Before r306010, emitStackProbeCall only ever executed when emitting code for Windows triples.) Unfortunately, if (Is64Bit && STI.isOSWindows()) is not the same as if (!STI.isTargetWin32()) because of the way isTargetWin32() is defined: bool isTargetWin32() const { return !In64BitMode && (isTargetCygMing() || isTargetKnownWindowsMSVC()); } In practice this broke the JIT tests on 32-bit Windows, which did not satisfy the new condition: LLVM :: ExecutionEngine/MCJIT/2003-01-15-AlignmentTest.ll LLVM :: ExecutionEngine/MCJIT/2003-08-15-AllocaAssertion.ll LLVM :: ExecutionEngine/MCJIT/2003-08-23-RegisterAllocatePhysReg.ll LLVM :: ExecutionEngine/MCJIT/test-loadstore.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-01-15-AlignmentTest.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-08-15-AllocaAssertion.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-08-23-RegisterAllocatePhysReg.ll LLVM :: ExecutionEngine/OrcMCJIT/test-loadstore.ll because %esp was not updated correctly. The failures are only visible on a MSVC 2017 Debug build, for which we do not have bots. llvm-svn: 306142
* [Hexagon] Remove call to printAndVerify from HexagonPassConfigKrzysztof Parzyszek2017-06-231-1/+0
| | | | | | | It causes an extra pass of the machine verifier to be added to the pass manager, and causes test/CodeGen/Generic/llc-start-stop.ll to fail. llvm-svn: 306140
* [x86] fix value types for SBB transform (PR33560)Sanjay Patel2017-06-231-8/+13
| | | | | | | | | | | I'm not sure yet why this wouldn't fail in the simple case, but clearly I used the wrong value type with: https://reviews.llvm.org/rL306040 ...and the bug manifests with: https://bugs.llvm.org/show_bug.cgi?id=33560 llvm-svn: 306139
* [Hexagon] Handle decreasing of stack alignment in frame loweringKrzysztof Parzyszek2017-06-234-2/+51
| | | | llvm-svn: 306124
* Remove trailing whitespace. NFCI.Simon Pilgrim2017-06-231-1/+1
| | | | llvm-svn: 306121
* GlobalISel: remove G_SEQUENCE instruction.Tim Northover2017-06-231-4/+0
| | | | | | | | It was trying to do too many things. The basic lumping together of values for legalization purposes is now handled by G_MERGE_VALUES. More complex things involving gaps and odd sizes are handled by G_INSERT sequences. llvm-svn: 306120
* [SystemZ] Remove unnecessary serialization before volatile loadsUlrich Weigand2017-06-234-17/+6
| | | | | | | | | | | | | | | | | This reverts the use of TargetLowering::prepareVolatileOrAtomicLoad introduced by r196905. Nothing in the semantics of the "volatile" keyword or the definition of the z/Architecture actually requires that volatile loads are preceded by a serialization operation, and no other compiler on the platform actually implements this. Since we've now seen a use case where this additional serialization causes noticable performance degradation, this patch removes it. The patch still leaves in the serialization before atomic loads, which is now implemented directly in lowerATOMIC_LOAD. (This also seems overkill, but that can be addressed separately.) llvm-svn: 306117
* AMDGPU/GlobalISel: Mark 32-bit G_AND as legalTom Stellard2017-06-231-0/+1
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34349 llvm-svn: 306112
* [SystemZ] Fix trap issue and enable expensive checks.Jonas Paulsson2017-06-232-8/+3
| | | | | | | | | | | | | | | | | | | | | | | The isBarrier/isTerminator flags have been removed from the SystemZ trap instructions, so that tests do not fail with EXPENSIVE_CHECKS. This was just an issue at -O0 and did not affect code output on benchmarks. (Like Eli pointed out: "targets are split over whether they consider their "trap" a terminator; x86, AArch64, and NVPTX don't, but ARM, MIPS, PPC, and SystemZ do. We should probably try to be consistent here.". This is still the case, although SystemZ has switched sides). SystemZ now returns true in isMachineVerifierClean() :-) These Generic tests have been modified so that they can be run with or without EXPENSIVE_CHECKS: CodeGen/Generic/llc-start-stop.ll and CodeGen/Generic/print-machineinstrs.ll Review: Ulrich Weigand, Simon Pilgrim, Eli Friedman https://bugs.llvm.org/show_bug.cgi?id=33047 https://reviews.llvm.org/D34143 llvm-svn: 306106
* Revert r306095: [mips] Fix reg positions in the aui/daui instructionsPetar Jovanovic2017-06-231-3/+3
| | | | | | | | | | | | | | | ELF/mips-plt-r6.s in lld-test is failing. Reverting the change. Original commit message: [mips] Fix register positions in the aui/daui instructions Swapped the position of the rt and rs register in the aut/daui instructions for mips32r6 and mips64r6. With this change, the format of the generated instructions complies with specifications and GCC. Patch by Milos Stojanovic. llvm-svn: 306099
* [mips] Fix register positions in the aui/daui instructionsPetar Jovanovic2017-06-231-3/+3
| | | | | | | | | | | | Swapped the position of the rt and rs register in the aut/daui instructions for mips32r6 and mips64r6. With this change, the format of the generated instructions complies with specifications and GCC. Patch by Milos Stojanovic. Differential Revision: https://reviews.llvm.org/D33988 llvm-svn: 306095
* [mips][msa] Splat.d endianness checkStefan Maksimovic2017-06-231-6/+12
| | | | | | | | | | | | | Before this change, it was always the first element of a vector that got splatted since the lower 6 bits of vshf.d $wd were always zero for little endian. Additionally, masking has been performed for vshf via which splat.d is created. Vshf has a property where if its first operand's elements have either bit 6 or 7 set, destination element is set to zero. Initially masked with 63 to avoid this property, which would result in generation of and.v + vshf.d in all cases. Masking with one results in generating a single splati.d instruction when possible. Differential Revision: https://reviews.llvm.org/D32216 llvm-svn: 306090
* COFF: Produce an error on invalid pcrel relocs.Rafael Espindola2017-06-232-7/+17
| | | | | | | | | | X86_64 COFF only has support for 32 bit pcrel relocations. Produce an error on all others. Note that gnu as has extended the relocation values to support this. It is not clear if we should support the gnu extension. llvm-svn: 306082
* Fixed a (product) build error that was due to an unused variableFarhana Aleen2017-06-221-2/+1
| | | | | | | | | | | | | Details: There was a use but it was in the assert which was not exercised during product build. Reviewers: Andrew Kaylor Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32658 llvm-svn: 306073
* [x86] add/sub (X==0) --> sbb(cmp X, 1)Sanjay Patel2017-06-221-5/+17
| | | | | | | | | | | | | | | | | | | | | | This is very similar to the transform in: https://reviews.llvm.org/rL306040 ...but in this case, we use cmp X, 1 to set the carry bit as needed. Again, we can show that all of these are logically equivalent (although InstCombine currently canonicalizes to a form not seen here), and if we believe IACA, then this is the smallest/fastest code. Eg, with SNB: | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | | --------------------------------------------------------------------- | 1 | 1.0 | | | | | | | cmp edi, 0x1 | 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax The larger motivation is to clean up all select-of-constants combining/lowering because we're missing some common cases. llvm-svn: 306072
* Supported lowerInterleavedStore() in X86InterleavedAccess.Farhana Aleen2017-06-222-31/+101
| | | | | | | | | | Reviewers: RKSimon, DavidKreitzer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32658 llvm-svn: 306068
* [WebAssembly] WebAssemblyFastISel getelementptr variable index supportJacob Gravelle2017-06-221-1/+19
| | | | | | | | | | | | | | Summary: Previously -fast-isel getelementptr would constant-fold non-constant i8 load/stores. Reviewers: sunfish Subscribers: jfb, dschuff, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D34044 llvm-svn: 306060
* [Hexagon] Properly update kill flags in HexagonNewValueJumpKrzysztof Parzyszek2017-06-221-1/+1
| | | | | | | The feeder instruction will be moved to right before the compare, so the updating code should not be looking for kills past the compare. llvm-svn: 306059
* [Hexagon] Use LivePhysRegs to fix up kills in HexagonGenMuxKrzysztof Parzyszek2017-06-221-50/+36
| | | | | | Remove the previous, manual shuffling of the kill flags. llvm-svn: 306054
* [AVX-512] Remove and autoupgrade the masked integer compare intrinsicsCraig Topper2017-06-221-24/+0
| | | | | | | | | | | | | | | | | Summary: These intrinsics aren't used by clang and haven't been for a while. There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today. Reviewers: spatel, RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34389 llvm-svn: 306047
* [x86] add/sub (X==0) --> sbb(neg X)Sanjay Patel2017-06-221-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480), lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb' codegen in 1 out of the 4 tests. This is a step towards smoothing that out. First, show that all of these IR forms are equivalent: http://rise4fun.com/Alive/mx Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge (later Intel and AMD chips are similar based on Agner's tables): This is the "obvious" x86 codegen (what gcc appears to produce currently): | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | | --------------------------------------------------------------------- | 1* | | | | | | | | xor eax, eax | 1 | 1.0 | | | | | | CP | test edi, edi | 1 | | | | | | 1.0 | CP | setnz al | 1 | | 1.0 | | | | | CP | neg eax This is the adc version: | 1* | | | | | | | | xor eax, eax | 1 | 1.0 | | | | | | CP | cmp edi, 0x1 | 2 | | 1.0 | | | | 1.0 | CP | adc eax, 0xffffffff And this is sbb: | 1 | 1.0 | | | | | | | neg edi | 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be clearly better than the alternatives going forward. llvm-svn: 306040
* Add a common error checking for some invalid expressions.Rafael Espindola2017-06-221-2/+1
| | | | | | | This refactors a bit of duplicated code and fixes an assertion failure on ELF. llvm-svn: 306035
* [AMDGPU] Add intrinsics for tbuffer load and store - build error fixDavid Stuttard2017-06-221-2/+1
| | | | | | | Variable was unused in non-debug build (used in assert) causing compile time warning and eventual build failure llvm-svn: 306034
* [AMDGPU] Add intrinsics for tbuffer load and storeDavid Stuttard2017-06-228-121/+535
| | | | | | | | | | | | | | | Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031
* [Hexagon] Handle a global operand to A2_addi when creating duplexesKrzysztof Parzyszek2017-06-221-27/+26
| | | | llvm-svn: 306012
* [X86] Add support for "probe-stack" attributewhitequark2017-06-223-19/+37
| | | | | | | | | | | This commit adds prologue code emission for stack probe function calls. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D34387 llvm-svn: 306010
* [ARM] Create relocations for beq.w branches to ARM function syms.Florian Hahn2017-06-221-0/+1
| | | | | | | | | | | | | | | | | | Summary: The ARM ELF ABI requires the linker to do interworking for wide conditional branches from Thumb code to ARM code. That was pointed out by @peter.smith in the comments for D33436. Reviewers: rafael, peter.smith, echristo Reviewed By: peter.smith Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, peter.smith Differential Revision: https://reviews.llvm.org/D34447 llvm-svn: 306009
* [mips] Allow $AT to be used as a register namePetar Jovanovic2017-06-221-1/+1
| | | | | | | | | | | This patch allows $AT to be used as a register name in assembly files. Currently only $at is recognized as a valid register name. Patch by Stanislav Ocovaj. Differential Revision: https://reviews.llvm.org/D34348 llvm-svn: 306007
* [Hexagon] Recognize potential offset overflow for store-imm to stackKrzysztof Parzyszek2017-06-221-4/+39
| | | | | | | Reserve an extra scavenging stack slot if the offset field in store- -immediate instructions may overflow. llvm-svn: 306004
* [AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit ↵Sam Kolton2017-06-221-11/+15
| | | | | | | | | | | | | | | | encoding Summary: Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA. There are no real instructions for them, but there are pseudo instructions. Reviewers: arsenm, vpykhtin, cfang Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34403 llvm-svn: 305999
OpenPOWER on IntegriCloud