summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Change encodeU/SLEB128 to pad to certain number of bytesSam Clegg2017-09-151-5/+2
| | | | | | | | | | | | | | | Previously the 'Padding' argument was the number of padding bytes to add. However most callers that use 'Padding' know how many overall bytes they need to write. With the previous code this would mean encoding the LEB once to find out how many bytes it would occupy and then using this to calulate the 'Padding' value. See: https://reviews.llvm.org/D36595 Differential Revision: https://reviews.llvm.org/D37494 llvm-svn: 313393
* [llvm] Fix some typos. NFC.Mandeep Singh Grang2017-09-151-1/+1
| | | | | | | | | | | | Reviewers: mcrosier Reviewed By: mcrosier Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37922 llvm-svn: 313388
* Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection ↵Hans Wennborg2017-09-152-479/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for LEAs." This caused PR34629: asserts firing when building Chromium. It also broke some buildbots building test-suite as reported on the commit thread. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet > > Reviewed By: lsaba > > Subscribers: spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313376
* [X86] Prefer VPERMQ over VPERM2F128 for any unary shuffle, not just the ones ↵Craig Topper2017-09-151-3/+4
| | | | | | | | | | | | that can be done with a insertf128 The early out for AVX2 in lowerV2X128VectorShuffle is positioned in a weird spot below some shuffle mask equivalency checks. But I think we want to allow VPERMQ for any unary shuffle. Differential Revision: https://reviews.llvm.org/D37893 llvm-svn: 313373
* [X86] Use SDNode::ops() instead of makeArrayRef and op_begin(). NFCICraig Topper2017-09-151-5/+5
| | | | llvm-svn: 313367
* [X86] Don't create i64 constants on 32-bit targets when lowering v64i1 ↵Craig Topper2017-09-151-0/+12
| | | | | | | | | | | | | | constant build vectors When handling a v64i1 build vector of constants on 32-bit targets we were creating an illegal i64 constant that we then bitcasted back to v64i1. We need to instead create two 32-bit constants, bitcast them to v32i1 and concat the result. We should also take care to handle the halves being all zeros/ones after the split. This patch splits the build vector and then recursively lowers the two pieces. This allows us to handle the all ones and all zeros cases with minimal effort. Ideally we'd just do the split and concat, and let lowering get called again on the new nodes, but getNode has special handling for CONCAT_VECTORS that reassembles the pieces back into a single BUILD_VECTOR. Hopefully the two temporary BUILD_VECTORS we had to create to do this that don't get returned don't cause any issues. Fixes PR34605. Differential Revision: https://reviews.llvm.org/D37858 llvm-svn: 313366
* [X86] Add isel pattern infrastructure to begin recognizing when we're ↵Craig Topper2017-09-151-0/+59
| | | | | | | | | | | | | | | | | | inserting 0s into the upper portions of a vector register and the producing instruction as already produced the zeros. Currently if we're inserting 0s into the upper elements of a vector register we insert an explicit move of the smaller register to implicitly zero the upper bits. But if we can prove that they are already zero we can skip that. This is based on a similar idea of what we do to avoid emitting explicit zero extends for GR32->GR64. Unfortunately, this is harder for vector registers because there are several opcodes that don't have VEX equivalent instructions, but can write to XMM registers. Among these are SHA instructions and a MMX->XMM move. Bitcasts can also get in the way. So for now I'm starting with explicitly allowing only VPMADDWD because we emit zeros in combineLoopMAddPattern. So that is placing extra instruction into the reduction loop. I'd like to allow PSADBW as well after D37453, but that's currently blocked by a bitcast. We either need to peek through bitcasts or canonicalize insert_subvectors with zeros to remove bitcasts on the value being inserted. Longer term we should probably have a cleanup pass that removes superfluous zeroing moves even when the producer is in another basic block which is something these isel tricks can't do. See PR32544. Differential Revision: https://reviews.llvm.org/D37653 llvm-svn: 313365
* [Hexagon] Switch to parameterized register classes for HVXKrzysztof Parzyszek2017-09-1529-13785/+2407
| | | | | | | This removes the duplicate HVX instruction set for the 128-byte mode. Single instruction set now works for both modes (64- and 128-byte). llvm-svn: 313362
* [AArch64] allow v8f16 types when FullFP16 is supportedSjoerd Meijer2017-09-151-33/+31
| | | | | | | | | | | This adds support for allowing v8f16 vector types, thus avoiding conversions from/to single precision for these types. This is a follow up patch of commits r311154 and r312104, which added support for scalars and v4f16 types, respectively. Differential Revision: https://reviews.llvm.org/D37802 llvm-svn: 313351
* [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.Jatin Bhateja2017-09-152-10/+479
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will no look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet Reviewed By: lsaba Subscribers: spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313343
* [X86] Remove an unnecessary SmallVector from LowerBUILD_VECTOR.Craig Topper2017-09-141-4/+2
| | | | | | I think this may have existed to convert from SDUse to SDValue, but it doesn't look like its needed now. llvm-svn: 313311
* Fix warnings in r313297.Jan Sjodin2017-09-142-5/+3
| | | | llvm-svn: 313302
* AMDGPU: Fix violating constant bus restrictionMatt Arsenault2017-09-141-4/+5
| | | | | | You can't use madmk/madmk if it already uses an SGPR input. llvm-svn: 313298
* Add AddresSpace to PseudoSourceValue.Jan Sjodin2017-09-144-4/+29
| | | | | | Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297
* AMDGPU: Fix assert on alloca of array of structMatt Arsenault2017-09-141-6/+5
| | | | llvm-svn: 313282
* AMDGPU: Stop modifying SP in call sequencesMatt Arsenault2017-09-141-3/+3
| | | | | | | | | | | | | | | Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279
* [mips] Implement the 'dext' aliases and it's disassembly alias.Simon Dardis2017-09-145-43/+162
| | | | | | | | | | | | | | | | The other members of the dext family of instructions (dextm, dextu) are traditionally handled by the assembler selecting the right variant of 'dext' depending on the values of the position and size operands. When these instructions are disassembled, rather than reporting the actual instruction, an equivalent aliased form of 'dext' is generated and is reported. This is to mimic the behaviour of binutils. Reviewers: slthakur, nitesh.jain, atanasyan Differential Revision: https://reviews.llvm.org/D34887 llvm-svn: 313276
* AMDGPU: Make frame register caller preservedMatt Arsenault2017-09-142-10/+16
| | | | | | | | | | | | | Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274
* [mips] Implement the 'dins' aliases.Simon Dardis2017-09-145-25/+139
| | | | | | | | | | | | Traditionally GAS has provided automatic selection between dins, dinsm and dinsu. Binutils also disassembles all instructions in that family as 'dins' rather than the actual instruction. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D34877 llvm-svn: 313267
* Test commit.Aleksandar Beserminji2017-09-141-1/+1
| | | | llvm-svn: 313262
* [Hexagon] Make getMemAccessSize return size in bytesKrzysztof Parzyszek2017-09-147-58/+65
| | | | | | | | It used to return the actual field value from the instruction descriptor. There is no reason for that, that value is not interesting in any way and the specifics of its encoding in the descriptor should not be exposed. llvm-svn: 313257
* [X86] When applying the shuffle-to-zero-extend transformation on floating ↵Ayman Musa2017-09-141-4/+9
| | | | | | | | | | point, bitcast to integer first. Fix issue described in PR34577. Differential Revision: https://reviews.llvm.org/D37803 llvm-svn: 313256
* [mips] Pick the right variant of DINS upfront and enable target instruction ↵Simon Dardis2017-09-148-44/+119
| | | | | | | | | | | | | | | | | | | | | | | | | verification This patch complements D16810 "[mips] Make isel select the correct DEXT variant up front.". Now ISel picks the right variant of DINS, so now there is no need to replace DINS with the appropriate variant during MipsMCCodeEmitter::encodeInstruction(). This patch also enables target specific instruction verification for ins, dins, dinsm, dinsu, ext, dext, dextm, dextu. These instructions have constraints that are checked when generating MipsISD::Ins and MipsISD::Ext nodes, but these constraints are not checked during instruction selection. Adding machine verification should catch outstanding cases. Finally, correct a bug that instruction verification uncovered, where the position operand of a DINSU generated during lowering was being silently and accidently corrected to the correct value. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D34809 llvm-svn: 313254
* AMDGPU: Don't spill SP reg like a normal CSRMatt Arsenault2017-09-133-0/+16
| | | | llvm-svn: 313217
* Allow target to decide when to cluster loads/stores in mischedStanislav Mekhanoshin2017-09-134-2/+47
| | | | | | | | | | | | | | | | MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208
* AMDGPU: Handle coldcc in more placesMatt Arsenault2017-09-131-0/+2
| | | | | | Missed in r312936 llvm-svn: 313205
* Refactoring the stride 4 code in the X86interleavedaccess NFCMichael Zuckerman2017-09-131-34/+32
| | | | llvm-svn: 313166
* [mips] correct operand range for DINSM instructionPetar Jovanovic2017-09-131-1/+1
| | | | | | | | | | | | | This patch corrects the definition of the DINSM instruction. Specification for DINSM instruction for Mips64 says that size operand should be 2 <= size <= 64, but it is defined as uimm5_inssize_plus1 which gives range of 1 .. 32. Patch by Aleksandar Beserminji. Differential Revision: https://reviews.llvm.org/D37683 llvm-svn: 313149
* [Power9] Add missing instructions: extswsli, popcntbStefan Pintilie2017-09-132-0/+22
| | | | | | | | Added the following P9 instructions: extswsli, extswsli., popcntb Differential Revision: https://reviews.llvm.org/D37342 llvm-svn: 313147
* [GlobalISel][X86] support G_FPEXT operation.Igor Breger2017-09-132-3/+15
| | | | | | | | | | | | | | Summary: Support G_FPEXT operation. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: zvi Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34816 llvm-svn: 313135
* [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm)Uriel Korach2017-09-131-18/+0
| | | | | | | | This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR. differential revision: https://reviews.llvm.org/D37693. llvm-svn: 313134
* [X86] Adding X86 Processor FamiliesMohammed Agabaria2017-09-133-5/+59
| | | | | | | | | Adding x86 Processor families to initialize several uArch properties (based on the family) This patch shows how gather cost can be initialized based on the proc. family Differential Revision: https://reviews.llvm.org/D35348 llvm-svn: 313132
* [X86] Make sure we emit a SUBREG_TO_REG after the MOV32ri when creating a ↵Craig Topper2017-09-131-2/+9
| | | | | | | | BEXTR64rr instruction from a shift/and pair. Fixes PR34589. llvm-svn: 313126
* [X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vectorElena Demikhovsky2017-09-131-0/+1
| | | | | | Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using PMOVZXBD , PMOVSXBD instructions. llvm-svn: 313121
* [X86] Use isUInt<32> to simplify some code. NFCCraig Topper2017-09-131-1/+1
| | | | llvm-svn: 313112
* [Fuchsia] Magenta -> ZirconPetr Hosek2017-09-132-4/+4
| | | | | | | | | | | Fuchsia's lowest API layer has been renamed from Magenta to Zircon. In LLVM proper, this is only mentioned in comments. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D37763 llvm-svn: 313105
* [WebAssembly] Add sign extend instructions from atomics proposalDerek Schuff2017-09-132-2/+24
| | | | | | | | | | Select them from ISD::SIGN_EXTEND_INREG Differential Revision: https://reviews.llvm.org/D37603 remove spurious change llvm-svn: 313101
* [x86] eliminate unnecessary vector compare for AVX masked storeSanjay Patel2017-09-121-2/+27
| | | | | | | | | | | | | | | | | The masked store instruction only cares about the sign-bit of each mask element, so the compare s<0 isn't needed. As noted in PR11210: https://bugs.llvm.org/show_bug.cgi?id=11210 ...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR. (Although more testing will be needed to confirm that.) I filed a bug to track improvements for AVX512: https://bugs.llvm.org/show_bug.cgi?id=34584 Differential Revision: https://reviews.llvm.org/D37446 llvm-svn: 313089
* [mips] handle UImm16_AltRelaxed match typePetar Jovanovic2017-09-121-0/+1
| | | | | | | | | | | | | Currently, UImm16_AltRelaxed match type is not handled in MatchAndEmitInstruction() function, which may result in llvm_unreachable() behavior. This patch adds necessary case for this match type. Patch by Aleksandar Beserminji. Differential Revision: https://reviews.llvm.org/D37682 llvm-svn: 313077
* [AArch64][GlobalISel] Select all fpexts.Ahmed Bougacha2017-09-122-33/+7
| | | | | | | Tablegen already can select these: mark them as legal, remove the c++ code, and add tests for all types. llvm-svn: 313074
* [AArch64][GlobalISel] Select all fptruncs.Ahmed Bougacha2017-09-121-28/+3
| | | | | | | | | | We already support these in tablegen, but we're matching the wrong operator (libm ftrunc). Fix that. While there, drop the c++ code, support COPYs of FPR16, and add tests for the other types. llvm-svn: 313073
* Update branch coalescing to be a PowerPC specific passLei Huang2017-09-124-0/+794
| | | | | | | | | | | | Implementing this pass as a PowerPC specific pass. Branch coalescing utilizes the analyzeBranch method which currently does not include any implicit operands. This is not an issue on PPC but must be handled on other targets. Pass is currently off by default. Enabled via -enable-ppc-branch-coalesce. Differential Revision : https: // reviews.llvm.org/D32776 llvm-svn: 313061
* bpf: Add BPF AsmParser support in LLVMYonghong Song2017-09-125-1/+500
| | | | | | Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 313055
* [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI ↵Craig Topper2017-09-124-52/+92
| | | | | | | | | | | | | | | | instruction to custom isel Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so its as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp. This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we dont' recognize yet. We already had this problem for several other TBM patterns so I think this fine and we can address of them together. I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before got to isel. I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch. Differential Revision: https://reviews.llvm.org/D37592 llvm-svn: 313054
* Revert r313009 "[ARM] Use ADDCARRY / SUBCARRY"Hans Wennborg2017-09-122-168/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was causing PR34045 to fire again. > This is a preparatory step for D34515 and also is being recommitted as its > first version caused PR34045. > > This change: > - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 > - lowering is done by first converting the boolean value into the carry flag > using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value > using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two > operations does the actual addition. > - for subtraction, given that ISD::SUBCARRY second result is actually a > borrow, we need to invert the value of the second operand and result before > and after using ARMISD::SUBE. We need to invert the carry result of > ARMISD::SUBE to preserve the semantics. > - given that the generic combiner may lower ISD::ADDCARRY and > ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering > as well otherwise i64 operations now would require branches. This implies > updating the corresponding test for unsigned. > - add new combiner to remove the redundant conversions from/to carry flags > to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C > - fixes PR34045 > > Differential Revision: https://reviews.llvm.org/D35192 Also revert follow-up r313010: > [ARM] Fix typo when creating ISD::SUB nodes > > In D35192, I accidentally introduced a typo when creating ISD::SUB nodes, > giving them two values instead of one. > > This fails when the merge_values combiner finds one of these nodes. > > This change fixes PR34564. > > Differential Revision: https://reviews.llvm.org/D37690 llvm-svn: 313044
* [SystemZ] Add the CoveredBySubRegs bit to GPR64, GPR128 and FPR128 registers.Jonas Paulsson2017-09-121-0/+3
| | | | | | | | | | | | | This bit is needed in order for the CalleeSavedRegs list to automatically include the super registers if all of their subregs are present. Thanks to Wei Mi for initially indicating this deficiency in the SystemZ backend. Review: Ulrich Weigand. https://bugs.llvm.org/show_bug.cgi?id=34550 llvm-svn: 313023
* [AArch64] ISel: Add some debug messages to LowerBUILDVECTOR. NFC.Sjoerd Meijer2017-09-121-19/+59
| | | | | | Differential Revision: https://reviews.llvm.org/D37676 llvm-svn: 313017
* [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IRYael Tsafrir2017-09-121-10/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D37560 llvm-svn: 313013
* [ARM] Fix typo when creating ISD::SUB nodesRoger Ferrer Ibanez2017-09-121-5/+5
| | | | | | | | | | | | | In D35192, I accidentally introduced a typo when creating ISD::SUB nodes, giving them two values instead of one. This fails when the merge_values combiner finds one of these nodes. This change fixes PR34564. Differential Revision: https://reviews.llvm.org/D37690 llvm-svn: 313010
* [ARM] Use ADDCARRY / SUBCARRYRoger Ferrer Ibanez2017-09-122-20/+168
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is a preparatory step for D34515 and also is being recommitted as its first version caused PR34045. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 313009
OpenPOWER on IntegriCloud