summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] Select vaddvaSam Tebbs2019-08-201-0/+7
| | | | | | | | This patch adds vaddva selection. Differential revision: https://reviews.llvm.org/D66410 llvm-svn: 369404
* [X86][BtVer2] Fix latency and throughput of atomic INC/DEC/NEG/NOT.Andrea Di Biagio2019-08-201-0/+14
| | | | | | | | | Latency and throughput of LOCK INC/DEC/NEG/NOT is always 19cy. Number of uOPs is still 1. Differential Revision: https://reviews.llvm.org/D66469 llvm-svn: 369388
* [RISCV] Implement getExprForFDESymbol to ensure RISCV_32_PCREL is used for ↵Alex Bradbury2019-08-205-0/+30
| | | | | | | | | | | | | | | | | | the FDE location Follow binutils in using RISCV_32_PCREL for the FDE initial location. As explained in the relevant binutils commit <https://github.com/riscv/riscv-binutils-gdb/commit/a6cbf936e3dce68114d28cdf60d510a3f78a6d40>, the ADD/SUB pair of relocations is problematic in the presence of linker relaxation. This patch has the same end goal as D64715 but includes test changes and avoids adding a new global VariantKind to MCExpr.h (preferring RISCVMCExpr VKs like the rest of the RISC-V backend). Differential Revision: https://reviews.llvm.org/D66419 llvm-svn: 369375
* [X86][Btver2] Fix latency and throughput of CMPXCHG instructions.Andrea Di Biagio2019-08-205-4/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Jaguar, CMPXCHG has a latency of 11cy, and a maximum throughput of 0.33 IPC. Throughput is superiorly limited to 0.33 because of the implicit in/out dependency on register EAX. In the case of repeated non-atomic CMPXCHG with the same memory location, store-to-load forwarding occurs and values for sequent loads are quickly forwarded from the store buffer. Interestingly, the functionality in LLVM that computes the reciprocal throughput doesn't seem to know about RMW instructions. That functionality only looks at the "consumed resource cycles" for the throughput computation. It should be fixed/improved by a future patch. In particular, for RMW instructions, that logic should also take into account for the write latency of in/out register operands. An atomic CMPXCHG has a latency of ~17cy. Throughput is also limited to ~17cy/inst due to cache locking, which prevents other memory uOPs to start executing before the "lock releasing" store uOP. CMPXCHG8rr and CMPXCHG8rm are treated specially because they decode to one less macro opcode. Their latency tend to be the same as the other RR/RM variants. RR variants are relatively fast 3cy (but still microcoded - 5 macro opcodes). CMPXCHG8B is 11cy and unfortunately doesn't seem to benefit from store-to-load forwarding. That means, throughput is clearly limited by the in/out dependency on GPR registers. The uOP composition is sadly unknown (due to the lack of PMCs for the Integer pipes). I have reused the same mix of consumed resource from the other CMPXCHG instructions for CMPXCHG8B too. LOCK CMPXCHG8B is instead 18cycles. CMPXCHG16B is 32cycles. Up to 38cycles when the LOCK prefix is specified. Due to the in/out dependencies, throughput is limited to 1 instruction every 32 (or 38) cycles dependeing on whether the LOCK prefix is specified or not. I wouldn't be surprised if the microcode for CMPXCHG16B is similar to 2x microcode from CMPXCHG8B. So, I have speculatively set the JALU01 consumption to 2x the resource cycles used for CMPXCHG8B. The two new hasLockPrefix() functions are used by the btver2 scheduling model check if a MCInst/MachineInst has a LOCK prefix. Calls to hasLockPrefix() have been encoded in predicates of variant scheduling classes that describe lat/thr of CMPXCHG. Differential Revision: https://reviews.llvm.org/D66424 llvm-svn: 369365
* [X86] Add back the -x86-experimental-vector-widening-legalization comand ↵Craig Topper2019-08-202-143/+1227
| | | | | | | | | | | | | | | | | | | | | line flag and all associated code, but leave it enabled by default Google is reporting performance issues with the new default behavior and have asked for a way to switch back to the old behavior while we investigate and make fixes. I've restored all of the code that had since been removed and added additional checks of the command flag onto code paths that are not otherwise guarded by a check of getTypeAction. I've also modified the cost model tables to hopefully get us back to the previous costs. Hopefully we won't need to support this for very long since we have no test coverage of the old behavior so we can very easily break it. llvm-svn: 369332
* [NFC] Test commit, fix some comment spelling.Thomas Raoux2019-08-201-1/+1
| | | | llvm-svn: 369326
* [AsmPrinter] Remove const qualifier from EmitBasicBlockStart.Karl-Johan Karlsson2019-08-204-6/+6
| | | | | | | | | | | | | | | | Overriders may want to modify state in it. AMDGPU wants to, but has to make its members mutable in order to do so. Besides, EmitBasicBlockEnd is not const, so why should Start be? Patch by Bevin Hansson. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D66341 llvm-svn: 369325
* Refactor isPointerOffset (NFC).Evgeniy Stepanov2019-08-191-7/+7
| | | | | | | | | | | | | | | | Summary: Simplify the API using Optional<> and address comments in https://reviews.llvm.org/D66165 Reviewers: vitalybuka Subscribers: hiraditya, llvm-commits, ostannard, pcc Tags: #llvm Differential Revision: https://reviews.llvm.org/D66317 llvm-svn: 369300
* MemTag: stack initializer merging.Evgeniy Stepanov2019-08-193-6/+301
| | | | | | | | | | | | | | | | | | | | | | Summary: MTE provides instructions to update memory tags and data at the same time. This change makes use of those to generate more compact code for stack variable tagging + initialization. We collect memory store and memset instructions following an alloca or a lifetime.start call, and replace them with the corresponding MTE intrinsics. Since the intrinsics work on 16-byte aligned chunks, the stored values are combined as necessary. Reviewers: pcc, vitalybuka, ostannard Subscribers: srhines, javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66167 llvm-svn: 369297
* [WebAssembly][MC] Allow empty assembly functionsSam Clegg2019-08-191-9/+15
| | | | | | Differential Revision: https://reviews.llvm.org/D66434 llvm-svn: 369292
* [X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more ↵Craig Topper2019-08-191-9/+11
| | | | | | | | | | | | | | than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle. The motivating case are the changes in vector-reduce-add.ll where we were doing extra work in the scalar domain instead of shuffling. There may be some one use check that needs to be looked into there, but this patch sidesteps the issue by avoiding broadcasts that aren't really broadcasting. Differential Revision: https://reviews.llvm.org/D66071 llvm-svn: 369287
* [nfc] Silent gcc warningSerge Guelton2019-08-191-3/+2
| | | | llvm-svn: 369266
* [PeepholeOptimizer] Don't assume bitcast def always has inputJinsong Ji2019-08-191-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If we have a MI marked with bitcast bits, but without input operands, PeepholeOptimizer might crash with assert. eg: If we apply the changes in PPCInstrVSX.td as in this patch: [(set v4i32:$XT, (bitconvert (v16i8 immAllOnesV)))]>; We will get assert in PeepholeOptimizer. ``` llvm-lit llvm-project/llvm/test/CodeGen/PowerPC/build-vector-tests.ll -v llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:417: const llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i < getNumOperands() && "getOperand() out of range!"' failed. ``` The fix is to abort if we found out of bound access. Reviewers: qcolombet, MatzeB, hfinkel, arsenm Reviewed By: qcolombet Subscribers: wdng, arsenm, steven.zhang, wuzish, nemanjai, hiraditya, kbarton, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65542 llvm-svn: 369261
* [RISCV] Don't force absolute FK_Data_X fixups to relocsAlex Bradbury2019-08-191-0/+7
| | | | | | | | | | | | The current behavior of shouldForceRelocation forces relocations for the majority of fixups when relaxation is enabled. This makes sense for fixups which incorporate symbols but is unnecessary for simple data fixups where the fixup target is already resolved to an absolute value. Differential Revision: https://reviews.llvm.org/D63404 Patch by Edward Jones. llvm-svn: 369257
* [ARM] Add support for MVE vaddvSam Tebbs2019-08-194-0/+41
| | | | | | | | | This patch adds vecreduce_add and the relevant instruction selection for vaddv. Differential revision: https://reviews.llvm.org/D66085 llvm-svn: 369245
* [ARM] MVE sext costsDavid Green2019-08-191-0/+25
| | | | | | | | | This adds some sext costs for MVE, taken from the length of assembly sequences that we currently generate. Differential Revision: https://reviews.llvm.org/D66010 llvm-svn: 369244
* [X86] Teach lower1BitShuffle to match right shifts with upper zero elements ↵Craig Topper2019-08-191-19/+20
| | | | | | | | | | on types that don't natively support KSHIFT. We can support these by widening to a supported type, then shifting all the way to the left and then back to the right to ensure that we shift in zeroes. llvm-svn: 369232
* [X86] Fix the lower1BitShuffle code added in r369215 to correctly pass the ↵Craig Topper2019-08-191-1/+1
| | | | | | | | | | widened vector to the KSHIFT node. Not sure how to test this as we have tests that exercise this code, but nothing failed for the types not matching. Since all the k-registers use equivalent register classes everything just ends up working. llvm-svn: 369228
* [X86] Teach lower1BitShuffle to match KSHIFTR that doesn't use Zeroable and ↵Craig Topper2019-08-191-0/+48
| | | | | | | | | | | only relies on undef. This allows us to widen the type when the KSHIFTR instruction doesn't exist for the type. If we need to shift in zeroes into the upper elements we would need more work to guarantee zeroes when widening. llvm-svn: 369227
* [X86] Teach lower1BitShuffle to recognize padding a subvector with zeros ↵Craig Topper2019-08-191-7/+16
| | | | | | | | | with V2 as the source and V1 as the zero vector. Shuffle canonicalization can swap the sources so the zero vector might be V1 and the subvector that's being padded can be V2. llvm-svn: 369226
* [X86] Add a special case to LowerCONCAT_VECTORSvXi1 to handle concatenating ↵Craig Topper2019-08-181-14/+30
| | | | | | | | | | zero vectors followed by one non-zero vector followed by undef vectors. For such a case we should only need a KSHIFTL, but we were previously generating a KSHIFTL followed by a KSHIFTR because we mistakenly believed we need to zero the undef elements. llvm-svn: 369224
* [X86] Replace uses of getZeroVector for vXi1 vectors with DAG.getConstant.Craig Topper2019-08-181-4/+4
| | | | | | vXi1 vectors don't need special handling. llvm-svn: 369222
* [X86] Improve lower1BitShuffle handling for KSHIFTL on narrow vectors.Craig Topper2019-08-181-8/+24
| | | | | | | We can insert the value into a larger legal type and shift that by the desired amount. llvm-svn: 369215
* Fix signed/unsigned comparison warning. NFCI.Simon Pilgrim2019-08-181-2/+2
| | | | llvm-svn: 369213
* [X86] isTargetShuffleEquivalent - add BUILD_VECTOR matchingSimon Pilgrim2019-08-181-3/+21
| | | | | | | | | | Add similar functionality to isShuffleEquivalent - if the mask elements don't match, try matching the BUILD_VECTOR scalars instead. As target shuffles need to handle SM_Sentinel values, this can get a bit tricky, so commit just adds actual mask element index handling - full SM_SentinelZero support will be added when the need arises. Also, enables support in matchVectorShuffleWithPACK llvm-svn: 369212
* [X86] isTargetShuffleEquivalent - early out on illegal shuffle masks. NFCI.Simon Pilgrim2019-08-181-8/+10
| | | | | | Simplifies shuffle mask comparisons by just bailing out if the shuffle mask has any out of range values - will make an upcoming patch much simpler. llvm-svn: 369211
* AMDGPU: Fix iterator error when lowering SI_END_CFMatt Arsenault2019-08-181-4/+4
| | | | | | | If the instruction is the last in the block, there is no next instruction but the iteration still needs to look at the new block. llvm-svn: 369203
* AMDGPU: Disambiguate v3f16 format in load/store tablesMatt Arsenault2019-08-185-104/+119
| | | | | | | | | Currently the searchable tables report the number of dwords. These round to the same number for 3 and 4 component d16 instructions. Change this to report the number of elements so this isn't ambiguous. llvm-svn: 369202
* [X86] Add a one use check to the combineStore code that handles ↵Craig Topper2019-08-171-1/+1
| | | | | | | | | v16i16->v16i8 truncate+store by extending to v16i32 and then emitting a v16i32->v16i8 truncstore. This prevent us from emitting a separate truncate and a truncating store instruction. llvm-svn: 369200
* Revert Revert [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero ↵Paul Walker2019-08-171-6/+4
| | | | | | | | for meta instructions. This reverts r369132 (git commit 19301d75f086caae1a495d267f5d0264b225942d) llvm-svn: 369186
* Revert [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for ↵Paul Walker2019-08-171-4/+6
| | | | | | | | meta instructions. This reverts r369133 (git commit 2632c677f85cba1ac2aef5d68aaf8af0f5b3c944) llvm-svn: 369185
* Reland "[ARM] push LR before __gnu_mcount_nc"Jian Cai2019-08-165-0/+90
| | | | | | | | This relands r369147 with fixes to unit tests. https://reviews.llvm.org/D65019 llvm-svn: 369173
* [AArch64][GlobalISel] Fix an assertion during G_UNMERGE selection for s128 ↵Amara Emerson2019-08-161-1/+3
| | | | | | types. llvm-svn: 369172
* Revert [X86] SimplifyDemandedVectorElts - attempt to recombine target ↵Jordan Rupprecht2019-08-161-17/+0
| | | | | | | | | | shuffle using DemandedElts mask (reapplied) This reverts r368662 (git commit 1a8d790cf5f89c1df718844f13e934e39bef6ef5) The compile-time regression repro is in https://bugs.llvm.org/show_bug.cgi?id=43024 llvm-svn: 369167
* [ARM] Preserve liveness in ARMConstantIslands.Eli Friedman2019-08-161-3/+18
| | | | | | | | | | We currently don't use liveness information after this point, but it can be useful to catch bugs using -verify-machineinstrs, and optimizations could potentially use this information in the future. Differential Revision: https://reviews.llvm.org/D66319 llvm-svn: 369162
* [X86] Use Register/MCRegister in more places in X86Craig Topper2019-08-169-43/+45
| | | | | | | | | | This was a quick pass through some obvious places. I haven't tried the clang-tidy check. I also replaced the zeroes in getX86SubSuperRegister with X86::NoRegister which is the real sentinel name. Differential Revision: https://reviews.llvm.org/D66363 llvm-svn: 369151
* Revert "[ARM] push LR before __gnu_mcount_nc"Jian Cai2019-08-165-90/+0
| | | | | | This reverts commit f4cf3b959333f62b7a7b2d7771f7010c9d8da388. llvm-svn: 369149
* [ARM] push LR before __gnu_mcount_ncJian Cai2019-08-165-0/+90
| | | | | | | | | Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of the stack on ARM32. Differential Revision: https://reviews.llvm.org/D65019 llvm-svn: 369147
* [WebAssembly] Forbid use of EM_ASM with setjmp/longjmpGuanzhong Chen2019-08-161-0/+24
| | | | | | | | | | | | | | | | | | | | | | | Summary: We tried to support EM_ASM with setjmp/longjmp in binaryen. But with dynamic linking thrown into the mix, the code is no longer understandable and cannot be maintained. We also discovered more bugs in the EM_ASM handling code. To ensure maintainability and correctness of the binaryen code, EM_ASM will no longer be supported with setjmp/longjmp. This is probably fine since the support was added recently and haven't be published. Reviewers: tlively, sbc100, jgravelle-google, kripken Reviewed By: tlively, kripken Subscribers: dschuff, hiraditya, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66356 llvm-svn: 369137
* [X86] resolveTargetShuffleInputs - add DemandedElts variant. NFCI.Simon Pilgrim2019-08-161-3/+10
| | | | | | Nothing calls this yet, everything still goes through the non (all) DemandedElts wrapper. llvm-svn: 369136
* [X86] combineExtractWithShuffle - handle extract(truncate(x), 0)Simon Pilgrim2019-08-161-1/+11
| | | | | | Eventually we need to generalize combineExtractWithShuffle to handle all faux shuffles and handle truncate (and X86ISD::VTRUNC etc.) there, but we're not ready yet (still creates nodes on the fly, incomplete DemandedElts support, bad use of recursive Depth limit). llvm-svn: 369134
* [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for meta ↵Paul Walker2019-08-161-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instructions. Recommit with fixes for mac builders. Summary: AArch64InstrInfo::getInstSizeInBytes is incorrectly treating meta instructions (e.g. CFI_INSTRUCTION) as normal instructions and giving them a size of 4. This results in branch relaxation calculating block sizes wrong. Branch relaxation also considers alignment and thus a single mistake can result in later blocks being incorrectly sized even when they themselves do not contain meta instructions. The net result is we might not relax a branch whose destination is not within range. Reviewers: nickdesaulniers, peter.smith Reviewed By: peter.smith Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66337 > llvm-svn: 369111 llvm-svn: 369133
* Revert [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for ↵Paul Walker2019-08-161-4/+6
| | | | | | | | meta instructions. This reverts r369111 (git commit 3ccee5f7c4087ed119dbeba537f3df1b048a4dff) llvm-svn: 369132
* [X86] Alphabetize pass initialization definitions. NFCI.Simon Pilgrim2019-08-161-1/+1
| | | | llvm-svn: 369126
* [Hexagon] Generate min/max instructions for 64-bit vectorsKrzysztof Parzyszek2019-08-164-49/+135
| | | | llvm-svn: 369124
* Relanding r368987 [AArch64] Change location of frame-record within ↵Sander de Smalen2019-08-163-26/+88
| | | | | | | | | | | | | | | | callee-save area. Changes: There was a condition for `!NeedsFrameRecord` missing in the assert. The assert in question has changed to: + assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP || + RPI.Reg1 == AArch64::LR) && + "FrameRecord must be allocated together with LR"); This addresses PR43016. llvm-svn: 369122
* [ARM] MVE sext of a load is freeDavid Green2019-08-161-0/+15
| | | | | | | | | MVE also has some sext of loads, which will be free just as scalar instructions are. Differential Revision: https://reviews.llvm.org/D66008 llvm-svn: 369118
* [RISCV] Convert registers from unsigned to RegisterLuis Marques2019-08-1611-78/+82
| | | | | | | | | Only in public interfaces that have not yet been converted should there remain registers with unsigned type. Differential Revision: https://reviews.llvm.org/D66252 llvm-svn: 369114
* [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for meta ↵Paul Walker2019-08-161-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | instructions. Summary: AArch64InstrInfo::getInstSizeInBytes is incorrectly treating meta instructions (e.g. CFI_INSTRUCTION) as normal instructions and giving them a size of 4. This results in branch relaxation calculating block sizes wrong. Branch relaxation also considers alignment and thus a single mistake can result in later blocks being incorrectly sized even when they themselves do not contain meta instructions. The net result is we might not relax a branch whose destination is not within range. Reviewers: nickdesaulniers, peter.smith Reviewed By: peter.smith Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66337 llvm-svn: 369111
* [X86] Remove unused include. NFCI.Simon Pilgrim2019-08-161-1/+0
| | | | | | We don't use anything from TargetOptions.h directly and its included via TargetLowering.h anyhow. llvm-svn: 369110
OpenPOWER on IntegriCloud