summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [RISCV] Fix use of side-effects in asserts in decoder functionsLuis Marques2019-08-211-6/+9
| | | | llvm-svn: 369580
* Fix -Werror=unused-variable error after r369528.Richard Smith2019-08-211-0/+3
| | | | llvm-svn: 369573
* Revert r367389 (and follow-up r368404); it caused PR43073.Nico Weber2019-08-211-50/+76
| | | | llvm-svn: 369567
* [WebAssembly] Handle aliases in WebAssemblyFixFunctionBitcastsSam Clegg2019-08-211-0/+2
| | | | | | | | Fixes: https://github.com/emscripten-core/emscripten/issues/8770 Differential Revision: https://reviews.llvm.org/D66508 llvm-svn: 369566
* [MVT] Add v16f16 and v32f16 vectors.Craig Topper2019-08-211-0/+4
| | | | | | | | | I might look at improving PR43065 which will require being able to mark a 256 and 512 bit vector of f16 as Legal. Differential Revision: https://reviews.llvm.org/D66515 llvm-svn: 369565
* [mips] Replace call `expandLoadAddress` by `loadAndAddSymbolAddress`. NFCSimon Atanasyan2019-08-211-2/+2
| | | | | | | | In case of expanding `lw/sw $reg, symbol($reg)` instruction for PIC it's enough to call the `loadAndAddSymbolAddress` method. Additional work performed by the `expandLoadAddress` is not required here. llvm-svn: 369563
* [mips] Remove duplicated case from the `StringSwitch`. NFCSimon Atanasyan2019-08-211-1/+0
| | | | llvm-svn: 369562
* GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sourcesMatt Arsenault2019-08-211-1/+1
| | | | | | | | This is necessary for handling <3 x s16> on AMDGPU, assuming this should be handled as 2 separate legalization actions. The alternative would be for fewerElementsVector to handle 3->2. llvm-svn: 369547
* [ARM] Formatting for ARMInstrMVE.td. NFCDavid Green2019-08-211-89/+98
| | | | | | | This is just some formatting cleanup, prior to the masked load and store patch in D66534. llvm-svn: 369545
* [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitionsAlexander Timofeev2019-08-212-0/+16
| | | | | | | Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532
* [LLVM][Alignment] Introduce Alignment In MachineFrameInfoGuillaume Chatelet2019-08-212-4/+4
| | | | | | | | | | | | | | | | | Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: jfb Subscribers: hiraditya, dexonsmith, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65800 llvm-svn: 369531
* [RISCV] Add support for RVC HINT instructionsLuis Marques2019-08-217-3/+212
| | | | | | | | | The hint instructions are enabled by default (if the standard C extension is enabled). To disable them pass -mattr=-rvc-hints. Differential Revision: https://reviews.llvm.org/D62592 llvm-svn: 369528
* [MIPS GlobalISel] NarrowScalar G_ZEXTLOAD and G_SEXTLOADPetar Avramovic2019-08-211-3/+3
| | | | | | | | NarrowScalar G_ZEXTLOAD and G_SEXTLOAD to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66205 llvm-svn: 369512
* [MIPS GlobalISel] NarrowScalar G_ZEXT and G_SEXTPetar Avramovic2019-08-211-0/+4
| | | | | | | | NarrowScalar G_ZEXT and G_SEXT to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66204 llvm-svn: 369511
* [MIPS GlobalISel] Consider type1 when legalizing shifts after r351882Petar Avramovic2019-08-211-2/+2
| | | | | | | | | r351882 allows different type for shift amount then result and value being shifted. Fix MIPS Legalizer rules to take r351882 into account. Differential Revision: https://reviews.llvm.org/D66203 llvm-svn: 369510
* [MIPS GlobalISel] NarrowScalar G_TRUNCPetar Avramovic2019-08-211-0/+4
| | | | | | | | | Add NarrowScalar for G_TRUNC when NarrowTy is half the size of source. NarrowScalar G_TRUNC to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66202 llvm-svn: 369509
* [AArch64] Update MTE system register encodingsLuke Cheeseman2019-08-211-5/+5
| | | | | | | | | | The encodings for the system registers TFSRE0_EL1, TFSR_EL1 TFSR_EL2, TFSR_EL3 and TFSR_EL12 have been changed so that they consistently have CRn=5 and CRm=6 as per https://developer.arm.com/docs/ddi0487/latest. Differential Revision: https://reviews.llvm.org/D65442 llvm-svn: 369505
* [AArch64][GlobalISel] Add support for narrowScalar of G_ZEXTAmara Emerson2019-08-212-4/+5
| | | | | | | | We do this by merging the source with the high bits set to 0. Differential Revision: https://reviews.llvm.org/D66181 llvm-svn: 369480
* [RISCV GlobalISel] Adding initial GlobalISel infrastructureDaniel Sanders2019-08-2015-2/+412
| | | | | | | | | | | | | | | | | | | Summary: Add an initial GlobalISel skeleton for RISCV. It can only run ir translator for `ret void`. Patch by Andrew Wei Reviewers: asb, sabuasal, apazos, lenary, simoncook, lewis-revill, edward-jones, rogfer01, xiangzhai, rovka, Petar.Avramovic, mgorny, dsanders Reviewed By: dsanders Subscribers: pzheng, s.egerton, dsanders, hiraditya, rbar, johnrusso, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65219 llvm-svn: 369467
* [AArch64][GlobalISel] Select logical_imm32 and logical_imm64 patternsJessica Paquette2019-08-202-0/+23
| | | | | | | | | | | | | | Add a GlobalISel equivalent for the logical_imm32_XFORM and logical_imm64_XFORM SDNodeXForms in AArch64InstrFormats.td. - Add select-logical-imm.mir, which contains tests for each imported pattern. - Update select-pr32733.mir and select-scalar-shift-imm.mir, since they now select instructions of this form. Differential Revision: https://reviews.llvm.org/D66162 llvm-svn: 369465
* [AArch64][GlobalISel] Select patterns which use shifted register operandsJessica Paquette2019-08-202-0/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds GlobalISel equivalents for the following from AArch64InstrFormats: - arith_shifted_reg32 - arith_shifted_reg64 And partial support for - logical_shifted_reg32 - logical_shifted_reg32 The only thing missing for the logical cases is support for rotates. Other than the missing support, the transformation is identical for the arithmetic shifted register and the logical shifted register. Lots of tests here: - Add select-arith-shifted-reg.mir to show that we correctly select add and sub instructions which use this pattern. - Add select-logical-shifted-reg.mir to cover patterns which are not shared between the arithmetic and logical cases. - Update addsub-shifted.ll to show that we correctly fold shifts into adds/subs. - Update eon.ll to show that we can select the eon instruction by folding xors. Differential Revision: https://reviews.llvm.org/D66163 llvm-svn: 369460
* [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors ↵Craig Topper2019-08-201-0/+14
| | | | | | | | | | | | | | (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef) I also had to add a new combine to X86's combineExtractSubvector to prevent a regression. This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation. I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that. Differential Revision: https://reviews.llvm.org/D66456 llvm-svn: 369459
* Revert [WinEH] Allocate space in funclets stack to save XMM CSRsReid Kleckner2019-08-203-135/+26
| | | | | | | | This reverts r367088 (git commit 9ad565f70ec5fd3531056d7c939302d4ea970c83) And the follow up fix r368631 / e9865b9b31bb2e6bc742dc6fca8f9f9517c3c43e llvm-svn: 369457
* Adds support for writing the .bss section for XCOFF object files.Sean Fertile2019-08-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Adds Wrapper classes for MCSymbol and MCSection into the XCOFF target object writer. Also adds a class to represent the top-level sections, which we materialize in the ObjectWriter. executePostLayoutBinding will map all csects into the appropriate container depending on its storage mapping class, and map all symbols into their containing csect. Once all symbols have been processed we - Assign addresses and symbol table indices. - Calaculte section sizes. - Build the section header table. - Assign the sections raw-pointer value for non-virtual sections. Since the .bss section is virtual, writing the header table is enough to add support. Writing of a sections raw data, or of any relocations is not included in this patch. Testing is done by dumping the section header table, but it needs to be extended to include dumping the symbol table once readobj support for dumping auxiallary entries lands. Differential Revision: https://reviews.llvm.org/D65159 llvm-svn: 369454
* [X86] Add a DAG combine to transform (i8 (bitcast (v8i1 (extract_subvector ↵Craig Topper2019-08-201-0/+12
| | | | | | | | | | (v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target Without AVX512DQ we don't have KMOVB so we can't really copy 8-bits of a k-register to a GPR. We have to copy 16 bits instead. We do this even if the DAG copy is from v8i1->v16i1. If we detect the (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) we should rewrite the types to match the copy we do support. By doing this, we can help known bits to propagate without losing the upper 8 bits of the input to the extract_subvector. This allows some zero extends to be removed since we have an isel pattern to use kmovw for (zero_extend (i16 (bitcast (v16i1 X))). Differential Revision: https://reviews.llvm.org/D66489 llvm-svn: 369434
* [X86] Add isel patterns for (i64 (zext (i8 (bitcast (v16i1 X))))) to use a ↵Craig Topper2019-08-201-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | KMOVW and a SUBREG_TO_REG. Similar for i8 and anyextend. We already had patterns for extending to i32 to take advantage of the impliciting zeroing of the upper bits of a 32-bit GPR that is done by KMOVW/KMOVB. But the extend might be all the way to i64, in which case the existing patterns would fail and we'd get a KMOVW/B followed by a MOVZX. By adding patterns for i64 we can use the fact that KMOVW/B zero the upper bits of the 32-bit GPR and the normal property that 32-bit GPR writes implicitly zero the upper 32-bits of the full 64-bit GPR. The anyextend patterns are slightly different since we don't care about the upper zeros. For the i8->i64 I think this avoids selecting the anyextend as a MOVZX to prevent a partial register issue that doesn't exist. For i16->i64 I think we would have just emitted an insert_subreg on top of the extract_subreg that the vXi16->i16 bitcast pattern emits. The register coalescer or peephole pass should combine those, but this saves that work and makes i8/16 consistent. llvm-svn: 369431
* [TargetMachine] Don't try to create COFFSTUB references on windows on non-COFFMartin Storsjo2019-08-202-3/+8
| | | | | | | | This avoids spurious relocation types for windows/elf targets. Differential Revision: https://reviews.llvm.org/D66401 llvm-svn: 369426
* Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"Matt Arsenault2019-08-205-142/+28
| | | | | | | This reverts r367500 and r369203. This is causing various test failures. llvm-svn: 369417
* [X86][BtVer2] Use ReadAfterLd entries for the register operands of CMPXCHG.Andrea Di Biagio2019-08-201-3/+12
| | | | | | This is a follow-up of r369365. llvm-svn: 369412
* [X86] Use isNullConstant instead of getConstantOperandVal == 0. NFCCraig Topper2019-08-201-2/+2
| | | | llvm-svn: 369410
* [ARM] Select vaddvaSam Tebbs2019-08-201-0/+7
| | | | | | | | This patch adds vaddva selection. Differential revision: https://reviews.llvm.org/D66410 llvm-svn: 369404
* [X86][BtVer2] Fix latency and throughput of atomic INC/DEC/NEG/NOT.Andrea Di Biagio2019-08-201-0/+14
| | | | | | | | | Latency and throughput of LOCK INC/DEC/NEG/NOT is always 19cy. Number of uOPs is still 1. Differential Revision: https://reviews.llvm.org/D66469 llvm-svn: 369388
* [RISCV] Implement getExprForFDESymbol to ensure RISCV_32_PCREL is used for ↵Alex Bradbury2019-08-205-0/+30
| | | | | | | | | | | | | | | | | | the FDE location Follow binutils in using RISCV_32_PCREL for the FDE initial location. As explained in the relevant binutils commit <https://github.com/riscv/riscv-binutils-gdb/commit/a6cbf936e3dce68114d28cdf60d510a3f78a6d40>, the ADD/SUB pair of relocations is problematic in the presence of linker relaxation. This patch has the same end goal as D64715 but includes test changes and avoids adding a new global VariantKind to MCExpr.h (preferring RISCVMCExpr VKs like the rest of the RISC-V backend). Differential Revision: https://reviews.llvm.org/D66419 llvm-svn: 369375
* [X86][Btver2] Fix latency and throughput of CMPXCHG instructions.Andrea Di Biagio2019-08-205-4/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Jaguar, CMPXCHG has a latency of 11cy, and a maximum throughput of 0.33 IPC. Throughput is superiorly limited to 0.33 because of the implicit in/out dependency on register EAX. In the case of repeated non-atomic CMPXCHG with the same memory location, store-to-load forwarding occurs and values for sequent loads are quickly forwarded from the store buffer. Interestingly, the functionality in LLVM that computes the reciprocal throughput doesn't seem to know about RMW instructions. That functionality only looks at the "consumed resource cycles" for the throughput computation. It should be fixed/improved by a future patch. In particular, for RMW instructions, that logic should also take into account for the write latency of in/out register operands. An atomic CMPXCHG has a latency of ~17cy. Throughput is also limited to ~17cy/inst due to cache locking, which prevents other memory uOPs to start executing before the "lock releasing" store uOP. CMPXCHG8rr and CMPXCHG8rm are treated specially because they decode to one less macro opcode. Their latency tend to be the same as the other RR/RM variants. RR variants are relatively fast 3cy (but still microcoded - 5 macro opcodes). CMPXCHG8B is 11cy and unfortunately doesn't seem to benefit from store-to-load forwarding. That means, throughput is clearly limited by the in/out dependency on GPR registers. The uOP composition is sadly unknown (due to the lack of PMCs for the Integer pipes). I have reused the same mix of consumed resource from the other CMPXCHG instructions for CMPXCHG8B too. LOCK CMPXCHG8B is instead 18cycles. CMPXCHG16B is 32cycles. Up to 38cycles when the LOCK prefix is specified. Due to the in/out dependencies, throughput is limited to 1 instruction every 32 (or 38) cycles dependeing on whether the LOCK prefix is specified or not. I wouldn't be surprised if the microcode for CMPXCHG16B is similar to 2x microcode from CMPXCHG8B. So, I have speculatively set the JALU01 consumption to 2x the resource cycles used for CMPXCHG8B. The two new hasLockPrefix() functions are used by the btver2 scheduling model check if a MCInst/MachineInst has a LOCK prefix. Calls to hasLockPrefix() have been encoded in predicates of variant scheduling classes that describe lat/thr of CMPXCHG. Differential Revision: https://reviews.llvm.org/D66424 llvm-svn: 369365
* [X86] Add back the -x86-experimental-vector-widening-legalization comand ↵Craig Topper2019-08-202-143/+1227
| | | | | | | | | | | | | | | | | | | | | line flag and all associated code, but leave it enabled by default Google is reporting performance issues with the new default behavior and have asked for a way to switch back to the old behavior while we investigate and make fixes. I've restored all of the code that had since been removed and added additional checks of the command flag onto code paths that are not otherwise guarded by a check of getTypeAction. I've also modified the cost model tables to hopefully get us back to the previous costs. Hopefully we won't need to support this for very long since we have no test coverage of the old behavior so we can very easily break it. llvm-svn: 369332
* [NFC] Test commit, fix some comment spelling.Thomas Raoux2019-08-201-1/+1
| | | | llvm-svn: 369326
* [AsmPrinter] Remove const qualifier from EmitBasicBlockStart.Karl-Johan Karlsson2019-08-204-6/+6
| | | | | | | | | | | | | | | | Overriders may want to modify state in it. AMDGPU wants to, but has to make its members mutable in order to do so. Besides, EmitBasicBlockEnd is not const, so why should Start be? Patch by Bevin Hansson. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D66341 llvm-svn: 369325
* Refactor isPointerOffset (NFC).Evgeniy Stepanov2019-08-191-7/+7
| | | | | | | | | | | | | | | | Summary: Simplify the API using Optional<> and address comments in https://reviews.llvm.org/D66165 Reviewers: vitalybuka Subscribers: hiraditya, llvm-commits, ostannard, pcc Tags: #llvm Differential Revision: https://reviews.llvm.org/D66317 llvm-svn: 369300
* MemTag: stack initializer merging.Evgeniy Stepanov2019-08-193-6/+301
| | | | | | | | | | | | | | | | | | | | | | Summary: MTE provides instructions to update memory tags and data at the same time. This change makes use of those to generate more compact code for stack variable tagging + initialization. We collect memory store and memset instructions following an alloca or a lifetime.start call, and replace them with the corresponding MTE intrinsics. Since the intrinsics work on 16-byte aligned chunks, the stored values are combined as necessary. Reviewers: pcc, vitalybuka, ostannard Subscribers: srhines, javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66167 llvm-svn: 369297
* [WebAssembly][MC] Allow empty assembly functionsSam Clegg2019-08-191-9/+15
| | | | | | Differential Revision: https://reviews.llvm.org/D66434 llvm-svn: 369292
* [X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more ↵Craig Topper2019-08-191-9/+11
| | | | | | | | | | | | | | than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle. The motivating case are the changes in vector-reduce-add.ll where we were doing extra work in the scalar domain instead of shuffling. There may be some one use check that needs to be looked into there, but this patch sidesteps the issue by avoiding broadcasts that aren't really broadcasting. Differential Revision: https://reviews.llvm.org/D66071 llvm-svn: 369287
* [nfc] Silent gcc warningSerge Guelton2019-08-191-3/+2
| | | | llvm-svn: 369266
* [PeepholeOptimizer] Don't assume bitcast def always has inputJinsong Ji2019-08-191-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If we have a MI marked with bitcast bits, but without input operands, PeepholeOptimizer might crash with assert. eg: If we apply the changes in PPCInstrVSX.td as in this patch: [(set v4i32:$XT, (bitconvert (v16i8 immAllOnesV)))]>; We will get assert in PeepholeOptimizer. ``` llvm-lit llvm-project/llvm/test/CodeGen/PowerPC/build-vector-tests.ll -v llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:417: const llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i < getNumOperands() && "getOperand() out of range!"' failed. ``` The fix is to abort if we found out of bound access. Reviewers: qcolombet, MatzeB, hfinkel, arsenm Reviewed By: qcolombet Subscribers: wdng, arsenm, steven.zhang, wuzish, nemanjai, hiraditya, kbarton, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65542 llvm-svn: 369261
* [RISCV] Don't force absolute FK_Data_X fixups to relocsAlex Bradbury2019-08-191-0/+7
| | | | | | | | | | | | The current behavior of shouldForceRelocation forces relocations for the majority of fixups when relaxation is enabled. This makes sense for fixups which incorporate symbols but is unnecessary for simple data fixups where the fixup target is already resolved to an absolute value. Differential Revision: https://reviews.llvm.org/D63404 Patch by Edward Jones. llvm-svn: 369257
* [ARM] Add support for MVE vaddvSam Tebbs2019-08-194-0/+41
| | | | | | | | | This patch adds vecreduce_add and the relevant instruction selection for vaddv. Differential revision: https://reviews.llvm.org/D66085 llvm-svn: 369245
* [ARM] MVE sext costsDavid Green2019-08-191-0/+25
| | | | | | | | | This adds some sext costs for MVE, taken from the length of assembly sequences that we currently generate. Differential Revision: https://reviews.llvm.org/D66010 llvm-svn: 369244
* [X86] Teach lower1BitShuffle to match right shifts with upper zero elements ↵Craig Topper2019-08-191-19/+20
| | | | | | | | | | on types that don't natively support KSHIFT. We can support these by widening to a supported type, then shifting all the way to the left and then back to the right to ensure that we shift in zeroes. llvm-svn: 369232
* [X86] Fix the lower1BitShuffle code added in r369215 to correctly pass the ↵Craig Topper2019-08-191-1/+1
| | | | | | | | | | widened vector to the KSHIFT node. Not sure how to test this as we have tests that exercise this code, but nothing failed for the types not matching. Since all the k-registers use equivalent register classes everything just ends up working. llvm-svn: 369228
* [X86] Teach lower1BitShuffle to match KSHIFTR that doesn't use Zeroable and ↵Craig Topper2019-08-191-0/+48
| | | | | | | | | | | only relies on undef. This allows us to widen the type when the KSHIFTR instruction doesn't exist for the type. If we need to shift in zeroes into the upper elements we would need more work to guarantee zeroes when widening. llvm-svn: 369227
* [X86] Teach lower1BitShuffle to recognize padding a subvector with zeros ↵Craig Topper2019-08-191-7/+16
| | | | | | | | | with V2 as the source and V1 as the zero vector. Shuffle canonicalization can swap the sources so the zero vector might be V1 and the subvector that's being padded can be V2. llvm-svn: 369226
OpenPOWER on IntegriCloud