path: root/llvm/lib/Target
...
* AMDGPU/GlobalISel: Add equiv xform for bitcast_fpimm_to_i32 (Matt Arsenault, 2020-01-09, 3 files, -0/+17)
  Only partially fixes one pattern import.
* AMDGPU/GlobalISel: Fix add of neg inline constant pattern (Matt Arsenault, 2020-01-09, 4 files, -1/+26)
* AMDGPU: Add register class to DS_SWIZZLE_B32 pattern (Matt Arsenault, 2020-01-09, 1 file, -1/+1)
  Reduces diff for a future patch.
* [NFC][ARM] LowOverheadLoop comments (Sam Parker, 2020-01-09, 1 file, -0/+16)
  Add a comment describing the dependencies of the pass.
* [ARM][MVE] Don't unroll intrinsic loops. (Sam Parker, 2020-01-09, 1 file, -4/+5)
  We don't unroll vector loops for MVE targets, but we miss the case when loops only contain intrinsic calls. So just move the logic a bit to catch this case.
  Differential Revision: https://reviews.llvm.org/D72440
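  As a rough illustration of the kind of check involved (a sketch under assumptions, not the actual ARM TTI code; the helper name is invented), a loop that contains calls to vector intrinsics can be treated like any other vector loop:

    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/IntrinsicInst.h"
    using namespace llvm;

    // Return true if the block contains vector instructions, counting calls
    // to intrinsics whose result or arguments are vectors.
    static bool hasVectorOps(const BasicBlock &BB) {
      for (const Instruction &I : BB) {
        if (I.getType()->isVectorTy())
          return true;
        if (const auto *II = dyn_cast<IntrinsicInst>(&I))
          for (const Value *Arg : II->args())
            if (Arg->getType()->isVectorTy())
              return true;
      }
      return false;
    }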
* [VE] Target stub for NEC SX-Aurora (Kazushi (Jam) Marukawa, 2020-01-09, 13 files, -0/+273)
  Summary: This patch registers the 've' target: the NEC SX-Aurora TSUBASA Vector Engine.
  Reviewed By: arsenm
  Differential Revision: https://reviews.llvm.org/D69103
* Revert "[ARM][LowOverheadLoops] Update liveness info"Sam Parker2020-01-091-64/+0
| | | | | | | This reverts commit e93e0d413f3afa1df5c5f88df546bebcd1183155. There's some ordering problems on some on the buildbots which needs investigating.
* [ARM][LowOverheadLoops] Update liveness info (Sam Parker, 2020-01-09, 1 file, -0/+64)
  After expanding the pseudo instructions, update the liveness info. We do this in a post-order traversal of the loop, including its exit blocks and preheader(s).
  Differential Revision: https://reviews.llvm.org/D72131
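  For reference, a minimal sketch of recomputing block live-ins after pseudo expansion, assuming the usual LivePhysRegs helpers (the traversal shown covers the whole function rather than just the loop, and the helper name is invented):

    #include "llvm/ADT/PostOrderIterator.h"
    #include "llvm/CodeGen/LivePhysRegs.h"
    #include "llvm/CodeGen/MachineFunction.h"
    using namespace llvm;

    static void recomputeAllLiveIns(MachineFunction &MF) {
      LivePhysRegs LiveRegs;
      for (MachineBasicBlock *MBB : post_order(&MF)) {
        MBB->clearLiveIns();                  // drop the stale live-in list
        computeAndAddLiveIns(LiveRegs, *MBB); // rebuild it from this block's liveness
      }
    }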
* [APFloat] Fix checked error assert failures (Ehud Katz, 2020-01-09, 2 files, -3/+3)
  `APFloat::convertFromString` returns an `Expected` result, which must be "checked" if the LLVM_ENABLE_ABI_BREAKING_CHECKS preprocessor flag is set. To mark an `Expected` result as "checked" we must consume the `Error` within. In many cases, we are only interested in knowing if an error occurred, without the need to examine the error info. This is achieved, easily, with the `errorToBool()` API.
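  For context, a minimal sketch of the pattern being fixed: consuming the Error inside the Expected so that ABI-breaking checks see the result as "checked" (the surrounding function and variable names are hypothetical):

    #include "llvm/ADT/APFloat.h"
    #include "llvm/ADT/StringRef.h"
    #include "llvm/Support/Error.h"
    using namespace llvm;

    static bool parseFloat(StringRef Str, APFloat &F) {
      Expected<APFloat::opStatus> StatusOrErr =
          F.convertFromString(Str, APFloat::rmNearestTiesToEven);
      // errorToBool() consumes the Error, which marks the result as checked.
      if (errorToBool(StatusOrErr.takeError()))
        return false; // invalid input
      return true;
    }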
* Revert "Revert "[MIR] Target specific MIR formating and parsing""Daniel Sanders2020-01-081-1/+4
| | | | | | | There was an unguarded dereference of MF in a function that permitted nullptr. Fixed This reverts commit 71d64f72f934631aa2f12b9542c23f74f256f494.
* Revert "[MIR] Target specific MIR formating and parsing"Nico Weber2020-01-081-4/+1
| | | | | This reverts commit 3ef05d85be8c3666ebfa3ad986eb334da5195a47. It broke check-llvm on many bots, see comments on D69836.
* [MIR] Target specific MIR formating and parsing (Peng Guo, 2020-01-08, 1 file, -1/+4)
  Summary: Added MIRFormatter for target specific MIR formatting and parsing with immediate and custom pseudo source values. A target machine can subclass MIRFormatter and implement custom logic for printing and parsing immediate and custom pseudo source values for better readability.
  * A target specific immediate mnemonic needs to start with "." followed by an identifier string. When the MIR parser sees such an immediate it will call the target specific parsing function.
  * A custom pseudo source value needs to start with "custom" followed by a double-quoted string. The MIR parser will pass the quoted string to the target specific PSV parsing function.
  * MIRFormatter has two helper functions to facilitate LLVM value printing and parsing for custom PSVs if they refer to LLVM values.
  Patch by Peng Guo
  Reviewers: dsanders, arsenm
  Reviewed By: dsanders
  Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D69836
* Revert "[MIR] Target specific MIR formating and parsing"Daniel Sanders2020-01-081-4/+1
| | | | | | Forgot to credit Peng in the commit message. This reverts commit be841f89d0014b1e0246a4feae941b2f74abd908.
* [MIR] Target specific MIR formating and parsing (Peng Guo, 2020-01-08, 1 file, -1/+4)
  Summary: Added MIRFormatter for target specific MIR formatting and parsing with immediate and custom pseudo source values. A target machine can subclass MIRFormatter and implement custom logic for printing and parsing immediate and custom pseudo source values for better readability.
  * A target specific immediate mnemonic needs to start with "." followed by an identifier string. When the MIR parser sees such an immediate it will call the target specific parsing function.
  * A custom pseudo source value needs to start with "custom" followed by a double-quoted string. The MIR parser will pass the quoted string to the target specific PSV parsing function.
  * MIRFormatter has two helper functions to facilitate LLVM value printing and parsing for custom PSVs if they refer to LLVM values.
  Reviewers: dsanders, arsenm
  Reviewed By: dsanders
  Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D69836
* [PowerPC] when folding rlwinm+rlwinm. to andi., we should use first rlwinm input reg. (Zheng Chen, 2020-01-08, 1 file, -15/+21)
    %2:gprc = RLWINM %1:gprc, 27, 5, 10
    %3:gprc = RLWINM_rec %2:gprc, 8, 5, 10, implicit-def $cr0
    ==>
    %3:gprc = ANDI_rec %1, 0, implicit-def $cr0
  We should use %1 instead of %2 as ANDI_rec input.
  Reviewed By: steven.zhang
  Differential Revision: https://reviews.llvm.org/D71885
* [PowerPC]: Add powerpcspe target triple subarch component (Justin Hibbits, 2020-01-08, 1 file, -0/+3)
  Summary: This allows '-target powerpcspe-unknown-linux-gnu' or 'powerpcspe-unknown-freebsd' to be used, instead of '-target powerpc-unknown-linux-gnu -mspe'.
  Reviewed By: dim
  Differential Revision: https://reviews.llvm.org/D72014
* [X86] Remove EFLAGS from live-in lists in X86FlagsCopyLowering. (Jonas Paulsson, 2020-01-08, 1 file, -0/+3)
  When EFLAGS is no longer live into a basic block, remove it from the live-in list.
  Fixes https://bugs.llvm.org/show_bug.cgi?id=44462.
  Review: Craig Topper
  Differential Revision: https://reviews.llvm.org/D71375
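  The cleanup itself is small; a hedged sketch of what it amounts to (the helper name is invented, and X86::EFLAGS comes from the backend's generated register enum):

    #include "llvm/CodeGen/MachineBasicBlock.h"

    // If EFLAGS is no longer live into MBB, drop it from the block's live-in
    // list so the liveness info stays consistent.
    static void dropDeadEFLAGSLiveIn(llvm::MachineBasicBlock &MBB) {
      if (MBB.isLiveIn(X86::EFLAGS))
        MBB.removeLiveIn(X86::EFLAGS);
    }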
* Revert "Merge memtag instructions with adjacent stack slots."Evgenii Stepanov2020-01-087-489/+30
| | | | | | | | | | | | *** Bad machine code: Tied use must be a register *** - function: stg_alloca17 - basic block: %bb.0 entry (0x20076710580) - instruction: early-clobber %0:gpr64common, early-clobber %1:gpr64sp = STGloop 272, %stack.0.a :: (store 272 into %ir.a, align 16) - operand 3: %stack.0.a http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/21481/steps/test-check-all/logs/stdio This reverts commit b675a7628ce6a21b1e4a71c079a67badfb8b073d.
* Merge memtag instructions with adjacent stack slots.Evgenii Stepanov2020-01-087-30/+489
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Detect a run of memory tagging instructions for adjacent stack frame slots, and replace them with a shorter instruction sequence * replace STG + STG with ST2G * replace STGloop + STGloop with STGloop This code needs to run when stack slot offsets are already known, but before FrameIndex operands in STG instructions are eliminated; that's the reason for the new hook in PrologueEpilogue. This change modifies STGloop and STZGloop pseudos to take the size as an immediate integer operand, and base address as a FI operand when possible. This is needed to simplify recognizing an STGloop instruction as operating on a stack slot post-regalloc. This improves memtag code size by ~0.25%, and it looks like an additional ~0.1% is possible by rearranging the stack frame such that consecutive STG instructions reference adjacent slots (patch pending). Reviewers: pcc, ostannard Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70286
* [X86] Keep cl::opts at top of file [NFC]Philip Reames2020-01-081-34/+34
|
* [X86] Custom type legalize v4i64->v4f32 uint_to_fp on sse4.1 targets in 64-bit mode (Craig Topper, 2020-01-08, 1 file, -9/+11)
  For v4i64->v4f32 uint_to_fp on pre-avx targets where v4i64 isn't legal, we create two v2i64->v2f32 uint_to_fp nodes that need to be shuffled together. Our codegen for v2i64->v2f32 involves detecting if the number is larger than (2^31 - 1); if so we do a special division by 2 so we can do a signed conversion, which we need to scalarize, then do a multiply by 2 at the end if we divided earlier.
  When v4i64 isn't legal we need to split the checking for a larger number and dividing by 2 into two v2i64 vectors. The scalar part can extract the 4 i64 values from those splits. But we can reassemble the 4 scalar f32 results directly into a single v4f32 vector. Then we just need to combine the fixup indications from the 2 halves, and we can do the final multiply by 2 fixup on all 4 values, if needed, at once using a single v4f32 blend and v4f32 fadd.
  Differential Revision: https://reviews.llvm.org/D72368
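  The scalar version of that fixup is easier to see in isolation. A sketch assuming the standard halve/convert/double trick for values too large for a signed conversion (it mirrors the general idea only, not the exact DAG sequence):

    #include <cstdint>

    float u64ToF32(uint64_t X) {
      if (X <= UINT64_C(0x7FFFFFFFFFFFFFFF))
        return (float)(int64_t)X; // fits in a signed conversion directly
      // Halve the value (keeping the low bit so rounding stays correct),
      // convert as signed, then multiply by 2 to undo the halving.
      uint64_t Half = (X >> 1) | (X & 1);
      return (float)(int64_t)Half * 2.0f;
    }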
* [X86] Add isel patterns for bitcasting between v32i1/v64i1 and float/double. (Craig Topper, 2020-01-08, 1 file, -0/+11)
  We have to do an intermediate jump to a GPR to make the cast.
  Fixes PR43750.
* [BranchAlign] Compiler support for suppressing branch align (Philip Reames, 2020-01-08, 2 files, -2/+50)
  As discussed heavily in the original review (D70157), there's a need for the compiler to be able to selectively suppress padding (either nop or prefix) to respect assumptions about the meaning of labels and instructions in generated code. Rather than wait for syntax to be finalized - which appears to be a very slow process - this patch focuses on the compiler use case and *only* worries about the integrated assembler. To my knowledge, this covers all cases mentioned to date for clang/JIT support.
  For testing purposes, I wired it up so that if the integrated assembler was using autopadding for branch alignment (e.g. enabled at the command line) then the textual assembly output would contain a comment for each location where padding was enabled or disabled. This seemed like the least painful choice overall.
  Note that the result of this patch effectively disables the jcc errata mitigation for many constructs (statepoints, implicit null checks, xray, etc...), which is non-ideal. It is at least *correct* and should allow us to enable the mitigation for the compiler. Once that's done, and a few other items are worked through, we probably want to come back to this and explore a bundling-based approach instead so that we can pad instructions while keeping labels in the right place.
  Differential Revision: https://reviews.llvm.org/D72303
* Fix "pointer is null" static analyzer warning. NFCI.Simon Pilgrim2020-01-081-1/+1
| | | | Use cast<> instead of dyn_cast<> since we know that the pointer should be valid (and is dereferenced immediately below in the getSignature call).
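  For background, a small self-contained example of the distinction (the class names are placeholders for the real types): dyn_cast<> returns nullptr on a type mismatch, while cast<> asserts the cast is valid and never returns nullptr.

    #include "llvm/Support/Casting.h"

    // Placeholder hierarchy with the usual LLVM-style classof() hook.
    struct Base {
      enum Kind { K_Base, K_Derived } TheKind;
      Base(Kind K) : TheKind(K) {}
    };
    struct Derived : Base {
      Derived() : Base(K_Derived) {}
      static bool classof(const Base *B) { return B->TheKind == K_Derived; }
      void use() const {}
    };

    void example(Base *Ptr) {
      // dyn_cast<> may return nullptr, so dereferencing its result without a
      // check is what the static analyzer warns about.
      if (auto *D = llvm::dyn_cast<Derived>(Ptr))
        D->use();
      // cast<> asserts the type is correct; use it when the pointer is
      // already known to be valid, as in the commit above.
      llvm::cast<Derived>(Ptr)->use();
    }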
* [amdgpu] Remove unused header. NFC. (Michael Liao, 2020-01-08, 1 file, -1/+0)
* [ARM,MVE] Intrinsics for variable shift instructions. (Simon Tatham, 2020-01-08, 1 file, -12/+49)
  This batch of intrinsics fills in all the shift instructions that take a variable shift distance in a register, instead of an immediate. Some of these instructions take a single shift distance in a scalar register and apply it to all lanes; others take a vector of per-lane distances.
  These instructions are all basically one family, varying in whether they saturate out-of-range values, and whether they round when bits are shifted off the bottom. I've implemented them at the IR level by a much smaller family of IR intrinsics, which take flag parameters to indicate saturating and/or rounding (along with the usual one to specify signed/unsigned integers).
  An oddity is that all of them are //left// shift instructions, but if you pass a negative shift count, they'll shift right. So the vector shift distances are always vectors of //signed// integers, regardless of whether you consider the other input vector to be signed or unsigned. Also, even the simplest `vshlq` instruction in this family (neither saturating nor rounding) has to be implemented as an IR intrinsic, because the ordinary LLVM IR `shl` operation would consider an out-of-range shift count to be undefined behavior.
  Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
  Reviewed By: dmgreen
  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D72329
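  To make the "negative count shifts right" convention concrete, here is a rough per-lane scalar model (a sketch of the semantics only; saturation and out-of-range counts are omitted):

    #include <cstdint>

    // One lane of a non-saturating variable shift: the count is signed, and a
    // negative count shifts right, optionally rounding the bits shifted off.
    int32_t laneShift(int32_t Val, int8_t Count, bool Round) {
      if (Count >= 0)
        return (int32_t)((uint32_t)Val << Count);
      int Amount = -Count;
      if (Round)
        Val += (int32_t)1 << (Amount - 1); // round to nearest
      return Val >> Amount;                // arithmetic right shift
    }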
* [ARM,MVE] Intrinsics for partial-overwrite imm shifts. (Simon Tatham, 2020-01-08, 1 file, -49/+123)
  This batch of intrinsics covers two sets of immediate shift instructions, which have in common that they only overwrite part of their output register and so they need an extra input giving its previous value.
  The VSLI and VSRI instructions shift each lane of the input vector left or right just as if they were normal immediate VSHL/VSHR, but then they only overwrite the output bits that correspond to actual shifted bits of the input. So VSLI will leave the low n bits of each output lane unchanged, and VSRI the same with the top n bits.
  The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input vector of 2n-bit integers, shift each lane right by a constant, and then narrow the shifted result to only n bits. So they only overwrite half of the n-bit lanes in the output register, and the B/T suffix indicates whether it's the bottom or top half of each 2n-bit lane.
  I've implemented the whole of the latter family using a single IR intrinsic `vshrn`, which takes a lot of i32 parameters indicating which instruction it expands to (by specifying signedness of the input and output types, whether it saturates and/or rounds, etc).
  Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
  Reviewed By: dmgreen
  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D72328
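  A per-lane scalar model of the shift-and-insert behaviour may help (again only a sketch of the semantics; N is assumed to be in the range 1-31):

    #include <cstdint>

    // VSLI-style lane: shift In left by N, but keep the low N bits of the
    // previous output value Prev unchanged.
    uint32_t laneVSLI(uint32_t Prev, uint32_t In, unsigned N) {
      uint32_t LowMask = (uint32_t(1) << N) - 1;
      return (Prev & LowMask) | (In << N);
    }

    // VSRI-style lane: shift In right by N, but keep the top N bits of Prev.
    uint32_t laneVSRI(uint32_t Prev, uint32_t In, unsigned N) {
      uint32_t HighMask = ~((uint32_t(1) << (32 - N)) - 1);
      return (Prev & HighMask) | (In >> N);
    }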
* [ARM][MVE] Enable masked gathers from vector of pointers (Anna Welker, 2020-01-08, 6 files, -1/+208)
  Adds a pass to the ARM backend that takes a v4i32 gather and transforms it into a call to MVE's masked gather intrinsics.
  Differential Revision: https://reviews.llvm.org/D71743
* AArch64: add missing Apple CPU names and use them by default. (Tim Northover, 2020-01-08, 4 files, -7/+107)
  Apple's CPUs are called A7-A13 in official communication, occasionally with weird suffixes which we probably don't need to care about. This adds each one and describes its features. It also switches the default CPU to the canonical name for Cyclone, but leaves legacy support in so that existing bitcode still compiles.
* [X86] Adding fp128 support for strict fcmp (Wang, Pengfei, 2020-01-08, 1 file, -5/+5)
  Summary: Adding fp128 support for strict fcmp
  Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand
  Subscribers: hiraditya, llvm-commits, LuoYuanke
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71897
* [RISCV] Fix evalutePCRelLo for symbols at the end of a fragment (James Clarke, 2020-01-08, 1 file, -1/+5)
  Summary: This is analogous to D58943, which correctly finds the corresponding fixup. However, when linker relaxations are disabled and we evaluate the fixup, we need to also ensure we use an offset of 0 rather than the size of the previous fragment.
  Reviewers: asb, efriedma, lenary
  Reviewed By: efriedma
  Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71978
* AMDGPU: Annotate EXTRACT_SUBREGs with source register classes (Matt Arsenault, 2020-01-07, 1 file, -24/+24)
  This partially fixes GlobalISel import of the patterns, but removes a lot of entries from the end of the skipped pattern log.
* AMDGPU/GlobalISel: Fix scalar G_SELECT for arbitrary pointers (Matt Arsenault, 2020-01-07, 1 file, -1/+1)
  4e85ca9562a588eba491e44bcbf73cb2f419780f missed updating the legal condition type set for pointers with any unrecognized address space.
* AMDGPU: Apply i16 add->sub pattern with zext to i32 (Matt Arsenault, 2020-01-07, 1 file, -8/+15)
  This was only applying the deeper nested zext pattern, and missing the special case code size fold.
* [X86] Enable v2i64->v2f32 uint_to_fp code in ReplaceNodeResults on SSE4.1 target (Craig Topper, 2020-01-07, 1 file, -3/+1)
  Now that we generate decent code for (v2i64 (setlt zero, X)) on pre-sse4.2 targets I think we can use this now.
  Differential Revision: https://reviews.llvm.org/D72354
* AMDGPU: Remove VOP3Mods0Clamp0OMod (Matt Arsenault, 2020-01-07, 6 files, -34/+1)
  Now that overridable default operands work, there's no reason to use complex patterns to just produce 0s.
* AMDGPU: Fix misleading, misplaced end block comments (Matt Arsenault, 2020-01-07, 1 file, -2/+2)
* AMDGPU: Use ImmLeaf (Matt Arsenault, 2020-01-07, 1 file, -2/+2)
* AMDGPU: Fix not using v_cvt_f16_[iu]16 (Matt Arsenault, 2020-01-07, 2 files, -10/+35)
  We weren't treating i16->f16 casts as legal on targets with these instructions, and always using a pair of casts through i32.
* [PowerPC][Triple] Use elfv2 on freebsd>=13 and linux-musl (Fangrui Song, 2020-01-07, 1 file, -2/+0)
  Summary: Every powerpc64le platform uses elfv2. For powerpc64, the environments "elfv1" and "elfv2" were added for FreeBSD ELFv1->ELFv2 migration in D61950. FreeBSD developers have decided to use OS versions to select ABI, and no one is relying on the environments.
  Also use elfv2 on powerpc64-linux-musl.
  Users can always use -mabi=elfv1 and -mabi=elfv2 to override the default ABI.
  Reviewed By: adalava
  Differential Revision: https://reviews.llvm.org/D72352
* [MachineOutliner][AArch64] Save + restore LR in noreturn functions (Jessica Paquette, 2020-01-07, 1 file, -1/+11)
  Conservatively always save + restore LR in noreturn functions. These functions do not end in a RET, and so they aren't guaranteed to have an instruction which uses LR in any way. So, as a result, you can end up in unfortunate situations where you can't backtrace out of these functions in a debugger.
  Remove the old noreturn test, and add a new one which is more descriptive.
  Remove the restriction that we can't outline from noreturn functions as well since we now do the right thing.
* [X86] Improve lowering of (v2i64 (setgt X, -1)) on pre-SSE2 targets. Enable v2i64 in foldVectorXorShiftIntoCmp. (Craig Topper, 2020-01-07, 1 file, -3/+14)
  Similar to D72302 but for the canonical form for the opposite case. I've changed foldVectorXorShiftIntoCmp to form a target independent setcc node instead of PCMPGT now and enabled it for v2i64 on pre-SSE4.2 targets. The setcc should eventually get lowered to PCMPGT or the new v2i64 sequence.
  Differential Revision: https://reviews.llvm.org/D72318
* [X86] Improve lowering of v2i64 sign bit tests on pre-sse4.2 targets (Craig Topper, 2020-01-07, 1 file, -0/+13)
  Without sse4.2 a v2i64 setlt needs to expand into a pcmpgtd, pcmpeqd, 3 shuffles, and 2 logic ops. But if we're only interested in the sign bit of the i64 elements, we can just use one pcmpgtd and shuffle the odd elements to the even elements.
  Differential Revision: https://reviews.llvm.org/D72302
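  In intrinsics form, the idea looks roughly like this (a hedged sketch using plain SSE2; the compiler emits the equivalent DAG nodes rather than this code):

    #include <emmintrin.h>

    // Test the sign bit of each 64-bit lane with SSE2 only: the sign of an
    // i64 is the sign of its high 32-bit half, so compare 0 > x per 32-bit
    // element, then broadcast each lane's high-half result to both halves.
    __m128i v2i64IsNegative(__m128i X) {
      __m128i Gt = _mm_cmpgt_epi32(_mm_setzero_si128(), X);
      return _mm_shuffle_epi32(Gt, _MM_SHUFFLE(3, 3, 1, 1));
    }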
* [X86] Pull out repeated SrcVT.getVectorNumElements() call. NFCI. (Simon Pilgrim, 2020-01-07, 1 file, -2/+2)
* [AIX][XCOFF] Implement mergeable const (diggerlin, 2020-01-07, 1 file, -2/+1)
  SUMMARY: In this patch, we map mergeable const objects to the read-only section in the same manner as const objects that are not mergeable.
  Reviewers: hubert.reinterpretcast, jasonliu
  Subscribers: wuzish, nemanjai, hiraditya
  Differential Revision: https://reviews.llvm.org/D71551
* AMDGPU/GlobalISel: Fix readfirstlane pattern import (Matt Arsenault, 2020-01-07, 2 files, -2/+2)
  The imm folding optimization pattern failed to import. The instruction pattern was already working, but failing to fail on SGPR inputs.
* AMDGPU/GlobalISel: Fix import of s_abs_i32 pattern (Matt Arsenault, 2020-01-07, 1 file, -1/+1)
* AMDGPU/GlobalISel: Select llvm.amdgcn.wqm.vote (Matt Arsenault, 2020-01-07, 1 file, -2/+2)
* AMDGPU/GlobalISel: Partially fix llvm.amdgcn.kill pattern import (Matt Arsenault, 2020-01-07, 1 file, -4/+4)
  Tests deferred since the existing DAG test depends on some other operations, but isn't far from working as-is.
* [ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS (Sjoerd Meijer, 2020-01-07, 1 file, -31/+41)
  This is a recommit of D71330, but with a few things fixed and changed:
  1) ReachingDefAnalysis: this was not running with optnone as it was checking skipFunction(), which other analysis passes don't do. I guess this is a copy-paste from a codegen pass.
  2) VPTBlockPass: here I've added skipFunction(), because like most/all optimisations, we don't want to run this with optnone.
  This fixes the issues with the initial/previous commit: the VPTBlockPass was running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was crashing querying ReachingDefAnalysis.
  I've added test case mve-vpt-block-optnone.mir to check that we don't run VPTBlock with optnone.
  Differential Revision: https://reviews.llvm.org/D71470
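  The optnone guard added in (2) is the standard one for codegen passes; a minimal sketch (the pass class name here is a stand-in, not the real one):

    #include "llvm/CodeGen/MachineFunctionPass.h"
    using namespace llvm;

    namespace {
    struct VPTBlockSketch : MachineFunctionPass {
      static char ID;
      VPTBlockSketch() : MachineFunctionPass(ID) {}
      bool runOnMachineFunction(MachineFunction &MF) override {
        // Respect optnone: skipFunction() keeps the pass (and anything it
        // queries, such as ReachingDefAnalysis) from running at -O0/optnone.
        if (skipFunction(MF.getFunction()))
          return false;
        // ... normal VPT block formation would go here ...
        return true;
      }
    };
    char VPTBlockSketch::ID = 0;
    } // end anonymous namespace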