path: root/llvm/lib
...
* [DAGCombiner] clean up extract-of-concat fold; NFC (Sanjay Patel, 2020-01-08, 1 file, -13/+21)

  This hopes to improve readability and adds an assert. The functional
  change noted by the TODO comment is proposed in: D72361

* [JumpThreading] Thread jumps through two basic blocks (Kazu Hirata, 2020-01-08, 1 file, -2/+228)

  Summary:
  This patch teaches JumpThreading.cpp to thread through two basic
  blocks like:

      bb3:
        %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
        %tobool = icmp eq i32 %cond, 0
        br i1 %tobool, label %bb4, label ...

      bb4:
        %cmp = icmp eq i32* %var, null
        br i1 %cmp, label %bb5, label %bb6

  by duplicating basic blocks like bb3 above. Once we duplicate bb3 as
  bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:

      bb3:
        %var = phi i32* [ @a, %bb2 ]
        %tobool = icmp eq i32 %cond, 0
        br i1 %tobool, label %bb4, label ...

      bb3.dup:
        %var = phi i32* [ null, %bb1 ]
        %tobool = icmp eq i32 %cond, 0
        br i1 %tobool, label %bb4, label ...

      bb4:
        %cmp = icmp eq i32* %var, null
        br i1 %cmp, label %bb5, label %bb6

  Then the existing code in JumpThreading.cpp can thread edge
  bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.

  Reviewers: wmi

  Subscribers: hiraditya, jfb, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D70247

* [ARM,MVE] Intrinsics for variable shift instructions. (Simon Tatham, 2020-01-08, 1 file, -12/+49)

  This batch of intrinsics fills in all the shift instructions that
  take a variable shift distance in a register, instead of an
  immediate. Some of these instructions take a single shift distance in
  a scalar register and apply it to all lanes; others take a vector of
  per-lane distances.

  These instructions are all basically one family, varying in whether
  they saturate out-of-range values, and whether they round when bits
  are shifted off the bottom. I've implemented them at the IR level by
  a much smaller family of IR intrinsics, which take flag parameters to
  indicate saturating and/or rounding (along with the usual one to
  specify signed/unsigned integers).

  An oddity is that all of them are *left* shift instructions - but if
  you pass a negative shift count, they'll shift right. So the vector
  shift distances are always vectors of *signed* integers, regardless
  of whether you're considering the other input vector to be signed or
  unsigned. Also, even the simplest `vshlq` instruction in this family
  (neither saturating nor rounding) has to be implemented as an IR
  intrinsic, because the ordinary LLVM IR `shl` operation would
  consider an out-of-range shift count to be undefined behavior.

  Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

  Reviewed By: dmgreen

  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

  Tags: #clang, #llvm

  Differential Revision: https://reviews.llvm.org/D72329

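  To illustrate that last point in plain IR (the shl semantics are from
  the LangRef; the intrinsic name and flag operands below are a
  hypothetical sketch, not the exact ones added by this patch):

      ; Any lane of %dist that is >= 32 makes the corresponding result
      ; lane poison, so plain shl cannot express MVE's behaviour of
      ; shifting right for negative counts and bounding large ones.
      %bad = shl <4 x i32> %x, %dist

      ; Hypothetical shape of the flag-parameterised intrinsic family:
      ; the i32 flags select saturating, rounding and unsigned variants.
      %ok = call <4 x i32> @llvm.arm.mve.vshl.vector(<4 x i32> %x, <4 x i32> %dist,
                                                     i32 0, i32 0, i32 0)
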
* [ARM,MVE] Intrinsics for partial-overwrite imm shifts. (Simon Tatham, 2020-01-08, 1 file, -49/+123)

  This batch of intrinsics covers two sets of immediate shift
  instructions, which have in common that they only overwrite part of
  their output register and so they need an extra input giving its
  previous value.

  The VSLI and VSRI instructions shift each lane of the input vector
  left or right just as if they were normal immediate VSHL/VSHR, but
  then they only overwrite the output bits that correspond to actual
  shifted bits of the input. So VSLI will leave the low n bits of each
  output lane unchanged, and VSRI the same with the top n bits.

  The V[Q][R]SHR[U]N family are all narrowing shifts: they take an
  input vector of 2n-bit integers, shift each lane right by a constant,
  and then narrow the shifted result to only n bits. So they only
  overwrite half of the n-bit lanes in the output register, and the B/T
  suffix indicates whether it's the bottom or top half of each 2n-bit
  lane.

  I've implemented the whole of the latter family using a single IR
  intrinsic `vshrn`, which takes a lot of i32 parameters indicating
  which instruction it expands to (by specifying signedness of the
  input and output types, whether it saturates and/or rounds, etc).

  Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

  Reviewed By: dmgreen

  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

  Tags: #clang, #llvm

  Differential Revision: https://reviews.llvm.org/D72328

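  As a rough sketch of what one VSLI lane computes, in scalar IR (an
  editor's illustration of the semantics described above, not the
  actual lowering):

      ; VSLI #3 on a 32-bit lane: shift the input left by 3, but keep
      ; the low 3 bits of the lane's previous value %prev.
      %shifted = shl i32 %in, 3
      %kept    = and i32 %prev, 7        ; (1 << 3) - 1
      %out     = or i32 %shifted, %kept
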
* [Intrinsic] Add fixed point division intrinsics. (Bevin Hansson, 2020-01-08, 10 files, -33/+299)

  Summary:
  This patch adds intrinsics and ISelDAG nodes for signed and unsigned
  fixed-point division:

      llvm.sdiv.fix.*
      llvm.udiv.fix.*

  These intrinsics perform scaled division on two integers or vectors
  of integers. They are required for the implementation of the
  Embedded-C fixed-point arithmetic in Clang.

  Patch by: ebevhan

  Reviewers: bjope, leonardchan, efriedma, craig.topper

  Reviewed By: craig.topper

  Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D70007

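  A minimal usage sketch (based on the LangRef description of these
  intrinsics; the third operand is the scale, i.e. the number of
  fractional bits, and must be a constant):

      declare i32 @llvm.sdiv.fix.i32(i32, i32, i32)

      ; Divide two signed fixed-point values with 15 fractional bits.
      %q = call i32 @llvm.sdiv.fix.i32(i32 %a, i32 %b, i32 15)
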
* [NFC] Move InPQueue into arguments of releaseNode (Qiu Chaofan, 2020-01-08, 1 file, -8/+3)

  This patch moves `InPQueue` into function arguments instead of
  template arguments of `releaseNode`, which is a cleaner approach.

  Differential Revision: https://reviews.llvm.org/D72125

* [ARM][MVE] Enable masked gathers from vector of pointers (Anna Welker, 2020-01-08, 6 files, -1/+208)

  Adds a pass to the ARM backend that takes a v4i32 gather and
  transforms it into a call to MVE's masked gather intrinsics.

  Differential Revision: https://reviews.llvm.org/D71743

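  The kind of generic IR gather this pass matches looks like the
  following (a sketch; the MVE intrinsic it is rewritten into is
  internal to the backend):

      ; Load four i32s from four pointers, under a lane mask.
      %r = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(
               <4 x i32*> %ptrs, i32 4, <4 x i1> %mask, <4 x i32> %passthru)
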
* [Dsymutil][Debuginfo][NFC] Reland: Refactor dsymutil to separate DWARF optimizing part. #2. (Alexey Lapshin, 2020-01-08, 8 files, -1/+398)

  Summary:
  This patch relands D71271. The problem with D71271 is that it has a
  cyclic dependency: CodeGen->AsmPrinter->DebugInfoDWARF->CodeGen. To
  avoid the cyclic dependency, this patch puts the implementation for
  DWARFOptimizer into a separate library: lib/DWARFLinker.

  Thus the difference between this patch and D71271 is that
  DWARFOptimizer is renamed to DWARFLinker and its files are put into
  lib/DWARFLinker.

  Reviewers: JDevlieghere, friss, dblaikie, aprantl

  Reviewed By: JDevlieghere

  Subscribers: thegameg, merge_guards_bot, probinson, mgorny, hiraditya,
  llvm-commits

  Tags: #llvm, #debug-info

  Differential Revision: https://reviews.llvm.org/D71839

* Revert "[InstCombine] fold zext of masked bit set/clear" (Kadir Cetinkaya, 2020-01-08, 1 file, -17/+3)

  This reverts commit a041c4ec6f7aa659b235cb67e9231a05e0a33b7d.

  This looks like a non-trivial change and there have been no code
  reviews (at least there were no phabricator revisions attached to the
  commit description). It is also causing a regression in one of our
  downstream integration tests; we haven't been able to come up with a
  minimal reproducer yet.

* AArch64: add missing Apple CPU names and use them by default. (Tim Northover, 2020-01-08, 4 files, -7/+107)

  Apple's CPUs are called A7-A13 in official communication, occasionally
  with weird suffixes which we probably don't need to care about. This
  adds each one and describes its features. It also switches the default
  CPU to the canonical name for Cyclone, but leaves legacy support in so
  that existing bitcode still compiles.

* [X86] Adding fp128 support for strict fcmp (Wang, Pengfei, 2020-01-08, 3 files, -25/+63)

  Summary: Adding fp128 support for strict fcmp.

  Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand

  Subscribers: hiraditya, llvm-commits, LuoYuanke

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D71897

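  Strict (exception-aware) comparisons are expressed with the
  constrained intrinsics; a sketch of the fp128 form this enables:

      ; Quiet ordered-equal compare whose FP exception behaviour must
      ; be preserved (the containing function must be strictfp).
      %r = call i1 @llvm.experimental.constrained.fcmp.f128(
               fp128 %a, fp128 %b,
               metadata !"oeq", metadata !"fpexcept.strict")
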
* [RISCV] Fix evaluatePCRelLo for symbols at the end of a fragment (James Clarke, 2020-01-08, 1 file, -1/+5)

  Summary:
  This is analogous to D58943, which correctly finds the corresponding
  fixup. However, when linker relaxations are disabled and we evaluate
  the fixup, we need to also ensure we use an offset of 0 rather than
  the size of the previous fragment.

  Reviewers: asb, efriedma, lenary

  Reviewed By: efriedma

  Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD,
  kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01,
  MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl,
  benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques,
  llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D71978

* AMDGPU: Annotate EXTRACT_SUBREGs with source register classes (Matt Arsenault, 2020-01-07, 1 file, -24/+24)

  This partially fixes GlobalISel import of the patterns, but removes a
  lot of entries from the end of the skipped pattern log.

* [SCEV] get more accurate range for AddExpr with wrap flag. (czhengsz, 2020-01-07, 1 file, -1/+8)

  Reviewed By: nikic

  Differential Revision: https://reviews.llvm.org/D64869

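  A sketch of the kind of case this improves (editor's example, not
  from the patch): a no-wrap flag on an add lets SCEV derive a tighter
  range for the sum from its operands' ranges.

      ; If %i is known to be in [0, 100), the nuw flag lets SCEV bound
      ; %a to [5, 105) instead of falling back to the full i32 range.
      %a = add nuw i32 %i, 5
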
* [GVN/FP] Consolidate logic for reasoning about equality vs equivalence for floats (Philip Reames, 2020-01-07, 1 file, -29/+58)

  Factor out common logic into some reasonably commented helper
  functions. In the process, ensure that the in-block vs cross-block
  cases are handled the same. They previously weren't.

  Differential Revision: https://reviews.llvm.org/D67126

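  Why floats need the equality-vs-equivalence distinction at all (a
  standard illustration, not taken from the patch): fcmp oeq treats
  0.0 and -0.0 as equal, but the two values are not interchangeable.

      ; Propagating the constant 0.0 for %x into the division below
      ; would be wrong: 1.0/0.0 is +inf while 1.0/-0.0 is -inf.
      %c = fcmp oeq double %x, 0.0
      br i1 %c, label %then, label %end
    then:
      %r = fdiv double 1.0, %x
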
* [AArch64][GlobalISel] Fold a chain of two G_PTR_ADDs of constant offsets. (Amara Emerson, 2020-01-07, 1 file, -0/+46)

  E.g.

      %addr1 = G_PTR_ADD %base, G_CONSTANT 20
      %addr2 = G_PTR_ADD %addr1, G_CONSTANT 8
    -->
      %addr2 = G_PTR_ADD %base, G_CONSTANT 28

  Differential Revision: https://reviews.llvm.org/D72351

* Revert "Allow output constraints on "asm goto"" (Bill Wendling, 2020-01-07, 5 files, -50/+18)

  This reverts commit 52366088a8e42c2f1e96e8430b84b8b65ec3f7bc. I
  accidentally pushed this before supporting changes.

* Allow output constraints on "asm goto" (Bill Wendling, 2020-01-07, 5 files, -18/+50)

  Summary:
  Remove the restrictions that prevented "asm goto" from returning
  non-void values. The values returned by "asm goto" are only valid on
  the "fallthrough" path.

  Reviewers: jyknight, nickdesaulniers, hfinkel

  Reviewed By: jyknight, nickdesaulniers

  Subscribers: rsmith, hiraditya, llvm-commits, cfe-commits,
  craig.topper, rnk

  Tags: #clang, #llvm

  Differential Revision: https://reviews.llvm.org/D69876

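  At the IR level, "asm goto" is the callbr instruction, so this
  corresponds to letting callbr produce a value (a hedged sketch of the
  shape involved; the result is only valid on the fallthrough edge):

      %res = callbr i32 asm "...", "=r"()
                 to label %fallthrough [label %indirect]

    fallthrough:                      ; %res may be used here
      ...
    indirect:                         ; ...but not here
      ...
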
* AMDGPU/GlobalISel: Fix scalar G_SELECT for arbitrary pointers (Matt Arsenault, 2020-01-07, 1 file, -1/+1)

  4e85ca9562a588eba491e44bcbf73cb2f419780f missed updating the legal
  condition type set for pointers with any unrecognized address space.

* AMDGPU: Apply i16 add->sub pattern with zext to i32 (Matt Arsenault, 2020-01-07, 1 file, -8/+15)

  This was only applying the deeper nested zext pattern, and missing
  the special case code size fold.

* [X86] Enable v2i64->v2f32 uint_to_fp code in ReplaceNodeResults on SSE4.1 target (Craig Topper, 2020-01-07, 1 file, -3/+1)

  Now that we generate decent code for (v2i64 (setlt zero, X)) on
  pre-sse4.2 targets, I think we can use this now.

  Differential Revision: https://reviews.llvm.org/D72354

* AMDGPU: Remove VOP3Mods0Clamp0OMod (Matt Arsenault, 2020-01-07, 6 files, -34/+1)

  Now that overridable default operands work, there's no reason to use
  complex patterns to just produce 0s.

* AMDGPU: Fix misleading, misplaced end block comments (Matt Arsenault, 2020-01-07, 1 file, -2/+2)

* AMDGPU: Use ImmLeaf (Matt Arsenault, 2020-01-07, 1 file, -2/+2)

* AMDGPU: Fix not using v_cvt_f16_[iu]16 (Matt Arsenault, 2020-01-07, 2 files, -10/+35)

  We weren't treating i16->f16 casts as legal on targets with these
  instructions, and always using a pair of casts through i32.

* [PowerPC][Triple] Use elfv2 on freebsd>=13 and linux-musl (Fangrui Song, 2020-01-07, 2 files, -6/+0)

  Summary:
  Every powerpc64le platform uses elfv2.

  For powerpc64, the environments "elfv1" and "elfv2" were added for
  FreeBSD ELFv1->ELFv2 migration in D61950. FreeBSD developers have
  decided to use OS versions to select ABI, and no one is relying on
  the environments.

  Also use elfv2 on powerpc64-linux-musl.

  Users can always use -mabi=elfv1 and -mabi=elfv2 to override the
  default ABI.

  Reviewed By: adalava

  Differential Revision: https://reviews.llvm.org/D72352

* [MachineOutliner][AArch64] Save + restore LR in noreturn functions (Jessica Paquette, 2020-01-07, 2 files, -7/+11)

  Conservatively always save + restore LR in noreturn functions.

  These functions do not end in a RET, and so they aren't guaranteed to
  have an instruction which uses LR in any way. So, as a result, you
  can end up in unfortunate situations where you can't backtrace out of
  these functions in a debugger.

  Remove the old noreturn test, and add a new one which is more
  descriptive.

  Remove the restriction that we can't outline from noreturn functions
  as well, since we now do the right thing.

* [X86] Improve lowering of (v2i64 (setgt X, -1)) on pre-SSE4.2 targets. Enable v2i64 in foldVectorXorShiftIntoCmp. (Craig Topper, 2020-01-07, 1 file, -3/+14)

  Similar to D72302, but for the canonical form of the opposite case.
  I've changed foldVectorXorShiftIntoCmp to form a target-independent
  setcc node instead of PCMPGT now, and enabled it for v2i64 on
  pre-SSE4.2 targets. The setcc should eventually get lowered to PCMPGT
  or the new v2i64 sequence.

  Differential Revision: https://reviews.llvm.org/D72318

* [X86] Improve lowering of v2i64 sign bit tests on pre-sse4.2 targets (Craig Topper, 2020-01-07, 1 file, -0/+13)

  Without sse4.2, a v2i64 setlt needs to expand into a pcmpgtd,
  pcmpeqd, 3 shuffles, and 2 logic ops. But if we're only interested in
  the sign bit of the i64 elements, we can just use one pcmpgtd and
  shuffle the odd elements to the even elements.

  Differential Revision: https://reviews.llvm.org/D72302

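  In IR terms, the pattern in question is a sign-bit-only compare
  (sketch; the lowering is paraphrased from the message above):

      ; Only the sign bit of each i64 lane matters, so instead of the
      ; full 64-bit setlt expansion this can lower to a single 32-bit
      ; pcmpgtd against zero plus one shuffle that copies each odd
      ; (high) dword over its even neighbour.
      %c = icmp slt <2 x i64> %x, zeroinitializer
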
* Fix issues reported by -Wrange-loop-analysis when building with latest Clang (trunk). NFC. (Alexandre Ganea, 2020-01-07, 1 file, -1/+1)

  Fixes warning: loop variable 'E' of type 'const llvm::StringRef'
  creates a copy from type 'const llvm::StringRef'
  [-Wrange-loop-analysis]

* [X86] Pull out repeated SrcVT.getVectorNumElements() call. NFCI. (Simon Pilgrim, 2020-01-07, 1 file, -2/+2)

* [AIX][XCOFF] Implement mergeable const (diggerlin, 2020-01-07, 2 files, -3/+2)

  SUMMARY:
  In this patch, we map mergeable const objects to the read-only
  section in the same manner as const objects that are not mergeable.

  Reviewers: hubert.reinterpretcast, jasonliu

  Subscribers: wuzish, nemanjai, hiraditya

  Differential Revision: https://reviews.llvm.org/D71551

* AMDGPU/GlobalISel: Fix readfirstlane pattern import (Matt Arsenault, 2020-01-07, 2 files, -2/+2)

  The imm folding optimization pattern failed to import. The
  instruction pattern was already working, but failing to fail on SGPR
  inputs.

* [InstCombine] try to pull 'not' of select into compare operands (Sanjay Patel, 2020-01-07, 1 file, -0/+17)

  not (select ?, (cmp TPred, ?, ?), (cmp FPred, ?, ?)) -->
  select ?, (cmp TPred', ?, ?), (cmp FPred', ?, ?)

  If both sides of the select are cmps, we can remove an instruction.
  The case where only one side is a cmp is deferred to a possible
  follow-on patch.

  We have a more general 'isFreeToInvert' analysis, but I'm not seeing
  a way to use that more widely without inducing infinite looping
  (opposing transforms). Here, we flip the compare predicates directly,
  so we should not have any danger by creating extra intermediate 'not'
  ops.

  Alive proofs: https://rise4fun.com/Alive/jKa

  Name: both select values are compares - invert predicates
      %tcmp = icmp sle i32 %x, %y
      %fcmp = icmp ugt i32 %z, %w
      %sel = select i1 %cond, i1 %tcmp, i1 %fcmp
      %not = xor i1 %sel, true
    =>
      %tcmp_not = icmp sgt i32 %x, %y
      %fcmp_not = icmp ule i32 %z, %w
      %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not

  Name: false val is compare - invert/not
      %fcmp = icmp ugt i32 %z, %w
      %sel = select i1 %cond, i1 %tcmp, i1 %fcmp
      %not = xor i1 %sel, true
    =>
      %tcmp_not = xor i1 %tcmp, -1
      %fcmp_not = icmp ule i32 %z, %w
      %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not

  Differential Revision: https://reviews.llvm.org/D72007

* AMDGPU/GlobalISel: Fix import of s_abs_i32 pattern (Matt Arsenault, 2020-01-07, 1 file, -1/+1)

* AMDGPU/GlobalISel: Select llvm.amdgcn.wqm.vote (Matt Arsenault, 2020-01-07, 1 file, -2/+2)

* OpaquePtr: print byval types containing anonymous types correctly. (Tim Northover, 2020-01-07, 1 file, -6/+41)

  Attribute::getAsString doesn't have enough information to print
  anonymous Module-level types correctly, so they come back as
  "%type 0xabcd". This results in broken IR when printing as text.

  Instead, print type-attributes (currently just byval) using the
  TypePrinting infrastructure available in AsmWriter. This only applies
  to function argument attributes.

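  For instance (a hypothetical example; %0 is an anonymous module-level
  struct type, which only the AsmWriter's type printer can name):

      %0 = type { i32, i8* }

      ; Previously the byval type printed as something like
      ; "%type 0xabcd"; now it round-trips as valid IR:
      define void @f(%0* byval(%0) %arg) {
        ret void
      }
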
* AMDGPU/GlobalISel: Partially fix llvm.amdgcn.kill pattern import (Matt Arsenault, 2020-01-07, 1 file, -4/+4)

  Tests deferred since the existing DAG test depends on some other
  operations, but isn't far from working as-is.

* [TypePromotion] Use SetVectors instead of PtrSets (Sam Parker, 2020-01-07, 1 file, -40/+30)

  Remove the chance of non-deterministic insertion of zexts of the
  sources by using a SetVector instead of a SmallPtrSet. Do the same
  for sinks for consistency and to prevent the same issue from possibly
  happening there. The SafeWrap instructions are now also stored in a
  SmallVector. The IRPromoter members of these structures have been
  changed to references.

  Differential Revision: https://reviews.llvm.org/D72322

* [DAGCombiner] reduce shuffle of concat of same vector (Sanjay Patel, 2020-01-07, 1 file, -0/+24)

  This is possibly a small part towards solving PR42024:
  https://bugs.llvm.org/show_bug.cgi?id=42024

  The vectorizer is creating shuffles of concat like this:

      %63 = shufflevector <4 x i64> %x, <4 x i64> undef,
            <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
      %64 = shufflevector <8 x i64> %63, <8 x i64> undef,
            <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>

  That might be fixable in the vectorizers, but we're not allowed to
  fold that into a single shuffle in instcombine, so we should have a
  backend backstop to convert that into the likely simpler form:

      %64 = shufflevector <4 x i64> %x, <4 x i64> undef,
            <8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3>

  Differential Revision: https://reviews.llvm.org/D72300

* [ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS (Sjoerd Meijer, 2020-01-07, 2 files, -33/+41)

  This is a recommit of D71330, but with a few things fixed and changed:

  1) ReachingDefAnalysis: this was not running with optnone, as it was
     checking skipFunction(), which other analysis passes don't do. I
     guess this is a copy-paste from a codegen pass.
  2) VPTBlockPass: here I've added skipFunction(), because like
     most/all optimisations, we don't want to run this with optnone.

  This fixes the issues with the initial/previous commit: the
  VPTBlockPass was running with optnone, but ReachingDefAnalysis
  wasn't, and so VPTBlockPass was crashing querying
  ReachingDefAnalysis.

  I've added test case mve-vpt-block-optnone.mir to check that we don't
  run VPTBlock with optnone.

  Differential Revision: https://reviews.llvm.org/D71470

* [X86] Standardize shuffle match/lowering function names. NFC. (Simon Pilgrim, 2020-01-07, 1 file, -38/+39)

  We mainly use lowerShuffle*/matchShuffle* - replace the (few)
  lowerVectorShuffle*/matchVectorShuffle* cases to be consistent.

* [ARM] Improve codegen of volatile load/store of i64 (Victor Campos, 2020-01-07, 6 files, -6/+162)

  Summary:
  Instead of generating two i32 instructions for each load or store of
  a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD.

  These improvements cover architectures implementing ARMv5TE or
  Thumb-2.

  Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

  Reviewed By: efriedma, nickdesaulniers

  Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya,
  llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D70072

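  A minimal sketch of the IR affected (assuming one of the targets
  named above; each access below now selects to a single LDRD/STRD
  instead of a pair of 32-bit LDRs/STRs):

      %v = load volatile i64, i64* %p, align 8
      store volatile i64 %v, i64* %q, align 8
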
* Fix "use of uninitialized variable" static analyzer warning. NFCI. (Simon Pilgrim, 2020-01-07, 1 file, -1/+1)

* Fix "use of uninitialized variable" static analyzer warnings. NFCI. (Simon Pilgrim, 2020-01-07, 1 file, -2/+2)

* Fix "use of uninitialized variable" static analyzer warnings. NFCI. (Simon Pilgrim, 2020-01-07, 1 file, -2/+2)

* [DebugInfo] Fix infinite loop caused by reading past debug_line end (James Henderson, 2020-01-07, 1 file, -2/+17)

  If the claimed unit length of a debug line program is such that the
  line table would finish past the end of the .debug_line section, an
  infinite loop occurs because the data extractor will continue to
  "read" zeroes without changing the offset. This previously didn't
  hit an error because the line table program handles a series of
  zeroes as a bad extended opcode.

  This patch fixes the infinite loop and adds a warning if the program
  doesn't fit in the available data.

  Reviewed by: JDevlieghere

  Differential Revision: https://reviews.llvm.org/D72279

* [NFC] Use isX86() instead of getArch() (Jim Lin, 2020-01-07, 2 files, -7/+4)

  Summary:
  This is a clean-up for https://reviews.llvm.org/D72247.

  Reviewers: MaskRay, craig.topper, jhenderson

  Reviewed By: MaskRay

  Subscribers: hiraditya, rupprecht, cfe-commits, llvm-commits

  Tags: #clang, #llvm

  Differential Revision: https://reviews.llvm.org/D72320

* [APFloat] Fix out-of-scope usage of a pointer to a local variable (Ehud Katz, 2020-01-07, 1 file, -3/+7)

* [APFloat] Fix fusedMultiplyAdd when `this` equals `Addend` (Ehud Katz, 2020-01-07, 1 file, -9/+12)

  Up until now, the arguments to `fusedMultiplyAdd` have been passed by
  reference. We must save the `Addend` value at the beginning of the
  function, before we modify `this`, as the two may be the same
  reference.

  To fix this, we now pass the `addend` parameter of
  `multiplySignificand` by value (instead of by reference), and give it
  a default value of zero.

  Fix PR44051.

  Differential Revision: https://reviews.llvm.org/D70422
