summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [Attributor] AAValueConstantRange: Value range analysis using constant rangeHideto Ueno2020-01-011-7/+499
| | | | | | | | | | | | | | | | | | | | | This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point. One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument). The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty). Currently, AAValueConstantRange is created when AAValueSimplify cannot simplify the value. Supported - BinaryOperator(add, sub, ...) - CmpInst(icmp eq, ...) - !range metadata `AAValueConstantRange` is not intended to extend to polyhedral range value analysis. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D71620
* [X86] Fix typo in getCMovOpcode.Craig Topper2019-12-311-1/+1
| | | | | | The 64-bit HasMemoryOperand line was using CMOV32rm instead of CMOV64rm. Not sure how to test this. We have no test coverage that passes true for HasMemoryOperand.
* [X86] Add X87 FCMOV support to X86FlagsCopyLowering.Craig Topper2019-12-311-0/+73
| | | | Fixes PR44396
* DAG: Stop trying to fold FP -(x-y) -> y-x in getNode with nszMatt Arsenault2019-12-312-5/+10
| | | | | | | | | | | | | | This was increasing the number of instructions when fsub was legalized on AMDGPU with no signed zeros enabled. This fold should be guarded by hasOneUse, and I don't think getNode should be doing that. The same fold is already done as a regular combine through isNegatibleForFree. This does require duplicating, even though isNegatibleForFree does this combine already (and properly checks hasOneUse) to avoid one PPC regression. In the regression, the outer fneg has nsz but the fsub operand does not. isNegatibleForFree only sees the operand, and doesn't see it's used from a nsz context. A nsz parameter needs to be added and threaded through isNegatibleForFree to avoid this.
* [X86] Constant fold KSHIFT of an all zeros vector to just an all zeros vector.Craig Topper2019-12-311-0/+3
|
* [X86][InstCombine] Add constant folding and simplification support for pdep ↵Craig Topper2019-12-311-0/+58
| | | | | | | | | | | | and pext The instructions use a mask to either pack disjoint bits together(pext) or spread bits to disjoint locations(pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise if both the source and mask are constant we can walk the bits in the source/mask and calculate the result. There other crazier things we could do like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted. Fixes PR44389 Differential Revision: https://reviews.llvm.org/D71952
* [X86] Use carry flag from add for (seteq (add X, -1), -1).Craig Topper2019-12-311-10/+31
| | | | | | | | If we just subtracted 1 and are checking if the result is -1. We can use the carry flag from the ADD instead of an explicit CMP. I'm using the same checks for the add users as EmitTest. Fixes one case from PR44412 Differential Revision: https://reviews.llvm.org/D72019
* [LegalizeVectorOps][AArch64] Stop asking for v4f16 fp_round and fp_extend to ↵Craig Topper2019-12-312-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | be promoted. These operations are needed as building blocks for promoting so they can't be promoted themselves. This appeared to work because the fp_extend query type for operation actions is the result type, not the input type so it never triggered in the legalizer. For fp_round, the vector op legalizer just ended up creating a nop fp_extend that was elided by getNode, followed by a nop fp_round that was also elided by getNode. This was followed by a final fp_round from v4f32 back to vf416 which was CSEd to the original node. Then legalize vector ops just believed that node legalized to itself. LegalizeDAG took another crack at promoting it, but didn't have a handler so just skipped it with a debug message saying it wasn't promoted. This patch just removes the operation actions to avoid this non-sense. Found while trying to refactor LegalizeVectorOps to handle multiple result nodes better.
* [amdgpu] Fix scoreboard updating on `s_waitcnt_vscnt`.Michael Liao2019-12-311-1/+1
| | | | | | | | | | Summary: - Other counters are accidentally cleared. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71866
* [InstCombine] fold zext of masked bit set/clearSanjay Patel2019-12-311-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This does not solve PR17101, but it is one of the underlying diffs noted here: https://bugs.llvm.org/show_bug.cgi?id=17101#c8 We could ease the one-use checks for the 'clear' (no 'not' op) half of the transform, but I do not know if that asymmetry would make things better or worse. Proofs: https://rise4fun.com/Alive/uVB Name: masked bit set %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp ne i32 %and, 0 %r = zext i1 %cmp to i32 => %s = lshr i32 %x, %y %r = and i32 %s, 1 Name: masked bit clear %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp eq i32 %and, 0 %r = zext i1 %cmp to i32 => %xn = xor i32 %x, -1 %s = lshr i32 %xn, %y %r = and i32 %s, 1
* Revert "[InstCombine] Fix infinite loop due to bitcast <-> phi transforms"Nikita Popov2019-12-311-6/+0
| | | | | | This reverts commit 27a0795943fee0f30b995fe5165428afc2dfd402. Seems to break test-suite.
* [PowerPC][NFC] Fix clang-tidy warningJinsong Ji2019-12-311-5/+5
| | | | | | | | | | | | | | | | | | Reported by https://results.llvm-merge-guard.org/amd64_debian_testing_clang8-726/clang-tidy.txt /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:11672:10: warning: invalid case style for variable 'isEQ' [readability-identifier-naming] bool isEQ = (MI.getOpcode() == PPC::ANDI_rec_1_EQ_BIT || ^~~~ IsEq /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:11679:14: warning: invalid case style for variable 'dl' [readability-identifier-naming] DebugLoc dl = MI.getDebugLoc(); ^~ Dl
* [InstCombine] Fix infinite loop due to bitcast <-> phi transformsNikita Popov2019-12-311-0/+6
| | | | | | | | | | | | | | Fix for https://bugs.llvm.org/show_bug.cgi?id=44245. The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting against each other, because optimizeBitCastFromPhi() assumes that bitcasts of loads will get folded. This doesn't happen here, because a dangling phi node prevents the one-use fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering. This patch fixes the issue by adding manually removing the old phis. Differential Revision: https://reviews.llvm.org/D71164
* [ARM][TypePromotion] Re-enable by defaultSam Parker2019-12-311-1/+1
| | | | Re-enable the pass after it was reverted and the bug fixed.
* [InstCombine] Don't rewrite phi-of-bitcast when the phi has other usersConnor Abbott2019-12-311-14/+43
| | | | | | | | | Judging by the existing comments, this was the intention, but the transform never actually checked if the existing phi's would be removed. See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where this causes much worse code generation on AMDGPU. Differential Revision: https://reviews.llvm.org/D71209
* [Attributor] Suppress unused warnings when assertions are disabled. NFCIlya Biryukov2019-12-311-0/+2
|
* [Attributor] Function signature rewrite infrastructureJohannes Doerfert2019-12-311-0/+259
| | | | | | | | | | | | | | | As part of the Attributor manifest we want to change the signature of functions. This patch introduces a fairly generic interface to do so. As a first, very simple, use case, we remove unused arguments. A second use case, pointer privatization, will be committed with this patch as well. A lot of the code and ideas are taken from argument promotion and we run all argument promotion tests through this framework as well. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D68765
* [X86] Slightly improve our attempted error recovery for 64-bit -mno-sse2 in ↵Craig Topper2019-12-311-2/+8
| | | | | | | | | | | | | | | LowerCallResult to use FP1 if there are two return values. If the return value is a struct of 2 doubles we need two return registers. If SSE2 is disabled we can't return in XMM registers like the ABI says. After logging an error we attempt to recover by using FP0 instead of an XMM register. But if the return needs two registers, we may have already used FP0. So if the register we were supposed to copy to is XMM1, copy to FP1 in the recovery instead. This seems to fix the assertion/crash in PR44413.
* [NFC] Style cleanupShengchen Kan2019-12-311-28/+29
| | | | | | 1. make function Is16BitMemOperand static 2. Use Doxygen features in comment 3. Rename functions to make them start with a lower case letter
* [Attributor] Propagate known align from arguments to call sites argumentsJohannes Doerfert2019-12-311-0/+12
| | | | | | | Since the information is known we can simply use it at the call site. This is especially useful for callbacks but also helps regular calls. The test changes are mechanical.
* [Attributor] Use abstract call sites to determine associated argumentsJohannes Doerfert2019-12-312-2/+111
| | | | | | | | | | | | | | | | | | | | | | This is the second step after D67871 to make use of abstract call sites. In this patch the argument we associate with a abstract call site argument can be the one in the callback callee instead of the one in the callback broker. Caveat: We cannot allow no-alias arguments for problematic callbacks: As described in [1], adding no-alias (or restrict) to arguments could break synchronization as the synchronization effect, e.g., a barrier, does not "alias" with the pointer anymore. This disables no-alias annotation for potentially problematic arguments until we implement the fix described in [1]. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D68008 [1] Compiler Optimizations for OpenMP, J. Doerfert and H. Finkel, International Workshop on OpenMP 2018, http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
* [Attributor] Annotate the memory behavior of call site argumentsJohannes Doerfert2019-12-311-2/+7
| | | | | | | | | Especially for callbacks, annotating the call site arguments is important. Doing so exposed a too strong dependence of AAMemoryBehavior on AANoCapture since we handle the case of potentially captured pointers explicitly. The changes to the tests are all mechanical.
* [NFC] Make X86MCCodeEmitter::isPCRel32Branch staticShengchen Kan2019-12-311-4/+2
|
* Revert "DebugInfo: Fix rangesBaseAddress DICompileUnit bitcode ↵David Blaikie2019-12-303-4/+3
| | | | | | | | | | | | serialization/deserialization" Seeing some curious CFI failures internally - which makes little sense to me, as I don't think anyone is using this flag (even us, internally)... so sounds like a bug in my code somewhere (possibly a latent one that propagating this flag exposed, not sure). Reverting while I investigate. This reverts commit c51b45e32ef7f35c11891f60871aa9c2c04cd991.
* [NFC] Style cleanupShengchen Kan2019-12-311-389/+479
| | | | | | | 1. Remove function is64BitMode() and use STI.hasFeature(X86::Mode16Bit) directly 2. Use Doxygen features in comment 3. Rename functions to make them start with a lower case letter 4. Format the code with clang-format
* [TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues ↵Craig Topper2019-12-305-25/+25
| | | | | | | | | | | instead of creating a MERGE_VALUES node. NFCI This allows us to clean up some places that were peeking through the MERGE_VALUES node after the call. By returning the SDValues directly, we can clean that up. Unfortunately, there are several call sites in AMDGPU that wanted the MERGE_VALUES and now need to create their own.
* Remove a redundant `default:` on an exhaustive switch(enum).Eric Astor2019-12-301-2/+0
|
* [OpenMP] Use the OpenMPIRBuilder for `omp parallel`Johannes Doerfert2019-12-302-1/+30
| | | | | | | | | | | | This allows to use the OpenMPIRBuilder for parallel regions. Code was extracted from D61953 and adapted to work with the new version (D70109). All but one feature should be supported. An update of this patch will provide test coverage and privatization other than shared. Reviewed By: fghanim Differential Revision: https://reviews.llvm.org/D70290
* [OpenMP] Use the OpenMPIRBuilder for `omp cancel`Johannes Doerfert2019-12-301-30/+79
| | | | | | | | | | | | An `omp cancel parallel` needs to be emitted by the OpenMPIRBuilder if the `parallel` was emitted by the OpenMPIRBuilder. This patch makes this possible. The cancel logic is shared with the cancel barriers. Testing is done via unit tests and the clang cancel_codegen.cpp file once D70290 lands. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D71948
* [X86][AsmParser] re-introduce 'offset' operatorEric Astor2019-12-304-98/+200
| | | | | | | | | | | | | | | | | | | | | | | Summary: Amend MS offset operator implementation, to more closely fit with its MS counterpart: 1. InlineAsm: evaluate non-local source entities to their (address) location 2. Provide a mean with which one may acquire the address of an assembly label via MS syntax, rather than yielding a memory reference (i.e. "offset asm_label" and "$asm_label" should be synonymous 3. address PR32530 Based on http://llvm.org/D37461 Fix broken test where the break appears unrelated. - Set up appropriate memory-input rewrites for variable references. - Intel-dialect assembly printing now correctly handles addresses by adding "offset". - Pass offsets as immediate operands (using "r" constraint for offsets of locals). Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D71436
* AMDGPU/GlobalISel: Select mul24 intrinsicsMatt Arsenault2019-12-302-4/+12
|
* [X86] Add X86ISD::PCMPGT to SimplifyMultipleUseDemandedBitsForTargetNode.Craig Topper2019-12-301-0/+7
| | | | | If only the sign bit is demanded, and the LHS is all zeroes, then we can bypass the PCMPGT.
* AMDGPU/GlobalISel: Re-use MRI available in selectorMatt Arsenault2019-12-301-9/+7
|
* Ignore "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" in favor ↵Fangrui Song2019-12-303-13/+22
| | | | | | | | | | | | | of "frame-pointer" D56351 (included in LLVM 8.0.0) introduced "frame-pointer". All tests which use "no-frame-pointer-elim" or "no-frame-pointer-elim-non-leaf" have been migrated to use "frame-pointer". Implement UpgradeFramePointerAttributes to upgrade the two obsoleted function attributes for bitcode. Their semantics are ignored. Differential Revision: https://reviews.llvm.org/D71863
* [MIPS GlobalISel] Select bitreverse. RecommitPetar Avramovic2019-12-302-1/+50
| | | | | | | | | | | | | | | G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics, clang genrates these intrinsics from __builtin_bitreverse32 and __builtin_bitreverse64. Add lower and narrowscalar for G_BITREVERSE. Lower G_BITREVERSE on MIPS32. Recommit notes: Introduce temporary variables in order to make sure instructions get inserted into MachineFunction in same order regardless of compiler used to build llvm. Differential Revision: https://reviews.llvm.org/D71363
* AMDGPU/GlobalISel: Select llvm.amdgcn.fmad.ftzMatt Arsenault2019-12-302-4/+9
|
* [InstCombine] propagate sign argument through nested copysignsSanjay Patel2019-12-301-0/+10
| | | | | This is another optimization suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153
* [ARM][Thumb][FIX] Add unwinding information to t4Diogo Sampaio2019-12-301-0/+2
| | | | | | | | | | | | | | | | | Summary: Add missing part of patch D71361. Now that the stack-frame can be operated using a addw/subw instruction, they should appear in the unwinding list. Reviewers: dmgreen, efriedma Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72000
* GlobalISel: moreElementsVector for FP min/maxMatt Arsenault2019-12-302-1/+8
|
* AMDGPU: Improve llvm.round.f64 lowering for CI+Matt Arsenault2019-12-302-4/+5
| | | | | The path already used for f16/f32 works a lot better when v_trunc_f64 is available.
* AMDGPU/GlobalISel: Account for G_PHI result bankMatt Arsenault2019-12-301-13/+23
| | | | | | | | | Sometimes the result bank of the phi is already assigned to something, and should not be ignored. This is in preparation for additional boolean phi handling changes. Also refine the logic to fix some cases that were incorrectly deciding to use SGPRs.
* [PowerPC] Legalize rounding nodesNemanja Ivanovic2019-12-302-0/+52
| | | | | | | | VSX provides a full complement of rounding instructions yet we somehow ended up with some of them legal and others not. This just legalizes all of the FP rounding nodes and the FP -> int rounding nodes with unsafe math. Differential revision: https://reviews.llvm.org/D69949
* Revert "[MIPS GlobalISel] Select bitreverse"Dmitri Gribenko2019-12-302-49/+1
| | | | | | This reverts commit dbc136e0fe7e14c64dcb78e72321bb41af60afa4. It broke buildbots: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/21066
* [ARM] Sink splat to ICmpDavid Green2019-12-302-2/+3
| | | | | | | | | This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the register forms of instructions to be selected more often. It does not add FCmp yet as the results look a little odd, trying to keep the register in an float reg and having to move it back to a GPR. Differential Revision: https://reviews.llvm.org/D70997
* [LV][NFC] Keep dominator tree up to date during vectorization.Evgeniy Brevnov2019-12-303-72/+64
|
* [LV][NFC] Some refactoring and renaming to facilitate next change.Evgeniy Brevnov2019-12-301-69/+80
|
* [ARM][THUMB2] Allow emitting T3 types of add and subDiogo Sampaio2019-12-301-42/+33
| | | | | | | | | | | | | | | | | | Summary: This patch allows to emit thumb2 add and sub instructions with 12 bit immediates in the emitT2RegPlusImmediate function. - Splitting parts of the D70680 Reviewers: eli.friedman, olista01, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71361
* [MIPS GlobalISel] Select bitreversePetar Avramovic2019-12-302-1/+49
| | | | | | | | | | G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics, clang genrates these intrinsics from __builtin_bitreverse32 and __builtin_bitreverse64. Add lower and narrowscalar for G_BITREVERSE. Lower G_BITREVERSE on MIPS32. Differential Revision: https://reviews.llvm.org/D71363
* [MIPS GlobalISel] Select bswapPetar Avramovic2019-12-303-0/+72
| | | | | | | | | G_BSWAP is generated from llvm.bswap.<type> intrinsics, clang genrates these intrinsics from __builtin_bswap32 and __builtin_bswap64. Add lower and narrowscalar for G_BSWAP. Lower G_BSWAP on MIPS32, select G_BSWAP on MIPS32 revision 2 and later. Differential Revision: https://reviews.llvm.org/D71362
* [MCP] Add stats for backward copy propagation. NFC.Kai Luo2019-12-301-1/+5
|
OpenPOWER on IntegriCloud