path: root/llvm/lib/CodeGen
Commit message | Author | Age | Files | Lines
...
* [TypePromotion] Use SetVectors instead of PtrSets | Sam Parker | 2020-01-07 | 1 | -40/+30
  Remove the chance of non-deterministic insertion of zexts of the sources by using a SetVector instead of SmallPtrSet. Do the same for sinks for consistency and to prevent the same small issue from occurring there. The SafeWrap instructions are now also stored in a SmallVector. The IRPromoter members of these structures have been changed to references.

  Differential Revision: https://reviews.llvm.org/D72322
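To illustrate why the container choice matters, here is a minimal Python sketch of a SetVector-like container. The `SetVector` class below is a hypothetical stand-in written for this illustration, not LLVM's `llvm::SetVector`. It deduplicates like a set but iterates in insertion order, so anything emitted while walking it comes out in the same order on every run.

```python
class SetVector:
    """Hypothetical stand-in: unique elements, iterated in insertion order."""

    def __init__(self):
        self._seen = set()
        self._order = []

    def insert(self, item):
        """Add item if it is new; return True when it was inserted."""
        if item in self._seen:
            return False
        self._seen.add(item)
        self._order.append(item)
        return True

    def __iter__(self):
        return iter(self._order)

    def __len__(self):
        return len(self._order)


# Illustrative value names only; duplicates are dropped, order is kept.
sources = SetVector()
for name in ["%b", "%a", "%b", "%c"]:
    sources.insert(name)
ordered_sources = list(sources)
```

A pointer-keyed set, by contrast, may iterate in an order that depends on addresses, which is the source of the non-determinism the commit removes.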
* [DAGCombiner] reduce shuffle of concat of same vector | Sanjay Patel | 2020-01-07 | 1 | -0/+24
  This is possibly a small part towards solving PR42024:
  https://bugs.llvm.org/show_bug.cgi?id=42024

  The vectorizer is creating shuffles of concat like this:

      %63 = shufflevector <4 x i64> %x, <4 x i64> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
      %64 = shufflevector <8 x i64> %63, <8 x i64> undef, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>

  That might be fixable in the vectorizers, but we're not allowed to fold that into a single shuffle in instcombine, so we should have a backend backstop to convert that into the likely simpler form:

      %64 = shufflevector <4 x i64> %x, <4 x i64> undef, <8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3>

  Differential Revision: https://reviews.llvm.org/D72300
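The equivalence of the two-shuffle and one-shuffle forms can be checked with a small Python model. This is only a sketch: the `shuffle` helper is a hypothetical model of `shufflevector` semantics (indices select from the first operand followed by the second), with `None` standing in for undef lanes.

```python
def shuffle(lhs, rhs, mask):
    """Model shufflevector: index i selects from lhs followed by rhs."""
    combined = list(lhs) + list(rhs)
    return [combined[i] for i in mask]


x = ["x0", "x1", "x2", "x3"]
undef4 = [None] * 4
undef8 = [None] * 8

# The vectorizer's two-step form: concat x with itself, then interleave.
concat = shuffle(x, undef4, [0, 1, 2, 3, 0, 1, 2, 3])
two_step = shuffle(concat, undef8, [0, 4, 1, 5, 2, 6, 3, 7])

# The single shuffle the combine produces.
one_step = shuffle(x, undef4, [0, 0, 1, 1, 2, 2, 3, 3])
```

Both forms yield each element of `x` duplicated in place, and neither reads an undef lane.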
* [ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS | Sjoerd Meijer | 2020-01-07 | 1 | -2/+0
  This is a recommit of D71330, but with a few things fixed and changed:

  1) ReachingDefAnalysis: this was not running with optnone as it was checking skipFunction(), which other analysis passes don't do. I guess this is a copy-paste from a codegen pass.
  2) VPTBlockPass: here I've added skipFunction(), because like most/all optimisations, we don't want to run this with optnone.

  This fixes the issues with the initial/previous commit: the VPTBlockPass was running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was crashing querying ReachingDefAnalysis.

  I've added test case mve-vpt-block-optnone.mir to check that we don't run VPTBlock with optnone.

  Differential Revision: https://reviews.llvm.org/D71470
* GlobalISel: Implement lower for G_INTRINSIC_ROUND | Matt Arsenault | 2020-01-06 | 1 | -0/+29
  Mostly copied from the AMDGPU lowering implementation, except that it uses G_SITOFP instead of directly creating a select on -1.0, 0.0.
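As a rough illustration of what lowering a round-half-away-from-zero operation involves, here is a Python sketch of the general trunc-plus-offset shape. It is an assumption-laden model, not a transcription of the LLVM code: `lower_round` is a hypothetical name, it handles only finite inputs, and it picks the offset with a conditional rather than modelling G_SITOFP.

```python
import math


def lower_round(x: float) -> float:
    """Round half away from zero using trunc, compare, and copysign.

    Sketch for finite inputs only.
    """
    truncated = float(math.trunc(x))
    abs_diff = abs(x - truncated)
    # Offset is 1.0 when the discarded fraction is at least one half,
    # and it takes the sign of the input.
    offset = math.copysign(1.0 if abs_diff >= 0.5 else 0.0, x)
    return truncated + offset
```

For example, `lower_round(2.5)` gives `3.0` and `lower_round(-2.5)` gives `-3.0`, unlike Python's built-in `round`, which rounds halves to even.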
* Don't rely on 'l'(ell) modifiers to indicate a label reference | Bill Wendling | 2020-01-06 | 1 | -19/+16
  Summary: It's not necessary to use an 'l'(ell) modifier when referencing a label. Treat block addresses and MBB references as if the modifier is used anyway. This prevents us from generating references to fictitious labels.

  Reviewers: jyknight, nickdesaulniers, hfinkel
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71849
* GlobalISel: Correct result type for G_FCMP in lowerFPTOUI | Matt Arsenault | 2020-01-06 | 1 | -1/+3
  Using the final result type doesn't make any sense. Use the natural default boolean type for the select condition.
* GlobalISel: Start adding computeNumSignBits to GISelKnownBits | Matt Arsenault | 2020-01-06 | 1 | -0/+70
* llc/MIR: Fix setFunctionAttributes for MIR functions | Matt Arsenault | 2020-01-06 | 1 | -17/+28
  A random set of attributes are implemented by llc/opt forcing the string attributes on the IR functions before processing anything. This would not happen for MIR functions, which have not yet been created at this point.

  Use a callback in the MIR parser, purely to avoid dealing with the ugliness that the command line flags are in a .inc file, and would require allowing access to these flags from multiple places (either from the MIR parser directly, or a new utility pass to implement these flags). It would probably be better to clean up the flag handling into a separate library.

  This is in preparation for treating more command line flags with a corresponding function attribute in a more uniform way. The fast math flags in particular have a messy system where the command line flag sets the behavior from a function attribute if present, and otherwise the command line flag. This means if any other pass tries to inspect the function attributes directly, it will be inconsistent with the intended behavior. This is also inconsistent with the current behavior of -mcpu and -mattr, which overwrites any pre-existing function attributes. I would like to move this to consistently have the command line flags not overwrite any pre-existing attributes, and to always ensure the command line flags are consistent with the function attributes.
* [LegalizeTypes] Add widening support for STRICT_FSETCC/FSETCCS | Craig Topper | 2020-01-06 | 2 | -0/+86
  This patch adds widening which really just scalarizes because we don't have a strategy for the extra elements we would need to pad with.

  Differential Revision: https://reviews.llvm.org/D72193
* Fix "use of uninitialized variable" static analyzer warning. NFCI. | Simon Pilgrim | 2020-01-06 | 1 | -1/+1
* [DAG] DAGCombiner::XformToShuffleWithZero - use APInt::extractBits helper. NFCI. | Simon Pilgrim | 2020-01-06 | 1 | -8/+4
* [NFC] Fix trivial typos in comments | James Henderson | 2020-01-06 | 4 | -4/+4
  Reviewed By: jhenderson
  Differential Revision: https://reviews.llvm.org/D72143
  Patch by Kazuaki Ishizaki.
* [TargetLowering] Use SETCC input type to call getBooleanContents instead of the setcc result type. | Craig Topper | 2020-01-05 | 1 | -1/+1
  This isn't a functional change since we also check the bit width is the same and the input type is integer. This guarantees the input and output type are the same. But passing the input type makes the code more readable.
* [DAGCombine] Don't check the legality of type when combine the SIGN_EXTEND_INREG | QingShan Zhang | 2020-01-06 | 1 | -2/+3
  This is the DAG node for SIGN_EXTEND_INREG:

      t21: v4i32 = sign_extend_inreg t18, ValueType:ch:v4i16

  It has two operands. The first is the value to extend, and the second is the type that specifies how to extend it. In this example, it sign-extends t18 (v4i32) from v4i16 to v4i32. That matches the semantics of this C code:

      vector int foo(vector int m) {
        return m << 16 >> 16;
      }

  The second operand can be any vector type for which the hardware supports the operation, even though the type 'v4i16' is NOT legal for the target. When we try to combine the srl + sra, we currently call TLI.isOperationLegal(), which also checks the legality of the type. That doesn't make sense.

  Differential Revision: https://reviews.llvm.org/D70230
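The per-lane semantics described above can be modelled in a few lines of Python. This is a sketch under stated assumptions: `sign_extend_inreg` is a hypothetical helper name, a lane is represented as an unsigned integer of `width` bits, and the shift-left-then-arithmetic-shift-right formulation mirrors the `m << 16 >> 16` example.

```python
def sign_extend_inreg(value: int, from_bits: int, width: int = 32) -> int:
    """Sign-extend the low from_bits of a width-bit lane.

    The lane is passed and returned as an unsigned integer encoding.
    """
    mask = (1 << width) - 1
    shift = width - from_bits
    shifted = (value << shift) & mask  # m << 16, truncated to the lane width
    # Reinterpret as signed so that >> behaves as an arithmetic shift.
    if shifted & (1 << (width - 1)):
        shifted -= 1 << width
    return (shifted >> shift) & mask   # >> 16, back to unsigned encoding
```

With the defaults, a lane holding `0x00008000` becomes `0xFFFF8000`, while `0x12347FFF` becomes `0x00007FFF` because bit 15 is clear.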
* [LegalizeVectorOps][X86] Enable expansion of vector fp_to_uint in LegalizeVectorOps to avoid scalarization. | Craig Topper | 2020-01-04 | 1 | -1/+5
  The code here isn't great in all cases. Particularly v4f64->v4i32 on 64-bit AVX targets. But there is some improvement in some configurations.

  There are definitely some issues with computeNumSignBits with X86ISD::STRICT_FCMP, as well as not being able to propagate sign bits through merge_values nodes that get created during custom legalization.
* [TargetLowering] In expandFP_TO_UINT, add proper extend or truncate for the condition to feed the DstVT select. | Craig Topper | 2020-01-04 | 1 | -0/+4
  Previously, for vectors we created a vselect with a condition that didn't match what the target wanted according to getSetCCResultType. To make up for this, X86 had a special DAG combine to detect if the condition was all sign bits and then insert its own truncate or extend. By adding the extend/truncate here explicitly we can avoid that.
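For context on where such a select condition comes from, here is a scalar Python sketch of one common way to expand an unsigned conversion in terms of a signed one. It is an assumption, not a copy of `expandFP_TO_UINT`: the helper names are hypothetical, the width is fixed at 64 bits, and inputs are assumed to be non-negative and below 2**64.

```python
SIGN_BIT = 1 << 63
MASK64 = (1 << 64) - 1


def fp_to_sint64(x: float) -> int:
    """Model a signed conversion; only valid for -2**63 <= x < 2**63."""
    result = int(x)
    assert -SIGN_BIT <= result < SIGN_BIT
    return result & MASK64


def fp_to_uint64(x: float) -> int:
    """Unsigned conversion built from the signed one plus a select."""
    limit = float(SIGN_BIT)
    in_signed_range = x < limit  # this comparison is the select condition
    adjusted = x if in_signed_range else x - limit
    flip = 0 if in_signed_range else SIGN_BIT
    return fp_to_sint64(adjusted) ^ flip
```

In a vector setting the comparison result and the selected values can have different element widths, which is why the condition may need an explicit extend or truncate.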
* [LegalizeVectorOps] Split most of ExpandStrictFPOp into a separate UnrollStrictFPOp method. Call that method from ExpandUINT_TO_FLOAT. | Craig Topper | 2020-01-04 | 1 | -6/+13
  ExpandStrictFPOp calls ExpandUINT_TO_FLOAT. Previously, ExpandUINT_TO_FLOAT returned SDValue() if it wasn't able to handle the node and needed to unroll. Then ExpandStrictFPOp would detect this SDValue() and do the unroll.

  After this change, ExpandUINT_TO_FLOAT will directly call UnrollStrictFPOp and return the unrolled result.
* GlobalISel: Scalarize all division operations | Matt Arsenault | 2020-01-04 | 1 | -0/+3
  This only handled G_SDIV, but they all are trivially scalarizable. Also define placeholder AMDGPU division legalizer rules.
* Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)." | Florian Hahn | 2020-01-04 | 1 | -1/+1
  This reverts commit 51ef53f3bd23559203fe9af82ff2facbfedc1db3, as it breaks some bots.
* [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). | Florian Hahn | 2020-01-04 | 1 | -1/+1
  SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander.

  Reviewers: sanjoy.google, efriedma, reames
  Reviewed By: sanjoy.google
  Differential Revision: https://reviews.llvm.org/D71537
* GlobalISel: Define G_READCYCLECOUNTER | Matt Arsenault | 2020-01-04 | 1 | -0/+2
* [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT (REAPPLIED) | Simon Pilgrim | 2020-01-04 | 1 | -0/+11
  This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. In particular this helps remove some unnecessary scalar->vector->scalar patterns.

  The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen is to be refactored in the near term, so this isn't considered a major issue.

  Reapplied after reversion at rL368660 due to PR42982 which was fixed at rGca7fdd41bda0.

  Differential Revision: https://reviews.llvm.org/D65887
* GlobalISel: Add type argument to getRegBankFromRegClass | Matt Arsenault | 2020-01-03 | 1 | -7/+13
  AMDGPU can't unambiguously go back from the selected instruction register class to the register bank without knowing if this was used in a boolean context.
* [DAGCombiner] fix miscompile in translating (X & undef) to shuffle | Sanjay Patel | 2020-01-03 | 1 | -1/+3
  See PR42982 for more context:
  https://bugs.llvm.org/show_bug.cgi?id=42982
* [LegalizeVectorOps] Pass the post-UpdateNodeOperands version of Op to ExpandLoad/ExpandStore | Craig Topper | 2020-01-03 | 1 | -11/+14
  UpdateNodeOperands might CSE to another existing node. So we should make sure we're legalizing that node otherwise we might fail to hook up the operands properly. I've moved the result registration up to the caller to avoid having to pass both Result and Op into the functions where it might be confusing which is which.

  This addresses 2 other issues pointed out in D71861.

  Differential Revision: https://reviews.llvm.org/D72021
* Move tail call disabling code to target independent code | Reid Kleckner | 2020-01-03 | 4 | -8/+28
  When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD).

  There's no major functionality change, except for targets that never implemented this check.

  This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015).

  Reviewers: echristo, MaskRay
  Differential Revision: https://reviews.llvm.org/D72118
* [DAGCombiner][X86][AArch64] Generalize `A-(A&B)`->`A&(~B)` fold (PR44448) | Roman Lebedev | 2020-01-03 | 1 | -20/+9
  The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should/can just be 'A - (A & B)' -> 'A & (~B)'.

  Even if we don't manage to fold `~` into B, we have likely formed `ANDN` node. Also, this way there's less similar-but-duplicate folds.

      Name: X - (X & Y) -> X & (~Y)
        %o = and i32 %X, %Y
        %r = sub i32 %X, %o
      =>
        %n = xor i32 %Y, -1
        %r = and i32 %X, %n

  https://rise4fun.com/Alive/kOUl

  See https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
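The identity behind this fold can be checked exhaustively at a small bit width. The Python sketch below uses 8-bit wraparound arithmetic as a stand-in for the i32 case; the function names are illustrative only.

```python
MASK = 0xFF  # model 8-bit wraparound arithmetic


def sub_of_and(a: int, b: int) -> int:
    """A - (A & B), truncated to 8 bits."""
    return (a - (a & b)) & MASK


def and_of_not(a: int, b: int) -> int:
    """A & (~B), truncated to 8 bits."""
    return a & (~b & MASK)


# Collect every input pair where the two forms disagree.
mismatches = [
    (a, b)
    for a in range(256)
    for b in range(256)
    if sub_of_and(a, b) != and_of_not(a, b)
]
```

The intuition is that `A & B` only ever contains bits already set in `A`, so subtracting it never borrows and simply clears those bits.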
* [DAGCombiner] `~(add X, -1)` -> `neg X` fold | Roman Lebedev | 2020-01-03 | 1 | -0/+7
  The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should just be 'A - (A & B)' -> 'A & (~B)', but we currently fail to sink that '~' into `(B - 1)`.

      Name: ~(X - 1) -> (0 - X)
        %o = add i32 %X, -1
        %r = xor i32 %o, -1
      =>
        %r = sub i32 0, %X

  https://rise4fun.com/Alive/rjU
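This is the two's-complement identity `-X == ~X + 1` rearranged. A brute-force Python check at 8 bits (a stand-in for i32, with illustrative function names) looks like this:

```python
MASK = 0xFF  # model 8-bit wraparound arithmetic


def not_of_decrement(x: int) -> int:
    """~(X - 1), truncated to 8 bits."""
    return ~(x - 1) & MASK


def negate(x: int) -> int:
    """0 - X, truncated to 8 bits."""
    return (0 - x) & MASK


# Collect every input where the two forms disagree.
neg_mismatches = [x for x in range(256) if not_of_decrement(x) != negate(x)]
```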
* [DAGCombine][X86][Thumb2/LowOverheadLoops] `A - (A & C)` -> `A & (~C)` fold (PR44448) | Roman Lebedev | 2020-01-03 | 1 | -0/+10
  While we do manage to fold integer-typed IR in middle-end, we can't do that for the main motivational case of pointers.

  There is @llvm.ptrmask() intrinsic which may or may not be helpful, but I'm not sure it is fully considered canonical yet, and likely not everything is fully aware of it.

      Name: PR44448 ptr - (ptr & C) -> ptr & (~C)
        %bias = and i32 %ptr, C
        %r = sub i32 %ptr, %bias
      =>
        %r = and i32 %ptr, ~C

  See https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
* [NFC][DAGCombine] Clarify comment for 'A - (A & (B - 1))' fold | Roman Lebedev | 2020-01-03 | 1 | -1/+1
* Fix typo "psuedo" in comments | Jay Foad | 2020-01-03 | 1 | -1/+1
* [DAGCombine][X86][AArch64] 'A - (A & (B - 1))' -> 'A & (0 - B)' fold (PR44448) | Roman Lebedev | 2020-01-03 | 1 | -0/+15
  While we do manage to fold integer-typed IR in middle-end, we can't do that for the main motivational case of pointers.

  There is @llvm.ptrmask() intrinsic which may or may not be helpful, but I'm not sure it is fully considered canonical yet, and likely not everything is fully aware of it.

  https://rise4fun.com/Alive/ZVdp

      Name: ptr - (ptr & (alignment-1)) -> ptr & (0 - alignment)
        %mask = add i64 %alignment, -1
        %bias = and i64 %ptr, %mask
        %r = sub i64 %ptr, %bias
      =>
        %highbitmask = sub i64 0, %alignment
        %r = and i64 %ptr, %highbitmask

  See https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
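The motivating case is the familiar align-down idiom. As a sketch, the two forms can be compared in Python with 64-bit wraparound; `align_down_sub` and `align_down_and` are hypothetical names for the before and after patterns.

```python
MASK64 = (1 << 64) - 1  # model i64 wraparound arithmetic


def align_down_sub(ptr: int, alignment: int) -> int:
    """ptr - (ptr & (alignment - 1)), the pattern before the fold."""
    bias = ptr & ((alignment - 1) & MASK64)
    return (ptr - bias) & MASK64


def align_down_and(ptr: int, alignment: int) -> int:
    """ptr & (0 - alignment), the pattern after the fold."""
    high_bit_mask = (0 - alignment) & MASK64
    return ptr & high_bit_mask
```

For a power-of-two alignment such as 16, both forms map `0x1234` to `0x1230`. The two expressions also agree for other alignment values, since `0 - B` equals `~(B - 1)` in two's complement.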
* [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal' | QingShan Zhang | 2020-01-03 | 1 | -0/+1
  Currently we don't set the default operation action for SIGN_EXTEND_INREG on vector types, so it is 0 by default, which means legal. However, most targets don't have native instructions to support this opcode. It should be set to expand by default, as we did for ANY_EXTEND_VECTOR_INREG.

  Differential Revision: https://reviews.llvm.org/D70000
* DAG: Use TargetConstant for FENCE operands | Matt Arsenault | 2020-01-02 | 1 | -4/+4
* [SelectionDAG] Simplify SelectionDAGBuilder::visitInlineAsm | Fangrui Song | 2020-01-02 | 1 | -3/+1
* [FPEnv] Default NoFPExcept SDNodeFlag to false | Ulrich Weigand | 2020-01-02 | 5 | -10/+49
  The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all other such flags. This is a problem, because it implies that all code that transforms SDNodes without copying flags can introduce a correctness bug, not just a missed optimization.

  This patch changes the default to false. This makes it necessary to move setting the (No)FPExcept flag for constrained intrinsics from the visitConstrainedIntrinsic routine to the generic visit routine at the place where the other flags are set, or else the intersectFlagsWith call would erase the NoFPExcept flag again.

  In order to avoid making non-strict FP code worse, whenever SelectionDAGISel::SelectCodeCommon matches on a set of original nodes none of which can raise FP exceptions, it will preserve this property on all result nodes generated, by setting the NoFPExcept flag on those result nodes that would otherwise be considered as raising an FP exception.

  To check whether or not an SD node should be considered as raising an FP exception, the following logic applies:

  - For machine nodes, check the mayRaiseFPException property of the underlying MI instruction
  - For regular nodes, check isStrictFPOpcode
  - For target nodes, check a newly introduced isTargetStrictFPOpcode

  The latter is implemented by reserving a range of target opcodes, similarly to how memory opcodes are identified. (Note that there is a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, right now all target memory nodes are automatically also considered strict FP nodes -- this could be fixed by adding one more range.)

  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D71841
* [NFC] Add explicit instantiation to releaseNode | Qiu Chaofan | 2020-01-02 | 1 | -0/+5
  Resolve a build failure about undefined symbols introduced by f9f78cf.

  Differential Revision: https://reviews.llvm.org/D72069
* [RegisterClassInfo] Use SmallVector::assign instead of resize to make sure we erase previous contents from all entries of the vector. | Craig Topper | 2020-01-01 | 1 | -1/+1
  resize only writes to elements that get added. Any elements that already existed maintain their previous value. In this case we're trying to erase cached information so we should use assign which will write to every element.

  Found while trying to add new tests to an existing X86 test and noticed register allocation changing in other functions.
* [MachineScheduler] improve reuse of 'releaseNode' method | Lorenzo Casalino | 2020-01-01 | 1 | -17/+21
  The 'SchedBoundary::releaseNode' is merely invoked for releasing the Top/Bottom root nodes. However, 'SchedBoundary::releasePending' uses its same logic to check if the Pending queue has any releasable SUnit. It is possible to slightly modify the body of the two, allowing re-use of the former ('releaseNode') in the latter.

  Patch by Lorenzo Casalino <lorenzo.casalino93@gmail.com>

  Reviewers: MatzeB, fhahn, atrick
  Reviewed By: fhahn
  Differential Revision: https://reviews.llvm.org/D65506
* [NFC] Fixes -Wrange-loop-analysis warnings | Mark de Wever | 2020-01-01 | 5 | -10/+10
  This avoids new warnings now that D68912 adds -Wrange-loop-analysis to -Wall.

  Differential Revision: https://reviews.llvm.org/D71857
* DAG: Stop trying to fold FP -(x-y) -> y-x in getNode with nsz | Matt Arsenault | 2019-12-31 | 2 | -5/+10
  This was increasing the number of instructions when fsub was legalized on AMDGPU with no signed zeros enabled. This fold should be guarded by hasOneUse, and I don't think getNode should be doing that. The same fold is already done as a regular combine through isNegatibleForFree.

  This does require duplicating, even though isNegatibleForFree does this combine already (and properly checks hasOneUse) to avoid one PPC regression. In the regression, the outer fneg has nsz but the fsub operand does not. isNegatibleForFree only sees the operand, and doesn't see that it's used from an nsz context. An nsz parameter needs to be added and threaded through isNegatibleForFree to avoid this.
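The reason this fold needs the no-signed-zeros (nsz) flag at all can be seen with ordinary IEEE-754 doubles. The following Python sketch (function names are illustrative) shows that the two forms compare equal but can differ in the sign of a zero result:

```python
import math


def neg_of_sub(x: float, y: float) -> float:
    """-(x - y), the form before the fold."""
    return -(x - y)


def swapped_sub(x: float, y: float) -> float:
    """y - x, the form after the fold."""
    return y - x


before = neg_of_sub(1.0, 1.0)   # negating +0.0 yields -0.0
after = swapped_sub(1.0, 1.0)   # 1.0 - 1.0 yields +0.0

same_value = before == after
same_sign = math.copysign(1.0, before) == math.copysign(1.0, after)
```

Here `same_value` is true while `same_sign` is false, so the rewrite is only valid where the sign of zero is allowed to change.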
* [LegalizeVectorOps][AArch64] Stop asking for v4f16 fp_round and fp_extend to be promoted. | Craig Topper | 2019-12-31 | 1 | -0/+5
  These operations are needed as building blocks for promoting so they can't be promoted themselves.

  This appeared to work because the fp_extend query type for operation actions is the result type, not the input type so it never triggered in the legalizer.

  For fp_round, the vector op legalizer just ended up creating a nop fp_extend that was elided by getNode, followed by a nop fp_round that was also elided by getNode. This was followed by a final fp_round from v4f32 back to v4f16 which was CSEd to the original node. Then legalize vector ops just believed that node legalized to itself. LegalizeDAG took another crack at promoting it, but didn't have a handler so just skipped it with a debug message saying it wasn't promoted.

  This patch just removes the operation actions to avoid this nonsense.

  Found while trying to refactor LegalizeVectorOps to handle multiple result nodes better.
* [ARM][TypePromotion] Re-enable by default | Sam Parker | 2019-12-31 | 1 | -1/+1
  Re-enable the pass after it was reverted and the bug fixed.
* [TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues instead of creating a MERGE_VALUES node. NFCI | Craig Topper | 2019-12-30 | 2 | -16/+6
  This allows us to clean up some places that were peeking through the MERGE_VALUES node after the call. By returning the SDValues directly, we can clean that up.

  Unfortunately, there are several call sites in AMDGPU that wanted the MERGE_VALUES and now need to create their own.
* Ignore "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" in favor of "frame-pointer" | Fangrui Song | 2019-12-30 | 1 | -13/+1
  D56351 (included in LLVM 8.0.0) introduced "frame-pointer". All tests which use "no-frame-pointer-elim" or "no-frame-pointer-elim-non-leaf" have been migrated to use "frame-pointer".

  Implement UpgradeFramePointerAttributes to upgrade the two obsoleted function attributes for bitcode. Their semantics are ignored.

  Differential Revision: https://reviews.llvm.org/D71863
* [MIPS GlobalISel] Select bitreverse. Recommit | Petar Avramovic | 2019-12-30 | 1 | -1/+46
  G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics, clang generates these intrinsics from __builtin_bitreverse32 and __builtin_bitreverse64. Add lower and narrowscalar for G_BITREVERSE. Lower G_BITREVERSE on MIPS32.

  Recommit notes: Introduce temporary variables in order to make sure instructions get inserted into MachineFunction in same order regardless of compiler used to build llvm.

  Differential Revision: https://reviews.llvm.org/D71363
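For readers unfamiliar with how a bit reversal can be lowered without a dedicated instruction, here is a Python sketch of one common shift-and-mask scheme for 32-bit values: swap the bytes, then swap nibbles, bit pairs, and single bits. This is an assumption about the general technique, not a transcription of the MIPS or generic GlobalISel lowering, and the helper names are hypothetical.

```python
def bswap32(v: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return (
        ((v & 0x000000FF) << 24)
        | ((v & 0x0000FF00) << 8)
        | ((v >> 8) & 0x0000FF00)
        | ((v >> 24) & 0x000000FF)
    )


def swap_adjacent(v: int, low_mask: int, shift: int) -> int:
    """Swap each pair of adjacent shift-bit groups selected by low_mask."""
    return ((v >> shift) & low_mask) | ((v & low_mask) << shift)


def bitreverse32(v: int) -> int:
    """Reverse all 32 bits: bytes first, then progressively smaller groups."""
    v = bswap32(v & 0xFFFFFFFF)
    v = swap_adjacent(v, 0x0F0F0F0F, 4)  # nibbles within each byte
    v = swap_adjacent(v, 0x33333333, 2)  # bit pairs within each nibble
    v = swap_adjacent(v, 0x55555555, 1)  # bits within each pair
    return v
```

The `bswap32` helper is the same kind of operation that G_BSWAP represents, which is why the two are often lowered together.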
* GlobalISel: moreElementsVector for FP min/max | Matt Arsenault | 2019-12-30 | 1 | -1/+7
* Revert "[MIPS GlobalISel] Select bitreverse" | Dmitri Gribenko | 2019-12-30 | 1 | -45/+1
  This reverts commit dbc136e0fe7e14c64dcb78e72321bb41af60afa4. It broke buildbots: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/21066
* [MIPS GlobalISel] Select bitreverse | Petar Avramovic | 2019-12-30 | 1 | -1/+45
  G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics, clang generates these intrinsics from __builtin_bitreverse32 and __builtin_bitreverse64. Add lower and narrowscalar for G_BITREVERSE. Lower G_BITREVERSE on MIPS32.

  Differential Revision: https://reviews.llvm.org/D71363
* [MIPS GlobalISel] Select bswap | Petar Avramovic | 2019-12-30 | 1 | -0/+58
  G_BSWAP is generated from llvm.bswap.<type> intrinsics, clang generates these intrinsics from __builtin_bswap32 and __builtin_bswap64. Add lower and narrowscalar for G_BSWAP. Lower G_BSWAP on MIPS32, select G_BSWAP on MIPS32 revision 2 and later.

  Differential Revision: https://reviews.llvm.org/D71362