summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [InstCombine] fix typo in comment; NFCSanjay Patel2019-06-201-1/+1
| | | | llvm-svn: 363974
* [clang][NewPM] Do not eliminate available_externally durng `-O2 -flto` runsLeonard Chan2019-06-201-2/+4
| | | | | | | | | | | | This fixes CodeGen/available-externally-suppress.c when the new pass manager is turned on by default. available_externally was not emitted during -O2 -flto runs when it should still be retained for link time inlining purposes. This can be fixed by checking that we aren't LTOPrelinking when adding the EliminateAvailableExternallyPass. Differential Revision: https://reviews.llvm.org/D63580 llvm-svn: 363971
* [LFTR] Fix a (latent?) bug related to nested loopsPhilip Reames2019-06-201-0/+6
| | | | | | | | I can't actually come up with a test case this triggers on without an out of tree change, but in theory, it's a bug in the recently added multiple exit LFTR support. The root issue is that an exiting block common to two loops can (in theory) have computable exit counts for both loops. Rewriting the exit of an inner loop in terms of the outer loops IV would cause the inner loop to either a) run forever, or b) terminate on the first iteration. In practice, we appear to get lucky and not have the exit count computable for the outer loop, except when it's trivially zero. Given we bail on zero exit counts, we don't appear to ever trigger this. But I can't come up with a reason we *can't* compute an exit count for the outer loop on the common exiting block, so this may very well be triggering in some cases. llvm-svn: 363964
* [X86] Add BLSI to isUseDefConvertible.Craig Topper2019-06-201-0/+4
| | | | | | | | | | | | | | | | | | | | | Summary: BLSI sets the C flag is the input is not zero. So if its followed by a TEST of the input where only the Z flag is consumed, we can replace it with the opposite check of the C flag. We should be able to do the same for BLSMSK and BLSR, but the naive test case for those is being optimized to a subo by CodeGenPrepare. Reviewers: spatel, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63589 llvm-svn: 363957
* [InstCombine] canonicalize check for power-of-2Sanjay Patel2019-06-201-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The form that compares against 0 is better because: 1. It removes a use of the input value. 2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2 3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS). This is a root cause for PR42314, but probably doesn't completely answer the codegen request: https://bugs.llvm.org/show_bug.cgi?id=42314 Alive proof: https://rise4fun.com/Alive/9kG Name: is power-of-2 %neg = sub i32 0, %x %a = and i32 %neg, %x %r = icmp eq i32 %a, %x => %dec = add i32 %x, -1 %a2 = and i32 %dec, %x %r = icmp eq i32 %a2, 0 Name: is not power-of-2 %neg = sub i32 0, %x %a = and i32 %neg, %x %r = icmp ne i32 %a, %x => %dec = add i32 %x, -1 %a2 = and i32 %dec, %x %r = icmp ne i32 %a2, 0 llvm-svn: 363956
* [DAGCombiner] Use getAPIntValue() instead of getZExtValue() where possible.Simon Pilgrim2019-06-201-21/+20
| | | | | | Better handling of out-of-i64-range values due to large integer types or from fuzz tests. llvm-svn: 363955
* [DAGCombiner][NFC] Remove unused varJordan Rupprecht2019-06-201-1/+0
| | | | llvm-svn: 363954
* Store a pointer to the return value in a static alloca and let the debugger ↵Amy Huang2019-06-201-2/+12
| | | | | | | | | | | | | | use that as the variable address for NRVO variables. Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D63361 llvm-svn: 363952
* [InstCombine] cttz(-x) -> cttz(x)David Bolvansky2019-06-201-0/+6
| | | | | | | | | | | | | | | | Summary: Signedness does not change number of trailing zeros. Reviewers: spatel, lebedev.ri, nikic Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63534 llvm-svn: 363951
* AMDGPU: Fix ignoring DisableFramePointerElim in leaf functionsMatt Arsenault2019-06-201-11/+7
| | | | | | | | The attribute can specify elimination for leaf or non-leaf, so it should always be considered. I copied this bug from AArch64, which probably should also be fixed. llvm-svn: 363949
* [CodeGen] Fix formatting and comments (NFC)Evandro Menezes2019-06-201-6/+8
| | | | llvm-svn: 363947
* AMDGPU: Treat undef as an inline immediateMatt Arsenault2019-06-202-5/+19
| | | | | | | This should only matter in vectors with an undef component, since a full undef vector would have been folded out. llvm-svn: 363941
* [ARM] Add a batch of MVE integer instructions.Simon Tatham2019-06-203-1/+406
| | | | | | | | | | | | | | | | This includes integer arithmetic of various kinds (add/sub/multiply, saturating and not), and the immediate forms of VMOV and VMVN that load an immediate into all lanes of a vector. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62674 llvm-svn: 363936
* [AMDGPU] gfx1010 core wave32 changesStanislav Mekhanoshin2019-06-2010-40/+56
| | | | | | Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934
* [DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), ↵Simon Pilgrim2019-06-201-17/+19
| | | | | | | | C)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363929
* [DAGCombine] Add TODOs for some combines that should support non-uniform vectorsSimon Pilgrim2019-06-201-0/+15
| | | | | | We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc. llvm-svn: 363924
* [X86] LowerAVXExtend - handle ANY_EXTEND_VECTOR_INREG lowering as well.Simon Pilgrim2019-06-201-6/+10
| | | | llvm-svn: 363922
* [DAGCombine] Reduce scope of ShAmtVal variable. NFCI.Simon Pilgrim2019-06-201-2/+1
| | | | | | | | Fixes cppcheck warning. Use the more capable getAPIntVal() instead of getZExtValue() as well since I'm here. llvm-svn: 363921
* [MIPS GlobalISel] Select integer to floating point conversionsPetar Avramovic2019-06-203-2/+23
| | | | | | | | Select G_SITOFP and G_UITOFP for MIPS32. Differential Revision: https://reviews.llvm.org/D63542 llvm-svn: 363912
* [MIPS GlobalISel] Select floating point to integer conversionsPetar Avramovic2019-06-204-2/+53
| | | | | | | | Select G_FPTOSI and G_FPTOUI for MIPS32. Differential Revision: https://reviews.llvm.org/D63541 llvm-svn: 363911
* [X86] Remove memory instructions form isUseDefConvertible.Craig Topper2019-06-201-15/+15
| | | | | | | | The caller of this is looking for comparisons of the input to these instructions with 0. But the memory instructions input is an addess not a value input in a register. llvm-svn: 363907
* [X86] Add v64i8/v32i16 to several places in X86CallingConv.td where they ↵Craig Topper2019-06-201-3/+4
| | | | | | seemed obviously missing. llvm-svn: 363906
* AMDGPU: Don't clobber VCC in MUBUF addr64 emulationMatt Arsenault2019-06-201-9/+16
| | | | | | | | | Introducing VCC defs during SIFixSGPRCopies is generally problematic. Avoid it by starting with the VOP3 form with the general condition register. This is the easiest to fix instance, but doesn't solve any specific problems I'm looking at. llvm-svn: 363904
* [llvm-objdump] Switch between ARM/Thumb based on mapping symbols.Eli Friedman2019-06-201-29/+28
| | | | | | | | | | | | | | | The ARMDisassembler changes allow changing between ARM and Thumb mode based on the MCSubtargetInfo, rather than the Target, which simplifies the other changes a bit. I'm not really happy with adding more target-specific logic to tools/llvm-objdump/, but there isn't any easy way around it: the logic in question specifically applies to disassembling an object file, and that code simply isn't located in lib/Target, at least at the moment. Differential Revision: https://reviews.llvm.org/D60927 llvm-svn: 363903
* AMDGPU: Consolidate some getGeneration checksMatt Arsenault2019-06-199-31/+82
| | | | | | | | This is incomplete, and ideally these would all be removed, but it's better to localize them to the subtarget first with comments about what they're for. llvm-svn: 363902
* [FileCheck] Stop qualifying expressions as numericThomas Preud'homme2019-06-191-40/+39
| | | | | | | | | | | | | | | | | Summary: Stop referring to "numeric expression", using simply the term "expression" instead. Likewise for numeric operation since operations are only used in numeric expressions. Reviewers: jhenderson, jdenny, probinson, arichardson Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63500 llvm-svn: 363901
* FileCheck: Return parse error w/ Error & ExpectedThomas Preud'homme2019-06-191-201/+214
| | | | | | | | | | | | | | | | Summary: Make use of Error and Expected to bubble up diagnostics and force checking of errors in the callers. Reviewers: jhenderson, jdenny, probinson, arichardson Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63125 llvm-svn: 363900
* AMDGPU: Undo sub x, c canonicalization for v2i16Matt Arsenault2019-06-193-26/+87
| | | | | | Should avoid regression from D62341 llvm-svn: 363899
* [DAGCombine] Use ConstantSDNode::getAPIntValue() instead of getZExtValue().Simon Pilgrim2019-06-191-2/+2
| | | | | | Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something..... llvm-svn: 363887
* [mips] Mark the `lwupc` instruction as MIPS64 R6 onlySimon Atanasyan2019-06-192-3/+3
| | | | | | | | | | The "The MIPS64 Instruction Set Reference Manual" [1] states that the `lwupc` is MIPS64 Release 6 only. It should not be supported for 32-bit CPUs. [1] https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00087-2B-MIPS64BIS-AFP-6.06.pdf llvm-svn: 363886
* [mips] Add (GPR|PTR)_64 predicates to PseudoReturn64 and ↵Simon Atanasyan2019-06-191-2/+2
| | | | | | | | | PseudoIndirectHazardBranch64 This patch is one of a series of patches. The goal is to make P5600 scheduler model complete and turn on the `CompleteModel` flag. llvm-svn: 363885
* LFTR for multiple exit loopsPhilip Reames2019-06-191-17/+16
| | | | | | | | | | | | | | Teach IndVarSimply's LinearFunctionTestReplace transform to handle multiple exit loops. LFTR does two key things 1) it rewrites (all) exit tests in terms of a common IV potentially eliminating one in the process and 2) it moves any offset/indexing/f(i) style logic out of the loop. This turns out to actually be pretty easy to implement. SCEV already has all the information we need to know what the backedge taken count is for each individual exit. (We use that when computing the BE taken count for the loop as a whole.) We basically just need to iterate through the exiting blocks and apply the existing logic with the exit specific BE taken count. (The previously landed NFC makes this super obvious.) I chose to go ahead and apply this to all loop exits instead of only latch exits as originally proposed. After reviewing other passes, the only case I could find where LFTR form was harmful was LoopPredication. I've fixed the latch case, and guards aren't LFTRed anyways. We'll have some more work to do on the way towards widenable_conditions, but that's easily deferred. I do want to note that I added one bit after the review. When running tests, I saw a new failure (no idea why didn't see previously) which pointed out LFTR can rewrite a constant condition back to a loop varying one. This was theoretically possible with a single exit, but the zero case covered it in practice. With multiple exits, we saw this happening in practice for the eliminate-comparison.ll test case because we'd compute a ExitCount for one of the exits which was guaranteed to never actually be reached. Since LFTR ran after simplifyAndExtend, we'd immediately turn around and undo the simplication work we'd just done. The solution seemed obvious, so I didn't bother with another round of review. Differential Revision: https://reviews.llvm.org/D62625 llvm-svn: 363883
* [MemorySSA] Cleanup trivial phis.Alina Sbirlea2019-06-191-5/+8
| | | | | | | | | | | | | | | | | | | Summary: This is unfortunately needed for correctness, if we are to extend the tolerance of the update API to the way simple loop unswitch is doing cloning. In simple loop unswitch (as opposed to loop unswitch), not all blocks are cloned. This can create unreachable cloned blocks (no predecessor), which are later cleaned up. In MemorySSA, the APIs for supporting these kind of updates (clone + update exit blocks), make certain assumption on the integrity of the CFG. When cloning, if something was not cloned, it's values in MemorySSA default to LiveOnEntry. When updating exit blocks, it is safe to assume that we can first insert phis in the blocks merging two clones, then add additional phis in the IDF of the blocks that received phis. This no longer holds true if one of the clones being merged comes from an unreachable block. We'd conservatively need to add all phis before filling in their incoming definitions. In practice this restriction can be relaxed if we clean up trivial phis after the first round of insertion. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63354 llvm-svn: 363880
* [MemorySSA] Use GraphDiff info when computing IDF.Alina Sbirlea2019-06-191-1/+1
| | | | | | | | | | | | | | | | Summary: When computing IDF for insert updates, ensure we use the snapshot CFG offered by GraphDiff. Caught by D63389. Reviewers: kuhar, george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits, Szelethus Tags: #llvm Differential Revision: https://reviews.llvm.org/D63443 llvm-svn: 363879
* [LFTR] Stylistic cleanup as suggested in last review comment of D62939 [NFC]Philip Reames2019-06-191-9/+9
| | | | | | (Resumbit of r363292 which was reverted along w/an earlier patch) llvm-svn: 363877
* AMDGPU: Fix folding immediate into readfirstlane through reg_sequenceMatt Arsenault2019-06-192-2/+6
| | | | | | | | | | | | | The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conservative and not necessary. It's not actually important if DefMI really defined the register, because the fold that will be done cares about the def of the value that will be folded. For some reason copies aren't making it through the reg_sequence, although they should. llvm-svn: 363876
* [LFTR] Rename variable to minimize confusion [NFC]Philip Reames2019-06-191-15/+15
| | | | | | | | (Recommit of r363293 which was reverted when a dependent patch was.) As pointed out by Nikita in D62625, BackedgeTakenCount is generally used to refer to the backedge taken count of the loop. A conditional backedge taken count - one which only applies if a particular exit is taken - is called a ExitCount in SCEV code, so be consistent here. llvm-svn: 363875
* hwasan: Shrink outlined checks by 1 instruction.Peter Collingbourne2019-06-191-12/+7
| | | | | | | | | Turns out that we can save an instruction by folding the right shift into the compare. Differential Revision: https://reviews.llvm.org/D63568 llvm-svn: 363874
* Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"Matt Arsenault2019-06-196-21/+155
| | | | | | | | This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node. llvm-svn: 363870
* [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG supportSimon Pilgrim2019-06-191-11/+12
| | | | | | Move 'lowest' demanded elt -> bitcast fold out of ZERO_EXTEND_VECTOR_INREG into ANY_EXTEND_VECTOR_INREG case. llvm-svn: 363856
* [x86] avoid vector load narrowing with extracted store uses (PR42305)Sanjay Patel2019-06-191-0/+20
| | | | | | | | | | | | This is an exception to the rule that we should prefer xmm ops to ymm ops. As shown in PR42305: https://bugs.llvm.org/show_bug.cgi?id=42305 ...the store folding opportunity with vextractf128 may result in better perf by reducing the instruction count. Differential Revision: https://reviews.llvm.org/D63517 llvm-svn: 363853
* [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ↵Simon Pilgrim2019-06-191-3/+4
| | | | | | | | | | ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. llvm-svn: 363850
* [X86][SSE] combineToExtendVectorInReg - add ANY_EXTEND support TODO. NFCI.Simon Pilgrim2019-06-191-0/+1
| | | | | | So I don't forget - there's a load of yak shaving to do first. llvm-svn: 363847
* [InstCombine] Fold icmp eq/ne (and %x, signbit), 0 -> %x s>=/s< 0 earlierHuihui Zhang2019-06-191-10/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: To generate simplified IR, make sure fold ``` (X & signbit) ==/!= 0) -> X s>=/s< 0; ``` is scheduled before fold ``` ((X << Y) & C) == 0 -> (X & (C >> Y)) == 0. ``` https://rise4fun.com/Alive/fbdh Reviewers: lebedev.ri, efriedma, spatel, craig.topper Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63026 llvm-svn: 363845
* [X86][SSE] Combine shuffles to ANY_EXTEND/ANY_EXTEND_VECTOR_INREG.Simon Pilgrim2019-06-191-10/+15
| | | | | | We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension. llvm-svn: 363841
* [ARM] Add MVE vector bit-operations (register inputs).Simon Tatham2019-06-198-25/+477
| | | | | | | | | | | | | | | | | | | | | | | | This includes all the obvious bitwise operations (AND, OR, BIC, ORN, MVN) in register-to-register forms, and the immediate forms of AND/OR/BIC/ORN; byte-order reverse instructions; and the VMOVs that access a single lane of a vector. Some of those VMOVs (specifically, the ones that access a 32-bit lane) share an encoding with existing instructions that were disassembled as accessing half of a d-register (e.g. `vmov.32 r0, d1[0]`), but in 8.1-M they're now written as accessing a quarter of a q-register (e.g. `vmov.32 r0, q0[2]`). The older syntax is still accepted by the assembler. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62673 llvm-svn: 363838
* [AVR] Change limit type to match the argument type (NFC)Evandro Menezes2019-06-191-1/+1
| | | | llvm-svn: 363832
* [Hexagon] Change limit type to match the argument type (NFC)Evandro Menezes2019-06-191-1/+1
| | | | llvm-svn: 363831
* [X86] getExtendInVec - take a ISD::*_EXTEND opcode instead of a IsSigned ↵Simon Pilgrim2019-06-191-15/+13
| | | | | | | | bool flag. NFCI. Prep work to support ANY_EXTEND/ANY_EXTEND_VECTOR_INREG without needing another flag. llvm-svn: 363818
* [DFSan] Add UnaryOperator visitor to DataFlowSanitizerCameron McInally2019-06-191-0/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D62815 llvm-svn: 363814
OpenPOWER on IntegriCloud