summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [AArch64][GlobalISel] Legalize G_INTRINSIC_TRUNCJessica Paquette2019-04-232-1/+2
| | | | | | | | | Same patch as G_FCEIL etc. Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct rule in AArch64LegalizerInfo.cpp, and add a test. llvm-svn: 359021
* [ConstantRange] Add urem supportNikita Popov2019-04-231-0/+15
| | | | | | | | | | | | | Add urem support to ConstantRange, so we can handle in in LVI. This is an approximate implementation that tries to capture the most useful conditions: If the LHS is always strictly smaller than the RHS, then the urem is a no-op and the result is the same as the LHS range. Otherwise the lower bound is zero and the upper bound is min(LHSMax, RHSMax - 1). Differential Revision: https://reviews.llvm.org/D60952 llvm-svn: 359019
* [AMDGPU] Fixed addReg() in SIOptimizeExecMaskingPreRA.cppStanislav Mekhanoshin2019-04-231-1/+1
| | | | | | | | The second argument is flags, not subreg. Differential Revision: https://reviews.llvm.org/D61031 llvm-svn: 359017
* [AArch64][GlobalISel] Legalize G_FMA for more vector typesJessica Paquette2019-04-231-2/+3
| | | | | | | | | Same as G_FCEIL, G_FABS, etc. Just move it into that rule. Add a legalizer test for G_FMA, which we didn't have before and update arm64-vfloatintrinsics.ll. llvm-svn: 359015
* [AliasAnalysis] AAResults preserves AAManager.Alina Sbirlea2019-04-231-6/+4
| | | | | | | | | | | | | | | | Summary: AAResults should not invalidate AAManager. Update tests. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60914 llvm-svn: 359014
* [AArch64][GlobalISel] Add G_FMA to isPreISelGenericFloatingPointOpcodeJessica Paquette2019-04-231-0/+1
| | | | | | | | Noticed an unnecessary fallback in arm64-vmul caused by this. Also add a regbankselect test for G_FMA. llvm-svn: 359013
* llvm-undname: Support demangling the spaceship operatorNico Weber2019-04-232-5/+5
| | | | | | Also add a test for demanling the co_await operator. llvm-svn: 359007
* [InstCombine] Convert a masked.load of a dereferenceable address to an ↵Philip Reames2019-04-231-4/+14
| | | | | | | | | | unconditional load If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select (cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable. Differential Revision: https://reviews.llvm.org/D59703 llvm-svn: 359000
* [x86] use psubus for more vsetcc lowering (PR39859)Sanjay Patel2019-04-231-8/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Circling back to a leftover bit from PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859#c1 ...we have this counter-intuitive (based on the test diffs) opportunity to use 'psubus'. This appears to be the better perf option for both Haswell and Jaguar based on llvm-mca. We already do this transform for the SETULT predicate, so this makes the code more symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect anything but pre-SSE4.1 subtargets. $ cat before.s movdqa -16(%rip), %xmm2 ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768] pxor %xmm0, %xmm2 pcmpgtw -32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255] pand %xmm2, %xmm0 pandn %xmm1, %xmm2 por %xmm2, %xmm0 $ cat after.s movdqa -16(%rip), %xmm2 ## xmm2 = [256,256,256,256,256,256,256,256] psubusw %xmm0, %xmm2 pxor %xmm3, %xmm3 pcmpeqw %xmm2, %xmm3 pand %xmm3, %xmm0 pandn %xmm1, %xmm3 por %xmm3, %xmm0 $ llvm-mca before.s -mcpu=haswell Iterations: 100 Instructions: 600 Total Cycles: 909 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 0.77 IPC: 0.66 Block RThroughput: 1.8 $ llvm-mca after.s -mcpu=haswell Iterations: 100 Instructions: 700 Total Cycles: 409 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 1.71 IPC: 1.71 Block RThroughput: 1.8 Differential Revision: https://reviews.llvm.org/D60838 llvm-svn: 358999
* [SPARC] Use the correct register set for the "r" asm constraint.Joerg Sonnenberger2019-04-231-0/+2
| | | | | | | | 64bit mode must use 64bit registers, otherwise assumptions about the top half of the registers are made. Problem found by Takeshi Nakayama in NetBSD. llvm-svn: 358998
* Revert "DebugInfo: Emit only one kind of accelerated access/name table"David Blaikie2019-04-233-8/+3
| | | | | | | | Regresses some apple_names situations - still investigating. This reverts commit r358931. llvm-svn: 358997
* Use llvm::stable_sortFangrui Song2019-04-2340-159/+128
| | | | | | While touching the code, simplify if feasible. llvm-svn: 358996
* [RISCV] Support assembling %tls_{ie,gd}_pcrel_hi modifiersLewis Revill2019-04-238-4/+48
| | | | | | | | | This patch adds support for parsing and assembling the %tls_ie_pcrel_hi and %tls_gd_pcrel_hi modifiers. Differential Revision: https://reviews.llvm.org/D55342 llvm-svn: 358994
* [AMDGPU] Fix hidden argument metadata duplication for V3Scott Linder2019-04-231-30/+0
| | | | | | | | | | Essentially complete a proper rebase of the V3 metadata change over https://reviews.llvm.org/D49096. Minimize the diff between the V2 and V3 variants of the relevant lit tests, and clean up some trailing whitespace. llvm-svn: 358992
* [X86] Pull out collectConcatOps helper. NFCI.Simon Pilgrim2019-04-231-21/+51
| | | | | | Create collectConcatOps helper that returns all the subvector ops for CONCAT_VECTORS or a INSERT_SUBVECTOR series. llvm-svn: 358989
* ARM: disallow add/sub to sp unless Rn is also sp.Tim Northover2019-04-232-1/+27
| | | | | | | | The manual says that Thumb2 add/sub instructions are only allowed to modify sp if the first source is also sp. This is slightly different from the usual rGPR restriction since it's context-sensitive, so implement it in C++. llvm-svn: 358987
* [DAGCombiner] generalize binop-of-splats scalarizationSanjay Patel2019-04-231-46/+38
| | | | | | | | | | | | | | If we only match build vectors, we can miss some patterns that use shuffles as seen in the affected tests. Note that the underlying calls within getSplatSourceVector() have the potential for compile-time explosion because of exponential recursion looking through binop opcodes, but currently the list of supported opcodes is very limited. Both of those problems should be addressed in follow-up patches. llvm-svn: 358984
* AMDGPU: Fix LCSSA phi lowering in SILowerI1CopiesNicolai Haehnle2019-04-231-1/+8
| | | | | | | | | | | | | | | | | | | | | | Summary: When an LCSSA phi survives through instruction selection, the pass ends up removing that phi entirely because it is dominated by the logic that does the lanemask merging. This then used to trigger an assertion when processing a dependent phi instruction. Change-Id: Id4949719f8298062fe476a25718acccc109113b6 Reviewers: llvm-commits Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm Tags: #llvm Differential Revision: https://reviews.llvm.org/D60999 llvm-svn: 358983
* [CallSite removal] move InlineCost to CallBase usageFedor Sergeev2019-04-236-114/+113
| | | | | | | | | | | Converting InlineCost interface and its internals into CallBase usage. Inliners themselves are still not converted. Reviewed By: reames Tags: #llvm Differential Revision: https://reviews.llvm.org/D60636 llvm-svn: 358982
* [ARM] Update check for CBZ in IfcvtDavid Green2019-04-233-43/+59
| | | | | | | | | | | The check for creating CBZ in constant island pass recently obtained the ability to search backwards to find a Cmp instruction. The code in IfCvt should mirror this to allow more conversions to the smaller form. The common code has been pulled out into a separate function to be shared between the two places. Differential Revision: https://reviews.llvm.org/D60090 llvm-svn: 358977
* [ARM] Don't replicate instructions in Ifcvt at minsizeDavid Green2019-04-231-0/+9
| | | | | | | | | | Ifcvt can replicate instructions as it converts them to be predicated. This stops that from happening on thumb2 targets at minsize where an extra IT instruction is likely needed. Differential Revision: https://reviews.llvm.org/D60089 llvm-svn: 358974
* Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI.Simon Pilgrim2019-04-231-2/+2
| | | | llvm-svn: 358970
* Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI.Simon Pilgrim2019-04-231-2/+2
| | | | llvm-svn: 358969
* [DAGCombiner] Combine OR as ADD when no common bits are setBjorn Pettersson2019-04-231-16/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The DAGCombiner is rewriting (canonicalizing) an ISD::ADD with no common bits set in the operands as an ISD::OR node. This could sometimes result in "missing out" on some combines that normally are performed for ADD. To be more specific this could happen if we already have rewritten an ADD into OR, and later (after legalizations or combines) we expose patterns that could have been optimized if we had seen the OR as an ADD (e.g. reassociations based on ADD). To make the DAG combiner less sensitive to if ADD or OR is used for these "no common bits set" ADD/OR operations we now apply most of the ADD combines also to an OR operation, when value tracking indicates that the operands have no common bits set. Reviewers: spatel, RKSimon, craig.topper, kparzysz Reviewed By: spatel Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59758 llvm-svn: 358965
* [AArch64] Add support for MTE intrinsicsJaved Absar2019-04-234-22/+78
| | | | | | | | | | | This patch provides intrinsics support for Memory Tagging Extension (MTE), which was introduced with the Armv8.5-a architecture. The intrinsics are described in detail in the latest ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest Reviewed by: David Spickett Differential Revision: https://reviews.llvm.org/D60486 llvm-svn: 358963
* [ARM][FIX] Add missing f16.lane.vldN/vstN loweringDiogo N. Sampaio2019-04-231-0/+2
| | | | | | | | | | | | | | | | | | Summary: Add missing D and Q lane VLDSTLane lowering for fp16 elements. Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60874 llvm-svn: 358962
* [llvm-mc] - Properly set the the address align field of the compressed sections.George Rimar2019-04-231-2/+6
| | | | | | | | | | | | | | | | | | | | | About the compressed sections spec says: (https://docs.oracle.com/cd/E37838_01/html/E36783/section_compression.html) sh_addralign fields of the section header for a compressed section reflect the requirements of the compressed section. Currently, llvm-mc always puts uncompressed section alignment to sh_addralign. It is not correct. zlib styled section contains an Elfxx_Chdr header, so we should either use 4 or 8 values depending on the target (Uncompressed section alignment is stored in ch_addralign field of the compression header). GNU assembler version 2.31.1 also has this issue, but in 2.32.51 it was already fixed. This is how it was found during debugging of the https://bugs.llvm.org/show_bug.cgi?id=40482 actually. Differential revision: https://reviews.llvm.org/D60965 llvm-svn: 358960
* [LSR] Limit the recursion for setup costDavid Green2019-04-231-11/+14
| | | | | | | | | | | | | | In some circumstances we can end up with setup costs that are very complex to compute, even though the scevs are not very complex to create. This can also lead to setupcosts that are calculated to be exactly -1, which LSR treats as an invalid cost. This patch puts a limit on the recursion depth for setup cost to prevent them taking too long. Thanks to @reames for the report and test case. Differential Revision: https://reviews.llvm.org/D60944 llvm-svn: 358958
* [WebAssembly] Bail out of fastisel earlier when computing PIC addressesSam Clegg2019-04-231-11/+6
| | | | | | | | | | | This change partially reverts https://reviews.llvm.org/D54647 in favor of bailing out during computeAddress instead. This catches the condition earlier and handles more cases. Differential Revision: https://reviews.llvm.org/D60986 llvm-svn: 358948
* Revert "Use const DebugLoc&"Chandler Carruth2019-04-231-2/+2
| | | | | | | | | | | | | | | | This reverts r358910 (git commit 2b744665308fc8d30a3baecb4947f2bd81aa7d30) While this patch *seems* trivial and safe and correct, it is not. The copies are actually load bearing copies. You can observe this with MSan or other ways of checking for use-after-destroy, but otherwise this may result in ... difficult to debug inexplicable behavior. I suspect the issue is that the debug location is used after the original reference to it is removed. The metadata backing it gets destroyed as its last references goes away, and then we reference it later through these const references. llvm-svn: 358940
* DebugInfo: Emit only one kind of accelerated access/name tableDavid Blaikie2019-04-223-3/+8
| | | | | | | | | | | | | | | Currently to opt in to debug_names in DWARFv5, the IR must contain 'nameTableKind: Default' which also enables debug_pubnames. Instead, only allow one of {debug_names, apple_names, debug_pubnames, debug_gnu_pubnames}. nameTableKind: Default gives debug_names in DWARFv5 and greater, debug_pubnames in v4 and earlier - and apple_names when tuning for lldb on MachO. nameTableKind: GNU always gives gnu_pubnames llvm-svn: 358931
* [SelectionDAG] move splat util functions up from x86 loweringSanjay Patel2019-04-222-56/+53
| | | | | | | | | | This was supposed to be NFC, but the change in SDLoc definitions causes instruction scheduling changes. There's nothing x86-specific in this code, and it can likely be used from DAGCombiner's simplifyVBinOp(). llvm-svn: 358930
* [AMDGPU] Fix an issue in `op_sel_hi` skipping.Michael Liao2019-04-221-7/+16
| | | | | | | | | | | | | | | | | Summary: - Only apply packed literal `op_sel_hi` skipping on operands requiring packed literals. Even an instruction is `packed`, it may have operand requiring non-packed literal, such as `v_dot2_f32_f16`. Reviewers: rampitec, arsenm, kzhuravl Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60978 llvm-svn: 358922
* [InstCombine] Eliminate stores to constant memoryPhilip Reames2019-04-222-0/+24
| | | | | | | | | | | | If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path, or a noop. In either case, it is valid to simply remove the store. The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead. Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store *is undefined*, and prune the path. Consensus in the review was that the more aggressive form might be a good follow on change at a later date. Differential Revision: https://reviews.llvm.org/D60659 llvm-svn: 358919
* [InstSimplify] Move masked.gather w/no active lanes handling to InstSimplify ↵Philip Reames2019-04-222-6/+2
| | | | | | | | from InstCombine In the process, use the existing masked.load combine which is slightly stronger, and handles a mix of zero and undef elements in the mask. llvm-svn: 358913
* Use const DebugLoc&Matt Arsenault2019-04-221-2/+2
| | | | llvm-svn: 358910
* AMDGPU: Skip debug instructions in assertMatt Arsenault2019-04-221-2/+7
| | | | | | | | | | These are inserted after branch relaxation, and for some reason it's decided to put them in the long branch expansion block. It's probably not great to rely on the source block address, so this should probably be switched to being PC relative instead of relying on the block address llvm-svn: 358909
* [IPSCCP] Add missing `AssumptionCacheTracker` dependencyJustin Bogner2019-04-221-0/+1
| | | | | | | | | Back in August, r340525 introduced a dependency on the assumption cache tracker in the ipsccp pass, but that commit missed a call to INITIALIZE_PASS_DEPENDENCY, which leaves the assumption cache improperly registered if SCCP is the only thing that pulls it in. llvm-svn: 358903
* [LPM/BPI] Preserve BPI through trivial loop pass pipeline (e.g. LCSSA, ↵Philip Reames2019-04-222-0/+13
| | | | | | | | | | | | | | LoopSimplify) Currently, we do not expose BPI to loop passes at all. In the old pass manager, we appear to have been ignoring the fact that LCSSA and/or LoopSimplify didn't preserve BPI, and making it available to the following loop passes anyways. In the new one, it's invalidated before running any loop pass if either LCSSA or LoopSimplify actually make changes. If they don't make changes, then BPI is valid and available. So, we go ahead and teach LCSSA and LoopSimplify how to preserve BPI for consistency between old and new pass managers. This patch avoids an invalidation between the two requires in the following trivial pass pipeline: opt -passes="requires<branch-prob>,loop(no-op-loop),requires<branch-prob>" (when the input file is one which requires either LCSSA or LoopSimplify to canonicalize the loops) Differential Revision: https://reviews.llvm.org/D60790 llvm-svn: 358901
* [PGO/SamplePGO][NFC] Move the function updateProfWeight from InstructionWei Mi2019-04-222-43/+44
| | | | | | | | | | | | | | | | | | | | | | to CallInst. The issue was raised here: https://reviews.llvm.org/D60903#1472783 The function Instruction::updateProfWeight is only used for CallInst in profile update. From the current interface, it is very easy to think that the function can also be used for branch instruction. However, Branch instruction does't need the scaling the function provides for branch_weights and VP (value profile), in addition, scaling may introduce inaccuracy for branch probablity. The patch moves the function updateProfWeight from Instruction class to CallInst to remove the confusion. The patch also changes the scaling of branch_weights from a loop to a block because we know that ProfileData for branch_weights of CallInst will only have two operands at most. Differential Revision: https://reviews.llvm.org/D60911 llvm-svn: 358900
* AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sourcesMatt Arsenault2019-04-221-1/+3
| | | | llvm-svn: 358894
* GlobalISel: Legalize scalar G_EXTRACT sourcesMatt Arsenault2019-04-221-0/+7
| | | | llvm-svn: 358892
* llvm-undname: Fix an assert-on-invalid, found by oss-fuzzNico Weber2019-04-221-1/+1
| | | | llvm-svn: 358891
* AMDGPU: Fix not checking for copy when looking at copy srcMatt Arsenault2019-04-221-1/+6
| | | | | | | Effectively reverts r356956. The check for isFullCopy was excessive, but there still needs to be a check that this is a copy. llvm-svn: 358890
* [AMDGPU][MC] Corrected parsing of SP3 'neg' modifierDmitry Preobrazhensky2019-04-221-24/+58
| | | | | | | | | | See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60624 llvm-svn: 358888
* [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handlingSimon Pilgrim2019-04-222-12/+50
| | | | | | | | | | | | This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. Differential Revision: https://reviews.llvm.org/D60462 llvm-svn: 358887
* [DAGCombiner] make variable name less ambiguous; NFCSanjay Patel2019-04-221-4/+4
| | | | llvm-svn: 358886
* [DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFCSanjay Patel2019-04-221-11/+16
| | | | llvm-svn: 358884
* [LLVM-C] Add accessors to the default floating-point metadata nodeRobert Widmann2019-04-221-0/+12
| | | | | | | | | | | | | | | | Summary: Add a getter and setter pair for floating-point accuracy metadata. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60527 llvm-svn: 358883
* [NewPM] Add Option handling for SimpleLoopUnswitchSerguei Katkov2019-04-222-1/+39
| | | | | | | | | | | This patch enables passing options to SimpleLoopUnswitch via the passes pipeline. Reviewers: chandlerc, fedor.sergeev, leonardchan, philip.pfaffe Reviewed By: fedor.sergeev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D60676 llvm-svn: 358880
OpenPOWER on IntegriCloud