path: root/llvm/lib/Target/ARM/ARMISelLowering.cpp
Commit message (Author, Date, Files, Lines changed)
...
* [ARM] When a bitcast is about to be turned into a VMOVDRR, try to combine it with its source instead of forcing the values on GPRs (Quentin Colombet, 2015-12-04, 1 file, -0/+55)
    This improves the lowering of vector code when such bitcasts happen in the middle of vector computations.
    rdar://problem/23691584
    llvm-svn: 254684
* AArch64: use ldxp/stxp pair to implement 128-bit atomic loads. (Tim Northover, 2015-12-02, 1 file, -1/+1)
    The ARM ARM is clear that 128-bit loads are only guaranteed to have been atomic if there has been a corresponding successful stxp. It's less clear for AArch32, so I'm leaving that alone for now.
    llvm-svn: 254524
* Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces. (Cong Hou, 2015-12-01, 1 file, -1/+1)
    (This is the second attempt to submit this patch. The first caused two assertion failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687)
    The patch in http://reviews.llvm.org/D13745 is broken into four parts:
    1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
    2. Use new interfaces in SelectionDAG, while in other passes treat probabilities as weights (http://reviews.llvm.org/D14361).
    3. Use new interfaces in all other passes.
    4. Remove old interfaces.
    This patch is parts 3+4 above. MBB no longer provides weight-based interfaces; they are entirely replaced by probability-based ones. The interface addSuccessor() is redesigned so that the default probability is unknown. Unknown probabilities are allowed, but cannot be mixed with known probabilities in a successor list: either every successor has a known probability, or every successor has an unknown one. In the latter case, each successor is assumed to have probability 1/N, where N is the number of successors. An assertion catches attempts to mix the two, which helps flag many misuses.
    All uses of weight-based interfaces are now updated to use probability-based ones.
    Differential revision: http://reviews.llvm.org/D14973
    llvm-svn: 254377
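    For illustration, a minimal sketch of the redesigned interface, assuming the post-patch MachineBasicBlock API; the helper and block names are hypothetical:

        #include "llvm/CodeGen/MachineBasicBlock.h"
        #include "llvm/Support/BranchProbability.h"
        using namespace llvm;

        static void wireUpSuccessors(MachineBasicBlock *MBB,
                                     MachineBasicBlock *Taken,
                                     MachineBasicBlock *Fallthrough) {
          // Known probabilities: if one successor has one, all of them must.
          MBB->addSuccessor(Taken, BranchProbability(3, 4));       // 75%
          MBB->addSuccessor(Fallthrough, BranchProbability(1, 4)); // 25%
          // Or call addSuccessor() with no probability on *every* successor;
          // each is then assumed to be 1/N. Mixing the two styles asserts.
        }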
* Revert r254348: "Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces." (Hans Wennborg, 2015-12-01, 1 file, -1/+1)
    This also reverts the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction."
    Asserts were firing in Chromium builds. See PR25687.
    llvm-svn: 254366
* Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces. (Cong Hou, 2015-12-01, 1 file, -1/+1)
    The patch in http://reviews.llvm.org/D13745 is broken into four parts:
    1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
    2. Use new interfaces in SelectionDAG, while in other passes treat probabilities as weights (http://reviews.llvm.org/D14361).
    3. Use new interfaces in all other passes.
    4. Remove old interfaces.
    This patch is parts 3+4 above. MBB no longer provides weight-based interfaces; they are entirely replaced by probability-based ones. The interface addSuccessor() is redesigned so that the default probability is unknown. Unknown probabilities are allowed, but cannot be mixed with known probabilities in a successor list: either every successor has a known probability, or every successor has an unknown one. In the latter case, each successor is assumed to have probability 1/N, where N is the number of successors. An assertion catches attempts to mix the two, which helps flag many misuses.
    All uses of weight-based interfaces are now updated to use probability-based ones.
    Differential revision: http://reviews.llvm.org/D14973
    llvm-svn: 254348
* ARM: address WoA unsigned division overflow crash (Martell Malone, 2015-11-26, 1 file, -17/+16)
    Building on r253865, the crash is not limited to signed overflows. Disable custom handling of unsigned 32-bit and 64-bit integer divide. Add test cases for both 32-bit and 64-bit unsigned integer overflow.
    llvm-svn: 254158
* Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC) (Artyom Skrobov, 2015-11-25, 1 file, -20/+8)
    Summary: Many target lowerings copy-paste the code to test SDValues for known constants. This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
    Reviewers: MatzeB, andreadb, tstellarAMD
    Subscribers: arsenm, jyknight, llvm-commits
    Differential Revision: http://reviews.llvm.org/D14945
    llvm-svn: 254085
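    For illustration, a minimal sketch of the before/after shape in a target lowering; the wrapper function is hypothetical, the helpers are the ones the patch exposes:

        #include "llvm/CodeGen/SelectionDAGNodes.h"
        using namespace llvm;

        static bool comparesAgainstZero(SDValue RHS) {
          // Before, each target hand-rolled something like:
          //   ConstantSDNode *C = dyn_cast<ConstantSDNode>(RHS);
          //   return C && C->isNullValue();
          // After, one shared helper:
          return isNullConstant(RHS); // siblings: isOneConstant, isAllOnesConstant
        }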
* ARM: address WoA division overflow crash (Martell Malone, 2015-11-23, 1 file, -20/+11)
    Disable custom handling of signed 32-bit and 64-bit integer divide. Add test cases for both 32-bit and 64-bit integer overflow crashes.
    llvm-svn: 253865
* Properly check if a CMPZ node is in fact comparing against zero (James Molloy, 2015-11-16, 1 file, -0/+6)
    This was left implicit and never ever checked, which means we could have a CMPZ against some non-zero value and we were carrying on with BFI conversion regardless. Caught by Oliver Stannard using csmith; regression test added.
    llvm-svn: 253195
* [ARM] Replace ARMISD::RBIT with ISD::BITREVERSE (James Molloy, 2015-11-13, 1 file, -4/+5)
    ISD::BITREVERSE matches "rbit" completely, so remove ARMISD::RBIT and mark ISD::BITREVERSE as legal, adding a test for lowering.
    llvm-svn: 253047
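    For illustration, a hedged sketch of the setup side in the ARMTargetLowering constructor (the patch's exact guard may differ):

        if (Subtarget->hasV6T2Ops())
          setOperationAction(ISD::BITREVERSE, MVT::i32, Legal); // selects rbit directly
        else
          setOperationAction(ISD::BITREVERSE, MVT::i32, Expand);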
* [ARM] CMOV->BFI combining: handle both senses of CMPZ (James Molloy, 2015-11-12, 1 file, -0/+10)
    I completely misunderstood what ARMISD::CMPZ means. It's not "compare equal to zero", it's "compare, only setting the zero/Z flag". It can either be equal-to-zero or not-equal-to-zero, and we weren't checking what sense it was. If it's equal-to-zero, we can swap the operands around and pretend like it is not-equal-to-zero, which is both a bug fix and lets us handle more cases.
    llvm-svn: 252891
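    For illustration, a hedged sketch of the canonicalization; ARMCC comes from the ARM backend headers and the helper is hypothetical:

        #include <utility>

        static void canonicalizeToNE(ARMCC::CondCodes &CC, SDValue &TrueVal,
                                     SDValue &FalseVal) {
          if (CC == ARMCC::EQ) {          // CMPZ used in the equal-to-zero sense
            std::swap(TrueVal, FalseVal); // flip the arms...
            CC = ARMCC::NE;               // ...and treat it as not-equal-to-zero
          }
        }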
* Properly fix unused variable in disable-assert builds. (Diego Novillo, 2015-11-11, 1 file, -1/+3)
    I missed the side-effects of ParseBFI in my previous attempt (r252748). Thanks dblaikie for the suggestion of adding a void use of the unused variable instead.
    llvm-svn: 252751
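    For illustration, a hedged reconstruction of the idiom; ParseBFI's real signature lives in the BFI-combine code and may differ from this sketch:

        APInt ToMask, FromMask;
        SDValue From = ParseBFI(N, ToMask, FromMask); // side-effecting; must always run
        (void)From;  // the "void use": silences -Wunused-variable in -Asserts builds
        assert(From.getNode() && "expected ParseBFI to find a bitfield source");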
* Remove unused variable in disable-assert builds. NFC. (Diego Novillo, 2015-11-11, 1 file, -2/+1)
    llvm-svn: 252748
* [ARM] Combine BFIs together (James Molloy, 2015-11-11, 1 file, -2/+109)
    If we have a chain of BFIs, we may be able to combine several together into one merged BFI. We can do this if the "from" bits from one BFI OR'd with the "from" bits from the other BFI form a contiguous range, and the same with the "to" bits.
    llvm-svn: 252740
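    For illustration, a minimal sketch of the merge condition, using llvm::isShiftedMask_32 from llvm/Support/MathExtras.h; the helper name and mask parameters are hypothetical:

        #include "llvm/Support/MathExtras.h"
        #include <cstdint>

        static bool canMergeBFIs(uint32_t FromMask1, uint32_t FromMask2,
                                 uint32_t ToMask1, uint32_t ToMask2) {
          // Mergeable when each ORed mask forms one contiguous run of set bits.
          return llvm::isShiftedMask_32(FromMask1 | FromMask2) &&
                 llvm::isShiftedMask_32(ToMask1 | ToMask2);
        }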
* [ARM] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz() (Sanjay Patel, 2015-11-10, 1 file, -0/+8)
    ARM V6T2 has instructions for efficient count-leading/trailing-zeros, so this should be considered a cheap operation (and therefore fair game for speculation) for any ARM V6T2 implementation.
    The net result of allowing this speculation for the regression tests in this patch is that we get this code:
      ctlz:
        clz r0, r0
        bx lr
      cttz:
        rbit r0, r0
        clz r0, r0
        bx lr
    Instead of:
      ctlz:
        cmp r0, #0
        moveq r0, #32
        clzne r0, r0
        bx lr
      cttz:
        cmp r0, #0
        moveq r0, #32
        rbitne r0, r0
        clzne r0, r0
        bx lr
    This will help solve a general speculation/despeculation problem noted in PR24818: https://llvm.org/bugs/show_bug.cgi?id=24818
    Differential Revision: http://reviews.llvm.org/D14469
    llvm-svn: 252639
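    For illustration, a hedged sketch of the two overrides; the exact subtarget predicate in the patch may differ:

        bool ARMTargetLowering::isCheapToSpeculateCtlz() const {
          return Subtarget->hasV6T2Ops(); // clz is one branch-free instruction
        }

        bool ARMTargetLowering::isCheapToSpeculateCttz() const {
          return Subtarget->hasV6T2Ops(); // rbit + clz, still branch-free
        }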
* Reapply "[ARM] Combine CMOV into BFI where possible" (James Molloy, 2015-11-10, 1 file, -0/+112)
    Added fixes for stage2 failures: CMOV is not commutable; commuting the operands results in the condition being flipped! d'oh!
    Original commit message:
    If we have a CMOV, OR and AND combination such as:
      if (x & CN)
        y |= CM;
    And:
      * CN is a single bit;
      * All bits covered by CM are known zero in y;
    Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).
    llvm-svn: 252606
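    For illustration, a C++ rendering of an eligible source pattern; the constants are chosen to satisfy the stated conditions and are not from the patch:

        #include <cstdint>

        uint32_t mergeFlag(uint32_t x, uint32_t y) {
          // CN = 0x10 is a single bit; assume bits 8-15 of y are known zero here.
          if (x & 0x10)
            y |= 0xFF00; // CM contiguous: lowers to BFI instead of TST+ORR+branch
          return y;
        }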
* [EABI] Add LLVM support for -meabi flag (Renato Golin, 2015-11-09, 1 file, -8/+28)
    "GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp": https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Standards.html
    Hence in GNUEABI targets LLVM should not convert 'memops' to their equivalent '__aeabi_memops'. This conversion violates the GCC contract. The -meabi flag controls whether or not LLVM will modify 'memops' in GNUEABI targets:
      Without -meabi: use the triple default EABI.
      With -meabi=default: use the triple default EABI.
      With -meabi=gnu: use 'memops'.
      With -meabi=4 or -meabi=5: use '__aeabi_memops'.
      With -meabi set to an unknown value: same as -meabi=default.
    Patch by Vinicius Tinti.
    llvm-svn: 252462
* Revert "[ARM] Combine CMOV into BFI where possible" (Renato Golin, 2015-11-09, 1 file, -115/+0)
    This reverts commit r252057, as it broke ARM self-hosting buildbots, probably due to a code-gen fault.
    llvm-svn: 252460
* [WinEH] Update exception pointer registers (Joseph Tremoulet, 2015-11-07, 1 file, -7/+14)
    Summary: The CLR's personality routine passes these in rdx/edx, not rax/eax. Make getExceptionPointerRegister a virtual method parameterized by personality function to allow making this distinction. Similarly make getExceptionSelectorRegister a virtual method parameterized by personality function, for symmetry.
    Reviewers: pgavlin, majnemer, rnk
    Subscribers: jyknight, dsanders, llvm-commits
    Differential Revision: http://reviews.llvm.org/D14344
    llvm-svn: 252383
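    For illustration, a hedged sketch of the reshaped TargetLowering hooks and a hypothetical target override; the signatures and the personality classification call are my reading of the 2015-era API, not verbatim from the patch:

        // Reshaped hooks: the personality function becomes a parameter.
        virtual unsigned
        getExceptionPointerRegister(const Constant *PersonalityFn) const;
        virtual unsigned
        getExceptionSelectorRegister(const Constant *PersonalityFn) const;

        // Hypothetical override distinguishing the CLR personality:
        unsigned MyX86TargetLowering::getExceptionPointerRegister(
            const Constant *PersonalityFn) const {
          return classifyEHPersonality(PersonalityFn) == EHPersonality::CoreCLR
                     ? X86::RDX  // CLR passes the exception pointer in rdx/edx
                     : X86::RAX; // other personalities use rax/eax
        }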
* [ARM] Compute known bits for ARMISD::CMOV (James Molloy, 2015-11-05, 1 file, -0/+10)
    We can conservatively know that CMOV's known bits are the intersection of the known bits for each of its operands. This helps PerformCMOVToBFICombine find more opportunities.
    I tried hard to create a testcase for this and failed - we have to sufficiently confuse DAG.computeKnownBits, which can see through all the cheap tricks I tried to narrow my larger testcase down :(
    This code is actually exercised in CodeGen/ARM/bfi.ll; there's just no functional difference because DAG.computeKnownBits gets the right answer in that case.
    llvm-svn: 252168
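    For illustration, a hedged sketch of the rule inside computeKnownBitsForTargetNode, using the 2015-era KnownZero/KnownOne API; the operand indices are assumptions:

        APInt KnownZeroT, KnownOneT, KnownZeroF, KnownOneF;
        DAG.computeKnownBits(Op.getOperand(0), KnownZeroT, KnownOneT, Depth + 1);
        DAG.computeKnownBits(Op.getOperand(1), KnownZeroF, KnownOneF, Depth + 1);
        // A bit is known in the CMOV only if both arms agree on it.
        KnownZero = KnownZeroT & KnownZeroF;
        KnownOne = KnownOneT & KnownOneF;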
* [ARM] Combine CMOV into BFI where possible (James Molloy, 2015-11-04, 1 file, -0/+105)
    If we have a CMOV, OR and AND combination such as:
      if (x & CN)
        y |= CM;
    And:
      * CN is a single bit;
      * All bits covered by CM are known zero in y;
    Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).
    llvm-svn: 252057
* ARM: add support for WatchOS's compact unwind information. (Tim Northover, 2015-10-28, 1 file, -4/+4)
    llvm-svn: 251573
* ARM: teach backend about WatchOS and TvOS libcalls. (Tim Northover, 2015-10-28, 1 file, -23/+46)
    The most substantial changes are again for watchOS: libcalls are hard-float if needed and sincos has a different calling convention.
    llvm-svn: 251571
* [ARM] Expand ROTL and ROTR of vector value types (Charlie Turner, 2015-10-27, 1 file, -1/+5)
    Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.
    Reviewers: rengolin, t.p.northover
    Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin
    Differential Revision: http://reviews.llvm.org/D14082
    llvm-svn: 251401
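    For illustration, a minimal sketch of the fix's shape in the ARMTargetLowering constructor (the actual patch may restrict this to specific vector types):

        for (MVT VT : MVT::vector_valuetypes()) {
          setOperationAction(ISD::ROTL, VT, Expand); // no vector rotate instructions;
          setOperationAction(ISD::ROTR, VT, Expand); // let legalization expand them
        }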
* ARM/ELF: Restore original (pre-r251322) logic for deciding whether to use GOT. (Peter Collingbourne, 2015-10-26, 1 file, -1/+1)
    Unbreaks linking with gold, which cannot resolve direct relocations referring to global symbols.
    llvm-svn: 251342
* ARM/ELF: Better codegen for global variable addresses. (Peter Collingbourne, 2015-10-26, 1 file, -32/+16)
    In PIC mode we were previously computing global variable addresses (or GOT entry addresses) by adding the PC, the PC-relative GOT displacement and the GOT-relative symbol/GOT entry displacement. Because the latter two displacements are fixed, we ended up performing one more addition than necessary.
    This change causes us to compute addresses using a single PC-relative displacement, resulting in a shorter code sequence. This reduces code size by about 4% in a recent build of Chromium for Android.
    As a result of this change we no longer need to compute the GOT base address in the ARM backend, which allows us to remove the Global Base Reg pass and SDAG lowering for the GOT.
    We also now no longer use the GOT when addressing a symbol which is known to be defined in the same linkage unit. Specifically, the symbol must have either hidden visibility or a strong definition in the current module in order to not use the GOT. This is a change from the previous behaviour where we would use the GOT to address externally visible symbols defined in the same module. I think the only cases where this could matter are cases involving symbol interposition, but we don't really support that well anyway.
    Differential Revision: http://reviews.llvm.org/D13650
    llvm-svn: 251322
* Change makeLibCall to take an ArrayRef<SDValue> instead of pointer and size. (Craig Topper, 2015-10-22, 1 file, -6/+6)
    This removes the need to pass a hardcoded size in many places. NFC
    llvm-svn: 251032
* Adding support for TargetLoweringBase::LibCall (Artyom Skrobov, 2015-10-20, 1 file, -2/+2)
    Summary: TargetLoweringBase::Expand is defined as "Try to expand this to other ops, otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between the two possibilities was defined in a rather convoluted way:
      - if DIVREM is legal, expand to DIVREM
      - if DIVREM has a custom lowering, expand to DIVREM
      - if DIVREM libcall is defined and a remainder from the same division is computed elsewhere, expand to a DIVREM libcall
      - else, expand to a DIV libcall
    This had the undesirable effect that if both DIV and DIVREM are implemented as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM libcall, even when the remainder isn't used.
    The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that backends can directly control whether they prefer an expansion or a conversion to a libcall. This makes the generic lowering code even more generic, allowing its reuse in a wider range of target-specific configurations.
    The useful effect is that the ARM backend will now generate a call to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where it doesn't need the remainder. There's no functional change outside the ARM backend.
    Reviewers: t.p.northover, rengolin
    Subscribers: t.p.northover, llvm-commits, aemerson
    Differential Revision: http://reviews.llvm.org/D13862
    llvm-svn: 250826
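    For illustration, a minimal sketch of what the new action lets a backend state directly in its TargetLowering constructor, assuming plain DIV libcalls are what the target wants:

        setOperationAction(ISD::SDIV, MVT::i32, LibCall); // expand straight to __aeabi_idiv
        setOperationAction(ISD::UDIV, MVT::i32, LibCall); // expand straight to __aeabi_uidiv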
* ARM: Remove implicit ilist iterator conversions, NFC (Duncan P. N. Exon Smith, 2015-10-19, 1 file, -10/+7)
    llvm-svn: 250759
* Make a bunch of static arrays const. (Craig Topper, 2015-10-18, 1 file, -3/+3)
    llvm-svn: 250642
* [ARM] Promote helper function to SelectionDAG. (Chad Rosier, 2015-10-07, 1 file, -34/+12)
    I'll be using the function in a similar combine for AArch64. The helper was also improved to handle undef values.
    Part of http://reviews.llvm.org/D13442
    llvm-svn: 249572
* [ARM] Use correct half-precision functions in EABI mode (Oliver Stannard, 2015-10-07, 1 file, -0/+8)
    The ARM RTABI defines the half- to single-precision float conversion functions with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore we need to emit the __aeabi version when compiling with an eabi or eabihf triple, and the __gnu version with a gnueabi or gnueabihf triple.
    llvm-svn: 249565
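    For illustration, a hedged sketch of the likely shape of the fix in the ARMTargetLowering constructor; the RTLIB enum names are from RuntimeLibcalls.h, and TT is an assumed Triple local:

        if (TT.getEnvironment() == Triple::EABI ||
            TT.getEnvironment() == Triple::EABIHF) {
          setLibcallName(RTLIB::FPROUND_F32_F16, "__aeabi_f2h");
          setLibcallName(RTLIB::FPEXT_F16_F32, "__aeabi_h2f");
        }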
* [ARM] Prevent PerformVDIVCombine from combining a vcvt/vdiv with 8 lanes. (Chad Rosier, 2015-10-07, 1 file, -3/+4)
    This would result in a crash since the vcvt used does not support v8i32 types.
    llvm-svn: 249560
* [ARM][AArch64] Only lower to interleaved load/store if the target has NEON (Jeroen Ketema, 2015-10-07, 1 file, -6/+7)
    Without an additional check for NEON, the compiler crashes during legalization of NEON ldN/stN.
    Differential Revision: http://reviews.llvm.org/D13508
    llvm-svn: 249550
* [ARM] Push more complex check down to reduce compile time. NFC. (Chad Rosier, 2015-10-07, 1 file, -10/+10)
    llvm-svn: 249547
* [ARM] Minor refactoring. NFC. (Chad Rosier, 2015-10-06, 1 file, -2/+4)
    llvm-svn: 249465
* [ARM] Minor refactoring. NFC. (Chad Rosier, 2015-10-06, 1 file, -8/+10)
    llvm-svn: 249464
* [ARM] Minor refactoring. NFC. (Chad Rosier, 2015-10-06, 1 file, -9/+8)
    llvm-svn: 249463
* [ARM] Minor refactoring to improve readability. NFC. (Chad Rosier, 2015-10-06, 1 file, -13/+14)
    llvm-svn: 249454
* [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM. (Scott Douglass, 2015-10-05, 1 file, -0/+34)
    We were previously codegen'ing memcpy as regular load/store operations and hoping that the register allocator would allocate registers in ascending order so that we could apply an LDM/STM combine after register allocation. According to the commit that first introduced this code (r37179), we planned to teach the register allocator to allocate the registers in ascending order. This never got implemented, and up to now we've been stuck with very poor codegen.
    A much simpler approach for achieving better codegen is to create MEMCPY pseudo instructions, attach scratch virtual registers to them and then, post register allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers. The register allocator will have picked arbitrary registers, which we sort when expanding the MEMCPY. This approach also avoids the need to repeatedly calculate offsets, which ultimately ought to be eliminated pre-RA in order to decrease register pressure.
    Fixes PR9199 and PR23768.
    [This is based on Peter Collingbourne's r238473 which was reverted.]
    Differential Revision: http://reviews.llvm.org/D13239
    Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
    Author: Peter Collingbourne <peter@pcc.me.uk>
    llvm-svn: 249322
* [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions (Jeroen Ketema, 2015-09-30, 1 file, -6/+7)
    This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234], vst[234]lane ARM neon intrinsics and associates an address space with the pointer that these intrinsics take. This changes, e.g.,
      <2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)
    to
      <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)
    This change ensures that address spaces are fully taken into account in the ARM target during lowering of interleaved loads and stores.
    Differential Revision: http://reviews.llvm.org/D12985
    llvm-svn: 248887
* [ARM] Avoid redundant checks for isThumb1Only() after supportsTailCall() (Artyom Skrobov, 2015-09-28, 1 file, -21/+3)
    supportsTailCall() has two callers. Both of them double-check isThumb1Only(), and refuse to proceed with tail-calling in that case. Therefore, it makes sense to move this check to ARMSubtarget::initSubtargetFeatures, where SupportsTailCall is initialized, and to eliminate the extra checks at the call sites.
    Following a review comment, added an "assert(supportsTailCall())" in IsEligibleForTailCall.
    NFC.
    llvm-svn: 248703
* [ARM] Don't generate clrex for pre-v7 targets. (Ahmed Bougacha, 2015-09-26, 1 file, -0/+2)
    Since r248294, we emit clrex, but it doesn't exist on v6.
    llvm-svn: 248640
* ARM: make -Asserts,-Werror=unused-variable build happy (Saleem Abdulrasool, 2015-09-25, 1 file, -4/+4)
    The value was only used in an assertion. Sink the variable usage into the assertion.
    llvm-svn: 248562
* ARM: address WoA division limitation (Saleem Abdulrasool, 2015-09-25, 1 file, -8/+131)
    We now emit the compiler-generated divide-by-zero check that was needed for the MSVC routines. We construct a pseudo-instruction for the DBZ check, as the operation requires splitting up the BB. For the 64-bit operations, we need to custom expand the node, as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there, to avoid the duplication. The division library calls now match MSVC semantically.
    llvm-svn: 248561
* [ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def (Artyom Skrobov, 2015-09-24, 1 file, -1/+1)
    Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with a FIXME: attached. This patch changes the handling of +t2dsp to be in line with other architecture extensions.
    Following a revert of r248152 and new review comments, this patch also includes renaming FeatureDSPThumb2 -> FeatureDSP, hasThumb2DSP() -> hasDSP(), etc. The spelling of "t2dsp" is preserved, pending a further investigation of its possible external usage.
    Differential Revision: http://reviews.llvm.org/D12937
    llvm-svn: 248519
* [ARM] Emit clrex in the expanded cmpxchg fail block. (Ahmed Bougacha, 2015-09-22, 1 file, -0/+6)
    ARM counterpart to r248291: In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it.
    Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations".
    Differential Revision: http://reviews.llvm.org/D13033
    llvm-svn: 248294
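    For illustration, a hedged sketch of what the ARM override presumably looks like; the hook name emitAtomicCmpXchgNoStoreLLBalance and the llvm.arm.clrex intrinsic are my understanding of this era's API, not verbatim from the patch:

        void ARMTargetLowering::emitAtomicCmpXchgNoStoreLLBalance(
            IRBuilder<> &Builder) const {
          Module *M = Builder.GetInsertBlock()->getParent()->getParent();
          // The failure path has no matching strex, so release the monitor.
          Builder.CreateCall(Intrinsic::getDeclaration(M, Intrinsic::arm_clrex));
        }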
* [ARM] Do not scale vext with a factor (Jeroen Ketema, 2015-09-21, 1 file, -9/+1)
    The vext pseudo-instruction takes the number of elements that need to be extracted, not the number of bytes. Hence, use the number of elements directly instead of scaling them with a factor.
    Reviewers: Silviu Baranga, James Molloy (not reflected in the differential revision)
    Differential Revision: http://reviews.llvm.org/D12974
    llvm-svn: 248208
* ARM: cleanup formatting (Saleem Abdulrasool, 2015-09-20, 1 file, -2/+2)
    clang-format a line which was poorly formatted. NFC.
    llvm-svn: 248110
* propagate fast-math-flags on DAG nodes (Sanjay Patel, 2015-09-16, 1 file, -0/+5)
    After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point.
    This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF (D11807). I then ran all regression tests and test-suite and confirmed that everything passes.
    This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings.
    Differential Revision: http://reviews.llvm.org/D12095
    llvm-svn: 247815
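    For illustration, a minimal sketch of propagating flags when building a node, assuming the 2015-era pointer-based SDNodeFlags overload of getNode(); DL, VT, LHS and RHS are assumed locals:

        SDNodeFlags Flags;
        Flags.setNoNaNs(true);        // carry the FMF from the node being combined
        Flags.setUnsafeAlgebra(true);
        SDValue FAdd = DAG.getNode(ISD::FADD, DL, VT, LHS, RHS, &Flags);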