path: root/llvm/test/CodeGen/ARM
Commit message (Collapse)AuthorAgeFilesLines
...
* ARMExpandPseudoInsts: Fix CMP_SWAP expansion adding a kill flag to a defMatthias Braun2018-11-021-0/+24
| | | | llvm-svn: 346026
* [DAGCombiner] Remove reduceBuildVecConvertToConvertBuildVec and rely on the ↵Simon Pilgrim2018-11-021-18/+19
| | | | | | | | | | | | | vectorizers instead (PR35732) reduceBuildVecConvertToConvertBuildVec vectorizes int2float in the DAGCombiner, which means that even if the LV/SLP has decided to keep scalar code using the cost models, the DAGCombiner will override that decision. While there are cases where vectorization is necessary in the DAG (mainly due to legalization artefacts), I don't think this is the case here; we should assume that the vectorizers know what they are doing. Differential Revision: https://reviews.llvm.org/D53712 llvm-svn: 345964
* [ARM][CGP] Negative constant operand handlingSam Parker2018-11-018-0/+52
| | | | | | | | | | | | | | | | | While mutating instructions, we sign extended negative constant operands for binary operators that can safely overflow. This was to allow instructions, such as add nuw i8 %a, -2, to still be able to perform a subtraction. However, the code to handle constants doesn't take into consideration that instructions, such as sub nuw i8 -2, %a, require the i8 -2 to be converted into i32 254. This is a relatively simple fix, but I've taken the time to reorganise the code a bit - mainly that instructions that can be promoted are cached and splitting up the Mutate function. Differential Revision: https://reviews.llvm.org/D53972 llvm-svn: 345840
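A minimal sketch of the arithmetic behind the fix above, not the pass's actual code. It assumes an i8 value promoted to i32 and compares the two ways of widening the constant -2 for `sub nuw i8 -2, %a`. The helper names (`zext`, `sext`, `sub_i8`) are hypothetical and exist only for this illustration.

```python
def zext(value, from_bits):
    """Zero extend: keep the low from_bits bits as an unsigned value."""
    return value & ((1 << from_bits) - 1)


def sext(value, from_bits, to_bits):
    """Sign extend the low from_bits bits; return the to_bits-wide bit pattern."""
    low = value & ((1 << from_bits) - 1)
    if low & (1 << (from_bits - 1)):
        low -= 1 << from_bits
    return low & ((1 << to_bits) - 1)


def sub_i8(lhs, rhs):
    """Reference result of an i8 subtraction, as an unsigned 8-bit value."""
    return (lhs - rhs) & 0xFF


a = 3
# Narrow reference: i8 -2 is the bit pattern 254, so 254 - 3 = 251.
reference = sub_i8(-2, a)
# Promoted with a zero-extended constant: i32 254 - 3 matches the reference.
promoted_zext = (zext(-2, 8) - a) & 0xFFFFFFFF
# Promoted with a sign-extended constant: 0xFFFFFFFE - 3 does not match.
promoted_sext = (sext(-2, 8, 32) - a) & 0xFFFFFFFF

print(reference, promoted_zext, hex(promoted_sext))
```

With the constant as the left-hand operand of the subtraction, only the zero-extended form reproduces the narrow result in the wider type, which is the case the commit message singles out.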
* [IR] Allow increasing the alignment of dso-local globals.Eli Friedman2018-10-311-1/+7
| | | | | | | | | I think this is the actual important property; the previous visibility check was an approximation. Differential Revision: https://reviews.llvm.org/D53852 llvm-svn: 345790
* [ARM] Add missing pseudo-instruction for Thumb1 RSBS.Eli Friedman2018-10-314-41/+21
| | | | | | | | | Shows up rarely for 64-bit arithmetic, more frequently for the compare patterns added in r325323. Differential Revision: https://reviews.llvm.org/D53848 llvm-svn: 345782
* MachineOperand/MIParser: Do not print debug-use flag, infer itMatthias Braun2018-10-303-48/+48
| | | | | | | | | | | | | | The debug-use flag must be set exactly for uses on DBG_VALUEs. This is so obvious that it can be trivially inferred while parsing. This will reduce noise when printing while omitting information that has little value to the user. The parser will keep recognizing the flag for compatibility with old `.mir` files. Differential Revision: https://reviews.llvm.org/D53903 llvm-svn: 345671
* [ARM][NFC] Make tests immune to better div optimizationsDavid Bolvansky2018-10-303-18/+16
| | | | | | | | | | | | | | Summary: Related to D52504 Reviewers: spatel Reviewed By: spatel Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53901 llvm-svn: 345665
* [SchedModel] Fix for read advance cycles with implicit pseudo operands.Jonas Paulsson2018-10-3014-160/+160
| | | | | | | | | | | | | | | | | | The SchedModel allows the addition of ReadAdvances to express that certain operands of the instructions are needed at a later point than the others. RegAlloc may add pseudo operands that are not part of the instruction descriptor, and therefore cannot have any read advance entries. This meant that in some cases the desired read advance was nullified by such a pseudo operand, which still had the original latency. This patch fixes this by making sure that such pseudo operands get a zero latency during DAG construction. Review: Matthias Braun, Ulrich Weigand. https://reviews.llvm.org/D49671 llvm-svn: 345606
* Relax fast register allocator related test cases; NFCMatthias Braun2018-10-294-22/+26
| | | | | | | | | | | - Relax hard coded registers and stack frame sizes - Some test cleanups - Change phi-dbg.ll to match on mir output after phi elimination instead of going through the whole codegen pipeline. This is in preparation for https://reviews.llvm.org/D52010 I'm committing all the test changes upfront that work before and after independently. llvm-svn: 345532
* [ARM][NFC] Fix test inlineasm-X-allocation.llSjoerd Meijer2018-10-291-15/+14
| | | | | | Differential Revision: https://reviews.llvm.org/D53748 llvm-svn: 345491
* [ARM] Make InstrEmitter mark CPSR defs dead for Thumb1.Eli Friedman2018-10-264-37/+21
| | | | | | | | | | | | | | | | | | The "dead" markings allow existing target-independent optimizations, like MachineSink, to trigger more frequently. The CPSR defs would have eventually been marked dead by LiveVariables, so this only affects optimizations before regalloc. The ARMBaseInstrInfo.cpp change is fixing a bug which is only visible with this change: the transform adds a use to an otherwise dead def of CPSR. This is covered by existing regression tests. thumb2-tbh.ll breaks for Thumb1 due to MachineLICM changing the generated code; I'll fix it in D53452. Differential Revision: https://reviews.llvm.org/D53453 llvm-svn: 345420
* [ARM] Fix ARMCodeGenPrepare test casesSjoerd Meijer2018-10-261-32/+30
| | | | | | | | | | | | While working on FileCheck producing better diagnostics in D53710, I noticed that our test case is broken in a few different ways. The test was running, but results were not checked as prefix CHECK-COMMON wasn't defined (which is what FileCheck should warn about). Also, the output was different in 2 cases because of recent changes in ARMCodeGenPrepare. Differential Revision: https://reviews.llvm.org/D53746 llvm-svn: 345386
* [ARM] Regenerate vdup testsSimon Pilgrim2018-10-251-81/+271
| | | | llvm-svn: 345276
* [NFC] Rename minnan and maxnan to minimum and maximumThomas Lively2018-10-241-4/+4
| | | | | | | | | | | | | | | Summary: Changes all uses of minnan/maxnan to minimum/maximum globally. These names emphasize that the semantic difference between these operations is more than just NaN-propagation. Reviewers: arsenm, aheejin, dschuff, javed.absar Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53112 llvm-svn: 345218
* ARM: Use BKPT instead of TRAP to implement llvm.debugtrap.Peter Collingbourne2018-10-243-9/+74
| | | | | | | | | | | | | | | | | | | | | The BKPT instruction is specified to cause a software breakpoint, and at least on Linux results in a SIGTRAP. This makes it more suitable for implementing debugtrap than TRAP (aka UDF #254), which is specified to cause an undefined instruction exception and results in a SIGILL on Linux. Moreover, BKPT is not marked as a terminator, which is not only consistent with the IR instruction but allows the analyzeBlock function to correctly analyze a basic block containing the instruction, which fixes an assertion failure in the machine block placement pass previously triggered by the included test case. Because BKPT is only supported starting with ARMv5T, we continue to use UDF #254 when targeting v4T. Differential Revision: https://reviews.llvm.org/D53614 llvm-svn: 345171
* ARM: handle checking aliases with out-of-bounds GEPsSaleem Abdulrasool2018-10-241-0/+17
| | | | | | | | | | A global alias may use indices which are not considered in bounds. In such a case, accessing the base object will fail as it only peers through inbounds accesses. This pattern is used by the swift compiler to create references to preceding members in the type metadata. This would cause the code generation to fail when targeting a platform that used ELF as the object file format. Be conservative and fail the read-only check if we run into an alias that we cannot peer through. llvm-svn: 345107
* Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP")Eli Friedman2018-10-1824-688/+0
| | | | | | | Still causing failures on the polly-aosp buildbot; I'll follow up with a reduced testcase. llvm-svn: 344752
* [ARM] bottom-top mul support in ARMParallelDSPSam Parker2018-10-1724-0/+688
| | | | | | | | | | | | | | Previously reverted in rL343082. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 344693
* [ARM] Do not fuse VADD and VMUL, continued (2/2)Sjoerd Meijer2018-10-171-0/+9
| | | | | | | | | This is patch 2/2, following up on D53314, and is the functional change to prevent fusing mul + add sequences into VFMAs. Differential revision: https://reviews.llvm.org/D53315 llvm-svn: 344683
* [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2)Sjoerd Meijer2018-10-171-3/+4
| | | | This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs for performance reasons on the Cortex-M4 and Cortex-M33. This is a series of 2 patches that tries to achieve the same for VFMA. The second column in the table below shows what we were generating before rL342874, the third column what changed with rL342874, and the last column what we want to achieve with these 2 patches:

     --------------------------------------------------------
     |Opt   |  < rL342874   |  >= rL342874   |              |
     |------------------------------------------------------|
     |-O3   |     vmla      |      vmul      |     vmul     |
     |      |               |      vadd      |     vadd     |
     |------------------------------------------------------|
     |-Ofast|     vfma      |      vfma      |     vmul     |
     |      |               |                |     vadd     |
     |------------------------------------------------------|
     |-Oz   |     vmla      |      vmla      |     vmla     |
     --------------------------------------------------------

| | | | This patch, 1/2, is a cleanup of the spaghetti predicate logic on the different VMLA and VFMA codegen rules, so that we can make the final functional change in patch 2/2. This also fixes a typo in the regression test added in rL342874. Differential revision: https://reviews.llvm.org/D53314 llvm-svn: 344671
* [ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281)Simon Pilgrim2018-10-151-137/+17
| | | | | | | | | | As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type. This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case...... Differential Revision: https://reviews.llvm.org/D53257 llvm-svn: 344512
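The widening described above can be modelled on a single 64-bit element. This is a sketch under stated assumptions, not the lowering code itself: `byte_popcounts` stands in for the vXi8 CTPOP result, and `paddl` is a hypothetical model of a pairwise add long, which adds adjacent lanes and produces half as many lanes at twice the width.

```python
def byte_popcounts(value):
    """Per-byte population counts of a 64-bit value (eight 8-bit lanes)."""
    return [bin((value >> (8 * i)) & 0xFF).count("1") for i in range(8)]


def paddl(lanes):
    """Model of a pairwise add long: add adjacent lanes, halving the lane count."""
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]


def ctpop_i64(value):
    """Population count of a 64-bit value built from byte counts and pairwise adds."""
    lanes = byte_popcounts(value & 0xFFFFFFFFFFFFFFFF)  # 8 lanes of i8
    lanes = paddl(lanes)  # 4 lanes of i16
    lanes = paddl(lanes)  # 2 lanes of i32
    lanes = paddl(lanes)  # 1 lane of i64
    return lanes[0]


print(ctpop_i64(0xF0F0))
```

Three pairwise additions are enough to go from byte counts to a 64-bit count, so no multiply is needed for this case.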
* [ARM] Regenerate cttz testsSimon Pilgrim2018-10-141-136/+283
| | | | | | Improve codegen view as part of PR32655 llvm-svn: 344479
* [ARM] Regenerate popcnt testsSimon Pilgrim2018-10-131-54/+257
| | | | | | Improve codegen view as part of PR32655 llvm-svn: 344465
* [ARM] Account for implicit IT when calculating inline asm sizePeter Smith2018-10-081-0/+47
| | | | | | | | | | | | | | | | | | | | | When deciding if it is safe to optimize a conditional branch to a CBZ or CBNZ the offsets of the BasicBlocks from the start of the function are estimated. For inline assembly the generic getInlineAsmLength() function is used to get a worst case estimate of the inline assembly by multiplying the number of instructions by the max instruction size of 4 bytes. This unfortunately doesn't take into account the generation of Thumb implicit IT instructions. In edge cases such as when all the instructions in the block are 4-bytes in size and there is an implicit IT then the size is underestimated. This can cause an out of range CBZ or CBNZ to be generated. The patch takes a conservative approach and assumes that every instruction in the inline assembly block may have an implicit IT. Fixes pr31805 Differential Revision: https://reviews.llvm.org/D52834 llvm-svn: 343960
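The underestimate described above can be checked with a few lines of arithmetic. This is a rough sketch, not the code from the patch: the function names are hypothetical, and it assumes the 2-byte size of a Thumb IT instruction and that the conservative estimate charges one possible IT per instruction.

```python
MAX_INSTR_SIZE = 4  # bytes, the worst-case instruction size named in the message
IT_SIZE = 2         # bytes, assumption: Thumb IT is a 16-bit instruction


def old_estimate(num_instructions):
    """Estimate that ignores implicit IT instructions."""
    return num_instructions * MAX_INSTR_SIZE


def conservative_estimate(num_instructions):
    """Estimate that assumes every instruction may carry an implicit IT."""
    return num_instructions * (MAX_INSTR_SIZE + IT_SIZE)


def actual_size(num_wide_instructions, num_implicit_its):
    """Real size of a block of 4-byte instructions plus its implicit ITs."""
    return num_wide_instructions * MAX_INSTR_SIZE + num_implicit_its * IT_SIZE


# Edge case from the message: every instruction is 4 bytes and one IT is implied.
size = actual_size(num_wide_instructions=4, num_implicit_its=1)
print(old_estimate(4), size, conservative_estimate(4))
```

In this edge case the old estimate falls below the real size, while the conservative one stays above it, at the cost of overestimating blocks that contain no conditional instructions.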
* [ARM] Fix correctness checks in promoteToConstantPool.Eli Friedman2018-09-282-16/+23
| | | | | | | | | | | | | | | | | Correctly check for relocations in the constant to promote. And don't allow promoting a constant multiple times. This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ; it's not a complete fix because we also need to prevent ARMConstantIslands from cloning the constant. (-arm-promote-constant is currently off by default, and it stays off with this patch. I'll look into turning it on again when all the known issues are fixed.) Differential Revision: https://reviews.llvm.org/D51472 llvm-svn: 343361
* [ARM] Use preferred alignment for constants in promoteToConstantPool.Eli Friedman2018-09-281-0/+11
| | | | | | | | | | | | | | | This mostly affects IR generated by non-clang frontends because clang generally sets the alignment of globals explicitly. Fixes https://bugs.llvm.org//show_bug.cgi?id=32394 . (-arm-promote-constant is currently off by default, and it stays off with this patch. I'll look into turning it on again when all the known issues are fixed.) Differential Revision: https://reviews.llvm.org/D51469 llvm-svn: 343359
* [ARM] Allow execute only code on Cortex-m23David Spickett2018-09-283-0/+4
| | | | | | | | | | | The NoMovt feature prevents the use of MOVW/MOVT instructions on Cortex-M23 for performance reasons. These instructions are required for execute only code so NoMovt should be disabled when that option is enabled. Differential Revision: https://reviews.llvm.org/D52551 llvm-svn: 343302
* [CodeGen] Always print register ties in MI::dump()Francis Visoiu Mistrih2018-09-262-2/+2
| | | | | | | | | It was the case when calling MO::dump(), but MI::dump() was still depending on hasComplexRegisterTies(). The MIR output is not affected. llvm-svn: 343107
* [ARM] Fix for PR39060Sam Parker2018-09-263-9/+182
| | | | | | | | | | | | | When calculating whether a value can safely overflow for use by an icmp, we weren't checking that the value couldn't wrap around. To do this we need the icmp to be using a constant, as well as the incoming add or sub. bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060 Differential Revision: https://reviews.llvm.org/D52463 llvm-svn: 343092
* [CodeGen] Enable tail calls for functions with NonNull attributes.David Green2018-09-261-0/+12
| | | | | | | | | Adding NonNull as attributes to returned pointers has the unfortunate side effect of disabling tail calls. This patch ignores the NonNull attribute when we decide whether to tail merge, in the same way that we ignore the NoAlias attribute, as it has no effect on the call sequence. Differential Revision: https://reviews.llvm.org/D52238 llvm-svn: 343091
* Revert r342870 "[ARM] bottom-top mul support ARMParallelDSP"Hans Wennborg2018-09-263-558/+0
| | | | | | | | | | | | | | | | | | | | This broke Chromium's Android build (https://crbug.com/889390) and the polly-aosp buildbot (http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable). > Originally committed in rL342210 but was reverted in rL342260 because > it was causing issues in vectorized code, because I had forgotten to > ensure that we're operating on scalar values. > > Original commit message: > > On failing to find sequences that can be converted into dual macs, > try to find sequential 16-bit loads that are used by muls which we > can then use smultb, smulbt, smultt with a wide load. > > Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 343082
* [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33Sjoerd Meijer2018-09-241-0/+19
| | | | | | | | | | A sequence of VMUL and VADD instructions always gives the same or better performance than a fused VMLA instruction on the Cortex-M4 and Cortex-M33. Executing the VMUL and VADD back-to-back requires the same cycles, but having separate instructions allows scheduling to avoid the hazard between these 2 instructions. Differential Revision: https://reviews.llvm.org/D52289 llvm-svn: 342874
* [ARM][ARMLoadStoreOptimizer]Luke Cheeseman2018-09-241-0/+40
| | | | | | | | | | | - The load store optimizer is currently merging multiple loads/stores into VLDM/VSTM with more than 16 doubleword registers - This is an UNPREDICTABLE instruction and shouldn't be done - It looks like the Limit for how many registers included in a merge got dropped at some point so I am reintroducing it in this patch - This fixes https://bugs.llvm.org/show_bug.cgi?id=38389 Differential Revision: https://reviews.llvm.org/D52085 llvm-svn: 342872
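The register limit mentioned above can be illustrated with a small sketch. It is not the optimizer's merging logic: `split_merge_candidates` is a hypothetical helper that only shows the effect of capping each merged VLDM/VSTM at 16 doubleword registers.

```python
MAX_DREGS_PER_MERGE = 16  # limit stated in the message for VLDM/VSTM


def split_merge_candidates(dregs, limit=MAX_DREGS_PER_MERGE):
    """Split a run of consecutive D registers into groups of at most `limit`."""
    return [dregs[i:i + limit] for i in range(0, len(dregs), limit)]


# A run of 20 consecutive doubleword registers, d0 to d19.
groups = split_merge_candidates(list(range(20)))
print([len(group) for group in groups])
```

A run longer than the limit ends up as several smaller merges instead of one instruction with an UNPREDICTABLE register count.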
* [ARM] bottom-top mul support ARMParallelDSPSam Parker2018-09-243-0/+558
| | | | | | | | | | | | | | | | Originally committed in rL342210 but was reverted in rL342260 because it was causing issues in vectorized code, because I had forgotten to ensure that we're operating on scalar values. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 342870
* [ARM] Fix unwind information for floating point registersOliver Stannard2018-09-191-0/+15
| | | | | | | | | | | | Fixes the unwind information generated for floating-point registers. Previously, all padding registers were assumed to be four bytes wide. Now, the width of the register is used to specify the amount of padding. Patch by Jackson Woodruff! Differential revision: https://reviews.llvm.org/D51494 llvm-svn: 342545
* [TargetLowering] Android has sincos functionsJohn Brawn2018-09-181-0/+18
| | | | | | | | | Since Android API version 9 the Android libm has had the sincos functions, so they should be recognised as libcalls and sincos optimisation should be applied. Differential Revision: https://reviews.llvm.org/D52025 llvm-svn: 342471
* Revert "[ARM] Cleanup ARM CGP isSupportedValue"Volodymyr Sapsai2018-09-181-34/+0
| | | | | | | | | | | | | This reverts r342395 as it caused error > Argument value type does not match pointer operand type! > %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25 > i8 in function atomic_flag_test_and_set > fatal error: error in backend: Broken function found, compilation aborted! on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/ More details are available at https://reviews.llvm.org/D52080 llvm-svn: 342431
* [ARM] Cleanup ARM CGP isSupportedValueSam Parker2018-09-171-0/+34
| | | | | | | | | | isSupportedValue explicitly checked and accepted many types of value, primarily for debugging reasons. Remove most of these checks and do a bit of refactoring now that the pass is more stable. This also enables ZExts to be sources, but this has very little practical benefit at the moment because extend instructions will still be introduced. Differential Revision: https://reviews.llvm.org/D52080 llvm-svn: 342395
* [ARM] Disallow icmp with negative imm and overflowSam Parker2018-09-171-0/+22
| | | | | | | | | | We allow overflowing instructions if they're decreasing and only used by an unsigned compare. Add the extra condition that the icmp cannot be using a negative immediate. Differential Revision: https://reviews.llvm.org/D52102 llvm-svn: 342392
* Revert r342210 "[ARM] bottom-top mul support in ARMParallelDSP"Reid Kleckner2018-09-142-460/+0
| | | | | | | | | | It causes assertion failures while building Skia for Android in Chromium: https://ci.chromium.org/buildbot/chromium.clang/ToTAndroid/4550 Reduction forthcoming. llvm-svn: 342260
* [ARM] bottom-top mul support in ARMParallelDSPSam Parker2018-09-142-0/+460
| | | | | | | | | | On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 342210
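The idea above rests on an equivalence that can be checked directly: two sequential 16-bit loads read the same data as the bottom and top halves of one 32-bit load. The sketch below assumes little-endian memory and models the signed halfword multiplies with hypothetical Python helpers; it is not the pass's pattern matching.

```python
import struct


def load_i16(memory, offset):
    """Signed 16-bit little-endian load."""
    return struct.unpack_from("<h", memory, offset)[0]


def load_i32(memory, offset):
    """32-bit little-endian load, returned as an unsigned bit pattern."""
    return struct.unpack_from("<I", memory, offset)[0]


def to_signed16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def bottom(word):
    return to_signed16(word)


def top(word):
    return to_signed16(word >> 16)


def smulbt(rn, rm):
    """Bottom half of the first operand times top half of the second."""
    return bottom(rn) * top(rm)


def smultb(rn, rm):
    return top(rn) * bottom(rm)


def smultt(rn, rm):
    return top(rn) * top(rm)


# Four sequential signed halfwords in memory.
memory = struct.pack("<4h", 3, -7, 100, 25)
a = load_i32(memory, 0)  # halves: bottom 3, top -7
b = load_i32(memory, 4)  # halves: bottom 100, top 25

print(smulbt(a, b), smultb(a, b), smultt(a, b))
```

Each halfword multiply on the wide loads gives the same product as multiplying the corresponding narrow loads, which is what allows the narrow loads to be replaced.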
* [ARM] Allow truncs as sources in ARM CGPSam Parker2018-09-131-1/+0
| | | | | | | | | | We previously only allowed truncs as sinks, but now allow them as sources too. We do this by checking that the result type is the narrow type that we're trying to optimise for. Differential Revision: https://reviews.llvm.org/D51978 llvm-svn: 342141
* [ARM] Fix FixConst for ARMCodeGenPrepareSam Parker2018-09-131-1/+10
| | | | | | | | | | Part of FixConsts wrongly assumes either a 8- or 16-bit constant which can result in the wrong constants being generated during promotion. Differential Revision: https://reviews.llvm.org/D52032 llvm-svn: 342140
* ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.Tim Northover2018-09-131-0/+49
| | | | | | | | | | | | The Technical Reference Manuals for these two CPUs state that branching to an unaligned 32-bit instruction incurs an extra pipeline reload penalty. That's bad. This also enables the optimization at -Os since it costs on average one byte per loop in return for 1 cycle per iteration, which is pretty good going. llvm-svn: 342127
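The "one byte per loop" figure above follows from simple alignment arithmetic. This sketch assumes that Thumb instruction addresses are 2-byte aligned and that a loop header is equally likely to sit at either residue modulo 4; `padding_needed` is a hypothetical helper.

```python
def padding_needed(address, alignment=4):
    """Bytes of padding required to bring address up to the next aligned boundary."""
    return (-address) % alignment


# A 2-byte-aligned loop header is at offset 0 or 2 modulo 4.
possible_padding = [padding_needed(offset) for offset in (0, 2)]
average_padding = sum(possible_padding) / len(possible_padding)

print(possible_padding, average_padding)
```

Half of the loops need no padding and the other half need two bytes, which averages to one byte per loop under these assumptions.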
* [ARM] Tighten f64<->f16 conversion requirementsDiogo N. Sampaio2018-09-122-0/+29
| | | | | | | | | | | | | | Fix missing Requires fields. Patch by Bernard Ogden (bogden) Reviewers: SjoerdMeijer, javed.absar, t.p.northover Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D51631 llvm-svn: 342061
* [ARM] Exchange MAC operands in ARMParallelDSPSam Parker2018-09-123-0/+737
| | | | | | | | | | | | | | SMLAD and SMLALD instructions also come in the form of SMLADX and SMLALDX which perform an exchange on their second operand. To support this, more of the loads in the MAC candidates are compared for sequential access and a boolean value has been added to BinOpChain. AddMACCandidate has been refactored into a small pattern matching state machine to reduce the amount of duplicated code, but also to enable the matching to be more flexible. CreateParallelMACPairs now iterates through all the candidates to find parallel ones. Differential Revision: https://reviews.llvm.org/D51424 llvm-svn: 342033
* [ARM] Allow bitcasts in ARMCodeGenPrepareSam Parker2018-09-121-0/+44
| | | | | | | | Allow bitcasts in the use-def chains, treating them as sources. Differential Revision: https://reviews.llvm.org/D50758 llvm-svn: 342032
* [ARM] Add smlald support in ARMParallelDSPSam Parker2018-09-113-0/+364
| | | | | | | | | Search from i64 reducing phis, as well as i32, to allow the generation of smlald instructions. Differential Revision: https://reviews.llvm.org/D51101 llvm-svn: 341941
* ARM: fix Thumb2 CodeGen for ldrex with folded frame-index.Tim Northover2018-09-072-0/+121
| | | | | | | | | | | Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a proper addressing-mode and tells the rewriter about it so that encodable offsets are exploited and others are rejected. Should fix PR38828. llvm-svn: 341642
* The initial .text section generated in object files was missing theEric Christopher2018-09-061-0/+3
| | | | | | | | | | | | | | | | | | | | SHF_ARM_PURECODE flag when being built with the -mexecute-only flag. All code sections of an ELF must have the flag set for the final .text section to be execute-only, otherwise the flag gets removed. A HasData flag is added to MCSection to aid in the determination that the section is empty. A virtual setTargetSectionFlags is added to MCELFObjectTargetWriter to allow subclasses to set target specific section flags to be added to sections which we then use in the ARM backend to set SHF_ARM_PURECODE. Patch by Ivan Lozano! Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D48792 llvm-svn: 341593