path: root/llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 (Fangrui Song, 2019-12-24, 1 file, -12/+12)
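A hedged sketch of the mechanical spelling change this cleanup makes in a
test's attribute list; the function name and attribute group below are
hypothetical, not copied from the test file:

    ; Before: attributes #0 = { nounwind "no-frame-pointer-elim"="true" }
    ; After the migration, the same intent uses the new attribute:
    define void @keeps_frame_pointer() #0 {
    entry:
      ret void
    }
    attributes #0 = { nounwind "frame-pointer"="all" }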
* [ARM] Generate vcmp instead of vcmpe (Kristof Beyls, 2019-10-08, 1 file, -2/+2)

    Based on the discussion in
    http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
    conclusion was reached that the ARM backend should produce vcmp instead
    of vcmpe instructions by default, i.e. not produce an Invalid Operation
    exception when either argument in a floating-point compare is a quiet
    NaN.

    In the future, after constrained floating-point intrinsics for
    floating-point compare have been introduced, vcmpe instructions should
    probably be produced for those intrinsics, depending on the exact
    semantics they'll be defined to have.

    This patch logically consists of the following parts:

    - Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
      http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
      implemented fine-tuning for when to produce vcmpe (i.e. not doing so
      for equality comparisons). The complexity introduced by those patches
      isn't needed anymore if we just always produce vcmp instead. These
      patches may need to be reintroduced once support is needed to map
      potential LLVM IR constrained floating-point compare intrinsics to
      the ARM instruction set.
    - Simply select vcmp instead of vcmpe; see the simple changes in
      lib/Target/ARM/ARMInstrVFP.td.
    - Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
      of these tests, the intent of what is tested isn't related to whether
      the compare should produce an Invalid Operation exception or not.

    Fixes PR43374.

    Differential Revision: https://reviews.llvm.org/D68463

    llvm-svn: 374025
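As an illustration (a hypothetical function, not taken from the actual test
file), an ordinary IR fcmp like the one below is the kind of compare that
now selects vcmp rather than vcmpe, so a quiet-NaN operand no longer raises
Invalid Operation:

    define i32 @quiet_nan_compare(float %a, float %b) {
    entry:
      %cmp = fcmp oeq float %a, %b   ; lowered to vcmp after this change
      %r = zext i1 %cmp to i32
      ret i32 %r
    }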
* [FIX] Forces shrink wrapping to consider any memory access as aliasing with the stack (Diogo N. Sampaio, 2019-06-13, 1 file, -203/+1398)

    Summary:
    Related bug: https://bugs.llvm.org/show_bug.cgi?id=37472

    The shrink wrapping pass prematurely restores the stack, at a point
    where the stack might still be accessed. Taking an exception can then
    cause the stack to be corrupted. As a first approach, this patch is
    overly conservative, assuming that any instruction that may load or
    store could access the stack.

    Reviewers: dmgreen, qcolombet

    Reviewed By: qcolombet

    Subscribers: simpal01, efriedma, eli.friedman, javed.absar,
    llvm-commits, eugenis, chill, carwil, thegameg

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D63152

    llvm-svn: 363265
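A hedged sketch of the hazard, in the typed-pointer IR of the period (a
hypothetical function, not from the bug report or the test file): the pass
must now assume the load may touch a stack slot, so it cannot restore the
stack pointer before the load executes:

    define i32 @may_touch_stack(i1 %c, i32* %p) {
    entry:
      %slot = alloca i32
      store i32 1, i32* %slot
      br i1 %c, label %use, label %done
    use:
      %v = load i32, i32* %p     ; conservatively treated as a stack access
      br label %done
    done:
      %r = phi i32 [ %v, %use ], [ 0, %entry ]
      ret i32 %r
    }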
* [SchedModel] Fix for read advance cycles with implicit pseudo operands. (Jonas Paulsson, 2018-10-30, 1 file, -14/+14)

    The SchedModel allows the addition of ReadAdvances to express that
    certain operands of the instructions are needed at a later point than
    the others.

    RegAlloc may add pseudo operands that are not part of the instruction
    descriptor, and therefore cannot have any read advance entries. This
    meant that in some cases the desired read advance was nullified by such
    a pseudo operand, which still had the original latency.

    This patch fixes this by making sure that such pseudo operands get a
    zero latency during DAG construction.

    Review: Matthias Braun, Ulrich Weigand.
    https://reviews.llvm.org/D49671

    llvm-svn: 345606
* Reapply ARM: Do not spill CSR to stack on entry to noreturn functions (Tim Northover, 2018-04-07, 1 file, -1/+1)

    Should fix the UBSan bot by also checking there's no "uwtable"
    attribute before skipping. Otherwise the unwind table will be useless,
    since its moves expect CSRs to actually be preserved.

    A noreturn nounwind function can be expected to never return in any
    way, and by never returning it will also never have to restore any
    callee-saved registers for its caller. This makes it possible to skip
    spills of those registers during function entry, saving some stack
    space and time in the process. This is rather useful for embedded
    targets with limited stack space.

    Should fix PR9970.

    Patch mostly by myeisha (pmb).

    llvm-svn: 329494
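A minimal sketch (hypothetical function names, not from the test file) of
the kind of function that now skips callee-saved spills; note it is both
noreturn and nounwind, and carries no "uwtable" attribute:

    declare void @abort_now() noreturn nounwind

    ; No CSR spills are needed on entry: this function can never return,
    ; so its caller's callee-saved registers never need to be restored.
    define void @fatal_error() noreturn nounwind {
    entry:
      tail call void @abort_now()
      unreachable
    }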
* Revert "ARM: Do not spill CSR to stack on entry to noreturn functions"Vitaly Buka2018-04-071-1/+1
| | | | | | | | Breaks ubsan test TestCases/Misc/missing_return.cpp on ARM This reverts commit r329287 llvm-svn: 329486
* ARM: Do not spill CSR to stack on entry to noreturn functions (Tim Northover, 2018-04-05, 1 file, -1/+1)

    A noreturn nounwind function can be expected to never return in any
    way, and by never returning it will also never have to restore any
    callee-saved registers for its caller. This makes it possible to skip
    spills of those registers during function entry, saving some stack
    space and time in the process. This is rather useful for embedded
    targets with limited stack space.

    Should fix PR9970.

    Patch by myeisha (pmb).

    llvm-svn: 329287
* Fix test broken by r304020 (David Blaikie, 2017-05-26, 1 file, -1/+6)

    It's a workaround, because the test was only flakily passing to begin
    with, but it looks like (going off commit history) it really did want
    to test in the presence of debug info, so keep that behavior (by adding
    something to the CU so it's not dropped) and restore the flaky pass in
    the process. (Added a FIXME in case someone else decides to look at it
    later.)

    llvm-svn: 304042
* [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently (Sjoerd Meijer, 2016-12-15, 1 file, -4/+3)

    This is essentially a recommit of r285893, but with a correctness fix.
    The problem with the original commit was that this:

        bic r5, r7, #31
        cbz r5, .LBB2_10

    got rewritten into:

        lsrs r5, r7, #5
        beq  .LBB2_10

    The result in destination register r5 is not the same, and this is
    incorrect when r5 is not dead. So this fix includes checking the uses
    of the AND destination register. And also, compared to the original
    commit, some regression tests didn't need changing anymore because of
    this extra check.

    For completeness, this was the original commit message:

    For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some
    more efficient instruction selection if the bitmask is one consecutive
    sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

    1) If the bitmask touches the LSB, then we can remove all the upper
       bits and set the flags by doing one LSLS.
    2) If the bitmask touches the MSB, then we can remove all the lower
       bits and set the flags with one LSRS.
    3) If the bitmask has popcount == 1 (only one set bit), we can shift
       that bit into the sign bit with one LSLS and change the condition
       query from NE/EQ to MI/PL (we could also implement this by shifting
       into the carry bit and branching on BCC/BCS).
    4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper
       and lower zero bits of the mask.

    1-3 require only one 16-bit instruction and can elide the CMP. 4
    requires two 16-bit instructions but can elide the CMP and doesn't
    require materializing a complex immediate, so is also a win.

    Differential Revision: https://reviews.llvm.org/D27761

    llvm-svn: 289794
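To make the pattern concrete, here is a hedged IR sketch (hypothetical
function, not from the test file): the mask 0xFFFFFFE0 touches the MSB, so
case 2 applies and the flags can be set with a single LSRS instead of
BIC+CMP:

    define i32 @mask_is_zero(i32 %x) {
    entry:
      %and = and i32 %x, -32            ; mask 0xFFFFFFE0: bits 5..31 set
      %cmp = icmp eq i32 %and, 0        ; (CMPZ (AND x, #bitmask), #0)
      %r = zext i1 %cmp to i32
      ret i32 %r
    }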
* ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps() (Matthias Braun, 2016-11-11, 1 file, -0/+1)

    addSchedBarrierDeps() is supposed to add use operands to the ExitSU
    node. The current implementation adds uses for calls/barrier
    instructions and the MBB live-outs in all other cases. The use operands
    of conditional jump instructions were missed.

    Also added code to macrofusion to set the latencies between nodes to
    zero, to avoid problems with the fusing nodes lingering around in the
    pending list now.

    Differential Revision: https://reviews.llvm.org/D25140

    llvm-svn: 286544
* Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"James Molloy2016-11-031-2/+2
| | | | | | This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 . llvm-svn: 285912
* [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently (James Molloy, 2016-11-03, 1 file, -2/+2)

    This recommits r281323, which was backed out for two reasons: one, a
    selfhost failure, and two, it apparently caused Chromium failures.
    Actually, the latter was a red herring. The log has expired from the
    former, but I suspect that was a red herring too (actually caused by
    another problematic patch of mine). Therefore reapplying, and will
    watch the bots like a hawk.

    For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some
    more efficient instruction selection if the bitmask is one consecutive
    sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

    1) If the bitmask touches the LSB, then we can remove all the upper
       bits and set the flags by doing one LSLS.
    2) If the bitmask touches the MSB, then we can remove all the lower
       bits and set the flags with one LSRS.
    3) If the bitmask has popcount == 1 (only one set bit), we can shift
       that bit into the sign bit with one LSLS and change the condition
       query from NE/EQ to MI/PL (we could also implement this by shifting
       into the carry bit and branching on BCC/BCS).
    4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper
       and lower zero bits of the mask.

    1-3 require only one 16-bit instruction and can elide the CMP. 4
    requires two 16-bit instructions but can elide the CMP and doesn't
    require materializing a complex immediate, so is also a win.

    llvm-svn: 285893
* Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"James Molloy2016-09-141-2/+2
| | | | | | This reverts commit r281323. It caused chromium test failures and a selfhost failure. llvm-svn: 281451
* [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently (James Molloy, 2016-09-13, 1 file, -2/+2)

    For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some
    more efficient instruction selection if the bitmask is one consecutive
    sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

    1) If the bitmask touches the LSB, then we can remove all the upper
       bits and set the flags by doing one LSLS.
    2) If the bitmask touches the MSB, then we can remove all the lower
       bits and set the flags with one LSRS.
    3) If the bitmask has popcount == 1 (only one set bit), we can shift
       that bit into the sign bit with one LSLS and change the condition
       query from NE/EQ to MI/PL (we could also implement this by shifting
       into the carry bit and branching on BCC/BCS).
    4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper
       and lower zero bits of the mask.

    1-3 require only one 16-bit instruction and can elide the CMP. 4
    requires two 16-bit instructions but can elide the CMP and doesn't
    require materializing a complex immediate, so is also a win.

    llvm-svn: 281323
* Revert r281215, it caused PR30358. (Nico Weber, 2016-09-12, 1 file, -2/+2)

    llvm-svn: 281263
* [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently (James Molloy, 2016-09-12, 1 file, -2/+2)

    For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some
    more efficient instruction selection if the bitmask is one consecutive
    sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

    1) If the bitmask touches the LSB, then we can remove all the upper
       bits and set the flags by doing one LSLS.
    2) If the bitmask touches the MSB, then we can remove all the lower
       bits and set the flags with one LSRS.
    3) If the bitmask has popcount == 1 (only one set bit), we can shift
       that bit into the sign bit with one LSLS and change the condition
       query from NE/EQ to MI/PL (we could also implement this by shifting
       into the carry bit and branching on BCC/BCS).
    4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper
       and lower zero bits of the mask.

    1-3 require only one 16-bit instruction and can elide the CMP. 4
    requires two 16-bit instructions but can elide the CMP and doesn't
    require materializing a complex immediate, so is also a win.

    llvm-svn: 281215
* [ARM] Generate consistent frame records for Thumb2 (Oliver Stannard, 2016-08-23, 1 file, -12/+12)

    There is not an official documented ABI for frame pointers in Thumb2,
    but we should try to emit something which is useful.

    We use r7 as the frame pointer for Thumb code, which currently means
    that if a function needs to save a high register (r8-r11), it will get
    pushed to the stack between the frame pointer (r7) and link register
    (r14). This means that while a stack unwinder can follow the chain of
    frame pointers up the stack, it cannot know the offset to lr, so it
    does not know which functions correspond to the stack frames.

    To fix this, we need to push the callee-saved registers in two batches,
    with the first push saving the low registers, fp and lr, and the second
    push saving the high registers. This is already implemented, but
    previously only used for iOS. This patch turns it on for all Thumb2
    targets when frame pointers are required by the ABI and the frame
    pointer is r7 (Windows uses r11, so this isn't a problem there). If
    frame pointer elimination is enabled, we still emit a single push/pop
    even if we need a frame pointer for other reasons, to avoid increasing
    code size.

    We must also ensure that lr is pushed to the stack when using a frame
    pointer, so that we end up with a complete frame record. Situations
    that could cause this were rare, because we already push lr in most
    situations so that we can return using the pop instruction.

    Differential Revision: https://reviews.llvm.org/D23516

    llvm-svn: 279506
* DAG: avoid duplicated truncating for sign extended operand (Weiming Zhao, 2016-07-29, 1 file, -3/+7)

    Summary:
    When performing a cmp for EQ/NE and the operand is sign extended, we
    can avoid the truncation if the bits to be tested are no fewer than the
    original bits.

    Reviewers: eli.friedman

    Subscribers: eli.friedman, aemerson, nemanjai, t.p.northover,
    llvm-commits

    Differential Revision: https://reviews.llvm.org/D22933

    llvm-svn: 277252
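One plausible shape of the pattern, as a hedged sketch (hypothetical
function, not taken from the patch or the test file): the compare consumes
a sign-extended value, and an EQ/NE test of the full width subsumes a test
of the original narrower bits, so no extra truncation is required:

    define i1 @cmp_sext(i16 %a, i32 %b) {
    entry:
      %ext = sext i16 %a to i32      ; operand is sign extended
      %cmp = icmp eq i32 %ext, %b    ; EQ/NE compare of the extended value
      ret i1 %cmp
    }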
* NFC. Fix popular typo in comment 'deferencing' --> 'dereferencing'. (Nick Lewycky, 2016-06-28, 1 file, -1/+1)

    Bonus changes: '*' placement in X86ISelLowering, and 'exerce' ->
    'exercise' in a test.

    llvm-svn: 273984
* [PR27284] Reverse the ownership between DICompileUnit and DISubprogram. (Adrian Prantl, 2016-04-15, 1 file, -1/+1)

    Currently each Function points to a DISubprogram, and DISubprogram has
    a scope field. For member functions the scope is a DICompositeType.
    DIScopes point to the DICompileUnit to facilitate type uniquing.

    Distinct DISubprograms (with isDefinition: true) are not part of the
    type hierarchy and cannot be uniqued. This change removes the
    subprograms list from DICompileUnit and instead adds a pointer to the
    owning compile unit to distinct DISubprograms. This would make it easy
    for ThinLTO to strip unneeded DISubprograms and their transitively
    referenced debug info.

    Motivation
    ----------

    Materializing DISubprograms is currently the most expensive operation
    when doing a ThinLTO build of clang. We want the DISubprogram to be
    stored in a separate Bitcode block (or the same block as the function
    body) so we can avoid having to expensively deserialize all
    DISubprograms together with the global metadata. If a function has been
    inlined into another subprogram, we need to store a reference to the
    block containing the inlined subprogram.

    Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python
    script that updates LLVM IR testcases to the new format.

    http://reviews.llvm.org/D19034
    <rdar://problem/25256815>

    llvm-svn: 266446
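A hedged sketch of the new shape (minimal, hypothetical metadata; real
testcases carry more fields): the distinct DISubprogram now names its
owning compile unit directly, instead of the DICompileUnit listing its
subprograms:

    !llvm.dbg.cu = !{!0}
    !llvm.module.flags = !{!2}

    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1,
                                 emissionKind: FullDebug)
    !1 = !DIFile(filename: "t.c", directory: "/tmp")
    !2 = !{i32 2, !"Debug Info Version", i32 3}
    ; The subprogram points back at its owning unit via 'unit:'.
    !3 = distinct !DISubprogram(name: "f", file: !1, unit: !0,
                                isDefinition: true)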
* testcase gardening: update the emissionKind enum to the new syntax. (NFC) (Adrian Prantl, 2016-04-01, 1 file, -1/+1)

    llvm-svn: 265081
* [ARM] Make sure we do not dereference the end iterator when accessing debug information (Quentin Colombet, 2015-10-15, 1 file, -0/+57)

    Although the problem was always here, it would only be exposed when
    shrink-wrapping is enabled.

    rdar://problem/23110493

    llvm-svn: 250352
* [ShrinkWrap] Refactor the handling of infinite loops in the analysis. (Quentin Colombet, 2015-09-17, 1 file, -0/+62)

    - Strengthen the logic to be sure we hoist the restore point out of the
      current loop. (This fixes a bug with infinite loops; a test was added
      as part of the patch.)
    - Walk over the exit blocks of the current loop to converge to the
      desired restore point in one iteration of the update loop.

    llvm-svn: 247958
* Fix possible infinite loop in shrink wrapping when searching for save/restore points (Kit Barton, 2015-08-06, 1 file, -0/+28)

    There is an infinite loop that can occur in Shrink Wrapping while
    searching for the Save/Restore points.

    Part of this search checks whether the save/restore points are located
    in different loop nests and, if so, uses the (post) dominator trees to
    find the immediate (post) dominator blocks. However, if the current
    block does not have any immediate (post) dominators, then this search
    will result in an infinite loop. This can occur in code containing an
    infinite loop.

    The modification checks whether the immediate (post) dominator is
    different from the current save/restore block. If it is not, then the
    search terminates and the current location is not considered as a valid
    save/restore point for shrink wrapping.

    Phabricator: http://reviews.llvm.org/D11607

    llvm-svn: 244247
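A hedged sketch of the problematic shape (hypothetical function, not the
test added by the patch): the loop block has no exit, so nothing
post-dominates it, which is what sent the save/restore point search
spinning:

    define void @spin_forever() {
    entry:
      br label %loop
    loop:                      ; no exit: no immediate post-dominator exists
      call void @work()
      br label %loop
    }
    declare void @work()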
* [ARM] Make the frame lowering code ready for shrink-wrapping. (Quentin Colombet, 2015-07-22, 1 file, -0/+536)

    Shrink-wrapping can now be tested on ARM with -enable-shrink-wrap.

    Related to <rdar://problem/20821730>

    llvm-svn: 242908
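For context, a classic shrink-wrapping candidate looks roughly like this
(hypothetical function and callee names, not from the test file): the
early-exit path needs no stack frame, so prologue saves can be sunk into
the path that actually makes a call:

    define i32 @shrink_wrap_candidate(i32 %a, i32 %b) {
    entry:
      %cmp = icmp slt i32 %a, %b
      br i1 %cmp, label %slow, label %exit   ; fast path skips the frame
    slow:
      %call = call i32 @expensive(i32 %a)    ; only this path needs saves
      br label %exit
    exit:
      %r = phi i32 [ %call, %slow ], [ %b, %entry ]
      ret i32 %r
    }
    declare i32 @expensive(i32)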