path: root/llvm/test/CodeGen/AArch64
Commit message | Author | Age | Files | Lines
...
* [AArch64] Re-run load/store optimizer after aggressive tail duplication | Alexandros Lamprineas | 2018-12-17 | 2 | -0/+52
  The Load/Store Optimizer runs before Machine Block Placement. At O3 the Tail Duplication Threshold is set to 4 instructions and this can create new opportunities for the Load/Store Optimizer. It seems worthwhile to run it once again.

  llvm-svn: 349338
* Regenerate neon copy tests. NFCI. | Simon Pilgrim | 2018-12-15 | 1 | -174/+589
  llvm-svn: 349270
* [GlobalISel] LegalizerHelper: Implement fewerElementsVector for G_LOAD/G_STORE | Volkan Keles | 2018-12-14 | 2 | -39/+54
  Reviewers: aemerson, dsanders, bogner, paquette, aditya_nandakumar
  Reviewed By: dsanders
  Subscribers: rovka, kristof.beyls, javed.absar, tschuett, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53728

  llvm-svn: 349200
* [globalisel][combiner] Fix r349167 for release mode bots | Daniel Sanders | 2018-12-14 | 1 | -0/+2
  This test relies on -debug-only which is unavailable in non-asserts builds.

  llvm-svn: 349174
* [globalisel][combiner] Make the CombinerChangeObserver a MachineFunction::Delegate | Daniel Sanders | 2018-12-14 | 1 | -12/+47
  Summary:
  This allows us to register it with the MachineFunction delegate and be notified automatically about erasure and creation of instructions. However, we still need explicit notification for modifications such as those caused by setReg() or replaceRegWith().

  There is a catch with this though. The notification for creation is delivered before any operands can be added. While appropriate for scheduling combiner work, this is unfortunate for debug output since an opcode by itself doesn't provide sufficient information on what happened. As a result, the work list remembers the instructions (when debug output is requested) and emits a more complete dump later.

  Another nit is that the MachineFunction::Delegate provides const pointers, which is inconvenient since we want to use it to schedule future modification. To resolve this, GISelWorkList now has an optional pointer to the MachineFunction which describes the scope of the work it is permitted to schedule. If a given MachineInstr* is in this function then it is permitted to schedule work to be performed on the MachineInstr's. An alternative to this would be to remove the const from the MachineFunction::Delegate interface, however delegates are not permitted to modify the MachineInstr's they receive.

  In addition to this, the observer has three interface changes.
  * erasedInstr() is now erasingInstr() to indicate it is about to be erased but still exists at the moment.
  * changingInstr() and changedInstr() have been added to report changes before and after they are made. This allows us to trace the changes in the debug output.
  * As a convenience changingAllUsesOfReg() and finishedChangingAllUsesOfReg() will report changingInstr() and changedInstr() for each use of a given register. This is primarily useful for changes caused by MachineRegisterInfo::replaceRegWith()

  With this in place, both combine rules have been updated to report their changes to the observer.

  Finally, make some cosmetic changes to the debug output and make Combiner and CombinerHelp

  Reviewers: aditya_nandakumar, bogner, volkan, rtereshin, javed.absar
  Reviewed By: aditya_nandakumar
  Subscribers: mgorny, rovka, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52947

  llvm-svn: 349167
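The observer hooks named in the commit above can be illustrated with a small, self-contained sketch. This is not LLVM's code: `ToyInstr`, `ToyChangeObserver`, `TraceObserver` and the free function `replaceRegWith` are hypothetical stand-ins, and the hook signatures are assumptions. Only the hook names `erasingInstr`, `changingInstr` and `changedInstr` come from the commit message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction (not LLVM's MachineInstr).
struct ToyInstr {
  std::string Opcode;
  unsigned Reg;
};

// Observer interface using the hook names from the commit message.
// The signatures are assumptions made for this sketch.
struct ToyChangeObserver {
  virtual ~ToyChangeObserver() = default;
  virtual void erasingInstr(const ToyInstr &MI) = 0;  // MI still exists here
  virtual void changingInstr(const ToyInstr &MI) = 0; // before a modification
  virtual void changedInstr(const ToyInstr &MI) = 0;  // after a modification
};

// Records every notification so the changes can be traced afterwards.
struct TraceObserver : ToyChangeObserver {
  std::vector<std::string> Log;
  void erasingInstr(const ToyInstr &MI) override {
    Log.push_back("erasing " + MI.Opcode);
  }
  void changingInstr(const ToyInstr &MI) override {
    Log.push_back("changing " + MI.Opcode);
  }
  void changedInstr(const ToyInstr &MI) override {
    Log.push_back("changed " + MI.Opcode);
  }
};

// Rewrites every use of OldReg to NewReg, bracketing each edit with
// changingInstr()/changedInstr(). Returns the number of rewritten uses.
inline unsigned replaceRegWith(std::vector<ToyInstr> &Fn, unsigned OldReg,
                               unsigned NewReg, ToyChangeObserver &Obs) {
  assert(OldReg != NewReg && "nothing to replace");
  unsigned NumChanged = 0;
  for (ToyInstr &MI : Fn) {
    if (MI.Reg != OldReg)
      continue;
    Obs.changingInstr(MI);
    MI.Reg = NewReg;
    Obs.changedInstr(MI);
    ++NumChanged;
  }
  return NumChanged;
}
```

Reporting both before and after each edit is what lets a debug trace show the old and the new form of an instruction, which a single "changed" callback could not do.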
* [AArch64] make test immune to scalarization improvements; NFC | Sanjay Patel | 2018-12-14 | 1 | -5/+6
  This is explicitly implementing what the comment says rather than relying on the implicit zext of a constant operand.

  llvm-svn: 349166
* [AArch64] Catch some more CMN opportunities. | Arnaud A. de Grandmaison | 2018-12-13 | 1 | -0/+399
  Fixes https://bugs.llvm.org/show_bug.cgi?id=33486

  llvm-svn: 349022
* [CodeGen] Allow memcpy/memset to generate small overlapping stores. | Clement Courbet | 2018-12-13 | 1 | -7/+5
  Summary:
  All targets either just return false here or properly model `Fast`, so I don't think there is any reason to prevent CodeGen from doing the right thing here.

  Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55365

  llvm-svn: 349016
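As a rough illustration of what "small overlapping stores" can mean, the sketch below copies 4 to 8 bytes with two fixed-size 4-byte accesses whose ranges may overlap, instead of a chain of 4-, 2- and 1-byte accesses. The helper name `copy4to8` is hypothetical and the code only models the idea at the source level; it is not what the backend emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies Size bytes (4 <= Size <= 8) using two 4-byte loads and two 4-byte
// stores. The second access is anchored at the end of the buffer, so for
// Size < 8 it overlaps the first one by 8 - Size bytes.
inline void copy4to8(unsigned char *Dst, const unsigned char *Src,
                     std::size_t Size) {
  assert(Size >= 4 && Size <= 8 && "sketch only handles 4..8 bytes");
  std::uint32_t Head, Tail;
  std::memcpy(&Head, Src, 4);            // bytes [0, 4)
  std::memcpy(&Tail, Src + Size - 4, 4); // bytes [Size - 4, Size)
  std::memcpy(Dst, &Head, 4);
  std::memcpy(Dst + Size - 4, &Tail, 4);
}
```

For a 7-byte copy this touches bytes 0..3 and 3..6, writing byte 3 twice with the same value, which is harmless and saves an access compared with 4 + 2 + 1.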
* Revert r348843 "[CodeGen] Allow memcpy/memset to generate small overlapping stores." | Clement Courbet | 2018-12-11 | 1 | -5/+7
  Breaks ARM/memcpy-inline.ll

  llvm-svn: 348844
* [CodeGen] Allow memcpy/memset to generate small overlapping stores. | Clement Courbet | 2018-12-11 | 1 | -7/+5
  Summary:
  All targets either just return false here or properly model `Fast`, so I don't think there is any reason to prevent CodeGen from doing the right thing here.

  Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55365

  llvm-svn: 348843
* [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes. | Amara Emerson | 2018-12-10 | 8 | -53/+407
  This patch restricts the capability of G_MERGE_VALUES, and uses the new G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.

  This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32> and <2 x s64> vectors.

  Differential Revision: https://reviews.llvm.org/D53629

  llvm-svn: 348788
* [GlobalISel] Set stack protector index when translating Intrinsic::stackprotector | Petr Pavlu | 2018-12-10 | 1 | -0/+3
  Record the stack protector index in MachineFrameInfo when translating Intrinsic::stackprotector similarly as is done by SelectionDAG when processing the same intrinsic.

  Setting this index allows the Prologue/Epilogue Insertion to recognize that the stack protection is enabled. The pass can then make sure that the stack protector comes before local variables on the stack and assigns potentially vulnerable objects first so they are close to the stack protector slot.

  Differential Revision: https://reviews.llvm.org/D55418

  llvm-svn: 348761
* [GlobalISel] Add IR translation support for the @llvm.log10 intrinsic | Jessica Paquette | 2018-12-07 | 2 | -0/+13
  This adds IR translation support for @llvm.log10 and updates relevant tests.

  https://reviews.llvm.org/D55392

  llvm-svn: 348657
* [DAGCombiner] use root SDLoc for all nodes created by logic fold | Sanjay Patel | 2018-12-07 | 1 | -6/+6
  If this is not a valid way to assign an SDLoc, then we get this wrong all over SDAG.

  I don't know enough about the SDAG to explain this. IIUC, theoretically, debug info is not supposed to affect codegen. But here it has clearly affected 3 different targets, and the x86 change is an actual improvement.

  llvm-svn: 348552
* AArch64: Fix invalid CCMP emission | Matthias Braun | 2018-12-06 | 1 | -4/+68
  The code emitting AND-subtrees used to check whether any of the operands was an OR in order to figure out if the result needs to be negated. However the OR could be hidden in further subtrees and not immediately visible.

  Change the code so that canEmitConjunction() determines whether the result of the generated subtree needs to be negated. Cleanup emission logic to use this. I also changed the code a bit to make all negation decisions early before we actually emit the subtrees.

  This fixes http://llvm.org/PR39550

  Differential Revision: https://reviews.llvm.org/D54137

  llvm-svn: 348444
* [GlobalISel] Introduce G_BUILD_VECTOR, G_BUILD_VECTOR_TRUNC and G_CONCAT_VECTOR opcodes. | Amara Emerson | 2018-12-05 | 1 | -0/+9
  These opcodes are intended to subsume some of the capability of G_MERGE_VALUES, as it was too powerful and thus complex to deal with throughout the GISel pipeline.

  G_BUILD_VECTOR creates a vector value from a sequence of uniformly typed scalar values. G_BUILD_VECTOR_TRUNC is a special opcode for handling scalar operands which are larger than the destination vector element type, and therefore does an implicit truncate.

  G_CONCAT_VECTOR creates a vector by concatenating smaller, uniformly typed, vectors together.

  These will be used in a subsequent commit. This commit just adds the initial infrastructure.

  Differential Revision: https://reviews.llvm.org/D53594

  llvm-svn: 348430
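The semantics described for the three opcodes can be modelled on plain containers. The sketch below is only a value-level illustration with hypothetical helper names (`buildVector`, `buildVectorTrunc`, `concatVectors`); it assumes 32-bit lanes and 64-bit source scalars for the truncating case and says nothing about how GlobalISel represents or legalizes these instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// G_BUILD_VECTOR: uniformly typed scalars become the lanes of one vector.
inline std::vector<std::uint32_t>
buildVector(const std::vector<std::uint32_t> &Scalars) {
  return Scalars;
}

// G_BUILD_VECTOR_TRUNC: the scalars are wider than the element type, so each
// one is implicitly truncated (here from 64 to 32 bits).
inline std::vector<std::uint32_t>
buildVectorTrunc(const std::vector<std::uint64_t> &Scalars) {
  std::vector<std::uint32_t> Lanes;
  Lanes.reserve(Scalars.size());
  for (std::uint64_t S : Scalars)
    Lanes.push_back(static_cast<std::uint32_t>(S));
  return Lanes;
}

// G_CONCAT_VECTOR: smaller vectors of the same type are laid end to end.
inline std::vector<std::uint32_t>
concatVectors(const std::vector<std::vector<std::uint32_t>> &Parts) {
  assert(!Parts.empty() && "need at least one source vector");
  std::vector<std::uint32_t> Out;
  for (const std::vector<std::uint32_t> &P : Parts) {
    assert(P.size() == Parts.front().size() && "parts must be uniformly typed");
    Out.insert(Out.end(), P.begin(), P.end());
  }
  return Out;
}
```

Splitting these cases apart is what allows each opcode to carry a simple type rule (scalars to vector, wide scalars to vector, vectors to vector), where a single merge operation had to cover all of them.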
* [MachineOutliner] Outline functions by order of benefit | Jessica Paquette | 2018-12-05 | 1 | -0/+106
  Mostly NFC, only change is the order of outlined function names.

  Loop over the outlined functions instead of walking the candidate list.

  This is a bit easier to understand. It's far more natural to create a function, then replace all of its occurrences with calls than the other way around.

  The functions outlined after this do not change, but their names will be decided by their benefit. E.g., OUTLINED_FUNCTION_0 will now always be the most beneficial function, rather than the first one seen.

  This makes it easier to enforce an ordering on the outlined functions. So, this also adds a test to make sure that the ordering works as expected.

  llvm-svn: 348414
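The naming rule described above (index 0 goes to the most beneficial function) amounts to sorting by benefit before assigning names. A minimal sketch, assuming a hypothetical `ToyOutlinedFunction` record whose `Benefit` field stands for the bytes saved; the real outliner's data structures are not shown here:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one outlined function.
struct ToyOutlinedFunction {
  std::string Body;  // label for the repeated instruction sequence
  unsigned Benefit;  // assumed measure: bytes saved by outlining it
  std::string Name;  // assigned below
};

// Sorts by descending benefit, then names the functions in that order, so
// OUTLINED_FUNCTION_0 is always the most beneficial one.
inline void nameByBenefit(std::vector<ToyOutlinedFunction> &Fns) {
  std::stable_sort(Fns.begin(), Fns.end(),
                   [](const ToyOutlinedFunction &A,
                      const ToyOutlinedFunction &B) {
                     return A.Benefit > B.Benefit;
                   });
  assert((Fns.empty() || Fns.front().Benefit >= Fns.back().Benefit) &&
         "most beneficial function must come first");
  for (std::size_t I = 0; I < Fns.size(); ++I)
    Fns[I].Name = "OUTLINED_FUNCTION_" + std::to_string(I);
}
```

A stable sort keeps functions with equal benefit in discovery order, which makes the resulting names deterministic and therefore testable.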
* AArch64: support funclets in fastcall and swift_call | Saleem Abdulrasool | 2018-12-05 | 1 | -0/+36
  Functions annotated with `__fastcall` or `__attribute__((__fastcall__))` or `__attribute__((__swiftcall__))` may contain SEH handlers even on Win64. This matches the behaviour of cl which allows for `__try`/`__except` inside a `__fastcall` function. This was detected while trying to self-host clang on Windows ARM64.

  llvm-svn: 348337
* [AArch64][GlobalISel] Re-enable selection of volatile loads. | Amara Emerson | 2018-12-05 | 1 | -1/+1
  We previously disabled this in r323371 because of a bug where we selected an extending load, but didn't delete the old G_LOAD, resulting in two loads being generated for volatile loads.

  Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and the tablegen patterns should no longer be able to select (ext(load x)) patterns, it should be safe to re-enable it.

  The old test case should still work as expected.

  llvm-svn: 348320
* [ARM64][Windows] Fix local stack size for funclets | Sanjin Sijaric | 2018-12-04 | 1 | -0/+53
  The comment was misplaced, and the code didn't do what the comment indicated, namely ignoring the varargs portion when computing the local stack size of a funclet in emitEpilogue. This results in incorrect offset computations within funclets that are contained in vararg functions.

  Differential Revision: https://reviews.llvm.org/D55096

  llvm-svn: 348222
* [MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo | Jessica Paquette | 2018-12-04 | 1 | -0/+72
  This moves the stack check logic into a lambda within getOutliningCandidateInfo.

  This allows us to be less conservative with stack checks. Whether or not a stack instruction is safe to outline is dependent on the frame variant and call variant of the outlined function; only in cases where we modify the stack can these be unsafe.

  So, if we move that logic later, when we're looking at an individual candidate, we can make better decisions here.

  This gives some code size savings as a result.

  llvm-svn: 348220
* [MachineOutliner] Drop candidates that require fixups if it's beneficial | Jessica Paquette | 2018-12-03 | 3 | -0/+314
  If it's a bigger code size win to drop candidates that require stack fixups than to demote every candidate to that variant, the outliner should do that.

  This happens if the number of bytes taken by calls to functions that don't require fixups, plus the number of bytes that'd be left is less than the number of bytes that it'd take to emit a save + restore for all candidates.

  Also add tests for each possible new behaviour.

  - machine-outliner-compatible-candidates shows that when we have candidates that don't use the stack, we can use the default call variant along with the no save/regsave variant.
  - machine-outliner-all-stack shows that when it's better to fix up the stack, we still will demote all candidates to that case
  - machine-outliner-drop-stack shows that we can discard candidates that require stack fixups when it would be beneficial to do so.

  llvm-svn: 348168
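One way to read the size comparison in the commit above is sketched below. The record `ToyCandidate`, the function `shouldDropFixupCandidates` and both byte-count parameters are hypothetical, and the cost model is an interpretation of the commit message, not the outliner's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of one outlining candidate.
struct ToyCandidate {
  bool NeedsStackFixup;      // would require the save + restore call variant
  std::size_t SequenceBytes; // size of the sequence if it is left in place
};

// Returns true when dropping the fixup candidates is the smaller option.
//   Drop:   plain calls for the candidates that need no fixup, plus the bytes
//           of the dropped candidates, which stay un-outlined.
//   Demote: every candidate pays for a save + restore call sequence.
inline bool shouldDropFixupCandidates(const std::vector<ToyCandidate> &Cands,
                                      std::size_t PlainCallBytes,
                                      std::size_t SaveRestoreCallBytes) {
  assert(SaveRestoreCallBytes >= PlainCallBytes &&
         "a save + restore call cannot be smaller than a plain call");
  std::size_t DropCost = 0;
  for (const ToyCandidate &C : Cands)
    DropCost += C.NeedsStackFixup ? C.SequenceBytes : PlainCallBytes;
  const std::size_t DemoteCost = SaveRestoreCallBytes * Cands.size();
  return DropCost < DemoteCost;
}
```

With illustrative sizes of 4 bytes for a plain call and 12 bytes for a save, call and restore, three plain candidates plus one 8-byte fixup candidate cost 20 bytes when the fixup candidate is dropped and 48 bytes when all four are demoted, so dropping wins.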
* [GlobalISel] Fix test irtranslator-stackprotect-check.ll | Petr Pavlu | 2018-12-03 | 1 | -10/+10
  Fix for commit r347862. Use correct AArch64 triple in test CodeGen/AArch64/GlobalISel/irtranslator-stackprotect-check.ll.

  llvm-svn: 348111
* [MachineOutliner][AArch64] Improve checks for stack instructions | Jessica Paquette | 2018-12-01 | 3 | -21/+22
  If we know that we'll definitely save LR to a register, there's no reason to pre-check whether or not a stack instruction is unsafe to fix up.

  This makes it so that we check for that condition before mapping instructions.

  This allows us to outline more, since we don't pessimise as many instructions.

  Also update some tests, since we outline more.

  llvm-svn: 348081
* Replace w16/w17 in machine-outliner.mir with w11/w12 | Jessica Paquette | 2018-12-01 | 1 | -52/+52
  These registers should not be used here, since they are intra-procedure-call scratch registers in AArch64.

  llvm-svn: 348080
* [MachineOutliner] Outline both register save calls + no LR save calls together | Jessica Paquette | 2018-11-30 | 1 | -10/+26
  Instead of treating the outlined functions for these as distinct frames, they should be combined into one case. Neither allows for stack fixups, and both generate the same frame. Thus, they ought to be considered one case.

  This makes the code far easier to understand, for one thing. It also offers some small code size improvements. It's fairly rare to see a class of outlined functions that doesn't fall entirely into one variant (on CTMark anyway). It does happen from time to time though.

  This mostly offers some serious simplification.

  Also update the test to show the added functionality.

  llvm-svn: 348036
* AArch64: Don't emit CFI for SCS register in nounwind functions. | Peter Collingbourne | 2018-11-30 | 1 | -0/+8
  All that you can legitimately do with the CFI for a nounwind function is get a backtrace, and adjusting the SCS register is not (currently) required for this purpose.

  Differential Revision: https://reviews.llvm.org/D54988

  llvm-svn: 348035
* [MachineScheduler] Order FI-based memops based on stack direction | Francis Visoiu Mistrih | 2018-11-29 | 1 | -3/+3
  It makes more sense to order FI-based memops in descending order when the stack goes down. This allows offsets to stay "consecutive" and allows easier pattern matching.

  llvm-svn: 347906
* [GlobalISel] LegalizationArtifactCombiner: Combine aext([asz]ext x) -> [asz]ext x | Volkan Keles | 2018-11-29 | 1 | -4/+58
  Summary:
  Replace `aext([asz]ext x)` with `aext/sext/zext x` in order to reduce the number of instructions generated to clean up some legalization artifacts.

  Reviewers: aditya_nandakumar, dsanders, aemerson, bogner
  Reviewed By: aemerson
  Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54174

  llvm-svn: 347893
* [GlobalISel] Fix insertion of stack-protector epilogue | Petr Pavlu | 2018-11-29 | 1 | -0/+50
  * Tell the StackProtector pass to generate the epilogue instrumentation when GlobalISel is enabled because GISel currently does not implement the same deferred epilogue insertion as SelectionDAG.
  * Update StackProtector::InsertStackProtectors() to find a stack guard slot by searching for the llvm.stackprotector intrinsic when the prologue was not created by StackProtector itself but the pass still needs to generate the epilogue instrumentation. This fixes a problem when the pass would abort because the stack guard AllocInst pointer was null when generating the epilogue -- test CodeGen/AArch64/GlobalISel/arm64-irtranslator-stackprotect.ll.

  Differential Revision: https://reviews.llvm.org/D54518

  llvm-svn: 347862
* [MachineScheduler] Add support for clustering mem ops with FI base operands | Francis Visoiu Mistrih | 2018-11-28 | 2 | -1/+28
  Before this patch, the following stores in `merge_fail` would fail to be merged, while they would get merged in `merge_ok`:

  ```
  void use(unsigned long long *);

  void merge_fail(unsigned key, unsigned index)
  {
    unsigned long long args[8];
    args[0] = key;
    args[1] = index;
    use(args);
  }

  void merge_ok(unsigned long long *dst, unsigned a, unsigned b)
  {
    dst[0] = a;
    dst[1] = b;
  }
  ```

  The reason is that `getMemOpBaseImmOfs` would return false for FI base operands.

  This adds support for this.

  Differential Revision: https://reviews.llvm.org/D54847

  llvm-svn: 347747
* [LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. | Craig Topper | 2018-11-26 | 2 | -25/+25
  SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_UINT until AVX512.

  This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.

  Differential Revision: https://reviews.llvm.org/D54906

  llvm-svn: 347593
* Revert r347490 as it breaks address sanitizer builds | Luke Cheeseman | 2018-11-23 | 1 | -11/+11
  llvm-svn: 347499
* Revert r343341 | Luke Cheeseman | 2018-11-23 | 1 | -11/+11
  - Cannot reproduce the build failure locally and the build logs have been deleted.

  llvm-svn: 347490
* [DAGCombiner] form 'not' ops ahead of shifts (PR39657) | Sanjay Patel | 2018-11-22 | 1 | -3/+2
  We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform.

  I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops.

  https://rise4fun.com/Alive/NC1

      Name: shl
      Pre: (-1 << C2) == C1
      %shl = shl i8 %x, C2
      %r = xor i8 %shl, C1
      =>
      %not = xor i8 %x, -1
      %r = shl i8 %not, C2

      Name: shr
      Pre: (-1 u>> C2) == C1
      %sh = lshr i8 %x, C2
      %r = xor i8 %sh, C1
      =>
      %not = xor i8 %x, -1
      %r = lshr i8 %not, C2

  https://bugs.llvm.org/show_bug.cgi?id=39657

  llvm-svn: 347478
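The two Alive rewrites above can be checked exhaustively for `i8` with ordinary integer arithmetic. The sketch below is an independent sanity check, not part of the patch; all function names are made up for the illustration, and the constant `C1` is derived from `C2` exactly as the preconditions require.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the "shl" rule: xor (shl x, C2), C1 with C1 == -1 << C2.
inline std::uint8_t shlThenXor(std::uint8_t X, unsigned C2) {
  assert(C2 < 8 && "shift amount must be smaller than the bit width");
  const std::uint8_t C1 = static_cast<std::uint8_t>(0xFFu << C2);
  return static_cast<std::uint8_t>(static_cast<std::uint8_t>(X << C2) ^ C1);
}

// Right-hand side of the "shl" rule: shl (xor x, -1), C2.
inline std::uint8_t notThenShl(std::uint8_t X, unsigned C2) {
  assert(C2 < 8 && "shift amount must be smaller than the bit width");
  return static_cast<std::uint8_t>(static_cast<std::uint8_t>(~X) << C2);
}

// Left-hand side of the "shr" rule: xor (lshr x, C2), C1 with C1 == -1 u>> C2.
inline std::uint8_t lshrThenXor(std::uint8_t X, unsigned C2) {
  assert(C2 < 8 && "shift amount must be smaller than the bit width");
  const std::uint8_t C1 = static_cast<std::uint8_t>(0xFFu >> C2);
  return static_cast<std::uint8_t>((X >> C2) ^ C1);
}

// Right-hand side of the "shr" rule: lshr (xor x, -1), C2.
inline std::uint8_t notThenLshr(std::uint8_t X, unsigned C2) {
  assert(C2 < 8 && "shift amount must be smaller than the bit width");
  return static_cast<std::uint8_t>(static_cast<std::uint8_t>(~X) >> C2);
}

// Checks both rules for every 8-bit value and every valid shift amount.
inline bool foldsAgree() {
  for (unsigned C2 = 0; C2 < 8; ++C2)
    for (unsigned V = 0; V < 256; ++V) {
      const std::uint8_t X = static_cast<std::uint8_t>(V);
      if (shlThenXor(X, C2) != notThenShl(X, C2))
        return false;
      if (lshrThenXor(X, C2) != notThenLshr(X, C2))
        return false;
    }
  return true;
}
```

Both rules hold because XOR distributes over a shift: shifting `x ^ 0xFF` is the same as XORing the shifted `x` with the shifted all-ones mask, and that shifted mask is exactly the `C1` required by the precondition.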
* [AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR | John Brawn | 2018-11-22 | 1 | -0/+22
  A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.

  Fix this by making LowerBUILD_VECTOR not do this transformation for those vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element constant vector. Doing that means we get a DUP, which we then need to recognise in ISel as a copy.

  llvm-svn: 347456
* [DAGCombiner] look through bitcasts when trying to narrow vector binops | Sanjay Patel | 2018-11-20 | 1 | -1/+3
  This is another step in vector narrowing - a follow-up to D53784 (and hoping to eventually squash potential regressions seen in D51553).

  The x86 test diffs are wins, but the AArch64 diff is probably not. That problem already exists independent of this patch (see PR39722), but it went unnoticed in the previous patch because there were no regression tests that showed the possibility.

  The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling concerns with using wider vector ops, an extra extract to reduce vector width is the right trade-off at this level of codegen.

  Differential Revision: https://reviews.llvm.org/D54392

  llvm-svn: 347356
* [AArch64, x86] add tests for shift-not (PR39657); NFC | Sanjay Patel | 2018-11-20 | 1 | -0/+18
  llvm-svn: 347316
* [CodeGen] Add pass to combine interleaved loads. | Martin Elshuber | 2018-11-19 | 2 | -0/+411
  This patch defines an interleaved-load-combine pass. The pass searches for ShuffleVector instructions that represent interleaved loads. Matches are converted such that they will be captured by the InterleavedAccessPass.

  The pass extends LLVM's capabilities to use target specific instruction selection of interleaved load patterns (e.g. ld4 on AArch64 architectures).

  Differential Revision: https://reviews.llvm.org/D52653

  llvm-svn: 347208
* DAG combiner: fold (select, C, X, undef) -> X | Stanislav Mekhanoshin | 2018-11-16 | 1 | -2/+2
  Differential Revision: https://reviews.llvm.org/D54646

  llvm-svn: 347110
* AArch64: Emit a call frame instruction for the shadow call stack register. | Peter Collingbourne | 2018-11-16 | 1 | -0/+1
  When unwinding past a function that uses shadow call stack, we must subtract 8 from the value of the x18 register. This patch causes us to emit a call frame instruction that causes that to happen.

  Differential Revision: https://reviews.llvm.org/D54609

  llvm-svn: 347089
* [ARM64] [Windows] Handle funclets | Eli Friedman | 2018-11-09 | 4 | -0/+357
  This patch adds support for funclets in frame lowering and ISel lowering. Together with D50288 and D50166, it enables C++ exception handling.

  Patch by Sanjin Sijaric, with some fixes by me.

  Differential Revision: https://reviews.llvm.org/D51524

  llvm-svn: 346568
* [AArch64] Support HiSilicon's TSV110 processor | Bryan Chan | 2018-11-09 | 2 | -0/+2
  Reviewers: t.p.northover, SjoerdMeijer, kristof.beyls
  Reviewed By: kristof.beyls
  Subscribers: olista01, javed.absar, kristof.beyls, kristina, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53908

  llvm-svn: 346546
* [SelectionDAG] swap select_cc operands to enable folding | Alexandros Lamprineas | 2018-11-09 | 1 | -0/+54
  The DAGCombiner tries to SimplifySelectCC as follows:

      select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4)

  It can't cope with the situation of reordered operands:

      select_cc(x, y, 0, 16, cc)

  In that case we just need to swap the operands and invert the Condition Code:

      select_cc(x, y, 16, 0, ~cc)

  Differential Revision: https://reviews.llvm.org/D53236

  llvm-svn: 346484
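The operand swap described above can be modelled on plain integers. In the sketch below, `CondCode`, `setCC`, `selectCC`, `log2Exact` and `foldSelectCC` are all hypothetical names, only two condition codes are modelled, and the code illustrates the arithmetic identity rather than the DAGCombiner implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Two condition codes are enough to show the idea; each is the inverse of
// the other.
enum class CondCode { LT, GE };

inline CondCode invert(CondCode CC) {
  return CC == CondCode::LT ? CondCode::GE : CondCode::LT;
}

inline bool setCC(int X, int Y, CondCode CC) {
  return CC == CondCode::LT ? X < Y : X >= Y;
}

// Reference semantics of select_cc(x, y, TrueV, FalseV, cc).
inline std::uint32_t selectCC(int X, int Y, std::uint32_t TrueV,
                              std::uint32_t FalseV, CondCode CC) {
  return setCC(X, Y, CC) ? TrueV : FalseV;
}

// Returns log2(V) for a power of two V.
inline unsigned log2Exact(std::uint32_t V) {
  assert(V != 0 && (V & (V - 1)) == 0 && "expected a power of two");
  unsigned Shift = 0;
  while ((V >> Shift) != 1)
    ++Shift;
  return Shift;
}

// Folded form: shl(zext(setcc(x, y, cc)), log2(C)). If the zero sits in the
// "true" position, swap the operands and invert the condition code first.
inline std::uint32_t foldSelectCC(int X, int Y, std::uint32_t TrueV,
                                  std::uint32_t FalseV, CondCode CC) {
  if (TrueV == 0) {
    std::swap(TrueV, FalseV);
    CC = invert(CC);
  }
  assert(FalseV == 0 && "fold needs one zero operand");
  return static_cast<std::uint32_t>(setCC(X, Y, CC)) << log2Exact(TrueV);
}
```

The swap is valid because selecting `0` when the condition holds and `16` otherwise is the same as selecting `16` when the inverted condition holds and `0` otherwise.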
* [COFF, ARM64] Add support for MSVC buffer security check | Mandeep Singh Grang | 2018-11-09 | 1 | -0/+8
  Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan
  Reviewed By: rnk
  Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54248

  llvm-svn: 346469
* [DAGCombine] Improve alias analysis for chain of independent stores. | Nirav Dave | 2018-11-08 | 3 | -15/+13
  FindBetterNeighborChains simultaneously improves the chain dependencies of a chain of related stores, avoiding the generation of extra token factors. For chains longer than the GatherAllAliasDepths, stores further down in the chain will necessarily fail, a potentially significant waste and preventing otherwise trivial parallelization.

  This patch directly parallelizes the chains of stores before improving each store. This generally improves DAG-level parallelism.

  Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk
  Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53552

  llvm-svn: 346432
* [AArch64] [Windows] Trap after noreturn calls. | Eli Friedman | 2018-11-07 | 1 | -0/+17
  Like the comment says, this isn't the most efficient fix in terms of codesize, but it works.

  Differential Revision: https://reviews.llvm.org/D54129

  llvm-svn: 346358
* [AArch64][GlobalISel] Simplify and autogenerate the legalizer tests | Volkan Keles | 2018-11-06 | 18 | -807/+365
  llvm-svn: 346253
* Reland r346166: [GlobalISel] Refactor the artifact combiner a bit by using MIPatternMatch | Volkan Keles | 2018-11-06 | 1 | -0/+21
  It was causing a crash because we were trying to get the definition of a target register. Fixed the issue by adding a check and added a test case for that.

  llvm-svn: 346251
* [NFC][x86][AArch64] extract-bits.ll: add test with 'ashr'. | Roman Lebedev | 2018-11-05 | 1 | -0/+32
  llvm-svn: 346121