path: root/llvm/test/CodeGen/ARM
Commit message (Author, Age, Files, Lines)
...
* [ARM] GlobalISel: Support array returns (Diana Picus, 2017-05-29, 1 file, -24/+87)
| | | | | | | These are a bit rare in practice, but they don't require anything special compared to array parameters, so support them as well. llvm-svn: 304137
* [ARM] GlobalISel: Support array parameters/arguments (Diana Picus, 2017-05-29, 2 files, -5/+298)
| | | | | | | | | Clang coerces structs into arrays, so it's a good idea to support them. Most of the support boils down to getting the splitToValueTypes helper to actually split types. We then use G_INSERT/G_EXTRACT to deal with the parts. llvm-svn: 304132
* [DAGCombiner] use narrow load to avoid vector extract (Sanjay Patel, 2017-05-27, 2 files, -5/+7)
| | | | | | | | | | | | | | | | | | If we have (extract_subvector(load wide vector)) with no other users, that can just be (load narrow vector). This is intentionally conservative. Follow-ups may loosen the one-use constraint to account for the extract cost or just remove the one-use check. The memop chain updating is based on code that already exists multiple times in x86 lowering, so that should be pulled into a helper function as a follow-up. Background: this is a potential improvement noticed via regressions caused by making x86's peekThroughBitcasts() not loop on consecutive bitcasts (see comments in D33137). Differential Revision: https://reviews.llvm.org/D33578 llvm-svn: 304072
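The fold described above can be illustrated with a minimal sketch. It models memory as a Python list and a vector load as a slice; `load` and `extract_subvector` are hypothetical helpers for illustration only, not LLVM APIs.

```python
def load(memory, offset, count):
    """Model a vector load: `count` consecutive elements starting at `offset`."""
    return memory[offset:offset + count]


def extract_subvector(vector, index, count):
    """Model extract_subvector: `count` elements of `vector` starting at `index`."""
    return vector[index:index + count]


memory = list(range(100, 116))

# Before the fold: load 8 elements, then keep only the upper 4.
wide = load(memory, 4, 8)
before = extract_subvector(wide, 4, 4)

# After the fold: load the 4 wanted elements directly at the adjusted offset.
after = load(memory, 4 + 4, 4)

assert before == after
```

The sketch only shows that both forms read the same elements. It ignores the one-use check and the memory-operand chain update mentioned in the commit message.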
* Fix test broken by r304020 (David Blaikie, 2017-05-26, 1 file, -1/+6)
| | | | | | | | | | It's a workaround because the test was flakey passing to begin with, but it looks like (going off commit history) it really did want to test in the presence of debug info, so keep that behavior (by adding something to the CU so it's not dropped) & restore the flakey pass in the process. (added a FIXME in case someone else decides to look at it later) llvm-svn: 304042
* [ARM] Fix lowering of misaligned memcpy/memset (John Brawn, 2017-05-26, 2 files, -4/+51)
| | | | | | | | | | | | | | | Currently getOptimalMemOpType returns i32 for large enough sizes without checking for alignment, leading to poor code generation when misaligned accesses aren't permitted as we generate a word store then later split it up into byte stores. This means we inadvertently go over the MaxStoresPerMemcpy limit and for memset we splat the memset value into a word then immediately split it up again. Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type to use, but also fix a bug there where it wasn't correctly checking if misaligned memory accesses are allowed. Differential Revision: https://reviews.llvm.org/D33442 llvm-svn: 303990
* [ARM] Add tests for 6-M memcpy/memset code generation (John Brawn, 2017-05-26, 2 files, -10/+32)
| | | | | | Differential Revision: https://reviews.llvm.org/D33495 llvm-svn: 303987
* CodeGen: Rename DEBUG_TYPE to match passnames (Matthias Braun, 2017-05-25, 6 files, -14/+14)
| | | | | | | | Rename the DEBUG_TYPE to match the names of corresponding passes where it makes sense. Also establish the pattern of simply referencing DEBUG_TYPE instead of repeating the passname where possible. llvm-svn: 303921
* AsmPrinter: mark the beginning and the end of a function in verbose mode (Francis Visoiu Mistrih, 2017-05-23, 1 file, -0/+1)
| | | | llvm-svn: 303690
* [ARM] Temporarily disable globals promotion to constant pools to prevent miscompilation (Oleg Ranevskyy, 2017-05-23, 3 files, -15/+15)
| | | | | | | | | | | | | | | | | | | | | Summary: A temporary workaround for PR32780 - rematerialized instructions accessing the same promoted global through different constant pool entries. The patch turns off the globals promotion optimization leaving all its code in place, so that it can be easily turned on once PR32780 is fixed. Since this is a miscompilation issue causing generation of misbehaving code, and the problem is very subtle, the patch might be valuable enough to get into 4.0.1. Reviewers: efriedma, jmolloy Reviewed By: efriedma Subscribers: aemerson, javed.absar, llvm-commits, rengolin, asl, tstellar Differential Revision: https://reviews.llvm.org/D33446 llvm-svn: 303679
* [GlobalISel] IRTranslator: Translate ConstantStruct (Volkan Keles, 2017-05-19, 1 file, -0/+30)
| | | | | | | | | | | | Reviewers: qcolombet, ab, t.p.northover, aditya_nandakumar, dsanders Reviewed By: qcolombet Subscribers: rovka, kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D33317 llvm-svn: 303412
* Fix buildbot failure after rL303327: [BPI] Reduce the probability of unreachable edge to minimal value greater than 0. (Serguei Katkov, 2017-05-18, 1 file, -1/+1)
| | | | | | One more test is updated to meet new branch probability for unreachable branches. llvm-svn: 303329
* Elide stores which are overwritten without being observed. (Nirav Dave, 2017-05-16, 3 files, -10/+10)
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In SelectionDAG, when a store is immediately chained to another store to the same address, elide the first store as it has no observable effects. This causes small improvements when dealing with intrinsics lowered to stores. Test notes: * Many testcases overwrite store addresses multiple times and needed minor changes, mainly making stores volatile to prevent the optimization from optimizing the test away. * Many X86 test cases optimized out instructions associated with va_start. * Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has dependencies to check and can probably be removed and potentially replaced with another test. Reviewers: rnk, john.brawn Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33206 llvm-svn: 303198
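As a rough illustration of the rule in this commit, the sketch below drops a non-volatile store when the very next operation is another store to the same address. The tuple encoding of operations and the function name are assumptions made for this example; the real transformation works on SelectionDAG store chains and must also check details such as store size.

```python
def elide_overwritten_stores(ops):
    """Drop a store that is immediately overwritten by a store to the same address.

    Each op is a tuple (hypothetical encoding for this sketch):
      ("store", address, value, is_volatile) or ("load", address).
    """
    kept = []
    for i, op in enumerate(ops):
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        dead = (
            op[0] == "store"
            and not op[3]              # volatile stores are never elided
            and nxt is not None
            and nxt[0] == "store"
            and nxt[1] == op[1]        # same address, nothing observed in between
        )
        if not dead:
            kept.append(op)
    return kept


ops = [
    ("store", "p", 1, False),  # overwritten by the next store: elided
    ("store", "p", 2, False),  # followed by a load: kept
    ("load", "p"),
    ("store", "q", 3, True),   # volatile: kept even though it is overwritten
    ("store", "q", 4, False),
]
result = elide_overwritten_stores(ops)
```

The volatile case mirrors the test note above: marking stores volatile is how the affected tests keep their stores from being optimized away.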
* Revert "[ARM] Mark LEApcrel instructions as isAsCheapAsAMove" (Renato Golin, 2017-05-16, 3 files, -61/+1)
| | | | | | | | | | | | | | | | | Revert "[ARM] Mark LEApcrel as not having side effects" This reverts commit r303054 and r303053, as they broke the ARM self-hosting buildbots: http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/1550 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/1349 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/1845 Offline investigation on course. llvm-svn: 303193
* [ARM] Mark LEApcrel instructions as isAsCheapAsAMove (John Brawn, 2017-05-15, 2 files, -1/+30)
| | | | | | | | | | | Doing this means that if an LEApcrel is used in two places we will rematerialize instead of generating two MOVs. This is particularly useful for printfs using the same format string, where we want to generate an address into a register that's going to get corrupted by the call. Differential Revision: https://reviews.llvm.org/D32858 llvm-svn: 303054
* [ARM] Mark LEApcrel as not having side effects (John Brawn, 2017-05-15, 1 file, -0/+31)
| | | | | | | | | | | | | | | Doing this lets us hoist it out of loops, and I've also marked it as rematerializable the same as the thumb1 and thumb2 counterparts. It looks like it being marked as such was just a mistake, as the commit that made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the LEApcrelJT instructions were marked as having side-effects, so it looks like the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was accidentally marked as such also. Differential Revision: https://reviews.llvm.org/D32857 llvm-svn: 303053
* [ARM][GlobalISel] Legalize narrow scalar ops by widening (Diana Picus, 2017-05-11, 3 files, -54/+213)
| | | | | | | | | | | | | | This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB and MUL to 32 bits since we only have TableGen patterns for 32 bits. See the commit message for r292827 for more details. At this point we could just remove some of the tests for regbankselect and instruction-select, since we're not going to see any narrow operations at those levels anymore. Instead I decided to update them with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences generated by the legalizer. llvm-svn: 302782
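Widening is safe for ADD, SUB and MUL because the low bits of the result depend only on the low bits of the operands. The sketch below checks this with plain integer arithmetic. The helper names are hypothetical, and `high_a`/`high_b` stand in for the unspecified high bits that an any-extend may leave behind.

```python
import operator


def mask(bits):
    return (1 << bits) - 1


def any_extend(value, from_bits, to_bits, high_bits):
    """Model G_ANYEXT: the low bits are preserved, the high bits are arbitrary."""
    return ((high_bits << from_bits) | (value & mask(from_bits))) & mask(to_bits)


def truncate(value, bits):
    """Model G_TRUNC: keep only the low `bits` bits."""
    return value & mask(bits)


def narrow_op_via_widening(op, a, b, narrow_bits, high_a=0, high_b=0):
    """Compute a narrow ADD/SUB/MUL by extending to 32 bits and truncating back."""
    wide_a = any_extend(a, narrow_bits, 32, high_a)
    wide_b = any_extend(b, narrow_bits, 32, high_b)
    wide_result = op(wide_a, wide_b) & mask(32)
    return truncate(wide_result, narrow_bits)


# Whatever ends up in the high bits, the truncated result matches the 8-bit op.
for op in (operator.add, operator.sub, operator.mul):
    for a in (0, 1, 0x7F, 0x80, 0xFF):
        for b in (0, 3, 0x7F, 0x80, 0xFF):
            expected = truncate(op(a, b), 8)
            assert narrow_op_via_widening(op, a, b, 8, 0xABCDEF, 0x123456) == expected
```

The same reasoning does not carry over to operations such as division or right shifts, where the high bits influence the result, so those need a proper sign or zero extension instead.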
* [ARM][GlobalISel] Support for G_ANYEXT (Diana Picus, 2017-05-11, 2 files, -0/+99)
| | | | | | | | | | | | | | G_ANYEXT can be introduced by the legalizer when widening scalars. Add support for it in the register bank info (same mapping as everything else) and in the instruction selector. When selecting it, we treat it as a COPY, just like G_TRUNC. On this occasion we get rid of some assertions in selectCopy so we can reuse it. This shouldn't be a problem at the moment since we're not supporting any complicated cases (e.g. FPR, different register banks). We might want to separate the paths when we do. llvm-svn: 302778
* Add extra operand to CALLSEQ_START to keep frame part set up previously (Serge Pavlov, 2017-05-09, 1 file, -8/+8)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. 
Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527
* ARM: use divmod libcalls on embedded MachO platforms too. (Tim Northover, 2017-05-08, 2 files, -38/+36)
| | | | | | | The separated libcalls are implemented in terms of __divmodsi4 and __udivmodsi4 anyway, so we should always use them if possible. llvm-svn: 302462
* [ARM][NEON] Add support for ISD::ABS lowering (Simon Pilgrim, 2017-05-08, 1 file, -0/+38)
| | | | | | | | | | Update NEON int_arm_neon_vabs intrinsic to use the ISD::ABS opcode directly Added constant folding tests. Differential Revision: https://reviews.llvm.org/D32938 llvm-svn: 302417
* ARM: Compute MaxCallFrame size early (Matthias Braun, 2017-05-05, 1 file, -0/+24)
| | | | | | | | | | | | | | | | | This exposes a method in MachineFrameInfo that calculates MaxCallFrameSize and calls it after instruction selection in the ARM target. This avoids ARMBaseRegisterInfo::canRealignStack()/ARMFrameLowering::hasReservedCallFrame() giving different answers in early/late phases of codegen. The testcase shows a particular nasty example result of that where we would fail to properly align an alloca. Differential Revision: https://reviews.llvm.org/D32622 llvm-svn: 302303
* MIParser/MIRPrinter: Compute block successors if not explicitly specified (Matthias Braun, 2017-05-05, 4 files, -14/+0)
| | | | | | | | | | | | | | | | | - MIParser: If the successor list is not specified successors will be added based on basic block operands in the block and possible fallthrough. - MIRPrinter: Adds a new `simplify-mir` option, with that option set: Skip printing of block successor lists in cases where the parser is guaranteed to reconstruct it. This means we still print the list if some successor cannot be determined (happens for example for jump tables), if the successor order changes or branch probabilities being unequal. Differential Revision: https://reviews.llvm.org/D31262 llvm-svn: 302289
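The successor reconstruction described above can be sketched roughly as follows. The block and instruction encoding is a simplified assumption for this example: the opcodes `br`, `cbr` and `ret` are invented here, and the actual parser works on MachineInstr operands and target hooks.

```python
def compute_successors(blocks):
    """Infer each block's successors from branch targets plus possible fallthrough.

    `blocks` is an ordered list of (name, instructions); each instruction is an
    (opcode, target_block_or_None) pair. Hypothetical opcodes used here:
    "br" (unconditional branch), "cbr" (conditional branch), "ret" (return).
    """
    successors = {}
    for position, (name, instructions) in enumerate(blocks):
        targets = []
        for _opcode, target in instructions:
            if target is not None and target not in targets:
                targets.append(target)
        last_opcode = instructions[-1][0] if instructions else None
        # A block falls through unless it ends in an unconditional branch or return.
        falls_through = last_opcode not in ("br", "ret")
        if falls_through and position + 1 < len(blocks):
            next_name = blocks[position + 1][0]
            if next_name not in targets:
                targets.append(next_name)
        successors[name] = targets
    return successors


blocks = [
    ("entry", [("cmp", None), ("cbr", "exit")]),
    ("body", [("add", None)]),
    ("exit", [("ret", None)]),
]
inferred = compute_successors(blocks)
```

Cases the commit message lists as still needing an explicit successor list, such as jump tables, a different successor order, or unequal branch probabilities, are outside what this sketch models.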
* [ARM] ACLE Chapter 9 intrinsics (Sam Parker, 2017-05-04, 3 files, -63/+591)
| | | | | | | | | | | | Added the integer data processing intrinsics from ACLE v2.1 Chapter 9 but I have missed out the saturation_occurred intrinsics for now. For the instructions that read and write the GE bits, a chain is included and the only instruction that reads these flags (sel) is only selectable via the implemented intrinsic. Differential Revision: https://reviews.llvm.org/D32281 llvm-svn: 302126
* [XRay] Create an Index of sleds per function (Dean Michael Berris, 2017-05-04, 2 files, -0/+13)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This change adds a new section to the xray-instrumented binary that stores an index into ranges of the instrumentation map, where sleds associated with the same function can be accessed as an array. At runtime, we can get access to this index by function ID offset allowing for selective patching and unpatching by function ID. Each entry in this new section (xray_fn_idx) will include two pointers indicating the start and one past the end of the sleds associated with the same function. These entries will be 16 bytes long on x86 and aarch64. On arm, we align to 16 bytes anyway so the runtime has to take that into consideration. __{start,stop}_xray_fn_idx will be the symbols that the runtime will look for when we implement the selective patching/unpatching by function id APIs. Because XRay synthesizes the function id's in a monotonically increasing manner at runtime now, implementations (and users) can use this table to look up the sleds associated with a specific function. This is useful in implementations that want to do things like: - Implement coverage mode for functions by patching everything pre-main, then as functions are encountered, the installed handler can unpatch the function that's been encountered after recording that it's been called. - Do "learning mode", so that the implementation can figure out some statistical information about function calls by function id for a time being, and then determine which functions are worth uninstrumenting at runtime. - Do "selective instrumentation" where an implementation can specifically instrument only certain function id's at runtime (either based on some external data, or through some other heuristics) instead of patching all the instrumented functions at runtime. 
Reviewers: dblaikie, echristo, chandlerc, javed.absar Subscribers: pelikan, aemerson, kpw, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D32693 llvm-svn: 302109
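A minimal sketch of the index idea: one (start, one-past-end) pair per function, pointing into a flat sled table, so that the sleds of a single function can be found by function ID. The data layout and names are assumptions for illustration, as is the premise that function IDs are contiguous and start at 1. The real section stores pointers, not list indices.

```python
def build_function_index(sleds):
    """Build one (start, one_past_end) range per function.

    `sleds` is a list of (function_id, kind) records in which all sleds of a
    function are adjacent and function ids increase monotonically from 1.
    """
    index = []
    start = 0
    for i in range(1, len(sleds) + 1):
        if i == len(sleds) or sleds[i][0] != sleds[start][0]:
            index.append((start, i))
            start = i
    return index


def sleds_for(sleds, index, function_id):
    """Look up the sleds of one function via the index (ids assumed to start at 1)."""
    start, end = index[function_id - 1]
    return sleds[start:end]


sleds = [
    (1, "entry"), (1, "exit"),
    (2, "entry"), (2, "exit"), (2, "tail"),
    (3, "entry"),
]
index = build_function_index(sleds)
```

With such a table, selectively patching or unpatching one function only touches the slice returned by `sleds_for`, not the whole instrumentation map.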
* ARM: add extra test for addrmode folding. (Tim Northover, 2017-05-03, 1 file, -0/+11)
| | | | | | | I was worried we might replace a mul with a mul+shift even if there were later uses. Turns out to be unfounded but I'd just as well add an actual test for it. llvm-svn: 302051
* ARM: avoid handing a deleted node back to TableGen during ISel. (Tim Northover, 2017-05-02, 1 file, -0/+17)
| | | | | | | | | | When we replaced the multiplicand the destination node might already exist. When that happens the original gets CSEd and deleted. However, it's actually used as the offset so nonsense is produced. Should fix PR32726. llvm-svn: 301983
* ARM: add arm1176j-f processor (Tim Northover, 2017-05-02, 1 file, -1/+4)
| | | | | | | | | I doubt anyone actually uses it, and I'm not even entirely convinced it exists myself; but it is our default for "clang -arch armv6". Functionally, if it does exist it's identical to the arm1176jz-f from LLVM's point of view (the difference is apparently in the "Security Extensions"). llvm-svn: 301962
* [ARM] GlobalISel: Tighten test. NFC (Diana Picus, 2017-04-28, 1 file, -27/+27)
| | | | | | Explicitly check types and load sizes in the IRTranslator test. llvm-svn: 301627
* [ARM] GlobalISel: Fix extended stack operands (Diana Picus, 2017-04-27, 2 files, -4/+52)
| | | | | | | | | | | | Fix a crash when trying to extend a value passed as a sign- or zero-extended stack parameter. The cause of the crash was that we were setting the size of the loaded value to 32 bits, and then trying to extend again to 32 bits. This patch addresses the issue by also introducing a G_TRUNC after the load. This will leave the unused bits to their original values set by the caller, while being consistent about the types. For values that are not extended, we just use a smaller load. llvm-svn: 301531
* [DAGCombiner] add (sext i1 X), 1 --> zext (not i1 X) (Sanjay Patel, 2017-04-26, 1 file, -13/+10)
| | | | | | | | | | | | Besides better codegen, the motivation is to be able to canonicalize this pattern in IR (currently we don't) knowing that the backend is prepared for that. This may also allow removing code for special constant cases in DAGCombiner::foldSelectOfConstants() that was added in D30180. Differential Revision: https://reviews.llvm.org/D31944 llvm-svn: 301457
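The identity behind this fold can be checked exhaustively, since an i1 has only two values. The sketch below models fixed-width integers with masking; the helper names are hypothetical.

```python
def mask(bits):
    return (1 << bits) - 1


def sext_i1(x, bits):
    """Sign-extend a 1-bit value: 1 becomes all ones (-1), 0 stays 0."""
    return mask(bits) if x & 1 else 0


def zext_i1(x, bits):
    """Zero-extend a 1-bit value: the result is just 0 or 1 at any width."""
    return x & 1


def before_fold(x, bits):
    """add (sext i1 X), 1, wrapped to `bits` bits."""
    return (sext_i1(x, bits) + 1) & mask(bits)


def after_fold(x, bits):
    """zext (not i1 X)."""
    return zext_i1((x & 1) ^ 1, bits)


for bits in (8, 16, 32, 64):
    for x in (0, 1):
        assert before_fold(x, bits) == after_fold(x, bits)
```

For X = 1 the sign extension gives all ones, and adding 1 wraps to 0. For X = 0 the sum is 1. Both match the zero-extended negation.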
* Fix signed multiplication with overflow fallback. (Ranjeet Singh, 2017-04-26, 1 file, -0/+16)
| | | | | | | | | | | | | | | | For targets that don't have ISD::MULHS or ISD::SMUL_LOHI for the type and the double width type is illegal, then the two operands are sign extended to twice their size then multiplied to check for overflow. The extended upper halves were mismatched causing an incorrect result. This fixes the mismatch. A test was added for ARM V6-M where the bug was detected. Patch by James Duley. Differential Revision: https://reviews.llvm.org/D31807 llvm-svn: 301404
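The fallback strategy described above, sign-extending both operands to twice the width, multiplying, and then comparing the upper half against the sign of the lower half, can be sketched as follows. This illustrates the general technique under simplified assumptions. It is not the legalizer code, and the function names are invented for the example.

```python
def mask(bits):
    return (1 << bits) - 1


def sign_extend(value, from_bits):
    """Interpret the low `from_bits` bits of `value` as a signed integer."""
    value &= mask(from_bits)
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign


def smul_with_overflow(a, b, bits):
    """Return (low_result, overflowed) for a signed `bits`-wide multiply."""
    # Multiply the sign-extended operands at double width.
    wide = (sign_extend(a, bits) * sign_extend(b, bits)) & mask(2 * bits)
    low = wide & mask(bits)
    high = wide >> bits
    # Without overflow, the upper half is just copies of the lower half's sign bit.
    expected_high = mask(bits) if (low >> (bits - 1)) & 1 else 0
    return low, high != expected_high


# Exhaustive check for 8-bit operands against Python's exact arithmetic.
for a in range(256):
    for b in range(256):
        exact = sign_extend(a, 8) * sign_extend(b, 8)
        low, overflowed = smul_with_overflow(a, b, 8)
        assert low == exact & 0xFF
        assert overflowed == (not -128 <= exact <= 127)
```

The commit message attributes the bug to mismatched extended upper halves. In this sketch both operands pass through the same `sign_extend` helper.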
* [ARM, x86] add more vector tests for bool math; NFC (Sanjay Patel, 2017-04-24, 1 file, -0/+39)
| | | | | | | | I'm proposing a fold for increment-of-sexted-bool in: https://reviews.llvm.org/D31944 ...so we need to know what happens in more cases like these. llvm-svn: 301269
* [ARM] GlobalISel: Legalize s8 and s16 G_(S|U)DIV (Diana Picus, 2017-04-24, 2 files, -4/+203)
| | | | | | | | | | | | | | We have to widen the operands to 32 bits and then we can either use hardware division if it is available or lower to a libcall otherwise. At the moment it is not enough to set the Legalizer action to WidenScalar, since for libcalls it won't know what to do (it won't be able to find what size to widen to, because it will find Libcall and not Legal for 32 bits). To hack around this limitation, we request Custom lowering, and as part of that we widen first and then we run another legalizeInstrStep on the widened DIV. llvm-svn: 301166
* [ARM] GlobalISel: Support G_(S|U)DIV for s32 (Diana Picus, 2017-04-24, 4 files, -0/+225)
| | | | | | | | | Add support for both targets with hardware division and without. For hardware division we have to add support throughout the pipeline (legalizer, reg bank select, instruction select). For targets without hardware division, we only need to mark it as a libcall. llvm-svn: 301164
* [ARM] GlobalISel: Select G_CONSTANT with CImm operands (Diana Picus, 2017-04-24, 2 files, -3/+32)
| | | | | | | | | | | | | | When selecting a G_CONSTANT to a MOVi, we need the value to be an Imm operand. We used to just leave the G_CONSTANT operand unchanged, which works in some cases (such as the GEP offsets that we create when referring to stack slots). However, in many other places the G_CONSTANTs are created with CImm operands. This patch makes sure to handle those as well, and to error out gracefully if in the end we don't end up with an Imm operand. Thanks to Oliver Stannard for reporting this issue. llvm-svn: 301162
* ARM: make sure we use all entries in a vector before forming a vpaddl. (Tim Northover, 2017-04-21, 1 file, -0/+9)
| | | | | | | | | Otherwise there's some mismatch, and we'll either form an illegal type or an illegal node. Thanks to Eli Friedman for pointing out the problem with my original solution. llvm-svn: 301036
* ARM: don't try to create an i8 -> i32 vpaddl. (Tim Northover, 2017-04-21, 1 file, -0/+11)
| | | | | | | | DAG combine was mistakenly assuming that the step-up it was looking at was always a doubling, but it can sometimes be a larger extension in which case we'd crash. llvm-svn: 301002
* [ARM] GlobalISel: Add support for G_TRUNC (Diana Picus, 2017-04-21, 3 files, -0/+77)
| | | | | | | | Select them as copies. We only select if both the source and the destination are on the same register bank, so this shouldn't cause any trouble. llvm-svn: 300971
* [ARM] GlobalISel: Make struct arguments fail elegantly (Diana Picus, 2017-04-21, 1 file, -0/+80)
| | | | | | | | | | | The condition in isSupportedType didn't handle struct/array arguments properly. Fix the check and add a test to make sure we use the fallback path in this kind of situation. The test deals with some common cases where the call lowering should error out. There are still some issues here that need to be addressed (tail calls come to mind), but they can be addressed in other patches. llvm-svn: 300967
* ARM: lower "fence singlethread" to a pure compiler barrier. (Tim Northover, 2017-04-20, 1 file, -0/+16)
| | | | | | | | Single-threaded fences aren't required to provide any synchronization with other processing elements so there's no need for a DMB. They should still be a barrier for compiler optimizations though. llvm-svn: 300904
* ARM: handle post-indexed NEON ops where the offset isn't the access width. (Tim Northover, 2017-04-20, 7 files, -63/+121)
| | | | | | | | | | | Before, we assumed that any ConstantInt offset was precisely the access width, so we could use the "[rN]!" form. ISelLowering only ever created that kind, but further simplification during combining could lead to unexpected constants and incorrect codegen. Should fix PR32658. llvm-svn: 300878
* [DAG] add splat vector support for 'xor' in SimplifyDemandedBits (Sanjay Patel, 2017-04-19, 1 file, -4/+2)
| | | | | | | | | This allows forming more 'not' ops, so we get improvements for ISAs that have and-not. Follow-up to: https://reviews.llvm.org/rL300725 llvm-svn: 300763
* ARMFrameLowering: Reserve emergency spill slot for large arguments (Matthias Braun, 2017-04-19, 1 file, -0/+94)
| | | | | | | | | | | | | | | | Re-commit after revert in r300668. Changed getMaxFPOffset() to a more conservative heuristic instead of trying to be clever and missing for some exotic calling conventions. We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300761
* [ARM] Use TableGen patterns to select vtbl. NFC. (Eli Friedman, 2017-04-19, 1 file, -1/+1)
| | | | | | Differential Revision: https://reviews.llvm.org/D32103 llvm-svn: 300749
* ARM: TLS calling convention doesn't preserve r9 or r12 on Darwin. (Tim Northover, 2017-04-19, 1 file, -0/+24)
| | | | llvm-svn: 300726
* [ARM] add test and auto-generate checks; NFC (Sanjay Patel, 2017-04-19, 1 file, -122/+440)
| | | | llvm-svn: 300698
* Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments" (Renato Golin, 2017-04-19, 1 file, -94/+0)
| | | | | | This reverts commit r300639, as it broke self-hosting on ARM. PR32709. llvm-svn: 300668
* [ARM] GlobalISel: Add support for G_MUL (Diana Picus, 2017-04-19, 4 files, -1/+326)
| | | | | | | | Support G_MUL, very similar to G_ADD and G_SUB. The only difference is in the instruction selector, where we have to select either MUL or MULv5 depending on the target. llvm-svn: 300665
* ARMFrameLowering: Reserve emergency spill slot for large arguments (Matthias Braun, 2017-04-19, 1 file, -0/+94)
| | | | | | | | | | | | We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300639
* [ARM] Add hardware build attributes in assembler (Oliver Stannard, 2017-04-18, 1 file, -234/+227)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the assembler, we should emit build attributes based on the target selected with command-line options. This matches the GNU assembler's behaviour. We only do this for build attributes which describe the hardware that is expected to be available, not the ones that describe ABI compatibility. This is done by moving some of the attribute emission code to ARMTargetStreamer, so that it can be shared between the assembly and code-generation code paths. Since the assembler only creates a MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to check raw features, and not use the convenience functions in ARMSubtarget. If different attributes are later specified using the .eabi_attribute directive, then they will take precedence, as happens when the same .eabi_attribute is specified twice. This must be enabled by an option, because we don't want to do this when parsing inline assembly. The attributes would match the ones emitted at the start of the file, so wouldn't actually change the emitted object file, but the extra directives would be added to every inline assembly block when emitting assembly, which we'd like to avoid. The majority of the changes in the build-attributes.ll test are just re-ordering the directives, because the hardware attributes are now emitted before the ABI ones. However, I did fix one bug which I spotted: Tag_CPU_arch_profile was not being emitted for v6M. Differential revision: https://reviews.llvm.org/D31812 llvm-svn: 300547