summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/ARM
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WBFlorian Hahn2018-03-024-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is broken due to 2 separate bugs: 1) VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing rules to expand them to non pseudo MIR. These are selected for ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD. 2) Selection of the right VLD/VST instruction is broken for load and store of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called with MIR opcode for fixed writeback (ie increment is access size) and call getVLDSTRegisterUpdateOpcode() to select an opcode with register writeback if base register update is of a different size. Since getVLDSTRegisterUpdateOpcode() only knows about VLD1/VLD2/VST1/VST2 the call is currently conditional on the number of element in the vector. However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for load and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not updated which later lead to a fixed writeback instruction being constructed with an extra operand for the register writeback. This patch addresses the two issues as follows: - it adds the necessary mapping from VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register to VLD1d64Twb_register and VLD1d64Qwb_register respectively. Like for the existing _fixed variants, the cost of these is bumped for unaligned access. - it changes the logic in SelectVLD and SelectVSD to call isVLDfixed and isVSTfixed respectively to decide whether the opcode should be updated. It also reworks the logic and comments for pushing the writeback offset operand and r0 operand to clarify the logic: writeback offset needs to be pushed if it's a register writeback, r0 needs to be pushed if not and the instruction is a VLD1/VLD2/VST1/VST2. Reviewers: rengolin, t.p.northover, samparker Reviewed By: samparker Patch by Thomas Preud'homme <thomas.preudhomme@arm.com> Differential Revision: https://reviews.llvm.org/D42970 llvm-svn: 326570
* [TLS] use emulated TLS if the target supports only this modeChih-Hung Hsieh2018-02-282-0/+13
| | | | | | | | | | | | | | | Emulated TLS is enabled by llc flag -emulated-tls, which is passed by clang driver. When llc is called explicitly or from other drivers like LTO, missing -emulated-tls flag would generate wrong TLS code for targets that supports only this mode. Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether emulated TLS code should be generated. Unit tests are modified to run with and without the -emulated-tls flag. Differential Revision: https://reviews.llvm.org/D42999 llvm-svn: 326341
* [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operationsPablo Barrio2018-02-282-2/+160
| | | | | | | | | | | | | | | | | | | Summary: Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before. Patch by Martin Svanfeldt Reviewers: fhahn, pbarrio, rogfer01 Reviewed By: pbarrio, rogfer01 Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42574 llvm-svn: 326333
* [ARM] Another f16 litpool fixSjoerd Meijer2018-02-271-0/+113
| | | | | | | | | | | | | We were always setting the block alignment to 2 bytes in Thumb mode and 4-bytes in ARM mode (r325754, and r325012), but this could cause reducing the block alignment when it already had been aligned (e.g. in Thumb mode when the block is a CPE that was already 4-byte aligned). Patch by Momchil Velikov, I've only added a test. Differential Revision: https://reviews.llvm.org/D43777 llvm-svn: 326232
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Geoff Berry2018-02-273-8/+9
| | | | | | | | Re-enable commit r323991 now that r325931 has been committed to make MachineOperand::isRenamable() check more conservative w.r.t. code changes and opt-in on a per-target basis. llvm-svn: 326208
* [CodeGen] Don't omit any redundant information in -debug outputFrancis Visoiu Mistrih2018-02-261-1/+1
| | | | | | | | | | | | | | | | | | | | | In r322867, we introduced IsStandalone when printing MIR in -debug output. The default behaviour for that was: 1) If any of MBB, MI, or MO are -debug-printed separately, don't omit any redundant information. 2) When -debug-printing a MF entirely, don't print any redundant information. 3) When printing MIR, don't print any redundant information. I'd like to change 2) to: 2) When -debug-printing a MF entirely, don't omit any redundant information. Differential Revision: https://reviews.llvm.org/D43337 llvm-svn: 326094
* Recommit: [ARM] f16 constant pool fixSjoerd Meijer2018-02-223-1/+113
| | | | | | | This recommits r325754; the modified and failing test case actually didn't need any modifications. llvm-svn: 325765
* Revert r325754 and r325755 (f16 literal pool) because buildbots were unhappy.Sjoerd Meijer2018-02-224-115/+3
| | | | llvm-svn: 325756
* Added a test that I forgot to svn add in my previous commit r325754.Sjoerd Meijer2018-02-221-0/+107
| | | | llvm-svn: 325755
* [ARM] f16 constant pool fixSjoerd Meijer2018-02-223-3/+8
| | | | | | | | | | | This is a follow up of r325012, that allowed half types in constant pools. Proper alignment was enforced when a big basic block was split up, but not when a CPE was placed before/after a block; the successor block had the wrong alignment. Differential Revision: https://reviews.llvm.org/D43580 llvm-svn: 325754
* [ARM] Lower BR_CC for f16Sjoerd Meijer2018-02-201-1/+31
| | | | | | | | This case wasn't handled yet. Differential Revision: https://reviews.llvm.org/D43508 llvm-svn: 325616
* [CodeGen] Fix tests breaking after r325505Francis Visoiu Mistrih2018-02-191-1/+2
| | | | llvm-svn: 325512
* [ARM] Add LLVM tests for the vcvtr builtinsSjoerd Meijer2018-02-171-0/+35
| | | | | | | | | Follow up of Clang commit r325351; this adds the LLVM tests, which were also missing. Differential Revision: https://reviews.llvm.org/D43395 llvm-svn: 325443
* Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Quentin Colombet2018-02-173-9/+8
| | | | | | | | | | | | | | | | | This reverts commit r323991. This commit breaks target that don't model all the register constraints in TableGen. So far the workaround was to set the hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the cases. For instance, when mutating an instruction (like in the lowering of COPYs) the isRenamable flag is not properly updated. The same problem will happen when attaching machine operand from one instruction to another. Geoff Berry is working on a fix in https://reviews.llvm.org/D43042. llvm-svn: 325421
* [ARM] Return true in enableMultipleCopyHints().Jonas Paulsson2018-02-164-129/+126
| | | | | | | | | | Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Eli Friedman llvm-svn: 325327
* [LegalizeDAG] Fix legalization of SETCCMikhail Maltsev2018-02-161-0/+23
| | | | | | | | | | | | | | | | | | | Summary: Currently when expanding a SETCC node into a SELECT_CC, LLVM uses an incorrect type for determining BooleanContent of the result. This patch fixes the issue. Fixes PR36079. Reviewers: rogfer01, javed.absar, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43282 llvm-svn: 325325
* [ARM] Materialise some boolean values to avoid a branchRoger Ferrer Ibanez2018-02-169-528/+536
| | | | | | | | | | | | | This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form a != b ? x : 0 a == b ? 0 : x and that currently (e.g. in Thumb1) are emitted as branches. Differential Revision: https://reviews.llvm.org/D34515 llvm-svn: 325323
* [ARM] Fix redirect in inline assembly testPablo Barrio2018-02-151-1/+1
| | | | | | | | | | | | Summary: Fix silly mistake in a test Reviewers: gkistanova, apilipenko Subscribers: javed.absar, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D43342 llvm-svn: 325283
* [ARM] Allow 64- and 128-bit types with 't' inline asm constraintPablo Barrio2018-02-152-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In LLVM, 't' selects a floating-point/SIMD register and only supports 32-bit values. This is appropriately documented in the LLVM Language Reference Manual. However, this behaviour diverges from that of GCC, where 't' selects the s0-s31 registers and its qX and dX variants depending on additional operand modifiers (q/P). For example, the following C code: #include <arm_neon.h> float32x4_t a, b, x; asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b)) results in the following assembly if compiled with GCC: vadd.f32 s0, s0, s1 whereas LLVM will show "error: couldn't allocate output register for constraint 't'", since a, b, x are 128-bit variables, not 32-bit. This patch extends the use of 't' to mean that of GCC, thus allowing selection of the lower Q vector regs and their D/S variants. For example, the earlier code will now compile as: vadd.f32 q0, q0, q1 This behaviour still differs from that of GCC but I think it is actually more correct, since LLVM picks up the right register type based on the datatype of x, while GCC would need an extra operand modifier to achieve the same result, as follows: asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b)) Since this is only an extension of functionality, existing code should not be affected by this change. Note that operand modifiers q/P are already supported by LLVM, so this patch should suffice to support inline assembly with constraint 't' originally built for GCC. Reviewers: grosbach, rengolin Reviewed By: rengolin Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42962 llvm-svn: 325244
* [ARM] f16 vcmp fixesSjoerd Meijer2018-02-151-24/+50
| | | | | | | | This adds f16 VCMP match rules and fixes the test cases. Differential Revision: https://reviews.llvm.org/D43291 llvm-svn: 325228
* [ARM] f16 stack spill/reloadsSjoerd Meijer2018-02-141-0/+26
| | | | | | | | This adds support for handling f16 stack spills/reloads. Differential Revision: https://reviews.llvm.org/D43280 llvm-svn: 325130
* [CodeGen] Print bundled instructions using the MIR syntax in -debug outputFrancis Visoiu Mistrih2018-02-131-3/+3
| | | | | | | | | | | | | | | | | Old syntax: BUNDLE implicit-def %r0, implicit-def %r1, implicit %r2 * %r0 = SOME_OP %r2 * %r1 = ANOTHER_OP internal %r0 New syntax: BUNDLE implicit-def %r0, implicit-def %r1, implicit %r2 { %r0 = SOME_OP %r2 %r1 = ANOTHER_OP internal %r0 } llvm-svn: 325032
* [ARM] Allow half types in ConstantPoolSjoerd Meijer2018-02-132-0/+169
| | | | | | | | | | | Change ARMConstantIslandPass to: - accept f16 literals as litpool entries, - if the litpool needs to be inserted in the middle of a big block, then we need to 4-byte align the next instruction in ARM mode. Differential Revision: https://reviews.llvm.org/D42784 llvm-svn: 325012
* [Thumb] Handle addressing mode AddrMode5FP16Sjoerd Meijer2018-02-131-0/+24
| | | | | | | | This addressing mode wasn't checked, so we were running in an assert. Differential Revision: https://reviews.llvm.org/D43179 llvm-svn: 324996
* [GlobalMerge] Allow merging of dllexported variablesMartin Storsjo2018-02-121-3/+9
| | | | | | | | | If merging them, the dllexport attribute needs to be brought along to the new GlobalAlias. Differential Revision: https://reviews.llvm.org/D43192 llvm-svn: 324937
* [CodeGen] Add a -trap-unreachable option for debuggingDavid Green2018-02-121-0/+8
| | | | | | | | | | | Add a common -trap-unreachable option, similar to the target specific hexagon equivalent, which has been replaced. This turns unreachable instructions into traps, which is useful for debugging. Differential Revision: https://reviews.llvm.org/D42965 llvm-svn: 324880
* [ARM] preserve test intent by removing undefSanjay Patel2018-02-101-1/+1
| | | | | | | D43141 proposes to correct undef folding in the DAG, and this test would not survive that change. llvm-svn: 324814
* Emit smaller exception tables for non-SJLJ mode.Rafael Espindola2018-02-093-7/+7
| | | | | | | | | | | * Use uleb128 for code offsets in the LSDA call site table. * Omit the TTBase offset if the type table is empty. This change can reduce the size of the DWARF/Itanium LSDA by about half. Patch by Ryan Prichard! llvm-svn: 324750
* Use assembler expressions to lay out the EH LSDA.Rafael Espindola2018-02-092-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rely on the assembler to finalize the layout of the DWARF/Itanium exception-handling LSDA. Rather than calculate the exact size of each thing in the LSDA, use assembler directives: To emit the offset to the TTBase label: .uleb128 .Lttbase0-.Lttbaseref0 .Lttbaseref0: To emit the size of the call site table: .uleb128 .Lcst_end0-.Lcst_begin0 .Lcst_begin0: ... call site table entries ... .Lcst_end0: To align the type info table: ... action table ... .balign 4 .long _ZTIi .long _ZTIl .Lttbase0: Using assembler directives simplifies the compiler and allows switching the encoding of offsets in the call site table from udata4 to uleb128 for a large code size savings. (This commit does not change the encoding.) The combination of the uleb128 followed by a balign creates an unfortunate dependency cycle that the assembler must sometimes resolve either by padding an LEB or by inserting zero padding before the type table. See PR35809 or GNU as bug 4029. Patch by Ryan Prichard! llvm-svn: 324749
* [CodeGen] Unify the syntax of MBB successors in MIR and -debug outputFrancis Visoiu Mistrih2018-02-096-13/+13
| | | | | | | | | | | Instead of: Successors according to CFG: %bb.6(0x12492492 / 0x80000000 = 14.29%) print: successors: %bb.6(0x12492492); %bb.6(14.29%) llvm-svn: 324685
* [CodeGen] Print MachineBasicBlock labels using MIR syntax in -debug outputFrancis Visoiu Mistrih2018-02-084-6/+6
| | | | | | | | | | | | | | Instead of: %bb.1: derived from LLVM BB %for.body print: bb.1.for.body: Also use MIR syntax for MBB attributes like "align", "landing-pad", etc. llvm-svn: 324563
* [ARM] FP16 mov imm patternSjoerd Meijer2018-02-071-0/+13
| | | | | | | | | This is a follow up of r324321, adding a match pattern for mov with a FP16 immediate (also fixing operand vfp_f16imm that wasn't even compiling). Differential Revision: https://reviews.llvm.org/D42973 llvm-svn: 324456
* Place undefined globals in .bss instead of .dataEli Friedman2018-02-061-2/+4
| | | | | | | | | | | | | | | | | | | | | | Following up on the discussion from http://lists.llvm.org/pipermail/llvm-dev/2017-April/112305.html, undef values are now placed in the .bss as well as null values. This prevents undef global values taking up potentially huge amounts of space in the .data section. The following two lines now both generate equivalent .bss data: @vals1 = internal unnamed_addr global [20000000 x i32] zeroinitializer, align 4 @vals2 = internal unnamed_addr global [20000000 x i32] undef, align 4 ; previously unaccounted for This is primarily motivated by the corresponding issue in the Rust compiler (https://github.com/rust-lang/rust/issues/41315). Differential Revision: https://reviews.llvm.org/D41705 Patch by varkor! llvm-svn: 324424
* [LivePhysRegs] Fix handling of return instructions.Eli Friedman2018-02-061-1/+1
| | | | | | | | | | | | | | | | | See D42509 for the original version of this. Basically, there are two significant changes to behavior here: - addLiveOuts always adds all pristine registers (even if a block has no successors). - addLiveOuts and addLiveOutsNoPristines always add all callee-saved registers for return blocks (including conditional return blocks). I cleaned up the functions a bit to make it clear these properties hold. Differential Revision: https://reviews.llvm.org/D42655 llvm-svn: 324422
* [ARM] f16 conversionsSjoerd Meijer2018-02-061-0/+45
| | | | | | | | | This is a follow up of r324321, adding f16 <-> f32 and f16 <-> f64 conversion match patterns. Differential Revision: https://reviews.llvm.org/D42954 llvm-svn: 324360
* [ARM] Armv8.2-A FP16 code generation (part 3/3)Sjoerd Meijer2018-02-061-8/+566
| | | | | | | | | | | | | | | This adds most of the FP16 codegen support, but these areas need further work: - FP16 literals and immediates are not properly supported yet (e.g. literal pool needs work), - Instructions that are generated from intrinsics (e.g. vabs) haven't been added. This will be addressed in follow-up patches. Differential Revision: https://reviews.llvm.org/D42849 llvm-svn: 324321
* [ARM] fixed some tabs/whitespaces in test. NFC.Sjoerd Meijer2018-02-021-6/+6
| | | | llvm-svn: 324074
* SplitKit: Fix liveness recomputation in some remat cases.Matthias Braun2018-02-021-0/+245
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Example situation: ``` BB0: %0 = ... use %0 ; ... condjump BB1 jmp BB2 BB1: %0 = ... ; rematerialized def from above (from earlier split step) jmp BB2 BB2: ; ... use %0 ``` %0 will have a live interval with 3 value numbers (for the BB0, BB1 and BB2 parts). Now SplitKit tries and succeeds in rematerializing the value number in BB2 (This only works because it is a secondary split so SplitKit is can trace this back to a single original def). We need to recompute all live ranges affected by a value number that we rematerialize. The case that we missed before is that when the value that is rematerialized is at a join (Phi VNI) then we also have to recompute liveness for the predecessor VNIs. rdar://35699130 Differential Revision: https://reviews.llvm.org/D42667 llvm-svn: 324039
* [MachineCopyPropagation] Extend pass to do COPY source forwardingGeoff Berry2018-02-014-7/+9
| | | | | | | | | | | | | | | | | | | | | | Summary: This change extends MachineCopyPropagation to do COPY source forwarding and adds an additional run of the pass to the default pass pipeline just after register allocation. This version of this patch uses the newly added MachineOperand::isRenamable bit to avoid forwarding registers is such a way as to violate constraints that aren't captured in the Machine IR (e.g. ABI or ISA constraints). This change is a continuation of the work started in D30751. Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits Differential Revision: https://reviews.llvm.org/D41835 llvm-svn: 323991
* [ARM] FullFP16 LowerReturn FixSjoerd Meijer2018-02-011-8/+21
| | | | | | | | | | Commit r323512 introduced an optimisation in LowerReturn for half-precision return values. A missing check caused a crash when the return value is "undef" (i.e. a node that has no operands). Differential Revision: https://reviews.llvm.org/D42743 llvm-svn: 323968
* Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using ↵Evgeniy Stepanov2018-01-313-136/+3
| | | | | | | | | | bit-operations" Miscompiles code. Testcase pending. This reverts commit r323869. llvm-svn: 323929
* Followup on Proposal to move MIR physical register namespace to '$' sigil.Puyan Lotfi2018-01-3152-3989/+3989
| | | | | | | | | | | | Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922
* [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operationsPablo Barrio2018-01-313-3/+136
| | | | | | | | | | | | | | | | | | | Summary: Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before. Patch by Marten Svanfeldt. Reviewers: fhahn, pbarrio Reviewed By: pbarrio Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42574 llvm-svn: 323869
* [ARM] Armv8.2-A FP16 code generation (part 2/3)Sjoerd Meijer2018-01-311-9/+5
| | | | | | | | | | | | Half-precision arguments and return values are passed as if it were an int or float for ARM. This results in truncates and bitcasts to/from i16 and f16 values, which are legalized very early to stack stores/loads. When FullFP16 is enabled, we want to avoid codegen for these bitcasts as it is unnecessary and inefficient. Differential Revision: https://reviews.llvm.org/D42580 llvm-svn: 323861
* [ARM GlobalISel] Add inst selector tests for G_SITOFP and G_UITOFPDiana Picus2018-01-301-0/+113
| | | | | | These are handled by the TableGen'erated code. llvm-svn: 323732
* [ARM GlobalISel] Map G_SITOFP and G_UITOFPDiana Picus2018-01-301-0/+91
| | | | | | | Straightforward mapping (integer operand to GPR, floating point operand to FPR). llvm-svn: 323731
* [ARM GlobalISel] Legalize G_SITOFP and G_UITOFPDiana Picus2018-01-301-0/+143
| | | | | | | | Legal if we have hardware support, libcall otherwise. Also add supporting code to the legalizer helper for libcalls. llvm-svn: 323730
* [ARM GlobalISel] Add inst selector tests for G_FPTOSI and G_FPTOUIDiana Picus2018-01-301-0/+113
| | | | | | The work is done by the TableGen'erated code. llvm-svn: 323728
* [ARM GlobalISel] Map G_FPTOSI and G_FPTOUIDiana Picus2018-01-301-0/+91
| | | | | | | Straightforward mapping (integer operand goes to GPR, floating point operand goes to FPR). llvm-svn: 323727
* [ARM GlobalISel] Legalize G_FPTOSI and G_FPTOUIDiana Picus2018-01-301-0/+143
| | | | | | | | | Legal if we have hardware support for floating point, libcalls otherwise. Also add the necessary support for libcalls in the legalizer helper. llvm-svn: 323726
OpenPOWER on IntegriCloud