summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Test commitColin LeMahieu2014-10-211-1/+1
| | | | | | Fixing brief comment. llvm-svn: 220299
* [PowerPC] Avoid VSX FMA mutate when killed product reg = addend regBill Schmidt2014-10-211-0/+6
| | | | | | | | | | | | | | | | | | | | | With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in the FMA mutation pass. If we have a situation where a killed product register is the same register as the FMA target, such as: %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5, %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11 then the substitution makes no sense. We end up getting a crash when we try to extend the interval associated with the killed product register, as there is already a live range for %vreg5 there. This patch just disables the mutation under those circumstances. Since recipest.ll generates different code with VMX enabled, I've modified that test to use -mattr=-vsx. I've borrowed the code from that test that exposed the bug and placed it in fma-mutate.ll, where it tests several mutation opportunities including the "bad" one. llvm-svn: 220290
* [ARM] NEON 32-bit scalar moves are also available in VFPv2Oliver Stannard2014-10-211-2/+3
| | | | | | | | | | | The 32-bit variants of the NEON scalar<->GPR move instructions are also available in VFPv2. The 8- and 16-bit variants do require NEON. Note that the checks in the test file are all -DAG because they are checking a mixture of stdout and stderr, and the ordering is not guaranteed. llvm-svn: 220288
* [asan-asm-instrumentation] Fixed memory accesses with rbp as a base or an ↵Yuri Gorshenin2014-10-211-63/+121
| | | | | | | | | | | | | | index register. Summary: Fixed memory accesses with rbp as a base or an index register. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5819 llvm-svn: 220283
* [Thumb2] LDRS?[BH] cannot load to the PCOliver Stannard2014-10-211-4/+4
| | | | | | | The Thumb2 LDRS?[BH] instructions are not valid when the destination register is the PC (these encodings are used for preload hints). llvm-svn: 220278
* [mips][microMIPS] Implement ADDU16 and SUBU16 instructionsZoran Jovanovic2014-10-212-0/+27
| | | | | | Differential Revision: http://reviews.llvm.org/D5118 llvm-svn: 220276
* [mips][microMIPS] Implement AND16, NOT16, OR16 and XOR16 instructionsZoran Jovanovic2014-10-212-0/+34
| | | | | | Differential Revision: http://reviews.llvm.org/D5117 llvm-svn: 220275
* [mips][microMIPS] Implement microMIPS 16-bit instructions registersZoran Jovanovic2014-10-213-0/+46
| | | | | | Differential Revision: http://reviews.llvm.org/D5116 llvm-svn: 220273
* Fix a bit of confusion about .set and produce more readable assembly.Rafael Espindola2014-10-213-4/+2
| | | | | | | | | | | | | | | Every target we support has support for assembly that looks like a = b - c .long a What is special about MachO is that the above combination suppresses the production of a relocation. With this change we avoid producing the intermediary labels when they don't add any value. llvm-svn: 220256
* [X86] Fix a bug in the lowering of the mask of VSELECT.Quentin Colombet2014-10-201-1/+6
| | | | | | | | | | | | | | | | X86 code to lower VSELECT messed a bit with the bits set in the mask of VSELECT when it knows it can be lowered into BLEND. Indeed, only the high bits need to be set for those and it optimizes those accordingly. However, when the mask is a compile time constant, the lowering will be handled by the generic optimizer and those modifications will generate bad code in the generic optimizer. This patch fixes that by preventing the optimization if the VSELECT will be handled by the generic optimizer. <rdar://problem/18675020> llvm-svn: 220242
* [X86] Memory folding for commutative instructions (updated)Simon Pilgrim2014-10-203-59/+102
| | | | | | | | | | | | This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt. Added additional regression test case provided by Joerg Sonnenberger. Differential Revision: http://reviews.llvm.org/D5818 llvm-svn: 220239
* ARM: rework Thumb1 frame index rewritingTim Northover2014-10-205-112/+28
| | | | | | | | | | | | | | | | | | | | | | | The previous code had a few problems, motivating the choices here. 1. It could create instructions clobbering CPSR, but the incoming MachineInstr didn't reflect this. A potential source of corruption. This is why the patch has a new PseudoInst for before lowering. 2. Similarly, there was some code to handle the incoming instruction not being ARMCC::AL, but this would have caused massive problems if it was actually invoked when a complex offset needing more than one instruction was requested. 3. It wasn't designed to handle unaligned pointers (or offsets). These should probably be minimised anyway, but the code needs to deal with them properly regardless. 4. It had some rather dubious ad-hoc code to avoid calling emitThumbRegPlusImmediate, a function which should be designed to do precisely this job. We seem to cover the common cases correctly now, and hopefully can enhance emitThumbRegPlusImmediate to handle any extra optimisations we need to add in future. llvm-svn: 220236
* [Thumb2] RFE, SRS and "SUBS pc, lr" are undefined on v7MOliver Stannard2014-10-201-3/+5
| | | | | | | These instructions are related to the v7[AR] exception model, and are not defined on v7M. llvm-svn: 220204
* Remove unnecessary else.Sid Manning2014-10-201-4/+4
| | | | llvm-svn: 220200
* [ARM] Do not select SMULW[BT] or SMLAW[BT]Oliver Stannard2014-10-202-30/+8
| | | | | | | | | | | | | | | | | | | The current instruction selection patterns for SMULW[BT] and SMLAW[BT] are incorrect. These instructions multiply a 32-bit and a 16-bit value (both signed) and return the top 32 bits of the 48-bit result. This preserves the 16 bits of overflow, whereas the patterns they currently match truncate the result to 16 bits then sign extend. To select these instructions, we would need to match an ISD::SMUL_LOHI, a sign extend, two shifts and an or. There is no way to match SMUL_LOHI in an instruction pattern as it defines multiple values, so this would have to be done in C++. I have raised http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct selection of these instructions. This fixes http://llvm.org/bugs/show_bug.cgi?id=19396 llvm-svn: 220196
* [Thumb] Fix crash in Thumb1RegisterInfo::rewriteFrameIndexOliver Stannard2014-10-201-0/+1
| | | | | | | | | This function can, for some offsets from the SP, split one instruction into two. Since it re-uses the original instruction as the first instruction of the result, we need ensure its result register is not marked as dead before we use it in the second instruction. llvm-svn: 220194
* Use triple predicate functions instead of checking values directly. NFC.Bob Wilson2014-10-191-24/+7
| | | | llvm-svn: 220155
* R600/SI: Add global atomicrmw xchgAaron Watry2014-10-172-0/+4
| | | | | | | | v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220110
* R600/SI: Add global atomicrmw xorAaron Watry2014-10-172-1/+4
| | | | | | | | v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220109
* R600/SI: Add global atomicrmw orAaron Watry2014-10-172-1/+4
| | | | | | | | v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220108
* R600/SI: Add global atomicrmw min/uminAaron Watry2014-10-172-2/+8
| | | | | | | | v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220107
* R600/SI: Add global atomicrmw max/umaxAaron Watry2014-10-172-2/+8
| | | | | | | | v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220106
* R600/SI: Add global atomicrmw andAaron Watry2014-10-172-1/+4
| | | | | | | | v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220105
* R600/SI: Add global atomicrmw subAaron Watry2014-10-172-1/+4
| | | | | | | | v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220104
* [PowerPC] Change assert to better formBill Schmidt2014-10-171-3/+3
| | | | llvm-svn: 220092
* R600/SI: Remove redundant setting of instruction bitsMatt Arsenault2014-10-171-4/+0
| | | | | | These are all set on the instruction base classes. llvm-svn: 220091
* [PowerPC] Change liveness testing in VSX FMA mutation passBill Schmidt2014-10-171-8/+20
| | | | | | | | | | | | | | | | With VSX enabled, LLVM crashes when compiling test/CodeGen/PowerPC/fma.ll. I traced this to the liveness test that's revised in this patch. The interval test is designed to only work for virtual registers, but in this case the AddendSrcReg is physical. Since there is already a walk of the MIs between the AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg in that loop. At Hal Finkel's request, I converted the liveness test to an assert restricted to virtual registers. I've changed the fma.ll test to have VSX and non-VSX variants so we can test both kinds of multiply-adds. llvm-svn: 220090
* Fix typoMatt Arsenault2014-10-171-1/+1
| | | | llvm-svn: 220068
* R600/SI: Also check for FPImm literal constantsMatt Arsenault2014-10-171-1/+1
| | | | llvm-svn: 220067
* R600/SI: Allow commuting with source modifiersMatt Arsenault2014-10-171-11/+22
| | | | llvm-svn: 220066
* R600/SI: Simplify code with hasModifiersSetMatt Arsenault2014-10-171-9/+8
| | | | llvm-svn: 220065
* R600/SI: Fix general commuting breaking src modsMatt Arsenault2014-10-172-1/+22
| | | | | | | | The generic code trying to use findCommutedOpIndices won't understand that it needs to swap the modifier operands also, so it should fail if they are set. llvm-svn: 220064
* R600/SI: Cleanup code with ChangeToFPImmediateMatt Arsenault2014-10-171-4/+3
| | | | llvm-svn: 220063
* R600/SI: Allow comuting fp immediatesMatt Arsenault2014-10-171-3/+9
| | | | llvm-svn: 220062
* R600/SI: Use early return instead of checking condition twiceMatt Arsenault2014-10-171-11/+16
| | | | | | Any commutable instruction will have at least src1. llvm-svn: 220061
* R600/SI: Use complex pattern for MUBUF load patterns.Matt Arsenault2014-10-171-3/+2
| | | | | | This eliminates a use of the SI_ADDR64_RSRC pseudo llvm-svn: 220057
* R600/SI: Remove SI_BUFFER_RSRC pseudoMatt Arsenault2014-10-173-42/+23
| | | | | | | Just use REG_SEQUENCE directly, so there are fewer instructions to need to deal with later. llvm-svn: 220056
* [X86] Fix missed selection of non-temporal store of zero vector.Andrea Di Biagio2014-10-171-0/+8
| | | | | | | | | | When the input to a store instruction was a zero vector, the backend always selected a normal vector store regardless of the non-temporal hint. This is fixed by this patch. This fixes PR19370. llvm-svn: 220054
* [AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering.James Molloy2014-10-171-9/+9
| | | | | | | | | | We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same. While we're at it, avoid writing to std::vector::end()... Bug found with random testing and a lot of coffee. llvm-svn: 220051
* [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generationBill Schmidt2014-10-172-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64 types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This patch adds that support. As with lxvd2x/stxvd2x, this involves straightforward overriding of the patterns normally recognized for lvx/stvx, with preference given to the VSX patterns when VSX is enabled. In addition, the logic for permitting misaligned memory accesses is modified so that v4r32 and v4i32 are treated the same as v2f64 and v2i64 when VSX is enabled. Finally, the DAG generation for unaligned loads is changed to just use a normal LOAD (which will become lxvw4x) on P8 and later hardware, where unaligned loads are preferred over lvsl/lvx/lvx/vperm. A number of tests now generate the VSX loads/stores instead of lvx/stvx, so this patch adds VSX variants to those tests. I've also added <4 x float> tests to the vsx.ll test case, and created a vsx-p8.ll test case to be used for testing code generation for the P8Vector feature. For now, that simply tests the unaligned load/store behavior. This has been tested along with a temporary patch to enable the VSX and P8Vector features, with no new regressions encountered with or without the temporary patch applied. llvm-svn: 220047
* Mips: Only set divrem i64 to custom on 64bitJan Vesely2014-10-171-2/+2
| | | | | | Reviewed-by: Daniel Sanders <daniel.sanders@imgtec.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220046
* [mips] Add support for COP1's Branch-On-Cond-Likely instructionsVasileios Kalintiris2014-10-171-2/+10
| | | | | | | | | | | | Summary: Depends on D5782 Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5802 llvm-svn: 220042
* [mips] Add support for COP0's Branch-On-Cond-Likely instructionsVasileios Kalintiris2014-10-171-6/+25
| | | | | | | | | | | | Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5782 llvm-svn: 220036
* ARM: Fix a bug which was causing convergence failure in constant-island pass.Akira Hatanaka2014-10-171-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bug is in ARMConstantIslands::createNewWater where the upper bound of the new water split point is computed: // This could point off the end of the block if we've already got constant // pool entries following this block; only the last one is in the water list. // Back past any possible branches (allow for a conditional and a maximally // long unconditional). if (BaseInsertOffset + 8 >= UserBBI.postOffset()) { BaseInsertOffset = UserBBI.postOffset() - UPad - 8; DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset)); } The split point is supposed to be somewhere between the machine instruction that loads from the constant pool entry and the end of the basic block, before branch instructions. The code above is fine if the basic block is large enough and there are a sufficient number of instructions following the machine instruction. However, if the machine instruction is near the end of the basic block, BaseInsertOffset can point to the machine instruction or another instruction that precedes it, and this can lead to convergence failure. This commit fixes this bug by ensuring BaseInsertOffset is larger than the offset of the instruction following the constant-loading instruction. rdar://problem/18581150 llvm-svn: 220015
* R600/SI: Simplify debug printingMatt Arsenault2014-10-171-5/+3
| | | | llvm-svn: 219999
* R600/SI: Remove another VALU patternMatt Arsenault2014-10-161-5/+0
| | | | llvm-svn: 219988
* Erase fence insertion from SelectionDAGBuilder.cpp (NFC)Robin Morisset2014-10-163-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Backends can use setInsertFencesForAtomic to signal to the middle-end that montonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses is currently living in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons: - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand. - The current code in SelectionDAGBuilder does not use any target-hooks, it does the same transformation for every backend that requires it - As a result it is plain *unsound*, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331). - Because it produces IR-level fences, it cannot be made sound ! This is noted in the C++11 standard (section 29.3, page 1140): ``` Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics. ``` It can also be seen by the following example (called IRIW in the litterature): ``` atomic<int> x = y = 0; int r1, r2, r3, r4; Thread 0: x.store(1); Thread 1: y.store(1); Thread 2: r1 = x.load(); r2 = y.load(); Thread 3: r3 = y.load(); r4 = x.load(); ``` r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.. This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that): - it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk). - it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (that exactly replicates the logic of SelectionDAGBuilder, so no functional change) - it finally erase this logic from SelectionDAGBuilder as it is dead-code. Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more agressively on these platforms, the long comment should make it clear why he should first override emitLeading/TrailingFence. Test Plan: make check-all, no functional change Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5474 llvm-svn: 219957
* R600/SI: Remove unnecessary VALU patternsMatt Arsenault2014-10-161-41/+0
| | | | | | | | These haven't been necessary since allowing selecting SALU instructions in non-entry blocks was enabled. llvm-svn: 219956
* R600: Fix nonsensical implementation of computeKnownBits for BFEMatt Arsenault2014-10-161-5/+1
| | | | | | This was resulting in invalid simplifications of sdiv llvm-svn: 219953
* Delete -std-compile-opts.Rafael Espindola2014-10-161-22/+22
| | | | | | These days -std-compile-opts was just a silly alias for -O3. llvm-svn: 219951
OpenPOWER on IntegriCloud