path: root/llvm/test/CodeGen
* [x86] Clean up some tests to use FileCheck and combine two into a single file.
  (Chandler Carruth, 2014-08-28; 3 files changed, -25/+59)
  Changing code that is covered by these tests is just too hard to debug
  currently, and now the nature of the changes will be clear.
  llvm-svn: 216643
* [FastISel] Undo phi node updates when falling back to SelectionDAG.
  (Juergen Ributzka, 2014-08-28; 1 file changed, -0/+26)
  The included test case would fail because the MI PHI node would have two
  operands from the same predecessor. This problem occurs when a switch
  instruction couldn't be selected, which always happens, because there is no
  switch support in FastISel to begin with. The problem was that FastISel would
  first add the operands to the PHI nodes and then fall back to SelectionDAG,
  which would in turn add the same operands to the PHI nodes again. This fix
  removes these duplicate PHI node operands by resetting PHINodesToUpdate to
  its original state before FastISel tried to select the instruction.
  This fixes <rdar://problem/18155224>.
  llvm-svn: 216640
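  A minimal IR sketch of the failing shape (names assumed; this is not the
  commit's actual test case):

    define i32 @switch_phi(i32 %c) {
    entry:
      ; FastISel cannot select a switch, so this always falls back
      ; to SelectionDAG.
      switch i32 %c, label %bb1 [ i32 1, label %bb2 ]
    bb1:
      br label %merge
    bb2:
      br label %merge
    merge:
      ; Before the fix, FastISel added these incoming values and the
      ; SelectionDAG fall-back added them again, duplicating the operands.
      %r = phi i32 [ 10, %bb1 ], [ 20, %bb2 ]
      ret i32 %r
    }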
* [FastISel]
  (Juergen Ributzka, 2014-08-28; 1 file changed, -0/+10)
  Currently instructions are folded very aggressively for AArch64 into the
  memory operation, which can lead to the use of killed operands:
    %vreg1<def> = ADDXri %vreg0<kill>, 2
    %vreg2<def> = LDRBBui %vreg0, 2
    ... = ... %vreg1 ...
  This usually happens when the result is also used by another non-memory
  instruction in the same basic block, or by any instruction in another basic
  block. This fix teaches hasTrivialKill not only to check in the LLVM IR that
  the value has a single use, but also to check whether the register that
  represents that value has already been used. This can happen when the
  instruction with the use was folded into another instruction (in this
  particular case a load instruction).
  This fixes rdar://problem/18142857.
  llvm-svn: 216634
* Revert "[FastISel][AArch64] Don't fold instructions too aggressively into ↵Juergen Ributzka2014-08-271-130/+0
| | | | | | | | the memory operation." Quentin pointed out that this is not the correct approach and there is a better and easier solution. llvm-svn: 216632
* [FastISel][AArch64] Don't fold instructions too aggressively into the memory
  operation.
  (Juergen Ributzka, 2014-08-27; 1 file changed, -0/+130)
  Currently instructions are folded very aggressively into the memory
  operation, which can lead to the use of killed operands:
    %vreg1<def> = ADDXri %vreg0<kill>, 2
    %vreg2<def> = LDRBBui %vreg0, 2
    ... = ... %vreg1 ...
  This usually happens when the result is also used by another non-memory
  instruction in the same basic block, or by any instruction in another basic
  block. If the computed address is used only by memory operations in the same
  basic block, then it is safe to fold them, because all memory operations will
  fold the address computation and the original computation will never be
  emitted.
  This fixes rdar://problem/18142857.
  llvm-svn: 216629
* [FastISel][AArch64] Fix simplify address when the address comes from a shift.
  (Juergen Ributzka, 2014-08-27; 1 file changed, -0/+21)
  When the address comes directly from a shift instruction, the address
  computation cannot be folded into the memory instruction, because the zero
  register is not available as a base register. Simplify address needs to emit
  the shift instruction and use the result as base register.
  llvm-svn: 216621
* [FastISel][AArch64] Use the zero register for stores.
  (Juergen Ributzka, 2014-08-27; 2 files changed, -10/+15)
  Use the zero register directly when possible to avoid an unnecessary register
  copy and a wasted register at -O0. This also uses integer stores to store a
  positive floating-point zero. This saves us from materializing the positive
  zero in a register and then storing it.
  llvm-svn: 216617
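  A minimal sketch of the intended selection (the FileCheck lines are assumed,
  not taken from the commit's tests):

    define void @store_zeros(i32* %p, float* %q) {
    ; CHECK: str wzr, [x0]
      store i32 0, i32* %p
    ; CHECK: str wzr, [x1]
      ; the FP +0.0 is stored as an integer zero, so no FMOV is needed
      store float 0.0, float* %q
      ret void
    }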
* Teach the AArch64 backend about v4f16 and v8f16
  (Oliver Stannard, 2014-08-27; 7 files changed, -25/+1462)
  This teaches the AArch64 backend to handle the operations on v4f16 and v8f16
  which are exposed by NEON intrinsics, plus the add, sub, mul and div
  operations.
  llvm-svn: 216555
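  An illustrative sketch (assumed, not one of the commit's tests) of the newly
  handled operations:

    define <4 x half> @add_v4f16(<4 x half> %a, <4 x half> %b) {
      ; fadd on v4f16 is one of the operations the backend now handles
      %r = fadd <4 x half> %a, %b
      ret <4 x half> %r
    }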
* [x86] Fix a regression introduced with r213897 for 32-bit targets where we
  stopped efficiently lowering sextload using the SSE41 instructions for that
  operation.
  (Chandler Carruth, 2014-08-27; 1 file changed, -2/+13)
  This is a consequence of a bad predicate I wrote while thinking of the memory
  access requirements. The code actually handles the cases where the predicate
  doesn't apply, and handles them much better. =]
  Simple fix and a test case added. Fixes PR20767.
  llvm-svn: 216538
* [SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine.
  (Chandler Carruth, 2014-08-27; 2 files changed, -5/+21)
  This combine is essentially combining target-specific nodes back into target
  independent nodes that it "knows" will be combined yet again by a target
  independent DAG combine into a different set of target-independent nodes that
  are legal (not custom though!) and thus "ok". This seems... deeply flawed.
  The crux of the problem is that we don't combine un-legalized shuffles that
  are introduced by legalizing other operations, and thus we don't see a very
  profitable combine opportunity. So the backend just forces the input to that
  combine to re-appear.

  However, for this to work, the conditions detected to re-form the unlegalized
  nodes must be *exactly* right. Previously, failing this would have caused
  poor code (if you're lucky) or a crasher when we failed to select
  instructions. After r215611 we would fall back into the legalizer. In some
  cases, this just "fixed" the crasher by producing bad code. But in the test
  case added it caused the legalizer and the dag combiner to iterate forever.

  The fix is to make the alignment checking in the x86 side of things match the
  alignment checking in the generic DAG combine exactly. This isn't really a
  satisfying or principled fix, but it at least makes the code work as
  intended. It also highlights that it would be nice to detect the availability
  of under-aligned loads for a given type rather than bailing on this
  optimization. I've left a FIXME to document this.

  Original commit message for r215611, which covers the rest of the change:

  [SDAG] Fix a case where we would iteratively legalize a node during combining
  by replacing it with something else but not re-process the node afterward to
  remove it. In a truly remarkable stroke of bad luck, this would (in the test
  case attached) end up getting some other node combined into it without ever
  getting re-processed. By adding it back on to the worklist, in addition to
  deleting the dead nodes more quickly we also ensure that if it *stops* being
  dead for any reason it makes it back through the legalizer. Without this, the
  test case will end up failing during instruction selection due to an and node
  with a type we don't have an instruction pattern for.

  It took many million runs of the shuffle fuzz tester to find this.
  llvm-svn: 216537
* AVX-512: Added intrinsic for VMOVSS store form with mask.
  (Elena Demikhovsky, 2014-08-27; 1 file changed, -0/+9)
  llvm-svn: 216530
* [FastISel][AArch64] Fix address simplification.
  (Juergen Ributzka, 2014-08-27; 1 file changed, -0/+27)
  When a shift with extension or an add with shift and extension cannot be
  folded into the memory operation, the address calculation has to be
  materialized separately. While doing so, the code forgot to consider a
  possible sign-/zero-extension. This fix now also folds the
  sign-/zero-extension into the add or shift instruction which is used to
  materialize the address.
  This fixes rdar://problem/18141718.
  llvm-svn: 216511
* [FastISel][AArch64] Fold Sign-/Zero-Extend into the shift immediate
  instruction.
  (Juergen Ributzka, 2014-08-27; 1 file changed, -0/+170)
  llvm-svn: 216510
* ARM: Add patterns for dbg
  (Yi Kong, 2014-08-26; 1 file changed, -0/+13)
  llvm-svn: 216451
* [AArch32] Add patterns for VCVT{A,N,P,M}.
  (Chad Rosier, 2014-08-25; 1 file changed, -0/+117)
  Patterns for lowering libm calls to VCVT{A,N,P,M} are also included.
  Phabricator Revision: http://reviews.llvm.org/D5033
  llvm-svn: 216388
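  A sketch of the kind of lowering these patterns enable (the exact CHECK line
  is an assumption): llvm.round rounds to nearest with ties away from zero,
  which matches VCVTA's rounding mode, so round-then-convert can become a
  single instruction:

    declare float @llvm.round.f32(float)

    define i32 @round_to_i32(float %a) {
    ; CHECK: vcvta.s32.f32
      %r = call float @llvm.round.f32(float %a)
      %i = fptosi float %r to i32
      ret i32 %i
    }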
* [PowerPC] Add support for dcbtst and icbt (prefetch)
  (Hal Finkel, 2014-08-23; 1 file changed, -3/+22)
  Adds code generation support for dcbtst (data cache prefetch for write) and
  icbt (instruction cache prefetch for read - Book E cores only).
  We still end up with a 'cannot select' error for the non-supported prefetch
  intrinsic forms. This will be fixed in a later commit.
  Fixes PR20692.
  llvm-svn: 216339
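  A sketch of a now-selectable form (the CHECK line is an assumption): a write
  prefetch of the data cache should select dcbtst:

    declare void @llvm.prefetch(i8*, i32, i32, i32)

    define void @prefetch_write(i8* %p) {
    ; CHECK: dcbtst
      ; rw = 1 (write), locality = 3, cache type = 1 (data)
      call void @llvm.prefetch(i8* %p, i32 1, i32 3, i32 1)
      ret void
    }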
* Revert "ARM: improve RTABI 4.2 conformance on Linux"Chad Rosier2014-08-231-9/+20
| | | | | | | This reverts commit r215862 due to nightly failures. Will work on getting a reduced test case, but I wanted to get our bots green in the meantime. llvm-svn: 216325
* [x86] Start fixing a really subtle and terrible form of miscompile in these
  DAG combines.
  (Chandler Carruth, 2014-08-23; 1 file changed, -0/+18)
  The DAG auto-CSE thing is truly terrible. Due to it, when RAUW-ing a node
  with its operand, you can cause its uses to CSE to itself, which then causes
  their uses to become your uses, which causes them to be picked up by the
  RAUW. For nodes that are determined to be "no-ops", this is "fine". But if
  the RAUW is one of several steps to enact a transformation, this causes the
  DAG to really silently eat and discard nodes that you would never expect. It
  took days for me to actually pinpoint a test case triggering this and a
  really frustrating amount of time to even comprehend the bug, because I never
  even thought about the ability of RAUW to iteratively consume nodes due to
  CSE-ing them into itself.

  To fix this, we have to build up a brand-new chain of operations any time we
  are combining across (potentially) intervening nodes. But once the logic is
  added to do this, another issue surfaces: CombineTo eagerly deletes the one
  node combined, *but no others*. This is... really frustrating. If deleting it
  makes its operands become dead, those operand nodes often won't go onto the
  worklist in the order you would want -- they're already on it and not near
  the top. That means things higher on the worklist will get combined prior to
  these dead nodes being GCed out of the worklist, and if the chain is long,
  the immediate users won't be enough to re-detect where the root of the chain
  is that became single-use again after deleting the dead nodes. The better way
  to do this is to never immediately delete nodes, and instead to just enqueue
  them so we can recursively delete them. The combined-from node is typically
  not on the worklist anyways by virtue of having been popped off.... But that
  in turn breaks other tests that *require* CombineTo to delete unused nodes.
  :: sigh ::

  Fortunately, there is a better way. This whole routine should have been
  returning the replacement rather than using CombineTo, which is quite hacky.
  Switch to that, and all the pieces fall together.

  I suspect the same kind of miscompile is possible in the half-shuffle folding
  code, and potentially the recursive folding code. I'll be switching those
  over to a pattern more like this one for safety's sake even though I don't
  immediately have any test cases for them. Note that the only way I got a test
  case for this instance was with *heavily* DAG combined 256-bit shuffle
  sequences generated by my fuzzer. ;]
  llvm-svn: 216319
* Revert r215611 because it caused the infinite loop in bug 20736. There is a
  reduced testcase in that bug.
  (Nick Lewycky, 2014-08-23; 1 file changed, -0/+5)
  llvm-svn: 216307
* ARM / x86_64 varargs: Don't save regparms in prologue without va_start
  (Reid Kleckner, 2014-08-22; 8 files changed, -3/+40)
  There's no need to do this if the user doesn't call va_start. In the future,
  we're going to have thunks that forward these register parameters with
  musttail calls, and they won't need these spills for handling va_start.
  Most of the test suite changes are adding va_start calls to existing tests to
  keep things working.
  llvm-svn: 216294
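  A minimal sketch of the case this optimizes (assumed; not one of the commit's
  tests). On x86_64, a variadic function that never calls va_start no longer
  needs to dump the argument registers to the register save area:

    define i32 @no_va_start(i32 %a, ...) {
    ; CHECK-LABEL: no_va_start
    ; CHECK-NOT: movaps
      ret i32 %a
    }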
* R600/SI: Use READ2/WRITE2 instructions for 64-bit mem ops with 32-bit
  alignment
  (Tom Stellard, 2014-08-22; 1 file changed, -2/+53)
  llvm-svn: 216279
* R600/SI: Use a ComplexPattern for DS loads and stores
  (Tom Stellard, 2014-08-22; 5 files changed, -88/+111)
  llvm-svn: 216278
* [ARM] Move the implementation of the target hooks related to copy-related
  instructions from ARMInstrInfo to ARMBaseInstrInfo.
  (Quentin Colombet, 2014-08-22; 1 file changed, -0/+2)
  That way, Thumb mode can also benefit from the advanced copy optimization.
  <rdar://problem/12702965>
  llvm-svn: 216274
* [mips] Don't use odd-numbered float registers for double arguments for
  fastcc calling convention if FP is 64-bit and +nooddspreg is used.
  (Sasa Stankovic, 2014-08-22; 1 file changed, -0/+82)
  Differential Revision: http://reviews.llvm.org/D4981.diff
  llvm-svn: 216262
* [FastISel][AArch64] Add support for variable shift.
  (Juergen Ributzka, 2014-08-21; 1 file changed, -12/+112)
  This adds the missing variable shift support for value types i8, i16, and
  i32.
  This fixes <rdar://problem/18095685>.
  llvm-svn: 216242
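  A minimal sketch (assumed) of a shift FastISel previously could not select:

    define zeroext i16 @var_shl_i16(i16 %a, i16 %b) {
      ; variable (non-immediate) shift amount on a sub-32-bit type
      %s = shl i16 %a, %b
      ret i16 %s
    }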
* [FastISel][AArch64] Use the correct register class to make the MI verifier
  happy.
  (Juergen Ributzka, 2014-08-21; 27 files changed, -45/+44)
  This is mostly achieved by providing the correct register class manually,
  because getRegClassFor always returns the GPR*AllRegClass for MVT::i32 and
  MVT::i64.
  Also cleanup the code to use the FastEmitInst_* method whenever possible.
  This makes sure that the operands' register class is properly constrained.
  For all the remaining cases this adds the missing constrainOperandRegClass
  calls for each operand.
  llvm-svn: 216225
* R600/SI: Teach moveToVALU how to handle more S_LOAD_* instructions
  (Tom Stellard, 2014-08-21; 1 file changed, -0/+28)
  llvm-svn: 216220
* R600/SI: Make sure SCRATCH_WAVE_OFFSET is added as Live-In to the function
  (Tom Stellard, 2014-08-21; 1 file changed, -3/+5)
  This fixes a crash in an OpenCL conformance test.
  llvm-svn: 216219
* [AArch64] Run a peephole pass right after AdvSIMD pass.
  (Quentin Colombet, 2014-08-21; 1 file changed, -2/+23)
  The AdvSIMD pass may produce copies that are not coalescer-friendly. The
  peephole optimizer knows how to fix that, as demonstrated in the test case.
  <rdar://problem/12702965>
  llvm-svn: 216200
* Thumb1 load/store optimizer: Improve code to materialize new base register.
  (Moritz Roth, 2014-08-21; 3 files changed, -2/+31)
  There are two add-immediate instructions in Thumb1: tADDi8 and tADDi3. Only
  the latter supports using different source and destination registers, so
  whenever we materialize a new base register (at a certain offset) we'd do so
  by moving the base register value to the new register and then adding in
  place. This patch changes the code to use a single tADDi3 if the offset is
  small enough to fit in 3 bits.
  Differential Revision: http://reviews.llvm.org/D5006
  llvm-svn: 216193
* [FastISel][AArch64] Remove redundant test.
  (Juergen Ributzka, 2014-08-21; 1 file changed, -23/+0)
  These tests and many more are already covered by
  fast-isel-addressing-modes.ll.
  llvm-svn: 216186
* Add a thread-model knob for lowering atomics on baremetal & single threaded
  systems
  (Jonathan Roelofs, 2014-08-21; 1 file changed, -0/+113)
  http://reviews.llvm.org/D4984
  llvm-svn: 216182
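  An illustrative sketch of the intended effect (the flag name is taken from
  the review; the triple and CHECK line are assumptions). With a
  single-threaded model, the barriers around atomics can be dropped:

    ; llc -mtriple=armv7-none-eabi -thread-model=single
    define i32 @load_acquire(i32* %p) {
    ; CHECK-NOT: dmb
      ; no other thread can observe the access, so no barrier is needed
      %v = load atomic i32* %p acquire, align 4
      ret i32 %v
    }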
* DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors
  with many arguments.
  (Benjamin Kramer, 2014-08-21; 2 files changed, -0/+18)
  PR20677
  llvm-svn: 216175
* [ARM] Enable DP copy, load and store instructions for FPv4-SP
  (Oliver Stannard, 2014-08-21; 8 files changed, -13/+1074)
  The FPv4-SP floating-point unit is generally referred to as
  single-precision only, but it does have double-precision registers and
  load, store and GPR<->DPR move instructions which operate on them. This
  patch enables the use of these registers, the main advantage of which is
  that we now comply with the AAPCS-VFP calling convention.
  This partially reverts r209650, which added some AAPCS-VFP support, but
  did not handle return values or alignment of double arguments in
  registers.
  This patch also adds tests for Thumb2 code generation for floating-point
  instructions and intrinsics, which previously only existed for ARM.
  llvm-svn: 216172
* [x86] Added _addcarry_ and _subborrow_ intrinsics
  (Robert Khasanov, 2014-08-21; 1 file changed, -0/+51)
  llvm-svn: 216164
* [x86] Broadwell: ADOX/ADCX. Added _addcarryx_u{32|64} intrinsics to LLVM.
  (Robert Khasanov, 2014-08-21; 1 file changed, -0/+26)
  llvm-svn: 216162
* Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG
  Builder and type".
  (Jiangning Liu, 2014-08-21; 2 files changed, -266/+12)
  llvm-svn: 216147
* [PeepholeOptimizer] Take advantage of the isInsertSubreg property in the
  advanced copy optimization.
  (Quentin Colombet, 2014-08-21; 1 file changed, -5/+2)
  This is the final step patch toward transforming:
    udiv r0, r0, r2
    udiv r1, r1, r3
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
  into:
    udiv r0, r0, r2
    udiv r1, r1, r3
    bx lr
  Indeed, thanks to this patch, this optimization is able to look through
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
  and is able to rewrite the following sequence:
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
  into simple generic GPR copies that the coalescer managed to remove.
  <rdar://problem/12702965>
  llvm-svn: 216144
* Lower thumbv4t & thumbv5 lo->lo copies through a push-pop sequence
  (Jonathan Roelofs, 2014-08-20; 2 files changed, -2/+43)
  On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need
  to be avoided. This patch trades simplicity for implementation time at the
  expense of performance... As they say: correctness first, then performance.
  See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a
  few ideas on how to make this better.
  llvm-svn: 216138
* Don't prevent a vselect of constants from becoming a single load (PR20648).
  (Sanjay Patel, 2014-08-20; 1 file changed, -0/+10)
  Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648
  This patch checks the operands of a vselect to see if all values are
  constants. If yes, bail out of any further attempts to create a blend or
  shuffle because SelectionDAGLegalize knows how to turn this kind of vselect
  into a single load. This already happens for machines without SSE4.1, so the
  added checks just send more targets down that path.
  Differential Revision: http://reviews.llvm.org/D4934
  llvm-svn: 216121
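  A minimal sketch of the protected shape (assumed; the CHECK line is an
  assumption): with every operand of the vselect constant, the result is itself
  a constant vector and is best materialized by one constant-pool load instead
  of a blend:

    define <4 x i32> @vsel_consts() {
    ; CHECK: movaps {{.*}}, %xmm0
      %sel = select <4 x i1> <i1 true, i1 false, i1 true, i1 false>,
                    <4 x i32> <i32 1, i32 2, i32 3, i32 4>,
                    <4 x i32> <i32 5, i32 6, i32 7, i32 8>
      ret <4 x i32> %sel
    }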
* X86: Add missing triples from r216119
  (Duncan P. N. Exon Smith, 2014-08-20; 1 file changed, -2/+2)
  llvm-svn: 216120
* X86: Align the stack on word boundaries in LowerFormalArguments()
  (Duncan P. N. Exon Smith, 2014-08-20; 1 file changed, -0/+30)
  The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
  correctly. The controlling sentence is, "The size of each argument gets
  rounded up to eightbytes. Therefore the stack will always be eightbyte
  aligned." The equivalent sentence in the i386 ABI, page 37, says, "At all
  times, the stack pointer should point to a word-aligned area."
  For both architectures, the stack pointer is not being rounded up to the
  nearest eightbyte or word between the last normal argument and the first
  variadic argument.
  Patch by Thomas Jablin!
  llvm-svn: 216119
* Do not insert a tail call when returning multiple values on X86
  (Keno Fischer, 2014-08-20; 1 file changed, -0/+16)
  Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530. The problem
  is that X86ISelLowering erroneously thought the third call was eligible for
  tail call elimination. It would have been if its return value was actually
  the one returned by the calling function, but here that is not the case and
  additional values are being returned.
  Test Plan: Test case from the original bug report is included.
  Reviewers: rafael
  Reviewed By: rafael
  Subscribers: rafael, llvm-commits
  Differential Revision: http://reviews.llvm.org/D4968
  llvm-svn: 216117
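  A minimal sketch of the pattern (assumed; the bug's own test case is the one
  in the commit): the caller returns the callee's two values plus a third, so
  the call must not become a tail call:

    declare { i64, i64 } @callee()

    define { i64, i64, i64 } @caller() {
      ; not a tail call: the caller returns more values than the callee
      %p = call { i64, i64 } @callee()
      %a = extractvalue { i64, i64 } %p, 0
      %b = extractvalue { i64, i64 } %p, 1
      %r0 = insertvalue { i64, i64, i64 } undef, i64 %a, 0
      %r1 = insertvalue { i64, i64, i64 } %r0, i64 %b, 1
      %r2 = insertvalue { i64, i64, i64 } %r1, i64 0, 2
      ret { i64, i64, i64 } %r2
    }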
* critical-anti-dependency breaker: don't use reg def info from kill insts
  (PR20308)
  (Sanjay Patel, 2014-08-20; 1 file changed, -0/+28)
  In PR20308 (http://llvm.org/bugs/show_bug.cgi?id=20308), the
  critical-anti-dependency breaker caused a miscompile because it broke a WAR
  hazard using a register that it thinks is available based on info from a
  kill inst. Until PR18663 is solved, we shouldn't use any def/use info from a
  kill because they are really just nops.
  This patch adds guard checks for kills around calls to ScanInstruction()
  where the DefIndices array is set. For good measure, add an assert in
  ScanInstruction() so we don't hit this bug again.
  The test case is a reduced version of the code from the bug report.
  Differential Revision: http://reviews.llvm.org/D4977
  llvm-svn: 216114
* [PeepholeOptimizer] Refactor the advanced copy optimization to take
  advantage of the isRegSequence property.
  (Quentin Colombet, 2014-08-20; 1 file changed, -0/+39)
  This is a follow-up of r215394 and r215404, which respectively introduce the
  isRegSequence property and use it for ARM.
  Thanks to the property introduced by the previous commits, this patch is
  able to optimize the following sequence:
    vmov d0, r2, r3
    vmov d1, r0, r1
    vmov r0, s0
    vmov r1, s2
    udiv r0, r1, r0
    vmov r1, s1
    vmov r2, s3
    udiv r1, r2, r1
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
  into:
    udiv r0, r0, r2
    udiv r1, r1, r3
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
  This patch refactors how the copy optimizations are done in the peephole
  optimizer. Prior to this patch, we had one copy-related optimization that
  replaced a copy or bitcast by a generic, more suitable (in terms of register
  file) copy. With this patch, the peephole optimizer features two
  copy-related optimizations:
  1. One for rewriting generic copies to generic copies:
     PeepholeOptimizer::optimizeCoalescableCopy.
  2. One for replacing non-generic copies with generic copies:
     PeepholeOptimizer::optimizeUncoalescableCopy.
  The goals of these two optimizations are slightly different: the first
  rewrites the operand of the instruction (#1), while the second kills off the
  non-generic instruction and replaces it with a (sequence of) generic
  instruction(s).
  Both optimizations rely on the ValueTracker introduced in r212100. The
  ValueTracker has been refactored to use the information from the
  TargetInstrInfo for non-generic instructions. As part of the refactoring, we
  switched the tracking from the index of the definition to the actual
  register (virtual or physical). This one change is to provide better
  consistency with register-related APIs and to ease the use of the
  TargetInstrInfo.
  Moreover, this patch introduces a new helper class CopyRewriter used to ease
  the rewriting of generic copies (i.e., #1).
  Finally, this patch adds a dead code elimination pass right after the
  peephole optimizer to get rid of dead code that may appear after rewriting.
  This is related to <rdar://problem/12702965>.
  Review: http://reviews.llvm.org/D4874
  llvm-svn: 216088
* [FastISel][AArch64] Don't fold the sign-/zero-extend from i1 into the
  compare.
  (Juergen Ributzka, 2014-08-20; 1 file changed, -0/+11)
  This fixes a bug I introduced in a previous commit (r216033).
  Sign-/zero-extension from i1 cannot be folded into the ADDS/SUBS
  instructions. Instead both operands have to be sign-/zero-extended with
  separate instructions.
  Related to <rdar://problem/17913111>.
  llvm-svn: 216073
* Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type
  legalization stage.
  (Jiangning Liu, 2014-08-20; 2 files changed, -12/+266)
  With those two optimizations, fewer signed/zero extension instructions are
  inserted, which exposes more opportunities to the Machine CSE pass in the
  back-end.
  llvm-svn: 216066
* [x32] Fix FrameIndex check in SelectLEA64_32Addr
  (Pavel Chupin, 2014-08-20; 5 files changed, -2/+71)
  Summary: Fixes http://llvm.org/bugs/show_bug.cgi?id=20016, reproducible on
  the new lea-5.ll case. Also use RSP/RBP for x32 lea to save the 1 byte used
  for the 0x67 prefix in the ESP/EBP case.
  Test Plan: lea tests modified to include x32/nacl and new test added
  Reviewers: nadav, dschuff, t.p.northover
  Subscribers: llvm-commits, zinovy.nis
  Differential Revision: http://reviews.llvm.org/D4929
  llvm-svn: 216065
* ARM: Fix codegen for rbit intrinsic
  (Yi Kong, 2014-08-20; 2 files changed, -0/+40)
  LLVM generates an illegal `rbit r0, #352` instruction for the rbit
  intrinsic. According to the ARM ARM, rbit only takes a register as argument,
  not an immediate; the correct form is rbit <Rd>, <Rm>. The bug was
  originally introduced in r211057.
  Differential Revision: http://reviews.llvm.org/D4980
  llvm-svn: 216064
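  A minimal sketch of the fixed lowering (the CHECK lines are assumptions):
  the constant argument must first be materialized into a register:

    declare i32 @llvm.arm.rbit(i32)

    define i32 @rbit_const() {
    ; CHECK: mov{{w?}} r0, #352
    ; CHECK: rbit r0, r0
      ; rbit has no immediate form, so the operand must live in a register
      %r = call i32 @llvm.arm.rbit(i32 352)
      ret i32 %r
    }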
* [FastISel][AArch64] Use the proper FMOV instruction to materialize a +0.0.
  (Juergen Ributzka, 2014-08-20; 1 file changed, -1/+1)
  Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register
  class to be used with the zero register. This makes the MachineInstruction
  verifier happy again.
  This is related to <rdar://problem/18027157>.
  llvm-svn: 216040