path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
* [WinEH] Teach AsmPrinter about funclets (David Majnemer, 2015-09-29, 10 files, -32/+164)

  Summary: Funclets have been turned into functions by the time they hit the
  object file. Make sure that they have decent names for the symbol table and
  CFI directives explaining how to reason about their prologues.

  Differential Revision: http://reviews.llvm.org/D13261

  llvm-svn: 248824
* [AArch64] Add integer pre- and post-index halfword/byte loads and stores. (Chad Rosier, 2015-09-29, 1 file, -0/+156)

  llvm-svn: 248817
* Arguments spilled on the stack before a function call may have (Jeroen Ketema, 2015-09-29, 1 file, -0/+40)

  alignment requirements, for example in the case of vectors. These
  requirements are exploited by the code generator by using move instructions
  that have similar alignment requirements, e.g., movaps on x86.

  Although the code generator properly aligns the arguments with respect to
  the displacement of the stack pointer it computes, the displacement itself
  may cause misalignment. For example if we have

    %3 = load <16 x float>, <16 x float>* %1, align 64
    call void @bar(<16 x float> %3, i32 0)

  the x86 back-end emits:

    movaps  32(%ecx), %xmm2
    movaps  (%ecx), %xmm0
    movaps  16(%ecx), %xmm1
    movaps  48(%ecx), %xmm3
    subl    $20, %esp      <-- if %esp was 16-byte aligned before this
                               instruction, it no longer will be afterwards
    movaps  %xmm3, (%esp)  <-- movaps requires 16-byte alignment, while
                               %esp is not aligned as such
    movl    $0, 16(%esp)
    calll   __bar

  To solve this, we need to make sure that the computed value with which the
  stack pointer is changed is a multiple of the maximal alignment seen during
  its computation. With this change we get proper alignment:

    subl    $32, %esp
    movaps  %xmm3, (%esp)

  Differential Revision: http://reviews.llvm.org/D12337

  llvm-svn: 248786
* [WebAssembly] Rename test files to match platform naming conventions. (Dan Gohman, 2015-09-29, 4 files, -0/+0)

  llvm-svn: 248783
* [WinEH] Fix ip2state table emission with funclets (Reid Kleckner, 2015-09-28, 2 files, -33/+82)

  Previously we were hijacking the old LandingPadInfo data structures to
  communicate our state numbers. Now we don't need that anymore.

  llvm-svn: 248763
* AMDGPU: Fix splitting x16 SMRD loads (Matt Arsenault, 2015-09-28, 1 file, -0/+43)

  When used recursively, this would set the kill flag on the intermediate
  step from first splitting x16 to x8.

  llvm-svn: 248741
* AMDGPU: Fix moving SMRD loads with literal offsets on CI (Matt Arsenault, 2015-09-28, 1 file, -1/+98)

  llvm-svn: 248740
* AMDGPU: Add testcases (Matt Arsenault, 2015-09-28, 1 file, -0/+119)

  Make sure we are testing moving users of the moved and split SMRD loads.

  llvm-svn: 248738
* AMDGPU: Cleanup test (Matt Arsenault, 2015-09-28, 1 file, -82/+83)

  Run instnamer on it, and rename the check prefix. This is in preparation
  for adding new testcases to cover bugs on other subtargets.

  llvm-svn: 248737
* [WebAssembly] Support for direct call and call_indirect. (Dan Gohman, 2015-09-28, 2 files, -32/+45)

  llvm-svn: 248716
* [DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking (Hal Finkel, 2015-09-28, 1 file, -0/+41)

  When AA is being used, non-aliasing stores are canonicalized to use the
  same chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take
  advantage of this by looking only at users of a store's chain operand.
  However, user iteration is not result-number specific; we need to check
  that the use is as a chain operand, and not via some other operand. It is
  certainly possible to have another potentially-aliasing store which shares
  the first's base pointer and uses the first's chain's node via some other
  operand.

  Failure to catch this situation caused, at least in the included test case,
  an assert later because the relative sequence-number ordering caused later
  replacement to create a cycle in the DAG.

  llvm-svn: 248698
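  A sketch of the corrected walk (illustrative only; names follow the
  SelectionDAG C++ API of the period, not the exact committed code):

    SDValue Chain = St->getChain();
    for (SDNode *U : Chain->uses()) {
      if (StoreSDNode *OtherST = dyn_cast<StoreSDNode>(U)) {
        // Only treat the user as a candidate if it consumes Chain as its
        // chain operand (same node *and* same result number), not via some
        // other operand such as an aliasing value.
        if (OtherST->getChain() != Chain)
          continue;
        // ... consider OtherST as a merge candidate ...
      }
    }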
* [EH] Create removeUnwindEdge utility (Joseph Tremoulet, 2015-09-27, 1 file, -0/+31)

  Summary: Factor the code that rewrites invokes to calls and rewrites WinEH
  terminators to their "unwind to caller" equivalents into a helper in
  Utils/Local, and use it in the three places I'm aware of that need to do
  this.

  Reviewers: andrew.w.kaylor, majnemer, rnk

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D13152

  llvm-svn: 248677
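  The invoke-to-call half of that rewrite amounts to something like this
  sketch (illustrative; the real helper is the one added to Utils/Local):

    void rewriteInvokeToCall(InvokeInst *II) {
      // An invoke's final three operands are the callee and its two
      // destination blocks; everything before them is the argument list.
      SmallVector<Value *, 8> Args(II->op_begin(), II->op_end() - 3);
      CallInst *CI = CallInst::Create(II->getCalledValue(), Args, "", II);
      CI->takeName(II);
      CI->setCallingConv(II->getCallingConv());
      CI->setAttributes(II->getAttributes());
      II->replaceAllUsesWith(CI);
      // The unwind edge disappears; fall through to the normal destination.
      BranchInst::Create(II->getNormalDest(), II);
      II->eraseFromParent();
    }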
* AMDGPU: Fix sched model for VOP2b instructions (Matt Arsenault, 2015-09-26, 1 file, -2/+3)

  Trying to use the version with the explicit output operand would complain
  because of the missing WriteSALU. I'm not sure why it doesn't complain
  about this with the implicit VCC def.

  llvm-svn: 248646
* [WebAssembly] Rename several functions and types according to the new spec. (Dan Gohman, 2015-09-26, 4 files, -34/+34)

  llvm-svn: 248644
* [ARM] Don't generate clrex for pre-v7 targets. (Ahmed Bougacha, 2015-09-26, 1 file, -1/+32)

  Since r248294, we emit clrex, but it doesn't exist on v6.

  llvm-svn: 248640
* Use fixed-point representation for BranchProbability. (Cong Hou, 2015-09-25, 1 file, -1/+3)

  BranchProbability is currently represented by its numerator and
  denominator, both of uint32_t type. This patch changes this representation
  into a fixed point that is represented by the numerator in uint32_t type
  and a constant denominator 1<<31. This is quite similar to the
  representation of BlockMass in BlockFrequencyInfoImpl.h.

  There are several pros and cons of this change:

  Pros:

  1. It uses only half the space of the current one.
  2. Some operations are much faster, such as addition, subtraction,
     comparison, and scaling by an integer.

  Cons:

  1. Constructing a probability from an arbitrary numerator and denominator
     needs additional calculations.
  2. It is a little less precise than before as we use a fixed denominator.
     For example, 1 - 1/3 may not be exactly identical to 2/3 (this will
     lead to many BranchProbability unit test failures). This should not
     matter when we only use it for branch probability. If we use it like a
     rational value for some precise calculations we may need another
     construct like ValueRatio.

  One important reason for this change is that we propose to store branch
  probabilities instead of edge weights in MachineBasicBlock. We also want
  clients to use probability instead of weight when adding successors to a
  MBB. The current BranchProbability has more space which may be a concern.

  Differential Revision: http://reviews.llvm.org/D12603

  llvm-svn: 248633
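  A minimal sketch of the fixed-point scheme (not the actual class):

    #include <cstdint>

    class Probability {
      static const uint32_t D = 1u << 31; // constant implied denominator
      uint32_t N;                         // p = N / D
    public:
      // Con: an arbitrary Num/Den pair needs a 64-bit rescale and may round.
      Probability(uint32_t Num, uint32_t Den)
          : N((uint64_t)Num * D / Den) {}
      // Pro: comparison (and likewise add/sub/scale) is one integer op.
      bool operator<(Probability RHS) const { return N < RHS.N; }
    };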
* SelectionDAGDumper: Print simple operands inline. (Matthias Braun, 2015-09-25, 1 file, -6/+5)

  Print simple operands inline instead of their pointer/value number. Simple
  operands are SDNodes without predecessors like Constant(FP), Register,
  UNDEF. This unifies the behaviour with dumpr() which was already doing
  this.

  Previously:

    t0: ch = EntryToken
    t1: i64 = Register %vreg0
    t2: i64,ch = CopyFromReg t0, t1
    t3: i64 = Constant<1>
    t4: i64 = add t2, t3
    t5: i64 = Constant<2>
    t6: i64 = add t2, t5
    t10: i64 = undef
    t11: i8,ch = load t0, t2, t10<LD1[%tmp81]>
    t12: i8,ch = load t0, t4, t10<LD1[%tmp10]>
    t13: i8,ch = load t0, t6, t10<LD1[%tmp12]>

  Now:

    t0: ch = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
    t4: i64 = add t2, Constant:i64<1>
    t6: i64 = add t2, Constant:i64<2>
    t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64
    t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64
    t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64

  Differential Revision: http://reviews.llvm.org/D12567

  llvm-svn: 248628
* merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711) (Sanjay Patel, 2015-09-25, 2 files, -4/+32)

  This is a redo of D7208 (r227242 -
  http://llvm.org/viewvc/llvm-project?view=revision&revision=227242).

  The patch was reverted because an AArch64 target could infinite loop after
  the change in DAGCombiner to merge vector stores. That happened because
  AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It
  reported all unaligned memory accesses as fast, but then split some
  128-bit unaligned accesses up in performSTORECombine() because they are
  slow.

  This patch attempts to fix the problem in AArch64's
  allowsMisalignedMemoryAccesses() while preserving existing (perhaps
  questionable) lowering behavior.

  The x86 test shows that store merging is working as intended for a target
  with fast 32-byte unaligned stores.

  Differential Revision: http://reviews.llvm.org/D12635

  llvm-svn: 248622
* PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint (Matthias Braun, 2015-09-25, 1 file, -0/+166)

  The algorithm would not modify the live-in list of blocks below the save
  point, which is correct unless the save point happens to be a restore
  point at the same time. Also fixes the benign issue of live-in registers
  being added twice in some cases.

  The testcase is based on a test submitted by Kit Barton.

  Differential Revision: http://reviews.llvm.org/D13176

  llvm-svn: 248620
* AMDGPU/SI: Use .hsatext section instead of .text for HSA (Tom Stellard, 2015-09-25, 1 file, -0/+13)

  Reviewers: arsenm, grosbach, rafael

  Subscribers: arsenm, llvm-commits

  Differential Revision: http://reviews.llvm.org/D12424

  llvm-svn: 248619
* PeepholeOptimizer: Remove redundant copies (Matt Arsenault, 2015-09-25, 2 files, -27/+38)

  If a virtual register is copied and another copy was already seen, replace
  it with the previous copy. This only handles the simplest cases for now.

  This pattern shows up from various operand restrictions AMDGPU has which
  require inserting copies depending on the register class of the operands.

  llvm-svn: 248611
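  A sketch of the idea (an illustrative helper, not the pass's actual code):

    #include "llvm/ADT/DenseMap.h"
    #include "llvm/CodeGen/MachineInstr.h"
    #include "llvm/CodeGen/MachineRegisterInfo.h"
    #include "llvm/Target/TargetRegisterInfo.h"
    using namespace llvm;

    // Track the first COPY seen out of each virtual register; when an
    // identical copy shows up later, rewrite its users to the earlier
    // definition and delete it.
    static bool foldRedundantCopy(MachineInstr *MI, MachineRegisterInfo &MRI,
                                  DenseMap<unsigned, MachineInstr *> &Seen) {
      if (!MI->isCopy())
        return false;
      unsigned Src = MI->getOperand(1).getReg();
      unsigned Def = MI->getOperand(0).getReg();
      if (!TargetRegisterInfo::isVirtualRegister(Src) ||
          !TargetRegisterInfo::isVirtualRegister(Def))
        return false;
      MachineInstr *Prev = Seen.lookup(Src);
      if (!Prev) {
        Seen[Src] = MI;
        return false;
      }
      unsigned PrevDef = Prev->getOperand(0).getReg();
      // Simplest case only: both copies must define the same register class.
      if (MRI.getRegClass(Def) != MRI.getRegClass(PrevDef))
        return false;
      MRI.replaceRegWith(Def, PrevDef);
      MI->eraseFromParent();
      return true;
    }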
* AMDGPU: Add some more tests for literal operands (Matt Arsenault, 2015-09-25, 2 files, -6/+231)

  llvm-svn: 248600
* [AArch64] Add support for generating pre- and post-index load/store pairs. (Chad Rosier, 2015-09-25, 2 files, -4/+128)

  llvm-svn: 248593
* AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG (Matt Arsenault, 2015-09-25, 1 file, -0/+19)

  This fixes a select error when the i64 source was also bitcast to v2i32 in
  the original source. Instead of awkwardly trying to select the modified
  source value and the store, replace it before isel begins.

  Uses a worklist to avoid possible problems from mutating the DAG, although
  it seems to work OK without it.

  llvm-svn: 248589
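  The worklist pattern referred to above, as an illustrative fragment
  (assumed to sit inside a PreprocessISelDAG-style hook; details are
  assumptions, not the committed code):

    SmallVector<SDNode *, 8> Worklist;
    for (SDNode &Node : CurDAG->allnodes())
      Worklist.push_back(&Node);
    // Mutate only after the walk, so node creation and deletion cannot
    // invalidate the allnodes iteration above.
    for (SDNode *Node : Worklist) {
      // ... rewrite qualifying i64 loads/stores to v2i32 plus bitcast ...
    }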
* AMDGPU: Improve accuracy of instruction rates for VOPC (Matt Arsenault, 2015-09-25, 2 files, -8/+10)

  These were all using the default 32-bit VALU write class, but the i64/f64
  compares are half rate.

  I'm not sure this is really correct, because they are still using the
  write-to-VALU write class, even though they really write to the SALU.

  llvm-svn: 248582
* ARM: address WoA division limitation (Saleem Abdulrasool, 2015-09-25, 2 files, -36/+49)

  We now emit the compiler-generated divide-by-zero check that was needed
  for the MSVC routines. We construct a pseudo-instruction for the DBZ check
  as the operation requires splitting up the BB. For the 64-bit operations,
  we need to custom expand the node as we need to insert the DBZ check and
  then emit the libcall to the appropriate name. Because this is target
  specific, it seemed better to reproduce the expansion operation from the
  target-agnostic type legalization rather than sink this there to avoid the
  duplication. The division library calls now match MSVC semantically.

  llvm-svn: 248561
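  The semantics being matched, in C-like terms (all names here are
  illustrative, not the actual runtime entry points):

    extern void trap_divide_by_zero(); // hypothetical DBZ trap helper

    int woa_sdiv(int n, int d) {
      if (d == 0)
        trap_divide_by_zero(); // the compiler-generated check
      return n / d;            // lowered to the MSVC division routine
    }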
* [X86][SSE2] Fix zero/any extension shuffles that don't start from the first element (Simon Pilgrim, 2015-09-24, 1 file, -2/+2)

  Fix for D12561 - we weren't correctly ensuring that the base element for
  extension was moved to start on a boundary suitable for UNPCKL/H.

  llvm-svn: 248536
* AMDGPU: Add s_dcache_* instructions (Matt Arsenault, 2015-09-24, 4 files, -0/+112)

  llvm-svn: 248533
* AMDGPU: Add cache invalidation instructions. (Matt Arsenault, 2015-09-24, 3 files, -0/+46)

  These are necessary for implementing mem_fence for OpenCL 2.0. The VI
  assembler tests are disabled since it seems to be using the wrong encoding
  or opcode.

  llvm-svn: 248532
* Regression Test: Deletes redundant/invalid test. (Mohammad Shahid, 2015-09-24, 1 file, -242/+0)

  Removes the absdiff_expand.ll regression test file, which is invalid.

  Differential Revision: http://reviews.llvm.org/D11678

  llvm-svn: 248493
* Codegen: Fix llvm.*absdiff semantic. (Mohammad Shahid, 2015-09-24, 2 files, -0/+210)

  Fixes the overflow case of the llvm.*absdiff intrinsics and updates the
  tests and LangRef.rst accordingly.

  Differential Revision: http://reviews.llvm.org/D11678

  llvm-svn: 248483
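  The intended scalar semantics, as a sketch (an assumption drawn from the
  description above: the unsigned variant must not wrap on the difference):

    #include <cstdint>

    uint32_t uabsdiff(uint32_t a, uint32_t b) {
      return a > b ? a - b : b - a; // always the non-wrapping subtraction
    }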
* Introduce target hook for optimizing register copies (Matt Arsenault, 2015-09-24, 9 files, -124/+200)

  Allow a target to do something other than search for copies that will
  avoid cross register bank copies.

  Implement for SI by only rewriting the most basic copies, so it should
  look through anything like a subregister extract. I'm not entirely
  satisfied with this because it seems like eliminating a reg_sequence that
  isn't fully used should work generically for all targets without them
  having to override something. However, it seems to be tricky to have a
  simple implementation of this without rewriting to invalid kinds of
  subregister copies on some targets.

  I'm not sure if there is currently a generic way to easily check if a
  subregister index would be valid for the current use. The current set of
  TargetRegisterInfo::get*Class functions don't quite behave like I would
  expect (e.g. getSubClassWithSubReg returns the maximal register class
  rather than the minimal), so I'm not sure how to make the generic test
  keep searching if SrcRC:SrcSubReg is a valid replacement for
  DefRC:DefSubReg.

  Making the default implementation check for simple copies breaks a variety
  of ARM and x86 tests by producing illegal subregister uses. The ARM tests
  are not actually changed since it should still be using the same
  sharesSameRegisterFile implementation; this just relaxes them to not check
  for specific registers.

  llvm-svn: 248478
* AMDGPU: Fix printing trailing whitespace for mubuf atomics (Matt Arsenault, 2015-09-24, 1 file, -10/+10)

  llvm-svn: 248472
* Use new TokenFactor chain when merging stores (Matt Arsenault, 2015-09-24, 1 file, -0/+53)

  If the stores are storing values from loads which partially alias the
  stores, we could end up placing the merged loads and stores on the same
  chain, which has the potential to break. Each store may have a different
  chain dependency on only some of the original loads. Create a new
  TokenFactor to capture all of the required dependencies of the stores
  rather than assuming all stores can use the same chain.

  The testcase is a situation where this happens, although it does not have
  an observable change from this. The DAG nodes just happened to not be
  reordered before despite this missing chain dependency.

  This is based on an off-list report for an out-of-tree target which
  regressed due to r246307, and I haven't managed to find a case where the
  nodes do end up reordered with an in-tree target.

  llvm-svn: 248468
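  A sketch of the fix as a SelectionDAG fragment (illustrative; variable
  names are assumptions):

    SmallVector<SDValue, 8> Chains;
    for (StoreSDNode *St : StoresToMerge)
      Chains.push_back(St->getChain());
    // Tie every required dependency together instead of reusing one
    // store's chain for the merged wide store.
    SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chains);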
* AMDGPU: Reduce number of copies emitted (Matt Arsenault, 2015-09-24, 3 files, -16/+10)

  Instead of always inserting a copy in case the super register is itself a
  subregister, only extract to the super reg class if this is actually the
  case.

  This shouldn't really change codegen, but makes the output of
  SIFixSGPRCopies easier to read.

  llvm-svn: 248467
* ARM: fix folding stack adjustment (again again again...) (Tim Northover, 2015-09-23, 1 file, -1/+2)

  This time, the issue is that we weren't accounting for the possibility
  that aligned DPRs could have been stored after the final "push" in a
  prologue. When that happened we effectively moved a "sub sp, #N" from
  below the aligned stores to above them, and everything went to pot.

  To make it worse, I'd actually committed something testing that we
  produced wrong code, so the test update is tiny.

  llvm-svn: 248437
* Swap loop invariant GEP with loop variant GEP to allow more LICM. (Lawrence Hu, 2015-09-23, 1 file, -0/+50)

  This patch changes the order of GEPs generated by the Splitting GEPs pass:
  specifically, when one of the GEPs has a constant offset and the base is
  loop invariant, we generate the GEP with the constant first when
  beneficial, to expose more cases for LICM.

  If originally Splitting GEPs generated the following:

    do.body.i:
      %idxprom.i = sext i32 %shr.i to i64
      %2 = bitcast %typeD* %s to i8*
      %3 = shl i64 %idxprom.i, 2
      %uglygep = getelementptr i8, i8* %2, i64 %3
      %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
      ...

  Now it generates:

    do.body.i:
      %idxprom.i = sext i32 %shr.i to i64
      %2 = bitcast %typeD* %s to i8*
      %3 = shl i64 %idxprom.i, 2
      %uglygep = getelementptr i8, i8* %2, i64 1032
      %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3
      ...

  For no-loop cases, the original way of generating GEPs seems to expose
  more CSE cases, so we don't change the logic for no-loop cases, and only
  limit our change to the specific case we are interested in.

  llvm-svn: 248420
* [x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars (Sanjay Patel, 2015-09-23, 1 file, -4/+1)

  Turn this:

    movd  %xmm0, %eax
    movd  %xmm1, %ecx
    xorl  %eax, %ecx
    movd  %ecx, %xmm0

  into this:

    xorps %xmm1, %xmm0

  This is related to, but does not solve:
  https://llvm.org/bugs/show_bug.cgi?id=22428

  This is an extension of:
  http://reviews.llvm.org/rL248395

  llvm-svn: 248415
* [x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars (Sanjay Patel, 2015-09-23, 1 file, -4/+1)

  Turn this:

    movd  %xmm0, %eax
    movd  %xmm1, %ecx
    orl   %eax, %ecx
    movd  %ecx, %xmm0

  into this:

    orps  %xmm1, %xmm0

  This is related to, but does not solve:
  https://llvm.org/bugs/show_bug.cgi?id=22428

  This is an extension of:
  http://reviews.llvm.org/rL248395

  llvm-svn: 248409
* [x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars (Sanjay Patel, 2015-09-23, 1 file, -11/+4)

  Turn this:

    movd  %xmm0, %eax
    movd  %xmm1, %ecx
    andl  %eax, %ecx
    movd  %ecx, %xmm0

  into this:

    andps %xmm1, %xmm0

  This is related to, but does not solve:
  https://llvm.org/bugs/show_bug.cgi?id=22428

  Differential Revision: http://reviews.llvm.org/D13065

  llvm-svn: 248395
* [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR (Simon Pilgrim, 2015-09-23, 6 files, -180/+132)

  This patch removes the x86.sse41.pmovsx* intrinsics, provides a suitable
  upgrade path, and updates relevant tests to sign extend a subvector
  instead.

  LLVM counterpart to D12835.

  Differential Revision: http://reviews.llvm.org/D13002

  llvm-svn: 248368
* Add a test case for the fix of profile update issue when lowering switch statement. (Cong Hou, 2015-09-23, 1 file, -0/+52)

  llvm-svn: 248356
* LiveIntervalAnalysis: Avoid multiple connected liveness components (Matthias Braun, 2015-09-22, 1 file, -1/+0)

  We may have subregister defs which are unused but not discovered and
  cleaned up prior to liveness analysis. This creates multiple connected
  components in the resulting live range, which are forbidden in the
  MachineVerifier because they would unnecessarily constrain the register
  allocator. Rewrite those dead definitions to define a newly created
  virtual register.

  Differential Revision: http://reviews.llvm.org/D13035

  llvm-svn: 248335
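  The rewrite in miniature (an illustrative fragment; the real pass first
  decides connectedness from the computed live range):

    // Give the stray dead def its own register so the live range computed
    // for Reg stays a single connected component.
    unsigned Reg = DeadDef.getReg();
    unsigned NewVReg = MRI->createVirtualRegister(MRI->getRegClass(Reg));
    DeadDef.setReg(NewVReg);
    DeadDef.setIsDead(true);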
* [ARM] Emit clrex in the expanded cmpxchg fail block. (Ahmed Bougacha, 2015-09-22, 5 files, -69/+138)

  ARM counterpart to r248291:

  In the comparison failure block of a cmpxchg expansion, the initial
  ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64,
  this unnecessarily ties up the execution monitor, which might have a
  negative performance impact on some uarchs.

  Instead, release the monitor in the failure block. The clrex instruction
  was designed for this: use it.

  Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable
  memory locations".

  Differential Revision: http://reviews.llvm.org/D13033

  llvm-svn: 248294
* [AArch64] Emit clrex in the expanded cmpxchg fail block. (Ahmed Bougacha, 2015-09-22, 2 files, -33/+53)

  In the comparison failure block of a cmpxchg expansion, the initial
  ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64,
  this unnecessarily ties up the execution monitor, which might have a
  negative performance impact on some uarchs.

  Instead, release the monitor in the failure block. The clrex instruction
  was designed for this: use it.

  Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable
  memory locations".

  Differential Revision: http://reviews.llvm.org/D13033

  llvm-svn: 248291
* Don't raise inexact when lowering ceil, floor, round, trunc. (Stephen Canon, 2015-09-22, 7 files, -382/+172)

  The C standard has historically not specified whether or not these
  functions should raise the inexact flag. Traditionally on Darwin, these
  functions *did* raise inexact, and the llvm lowerings followed that
  convention. n1778 (C bindings for IEEE-754 (2008)) clarifies that these
  functions should not set inexact. This patch brings the lowerings for
  arm64 and x86 in line with the newly specified behavior. This also lets us
  fold some logic into TD patterns, which is nice.

  Differential Revision: http://reviews.llvm.org/D12969

  llvm-svn: 248266
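  What the new behaviour means at the source level (a sketch; per n1778 the
  inexact flag must stay clear):

    #include <cfenv>
    #include <cmath>

    bool ceilRaisesInexact() {
      std::feclearexcept(FE_ALL_EXCEPT);
      volatile float r = std::ceil(1.5f); // discards the fraction...
      (void)r;
      return std::fetestexcept(FE_INEXACT) != 0; // ...yet must report false
    }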
* [X86][SSE] Match zero/any extension shuffles that don't start from the first element (Simon Pilgrim, 2015-09-22, 8 files, -163/+237)

  This patch generalizes the lowering of shuffles as zero extensions to
  allow extensions that don't start from the first element. It now
  recognises extensions starting anywhere in the lower 128-bits or at the
  start of any higher 128-bit lane.

  The motivation was to reduce the number of high-cost pshufb calls, but it
  also improves the SSE2 case as well.

  Differential Revision: http://reviews.llvm.org/D12561

  llvm-svn: 248250
* [DAGCombiner] Improve FMA support for interpolation patterns (Simon Pilgrim, 2015-09-21, 3 files, -1/+506)

  This patch adds support for combining patterns such as
  (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA
  equivalents.

  This is useful in particular for linear interpolation cases such as
  (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))).

  Differential Revision: http://reviews.llvm.org/D13003

  llvm-svn: 248210
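  The linear-interpolation shape being targeted, in scalar form (a sketch of
  the pattern, not of the combine itself):

    // lerp(x, y, t) = x*t + y*(1 - t), i.e.
    // FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t))); with this change the
    // FSUB/FMUL halves can fold into FMA-family nodes on FMA targets.
    float lerp(float x, float y, float t) {
      return x * t + y * (1.0f - t);
    }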
* [ARM] Do not scale vext with a factor (Jeroen Ketema, 2015-09-21, 1 file, -0/+11)

  The vext pseudo-instruction takes the number of elements that need to be
  extracted, not the number of bytes. Hence, use the number of elements
  directly instead of scaling them with a factor.

  Reviewers: Silviu Baranga, James Molloy
  (not reflected in the differential revision)

  Differential Revision: http://reviews.llvm.org/D12974

  llvm-svn: 248208
* [SystemZ] Fix expansion of ISD::FPOW and ISD::FSINCOS (Ulrich Weigand, 2015-09-21, 2 files, -0/+329)

  The ISD::FPOW and ISD::FSINCOS opcodes default to Legal, but there is no
  legal instruction for those on SystemZ. This could cause LLVM internal
  errors. Fixed by setting the operation action to Expand for those opcodes.

  Also added test cases for all other LLVM IR intrinsics that should
  generate a library call. (Those already work correctly since the default
  operation action is fine.)

  llvm-svn: 248180
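  The shape of the fix, as a fragment from a target's TargetLowering
  constructor (illustrative; the committed code sets these per affected type
  in SystemZISelLowering):

    for (MVT VT : MVT::fp_valuetypes()) {
      // No native SystemZ instruction for these: force libcall expansion.
      setOperationAction(ISD::FPOW,    VT, Expand);
      setOperationAction(ISD::FSINCOS, VT, Expand);
    }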