path: root/llvm/test/CodeGen
* Do not verify dominator tree if it has no roots
  Serge Pavlov | 2017-01-25 | 1 file | -1/+1

  If the dominator tree has no roots, the pass that calculates it is likely
  to be skipped. This occurs, for instance, for entities with
  available_externally linkage. Do not run tree verification in such cases.

  Differential Revision: https://reviews.llvm.org/D28767
  llvm-svn: 293033
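  For illustration, a minimal .ll sketch of such an entity (hypothetical,
  not taken from the commit's test):

      ; The body of an available_externally function may be dropped by the
      ; compiler, so passes over it (and their verification) can be skipped.
      define available_externally i32 @inc(i32 %x) {
        %r = add i32 %x, 1
        ret i32 %r
      }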
* AMDGPU: Check nsz instead of unsafe math
  Matt Arsenault | 2017-01-25 | 2 files | -3/+3

  llvm-svn: 293028
* DAG: Recognize no-signed-zeros-fp-math attribute
  Matt Arsenault | 2017-01-25 | 3 files | -1/+81

  clang already emits this with -cl-no-signed-zeros, but codegen doesn't do
  anything with it. Treat it like the other fast-math attributes, and change
  one place to use it.

  llvm-svn: 293024
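  A sketch of IR carrying the attribute (hypothetical example; the attribute
  spelling is the string form LLVM uses for function-level fast-math flags):

      ; With no signed zeros, (fsub 0.0, x) can be treated as a plain negation.
      define float @neg(float %x) #0 {
        %r = fsub float 0.0, %x
        ret float %r
      }

      attributes #0 = { "no-signed-zeros-fp-math"="true" }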
* DAGCombiner: Allow negating ConstantFP after legalize
  Matt Arsenault | 2017-01-25 | 1 file | -2/+1

  llvm-svn: 293019
* AMDGPU: Implement early ifcvt target hooks.
  Matt Arsenault | 2017-01-25 | 4 files | -2/+567

  Leave early ifcvt disabled for now since there are some shader-db
  regressions. This causes some immediate improvements, but could be better.
  The pass's cost checking is based on critical-path length for out-of-order
  CPUs, which we do not want, so it skips many cases we do want.

  llvm-svn: 293016
* [GlobalISel] Generate selector for more integer binop patterns.
  Ahmed Bougacha | 2017-01-25 | 1 file | -2/+2

  This surprisingly isn't NFC, because there are patterns to select GPR sub
  to SUBSWrr (rather than SUBWrr/rs); SUBS is later optimized to SUB if NZCV
  is dead. From ISel's perspective, both are fine.

  llvm-svn: 293010
* GlobalISel: Use the correct types when translating landingpad instructions
  Justin Bogner | 2017-01-25 | 2 files | -2/+46

  There was a bug here where we were using p0 instead of s32 for the selector
  type in the landingpad. Instead of hardcoding these types, we should get
  the types from the landingpad instruction directly.

  Note that we replicate an assert from SDAG here to only support two-valued
  landingpads.

  llvm-svn: 292995
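  A minimal two-valued landingpad, the case the commit supports (a sketch in
  the typed-pointer IR syntax of the era; function names are hypothetical):

      define i32 @f() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
      entry:
        invoke void @may_throw() to label %cont unwind label %lpad
      cont:
        ret i32 0
      lpad:
        ; Two values: an i8* exception pointer (p0) and an i32 selector (s32).
        %lp = landingpad { i8*, i32 } cleanup
        ret i32 1
      }

      declare void @may_throw()
      declare i32 @__gxx_personality_v0(...)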
* AMDGPU: Remove spurious out branches after a kill
  Matt Arsenault | 2017-01-24 | 1 file | -0/+40

  A sequence like this:

      v_cmpx_le_f32_e32 vcc, 0, v0
      s_branch BB0_30
      s_cbranch_execnz BB0_30
  ; BB#29:
      exp null off, off, off, off done vm
      s_endpgm
  BB0_30:                        ; %endif110

  is likely wrong. The s_branch instruction will unconditionally jump to
  BB0_30, and the skip block (exp done + endpgm) inserted for performing the
  kill instruction will never be executed. This results in a GPU hang with
  Star Ruler 2.

  The s_branch instruction is added during the "Control Flow Optimizer" pass,
  which seems to re-organize the basic blocks, while we assume that
  SI_KILL_TERMINATOR is always the last instruction inside a basic block.
  Thus, after inserting a skip block we just go to the next BB without
  looking at the subsequent instructions after the kill, and the s_branch op
  is never removed.

  Instead, we should remove the unconditional out branches and skip the two
  instructions if the exec mask is non-zero.

  This patch fixes the GPU hang and doesn't introduce any regressions with
  "make check".

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019
  Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
  llvm-svn: 292985
* Revert rL292621. Caused some internal build bot failures in Apple.
  Wei Mi | 2017-01-24 | 3 files | -454/+0

  llvm-svn: 292984
* Enable FeatureFlatForGlobal on Volcanic Islands
  Matt Arsenault | 2017-01-24 | 273 files | -377/+374

  This switches to the workaround that HSA defaults to for the mesa path.
  This should be applied to the 4.0 branch.

  Patch by Vedran Miletić <vedran@miletic.net>
  llvm-svn: 292982
* AMDGPU/SI: Give up in promote alloca when a pointer may be captured.
  Changpeng Fang | 2017-01-24 | 1 file | -0/+47

  Differential Revision: http://reviews.llvm.org/D28970
  Reviewer: Matt
  llvm-svn: 292966
* [AMDGPU] Add VGPR copies post regalloc fix pass
  Stanislav Mekhanoshin | 2017-01-24 | 1 file | -0/+44

  Regalloc creates COPY instructions which do not formally use VALU. That
  results in v_mov instructions displaced after exec mask modification. One
  pass that does this is SIOptimizeExecMasking, but potentially it can be
  done by other passes too.

  This patch adds a pass immediately after regalloc to add an implicit exec
  use operand to all VGPR copy instructions.

  Differential Revision: https://reviews.llvm.org/D28874
  llvm-svn: 292956
* [AArch64] Rename 'no-quad-ldst-pairs' to 'slow-paired-128'
  Evandro Menezes | 2017-01-24 | 1 file | -1/+1

  In order to follow the pattern of the existing 'slow-misaligned-128store'
  option, rename the option 'no-quad-ldst-pairs' to 'slow-paired-128'.

  llvm-svn: 292954
* [X86][AVX2] Regenerate test.
  Simon Pilgrim | 2017-01-24 | 1 file | -2/+1

  llvm-svn: 292950
* [X86][AVX2] Removed FIXME comment and regenerated test.
  Simon Pilgrim | 2017-01-24 | 1 file | -5/+1

  The comment talked about replacing vpmovzxwd+vpslld+vpsrad with vpmovsxwd,
  which isn't valid: we're sign-extending an <8 x i1> bool vector, not an
  all-bits/no-bits <8 x i16>.

  llvm-svn: 292948
* [X86][AVX2] Cleaned up test triple and regenerated tests.
  Simon Pilgrim | 2017-01-24 | 1 file | -14/+2

  llvm-svn: 292946
* [SelectionDAG] Handle inverted conditions when splitting into multiple branches.
  Geoff Berry | 2017-01-24 | 3 files | -5/+39

  Summary:
  When conditional branches with complex conditions are split into multiple
  branches in SelectionDAGBuilder::FindMergedConditions, also handle inverted
  conditions. These may sometimes appear without having been optimized by
  InstCombine when CodeGenPrepare decides to sink and duplicate cmp
  instructions, causing them to have only one use. This problem can be made
  worse by e.g. GVNHoist hiding more cmps from InstCombine by combining
  equivalent cmps from different blocks.

  For example, codegen X & !(Y | Z) as:

      jmp_if_X TmpBB
      jmp FBB

  TmpBB:
      jmp_if_notY Tmp2BB
      jmp FBB

  Tmp2BB:
      jmp_if_notZ TBB
      jmp FBB

  Reviewers: bogner, MatzeB, qcolombet
  Subscribers: llvm-commits, hiraditya, mcrosier, sebpop
  Differential Revision: https://reviews.llvm.org/D28380
  llvm-svn: 292944
* [X86][SSE] Add support for constant folding vector arithmetic shift by immediates
  Simon Pilgrim | 2017-01-24 | 3 files | -45/+38

  llvm-svn: 292919
* [X86][SSE] Add support for constant folding vector logical shift by immediates
  Simon Pilgrim | 2017-01-24 | 12 files | -209/+166

  llvm-svn: 292915
* AMDGPU: Add trap handler support.
  Wei Ding | 2017-01-24 | 1 file | -5/+4

  llvm-svn: 292893
* [SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR
  Craig Topper | 2017-01-24 | 4 files | -301/+12

  Summary:
  This teaches getNode to simplify extracting from undef. This is similar to
  what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting
  from CONCAT_VECTORS when we can reuse one of the inputs to the concat.
  These seem like simple non-target-specific optimizations.

  For X86 we currently handle undef in extractSubvector, but not all
  EXTRACT_SUBVECTOR creations go through there.

  Ultimately, my motivation here is to simplify extractSubvector and remove
  custom lowering for EXTRACT_SUBVECTOR, since we don't do anything there but
  handle undef and BUILD_VECTOR optimizations, and those should be DAG
  combines.

  Reviewers: RKSimon, delena
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D29000
  llvm-svn: 292876
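  The concat case in IR terms (a hypothetical sketch; in SDAG the equivalent
  shapes appear as CONCAT_VECTORS feeding EXTRACT_SUBVECTOR):

      define <2 x i32> @extract_lo(<2 x i32> %a, <2 x i32> %b) {
        ; Concatenate %a and %b, then take the low half: foldable to just %a.
        %cat = shufflevector <2 x i32> %a, <2 x i32> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
        %lo = shufflevector <4 x i32> %cat, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
        ret <2 x i32> %lo
      }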
* LiveIntervalAnalysis: Calculate liveness even if a superreg is reserved.
  Matthias Braun | 2017-01-24 | 1 file | -0/+22

  A register unit may be allocatable and non-reserved while some of the
  registers (tuples) built with it are reserved. We still need to calculate
  liveness in this case.

  Note to out-of-tree targets: If you start seeing machine verifier errors
  with this commit, it probably means that you do not properly mark super
  registers of reserved registers as reserved. See for example r292836 or
  r292870 for how to fix that.

  rdar://29996737
  Differential Revision: https://reviews.llvm.org/D28881
  llvm-svn: 292871
* AMDGPU: Custom lower more vector operations
  Matt Arsenault | 2017-01-23 | 4 files | -18/+513

  This avoids stack usage.

  llvm-svn: 292846
* DAG: Don't fold vector extract into load if target doesn't want to
  Matt Arsenault | 2017-01-23 | 1 file | -0/+31

  Fixes turning a 32-bit scalar load into an extending vector load for AMDGPU
  when dynamically indexing a vector.

  llvm-svn: 292842
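  The rough shape of the pattern in IR terms (a hypothetical sketch, not the
  commit's test): a scalar use of one element of a loaded vector, selected
  with a dynamic index, which the DAG combine could previously rewrite into
  an extending vector load:

      define i32 @dyn_extract(<4 x i32>* %p, i32 %i) {
        %v = load <4 x i32>, <4 x i32>* %p
        %e = extractelement <4 x i32> %v, i32 %i
        ret i32 %e
      }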
* [APFloat] Switch from (PPCDoubleDoubleImpl, IEEEdouble) layout to (IEEEdouble, IEEEdouble)
  Tim Shen | 2017-01-23 | 1 file | -1/+1

  Summary:
  This patch changes the layout of DoubleAPFloat, and adjusts all operations
  to do either:

  1) (IEEEdouble, IEEEdouble) -> (uint64_t, uint64_t) -> PPCDoubleDoubleImpl,
     then run the old algorithm.
  2) Do the right thing directly.

  1) includes multiply, divide, remainder, mod, fusedMultiplyAdd,
  roundToIntegral, convertFromString, next, convertToInteger,
  convertFromAPInt, convertFromSignExtendedInteger,
  convertFromZeroExtendedInteger, convertToHexString, toString,
  getExactInverse.

  2) includes makeZero, makeLargest, makeSmallest, makeSmallestNormalized,
  compare, bitwiseIsEqual, bitcastToAPInt, isDenormal, isSmallest, isLargest,
  isInteger, ilogb, scalbn, frexp, hash_value, Profile.

  I could split this into two patches, e.g. use 1) for all operations first,
  then incrementally change some of them to 2). I didn't do that, because 1)
  involves code that converts data between PPCDoubleDoubleImpl and
  (IEEEdouble, IEEEdouble) back and forth, and may pessimize the compiler.
  Instead, I find easy functions and use approach 2) for them directly.

  The next step is to move multiply and divide from 1) to 2). I don't have
  plans for the other functions in 1).

  Differential Revision: https://reviews.llvm.org/D27872
  llvm-svn: 292839
* AMDGPU: Combine fp16/fp64 subtarget features
  Matt Arsenault | 2017-01-23 | 10 files | -45/+90

  The same control register controls both, and they are set to the same
  defaults. Keep the old names around as aliases.

  llvm-svn: 292837
* [AArch64][GlobalISel] Legalize narrow scalar fp->int conversions.
  Ahmed Bougacha | 2017-01-23 | 1 file | -6/+12

  Since we're now avoiding operations using narrow scalar integer types, we
  have to legalize the integer side of the FP conversions. This requires
  teaching the legalizer how to do that.

  llvm-svn: 292828
* [AArch64][GlobalISel] Legalize narrow scalar ops again.
  Ahmed Bougacha | 2017-01-23 | 9 files | -415/+37

  Since r279760, we've been marking as legal operations on narrow integer
  types that have wider legal equivalents (for instance, G_ADD s8). Compared
  to legalizing these operations, this reduced the amount of
  extends/truncates required, but was always a weird legalization decision
  made at selection time.

  So far, we haven't been able to formalize it in a way that permits the
  selector generated from SelectionDAG patterns to be sufficient.

  Using a wide instruction (say, s64), when a narrower instruction exists
  (s32) would introduce register class incompatibilities (when one narrow
  generic instruction is selected to the wider variant, but another is
  selected to the narrower variant). It's also impractical to limit which
  narrow operations are matched for which instruction, as restricting
  "narrow selection" to ranges of types clashes with potentially
  incompatible instruction predicates.

  Concerns were also raised regarding MIPS64's sign-extended register
  assumptions, as well as wrapping behavior. See discussions in
  https://reviews.llvm.org/D26878.

  Instead, legalize the operations.

  Should we ever revert to selecting these narrow operations, we should try
  to represent this more accurately: for instance, by separating a
  "concrete" type on operations, and an "underlying" type on vregs, we could
  move the "this narrow-looking op is really legal" decision to the
  legalizer, and let the selector use the "underlying" vreg type only, which
  would be guaranteed to map to a register class.

  In any case, we eventually should mitigate:
  - the performance impact, by selecting no-op extract/truncates to COPYs
    (which we currently do), and the COPYs to register reuses (which we
    don't do yet);
  - the compile-time impact, by optimizing away extract/truncate sequences
    in the legalizer.

  llvm-svn: 292827
* [ARM] Classification Improvements to ARM Sched-Models. NFCI.
  Javed Absar | 2017-01-23 | 1 file | -0/+69

  This is a series of patches to make adding machine sched-models for ARM
  processors easier and more compact. They define new sched-readwrites for
  groups of ARM instructions. This has been missing so far, and as a
  consequence machine scheduler models for individual sub-targets have
  tended to be larger than they needed to be. The current patch focuses on
  floating-point instructions.

  Reviewers: Diana Picus (rovka), Renato Golin (rengolin)
  Differential Revision: https://reviews.llvm.org/D28194
  llvm-svn: 292825
* DAG: Allow legalization of fcanonicalize vector types
  Matt Arsenault | 2017-01-23 | 1 file | -0/+214

  llvm-svn: 292814
* [X86][SSE] Add missing X86ISD::ANDNP combines.
  Simon Pilgrim | 2017-01-22 | 2 files | -41/+7

  llvm-svn: 292767
* [X86][SSE] Improve shuffle combining with zero insertions
  Simon Pilgrim | 2017-01-22 | 2 files | -59/+30

  Add support for handling shuffles with scalar_to_vector(0).

  llvm-svn: 292766
* [X86][SSE] Regenerate sqrt tests
  Simon Pilgrim | 2017-01-22 | 1 file | -13/+15

  llvm-svn: 292764
* Fix test name. NFCI.
  Simon Pilgrim | 2017-01-22 | 1 file | -6/+6

  llvm-svn: 292763
* Fix some broken CHECK lines.
  Benjamin Kramer | 2017-01-22 | 14 files | -23/+23

  The colon is important.

  llvm-svn: 292761
* [x86] avoid crashing with illegal vector type (PR31672)
  Sanjay Patel | 2017-01-22 | 1 file | -0/+133

  https://llvm.org/bugs/show_bug.cgi?id=31672

  llvm-svn: 292758
* [X86] Don't allow commuting to form phsub operations.
  Craig Topper | 2017-01-21 | 1 file | -4/+29

  Fixes PR31714.

  llvm-svn: 292713
* [X86] Add test cases that show bad commuting being allowed to create a phsub operation.
  Craig Topper | 2017-01-21 | 1 file | -0/+33

  llvm-svn: 292712
* [NVPTX] Add explicit check for llvm.sqrt.f32 to intrinsics.ll.
  Justin Lebar | 2017-01-21 | 1 file | -0/+8

  Test-only change.

  llvm-svn: 292690
* [ValueTracking] recognize variations of 'clamp' to improve codegen (PR31693)
  Sanjay Patel | 2017-01-20 | 1 file | -18/+8

  By enhancing value tracking, we allow an existing min/max canonicalization
  to kick in and improve codegen for several targets that have min/max
  instructions.

  Unfortunately, recognizing min/max in value tracking may cause us to hit a
  hack in InstCombiner::visitICmpInst() more often:
  http://lists.llvm.org/pipermail/llvm-dev/2017-January/109340.html
  ...but I'm hoping we can remove that soon.

  Correctness proofs based on Alive:

      Name: smaxmin
      Pre: C1 < C2
      %cmp2 = icmp slt i8 %x, C2
      %min = select i1 %cmp2, i8 %x, i8 C2
      %cmp3 = icmp slt i8 %x, C1
      %r = select i1 %cmp3, i8 C1, i8 %min
      =>
      %cmp2 = icmp slt i8 %x, C2
      %min = select i1 %cmp2, i8 %x, i8 C2
      %cmp1 = icmp sgt i8 %min, C1
      %r = select i1 %cmp1, i8 %min, i8 C1

      Name: sminmax
      Pre: C1 > C2
      %cmp2 = icmp sgt i8 %x, C2
      %max = select i1 %cmp2, i8 %x, i8 C2
      %cmp3 = icmp sgt i8 %x, C1
      %r = select i1 %cmp3, i8 C1, i8 %max
      =>
      %cmp2 = icmp sgt i8 %x, C2
      %max = select i1 %cmp2, i8 %x, i8 C2
      %cmp1 = icmp slt i8 %max, C1
      %r = select i1 %cmp1, i8 %max, i8 C1

      ----------------------------------------
      Optimization: smaxmin
      Done: 1
      Optimization is correct!
      ----------------------------------------
      Optimization: sminmax
      Done: 1
      Optimization is correct!

      Name: umaxmin
      Pre: C1 u< C2
      %cmp2 = icmp ult i8 %x, C2
      %min = select i1 %cmp2, i8 %x, i8 C2
      %cmp3 = icmp ult i8 %x, C1
      %r = select i1 %cmp3, i8 C1, i8 %min
      =>
      %cmp2 = icmp ult i8 %x, C2
      %min = select i1 %cmp2, i8 %x, i8 C2
      %cmp1 = icmp ugt i8 %min, C1
      %r = select i1 %cmp1, i8 %min, i8 C1

      Name: uminmax
      Pre: C1 u> C2
      %cmp2 = icmp ugt i8 %x, C2
      %max = select i1 %cmp2, i8 %x, i8 C2
      %cmp3 = icmp ugt i8 %x, C1
      %r = select i1 %cmp3, i8 C1, i8 %max
      =>
      %cmp2 = icmp ugt i8 %x, C2
      %max = select i1 %cmp2, i8 %x, i8 C2
      %cmp1 = icmp ult i8 %max, C1
      %r = select i1 %cmp1, i8 %max, i8 C1

      ----------------------------------------
      Optimization: umaxmin
      Done: 1
      Optimization is correct!
      ----------------------------------------
      Optimization: uminmax
      Done: 1
      Optimization is correct!

  llvm-svn: 292660
* AMDGPU/R600: Serialize vector trunc stores to private AS
  Jan Vesely | 2017-01-20 | 1 file | -15/+15

  Add DUMMY_CHAIN SDNode to denote stores of interest.

  Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915
  Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411
  Differential Revision: https://reviews.llvm.org/D27964
  llvm-svn: 292651
* [WebAssembly] Don't create bitcast-wrappers for varargs.
  Dan Gohman | 2017-01-20 | 1 file | -0/+17

  WebAssembly varargs functions use a significantly different ABI than
  non-varargs functions, and the current code in
  WebAssemblyFixFunctionBitcasts doesn't handle that difference. For now,
  just avoid creating wrapper functions in the presence of varargs.

  llvm-svn: 292645
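  The kind of call the pass normally wraps, as a hypothetical IR sketch
  (typed-pointer syntax; names invented for illustration):

      declare void @vararg_callee(...)

      define void @caller() {
        ; A call through a mismatched function pointer type would normally
        ; get a generated wrapper; with a varargs callee it is now left alone.
        call void bitcast (void (...)* @vararg_callee to void (i32)*)(i32 0)
        ret void
      }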
* [x86] add tests to show missed min/max vector codegen (PR31693)
  Sanjay Patel | 2017-01-20 | 1 file | -12/+73

  llvm-svn: 292640
* AArch64LoadStoreOptimizer: Update kill flags when merging stores
  Matthias Braun | 2017-01-20 | 2 files | -133/+132

  Kill flags need to be updated correctly when moving stores up/down to form
  store pair instructions. Those invalid flags have been ignored before, but
  as of r290014 they are recognized when using -mllvm -verify-machineinstrs.

  Also simplifies test/CodeGen/AArch64/ldst-opt-dbg-limit.mir, renames it to
  ldst-opt.mir, and adds new tests for this change.

  Differential Revision: https://reviews.llvm.org/D28875
  llvm-svn: 292625
* [RegisterCoalescing] Recommit the patch "Remove partial redundant copy".
  Wei Mi | 2017-01-20 | 3 files | -0/+454

  The recommit fixes a bug related to the live interval update after the
  partially redundant copy is moved.

  The original patch is to solve the performance problem described in
  PR27827. Register coalescing sometimes cannot remove a copy because of
  interference. But if we can find a reverse copy in one of the predecessor
  blocks of the copy, the copy is partially redundant and we may remove the
  copy partially by moving it to the predecessor block without the reverse
  copy.

  Differential Revision: https://reviews.llvm.org/D28585
  llvm-svn: 292621
* [Thumb] Add support for tMUL in the compare instruction peephole optimizer.
  Sjoerd Meijer | 2017-01-20 | 2 files | -0/+186

  We also want to optimise tests like this: return a*b == 0. The MULS
  instruction is flag-setting, so we don't need the CMP instruction but can
  instead branch on the result of the MULS. The generated instruction
  sequence for this example was: MULS, MOVS, MOVS, CMP. The MOVS instructions
  load the boolean values resulting from the select instruction, but these
  MOVS instructions are flag-setting and were thus preventing this
  optimisation. Now we first reorder and move the MULS to before the CMP and
  generate the sequence MOVS, MOVS, MULS, CMP so that the optimisation can
  trigger. Reordering of the MULS and MOVS is safe to do because the
  subsequent MOVS instructions just set the CPSR register and don't use it,
  i.e. the CPSR is dead.

  Differential Revision: https://reviews.llvm.org/D27990
  llvm-svn: 292608
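  The source pattern in question, roughly return a*b == 0, as an IR sketch
  (hypothetical, not taken from the commit's tests):

      define i32 @mul_is_zero(i32 %a, i32 %b) {
        %m = mul i32 %a, %b
        ; The comparison against zero can branch on the flags already set by
        ; the flag-setting MULS, making a separate CMP unnecessary.
        %c = icmp eq i32 %m, 0
        %r = zext i1 %c to i32
        ret i32 %r
      }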
* [AVX-512] Fix a couple test cases to not pass an undef mask to gather intrinsic.
  Craig Topper | 2017-01-20 | 1 file | -6/+14

  This could break if any future optimization takes advantage of the undef.

  llvm-svn: 292585
* [test] Remove an unwanted match for `XFAIL:`.
  Greg Parker | 2017-01-20 | 1 file | -1/+1

  llvm-svn: 292567
* [AArch64][GlobalISel] Widen scalar int->fp conversions.
  Ahmed Bougacha | 2017-01-20 | 1 file | -6/+12

  It's incorrect to ignore the higher bits of the integer source. Teach the
  legalizer how to widen it.

  llvm-svn: 292563
* [AMDGPU] Prevent spills before exec mask is restored
  Stanislav Mekhanoshin | 2017-01-20 | 1 file | -0/+78

  The inline spiller can decide to move a spill as early as possible in the
  basic block. It will skip phis and labels, but we also need to make sure
  it skips instructions in the basic block prologue which restore the exec
  mask.

  Added an isPositionLike callback in TargetInstrInfo to detect instructions
  which shall be skipped in addition to common phis, labels etc.

  Differential Revision: https://reviews.llvm.org/D27997
  llvm-svn: 292554