path: root/llvm/test/CodeGen
Commit message (Author, Date, Files changed, Lines -deleted/+added)
* AMDGPU/SI: Give up in promote alloca when a pointer may be captured. (Changpeng Fang, 2017-01-24, 1 file, -0/+47)
    Reviewer: Matt
    Differential Revision: http://reviews.llvm.org/D28970
    llvm-svn: 292966
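    A minimal IR sketch of the situation (hypothetical names, not from the
    patch): once the alloca's address escapes to an unknown callee it may be
    captured, so promotion must conservatively give up.

        declare void @take_ptr(i32*)

        define void @no_promote() {
          ; %buf's address is passed to @take_ptr and may be captured,
          ; so promote-alloca cannot rewrite it.
          %buf = alloca [4 x i32]
          %p = getelementptr [4 x i32], [4 x i32]* %buf, i32 0, i32 0
          call void @take_ptr(i32* %p)
          ret void
        }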
* [AMDGPU] Add VGPR copies post regalloc fix pass (Stanislav Mekhanoshin, 2017-01-24, 1 file, -0/+44)
    Regalloc creates COPY instructions which do not formally use VALU. That
    results in v_mov instructions being displaced after exec mask
    modification. One pass which does this is SIOptimizeExecMasking, but
    potentially it can be done by other passes too.

    This patch adds a pass immediately after regalloc to add an implicit exec
    use operand to all VGPR copy instructions.

    Differential Revision: https://reviews.llvm.org/D28874
    llvm-svn: 292956
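    A sketch of the rewrite (approximate MIR syntax for this era, not the
    pass's verbatim output):

        ; before: nothing ties the VGPR copy to the exec mask
        %vgpr0 = COPY %vgpr1
        ; after: the implicit use makes the dependence on exec explicit,
        ; so the copy cannot be moved past exec mask modifications
        %vgpr0 = COPY %vgpr1, implicit %exec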
* [AArch64] Rename 'no-quad-ldst-pairs' to 'slow-paired-128' (Evandro Menezes, 2017-01-24, 1 file, -1/+1)
    In order to follow the pattern of the existing 'slow-misaligned-128store'
    option, rename the option 'no-quad-ldst-pairs' to 'slow-paired-128'.
    llvm-svn: 292954
* [X86][AVX2] Regenerate test. (Simon Pilgrim, 2017-01-24, 1 file, -2/+1)
    llvm-svn: 292950
* [X86][AVX2] Removed FIXME comment and regenerated test. (Simon Pilgrim, 2017-01-24, 1 file, -5/+1)
    The comment talked about replacing vpmovzxwd+vpslld+vpsrad with vpmovsxwd,
    which isn't valid as we're sign-extending a <8 x i1> bool vector, not an
    all/nobits <8 x i16>.
    llvm-svn: 292948
* [X86][AVX2] Cleaned up test triple and regenerated tests. (Simon Pilgrim, 2017-01-24, 1 file, -14/+2)
    llvm-svn: 292946
* [SelectionDAG] Handle inverted conditions when splitting into multiple branches. (Geoff Berry, 2017-01-24, 3 files, -5/+39)
    Summary:
    When conditional branches with complex conditions are split into multiple
    branches in SelectionDAGBuilder::FindMergedConditions, also handle
    inverted conditions. These may sometimes appear without having been
    optimized by InstCombine when CodeGenPrepare decides to sink and duplicate
    cmp instructions, causing them to have only one use. This problem can be
    increased by e.g. GVNHoist hiding more cmps from InstCombine by combining
    equivalent cmps from different blocks.

    For example, codegen X & !(Y | Z) as:
          jmp_if_X TmpBB
          jmp FBB
        TmpBB:
          jmp_if_notY Tmp2BB
          jmp FBB
        Tmp2BB:
          jmp_if_notZ TBB
          jmp FBB

    Reviewers: bogner, MatzeB, qcolombet
    Subscribers: llvm-commits, hiraditya, mcrosier, sebpop
    Differential Revision: https://reviews.llvm.org/D28380
    llvm-svn: 292944
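    At the IR level, a condition of that shape would look roughly like this
    (a hypothetical sketch, not a test from the patch):

        ; X & !(Y | Z), later split into the branch chain above
        %or   = or i1 %Y, %Z
        %not  = xor i1 %or, true
        %cond = and i1 %X, %not
        br i1 %cond, label %TBB, label %FBB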
* [X86][SSE] Add support for constant folding vector arithmetic shift by immediates (Simon Pilgrim, 2017-01-24, 3 files, -45/+38)
    llvm-svn: 292919
* [X86][SSE] Add support for constant folding vector logical shift by immediates (Simon Pilgrim, 2017-01-24, 12 files, -209/+166)
    llvm-svn: 292915
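    The kind of fold involved, sketched at the IR level (the combine itself
    runs on the target's immediate-shift nodes):

        ; a constant vector shifted by an immediate folds to a constant
        %r = lshr <4 x i32> <i32 16, i32 32, i32 64, i32 128>,
                            <i32 2, i32 2, i32 2, i32 2>
        ; --> <4 x i32> <i32 4, i32 8, i32 16, i32 32>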
* AMDGPU: Add trap handler support. (Wei Ding, 2017-01-24, 1 file, -5/+4)
    llvm-svn: 292893
* [SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR (Craig Topper, 2017-01-24, 4 files, -301/+12)
    Summary:
    This teaches getNode to simplify extracting from Undef. This is similar
    to what is done for EXTRACT_VECTOR_ELT. It also adds support for
    extracting from CONCAT_VECTOR when we can reuse one of the inputs to the
    concat. These seem like simple non-target specific optimizations.

    For X86 we currently handle undef in extractSubvector, but not all
    EXTRACT_SUBVECTOR creations go through there.

    Ultimately, my motivation here is to simplify extractSubvector and remove
    custom lowering for EXTRACT_SUBVECTOR since we don't do anything but
    handle undef and BUILD_VECTOR optimizations, but those should be DAG
    combines.

    Reviewers: RKSimon, delena
    Reviewed By: RKSimon
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D29000
    llvm-svn: 292876
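    In the usual DAG-combine notation, the two simplifications are roughly
    (a sketch, not the exact code):

        (extract_subvector undef, idx)                 --> undef
        (extract_subvector (concat_vectors %a, %b), 0) --> %a
        ; the second form applies when the extracted type matches the
        ; type of the reused concat input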
* LiveIntervalAnalysis: Calculate liveness even if a superreg is reserved. (Matthias Braun, 2017-01-24, 1 file, -0/+22)
    A register unit may be allocatable and non-reserved but some of the
    register(tuples) built with it are reserved. We still need to calculate
    liveness in this case.

    Note to out-of-tree targets: If you start seeing machine verifier errors
    with this commit, it probably means that you do not properly mark super
    registers of reserved registers as reserved. See r292836 or r292870 for
    examples of how to fix that.

    rdar://29996737
    Differential Revision: https://reviews.llvm.org/D28881
    llvm-svn: 292871
* AMDGPU: Custom lower more vector operations (Matt Arsenault, 2017-01-23, 4 files, -18/+513)
    This avoids stack usage.
    llvm-svn: 292846
* DAG: Don't fold vector extract into load if target doesn't want to (Matt Arsenault, 2017-01-23, 1 file, -0/+31)
    Fixes turning a 32-bit scalar load into an extending vector load for
    AMDGPU when dynamically indexing a vector.
    llvm-svn: 292842
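    A sketch of the pattern in question (hypothetical IR, not a test from the
    patch): a dynamically indexed extract from a loaded vector.

        define float @extract_dyn(<4 x float>* %p, i32 %i) {
          ; folding the extract into the load would turn this into an
          ; extending vector load, which the target may not want
          %v = load <4 x float>, <4 x float>* %p
          %e = extractelement <4 x float> %v, i32 %i
          ret float %e
        }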
* [APFloat] Switch from (PPCDoubleDoubleImpl, IEEEdouble) layout to (IEEEdouble, IEEEdouble) (Tim Shen, 2017-01-23, 1 file, -1/+1)
    Summary:
    This patch changes the layout of DoubleAPFloat, and adjusts all operations
    to do either:
      1) (IEEEdouble, IEEEdouble) -> (uint64_t, uint64_t) ->
         PPCDoubleDoubleImpl, then run the old algorithm.
      2) Do the right thing directly.

    1) includes multiply, divide, remainder, mod, fusedMultiplyAdd,
    roundToIntegral, convertFromString, next, convertToInteger,
    convertFromAPInt, convertFromSignExtendedInteger,
    convertFromZeroExtendedInteger, convertToHexString, toString,
    getExactInverse.

    2) includes makeZero, makeLargest, makeSmallest, makeSmallestNormalized,
    compare, bitwiseIsEqual, bitcastToAPInt, isDenormal, isSmallest,
    isLargest, isInteger, ilogb, scalbn, frexp, hash_value, Profile.

    I could split this into two patches, e.g. use 1) for all operations
    first, then incrementally change some of them to 2). I didn't do that,
    because 1) involves code that converts data between PPCDoubleDoubleImpl
    and (IEEEdouble, IEEEdouble) back and forth, and may pessimize the
    compiler. Instead, I find easy functions and use approach 2) for them
    directly.

    Next step is to move multiply and divide from 1) to 2). I don't have
    plans for other functions in 1).

    Differential Revision: https://reviews.llvm.org/D27872
    llvm-svn: 292839
* AMDGPU: Combine fp16/fp64 subtarget features (Matt Arsenault, 2017-01-23, 10 files, -45/+90)
    The same control register controls both, and they are set to the same
    defaults. Keep the old names around as aliases.
    llvm-svn: 292837
* [AArch64][GlobalISel] Legalize narrow scalar fp->int conversions. (Ahmed Bougacha, 2017-01-23, 1 file, -6/+12)
    Since we're now avoiding operations using narrow scalar integer types, we
    have to legalize the integer side of the FP conversions. This requires
    teaching the legalizer how to do that.
    llvm-svn: 292828
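    A rough generic-MIR sketch of the idea (not the exact legalizer output):
    the narrow integer result is produced at a wider legal width and then
    truncated.

        ; before legalization: s8 is not a legal result type
        %1:_(s8) = G_FPTOSI %0:_(s32)
        ; after: widen the integer side, then truncate
        %2:_(s32) = G_FPTOSI %0:_(s32)
        %1:_(s8) = G_TRUNC %2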
* [AArch64][GlobalISel] Legalize narrow scalar ops again. (Ahmed Bougacha, 2017-01-23, 9 files, -415/+37)
    Since r279760, we've been marking as legal operations on narrow integer
    types that have wider legal equivalents (for instance, G_ADD s8).
    Compared to legalizing these operations, this reduced the amount of
    extends/truncates required, but was always a weird legalization decision
    made at selection time.

    So far, we haven't been able to formalize it in a way that permits the
    selector generated from SelectionDAG patterns to be sufficient. Using a
    wide instruction (say, s64), when a narrower instruction exists (s32)
    would introduce register class incompatibilities (when one narrow generic
    instruction is selected to the wider variant, but another is selected to
    the narrower variant). It's also impractical to limit which narrow
    operations are matched for which instruction, as restricting "narrow
    selection" to ranges of types clashes with potentially incompatible
    instruction predicates.

    Concerns were also raised regarding MIPS64's sign-extended register
    assumptions, as well as wrapping behavior. See discussions in
    https://reviews.llvm.org/D26878.

    Instead, legalize the operations; a sketch of what that looks like for
    the G_ADD s8 example follows this entry.

    Should we ever revert to selecting these narrow operations, we should try
    to represent this more accurately: for instance, by separating a
    "concrete" type on operations, and an "underlying" type on vregs, we
    could move the "this narrow-looking op is really legal" decision to the
    legalizer, and let the selector use the "underlying" vreg type only,
    which would be guaranteed to map to a register class.

    In any case, we eventually should mitigate:
      - the performance impact by selecting no-op extract/truncates to COPYs
        (which we currently do), and the COPYs to register reuses (which we
        don't do yet).
      - the compile-time impact by optimizing away extract/truncate sequences
        in the legalizer.

    llvm-svn: 292827
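    Legalizing the G_ADD s8 example by widening might look like this
    (approximate generic MIR, not the verbatim legalizer output):

        ; before: s8 add, no longer considered legal
        %2:_(s8) = G_ADD %0:_(s8), %1:_(s8)
        ; after: perform the add at s32 and truncate the result
        %3:_(s32) = G_ANYEXT %0
        %4:_(s32) = G_ANYEXT %1
        %5:_(s32) = G_ADD %3, %4
        %2:_(s8) = G_TRUNC %5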
* [ARM] Classification Improvements to ARM Sched-Models. NFCI. (Javed Absar, 2017-01-23, 1 file, -0/+69)
    This is a series of patches to make adding machine sched models for ARM
    processors easier and more compact. They define new sched-readwrites for
    groups of ARM instructions. This has been missing so far, and as a
    consequence, machine scheduler models for individual sub-targets have
    tended to be larger than they needed to be. The current patch focuses on
    floating-point instructions.

    Reviewers: Diana Picus (rovka), Renato Golin (rengolin)
    Differential Revision: https://reviews.llvm.org/D28194
    llvm-svn: 292825
* DAG: Allow legalization of fcanonicalize vector types (Matt Arsenault, 2017-01-23, 1 file, -0/+214)
    llvm-svn: 292814
* [X86][SSE] Add missing X86ISD::ANDNP combines. (Simon Pilgrim, 2017-01-22, 2 files, -41/+7)
    llvm-svn: 292767
* [X86][SSE] Improve shuffle combining with zero insertions (Simon Pilgrim, 2017-01-22, 2 files, -59/+30)
    Add support for handling shuffles with scalar_to_vector(0).
    llvm-svn: 292766
* [X86][SSE] Regenerate sqrt tests (Simon Pilgrim, 2017-01-22, 1 file, -13/+15)
    llvm-svn: 292764
* Fix test name. NFCI. (Simon Pilgrim, 2017-01-22, 1 file, -6/+6)
    llvm-svn: 292763
* Fix some broken CHECK lines. (Benjamin Kramer, 2017-01-22, 14 files, -23/+23)
    The colon is important.
    llvm-svn: 292761
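    For context, FileCheck only treats a line as a directive when the colon
    is present; without it the line is an ordinary comment and the check is
    silently skipped. A hypothetical illustration:

        ; CHECK vaddps    <- missing colon: not a directive, never checked
        ; CHECK: vaddps   <- an actual FileCheck check line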
* [x86] avoid crashing with illegal vector type (PR31672) (Sanjay Patel, 2017-01-22, 1 file, -0/+133)
    https://llvm.org/bugs/show_bug.cgi?id=31672
    llvm-svn: 292758
* [X86] Don't allow commuting to form phsub operations. (Craig Topper, 2017-01-21, 1 file, -4/+29)
    Fixes PR31714.
    llvm-svn: 292713
* [X86] Add test cases that show bad commuting being allowed to create a phsub operation. (Craig Topper, 2017-01-21, 1 file, -0/+33)
    llvm-svn: 292712
* [NVPTX] Add explicit check for llvm.sqrt.f32 to intrinsics.ll. (Justin Lebar, 2017-01-21, 1 file, -0/+8)
    Test-only change.
    llvm-svn: 292690
* [ValueTracking] recognize variations of 'clamp' to improve codegen (PR31693) (Sanjay Patel, 2017-01-20, 1 file, -18/+8)
    By enhancing value tracking, we allow an existing min/max canonicalization
    to kick in and improve codegen for several targets that have min/max
    instructions.

    Unfortunately, recognizing min/max in value tracking may cause us to hit
    a hack in InstCombiner::visitICmpInst() more often:
    http://lists.llvm.org/pipermail/llvm-dev/2017-January/109340.html
    ...but I'm hoping we can remove that soon.

    Correctness proofs based on Alive:

    Name: smaxmin
    Pre: C1 < C2
      %cmp2 = icmp slt i8 %x, C2
      %min = select i1 %cmp2, i8 %x, i8 C2
      %cmp3 = icmp slt i8 %x, C1
      %r = select i1 %cmp3, i8 C1, i8 %min
    =>
      %cmp2 = icmp slt i8 %x, C2
      %min = select i1 %cmp2, i8 %x, i8 C2
      %cmp1 = icmp sgt i8 %min, C1
      %r = select i1 %cmp1, i8 %min, i8 C1

    Name: sminmax
    Pre: C1 > C2
      %cmp2 = icmp sgt i8 %x, C2
      %max = select i1 %cmp2, i8 %x, i8 C2
      %cmp3 = icmp sgt i8 %x, C1
      %r = select i1 %cmp3, i8 C1, i8 %max
    =>
      %cmp2 = icmp sgt i8 %x, C2
      %max = select i1 %cmp2, i8 %x, i8 C2
      %cmp1 = icmp slt i8 %max, C1
      %r = select i1 %cmp1, i8 %max, i8 C1

    ----------------------------------------
    Optimization: smaxmin
    Done: 1
    Optimization is correct!
    ----------------------------------------
    Optimization: sminmax
    Done: 1
    Optimization is correct!

    Name: umaxmin
    Pre: C1 u< C2
      %cmp2 = icmp ult i8 %x, C2
      %min = select i1 %cmp2, i8 %x, i8 C2
      %cmp3 = icmp ult i8 %x, C1
      %r = select i1 %cmp3, i8 C1, i8 %min
    =>
      %cmp2 = icmp ult i8 %x, C2
      %min = select i1 %cmp2, i8 %x, i8 C2
      %cmp1 = icmp ugt i8 %min, C1
      %r = select i1 %cmp1, i8 %min, i8 C1

    Name: uminmax
    Pre: C1 u> C2
      %cmp2 = icmp ugt i8 %x, C2
      %max = select i1 %cmp2, i8 %x, i8 C2
      %cmp3 = icmp ugt i8 %x, C1
      %r = select i1 %cmp3, i8 C1, i8 %max
    =>
      %cmp2 = icmp ugt i8 %x, C2
      %max = select i1 %cmp2, i8 %x, i8 C2
      %cmp1 = icmp ult i8 %max, C1
      %r = select i1 %cmp1, i8 %max, i8 C1

    ----------------------------------------
    Optimization: umaxmin
    Done: 1
    Optimization is correct!
    ----------------------------------------
    Optimization: uminmax
    Done: 1
    Optimization is correct!

    llvm-svn: 292660
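    A concrete instance of the clamp shape covered by the proofs above
    (hypothetical i32 constants): clamp %x to [0, 127] as smax(smin(x, 127), 0).

        %cmp2 = icmp slt i32 %x, 127
        %min = select i1 %cmp2, i32 %x, i32 127
        %cmp1 = icmp sgt i32 %min, 0
        %r = select i1 %cmp1, i32 %min, i32 0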
* AMDGPU/R600: Serialize vector trunc stores to private AS (Jan Vesely, 2017-01-20, 1 file, -15/+15)
    Add DUMMY_CHAIN SDNode to denote stores of interest.

    Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915
    Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411
    Differential Revision: https://reviews.llvm.org/D27964
    llvm-svn: 292651
* [WebAssembly] Don't create bitcast-wrappers for varargs. (Dan Gohman, 2017-01-20, 1 file, -0/+17)
    WebAssembly varargs functions use a significantly different ABI than
    non-varargs functions, and the current code in
    WebAssemblyFixFunctionBitcasts doesn't handle that difference. For now,
    just avoid creating wrapper functions in the presence of varargs.
    llvm-svn: 292645
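    The kind of call the pass now leaves alone, as a hedged sketch
    (hypothetical names, not a test from the patch):

        declare void @vfn(i32, ...)

        define void @caller() {
          ; a call through a bitcast whose real callee is varargs; a wrapper
          ; would need the (different) varargs ABI, so none is created
          call void bitcast (void (i32, ...)* @vfn to void (i32)*)(i32 0)
          ret void
        }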
* [x86] add tests to show missed min/max vector codegen (PR31693) (Sanjay Patel, 2017-01-20, 1 file, -12/+73)
    llvm-svn: 292640
* AArch64LoadStoreOptimizer: Update kill flags when merging stores (Matthias Braun, 2017-01-20, 2 files, -133/+132)
    Kill flags need to be updated correctly when moving stores up/down to form
    store pair instructions. Those invalid flags have been ignored before,
    but as of r290014 they are recognized when using
    -mllvm -verify-machineinstrs.

    Also simplifies test/CodeGen/AArch64/ldst-opt-dbg-limit.mir, renames it
    to ldst-opt.mir, and adds new tests for this change.

    Differential Revision: https://reviews.llvm.org/D28875
    llvm-svn: 292625
* [RegisterCoalescing] Recommit the patch "Remove partial redundent copy". (Wei Mi, 2017-01-20, 3 files, -0/+454)
    The recommit fixes a bug related to the live interval update after the
    partially redundant copy is moved.

    The original patch is to solve the performance problem described in
    PR27827. Register coalescing sometimes cannot remove a copy because of
    interference. But if we can find a reverse copy in one of the predecessor
    blocks of the copy, the copy is partially redundant and we may remove the
    copy partially by moving it to the predecessor block without the reverse
    copy.

    Differential Revision: https://reviews.llvm.org/D28585
    llvm-svn: 292621
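    A rough sketch of the transform on a two-predecessor block (hypothetical
    vregs, not from the patch):

        BB0: %0 = COPY %1        BB1: ...
                 \               /
                  BB2: %1 = COPY %0

    Through BB0 the copy in BB2 is a no-op (%1 already holds %0's value), so
    it is only needed on the path from BB1. Moving it up into BB1 lets BB2
    drop it:

        BB0: %0 = COPY %1        BB1: %1 = COPY %0
                 \               /
                  BB2: (no copy)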
* [Thumb] Add support for tMUL in the compare instruction peephole optimizer. (Sjoerd Meijer, 2017-01-20, 2 files, -0/+186)
    We also want to optimise tests like this: return a*b == 0. The MULS
    instruction is flag-setting, so we don't need the CMP instruction but can
    instead branch on the result of the MULS. The generated instruction
    sequence for this example was: MULS, MOVS, MOVS, CMP. The MOVS
    instructions load the boolean values resulting from the select
    instruction, but these MOVS instructions are flag-setting and were thus
    preventing this optimisation. Now we first reorder and move the MULS to
    before the CMP and generate the sequence MOVS, MOVS, MULS, CMP so that
    the optimisation can trigger. Reordering of the MULS and MOVS is safe to
    do because the subsequent MOVS instructions just set the CPSR register
    and don't use it, i.e. the CPSR is dead.

    Differential Revision: https://reviews.llvm.org/D27990
    llvm-svn: 292608
* [AVX-512] Fix a couple test cases to not pass an undef mask to gather intrinsic. (Craig Topper, 2017-01-20, 1 file, -6/+14)
    This could break if any future optimizations took advantage of the undef.
    llvm-svn: 292585
* [test] Remove an unwanted match for `XFAIL:`. (Greg Parker, 2017-01-20, 1 file, -1/+1)
    llvm-svn: 292567
* [AArch64][GlobalISel] Widen scalar int->fp conversions. (Ahmed Bougacha, 2017-01-20, 1 file, -6/+12)
    It's incorrect to ignore the higher bits of the integer source. Teach the
    legalizer how to widen it.
    llvm-svn: 292563
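    A rough generic-MIR sketch of the widening (not the verbatim legalizer
    output): the narrow source is extended to a legal width before the
    conversion, so its higher bits are well defined.

        ; before: s8 source
        %1:_(s32) = G_SITOFP %0:_(s8)
        ; after: sign-extend the source first (zero-extend for G_UITOFP)
        %2:_(s32) = G_SEXT %0
        %1:_(s32) = G_SITOFP %2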
* [AMDGPU] Prevent spills before exec mask is restored (Stanislav Mekhanoshin, 2017-01-20, 1 file, -0/+78)
    The inline spiller can decide to move a spill as early as possible in the
    basic block. It will skip phis and labels, but we also need to make sure
    it skips instructions in the basic block prologue which restore the exec
    mask.

    Added the isPositionLike callback in TargetInstrInfo to detect
    instructions which shall be skipped in addition to common phis, labels
    etc.

    Differential Revision: https://reviews.llvm.org/D27997
    llvm-svn: 292554
* [AArch64][GlobalISel] Split FP conversion legalizer tests. NFC. (Ahmed Bougacha, 2017-01-20, 3 files, -88/+426)
    Big functions with large vreg numbers are quite unwieldy to update.
    Change it to have one function per test (it does increase boilerplate,
    but makes the core hopefully more readable and maintainable).
    llvm-svn: 292552
* [AArch64][GlobalISel] Split legalizer combine tests. NFC. (Ahmed Bougacha, 2017-01-20, 1 file, -69/+86)
    Big functions with large vreg numbers are quite unwieldy to update. This
    test also relied on legal s8 operations which we're considering removing.

    Change it to have one function per test (it does increase boilerplate,
    but makes the core hopefully more readable and maintainable), and use
    100% legal operations throughout.
    llvm-svn: 292551
* [MIRParser] Allow generic register specification on operand. (Ahmed Bougacha, 2017-01-20, 1 file, -0/+3)
    This completes r292321 by adding support for generic registers, e.g.:
        %2:_(s32) = G_ADD %0, %1
    llvm-svn: 292550
* AArch64: fall back to DAG ISel for inline assembly. (Tim Northover, 2017-01-19, 1 file, -0/+10)
    We can't currently handle "calls" to inline asm strings, so it's better
    to let the DAG handle them than generate rubbish.
    llvm-svn: 292540
* [SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293) (Simon Pilgrim, 2017-01-19, 1 file, -12/+2)
    This patch improves the knownbits logic for unsigned integer min/max
    opcodes. For UMIN we know that the result will have at least the maximum
    of the inputs' known leading zero bits; similarly, for UMAX, the maximum
    of the inputs' known leading one bits.

    This is particularly useful for simplifying clamping patterns: e.g. as
    SSE doesn't have a uitofp instruction, we want to use sitofp instead
    where possible, and for that we need to confirm that the top bit is not
    set.

    Differential Revision: https://reviews.llvm.org/D28853
    llvm-svn: 292528
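    The clamping case, sketched at the IR level (hypothetical constants; the
    analysis itself runs on the corresponding DAG nodes):

        ; umin(%x, 255): every possible result has its top bit clear,
        ; so the unsigned convert can safely be lowered as a signed one
        %c = icmp ult i32 %x, 255
        %m = select i1 %c, i32 %x, i32 255
        %f = uitofp i32 %m to float   ; may be treated as sitofp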
* [XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problems earlier (Serge Rogatch, 2017-01-19, 2 files, -0/+12)
    Summary:
    Emission of the XRay table was accidentally disabled for Arm32, but this
    bug went undetected because earlier (also by mistake) testing of XRay was
    disabled on 32-bit Arm targets. This patch should fix that problem and
    detect such problems in the future.

    This patch is one of a series; see also https://reviews.llvm.org/D28623.

    Reviewers: rengolin, dberris
    Reviewed By: dberris
    Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
    Differential Revision: https://reviews.llvm.org/D28624
    llvm-svn: 292516
* [X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended (Simon Pilgrim, 2017-01-19, 1 file, -229/+81)
    As discussed on D28219, it is profitable to combine
    trunc(binop(s/zext(x), s/zext(y))) to
    binop(trunc(s/zext(x)), trunc(s/zext(y))), assuming the trunc(ext())
    will simplify further.
    llvm-svn: 292493
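    A sketch of the pattern at the IR level (illustrative types; the combine
    itself runs in the DAG):

        ; before: the add is performed at the extended width
        %ax = zext <8 x i16> %a to <8 x i32>
        %bx = zext <8 x i16> %b to <8 x i32>
        %s  = add <8 x i32> %ax, %bx
        %t  = trunc <8 x i32> %s to <8 x i16>
        ; after: trunc(zext(x)) folds away, so the add happens at <8 x i16>
        %t  = add <8 x i16> %a, %b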
* [X86][SSE] Added tests for pre-truncating arithmetic operations that have already been extended (Simon Pilgrim, 2017-01-19, 1 file, -0/+127)
    As discussed on D28219, it is profitable to combine
    trunc(binop(s/zext(x), s/zext(y))) to
    binop(trunc(s/zext(x)), trunc(s/zext(y))), assuming the trunc(ext())
    will simplify further.
    llvm-svn: 292487
* [DAG] Don't increase SDNodeOrder for dbg.value/declare. (Mikael Holmen, 2017-01-19, 2 files, -0/+193)
    Summary:
    The SDNodeOrder is saved in the IROrder field in the SDNode, and this
    field may affect scheduling. Thus, letting dbg.value/declare increase the
    order numbers may in turn affect scheduling.

    Because of this change we also need to update the code deciding when dbg
    values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues.

    Dbg values now have the same order as the SDNode they are connected to,
    not the following orders.

    Test cases provided by Florian Hahn.

    Reviewers: bogner, aprantl, sunfish, atrick
    Reviewed By: atrick
    Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB
    Differential Revision: https://reviews.llvm.org/D25318
    llvm-svn: 292485
* [GlobalISel] Pointers are legal operands for G_SELECT on AArch64 (Kristof Beyls, 2017-01-19, 1 file, -0/+12)
    Differential Revision: https://reviews.llvm.org/D28805
    llvm-svn: 292481
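    A hedged generic-MIR sketch of such a select (hypothetical vregs,
    approximate syntax):

        ; choosing between two pointers is now marked legal on AArch64
        %3:_(p0) = G_SELECT %0(s1), %1, %2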