summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [X86][BtVer2] Fix PHMINPOS schedule resources typoSimon Pilgrim2018-09-281-2/+2
| | | | | | PHMINPOS can run on either JFPU pipe llvm-svn: 343299
* [X86] Add the test case from PR38986.Craig Topper2018-09-271-0/+29
| | | | | | The assembly for this test should be optimal now after changes to the ScalarizeMaskedMemIntrin patch. llvm-svn: 343281
* [ScalarizeMaskedMemIntrin] When expanding masked gathers, start with the ↵Craig Topper2018-09-272-261/+217
| | | | | | | | passthru vector and insert the new load results into it. Previously we started with undef and did a final merge with the passthru at the end. llvm-svn: 343273
* [ScalarizeMaskedMemIntrin] When expanding masked loads, start with the ↵Craig Topper2018-09-271-42/+12
| | | | | | | | passthru value and insert each conditional load result over their element. Previously we started with undef and did one final merge at the end with a select. llvm-svn: 343271
* [AMDGPU] Fold copy (copy vgpr)Stanislav Mekhanoshin2018-09-271-0/+27
| | | | | | | | This allows to reduce a number of used VGPRs in some cases. Differential Revision: https://reviews.llvm.org/D52577 llvm-svn: 343249
* [ScalarizeMaskedMemIntrin] Don't emit 'icmp eq i1 %x, 1' to check mask ↵Craig Topper2018-09-273-54/+39
| | | | | | | | values. That's just %x so use that directly. Had we emitted this IR earlier, InstCombine would have removed icmp so I'm going to assume using the i1 directly would be considered canonical. llvm-svn: 343244
* Revert r343192 as an ubsan build is currently failingLuke Cheeseman2018-09-272-13/+11
| | | | llvm-svn: 343235
* [Sparc] Remove the support for builtin setjmp/longjmpDaniel Cederman2018-09-271-93/+0
| | | | | | | | | | | | | | | | Summary: It is currently broken and for Sparc there is not much benefit in using a builtin version compared to a library version. Both versions needs to store the same four values in setjmp and flush the register windows in longjmp. If the need for a builtin setjmp/longjmp arises there is an improved implementation available at https://reviews.llvm.org/D50969. Reviewers: jyknight, joerg, venkatra Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D51487 llvm-svn: 343210
* Reapply changes reverted in r343114, lldb patch to follow shortlyLuke Cheeseman2018-09-272-11/+13
| | | | llvm-svn: 343192
* [X86] Update tzcnt fast-isel tests to match clang r343126.Craig Topper2018-09-262-74/+22
| | | | | | We now generate cttz with the zero_undef flag set to false. This allows -O0 to avoid the zero check. llvm-svn: 343127
* Revert r343112 as CallFrameString API change has broken lldb buildsLuke Cheeseman2018-09-262-13/+11
| | | | llvm-svn: 343114
* [AArch64] - Return address signing dwarf supportLuke Cheeseman2018-09-262-11/+13
| | | | | | - Reapply r343089 with a fix for DebugInfo/Sparc/gnu-window-save.ll llvm-svn: 343112
* [CodeGen] Always print register ties in MI::dump()Francis Visoiu Mistrih2018-09-263-8/+8
| | | | | | | | | It was the case when calling MO::dump(), but MI::dump() was still depending on hasComplexRegisterTies(). The MIR output is not affected. llvm-svn: 343107
* Revert r343089 "[AArch64] - Return address signing dwarf support"Hans Wennborg2018-09-262-13/+11
| | | | | | | | | | | | | | | | | | | This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail. > Functions that have signed return addresses need additional dwarf support: > - After signing the LR, and before authenticating it, the LR register is in a > state the is unusable by a debugger or unwinder > - To account for this a new directive, .cfi_negate_ra_state, is added > - This directive says the signed state of the LR register has now changed, > i.e. unsigned -> signed or signed -> unsigned > - This directive has the same CFA code as the SPARC directive GNU_window_save > (0x2d), adding a macro to account for multiply defined codes > - This patch matches the gcc implementation of this support: > https://patchwork.ozlabs.org/patch/800271/ > > Differential Revision: https://reviews.llvm.org/D50136 llvm-svn: 343103
* [PowerPC] optimize conditional branch on CRSET/CRUNSETHiroshi Inoue2018-09-262-0/+264
| | | | | | | | | | | | | | This patch adds a check to optimize conditional branch (BC and BCn) based on a constant set by CRSET or CRUNSET. Other optimizers, such as block placement, may generate such code and hence I do this at the very end of the optimization in pre-emit peephole pass. A conditional branch based on a constant is eliminated or converted into unconditional branch. Also CRSET/CRUNSET is eliminated if the condition code register is not used by instruction other than the branch to be optimized. Differential Revision: https://reviews.llvm.org/D52345 llvm-svn: 343100
* [X86][SSE] Refresh PR34947 test code to handle D52504Simon Pilgrim2018-09-261-181/+327
| | | | | | The previously reduced version used urem <9 x i32> zeroinitializer, %tmp which D52504 will simplify. llvm-svn: 343097
* [X86][SSE] Use ISD::MULHS for constant vXi16 ISD::SRA lowering (PR38151)Simon Pilgrim2018-09-263-310/+172
| | | | | | | | | | Similar to the existing ISD::SRL constant vector shifts from D49562, this patch adds ISD::SRA support with ISD::MULHS. As we're dealing with signed values, we have to handle shift by zero and shift by one special cases, so XOP+AVX2/AVX512 splitting/extension is still a better solution - really we should still use ISD::MULHS if one of the special cases are used but for now I've just left a TODO and filtered by isKnownNeverZero. Differential Revision: https://reviews.llvm.org/D52171 llvm-svn: 343093
* [ARM] Fix for PR39060Sam Parker2018-09-263-9/+182
| | | | | | | | | | | | | When calculating whether a value can safely overflow for use by an icmp, we weren't checking that the value couldn't wrap around. To do this we need the icmp to be using a constant, as well as the incoming add or sub. bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060 Differential Revision: https://reviews.llvm.org/D52463 llvm-svn: 343092
* [CodeGen] Enable tail calls for functions with NonNull attributes.David Green2018-09-261-0/+12
| | | | | | | | | | | Adding NonNull as attributes to returned pointers has the unfortunate side effect of disabling tail calls. This patch ignores the NonNull attribute when we decide whether to tail merge, in the same way that we ignore the NoAlias attribute, as it has no affect on the call sequence. Differential Revision: https://reviews.llvm.org/D52238 llvm-svn: 343091
* Fixes removal of dead elements from PressureDiff (PR37252).Yury Gribov2018-09-264-30/+30
| | | | | | | | Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D51495 llvm-svn: 343090
* [AArch64] - Return address signing dwarf supportLuke Cheeseman2018-09-262-11/+13
| | | | | | | | | | | | | | | | | Functions that have signed return addresses need additional dwarf support: - After signing the LR, and before authenticating it, the LR register is in a state the is unusable by a debugger or unwinder - To account for this a new directive, .cfi_negate_ra_state, is added - This directive says the signed state of the LR register has now changed, i.e. unsigned -> signed or signed -> unsigned - This directive has the same CFA code as the SPARC directive GNU_window_save (0x2d), adding a macro to account for multiply defined codes - This patch matches the gcc implementation of this support: https://patchwork.ozlabs.org/patch/800271/ Differential Revision: https://reviews.llvm.org/D50136 llvm-svn: 343089
* Revert r342870 "[ARM] bottom-top mul support ARMParallelDSP"Hans Wennborg2018-09-263-558/+0
| | | | | | | | | | | | | | | | | | | | This broke Chromium's Android build (https://crbug.com/889390) and the polly-aosp buildbot (http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable). > Originally committed in rL342210 but was reverted in rL342260 because > it was causing issues in vectorized code, because I had forgotten to > ensure that we're operating on scalar values. > > Original commit message: > > On failing to find sequences that can be converted into dual macs, > try to find sequential 16-bit loads that are used by muls which we > can then use smultb, smulbt, smultt with a wide load. > > Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 343082
* Revert "Revert "[ConstHoist] Do not rebase single (or few) dependent constant""Zhaoshi Zheng2018-09-261-0/+191
| | | | | | | | | | | | | | | | | | | This reverts commit bd7b44f35ee9fbe365eb25ce55437ea793b39346. Reland r342994: disabled the optimization and explicitly enable it in test. -mllvm -consthoist-min-num-to-rebase<unsigned>=0 [ConstHoist] Do not rebase single (or few) dependent constant If an instance (InsertionPoint or IP) of Base constant A has only one or few rebased constants depending on it, do NOT rebase. One extra ADD instruction is required to materialize each rebased constant, assuming A and the rebased have the same materialization cost. Differential Revision: https://reviews.llvm.org/D52243 llvm-svn: 343053
* [WebAssembly] SIMD conversionsThomas Lively2018-09-261-0/+100
| | | | | | | | | | | | | | | | Summary: Lowers (s|u)itofp and fpto(s|u)i instructions for vectors. The fp to int conversions produce poison values if their arguments are out of the convertible range, so a future CL will have to add an LLVM intrinsic to make the saturating behavior of this conversion usable. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52372 llvm-svn: 343052
* [AMDGPU] Fix ds combine with subregsStanislav Mekhanoshin2018-09-251-3/+99
| | | | | | Differential Revision: https://reviews.llvm.org/D52522 llvm-svn: 343047
* [X86] Allow movmskpd/ps ISD nodes to be created and selected with integer ↵Craig Topper2018-09-252-58/+12
| | | | | | | | | | input types. This removes an int->fp bitcast between the surrounding code and the movmsk. I had already added a hack to combineMOVMSK to try to look through this bitcast to improve the SimplifyDemandedBits there. But I found an additional issue where the bitcast was preventing combineMOVMSK from being called again after earlier nodes in the DAG are optimized. The bitcast gets revisted, but not the user of the bitcast. By using integer types throughout, the bitcast doesn't get in the way. llvm-svn: 343046
* [X86] Add some more movmsk test cases. NFCCraig Topper2018-09-251-0/+232
| | | | | | | | These IR patterns represent the exact behavior of a movmsk instruction using (zext (bitcast (icmp slt X, 0))). For the v4i32/v8i32/v2i64/v4i64 we currently emit a PCMPGT for the icmp slt which is unnecessary since we only care about the sign bit of the result. This is because of the int->fp bitcast we put on the input to the movmsk nodes for these cases. I'll be fixing this in a future patch. llvm-svn: 343045
* AMDGPU: Add Selection patterns to support add of one bit.Changpeng Fang2018-09-251-0/+21
| | | | | | | | | | | | | | Summary: We generate s_xor to lower add of i1s in general cases, and s_not to lower add with a one-bit imm of -1 (true). Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D52518 llvm-svn: 343030
* [x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37449)Sanjay Patel2018-09-251-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the final (I hope!) problem pattern mentioned in PR37749: https://bugs.llvm.org/show_bug.cgi?id=37749 We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops. We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches that can over-reach. Ie, splitting to 128-bit does not look like a win in most cases with >1 256-bit op. The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, we have this vector-type-legalized sequence: t29: v8i32 = concat_vectors t27, t28 t30: v4i64 = bitcast t29 t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ... t31: v4i64 = bitcast t18 t32: v4i64 = xor t30, t31 t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ... t34: v4i64 = bitcast t9 t35: v4i64 = and t32, t34 t36: v8i32 = bitcast t35 t37: v4i32 = extract_subvector t36, Constant:i64<0> t38: v4i32 = extract_subvector t36, Constant:i64<4> Differential Revision: https://reviews.llvm.org/D52318 llvm-svn: 343008
* Revert "[ConstHoist] Do not rebase single (or few) dependent constant"Jessica Paquette2018-09-251-191/+0
| | | | | | | | | | | | | This caused a couple test failures on a bot: CodeGen/X86/constant-hoisting-bfi.ll Transforms/ConstantHoisting/X86/ehpad.ll Example: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/53575/ llvm-svn: 343005
* [RegAllocGreedy] avoid using physreg candidates that cannot be correctly spilledDaniil Fukalov2018-09-251-0/+100
| | | | | | | | | | | | | | | | | | | For the AMDGPU target if a MBB contains exec mask restore preamble, SplitEditor may get state when it cannot insert a spill instruction. E.g. for a MIR bb.100: %1 = S_OR_SAVEEXEC_B64 %2, implicit-def $exec, implicit-def $scc, implicit $exec and if the regalloc will try to allocate a virtreg to the physreg already assigned to virtreg %1, it should insert spill instruction before the S_OR_SAVEEXEC_B64 instruction. But it is not possible since can generate incorrect code in terms of exec mask. The change makes regalloc to ignore such physreg candidates. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D52052 llvm-svn: 343004
* [ConstHoist] Do not rebase single (or few) dependent constantZhaoshi Zheng2018-09-251-0/+191
| | | | | | | | | | | If an instance (InsertionPoint or IP) of Base constant A has only one or few rebased constants depending on it, do NOT rebase. One extra ADD instruction is required to materialize each rebased constant, assuming A and the rebased have the same materialization cost. Differential Revision: https://reviews.llvm.org/D52243 llvm-svn: 342994
* [X86] Add AVX512 support to combineVectorSizedSetCCEquality.Craig Topper2018-09-251-283/+149
| | | | | | | | | | | | Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52424 llvm-svn: 342989
* Revert rL342916: [X86] Remove shift/rotate by CL memory (RMW) overridesSimon Pilgrim2018-09-251-8/+8
| | | | | | | | As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead. The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342969
* [AMDGPU] restore r342722 which was reverted with r342743Sameer Sahasrabuddhe2018-09-251-0/+64
| | | | | | | | | | | | | | | | | | | [AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. llvm-svn: 342956
* [WebAssembly] SIMD sqrtThomas Lively2018-09-251-0/+24
| | | | | | | | | | Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52387 llvm-svn: 342937
* [AMDGPU] Remove useless check from test. NFC.Stanislav Mekhanoshin2018-09-251-1/+0
| | | | | | | The check for assignment of zero is practically useless while the assignment moves around with different scheduling. llvm-svn: 342935
* [X86] Don't create FILD ISD nodes when X87 is disabled.Craig Topper2018-09-251-0/+26
| | | | | | | | The included test case previously asserted because the type legalizer tried to soften the FILD ISD node. Fixes PR38819. llvm-svn: 342934
* [WebAssembly][NFC] Fix hardcoded stack indices in testsThomas Lively2018-09-241-4/+4
| | | | | | | | | | Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52388 llvm-svn: 342928
* Re-submitting changes in D51550 because it failed to patch.Christy Lee2018-09-243-3/+7
| | | | | | | | | | | | Reviewers: javed.absar, trentxintong, courbet Reviewed By: trentxintong Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52433 llvm-svn: 342919
* [X86] Remove shift/rotate by CL memory (RMW) overridesSimon Pilgrim2018-09-241-8/+8
| | | | | | The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342916
* [Power9] [LLVM] Add __float128 exponent GET and SET builtinsStefan Pintilie2018-09-241-0/+36
| | | | | | | | | | | | | Added __builtin_vsx_scalar_extract_expq __builtin_vsx_scalar_insert_exp_qp Builtins should behave the same way as in GCC. Differential Revision: https://reviews.llvm.org/D48185 llvm-svn: 342910
* [X86][AVX] Add truncation as shuffle test for PR31451Simon Pilgrim2018-09-241-0/+17
| | | | llvm-svn: 342908
* [Thumb1] Any imm8 should have cost of 1Zhaoshi Zheng2018-09-241-0/+39
| | | | | | | | | A simple MOVS rd, imm8 can materialize [-128, 127] in signed i8 type or [0, 255] in unsigned i8 type on Thumb1. Differential Revision: https://reviews.llvm.org/D52257 llvm-svn: 342898
* [X86] Split WriteIMul into 8/16/32/64 implementations (PR36931)Simon Pilgrim2018-09-241-4/+4
| | | | | | | | Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases. This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions. llvm-svn: 342892
* [DAGCombiner] use UADDO to optimize saturated unsigned addSanjay Patel2018-09-242-22/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select. As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities. This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we see only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86 which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests. Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place. Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886
* [Mips][FastISel] Fix selectBranch on icmp i1Petar Jovanovic2018-09-241-0/+189
| | | | | | | | | | | | | | The r337288 tried to fix result of icmp i1 when its input is not sanitized by falling back to DagISel. While it now produces the correct result for bit 0, the other bits can still hold arbitrary value which is not supported by MipsFastISel branch lowering. This patch fixes the issue by falling back to DagISel in this case. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D52045 llvm-svn: 342884
* [PowerPC] Support operand modifier 'x' in inline asmZaara Syeda2018-09-241-0/+22
| | | | | | | | | | | gcc uses operand modifier 'x' in inline asm for VSX registers. Without this modifier, instructions which use VSX numbering for their operands are printed as VMX registers. This patch adds support for the operand modifier 'x'. Differential Revision: https://reviews.llvm.org/D52244 llvm-svn: 342882
* [NFC][CodeGen][X86][AArch64] More tests for 'bit field extract' w/ constantsRoman Lebedev2018-09-242-0/+189
| | | | | | | | | | It would be best to introduce ISD::BitFieldExtract, because clearly more than one backend faces the same problem. But for now let's solve this in the x86-specific DAG combine. https://bugs.llvm.org/show_bug.cgi?id=38938 llvm-svn: 342880
* AMDGPU: Fix private handling for allowsMisalignedMemoryAccessesMatt Arsenault2018-09-241-17/+20
| | | | | | | | | | | | | If the alignment is at least 4, this should report true. Something still seems off with how < 4-byte types are handled here though. Fixing this seems to change how some combines get to where they get, but somehow isn't changing the net result. llvm-svn: 342879
OpenPOWER on IntegriCloud