path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
* [PHIElimination] Update the regression test for PR16508 | Bjorn Pettersson | 2018-09-30 | 2 | -28/+92
    Summary: When PR16508 was solved (in rL185363) a regression test was added as
    test/CodeGen/PowerPC/2013-07-01-PHIElimBug.ll. I discovered that the test case no
    longer reproduced the scenario from PR16508. This problem could have been amended
    by adding an extra RUN line with "-O1" (or possibly "-O0"), but instead I added a
    MIR reproducer, test/CodeGen/PowerPC/2013-07-01-PHIElimBug.mir, to get a
    reproducer that is less sensitive to changes in earlier passes (including the
    O-level). While at it, I also corrected a code comment in
    PHIElimination::EliminatePHINodes that had been incorrect since the related
    bugfix in rL185363.

    Reviewers: MatzeB, hfinkel
    Reviewed By: MatzeB
    Subscribers: nemanjai, jsji, llvm-commits
    Differential Revision: https://reviews.llvm.org/D52553
    llvm-svn: 343416

* [NFC][CodeGen][X86][AArch64] Add 64-bit constant bit field extract pattern tests | Roman Lebedev | 2018-09-30 | 2 | -0/+174
    llvm-svn: 343404

* [X86] Regenerate MMX coalescing test | Simon Pilgrim | 2018-09-30 | 1 | -12/+26
    Exposes another extractelement(bitcast(scalartovector())) pattern.

    llvm-svn: 343403

* [X86] Disable BMI BEXTR in X86DAGToDAGISel::matchBEXTRFromAnd unless we're compiling for a CPU with single uop BEXTR | Craig Topper | 2018-09-30 | 3 | -54/+115
    Summary: This function turns (X >> C1) & C2 into a BMI BEXTR or TBM BEXTRI
    instruction. For BMI BEXTR we have to materialize an immediate into a register to
    feed the BEXTR instruction. The BMI BEXTR instruction is 2 uops on Intel CPUs. It
    looks like on SKL it's one port 0/6 uop and one port 1/5 uop, despite what Agner's
    tables say. I know one of the uops is a regular shift uop, so it would have to go
    through the port 0/6 shifter unit. So that's the same or worse, execution-wise,
    than the shift+and, which is one 0/6 uop and one 0/1/5/6 uop. The move of the
    immediate into a register is an additional 0/1/5/6 uop.

    For now I've limited this transform to AMD CPUs, which have a single uop BEXTR.
    It may also make sense if we can fold a load, or if the and immediate is larger
    than 32 bits and can't be encoded as a sign-extended 32-bit value, or if LICM or
    CSE can hoist the move immediate and share it. But we'd need to look more
    carefully at that. In the regression I looked at, it doesn't look like load
    folding or large immediates were occurring, so the regression isn't caused by the
    loss of those. So we could try to be smarter here if we find a compelling case.

    Reviewers: RKSimon, spatel, lebedev.ri, andreadb
    Reviewed By: RKSimon
    Subscribers: llvm-commits, andreadb, RKSimon
    Differential Revision: https://reviews.llvm.org/D52570
    llvm-svn: 343399

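    A minimal LLVM IR sketch of the shape this combine matches (function name and
    constants are assumed for illustration); on a single-uop-BEXTR CPU such as
    Jaguar or Zen this can select BEXTR, while elsewhere it now stays as shift+and:

        define i64 @shifted_mask(i64 %x) {
          %shr = lshr i64 %x, 7      ; X >> C1
          %and = and i64 %shr, 255   ; ... & C2, a contiguous low mask
          ret i64 %and
        }
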
* [DAGCombiner][NFC] Tests for X div/rem Y single bit fold | David Bolvansky | 2018-09-29 | 4 | -0/+675
    llvm-svn: 343392

* [X86][AVX2] Cleanup shuffle combining tests - add common prefixes | Simon Pilgrim | 2018-09-29 | 1 | -567/+278
    llvm-svn: 343391

* [X86] SimplifyDemandedVectorEltsForTargetNode - remove identity target shuffles before simplifying inputs | Simon Pilgrim | 2018-09-29 | 2 | -5/+3
    By removing demanded target shuffles that simplify to zero/undef/identity before
    simplifying their inputs, we improve the chances of further simplification, as
    only the immediate parent user of the combined node is added back to the
    worklist. This still doesn't help us if the result is passed through other ops,
    though (bitcasts...).

    llvm-svn: 343390

* [X86] Add fast-isel test cases for unaligned load/store intrinsics recently added to clang | Craig Topper | 2018-09-29 | 1 | -0/+277
    This adds tests for:
    _mm_loadu_si16
    _mm_loadu_si32
    _mm_loadu_si64
    _mm_storeu_si64
    _mm_storeu_si32
    _mm_storeu_si16

    llvm-svn: 343389

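    A rough IR sketch of what one of these intrinsics lowers to (function and value
    names assumed; this mirrors _mm_loadu_si32: an align-1 scalar load inserted
    into a zeroed vector):

        define <2 x i64> @loadu_si32(i8* %p) {
          %q = bitcast i8* %p to i32*
          %v = load i32, i32* %q, align 1   ; unaligned 32-bit load
          %ins = insertelement <4 x i32> zeroinitializer, i32 %v, i32 0
          %res = bitcast <4 x i32> %ins to <2 x i64>
          ret <2 x i64> %res
        }
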
* [X86] getTargetConstantBitsFromNode - add support for rearranging constant bits via shuffles | Simon Pilgrim | 2018-09-29 | 5 | -86/+62
    Exposed an issue that recursive calls to getTargetConstantBitsFromNode don't
    handle changes to EltSizeInBits yet.

    llvm-svn: 343384

* [X86] Regenerate fma comments. | Simon Pilgrim | 2018-09-29 | 2 | -233/+233
    llvm-svn: 343376

* [X86] getTargetConstantBitsFromNode - add support for peeking through ISD::EXTRACT_SUBVECTOR | Simon Pilgrim | 2018-09-29 | 2 | -10/+9
    llvm-svn: 343375

* [X86][SSE] Fixed issue with v2i64 variable shifts on 32-bit targets | Simon Pilgrim | 2018-09-29 | 2 | -18/+36
    The shift amount might have peeked through an extract_subvector, altering the
    number of vector elements in the 'Amt' variable - so we were incorrectly
    calculating the ratio when peeking through bitcasts, resulting in incorrectly
    detecting splats.

    llvm-svn: 343373

* [ARM] Fix correctness checks in promoteToConstantPool. | Eli Friedman | 2018-09-28 | 2 | -16/+23
    Correctly check for relocations in the constant to promote. And don't allow
    promoting a constant multiple times.

    This partially fixes https://bugs.llvm.org/show_bug.cgi?id=32780; it's not a
    complete fix because we also need to prevent ARMConstantIslands from cloning the
    constant.

    (-arm-promote-constant is currently off by default, and it stays off with this
    patch. I'll look into turning it on again when all the known issues are fixed.)

    Differential Revision: https://reviews.llvm.org/D51472
    llvm-svn: 343361

* [ARM] Use preferred alignment for constants in promoteToConstantPool. | Eli Friedman | 2018-09-28 | 1 | -0/+11
    This mostly affects IR generated by non-clang frontends, because clang generally
    sets the alignment of globals explicitly.

    Fixes https://bugs.llvm.org/show_bug.cgi?id=32394.

    (-arm-promote-constant is currently off by default, and it stays off with this
    patch. I'll look into turning it on again when all the known issues are fixed.)

    Differential Revision: https://reviews.llvm.org/D51469
    llvm-svn: 343359

* [X86] Add test cases for failures to use narrow test with immediate instructions when a truncate is between the CMP and the AND and the sign flag is used | Craig Topper | 2018-09-28 | 1 | -0/+236
    The code in X86ISelDAGToDAG only looks through truncates if the sign flag isn't
    used, but that is overly restrictive. A future patch will improve this.

    llvm-svn: 343355

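    An IR sketch of the missed shape (names and constants assumed): the truncate
    sits between the AND and the compare, and the compare reads the sign flag:

        define i1 @narrow_test(i64 %x) {
          %and = and i64 %x, 255      ; mask computed at the wide width
          %t = trunc i64 %and to i8   ; truncate between the AND and the CMP
          %c = icmp slt i8 %t, 0      ; uses the sign flag of the narrow value
          ret i1 %c
        }
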
* [AArch64] Split zero cycle feature more granularly | Evandro Menezes | 2018-09-28 | 2 | -42/+180
    Split the `zcz` feature into specific ones for GP and FP registers, `zcz-gp` and
    `zcz-fp`, respectively, while retaining the original feature option to mean both.

    Differential revision: https://reviews.llvm.org/D52621
    llvm-svn: 343354

* Revert r343317 | Luke Cheeseman | 2018-09-28 | 2 | -13/+11
    - asan buildbots are breaking and I need to investigate the issue

    llvm-svn: 343341

* [GISel]: Remove an incorrect assert in CallLowering | Aditya Nandakumar | 2018-09-28 | 1 | -0/+13
    https://reviews.llvm.org/D51147

    Whether to assert on any extend of vectors should be up to the target's
    legalizer / target-specific code, not CallLowering.

    Reviewed by: dsanders.

    llvm-svn: 343325

* Reapply changes reverted by r343235 | Luke Cheeseman | 2018-09-28 | 2 | -11/+13
    - Add a fix so that all code paths that create a DWARFContext with an ObjectFile
      initialise the target architecture in the context
    - Add an assert that the Arch is known in the Dwarf CallFrameString method

    llvm-svn: 343317

* [MIPS GlobalISel] Lower i64 arguments | Petar Jovanovic | 2018-09-28 | 1 | -0/+106
    Lower integer arguments larger than 32 bits for MIPS32. setMostSignificantFirst
    is used in order for G_UNMERGE_VALUES and G_MERGE_VALUES to always hold registers
    in the same order, regardless of endianness.

    Patch by Petar Avramovic.

    Differential Revision: https://reviews.llvm.org/D52409
    llvm-svn: 343315

* Split invocations in CodeGen/X86/cpus.ll among multiple tests. (NFC) | Jonas Devlieghere | 2018-09-28 | 5 | -137/+142
    On GreenDragon, `CodeGen/X86/cpus.ll` is timing out on the bot with ASan and
    UBSan enabled. With the same configuration on my machine, the test passes but
    takes more than 3 minutes to do so. I could increase the timeout, but I believe
    it makes more sense to split up the test because it allows for more parallelism.

    Differential revision: https://reviews.llvm.org/D52603
    llvm-svn: 343313

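    Each split file carries per-CPU RUN lines of this general form (the exact flags
    and check prefixes in cpus.ll are assumed here for illustration):

        ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=skylake 2>&1 | FileCheck %s --allow-empty
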
* [X86][Btver2] Fix BSF/BSR schedule | Simon Pilgrim | 2018-09-28 | 1 | -12/+12
    Double throughput to account for 2 pipes, and fix BSF's latency/uop counts.

    Matches the AMD Fam16h SOG and llvm-exegesis tests.

    llvm-svn: 343311

* [ARM] Allow execute-only code on Cortex-M23 | David Spickett | 2018-09-28 | 3 | -0/+4
    The NoMovt feature prevents the use of MOVW/MOVT instructions on Cortex-M23 for
    performance reasons. These instructions are required for execute-only code, so
    NoMovt should be disabled when that option is enabled.

    Differential Revision: https://reviews.llvm.org/D52551
    llvm-svn: 343302

* [X86][BtVer2] Fix PHMINPOS schedule resources typo | Simon Pilgrim | 2018-09-28 | 1 | -2/+2
    PHMINPOS can run on either JFPU pipe.

    llvm-svn: 343299

* [X86] Add the test case from PR38986. | Craig Topper | 2018-09-27 | 1 | -0/+29
    The assembly for this test should be optimal now after changes to the
    ScalarizeMaskedMemIntrin patch.

    llvm-svn: 343281

* [ScalarizeMaskedMemIntrin] When expanding masked gathers, start with the passthru vector and insert the new load results into it | Craig Topper | 2018-09-27 | 2 | -261/+217
    Previously we started with undef and did a final merge with the passthru at the
    end.

    llvm-svn: 343273

* [ScalarizeMaskedMemIntrin] When expanding masked loads, start with the passthru value and insert each conditional load result over its element | Craig Topper | 2018-09-27 | 1 | -42/+12
    Previously we started with undef and did one final merge at the end with a
    select.

    llvm-svn: 343271

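    A sketch of the intrinsic being expanded (types and names assumed);
    conceptually, the scalarized form now starts from %passthru and conditionally
    inserts each loaded lane, rather than building a fresh vector and selecting
    against %passthru at the end:

        declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

        define <4 x i32> @masked(<4 x i32>* %p, <4 x i1> %m, <4 x i32> %passthru) {
          ; lanes where %m is false keep the corresponding %passthru element
          %r = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4, <4 x i1> %m, <4 x i32> %passthru)
          ret <4 x i32> %r
        }
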
* [AMDGPU] Fold copy (copy vgpr) | Stanislav Mekhanoshin | 2018-09-27 | 1 | -0/+27
    This allows reducing the number of VGPRs used in some cases.

    Differential Revision: https://reviews.llvm.org/D52577
    llvm-svn: 343249

* [ScalarizeMaskedMemIntrin] Don't emit 'icmp eq i1 %x, 1' to check mask values. That's just %x, so use that directly | Craig Topper | 2018-09-27 | 3 | -54/+39
    Had we emitted this IR earlier, InstCombine would have removed the icmp, so I'm
    going to assume using the i1 directly would be considered canonical.

    llvm-svn: 343244

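    The redundant form in question, as a minimal sketch (names assumed): comparing
    an i1 against true yields the i1 itself, so the expansion now branches on the
    mask bit directly:

        define i1 @redundant(i1 %m) {
          %c = icmp eq i1 %m, true   ; folds to just %m
          ret i1 %c
        }
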
* Revert r343192 as an ubsan build is currently failing | Luke Cheeseman | 2018-09-27 | 2 | -13/+11
    llvm-svn: 343235

* [Sparc] Remove the support for builtin setjmp/longjmp | Daniel Cederman | 2018-09-27 | 1 | -93/+0
    Summary: It is currently broken, and for Sparc there is not much benefit in
    using a builtin version compared to a library version. Both versions need to
    store the same four values in setjmp and flush the register windows in longjmp.
    If the need for a builtin setjmp/longjmp arises, there is an improved
    implementation available at https://reviews.llvm.org/D50969.

    Reviewers: jyknight, joerg, venkatra
    Subscribers: fedor.sergeev, jrtc27, llvm-commits
    Differential Revision: https://reviews.llvm.org/D51487
    llvm-svn: 343210

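    The builtin forms being removed correspond to LLVM's SjLj intrinsics (a sketch
    with an assumed buffer argument); Sparc code now relies on the libc
    setjmp/longjmp instead:

        declare i32 @llvm.eh.sjlj.setjmp(i8*)
        declare void @llvm.eh.sjlj.longjmp(i8*)

        define i32 @try_it(i8* %jmpbuf) {
          %r = call i32 @llvm.eh.sjlj.setjmp(i8* %jmpbuf)  ; returns 0 on the direct path
          ret i32 %r
        }
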
* Reapply changes reverted in r343114, lldb patch to follow shortly | Luke Cheeseman | 2018-09-27 | 2 | -11/+13
    llvm-svn: 343192

* [X86] Update tzcnt fast-isel tests to match clang r343126. | Craig Topper | 2018-09-26 | 2 | -74/+22
    We now generate cttz with the zero_undef flag set to false. This allows -O0 to
    avoid the zero check.

    llvm-svn: 343127

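    What the updated tests now expect, sketched in IR (function name assumed): cttz
    with is_zero_undef set to false, which TZCNT implements directly, so no
    separate zero check is needed:

        declare i32 @llvm.cttz.i32(i32, i1)

        define i32 @tzcnt32(i32 %x) {
          %r = call i32 @llvm.cttz.i32(i32 %x, i1 false)  ; defined (= 32) when %x is 0
          ret i32 %r
        }
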
* Revert r343112 as CallFrameString API change has broken lldb builds | Luke Cheeseman | 2018-09-26 | 2 | -13/+11
    llvm-svn: 343114

* [AArch64] - Return address signing dwarf support | Luke Cheeseman | 2018-09-26 | 2 | -11/+13
    - Reapply r343089 with a fix for DebugInfo/Sparc/gnu-window-save.ll

    llvm-svn: 343112

* [CodeGen] Always print register ties in MI::dump() | Francis Visoiu Mistrih | 2018-09-26 | 3 | -8/+8
    This was already the case when calling MO::dump(), but MI::dump() still depended
    on hasComplexRegisterTies(). The MIR output is not affected.

    llvm-svn: 343107

* Revert r343089 "[AArch64] - Return address signing dwarf support" | Hans Wennborg | 2018-09-26 | 2 | -13/+11
    This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail.

    > Functions that have signed return addresses need additional dwarf support:
    > - After signing the LR, and before authenticating it, the LR register is in a
    >   state that is unusable by a debugger or unwinder
    > - To account for this a new directive, .cfi_negate_ra_state, is added
    > - This directive says the signed state of the LR register has now changed,
    >   i.e. unsigned -> signed or signed -> unsigned
    > - This directive has the same CFA code as the SPARC directive GNU_window_save
    >   (0x2d), adding a macro to account for multiply defined codes
    > - This patch matches the gcc implementation of this support:
    >   https://patchwork.ozlabs.org/patch/800271/
    >
    > Differential Revision: https://reviews.llvm.org/D50136

    llvm-svn: 343103

* [PowerPC] optimize conditional branch on CRSET/CRUNSET | Hiroshi Inoue | 2018-09-26 | 2 | -0/+264
    This patch adds a check to optimize a conditional branch (BC and BCn) based on a
    constant set by CRSET or CRUNSET. Other optimizers, such as block placement, may
    generate such code, hence I do this at the very end of the optimization, in the
    pre-emit peephole pass. A conditional branch based on a constant is eliminated
    or converted into an unconditional branch. Also, CRSET/CRUNSET is eliminated if
    the condition code register is not used by any instruction other than the branch
    to be optimized.

    Differential Revision: https://reviews.llvm.org/D52345
    llvm-svn: 343100

* [X86][SSE] Refresh PR34947 test code to handle D52504 | Simon Pilgrim | 2018-09-26 | 1 | -181/+327
    The previously reduced version used 'urem <9 x i32> zeroinitializer, %tmp',
    which D52504 will simplify.

    llvm-svn: 343097

* [X86][SSE] Use ISD::MULHS for constant vXi16 ISD::SRA lowering (PR38151) | Simon Pilgrim | 2018-09-26 | 3 | -310/+172
    Similar to the existing ISD::SRL constant vector shifts from D49562, this patch
    adds ISD::SRA support with ISD::MULHS.

    As we're dealing with signed values, we have to handle the shift-by-zero and
    shift-by-one special cases, so XOP+AVX2/AVX512 splitting/extension is still a
    better solution - really we should still use ISD::MULHS if one of the special
    cases is used, but for now I've just left a TODO and filtered by
    isKnownNeverZero.

    Differential Revision: https://reviews.llvm.org/D52171
    llvm-svn: 343093

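    An illustrative input (values assumed): for a constant shift 0 < C < 16,
    'ashr x, C' on i16 lanes equals 'mulhs(x, 1 << (16 - C))'; the multiplier for
    C = 0 or C = 1 does not fit a signed i16, which is why those cases need special
    handling:

        define <8 x i16> @sra_const(<8 x i16> %x) {
          ; ashr by 5 == mulhs by (1 << 11) on each i16 lane
          %r = ashr <8 x i16> %x, <i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5>
          ret <8 x i16> %r
        }
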
* [ARM] Fix for PR39060 | Sam Parker | 2018-09-26 | 3 | -9/+182
    When calculating whether a value can safely overflow for use by an icmp, we
    weren't checking that the value couldn't wrap around. To do this, we need the
    icmp to be using a constant, as well as the incoming add or sub.

    Bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060

    Differential Revision: https://reviews.llvm.org/D52463
    llvm-svn: 343092

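    A sketch of the kind of pattern involved (constants and names assumed): an add
    feeding an icmp, where both must use constants before wrapping can be ruled
    out:

        define i1 @cmp_after_add(i8 %x) {
          %a = add i8 %x, 1       ; may wrap at the i8 boundary
          %c = icmp ne i8 %a, 0   ; only safe to fold if %a cannot wrap around
          ret i1 %c
        }
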
* [CodeGen] Enable tail calls for functions with NonNull attributes. | David Green | 2018-09-26 | 1 | -0/+12
    Adding NonNull as an attribute to returned pointers has the unfortunate side
    effect of disabling tail calls. This patch ignores the NonNull attribute when we
    decide whether to tail merge, in the same way that we ignore the NoAlias
    attribute, as it has no effect on the call sequence.

    Differential Revision: https://reviews.llvm.org/D52238
    llvm-svn: 343091

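    A minimal sketch (names assumed) of the call shape that was being penalized;
    with this change, the nonnull return attribute no longer blocks the tail call:

        declare nonnull i8* @callee()

        define i8* @caller() {
          %p = tail call nonnull i8* @callee()  ; can now remain a tail call
          ret i8* %p
        }
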
* Fixes removal of dead elements from PressureDiff (PR37252). | Yury Gribov | 2018-09-26 | 4 | -30/+30
    Reviewed By: MatzeB

    Differential Revision: https://reviews.llvm.org/D51495
    llvm-svn: 343090

* [AArch64] - Return address signing dwarf support | Luke Cheeseman | 2018-09-26 | 2 | -11/+13
    Functions that have signed return addresses need additional dwarf support:
    - After signing the LR, and before authenticating it, the LR register is in a
      state that is unusable by a debugger or unwinder
    - To account for this a new directive, .cfi_negate_ra_state, is added
    - This directive says the signed state of the LR register has now changed,
      i.e. unsigned -> signed or signed -> unsigned
    - This directive has the same CFA code as the SPARC directive GNU_window_save
      (0x2d), adding a macro to account for multiply defined codes
    - This patch matches the gcc implementation of this support:
      https://patchwork.ozlabs.org/patch/800271/

    Differential Revision: https://reviews.llvm.org/D50136
    llvm-svn: 343089

* Revert r342870 "[ARM] bottom-top mul support ARMParallelDSP" | Hans Wennborg | 2018-09-26 | 3 | -558/+0
    This broke Chromium's Android build (https://crbug.com/889390) and the
    polly-aosp buildbot
    (http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable).

    > Originally committed in rL342210 but was reverted in rL342260 because
    > it was causing issues in vectorized code, because I had forgotten to
    > ensure that we're operating on scalar values.
    >
    > Original commit message:
    >
    > On failing to find sequences that can be converted into dual macs,
    > try to find sequential 16-bit loads that are used by muls which we
    > can then use smultb, smulbt, smultt with a wide load.
    >
    > Differential Revision: https://reviews.llvm.org/D51983

    llvm-svn: 343082

* Revert "Revert "[ConstHoist] Do not rebase single (or few) dependent constant""Zhaoshi Zheng2018-09-261-0/+191
| | | | | | | | | | | | | | | | | | | This reverts commit bd7b44f35ee9fbe365eb25ce55437ea793b39346. Reland r342994: disabled the optimization and explicitly enable it in test. -mllvm -consthoist-min-num-to-rebase<unsigned>=0 [ConstHoist] Do not rebase single (or few) dependent constant If an instance (InsertionPoint or IP) of Base constant A has only one or few rebased constants depending on it, do NOT rebase. One extra ADD instruction is required to materialize each rebased constant, assuming A and the rebased have the same materialization cost. Differential Revision: https://reviews.llvm.org/D52243 llvm-svn: 343053
* [WebAssembly] SIMD conversions | Thomas Lively | 2018-09-26 | 1 | -0/+100
    Summary: Lowers (s|u)itofp and fpto(s|u)i instructions for vectors. The fp to
    int conversions produce poison values if their arguments are out of the
    convertible range, so a future CL will have to add an LLVM intrinsic to make the
    saturating behavior of this conversion usable.

    Reviewers: aheejin, dschuff
    Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
    Differential Revision: https://reviews.llvm.org/D52372
    llvm-svn: 343052

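    The vector conversions now lowered, as a minimal IR sketch (function names
    assumed); note the fp-to-int direction is poison for out-of-range inputs, per
    the summary above:

        define <4 x float> @itofp(<4 x i32> %v) {
          %r = sitofp <4 x i32> %v to <4 x float>
          ret <4 x float> %r
        }

        define <4 x i32> @fptoi(<4 x float> %v) {
          %r = fptosi <4 x float> %v to <4 x i32>  ; poison if out of range
          ret <4 x i32> %r
        }
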
* [AMDGPU] Fix ds combine with subregs | Stanislav Mekhanoshin | 2018-09-25 | 1 | -3/+99
    Differential Revision: https://reviews.llvm.org/D52522
    llvm-svn: 343047

* [X86] Allow movmskpd/ps ISD nodes to be created and selected with integer input types | Craig Topper | 2018-09-25 | 2 | -58/+12
    This removes an int->fp bitcast between the surrounding code and the movmsk. I
    had already added a hack to combineMOVMSK to try to look through this bitcast to
    improve the SimplifyDemandedBits there.

    But I found an additional issue where the bitcast was preventing combineMOVMSK
    from being called again after earlier nodes in the DAG are optimized. The
    bitcast gets revisited, but not the user of the bitcast. By using integer types
    throughout, the bitcast doesn't get in the way.

    llvm-svn: 343046

* [X86] Add some more movmsk test cases. NFC | Craig Topper | 2018-09-25 | 1 | -0/+232
    These IR patterns represent the exact behavior of a movmsk instruction using
    (zext (bitcast (icmp slt X, 0))).

    For the v4i32/v8i32/v2i64/v4i64 cases we currently emit a PCMPGT for the icmp
    slt, which is unnecessary since we only care about the sign bit of the result.
    This is because of the int->fp bitcast we put on the input to the movmsk nodes
    for these cases. I'll be fixing this in a future patch.

    llvm-svn: 343045

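    The exact pattern the new tests cover (names assumed): sign-bit extraction that
    should select MOVMSKPS directly, without a preceding PCMPGT:

        define i32 @movmsk_v4i32(<4 x i32> %x) {
          %cmp = icmp slt <4 x i32> %x, zeroinitializer  ; only the sign bits matter
          %bits = bitcast <4 x i1> %cmp to i4
          %r = zext i4 %bits to i32
          ret i32 %r
        }
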