summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* Implement strip.invariant.groupPiotr Padlewski2018-07-021-1/+9
| | | | | | | | | | | | | | | | Summary: This patch introduce new intrinsic - strip.invariant.group that was described in the RFC: Devirtualization v2 Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47103 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336073
* [X86] Fix a few test names in avx512-intrinsics-fast-isel.ll to match their ↵Craig Topper2018-07-011-8/+8
| | | | | | | | clang intrinsic names. I thought I fixed these yesterday, but I guess I missed a few. llvm-svn: 336071
* [DAGCombiner] Handle correctly non-splat power of 2 -1 divisor (PR37119)Simon Pilgrim2018-06-301-268/+80
| | | | | | | | | | The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1, -1 is the only power of two value for which the sdiv expansion recipe breaks. Thanks to @zvi for the original patch. Differential Revision: https://reviews.llvm.org/D45806 llvm-svn: 336048
* [X86] Update some avx512 fast-isel tests to match their real clang IRgen.Craig Topper2018-06-302-332/+353
| | | | | | | | | | Especially of note was the test_mm_mask_set1_epi64 and other set1 tests that were truncating the element to be broadcasted to i8 and broadcasting that instead of a whole 64 bit value. Some of the others were just correcting mask sizes on parameters due to bugs in the clang test case they were generated from that have now been fixed. Some were converting i8 to <4 x i1>/<2 x i1> by truncating to i4/i2 and then bitcasting. But the clang codegen is bitcast to <8 x i1>, then extract to <4 x i1>/<2 x i1>. This is likely to incur less trouble from the integer type legalizer in the backend. llvm-svn: 336045
* [X86] Change some chec-prefixes from X32 to X86 to match the FileCheck ↵Craig Topper2018-06-301-48/+48
| | | | | | | | command line. I think this test changed and these test cases were created around the same time and missed the change. llvm-svn: 336044
* [X86] Remove test cases from avx512vl-intrinsics-fast-isel.ll for intrinsics ↵Craig Topper2018-06-301-49/+0
| | | | | | that don't really exist in clang. llvm-svn: 336043
* AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal.Tom Stellard2018-06-301-0/+20
| | | | | | | | | | | | | | | | | | | Summary: We could split sizes that are not power of two into smaller sized G_IMPLICIT_DEF instructions, but this ends up generating G_MERGE_VALUES instructions which we then have to handle in the instruction selector. Since G_IMPLICIT_DEF is really a no-op it's easier just to keep everything that can fit into a register legal. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48777 llvm-svn: 336041
* [MachineOutliner] Add support for target-default outlining.Jessica Paquette2018-06-301-2/+7
| | | | | | | | | | | | | | | | | This adds functionality to the outliner that allows targets to specify certain functions that should be outlined from by default. If a target supports default outlining, then it specifies that in its TargetOptions. In the case that it does, and the user hasn't specified that they *never* want to outline, the outliner will be added to the pass pipeline and will run on those default functions. This is a preliminary patch for turning the outliner on by default under -Oz for AArch64. https://reviews.llvm.org/D48776 llvm-svn: 336040
* [X86] Remove masking from avx512 rotate intrinsics. Use select in IR instead.Craig Topper2018-06-307-232/+2526
| | | | llvm-svn: 336035
* [WebAssembly] Update comments for non-splat pow2 vector test caseHeejin Ahn2018-06-291-1/+3
| | | | | | | | | | | | | | | | | | | Summary: After rL335727, (sdiv X, 1) is treated as a special case, so we can safely transform 'sdiv's in non-splat pow vectors into 'shr's even when some of its entries are '1'. The test expectations have been already fixed in rL335771, but the comments were out of date. Also changed the filename from `vector_sdiv.ll` to `vector-sdiv.ll` to be consistent with other test file names. Reviewers: RKSimon Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48692 llvm-svn: 336018
* AMDGPU: Don't use struct type for argument layoutMatt Arsenault2018-06-296-701/+700
| | | | | | | | | | This was introducing unnecessary padding after the explicit arguments, depending on the alignment of the total struct type. Also has the side effect of avoiding creating an extra GEP for the offset from the base kernel argument to the explicit kernel argument offset. llvm-svn: 335999
* [X86] Limit the number of target specific nodes emitted in LowerShiftPartsCraig Topper2018-06-292-42/+22
| | | | | | | | | | The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And its just better to limit the number of places we emit target specific nodes. The changed test cases still aren't optimal. Differential Revision: https://reviews.llvm.org/D48619 llvm-svn: 335998
* [mips] Support shrink-wrappingPetar Jovanovic2018-06-293-2/+394
| | | | | | | | | | Except for -O0, it's enabled by default. Patch by Vladimir Stefanovic. Differential Revision: https://reviews.llvm.org/D47947 llvm-svn: 335989
* [AMDGPU] Enable LICM in the BE pipelineStanislav Mekhanoshin2018-06-298-32/+289
| | | | | | | | | | This allows to hoist code portion to compute reciprocal of loop invariant denominator in integer division after codegen prepare expansion. Differential Revision: https://reviews.llvm.org/D48604 llvm-svn: 335988
* [MachineOutliner] Add always and never options to -enable-machine-outlinerJessica Paquette2018-06-291-0/+12
| | | | | | | | | | | | | | | | | This is a recommit of r335887, which was erroneously committed earlier. To enable the MachineOutliner by default on AArch64, we need to be able to disable the MachineOutliner and also provide an option to "always" enable the outliner. This adds that capability. It allows the user to still use the old -enable-machine-outliner option, which defaults to "always". This is building up to allowing the user to specify "always" versus the target default outlining behaviour. https://reviews.llvm.org/D48682 llvm-svn: 335986
* [X86][SSE] Support v16i8/v32i8 vector rotationsSimon Pilgrim2018-06-293-1341/+1130
| | | | | | | | | | This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op. Unfortunately I haven't found a decent way to share much of this code with the shift equivalent. Differential Revision: https://reviews.llvm.org/D48655 llvm-svn: 335957
* [X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in ↵Craig Topper2018-06-293-8/+490
| | | | | | | | IR instead. While there improve the coverage of the intrinsic testing and add fast-isel tests. llvm-svn: 335944
* [MachineOutliner] Never add the outliner in -O0Jessica Paquette2018-06-281-0/+30
| | | | | | | | | | | | This is a recommit of r335879. We shouldn't add the outliner when compiling at -O0 even if -enable-machine-outliner is passed in. This makes sure that we don't add it in this case. This also removes -O0 from the outliner DWARF test. llvm-svn: 335930
* [COFF] Fix constant sharing regression for MinGWMartin Storsjo2018-06-281-0/+9
| | | | | | | | | | This fixes a regression since SVN r334523, where the object files built targeting MinGW were rejected by GNU binutils tools. Prior to that commit, we only put constants in comdat for MSVC configurations. Differential Revision: https://reviews.llvm.org/D48567 llvm-svn: 335918
* [X86] Suppress load folding into and/or/xor if it will prevent matching ↵Craig Topper2018-06-281-68/+42
| | | | | | | | | | btr/bts/btc. This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy. Differential Revision: https://reviews.llvm.org/D48706 llvm-svn: 335895
* Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC ↵Jonas Devlieghere2018-06-285-408/+8
| | | | | | | | | | | | | code models"" Reverting because this is causing failures in the LLDB test suite on GreenDragon. LLVM ERROR: unsupported relocation with subtraction expression, symbol '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction expression llvm-svn: 335894
* [DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED)Simon Pilgrim2018-06-282-6/+33
| | | | | | | | | | We could get away with it for constant folded cases, but not for rL335719. Thanks to Krzysztof Parzyszek for noticing. Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884. llvm-svn: 335886
* Revert "[MachineOutliner] Add always and never options to ↵Jessica Paquette2018-06-281-30/+0
| | | | | | | | | -enable-machine-outliner" I accidentally committed this instead of D48683 because I haven't had coffee yet. llvm-svn: 335883
* Revert "[MachineOutliner] Never add the outliner in -O0"Jessica Paquette2018-06-281-16/+4
| | | | | | | | | | | This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091. It relies on r335872 since that introduces the machine outliner flags test. I meant to commit D48683 in that commit, but got mixed up and committed D48682 instead. So, I'm reverting this and r335872, since D48682 hasn't made it through review yet. llvm-svn: 335882
* [MachineOutliner] Never add the outliner in -O0Jessica Paquette2018-06-281-4/+16
| | | | | | | | | | | We shouldn't add the outliner when compiling at -O0 even if -enable-machine-outliner is passed in. This makes sure that we don't add it in this case. This also updates machine-outliner-flags to reflect the change and improves the comment describing what that test does. llvm-svn: 335879
* SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)Matthias Braun2018-06-282-2/+29
| | | | | | | | | | | | | Add NoTrapAfterNoreturn target option which skips emission of traps behind noreturn calls even if TrapUnreachable is enabled. Enable the feature on Mach-O to save code size; Comments suggest it is not possible to enable it for the other users of TrapUnreachable. rdar://41530228 DifferentialRevision: https://reviews.llvm.org/D48674 llvm-svn: 335877
* [MachineOutliner] Add always and never options to -enable-machine-outlinerJessica Paquette2018-06-281-0/+30
| | | | | | | | | | | | | To enable the MachineOutliner by default on AArch64, we need to be able to disable the MachineOutliner and also provide an option to "always" enable the outliner. This adds that capability. It allows the user to still use the old -enable-machine-outliner option, which defaults to "always". This is building up to allowing the user to specify "always" versus the target-default outlining behaviour. llvm-svn: 335872
* Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV"Haojian Wu2018-06-282-33/+6
| | | | | | | | This reverts commit r335821. This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce. llvm-svn: 335871
* [AMDGPU] Early expansion of 32 bit udiv/uremStanislav Mekhanoshin2018-06-283-128/+2431
| | | | | | | | | | | | This allows hoisting of a common code, for instance if denominator is loop invariant. Current change is expansion only, adding licm to the target pass list going to be a separate patch. Given this patch changes to codegen are minor as the expansion is similar to that on DAG. DAG expansion still must remain for R600. Differential Revision: https://reviews.llvm.org/D48586 llvm-svn: 335868
* [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16Stanislav Mekhanoshin2018-06-282-8/+124
| | | | | | Differential Revision: https://reviews.llvm.org/D48677 llvm-svn: 335866
* [ARM] Parallel DSP PassSjoerd Meijer2018-06-2813-0/+677
| | | | | | | | | | | | | | | | | Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of this pass is to do some straightforward IR pattern matching to create ACLE DSP intrinsics, which map on these 32-bit SIMD operations. Currently, only the SMLAD instruction gets recognised. This instruction performs two multiplications with 16-bit operands, and stores the result in an accumulator. We will follow this up with patches to recognise SMLAD in more cases, and also to generate other DSP instructions (like e.g. SADD16). Patch by: Sam Parker and Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D48128 llvm-svn: 335850
* AMDGPU: Error on calls from graphics shadersMatt Arsenault2018-06-281-0/+7
| | | | | | | | In principle nothing should stop these from working, but work is necessary to create an ABI for dealing with the stack related registers. llvm-svn: 335829
* AMDGPU: Fix assert on aggregate type kernel argumentsMatt Arsenault2018-06-281-0/+171
| | | | | | | | | | Just fix the crash for now by not doing the optimization since figuring out how to properly convert the bits for an arbitrary struct is a pain. Also fix a crash when there is only an empty struct argument. llvm-svn: 335827
* [DAGCombiner] Ensure we use the correct CC result type in visitSDIVSimon Pilgrim2018-06-282-6/+33
| | | | | | | | We could get away with it for constant folded cases, but not for rL335719. Thanks to Krzysztof Parzyszek for noticing. llvm-svn: 335821
* [RISCV] Add machine function pass to merge base + offsetSameer AbuAsal2018-06-271-37/+38
| | | | | | | | | | | | | | | | | | | | | | | | Summary: In r333455 we added a peephole to fix the corner cases that result from separating base + offset lowering of global address.The peephole didn't handle some of the cases because it only has a basic block view instead of a function level view. This patch replaces that logic with a machine function pass. In addition to handling the original cases it handles uses of the global address across blocks in function and folding an offset from LW\SW instruction. This pass won't run for OptNone compilation, so there will be a negative impact overall vs the old approach at O0. Reviewers: asb, apazos, mgrang Reviewed By: asb Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones Differential Revision: https://reviews.llvm.org/D47857 llvm-svn: 335786
* [WebAssembly] Try fixing test/CodeGen/WebAssembly/vector_sdiv.llFangrui Song2018-06-271-3/+3
| | | | llvm-svn: 335771
* [globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in ↵Daniel Sanders2018-06-271-12/+12
| | | | | | | | | | | | | AArch64 Now that we have the ability to legalize based on MMO's. Add support for legalizing based on AtomicOrdering and use it to correct the legalization of the atomic instructions. Also extend all() to be a variadic template as this ruleset now requires 3 and 4 argument versions. llvm-svn: 335767
* [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zerosSanjay Patel2018-06-279-39/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not: #include <stdio.h> int main(int argc) { float x; x = -0.8 * argc; printf("%f\n", (float)((int)x)); return 0; } $ clang -O0 -mavx fp.c ; ./a.out 0.000000 $ clang -O1 -mavx fp.c ; ./a.out -0.000000 Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option. Differential Revision: https://reviews.llvm.org/D48085 llvm-svn: 335761
* [MachineOutliner] Don't outline sequences where x16/x17/nzcv are live acrossJessica Paquette2018-06-271-0/+183
| | | | | | | | | | | | | | It isn't safe to outline sequences of instructions where x16/x17/nzcv live across the sequence. This teaches the outliner to check whether or not a specific canidate has x16/x17/nzcv live across it and discard the candidate in the case that that is true. https://bugs.llvm.org/show_bug.cgi?id=37573 https://reviews.llvm.org/D47655 llvm-svn: 335758
* [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit ↵Craig Topper2018-06-271-68/+33
| | | | | | | | | | position If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index. Fixes PR37938. llvm-svn: 335754
* [X86] Add test cases for D48606.Craig Topper2018-06-271-0/+995
| | | | llvm-svn: 335753
* [X86][SSE] Add missing AVX512 rotation testsSimon Pilgrim2018-06-273-230/+943
| | | | | | Increase coverage to make sure we're not doing anything stupid without AVX512BW llvm-svn: 335746
* [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics ↵Craig Topper2018-06-2710-93/+93
| | | | | | | | that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744
* [AMDGPU] Convert rcp to rcp_iflagStanislav Mekhanoshin2018-06-275-22/+43
| | | | | | | | | | | If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742
* [AArch64] Reverting FP16 vcvth_n_s64_f16 to fixLuke Geeson2018-06-271-20/+0
| | | | llvm-svn: 335737
* [AArch64] Add custom lowering for v4i8 trunc storeAdhemerval Zanella2018-06-271-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int *src, int width, unsigned char *dst) { for (int i = 0; i < width; i++) *dst++ = *src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735
* [NEON] Support vldNq intrinsics in AArch32 (LLVM part)Ivan A. Kosarev2018-06-271-0/+234
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the q versions of the dup (load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for example. Currently, non-q versions of the dup intrinsics are implemented in clang by generating IR that first loads the elements of the structure into the first lane with the lane (to-single-lane) intrinsics, and then propagating it other lanes. There are at least two problems with this approach. First, there are no double-spaced to-single-lane byte-element instructions. For example, there is no such instruction as 'vld2.8 { d0[0], d2[0] }, [r0]'. That means we cannot rely on the to-single-lane intrinsics and instructions to implement the q versions of the dup intrinsics. Note that to-all-lanes instructions do support all sizes of data items, including bytes. The second problem with the current approach is that we need a separate vdup instruction to propagate the structure to each lane. So for vld4q_dup_f16() we would need four vdup instructions in addition to the initial vld instruction. This patch introduces dup LLVM intrinsics and reworks handling of the currently supported (non-q) NEON dup intrinsics to expand them into those LLVM intrinsics, thus eliminating the need for using to-single-lane intrinsics and instructions. Additionally, this patch adds support for u64 and s64 dup NEON intrinsics. These are marked as Arch64-only in the ARM NEON Reference, but it seems there are no reasons to not support them in AArch32 mode. Please correct, if that is wrong. That's what we generate with this patch applied: vld2q_dup_f16: vld2.16 {d0[], d2[]}, [r0] vld2.16 {d1[], d3[]}, [r0] vld3q_dup_f16: vld3.16 {d0[], d2[], d4[]}, [r0] vld3.16 {d1[], d3[], d5[]}, [r0] vld4q_dup_f16: vld4.16 {d0[], d2[], d4[], d6[]}, [r0] vld4.16 {d1[], d3[], d5[], d7[]}, [r0] Differential Revision: https://reviews.llvm.org/D48439 llvm-svn: 335733
* [DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in ↵Simon Pilgrim2018-06-271-3535/+1415
| | | | | | | | pow2 expansion For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs llvm-svn: 335727
* [X86][SSE] Include MIN_SIGNED element in non-uniform SDIV pow2 testsSimon Pilgrim2018-06-271-25/+17
| | | | llvm-svn: 335721
* [DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0)Simon Pilgrim2018-06-271-54/+12
| | | | | | Fixes PR37569. llvm-svn: 335719
OpenPOWER on IntegriCloud