path: root/llvm/test/CodeGen
Commits are listed as: title (author, date; files changed, -deleted/+added lines).
* [AArch64, x86] add tests for shift-not (PR39657); NFC (Sanjay Patel, 2018-11-20; 2 files, -0/+50)
    llvm-svn: 347316
* [DAGCombine] Add calls to SimplifyDemandedVectorElts from visitINSERT_SUBVECTOR (PR37989) (Simon Pilgrim, 2018-11-20; 3 files, -91/+67)
    This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling, as its bounds check was bailing on safe indices.
    llvm-svn: 347313
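    [Illustration] A hypothetical IR shape that reaches this combine: concatenating two 128-bit vectors typically lowers through CONCAT_VECTORS and then INSERT_SUBVECTOR nodes on x86, where SimplifyDemandedVectorElts can now prune unused subvector lanes (a sketch, not one of the committed tests):
        define <8 x i32> @concat(<4 x i32> %a, <4 x i32> %b) {
          ; the 8-wide mask simply appends %b after %a
          %r = shufflevector <4 x i32> %a, <4 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
          ret <8 x i32> %r
        }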
* [PowerPC] Add Itineraries for STWU/STWUX etc (Jinsong Ji, 2018-11-20; 1 file, -2/+2)
    When doing some instruction scheduling work, we noticed some missing itineraries.
    Before we switch to the machine scheduler, those missing itineraries might not affect actual scheduling, because we can still get the same latency from the default values. With the machine scheduler, however, itineraries do affect scheduling. For example, NumMicroOps defaults to 0 if there are no itineraries for a specific instruction class, while most instruction classes with itineraries default NumMicroOps to 1. This changes the count of RetiredMOps and affects the Pending/Available queues, which in turn can cause different or suboptimal scheduling.
    This patch handles STWU/STWUX (IIC_LdStStoreUpd) for P8. Since there are already multiple IICs for store-update, it also merges IIC_LdStSTDU/IIC_LdStStoreUpd into IIC_LdStSTU, and IIC_LdStSTDUX into IIC_LdStSTUX. We add a new testcase in https://reviews.llvm.org/D54699 to show the difference.
    Differential Revision: https://reviews.llvm.org/D54700
    llvm-svn: 347311
* [PowerPC][NFC] Add testcase for STWU scheduling check (Jinsong Ji, 2018-11-20; 1 file, -0/+72)
    This patch adds an STWU testcase for the scheduling check. Currently P7/P8, which use itineraries, are missing IIC_LdStStoreUpd; we use the CHECK-ITIN prefix to check P7/P8, and the default prefix for P9 (and future CPUs).
    We will fix the missing IIC_LdStStoreUpd itineraries in a following patch, and update this testcase to show only the scheduling difference there.
    Differential Revision: https://reviews.llvm.org/D54699
    llvm-svn: 347310
* [X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions (Simon Pilgrim, 2018-11-20; 2 files, -40/+19)
    Pull the getPackDemandedElts demanded-elts remapping helper out of computeKnownBitsForTargetNode and use it in computeKnownBits/ComputeNumSignBits.
    llvm-svn: 347303
* [TargetLowering] Improve SimplifyDemandedVectorElts/SimplifyDemandedBits support (Simon Pilgrim, 2018-11-20; 6 files, -518/+79)
    For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask into a bits mask.
    I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem.
    Differential Revision: https://reviews.llvm.org/D54679
    llvm-svn: 347301
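    [Illustration] A sketch of the elts-to-bits remapping on a bitcast (illustrative types; on a little-endian target):
        ; demanding only element 0 of the <2 x i64> result maps to demanding
        ; the low 64 bits of the <4 x i32> source, i.e. source elements 0 and 1
        define i64 @demand_low(<4 x i32> %x) {
          %bc = bitcast <4 x i32> %x to <2 x i64>
          %e = extractelement <2 x i64> %bc, i32 0
          ret i64 %e
        }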
* [X86][SSE] Lower immediately to PACKUS instead of VECTOR_SHUFFLE (Simon Pilgrim, 2018-11-20; 2 files, -153/+115)
    As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier.
    llvm-svn: 347300
* [X86][SSE] Add SimplifyDemandedVectorElts support for PACKSS/PACKUS instructions (Simon Pilgrim, 2018-11-20; 3 files, -12/+9)
    As discussed on rL347240.
    llvm-svn: 347299
* [X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef (Craig Topper, 2018-11-20; 13 files, -332/+316)
    Previously, if V2 was unused, we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output.
    As near as I can tell, this makes v16i8 behavior consistent with every other VT now. This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said, we're already doing that for other types.
    llvm-svn: 347296
* [X86] Replace more calls to getZeroVector with regular getConstant (Craig Topper, 2018-11-20; 1 file, -26/+21)
    getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it.
    The test changes are because MULH lowering happens later than it should, and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast.
    llvm-svn: 347290
* [PowerPC] Don't combine to bswap store on 1-byte truncating store (Nemanja Ivanovic, 2018-11-20; 1 file, -0/+26)
    Turns out that there was no check for a store that truncates down to a single byte when combining a (store (bswap ...)) into a byte-swapping store. This patch just adds that check.
    Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.
    llvm-svn: 347288
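    [Illustration] A minimal IR sketch of the pattern this fix guards against (names are assumptions, not the committed test): the store only writes one byte, so rewriting it as a byte-swapping store would be wrong.
        declare i32 @llvm.bswap.i32(i32)

        define void @trunc_store_bswap(i32 %x, i8* %p) {
          %b = call i32 @llvm.bswap.i32(i32 %x)
          %t = trunc i32 %b to i8
          store i8 %t, i8* %p   ; 1-byte truncating store: must not become a bswap store
          ret void
        }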
* [SelectionDAG] Compute known bits and num sign bits for live-out vector registers. Use it to add AssertZExt/AssertSExt in the live-in basic blocks (Craig Topper, 2018-11-20; 1 file, -50/+4)
    Summary: We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see that the upper bits are zero, so we can use fewer multiply instructions to emulate a 64-bit multiply. This should help with the ispc issue that a coworker pointed me to: https://github.com/ispc/ispc/issues/1362
    Reviewers: spatel, efriedma, RKSimon, arsenm
    Reviewed By: spatel
    Subscribers: wdng, llvm-commits
    Differential Revision: https://reviews.llvm.org/D54725
    llvm-svn: 347287
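    [Illustration] The kind of pattern that benefits (a sketch, not the exact test): the zexts' known-zero upper bits now survive into the second block via AssertZExt, so the i64 multiply can be emulated with fewer instructions.
        define <4 x i64> @cross_block_mul(<4 x i32> %a, <4 x i32> %b) {
        entry:
          %za = zext <4 x i32> %a to <4 x i64>
          %zb = zext <4 x i32> %b to <4 x i64>
          br label %use
        use:                                    ; known bits now flow into this block
          %m = mul <4 x i64> %za, %zb
          ret <4 x i64> %m
        }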
* Implement computeKnownBits for scalar_to_vector (Stanislav Mekhanoshin, 2018-11-19; 1 file, -2/+5)
    Differential Revision: https://reviews.llvm.org/D54728
    llvm-svn: 347274
* [X86] Add test case to show missed opportunity to use a single pmuludq to implement a multiply when a zext lives in another basic block (Craig Topper, 2018-11-19; 1 file, -0/+138)
    This can occur when one of the inputs to the multiply is loop invariant, though my test cases just use two basic blocks with an unconditional jump, which we won't merge until after isel in the codegen pipeline.
    For scalars, I believe SelectionDAGBuilder can add an AssertZExt to pass knowledge across basic blocks, but it's explicitly disabled for vectors.
    llvm-svn: 347266
* AMDGPU: Fix V_FMA_F16 selection on GFX9 (Konstantin Zhuravlyov, 2018-11-19; 2 files, -9/+9)
    GFX9 should select the opsel version.
    Differential Revision: https://reviews.llvm.org/D54545
    llvm-svn: 347265
* [AMDGPU] Restored selection of scalar_to_vector (v2x16) (Stanislav Mekhanoshin, 2018-11-19; 1 file, -0/+26)
    This works if the DAG combiner is enabled, but without combining we cannot select scalar_to_vector of <2 x half> and <2 x i16>.
    Differential Revision: https://reviews.llvm.org/D54718
    llvm-svn: 347259
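    [Illustration] scalar_to_vector commonly arises from an insert into lane 0 of an undef vector (an illustrative sketch):
        define <2 x i16> @s2v(i16 %x) {
          ; insertelement into element 0 of undef becomes a scalar_to_vector node
          %v = insertelement <2 x i16> undef, i16 %x, i32 0
          ret <2 x i16> %v
        }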
* [TargetLowering] expandFP_TO_UINT - improve fp16 support (Simon Pilgrim, 2018-11-19; 1 file, -248/+48)
    As discussed on D53794, for float types with ranges smaller than the destination integer type, we should be able to just use a regular FP_TO_SINT opcode.
    I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary.
    Differential Revision: https://reviews.llvm.org/D54703
    llvm-svn: 347251
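    [Illustration] A sketch of the enabled case: half's largest finite value (65504) fits comfortably in i32's signed range, so the unsigned conversion can be emitted as a plain FP_TO_SINT.
        define i32 @half_to_u32(half %h) {
          %r = fptoui half %h to i32   ; lowers as fptosi, since the fp16 range cannot exceed i32
          ret i32 %r
        }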
* [X86][SSE] Remove unnecessary bit-and in pshufb vector ctlz (PR39703) (Simon Pilgrim, 2018-11-19; 5 files, -1074/+872)
    SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off, but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction.
    Differential Revision: https://reviews.llvm.org/D54707
    llvm-svn: 347242
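    [Illustration] The lowering in question handles IR like this (sketch):
        declare <16 x i8> @llvm.ctlz.v16i8(<16 x i8>, i1)

        define <16 x i8> @ctlz16(<16 x i8> %x) {
          ; lowered via PSHUFB table lookups on the high and low nibbles;
          ; the low-nibble mask proved redundant (PR39703)
          %r = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> %x, i1 false)
          ret <16 x i8> %r
        }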
* [X86] Attempt to improve v32i8/v64i8 multiply lowering by applying the v16i8 non-avx2 algorithm to each 128-bit lane (Craig Topper, 2018-11-19; 6 files, -635/+692)
    Previously we split the vectors in half to allow the two halves to be any-extended, then concatenated the results back together.
    This patch instead extends the v16i8 sse algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw, multiplies all the low-half lanes and high-half lanes together in separate operations, then merges the half-lane results back together using packuswb.
    Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split.
    Differential Revision: https://reviews.llvm.org/D54668
    llvm-svn: 347240
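    [Illustration] The IR being lowered is simply a wide byte multiply (sketch); the comments summarize the per-lane strategy described above:
        define <32 x i8> @mul_v32i8(<32 x i8> %a, <32 x i8> %b) {
          ; per 128-bit lane: punpcklbw/punpckhbw to widen bytes to words,
          ; multiply each half, then merge the halves back with packuswb
          %r = mul <32 x i8> %a, %b
          ret <32 x i8> %r
        }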
* [ARM] Attempt to fix arm selfhost bots after rL347191 (Sam Parker, 2018-11-19; 1 file, -2/+2)
    llvm-svn: 347238
* [AMDGPU] Convert insert_vector_elt into set of selects (Stanislav Mekhanoshin, 2018-11-19; 9 files, -140/+515)
    This allows us to avoid scratch use or indirect VGPR addressing for small vectors.
    Differential Revision: https://reviews.llvm.org/D54606
    llvm-svn: 347231
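    [Illustration] The pattern in question is a variable-index insert (an illustrative sketch):
        define <4 x float> @dyn_insert(<4 x float> %v, float %x, i32 %idx) {
          ; with a non-constant %idx this becomes, per lane i:
          ;   lane_i = (idx == i) ? %x : %v[i]
          ; instead of spilling to scratch or using indirect VGPR addressing
          %r = insertelement <4 x float> %v, float %x, i32 %idx
          ret <4 x float> %r
        }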
* [WebAssembly] replaced .param/.result by .functype (Wouter van Oortmerssen, 2018-11-19; 64 files, -1723/+1070)
    Summary: This makes it easier/cleaner to generate a single signature from this directive.
    Also:
    - Adds the symbol name, such that we don't depend on the location of this directive anymore.
    - Actually constructs the signature in the assembler, and makes the assembler own it.
    - Refactors the use of MVT vs ValType in the streamer and assembler to require fewer conversions overall.
    - Changed 700 or so tests to use it.
    Reviewers: sbc100, dschuff
    Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits
    Differential Revision: https://reviews.llvm.org/D54652
    llvm-svn: 347228
* [SelectionDAG] simplify vector select with undef operand(s) (Sanjay Patel, 2018-11-19; 2 files, -16/+1)
    llvm-svn: 347227
* [Hexagon] make test immune to improvements in undef simplification (Sanjay Patel, 2018-11-19; 1 file, -2/+2)
    llvm-svn: 347218
* [x86] add/make tests immune to improvements in undef simplification (Sanjay Patel, 2018-11-19; 3 files, -77/+161)
    llvm-svn: 347217
* [SelectionDAG] simplify select FP with undef condition (Sanjay Patel, 2018-11-19; 1 file, -0/+1)
    llvm-svn: 347212
* [x86] add test for select FP with undef condition; NFC (Sanjay Patel, 2018-11-19; 1 file, -0/+8)
    llvm-svn: 347211
* [CodeGen] Add pass to combine interleaved loads (Martin Elshuber, 2018-11-19; 2 files, -0/+411)
    This patch defines an interleaved-load-combine pass. The pass searches for ShuffleVector instructions that represent interleaved loads. Matches are converted such that they will be captured by the InterleavedAccessPass.
    The pass extends LLVM's capabilities to use target-specific instruction selection of interleaved load patterns (e.g. ld4 on AArch64).
    Differential Revision: https://reviews.llvm.org/D52653
    llvm-svn: 347208
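    [Illustration] The shape the pass matches looks roughly like this stride-2 example (a sketch; the pass generalizes to other strides such as the ld4 case mentioned above):
        define void @deinterleave(<8 x i32>* %p, <4 x i32>* %e, <4 x i32>* %o) {
          %wide = load <8 x i32>, <8 x i32>* %p, align 16
          ; shuffles selecting the even and odd lanes of one wide load
          %even = shufflevector <8 x i32> %wide, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
          %odd  = shufflevector <8 x i32> %wide, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
          store <4 x i32> %even, <4 x i32>* %e
          store <4 x i32> %odd, <4 x i32>* %o
          ret void
        }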
* [X86] Add codegen tests for slow-shld scalar funnel shifts (Simon Pilgrim, 2018-11-19; 2 files, -198/+521)
    llvm-svn: 347195
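    [Illustration] Scalar funnel shifts are expressed with the fshl/fshr intrinsics (a sketch, not the committed tests):
        declare i64 @llvm.fshl.i64(i64, i64, i64)

        define i64 @funnel(i64 %hi, i64 %lo, i64 %amt) {
          ; conceptually concatenates %hi:%lo, shifts left by %amt, returns the
          ; high half; on slow-shld x86 targets this is expanded rather than
          ; lowered to SHLD
          %r = call i64 @llvm.fshl.i64(i64 %hi, i64 %lo, i64 %amt)
          ret i64 %r
        }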
* [ARM] Remove trunc sinks in ARM CGP (Sam Parker, 2018-11-19; 4 files, -107/+231)
    Truncs are treated as sources if they produce a value of the same type as the one we are currently trying to promote. Truncs used to be considered a sink if their operand was of the same value type. We now allow smaller types in the search, so we should search through truncs that produce a smaller value. These truncs can then be converted to an AND mask.
    This leaves sinks as being:
    - points where the value in the register is being observed, such as an icmp, switch or store.
    - points where value types have to match, such as calls and returns.
    - zexts, which are included to ease the transformation and are generally removed later on.
    During this change, it also became apparent that truncating sinks were broken: if a sink used a source, its type information had already been lost by the time the truncation happened. So I've changed the method of caching the type information.
    Differential Revision: https://reviews.llvm.org/D54515
    llvm-svn: 347191
* [MSP430] Optimize srl/sra in case of A >> (8 + N) (Anton Korobeynikov, 2018-11-19; 1 file, -0/+25)
    There are no variable-length shifts on MSP430. Therefore "eat" 8 bits of the shift via bswap & ext.
    Patch by Kristina Bessonova!
    Differential Revision: https://reviews.llvm.org/D54623
    llvm-svn: 347187
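    [Illustration] A sketch of the shift shape being optimized (assuming 16-bit integers, as on MSP430):
        define i16 @srl_8_plus_n(i16 %a, i16 %n) {
          ; >> (8 + N): the constant 8 can be consumed by a byte swap plus
          ; zero extension, leaving only the remaining N-bit shift
          %amt = add i16 %n, 8
          %r = lshr i16 %a, %amt
          ret i16 %r
        }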
* [X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering (Craig Topper, 2018-11-19; 2 files, -15/+15)
    The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So it still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate.
    llvm-svn: 347185
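    [Illustration] The two equivalent ways to splat a lane's sign bit, at the IR level (sketch of the equivalence being exploited):
        define <4 x i32> @sign_fill(<4 x i32> %x) {
          ; psrad $31 form:
          %sra = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
          ; pcmpgt-with-zero form (all-ones exactly where %x is negative):
          %neg = icmp slt <4 x i32> %x, zeroinitializer
          %cmp = sext <4 x i1> %neg to <4 x i32>
          ; %sra and %cmp compute the same value
          ret <4 x i32> %cmp
        }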
* [X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1 (Craig Topper, 2018-11-19; 7 files, -564/+569)
    Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it, since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.
    llvm-svn: 347181
* [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization (Craig Topper, 2018-11-19; 2 files, -204/+114)
    Leave just the v4i8->v4i64 and v8i8->v8i64 cases, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple-step sign extend using unpacks and shifts.
    llvm-svn: 347180
* [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions (Simon Pilgrim, 2018-11-18; 3 files, -75/+82)
    llvm-svn: 347177
* [X86] Add custom type legalization for extending v4i8/v4i16->v4i64 (Craig Topper, 2018-11-18; 1 file, -201/+114)
    Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result.
    When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result.
    Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec, producing a full v4i32 result. Create the sign bit vector as a v4i32, then split and interleave with the sign bits using a punpckldq and punpckhdq.
    llvm-svn: 347176
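    [Illustration] The extend now custom-split (sketch; the comment paraphrases the strategy above):
        define <4 x i64> @sext_v4i8(<4 x i8> %x) {
          ; pre-SSE4.1: sign extend to <4 x i32> with unpack+sra, build the
          ; sign-bit vector with a v4i32 sra, then interleave the two halves
          ; with punpckldq/punpckhdq
          %r = sext <4 x i8> %x to <4 x i64>
          ret <4 x i64> %r
        }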
* [X86] Add a 32-bit command line with only sse2 to vector-sext.ll and vector-zext.ll to show some of the scalarized load sequences without 64-bit scalar support (Craig Topper, 2018-11-18; 2 files, -2/+2073)
    Some of these sequences look pretty bad since we have to copy the sign bit from a 32-bit register to a 64-bit register to finish a sign extend.
    llvm-svn: 347175
* [X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts (Simon Pilgrim, 2018-11-18; 4 files, -33/+18)
    SSE vector shifts only use the bottom 64 bits of the shift amount vector.
    llvm-svn: 347173
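    [Illustration] A sketch using the SSE2 psll.d intrinsic as a representative example (the commit covers the SSE shift family generally):
        declare <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32>, <4 x i32>)

        define <4 x i32> @splat_shift(<4 x i32> %v, <4 x i32> %amt) {
          ; only the low 64 bits of %amt are read by the instruction, so the
          ; upper elements of %amt are never demanded and can be simplified away
          %r = call <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32> %v, <4 x i32> %amt)
          ret <4 x i32> %r
        }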
* [X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends (Craig Topper, 2018-11-18; 3 files, -94/+80)
    If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats. This patch disables combineToExtendVectorInReg when we are using widening.
    I've enabled custom legalization for v8i8->v8i64 extends under avx512f, since the type legalizer would want to create a vector_inreg with a v64i8 input type, which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346.
    I've also enabled custom legalization of v8i64 and v16i32 operations with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then split into two 128-bit pieces, extend each half to 256, and then concat the results. The custom legalization I've added instead uses a 128->256-bit vector_inreg extend that only reads the lower 64 bits for the low half of the split, then shuffles the high 64 bits to the low 64 bits and does another vector_inreg extend.
    llvm-svn: 347172
* [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction (Craig Topper, 2018-11-18; 19 files, -814/+682)
    Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.
    Reviewers: RKSimon, spatel
    Reviewed By: RKSimon
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D54671
    llvm-svn: 347171
* [DAG] add undef simplifications for select nodes (Sanjay Patel, 2018-11-18; 3 files, -6/+5)
    Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines.
    There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161, rL347164, rL347165, rL347166, rL347167.
    I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications.
    This will lead back to PR32023 someday... https://bugs.llvm.org/show_bug.cgi?id=32023
    llvm-svn: 347170
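    [Illustration] The folds mirror InstSimplify; a sketch of both cases:
        define <4 x i32> @sel_undef_arm(<4 x i1> %c, <4 x i32> %x) {
          ; an undef arm can fold to the other operand
          %r = select <4 x i1> %c, <4 x i32> %x, <4 x i32> undef
          ret <4 x i32> %r    ; -> %x
        }

        define float @sel_undef_cond(float %x, float %y) {
          ; an undef condition may pick either arm
          %r = select i1 undef, float %x, float %y
          ret float %r        ; -> %x (or %y)
        }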
* [x86] regenerate full checks; NFC (Sanjay Patel, 2018-11-18; 1 file, -5/+26)
    llvm-svn: 347167
* [SystemZ] make test immune to improvements in undef simplification (Sanjay Patel, 2018-11-18; 1 file, -2/+2)
    llvm-svn: 347166
* [Hexagon] make tests immune to improvements in undef simplification (Sanjay Patel, 2018-11-18; 3 files, -8/+8)
    llvm-svn: 347165
* [ARM] make test immune to improvements in undef simplification (Sanjay Patel, 2018-11-18; 1 file, -2/+2)
    llvm-svn: 347164
* [X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts (Simon Pilgrim, 2018-11-18; 3 files, -107/+40)
    Means we don't use the per-lane shifts as much when we can cheaply use the older splat-variable-shifts.
    llvm-svn: 347162
* [x86] make tests immune to improvements in undef handling (Sanjay Patel, 2018-11-18; 2 files, -19/+30)
    llvm-svn: 347161
* [X86][SSE] Add some generic masked gather codegen tests (Simon Pilgrim, 2018-11-18; 1 file, -0/+1156)
    llvm-svn: 347159
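    [Illustration] Generic gathers are expressed with the masked.gather intrinsic (a sketch assuming the two-type mangling of this era, not one of the committed tests):
        declare <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*>, i32, <4 x i1>, <4 x i32>)

        define <4 x i32> @gather(<4 x i32*> %ptrs, <4 x i1> %mask, <4 x i32> %passthru) {
          ; loads from each enabled lane's pointer; disabled lanes take %passthru
          %g = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> %ptrs, i32 4, <4 x i1> %mask, <4 x i32> %passthru)
          ret <4 x i32> %g
        }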
* [X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549) (Simon Pilgrim, 2018-11-18; 7 files, -198/+187)
    We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away; this requires an extra bit of logic to locally normalize undef inputs.
    llvm-svn: 347158
* [WebAssembly] Add null streamer support (Heejin Ahn, 2018-11-18; 1 file, -0/+19)
    Summary: Now `llc -filetype=null` works.
    Reviewers: eush
    Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits
    Differential Revision: https://reviews.llvm.org/D54660
    llvm-svn: 347155