path: root/llvm/test/CodeGen
Commit message / Author / Age / Files / Lines
* [Hexagon] make test immune to improvements in undef simplificationSanjay Patel2018-11-191-2/+2
| | | | llvm-svn: 347218
* [x86] add/make tests immune to improvements in undef simplificationSanjay Patel2018-11-193-77/+161
| | | | llvm-svn: 347217
* [SelectionDAG] simplify select FP with undef conditionSanjay Patel2018-11-191-0/+1
| | | | llvm-svn: 347212
* [x86] add test for select FP with undef condition; NFCSanjay Patel2018-11-191-0/+8
| | | | llvm-svn: 347211
* [CodeGen] Add pass to combine interleaved loads.Martin Elshuber2018-11-192-0/+411
| | | | | | | | | | | | | | This patch defines an interleaved-load-combine pass. The pass searches for ShuffleVector instructions that represent interleaved loads. Matches are converted such that they will be captured by the InterleavedAccessPass. The pass extends LLVM's capabilities to use target-specific instruction selection of interleaved load patterns (e.g.: ld4 on AArch64 architectures). Differential Revision: https://reviews.llvm.org/D52653 llvm-svn: 347208
* [X86] Add codegen tests for slow-shld scalar funnel shiftsSimon Pilgrim2018-11-192-198/+521
| | | | llvm-svn: 347195
* [ARM] Remove trunc sinks in ARM CGPSam Parker2018-11-194-107/+231
| | | | | | | | | | | | | | | | | | | | | | | | | | Truncs are treated as sources if they produce a value of the same type as the one we are currently trying to promote. Truncs used to be considered as a sink if their operand was the same value type. We now allow smaller types in the search, so we should search through truncs that produce a smaller value. These truncs can then be converted to an AND mask. This leaves sinks as being: - points where the value in the register is being observed, such as an icmp, switch or store. - points where value types have to match, such as calls and returns. - zexts are included to ease the transformation and are generally removed later on. During this change, it also became apparent that truncating sinks was broken: if a sink used a source, its type information had already been lost by the time the truncation happens. So I've changed the method of caching the type information. Differential Revision: https://reviews.llvm.org/D54515 llvm-svn: 347191
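As a rough illustration of the transform described above (a sketch in Python, not the ARM CodeGenPrepare code itself): once surrounding computation is promoted to a wider type, a trunc that produces an iN value is equivalent to masking with the low N bits.

```python
def trunc_as_and_mask(x, dest_bits):
    # A trunc producing an iN value keeps only the low N bits of the
    # wider value, so it can be rewritten as an AND with (1 << N) - 1.
    return x & ((1 << dest_bits) - 1)

print(trunc_as_and_mask(0x12345, 16))  # only the low 16 bits survive
```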
* [MSP430] Optimize srl/sra in case of A >> (8 + N)Anton Korobeynikov2018-11-191-0/+25
| | | | | | | | | | | There are no variable-length shifts on MSP430. Therefore "eat" 8 bits of shift via bswap & ext. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54623 llvm-svn: 347187
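A small Python model of the trick above (an assumed simplification, not the MSP430 backend code): for a 16-bit arithmetic shift by 8+N, first "eat" 8 positions by taking the high byte (what a swpb byte swap exposes) and sign-extending it, then shift by only N.

```python
def sra_8_plus_n(a, n):
    # 16-bit arithmetic shift right by (8+n): take the high byte
    # (swpb-style), sign-extend it (sxt), then do the remaining n shifts.
    hi = (a >> 8) & 0xFF
    if hi & 0x80:
        hi -= 0x100              # sign-extend the byte
    return hi >> n               # Python's >> on a signed int is arithmetic

def ref_sra(a, n):
    # reference: plain signed 16-bit shift by the full amount
    s = a - 0x10000 if a & 0x8000 else a
    return s >> (8 + n)
```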
* [X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering.Craig Topper2018-11-192-15/+15
| | | | | | | | The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So it still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate. llvm-svn: 347185
* [X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1Craig Topper2018-11-197-564/+569
| | | | | | | | Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter. llvm-svn: 347181
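The two commits above rely on psrad-by-31 and pcmpgtd-against-zero producing the same per-lane sign mask. A tiny Python model of that equivalence (assumed instruction semantics, not backend code): both yield an all-ones lane exactly when the i32 element is negative.

```python
def psrad_31(x):
    # arithmetic shift right by 31: smears the sign bit across the lane
    return 0xFFFFFFFF if x & 0x80000000 else 0x00000000

def pcmpgtd_zero(x):
    # pcmpgtd with a zeroed register: all-ones where 0 > x (signed)
    signed = x - 0x100000000 if x & 0x80000000 else x
    return 0xFFFFFFFF if 0 > signed else 0x00000000
```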
* [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization.Craig Topper2018-11-192-204/+114
| | | | | | | | Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts. llvm-svn: 347180
* [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions.Simon Pilgrim2018-11-183-75/+82
| | | | | | llvm-svn: 347177
* [X86] Add custom type legalization for extending v4i8/v4i16->v4i64.Craig Topper2018-11-181-201/+114
| | | | | | | | Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result. When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32, then split and interleave with the sign bits using a punpckldq and punpckhdq. llvm-svn: 347176
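A Python sketch of the interleave step described above, simplified to a source already widened to v4i32 (the commit itself handles v4i8/v4i16 sources): build the sign-bit vector with an sra by 31, then interleave each value word with its sign word (what punpckldq/punpckhdq do) to form the i64 lanes.

```python
def sext_v4i32_to_v4i64(vals):
    # v4i32 sra by 31 yields an all-ones word per negative lane;
    # interleaving value and sign words produces the sign-extended i64s.
    signs = [0xFFFFFFFF if v & 0x80000000 else 0 for v in vals]
    return [(signs[i] << 32) | vals[i] for i in range(4)]
```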
* [X86] Add a 32-bit command line with only sse2 to vector-sext.ll and vector-sext.ll to show some of the scalarized load sequences without 64-bit scalar support.Craig Topper2018-11-182-2/+2073
| | | | | | | | Some of these sequences look pretty bad since we have to copy the sign bit from a 32 bit register to a 64 bit register to finish a sign extend. llvm-svn: 347175
* [X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts.Simon Pilgrim2018-11-184-33/+18
| | | | | | SSE vector shifts only use the bottom 64-bits of the shift amount vector. llvm-svn: 347173
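A minimal model (assumed semantics, using pslld as the example) of the fact noted above that makes this demanded-elts simplification valid: the upper elements of the shift-amount vector are never read, so they can be simplified away.

```python
def pslld(vec4, amt_vec):
    # Only the bottom 64 bits of the amount vector (modeled here as
    # amt_vec[0]) are used; every lane shifts by that same count.
    count = amt_vec[0]
    if count >= 32:
        return [0, 0, 0, 0]      # oversized counts zero the result
    return [(x << count) & 0xFFFFFFFF for x in vec4]
```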
* [X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends.Craig Topper2018-11-183-94/+80
| | | | | | | | | | | | | | If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats. This patch disables combineToExtendVectorInReg when we are using widening. I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346. I've also enabled custom legalization of v8i64 and v16i32 operations with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do a split into two 128-bit pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend. llvm-svn: 347172
* [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction.Craig Topper2018-11-1819-814/+682
| | | | | | | | | | | | | | Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54671 llvm-svn: 347171
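A Python model of why the and-with-255 trick above works (assumed PACKUSWB semantics, not the lowering code): packuswb packs words to bytes with unsigned saturation, and pre-masking every word to 0..255 guarantees the saturation never changes a value, so the pack becomes a plain truncate.

```python
def packuswb(lo_words, hi_words):
    # PACKUSWB: pack i16 lanes to i8 with unsigned saturation
    sat = lambda w: 0 if w < 0 else (255 if w > 255 else w)
    return [sat(w) for w in lo_words] + [sat(w) for w in hi_words]

def trunc_words_to_bytes(words16):
    masked = [w & 255 for w in words16]      # the 'and' with 255 makes
    return packuswb(masked[:8], masked[8:])  # the saturation a no-op
```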
* [DAG] add undef simplifications for select nodesSanjay Patel2018-11-183-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines. There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161 rL347164 rL347165 rL347166 rL347167 I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications. This will lead back to PR32023 someday... https://bugs.llvm.org/show_bug.cgi?id=32023 llvm-svn: 347170
* [x86] regenerate full checks; NFCSanjay Patel2018-11-181-5/+26
| | | | llvm-svn: 347167
* [SystemZ] make test immune to improvements in undef simplificationSanjay Patel2018-11-181-2/+2
| | | | llvm-svn: 347166
* [Hexagon] make tests immune to improvements in undef simplificationSanjay Patel2018-11-183-8/+8
| | | | llvm-svn: 347165
* [ARM] make test immune to improvements in undef simplificationSanjay Patel2018-11-181-2/+2
| | | | llvm-svn: 347164
* [X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts.Simon Pilgrim2018-11-183-107/+40
| | | | | | This means we don't use the per-lane shifts as much when we can cheaply use the older splat-variable-shifts. llvm-svn: 347162
* [x86] make tests immune to improvements in undef handlingSanjay Patel2018-11-182-19/+30
| | | | llvm-svn: 347161
* [X86][SSE] Add some generic masked gather codegen testsSimon Pilgrim2018-11-181-0/+1156
| | | | llvm-svn: 347159
* [X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549)Simon Pilgrim2018-11-187-198/+187
| | | | | | | | We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away; this requires an extra bit of logic to locally normalize undef inputs. llvm-svn: 347158
* [WebAssembly] Add null streamer supportHeejin Ahn2018-11-181-0/+19
| | | | | | | | | | | | Summary: Now `llc -filetype=null` works. Reviewers: eush Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54660 llvm-svn: 347155
* [X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead.Craig Topper2018-11-182-48/+42
| | | | | | | | The zero extend will require two stages of unpacks to implement. So it's better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack. llvm-svn: 347149
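A sketch of why the shrunken multiply above is safe (an assumed model, not the backend code): for i8 inputs the 16-bit product computed by pmullw is already exact (255 * 255 = 65025 < 2**16), so a single zero-unpack to i32 afterwards reproduces the full product.

```python
def mul_i8_inputs_via_pmullw(a8, b8):
    # zero-extend i8 operands only to i16 and multiply there (pmullw);
    # the low 16 bits are already the exact product for i8 inputs
    prod16 = [(x * y) & 0xFFFF for x, y in zip(a8, b8)]
    # one unpack-with-zeros widens i16 -> i32 without changing values
    return [p & 0xFFFFFFFF for p in prod16]
```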
* [X86] Add support for matching PACKUSWB from a v64i8 shuffle.Craig Topper2018-11-171-8/+3
| | | | llvm-svn: 347143
* [X86] Add test case to show missed opportunity to use PACKUSWB in v64i8 shuffle lowering.Craig Topper2018-11-171-0/+47
| | | | | | llvm-svn: 347142
* [X86][SSE] Add shuffle demanded elts test case for PR39549Simon Pilgrim2018-11-171-0/+22
| | | | llvm-svn: 347139
* [X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256.Craig Topper2018-11-172-184/+59
| | | | | | llvm-svn: 347131
* [X86] Add test cases to show incorrect use of a 512 bit vector in v32i8 multiply lowering with prefer-vector-width=256.Craig Topper2018-11-172-0/+216
| | | | | | | | On the min-legal-vector-width test this actually causes some of the v32i16 operations we emitted to be scalarized. llvm-svn: 347130
* Moved dag-combine-select-undef.ll into amdgpu. NFC.Stanislav Mekhanoshin2018-11-172-19/+20
| | | | | | Tests really need the target arch to be specified. llvm-svn: 347115
* Fixed test after r347110Stanislav Mekhanoshin2018-11-161-7/+2
| | | | | | | | Comments in llc outputs are printed differently on different platforms, some with '#', some with '##'. Removed non-essential part of the checks. llvm-svn: 347112
* DAG combiner: fold (select, C, X, undef) -> XStanislav Mekhanoshin2018-11-1612-415/+248
| | | | | | Differential Revision: https://reviews.llvm.org/D54646 llvm-svn: 347110
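A hypothetical Python model of the fold above (LLVM's real implementation is C++ in the DAG combiner; names here are illustrative): with an undef arm or condition, the select collapses to the remaining operand.

```python
UNDEF = object()  # stand-in for an undef operand

def simplify_select(cond, tval, fval):
    if fval is UNDEF:          # (select C, X, undef) -> X
        return tval
    if tval is UNDEF:          # (select C, undef, X) -> X
        return fval
    if cond is UNDEF:          # undef condition: either arm is a valid
        return tval            # result; pick the true arm consistently
    return ("select", cond, tval, fval)   # no fold applies
```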
* [X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization.Craig Topper2018-11-164-401/+97
| | | | | | | | This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur. There are some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement. llvm-svn: 347105
* [WebAssembly] Cleanup unused declares in test code. NFC.Sam Clegg2018-11-162-4/+8
| | | | | | | | | In one case you should probably be using it; in the other it looks like it was redundant. Differential Revision: https://reviews.llvm.org/D54644 llvm-svn: 347098
* [PowerPC][NFC] Add tests for vector fp <-> int conversionsNemanja Ivanovic2018-11-1616-0/+14768
| | | | | | | | | This NFC patch just adds test cases for conversions that currently require scalarization of vectors. An upcoming patch will change the legalization for these, and it is more suitable to show the differences in code gen on the review rather than just the new code gen. llvm-svn: 347090
* AArch64: Emit a call frame instruction for the shadow call stack register.Peter Collingbourne2018-11-161-0/+1
| | | | | | | | | | When unwinding past a function that uses shadow call stack, we must subtract 8 from the value of the x18 register. This patch causes us to emit a call frame instruction that causes that to happen. Differential Revision: https://reviews.llvm.org/D54609 llvm-svn: 347089
* [MSP430] Add RTLIB::[SRL/SRA/SHL]_I32 lowering to EABI lib callsAnton Korobeynikov2018-11-161-0/+35
| | | | | | | | Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54626 llvm-svn: 347080
* [X86] Disable Condbr_merge passRong Xu2018-11-163-15/+14
| | | | | | | Disable Condbr_merge pass for now due to PR39658. Will reenable the pass once the bug is fixed. llvm-svn: 347079
* Revert "[PowerPC] Make no-PIC default to match GCC - LLVM"Stefan Pintilie2018-11-1680-515/+580
| | | | | | This reverts commit r347069 llvm-svn: 347076
* [WebAssembly] Default to static reloc modelSam Clegg2018-11-162-5/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D54637 llvm-svn: 347073
* [PowerPC] Make no-PIC default to match GCC - LLVMStefan Pintilie2018-11-1680-580/+515
| | | | | | | | Set -fno-PIC as the default option. Differential Revision: https://reviews.llvm.org/D53383 llvm-svn: 347069
* [X86] Add codegen tests for scalar funnel shiftsSimon Pilgrim2018-11-162-0/+532
| | | | llvm-svn: 347066
* [x86] regenerate complete checks for test; NFCSanjay Patel2018-11-161-30/+49
| | | | llvm-svn: 347051
* [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X`Roman Lebedev2018-11-161-230/+176
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: As discussed in the previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!), we can fold the `Z` into `control`, and let the `BEXTR` do this too. We could just insert those 8 bits of shift amount into control, but it is better to instead zero-extend them, and 'or' them in place. We can only do this for `lshr`, not `ashr`, because we do not know that the mask covers only the bits of `Y`, and not any of the sign-extended bits. The obvious question is, is this actually legal to do? I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`: * `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.` * `A START value exceeding the operand size will not extract any bits from the second source operand.` * `Only bit positions up to (OperandSize -1) of the first source operand are extracted.` * `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.` * `The destination register is cleared if no bits are extracted.` FIXME: if we can do this, I wonder if we should prefer `BEXTR` over `BZHI` in such cases. Reviewers: RKSimon, craig.topper, spatel, andreadb Reviewed By: RKSimon, craig.topper, andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54095 llvm-svn: 347048
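A Python model of BEXTR built directly from the manual quotes above (a sketch of the instruction's semantics, not the ISel code): bits 7:0 of the control operand give START, bits 15:8 give LENGTH, out-of-range STARTs extract nothing, and all higher-order result bits are zeroed.

```python
def bextr(src, control, operand_size=64):
    # control[7:0] = START, control[15:8] = LENGTH
    start = control & 0xFF
    length = (control >> 8) & 0xFF
    if start >= operand_size:
        return 0                                 # nothing extracted: dest cleared
    src &= (1 << operand_size) - 1               # only bits up to OperandSize-1
    return (src >> start) & ((1 << length) - 1)  # bits >= LENGTH zeroed
```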
* [RISCV] Constant materialisation for RV64IAlex Bradbury2018-11-161-7/+204
| | | | | | | | | | | | | | | | | | | | | | | This commit introduces support for materialising 64-bit constants for RV64I, making use of the RISCVMatInt::generateInstSeq helper in order to share logic for immediate materialisation with the MC layer (where it's used for the li pseudoinstruction). test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit constant tests. It would be preferable if anyext constant returns were sign rather than zero extended (see PR39092). This patch simply adds an explicit signext to the returns in imm.ll. Further optimisations for constant materialisation are possible, most notably for mask-like values which can be generated by loading -1 and shifting right. A future patch will standardise on the C++ codepath for immediate selection on RV32 as well as RV64, and then add further such optimisations to RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for codegen and li expansion. Differential Revision: https://reviews.llvm.org/D52962 llvm-svn: 347042
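The mask-like-value optimisation mentioned above can be sketched in Python (a model of the arithmetic, not the RISCVMatInt code): materialise -1, which sets all 64 bits, then logically shift right to leave the desired run of low ones.

```python
def mask_via_srli(num_ones):
    # "li rd, -1" leaves all 64 bits set; a logical shift right by
    # (64 - num_ones) leaves exactly num_ones low bits set.
    all_ones = (1 << 64) - 1
    return all_ones >> (64 - num_ones)
```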
* [MSP430] Add more tests for ABI and calling conventionAnton Korobeynikov2018-11-165-2/+222
| | | | | | | | Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54582 llvm-svn: 347040