Summary:
We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously, it looks like the low unpack would get constant folded, at least in the 128-bit case, after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and the X86 custom DAG combine folded it. The same doesn't happen for the high half, so we'd load a constant and then shuffle it, while the low half would just be loaded and used by the multiply directly.
After this patch we now end up with separate constant pool entries for the low and high unpacks, with no shuffle operations.
This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend.
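As a concrete illustration, a minimal sketch (hypothetical function and constant, not from the patch) of a vXi8 multiply of the kind that is lowered via two vXi16 multiplies:
define <16 x i8> @mul_v16i8(<16 x i8> %a) {
  %r = mul <16 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
  ret <16 x i8> %r
}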
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55165
llvm-svn: 348159
that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a result vector type smaller than 128 bits is custom type legalized by promoting the result to a 128 bit vector: the elements are promoted, an assertzext/assertsext is inserted, and the value is truncated back to the original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. We should be able to get away with a single truncate instruction.
The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.
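A minimal sketch (hypothetical function name) of the narrow conversion being discussed:
define <8 x i8> @cvt_v8f32_v8i8(<8 x float> %x) {
  %r = fptosi <8 x float> %x to <8 x i8>
  ret <8 x i8> %r
}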
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54836
llvm-svn: 348158
A loaded value with multiple users compared with 0 will become a single load-and-test instruction. The load is not folded in this case (multiple users), but the compare instruction is eliminated.
This patch returns 0 cost for the icmp in these cases.
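A minimal sketch (hypothetical IR, not from the patch) of the pattern in question:
define i64 @load_and_test(i64* %ptr, i64 %b) {
  %val = load i64, i64* %ptr
  %cmp = icmp eq i64 %val, 0      ; becomes a load-and-test; cost now 0
  %sum = add i64 %val, %b         ; second user, so the load is not folded
  %res = select i1 %cmp, i64 %sum, i64 %b
  ret i64 %res
}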
Review: Ulrich Weigand
https://reviews.llvm.org/D55111
llvm-svn: 348141
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from Armv8.5-A onwards, but is optional from Armv8.0-A. This patch adds a command-line option to enable SSBS; previously it could only be enabled by selecting -march=armv8.5-a.
Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html
Reviewers: olista01, samparker, aemerson
Reviewed By: samparker
Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D54629
llvm-svn: 348137
The S_{ADD|SUB}_U64_PSEUDO instructions are decomposed into VOP3 instruction pairs:
for S_ADD_U64_PSEUDO,
V_ADD_I32_e64
V_ADDC_U32_e64
and for S_SUB_U64_PSEUDO,
V_SUB_I32_e64
V_SUBB_U32_e64
This decomposition precludes the use of SDWA to encode a constant. SDWA (sub-dword addressing) is supported on VOP1 and VOP2 instructions, but not on VOP3 instructions.
We want to fold the bit-and operand into the instruction encoding for the V_ADD_I32 instruction. This requires that we transform the VOP3 form into the VOP2 form of the instruction (_e32), e.g.:
%19:vgpr_32 = V_AND_B32_e32 255,
killed %16:vgpr_32, implicit $exec
%47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64
%26.sub0:vreg_64, %19:vgpr_32, implicit $exec
%48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64
%26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec
which then allows the SDWA encoding and becomes
%47:vgpr_32 = V_ADD_I32_sdwa
0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0,
implicit-def $vcc, implicit $exec
%48:vgpr_32 = V_ADDC_U32_e32
0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec
Differential Revision: https://reviews.llvm.org/D54882
llvm-svn: 348132
This has two positive effects. First, using a custom node prevents recombination from looping infinitely, since the output DAG is notionally a little more complex than the input one. Second, using a flag-setting instruction allows the subtraction to be folded with the related comparison more easily.
https://reviews.llvm.org/D53190
llvm-svn: 348122
This patch splits out backend features currently hidden behind architecture versions.
For example, currently the only way to activate the complex numbers extension is to target the v8.3 architecture; after this patch, the extension can be added separately.
This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html
Reviewers: DavidSpickett, olista01, t.p.northover
Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio
Differential revision: https://reviews.llvm.org/D54633
llvm-svn: 348121
Currently, variadic operands on an MCInst are assumed to be uses,
because they come after the defs. However, this is not always the case,
for example the Arm/Thumb LDM instructions write to a variable number of
registers.
This adds a property of instruction definitions which can be used to mark variadic operands as defs. This only affects MCInst, because MachineInstr already tracks use/def per operand in each instance of the instruction, and so can already represent this.
This property can then be checked in MCInstrDesc, allowing us to remove
some special cases in ARMAsmParser::isITBlockTerminator.
Differential revision: https://reviews.llvm.org/D54853
llvm-svn: 348114
In the Arm assembly parser, we first match an instruction, then call
processInstruction to possibly change it to a different encoding, to
match rules in the architecture manual which can't be expressed by the
table-generated matcher.
This adds debug printing so that this process is visible when using the
-debug option.
To support this, I've added a new overload of MCInst::dump_pretty which
takes the opcode name as a StringRef, since we don't have an InstPrinter
instance in the assembly parser. Instead, we can get the same
information directly from the MCInstrInfo.
Differential revision: https://reviews.llvm.org/D54852
llvm-svn: 348113
Differential Revision: https://reviews.llvm.org/D55112
llvm-svn: 348110
Summary:
There are 4 instructions with an inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm: LFS, LFD, STFS, and STFD. These four instructions should set ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 348109
is accessed as got-indirect or not.
In theory, we should let the PPC target determine how to lower the TOC entry for globals.
PPCTargetLowering requires this query to do some optimization for TOC_Entry.
Differential Revision: https://reviews.llvm.org/D54925
llvm-svn: 348108
bitcast and a store of an iX scalar.
llvm-svn: 348104
llvm-svn: 348103
Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations.
llvm-svn: 348087
avoid a store/load to/from the stack.
Widen the input to a 128 bit vector by padding with undef elements, then use a movdq2q to move from an xmm register to an mmx register.
llvm-svn: 348086
similar to what's done when the destination is f64.
The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and a non-truncating store on pre-avx512 targets. Once that happens, the stack store/load pair will be combined away, leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal, so it doesn't get folded away.
By custom legalizing it we can avoid this churn and maybe produce better code.
llvm-svn: 348085
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.
This makes it so that we check for that condition before mapping instructions.
This allows us to outline more, since we don't pessimise as many instructions.
Also update some tests, since we outline more.
llvm-svn: 348081
Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant, as this is mostly used by the division by constant optimization. Pre-sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles, an xor, and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy, so that's obviously better.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55138
llvm-svn: 348079
The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can keep the NOT on the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNORs and then lowered them. Now we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit value.
Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also add a test that uses the opposite identity, such that (~x ^ y) on the scalar unit (or the vector unit for gfx906) can generate XNOR. This already worked, but there was no test for it.
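A minimal sketch (hypothetical function, not from the patch) of scalar IR exposing the identity:
define i64 @scalar_xnor(i64 %a, i64 %b) {
  %xor = xor i64 %a, %b
  %not = xor i64 %xor, -1    ; ~(a ^ b), equivalent to (~a ^ b)
  ret i64 %not
}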
Differential: https://reviews.llvm.org/D55071
llvm-svn: 348075
As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>,
the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions.
SRAW assumed that (sext_inreg foo, i32) could only be produced when sign-extending an i32. However, it can be produced by input such as:
define i64 @tricky_ashr(i64 %a, i64 %b) {
%1 = shl i64 %a, 32
%2 = ashr i64 %1, 32
%3 = ashr i64 %2, %b
ret i64 %3
}
It's important not to select sraw in the above case, because sraw only uses the lower 5 bits of the shift amount, while a shift amount of 32-63 would be valid.
Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be
produced when zero-extending a value that was originally i32 in LLVM IR. This
is obviously incorrect.
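By analogy with tricky_ashr above, a hedged sketch (hypothetical function, not from the original message) of input that defeats that assumption:
define i64 @tricky_lshr(i64 %a, i64 %b) {
  %1 = and i64 %a, 4294967295
  %2 = lshr i64 %1, %b
  ret i64 %2
}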
This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and
adds test cases that would demonstrate a miscompile if the incorrect patterns
were re-added.
llvm-svn: 348067
Addition to D34555 - override VTs computation with ComputePTXValueVTs
for struct fields.
Author: Denys Zariaiev <denys.zariaiev@gmail.com>
Differential Revision: https://reviews.llvm.org/D55144
llvm-svn: 348057
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.
If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.
There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.
Change-Id: I170e6816323beb1348677b358c9d380865cd1a19
Reviewers: arsenm, alex-t, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53283
llvm-svn: 348050
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:
1. VirtReg2Value is generated lazily; there were some cases where
a lookup was performed before all relevant virtual registers were
created, leading to an out-of-sync mapping. Those cases were:
- Complex code to lower formal arguments that generated CopyFromReg
nodes from live-in registers (fixed by never querying the mapping
for live-in registers).
- Code that generates CopyToReg for formal arguments that are used
outside the entry basic block (fixed by never querying the
mapping for Register nodes, which don't need the divergence info
anyway).
2. For complex values that are lowered to a sequence of registers,
all registers must be reflected in the VirtReg2Value mapping.
I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.
There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.
Reviewers: alex-t, arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54340
llvm-svn: 348049
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.
This makes the code far easier to understand, for one thing. It also offers some small code size improvements. It's fairly rare to see a class of outlined functions that doesn't fall entirely into one variant (on CTMark, anyway), but it does happen from time to time.
Also update the test to show the added functionality.
llvm-svn: 348036
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.
Differential Revision: https://reviews.llvm.org/D54988
llvm-svn: 348035
instead of extracting and concatenating low and high half registers.
This reduces the number of shuffle operations needed. The splitting strategy requires the shuffle unit for both the extraction and the extension, while with the unpack strategy the unpacks accomplish the split and the extend in one operation.
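A minimal sketch (hypothetical function; the types are my guess from the description, since the subject line is truncated) of the kind of extend affected:
define <16 x i16> @sext_v16i8(<16 x i8> %a) {
  %r = sext <16 x i8> %a to <16 x i16>
  ret <16 x i16> %r
}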
llvm-svn: 348019
operation when avx512bw/avx512vl is enabled.
This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register, and masking. But it's fewer instructions and more consistent with previous ISAs. It probably opens up more combine opportunities, as one of the test cases demonstrates.
llvm-svn: 348018
Differential Revision: https://reviews.llvm.org/D55093
llvm-svn: 348014
Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.
Differential revision: https://reviews.llvm.org/D53762
llvm-svn: 347993
This patch adds CSR instruction aliases for the cases where the instruction takes an immediate operand but the alias doesn't have the i suffix. This is necessary for gas/gcc compatibility.
gas doesn't do a similar conversion for fsflags or fsrm, so this should be
complete.
Differential Revision: https://reviews.llvm.org/D55008
Patch by Luís Marques.
llvm-svn: 347991
This patch adds support for UNIMP in both 32- and 16-bit forms. The 32-bit
form can be seen as a variant of the ECALL/EBREAK/etc. family of instructions.
The 16-bit form is just all zeroes, which isn't a valid RISC-V instruction,
but still follows the 16-bit instruction form (i.e. bits 0-1 != 11).
Until recently unimp was undocumented and supported just by binutils, which
printed unimp for either the 16 or 32-bit form. Both forms are now documented
<https://github.com/riscv/riscv-asm-manual/pull/20> and binutils now supports
c.unimp <https://sourceware.org/ml/binutils-cvs/2018-11/msg00179.html>.
Differential Revision: https://reviews.llvm.org/D54316
Patch by Luís Marques.
llvm-svn: 347988
for RISC-V
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend
operands when it is able to do so. For some targets this is more expensive
than a sign-extension, which is also a valid choice. Introduce the
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy ==
MVT::i64, as it can be performed using a single instruction.
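A minimal sketch (hypothetical function) of a comparison whose i32 operands must be promoted on RV64, where sign-extension is the cheaper choice:
define i1 @ult(i32 %a, i32 %b) {
  %c = icmp ult i32 %a, %b
  ret i1 %c
}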
Differential Revision: https://reviews.llvm.org/D52978
llvm-svn: 347977
As discussed in the RFC
<http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit
RISC-V has i64 as the only legal integer type. This patch introduces patterns
to support codegen of the new instructions
introduced in RV64I: addw, addiw, subw, sllw, slliw, srlw, srliw, sraw, sraiw, ld, sd.
Custom selection code is needed for srliw as SimplifyDemandedBits will remove
lower bits from the mask, meaning the obvious pattern won't work:
def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32),
(SRLIW GPR:$rs1, uimm5:$shamt)>;
This is sufficient to compile and execute all of the GCC torture suite for
RV64I other than those files using frameaddr or returnaddr intrinsics
(LegalizeDAG doesn't know how to promote the operands - a future patch
addresses this).
When promoting i32 sltu/sltiu operands, it would be more efficient to use
sign-extension rather than zero-extension for RV64. A future patch adds a hook
to allow this.
Differential Revision: https://reviews.llvm.org/D52977
llvm-svn: 347973
shuffle.
llvm-svn: 347967
get after DAG combine cleans it up.
Previously we emitted a punpcklbw/punpckhbw to move the byte elements into the upper half of 16 bit elements, then shifted right by 8 to zero the upper bits. After DAG combine we end up with punpcklbw/punpckhbw into the lower bits with zeros in the upper bits and no shifts, so just emit that directly.
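A minimal sketch (hypothetical function and types, inferred from the description since the subject line is truncated) of a zero extend of this shape:
define <16 x i16> @zext_v16i8(<16 x i8> %a) {
  %r = zext <16 x i8> %a to <16 x i16>
  ret <16 x i16> %r
}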
llvm-svn: 347966
Don't expand SDIV with an immediate that is a power of 2 if we optimise for minimum code size. For example:
sdiv i32 %1, 4
gets expanded to a sequence of 3 instructions, but this is suboptimal for minimum code size, so instead we just generate a MOV and an SDIV if integer division is supported.
Differential Revision: https://reviews.llvm.org/D54546
llvm-svn: 347965
Three minor changes to these extra costs:
* For ICmp instructions, instead of always adding 2 for extending each operand, this is only done if that operand is neither a load nor an immediate.
* The operand extension costs for divides were removed, because we now already use a high cost (20) for the divide.
* The extra costs for lshr/ashr were removed, as they did not seem useful.
Review: Ulrich Weigand
https://reviews.llvm.org/D55053
llvm-svn: 347961
| |
We had an EVT variable capturing the result of getSimpleValueType, which returns an MVT. Another place used EVT where it could have been MVT. And an 'int' that should be 'unsigned'.
llvm-svn: 347959
Summary:
Suppressed warnings in release builds due to a variable used only in an assert statement.
Subscribers: llvm-commits, eraman, mgorny
Differential Revision: https://reviews.llvm.org/D55100
llvm-svn: 347939
| |
prefetches""
Summary:
This reverts commit d8517b96dfbd42e6a8db33c50d1fa1e58e63fbb9.
Fix: correct the use of DenseMap.
Reviewers: davidxl, hans, wmi
Reviewed By: wmi
Subscribers: mgorny, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D55088
llvm-svn: 347938
Summary:
Expands, for vector types, all of the integer operations that are expanded for scalars, because they are not supported at all by WebAssembly.
This CL has no tests because such tests would really be testing the
target-independent expansion, but I'm happy to add tests if reviewers
think it would be helpful.
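A minimal sketch, assuming (my assumption; the message doesn't list the operations) that division is among those expanded:
define <4 x i32> @sdiv_v4i32(<4 x i32> %a, <4 x i32> %b) {
  %r = sdiv <4 x i32> %a, %b
  ret <4 x i32> %r
}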
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55010
llvm-svn: 347923
| |
Scattered ARM relocations for Mach-O only have 24 bits available to encode the offset. This is not checked but just truncated, and can result in corrupt binaries after linking because the relocations are applied to the wrong offset. This patch will check and error out in those situations instead of emitting a wrong relocation.
Patch by: Sander Bogaert (dzn)
Differential revision: https://reviews.llvm.org/D54776
llvm-svn: 347922
Utilise a similar ('late') lowering strategy to D47882. The changes to
AtomicExpandPass allow this strategy to be utilised by other targets which
implement shouldExpandAtomicCmpXchgInIR.
All cmpxchg operations are currently lowered as 'strong', and the failure ordering is ignored. This is conservative but correct.
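A minimal sketch (hypothetical function, typed pointers) of an IR-level cmpxchg that such a target now expands late:
define i32 @cas(i32* %ptr, i32 %cmp, i32 %new) {
  %pair = cmpxchg i32* %ptr, i32 %cmp, i32 %new seq_cst seq_cst
  %old = extractvalue { i32, i1 } %pair, 0
  ret i32 %old
}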
Differential Revision: https://reviews.llvm.org/D48131
llvm-svn: 347914
custom type legalization operation instead.
This seems to produce the same results on the tests we have.
llvm-svn: 347912
Also revert fix r347876
One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.
llvm-svn: 347911
It makes more sense to order FI-based memops in descending order when
the stack grows down. This allows offsets to stay "consecutive" and allows easier pattern matching.
llvm-svn: 347906
LegalizeDAG to LegalizeVectorOps
I believe we should be legalizing these with the rest of the vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG a chance to run on the custom code before constant build_vectors are lowered in LegalizeDAG.
I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.
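For reference, a minimal sketch (hypothetical function) of the IR shape that typically DAG-combines into a MULHU node, i.e. the high half of a widened multiply:
define <8 x i16> @mulhu_v8i16(<8 x i16> %a, <8 x i16> %b) {
  %za = zext <8 x i16> %a to <8 x i32>
  %zb = zext <8 x i16> %b to <8 x i32>
  %m = mul <8 x i32> %za, %zb
  %hi = lshr <8 x i32> %m, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
  %r = trunc <8 x i32> %hi to <8 x i16>
  ret <8 x i16> %r
}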
Differential Revision: https://reviews.llvm.org/D54276
llvm-svn: 347902
splat on narrow vectors to avoid scalarization
This is another patch for -x86-experimental-vector-widening. This pre-widens narrow divisions by constants so that we can get past the legal type check in the generic DAG combiner. Otherwise we end up scalarizing.
I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splats: increase the element size, or widen the constant vector by padding with 1s?
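A minimal sketch (hypothetical function and types) of a narrow splatted divide that previously scalarized:
define <4 x i16> @udiv_by_7(<4 x i16> %x) {
  %r = udiv <4 x i16> %x, <i16 7, i16 7, i16 7, i16 7>
  ret <4 x i16> %r
}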
Differential Revision: https://reviews.llvm.org/D54919
llvm-svn: 347898
This patch adds support for the S_ANDN2 and S_ORN2 32-bit and 64-bit instructions, and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way the more complex scalar instructions are lowered to vector instructions: they are first broken down into sequences of simpler scalar instructions, which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply the inversion to one input rather than to the output of the XOR, as the result is equivalent and may allow leaving the NOT instruction on the scalar unit.
New tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly).
Differential: https://reviews.llvm.org/D54714
llvm-svn: 347877