vector to v8i1 pre-legalize.
The CONCAT_VECTORS will be lowered to INSERT_SUBVECTOR later. In the modified cases this seems to be enough to trick a later DAG combine into running in a different order that allows the ANDs to be removed.
I'll admit this is a bit of a hack that happens to work, but using CONCAT_VECTORS is more consistent with other legalization code anyway.
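As a sketch of the shape this takes (the helper name is mine, and this assumes the usual SelectionDAG headers; it is not the patch's exact code):

    // Widen a v2i1/v4i1 mask to v8i1 by concatenating undef pieces after it,
    // instead of inserting it into an undef v8i1.
    static SDValue widenMaskToV8i1(SelectionDAG &DAG, const SDLoc &DL,
                                   SDValue Mask) {
      MVT VT = Mask.getSimpleValueType();      // e.g. v2i1 or v4i1
      unsigned NumPieces = 8 / VT.getVectorNumElements();
      SmallVector<SDValue, 4> Pieces(NumPieces, DAG.getUNDEF(VT));
      Pieces[0] = Mask;                        // real mask bits in the low lanes
      return DAG.getNode(ISD::CONCAT_VECTORS, DL, MVT::v8i1, Pieces);
    }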
llvm-svn: 321611
|
As it has a scalar source we don't treat it as a target shuffle, so it needs special handling.
llvm-svn: 321610
|
Don't combine buildvector(binop(),binop(),binop(),binop()) -> binop(buildvector(), buildvector()) if it's a splat - keep the binop scalar and just splat the result to avoid large vector constants.
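A minimal sketch of the splat path (illustrative helper; assumes the usual SelectionDAG headers):

    // For build_vector(binop(x,y), binop(x,y), ...): do the binop once in
    // scalar form, then broadcast the single result.
    static SDValue splatScalarBinop(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                                    unsigned Opc, SDValue X, SDValue Y) {
      SDValue Scalar = DAG.getNode(Opc, DL, VT.getVectorElementType(), X, Y);
      return DAG.getSplatBuildVector(VT, DL, Scalar); // splat the one result
    }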
llvm-svn: 321607
|
legalization sees the i4 and changes it to a load/store.
Same for v2i1 and i2.
llvm-svn: 321602
|
legalization sees the i4 and changes it to a load/store.
Same for i2 and v2i1.
llvm-svn: 321601
|
don't have DQI.
We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codegen in some cases and I'd like to kill off some of the load patterns.
llvm-svn: 321598
|
This is better handled by a DAG combine if it's not already being done. No lit tests fail from the removal of these patterns.
llvm-svn: 321597
|
I don't think anything would actually expect the other bits to be zero.
llvm-svn: 321596
|
llvm-svn: 321595
|
Use getMemBasePlusOffset and calculate proper pointer info and alignment for the second store.
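A minimal sketch of that shape (names are mine, not the patch's; assumes the usual SelectionDAG headers):

    // Split a store in two, giving the second half its own pointer info and
    // an alignment reduced to what the offset actually guarantees.
    static SDValue splitStore(SelectionDAG &DAG, const SDLoc &DL,
                              StoreSDNode *St, SDValue Lo, SDValue Hi) {
      unsigned HalfOffset = Lo.getValueType().getStoreSize();
      SDValue Chain = St->getChain(), Ptr = St->getBasePtr();
      SDValue LoSt = DAG.getStore(Chain, DL, Lo, Ptr, St->getPointerInfo(),
                                  St->getAlignment());
      SDValue HiPtr = DAG.getMemBasePlusOffset(Ptr, HalfOffset, DL);
      SDValue HiSt =
          DAG.getStore(Chain, DL, Hi, HiPtr,
                       St->getPointerInfo().getWithOffset(HalfOffset),
                       MinAlign(St->getAlignment(), HalfOffset));
      return DAG.getNode(ISD::TokenFactor, DL, MVT::Other, LoSt, HiSt);
    }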
llvm-svn: 321594
|
natively.
We should only be creating natively supported kshifts now.
llvm-svn: 321577
|
This allows us to remove some isel patterns.
This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI.
llvm-svn: 321576
|
As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles.
This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles.
By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks.
We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc.
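To see which masks the cheaper two-instruction chains can reach, it helps to compose the masks by hand. A standalone model (not lowering code) of PSHUFLW followed by PSHUFD:

    #include <array>

    // Compose a PSHUFLW word mask (permutes words 0-3, leaves 4-7 alone) with
    // a PSHUFD dword mask to get the overall v8i16 shuffle it implements.
    std::array<int, 8> composePshuflwPshufd(const std::array<int, 4> &LW,
                                            const std::array<int, 4> &D) {
      std::array<int, 8> Mid, Out;
      for (int i = 0; i != 4; ++i) Mid[i] = LW[i]; // PSHUFLW on the low half
      for (int i = 4; i != 8; ++i) Mid[i] = i;     // high words pass through
      for (int i = 0; i != 4; ++i) {               // PSHUFD moves word pairs
        Out[2 * i] = Mid[2 * D[i]];
        Out[2 * i + 1] = Mid[2 * D[i] + 1];
      }
      return Out;
    }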
Differential Revision: https://reviews.llvm.org/D38318
llvm-svn: 321553
|
and punpckldq.
Differential Revision: https://reviews.llvm.org/D41595
llvm-svn: 321549
|
narrower extend.
Previously we used an extend from v8i1 to v8i32/v8i64, then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend.
This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG.
If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary.
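Sketching the new order as DAG-building code (illustrative; given a SelectionDAG DAG, an SDLoc DL, and a v8i1 Mask, and assuming VLX so the narrow types are usable):

    // Extract the low v4i1 first, then do the narrower extend, instead of
    // sign-extending all of v8i1 to v8i32/v8i64 and extracting afterwards.
    SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v4i1, Mask,
                             DAG.getIntPtrConstant(0, DL));
    SDValue Ext = DAG.getNode(ISD::SIGN_EXTEND, DL, MVT::v4i32, Lo);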
llvm-svn: 321538
|
llvm-svn: 321537
|
- Use MinAlign instead of std::min.
- Use SelectionDAG::getMemBasePlusOffset.
- Apply offset to the pointer info for the second load/store created.
llvm-svn: 321536
|
LowerZERO_EXTEND_Mask/LowerSIGN_EXTEND_Mask.
The truncate will be lowered to X86ISD::VTRUNC later.
llvm-svn: 321534
|
The custom lowering already widens the result type to 512-bits if VLX isn't supported.
llvm-svn: 321533
|
The exception handler thunk needs to reference the LSDA of the parent
function, which won't be emitted if it's available_externally.
Fixes PR35736. ThinLTO ends up producing available_externally functions
that use _CxxFrameHandler3.
llvm-svn: 321532
|
If there are 17 or more leading zeros to the v4i32 elements, then we can use PMADD for the integer multiply when PMULLD is unavailable or slow.
The 17 bits need to be zero because PMADDWD performs a v8i16 signed-mul-extend + pairwise-add: the upper 16 bits must be zero so we're adding a zero pair, and the 17th bit must be zero so we don't incorrectly sign extend.
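A standalone scalar model of why the trick is exact under that precondition:

    #include <cassert>
    #include <cstdint>

    uint32_t mulViaPmaddwd(uint32_t A, uint32_t B) {
      assert((A >> 15) == 0 && (B >> 15) == 0); // i.e. 17+ leading zero bits
      int16_t ALo = int16_t(A), AHi = int16_t(A >> 16); // AHi == 0
      int16_t BLo = int16_t(B), BHi = int16_t(B >> 16); // BHi == 0
      // PMADDWD: sign-extend each i16 product to i32 and add adjacent pairs.
      // The high pair contributes 0, and the low pair can neither overflow
      // nor go negative, so this equals the full 32-bit product A*B.
      return int32_t(ALo) * int32_t(BLo) + int32_t(AHi) * int32_t(BHi);
    }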
Differential Revision: https://reviews.llvm.org/D41484
llvm-svn: 321516
|
Per Table 1-1 in the October 2017 edition of Intel® Architecture Instruction Set Extensions and Future Features.
llvm-svn: 321501
|
My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues.
So just do it in LowerMUL where we can catch more cases.
llvm-svn: 321496
|
v8i32 is legal on AVX1, but AVX1 doesn't have pmuludq for it.
llvm-svn: 321490
|
Returning SDValue() means nothing changed, SDValue(N,0) means there was a change but the worklist management was taken care of.
I don't know if this has a real effect other than making sure the combine counter in the DAG combiner gets updated, but it is the correct thing to do.
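The convention in a sketch (the transform shown is a placeholder, not a real combine):

    SDValue combineExample(SDNode *N, SelectionDAG &DAG) {
      SDValue Op0 = N->getOperand(0), Op1 = N->getOperand(1);
      // Hypothetical canonicalization: move a constant to the RHS in place.
      if (!isa<ConstantSDNode>(Op0) || isa<ConstantSDNode>(Op1))
        return SDValue();                  // nothing changed
      DAG.UpdateNodeOperands(N, Op1, Op0); // N mutated in place
      return SDValue(N, 0);                // changed; worklist already handled
    }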
llvm-svn: 321463
|
Differential Revision: https://reviews.llvm.org/D41579
llvm-svn: 321459
|
llvm-svn: 321452
|
SSE/AVX counterparts.
llvm-svn: 321451
|
llvm-svn: 321450
|
upper bits are all sign bits or zeros.
Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ.
This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove it's unnecessary. PMULLQ is 3 uops that take 4 cycles each, while pmuldq/pmuludq is only one 4-cycle uop.
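The check presumably reduces to known-bits queries of this shape (a sketch over vXi64 operands LHS/RHS, not the patch's exact code):

    // PMULUDQ is usable if the upper 32 bits of both operands are zero;
    // PMULDQ if both operands are sign-extended 32-bit values (33+ sign bits).
    APInt UpperBits = APInt::getHighBitsSet(64, 32);
    bool UsePMULUDQ = DAG.MaskedValueIsZero(LHS, UpperBits) &&
                      DAG.MaskedValueIsZero(RHS, UpperBits);
    bool UsePMULDQ = DAG.ComputeNumSignBits(LHS) > 32 &&
                     DAG.ComputeNumSignBits(RHS) > 32;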
llvm-svn: 321437
|
llvm-svn: 321433
|
llvm-svn: 321432
|
(PR21160, PR34080, PR34454).
Match regular x87 memory fold instructions with load/sideeffects tags, to prevent the schedulers from re-ordering them across the fnstcw/fldcw sequences for truncating stores while they are still pseudo instructions during the stack conversion pass.
llvm-svn: 321424
|
Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq. So we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32.
llvm-svn: 321420
|
consistent with the other AVX512 ISAs.
llvm-svn: 321416
|
RHS not just all zeros/ones.
llvm-svn: 321415
|
This can help AVX-512 code where mask types are legal allowing us to remove extends and truncates to/from mask types.
llvm-svn: 321408
|
truncate on N1.
Later in the code we explicitly bypass the truncate, so we should be checking its type to make sure that it's safe.
llvm-svn: 321407
|
Immediately after it is created we check if it's equal to another EVT. Then we inconsistently use one or the other variable in the code below.
Instead, do the equality check directly on the getValueType result and remove the variable. Use the original VT variable throughout the remaining code.
llvm-svn: 321406
|
llvm-svn: 321405
|
This will be used to help tidy up existing pseudos that we've added scheduling info to.
llvm-svn: 321401
|
Apparently we don't have tests for this, which I didn't realize before. I'll try to fix that, but wanted to fix the obvious bug first.
llvm-svn: 321399
|
llvm-svn: 321398
|
to get the type of the operand.
getOperand returns an SDValue that contains the node and the result number. There is no guarantee that the result number is 0. By using the -> operator we are calling SDNode::getValueType rather than SDValue::getValueType. This requires supplying a result number, and we shouldn't assume it was 0.
I don't have a test case. Just noticed while cleaning up some other code and saw that it occurred in other places.
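The difference in a two-line sketch:

    SDValue Op = N->getOperand(0);
    EVT A = Op->getValueType(0); // SDNode::getValueType - assumes result 0
    EVT B = Op.getValueType();   // SDValue::getValueType - uses Op's result number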
llvm-svn: 321397
|
Re-land r321234. It had to be reverted because it broke the shared
library build. The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).
Original commit message:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321375
|
non-constant index just use either a 128-bit type or the vXi8 type with the correct number of elements.
Despite what the comment said, there isn't better codegen for 512-bit vectors. The 128/256/512-bit implementations just store to memory and load an element. There's no advantage to doing that with a larger size. In fact, in many cases it causes a stack realignment and generates worse code.
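The store-and-reload shape being described, as an illustrative sketch (simplified; the real lowering also clamps the index so it can't read past the slot):

    static SDValue extractEltVarIdx(SelectionDAG &DAG, const SDLoc &DL,
                                    SDValue Vec, SDValue Idx) {
      EVT VecVT = Vec.getValueType();
      EVT EltVT = VecVT.getVectorElementType();
      // Spill the whole vector to a stack slot...
      SDValue Slot = DAG.CreateStackTemporary(VecVT);
      SDValue Ch = DAG.getStore(DAG.getEntryNode(), DL, Vec, Slot,
                                MachinePointerInfo());
      // ...then load back only the selected element.
      EVT PtrVT = Slot.getValueType();
      SDValue Off =
          DAG.getNode(ISD::MUL, DL, PtrVT, DAG.getZExtOrTrunc(Idx, DL, PtrVT),
                      DAG.getConstant(EltVT.getStoreSize(), DL, PtrVT));
      SDValue EltPtr = DAG.getNode(ISD::ADD, DL, PtrVT, Slot, Off);
      return DAG.getLoad(EltVT, DL, Ch, EltPtr, MachinePointerInfo());
    }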
llvm-svn: 321369
|
Fix some inconsistent new line behavior and only print the FrameIndex when the address mode is a FrameIndexBase addressing mode.
llvm-svn: 321368
|
llvm-svn: 321340
|
llvm-svn: 321336
|
for prefetch instructions.
Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well.
The prfchw flag now implies that at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled.
Similarly, the prefetchwt1 feature implies availability of prefetchw and the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And it's assumed that if we have levels for the write hint, we would have levels for the non-write hint, which is why we enable the sse prefetch instructions.
I believe this behavior is consistent with gcc. I've updated the prefetch.ll to test all of these combinations.
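At the source level the hint mapping described above looks like this (standalone example; the write-hint forms need the corresponding feature flags):

    #include <xmmintrin.h>

    void warm(const char *P) {
      _mm_prefetch(P, _MM_HINT_T0);  // prefetcht0 (sse form)
      _mm_prefetch(P, _MM_HINT_NTA); // prefetchnta
      _mm_prefetch(P, _MM_HINT_ET0); // prefetchw (write hint)
      _mm_prefetch(P, _MM_HINT_ET1); // prefetchwt1 (write hint, level 1)
    }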
llvm-svn: 321335