| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3
But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.
We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is the reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.
Differential Revision: https://reviews.llvm.org/D62498
llvm-svn: 361822
|
|
|
|
|
|
|
|
|
|
|
| |
Forking this out of the discussion in D62498
(and assuming that will be committed later, so adding the helper function here).
The LangRef says:
"the backend should never split or merge target-legal volatile load/store instructions."
Differential Revision: https://reviews.llvm.org/D62506
llvm-svn: 361815
|
|
|
|
|
|
|
| |
The generic legalizer cannot handle this. Add an assert instead of
silently miscompiling vectors with elements smaller than 8 bits.
llvm-svn: 361814
|
|
|
|
|
|
| |
variable warnings. NFCI.
llvm-svn: 361804
|
|
|
|
|
|
| |
Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG just with a different sentinel
llvm-svn: 361734
|
|
|
|
|
|
|
|
|
|
| |
original vector
We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well.
There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well.
llvm-svn: 361716
|
|
|
|
|
|
|
|
| |
shift(build_vector(),C)
Commonly occurs in sign-extension cases
llvm-svn: 361706
|
|
|
|
|
|
|
|
|
| |
If we have a known non-nan operand, place it in the second operand
of fmin/fmax that is returned if either operand is nan.
Differential Revision: https://reviews.llvm.org/D62448
llvm-svn: 361704
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support LEA64_32r properly.
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.
This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.
One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.
Differential Revision: https://reviews.llvm.org/D61472
llvm-svn: 361691
|
|
|
|
|
|
|
|
|
|
| |
models. Add 256-bit fp xor to sandybridge zero idioms
This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate.
Differential Revision: https://reviews.llvm.org/D62360
llvm-svn: 361690
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
Differential Revision: https://reviews.llvm.org/D61887
llvm-svn: 361620
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
debug info"
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969
The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.
This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D61680
llvm-svn: 361527
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In general dynamic/local dynamic TLS models, with -fno-plt,
* x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT`
Note, on x86, if we can get rid of %ebx as the PIC register,
it may be better to use a register not preserved across function calls.
* x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT`
Reorganize the code by separating 32-bit and 64-bit.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D62106
llvm-svn: 361453
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove a bunch of isel patterns that become unnecessary.
We effectively had a second set of isel patterns that tried to use a
regular store instruction and an extract_subreg instruction. Or a masked move
and an extract_subreg. These patterns were intended to override the
matching of VEXTRACT instructions by taking advantage of the priority
of the explicit immediate 0 for the index.
This patch instaed just disables the immediate 0 matchin the VEXTRACT
patterns. This each of the component pieces of the larger patterns will
match by themselves.
This found a bug of sorts were we didn't use 128-bit store for 512->128
extract on KNL. Its unclear what the right thing here should be.
Using the vextract avoids constraining the register allocator to use
xmm0-15. But it always results in a longer encoding if the register
allocator ends up choosing xmm0-15 anyway.
llvm-svn: 361431
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions. For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.
We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.
I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.
llvm-svn: 361425
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the tiny code model is requested for a target machine that does not
support this, we get an error message (which is nice) but also this diagnostic
and request to submit a bug report:
fatal error: error in backend: Target does not support the tiny CodeModel
[Inferior 2 (process 31509) exited with code 0106]
clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
(gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74)
Target: arm-arm-none-eabi
Thread model: posix
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-9: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c
clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh
clang-9: note: diagnostic msg:
But this is not a bug, this is a feature. :-) Not only is this not a bug, this
is also pretty confusing. This patch causes just to print the fatal error and
not the diagnostic:
fatal error: error in backend: Target does not support the tiny CodeModel
Differential Revision: https://reviews.llvm.org/D62236
llvm-svn: 361370
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the
combineVectorSizedSetCCEquality() transform more conservative by
checking that the bitcast to the vector type will be cheap/free
for both operands. I'm considering it cheap if it's a constant,
a load or already a vector. I've dropped the explicit check for
f128 because it should fall out naturally (in the cases where
it'd be detrimental).
Differential Revision: https://reviews.llvm.org/D62220
llvm-svn: 361352
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CET-IBT enabled
Return-twice functions will indirectly jump after the caller's position.
So when CET-IBT is enable, we should make sure these is endbr*
instructions follow these Return-twice function caller. Like GCC does.
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D61881
llvm-svn: 361342
|
|
|
|
|
|
| |
We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again.
llvm-svn: 361288
|
|
|
|
|
|
| |
Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692
llvm-svn: 361270
|
|
|
|
|
|
|
|
|
|
|
| |
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D61943
llvm-svn: 361137
|
|
|
|
|
|
| |
for each flavor from the main switch. NFC
llvm-svn: 361108
|
|
|
|
|
|
| |
Helps to improve folding of comparisons with movmsk results.
llvm-svn: 361056
|
|
|
|
|
|
| |
Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns.
llvm-svn: 361052
|
|
|
|
|
|
| |
We can now rely on generic expansion to handle this.
llvm-svn: 361038
|
|
|
|
|
|
|
|
| |
fold.
Prep work for the removal of the remaining x86 CTTZ vector lowering.
llvm-svn: 361035
|
|
|
|
|
|
| |
Return the input value for the NOT pattern: (xor X, -1) -> X
llvm-svn: 361012
|
|
|
|
|
|
|
|
|
| |
ignore list for inlining compatibility.
These are tuning flags and won't cause any codegen issue if we inline a function
with a different value.
llvm-svn: 360992
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.
See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D62014
llvm-svn: 360983
|
|
|
|
|
|
|
|
| |
These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64
and IA-32 Architetcures Software Developer's Manual Vol. 1, Appendix B). These
are covered in clang/test, but not llvm/test.
llvm-svn: 360960
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional
jumps, which indicates the offset should be a "short jump" (8-bit immediate
offset from EIP, -128 to +127). This patch expands to all recognized Jcc
condition codes, and removes the inline restriction.
Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a
couple of Jcc are actually checked, and only inline (i.e., not when using the
integrated assembler for asm sources). A quick search through asm-containing
libraries at hand shows a pretty broad range of Jcc conditions spelled with
"short."
GAS ignores the "short" modifier, and instead uses an encoding based on the
given immediate. MS inline seems to do the same, and I suspect MASM does, too.
NASM will yield an error if presented with an out-of-range immediate value.
Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not
fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY
Differential Revision: https://reviews.llvm.org/D61990
llvm-svn: 360954
|
|
|
|
|
|
|
|
| |
This better matches the verbiage in Intel documentation, and should help avoid
confusion between these two different kinds of values, both of which are parsed
from mnemonics.
llvm-svn: 360953
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This refactors four pieces of code that create SDNodes for references to
symbols:
- normal global address lowering (LEA, MOV, etc)
- callee global address lowering (CALL)
- external symbol address lowering (LEA, MOV, etc)
- external symbol address lowering (CALL)
Each of these pieces of code need to:
- classify the reference
- lower the symbol
- emit a RIP wrapper if needed
- emit a load if needed
- add offsets if needed
I think handling them all in one place will make the code easier to
maintain in the future.
Reviewers: craig.topper, RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61690
llvm-svn: 360952
|
|
|
|
|
|
|
|
|
| |
Similarly change 0x2 to 0xA for ceil.
This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate
in patterns without masking.
llvm-svn: 360915
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch add the ISD::LROUND and ISD::LLROUND along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lround/llround generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
llvm-svn: 360889
|
|
|
|
|
|
|
|
|
|
| |
after we know for sure the match will succeed
If we're trying to match an LEA, its possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel.
Differential Revision: https://reviews.llvm.org/D61047
llvm-svn: 360823
|
|
|
|
|
|
|
|
|
|
|
|
| |
don't have any flexibility. NFC
These particular instructions only operate on 128-bit vectors and have no wider equivalents. And the
element size is always known.
One could argue that MOVSS/MOVSD could be merged, but that's probably disruptive to code in
X86ISelLowering and probably low value.
llvm-svn: 360815
|
|
|
|
|
|
|
|
|
|
|
| |
They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects set
which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think
this makes sense since we are using it as a fence.
This also seems to hide the operation from the speculative load hardening pass
so I've reverted r360511.
llvm-svn: 360747
|
|
|
|
| |
llvm-svn: 360740
|
|
|
|
|
|
|
|
| |
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360736
|
|
|
|
|
|
|
|
| |
This was the portion split off D58632 so that it could follow the redzone API cleanup. Note that I changed the offset preferred from -8 to -64. The difference should be very minor, but I thought it might help address one concern which had been previously raised.
Differential Revision: https://reviews.llvm.org/D61862
llvm-svn: 360719
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
targets (PR40758)
D61068 handled vector shifts, this patch does the same for scalars where there are similar number of pipes for shifts as bit ops - this is true almost entirely for AMD targets where the scalar ALUs are well balanced.
This combine avoids AND immediate mask which usually means we reduce encoding size.
Some tests show use of (slow, scaled) LEA instead of SHL in some cases, but thats due to particular shift immediates - shift+mask generate these just as easily.
Differential Revision: https://reviews.llvm.org/D61830
llvm-svn: 360684
|
|
|
|
|
|
|
|
| |
is initialized. NFCI.
Fixes scan-build warnings
llvm-svn: 360664
|
|
|
|
|
|
|
|
|
|
| |
targets
This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence.
Differential Revision: https://reviews.llvm.org/D61863
llvm-svn: 360649
|
|
|
|
|
|
|
|
| |
This follows the pattern of the existing isCommutativeBinOp().
x86 shows improvements from vector narrowing for the min/max opcodes.
llvm-svn: 360639
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
calling ReplaceAllUsesOfValueWith and returning SDValue().
Returning SDValue() makes the caller think that nothing happened and it will
end up executing the Expand path. This generates extra nodes that will need to
be pruned as dead code.
Returning an ISD::MERGE_VALUES will tell the caller that we'd like to make a
change and it will take care of replacing uses. This will prevent falling into
the Expand path.
llvm-svn: 360627
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LOCK_OR32mi8/OR32mi8Locked to the stack for idempotent atomic rmw and atomic fence.
These are updates to match how isel table would emit a LOCK_OR32mi8 node.
-Use i32 for the immediate zero even though only 8 bits are encoded.
-Use i16 for segment register.
-Use LOCK_OR32mi8 for idempotent atomic operations in 32-bit mode to match
64-bit mode. I'm not sure why OR32mi8Locked and LOCK_OR32mi8 both exist. The
only difference seems to be that OR32mi8Locked is marked as UnmodeledSideEffects=1.
-Emit an extra i32 result for the flags output.
I don't know if the types here really matter just noticed it was inconsistent
with normal behavior.
llvm-svn: 360619
|
|
|
|
|
|
| |
Revert r360436 as it is causing clang-x64-windows-msvc buildbot to fail.
llvm-svn: 360606
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary depth
getelementpointers for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.
Link: https://github.com/ClangBuiltLinux/linux/issues/469
Reviewers: echristo, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61560
llvm-svn: 360604
|
|
|
|
|
|
|
|
| |
Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts.
Differential Revision: https://reviews.llvm.org/D61782
llvm-svn: 360596
|