| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
hardwiring to MVT::i8
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, craig.topper, rnk
Reviewed By: rnk
Subscribers: rnk, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69034
|
|
|
|
|
|
|
|
|
|
| |
Teach combineVectorSizedSetCCEquality() to handle arbitrary memcmp
expansions but do not change any default policy for now.
This also fixes a bug in the memcmp expansion itself when large
displacements are needed.
https://reviews.llvm.org/D69507
|
|
|
|
|
|
|
|
|
|
| |
stores w/StoreSDNode) by default
Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default. As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/a atomic marker on the MMO. The later parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other.
This has been fairly heavily fuzzed, and I examined diffs in a reasonable large corpus of assembly by hand, so I'm reasonable sure this is correct for the common case. Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations. This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics. If there are problems with this patch, the most likely area will be legalization.
Differential Revision: https://reviews.llvm.org/D69219
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
known zero.
This catches some cases. There are probably ways to improve this.
I tried doing it as a combine on the setcc, but that broke
some cases involving flag reuse in place of test.
I renamed the isX86CCUnsigned to isX86CCSigned and flipped its
polarity to make it consistent with the similar functions for
ISD::SETCC. This avoids calling EQ/NE as being signed or unsigned.
Fixes PR43823.
Differential Revision: https://reviews.llvm.org/D69499
|
| |
|
|
|
|
|
| |
(I'm currently trying to debug a strange error message I get when
pushing to github, despite the pushes being successful).
|
|
|
|
|
|
|
|
|
|
| |
setcc), undef,))), C) into (bitcast (vXi1 (concat_vectors (vYi1 setcc), zero,)))
The legalization of v2i1->i2 or v4i1->i4 bitcasts followed by a setcc can create an and after the bitcast. If we're lucky enough that the input to the bitcast is a concat_vectors where the first operand is a setcc that can natively 0 all the upper bits of ak-register, then we should replace the other operands of the concat_vectors with zero in order to remove the AND.
With the AND removed we might be able to use a kortest on the result.
Differential Revision: https://reviews.llvm.org/D69205
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
|
|
|
|
|
|
|
| |
Detect scalar ISD::ZERO_EXTEND generated by memcmp lowering and convert
it to ISD::INSERT_SUBVECTOR.
https://reviews.llvm.org/D69464
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LowerPATCHABLE_TYPED_EVENT_CALL
Summary:
The PATCHABLE_EVENT_CALL uses i32 in the intrinsic. This
results in the register allocator picking a 32-bit register. We
need to use the 64-bit register when forming the MOV64rr
instructions. Otherwise we print illegal assembly in the text
output.
I think prior to this it was impossible for SrcReg to be equal
to DstReg so the NOP code was not reachable.
While there use Register instead of unsigned.
Also add a FIXME for what looks like a bug.
Reviewers: dberris
Reviewed By: dberris
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69365
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pairwise.
Summary:
We don't pattern match pairwise shuffles in SelectionDAG. So we
should only return the optimized costs if its not a pairwise
shuffle.
I think SLP vectorizer gives priority to non pairwise shuffle if
the cost is the same. And the look up for reduction intrinsics
passes false for the pairwise flag. So this probably has no real
effect today.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69083
|
|
|
|
|
|
|
|
|
|
|
| |
PTEST and especially the MOVMSK instructions are slow on Knights Landing
or later. As a bonus, this patch increases instruction parallelism by
emitting:
KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0
Instead of:
KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0
https://reviews.llvm.org/D69157
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69307
|
|
|
|
| |
Without this, we can create a PSADBW node that isn't legal.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.
Tags: #llvm, #clang, #lldb
Differential Revision: https://reviews.llvm.org/D66795
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The default implementation of the describeLoadedValue() hook uses the
MoveImm property to determine if an instruction moves an immediate. If
an instruction has that property the function returns the second
operand, assuming that that is the immediate value the instruction
moves. As far as I can tell, the MoveImm property does not imply that
the second operand is the immediate value, nor that any other operand
necessarily holds the immediate value; it just means that the
instruction moves some immediate value.
One example where the second operand is not the immediate is SystemZ's
LZER instruction, which moves a zero immediate implicitly: $f0S = LZER.
That case triggered an out-of-bound assertion when getting the operand.
I have added a test case for that instruction.
Another example is ARM's MVN instruction, which holds the logical
bitwise NOT'd value of the immediate that is moved. For the following
reproducer:
extern void foo(int);
int main() { foo(-11); }
an incorrect call site value would be emitted:
$ clang --target=arm foo.c -O1 -g -Xclang -femit-debug-entry-values \
-c -o - | ./build/bin/llvm-dwarfdump - | \
grep -A2 call_site_parameter
0x00000058: DW_TAG_GNU_call_site_parameter
DW_AT_location (DW_OP_reg0 R0)
DW_AT_GNU_call_site_value (DW_OP_lit10)
Another example is the A2_combineii instruction on Hexagon which moves
two immediates to a super-register: $d0 = A2_combineii 20, 10.
Perhaps these are rare exceptions, and most MoveImm instructions hold
the immediate in the second operand, but in my opinion the default
implementation of the hook should only describe values that it can, by
some contract, guarantee are safe to describe, rather than leaving it up
to the targets to override the exceptions, as that can silently result
in incorrect call site values.
This patch adds X86's relevant move immediate instructions to the
target's hook implementation, so this commit should be a NFC for that
target. We need to do the same for ARM and AArch64.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: vsk
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D69109
|
|
|
|
|
|
| |
Stop hardwiring classes
llvm-svn: 375470
|
|
|
|
|
|
| |
reduction combine
llvm-svn: 375463
|
|
|
|
|
|
| |
This doesn't need to be just for bitops, but the ops do need to be fully associative.
llvm-svn: 375445
|
|
|
|
|
|
|
|
|
|
| |
emitting X86ISD::FHADD in LowerUINT_TO_FP_i64.
This was a regression from r375341.
Fixes PR43729.
llvm-svn: 375381
|
|
|
|
|
|
|
|
| |
'Zeroable' known undef/zero bits. NFCI.
Renamed 'resolveTargetShuffleAndZeroables' to 'resolveTargetShuffleFromZeroables' to match.
llvm-svn: 375348
|
|
|
|
|
|
| |
tryToWidenViaDuplication lowers using the shuffle_v8i16(unpack_v16i8(shuffle_v8i16(x),shuffle_v8i16(x))) pattern, but the unpack only needs the even/odd 16i8 args if the original v16i8 shuffle mask references the even/odd elements - which isn't true for many extension style shuffles.
llvm-svn: 375342
|
|
|
|
|
|
|
|
| |
We were always generating a single source HADDPD, but really we should only do this if shouldUseHorizontalOp says its a good idea.
Differential Revision: https://reviews.llvm.org/D69175
llvm-svn: 375341
|
|
|
|
|
|
| |
Now X86ISelLowering doesn't depend on many IR analyses.
llvm-svn: 375320
|
|
|
|
|
|
| |
Only forward declarations are needed here. Follow-on to r375311.
llvm-svn: 375319
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Works on this dependency chain:
ArrayRef.h ->
Hashing.h -> --CUT--
Host.h ->
StringMap.h / StringRef.h
ArrayRef is very popular, but Host.h is rarely needed. Move the
IsBigEndianHost constant to SwapByteOrder.h. Clients of that header are
more likely to need it.
llvm-svn: 375316
|
|
|
|
|
|
|
|
|
|
| |
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.
Noticed with -ftime-trace.
llvm-svn: 375311
|
|
|
|
|
|
|
|
|
|
| |
Previously, the parser checked for a '%' prefix to indicate a register.
In Intel syntax mode, LLVM does not print a '%' prefix on registers, so
LLVM could not parse its own assembly output. Instead, require that
register numbers be integer literals, or at least start with an integer
literal, which is consistent with .cfi_* directive register parsing.
llvm-svn: 375287
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The default implementation of isIncomingArgumentHandler could lead
to generating incorrect code.
Make it a pure virtual method, so that targets know they have to
override it to produce correct code.
NFC
Differential Revision: https://reviews.llvm.org/D69187
llvm-svn: 375277
|
|
|
|
|
|
| |
NFCI.
llvm-svn: 375253
|
|
|
|
|
|
| |
https://reviews.llvm.org/D69111
llvm-svn: 375197
|
|
|
|
|
|
|
|
|
|
|
| |
Add generic DAG combine for extending masked loads.
Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.
Differential Revision: https://reviews.llvm.org/D68337
llvm-svn: 375085
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68993
llvm-svn: 375084
|
|
|
|
|
|
|
|
|
|
| |
from the resolveTargetShuffleAndZeroables call.
Exposes an issue in getFauxShuffleMask where the OR(SHUFFLE,SHUFFLE) decode should always resolve zero/undef elements.
Part of the fix for PR43024 where ideally we shouldn't call resolveTargetShuffleAndZeroables for Depth == 0
llvm-svn: 374928
|
|
|
|
| |
llvm-svn: 374922
|
|
|
|
|
|
| |
NFCI.
llvm-svn: 374878
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VBROADCAST when trying to share broadcasts.
The only things VBROADCAST_LOAD uses is an address and a chain
node. It has no vector inputs.
So if its a user of the source of another broadcast that could
only mean one of two things. The other broadcast is broadcasting
the address of the broadcast_load. Or the source is a load and
the use we're seeing is the chain result from that load. Neither
of these cases make sense to combine here.
This issue was reported post-commit r373871. Test case has not
been reduced yet.
llvm-svn: 374862
|
|
|
|
|
|
|
|
|
|
|
|
| |
to vgatherpf/vscatterpf.
We need to encode bit 4 into the EVEX.V' bit. We do this right
for regular gather/scatter which use either MRMSrcMem or MRMDestMem
formats. The prefetches use MRM*m formats.
Fixes an issue recently added to PR36202.
llvm-svn: 374849
|
|
|
|
|
|
|
|
| |
Add specific scalar costs for CTLZ instructions, we can't discriminate between CTLZ and CTLZ_ZERO_UNDEF so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets we might still be underestimating it.
For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs.
llvm-svn: 374786
|
|
|
|
|
|
|
|
| |
Add specific scalar costs for ctpop instructions, these are based on the llvm-mca's SLM throughput numbers (the oldest model we have).
For targets supporting POPCNT, we provide overrides that assume 1cy costs.
llvm-svn: 374775
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces the following changes to the btver2 scheduling model:
- The number of micro opcodes for YMM loads and stores is now 2 (it was
incorrectly set to 1 for both aligned and misaligned loads/stores).
- Increased the number of AGU resource cycles for YMM loads and stores
to 2cy (instead of 1cy).
- Removed JFPU01 and JFPX from the list of resources consumed by pure
float/vector loads (no MMX).
I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe. Those
are dispatched to the FPU but not really issues on JFPU01.
Differential Revision: https://reviews.llvm.org/D68871
llvm-svn: 374765
|
|
|
|
|
|
|
|
|
| |
Add an extra parameter so the backend can take the alignment into
consideration.
Differential Revision: https://reviews.llvm.org/D68400
llvm-svn: 374763
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
from the subtract directly during isel.
This prevents isel from emitting a TEST instruction that
optimizeCompareInstr will need to remove later.
In some of the modified tests, the SUB gets duplicated due to
the flags being needed in two places and being clobbered in
between. optimizeCompareInstr was able to optimize away the TEST
that was using the result of one of them, but optimizeCompareInstr
doesn't know to turn SUB into CMP after removing the TEST. It
only knows how to turn SUB into CMP if the result was already
dead.
With this change the TEST never exists, so optimizeCompareInstr
doesn't have to remove it. Then it can just turn the SUB into
CMP immediately.
Fixes PR43649.
llvm-svn: 374755
|
|
|
|
|
|
|
|
| |
well as KnownZero.
We were already controlling whether the KnownZero elements were being written to the target mask, this extends it to the KnownUndef elements as well so we can prevent the target shuffle mask being manipulated at all.
llvm-svn: 374732
|
|
|
|
|
|
|
|
|
|
| |
This enables use of the saturating truncate instructions when the
result type is less than 128 bits. It also enables the use of
saturating truncate instructions on KNL when the input is less
than 512 bits. We can do this by widening the input and then
extracting the result.
llvm-svn: 374731
|
|
|
|
|
|
| |
getTargetShuffleInputs with KnownUndef/Zero results.
llvm-svn: 374725
|