| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changes in X86.td:
I set features of Intel processors in incremental form: IVB = SNB + X HSW = IVB + X ..
I added Skylake client processor and defined it's features
FeatureADX was missing on KNL
Added some new features to appropriate processors SMAP, IFMA, PREFETCHWT1, VMFUNC and others
Differential Revision: http://reviews.llvm.org/D16357
llvm-svn: 258659
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16137
llvm-svn: 258657
|
|
|
|
|
|
| |
Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types.
llvm-svn: 258645
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously, we would just output "foo = bar" in the assembly, and then
ptxas would choke. Now we die before emitting any invalid code.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, jhen, tra
Differential Revision: http://reviews.llvm.org/D16490
llvm-svn: 258638
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Before:
.func (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv(
)
{
After:
.func (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv()
{
Reviewers: bkramer
Subscribers: llvm-commits, tra, jhen, echristo, jholewinski
Differential Revision: http://reviews.llvm.org/D16512
llvm-svn: 258637
|
|
|
|
| |
llvm-svn: 258626
|
|
|
|
| |
llvm-svn: 258624
|
|
|
|
|
|
| |
If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies.
llvm-svn: 258622
|
|
|
|
|
|
|
|
|
| |
Seems like some compilers still give unused variable warnings for
bool var = ...;
(void)var;
so I have to inline the variable.
llvm-svn: 258619
|
|
|
|
| |
llvm-svn: 258618
|
|
|
|
| |
llvm-svn: 258615
|
|
|
|
|
|
| |
Replace tests with lrp with basic IR expansion
llvm-svn: 258612
|
|
|
|
| |
llvm-svn: 258608
|
|
|
|
|
|
| |
This has side effects.
llvm-svn: 258607
|
|
|
|
|
|
|
| |
This is a leftover from AMDIL that doesn't do anything
and doesn't belong here.
llvm-svn: 258606
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the conditions necessary to produce ccmp sequences were only
checked in recursive calls to emitConjunctionDisjunctionTree() after
some of the earlier expressions were already built. Move all checks over
to isConjunctionDisjunctionTree() so they are all checked before we
start emitting instructions.
Also rename some variable to better reflect their usage.
llvm-svn: 258605
|
|
|
|
|
|
|
|
|
| |
isConjunctionDisjunctionTree()
This function will exhibit exponential runtime (2**n) so we should
rather use a lower limit.
llvm-svn: 258604
|
|
|
|
| |
llvm-svn: 258603
|
|
|
|
|
|
|
|
|
| |
Previously it failed to add NumArgRegs to the offset and so clobbered an
already-used register. Now just start the numbering after the arg regs
and don't duplicate the add. Test coverage for this coming shortly with
the implementation of byval.
llvm-svn: 258597
|
|
|
|
| |
llvm-svn: 258567
|
|
|
|
| |
llvm-svn: 258558
|
|
|
|
|
|
|
|
|
|
|
| |
The intrinsic target prefix should match the target name
as it appears in the triple.
This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatability for now.
llvm-svn: 258557
|
|
|
|
|
|
|
|
| |
The promote alloca pass didn't handle these intrinsics and crashed.
These intrinsics should accept any address space, but for now just
erase them to avoid breaking.
llvm-svn: 258537
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Fixes PR26186.
Reviewers: grosser, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D16479
llvm-svn: 258536
|
|
|
|
|
|
|
| |
Now that both callsites are identical, we can simplify the
prototype and make it easier to reason about the 2-CC case.
llvm-svn: 258534
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current behavior is incorrect, as the two CCs returned by
changeFPCCToAArch64CC, intended to be OR'ed, are instead used
in an AND ccmp chain.
Consider:
define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
%cc1 = fcmp one float %a, %b
%cc2 = fcmp olt float %c, %d
%and = and i1 %cc1, %cc2
%r = select i1 %and, i32 %e, i32 %f
ret i32 %r
}
Assuming (%a < %b) and (%c < %d); we used to do:
fcmp s0, s1 # nzcv <- 1000
orr w8, wzr, #0x1 # w8 <- 1
csel w9, w8, wzr, mi # w9 <- 1
csel w8, w8, w9, gt # w8 <- 1
fcmp s2, s3 # nzcv <- 1000
cset w9, mi # w9 <- 1
tst w8, w9 # (w8 & w9) == 1, so: nzcv <- 0000
csel w0, w0, w1, ne # w0 <- w0
We now do:
fcmp s2, s3 # nzcv <- 1000
fccmp s0, s1, #0, mi # mi, so: nzcv <- 1000
fccmp s0, s1, #8, le # !le, so: nzcv <- 1000
csel w0, w0, w1, pl # !pl, so: w0 <- w1
In other words, we transformed:
(c < d) && ((a < b) || (a > b))
into:
(c < d) && (a u>= b) && (a u<= b)
whereas, per De Morgan's, we wanted:
(c < d) && !((a u>= b) && (a u<= b))
Note that this problem doesn't occur in the test-suite.
changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt.
We can't represent that in the fccmp chain; it can't express
arbitrary OR sequences, as one comment explains:
In general we can create code for arbitrary "... (and (and A B) C)"
sequences. We can also implement some "or" expressions, because
"(or A B)" is equivalent to "not (and (not A) (not B))" and we can
implement some negation operations. [...] However there is no way
to negate the result of a partial sequence.
Instead, introduce changeFPCCToANDAArch64CC, which produces the
conjunct cond codes:
- (a one b)
== ((a olt b) || (a ogt b))
== ((a ord b) && (a une b))
- (a ueq b)
== ((a uno b) || (a oeq b))
== ((a ule b) && (a uge b))
Note that, at first, one might think that, when PushNegate is true,
we should use the disjunct CCs, in effect doing:
(a || b)
= !(!a && !(b))
= !(!a && !(b1 || b2)) <- changeFPCCToAArch64CC(b, b1, b2)
= !(!a && !b1 && !b2)
However, we can take advantage of the fact that the CC is already
negated, which lets us avoid special-casing PushNegate and doing
the simpler to reason about:
(a || b)
= !(!a && (!b))
= !(!a && (b1 && b2)) <- changeFPCCToANDAArch64CC(!b, b1, b2)
= !(!a && b1 && b2)
This makes both emitConditionalCompare cases behave identically,
and produces correct ccmp sequences for the 2-CC fcmps.
llvm-svn: 258533
|
|
|
|
|
|
|
|
|
|
|
|
| |
We verify that the op tree is eligible for CCMP emission in
isConjunctionDisjunctionTree, but it's also possible that
emitConjunctionDisjunctionTree fails later.
The initial check is useful, as it avoids building nodes
that will get discarded.
Still, make sure that inconsistencies don't happen with
an assert.
llvm-svn: 258532
|
|
|
|
|
|
| |
Patch by Tobias Edler Von Koch.
llvm-svn: 258527
|
|
|
|
|
|
| |
These ones aren't directly emitted by mesa and inserted by a pass.
llvm-svn: 258523
|
|
|
|
| |
llvm-svn: 258522
|
|
|
|
|
|
|
| |
These aren't supposed to be used outside of the backend,
so there aren't any users to worry about.
llvm-svn: 258516
|
|
|
|
| |
llvm-svn: 258515
|
|
|
|
|
|
| |
I don't think this was ever used.
llvm-svn: 258514
|
|
|
|
|
|
|
| |
Mesa doesn't use this, and this is pattern matched already
from fsub x, (ffloor x)
llvm-svn: 258513
|
|
|
|
|
|
| |
I got a vanity URL, and moved the github waterfall repo.
llvm-svn: 258484
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
SETCC with f16 vectors has OperationAction set to Expand but still gets
lowered to FCM* intrinsics based on its result type. This patch skips
lowering of VSETCC if the operand is an f16 vector.
v4 and v8 tests included.
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15361
llvm-svn: 258471
|
|
|
|
|
|
|
|
|
|
|
|
| |
Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector.
Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles.
Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU.
Differential Revision: http://reviews.llvm.org/D14901
llvm-svn: 258440
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
And use it in PPCLoopDataPrefetch.cpp.
@hfinkel, please let me know if your preference would be to preserve the
ppc-loop-prefetch-cache-line option in order to be able to override the
value of TTI::getCacheLineSize for PPC.
Reviewers: hfinkel
Subscribers: hulx2000, mcrosier, mssimpso, hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D16306
llvm-svn: 258419
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is now the same as the behaviour of the GNU assembler. This was done
as it is required in order to build the Linux kernel with the integrated
assembler enabled.
Reviewers: dsanders, vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D13594
llvm-svn: 258400
|
|
|
|
|
|
|
|
| |
Implemented intrinsic for the follow instructions (reg move) : VMOVDQU8/16, VMOVDQA32/64, VMOVAPS/PD.
Differential Revision: http://reviews.llvm.org/D16316
llvm-svn: 258398
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16398
llvm-svn: 258397
|
|
|
|
| |
llvm-svn: 258395
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's an overloading of the "movsd" and "cmpsd" instructions, e.g. movsd can be either "Move Data from String to String" or "Move or Merge Scalar Double-Precision Floating-Point Value".
The former should produce warnings when parsing a memory operand that is not ESI/EDI, but the latter should not.
Fixed the code to produce warnings only after making sure we're dealing with the first case.
Expanded the tests of the produced warnings + fixed RUN line of the test so that it would check both stdout and stderr
Differential Revision: http://reviews.llvm.org/D16359
llvm-svn: 258393
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently the SI scheduler can be selected via command line option,
but it turned out it would be better if it was selectable via a Target Attribute.
This patch adds "si-scheduler" attribute to the backend.
Reviewers: tstellarAMD, echristo
Subscribers: echristo, arsenm
Differential Revision: http://reviews.llvm.org/D16192
llvm-svn: 258386
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
While working on uniform branching, I've hit a few cases where we emit
i1 SETCC operations.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16233
llvm-svn: 258352
|
|
|
|
| |
llvm-svn: 258350
|
|
|
|
| |
llvm-svn: 258348
|
|
|
|
| |
llvm-svn: 258347
|
|
|
|
| |
llvm-svn: 258346
|
|
|
|
| |
llvm-svn: 258343
|