legacy intrinsics and a select.
llvm-svn: 286089
This handles the last builtin function call for which we generated code
that differed from Microsoft's ABI. Rather than
generating a call to `__pow{d,s}i2`, we now promote the integer exponent to a
float or double and invoke `powf` or `pow` instead.
Addresses PR30825!
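A minimal Python sketch contrasting the two lowerings, assuming a `__pow*i2`-style helper computes an integer power by repeated squaring (the `powi` below is illustrative, not compiler-rt's source). Python floats are doubles, so only the `pow` path is modelled; `powi` and `powi_via_pow` are hypothetical names.

```python
import math


def powi(x: float, n: int) -> float:
    """Integer power by binary exponentiation, in the spirit of a
    runtime powi helper (illustrative only)."""
    result = 1.0
    negative = n < 0
    n = abs(n)
    while n:
        if n & 1:  # multiply in the current power of x for each set bit of n
            result *= x
        x *= x
        n >>= 1
    return 1.0 / result if negative else result


def powi_via_pow(x: float, n: int) -> float:
    """The lowering described above: promote the integer exponent to a
    floating-point value and call the C library's pow()."""
    return math.pow(x, float(n))


if __name__ == "__main__":
    for base, exp in [(2.0, 10), (3.0, -2), (1.5, 3)]:
        print(base, exp, powi(base, exp), powi_via_pow(base, exp))
```

The two can differ in the last bit for some inputs, because repeated squaring rounds after every multiplication while `pow` typically rounds once.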
llvm-svn: 286082
llvm-svn: 286075
In preparation for demandedelts support
llvm-svn: 286074
upgrade them to a select and the older AVX2 intrinsic.
llvm-svn: 286073
Instead upgrade them to a select and the older SSE/AVX2 intrinsic.
llvm-svn: 286072
llvm-svn: 286071
in xmm. Instead upgrade them to a select and the older SSE/AVX2 intrinsic.
llvm-svn: 286070
already exists in the avx512f test file.
llvm-svn: 286069
In preparation for demandedelts support
llvm-svn: 286068
for these test cases will be improved for AVX512 in a future commit.
llvm-svn: 286063
addr:)) -> VCVTPS2PDZ128rm
llvm-svn: 286059
-show-mc-encoding to show where we aren't using EVEX instructions.
llvm-svn: 286058
instruction when available.
llvm-svn: 286057
they can use EVEX instructions when available.
llvm-svn: 286056
see when VEX or EVEX encoded instructions are being emitted. Make sure the tests all have an avx2 command line and an skx command line.
llvm-svn: 286055
valid int64_t to the set.
Summary:
SmallSetVector uses DenseSet, but that means we need to reserve some
values for the empty and tombstone keys.
It seems to me we should have a general way to let us store full-range
ints inside of DenseSets, and furthermore that we probably shouldn't
silently let you add ints into DenseSets without explicitly promising
that they're in range. But that's a battle for another day; for now,
just fix this code, since we currently do something Very Bad when
compiling ffmpeg.
Fixes PR30914.
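A toy Python model of the failure mode, assuming (as LLVM's `DenseMapInfo` for 64-bit integers does, to the best of my knowledge) that the largest and smallest int64 values are reserved as the empty and tombstone keys. `ToyDenseSet` and `add_if_storable` are hypothetical; the commit fixes the calling code, and the guard here only illustrates that idea.

```python
# Hypothetical DenseSet-style open-addressing set for int64 keys.
# Assumption: these two values are reserved as bucket markers.
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
EMPTY_KEY = INT64_MAX
TOMBSTONE_KEY = INT64_MIN  # unused here because this toy never erases


class ToyDenseSet:
    def __init__(self, capacity: int = 8) -> None:
        self.buckets = [EMPTY_KEY] * capacity
        self.size = 0

    def _probe(self, key: int) -> int:
        """Linear probing: return the bucket holding key, or the first empty one."""
        i = hash(key) % len(self.buckets)
        while self.buckets[i] != key and self.buckets[i] != EMPTY_KEY:
            i = (i + 1) % len(self.buckets)
        return i

    def insert(self, key: int) -> bool:
        # Storing a sentinel would make an occupied bucket look empty or deleted,
        # silently corrupting later lookups; reject it loudly instead.
        if key in (EMPTY_KEY, TOMBSTONE_KEY):
            raise ValueError(f"{key} is reserved as a sentinel key")
        if (self.size + 1) * 4 > len(self.buckets) * 3:  # keep load factor <= 3/4
            self._grow()
        i = self._probe(key)
        if self.buckets[i] == key:
            return False
        self.buckets[i] = key
        self.size += 1
        return True

    def __contains__(self, key: int) -> bool:
        if key in (EMPTY_KEY, TOMBSTONE_KEY):
            return False
        return self.buckets[self._probe(key)] == key

    def _grow(self) -> None:
        old = [k for k in self.buckets if k not in (EMPTY_KEY, TOMBSTONE_KEY)]
        self.buckets = [EMPTY_KEY] * (2 * len(self.buckets))
        self.size = 0
        for k in old:
            self.insert(k)


def add_if_storable(s: ToyDenseSet, value: int) -> bool:
    """Caller-side guard (hypothetical): skip values the set cannot represent."""
    if value in (EMPTY_KEY, TOMBSTONE_KEY):
        return False
    return s.insert(value)
```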
Reviewers: jeremyhu
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D26323
llvm-svn: 286038
llvm-svn: 286009
Broadcast from memory instructions should be treated as moves. They can't be unfolded.
Fixes pr30693.
llvm-svn: 285998
llvm-svn: 285997
This reverts commit r285939 and r285948. These broke some conformance tests.
llvm-svn: 285995
Summary: ARMv6m supports fence instructions such as dmb, but not ldrex/strex. So for some atomic loads and stores, LLVM should emit inline instructions instead of lowering them to __sync_ calls.
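A rough Python model of the decision described in the summary. The barrier placement follows the widely published C/C++11-to-ARM mapping (an assumption, not taken from the patch), `lower_atomic_32bit` is a hypothetical name, and the `__sync_fetch_and_<op>_4` spelling follows the GCC-style `__sync` libcall naming.

```python
from typing import Dict, List, Tuple

# Inline sequences for naturally aligned 32-bit atomic loads/stores.
# Assumption: the common C/C++11 mapping, not necessarily LLVM's exact output.
INLINE_SEQUENCES: Dict[Tuple[str, str], List[str]] = {
    ("load", "monotonic"): ["ldr"],
    ("load", "acquire"): ["ldr", "dmb"],
    ("load", "seq_cst"): ["ldr", "dmb"],
    ("store", "monotonic"): ["str"],
    ("store", "release"): ["dmb", "str"],
    ("store", "seq_cst"): ["dmb", "str", "dmb"],
}


def lower_atomic_32bit(op: str, ordering: str, has_ldrex_strex: bool) -> List[str]:
    """Pick a lowering for a 32-bit atomic operation (hypothetical helper).

    Plain loads and stores only need barriers, which ARMv6m has. Read-modify-write
    operations need an exclusive-access loop, which ARMv6m lacks, so they stay libcalls.
    """
    if op in ("load", "store"):
        return list(INLINE_SEQUENCES[(op, ordering)])
    if has_ldrex_strex:
        return ["ldrex/strex retry loop"]  # not modelled in detail here
    return [f"bl __sync_fetch_and_{op}_4"]
```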
Reviewers: rengolin, efriedma, t.p.northover, jmolloy
Subscribers: efriedma, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D26120
llvm-svn: 285969
Patch By: Wei Ding
Differential Revision: https://reviews.llvm.org/D18049
llvm-svn: 285939
This change exploits the fact that LDS reads may be reordered even if they access
the same location.
Prior to the change, the algorithm stopped as soon as it encountered any memory
access between the loads that were expected to be merged, even though a
Read-After-Read conflict cannot affect execution correctness.
Improves the performance of hcBLAS CGEMM manually loop-unrolled kernels by 44%.
Improvements are also expected for any long sequence of reads from LDS.
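A toy Python model of the scan before and after this change, with instructions reduced to "read", "write" and "alu" kinds. `Inst`, `may_alias` and both `can_merge_*` functions are hypothetical, and the alias check is deliberately crude (same address space means "may alias").

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Inst:
    kind: str                # "read", "write", or "alu" (non-memory)
    addr_space: str = "lds"  # only meaningful for "read"/"write"


def may_alias(a: Inst, b: Inst) -> bool:
    # Conservative assumption: any two accesses to the same address space may alias.
    return a.addr_space == b.addr_space


def can_merge_old(insts: List[Inst], i: int, j: int) -> bool:
    """Old behavior: give up at the first memory access between the two loads."""
    return all(inst.kind == "alu" for inst in insts[i + 1:j])


def can_merge_new(insts: List[Inst], i: int, j: int) -> bool:
    """New behavior: intervening reads never block the merge (read-after-read is
    harmless); only a write that may alias one of the two loads does."""
    for inst in insts[i + 1:j]:
        if inst.kind == "write" and (may_alias(inst, insts[i]) or may_alias(inst, insts[j])):
            return False
    return True
```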
Differential Revision: https://reviews.llvm.org/D25944
llvm-svn: 285919
This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 .
llvm-svn: 285912
This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
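A small Python model of the four cases, handy for checking the shift amounts against the contiguity test `32 - clz(bm) - ctz(bm) == popcount(bm)`. `select_cmpz_and` and `any_bit_set` are hypothetical names; registers are omitted and flags are modelled only as "Z" (EQ/NE) or "N" (MI/PL). The all-ones mask is handled separately since the AND is then a no-op.

```python
from typing import List, Optional, Tuple

Shift = Tuple[str, int]  # ("lsls" | "lsrs", shift amount)


def select_cmpz_and(mask: int) -> Optional[Tuple[List[Shift], str]]:
    """Choose shifts that test (x & mask) != 0 without materializing mask.

    Returns (shifts, flag): flag "Z" means branch on EQ/NE, "N" means PL/MI.
    Returns None if mask is not a single run of set bits.
    """
    mask &= 0xFFFFFFFF
    if mask == 0:
        raise ValueError("mask must be non-zero")
    clz = 32 - mask.bit_length()
    ctz = (mask & -mask).bit_length() - 1
    popcount = bin(mask).count("1")
    if 32 - clz - ctz != popcount:
        return None                                  # not contiguous: keep AND + CMP
    if clz == 0 and ctz == 0:
        return [], "Z"                               # all ones: the AND is a no-op
    if ctz == 0:
        return [("lsls", clz)], "Z"                  # case 1: touches the LSB
    if clz == 0:
        return [("lsrs", ctz)], "Z"                  # case 2: touches the MSB
    if popcount == 1:
        return [("lsls", clz)], "N"                  # case 3: move the bit into the sign bit
    return [("lsls", clz), ("lsrs", clz + ctz)], "Z"  # case 4: trim both ends


def any_bit_set(shifts: List[Shift], flag: str, x: int) -> bool:
    """Simulate the shifts on a 32-bit value and read the chosen flag."""
    v = x & 0xFFFFFFFF
    for op, amount in shifts:
        v = (v << amount) & 0xFFFFFFFF if op == "lsls" else v >> amount
    return v != 0 if flag == "Z" else bool(v >> 31)
```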
llvm-svn: 285893
This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR.
Reviewers: delena, igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26134
llvm-svn: 285878
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.
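A scalar Python model of the lane semantics, assuming the two intrinsics follow AVX-512 compress/expand behaviour (as I understand it, the LangRef documents them as `llvm.masked.compressstore.*` and `llvm.masked.expandload.*`). Memory is a Python list indexed by element, and both function names are hypothetical.

```python
from typing import List, Sequence


def compress_store(values: Sequence[int], mask: Sequence[bool],
                   memory: List[int], base: int) -> int:
    """Write the active lanes of `values` to consecutive slots starting at `base`.
    Returns how many elements were written."""
    out = base
    for value, active in zip(values, mask):
        if active:
            memory[out] = value
            out += 1
    return out - base


def expand_load(memory: Sequence[int], base: int, mask: Sequence[bool],
                passthru: Sequence[int]) -> List[int]:
    """Fill active lanes from consecutive slots starting at `base`;
    inactive lanes keep their `passthru` value."""
    result = list(passthru)
    src = base
    for lane, active in enumerate(mask):
        if active:
            result[lane] = memory[src]
            src += 1
    return result
```

Using the same mask for both calls round-trips the active lanes, which is the property the two operations are designed around.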
llvm-svn: 285876
llvm-svn: 285846
Some of these are already fixed or tested somewhere else.
llvm-svn: 285840
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.
Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25656
llvm-svn: 285835
llvm-svn: 285828
This is already done with VGPR immediates and saves 4 bytes.
llvm-svn: 285765
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. The previous behavior was incorrect
when the register class wasn't known from the static instruction
definition but the operand still needed to be an SGPR. The main example
of this is an inlineasm operand with an SGPR constraint.
Also start verifying the register classes of inlineasm operands.
llvm-svn: 285762
This will prevent the following regressions when enabling i16 support (D18049):
test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll
Differential Revision: https://reviews.llvm.org/D25802
llvm-svn: 285716
I wanted to implement this as a target-independent expansion; however, when
targets say they want to expand FP_TO_FP16, what they actually want is
the unsafe-math expansion when possible and expansion to a libcall in all
other cases.
The only way to make this work as a target-independent expansion would be to add
logic to each target's TargetLowering construction to mark these nodes as Expand
when LegalizeDAG can use the unsafe expansion and as LibCall when it
cannot. I think this would be possible, but it would be too fragile
and complex, as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.
Reviewers: bogner, ab, t.p.northover, arsenm
Subscribers: wdng, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D25999
llvm-svn: 285704
purpose of the test was to have 2 different function attribute sets, but due
to a typo both were numbered #0, so there was only one.
llvm-svn: 285701
Note: The test is as per the differential review, but the other code changed in the review was for an optimisation that didn't quite work. Nevertheless, the test is valid for the unoptimised version of the fix.
Differential Review: https://reviews.llvm.org/D24658
llvm-svn: 285692
[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1; however, it is possible to synthesize them out of a sequence of other instructions.
It turns out this sequence is so short that it's almost never a loss for performance and is ALWAYS a significant win for code size.
TBB example:
Before:
    lsls r0, r0, #2
    adr r1, .LJTI0_0
    ldr r0, [r0, r1]
    mov pc, r0
After:
    add r0, pc
    ldrb r0, [r0, #6]
    lsls r0, r0, #1
    add pc, r0
=> No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.
The only case that can increase dynamic instruction count is the TBH case:
Before:
    lsls r0, r4, #2
    adr r1, .LJTI0_0
    ldr r0, [r0, r1]
    mov pc, r0
After:
    lsls r4, r4, #1
    add r4, pc
    ldrh r4, [r4, #6]
    lsls r4, r4, #1
    add pc, r4
=> 1 more instruction in prologue. Jump table shrunk by a factor of 2.
So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)
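A rough Python sketch of choosing between a byte table, a halfword table and the original 4-byte table. It assumes, as the `lsls r0, r0, #1` after the `ldrb` suggests, that each entry holds half the forward distance to its target; the base address, function names and fallback rule are illustrative assumptions rather than the patch's logic.

```python
from typing import List, Tuple


def choose_jump_table(offsets: List[int]) -> Tuple[str, int]:
    """Pick the narrowest entry size for a PC-relative jump table (hypothetical).

    offsets: forward byte distances from an assumed base address to each case target.
    Entries store offset // 2 because targets are halfword aligned.
    Returns (entry kind, table size in bytes).
    """
    n = len(offsets)
    if n == 0:
        return "byte", 0
    if any(o < 0 or o % 2 for o in offsets):
        return "word", 4 * n        # fall back to a table of 4-byte addresses
    largest = max(offsets) // 2
    if largest <= 0xFF:
        return "byte", n            # TBB-style table
    if largest <= 0xFFFF:
        return "halfword", 2 * n    # TBH-style table
    return "word", 4 * n


def encode_byte_table(offsets: List[int]) -> bytes:
    """Encode a TBB-style table: one byte per case holding offset // 2."""
    return bytes(o // 2 for o in offsets)
```

Against a 4-byte-per-entry table, the byte and halfword kinds correspond to the factor-of-4 and factor-of-2 shrinkage quoted above.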
llvm-svn: 285690
Differential revision: https://reviews.llvm.org/D26077
llvm-svn: 285684
This patch corresponds to review https://reviews.llvm.org/D26095.
Committing on behalf of Tony Jiang.
llvm-svn: 285681
bits (PR30841)
This bug was exposed by using nsw/nuw for more aggressive folds in:
https://reviews.llvm.org/rL284844
The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(),
but we can't just flip flag bits in the DAG; we have to create a new node that has the
bits cleared.
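A small Python model of the point above, assuming an 8-bit ADD with `nsw` whose left operand `(y | 0xF0)` is simplified to `y` because only the low 4 bits are demanded. `AddNode` and the helper functions are hypothetical, and the frozen dataclass only stands in for the fact that the flags cannot simply be flipped in place.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class AddNode:
    """Immutable stand-in for a DAG ADD node with wrap flags (hypothetical)."""
    lhs: int
    rhs: int
    nsw: bool = False
    nuw: bool = False


def to_signed(v: int, bits: int) -> int:
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def signed_add_overflows(a: int, b: int, bits: int) -> bool:
    s = to_signed(a, bits) + to_signed(b, bits)
    return not -(1 << (bits - 1)) <= s < (1 << (bits - 1))


def rebuild_with_simplified_operands(node: AddNode, lhs: int, rhs: int) -> AddNode:
    """Operands were simplified using only the demanded low bits, so the old
    no-wrap guarantees may be false: build a new node with the flags cleared."""
    return replace(node, lhs=lhs, rhs=rhs, nsw=False, nuw=False)


if __name__ == "__main__":
    y = 0x7F
    original = AddNode(lhs=y | 0xF0, rhs=1, nsw=True)  # -1 + 1: no signed wrap
    try:
        original.nsw = False  # not allowed in this model
    except FrozenInstanceError:
        pass
    simplified = rebuild_with_simplified_operands(original, lhs=y, rhs=1)
    print(signed_add_overflows(original.lhs, original.rhs, 8))      # False
    print(signed_add_overflows(simplified.lhs, simplified.rhs, 8))  # True
```

Both adds agree in the low 4 bits (0x00 vs 0x80), which is all the user demands, but keeping `nsw` on the simplified add would assert something false.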
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30841
llvm-svn: 285656
Generate the slowest possible codepath for noopt CodeGen. Even trying to be
clever with the negated jump can cause out-of-range jumps. Use a wide branch
instead. Although the code is modelled simplistically, the later optimizations
would recombine the branching into `cbz` if possible. This re-enables the
previous optimization and should also give us working code in all cases.
Addresses PR30356!
llvm-svn: 285649
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass. We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.
Reviewers: tra
Subscribers: llvm-commits, jholewinski, jingyue, mgorny
Differential Revision: https://reviews.llvm.org/D26165
llvm-svn: 285642
This patch corresponds to review https://reviews.llvm.org/D26072.
Committing on behalf of Sean Fertile.
llvm-svn: 285627
llvm-svn: 285615
llvm-svn: 285614
llvm-svn: 285588
llvm-svn: 285560
In D26098, Davide Italiano submitted a .s file instead of the .ll file
that was the last stage of the review.
llvm-svn: 285559