| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Now that PR33325 is fixed, this should always improve the generated code.
Reviewers: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42793
llvm-svn: 324317
|
| |
|
|
|
|
|
|
| |
These used things like unsigned less than zero, which is always false because there is no unsigned number less than zero.
I plan to teach DAG combine to optimize these, so we need to stop using them.
llvm-svn: 324315
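A minimal sketch of why such comparisons are trivially foldable, using plain C++ unsigned arithmetic rather than any LLVM API. The helper names `ult` and `uge` are hypothetical and only mirror the unsigned-less-than and unsigned-greater-or-equal predicates:

```cpp
#include <cstdint>

// Hypothetical helpers (not LLVM APIs) modelling unsigned predicates.
constexpr bool ult(uint32_t A, uint32_t B) { return A < B; }
constexpr bool uge(uint32_t A, uint32_t B) { return A >= B; }

// No unsigned value is below zero, so "ult X, 0" is false for every X,
// and its inverse "uge X, 0" is true for every X.
static_assert(!ult(0u, 0u), "0 is not below 0");
static_assert(!ult(UINT32_MAX, 0u), "the largest value is not below 0");
static_assert(uge(0u, 0u) && uge(UINT32_MAX, 0u), "always true");
```

A test that relies on such a comparison stops exercising the code it was written for once a combine folds the compare to a constant.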
|
| |
|
|
|
|
| |
Those should have the glc bit set for the system and agent synchronization scopes.
llvm-svn: 324314
|
| |
|
|
|
|
|
| |
Wasm uses the expand action for several FP compare ops, and that behavior
changed.
llvm-svn: 324305
|
| |
|
|
| |
llvm-svn: 324304
|
| |
|
|
|
|
|
|
| |
This reverts r323297.
It breaks building grub.
llvm-svn: 324301
|
| |
|
|
| |
llvm-svn: 324295
|
| |
|
|
|
|
|
|
| |
sext when AVX512 is enabled.
We now allow all signed comparisons and not equal. The complement that needs to be added for this is no worse than the extend. And the vector output forms of pcmpeq/pcmpgt have better latency than the k-register version on SKX.
llvm-svn: 324294
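The "complement" mentioned in r324294 can be illustrated with a scalar model of a single 32-bit lane. This is only a sketch: the function names are hypothetical, and the all-ones/all-zeros lane result is the usual convention for vector-output compares such as pcmpeq/pcmpgt.

```cpp
#include <cstdint>

// One 32-bit lane of a vector compare: all ones when true, all zeros when false.
constexpr uint32_t cmpEqLane(int32_t A, int32_t B) {
  return A == B ? 0xFFFFFFFFu : 0u;
}
constexpr uint32_t cmpGtLane(int32_t A, int32_t B) {
  return A > B ? 0xFFFFFFFFu : 0u;
}

// "Not equal" has no direct lane compare in this model; it is the complement
// of "equal". Likewise "less or equal" is the complement of "greater than".
constexpr uint32_t cmpNeLane(int32_t A, int32_t B) { return ~cmpEqLane(A, B); }
constexpr uint32_t cmpLeLane(int32_t A, int32_t B) { return ~cmpGtLane(A, B); }
```

The extra bitwise NOT is the cost that the message weighs against a separate sign extension.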
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR35681)
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
|
| |
|
|
|
|
| |
X86FrameLowering sets stack size to 0 if redzone is enabled.
llvm-svn: 324285
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The major visible difference here is that in line-table dumps,
directory and file names are wrapped in double-quotes; previously,
directory names got single quotes and file names were not quoted at
all.
The improvement in this patch is that when a DWARF v5 line table
header has indirect strings, in a verbose dump these will all have
their section[offset] printed as well as the name itself. This
matches the format used for dumping strings in the .debug_info
section.
Differential Revision: https://reviews.llvm.org/D42802
llvm-svn: 324270
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Copy MI-level cmp->test conversion to SelectionDAG-level memory unfold.
This fixes a regression from upcoming D41293 change.
Reviewers: craig.topper, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D42808
llvm-svn: 324261
|
| |
|
|
|
|
|
|
|
|
|
|
| |
AND with immediate will match first.
This allows the immediate to be folded into the and instead of being forced to move into a register. This can sometimes result in shorter encodings since the and can sign extend an immediate.
This also allows us to match an and to a movzx after a not.
This can cause an extra move if the input to the separate NOT has an additional user which requires a copy before the NOT.
llvm-svn: 324260
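As a rough illustration of the "shorter encodings" remark in r324260: a constant can use a sign-extended immediate field of a given width only if it lies in that field's signed range. The helper below is a hypothetical sketch, not an LLVM function, and it assumes `Bits` is between 1 and 63.

```cpp
#include <cstdint>

// True if V is representable as a sign-extended immediate of the given width.
// Assumption: 1 <= Bits <= 63.
constexpr bool fitsInSignedBits(int64_t V, unsigned Bits) {
  return V >= -(int64_t(1) << (Bits - 1)) && V < (int64_t(1) << (Bits - 1));
}

// A mask such as -16 (all ones except the low four bits) fits in 8 bits,
// while 255 does not, because 255 is outside [-128, 127].
static_assert(fitsInSignedBits(-16, 8), "-16 fits in a signed 8-bit field");
static_assert(!fitsInSignedBits(255, 8), "255 needs more than 8 signed bits");
```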
|
| |
|
|
|
|
|
|
|
|
|
|
| |
of a 64 bit mask.
If the upper 32 bits of a 64 bit mask are all zeros, we have special isel patterns to use a 32-bit and instead of a 64-bit and by relying on the implicit zeroing of 32 bit ops.
This patch teaches shrinkAndImmediate not to break that optimization.
Differential Revision: https://reviews.llvm.org/D42899
llvm-svn: 324249
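The equivalence that r324249 relies on can be checked with ordinary integer arithmetic. This is a sketch with hypothetical helper names; the zero-extending cast stands in for the implicit zeroing of the upper half by a 32-bit operation.

```cpp
#include <cstdint>

// The full 64-bit AND.
constexpr uint64_t and64(uint64_t X, uint64_t Mask) { return X & Mask; }

// A 32-bit AND whose result is zero-extended back to 64 bits.
constexpr uint64_t and32ZeroExtended(uint64_t X, uint64_t Mask) {
  return uint64_t(uint32_t(X) & uint32_t(Mask));
}

// The narrow form is only a valid replacement when the mask has no upper bits.
constexpr bool upper32BitsZero(uint64_t Mask) { return (Mask >> 32) == 0; }
```

For a mask like `0x00000000FFFF0000` both forms agree for every input; for a mask with any upper bit set, the narrow form would wrongly clear bits that the 64-bit AND preserves.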
|
| |
|
|
|
|
|
| |
Unlike V6_vmpyhv, it produces the result in the exact form that is
expected without the need for a shuffle.
llvm-svn: 324241
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PPCCTRLoops transforms loops to use mtctr/bdnz instructions if the loop trip count is known and big enough to compensate for the cost of mtctr.
But if there is a loop exit edge which is known to be frequently taken (by builtin_expect or by PGO), we should not transform the loop, to avoid the cost of the mtctr instruction. Here is an example of a loop with a hot exit edge:
    for (unsigned i = 0; i < TripCount; i++) {
      // do something
      if (__builtin_expect(check(), 1))
        break;
      // do something
    }
Differential Revision: https://reviews.llvm.org/D42637
llvm-svn: 324229
|
| |
|
|
|
|
|
|
| |
Remove combineBitcastForMaskedOp.
Add test cases for the merge masked versions to make sure we have all those covered.
llvm-svn: 324210
|
| |
|
|
|
|
|
|
|
|
| |
patterns instead.
We always created X86ISD::SHUF128 with a 64-bit element type so we can use isel patterns to detect a bitconvert to 32-bit to handle masking.
The test changes are because we also match the bitconvert even if there is no masking. This leads to an unnecessary isel pattern, but it requires more multiclass hackery in tablegen to get rid of it.
llvm-svn: 324205
|
| |
|
|
| |
llvm-svn: 324202
|
| |
|
|
|
|
| |
To be improved by D42044
llvm-svn: 324200
|
| |
|
|
|
|
|
|
|
|
| |
(and/or/xor X, (bitcast Y)) when casting between GPRs and mask operations.
This reduces the number of transitions between k-registers and GPRs, reducing the number of instructions.
There's still some room for improvement to remove more transitions, but this is a good start.
llvm-svn: 324184
|
| |
|
|
|
|
| |
Hopefully this helps make this a lot more maintainable.
llvm-svn: 324180
|
| |
|
|
|
|
| |
This is necessary to prevent the shuffles from being combined/simplified in an upcoming patch.
llvm-svn: 324178
|
| |
|
|
|
|
|
|
| |
Clang already stopped using these a couple months ago.
The test cases aren't great as there is nothing forcing the operations to stay in k-registers so some of them moved back to scalar ops due to the bitcasts being moved around.
llvm-svn: 324177
|
| |
|
|
|
|
|
| |
From the discussion in D41835 it looks possible the change will be backed out,
but for now let's fix the RISCV tests.
llvm-svn: 324172
|
| |
|
|
|
|
| |
vectors are not handled by LowerVectorAllZeroTest.
llvm-svn: 324130
|
| |
|
|
|
|
|
|
|
|
| |
ptest for all ones comparison.
Turns out I misunderstood the flag behavior of PTEST because I read the documentation for KORTEST which is different than PTEST/KTEST and made a bad assumption.
Keep the test rename though, because that's useful.
llvm-svn: 324129
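The flag difference behind r324129 can be modelled on 64-bit scalars. This is a sketch only: the 64-bit width and the names `Flags`, `ptestModel` and `kortestModel` are assumptions for illustration, while the flag formulas follow the documented behavior of the instructions (PTEST and KTEST test an AND and an AND-NOT, KORTEST tests an OR).

```cpp
#include <cstdint>

struct Flags {
  bool ZF;
  bool CF;
};

// PTEST/KTEST style: ZF is set when (A & B) is zero,
// CF is set when (~A & B) is zero.
constexpr Flags ptestModel(uint64_t A, uint64_t B) {
  return {(A & B) == 0, (~A & B) == 0};
}

// KORTEST style: ZF is set when (A | B) is zero,
// CF is set when (A | B) is all ones.
constexpr Flags kortestModel(uint64_t A, uint64_t B) {
  return {(A | B) == 0, (A | B) == ~uint64_t(0)};
}
```

With the same value in both operands, the PTEST-style CF is always set because `~A & A` is zero, so it cannot distinguish an all-ones input, whereas the KORTEST-style CF is set only for an all-ones input.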
|
| |
|
|
|
|
| |
Also rename the test from pr12312.ll to ptest.ll so it's more recognizable.
llvm-svn: 324124
|
| |
|
|
|
|
|
|
| |
This requires a corresponding clang change.
Differential Revision: https://reviews.llvm.org/D40955
llvm-svn: 324101
|
| |
|
|
|
|
| |
See https://reviews.llvm.org/D42793#996098 for context.
llvm-svn: 324099
|
| |
|
|
| |
llvm-svn: 324094
|
| |
|
|
| |
llvm-svn: 324074
|
| |
|
|
|
|
|
|
|
|
|
|
| |
When handling vectors with non byte-sized elements, reverse the order of the
elements in the built integer if the target is Big-Endian.
SystemZ tests updated.
Review: Eli Friedman, Ulrich Weigand.
https://reviews.llvm.org/D42786
llvm-svn: 324063
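The element ordering described in r324063 can be sketched as a small packing routine. The function name and signature are hypothetical, and the sketch assumes the element width is below 64 bits and the packed vector fits in 64 bits; it is not the code from the patch.

```cpp
#include <cstdint>
#include <vector>

// Pack vector elements of EltBits bits each into one integer.
// Little-endian: element 0 occupies the least significant bits.
// Big-endian: the element order is reversed, so element 0 ends up in the
// most significant position.
// Assumptions: 0 < EltBits < 64 and Elts.size() * EltBits <= 64.
uint64_t packElements(const std::vector<uint64_t> &Elts, unsigned EltBits,
                      bool BigEndian) {
  const uint64_t Mask = (uint64_t(1) << EltBits) - 1;
  const unsigned N = static_cast<unsigned>(Elts.size());
  uint64_t Result = 0;
  for (unsigned I = 0; I != N; ++I) {
    const unsigned Pos = BigEndian ? (N - 1 - I) : I;
    Result |= (Elts[I] & Mask) << (Pos * EltBits);
  }
  return Result;
}
```

For example, the 4-bit elements {1, 2, 3} pack to 0x321 in the little-endian order and to 0x123 in the big-endian order.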
|
| |
|
|
|
|
|
|
|
|
| |
test/CodeGen/SystemZ/vec-trunc-to-i1.ll was marked as a temporary
FAIL when it was previously updated when it needed one more COPY.
This was however wrong, since the loop body had been reduced
significantly, and it was actually an improvement.
Review: Ulrich Weigand.
llvm-svn: 324060
|
| |
|
|
|
|
|
|
| |
32-bit halves from i32, bitcasting each to v32i1, and concatenating.
This prevents the scalarization that would otherwise occur.
llvm-svn: 324057
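The halving and concatenation that r324057 describes can be pictured on a plain 64-bit integer standing in for a 64-element mask. This is only a sketch with hypothetical names; which half becomes the low part of the concatenation is an assumption of the model.

```cpp
#include <cstdint>

struct Halves {
  uint32_t Lo;
  uint32_t Hi;
};

// Split a 64-bit value into its low and high 32-bit halves.
constexpr Halves splitHalves(uint64_t V) {
  return {uint32_t(V), uint32_t(V >> 32)};
}

// Concatenate two 32-bit halves back into one 64-bit value.
constexpr uint64_t concatHalves(Halves H) {
  return (uint64_t(H.Hi) << 32) | H.Lo;
}
```

Each half can then be handled as a 32-bit value on its own, which is what avoids treating the 64 elements one at a time.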
|
| |
|
|
|
|
|
|
| |
v32i1 and bitcasting to i32.
This saves a trip through memory and seems to open up other combining opportunities.
llvm-svn: 324056
|
| |
|
|
|
|
|
|
| |
To avoid triggering the "No default SetCC type for vectors!" assertion.
Differential Revision: https://reviews.llvm.org/D42675
llvm-svn: 324054
|
| |
|
|
|
|
| |
My rebase had missed the new $ sigil we're using.
llvm-svn: 324051
|
| |
|
|
|
|
|
|
|
|
| |
This fixes a crash where the user is a COPY, which deliberately does not
constrain its source operands, resulting in a vreg without a reg class escaping
selection.
Differential Revision: https://reviews.llvm.org/D42697
llvm-svn: 324047
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Example situation:
```
BB0:
%0 = ...
use %0
; ...
condjump BB1
jmp BB2
BB1:
%0 = ... ; rematerialized def from above (from earlier split step)
jmp BB2
BB2:
; ...
use %0
```
%0 will have a live interval with 3 value numbers (for the BB0, BB1 and
BB2 parts). Now SplitKit tries and succeeds in rematerializing the value
number in BB2 (this only works because it is a secondary split, so
SplitKit can trace this back to a single original def).
We need to recompute all live ranges affected by a value number that we
rematerialize. The case that we missed before is that when the value
that is rematerialized is at a join (Phi VNI) then we also have to
recompute liveness for the predecessor VNIs.
rdar://35699130
Differential Revision: https://reviews.llvm.org/D42667
llvm-svn: 324039
|
| |
|
|
| |
llvm-svn: 324024
|
| |
|
|
| |
llvm-svn: 324017
|
| |
|
|
| |
llvm-svn: 324015
|
| |
|
|
|
|
|
|
| |
This is a rather non-controversial change. We were missing these instructions
from the list of instructions that are lane-sensitive. These two put the result
into lane 0 (BE) or 3 (LE) regardless of the input. This patch fixes PR36068.
llvm-svn: 324005
|
| |
|
|
|
|
|
|
|
|
|
|
| |
(extract_subvector N1, Idx)), Idx) -> (bitcast N1) make sure that N1 has the same total size as the original output.
We were only checking the element count, but not the total width. This could cause illegal bitcasts to be created if for example the output was 512-bits, but N1 is 256 bits, and the extraction size was 128-bits.
Fixes PR36199
Differential Revision: https://reviews.llvm.org/D42809
llvm-svn: 324002
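The gap fixed in r324002 comes down to two different equality checks on vector types. The sketch below uses a hypothetical `VecType` struct instead of LLVM's value types, purely to show that equal element counts do not imply equal total width.

```cpp
// Hypothetical stand-in for a vector value type.
struct VecType {
  unsigned NumElts;
  unsigned EltBits;
  constexpr unsigned totalBits() const { return NumElts * EltBits; }
};

// The old, insufficient check: same number of elements.
constexpr bool sameElementCount(VecType A, VecType B) {
  return A.NumElts == B.NumElts;
}

// The additional check: same total width, so a bitcast between them is legal.
constexpr bool sameTotalSize(VecType A, VecType B) {
  return A.totalBits() == B.totalBits();
}

// An 8 x 64-bit vector (512 bits) and an 8 x 32-bit vector (256 bits) have
// the same element count but cannot be bitcast to each other.
static_assert(sameElementCount(VecType{8, 64}, VecType{8, 32}), "");
static_assert(!sameTotalSize(VecType{8, 64}, VecType{8, 32}), "");
```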
|
| |
|
|
|
|
|
| |
Until we support extending loads properly we're going to fall back for these.
We already handle stores in the same way, so this is just being consistent.
llvm-svn: 324001
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change extends MachineCopyPropagation to do COPY source forwarding
and adds an additional run of the pass to the default pass pipeline just
after register allocation.
This version of this patch uses the newly added
MachineOperand::isRenamable bit to avoid forwarding registers in such a
way as to violate constraints that aren't captured in the
Machine IR (e.g. ABI or ISA constraints).
This change is a continuation of the work started in D30751.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar
Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits
Differential Revision: https://reviews.llvm.org/D41835
llvm-svn: 323991
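The idea of COPY source forwarding in r323991 can be shown on a toy, single-block model. Everything here is hypothetical (the `Inst` struct, string register names, and the `SrcRenamable` flag standing in for the isRenamable bit); it is not the MachineCopyPropagation implementation and ignores sub-registers, liveness across blocks, and register classes.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy instruction: "Dst = COPY Src" when IsCopy is set, otherwise some
// operation that reads Src and optionally defines Dst ("" means no def).
struct Inst {
  bool IsCopy;
  std::string Dst;
  std::string Src;
  bool SrcRenamable; // stand-in for the isRenamable bit on the use operand
};

// Rewrite uses of a copied register to read the copy's source instead,
// as long as the use may be renamed and the copy is still valid.
void forwardCopySources(std::vector<Inst> &Block) {
  std::map<std::string, std::string> CopyOf; // copy destination -> source
  for (Inst &I : Block) {
    auto It = CopyOf.find(I.Src);
    if (It != CopyOf.end() && I.SrcRenamable)
      I.Src = It->second;

    // A new definition invalidates every tracked copy that involves it.
    if (!I.Dst.empty()) {
      for (auto M = CopyOf.begin(); M != CopyOf.end();) {
        if (M->first == I.Dst || M->second == I.Dst)
          M = CopyOf.erase(M);
        else
          ++M;
      }
    }
    if (I.IsCopy && I.Dst != I.Src)
      CopyOf[I.Dst] = I.Src;
  }
}
```

In this model a use of `r1` after `r1 = COPY r0` is rewritten to read `r0`, unless the use is not renamable or `r0` has been redefined in between.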
|
| |
|
|
|
|
|
|
|
|
|
|
| |
target has UnpackedD16VMem feature.
Reviewers: Matt and Brian
Differential Revision: https://reviews.llvm.org/D42548
llvm-svn: 323988
|
| |
|
|
|
|
|
|
|
|
| |
index vectors
This allows us to use PSHUFB for v8i16/v4i32 and VPERMD/VPERMPS for v4i64/v4f64 variable shuffles.
Differential Revision: https://reviews.llvm.org/D42487
llvm-svn: 323987
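One way to picture index scaling for r323987 is to turn eight 16-bit element indices into sixteen byte indices and feed them to a byte shuffle. This is a sketch under assumptions: the function names are hypothetical, the `& 7` masking is added here only to keep indices in range, and `byteShuffle` models the PSHUFB rule that a control byte with its top bit set produces zero.

```cpp
#include <array>
#include <cstdint>

// Scale 16-bit element indices to byte indices: element i selects
// source bytes 2*i and 2*i + 1.
std::array<uint8_t, 16> scaleIndicesToBytes(const std::array<uint16_t, 8> &Idx) {
  std::array<uint8_t, 16> Out{};
  for (unsigned I = 0; I != 8; ++I) {
    const uint8_t Base = static_cast<uint8_t>((Idx[I] & 7) * 2);
    Out[2 * I] = Base;
    Out[2 * I + 1] = static_cast<uint8_t>(Base + 1);
  }
  return Out;
}

// Model of a 128-bit byte shuffle: each control byte picks a source byte by
// its low four bits, or yields zero if its top bit is set.
std::array<uint8_t, 16> byteShuffle(const std::array<uint8_t, 16> &Src,
                                    const std::array<uint8_t, 16> &Ctl) {
  std::array<uint8_t, 16> Out{};
  for (unsigned I = 0; I != 16; ++I)
    Out[I] = (Ctl[I] & 0x80) ? 0 : Src[Ctl[I] & 15];
  return Out;
}
```

With the index vector {7, 6, 5, 4, 3, 2, 1, 0}, the scaled control starts with bytes 14 and 15, so the shuffle reverses the order of the 16-bit elements while keeping the two bytes of each element together.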
|
| |
|
|
|
|
| |
As noted in D42323, we're not checking for denorms as we should.
llvm-svn: 323985
|