| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
| |
- Add MACH flags
- Add XNACK flag
- Add reserved flags
- Minor cleanups in docs
Differential Revision: https://reviews.llvm.org/D43356
llvm-svn: 325399
|
| |
|
|
|
|
|
|
|
|
| |
- Remove gfx800
- Make iceland gfx802
- Add xnack to gfx902
Differential Revision: https://reviews.llvm.org/D43355
llvm-svn: 325393
|
| |
|
|
|
|
|
|
|
|
|
| |
The data type is assumed to be a vector, but sometimes it is not, leading
to an assertion.
Add simple test-case to verify this.
Differential revision: https://reviews.llvm.org/D42599
llvm-svn: 325378
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce
an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed.
Reviewers:
arsenm
Differential Revision:
https://reviews.llvm.org/D33559
llvm-svn: 325372
|
| |
|
|
|
|
|
|
|
|
| |
This seems to interfere with a target independent brcond combine that looks for the (srl (and X, C1), C2) pattern to enable TEST instructions. Once we flip, that combine doesn't fire and we end up exposing it to the X86 specific BT combine which causes us to emit a BT instruction. BT has lower throughput than TEST.
We could try to make the brcond combine aware of the alternate pattern, but since the flip was just a code size reduction and not likely to enable other combines, it seemed easier to just delay it until after lowering.
Differential Revision: https://reviews.llvm.org/D43201
llvm-svn: 325371
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instruction.
Summary:
In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction
is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect
vgpr.
In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction.
Reviewers:
rampitec
Differential Revision:
https://reviews.llvm.org/D43297
llvm-svn: 325355
|
| |
|
|
|
|
|
|
|
|
| |
shuffle masks
Based off the DemandedElts mask the and UNDEF elements returned from the SimplifyDemandedVectorElts calls to the shuffle operands, we can attempt to simplify the shuffle mask.
I had to be very conservative here as accepting post-legalized shuffle masks could cause problems for targets that legalize UNDEF mask elements back to inrange values (PowerPC), similarly combining to identity shuffle masks could cause too much UNDEF information to disappear for later combines.
llvm-svn: 325354
|
| |
|
|
|
|
| |
and the root started as a float domain shuffle
llvm-svn: 325349
|
| |
|
|
|
|
|
|
|
|
|
|
| |
These instructions conflict with their full length variants
for the purposes of FastISel as they cannot be distingushed
based on the number and type of operands and predicates.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D41285
llvm-svn: 325341
|
| |
|
|
|
|
|
|
| |
simplifying VSELECT operands
This just adds a basic pass through - we can add constant selection mask handling in a future patch to fully match InstCombine.
llvm-svn: 325338
|
| |
|
|
|
|
|
|
|
|
| |
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: Eli Friedman
llvm-svn: 325327
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently when expanding a SETCC node into a SELECT_CC, LLVM uses
an incorrect type for determining BooleanContent of the result. This
patch fixes the issue.
Fixes PR36079.
Reviewers: rogfer01, javed.absar, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43282
llvm-svn: 325325
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form
a != b ? x : 0
a == b ? 0 : x
and that currently (e.g. in Thumb1) are emitted as branches.
Differential Revision: https://reviews.llvm.org/D34515
llvm-svn: 325323
|
| |
|
|
|
|
| |
Sign extending i32 constants only requires a REX prefix as does widening the CMOV. This is cheaper than the explicit sign extend op.
llvm-svn: 325318
|
| |
|
|
|
|
| |
Zero extend from i32 to i64 is free. So extend from i16 to i32, and then use a free zero extend to finish.
llvm-svn: 325317
|
| |
|
|
| |
llvm-svn: 325306
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D43350
llvm-svn: 325299
|
| |
|
|
|
|
|
|
| |
We already do this for 64-bit when it won't fit into a 64-bit AND/TEST's immediate field. This adds an additional qualifier to do it for any single bit constant larger than 8-bits under optsize
Differential Revision: https://reviews.llvm.org/D43346
llvm-svn: 325290
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
(zextload)) even when the and does not have multiple uses
Same for the sign extend case.
Currently we check for multiple uses on the binop. Then we call ExtendUsesToFormExtLoad to capture SetCCs that use the load. So we only end up finding any setccs when the and has additional uses and the load is used by a setcc. I don't think the and having multiple uses is relevant here. I think we should only be checking for the load having multiple uses.
This changes an NVPTX test because we now find that the load has a second use by a truncate, but ExtendUsesToFormExtLoad only looks at setccs it can extend. All other operations just check isTruncateFree. Maybe we should allow widening of an existing truncate even if its not free?
Differential Revision: https://reviews.llvm.org/D43063
llvm-svn: 325289
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Fix silly mistake in a test
Reviewers: gkistanova, apilipenko
Subscribers: javed.absar, eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D43342
llvm-svn: 325283
|
| |
|
|
|
|
| |
optsize.
llvm-svn: 325277
|
| |
|
|
|
|
| |
Tests showing missing opportunities to use PACK instructions in cases where we need to truncate to illegal types for stores
llvm-svn: 325270
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The reference '&' is missing in the function parameter. If there are
back-to-back optimizations in terms of dag node list like below:
t29: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)+12](dereferenceable), zext from i32> t3, t43, undef:i64
t34: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)](dereferenceable), zext from i32> t3, t41, undef:i64
The bug will trigger a segfault for the added test case remove_truncate_5.ll:
LLVMSymbolizer: error reading file: No such file or directory
#0 0x000000000241c4d9 (llc+0x241c4d9)
#1 0x000000000241c56a (llc+0x241c56a)
#2 0x000000000241aa50 (llc+0x241aa50)
...
#22 0x0000000000fd5edf (llc+0xfd5edf)
#23 0x00007f0fe03bec05 __libc_start_main (/lib64/libc.so.6+0x21c05)
#24 0x0000000000fd3e69 (llc+0xfd3e69)
...
Segmentation fault
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325267
|
| |
|
|
|
|
| |
Lowering of formal arguments needs to be aware of vararg functions.
llvm-svn: 325255
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In LLVM, 't' selects a floating-point/SIMD register and only supports
32-bit values. This is appropriately documented in the LLVM Language
Reference Manual. However, this behaviour diverges from that of GCC, where
't' selects the s0-s31 registers and its qX and dX variants depending on
additional operand modifiers (q/P).
For example, the following C code:
#include <arm_neon.h>
float32x4_t a, b, x;
asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b))
results in the following assembly if compiled with GCC:
vadd.f32 s0, s0, s1
whereas LLVM will show "error: couldn't allocate output register for
constraint 't'", since a, b, x are 128-bit variables, not 32-bit.
This patch extends the use of 't' to mean that of GCC, thus allowing
selection of the lower Q vector regs and their D/S variants. For example,
the earlier code will now compile as:
vadd.f32 q0, q0, q1
This behaviour still differs from that of GCC but I think it is actually
more correct, since LLVM picks up the right register type based on the
datatype of x, while GCC would need an extra operand modifier to achieve
the same result, as follows:
asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b))
Since this is only an extension of functionality, existing code should not
be affected by this change. Note that operand modifiers q/P are already
supported by LLVM, so this patch should suffice to support inline
assembly with constraint 't' originally built for GCC.
Reviewers: grosbach, rengolin
Reviewed By: rengolin
Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42962
llvm-svn: 325244
|
| |
|
|
|
|
|
|
| |
PACKUS vXi32-vXi8 saturated truncation
We can use PACKSS/PACKUS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKUSWB to [0,255].
llvm-svn: 325243
|
| |
|
|
|
|
|
|
|
|
| |
PACKSS vXi32-vXi8 saturated truncation
We can use PACKSS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKSSWB to [-128,127].
PACKUS is a little trickier and will be handled in a separate patch.
llvm-svn: 325235
|
| |
|
|
|
|
|
|
|
|
|
|
| |
TargetLowering::SimplifyDemandedVectorElts
This is mainly a move of simplifyShuffleOperands from DAGCombiner::visitVECTOR_SHUFFLE to create a more general purpose TargetLowering::SimplifyDemandedVectorElts implementation.
Further features can be moved/added in future patches.
Differential Revision: https://reviews.llvm.org/D42896
llvm-svn: 325232
|
| |
|
|
|
|
|
|
| |
This adds f16 VCMP match rules and fixes the test cases.
Differential Revision: https://reviews.llvm.org/D43291
llvm-svn: 325228
|
| |
|
|
|
|
| |
These must have not been printing the last time the test was re-generated.
llvm-svn: 325198
|
| |
|
|
| |
llvm-svn: 325190
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The bound instruction does not have reversed operands in gas.
Fixes PR27653.
Patch by Maya Madhavan.
Differential Revision: https://reviews.llvm.org/D43243
llvm-svn: 325178
|
| |
|
|
| |
llvm-svn: 325169
|
| |
|
|
|
|
|
|
|
|
| |
between PACK*SDW/PACK*SWB
Try to keep PACK*SDW/PACK*SWB as wide as possible, this helps ComputeNumSignBits as it can only peek through bitcasts to wider types, pre-AVX2 codegen was already doing this as it could peek through bitcasts/subvectors more easily than AVX2 could through shuffles.
This shouldn't affect existing results as calls to truncateVectorWithPACK ensure we have enough sign bits to pack to the same value, but it should make it possible to use truncateVectorWithPACK chains to perform saturation in combineTruncateWithSat with a future patch.
llvm-svn: 325149
|
| |
|
|
|
|
|
|
|
| |
Kernel arguments likely read by all workitems and should not bypass
cache. Fixes performance hit in sub-dword argument loads.
Differential Revision: https://reviews.llvm.org/D43249
llvm-svn: 325146
|
| |
|
|
| |
llvm-svn: 325138
|
| |
|
|
|
|
|
|
| |
When creating high MachineMemOperand for MSTORE/MLOAD we supply
it with the original PointerInfo, while the pointer itself had been incremented.
The patch adds the proper offset to the PointerInfo.
llvm-svn: 325135
|
| |
|
|
|
|
|
|
| |
This adds support for handling f16 stack spills/reloads.
Differential Revision: https://reviews.llvm.org/D43280
llvm-svn: 325130
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
Change-Id: Ic41aa9ade6512e0478db66e07e2fde41b4fb35f9
llvm-svn: 325128
|
| |
|
|
|
|
|
|
| |
truncation
While the AVX512 VTRUNCS/VTRUNCUS instructions require legal types, truncateVectorWithPACK handles cases with multiples of legal types through splitting/concatenation. So we just need to ensure that the src/dst scalar types are correct and leave truncateVectorWithPACK to handle the rest of it.
llvm-svn: 325127
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Instead of solving the hard problem of how to pass the callee to the indirect
jump thunk without a register, just use a CSR. At a call boundary, there's
nothing stopping us from using a CSR to hold the callee as long as we save and
restore it in the prologue.
Also, add tests for this mregparm=3 case. I wrote execution tests for
__llvm_retpoline_push, but they never got committed as lit tests, either
because I never rewrote them or because they got lost in merge conflicts.
Reviewers: chandlerc, dwmw2
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D43214
llvm-svn: 325049
|
| |
|
|
| |
llvm-svn: 325042
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Old syntax:
BUNDLE implicit-def %r0, implicit-def %r1, implicit %r2
* %r0 = SOME_OP %r2
* %r1 = ANOTHER_OP internal %r0
New syntax:
BUNDLE implicit-def %r0, implicit-def %r1, implicit %r2 {
%r0 = SOME_OP %r2
%r1 = ANOTHER_OP internal %r0
}
llvm-svn: 325032
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D43170
llvm-svn: 325030
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(not y))
Summary:
If the and has an additional use we shouldn't invert it. That creates an additional instruction.
While there add a one use check to the transform above that looked similar.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43225
llvm-svn: 325019
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the other input guarantees upper 32 bits are 0.
Summary: This gets the shift case from PR35792.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43222
llvm-svn: 325018
|
| |
|
|
|
|
|
|
|
|
|
| |
Change ARMConstantIslandPass to:
- accept f16 literals as litpool entries,
- if the litpool needs to be inserted in the middle of a big block, then we
need to 4-byte align the next instruction in ARM mode.
Differential Revision: https://reviews.llvm.org/D42784
llvm-svn: 325012
|
| |
|
|
|
|
|
|
| |
The bug has been lying dormant, but apparently was never exposed, until
after rL324941 because we didn't return the correct result
for shifts with undef operands.
llvm-svn: 325010
|
| |
|
|
|
|
| |
Using "void main" might be confusing for some cases.
llvm-svn: 324997
|
| |
|
|
|
|
|
|
| |
This addressing mode wasn't checked, so we were running in an assert.
Differential Revision: https://reviews.llvm.org/D43179
llvm-svn: 324996
|