| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
We were already forcing undef inputs to become a zero vector; this now catches an all-ones mask too.
Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today.
llvm-svn: 297651
|
| |
|
|
|
|
|
| |
We used to hit an unreachable in getRegBankFromRegClass when dealing with the
stack pointer. This commit adds support for the GPRsp reg class.
llvm-svn: 297621
|
| |
|
|
| |
llvm-svn: 297611
|
| |
|
|
|
|
|
|
|
| |
Loop over the ARM decode tables; this is a clean-up to reduce some code
duplication.
Differential Revision: https://reviews.llvm.org/D30814
llvm-svn: 297608
|
| |
|
|
|
|
| |
they can be correctly matched by EVEX2VEX table generation.
llvm-svn: 297601
|
| |
|
|
| |
llvm-svn: 297600
|
| |
|
|
| |
llvm-svn: 297599
|
| |
|
|
| |
llvm-svn: 297594
|
| |
|
|
|
|
| |
This allows us to remove a duplicate set of patterns.
llvm-svn: 297593
|
| |
|
|
|
|
| |
The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly.
llvm-svn: 297591
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently.
This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.
This patch moves those transforms under visitAdd/visitSub, so we're only creating sbb/adc when it
is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that
existing behavior in this patch.
Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files
where this transform still fires.
The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register
stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate
issue.
Differential Revision: https://reviews.llvm.org/D30611
llvm-svn: 297586
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A53 scheduler causes an assertion failure on all CRC instructions:
include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand
&llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i <
getNumOperands() && "getOperand() out of range!"' failed.
The case statements corresponding to CRC instructions are incorrect and should
be removed.
Also adding a testcase while on this.
Reviewers: t.p.northover, javed.absar, apazos, rengolin
Reviewed By: rengolin
Subscribers: evandro, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30274
llvm-svn: 297582
|
| |
|
|
|
|
|
|
|
|
| |
register during fast isel. This ends up extracting from bits 15:8 instead of the lower bits of the mask.
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.
I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again.
llvm-svn: 297574
|
| |
|
|
| |
llvm-svn: 297572
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte.
This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte.
Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate?
Differential Revision: https://reviews.llvm.org/D29841
llvm-svn: 297568
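The shift+truncate idea described above can be illustrated in plain C++. This is only a scalar model under stated assumptions, not the code the backend emits: `extractDword` and `extractByte` are hypothetical names, the vector is modelled as a 16-byte array, and the DWORD is assembled in little-endian order to mirror x86.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a MOVD-style access: read one 32-bit lane of a
// 16-byte vector, assembling it in little-endian byte order.
static uint32_t extractDword(const uint8_t (&vec)[16], unsigned dwordIdx) {
  assert(dwordIdx < 4 && "dword index out of range");
  uint32_t v = 0;
  for (unsigned b = 0; b < 4; ++b)
    v |= static_cast<uint32_t>(vec[dwordIdx * 4 + b]) << (8 * b);
  return v;
}

// Extract byte `idx` without a byte-wide extract instruction:
// fetch the containing DWORD, shift the byte down, then truncate.
static uint8_t extractByte(const uint8_t (&vec)[16], unsigned idx) {
  assert(idx < 16 && "byte index out of range");
  const uint32_t dword = extractDword(vec, idx / 4);
  return static_cast<uint8_t>(dword >> (8 * (idx % 4)));
}
```

Each extracted byte costs one lane read plus a shift, which is why extracting many bytes this way could bloat the code, as the message notes.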
|
| |
|
|
| |
llvm-svn: 297567
|
| |
|
|
| |
llvm-svn: 297563
|
| |
|
|
| |
llvm-svn: 297557
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.
This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.
Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")
Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 297556
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This method inverts the Reason field of a scheduling candidate.
It compares RegCritical and RegExcess correctly, but
everything else is broken. In fact, it can prefer a weaker reason
such as Weak over RegCritical because Weak > -RegCritical.
The CandReason enum is properly sorted, so just remove the artificial
ranking.
Differential Revision: https://reviews.llvm.org/D30557
llvm-svn: 297536
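A small model may help show why the inversion misbehaves. This is a hedged reconstruction, not the removed code: the enum below is an assumed subset in which a smaller value means a stronger reason, and `invertedRank`, `brokenIsStronger` and `isStronger` are hypothetical names.

```cpp
#include <cassert>

// Assumed subset of scheduling reasons, sorted so that a smaller value
// is a stronger reason.
enum CandReason { NoCand, Only1, PhysRegCopy, RegExcess, RegCritical,
                  Stall, Cluster, Weak };

// Hypothetical reconstruction of the artificial ranking: register-pressure
// reasons are negated, everything else keeps its enum value.
static int invertedRank(CandReason r) {
  return (r == RegExcess || r == RegCritical) ? -static_cast<int>(r)
                                              : static_cast<int>(r);
}

// Broken comparison: the larger inverted rank wins.
static bool brokenIsStronger(CandReason a, CandReason b) {
  return invertedRank(a) > invertedRank(b);
}

// Comparison after the fix: rely on the enum order directly.
static bool isStronger(CandReason a, CandReason b) {
  assert(a != NoCand && b != NoCand && "both candidates need a reason");
  return a < b;
}
```

Under these assumptions the broken comparison still orders RegExcess ahead of RegCritical, but it also ranks Weak (7) ahead of RegCritical (-4), which is the case the message calls out.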
|
| |
|
|
|
|
| |
Use Liveness::getNearestAliasedRef to find the reaching def instead.
llvm-svn: 297526
|
| |
|
|
|
|
|
|
| |
This function will find the closest ref node aliased to Reg that is
in an instruction preceding Inst. This could be used to identify the
hypothetical reaching def of Reg, if Reg was a member of Inst.
llvm-svn: 297524
|
| |
|
|
|
|
| |
This only requires a 64-bit memory source, not the whole 128 bits, but the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl.
llvm-svn: 297523
|
| |
|
|
| |
llvm-svn: 297521
|
| |
|
|
| |
llvm-svn: 297507
|
| |
|
|
|
|
|
|
| |
In order to make it easier to parse information about the performance of
MacroFusion, this patch adds the function and the instruction names to the
debug output of this pass.
llvm-svn: 297504
|
| |
|
|
|
|
|
|
| |
for SI
Differential Revision: https://reviews.llvm.org/D29674
llvm-svn: 297499
|
| |
|
|
|
|
|
|
| |
Patch by Guansong Zhang.
Differential Revision: https://reviews.llvm.org/D30750
llvm-svn: 297498
|
| |
|
|
|
|
|
|
|
|
|
|
| |
We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64).
This is another of the compile time issues identified in PR32037 (see also D30265).
This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target.
Differential Revision: https://reviews.llvm.org/D30780
llvm-svn: 297458
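The direct mask-and-insert approach can be sketched on a plain word array. This is an illustration only and not the APInt implementation: `Words` is a hypothetical stand-in for a wide integer stored as little-endian 64-bit words, and this `insertBits` takes a different, simplified signature that handles inserts of 1 to 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a wide integer: little-endian 64-bit words.
using Words = std::vector<uint64_t>;

// Write the low `width` bits of `sub` into `target` starting at bit
// `bitPos`, touching at most two words and allocating no temporaries.
static void insertBits(Words &target, uint64_t sub, unsigned width,
                       unsigned bitPos) {
  assert(width >= 1 && width <= 64 && "sketch handles 1..64-bit inserts");
  assert(bitPos + width <= target.size() * 64 && "insert out of range");
  const uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
  sub &= mask;
  const unsigned word = bitPos / 64;
  const unsigned shift = bitPos % 64;
  // Clear the destination bits in the low word, then OR in the new bits.
  target[word] = (target[word] & ~(mask << shift)) | (sub << shift);
  // If the field straddles a word boundary, finish it in the next word.
  if (shift != 0 && shift + width > 64) {
    const unsigned spill = 64 - shift; // bits that fit in the low word
    target[word + 1] =
        (target[word + 1] & ~(mask >> spill)) | (sub >> spill);
  }
}
```

The contrast with the shift/mask approach described above is that no full-width temporary is built: only the affected words are read and written.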
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch teaches the MIPS backend to accept more values for constant
splats. Previously, only 10-bit signed immediates or values that could be
loaded using an ldi.[bhwd] instruction would be accepted. This patch relaxes
that constraint so that any constant value that can be splatted is accepted.
As a result, the constant pool is used less for vector operations, and the
suite of bit manipulation instructions b(clr|set|neg)i can now be used with
the full range of their immediate operand.
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D30640
llvm-svn: 297457
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
(defined in ARMInstrInfo.td)
Reviewers: grosbach, rengolin, jmolloy
Reviewed By: jmolloy
Subscribers: aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D30782
llvm-svn: 297456
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE].
Summary:
This allows for some simplification because the combines
are no longer limited to just one go at the node before
it gets legalized into an ARM target-specific one.
Reviewers: jmolloy, rogfer01
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30401
llvm-svn: 297453
|
| |
|
|
|
|
|
|
|
|
|
|
| |
same as already done for ARM and Thumb2.
Reviewers: jmolloy, rogfer01, efriedma
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30400
llvm-svn: 297443
|
| |
|
|
| |
llvm-svn: 297420
|
| |
|
|
|
|
|
|
| |
- Fix the insertion point, which occasionally could have been incorrect.
- Avoid creating multiple bitsplits with the same operands, if an old one
could be reused.
llvm-svn: 297414
|
| |
|
|
|
|
| |
Extract individual transformations into their own functions.
llvm-svn: 297401
|
| |
|
|
| |
llvm-svn: 297393
|
| |
|
|
|
|
|
| |
(op ... (zext i1 c) ...) -> (select c (op ... 1 ...),
(op ... 0 ...))
llvm-svn: 297391
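The rewrite rule above relies on a zero-extended i1 being either 0 or 1, so the operation can be evaluated once per constant and the result selected. A minimal scalar sketch of that equivalence follows; `viaZext`, `viaSelect` and `formsAgree` are hypothetical names, and the binary operation is passed in as a callable.

```cpp
#include <cassert>
#include <functional>

// Original form: the operand is the zero-extended condition (0 or 1).
template <typename Op>
static int viaZext(Op op, int x, bool c) {
  return op(x, static_cast<int>(c));
}

// Rewritten form: one copy of the op per constant, chosen by a select.
template <typename Op>
static int viaSelect(Op op, int x, bool c) {
  return c ? op(x, 1) : op(x, 0);
}

// Both forms should agree for either value of the condition.
template <typename Op>
static bool formsAgree(Op op, int x) {
  return viaZext(op, x, false) == viaSelect(op, x, false) &&
         viaZext(op, x, true) == viaSelect(op, x, true);
}
```

After the rewrite, each arm of the select has a constant operand, which is what gives later simplifications something to fold.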
|
| |
|
|
|
|
|
|
| |
(PR32037).
If the constants are already the correct size, we can copy them directly into the shuffle mask.
llvm-svn: 297381
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fix introduces segfaults and clobbers the value to be stored when
the atomic sequence loops.
Revert "[Target/MIPS] Kill dead code, no functional change intended."
This reverts commit r296153.
Revert "Recommit "[mips] Fix atomic compare and swap at O0.""
This reverts commit r296134.
llvm-svn: 297380
|
| |
|
|
|
|
|
|
|
| |
Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added an MC test for
vcmp that was actually missing.
Differential Revision: https://reviews.llvm.org/D30745
llvm-svn: 297376
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a machine verifier issue where an instruction was using an invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
marked as killed earlier.
This partially resolves PR/27458
Thanks to Quentin Colombet for reporting the issue!
llvm-svn: 297372
|
| |
|
|
|
|
|
|
|
|
|
|
| |
vectorized.
Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D30719
llvm-svn: 297328
|
| |
|
|
|
|
|
|
| |
When extracting a bitfield from the high register in a register pair,
the final offset should be relative to the high register (for 32-bit
extracts).
llvm-svn: 297288
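The offset adjustment described above can be modelled with a pair of 32-bit halves. This is a sketch under assumptions rather than the backend code: `RegPair` and `extractFromPair` are hypothetical names, and the field is assumed to lie entirely within one half of the pair.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit register pair as two 32-bit halves.
struct RegPair {
  uint32_t lo;
  uint32_t hi;
};

// Extract `width` bits starting at `offset` (counted from bit 0 of the
// pair) using a 32-bit extract on whichever half holds the field.
static uint32_t extractFromPair(RegPair p, unsigned offset, unsigned width) {
  assert(width >= 1 && width <= 32 && "32-bit extract only");
  assert(offset + width <= 64 && "field out of range");
  assert(offset / 32 == (offset + width - 1) / 32 &&
         "field must not straddle the two halves");
  const bool inHigh = offset >= 32;
  const uint32_t src = inHigh ? p.hi : p.lo;
  // Rebase the offset so it is relative to the register actually read.
  const unsigned rel = inHigh ? offset - 32 : offset;
  const uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1);
  return (src >> rel) & mask;
}
```

Using the pair-relative offset unchanged on the high half would shift by 32 or more on a 32-bit value, so the rebasing step is the part that matters.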
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: By using reg_nodbg_empty() to determine if a function can be
treated as a leaf function or not, we miss the case when the register
pair L0_L1 is used but not L0 by itself. This has the effect that
use_all_i32_regs(), a test in reserved-regs.ll which tries to use all
registers, gets treated as a leaf function.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: davide, RKSimon, sepavloff, llvm-commits
Differential Revision: https://reviews.llvm.org/D27089
llvm-svn: 297285
|
| |
|
|
|
|
|
|
| |
elements wide
By defining the mask types as SmallVector<int, 16> we were causing a lot of unnecessary heap usage.
llvm-svn: 297267
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
reduce stack frame size""
After inspection, it turned out to be UB in our code base: someone cast a var-arg
function pointer to a non-var-arg one. :/
Re-commit r296771 to continue testing on the patch.
Sorry for the trouble!
llvm-svn: 297256
|
| |
|
|
|
|
|
|
|
|
|
|
| |
isImageReadWrite calls
This duplicates the isImage() function in NVPTXUtilities.cpp.
Patch by Briana Grace!
Differential Revision: https://reviews.llvm.org/D30706
llvm-svn: 297252
|
| |
|
|
|
|
|
|
|
|
| |
If there is only one successor, and that successor has only
one predecessor, the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.
llvm-svn: 297251
|
| |
|
|
|
|
|
| |
When doing arcp optimization with a constant denominator,
this was leaving behind rcps with constant inputs.
llvm-svn: 297248
|