| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
before a later use.
The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code.
A later piece of code swaps the operands and changing the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands.
To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes are enough to fix the issue, but doing both to make this code very safe.
I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that.
Differential Revision: https://reviews.llvm.org/D71736
|
|
|
|
|
|
|
|
|
|
|
|
| |
for STRICT_FCMP. NFCI
The only thing its getting from the X86TargetLowering class is
the subtarget which we can easily pass. This function only has
one call site now since this might help the compiler inline it.
Explicitly return both the flag result and the chain result for
STRICT_FCMP nodes. This removes an assumption in the caller that
getValue(1) is the right way to get the chain.
|
|
|
|
|
|
|
|
|
| |
constant and calling EmitCmp. NFCI
EmitCmp will just immediately call EmitTest and discard the null
constant only to have EmitTest create it again if it doesn't fold.
So just skip all that and go directly to EmitTest.
|
|
|
|
|
|
| |
Recommit after making the same API change in non-x86 targets. This has been build for all targets, and tested for effected ones. Why the difference? Because my disk filled up when I tried make check for all.
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
|
|
|
|
|
|
| |
as it broke the aarch64 build.
This reverts commit bc7595d934b958ab481288d7b8e768fe5310be8f.
|
|
|
|
| |
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
|
|
|
|
| |
This is in advance of assembler padding directives support where we'll need to bundle the label w/the corresponding faulting instruction to avoid padding being inserted between.
|
|
|
|
|
|
|
|
| |
The inconsistency caused uops mode to fail on an older version of libpfm
since the dispatched_port was added as an alias for executed_port only
after v4.6.0 of libpfm.
Differential revision: https://reviews.llvm.org/D71665
|
|
|
|
|
|
| |
This patch is mainly for custom lowering the vector operation.
Differential Revision: https://reviews.llvm.org/D71592
|
|
|
|
|
|
|
|
| |
operations from being folded into masked instructions.
We really need to update the isel patterns to prevent this, but
that requires some tablegen de-tangling. So this hack will work
for correctness in the short term.
|
|
|
|
|
|
|
| |
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.
Differential Revision: https://reviews.llvm.org/D71610
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add strict fma support
Reviewers: craig.topper, RKSimon, LiuChen3
Subscribers: hiraditya, llvm-commits, LuoYuanke
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71604
|
|
|
|
| |
improve readability. NFC
|
|
|
|
|
|
|
| |
This reverts commit aa5ee8f244441a8ea103a7e0ed8b6f3e74454516.
This change broke the sanitizer buildbots. See comments at the patchset
(https://reviews.llvm.org/D71360) for more information.
|
|
|
|
|
|
|
|
|
| |
of integers to floating point.
This includes some of Craig Topper's changes for promotion support from
D71130.
Differential Revision: https://reviews.llvm.org/D69275
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes an assertion failure that triggers inside
getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr
that is not a memory operation.
Different backends implement getMemOperandWithOffset differently: some
return false on non-memory MachineInstrs, others assert.
The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck
relies on getMemOperandWithOffset to return false on non-memory
MachineInstrs, instead of asserting.
This patch updates the documentation on getMemOperandWithOffset that it
should return false on any MachineInstr it cannot handle, instead of
asserting. It also adapts the in-tree backends accordingly where
necessary.
Differential Revision: https://reviews.llvm.org/D71359
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently -fuse-init-array option is not effective when target triple
does not specify os, on x86,x86_64.
i.e.
// -fuse-init-array is not honored.
$ clang -target i386 -fuse-init-array test.c -S
// -fuse-init-array is honored.
$ clang -target i386-linux -fuse-init-array test.c -S
This patch fixes first case.
And does cleanup.
Reviewers: rnk, craig.topper, fhahn, echristo
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D71360
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).
In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
Committing this change will significantly reduce our merge conflicts
for each upstream merge.
Reviewers: spatel, bogner
Reviewed By: bogner
Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70917
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.
Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.
Split off from D71320
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D71381
|
|
|
|
|
|
|
| |
from getNegatedExpression()"
This reverts commit d1f0bdf2d2df9bdf11ee2ddfff3df50e53f2f042.
The patch can cause infinite loops in DAGCombiner.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getNegatedExpression()
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets
to show the problem more generally.
We check the number of uses as a predicate for whether some
value is free to negate, but that use count can change as we
rewrite the expression in getNegatedExpression(). So something
that was marked free to negate during the cost evaluation
phase becomes not free to negate during the rewrite phase (or
the inverse - something that was not free becomes free).
This can lead to a crash/assert because we expect that
everything in an expression that is negatible to be handled
in the corresponding code within getNegatedExpression().
This patch skips the use check during the rewrite phase.
So we determine that some expression isNegatibleForFree
(identically to without this patch), but during the rewrite,
don't rely on use counts to decide how to create the optimal
expression.
Differential Revision: https://reviews.llvm.org/D70975
|
|
|
|
| |
FixupLEAPass::processInstrForSlow3OpLEA.
|
|
|
|
|
|
| |
under min-legal-vector-width=256
This is an improvement to 88dacbd43625cf7aad8a01c0c3b92142c4dc0970
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision.
Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70582
|
|
|
|
|
|
|
|
|
|
| |
min-legal-vector-width=256. Scalarize v64i1 arguments and shuffles under min-legal-vector-width=256.
This reverts 3e1aee2ba717529b651a79ed4fc7e7147358043f in favor
of a different approach.
Scalarizing isn't great codegen, but making the type illegal was
interfering with k constraint in inline assembly.
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D71184
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Split off of D67120.
Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).
A second try after reverted D71072.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71149
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.
This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.
This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.
This also fixes situations in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.
This also allows us to handle cases like this:
$ebx = [...]
$rdi = MOVSX64rr32 $ebx
$esi = MOV32rr $edi
CALL64pcrel32 @call
The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.
This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.
I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: djtodoro, vsk
Subscribers: ormris, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D70431
|
|
|
|
|
| |
This reverts commit 3cd93a4efcdeabeb20cb7bec9fbddcb540d337a1.
I'll recommit with a well-formatted arcanist commit message.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.
This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.
This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.
This also fixes a case in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.
This also allows us to handle cases like this:
$ebx = [...]
$rdi = MOVSX64rr32 $ebx
$esi = MOV32rr $edi
CALL64pcrel32 @call
The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.
This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.
I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
|
|
|
|
|
|
|
|
| |
The xor'ing behaviour is only used for msvc/crt environments, when we're targeting
macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing
when targeting win32-macho to be consistent.
Differential Revision: https://reviews.llvm.org/D71095
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:
// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)
Instead, I'd suggest to use the following sequence:
// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).
In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.
There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)
Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper
Differential Revision: https://reviews.llvm.org/D67105
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
musttail calls should not require allocating extra stack for arguments.
Updates to arguments passed in memory should happen in place before the
epilogue.
This bug was mostly a missed optimization, unless inalloca was used and
store to push conversion fired.
If a reserved call frame was used for an inalloca musttail call, the
call setup and teardown instructions would be deleted, and SP
adjustments would be inserted in the prologue and epilogue. You can see
these are removed from several test cases in this change.
In the case where the stack frame was not reserved, i.e. call frame
optimization fires and turns argument stores into pushes, then the
imbalanced call frame setup instructions created for inalloca calls
become a problem. They remain in the instruction stream, resulting in a
call setup that allocates zero bytes (expected for inalloca), and a call
teardown that deallocates the inalloca pack. This deallocation was
unbalanced, leading to subsequent crashes.
Reviewers: hans
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71097
|
|
|
|
|
|
| |
This reverts commit 9a0b5e14075a1f42a72eedb66fd4fde7985d37ac.
This seems to break buildbots.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up to D70607 where we made any
extract element on SLM more costly than default. But that is
pessimistic for extract from element 0 because that corresponds
to x86 movd/movq instructions. These generally have >1 cycle
latency, but they are probably implemented as single uop
instructions.
Note that no vectorization tests are affected by this change.
Also, no targets besides SLM are affected because those are
falling through to the default cost of 1 anyway. But this will
become visible/important if we add more specializations via cost
tables.
Differential Revision: https://reviews.llvm.org/D71023
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Split off of D67120.
Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71072
|
|
|
|
|
|
|
|
|
|
| |
explicitly return the chain instead of calling getValue on the single SDValue.
We shouldn't assume that the returned result can be used to get
the other result.
This is prep-work for strict FP where we will also need to pass
the chain result along in more cases.
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D68757
|
|
|
|
|
|
| |
single feature flag covers the two places they were used.
Differential Revision: https://reviews.llvm.org/D71048
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds forward iterators mc_difflist_iterator,
mc_subreg_iterator and mc_superreg_iterator, based on the existing
DiffListIterator. Those are used to provide iterator ranges over
sub- and super-register from TRI, which are slightly more convenient
than the existing MCSubRegIterator/MCSuperRegIterator. Unfortunately,
it duplicates a bit of functionality, but the new iterators are a bit
more convenient (and can be used with various existing iterator
utilities) and should probably replace the old iterators in the future.
This patch updates some existing users.
Reviewers: evandro, qcolombet, paquette, MatzeB, arsenm
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D70565
|
|
|
|
|
|
| |
manual and add function isMacroFused
Differential Revision: https://reviews.llvm.org/D70999
|
|
|
|
|
|
|
|
|
|
| |
I suspect this became unnecessary after r354161. Prior to that
we may have been going through the default expansion of FP_TO_UINT
on 64-bit targets and then ending up back in Custom X86 handling
to handle the FP_TO_SINT for it. Now we just Custom handle the
FP_TO_UINT directly. We already need to handle it for 32-bit mode
during type legalization so we wouldn't save any code by using
the default expansion on 64-bit.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This follows a previous patch that changes the X86 datalayout to represent
mixed size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces
(https://reviews.llvm.org/D64931)
This patch implements the address space cast lowering to the corresponding
sign extension, zero extension, or truncate instructions.
Related to https://bugs.llvm.org/show_bug.cgi?id=42359
Reviewers: rnk, craig.topper, RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69639
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is a follow-up of D70881. It models DAZ and FTZ for releated instructions.
Reviewers: craig.topper, RKSimon, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70938
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Model MXCSR for all AVX512 instructions
Reviewers: craig.topper, RKSimon, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70881
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Model MXCSR for AVX instructions other than AVX512
Reviewers: craig.topper, RKSimon
Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70875
|
|
|
|
| |
comi/ucomi/cvtss2si/cvtsd2si/cvttss2si/cvttsd2si/cvtsi2ss/cvtsi2sd instructions.
|