| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
v4i32->v4f32 under avx512.
With avx512vl we get v4i32->v4f32 uint_to_fp instructions. With
avx512f we get v16i32->v16f32 instructions which we can use to
emulate v4i32->v4f32.
|
| |
|
|
|
|
|
|
|
| |
Intrinsic has incorrect argument type!
i32 (i32*)* @llvm.setjmp
*wipes tear*
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
avx512vl. Similar for vXi64 on avx512dq without avx512vl.
Summary:
Previously we did this with isel patterns that used garbage in
the widened part of the source. But that's not valid for strictfp.
So now we custom widen and use zeroes for the widened elemens for
strictfp.
This replaces D71864.
Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3
Reviewed By: pengfei
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71879
|
|
|
|
|
| |
For non-strict, generic type legalization will take care of this,
but that doesn't happen currently for strict nodes.
|
| |
|
|
|
|
|
|
|
|
| |
i686-pc-windows-msvc to match what we do for non-strict.
The float libcalls are inlined in MSVC's math header where they
just cast to double and use the double libcall. Do the same when
we emit libcalls.
|
|
|
|
|
|
|
|
| |
I believe the algorithm we use for non-strict is exception safe
for strict. The fsub won't generate any exceptions. After it we
will have an exact version of the i32 integer in a double. Then
we just round it to f32. That rounding will generate a precision
exception if it can't be represented exactly.
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D71892
|
| |
|
|
|
|
|
|
|
|
| |
avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl.
Previously we widened these through isel patterns, but that
didn't work for STRICT_ nodes. Those need to be padded with
zeroes in the upper bits which is harder to do in isel patterns.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
but not AVX512VL.
Previously we were widening with isel patterns, but that wasn't
exception safe for strict FP. So now we widen to v4i32->v4f64
during type legalization. And then let op legalization further
widen to v8i32->v8f64.
The vec_int_to_fp.ll changes are caused by us no longer narrowing
extracts of strict_uint_to_fp to the v4i32->v2f64 instruction
without AVX512VL only to have isel rewiden it. Now we just keep
it wide throughout. So we don't have an opportunity to narrow
the load.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
but not avx512vl.
AVX512F added instruction for vector fp_to_uint conversions. With
AVX512VL we can use a specific instruction that does v2f64->v4i32 with
zeroes in the 2 extra elements. For non-strict nodes without AVX512VL
we relied on type legalization to turn it to v4f64->v4i32 which would
later be widened by op legalization to v8f64->v8i32. But type legalization
doesn't currently widen strict nodes since it doesn't know how to
safely and efficiently pad the extra elements. But for X86 we know
padding with zeroes is safe and efficient so do that ourselves.
|
|
|
|
| |
the AVX512DQ+AVX512VL code is very similar in both. NFC
|
|
|
|
|
|
|
| |
strict_sint_to_fp/strict_uint_to_fp for avx512dq+avx512vl targets.
With avx512dq+avx512vl we have instruction that implements this and
places zeroes in the upper 64-bits of the destination xmm register.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend
Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits, LuoYuanke
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71871
|
|
|
|
|
|
|
|
| |
v2i1->v2f64 on targets with AVX512F, but not AVX512VL.
In the worst case, this requires a 128-bit move instruction to
implicitly zero the upper bits. In the common case, we should
recognize the producing instruction already zeroed the upper bits.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
we're scanning the basic block instead of looking back for it.
Summary:
We're already scanning forward through the basic block. Might as
well just remember eflags defs instead of doing a bounded search
backwards later.
Based on a comment in D71841.
Reviewers: RKSimon, spatel, uweigand
Reviewed By: uweigand
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71865
|
|
|
|
| |
STRICT_CVTTP2SI/STRICT_CVTTP2UI nodes. NFC
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D71867
|
|
|
|
| |
EFLAGS use/def instead of our own custom operand scan. NFCI
|
|
|
|
|
|
|
|
| |
It was added in rL238619.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D71862
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
targets with avx512dq and avx512vl instructions.
On 32-bit targets we can't use the scalar instruction so we
insert the scalar into a vector and use packed conversions.
Previously we used either v4f32->v4i64 or v4f64->v4i64 to avoid
some complexity creating target specific ISD opcodes for
v4f32->v2i64. But this causes extra vzeroupper instructions and
possibly frequency throttling on Intel CPUs.
This patch changes this to create a 128-bit vector and uses a
target specific ISD opcode if needed.
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D71850
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix several several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:
- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
compare can raise exceptions and therefore needs to be a strict compare.
I've made it signaling (even though quiet would also be correct) as
signaling is the more usual default for an LT. This code exists both
in common code and in the X86 target.
- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
that ends up not chosen. I've fixed the algorithm to use only a single
STRICT_SINT_TO_FP instead.
- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
the wrong thing because it calls getOperationAction using the result VT.
But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to
be called using the operand VT.
- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.
Reviewed by: craig.topper
Differential Revision: https://reviews.llvm.org/D71840
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
extract_subvector(bitcast()) support
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.
Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0
Patch by: @RKSimon (Simon Pilgrim)
Differential Revision: https://reviews.llvm.org/D63815
|
|
|
|
|
|
| |
1. Remove duplicate function for class name at the beginning of the
comment.
2. Use auto where the type is already obvious from the context.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
assembly.
Summary:
This is documented as the appropriate template modifier for call operands.
Fixes PR44272, and adds a regression test.
Also adds support for operand modifiers in Intel-style inline assembly.
Reviewers: rnk
Reviewed By: rnk
Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71677
|
|
|
|
|
|
|
| |
Demote member functions to static functions where possible
Use early continue/early return to reduce nesting
Clarify comments slightly.
Reuse previously define expression in one case.
|
|
|
|
| |
Should have caught this in review, but only noticed when addressing post commit style items. We were creating a new instance of the X86MCInstrInfo class, and then never reclaiming the memory. This wasn't even conditional on the new off by default flags, so it was an unconditional leak.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
WARNING: If you're looking at this patch because you're looking for a full
performace mitigation of the Intel JCC Erratum, this is not it!
This is a preliminary patch on the patch towards mitigating the performance
regressions caused by Intel's microcode update for Jump Conditional Code
Erratum. For context, see:
https://www.intel.com/content/www/us/en/support/articles/000055650.html
The patch adds the required assembler infrastructure and command line options
needed to exercise the logic for INTERNAL TESTING. These are NOT public flags,
and should not be used for anything other than LLVM's own testing/debugging
purposes. They are likely to change both in spelling and meaning.
WARNING: This patch is knowingly incorrect in some cornercases. We need, and
do not yet provide, a mechanism to selective enable/disable the padding.
Conversation on this will continue in parellel with work on extending this
infrastructure to support prefix padding.
The goal here is to have the assembler align specific instructions such that
they neither cross or end at a 32 byte boundary. The impacted instructions are:
a. Conditional jump.
b. Fused conditional jump.
c. Unconditional jump.
d. Indirect jump.
e. Ret.
f. Call.
The new options for llvm-mc are:
-x86-align-branch-boundary=NUM aligns branches within NUM byte boundary.
-x86-align-branch=TYPE[+TYPE...] specifies types of branches to align.
A new MCFragment type, MCBoundaryAlignFragment, is added, which may emit
NOP to align the fused/unfused branch.
alignBranchesBegin inserts MCBoundaryAlignFragment before instructions,
alignBranchesEnd marks the end of the branch to be aligned,
relaxBoundaryAlign grows or shrinks sizes of NOP to align the target branch.
Nop padding is disabled when the instruction may be rewritten by the linker,
such as TLS Call.
Process Note: I am landing a patch by skan as it has been LGTMed, and
continuing to iterate on the review is simply slowing us down at this point.
We can and will continue to iterate in tree.
Patch By: skan
Differential Revision: https://reviews.llvm.org/D70157
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
before a later use.
The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code.
A later piece of code swaps the operands and changing the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands.
To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes are enough to fix the issue, but doing both to make this code very safe.
I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that.
Differential Revision: https://reviews.llvm.org/D71736
|
|
|
|
|
|
|
|
|
|
|
|
| |
for STRICT_FCMP. NFCI
The only thing its getting from the X86TargetLowering class is
the subtarget which we can easily pass. This function only has
one call site now since this might help the compiler inline it.
Explicitly return both the flag result and the chain result for
STRICT_FCMP nodes. This removes an assumption in the caller that
getValue(1) is the right way to get the chain.
|
|
|
|
|
|
|
|
|
| |
constant and calling EmitCmp. NFCI
EmitCmp will just immediately call EmitTest and discard the null
constant only to have EmitTest create it again if it doesn't fold.
So just skip all that and go directly to EmitTest.
|
|
|
|
|
|
| |
Recommit after making the same API change in non-x86 targets. This has been build for all targets, and tested for effected ones. Why the difference? Because my disk filled up when I tried make check for all.
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
|
|
|
|
|
|
| |
as it broke the aarch64 build.
This reverts commit bc7595d934b958ab481288d7b8e768fe5310be8f.
|
|
|
|
| |
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
|
|
|
|
| |
This is in advance of assembler padding directives support where we'll need to bundle the label w/the corresponding faulting instruction to avoid padding being inserted between.
|
|
|
|
|
|
|
|
| |
The inconsistency caused uops mode to fail on an older version of libpfm
since the dispatched_port was added as an alias for executed_port only
after v4.6.0 of libpfm.
Differential revision: https://reviews.llvm.org/D71665
|
|
|
|
|
|
| |
This patch is mainly for custom lowering the vector operation.
Differential Revision: https://reviews.llvm.org/D71592
|
|
|
|
|
|
|
|
| |
operations from being folded into masked instructions.
We really need to update the isel patterns to prevent this, but
that requires some tablegen de-tangling. So this hack will work
for correctness in the short term.
|
|
|
|
|
|
|
| |
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.
Differential Revision: https://reviews.llvm.org/D71610
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add strict fma support
Reviewers: craig.topper, RKSimon, LiuChen3
Subscribers: hiraditya, llvm-commits, LuoYuanke
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71604
|
|
|
|
| |
improve readability. NFC
|
|
|
|
|
|
|
| |
This reverts commit aa5ee8f244441a8ea103a7e0ed8b6f3e74454516.
This change broke the sanitizer buildbots. See comments at the patchset
(https://reviews.llvm.org/D71360) for more information.
|
|
|
|
|
|
|
|
|
| |
of integers to floating point.
This includes some of Craig Topper's changes for promotion support from
D71130.
Differential Revision: https://reviews.llvm.org/D69275
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes an assertion failure that triggers inside
getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr
that is not a memory operation.
Different backends implement getMemOperandWithOffset differently: some
return false on non-memory MachineInstrs, others assert.
The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck
relies on getMemOperandWithOffset to return false on non-memory
MachineInstrs, instead of asserting.
This patch updates the documentation on getMemOperandWithOffset that it
should return false on any MachineInstr it cannot handle, instead of
asserting. It also adapts the in-tree backends accordingly where
necessary.
Differential Revision: https://reviews.llvm.org/D71359
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently -fuse-init-array option is not effective when target triple
does not specify os, on x86,x86_64.
i.e.
// -fuse-init-array is not honored.
$ clang -target i386 -fuse-init-array test.c -S
// -fuse-init-array is honored.
$ clang -target i386-linux -fuse-init-array test.c -S
This patch fixes first case.
And does cleanup.
Reviewers: rnk, craig.topper, fhahn, echristo
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D71360
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).
In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
Committing this change will significantly reduce our merge conflicts
for each upstream merge.
Reviewers: spatel, bogner
Reviewed By: bogner
Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70917
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
|