| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
conservative (PR44539)
Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539.
As discussed there, this pass makes some overly optimistic
assumptions, as it does not have access to actual branch weights.
This patch makes the computation of the depth of the optimized cmov
more conservative, by assuming a distribution of 75/25 rather than
50/50 and placing the weights to get the more conservative result
(larger depth). The fully conservative choice would be
std::max(TrueOpDepth, FalseOpDepth), but that would break at least
one existing test (which may or may not be an issue in practice).
Differential Revision: https://reviews.llvm.org/D74155
(cherry picked from commit 5eb19bf4a2b0c29a8d4d48dfb0276f096eff9bec)
|
|
|
|
|
|
|
|
|
|
|
|
| |
X86 uses i8 for shift amounts. This code can fail on a 32-bit target
if it runs after type legalization.
This code was copied from AArch64 and modified for X86, but the
shift amount wasn't changed to the correct type for X86.
Fixes PR44812
(cherry picked from commit ec9a94af4d5fb3270f2451fcbec5a3a99f4ac03a)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to D73680 (AArch64 BTI).
A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR.
Also, add 32-bit tests and test that .cfi_startproc is at the function
entry. The line information has a general implementation and is tested
by AArch64/patchable-function-entry-empty.mir
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D73760
(cherry picked from commit 8ff86fcf4c038c7156ed4f01e7ed35cae49489e2)
|
|
|
|
|
|
|
|
|
| |
Long `cl::value_desc()` is added right after the flag name,
before `cl::desc()` column. And thus the `cl::desc()` column,
for all flags, is padded to the right,
which makes the output unreadable.
(cherry picked from commit 70cbf8c71c510077baadcad305fea6f62e830b06)
|
|
|
|
|
|
|
|
|
|
| |
We were performing an emulated i32->f64 in the SSE registers, then
storing that value to memory and doing a extload into the X87
domain.
After this patch we'll now just store the i32 to memory along
with an i32 0. Then do a 64-bit FILD to f80 completely in the X87
unit. This matches what we do without SSE.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
|
|
|
|
|
|
|
|
|
|
|
|
| |
This flag was originally part of D70157, but was removed as we carved away pieces of the review. Since we have the nop support checked in, and it appears mature(*), I think it's time to add the master flag. For now, it will default to nop padding, but once the prefix padding support lands, we'll update the defaults.
(*) I can now confirm that downstream testing of the changes which have landed to date - nop padding and compiler support for suppressions - is passing all of the functional testing we've thrown at it. There might still be something lurking, but we've gotten enough coverage to be confident of the basic approach.
Note that the new flag can be used either when assembling an .s file, or when using the integrated assembler directly from the compiler. The later will use all of the suppression mechanism and should always generate correct code. We don't yet have assembly syntax for the suppressions, so passing this directly to the assembler w/a raw .s file may result in broken code. Use at your own risk.
Also note that this isn't the wiring for the clang option. I think the most recent review for that is D72227, but I've lost track, so that might be off.
Differential Revision: https://reviews.llvm.org/D72738
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pass small FP values in GPRs or stack memory according the the normal
convention. This is what gcc -mno-sse does on Win64.
I adjusted the conditions under which we emit an error to check if the
argument or return value would be passed in an XMM register when SSE is
disabled. This has a side effect of no longer emitting an error for FP
arguments marked 'inreg' when targetting x86 with SSE disabled. Our
calling convention logic was already assigning it to FP0/FP1, and then
we emitted this error. That seems unnecessary, we can ignore 'inreg' and
compile it without SSE.
Reviewers: jyknight, aemerson
Differential Revision: https://reviews.llvm.org/D70465
|
|
|
|
| |
The extload on X87 is free.
|
|
|
|
|
|
|
|
|
|
|
|
| |
mode i64->f32/f64/f80 uint_to_fp algorithm.
This allows us to generate better code for selecting the fixup
to load.
Previously when the sign was set we had to load offset 0. And
when it was clear we had to load offset 4. This required a testl,
setns, zero extend, and finally a mul by 4. By switching the offsets
we can just shift the sign bit into the lsb and multiply it by 4.
|
|
|
|
|
|
|
|
| |
lowerUINT_TO_FP_vXi32 to avoid double loads seen in D71971
By directly emitting the constants as a constant pool load we seem to avoid the build_vector/extract_subvector combines that resulted in the duplicate loads we had before.
Differential Revision: https://reviews.llvm.org/D72307
|
|
|
|
| |
the load folding tables./
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
readPrefixes() assumes insn->bytes is non-empty. The code path is not
exercised in llvm-mc because llvm-mc does not feed empty input to
MCDisassembler::getInstruction().
This bug is uncovered by a5994c789a2982a770254ae1607b5b4cb641f73c.
An empty string did not crash before because the deleted regionReader()
allowed UINT64_C(-1) as insn->readerCursor.
Bytes.size() <= Address -> R->Base
0 <= UINT64_C(-1) - UINT32_C(-1)
|
| |
|
|
|
|
|
| |
At least one of these is used without a Glue. This doesn't seem
to change the X86GenDAGISel.inc output so maybe it doesn't matter?
|
|
|
|
|
|
| |
Only perform this if we are shuffling lower and upper lane elements across the lanes (otherwise splitting to lower xmm shuffles would be better).
This is a regression if we shuffle build_vectors due to getVectorShuffle canonicalizing 'blend of splat' build vectors, for now I've set this not to shuffle build_vector nodes at all to avoid this.
|
|
|
|
|
|
| |
elements of the lane mask.
Fixes an cyclic dependency issue with an upcoming patch where getVectorShuffle canonicalizes masks with splat build vector sources.
|
|
|
|
| |
X86Disassembler.cpp and refactor
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
STRICT_FSETCC/STRICT_FSETCCS nodes.
This causes the STRICT_FSETCC/STRICT_FSETCCS nodes to lowered
early while lowering SELECT, but the output chain doesn't get
connected. Then we visit the node again when it is its turn
because we haven't replaced the use of the chain result. In the
case of the fp128 libcall lowering, after D72341 this will cause
the libcall to be emitted twice.
|
|
|
|
| |
TargetLowering::expandFP_TO_UINT and X86TargetLowering::FP_TO_INTHelper.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.
If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
|
|
|
|
|
|
|
| |
llvm-objdump -d on clang is decreased from 7.8s to 7.4s.
The improvement is likely due to the elimination of logger setup and
dbgprintf(), which has a large overhead.
|
|
|
|
| |
llvm-objdump -d on clang is decreased from 8.2s to 7.8s.
|
|
|
|
| |
during PreprocessISelDAG to remove some duplicate isel patterns.
|
|
|
|
|
|
| |
The primary motivation of this change is to bring the code more closely in sync behavior wise with the assembler's version of nop emission. I'd like to eventually factor them into one, but that's hard to do when one has features the other doesn't.
The longest encodeable nop on x86 is 15 bytes, but many processors - for instance all intel chips - can't decode the 15 byte form efficiently. On those processors, it's better to use either a 10 byte or 11 byte sequence depending.
|
| |
|
| |
|
|
|
|
| |
The generic saturated math opcodes are no longer widened inside X86TargetLowering
|
|
|
|
|
|
|
|
| |
Add initial support for lowering v4f64 shuffles to SHUFPD(VPERM2F128(V1, V2), VPERM2F128(V1, V2)), eventually this could be used for v8f32 (and maybe v8f64/v16f32) but I'm being conservative for the initial implementation as only v4f64 can always succeed.
This currently is only called from lowerShuffleAsLanePermuteAndShuffle so only gets used for unary shuffles, and we limit this to cases where we use upper elements as otherwise concating 2 xmm shuffles is probably the better case.
Helps with poor shuffles mentioned in D66004.
|
|
|
|
| |
now that we don't mutate strict fp nodes. NFC
|
|
|
|
|
|
| |
For X87<->SSE conversions, the SSE type is always smaller than
the X87 type. So we can always use the smallest type for the
memory type.
|
|
|
|
|
|
|
|
|
|
|
| |
strict_fp_round into stack operations.
We use the stack for X87 fp_round and for moving from SSE f32/f64 to
X87 f64/f80. Or from X87 f64/f80 to SSE f32/f64.
Note for the SSE<->X87 conversions the conversion always happens in the
X87 domain. The load/store ops in the X87 instructions are able
to signal exceptions.
|
| |
|
|
|
|
| |
simplify some code. NFCI
|
|
|
|
|
|
|
| |
For x86-64, we diverge from GCC -fpatchable-function-entry in that we
emit multi-byte NOPs.
Differential Revision: https://reviews.llvm.org/D72220
|
|
|
|
|
|
|
|
| |
multi-input shuffle elements
We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the moment, but we should consistently handle lane index calculations for multiple inputs in both the AVX1 and AVX2 paths.
Minor (almost NFC) tidyup as I'm hoping to use lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.
|
| |
|
|
|
|
|
|
| |
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
assembly.
Summary:
Extend D71677 to apply to all branch-target operands, rather than special-casing call instructions.
Also add a regression test for llvm.org/PR44272, since this finishes fixing it.
Reviewers: thakis, rnk
Reviewed By: thakis
Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72417
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch gives out the details of the znver2 scheduler model.
There are few improvements with respect to execution units, latencies and
throughput when compared with znver1.
The tests that were present for znver1 for llvm-mca tool were replicated.
The latencies, execution units, timeline and throughput information are updated for znver2.
Reviewers: craig.topper, Simon Pilgrim
Differential Revision: https://reviews.llvm.org/D66088
|
|
|
|
|
|
|
|
|
|
|
| |
When EFLAGS is no longer live into a basic block, remove it from the live-in
list.
Fixes https://bugs.llvm.org/show_bug.cgi?id=44462.
Review: Craig Topper
Differential Revision: https://reviews.llvm.org/D71375
|
| |
|
|
|
|
|
|
|
|
|
|
| |
64-bit mode
For v4i64->v4f32 uint_to_fp on pre-avx targets where v4i64 isn't legal we create to v2i64->v2f32 uint_to_fp that need to be shuffled together. Our codegen for v2i64->v2f32 involves detecting if the number is larger than (2^31 - 1), if so we do a special divison by 2 so we can do a signed conversion which we need to scalarize, then do a multiply by 2 at the end if we divided earlier.
When v4i64 isn't legal we need to split the checking for a larger number and dividing by 2 into two v2i64 vectors. The scalar part can extract the 4 i64 values from those 4 splits. But we can reassemble the 4 scalar f32 results directly into a single v432 vector. Then we just need to combine the fixup indications from the 2 halves and we can do the final multiply by 2 fixup on all 4 values if needed at once using a single v4f32 blend and v4f32 fadd.
Differential Revision: https://reviews.llvm.org/D72368
|
|
|
|
|
|
| |
We have to do an intermediate jump to a GPR to make the cast.
Fixes PR43750.
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed heavily in the original review (D70157), there's a need for the compiler to be able to selective suppress padding (either nop or prefix) to respect assumptions about the meaning of labels and instructions in generated code.
Rather than wait for syntax to be finalized - which appears to be a very slow process - this patch focuses on the compiler use case and *only* worries about the integrated assembler. To my knowledge, this covers all cases mentioned to date for clang/JIT support.
For testing purposes, I wired it up so that if the integrated assembler was using autopadding for branch alignment (e.g. enabled at command line) then the textual assembly output would contain a comment for each location where padding was enabled or disabled. This seemed like the least painful choice overall.
Note that the result of this patch effective disables the jcc errata mitigation for many constructs (statepoints, implicit null checks, xray, etc...) which is non ideal. It is at least *correct* and should allow us to enable the mitigation for the compiler. Once that's done, and a few other items are worked through, we probably want to come back to this an explore a bundling based approach instead so that we can pad instructions while keeping labels in the right place.
Differential Revision: https://reviews.llvm.org/D72303
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Adding fp128 support for strict fcmp
Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand
Subscribers: hiraditya, llvm-commits, LuoYuanke
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71897
|
|
|
|
|
|
| |
Now that we generate decent code for (v2i64 (setlt zero, X)) on pre-sse4.2 targets I think we can use this now.
Differential Revision: https://reviews.llvm.org/D72354
|
|
|
|
|
|
|
|
| |
v2i64 in foldVectorXorShiftIntoCmp.
Similar to D72302 but for the canonical form for the opposite case. I've changed foldVectorXorShiftIntoCmp to form a target independent setcc node instead of PCMPGT now and enabled its for v2i64 on pre-SSE4.2 targets. The setcc should eventually get lowered to PCMPGT or the new v2i64 sequence.
Differential Revision: https://reviews.llvm.org/D72318
|