| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Amend MS offset operator implementation, to more closely fit with its MS counterpart:
1. InlineAsm: evaluate non-local source entities to their (address) location
2. Provide a mean with which one may acquire the address of an assembly label via MS syntax, rather than yielding a memory reference (i.e. "offset asm_label" and "$asm_label" should be synonymous
3. address PR32530
Based on http://llvm.org/D37461
Fix broken test where the break appears unrelated.
- Set up appropriate memory-input rewrites for variable references.
- Intel-dialect assembly printing now correctly handles addresses by adding "offset".
- Pass offsets as immediate operands (using "r" constraint for offsets of locals).
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D71436
|
| |
|
|
|
|
|
| |
If only the sign bit is demanded, and the LHS is all zeroes, then
we can bypass the PCMPGT.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics,
clang genrates these intrinsics from __builtin_bitreverse32 and
__builtin_bitreverse64.
Add lower and narrowscalar for G_BITREVERSE.
Lower G_BITREVERSE on MIPS32.
Recommit notes:
Introduce temporary variables in order to make sure
instructions get inserted into MachineFunction in same order
regardless of compiler used to build llvm.
Differential Revision: https://reviews.llvm.org/D71363
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add missing part of patch D71361. Now that the stack-frame
can be operated using a addw/subw instruction, they should
appear in the unwinding list.
Reviewers: dmgreen, efriedma
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72000
|
| |
|
|
|
|
|
| |
The path already used for f16/f32 works a lot better when v_trunc_f64
is available.
|
|
|
|
|
|
|
|
|
| |
Sometimes the result bank of the phi is already assigned to something,
and should not be ignored. This is in preparation for additional
boolean phi handling changes.
Also refine the logic to fix some cases that were incorrectly deciding
to use SGPRs.
|
|
|
|
|
|
|
|
| |
VSX provides a full complement of rounding instructions yet we somehow ended up
with some of them legal and others not. This just legalizes all of the FP
rounding nodes and the FP -> int rounding nodes with unsafe math.
Differential revision: https://reviews.llvm.org/D69949
|
|
|
|
|
|
| |
This reverts commit dbc136e0fe7e14c64dcb78e72321bb41af60afa4.
It broke buildbots:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/21066
|
|
|
|
|
|
|
|
|
| |
This adds ICmp to the list of instructions that we sink a splat to in a
loop, allowing the register forms of instructions to be selected more
often. It does not add FCmp yet as the results look a little odd, trying
to keep the register in an float reg and having to move it back to a GPR.
Differential Revision: https://reviews.llvm.org/D70997
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch allows to emit thumb2 add and sub
instructions with 12 bit immediates in the
emitT2RegPlusImmediate function.
- Splitting parts of the D70680
Reviewers: eli.friedman, olista01, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71361
|
|
|
|
|
|
|
|
|
|
| |
G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics,
clang genrates these intrinsics from __builtin_bitreverse32 and
__builtin_bitreverse64.
Add lower and narrowscalar for G_BITREVERSE.
Lower G_BITREVERSE on MIPS32.
Differential Revision: https://reviews.llvm.org/D71363
|
|
|
|
|
|
|
|
|
| |
G_BSWAP is generated from llvm.bswap.<type> intrinsics, clang genrates
these intrinsics from __builtin_bswap32 and __builtin_bswap64.
Add lower and narrowscalar for G_BSWAP.
Lower G_BSWAP on MIPS32, select G_BSWAP on MIPS32 revision 2 and later.
Differential Revision: https://reviews.llvm.org/D71362
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For now, PowerPC will using several instructions to get the constant and "and" it with the following case:
define i32 @test1(i32 %a) {
%and = and i32 %a, -2
ret i32 %and
}
However, we could exploit it with the rotate mask instructions.
MB ME
+----------------------+
|xxxxxxxxxxx00011111000|
+----------------------+
0 32 64
Notice that, we can only do it if the MB is larger than 32 and MB <= ME as
RLWINM will replace the content of [0 - 32) with [32 - 64) even we didn't rotate it.
Differential Revision: https://reviews.llvm.org/D71829
|
|
|
|
|
| |
These are implemented slightly more efficiently than comparing
to 1 in the case that the value is more than 64 bits.
|
| |
|
|
|
|
|
|
|
|
| |
X86ISD::VSRLI/VSRAI/VSRLI. Use getConstantOperandVal and APInt operations.
These nodes should only ever be formed with an i8 TargetConstant
so we don't need to check for it to be a constant. It's also
always 8-bits so we don't need to use APInt compare functions.
|
|
|
|
|
|
|
|
|
| |
This allows us to delete InlineAsm::Constraint_i workarounds in
SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and
TargetLowering::getInlineAsmMemConstraint overrides.
They were introduced to X86 in r237517 to prevent crashes for
constraints like "=*imr". They were later copied to other targets.
|
|
|
|
|
|
|
|
|
| |
targets.
We had a Custom operation action for v4i32 on SSE1. But since
v4i32 isn't legal until SSE2 this was not what was intended. The
code that get executed was intended for op legalization and
creates a bunch of v4i32 nodes that all end up scalarized.
|
|
|
|
| |
LowerUINT_TO_FP_i32. NFCI
|
|
|
|
| |
D48683 accidentally disabled -enable-machine-outliner for x86-32.
|
|
|
|
|
| |
In the last commit, I neglected to initialize the new subtarget feature
I added which caused failures on a few bots. This should fix that.
|
|
|
|
|
|
|
|
|
|
|
| |
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=40554
Some CPU's trap to the kernel on unaligned floating point access and there are
kernels that do not handle the interrupt. The program then fails with a SIGBUS
according to the PR. This just switches the default for unaligned access to only
allow it on recent server CPUs that are known to allow this.
Differential revision: https://reviews.llvm.org/D71954
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If we didn't set the value for hasSideEffects bit in our td file, `llvm-tblgen`
will set it as true for those instructions which has no match pattern.
Below 6 instructions don't set the hasSideEffects flag and don't have match
pattern, so their hasSideEffects flag will be set true by llvm-tblgen.
But in fact below instructions don't modify any special register and don't have
other SideEffects, they shouldn't have SideEffects.
This patch is to modify the hasSideEffects of below instructions from 1 to 0.
```
VEXTUHLX
VEXTUHRX
VEXTUWLX
VEXTUWRX
VSPLTBs
VSPLTHs
```
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D71391
|
|
|
|
|
| |
This matches the DAG behavior where we don't use SReg_32_XM0
everywhere anymore, and fixes not coalescing the copies into m0.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
There ended up being two result registers, which would fail on
select. It was really defing a new temp register in the correct def
position, instead of the correct result register.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
v4i32->v4f32 under avx512.
With avx512vl we get v4i32->v4f32 uint_to_fp instructions. With
avx512f we get v16i32->v16f32 instructions which we can use to
emulate v4i32->v4f32.
|
| |
|
|
|
|
|
|
|
| |
Intrinsic has incorrect argument type!
i32 (i32*)* @llvm.setjmp
*wipes tear*
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
avx512vl. Similar for vXi64 on avx512dq without avx512vl.
Summary:
Previously we did this with isel patterns that used garbage in
the widened part of the source. But that's not valid for strictfp.
So now we custom widen and use zeroes for the widened elemens for
strictfp.
This replaces D71864.
Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3
Reviewed By: pengfei
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71879
|
|
|
|
|
| |
For non-strict, generic type legalization will take care of this,
but that doesn't happen currently for strict nodes.
|
| |
|
|
|
|
|
|
|
|
| |
i686-pc-windows-msvc to match what we do for non-strict.
The float libcalls are inlined in MSVC's math header where they
just cast to double and use the double libcall. Do the same when
we emit libcalls.
|
|
|
|
|
|
|
|
| |
I believe the algorithm we use for non-strict is exception safe
for strict. The fsub won't generate any exceptions. After it we
will have an exact version of the i32 integer in a double. Then
we just round it to f32. That rounding will generate a precision
exception if it can't be represented exactly.
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D71892
|
| |
|
|
|
|
|
|
|
|
| |
avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl.
Previously we widened these through isel patterns, but that
didn't work for STRICT_ nodes. Those need to be padded with
zeroes in the upper bits which is harder to do in isel patterns.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
but not AVX512VL.
Previously we were widening with isel patterns, but that wasn't
exception safe for strict FP. So now we widen to v4i32->v4f64
during type legalization. And then let op legalization further
widen to v8i32->v8f64.
The vec_int_to_fp.ll changes are caused by us no longer narrowing
extracts of strict_uint_to_fp to the v4i32->v2f64 instruction
without AVX512VL only to have isel rewiden it. Now we just keep
it wide throughout. So we don't have an opportunity to narrow
the load.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
but not avx512vl.
AVX512F added instruction for vector fp_to_uint conversions. With
AVX512VL we can use a specific instruction that does v2f64->v4i32 with
zeroes in the 2 extra elements. For non-strict nodes without AVX512VL
we relied on type legalization to turn it to v4f64->v4i32 which would
later be widened by op legalization to v8f64->v8i32. But type legalization
doesn't currently widen strict nodes since it doesn't know how to
safely and efficiently pad the extra elements. But for X86 we know
padding with zeroes is safe and efficient so do that ourselves.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previous btf field relocation is always at assignment like
r1 = 4
which is converted from an ld_imm64 instruction.
This patch did an optimization such that relocation
instruction might be load/store/shift. Specically, the
following insns may also have relocation, except BPF_MOV:
LDB, LDH, LDW, LDD, STB, STH, STW, STD,
LDB32, LDH32, LDW32, STB32, STH32, STW32,
SLL, SRL, SRA
To accomplish this, a few BPF target specific
codegen only instructions are invented. They
are generated at backend BPF SimplifyPatchable phase,
which is at early llc phase when SSA form is available.
The new codegen only instructions will be converted to
real proper instructions at the codegen and BTF emission stage.
Note that, as revealed by a few tests, this optimization might
be actual generating more relocations:
Scenario 1:
if (...) {
... __builtin_preserve_field_info(arg->b2, 0) ...
} else {
... __builtin_preserve_field_info(arg->b2, 0) ...
}
Compiler could do CSE to only have one relocation. But if both
of the above is translated into codegen internal instructions,
the compiler will not be able to do that.
Scenario 2:
offset = ... __builtin_preserve_field_info(arg->b2, 0) ...
...
... offset ...
... offset ...
... offset ...
For whatever reason, the compiler might be temporarily do copy
propagation of the righthand of "offset" assignment like
... __builtin_preserve_field_info(arg->b2, 0) ...
... __builtin_preserve_field_info(arg->b2, 0) ...
and CSE will be able to deduplicate later.
But if these intrinsics are converted to BPF pseudo instructions,
they will not be able to get deduplicated.
I do not expect we have big instruction count difference.
It may actually reduce instruction count since now relocation
is in deeper insn dependency chain.
For example, for test offset-reloc-fieldinfo-2.ll, this patch
generates 7 instead of 6 relocations for non-alu32 mode, but it
actually reduced instruction count from 29 to 26.
Differential Revision: https://reviews.llvm.org/D71790
|
|
|
|
| |
the AVX512DQ+AVX512VL code is very similar in both. NFC
|