| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
| |
See https://bugs.llvm.org/show_bug.cgi?id=43486
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D68350
llvm-svn: 374553
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a "double" (64-bit) value has zero low 32-bits, it's possible to load
such value into a GP/FP registers as an instruction immediate. But now
assembler loads only high 32-bits of the value.
For example, if a target register is GPR the `li.d $4, 1.0` instruction
converts into the `lui $4, 16368` one. As a result, we get `0x3FF00000`
in the register. While a correct representation of the `1.0` value is
`0x3FF0000000000000`. The patch fixes that.
Differential Revision: https://reviews.llvm.org/D68776
llvm-svn: 374544
|
|
|
|
|
|
| |
Now that its used by isNegatibleForFree we should try to avoid costly deep recursion
llvm-svn: 374534
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The assertion is everzealous and fail tests like:
renamable $x3 = LI8 0
STD renamable $x3, 16, $x1
renamable $x3 = LI8 0
Remove the assertion since killed flag of $x3 is not mandentory.
Differential Revision: https://reviews.llvm.org/D68344
llvm-svn: 374515
|
|
|
|
|
|
| |
saturating truncating store.
llvm-svn: 374509
|
|
|
|
| |
llvm-svn: 374495
|
|
|
|
|
|
|
|
|
|
| |
instructions in more situations.
If we don't have VLX we won't end up selecting a saturating
truncate for 256-bit or smaller vectors so we should just use
the pack lowering.
llvm-svn: 374487
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When handling the packus pattern for i32->i8 we do a two step
process using a packss to i16 followed by a packus to i8. If the
final i8 step is a type with less than 64-bits the packus step
will return SDValue(), but the i32->i16 step might have succeeded.
This leaves the nodes from the middle step dangling.
Guard against this by pre-checking that the number of elements is
at least 8 before doing the middle step.
With that check in place this should mean the only other
case the middle step itself can fail is when SSE2 is disabled. So
add an early SSE2 check then just assert that neither the middle
or final step ever fail.
llvm-svn: 374460
|
|
|
|
|
|
|
|
| |
It was missing an undef flag.
Differential Revision: https://reviews.llvm.org/D68813
llvm-svn: 374455
|
|
|
|
|
|
|
|
|
|
| |
inputs to be between 0 and 255 when zmm registers are disabled on SKX.
If we've disable zmm registers, the v16i32 will need to be split. This split will propagate through min/max the truncate. This creates two sequences that need to be concatenated back to v16i8. We can instead use packusdw to do part of the clamping, truncating, and concatenating all at once. Then we can use a vpmovuswb to finish off the clamp.
Differential Revision: https://reviews.llvm.org/D68763
llvm-svn: 374431
|
|
|
|
|
|
|
|
|
| |
Add a helper function getMCSymbolForTOCPseudoMO to clean up PPCAsmPrinter
a little bit.
Differential Revision: https://reviews.llvm.org/D68721
llvm-svn: 374420
|
|
|
|
|
|
|
|
| |
Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics.
Differential Revision: https://reviews.llvm.org/D68567
llvm-svn: 374377
|
|
|
|
| |
llvm-svn: 374372
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, patchable extern relocations are introduced to patch
external variables used for multi versioning in
compile once, run everywhere use case. The load instruction
will be converted into a move with an patchable immediate
which can be changed by bpf loader on the host.
The kernel verifier has evolved and is able to load
and propagate constant values, so compiler relocation
becomes unnecessary. This patch removed codes related to this.
Differential Revision: https://reviews.llvm.org/D68760
llvm-svn: 374367
|
|
|
|
|
|
| |
Split off from D67557.
llvm-svn: 374356
|
|
|
|
|
|
| |
Split off from D67557, fixes the compile time regression mentioned in rL372756
llvm-svn: 374351
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This has been superseded by "[AMDGPU]: PHI Elimination hooks added for custom COPY insertion."
This reverts the code changes from commit 53f967f2bdb6aa7b08596880c3689d1ecad6f0ff
but keeps the test case.
Reviewers: hliao, arsenm, tpr, dstuttard
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68769
llvm-svn: 374347
|
|
|
|
|
|
|
|
|
|
| |
placeholders. NFCI.
Continuing to undo the rL372756 reversion.
Differential Revision: https://reviews.llvm.org/D67557
llvm-svn: 374345
|
|
|
|
|
|
|
|
|
| |
This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat
intrinsics.
Differential Revision: https://reviews.llvm.org/D68566
llvm-svn: 374336
|
|
|
|
| |
llvm-svn: 374326
|
|
|
|
|
|
|
|
| |
EXPENSIVE_CHECKS build was failing on new test.
This is fixed by marking $ra register as undef.
Test now has -verify-machineinstrs to check for operand flags.
llvm-svn: 374320
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the heuristics the if-conversion pass uses for diamond if-conversion
are based on execution time, with no consideration for code size. This adds a
new set of heuristics to be used when optimising for code size.
This is mostly target-independent, because the if-conversion pass can
see the code size of the instructions which it is removing. For thumb,
there are a few passes (insertion of IT instructions, selection of
narrow branches, and selection of CBZ instructions) which are run after
if conversion and affect these heuristics, so I've added target hooks to
better predict the code-size effect of a proposed if-conversion.
Differential revision: https://reviews.llvm.org/D67350
llvm-svn: 374301
|
|
|
|
|
|
|
|
|
| |
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.
llvm-svn: 374284
|
|
|
|
|
|
|
|
|
|
|
|
| |
As background, starting in D66309, I'm working on support unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86.
As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces in code loop suspicious though I can't figure out how to exercise them.
This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions.
Differential Revision: https://reviews.llvm.org/D68419
llvm-svn: 374261
|
|
|
|
|
|
|
|
|
|
|
| |
In a future patch, this will help cleanup m0 handling.
The register coalescer handles copies from a register that
materializes an immediate, but doesn't handle move immediates
itself. The virtual register uses will often be allocated to the same
register, so there end up being no real copy.
llvm-svn: 374257
|
|
|
|
|
|
|
|
|
|
| |
This was ignoring the register bank of the input pointer, and
isUniformMMO seems overly aggressive.
This will now conservatively assume a VGPR in cases where the incoming
bank hasn't been determined yet (i.e. is from a loop phi).
llvm-svn: 374255
|
|
|
|
| |
llvm-svn: 374254
|
|
|
|
| |
llvm-svn: 374253
|
|
|
|
|
|
| |
Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR.
llvm-svn: 374252
|
|
|
|
|
|
|
|
|
| |
If original instruction did not have source modifiers they were
not added to the new DPP instruction as well, even if needed.
Differential Revision: https://reviews.llvm.org/D68729
llvm-svn: 374241
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is necessary and sufficient to get simple cases of multiple
return working with multivalue enabled. More complex cases will
require block and loop signatures to be generalized to potentially be
type indices as well.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68684
llvm-svn: 374235
|
|
|
|
|
|
| |
To fix "infinite recursion" warning.
llvm-svn: 374222
|
|
|
|
|
|
|
|
| |
Use the the new math constants in `MathExtras.h`.
Differential revision: https://reviews.llvm.org/D68285
llvm-svn: 374208
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This involved
moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo.
Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Moving the default TargetTransformInfoImplBase implementation to a default
MCSubtarget implementation
Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC
and SystemZ. AArch64 already has a custom subtarget implementation, so its
custom TTI implementation is migrated to use the new facilities in BasicTTIImpl
to invoke its custom subtarget implementation. The custom TTI implementations
continue to exist for the other targets with this change. They are not moved
over to subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to the system
model defined by the target. With this change, the default MCSubtargetInfo
implementation essentially returns the defaults TargetTransformInfoImplBase used
to return. Existing users of TTI defaults will hit the defaults now in
MCSubtargetInfo. Targets that define their own custom TTI implementations won't
use the BasicTTIImpl implementations that route to the subtarget.
Once system models are in place for the targets that use these interfaces, their
custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
llvm-svn: 374205
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This clang builtin and corresponding LLVM intrinsic are necessary to
expose the exact semantics of the underlying WebAssembly instruction
to users. LLVM produces a poison value if the dynamic swizzle indices
are greater than the vector size, but the WebAssembly instruction sets
the corresponding output lane to zero. Users who depend on this
behavior can safely use this builtin.
Depends on D68527.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D68531
llvm-svn: 374189
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds the new v8x16.swizzle SIMD instruction as specified at
https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#swizzling-using-variable-indices.
In addition to adding swizzles as a candidate lowering in
LowerBUILD_VECTOR, also rewrites and simplifies the lowering to
minimize the number of replace_lanes necessary rather than trying to
minimize code size. This leads to more uses of v128.const instead of
splats, which is expected to increase performance.
The new code will be easier to tune once V8 implements all the vector
construction operations, and it will also be easier to add new
candidate instructions in the future if necessary.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68527
llvm-svn: 374188
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
stack
This patch makes sure that if we tag some memory, we untag that memory before
the function returns/throws via any exit, reachable from the tag operation. For
that we place the untag operation either at:
a) the lifetime end call for the alloca, if that call post-dominates the
lifetime start call (where the tag operation is placed), or it (the
lifetime end call) dominates all reachable exits, otherwise
b) at the reachable exits
Differential Revision: https://reviews.llvm.org/D68469
llvm-svn: 374182
|
|
|
|
| |
llvm-svn: 374165
|
|
|
|
|
|
|
|
|
|
|
| |
The `expandLoadImmReal` handles four different and almost non-overlapping
cases: loading a "single" float immediate into a GPR, loading a "single"
float immediate into a FPR, and the same couple for a "double" float
immediate.
It's better to move each `else if` branch into separate methods.
llvm-svn: 374164
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A bpf specific clang intrinsic is introduced:
u32 __builtin_preserve_field_info(member_access, info_kind)
Depending on info_kind, different information will
be returned to the program. A relocation is also
recorded for this builtin so that bpf loader can
patch the instruction on the target host.
This clang intrinsic is used to get certain information
to facilitate struct/union member relocations.
The offset relocation is extended by 4 bytes to
include relocation kind.
Currently supported relocation kinds are
enum {
FIELD_BYTE_OFFSET = 0,
FIELD_BYTE_SIZE,
FIELD_EXISTENCE,
FIELD_SIGNEDNESS,
FIELD_LSHIFT_U64,
FIELD_RSHIFT_U64,
};
for __builtin_preserve_field_info. The old
access offset relocation is covered by
FIELD_BYTE_OFFSET = 0.
An example:
struct s {
int a;
int b1:9;
int b2:4;
};
enum {
FIELD_BYTE_OFFSET = 0,
FIELD_BYTE_SIZE,
FIELD_EXISTENCE,
FIELD_SIGNEDNESS,
FIELD_LSHIFT_U64,
FIELD_RSHIFT_U64,
};
void bpf_probe_read(void *, unsigned, const void *);
int field_read(struct s *arg) {
unsigned long long ull = 0;
unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
#ifdef USE_PROBE_READ
bpf_probe_read(&ull, size, (const void *)arg + offset);
unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
lshift = lshift + (size << 3) - 64;
#endif
#else
switch(size) {
case 1:
ull = *(unsigned char *)((void *)arg + offset); break;
case 2:
ull = *(unsigned short *)((void *)arg + offset); break;
case 4:
ull = *(unsigned int *)((void *)arg + offset); break;
case 8:
ull = *(unsigned long long *)((void *)arg + offset); break;
}
unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
#endif
ull <<= lshift;
if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
}
There is a minor overhead for bpf_probe_read() on big endian.
The code and relocation generated for field_read where bpf_probe_read() is
used to access argument data on little endian mode:
r3 = r1
r1 = 0
r1 = 4 <=== relocation (FIELD_BYTE_OFFSET)
r3 += r1
r1 = r10
r1 += -8
r2 = 4 <=== relocation (FIELD_BYTE_SIZE)
call bpf_probe_read
r2 = 51 <=== relocation (FIELD_LSHIFT_U64)
r1 = *(u64 *)(r10 - 8)
r1 <<= r2
r2 = 60 <=== relocation (FIELD_RSHIFT_U64)
r0 = r1
r0 >>= r2
r3 = 1 <=== relocation (FIELD_SIGNEDNESS)
if r3 == 0 goto LBB0_2
r1 s>>= r2
r0 = r1
LBB0_2:
exit
Compare to the above code between relocations FIELD_LSHIFT_U64 and
FIELD_LSHIFT_U64, the code with big endian mode has four more
instructions.
r1 = 41 <=== relocation (FIELD_LSHIFT_U64)
r6 += r1
r6 += -64
r6 <<= 32
r6 >>= 32
r1 = *(u64 *)(r10 - 8)
r1 <<= r6
r2 = 60 <=== relocation (FIELD_RSHIFT_U64)
The code and relocation generated when using direct load.
r2 = 0
r3 = 4
r4 = 4
if r4 s> 3 goto LBB0_3
if r4 == 1 goto LBB0_5
if r4 == 2 goto LBB0_6
goto LBB0_9
LBB0_6: # %sw.bb1
r1 += r3
r2 = *(u16 *)(r1 + 0)
goto LBB0_9
LBB0_3: # %entry
if r4 == 4 goto LBB0_7
if r4 == 8 goto LBB0_8
goto LBB0_9
LBB0_8: # %sw.bb9
r1 += r3
r2 = *(u64 *)(r1 + 0)
goto LBB0_9
LBB0_5: # %sw.bb
r1 += r3
r2 = *(u8 *)(r1 + 0)
goto LBB0_9
LBB0_7: # %sw.bb5
r1 += r3
r2 = *(u32 *)(r1 + 0)
LBB0_9: # %sw.epilog
r1 = 51
r2 <<= r1
r1 = 60
r0 = r2
r0 >>= r1
r3 = 1
if r3 == 0 goto LBB0_11
r2 s>>= r1
r0 = r2
LBB0_11: # %sw.epilog
exit
Considering verifier is able to do limited constant
propogation following branches. The following is the
code actually traversed.
r2 = 0
r3 = 4 <=== relocation
r4 = 4 <=== relocation
if r4 s> 3 goto LBB0_3
LBB0_3: # %entry
if r4 == 4 goto LBB0_7
LBB0_7: # %sw.bb5
r1 += r3
r2 = *(u32 *)(r1 + 0)
LBB0_9: # %sw.epilog
r1 = 51 <=== relocation
r2 <<= r1
r1 = 60 <=== relocation
r0 = r2
r0 >>= r1
r3 = 1
if r3 == 0 goto LBB0_11
r2 s>>= r1
r0 = r2
LBB0_11: # %sw.epilog
exit
For native load case, the load size is calculated to be the
same as the size of load width LLVM otherwise used to load
the value which is then used to extract the bitfield value.
Differential Revision: https://reviews.llvm.org/D67980
llvm-svn: 374099
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were 2 problems here. First, these patterns were duplicated to
handle the inverted shift operands instead of using the commuted
PatFrags.
Second, the point of the zext folding patterns don't apply to the
non-0ing high subtargets. They should be skipped instead of inserting
the extension. The zeroing high code would be emitted when necessary
anyway. This was also emitting unnecessary zexts in cases where the
high bits were undefined.
llvm-svn: 374092
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"
This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a.
This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04.
The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.
llvm-svn: 374091
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Without offsets on the MachineMemOperands (MMOs),
MachineInstr::mayAlias() will return true for all reads and writes to the
same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler
when analyzing dependencies of buffer loads and stores. It also limits
the SILoadStoreOptimizer from merging more instructions.
This patch reduces the compile time of one pathological compute shader
from 12 seconds to 1 second.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65097
llvm-svn: 374087
|
|
|
|
|
|
|
|
|
|
|
| |
Inhibit generation of unused real dpp instructions on gfx10 just
like it is done on other subtargets. This does not change anything
because these are illegal anyway and not accepted, but it does
reduce the number of instruction definitions generated.
Differential Revision: https://reviews.llvm.org/D68607
llvm-svn: 374083
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When searching for local expression tree created by stackified
registers, for 'block' placement, we start the search from the previous
instruction of a BB's terminator. But in 'try''s case, we should start
from the previous instruction of a call that can throw, or a EH_LABEL
that precedes the call, because the return values of the call's previous
instructions can be stackified and consumed by the throwing call.
For example,
```
i32.call @foo
call @bar ; may throw
br $label0
```
In this case, if we start the search from the previous instruction of
the terminator (`br` here), we end up stopping at `call @bar` and place
a 'try' between `i32.call @foo` and `call @bar`, because `call @bar`
does not have a return value so it is not a local expression tree of
`br`.
But in this case, unlike when placing 'block's, we should start the
search from `call @bar`, because the return value of `i32.call @foo` is
stackified and used by `call @bar`.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68619
llvm-svn: 374073
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During the If-Converter optimization pay attention when copying or
deleting call instructions in order to keep call site information in
valid state.
Reviewers: aprantl, vsk, efriedma
Reviewed By: vsk, efriedma
Differential Revision: https://reviews.llvm.org/D66955
llvm-svn: 374068
|
|
|
|
|
|
|
|
|
|
|
| |
When -pg option is present than a call to _mcount is inserted into every
function. However since the proper ABI was not followed then the generated
gmon.out did not give proper results. By inserting needed instructions
before every _mcount we can fix this.
Differential Revision: https://reviews.llvm.org/D68390
llvm-svn: 374055
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tim Northover remarked that the added patterns for fmls fp16
produce wrong code in case the fsub instruction has a
multiplication as its first operand, i.e., all the patterns FMLSv*_OP1:
> define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
> ; CHECK-LABEL: test_FMLSv8f16_OP1:
> ; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
> entry:
>
> %mul = fmul fast <8 x half> %c, %b
> %sub = fsub fast <8 x half> %mul, %a
> ret <8 x half> %sub
> }
>
> This doesn't look right to me. The exact instruction produced is "fmls
> v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the
> IR is calculating "v2*v1-v0". The equivalent <4 x float> code also
> doesn't emit an fmls.
This patch generates an fmla and negates the value of the operand2 of the fsub.
Inspecting the pattern match, I found that there was another mistake in the
opcode to be selected: matching FMULv4*16 should generate FMLSv4*16
and not FMLSv2*32.
Tested on aarch64-linux with make check-all.
Differential Revision: https://reviews.llvm.org/D67990
llvm-svn: 374044
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Adds a TypeSize struct to represent the known minimum size of a type
along with a flag to indicate that the runtime size is a integer multiple
of that size
* Converts existing size query functions from Type.h and DataLayout.h to
return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
to uint64_t) so that most existing code 'just works' as if the return
values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
supported instructions used with scalable vectors can be constructed
in IR.
Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen
Reviewed By: rovka, sdesmalen
Differential Revision: https://reviews.llvm.org/D53137
llvm-svn: 374042
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68184
llvm-svn: 374041
|