Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR
when zero extending the demanded elements mask if it is already as long as the
source vector.
Differential Revision: https://reviews.llvm.org/D49574
llvm-svn: 339600
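To illustrate the failure mode, here is a minimal Python sketch, not the LLVM code itself. It assumes a strict zero-extension helper that rejects same-width requests, and models the demanded-elements mask as a plain integer. All function names are hypothetical.

```python
def zext(mask, width, new_width):
    """Strict zero-extension: rejects requests that do not widen (assumed behaviour)."""
    if new_width <= width:
        raise ValueError("zext requires a strictly larger width")
    return mask  # the value is unchanged, only the nominal width grows


def zext_or_self(mask, width, new_width):
    """Zero-extend, or return the mask untouched when the widths already match."""
    if new_width == width:
        return mask
    return zext(mask, width, new_width)


def demanded_src_elts(demanded, num_sub_elts, num_src_elts, idx):
    """Map a demanded-elements mask of the subvector onto the source vector."""
    if idx + num_sub_elts > num_src_elts:
        raise ValueError("subvector does not fit in the source vector")
    wide = zext_or_self(demanded, num_sub_elts, num_src_elts)
    return (wide << idx) & ((1 << num_src_elts) - 1)
```

With the strict helper alone, extracting a 4-element subvector at index 0 from a 4-element source would raise; the tolerant variant simply reuses the mask.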
|
It breaks when using EXPENSIVE_CHECKS with the error message
"Bad machine code: Using an undefined physical register".
llvm-svn: 339570
|
llvm-svn: 339562
|
int64_t
Test case reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7173
llvm-svn: 339556
|
Summary: The GR740 provides an up cycle counter in the
registers ASR22 and ASR23. As these registers cannot be
read together atomically, we only use the value of ASR23
for llvm.readcyclecounter(). The ASR23 register holds the
32 LSBs of the up-counter.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48638
llvm-svn: 339551
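Because only the low 32 bits are returned, code that measures intervals has to allow for wraparound. A minimal sketch in Python (the helper name is hypothetical, and it assumes the measured interval is shorter than 2^32 cycles):

```python
MASK32 = 0xFFFFFFFF


def elapsed_cycles(start, end):
    """Cycles between two 32-bit counter samples, tolerating one wraparound."""
    return (end - start) & MASK32
```

The subtraction is done modulo 2^32, so a sample taken just after the counter wraps still yields the right difference.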
|
llvm-svn: 339546
|
fp_to_fp16 in case the result type isn't a scalar integer.
This is another variation of PR38533. In this case, the result type of the bitcast is legal and 16 bits wide, but not a scalar integer. So we need to emit the convert to i16 and then bitcast it to the true result type. This new bitcast will be further type legalized if necessary.
llvm-svn: 339536
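The two-step shape (convert to a 16-bit integer, then reinterpret the bits as the real result type) can be sketched in Python with the standard `struct` module. The two-byte vector result and the little-endian lane order are assumptions chosen for illustration, and the function names are hypothetical.

```python
import struct


def fp_to_fp16_bits(x):
    """Convert a float to IEEE half precision and return the raw 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bitcast_i16_to_v2i8(bits):
    """Reinterpret a 16-bit integer as two 8-bit lanes (little-endian, assumed)."""
    return list(struct.pack("<H", bits))
```

The conversion always yields a scalar i16; only the second step depends on the requested result type.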
|
make sure the output type is scalar. For vectors, use a store and load of temporary.
Previously if the result type was a vector, we emitted a FP_TO_FP16 with a vector result type which isn't valid.
This is basically the opposite case of the root cause of PR38533.
llvm-svn: 339535
|
Fixes PR37524.
The exception handling encodings for x86_64 in the kernel code model
were changed by r309884. Restore them to the correct ones. These
encodings include PersonalityEncoding, LSDAEncoding and
TTypeEncoding.
Differential Revision: https://reviews.llvm.org/D50490
llvm-svn: 339534
|
fp16_to_fp in case the input type isn't an i16.
The bitcast can be further legalized as needed.
Fixes PR38533.
llvm-svn: 339533
|
Also add some more tests in preparation for
a future patch.
llvm-svn: 339526
|
Addresses fixme, although this should still be checking individual
operand flags.
llvm-svn: 339525
|
I'm not sure of the exact nsz flag combination that
is OK. I think as long as it's on either, this is OK.
For now just check it on the omod multiply.
llvm-svn: 339513
|
If one of the elements is undef, use the canonicalized constant
from the other element instead of 0.
Splat vectors are more useful for other optimizations, such
as matching vector clamps. This was breaking on clamps
of half3 from the undef 4th component.
llvm-svn: 339512
|
llvm-svn: 339511
|
of wrapping it in a SUBREG_TO_REG.
Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode.
This simplifies a few isel patterns that used the pseudo directly. It also seems to have magically improved our ability to CSE it in the undef-label.ll test.
llvm-svn: 339496
|
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D49625
llvm-svn: 339491
|
Summary:
Moved Explicit Locals pass to last.
Made that pass obligatory.
Made it convert from register to stack based instructions, and removed the registers.
Fixed related code that was expecting register based instructions.
Added the correct testing flag to all tests, depending on the
format they expected so far.
Translated one test to stack format as example: reg-stackify-stack.ll
tested:
llvm-lit -v `find test -name WebAssembly`
unittests/MC/*
Reviewers: dschuff, sunfish
Subscribers: jfb, llvm-commits, aheejin, eraman, jgravelle-google, sbc100
Differential Revision: https://reviews.llvm.org/D50568
llvm-svn: 339474
|
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.
Some potential improvements are outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but this seems to work
pretty well already.
The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)
According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.
Differential Revision: https://reviews.llvm.org/D50030
llvm-svn: 339472
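The stated preference order can be sketched as a simple classifier. This is a simplification in Python, not the actual targetShrinkDemandedConstant logic: it only inspects one fixed 32-bit mask, whereas the real hook may pick among several masks that agree on the demanded bits. The function name and the "other" fallback are hypothetical. The 8-bit limit reflects the immediate range of Thumb1 movs.

```python
MASK32 = 0xFFFFFFFF


def thumb1_and_strategy(mask):
    """Pick the preferred Thumb1 sequence for `x & mask` (illustrative only)."""
    mask &= MASK32
    if mask == 0xFF:
        return "uxtb"          # zero-extend byte
    if mask == 0xFFFF:
        return "uxth"          # zero-extend halfword
    if mask <= 0xFF:
        return "movs+ands"     # mask fits the 8-bit movs immediate
    if (~mask & MASK32) <= 0xFF:
        return "movs+bics"     # inverted mask fits, so clear bits instead
    return "other"             # e.g. a constant-pool load; not modelled here
```

For example, 0xFFFFFF00 has many set bits, but its complement 0xFF fits a movs immediate, so bics is the cheaper form.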
|
llvm-svn: 339464
|
Clear the nan (or non-nan) test bits from the mask.
llvm-svn: 339462
|
llvm-svn: 339460
|
Enabling ARMCodeGenPrepare by default caused a whole load of
failures. This is due to zexts and truncs not being handled properly.
ZExts are messy so it's just easier to disable them for now, and truncs
are allowed only as 'sinks'. I still need to figure out why allowing
them as 'sources' causes so many failures. The other main changes are
that we are explicit in the types that we are converting to: it's now
always 'TypeSize'. Type support is also now performed while checking
for valid opcodes, as having the checks at different stages was
unnecessarily complicated.
I've moved the tests around too, so we have the zext and truncs in
their own file as well as the overflowing opcode tests.
Differential Revision: https://reviews.llvm.org/D50518
llvm-svn: 339432
|
The previous name sounds like it inserts cfguard implementation, but it
really just emits the table of address-taken functions. Change the name
to better reflect that.
Clang will be updated in the next commit.
llvm-svn: 339419
|
Summary:
i64x2 and f64x2 operations are not implemented in V8, so we normally
do not want to emit them. However, they are in the SIMD spec proposal,
so we still want to be able to test them in the toolchain. This patch
adds a flag to enable their emission.
Reviewers: aheejin, dschuff
Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D50423
Patch by Thomas Lively (tlively)
llvm-svn: 339407
|
multiply amounts.
This seems to slightly help the performance of one of our internal benchmarks. We probably need better heuristics here.
llvm-svn: 339406
|
llvm-svn: 339365
|
Similar to rL337966 - if the DAGCombiner's rotate matching was
working as expected, I don't think we'd see any test diffs here.
AArch only goes right, and PPC only goes left.
x86 has both, so no diffs there.
Differential Revision: https://reviews.llvm.org/D50091
llvm-svn: 339359
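A target that can only rotate in one direction can still express the other, because rotating left by n is the same as rotating right by width - n. A small Python sketch of that identity (helper names are hypothetical):

```python
def rotl(x, n, width=32):
    """Rotate the low `width` bits of x left by n."""
    mask = (1 << width) - 1
    n %= width
    x &= mask
    return ((x << n) | (x >> (width - n))) & mask


def rotr(x, n, width=32):
    """Rotate right, expressed through the left rotate."""
    return rotl(x, (width - n) % width, width)
```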
|
Summary: This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold, adding reassociation as it is the flag that most closely represents the translation.
Reviewers: spatel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D50195
llvm-svn: 339357
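Why such a fold needs a reassociation permission can be seen with ordinary doubles. The sketch below assumes the fold has the shape x - (x + y) -> -y, which the message above does not spell out. The function names are hypothetical.

```python
def fsub_of_fadd(x, y):
    """Evaluate x - (x + y) with strict IEEE double rounding at each step."""
    return x - (x + y)


def folded(y):
    """The reassociated form, which drops x entirely."""
    return -y
```

With x = 1e16 and y = 1.0 the inner sum rounds back to 1e16, so the strict result is 0.0 while the folded form gives -1.0. The two only agree when rounding does not interfere, which is why the rewrite is gated on a flag.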
|
Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a
processor check. Otherwise, NFC.
llvm-svn: 339354
|
Differential Revision: https://reviews.llvm.org/D50454
llvm-svn: 339340
|
Exposed by D50328
Differential Revision: https://reviews.llvm.org/D50328
llvm-svn: 339337
|
As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses.
This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool.
However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move.
This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit.
Differential Revision: https://reviews.llvm.org/D50328
llvm-svn: 339335
|
According to PTX ISA .volatile has the same memory synchronization
semantics as .relaxed.sys, so it can be used to implement monotonic
atomic loads and stores. This is important for OpenMP's atomic
construct where
- 'read's and 'write's are lowered to atomic loads and stores, and
- an update of a float or double type is lowered into a cmpxchg loop.
(Note that PTX could do better because it has atom.add.f{32,64} but
LLVM's atomicrmw instruction only allows integer types.)
Higher levels of atomicity (like acquire and release) need additional
synchronization properties which were added with PTX ISA 6.0 / sm_70.
So using these instructions still results in an error.
Differential Revision: https://reviews.llvm.org/D50391
llvm-svn: 339316
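The cmpxchg loop mentioned above has a well-known shape: read the old bits, compute the new value, and retry the compare-exchange until no other writer intervened. The Python sketch below is a single-threaded stand-in, not generated PTX. `Cell32` and the helper names are hypothetical, and real code would rely on a hardware compare-and-swap.

```python
import struct


class Cell32:
    """Stand-in for a 32-bit memory word offering compare-exchange."""

    def __init__(self, bits):
        self.bits = bits

    def load(self):
        return self.bits

    def compare_exchange(self, expected, desired):
        """Store `desired` if the word equals `expected`; return the old value."""
        old = self.bits
        if old == expected:
            self.bits = desired
        return old


def f32_to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]


def atomic_fadd(cell, operand):
    """Float add built from an integer compare-exchange; returns the old value."""
    old = cell.load()
    while True:
        new = f32_to_bits(bits_to_f32(old) + operand)
        seen = cell.compare_exchange(old, new)
        if seen == old:
            return bits_to_f32(old)
        old = seen  # another writer got in first; retry with the fresh value
```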
|
llvm-svn: 339300
|
isNegatibleForFree() should not matter here (as the test diffs show)
because it's always a win to replace an fsub+fadd with fneg. The
problem in D50195 persists because either (1) we are doing these
folds in the wrong order or (2) we're missing another fold for fadd.
llvm-svn: 339299
|
LLVM triple normalization is handling "unknown" and empty components
differently; for example given "x86_64-unknown-linux-gnu" and
"x86_64-linux-gnu" which should be equivalent, triple normalization
returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's
config.sub returns "x86_64-unknown-linux-gnu" for both
"x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the
triple normalization to behave the same way, replacing empty triple
components with "unknown".
This addresses PR37129.
Differential Revision: https://reviews.llvm.org/D50219
llvm-svn: 339294
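The final replacement step can be sketched in a few lines of Python. This covers only the filling of empty components. The reordering and recognition that triple normalization also performs is not modelled, and the function name is hypothetical.

```python
def fill_empty_components(triple):
    """Replace empty dash-separated components with "unknown"."""
    return "-".join(part if part else "unknown" for part in triple.split("-"))
```

Applied to "x86_64--linux-gnu" this yields "x86_64-unknown-linux-gnu", matching what config.sub reports for both spellings.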
|
These are related to the block of code under review in D50195.
llvm-svn: 339293
|
On Darwin we pin the DWARF line tables to version 2. Stop doing so for
DWARF v5 and later.
Differential revision: https://reviews.llvm.org/D49381
llvm-svn: 339288
|
Normally, if any registers are spilled, we prefer to spill lr on Thumb1
so we can fold the "bx lr" into the "pop". However, if there are tail
calls involved, restoring lr is expensive, so skip the optimization in
that case.
The spill of r7 in the new test also isn't necessary, but that's
mostly orthogonal to this patch. (It's the same code in
ARMFrameLowering, but it's not related to tail calls.)
Differential Revision: https://reviews.llvm.org/D49459
llvm-svn: 339283
|
llvm-svn: 339276
|
Differential Revision: https://reviews.llvm.org/D50405
llvm-svn: 339272
|
I think this is the only situation where the callsite
will have a null instruction.
llvm-svn: 339271
|
llvm-svn: 339270
|
This patch aims to improve the codegen for vector loads involving the
scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used
for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X)
to utilize:
LXSD and LXSDX for i64 and f64
LXSIWAX for i32 (sign extension to i64)
LXSIWZX for i32 and f64
Committing on behalf of Amy Kwan.
Differential Revision: https://reviews.llvm.org/D48950
llvm-svn: 339260
|
Summary:
Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.
For example:
```
extern int bar(int[]);
int foo(int i) {
  int a[i]; // VLA
  asm volatile(
      "mov r7, #1"
    :
    :
    : "r7"
  );
  return 1 + bar(a);
}
```
Compiled for thumb, this gives:
```
$ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
...
foo:
        .fnstart
@ %bb.0:                                @ %entry
        .save   {r4, r5, r6, r7, lr}
        push    {r4, r5, r6, r7, lr}
        .setfp  r7, sp, #12
        add     r7, sp, #12
        .pad    #4
        sub     sp, #4
        movs    r1, #7
        add.w   r0, r1, r0, lsl #2
        bic     r0, r0, #7
        sub.w   r0, sp, r0
        mov     sp, r0
        @APP
        mov.w   r7, #1
        @NO_APP
        bl      bar
        adds    r0, #1
        sub.w   r4, r7, #12
        mov     sp, r4
        pop     {r4, r5, r6, r7, pc}
...
```
r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.
This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seems to have been reached on the way forward. Clang behavior has briefly been discussed on the cfe-dev mailing list (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.
The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.
If we find a reserved register, we print a warning:
```
repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
      "mov r7, #1"
      ^
```
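The check itself amounts to intersecting the clobber list with the target's reserved registers. A rough Python sketch follows. It is not the AsmPrinter code: the function name, the example reserved set and the comma separator for multiple registers are assumptions. Only the message prefix is taken from the warning shown above.

```python
def reserved_clobber_warning(clobbers, reserved):
    """Return the warning text if any clobbered register is reserved, else None."""
    hits = [reg for reg in clobbers if reg in reserved]
    if not hits:
        return None
    return "inline asm clobber list contains reserved registers: " + ", ".join(hits)
```

For the example above, a clobber list of ["R7"] checked against a reserved set containing R7 reproduces the message text. A clobber of an ordinary register produces no warning.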
Reviewers: eli.friedman, olista01, javed.absar, efriedma
Reviewed By: efriedma
Subscribers: efriedma, eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D49727
llvm-svn: 339257
|
Provide a pass-through of the numerator for divide by one cases - this is the same approach we take in DAGCombiner::visitSDIVLike.
I investigated whether we could achieve this by magic MULHU/SRL values but nothing appeared to work as we don't have a way for MULHU(x,c) -> x
llvm-svn: 339254
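The reason no magic constant works for a divisor of one is that the high half of a 32-bit multiply by any 32-bit constant is always smaller than a nonzero input. The Python sketch below shows this, along with a per-lane pass-through. It is an illustration only: the function names are hypothetical, and the magic table holds just two well-known (multiplier, shift) pairs for divisors 3 and 5 rather than a general computation.

```python
MASK32 = 0xFFFFFFFF


def mulhu(x, c):
    """High 32 bits of the 64-bit product of two unsigned 32-bit values."""
    return ((x & MASK32) * (c & MASK32)) >> 32


# Well-known (magic multiplier, post-shift) pairs for unsigned 32-bit division.
MAGIC = {3: (0xAAAAAAAB, 1), 5: (0xCCCCCCCD, 2)}


def udiv_lanes(numerators, divisors):
    """Divide lane by lane, passing the numerator through where the divisor is 1."""
    out = []
    for x, d in zip(numerators, divisors):
        if d == 1:
            out.append(x)  # pass-through: no magic value can reproduce x
        else:
            magic, shift = MAGIC[d]
            out.append(mulhu(x, magic) >> shift)
    return out
```

Even the largest constant, 0xFFFFFFFF, makes mulhu return x - 1 for any nonzero x, so the divide-by-one lanes need an explicit select.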
|
llvm-svn: 339251
|
Making the test use urem relies on it calling udiv-like combines, but the real issue is with the udiv so we're better off using that directly.
llvm-svn: 339247
|
Differential Revision: https://reviews.llvm.org/D50427
llvm-svn: 339241
|