| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.
Reviewers: majnemer
Subscribers: eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D36904
llvm-svn: 312569
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FR32X)))) patterns
We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512.
With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroess. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128
The same thing can happen for AVX with vblendps and those separate patterns already exist.
For AVX512, (v4f32 (X86vzmov VR128)) will select a VMOVSS instruction instead of VBLENDPS due to their not being a EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too.
For SSE1-SSSE3 we can rely on (v4f32 (X86vzmov VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too.
So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register.
llvm-svn: 312564
|
| |
|
|
|
|
|
|
| |
If the only call in a function is a tail call, the
function isn't considered to have a call since it's a
type of return.
llvm-svn: 312561
|
| |
|
|
|
|
| |
Some of the cases show missing pattern i intend to fix shortly.
llvm-svn: 312560
|
| |
|
|
|
|
|
|
| |
(v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64.
We don't have this same pattern for AVX2 so I don't believe we should have it for AVX512. We also didn't have it for v16f32.
llvm-svn: 312543
|
| |
|
|
| |
llvm-svn: 312537
|
| |
|
|
|
|
|
|
| |
As suggested by @niravd : https://bugs.llvm.org/show_bug.cgi?id=34421#c2
Differential Revision: https://reviews.llvm.org/D37464
llvm-svn: 312534
|
| |
|
|
| |
llvm-svn: 312530
|
| |
|
|
| |
llvm-svn: 312529
|
| |
|
|
| |
llvm-svn: 312528
|
| |
|
|
|
|
|
|
|
| |
In RWPI code, globals that are not read-only are accessed relative to
the SB register (R9). This is achieved by explicitly generating an ADD
instruction between SB and an offset that we either load from a constant
pool or movw + movt into a register.
llvm-svn: 312521
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If multiple conditional branches are executed based on the same comparison, we can execute multiple conditional branches based on the result of one comparison on PPC. For example,
if (a == 0) { ... }
else if (a < 0) { ... }
can be executed by one compare and two conditional branches instead of two pairs of a compare and a conditional branch.
This patch identifies a code sequence of the two pairs of a compare and a conditional branch and merge the compares if possible.
To maximize the opportunity, we do canonicalization of code sequence before merging compares.
For the above example, the input for this pass looks like:
cmplwi r3, 0
beq 0, .LBB0_3
cmpwi r3, -1
bgt 0, .LBB0_4
So, before merging two compares, we canonicalize it as
cmpwi r3, 0 ; cmplwi and cmpwi yield same result for beq
beq 0, .LBB0_3
cmpwi r3, 0 ; greather than -1 means greater or equal to 0
bge 0, .LBB0_4
The generated code should be
cmpwi r3, 0
beq 0, .LBB0_3
bge 0, .LBB0_4
Differential Revision: https://reviews.llvm.org/D37211
llvm-svn: 312514
|
| |
|
|
| |
llvm-svn: 312504
|
| |
|
|
| |
llvm-svn: 312503
|
| |
|
|
| |
llvm-svn: 312502
|
| |
|
|
|
|
|
|
|
| |
As noted in PR11210:
https://bugs.llvm.org/show_bug.cgi?id=11210
...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR.
(Although more testing will be needed to confirm that.)
llvm-svn: 312496
|
| |
|
|
|
|
|
|
|
|
| |
forwarding""
This crashes on boringSSL on PPC (will send reduced testcase)
This reverts commit r312328.
llvm-svn: 312490
|
| |
|
|
|
|
| |
Avoid use of VPERMPS where we don't need it by instead using the variable mask version of VPERMILPS for unary shuffles.
llvm-svn: 312486
|
| |
|
|
| |
llvm-svn: 312485
|
| |
|
|
| |
llvm-svn: 312481
|
| |
|
|
|
|
| |
https://reviews.llvm.org/rL312442
llvm-svn: 312474
|
| |
|
|
| |
llvm-svn: 312473
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
references in .text
Summary:
This is a re-roll of D36615 which uses PLT relocations in the back-end
to the call to __xray_CustomEvent() when building in -fPIC and
-fxray-instrument mode.
Reviewers: pcc, djasper, bkramer
Subscribers: sdardis, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D37373
llvm-svn: 312466
|
| |
|
|
|
|
|
|
| |
together write the whole vector, but the starting vector isn't undef.
In this case we should replace the starting vector with undef.
llvm-svn: 312462
|
| |
|
|
|
|
| |
X, Idx), Idx) into an insert of X into the larger zero vector.
llvm-svn: 312460
|
| |
|
|
|
|
| |
register that I missed in r312450.
llvm-svn: 312459
|
| |
|
|
|
|
| |
larger vector.
llvm-svn: 312458
|
| |
|
|
|
|
|
|
| |
into a move instruction which will implicitly zero the upper elements.
Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed.
llvm-svn: 312450
|
| |
|
|
| |
llvm-svn: 312449
|
| |
|
|
|
|
| |
In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern.
llvm-svn: 312448
|
| |
|
|
|
|
| |
https://reviews.llvm.org/rL312442
llvm-svn: 312443
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Throughout an effort to strongly check the behavior of CodeGen with the IR shufflevector instruction we generated many tests while predicting the best X86 sequence that may be generated.
This is a subset of the generated tests that we think may add value to our X86 set of tests.
Some of the checks are not optimal and will be changed after fixing:
1. PR34394
2. PR34382
3. PR34380
4. PR34359
Differential Revision: https://reviews.llvm.org/D37329
llvm-svn: 312442
|
| |
|
|
|
|
| |
assert of non-simple type after type-legalization.".
llvm-svn: 312439
|
| |
|
|
|
|
|
|
|
|
| |
The function combineShuffleToVectorExtend in DAGCombine might generate an illegal typed node after "legalize types" phase, causing assertion on non-simple type to fail afterwards.
Adding a type check in case the combine is running after the type legalize pass.
Differential Revision: https://reviews.llvm.org/D37330
llvm-svn: 312438
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
ZExt and SExt from i8 to i16 aren't implemented in the autogenerated fast isel table because normal isel does a zext/sext to 32-bits and a subreg extract to avoid a partial register write or false dependency on the upper bits of the destination. This means without handling in fast isel we end up triggering a fast isel abort.
We had no custom sign extend handling at all so while I was there I went ahead and implemented sext i1->i8/i16/i32/i64 which was also missing. This generates an i1->i8 sign extend using a mask with 1, then an 8-bit negate, then continues with a sext from i8. A better sequence would be a wider and/negate, but would require more custom code.
Fast isel tests are a mess and I couldn't find a good home for the tests so I created a new one.
The test pr34381.ll had to have fast-isel removed because it was relying on a fast isel abort to hit the bug. The test case still seems valid with fast-isel disabled though some of the instructions changed.
Reviewers: spatel, zvi, igorb, guyblank, RKSimon
Reviewed By: guyblank
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37320
llvm-svn: 312422
|
| |
|
|
|
|
|
| |
Testcase for rL312364:
[AMDGPU] Prevent infinite recursion in DAG.computeKnownBits()
llvm-svn: 312388
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If getHexUint reads in a hex 0, it will create an APInt with a value of 0.
The number of active bits on this APInt is used to calculate the bitwidth of
Result. The number of active bits is defined as an APInt's bitwidth - its
number of leading 0s. Since this APInt is 0, its bitwidth and number of leading
0s are equal.
Thus, Result is constructed with a bitwidth of 0, triggering an APInt assert.
This commit fixes that by checking if the APInt is equal to 0, and setting the
bitwidth to 32 if it is. Otherwise, it sets the bitwidth using getActiveBits.
This caused issues when compiling MIR files with successor probabilities. In
the case that a successor is tagged with a probability of 0, this assert would
fire on debug builds.
https://reviews.llvm.org/D37401
llvm-svn: 312387
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
are the same
This is limited to a set of patterns based on the example in PR34111:
https://bugs.llvm.org/show_bug.cgi?id=34111
...but as I was investigating this, I see that horizontal patterns can go wrong in many,
many other ways that would not be handled by this patch. Each data type may even go
different in the DAG after starting with the same basic IR pattern, so even proper IR
canonicalization won't fix it all.
Differential Revision: https://reviews.llvm.org/D37357
llvm-svn: 312379
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A register in CodeGen can be marked as reserved: In that case we
consider the register always live and do not use (or rather ignore)
kill/dead/undef operand flags.
LiveIntervalAnalysis however tracks liveness per register unit (not per
register). We already needed adjustments for this in r292871 to deal
with super/sub registers. However I did not look at aliased register
there. Looking at ARM:
FPSCR (regunits FPSCR, FPSCR~FPSCR_NZCV) aliases with FPSCR_NZCV
(regunits FPSCR_NZCV, FPSCR~FPSCR_NZCV) hence they share a register unit
(FPSCR~FPSCR_NZCV) that represents the aliased parts of the registers.
This shared register unit was previously considered non-reserved,
however given that we uses of the reserved FPSCR potentially violate
some rules (like uses without defs) we should make FPSCR~FPSCR_NZCV
reserved too and stop tracking liveness for it.
This patch:
- Defines a register unit as reserved when: At least for one root
register, the root register and all its super registers are reserved.
- Adjust LiveIntervals::computeRegUnitRange() for new reserved
definition.
- Add MachineRegisterInfo::isReservedRegUnit() to have a canonical way
of testing.
- Stop computing LiveRanges for reserved register units in HMEditor even
with UpdateFlags enabled.
- Skip verification of uses of reserved reg units in the machine
verifier (this usually didn't happen because there would be no cached
liverange but there is no guarantee for that and I would run into this
case before the HMEditor tweak, so may as well fix the verifier too).
Note that this should only affect ARMs FPSCR/FPSCR_NZCV registers today;
aliased registers are rarely used, the only other cases are hexagons
P0-P3/P3_0 and C8/USR pairs which are not mixing reserved/non-reserved
registers in an alias.
Differential Revision: https://reviews.llvm.org/D37356
llvm-svn: 312348
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes a bug that was exposed on gfx9 in various
GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests,
e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D36193
llvm-svn: 312337
|
| |
|
|
| |
llvm-svn: 312335
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
LoopVectorizer is creating casts between vec<ptr> and vec<float> types
on ARM when compiling OpenCV. Since, tIs is illegal to directly cast a
floating point type to a pointer type even if the types have same size
causing a crash. Fix the crash using a two-step casting by bitcasting
to integer and integer to pointer/float.
Fixes PR33804.
Reviewers: mkuper, Ayal, dlj, rengolin, srhines
Reviewed By: rengolin
Subscribers: aemerson, kristof.beyls, mkazantsev, Meinersbur, rengolin, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35498
llvm-svn: 312331
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issues addressed since original review:
- Moved removal of dead instructions found by
LiveIntervals::shrinkToUses() outside of loop iterating over
instructions to avoid instructions being deleted while pointed to by
iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
llvm-svn: 312328
|
| |
|
|
|
|
|
| |
In the ROPI relocation model, read-only variables are accessed relative
to the PC. We use the (MOV|LDRLIT)_ga_pcrel pseudoinstructions for this.
llvm-svn: 312323
|
| |
|
|
|
|
|
|
| |
Test constants as well in the PIC tests. These are also represented as
G_GLOBAL_VALUE, and although they are treated just like other globals
for PIC, they won't be for ROPI, so it's good to have this coverage.
llvm-svn: 312319
|
| |
|
|
| |
llvm-svn: 312309
|
| |
|
|
| |
llvm-svn: 312297
|
| |
|
|
|
|
|
|
|
| |
Not all of these will be able to be used by atomics because tablegen, but it
still seems like a good change by itself.
Differential Revision: https://reviews.llvm.org/D37345
llvm-svn: 312287
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Before, this commit caused a buildbot failure:
http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/6026/steps/test_llvm/logs/LLVM%20%3A%3A%20CodeGen__AArch64__machine-outliner-remarks.ll
This was caused by the Key value in DiagnosticInfoOptimizationBase being
deallocated before emitting the remarks defined in MachineOutliner.cpp. As of
r312277 this should no longer be an issue.
llvm-svn: 312280
|
| |
|
|
| |
llvm-svn: 312279
|