AVX2 versions of vector extract when AVX512VL is enabled.
llvm-svn: 270318
AVX512VL is enabled. Also add shuffle comment printing for AVX512VL VPERMPD/VPERMQ to keep some tests that now use these instructions instead of the AVX2 ones.
llvm-svn: 270317
is enabled.
llvm-svn: 270316
the instruction encodings and ensure everything is with EVEX.
llvm-svn: 270315
llvm-svn: 270313
Allocating larger register classes first should give better allocation
results (and more importantly for myself, make the lit tests more stable
with respect to scheduler changes).
Patch by Matthias Braun
llvm-svn: 270312
These are kind of a mess and hard to follow, particularly
for loads and stores. Fix various redundant, unnecessary
and dead settings.
llvm-svn: 270307
This is essentially doing a 24-bit signed division with FP.
We need to truncate to the N bit result.
llvm-svn: 270305
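The idea of dividing through floating point and then narrowing the quotient can be sketched as follows. This is a minimal illustration, not the backend's lowering: the helper names `sign_extend` and `sdiv_via_fp` are hypothetical, and Python floats are double precision, so the sketch only models the principle that operands fitting in 24 bits divide exactly in FP.

```python
def sign_extend(value, bits):
    """Keep the low `bits` bits of value and reinterpret them as signed."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    value &= mask
    return (value ^ sign) - sign


def sdiv_via_fp(a, b, bits=24):
    """Signed division done in floating point (hypothetical helper).

    Assumes a and b both fit in `bits` signed bits and b != 0.
    """
    if b == 0:
        raise ZeroDivisionError("division by zero")
    quotient = int(a / b)  # int() truncates toward zero, like sdiv
    # The raw quotient can need one more bit than the operands
    # (most negative value divided by -1), so narrow it to `bits` bits.
    return sign_extend(quotient, bits)
```

For example, `sdiv_via_fp(7, -2)` truncates toward zero, and dividing the most negative 24-bit value by -1 shows why the final narrowing step matters.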
The current SGPR spilling test does not stress this
because it is using s_buffer_load instructions to
increase SGPR pressure and spill, but their output
operands have the same SReg_32_XM0 constraint. This fixes
an error when the SReg_32 output from most instructions
is spilled.
llvm-svn: 270301
llvm-svn: 270297
llvm-svn: 270296
Original patch by Tom Stellard
llvm-svn: 270295
This saves a small amount of code size, and is a first small step toward
passing values on the stack across block boundaries.
Differential Review: http://reviews.llvm.org/D20450
llvm-svn: 270294
We now use LiveRangeCalc::extendToUses() instead of a specially designed
algorithm in constructMainRangeFromSubranges():
- The original motivation for constructMainRangeFromSubranges() was
differences between the main liverange and subranges because of hidden
dead definitions. This case however cannot happen anymore with the
DetectDeadLaneMasks pass in place.
- It simplifies the code.
- This fixes a longstanding bug where we did not properly create new SSA
values on merging control flow (the MachineVerifier missed most of
these cases).
- Move constructMainRangeFromSubranges() to LiveIntervalAnalysis and
LiveRangeCalc to better match the implementation/available helper
functions.
This re-applies r269016. The fixes from r270290 and r270259 should avoid
the machine verifier problems this time.
llvm-svn: 270291
It is fine for subregister ranges to be undefined on some CFG paths as
we may have a "vregX:other_subreg<read-undef> =" def on that path. We
do not (and should not) have live segments for the subregister ranges.
The MachineVerifier should not complain about this.
This is a slight variant of http://llvm.org/PR27705
llvm-svn: 270290
Differential Revision: http://reviews.llvm.org/D20311
llvm-svn: 270287
* Change reloc to PIC_;
* Cleanup (clang-format & modify test);
llvm-svn: 270282
Fix renameDisconnectedComponents() creating vreg uses that can be
reached from function begin without having a definition (or explicit
live-in). Fix this by inserting IMPLICIT_DEF instruction before
control-flow joins as necessary.
Removes an assert from MachineScheduler because we may now get
additional IMPLICIT_DEF when preparing the scheduling policy.
This fixes the underlying problem of http://llvm.org/PR27705
llvm-svn: 270259
Summary:
As this optimization converts two loads into one load with two shift instructions,
it could potentially hurt performance if a loop is arithmetic operation intensive.
Reviewers: t.p.northover, mcrosier, jmolloy
Subscribers: evandro, jmolloy, aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20172
llvm-svn: 270251
This patch is a first step towards a more extendible method of matching combined target shuffle masks.
Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple.
I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits.
Differential Revision: http://reviews.llvm.org/D19198
llvm-svn: 270230
llvm-svn: 270229
This refactors the logic in X86 to avoid code duplication. It also
splits it in two steps: it first decides if a symbol is local to the DSO
and then uses that information to decide how to access it.
The first part is implemented by shouldAssumeDSOLocal. It is not in any
way specific to X86. In a followup patch I intend to move it
somewhere common and reuse it in other backends.
llvm-svn: 270209
We now handle them just like non-hidden ones. This was already the case
on x86 (r207518) and arm (r207517).
llvm-svn: 270205
Note: This is specifically to allow GCC's test pr44707 to pass.
Trivial change, not put for differential revision. Test included.
llvm-svn: 270192
sure we don't break any older intrinsics.
llvm-svn: 270183
llvm-svn: 270175
supported. Without this we get isel failures on the avx-intrinsics-x86.ll test in AVX512VL.
llvm-svn: 270174
When matching an interleaved load to an ldN pattern, the interleaved access
pass checks that all users of the load are shuffles. If the load is used by an
instruction other than a shuffle, the pass gives up and an ldN is not
generated. This patch considers users of the load that are extractelement
instructions. It attempts to modify the extracts to use one of the available
shuffles rather than the load. After the transformation, the load is only used
by shuffles and will then be matched with an ldN pattern.
Differential Revision: http://reviews.llvm.org/D20250
llvm-svn: 270142
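The index arithmetic behind redirecting an extract from the wide load to one of the de-interleaving shuffles can be sketched as below. This is a simplified model under stated assumptions: it only covers the regular case where shuffle k selects elements k, k + factor, k + 2 * factor, and so on, and the names `deinterleave` and `rewrite_extract` are hypothetical, not functions of the pass.

```python
def deinterleave(wide, factor):
    """Model the shuffles: shuffle k takes elements k, k+factor, ..."""
    return [wide[k::factor] for k in range(factor)]


def rewrite_extract(index, factor, available_shuffles):
    """Map an extract of element `index` from the wide load to an
    extract from one of the shuffles (hypothetical helper).

    Returns (shuffle_number, lane), or None when the needed shuffle
    does not exist, in which case the extract cannot be rewritten.
    """
    shuffle, lane = index % factor, index // factor
    if shuffle not in available_shuffles:
        return None
    return shuffle, lane
```

With a factor of 2, element 5 of the wide vector is lane 2 of shuffle 1. If only shuffle 0 exists, `rewrite_extract` returns `None`, which corresponds to the load still having a non-shuffle user.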
Since the calls don't return, the instruction afterwards will never run,
and is just taking up unnecessary space in the binary.
Differential Revision: http://reviews.llvm.org/D20406
llvm-svn: 270109
llvm-svn: 270096
llvm-svn: 270081
llvm-svn: 270080
Mask0Imm and ~Mask1Imm must be equivalent and one of the MaskImms is a shifted
mask (e.g., 0x000ffff0). Both 'and's must have a single use.
This changes code like:
  and w8, w0, #0xffff000f
  and w9, w1, #0x0000fff0
  orr w0, w9, w8
into
  lsr w8, w1, #4
  bfi w0, w8, #4, #12
llvm-svn: 270063
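The equivalence of the two instruction sequences above can be checked with a small model of 32-bit registers. This is an illustrative sketch only: the `bfi` helper models the usual bitfield-insert semantics (copy the low `width` bits of the source into the destination starting at bit `lsb`), and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def and_or_form(w0, w1):
    """and/and/orr sequence with Mask0Imm = 0xffff000f, Mask1Imm = 0x0000fff0."""
    w8 = w0 & 0xFFFF000F
    w9 = w1 & 0x0000FFF0
    return (w9 | w8) & MASK32


def bfi(dst, src, lsb, width):
    """Insert the low `width` bits of src into dst at bit position lsb."""
    field = (1 << width) - 1
    cleared = dst & ~(field << lsb)
    return (cleared | ((src & field) << lsb)) & MASK32


def lsr_bfi_form(w0, w1):
    """lsr/bfi sequence: shift the field down, then insert it into w0."""
    w8 = (w1 & MASK32) >> 4
    return bfi(w0, w8, 4, 12)
```

Both forms keep bits 4..15 of `w1` and every other bit of `w0`, which is why the requirement that Mask0Imm equal ~Mask1Imm is needed for the rewrite.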
- Renamed intrinsics.ll to intrinsics-coprocessor.ll
as all the tests were testing coprocessor instructions,
also made the test checks match the full instruction.
Differential Revision: http://reviews.llvm.org/D20393
llvm-svn: 270057
llvm-svn: 270046
llvm-svn: 270041
verifier.
Summary: Partially fixes PR27458
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: http://reviews.llvm.org/D20330
llvm-svn: 270037
Enable "Remove Redundant LEAs" part of the LEA optimization pass for -O2.
This gives a 6.4% performance improvement on Broadwell on the nnet benchmark from Coremark-pro.
There is no significant effect on other benchmarks (Geekbench, Spec2000, Spec2006).
Differential Revision: http://reviews.llvm.org/D19659
llvm-svn: 270036
llvm-svn: 270011
If the load has a pointer type, we don't want to change
its type.
llvm-svn: 270000
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been mapped to the
default value, which now clearly has to be one of the remaining 3
options.
llvm-svn: 269988
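The pattern of replacing a Default enum member with an optional value can be sketched in Python as below. The member names are an assumption chosen to resemble the three remaining relocation models, and `effective_reloc` is a hypothetical helper, not an LLVM function.

```python
from enum import Enum
from typing import Optional


class Reloc(Enum):
    # Assumed names for the three remaining options; there is no
    # DEFAULT member any more.
    STATIC = "static"
    PIC = "pic"
    DYNAMIC_NO_PIC = "dynamic-no-pic"


def effective_reloc(requested: Optional[Reloc], target_default: Reloc) -> Reloc:
    """Resolve user input: None means "nothing was requested yet"."""
    return target_default if requested is None else requested
```

"Not specified" now lives in the type (`None`) rather than in the enum, so every `Reloc` value that reaches later code is one of the concrete models.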
When looking for an available spill slot, the register scavenger would stop
after finding the first one with no register assigned to it. That slot may
have size and alignment that do not meet the requirements of the register
that is to be spilled. Instead, find an available slot that is the closest
in size and alignment to one that is needed to spill a register from RC.
Differential Revision: http://reviews.llvm.org/D20295
llvm-svn: 269969
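The difference between taking the first free slot and taking the closest fit can be sketched as follows. This is a minimal model under assumptions: the `Slot` type, the `pick_slot` helper, and the cost function (sum of the size and alignment excess) are hypothetical and only illustrate the selection idea, not the register scavenger's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    size: int
    align: int
    in_use: bool = False


def pick_slot(slots: List[Slot], need_size: int, need_align: int) -> Optional[int]:
    """Return the index of the free slot closest in size and alignment
    to what is needed, skipping slots that are too small."""
    best_index = None
    best_cost = None
    for index, slot in enumerate(slots):
        if slot.in_use or slot.size < need_size or slot.align < need_align:
            continue
        # Assumed distance metric: total excess over the requirement.
        cost = (slot.size - need_size) + (slot.align - need_align)
        if best_cost is None or cost < best_cost:
            best_index, best_cost = index, cost
    return best_index
```

With slots of sizes 4, 16 and 8 and a register needing 8 bytes, a first-free strategy would return the 4-byte slot, whereas this selection returns the exact 8-byte match.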
clang/test/CodeGen/sse2-builtins.c
llvm-svn: 269966
We can chain bcnt instructions together, so
any width popcnt is pretty fast.
llvm-svn: 269950
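Chaining can be modeled as below, assuming the bcnt instruction counts the set bits of a 32-bit operand and adds a second operand to the count. The `bcnt` and `popcount_wide` names are hypothetical helpers for illustration.

```python
def bcnt(src, acc):
    """Assumed semantics: popcount of the low 32 bits of src, plus acc."""
    return bin(src & 0xFFFFFFFF).count("1") + acc


def popcount_wide(value, width):
    """Popcount of a `width`-bit value, one chained bcnt per 32-bit piece."""
    value &= (1 << width) - 1
    total = 0
    for shift in range(0, width, 32):
        # Each step feeds the running total back in as the addend.
        total = bcnt(value >> shift, total)
    return total
```

A 64-bit popcount then needs two chained operations and a 128-bit one needs four, with no separate add instructions in between.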
instructions"
with an additional fix to make RegAllocFast ignore undef physreg uses. It would
previously get confused about the "push %eax" instruction's use of eax. That
method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate
as well, but since that runs after register-allocation, we didn't run into the
RegAllocFast issue before.
llvm-svn: 269949
For some reason an assert is now hit when a valid chain
is not returned, so return the entry chain.
llvm-svn: 269948
If the second pointer in a multi-pointer instruction is
a constant, we can replace the type.
llvm-svn: 269945
Fix minor bugs and uses of undef which break when
pointer related optimization passes are run.
llvm-svn: 269944
llvm-svn: 269933
Don't expand divisions by constants if it would require multiple instructions.
The current assumption is that engines will perform the desired optimizations.
llvm-svn: 269930