| Commit message (Collapse) | Author | Age | Files | Lines |
|
Fixed fp_to_uint instruction selection on KNL.
One pattern was missing for <4 x double> to <4 x i32>
Differential Revision: http://reviews.llvm.org/D18512
llvm-svn: 264701
|
Instead of using two feature bits, one to indicate the availability of the
popcnt[dw] instructions, and another to indicate whether or not they're fast,
use a single enum. This allows more consistent control via target attribute
strings, and via Clang's command line.
llvm-svn: 264690
|
The minimum jump table density for both optsize and non-optsize functions is now configurable via options:
-sparse-jump-table-density (default 10) for non-optsize functions
-dense-jump-table-density (default 40) for optsize functions, which
matches the current default. This improves several benchmarks at Google
at the cost of a small code size increase. For code compiled with -Os,
the old behavior continues.
llvm-svn: 264689
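As a rough illustration of what a minimum-density threshold means, the sketch below treats density as the percentage of the case-value range that is actually occupied by cases. The function name and the exact comparison are assumptions made for illustration, not the code from this commit; only the defaults of 10 and 40 come from the message above.

```python
def is_dense_enough(case_values, min_density_percent):
    """Return True if the cases fill at least min_density_percent of their range.

    Hypothetical helper: models a density threshold, not LLVM's actual code.
    """
    distinct = set(case_values)
    value_range = max(distinct) - min(distinct) + 1
    # Integer form of: len(distinct) / value_range >= min_density_percent / 100
    return len(distinct) * 100 >= value_range * min_density_percent


# Ten cases spread over a range of 91 values (about 11% dense).
cases = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
sparse_ok = is_dense_enough(cases, 10)  # threshold for non-optsize functions
dense_ok = is_dense_enough(cases, 40)   # threshold for optsize functions
```

Under these assumptions the same switch would qualify for a jump table with the lower threshold but not with the higher one, which is the code size versus speed trade-off the message describes.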
|
llvm-svn: 264676
|
llvm-svn: 264674
|
llvm-svn: 264673
|
llvm-svn: 264672
|
llvm-svn: 264671
|
llvm-svn: 264670
|
for all their scalar elements.
If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors.
The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent.
It's not in our interest to start making a general-purpose vectorizer out of this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least.
Differential Revision: http://reviews.llvm.org/D18492
llvm-svn: 264666
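The equivalence this relies on can be sketched with plain lists standing in for vectors. This is a toy model under that assumption, not DAG code; the function names are hypothetical.

```python
import operator


def scalarized(xs, consts, op):
    # One scalar bit operation per element, then one "BUILD_VECTOR" of the results.
    return [op(x, c) for x, c in zip(xs, consts)]


def vectorized(xs, consts, op):
    # Two "BUILD_VECTOR"s (variables and constants), then a single
    # element-wise operation standing in for one vector instruction.
    var_vec = list(xs)
    const_vec = list(consts)  # all constants, so this folds to a constant vector
    return list(map(op, var_vec, const_vec))


xs = [0x12, 0x34, 0x56, 0x78]
consts = [0x0F, 0xF0, 0xFF, 0x01]
```

Both forms produce the same lanes for AND, OR and XOR; the second form needs only one non-constant vector build.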
|
llvm-svn: 264661
|
Differential Revision: http://reviews.llvm.org/D18279
llvm-svn: 264608
|
They do have a def machine operand.
Fixing the definition is necessary for an upcoming patch.
Differential Revision: http://reviews.llvm.org/D18384
llvm-svn: 264607
|
optimizing for minsize
Mimic what x86 does when optimizing sdiv/udiv for minsize.
llvm-svn: 264606
|
The A2 cores support the popcntw/popcntd instructions, but they're microcoded,
and slower than our default software emulation. Specifically, popcnt[dw] take
approximately 74 cycles, whereas our software emulation takes only 24-28
cycles.
I've added a new target feature to indicate a slow popcnt[dw], instead of just
removing the existing target feature from the a2/a2q processor models, because:
1. This allows us to return more accurate information via the TTI interface
(I recognize that this currently makes no practical difference)
2. It is hopefully easier to understand (it allows the core's features to match
its manual while still having the desired effect).
llvm-svn: 264600
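For readers unfamiliar with what a software population count looks like, here is the classic branch-free bit-twiddling version for 32-bit values. It is shown only as an example of the technique; whether the software emulation referred to above uses exactly this sequence is an assumption.

```python
def popcount32(x):
    """Count set bits in a 32-bit value using the classic SWAR method."""
    x &= 0xFFFFFFFF
    # Sum adjacent bits into 2-bit fields.
    x = x - ((x >> 1) & 0x55555555)
    # Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    # Sum adjacent 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0F
    # Add the four bytes together; the total lands in the top byte.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24
```

The sequence is a handful of shifts, masks, adds and one multiply, which is why it can beat a microcoded instruction.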
|
MachineFunctionProperties represents a set of properties that a MachineFunction
can have at particular points in time. Existing examples of this idea are
MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which
will eventually be switched to use this mechanism.
This change introduces the AllVRegsAllocated property; i.e. the property that
all virtual registers have been allocated and there are no VReg operands
left.
With this mechanism, passes can declare that they require a particular property
to be set, or that they set or clear properties by implementing e.g.
MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class
verifies that the requirements are met, and handles the setting and clearing
based on the declarations. Passes can also directly query and update the current
properties of the MF if they want to have conditional behavior.
This change annotates the target-independent post-regalloc passes; future
changes will also annotate target-specific ones.
Reviewers: qcolombet, hfinkel
Differential Revision: http://reviews.llvm.org/D18421
llvm-svn: 264593
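The require/set/clear mechanism can be modelled in a few lines. Everything below (class names, the driver function, the string used for the property) is a hypothetical Python model of the idea, not the C++ interface added by this change.

```python
class PropertyError(Exception):
    """Raised when a pass runs without a property it declared as required."""


class SketchPass:
    # Hypothetical stand-in for a pass; the three sets mirror the
    # "require / set / clear" declarations described above.
    def __init__(self, name, required=(), sets=(), clears=()):
        self.name = name
        self.required = frozenset(required)
        self.sets = frozenset(sets)
        self.clears = frozenset(clears)


def run_pass(pass_, properties):
    """Verify requirements, then return the function's updated property set."""
    missing = pass_.required - properties
    if missing:
        raise PropertyError(f"{pass_.name} requires {sorted(missing)}")
    return (properties | pass_.sets) - pass_.clears


regalloc = SketchPass("regalloc", sets={"AllVRegsAllocated"})
post_ra = SketchPass("post-ra-pass", required={"AllVRegsAllocated"})
```

In this model, running `post_ra` before `regalloc` fails the requirement check, while running it afterwards succeeds and leaves the property set unchanged.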
|
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.
This fix yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.
shader-db stats:
2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18451
llvm-svn: 264589
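The percentages in the shader-db totals are plain relative changes, rounded to two decimals. A small sketch for recomputing them (the helper name is an assumption; the numbers are taken from the table above):

```python
def percent_change(before, after):
    """Relative change from before to after, in percent, rounded to 2 decimals."""
    return round((after - before) / before * 100, 2)


totals = {
    "SGPRS": (49744, 48600),
    "VGPRS": (34120, 34076),
    "Code Size": (1282888, 1283184),
    "Scratch": (495616, 492544),
    "Max Waves": (6843, 6853),
}
changes = {name: percent_change(b, a) for name, (b, a) in totals.items()}
```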
|
llvm-svn: 264584
|
llvm-svn: 264581
|
Add the Lanai backend to lib/Target.
General Lanai backend discussion on llvm-dev thread "[RFC] Lanai backend" (http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html).
Differential Revision: http://reviews.llvm.org/D17011
llvm-svn: 264578
|
ICMP instruction selection fails on SKX and KNL for i1 operand.
I use XOR to resolve:
(A == B) is equivalent to (A xor B) == 0
Differential Revision: http://reviews.llvm.org/D18511
llvm-svn: 264566
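The identity used here is easy to check exhaustively for one-bit values. The function name below is hypothetical and the code is only a truth-table check, not the instruction selection pattern itself.

```python
def icmp_eq_i1(a, b):
    """Equality of two 1-bit values expressed as (a xor b) == 0."""
    return ((a ^ b) & 1) == 0


# Exhaustive check over all i1 operand pairs.
truth_table = {(a, b): icmp_eq_i1(a, b) for a in (0, 1) for b in (0, 1)}
```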
|
loop legality
Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc
function calls (fmax[f], etc.) generally map to function calls after lowering.
For some vector types with QPX at least, however, we can legally lower these,
and we don't need to prohibit CTR-based loops on their account.
It turned out, however, that the logic that checked the opcodes associated with
intrinsics was broken (it would set the Opcode variable, but that variable was
later checked only if set for some otherwise-external function call).
This fixes the latter problem and adds the FMAX/MINNUM mappings.
llvm-svn: 264532
|
Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization
llvm-svn: 264517
|
Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization
Differential Revision: http://reviews.llvm.org/D18307
llvm-svn: 264512
|
Currently this is mainly to prevent scalarization of integer division by constants.
Differential Revision: http://reviews.llvm.org/D18307
llvm-svn: 264511
|
llvm-svn: 264510
|
multiplies.
Only pre-AVX512BW targets need to split v32i8 vectors.
llvm-svn: 264509
|
The minnum and maxnum intrinsics get lowered to libcalls which
invalidates the CTR optimization.
This fixes PR27083.
llvm-svn: 264508
|
Add all 256-bit vector tests.
Added AVX512F/AVX512BW test targets.
Renamed tests to something more meaningful.
llvm-svn: 264507
|
We forgot to add the second machine operand to our ADJCALLSTACKDOWN,
resulting in crashes in PEI.
This fixes PR27071.
llvm-svn: 264465
|
regmasks
When encountering instructions with regmasks, instead of cleaning up all the
elements in MaybeDeadCopies map, remove only the instructions erased. By keeping
more instructions in MaybeDeadCopies, this change will expose more dead copies
across instructions with regmasks.
llvm-svn: 264462
|
When merging stores in DAGCombiner, add a check to ensure that no
dependencies exist that would cause the construction of a cycle in our
DAG. This may happen if one store has a data dependence on another
instruction (e.g. a load) which itself has a (chain) dependence on
another store being merged. These stores cannot be merged safely and
doing so results in a cycle that is discovered in LegalizeDAG.
This test is only done in cases where alias analysis is used (UseAA)
as non-AA store merge candidates will be merged logically after all
loads which have been checked to not alias.
Reviewers: ahatanak, spatel, niravd, arsenm, hfinkel, tstellarAMD, jyknight
Subscribers: llvm-commits, tberghammer, danalbert, srhines
Differential Revision: http://reviews.llvm.org/D18336
llvm-svn: 264461
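The kind of cycle described above can be illustrated on a toy dependence graph. The sketch below is an assumption-laden model (a dictionary of "depends on" edges and a hypothetical function name), not the check implemented in DAGCombiner: merging a set of stores is unsafe if one of them reaches another through a node that is not itself being merged.

```python
def merge_would_create_cycle(deps, stores):
    """deps maps each node to the nodes it depends on (data or chain).

    Returns True if some candidate store depends, via at least one
    non-candidate node, on another candidate store.
    """
    stores = set(stores)
    # Start from every non-store node that a candidate store depends on.
    worklist = [d for s in stores for d in deps.get(s, ()) if d not in stores]
    seen = set()
    while worklist:
        node = worklist.pop()
        if node in seen:
            continue
        seen.add(node)
        for d in deps.get(node, ()):
            if d in stores:
                return True  # the merged node would depend on itself
            worklist.append(d)
    return False


# store2 stores the value of load1, and load1 is chained after store1.
unsafe = {"store2": ["load1"], "load1": ["store1"], "store1": []}
# store2 is chained directly after store1 and stores an unrelated value.
safe = {"store2": ["store1", "add"], "add": [], "store1": []}
```

In the first graph, merging `store1` and `store2` into one node would make that node both a predecessor and a successor of `load1`.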
|
It is possible to have a fallthrough MBB prior to MBB placement. The original
addition of the BB would result in reordering the BB as not preceding the
successor. Because of the fallthrough nature of the BB, we could end up
executing incorrect code or even a constant pool island! Insert the spliced BB
into the same location to avoid that.
Thanks to Tim Northover for invaluable hints and Fiora for the discussion on
what may have been occurring!
llvm-svn: 264454
|
64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes,
respectively, whereas and/or with 8-bit immediate is only three bytes.
Since these instructions imply an additional memory read (which the CPU could
elide, but we don't think it does), restrict these patterns to minsize functions.
Differential Revision: http://reviews.llvm.org/D18374
llvm-svn: 264440
|
This is the same as r255936, with added logic for avoiding clobbering of the
red zone (PR26023).
Differential Revision: http://reviews.llvm.org/D18246
llvm-svn: 264375
|
We did not have an explicit branch to the continuation BB. When the check was
hoisted, this could permit control flow to fall through into the division
trap. Add the explicit branch to the continuation basic block to ensure that
code execution is correct.
llvm-svn: 264370
|
It is incorrect to get the corresponding MBB for a ReturnInst before
SelectAllBasicBlocks since SelectAllBasicBlocks can change the
correspondence between a ReturnInst and the MBB it is in.
PR27062
llvm-svn: 264358
|
Earlier we were ignoring varargs in LowerCallSiteWithDeoptBundle because
populateCallLoweringInfo does not set CallLoweringInfo::IsVarArg.
llvm-svn: 264354
|
This fixes http://llvm.org/PR26991
llvm-svn: 264345
|
making sure we give it a register and mark it as a register constraint.
llvm-svn: 264340
|
llvm-svn: 264339
|
Fixes an issue in rL264329.
llvm-svn: 264337
|
Summary:
Only adds support for "naked" calls to llvm.experimental.deoptimize.
Support for round-tripping through RewriteStatepointsForGC will come
as a separate patch (should be simpler than this one).
Reviewers: reames
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18429
llvm-svn: 264329
|
Patch by Sundeep Kushwaha.
llvm-svn: 264328
|
In PIC mode, the registers R14, R15 and R28 are reserved for use by
the PLT handling code. This causes all functions to clobber these
registers. While this is not new for regular function calls, it does
also apply to save/restore functions, which do not follow the standard
ABI conventions with respect to the volatile/non-volatile registers.
Patch by Jyotsna Verma.
llvm-svn: 264324
|
Given that StatepointLowering now uniques derived pointers before
putting them in the per-statepoint spill map, we may end up with missing
entries for derived pointers when we visit a gc.relocate on a pointer
that was de-duplicated away.
Fix this by keeping two maps, one mapping gc pointers to their
de-duplicated values, and one mapping a de-duplicated value to the slot
it is spilled in.
llvm-svn: 264320
|
llvm-svn: 264308
|
KTEST instruction may be used instead of TEST in this case:
%int_sel3 = bitcast <8 x i1> %sel3 to i8
%res = icmp eq i8 %int_sel3, zeroinitializer
br i1 %res, label %L2, label %L1
Differential Revision: http://reviews.llvm.org/D18444
llvm-svn: 264298
|
If the operation's type has been promoted during type legalization, we
need to account for the fact that the high bits of the comparison
operand are likely unspecified.
The LHS is usually zero-extended, but MIPS sign extends it, so we have
to be slightly careful.
Patch by Simon Dardis.
llvm-svn: 264296
|
Summary:
Some target lowerings of FP_TO_FP16, for instance ARM's vcvtb.f16.f32
instruction, do not guarantee that the top 16 bits are zeroed out.
Remove the unsafe AssertZext and add tests to exercise this.
Reviewers: jmolloy, sbaranga, kristof.beyls, aadg
Subscribers: llvm-commits, srhines, aemerson
Differential Revision: http://reviews.llvm.org/D18426
llvm-svn: 264285
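Why the zero-extension assertion is unsafe can be shown with a small model of a conversion that writes only the low 16 bits of a 32-bit register. The register model and function names are assumptions for illustration; the half-precision encoding comes from Python's `struct` module (format `e`).

```python
import struct


def f32_to_f16_bits(value):
    """Return the IEEE 754 binary16 bit pattern of value as an integer."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def convert_leaving_top_bits(value, old_register):
    # Model of a conversion that updates only the low half of the
    # destination register and leaves the upper 16 bits untouched.
    return (old_register & 0xFFFF0000) | f32_to_f16_bits(value)


reg = convert_leaving_top_bits(1.0, 0xDEADBEEF)
half_bits = reg & 0xFFFF  # an explicit mask recovers the converted value
```

Treating `reg` as already zero-extended would read stale upper bits along with the result, so a consumer has to mask or zero-extend explicitly.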
|