| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
| |
It appears that we haven't been prioritizing rules that contain nested
instructions properly. InstructionOperandMatcher didn't override
isHigherPriorityThan so it never compared the instructions/operands/predicates
inside nested instructions.
Fixes PR35926. Thanks to Diana Picus for the bug report.
llvm-svn: 322754
|
| |
|
|
|
|
| |
Failing build bots. Revert the commit now.
llvm-svn: 322748
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trying to link
__attribute__((weak, visibility("hidden"))) extern int foo;
int *main(void) {
return &foo;
}
on OS X fails with
ld: 32-bit RIP relative reference out of range (-4294971318 max is +/-2GB): from _main (0x100000FAB) to _foo@0x00001000 (0x00000000) in '_main' from test.o for architecture x86_64
The problem being that 0 cannot be computed as a fixed difference from
%rip. Exactly the same issue exists on ELF and we can use the same
solution.
llvm-svn: 322739
|
| |
|
|
|
|
|
|
| |
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG.
Differential revision: https://reviews.llvm.org/D40922
llvm-svn: 322738
|
| |
|
|
|
|
|
|
| |
The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with a "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other).
Differential revision: https://reviews.llvm.org/D38378
llvm-svn: 322737
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
elements and use a 64-bit broadcast
If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done.
We could probably could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement.
I've also restricted this to AVX2 only because we can only broadcast loads under AVX.
Differential Revision: https://reviews.llvm.org/D42086
llvm-svn: 322730
|
| |
|
|
|
|
|
|
|
|
|
|
| |
introduce bitcasts to i64 in 32-bit mode
We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32 which is fine.
The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to cleanup the bitcast.
This fixes pr35972.
llvm-svn: 322724
|
| |
|
|
| |
llvm-svn: 322723
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
candidates with coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
|
| |
|
|
|
|
|
|
| |
BRCTH is capable of a long branch which needs to be recognized during branch
relaxation. This is done by checking for ExtraRelaxSize == 0.
Review: Ulrich Weigand
llvm-svn: 322688
|
| |
|
|
|
|
|
| |
G_FPEXT and G_FPTRUNC are handled by TableGen'erated code, just add
tests.
llvm-svn: 322665
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Loading a vector of 4 half-precision FP sometimes results in an LD1
of 2 single-precision FP + a reversal. This results in an incorrect
byte swap due to the conversion from little endian to big endian.
In order to generate the correct byte swap, it is easier to
generate the correct LD1 of 4 half-precision FP, thus avoiding the
subsequent reversal.
Reviewers: craig.topper, jmolloy, olista01
Reviewed By: olista01
Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D41863
llvm-svn: 322663
|
| |
|
|
| |
llvm-svn: 322657
|
| |
|
|
|
|
|
|
|
|
|
|
| |
added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D37985
llvm-svn: 322656
|
| |
|
|
|
|
|
|
|
|
|
| |
Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware
support, but only for conversions between float and double.
Also add the necessary boilerplate so that the LegalizerHelper can
introduce the required libcalls. This also works only for float and
double, but isn't too difficult to extend when the need arises.
llvm-svn: 322651
|
| |
|
|
|
|
|
|
|
|
| |
The match* functions have the annoying behavior of modifying its inputs.
Save and restore the inputs, just in case the early out for AVX512 is
hit. This is still not great and its only a matter of time this kind of
bug happens again, but I couldn't come up with a better pattern without
rewriting significant chunks of this code. Fixes PR35977.
llvm-svn: 322644
|
| |
|
|
|
|
| |
Possible missed opportunity to use 64-bit lane permute on AVX1 in lowerShuffleAsRepeatedMaskAndLanePermute
llvm-svn: 322628
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D42102
llvm-svn: 322623
|
| |
|
|
|
|
|
|
| |
Change symbol values in the stack_size section from being 8 bytes, to being a target dependent size.
Differential Revision: https://reviews.llvm.org/D42108
llvm-svn: 322619
|
| |
|
|
|
|
| |
For some reason they don't have a trailing i like the packed equivalents.
llvm-svn: 322600
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds CustomRenderer which renders the matched
operands to the specified instruction.
Targets can enable the matching of SDNodeXForm by adding
a definition that inherits from GICustomOperandRenderer and
GISDNodeXFormEquiv as follows.
def gi_imm8 : GICustomOperandRenderer<"renderImm8”>,
GISDNodeXFormEquiv<imm8_xform>;
Custom renderer functions should be of the form:
void render(MachineInstrBuilder &MIB, const MachineInstr &I);
Reviewers: dsanders, ab, rovka
Reviewed By: dsanders
Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet
Differential Revision: https://reviews.llvm.org/D42012
llvm-svn: 322582
|
| |
|
|
| |
llvm-svn: 322574
|
| |
|
|
|
|
| |
Extend the MMX zero code to take any constant with zero'd upper 32-bits
llvm-svn: 322553
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch unifies the printing of address ranges as [0x0, 0x1).
rdar://34822059
Reviewers: aprantl, dblaikie
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D42056
llvm-svn: 322543
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As commented on the existing code:
// The Reg operand should be a virtual register, which is defined
// outside the current basic block. DAG combiner has done a pretty
// good job in removing truncating inside a single basic block.
However, when the Reg operand comes from bpf_load_[byte | half | word]
intrinsics, the generic optimizer doesn't understand their results are
zero extended, so these single basic block elimination opportunities were
missed.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 322534
|
| |
|
|
|
|
|
|
|
|
| |
As mentioned on PR35869, (and came up recently on D41517) we don't create a MMX zero register via the PXOR but instead perform a spill to stack from a XMM zero register.
This patch adds support for direct MMX zero vector creation and should make it easier to add better constant vector creation in the future as well.
Differential Revision: https://reviews.llvm.org/D41908
llvm-svn: 322525
|
| |
|
|
|
|
|
|
|
|
| |
BLENDPD/BLENDPS/PBLENDD/PBLENDW (PR34873)
Add support for custom execution domain fixing and implement support for BLENDPD/BLENDPS/PBLENDD/PBLENDW.
Differential Revision: https://reviews.llvm.org/D42042
llvm-svn: 322524
|
| |
|
|
| |
llvm-svn: 322523
|
| |
|
|
| |
llvm-svn: 322522
|
| |
|
|
| |
llvm-svn: 322521
|
| |
|
|
| |
llvm-svn: 322519
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D41617
llvm-svn: 322500
|
| |
|
|
|
|
|
| |
The old implementation was not always correct. The new one recognizes
more shuffles that match specific instructions.
llvm-svn: 322498
|
| |
|
|
|
|
|
|
|
|
| |
Since a load and test instruction treat its operands as signed, it can only
replace a logical compare for EQ/NE uses.
Review: Ulrich Weigand
https://bugs.llvm.org/show_bug.cgi?id=35662
llvm-svn: 322488
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D40067
llvm-svn: 322485
|
| |
|
|
|
|
|
| |
This reverts commit r322085. Internal PPC testing is still showing the
same symptoms as when this patch landed the last time.
llvm-svn: 322474
|
| |
|
|
|
|
| |
The test was added ages ago, but we didn't comment where it came from.
llvm-svn: 322465
|
| |
|
|
| |
llvm-svn: 322464
|
| |
|
|
| |
llvm-svn: 322463
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
scalar operations
Summary: This patch changes the kunpck intrinsic autoupgrade to use vXi1 shufflevector operations to perform vector extracts and concats. This more closely matches the definition of the kunpck instructions. Currently we rely on a DAG combine to turn the scalar shift/and/or code into a concat vectors operation. By doing it in the IR we get this for free.
Reviewers: spatel, RKSimon, zvi, jina.nahias
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42018
llvm-svn: 322462
|
| |
|
|
| |
llvm-svn: 322460
|
| |
|
|
| |
llvm-svn: 322459
|
| |
|
|
|
|
| |
Shows a missed opportunity to remove a unnecessary move compared to 31 shuffle mask.
llvm-svn: 322458
|
| |
|
|
| |
llvm-svn: 322457
|
| |
|
|
|
|
| |
types have the same number of elements.
llvm-svn: 322455
|
| |
|
|
|
|
|
|
| |
We have to take special care to avoid the cases where the result of the truncate would be padded with zero elements.
Ideally we'd just use ISD::TRUNCATE for these cases instead.
llvm-svn: 322454
|
| |
|
|
|
|
|
|
| |
Extend vXi1 conditions of vXi8/vXi16 selects even before type legalization gets a chance to split wide vectors. Previously we would only extend 128 and 256 bit vectors. But if we start with a 512 bit vector or wider that needs to be split we wouldn't extend until after the split had taken place. By extending early we improve the results of type legalization.
Don't widen condition of 128/256 bit vXi16/vXi8 selects when we have BWI but not VLX. We can still use a mask register by widening the select to 512-bits instead. This is similar to what we do for compares already.
llvm-svn: 322450
|
| |
|
|
|
|
|
|
| |
additional test cases.
Additional test cases cover selects with i16/i8 conditions that are only 128/256-bits wide, but the compares are 512-bits wide and can only produce k-registers. We should be able to artificially widen the selects to avoid moving the k-register to an xmm/ymm register.
llvm-svn: 322449
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In addition to the existing match as part of a loop-reduction, add a
straightforward pattern match for DAG-contained patterns.
Reviewers: RKSimon, craig.topper
Subscribers: llvm-commits
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D41811
llvm-svn: 322446
|
| |
|
|
| |
llvm-svn: 322444
|