| Commit message | Author | Age | Files | Lines |
This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698.
That patch was checked in at r224611, but reverted at r225031 because it
caused a failure outside of the regression tests.
The cause of the crash was not recognizing consecutive stores that have mixed
source values (loads and vector element extracts), so this patch adds a check
to bail out if any store value is not coming from a vector element extract.
This patch also refactors the shared logic of the constant source and vector
extracted elements source cases into a helper function.
Differential Revision: http://reviews.llvm.org/D6850
llvm-svn: 226845
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
Differential Revision: http://reviews.llvm.org/D7093
Fixed recommit of r226811.
llvm-svn: 226816
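As a hedged sketch (a toy model, not the DAGCombiner code from this patch): the fold rests on the fact that any shuffle of a constant splat is still the same splat, so the shuffle node can be replaced by building the result vector directly.

```python
# Toy model of the fold: shuffling a constant splat can never produce
# anything but the same splat, so the shuffle can be dropped in favor
# of directly building the result vector (a BUILD_VECTOR in DAG terms).

def splat(value, lanes):
    return [value] * lanes

def shuffle(vec, mask):
    # Each mask entry selects a source lane.
    return [vec[i] for i in mask]

def fold_shuffle_of_splat(vec, mask):
    # If every lane of the source holds the same constant, build the
    # result directly instead of emitting a shuffle node.
    if len(set(vec)) == 1:
        return splat(vec[0], len(mask))
    return shuffle(vec, mask)

v = splat(7, 4)
assert fold_shuffle_of_splat(v, [3, 1, 0, 2]) == shuffle(v, [3, 1, 0, 2])
```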
llvm-svn: 226814
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
Differential Revision: http://reviews.llvm.org/D7093
llvm-svn: 226811
The problem occurs when after vectorization we have type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes to the cost estimation for masked memory
operations; they will be submitted separately.
llvm-svn: 226808
Type MVT::i1 became legal in KNL, but a store operation can't be narrowed to this type,
since the size of VT (1 bit) is not equal to its actual store size (8 bits).
Added a test provided by David (dag@cray.com).
llvm-svn: 226805
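The size mismatch behind the fix can be sketched like this (a simplified model of the type-size rules, not the patch itself): in-memory store size is rounded up to whole bytes, and narrowing a store is only valid when a type's bit size equals its store size in bits.

```python
# Simplified model: a type's store size is its bit size rounded up to
# whole bytes, so i1 (1 bit) actually occupies 8 bits in memory.

def store_size_in_bits(bits):
    return ((bits + 7) // 8) * 8

def can_narrow_store_to(bits):
    # Narrowing is only valid when the bit size matches the store size.
    return bits == store_size_in_bits(bits)

assert store_size_in_bits(1) == 8   # i1 occupies a full byte
assert not can_narrow_store_to(1)   # so narrowing a store to i1 is rejected
assert can_narrow_store_to(8)       # i8 is fine
```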
llvm-svn: 226768
llvm-svn: 226767
Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1.
The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests.
Differential Revision: http://reviews.llvm.org/D7094
llvm-svn: 226745
It can help with argument juggling on some targets, and is generally a good
idea.
llvm-svn: 226740
Make sure this uses the faster expansion using magic constants
to avoid the full division path.
llvm-svn: 226734
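The "magic constants" expansion referred to here is the standard multiply-and-shift trick for division by a constant; as an illustration (unsigned 32-bit division by 3, not code from this commit):

```python
# Standard compiler technique, shown for unsigned 32-bit division by 3:
# replace the divide with a multiply and a shift. 0xAAAAAAAB is
# ceil(2**33 / 3), and the identity holds for every 32-bit unsigned n.

MAGIC = 0xAAAAAAAB
SHIFT = 33

def udiv3(n):
    assert 0 <= n < 2**32
    return (n * MAGIC) >> SHIFT  # no hardware divide needed

for n in (0, 1, 2, 3, 6, 1000, 2**32 - 1):
    assert udiv3(n) == n // 3
```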
This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. Their big benefit is that they let many single-source shuffles avoid (pre-AVX) dual-source instructions such as SHUFPD/SHUFPS, which cause extra moves and prevent load folds.
Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad, which crashed on single-operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify these instructions as unary shuffles.
Also adds a missing tablegen pattern for MOVDDUP.
Differential Revision: http://reviews.llvm.org/D7042
llvm-svn: 226716
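For reference, the lane semantics of the three matched instructions (per their architectural definitions) can be modeled on plain lists; each reads a single source, which is why matching them lets these shuffles stay unary:

```python
# Lane semantics of the matched instructions, modeled on plain lists.
# Each duplicates alternate elements of one source vector.

def movddup(v):   # v2f64: duplicate the low double
    return [v[0], v[0]]

def movsldup(v):  # v4f32: duplicate the even (low) elements
    return [v[0], v[0], v[2], v[2]]

def movshdup(v):  # v4f32: duplicate the odd (high) elements
    return [v[1], v[1], v[3], v[3]]

assert movddup([1.0, 2.0]) == [1.0, 1.0]
assert movsldup([1.0, 2.0, 3.0, 4.0]) == [1.0, 1.0, 3.0, 3.0]
assert movshdup([1.0, 2.0, 3.0, 4.0]) == [2.0, 2.0, 4.0, 4.0]
```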
llvm-svn: 226713
Thumbv4t does not have lo->lo copies other than MOVS,
and that can't be predicated. So emit MOVS when needed
and bail if there's a predicate.
http://reviews.llvm.org/D6592
llvm-svn: 226711
This fixes it for SI. It also removes the pattern
used previously for Evergreen for f32. I'm not sure
if the new R600 output is better or not, but it uses
one fewer instruction if BFI is available.
llvm-svn: 226682
patterns.
llvm-svn: 226681
Now that we can fully specify extload legality, we can declare them
legal for the PMOVSX/PMOVZX instructions. This for instance enables
a DAGCombine to fire on code such as
(and (<zextload-equivalent> ...), <redundant mask>)
to turn it into:
(zextload ...)
as seen in the testcase changes.
There is one regression, in widen_load-2.ll: we're no longer able
to do store-to-load forwarding with illegal extload memory types.
This will be addressed separately.
Differential Revision: http://reviews.llvm.org/D6533
llvm-svn: 226676
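A sketch of why the mask is redundant (a model of the semantics, not the DAGCombine itself): a zero-extending 8-bit load already guarantees the upper bits of the wider value are zero, so a following AND with 0xFF cannot change anything.

```python
# Model of an 8-bit zero-extending load into a wider integer: the high
# bits are already zero, so masking with 0xFF afterwards is a no-op and
# the combine can drop the AND, leaving just the zextload.

def zextload_i8(byte):
    assert 0 <= byte <= 0xFF
    return byte  # high bits of the wider type are already zero

for b in range(256):
    v = zextload_i8(b)
    assert (v & 0xFF) == v  # the AND is redundant
```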
It hadn't gone through review yet, but was still on my local copy.
This reverts commit r226663
llvm-svn: 226665
AAPCS64 says that it's up to the platform to specify whether x18 is
reserved, and a first step in that direction is to add a flag controlling
it.
From: Andrew Turner <andrew@fubar.geek.nz>
llvm-svn: 226664
llvm-svn: 226663
in x32 mode.
llvm-svn: 226661
Changed the AVX1 tests' register spill tail call to return an xmm register like the SSE42 version - this makes diffs between them a lot easier without affecting the spills themselves.
llvm-svn: 226623
Some folding patterns + tests are missing (marked as TODO) - these will be added in a future patch for review.
llvm-svn: 226622
Some folding patterns + tests are missing (marked as TODO) - these will be added in a future patch for review.
llvm-svn: 226621
The SSE42 version of the AVX1 float stack folding tests will be added shortly; this renames the AVX1 file so that the two files will be near each other in a directory listing, to help ensure they are kept in sync.
llvm-svn: 226620
llvm-svn: 226606
llvm-svn: 226602
This addresses part of llvm.org/PR22262. Specifically, it prevents
considering the densities of sub-ranges that have fewer than
TLI.getMinimumJumpTableEntries() elements. Those densities won't help
jump tables.
This is not a complete solution but works around the most pressing
issue.
Review: http://reviews.llvm.org/D7070
llvm-svn: 226600
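A minimal sketch of the guard, with hypothetical helper names and an illustrative threshold (the real minimum comes from TLI.getMinimumJumpTableEntries(); this is not the SelectionDAG code):

```python
# Hypothetical model of the guard: a sub-range with fewer than the
# minimum number of entries can never form a jump table, so its density
# should not be considered at all.

MIN_JUMP_TABLE_ENTRIES = 4  # illustrative stand-in for the TLI value

def is_jump_table_candidate(case_values, min_density=0.4):
    if len(case_values) < MIN_JUMP_TABLE_ENTRIES:
        return False  # too few cases to ever become a jump table
    span = max(case_values) - min(case_values) + 1
    return len(case_values) / span >= min_density

assert not is_jump_table_candidate([1, 2])             # too few entries
assert is_jump_table_candidate([1, 2, 3, 4, 5])        # dense and large enough
assert not is_jump_table_candidate([1, 100, 200, 300]) # large enough but sparse
```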
With the appropriate Verifier changes, extracting the result out of a
statepoint wrapping a vararg function crashes. However, a void vararg
function works fine: commit this first step.
Differential Revision: http://reviews.llvm.org/D7071
llvm-svn: 226599
llvm-svn: 226596
This allows us to re-use the same register for the scratch offset
when accessing large private arrays.
llvm-svn: 226585
We don't have a good way of legalizing this if the frame index offset
is more than 12 bits, which is the size of MUBUF's offset field, so
now we store the frame index in the vaddr field.
llvm-svn: 226584
No functional change.
Reviewed by D. Sanders
llvm-svn: 226574
This commit adds the Octeon branch instructions bbit0/bbit032/bbit1/bbit132.
It also includes patterns for instruction selection and test cases.
Reviewed by D. Sanders
llvm-svn: 226573
Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for, including the scalar intrinsics (add, div, max, min, mul, sub), float/int to double conversions, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing).
Now that scalar folding is working, it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll). That test appears to make no sense: it tries to ensure that a scalar subtraction isn't folded because it 'would zero the top elts of the loaded vector' - the test just appears to be wrong to me.
Differential Revision: http://reviews.llvm.org/D7055
llvm-svn: 226513
compatible with i32 rather than doing custom type conversion.
llvm-svn: 226500
llvm-svn: 226483
llvm-svn: 226480
llvm-svn: 226478
Original patch by Luke Iannini. Minor improvements and test added by
Erik de Castro Lopo.
Differential Revision: http://reviews.llvm.org/D6877
From: Erik de Castro Lopo <erikd@mega-nerd.com>
llvm-svn: 226473
llvm-svn: 226472
No change in this commit, but clang was changed to also produce trivial comdats when
needed.
Original message:
Don't create new comdats in CodeGen.
This patch stops the implicit creation of comdats during codegen.
Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.
llvm-svn: 226467
source and dest are local
This fixes PR21792.
Differential Revision: http://reviews.llvm.org/D6823
llvm-svn: 226433
Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call
instructions in order to represent the dependency on the TOC base pointer
value. Restricting this to ELF V2, however, does not seem to make sense: calls
under ELF V1 have the same dependence, and indirect calls have an r2 dependence
just as direct ones do. Make sure the dependence is noted for all calls under both
ELF V1 and ELF V2.
llvm-svn: 226432
This is already covered in ftrunc.ll
llvm-svn: 226412
llvm-svn: 226406
llvm-svn: 226405
llvm-svn: 226404
Began adding more exhaustive tests - all floating point instructions should now be either tested or have placeholders. We do seem to have a number of missing instructions; I will add a patch for review once the remaining working instructions are added.
I'll then move on to SSE tests and then the integer instructions.
llvm-svn: 226400
The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is
designed to work with both prototyped and non-prototyped/varargs functions. As
a result, GPRs and stack space are allocated for every argument, even those
that are passed in floating-point or vector registers.
GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that
do not have their address taken) to use the 'fast' calling convention.
When functions are using the 'fast' calling convention, don't allocate GPRs for
arguments passed in other types of registers, and don't allocate stack space for
arguments passed in registers. Other changes for the fast calling convention
may be added in the future.
llvm-svn: 226399