| Commit message | Author | Age | Files | Lines |

llvm-svn: 176412
llvm-svn: 176411
Mark them as Expand; they are not legal, as our backend does not match them.
llvm-svn: 176410
This adds minimalistic support for PHI nodes to llvm.objectsize() evaluation.
Fingers crossed that it does not break the clang bootstrap again.
llvm-svn: 176408
This is similar to getObjectSize(), but doesn't subtract the offset.
Tweak the BasicAA code accordingly (per PR14988).
llvm-svn: 176407
This matters, for example, in the following matrix multiply:

  int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
    int i, j, k, val;
    for (i = 0; i < rows; i++) {
      for (j = 0; j < cols; j++) {
        val = 0;
        for (k = 0; k < cols; k++) {
          val += m1[i][k] * m2[k][j];
        }
        m3[i][j] = val;
      }
    }
    return m3;
  }
Taken from the test-suite benchmark Shootout.
We estimate the cost of the multiply to be 2, while we generate 9 instructions
for it and end up quite a bit slower than the scalar version (48% on my
machine).
Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split the
vector into two 128-bit halves and handle the subvector multiplies as above,
with 9 instructions.
Only on AVX2 will we have a cost of 9 for v4i64.
I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul, because with a mul we now no longer vectorize. I did
verify that the mul would indeed be more expensive when vectorized, with 3
kernels:

  for (i ...)
    r += a[i] * 3;

  for (i ...)
    m1[i] = m1[i] * 3; // This matches the test case in avx1.ll

and a matrix multiply.
In each case the vectorized version was considerably slower.
radar://13304919
llvm-svn: 176403
llvm-svn: 176400
The LoopVectorizer often runs multiple times on the same function due to inlining.
When this happens, it may vectorize the same loops multiple times, increasing code size and adding unneeded branches.
With this patch, the vectorizer puts metadata on scalar loops during vectorization and marks them as 'already vectorized' so that it knows to ignore them when it sees them a second time.
PR14448.
llvm-svn: 176399
llvm-svn: 176397
llvm-svn: 176391
The sys::fs::is_directory() check is unnecessary because, if the filename is
a directory, the function will fail anyway with the same error code returned.
Remove the check to avoid an unnecessary stat call.
Someone needs to review this on Windows and see whether the check is necessary there.
llvm-svn: 176386
This patch eliminates the need to emit a constant move instruction when this
pattern is matched:
(select (setgt a, Constant), T, F)
The pattern above effectively turns into this:
(conditional-move (setlt a, Constant + 1), F, T)
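The rewrite relies on the integer identity a > C ⟺ ¬(a < C + 1). A minimal Python sketch of the equivalence (illustrative only, not the backend code; it assumes Constant + 1 does not wrap at the maximum integer value):

```python
def select_gt(a, c, t, f):
    # Original pattern: (select (setgt a, Constant), T, F)
    return t if a > c else f

def cmov_lt(a, c, t, f):
    # Rewritten pattern: (conditional-move (setlt a, Constant + 1), F, T)
    return f if a < c + 1 else t

# Equivalent for all integers, as long as Constant + 1 does not wrap.
for a in range(-4, 5):
    for c in range(-3, 4):
        assert select_gt(a, c, "T", "F") == cmov_lt(a, c, "T", "F")
```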
llvm-svn: 176384
llvm-svn: 176382
llvm-svn: 176380
llvm-svn: 176378
handle indirect register inputs.
rdar://13322011
llvm-svn: 176367
Fixes PR15384.
llvm-svn: 176366
LLVMContext.
This reduces the time actually spent doing string to ID conversion and shows a 10% improvement in compile time for a particularly bad case that involves ARM Neon intrinsics (these have many overloads).
Patch by Jean-Luc Duprat!
llvm-svn: 176365
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
  but TLI.getShiftAmountTy() so far only returns a scalar type. As a
  result, backend logic assuming that breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return the target-specified scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic that assumes TLI.getShiftAmountTy() returns a simple
  scalar type.
llvm-svn: 176364
dispatch code. As far as I can tell the thumb2 code is behaving as expected.
I was able to compile and run the associated test case for both arm and thumb1.
rdar://13066352
llvm-svn: 176363
llvm-svn: 176358
v2: based on Michel's patch, but now allows copying of all register sizes.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176346
They won't match anyway.
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176345
It's much easier to specify the encoding with tablegen directly.
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176344
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176343
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176342
llvm-svn: 176341
llvm-svn: 176330
successor basic blocks.
Currently this is off by default.
llvm-svn: 176329
terminator.
No functionality change.
llvm-svn: 176326
This function will be used later when the capability to search delay slot
filling instructions in successor blocks is added. No intended functionality
changes.
llvm-svn: 176325
llvm-svn: 176321
can fill the delay slot.
Currently, this is off by default.
llvm-svn: 176320
No functionality change.
llvm-svn: 176318
llvm-svn: 176317
llvm-svn: 176316
We avoided computing DAG height/depth during Node printing because it
shouldn't depend on an otherwise valid DAG. But this has become far
too annoying for the common case of a valid DAG where we want to see
valid values. If doing the computation on-the-fly turns out to be a
problem in practice, then I'll add a mode to the diagnostics to only
force it when we're likely to have a valid DAG, otherwise explicitly
print INVALID instead of bogus numbers. For now, just go for it all
the time.
llvm-svn: 176314
This class tracks dependence between memory instructions using underlying
objects of memory operands.
llvm-svn: 176313
SelectionDAGIsel::LowerArguments needs a function, not a basic block. So it
makes sense to pass it the function instead of extracting a basic-block from
the function and then tossing it. This is also more self-documenting (functions
have arguments, BBs don't).
In addition, added comments to a couple of Select* methods.
llvm-svn: 176305
This was causing the folding set to fail to fold attributes, because it was
being calculated in one spot without an empty values string, but here with an
empty values string.
llvm-svn: 176301
The instcombine-recognized pattern looks like:

  a = b * c
  d = a +/- Cst

or

  a = b * c
  d = Cst +/- a

When creating the new operands for the fadd or fsub instruction following the related fmul, the first operand was created from the second original operand (M0 was created with C1) and the second from the first (M1 with Opnd0).
The fix consists in creating the new operands with the appropriate original operand, i.e., M0 with Opnd0 and M1 with C1.
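A minimal sketch (in Python, not the actual C++ InstCombine code; the function names are illustrative) of why the operand order matters once an fsub is involved:

```python
def correct_sub(opnd0, c1):
    # Intended operand order: M0 built from Opnd0, M1 from C1.
    return opnd0 - c1

def swapped_sub(opnd0, c1):
    # The buggy code swapped the sources: M0 from C1, M1 from Opnd0.
    return c1 - opnd0

# Subtraction is not commutative, so the swap changes the result.
assert correct_sub(10.0, 3.0) == 7.0
assert swapped_sub(10.0, 3.0) == -7.0
```

For fadd the swap is harmless, which is why the bug only showed up on the subtraction forms of the pattern.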
llvm-svn: 176300
our bitwise compare is equal to the field we're looking for.
Noticed on inspection.
llvm-svn: 176296
that doesn't exist.
llvm-svn: 176289
llvm-svn: 176288
We make the cost of calling libm functions extremely high, as emitting the
calls is expensive and causes spills (on x86), so performance suffers. We still
vectorize important calls like ceilf and friends on SSE4.1, and fabs.
Differential Revision: http://llvm-reviews.chandlerc.com/D466
llvm-svn: 176287
llvm-svn: 176285
other per-instruction statistics.
llvm-svn: 176273
llvm-svn: 176270
The work done by the post-encoder (setting architecturally unused bits to 0 as
required) can be done by the existing operand that covers the "#0.0". This
removes at least one of the discouraged PostEncoderMethod uses.
llvm-svn: 176261
If an otherwise weak var is actually defined in this unit, it can't be
undefined at runtime so we can use normal global variable sequences (ADRP/ADD)
to access it.
llvm-svn: 176259