| Commit message | Author | Age | Files | Lines |
| ... | |
| |
|
|
| |
llvm-svn: 193320
|
| |
|
|
|
|
|
|
|
| |
When generating the IfTrue basic block during F128CSEL pseudo-instruction
handling, NZCV wasn't being added as a live-in of the newly created BB. This
caused a fault during MI-sched/live range calculation, which expects the
predecessor of the fall-through BB to have the phys-reg as a live-in.
llvm-svn: 193316
|
| |
|
|
| |
llvm-svn: 193312
|
| |
|
|
| |
llvm-svn: 193311
|
| |
|
|
| |
llvm-svn: 193310
|
| |
|
|
|
|
| |
cases. Some of these may have been tested by other tests, but most weren't. Patch by Cameron McInally.
llvm-svn: 193309
|
| |
|
|
|
|
| |
Also update the cost model.
llvm-svn: 193270
|
| |
|
|
| |
On Sandy Bridge (PR17654) we now get
  vpxor %xmm1, %xmm1, %xmm1
  vpunpckhbw %xmm1, %xmm0, %xmm2
  vpunpcklbw %xmm1, %xmm0, %xmm0
  vinsertf128 $1, %xmm2, %ymm0, %ymm0
On Haswell it's a simple
  vpmovzxbw %xmm0, %ymm0
There is a maze of duplicated and dead transforms and patterns in this
area. Remove the dead custom lowering of zext v8i16 to v8i32, which is
already handled by LowerAVXExtend.
llvm-svn: 193262
|
| |
|
|
|
|
|
|
|
| |
- Skip instructions added in the prolog. On some targets, the prolog may
  insert helper function calls (e.g. _chkstk is called when more than
  4K bytes are allocated on the stack). However, these helpers don't
  use/def YMM/XMM registers.
llvm-svn: 193261
|
| |
|
|
|
|
| |
intrinsics)
llvm-svn: 193239
|
| |
|
|
| |
This fixes piglit:
- shaders/glsl-fs-texture2d-masked
- shaders/glsl-fs-texture2d-masked-4
Patch by: Marek Olšák
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 193222
|
| |
|
|
|
|
|
|
|
|
| |
The SelectionDAGBuilder was promoting vector kernel arguments to legal
types, but this won't work for R600 and SI since kernel arguments are
stored in memory and can't be promoted. In order to handle vector
arguments correctly we need to look at the original types from the LLVM IR
function.
llvm-svn: 193215
|
| |
|
|
| |
llvm-svn: 193213
|
| |
|
|
| |
llvm-svn: 193212
|
| |
|
|
|
| |
The AMDGPUIndirectAddressing pass was previously responsible for
lowering private loads and stores to indirect addressing instructions.
However, this pass was buggy and way too complicated. The only
advantage it had over the new simplified code was that it saved one
instruction per direct write to private memory. This optimization
likely has a minimal impact on performance, and we may be able
to duplicate it using some other transformation.
For the private address space, we now:
1. Lower private loads/stores to Register(Load|Store) instructions
2. Reserve part of the register file as 'private memory'
3. After regalloc, lower the Register(Load|Store) instructions to
   MOV instructions that use indirect addressing.
llvm-svn: 193179
|
| |
|
|
| |
llvm-svn: 193156
|
| |
|
|
|
|
| |
PR3168
llvm-svn: 193152
|
| |
|
|
| |
llvm-svn: 193130
|
| |
|
|
| |
llvm-svn: 193117
|
| |
|
|
| |
llvm-svn: 193112
|
| |
|
|
| |
llvm-svn: 193111
|
| |
|
|
| |
the instruction definitions and ISEL reflect this.
Prior to this patch these instructions took an i32i8imm, and the high bits were
dropped during encoding. This led to incorrect behavior for shifts by
immediates higher than 255. This patch fixes that issue by detecting large
immediate shifts and returning constant zero (for logical shifts) or capping
the shift amount at an encodable value (for arithmetic shifts).
Fixes <rdar://problem/14968098>
llvm-svn: 193096
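The immediate handling described in the message above can be sketched as a small standalone model. This is only an illustration, not the code from the patch: `ShiftKind`, `FoldedShift`, `truncatedImm`, and `foldShiftImm` are hypothetical names, and the cap of 255 is assumed here simply because it is the largest value an 8-bit immediate can encode.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the fix; none of these names come from the patch.
enum class ShiftKind { Logical, Arithmetic };

struct FoldedShift {
  bool IsConstantZero; // true: the whole shift is replaced by constant zero
  uint8_t Imm8;        // otherwise: the immediate that ends up encoded
};

// What the old encoding effectively did: keep only the low 8 bits.
inline uint8_t truncatedImm(uint32_t Amount) {
  return static_cast<uint8_t>(Amount & 0xFF);
}

// Model of the new behavior for a shift by a constant amount.
inline FoldedShift foldShiftImm(ShiftKind Kind, uint32_t Amount) {
  if (Amount <= 255)
    return {false, static_cast<uint8_t>(Amount)}; // encodable as-is
  if (Kind == ShiftKind::Logical)
    return {true, 0};                             // every bit is shifted out
  return {false, 255};                            // assumed cap for arithmetic shifts
}
```

In this model, `truncatedImm(256)` is 0, which is why the old encoding turned a shift by 256 into a shift by 0. `foldShiftImm(ShiftKind::Logical, 256)` reports constant zero instead, and `foldShiftImm(ShiftKind::Arithmetic, 256)` keeps the shift with the immediate 255.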
|
| |
|
|
| |
llvm-svn: 193083
|
| |
|
|
|
|
|
| |
The second parameter of the SLD intrinsic is the number of columns (GPR) by
which to slide the source array to the left.
llvm-svn: 193076
|
| |
|
|
|
|
|
|
|
| |
This ensures that the prefix data is treated as part of the function for
the purpose of debug info. This provides a better debugging experience,
among other things by allowing a debug info client to correctly look up
a function in debug info given a function pointer.
llvm-svn: 193042
|
| |
|
|
| |
llvm-svn: 193021
|
| |
|
|
|
|
| |
Forgot to 'svn add'
llvm-svn: 192978
|
| |
|
|
| |
PR17168 describes a test case that fails when compiling for debug with
fast-isel. Investigation showed that the test was failing because a DBG_VALUE
machine instruction was placed prior to a PHI.
For this problem to occur requires the following:
* Compile for debug
* Compile with fast-isel
* In a block B, fast-isel must partially succeed before punting to DAG-isel
* B must start with a PHI
* The first unhandled node in the DAG must not generate a machine instruction
* A debug value with an order less than that of that first node exists
When all of these circumstances apply, the existing test that an instruction
was not inserted won't fire. Currently it tests whether the block is empty,
or whether the last instruction generated is a phi. When fast-isel has
partially succeeded, the last instruction generated will not be a phi.
Instead, we need to check whether the current insert position is immediately
following a phi. This patch adds that check, and adds the test case from the
PR as a regression test.
llvm-svn: 192976
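The difference between the two checks described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the SelectionDAG code: a block is modeled as a plain vector of opcodes, `oldNothingInserted` and `insertPosFollowsPhi` are hypothetical names, and the example layout (instructions already produced by fast-isel sitting after the insert position) is an assumption of the model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: only the distinction between PHI and non-PHI matters here.
enum class Opcode { PHI, Other };
using Block = std::vector<Opcode>;

// Old test as described in the message: the block is empty, or the last
// instruction generated is a PHI.
inline bool oldNothingInserted(const Block &B) {
  return B.empty() || B.back() == Opcode::PHI;
}

// Additional test: the instruction immediately before the current insert
// position is a PHI. InsertPos is an index into the block (hypothetical).
inline bool insertPosFollowsPhi(const Block &B, std::size_t InsertPos) {
  return InsertPos > 0 && InsertPos <= B.size() &&
         B[InsertPos - 1] == Opcode::PHI;
}
```

For a block modeled as `{Opcode::PHI, Opcode::Other}` with insert position 1, `oldNothingInserted` returns false because the last instruction is not a PHI, while `insertPosFollowsPhi` returns true. That is the situation in which a DBG_VALUE would otherwise end up ahead of the PHI.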
|
| |
|
|
| |
llvm-svn: 192970
|
| |
|
|
| |
llvm-svn: 192961
|
| |
|
|
| |
This caused the clang-native-mingw32-win7 buildbot to break.
The assembler was complaining about the following lines that were showing up
in the asm for CrashRecoveryContext.cpp:
movl $"__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4", 4(%eax)
calll "_AddVectoredExceptionHandler@8"
.def "__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4";
"__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4":
calll "_RemoveVectoredExceptionHandler@4"
Reverting for now.
llvm-svn: 192940
|
| |
|
|
| |
This commit implements the correct lowering of the
COPY_STRUCT_BYVAL_I32 pseudo-instruction for thumb1 targets.
Previously, the lowering of COPY_STRUCT_BYVAL_I32 generated the
post-increment forms of ldr/ldrh/ldrb instructions. Thumb1 does not
have the post-increment form of these instructions so the generated
assembly contained invalid instructions.
Passing the generated assembly to gcc caused it to complain with an
error like this:
Error: cannot honor width suffix -- `ldrb r3,[r0],#1'
and the integrated assembler would generate an object file with an
invalid instruction encoding.
This commit contains a small test case that demonstrates the problem
with thumb1 targets as well as an expanded test case that more
thoroughly tests the lowering of byval struct passing for arm,
thumb1, and thumb2 targets.
llvm-svn: 192916
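The shape of the problem described above can be sketched with a tiny emitter that either uses a post-increment load or splits it into a plain load followed by a separate pointer increment. This is a hedged illustration, not the lowering from the commit: `emitByteLoad` and its flag are hypothetical, the registers are borrowed from the error message above, and the exact replacement sequence the commit generates is not stated in the message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns assembly text for "load one byte through r0
// into r3, then advance r0 by one byte".
inline std::vector<std::string> emitByteLoad(bool HasPostIncrement) {
  if (HasPostIncrement) {
    // Single post-increment instruction, the form the message says
    // Thumb1 does not have.
    return {"ldrb r3, [r0], #1"};
  }
  // Assumed Thumb1-friendly split: plain load, then explicit increment.
  return {"ldrb r3, [r0]", "adds r0, r0, #1"};
}
```

Calling `emitByteLoad(false)` yields two instructions, neither of which uses the post-increment addressing form that the assembler rejected.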
|
| |
|
|
|
|
|
|
| |
class. The instruction class includes the signed saturating doubling
multiply-add long, signed saturating doubling multiply-subtract long, and
the signed saturating doubling multiply long instructions.
llvm-svn: 192908
|
| |
|
|
|
|
|
|
| |
binaries.
This tests r190354.
llvm-svn: 192903
|
| |
|
|
| |
llvm-svn: 192895
|
| |
|
|
|
|
| |
They were failing with -mcpu=atom.
llvm-svn: 192890
|
| |
|
|
| |
llvm-svn: 192889
|
| |
|
|
|
|
| |
E.g. (and (sra (i32 x) 31) 2) -> (and (srl (i32 x) 30) 2).
llvm-svn: 192884
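The example rewrite above can be checked on concrete values. The following is a minimal sketch: the helper names `sra`, `before`, `after`, and `agreeOnSamples` are made up for illustration, and the arithmetic shift is written out by hand so that the result does not depend on how a compiler treats `>>` on negative signed values.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right on a 32-bit value, for shift amounts 0..31.
inline uint32_t sra(uint32_t X, unsigned N) {
  uint32_t Logical = X >> N;
  if ((X & 0x80000000u) == 0 || N == 0)
    return Logical;
  return Logical | ~(0xFFFFFFFFu >> N); // fill the top N bits with ones
}

// (and (sra x, 31), 2)
inline uint32_t before(uint32_t X) { return sra(X, 31) & 2u; }

// (and (srl x, 30), 2)
inline uint32_t after(uint32_t X) { return (X >> 30) & 2u; }

// Compare both forms on a handful of boundary values.
inline bool agreeOnSamples() {
  const uint32_t Samples[] = {0u,          1u,          2u,
                              0x3FFFFFFFu, 0x40000000u, 0x7FFFFFFFu,
                              0x80000000u, 0xC0000000u, 0xFFFFFFFFu};
  for (uint32_t X : Samples)
    if (before(X) != after(X))
      return false;
  return true;
}
```

Both forms only ever expose bit 1 of the shifted value, and in both cases that bit is a copy of the original sign bit, which is why the arithmetic shift can be replaced by a logical one.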
|
| |
|
|
|
|
|
|
|
|
|
|
| |
When canonicalizing dags according to the rule
(shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1))
remember to add the new shl dag to the DAGCombiner worklist of nodes.
If we don't explicitly add it to the worklist of nodes to visit, we
may not trigger later on the rule that folds the shift left + logical
shift right into an AND instruction with a bitmask.
llvm-svn: 192883
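The rule above, together with the later fold into an AND, can be checked exhaustively on a small type. This is an illustrative sketch with assumptions: `shr` is taken to be a logical shift right, the narrow type is modeled as 16 bits and the wide type as 32 bits, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// (shl (zext (srl X, C)), C): shift in 16 bits, widen, shift back in 32 bits.
inline uint32_t shlOfZext(uint16_t X, unsigned C) {
  uint32_t Wide = static_cast<uint16_t>(X >> C);
  return Wide << C;
}

// (zext (shl (srl X, C), C)): both shifts happen in 16 bits, then widen.
inline uint32_t zextOfShl(uint16_t X, unsigned C) {
  uint16_t Narrow = static_cast<uint16_t>(X >> C);
  Narrow = static_cast<uint16_t>(Narrow << C);
  return Narrow;
}

// (zext (and X, Mask)): the shift pair folded into a bitmask.
inline uint32_t zextOfAnd(uint16_t X, unsigned C) {
  uint16_t Mask = static_cast<uint16_t>(0xFFFFu << C);
  return static_cast<uint16_t>(X & Mask);
}

// Exhaustively compare the three forms for every 16-bit value and shift.
inline bool allFormsAgree() {
  for (unsigned C = 0; C < 16; ++C)
    for (uint32_t V = 0; V <= 0xFFFFu; ++V) {
      uint16_t X = static_cast<uint16_t>(V);
      if (shlOfZext(X, C) != zextOfShl(X, C) ||
          zextOfShl(X, C) != zextOfAnd(X, C))
        return false;
    }
  return true;
}
```

The three forms agree because the logical shift right clears the top `C` bits, so shifting left again by `C` cannot overflow the narrow type. Only the canonicalized form exposes the shift pair in the narrow type, where the AND fold can match it, which is why the new node has to be revisited.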
|
| |
|
|
| |
Consider the following:
  typedef unsigned short ushort4U __attribute__((ext_vector_type(4),
                                                 aligned(2)));
  typedef unsigned short ushort4 __attribute__((ext_vector_type(4)));
  typedef unsigned short ushort8 __attribute__((ext_vector_type(8)));
  typedef int int4 __attribute__((ext_vector_type(4)));
  int4 __bbase_cvt_int(ushort4 v) {
    ushort8 a;
    a.lo = v;
    return _mm_cvtepu16_epi32(a);
  }
This generates the following, not unreasonable, IR:
  define <4 x i32> @foo0(double %v.coerce) nounwind ssp {
    %tmp = bitcast double %v.coerce to <4 x i16>
    %tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef,
        <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
    %tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1)
    ret <4 x i32> %tmp2
  }
The problem is when type legalization gets hold of the v4i16. It
legalizes that by spilling to the stack, then doing a zero-extending
load. Things go even more silly from there, ending up with something
like:
  _foo0:
    movsd %xmm0, -8(%rsp)       <== Spill to the stack.
    movq -8(%rsp), %xmm0        <== Reload it right back out.
    pmovzxwd %xmm0, %xmm1       <== Here's what we actually asked for.
    pblendw $1, %xmm1, %xmm0    <== We don't need this at all.
    pmovzxwd %xmm0, %xmm0       <== We already did this.
    ret
The v8i8 to v8i16 zext intrinsic gives even worse results, with two
table lookups via pshufb instructions(!!).
To avoid all that, we can move the bitcasting until after we've formed
the wider (legal) vector type. Then our normal codegen flows along
nicely and we get the expected:
  _foo0:
    pmovzxwd %xmm0, %xmm0
    ret
rdar://15245794
llvm-svn: 192866
|
| |
|
|
| |
The reason this got reverted was that the @feat.00 symbol which was emitted
for every TU became quoted, and on cygwin/mingw we use the gas assembler which
couldn't handle the quotes.
This commit fixes the problem by only emitting @feat.00 for win32, where we use
clang -cc1as to assemble. gas would just drop this symbol anyway, so there is no
loss there.
With @feat.00 gone, there shouldn't be quoted symbols showing up on cygwin since
it uses the Itanium ABI, which doesn't put these funny characters in symbols.
> Because of win32 mangling, we produce symbol and section names with
> funny characters in them, most notably @ characters.
>
> MC would choke on trying to parse its own assembly output. This patch addresses
> that by:
>
> - Making @ trigger quoting of symbol names
> - Also quote section names in the same way
> - Just parse section names like other identifiers (to allow for quotes)
> - Don't assume @ signifies a symbol variant if it is in a string.
llvm-svn: 192859
|
| |
|
|
| |
llvm-svn: 192843
|
| |
|
|
| |
llvm-svn: 192842
|
| |
|
|
|
|
|
|
|
| |
bulldozer and piledriver. Support for the instruction itself seems to have
already been added in r178040.
Differential Revision: http://llvm-reviews.chandlerc.com/D1933
llvm-svn: 192828
|
| |
|
|
|
|
|
|
|
|
| |
We were calling llvm_unreachable() when we failed to optimize the
branch into the if case. However, it is still possible for us
to structurize the CFG by duplicating blocks even if this optimization
fails.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 192813
|
| |
|
|
| |
llvm-svn: 192810
|
| |
|
|
|
|
| |
value and unsigned saturating accumulate of signed value instructions.
llvm-svn: 192800
|
| |
|
|
|
|
|
|
|
|
|
| |
constant.
This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly
because i64 is illegal. It would be nice if getNOT would handle this
transparently, but I don't see a way to generate a legal constant there right
now. Fixes PR17487.
llvm-svn: 192795
|
| |
|
|
|
|
|
| |
The input to an RxSBG operation can be narrower as long as the upper bits
are don't-care bits. This fixes a FIXME added in r192783.
llvm-svn: 192790
|
| |
|
|
|
|
|
|
| |
We previously used the default expansion to SELECT_CC, which in turn would
expand to "LHI; BRC; LHI". In most cases it's better to use an IPM-based
sequence instead.
llvm-svn: 192784
|