| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 228754
|
| |
|
|
|
|
|
|
|
| |
Simply loading or storing the frame pointer is not sufficient for
Windows targets. Instead, create a synthetic frame object that we will
lower later. References to this synthetic object will be replaced with
the correct reference to the frame address.
llvm-svn: 228748
|
| |
|
|
| |
llvm-svn: 228728
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See full discussion in http://reviews.llvm.org/D7491.
We now hide the add-immediate and call instructions together in a
separate pseudo-op, which is tagged to define GPR3 and clobber the
call-killed registers. The PPCTLSDynamicCall pass prior to RA now
expands this op into the two separate addi and call ops, with explicit
definitions of GPR3 on both instructions, and explicit clobbers on the
call instruction. The pass is now marked as requiring and preserving
the LiveIntervals and SlotIndexes analyses, and fixes these up after
the replacement sequences are introduced.
Self-hosting has been verified on LE P8 and BE P7 with various
optimization levels, etc. It has also been verified with the
--no-tls-optimize flag workaround removed.
llvm-svn: 228725
|
| |
|
|
|
|
|
|
|
|
|
| |
Walk the instructions marked FrameSetup and consider any stores of XMM
registers to the stack as needing a SaveXMM opcode.
This fixes PR22521.
Differential Revision: http://reviews.llvm.org/D7527
llvm-svn: 228724
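The walk described above can be sketched as a small standalone model. This is a sketch only: `Instr`, `Kind`, `UnwindOp` and `collectSaveXMM` are hypothetical names invented for illustration and are not LLVM's MachineInstr or Win64 unwind-emission API. It assumes each prologue instruction carries a FrameSetup-style flag, and that only flagged stores of XMM registers to the stack produce a SaveXMM entry.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for machine instructions.
// None of these types exist in LLVM; they only model the idea.
enum class Kind { PushGPR, StoreXMMToStack, StoreGPRToStack, Other };

struct Instr {
  Kind kind;
  bool frameSetup;  // models the FrameSetup marker on the instruction
  unsigned reg;     // register number (illustrative only)
  int64_t offset;   // stack offset used by store instructions
};

struct UnwindOp {
  std::string name;
  unsigned reg;
  int64_t offset;
};

// Walk the instructions, skip anything not marked FrameSetup, and record a
// SaveXMM entry for every XMM store to the stack.
std::vector<UnwindOp> collectSaveXMM(const std::vector<Instr> &prologue) {
  std::vector<UnwindOp> ops;
  for (const Instr &I : prologue) {
    if (!I.frameSetup)
      continue;
    if (I.kind == Kind::StoreXMMToStack)
      ops.push_back({"SaveXMM", I.reg, I.offset});
  }
  return ops;
}
```

An XMM store that is not marked FrameSetup is deliberately ignored in this model, since only the flagged instructions are walked.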
|
| |
|
|
|
|
|
| |
Some old assembly code uses the cntlz alias for cntlzw. binutils supports this,
and we should too. Fixes PR22519.
llvm-svn: 228719
|
| |
|
|
|
|
| |
decoder function for 64bit control register class.
llvm-svn: 228708
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D7465
llvm-svn: 228703
|
| |
|
|
|
|
|
|
| |
Added most of the missing vector folding patterns for AVX2 (as well as fixing the vpermpd and vpermq patterns).
Differential Revision: http://reviews.llvm.org/D7492
llvm-svn: 228688
|
| |
|
|
|
|
|
|
|
|
| |
This patch adds the complete AMD Bulldozer XOP instruction set to the memory folding pattern tables for stack folding, etc.
Note: Many of the XOP instructions have multiple table entries as they can fold loads from different sources.
Differential Revision: http://reviews.llvm.org/D7484
llvm-svn: 228685
|
| |
|
|
|
|
|
|
| |
and SWM16
Differential Revision: http://reviews.llvm.org/D7436
llvm-svn: 228683
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch teaches X86FastISel how to select AVX instructions for scalar
float/double convert operations.
Before this patch, X86FastISel always selected legacy SSE instructions
for FPExt (from float to double) and FPTrunc (from double to float).
For example:
\code
define double @foo(float %f) {
%conv = fpext float %f to double
ret double %conv
}
\endcode
Before (with -mattr=+avx -fast-isel), X86FastISel selected a CVTSS2SDrr, which is
legacy SSE:
cvtss2sd %xmm0, %xmm0
With this patch, X86FastISel selects a VCVTSS2SDrr instead:
vcvtss2sd %xmm0, %xmm0, %xmm0
Added test fast-isel-fptrunc-fpext.ll to check both the register-register and
the register-memory float/double conversion variants.
Differential Revision: http://reviews.llvm.org/D7438
llvm-svn: 228682
|
| |
|
|
|
|
|
|
|
|
| |
when handling store unfolding.
Bug spotted by Steve King.
I have no idea how to test this.
llvm-svn: 228672
|
| |
|
|
| |
llvm-svn: 228671
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Win64 has specific constraints on what valid prologues and epilogues look
like. These constraints are born from the flexibility and descriptiveness
of Win64's unwind opcodes.
Prologues previously emitted by LLVM could not be represented by the
unwind opcodes, preventing operations powered by stack unwinding from
working successfully.
Differential Revision: http://reviews.llvm.org/D7520
llvm-svn: 228641
|
| |
|
|
|
|
| |
preparation for making it MachineFunction dependent.
llvm-svn: 228638
|
| |
|
|
| |
llvm-svn: 228635
|
| |
|
|
|
|
| |
parameter.
llvm-svn: 228630
|
| |
|
|
|
|
| |
unused ones.
llvm-svn: 228627
|
| |
|
|
| |
llvm-svn: 228614
|
| |
|
|
| |
llvm-svn: 228605
|
| |
|
|
| |
llvm-svn: 228602
|
| |
|
|
| |
llvm-svn: 228598
|
| |
|
|
| |
llvm-svn: 228593
|
| |
|
|
|
|
|
|
|
|
|
|
| |
veqv (vector equivalence)
vnand
vorc
I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions.
Phabricator review: http://reviews.llvm.org/D7469
llvm-svn: 228580
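For reference, the per-bit behaviour of the three operations listed above can be written as a minimal scalar sketch. It assumes the usual Power ISA meaning of eqv, nand and orc, applied here to a single 32-bit lane. The function names are illustrative only and are not LLVM or intrinsic names.

```cpp
#include <cstdint>

// Scalar models of one 32-bit lane; the vector forms apply the same
// operation to every bit of the register.

// Equivalence: complement of exclusive-or (a bit is set where a and b agree).
inline uint32_t eqv(uint32_t a, uint32_t b) { return ~(a ^ b); }

// NAND: complement of and.
inline uint32_t nand(uint32_t a, uint32_t b) { return ~(a & b); }

// OR with complement: a or-ed with the complement of b.
inline uint32_t orc(uint32_t a, uint32_t b) { return a | ~b; }
```

Each of these would otherwise take two instructions (a logical operation followed by a complement), which is presumably why selecting the single-instruction forms is preferred.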
|
| |
|
|
| |
llvm-svn: 228579
|
| |
|
|
| |
llvm-svn: 228578
|
| |
|
|
|
|
| |
used.
llvm-svn: 228563
|
| |
|
|
| |
llvm-svn: 228562
|
| |
|
|
|
|
| |
patterns. AVX and AVX2 can handle unaligned loads being folded so we can just use 'load'
llvm-svn: 228551
|
| |
|
|
| |
llvm-svn: 228529
|
| |
|
|
|
|
| |
NFC.
llvm-svn: 228526
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While various DAG combines try to guarantee that a vector SETCC
operation will have the same output size as input, there's nothing
intrinsic to either creation or LegalizeTypes that actually guarantees
it, so the function needs to be ready to handle a mismatch.
Fortunately this is easy enough: just extend or truncate the naturally
compared result.
I couldn't reproduce the failure in other backends that I know have
SIMD, so it's probably only an issue for these two due to shared
heritage.
Should fix PR21645.
llvm-svn: 228518
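The "extend or truncate the naturally compared result" step can be illustrated with a standalone sketch. It is not the backend code: `vectorSetEq` is a hypothetical helper, and it assumes a true lane is represented as all ones (so widening is a sign extension) and a false lane as zero. The comparison is first produced at the input element width, then resized to the requested output element width.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Lane-wise equality compare. The "natural" result has the same width as the
// input lanes (all ones for true, zero for false); it is then sign-extended
// or truncated to the output lane type Out.
template <typename In, typename Out>
std::vector<Out> vectorSetEq(const std::vector<In> &a,
                             const std::vector<In> &b) {
  static_assert(std::is_signed<In>::value && std::is_signed<Out>::value,
                "this sketch models masks with signed lane types");
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  std::vector<Out> result;
  result.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    const In natural = (a[i] == b[i]) ? In(-1) : In(0);
    // -1 and 0 are representable in every signed type, so this conversion
    // acts as a sign extension when Out is wider and a truncation when
    // Out is narrower.
    result.push_back(static_cast<Out>(natural));
  }
  return result;
}
```

For example, `vectorSetEq<int32_t, int16_t>` models a compare of 32-bit lanes whose result type has 16-bit lanes, and `vectorSetEq<int32_t, int64_t>` models the widening case (both need `<cstdint>` for the fixed-width types).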
|
| |
|
|
| |
llvm-svn: 228515
|
| |
|
|
| |
llvm-svn: 228514
|
| |
|
|
| |
llvm-svn: 228509
|
| |
|
|
| |
llvm-svn: 228493
|
| |
|
|
|
|
|
|
|
|
| |
If a loop predecessor has an invoke as its terminator, and the return value
from that invoke is used to determine the loop iteration space, then we can't
insert a computation based on that value in the loop predecessor prior to the
terminator (oops). If there's such an invoke, or just no predecessor for that
matter, insert a new loop preheader.
llvm-svn: 228488
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
from a conditional branch fed by an add/sub/mul-with-overflow node.
We previously used the SDLoc of the overflow node, for no good reason.
In some cases, this led to the Bcc and B terminators having different
source orders, and DBG_VALUEs being inserted between them.
The real issue is with the code that can't handle DBG_VALUEs between
terminators: the few places affected by this will be fixed soon.
In the meantime, fixing the SDLoc is a positive change no matter what.
No tests, as I have no idea how to get .loc emitted for branches?
rdar://19347133
llvm-svn: 228463
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
related fixups
Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has already been done), this still
breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).
Bill is currently working on an alternate implementation to address the TLS
issue in a way that also fully elides the linker bug (which, unfortunately,
this approach did not fully do), so I'm reverting this now.
llvm-svn: 228460
|
| |
|
|
| |
llvm-svn: 228452
|
| |
|
|
| |
llvm-svn: 228442
|
| |
|
|
|
|
| |
Fixes PR22488
llvm-svn: 228411
|
| |
|
|
| |
llvm-svn: 228410
|
| |
|
|
|
|
| |
NFC.
llvm-svn: 228399
|
| |
|
|
|
|
|
|
|
| |
Doesn't seem necessary anymore. I think this was mostly compensating for
not enabling WQM for texture sampling instructions.
v2: Add test coverage
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 228373
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
If whole quad mode isn't enabled for these, the level of detail is
calculated incorrectly for pixels along diagonal triangle edges, causing
artifacts.
v2: Use a TSFlag instead of lots of switch cases
v3: Add test coverage
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88642
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 228372
|
| |
|
|
| |
llvm-svn: 228349
|
| |
|
|
| |
llvm-svn: 228348
|
| |
|
|
|
|
| |
multiply to be expanded.
llvm-svn: 228347
|