llvm-svn: 222516

divisor into FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)
A hook is added to allow the target to control whether it needs to do such a combine.
Reviewed in http://reviews.llvm.org/D6334
llvm-svn: 222510
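A source-level sketch of the combine described above (hypothetical functions; in practice the transform fires on the DAG and only when the reciprocal rewrite is allowed, e.g. under fast-math, which this sketch does not model):

// Before the combine: two divisions by the same divisor D.
double before(double a, double b, double D) {
  return a / D + b / D;
}

// After the combine: one division computes the reciprocal, then two FMULs.
double after(double a, double b, double D) {
  double recip = 1.0 / D;
  return a * recip + b * recip;
}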

llvm-svn: 222509

This enables SeparateConstOffsetFromGEP in the PowerPC backend, mirroring r222331, which
enabled it on AArch64. On a POWER7 machine, this yields a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM;
a store is moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.
llvm-svn: 222504

match the custom lowering.
<rdar://problem/19026326>
llvm-svn: 222489

Follow up to r221940, where I must not have caught em all. NFC
llvm-svn: 222481

These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878. When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.
This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.
llvm-svn: 222480
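A generic C++ sketch of the underlying issue and the usual way to sidestep it (illustrative names only; this is not necessarily the fix that was applied here):

#include <memory>

class Impl; // intentionally incomplete at this point

class Holder {
public:
  Holder();
  ~Holder(); // declared here, defined out of line where Impl is complete, so
             // unique_ptr<Impl>'s deleter is never instantiated against an
             // incomplete type at call sites
  virtual void run();

private:
  std::unique_ptr<Impl> P; // with an implicit (inline) destructor this would
                           // require Impl to be complete wherever that dtor is
                           // instantiated, e.g. at any virtual call under the
                           // MSVC ABI as described above
};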

llvm-svn: 222475

have a circular dependency.
llvm-svn: 222458

Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT. This corrects the emission of stack probes on i686-windows-itanium.
llvm-svn: 222439

code R_ARM_PLT32
llvm-svn: 222414

llvm-svn: 222412

llvm-svn: 222399

llvm-svn: 222396

This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.
With this patch, a build_vector that performs a blend with zero is
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in an optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.
This patch also improves the logic that lowers a build_vector into an insertps
with zero masking. See for example the extra test cases added to test sse41.ll.
Differential Revision: http://reviews.llvm.org/D6311
llvm-svn: 222375
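A rough source-level example of a build_vector with two non-zero and two zero elements (assuming Clang/GCC vector extensions; the lowering actually chosen depends on the target features):

typedef float v4f32 __attribute__((vector_size(16)));

v4f32 two_lanes(float a, float b) {
  // A v4f32 build_vector that blends {a, b} with zero; with this change it
  // can be lowered as a shuffle/blend with zero (e.g. a movq-style pattern)
  // rather than a chain of insertps instructions.
  return (v4f32){a, b, 0.0f, 0.0f};
}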

A register operand that has a common sub-class with its instruction's
defined register class is not always legal. For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect an M0Reg.
This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.
llvm-svn: 222368

Differential Revision: http://reviews.llvm.org/D5519
llvm-svn: 222367

Differential Revision: http://reviews.llvm.org/D6169
llvm-svn: 222355

Differential Revision: http://reviews.llvm.org/D5800
llvm-svn: 222352

Differential Revision: http://reviews.llvm.org/D5799
llvm-svn: 222351

Differential Revision: http://reviews.llvm.org/D5407
llvm-svn: 222348

Differential Revision: http://reviews.llvm.org/D5240
llvm-svn: 222347

This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 targets, which lack PALIGNR; the pre-SSSE3 path is only enabled for i8 and i16 vector types, where it is a more definite performance gain.
I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that uses the ability of the SLLDQ/SRLDQ instructions to shift in zero bytes implicitly, avoiding the zero register that a palignr-based lowering would need.
Differential Revision: http://reviews.llvm.org/D5699
llvm-svn: 222340
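An illustrative shuffle that the new byte-shift lowering can match (assuming Clang's __builtin_shufflevector and vector extensions; not taken from the patch itself):

typedef char v16i8 __attribute__((vector_size(16)));

v16i8 shift_right_one_byte(v16i8 x) {
  v16i8 zero = {0};
  // Take elements 1..15 of x followed by one zero element. Because SRLDQ
  // shifts zero bytes in implicitly, this can become a single byte shift
  // without materializing a zero register, unlike a palignr-based lowering.
  return __builtin_shufflevector(x, zero,
                                 1, 2, 3, 4, 5, 6, 7, 8,
                                 9, 10, 11, 12, 13, 14, 15, 16);
}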

pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This led to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
llvm-svn: 222334
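A small usage sketch of the updated return type (assuming the post-change SmallPtrSet API described above; names are illustrative):

#include "llvm/ADT/SmallPtrSet.h"

bool markVisited(llvm::SmallPtrSet<const void *, 8> &Visited, const void *P) {
  // insert now returns pair<iterator, bool>, like the standard associative
  // containers; 'second' is true only if P was newly inserted.
  return Visited.insert(P).second;
}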

Using AA during CodeGen is very useful for in-order cores, but less so for out-of-order cores. I also find
that enabling useAA for Cortex-A57 may generate worse code for some test cases. If useAA in codegen is improved
and becomes beneficial for out-of-order cores, we can enable it again.
llvm-svn: 222333

AArch64 backend.
SeparateConstOffsetFromGEP can give more optimization opportunities related to GEPs, which benefits EarlyCSE
and LICM. By enabling these passes we can have better address calculations and generate a better addressing
mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57.
Reviewed in http://reviews.llvm.org/D5864.
llvm-svn: 222331
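A sketch of the kind of addressing pattern the pass helps with (illustrative C++; the pass itself rewrites GEPs in IR):

// The GEPs for a[i + 1] and a[i + 2] initially carry the variable-plus-constant
// index. SeparateConstOffsetFromGEP splits out the constants so the variable
// part (&a[i]) is common, letting EarlyCSE/LICM reuse it and letting the
// backend fold the constant byte offsets into the addressing mode.
long sum_neighbors(const long *a, long i) {
  return a[i + 1] + a[i + 2];
}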

Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already have) seems like
it'll make the code easier to understand for anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second() instead of getKey{Data,Length,}
and get/setValue, for similar consistency.)
Also removes the GetOrCreateValue functions so there's less surface area
on StringMap to fix/improve/change/accommodate move semantics, etc.
llvm-svn: 222319
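A usage sketch of the insert-based idiom that replaces GetOrCreateValue (assuming the post-change StringMap API described above; names are illustrative):

#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include <utility>

unsigned getOrAssignId(llvm::StringMap<unsigned> &Ids, llvm::StringRef Name) {
  // insert returns pair<iterator, bool> and leaves an existing entry
  // untouched; the entry's key is first() and its value is second.
  auto Res = Ids.insert(std::make_pair(Name, unsigned(Ids.size())));
  return Res.first->second;
}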

llvm-svn: 222292

This partially makes up for not having address spaces
used for alias analysis in some simple cases.
This is not yet enabled by default so shouldn't change anything yet.
llvm-svn: 222286

Assuming unmodeled side effects interferes with some scheduling
opportunities.
Don't put it in the base class of the DS instructions, since there
are a few unusual side-effecting, non-load/store instructions there.
llvm-svn: 222285

Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills.
This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills.
This was also noticed in http://llvm.org/bugs/show_bug.cgi?id=18846, where it was causing redundant spills; I've added a reduced test case at test/CodeGen/X86/pr18846.ll
Differential Revision: http://reviews.llvm.org/D6252
llvm-svn: 222281

llvm-svn: 222274

shift-right for booleans (i1).
Arithmetic shift-right immediate with sign-/zero-extensions also works for
boolean values. Update the assert and the test cases to reflect that fact.
llvm-svn: 222272

shift-right for booleans (i1).
Logical shift-right immediate with sign-/zero-extensions also works for boolean
values. Update the assert and the test cases to reflect that fact.
llvm-svn: 222270

Renaming test files.
llvm-svn: 222263

"zero" shifts."
Shifts also perform sign-/zero-extends to larger types, which requires us to emit
an integer extend instead of a simple COPY.
Related to PR21594.
llvm-svn: 222257

This should expose more of the actually used VALU
instructions to the machine optimization passes.
This also should help getting i1 handling into a better state.
For not entirely understood reasons, this fixes the split-scalar-i64-add.ll
test where a 64-bit add would only partially be moved to the VALU
resulting in use of undefined VCC.
llvm-svn: 222256

"optimizeCompareInstr" converts compares (cmp/cmn) into plain sub/add
instructions when the flags are not used anymore. This conversion is valid for
most instructions, but not all. Some instructions that don't set the flags
(e.g. sub with immediate) can write to the SP, whereas in the flag-setting version
the same register encoding refers to the zero register.
Update the code to also check for the return register before performing the
optimization to make sure that a cmp doesn't suddenly turn into a sub that sets
the stack pointer.
I don't have a test case for this, because it isn't easy to trigger.
llvm-svn: 222255

llvm-svn: 222253

Adding test to show correct instruction selection and encoding.
llvm-svn: 222249

This change emits a COPY for a shift-immediate with a "zero" shift value.
This fixes PR21594 where we emitted a shift instruction with an incorrect
immediate operand.
llvm-svn: 222247

llvm-svn: 222244

This reverts commit r222180.
llvm-svn: 222188

The itanium environment on Windows uses MSVC and is therefore an MSVC environment. Report
this correctly.
llvm-svn: 222180

This was resulting in use of a register after a kill.
For some reason this showed up as a problem in many tests
when moving the SIFixSGPRCopies pass closer to instruction
selection.
llvm-svn: 222175

I'm not sure if this was breaking anything.
llvm-svn: 222174

Differential Revision: http://reviews.llvm.org/D5934
llvm-svn: 222141

This was motivated by a bug which caused code like this to be
miscompiled:
declare void @take_ptr(i8*)
define void @test() {
%addr1.32 = alloca i8
%addr2.32 = alloca i32, i32 1028
call void @take_ptr(i8* %addr1.32)
ret void
}
This was emitting the following assembly to get the value of %addr1.32:
add r0, sp, #1020
add r0, r0, #8
However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this
could not be assembled. The generated object file contained this,
resulting in r0 holding SP+8 rather than SP+1028:
add r0, sp, #1020
add r0, sp, #8
This function looked like it could have caused miscompilations for
other combinations of registers and offsets (though I don't think it is
currently called with these), and the heuristic it used did not match
the emitted code in all cases.
llvm-svn: 222125

llvm-svn: 222109

the instruction pattern.
llvm-svn: 222094