| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
(icmp slt x, 0) -> (icmp sle (add x, 1), 0)
(icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
(icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
(icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)
Original Message:
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.
Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.
This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).
This uses the LOCK ISD opcodes added by r262244.
Differential Revision: http://reviews.llvm.org/D17633
llvm-svn: 265636
|
| |
|
|
|
|
| |
The non-1 and EQ/NE tests were misguided.
llvm-svn: 265635
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
eliminateCallFramePseudoInstr (PR27140)"
Third time's the charm? The previous attempt (r265345) caused ASan test
failures on X86, as broken CFI caused stack traces to not work.
This version of the patch makes sure not to merge with stack adjustments
that have CFI, and to not add merged instructions' offests to the CFI
about to be generated.
This is already covered by the lit tests; I just got the expectations
wrong previously.
llvm-svn: 265623
|
| |
|
|
|
|
| |
It caused ASan 32-bit tests to hang (PR27245).
llvm-svn: 265559
|
| |
|
|
|
|
|
|
|
| |
eliminateCallFramePseudoInstr (PR27140)""
It seems to be causing ASan tests to crash, probably due to
miscompiling the run-time somehow.
llvm-svn: 265551
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
when DenseMap growed and moved memory. I verified it fixed the bootstrap
problem on x86_64-linux-gnu but I cannot verify whether it fixes
the bootstrap error on clang-ppc64be-linux. I will watch the build-bot
result closely.
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.
analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.
To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.
Differential Revision: http://reviews.llvm.org/D15302
llvm-svn: 265547
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While preserving the return value for @llvm.experimental.deoptimize at
the IR level is useful during mid-level optimization, doing so at the
machine instruction level requires generating some extra code and a
return that is non-ideal. This change has LLVM lower
```
%val = call @llvm.experimental.deoptimize
ret %val
```
to effectively
```
call @__llvm_deoptimize()
unreachable
```
instead.
llvm-svn: 265502
|
| |
|
|
|
|
|
| |
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.
llvm-svn: 265481
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17863
llvm-svn: 265480
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.
Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.
This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).
This uses the LOCK ISD opcodes added by r262244.
Differential Revision: http://reviews.llvm.org/D17633
llvm-svn: 265450
|
| |
|
|
| |
llvm-svn: 265448
|
| |
|
|
|
|
|
|
|
|
|
| |
utils/update_test_checks.py was improved with:
http://reviews.llvm.org/rL265414
to include the first line of the function (expected to be
a comment line). This ensures that nothing bad has happened
before the first actual line of checked asm. It also matches
the existing behavior of the old script.
llvm-svn: 265416
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
critical edge
Presently, CodeGenPrepare deletes all nearly empty (only phi and branch)
basic blocks. This pass can delete loop preheaders which frequently creates
critical edges. A preheader can be a convenient place to spill registers to
the stack. If the entrance to a loop body is a critical edge, then spills
may occur in the loop body rather than immediately before it. This patch
protects loop preheaders from deletion in CodeGenPrepare even if they are
nearly empty.
Since the patch alters the CFG, it affects a large number of test cases.
In most cases, the changes are merely cosmetic (basic blocks have different
names or instruction orders change slightly). I am somewhat concerned about
the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not
deleted, then the MIPS backend does not take advantage of a branch delay
slot. Consequently, I would like some close review by a MIPS expert.
The patch also partially subsumes D16893 from George Burgess IV. George
correctly notes that CodeGenPrepare does not actually preserve the dominator
tree. I think the dominator tree was usually not valid when CodeGenPrepare
ran, but I am using LoopInfo to mark preheaders, so the dominator tree is
now always valid before CodeGenPrepare.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng
http://reviews.llvm.org/D16984
llvm-svn: 265397
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I encountered this issue when constant folding during inlining tried to
fold away a bitcast of a double to an x86_mmx, which is not an integral
type. The test case exposes the same issue with a smaller code snippet
during early CSE.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18528
llvm-svn: 265367
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
eliminateCallFramePseudoInstr (PR27140)"
The original commit miscompiled things on 32-bit Windows, e.g. a Clang
boostrap. It turns out that mergeSPUpdates() was a bit too generous in
what it interpreted as a stack adjustment, causing the following code:
addl $12, %esp
leal -4(%ebp), %esp
To be "optimized" into simply:
addl $8, %esp
This commit tightens up mergeSPUpdates() and includes a new test
(test14 in movtopush.ll) for this situation.
llvm-svn: 265345
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds some dllexport tests to verify that:
- Variables in bss are exported appropriately
- Non-dllexport symbols aliased to dllexport symbols are not exported
- Symbols declared as dllexport but are not defined are not exported
We plan to enable dllimport/dllexport support for the PS4, and these
additional tests are for points we noticed in our internal testing.
Patch by Warren Ristow!
Differential Revision: http://reviews.llvm.org/D18682
llvm-svn: 265333
|
| |
|
|
|
|
| |
investigate.
llvm-svn: 265317
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
time issue. The patch is to solve PR17409 and its duplicates.
analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.
To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.
Differential Revision: http://reviews.llvm.org/D15302
llvm-svn: 265309
|
| |
|
|
|
|
|
|
|
| |
Implemented truncstore for KNL and skylake-avx512.
Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes.
Differential Revision: http://reviews.llvm.org/D18740
llvm-svn: 265283
|
| |
|
|
| |
llvm-svn: 265267
|
| |
|
|
|
|
|
|
|
|
| |
Implemented load+{sign|zero}_extend for i1 vectors
Fixed failures in i1 vector load.
Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.
Differential Revision: http://reviews.llvm.org/D18737
llvm-svn: 265259
|
| |
|
|
|
|
| |
More examples of PR22603, poor vector splitting for AVX512F targets as well as missing uses of PACKSS/MOVMSK
llvm-svn: 265248
|
| |
|
|
| |
llvm-svn: 265247
|
| |
|
|
| |
llvm-svn: 265222
|
| |
|
|
| |
llvm-svn: 265192
|
| |
|
|
|
|
| |
float2double
llvm-svn: 265186
|
| |
|
|
| |
llvm-svn: 265185
|
| |
|
|
| |
llvm-svn: 265184
|
| |
|
|
| |
llvm-svn: 265183
|
| |
|
|
| |
llvm-svn: 265179
|
| |
|
|
| |
llvm-svn: 265173
|
| |
|
|
|
|
|
|
|
| |
Was there really no other way to splat a byte in SSE2?
punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
llvm-svn: 265172
|
| |
|
|
| |
llvm-svn: 265171
|
| |
|
|
|
|
| |
Added SSE + AVX1 tests as well as AVX2
llvm-svn: 265169
|
| |
|
|
|
|
|
|
| |
Note however that this is identical to the existing SSE2 run.
What we really want is yet another run for an SSE2 machine that
also has fast unaligned 16-byte accesses.
llvm-svn: 265167
|
| |
|
|
| |
llvm-svn: 265164
|
| |
|
|
|
|
| |
Replaced lots of dodgy greps with actual codegen
llvm-svn: 265163
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 -
where we noticed that an intermediate splat was being generated for memsets of
non-zero chars.
That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.
The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
Note that the SSE1 path is not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.
llvm-svn: 265161
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Follow-up to D18566 - where we noticed that an intermediate splat was being
generated for memsets of non-zero chars.
That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.
The tests that were added in the last patch are now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
In the new tests, the splat via shuffling looks ok to me, but there might be some
room for improvement depending on uarch there.
Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.
Differential Revision: http://reviews.llvm.org/D18676
llvm-svn: 265148
|
| |
|
|
| |
llvm-svn: 265135
|
| |
|
|
|
|
|
|
| |
Add a new Intel MCU CPU Lakemont, which doesn't support X87.
Differential Revision: http://reviews.llvm.org/D18650
llvm-svn: 265128
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This changes some dllexport tests, to verify that some symbols that
should not be exported are not, in a way that improves the robustness
of CHECK-SAME interaction with CHECK-NOT.
We plan to enable dllimport/dllexport support for the PS4, and these
changes are for points we noticed in our internal testing.
Patch by Warren Ristow!
llvm-svn: 265106
|
| |
|
|
|
|
|
|
|
| |
Re-enable an assertion enabled by Justin Lebar in rL265092. rL265092
was breaking test/CodeGen/X86/deopt-intrinsic.ll because webkit_jscc
does not like non-i64 return types. Change the test case to not do
that.
llvm-svn: 265099
|
| |
|
|
| |
llvm-svn: 265081
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder
into DICompileUnit. DIBuilder is not the right place for this enum to live
in — a metadata consumer should not have to include DIBuilder.h.
I also added a Verifier check that checks that the emission kind of a
DICompileUnit is actually legal.
http://reviews.llvm.org/D18612
<rdar://problem/25427165>
llvm-svn: 265077
|
| |
|
|
| |
llvm-svn: 265048
|
| |
|
|
|
|
|
|
|
|
| |
eliminateCallFramePseudoInstr (PR27140)"
I think it might have caused these build breakages:
http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio
llvm-svn: 265046
|
| |
|
|
|
|
| |
We don't really support non-constant shuffle masks, but these tests are for cases where BUILD_VECTOR is made up from vector extracts (as well as undef/zero scalars).
llvm-svn: 265045
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR27140)
For code such as:
void f(int, int);
void g() {
f(1, 2);
}
compiled for 32-bit X86 Linux, Clang would previously generate:
subl $12, %esp
subl $8, %esp
pushl $2
pushl $1
calll f
addl $16, %esp
addl $12, %esp
retl
This patch fixes that by merging adjacent stack adjustments in
eliminateCallFramePseudoInstr().
Differential Revision: http://reviews.llvm.org/D18627
llvm-svn: 265039
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast
targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping
into a codegen sinkhole while trying to splat a byte into an XMM reg.
Follow-on bugs exposed by the current codegen are:
https://llvm.org/bugs/show_bug.cgi?id=27141
https://llvm.org/bugs/show_bug.cgi?id=27143
Differential Revision: http://reviews.llvm.org/D18566
llvm-svn: 265029
|