done for 128-bit.
llvm-svn: 156375
llvm-svn: 156324
single use.
rdar://11360370
llvm-svn: 156316
This patch will optimize -(x != 0) on X86
FROM
cmpl $0x01,%edi
sbbl %eax,%eax
notl %eax
TO
negl %edi
sbbl %eax, %eax
In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
rdar://10961709
llvm-svn: 156312
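For illustration, a minimal C function that should exercise this pattern (the
function name is hypothetical, not from the patch):

  /* -(x != 0) yields an all-ones mask (-1) when x is nonzero, else 0.
     With this patch it compiles to negl + sbbl instead of
     cmpl + sbbl + notl. */
  int mask_if_nonzero(int x) {
      return -(x != 0);
  }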
Patch by Jack Carter.
llvm-svn: 156294
Patch by Jack Carter.
llvm-svn: 156293
Patch by Jack Carter.
llvm-svn: 156292
Patch by Jack Carter.
llvm-svn: 156285
Patch by Jack Carter.
llvm-svn: 156284
Patch by Jack Carter.
llvm-svn: 156283
llvm-svn: 156282
llvm-svn: 156281
Patch by Jack Carter.
llvm-svn: 156280
from the previous 2 patches.
Patch by Jack Carter.
llvm-svn: 156279
The primitive conservative heuristic seems to give a slight overall
improvement while not regressing stuff. Make it available to wider
testing. If you notice any speed regressions (or significant code
size regressions) let me know!
llvm-svn: 156258
This came up when a change in block placement formed a cmov and slowed down a
hot loop by 50%:
ucomisd (%rdi), %xmm0
cmovbel %edx, %esi
cmov is a really bad choice in this context because it doesn't get branch
prediction. If we emit it as a branch, an out-of-order CPU can do a better job
(if the branch is predicted right) and avoid waiting for the slow load+compare
instruction to finish. Of course it won't help if the branch is unpredictable,
but those are really rare in practice.
This patch uses a dumb conservative heuristic: it turns all cmovs that have one
use and a direct memory operand into branches. cmovs usually save some code
size, so we disable the transform in -Os mode. In-order architectures are
unlikely to benefit either; those are covered by the
"predictableSelectIsExpensive" flag.
It would be better to reuse branch probability info here, but BPI doesn't
support select instructions currently. It would make sense to use the same
heuristics as the if-converter pass, which does the opposite direction of this
transform.
Test suite shows a small improvement here and there on corei7-level machines,
but the actual results depend a lot on the used microarchitecture. The
transformation is currently disabled by default and available by passing the
-enable-cgp-select2branch flag to the code generator.
Thanks to Chandler for the initial test case and to Evan Cheng for providing
me with comments and test-suite numbers that were more stable than mine :)
llvm-svn: 156234
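For illustration, a C sketch of the kind of select this heuristic targets: the
compare reads straight from memory and its single-use result feeds the select
(names are hypothetical):

  /* A cmov here must wait for the slow load + compare to finish;
     a correctly predicted branch does not. */
  int pick(const double *p, double x, int a, int b) {
      return (x <= *p) ? a : b;
  }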
for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX
The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end provides.
NV_CONTRIB
llvm-svn: 156196
16-bit encoding of CMN instructions.
llvm-svn: 156195
llvm-svn: 156156
This patch creates and optimizes packets as per Hexagon ISA rules.
llvm-svn: 156109
lower half correctly. Missed in r155982.
llvm-svn: 156059
to catch cases like:
%reg1024<def> = MOV r1
%reg1025<def> = MOV r0
%reg1026<def> = ADD %reg1024, %reg1025
r0 = MOV %reg1026
By commuting the ADD, the coalescer can eliminate all of the copies. However,
there was a bug in the heuristic where it ended up commuting the ADD in:
%reg1024<def> = MOV r0
%reg1025<def> = MOV 0
%reg1026<def> = ADD %reg1024, %reg1025
r0 = MOV %reg1026
That provided no benefit and instead ensured the last MOV would not be coalesced.
rdar://11355268
llvm-svn: 156048
just like it now knows for FMULs.
llvm-svn: 156029
llvm-svn: 156023
The commit is intended to fix rdar://10961709.
But it is the root cause of PR12720.
Revert it for now.
llvm-svn: 155992
for AsmPrinter.
llvm-svn: 155982
PR10799
llvm-svn: 155954
This patch will optimize the following cases on X86
(a > b) ? (a-b) : 0
(a >= b) ? (a-b) : 0
(b < a) ? (a-b) : 0
(b <= a) ? (a-b) : 0
FROM
movl %edi, %ecx
subl %esi, %ecx
cmpl %edi, %esi
movl $0, %eax
cmovll %ecx, %eax
TO
xorl %eax, %eax
subl %esi, %edi
cmovll %eax, %edi
movl %edi, %eax
rdar://10734411
llvm-svn: 155919
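For illustration, the "subtract, clamped to zero" idiom in C; all four
comparison spellings above should lower the same way (the function name is
hypothetical):

  /* Returns a - b when a > b, and 0 otherwise. */
  int sub_or_zero(int a, int b) {
      return (a > b) ? (a - b) : 0;
  }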
llvm-svn: 155912
This patch will optimize -(x != 0) on X86
FROM
cmpl $0x01,%edi
sbbl %eax,%eax
notl %eax
TO
negl %edi
sbbl %eax, %eax
llvm-svn: 155853
llvm-svn: 155840
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention in
X86TargetLowering::LowerCall and is now checked properly in
X86FastISel::DoSelectCall as well.
(this time, actually commit what was reviewed!)
llvm-svn: 155825
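For illustration, a minimal C sketch of the structure-return case in question
(the type and names are hypothetical; fastcc is an LLVM-level calling
convention chosen by the compiler, not spelled in C source):

  /* On x86-32 this return value does not fit in registers, so it is
     returned through a hidden sret pointer argument. Under the default
     convention the callee pops that pointer and the caller re-pushes it;
     under fastcc the pointer is passed in a register and the caller must
     not adjust the stack. */
  struct Point3 { int x, y, z; };

  struct Point3 make_point(int x, int y, int z) {
      struct Point3 p = { x, y, z };
      return p;
  }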
ARM BUILD_VECTORs created after type legalization cannot use i8 or i16
operands, since those types are not legal. Instead use i32 operands, which
will be implicitly truncated by the BUILD_VECTOR to match the element type.
llvm-svn: 155824
This time, also fix the caller of AddGlue to properly handle
incomplete chains. AddGlue had failure modes, but shamefully hid them
from its caller. Its luck ran out.
Fixes rdar://11314175: BuildSchedUnits assert.
llvm-svn: 155749
llvm-svn: 155746
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention in
X86TargetLowering::LowerCall and is now checked properly in
X86FastISel::DoSelectCall as well.
llvm-svn: 155745
This definitely caused a regression with ARM -mno-thumb.
llvm-svn: 155743
x == -y --> x+y == 0
x != -y --> x+y != 0
On x86, the generated code goes from
negl %esi
cmpl %esi, %edi
je .LBB0_2
to
addl %esi, %edi
je .L4
This case is correctly handled for ARM with "cmn".
Patch by Manman Ren.
rdar://11245199
PR12545
llvm-svn: 155739
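For illustration, a minimal C function this change canonicalizes (the function
name is hypothetical):

  /* x == -y becomes (x + y) == 0, saving the explicit negation; the
     equivalence always holds in two's-complement arithmetic. */
  int is_negation(int x, int y) {
      return x == -y;
  }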
llvm-svn: 155732
<rdar://problem/11325085>.
llvm-svn: 155724
pre-PentiumPro architectures.
* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.
Fixes PR6679. Patch by Christoph Erhardt!
llvm-svn: 155704
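For illustration, an ordinary floating-point comparison in C; when targeting a
pre-PentiumPro CPU (no FUCOMI), it should now lower to the FUCOM + FNSTSW +
SAHF sequence described above (the function name is hypothetical):

  /* Compares two doubles; on pre-PentiumPro x87 the result travels
     FPSW -> %ax -> %ah -> EFLAGS via FNSTSW and SAHF. */
  int double_less(double a, double b) {
      return a < b;
  }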
Bridge after r155618.
llvm-svn: 155696
llvm-svn: 155686
instructions.
- However, it does support dmb, dsb, isb, mrs, and msr.
rdar://11331541
llvm-svn: 155685
DAGCombine strangeness may result in multiple loads from the same
offset. They both may try to glue themselves to another load. We could
insist that the redundant loads glue themselves to each other, but the
better fix is to bail out of bad gluing at the time we detect it.
Fixes rdar://11314175: BuildSchedUnits assert.
llvm-svn: 155668
On some cores it's a bad idea for performance to mix VFP and NEON instructions,
and since these patterns are NEON anyway, the NEON load should be used.
llvm-svn: 155630
the feature set of v7a. This comes about if the user specifies something like
-arch armv7 -mcpu=cortex-m3. We shouldn't be generating instructions such as
uxtab in this case.
rdar://11318438
llvm-svn: 155601
llvm-svn: 155589
a failure if run on an Intel Atom with post-RA instruction scheduling.
llvm-svn: 155587
llvm-svn: 155522