| Commit message | Author | Age | Files | Lines |
buildbot failure.
llvm-svn: 156377
done for 128-bit.
llvm-svn: 156375
Added new case-ranges oriented methods for adding/removing cases in SwitchInst. After this patch, cases are internally represented as ConstantArrays instead of ConstantInts; externally, cases are wrapped within the ConstantRangesSet object.
The old SwitchInst methods still work, but are marked as deprecated. So at this stage we have no side effects, except that I added support for case ranges in BitcodeReader/Writer; of course, a test for the Bitcode is also added. The old "switch" format is also supported.
llvm-svn: 156374
llvm-svn: 156324
order of their operands across instructions. This allows for greater CSE opportunities.
llvm-svn: 156323
single use.
rdar://11360370
llvm-svn: 156316
This patch will optimize -(x != 0) on X86
FROM
cmpl $0x01,%edi
sbbl %eax,%eax
notl %eax
TO
negl %edi
sbbl %eax, %eax
In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
rdar: 10961709
llvm-svn: 156312
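For reference, a C++-level source pattern that exercises this transform might look like the following (a hypothetical illustration, not taken from the commit; the exact instruction sequence depends on optimization level and surrounding code):

// -(x != 0) evaluates to 0 when x == 0 and to -1 (all bits set) otherwise,
// which is the expression shape the new neg+sbb pattern targets on x86.
// Hypothetical example, for illustration only.
int signMaskOfNonZero(int x) {
  return -(x != 0);
}

The neg+sbb sequence works because neg sets the carry flag exactly when its operand is nonzero, so a following sbbl %eax, %eax yields -1 for nonzero inputs and 0 otherwise.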
Patch by Jack Carter.
llvm-svn: 156294
Patch by Jack Carter.
llvm-svn: 156293
Patch by Jack Carter.
llvm-svn: 156292
Patch by Jack Carter.
llvm-svn: 156285
Patch by Jack Carter.
llvm-svn: 156284
Patch by Jack Carter.
llvm-svn: 156283
llvm-svn: 156282
llvm-svn: 156281
Patch by Jack Carter.
llvm-svn: 156280
from the previous 2 patches.
Patch by Jack Carter.
llvm-svn: 156279
The primitive conservative heuristic seems to give a slight overall
improvement while not regressing stuff. Make it available to wider
testing. If you notice any speed regressions (or significant code
size regressions) let me know!
llvm-svn: 156258
This came up when a change in block placement formed a cmov and slowed down a
hot loop by 50%:
ucomisd (%rdi), %xmm0
cmovbel %edx, %esi
cmov is a really bad choice in this context because it doesn't get branch
prediction. If we emit it as a branch, an out-of-order CPU can do a better job
(if the branch is predicted right) and avoid waiting for the slow load+compare
instruction to finish. Of course it won't help if the branch is unpredictable,
but those are really rare in practice.
This patch uses a dumb, conservative heuristic: it turns all cmovs that have one
use and a direct memory operand into branches. cmovs usually save some code
size, so we disable the transform in -Os mode. In-order architectures are
unlikely to benefit either; those cases are covered by the
"predictableSelectIsExpensive" flag.
It would be better to reuse branch probability info here, but BPI doesn't
support select instructions currently. It would make sense to use the same
heuristics as the if-converter pass, which does the opposite direction of this
transform.
Test suite shows a small improvement here and there on corei7-level machines,
but the actual results depend a lot on the used microarchitecture. The
transformation is currently disabled by default and available by passing the
-enable-cgp-select2branch flag to the code generator.
Thanks to Chandler for the initial test case, and to Evan Cheng for providing
me with comments and test-suite numbers that were more stable than mine :)
llvm-svn: 156234
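For illustration, a source pattern that can lower to a cmov with a direct memory operand might be (a hypothetical example, not from the commit; whether a cmov or a branch is emitted depends on the target and on this heuristic):

// A select whose condition depends on a load: lowered as a cmov, the result
// cannot be used until the load and compare complete; lowered as a
// well-predicted branch, the out-of-order core can speculate past them.
// Names and types are illustrative only.
int pickSlot(const double *p, double threshold, int onHigh, int onLow) {
  return (*p > threshold) ? onHigh : onLow;
}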
case when the alloca's size is calculated within an "add/sub/... nsw".
Also added a fix to the 2011-06-13-nsw-alloca.ll test.
llvm-svn: 156231
for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX
The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end provides.
NV_CONTRIB
llvm-svn: 156196
16-bit encoding of CMN instructions.
llvm-svn: 156195
llvm-svn: 156156
for the assembler and disassembler, which were not being set/read correctly
for offsets greater than 22 bits in some cases.
Changes to lib/Target/ARM/ARMAsmBackend.cpp from Gideon Myles!
llvm-svn: 156118
being done for malloc).
Fix a few typos found by Chad in my previous commit.
llvm-svn: 156110
This patch creates and optimizes packets as per Hexagon ISA rules.
llvm-svn: 156109
llvm-svn: 156102
llvm-svn: 156077
lower half correctly. Missed in r155982.
llvm-svn: 156059
to catch cases like:
%reg1024<def> = MOV r1
%reg1025<def> = MOV r0
%reg1026<def> = ADD %reg1024, %reg1025
r0 = MOV %reg1026
By commuting the ADD, it lets the coalescer eliminate all of the copies. However,
there was a bug in the heuristics where it ended up commuting the ADD in:
%reg1024<def> = MOV r0
%reg1025<def> = MOV 0
%reg1026<def> = ADD %reg1024, %reg1025
r0 = MOV %reg1026
That provided no benefit and instead ensured the last MOV would not be coalesced.
rdar://11355268
llvm-svn: 156048
just like it now knows for FMULs.
llvm-svn: 156029
llvm-svn: 156023
llvm-svn: 156019
The commit is intended to fix rdar://10961709.
But it is the root cause of PR12720.
Revert it for now.
llvm-svn: 155992
methods. Use a weak value handle to keep up with this.
PR12245
llvm-svn: 155984
llvm-svn: 155983
for AsmPrinter.
llvm-svn: 155982
PR10799
llvm-svn: 155954
Aliases for adding a negative immediate when using an explicit 'w'
suffix. E.g.,
adds.w r2, #-16
adds.w r2, r2, #-16
addw r2, #-16
addw r2, r2, #-16
rdar://11330769
llvm-svn: 155946
Expressions for movw/movt don't always have an :upper16: or :lower16:
on them and that's ok. When they don't, it's just a plain [0-65536]
immediate result, effectively the same as a :lower16: variant kind.
rdar://10550147
llvm-svn: 155941
Previously, an unsupported/unknown assembler directive issued a warning.
That's generally unsafe, and inconsistent with the behaviour of pretty
much every system assembler. Now that the MC assemblers are mature
enough to be the default on multiple targets, it's reasonable to
issue errors for these.
For target or platform directives that need to stay warnings, we
should add explicit handlers for them in, e.g., ELFAsmParser.cpp,
DarwinAsmParser.cpp, et al., and issue the warning there.
rdar://9246275
llvm-svn: 155926
This patch will optimize the following cases on X86
(a > b) ? (a-b) : 0
(a >= b) ? (a-b) : 0
(b < a) ? (a-b) : 0
(b <= a) ? (a-b) : 0
FROM
movl %edi, %ecx
subl %esi, %ecx
cmpl %edi, %esi
movl $0, %eax
cmovll %ecx, %eax
TO
xorl %eax, %eax
subl %esi, %edi
cmovll %eax, %edi
movl %edi, %eax
rdar: 10734411
llvm-svn: 155919
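For reference, the four forms above correspond to source functions like these (an illustrative sketch, not part of the commit; all four are equivalent because a - b is 0 when a == b):

// Clamped difference: returns a - b when it is positive, otherwise 0.
// Illustrative only; the commit handles all four equivalent spellings.
int diffOrZero1(int a, int b) { return (a > b)  ? (a - b) : 0; }
int diffOrZero2(int a, int b) { return (a >= b) ? (a - b) : 0; }
int diffOrZero3(int a, int b) { return (b < a)  ? (a - b) : 0; }
int diffOrZero4(int a, int b) { return (b <= a) ? (a - b) : 0; }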
(to generate debug info for local variables) if stack needs realignment
llvm-svn: 155917
llvm-svn: 155912
has no exit blocks. Fixes PR12706!
llvm-svn: 155884
<rdar://problem/11291436>.
This is a second attempt at a fix for this; the first was r155468. Thanks
to Chandler, Bob and others for the feedback that helped me improve this.
llvm-svn: 155866
This patch will optimize -(x != 0) on X86
FROM
cmpl $0x01,%edi
sbbl %eax,%eax
notl %eax
TO
negl %edi
sbbl %eax, %eax
llvm-svn: 155853
llvm-svn: 155840
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention in
X86TargetLowering::LowerCall, but is now checked properly in
X86FastISel::DoSelectCall.
(this time, actually commit what was reviewed!)
llvm-svn: 155825
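As a rough source-level sketch of the situation (hypothetical; fastcc itself is an LLVM IR calling convention, so C++ can only illustrate the hidden sret pointer that large struct returns introduce on x86-32):

// Returning a struct this large goes through a hidden 'sret' pointer on
// x86-32. Under the default C convention the callee pops that pointer and
// the caller re-pushes it; under fastcc it travels in a register and the
// caller must not adjust the stack. That is the distinction fast-isel now checks.
// Hypothetical types and names, for illustration only.
struct Payload { int words[16]; };

Payload producePayload() {         // callee stores through the sret pointer
  Payload p = {};
  p.words[0] = 42;
  return p;
}

int firstWord() {
  Payload p = producePayload();    // call site whose stack adjustment is at issue
  return p.words[0];
}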
ARM BUILD_VECTORs created after type legalization cannot use i8 or i16
operands, since those types are not legal. Instead use i32 operands, which
will be implicitly truncated by the BUILD_VECTOR to match the element type.
llvm-svn: 155824