| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
incoming argument. However, It is appropriate to emit DBG_VALUE referring to this incoming argument in entry block in MachineFunction.
llvm-svn: 130129
|
|
|
|
| |
llvm-svn: 130068
|
|
|
|
| |
llvm-svn: 130033
|
|
|
|
|
|
|
|
|
|
|
| |
fix bugs exposed by the gcc dejagnu testsuite:
1. The load may actually be used by a dead instruction, which
would cause an assert.
2. The load may not be used by the current chain of instructions,
and we could move it past a side-effecting instruction. Change
how we process uses to define the problem away.
llvm-svn: 130018
|
|
|
|
|
|
|
|
|
|
|
|
| |
On x86 this allows to fold a load into the cmp, greatly reducing register pressure.
movzbl (%rdi), %eax
cmpl $47, %eax
->
cmpb $47, (%rdi)
This shaves 8k off gcc.o on i386. I'll leave applying the patch in README.txt to Chris :)
llvm-svn: 130005
|
|
|
|
|
|
| |
which broke a couple GCC test suite tests at -O0.
llvm-svn: 129914
|
|
|
|
|
|
|
|
|
| |
manually and pass all (now) 4 arguments to the mul libcall. Add a new
ExpandLibCall for just this (copied gratuitously from type legalization).
Fixes rdar://9292577
llvm-svn: 129842
|
|
|
|
| |
llvm-svn: 129796
|
|
|
|
|
|
| |
unnecessary work where possible.
llvm-svn: 129763
|
|
|
|
|
|
| |
<rdar://problem/7662569>
llvm-svn: 129761
|
|
|
|
|
|
|
|
|
| |
generated
en-mass for C++ PODs. On my c++ test file, this cuts the fast isel rejects by 10x
and shrinks the generated .s file by 5%
llvm-svn: 129755
|
|
|
|
|
|
| |
this fixes a few rejects on c++ iterator loops.
llvm-svn: 129694
|
|
|
|
| |
llvm-svn: 129693
|
|
|
|
|
|
|
|
|
|
| |
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the
shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
instead of FastEmit_ri to simplify code.
llvm-svn: 129666
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
less trivial things) into a dummy lea. Before we generated:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
leaq (%rax), %rax
ret
now we produce:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
ret
This is part of rdar://9289558
llvm-svn: 129662
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo. Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:
cmpb $0, 52(%rax)
je LBB4_2
instead of:
movb 52(%rax), %cl
cmpb $0, %cl
je LBB4_2
This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and
generating less code.
This was one of the biggest classes of missing load folding. Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.
llvm-svn: 129656
|
|
|
|
|
|
|
| |
which don't need to check for falling off the end of a block *and* end of phi
nodes, since terminators are never phis.
llvm-svn: 129655
|
|
|
|
|
|
|
|
|
|
| |
allowing us to fold the immediate into the 'and' in this case:
int test1(int i) {
return 8&i;
}
llvm-svn: 129653
|
|
|
|
|
|
|
| |
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.
llvm-svn: 129650
|
|
|
|
|
|
| |
the node to a libcall. rdar://9280991
llvm-svn: 129633
|
|
|
|
|
|
| |
Luis Felipe Strano Moraes!
llvm-svn: 129558
|
|
|
|
|
|
| |
RHS of a shift.
llvm-svn: 129522
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.
The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down be -10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.
Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump
Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion
llvm-svn: 129508
|
|
|
|
| |
llvm-svn: 129503
|
|
|
|
|
|
| |
where the RHS is of the legal type for the new operation.
llvm-svn: 129484
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
latency.
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handle node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.
Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the ndoe latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
llvm-svn: 129421
|
|
|
|
| |
llvm-svn: 129385
|
|
|
|
|
|
|
|
|
|
|
|
| |
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
llvm-svn: 129383
|
|
|
|
| |
llvm-svn: 129271
|
|
|
|
|
|
|
|
| |
code.
Switch lowering probably shouldn't be using FP for this. This resolves PR9581.
llvm-svn: 129199
|
|
|
|
|
|
| |
with undef arguments.
llvm-svn: 129185
|
|
|
|
|
|
| |
is lowered into a call to the specified trap function at sdisel time.
llvm-svn: 129152
|
|
|
|
|
|
|
|
|
| |
induction variable. The preRA scheduler is unaware of induction vars,
so we look for potential "virtual register cycles" instead.
Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing
llvm-svn: 129100
|
|
|
|
|
|
|
|
|
|
|
|
| |
It needed to be moved closer to the setjmp statement, because the code directly
after the setjmp needs to know about values that are on the stack. Also, the
'bitcast' of the function context was causing a dead load. This wouldn't be too
horrible, except that at -O0 it wasn't optimized out, and because it wasn't
using the correct base pointer (if there is a VLA), it would try to access a
value from a garbage address.
<rdar://problem/9130540>
llvm-svn: 128873
|
|
|
|
| |
llvm-svn: 128868
|
|
|
|
|
|
|
|
| |
transformations in target-specific DAG combines without causing DAGCombiner to
delete the same node twice. If you know of a better way to avoid this (see my
next patch for an example), please let me know.
llvm-svn: 128758
|
|
|
|
| |
llvm-svn: 128730
|
|
|
|
|
|
| |
should improve src line debug info when sdisel is used. rdar://9199118
llvm-svn: 128728
|
|
|
|
|
|
| |
rdar://8911343
llvm-svn: 128696
|
|
|
|
|
|
|
|
| |
It couldn't be used outside of the file because SDISelAsmOperandInfo
is local to SelectionDAGBuilder.cpp. Making it a static function avoids
a weird linkage dance.
llvm-svn: 128342
|
|
|
|
|
|
|
| |
Yet another case of unchecked NULL node (for physreg copy).
May fix PR9509.
llvm-svn: 128266
|
|
|
|
|
|
| |
Also cleaning up some duplicated code while I'm here.
llvm-svn: 128176
|
|
|
|
|
|
|
|
|
| |
so the scheduler can't create new interferences on the copies
themselves. Prior to this fix the scheduler could get stuck in a loop
creating copies.
Fixes PR9509.
llvm-svn: 128164
|
|
|
|
| |
llvm-svn: 128163
|
|
|
|
|
|
|
|
| |
I'm tired of doing this manually for each checkout.
If anyone knows a better way debug isel for non-trivial tests feel
free to revert and let me know how to do it.
llvm-svn: 128132
|
|
|
|
| |
llvm-svn: 128004
|
|
|
|
|
|
|
|
| |
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.
llvm-svn: 127951
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
multiplied value by introducing an early shift.
This allows us to compile "unsigned foo(unsigned x) { return x/28; }" into
shrl $2, %edi
imulq $613566757, %rdi, %rax
shrq $32, %rax
ret
instead of
movl %edi, %eax
imulq $613566757, %rax, %rcx
shrq $32, %rcx
subl %ecx, %eax
shrl %eax
addl %ecx, %eax
shrl $4, %eax
on x86_64
llvm-svn: 127829
|
|
|
|
| |
llvm-svn: 127809
|
|
|
|
| |
llvm-svn: 127807
|