the spilled register.
This is quite common on ARM now that some stores have early-clobber defines.
llvm-svn: 129714
registers for fast allocation a different way. This has us updating
used registers only when we're using that exact register.
Fixes rdar://9207598
llvm-svn: 129711
this fixes a few rejects on C++ iterator loops.
llvm-svn: 129694
llvm-svn: 129693
2. Implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. Teach tblgen to handle shift immediates that are different sizes than the
shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
instead of FastEmit_ri to simplify code.
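A minimal example of the kind of source this affects (hypothetical, not taken
from the original commit): a multiply by a power of two that fast isel can now
select as a shift.

int scale(int x) {
  return x * 8;   /* can be selected as x << 3 instead of an imul */
}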
llvm-svn: 129666
less trivial things) into a dummy lea. Before we generated:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
leaq (%rax), %rax
ret
now we produce:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
ret
This is part of rdar://9289558
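A hypothetical source snippet that produces this pattern when compiled as
position-independent code (a sketch, not from the original commit):

extern int G;
int *addr_of_G(void) {
  return &G;   /* &G comes from movq _G@GOTPCREL(%rip), %rax; no extra lea needed */
}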
llvm-svn: 129662
The basic issue here is that bottom-up isel was matching the branch
and compare, but failing to fold the load into the branch/compare
combo. Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:
cmpb $0, 52(%rax)
je LBB4_2
instead of:
movb 52(%rax), %cl
cmpb $0, %cl
je LBB4_2
This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and
generating less code.
This was one of the biggest classes of missing load folding. Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.
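A hypothetical example of code that benefits (the field offset is chosen to
match the assembly above; it is not from the original commit):

struct state {
  char pad[52];
  char flag;          /* at offset 52 */
};

int is_cleared(struct state *p) {
  if (p->flag == 0)   /* the load of p->flag now folds into cmpb $0, 52(%rax) */
    return 1;
  return 0;
}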
llvm-svn: 129656
which don't need to check for falling off the end of a block *and* end of phi
nodes, since terminators are never phis.
llvm-svn: 129655
allowing us to fold the immediate into the 'and' in this case:
int test1(int i) {
return 8&i;
}
llvm-svn: 129653
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.
llvm-svn: 129650
For further information on this particular issue see: http://connect.microsoft.com/VisualStudio/feedback/details/520043/error-converting-from-null-to-a-pointer-type-in-std-pair
llvm-svn: 129642
llvm-svn: 129639
error in foo.o; no .eh_frame_hdr table will be created.
llvm-svn: 129635
the node to a libcall. rdar://9280991
llvm-svn: 129633
information generated for an interface.
llvm-svn: 129624
llvm-svn: 129600
The transferValues() function can now handle both singly and multiply defined
values, as long as the resulting live range is known. Only rematerialized values
have their live range recomputed by extendRange().
The updateSSA() function can now insert PHI values in bulk across multiple
values in multiple target registers in one pass. The list of blocks received
from transferValues() is in layout order which seems to work well for the
iterative algorithm. Blocks from extendRange() are still in reverse BFS order,
but this function is used so rarely now that it doesn't matter.
llvm-svn: 129580
llvm-svn: 129579
debug info.
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.
llvm-svn: 129571
Luis Felipe Strano Moraes!
llvm-svn: 129558
This reduces the"
It broke several builds.
llvm-svn: 129557
RHS of a shift.
llvm-svn: 129522
size of the clang binary in Debug builds from 690MB to 679MB.
llvm-svn: 129518
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.
The primary motivation is to generate code closer to what people
expect and rule out missed opportunities from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down by about 10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.
Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump
Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion
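A hypothetical illustration of the pattern involved (not from the original
commit): keeping the compare and the conditional branch adjacent is what lets
hardware macro-op fusion kick in.

int count_matches(const int *a, int n, int key) {
  int c = 0;
  for (int i = 0; i < n; ++i)
    if (a[i] == key)   /* compiles to a cmp/jcc pair that should stay adjacent */
      ++c;
  return c;
}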
llvm-svn: 129508
llvm-svn: 129503
where the RHS is of the legal type for the new operation.
llvm-svn: 129484
understand the actual reason behind this FIXME. Spot checking suggests that newer gdb does not need this.
llvm-svn: 129461
llvm-svn: 129442
latency.
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handling node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.
Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the node latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
llvm-svn: 129421
llvm-svn: 129417
registers for fast allocation.
Fixes rdar://9207598
llvm-svn: 129408
llvm-svn: 129407
llvm-svn: 129406
llvm-svn: 129405
In the case of multiple compile units in one object file, each compile unit is responsible for its own set of type entries anyway. This refactoring makes that obvious.
llvm-svn: 129402
llvm-svn: 129400
Use a BitVector instead; we didn't need the smaller memory footprint anyway.
This makes the greedy register allocator 10% faster.
llvm-svn: 129390
llvm-svn: 129385
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
llvm-svn: 129383
This merges the behavior of splitSingleBlocks into splitAroundRegion, so the
RS_Region and RS_Block register stages can be coalesced. That means the leftover
intervals after region splitting go directly to spilling instead of a second
pass of per-block splitting.
llvm-svn: 129379
This makes it possible to target multiple registers in one pass.
llvm-svn: 129374
accidentally be skipped.
llvm-svn: 129373
llvm-svn: 129368
llvm-svn: 129367
llvm-svn: 129334
when compiling many small functions.
llvm-svn: 129321
mean that it has to be ConstantArray of ConstantStruct. We might have
ConstantAggregateZero, at either level, so don't crash on that.
Also, semi-deprecate the sentinel value. The linker isn't aware of sentinels, so
we end up with the two lists appended, each with its "sentinel" on it.
Different parts of LLVM treated sentinels differently, so make them all just
ignore the single entry and continue on with the rest of the list.
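For context, a translation unit like this hypothetical one contributes an
entry to the llvm.global_ctors list (a ConstantArray of ConstantStruct), the
structure this change makes more robust:

/* Each constructor-attribute function becomes one { priority, function }
   entry in the module's llvm.global_ctors array. */
__attribute__((constructor)) static void init_table(void) {
  /* runs before main() */
}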
llvm-svn: 129307
weight limit has been exceeded.
llvm-svn: 129305
the 'unwind' instruction. However, later on that instruction was converted into
a jump to the basic block it was located in, causing an infinite loop when we
get there.
It turns out, we get there if the _Unwind_Resume_or_Rethrow call returns (which
it's not supposed to do). It returns if it cannot find a place to unwind
to. Thus we would get what appears to be a "hang" when in reality it's just that
the EH couldn't be propagated further along.
Instead of infinitely looping (or calling `unwind', which none of our back-ends
support; it's lowered into nothing), call the @llvm.trap() intrinsic.
This may not conform to the specific rules of a particular language, but
it's rather better than infinitely looping.
<rdar://problem/9175843&9233582>
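A rough C-level sketch of the new fallback behavior (hypothetical; the real
change is in the IR lowering, not in user code):

#include <unwind.h>

void resume_unwind(struct _Unwind_Exception *e) {
  _Unwind_Resume_or_Rethrow(e);   /* normally does not return */
  __builtin_trap();               /* corresponds to @llvm.trap(): stop instead of looping */
}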
llvm-svn: 129302
more copies. rdar://9266679
llvm-svn: 129297