| Commit message (Collapse) | Author | Age | Files | Lines |
less trivial things) into a dummy lea. Before we generated:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
leaq (%rax), %rax
ret
now we produce:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
ret
This is part of rdar://9289558
llvm-svn: 129662
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo. Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:
cmpb $0, 52(%rax)
je LBB4_2
instead of:
movb 52(%rax), %cl
cmpb $0, %cl
je LBB4_2
This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and
generating less code.
This was one of the biggest classes of missing load folding. Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.
llvm-svn: 129656
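For illustration, a C function of roughly this shape is the kind of source that leads to a byte load feeding a compare and branch. This is a sketch, not the testcase from the commit: the struct, its field names, and the function name are hypothetical, and the padding is chosen only so that the field lands at byte offset 52, matching the listing above. The exact instructions emitted depend on the compiler version and flags.

```c
#include <stddef.h>

/* Hypothetical layout: 52 bytes of padding so that 'flag' sits at offset 52. */
struct node {
    char pad[52];
    char flag;
};

/* The load of n->flag feeds only the comparison against zero, so an
   instruction selector that folds loads can compare against memory
   directly instead of first loading into a register. */
int flag_is_clear(const struct node *n) {
    if (n->flag == 0)
        return 1;
    return 0;
}
```

Compiling a file like this at -O0 before and after the change would be one way to look for the difference in the generated assembly.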
llvm-svn: 129654
allowing us to fold the immediate into the 'and' in this case:
int test1(int i) {
return 8&i;
}
llvm-svn: 129653
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.
llvm-svn: 129650
with a 64-bit datalayout.
llvm-svn: 129643
the node to a libcall. rdar://9280991
llvm-svn: 129633
rdar://problem/9292717
llvm-svn: 129619
The a bit must be encoded as 0.
rdar://problem/9292625
llvm-svn: 129618
llvm-svn: 129616
a case involving EOR, so I only added a test for ORR.
llvm-svn: 129610
llvm-svn: 129607
problem as all of the other instructions we fold with CMPs.
llvm-svn: 129602
fixes <rdar://problem/9287901>.
llvm-svn: 129599
register allocation. Define pseudos that get expanded into mtc1 or mfc1 instructions.
llvm-svn: 129594
llvm-svn: 129589
debug info.
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.
llvm-svn: 129571
Luis Felipe Strano Moraes!
llvm-svn: 129558
This reduces the"
It broke several builds.
llvm-svn: 129557
forget to right shift the source by 32 first. rdar://9287902
llvm-svn: 129556
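The kind of mistake described here, taking the upper half of a 64-bit value without shifting first, can be illustrated in plain C. This is only a sketch: the helper names are hypothetical and are not LLVM code, and the truncated message above does not say which instruction or lowering was affected.

```c
#include <stdint.h>

/* Hypothetical helper: the low 32 bits are obtained by plain truncation. */
uint32_t low_half(uint64_t v) {
    return (uint32_t)v;
}

/* Hypothetical helper: the high 32 bits. The source must be shifted right
   by 32 before truncating; truncating without the shift would return the
   low half again. */
uint32_t high_half(uint64_t v) {
    return (uint32_t)(v >> 32);
}
```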
llvm-svn: 129551
instructions
(single element or n-element structure to all lanes).
llvm-svn: 129550
llvm-svn: 129548
canonical, and generally leads to better code. Found while looking at
an article about saturating arithmetic.
llvm-svn: 129545
repeatedly undo each other. The solution is to constant fold more aggressively, so that one of the edges is simply folded away instead of being threaded.
Fixes <rdar://problem/9284786>.
Discovered with CSmith.
llvm-svn: 129538
operations.
llvm-svn: 129531
llvm-svn: 129519
size of the clang binary in Debug builds from 690MB to 679MB.
llvm-svn: 129518
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.
The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down by 10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.
Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump
Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion
llvm-svn: 129508
(movzx/movsx) because they give more information. Revert that part of the patch.
llvm-svn: 129498
cases, it's much nicer and more informative reading the alias.
llvm-svn: 129497
rdar://problem/9280370
llvm-svn: 129480
the same allocation size but different primitive sizes (e.g., <3xi32> and
<4xi32>). When ScalarRepl promotes them, it can't use a bit cast but
should use a shuffle vector instead.
llvm-svn: 129472
instructions (tBcc and t2Bcc).
rdar://problem/9280470
llvm-svn: 129471
rdar://problem/9279440
llvm-svn: 129469
llvm-svn: 129468
ignored. There was a test to catch this, but it was just blindly updated in
a large change. This fixes another part of <rdar://problem/9275290>.
llvm-svn: 129466
as such.
rdar://problem/9276651
llvm-svn: 129462
not properly handled.
rdar://problem/9276427
llvm-svn: 129456
the max itself, so it is not easy to write a test case for this, but I added a
test case that would fail if the code in AsmPrinter were removed.
llvm-svn: 129432
alignment for its type, use the minimum of the specified alignment and the ABI
alignment. This fixes <rdar://problem/9275290>.
llvm-svn: 129428
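The rule stated above, taking the minimum of the specified alignment and the ABI alignment, can be sketched in C as follows. The function name and its parameters are hypothetical and do not correspond to an LLVM API; the start of the message is cut off, so the exact condition under which the rule applies is not shown here.

```c
#include <assert.h>

/* Hypothetical helper: choose the alignment to use when both an explicitly
   specified alignment and the type's ABI alignment are known. Both values
   are byte counts and are assumed to be nonzero. */
unsigned effective_alignment(unsigned specified, unsigned abi) {
    assert(specified != 0 && abi != 0);
    return specified < abi ? specified : abi;
}
```

Under these assumptions, a specified alignment of 1 on a type whose ABI alignment is 4 gives 1, so the lower explicit alignment is not ignored.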
latency.
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handling node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.
Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the node latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
llvm-svn: 129421
llvm-svn: 129419
llvm-svn: 129417
rdar://problem/9273947
llvm-svn: 129411
registers for fast allocation.
Fixes rdar://9207598
llvm-svn: 129408
llvm-svn: 129403
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.
First part of <rdar://problem/8460511>.
llvm-svn: 129401
generators. It may improve robustness when testing from VS too.
Based on a patch by David Neto!
llvm-svn: 129398
In addition, the base register is not rGPR, but GPR with the exception that:
if n == 15 then UNPREDICTABLE
rdar://problem/9273836
llvm-svn: 129391