Generalization of Nate Begeman's patch!
llvm-svn: 130502
|
llvm-svn: 130498
|
more callee-saved registers and introduce copies. Only allow it if scheduling
a node above calls would end up lessening register pressure.
Call operands also have added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.
rdar://9329627
llvm-svn: 130245
|
lit needs a linter ...
llvm-svn: 130126
|
llvm-svn: 130050
|
Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.
Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.
Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).
llvm-svn: 130048
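(Illustration, not part of the commit; the function and register assignments are hypothetical.) 64-bit integer arithmetic on 32-bit ARM/Thumb2 is the typical source of these nodes: the low half sets the carry and the high half consumes it, so the carry flag must be tracked as live between the two instructions.

#include <stdint.h>

/* The low-half add sets the carry; the high-half add-with-carry uses it,
 * so the flags must stay live across the pair. */
uint64_t add64(uint64_t a, uint64_t b) {
    return a + b;   /* roughly: adds r0, r0, r2 ; adc r1, r1, r3 */
}

uint64_t sub64(uint64_t a, uint64_t b) {
    return a - b;   /* roughly: subs r0, r0, r2 ; sbc r1, r1, r3 */
}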
|
llvm-svn: 130046
|
add <rd>, sp, #<imm8>
ldr <rd>, [sp, #<imm8>]
When the offset from sp is a multiple of 4 and in the range 0-1020.
This saves code size by utilizing 16-bit instructions.
rdar://9321541
llvm-svn: 129971
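(Hypothetical example, not from the commit.) A function with small, word-aligned locals is the sort of code where these 16-bit sp-relative forms apply; each access below sits at a 4-byte-aligned offset well inside the 0-1020 range.

/* Locals at small, 4-byte-aligned offsets from sp can use the 16-bit
 * "add <rd>, sp, #<imm>" and "ldr <rd>, [sp, #<imm>]" encodings. */
int checksum(void) {
    volatile int buf[8];
    for (int i = 0; i < 8; ++i)
        buf[i] = i * 3;
    int sum = 0;
    for (int i = 0; i < 8; ++i)
        sum += buf[i];
    return sum;
}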
|
latency.
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handling node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero-cycle stage or the node is a TokenFactor chain.
Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the node latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor: it
now has zero latency to its consumers and is always scheduled as low as
possible.
llvm-svn: 129421
|
llvm-svn: 129195
|
llvm-svn: 128690
|
masks to match inversely for the code as-is to work. For the example given
we actually want:
bfi r0, r2, #1, #1
not #0; however, given the way the pattern is written, that's not possible
at the moment.
Fixes rdar://9177502
llvm-svn: 128320
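(Sketch only; the function and masks below are hypothetical, not taken from the commit.) The source-level shape being matched is a bitfield insert: clear a field in one value and OR in bits taken from another, which should collapse to a single BFI, here with lsb #1 and width #1.

/* Insert the low bit of b at bit position 1 of a; ideally this becomes
 * "bfi r0, r1, #1, #1" instead of separate and/or/shift instructions. */
unsigned insert_bit1(unsigned a, unsigned b) {
    return (a & ~0x2u) | ((b & 0x1u) << 1);
}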
|
Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.
This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.
llvm-svn: 127498
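(Hedged example, not from the commit.) __builtin_object_size is the usual way such llvm.objectsize calls appear; once the intrinsic folds to a constant, the bounds-check branch below becomes trivially taken or not taken and can be deleted by this optimization.

#include <string.h>

/* __builtin_object_size lowers to llvm.objectsize; after it folds to a
 * constant, the guarding branch is trivial and can be removed. */
void copy_checked(char *dst, const char *src, size_t n) {
    if (n > __builtin_object_size(dst, 0))
        __builtin_trap();
    memcpy(dst, src, n);
}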
|
created from the", it broke some GCC test suite tests.
llvm-svn: 127477
|
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.
This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.
llvm-svn: 127459
|
llvm-svn: 124933
|
1. Fixed ARM pc adjustment.
2. Fixed dynamic-no-pic codegen.
3. CSE of pc-relative loads of global addresses.
It's now enabled by default for Darwin.
llvm-svn: 123991
|
DAG. Disable using "-disable-sched-cycles".
For ARM, this enables a framework for modeling the cpu pipeline and
counting stalls. It also activates several heuristics to drive
scheduling based on the model. Scheduling is inherently imprecise at
this stage, and until spilling is improved it may defeat attempts to
schedule. However, this framework provides greater control over
tuning codegen.
Although the flag is not target-specific, it should have very little
effect on the default scheduler used by x86. The only two changes that
affect x86 are:
- Scheduling a high-latency operation bumps the current cycle so independent
operations can have their latency covered, i.e. two independent 4-cycle
operations can produce results in 4 cycles, not 8 cycles.
- Two operations with equal register pressure impact and no
latency-based stalls on their uses will be prioritized by depth before height
(height is irrelevant if no stalls occur in the schedule below this point).
llvm-svn: 123971
|
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.
The test changes are needed to keep those spill-q tests from testing aligned
spills and restores. If the only aligned stack objects are spill slots, we
no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.
llvm-svn: 122995
|
Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now. It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior. Since that isn't obviously wrong, I've just
changed the test file. This completes the work for Radar 8711675.
llvm-svn: 121730
|
#shamt. rdar://8752056
llvm-svn: 121606
|
Otherwise, a plain str/ldr should be used instead. Make sure we account for
that in prologue/epilogue code generation.
rdar://8745460
llvm-svn: 121391
|
Check for that and try narrowing it to tADDspi instead. Radar 8724703.
llvm-svn: 120892
|
32-bit wide version by adding the .w suffix.
llvm-svn: 120838
|
Additionally, update these to unified syntax.
llvm-svn: 120589
|
state. Previously Thumb2 would restore sp from fp like this:
mov sp, r7
sub sp, #4
If an interrupt is taken after the 'mov' but before the 'sub', callee-saved
registers might be clobbered by the interrupt handler. Instead, try
restoring directly from sp:
add sp, #4
Or, if necessary (with VLA, etc.) use a scratch register to compute sp and
then restore it:
sub.w r4, r7, #8
mov sp, r4
rdar://8465407
llvm-svn: 119977
|
Remove movePastCSLoadStoreOps and associated code for simple pointer
increments. Update routines that depended upon other opcodes for save/restore.
Adjust all testcases accordingly.
llvm-svn: 119725
|
appear to differ on Linux. Try to make them pass on Linux.
Would be good for a Linux person to review this.
llvm-svn: 119572
|
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.
Instead, let the peephole optimization pass recognize registers that are defined
by immediates, and let the ARM target hook fold the immediates in.
Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out; machine sinking would have sunk the computation, and that ends up pessimizing
code. The peephole pass would recognize situations where the 'and' can be
toggled to define CPSR and eliminate the comparison anyway.
2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking
important optimizations.
rdar://8663787, rdar://8241368
llvm-svn: 119548
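(Hypothetical loop, not part of the commit.) A constant such as 0xAA00AA00 is not encodable as an ARM modified immediate, so it is materialized with a movw/movt pair; with the isel hack removed, that loop-invariant pair can be hoisted by machine LICM, and the peephole pass can still fold the and-plus-compare into a flag-setting form where that is profitable.

/* 0xAA00AA00 needs a movw/movt pair; the materialization is loop
 * invariant and can be hoisted out of the loop. */
int count_masked(const unsigned *p, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (p[i] & 0xAA00AA00u)   /* and + compare against zero */
            ++count;
    return count;
}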
|
1. Fix the pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to introduce spills.
2. Fix the if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops, since multi-latency instructions are completely executed
even when the predicate is false. Also, some instructions will be "slower"
when they are predicated due to the register def becoming an implicit input.
rdar://8598427
llvm-svn: 118135
|
assumptions about stack layout. Specifically, LR must be saved next to FP.
llvm-svn: 118026
|
There were a number of issues to fix up here:
* The "device" argument of the llvm.memory.barrier intrinsic should be
used to distinguish the "Full System" domain from the "Inner Shareable"
domain. It has nothing to do with using DMB vs. DSB instructions.
* The compiler should never need to emit DSB instructions. Remove the
ARMISD::SYNCBARRIER node and also remove the instruction patterns for DSB.
* Merge the separate DMB/DSB instructions for options only used for the
disassembler with the default DMB/DSB instructions. Add the default
"full system" option ARM_MB::SY to the ARM_MB::MemBOpt enum.
* Add a separate ARMISD::MEMBARRIER_MCR node for subtargets that implement
a data memory barrier using the MCR instruction.
* Fix up encodings for these instructions (except MCR).
I also updated the tests and added a few new ones to check for DMB options
that were not currently being exercised.
llvm-svn: 117756
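(Small hedged example, not from the commit.) A full barrier written in C, e.g. via __sync_synchronize(), is the kind of code that reaches the llvm.memory.barrier intrinsic; on ARM it should now come out as a DMB (or the MCR-based barrier on subtargets without DMB), never a DSB.

int data;
volatile int ready;

/* Publish data to another observer: the barrier orders the data store
 * before the flag store; on ARM this becomes a DMB. */
void publish(int v) {
    data = v;
    __sync_synchronize();   /* full memory barrier */
    ready = 1;
}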
|
operand and one of them has a single use that is a live-out copy, favor the
one that is live out. Otherwise it will be difficult to eliminate the copy
if the instruction is a loop induction variable update, e.g.:
BB:
sub r1, r3, #1
str r0, [r2, r3]
mov r3, r1
cmp
bne BB
=>
BB:
str r0, [r2, r3]
sub r3, r3, #1
cmp
bne BB
This fixed the recent 256.bzip2 regression.
llvm-svn: 117675
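(Hypothetical source for the pattern above, not from the commit.) The store reads the current index while the decrement produces the next one; commuting so that the subtract writes the index register directly makes the mov disappear.

/* Each iteration stores with the old index and then decrements it;
 * updating the index register in place avoids the extra copy. */
void fill_down(int *p, int v, int n) {
    for (int i = n; i != 0; --i)
        p[i] = v;
}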
|
- Initial register pressure in the loop should be all the live defs into the
loop, not just those from the loop preheader, which is often empty.
- When an instruction is hoisted, update register pressure from the loop
preheader down to the instruction's original BB.
- Treat the only use of a virtual register as a kill, since the code is still SSA.
llvm-svn: 116956
|
llvm-svn: 116955
|
integers by default, and remove the controlling flag, now
that LICM will hoist such vdups. 8003375.
llvm-svn: 116852
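(Sketch assuming NEON intrinsics are available; the function is hypothetical, not from the commit.) The splat of the constant is loop invariant, and machine LICM now hoists the resulting vdup out of the loop.

#include <arm_neon.h>
#include <stdint.h>

/* The vdup of the constant 3 inside the loop is invariant; machine LICM
 * hoists it to the preheader. */
void scale_by_3(int32_t *p, int n) {
    for (int i = 0; i + 4 <= n; i += 4) {
        int32x4_t k = vdupq_n_s32(3);
        int32x4_t v = vld1q_s32(p + i);
        vst1q_s32(p + i, vmulq_s32(v, k));
    }
}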
|
erased the instruction during LICM so UpdateRegPressureAfter() should not
reference it afterwards.
llvm-svn: 116845
|
is", which breaks some nightly tests.
llvm-svn: 116816
|
"long latency" enough to hoist even if it may increase spilling. Reloading
a value from a spill slot is often cheaper than performing an expensive
computation in the loop. For X86, that means machine LICM will hoist
SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
instructions.
- Enable register pressure aware machine LICM by default.
llvm-svn: 116781
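(Rough illustration of the trade-off, with a hypothetical loop not taken from the commit.) The divide and square root depend only on loop-invariant values; hoisting them out of the loop is worthwhile even if the hoisted value later has to be spilled and reloaded, since a reload is much cheaper than redoing the div/sqrt every iteration.

#include <math.h>

/* a / sqrtf(b) is loop invariant; paying a possible spill/reload for the
 * hoisted value beats re-executing the divide and sqrt per iteration. */
void scale_all(float *p, int n, float a, float b) {
    for (int i = 0; i < n; ++i)
        p[i] *= a / sqrtf(b);
}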
|
callee-saved registers at the end of the lists. Also prefer to avoid using
the low registers that are in register subclasses required by certain
instructions, so that those registers are more likely to be available when needed.
This change makes a huge improvement in spilling in some cases. Thanks to
Jakob for helping me realize the problem.
Most of this patch is fixing the testsuite. There are quite a few places
where we're checking for specific registers. I changed those to wildcards
in places where that doesn't weaken the tests. The spill-q.ll and
thumb2-spill-q.ll tests stopped spilling with this change, so I added a bunch
of live values to force spills on those tests.
llvm-svn: 116055
|
this makes
irrelevant, but add a new test for the new, improved functionality.
llvm-svn: 114494
|
between the high and low registers for prologue/epilogue code. This was
a Darwin-only thing that wasn't providing a realistic benefit anymore.
Combining the save areas simplifies the compiler code and results in better
ARM/Thumb2 codegen.
For example, previously we would generate code like:
push {r4, r5, r6, r7, lr}
add r7, sp, #12
stmdb sp!, {r8, r10, r11}
With this change, we combine the register saves and generate:
push {r4, r5, r6, r7, r8, r10, r11, lr}
add r7, sp, #12
rdar://8445635
llvm-svn: 114340
|
and shift instructions on ARM. Update the tests to match.
llvm-svn: 114230
|
llvm-svn: 114206
|
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. The if-converter should not treat them as non-micro-coded
simple instructions.
llvm-svn: 113570
|
the VST pseudos. The VLD/VST scheduling still needs work (see pr6722), but
at least we shouldn't confuse the loads with the stores.
llvm-svn: 113473
|
"For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs."
r112986 fixed a latent bug exposed by the above.
llvm-svn: 112989
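(Hedged illustration; the function below is hypothetical, not from the commit.) A Thumb function that mixes a variable-sized object with a large local area is the case this targets: the VLA keeps sp from being a fixed anchor, and the large area pushes some frame indices out of FP-relative addressing range, so a dedicated base register makes those accesses cheap.

/* The VLA makes sp offsets unknown at compile time, and the large array
 * pushes some frame indices out of FP addressing range; a frame base
 * register keeps locals and spill slots cheaply addressable. */
int process(int n, const int *src) {
    int scratch[300];
    int vla[n];
    for (int i = 0; i < 300; ++i)
        scratch[i] = i;
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        vla[i] = src[i];
        sum += vla[i] + scratch[i % 300];
    }
    return sum;
}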
|
either", it is breaking oggenc with Clang for ARMv6.
This reverts commit 8d6e29cfda270be483abf638850311670829ee65.
llvm-svn: 112962
|
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs.
rdar://7352504
rdar://8374540
rdar://8355680
llvm-svn: 112883
|
ARM register class allocation order functions to take advantage of that.
llvm-svn: 112841