predicates.
llvm-svn: 129816
Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has a latency that is one cycle shorter than a
   pair of vmul + vadd, a RAW hazard during the first few cycles (4? on
   Cortex-A8) can cause additional pipeline stalls, so it's frequently better
   to simply codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction
   to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
   RAW vmla + vmla is very bad, but this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4-cycle vmul stall, the second sequence is still 2 cycles
   faster.
Up to now, isel has simply avoided codegen'ing fp vmla / vmls. This works well
enough, but it isn't the optimal solution. This patch attempts to make it
possible to use vmla / vmls in the cases where it is profitable:
A. Add the missing isel predicates which allow vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute both an fmul and an fmla.
C. Add additional isel checks for vmla to avoid the cases where vmla feeds
   into other fp instructions (except for the exceptional case in #3).
D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc pass to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.
Enable these fp vmlx codegen changes for Cortex-A9.
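(For illustration only, not part of the original commit: a minimal C-level
sketch of the (fadd (fmul)) pattern from B; the function name is made up.)

    /* With a single-use multiply, isel may now pick vmla here when the
       target and fp settings allow it; if a * b had a second use, per B
       it would stay as vmul + vadd so the product isn't computed twice. */
    float mac(float acc, float a, float b) {
      return acc + a * b;   /* (fadd acc (fmul a b)) */
    }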
llvm-svn: 129775
llvm-svn: 129034
No one uses *-mingw64. mingw-w64 is represented as {i686|x86_64}-w64-mingw32. On the LLVM side, i686 and x64 can be treated in a similar way.
llvm-svn: 125747
llvm-svn: 124077
and fixes here and there.
llvm-svn: 123170
Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has a latency that is one cycle shorter than a
   pair of vmul + vadd, a RAW hazard during the first few cycles (4? on
   Cortex-A8) can cause additional pipeline stalls, so it's frequently better
   to simply codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction
   to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
   RAW vmla + vmla is very bad, but this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4-cycle vmul stall, the second sequence is still 2 cycles
   faster.
Up to now, isel has simply avoided codegen'ing fp vmla / vmls. This works well
enough, but it isn't the optimal solution. This patch attempts to make it
possible to use vmla / vmls in the cases where it is profitable:
A. Add the missing isel predicates which allow vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute both an fmul and an fmla.
C. Add additional isel checks for vmla to avoid the cases where vmla feeds
   into other fp instructions (except for the exceptional case in #3).
D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc pass to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.
Work in progress; only A and B are enabled.
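(A rough illustrative sketch of the hazard modeling in D; the function and
its parameters are hypothetical, not the actual LLVM hazard recognizer API.)

    // The hazard from point 2: an fp instruction issued right after a
    // vmla / vmls stalls for ~4 cycles, so report a hazard and let the
    // scheduler move the two apart.
    bool hasVMLxHazard(bool PrevWasVMLx, bool CurIsFPOp) {
      return PrevWasVMLx && CurIsFPOp;
    }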
llvm-svn: 120960
llvm-svn: 119462
out of TargetRegisterInfo into TargetFrameInfo, which is definitely a much more suitable place for it.
llvm-svn: 119097
llvm-svn: 118835
llvm-svn: 118827
so and also change X86 for consistency.
Investigating if this can be improved a bit.
llvm-svn: 115469
the equivalent X86 class.
For now, I split the ELFARMAsmBackend from the DarwinARMAsmBackend
(also mimicking X86).
Tested against -r115126
llvm-svn: 115129
ARMTargetMachine.cpp:53: error: control reaches end of non-void function
llvm-svn: 114992
llc now recognizes the "intent" to support MC/obj emission for ARM, but
given that the implementations are all stubs, it asserts on --filetype=obj --march=arm.
Patch by Jason Kim.
llvm-svn: 114856
register allocation. Remove the NEONPreAllocPass, which is no longer needed.
Yeah!!
llvm-svn: 113818
support it, e.g., cortex-m* processors.
llvm-svn: 110798
more 32-bit to 16-bit optimizations.
llvm-svn: 110584
experimentation.
llvm-svn: 110579
llvm-svn: 109359
llvm-svn: 107513
Radar 8128745.
llvm-svn: 106820
llvm-svn: 106775
llvm-svn: 106355
- This fixed a number of bugs in the if-converter, tail merging, and the
  post-allocation scheduler. The if-converter now runs branch folding / tail
  merging first to maximize if-conversion opportunities.
- Also changed the t2IT instruction slightly. It now defines the ITSTATE
  register, which is read by the instructions in the IT block.
- Added a Thumb2-specific hazard recognizer to ensure the scheduler doesn't
  change the instruction ordering in the IT block (since the IT mask has been
  finalized). It also ensures no other instructions can be scheduled between
  the instructions in the IT block.
This is not yet enabled.
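(For context, an illustration that is not part of the commit: the kind of
source the if-converter turns into a single Thumb-2 IT block of predicated
instructions.)

    // If-conversion candidate: instead of a branch, both subtractions
    // can be emitted as predicated instructions under one IT block,
    // whose mask fixes their predicates and ordering.
    int abs_diff(int a, int b) {
      return (a > b) ? a - b : b - a;
    }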
llvm-svn: 106344
(conservatively) aware of predicated instructions. This enables ARM to move if-conversion before the post-ra scheduler.
llvm-svn: 106091
llvm-svn: 105677
the same condition, it's important to make sure they are scheduled together
to avoid forming multiple IT blocks. I'm adding a pre-regalloc pass that forms
IT blocks early (by re-scheduling instructions and splitting basic blocks) to
attempt to fix this. This is not turned on by default since I am not sure it
is the right fix.
Another issue is that LLVM selects are modeled as two-address conditional
moves. This can be very bad when the copies before the conditional moves are
not coalesced away. Teach the IT formation pass to move the copies above the
IT block (when legal) to avoid breaking the IT block.
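(A hypothetical illustration, not from the commit, of selects sharing one
condition that want to land in a single IT block.)

    // Both selects predicate on the same comparison; if their
    // conditional moves get scheduled apart, two IT blocks form
    // instead of one.
    int pick2(int a, int b, int x, int y, int u, int v) {
      int p = (a < b) ? x : y;
      int q = (a < b) ? u : v;
      return p + q;
    }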
llvm-svn: 105669
Move EmitTargetCodeForMemcpy, EmitTargetCodeForMemset, and
EmitTargetCodeForMemmove out of TargetLowering and into
SelectionDAGInfo to exercise this.
llvm-svn: 103481
It is not ready for the public yet.
llvm-svn: 100673
llvm-svn: 100668
llvm-svn: 100641
llvm-svn: 99097
llvm-svn: 95134
eliminate random "code emitter" stuff in Alpha, except for
the JIT path. Next up, remove the template cruft.
llvm-svn: 95131
function can support dynamic stack realignment. That's a much easier question
to answer at the instruction selection stage than whether the function actually
will have a dynamic alignment prologue. This allows the removal of the stack
alignment heuristic pass and improves code quality for cases where the
heuristic would result in dynamic alignment code being generated when it was
not strictly necessary.
llvm-svn: 93885
No functionality change.
llvm-svn: 90336
conservatively. The eliminateFrameIndex() machinery is adjusted to handle
addressing mode 6 (vld1/vst1) used for spills. Fix tests to expect aligned
Q-reg spilling.
llvm-svn: 88874
Please verify.
llvm-svn: 86397
load of a GV from the constant pool and then add pc. This allows the code
sequence to be rematerializable, so it can be hoisted by machine LICM.
- Add a late pass to break these pseudo instructions into a number of real
  instructions. Also move the code in the Thumb2 IT pass that breaks up
  t2MOVi32imm to this pass. This is done before post-regalloc scheduling so
  the scheduler can properly schedule these instructions. It also allows them
  to be if-converted and shrunk by later passes.
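(A simplified sketch of the late-expansion idea described above; the pseudo
name and the textual instruction strings are hypothetical, not the actual
ARM backend opcodes.)

    #include <string>
    #include <vector>

    // Late expansion: one rematerializable pseudo survives machine LICM
    // as a unit, then is broken into its real two-instruction sequence
    // here, before post-regalloc scheduling.
    std::vector<std::string> expandPseudo(const std::string &Opcode) {
      if (Opcode == "PICLDR_pseudo")   // hypothetical GV load + pc add
        return {"ldr rd, [pc, #imm]", "add rd, pc"};
      return {Opcode};                 // real instructions pass through
    }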
llvm-svn: 86304
llvm-svn: 86251
llvm-svn: 85914
the compile time.
llvm-svn: 85850
I'm going to redo this using the OptimizeForSize function attribute.
llvm-svn: 85426
use it to control tail merging when there is a tradeoff between performance
and code size. When there is only 1 instruction in the common tail, we have
been merging. That can be good for code size but is a definite loss for
performance. Now we will avoid tail merging in that case when the
optimization level is "Aggressive", i.e., "-O3". Radar 7338114.
Since the IfConversion pass invokes BranchFolding, it too needs to know
the optimization level. Note that I removed the RegisterPass instantiation
for IfConversion because it required a default constructor. If someone
wants to keep that for some reason, we can add a default constructor with
a hard-wired optimization level.
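(A minimal sketch of the policy described above; hypothetical code, not the
actual BranchFolding implementation.)

    // Merge a common tail unless it is a single instruction and we are
    // optimizing aggressively for speed, where the extra branch would
    // cost more than the one-instruction code-size win.
    bool shouldTailMerge(unsigned CommonTailLen, bool AggressiveOpt) {
      if (CommonTailLen > 1)
        return true;
      return !AggressiveOpt;   // at -O3, skip single-instruction tails
    }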
llvm-svn: 85346
llvm-svn: 84868
instruction gets scheduled properly.
llvm-svn: 84843
llvm-svn: 84831
llvm-svn: 83236
llvm-svn: 83145