It never does anything when running 'make check', and it gets in the
way of updating live intervals in 2-addr.
The hook was originally added to help form IT blocks in Thumb2 code
before register allocation, but the pass ordering has changed since
then, and we run if-conversion after register allocation now.
When the MI scheduler is enabled, there will be no fewer than two
schedulers between 2-addr and Thumb2ITBlockPass, so this hook is
unlikely to help anything.
llvm-svn: 161794
llvm-svn: 153421
llvm-svn: 152978
Without this hook, functions with a completely empty body (including no
epilogue) will cause an MCEmitter assertion failure.
For example:
define internal fastcc void @empty_function() {
  unreachable
}
rdar://10947471
llvm-svn: 151673
MSP430, PPC, PTX, Sparc, X86, XCore.
llvm-svn: 150878
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has a latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-A8) cycles can
   cause an additional pipeline stall, so it's frequently better to simply
   codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
   RAW vmla + vmla is very bad, but this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4-cycle vmul stall, the second sequence is still 2 cycles
   faster.
Up to now, isel has simply avoided codegen'ing fp vmla / vmls. This works well
enough, but it isn't the optimal solution. This patch attempts to make it
possible to use vmla / vmls in cases where it is profitable:
A. Add the missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute both an fmul and an fmla (see the sketch below).
C. Add additional isel checks for vmla, avoiding cases where vmla is feeding
   into fp instructions (except for the exceptional case #3).
D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc pass to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.
Work in progress; only A+B are enabled.
llvm-svn: 120960
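A rough sketch of point B above, in VFP assembly with hypothetical register
assignments (not taken from the patch): when the product also has another use,
forming a vmla does not eliminate the vmul, so the non-fused form is preferable.

     @ hypothetical registers: s0 = a, s1 = b, s2 = c; a*b is needed twice
     vmul.f32 s4, s0, s1     @ a*b, still required for its other use
     vmla.f32 s2, s0, s1     @ s2 += a*b; the vmul above is not saved
     @ preferred when the fmul has more than one use:
     vmul.f32 s4, s0, s1     @ a*b
     vadd.f32 s2, s2, s4     @ reuse the product instead of fusing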
1. Fix the pre-ra scheduler so it doesn't try to push instructions above calls to
   "optimize for latency". Call instructions don't have the right latency and
   this is more likely to introduce spills.
2. Fix the if-converter cost function. For ARM, it should use instruction latencies,
   not the number of micro-ops, since multi-latency instructions are completely
   executed even when the predicate is false. Also, some instructions will be
   "slower" when they are predicated because the register def becomes an
   implicit input (see the sketch below).
   rdar://8598427
llvm-svn: 118135
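A small illustration of the second point, with hypothetical registers (not from
the patch): once an instruction is predicated, its destination's old value must
survive the false path, so the def also behaves as an implicit input.

     cmp     r2, #0
     moveq   r0, #1          @ hypothetical; if the condition is false the old
                             @ r0 must be kept, so r0 is effectively read too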
if-conversion heuristic APIs. For now,
stick with a constant estimate of 90% (branch predictors are good!), but we might
find that we want to provide more nuanced estimates in the future.
llvm-svn: 115364
llvm-svn: 115339
if-conversion profitability.
Rather than having arbitrary cutoffs, actually try to cost model the conversion.
For now, the constants are tuned to more or less match our existing behavior, but
these will be changed to reflect realistic values as this work proceeds.
llvm-svn: 114973
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), instructions that
are predicated on false are not nops; they still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode, so the if-converter should not treat them as non-micro-coded
simple instructions (see the sketch below).
llvm-svn: 113570
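A small illustration with hypothetical registers (not from the patch): after
if-conversion the load-multiple is always fetched and decoded, so even on the
false path its multi-cycle decode is not free.

     cmp     r3, #0
     ldmeq   r0, {r4, r5, r6, r7}   @ hypothetical; micro-coded, so the
                                    @ predicated-false case still costs decode cycles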
llvm-svn: 108078
llvm-svn: 106901
llvm-svn: 106517
- This fixed a number of bugs in the if-converter, tail merging, and the
  post-allocation scheduler. The if-converter now runs branch folding / tail
  merging first to maximize if-conversion opportunities.
- Also changed the t2IT instruction slightly. It now defines the ITSTATE
  register, which is read by the instructions in the IT block.
- Added a Thumb2-specific hazard recognizer to ensure the scheduler doesn't
  change the instruction ordering in the IT block (since the IT mask has been
  finalized). It also ensures no other instructions can be scheduled between
  instructions in the IT block (see the sketch below).
This is not yet enabled.
llvm-svn: 106344
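A sketch of a Thumb-2 IT block with hypothetical registers (not from the patch):
the IT instruction's mask fixes the then/else pattern of the instructions that
follow, so once the mask is finalized nothing may be reordered within, or
scheduled into, the block.

     cmp     r2, #0
     itte    eq              @ mask: then, then, else
     moveq   r0, #1          @ then
     addeq   r1, r1, #4      @ then
     movne   r0, #0          @ else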
will use this to try to avoid breaking up IT blocks.
llvm-svn: 105745
doesn't have to guess.
llvm-svn: 103194
llvm-svn: 103193
MachineBasicBlock::canFallThrough(), which is target-independent and more
thorough.
llvm-svn: 90634
llvm-svn: 86423
llvm-svn: 86408
tLDRpci_pic.
llvm-svn: 86330
except it doesn't care whether the definitions' virtual registers differ. This is
  used by machine LICM and other MI passes to perform CSE.
- Teach Thumb2InstrInfo::isIdentical() to check that two t2LDRpci_pic
  instructions are identical. Since pc-relative constantpool entries are always
  distinct, this requires it to check whether the values they load are actually
  the same.
llvm-svn: 86328
llvm-svn: 86310
load of a GV from the constantpool and then add pc. It allows the code sequence
  to be rematerializable, so it can be hoisted by machine LICM.
- Add a late pass to break these pseudo instructions into a number of real
  instructions (see the sketch below). Also move the code in the Thumb2 IT pass
  that breaks up t2MOVi32imm to this pass. This is done before post-regalloc
  scheduling to allow the scheduler to properly schedule these instructions. It
  also allows them to be if-converted and shrunk by later passes.
llvm-svn: 86304
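A sketch of what such a pseudo ultimately expands to, with hypothetical labels
and registers (not from the patch): a pc-relative constant-pool load of the
global's offset followed by adding the pc, kept as a single rematerializable
unit until the late expansion pass splits it.

     ldr     r0, LCPI0_0     @ hypothetical label; load the pc-relative offset
LPC0_0: add  r0, pc          @ add the current pc to form the final address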
- This change also makes it possible to switch between ARM / Thumb on a
  per-function basis.
- Fixed the thumb2 routine which expands reg + arbitrary immediate. It was
  using ARM so_imm logic.
- Use movw and movt to do reg + imm when profitable (see the sketch below).
- Other code clean-ups and minor optimizations.
llvm-svn: 77300
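A sketch of the movw/movt form with a hypothetical immediate and registers (not
from the patch): the 32-bit constant is materialized in two halves and then
added, instead of being built from several rotated-immediate adds.

     movw    r1, #0x5678     @ hypothetical value; low 16 bits of the immediate
     movt    r1, #0x1234     @ high 16 bits
     add     r0, r0, r1      @ reg + imm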
more getOpcode calls.
llvm-svn: 77181
elimination more exactly for Thumb-2 to get better code gen.
llvm-svn: 76919
that don't allow negative offsets. During frame elimination, convert an *i12
opcode to an *i8 opcode when necessary due to a negative offset (see the sketch
below).
llvm-svn: 76883
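A small illustration with hypothetical offsets (not from the patch): the 12-bit
immediate forms only encode non-negative offsets, so a negative stack offset
discovered during frame index elimination has to fall back to the 8-bit form.

     ldr     r0, [sp, #16]   @ hypothetical offset; fits the *i12 form (0..4095)
     ldr     r1, [sp, #-8]   @ negative offset; needs the *i8 form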
Minor code duplication cleanup.
llvm-svn: 76124
llvm-svn: 75067
shared between ARM and Thumb2. Not yet activated because register information
must be generalized first.
llvm-svn: 75010
Thumb1InstrInfo, Thumb2InstrInfo, Thumb1RegisterInfo and Thumb2RegisterInfo.
Move methods from ARMInstrInfo to ARMBaseInstrInfo to prepare for sharing with
Thumb2.
llvm-svn: 74731