Commit message log

This patch corrects the definition of the umlal/smlal instructions and adds
support for matching them in the ARM DAG combiner.
Bug 12213
Patch by Yin Ma!
llvm-svn: 163136
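
Below is a minimal C sketch (not from the commit; the function name is illustrative) of the widening multiply-accumulate that the combiner can now match to umlal:

```c
#include <stdint.h>

/* A 32x32->64 multiply accumulated into a 64-bit value; on ARM this
   whole expression can be selected as a single umlal instruction. */
uint64_t mac64(uint64_t acc, uint32_t a, uint32_t b) {
    return acc + (uint64_t)a * b;
}
```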

For example, the ARM target does not have efficient ISel handling for vector
selects with scalar conditions. This patch adds a TLI hook which allows the
different targets to report which selects are supported well and which selects
should be converted to control flow (CF) during CodeGenPrepare.
llvm-svn: 163093
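
As a hedged illustration (using Clang/GCC vector extensions; the names are made up, not from the patch), this is the shape of select the hook is about — a scalar condition choosing between whole vectors, which can be cheaper as a branch:

```c
typedef int v4i32 __attribute__((vector_size(16)));

/* The condition is a single scalar but the result is a whole vector;
   on targets without a cheap vector select, turning this into a
   branch over the two values is the better lowering. */
v4i32 pick(int cond, v4i32 a, v4i32 b) {
    return cond ? a : b;
}
```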

These nodes are no longer needed because the peephole pass can fold
CMOV+AND into ANDCC etc.
llvm-svn: 162179

architecture
It broke MultiSource/Applications/JM/ldecod/ldecod on armv7 thumb O0 g and armv7
thumb O3.
llvm-svn: 161736

getSimpleVT can be removed.
llvm-svn: 161735

This patch corrects the definition of the umlal/smlal instructions and adds
support for matching them in the ARM DAG combiner.
Bug 12213
Patch by Yin Ma!
llvm-svn: 161581

Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>
llvm-svn: 161232
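
An illustrative C sketch (not from the patch): a fixed-size memcpy is one example of the kind of builtin call in question; in a freestanding environment there is no library fallback, so fast isel must consult TargetLibraryInfo before emitting a call:

```c
#include <string.h>

/* A small fixed-size memcpy is normally a builtin the compiler may
   expand inline; with no library memcpy available, emitting a call
   instead would be a correctness bug, not just a missed optimization. */
void copy16(char *dst, const char *src) {
    memcpy(dst, src, 16);
}
```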

We turned off the CMN instruction because we were not modeling its semantics
correctly. If we are comparing with an immediate, then it is okay to use
the CMN instruction.
<rdar://problem/7569620>
llvm-svn: 158302
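
An illustrative C sketch (not from the commit): comparing against a negative immediate is where CMN helps, since the negated constant fits an immediate encoding:

```c
/* cmp r0, #-1 has no immediate encoding on ARM, but the equivalent
   cmn r0, #1 (compare negative) does. */
int is_minus_one(int x) {
    return x == -1;
}
```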

Factor out the expansion code into a function.
This change is to be enabled in clang.
rdar://9877866
llvm-svn: 157830

We handle struct byval by inserting a pseudo op, which will be expanded to a
loop at ExpandISelPseudos.
A separate patch for clang will be submitted to enable struct byval.
rdar://9877866
llvm-svn: 157793
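
A minimal C sketch (illustrative, not from the patch) of a byval argument; the call-site copy of the aggregate is what the pseudo op's loop expansion materializes:

```c
struct big { int data[32]; };

/* Passing the struct by value requires copying all 128 bytes at the
   call site; expanding that copy as a loop keeps code size down
   compared to a long straight-line run of load/store pairs. */
static int sum_first(struct big b) {
    return b.data[0] + b.data[1];
}

int caller(struct big *p) {
    return sum_first(*p);  /* byval copy happens here */
}
```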

to pass around a struct instead of a large set of individual values. This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.
NV_CONTRIB
llvm-svn: 157479

This moves the logic for selecting a TLS model to a single place,
instead of the previous three (ARM, Mips, and X86; X86 already
used this function).
llvm-svn: 156162
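
A short C sketch (illustrative) of a thread-local variable; the model chosen for accessing it (general-dynamic, local-dynamic, initial-exec, or local-exec) is what the now-shared function decides:

```c
/* The TLS access model for this variable depends on the relocation
   model and symbol visibility; that decision now lives in one place
   instead of being duplicated across backends. */
static __thread int counter;

int bump(void) {
    return ++counter;
}
```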

legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store),
use that as the tail call input chain.
PR12419
rdar://9770785
rdar://11195178
llvm-svn: 154370
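
An illustrative C sketch (not from the commit) of a libcall in tail position: on 32-bit ARM the 64-bit division becomes a runtime call (e.g. an __aeabi_* helper), and because it produces the return value it can be emitted as a tail call — the case whose input chain this fixes:

```c
/* On 32-bit ARM, 64-bit division is lowered to a libcall; in return
   position the libcall is a tail-call candidate, which is exactly
   when the correct input chain matters. */
long long div64(long long a, long long b) {
    return a / b;
}
```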

This allows us to keep passing reduced masks to SimplifyDemandedBits, but
to know about all the bits if SimplifyDemandedBits fails, which lets
instcombine simplify cases like the one in the included testcase.
llvm-svn: 154011

llvm-svn: 152978

register allocation by allowing all 32 D-registers to be used. Patch by Cameron
Zwarich.
llvm-svn: 152824

direct call.
llvm-svn: 151645

prediction. ...", it is breaking the Clang build during the Compiler-RT part.
llvm-svn: 151630

the processor keeps a return address stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.
Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt the RAS and cause 100% return misprediction, so LLVM should use an
unconditional branch instead, i.e.
    mov lr, pc
    b _foo
The "mov lr, pc" is issued in order to get a proper backtrace.
rdar://8979299
llvm-svn: 151623
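
An illustrative C sketch (not from the commit) of a noreturn call site, where bl would push a return address that can never be matched by a real return:

```c
#include <stdlib.h>

/* abort() never returns, so the address after the call will never be
   popped by a return; branching to it (with lr set up only for
   backtraces) keeps the return address stack consistent. */
void fail(void) {
    abort();
}
```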

value is zero. Instead of a cmov + op, issue a conditional op instead, e.g.
    cmp   r9, r4
    mov   r4, #0
    moveq r4, #1
    orr   lr, lr, r4
should be:
    cmp   r9, r4
    orreq lr, lr, #1
That is, optimize (or x, (cmov 0, y, cond)) to (or.cond x, y). Similarly, extend
this to xor, as well as (and x, (cmov -1, y, cond)) => (and.cond x, y).
It's possible to extend this to ADD and SUB, but I don't think they are common.
rdar://8659097
llvm-svn: 151224
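
A small C sketch (illustrative) of source that produces the (or x, (cmov 0, y, cond)) pattern now folded into one predicated instruction:

```c
/* The select of 0/1 feeding the OR can be folded into a single
   predicated orreq on ARM instead of a mov/moveq/orr sequence. */
unsigned mark_equal(unsigned flags, unsigned a, unsigned b) {
    return flags | (a == b ? 1u : 0u);
}
```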

to static data that should not be modified.
llvm-svn: 151134

The EmitBasePointerRecalculation function has 2 problems, one minor and one
fatal. The minor problem is that it inserts the code at the setjmp
instead of in the dispatch block. The fatal problem is that at the point
where this code runs, we don't know whether there will be a base pointer,
so the entire function is a no-op. The base pointer recalculation needs to
be handled as it was before, by inserting a pseudo instruction that gets
expanded late.
Most of the support for the old approach is still here, but it no longer
has any connection to the eh_sjlj_dispatchsetup intrinsic. Clean up the
parts related to the intrinsic and just generate the pseudo instruction
directly.
llvm-svn: 144781

integer variants. rdar://10437054
llvm-svn: 144608

llvm-svn: 143594

alignment permits.
llvm-svn: 143582

it with the new SjLj emitter stuff. This way there's no need to emit that
kind-of-hacky intrinsic.
llvm-svn: 141419

functionality change.
llvm-svn: 141323

This code will replace the version in ARMAsmPrinter.cpp. It creates a new
machine basic block, which is the dispatch for the return from a longjmp
call. It then shoves the address of that machine basic block into the correct
place in the function context so that the EH runtime will jump to it directly
instead of having to go through a compare-and-jump-to-the-dispatch bit. This
should be more efficient in the common case.
llvm-svn: 141031

Encode the immediate into its 8-bit form as part of isel rather than later,
which simplifies things for mapping the encoding bits, allows the removal
of the custom disassembler decoding hook, makes the operand printer trivial,
and prepares things more cleanly for handling these in the asm parser.
rdar://10211428
llvm-svn: 140834

with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
llvm-svn: 139159
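
An illustrative C sketch (GCC/Clang vector extensions; not from the patch) of a vector comparison feeding a blend — the shape that now flows through vector SETCC and VSELECT nodes:

```c
typedef int v4i32 __attribute__((vector_size(16)));

/* The element-wise comparison yields a per-lane all-ones/zero mask
   (a vector SETCC); blending a and b with it is the select. */
v4i32 max4(v4i32 a, v4i32 b) {
    v4i32 mask = a > b;
    return (a & mask) | (b & ~mask);
}
```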

llvm-svn: 138868

llvm-svn: 138845

Add an instruction flag, hasPostISelHook, which tells the pre-RA scheduler to
call a target hook to adjust the instruction. For ARM, this is used to
adjust instructions which may be setting the 's' flag. ADC, SBC, RSB, and RSC
instructions have an implicit def of CPSR (required since it now uses a CPSR
physical register dependency rather than "glue"). If the carry flag is used,
then the target hook will *fill in* the optional operand with CPSR. Otherwise,
the hook will remove the CPSR implicit def from the MachineInstr.
llvm-svn: 138810
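
An illustrative C sketch (not from the commit): 64-bit addition on 32-bit ARM is the canonical producer/consumer of the carry flag, lowering to an adds/adc pair whose CPSR def/use the post-isel hook adjusts:

```c
#include <stdint.h>

/* Lowers on 32-bit ARM to adds (defines the carry in CPSR) followed
   by adc (uses it); with glue gone, the CPSR dependency between the
   two must be expressed as a physical register def/use. */
uint64_t add64(uint64_t a, uint64_t b) {
    return a + b;
}
```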

register dependency (rather than glue them together). This is general
goodness as it gives the scheduler more freedom. However, it is motivated by
a nasty bug in isel.
When an i64 sub is expanded to subc + sube:
  libcall #1
     \
      \        subc
       \       /  \
        \     /    \
         \   /    libcall #2
          sube
If the libcalls are not serialized (i.e. both have chains which are dag
entry), the legalizer can serialize them in arbitrary orders. If it's
unlucky, it can force libcall #2 before libcall #1 in the above case:
  subc
   |
  libcall #2
   |
  libcall #1
   |
  sube
However, since subc and sube are "glued" together, this ends up being a
cycle when the scheduler combines subc and sube as a single scheduling
unit.
The right solution is to fix LegalizeTypes to chain the libcalls together.
However, LegalizeTypes does not process nodes in order, so that's harder than
it should be. For now, the move to physical register dependency will do.
rdar://10019576
llvm-svn: 138791
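
An illustrative C sketch (not from the commit) matching the shape of the diagram above: an i64 subtraction whose operands come from two libcalls, expanding to the subc/sube pair that must not be scheduled into a cycle:

```c
/* On 32-bit ARM each 64-bit division is a libcall, and the 64-bit
   subtraction expands to subc + sube; serializing the two libcalls in
   the wrong order is what created the scheduling cycle. */
long long f(long long a, long long b) {
    return (a / 3) - (b / 5);
}
```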

llvm-svn: 135375

if (x != 0) x = 1
if (x == 1) x = 1
Previous codegen looks like this:
        mov     r1, r0
        cmp     r1, #1
        mov     r0, #0
        moveq   r0, #1
The naive lowering selects between two different values. It should recognize
that the test is an equality test, so it's really a conditional move rather
than a select:
        cmp     r0, #1
        movne   r0, #0
rdar://9758317
llvm-svn: 135017

Part of rdar://9643582
llvm-svn: 134095

Part of rdar://9119939
llvm-svn: 132510

functionality change.
llvm-svn: 131012

model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.
llvm-svn: 130743

rdar://9326019
llvm-svn: 130234

Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.
Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.
Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).
llvm-svn: 130048

llvm-svn: 130046

<rdar://problem/7662569>
llvm-svn: 129858

can be recognized. This fixes <rdar://problem/9183078>.
llvm-svn: 128584

predecessors; update the dominator tree if the CFG is modified.
llvm-svn: 127981

to canonicalize IR", it broke a lot of things.
llvm-svn: 127954

to have a single return block (at least getting there) for optimizations. This
is general goodness, but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}
=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret
This patch teaches CodeGenPrepare to duplicate returns when the return value
is a phi and the phi operands are produced by tail calls followed by
an unconditional branch:
sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0
This allows codegen to generate better code like this:
LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL
rdar://9147433
llvm-svn: 127953

llvm-svn: 127694

can. As Nate pointed out, VTBL isn't super performant, but it *has* to be
better than this:
_shuf:
@ BB#0:       @ %entry
  push        {r4, r7, lr}
  add         r7, sp, #4
  sub         sp, #12
  mov         r4, sp
  bic         r4, r4, #7
  mov         sp, r4
  mov         r2, sp
  vmov        d16, r0, r1
  orr         r0, r2, #6
  orr         r3, r2, #7
  vst1.8      {d16[0]}, [r3]
  vst1.8      {d16[5]}, [r0]
  subs        r4, r7, #4
  orr         r0, r2, #5
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #4
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #3
  vst1.8      {d16[0]}, [r0]
  orr         r0, r2, #2
  vst1.8      {d16[2]}, [r0]
  orr         r0, r2, #1
  vst1.8      {d16[1]}, [r0]
  vst1.8      {d16[3]}, [r2]
  vldr.64     d16, [sp]
  vmov        r0, r1, d16
  mov         sp, r4
  pop         {r4, r7, pc}
The "illegal" testcase in vext.ll is no longer illegal.
<rdar://problem/9078775>
llvm-svn: 127630
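
An illustrative C sketch (using Clang's __builtin_shufflevector; the lane order is made up) of an arbitrary byte permutation — the kind of shuffle that can now be lowered to a single vtbl instead of the store/reload sequence above:

```c
typedef unsigned char v8u8 __attribute__((vector_size(8)));

/* An arbitrary 8-lane byte permutation; NEON vtbl applies the whole
   index table in one instruction. */
v8u8 shuf(v8u8 v) {
    return __builtin_shufflevector(v, v, 0, 1, 2, 0, 4, 4, 5, 3);
}
```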