| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Patch by Evzen Muller!
llvm-svn: 114991
|
| |
|
|
|
|
| |
accesses for ARM targets that would otherwise allow it. Radar 8465431.
llvm-svn: 114941
|
| |
|
|
| |
llvm-svn: 114746
|
| |
|
|
|
|
|
|
| |
new VariantKind to the MCSymbolExpr seems like overkill, but I'm not sure
there's a more straightforward way to get the printing difference captured.
(i.e., x86 uses @PLT, ARM uses (PLT)).
llvm-svn: 114613
|
| |
|
|
|
|
|
|
| |
CombineTo to avoid putting the result on the worklist. I don't think it makes
much difference for now, but it might help someday as we add more DAG
combine optimizations.
llvm-svn: 114595
|
| |
|
|
|
|
|
|
| |
of those. Refactor to share code for handling BUILD_VECTOR(VMOVRRD).
I don't have a testcase that exercises this, but it seems like an obvious
good thing to do.
llvm-svn: 114589
|
| |
|
|
|
|
|
|
| |
this makes irrelevant, but add a new test for the new, improved functionality.
llvm-svn: 114494
|
| |
|
|
| |
llvm-svn: 114463
|
| |
|
|
|
|
| |
and store intrinsics are represented with MemIntrinsicSDNodes.
llvm-svn: 114454
|
| |
|
|
| |
llvm-svn: 114410
|
| |
|
|
|
|
|
| |
instead of srcvalue/offset pairs. This corrects SV info for mem
operations whose size is > 32-bits.
llvm-svn: 114401
|
| |
|
|
|
|
|
|
| |
value should be in GPRs when it's going to be used as a scalar, and we use
VMOVRRD to make that happen, but if the value is converted back to a vector
we need to fold to a simple bit_convert. Radar 8407927.
llvm-svn: 114233
|
| |
|
|
|
|
| |
used for fast-isel.
llvm-svn: 113652
|
| |
|
|
|
|
|
|
|
|
|
| |
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. The if-converter should not treat them as non-micro-coded
simple instructions.
llvm-svn: 113570
|
| |
|
|
| |
llvm-svn: 113338
|
| |
|
|
|
|
|
|
| |
vabd intrinsic and add and/or zext operations. In the case of vaba, this
also avoids the need for a DAG combine pattern to combine vabd with add.
Update tests. Auto-upgrade the old intrinsics.
llvm-svn: 112941
|
| |
|
|
|
|
|
| |
add, and subtract operations with zero-extended or sign-extended vectors.
Update tests. Add auto-upgrade support for the old intrinsics.
llvm-svn: 112773
|
| |
|
|
|
|
| |
it sets the CPSR register.
llvm-svn: 112393
|
| |
|
|
|
|
|
|
|
| |
comparison that would overflow.
- The other under/overflow cases can't actually happen because the immediates
which would trigger them are legal (so we don't enter this code), but
adjusted the style to make it clear the transform is always valid.
llvm-svn: 112053
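The start of this commit message is missing from the listing, so the following is only a guess at the kind of adjustment it describes: rewriting a signed test `x < C` as `x <= C - 1` so that a different immediate can be used. A minimal C sketch, with all function names hypothetical, of why such a rewrite has to be guarded against overflow:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical guard: (x < c) can be rewritten as (x <= c - 1)
   only when c - 1 does not overflow, i.e. when c is not INT_MIN. */
bool can_rewrite_lt_as_le(int c) {
    return c != INT_MIN;
}

/* Reference form of the comparison. */
bool test_lt(int x, int c) {
    return x < c;
}

/* Adjusted form: uses the rewritten immediate when that is safe,
   and falls back to the original comparison otherwise. */
bool test_lt_adjusted(int x, int c) {
    if (can_rewrite_lt_as_le(c))
        return x <= c - 1;
    return x < c;
}
```

With the guard in place the two forms agree for every input, which is the sense in which a transform of this kind can be called always valid.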
|
| |
|
|
|
|
| |
zero-extend operations.
llvm-svn: 111614
|
| |
|
|
|
|
| |
Testcase from Nick Lewycky.
llvm-svn: 111341
|
| |
|
|
| |
llvm-svn: 111226
|
| |
|
|
| |
llvm-svn: 111208
|
| |
|
|
| |
llvm-svn: 111050
|
| |
|
|
| |
llvm-svn: 110810
|
| |
  float t1(int argc) {
    return (argc == 1123) ? 1.234f : 2.38213f;
  }
We would generate truly awful code on ARM (those with a weak stomach should look
away):
_t1:
        movw    r1, #1123
        movs    r2, #1
        movs    r3, #0
        cmp     r0, r1
        mov.w   r0, #0
        it      eq
        moveq   r0, r2
        movs    r1, #4
        cmp     r0, #0
        it      ne
        movne   r3, r1
        adr     r0, #LCPI1_0
        ldr     r0, [r0, r3]
        bx      lr
The problem was that legalization was creating a cascade of SELECT_CC nodes: one
for the comparison of "argc == 1123", which was fed into a SELECT node for the ?:
expression, which was itself converted to a SELECT_CC node. This is because the
ARM back-end doesn't have custom lowering for SELECT nodes, so it used the
default "Expand".
I added a fairly simple "LowerSELECT" to the ARM back-end. It takes care of this
testcase, but can obviously be expanded to include more cases.
Now we generate this, which looks optimal to me:
_t1:
        movw    r1, #1123
        movs    r2, #0
        cmp     r0, r1
        adr     r0, #LCPI0_0
        it      eq
        moveq   r2, #4
        ldr     r0, [r0, r2]
        bx      lr
        .align  2
LCPI0_0:
        .long   1075344593      @ float 2.382130e+00
        .long   1067316150      @ float 1.234000e+00
llvm-svn: 110799
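For readers who prefer C to Thumb-2, the improved assembly above can be read as a single compare that picks an index into a two-entry constant table, followed by one load. The sketch below is only an illustration of that reading; `t1_pool` and `t1_table` are hypothetical names standing in for the constant pool `LCPI0_0` and the generated code.

```c
/* Hypothetical stand-in for the constant pool LCPI0_0 in the listing:
   entry 0 is the "false" value, entry 1 the "true" value. */
static const float t1_pool[2] = { 2.38213f, 1.234f };

/* The original source: a comparison feeding a select. */
float t1(int argc) {
    return (argc == 1123) ? 1.234f : 2.38213f;
}

/* What the improved assembly amounts to. */
float t1_table(int argc) {
    /* movs r2, #0 ; cmp r0, r1 ; it eq ; moveq r2, #4
       (byte offset 0 or 4 in the listing, element index 0 or 1 here) */
    unsigned index = (argc == 1123) ? 1u : 0u;
    /* ldr r0, [r0, r2] */
    return t1_pool[index];
}
```

Both functions return the same value for every input; the second simply makes the compare-then-index structure explicit.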
|
| |
|
|
|
|
|
|
|
| |
memory and synchronization barrier dmb and dsb instructions.
- Change instruction names to something more sensible (matching name of actual
instructions).
- Added tests for memory barrier codegen.
llvm-svn: 110785
|
| |
|
|
| |
llvm-svn: 110710
|
| |
|
|
|
|
|
|
| |
function stack frame has a var-sized object.
Also added a test case to check for the added benefit of this patch: it's optimizing away the unnecessary restore of sp from fp for some non-leaf functions.
llvm-svn: 110707
|
| |
|
|
|
|
| |
register is", it breaks a couple of test-suite tests.
llvm-svn: 110701
|
| |
|
|
|
|
|
|
|
|
| |
reserved, not available for general allocation. This eliminates all the
extra checks for Darwin.
This change also fixes the use of FP to access frame indices in leaf
functions and cleaned up some confusing code in epilogue emission.
llvm-svn: 110655
|
| |
|
|
|
|
| |
seem to be working correctly. No functional change.
llvm-svn: 110226
|
| |
|
|
|
|
| |
(absolute difference with accumulate) intrinsics. Radar 8228576.
llvm-svn: 110170
|
| |
|
|
|
|
|
|
|
| |
VFP is enabled.
Add support for using the FPSCR in conjunction with the vcvtr instruction, for controlling fp to int rounding.
Add support for the FLT_ROUNDS_ node now that the FPSCR is exposed.
llvm-svn: 110152
|
| |
|
|
|
|
| |
transformations.
llvm-svn: 109800
|
| |
|
|
|
|
|
|
|
| |
integers with mov + vdup. 8003375. This is
currently disabled by default because LICM will
not hoist a VDUP, so it pessimizes the code if
the construct occurs inside a loop (8248029).
llvm-svn: 109799
|
| |
|
|
| |
llvm-svn: 109359
|
| |
|
|
|
|
|
| |
function live in set. This will give us tGPR for Thumb1 and GPR otherwise,
so the copy will be spillable. rdar://8224931
llvm-svn: 109293
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
it's too late to start backing off aggressive latency scheduling when most
of the registers are in use so the threshold should be a bit tighter.
- Correctly handle live-outs and extract_subreg etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
For ARM, this is almost always a win on # of instructions. It's runtime
neutral for most of the tests. But for some kernels with high register
pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
54 and sped up by 20%.
llvm-svn: 109279
|
| |
|
|
| |
llvm-svn: 109091
|
| |
|
|
| |
llvm-svn: 109064
|
| |
|
|
| |
llvm-svn: 109047
|
| |
|
|
| |
llvm-svn: 109009
|
| |
|
|
| |
llvm-svn: 108991
|
| |
|
|
| |
llvm-svn: 108841
|
| |
|
|
|
|
| |
its scalar floating point registers alias its vector registers.
llvm-svn: 108761
|
| |
|
|
|
|
|
|
| |
it should set the jump table encoding to EK_Inline. This prevents
a second, unused, copy of the table from being emitted after the function
body. PR6581.
llvm-svn: 108730
|
| |
|
|
| |
llvm-svn: 108727
|
| |
|
|
|
|
|
|
| |
it should set the jump table encoding to EK_Inline. This prevents
a second, unused, copy of the table from being emitted after the function
body. PR7499.
llvm-svn: 108722
|
| |
|
|
|
|
|
| |
instruction for non-constant operands. This includes the case referenced
in the README.txt regarding a bitfield copy.
llvm-svn: 108608
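The instruction name is cut off in this listing, but the "bitfield copy" the message mentions is a familiar source-level pattern: take a run of bits from one word and place it into another, leaving the remaining bits untouched. A minimal C sketch of that pattern, where `copy_bitfield` is a hypothetical helper and not code from the patch:

```c
#include <stdint.h>

/* Hypothetical helper: copy `width` bits starting at bit `lsb` from src
   into dst, leaving all other bits of dst unchanged.
   Assumes lsb < 32 and lsb + width <= 32. */
uint32_t copy_bitfield(uint32_t dst, uint32_t src,
                       unsigned lsb, unsigned width) {
    /* Build a mask of `width` ones, avoiding an undefined 32-bit shift. */
    uint32_t field = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t mask = field << lsb;
    /* Clear the field in dst, then merge in the same field from src. */
    return (dst & ~mask) | (src & mask);
}
```

For example, `copy_bitfield(0xFFFF0000u, 0x00000AB0u, 4, 8)` yields `0xFFFF0AB0`. The mask-and-merge expression is the shape a compiler would need to recognize in order to emit a single insert-style instruction.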
|