Found by the Clang static analyzer.
llvm-svn: 221366
llvm-svn: 221358
Some ARM FPUs only have 16 double-precision registers, rather than the
normal 32. LLVM represents this with the D16 target feature. This is
currently used by CodeGen to avoid using high registers when they are
not available, but the assembler and disassembler do not use it.
I fix this in the assembler and disassembler rather than the
InstrInfo.td files, as the latter would require a large number of
changes everywhere one of the floating-point instructions is referenced
in the backend. This solution is similar to the one used for
co-processor numbers and MSR masks.
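As an illustration (my own example, not taken from the commit), with only 16
double-precision registers available the assembler should now behave roughly as
follows:
  vadd.f64 d0, d1, d2     @ accepted: d0-d15 exist on every VFP implementation
  vadd.f64 d16, d17, d18  @ rejected: d16-d31 are not present on a D16 FPU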
llvm-svn: 221341
We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
precisely 8. Many of the reasons for doing this also apply when that alignment
is greater than 8 (the extra store is often free, and can save another stack
adjustment, though less frequently for 16-byte stack alignment).
llvm-svn: 221321
We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.
This had a couple of potential issues:
  + The .cfi directives we emit misplaced the dN registers because they were
    based on PrologEpilogInserter's calculation.
  + Unaligned stores can be less efficient.
  + Unaligned stores can actually fault (likely only an issue in niche cases,
    but possible).
This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.
llvm-svn: 221320
register class tGPRRegClass if the target is thumb1.
This commit fixes a crash that occurs during register allocation which was
triggered when a virtual register defined by an inline-asm instruction had to
be spilled.
 
rdar://problem/18740489
llvm-svn: 221178
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.
LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.
This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.
Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
llvm-svn: 221166
Reviewers: rnk
Reviewed By: rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D5978
llvm-svn: 221061
This removes calls to isMaterializable in the following cases:
* It was redundant with a call to isDeclaration now that isDeclaration returns
  the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.
llvm-svn: 221050
It appears to ignore, or find ambiguous, MachineInstrBuilder's conversion
operators that allow conversion to MachineInstr* and
MachineBasicBlock::bundle_iterator.
As a workaround, add an explicit way to get the MachineInstr.
llvm-svn: 221017
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote scalar operations to vector operations along the
way to make that possible.
** Context **
Some targets use different register files for vector and scalar operations.
This means that transitioning from one domain to the other may incur a copy from
one register file to another. These copies are not coalescable and may be
expensive. For example, according to the scheduling model, a vector-to-GPR move
on Cortex-A8 takes 20 cycles.
** Motivating Example **
Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}
As it is, this IR generates the following assembly on armv7:
  vldr d16, [r0]      @ vector load
  vmov.32 r0, d16[1]  @ cross-register-file copy: 20 cycles
  orr r0, r0, #1      @ scalar bitwise or
  str r0, [r1]        @ scalar store
  bx lr
Whereas we could generate much faster code:
  vldr d16, [r0]             @ vector load
  vorr.i32 d16, #0x1         @ vector bitwise or
  vst1.32 {d16[1]}, [r1:32]  @ vector extract + store
  bx lr
Half of the computation done in the vector domain is useless, but this allows us
to get rid of the expensive cross-register-file copy.
** Proposed Solution **
To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends with a store and
the target is able to combine an extract with a store.
Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so extracted at some point [1]. Moreover, it
is common for targets to feature stores that perform a vector extract (see
AArch64 and X86 for instance).
The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of detail, and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically, in TargetTransformInfo everything that is legal has a cost of 1,
whereas, even if a vector type is legal, a vector operation is usually slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counterbalanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.
For now, the optimization is enabled only for ARM prior to v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.
[1] We can imagine targets that can combine an extractelement with instructions
other than just stores. If we want to go in that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.
Differential Revision: http://reviews.llvm.org/D5921
<rdar://problem/14170854>
llvm-svn: 220978
Currently, the ARM backend will select the VMAXNM and VMINNM instructions for
these C expressions:
  (a < b) ? a : b
  (a > b) ? a : b
but not these expressions:
  (a > b) ? b : a
  (a < b) ? b : a
This patch allows all of these expressions to be matched.
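For example (illustrative only, and assuming NaNs can be ignored), for
single-precision operands the expression (a > b) ? b : a can now lower to:
  vminnm.f32 s0, s0, s1   @ a in s0, b in s1; result is the smaller value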
llvm-svn: 220671
This updates the check for a double-precision zero floating-point constant to
allow use of an instruction with an immediate value rather than a temporary
register.
Currently "a == 0.0", where "a" is of type "double", generates:
vmov.i32        d16, #0x0
vcmpe.f64       d0, d16
With this change it becomes:
vcmpe.f64        d0, #0
Patch by Sergey Dmitrouk.
llvm-svn: 220486
Currently, the ARM disassembler will disassemble the Thumb2 memory hint
instructions (PLD, PLDW and PLI), even for targets which do not have
these instructions. This patch adds the required checks to the
disassembler.
llvm-svn: 220472
This commit enables using movt/movw to load the stack guard address:
movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
ldr r0, [pc, r0]
Previously a pc-relative load was emitted:
ldr r0, LCPI0_0
ldr r0, [pc, r0]
rdar://problem/18740489
llvm-svn: 220470
variants in thumb mode
llvm-svn: 220379
The 32-bit variants of the NEON scalar<->GPR move instructions are
also available in VFPv2. The 8- and 16-bit variants do require NEON.
Note that the checks in the test file are all -DAG because they are
checking a mixture of stdout and stderr, and the ordering is not
guaranteed.
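For illustration (my own example, not from the commit), the distinction is
roughly:
  vmov r0, s0        @ 32-bit scalar<->GPR move: available from VFPv2 onwards
  vmov.u8 r0, d0[1]  @ 8-bit lane move: still requires NEON
  vmov.u16 r0, d0[2] @ 16-bit lane move: still requires NEON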
llvm-svn: 220288
The Thumb2 LDRS?[BH] instructions are not valid when the destination
register is the PC (these encodings are used for preload hints).
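For example (my own illustration), the same encoding with the PC in the
destination register field is a preload hint rather than a load:
  ldrsb.w r0, [r1, #4]  @ a normal signed-byte load
  pli [r1, #4]          @ what the encoding means when the Rt field is the PC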
llvm-svn: 220278
The previous code had a few problems, motivating the choices here.
1. It could create instructions clobbering CPSR, but the incoming MachineInstr
   didn't reflect this. A potential source of corruption. This is why the patch
   has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
   ARMCC::AL, but this would have caused massive problems if it was actually
   invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
   probably be minimised anyway, but the code needs to deal with them properly
   regardless.
4. It had some rather dubious ad-hoc code to avoid calling
   emitThumbRegPlusImmediate, a function which should be designed to do precisely
   this job.
We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.
llvm-svn: 220236
These instructions are related to the v7[AR] exception model, and are
not defined on v7M.
llvm-svn: 220204
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.
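As a sketch of the intended semantics (my own illustration, not part of the
commit message):
  smulwb r0, r1, r2   @ r0 = (r1 * sext(r2[15:0])) >> 16, i.e. the top 32 bits
                      @ of the 48-bit product, so the overflow bits survive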
To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.
This fixes http://llvm.org/bugs/show_bug.cgi?id=19396
llvm-svn: 220196
This function can, for some offsets from the SP, split one instruction
into two. Since it re-uses the original instruction as the first
instruction of the result, we need to ensure its result register is not
marked as dead before we use it in the second instruction.
llvm-svn: 220194
llvm-svn: 220155
The bug is in ARMConstantIslands::createNewWater where the upper bound of the
new water split point is computed:
// This could point off the end of the block if we've already got constant
// pool entries following this block; only the last one is in the water list.
// Back past any possible branches (allow for a conditional and a maximally
// long unconditional).
if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
  BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
  DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
}
The split point is supposed to be somewhere between the machine instruction that
loads from the constant pool entry and the end of the basic block, before branch
instructions. The code above is fine if the basic block is large enough and
there are a sufficient number of instructions following the machine instruction.
However, if the machine instruction is near the end of the basic block,
BaseInsertOffset can point to the machine instruction or another instruction
that precedes it, and this can lead to convergence failure.
This commit fixes this bug by ensuring BaseInsertOffset is larger than the
offset of the instruction following the constant-loading instruction.
rdar://problem/18581150
llvm-svn: 220015
llvm-svn: 219799
Early attempts to support AAPCS bare metal MachO targets based the decision on
the CPU being compiled for. This was not a particularly great idea and we've
got a better option now, but this check remained.
No functional change for any target we care about.
llvm-svn: 219767
Thumb1 has legitimate reasons for preferring 32-bit alignment of types
i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be
a multiple of 4. However, this is a trade-off between code size and RAM usage;
the DataLayout string is not the best place to represent it even if desired.
So this patch removes the extra Thumb requirements, hopefully making ARM and
Thumb completely compatible in this respect.
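For reference (my own example), the 16-bit encoding constraint mentioned above
looks like this:
  add r0, sp, #8    @ encodable in 16 bits: the offset is a multiple of 4
  add r0, sp, #6    @ not encodable in the 16-bit form, which scales an 8-bit
                    @ immediate by 4; Thumb1 needs extra instructions instead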
llvm-svn: 219734
There's no hard requirement on LLVM to align local variables to 32 bits, so the
Thumb1 frame handling needs to be able to deal with variables that are only
naturally aligned without falling over.
llvm-svn: 219733
Before, ARM and Thumb mode code had different preferred alignments, which could
lead to some rather unexpected results. There's justification for reducing it
from the default 64 bits (wasted space), but I don't think there is for going
below 32 bits.
There's no actual ABI change here, just to reassure people.
llvm-svn: 219719
indirecting through the TargetMachine.
llvm-svn: 219674
transitively from the DFAPacketizer via TargetInstrInfo.h.
llvm-svn: 219652
Patch by Matthew Wahab.
llvm-svn: 219606
On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member
variables private and clean up uses to go through the existing accessors.
NFC.
llvm-svn: 219573
long gone.
NFC.
llvm-svn: 219433
These methods are already used in lots of places. This makes things more
consistent. NFC.
llvm-svn: 219386
Patch by Charlie Turner.
llvm-svn: 219301
This must be enforced for all v6M cores, not just the Cortex-M0, regardless
of the user-specified alignment.
Patch by Charlie Turner.
llvm-svn: 219300
This switch can be reduced to a simpler if/else statement.
Patch by Charlie Turner.
llvm-svn: 219299
calls to getTargetLowering() with the cached variable.
llvm-svn: 219284
llvm-svn: 219172
llvm-svn: 219128
This instruction form is handled by different AsmOperands now, so the code is
completely dead (and wrong anyway).
llvm-svn: 219127
These will make it easier to test further changes to the
code generation and optimization pipelines as those are
moved to subtargets initialized with target feature and
target cpu.
llvm-svn: 219106
llvm-svn: 218999
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analysis indicated that the
problem no longer exists, so I'm reverting this change.
See PR18996.
llvm-svn: 218981
llvm-svn: 218930
pass it down in the constructor.
llvm-svn: 218929
As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last-ditch effort, copying is
better than crashing.
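The copy itself would look something like this (illustrative only):
  mrs r4, apsr          @ save the flags into a spare GPR
  @ ... flag-clobbering code here ...
  msr apsr_nzcvq, r4    @ restore the flags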
rdar://problem/18011155
llvm-svn: 218789
Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions
when targeting ARMv8, but they are actually present on any target with
FP-ARMv8. Note that FP-ARMv8 is called FPv5 when it is part of an
M-profile core, but they have the same instructions so we model them
both as FPARMv8 in the ARM backend.
llvm-svn: 218763
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target feature.
llvm-svn: 218747