SSE2 has efficient support for shifts by a scalar. My previous change, which
made shifts expensive, did not take this into account and marked all shifts as
expensive. This would prevent vectorization from happening where it is actually
beneficial. With this change we differentiate between shifts by a constant and
other shifts.
radar://13576547
llvm-svn: 178808
On certain architectures we can support efficient vectorized versions of
instructions if the operand value is uniform (splat) or a constant scalar.
An example of this is a vector shift on x86. We can efficiently support

  for (i = 0; i < n; i += 4)
    w[0:3] = v[0:3] << <2, 2, 2, 2>

but not

  for (i = 0; i < n; i += 4)
    w[0:3] = v[0:3] << x[0:3]

This patch adds a parameter to getArithmeticInstrCost to further qualify operand
values as uniform or uniform constant.
Targets can then choose to return a different cost for instructions with such
operand values.
A follow-up commit will test this feature on x86.
radar://13576547
llvm-svn: 178807
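
As a rough illustration of the interface change described above, here is a
minimal, self-contained C++ sketch. The enum and function names follow the
spirit of the change but are stand-ins, not the exact TargetTransformInfo API:

  // Sketch only: models passing an operand-kind hint to the cost query so a
  // target can price a vector shift by a uniform constant lower than a
  // general vector-by-vector shift. All names here are illustrative.
  #include <cstdio>

  enum OperandValueKind { OK_AnyValue, OK_UniformValue, OK_UniformConstantValue };

  // Stand-in for a target hook like getArithmeticInstrCost() deciding the
  // cost of a <4 x i32> shift based on what is known about the shift amount.
  unsigned getVectorShiftCost(OperandValueKind ShiftAmountKind) {
    if (ShiftAmountKind == OK_UniformConstantValue ||
        ShiftAmountKind == OK_UniformValue)
      return 1;  // a single SSE2 shift-by-scalar instruction
    return 10;   // roughly the cost of scalarizing the shift
  }

  int main() {
    std::printf("w[0:3] = v[0:3] << <2,2,2,2> : cost %u\n",
                getVectorShiftCost(OK_UniformConstantValue));
    std::printf("w[0:3] = v[0:3] << x[0:3]    : cost %u\n",
                getVectorShiftCost(OK_AnyValue));
    return 0;
  }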
BCL is normally a conditional branch-and-link instruction, but has
an unconditional form (which is used in the SjLj code, for example).
To make clear that this BCL instruction definition is specifically
the special unconditional form (which does not meaningfully take
a condition-register input), rename it to BCLalways.
No functionality change intended.
llvm-svn: 178803
The DAGCombine logic that recognized a/sqrt(b) and transformed it into
a multiplication by the reciprocal sqrt did not handle cases where the
sqrt and the division were separated by an fpext or fptrunc.
llvm-svn: 178801
It fixes the following tests for Hexagon:
CodeGen/Generic/2003-07-29-BadConstSbyte.ll
CodeGen/Generic/2005-10-21-longlonggtu.ll
CodeGen/Generic/2009-04-28-i128-cmp-crash.ll
CodeGen/Generic/MachineBranchProb.ll
CodeGen/Generic/builtin-expect.ll
CodeGen/Generic/pr12507.ll
llvm-svn: 178794
llvm-svn: 178783
At the time when the XCore backend was added there were some issues with
overlapping register classes, but these all seem to be fixed now.
Describing the register classes correctly allows us to get rid of a
codegen-only instruction (LDAWSP_lru6_RRegs) and means we can
disassemble ru6 instructions that use registers above r11.
llvm-svn: 178782
The Thumb2SizeReduction pass avoids false CPSR dependencies, except it
still aggressively creates tMOVi8 instructions because they are so
common.
Avoid creating false CPSR dependencies even for tMOVi8 instructions when
the CPSR flags are known to have high latency. This allows integer
computation to overlap floating-point computations.
Also process blocks in reverse post-order and propagate high-latency
flags to successors.
<rdar://problem/13468102>
llvm-svn: 178773
llvm-svn: 178763
llvm-svn: 178762
llvm-svn: 178761
This requires v9 cmov instructions using the %xcc flags instead of the
%icc flags.
Still missing:
- Select floats on %xcc flags.
- Select i64 on %fcc flags.
llvm-svn: 178737
The default logic does not correctly identify costs of casts because they are
marked as custom on x86.
In some cases, where the shift amount is a scalar, we would be able to generate
better code. Unfortunately, when this is the case the value (the splat) will get
hoisted out of the loop, thereby making it invisible to ISel.
radar://13130673
radar://13537826
llvm-svn: 178703
llvm-svn: 178675
Incorporating review feedback from Bill Schmidt on r178617. No functionality
change intended.
llvm-svn: 178672
llvm-svn: 178667
llvm-svn: 178665
Mesa no longer overrides LLVM's behavior with respect to KILLGT, so LLVM
has to handle KILLGT on its own.
llvm-svn: 178664
I discussed this with Bill Schmidt on IRC, and it was decided that this is a
safe and reasonable default.
llvm-svn: 178659
llvm-svn: 178658
llvm-svn: 178657
This patch follows up on work done by Bill Schmidt in r178277,
and replaces most of the remaining uses of VRRC in ISEL DAG patterns.
The resulting .inc files are identical except for comments, so
no change in code generation is expected.
llvm-svn: 178656
For this we need to use a libcall. Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner. A test case from the bug report is included.
llvm-svn: 178639
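
For context, a small self-contained C++ illustration of the equivalence that
makes a libcall workable here. The IR frem operation computes the same value
(and sign) as C's fmod family, so a target without a native remainder
instruction can call into libm; the snippet below only demonstrates that
equivalence, it is not the lowering code itself:

  // Illustration only: 'frem' has fmod() semantics (the remainder takes the
  // sign of the dividend), which is why a libm call is sufficient.
  #include <cmath>
  #include <cstdio>

  int main() {
    float a = 5.5f, b = 2.0f;
    std::printf("frem %.1f, %.1f -> %.1f\n", a, b, std::fmod(a, b));  // 1.5
    return 0;
  }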
llvm-svn: 178637
llvm-svn: 178634
It's a bit of churn in the blame log, but I think there are real benefits to
the newer system so I'm making the change in one go.
llvm-svn: 178633
The same compare instruction is used for 32-bit and 64-bit compares. It
sets two different sets of flags: icc and xcc.
This patch adds a conditional branch instruction using the xcc flags for
64-bit compares.
llvm-svn: 178621
These refer to the reciprocal estimate support recently committed.
llvm-svn: 178618
When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.
I've also added a couple of fnmsub patterns which turned out to be missing
(but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.
llvm-svn: 178617
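
To illustrate the Newton iteration mentioned above, here is a self-contained
C++ sketch: start from a low-precision estimate (a crude fixed seed stands in
for the fre/frsqrte result) and iterate to recover accuracy. The seeds and
iteration counts are for illustration only, not the exact sequence the PPC
backend emits:

  #include <cmath>
  #include <cstdio>

  // One Newton step for y ~= 1/b:       y' = y * (2 - b*y)
  static double refineRecip(double b, double y) { return y * (2.0 - b * y); }

  // One Newton step for x ~= 1/sqrt(a): x' = x * (1.5 - 0.5*a*x*x)
  static double refineRsqrt(double a, double x) { return x * (1.5 - 0.5 * a * x * x); }

  int main() {
    double a = 2.0, x = 0.7;   // stand-in for the frsqrte estimate of 1/sqrt(2)
    for (int i = 0; i < 3; ++i) x = refineRsqrt(a, x);
    std::printf("1/sqrt(2) ~= %.12f (libm: %.12f)\n", x, 1.0 / std::sqrt(a));

    double b = 3.0, y = 0.3;   // stand-in for the fre estimate of 1/3
    for (int i = 0; i < 3; ++i) y = refineRecip(b, y);
    std::printf("1/3       ~= %.12f (libm: %.12f)\n", y, 1.0 / b);
    return 0;
  }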
llvm-svn: 178589
This patch initializes t9 to the handler address, but only if the relocation
model is pic. This handles the case where the handler to which eh.return jumps
points to the start of the function.
Patch by Sasa Stankovic.
llvm-svn: 178588
This patch fixes the following two tests which have been failing on
llvm-mips-linux builder since r178403:
LLVM :: Analysis/Profiling/load-branch-weights-ifs.ll
LLVM :: Analysis/Profiling/load-branch-weights-loops.ll
llvm-svn: 178584
qualifiers.
This patch only adds support for parsing these identifiers in the
X86AsmParser. The front-end interface isn't capable of looking up
these identifiers at this point in time. The end result is that the
compiler now errors during object file emission rather than at parse
time. Test case coming shortly.
Part of rdar://13499009 and PR13340
llvm-svn: 178566
When doing a partword atomic operation, a lwarx was being paired with
a stdcx. instead of a stwcx. when compiling for a 64-bit target. The
target has nothing to do with it in this case; we always need a stwcx.
Thanks to Kai Nacke for reporting the problem.
llvm-svn: 178559
llvm-svn: 178549
llvm-svn: 178536
There are only a few new instructions; the rest is handled with patterns.
llvm-svn: 178528
SPARC v9 extends all ALU instructions to 64 bits, so we simply need to
add patterns to use them for both i32 and i64 values.
llvm-svn: 178527
The last-resort pattern produces 6 instructions, and there are still
opportunities for materializing some immediates in fewer instructions.
llvm-svn: 178526
SPARC v9 defines new 64-bit shift instructions. The 32-bit shift right
instructions are still usable as zero and sign extensions.
This adds new F3_Sr and F3_Si instruction formats that probably should
be used for the 32-bit shifts as well. They don't really encode a
simm13 field.
llvm-svn: 178525
The 'sparc' architecture produces 32-bit code while 'sparcv9' produces
64-bit code.
It is also possible to run 32-bit code using SPARC v9 instructions with:
llc -march=sparc -mattr=+v9
llvm-svn: 178524
This is far from complete, but it is enough to make it possible to write
test cases using i64 arguments.
Missing features:
- Floating point arguments.
- Receiving arguments on the stack.
- Calls.
llvm-svn: 178523
We are going to use the same registers for 32-bit and 64-bit values, but
in two different register classes. The I64Regs register class has a
larger spill size and alignment.
The addition of an i64 register class confuses TableGen's type
inference, so it is necessary to clarify the type of some immediates and
the G0 register.
In 64-bit mode, pointers are i64 and should use the I64Regs register
class. Implement getPointerRegClass() to dynamically provide the pointer
register class depending on the subtarget. Use ptr_rc and iPTR for
memory operands.
Finally, add the i64 type to the IntRegs register class. This register
class is not used to hold i64 values; I64Regs is for that. The type is
required to appease TableGen's type checking in output patterns like this:
def : Pat<(add i64:$a, i64:$b), (ADDrr $a, $b)>;
SPARC v9 uses the same ADDrr instruction for i32 and i64 additions, and
TableGen doesn't know to check the type of register sub-classes.
llvm-svn: 178522
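
A minimal, self-contained C++ mock-up of the getPointerRegClass() idea
described above (not the real SparcRegisterInfo code; the type and member
names below are simplified stand-ins): the register class used for pointer
operands is chosen per subtarget instead of being fixed.

  #include <cstdio>

  struct TargetRegisterClass { const char *Name; };
  static const TargetRegisterClass IntRegs = {"IntRegs"};  // i32 values, 32-bit pointers
  static const TargetRegisterClass I64Regs = {"I64Regs"};  // i64 values, 64-bit pointers

  struct Subtarget { bool Is64Bit; };

  // Analogue of getPointerRegClass(): pick the class for ptr_rc/iPTR operands.
  const TargetRegisterClass *getPointerRegClass(const Subtarget &ST) {
    return ST.Is64Bit ? &I64Regs : &IntRegs;
  }

  int main() {
    Subtarget V8{false}, V9{true};
    std::printf("sparc   pointers use %s\n", getPointerRegClass(V8)->Name);
    std::printf("sparcv9 pointers use %s\n", getPointerRegClass(V9)->Name);
    return 0;
  }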
Thanks to Bill Schmidt for finding this in review of r178480.
llvm-svn: 178521
Buffered means a later divide may be executed out-of-order while a
prior divide is sitting (buffered) in a reservation station.
You can tell it's not pipelined, because operations that use it
reserve it for more than one cycle:
def : WriteRes<WriteIDiv, [HWPort0, HWDivider]> {
  let Latency = 25;
  let ResourceCycles = [1, 10];
}
We don't currently distinguish between an unpipelined operation and one
that is split into multiple micro-ops requiring the same unit, except
that the latter may have NumMicroOps > 1 if it also consumes
issue/dispatch resources.
llvm-svn: 178519
llvm-svn: 178508
llvm-svn: 178505
llvm-svn: 178504
llvm-svn: 178503
llvm-svn: 178489