This reverts commit r209638 because it broke self-hosting on ppc64/Linux: the Clang-compiled TableGen would segfault because it jumped to an invalid address from within _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E, which is within the command-line parameter registration process.
llvm-svn: 209745
In PPCISelLowering.cpp, PPCTargetLowering::LowerBUILD_VECTOR() contains an optimization for certain patterns that generates one or two vector splats followed by a vector add or subtract. This operation is represented by a VADD_SPLAT node in the selection DAG. Prior to this patch, it was possible for the VADD_SPLAT to be assigned the wrong data type, causing incorrect code generation. This patch corrects the problem.
Specifically, the code previously assigned the value type of the BUILD_VECTOR node to the newly generated VADD_SPLAT node. This is correct much of the time, but not always: the call to isConstantSplat() may return a SplatBitSize that is not the same as the number of bits in the element type of the original vector. The correct type to assign is a vector type whose element bit size equals SplatBitSize.
The included test case shows an example of this, where the BUILD_VECTOR node has type v16i8. The vector to be built is {0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}. isConstantSplat detects that we can generate a splat of 16 for type v8i16, which is the type we must assign to the VADD_SPLAT node. If we do not, we generate a vspltisb of 8 and a vaddubm, which produces the incorrect result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16}. The correct code generation is a vspltish of 8 and a vadduhm.
This patch also corrects code generation for CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked as an XFAIL, so the XFAIL has been removed from that test case.
llvm-svn: 209662
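As an aside (not part of the commit message), the arithmetic behind the example above can be modeled on plain arrays; this sketch just contrasts byte-wise and halfword-wise splat-and-add for the big-endian byte pattern {0, 16, ...}:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  uint8_t bytewise[16];
  uint8_t halfwordwise[16];

  // Wrong lowering (vspltisb 8 + vaddubm): 8 + 8 in every byte lane,
  // so every byte of the result is 16.
  for (int i = 0; i < 16; ++i)
    bytewise[i] = uint8_t(8 + 8);

  // Correct lowering (vspltish 8 + vadduhm): 8 + 8 in every 16-bit lane,
  // so each halfword is 16, i.e. big-endian bytes {0, 16}.
  for (int i = 0; i < 8; ++i) {
    uint16_t h = uint16_t(8 + 8);
    halfwordwise[2 * i] = uint8_t(h >> 8);
    halfwordwise[2 * i + 1] = uint8_t(h & 0xff);
  }

  // Left column: all 16. Right column: 0, 16, 0, 16, ...
  for (int i = 0; i < 16; ++i)
    std::printf("%2u %2u\n", unsigned(bytewise[i]), unsigned(halfwordwise[i]));
  return 0;
}
```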
This seems to match what gcc does for ppc and what every other llvm backend does.
llvm-svn: 209638
llvm-svn: 209377
on PPC.
llvm-svn: 209376
This required updating the generated functions and TD file accordingly to be pointers rather than const references.
llvm-svn: 209375
a subtarget hook to enable. Unconditionally add to the pass pipeline for targets that might want to use it. No functional change.
llvm-svn: 209340
The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64 backend: we matched an immediate offset with STWX8 even though that instruction only supports a register offset. The culprit is the complex-pattern predicate SelectAddrIdx, which decides that if the offset is not an ISD::Constant it must be a register.
Many thanks to Bill Schmidt for testing this.
llvm-svn: 209219
bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
llvm-svn: 209123
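For context (not from the commit), the reason a single shuffle suffices: byte-swapping every 32-bit lane of a 16-byte vector is exactly one fixed byte permutation, which is what a vperm/pshufb/vrev lowering applies. A scalar model of that mask:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte-swap each 32-bit lane of a 16-byte vector by applying one fixed byte
// permutation -- the same mask a vperm/pshufb-style lowering would use.
static const uint8_t kBswap32Mask[16] = {3,  2,  1,  0,  7,  6,  5,  4,
                                         11, 10, 9,  8,  15, 14, 13, 12};

void bswap_v4i32(const uint8_t in[16], uint8_t out[16]) {
  for (int i = 0; i < 16; ++i)
    out[i] = in[kBswap32Mask[i]];  // one shuffle, no per-element scalar bswap
}

int main() {
  uint32_t lanes[4] = {0x01020304u, 0xAABBCCDDu, 0x00000001u, 0xFF000000u};
  uint8_t in[16], out[16];
  std::memcpy(in, lanes, sizeof in);
  bswap_v4i32(in, out);

  uint32_t swapped[4];
  std::memcpy(swapped, out, sizeof swapped);
  for (int i = 0; i < 4; ++i)
    assert(swapped[i] == __builtin_bswap32(lanes[i]));  // GCC/Clang builtin
  return 0;
}
```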
This is mostly a mechanical change, converting all the call sites to the newer chained-function construction pattern. It removes the horrible 15-parameter constructor for the CallLoweringInfo in favour of setting properties of the call via chained functions. No functional change beyond the removal of the old constructors is intended.
llvm-svn: 209082
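A generic sketch of the chained-setter style the commit describes; the names below are made up and are not the actual CallLoweringInfo interface:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical builder: each property of the "call" is set by a named,
// chainable member function instead of one positional constructor argument.
struct CallInfoSketch {
  std::string Callee;
  std::vector<int> Args;
  bool IsTailCall = false;

  CallInfoSketch &setCallee(std::string Name) {
    Callee = std::move(Name);
    return *this;
  }
  CallInfoSketch &setArgs(std::vector<int> A) {
    Args = std::move(A);
    return *this;
  }
  CallInfoSketch &setTailCall(bool V = true) {
    IsTailCall = V;
    return *this;
  }
};

int main() {
  CallInfoSketch CLI;
  CLI.setCallee("memcpy").setArgs({1, 2, 3}).setTailCall();  // reads like named parameters
  return CLI.IsTailCall ? 0 : 1;
}
```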
llvm-svn: 209048
llvm-svn: 209040
inappropriate since it lost its Mask parameter in r154011.
llvm-svn: 208811
llvm-svn: 208743
member variable and sink the initialization of crbits into the subtarget feature reset code.
No functional change, but this refactor will be used in a future commit.
llvm-svn: 208726
Support for the intrinsics that read from and write to global named registers is added for r1, r2 and r13 (depending on the subtarget).
llvm-svn: 208509
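As an illustration of how such intrinsics are typically reached from source (an assumption, not something stated in the commit), a GNU global register variable bound to one of these registers compiles down to a named-register read:

```cpp
#include <cstdio>

// GNU extension: bind a variable to a named hardware register. On PowerPC,
// r1 is the stack pointer; reading it here relies on the kind of named-register
// intrinsic support described above. Illustrative only -- this needs GCC or
// Clang targeting PowerPC and is not portable C++.
register void *stack_pointer asm("r1");

int main() {
  std::printf("current stack pointer: %p\n", stack_pointer);
  return 0;
}
```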
The counter-loops formation pass needs to know what operations might be function calls (because they can't appear in counter-based loops). On PPC32, 128-bit shifts might be runtime calls (even though you can't use __int128 on PPC32, it seems that SROA might form them).
Fixes PR19709.
llvm-svn: 208501
We were already always passing true; this just removes the option.
llvm-svn: 208205
The fix itself is fairly simple: move getAccessVariant to MCValue so that we replace the old weak-expression evaluation with the far more general EvaluateAsRelocatable.
This then requires that EvaluateAsRelocatable stop when it finds a non-trivial reference kind, and that in turn requires the ELF writer to look harder for weak references.
Last but not least, this found a case where we were being bug-for-bug compatible with gas and accepting invalid input. I reported pr19647 to track it.
llvm-svn: 207920
introduced most of these recently.
llvm-svn: 207616
anything. In some cases, remove them altogether if there are no callers either.
llvm-svn: 207610
'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. PowerPC edition.
llvm-svn: 207504
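A tiny, generic illustration of the cleanup pattern being applied here (made-up class names, not the PowerPC code):

```cpp
// After the cleanup, overriding methods say 'override' (and 'final' where no
// further override should exist) and drop the redundant leading 'virtual'.
struct Base {
  virtual ~Base() = default;
  virtual unsigned cost(unsigned N) const { return N; }
};

struct Derived final : Base {
  // 'override' both documents and checks that this really overrides Base::cost.
  unsigned cost(unsigned N) const override { return 2 * N; }
};

int main() {
  Derived D;
  const Base &B = D;
  return B.cost(1) == 2 ? 0 : 1;  // dynamic dispatch is unchanged by the cleanup
}
```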
opcode so there's no reason to use the target namespace for it rather than TargetOpcode.
llvm-svn: 207475
llvm-svn: 207473
llvm-svn: 207397
llvm-svn: 207394
llvm-svn: 207377
llvm-svn: 207374
and size.
llvm-svn: 207329
llvm-svn: 207327
llvm-svn: 207197
This is similar to the 'tail' marker, except that it guarantees that tail call optimization will occur. It also comes with conservative IR verification rules that ensure that tail call optimization is possible.
Reviewers: nicholas
Differential Revision: http://llvm-reviews.chandlerc.com/D3240
llvm-svn: 207143
I discovered this const-hole while attempting to coalesce the Symbol and SymbolMap data structures. There are some pending issues with that, but I figured this change was easy to flush early.
llvm-svn: 207124
For now it contains a single flag, SanitizeAddress, which enables AddressSanitizer instrumentation of inline assembly.
Patch by Yuri Gorshenin.
llvm-svn: 206971
definition below all of the header #include lines, lib/Target/... edition.
llvm-svn: 206842
system headers above the includes of generated '.inc' files that actually contain code. In a few targets this was already done pretty consistently, but it wasn't done *really* consistently anywhere. It is strictly cleaner IMO and necessary in a bunch of places where the DEBUG_TYPE is referenced from the generated code. Consistency with the necessary places trumps. Hopefully the build bots are OK with the movement of intrin.h...
llvm-svn: 206838
behavior based on other files defining DEBUG_TYPE, which means it cannot define DEBUG_TYPE at all. This is actually better IMO, as it forces folks to define relevant DEBUG_TYPEs for their files. However, it requires all files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't already. I've updated all such files in LLVM and will do the same for other upstream projects.
This still leaves one important change in how LLVM uses the DEBUG_TYPE macro going forward: we need to only define the macro *after* header files have been #include-ed. Previously, this wasn't possible because Debug.h required the macro to be pre-defined. This commit removes that requirement. By defining DEBUG_TYPE after the includes, two things are fixed:
- Header files that need to provide a DEBUG_TYPE for some inline code can do so by defining the macro before their inline code and undef-ing it afterward so the macro does not escape.
- We no longer have rampant ODR violations due to including headers with different DEBUG_TYPE definitions. This may be mostly an academic violation today, but with modules these types of violations are easy to check for and potentially very relevant.
Where necessary to support headers with DEBUG_TYPE, I have moved the definitions below the includes in this commit. I plan to move the rest of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big enough.
The comments in Debug.h, which were hilariously out of date already, have been updated to reflect the recommended practice going forward.
llvm-svn: 206822
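A minimal sketch of the recommended layout described above, for a hypothetical .cpp file (the pass name is invented for illustration):

```cpp
#include "llvm/Support/Debug.h"        // DEBUG(...) and llvm::dbgs()
#include "llvm/Support/raw_ostream.h"
// ... all other headers, including any generated '.inc' files, go above ...

// Defined only after the includes, so no header sees or redefines it.
#define DEBUG_TYPE "example-pass"

void runExample() {
  // Emitted only in asserts builds under -debug or -debug-only=example-pass.
  DEBUG(llvm::dbgs() << "running the example transformation\n");
}
// (A header that provides inline code would #undef DEBUG_TYPE afterward so
// the macro does not escape into files that include it.)
```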
its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* or a Value* is strongly encouraged to use a MachinePointerInfo instead.
llvm-svn: 206255
This patch re-introduces the MCContext member that was removed from MCDisassembler in r206063, and requires that an MCContext be passed in at MCDisassembler construction time. (Previously the MCContext member had been initialized in an ad-hoc fashion after construction.) The MCContext member can be used by MCDisassembler subclasses to construct constant or target-specific MCExprs.
This patch updates the disassemblers for in-tree targets, and provides the MCRegisterInfo instance that some disassemblers were using through the MCContext (previously those backends were constructing their own MCRegisterInfo instances).
llvm-svn: 206241
Implements the various TTI functions to enable constant hoisting on PPC. The only significant test-suite change is this:
MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup (which essentially reverses the slowdown from r206120).
llvm-svn: 206141
We had been using the known-zero values of the operand of the or to construct the mask for an rlwimi; this is not quite correct, but is fine when the mask is constant. When the mask is constant, the known zeros of the operand must be a superset of the zeros in the mask. However, when the mask is not a constant, there might be bits in the operand that are not known to be zero and that, at runtime, might be zero in the mask. Therefore, we check that any bits not known to be zero *are* known to be one in the mask. Otherwise, we can't fold the mask with the or and shift.
This was revealed as a miscompile of MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with constant hoisting.
llvm-svn: 206136
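The bit-level condition described above can be written down directly. This is a stand-alone illustration of the check, not the actual PPCISelLowering.cpp code:

```cpp
#include <cassert>
#include <cstdint>

// A bit position is foldable only if it is known zero in the OR operand or
// known one in the mask; a position that is neither could let an unknown
// operand bit leak through a mask bit that turns out to be zero at runtime.
bool maskCoversUnknownBits(uint32_t OpKnownZero, uint32_t MaskKnownOne) {
  uint32_t MaybeNonZero = ~OpKnownZero;        // operand bits that might be set
  return (MaybeNonZero & ~MaskKnownOne) == 0;  // all such bits forced to one in the mask
}

int main() {
  // Every possibly-set operand bit is covered by a known-one mask bit: OK to fold.
  assert(maskCoversUnknownBits(/*OpKnownZero=*/0xFFFF0000u,
                               /*MaskKnownOne=*/0x0000FFFFu));
  // Low byte of the operand is unknown and the mask may be zero there: reject.
  assert(!maskCoversUnknownBits(/*OpKnownZero=*/0xFFFF0000u,
                                /*MaskKnownOne=*/0x0000FF00u));
  return 0;
}
```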
Add implementations of:
  bool isLegalICmpImmediate(int64_t Imm) const
  bool isLegalAddImmediate(int64_t Imm) const
  bool isTruncateFree(Type *Ty1, Type *Ty2) const
  bool isTruncateFree(EVT VT1, EVT VT2) const
  bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const
Unfortunately, this regresses counter-register-based loop formation because some of the loops now end up in forms where SE cannot compute loop counts. Nevertheless, the test-suite results favor committing:
SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup
MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown
llvm-svn: 206120
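As a rough sketch of what one of these hooks can look like, assuming a compare instruction with a signed 16-bit immediate field (an assumption for illustration; this is not the committed PPC implementation):

```cpp
#include <cstdint>

// Hypothetical target hook: report an integer-compare immediate as "legal"
// when it fits a signed 16-bit field, so constant hoisting leaves it inline
// instead of materializing it in a register.
bool isLegalICmpImmediate(int64_t Imm) {
  return Imm >= INT16_MIN && Imm <= INT16_MAX;
}

int main() {
  return (isLegalICmpImmediate(42) && !isLegalICmpImmediate(1 << 20)) ? 0 : 1;
}
```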
llvm-svn: 205961
PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle predicate should be false.
Noticed by inspection; no test case (yet).
llvm-svn: 205787
Fix "error: private field 'TM' is not used [-Werror,-Wunused-private-field]"
llvm-svn: 205660
This provides more realistic costs for the insert/extractelement instructions (which are load/store pairs), accounts for the cheap unaligned Altivec load sequence, and for unaligned VSX load/stores.
Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown
Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup
Unfortunately, estimating the costs of the stack-based scalarization sequences is hard, and adjusting these costs is like a game of whac-a-mole :( I'll revisit this again after we have better codegen for vector extloads and truncstores and unaligned load/stores.
llvm-svn: 205658
Remove the declaration of an unimplemented function.
llvm-svn: 205657
gcc inline asm supports specifying "cc" as a clobber of all condition registers. Add just enough modeling of the full register to make this work.
Fixed PR19326.
llvm-svn: 205630
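For reference, a minimal example of the source construct being modeled (assumes GCC/Clang inline-asm syntax on a PowerPC target; not taken from the commit):

```cpp
// "add." is the record form of add: it writes the result and also updates
// condition-register field cr0, so the statement declares a "cc" clobber to
// tell the compiler that the condition registers are modified.
long add_and_record(long a, long b) {
  long r;
  asm("add. %0, %1, %2"
      : "=r"(r)
      : "r"(a), "r"(b)
      : "cc");  // clobber covering all condition registers
  return r;
}

int main() { return add_and_record(2, -2) == 0 ? 0 : 1; }
```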
llvm-svn: 205610
PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to calculate the base cost of the memory access, and then adjust on top of that.
There is no functionality change from this modification, but it will become important so that PPCTTI can take advantage of scalarization information for which BasicTTI::getMemoryOpCost will account in the near future.
llvm-svn: 205476
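A schematic of the layering described above, using made-up types and a toy cost function rather than the real TTI interfaces:

```cpp
// Hypothetical stand-in for the generic (BasicTTI-like) cost model.
struct GenericCostModel {
  virtual ~GenericCostModel() = default;
  virtual unsigned getMemoryOpCost(unsigned Bytes, unsigned Align) const {
    // Toy base cost: pretend wider-than-aligned accesses get split up.
    return Bytes > Align ? Bytes / Align : 1;
  }
};

// Hypothetical target cost model: start from the generic base cost, then
// layer target-specific adjustments on top of it.
struct TargetCostModel : GenericCostModel {
  unsigned getMemoryOpCost(unsigned Bytes, unsigned Align) const override {
    unsigned Cost = GenericCostModel::getMemoryOpCost(Bytes, Align);
    if (Align < 4)
      Cost += 2;  // example unaligned-access penalty
    return Cost;
  }
};

int main() {
  TargetCostModel TCM;
  return TCM.getMemoryOpCost(/*Bytes=*/16, /*Align=*/2) > 0 ? 0 : 1;
}
```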