|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | that are unused.
This allows the combiner to delete math feeding shuffles where the math
isn't actually necessary. This improves some of the vperm2x128 tests
that regressed when the vector shuffle lowering started actually
generating vperm instructions rather than forcibly decomposing them.
Sadly, this isn't enough to get this *really* right because we still
form a completely unnecessary permutation. To fix that, we also need to
fold shuffles which just rearrange concatenated or inserted subvectors.
llvm-svn: 219086 | 
| | 
| 
| 
| 
| 
| | NFC.
llvm-svn: 219061 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends.  During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.
However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node.  In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE.  Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.
Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat.  Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it create a non-canonical node).  We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed.  On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.
I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask.  This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user.  Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).
So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE.  This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly.  (Previous patches used HandleSDNode to detect the
CSE but that's not practical here).  The listener is only installed on X86.
I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc.  The only thing we pay for is the
creation of the listener.  The callback never ever triggers in spec2k since
this is a corner case.
Fixes rdar://problem/18206171
llvm-svn: 219009 | 
| | 
| 
| 
| | llvm-svn: 218999 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787 | 
| | 
| 
| 
| 
| 
| | "Move the complex address expression out of DIVariable and into an extra"
llvm-svn: 218782 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
llvm-svn: 218778 | 
| | 
| 
| 
| 
| 
| | refinement of an estimate. NFC.
llvm-svn: 218700 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...
llvm-svn: 218698 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | to convert it into a shuffle.
Currently, the DAG Combiner only tries to convert type-legal build_vector nodes
into shuffles. This patch simply moves the logic that checks if a
build_vector has a legal value type up before we even start analyzing the
operands. This allows to early exit immediately from method
'visitBUILD_VECTOR' if the node type is known to be illegal.
No functional change intended.
llvm-svn: 218677 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.
This problem is found in spec2006-197.parser.
For example,
  stur    w10, [x11, #-4]
  stur    w10, [x11, #-4]
Then one of the two stur instructions can be removed.
Patch by David Xu!
llvm-svn: 218569 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
And:
z = y / x
into:
z = y * rcpe(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.
Differential Revision: http://reviews.llvm.org/D5484
llvm-svn: 218553 | 
| | 
| 
| 
| | llvm-svn: 218494 | 
| | 
| 
| 
| | llvm-svn: 218493 | 
| | 
| 
| 
| 
| 
| 
| | since we are accessing the TargetMachine that we're a member
function of.
llvm-svn: 218489 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The InstrEmitter will skip the check of MI.hasPostISelHook()
before calling AdjustInstrPostInstrSelection() when NDEBUG
is not defined.
This was added in r140228, and I'm not sure if it is intentional or not,
but it is a likely source for bugs, because it means with
Release+Asserts builds you can forget to set the hasPostISelHook
flag on TableGen definitions and AdjustInstrPostInstrSelection() will
still be called.
llvm-svn: 218458 | 
| | 
| 
| 
| 
| 
| | FunctionLoweringInfo.
llvm-svn: 218364 | 
| | 
| 
| 
| | llvm-svn: 218314 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is purely a plumbing patch. No functional changes intended.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.
Next steps:
    The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
    Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
    X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.
Differential Revision: http://reviews.llvm.org/D5425
llvm-svn: 218219 | 
| | 
| 
| 
| | llvm-svn: 218171 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.
Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.
Patch by Olivier Sallenave, thanks!
llvm-svn: 218120 | 
| | 
| 
| 
| 
| 
| 
| 
| | With this optimization, we will not always insert zext for values crossing
basic blocks, but insert sext if the users of a value crossing basic block
has preference of sign predicate.
llvm-svn: 218101 | 
| | 
| 
| 
| 
| 
| 
| | Without a vector to hold the created ops, these 
functions don't have any use.
llvm-svn: 217831 | 
| | 
| 
| 
| 
| 
| | Make the optimizeCmpPredicate function available to all targets.
llvm-svn: 217822 | 
| | 
| 
| 
| | llvm-svn: 217814 | 
| | 
| 
| 
| | llvm-svn: 217711 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | vectors.
e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fixup
the result by 32 bits, not 64. PR20917.
llvm-svn: 217671 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Do
 (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
This is already done for multiplies, but since multiplies
by powers of two are turned into shifts, we also need
to handle it here.
This might want checks for isLegalAddImmediate to avoid
transforming an add of a legal immediate with one that isn't.
llvm-svn: 217610 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is an extension of the change made with r215820:
http://llvm.org/viewvc/llvm-project?view=revision&revision=215820
That patch allowed combining of splatted vector FP constants that are multiplied.
This patch allows combining non-uniform vector FP constants too by relaxing the
check on the type of vector. Also, canonicalize a vector fmul in the
same way that we already do for scalars - if only one operand of the fmul is a
constant, make it operand 1. Otherwise, we miss potential folds.
This fold is also done by -instcombine, but it's possible that extra
fmuls may have been generated during lowering.
Differential Revision: http://reviews.llvm.org/D5254
llvm-svn: 217599 | 
| | 
| 
| 
| | llvm-svn: 217570 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | instructions (PR20863)
Previously, fast-isel would not clean up after failing to select a call
instruction, because it would have called flushLocalValueMap() which moves
the insertion point, making SavedInsertPt in selectInstruction() invalid.
Fixing this by making SavedInsertPt a member variable, and having
flushLocalValueMap() update it.
This removes some redundant code at -O0, and more importantly fixes PR20863.
Differential Revision: http://reviews.llvm.org/D5249
llvm-svn: 217401 | 
| | 
| 
| 
| | llvm-svn: 217399 | 
| | 
| 
| 
| 
| 
| 
| 
| | fold permutations.
The testcases for these folds already exist in test/CodeGen/X86/fp-fast.ll.
llvm-svn: 217393 | 
| | 
| 
| 
| 
| 
| | Also added a FIXME regarding redundant folds for non-canonicalized constants.
llvm-svn: 217390 | 
| | 
| 
| 
| 
| 
| 
| 
| | This problem is bigger than just fsub, but this is the minimum fix to solve
fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ), and we solve
zero subtraction with the same change.
llvm-svn: 217286 | 
| | 
| 
| 
| | llvm-svn: 217278 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.
Reviewed by Eric
llvm-svn: 217075 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This commit renames the following public FastISel functions:
LowerArguments -> lowerArguments
SelectInstruction -> selectInstruction
TargetSelectInstruction -> fastSelectInstruction
FastLowerArguments -> fastLowerArguments
FastLowerCall -> fastLowerCall
FastLowerIntrinsicCall -> fastLowerIntrinsicCall
FastEmitZExtFromI1 -> fastEmitZExtFromI1
FastEmitBranch -> fastEmitBranch
UpdateValueMap -> updateValueMap
TargetMaterializeConstant -> fastMaterializeConstant
TargetMaterializeAlloca -> fastMaterializeAlloca
TargetMaterializeFloatZero -> fastMaterializeFloatZero
LowerCallTo -> lowerCallTo
Reviewed by Eric
llvm-svn: 217074 | 
| | 
| 
| 
| | llvm-svn: 217071 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Things got a little bit messy over the years and it is time for a little bit
spring cleaning.
This first commit is focused on the FastISel base class itself. It doxyfies all
comments, C++11fies the code where it makes sense, renames internal methods to
adhere to the coding standard, and clang-formats the files.
Reviewed by Eric
llvm-svn: 217060 | 
| | 
| 
| 
| 
| 
| 
| 
| | Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reinstates commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 216982 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | selection. NFC.
This allows the target to disable target-independent instruction selection and
jump directly into the target-dependent instruction selection code.
This can be beneficial for targets, such as AArch64, which could emit much
better code, but never got a chance to do so, because the target-independent
instruction selector was able to find an instruction sequence.
llvm-svn: 216947 | 
| | 
| 
| 
| 
| 
| 
| 
| | If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.
llvm-svn: 216932 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Also fix a small copy-paste bug in X86ISelLowering where Chain should
have been used in place of DAG.getEntryToken().
Fixes PR20828.
llvm-svn: 216929 | 
| | 
| 
| 
| 
| 
| 
| | This was copy-paste from the integer version, but
FP build_vectors don't truncate.
llvm-svn: 216928 | 
| | 
| 
| 
| 
| 
| 
| 
| | This removes static initializers from the backends which generate this data, and also makes this struct match the other Tablegen generated structs in behaviour
Reviewed by Andy Trick and Chandler C
llvm-svn: 216919 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant
offsets, as there is no guarantee that a backend can handle them on generic
ADDs (even if it generates them during address-mode matching) -- and,
specifically, applying this transformation directly with TargetConstants caused
a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less
than ideal. Instead, for non-opaque constants, we can convert them into regular
constants for use with the generated ADD (or SUB).
llvm-svn: 216908 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | pointer value is live'"
I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The
underlying cause of the failure is that pre-inc loads with increments
represented by ISD::TargetConstants were being transformed into ISD:::ADDs with
ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they
were selected as invalid r+r adds.
This recommits r208640, rebased and with an exclusion for ISD::TargetConstant
increments. This behavior seems correct, although in the future we might want
to ask the target to split out the indexing that uses ISD::TargetConstants.
Unfortunately, I don't yet have small test case where the relevant invalid
'add' instruction is not itself dead (and thus eliminated by
DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things)
Original commit message (by Adam Nemet):
Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.
This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part).  See the testcase.
In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off.  This is the
CommitTargetLoweringOpt piece.
I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.
Fixes <rdar://problem/16031651>
llvm-svn: 216898 | 
| | 
| 
| 
| | llvm-svn: 216781 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
If a variadic function body contains a musttail call, then we copy all
of the remaining register parameters into virtual registers in the
function prologue. We track the virtual registers through the function
body, and add them as additional registers to pass to the call. Because
this is all done in virtual registers, the register allocator usually
gives us good code. If the function does a call, however, it will have
to spill and reload all argument registers (ew).
Forwarding regparms on x86_32 is not implemented because most compilers
don't support varargs in 32-bit with regparms.
Reviewers: majnemer
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5060
llvm-svn: 216780 |