Commit messages
After the layout of the basic blocks is set, the target may be able to get rid
of unconditional branches to fallthrough blocks that the generic code does not
catch. This happens whenever TargetInstrInfo::AnalyzeBranch is unable to
analyze all the branches in the terminator sequence but still understands a
few of them.
In such situations, AnalyzeBranch can directly modify the branches if it has
been instructed to do so.
This patch takes advantage of that.
llvm-svn: 268328
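As an illustration of the mechanism (not the patch itself), here is a minimal C++ sketch of how a late pass might invoke the hook with AllowModify set; TII and MBB are assumed to come from the surrounding pass, and the hook was spelled AnalyzeBranch in this era of LLVM (later renamed analyzeBranch).

    #include "llvm/ADT/SmallVector.h"
    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/Target/TargetInstrInfo.h"

    using namespace llvm;

    // Give the target a chance to remove branches it understands, e.g. an
    // unconditional jump to the fallthrough block, even when the terminator
    // sequence as a whole is not analyzable.
    static void cleanUpBranches(const TargetInstrInfo &TII, MachineBasicBlock &MBB) {
      MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
      SmallVector<MachineOperand, 4> Cond;
      // AllowModify=true instructs AnalyzeBranch that it may rewrite or erase
      // branches in place while analyzing them.
      (void)TII.AnalyzeBranch(MBB, TBB, FBB, Cond, /*AllowModify=*/true);
    }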
The faulting load operation may branch to the handler block, and we do not
want that to happen anywhere within the basic block.
Moreover, by marking it "terminator and branch", the machine verifier no
longer wrongly assumes (because AnalyzeBranch does not know better) that the
branch is analyzable. Indeed, the target was seeing only the unconditional
branch and not the faulting load op, and thought the block was a simple
unconditionally-branching block.
The machine verifier was complaining because of that and, moreover, other
optimizations could have performed wrong transformations!
In the process, simplify the representation of the handler block in the
faulting load op: we now reference the handler block directly instead of
using a label. This has two benefits:
1. MC knows how to issue a label for a BB, so leave that to it.
2. Accessing the target BB from its label is painful, whereas it is direct
from an MBB operand.
Note: the 2-byte offset in implicit-null-check.ll comes from the fact that
the unconditional jumps are not removed anymore, as the whole terminator
sequence is no longer analyzable. This will be fixed in a subsequent commit.
llvm-svn: 268327
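A rough sketch of the operand change described above, using hypothetical names (FaultingOpc, DefReg, HandlerMBB); BuildMI and addMBB are the standard MachineInstrBuilder helpers, and the point is simply that the handler is now a basic-block operand rather than a label operand.

    #include "llvm/CodeGen/MachineInstrBuilder.h"
    #include "llvm/Target/TargetInstrInfo.h"

    using namespace llvm;

    // Emit the faulting-load pseudo with a direct MBB operand for the handler
    // block; MC emits the block's label, and the target block is reachable
    // straight from the operand instead of being looked up via a label.
    static MachineInstr *emitFaultingLoad(MachineBasicBlock &MBB, const DebugLoc &DL,
                                          const TargetInstrInfo &TII, unsigned FaultingOpc,
                                          unsigned DefReg, MachineBasicBlock *HandlerMBB) {
      return BuildMI(&MBB, DL, TII.get(FaultingOpc), DefReg)
          .addMBB(HandlerMBB)
          .getInstr();
    }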
Demonstrate missing 128-bit wide shuffle combine support
llvm-svn: 268290
Fixes PR27241.
Differential Revision: http://reviews.llvm.org/D19688
llvm-svn: 268227
While there, fix the execution domain for VPACKSSDW/VPACKUSDW.
llvm-svn: 268200
Differential Revision: http://reviews.llvm.org/D19775
llvm-svn: 268195
implementation take a scalar register and generate a vector without COPY_TO_REGCLASS (turning it into a VR128 register). The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth.
Differential Revision: http://reviews.llvm.org/D19579
llvm-svn: 268190
VLX and BWI are supported.
llvm-svn: 268189
Fix a FIXME: loop alignment is now disabled when compiling with -Oz.
llvm-svn: 268121
llvm-svn: 268106
llvm-svn: 268094
For compilations with no explicit CPU specified, this exhibits
nice gains on Silvermont, with neutral performance on big cores.
Differential Revision: http://reviews.llvm.org/D19138
llvm-svn: 267809
llvm-svn: 267806
The callseq_end node must be glued with the TLS calls, otherwise the generic
code will miss the uses of the returned value and mark it dead.
Moreover, the 64-bit TLSCall pseudo must not set an implicit use of RDI; at
this point the pseudo uses the symbol address, not RDI, and the lowering will
do the right thing.
llvm-svn: 267797
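A hedged SelectionDAG sketch of the shape being described, not the actual lowering code: the CALLSEQ_END node is created with the call's glue as its last operand, so the subsequent CopyFromReg of the returned value stays tied to the call. Function and value names here are placeholders.

    #include "llvm/CodeGen/SelectionDAG.h"

    using namespace llvm;

    // Chain/Glue are the outputs of the TLS call node; RetReg is the physical
    // register holding the returned address (e.g. RAX on x86-64).
    static SDValue finishTLSCall(SelectionDAG &DAG, const SDLoc &DL, SDValue Chain,
                                 SDValue Glue, unsigned RetReg, EVT PtrVT,
                                 SDValue &Result) {
      // End the call sequence *glued* to the call, so the node that consumes
      // the glue (the CopyFromReg below) is not considered dead.
      Chain = DAG.getCALLSEQ_END(Chain,
                                 DAG.getIntPtrConstant(0, DL, /*isTarget=*/true),
                                 DAG.getIntPtrConstant(0, DL, /*isTarget=*/true),
                                 Glue, DL);
      Glue = Chain.getValue(1);
      Result = DAG.getCopyFromReg(Chain, DL, RetReg, PtrVT, Glue);
      return Result.getValue(1); // the updated chain
    }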
Differential Revision: http://reviews.llvm.org/D19592
llvm-svn: 267773
llvm-svn: 267723
It's probably the case for all 3 MMX users out there, but with
hand-crafted IR, you can trigger selection failures. Fix that.
llvm-svn: 267652
This effectively adds back the extractelt combine removed by r262358:
the direct case can still occur (because x86_mmx is special, see
r262446), but it's the indirect case that's now superseded by the
generic combine.
llvm-svn: 267651
the pattern is matched.
Differential revision: http://reviews.llvm.org/D14840
llvm-svn: 267649
Do not use a basic block that has EFLAGS live-in as the prologue if we need
to realign the stack: realigning the stack uses an AND instruction, which
clobbers EFLAGS.
Another alternative would have been to save and restore EFLAGS around the
stack realignment code, but that is likely inefficient.
Fixes PR27531.
llvm-svn: 267634
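A minimal sketch of the check being described, assuming a NeedsRealignment flag and passing the flags register as a parameter; the real logic lives in the X86 frame lowering, this just illustrates the condition.

    #include "llvm/CodeGen/MachineBasicBlock.h"

    using namespace llvm;

    // Realigning the stack emits an AND of the stack pointer, and AND clobbers
    // EFLAGS, so a block with EFLAGS live-in cannot serve as the prologue when
    // realignment is required. EFLAGSReg would be X86::EFLAGS from the target's
    // generated register enum.
    static bool canUseAsPrologueSketch(const MachineBasicBlock &MBB,
                                       bool NeedsRealignment, unsigned EFLAGSReg) {
      return !NeedsRealignment || !MBB.isLiveIn(EFLAGSReg);
    }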
Differential Revision: http://reviews.llvm.org/D19568
llvm-svn: 267629
Thanks to that information, we no longer claim that a register is live when
it is not.
llvm-svn: 267622
Now, it is possible to know that partial definitions are dead definitions and
recognize that clobbered registers are also dead.
llvm-svn: 267621
We don't need to copy the sret argument into %rax upon return.
rdar://25671494
llvm-svn: 267579
turned into a branch
This is part of solving PR27344:
https://llvm.org/bugs/show_bug.cgi?id=27344
CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this
same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the
IR further with a select in place.
For the TLI hook default, >99% taken or not taken is chosen as the threshold for a highly
predictable branch. Even the most limited HW branch predictors will be correct on such a branch
almost all the time, so even a massive mispredict penalty would be overcome by the win from all
the times the branch was predicted correctly.
As a follow-up, we could make the default target hook less conservative by using the
SchedMachineModel's MispredictPenalty, or we could just let targets override the default by
implementing the hook with that and other target-specific options. Note that trying to statically
determine mispredict rates for close-to-balanced profile weight data is generally impossible if
the HW is sufficiently advanced; i.e., 50/50 taken/not-taken might still be 100% predictable.
Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable()
branch weight default values are 4 and 64. A proposal to change that is in D19435.
Differential Revision: http://reviews.llvm.org/D19488
llvm-svn: 267572
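The hook name below is an assumption (something along the lines of getPredictableBranchThreshold, not verified against the patch); this hedged sketch only shows how the ">99% taken or not taken" default maps onto a BranchProbability.

    #include "llvm/Support/BranchProbability.h"

    using namespace llvm;

    // Default: treat a branch as highly predictable only when profile data says
    // it goes one way at least 99 times out of 100.
    static BranchProbability defaultPredictableBranchThreshold() {
      return BranchProbability(99, 100);
    }

    // A hypothetical target override could derive a looser threshold from its
    // scheduling model's mispredict penalty instead of this fixed default.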
Handle MachineBasicBlock as a memory displacement operand in the LEA optimization pass.
Differential Revision: http://reviews.llvm.org/D19409
llvm-svn: 267551
Kill-flags, which computeRegisterLiveness uses, are not reliable.
LivePhysRegs is.
Differential Revision: http://reviews.llvm.org/D19472
llvm-svn: 267495
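A sketch of the replacement pattern, assuming the usual LivePhysRegs flow (the exact constructor and addLiveOuts signatures have shifted slightly across LLVM versions): compute liveness by walking backwards from the block's live-outs rather than trusting kill flags.

    #include "llvm/CodeGen/LivePhysRegs.h"
    #include "llvm/CodeGen/MachineBasicBlock.h"

    using namespace llvm;

    // Returns whether PhysReg is live immediately after Pos, derived from the
    // block's live-outs and a backward walk, not from kill flags.
    static bool isLiveAfter(const MachineBasicBlock &MBB,
                            const MachineInstr &Pos, unsigned PhysReg,
                            const TargetRegisterInfo &TRI) {
      LivePhysRegs LiveRegs(TRI);
      LiveRegs.addLiveOuts(MBB);
      for (auto I = MBB.rbegin(), E = MBB.rend(); I != E; ++I) {
        if (&*I == &Pos)
          break;
        LiveRegs.stepBackward(*I);
      }
      return LiveRegs.contains(PhysReg);
    }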
llvm-svn: 267426
llvm-svn: 267417
We didn't have logic to correctly handle CFGs with more than one EH-pad
successor (these are novel with WinEH).
There were situations where a register was live in one exceptional successor
but not in another, yet the code as written only considered the first
exceptional successor it found. This resulted in split points that were
insufficiently early when an invoke was present.
This fixes PR27501.
N.B. This removes getLandingPadSuccessor.
llvm-svn: 267412
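A small sketch of the shape of the fix (the helper name is made up): gather every EH-pad successor rather than stopping at the first one, as a single getLandingPadSuccessor-style query would.

    #include "llvm/ADT/SmallVector.h"
    #include "llvm/CodeGen/MachineBasicBlock.h"

    using namespace llvm;

    // Collect all exceptional successors of MBB; with WinEH a block can have
    // more than one EH-pad successor, and each must be considered.
    static SmallVector<const MachineBasicBlock *, 2>
    collectEHPadSuccessors(const MachineBasicBlock &MBB) {
      SmallVector<const MachineBasicBlock *, 2> EHPads;
      for (const MachineBasicBlock *Succ : MBB.successors())
        if (Succ->isEHPad())
          EHPads.push_back(Succ);
      return EHPads;
    }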
Was reviewed over the shoulder by AsafBadouh.
Connected to review http://reviews.llvm.org/D19195.
llvm-svn: 267379
and without zero undef being lowered to bsf/bsr.
llvm-svn: 267373
llvm-svn: 267362
Codegen is pretty bad at the moment but could use PSHUFB quite efficiently
llvm-svn: 267347
Fixed issue with VPPERM target shuffle mask decoding that was incorrectly masking off the 3-bit permute op with a 2-bit mask.
llvm-svn: 267346
Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transferred to XMM.
This fixes an issue that prevented PSHUFB target shuffle mask decoding of rematerialized scalar constants, and also exposes the XOP VPPERM bug described in PR27472.
llvm-svn: 267343
lowered as rematerialized constants on scalar unit
Found whilst investigating PR27472
llvm-svn: 267339
instructions. Only one of the conditions should be valid for each pattern, not both. Update tests accordingly.
llvm-svn: 267311
Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around
DIType*. It is no longer legal to refer to a DICompositeType by its
'identifier:', and DIBuilder no longer retains all types with an
'identifier:' automatically.
Aside from the bitcode upgrade, this is mainly removing logic to resolve
an MDString-based reference to an actual DIType. The commits leading
up to this have made the implicit type map in DICompileUnit's
'retainedTypes:' field superfluous.
This does not remove DITypeRef, DIScopeRef, DINodeRef, and
DITypeRefArray, or stop using them in DI-related metadata. Although as
of this commit they aren't serving a useful purpose, there are patches
under review to reuse them for CodeView support.
The tests in LLVM were updated with deref-typerefs.sh, which is attached
to the thread "[RFC] Lazy-loading of debug info metadata":
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html
llvm-svn: 267296
Currently failing due to poor blend matching, found whilst investigating PR27472
llvm-svn: 267282
select to detect whether the input is zero and return the original size instead of the extended size. Instead, just set the first bit in the zero-extended part.
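A plain C++ illustration of the trick (not the backend code itself): instead of a compare-and-select on zero, set the bit just past the original width in the zero-extended value, so a 32-bit cttz with zero-undef semantics still returns the original bit width for a zero input.

    #include <cassert>
    #include <cstdint>

    // 16-bit cttz that returns 16 for a zero input, built from a 32-bit cttz
    // whose operand is guaranteed non-zero by the guard bit at position 16.
    static unsigned cttz16(uint16_t X) {
      uint32_t Widened = static_cast<uint32_t>(X) | (1u << 16);
      return static_cast<unsigned>(__builtin_ctz(Widened)); // never sees zero
    }

    int main() {
      assert(cttz16(0) == 16); // zero input yields the original bit width
      assert(cttz16(8) == 3);
      assert(cttz16(1) == 0);
      return 0;
    }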
llvm-svn: 267229
If the target allows the alignment, this should be OK.
llvm-svn: 267217
The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual
functions defined in other DSOs. The unnamed_addr attribute means that the
function's address is not significant, so we're allowed to substitute it
with the address of a PLT entry.
Also includes a bonus feature: addends for COFF image-relative references.
Differential Revision: http://reviews.llvm.org/D17938
llvm-svn: 267211
If the target allows the alignment, this should still be OK.
llvm-svn: 267209
Summary:
When generating assembly with -m16, we must explicitly mark it as
16-bit by emitting .code16 at the beginning of the file. This fixes
wrong results when using -fno-integrated-as.
Reviewers: dwmw2
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19392
llvm-svn: 267152
CTTZ_ZERO_UNDEF even without VLX support. We can just extend to 512-bits and extract like we do for CTLZ.
llvm-svn: 267100
If the extracted bits are restricted to the upper half or lower half,
this can be truncated.
llvm-svn: 267024
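A plain C++ illustration of the equivalence being relied on, not the DAG combine itself: when the extracted bit field lies entirely within the low or the high 32-bit half of a 64-bit value, the extraction can be done with 32-bit operations on that half.

    #include <cassert>
    #include <cstdint>

    // Reference: extract Width bits of X starting at bit Shift, using 64-bit ops.
    static uint32_t extract64(uint64_t X, unsigned Shift, unsigned Width) {
      return static_cast<uint32_t>((X >> Shift) & ((1ull << Width) - 1));
    }

    // Valid when Shift + Width <= 32: the field never leaves the low half.
    static uint32_t extractLowHalf(uint64_t X, unsigned Shift, unsigned Width) {
      uint32_t Lo = static_cast<uint32_t>(X);
      return (Lo >> Shift) & ((1u << Width) - 1);
    }

    // Valid when Shift >= 32: the field never leaves the high half.
    static uint32_t extractHighHalf(uint64_t X, unsigned Shift, unsigned Width) {
      uint32_t Hi = static_cast<uint32_t>(X >> 32);
      return (Hi >> (Shift - 32)) & ((1u << Width) - 1);
    }

    int main() {
      uint64_t V = 0x123456789abcdef0ull;
      assert(extract64(V, 8, 16) == extractLowHalf(V, 8, 16));
      assert(extract64(V, 40, 16) == extractHighHalf(V, 40, 16));
      return 0;
    }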
llvm-svn: 266968
the runs. Update check patterns accordingly.
llvm-svn: 266967