| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 275110
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.
The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21551
llvm-svn: 275108
|
| |
|
|
|
|
|
|
|
| |
Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).
Differential Revision: http://reviews.llvm.org/D22064
llvm-svn: 275106
|
| |
|
|
|
|
|
|
| |
Blocks to be tail-merged may share more than one successor. Correct the
comment to state that they share a specific successor, SuccBB, rather
than a single successor, which is not true.
llvm-svn: 275104
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bug (llvm.org/PR28124) was introduced by r237977, which refactored
the tail call sequence to be generated in two passes instead of one.
Unfortunately, the stack adjustment produced by the first pass was not
recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing
code such as the following, which clobbers the return address, to be
generated:
popl %edi
popl %edi
pushl %eax
jmp tailcallee # TAILCALL
To fix the problem, the entire stack adjustment is performed in
X86ExpandPseudo::ExpandMI() for tail calls.
Patch by Magnus Lång <margnus1@gmail.com>
Differential Revision: http://reviews.llvm.org/D21325
llvm-svn: 275103
|
| |
|
|
| |
llvm-svn: 275101
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Extend TTI to access TLI.allowsMisalignedMemoryAccesses(). Check condition when vectorizing load and store chains.
Add additional parameters: AddressSpace, Alignment, Fast.
Reviewers: llvm-commits, jlebar
Subscribers: arsenm, mzolotukhin
Differential Revision: http://reviews.llvm.org/D21935
llvm-svn: 275100
|
| |
|
|
|
|
|
|
|
|
| |
It is an optimization pass, and should not run at -O0. Especially since Fast RA
will not do the required register coalescing anyway, so it's a loss even from
the optimization standpoint.
This also works around (but doesn't quite fix) PR28489.
llvm-svn: 275099
|
| |
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D21395
Patch by Vivek Pandya.
PR28144
llvm-svn: 275087
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add support for the z13 instructions LOCHI and LOCGHI which
conditionally load immediate values. Add target instruction info hooks so
that if conversion will allow predication of LHI/LGHI.
Author: RolandF
Reviewers: uweigand
Subscribers: zhanjunl
Commiting on behalf of Roland.
Differential Revision: http://reviews.llvm.org/D22117
llvm-svn: 275086
|
| |
|
|
|
|
| |
Thanks to Eli Friedman for pointing out in his post-commit review!
llvm-svn: 275084
|
| |
|
|
| |
llvm-svn: 275083
|
| |
|
|
|
|
|
|
| |
There's a little bit of churn in this patch because the initialization
mechanism is now shared between the old and the new PM. Other than
that, it's just a pretty mechanical translation.
llvm-svn: 275082
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: http://reviews.llvm.org/D22118 uses metadata to store the call count, which makes it possible to have branch weight to have only one elements. Also fix the assertion failure in inliner when checking the instruction type to include "invoke" instruction.
Reviewers: mkuper, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22228
llvm-svn: 275079
|
| |
|
|
| |
llvm-svn: 275077
|
| |
|
|
|
|
|
|
|
| |
In preparation for porting this pass to the new PM (which has no
doInitialization()).
Differential Revision: http://reviews.llvm.org/D22223
llvm-svn: 275074
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
For sample-based PGO, using BFI to calculate callsite count is sometime not accurate. This is because with sampling based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch.
E.g.
if (A1 && A2 && A3 && ..... && A10) {
for (i=0; i < 100000000; i++) {
callsite();
}
}
Assume that A1 to A100 are all 100% taken, and callsite has 1000 samples and thus is considerred hot. Because the loop's trip count is huge, it's normal that all branches outside the loop has no sample at all. As a result, we can only use static branch probability to derive the the frequency of the loop header. Assuming that static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value.
In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.
Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR.
Reviewers: davidxl, eraman, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22118
llvm-svn: 275073
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Handle the case when there is only one incoming/outgoing edge for a visited basic block: use the block weight to adjust edge weight even when the edge has been visited before. This can help reduce inaccuracies introduced by incorrect basic block profile, as shown in the updated unittest.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22180
llvm-svn: 275072
|
| |
|
|
| |
llvm-svn: 275069
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Thread through MCSubtargetInfo to relaxInstruction function allowing relaxation
to generate jumps with 16-bit sized immediates in 16-bit mode.
This fixes PR22097.
Reviewers: dwmw2, tstellarAMD, craig.topper, jyknight
Subscribers: jfb, arsenm, jyknight, llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D20830
llvm-svn: 275068
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Reviewers: hfinkel, majnemer, tstellarAMD, sunfish
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17279
llvm-svn: 275066
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This subtle change to getModRefInfo(Instruction, ImmutableCallSite) is to
ensure that the semantics are equal to that of getModRefInfo(CS1, CS2) when
the Instruction is a call-site.
This is now more in line with getModRefInfo generally: it returns Mod when
I modifies a memory location that is accessed (read or written) by CS and
Ref when I reads a memory location that is written by CS.
From a grep of the code, the only uses of this particular getModRefInfo
overload are in MemorySSA and MemCpyOptimizer, and they only care about
where the result is MR_NoModRef or not. Therefore, this change should have
no visible effect.
Separated out from D17279 upon request.
llvm-svn: 275065
|
| |
|
|
|
|
| |
At present the only shuffle with a variable mask we recognise is PSHUFB, which influences if its worth the cost of mask creation/loading of a combined target shuffle with a variable mask. This change sets up the infrastructure to support other shuffles in the future but has no effect yet.
llvm-svn: 275059
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Preserve assembly comments from input in output assembly and flags to
toggle property. This is on by default for inline assembly and off in
llvm-mc.
Parsed comments are emitted immediately before an EOL which generally
places them on the expected line.
Reviewers: rtrieu, dwmw2, rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20020
llvm-svn: 275058
|
| |
|
|
|
|
|
|
|
|
| |
Fixes issue mentioned at:
https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra/issues/13.
Lit tests added.
Differential Revision: http://reviews.llvm.org/D22133
llvm-svn: 275054
|
| |
|
|
|
|
|
|
| |
SWC2 instructions and add CodeGen support
Differential Revision: http://reviews.llvm.org/D18824
llvm-svn: 275050
|
| |
|
|
|
|
|
|
| |
DAG lowering was missing for the scalar FMINC, FMAXC nodes.
The nodes are generated only in the "unsafe-fp-math" mode.
Added tests.
llvm-svn: 275048
|
| |
|
|
|
|
| |
survive long enough to allow the matching.
llvm-svn: 275046
|
| |
|
|
|
|
| |
vectors.
llvm-svn: 275045
|
| |
|
|
|
|
|
|
| |
marked for CanFoldAsLoad.
I don't really know how to test this.
llvm-svn: 275044
|
| |
|
|
|
|
| |
Reverting r275027 and r275033. These seem to cause miscompiles on the AArch64 buildbot.
llvm-svn: 275042
|
| |
|
|
|
|
|
|
|
| |
For functions which are known to return a specific argument, pointer-comparison
folding can look through the function calls as part of its analysis.
Differential Revision: http://reviews.llvm.org/D9387
llvm-svn: 275039
|
| |
|
|
|
|
|
|
|
| |
For functions which are known to return their argument,
isDereferenceableAndAlignedPointer can examine the argument value.
Differential Revision: http://reviews.llvm.org/D9384
llvm-svn: 275038
|
| |
|
|
|
|
|
|
|
| |
When building SCEVs, if a function is known to return its argument, then we can
build the SCEV using the corresponding argument value.
Differential Revision: http://reviews.llvm.org/D9381
llvm-svn: 275037
|
| |
|
|
|
|
|
|
|
| |
If a function is known to return one of its arguments, we can use that in order
to compute known bits of the return value.
Differential Revision: http://reviews.llvm.org/D9397
llvm-svn: 275036
|
| |
|
|
|
|
|
|
|
|
|
| |
Motivated by the work on the llvm.noalias intrinsic, teach BasicAA to look
through returned-argument functions when answering queries. This is essential
so that we don't loose all other AA information when supplementing with
llvm.noalias.
Differential Revision: http://reviews.llvm.org/D9383
llvm-svn: 275035
|
| |
|
|
|
|
|
| |
Suggested post-commit by David Majnemer on IRC (following-up on a pre-commit
review comment).
llvm-svn: 275033
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to make the optimizer smarter about using the 'returned' argument
attribute (generally, but motivated by my llvm.noalias intrinsic work), add a
utility function to Call/InvokeInst, and CallSite, to make it easy to get the
returned call argument (when one exists).
P.S. There is already an unfortunate amount of code duplication between
CallInst and InvokeInst, and this adds to it. We should probably clean that up
separately.
Differential Revision: http://reviews.llvm.org/D22204
llvm-svn: 275031
|
| |
|
|
|
|
| |
Calls to matchVectorShuffleAsInsertPS only need to ensure the inputs are 128-bit vectors. Only lowerVectorShuffleAsInsertPS needs to ensure that they are v4f32.
llvm-svn: 275028
|
| |
|
|
|
|
|
|
|
|
| |
A function can have one argument with the 'returned' attribute, indicating that
the associated argument is always the return value of the function. Add
FuncAttrs inference logic.
Differential Revision: http://reviews.llvm.org/D22202
llvm-svn: 275027
|
| |
|
|
| |
llvm-svn: 275025
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D21622
llvm-svn: 275024
|
| |
|
|
| |
llvm-svn: 275022
|
| |
|
|
| |
llvm-svn: 275021
|
| |
|
|
| |
llvm-svn: 275017
|
| |
|
|
|
|
|
|
|
|
|
| |
This adds a new SystemZ-specific intrinsic, llvm.s390.tdc.f(32|64|128),
which maps straight to the test data class instructions. A new IR pass
is added to recognize instructions that can be converted to TDC and
perform the necessary replacements.
Differential Revision: http://reviews.llvm.org/D21949
llvm-svn: 275016
|
| |
|
|
| |
llvm-svn: 275015
|
| |
|
|
| |
llvm-svn: 275014
|
| |
|
|
| |
llvm-svn: 275011
|
| |
|
|
|
|
|
|
| |
Some abstractions in LLVM "know" that they are reading in-bounds,
FixedStreamArray, and provide a simple result. This breaks down if the
stream map is bogus.
llvm-svn: 275010
|