| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: With MSVC 2013 and GCC < 4.8 gone, we can use the "constexpr" keyword.
Reviewers: bkramer, mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25901
llvm-svn: 284947
|
|
|
|
|
|
| |
Also, use APInt to avoid crashing on types larger than vNi64.
llvm-svn: 284874
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
0 - X --> 0, if the sub is NUW
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
0 - X --> X, if X is 0 or the minimum signed value
This is the DAG equivalent of:
https://reviews.llvm.org/rL284649
plus the fold for the NUW case which already existed in InstSimplify.
Note that we miss a vector fold because of a deficiency in the DAG version of
computeKnownBits().
llvm-svn: 284844
|
|
|
|
|
|
|
|
| |
Because we're just 'or-ing' these 2 variables later in the code, I
don't think there's a logical bug here, but of course the string with
"no size" is the one that should have the size suffix stripped off.
llvm-svn: 284826
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed in D24815, let's start the process of killing off the broken fast-math global
state housed in TargetOptions and eliminate the need for function-level fast-math attributes.
Here we enable two similar folds that are possible when we don't care about signed-zero:
fadd nsz x, 0 --> x
fsub nsz 0, x --> -x
Note that although the test cases include a 'sin' function call, I'm side-stepping the
FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these
tests - isNegatibleForFree/GetNegatedExpression just look through a ISD::FSIN node.
Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't
actually do anything today because Flags are silently dropped for any node that is not a
binary operator.
Differential Revision: https://reviews.llvm.org/D25297
llvm-svn: 284824
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass.
The performance impact on speccpu2006 on Intel sandybridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284757
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
While promoting *_EXTEND_VECTOR_INREG nodes whose inputs are already
promoted, perform the appropriate sign extension for the promoted node
before doing the *_EXTEND_VECTOR_INREG operation. If not, the undefined
high-order bits of the promoted operand may (a) be garbage inc ase of
zext) or (b) contribute the wrong sign-bit (in case of sext)
Updated the promote-vec3.ll test after this change. The diff shows
explicit zeroing in case of zext and intermediate sign extension in case
of sext.
Reviewers: RKSimon
Subscribers: llvm-commits, srhines
Differential Revision: https://reviews.llvm.org/D25790
llvm-svn: 284752
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.
This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.
original commit msg:
[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284746
|
|
|
|
|
|
| |
No functionality change intended.
llvm-svn: 284733
|
|
|
|
|
|
|
|
| |
(and x, cst2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector
llvm-svn: 284717
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Add alignment attribute to DIVariable family
- Modify bitcode format to match new DIVariable representation
- Update tests to match these changes (also add bitcode upgrade test)
- Expect that frontend passes non-zero align value only when it is not default
(was forcibly aligned by alignas()/_Alignas()/__atribute__(aligned())
Differential Revision: https://reviews.llvm.org/D25073
llvm-svn: 284678
|
|
|
|
|
|
|
|
|
| |
This code crashed on funclet-style EH instructions such as catchpad,
catchswitch, and cleanuppad. Just treat all EH pad instructions
equivalently and avoid merging the globals they reference through any
use.
llvm-svn: 284633
|
|
|
|
| |
llvm-svn: 284616
|
|
|
|
|
|
|
|
| |
-> (add (shl x, c2), c1 << c2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector
llvm-svn: 284613
|
|
|
|
|
|
| |
This code used a regular when it should have used a multimap.
llvm-svn: 284612
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG.
With the mask, this should be no worse than 2 shifts. The mask can be eliminated
in some cases, so that should be better than 2 shifts.
This change exposed some missing folds related to negation:
https://reviews.llvm.org/rL284239
https://reviews.llvm.org/rL284395
There may be others, so please let me know if you see any regressions.
Differential Revision: https://reviews.llvm.org/D25485
llvm-svn: 284611
|
|
|
|
|
|
|
|
| |
-> (and x, (shl -1, c1))
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector
llvm-svn: 284608
|
|
|
|
|
|
|
|
| |
-> (mul x, c1 << c2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector
llvm-svn: 284607
|
|
|
|
|
|
| |
Committed by mistake.
llvm-svn: 284606
|
|
|
|
| |
llvm-svn: 284604
|
|
|
|
| |
llvm-svn: 284603
|
|
|
|
|
|
| |
This will get the same ConstantSDNode scalar or vector splat value as the current separate dyn_cast<ConstantSDNode> / isVector() approach.
llvm-svn: 284578
|
|
|
|
|
|
| |
non-opaque constant or constant vector
llvm-svn: 284574
|
|
|
|
|
|
|
|
| |
MBPI exposed by th patch.
Also update the section.ll to fix non-x86 failure.
llvm-svn: 284563
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass.
The performance impact on speccpu2006 on Intel sandybridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284545
|
|
|
|
| |
llvm-svn: 284544
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass.
The performance impact on speccpu2006 on Intel sandybridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284541
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The original implementation is in r261607, which was reverted in r269726 to accomendate the ProfileSummaryInfo analysis pass. The new implementation:
1. add a new metadata for function section prefix
2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function
3. output the section prefix set by CGP
Reviewers: davidxl, eraman
Subscribers: vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D24989
llvm-svn: 284533
|
|
|
|
| |
llvm-svn: 284527
|
|
|
|
| |
llvm-svn: 284525
|
|
|
|
|
|
|
| |
This is a threading hazard and rightfully complained about by tsan. No
functionality change.
llvm-svn: 284515
|
|
|
|
|
|
| |
There's something wrong with the StringRef usage while parsing the attribute string.
llvm-svn: 284513
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284495
|
|
|
|
|
|
| |
>>u (log2(pow2)+y)
llvm-svn: 284491
|
|
|
|
|
|
|
|
|
| |
In futher patches we shall have alignment field added to DIVariable family
and switching from uint64_t to uint32_t will save 4 bytes per variable.
Differential Revision: https://reviews.llvm.org/D25620
llvm-svn: 284482
|
|
|
|
| |
llvm-svn: 284478
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds simplified support for tail calls on ARM with XRay instrumentation.
Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program
#include <cstdio>
#include <cassert>
#include <xray/xray_interface.h>
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
std::printf("In fC()\n");
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
std::printf("In fB()\n");
fC();
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
std::printf("In fA()\n");
fB();
}
// Avoid infinite recursion in case the logging function is instrumented (so calls logging
// function again).
[[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
{
printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
}
int main(int argc, char* argv[]) {
__xray_set_handler(simplyPrint);
printf("Patching...\n");
__xray_patch();
fA();
printf("Unpatching...\n");
__xray_unpatch();
fA();
return 0;
}
gives the following output:
Patching...
XRay: functionId=3 type=0.
In fA()
XRay: functionId=3 type=1.
XRay: functionId=2 type=0.
In fB()
XRay: functionId=2 type=1.
XRay: functionId=1 type=0.
XRay: functionId=1 type=1.
In fC()
Unpatching...
In fA()
In fB()
In fC()
So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().
Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).
Differential Revision: https://reviews.llvm.org/D25030
llvm-svn: 284456
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There are differences in codegen between Linux and Windows due to:
1. Using std::sort which uses quicksort which is a non-stable sort.
2. Iterating over Set data structure where the iteration order is
non deterministic.
Reviewers: arsenm, grosbach, junbuml, zinob, MatzeB
Subscribers: MatzeB, wdng
Differential Revision: https://reviews.llvm.org/D25695
llvm-svn: 284441
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scalar version of this pattern was noted in:
https://reviews.llvm.org/D25485
and fixed with:
https://reviews.llvm.org/rL284395
More refactoring of the constant/splat helpers is needed and will happen in follow-up patches.
Differential Revision: https://reviews.llvm.org/D25685
llvm-svn: 284424
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in:
https://reviews.llvm.org/D25685
This is the next-to-smallest step needed to enable the ComputeNumSignBits fix in that patch.
In a minor attempt to keep some structure, we're pulling the FP helper over along with its
integer sibling, but clearly we can and should do more refactoring of the similar helper
functions in DAGCombiner and SelectionDAG to simplify and not duplicate functionality.
llvm-svn: 284421
|
|
|
|
| |
llvm-svn: 284411
|
|
|
|
|
|
|
|
|
| |
This came up as part of:
https://reviews.llvm.org/D25485
Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors.
llvm-svn: 284395
|
|
|
|
|
|
|
|
| |
SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128.
In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding.
llvm-svn: 284381
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
retain the original debug location.
CodeGenPrepare knows how to move a zext of a load into the same basic block
where the load lives. The goal is to help ISel match a zero-extending load
instead of two separated instructions.
CGP attempts to move a zext computation even if it lives in a basic block that
does not post-dominate the load's basic block. That means, the hoisted zext may
be speculated. Preserving the zext location would hurt the debugging experience
and the quality of sample pgo.
With this patch, when moving a zext near to its associated load, CGP no longer
propagates the zext's debug location. Instead, CGP conservatively reuses the
same debug location for the load and the zext.
An alternative approach would be to assign an artificial line-0 location to the
zext. However we don't want to over-use the 'line-0' for this particular case
because it would have a size cost in the line-table section for no additional
benefit.
Differential Revision: https://reviews.llvm.org/D25611
llvm-svn: 284377
|
|
|
|
|
|
|
|
| |
SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.
Differential Revision: https://reviews.llvm.org/D24577
llvm-svn: 284312
|
|
|
|
|
|
|
|
|
|
| |
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine") prefix. This should make everything sensible again.
The only functional change is the name of a couple of command-line options.
llvm-svn: 284287
|
|
|
|
|
|
|
|
| |
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.
llvm-svn: 284268
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses. SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:
and v1, v0, 0xffffff
mul24 v2, v1, v1 ; Multiply ignoring high 8-bits.
To:
mul24 v2, v0, v0
Where before this would not be optimized, because v1 has multiple uses.
Reviewers: bogner, arsenm
Subscribers: nhaehnle, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24964
llvm-svn: 284266
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
|
|
|
|
|
|
|
|
| |
Patch by Michael LeMay
Differential revision: http://reviews.llvm.org/D19852
llvm-svn: 284254
|