| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes the createGenericSchedLive() function that constructs the
default scheduler available for the public API. This should help when
you want to get a scheduler and the default list of DAG mutations.
This also shrinks the list of default DAG mutations:
{Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
added by default. Targets can easily add them if they need them. It also
makes it easier for targets to add alternative/custom macrofusion or
clustering mutations while staying with the default
createGenericSchedLive(). It also saves the callback back and forth in
TargetInstrInfo::enableClusterLoads()/enableClusterStores().
Differential Revision: https://reviews.llvm.org/D26986
llvm-svn: 288057
|
|
|
|
|
|
|
|
| |
Patch by Victor Campos
Differential Revision: https://reviews.llvm.org/D27172
llvm-svn: 288056
|
|
|
|
|
|
| |
-cgp-freq-ratio-to-skip-merge option was removed by rollback in r288052.
llvm-svn: 288055
|
|
|
|
|
|
|
|
|
|
| |
cleaned; poor indents and one typo fixed.
Patch by Victor Campos.
Differential Revision: https://reviews.llvm.org/D26786
llvm-svn: 288054
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
copies
Codegen prepare sinks comparisons close to a user is we have only one register
for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.
With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue a propagation of source SGPR pair in place of v_cmp
is implemented. Additional side effect of this is that we may consume less VGPRs
at a cost of more SGPRs in case if holding of multiple conditions is needed, and
that is a clear win in most cases.
Differential Revision: https://reviews.llvm.org/D26114
llvm-svn: 288053
|
|
|
|
|
|
|
| |
It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line
670 ("Expected irreducible CFG").
llvm-svn: 288052
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D27000
llvm-svn: 288051
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As far as I can tell, doing our own computations in
NearestCommonDominator is a false optimization -- DomTree will build up
what appears to be exactly this data when it decides it's worthwhile.
Moreover, by building the cache ourselves, we cannot take advantage of
the cache that the domtree might have available.
In addition, I am not convinced of the correctness of the original code.
In particular, setting ResultIndex = 1 on the first addBlock instead of
setting it to 0 is quite fishy. Similarly, it's not clear to me that
setting IndexMap[Node] = 0 for every node as we walk up the tree finding
a common parent is correct. But rather than ponder over these
questions, I'd rather just make the code do the obviously-correct thing.
This patch also changes the NearestCommonDominator API a bit, improving
the names and getting rid of the boolean parameter in addBlock -- see
http://jlebar.com/2011/12/16/Boolean_parameters_to_API_functions_considered_harmful..html
Reviewers: arsenm
Subscribers: aemerson, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D26998
llvm-svn: 288050
|
|
|
|
| |
llvm-svn: 288049
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This requires some changes to the opt-diag API. Hal and I have
discussed this at the Dev Meeting and came up with a streaming delimiter
(setExtraArgs) to solve this.
Arguments after this delimiter are only included in the optimization
records and not in the remarks printed in the compiler output. (Note,
how in the test the content of the YAML file changes but the remarks on
the compiler output don't.)
This implements the green GVN message with a bug fix at line
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446
The fix is that now we properly include the constant value in the
message: "load of type i32 eliminated in favor of 7"
Differential Revision: https://reviews.llvm.org/D26489
llvm-svn: 288047
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Follow-on patches will add more interesting cases.
The goal of this patch-set is to get the GVN messages printed in
opt-viewer from Dhrystone as was presented in my Dev Meeting talk. This
is the optimization view for the function (the last remark in the
function has a bug which is fixed in this series):
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430
Differential Revision: https://reviews.llvm.org/D26488
llvm-svn: 288046
|
|
|
|
| |
llvm-svn: 288045
|
|
|
|
| |
llvm-svn: 288044
|
|
|
|
|
|
| |
Just let the existing typo correction machinery handle that.
llvm-svn: 288043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At the moment optimized tablegen is generated by LLVM_USE_HOST_TOOLS variable that is not set for Visual Sudio since LLVM_ENABLE_ASSERTIONS depends on CMAKE_BUILD_TYPE value that is not equal to "DEBUG" in case of Visual Studio soltion generation.
Modified to do not depend on LLVM_ENABLE_ASSERTIONS value in VS and Xcode cases
Reviewers: beanz
Subscribers: RKSimon, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D27135
llvm-svn: 288042
|
|
|
|
|
|
| |
This addresses the comment D26832.
llvm-svn: 288041
|
|
|
|
|
|
|
|
| |
Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining.
Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations.
llvm-svn: 288040
|
|
|
|
|
|
|
|
|
| |
If member expression is used in the task region and the base expression
is a DeclRefExp and the variable used in this ref expression is private,
it should be marked as implicitly firstprivate inside this region. Patch
fixes this issue.
llvm-svn: 288039
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
On for 64-bit targets, the correct register set to read the fxsave are is
NT_PRFPREG (only 32-bit targets need NT_PRXFPREG, presumably for historic
reasons). Reference:
<https://github.com/torvalds/linux/blob/v4.8/arch/x86/kernel/ptrace.c#L1261>.
Reviewers: tberghammer, valentinagiusti
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D27161
llvm-svn: 288038
|
|
|
|
| |
llvm-svn: 288037
|
|
|
|
| |
llvm-svn: 288036
|
|
|
|
|
|
| |
This reverts commit r287773 which caused issues with ppc64le builds.
llvm-svn: 288035
|
|
|
|
|
|
| |
msc Debug build detected it.
llvm-svn: 288034
|
|
|
|
|
|
|
| |
Remove unused variable that came in due to a copy-and-paste bug
and caused build bot failures.
llvm-svn: 288033
|
|
|
|
| |
llvm-svn: 288032
|
|
|
|
|
|
|
|
|
| |
This adds assembler support for the instructions provided by the
execution-hint facility (NIAI and BP(R)P). This required adding
support for the new relocation types for 12-bit and 24-bit PC-
relative offsets used by the BP(R)P instructions.
llvm-svn: 288031
|
|
|
|
|
|
|
| |
This adds support for the instructions provided with the
load-and-trap facility.
llvm-svn: 288030
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds assembler support for the remaining branch instructions:
the non-relative branch on count variants, and all variants of branch
on index.
The only one of those that can be readily exploited for code generation
is BRCTH (branch on count using a high 32-bit register as count). Do
use it, however, it is necessary to also introduce a hew CHIMux pseudo
to allow comparisons of a 32-bit value agains a short immediate to go
into a high register as well (implemented via CHI/CIH).
This causes a bit of codegen changes overall, but those have proven to
be neutral (or even beneficial) in performance measurements.
llvm-svn: 288029
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch moves formation of LOC-type instructions from (late)
IfConversion to the early if-conversion pass, and in some cases
additionally creates them directly from select instructions
during DAG instruction selection.
To make early if-conversion work, the patch implements the
canInsertSelect / insertSelect callbacks. It also implements
the commuteInstructionImpl and FoldImmediate callbacks to
enable generation of the full range of LOC instructions.
Finally, the patch adds support for all instructions of the
load-store-on-condition-2 facility, which allows using LOC
instructions also for high registers.
Due to the use of the GRX32 register class to enable high registers,
we now also have to handle the cases where there are still no single
hardware instructions (conditional move from a low register to a high
register or vice versa). These are converted back to a branch sequence
after register allocation. Since the expandRAPseudos callback is not
allowed to create new basic blocks, this requires a simple new pass,
modelled after the ARM/AArch64 ExpandPseudos pass.
Overall, this patch causes significantly more LOC-type instructions
to be used, and results in a measurable performance improvement.
llvm-svn: 288028
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation of the decorator does not skip if the android target
arch is the same as host arch (as in both cases the platform comes out as linux).
Nonetheless android x86_64 binaries are not compatible with linux ones.
Technically this should be "skip if target is android and host is *not* android",
but currently nobody runs lldb test suite on an android host, so we don't even
have a way of specifying that the host is android.
llvm-svn: 288027
|
|
|
|
|
|
|
| |
We are getting a null pointer for the list of categories here (presumably due to
the args refactor).
llvm-svn: 288026
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
clang-tidy checks frequently use source ranges of functions.
The source range of constructors and destructors in template instantiations
is currently a single token.
The factory method for constructors and destructors does not allow the
end source location to be specified.
Set end location manually after creating instantiation.
Reviewers: aaron.ballman, rsmith, arphaman
Subscribers: arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D26849
llvm-svn: 288025
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In r286814, the algorithm for calculating inline costs changed. This
caused more inlining to take place which is especially apparent
in optsize and minsize modes.
As the cost calculation removed a skewed behaviour (we were inconsistent
about the cost of calls) it isn't possible to update the thresholds to
get exactly the same behaviour as before. However, this threshold change
accounts for the very common case where an inline candidate has no
calls within it. In this case, r286814 would inline around 5-6 more (IR)
instructions.
The changes to -Oz have been heavily benchmarked. The "obvious" value
for the inline threshold at -Oz is zero, but due to inaccuracies in the
inline heuristics this can actually cause code size increases due to
not inlining key thunk functions (that then disappear). Experimentally,
5 was the sweet spot for code size over the test-suite.
For -Os, this change removes the outlier results shown up by green dragon
(http://104.154.54.203/db_default/v4/nts/13248).
Fixes D26848.
llvm-svn: 288024
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This never made a lot of sense. They've been invalidated for one IR unit
but they aren't really preserved in any normal sense. It seemed like it
would be an elegant way of communicating to outer IR units that pass
managers and adaptors had already handled invalidation, but we've since
ended up adding sets that model this more clearly: we're now using
the 'AllAnalysesOn<IRUnitT>' set to handle cases where the trick of
"preserving" invalidated analyses didn't work.
This patch moves to rely on that technique exclusively and removes the
cumbersome API aspect of updating the preserved set when doing
invalidation. This in turn will simplify a *number* of upcoming patches.
This has a side benefit of exposing a number of places where we were
failing to mark the 'AllAnalysesOn<IRUnitT>' set as preserved. This
patch fixes those, and with those fixes shouldn't change any observable
behavior.
llvm-svn: 288023
|
|
|
|
|
|
|
|
|
| |
That unifies handling cases when we have SECTIONS and when
-no-rosegment is given in compareSectionsNonScript()
Now Config->SingleRoRx is used for check, testcase is provided.
llvm-svn: 288022
|
|
|
|
|
|
|
|
|
|
|
| |
Previously Config->SingleRoRx was set in
createFiles() and used HasSections.
This change moves it to readConfigs at place of
common flags handling, and adds logic that sets
this flag separatelly from ScriptParser if SECTIONS present.
llvm-svn: 288021
|
|
|
|
|
|
|
|
| |
--no-rosegment: Do not put read-only non-executable sections in their own segment
Differential revision: https://reviews.llvm.org/D26889
llvm-svn: 288020
|
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D27108
llvm-svn: 288019
|
|
|
|
| |
llvm-svn: 288018
|
|
|
|
| |
llvm-svn: 288017
|
|
|
|
|
|
|
| |
The callers don't use the return value. Found by Michael
Spencer.
llvm-svn: 288016
|
|
|
|
|
|
| |
This reverts commit r288014, the unittest isn't passing
llvm-svn: 288015
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some scanner errors were not checked and reported by the parser.
Fix PR30934
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D26419
llvm-svn: 288014
|
|
|
|
|
|
| |
No functionality changed.
llvm-svn: 288013
|
|
|
|
|
|
|
|
|
| |
Unfortunatelly PT_ARM_EXIDX is special. There is no way to create it
from linker scripts, so we have to create it even if PHDRS is used.
This matches bfd and is required for the lld output to survive bfd's strip.
llvm-svn: 288012
|
|
|
|
|
|
| |
commutable as operand 0 should pass its upper bits through to the output.
llvm-svn: 288011
|
|
|
|
|
|
| |
patterns.
llvm-svn: 288010
|
|
|
|
| |
llvm-svn: 288009
|
|
|
|
|
|
| |
FMA4 scalar intrinsics.
llvm-svn: 288008
|
|
|
|
|
|
| |
I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and just represent the relationships.
llvm-svn: 288007
|