| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
| |
processInstructionForSlowLEA (NFCI)
The function isn't SLM specific (its driven by the FeatureSlowLEA flag).
Minor tidyup prior to PR38225.
llvm-svn: 345836
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is retrying the fold from rL345717
(reverted at rL347780)
...with a fix for the miscompile
demonstrated by PR39510:
https://bugs.llvm.org/show_bug.cgi?id=39510
Original commit message:
This is a fix for PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475
We managed to get some of these patterns using computeKnownBits in https://reviews.llvm.org/D47041, but that
can't be used for nabs(). Instead, put in some range-based logic, so we can fold
both abs/nabs with icmp with a constant value.
Alive proofs:
https://rise4fun.com/Alive/21r
Name: abs_nsw_is_positive
%cmp = icmp slt i32 %x, 0
%negx = sub nsw i32 0, %x
%abs = select i1 %cmp, i32 %negx, i32 %x
%r = icmp sgt i32 %abs, -1
=>
%r = i1 true
Name: abs_nsw_is_not_negative
%cmp = icmp slt i32 %x, 0
%negx = sub nsw i32 0, %x
%abs = select i1 %cmp, i32 %negx, i32 %x
%r = icmp slt i32 %abs, 0
=>
%r = i1 false
Name: nabs_is_negative_or_0
%cmp = icmp slt i32 %x, 0
%negx = sub i32 0, %x
%nabs = select i1 %cmp, i32 %x, i32 %negx
%r = icmp slt i32 %nabs, 1
=>
%r = i1 true
Name: nabs_is_not_over_0
%cmp = icmp slt i32 %x, 0
%negx = sub i32 0, %x
%nabs = select i1 %cmp, i32 %x, i32 %negx
%r = icmp sgt i32 %nabs, 0
=>
%r = i1 false
Differential Revision: https://reviews.llvm.org/D53844
llvm-svn: 345832
|
| |
|
|
|
|
|
|
|
|
| |
When matching MipsISD::JmpLink t9, TargetExternalSymbol:i32'...',
wrong JALR16_MM is selected. This patch adds missing pattern for
JmpLink, so that JAL instruction is selected.
Differential Revision: https://reviews.llvm.org/D53366
llvm-svn: 345830
|
| |
|
|
| |
llvm-svn: 345827
|
| |
|
|
|
|
|
|
|
|
| |
resolveTargetShuffleInputs (reapplied)
Reapplying an updated version of rL345395 (reverted in rL345451), now the issues noticed in PR39483 have been fixed.
This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs.
llvm-svn: 345824
|
| |
|
|
| |
llvm-svn: 345822
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In MipsBranchExpansion::splitMBB, upon splitting
a block with two direct branches, remove the successor
of the newly created block (which inherits successors from
the original block) which is pointed to by the last
branch in the original block only if the targets of two
branches differ.
This is to fix the failing test when ran with
-verify-machineinstrs enabled.
Differential Revision: https://reviews.llvm.org/D53756
llvm-svn: 345821
|
| |
|
|
| |
llvm-svn: 345820
|
| |
|
|
|
|
|
|
|
|
| |
Scalar i1 to fp conversions are done with a branch sequence, so it should
have a higher cost.
Review: Ulrich Weigand
https://reviews.llvm.org/D53924
llvm-svn: 345818
|
| |
|
|
|
|
|
|
|
|
|
| |
This factors out a new method getBoolVecToIntConversionCost() containing the
code for vector sext/zext of i1, in order to reuse it for i1 to double vector
conversions.
Review: Ulrich Weigand
https://reviews.llvm.org/D53923
llvm-svn: 345817
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When rewriting loop exit values, IndVars considers this transform not profitable if
the loop instruction has a loop user which it believes cannot be optimized away.
In current implementation only calls that immediately use the instruction are considered
as such.
This patch extends the definition of "hard" users to any side-effecting instructions
(which usually cannot be optimized away from the loop) and also allows handling
of not just immediate users, but use chains.
Differentlai Revision: https://reviews.llvm.org/D51584
Reviewed By: etherzhhb
llvm-svn: 345814
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When we calculate a product of 2 AddRecs, we end up making quite massive
computations to deduce the operands of resulting AddRec. This process can
be optimized by computing all args of intermediate sum and then calling
`getAddExpr` once rather than calling `getAddExpr` with intermediate
result every time a new argument is computed.
Differential Revision: https://reviews.llvm.org/D53189
Reviewed By: rtereshin
llvm-svn: 345813
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TypeIndex used by cl.exe is 0x103, which indicates a SimpleTypeMode
of NearPointer (note the absence of the bitness, normally pointers use a
mode of NearPointer32 or NearPointer64) and a SimpleTypeKind of void.
So this is basically a void*, but without a specified size, which makes
sense given how std::nullptr_t is defined.
clang-cl was actually not emitting *anything* for this. Instead, when we
encountered std::nullptr_t in a DIType, we would actually just emit a
TypeIndex of 0, which is obviously wrong.
std::nullptr_t in DWARF is represented as a DW_TAG_unspecified_type with
a name of "decltype(nullptr)", so we add that logic along with a test,
as well as an update to the dumping code so that we no longer print
void* when dumping 0x103 (which would previously treat Void/NearPointer
no differently than Void/NearPointer64).
Differential Revision: https://reviews.llvm.org/D53957
llvm-svn: 345811
|
| |
|
|
|
|
|
|
|
|
| |
From the gcc manual, we can see that the specific limit of wi inline asm is “FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS”. The link is https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Machine-Constraints.html#Machine-Constraints. We should accept this constraint.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D53265
llvm-svn: 345810
|
| |
|
|
|
|
|
| |
This avoids declaring them twice: in X86TargetMachine.cpp and the file
implementing the pass.
llvm-svn: 345801
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change cuts across compiler-rt and llvm, to increment the FDR log
version number to 4, and include the CPU ID in the custom event records.
This is a step towards allowing us to change the `llvm::xray::Trace`
object to start representing both custom and typed events in the stream
of records. Follow-on changes will allow us to change the kinds of
records we're presenting in the stream of traces, to incorporate the
data in custom/typed events.
A follow-on change will handle the typed event case, where it may not
fit within the 15-byte buffer for metadata records.
This work is part of the larger effort to enable writing analysis and
processing tools using a common in-memory representation of the events
found in traces. The work will focus on porting existing tools in LLVM
to use the common representation and informing the design of a
library/framework for expressing trace event analysis as C++ programs.
Reviewers: mboerger, eizan
Subscribers: hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53920
llvm-svn: 345798
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53630
llvm-svn: 345797
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53886
llvm-svn: 345795
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Also reduce the test case for implicit defs and test it with all
register classes.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53855
llvm-svn: 345794
|
| |
|
|
|
|
|
|
|
|
|
| |
The "regular" file system has a useful feature that makes it possible to
stop recursing when using the recursive directory iterators. This
functionality was missing for the VFS recursive iterator and this patch
adds that.
Differential revision: https://reviews.llvm.org/D53465
llvm-svn: 345793
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform.
Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma
Reviewed By: efriedma
Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D53673
llvm-svn: 345791
|
| |
|
|
|
|
|
|
|
| |
I think this is the actual important property; the previous visibility
check was an approximation.
Differential Revision: https://reviews.llvm.org/D53852
llvm-svn: 345790
|
| |
|
|
| |
llvm-svn: 345786
|
| |
|
|
|
|
| |
Google is reporting regressions on some benchmarks.
llvm-svn: 345785
|
| |
|
|
|
|
|
|
|
| |
Shows up rarely for 64-bit arithmetic, more frequently for the compare
patterns added in r325323.
Differential Revision: https://reviews.llvm.org/D53848
llvm-svn: 345782
|
| |
|
|
|
|
|
| |
This can miscompile as shown in PR39510:
https://bugs.llvm.org/show_bug.cgi?id=39510
llvm-svn: 345780
|
| |
|
|
|
|
|
|
|
|
|
|
| |
SimplifySetCC could shrink a load without checking for
profitability or legality of such shink with a target.
Added checks to prevent shrinking of aligned scalar loads
in AMDGPU below dword as scalar engine does not support it.
Differential Revision: https://reviews.llvm.org/D53846
llvm-svn: 345778
|
| |
|
|
|
|
|
|
| |
Minor refactor of DWARFUnit::getStringOffsetSectionItem().
Differential Revision: https://reviews.llvm.org/D53948
llvm-svn: 345776
|
| |
|
|
|
|
|
|
|
| |
lowerRangeToAssertZExt currently relies on something like EarlyCSE having
eliminated the constant range [0,1). At -O0 this leads to an assert.
Differential Revision: https://reviews.llvm.org/D53888
llvm-svn: 345770
|
| |
|
|
|
|
|
|
|
|
|
| |
This feature is only relevant to shaders, and is no longer used. When disabled,
lowering of reserved registers for shaders causes a compiler crash.
Remove the feature and add a test for compilation of shaders at OptNone.
Differential Revision: https://reviews.llvm.org/D53829
llvm-svn: 345763
|
| |
|
|
|
|
| |
builds. NFC
llvm-svn: 345761
|
| |
|
|
| |
llvm-svn: 345758
|
| |
|
|
|
|
| |
We should be using the getShiftAmountTy value type for shift amounts.
llvm-svn: 345756
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, spatel
Reviewed By: spatel
Subscribers: lebedev.ri, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D53774
llvm-svn: 345751
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Also fix a couple bugs where DILocations are lost. EntryBuilder wasn't passing
on debug locations for PHI's, constants, GLOBAL_VALUE, etc.
Reviewers: aprantl, vsk, bogner, aditya_nandakumar, volkan, rtereshin, aemerson
Reviewed By: aemerson
Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D53740
llvm-svn: 345743
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch DbgInfoAvailable was set to true in
DwarfDebug::beginModule() or CodeViewDebug::CodeViewDebug(). This made
MIR testing weird since passes would suddenly stop dealing with debug
info just because we stopped the pipeline before the debug printers.
This patch changes the logic to initialize DbgInfoAvailable based on the
fact that debug_compile_units exist in the llvm Module. The debug
printers may then override it with false in case of debug printing being
disabled.
Differential Revision: https://reviews.llvm.org/D53885
llvm-svn: 345740
|
| |
|
|
|
|
|
| |
Also, remove/replace/minimize/enhance the tests for this fold.
The code drops FMF, so it needs more tests and at least 1 fix.
llvm-svn: 345734
|
| |
|
|
|
|
|
| |
Make sure that -relocation-model=pic prevents use of GP-relative
addressing modes.
llvm-svn: 345731
|
| |
|
|
|
|
| |
This is the inverted case for the transform added with D53874 / rL345725.
llvm-svn: 345728
|
| |
|
|
|
|
|
|
| |
The 'OLT' case was updated at rL266175, so I assume it was just an
oversight that 'UGE' was not included because that patch handled
both predicates in InstSimplify.
llvm-svn: 345727
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
...but given the current implementation (no FMF on casts), this is likely the only way to predicate the
transform.
This is part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475
Differential Revision: https://reviews.llvm.org/D53874
llvm-svn: 345725
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike its legacy counterpart new pass manager's LoopUnrollPass does
not provide any means to select which flavors of unroll to run
(runtime, peeling, partial), relying on global defaults.
In some cases having ability to run a restricted LoopUnroll that
does more than LoopFullUnroll is needed.
Introduced LoopUnrollOptions to select optional unroll behaviors.
Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing.
Reviewers: chandlerc, tejohnson
Differential Revision: https://reviews.llvm.org/D53440
llvm-svn: 345723
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover
Reviewed By: RKSimon
Subscribers: craig.topper, llvm-commits
Differential Revision: https://reviews.llvm.org/D52504
llvm-svn: 345721
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Instead of writing boolean values temporarily into 32-bit VGPRs
if they are involved in PHIs or are observed from outside a loop,
we use bitwise masking operations to combine lane masks in a way
that is consistent with wave control flow.
Move SIFixSGPRCopies to before this pass, since that pass
incorrectly attempts to move SGPR phis to VGPRs.
This should recover most of the code quality that was lost with
the bug fix in "AMDGPU: Remove PHI loop condition optimization".
There are still some relevant cases where code quality could be
improved, in particular:
- We often introduce redundant masks with EXEC. Ideally, we'd
have a generic computeKnownBits-like analysis to determine
whether masks are already masked by EXEC, so we can avoid this
masking both here and when lowering uniform control flow.
- The criterion we use to determine whether a def is observed
from outside a loop is conservative: it doesn't check whether
(loop) branch conditions are uniform.
Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53496
llvm-svn: 345719
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The optimization to early break out of loops if all threads are dead was
never fully implemented.
But the PHI node analyzing is actually causing a number of problems, so
remove all the extra code for it.
(This does actually regress code quality in a few places because it
ends up relying more heavily on phi's of i1, which we don't do a
great job with. However, since it fixes real bugs in the wild, we
should take this change. I have some prototype changes to improve
i1 lowering in general -- not just for control flow -- which should
help recover the code quality, I just need to make those changes
fit for general consumption. -- Nicolai)
Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1
Patch-by: Christian König <christian.koenig@amd.com>
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53359
llvm-svn: 345718
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a fix for PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475
We managed to get some of these patterns using computeKnownBits in D47041, but that
can't be used for nabs(). Instead, put in some range-based logic, so we can fold
both abs/nabs with icmp with a constant value.
Alive proofs:
https://rise4fun.com/Alive/21r
Name: abs_nsw_is_positive
%cmp = icmp slt i32 %x, 0
%negx = sub nsw i32 0, %x
%abs = select i1 %cmp, i32 %negx, i32 %x
%r = icmp sgt i32 %abs, -1
=>
%r = i1 true
Name: abs_nsw_is_not_negative
%cmp = icmp slt i32 %x, 0
%negx = sub nsw i32 0, %x
%abs = select i1 %cmp, i32 %negx, i32 %x
%r = icmp slt i32 %abs, 0
=>
%r = i1 false
Name: nabs_is_negative_or_0
%cmp = icmp slt i32 %x, 0
%negx = sub i32 0, %x
%nabs = select i1 %cmp, i32 %x, i32 %negx
%r = icmp slt i32 %nabs, 1
=>
%r = i1 true
Name: nabs_is_not_over_0
%cmp = icmp slt i32 %x, 0
%negx = sub i32 0, %x
%nabs = select i1 %cmp, i32 %x, i32 %negx
%r = icmp sgt i32 %nabs, 0
=>
%r = i1 false
Differential Revision: https://reviews.llvm.org/D53844
llvm-svn: 345717
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
constraints on instruction operands.
Before this patch, class PredicateExpander only knew how to expand simple
predicates that performed checks on instruction operands.
In particular, the new scheduling predicate syntax was not rich enough to
express checks like this one:
Foo(MI->getOperand(0).getImm()) == ExpectedVal;
Here, the immediate operand value at index zero is passed in input to function
Foo, and ExpectedVal is compared against the value returned by function Foo.
While this predicate pattern doesn't show up in any X86 model, it shows up in
other upstream targets. So, being able to support those predicates is
fundamental if we want to be able to modernize all the scheduling models
upstream.
With this patch, we allow users to specify if a register/immediate operand value
needs to be passed in input to a function as part of the predicate check. Now,
register/immediate operand checks all derive from base class CheckOperandBase.
This patch also changes where TIIPredicate definitions are expanded by the
instructon info emitter. Before, definitions were expanded in class
XXXGenInstrInfo (where XXX is a target name).
With the introduction of this new syntax, we may want to have TIIPredicates
expanded directly in XXXInstrInfo. That is because functions used by the new
operand predicates may only exist in the derived class (i.e. XXXInstrInfo).
This patch is a non functional change for the existing scheduling models.
In future, we will be able to use this richer syntax to better describe complex
scheduling predicates, and expose them to llvm-mca.
Differential Revision: https://reviews.llvm.org/D53880
llvm-svn: 345714
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Our a16 support was only enabled for sample/gather and buffer
load/store, but not for image load/store operations (which take an i16
as the pixel index rather than a half).
Fix our isel lowering and add test cases to prove it out.
Differential Revision: https://reviews.llvm.org/D53750
llvm-svn: 345710
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For some unclear reason rewriteLoopExitValues considers recalculation
after the loop profitable if it has some "soft uses" outside the loop (i.e. any
use other than call and return), even if we have proved that it has a user inside
the loop which we think will not be optimized away.
There is no existing unit test that would explain this. This patch provides an
example when rematerialisation of exit value is not profitable but it passes
this check due to presence of a "soft use" outside the loop.
It makes no sense to recalculate value on exit if we are going to compute it
due to some irremovable within the loop. This patch disallows applying this
transform in the described situation.
Differential Revision: https://reviews.llvm.org/D51581
Reviewed By: etherzhhb
llvm-svn: 345708
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
llvm-svn: 345705
|