| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
detect more NoNaNs
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SUMMARY:
CMVC has a compiler error on the
const uint64_t OffsetToRaw = is64Bit()
? toSection64(Sec)->FileOffsetToRawData
: toSection32(Sec)->FileOffsetToRawData;
while gcc compiler do not have the problem.
I have to change the code to
uint64_t OffsetToRaw;
if (is64Bit())
OffsetToRaw = toSection64(Sec)->FileOffsetToRawData;
else
OffsetToRaw = toSection32(Sec)->FileOffsetToRawData;
Reviewers: Sean Fertile
Subscribers: rupprecht, seiyai,hiraditya
Differential Revision: https://reviews.llvm.org/D70255
|
| |
|
|
|
|
|
|
| |
A call site parameter description of a memory operand needs to
unambiguously convey the size of the operand to prevent incorrect entry
value evaluation.
Thanks for David Stenberg for pointing this issue out!
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pair up inlined AutoreleaseRV calls with their matching RetainRV or
ClaimRV.
- RetainRV cancels out AutoreleaseRV. Delete both instructions.
- ClaimRV is a peephole for RetainRV+Release. Delete AutoreleaseRV and
replace ClaimRV with Release.
This avoids problems where more aggressive inlining triggers memory
regressions.
This patch is happy to skip over non-callable instructions and non-ARC
intrinsics looking for the pair. It is likely sound to also skip over
opaque function calls, but that's harder to reason about, and it's not
relevant to the goal here: if there's an opaque function call splitting
up a pair, it's very unlikely that a handshake would have happened
dynamically without inlining.
Note that this patch also subsumes the previous logic that looked
backwards from ReleaseRV.
https://reviews.llvm.org/D70370
rdar://problem/46509586
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 1st attempt was reverted because it revealed an existing
bug where we could produce invalid IR (use of value before
definition). That should be fixed with:
rG39de82ecc9c2
The bug manifests as replacing a reduction operand with an undef
value.
The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.
In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.
For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.
Differential Revision: https://reviews.llvm.org/D70148
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In particular, 1- and 2-byte loads and stores ignore the pointer tag
when using SP as the base register.
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70341
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.
The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.
We may need some other future adjustments, such as VLD4 take up half the
available registers so should maybe cost more. This patch should get the
basics in though.
Differential Revision: https://reviews.llvm.org/D69392
|
| |
|
|
|
|
|
|
|
|
|
| |
SUMMARY:
implement printing out raw section data of xcoff objectfile for llvm-objdump
and option -D --disassemble-all option for llvm-objdump
Reviewers: Sean Fertile
Subscribers: rupprecht, seiyai,hiraditya
Differential Revision: https://reviews.llvm.org/D70255
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cleanup handling of the denormal-fp-math attribute. Consolidate places
checking the allowed names in one place.
This is in preparation for introducing FP type specific variants of
the denormal-fp-mode attribute. AMDGPU will switch to using this in
place of the current hacky use of subtarget features for the denormal
mode.
Introduce a new header for dealing with FP modes. The constrained
intrinsic classes define related enums that should also be moved into
this header for uses in other contexts.
The verifier could use a check to make sure the denorm-fp-mode
attribute is sane, but there currently isn't one.
Currently, DAGCombiner incorrectly asssumes non-IEEE behavior by
default in the one current user. Clang must be taught to start
emitting this attribute by default to avoid regressions when this is
switched to assume ieee behavior if the attribute isn't present.
|
| |
|
|
|
|
| |
This patch implements writing function descriptors and TOC base into
data section, and also add function descriptors(both csect and label)
and TOC base symbols to the symbol table.
|
| |
|
|
|
|
| |
As discussed in D70148 (and caused a revert of the original commit):
if we insert at the select, then we can produce invalid IR because
the replacement for the compare may have uses before the select.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fills in the small family of MVE intrinsics that have nothing to
do with vectors: they implement bit-shift operations on 32- or 64-bit
values held in one or two general-purpose registers. Most of these
shift operations saturate if shifting left, and round to nearest if
shifting right, although LSLL and ASRL behave like ordinary shifts.
When these instructions take a variable shift count in a register,
they pay attention to its sign, so that (for example) LSLL or UQRSHLL
will shift left if given a positive number but right if given a
negative one. That makes even LSLL and ASRL different enough from
standard LLVM IR shift semantics that I couldn't see any better
alternative than to simply model the whole family as a set of
MVE-specific IR intrinsics.
(The //immediate// forms of LSLL and ASRL, on the other hand, do
behave exactly like a standard IR shift of a 64-bit value. In fact,
those forms don't have ACLE intrinsics defined at all, because you can
just write an ordinary C shift operation if you want one of those.)
The 64-bit shifts have to be instruction-selected in C++, because they
deliver two output values. But the 32-bit ones are simple enough that
I could write a DAG isel pattern directly into each Instruction
record.
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70319
|
| |
|
|
|
|
|
|
|
|
|
| |
Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget, and be moved into a separate function
attribute.
This patch is still NFC. The denormal mode remains as a subtarget
feature for now, but make the necessary changes to switch to using an
attribute.
|
| |
|
|
|
|
|
| |
Start checking the machine function in GlobalISel instead of the
target directly.
This temporarily breaks fcanonicalize selection in GlobalISel.
|
| |
|
|
|
|
|
|
| |
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Most of IR instructions got better code size estimations after commit 47a5c36b.
So default parameters values should be updated to improve inlining and
unrolling for the target.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70391
|
| | |
|
| |
|
|
|
|
|
|
| |
infinite loops (PR43971)
As detailed in PR43971/D70267, the use of XFormVExtractWithShuffleIntoLoad causes issues where we end up in infinite loops of extract(targetshuffle(vecload)) -> extract(shuffle(vecload)) -> extract(vecload) -> extract(targetshuffle(vecload)), there are just too many legalization checks at every stage that we can't guarantee that extract(shuffle(vecload)) -> scalarload can occur.
At the moment we see a number of minor regressions as we don't fold extract(shuffle(vecload)) -> scalarload before legal ops, these can be addressed in future patches and extension of X86ISelLowering's combineExtractWithShuffle.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Assert in getFunctionLocalOffsetAfterInsn() fails when processing a call
MachineInstr inside a bundle and compiling with debug info. This is
because labels are added by DwarfDebug::beginInstruction() which is
called for each top-level MI by EmitFunctionBody()'s for-loop iteration
but constructCallSiteEntryDIEs() which calls
getFunctionLocalOffsetAfterInsn() iterates over all MIs.
This commit modifies constructCallSiteEntryDIEs() to get the associated
bundle MI for call MIs inside a bundle and use that to when calling
getFunctionLocalOffsetAfterInsn() and getLabelAfterInsn(). It also skips
loop iterations for bundle MIs since the loop statements are concerned
with debug info for each physical instructions and bundles represent a
group of instructions. It also fix the comment about PCAddr since the
code is getting the return address and not the call address.
Reviewers: dstenb, vsk, aprantl, djtodoro, dblaikie, NikolaPrica
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70293
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151
Summary:
Dependence anlysis has a mechanism to cache results. Thus for particular memory access the cache keep track of side effects in basic blocks. The problem is that for invariant loads dependepce analysis legally ignores many dependencies due to a special semantic rules for such loads. But later results calculated for invariant load retrived from the cache for general case acceses. As a result we have wrong dependence information causing GVN to do illegal transformation. Fixes, T42151.
Proposed solution is to disable caching of invariant loads. I think such loads a pretty rare and it doesn't make sense to extend caching mechanism for them.
Reviewers: reames, chandlerc, skatkov, morisset, jdoerfert
Reviewed By: reames
Subscribers: hiraditya, test, jdoerfert, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64405
|
| |
|
|
| |
Differential revision: https://reviews.llvm.org/D70383
|
| |
|
|
|
|
|
| |
The main motivation for this is being able to write simpler assertions
and get better error messages in unit tests.
Split off from D70394.
|
| |
|
|
|
|
| |
The main motivation for this is better failure messages in unit tests.
Split off from D70394.
|
| |
|
|
|
|
|
|
| |
Remove the restriction, from the mve tail predication pass, that the
all masked vectors instructions need to be 128-bits. This allows us
to supported extending loads and truncating stores.
Differential Revision: https://reviews.llvm.org/D69946
|
| |
|
|
|
| |
As a test commit I fixed a misspelling in one of comments in SLP
vectorizer.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch modifies ARMLowOverheadLoops to convert a predicated
vector low-overhead loop into a tail-predicatd one. This is currently
a very basic conversion, with the following restrictions:
- Operates only on single block loops.
- The loop can only contain a single vctp instruction.
- No other instructions can write to the vpr.
- We only allow a subset of the mve instructions in the loop.
TODO: Pass the number of elements, not the number of iterations to
dlstp/wlstp.
Differential Revision: https://reviews.llvm.org/D69945
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
LegalizeTypes and LegalizeDAG to one version in TaretLowering.
Reviewers: RKSimon, efriedma, spatel
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70354
|
| |
|
|
|
|
|
|
| |
llvm-objdump"
This reverts commit 8f8a9f3437d4517f674395da30edb59d5514f7bc.
Reverting since this patch seems to break a lot of llvm buildbots.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch by Andrew Browne <browneee@google.com>
Reviewers: tejohnson, evgeny777
Reviewed By: tejohnson
Subscribers: arphaman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70195
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: The earlier commit (https://reviews.llvm.org/D70014) missed this one : If Always_Inline happens to be the only entry in FuncFlags, then the assembler will not print it in the summary.
Patch by Bharathi Seshadri <bseshadr@cisco.com>
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70323
|
| |
|
|
|
|
|
|
| |
uses (PR43948)"
as it causes an ICE on valid. A testcase was followed up on the original thread.
This reverts commit a3e61946c5bd7bdfab15af76b292e52d6ffa27f7.
|
| |
|
|
|
|
|
|
|
|
| |
This patch aims to improve the code generation for float vector gather on POWER9.
Patterns have been implemented to utilize instructions that deliver improved
performance.
Patch by: Kamau Bridgeman
Differential Revision: https://reviews.llvm.org/D62908
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Sometimes the CPUSubtype determines the Triple::ArchType that must be used.
Add the subtype to the API's to allow targets that need this to correctly
identify the contents of the binary.
Reviewers: pete
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70345
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Pass down the already accessed ValueInfo to shouldPromoteLocalToGlobal,
to avoid an unnecessary extra index lookup.
Add some assertion checking to confirm we have a non-empty VI when
expected.
Also some misc cleanup, merging the two versions of
doImportAsDefinition, since one was only called by the other, and
unnecessarily passed in a member variable.
Reviewers: steven_wu, pcc, evgeny777
Reviewed By: evgeny777
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70337
|
| |
|
|
|
|
|
|
|
|
|
| |
SUMMARY:
implement printing out raw section data of xcoff objectfile for llvm-objdump
and option -D --disassemble-all option for llvm-objdump
Reviewers: Sean Fertile
Subscribers: rupprecht, seiyai,hiraditya
Differential Revision: https://reviews.llvm.org/D70255
|
| |
|
|
| |
surprising the next person to add a case after this one. NFC
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Clean up the code that does GV promotion in the ThinLTO backends.
Specifically, we don't need to check whether we are importing since that
is already checked and handled correctly in shouldPromoteLocalToGlobal.
Simply call shouldPromoteLocalToGlobal, and if it returns true we are
guaranteed that we are promoting, whether or not we are importing (or in
the exporting module). This also makes the handling in getName()
consistent with that in getLinkage(), which checks the DoPromote parameter
regardless of whether we are importing or exporting.
Reviewers: steven_wu, pcc, evgeny777
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70327
|
| |
|
|
|
|
|
|
|
|
| |
compiler-rt's getAMDProcessorTypeAndSubtype.
This is the CPUID model used on Ryzen 3000 series (Zen 2/Matisse) CPUs.
Patch by Alex James
Differential Revision: https://reviews.llvm.org/D70279
|
| |
|
|
|
|
|
|
|
|
| |
Previously we mutated the node and then converted it to a libcall. But this loses the chain information.
This patch keeps the chain, but unfortunately breaks tail call optimization as the functions involved in deciding if a node is in tail call position can't handle the chain. But correct ordering seems more important to be right.
Somehow the SystemZ tests improved. I looked at one of them and it seemed that we're handling the split vector elements in a different order and that made the copies work better.
Differential Revision: https://reviews.llvm.org/D70334
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This implements a version of the predicateLoopExits transform from IndVarSimplify extended to exploit widenable conditions - and thus be much wider in scope of legality. The code structure ends up being almost entirely different, so I chose to duplicate this into the LoopPredication pass instead of trying to reuse the code in the IndVars.
The core notions of the transform are as follows:
If we have a widenable condition which controls entry into the loop, we're allowed to widen it arbitrarily. Given that, it's simply a *profitability* question as to what conditions to fold into the widenable branch.
To avoid pass ordering issues, we want to avoid widening cases that would otherwise be dischargeable. Or... widen in a form which can still be discharged. Thus, we phrase the transform as selecting one analyzeable exit from the set of analyzeable exits to keep. This avoids creating pass ordering complexities.
Since none of the above proves that we actually exit through our analyzeable exits - we might exit through something else entirely - we limit ourselves to cases where a) the latch is analyzeable and b) the latch is predicted taken, and c) the exit being removed is statically cold.
Differential Revision: https://reviews.llvm.org/D69830
|
| |
|
|
|
|
|
|
| |
-ffp-model=, and -ffp-exception-behavior="
and a follow-up NFC rearrangement as it's causing a crash on valid. Testcase is on the original review thread.
This reverts commits af57dbf12e54f3a8ff48534bf1078f4de104c1cd and e6584b2b7b2de06f1e59aac41971760cac1e1b79
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow clients of the llvm library to opt-in to one-shot SIGPIPE
handling, instead of forcing them to undo llvm's SIGPIPE handler
registration (which is brittle).
The current behavior is preserved for all llvm-derived tools (except
lldb) by means of a default-`true` flag in the InitLLVM constructor.
This prevents "IO error" crashes in long-lived processes (lldb is the
motivating example) which both a) load llvm as a dynamic library and b)
*really* need to ignore SIGPIPE.
As llvm signal handlers can be installed when calling into libclang
(say, via RemoveFileOnSignal), thereby overriding a previous SIG_IGN for
SIGPIPE, there is no clean way to opt-out of "exit-on-SIGPIPE" in the
current model.
Differential Revision: https://reviews.llvm.org/D70277
|
| |
|
|
|
| |
This reverts commit 423f541c1a322963cf482683fe9777ef0692082d, which
breaks llvm-c ABI.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reapplies c0f6ad7d1f3ccb9d0b9ce9ef8dfa06409ccf1b3e with an
additional fix in test/DebugInfo/X86/constant-loclist.ll, which had a
slightly different output on windows targets. The test now accounts for
this difference.
The original commit message follows.
Summary:
As discussed in D70081, this adds the ability to dump section
names/indices to the location list dumper. It does this by moving the
range specific logic from DWARFDie.cpp:dumpRanges into the
DWARFAddressRange class.
The trickiest part of this patch is the backflip in the meanings of the
two dump flags for the location list sections.
The dumping of "raw" location list data is now controlled by
"DisplayRawContents" flag. This frees up the "Verbose" flag to be used
to control whether we print the section index. Additionally, the
DisplayRawContents flag is set for section-based dumps whenever the
--verbose option is passed, but this is not done for the "inline" dumps.
Also note that the index dumping currently does not work for the DWARF
v5 location lists, as the parser does not fill out the appropriate
fields. This will be done in a separate patch.
Reviewers: dblaikie, probinson, JDevlieghere, SouraVX
Subscribers: sdardis, hiraditya, jrtc27, atanasyan, arphaman, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70227
|
| |
|
|
|
|
|
|
| |
See https://bugs.llvm.org/show_bug.cgi?id=43712
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D70170
|
| | |
|
| |
|
|
| |
This reverts commit c0f6ad7d1f3ccb9d0b9ce9ef8dfa06409ccf1b3e to fix the buildbots.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Implements scalable size queries for MVTs, split out from D53137.
* Contains a fix for FindMemType to avoid using scalable vector type
to contain non-scalable types.
* Explicit casts for several places where implicit integer sign
changes or promotion from 32 to 64 bits caused problems.
* CodeGenDAGPatterns will treat scalable and non-scalable vector types
as different.
Reviewers: greened, cameron.mcinally, sdesmalen, rovka
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D66871
|