| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support for tracking registers that forward function parameters into the
following function frame. For now we only support cases when parameter
is forwarded through single register.
Reviewers: aprantl, vsk, t.p.northover
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D66953
llvm-svn: 374033
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not be producing an Invalid
Operation exception when either arguments in a floating point compare
are quiet NaNs.
In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.
This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
implemented fine-tuning for when to produce vcmpe (i.e. not do it for
equality comparisons). The complexity introduced by those patches
isn't needed anymore if we just always produce vcmp instead. Maybe
these patches need to be reintroduced again once support is needed to
map potential LLVM-IR constrained floating point compare intrinsics to
the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
of these test, the intent of what is tested for isn't related to
whether the vcmp should produce an Invalid Operation exception or not.
Fixes PR43374.
Differential Revision: https://reviews.llvm.org/D68463
llvm-svn: 374025
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.
So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.
For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.
It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.
Differential revision: https://reviews.llvm.org/D67148
llvm-svn: 374017
|
|
|
|
| |
llvm-svn: 373989
|
|
|
|
|
|
|
|
|
|
| |
larger than i32.
Gather instructions can use i32 or i64 elements for indices. If
the index is zero extended from a type smaller than i32 to i64, we
can shrink the extend to just extend to i32.
llvm-svn: 373982
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the target option GuaranteedTailCallOpt is specified, calls with
the fastcc calling convention will be transformed into tail calls if
they are in tail position. This diff adds a new calling convention,
tailcc, currently supported only on X86, which behaves the same way as
fastcc, except that the GuaranteedTailCallOpt flag does not need to
enabled in order to enable tail call optimization.
Patch by Dwight Guth <dwight.guth@runtimeverification.com>!
Reviewed By: lebedev.ri, paquette, rnk
Differential Revision: https://reviews.llvm.org/D67855
llvm-svn: 373976
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There was a bug when computing the number of unwind destination
mismatches in CFGStackify. When there are many mismatched calls that
share the same (original) destination BB, they have to be counted
separately.
This also fixes a typo and runs `fixUnwindMismatches` only when the wasm
exception handling is enabled. This is to prevent unnecessary
computations and does not change behavior.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68552
llvm-svn: 373975
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously, `WebAssembly::mayThrow()` assumed all inputs are global
addresses. But when intrinsics, such as `memcpy`, `memmove`, or `memset`
are lowered to external symbols in instruction selection and later
emitted as library calls. And these functions don't throw.
This patch adds handling to those memory intrinsics to `mayThrow`
function. But while most of libcalls don't throw, we can't guarantee all
of them don't throw, so currently we conservatively return true for all
other external symbols.
I think a better way to solve this problem is to embed 'nounwind' info
in `TargetLowering::CallLoweringInfo`, so that we can access the info
from the backend. This will also enable transferring 'nounwind'
properties of LLVM IR instructions. Currently we don't transfer that
info and we can only access properties of callee functions, if the
callees are within the module. Other targets don't need this info in the
backend because they do all the processing before isel, but it will help
us because that info will reduce code size increase in fixing unwind
destination mismatches in CFGStackify.
But for now we return false for these memory intrinsics and true for all
other libcalls conservatively.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68553
llvm-svn: 373967
|
|
|
|
|
|
|
|
|
| |
Start manually writing a table to get the subreg index. TableGen
should probably generate this, but I'm not sure what it looks like in
the arbitrary case where subregisters are allowed to not fully cover
the super-registers.
llvm-svn: 373947
|
|
|
|
| |
llvm-svn: 373946
|
|
|
|
| |
llvm-svn: 373945
|
|
|
|
| |
llvm-svn: 373944
|
|
|
|
|
|
|
| |
This hides some defects in SIFoldOperands when the immediates are
split.
llvm-svn: 373943
|
|
|
|
|
|
| |
Continue making a mess of merge/unmerge legality.
llvm-svn: 373942
|
|
|
|
|
|
|
|
|
|
| |
At minimum handle the s64 insert type, which are emitted in real cases
during legalization.
We really need TableGen to emit something to emit something like the
inverse of composeSubRegIndices do determine the subreg index to use.
llvm-svn: 373938
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.
Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.
llvm-svn: 373937
|
|
|
|
|
|
|
|
|
|
|
|
| |
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.
https://reviews.llvm.org/D68439
llvm-svn: 373935
|
|
|
|
|
|
|
|
| |
NFCI.
Stop all the callers from having to check the value type before calling getTargetShuffleInputs.
llvm-svn: 373915
|
|
|
|
|
|
|
|
|
|
|
|
| |
This ensures that frame-based unwinding will continue to work when
calling a noreturn function; there is not much use having the caller's
frame pointer saved if you don't also have the caller's program counter.
Patch by James Clarke.
Differential Revision: https://reviews.llvm.org/D68542
llvm-svn: 373907
|
|
|
|
|
|
|
|
|
|
|
|
| |
J/JAL/JALX/JALS are absolute branches, but stay within the current
256 MB-aligned region, so we must include the high bits of the
instruction address when calculating the branch target.
Patch by James Clarke.
Differential Revision: https://reviews.llvm.org/D68548
llvm-svn: 373906
|
|
|
|
|
|
| |
Fix comment.
llvm-svn: 373901
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
possible.
Move the erasing and iterator updating inside to match the
other slow LEA function.
I've adapted code from optTwoAddrLEA and basically rebuilt the
implementation here. We do lose the kill flags now just like
optTwoAddrLEA. This runs late enough in the pipeline that
shouldn't really be a problem.
llvm-svn: 373877
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
load (PR43217)
If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.
This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.
Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.
Fixes PR43217
Differential Revision: https://reviews.llvm.org/D68544
llvm-svn: 373871
|
|
|
|
| |
llvm-svn: 373870
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is patch aims to group together the `CRNotPat` multi class instantiations
within the `PPCInstrInfo.td` file.
Integer instantiations of the multi class are grouped together into a section,
and the floating point patterns are separated into its own section.
Differential Revision: https://reviews.llvm.org/D67975
llvm-svn: 373869
|
|
|
|
|
|
|
|
| |
directly.
Move the resolveTargetShuffleInputsAndMask call to after the shuffle mask combine before the undef/zero constant fold instead.
llvm-svn: 373868
|
|
|
|
|
|
|
|
| |
Replaces setTargetShuffleZeroElements with getTargetShuffleAndZeroables which reports the Zeroable elements but doesn't merge them into the decoded target shuffle mask (the merging has been moved up into getTargetShuffleInputs until we can get rid of it entirely).
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs but instead we cache them in a parallel Zeroable mask.
llvm-svn: 373867
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v8i64->v8i8 truncate when v8i64 isn't legal
Summary:
The default legalization for v16i64->v16i8 tries to create a multiple stage truncate concatenating after each stage and truncating again. But avx512 implements truncates with multiple uops. So it should be better to truncate all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2 uop truncates. The unpcks are all single uop instructions.
I tried to handle this by just custom splitting the v16i64->v16i8 shuffle. And hoped that the DAG combiner would leave the two halves in the state needed to make D68374 do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8 when v8i64 needs to be split to produce the VTRUNCs directly.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68428
llvm-svn: 373864
|
|
|
|
|
|
| |
of using setTargetShuffleZeroElements directly. NFCI.
llvm-svn: 373855
|
|
|
|
|
|
|
|
| |
Summary: Replace 'isDarwin' with 'IsDarwin' based on LLVM naming convention.
Differential Revision: https://reviews.llvm.org/D68336
llvm-svn: 373852
|
|
|
|
| |
llvm-svn: 373849
|
|
|
|
|
|
|
|
|
|
| |
We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.
This allows us to remove createTargetShuffleMask.
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.
llvm-svn: 373846
|
|
|
|
| |
llvm-svn: 373845
|
|
|
|
| |
llvm-svn: 373842
|
|
|
|
| |
llvm-svn: 373841
|
|
|
|
| |
llvm-svn: 373840
|
|
|
|
| |
llvm-svn: 373839
|
|
|
|
|
|
| |
Turn into shift and truncate. Doesn't yet handle pointers.
llvm-svn: 373838
|
|
|
|
|
|
| |
This wasn't updated for the immarg handling change.
llvm-svn: 373837
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR42025)
As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop.
This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so its just the original SETCC ops that gets extended.
Differential Revision: https://reviews.llvm.org/D68226
llvm-svn: 373834
|
|
|
|
|
|
| |
Rename some variables to match lowerShuffleAsRepeatedMaskAndLanePermute - prep work toward adding some equivalent sublane functionality.
llvm-svn: 373832
|
|
|
|
|
|
| |
simm9_lsb0 and simm12_lsb0 operand types were missing predicates.
llvm-svn: 373812
|
|
|
|
|
|
|
|
|
| |
../llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:48: warning: suggest braces around initialization of subobject [-Wmissing-braces]
return addMappingFromTable<1>(MI, MRI, { 0 }, Table);
^
{}
llvm-svn: 373784
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
based on the immediate in MCInstLower
The immediate form of VPCMP can represent these completely. The
vpcmpgt/eq are just shorter encodings.
This patch removes the isel patterns and just swaps the opcodes
and removes the immediate in MCInstLower. This matches where we do
some other encodings tricks.
Removes over 10K bytes from the isel table.
Differential Revision: https://reviews.llvm.org/D68446
llvm-svn: 373766
|
|
|
|
|
|
|
|
| |
We already do this for ISD::TRUNCATE, but we can do the same for X86ISD::VTRUNC
Differential Revision: https://reviews.llvm.org/D68432
llvm-svn: 373765
|
|
|
|
|
|
|
|
|
|
|
|
| |
opcodes
See bug 43484: https://bugs.llvm.org/show_bug.cgi?id=43484
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D68349
llvm-svn: 373745
|
|
|
|
|
|
|
|
|
|
| |
See bug 43485: https://bugs.llvm.org/show_bug.cgi?id=43485
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D68348
llvm-svn: 373740
|
|
|
|
|
|
|
|
| |
Darwin platforms need the frame register to always point at a valid record even
if it's not updated in a leaf function. Backtraces are more important than one
extra GPR.
llvm-svn: 373738
|
|
|
|
|
|
|
|
|
|
| |
See bug 43483: https://bugs.llvm.org/show_bug.cgi?id=43483
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D68347
llvm-svn: 373736
|
|
|
|
|
|
|
| |
This was always passing the destination flat address space, when it
should be picking between the two valid source options.
llvm-svn: 373716
|