| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
Truncate with/without saturation
Added tests for DAG lowering ,encoding and intrinsic
Differential Revision: http://reviews.llvm.org/D11218
llvm-svn: 243122
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.
This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11103
(cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243114
|
| |
|
|
| |
llvm-svn: 243113
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some shufflevectors are currently being incorrectly lowered in the AArch32
backend as the existing checks for detecting the NEON operations from the
shufflevector instruction expects the shuffle mask and the vector operands to be
of the same length.
This is not always the case as the mask may be twice as long as the operand;
here only the lower half of the shufflemask gets checked, so provided the lower
half of the shufflemask looks like a vector transpose (or even is just all -1
for undef) then the intrinsics may get incorrectly lowered into a vector
transpose (VTRN) instruction.
This patch fixes this by accommodating for both cases and adds regression tests.
Differential Revision: http://reviews.llvm.org/D11407
llvm-svn: 243103
|
| |
|
|
|
|
|
|
|
|
|
|
| |
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.
Differential Revision: http://reviews.llvm.org/D11408
llvm-svn: 243100
|
| |
|
|
|
|
|
|
|
|
| |
This reverts commit 0f720d984f419c747709462f7476dff962c0bc41.
It breaks clang too badly, I need to prepare a proper patch for clang
first.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243089
|
| |
|
|
| |
llvm-svn: 243087
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.
This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11103
(cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243083
|
| |
|
|
| |
llvm-svn: 243070
|
| |
|
|
|
|
|
|
|
|
| |
Summary: Among other things, this allows -print-after-all/-print-before-all to dump IR around this pass.
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D11373
llvm-svn: 243052
|
| |
|
|
| |
llvm-svn: 243046
|
| |
|
|
| |
llvm-svn: 243034
|
| |
|
|
|
|
|
|
|
|
| |
%vreg2<def> = MOVi32imm 1; GPR32:%vreg2
%W1<def> = COPY %vreg2; GPR32:%vreg2
into:
%W1<def> = MOVi32imm 1
Patched by Lawrence Hu (lawrence@codeaurora.org)
llvm-svn: 243033
|
| |
|
|
|
|
|
|
|
|
| |
Adds pushes to the folding tables.
This also required a fix to the TD definition, since the memory forms of
the push instructions did not have the right mayLoad/mayStore flags.
Differential Revision: http://reviews.llvm.org/D11340
llvm-svn: 243010
|
| |
|
|
|
|
|
|
|
| |
syntax
Patch by: marina.yatsina@intel.com
Differential Revision: http://reviews.llvm.org/D11337
llvm-svn: 243001
|
| |
|
|
|
|
|
|
|
|
| |
The DAG Node "SCALAR_TO_VECTOR" may be created if the type of the scalar element is legal.
Added a check for the scalar type before creating this node.
Added a test that fails with assertion on the current version.
Differential Revision: http://reviews.llvm.org/D11413
llvm-svn: 242994
|
| |
|
|
|
|
|
|
|
|
| |
This commit broke the build. Numerous build bots broken, and it was
blocking my progress so reverting.
It should be trivial to reproduce -- enable the BPF backend and it
should fail when running llvm-tblgen.
llvm-svn: 242992
|
| |
|
|
|
|
|
|
|
|
| |
Truncate with/without saturation
Added tests for DAG lowering ,encoding and intrinsic
Differential Revision: http://reviews.llvm.org/D11218
llvm-svn: 242990
|
| |
|
|
|
|
|
|
|
|
| |
disabled only if AVX512BW and AVX512VL present.
Tests added.
Differential Revision: http://reviews.llvm.org/D11414
llvm-svn: 242987
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Straight-line optimizations can simplify the loop body and make LSR's
cost analysis more precise. This significantly improves several Eigen3
CUDA benchmarks.
With this change, EigenContractionKernel runs up to 40% faster
(https://bitbucket.org/eigen/eigen/src/753ceee5f206ff7dde9f6a41a5a420749fc9406f/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-502).
EigenConvolutionKernel2D runs up to 10% faster
(https://bitbucket.org/eigen/eigen/src/753ceee5f206ff7dde9f6a41a5a420749fc9406f/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h?at=default#cl-605).
I have some difficulties writing small tests that benefit from this
reordering due to a seemingly issue with LSR (being discussed at
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088244.html).
See the review thread for the compilation time impact of GVN.
Reviewers: eliben, jholewinski
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11304
llvm-svn: 242982
|
| |
|
|
| |
llvm-svn: 242947
|
| |
|
|
| |
llvm-svn: 242946
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add a basic CodeGen bitcode test which (for now) only prints out the function name and nothing else. The current code merely implements the basic needed for the test run to not crash / assert. Getting to that point required:
- Basic InstPrinter.
- Basic AsmPrinter.
- DiagnosticInfoUnsupported (not strictly required, but nice to have, duplicated from AMDGPU/BPF's ISelLowering).
- Some SP and register setup in WebAssemblyTargetLowering.
- Basic LowerFormalArguments.
- GenInstrInfo.
- Placeholder LowerFormalArguments.
- Placeholder CanLowerReturn and LowerReturn.
- Basic DAGToDAGISel::Select, which requiresGenDAGISel.inc as well as GET_INSTRINFO_ENUM with GenInstrInfo.inc.
- Remove WebAssemblyFrameLowering::determineCalleeSaves and rely on default.
- Implement WebAssemblyFrameLowering::hasFP, same as AArch64's implementation.
Follow-up patches will implement a real AsmPrinter, which will require adding MI opcodes specific to WebAssembly.
Reviewers: sunfish
Subscribers: aemerson, jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D11369
llvm-svn: 242939
|
| |
|
|
|
|
|
|
| |
Patch by Eugene Zelenko!
Differential Revision: http://reviews.llvm.org/D11400
llvm-svn: 242930
|
| |
|
|
| |
llvm-svn: 242922
|
| |
|
|
|
|
|
|
| |
Shrink-wrapping can now be tested on ARM with -enable-shrink-wrap.
Related to <rdar://problem/20821730>
llvm-svn: 242908
|
| |
|
|
|
|
|
|
| |
include encoding and intrinsics
Differential Revision: http://reviews.llvm.org/D11222
llvm-svn: 242896
|
| |
|
|
|
|
|
| |
Patch by: michael.zuckerman@intel.com
Differential Revision: http://reviews.llvm.org/D11223
llvm-svn: 242886
|
| |
|
|
|
|
|
|
|
|
| |
through APIs that are no longer necessary now that the update API has
been removed.
This will make changes to the AA interfaces significantly less
disruptive (I hope). Either way, it seems like a really nice cleanup.
llvm-svn: 242882
|
| |
|
|
|
|
|
|
| |
All SKX forms. All VCVT instructions for float/double/int/long types.
Differential Revision: http://reviews.llvm.org/D11343
llvm-svn: 242877
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
MCRegAliasIterator only works for physical registers. So, do not run it
on virtual registers.
With this issue fixed, we can resurrect the BranchFolding pass in NVPTX
backend.
Reviewers: jholewinski, bkramer
Subscribers: henryhu, meheff, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11174
llvm-svn: 242871
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes one substantive change and a few stylistic changes to the
VSX swap optimization pass.
The substantive change is to permit LXSDX and LXSSPX instructions to
participate in swap optimization computations. The previous change to
insert a swap following a SUBREG_TO_REG widening operation makes this
almost trivial.
I experimented with also permitting STXSDX and STXSSPX instructions.
This can be done using similar techniques: we could insert a swap
prior to a narrowing COPY operation, and then permit these stores to
participate. I prototyped this, but discovered that the pattern of a
narrowing COPY followed by an STXSDX does not occur in any of our
test-suite code. So instead, I added commentary indicating that this
could be done.
Other TLC:
- I changed SH_COPYSCALAR to SH_COPYWIDEN to more clearly indicate
the direction of the copy.
- I factored the insertion of swap instructions into a separate
function.
Finally, I added a new test case to check that the scalar-to-vector
loads are working properly with swap optimization.
llvm-svn: 242838
|
| |
|
|
| |
llvm-svn: 242812
|
| |
|
|
|
|
| |
This is setup for future work planned for the AArch64 Load/Store Opt pass.
llvm-svn: 242810
|
| |
|
|
|
|
|
|
| |
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11351
llvm-svn: 242761
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
whether register r9 should be reserved.
This recommits r242737, which broke bots because the number of subtarget
features went over the limit of 64.
This change is needed because we cannot use a backend option to set
cl::opt "arm-reserve-r9" when doing LTO.
Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to
reserve r9 should make changes to add subtarget feature "reserve-r9" to
the IR.
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D11320
llvm-svn: 242756
|
| |
|
|
| |
llvm-svn: 242747
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply of r241928 which had to be reverted because of the r241926
revert.
This commit factors out common code from MergeBaseUpdateLoadStore() and
MergeBaseUpdateLSMultiple() and introduces a new function
MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a
strd/ldrd instruction into an strd/ldrd instruction with writeback where
possible.
Differential Revision: http://reviews.llvm.org/D10676
llvm-svn: 242743
|
| |
|
|
|
|
|
|
|
|
| |
Re-apply r241926 with an additional check that r13 and r15 are not used
for LDRD/STRD. See http://llvm.org/PR24190. This also already includes
the fix from r241951.
Differential Revision: http://reviews.llvm.org/D10623
llvm-svn: 242742
|
| |
|
|
|
|
|
|
| |
This caused builds to fail with the following error message:
error:Too many subtarget features! Bump MAX_SUBTARGET_FEATURES.
llvm-svn: 242740
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
whether register r9 should be reserved.
This change is needed because we cannot use a backend option to set
cl::opt "arm-reserve-r9" when doing LTO.
Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to
reserve r9 should make changes to add subtarget feature "reserve-r9" to
the IR.
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D11320
llvm-svn: 242737
|
| |
|
|
|
|
| |
This reverts commit r241926. This caused http://llvm.org/PR24190
llvm-svn: 242735
|
| |
|
|
|
|
| |
This reverts commit r241928. This caused http://llvm.org/PR24190
llvm-svn: 242734
|
| |
|
|
|
|
| |
This reverts commit r241951. It caused http://llvm.org/PR24190
llvm-svn: 242733
|
| |
|
|
|
|
|
| |
Even though this is just some hinting for the scheduler it doesn't make
sense to do that unless you know the target can perform the fusion.
llvm-svn: 242732
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch does the following:
* Fix FIXME on `needsStackRealignment`: it is now shared between multiple targets, implemented in `TargetRegisterInfo`, and isn't `virtual` anymore. This will break out-of-tree targets, silently if they used `virtual` and with a build error if they used `override`.
* Factor out `canRealignStack` as a `virtual` function on `TargetRegisterInfo`, by default only looks for the `no-realign-stack` function attribute.
Multiple targets duplicated the same `needsStackRealignment` code:
- Aarch64.
- ARM.
- Mips almost: had extra `DEBUG` diagnostic, which the default implementation now has.
- PowerPC.
- WebAssembly.
- x86 almost: has an extra `-force-align-stack` option, which the default implementation now has.
The default implementation of `needsStackRealignment` used to just return `false`. My current patch changes the behavior by simply using the above shared behavior. This affects:
- AMDGPU
- BPF
- CppBackend
- MSP430
- NVPTX
- Sparc
- SystemZ
- XCore
- Out-of-tree targets
This is a breaking change! `make check` passes.
The only implementation of the `virtual` function (besides the slight different in x86) was Hexagon (which did `MF.getFrameInfo()->getMaxAlignment() > 8`), and potentially some out-of-tree targets. Hexagon now uses the default implementation.
`needsStackRealignment` was being overwritten in `<Target>GenRegisterInfo.inc`, to return `false` as the default also did. That was odd and is now gone.
Reviewers: sunfish
Subscribers: aemerson, llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11160
llvm-svn: 242727
|
| |
|
|
|
|
|
|
| |
Related to rdar://19205407
Differential Revision: http://reviews.llvm.org/D10746
llvm-svn: 242724
|
| |
|
|
| |
llvm-svn: 242719
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step toward supporting shrink-wrapping for this target.
The changes could be summarized by these items:
- Expand the tail-call return as part of the expand pseudo pass.
- Get rid of the assumptions that the epilogue is the exit block:
* Do not assume which registers are free in the epilogue. (This indirectly
improve the lowering of the code for the segmented stacks, see the test
cases.)
* Take into account that the basic block can be empty.
Related to <rdar://problem/20821730>
llvm-svn: 242714
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
[NVPTX] make load on global readonly memory to use ldg
Summary:
As describe in [1], ld.global.nc may be used to load memory by nvcc when
__restrict__ is used and compiler can detect whether read-only data cache
is safe to use.
This patch will try to check whether ldg is safe to use and use them to
replace ld.global when possible. This change can improve the performance
by 18~29% on affected kernels (ratt*_kernel and rwdot*_kernel) in
S3D benchmark of shoc [2].
Patched by Xuetian Weng.
[1] http://docs.nvidia.com/cuda/kepler-tuning-guide/#read-only-data-cache
[2] https://github.com/vetter/shoc
Test Plan: test/CodeGen/NVPTX/load-with-non-coherent-cache.ll
Reviewers: jholewinski, jingyue
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D11314
llvm-svn: 242713
|