summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer ↵Igor Breger2015-07-246-108/+522
| | | | | | | | | | Truncate with/without saturation Added tests for DAG lowering ,encoding and intrinsic Differential Revision: http://reviews.llvm.org/D11218 llvm-svn: 243122
* Remove access to the DataLayout in the TargetMachineMehdi Amini2015-07-244-24/+29
| | | | | | | | | | | | | | | | | | | | | | Summary: Replace getDataLayout() with a createDataLayout() method to make explicit that it is intended to create a DataLayout only and not accessing it for other purpose. This change is the last of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11103 (cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea) From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 243114
* fix wrong comment; NFCSanjay Patel2015-07-241-1/+1
| | | | llvm-svn: 243113
* [ARM] - Fix lowering of shufflevectors in AArch32Luke Cheeseman2015-07-241-38/+127
| | | | | | | | | | | | | | | | | | | Some shufflevectors are currently being incorrectly lowered in the AArch32 backend as the existing checks for detecting the NEON operations from the shufflevector instruction expects the shuffle mask and the vector operands to be of the same length. This is not always the case as the mask may be twice as long as the operand; here only the lower half of the shufflemask gets checked, so provided the lower half of the shufflemask looks like a vector transpose (or even is just all -1 for undef) then the intrinsics may get incorrectly lowered into a vector transpose (VTRN) instruction. This patch fixes this by accommodating for both cases and adds regression tests. Differential Revision: http://reviews.llvm.org/D11407 llvm-svn: 243103
* When lowering vector shifts a check is performed to see if the value to shift byLuke Cheeseman2015-07-242-17/+14
| | | | | | | | | | | | is an immediate, in this check the value is negated and stored in and int64_t. The value can be -2^63 yet the result cannot be stored in an int64_t and this gives some undefined behaviour causing failures. The negation is only necessary when the values is within a certain range and so it should not need to negate -2^63, this patch introduces this and also a regression test. Differential Revision: http://reviews.llvm.org/D11408 llvm-svn: 243100
* Revert "Remove access to the DataLayout in the TargetMachine"Mehdi Amini2015-07-244-29/+24
| | | | | | | | | | This reverts commit 0f720d984f419c747709462f7476dff962c0bc41. It breaks clang too badly, I need to prepare a proper patch for clang first. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 243089
* [bpf] initial support for debug_infoAlexei Starovoitov2015-07-243-9/+22
| | | | llvm-svn: 243087
* Remove access to the DataLayout in the TargetMachineMehdi Amini2015-07-244-24/+29
| | | | | | | | | | | | | | | | | | | | | | Summary: Replace getDataLayout() with a createDataLayout() method to make explicit that it is intended to create a DataLayout only and not accessing it for other purpose. This change is the last of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11103 (cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea) From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 243083
* test commit, only added one spaceLawrence Hu2015-07-231-1/+1
| | | | llvm-svn: 243070
* [ARM] Register (existing) ARMLoadStoreOpt pass with LLVM pass manager.David Gross2015-07-231-2/+12
| | | | | | | | | | Summary: Among other things, this allows -print-after-all/-print-before-all to dump IR around this pass. Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D11373 llvm-svn: 243052
* Test commit.David Gross2015-07-231-0/+1
| | | | llvm-svn: 243046
* X86: Use dyn_cast instead of isa+cast, NFCDuncan P. N. Exon Smith2015-07-231-5/+6
| | | | llvm-svn: 243034
* This patch eanble register coalescing to coalesce the following:Weiming Zhao2015-07-231-0/+14
| | | | | | | | | | %vreg2<def> = MOVi32imm 1; GPR32:%vreg2 %W1<def> = COPY %vreg2; GPR32:%vreg2 into: %W1<def> = MOVi32imm 1 Patched by Lawrence Hu (lawrence@codeaurora.org) llvm-svn: 243033
* [X86] Allow load folding into PUSH instructionsMichael Kuperstein2015-07-233-9/+27
| | | | | | | | | | Adds pushes to the folding tables. This also required a fix to the TD definition, since the memory forms of the push instructions did not have the right mayLoad/mayStore flags. Differential Revision: http://reviews.llvm.org/D11340 llvm-svn: 243010
* [X86] Fix order of operands for ins and outs instructions when parsing intel ↵Michael Kuperstein2015-07-231-28/+28
| | | | | | | | | syntax Patch by: marina.yatsina@intel.com Differential Revision: http://reviews.llvm.org/D11337 llvm-svn: 243001
* X86: Fixed assertion failure in 32-bit modeElena Demikhovsky2015-07-231-2/+3
| | | | | | | | | | The DAG Node "SCALAR_TO_VECTOR" may be created if the type of the scalar element is legal. Added a check for the scalar type before creating this node. Added a test that fails with assertion on the current version. Differential Revision: http://reviews.llvm.org/D11413 llvm-svn: 242994
* Revert r242990: "AVX-512: Implemented encoding , DAG lowering and ..."Chandler Carruth2015-07-235-478/+94
| | | | | | | | | | This commit broke the build. Numerous build bots broken, and it was blocking my progress so reverting. It should be trivial to reproduce -- enable the BPF backend and it should fail when running llvm-tblgen. llvm-svn: 242992
* AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer ↵Igor Breger2015-07-235-94/+478
| | | | | | | | | | Truncate with/without saturation Added tests for DAG lowering ,encoding and intrinsic Differential Revision: http://reviews.llvm.org/D11218 llvm-svn: 242990
* AVX : Fix ISA disabling in case AVX512VL , some instructions should be ↵Igor Breger2015-07-232-17/+18
| | | | | | | | | | disabled only if AVX512BW and AVX512VL present. Tests added. Differential Revision: http://reviews.llvm.org/D11414 llvm-svn: 242987
* [NVPTX] run LSR before straight-line optimizationsJingyue Wu2015-07-231-5/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Straight-line optimizations can simplify the loop body and make LSR's cost analysis more precise. This significantly improves several Eigen3 CUDA benchmarks. With this change, EigenContractionKernel runs up to 40% faster (https://bitbucket.org/eigen/eigen/src/753ceee5f206ff7dde9f6a41a5a420749fc9406f/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-502). EigenConvolutionKernel2D runs up to 10% faster (https://bitbucket.org/eigen/eigen/src/753ceee5f206ff7dde9f6a41a5a420749fc9406f/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h?at=default#cl-605). I have some difficulties writing small tests that benefit from this reordering due to a seemingly issue with LSR (being discussed at http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088244.html). See the review thread for the compilation time impact of GVN. Reviewers: eliben, jholewinski Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11304 llvm-svn: 242982
* fix typo; NFCSanjay Patel2015-07-221-4/+4
| | | | llvm-svn: 242947
* fix indent; NFCSanjay Patel2015-07-221-20/+20
| | | | llvm-svn: 242946
* WebAssembly: basic bitcode → assembly CodeGen testJF Bastien2015-07-2215-18/+310
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Add a basic CodeGen bitcode test which (for now) only prints out the function name and nothing else. The current code merely implements the basic needed for the test run to not crash / assert. Getting to that point required: - Basic InstPrinter. - Basic AsmPrinter. - DiagnosticInfoUnsupported (not strictly required, but nice to have, duplicated from AMDGPU/BPF's ISelLowering). - Some SP and register setup in WebAssemblyTargetLowering. - Basic LowerFormalArguments. - GenInstrInfo. - Placeholder LowerFormalArguments. - Placeholder CanLowerReturn and LowerReturn. - Basic DAGToDAGISel::Select, which requiresGenDAGISel.inc as well as GET_INSTRINFO_ENUM with GenInstrInfo.inc. - Remove WebAssemblyFrameLowering::determineCalleeSaves and rely on default. - Implement WebAssemblyFrameLowering::hasFP, same as AArch64's implementation. Follow-up patches will implement a real AsmPrinter, which will require adding MI opcodes specific to WebAssembly. Reviewers: sunfish Subscribers: aemerson, jfb, llvm-commits Differential Revision: http://reviews.llvm.org/D11369 llvm-svn: 242939
* Fix -Wextra-semi warnings.Hans Wennborg2015-07-222-2/+2
| | | | | | | | Patch by Eugene Zelenko! Differential Revision: http://reviews.llvm.org/D11400 llvm-svn: 242930
* Simplify switch as all cases other than default return true. NFC.Chad Rosier2015-07-221-10/+0
| | | | llvm-svn: 242922
* [ARM] Make the frame lowering code ready for shrink-wrapping.Quentin Colombet2015-07-223-119/+173
| | | | | | | | Shrink-wrapping can now be tested on ARM with -enable-shrink-wrap. Related to <rdar://problem/20821730> llvm-svn: 242908
* [X86][AVX512] add reduce/range/scalef/rndScaleAsaf Badouh2015-07-225-90/+203
| | | | | | | | include encoding and intrinsics Differential Revision: http://reviews.llvm.org/D11222 llvm-svn: 242896
* [X86] Add .intel_syntax noprefix directive to intel-syntax x86 asm outputMichael Kuperstein2015-07-221-0/+1
| | | | | | | Patch by: michael.zuckerman@intel.com Differential Revision: http://reviews.llvm.org/D11223 llvm-svn: 242886
* [PM/AA] Remove all of the dead AliasAnalysis pointers being threadedChandler Carruth2015-07-221-2/+1
| | | | | | | | | | through APIs that are no longer necessary now that the update API has been removed. This will make changes to the AA interfaces significantly less disruptive (I hope). Either way, it seems like a really nice cleanup. llvm-svn: 242882
* AVX-512: Added intrinsics for VCVT* instructions.Elena Demikhovsky2015-07-222-7/+181
| | | | | | | | All SKX forms. All VCVT instructions for float/double/int/long types. Differential Revision: http://reviews.llvm.org/D11343 llvm-svn: 242877
* [BranchFolding] do not iterate the aliases of virtual registersJingyue Wu2015-07-221-1/+0
| | | | | | | | | | | | | | | | | Summary: MCRegAliasIterator only works for physical registers. So, do not run it on virtual registers. With this issue fixed, we can resurrect the BranchFolding pass in NVPTX backend. Reviewers: jholewinski, bkramer Subscribers: henryhu, meheff, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11174 llvm-svn: 242871
* [PPC64LE] More vector swap optimization TLCBill Schmidt2015-07-211-21/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes one substantive change and a few stylistic changes to the VSX swap optimization pass. The substantive change is to permit LXSDX and LXSSPX instructions to participate in swap optimization computations. The previous change to insert a swap following a SUBREG_TO_REG widening operation makes this almost trivial. I experimented with also permitting STXSDX and STXSSPX instructions. This can be done using similar techniques: we could insert a swap prior to a narrowing COPY operation, and then permit these stores to participate. I prototyped this, but discovered that the pattern of a narrowing COPY followed by an STXSDX does not occur in any of our test-suite code. So instead, I added commentary indicating that this could be done. Other TLC: - I changed SH_COPYSCALAR to SH_COPYWIDEN to more clearly indicate the direction of the copy. - I factored the insertion of swap instructions into a separate function. Finally, I added a new test case to check that the scalar-to-vector loads are working properly with swap optimization. llvm-svn: 242838
* Follow up to r242810. NFC.Chad Rosier2015-07-211-1/+1
| | | | llvm-svn: 242812
* [AArch64] Simplify the passing of arguments. NFC.Chad Rosier2015-07-211-23/+37
| | | | | | This is setup for future work planned for the AArch64 Load/Store Opt pass. llvm-svn: 242810
* AVX512 : Implemented VPMADDUBSW and VPMADDWD instruction , Igor Breger2015-07-215-1/+38
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11351 llvm-svn: 242761
* [ARM] Define subtarget feature "reserve-r9", which is used to decideAkira Hatanaka2015-07-213-13/+12
| | | | | | | | | | | | | | | | | | | | whether register r9 should be reserved. This recommits r242737, which broke bots because the number of subtarget features went over the limit of 64. This change is needed because we cannot use a backend option to set cl::opt "arm-reserve-r9" when doing LTO. Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to reserve r9 should make changes to add subtarget feature "reserve-r9" to the IR. rdar://problem/21529937 Differential Revision: http://reviews.llvm.org/D11320 llvm-svn: 242756
* AMDGPU: Set isMoveImm on s_movk_i32Matt Arsenault2015-07-211-1/+1
| | | | llvm-svn: 242747
* ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common codeMatthias Braun2015-07-211-166/+186
| | | | | | | | | | | | | | | Re-apply of r241928 which had to be reverted because of the r241926 revert. This commit factors out common code from MergeBaseUpdateLoadStore() and MergeBaseUpdateLSMultiple() and introduces a new function MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a strd/ldrd instruction into an strd/ldrd instruction with writeback where possible. Differential Revision: http://reviews.llvm.org/D10676 llvm-svn: 242743
* ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2Matthias Braun2015-07-211-31/+102
| | | | | | | | | | Re-apply r241926 with an additional check that r13 and r15 are not used for LDRD/STRD. See http://llvm.org/PR24190. This also already includes the fix from r241951. Differential Revision: http://reviews.llvm.org/D10623 llvm-svn: 242742
* Revert r242737.Akira Hatanaka2015-07-203-12/+13
| | | | | | | | This caused builds to fail with the following error message: error:Too many subtarget features! Bump MAX_SUBTARGET_FEATURES. llvm-svn: 242740
* [ARM] Define subtarget feature "reserve-r9", which is used to decideAkira Hatanaka2015-07-203-13/+12
| | | | | | | | | | | | | | | | | whether register r9 should be reserved. This change is needed because we cannot use a backend option to set cl::opt "arm-reserve-r9" when doing LTO. Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to reserve r9 should make changes to add subtarget feature "reserve-r9" to the IR. rdar://problem/21529937 Differential Revision: http://reviews.llvm.org/D11320 llvm-svn: 242737
* Revert "ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2"Matthias Braun2015-07-201-96/+29
| | | | | | This reverts commit r241926. This caused http://llvm.org/PR24190 llvm-svn: 242735
* Revert "ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common code"Matthias Braun2015-07-201-186/+166
| | | | | | This reverts commit r241928. This caused http://llvm.org/PR24190 llvm-svn: 242734
* Revert "ARM: Use SpecificBumpPtrAllocator to fix leak introduced in r241920"Matthias Braun2015-07-201-3/+3
| | | | | | This reverts commit r241951. It caused http://llvm.org/PR24190 llvm-svn: 242733
* AArch64: Restrict macroop fusion heuristics to cyclone.Matthias Braun2015-07-201-31/+33
| | | | | | | Even though this is just some hinting for the scheduler it doesn't make sense to do that unless you know the target can perform the fusion. llvm-svn: 242732
* Targets: commonize some stack realignment codeJF Bastien2015-07-2015-164/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch does the following: * Fix FIXME on `needsStackRealignment`: it is now shared between multiple targets, implemented in `TargetRegisterInfo`, and isn't `virtual` anymore. This will break out-of-tree targets, silently if they used `virtual` and with a build error if they used `override`. * Factor out `canRealignStack` as a `virtual` function on `TargetRegisterInfo`, by default only looks for the `no-realign-stack` function attribute. Multiple targets duplicated the same `needsStackRealignment` code: - Aarch64. - ARM. - Mips almost: had extra `DEBUG` diagnostic, which the default implementation now has. - PowerPC. - WebAssembly. - x86 almost: has an extra `-force-align-stack` option, which the default implementation now has. The default implementation of `needsStackRealignment` used to just return `false`. My current patch changes the behavior by simply using the above shared behavior. This affects: - AMDGPU - BPF - CppBackend - MSP430 - NVPTX - Sparc - SystemZ - XCore - Out-of-tree targets This is a breaking change! `make check` passes. The only implementation of the `virtual` function (besides the slight different in x86) was Hexagon (which did `MF.getFrameInfo()->getMaxAlignment() > 8`), and potentially some out-of-tree targets. Hexagon now uses the default implementation. `needsStackRealignment` was being overwritten in `<Target>GenRegisterInfo.inc`, to return `false` as the default also did. That was odd and is now gone. Reviewers: sunfish Subscribers: aemerson, llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D11160 llvm-svn: 242727
* AArch64: Add aditional Cyclone macroop fusion opportunitiesMatthias Braun2015-07-201-16/+34
| | | | | | | | Related to rdar://19205407 Differential Revision: http://reviews.llvm.org/D10746 llvm-svn: 242724
* Fix comment typo (test commit). NFCGeoff Berry2015-07-201-1/+1
| | | | llvm-svn: 242719
* [ARM] Refactor the prologue/epilogue emission to be more robust.Quentin Colombet2015-07-204-118/+219
| | | | | | | | | | | | | | | | This is the first step toward supporting shrink-wrapping for this target. The changes could be summarized by these items: - Expand the tail-call return as part of the expand pseudo pass. - Get rid of the assumptions that the epilogue is the exit block: * Do not assume which registers are free in the epilogue. (This indirectly improve the lowering of the code for the segmented stacks, see the test cases.) * Take into account that the basic block can be empty. Related to <rdar://problem/20821730> llvm-svn: 242714
* [NVPTX] make load on global readonly memory to use ldgJingyue Wu2015-07-201-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: [NVPTX] make load on global readonly memory to use ldg Summary: As describe in [1], ld.global.nc may be used to load memory by nvcc when __restrict__ is used and compiler can detect whether read-only data cache is safe to use. This patch will try to check whether ldg is safe to use and use them to replace ld.global when possible. This change can improve the performance by 18~29% on affected kernels (ratt*_kernel and rwdot*_kernel) in S3D benchmark of shoc [2]. Patched by Xuetian Weng. [1] http://docs.nvidia.com/cuda/kepler-tuning-guide/#read-only-data-cache [2] https://github.com/vetter/shoc Test Plan: test/CodeGen/NVPTX/load-with-non-coherent-cache.ll Reviewers: jholewinski, jingyue Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D11314 llvm-svn: 242713
OpenPOWER on IntegriCloud