path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [Hexagon] Give a predicate function a more meaningful name | Krzysztof Parzyszek | 2016-11-14 | 2 | -18/+18
| Change "orisadd" to "IsOrAdd" to follow the naming conventions, and change "isOrAdd" in the C++ code to "isOrEquivalentToAdd".
| llvm-svn: 286886
* ARM: try to fix GCC 4.8 compilation again after r286881. | Tim Northover | 2016-11-14 | 1 | -1/+2
| llvm-svn: 286882
* Recommit: ARM: sort register lists by encoding in push/pop instructions. | Tim Northover | 2016-11-14 | 3 | -2/+28
| For example we were producing
|
|     push {r8, r10, r11, r4, r5, r7, lr}
|
| This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea.
|
| Fixed usage of std::sort so that we (hopefully) use instantiations that actually exist in GCC 4.8.
| llvm-svn: 286881
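The ordering described in this commit can be illustrated with a minimal sketch. This is not the LLVM implementation (which is C++ and uses std::sort on machine operands); `ENCODING` and `sort_register_list` are hypothetical names, and the table only assumes the standard ARM core register numbering (r0 to r15, with sp = r13, lr = r14, pc = r15).

```python
# Hypothetical encoding table for ARM core registers: r0..r15, plus the
# usual aliases sp = r13, lr = r14, pc = r15.
ENCODING = {f"r{i}": i for i in range(16)}
ENCODING.update({"sp": 13, "lr": 14, "pc": 15})


def sort_register_list(regs):
    """Return the registers ordered by ascending hardware encoding."""
    return sorted(regs, key=lambda reg: ENCODING[reg])


# The register list from the commit message, in the order it used to be printed.
unsorted = ["r8", "r10", "r11", "r4", "r5", "r7", "lr"]
print("push {" + ", ".join(sort_register_list(unsorted)) + "}")
# prints: push {r4, r5, r7, r8, r10, r11, lr}
```

Printing the list in encoding order makes the assembly text agree with the order in which the registers are actually stored, which is the point the commit message makes.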
* [AArch64] Change some pointers to references. NFC. | Geoff Berry | 2016-11-14 | 1 | -16/+16
| Follow-up change to r286875.
| llvm-svn: 286879
* [AArch64] Split 0 vector stores into scalar store pairs. | Geoff Berry | 2016-11-14 | 1 | -4/+67
| Summary:
| Replace a vector store of a zero splat with scalar stores of WZR/XZR. The load store optimizer pass will merge them into store pair stores. This should be better than a movi to create the vector zero followed by a vector store if the zero constant is not re-used, since one instruction and one register live range will be removed.
|
| For example, the final generated code should be:
|
|     stp xzr, xzr, [x0]
|
| instead of:
|
|     movi v0.2d, #0
|     str q0, [x0]
|
| Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy
| Subscribers: aemerson, rengolin, llvm-commits
| Differential Revision: https://reviews.llvm.org/D26561
| llvm-svn: 286875
* [AArch64] Factor out transform code from split16BStore. NFC. | Geoff Berry | 2016-11-14 | 1 | -24/+31
| llvm-svn: 286874
* Revert: r286868 - Test commit | Daniel Sanders | 2016-11-14 | 1 | -1/+0
| llvm-svn: 286869
* Test commit | Daniel Sanders | 2016-11-14 | 1 | -0/+1
| llvm-svn: 286868
* Revert "ARM: sort register lists by encoding in push/pop instructions." | Tim Northover | 2016-11-14 | 3 | -28/+2
| This reverts commit 286866. It broke a bot, something to do with exactly which templates std::sort accepts.
| llvm-svn: 286867
* ARM: sort register lists by encoding in push/pop instructions. | Tim Northover | 2016-11-14 | 3 | -2/+28
| For example we were producing
|
|     push {r8, r10, r11, r4, r5, r7, lr}
|
| This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea.
| llvm-svn: 286866
* [PPC] Add intrinsic mapping to the xscvhpsp instruction | Sean Fertile | 2016-11-14 | 1 | -0/+9
| Add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to Single-Precision' instruction.
| Differential review: https://reviews.llvm.org/D26536
| llvm-svn: 286862
* AMDGPU/SI: Support data types other than V4f32 in image intrinsics | Changpeng Fang | 2016-11-14 | 2 | -63/+73
| Summary:
| Extend image intrinsics to support data types of V1F32 and V2F32.
|
| TODO: we should define a mapping table to change the opcode when the data type is V2F32 but only one channel is active, even though such a case should be very rare.
|
| Reviewers: tstellarAMD
| Differential Revision: http://reviews.llvm.org/D26472
| llvm-svn: 286860
* [Hexagon] Remove unsafe load instructions that affect Stack Slot Coloring | Sumanth Gundapaneni | 2016-11-14 | 1 | -12/+0
| The stack slot coloring pass removes a store that is followed by a load that deals with the same stack slot. The function isLoadFromStackSlot is supposed to consider only the loads that have no side-effects. This patch fixes the issue by removing the unsafe loads from this function.
|
| Eg:
|
|     %vreg0<def> = L2_loadruh_io <fi#15>, 0
|     S2_storeri_io <fi#15>, 0, %vreg0
|
| In this case, we load an unsigned extended half word and store it into the same stack slot. The stack slot coloring pass considers it safe to remove the store. This patch marks all the non-vector byte and half word loads as unsafe.
| llvm-svn: 286843
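Why the store in the example above is not redundant can be shown with a small memory model. This is only a sketch under stated assumptions: the slot is modelled as four little-endian bytes, `load_unsigned_halfword` stands in for L2_loadruh_io (a zero-extending 16-bit load) and `store_word` stands in for S2_storeri_io (assumed here to be a 32-bit store); both helper names are hypothetical.

```python
import struct


def load_unsigned_halfword(slot: bytearray) -> int:
    """Read the low 16 bits of the slot and zero-extend them."""
    return struct.unpack_from("<H", slot, 0)[0]


def store_word(slot: bytearray, value: int) -> None:
    """Write a full 32-bit word into the slot."""
    struct.pack_into("<I", slot, 0, value)


# The stack slot initially holds a full 32-bit value.
slot = bytearray(struct.pack("<I", 0xDEADBEEF))
before = bytes(slot)

value = load_unsigned_halfword(slot)  # only the low half survives the load
store_word(slot, value)               # the upper half is overwritten with zeros
after = bytes(slot)

# If the store were truly redundant, the slot contents would be unchanged.
store_is_redundant = before == after
print(hex(value), store_is_redundant)
```

Because the narrow load discards the upper half of the slot, writing the loaded value back changes memory, so treating the pair as a plain reload and dropping the store would alter program behaviour.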
* [CostModel][X86] Added mul costs for vXi8 vectors | Simon Pilgrim | 2016-11-14 | 1 | -5/+21
| More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result
| llvm-svn: 286838
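The extend, multiply, truncate sequence that this cost reflects can be sketched per lane as follows. This is an illustrative model rather than the generated code: `mul_i8_lanes_via_i16` is a hypothetical name, lanes are modelled as Python integers in the range 0 to 255, and the 16-bit multiply keeps only the low 16 bits of the product, as PMULLW does.

```python
def mul_i8_lanes_via_i16(a, b):
    """Multiply 8-bit lanes by widening to 16 bits, multiplying, and truncating."""
    result = []
    for x, y in zip(a, b):
        wide_x = x & 0xFFFF                  # extend each i8 lane to i16
        wide_y = y & 0xFFFF
        low16 = (wide_x * wide_y) & 0xFFFF   # 16-bit multiply, low half of the product
        result.append(low16 & 0xFF)          # truncate back to i8
    return result


print(mul_i8_lanes_via_i16([200, 3, 255], [2, 5, 255]))
# prints: [144, 15, 1]
```

The low 8 bits of the product depend only on the low 8 bits of the inputs, so the widened multiply gives the same result as a native 8-bit multiply would; the extra extend and truncate steps are what make the operation more expensive.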
* [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets | Simon Pilgrim | 2016-11-14 | 1 | -0/+4
| Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.
|
| This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)
| llvm-svn: 286832
* [PPC] add intrinsics for vec extract exp/significand and vec test data class. | Sean Fertile | 2016-11-14 | 1 | -6/+18
| Differential Revision: https://reviews.llvm.org/D26272
| llvm-svn: 286829
* GlobalISel: Fix indentation. NFC | Diana Picus | 2016-11-14 | 1 | -1/+1
| llvm-svn: 286808
* [AVX-512] Add suffixless aliases for EVEX encoded vcvtsi2ss/vcvtsi2sd/vcvtusi2ss/vcvtusi2sd. | Craig Topper | 2016-11-14 | 1 | -0/+10
| This matches the VEX behavior. Fixes another problem from PR28850.
| llvm-svn: 286790
* [X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions. | Craig Topper | 2016-11-14 | 3 | -23/+71
| - Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
| - Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
| - Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
| - Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.
|
| This should fix at least some of PR28850.
| llvm-svn: 286787
* [AVX-512] Remove and autoupgrade masked dword/qword variable shift intrinsics to the new unmasked versions and selects. | Craig Topper | 2016-11-14 | 1 | -8/+0
| llvm-svn: 286786
* [AVX-512] Fix a disassembler failure for AVX-512 vcmpss/vcmpsd with an immediate larger than 32. | Craig Topper | 2016-11-13 | 1 | -4/+14
| Fix the same bug with VLX vcmpps/vcmppd. Fixes PR24941.
| llvm-svn: 286775
* AMDGPU: Implement SGPR spilling with scalar stores | Matt Arsenault | 2016-11-13 | 3 | -10/+153
| This avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added.
|
| This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path.
|
| The cache also needs to be flushed before wave termination if a scalar store is used.
| llvm-svn: 286766
* revert commit r286761, some builds failed on Win platforms | Igor Breger | 2016-11-13 | 1 | -0/+4
| llvm-svn: 286765
* [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics. | Ayman Musa | 2016-11-13 | 1 | -4/+0
| Differential Revision: https://reviews.llvm.org/D26128
| llvm-svn: 286761
* [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions. | Ayman Musa | 2016-11-13 | 2 | -0/+91
| Differential Revision: https://reviews.llvm.org/D26022
| llvm-svn: 286758
* [AVX-512] Add unmasked intrinsics for variable shifts of dwords and qwords. | Craig Topper | 2016-11-13 | 1 | -0/+8
| These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.
| llvm-svn: 286754
* [AMDGPU] Add f16 support (VI+) | Konstantin Zhuravlyov | 2016-11-13 | 18 | -238/+617
| Differential Revision: https://reviews.llvm.org/D25975
| llvm-svn: 286753
* [AVX-512] Remove the remaining masked shift by immediate or by single value. Autoupgrade them to recently introduced unmasked versions and a select. | Craig Topper | 2016-11-12 | 1 | -22/+0
| After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics.
| llvm-svn: 286725
* [AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM. | Craig Topper | 2016-11-12 | 1 | -0/+22
| Summary:
| This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we already support the sse2 and avx2 intrinsics. We need the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new intrinsics in the frontend.
|
| This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.
|
| Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.
|
| Reviewers: RKSimon, zvi, delena
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D26333
| llvm-svn: 286711
* [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ | Craig Topper | 2016-11-12 | 1 | -28/+96
| Summary:
| VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.
|
| Reviewers: delena, RKSimon
| Subscribers: Farhana, llvm-commits
| Differential Revision: https://reviews.llvm.org/D26297
| llvm-svn: 286709
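The whole-register behaviour mentioned above can be modelled on element lists. This is a rough sketch, assuming the usual description of VALIGND/VALIGNQ (concatenate two sources, shift right by an element count, keep the low half); `valign` and `rotate_elements` are hypothetical helper names, and index 0 stands for the least significant element.

```python
def valign(high, low, imm):
    """Concatenate high:low, shift right by imm elements, keep the low len(low) elements."""
    assert len(high) == len(low)
    count = len(low)
    combined = low + high      # index 0 is the least significant element
    shift = imm % count        # keep the shift inside one vector width
    return combined[shift:shift + count]


def rotate_elements(vec, imm):
    """A rotate of one vector is an align of the vector with itself."""
    return valign(vec, vec, imm)


print(valign([4, 5, 6, 7], [0, 1, 2, 3], 1))  # prints: [1, 2, 3, 4]
print(rotate_elements([0, 1, 2, 3], 1))       # prints: [1, 2, 3, 0]
```

Because the shift spans the entire register rather than each 128-bit lane, a shuffle mask that is a rotation of all elements can map to a single instruction, which is why the rotate detection used for PALIGNR is reusable here.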
* AMDGPU/SI: Promote i16 = fp_[us]int f32 for VI | Tom Stellard | 2016-11-12 | 1 | -0/+6
| Summary: This fixes a regression caused by r286464.
|
| Reviewers: arsenm
| Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
| Differential Revision: https://reviews.llvm.org/D26570
| llvm-svn: 286687
* AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopies | Tom Stellard | 2016-11-11 | 1 | -24/+44
| Summary:
| This pass was assuming that when a PHI instruction defined a register used by another PHI instruction, the defining instruction would be legalized before the using instruction.
|
| This assumption was causing the pass to not legalize some PHI nodes within divergent flow-control.
|
| This fixes a bug that was uncovered by r285762.
|
| Reviewers: nhaehnle, arsenm
| Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
| Differential Revision: https://reviews.llvm.org/D26303
| llvm-svn: 286676
* [PowerPC] Add remaining vector permute builtins in altivec.h - LLVM portion | Nemanja Ivanovic | 2016-11-11 | 2 | -5/+23
| This patch corresponds to review: https://reviews.llvm.org/D26480
|
| Adds all the intrinsics used for various permute builtins that will be added to altivec.h.
| llvm-svn: 286638
* [AArch64] Update a FIXME comment to reflect current state. NFC. | Chad Rosier | 2016-11-11 | 1 | -2/+4
| llvm-svn: 286625
* [AArch64] Fix bugs in isel lowering replaceSplatVectorStore. | Geoff Berry | 2016-11-11 | 1 | -11/+27
| Summary:
| Fix off-by-one indexing error in loop checking that inserted value was a splat vector.
|
| Add code to check that INSERT_VECTOR_ELT nodes constructing the splat vector have the expected constant index values.
|
| Reviewers: t.p.northover, jmolloy, mcrosier
| Subscribers: aemerson, llvm-commits, rengolin
| Differential Revision: https://reviews.llvm.org/D26409
| llvm-svn: 286616
* [AArch64] Remove lots of redundant code. NFC. | Chad Rosier | 2016-11-11 | 1 | -30/+41
| llvm-svn: 286606
* [AArch64] Early return and minor renaming/refactoring to ease code review. NFC. | Chad Rosier | 2016-11-11 | 1 | -43/+43
| llvm-svn: 286601
* [PowerPC] Add vector conversion builtins to altivec.h - LLVM portion | Nemanja Ivanovic | 2016-11-11 | 1 | -8/+16
| This patch corresponds to review: https://reviews.llvm.org/D26307
|
| Adds all the intrinsics used for various conversion builtins that will be added to altivec.h. These are type conversions between various types of vectors.
| llvm-svn: 286596
* [AArch64] Enable merging of adjacent zero stores for all subtargets. | Chad Rosier | 2016-11-11 | 3 | -11/+1
| This optimization merges adjacent zero stores into a wider store.
|
| e.g.,
|
|     strh wzr, [x0]
|     strh wzr, [x0, #2]
|     ; becomes
|     str wzr, [x0]
|
| e.g.,
|
|     str wzr, [x0]
|     str wzr, [x0, #4]
|     ; becomes
|     str xzr, [x0]
|
| Previously, this was only enabled for Kryo and Cortex-A57.
|
| Differential Revision: https://reviews.llvm.org/D26396
| llvm-svn: 286592
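The merging rule in the two examples above can be sketched as a single pass over a list of zero stores. This is a simplified model and not the AArch64 load/store optimizer: every store is assumed to use the same base register and to be representable as a hypothetical `(width_in_bytes, offset)` pair, and alignment, aliasing and instruction-ordering checks are left out.

```python
from typing import List, Tuple

ZeroStore = Tuple[int, int]  # (width in bytes, byte offset from the base register)


def merge_adjacent_zero_stores(stores: List[ZeroStore]) -> List[ZeroStore]:
    """Merge neighbouring equal-width zero stores at consecutive offsets."""
    merged: List[ZeroStore] = []
    i = 0
    while i < len(stores):
        width, offset = stores[i]
        if i + 1 < len(stores) and width < 8:
            next_width, next_offset = stores[i + 1]
            # Two stores of the same width that touch back-to-back bytes
            # can be replaced by one store of twice the width.
            if next_width == width and next_offset == offset + width:
                merged.append((2 * width, offset))
                i += 2
                continue
        merged.append((width, offset))
        i += 1
    return merged


# strh wzr, [x0]; strh wzr, [x0, #2]  becomes  str wzr, [x0]
print(merge_adjacent_zero_stores([(2, 0), (2, 2)]))  # prints: [(4, 0)]
# str wzr, [x0]; str wzr, [x0, #4]    becomes  str xzr, [x0]
print(merge_adjacent_zero_stores([(4, 0), (4, 4)]))  # prints: [(8, 0)]
```

The `width < 8` guard reflects that a single scalar zero store tops out at the 8-byte XZR form in this model; wider zeroing is handled by store pairs, which this sketch does not attempt.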
* [AMDGPU] TargetStreamer: Fix .note section name | Sam Kolton | 2016-11-11 | 1 | -2/+2
| llvm-svn: 286591
* [SystemZ] Support CL(G)T instructions | Ulrich Weigand | 2016-11-11 | 6 | -3/+58
| This adds support for the compare logical and trap (memory) instructions that were added as part of the miscellaneous instruction extensions feature with zEC12.
| llvm-svn: 286587
* [SystemZ] Support load-and-zero-rightmost-byte facility | Ulrich Weigand | 2016-11-11 | 6 | -3/+49
| This adds support for the LZRF/LZRG/LLZRGF instructions that were added on z13, and uses them for code generation where appropriate.
|
| SystemZDAGToDAGISel::tryRISBGZero is updated again to prefer LLZRGF over RISBG where both would be possible.
| llvm-svn: 286586
* [SystemZ] Use LLGT(R) instructions | Ulrich Weigand | 2016-11-11 | 5 | -46/+50
| This adds support for the 31-to-64-bit zero extension instructions LLGT and LLGTR and uses them for code generation where appropriate.
|
| Since this operation can also be performed via RISBG, we have to update SystemZDAGToDAGISel::tryRISBGZero so that we prefer LLGT over RISBG in case both are possible. The patch includes some simplification to the tryRISBGZero code; this is not intended to cause any (further) functional change in codegen.
| llvm-svn: 286585
* [ARM] Add plumbing for GlobalISel | Diana Picus | 2016-11-11 | 13 | -4/+407
| Add GlobalISel skeleton, up to the point where we can select a ret void.
| llvm-svn: 286573
* AMDGPU: Attempt to fix build failure on x86-64 selfhost build | Yaxun Liu | 2016-11-11 | 1 | -2/+0
| Remove redundant include file.
| llvm-svn: 286552
* Add a blank line for a test commit. | Sean Fertile | 2016-11-11 | 1 | -0/+1
| llvm-svn: 286550
* Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies" | Stanislav Mekhanoshin | 2016-11-11 | 2 | -26/+5
| This reverts commit r286171, it breaks piglit test fs-discard-exit-2
| llvm-svn: 286530
* Fix requirements. | Joerg Sonnenberger | 2016-11-10 | 1 | -1/+1
| llvm-svn: 286527
* Timer: Remove group-less NamedRegionTimer constructor. | Matthias Braun | 2016-11-10 | 1 | -2/+0
| The NamedRegionTimer initializer without a group name puts the Timer into the "Misc" group and is (nearly) unused. Remove it.
|
| The only user of this constructor appears to be the HexagonGenInsert pass, which creates a counter without a group to count the complete execution time of that pass. However, since every pass gets a counter from the PassManager anyway, this should be unnecessary. Also removed the pointless TimerGroup there.
|
| Differential Revision: https://reviews.llvm.org/D25582
| llvm-svn: 286524
* [DAG Combiner] Fix the native computation of the Newton series for reciprocals | Evandro Menezes | 2016-11-10 | 8 | -26/+31
| The generic infrastructure to compute the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly consider this condition if returned by a target. This patch addresses the issues to allow a target to compute the series on its own.
|
| Differential revision: https://reviews.llvm.org/D22975
| llvm-svn: 286523
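For readers unfamiliar with the Newton series mentioned above, the textbook refinement steps for a reciprocal and a reciprocal square root look roughly like this. This is a numerical sketch only, not the DAG combiner code: `refine_reciprocal` and `refine_rsqrt` are hypothetical names, and the starting estimates stand in for whatever a target's estimate instruction would return.

```python
def refine_reciprocal(a: float, estimate: float, steps: int) -> float:
    """Newton-Raphson for 1/a: x <- x * (2 - a * x)."""
    x = estimate
    for _ in range(steps):
        x = x * (2.0 - a * x)
    return x


def refine_rsqrt(a: float, estimate: float, steps: int) -> float:
    """Newton-Raphson for 1/sqrt(a): x <- x * (1.5 - 0.5 * a * x * x)."""
    x = estimate
    for _ in range(steps):
        x = x * (1.5 - 0.5 * a * x * x)
    return x


# Rough initial estimates converge quickly because the error roughly
# squares on every step.
print(refine_reciprocal(4.0, 0.2, 4))  # close to 0.25
print(refine_rsqrt(4.0, 0.4, 4))       # close to 0.5
```

Each step uses only multiplies and subtracts, which is why a cheap hardware estimate followed by a few refinement steps can replace a full divide or square root; the number of steps needed depends on the precision of the initial estimate.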