summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [X86][FastISel] Assert that we are dealing with arithmetic with overflow ↵Zvi Rackover2016-11-151-0/+3
| | | | | | intrinsics. NFC llvm-svn: 286961
* [AMDGPU] TableGen: change individual instruction flags to bit type from bits<1>Sam Kolton2016-11-153-47/+47
| | | | | | | | | | | | Summary: This is needed to be able to use this flags in InstrMappings. Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26666 llvm-svn: 286960
* [X86][FastISel] Fix lowering of overflow result on AVX512 targetsZvi Rackover2016-11-151-2/+2
| | | | | | | | | | | | | | | | Summary: Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class. We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction. Fixes pr30981. Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet Subscribers: qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D26620 llvm-svn: 286958
* Introduce TLI predicative for base-relative Jump Tables.Joerg Sonnenberger2016-11-152-37/+5
| | | | | | | | | | | For 64bit ABIs it is common practice to use relative Jump Tables with potentially different relocation bases. As the logic for the jump table itself doesn't depend on the relocation base, make it easier for targets to use the generic logic. Start by dropping the now redundant MIPS logic. Differential Revision: https://reviews.llvm.org/D26578 llvm-svn: 286951
* [ARM] Add machine scheduler for Cortex-R52 Javed Absar2016-11-153-1/+985
| | | | | | | | | | | | | This patch adds the Sched Machine Model for Cortex-R52. Details of the pipeline and descriptions are in comments in file ARMScheduleR52.td included in this patch. Reviewers: rengolin, jmolloy Differential Revision: https://reviews.llvm.org/D26500 llvm-svn: 286949
* [X86][GlobalISel] Add minimal call lowering support to the IRTranslatorZvi Rackover2016-11-157-2/+196
| | | | | | | | | | | | | | | Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573. Reviewers: ab, qcolombet, t.p.northover Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26593 llvm-svn: 286935
* AMDGPU: Fix f16 fabs/fnegMatt Arsenault2016-11-152-4/+18
| | | | llvm-svn: 286931
* AMDGPU: Set hasExtraSrcRegAllocReq on v_div_scale_*Matt Arsenault2016-11-151-0/+2
| | | | | | | This doesn't solve any problems I know about, but this should have more conservative assumptions about the operands' llvm-svn: 286913
* AMDGPU: Fix formatting of 1/2pi immediateMatt Arsenault2016-11-151-2/+2
| | | | llvm-svn: 286912
* [AArch64] Compute the Newton series for reciprocals nativelyEvandro Menezes2016-11-143-6/+75
| | | | | | | | | | Implement the Newton series for square root, its reciprocal and reciprocal natively using the specialized instructions in AArch64 to perform each series iteration. Differential revision: https://reviews.llvm.org/D26518 llvm-svn: 286907
* [Hexagon] Give a predicate function a more meaningful nameKrzysztof Parzyszek2016-11-142-18/+18
| | | | | | | Change "orisadd" to "IsOrAdd" to follow the naming conventions, and change "isOrAdd" in the C++ code to "isOrEquivalentToAdd". llvm-svn: 286886
* ARM: try to fix GCC 4.8 compilation again after r286881.Tim Northover2016-11-141-1/+2
| | | | llvm-svn: 286882
* Recommit: ARM: sort register lists by encoding in push/pop instructions.Tim Northover2016-11-143-2/+28
| | | | | | | | | | | | | | | | | For example we were producing push {r8, r10, r11, r4, r5, r7, lr} This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea. Fixed usage of std::sort so that we (hopefully) use instantiations that actually exist in GCC 4.8. llvm-svn: 286881
* [AArch64] Change some pointers to references. NFC.Geoff Berry2016-11-141-16/+16
| | | | | | Follow-up change to r286875. llvm-svn: 286879
* [AArch64] Split 0 vector stores into scalar store pairs.Geoff Berry2016-11-141-4/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Replace a splat of zeros to a vector store by scalar stores of WZR/XZR. The load store optimizer pass will merge them to store pair stores. This should be better than a movi to create the vector zero followed by a vector store if the zero constant is not re-used, since one instructions and one register live range will be removed. For example, the final generated code should be: stp xzr, xzr, [x0] instead of: movi v0.2d, #0 str q0, [x0] Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26561 llvm-svn: 286875
* [AArch64] Factor out transform code from split16BStore. NFC.Geoff Berry2016-11-141-24/+31
| | | | llvm-svn: 286874
* Revert: r286868 - Test commitDaniel Sanders2016-11-141-1/+0
| | | | llvm-svn: 286869
* Test commitDaniel Sanders2016-11-141-0/+1
| | | | llvm-svn: 286868
* Revert "ARM: sort register lists by encoding in push/pop instructions."Tim Northover2016-11-143-28/+2
| | | | | | | This reverts commit 286866. It broke a bot, something to do with exactly which templates std::sort accepts. llvm-svn: 286867
* ARM: sort register lists by encoding in push/pop instructions.Tim Northover2016-11-143-2/+28
| | | | | | | | | | | | | | For example we were producing push {r8, r10, r11, r4, r5, r7, lr} This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea. llvm-svn: 286866
* [PPC] Add intrinsic mapping to the xscvhpsp instructionSean Fertile2016-11-141-0/+9
| | | | | | | | | add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to Single-Precision' instruction. Differential review: https://reviews.llvm.org/D26536 llvm-svn: 286862
* AMDGPU/SI: Support data types other than V4f32 in image intrinsicsChangpeng Fang2016-11-142-63/+73
| | | | | | | | | | | | | | | | Summary: Extend image intrinsics to support data types of V1F32 and V2F32. TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active, even though such case should be very rare. Reviewers: tstellarAMD Differential Revision: http://reviews.llvm.org/D26472 llvm-svn: 286860
* [Hexagon] Remove unsafe load instructions that affect Stack Slot ColoringSumanth Gundapaneni2016-11-141-12/+0
| | | | | | | | | | | | | | | | | The Stack slot coloring pass removes a store that is followed by a load that deal with the same stack slot. The function isLoadFromStackSlot is supposed to consider the loads that have no side-effects. This patch fixed the issue by removing the unsafe loads from this function Eg: %vreg0<def> = L2_loadruh_io <fi#15>, 0 S2_storeri_io <fi#15>, 0, %vreg0 In this case, we load an unsigned extended half word and store this in to the same stack slot. The Stack slot coloring pass considers safe to remove the store. This patch marked all the non-vector byte and half word loads as unsafe. llvm-svn: 286843
* [CostModel][X86] Added mul costs for vXi8 vectorsSimon Pilgrim2016-11-141-5/+21
| | | | | | More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result llvm-svn: 286838
* [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargetsSimon Pilgrim2016-11-141-0/+4
| | | | | | | | Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832
* [PPC] add intrinsics for vec extract exp/significand and vec test data class.Sean Fertile2016-11-141-6/+18
| | | | | | Differential Revision: https://reviews.llvm.org/D26272 llvm-svn: 286829
* GlobalISel: Fix indentation. NFCDiana Picus2016-11-141-1/+1
| | | | llvm-svn: 286808
* [AVX-512] Add suffixless aliases for EVEX encoded ↵Craig Topper2016-11-141-0/+10
| | | | | | | | vcvtsi2ss/vcvtsi2sd/vcvtusi2ss/vcvtusi2sd. This matches the VEX behavior. Fixes another problem from PR28850. llvm-svn: 286790
* [X86] Cleanup 'x' and 'y' mnemonic suffixes for ↵Craig Topper2016-11-143-23/+71
| | | | | | | | | | | | | vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions. -Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions. -Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions. -Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax. -Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing. This should fix at least some of PR28850. llvm-svn: 286787
* [AVX-512] Remove and autoupgrade masked dword/qword variable shift ↵Craig Topper2016-11-141-8/+0
| | | | | | intrinsics to the new unmasked versions and selects. llvm-svn: 286786
* [AVX-512] Fix a disassembler failure for AVX-512 vcmpss/vcmpsd with an ↵Craig Topper2016-11-131-4/+14
| | | | | | | | immediate larger than 32. Fix the same bug with VLX vcmpps/vcmppd. Fixes PR24941. llvm-svn: 286775
* AMDGPU: Implement SGPR spilling with scalar storesMatt Arsenault2016-11-133-10/+153
| | | | | | | | | | | | | | | | nThis avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. llvm-svn: 286766
* revert commit r286761, some builds failed on Win platformsIgor Breger2016-11-131-0/+4
| | | | llvm-svn: 286765
* [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.Ayman Musa2016-11-131-4/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D26128 llvm-svn: 286761
* [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions.Ayman Musa2016-11-132-0/+91
| | | | | | Differential Revision: https://reviews.llvm.org/D26022 llvm-svn: 286758
* [AVX-512] Add unmasked intrinsics for variable shifts of dwords and qwords.Craig Topper2016-11-131-0/+8
| | | | | | These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2. llvm-svn: 286754
* [AMDGPU] Add f16 support (VI+)Konstantin Zhuravlyov2016-11-1318-238/+617
| | | | | | Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
* [AVX-512] Remove the remaining masked shift by immediate or by single value. ↵Craig Topper2016-11-121-22/+0
| | | | | | | | Autoupgrade them to recently introduced unmasked versions and a select. After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics. llvm-svn: 286725
* [AVX-512] Add unmasked version of shift by immediate and shift by single ↵Craig Topper2016-11-121-0/+22
| | | | | | | | | | | | | | | | | | | element in XMM. Summary: This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we aleady support the sse2 and avx2 intrinsics. We need to the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new instrinsics in the frontend. This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang. Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts. Reviewers: RKSimon, zvi, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26333 llvm-svn: 286711
* [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQCraig Topper2016-11-121-28/+96
| | | | | | | | | | | | Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases. Reviewers: delena, RKSimon Subscribers: Farhana, llvm-commits Differential Revision: https://reviews.llvm.org/D26297 llvm-svn: 286709
* AMDGPU/SI: Promote i16 = fp_[us]int f32 for VITom Stellard2016-11-121-0/+6
| | | | | | | | | | | | Summary: This fixes a regression caused by r286464. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26570 llvm-svn: 286687
* AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopiesTom Stellard2016-11-111-24/+44
| | | | | | | | | | | | | | | | | | | | Summary: This pass was assuming that when a PHI instruction defined a register used by another PHI instruction that the defining insstruction would be legalized before the using instruction. This assumption was causing the pass to not legalize some PHI nodes within divergent flow-control. This fixes a bug that was uncovered by r285762. Reviewers: nhaehnle, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26303 llvm-svn: 286676
* [PowerPC] Add remaining vector permute builtins in altivec.h - LLVM portionNemanja Ivanovic2016-11-112-5/+23
| | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D26480 Adds all the intrinsics used for various permute builtins that will be added to altivec.h. llvm-svn: 286638
* [AArch64] Update a FIXME comment to reflect current state. NFC.Chad Rosier2016-11-111-2/+4
| | | | llvm-svn: 286625
* [AArch64] Fix bugs in isel lowering replaceSplatVectorStore.Geoff Berry2016-11-111-11/+27
| | | | | | | | | | | | | | | | | Summary: Fix off-by-one indexing error in loop checking that inserted value was a splat vector. Add code to check that INSERT_VECTOR_ELT nodes constructing the splat vector have the expected constant index values. Reviewers: t.p.northover, jmolloy, mcrosier Subscribers: aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D26409 llvm-svn: 286616
* [AArch64] Remove lots of redundant code. NFC.Chad Rosier2016-11-111-30/+41
| | | | llvm-svn: 286606
* [AArch64] Early return and minor renaming/refactoring to ease code review. NFC.Chad Rosier2016-11-111-43/+43
| | | | llvm-svn: 286601
* [PowerPC] Add vector conversion builtins to altivec.h - LLVM portionNemanja Ivanovic2016-11-111-8/+16
| | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D26307 Adds all the intrinsics used for various conversion builtins that will be added to altivec.h. These are type conversions between various types of vectors. llvm-svn: 286596
* [AArch64] Enable merging of adjacent zero stores for all subtargets.Chad Rosier2016-11-113-11/+1
| | | | | | | | | | | | | | | | | | | | | | | | This optimization merges adjacent zero stores into a wider store. e.g., strh wzr, [x0] strh wzr, [x0, #2] ; becomes str wzr, [x0] e.g., str wzr, [x0] str wzr, [x0, #4] ; becomes str xzr, [x0] Previously, this was only enabled for Kryo and Cortex-A57. Differential Revision: https://reviews.llvm.org/D26396 llvm-svn: 286592
* [AMDGPU] TargetStreamer: Fix .note section nameSam Kolton2016-11-111-2/+2
| | | | llvm-svn: 286591
OpenPOWER on IntegriCloud