path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* Fix buildbot breakages from r317503. Add parentheses to assignment when using result as a condition.  (Graham Yiu, 2017-11-06, 1 file, -2/+2)
  llvm-svn: 317508

* Adds code to PPC ISEL lowering to recognize byte inserts from vector_shuffles, and use P9 shift and vector insert byte instructions instead of vperm.  (Graham Yiu, 2017-11-06, 3 files, -3/+117)
  Extends tests from vector insert half-word.
  Differential Revision: https://reviews.llvm.org/D34497
  llvm-svn: 317503

* [PPC] Use xxbrd to speed up bswap64  (Guozhi Wei, 2017-11-06, 2 files, -2/+24)
  Power doesn't have bswap instructions, so LLVM generates the following code sequence for bswap64:
    rotldi 5, 3, 16
    rotldi 4, 3, 8
    rotldi 9, 3, 24
    rotldi 10, 3, 32
    rotldi 11, 3, 48
    rotldi 12, 3, 56
    rldimi 4, 5, 8, 48
    rldimi 4, 9, 16, 40
    rldimi 4, 10, 24, 32
    rldimi 4, 11, 40, 16
    rldimi 4, 12, 48, 8
    rldimi 4, 3, 56, 0
  But Power9 has vector bswap instructions, which can also be used to speed up the scalar bswap intrinsic. With this patch, bswap64 is translated to:
    mtvsrdd 34, 3, 3
    xxbrd 34, 34
    mfvsrld 3, 34
  Differential Revision: https://reviews.llvm.org/D39510
  llvm-svn: 317499
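
  For reference, a minimal C++ snippet (illustrative, not part of the patch) that exercises this path; the builtin lowers to the llvm.bswap.i64 intrinsic that this commit retargets on Power9:

    #include <cstdint>

    // Reverses the byte order of a 64-bit value. With this patch and a Power9
    // target, the intrinsic should lower to mtvsrdd + xxbrd + mfvsrld instead
    // of the rotldi/rldimi sequence shown above.
    uint64_t swap64(uint64_t x) {
      return __builtin_bswap64(x);
    }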

* AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32  (Matt Arsenault, 2017-11-06, 5 files, -13/+90)
  llvm-svn: 317492

* [IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag  (Sanjay Patel, 2017-11-06, 2 files, -2/+2)
  As discussed on llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html) and again more recently (http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html), this is a step in cleaning up our fast-math-flags implementation in IR to better match the capabilities of both clang's user-visible flags and the backend's flags for SDNode.
  As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation: 'AllowReassoc'. We're also adding a bit to allow approximations for library functions called 'ApproxFunc' (this was initially proposed as 'libm' or similar). ...and we're out of bits. 7 bits ought to be enough for anyone, right? :)
  FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (a spacious 16 bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day.
  We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
    %f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
  ...made me think we want to keep the shortcut synonym.
  Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. This statement: "Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR." (http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility) provides the flexibility we want to make this change without requiring a new IR version. That is, we're not loosening the FP strictness of existing IR. At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.
  Note: an inter-dependent clang commit to use the new API name should closely follow this commit.
  Differential Revision: https://reviews.llvm.org/D39304
  llvm-svn: 317488
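
  As a rough sketch of how the finer-grained flags are set through LLVM's C++ API (illustrative only; the exact setter names at this revision may differ slightly):

    #include "llvm/IR/IRBuilder.h"

    using namespace llvm;

    // Instead of the old all-or-nothing 'UnsafeAlgebra' bit, individual flags
    // are set; here only reassociation and approximate libm calls are allowed.
    Value *createRelaxedFAdd(IRBuilder<> &B, Value *Op1, Value *Op2) {
      FastMathFlags FMF;
      FMF.setAllowReassoc(); // the new 'reassoc' bit
      FMF.setApproxFunc();   // the new 'afn' bit
      B.setFastMathFlags(FMF);
      return B.CreateFAdd(Op1, Op2, "f.relaxed");
    }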

* [X86][SSE] Merge combineExtractVectorElt_SSE into combineExtractVectorElt. NFCI.  (Simon Pilgrim, 2017-11-06, 1 file, -12/+8)
  We still early-out for X86ISD::PEXTRW/X86ISD::PEXTRB, so there is no actual change in behaviour, but it will make it easier to add support in a future patch.
  llvm-svn: 317485

* [X86][SSE] Combine EXTRACT_VECTOR_ELT with combineExtractWithShuffle before XFormVExtractWithShuffleIntoLoad  (Simon Pilgrim, 2017-11-06, 1 file, -2/+2)
  combineExtractWithShuffle can handle more complex shuffles/bitcasts than the equivalent code in XFormVExtractWithShuffleIntoLoad. This is mainly a compile-time improvement for now (the combineExtractWithShuffle combines would have always failed later on inside XFormVExtractWithShuffleIntoLoad), and it will let us merge combineExtractVectorElt_SSE in a future commit.
  llvm-svn: 317481

* [AMDGPU] Change alloca addr space of r600 to 5 for amdgiz environment  (Yaxun Liu, 2017-11-06, 1 file, -0/+4)
  Differential Revision: https://reviews.llvm.org/D39657
  llvm-svn: 317479

* [SystemZ] implement hasDivRemOp()  (Jonas Paulsson, 2017-11-06, 2 files, -0/+6)
  SystemZ can do division and remainder in a single instruction for scalar integer types, which is now reflected by returning true in this hook for those cases.
  Review: Ulrich Weigand
  llvm-svn: 317477
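
  A trivial illustration (not from the patch): when the target reports a combined div/rem operation, the optimizer can keep a quotient/remainder pair together rather than rewriting the remainder as a multiply and subtract.

    // On SystemZ a single instruction can produce both results, so this pair
    // of operations does not need to be decomposed.
    void divmod(long a, long b, long &quot, long &rem) {
      quot = a / b;
      rem = a % b;
    }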

* [AMDGPU] Fix assertion due to assuming pointer in default addr space is 32 bit  (Yaxun Liu, 2017-11-06, 1 file, -5/+10)
  The backend assumes a pointer in the default address space is 32 bit, which is not true for the new address space mapping and causes an assertion for unresolved functions. This patch fixes that.
  Differential Revision: https://reviews.llvm.org/D39643
  llvm-svn: 317476

* [mips] Add movep for microMIPS32R6 and fix microMIPS32r3 version  (Simon Dardis, 2017-11-06, 8 files, -5/+51)
  Previously, the 'movep' instruction was defined for microMIPS32r3 and shared that definition with microMIPS32R6. 'movep' was re-encoded for microMIPS32r6, so this patch provides the correct encoding.
  Secondly, correct the encoding of the 'rs' and 'rt' operands, which have an instruction-specific encoding for the registers those operands accept.
  Finally, correct the decoding of the 'dst_regs' operand, which was meant to extract the relevant field from the instruction but was actually extracting it from an already extracted field.
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D39495
  llvm-svn: 317475

* [LV][X86] update the cost of interleaving mem. access of floats  (Mohammed Agabaria, 2017-11-06, 1 file, -1/+4)
  Recommit: this patch contains an update of the costs of interleaved loads of v8f32 with strides 3 and 8. Fixed the location of the lit test; it works with make check-all.
  Differential Revision: https://reviews.llvm.org/D39403
  llvm-svn: 317471
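
  For context, a sketch (not from the patch) of the kind of strided float access whose interleave-group cost is being tuned here; with stride 3, the vectorized loop issues groups of v8f32 loads that must be de-interleaved:

    // The three loads form a stride-3 interleave group when this loop is
    // vectorized; the X86 cost model now prices such v8f32 groups differently.
    float dotStride3(const float *xyz, int n) {
      float acc = 0.0f;
      for (int i = 0; i < n; ++i)
        acc += xyz[3 * i] * xyz[3 * i + 1] + xyz[3 * i + 2];
      return acc;
    }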

* [mips] Fix PR35140  (Simon Dardis, 2017-11-06, 1 file, -4/+4)
  Mark all symbols involved with TLS relocations as being TLS symbols. This resolves PR35140.
  Thanks to Alex Crichton for reporting the issue!
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D39591
  llvm-svn: 317470

* [X86][AVX512] Improve lowering of AVX512 test intrinsics  (Uriel Korach, 2017-11-06, 2 files, -4/+20)
  Added TESTM and TESTNM to the list of instructions that already zero the unused upper bits, so they do not need the redundant shift-left and shift-right instructions afterwards. Added a pattern for TESTM and TESTNM in ISel lowering, so that icmp(ne, and(X,Y), 0) now folds into TESTM and icmp(eq, and(X,Y), 0) folds into TESTNM.
  This commit is a preparation for lowering the test and testn X86 intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D38732
  llvm-svn: 317465

* [X86] Replace duplicate function call with variable. NFC  (Uriel Korach, 2017-11-06, 1 file, -2/+2)
  Change from:
    if (N->getOperand(0).getValueType() == MVT::v8i32 ||
        N->getOperand(0).getValueType() == MVT::v8f32)
  to:
    EVT OpVT = N->getOperand(0).getValueType();
    if (OpVT == MVT::v8i32 || OpVT == MVT::v8f32)
  Change-Id: I5a105f8710b73a828e6cfcd55fac2eae6153ce25
  llvm-svn: 317464

* X86 ISel: Basic support for variable-index vector permutations  (Zvi Rackover, 2017-11-06, 1 file, -0/+108)
  Summary: Try to lower a BUILD_VECTOR composed of extract-extract chains that can be reasoned to be a permutation of a vector by indices in a non-constant vector.
  We saw this pattern created by ISPC, which resorts to creating it due to the requirement that shufflevector's mask operand be a *constant* vector. I didn't check this, but we could possibly use this pattern for lowering the X86 permute C-intrinsics instead of the llvm.x86 intrinsics.
  This change can be followed by more improvements:
  1. Handle vectors with undef elements.
  2. Utilize pshufb and zero-mask-blending to support more efficient construction of vectors with constant-0 elements.
  3. Use smaller-element vectors of the same width, and "interpolate" the indices, when no native operation is available.
  Reviewers: RKSimon, craig.topper
  Reviewed By: RKSimon
  Subscribers: chandlerc, DavidKreitzer
  Differential Revision: https://reviews.llvm.org/D39126
  llvm-svn: 317463
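
  A rough sketch (using Clang vector extensions; not code from the patch) of the source-level pattern that becomes a BUILD_VECTOR of extractelements with non-constant indices, since shufflevector cannot take a runtime mask:

    typedef float v8f32 __attribute__((vector_size(32)));
    typedef int   v8i32 __attribute__((vector_size(32)));

    // Each result lane selects a source lane chosen at run time, so the front
    // end emits per-lane extract/insert chains instead of a shufflevector;
    // this commit lets the X86 backend turn that chain into a variable permute.
    v8f32 permuteByIndex(v8f32 src, v8i32 idx) {
      v8f32 out = src;
      for (int i = 0; i < 8; ++i)
        out[i] = src[idx[i] & 7];
      return out;
    }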
* Revert "adding a pattern for broadcastm"Jina Nahias2017-11-061-2/+2
| | | | | | | This reverts commit r317457. Change-Id: If07f1fca1e3453d16c1dac906e87768661384e91 llvm-svn: 317462
* [x86][AVX512] Lowering Broadcastm intrinsics to LLVM IRJina Nahias2017-11-062-17/+17
| | | | | | | | | This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38684 Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540 llvm-svn: 317458
* adding a pattern for broadcastmJina Nahias2017-11-061-2/+2
| | | | | Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5 llvm-svn: 317457
* [X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible.Craig Topper2017-11-062-39/+49
| | | | llvm-svn: 317454
* [X86] Add scalar FMA ISD nodes without rounding mode. NFCCraig Topper2017-11-065-37/+92
| | | | | | Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers. llvm-svn: 317453
* [X86] Use EVEX encoded instructions for legacy scalar sqrt intrinsics.Craig Topper2017-11-062-10/+19
| | | | | | Fixes PR35161. llvm-svn: 317445

* [X86] Add missing predicate to a pattern. NFC  (Craig Topper, 2017-11-05, 1 file, -0/+2)
  Other patterns had higher priority, so this wasn't noticed. But we shouldn't be dependent on pattern order.
  llvm-svn: 317442

* [X86] Remove some more RCP and RSQRT patterns from InstrAVX512.td that I missed in r317413.  (Craig Topper, 2017-11-05, 2 files, -25/+12)
  llvm-svn: 317441

* [X86] Fix outdated comment. NFC  (Craig Topper, 2017-11-05, 1 file, -1/+1)
  llvm-svn: 317440

* [REVERT][LV][X86] update the cost of interleaving mem. access of floats  (Mohammed Agabaria, 2017-11-05, 1 file, -4/+1)
  Reverted; my changes will be committed later after fixing the failure.
  This patch contains an update of the costs of interleaved loads of v8f32 with strides 3 and 8.
  Differential Revision: https://reviews.llvm.org/D39403
  llvm-svn: 317433

* [LV][X86] update the cost of interleaving mem. access of floats  (Mohammed Agabaria, 2017-11-05, 1 file, -1/+4)
  This patch contains an update of the costs of interleaved loads of v8f32 with strides 3 and 8.
  Differential Revision: https://reviews.llvm.org/D39403
  llvm-svn: 317432

* [X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled.  (Craig Topper, 2017-11-04, 6 files, -29/+35)
  Summary: AVX512 added RCP14 and RSQRT14 instructions, which improve accuracy over the legacy RCP and RSQRT instructions, but not by enough to remove the need for a Newton-Raphson refinement. Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics, and we use them for fast-math optimization of division and reciprocal sqrt.
  I think switching the legacy intrinsics may be surprising to the user, since it changes the answer based on which processor you're using regardless of any fast-math settings. It's also odd that we did something different between scalar and packed. As for the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc, icc and godbolt suggests they don't change which instructions they use here.
  This patch adds new X86ISD nodes for RCP14/RSQRT14 and uses those for the new intrinsics, leaving the old intrinsics to use the old instructions.
  Going forward I think our focus should be on:
  - Supporting 512-bit vectors, which will have to use RCP14/RSQRT14.
  - Using RSQRT28/RCP28 to remove the Newton-Raphson step on processors with AVX512ER.
  - Supporting double precision.
  Reviewers: zvi, DavidKreitzer, RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D39583
  llvm-svn: 317413
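
  For reference, a generic sketch (not from the patch) of the Newton-Raphson refinement mentioned above: a hardware estimate y0 of 1/sqrt(x) is improved with one iteration, y1 = y0 * (1.5 - 0.5 * x * y0 * y0), which is roughly the fix-up emitted after rsqrtps/vrsqrt14ps when fast math requests an approximate reciprocal square root.

    #include <immintrin.h>

    // One Newton-Raphson step applied to a reciprocal square-root estimate:
    //   y1 = 0.5 * y0 * (3 - x * y0 * y0)
    __m128 rsqrt_refined(__m128 x) {
      __m128 y0   = _mm_rsqrt_ps(x);                  // ~12-bit estimate
      __m128 xyy  = _mm_mul_ps(_mm_mul_ps(x, y0), y0);
      __m128 corr = _mm_sub_ps(_mm_set1_ps(3.0f), xyy);
      return _mm_mul_ps(_mm_mul_ps(_mm_set1_ps(0.5f), y0), corr);
    }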

* [X86] Teach EVEX->VEX pass to turn SHUFI32X4/SHUFF32X4/SHUFI64X2/SHUFF64X2 into VPERM2F128/VPERM2I128.  (Craig Topper, 2017-11-04, 1 file, -1/+19)
  This recovers some of the tests that were changed by r317403.
  llvm-svn: 317410

* [AMDGPU] Remove hardcoded address space value from AMDGPULibFunc  (Yaxun Liu, 2017-11-04, 3 files, -24/+29)
  AMDGPULibFunc hardcodes address space values of the old address space mapping, which causes invalid addrspacecast instructions and undefined functions in the APPSDK sample MonteCarloAsianDP. This patch fixes that.
  Differential Revision: https://reviews.llvm.org/D39616
  llvm-svn: 317409

* [X86] Teach shuffle lowering to use 256-bit SHUF128 when possible.  (Craig Topper, 2017-11-04, 1 file, -0/+10)
  This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary. As a follow-up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out.
  llvm-svn: 317403

* [X86] Give unary PERMI priority over SHUF128 in lowerV8I64VectorShuffle to make it possible to fold a load.  (Craig Topper, 2017-11-03, 1 file, -4/+4)
  llvm-svn: 317382

* Move TargetFrameLowering.h to CodeGen where it's implemented  (David Blaikie, 2017-11-03, 45 files, -45/+45)
  This header already includes a CodeGen header and is implemented in lib/CodeGen, so move the header there to match.
  This fixes a link error with modular code-generation builds, where a header and its implementation are circularly dependent and so need to be in the same library, not split between two like this.
  llvm-svn: 317379

* Add llvm::for_each as a range-based extension to <algorithm> and make use of it in some cases where it is a clearer alternative to std::for_each.  (Aaron Ballman, 2017-11-03, 1 file, -8/+7)
  llvm-svn: 317356
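
  A small usage sketch of the range-based form (illustrative, not code from the commit):

    #include "llvm/ADT/STLExtras.h"
    #include <vector>

    // llvm::for_each takes the container itself rather than a begin()/end()
    // iterator pair.
    int sumAll(const std::vector<int> &Vals) {
      int Sum = 0;
      llvm::for_each(Vals, [&Sum](int V) { Sum += V; });
      return Sum;
    }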

* [AArch64] Fix the number of iterations for the Newton series  (Evandro Menezes, 2017-11-03, 1 file, -1/+1)
  The number of iterations was incorrectly determined for DP FP vector types and the tests were insufficient to flag this issue.
  Differential revision: https://reviews.llvm.org/D39507
  llvm-svn: 317349

* [mips] Match 'ins' and its variants with C++ code  (Simon Dardis, 2017-11-03, 5 files, -12/+70)
  Change the ISel matching of 'ins' and 'dins[mu]' from TableGen code to C++ code. This resolves an issue where ISel would select 'dins' instead of 'dinsm' when the instruction's size and position were individually in range but their sum was out of range according to the ISA specification.
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D39117
  llvm-svn: 317331

* Fix for Bug 34475 - LOCK/REP/REPNE prefixes emitted as instruction on their own.  (Andrew V. Tischenko, 2017-11-03, 2 files, -6/+6)
  Differential Revision: https://reviews.llvm.org/D39546
  llvm-svn: 317330

* [X86][SSE] Add PACKUS support to combineVectorTruncation  (Simon Pilgrim, 2017-11-03, 1 file, -6/+16)
  Similar to the existing code that lowers to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW.
  llvm-svn: 317315
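
  An illustrative sketch (Clang vector extensions; not code from the patch) of a truncation where the known-zero upper bits make the unsigned pack usable:

    typedef unsigned int   v8u32 __attribute__((vector_size(32)));
    typedef unsigned short v8u16 __attribute__((vector_size(16)));

    // Masking each lane with 0xFF clears every bit above the truncated value,
    // so the 32->16 bit truncation can be done with PACKUS-style packing
    // rather than a longer shuffle sequence.
    v8u16 truncateMasked(v8u32 v) {
      v &= 0xFF;
      return __builtin_convertvector(v, v8u16);
    }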

* [ARM GlobalISel] Move the check for Thumb higher up  (Diana Picus, 2017-11-03, 1 file, -6/+6)
  We're currently bailing out for Thumb targets while lowering formal parameters, but there used to be some other checks before it, which could have caused some functions (e.g. those without formal parameters) to sneak through unnoticed.
  llvm-svn: 317312

* [AArch64] Use dwarf exception handling on MinGW  (Martin Storsjo, 2017-11-03, 2 files, -1/+11)
  Ideally we should probably produce WinEH here as well, but until then we can use dwarf exceptions, without any further changes required in clang, libunwind or libcxxabi.
  Differential Revision: https://reviews.llvm.org/D39535
  llvm-svn: 317304

* [X86] Remove PALIGNR/VALIGN handling from combineBitcastForMaskedOp and move to isel patterns instead. Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible.  (Craig Topper, 2017-11-03, 2 files, -22/+131)
  llvm-svn: 317299

* Avoid PLT for external calls when attribute nonlazybind is used.  (Sriraman Tallam, 2017-11-03, 1 file, -2/+9)
  Differential Revision: https://reviews.llvm.org/D39065
  llvm-svn: 317292

* [AArch64][RegisterBankInfo] Add mapping for G_FPEXT.  (Quentin Colombet, 2017-11-02, 3 files, -1/+89)
  This fixes http://llvm.org/PR32560. We were missing a description for the half floating-point type and as a result were using the FPR32 mapping. Because of the size mismatch, the generic code was complaining that the default mapping is not appropriate. Fix the mapping description so that the default mapping can be properly applied.
  llvm-svn: 317287

* [AArch64][RegisterBankInfo] Add FPR16 support in value mapping.  (Quentin Colombet, 2017-11-02, 3 files, -35/+48)
  NFC.
  llvm-svn: 317286

* [X86] Give AVX512VL instructions priority over their AVX equivalents.  (Craig Topper, 2017-11-02, 1 file, -2/+2)
  I thought we had gotten all these priority bugs worked out, but I guess not.
  llvm-svn: 317283

* AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field]  (Konstantin Zhuravlyov, 2017-11-02, 1 file, -1/+0)
  llvm-svn: 317280

* [Hexagon] Prefer L2_loadrub_io over L4_loadrub_rr  (Krzysztof Parzyszek, 2017-11-02, 1 file, -52/+82)
  If the offset is an immediate, avoid putting it in a register to get Rs+Rt<<#0.
  llvm-svn: 317275

* AMDGPU: Remove outdated fixme (it was already fixed)  (Konstantin Zhuravlyov, 2017-11-02, 1 file, -3/+0)
  llvm-svn: 317266

* [mips] Use register scavenging with MSA.  (Simon Dardis, 2017-11-02, 2 files, -24/+19)
  MSA stores and loads to the stack are more likely to require an emergency GPR spill slot due to the smaller offsets available with those instructions.
  Handle this by overestimating the size of the stack: determine the largest offset assuming that all callee-saved registers are spilled, and account for incoming arguments, when deciding whether an emergency spill slot is required.
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D39056
  llvm-svn: 317204

* [ARM] and, or, xor and add with shl combine  (Sam Parker, 2017-11-02, 1 file, -7/+120)
  The generic DAG combiner will fold:
    (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
    (shl (or x, c1), c2)  -> (or (shl x, c2), c1 << c2)
  This can create constants which are too large to use as an immediate. Many ALU operations are also capable of performing the shl, so we can unfold the transformation to prevent a mov imm instruction from being generated.
  Other patterns, such as b + ((a << 1) | 510), can also be simplified in the same manner.
  Differential Revision: https://reviews.llvm.org/D38084
  llvm-svn: 317197
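
  For illustration, a sketch (not from the patch) of the second pattern mentioned above; the exact instructions in the comment are an assumption about what the combine enables rather than verified output:

    // b + ((a << 1) | 510): in ARM mode 510 (0xFF << 1) is not a valid
    // modified immediate, so materializing it needs an extra instruction.
    // Rewriting the expression as b + ((a | 255) << 1) uses the encodable
    // constant 255 and lets the shift fold into the add, e.g.:
    //   orr r2, r0, #255
    //   add r0, r1, r2, lsl #1
    unsigned addShiftedOr(unsigned a, unsigned b) {
      return b + ((a << 1) | 510);
    }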