path: root/llvm/lib/Target
Commit message (Author, Date, Files changed, Lines -/+)
* [mips] Fix PR35140 (Simon Dardis, 2017-11-06, 1 file, -4/+4)
  Mark all symbols involved with TLS relocations as being TLS symbols. This resolves PR35140. Thanks to Alex Crichton for reporting the issue!
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D39591
  llvm-svn: 317470
* [X86][AVX512] Improve lowering of AVX512 test intrinsics (Uriel Korach, 2017-11-06, 2 files, -4/+20)
  Added TESTM and TESTNM to the list of instructions that already zero the unused upper bits and therefore do not need the redundant shift-left and shift-right instructions afterwards. Added a pattern for TESTM and TESTNM in ISel lowering, so now icmp(ne, and(X, Y), 0) folds into TESTM and icmp(eq, and(X, Y), 0) folds into TESTNM. This commit is a preparation for lowering the test and testn X86 intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D38732
  llvm-svn: 317465
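  A rough sketch (not taken from the patch; the function and value names are hypothetical) of the IR shape that benefits: a per-lane compare of an AND against zero, which AVX-512 can lower to a mask-producing vptestm/vptestnm-style instruction.

    ; icmp ne (and X, Y), 0 per lane -> candidate for a TESTM-style lowering;
    ; the icmp eq form would correspond to TESTNM.
    define <8 x i1> @testm_shape(<8 x i64> %x, <8 x i64> %y) {
      %and = and <8 x i64> %x, %y
      %cmp = icmp ne <8 x i64> %and, zeroinitializer
      ret <8 x i1> %cmp
    }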
* [X86] Replace duplicate function call with variable. NFC (Uriel Korach, 2017-11-06, 1 file, -2/+2)
  Change from:
    if (N->getOperand(0).getValueType() == MVT::v8i32 ||
        N->getOperand(0).getValueType() == MVT::v8f32)
  to:
    EVT OpVT = N->getOperand(0).getValueType();
    if (OpVT == MVT::v8i32 || OpVT == MVT::v8f32)
  Change-Id: I5a105f8710b73a828e6cfcd55fac2eae6153ce25
  llvm-svn: 317464
* X86 ISel: Basic support for variable-index vector permutations (Zvi Rackover, 2017-11-06, 1 file, -0/+108)
  Summary: Try to lower a BUILD_VECTOR composed of extract-extract chains that can be reasoned to be a permutation of a vector by indices in a non-constant vector. We saw this pattern created by ISPC, which resorts to creating it due to the requirement that shufflevector's mask operand be a *constant* vector. I didn't check this, but we could possibly use this pattern for lowering the X86 permute C intrinsics instead of the llvm.x86 intrinsics.
  This change can be followed by more improvements:
  1. Handle vectors with undef elements.
  2. Utilize pshufb and zero-mask blending to support more efficient construction of vectors with constant-0 elements.
  3. Use smaller-element vectors of the same width, and "interpolate" the indices, when no native operation is available.
  Reviewers: RKSimon, craig.topper
  Reviewed By: RKSimon
  Subscribers: chandlerc, DavidKreitzer
  Differential Revision: https://reviews.llvm.org/D39126
  llvm-svn: 317463
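  A minimal sketch of the kind of IR in question (hypothetical names, 4-wide for brevity): each result lane is an element of %v selected by the corresponding lane of a non-constant index vector, expressed as extract/insert chains because shufflevector requires a constant mask.

    define <4 x float> @var_perm(<4 x float> %v, <4 x i32> %idx) {
      ; pull each index out of the non-constant index vector
      %i0 = extractelement <4 x i32> %idx, i32 0
      %i1 = extractelement <4 x i32> %idx, i32 1
      %i2 = extractelement <4 x i32> %idx, i32 2
      %i3 = extractelement <4 x i32> %idx, i32 3
      ; use each index to extract from the data vector
      %e0 = extractelement <4 x float> %v, i32 %i0
      %e1 = extractelement <4 x float> %v, i32 %i1
      %e2 = extractelement <4 x float> %v, i32 %i2
      %e3 = extractelement <4 x float> %v, i32 %i3
      ; rebuild the result vector (a BUILD_VECTOR after legalization)
      %r0 = insertelement <4 x float> undef, float %e0, i32 0
      %r1 = insertelement <4 x float> %r0, float %e1, i32 1
      %r2 = insertelement <4 x float> %r1, float %e2, i32 2
      %r3 = insertelement <4 x float> %r2, float %e3, i32 3
      ret <4 x float> %r3
    }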
* Revert "adding a pattern for broadcastm"Jina Nahias2017-11-061-2/+2
| | | | | | | This reverts commit r317457. Change-Id: If07f1fca1e3453d16c1dac906e87768661384e91 llvm-svn: 317462
* [x86][AVX512] Lowering Broadcastm intrinsics to LLVM IR (Jina Nahias, 2017-11-06, 2 files, -17/+17)
  This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D38684
  Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540
  llvm-svn: 317458
* adding a pattern for broadcastm (Jina Nahias, 2017-11-06, 1 file, -2/+2)
  Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5
  llvm-svn: 317457
* [X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible. (Craig Topper, 2017-11-06, 2 files, -39/+49)
  llvm-svn: 317454
* [X86] Add scalar FMA ISD nodes without rounding mode. NFC (Craig Topper, 2017-11-06, 5 files, -37/+92)
  Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers.
  llvm-svn: 317453
* [X86] Use EVEX encoded instructions for legacy scalar sqrt intrinsics. (Craig Topper, 2017-11-06, 2 files, -10/+19)
  Fixes PR35161.
  llvm-svn: 317445
* [X86] Add missing predicate to a pattern. NFC (Craig Topper, 2017-11-05, 1 file, -0/+2)
  Other patterns had higher priority so this wasn't noticed. But we shouldn't be dependent on pattern order.
  llvm-svn: 317442
* [X86] Remove some more RCP and RSQRT patterns from InstrAVX512.td that I missed in r317413 (Craig Topper, 2017-11-05, 2 files, -25/+12)
  llvm-svn: 317441
* [X86] Fix outdated comment. NFC (Craig Topper, 2017-11-05, 1 file, -1/+1)
  llvm-svn: 317440
* [REVERT][LV][X86] update the cost of interleaving mem. access of floats (Mohammed Agabaria, 2017-11-05, 1 file, -4/+1)
  Reverted; the changes will be recommitted later after fixing the failure. The reverted patch contained an update of the costs of interleaved loads of v8f32 of stride 3 and 8.
  Differential Revision: https://reviews.llvm.org/D39403
  llvm-svn: 317433
* [LV][X86] update the cost of interleaving mem. access of floats (Mohammed Agabaria, 2017-11-05, 1 file, -1/+4)
  This patch contains an update of the costs of interleaved loads of v8f32 of stride 3 and 8.
  Differential Revision: https://reviews.llvm.org/D39403
  llvm-svn: 317432
* [X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled. (Craig Topper, 2017-11-04, 6 files, -29/+35)
  Summary: AVX512 added RCP14 and RSQRT14 instructions which improve accuracy over the legacy RCP and RSQRT instructions, but not enough accuracy to remove the need for a Newton-Raphson refinement. Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics, and we use them for fast-math optimization of division and reciprocal sqrt.
  I think switching the legacy intrinsics may be surprising to the user, since it changes the answer based on which processor you're using, regardless of any fast-math settings. It's also weird that we did something different between scalar and packed. As for the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc on godbolt suggests they don't change which instructions they use here.
  This patch adds new X86ISD nodes for RCP14/RSQRT14 and uses those for the new intrinsics, leaving the old intrinsics to use the old instructions.
  Going forward I think our focus should be on:
  - Supporting 512-bit vectors, which will have to use RCP14/RSQRT14.
  - Using RSQRT28/RCP28 to remove the Newton-Raphson step on processors with AVX512ER.
  - Supporting double precision.
  Reviewers: zvi, DavidKreitzer, RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D39583
  llvm-svn: 317413
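  For reference (not part of the patch), the Newton-Raphson refinement in question is the standard one; starting from a hardware estimate x_0 of 1/a or 1/sqrt(a), one step is:

    x_{n+1} = x_n \,(2 - a\,x_n)                % refines x \approx 1/a
    x_{n+1} = \tfrac{x_n}{2}\,(3 - a\,x_n^{2})  % refines x \approx 1/\sqrt{a}

  Each step roughly doubles the number of correct bits, which is why the ~14-bit RCP14/RSQRT14 estimates still need one refinement for single precision, while the ~28-bit AVX512ER estimates mentioned in the follow-up items would not.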
* [X86] Teach EVEX->VEX pass to turn SHUFI32X4/SHUFF32X4/SHUFI64X2/SHUFF64X2 into VPERM2F128/VPERM2I128 (Craig Topper, 2017-11-04, 1 file, -1/+19)
  This recovers some of the tests that were changed by r317403.
  llvm-svn: 317410
* [AMDGPU] Remove hardcoded address space value from AMDGPULibFunc (Yaxun Liu, 2017-11-04, 3 files, -24/+29)
  AMDGPULibFunc hardcodes address space values of the old address space mapping, which causes invalid addrspacecast instructions and undefined functions in APPSDK sample MonteCarloAsianDP. This patch fixes that.
  Differential Revision: https://reviews.llvm.org/D39616
  llvm-svn: 317409
* [X86] Teach shuffle lowering to use 256-bit SHUF128 when possible. (Craig Topper, 2017-11-04, 1 file, -0/+10)
  This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary. As a follow-up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out.
  llvm-svn: 317403
* [X86] Give unary PERMI priority over SHUF128 in lowerV8I64VectorShuffle to make it possible to fold a load. (Craig Topper, 2017-11-03, 1 file, -4/+4)
  llvm-svn: 317382
* Move TargetFrameLowering.h to CodeGen where it's implemented (David Blaikie, 2017-11-03, 45 files, -45/+45)
  This header already includes a CodeGen header and is implemented in lib/CodeGen, so move the header there to match. This fixes a link error with modular codegeneration builds, where a header and its implementation are circularly dependent and so need to be in the same library, not split between two like this.
  llvm-svn: 317379
* Add llvm::for_each as a range-based extension to <algorithm> and make use of it in some cases where it is a clearer alternative to std::for_each. (Aaron Ballman, 2017-11-03, 1 file, -8/+7)
  llvm-svn: 317356
* [AArch64] Fix the number of iterations for the Newton series (Evandro Menezes, 2017-11-03, 1 file, -1/+1)
  The number of iterations was incorrectly determined for DP FP vector types and the tests were insufficient to flag this issue.
  Differential revision: https://reviews.llvm.org/D39507
  llvm-svn: 317349
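  As a rough worked example (an illustrative assumption, not stated in the commit: the hardware estimate is good to about 8 bits), each Newton-Raphson iteration roughly doubles the correct bits: 8 -> 16 -> 32 covers the 24-bit single-precision significand after 2 iterations, while 8 -> 16 -> 32 -> 64 is needed to cover the 53-bit double-precision significand, i.e. 3 iterations. The required count therefore genuinely differs between SP and DP vector types.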
* [mips] Match 'ins' and its variants with C++ code (Simon Dardis, 2017-11-03, 5 files, -12/+70)
  Change the ISel matching of 'ins', 'dins[mu]' from tablegen code to C++ code. This resolves an issue where ISel would select 'dins' instead of 'dinsm' when the instruction's size and position were individually in range but their sum was out of range according to the ISA specification.
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D39117
  llvm-svn: 317331
* Fix for Bug 34475 - LOCK/REP/REPNE prefixes emitted as instructions on their own. (Andrew V. Tischenko, 2017-11-03, 2 files, -6/+6)
  Differential Revision: https://reviews.llvm.org/D39546
  llvm-svn: 317330
* [X86][SSE] Add PACKUS support to combineVectorTruncation (Simon Pilgrim, 2017-11-03, 1 file, -6/+16)
  Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW.
  llvm-svn: 317315
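  A minimal hypothetical example of the condition being described: when the bits above the truncated width are known to be zero, the 32-to-16 truncation can be implemented with PACKUSDW, because the unsigned saturation never fires.

    define <8 x i16> @trunc_known_zero(<8 x i32> %x) {
      ; every lane is known to fit in 16 bits after the mask
      %masked = and <8 x i32> %x, <i32 65535, i32 65535, i32 65535, i32 65535,
                                   i32 65535, i32 65535, i32 65535, i32 65535>
      %trunc = trunc <8 x i32> %masked to <8 x i16>
      ret <8 x i16> %trunc
    }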
* [ARM GlobalISel] Move the check for Thumb higher up (Diana Picus, 2017-11-03, 1 file, -6/+6)
  We're currently bailing out for Thumb targets while lowering formal parameters, but there used to be some other checks before it, which could've caused some functions (e.g. those without formal parameters) to sneak through unnoticed.
  llvm-svn: 317312
* [AArch64] Use dwarf exception handling on MinGW (Martin Storsjo, 2017-11-03, 2 files, -1/+11)
  Ideally we should probably produce WinEH here as well, but until then, we can use dwarf exceptions, without any further changes required in clang, libunwind or libcxxabi.
  Differential Revision: https://reviews.llvm.org/D39535
  llvm-svn: 317304
* [X86] Remove PALIGNR/VALIGN handling from combineBitcastForMaskedOp and move to isel patterns instead. Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible. (Craig Topper, 2017-11-03, 2 files, -22/+131)
  llvm-svn: 317299
* Avoid PLT for external calls when attribute nonlazybind is used. (Sriraman Tallam, 2017-11-03, 1 file, -2/+9)
  Differential Revision: https://reviews.llvm.org/D39065
  llvm-svn: 317292
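  A minimal sketch of the attribute in question (hypothetical function names): with nonlazybind, lazy PLT binding is not required for the callee, so the backend can emit the call through the GOT entry directly rather than through a PLT stub.

    declare void @external_fn() nonlazybind

    define void @caller() {
      call void @external_fn()
      ret void
    }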
* [AArch64][RegisterBankInfo] Add mapping for G_FPEXT. (Quentin Colombet, 2017-11-02, 3 files, -1/+89)
  This fixes http://llvm.org/PR32560. We were missing a description for the half floating point type and as a result were using the FPR32 mapping. Because of the size mismatch the generic code was complaining that the default mapping is not appropriate. Fix the mapping description so that the default mapping can be properly applied.
  llvm-svn: 317287
* [AArch64][RegisterBankInfo] Add FPR16 support in value mapping. (Quentin Colombet, 2017-11-02, 3 files, -35/+48)
  NFC.
  llvm-svn: 317286
* [X86] Give AVX512VL instructions priority over their AVX equivalents. (Craig Topper, 2017-11-02, 1 file, -2/+2)
  I thought we had gotten all these priority bugs worked out, but I guess not.
  llvm-svn: 317283
* AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field] (Konstantin Zhuravlyov, 2017-11-02, 1 file, -1/+0)
  llvm-svn: 317280
* [Hexagon] Prefer L2_loadrub_io over L4_loadrub_rr (Krzysztof Parzyszek, 2017-11-02, 1 file, -52/+82)
  If the offset is an immediate, avoid putting it in a register to get Rs+Rt<<#0.
  llvm-svn: 317275
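  A small hypothetical illustration of the addressing choice: for a load at a constant byte offset, the base-plus-immediate form (L2_loadrub_io) avoids first materializing the offset in a register just to use the base-plus-register form (L4_loadrub_rr with a shift amount of #0).

    ; Load an unsigned byte at base+12; the constant 12 can be encoded as an
    ; immediate offset rather than moved into a register.
    define i32 @load_at_offset(i8* %base) {
      %addr = getelementptr i8, i8* %base, i32 12
      %val  = load i8, i8* %addr
      %ext  = zext i8 %val to i32
      ret i32 %ext
    }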
* AMDGPU: Remove outdated fixme (it was already fixed) (Konstantin Zhuravlyov, 2017-11-02, 1 file, -3/+0)
  llvm-svn: 317266
* [mips] Use register scavenging with MSA. (Simon Dardis, 2017-11-02, 2 files, -24/+19)
  MSA stores and loads to the stack are more likely to require an emergency GPR spill slot due to the smaller offsets available with those instructions. Handle this by overestimating the size of the stack: determine the largest offset presuming that all callee-saved registers are spilled, and account for incoming arguments, when deciding whether an emergency spill slot is required.
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D39056
  llvm-svn: 317204
* [ARM] and, or, xor and add with shl combine (Sam Parker, 2017-11-02, 1 file, -7/+120)
  The generic dag combiner will fold:
    (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
    (shl (or x, c1), c2)  -> (or (shl x, c2), c1 << c2)
  This can create constants which are too large to use as an immediate. Many ALU operations are also capable of performing the shl, so we can unfold the transformation to prevent a mov imm instruction from being generated. Other patterns, such as b + ((a << 1) | 510), can also be simplified in the same manner.
  Differential Revision: https://reviews.llvm.org/D38084
  llvm-svn: 317197
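  A minimal sketch mirroring the 510 example from the commit message (the function name is hypothetical): the generic combine would turn ((a | 255) << 1) + b into ((a << 1) | 510) + b, and 510 is not representable as an ARM-mode modified immediate, so undoing the fold keeps the constant encodable and lets the shift fold into the ALU operation instead.

    define i32 @or_shl_add(i32 %a, i32 %b) {
      %or  = or i32 %a, 255     ; 255 is an encodable immediate
      %shl = shl i32 %or, 1     ; shifting after the or keeps it that way
      %add = add i32 %shl, %b
      ret i32 %add
    }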
* Update sched numbers for YMM AVX instrs such as VMOVx, VORx, VXOR, VPERMILx, VBROADCASTx, etc. (Andrew V. Tischenko, 2017-11-02, 1 file, -0/+93)
  PR32857 should be closed.
  Differential Revision: https://reviews.llvm.org/D39227
  llvm-svn: 317196
* Revert "Correct dwarf unwind information in function epilogue for X86"Petar Jovanovic2017-11-013-54/+0
| | | | | | | This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf buildbot failure (build #15606). llvm-svn: 317136
* [X86] Use foreach in X86.td to combine some of the CPU names that are obviously aliases. NFC (Craig Topper, 2017-11-01, 1 file, -52/+40)
  llvm-svn: 317134
* [X86] Add CMOV feature to 'i686' processor, making it a proper alias of pentiumpro which I believe it should be. (Craig Topper, 2017-11-01, 1 file, -1/+1)
  This is consistent with current gcc behavior.
  llvm-svn: 317133
* [X86][SSE] Add PACKUS support to LowerTruncate (Simon Pilgrim, 2017-11-01, 1 file, -12/+26)
  Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW.
  llvm-svn: 317128
* [X86] Add custom code to EVEX to VEX pass to turn unmasked 128-bit VPALIGND/Q into VPALIGNR if the extended registers aren't being used. (Craig Topper, 2017-11-01, 1 file, -0/+21)
  This will enable us to prefer VALIGND/Q during shuffle lowering in order to get the extended register encoding space when BWI isn't available. But if we end up not using the extended registers we can switch to VPALIGNR for the shorter VEX encoding.
  Differential Revision: https://reviews.llvm.org/D39401
  llvm-svn: 317122
* AMDGPU: Fix set but not used warnings related to AMDGPUAS (Konstantin Zhuravlyov, 2017-11-01, 7 files, -36/+32)
  Differential Revision: https://reviews.llvm.org/D39499
  llvm-svn: 317114
* [X86] Prevent fast isel from folding loads into the instructions listed in hasPartialRegUpdate. (Craig Topper, 2017-11-01, 1 file, -0/+7)
  This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses. We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch.
  Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want.
  Differential Revision: https://reviews.llvm.org/D39402
  llvm-svn: 317112
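  A small hypothetical case of the kind affected: cvtsi2ss is on the hasPartialRegUpdate list because it writes only the low element of its destination register, and folding the load into the conversion would sidestep the backend's usual handling of that partial update.

    ; With the fold disabled, the load and the conversion stay separate, so
    ; the register form of cvtsi2ss is used and the partial-register-update
    ; machinery can treat it as usual.
    define float @load_then_convert(i32* %p) {
      %v = load i32, i32* %p
      %c = sitofp i32 %v to float
      ret float %c
    }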
* Adds code to PPC ISEL lowering to recognize half-word inserts from vector_shuffles, and use P9 shift and vector insert instructions instead of vperm. (Graham Yiu, 2017-11-01, 3 files, -5/+139)
  Differential Revision: https://reviews.llvm.org/D34160
  llvm-svn: 317111
* [X86] Add 64-bit int to float/double conversion with AVX to X86FastISel::X86SelectSIToFP (Craig Topper, 2017-11-01, 1 file, -3/+4)
  Summary: Teach fast isel to handle i64 sitofp with AVX. For some reason we only handled i32 sitofp with AVX. But with SSE only we support i64 so we should do the same with AVX. Also add i686 command lines for the 32-bit tests. 64-bit tests are in a separate file to avoid a fast-isel abort failure in 32-bit mode.
  Reviewers: RKSimon, zvi
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D39450
  llvm-svn: 317102
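  A hypothetical test case of the shape now handled: with AVX enabled, fast isel should be able to select the 64-bit integer-source conversions directly instead of falling back to SelectionDAG.

    define float @cvt_i64_to_f32(i64 %a) {
      %c = sitofp i64 %a to float
      ret float %c
    }

    define double @cvt_i64_to_f64(i64 %a) {
      %c = sitofp i64 %a to double
      ret double %c
    }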
* Update VCVTx, VMOVNTPx and VROUNDYPx instructions scheduling on btver2. (Andrew V. Tischenko, 2017-11-01, 1 file, -0/+39)
  Differential Revision: https://reviews.llvm.org/D39059
  llvm-svn: 317101
* Correct dwarf unwind information in function epilogue for X86 (Petar Jovanovic, 2017-11-01, 3 files, -0/+54)
  This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts.
  The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific.
  The second part is platform independent and ensures that:
  - CFI instructions do not affect code generation
  - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary.
  Changed CFI instructions so that they:
  - are duplicable
  - are not counted as instructions when tail duplicating or tail merging
  - can be compared as equal
  Added CFIInstrInserter pass:
  - analyzes each basic block to determine cfa offset and register valid at its entry and exit
  - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors
  - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA
  Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them.
  CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue.
  Patch by Violeta Vukobrat.
  Differential Revision: https://reviews.llvm.org/D35844
  llvm-svn: 317100