summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [AArch64] Fix spelling of ICH_ELRSR_EL2 system registerOliver Stannard2018-02-061-1/+1
| | | | | | | This register was mis-spelled as ICH_ELSR_EL2, but has the correct encoding for ICH_ELRSR_EL2. llvm-svn: 324325
* [ARM][AArch64] Add CSDB speculation barrier instructionOliver Stannard2018-02-064-11/+18
| | | | | | | | | | | | | | | This adds the CSDB instruction, which is a new barrier instruction described by the whitepaper at [1]. This is in encoding space which was previously executed as a NOP, so it is available for all targets that have the relevant NOP encoding space. This matches the binutils behaviour for these instructions [2][3]. [1] https://developer.arm.com/support/security-update [2] https://sourceware.org/ml/binutils/2018-01/msg00116.html [3] https://sourceware.org/ml/binutils/2018-01/msg00120.html llvm-svn: 324324
* [ARM] Armv8.2-A FP16 code generation (part 3/3)Sjoerd Meijer2018-02-063-34/+107
| | | | | | | | | | | | | | | This adds most of the FP16 codegen support, but these areas need further work: - FP16 literals and immediates are not properly supported yet (e.g. literal pool needs work), - Instructions that are generated from intrinsics (e.g. vabs) haven't been added. This will be addressed in follow-up patches. Differential Revision: https://reviews.llvm.org/D42849 llvm-svn: 324321
* AMDGPU/MemoryModel: Fix monotonic atomic loadsKonstantin Zhuravlyov2018-02-061-1/+2
| | | | | | Those should have glc bit set for system and agent synchronization scopes llvm-svn: 324314
* [RISCV] Add support for %pcrel_lo.Ahmed Charles2018-02-067-12/+36
| | | | llvm-svn: 324303
* Revert "Don't assume a null GV is local for ELF and MachO."Reid Kleckner2018-02-061-14/+7
| | | | | | | | This reverts r323297. It breaks building grub. llvm-svn: 324301
* [X86] Relax restrictions on what setcc condition codes can be folded with a ↵Craig Topper2018-02-051-2/+1
| | | | | | | | sext when AVX512 is enabled. We now allow all signed comparisons and not equal. The complement that needs to be added for this is no worse than the extend. And the vector output forms of pcmpeq/pcmpgt have better latency than the k-register version on SKX. llvm-svn: 324294
* [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused ↵Sanjay Patel2018-02-052-0/+5
| | | | | | | | | | | | | | | (PR35681) In the motivating case from PR35681 and represented by the macro-fuse-cmp test: https://bugs.llvm.org/show_bug.cgi?id=35681 ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference. Differential Revision: https://reviews.llvm.org/D42607 llvm-svn: 324289
* [X86] Teach DAG unfoldMemoryOperand to reconvert CMPs to testsNirav Dave2018-02-051-0/+24
| | | | | | | | | | | | | | | | Summary: Copy MI-level cmp->test conversion to SelectionDAG-level memory unfold. This fixes a regression from upcoming D41293 change. Reviewers: craig.topper, RKSimon Reviewed By: craig.topper Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D42808 llvm-svn: 324261
* [X86] Artificially lower the complexity of the scalar ANDN patterns so that ↵Craig Topper2018-02-051-2/+3
| | | | | | | | | | | | AND with immediate will match first. This allows the immediate to folded into the and instead of being forced to move into a register. This can sometimes result in shorter encodings since the and can sign extend an immediate. This also allows us to match an and to a movzx after a not. This can cause an extra move if the input to the separate NOT has an additional user which requires a copy before the NOT. llvm-svn: 324260
* [Hexagon] Memoize instruction positions in BitTrackerKrzysztof Parzyszek2018-02-052-10/+22
| | | | llvm-svn: 324250
* [X86] Teach X86DAGToDAGISel::shrinkAndImmediate to preserve upper 32 zeroes ↵Craig Topper2018-02-051-3/+19
| | | | | | | | | | | | of a 64 bit mask. If the upper 32 bits of a 64 bit mask are all zeros, we have special isel patterns to use a 32-bit and instead of a 64-bit and by relying on the impliciting zeroing of 32 bit ops. This patch teachs shrinkAndImmediate not to break that optimization. Differential Revision: https://reviews.llvm.org/D42899 llvm-svn: 324249
* BitTracker.h needs a full definition of MachineInstr, so include the ↵Benjamin Kramer2018-02-051-1/+1
| | | | | | | | | | defining file. Patch by Dean Sturtevant! Differential Revision: https://reviews.llvm.org/D42907 llvm-svn: 324245
* [Hexagon] Forgot about HexagonISD::VZERO in selecting const vectorsKrzysztof Parzyszek2018-02-051-1/+1
| | | | llvm-svn: 324244
* [Hexagon] Don't use garbage mask in HvxSelector::shuffp2Krzysztof Parzyszek2018-02-051-0/+2
| | | | | | | | The function shuffp2 was breaking up a wide shuffle into a pair of narrower ones, except that the narrower shuffle masks were actually uninitialized. llvm-svn: 324243
* [Hexagon] Use V6_vmpyih for halfword multiplicationKrzysztof Parzyszek2018-02-051-5/+6
| | | | | | | Unlike V6_vmpyhv, it produces the result in the exact form that is expected without the need for a shuffle. llvm-svn: 324241
* [AMDGPU][MC] Corrected dst/data size for MIMG opcodes with d16 modifierDmitry Preobrazhensky2018-02-054-15/+44
| | | | | | | | | See bug 36154: https://bugs.llvm.org/show_bug.cgi?id=36154 Differential Revision: https://reviews.llvm.org/D42847 Reviewers: cfang, artem.tamazov, arsenm llvm-svn: 324237
* [AMDGPU][MC] Added validation of d16 and r128 modifiers of MIMG opcodesDmitry Preobrazhensky2018-02-056-3/+67
| | | | | | | | | | | See bugs 36094, 36095: https://bugs.llvm.org/show_bug.cgi?id=36094 https://bugs.llvm.org/show_bug.cgi?id=36095 Differential Revision: https://reviews.llvm.org/D42692 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 324231
* [PowerPC] Check hot loop exit edge in PPCCTRLoopsHiroshi Inoue2018-02-051-0/+21
| | | | | | | | | | | | | | | | PPCCTRLoops transform loops using mtctr/bdnz instructions if loop trip count is known and big enough to compensate for the cost of mtctr. But if there is a loop exit edge which is known to be frequently taken (by builtin_expect or by PGO), we should not transform the loop to avoid the cost of mtctr instruction. Here is an example of a loop with hot exit edge: for (unsigned i = 0; i < TripCount; i++) { // do something if (__builtin_expect(check(), 1)) break; // do something } Differential Revision: https://reviews.llvm.org/D42637 llvm-svn: 324229
* [X86] Add isel patterns for selecting masked SUBV_BROADCAST with bitcasts. ↵Craig Topper2018-02-053-55/+108
| | | | | | | | Remove combineBitcastForMaskedOp. Add test cases for the merge masked versions to make sure we have all those covered. llvm-svn: 324210
* [X86] Remove unused lambda. NFCCraig Topper2018-02-051-11/+0
| | | | llvm-svn: 324206
* [X86] Remove X86ISD::SHUF128 from combineBitcastForMaskedOp. Use isel ↵Craig Topper2018-02-052-20/+46
| | | | | | | | | | patterns instead. We always created X86ISD::SHUF128 with a 64-bit element type so we can use isel patterns to detect a bitconvert to 32-bit to handle masking. The test changes are because we also match the bitconvert even if there is no masking. This leads to unnecessary isel pattern, but it requires more multiclass hackery in tablegen to get rid of it. llvm-svn: 324205
* [X86] Add DAG combine to turn (bitcast (and/or/xor (bitcast X), Y)) -> ↵Craig Topper2018-02-041-0/+53
| | | | | | | | | | (and/or/xor X, (bitcast Y)) when casting between GPRs and mask operations. This reduces the number of transitions between k-registers and GPRs, reducing the number of instructions. There's still some room for improvement to remove more transitions, but this is a good start. llvm-svn: 324184
* [X86] Remove unused function argument. NFCCraig Topper2018-02-041-4/+3
| | | | llvm-svn: 324183
* [X86] Remove and autoupgrade kand/kandn/kor/kxor/kxnor/knot intrinsics.Craig Topper2018-02-032-30/+0
| | | | | | | | Clang already stopped using these a couple months ago. The test cases aren't great as there is nothing forcing the operations to stay in k-registers so some of them moved back to scalar ops due to the bitcasts being moved around. llvm-svn: 324177
* [X86] Prefer to create a ISD::SETCC over X86ISD::PCMPEQ in ↵Craig Topper2018-02-021-3/+3
| | | | | | | | combineVectorSizedSetCCEquality. This is running pre-legalize, we should try to use target independent nodes. This will give the best opportunity for target independent optimizations. llvm-svn: 324147
* [X86] Pass SDLoc by const reference in a few more places in ↵Craig Topper2018-02-021-6/+8
| | | | | | X86ISelLowering.cpp. NFC llvm-svn: 324135
* [AArch64][GlobalISel] Use getRegClassForTypeOnBank() in selectCopy.Amara Emerson2018-02-021-23/+15
| | | | | | Differential Revision: https://reviews.llvm.org/D42832 llvm-svn: 324110
* [X86] Remove checks for FeatureAVX512 from the X86 assembly parser. Remove ↵Craig Topper2018-02-021-83/+73
| | | | | | | | | | | | | | | | | | | | | mcpu/mattr from assembly test command lines. Summary: We should always be able to accept AVX512 registers and instructions in llvm-mc. The only subtarget mode that should be checked is 16-bit vs 32-bit vs 64-bit mode. I've also removed all the mattr/mcpu lines from test RUN lines to be consistent with this. Most were due to AVX512, but a few were for other features. Fixes PR36202 Reviewers: RKSimon, echristo, bkramer Reviewed By: echristo Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42824 llvm-svn: 324106
* [AMDGPU] Switch to the new addr space mapping by defaultYaxun Liu2018-02-022-20/+3
| | | | | | | | This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101
* [X86] Legalize (v64i1 (bitcast (i64 X))) on 32-bit targets by extracting ↵Craig Topper2018-02-021-0/+16
| | | | | | | | 32-bit halves from i32, bitcasting each to v32i1, and concatenating. This prevents the scalarization that would otherwise occur. llvm-svn: 324057
* [X86] Legalize (i64 (bitcast (v64i1 X))) on 32-bit targets by extracting to ↵Craig Topper2018-02-021-0/+17
| | | | | | | | v32i1 and bitcasting to i32. This saves a trip through memory and seems to open up other combining opportunities. llvm-svn: 324056
* [RISCV] Fix c.addi and c.addi16sp immediate constraints which should be non-zeroShiva Chen2018-02-022-9/+34
| | | | | | Differential Revision: https://reviews.llvm.org/D42782 llvm-svn: 324055
* [RISCV] Define getSetCCResultType for setting vector setCC typeShiva Chen2018-02-022-0/+10
| | | | | | | | To avoid trigger "No default SetCC type for vectors!" Assertion Differential Revision: https://reviews.llvm.org/D42675 llvm-svn: 324054
* [GlobalISel] Constrain the dest reg of IMPLICT_DEF.Amara Emerson2018-02-021-0/+6
| | | | | | | | | | This fixes a crash where the user is a COPY, which deliberately does not constrain its source operands, resulting in a vreg without a reg class escaping selection. Differential Revision: https://reviews.llvm.org/D42697 llvm-svn: 324047
* Remove non-modular header containing static utility functionsDavid Blaikie2018-02-022-203/+178
| | | | | | | | | | The one place that uses these functions isn't particularly long/complicated, so it's easier to just have these inline at that location than trying to split it out into a true header. (in part also because of the use of the DEBUG macros, which make this not really a standalone header even if the static functions were made inline instead) llvm-svn: 324044
* [X86] Separate the call to LowerVectorAllZeroTest from EmitTest. NFCICraig Topper2018-02-011-17/+21
| | | | | | | | | | Every instruction that has the word TEST in its name seems to have been buried into EmitTest. But that code is largely concerned with trying to reuse the flags from instructions that update flags in a pretty normal way. PTEST/TESTP/KTEST do not update flags in a normal way. They only update Z and C and the C flag update is non-standard. Rather than try to bend EmitTest's already complex logic to accomodate this, just move the call up to LowerSETCC and replicate the few pre-checks that are needed. While there add a FIXME for using the C flag for checking for all 1s which we definitely couldn't do from EmitTEST. llvm-svn: 324029
* [PowerPC] Tell VSX swap removal that scalar conversions are lane-sensitiveNemanja Ivanovic2018-02-011-0/+2
| | | | | | | | This is a rather non-controversial change. We were missing these instructions from the list of instructions that are lane-sensitive. These two put the result into lane 0 (BE) or 3 (LE) regardless of the input. This patch fixes PR36068. llvm-svn: 324005
* [AArch64] remove bogus comment; NFCSanjay Patel2018-02-011-3/+0
| | | | | | | I added this comment with D42323, but as discussed in D42806, the architecture does the right thing for denorms. We don't even need the select on 0.0 here? llvm-svn: 323996
* AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the ↵Changpeng Fang2018-02-012-4/+10
| | | | | | | | | | | | target has UnpackedD16VMem feature. Reviewers: Matt and Brian Differential Revision: https://reviews.llvm.org/D42548 llvm-svn: 323988
* [X86][SSE] LowerBUILD_VECTORAsVariablePermute - add support for scaling ↵Simon Pilgrim2018-02-011-5/+44
| | | | | | | | | | index vectors This allows us to use PSHUFB for v8i16/v4i32 and VPERMD/PERMPS for v4i64/v4f64 variable shuffles. Differential Revision: https://reviews.llvm.org/D42487 llvm-svn: 323987
* [X86] Remove custom lowering vXi1 extending loads and truncating stores.Craig Topper2018-02-011-174/+0
| | | | | | | | | | | | | | Summary: Now that v2i1/v4i1 are legal without VLX. And v32i1 is legalized by splitting rather than widening. And isVectorLoadExtDesirable returns false for vXi1. It appears this handling is dead because the operations simply don't exist. Reviewers: RKSimon, zvi, guyblank, delena, spatel Reviewed By: delena Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D42781 llvm-svn: 323983
* [X86] Turn X86ISD::AND nodes that have no flag users back into ISD::AND just ↵Craig Topper2018-02-012-2/+14
| | | | | | | | | | | | | | | | | | | | | before isel to enable test instruction matching Summary: EmitTest sometimes creates X86ISD::AND specifically to hide the AND from DAG combine. But this prevents isel patterns that look for (cmp (and X, Y), 0) from being able to see it. So we end up with an AND and a TEST. The TEST gets removed by compare instruction optimization during the peephole pass. This patch attempts to fix this by converting X86ISD::AND with no flag users back into ISD::AND during the DAG preprocessing just before isel. In order to do this correctly I had to make the X86ISD::AND node created by EmitTest in this case really have a flag output. Which arguably it should have had anyway so that the number of operands would be consistent for the opcode in all cases. Then I had to modify the ReplaceAllUsesWith to understand that we might be looking at an instruction with 2 outputs. Though in this case there are no uses to replace since we just created the node, but that's what the code did before so I just made it keep working. Reviewers: spatel, RKSimon, niravd, deadalnix Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42764 llvm-svn: 323982
* [DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994)Sanjay Patel2018-02-011-1/+3
| | | | | | | | | | | | | | | | | | | | | | | As shown in the example in PR34994: https://bugs.llvm.org/show_bug.cgi?id=34994 ...we can return a very wrong answer (inf instead of 0.0) for square root when using a reciprocal square root estimate instruction. Here, I've conditionalized the filtering out of denorms based on the function having "denormal-fp-math"="ieee" in its attributes. The other options for this attribute are 'preserve-sign' and 'positive-zero'. So we don't generate this extra code by default with just '-ffast-math' (because then there's no denormal attribute string at all), but it works if you specify '-ffast-math -fdenormal-fp-math=ieee' from clang. As noted in the review, there may be other problems in clang that affect the results depending on platform (Linux x86 at least), but this should allow creating the desired codegen. Differential Revision: https://reviews.llvm.org/D42323 llvm-svn: 323981
* [ARM] FullFP16 LowerReturn FixSjoerd Meijer2018-02-011-2/+2
| | | | | | | | | | Commit r323512 introduced an optimisation in LowerReturn for half-precision return values. A missing check caused a crash when the return value is "undef" (i.e. a node that has no operands). Differential Revision: https://reviews.llvm.org/D42743 llvm-svn: 323968
* [mips] Include EVA instructions in Std2MicroMips mapping tablesAleksandar Beserminji2018-02-014-23/+38
| | | | | | | | | This patch includes EVA instructions in the Std2MicroMips mapping tables, which is required for direct object emission. Differential Revision: https://reviews.llvm.org/D41771 llvm-svn: 323958
* [AArch64][NFC] Make all ProcResource definitions include their SchedModel.Clement Courbet2018-02-013-42/+35
| | | | | | | This makes targets ExynosM1,ExynosM3,ThunderX2T99 consistent with all other targets. llvm-svn: 323955
* [ARM] Add support for unpredictable MVN instructions.Yvan Roux2018-02-011-2/+8
| | | | | | | | | | | | | | | This fixes bugzilla 33011 https://bugs.llvm.org/show_bug.cgi?id=33011 Defines bits {19-16} as zero or unpredictable as specified by the ARM ARM in sections A8.8.116 and A8.8.117. It fixes also the usage of PC register as destination register for MVN register-shifted register version as specified in A8.8.117. Differential Revision: https://reviews.llvm.org/D41905 llvm-svn: 323954
* Test commit: Fix a comment.Yvan Roux2018-02-011-1/+1
| | | | llvm-svn: 323947
* [XRay][compiler-rt+llvm] Update XRay register stashing semanticsDean Michael Berris2018-02-011-2/+1
| | | | | | | | | | | | | | | | | | | | | | Summary: This change expands the amount of registers stashed by the entry and `__xray_CustomEvent` trampolines. We've found that since the `__xray_CustomEvent` trampoline calls can show up in situations where the scratch registers are being used, and since we don't typically want to affect the code-gen around the disabled `__xray_customevent(...)` intrinsic calls, that we need to save and restore the state of even the scratch registers in the handling of these custom events. Reviewers: pcc, pelikan, dblaikie, eizan, kpw, echristo, chandlerc Reviewed By: echristo Subscribers: chandlerc, echristo, hiraditya, davide, dblaikie, llvm-commits Differential Revision: https://reviews.llvm.org/D40894 llvm-svn: 323940
OpenPOWER on IntegriCloud