summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] Re-commit r324600 with fixed LLVMBuild.txtOliver Stannard2018-02-082-10/+3
| | | | | | | | | | | | | | | | | | | | | | ARMDisassembler now depends on the banked register tables in ARMUtils, so the LLVMBuild.txt needed updating to reflect this. Original commit mesage: [ARM] Fix disassembly of invalid banked register moves When disassembling banked register move instructions, we don't have an assembly syntax for the unallocated register numbers, so we have to return Fail rather than SoftFail. Previously we were returning SoftFail, then crashing in the InstPrinter as we have no way to represent these encodings in an assembly string. This also switches the decoder to use the table-generated list of banked registers, removing the duplicated list of encodings. Differential revision: https://reviews.llvm.org/D43066 llvm-svn: 324606
* Revert r324600 as it breaks a buildbotOliver Stannard2018-02-081-2/+9
| | | | | | | | The broken bot (clang-ppc64le-linux-multistage) is doign a shared-object build, so I guess using lookupBankedRegByEncoding in the disassembler is a layering violation? llvm-svn: 324604
* [ARM] Fix disassembly of invalid banked register movesOliver Stannard2018-02-081-9/+2
| | | | | | | | | | | | | | | When disassembling banked register move instructions, we don't have an assembly syntax for the unallocated register numbers, so we have to return Fail rather than SoftFail. Previously we were returning SoftFail, then crashing in the InstPrinter as we have no way to represent these encodings in an assembly string. This also switches the decoder to use the table-generated list of banked registers, removing the duplicated list of encodings. Differential revision: https://reviews.llvm.org/D43066 llvm-svn: 324600
* [X86] Fix compilation of r324580.Clement Courbet2018-02-081-1/+1
| | | | | | @ctopper Can you check that the fix is correct ? llvm-svn: 324586
* Revert accidental changes that snuck in r324584Stefan Maksimovic2018-02-083-10/+1
| | | | llvm-svn: 324585
* [mips] Define certain instructions in microMIPS32r3Stefan Maksimovic2018-02-089-155/+162
| | | | | | | | | | | | | | | | | | | | Instructions affected: mthc1, mfhc1, add.d, sub.d, mul.d, div.d, mov.d, neg.d, cvt.w.d, cvt.d.s, cvt.d.w, cvt.s.d These instructions are now defined for microMIPS32r3 + microMIPS32r6 in MicroMipsInstrFPU.td since they shared their encoding with those already defined in microMIPS32r6InstrInfo.td and have been therefore removed from the latter file. Some instructions present in MicroMipsInstrFPU.td which did not have both AFGR64 and FGR64 variants defined have been altered to do so. Differential revision: https://reviews.llvm.org/D42738 llvm-svn: 324584
* [AArch64] Don't materialize 0 with "fmov h0, .." when FullFP16 is not supportedSjoerd Meijer2018-02-082-2/+3
| | | | | | | | | | | | | | We were generating "fmov h0, wzr" instructions when FullFP16 is not enabled. I've not added any tests, because the problem was visible in: test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll, which I had to change: I don't think Cyclone has FullFP16 enabled by default, so it shouldn't be using this v8.2a instruction. I've also removed these rdar tags, please shout if there are any objections. Differential Revision: https://reviews.llvm.org/D43020 llvm-svn: 324581
* [X86] Support folding in a k-register OR when creating KORTEST from scalar ↵Craig Topper2018-02-081-1/+9
| | | | | | | | compare of a bitcast from vXi1. This should allow us to remove the kortest intrinsic from IR and use compare+bitcast+or in IR instead. llvm-svn: 324580
* [X86] Allow KORTEST instruction to be used for testing if a mask is all onesCraig Topper2018-02-081-3/+7
| | | | | | | | The KTEST instruction sets the C flag if the result of anding both operands together is all 1s. We can use this to lower (icmp eq/ne (bitcast (vXi1 X), -1) Differential Revision: https://reviews.llvm.org/D42772 llvm-svn: 324577
* [X86] Don't emit KTEST instructions unless only the Z flag is being usedCraig Topper2018-02-082-29/+40
| | | | | | | | | | | | | | | | | | | | | | | Summary: KTEST has weird flag behavior. The Z flag is set for all bits in the AND of the k-registers being 0, and the C flag is set for all bits being 1. All other flags are cleared. We currently emit this instruction in EmitTEST and don't check the condition code. This can lead to strange things like using the S flag after a KTEST for a signed compare. The domain reassignment pass can also transform TEST instructions into KTEST and is not protected against the flag usage either. For now I've disabled this part of the domain reassignment pass. I tried to comment out the checks in the mir test so that we could recover them later, but I couldn't figure out how to get that to work. This patch moves the KTEST handling into LowerSETCC and now creates a ktest+x86setcc. I've chosen this approach because I'd like to add support for the C flag for all ones in a followup patch. To do that requires that I can rewrite the condition code going in the x86setcc to be different than the original SETCC condition code. This fixes PR36182. I'll file a PR to fix domain reassignment once this goes in. Should this be merged to 6.0? Reviewers: spatel, guyblank, RKSimon, zvi Reviewed By: guyblank Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42770 llvm-svn: 324576
* ARM: Remove dead code. NFCI.Peter Collingbourne2018-02-082-6/+0
| | | | llvm-svn: 324565
* bpf: Improve expanding logic in LowerSELECT_CCYonghong Song2018-02-081-0/+5
| | | | | | | | | | | | | | | LowerSELECT_CC is not generating optimal Select_Ri pattern at the moment. It is not guaranteed to place ConstantNode at RHS which would miss matching Select_Ri. A new testcase added into the existing select_ri.ll, also there is an existing case in cmp.ll which would be improved to use Select_Ri after this patch, it is adjusted accordingly. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 324560
* AMDGPU: Fix incorrect reordering when inline asm defines LDS addressMatt Arsenault2018-02-081-3/+4
| | | | | | | Defs of operands outside of the instruction's explicit defs need to be checked. llvm-svn: 324554
* AMDGPU: Don't crash when trying to fold implicit operandsMatt Arsenault2018-02-081-0/+1
| | | | llvm-svn: 324550
* [NVPTX] When dying due to a bad address space value, print out the value.Justin Lebar2018-02-081-1/+2
| | | | llvm-svn: 324549
* [AMDGPU] Fixed wait count reuseStanislav Mekhanoshin2018-02-081-50/+40
| | | | | | | | | | | | | The code reusing existing wait counts is incorrect since it keeps adding new operands to an old instruction instead of replacing the immediate. It was also effectively switched off by the condition that wait count is not an AMDGPU::S_WAITCNT. Also switched to BuildMI instead of creating instructions directly. Differential Revision: https://reviews.llvm.org/D42997 llvm-svn: 324547
* [x86] Fix nasty bug in the x86 backend that is essentially impossible toChandler Carruth2018-02-071-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hit from IR but creates a minefield for MI passes. The x86 backend has fairly powerful logic to try and fold loads that feed register operands to instructions into a memory operand on the instruction. This is almost always a good thing, but there are specific relocated loads that are only allowed to appear in specific instructions. Notably, R_X86_64_GOTTPOFF is only allowed in `movq` and `addq`. This patch blocks folding of memory operands using this relocation unless the target is in fact `addq`. The particular relocation indicates why we simply don't hit this under normal circumstances. This relocation is only used for TLS, and it gets used in very specific ways in conjunction with %fs-relative addressing. The result is that loads using this relocation are essentially never eligible for folding into an instruction's memory operands. Unless, of course, you have an MI pass that inserts usage of such a load. I have exactly such an MI pass and was greeted by truly mysterious miscompiles where the linker replaced my instruction with a completely garbage byte sequence. Go team. This is the only such relocation I'm aware of in x86, but there may be others that need to be similarly restricted. Fixes PR36165. Differential Revision: https://reviews.llvm.org/D42732 llvm-svn: 324546
* [X86] Prune some unreachable 'return SDValue()' paths from ↵Craig Topper2018-02-071-31/+26
| | | | | | | | LowerSIGN_EXTEND/LowerZERO_EXTEND/LowerANY_EXTEND. We were doing a lot of whitelisting of what we handle in these routines, but setOperationAction constrains what we can get here. So just add some asserts and prune the unreachable paths. llvm-svn: 324538
* [X86] Remove dead code from EmitTest that looked for an i1 type which should ↵Craig Topper2018-02-071-5/+0
| | | | | | have already been type legalized away. NFC llvm-svn: 324536
* [X86] When doing callee save/restore for k-registers make sure we don't use ↵Craig Topper2018-02-072-5/+25
| | | | | | | | | | | | | | KMOVQ on non-BWI targets If we are saving/restoring k-registers, the default behavior of getMinimalRegisterClass will find the VK64 class with a spill size of 64 bits. This will cause the KMOVQ opcode to be used for save/restore. If we don't have have BWI instructions we need to constrain the class returned to give us VK16 with a 16-bit spill size. We can do this by passing the either v16i1 or v64i1 into getMinimalRegisterClass. Also add asserts to make sure BWI is enabled anytime we use KMOVD/KMOVQ. These are what caught this bug. Fixes PR36256 Differential Revision: https://reviews.llvm.org/D42989 llvm-svn: 324533
* Revert "AMDGPU: Add 32-bit constant address space"Rafael Espindola2018-02-0712-86/+19
| | | | | | | | This reverts commit r324487. It broke clang tests. llvm-svn: 324494
* AMDGPU: Add 32-bit constant address spaceMarek Olsak2018-02-0712-19/+86
| | | | | | | | | | | | | | | | | | | | | | | Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487
* AMDGPU: Remove the s_buffer workaround for GFX9 chipsMarek Olsak2018-02-072-11/+2
| | | | | | | | | | | | | | | | | | Summary: I checked the AMD closed source compiler and the workaround is only needed when x3 is emulated as x4, which we don't do in LLVM. SMEM x3 opcodes don't exist, and instead there is a possibility to use x4 with the last component being unused. If the last component is out of buffer bounds and falls on the next 4K page, the hw hangs. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42756 llvm-svn: 324486
* [X86][AVX] Add PACKSSDW/PACKUSDW support for truncation of clamped valuesSimon Pilgrim2018-02-071-4/+6
| | | | | | SSE and shorter vector sizes will have to wait until we can add support for general SMIN/SMAX matching. llvm-svn: 324485
* [mips] Support 'y' operand code to print exact log2 of the operandSimon Atanasyan2018-02-071-0/+7
| | | | llvm-svn: 324477
* [mips] Handle 'M' and 'L' operand codes for memory operandsSimon Atanasyan2018-02-071-6/+16
| | | | | | | | Both operand codes now work the same way in case of register or memory operands. It print high-order or low-order word in a double-word register or memory location. llvm-svn: 324476
* [ARM] FP16 mov imm patternSjoerd Meijer2018-02-071-3/+4
| | | | | | | | | This is a follow up of r324321, adding a match pattern for mov with a FP16 immediate (also fixing operand vfp_f16imm that wasn't even compiling). Differential Revision: https://reviews.llvm.org/D42973 llvm-svn: 324456
* [x86/retpoline] Make the external thunk names exactly match the namesChandler Carruth2018-02-071-15/+44
| | | | | | | | | | | | | | | | | | | that happened to end up in GCC. This is really unfortunate, as the names don't have much rhyme or reason to them. Originally in the discussions it seemed fine to rely on aliases to map different names to whatever external thunk code developers wished to use but there are practical problems with that in the kernel it turns out. And since we're discovering this practical problems late and since GCC has already shipped a release with one set of names, we are forced, yet again, to blindly match what is there. Somewhat rushing this patch out for the Linux kernel folks to test and so we can get it patched into our releases. Differential Revision: https://reviews.llvm.org/D42998 llvm-svn: 324449
* AMDGPU/GlobalISel: Mark 32-bit G_FPTOUI as legalTom Stellard2018-02-071-0/+3
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42152 llvm-svn: 324446
* [AMDGPU] Suppress redundant waitcnt instrs.Mark Searles2018-02-072-19/+39
| | | | | | | | | | | | 1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR. 2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the prev instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whomever, e.g., the memory legalizer, inserted it knows what they were doing. 3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production. Differential Revision: https://reviews.llvm.org/D42854 llvm-svn: 324440
* AMDGPU: Select BFI patterns with 64-bit intsMatt Arsenault2018-02-073-6/+46
| | | | llvm-svn: 324431
* [DAGCombiner][AMDGPU][X86] Turn cttz/ctlz into ↵Craig Topper2018-02-061-11/+0
| | | | | | | | | | | | cttz_zero_undef/ctlz_zero_undef if we can prove the input is never zero X86 currently has a late DAG combine after cttz/ctlz are turned into BSR+BSF+CMOV to detect this and remove the CMOV. But we should be able to do this much earlier and avoid creating the cmov all together. For the changed AMDGPU test case it appears that previously the i8 cttz was type legalized to i16 which introduced an OR with 256 in order to limit the result to 8 on the widened type. At this point the result is known to never be zero, but nothing checked that. Then operation legalization is told to promote all i16 cttz to i32. This introduces an extend and a truncate and another OR with 65536 to limit the result to 16. With the DAG combiner change we are able to prevent the creation of the second OR since the opcode will have been changed to cttz_zero_undef after the first OR. I the lack of the OR caused the instruction to change to v_ffbl_b32_sdwa Differential Revision: https://reviews.llvm.org/D42985 llvm-svn: 324427
* Place undefined globals in .bss instead of .dataEli Friedman2018-02-061-1/+14
| | | | | | | | | | | | | | | | | | | | | | Following up on the discussion from http://lists.llvm.org/pipermail/llvm-dev/2017-April/112305.html, undef values are now placed in the .bss as well as null values. This prevents undef global values taking up potentially huge amounts of space in the .data section. The following two lines now both generate equivalent .bss data: @vals1 = internal unnamed_addr global [20000000 x i32] zeroinitializer, align 4 @vals2 = internal unnamed_addr global [20000000 x i32] undef, align 4 ; previously unaccounted for This is primarily motivated by the corresponding issue in the Rust compiler (https://github.com/rust-lang/rust/issues/41315). Differential Revision: https://reviews.llvm.org/D41705 Patch by varkor! llvm-svn: 324424
* [AArch64] Adjust the cost model for Exynos M3Evandro Menezes2018-02-061-4/+4
| | | | | | | Fix the modeling of long division and SIMD conversion from integer and horizontal minimum and maximum. llvm-svn: 324417
* [Hexagon] Extract HVX lowering and selection into HVX-specific files, NFCKrzysztof Parzyszek2018-02-066-581/+572
| | | | llvm-svn: 324392
* [Hexagon] Lower concat of more than 2 vectors into build_vectorKrzysztof Parzyszek2018-02-062-15/+24
| | | | llvm-svn: 324391
* [AMDGPU] removed dead code handling rmw in memory legalizerStanislav Mekhanoshin2018-02-061-66/+3
| | | | | | | | | It was always using cmpxchg path and in rmw and cmpxchg instructions are not distinguishable in the BE. Differential Revision: https://reviews.llvm.org/D42976 llvm-svn: 324383
* [Hexagon] Don't form new-value jumps from floating-point instructionsKrzysztof Parzyszek2018-02-061-0/+16
| | | | | | | Additionally, verify that the register defined by the producer is a 32-bit register. llvm-svn: 324381
* [ARM] f16 conversionsSjoerd Meijer2018-02-061-16/+23
| | | | | | | | | This is a follow up of r324321, adding f16 <-> f32 and f16 <-> f64 conversion match patterns. Differential Revision: https://reviews.llvm.org/D42954 llvm-svn: 324360
* [DAG, X86] Improve Dependency analysis when doing multi-nodeNirav Dave2018-02-061-26/+39
| | | | | | | | | | | | | | | | | | | | Instruction Selection Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles / dependencies pruning the search when topological property of NodeId allows. As part of this propogate the NodeId-based cutoffs to narrow hasPreprocessorHelper searches. Reviewers: craig.topper, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41293 llvm-svn: 324359
* AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALUMarek Olsak2018-02-061-2/+8
| | | | | | | | Author: Bas Nieuwenhuizen https://reviews.llvm.org/D42881 llvm-svn: 324353
* [Hexagon] Remove leftover assertKrzysztof Parzyszek2018-02-061-3/+1
| | | | llvm-svn: 324352
* [Hexagon] Split HVX operations on vector pairsKrzysztof Parzyszek2018-02-064-70/+265
| | | | | | | | Vector pairs are legal types, but not every operation can work on pairs. For those operations that are legal for single vectors, generate a concat of their results on pair halves. llvm-svn: 324350
* [Hexagon] Add helper functions to identify single/pair vector types, NFCKrzysztof Parzyszek2018-02-062-3/+17
| | | | llvm-svn: 324349
* [Hexagon] Handle lowering of SETCC via setCondCodeActionKrzysztof Parzyszek2018-02-064-72/+82
| | | | | | | | | | It was expanded directly into instructions earlier. That was to avoid loads from a constant pool for a vector negation: "xor x, splat(i1 -1)". Implement ISD opcodes QTRUE and QFALSE to denote logical vectors of all true and all false values, and handle setcc with negations through selection patterns. llvm-svn: 324348
* [X86][SSE] Add PACKUS support for truncation of clamped valuesSimon Pilgrim2018-02-061-3/+13
| | | | | | Followup to D42544 that matches PACKUSWB cases for non-AVX512, SSE and PACKUSDW cases will have to wait until we can add support for general SMIN/SMAX matching. llvm-svn: 324347
* [AMDGPU] do not generate .AMDGPU.config for amdpal os typeTim Renouf2018-02-061-16/+12
| | | | | | | | | | | | | | | Summary: Now we generate PAL metadata for the amdpal os type, there is no need to generate the .AMDGPU.config section. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37760 Change-Id: I303c5fad66656ce97293da60621afac6595b4c18 llvm-svn: 324346
* [AArch64][SVE] Asm: Add AND_ZI instructions and aliasesSander de Smalen2018-02-064-0/+136
| | | | | | | | | | | | | | Summary: Adds support for the SVE AND instruction with vector and logical-immediate operands, and their corresponding aliases. Reviewers: fhahn, rengolin, samparker, echristo, aadg, kristof.beyls Reviewed By: fhahn Subscribers: aemerson, javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D42295 llvm-svn: 324343
* [X86][SSE] Add PACKSS support for truncation of clamped valuesSimon Pilgrim2018-02-061-2/+8
| | | | | | Followup to D42544 that matches PACKSSWB cases for non-AVX512, SSE and PACKSSDW cases will have to wait until we can add support for general SMIN/SMAX matching. llvm-svn: 324339
* [PowerPC] fix up in rL324229, NFCHiroshi Inoue2018-02-061-1/+1
| | | | | | This patch fixes up my previous commit (add initialization of local variables). llvm-svn: 324336
OpenPOWER on IntegriCloud