path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [Hexagon] Check vector elements for equivalence in the HexagonVectorLoopCarriedReuse pass | Ron Lieberman | 2017-10-02 | 1 | -1/+16
  If the two instructions being compared for equivalence have corresponding operands that are integer constants, then check their values to determine equivalence.
  Patch by Suyog Sarda!
  llvm-svn: 314642
* [Hexagon] Patch to Extract i1 element from vector of i1 | Ron Lieberman | 2017-10-02 | 1 | -1/+7
  This patch extracts one element, at a given index, from a vector whose elements are 1 bit wide.
  llvm-svn: 314641
* [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem | Craig Topper | 2017-10-01 | 8 | -34/+33
  Summary:
  Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.

  For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.

  I believe this supersedes D38025 which was trying to switch the register&register form back to pre-PR22995.

  Reviewers: aymanmus, RKSimon, zvi
  Reviewed By: aymanmus
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38120
  llvm-svn: 314639
* [X86] Remove a couple unnecessary COPY_TO_REGCLASS from some output patterns where the instruction already produces the correct register class. | Craig Topper | 2017-10-01 | 1 | -9/+7
  llvm-svn: 314638
* [X86][SSE] Add faux shuffle combining support for PACKUS | Simon Pilgrim | 2017-10-01 | 1 | -4/+15
  llvm-svn: 314631
* [X86][SSE] Improve shuffle combining of PACKSS instructions. | Simon Pilgrim | 2017-10-01 | 1 | -6/+24
  Support unary packing and fix the faux shuffle mask for vectors larger than 128 bits.
  llvm-svn: 314629
* [x86] formatting; NFC | Sanjay Patel | 2017-10-01 | 1 | -4/+2
  llvm-svn: 314627
* [X86][SSE] Fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 | Simon Pilgrim | 2017-09-30 | 1 | -0/+9
  Remove the sign-extend-in-register style pattern if the sign is already extended far enough.
  llvm-svn: 314599
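  The fold in the commit above can be checked with a small scalar model. This is only a sketch, assuming a single 32-bit lane; the helper names (`num_sign_bits`, `shl_then_sra`) are hypothetical and are not LLVM APIs. It shows why the condition is a strict `>`: when C1 equals the number of sign bits, the left shift discards a bit that differs from the sign.

```python
BITS = 32                      # assumed lane width for this illustration
MASK = (1 << BITS) - 1


def to_signed(value):
    """Interpret the low BITS bits of value as a two's-complement integer."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


def num_sign_bits(x):
    """Count the leading bits equal to the sign bit, including the sign bit."""
    u = x & MASK
    sign = u >> (BITS - 1)
    count = 0
    for i in range(BITS - 1, -1, -1):
        if (u >> i) & 1 != sign:
            break
        count += 1
    return count


def shl_then_sra(x, c):
    """Model of (VSRAI (VSHLI X, C), C) on one lane."""
    shifted = (x << c) & MASK
    # Python's >> on a negative int is an arithmetic shift.
    return to_signed(shifted) >> c


x = -5                                  # 29 sign bits in a 32-bit lane
print(num_sign_bits(x))                 # 29
print(shl_then_sra(x, 28) == x)         # True: 29 > 28, the fold is valid
print(shl_then_sra(x, 29) == x)         # False: 29 > 29 does not hold
```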
* [AVX-512] Add patterns to make fp compare instructions commutable during isel. | Craig Topper | 2017-09-30 | 2 | -1/+90
  llvm-svn: 314598
* Code refactoring for the interleaved code <NFC> | Michael Zuckerman | 2017-09-30 | 1 | -28/+18
  Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1
  llvm-svn: 314596
* [X86] Support v64i8 mulhu/mulhs | Craig Topper | 2017-09-30 | 1 | -1/+9
  Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results.
  Differential Revision: https://reviews.llvm.org/D38307
  llvm-svn: 314584
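  Splitting works because mulhu/mulhs are lane-wise operations, so each half can be computed independently. The sketch below models that on plain Python lists of byte values; the function names are hypothetical and say nothing about how the X86 lowering code itself is written.

```python
def to_i8(value):
    """Interpret a byte value (0..255) as a signed 8-bit integer."""
    return value - 256 if value & 0x80 else value


def mulhu8(a, b):
    """Lane-wise unsigned multiply, keeping the high 8 bits of each product."""
    return [((x * y) >> 8) & 0xFF for x, y in zip(a, b)]


def mulhs8(a, b):
    """Lane-wise signed multiply, keeping the high 8 bits of each product."""
    return [((to_i8(x) * to_i8(y)) >> 8) & 0xFF for x, y in zip(a, b)]


def split_and_concat(op, a, b):
    """Apply op to the low and high halves separately, then concatenate."""
    half = len(a) // 2
    return op(a[:half], b[:half]) + op(a[half:], b[half:])


# Two 64-lane inputs standing in for v64i8 vectors.
lhs = [(7 * i + 3) % 256 for i in range(64)]
rhs = [(13 * i + 250) % 256 for i in range(64)]
print(split_and_concat(mulhu8, lhs, rhs) == mulhu8(lhs, rhs))   # True
print(split_and_concat(mulhs8, lhs, rhs) == mulhs8(lhs, rhs))   # True
```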
* [AMDGPU] Set fast-math flags on functions given the options | Stanislav Mekhanoshin | 2017-09-29 | 3 | -7/+36
  We have a single library build without relaxation options. When inlined, library functions remove fast math attributes from the functions they are integrated into.

  This patch sets relaxation attributes on the functions after linking, provided the corresponding relaxation options are given. Math instructions inside the inlined functions still have no fast flags, but inlining no longer prevents fast math transformations of the surrounding caller code.

  Differential Revision: https://reviews.llvm.org/D38325
  llvm-svn: 314568
* AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC | Nicolai Haehnle | 2017-09-29 | 5 | -11/+28
  The hardware will only forward EXEC_LO; the high 32 bits will be zero. Additionally, inline constants do not work. At least,

      v_addc_u32_e64 v0, vcc, v0, v1, -1

  which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero.

  The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine

      s_mov_b64 s[0:1], exec
      v_cndmask_b32_e64 v0, v1, v2, s[0:1]

  into

      v_mov_b32 v0, v3

  but it's not particularly high priority.

  Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*
  llvm-svn: 314522
* [SystemZ] implement shouldCoalesce() | Jonas Paulsson | 2017-09-29 | 6 | -4/+90
  Implement shouldCoalesce() to help regalloc avoid running out of GR128 registers.

  If a COPY involving a subreg of a GR128 is coalesced, the live range of the GR128 virtual register will be extended. If this happens where there are enough phys-reg clobbers present, regalloc will run out of registers (if there is not a single GR128 allocatable register available).

  This patch tries to allow coalescing only when it can prove that this will be safe by checking the (local) interval in question.

  Review: Ulrich Weigand, Quentin Colombet
  https://reviews.llvm.org/D37899
  https://bugs.llvm.org/show_bug.cgi?id=34610
  llvm-svn: 314516
* [X86] Improve codegen for inverted overflow checking intrinsics. | Amara Emerson | 2017-09-29 | 1 | -0/+20
  Adds a new combine for:

      xor(setcc cc, val), 1 --> setcc (invert(cc), val)

  Differential Revision: https://reviews.llvm.org/D38161
  llvm-svn: 314514
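  The combine above relies on every x86 condition code having an exact inverse, so flipping the 0/1 result of a setcc is the same as testing the inverted condition. A minimal model of that equivalence follows, assuming only three flag-based condition pairs; the table and function names are illustrative and are not taken from the patch.

```python
# x86 condition codes expressed as predicates over a flags dictionary.
CONDITIONS = {
    "e": lambda f: f["ZF"] == 1,    # equal / zero
    "ne": lambda f: f["ZF"] == 0,   # not equal / not zero
    "o": lambda f: f["OF"] == 1,    # overflow
    "no": lambda f: f["OF"] == 0,   # no overflow
    "b": lambda f: f["CF"] == 1,    # below (carry set)
    "ae": lambda f: f["CF"] == 0,   # above or equal (carry clear)
}

# Each condition paired with its logical inverse.
INVERT = {"e": "ne", "ne": "e", "o": "no", "no": "o", "b": "ae", "ae": "b"}


def setcc(cc, flags):
    """Return 1 if condition cc holds for flags, else 0."""
    return int(CONDITIONS[cc](flags))


def xor_of_setcc(cc, flags):
    """The original pattern: xor(setcc cc, flags), 1."""
    return setcc(cc, flags) ^ 1


def inverted_setcc(cc, flags):
    """The combined pattern: setcc(invert(cc), flags)."""
    return setcc(INVERT[cc], flags)


flags = {"ZF": 0, "OF": 1, "CF": 0}
print(xor_of_setcc("o", flags), inverted_setcc("o", flags))   # 0 0
```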
* [ARM] v8.3-a complex number support | Sam Parker | 2017-09-29 | 7 | -2/+298
  New instructions are added to AArch32 and AArch64 to aid floating-point multiplication and addition of complex numbers, where the complex numbers are packed in a vector register as a pair of elements. The Imaginary part of the number is placed in the more significant element, and the Real part of the number is placed in the less significant element.

  This patch adds assembler for the ARM target.

  Differential Revision: https://reviews.llvm.org/D36789
  llvm-svn: 314511
* Small modification <NFC> | Michael Zuckerman | 2017-09-29 | 1 | -1/+1
  Change-Id: I360abccee12cae29bd2ac4f8399c9ecc92eb7f13
  llvm-svn: 314510
* [mips] Reordering callseq* nodes to be linear | Aleksandar Beserminji | 2017-09-29 | 2 | -26/+27
  Fix nested callseq* nodes by moving callseq_start after the arguments calculation to temporary registers, so that callseq* nodes in resulting DAG are linear.

  Recommitting r314497. This version does not contain the test which fails when the compiler is not built in debug mode.

  Differential Revision: https://reviews.llvm.org/D37328
  llvm-svn: 314507
* Revert "[mips] Reordering callseq* nodes to be linear" | Aleksandar Beserminji | 2017-09-29 | 2 | -27/+26
  Added test relies on the compiler being built in debug mode, which may not be the case.

  This reverts commit r314497.
  llvm-svn: 314506
* [mips] Add missing license info, formatting changes. NFCI | Simon Dardis | 2017-09-29 | 1 | -30/+47
  Add missing license information to MicroMipsInstrFPU.td and fix most of the formatting errors present. Others will be addressed in follow-up commits.
  llvm-svn: 314505
* [AMDGPU] calling conventions for AMDPAL OS type | Tim Renouf | 2017-09-29 | 6 | -2/+21
  Summary:
  This commit adds comments on how the AMDPAL OS type overloads the existing AMDGPU_ calling conventions used by Mesa, and adds a couple of new ones.

  Reviewers: arsenm, nhaehnle, dstuttard
  Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D37752
  llvm-svn: 314502
* [AMDGPU] AMDPAL scratch buffer support | Tim Renouf | 2017-09-29 | 5 | -12/+95
  Summary:
  Added support for scratch (including spilling) for OS type amdpal: generates code to set up the scratch descriptor if it is needed.

  With amdpal, the scratch resource descriptor is loaded from offset 0 of the global information table. The low 32 bits of the address of the global information table are passed in s0.

  Added amdgpu-git-ptr-high function attribute to hard-wire the high 32 bits of the address of the global information table. If the function attribute is not specified, or is 0xffffffff, then the backend generates code to use the high 32 bits of pc.

  The documentation for the AMDPAL ABI will be added in a later commit.

  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye
  Differential Revision: https://reviews.llvm.org/D37483
  llvm-svn: 314501
* [Triple] Add AMDPAL operating system type | Tim Renouf | 2017-09-29 | 1 | -0/+4
  Summary:
  This operating system type represents the AMDGPU PAL runtime, and will be required by the AMDGPU backend in order to generate correct code for this runtime.

  Currently it generates the same code as not specifying an OS at all. That will change in future commits.

  Patch from Tim Corringham.

  Subscribers: arsenm, nhaehnle
  Differential Revision: https://reviews.llvm.org/D37380
  llvm-svn: 314500
* [mips] Reordering callseq* nodes to be linear | Aleksandar Beserminji | 2017-09-29 | 2 | -26/+27
  Fix nested callseq* nodes by moving callseq_start after the arguments calculation to temporary registers, so that callseq* nodes in resulting DAG are linear.

  Differential Revision: https://reviews.llvm.org/D37328
  llvm-svn: 314497
* [X86][MS-InlineAsm] Extended support for variables / identifiers on memory / immediate expressions | Coby Tayree | 2017-09-29 | 1 | -60/+90
  Allow the proper recognition of Enum values and global variables inside ms inline-asm memory / immediate expressions, as they require some additional overhead and are treated incorrectly if not recognized early.

  Supersedes D33278, D35774.

  Differential Revision: https://reviews.llvm.org/D37412
  llvm-svn: 314493
* [X86] Don't select (cmp (and, imm), 0) to testw | Craig Topper | 2017-09-28 | 1 | -1/+4
  Summary:
  X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters. I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.

  Reviewers: RKSimon, zvi, spatel
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38273
  llvm-svn: 314474
* ARM: Fix cases where CSI Restored bit is not cleared | Matthias Braun | 2017-09-28 | 3 | -9/+19
  LR is an atypical callee saved register in that it is restored into a different register (PC) and thus is not live-out of the return block. This case requires the `Restored` flag in CalleeSavedInfo to be cleared.

  This fixes a number of cases where this wasn't handled correctly yet.
  llvm-svn: 314471
* bpf: fix a bug for disassembling ld_pseudo inst | Yonghong Song | 2017-09-28 | 1 | -1/+2
  Signed-off-by: Yonghong Song <yhs@fb.com>
  llvm-svn: 314469
* [Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2017-09-28 | 14 | -313/+398
  llvm-svn: 314467
* [SystemZ] Fix fall-out from r314428 | Ulrich Weigand | 2017-09-28 | 1 | -0/+6
  The expensive-checks build bot found a problem with the r314428 commit: if CC is live after an ATOMIC_CMP_SWAPW instruction, it needs to be marked as live-in to the block after the loop the pseudo gets expanded to.

  This actually fixes a code-gen bug as well, since if the CC isn't live, the CR and JLH are merged to a CRJLH which doesn't actually set the condition code any more.
  llvm-svn: 314465
* [X86] Make use of vpmovwb when possible in LowerMULH | Craig Topper | 2017-09-28 | 1 | -15/+8
  If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract.

  Differential Revision: https://reviews.llvm.org/D38375
  llvm-svn: 314457
* [ARM] Restore the right frame pointer register in Int_eh_sjlj_longjmp | Martin Storsjo | 2017-09-28 | 1 | -14/+58
  In setupEntryBlockAndCallSites in CodeGen/SjLjEHPrepare.cpp, we fetch and store the actual frame pointer, but on return via the longjmp intrinsic, it always was restored into the r7 variable.

  On windows, the frame pointer should be restored into r11 instead of r7.

  On Darwin (where sjlj exception handling is used by default), the frame pointer is always r7, both in arm and thumb mode, and likewise, on windows, the frame pointer always is r11.

  On linux however, if sjlj exception handling is enabled (which it isn't by default), libcxxabi and the user code can be built in differing modes using different registers as frame pointer. Therefore, when restoring registers on a platform where we don't always use the same register depending on code mode, restore both r7 and r11.

  Differential Revision: https://reviews.llvm.org/D38253
  llvm-svn: 314451
* [ARM] Fix SJLJ exception handling when manually chosen on a platform where it isn't default | Martin Storsjo | 2017-09-28 | 1 | -1/+3
  Differential Revision: https://reviews.llvm.org/D38252
  llvm-svn: 314450
* [X86] Use target independent ZERO_EXTEND/SIGN_EXTEND nodes where possible in LowerMULH | Craig Topper | 2017-09-28 | 1 | -9/+10
  We aren't doing any in-register extends here, so we should be able to just use the target independent nodes directly and allow them to be lowered as necessary.
  llvm-svn: 314447
* [X86] Move a setOperation action for ISD::TRUNCATE near another one in the same if. Remove one that is redundant with another subtarget feature. | Craig Topper | 2017-09-28 | 1 | -2/+1
  llvm-svn: 314446
* [X86] Use BWI instructions to improve lowering of v32i8 MULHU/S | Craig Topper | 2017-09-28 | 1 | -0/+18
  Summary:
  If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting.

  Reviewers: RKSimon, spatel, zvi
  Reviewed By: zvi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38305
  llvm-svn: 314432
* [X86] Remove dead code from X86ISelDAGToDAG.cpp multiply handling | Craig Topper | 2017-09-28 | 1 | -1/+1
  Summary:
  Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash.

  DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. The same is true for i16, though without the bug.

  Reviewers: RKSimon, spatel, zvi
  Reviewed By: zvi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38276
  llvm-svn: 314430
* [X86] Use correct subvector index when combining two insert subvectors featuring zero vectors. | Craig Topper | 2017-09-28 | 1 | -1/+1
  Previously we were using one of the subvector indices twice. The included test case causes an assert without this change.

  Thanks to Simon Pilgrim for catching this.
  llvm-svn: 314429
* [SystemZ] Custom-expand ATOMIC_CMP_AND_SWAP_WITH_SUCCESS | Ulrich Weigand | 2017-09-28 | 4 | -22/+67
  The SystemZ compare-and-swap instructions already provide the "success" indication via a condition-code value, so the default expansion of those operations generates an unnecessary extra comparison.
  llvm-svn: 314428
* Use SDValue::getConstantOperandVal helper. NFCI. | Simon Pilgrim | 2017-09-28 | 1 | -3/+3
  llvm-svn: 314425
* [mips] Remove codegen support for branch likely instructions. | Simon Dardis | 2017-09-28 | 2 | -18/+49
  This patch disables codegen support for branch likely instructions to address a potential bug. These branches were unselectable as they had the same patterns as the normal branches but came after them when ISel was concerned.

  The branch likely instructions were marked as having no delay slots when they have annulling delay slots. The delay slot filler does not currently handle annulling delay slot branches, so this would lead to wrong codegen if these branches were generated.

  Reviewers: atanasyan, nitesh.jain
  Differential Revision: https://reviews.llvm.org/D38169
  llvm-svn: 314421
* [x86][AsmParser] Allow some more MS size directives | Coby Tayree | 2017-09-28 | 1 | -0/+3
  MS allows the following size directives: float and double as synonyms for dword and qword, and long as a synonym for dword.

  Differential Revision: https://reviews.llvm.org/D37190
  llvm-svn: 314410
* Teach TargetInstrInfo::getInlineAsmLength to parse .space directives with integer arguments | Alex Bradbury | 2017-09-28 | 2 | -45/+0
  It's currently quite difficult to test passes like branch relaxation, which requires branches with large displacement to be generated. The .space assembler directive makes it easy to create arbitrarily large basic blocks, but getInlineAsmLength is not able to parse it and so the size of the block is not correctly estimated. Other backends (AArch64, AMDGPU) introduce options just for testing that artificially restrict the ranges of branch instructions (e.g. aarch64-tbz-offset-bits). Although parsing a single form of the .space directive feels inelegant, it does allow a more direct testing approach.

  This patch adapts the .space parsing code from Mips16InstrInfo::getInlineAsmLength and removes it now the extra functionality is provided by the base implementation. I want to move this functionality to the generic getInlineAsmLength as 1) I need the same for RISC-V, and 2) I feel other backends will benefit from more direct testing of large branch displacements.

  Differential Revision: https://reviews.llvm.org/D37798
  llvm-svn: 314393
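  The idea of the commit above can be sketched as a conservative size estimator. This is a simplified model and not the LLVM implementation: the function name, the default maximum instruction length of 4 bytes, and the `;` / `#` separator and comment characters are all assumptions made for illustration. Every statement is charged the maximum instruction length, except a `.space N` directive with a single integer argument, which is charged exactly N bytes.

```python
def estimate_inline_asm_length(asm, max_inst_length=4, separator=";", comment="#"):
    """Conservatively estimate the byte size of an inline assembly string.

    Hypothetical model: each statement counts as one maximum-length
    instruction, except `.space N` with one integer argument, which
    counts as N bytes.
    """
    total = 0
    for line in asm.splitlines():
        line = line.split(comment, 1)[0]          # drop trailing comment
        for statement in line.split(separator):
            tokens = statement.split()
            if not tokens:
                continue                          # blank statement
            if tokens[0] == ".space" and len(tokens) == 2:
                try:
                    size = int(tokens[1], 0)      # accepts decimal or 0x hex
                except ValueError:
                    size = -1
                if size >= 0:
                    total += size
                    continue
            # Any other statement, or an unparsed .space form, is treated
            # as one instruction of maximum length.
            total += max_inst_length
    return total


print(estimate_inline_asm_length("nop\n.space 1024\nnop"))   # 1032
```

  With such an estimate a test can force a long branch displacement by placing a large `.space` between a branch and its target, instead of relying on a testing-only option that narrows branch ranges.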
* [PowerPC] eliminate partially redundant compare instruction | Hiroshi Inoue | 2017-09-28 | 1 | -14/+180
  This is a follow-on of D37211. D37211 eliminates a compare instruction if two conditional branches can be made based on the one compare instruction, e.g.

      if (a == 0) { ... }
      else if (a < 0) { ... }

  This patch extends this optimization to support partially redundant cases, which often happen in while loops. For example, one compare instruction is moved from the loop body into the preheader by this optimization in the following example.

      do {
        if (a == 0) dummy1();
        a = func(a);
      } while (a > 0);

  Differential Revision: https://reviews.llvm.org/D38236
  llvm-svn: 314390
* [RISCV] Add common fixups and relocations | Alex Bradbury | 2017-09-28 | 11 | -39/+593
  %lo(), %hi(), and %pcrel_hi() are supported and test cases have been added to ensure the appropriate fixups and relocations are generated. I've added an instruction format field which is used in RISCVMCCodeEmitter to, for instance, tell whether it should emit a lo12_i fixup or a lo12_s fixup (RISC-V has two 12-bit immediate encodings depending on the instruction type).

  Differential Revision: https://reviews.llvm.org/D23568
  llvm-svn: 314389
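  To illustrate why two different low-12-bit fixups are needed, the sketch below computes the %hi/%lo split of a 32-bit address and places the 12-bit immediate in the I-type and S-type positions of an instruction word. The function names are hypothetical and this is not the RISCVMCCodeEmitter code; the bit positions follow the base RISC-V instruction formats (I-type: imm[11:0] in bits 31:20; S-type: imm[11:5] in bits 31:25 and imm[4:0] in bits 11:7).

```python
def hi20(address):
    """%hi: upper 20 bits, rounded so that a sign-extended %lo adds back."""
    return ((address + 0x800) >> 12) & 0xFFFFF


def lo12(address):
    """%lo: low 12 bits interpreted as a signed 12-bit immediate."""
    low = address & 0xFFF
    return low - 0x1000 if low & 0x800 else low


def place_imm_i(imm):
    """I-type placement: imm[11:0] goes into instruction bits 31:20."""
    return (imm & 0xFFF) << 20


def place_imm_s(imm):
    """S-type placement: imm[11:5] -> bits 31:25, imm[4:0] -> bits 11:7."""
    imm &= 0xFFF
    return ((imm >> 5) << 25) | ((imm & 0x1F) << 7)


address = 0x12345FFF
# lui + addi style reconstruction: (hi << 12) + lo gives back the address.
print(hex(hi20(address)), lo12(address))                  # 0x12346 -1
print(((hi20(address) << 12) + lo12(address)) == address) # True
print(hex(place_imm_i(lo12(address))))                    # 0xfff00000
print(hex(place_imm_s(lo12(address))))                    # 0xfe000f80
```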
* bpf: add new insns for bswap_to_le and negation | Yonghong Song | 2017-09-28 | 3 | -14/+70
  This patch adds new insn, "reg = be16/be32/be64 reg", for bswap to little endian for big-endian target (bpfeb). It also adds new insn for negation "reg = -reg".

  Currently, for source code, e.g.,

      b = -a

  LLVM still prefers to generate:

      b = 0 - a

  But "reg = -reg" format can be used in assembly code.

  Signed-off-by: Yonghong Song <yhs@fb.com>
  Acked-by: Alexei Starovoitov <ast@kernel.org>
  llvm-svn: 314376
* Reverted r313993. | Galina Kistanova | 2017-09-27 | 1 | -15/+0
  This patch produces a crash and hexagon_vector_loop_carried_reuse_constant.ll test fails on Windows (llvm-clang-x86_64-expensive-checks-win build bot).
  llvm-svn: 314361
* [MachineOutliner] AArch64: Avoid saving + restoring LR if possible | Jessica Paquette | 2017-09-27 | 4 | -82/+204
  This commit allows the outliner to avoid saving and restoring the link register on AArch64 when it is dead within an entire class of candidates.

  This introduces changes to the way the outliner interfaces with the target. For example, the target now interfaces with the outliner using a MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and getOutliningFrameOverhead.

  This also improves several comments on the outliner's cost model.

  https://reviews.llvm.org/D36721
  llvm-svn: 314341
* Revert r314249 "Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST.""" | Craig Topper | 2017-09-27 | 4 | -37/+27
  This caused PR34751
  llvm-svn: 314339
* Revert r314248 "[X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg." | Craig Topper | 2017-09-27 | 1 | -5/+7
  This contributed to PR34751
  llvm-svn: 314338