summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86][XOP] Fix VPERMIL2PD mask creation on 32-bit targetsSimon Pilgrim2016-09-091-5/+5
| | | | | | Use getConstVector helper to correctly create v2i64/v4i64 constants on 32-bit targets llvm-svn: 281105
* [Hexagon] Fix disassembler crash after r279255Krzysztof Parzyszek2016-09-091-0/+3
| | | | | | | When p0 was added as an explicit operand to the duplex subinstructions, the disassembler was not updated to reflect this. llvm-svn: 281104
* [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.Justin Lebar2016-09-092-16/+132
| | | | | | | | | | | | | | | | | | | | Summary: Previously these only worked via NVPTX-specific intrinsics. This change will allow us to convert these target-specific intrinsics into the general LLVM versions, allowing existing LLVM passes to reason about their behavior. It also gets us some minor codegen improvements as-is, from situations where we canonicalize code into one of these llvm intrinsics. Reviewers: majnemer Subscribers: llvm-commits, jholewinski, tra Differential Revision: https://reviews.llvm.org/D24300 llvm-svn: 281092
* ARM: move the builtins libcall CC setupSaleem Abdulrasool2016-09-092-0/+169
| | | | | | | | | Move the target specific setup into the target specific lowering setup. As pointed out by Anton, the initial change was moving this too high up the stack resulting in a violation of the layering (the target generic code path setup target specific bits). Sink this into the ARM specific setup. NFC. llvm-svn: 281088
* AMDGPU : Fix mqsad_u32_u8 instruction incorrect data type.Wei Ding2016-09-093-9/+17
| | | | | | Differential Revision: http://reviews.llvm.org/D23700 llvm-svn: 281081
* AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSATom Stellard2016-09-092-1/+6
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, wdng, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D24405 llvm-svn: 281080
* AMDGPU] Assembler: better support for immediate literals in assembler.Sam Kolton2016-09-0914-351/+708
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals. E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least. With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction). Here are rules how we convert literals: - We parsed fp literal: - Instruction expects 64-bit operand: - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5) - then we do nothing this literal - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5) - report error - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant - If instruction expect fp operand type (f64) - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5) - If so then do nothing - Else (e.g. v_fract_f64 v[0:1], 3.1415) - report warning that low 32 bits will be set to zeroes and precision will be lost - set low 32 bits of literal to zeroes - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5) - report error as it is unclear how to encode this literal - Instruction expects 32-bit operand: - Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5) - do nothing - Else report error - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0) - Parsed binary literal: - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35) - do nothing - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35) - report error - Else, literal is not-inlinable and we are not required to inline it - Are high 32 bit of literal zeroes or same as sign bit (32 bit) - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef) - Else - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0) For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types: ''' enum OperandType { OPERAND_REG_IMM32_INT, OPERAND_REG_IMM32_FP, OPERAND_REG_INLINE_C_INT, OPERAND_REG_INLINE_C_FP, } ''' This is not working yet: - Several tests are failing - Problems with predicate methods for inline immediates - LLVM generated assembler parts try to select e64 encoding before e32. More changes are required for several AsmOperands. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, artem.tamazov Differential Revision: https://reviews.llvm.org/D22922 llvm-svn: 281050
* [Sparc][LEON] Removed the parts of the errata fixes implemented using inline ↵Chris Dewhurst2016-09-091-76/+0
| | | | | | assembly as this is not the desired behaviour for end-users. Small change to a unit test to implement this without requiring the inline assembly. llvm-svn: 281047
* [ARM] ADD with a negative offset can become SUB for freeJames Molloy2016-09-091-0/+4
| | | | | | So model that directly in TTI::getIntImmCost(). llvm-svn: 281044
* [ARM] icmp %x, -C can be lowered to a simple ADDS or CMNJames Molloy2016-09-091-0/+11
| | | | | | Tell TargetTransformInfo about this so ConstantHoisting is informed. llvm-svn: 281043
* [Thumb] Select (CMPZ X, -C) -> (CMPZ (ADDS X, C), 0)James Molloy2016-09-091-0/+42
| | | | | | The CMPZ #0 disappears during peepholing, leaving just a tADDi3, tADDi8 or t2ADDri. This avoids having to materialize the expensive negative constant in Thumb-1, and allows a shrinking from a 32-bit CMN to a 16-bit ADDS in Thumb-2. llvm-svn: 281040
* GlobalISel: remove G_TYPE and G_PHITim Northover2016-09-091-10/+0
| | | | | | | | These instructions were only necessary when type information was stored in the MachineInstr (because only generic MachineInstrs possessed a type). Now that it's in MachineRegisterInfo, COPY and PHI work fine. llvm-svn: 281037
* GlobalISel: move type information to MachineRegisterInfo.Tim Northover2016-09-093-18/+16
| | | | | | | | | | | | | | | | | We want each register to have a canonical type, which means the best place to store this is in MachineRegisterInfo rather than on every MachineInstr that happens to use or define that register. Most changes following from this are pretty simple (you need an MRI anyway if you're going to be doing any transformations, so just check the type there). But legalization doesn't really want to check redundant operands (when, for example, a G_ADD only ever has one type) so I've made use of MCInstrDesc's operand type field to encode these constraints and limit legalization's work. As an added bonus, more validation is possible, both in MachineVerifier and MachineIRBuilder (coming soon). llvm-svn: 281035
* Revert "[mips] Fix c.<cc>.<fmt> instruction definition."Simon Dardis2016-09-0915-539/+209
| | | | | | | This reverts commit r281022. Mips buildbot broke, due to unhandled register class FCC. llvm-svn: 281033
* [AMDGPU] Assembler: rename amd_kernel_code_t asm names according to specSam Kolton2016-09-093-242/+85
| | | | | | | | | | | | | | Summary: Also removed duplicate code from AMDGPUTargetAsmStreamer. This change only change how amd_kernel_code_t is parsed and printed. No variable names are changed. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D24296 llvm-svn: 281028
* [Thumb1] Teach optimizeCompareInstr about thumb1 comparesJames Molloy2016-09-091-4/+21
| | | | | | | | This avoids us doing a completely unneeded "cmp r0, #0" after a flag-setting instruction if we only care about the Z or C flags. Add LSL/LSR to the whitelist while we're here and add testing. This code could really do with a spring clean. llvm-svn: 281027
* [AMDGPU] Assembler: match e32 VOP instructions before e64.Sam Kolton2016-09-097-32/+126
| | | | | | | | | | | | | | | | | | | Summary: Split assembler match table in 4 tables with assembler variants: Default - all instructions except VOP3, SDWA and DPP - VOP3 - SDWA - DPP First match Default table then VOP3, SDWA and DPP. Reviewers: tstellarAMD, artem.tamazov, vpykhtin Subscribers: arsenm, wdng, nhaehnle, AMDGPU Differential Revision: https://reviews.llvm.org/D24252 llvm-svn: 281023
* [mips] Fix c.<cc>.<fmt> instruction definition.Simon Dardis2016-09-0915-209/+539
| | | | | | | | | | | | | | | As part of this effort, remove MipsFCmp nodes and use tablegen patterns rather than custom lowering through C++. Unexpectedly, this improves codesize for microMIPS as previous floating point setcc expansions would materialize 0 and 1 into GPRs before using the relevant mov[tf].[sd] instruction. Now $zero is used directly. Reviewers: dsanders, vkalintiris, zoran.jovanovic Differential Review: https://reviews.llvm.org/D23118 llvm-svn: 281022
* [AVX-512] Add VPCMP instructions to the load folding tables and make them ↵Craig Topper2016-09-092-1/+57
| | | | | | commutable. llvm-svn: 281013
* [X86] Tighten up a comment which confused x64 ABI terminology.David Majnemer2016-09-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | The x64 ABI has two major function types: - frame functions - leaf functions A frame function is one which requires a stack frame. A leaf function is one which does not. A frame function may or may not have a frame pointer. A leaf function does not require a stack frame and may never modify SP except via a return (RET, tail call via JMP). A frame function which has a frame pointer is permitted to use the LEA instruction in the epilogue, a frame function without which doesn't establish a frame pointer must use ADD to adjust the stack pointer epilogue. Fun fact: Leaf functions don't require a function table entry (associated PDATA/XDATA). llvm-svn: 281006
* Win64: Don't use REX prefix for direct tail callsHans Wennborg2016-09-085-8/+4
| | | | | | | | | | The REX prefix should be used on indirect jmps, but not direct ones. For direct jumps, the unwinder looks at the offset to determine if it's inside the current function. Differential Revision: https://reviews.llvm.org/D24359 llvm-svn: 281003
* [RDF] Further improve handling of multiple phis reached from shadowsKrzysztof Parzyszek2016-09-081-31/+16
| | | | llvm-svn: 280987
* AMDGPU: Sign extend constants when splitting themMatt Arsenault2016-09-081-3/+2
| | | | | | | This will confuse later passes which try to look at the immediate value and don't truncate first. llvm-svn: 280974
* [Hexagon] Expand sext- and zextloads of vector types, not just extloadsKrzysztof Parzyszek2016-09-081-1/+5
| | | | | | Recent change exposed this issue, breaking the Hexagon buildbots. llvm-svn: 280973
* AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32Matt Arsenault2016-09-081-9/+14
| | | | llvm-svn: 280972
* AArch64 .arch directive - Include default arch attributes with extensions.Eric Christopher2016-09-081-3/+39
| | | | | | | | Fix the .arch asm parser to use the full set of features for the architecture and any extensions on the command line. Add and update testcases accordingly as well as add an extension that was used but not supported. llvm-svn: 280971
* AMDGPU: Support commuting with immediate in src0Matt Arsenault2016-09-082-98/+81
| | | | llvm-svn: 280970
* Revert "[XRay] ARM 32-bit no-Thumb support in LLVM"Renato Golin2016-09-089-121/+33
| | | | | | | | | | And associated commits, as they broke the Thumb bots. This reverts commit r280935. This reverts commit r280891. This reverts commit r280888. llvm-svn: 280967
* [ARM XRay] Try to fix Thumb-only failureRenato Golin2016-09-081-1/+1
| | | | | | | I mised the check that it had to support ARM to work. This commit tries to fix that, to make sure we don't emit ARM code in Thumb-only mode. llvm-svn: 280935
* [Thumb1] AND with a constant operand can be converted into BICJames Molloy2016-09-081-0/+4
| | | | | | | So model the cost of materializing the constant operand C as the minimum of C and ~C. llvm-svn: 280929
* [Thumb1] Fix cost calculation for complemented immediatesJames Molloy2016-09-081-1/+1
| | | | | | | | | | | | Materializing something like "-3" can be done as 2 instructions: MOV r0, #3 MVN r0, r0 This has a cost of 2, not 3. It looks like we were already trying to detect this pattern in TII::getIntImmCost(), but were taking the complement of the zero-extended value instead of the sign-extended value which is unlikely to ever produce a number < 256. There were no tests failing after changing this... :/ llvm-svn: 280928
* Revert "[ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)"Pablo Barrio2016-09-081-18/+1
| | | | | | | | | | This reverts commit r280808. It is possible that this change results in an infinite loop. This is causing timeouts in some tests on ARM, and a Chromebook bot is failing. llvm-svn: 280918
* [mips][microMIPS] Implement DBITSWAP, DLSA and LWUPC and add tests for AUI ↵Hrvoje Varga2016-09-088-12/+126
| | | | | | | | instructions Differential Revision: https://reviews.llvm.org/D16452 llvm-svn: 280909
* [XRay] ARM 32-bit no-Thumb support in LLVMDean Michael Berris2016-09-089-33/+121
| | | | | | | | | | | | This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter. This is one of 3 commits to different repositories of XRay ARM port. The other 2 are: 1. https://reviews.llvm.org/D23932 (Clang test) 2. https://reviews.llvm.org/D23933 (compiler-rt) Differential Revision: https://reviews.llvm.org/D23931 llvm-svn: 280888
* [RDF] Fix liveness analysis for phi nodes with shadow usesKrzysztof Parzyszek2016-09-072-37/+82
| | | | | | | | Shadow uses need to be analyzed together, since each individual shadow will only have a partial reaching def. All shadows together may cover a given register ref, while each individual shadow may not. llvm-svn: 280855
* [RDF] Introduce "undef" flag for ref nodesKrzysztof Parzyszek2016-09-073-24/+74
| | | | llvm-svn: 280851
* AMDGPU: Remove a useless variable which caused build failure for lld.Yaxun Liu2016-09-071-1/+1
| | | | llvm-svn: 280841
* Don't reduce the width of vector mul if the target doesn't support SSE2.Wei Mi2016-09-071-1/+2
| | | | | | | | | The patch is to fix PR30298, which is caused by rL272694. The solution is to bail out if the target has no SSE2. Differential Revision: https://reviews.llvm.org/D24288 llvm-svn: 280837
* X86: Fold tail calls into conditional branches where possible (PR26302)Hans Wennborg2016-09-075-17/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When branching to a block that immediately tail calls, it is possible to fold the call directly into the branch if the call is direct and there is no stack adjustment, saving one byte. Example: define void @f(i32 %x, i32 %y) { entry: %p = icmp eq i32 %x, %y br i1 %p, label %bb1, label %bb2 bb1: tail call void @foo() ret void bb2: tail call void @bar() ret void } before: f: movl 4(%esp), %eax cmpl 8(%esp), %eax jne .LBB0_2 jmp foo .LBB0_2: jmp bar after: f: movl 4(%esp), %eax cmpl 8(%esp), %eax jne bar .LBB0_1: jmp foo I don't expect any significant size savings from this (on a Clang bootstrap I saw 288 bytes), but it does make the code a little tighter. This patch only does 32-bit, but 64-bit would work similarly. Differential Revision: https://reviews.llvm.org/D24108 llvm-svn: 280832
* AMDGPU: Add hidden kernel arguments to runtime metadataYaxun Liu2016-09-072-83/+157
| | | | | | | | OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata. Differential Revision: https://reviews.llvm.org/D23424 llvm-svn: 280829
* [x86] move combines of 'select of 2 constants' to its own function; NFCSanjay Patel2016-09-071-92/+103
| | | | | | There are missing folds here and possibly folds that could be made generic. llvm-svn: 280817
* [ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)Pablo Barrio2016-09-071-1/+18
| | | | | | | | | | | | | | | Summary: This saves a library call to __aeabi_uidivmod. However, the processor must feature hardware division in order to benefit from the transformation. Reviewers: scott-0, jmolloy, compnerd, rengolin Subscribers: t.p.northover, compnerd, aemerson, rengolin, samparker, llvm-commits Differential Revision: https://reviews.llvm.org/D24133 llvm-svn: 280808
* [mips] Disable the TImode shift libcalls for 32-bit targets.Vasileios Kalintiris2016-09-071-0/+7
| | | | | | | | | | | | | | Summary: The o32 ABI doesn't not support the TImode helpers. For the time being, disable just the shift libcalls as they break recursive builds on MIPS. Reviewers: sdardis Subscribers: llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D24259 llvm-svn: 280798
* [PowerPC] Fix address-offset folding for plain addiHal Finkel2016-09-071-15/+38
| | | | | | | | | | | | | | | | | | | | | | | When folding an addi into a memory access that can take an immediate offset, we were implicitly assuming that the existing offset was zero. This was incorrect. If we're dealing with an addi with a plain constant, we can add it to the existing offset (assuming that doesn't overflow the immediate, etc.), but if we have anything else (i.e. something that will become a relocation expression), we'll go back to requiring the existing immediate offset to be zero (because we don't know what the requirements on that relocation expression might be - e.g. maybe it is paired with some addis in some relevant way). On the other hand, when dealing with a plain addi with a regular constant immediate, the alignment restrictions (from the TOC base pointer, etc.) are irrelevant. I've added the test case from PR30280, which demonstrated the bug, but also demonstrates a missed optimization opportunity (i.e. we don't need the memory accesses at all). Fixes PR30280. llvm-svn: 280789
* AVX512F: FMA intrinsic + FNEG - sequence optimizationElena Demikhovsky2016-09-071-90/+102
| | | | | | | | | | | The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover AVX-512F, KNL set. FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000))). It happens because FP XOR is not supported for 512-bit data types on KNL and we use integer XOR instead. I added pattern match for integer XOR. Differential Revision: https://reviews.llvm.org/D24221 llvm-svn: 280785
* AMDGPU: Make some scalar instructions commutableMatt Arsenault2016-09-071-2/+9
| | | | llvm-svn: 280784
* Remove unnecessary call to getAllocatableRegClassMatt Arsenault2016-09-072-9/+4
| | | | | | | | This reapplies r252565 and r252674, effectively reverting r252956. This allows VS_32/VS_64 to be unallocatable like they should be. llvm-svn: 280783
* [X86] Add hasSideEffects=0 to some instructions.Craig Topper2016-09-072-3/+5
| | | | llvm-svn: 280782
* [AVX-512] Add support for commuting masked instructions in ↵Craig Topper2016-09-071-1/+23
| | | | | | findCommutedOpIndices. The default implementation doesn't skip the mask input or the preserved input. llvm-svn: 280781
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-0614-196/+555
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
OpenPOWER on IntegriCloud