summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [Hexagon] Implement bit-tracking facility with specifics for HexagonKrzysztof Parzyszek2015-07-075-0/+2832
| | | | | | | | This includes code that is intended to be target-independent as well as the Hexagon-specific details. This is just the framework without any users. llvm-svn: 241595
* use range-based for loops; NFCISanjay Patel2015-07-071-35/+17
| | | | llvm-svn: 241592
* Fix gcc warnings of different enum and non-enum types in ternariesDenis Protivensky2015-07-072-9/+9
| | | | llvm-svn: 241567
* [ARM] Define a subtarget feature and use it to decide whether long calls shouldAkira Hatanaka2015-07-075-12/+14
| | | | | | | | | | | | | | | | | be emitted. This is needed to enable ARM long calls for LTO and enable and disable it on a per-function basis. Out-of-tree projects currently using EnableARMLongCalls to emit long calls should start passing "+long-calls" to the feature string (see the changes made to clang in r241565). rdar://problem/21529937 Differential Revision: http://reviews.llvm.org/D9364 llvm-svn: 241566
* [X86][AVX] Add support for shuffle decoding of vperm2f128/vperm2i128 with ↵Simon Pilgrim2015-07-062-5/+8
| | | | | | | | | | | | zero'd lanes The vperm2f128/vperm2i128 shuffle mask decoding was not attempting to deal with shuffles that give zero lanes. This patch fixes this so that the assembly printer can provide shuffle comments. As this decoder is also used in X86ISelLowering for shuffle combining, I've added an early-out to match existing behaviour. The hope is that we can add zero support in the future, this would allow other ops' decodes (e.g. insertps) to be combined as well. Differential Revision: http://reviews.llvm.org/D10593 llvm-svn: 241516
* [x86] extend machine combiner reassociation optimization to SSE scalar addsSanjay Patel2015-07-061-13/+19
| | | | | | | | | | | | Extend the reassociation optimization of http://reviews.llvm.org/rL240361 (D10460) to SSE scalar FP SP adds in addition to AVX scalar FP SP adds. With the 'switch' in place, we can trivially add other opcodes and test cases in future patches. Differential Revision: http://reviews.llvm.org/D10975 llvm-svn: 241515
* [X86][SSE] Vectorized i64 uniform constant SRA shiftsSimon Pilgrim2015-07-062-2/+51
| | | | | | | | This patch adds vectorization support for uniform constant i64 arithmetic shift right operators. Differential Revision: http://reviews.llvm.org/D9645 llvm-svn: 241514
* WebAssembly: add some TODOJF Bastien2015-07-061-0/+11
| | | | | | | | | | Reviewers: sunfish Subscribers: llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D10971 llvm-svn: 241513
* [X86][SSE4A] Shuffle lowering using SSE4A EXTRQ/INSERTQ instructionsSimon Pilgrim2015-07-068-6/+300
| | | | | | | | | | | | This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it. As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions. From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level. Differential Revision: http://reviews.llvm.org/D10146 llvm-svn: 241508
* [X86][SSE] Use the general SMAX/SMIN/UMAX/UMIN opcodes and remove the X86 ↵Simon Pilgrim2015-07-066-138/+177
| | | | | | | | | | | | implementation With the completion of D9746 there is now a common implementation of integer signed/unsigned min/max nodes, removing the need for the equivalent X86 specific implementations. This patch removes the old X86ISD nodes, legalizes the relevant SSE2/SSE41/AVX2/AVX512 instructions for the ISD versions and converts the small amount of existing X86 code. Differential Revision: http://reviews.llvm.org/D10947 llvm-svn: 241506
* llc: Add a 'run-pass' option.Alex Lorenz2015-07-062-3/+4
| | | | | | | | | | | | | | | This commit adds a 'run-pass' option to llc, which instructs the compiler to run one specific code generation pass only. Llc already has the 'start-after' and the 'stop-after' options, and this new option complements the other two by making it easier to write tests that want to invoke a single pass only. Reviewers: Duncan P. N. Exon Smith Differential Revision: http://reviews.llvm.org/D10776 llvm-svn: 241476
* AMDGPU: Run SIInsertWaits as pre-emit passMatt Arsenault2015-07-061-1/+1
| | | | | | | | | | | Running this after the scheduler enables scheduling waits later so other ALU instructions can run while this would be waiting. When combined with enabling the post-RA scheduler, this gives about a ~20% improvement on sgemm. llvm-svn: 241473
* Change the last few internal StringRef triples into Triple objects.Daniel Sanders2015-07-0614-58/+59
| | | | | | | | | | | | | | | | | | | | Summary: This concludes the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. At this point, the StringRef-form of GNU Triples should only be used in the public API (including IR serialization) and a couple objects that directly interact with the API (most notably the Module class). The next step is to replace these Triple objects with the TargetTuple object that will represent our authoratative/unambiguous internal equivalent to GNU Triples. Reviewers: rengolin Subscribers: llvm-commits, jholewinski, ted, rengolin Differential Revision: http://reviews.llvm.org/D10962 llvm-svn: 241472
* Where Triple has a suitable predicate, use it rather than the enum values. NFC.Daniel Sanders2015-07-065-9/+7
| | | | | | | | | | Reviewers: mcrosier Subscribers: llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10960 llvm-svn: 241469
* AMDGPU/SI: Add debugging subtarget feature for DS offsetsMatt Arsenault2015-07-064-1/+18
| | | | | | | | We don't have a good way to detect most situations where DS offsets are usable on SI, so add an option to force using them even if unsafe for debugging performance problems. llvm-svn: 241462
* [Sparc] Add more instruction aliases.James Y Knight2015-07-062-12/+125
| | | | | | | | | These are mostly from the chart in the SparcV8 spec, section "A.3 Synthetic Instructions". Differential Revision: http://reviews.llvm.org/D9834 llvm-svn: 241461
* [Sparc] Add support for flush instruction.James Y Knight2015-07-062-0/+17
| | | | | | Differential Revision: http://reviews.llvm.org/D9833 llvm-svn: 241460
* Fix a bug in the A57FPLoadBalancing register tracking/scavenger.Chad Rosier2015-07-061-3/+11
| | | | | | | | | | | | The code in AArch64A57FPLoadBalancing::scavengeRegister() to handle dead defs was not correctly handling aliased registers. E.g. if the dead def was of D2, then S2 was not being marked as unavailable, so it could potentially be used across a live-range in which it would be clobbered. Patch by Geoff Berry <gberry@codeaurora.org>! Phabricator: http://reviews.llvm.org/D10900 llvm-svn: 241449
* [X86][AVX512] Multiply Packed Unsigned Integers with Round and ScaleAsaf Badouh2015-07-065-0/+9
| | | | | | | | | pmulhrsw review: http://reviews.llvm.org/D10948 llvm-svn: 241443
* IR: Do not consider available_externally linkage to be linker-weak.Peter Collingbourne2015-07-0510-49/+32
| | | | | | | | | | | | | | | From the linker's perspective, an available_externally global is equivalent to an external declaration (per isDeclarationForLinker()), so it is incorrect to consider it to be a weak definition. Also clean up some logic in the dead argument elimination pass and clarify its comments to better explain how its behavior depends on linkage, introduce GlobalValue::isStrongDefinitionForLinker() and start using it throughout the optimizers and backend. Differential Revision: http://reviews.llvm.org/D10941 llvm-svn: 241413
* [TargetLowering] StringRefize asm constraint getters.Benjamin Kramer2015-07-0524-103/+75
| | | | | | | | There is some functional change here because it changes target code from atoi(3) to StringRef::getAsInteger which has error checking. For valid constraints there should be no difference. llvm-svn: 241411
* [x86][AVX512] add Multiply High OpAsaf Badouh2015-07-053-0/+12
| | | | | | | | | include encoding and intrinsics tests. review http://reviews.llvm.org/D10896 llvm-svn: 241406
* [X86] Fix incorrect/inefficient pushw encodings for x86-64 targetsMichael Kuperstein2015-07-052-9/+4
| | | | | | | | | | | | | Correctly support assembling "pushw $imm8" on x86-64 targets. Also some cleanup of the PUSH instructions (PUSH64i16 and PUSHi16 actually represent the same instruction) This fixes PR23996 Patch by: david.l.kreitzer@intel.com Differential Revision: http://reviews.llvm.org/D10878 llvm-svn: 241404
* Add missing builtins to the PPC back end for ABI compliance (vol. 2)Nemanja Ivanovic2015-07-051-0/+6
| | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D10874 Back end portion of the second round of additions to altivec.h. llvm-svn: 241398
* [X86][SSE] Improved i8/i16 to f64 uint2fp vector conversionsSimon Pilgrim2015-07-041-0/+27
| | | | | | Followup to D10433 and D10589 that fixes i8/i16 uint2fp vector conversions by zero extending to i32 and using the sint2fp path (unless the target does actually support uint2fp). llvm-svn: 241394
* [X86] Add proper 64-bit mode checks to jrcxz and jcxz.Craig Topper2015-07-041-2/+4
| | | | llvm-svn: 241381
* AMDGPU: Fix indentation of switchMatt Arsenault2015-07-031-11/+12
| | | | llvm-svn: 241380
* Return ErrorOr from getSymbolAddress.Rafael Espindola2015-07-031-2/+4
| | | | | | | It can fail trying to get the section on ELF and COFF. This makes sure the error is handled. llvm-svn: 241366
* Replace a few more MachO only uses of getSymbolAddress.Rafael Espindola2015-07-031-3/+2
| | | | llvm-svn: 241365
* [X86][SSE] Sign extension for target vector sizes less than 128 bits (pt2)Simon Pilgrim2015-07-031-7/+9
| | | | | | | | Add support for v2i8/v2i16 to v2f64 by using a sign extension to v2i32 before conversion to v2f64. Differential Revision: http://reviews.llvm.org/D10589 llvm-svn: 241325
* [X86][SSE] Sign extension for target vector sizes less than 128 bits (pt1)Simon Pilgrim2015-07-031-6/+20
| | | | | | | | This patch adds support for sign extension for sub 128-bit vectors, such as to v2i32. It concatenates with UNDEF subvectors up to 128-bits, performs the sign extension (i.e. as v4i32) and then extracts the target subvector. Patch 1/2 of D10589 - the second patch covers the conversion of v2i8/v2i16 to v2f64. llvm-svn: 241323
* [WebAssembly] Set the HasFloatingPointExceptions flag for WebAssembly.Dan Gohman2015-07-021-1/+5
| | | | llvm-svn: 241302
* Return ErrorOr from SymbolRef::getName.Rafael Espindola2015-07-022-5/+13
| | | | | | | | | | | | This function can really fail since the string table offset can be out of bounds. Using ErrorOr makes sure the error is checked. Hopefully a lot of the boilerplate code in tools/* can go away once we have a diagnostic manager in Object. llvm-svn: 241297
* [PPC64LE] Remove implicit-subreg restriction from VSX swap removalBill Schmidt2015-07-021-26/+6
| | | | | | | | | | | | | | | | | In r241285, I removed the SUBREG_TO_REG restriction from VSX swap removal, determining that this was overly conservative. We have another form of the same restriction in that we check for the presence of implicit subregs in vector operations. As with SUBREG_TO_REG for partial register conversions, an implicit subreg is safe in and of itself, provided no other operation makes a lane-sensitive assumption about the result. This patch removes that restriction, by removing the HasImplicitSubreg flag and all code that relies on it. I've added a test case that fails to optimize before this patch is applied, and optimizes properly with the patch. Test based on a report from Anton Blanchard. llvm-svn: 241290
* [PPC64LE] Teach swap optimization about the doubleword splat idiomBill Schmidt2015-07-021-12/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With a previous patch, the VSX swap optimization is able to recognize the doubleword load-splat idiom that can be implemented using lxvdsx. However, that does not cover a doubleword splat where the source is a register. We can implement this using xxspltd (a special form of xxpermdi). This patch teaches the swap optimization pass about this idiom. As a prerequisite, it also permits swap optimization to succeed for all forms of SUBREG_TO_REG. Previously we were conservative and only allowed SUBREG_TO_REG when it copied a full register. However, on reflection any form of SUBREG_TO_REG is safe in and of itself, so long as an unsafe operation is not performed on its result. In particular, a widening SUBREG_TO_REG often occurs as an input to a doubleword splat idiom, particularly in auto-vectorized code. The doubleword splat idiom is an XXPERMDI operation where both source registers are identical, and the selection mask is either 0 (splat the first element) or 3 (splat the second element). To determine whether the registers are identical, we use the existing mechanism for looking through "copy-like" operations. That mechanism has a side effect of marking the XXPERMDI operation as using a physical register, which would invalidate its presence in a swap-optimized region. This is correct for the form of XXPERMDI that performs a swap and hence would be removed, but is not what we want for a doubleword-splat variety of XXPERMDI. Therefore we reset the physical-register flag on the XXPERMDI when it represents a splat. A simple test case is added to verify that we generate the splat and that we also remove the xxswapd instructions that would otherwise be associated with the load and store of another operand. llvm-svn: 241285
* Implement TargetTransformInfo::hasCompatibleFunctionAttributes for X86.Eric Christopher2015-07-022-0/+17
| | | | | | | | | | | | | | | | | | | This checks subtarget feature compatibility for inlining by verifying that the callee is a strict subset of the caller's features. This includes the cpu as part of the subtarget we can get via the incoming functions as the backend takes CPUs as feature sets. This allows us to inline things like: int foo() { return baz(); } int __attribute__((target("sse4.2"))) bar() { return foo(); } so that generic code can be inlined into specialized functions. llvm-svn: 241221
* WebAssembly: start instructionsJF Bastien2015-07-018-13/+31
| | | | | | | | | | | | | | | | | | | | Summary: * Add 64-bit address space feature. * Rename SIMD feature to SIMD128. * Handle single-thread model with an IR pass (same way ARM does). * Rename generic processor to MVP, to follow design's lead. * Add bleeding-edge processors, with all features included. * Fix a few DEBUG_TYPE to match other backends. Test Plan: ninja check Reviewers: sunfish Subscribers: jfb, llvm-commits Differential Revision: http://reviews.llvm.org/D10880 llvm-svn: 241211
* [WebAssembly] Define separate Target instances for 32-bit and 64-bit.Dan Gohman2015-07-014-6/+9
| | | | llvm-svn: 241193
* [NVPTX] expand extload/truncstore for vectors of floatsJingyue Wu2015-07-011-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: According to PTX ISA: For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type. So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of. As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like ld.v2.f32 {%fd3, %fd4}, [%rd2] This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated. Patched by Gang Hu. Test Plan: Test case attached. Reviewers: jingyue Reviewed By: jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D10876 llvm-svn: 241191
* [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPassJingyue Wu2015-07-012-11/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Offset of frame index is calculated by NVPTXPrologEpilogPass. Before that the correct offset of stack objects cannot be obtained, which leads to wrong offset if there are more than 2 frame objects. This patch move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check VRFrame register instead, and try to remove the VRFrame if there is no usage after NVPTXPeephole pass. Patched by Xuetian Weng. Test Plan: Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the offset calculation based on SP and SPL. Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10853 llvm-svn: 241185
* [PPC64LE] Enable missing lxvdsx optimization, and related swap optimizationBill Schmidt2015-07-012-13/+11
| | | | | | | | | | | | | | | | | | | | | | | When adding little-endian vector support for PowerPC last year, I inadvertently disabled an optimization that recognizes a load-splat idiom and generates the lxvdsx instruction. This patch moves the offending logic so lxvdsx is once again generated. This pattern is frequently generated by the vectorizer for scalar loads of an effective constant. Previously the lxvdsx instruction was wrongly listed as lane-sensitive for the VSX swap optimization (since both doublewords are identical, swaps are safe). This patch fixes this as well, so that vectorized code using lxvdsx can now have swaps removed from the computation. There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll that checks for the missing optimization. However, vsx.ll was only being tested for POWER7 with big-endian code generation. I've added a little-endian RUN statement and expected LE code generation for all the tests in vsx.ll to give us a bit better VSX coverage, including what's needed for this patch. llvm-svn: 241183
* fix formatting; NFCSanjay Patel2015-07-011-2/+2
| | | | llvm-svn: 241175
* fix typos in comment; NFCSanjay Patel2015-07-011-3/+2
| | | | llvm-svn: 241174
* [SEH] Don't assert if the parent function lacks a personalityReid Kleckner2015-07-011-0/+6
| | | | | | | | The EH code might have been deleted as unreachable and the personality pruned while the filter is still present. Currently I'm hitting this at -O0 due to the clang bug PR24009. llvm-svn: 241170
* [AArch64] Implement add/adds/sub/subs/cmp/cmn with negative immediate aliasesArnaud A. de Grandmaison2015-07-013-10/+78
| | | | | | | | | | | | | | | | | | | | | | | This patch teaches the AsmParser to accept add/adds/sub/subs/cmp/cmn with a negative immediate operand and convert them as shown: add Rd, Rn, -imm -> sub Rd, Rn, imm sub Rd, Rn, -imm -> add Rd, Rn, imm adds Rd, Rn, -imm -> subs Rd, Rn, imm subs Rd, Rn, -imm -> adds Rd, Rn, imm cmp Rn, -imm -> cmn Rn, imm cmn Rn, -imm -> cmp Rn, imm Those instructions are an alternate syntax available to assembly coders, and are needed in order to support code already compiling with some other assemblers (gas). They are documented in the "ARMv8 Instruction Set Overview", in the "Arithmetic (immediate)" section. This makes llvm-mc a programmer-friendly assembler ! This also fixes PR20978: "Assembly handling of adding negative numbers not as smart as gas". llvm-svn: 241166
* [Sparc] Rearrange SparcInstrInfo, no change.James Y Knight2015-07-011-68/+80
| | | | | | | | | Move some instructions into order of sections in the spec, as the rest already were. Differential Revision: http://reviews.llvm.org/D9102 llvm-svn: 241163
* AVX-512: Implemented missing encoding for FMA scalar instructionsIgor Breger2015-07-011-34/+95
| | | | | | | | Added tests for encoding Differential Revision: http://reviews.llvm.org/D10865 llvm-svn: 241159
* [X86] Avoid over-relaxation of 8-bit immediates in integer arithmetic ↵Michael Kuperstein2015-07-011-24/+6
| | | | | | | | | | | | | | | | | | instructions. Only consider an instruction a candidate for relaxation if the last operand of the instruction is an expression. We previously checked whether any operand is an expression, which is useless, since for all instructions concerned, the only operand that may be affected by relaxation is the last one. In addition, this removes the check for having RIP as an argument, since it was plain wrong - even when one of the arguments is RIP, relaxation may still be needed. This fixes PR9807. Patch by: david.l.kreitzer@intel.com Differential Revision: http://reviews.llvm.org/D10766 llvm-svn: 241152
* [mips][microMIPS] Implement SLL and NOP instructionsZoran Jovanovic2015-07-013-0/+21
| | | | | | http://reviews.llvm.org/D10474 llvm-svn: 241150
* [SEH] Add new intrinsics for recovering and restoring parent framesReid Kleckner2015-06-302-33/+88
| | | | | | | | | | | | | | | | | | | | | | | | | The incoming EBP value established by the runtime is actually a pointer to the end of the EH registration object, and not the true parent function frame pointer. Clang doesn't need llvm.x86.seh.exceptioninfo anymore because we know that the exception info pointer is at a fixed offset from this incoming EBP. The llvm.x86.seh.recoverfp intrinsic takes an EBP value provided by the EH runtime and returns a pointer that is usable with llvm.framerecover. The llvm.x86.seh.restoreframe intrinsic is inserted by the 32-bit specific preparation pass in blocks targetted by the EH runtime. It re-establishes any physical registers used by the parent function to address the stack, such as the frame, base, and stack pointers. Neither of these intrinsics correctly handle stack realignment prologues yet, but it's possible to add that later. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D10848 llvm-svn: 241125
OpenPOWER on IntegriCloud