path: root/llvm/lib/Target
Entries: commit message; author, date, files changed, lines (-removed/+added)
* [AVX512] Extended avx512_sqrt_packed (sqrt instructions) to VL subset.
  Robert Khasanov, 2014-10-28, 1 file, -28/+44
  Refactored through AVX512_maskable.
  llvm-svn: 220806
* [AVX-512] Expanded rsqrt/rcp instructions to VL subset.
  Robert Khasanov, 2014-10-28, 1 file, -20/+47
  Refactored the multiclass through AVX512_maskable.
  llvm-svn: 220783
* [AVX512] Removed special case for cmp instructions in getVectorMaskingNode.
  Robert Khasanov, 2014-10-28, 1 file, -15/+4
  Now cmp intrinsics are lowered like other intrinsics, through VSELECT, and the VSELECT is then transformed to an AND in PerformSELECTCombine. No functional change.
  llvm-svn: 220779
* [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.
  Robert Khasanov, 2014-10-28, 1 file, -10/+10
  This transformation previously applied only if the selector was produced by a SETCC; however, the SETCC is needed only when swapping the operands is being considered. So I replaced the SETCC check for this case. Added tests for vselect of <X x i1> values.
  llvm-svn: 220777
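As a rough scalar model of the fold above: when every condition lane is either all ones or all zeros and has the same width as the selected values, select(c, all-ones, x) equals c | x and select(c, x, all-zeros) equals c & x. The sketch below checks this on 32-bit lanes; the type and function names are hypothetical, and this is not the SelectionDAG code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane vector of 32-bit integer lanes.
using Vec4 = std::array<uint32_t, 4>;

// Reference lane-wise select: lane i comes from t if c[i] is non-zero, else f.
Vec4 vselect(const Vec4& c, const Vec4& t, const Vec4& f) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = c[i] ? t[i] : f[i];
  return r;
}

// select(c, all-ones, x) == c | x, valid when every lane of c is 0 or ~0u.
Vec4 selectAllOnesTrue(const Vec4& c, const Vec4& x) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = c[i] | x[i];
  return r;
}

// select(c, x, all-zeros) == c & x, under the same precondition.
Vec4 selectZeroFalse(const Vec4& c, const Vec4& x) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = c[i] & x[i];
  return r;
}
```

The precondition matters: with a condition lane such as 0x1, the OR/AND forms would no longer match the select.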
* [AVX512] Bring back vector-shuffle lowering support through broadcasts
  Robert Khasanov, 2014-10-28, 2 files, -8/+17
  After the commit at rev219046, 512-bit broadcast lowering became non-optimal. Most of the tests on broadcasting and embedded broadcasting were changed, and they don't produce efficient code. The example below is from the commit changes (it's the first test from test/CodeGen/X86/avx512-vbroadcast.ll):

      define <16 x i32> @_inreg16xi32(i32 %a) {
      ; CHECK-LABEL: _inreg16xi32:
      ; CHECK: ## BB#0:
      -; CHECK-NEXT: vpbroadcastd %edi, %zmm0
      +; CHECK-NEXT: vmovd %edi, %xmm0
      +; CHECK-NEXT: vpbroadcastd %xmm0, %ymm0
      +; CHECK-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
      ; CHECK-NEXT: retq
        %b = insertelement <16 x i32> undef, i32 %a, i32 0
        %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer
        ret <16 x i32> %c
      }

  Here, a 256-bit broadcast was generated instead of a 512-bit one. In this patch:
  1) I added vector-shuffle lowering through broadcasts.
  2) Removed asserts and branches like the following, because they are incorrect:
         assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI");
  3) Fixed the lowering tests.
  llvm-svn: 220774
* X86: Implement the vectorcall calling convention
  Reid Kleckner, 2014-10-28, 2 files, -0/+77
  This is a Microsoft calling convention that supports both x86 and x86_64 subtargets. It passes vector and floating-point arguments in XMM0-XMM5, and passes them indirectly once those registers are consumed. Homogeneous vector aggregates of up to four elements can be passed in sequential vector registers, but this part is not implemented in LLVM and will be handled in Clang.
  On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as integer register parameters and is callee-cleanup. On x86_64, it delegates to the normal win64 calling convention.
  Reviewers: majnemer
  Differential Revision: http://reviews.llvm.org/D5943
  llvm-svn: 220745
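A simplified model of the 32-bit rules exactly as this commit message states them: the first two integer arguments go in ECX and EDX, floating-point/vector arguments take XMM0-XMM5 in order, and further ones are passed indirectly. It deliberately ignores argument sizes, homogeneous vector aggregates, and stack layout; the enum and function are hypothetical, not LLVM's CallingConv tables.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical argument classification for this sketch.
enum class ArgKind { Integer, FloatOrVector };

// Returns where each argument would be passed under the simplified rules.
std::vector<std::string> assignVectorcall32(const std::vector<ArgKind>& args) {
  static const char* const kIntRegs[] = {"ecx", "edx"};
  static const char* const kXmmRegs[] = {"xmm0", "xmm1", "xmm2",
                                         "xmm3", "xmm4", "xmm5"};
  std::size_t nextInt = 0, nextXmm = 0;
  std::vector<std::string> locs;
  for (ArgKind k : args) {
    if (k == ArgKind::Integer) {
      // Like fastcall: the first two integer arguments use ECX and EDX.
      locs.push_back(nextInt < 2 ? kIntRegs[nextInt++] : "stack");
    } else {
      // FP/vector arguments use XMM0-XMM5 in order; once those are
      // consumed, the commit message says they are passed indirectly.
      locs.push_back(nextXmm < 6 ? kXmmRegs[nextXmm++] : "indirect");
    }
  }
  return locs;
}
```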
* AArch64: enable Cortex-A57 FP balancing on Cortex-A53.
  Tim Northover, 2014-10-28, 1 file, -1/+2
  Benchmarks have shown that it's harmless to the performance there, and having a unified set of passes between the two cores where possible helps big.LITTLE deployment.
  Patch by Z. Zheng.
  llvm-svn: 220744
* AArch64InstrInfo.h: Fix a warning introduced in clang r220703. [-Winconsistent-missing-override]
  NAKAMURA Takumi, 2014-10-27, 1 file, -1/+1
  llvm-svn: 220739
* [AVX512] Add vpermil variable version
  Adam Nemet, 2014-10-27, 1 file, -2/+25
  This is implemented via a multiclass that derives from the vperm imm multiclass.
  Fixes <rdar://problem/18426089>
  llvm-svn: 220737
* [AVX512] Clean up avx512_perm_imm to use X86VectorVTInfo
  Adam Nemet, 2014-10-27, 1 file, -25/+22
  No functionality change. No change in X86.td.expanded except that we only set the CD8 attributes for the memory variants. (This shouldn't be used unless we have a memory operand.)
  llvm-svn: 220736
* [AVX512] Derive vpermil* from avx512_perm_imm
  Adam Nemet, 2014-10-27, 1 file, -14/+14
  This used to derive from avx512_pshuf_imm, which is confusing. NFC. Compared X86.td.expanded.
  llvm-svn: 220735
* [AVX512] Fix copy-and-paste bugs in vpermil
  Adam Nemet, 2014-10-27, 1 file, -3/+3
  1) i512mem -> f512mem (this is the packed FP input being permuted)
  2) The element size is 64 bits in EVEX_CD8 for PD.
  (A good illustration of why X86VectorVTInfo is useful.)
  llvm-svn: 220734
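The EVEX_CD8 element size matters because AVX-512 encodes 8-bit memory displacements scaled by a factor N (the "disp8*N" compressed displacement), and N is derived from the tuple type and element size: for example 64 for a full 512-bit memory operand, or the element size (4 or 8 bytes) for an embedded broadcast. A minimal sketch of the encodability check, assuming N is already known; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns the 8-bit encoded displacement if `disp` can use the compressed
// disp8*N form with scale factor `n`, or std::nullopt if a full 32-bit
// displacement is required instead.
std::optional<int8_t> compressDisp8(int32_t disp, int32_t n) {
  if (n <= 0 || disp % n != 0) return std::nullopt;  // must be a multiple of N
  int32_t scaled = disp / n;
  if (scaled < -128 || scaled > 127) return std::nullopt;  // must fit in int8
  return static_cast<int8_t>(scaled);
}
```

With a wrong element size, the same displacement can be scaled by the wrong N, which is one way a CD8 mistake like the PD one above can show up in the encoding.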
* Fix a stackmap bug introduced in r220710.
  Pete Cooper, 2014-10-27, 1 file, -4/+14
  For a call not to return into the stackmap shadow, the shadow must end with the call. To do this, we must insert any required nops *before* the call, and not after it.
  llvm-svn: 220728
* [FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check.
  Juergen Ributzka, 2014-10-27, 1 file, -2/+6
  This is a minor change to use the immediate version when the operand is a null value. This should get rid of an unnecessary 'mov' instruction in debug builds and bring the code closer to the code generated by SelectionDAG.
  This fixes rdar://problem/18785125.
  llvm-svn: 220713
* [FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'.
  Juergen Ributzka, 2014-10-27, 1 file, -0/+4
  Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and' instruction.
  This fixes rdar://problem/18784953.
  llvm-svn: 220712
* Stackmap shadows should consider call returns a branch target.
  Pete Cooper, 2014-10-27, 1 file, -0/+6
  To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets. A return from a call should count as a branch target, because patching over the instructions after the call would lead to incorrect behaviour for threads that are currently inside that call when they return.
  llvm-svn: 220710
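The two stackmap commits (r220710 and its fix r220728) combine into one rule: ordinary instructions after the stackmap may fill the shadow, but a call must end it, so any padding goes before the call. The following is a simplified model of that rule under the assumption that only instruction sizes and "is a call" matter; the structs and function are hypothetical and are not the AsmPrinter code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of an instruction emitted after a stackmap.
struct Inst {
  unsigned sizeInBytes;
  bool isCall;
};

// Insert `nopBytes` of NOPs before instruction `insertIndex`
// (an index equal to the list size means "at the end").
struct NopPlacement {
  std::size_t insertIndex;
  unsigned nopBytes;
};

// The first `shadowBytes` bytes after the stackmap must not contain a branch
// target, and a call's return address counts as one.
NopPlacement placeShadowNops(unsigned shadowBytes,
                             const std::vector<Inst>& after) {
  unsigned remaining = shadowBytes;
  std::size_t i = 0;
  for (; i < after.size() && remaining > 0; ++i) {
    if (after[i].isCall) {
      // Pad *before* the call so the shadow ends with the call and the
      // return address lies just outside it.
      unsigned pad = remaining > after[i].sizeInBytes
                         ? remaining - after[i].sizeInBytes
                         : 0;
      return {i, pad};
    }
    // A non-call instruction simply consumes part of the shadow.
    remaining -= remaining < after[i].sizeInBytes ? remaining
                                                  : after[i].sizeInBytes;
  }
  return {i, remaining};  // whatever is still uncovered is padded here
}
```

For a 16-byte shadow followed by a 4-byte instruction and a 5-byte call, this model places 7 bytes of NOPs between the two, so the shadow ends exactly at the end of the call.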
* [FastISel][AArch64] Use 'cbz' also for null values (pointers).
  Juergen Ributzka, 2014-10-27, 1 file, -15/+12
  The pattern matching for a 'ConstantInt' value was too restrictive. Checking for a 'Constant' with a null value is sufficient for using a 'cbz/cbnz' instruction.
  This fixes rdar://problem/18784732.
  llvm-svn: 220709
* [FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block.
  Juergen Ributzka, 2014-10-27, 1 file, -2/+2
  This fixes a bug where the input register was not defined for the 'tbz/tbnz' instruction. This happened because we folded the 'and' instruction from a different basic block.
  This fixes rdar://problem/18784013.
  llvm-svn: 220704
* [FastISel][AArch64] Fix load/store with frame indices.
  Juergen Ributzka, 2014-10-27, 1 file, -23/+20
  At higher optimization levels the LLVM IR may contain more complex patterns for loads/stores from/to frame indices. The 'computeAddress' function wasn't able to handle this and triggered an assertion. This fix extends the possible addressing modes for frame indices.
  This fixes rdar://problem/18783298.
  llvm-svn: 220700
* [PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these sets as keys into a cache of interference matrix values in the Interference constraint adder.
  Lang Hames, 2014-10-27, 1 file, -8/+8
  Creating interference matrices was one of the large remaining time-sinks in PBQP. Caching them reduces the total compile time (when using PBQP) on the nightly test suite by ~10%.
  llvm-svn: 220688
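The caching idea can be sketched as follows: unique each allowed set so that equal sets share one address, then key the matrix cache on the pair of addresses, so each distinct pair builds its matrix once. This is a minimal sketch under simplifying assumptions (only identical registers conflict, no spill option, no register aliasing); all names are hypothetical, not the PBQP classes in LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// An allowed set is a sorted list of physical register numbers; matrix
// entry (i, j) is the cost of giving option i to one node and option j to
// an interfering neighbour.
using AllowedSet = std::vector<unsigned>;
using Matrix = std::vector<std::vector<double>>;

class InterferenceCache {
  std::set<AllowedSet> uniqued_;  // one canonical copy per distinct set
  std::map<std::pair<const AllowedSet*, const AllowedSet*>,
           std::shared_ptr<const Matrix>> cache_;

 public:
  // Canonical copy of `s`: equal sets map to the same (stable) address.
  const AllowedSet* unique(const AllowedSet& s) {
    return &*uniqued_.insert(s).first;
  }

  // Build the matrix for a pair of uniqued sets, or reuse a cached one.
  std::shared_ptr<const Matrix> get(const AllowedSet* a, const AllowedSet* b) {
    auto key = std::make_pair(a, b);
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // cache hit: no rebuild
    const double inf = std::numeric_limits<double>::infinity();
    auto m = std::make_shared<Matrix>(a->size(),
                                      std::vector<double>(b->size(), 0.0));
    for (std::size_t i = 0; i < a->size(); ++i)
      for (std::size_t j = 0; j < b->size(); ++j)
        if ((*a)[i] == (*b)[j]) (*m)[i][j] = inf;  // same register: forbidden
    cache_.emplace(key, m);
    return m;
  }
};
```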
* Prune CRLF.
  NAKAMURA Takumi, 2014-10-27, 2 files, -138/+138
  llvm-svn: 220678
* [ARM] Select VMAXNM and VMINNM regardless of operand order
  Oliver Stannard, 2014-10-27, 1 file, -6/+12
  Currently, the ARM backend will select VMAXNM and VMINNM for these C expressions:
      (a < b) ? a : b
      (a > b) ? a : b
  but not these expressions:
      (a > b) ? b : a
      (a < b) ? b : a
  This patch allows all of these expressions to be matched.
  llvm-svn: 220671
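For ordinary inputs (ignoring NaNs and the sign of zero, which affect which instruction is legal), each commuted form computes the same minimum or maximum as its counterpart, which is why both orders can map to the same instructions. A quick C++ check with hypothetical function names:

```cpp
#include <cassert>

// The four select forms from the commit message, grouped by what they
// compute for non-NaN inputs.
float selLtAB(float a, float b) { return (a < b) ? a : b; }  // min(a, b)
float selGtBA(float a, float b) { return (a > b) ? b : a; }  // min(a, b)
float selGtAB(float a, float b) { return (a > b) ? a : b; }  // max(a, b)
float selLtBA(float a, float b) { return (a < b) ? b : a; }  // max(a, b)
```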
* [asan-asm-instrumentation] Added comment describing how asm instrumentation works.
  Yuri Gorshenin, 2014-10-27, 1 file, -0/+64
  Summary: [asan-asm-instrumentation] Added comment describing how asm instrumentation works.
  Reviewers: eugenis
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5970
  llvm-svn: 220670
* AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instruction
  Elena Demikhovsky, 2014-10-26, 1 file, -8/+17
  llvm-svn: 220638
* [X86][SSE] Vector integer/float conversion memory folding
  Simon Pilgrim, 2014-10-25, 1 file, -7/+7
  Tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1).
  Minor patch agreed with qcolombet.
  llvm-svn: 220613
* [NVPTX] aligned byte-buffers for vector return types
  Jingyue Wu, 2014-10-25, 1 file, -1/+6
  Summary: Fixes PR21100, which is caused by an inconsistency between the declared return type and the expected return type at the call site. The new behavior is consistent with nvcc and the NVPTXTargetLowering::getPrototype function.
  Test Plan: test/Codegen/NVPTX/vector-return.ll
  Reviewers: jholewinski
  Reviewed By: jholewinski
  Subscribers: llvm-commits, meheff, eliben, jholewinski
  Differential Revision: http://reviews.llvm.org/D5612
  llvm-svn: 220607
* Fix a Mach-O assembler segfault for a subtraction expression with an undefined symbol.
  Kevin Enderby, 2014-10-24, 1 file, -4/+7
  In a Mach-O object file, a relocatable expression of the form SymbolA - SymbolB + constant is allowed when both symbols are defined in a section, but when either symbol is undefined it is an error. The code was crashing when it had an undefined symbol in this case, when it should have printed an error message using the location information in the relocation entry.
  rdar://18678402
  llvm-svn: 220599
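The intended behaviour, report an error instead of crashing, amounts to a validity check before encoding the relocation. A minimal sketch with a hypothetical symbol model and message text (the real assembler reports through its own diagnostics with source locations):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical symbol model for this sketch.
struct Symbol {
  std::string name;
  bool definedInSection;
};

// Returns an error message if `A - B + constant` cannot be encoded as a
// Mach-O relocatable expression, or std::nullopt if it is acceptable.
std::optional<std::string> checkSubtraction(const Symbol& a, const Symbol& b) {
  if (!a.definedInSection)
    return "undefined symbol '" + a.name + "' in subtraction expression";
  if (!b.definedInSection)
    return "undefined symbol '" + b.name + "' in subtraction expression";
  return std::nullopt;  // both defined in a section: allowed
}
```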
* [X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoad
  Simon Pilgrim, 2014-10-24, 1 file, -9/+12
  Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element. An undef was created for the second shuffle operand using the original (post-bitcast) vector type instead of the pre-bitcast type like the rest of the shuffle node. This then caused an assertion on the mismatched types later on inside SelectionDAG::getVectorShuffle.
  Differential Revision: http://reviews.llvm.org/D5917
  llvm-svn: 220592
* [Hexagon] Resubmission of 220427
  Colin LeMahieu, 2014-10-24, 19 files, -233/+241
  Modified library structure to deal with circular dependency between HexagonInstPrinter and HexagonMCInst. Adding encoding bits for add opcode. Adding llvm-mc tests. Removing unit tests.
  http://reviews.llvm.org/D5624
  llvm-svn: 220584
* Allow AVX vrsqrtps generation.
  Sanjay Patel, 2014-10-24, 1 file, -2/+3
  This is a follow-on to r220570 that allows a 256-bit (v8f32) version of vrsqrtps to be generated.
  llvm-svn: 220579
* Use rsqrt (X86) to speed up reciprocal square root calcs
  Sanjay Patel, 2014-10-24, 7 files, -3/+51
  This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2, where performance improves significantly: for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float).
  This patch adds a two-constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().
  See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900
  Differential Revision: http://reviews.llvm.org/D5658
  llvm-svn: 220570
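One standard way to write a Newton-Raphson step for an estimate x0 of 1/sqrt(a) using two constants (-0.5 and -3.0) is x1 = (-0.5 * x0) * (a * x0 * x0 - 3.0), which is algebraically the textbook x1 = x0 * (1.5 - 0.5 * a * x0 * x0). This sketch assumes that formulation; it is not copied from DAGCombiner, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson refinement step for x0 ~= 1/sqrt(a), written with the
// two constants -0.5 and -3.0.
float refineRsqrt(float a, float x0) {
  return (-0.5f * x0) * (a * x0 * x0 + -3.0f);
}
```

Starting from a deliberately rough estimate of 0.51 for 1/sqrt(4), one step gives roughly 0.4997; each step approximately doubles the number of correct bits, which is why a hardware estimate plus one or two steps can replace a divide and square root under fast-math.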
* [mips] Replace MipsABIEnum with a MipsABIInfo class.
  Daniel Sanders, 2014-10-24, 7 files, -32/+70
  Summary: No functional change yet; it's just an object replacement for an enum. It will allow us to gather ABI information in a single place so that we can start testing for properties of the ABIs instead of the ABI itself.
  For example, we will eventually be able to use:
      ABI.MinStackAlignmentInBytes()
  instead of:
      (isABI_N32() || isABI_N64()) ? 16 : 8
  which is clearer and more maintainable.
  Reviewers: matheusalmeida
  Reviewed By: matheusalmeida
  Differential Revision: http://reviews.llvm.org/D3341
  llvm-svn: 220568
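The "properties instead of ABI identity" idea can be illustrated with a small class that wraps the enum and answers questions about it. This is a hypothetical sketch named MipsABIInfoSketch; the real MipsABIInfo class may be structured differently. The one property implemented is taken directly from the commit message.

```cpp
#include <cassert>

// Hypothetical ABI-info object replacing a bare enum.
class MipsABIInfoSketch {
 public:
  enum class ABI { O32, N32, N64 };
  explicit MipsABIInfoSketch(ABI abi) : abi_(abi) {}

  bool IsO32() const { return abi_ == ABI::O32; }
  bool IsN32() const { return abi_ == ABI::N32; }
  bool IsN64() const { return abi_ == ABI::N64; }

  // The property is encoded once here, instead of repeating
  // (isABI_N32() || isABI_N64()) ? 16 : 8 at every use site.
  unsigned MinStackAlignmentInBytes() const {
    return (IsN32() || IsN64()) ? 16 : 8;
  }

 private:
  ABI abi_;
};
```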
* [mips] Fix >80-column line
  Daniel Sanders, 2014-10-24, 1 file, -1/+2
  llvm-svn: 220564
* [mips] Remove redundant code in RetCC_MipsN. NFC.
  Daniel Sanders, 2014-10-24, 1 file, -3/+0
  Summary: i32 is always promoted to i64, so it no longer makes sense to assign i32 to registers.
  Reviewers: vmedic
  Reviewed By: vmedic
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5964
  llvm-svn: 220561
* [mips] For N32/N64, structs must be passed in the upper bits of a register.
  Daniel Sanders, 2014-10-24, 1 file, -2/+2
  Summary: Most structs were fixed by r218451, but those larger than 32 bits and smaller than 64 bits remained broken since they were not marked with [ASZ]ExtUpper. This patch fixes the remaining cases by using CCPromoteToUpperBitsInType<i64> on i64s in addition to i32 and smaller.
  Reviewers: vmedic
  Reviewed By: vmedic
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5963
  llvm-svn: 220556
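"Passed in the upper bits" means the struct's bits occupy the most significant end of the 64-bit register, with the unused low bits below them. A minimal sketch of that placement, assuming the struct's bits are given right-aligned in an integer; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Place a struct of `sizeInBits` bits (1..64), given in the low bits of
// `value`, into the most significant bits of a 64-bit register.
uint64_t placeInUpperBits(uint64_t value, unsigned sizeInBits) {
  assert(sizeInBits >= 1 && sizeInBits <= 64);
  if (sizeInBits == 64) return value;  // already fills the register
  uint64_t mask = (uint64_t{1} << sizeInBits) - 1;
  return (value & mask) << (64 - sizeInBits);
}
```

A 48-bit struct, one of the previously broken cases between 32 and 64 bits, would end up shifted left by 16.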
* [AArch64] Fix fast-isel of cbz of i1, i8, i16
  Oliver Stannard, 2014-10-24, 1 file, -0/+6
  This fixes a miscompilation in the AArch64 fast-isel which was triggered when a branch is based on an icmp with condition eq or ne, and type i1, i8 or i16. The cbz instruction compares the whole 32-bit register, so values whose bottom 1, 8 or 16 bits are clear, but whose remaining upper bits are not, would cause the wrong branch to be taken.
  llvm-svn: 220553
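To see why, model the register in C++: an i8 value of 0 sitting in a register whose upper bits are non-zero looks non-zero to a whole-register test. Masking (or zero-extending) before the test is shown here as an illustration; the commit message does not say exactly how the patch handles it, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit register whose low 8 bits hold an i8 value; the upper 24 bits
// are not guaranteed to be zero.
bool buggyIsZeroI8(uint32_t reg) { return reg == 0; }            // whole-register test, like cbz
bool fixedIsZeroI8(uint32_t reg) { return (reg & 0xFFu) == 0; }  // only the i8 bits
```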
* [AVX512] FMA support for the 231 variants
  Adam Nemet, 2014-10-24, 1 file, -13/+17
  This is asm/disasm-only support, similar to AVX.
  For ISeling the register variant, they are no different from the 213 variant other than whether the multiplication or the addition operand is overwritten.
  For ISeling the memory variant, i.e. to fold a load, they are no different from the 132 variant. The addition operand (op3) in both cases can come from memory. Again, the only difference is which operand is overwritten.
  There could be a post-RA pass that would convert a 213 or 132 into a 231.
  Part of <rdar://problem/17082571>
  llvm-svn: 220540
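In Intel's naming, operand 1 is the destination and is overwritten, and the digit suffix says which operands are multiplied and which is added: 132 computes op1 = op1 * op3 + op2, 213 computes op1 = op2 * op1 + op3, and 231 computes op1 = op2 * op3 + op1. A scalar C++ model, with hypothetical function names and std::fma to keep the single rounding of a fused multiply-add:

```cpp
#include <cassert>
#include <cmath>

// Scalar model of the three FMA operand orders; op1 is overwritten.
void fmadd132(double& op1, double op2, double op3) { op1 = std::fma(op1, op3, op2); }
void fmadd213(double& op1, double op2, double op3) { op1 = std::fma(op2, op1, op3); }
void fmadd231(double& op1, double op2, double op3) { op1 = std::fma(op2, op3, op1); }
```

The three forms differ only in which input the destination register supplies, which matches the commit's point that the variants mostly differ in which operand gets overwritten.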
* [AVX512] Introduce fma3p_forms from AVX
  Adam Nemet, 2014-10-24, 1 file, -38/+36
  This multiclass generates the different forms (213, 231, 132) in AVX. In AVX512, 132 is a separate class, but I am planning to use this same multiclass to generate 231, relying on the nice null_frag trick from AVX to disable the codegen pattern for 231.
  No functionality change, no change in X86.td.expanded except for the different instruction definition names.
  llvm-svn: 220539
* [X86] Improve mul w/ overflow codegen, to MUL8+SETO.
  Ahmed Bougacha, 2014-10-23, 3 files, -4/+28
  Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where 3 are really needed.

  This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to MUL8/IMUL8 + SETO.

  i8 is a special case because there are no two/three-operand variants of (I)MUL8, so the first operand and return value need to go in AL/AX.

  Also, we can't write patterns for these instructions: TableGen refuses patterns where output operands don't match SDNode results. In this case, those are instructions where the output operand is an implicitly defined register.

  A related special case (and FIXME) exists for MUL8 (X86InstrArith.td):

      // FIXME: Used for 8-bit mul, ignore result upper 8 bits.
      // This probably ought to be moved to a def : Pat<> if the
      // syntax can be accepted.
      [(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)]

  Ideally, these go away with UMUL8, but we still need to improve TableGen support of implicit operands in patterns.

  Before this change:
      movsbl %sil, %eax
      movsbl %dil, %ecx
      imull %eax, %ecx
      movb %cl, %al
      sarb $7, %al
      movzbl %al, %eax
      movzbl %ch, %esi
      cmpl %eax, %esi
      setne %al

  After:
      movb %dil, %al
      imulb %sil
      seto %al

  Also, remove a made-redundant testcase for PR19858, and enable more FastISel ALU-overflow tests for SelectionDAG too.

  Differential Revision: http://reviews.llvm.org/D5809
  llvm-svn: 220516
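What the intrinsic computes can be modelled in C++: the truncated 8-bit product plus a flag that is set when the exact product does not fit in a signed byte. On x86, the one-operand IMUL8 form computes AX = AL * src and sets OF in exactly that case, which SETO then reads. A sketch with a hypothetical function name:

```cpp
#include <cassert>
#include <cstdint>

// Model of @llvm.smul.with.overflow.i8: returns the overflow flag and
// stores the low 8 bits of the product in `result`.
bool smulOverflowI8(int8_t a, int8_t b, int8_t& result) {
  int product = a * b;  // exact: operands are promoted to int
  result = static_cast<int8_t>(static_cast<uint8_t>(product));  // low 8 bits
  return product < INT8_MIN || product > INT8_MAX;
}
```

For example, 16 * 8 = 128 overflows and truncates to -128, while -16 * 8 = -128 fits exactly and does not overflow.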
* Do not emit intermediate register for zero FP immediate
  Renato Golin, 2014-10-23, 1 file, -0/+12
  This updates the check for a double-precision zero floating-point constant to allow the use of an instruction with an immediate value rather than a temporary register.
  Currently "a == 0.0", where "a" is of type "double", generates:
      vmov.i32 d16, #0x0
      vcmpe.f64 d0, d16
  With this change it becomes:
      vcmpe.f64 d0, #0
  Patch by Sergey Dmitrouk.
  llvm-svn: 220486
* Hexagon/Disassembler/LLVMBuild.txt: Update libdeps.
  NAKAMURA Takumi, 2014-10-23, 1 file, -1/+1
  llvm-svn: 220482
* Hexagon/LLVMBuild.txt: Prune CRLF.
  NAKAMURA Takumi, 2014-10-23, 2 files, -30/+30
  llvm-svn: 220481
* [CMake] Prune CRLF in CMakeLists.txt(s).
  NAKAMURA Takumi, 2014-10-23, 2 files, -14/+13
  llvm-svn: 220480
* Revert r220427, "[Hexagon] Adding encoding bits for add opcode."
  NAKAMURA Takumi, 2014-10-23, 10 files, -225/+177
  It introduced a cyclic dependency between HexagonAsmPrinter and HexagonDesc.
  llvm-svn: 220478
* [mips][microMIPS] Implement ADDIUR1SP instruction
  Zoran Jovanovic, 2014-10-23, 5 files, -0/+49
  Differential Revision: http://reviews.llvm.org/D5153
  llvm-svn: 220477
* [mips][microMIPS] Implement ADDIUR2 instruction
  Zoran Jovanovic, 2014-10-23, 5 files, -0/+53
  Differential Revision: http://reviews.llvm.org/D5151
  llvm-svn: 220476
* [mips][microMIPS] Implement LI16 instruction
  Zoran Jovanovic, 2014-10-23, 3 files, -0/+31
  Differential Revision: http://reviews.llvm.org/D5149
  llvm-svn: 220475
* [mips][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions
  Zoran Jovanovic, 2014-10-23, 5 files, -0/+54
  Differential Revision: http://reviews.llvm.org/D5774
  llvm-svn: 220474
* [Thumb2] Improve disassembly of memory hints
  Oliver Stannard, 2014-10-23, 1 file, -7/+57
  Currently, the ARM disassembler will disassemble the Thumb2 memory hint instructions (PLD, PLDW and PLI), even for targets which do not have these instructions. This patch adds the required checks to the disassembler.
  llvm-svn: 220472
* [ARM, stack protector] If supported, use armv7 instructions.
  Akira Hatanaka, 2014-10-23, 1 file, -4/+39
  This commit enables using movt/movw to load the stack guard address:
      movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
      movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
      ldr r0, [pc, r0]
  Previously a pc-relative load was emitted:
      ldr r0, LCPI0_0
      ldr r0, [pc, r0]
  rdar://problem/18740489
  llvm-svn: 220470
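The movw/movt pair works because movw writes the low 16 bits (zeroing the top half) and movt then writes the top 16 bits while leaving the low half intact, so any 32-bit value is built without a literal-pool load. In the commit the immediates are a pc-relative offset resolved via :lower16:/:upper16:; this sketch uses a plain constant and hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// The two 16-bit immediates for a movw/movt pair.
struct MovwMovt {
  uint16_t lower16;  // movw immediate (:lower16:)
  uint16_t upper16;  // movt immediate (:upper16:)
};

MovwMovt split(uint32_t value) {
  return {static_cast<uint16_t>(value & 0xFFFFu),
          static_cast<uint16_t>(value >> 16)};
}

// Simulate executing movw, then movt, on a register.
uint32_t materialize(MovwMovt imm) {
  uint32_t reg = imm.lower16;                                 // movw: zero-extends
  reg = (reg & 0x0000FFFFu) | (uint32_t{imm.upper16} << 16);  // movt: top half only
  return reg;
}
```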