path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Add missing check for 'isINSERTPSMask' in method 'isShuffleMaskLegal'.Andrea Di Biagio2014-11-111-1/+2
| | | | | | This helps the DAGCombiner to identify more opportunities to fold shuffles. llvm-svn: 221684
* Use uint64_t as the type for the X86 TSFlag format enum. Allows removal of ↵Craig Topper2014-11-112-61/+62
| | | | | | the VEXShift hack that was used to access the higher bits of TSFlags. llvm-svn: 221673
* [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSextMichael Kuperstein2014-11-111-0/+1
| | | | | | | | | This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details. Recommitting - This time, with a hopefully working test. Differential Revision: http://reviews.llvm.org/D6128 llvm-svn: 221672
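The trunc -> assertsext -> zext issue described above can be illustrated with a small scalar model. This is only a sketch: the helper names are hypothetical and do not come from the patch. It shows why a 64-bit register that holds a sign-extended 32-bit value still needs its high 32 bits cleared before it can serve as the zero-extended result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a 64-bit register holding a sign-extended 32-bit value,
// which is the situation an AssertSext node describes.
inline uint64_t signExtend32(int32_t v) {
  return static_cast<uint64_t>(static_cast<int64_t>(v));
}

// zext(trunc(reg)): keep the low 32 bits and clear the high 32 bits.
// Reusing the register unchanged would leave the sign bits in place.
inline uint64_t zextOfTrunc(uint64_t reg) {
  return reg & 0xFFFFFFFFull;
}
```

For a negative input such as -1, the sign-extended register is all ones, while the zero-extended result must be 0x00000000FFFFFFFF.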
* MCAsmParserExtension has a copy of the MCAsmParser. Use it.Rafael Espindola2014-11-111-12/+25
| | | | | | Base classes were storing a second copy. llvm-svn: 221667
* [X86] Custom lower UINT_TO_FP from v4i32 to v4f32, and from v8i32 to v8f32 ifQuentin Colombet2014-11-112-8/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AVX2 is available. According to IACA, the new lowering has a throughput of 8 cycles instead of 13 with the previous one. Although this lowering kicks in for some SPEC benchmarks, the performance improvement was within the noise. Correctness testing has been done for the whole range of uint32_t with the following program: uint4 v = (uint4) {0,1,2,3}; uint32_t i; //Check correctness over entire range for uint4 -> float4 conversion for( i = 0; i < 1U << (32-2); i++ ) { float4 t = test(v); float4 c = correct(v); if( 0xf != _mm_movemask_ps( t == c )) { printf( "Error @ %vx: %vf vs. %vf\n", v, c, t); return -1; } v += 4; } Where "correct" is the old lowering and "test" the new one. The patch adds a test case for the two custom lowering instructions. It also modifies the vector cost model, which is why cast.ll and uitofp.ll are modified. 2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7 instructions instead of 4 (3 more constant loads). <rdar://problem/18153096> llvm-svn: 221657
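A scalar analogue of the correctness check above might look like the following sketch. The log does not say which instruction sequence the new lowering emits, so the split-halves conversion used here is an assumption: it is one well-known way to convert an unsigned 32-bit integer to float using only bit operations and float arithmetic. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 single-precision value.
inline float bitsToFloat(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Reference conversion, playing the role of "correct" in the commit message.
inline float referenceUIntToFloat(uint32_t v) {
  return static_cast<float>(v);
}

// Assumed split-halves conversion, playing the role of "test".
inline float splitHalvesUIntToFloat(uint32_t v) {
  float lo = bitsToFloat((v & 0xFFFFu) | 0x4B000000u); // 2^23 + low 16 bits
  float hi = bitsToFloat((v >> 16) | 0x53000000u);     // 2^39 + high 16 bits * 2^16
  float hiAdj = hi - bitsToFloat(0x53000080u);         // subtract 2^39 + 2^23 (exact)
  return lo + hiAdj;                                   // the only rounding step
}

// Count inputs in [first, last] where the two conversions disagree.
inline uint64_t countMismatches(uint32_t first, uint32_t last) {
  uint64_t mismatches = 0;
  for (uint64_t i = first; i <= last; ++i) {
    uint32_t v = static_cast<uint32_t>(i);
    if (splitHalvesUIntToFloat(v) != referenceUIntToFloat(v))
      ++mismatches;
  }
  return mismatches;
}
```

Calling countMismatches(0, 0xFFFFFFFF) would cover the whole uint32_t range, as the commit's test program does four lanes at a time.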
* Reverting r221626 due to a too-strict test.Michael Kuperstein2014-11-101-1/+0
| | | | llvm-svn: 221629
* [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSextMichael Kuperstein2014-11-101-0/+1
| | | | | | | | | This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details. Differential Revision: http://reviews.llvm.org/D6128 llvm-svn: 221626
* Misc style fixes. NFC.Rafael Espindola2014-11-102-44/+28
| | | | | | | | | | | This fixes a few cases of: * Wrong variable name style. * Lines longer than 80 columns. * Repeated names in comments. * clang-format of the above. This makes the next patch a lot easier to read. llvm-svn: 221615
* [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / ↵Simon Pilgrim2014-11-061-2/+10
| | | | | | | | | | | | cvttpd2dq) Fixed an issue with the (v)cvttps2dq and (v)cvttpd2dq instructions being incorrectly put in the 2 source operand folding tables instead of the 1 source operand and added the missing SSE/AVX versions. Also added missing (v)cvtps2dq and (v)cvtpd2dq instructions to the folding tables. Differential Revision: http://reviews.llvm.org/D6001 llvm-svn: 221489
* [X86] Add VFMADDSUB cases for the 213->231 custom inserter.Ahmed Bougacha2014-11-061-0/+9
| | | | | | Also add tests for vfmadd/vfmsub. llvm-svn: 221488
* [X86] Add missing FMA3 VFMADDSUB in the emitter.Ahmed Bougacha2014-11-061-0/+8
| | | | | | Also reuse the fma4 intrinsic test to cover fma3 instructions too. llvm-svn: 221487
* [X86] When commuting SSE immediate blend, make sure that the new blend mask ↵Andrea Di Biagio2014-11-061-1/+2
| | | | | | | | | | | | | | | | | | is a valid imm8. Example: define <4 x i32> @test(<4 x i32> %a, <4 x i32> %b) { %shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 6, i32 3> ret <4 x i32> %shuffle } Before llc (-mattr=+sse4.1), produced the following assembly instruction: pblendw $4294967103, %xmm1, %xmm0 After pblendw $63, %xmm1, %xmm0 llvm-svn: 221455
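The two immediates quoted above are consistent with inverting a mask of 0xC0 as a 32-bit value without clipping it to 8 bits. The following sketch reproduces that arithmetic. The pre-commutation value 0xC0 is inferred from the numbers in the message, and the helper names are hypothetical rather than taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Invert a blend immediate without clipping. This reproduces the invalid
// operand shown in the "Before" assembly.
inline uint32_t invertUnclipped(uint32_t imm) {
  return ~imm;
}

// Invert a blend immediate and keep only one bit per lane (1 to 8 lanes),
// so the result always fits in an imm8.
inline uint32_t invertBlendImm(uint32_t imm, unsigned numLanes) {
  assert(numLanes >= 1 && numLanes <= 8);
  uint32_t laneMask = (1u << numLanes) - 1u;
  return ~imm & laneMask;
}
```

With eight 16-bit lanes, as in pblendw, invertBlendImm(0xC0, 8) gives 63, while invertUnclipped(0xC0) gives 4294967103.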
* X86, MC: Tidy up some whitespace in GetRelocTypeDavid Majnemer2014-11-061-1/+1
| | | | | | No functionality change intended. llvm-svn: 221443
* [X86] Lower VSELECT into SHRUNKBLEND when we shrink the bits used into theQuentin Colombet2014-11-063-19/+66
| | | | | | | | | | | | | | | condition to match a blend. This prevents optimizations that work on VSELECT to perform invalid transformations. Indeed, the optimized condition does not match the vector boolean content that is expected and bad things may happen. This patch yields the exact same code on the whole test-suite + specs (-O3 and -O3 -march=core-avx2), it improves one test case (vector-blend.ll) and fixes a bug reduced in vselect-avx.ll. <rdar://problem/18819506> llvm-svn: 221429
* [X86][SSE] Vector integer to float conversion memory foldingSimon Pilgrim2014-11-051-0/+3
| | | | | | | | Added missing memory folding for the (V)CVTDQ2PS instructions - we can safely fold these (but not the (V)CVTDQ2PD versions which have a register/memory size discrepancy in the source operand). I've added a test case demonstrating that stack folding now works. Differential Revision: http://reviews.llvm.org/D5981 llvm-svn: 221407
* [x86 fast-isel] Materialize allocas with the correct-sized lea for ILP32Derek Schuff2014-11-051-1/+1
| | | | | | | | | | Summary: X86FastISel::fastMaterializeAlloca was incorrectly conditioning its opcode selection on subtarget bitness rather than pointer size. Differential Revision: http://reviews.llvm.org/D6136 llvm-svn: 221386
* [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.Andrea Di Biagio2014-11-051-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves the folding of vector AND nodes into blend operations for targets that feature SSE4.1. A vector AND node where one of the operands is a constant build_vector with elements that are either zero or all-ones can be converted into a blend. This allows for example to simplify the following code: define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) { %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1> %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0> %3 = or <4 x i32> %1, %2 ret <4 x i32> %3 } Before this patch llc (-mcpu=corei7) generated: andps LCPI1_0(%rip), %xmm0, %xmm0 andps LCPI1_1(%rip), %xmm1, %xmm1 orps %xmm1, %xmm0, %xmm0 retq With this patch we generate a single 'vpblendw'. llvm-svn: 221343
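The equivalence this patch relies on can be checked with a four-lane scalar model. This is a sketch under stated assumptions: the type alias and helper names are hypothetical, and the blend convention (a set bit selects the second operand) is chosen for illustration. An AND with a constant whose elements are all zero or all ones gives the same result as blending the input with a zero vector.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Lane-wise AND of a vector with a constant mask.
inline V4 andLanes(const V4 &a, const V4 &mask) {
  V4 r{};
  for (unsigned i = 0; i < 4; ++i) r[i] = a[i] & mask[i];
  return r;
}

// Blend: bit i of imm set selects lane i from b, otherwise from a.
inline V4 blend(const V4 &a, const V4 &b, unsigned imm) {
  V4 r{};
  for (unsigned i = 0; i < 4; ++i) r[i] = ((imm >> i) & 1u) ? b[i] : a[i];
  return r;
}

// If every mask element is 0 or all-ones, report which lanes are cleared
// (bit i set means lane i comes from the zero vector) and return true.
inline bool clearMaskToBlendImm(const V4 &mask, unsigned &zeroLanes) {
  unsigned bits = 0;
  for (unsigned i = 0; i < 4; ++i) {
    if (mask[i] == 0u)
      bits |= 1u << i;
    else if (mask[i] != 0xFFFFFFFFu)
      return false; // not expressible as a blend with zero
  }
  zeroLanes = bits;
  return true;
}
```

For the mask <0, 0, 0, -1> from the example, the cleared lanes are 0 to 2, so the AND equals a blend with zero using immediate 7.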
* [X86][SSE] Enable commutation for SSE immediate blend instructionsSimon Pilgrim2014-11-042-28/+77
| | | | | | | | | | Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask. This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests). Differential Revision: http://reviews.llvm.org/D6015 llvm-svn: 221313
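Swapping the sources and inverting the mask leaves the result of a blend unchanged, which is what makes the commutation safe. A four-lane scalar sketch of that property follows. The names are hypothetical, and the convention that a set bit selects the second source is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Blend: bit i of imm set selects lane i from b, otherwise from a.
inline V4 blend4(const V4 &a, const V4 &b, unsigned imm) {
  V4 r{};
  for (unsigned i = 0; i < 4; ++i) r[i] = ((imm >> i) & 1u) ? b[i] : a[i];
  return r;
}

// Commuted form: swap the sources and invert the four mask bits.
inline V4 blend4Commuted(const V4 &a, const V4 &b, unsigned imm) {
  return blend4(b, a, ~imm & 0xFu);
}
```

Both functions return the same vector for every 4-bit immediate, so the compiler is free to pick whichever operand order lets a load be folded into the instruction.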
* [X86] Add 'FeatureSlowSHLD' to cpu 'bdver3'. Also explicitly set FeatureAVX ↵Andrea Di Biagio2014-11-041-8/+11
| | | | | | | | | | | | | | | | | and FeatureSSE4A for all the bdver* cpus. This patch adds 'FeatureSlowSHLD' to 'bdver3'. According to the official AMD optimization guide for amdfam15: "Using alternative code in place of SHLD achieves lower overall latency and requires fewer execution resources. The 32-bit and 64-bit forms of ADD, ADC, SHR, and LEA (except 16-bit form) are DirectPath instructions, while SHLD is a VectorPath instruction." This patch also explicitly sets feature AVX and SSE4A for all the bdver* cpus. This part of the patch is a non-functional change and it is mainly done for clarity reasons (Both XOP and FMA4 already imply AVX and SSE4A). llvm-svn: 221296
* [X86] Add debug print name for X86ISD::[US]MUL8. NFC-ish.Ahmed Bougacha2014-11-031-0/+2
| | | | | | The opcodes were added in r220516, but I forgot to add the print names. llvm-svn: 221185
* [X86] 8bit divrem: Improve codegen for AH register extraction.Ahmed Bougacha2014-11-034-28/+87
| | | | | | | | | | | | | | | | | | | | | | | | For 8-bit divrems where the remainder is used, we used to generate: divb %sil shrw $8, %ax movzbl %al, %eax That was to avoid an H-reg access, which is problematic mainly because it isn't possible in REX-prefixed instructions. This patch optimizes that to: divb %sil movzbl %ah, %eax To do that, we explicitly extend AH, and extract the L-subreg in the resulting register. The extension is done using the NOREX variants of MOVZX. To support signed operations, MOVSX_NOREX is also added. Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is then lowered to a sequence containing a single zext (rather than 2). Differential Revision: http://reviews.llvm.org/D6064 llvm-svn: 221176
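Both instruction sequences above extract the same value, which a small model of the AX register makes visible. This is only a sketch with hypothetical helper names: after an 8-bit unsigned divide, the quotient is in AL (low byte of AX) and the remainder is in AH (high byte).

```cpp
#include <cassert>
#include <cstdint>

// Model of AX after `divb`: quotient in AL, remainder in AH.
// The hardware instruction faults if the quotient does not fit in 8 bits.
inline uint16_t divbModel(uint16_t dividend, uint8_t divisor) {
  assert(divisor != 0);
  assert(dividend / divisor <= 0xFF);
  uint16_t quot = static_cast<uint16_t>(dividend / divisor);
  uint16_t rem = static_cast<uint16_t>(dividend % divisor);
  return static_cast<uint16_t>((rem << 8) | quot);
}

// Old sequence: shrw $8, %ax ; movzbl %al, %eax
inline uint32_t remainderViaShift(uint16_t ax) {
  uint16_t shifted = static_cast<uint16_t>(ax >> 8); // shrw $8, %ax
  return shifted & 0xFFu;                            // movzbl %al, %eax
}

// New sequence: movzbl %ah, %eax
inline uint32_t remainderViaAH(uint16_t ax) {
  return static_cast<uint8_t>(ax >> 8); // read AH directly, zero-extended
}
```

For 200 divided by 7, the model gives AX = 0x041C (remainder 4, quotient 28), and both extraction paths return 4.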
* Remove redundant calls to isMaterializable.Rafael Espindola2014-11-011-6/+1
| | | | | | | | | | This removes calls to isMaterializable in the following cases: * It was redundant with a call to isDeclaration now that isDeclaration returns the correct answer for materializable functions. * It was followed by a call to Materialize. Just call Materialize and check EC. llvm-svn: 221050
* Revert "Temporarily revert r220777 to sort out build bot breakage."Adrian Prantl2014-11-011-10/+10
| | | | | | | This reverts commit r221028. Later commits depend on this and reverting just this one causes even more bots to fail. llvm-svn: 221041
* Revert r220779, "[AVX512] Removed special case for cmp instructions in ↵NAKAMURA Takumi2014-11-011-4/+15
| | | | | | | | getVectorMaskingNode. Now cmp intrinsics lower as other intrinsics through VSELECT, and then VSELECT transforms to AND in PerformSELECTCombine." Since r221028 (reverting r220777), this caused failures. llvm-svn: 221040
* Temporarily revert r220777 to sort out build bot breakage.Adrian Prantl2014-11-011-10/+10
| | | | | | "[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros." llvm-svn: 221028
* Work around bugs in MSVC "14" CTP 3's conversion logicReid Kleckner2014-10-312-2/+4
| | | | | | | | | | It appears to ignore or find ambiguous MachineInstrBuilder's conversion operators that allow conversion to MachineInstr* and MachineBasicBlock::bundle_iterator. As a workaround, add an explicit way to get the MachineInstr. llvm-svn: 221017
* [AVX512] Added VBROADCAST{SS/SD} encoding for VL subset.Robert Khasanov2014-10-301-26/+51
| | | | | | | Refactored through AVX512_maskable llvm-svn: 220908
* [AVX512] Implemented AVX512VL FP binary packed instructions (VADDP*, VSUBP*, ↵Robert Khasanov2014-10-291-107/+47
| | | | | | | | | VMULP*, VDIVP*, VMAXP*, VMINP*) Refactored through AVX512_maskable Added encoding tests for them. llvm-svn: 220858
* [AVX512] Fix VSQRT packed instructions internal names.Robert Khasanov2014-10-282-9/+9
| | | | | | No functional change llvm-svn: 220808
* [AVX512] Extended avx512_sqrt_packed (sqrt instructions) to VL subset.Robert Khasanov2014-10-281-28/+44
| | | | | | Refactored through AVX512_maskable llvm-svn: 220806
* [AVX-512] Expanded rsqrt/rcp instructions to VL subset.Robert Khasanov2014-10-281-20/+47
| | | | | | Refactored multiclass through AVX512_maskable llvm-svn: 220783
* [AVX512] Removed special case for cmp instructions in getVectorMaskingNode. ↵Robert Khasanov2014-10-281-15/+4
| | | | | | | | Now cmp intrinsics lower as other intrinsics through VSELECT, and then VSELECT transforms to AND in PerformSELECTCombine. No functional change. llvm-svn: 220779
* [x86] Simplify vector selection if condition value type matches vselect ↵Robert Khasanov2014-10-281-10/+10
| | | | | | | | | value type and true value is all ones or false value is all zeros. This transformation worked if the selector is produced by SETCC; however, SETCC is needed only if we consider swapping operands. So I replaced the SETCC check for this case. Added tests for vselect of <X x i1> values. llvm-svn: 220777
* [AVX512] Bring back vector-shuffle lowering support through broadcastsRobert Khasanov2014-10-282-8/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the commit at rev219046, lowering of 512-bit broadcasts became non-optimal. Most of the tests on broadcasting and embedded broadcasting were changed and they don’t produce efficient code. The example below is from the commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll): define <16 x i32> @_inreg16xi32(i32 %a) { ; CHECK-LABEL: _inreg16xi32: ; CHECK: ## BB#0: -; CHECK-NEXT: vpbroadcastd %edi, %zmm0 +; CHECK-NEXT: vmovd %edi, %xmm0 +; CHECK-NEXT: vpbroadcastd %xmm0, %ymm0 +; CHECK-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0 ; CHECK-NEXT: retq %b = insertelement <16 x i32> undef, i32 %a, i32 0 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer ret <16 x i32> %c } Here, a 256-bit broadcast was generated instead of a 512-bit one. In this patch 1) I added vector-shuffle lowering through broadcasts 2) Removed asserts and branches like the following, because this is incorrect - assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI"); 3) Fixed lowering tests llvm-svn: 220774
* X86: Implement the vectorcall calling conventionReid Kleckner2014-10-282-0/+77
| | | | | | | | | | | | | | | | | | | | This is a Microsoft calling convention that supports both x86 and x86_64 subtargets. It passes vector and floating point arguments in XMM0-XMM5, and passes them indirectly once they are consumed. Homogenous vector aggregates of up to four elements can be passed in sequential vector registers, but this part is not implemented in LLVM and will be handled in Clang. On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as integer register parameters and is callee cleanup. On x86_64, it delegates to the normal win64 calling convention. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D5943 llvm-svn: 220745
* [AVX512] Add vpermil variable versionAdam Nemet2014-10-271-2/+25
| | | | | | | | | This is implemented via a multiclass that derives from the vperm imm multiclass. Fixes <rdar://problem/18426089> llvm-svn: 220737
* [AVX512] Clean up avx512_perm_imm to use X86VectorVTInfoAdam Nemet2014-10-271-25/+22
| | | | | | | | No functionality change. No change in X86.td.expanded except that we only set the CD8 attributes for the memory variants. (This shouldn't be used unless we have a memory operand.) llvm-svn: 220736
* [AVX512] Derive vpermil* from avx512_perm_immAdam Nemet2014-10-271-14/+14
| | | | | | | | This used to derive from avx512_pshuf_imm which is confusing. NFC. Compared X86.td.expanded. llvm-svn: 220735
* [AVX512] Fix copy-and-paste bugs in vpermilAdam Nemet2014-10-271-3/+3
| | | | | | | | | 1) i512mem -> f512mem (this is the packed FP input being permuted) 2) element size is 64 bits in EVEX_CD8 for PD. (A good illustration why X86VectorVTInfo is useful) llvm-svn: 220734
* Fix a stackmap bug introduced in r220710.Pete Cooper2014-10-271-4/+14
| | | | | | | | For a call to not return in to the stackmap shadow, the shadow must end with the call. To do this, we must insert any required nops *before* the call, and not after it. llvm-svn: 220728
* Stackmap shadows should consider call returns a branch target.Pete Cooper2014-10-271-0/+6
| | | | | | | | To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets. A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return. llvm-svn: 220710
* [asan-asm-instrumentation] Added comment describing how asm instrumentation ↵Yuri Gorshenin2014-10-271-0/+64
| | | | | | | | | | | | | | works. Summary: [asan-asm-instrumentation] Added comment describing how asm instrumentation works. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5970 llvm-svn: 220670
* AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instructionElena Demikhovsky2014-10-261-8/+17
| | | | llvm-svn: 220638
* [X86][SSE] Vector integer/float conversion memory foldingSimon Pilgrim2014-10-251-7/+7
| | | | | | | | Tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1). Minor patch agreed with qcolombet. llvm-svn: 220613
* Fix a Mach-O assembler segfault for a subtraction expression with an ↵Kevin Enderby2014-10-241-4/+7
| | | | | | | | | | | | | | | undefined symbol. In a Mach-O object file a relocatable expression of the form SymbolA - SymbolB + constant is allowed when both symbols are defined in a section. But when either symbol is undefined it is an error. The code was crashing when it had an undefined symbol in this case. It should have printed an error message using the location information in the relocation entry. rdar://18678402 llvm-svn: 220599
* [X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoadSimon Pilgrim2014-10-241-9/+12
| | | | | | | | | | Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element. An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle. Differential Revision: http://reviews.llvm.org/D5917 llvm-svn: 220592
* Allow AVX vrsqrtps generation.Sanjay Patel2014-10-241-2/+3
| | | | | | | This is a follow-on to r220570 that allows a 256-bit (v8f32) version of vrsqrtps to be generated. llvm-svn: 220579
* Use rsqrt (X86) to speed up reciprocal square root calcsSanjay Patel2014-10-245-1/+46
| | | | | | | | | | | | | | | | | | | | | This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two-constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate(). See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570
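A single Newton-Raphson refinement step for a reciprocal square root estimate can be sketched as below. The textbook form is standard. The second function is an assumed reading of "two-constant": the same step rearranged so that only the constants -0.5 and -3.0 appear. Both function names are hypothetical, and the hardware rsqrt estimate is replaced by a value passed in by the caller.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for 1/sqrt(a), textbook form:
//   x1 = x0 * (1.5 - 0.5 * a * x0 * x0)
inline float refineRsqrt(float a, float est) {
  return est * (1.5f - 0.5f * a * est * est);
}

// The same step rearranged to use two constants (assumed form):
//   x1 = (-0.5 * x0) * (a * x0 * x0 + (-3.0))
inline float refineRsqrtTwoConst(float a, float est) {
  return (-0.5f * est) * (a * est * est + -3.0f);
}
```

Starting from an estimate of 0.51 for 1/sqrt(4), one step of either form moves the value to roughly 0.4997, much closer to the exact 0.5.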
* [AVX512] FMA support for the 231 variantsAdam Nemet2014-10-241-13/+17
| | | | | | | | | | | | | | | | | This is asm/disasm-only support, similar to AVX. For ISeling the register variant, they are no different from 213 other than whether the multiplication or the addition operand is destructed. For ISeling the memory variant, i.e. to fold a load, they are no different than the 132 variant. The addition operand (op3) in both cases can come from memory. Again the only difference is which operand is destructed. There could be a post-RA pass that would convert a 213 or 132 into a 231. Part of <rdar://problem/17082571> llvm-svn: 220540
* [AVX512] Introduce fma3p_forms from AVXAdam Nemet2014-10-241-38/+36
| | | | | | | | | | | | | This multiclass generates the different forms: 213, 231, 132 in AVX. 132 in AVX512 is a separate class but I am planning to use this same multiclass to generate 231, relying on the nice null_frag trick from AVX to disable the codegen pattern for 231. No functionality change, no change in X86.td.expanded except for the different instruction definition names. llvm-svn: 220539