summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86ISelLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Add missing i8 max/min/umax/umin supportMichael Liao2012-09-211-9/+44
| | | | | | - Fix PR5145 and turn on test 8-bit atomic ops llvm-svn: 164358
* Revise td of X86 atomic instructionsMichael Liao2012-09-211-0/+5
| | | | | | | - Rewirte most atomic instructions in templates for both better maintenance and future extensions, such as HLE in TSX. llvm-svn: 164357
* Re-work X86 code generation of atomic ops with spin-loopMichael Liao2012-09-201-501/+490
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop previously the spin-loop looks like thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB the 'ld' at the beginning of newMBB should be lift out of the loop as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions has all-register form only, there is no immedidate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. llvm-svn: 164281
* X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe fp math.Benjamin Kramer2012-09-151-0/+2
| | | | | | This was only an issue if sse is disabled. llvm-svn: 163967
* Fix commentMichael Liao2012-09-131-1/+1
| | | | llvm-svn: 163835
* Add wider vector/integer support for PR12312Michael Liao2012-09-131-100/+101
| | | | | | | | - Enhance the fix to PR12312 to support wider integer, such as 256-bit integer. If more than 1 fully evaluated vectors are found, POR them first followed by the final PTEST. llvm-svn: 163832
* Fix PR11985Michael Liao2012-09-121-2/+3
| | | | | | | | | | | - BlockAddress has no support of BA + offset form and there is no way to propagate that offset into machine operand; - Add BA + offset support and a new interface 'getTargetBlockAddress' to simplify target block address forming; - All targets are modified to use new interface and X86 backend is enhanced to support BA + offset addressing. llvm-svn: 163743
* Indentation fixes. No functional change.Craig Topper2012-09-121-8/+8
| | | | llvm-svn: 163682
* Make a bunch of lowering helper functions static instead of member ↵Craig Topper2012-09-111-58/+55
| | | | | | functions. No functional change. llvm-svn: 163596
* Remove redundant semicolons which are null statements.Dmitri Gribenko2012-09-101-1/+1
| | | | llvm-svn: 163547
* Enhance PR11334 fix to support extload from v2f32/v4f32Michael Liao2012-09-101-0/+4
| | | | | | - Fix an remaining issue of PR11674 as well llvm-svn: 163528
* Add boolean simplification support from CMOVMichael Liao2012-09-101-12/+42
| | | | | | | | - If a boolean value is generated from CMOV and tested as boolean value, simplify the use of test result by referencing the original condition. RDRAND intrinisc is one of such cases. llvm-svn: 163516
* The VPSHUFB 256-bit instruction may be generated when one of input vector is ↵Elena Demikhovsky2012-09-101-4/+15
| | | | | | | | undefined or zeroinitializer. I've added the "zeroinitializer" case in this patch. llvm-svn: 163506
* Add instruction selection for ffloor of vectors when SSE4.1 or AVX is enabled.Craig Topper2012-09-081-0/+5
| | | | llvm-svn: 163473
* Use 256-bit alignment for constant pool value for 256-bit vector FNEG lowering.Craig Topper2012-09-081-2/+3
| | | | llvm-svn: 163463
* Add support for lowering FABS of vector types.Craig Topper2012-09-081-12/+25
| | | | llvm-svn: 163461
* Set operation action for FFLOOR to Expand for all vector types for X86. Set ↵Craig Topper2012-09-081-0/+1
| | | | | | FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct. llvm-svn: 163458
* AVX2 optimization.Elena Demikhovsky2012-09-061-0/+40
| | | | | | Added generation of VPSHUB instruction for <32 x i8> vector shuffle when possible. llvm-svn: 163312
* Remove duplicated helper functionMichael Liao2012-09-061-17/+1
| | | | llvm-svn: 163295
* Use iPTR instead of i32 for extract_subvector/insert_subvector index in ↵Craig Topper2012-09-061-2/+2
| | | | | | lowering and patterns. This makes it consistent with the incoming DAG nodes from the DAG builder. llvm-svn: 163293
* Stop casting away const qualifier needlessly.Roman Divacky2012-09-051-1/+1
| | | | llvm-svn: 163258
* Generic Bypass Slow DivPreston Gurd2012-09-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | - CodeGenPrepare pass for identifying div/rem ops - Backend specifies the type mapping using addBypassSlowDivType - Enabled only for Intel Atom with O2 32-bit -> 8-bit - Replace IDIV with instructions which test its value and use DIVB if the value is positive and less than 256. - In the case when the quotient and remainder of a divide are used a DIV and a REM instruction will be present in the IR. In the non-Atom case they are both lowered to IDIVs and CSE removes the redundant IDIV instruction, using the quotient and remainder from the first IDIV. However, due to this optimization CSE is not able to eliminate redundant IDIV instructions because they are located in different basic blocks. This is overcome by calculating both the quotient (DIV) and remainder (REM) in each basic block that is inserted by the optimization and reusing the result values when a subsequent DIV or REM instruction uses the same operands. - Test cases check for the presents of the optimization when calculating either the quotient, remainder, or both. Patch by Tyler Nowicki! llvm-svn: 163150
* This patch optimizes shuffle instruction - generates 2 instructions instead ↵Elena Demikhovsky2012-09-041-16/+17
| | | | | | | | | | | | | | | | | | | | of 4. Since this specific shuffle is widely used in many workloads we have ~10% performance on them. shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14> vmovaps (%rdx), %ymm0 vshufps $8, %ymm0, %ymm0, %ymm0 vmovaps (%rcx), %ymm1 vshufps $8, %ymm0, %ymm1, %ymm1 vunpcklps %ymm0, %ymm1, %ymm0 vmovaps (%rcx), %ymm0 vmovsldup (%rdx), %ymm1 vblendps $85, %ymm0, %ymm1, %ymm0 llvm-svn: 163134
* TyposCraig Topper2012-09-011-1/+1
| | | | llvm-svn: 163053
* SelectionDAG: when constructing VZEXT_LOAD from other loads, make sure itsManman Ren2012-08-311-0/+12
| | | | | | | | | | | output chain is correctly setup. As an example, if the original load must happen before later stores, we need to make sure the constructed VZEXT_LOAD is constrained to be before the stores. rdar://11457792 llvm-svn: 163036
* Fix PR12359Michael Liao2012-08-311-3/+5
| | | | | | | | | - In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as well as PSHUFB will zero elements with negative indices. Patch by Sriram Murali <sriram.murali@intel.com> llvm-svn: 163018
* Add support for converting llvm.fma to fma4 instructions.Craig Topper2012-08-311-4/+6
| | | | llvm-svn: 162999
* Only perform DAG combine on FMAs of legal types.Craig Topper2012-08-301-0/+4
| | | | llvm-svn: 162892
* Convert FMA4 patterns to use target specific nodes instead of intrinsics to ↵Craig Topper2012-08-291-4/+0
| | | | | | align with FMA3. llvm-svn: 162829
* Add comments on the literal value used.Michael Liao2012-08-281-1/+1
| | | | llvm-svn: 162805
* Explicitly update the number of nodes to be traversedMichael Liao2012-08-281-1/+1
| | | | llvm-svn: 162780
* Fix PR12312Michael Liao2012-08-281-10/+112
| | | | | | | | | | - Add a target-specific DAG optimization to recognize a pattern PTEST-able. Such a pattern is a OR'd tree with X86ISD::OR as the root node. When X86ISD::OR node has only its flag result being used as a boolean value and all its leaves are extracted from the same vector, it could be folded into an X86ISD::PTEST node. llvm-svn: 162735
* Remove MMX shift intrinsic handling code that also exists in ↵Craig Topper2012-08-271-56/+0
| | | | | | SelectionDAGBuilder. llvm-svn: 162661
* Custom lower FMA intrinsics to target specific nodes and remove the patterns.Craig Topper2012-08-241-0/+72
| | | | llvm-svn: 162534
* fix a case where all operands of BUILD_VECTOR are undefinedMichael Liao2012-08-201-0/+4
| | | | llvm-svn: 162214
* When unsafe math is used, we can use commutative FMAX and FMIN. In some casesNadav Rotem2012-08-191-0/+27
| | | | | | | | | | | | | | | | | | | this allows for better code generation. Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and FMINC, which are commutative. For example: movaps %xmm0, %xmm1 movsd LC(%rip), %xmm0 minsd %xmm1, %xmm0 becomes: minsd LC(%rip), %xmm0 llvm-svn: 162187
* Reapply r162160 with a fix: Optimize Arith->Trunc->SETCC sequence to allow ↵Nadav Rotem2012-08-181-15/+60
| | | | | | better compare/branch code. llvm-svn: 162172
* Refactor code a bit to reduce number of calls in the final compiled code. No ↵Craig Topper2012-08-181-134/+144
| | | | | | functional change intended. llvm-svn: 162166
* Revert r162160 because it made a few buildbots fail.Nadav Rotem2012-08-181-43/+6
| | | | llvm-svn: 162164
* The X86 backend has a number of optimizations for SETCC nodes which useNadav Rotem2012-08-181-6/+43
| | | | | | | | | | | | | | | | | | | | | arithmetic instructions. However, when small data types are used, a truncate node appears between the SETCC node and the arithmetic operation. This patch adds support for this pattern. Before: xorl %esi, %edi testb %dil, %dil setne %al ret After: xorb %dil, %sil setne %al ret rdar://12081007 llvm-svn: 162160
* Use nested switch to select arguments to reduce calls to EmitPCMP.Craig Topper2012-08-171-5/+20
| | | | llvm-svn: 162089
* Make ReplaceATOMIC_BINARY_64 a static function. Use a nested switch to ↵Craig Topper2012-08-171-16/+30
| | | | | | reduce to only a single call to it thus allowing it to be inlined by the compiler. llvm-svn: 162088
* minor fix of X86ISD::VSEXT_MOVL dumpMichael Liao2012-08-141-0/+1
| | | | llvm-svn: 161902
* fix PR11334Michael Liao2012-08-141-0/+81
| | | | | | | | | | | | - FP_EXTEND only support extending from vectors with matching elements. This results in the scalarization of extending to v2f64 from v2f32, which will be legalized to v4f32 not matching with v2f64. - add X86-specific VFPEXT supproting extending from v4f32 to v2f64. - add BUILD_VECTOR lowering helper to recover back the original extending from v4f32 to v2f64. - test case is enhanced to include different vector width. llvm-svn: 161894
* Factor duplicate calls to getUNDEF in several functions.Craig Topper2012-08-141-10/+10
| | | | llvm-svn: 161860
* Re-factor intrinsic lowering to combine common parts of similar intrinsics. ↵Craig Topper2012-08-141-34/+133
| | | | | | Reduces compiled code size a little bit. llvm-svn: 161859
* Tidy up VSETCC lowering code a bit more by adding an llvm_unreachable and ↵Craig Topper2012-08-131-7/+9
| | | | | | putting an a couple if conditions in a better order. llvm-svn: 161746
* Refactor code a bit to share commonalities. No functional change intended.Craig Topper2012-08-131-20/+21
| | | | llvm-svn: 161745
* Fix an unused variable warning from r161742.Craig Topper2012-08-131-3/+0
| | | | llvm-svn: 161743
* Remove the LowerMMXCONCAT_VECTORS function. It could never execute because ↵Craig Topper2012-08-131-39/+1
| | | | | | there are no legal 64-bit vector types that could be used as inputs to a 128-bit concat_vectors. Remove a target specific SDNode and its patterns that become unused as a result. llvm-svn: 161742
OpenPOWER on IntegriCloud