path: root/llvm/lib/Target/X86/X86ISelLowering.cpp
Commit message | Author | Age | Files | Lines
* X86 CostModel: Add support for some of the common arithmetic instructions for SSE4, AVX and AVX2. | Nadav Rotem | 2012-11-03 | 1 | -0/+70
  llvm-svn: 167347
* Revert the majority of the next patch in the address space series: | Chandler Carruth | 2012-11-01 | 1 | -1/+1
  r165941: Resubmit the changes to llvm core to update the functions to support different pointer sizes on a per address space basis.

  Despite this commit log, this change primarily changed stuff outside of VMCore, and those changes do not carry any tests for correctness (or even plausibility), and we have consistently found questionable or flat out incorrect cases in these changes. Most of them are probably correct, but we need to devise a system that makes it more clear when we have handled the address space concerns correctly, and ideally each pass that gets updated would receive an accompanying test case that exercises that pass specifically w.r.t. alternate address spaces.

  However, from this commit, I have retained the new C API entry points. Those were an orthogonal change that probably should have been split apart, but they seem entirely good.

  In several places the changes were very obvious cleanups with no actual multiple address space code added; these I have not reverted when I spotted them.

  In a few other places there were merge conflicts due to a cleaner solution being implemented later, often not using address spaces at all. In those cases, I've preserved the new code which isn't address space dependent.

  This is part of my ongoing effort to clean out the partial address space code which carries high risk and low test coverage, and is not likely to be finished before the 3.2 release looms closer. Duncan and I would both like to see the above issues addressed before we return to these changes.
  llvm-svn: 167222
* (For X86) Enhancement to add-carry/sub-borrow (adc/sbb) optimization. | Shuxin Yang | 2012-10-31 | 1 | -4/+29
  The adc/sbb optimization is able to convert the following expressions into a single adc/sbb instruction:
      (ult) ... = x + 1 // where the ult is unsigned-less-than comparison
      (ult) ... = x - 1
  This change flips the "x >u y" (i.e. ugt comparison) in order to expose the adc/sbb opportunity.
  llvm-svn: 167180
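  Illustration (not part of the commit): a minimal C++ sketch of the source-level pattern the message describes, assuming 32-bit unsigned operands. The function names are hypothetical, and whether a given compiler version actually emits a single adc/sbb for them is not guaranteed. The point is only that a `>u` comparison is the same predicate as `<u` with the operands swapped, which is the flip the commit relies on.

```cpp
#include <cstdint>

// x + 1 when a <u b: an unsigned compare can leave the condition in the carry
// flag, so a backend may fold the increment into an add-with-carry.
constexpr std::uint32_t inc_if_ult(std::uint32_t x, std::uint32_t a, std::uint32_t b) {
    return (a < b) ? x + 1 : x;
}

// x - 1 when a <u b: the subtract-with-borrow counterpart.
constexpr std::uint32_t dec_if_ult(std::uint32_t x, std::uint32_t a, std::uint32_t b) {
    return (a < b) ? x - 1 : x;
}

// The same increment written with an unsigned greater-than comparison.
// a >u b is equivalent to b <u a, so swapping the operands gives the ult form.
constexpr std::uint32_t inc_if_ugt(std::uint32_t x, std::uint32_t a, std::uint32_t b) {
    return (a > b) ? x + 1 : x;
}

static_assert(inc_if_ugt(5u, 2u, 1u) == inc_if_ult(5u, 1u, 2u),
              "ugt with swapped operands matches ult");
```

  The `static_assert` only checks the value-level equivalence; inspecting the generated assembly is the way to see whether adc/sbb is selected on a particular toolchain.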
* Clean up redundant SP register maintained in X86 TLI | Michael Liao | 2012-10-31 | 1 | -5/+7
  llvm-svn: 167104
* X86 MMX: optimize transfer from mmx to i32 | Manman Ren | 2012-10-30 | 1 | -0/+8
  We used to generate a store (movq) + a load. Now we use movd. rdar://9946746
  llvm-svn: 167056
* Re-commit r166971. I reverted it too quickly, when buildbots didn't have a chance to test it with chapni's fix (-mattr=+avx). | Jakub Staszak | 2012-10-30 | 1 | -4/+4
  llvm-svn: 166985
* Revert r166971. It causes buildbot failure. To be investigated. | Jakub Staszak | 2012-10-29 | 1 | -4/+4
  llvm-svn: 166979
* Remove unused variable. | Jakub Staszak | 2012-10-29 | 1 | -1/+0
  llvm-svn: 166973
* Simplify code. No functionality change. | Jakub Staszak | 2012-10-29 | 1 | -4/+5
  llvm-svn: 166972
* Allow folding of a vector load if there is more than one bitcast, so in the case: | Jakub Staszak | 2012-10-29 | 1 | -4/+4
      %0 = load <8 x i16>* %dest
      %1 = shufflevector <8 x i16> %0, <8 x i16> %in, <8 x i32> < i32 0, i32 1, i32 2, i32 3, i32 13, i32 undef, i32 14, i32 14>
      store <8 x i16> %1, <8 x i16>* %dest
  We get:
      vmovlpd (%eax), %xmm0, %xmm0
  instead of:
      vmovaps (%eax), %xmm1
      vmovsd %xmm1, %xmm0, %xmm0
  No extra test-case is added. I just fixed the existing one (also it uses FileCheck now).
  llvm-svn: 166971
* Silence a GCC warning about comparing signed and unsigned types. | Duncan Sands | 2012-10-29 | 1 | -2/+2
  llvm-svn: 166922
* Clean up where SlotSize should be used instead of pointer size. | Michael Liao | 2012-10-25 | 1 | -15/+14
  llvm-svn: 166664
* Add custom conversion from v2u32 to v2f32 in 32-bit mode | Michael Liao | 2012-10-24 | 1 | -0/+20
  - As there are no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to v2f32 is added to improve the efficiency of the code generated.
  llvm-svn: 166545
* Fix PR14161 | Michael Liao | 2012-10-23 | 1 | -1/+4
  - Check that the index being extracted is constant 0 before simplifying. Otherwise, retain the original sequence.
  llvm-svn: 166504
* Silence -Wsign-compare | Matt Beaumont-Gay | 2012-10-23 | 1 | -1/+1
  llvm-svn: 166494
* Add custom UINT_TO_FP from v4i8/v4i16/v8i8/v8i16 to v4f32/v8f32 | Michael Liao | 2012-10-23 | 1 | -19/+50
  - Replace v4i8/v8i8 -> v8f32 DAG combine with custom lowering to reduce DAG combine overhead.
  - Extend the support to v4i16/v8i16 as well.
  llvm-svn: 166487
* Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1 | Michael Liao | 2012-10-23 | 1 | -0/+95
  llvm-svn: 166486
* This patch is to fix radar://8426430. It is about llvm support of __builtin_debugtrap() which is supposed to consistently raise SIGTRAP across all systems. | Shuxin Yang | 2012-10-19 | 1 | -0/+1
  In contrast, __builtin_trap() behaves differently on different systems, e.g. it raises SIGTRAP on ARM, and SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap" functionality, while preserving compatibility with gcc on __builtin_trap().

  The X86 backend is already able to handle debugtrap(). This patch is to:
  1) make the front-end recognize "__builtin_debugtrap()" (embodied in the one-line change to Clang).
  2) In the DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which makes __builtin_debugtrap() "available" to all existing ports without the hassle of changing their code.
  3) If trap-function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and __builtin_trap() will be expanded into a function call to the specified trap function. This behavior may need to change in the future.

  The provided test case is to make sure 2) and 3) are working for the ARM port, and we already have a test case for x86.
  llvm-svn: 166300
* Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86 | Michael Liao | 2012-10-19 | 1 | -0/+79
  - If INSERT_VECTOR_ELT is supported (above SSE2, either by custom sequence of legal insn), transform BUILD_VECTOR into SHUFFLE + INSERT_VECTOR_ELT if most elements could be built from SHUFFLE with few (so far 1) elements being inserted.
  llvm-svn: 166288
* Check SSSE3 instead of SSE4.1 | Michael Liao | 2012-10-17 | 1 | -2/+2
  - All shuffle insns required, especially PSHUFB, are added in SSSE3.
  llvm-svn: 166086
* Fix setjmp on models with a non-Small code model or a non-Static relocation model | Michael Liao | 2012-10-17 | 1 | -12/+42
  - MBB address is only valid as an immediate value in Small & Static code/relocation models. On other models, LEA is needed to load the IP address of the restore MBB.
  - A minor fix of MBB in MC lowering is added as well to enable the target relocation flag to be propagated into MC.
  llvm-svn: 166084
* Support v8f32 to v8i8/v8i16 conversion through custom lowering | Michael Liao | 2012-10-16 | 1 | -17/+38
  - Add custom FP_TO_SINT on v8i16 (and v8i8 which is legalized as v8i16 due to vector element-wise widening) to reduce DAG combiner and its overhead added in X86 backend.
  llvm-svn: 166036
* Reapply r165661, Patch by Shuxin Yang <shuxin.llvm@gmail.com>. | NAKAMURA Takumi | 2012-10-16 | 1 | -0/+41
  Original message:

  The attached is the fix to radar://11663049. The optimization can be outlined by the following rules:
      (select (x != c), e, c) -> (select (x != c), e, x),
      (select (x == c), c, e) -> (select (x == c), x, e)
  where <c> is an integer constant.

  The reason for this change is that, on x86, conditional-move-from-constant needs two instructions; however, conditional-move-from-register needs only one instruction.

  While LowerSELECT() sounds like the most convenient place for this optimization, it turns out to be a bad place. The reason is that replacing the constant <c> with a symbolic value obscures some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to the last instruction-combining phase.

  The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".

  Original message since r165661:

  My previous change has a bug: I negated the condition code of a CMOV, and went ahead creating a new CMOV using the *ORIGINAL* condition code.
  llvm-svn: 166017
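  Illustration (not part of the commit): a minimal C++ sketch of why the two select rules are value-preserving, assuming a 32-bit integer and a hypothetical constant `kC`. The function names are illustrative only and do not correspond to anything in X86ISelLowering.cpp. In the branch where the constant is chosen, `x` is known to equal the constant, so `x` can stand in for it.

```cpp
#include <cstdint>

// Hypothetical integer constant playing the role of <c> in the rules.
constexpr std::int32_t kC = 42;

// (select (x != c), e, c): the false arm materialises the constant.
constexpr std::int32_t select_ne_const(std::int32_t x, std::int32_t e) {
    return (x != kC) ? e : kC;
}

// (select (x != c), e, x): the false arm is only taken when x == c,
// so x already holds the value of the constant.
constexpr std::int32_t select_ne_reg(std::int32_t x, std::int32_t e) {
    return (x != kC) ? e : x;
}

// (select (x == c), c, e) and its rewritten form (select (x == c), x, e).
constexpr std::int32_t select_eq_const(std::int32_t x, std::int32_t e) {
    return (x == kC) ? kC : e;
}
constexpr std::int32_t select_eq_reg(std::int32_t x, std::int32_t e) {
    return (x == kC) ? x : e;
}

static_assert(select_ne_const(42, 7) == select_ne_reg(42, 7), "x == c case");
static_assert(select_ne_const(1, 7) == select_ne_reg(1, 7), "x != c case");
```

  The sketch only demonstrates the equivalence of the values; the instruction-count benefit described in the message depends on how the backend selects the conditional move.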
* Add __builtin_setjmp/_longjmp support in X86 backend | Michael Liao | 2012-10-15 | 1 | -0/+202
  - Besides being used in SjLj exception handling, __builtin_setjmp/__longjmp is also used as a light-weight replacement of setjmp/longjmp, which are used to implement continuations, user-level threading, etc. The support added in this patch ONLY addresses this usage and is NOT intended to support SjLj exception handling, as zero-cost DWARF exception handling is used by default in X86.
  llvm-svn: 165989
* Resubmit the changes to llvm core to update the functions to support different pointer sizes on a per address space basis. | Micah Villmow | 2012-10-15 | 1 | -6/+6
  llvm-svn: 165941
* X86: Fix accidentally swapped operands. | Benjamin Kramer | 2012-10-13 | 1 | -1/+1
  llvm-svn: 165871
* X86: Promote i8 cmov when both operands are coming from truncates of the same width. | Benjamin Kramer | 2012-10-13 | 1 | -0/+15
  X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this level is often not a good idea because it's too late for many optimizations to kick in. This solution doesn't add any extensions (truncs are free) and tries to avoid introducing partial register stalls by filtering direct copyfromregs.

  I'm seeing a ~10% speedup on reading a random .png file with libpng15 via graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture.
  llvm-svn: 165868
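  Illustration (not part of the commit): a minimal C++ sketch of the value-level idea, assuming both 8-bit operands are truncations of 32-bit values. The function names are hypothetical. Selecting between two truncated values gives the same result as selecting between the wide values and truncating once, which is what allows the select to be done at a width that has a cmov.

```cpp
#include <cstdint>

// Select between two i8 values that are each a truncation of an i32.
constexpr std::uint8_t select_then_narrow_each(bool cond, std::uint32_t a, std::uint32_t b) {
    return cond ? static_cast<std::uint8_t>(a) : static_cast<std::uint8_t>(b);
}

// Promoted form: select at 32 bits, then truncate the chosen value once.
constexpr std::uint8_t select_wide_then_narrow(bool cond, std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint8_t>(cond ? a : b);
}

static_assert(select_then_narrow_each(true, 0x1234u, 0xABCDu) ==
                  select_wide_then_narrow(true, 0x1234u, 0xABCDu),
              "true arm agrees");
static_assert(select_then_narrow_each(false, 0x1234u, 0xABCDu) ==
                  select_wide_then_narrow(false, 0x1234u, 0xABCDu),
              "false arm agrees");
```

  Whether a compiler emits a branch or a cmov for either form depends on its version and target; the sketch shows only that the two forms compute the same byte.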
* Revert 165732 for further review. | Micah Villmow | 2012-10-11 | 1 | -6/+6
  llvm-svn: 165747
* Add in the first iteration of support for llvm/clang/lldb to allow variable per address space pointer sizes to be optimized correctly. | Micah Villmow | 2012-10-11 | 1 | -6/+6
  llvm-svn: 165726
* Revert r165661, "Patch by Shuxin Yang <shuxin.llvm@gmail.com>." | NAKAMURA Takumi | 2012-10-11 | 1 | -40/+0
  It broke stage2 clang and test-suite/MultiSource/Benchmarks/mediabench/g721/g721encode.
  llvm-svn: 165692
* Change MachineInstrBuilder::addDisp to copy over target flags by default. | Evan Cheng | 2012-10-11 | 1 | -5/+2
  llvm-svn: 165677
* Patch by Shuxin Yang <shuxin.llvm@gmail.com>. | Nadav Rotem | 2012-10-10 | 1 | -0/+40
  Original message:

  The attached is the fix to radar://11663049. The optimization can be outlined by the following rules:
      (select (x != c), e, c) -> (select (x != c), e, x),
      (select (x == c), c, e) -> (select (x == c), x, e)
  where <c> is an integer constant.

  The reason for this change is that, on x86, conditional-move-from-constant needs two instructions; however, conditional-move-from-register needs only one instruction.

  While LowerSELECT() sounds like the most convenient place for this optimization, it turns out to be a bad place. The reason is that replacing the constant <c> with a symbolic value obscures some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to the last instruction-combining phase.

  The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".
  llvm-svn: 165661
* Add support for FP_ROUND from v2f64 to v2f32 | Michael Liao | 2012-10-10 | 1 | -0/+7
  - Due to the current matching vector elements constraints in ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from v2f32) is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPROUND to work around this constraint.
  llvm-svn: 165631
* Add alternative support for FP_ROUND from v2f32 to v2f64 | Michael Liao | 2012-10-10 | 1 | -84/+17
  - Due to the current matching vector elements constraints in ISD::FP_EXTEND, rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPEXT to work around this constraint. This patch also reverts a previous attempt to fix this issue by recovering the scalarized ISD::FP_EXTEND pattern and thus significantly reduces the overhead of supporting non-power-2 vector FP extend.
  llvm-svn: 165625
* When expanding atomic load arith instructions, do not lose target flags. rdar://12453106 | Evan Cheng | 2012-10-09 | 1 | -2/+5
  llvm-svn: 165568
* Create enums for the different attributes. | Bill Wendling | 2012-10-09 | 1 | -7/+12
  We use the enums to query whether an Attributes object has that attribute. The opaque layer is responsible for knowing where that specific attribute is stored.
  llvm-svn: 165488
* Move TargetData to DataLayout. | Micah Villmow | 2012-10-08 | 1 | -3/+3
  llvm-svn: 165402
* This patch corrects commit 165126 by using an integer bit width instead of a pointer to a type, in order to remove the uses of getGlobalContext(). | Preston Gurd | 2012-10-04 | 1 | -1/+1
  Patch by Tyler Nowicki.
  llvm-svn: 165255
* Add register encoding support in X86 backend | Michael Liao | 2012-10-04 | 1 | -3/+4
  - Add 'HwEncoding' for X86 registers and call getEncodingValue() to retrieve their encoding values.
  - This is the first step toward adopting the new scheme. Further revision is ongoing.
  llvm-svn: 165241
* Use new accessor methods to query for attributes. | Bill Wendling | 2012-10-04 | 1 | -1/+1
  llvm-svn: 165205
* Clean up trailing whitespace | Michael Liao | 2012-10-03 | 1 | -2/+2
  llvm-svn: 165182
* Change getX86SubSuperRegister to take an MVT::SimpleValueType rather than an EVT and add llvm_unreachable to the switches. | Craig Topper | 2012-09-30 | 1 | -1/+1
  Helps it compile to dramatically better code.
  llvm-svn: 164919
* Remove the `hasFnAttr' method from Function. | Bill Wendling | 2012-09-26 | 1 | -6/+6
  The hasFnAttr method has been replaced by querying the Attributes explicitly. No intended functionality change.
  llvm-svn: 164725
* Add missing i64 max/min/umax/umin on 32-bit targets | Michael Liao | 2012-09-25 | 1 | -0/+78
  - Turn on atomic6432.ll and add a specific test case as well
  llvm-svn: 164616
* Fix an illegal tailcall opt where the callee returns a double via xmm while caller returns x86_fp80 via st0. rdar://12229511 | Evan Cheng | 2012-09-25 | 1 | -1/+9
  llvm-svn: 164588
* Add missing i8 max/min/umax/umin support | Michael Liao | 2012-09-21 | 1 | -9/+44
  - Fix PR5145 and turn on tests for 8-bit atomic ops
  llvm-svn: 164358
* Revise td of X86 atomic instructions | Michael Liao | 2012-09-21 | 1 | -0/+5
  - Rewrite most atomic instructions in templates for both better maintenance and future extensions, such as HLE in TSX.
  llvm-svn: 164357
* Re-work X86 code generation of atomic ops with spin-loop | Michael Liao | 2012-09-20 | 1 | -501/+490
  - Rewrite/merge pseudo-atomic instruction emitters to address the following issues:
    * Reduce one unnecessary load in the spin-loop. Previously the spin-loop looked like:

          thisMBB:
          newMBB:
            ld  t1 = [bitinstr.addr]
            op  t2 = t1, [bitinstr.val]
            not t3 = t2  (if Invert)
            mov EAX = t1
            lcs dest = [bitinstr.addr], t3  [EAX is implicit]
            bz  newMBB
            fallthrough -->nextMBB

      The 'ld' at the beginning of newMBB should be lifted out of the loop, as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as:

          thisMBB:
            EAX = LOAD [MI.addr]
          mainMBB:
            t1 = OP [MI.val], EAX
            LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
            JNE mainMBB
          sinkMBB:

    * Remove immopc as, so far, all pseudo-atomic instructions have the all-register form only; there is no immediate operand.
    * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td
    * Fix issues in PR13458
  - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality.
  - Revise tests due to the new spin-loop generated.
  llvm-svn: 164281
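  Illustration (not part of the commit): a minimal C++ sketch of the refined loop shape at source level, using `std::atomic` and assuming a 32-bit signed "max" as the operation. The function name `atomic_fetch_max` is hypothetical. The value is loaded once before the loop, and a failed compare-exchange refreshes the expected value, so no reload is needed inside the loop. This mirrors how CMPXCHG leaves the current memory value in EAX.

```cpp
#include <atomic>
#include <cstdint>

// Atomically replace the stored value with max(stored, value) and return the
// value that was stored before the update.
inline std::int32_t atomic_fetch_max(std::atomic<std::int32_t>& target,
                                     std::int32_t value) {
    // Single load before the loop (the "thisMBB" step in the message).
    std::int32_t expected = target.load();
    std::int32_t desired;
    do {
        // The OP step: compute the new value from the last observed one.
        desired = (expected < value) ? value : expected;
        // On failure compare_exchange_weak writes the current stored value
        // into `expected`, so the loop body never reloads it explicitly.
    } while (!target.compare_exchange_weak(expected, desired));
    return expected;
}
```

  A call such as `atomic_fetch_max(counter, 7)` on an atomic holding 3 would store 7 and return 3. The code a compiler generates for this loop is not guaranteed to match the MBB layout quoted above.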
* X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe fp math. | Benjamin Kramer | 2012-09-15 | 1 | -0/+2
  This was only an issue if sse is disabled.
  llvm-svn: 163967
* Fix comment | Michael Liao | 2012-09-13 | 1 | -1/+1
  llvm-svn: 163835