summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86FastISel.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Don't repeat names in comments and clang-format this function.Rafael Espindola2015-03-161-7/+10
| | | | llvm-svn: 232375
* Have getCallPreservedMask and getThisCallPreservedMask take aEric Christopher2015-03-111-1/+1
| | | | | | | MachineFunction argument so that we can grab subtarget specific features off of it. llvm-svn: 231979
* [AVX] Lower / fast-isel scalar FP selects into VBLENDV instructions (PR22483)Sanjay Patel2015-03-051-18/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reduces code size for all AVX targets and increases speed for some chips. SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and only in the packed float/double flavors. AVX subsequently made the instruction useful by adding a 4-register operand form. So we just need to paper over the lack of scalar forms of this instruction, complicate the code to choose float or double forms, and use blendv on scalars since all FP is in xmm registers anyway. This gives us an approximately 50% speed up for a blendv microbenchmark sequence on SandyBridge and Haswell: blendv : 29.73 cycles/iter logic : 43.15 cycles/iter No new test cases with this patch because: 1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel 2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX 3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants. http://llvm.org/bugs/show_bug.cgi?id=22483 Differential Revision: http://reviews.llvm.org/D8063 llvm-svn: 231408
* [X86][FastISel] Simplify the logic in method X86SelectSIToFP.Andrea Di Biagio2015-03-041-21/+13
| | | | | | | | | | | | | | | | The target-independent selection algorithm in FastISel already knows how to select a SINT_TO_FP if the target is SSE but not AVX. On targets that have SSE but not AVX, the tablegen'd 'fastEmit' functions for ISD::SINT_TO_FP know how to select instruction X86::CVTSI2SSrr (for an i32 to f32 conversion) and X86::CVTSI2SDrr (for an i32 to f64 conversion). This patch simplifies the logic in method X86SelectSIToFP knowing that the code would not be reachable if the subtarget doesn't have AVX. No functional change intended. llvm-svn: 231243
* CodeGen: convert CCState interface to using ArrayRefsTim Northover2015-02-211-1/+1
| | | | | | | | | | | Everyone except R600 was manually passing the length of a static array at each callsite, calculated in a variety of interesting ways. Far easier to let ArrayRef handle that. There should be no functional change, but out of tree targets may have to tweak their calls as with these examples. llvm-svn: 230118
* [X86][FastIsel] Teach how to select float-half conversion intrinsics.Andrea Di Biagio2015-02-201-0/+62
| | | | | | | | | | | This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and intrinsic 'convert_to_fp16'. If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion, and VCVTPH2PSrr for a half-float conversion. Differential Revision: http://reviews.llvm.org/D7673 llvm-svn: 230043
* [X86][FastIsel] Teach how to select scalar integer to float/double conversions.Andrea Di Biagio2015-02-171-0/+48
| | | | | | | | | | | | This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to float conversion, and how to select a (V)CVTSI2SDrr for an integer to double conversion. Added test 'fast-isel-int-float-conversion.ll'. Differential Revision: http://reviews.llvm.org/D7698 llvm-svn: 229589
* X86: @llvm.frameaddress should defer to SelectionDAG for Win CFIDavid Majnemer2015-02-101-2/+7
| | | | llvm-svn: 228754
* [X86][FastIsel] Avoid introducing legacy SSE instructions if the target has AVX.Andrea Di Biagio2015-02-101-28/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch teaches X86FastISel how to select AVX instructions for scalar float/double convert operations. Before this patch, X86FastISel always selected legacy SSE instructions for FPExt (from float to double) and FPTrunc (from double to float). For example: \code define double @foo(float %f) { %conv = fpext float %f to double ret double %conv } \end code Before (with -mattr=+avx -fast-isel) X86FastIsel selected a CVTSS2SDrr which is legacy SSE: cvtss2sd %xmm0, %xmm0 With this patch, X86FastIsel selects a VCVTSS2SDrr instead: vcvtss2sd %xmm0, %xmm0, %xmm0 Added test fast-isel-fptrunc-fpext.ll to check both the register-register and the register-memory float/double conversion variants. Differential Revision: http://reviews.llvm.org/D7438 llvm-svn: 228682
* Migrate to using the subtarget on the machine function and updateEric Christopher2015-02-021-7/+5
| | | | | | all uses. llvm-svn: 227891
* [X86] Convert esp-relative movs of function arguments to pushes, step 2Michael Kuperstein2015-02-011-1/+1
| | | | | | | | | | | | | | This moves the transformation introduced in r223757 into a separate MI pass. This allows it to cover many more cases (not only cases where there must be a reserved call frame), and perform rudimentary call folding. It still doesn't have a heuristic, so it is enabled only for optsize/minsize, with stack alignment <= 8, where it ought to be a fairly clear win. (Re-commit of r227728) Differential Revision: http://reviews.llvm.org/D6789 llvm-svn: 227752
* Revert r227728 due to bad line endings.Michael Kuperstein2015-02-011-3358/+3358
| | | | llvm-svn: 227746
* [X86] Convert esp-relative movs of function arguments to pushes, step 2Michael Kuperstein2015-02-011-3358/+3358
| | | | | | | | | | | | This moves the transformation introduced in r223757 into a separate MI pass. This allows it to cover many more cases (not only cases where there must be a reserved call frame), and perform rudimentary call folding. It still doesn't have a heuristic, so it is enabled only for optsize/minsize, with stack alignment <= 8, where it ought to be a fairly clear win. Differential Revision: http://reviews.llvm.org/D6789 llvm-svn: 227728
* DebugInfo: Teach Fast ISel to respect the debug location of comparisons in jumpsDavid Blaikie2015-01-291-9/+9
| | | | | | | | | The use of the DbgLoc in FastISel is probably something we should fix. It's prone to leaking the wrong location into instructions - we should have a clear chain of custody from the debug location of an IR Instruction to that of a MachineInstr to avoid such leakage. llvm-svn: 227481
* Move DataLayout back to the TargetMachine from TargetSubtargetInfoEric Christopher2015-01-261-2/+2
| | | | | | | | | | | | | | | | | | | derived classes. Since global data alignment, layout, and mangling is often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be layed out and mangled consistently if the subtarget changes on a per function basis. Prior to this all targets(*) have had subtarget dependent code moved out and onto the TargetMachine. *One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget dependent features. llvm-svn: 227113
* [x32] Fast ISel should use LEA64_32r instead of LEA32r to adjust addresses ↵Michael Kuperstein2015-01-211-2/+8
| | | | | | in x32 mode. llvm-svn: 226661
* [X86] Make isel select the 2-byte register form of INC/DEC even in ↵Craig Topper2015-01-061-7/+4
| | | | | | | | non-64-bit mode. Convert to the 1-byte form in non-64-bit mode as part of MCInst lowering. Overall this seems simpler. It reduces duplication of patterns between both modes and it simplifies the memory folding/unfolding tables as they don't need to create fake instructions just to keep track of 64-bitness. llvm-svn: 225252
* [X86] Make isel select the shorter form of jump instructions instead of the ↵Craig Topper2015-01-061-4/+4
| | | | | | | | long form. The assembler backend will relax to the long form if necessary. This removes a swap from long form to short form in the MCInstLowering code. Selecting the long form used to be required by the old JIT. llvm-svn: 225242
* [X86][ISel] Fix a regression I introduced in r224884Keno Fischer2014-12-281-3/+3
| | | | | | | | | | | | | The else case ResultReg was not checked for validity. To my surprise, this case was not hit in any of the existing test cases. This includes a new test cases that tests this path. Also drop the `target triple` declaration from the original test as suggested by H.J. Lu, because apparently with it the test won't be run on Linux llvm-svn: 224901
* [FastIsel][X86] Fix invalid register replacement for bool argsKeno Fischer2014-12-271-28/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Consider the following IR: %3 = load i8* undef %4 = trunc i8 %3 to i1 %5 = call %jl_value_t.0* @foo(..., i1 %4, ...) ret %jl_value_t.0* %5 Bools (that are the result of direct truncs) are lowered as whatever the argument to the trunc was and a "and 1", causing the part of the MBB responsible for this argument to look something like this: %vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7 Later, when the load is lowered, it will insert %vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14 but remember to (at the end of isel) replace vreg7 by vreg15. Now for the bug. In fast isel lowering, we mistakenly mark vreg8 as the result of the load instead of the trunc. This adds a fixup to have vreg8 replaced by whatever the result of the load is as well, so we end up with %vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15 which is an SSA violation and causes problems later down the road. This fixes PR21557. Test Plan: Test test case from PR21557 is added to the test suite. Reviewers: ributzka Reviewed By: ributzka Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6245 llvm-svn: 224884
* Use 32-bit ebp for NaCl64 in a limited case: llvm.frameaddress.Jan Wen Voung2014-12-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Follow up to [x32] "Use ebp/esp as frame and stack pointer": http://reviews.llvm.org/D4617 In that earlier patch, NaCl64 was made to always use rbp. That's needed for most cases because rbp should hold a full 64-bit address within the NaCl sandbox so that load/stores off of rbp don't require sandbox adjustment (zeroing the top 32-bits, then filling those by adding r15). However, llvm.frameaddress returns a pointer and pointers are 32-bit for NaCl64. In this case, use ebp instead, which will make the register copy type check. A similar mechanism may be needed for llvm.eh.return, but is not added in this change. Test Plan: test/CodeGen/X86/frameaddr.ll Reviewers: dschuff, nadav Subscribers: jfb, llvm-commits Differential Revision: http://reviews.llvm.org/D6514 llvm-svn: 223510
* [X86] Clean up whitespace as well as minor coding styleMichael Liao2014-12-041-21/+20
| | | | llvm-svn: 223339
* Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *'Craig Topper2014-11-211-3/+2
| | | | llvm-svn: 222509
* [x86 fast-isel] Materialize allocas with the correct-sized lea for ILP32Derek Schuff2014-11-051-1/+1
| | | | | | | | | | Summary: X86FastISel::fastMaterializeAlloca was incorrectly conditioning its opcode selection on subtarget bitness rather than pointer size. Differential Revision: http://reviews.llvm.org/D6136 llvm-svn: 221386
* [X86] Memory folding for commutative instructions (updated)Simon Pilgrim2014-10-201-1/+2
| | | | | | | | | | | | This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt. Added additional regression test case provided by Joerg Sonnenberger. Differential Revision: http://reviews.llvm.org/D5818 llvm-svn: 220239
* Revert r219584, "[X86] Memory folding for commutative instructions."NAKAMURA Takumi2014-10-131-1/+1
| | | | | | It broke i686 selfhosting. llvm-svn: 219595
* [X86] Memory folding for commutative instructions.Simon Pilgrim2014-10-121-1/+1
| | | | | | | | | | This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too. Differential Revision: http://reviews.llvm.org/D5701 llvm-svn: 219584
* Move the complex address expression out of DIVariable and into an extraAdrian Prantl2014-10-011-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787
* Revert r218778 while investigating buldbot breakage.Adrian Prantl2014-10-011-4/+2
| | | | | | "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782
* Move the complex address expression out of DIVariable and into an extraAdrian Prantl2014-10-011-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778
* Add llvm_unreachables() for [ASZ]ExtUpper to X86FastISel.cpp to appease the ↵Daniel Sanders2014-09-251-0/+3
| | | | | | buildbots. llvm-svn: 218452
* [FastISel] Move optimizeCmpPredicate to FastISel base class. NFC.Juergen Ributzka2014-09-151-40/+0
| | | | | | Make the optimizeCmpPredicate function available to all targets. llvm-svn: 217822
* [FastISel][tblgen] Rename tblgen generated FastISel functions. NFC.Juergen Ributzka2014-09-031-25/+25
| | | | | | | | | | This is the final round of renaming. This changes tblgen to emit lower-case function names for FastEmitInst_* and FastEmit_*, and updates all its uses in the source code. Reviewed by Eric llvm-svn: 217075
* [FastISel] Rename public visible FastISel functions. NFC.Juergen Ributzka2014-09-031-49/+49
| | | | | | | | | | | | | | | | | | | | | This commit renames the following public FastISel functions: LowerArguments -> lowerArguments SelectInstruction -> selectInstruction TargetSelectInstruction -> fastSelectInstruction FastLowerArguments -> fastLowerArguments FastLowerCall -> fastLowerCall FastLowerIntrinsicCall -> fastLowerIntrinsicCall FastEmitZExtFromI1 -> fastEmitZExtFromI1 FastEmitBranch -> fastEmitBranch UpdateValueMap -> updateValueMap TargetMaterializeConstant -> fastMaterializeConstant TargetMaterializeAlloca -> fastMaterializeAlloca TargetMaterializeFloatZero -> fastMaterializeFloatZero LowerCallTo -> lowerCallTo Reviewed by Eric llvm-svn: 217074
* Reapply [FastISel][X86] Add large code model support for materializing ↵Juergen Ributzka2014-08-191-1/+17
| | | | | | | | | | | | | | | | | | floating-point constants (r215595). Note: This was originally reverted to track down a buildbot error. Reapply without any modifications. Original commit message: In the large code model for X86 floating-point constants are placed in the constant pool and materialized by loading from it. Since the constant pool could be far away, a PC relative load might not work. Therefore we first materialize the address of the constant pool with a movabsq and then load from there the floating-point value. Fixes <rdar://problem/17674628>. llvm-svn: 216012
* Reapply [FastISel][X86] Use XOR to materialize the "0" value (r215594).Juergen Ributzka2014-08-191-0/+23
| | | | | | | Note: This was originally reverted to track down a buildbot error. Reapply without any modifications. llvm-svn: 216011
* Reapply [FastISel][X86] Emit more efficient instructions for integer ↵Juergen Ributzka2014-08-191-1/+28
| | | | | | | | | | | | | | | | | constant materialization (r215593). Note: This was originally reverted to track down a buildbot error. Reapply without any modifications. Original commit message: This mostly affects the i64 value type, which always resulted in an 15byte mobavsq instruction to materialize any constant. The custom code checks the value of the immediate and tries to use a different and smaller mov instruction when possible. This fixes <rdar://problem/17420988>. llvm-svn: 216010
* Revert several FastISel commits to track down a buildbot error.Juergen Ributzka2014-08-141-68/+2
| | | | | | | | | | | | This reverts: r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants." r215594 "[FastISel][X86] Use XOR to materialize the "0" value." r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization." r215591 "[FastISel][AArch64] Make use of the zero register when possible." r215588 "[FastISel] Let the target decide first if it wants to materialize a constant." r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI." llvm-svn: 215673
* [FastISel][X86] Add large code model support for materializing ↵Juergen Ributzka2014-08-131-1/+17
| | | | | | | | | | | | | | floating-point constants. In the large code model for X86 floating-point constants are placed in the constant pool and materialized by loading from it. Since the constant pool could be far away, a PC relative load might not work. Therefore we first materialize the address of the constant pool with a movabsq and then load from there the floating-point value. Fixes <rdar://problem/17674628>. llvm-svn: 215595
* [FastISel][X86] Use XOR to materialize the "0" value.Juergen Ributzka2014-08-131-0/+23
| | | | llvm-svn: 215594
* [FastISel][X86] Emit more efficient instructions for integer constant ↵Juergen Ributzka2014-08-131-1/+28
| | | | | | | | | | | | | materialization. This mostly affects the i64 value type, which always resulted in an 15byte mobavsq instruction to materialize any constant. The custom code checks the value of the immediate and tries to use a different and smaller mov instruction when possible. This fixes <rdar://problem/17420988>. llvm-svn: 215593
* [FastISel][X86] Refactor constant materialization. NFCI.Juergen Ributzka2014-08-131-54/+67
| | | | | | | Split the constant materialization code into three separate helper functions for Integer-, Floating-Point-, and GlobalValue-Constants. llvm-svn: 215586
* [FastISel][X86] Silence -Wenum-compare warningRui Ueyama2014-08-081-2/+6
| | | | llvm-svn: 215253
* [FastISel][X86] Fix INC/DEC optimization (r215230)Juergen Ributzka2014-08-081-1/+1
| | | | | | | | I accidentally also used INC/DEC for unsigned arithmetic which doesn't work, because INC/DEC don't set the required flag which is used for the overflow check. llvm-svn: 215237
* [FastISel][X86] Use INC/DEC when possible for {sadd|ssub}.with.overflow ↵Juergen Ributzka2014-08-081-5/+24
| | | | | | | | | | intrinsics. This is a small peephole optimization to emit INC/DEC when possible. Fixes <rdar://problem/17952308>. llvm-svn: 215230
* Remove the target machine from CCState. Previously it was only usedEric Christopher2014-08-061-5/+3
| | | | | | | | | to get the subtarget and that's accessible from the MachineFunction now. This helps clear the way for smaller changes where we getting a subtarget will require passing in a MachineFunction/Function as well. llvm-svn: 214988
* Remove the TargetMachine forwards for TargetSubtargetInfo basedEric Christopher2014-08-041-5/+5
| | | | | | information and update all callers. No functional change. llvm-svn: 214781
* [X86] Simplify X87 stackifier pass.Akira Hatanaka2014-08-011-34/+28
| | | | | | | | | | | | | | | | | | | Stop using ST registers for function returns and inline-asm instructions and use FP registers instead. This allows removing a large amount of code in the stackifier pass that was needed to track register liveness and handle copies between ST and FP registers and function calls returning floating point values. It also fixes a bug which manifests when an ST register defined by an inline-asm instruction was live across another inline-asm instruction, as shown in the following sequence of machine instructions: 1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5> 2. INLINEASM <es:fldcw $0> 3. %FP0<def> = COPY %ST0 <rdar://problem/16952634> llvm-svn: 214580
* [FastISel][AArch64 and X86] Don't emit stores for UNDEF arguments during ↵Juergen Ributzka2014-07-311-0/+5
| | | | | | | | | | function call lowering. UNDEF arguments are not ment to be touched - especially for the webkit_js calling convention. This fix reproduces the already existing behavior of SelectionDAG in FastISel. llvm-svn: 214366
* [FastISel] Move the helper function isCommutativeIntrinsic into FastISel ↵Juergen Ributzka2014-07-301-12/+0
| | | | | | | | | base class. Move the helper function isCommutativeIntrinsic into the FastISel base class, so it can be used by more than just one backend. llvm-svn: 214347
OpenPOWER on IntegriCloud