path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
...
* [X86] Allow v2i32->v2f32 strict and non-strict uint_to_fp to be widened to v4i32->v4f32 under avx512. | Craig Topper | 2019-12-27 | 1 | -1/+1
  With avx512vl we get v4i32->v4f32 uint_to_fp instructions. With avx512f we get v16i32->v16f32 instructions which we can use to emulate v4i32->v4f32.
* [X86] Custom widen v2i32->v2f32 strict_sint_to_fp to avoid scalarization. | Craig Topper | 2019-12-27 | 1 | -3/+19
* Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821 | Fangrui Song | 2019-12-27 | 1 | -13/+0
  Intrinsic has incorrect argument type! i32 (i32*)* @llvm.setjmp *wipes tear*
* [X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without avx512vl. Similar for vXi64 on avx512dq without avx512vl. | Craig Topper | 2019-12-26 | 3 | -64/+92
  Summary: Previously we did this with isel patterns that used garbage in the widened part of the source. But that's not valid for strictfp. So now we custom widen and use zeroes for the widened elements for strictfp. This replaces D71864.
  Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3
  Reviewed By: pengfei
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71879
* [X86] Custom widen strict v2f32->v2i32 by padding with zeroes. | Craig Topper | 2019-12-26 | 1 | -0/+12
  For non-strict, generic type legalization will take care of this, but that doesn't happen currently for strict nodes.
* [X86] Fix -Wmisleading-indentation after D71892 | Fangrui Song | 2019-12-26 | 1 | -0/+1
* [X86][FPEnv] Promote some float strictfp operations to double on i686-pc-windows-msvc to match what we do for non-strict. | Craig Topper | 2019-12-26 | 1 | -2/+9
  The float libcalls are inlined in MSVC's math header where they just cast to double and use the double libcall. Do the same when we emit libcalls.
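The promotion described in this message can be pictured with a small scalar sketch. The helper name `sinfViaDouble` is hypothetical and this is not the code the backend emits; it only mirrors the cast-to-double, call, cast-back shape that the message attributes to MSVC's math header.

```cpp
#include <cmath>

// Hypothetical helper (not from the commit): computes a float sine by
// widening to double, calling the double routine, and narrowing the result.
float sinfViaDouble(float x) {
  const double wide = static_cast<double>(x);  // float -> double is exact
  const double result = std::sin(wide);        // double-precision library call
  return static_cast<float>(result);           // round back to float
}
```

The same shape would apply to any float math routine that has a double counterpart.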
* [X86] Add custom legalization for strict_uint_to_fp v2i32->v2f32. | Craig Topper | 2019-12-26 | 1 | -7/+16
  I believe the algorithm we use for non-strict is exception safe for strict. The fsub won't generate any exceptions. After it we will have an exact version of the i32 integer in a double. Then we just round it to f32. That rounding will generate a precision exception if it can't be represented exactly.
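A scalar sketch of one lane may make the exception argument easier to follow. The message only names the fsub and the final rounding; the 2^52 bias constant and the helper name `uintToFloatViaDouble` are assumptions made for illustration, not details taken from the commit.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of one vector lane; not the backend's code.
float uintToFloatViaDouble(std::uint32_t x) {
  // Assumption: the bias is 2^52, whose IEEE-754 double bit pattern is
  // 0x4330000000000000. ORing a 32-bit integer into the low mantissa bits
  // yields the double 2^52 + x exactly.
  const std::uint64_t biasBits = 0x4330000000000000ULL;
  const std::uint64_t biasedBits = biasBits | static_cast<std::uint64_t>(x);

  double bias = 0.0;
  double biased = 0.0;
  std::memcpy(&bias, &biasBits, sizeof bias);
  std::memcpy(&biased, &biasedBits, sizeof biased);

  // The subtraction is exact, so it cannot signal inexact or overflow.
  const double exact = biased - bias;

  // Only this narrowing can be inexact (when x needs more than 24 bits).
  return static_cast<float>(exact);
}
```

Under the default rounding mode, an input such as 16777217 (2^24 + 1) is the kind of value for which the final narrowing would be inexact.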
* add custom operation for strict fpextend/fpround | Liu, Chen3 | 2019-12-27 | 5 | -20/+57
  Differential Revision: https://reviews.llvm.org/D71892
* Remove SrcVT only used in an assert and propagate query. | Eric Christopher | 2019-12-26 | 1 | -2/+2
* [X86] Custom widen 128/256-bit vXi32 uint_to_fp on avx512f targets without avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl. | Craig Topper | 2019-12-26 | 2 | -62/+106
  Previously we widened these through isel patterns, but that didn't work for STRICT_ nodes. Those need to be padded with zeroes in the upper bits which is harder to do in isel patterns.
* [X86] Add custom widening for v2i32->v2f64 strict_uint_to_fp with AVX512F, but not AVX512VL. | Craig Topper | 2019-12-26 | 2 | -10/+23
  Previously we were widening with isel patterns, but that wasn't exception safe for strict FP. So now we widen to v4i32->v4f64 during type legalization. And then let op legalization further widen to v8i32->v8f64.
  The vec_int_to_fp.ll changes are caused by us no longer narrowing extracts of strict_uint_to_fp to the v4i32->v2f64 instruction without AVX512VL only to have isel re-widen it. Now we just keep it wide throughout. So we don't have an opportunity to narrow the load.
* [X86] Add custom widening for v2f64->v2i32 strict_fp_to_uint with avx512f, but not avx512vl. | Craig Topper | 2019-12-26 | 1 | -6/+15
  AVX512F added instructions for vector fp_to_uint conversions. With AVX512VL we can use a specific instruction that does v2f64->v4i32 with zeroes in the 2 extra elements. For non-strict nodes without AVX512VL we relied on type legalization to turn it to v4f64->v4i32 which would later be widened by op legalization to v8f64->v8i32. But type legalization doesn't currently widen strict nodes since it doesn't know how to safely and efficiently pad the extra elements. But for X86 we know padding with zeroes is safe and efficient so do that ourselves.
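The zero-padding idea can be modelled in scalar code. This is only a sketch: the function name and the use of `std::array` as a stand-in for a 512-bit vector are assumptions for illustration, and real hardware lanes convert in parallel rather than in a loop.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model: a v2f64 -> v2i32 unsigned conversion performed
// through an 8-lane operation, as if only the 512-bit form were available.
std::array<std::uint32_t, 2> convertV2F64ToV2U32(double a, double b) {
  // Value-initialisation zeroes every lane, so lanes 2..7 hold 0.0 and
  // convert cleanly instead of holding leftover data.
  std::array<double, 8> wide{};
  wide[0] = a;
  wide[1] = b;

  std::array<std::uint32_t, 8> converted{};
  for (std::size_t i = 0; i < wide.size(); ++i) {
    // Truncating conversion; inputs must lie in [0, 2^32) to be well defined.
    converted[i] = static_cast<std::uint32_t>(wide[i]);
  }

  // Only the two original lanes are kept.
  std::array<std::uint32_t, 2> result{{converted[0], converted[1]}};
  return result;
}
```

With leftover data in the extra lanes (for example a NaN), a strict conversion of those lanes could raise an exception the source program never asked for, which is the problem zero padding avoids.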
* [X86] Merge the SINT_TO_FP/UINT_TO_FP handlers in ReplaceNodeResults since the AVX512DQ+AVX512VL code is very similar in both. NFC | Craig Topper | 2019-12-26 | 1 | -23/+11
* [X86] Add custom lowering for v2i64->v2f32 strict_sint_to_fp/strict_uint_to_fp for avx512dq+avx512vl targets. | Craig Topper | 2019-12-26 | 1 | -8/+32
  With avx512dq+avx512vl we have an instruction that implements this and places zeroes in the upper 64 bits of the destination xmm register.
* [X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend | Wang, Pengfei | 2019-12-26 | 5 | -46/+141
  Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend
  Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor
  Subscribers: hiraditya, llvm-commits, LuoYuanke
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71871
* [X86] Use zero vector to extend to 512-bits for strict_fp_to_uint v2i1->v2f64 on targets with AVX512F, but not AVX512VL. | Craig Topper | 2019-12-25 | 1 | -3/+7
  In the worst case, this requires a 128-bit move instruction to implicitly zero the upper bits. In the common case, we should recognize the producing instruction already zeroed the upper bits.
* [X86FixupSetCC] Remember the preceding eflags defining instruction while we're scanning the basic block instead of looking back for it. | Craig Topper | 2019-12-25 | 1 | -27/+5
  Summary: We're already scanning forward through the basic block. Might as well just remember eflags defs instead of doing a bounded search backwards later. Based on a comment in D71841.
  Reviewers: RKSimon, spatel, uweigand
  Reviewed By: uweigand
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71865
* [X86] Merge together some common code in LowerFP_TO_INT now that we have STRICT_CVTTP2SI/STRICT_CVTTP2UI nodes. NFC | Craig Topper | 2019-12-25 | 1 | -17/+11
* Add missing strict_fp_to_int | Liu, Chen3 | 2019-12-25 | 1 | -0/+3
  Differential Revision: https://reviews.llvm.org/D71867
* [X86FixupSetCC] Use MachineInstr::readRegister/definesRegister to check for EFLAGS use/def instead of our own custom operand scan. NFCI | Craig Topper | 2019-12-24 | 1 | -15/+3
* [WinEH] Delete addFnAttr("no-frame-pointer-elim") which seems no longer needed | Fangrui Song | 2019-12-24 | 1 | -5/+0
  It was added in rL238619.
  Reviewed By: rnk
  Differential Revision: https://reviews.llvm.org/D71862
* [X86] Use 128-bit vector instructions for f32/f64->i64 conversions on 32-bit targets with avx512dq and avx512vl instructions. | Craig Topper | 2019-12-24 | 1 | -7/+14
  On 32-bit targets we can't use the scalar instruction so we insert the scalar into a vector and use packed conversions. Previously we used either v4f32->v4i64 or v4f64->v4i64 to avoid some complexity creating target specific ISD opcodes for v4f32->v2i64. But this causes extra vzeroupper instructions and possibly frequency throttling on Intel CPUs. This patch changes this to create a 128-bit vector and uses a target specific ISD opcode if needed.
* [X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP. | Craig Topper | 2019-12-24 | 6 | -165/+182
  Differential Revision: https://reviews.llvm.org/D71850
* [FPEnv][X86] More strict int <-> FP conversion fixes | Ulrich Weigand | 2019-12-23 | 4 | -92/+90
  Fix several additional problems with the int <-> FP conversion logic both in common code and in the X86 target. In particular:
  - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target.
  - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP that ends up not chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead.
  - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, including [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT.
  - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.
  Reviewed by: craig.topper
  Differential Revision: https://reviews.llvm.org/D71840
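For readers unfamiliar with why an unsigned conversion needs a compare at all, here is a scalar sketch of the general shape of such an expansion, assuming a 64-bit destination and only a signed conversion primitive. The function name and constants are illustrative assumptions; the actual SelectionDAG code is not reproduced here.

```cpp
#include <cstdint>

// Hypothetical scalar model of an FP -> unsigned expansion built from a signed
// conversion. Well defined only for inputs in [0, 2^64).
std::uint64_t fpToUint64ViaSigned(double x) {
  const double two63 = 9223372036854775808.0;  // 2^63, exactly representable

  // This is the floating-point compare the message refers to. In strict mode
  // it can itself raise an exception (for example on NaN), so it must be
  // emitted as a strict compare.
  const bool fitsSigned = x < two63;

  // Select the offsets first, then perform a single signed conversion, so no
  // conversion is executed whose result is later discarded.
  const double fltOffset = fitsSigned ? 0.0 : two63;
  const std::uint64_t intOffset = fitsSigned ? 0ULL : 0x8000000000000000ULL;

  const std::int64_t asSigned = static_cast<std::int64_t>(x - fltOffset);
  return static_cast<std::uint64_t>(asSigned) ^ intOffset;
}
```

Selecting between offsets rather than between two converted results is the same idea the message applies to STRICT_UINT_TO_FP: only one conversion runs, so only its exceptions can be observed.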
* [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support | Sanjay Patel | 2019-12-23 | 1 | -26/+0
  This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine.
  Earlier patches that dealt with regressions initially exposed by this patch: rG5e5e99c041e4 rG0b38af89e2c0
  Patch by: @RKSimon (Simon Pilgrim)
  Differential Revision: https://reviews.llvm.org/D63815
* [NFC] Style cleanups | Shengchen Kan | 2019-12-23 | 1 | -22/+23
  1. Remove duplicate function or class name at the beginning of the comment.
  2. Use auto where the type is already obvious from the context.
* [ms] [X86] Use "P" modifier on operands to call instructions in inline X86 assembly. | Eric Astor | 2019-12-22 | 4 | -13/+41
  Summary: This is documented as the appropriate template modifier for call operands. Fixes PR44272, and adds a regression test. Also adds support for operand modifiers in Intel-style inline assembly.
  Reviewers: rnk
  Reviewed By: rnk
  Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D71677
* More style cleanups following rG14fc20ca6282 [NFC] | Philip Reames | 2019-12-20 | 1 | -34/+28
  Demote member functions to static functions where possible.
  Use early continue/early return to reduce nesting.
  Clarify comments slightly.
  Reuse previously defined expression in one case.
* Fix a memory leak introduced w/the instruction padding support in rG14fc20ca6282 | Philip Reames | 2019-12-20 | 1 | -6/+6
  Should have caught this in review, but only noticed when addressing post commit style items. We were creating a new instance of the X86MCInstrInfo class, and then never reclaiming the memory. This wasn't even conditional on the new off by default flags, so it was an unconditional leak.
* Align branches within 32-Byte boundary (NOP padding) | Philip Reames | 2019-12-20 | 1 | -1/+286
  WARNING: If you're looking at this patch because you're looking for a full performance mitigation of the Intel JCC Erratum, this is not it!

  This is a preliminary patch on the path towards mitigating the performance regressions caused by Intel's microcode update for Jump Conditional Code Erratum. For context, see: https://www.intel.com/content/www/us/en/support/articles/000055650.html

  The patch adds the required assembler infrastructure and command line options needed to exercise the logic for INTERNAL TESTING. These are NOT public flags, and should not be used for anything other than LLVM's own testing/debugging purposes. They are likely to change both in spelling and meaning.

  WARNING: This patch is knowingly incorrect in some corner cases. We need, and do not yet provide, a mechanism to selectively enable/disable the padding. Conversation on this will continue in parallel with work on extending this infrastructure to support prefix padding.

  The goal here is to have the assembler align specific instructions such that they neither cross nor end at a 32 byte boundary. The impacted instructions are:
  a. Conditional jump.
  b. Fused conditional jump.
  c. Unconditional jump.
  d. Indirect jump.
  e. Ret.
  f. Call.

  The new options for llvm-mc are:
  -x86-align-branch-boundary=NUM aligns branches within NUM byte boundary.
  -x86-align-branch=TYPE[+TYPE...] specifies types of branches to align.

  A new MCFragment type, MCBoundaryAlignFragment, is added, which may emit NOP to align the fused/unfused branch. alignBranchesBegin inserts MCBoundaryAlignFragment before instructions, alignBranchesEnd marks the end of the branch to be aligned, relaxBoundaryAlign grows or shrinks sizes of NOP to align the target branch.

  Nop padding is disabled when the instruction may be rewritten by the linker, such as TLS Call.

  Process Note: I am landing a patch by skan as it has been LGTMed, and continuing to iterate on the review is simply slowing us down at this point. We can and will continue to iterate in tree.

  Patch By: skan
  Differential Revision: https://reviews.llvm.org/D70157
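The stated goal (an instruction should neither cross nor end at a boundary) reduces to a small piece of arithmetic. The sketch below is an assumption-laden illustration: `nopPaddingFor` is a hypothetical helper, it treats a single instruction in isolation, and it ignores the iterative relaxation and fused-pair handling that the message attributes to MCBoundaryAlignFragment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of NOP bytes to place before an instruction of
// `size` bytes at `offset` so that it neither crosses nor ends at a multiple
// of `boundary`.
std::uint64_t nopPaddingFor(std::uint64_t offset, std::uint64_t size,
                            std::uint64_t boundary) {
  assert(boundary != 0 && (boundary & (boundary - 1)) == 0 &&
         "boundary must be a power of two");
  assert(size > 0 && size < boundary &&
         "instruction must be able to fit strictly inside one window");

  const std::uint64_t end = offset + size;  // one past the last byte

  // If the start and the one-past-the-end address fall in the same window,
  // the instruction neither crosses a boundary nor ends exactly on one.
  if (offset / boundary == end / boundary)
    return 0;

  // Otherwise push the instruction to the start of the next window.
  return boundary - (offset % boundary);
}
```

For a 32-byte boundary, a 4-byte branch at offset 28 ends exactly at 32 and would need 4 bytes of padding under this model, while a 5-byte branch at offset 26 ends at 31 and needs none.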
* [X86] Fix a KNL miscompile caused by combineSetCC swapping LHS/RHS variables before a later use. | Craig Topper | 2019-12-20 | 1 | -19/+23
  The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code. A later piece of code swaps the operands and changes the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands.

  To mitigate this, this patch does a couple of things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes is enough to fix the issue, but doing both makes this code very safe.

  I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that.

  Differential Revision: https://reviews.llvm.org/D71736
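The bug class described here is easy to reproduce in miniature: swapping comparison operands is only valid if the condition code is swapped with them. The enum and function names below are hypothetical stand-ins and not LLVM's types; the last function sketches the "const inputs, temporary copies" discipline the message describes.

```cpp
// Hypothetical miniature of a condition code; not LLVM's ISD::CondCode.
enum class CondCode { LT, LE, GT, GE, EQ, NE };

// Condition code that gives the same answer once the operands are exchanged.
CondCode swappedOperands(CondCode cc) {
  switch (cc) {
  case CondCode::LT: return CondCode::GT;
  case CondCode::LE: return CondCode::GE;
  case CondCode::GT: return CondCode::LT;
  case CondCode::GE: return CondCode::LE;
  case CondCode::EQ: return CondCode::EQ;
  case CondCode::NE: return CondCode::NE;
  }
  return cc;  // not reached for valid enumerators
}

bool evaluate(CondCode cc, int lhs, int rhs) {
  switch (cc) {
  case CondCode::LT: return lhs < rhs;
  case CondCode::LE: return lhs <= rhs;
  case CondCode::GT: return lhs > rhs;
  case CondCode::GE: return lhs >= rhs;
  case CondCode::EQ: return lhs == rhs;
  case CondCode::NE: return lhs != rhs;
  }
  return false;  // not reached for valid enumerators
}

// The inputs are const, so a canonicalisation cannot silently change them for
// later code; the swap happens on temporaries that travel together.
bool compareViaSwappedCopies(const int lhs, const int rhs, const CondCode cc) {
  const int tmpLhs = rhs;
  const int tmpRhs = lhs;
  const CondCode tmpCC = swappedOperands(cc);
  return evaluate(tmpCC, tmpLhs, tmpRhs);
}
```

Evaluating `LT` on (2, 1) instead of (1, 2) without also changing the code to `GT` flips the answer, which is the miscompile in miniature.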
* [X86] Make EmitCmp into a static function and explicitly return chain result for STRICT_FCMP. NFCI | Craig Topper | 2019-12-19 | 2 | -18/+19
  The only thing it's getting from the X86TargetLowering class is the subtarget which we can easily pass. This function only has one call site now so this might help the compiler inline it.
  Explicitly return both the flag result and the chain result for STRICT_FCMP nodes. This removes an assumption in the caller that getValue(1) is the right way to get the chain.
* [X86] Directly call EmitTest in two places instead of creating a null constant and calling EmitCmp. NFCI | Craig Topper | 2019-12-19 | 1 | -4/+2
  EmitCmp will just immediately call EmitTest and discard the null constant only to have EmitTest create it again if it doesn't fold. So just skip all that and go directly to EmitTest.
* [StackMaps] Be explicit about label formation [NFC] (try 2) | Philip Reames | 2019-12-19 | 1 | -3/+14
  Recommit after making the same API change in non-x86 targets. This has been built for all targets, and tested for affected ones. Why the difference? Because my disk filled up when I tried make check for all.
  For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
* Temporarily Revert "[StackMaps] Be explicit about label formation [NFC]" | Eric Christopher | 2019-12-19 | 1 | -14/+3
  as it broke the aarch64 build.
  This reverts commit bc7595d934b958ab481288d7b8e768fe5310be8f.
* [StackMaps] Be explicit about label formation [NFC] | Philip Reames | 2019-12-19 | 1 | -3/+14
  For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
* [FaultMaps] Make label formation a bit more explicit [NFC] | Philip Reames | 2019-12-19 | 1 | -1/+5
  This is in advance of assembler padding directives support where we'll need to bundle the label w/the corresponding faulting instruction to avoid padding being inserted between.
* [llvm-exegesis] Fix pfm counter names for Haswell for older versions of libpfm | Miloš Stojanović | 2019-12-19 | 1 | -8/+8
  The inconsistency caused uops mode to fail on an older version of libpfm since the dispatched_port was added as an alias for executed_port only after v4.6.0 of libpfm.
  Differential revision: https://reviews.llvm.org/D71665
* Enable STRICT_FP_TO_SINT/UINT on X86 backend | Liu, Chen3 | 2019-12-19 | 5 | -113/+196
  This patch is mainly for custom lowering the vector operation.
  Differential Revision: https://reviews.llvm.org/D71592
* [X86] Add a simple hack to IsProfitableToFold to prevent vselect+strict fp operations from being folded into masked instructions. | Craig Topper | 2019-12-18 | 1 | -0/+6
  We really need to update the isel patterns to prevent this, but that requires some tablegen de-tangling. So this hack will work for correctness in the short term.
* [NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter] | Anna Welker | 2019-12-18 | 2 | -7/+9
  Add an extra parameter so alignment can be taken into consideration in gather/scatter legalization.
  Differential Revision: https://reviews.llvm.org/D71610
* [X86] Add strict fma support | Wang, Pengfei | 2019-12-18 | 4 | -19/+26
  Summary: Add strict fma support
  Reviewers: craig.topper, RKSimon, LiuChen3
  Subscribers: hiraditya, llvm-commits, LuoYuanke
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71604
* [X86] Manually format some setOperationAction calls to line up arguments to improve readability. NFC | Craig Topper | 2019-12-17 | 1 | -8/+8
* Revert "Honor -fuse-init-array when os is not specified on x86" | Mitch Phillips | 2019-12-17 | 3 | -3/+56
  This reverts commit aa5ee8f244441a8ea103a7e0ed8b6f3e74454516.
  This change broke the sanitizer buildbots. See comments at the patchset (https://reviews.llvm.org/D71360) for more information.
* This adds constrained intrinsics for the signed and unsigned conversions | Kevin P. Neal | 2019-12-17 | 1 | -37/+137
  of integers to floating point.
  This includes some of Craig Topper's changes for promotion support from D71130.
  Differential Revision: https://reviews.llvm.org/D69275
* Fix assertion failure in getMemOperandWithOffsetWidth | Kristof Beyls | 2019-12-17 | 1 | -2/+3
  This fixes an assertion failure that triggers inside getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr that is not a memory operation.
  Different backends implement getMemOperandWithOffset differently: some return false on non-memory MachineInstrs, others assert.
  The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck relies on getMemOperandWithOffset to return false on non-memory MachineInstrs, instead of asserting.
  This patch updates the documentation on getMemOperandWithOffset to state that it should return false on any MachineInstr it cannot handle, instead of asserting. It also adapts the in-tree backends accordingly where necessary.
  Differential Revision: https://reviews.llvm.org/D71359
* Honor -fuse-init-array when os is not specified on x86 | Kamlesh Kumar | 2019-12-16 | 3 | -56/+3
  Currently the -fuse-init-array option is not effective when the target triple does not specify an os, on x86 and x86_64, i.e.
    // -fuse-init-array is not honored.
    $ clang -target i386 -fuse-init-array test.c -S
    // -fuse-init-array is honored.
    $ clang -target i386-linux -fuse-init-array test.c -S
  This patch fixes the first case and does some cleanup.
  Reviewers: rnk, craig.topper, fhahn, echristo
  Reviewed By: rnk
  Differential Revision: https://reviews.llvm.org/D71360
* [NFC] Use EVT instead of bool for getSetCCInverse() | Alex Richardson | 2019-12-13 | 1 | -5/+4
  Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project).
  In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
  Committing this change will significantly reduce our merge conflicts for each upstream merge.
  Reviewers: spatel, bogner
  Reviewed By: bogner
  Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70917
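Why the integer and floating-point inverses of a comparison differ can be shown with ordinary C++ comparisons. This is only an illustration of the underlying semantics, not of getSetCCInverse() itself; the template name is hypothetical.

```cpp
#include <limits>

// Hypothetical check: is "not (a < b)" the same predicate as "a >= b"?
// For integers it always is. For floating point it is not when either operand
// is NaN, because every ordered comparison with NaN is false. An inverter
// therefore has to know which kind of value it is dealing with.
template <typename T>
bool negatedLessEqualsGreaterEqual(T a, T b) {
  return (!(a < b)) == (a >= b);
}
```

With integer operands the two predicates always agree, while a NaN operand makes them disagree, so applying the floating-point inverse to a value that really behaves like an integer can produce a different condition than intended.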
* [IR] Split out target specific intrinsic enums into separate headers | Reid Kleckner | 2019-12-11 | 5 | -1/+6
  This has two main effects:
  - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size.
  - Incremental step towards decoupling target intrinsics.
  The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work.
  Part of PR34259
  Reviewers: efriedma, echristo, MaskRay
  Reviewed By: echristo, MaskRay
  Differential Revision: https://reviews.llvm.org/D71320