path: root/llvm/lib/Target/X86
* Reland "Change the X86 datalayout to add three address spacesAmy Huang2019-09-102-2/+18
| | | | | | | | | | for 32 bit signed, 32 bit unsigned, and 64 bit pointers." This reverts 57076d3199fc2b0af4a3736b7749dd5462cacda5. Original review at https://reviews.llvm.org/D64931. Review for added fix at https://reviews.llvm.org/D66843. llvm-svn: 371568
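A minimal sketch of what this reland adds, assuming the shape of the computeDataLayout() helper in X86TargetMachine.cpp (the prefix shown is illustrative; it varies by OS and object format). The convention is that address space 270 is a 32-bit signed pointer, 271 a 32-bit unsigned pointer, and 272 a 64-bit pointer, independent of the target's default pointer width:

```c++
// Hedged sketch: append the three new pointer address spaces to the
// X86 datalayout string.
std::string Ret = "e-m:e"; // illustrative prefix only
Ret += "-p270:32:32-p271:32:32-p272:64:64"; // signed 32, unsigned 32, 64-bit
```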
* [X86] Updated target specific selection dag code to conservatively check for isAtomic in addition to isVolatile (Philip Reames, 2019-09-10, 3 files, -20/+20)
  See D66309 for context. This is the first sweep of x86 target specific code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. Sorry for the lack of tests. As discussed in the review, most of these are vector tests (for which atomicity is not well defined) and I couldn't figure out how to exercise the anyextend cases, which aren't vector specific.
  Differential Revision: https://reviews.llvm.org/D66322 llvm-svn: 371547
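A hedged sketch of the recurring bailout shape this sweep applies: transforms that used to check only isVolatile() now also give up on atomic memory operations, so behavior stays close to NFC once atomic loads and stores start flowing through LoadSDNode/StoreSDNode:

```c++
// Illustrative pattern only, not a specific combine from the patch.
if (auto *Ld = dyn_cast<LoadSDNode>(N)) {
  if (Ld->isVolatile() || Ld->isAtomic())
    return SDValue(); // leave the node alone
  // ... transform the plain, non-atomic load ...
}
```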
* [X86] Add broadcast load unfolding support for VCMPPS/PD. (Craig Topper, 2019-09-10, 1 file, -0/+6)
  llvm-svn: 371487
* [Windows] Replace TrapUnreachable with an int3 insertion pass (Reid Kleckner, 2019-09-09, 4 files, -11/+125)
  This is an alternative to D66980, which was reverted. Instead of inserting a pseudo instruction that optionally expands to nothing, add a pass that inserts int3 when appropriate after basic block layout.
  Reviewers: hans
  Differential Revision: https://reviews.llvm.org/D67201 llvm-svn: 371466
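A rough sketch of the pass's core idea, with the surrounding pass boilerplate omitted and the exact placement heuristics assumed rather than taken from the patch: if the final block of a function ends in a call, append an int3 so a return address never points past the function's bounds:

```c++
// Hedged sketch: pad a trailing call with int3 after block layout.
MachineBasicBlock &LastMBB = MF.back();
if (!LastMBB.empty() && LastMBB.back().isCall())
  BuildMI(&LastMBB, DebugLoc(), TII->get(X86::INT3));
```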
* Introduce infrastructure for an incremental port of SelectionDAG atomic load/store handling (Philip Reames, 2019-09-09, 2 files, -0/+22)
  This is the first patch in a large sequence. The eventual goal is to have unordered atomic loads and stores - and possibly ordered atomics as well - handled through the normal ISEL codepaths for loads and stores. Today, they're handled with instances of AtomicSDNode, and as a result all transforms need to be duplicated to work for unordered atomics. The benefit of the current design is that it's harder to introduce a silent miscompile by adding a transform which forgets about atomicity. See the thread on llvm-dev titled "FYI: proposed changes to atomic load/store in SelectionDAG" for further context. Note that this patch is NFC unless the experimental flag is set. The basic strategy I plan on taking is:
  1) introduce infrastructure and a flag for testing (this patch);
  2) audit uses of isVolatile and apply isAtomic conservatively*;
  3) piecemeal, conservatively* update generic code and x86 backend code in individual reviews, with tests, for cases which didn't check volatile but can be found with inspection;
  4) flip the flag at the end (with minimal diffs);
  5) work through the todo list identified in (2) and (3), exposing performance opportunities.
  (*) The "conservative" bit here is aimed at minimizing the number of diffs involved in (4). Ideally, there'd be none. In practice, getting it down to something reviewable by a human is the actual goal. Note that there are (currently) no paths which produce LoadSDNode or StoreSDNode with atomic MMOs, so we don't need to worry about preserving any behaviour there. We've taken a very similar strategy twice before with success - once at the IR level, and once at the MI level (post ISEL).
  Differential Revision: https://reviews.llvm.org/D66309 llvm-svn: 371441
* [SelectionDAG] Remove ISD::FP_ROUND_INREG (Craig Topper, 2019-09-09, 1 file, -1/+0)
  I don't think anything in tree creates this node. So all of this code appears to be dead. Code coverage agrees: http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html
  Differential Revision: https://reviews.llvm.org/D67312 llvm-svn: 371431
* [X86] Allow _MM_FROUND_CUR_DIRECTION and _MM_FROUND_NO_EXC to be used together on instructions that only support SAE and not embedded rounding. (Craig Topper, 2019-09-09, 1 file, -2/+10)
  Currently, for SAE instructions we only allow _MM_FROUND_CUR_DIRECTION (bit 2) or _MM_FROUND_NO_EXC (bit 3) to be used as the immediate passed to the intrinsics. But these instructions don't perform rounding, so _MM_FROUND_CUR_DIRECTION is just sort of a default placeholder for when you don't want to suppress exceptions. Using _MM_FROUND_NO_EXC by itself is really bit-equivalent to (_MM_FROUND_NO_EXC | _MM_FROUND_TO_NEAREST_INT), since _MM_FROUND_TO_NEAREST_INT is 0. Since we aren't rounding on these instructions, we should also accept (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC) as equivalent to (_MM_FROUND_NO_EXC). icc allows this, but gcc does not.
  Differential Revision: https://reviews.llvm.org/D67289 llvm-svn: 371430
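An illustration (not taken from the commit) of the immediate combination this change makes clang accept on an SAE-only operation; VMAXPS performs no rounding, so CUR_DIRECTION is only a placeholder here. Requires AVX-512F:

```c++
#include <immintrin.h>

// Suppress floating point exceptions on a max, which never rounds.
__m512 max_suppress_exceptions(__m512 a, __m512 b) {
  return _mm512_max_round_ps(a, b,
                             _MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC);
}
```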
* [X86] Add broadcast load unfolding support for vpcmpeq/vpcmpgt/vpcmp/vpcmpu. (Craig Topper, 2019-09-09, 1 file, -0/+24)
  llvm-svn: 371368
* [X86] Add broadcast load unfold support for smin/umin/smax/umax. (Craig Topper, 2019-09-09, 1 file, -0/+24)
  llvm-svn: 371366
* [X86] Add broadcast load unfolding support for VMAXPS/PD and VMINPS/PD. (Craig Topper, 2019-09-09, 1 file, -0/+24)
  llvm-svn: 371363
* [X86] Use xorps to create fp128 +0.0 constants. (Craig Topper, 2019-09-09, 5 files, -4/+24)
  This matches what we do for f32/f64. gcc also does this for fp128. llvm-svn: 371357
* [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add faux shuffle support. (Simon Pilgrim, 2019-09-08, 1 file, -26/+52)
  This patch decodes target and faux shuffles with getTargetShuffleInputs - a reduced version of resolveTargetShuffleInputs that doesn't resolve SM_SentinelZero cases, so we can correctly remove zero vectors if they aren't demanded. llvm-svn: 371353
* [X86] Add a hack to combineVSelectWithAllOnesOrZeros to turn selects with two zero/undef vector inputs into an all zeroes vector. (Craig Topper, 2019-09-08, 1 file, -0/+9)
  If the two zero vectors have undefs in different places they won't get combined by simplifySelect. This fixes a regression from an earlier commit. llvm-svn: 371351
* [X86] Remove call to getZeroVector from materializeVectorConstant. Add isel patterns for zero vectors with all types. (Craig Topper, 2019-09-08, 3 files, -9/+17)
  The change to avx512-vec-cmp.ll is a regression, but should be easy to fix. It occurs because the getZeroVector call was canonicalizing both sides to the same node, and SimplifySelect was then able to simplify it. But since we only called getZeroVector on some VTs, this isn't a robust way to combine this. The change to vector-shuffle-combining-ssse3.ll is more instructions, but removes a constant pool load, so it's unclear whether it's a regression or not. llvm-svn: 371350
* [X86] X86DAGToDAGISel::combineIncDecVector(): call getSplatBuildVector() manually (Roman Lebedev, 2019-09-08, 1 file, -3/+6)
  As reported in post-commit review of r370327, there is some case where the code crashes. As discussed with Craig Topper, the problem is that getConstant() internally calls getSplatBuildVector(), so we don't insert the constant itself. If we do that manually we're good. llvm-svn: 371346
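A hedged sketch of the fix's shape: since getConstant() on a vector type goes through getSplatBuildVector() internally and the constant may not be inserted as its own node, build the splat explicitly from a scalar constant (VT and DL assumed in scope):

```c++
// Build the splatted constant BUILD_VECTOR ourselves instead of
// relying on getConstant()'s internal splat path.
SDValue Elt = DAG.getConstant(1, DL, VT.getVectorElementType());
SDValue Splat = DAG.getSplatBuildVector(VT, DL, Elt);
```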
* [X86] Use DAG.getConstant instead of getZeroVector in combinePMULDQ. (Craig Topper, 2019-09-08, 1 file, -1/+1)
  getZeroVector canonicalizes the type to vXi32, but that's a legalization action. We should use the most correct type if possible. llvm-svn: 371345
* [X86] Teach materializeVectorConstant to not call getZeroVector/getOnesVector on the types we already have isel patterns for. (Craig Topper, 2019-09-08, 1 file, -3/+3)
  llvm-svn: 371343
* [DebugInfo][X86] Describe call site values for zero-valued imms (David Stenberg, 2019-09-08, 1 file, -0/+5)
  Summary: Add zero-materializing XORs to X86's describeLoadedValue() hook in order to produce call site values. I have had to change the defs logic in collectCallSiteParameters() a bit to be able to describe the XORs. The XORs implicitly define $eflags, which would cause them to never be considered, due to a guard condition that I->getNumDefs() is one. I have changed that condition so that we now only consider instructions where a forwarded register overlaps with the instruction's single explicit define. We still need to collect the implicit defines of other forwarded registers to remove them from the work list. I'm not sure how to move towards supporting instructions with multiple explicit defines, cases where forwarded registers are implicitly defined, and/or cases where an instruction produces values for multiple forwarded registers. Perhaps the describeLoadedValue() hook should take a register argument, and we then leave it up to the hook to describe the loaded value in that register? I have not yet encountered a situation where that would be necessary, though.
  Reviewers: aprantl, vsk, djtodoro, NikolaPrica Reviewed By: vsk Subscribers: ychen, hiraditya, llvm-commits Tags: #debug-info, #llvm
  Differential Revision: https://reviews.llvm.org/D67225 llvm-svn: 371333
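A hedged sketch of the kind of case this adds to the X86 hook (the `Expr` DIExpression is assumed to be built by the surrounding code, and the exact opcode set handled by the patch isn't reproduced here): a register XOR'd with itself materializes zero, so the loaded value is describable as the immediate 0:

```c++
// Illustrative: describe "xor %reg, %reg" as the constant 0.
if (MI.getOpcode() == X86::XOR32rr &&
    MI.getOperand(1).getReg() == MI.getOperand(2).getReg())
  return ParamLoadedValue(MachineOperand::CreateImm(0), Expr);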
* [NFC] Make the describeLoadedValue() hook return machine operand objects (David Stenberg, 2019-09-08, 1 file, -1/+1)
  Summary: This changes the ParamLoadedValue pair which the describeLoadedValue() hook returns so that MachineOperand objects are returned instead of pointers. When describing call site values we may need to describe operands which are not part of the instruction. One such example is zero-materializing XORs on x86, which I have implemented support for in a child revision. Instead of having to return a pointer to an operand stored somewhere outside the instruction, start returning objects directly instead, as that simplifies the code. The MachineOperand class only holds POD members, and on x86-64 it is 32 bytes large. That, combined with copy elision, means that the overhead of returning a machine operand object from the hook does not become very large. I benchmarked this on an 8-thread i7-8650U machine with 32 GB RAM. The benchmark consisted of building a clang 8.0 binary configured with: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_USE_SANITIZER=Address -DCMAKE_CXX_FLAGS="-Xclang -femit-debug-entry-values -stdlib=libc++" The average wall clock time increased by 4 seconds, from 62:05 to 62:09, which is a 0.1% increase.
  Reviewers: aprantl, vsk, djtodoro, NikolaPrica Reviewed By: vsk Subscribers: hiraditya, ychen, llvm-commits Tags: #debug-info, #llvm
  Differential Revision: https://reviews.llvm.org/D67261 llvm-svn: 371332
* [X86][SSE] Fix out of range shift introduced in D67070/rL371328 (Simon Pilgrim, 2019-09-08, 1 file, -1/+2)
  Use APInt to create the comparison mask instead. llvm-svn: 371330
* [X86][SSE] Add support for <64 x i1> bool reduction (Simon Pilgrim, 2019-09-08, 1 file, -11/+14)
  This generalizes the existing <32 x i1> pre-AVX2 split code to support reductions from <64 x i1> as well; we can probably generalize to any larger pow2 case in the future if the (unlikely) need ever arises. We still need to tweak combineBitcastvxi1 to improve AVX512F codegen, as it assumes vXi1 types should be handled on the mask registers even when they aren't legal.
  Differential Revision: https://reviews.llvm.org/D67070 llvm-svn: 371328
* [X86] Make getZeroVector return floating point vectors in their native type on SSE2 and later. (Craig Topper, 2019-09-08, 3 files, -2/+23)
  isel used to require zero vectors to be canonicalized to a single type to minimize the number of patterns needed to match. This is no longer required. I plan to do this to integers too, but floating point was simpler to start with. Integer has a complication where v32i16/v64i8 aren't legal when the other 512-bit integer types are. llvm-svn: 371325
* [X86] Add support for unfold broadcast loads from FMA instructions. (Craig Topper, 2019-09-07, 1 file, -0/+121)
  llvm-svn: 371323
* [X86] Add prefer-128-bit subtarget feature. (Craig Topper, 2019-09-07, 4 files, -0/+10)
  Summary: Similar to the previous prefer-256-bit flag. We might want to enable this by default for some CPUs. This just starts the initial work to implement it and prove that it affects TTI's vector width.
  Reviewers: RKSimon, echristo, spatel, atdt Reviewed By: RKSimon Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67311 llvm-svn: 371319
* [X86] Avoid uses of getZExtValue(). NFCI. (Simon Pilgrim, 2019-09-07, 1 file, -22/+19)
  Use getAPIntValue() directly - this is mainly a best-practice style issue to help prevent fuzz tests from blowing up when an i12345 (or whatever) is generated. Use the getConstantOperandVal/getConstantOperandAPInt wrappers where possible. llvm-svn: 371315
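A before-and-after illustration of the style issue (variable names are mine): getZExtValue() asserts when the constant doesn't fit in 64 bits, which a fuzzed type like i12345 can trigger, while working on the APInt directly cannot:

```c++
// Before: truncates to uint64_t and asserts on wide constants.
uint64_t Old = cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();

// After: operate on the arbitrary-precision value via the wrapper.
const APInt &New = N->getConstantOperandAPInt(1);
```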
* [X86] Fix pshuflw formation from repeated shuffle mask (PR43230) (Nikita Popov, 2019-09-07, 1 file, -2/+2)
  Fix for https://bugs.llvm.org/show_bug.cgi?id=43230. When creating PSHUFLW from a repeated shuffle mask, we have to apply the checks to the repeated mask, not the original one. For the test case from PR43230 the inspected part of the original mask is all undef.
  Differential Revision: https://reviews.llvm.org/D67314 llvm-svn: 371307
* Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI. (Simon Pilgrim, 2019-09-07, 1 file, -1/+1)
  llvm-svn: 371302
* Remove dead .seh_stackalloc parsing method in X86AsmParser (Reid Kleckner, 2019-09-06, 1 file, -14/+0)
  The shared COFF asm parser code handles this directive, since it is shared with AArch64. Spotted by Alexandre Ganea in review. llvm-svn: 371251
* [X86] Use MOVSX by default instead of CBW to extend i8 to AX for i8 sdivrem. (Craig Topper, 2019-09-06, 1 file, -5/+8)
  We can use a MOVSX16 here, then rely on FixupBWInst to change it to MOVSX32 if the upper bits are dead, with a special case to not promote if it could be turned into CBW. Then we can rely on X86MCInstLower to turn the MOVSX into CBW very late if register allocation worked out. Using MOVSX gives an opportunity to use the MOVSX as both a copy and a sign extend, since the input and output register aren't tied together.
  Differential Revision: https://reviews.llvm.org/D67192 llvm-svn: 371243
* [X86] Use MOVZX16rr8/MOVZXrm8 when extending input for i8 udivrem. (Craig Topper, 2019-09-06, 1 file, -3/+3)
  We can rely on X86FixupBWInsts to turn these into MOVZX32. This simplifies a follow up commit to use MOVSX for i8 sdivrem with a late optimization to use CBW when register allocation works out. llvm-svn: 371242
* [X86] Teach FixupBWInsts to turn MOVSX16rr8/MOVZX16rr8/MOVSX16rm8/MOVZX16rm8 into their 32-bit dest equivalents when the upper part of the register is dead. (Craig Topper, 2019-09-06, 1 file, -6/+48)
  llvm-svn: 371240
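A hedged sketch of the opcode mapping the new FixupBWInsts cases perform (the surrounding dead-upper-half check and instruction rebuilding are omitted): each 16-bit-destination extend is widened to its 32-bit-destination form when the upper half of the super-register is known dead:

```c++
// Illustrative opcode remap only; liveness checks not shown.
unsigned NewOpc = 0;
switch (MI->getOpcode()) {
case X86::MOVSX16rr8: NewOpc = X86::MOVSX32rr8; break;
case X86::MOVZX16rr8: NewOpc = X86::MOVZX32rr8; break;
case X86::MOVSX16rm8: NewOpc = X86::MOVSX32rm8; break;
case X86::MOVZX16rm8: NewOpc = X86::MOVZX32rm8; break;
}
```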
* [Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment (Guillaume Chatelet, 2019-09-06, 1 file, -1/+1)
  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67267 llvm-svn: 371212
* [Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment (Guillaume Chatelet, 2019-09-06, 1 file, -4/+5)
  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67278 llvm-svn: 371210
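An illustrative call-site shape for these two setter changes (shown outside its real context, the X86ISelLowering constructor): with the Align type, "16" can no longer be misread as log2(16):

```c++
#include "llvm/Support/Alignment.h"

// Hedged sketch: the setter now takes an explicit Align value.
setPrefLoopAlignment(llvm::Align(16)); // 16 bytes, stated unambiguously
```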
* [X86] Fix bad indentation. NFC (Craig Topper, 2019-09-06, 1 file, -1/+1)
  llvm-svn: 371167
* [X86] Enable BuildSDIVPow2 for i16. (Craig Topper, 2019-09-05, 1 file, -2/+3)
  We're able to use a 32-bit ADD and CMOV here, and it should work well with our other i16->i32 promotion optimizations. llvm-svn: 371107
* [X86] Override BuildSDIVPow2 for X86. (Craig Topper, 2019-09-05, 2 files, -0/+58)
  As noted in PR43197, we can use test+add+cmov+sra to implement signed division by a power of 2. This is based off the similar version in AArch64, but I've adjusted it to use target independent nodes where AArch64 uses target specific CMP and CSEL nodes. I've also blocked INT_MIN as the transform isn't valid for that. I've limited this to i32 and i64 on 64-bit targets for now, and only when CMOV is supported. i8 and i16 need further investigation to be sure they get promoted to i32 well. I adjusted a few tests to enable cmov to demonstrate the new codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode without cmov to avoid perturbing the scenario that is being set up there.
  Differential Revision: https://reviews.llvm.org/D67087 llvm-svn: 371104
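The scalar identity behind the test+add+cmov+sra sequence, as a plain C++ illustration (not the DAG code itself): signed division by 2^k truncates toward zero, so negative inputs need a 2^k - 1 bias before the arithmetic shift, and that bias select is what lowers to test+cmov:

```c++
#include <cstdint>

// x / 8 for signed x, branch-free.
int32_t sdiv_by_8(int32_t x) {
  int32_t bias = (x < 0) ? 7 : 0; // test + cmov
  return (x + bias) >> 3;         // add + sra (arithmetic shift)
}
```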
* [x86] fix horizontal math bug exposed by improved demanded elements analysis (PR43225) (Sanjay Patel, 2019-09-05, 1 file, -5/+24)
  https://bugs.llvm.org/show_bug.cgi?id=43225 llvm-svn: 371095
* [X86] Add a FIXME about why the CWD/CDQ/CQO have a bogus implicit def of the A register. NFC (Craig Topper, 2019-09-05, 1 file, -6/+5)
  The instructions copy the sign bit of the A register to every bit of the D register. But they don't write to the A register. llvm-svn: 371094
* [X86] Fix stale comment. NFC (Craig Topper, 2019-09-05, 1 file, -2/+2)
  We aren't checking for a concat here. We're just always splitting 256-bit stores. llvm-svn: 371092
* [X86][SSE] EltsFromConsecutiveLoads - ignore non-zero offset base loads (PR43227) (Simon Pilgrim, 2019-09-05, 1 file, -0/+4)
  As discussed on D64551 and PR43227, we don't correctly handle cases where the base load has a non-zero byte offset. Until we can properly handle this, we must bail from EltsFromConsecutiveLoads. llvm-svn: 371078
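A hedged sketch of the new guard, with variable names assumed from the surrounding function rather than copied from the patch: if the base element's load does not start at byte offset zero, the combined wide load can't yet be described, so the fold gives up:

```c++
// Illustrative bail-out: don't fold when the base load is offset.
if (ByteOffsets[FirstLoadedElt] != 0)
  return SDValue();
```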
* [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align (Guillaume Chatelet, 2019-09-05, 2 files, -4/+4)
  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67223 llvm-svn: 371063
* [X86] X86SpeculativeLoadHardeningPass::canHardenRegister - fix out of bounds warning. (Simon Pilgrim, 2019-09-05, 1 file, -2/+5)
  Fixes clang static-analyzer warning. llvm-svn: 371050
* [X86] X86InstrInfo::optimizeCompareInstr - fix potential null dereference. (Simon Pilgrim, 2019-09-05, 1 file, -2/+3)
  Fixes clang static-analyzer warning. Technically the MachineInstr *Sub might still be null if we're comparing zero (IsCmpZero == true), although this probably won't happen, as SrcReg2 is probably == 0. llvm-svn: 371047
* [LLVM][Alignment] Make functions using log of alignment explicit (Guillaume Chatelet, 2019-09-05, 2 files, -3/+3)
  Summary: This patch renames functions that take or return alignment as log2; it will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power-of-two alignment. A few renames uncovered dubious assignments: `MirParser`/`MirPrinter` were expecting powers of two, but `MachineFunction` and `MachineBasicBlock` were dealing with log2(align); this patch fixes it and updates the documentation. `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power-of-two alignments; internally these values are interpreted as log2(align). This patch updates the documentation. `MachineFunction` exposes `align-all-functions`, also interpreted as a power-of-two alignment; internally this value is interpreted as log2(align). This patch updates the documentation.
  Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045
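An illustration of the two encodings the renames disambiguate: the same alignment can be stored as a byte count or as its log2, and confusing them silently changes the value:

```c++
#include "llvm/Support/Alignment.h"

llvm::Align A(16);                 // a 16-byte alignment
unsigned LogValue = llvm::Log2(A); // 4: the log2 encoding
uint64_t Bytes = A.value();        // 16: the power-of-two byte value
// Feeding 4 where 16 is expected yields Align(4), not Align(16).
```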
* Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn (Reid Kleckner, 2019-09-03, 7 files, -37/+10)
  This reverts r370525 (git commit 0bb1630685fba255fa93def92603f064c2ffd203) and also reverts r370543 (git commit 185ddc08eed6542781040b8499ef7ad15c8ae9f4). The approach I took only works for functions marked `noreturn`. In general, a call that is not known to be noreturn may be followed by unreachable for other reasons. For example, there could be multiple call sites to a function that throws sometimes, and at some call sites, it is known to always throw, so it is followed by unreachable. We need to insert an `int3` in these cases to pacify the Windows unwinder. I think this probably deserves its own standalone, Win64-only fixup pass that runs after block placement. Implementing that will take some time, so let's revert to TrapUnreachable in the meantime. llvm-svn: 370829
* [GlobalISel][CallLowering] Add support for splitting types according to calling conventions. (Amara Emerson, 2019-09-03, 1 file, -3/+4)
  On AArch64, s128 types have to be split into s64 GPRs when passed as arguments. This change adds the generic support in call lowering for dealing with multiple registers, for incoming and outgoing args. Support for splitting return types is not yet implemented.
  Differential Revision: https://reviews.llvm.org/D66180 llvm-svn: 370822
* [X86] Merge 2 consecutive HasInt256 branches. NFCI. (Simon Pilgrim, 2019-09-03, 1 file, -3/+2)
  llvm-svn: 370761
* [X86] Simplify the setOperationAction handling for fp_to_uint by improving the Custom handler a bit. (Craig Topper, 2019-09-03, 2 files, -19/+22)
  This merges the 32-bit and 64-bit mode code to just use Custom for both i32 and i64. We already had most of the handling in the custom handling due to AVX512 having legal fp_to_uint. Just needed to add the i32->i64 promotion handling. Refactor the fp_to_uint code in the custom handler to simplify the number of times we check things. Tweak cost model tables to match the default handling we were getting due to Expand before. llvm-svn: 370700
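A condensed sketch of the merged registration (shown out of its real context in the X86ISelLowering constructor): with the i32->i64 promotion folded into the custom handler, both widths simply go Custom instead of being split across 32-bit and 64-bit mode blocks:

```c++
// Hedged sketch: one registration path for both integer widths.
setOperationAction(ISD::FP_TO_UINT, MVT::i32, Custom);
setOperationAction(ISD::FP_TO_UINT, MVT::i64, Custom);
```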
* [X86] Don't use Expand for i32 fp_to_uint on SSE1/2 targets on 32-bit target. (Craig Topper, 2019-09-03, 1 file, -13/+7)
  Use Custom lowering instead. Fall back to default expansion only when the scalar FP type belongs in an XMM register. This improves lowering for i32 to fp80, and also i32 to double on SSE1 only. llvm-svn: 370699
* [X86] Custom promote i32->f80 uint_to_fp on AVX512 64-bit targets. (Craig Topper, 2019-09-03, 1 file, -8/+7)
  Reuse the same code to promote all i32 uint_to_fp on 64-bit targets to simplify the X86ISelLowering constructor. llvm-svn: 370693