summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Clamp large constant shift amounts for MMX shift intrinsics to 8-bits.Craig Topper2019-11-061-2/+5
| | | | | | | | | | | | | | | | | | The MMX intrinsics for shift by immediate take a 32-bit shift amount but the hardware for shifting by immediate only encodes 8-bits. For the intrinsic we don't require the shift amount to fit in 8-bits in the frontend because we don't check that its an immediate in the frontend. If its is not an immediate we move it to an MMX register and use the shift by register. But if it is an immediate we'll use the shift by immediate instruction. But we need to change the shift amount to 8-bits. We were previously doing this accidentally by masking it in the encoder. But this can make a large shift amount into a small in bounds shift amount. Instead we should clamp larger shift amounts to 255 so that the they don't become in bounds. Fixes PR43922
* [X86TargetTransformInfo] Fixed warning: Expression 'ISD == ISD::UREM' is ↵Dávid Bolvanský2019-11-061-1/+1
| | | | always true. NFCI.
* [X86] Fix SLM v2i64 ADD/Sub/CMPEQ instruction schedulesSimon Pilgrim2019-11-061-0/+16
| | | | | | Noticed while fixing the reduction costs for D59710 - the SLM model doesn't account for the poor throughput of v2i64 ops. Numbers taken from Intel AOM (+ checked against Agner)
* [X86] Fix SLM v2f64 ADD/MUL + FP BLEND/HADD instruction schedulesSimon Pilgrim2019-11-061-7/+7
| | | | Noticed while fixing the reduction costs for D59710 - the SLM model doesn't account for the poor throughput of v2f64/v2i64 ops.
* [X86ISelLowering] Fixed typo in assert. NFCI.Dávid Bolvanský2019-11-061-1/+1
|
* [CostModel][X86] Improve add vXi64 + fadd vXf64 reduction tests for SLMSimon Pilgrim2019-11-061-0/+26
| | | | As noted on D59710 we weren't handling the high costs of these operations on SLM.
* [x86] avoid crashing when splitting AVX stores with non-simple type (PR43916)Sanjay Patel2019-11-061-3/+5
| | | | | The store splitting transform was assuming a simple type (MVT), but that's not necessarily the case as shown in the test.
* [X86] Fix uninitialized variable warnings. NFCI.Simon Pilgrim2019-11-067-26/+26
|
* [X86] LowerAVXExtend - fix dodgy self-comparison assert.Simon Pilgrim2019-11-061-1/+1
| | | | PVS Studio noticed that we were asserting "VT.getVectorNumElements() == VT.getVectorNumElements()" instead of "VT.getVectorNumElements() == InVT.getVectorNumElements()".
* [X86] Gate select->fmin/fmax transform on NoSignedZeros instead of UnsafeFPMathBenjamin Kramer2019-11-051-8/+7
|
* [X86/Atomics] (Semantically) revert G246098, switch back to the old atomic ↵Philip Reames2019-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | example When writing an email for a follow up proposal, I realized one of the diffs in the committed change was incorrect. Digging into it revealed that the fix is complicated enough to require some thought, so reverting in the meantime. The problem is visible in this diff (from the revert): ; X64-SSE-LABEL: store_fp128: ; X64-SSE: # %bb.0: -; X64-SSE-NEXT: movaps %xmm0, (%rdi) +; X64-SSE-NEXT: subq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 32 +; X64-SSE-NEXT: movaps %xmm0, (%rsp) +; X64-SSE-NEXT: movq (%rsp), %rsi +; X64-SSE-NEXT: movq {{[0-9]+}}(%rsp), %rdx +; X64-SSE-NEXT: callq __sync_lock_test_and_set_16 +; X64-SSE-NEXT: addq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 8 ; X64-SSE-NEXT: retq store atomic fp128 %v, fp128* %fptr unordered, align 16 ret void The problem here is three fold: 1) x86-64 doesn't guarantee atomicity of anything larger than 8 bytes. Some platforms observably break this guarantee, others don't, but the codegen isn't considering this, so it's wrong on at least some platforms. 2) When I started to track down the problem, I discovered that DAGCombiner had stripped the atomicity off the store entirely. This comes down to idiomatic usage of DAG.getStore passing all MMO components separately as opposed to just passing the MMO. 3) On x86 (not -64), there are cases where 8 byte atomiciy is supported, but only for floating point operations. This would seem to imply that operation typing matters for correctness, and DAGCombine happily folds away bitcasts. I'm not 100% sure there's a problem here, but I'm not entirely sure there isn't either. I plan on returning to each issue in turn; sorry for the churn here.
* [X86] Specifically limit fmin/fmax commutativity to NoNaNs + NoSignedZerosBenjamin Kramer2019-11-051-2/+3
| | | | | The backend UnsafeFPMath flag is not a superset of all the others, so limit it to the exact bits needed.
* [globalisel] Rename G_GEP to G_PTR_ADDDaniel Sanders2019-11-053-8/+8
| | | | | | | | | | | | | | | | Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734
* [X86] Lower the cost of avx512 horizontal bool and/or reductions to ↵Craig Topper2019-11-041-0/+21
| | | | | | | | | | | | | | | | | 2*log2(bitwidth)+1 for legal types. This better represents the kshift+binop we'd get for each stage before the final extract. Its likely we'll do even better by doing a kmov and a cmp with a GPR, but this is a good start. The default handling was costing a worst case single source permute shuffle of the vector before the binop. This worst case assumes the shuffle might have to be emulated with extracts and inserts. But since we know we're doing a reduction we can assume we'll get kshift lowering. There's still some room for improvement here, but this is much better than it was.
* [X86] Teach X86MCInstLower to swap operands of commutable instructions to ↵Craig Topper2019-11-041-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | enable 2-byte VEX encoding. Summary: The 2 source operands commutable instructions are encoded in the VEX.VVVV field and the r/m field of the MODRM byte plus the VEX.B field. The VEX.B field is missing from the 2-byte VEX encoding. If the VEX.VVVV source is 0-7 and the other register is 8-15 we can swap them to avoid needing the VEX.B field. This works as long as the VEX.W, VEX.mmmmm, and VEX.X fields are also not needed. Fixes PR36706. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68550
* [X86] Add support for -mvzeroupper and -mno-vzeroupper to match gccCraig Topper2019-11-044-53/+68
| | | | | | | | | | | | | | | | | -mvzeroupper will force the vzeroupper insertion pass to run on CPUs that normally wouldn't. -mno-vzeroupper disables it on CPUs where it normally runs. To support this with the default feature handling in clang, we need a vzeroupper feature flag in X86.td. Since this flag has the opposite polarity of the fast-partial-ymm-or-zmm-write we used to use to disable the pass, we now need to add this new flag to every CPU except KNL/KNM and BTVER2 to keep identical behavior. Remove -fast-partial-ymm-or-zmm-write which is no longer used. Differential Revision: https://reviews.llvm.org/D69786
* [X86] Fix uninitialized variable warnings. NFCI.Simon Pilgrim2019-11-0410-42/+42
|
* [X86] Convert ShrinkMode to scoped enum class. NFCI.Simon Pilgrim2019-11-041-11/+15
|
* [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using ↵Simon Pilgrim2019-11-041-0/+17
| | | | | | | | | | DemandedElts mask (REAPPLIED) If we don't demand all elements, then attempt to combine to a simpler shuffle. At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts (see D66004). This reapplies rL368307 (reverted at rL369167) after the fix for the infinite loop reported at PR43024 was applied at rG3f087e38a2e7b87a5adaaac1c1b61e51220e7ff3
* Set the floating point status register as reservedPengfei Wang2019-11-031-0/+3
| | | | | | | | | | | | | | | | | | | | Summary: This patch sets the FPSW (X87 floating-point status register) as a reserved physical register and fix the test failure caused by [[ https://reviews.llvm.org/D68854| D68854 ]]. Before this patch, some tests will fail because it implicit uses FPSW without define it. Setting the FPSW as a reserved physical register will skip liveness analysis because it is always live. Reviewers: pengfei, craig.topper Reviewed By: craig.topper Subscribers: craig.topper, hiraditya, llvm-commits Patch by LiuChen. Differential Revision: https://reviews.llvm.org/D69784
* [X86][SSE] combineX86ShufflesRecursively - at Depth==0, only resolve ↵Simon Pilgrim2019-11-031-6/+31
| | | | | | | | KnownZero if it removes an input. This stops infinite loops where KnownUndef elements are converted to Zeroable, resulting in KnownZero elements which are then simplified (via SimplifyDemandedElts etc.) back to KnownUndef elements........ Prep fix for PR43024 which will allow rL368307 to be re-applied.
* [X86][SSE] combineX86ShufflesRecursively - don't bother merging shuffles ↵Simon Pilgrim2019-11-031-92/+105
| | | | | | with empty roots. NFCI. This doesn't affect actual codegen, but is a minor refactor toward fixing PR43024 where we need to avoid excess changes (folding zeroables etc.) to the shuffle mask at Depth == 0.
* [X86] Convert PICStyles::Style to scoped enum class. NFCI.Simon Pilgrim2019-11-032-11/+11
| | | | Fixes MSVC static analyzer warnings about enum safety, this enum performs no integer math so it'd be better to fix its scope.
* Fix uninitialized variable warning. NFCI.Simon Pilgrim2019-11-031-1/+1
|
* isImmPCRel/isImmSigned - both functions should return bool not unsigned. NFCI.Simon Pilgrim2019-11-021-2/+2
|
* X86_MC::createX86MCSubtargetInfo - X86_MC::ParseX86Triple never returns an ↵Simon Pilgrim2019-11-021-6/+3
| | | | | | empty string. NFCI. PVS Studio was complaining that the expression '!ArchFS.empty()' is always true.
* X86Operand::print - fix SymName shadow variable warning. NFCI.Simon Pilgrim2019-11-021-2/+2
|
* X86AsmPrinter - fix uninitialized variable warnings. NFCI.Simon Pilgrim2019-11-021-2/+2
|
* [X86] Move computeZeroableShuffleElements before ↵Simon Pilgrim2019-11-021-87/+87
| | | | | | getTargetShuffleAndZeroables. NFCI. Prep work toward merging some of the functionality.
* [X86] Remove FeatureSSE3 from the implies list of HasFastHorizontalOps.Craig Topper2019-11-011-1/+1
| | | | HasFastHorizontalOps is a tuning flag. It shouldn't imply an ISA flag.
* [X86] Model MXCSR for MMX FP instructionsPengfei Wang2019-11-011-5/+5
| | | | | | | | | | | | | | | | Summary: This patch models MXCSR and adds flag "mayRaiseFPException" for MMX FP instructions. Reviewers: craig.topper, andrew.w.kaylor, RKSimon, cameron.mcinally Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D69702
* [X86] add mayRaiseFPException flag and FPCW registers for X87 instructionsPengfei Wang2019-11-012-25/+46
| | | | | | | | | | | | | | | | Summary: This patch adds flag "mayRaiseFPException" , FPCW and FPSW for X87 instructions which could raise float exception. Reviewers: pengfei, RKSimon, andrew.w.kaylor, uweigand, kpn, spatel, cameron.mcinally, craig.topper Reviewed By: craig.topper Subscribers: thakis, hiraditya, llvm-commits Patch by LiuChen. Differential Revision: https://reviews.llvm.org/D68854
* [X86] Change the behavior of canWidenShuffleElements used by ↵Craig Topper2019-11-011-19/+14
| | | | | | | | | | | lowerV2X128Shuffle to match the behavior in lowerVectorShuffle with regards to zeroable elements. Previously we marked zeroable elements in a way that prevented the widening check from recognizing that it could widen. Now we only mark them zeroable if V2 is an all zeros vector. This matches what we do for widening elements in lowerVectorShuffle. Fixes PR43866.
* [X86][AVX] Add support for and/or scalar bool reduction with AVX512 mask ↵Simon Pilgrim2019-11-011-0/+6
| | | | | | registers combineBitcastvxi1 only handles bitcast->MOVMSK combines, with mask registers we use BITCAST directly.
* [X86] isFNEG - use switch() instead of if-else tree. NFCI.Simon Pilgrim2019-11-011-33/+36
| | | | In a future patch this will avoid some checks which don't need to be done for some opcodes.
* [X86] Reland: Enable YMM memcmp with AVX1David Zarzycki2019-11-011-3/+2
| | | | | | | | | | Update TargetTransformInfo to allow AVX1 to use YMM registers for memcmp. This is a follow up to D68632 which enabled XOR compares which made this possible. This also updates the memcmp-optsize.ll test unlike the first patch. https://reviews.llvm.org/D69658
* Revert "[X86] add mayRaiseFPException flag and FPCW registers for X87 ↵Nico Weber2019-10-312-46/+25
| | | | | | | instructions" This reverts commit a678677da498a45f59c16ee74fea438e34a801ce. It broke CodeGen/ms-inline-asm.c on most bots.
* [X86] add mayRaiseFPException flag and FPCW registers for X87 instructionsCraig Topper2019-10-312-25/+46
| | | | | | | | | This patch adds flag "mayRaiseFPException" , FPCW and FPSW for X87 instructions which could raise float exception. Patch by LiuChen. With a couple small fixes from me. Differential Revision: https://reviews.llvm.org/D68854
* [X86] Remove FSIN/FCOS isel patterns and the pseudo instructions that they ↵Craig Topper2019-10-313-13/+3
| | | | | | | selected for the FP stackifier. We always expand these to libcalls so get rid of the last vestiges of using the instructions.
* Revert rG0e252ae19ff8d99a59d64442c38eeafa5825d441 : [X86] Enable YMM memcmp ↵Simon Pilgrim2019-10-311-2/+3
| | | | | | | | with AVX1 Breaks build bots Differential Revision: https://reviews.llvm.org/D69658
* [X86] Enable YMM memcmp with AVX1David Zarzycki2019-10-311-3/+2
| | | | | | | | Update TargetTransformInfo to allow AVX1 to use YMM registers for memcmp. This is a follow up to D68632 which enabled XOR compares which made this possible. https://reviews.llvm.org/D69658
* Revert rG57ee0435bd47f23f3939f402914c231b4f65ca5e - [TII] Use optional ↵Simon Pilgrim2019-10-312-9/+13
| | | | | | destination and source pair as a return value; NFC This is breaking MSVC builds: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20375
* [TII] Use optional destination and source pair as a return value; NFCDjordje Todorovic2019-10-312-13/+9
| | | | | | | | | | Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods to return optional machine operand pair of destination and source registers. Patch by Nikola Prica Differential Revision: https://reviews.llvm.org/D69622
* [X86][SSE] Convert computeZeroableShuffleElements to emit KnownUndef and ↵Simon Pilgrim2019-10-311-23/+35
| | | | KnownZero
* [cfi] Add flag to always generate .debug_frameDavid Candler2019-10-312-12/+6
| | | | | | | | | This adds a flag to LLVM and clang to always generate a .debug_frame section, even if other debug information is not being generated. In situations where .eh_frame would normally be emitted, both .debug_frame and .eh_frame will be used. Differential Revision: https://reviews.llvm.org/D67216
* [X86] Model MXCSR for all SSE instructionsCraig Topper2019-10-304-51/+97
| | | | | | | | | | | | | | | This patch adds MXCSR as a reserved physical register and models its use by X86 SSE instructions. It also adds flag "mayRaiseFPException" for the instructions that possibly can raise FP exception according to the architecture definition. Following what SystemZ and other targets does, only the current rounding modes and the IEEE exception masks are modeled. *Changes* of the MXCSR due to exceptions are not modeled. Patch by Pengfei Wang Differential Revision: https://reviews.llvm.org/D68121
* [X86] Rewrite hasReassociableOperands and setSpecialOperandAttr to not ↵Craig Topper2019-10-301-28/+19
| | | | | | hardcode number of operands or position of the EFLAGS operand. This makes the code immune to the MXCSR addition in D68121.
* [X86] Add FIXME comment to merge more of computeZeroableShuffleElements and ↵Simon Pilgrim2019-10-301-0/+1
| | | | getTargetShuffleAndZeroables
* [X86][SSE] combineX86ShuffleChain - use resolveZeroablesFromTargetShuffle ↵Simon Pilgrim2019-10-301-4/+3
| | | | helper. NFCI.
* [X86] combineOrShiftToFunnelShift - use isOperationLegalOrCustom to check ↵Simon Pilgrim2019-10-301-1/+2
| | | | | | FSHL/FSHR support Remove hard wired legality check.
OpenPOWER on IntegriCloud