path: root/llvm/lib/Target/X86
Commit message | Author | Date | Files | Lines
* [X86] Correct scheduler information for rotate by constant for Haswell, Broadwell, and Skylake. | Craig Topper | 2019-03-07 | 4 | -4/+36
  Rotate with explicit immediate is a single uop from Haswell on. An immediate of 1 has a dependency on the previous writer of flags, but the other immediate values do not. The implicit rotate by 1 instruction is 2 uops. But the flags are merged after the rotate uop so the data result does not see the flag dependency. But I don't think we have any way of modeling that.
  RORX is 1 uop without the load. 2 uops with the load. We currently model these with WriteShift/WriteShiftLd.
  Differential Revision: https://reviews.llvm.org/D59077
  llvm-svn: 355636
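  For context, a minimal C++ sketch (not from the patch) of the two rotate forms being discussed; the lowering described in the comments is what the commit says Haswell and later do, but the exact instructions a given compiler emits are an assumption:
    #include <cstdint>

    // Rotate by a constant other than 1: typically a single ROL-with-immediate uop on Haswell+.
    uint32_t rotl3(uint32_t x) {
      return (x << 3) | (x >> 29);
    }

    // Rotate by 1: may be emitted as the implicit rotate-by-1 form (2 uops), or as
    // ROL with an immediate of 1, which carries a dependency on the flags writer.
    uint32_t rotl1(uint32_t x) {
      return (x << 1) | (x >> 31);
    }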
* [X86] Model ADC/SBB with immediate 0 more accurately in the Haswell scheduler model | Craig Topper | 2019-03-07 | 1 | -0/+30
  Haswell and possibly Sandybridge have an optimization for ADC/SBB with immediate 0 to use a single uop flow. This only applies to GR16/GR32/GR64 with an 8-bit immediate. It does not apply to GR8. It also does not apply to the implicit AX/EAX/RAX forms.
  Differential Revision: https://reviews.llvm.org/D59058
  llvm-svn: 355635
* Delete x86_64 ShadowCallStack support | Vlad Tsyrklevich | 2019-03-07 | 4 | -330/+0
  Summary: ShadowCallStack on x86_64 suffered from the same racy security issues as Return Flow Guard and had performance overhead as high as 13% depending on the benchmark. x86_64 ShadowCallStack was always an experimental feature and never shipped a runtime required to support it, as such there are no expected downstream users.
  Reviewers: pcc
  Reviewed By: pcc
  Subscribers: mgorny, javed.absar, hiraditya, jdoerfert, cfe-commits, #sanitizers, llvm-commits
  Tags: #clang, #sanitizers, #llvm
  Differential Revision: https://reviews.llvm.org/D59034
  llvm-svn: 355624
* [X86] Enable combineFMinNumFMaxNum for 512 bit vectors when AVX512 is enabled. | Craig Topper | 2019-03-07 | 1 | -5/+7
  Simplified by just checking if the vector type is legal rather than listing all combinations of types and features.
  Fixes PR40984.
  llvm-svn: 355582
* [DAGCombine] Improve select (not Cond), N1, N2 -> select Cond, N2, N1 fold | Simon Pilgrim | 2019-03-06 | 1 | -6/+0
  Move the x86 combine from D58974 into the DAGCombine VSELECT code and update the SELECT version to use the isBooleanFlip helper as well.
  Requested by @spatel on D59006
  llvm-svn: 355533
* [X86][SSE] VSELECT(XOR(Cond,-1), LHS, RHS) --> VSELECT(Cond, RHS, LHS) | Simon Pilgrim | 2019-03-06 | 1 | -0/+6
  As noticed on D58965, DAGCombiner::visitSELECT has something similar, so we should be able to move this to DAGCombiner and support VSELECT as well at some point.
  Differential Revision: https://reviews.llvm.org/D58974
  llvm-svn: 355494
* [X86] Enable the add with 128 -> sub with -128 encoding trick with X86ISD::ADD when the carry flag isn't used. | Craig Topper | 2019-03-06 | 1 | -0/+10
  This allows us to use an 8-bit sign extended immediate instead of a 16 or 32 bit immediate. Also do similar for 0x80000000 with 64-bit adds to avoid having to use a movabsq.
  llvm-svn: 355485
* [X86] Suppress load folding for add/sub with 128 immediate. | Craig Topper | 2019-03-06 | 1 | -0/+6
  128 won't fit in a sign extended 8-bit immediate, but we can negate it to -128 and use the other operation. This results in a shorter encoding since the move would have used 16 or 32 bits for the immediate.
  llvm-svn: 355484
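  A small illustration (not part of either patch) of why the negation trick saves encoding bytes: adding 128 and subtracting -128 produce the same wrapped result, but only -128 fits in a sign-extended 8-bit immediate.
    #include <cassert>
    #include <cstdint>

    int main() {
      uint32_t x = 12345;
      // "add x, 128" needs a 16- or 32-bit immediate, since +128 is outside [-128, 127].
      uint32_t a = x + 128u;
      // "sub x, -128" computes the same value, and -128 does fit in a sign-extended imm8.
      uint32_t b = x - static_cast<uint32_t>(-128);
      assert(a == b);   // identical modulo 2^32
      return 0;
    }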
* [X86] Remove periods from the end of SubtargetFeature descriptions since the help printer adds a period. | Craig Topper | 2019-03-06 | 1 | -7/+7
  Most features don't have periods already, but some did. When there is a period it causes llc -mattr=+help to print 2 periods.
  llvm-svn: 355474
* Revert r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary." | Craig Topper | 2019-03-05 | 3 | -43/+49
  This caused the first matcher in the isel table for many targets to be Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures.
  llvm-svn: 355433
* [X86] In X86DomainReassignment.cpp add enclosed registers to EnclosedEdges | Guozhi Wei | 2019-03-05 | 1 | -0/+1
  The variable X86DomainReassignment::EnclosedEdges is used to store registers that have been enclosed in some closure, so those registers will be ignored when creating new closures. But no register was ever actually put into this set, so a single register could be enclosed in multiple closures, which significantly increases compile time. This patch adds a register to EnclosedEdges when it is enclosed into a closure.
  Differential Revision: https://reviews.llvm.org/D58646
  llvm-svn: 355430
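  A hypothetical sketch of the shape of the fix; only the name EnclosedEdges comes from the commit, the closure type and helper below are stand-ins, not the actual pass code:
    #include <set>

    struct Closure { std::set<unsigned> Edges; };   // stand-in for the real closure type
    static std::set<unsigned> EnclosedEdges;        // registers already enclosed in some closure

    // Illustrative only: record a register as enclosed at the moment it joins a
    // closure, so that building later closures can skip it.
    void encloseEdge(Closure &C, unsigned Reg) {
      EnclosedEdges.insert(Reg);   // the kind of insert the patch adds
      C.Edges.insert(Reg);
    }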
* [X86] Enable 8-bit SHL to convert to LEA | Craig Topper | 2019-03-05 | 2 | -1/+5
  Differential Revision: https://reviews.llvm.org/D58870
  llvm-svn: 355425
* [X86] Allow 8-bit INC/DEC to be converted to LEA. | Craig Topper | 2019-03-05 | 2 | -6/+9
  We already do this for 16/32/64 as well as 8-bit add with register/immediate. Might as well do it for 8-bit INC/DEC too.
  Differential Revision: https://reviews.llvm.org/D58869
  llvm-svn: 355424
* [X86] Enable 8-bit OR with disjoint bits to convert to LEA | Craig Topper | 2019-03-05 | 6 | -10/+33
  We already support 8-bit adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64.
  Differential Revision: https://reviews.llvm.org/D58863
  llvm-svn: 355423
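  A small example (illustrative, not from the patch) of why a disjoint-bits OR can become LEA: when the operands share no set bits, OR and ADD agree, and an add of two registers is exactly what LEA computes.
    #include <cassert>
    #include <cstdint>

    int main() {
      uint8_t top = 0xA0, low = 0x05;   // no overlapping bits: (top & low) == 0
      assert((top & low) == 0);
      assert(static_cast<uint8_t>(top | low) == static_cast<uint8_t>(top + low));
      return 0;                         // so the OR here can be selected as an LEA of two registers
    }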
* [X86] Reduce some patterns by using FP instructions for integer types even when AVX2 is available and execution domain fixing will do the right thing | Craig Topper | 2019-03-05 | 1 | -61/+9
  We have quite a few cases of using FP instructions for integer operations when only AVX1 is available. Then we switch to integer instructions with AVX2. In a lot of these cases execution domain fixing will take care of turning FP instructions into integer if it's profitable. With this patch we just keep on using the FP instructions even with AVX2.
  I've only handled some cases that don't require messing with patterns that are defined in the instruction definition. Those will require more subtle multiclass work possibly involving null_frag, hasSideEffects = 0, etc.
  Differential Revision: https://reviews.llvm.org/D58470
  llvm-svn: 355361
* [X86] Avoid codegen changes when DBG_VALUE appears between lowered selects | Jeremy Morse | 2019-03-04 | 1 | -4/+15
  X86TargetLowering::EmitLoweredSelect presently detects sequences of CMOV pseudo instructions without accounting for debug intrinsics. This leads to different codegen with and without option -g, if a DBG_VALUE instruction lands in the middle of several lowered selects.
  Work around this by skipping over debug instructions when looking for CMOV sequences, and sinking those debug insts into the EmitLoweredSelect sunk block. This might slightly shift where variables appear in the instruction sequence, but won't re-order assignments.
  Differential Revision: https://reviews.llvm.org/D58672
  llvm-svn: 355307
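  A rough sketch of the idea (assumed shape, not the exact patch): when scanning for the next CMOV pseudo in the sequence, step over debug instructions instead of treating them as a break in the pattern.
    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/CodeGen/MachineInstr.h"
    using namespace llvm;

    // Hypothetical helper; the real change lives inside X86TargetLowering::EmitLoweredSelect.
    static MachineBasicBlock::iterator
    advancePastDebugInstrs(MachineBasicBlock::iterator It, MachineBasicBlock &MBB) {
      while (It != MBB.end() && It->isDebugInstr())
        ++It;   // the skipped debug insts are later sunk into the lowered-select sink block
      return It;
    }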
* Remove unused variable. NFCI. | Simon Pilgrim | 2019-03-03 | 1 | -1/+0
  llvm-svn: 355289
* [X86] getShuffleScalarElt - peek through insert/extract subvector nodes. | Simon Pilgrim | 2019-03-03 | 1 | -0/+23
  llvm-svn: 355288
* [X86] Pull out combineToConsecutiveLoads helper. NFCI. | Simon Pilgrim | 2019-03-03 | 1 | -17/+23
  llvm-svn: 355287
* [X86] Prefer VPBLENDD for v2i64/v4i64 blends with AVX2. | Craig Topper | 2019-03-03 | 1 | -3/+37
  We were using VPBLENDW for v2i64 and VBLENDPD for v4i64. VPBLENDD has better throughput than VPBLENDW on some CPUs so it makes sense to use it when possible. VBLENDPD will probably become VBLENDD during execution domain fixing, but we might as well use integer in isel while we can.
  This should work around some issues with the domain fixing pass preferring PBLENDW when we start with PBLENDW. There may still be some v8i16 cases that could use PBLENDD.
  llvm-svn: 355281
* [X86] Improve use of SHLD/SHRD | Amaury Sechet | 2019-03-02 | 1 | -0/+6
  Summary: This extends the variety of patterns that can generate a SHLD instead of using two shifts. This fixes a regression that would be introduced by D57367 or D33587.
  Reviewers: RKSimon, craig.topper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D57389
  llvm-svn: 355260
* [TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary. | Craig Topper | 2019-03-01 | 3 | -49/+43
  Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
  By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
  This removes something like 40,000 bytes from the X86 isel table.
  Differential Revision: https://reviews.llvm.org/D58595
  llvm-svn: 355224
* [x86] scalarize extract element 0 of FP math | Sanjay Patel | 2019-02-28 | 1 | -0/+59
  This is another step towards ensuring that we produce the optimal code for reductions, but there are other potential benefits as seen in the test diffs:
  1. Memory loads may get scalarized resulting in more efficient code.
  2. Memory stores may get scalarized resulting in more efficient code.
  3. Complex ops like fdiv/sqrt get scalarized which may be faster instructions depending on uarch.
  4. Even simple ops like addss/subss/mulss/roundss may result in faster operation/less frequency throttling when scalarized depending on uarch.
  The TODO comment suggests 1 or more follow-ups for opcodes that can currently result in regressions.
  Differential Revision: https://reviews.llvm.org/D58282
  llvm-svn: 355130
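  An illustration (assumed source shape, not taken from the patch) of the kind of code that benefits: only lane 0 of a vector FP op is used, so the operation can be performed on scalars, e.g. a vector sqrt plus an extract becomes a single scalar sqrtss.
    #include <immintrin.h>

    float sqrt_lane0(__m128 v) {
      __m128 r = _mm_sqrt_ps(v);   // vector sqrt of all four lanes
      return _mm_cvtss_f32(r);     // ...but only lane 0 is consumed
    }
    // After the combine, codegen can instead emit the scalar form, roughly:
    //   sqrtss %xmm0, %xmm0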
* [X86] Don't peek through bitcasts before checking ISD::isBuildVectorOfConstantSDNodes in combineTruncatedArithmetic | Craig Topper | 2019-02-28 | 1 | -2/+5
  We don't have any combines that can look through a bitcast to truncate a build vector of constants. So the truncate will stick around and give us something like this pattern (binop (trunc X), (trunc (bitcast (build_vector)))), which has two truncates in it. That will be reversed by hoistLogicOpWithSameOpcodeHands in the generic DAG combiner, causing an infinite loop.
  Even if we had a combine for (truncate (bitcast (build_vector))), I think it would need to be implemented in getNode, otherwise DAG combiner visit ordering would probably still visit the binop first and reverse it. Or combineTruncatedArithmetic would need to do its own constant folding.
  Differential Revision: https://reviews.llvm.org/D58705
  llvm-svn: 355116
* Add support for computing "zext of value" in KnownBits. NFCI | Bjorn Pettersson | 2019-02-28 | 1 | -1/+1
  Summary: The description of KnownBits::zext() and KnownBits::zextOrTrunc() has confusingly claimed that the operation is equivalent to zero extending the value we're tracking. That has not been true; instead the user has been forced to explicitly set the extended bits as known zero afterwards.
  This patch adds a second argument to KnownBits::zext() and KnownBits::zextOrTrunc() to control if the extended bits should be considered as known zero or as unknown.
  Reviewers: craig.topper, RKSimon
  Reviewed By: RKSimon
  Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58650
  llvm-svn: 355099
* [X86][AVX] Remove superfluous insert_subvector(zero, bitcast(x)) -> bitcast(insert_subvector(zero, x)) fold | Simon Pilgrim | 2019-02-28 | 1 | -14/+0
  This is caught by other existing bitcast folds.
  llvm-svn: 355084
* [X86][AVX] Fold vf64 concat_vectors(movddup(x),movddup(x)) -> broadcast(x) | Simon Pilgrim | 2019-02-28 | 1 | -1/+11
  llvm-svn: 355078
* [X86] Use PreprocessISelDAG to convert vector sra/srl/shl to the X86 specific variable shift ISD opcodes. | Craig Topper | 2019-02-28 | 3 | -121/+40
  This allows us to use the same set of isel patterns for sra/srl/shl, which are undefined for out of range shifts, and intrinsic shifts, which aren't undefined.
  Doing this late allows DAG combine to have every opportunity to optimize the sra/srl/shl nodes.
  This removes about 7000 bytes from the isel table and simplifies the td files.
  llvm-svn: 355071
* [X86] Use X86::LAST_VALID_COND instead of assuming X86::COND_S is the last encoding. NFC | Craig Topper | 2019-02-28 | 1 | -1/+1
  llvm-svn: 355059
* [X86][AVX] Pull out some INSERT_SUBVECTOR combines into a combineConcatVectorOps helper. NFCI | Simon Pilgrim | 2019-02-27 | 1 | -51/+66
  A lot of the INSERT_SUBVECTOR combines can be more generally handled as if they have come from a CONCAT_VECTORS node. I've been investigating adding a CONCAT_VECTORS combine to X86, but this is a much easier first step that avoids a number of pre-legalization issues that I've encountered.
  Differential Revision: https://reviews.llvm.org/D58583
  llvm-svn: 355015
* [X86][AVX] Only combine loads to broadcasts for legal types | Simon Pilgrim | 2019-02-27 | 1 | -9/+11
  Thanks to @echristo for spotting this.
  llvm-svn: 354961
* [X86] Fix bug in vectorcall calling convention | Reid Kleckner | 2019-02-26 | 1 | -1/+4
  The original implementation can't correctly handle __m256 and __m512 types passed by reference through the stack. This patch fixes it.
  Patch by Wei Xiao!
  Differential Revision: https://reviews.llvm.org/D57643
  llvm-svn: 354921
* [X86] AMD znver2 enablement | Ganesh Gopalasubramanian | 2019-02-26 | 1 | -2/+15
  This patch enables the following:
  1) AMD family 17h "znver2" tune flag (-march, -mcpu).
  2) ISAs that are enabled for "znver2" architecture.
  3) For the time being, it uses the znver1 scheduler model.
  4) Tests are updated.
  5) Scheduler descriptions are yet to be put in place.
  Reviewers: craig.topper
  Differential Revision: https://reviews.llvm.org/D58343
  llvm-svn: 354897
* [X86] Fix bug in x86_intrcc with arg copy elision | Reid Kleckner | 2019-02-26 | 4 | -43/+51
  Summary: Use a custom calling convention handler for interrupts instead of fixing up the locations in LowerMemArgument. This way, the offsets are correct when constructed and we don't need to account for them in as many places.
  Depends on D56883
  Replaces D56275
  Reviewers: craig.topper, phil-opp
  Subscribers: hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56944
  llvm-svn: 354837
* [X86] Improve detection of unneeded shift amount masking to also handle the case that the LHS has known zeroes in it | Craig Topper | 2019-02-25 | 2 | -47/+63
  If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case.
  Differential Revision: https://reviews.llvm.org/D58475
  llvm-svn: 354811
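  A sketch of the pattern this covers (illustrative values, assumed source shape): 32-bit shifts on x86 already mask the count to 5 bits, and when the count has known-zero low bits an earlier combine can shrink the explicit mask immediate (for example from 31 to 30), which the old check no longer recognized as redundant.
    #include <cstdint>

    uint32_t shl_masked(uint32_t x, uint32_t c) {
      uint32_t amt = (c & 0x0Fu) << 1;   // bit 0 of amt is known zero, and amt < 32
      return x << (amt & 31u);           // combines may shrink this mask to 30;
                                         // either way SHL ignores bits 5 and above
    }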
* [X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483) | Simon Pilgrim | 2019-02-25 | 1 | -10/+28
  Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results.
  Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB.
  Differential Revision: https://reviews.llvm.org/D58597
  llvm-svn: 354771
* [X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637) | Simon Pilgrim | 2019-02-24 | 1 | -0/+14
  It's proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step.
  llvm-svn: 354757
* [X86] Fix tls variable lowering issue with large code model | Craig Topper | 2019-02-24 | 1 | -5/+13
  Summary: The problem here is the lowering for a TLS variable. Below is the DAG for the code.
  SelectionDAG has 11 nodes:
    t0: ch = EntryToken
    t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64
    t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
    t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
    t12: i64 = add t8, t11
    t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64
    t6: ch = CopyToReg t0, Register:i32 %0, t4
  And when the code model is large, the instructions below can NOT be folded:
    t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
    t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
  So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to
    Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0
  When llvm then starts to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails. The patch is to fold the load and X86ISD::WrapperRIP.
  Fixes PR26906
  Patch by LuoYuanke
  Reviewers: craig.topper, rnk, annita.zhang, wxiao3
  Reviewed By: rnk
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58336
  llvm-svn: 354756
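  A minimal source shape (assumed, the actual reproducer lives in the review) that exercises this path: a thread-local global accessed while compiling with the large code model (e.g. -mcmodel=large).
    // Assumed reproducer shape corresponding to the @x in the DAG dump above.
    thread_local int x;

    int get() { return x; }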
* [X86][SSE] Use pblendw for v4i32/v2i64 during isel. | Craig Topper | 2019-02-24 | 1 | -13/+63
  Summary: Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was an integer. So use pblendw instead.
  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: jdoerfert, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58574
  llvm-svn: 354755
* [X86] Correct some ADC/SBB with immediate scheduler data for Broadwell and Skylake. | Craig Topper | 2019-02-24 | 3 | -10/+13
  Summary: The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate. The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate.
  Reviewers: RKSimon, andreadb
  Reviewed By: RKSimon
  Subscribers: gbedwell, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58581
  llvm-svn: 354754
* [LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents | Craig Topper | 2019-02-24 | 1 | -2/+1
  Summary: When promoting the overflow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.
  We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put the boolean value in bit 0 of the element and left all other bits of the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.
  Reviewers: spatel, RKSimon, nikic
  Reviewed By: nikic
  Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58567
  llvm-svn: 354753
* [X86][AVX] Rename lowerShuffleByMerging128BitLanes to lowerShuffleAsLanePermuteAndRepeatedMask. NFC. | Simon Pilgrim | 2019-02-24 | 1 | -10/+11
  Name better matches the other similar 'lane permute' and 'repeated mask' functions we have.
  llvm-svn: 354749
* Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" | Craig Topper | 2019-02-23 | 2 | -60/+123
  And its follow-ups r354511 and r354640.
  A follow-up patch will fix the issue that caused it to be reverted.
  llvm-svn: 354737
* Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract" | Craig Topper | 2019-02-23 | 1 | -0/+21
  r354648 was a follow up to fix a regression: "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."
  These were reverted in r354713 as their context depended on other patches that were reverted for a bug.
  llvm-svn: 354734
* [X86][AVX] combineInsertSubvector - remove concat_vectors(load(x),load(x)) --> sub_vbroadcast(x) | Simon Pilgrim | 2019-02-23 | 1 | -5/+0
  D58053/rL354340 added this to EltsFromConsecutiveLoads directly.
  llvm-svn: 354732
* Fix MSVC constant truncation warnings. NFCI. | Simon Pilgrim | 2019-02-23 | 1 | -11/+11
  llvm-svn: 354731
* [X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x) | Simon Pilgrim | 2019-02-23 | 1 | -0/+7
  For AVX1, limit this to i32/f32/i64/f64 loading cases only.
  llvm-svn: 354730
* [X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place | Simon Pilgrim | 2019-02-23 | 1 | -0/+12
  Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 it's just a single VPERMPD), followed by a BLENDPD.
  llvm-svn: 354729
* [X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel. | Craig Topper | 2019-02-23 | 1 | -3/+5
  Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t. This commit makes sure we do the same for commuting. The test changes show how this improves CSE. This issue was made worse by MachineCSE using commuteInstruction to undo a commute, so we virtually guarantee the sign extend from isel would be lost.
  The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results.
  llvm-svn: 354724
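  The mismatch being fixed, in miniature (illustrative values only): the same 8-bit blend mask compares unequal depending on whether it was zero-extended or sign-extended to int64_t, which is enough to defeat CSE between the original and the commuted instruction.
    #include <cstdint>
    #include <cstdio>

    int main() {
      uint8_t Mask = 0x81;                         // an 8-bit immediate with the top bit set
      int64_t ZExt = Mask;                         // 0x0000000000000081
      int64_t SExt = static_cast<int8_t>(Mask);    // 0xFFFFFFFFFFFFFF81
      std::printf("%d\n", ZExt == SExt);           // prints 0: the operand values differ
      return 0;
    }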
* Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" | Reid Kleckner | 2019-02-23 | 2 | -144/+60
  r354363 caused https://crbug.com/934963#c1, which has a plain C reduced test case.
  I also had to revert some dependent changes:
  - r354648
  - r354647
  - r354640
  - r354511
  llvm-svn: 354713