path: root/llvm/lib/Target/X86
...
* [X86] ISD::INSERT_SUBVECTOR - use uint64_t index. NFCI. (Simon Pilgrim, 2019-07-08; 1 file, -4/+4)
  Keep the uint64_t type from getConstantOperandVal to stop truncation/extension overflow warnings in MSVC in subvector index math. llvm-svn: 365328
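  As a minimal sketch of the warning class being silenced (a self-contained stand-in, not the patched DAG code; getConstantOperandVal does return uint64_t in LLVM):
  ```
  #include <cstdint>
  #include <iostream>

  // Stand-in for SDNode::getConstantOperandVal(), which returns uint64_t.
  uint64_t getConstantOperandVal() { return 4; }

  int main() {
    // unsigned Idx = getConstantOperandVal(); // MSVC warns: 64->32 truncation
    uint64_t Idx = getConstantOperandVal();    // keep the full width instead
    uint64_t NumSubElts = 2;
    // All-uint64_t subvector index math, no truncate-then-extend warnings.
    std::cout << "subvector elements [" << Idx << ", "
              << Idx + NumSubElts - 1 << "]\n";
  }
  ```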
* [X86] Allow execution domain fixing to turn SHUFPD into SHUFPS. (Craig Topper, 2019-07-08; 1 file, -0/+14)
  This can help with code size on SSE targets, where SHUFPD requires a 0x66 prefix and SHUFPS doesn't. llvm-svn: 365293
* [X86] Make movsd commutable to shufpd with a 0x02 immediate on pre-SSE4.1 targets. (Craig Topper, 2019-07-08; 2 files, -15/+41)
  This can help avoid a copy or enable load folding. On SSE4.1 targets we can commute it to blendi instead. I had to make shufpd with a 0x02 immediate commutable as well, since we expect commuting to be reversible. llvm-svn: 365292
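  To see why the commute is sound, here is a scalar model of the two register forms (a sketch assuming Intel-style dst,src operand order; not the actual commute code):
  ```
  #include <array>
  #include <cassert>
  #include <cstdint>

  using V2F64 = std::array<double, 2>;

  // movsd dst, src: replace the low element, keep the high element of dst.
  V2F64 movsd(V2F64 Dst, V2F64 Src) { return {Src[0], Dst[1]}; }

  // shufpd dst, src, imm: bit 0 picks the low result lane from dst,
  // bit 1 picks the high result lane from src.
  V2F64 shufpd(V2F64 Dst, V2F64 Src, uint8_t Imm) {
    return {Dst[Imm & 1], Src[(Imm >> 1) & 1]};
  }

  int main() {
    V2F64 A = {1.0, 2.0}, B = {3.0, 4.0};
    // movsd A, B is shufpd with the operands commuted and immediate 0x02.
    assert(movsd(A, B) == shufpd(B, A, 0x02));
  }
  ```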
* [X86] Add MOVSDrr->MOVLPDrm entry to load folding table. Add custom handling to turn UNPCKLPDrr->MOVHPDrm when the load is under-aligned. (Craig Topper, 2019-07-08; 2 files, -1/+19)
  If the load is aligned we can turn UNPCKLPDrr into UNPCKLPDrm. llvm-svn: 365287
* [X86] Make sure load isn't volatile before shrinking it in MOVDDUP isel patterns. (Craig Topper, 2019-07-07; 2 files, -5/+5)
  llvm-svn: 365275
* [X86] SimplifyDemandedVectorEltsForTargetNode - fix shadow variable warning. NFCI. (Simon Pilgrim, 2019-07-06; 1 file, -3/+3)
  Fixes cppcheck warning. llvm-svn: 365271
* [X86] LowerBuildVectorv16i8 - pull out repeated getOperand() call. NFCI. (Simon Pilgrim, 2019-07-06; 1 file, -3/+3)
  llvm-svn: 365270
* [X86] Remove patterns from MOVLPSmr and MOVHPSmr instructions. (Craig Topper, 2019-07-06; 3 files, -25/+37)
  These patterns are the same as the MOVLPDmr and MOVHPDmr patterns, but with a bitcast at the end. We can just select the PD instruction and let execution domain fixing switch to PS. llvm-svn: 365267
* [X86] Add patterns to select MOVLPDrm from MOVSD+load and MOVHPD from UNPCKL+load. (Craig Topper, 2019-07-06; 1 file, -0/+14)
  These narrow the load, so we can only do it if the load isn't volatile. There are also tests in vector-shuffle-128-v4.ll that this should support, but we don't seem to fold bitcast+load on pre-SSE4.2 targets due to the slow unaligned mem 16 flag. llvm-svn: 365266
* [X86] Correct the size check in foldMemoryOperandCustom. (Craig Topper, 2019-07-05; 1 file, -2/+2)
  The Size either needs to be 0, meaning we aren't folding a stack reload, or the stack slot needs to be at least 16 bytes. I've also added a paranoia check to ensure the RCSize is at least 16 bytes as well. This avoids any FR32/FR64 surprises, but I think we already filtered those earlier. All of our test cases have Size as either 0 or 16 and RCSize == 16, so the Size <= 16 check worked for those cases. llvm-svn: 365234
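  A hedged sketch of the corrected guard (the names Size and RCSize come from the commit message; the exact shape of the real check in foldMemoryOperandCustom may differ):
  ```
  // Size == 0 means we aren't folding a stack reload; otherwise both the
  // stack slot and the register class must cover the full 16-byte access.
  bool sizesAllowFolding(unsigned Size, unsigned RCSize) {
    if (Size != 0 && Size < 16)
      return false; // stack slot smaller than the 16-byte memory operand
    if (RCSize < 16)
      return false; // paranoia check: rules out FR32/FR64 surprises
    return true;
  }
  ```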
* [X86] Update SSE1 MOVLPSrm and MOVHPSrm isel patterns to ensure loads are non-volatile before folding. (Craig Topper, 2019-07-05; 1 file, -2/+3)
  These patterns use 128-bit loads, but the instructions only load 64 bits. We shouldn't narrow the load if it's volatile. Fixes another variant of PR42079. llvm-svn: 365225
* [X86] Remove unnecessary isel pattern for MOVLPSmr. (Craig Topper, 2019-07-05; 1 file, -5/+0)
  This was identical to a pattern for MOVPQI2QImr with a bitcast as an input. But we should be able to turn MOVPQI2QImr into MOVLPSmr in the execution domain fixup pass, so we shouldn't need this. llvm-svn: 365224
* This reverts r365061 and r365062 (test update). (Robert Lougher, 2019-07-05; 1 file, -4/+4)
  Revision r365061 changed a skip of debug instructions into a skip of meta instructions. This is not safe, as IMPLICIT_DEF is classed as a meta instruction. llvm-svn: 365202
* Revert r365198, as this accidentally committed something that should not have been added. (Robert Lougher, 2019-07-05; 1 file, -4/+4)
  llvm-svn: 365199
* This reverts r365061 and r365062 (test update). (Robert Lougher, 2019-07-05; 1 file, -4/+4)
  Revision r365061 changed a skip of debug instructions into a skip of meta instructions. This is not safe, as IMPLICIT_DEF is classed as a meta instruction. llvm-svn: 365198
* [X86][SSE] LowerINSERT_VECTOR_ELT - early out for out-of-range indices. (Simon Pilgrim, 2019-07-05; 1 file, -3/+3)
  Fixes OSS-Fuzz #15662. llvm-svn: 365180
* [X86] Add custom isel to select ADD/SUB/OR/XOR/AND to their non-immediate forms under optsize when the immediate has additional users. (Craig Topper, 2019-07-04; 1 file, -1/+103)
  Summary: We attempt to prevent folding immediates with multiple users under optsize, but we only do this from store nodes and X86ISD::ADD/SUB/XOR/OR/AND patterns. We don't do it for ISD::ADD/SUB/XOR/OR/AND, even though we count them as users when deciding whether to fold into other nodes. This leads to situations where we block folding to a compare, for example, but still fold into an AND or OR, as seen in PR27202.

  Unfortunately, touching the isel patterns in tablegen for the ISD::ADD/SUB/XOR/OR/AND opcodes will cause the patterns to be unusable for fast isel, and we don't have a way to make a fast-isel-only pattern. To work around this, this patch adds custom isel in front of the isel table that will select the non-immediate forms if the immediate has additional users. This may create some issues for ANDN and NOT matching, and there's room for improvement with unsigned 32-bit immediates on 64-bit AND.

  This patch needs more thorough test cases, but I wanted to get feedback on the direction. Please send me any other test cases you've seen in the wild.

  I think we probably have the same issue with the immediate matching when we fold RMW from X86ISD::ADD/SUB/XOR/OR/AND, and with our TEST immediate shrinking logic. Our cost modeling for immediates that can fit in a sign-extended 8-bit immediate on a 16/32/64-bit operation is completely wrong. I also wonder if we should update the ConstantHoisting cost model and block folding for "opaque" constants. But of course constants can still be created by DAG combine and lowering optimizations.

  Fixes PR27202.

  Reviewers: spatel, RKSimon, andreadb
  Reviewed By: RKSimon
  Subscribers: jsji, hiraditya, jdoerfert, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59909
  llvm-svn: 365163
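  A simplified model of the policy described above (an illustrative sketch, not the actual X86ISelDAGToDAG code):
  ```
  #include <cstddef>

  struct Node {
    bool IsConstant;
    std::size_t NumUses;
  };

  // Under optsize, fold an immediate into the instruction only when this
  // node is its sole user; a shared constant is kept in a register so it
  // can be materialized once and reused by every user.
  bool shouldFoldImmediate(const Node &Imm, bool OptForSize) {
    if (!Imm.IsConstant)
      return false;
    if (OptForSize && Imm.NumUses > 1)
      return false; // select the non-immediate (register) form instead
    return true;
  }
  ```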
* [X86][AVX1] Combine concat_vectors(pshufd(x,c),pshufd(y,c)) -> vpermilps(concat_vectors(x,y),c). (Simon Pilgrim, 2019-07-04; 1 file, -8/+8)
  Bitcast v4i32 to v8f32 and back again - it might be worth adding isel patterns for X86PShufd v8i32 on AVX1 targets like we did for X86Blendi, to avoid the bitcasts? llvm-svn: 365125
* [X86] Use pointer-sized indices instead of i32 for EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT in a couple of places. (Craig Topper, 2019-07-04; 1 file, -2/+2)
  Most places already did this. llvm-svn: 365109
* [X86] Avoid SFB - Skip meta instructions. (Robert Lougher, 2019-07-03; 1 file, -4/+4)
  This patch generalizes the fix in D61680 to ignore all meta instructions, not just debug info. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D62605 llvm-svn: 365061
* [CodeGen] Make branch funnels pass the machine verifier. (Francis Visoiu Mistrih, 2019-07-03; 1 file, -0/+4)
  We previously marked all the tests with branch funnels as `-verify-machineinstrs=0`. This is an attempt to fix that.

  1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList = (outs)`.
  2) After that we hit an assert:
  ```
  Assertion failed: (Op.getValueType() != MVT::Other && Op.getValueType() != MVT::Glue && "Chain and glue operands should occur at end of operand list!"), function AddOperand, file /Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp, line 461.
  ```
  The chain operand was added at the beginning of the operand list; move it to the end.
  3) After that we hit another verifier issue in the pseudo expansion, where the registers used in the cmps and jmps are not added to the live-in lists. Add `EFLAGS` to all the new MBBs that we create.

  PR39436
  Differential Revision: https://reviews.llvm.org/D54155
  llvm-svn: 365058
* [X86] ComputeNumSignBitsForTargetNode - add target shuffle support. (Simon Pilgrim, 2019-07-03; 1 file, -1/+50)
  llvm-svn: 365057
* [X86][AVX] combineX86ShufflesRecursively - peek through extract_subvector. (Simon Pilgrim, 2019-07-03; 1 file, -20/+25)
  If we have more than 2 shuffle ops to combine, try to use combineX86ShuffleChainWithExtract to see if some are from the same super vector. llvm-svn: 365050
* [X86][AVX] Combine vpermi(bitcast(x)) -> bitcast(vpermi(x)). (Simon Pilgrim, 2019-07-03; 1 file, -0/+16)
  Applies iff the number of elements doesn't change. This gets around an issue with combineX86ShuffleChain not being able to hint which domain is preferred for shuffles that can be done with either. Fixes regression introduced in rL365041. llvm-svn: 365044
* [X86][AVX] combineX86ShuffleChainWithExtract - add number of non-zero extract_subvectors to the combine depth. (Simon Pilgrim, 2019-07-03; 1 file, -0/+3)
  This better accounts for the cost/benefit of removing extract_subvectors from the shuffle and will be more useful in future patches. The vpermq predicate regression will be fixed shortly. llvm-svn: 365041
* [X86][SSE] lowerUINT_TO_FP_v2i32 - explicitly cast half word to double. (Simon Pilgrim, 2019-07-03; 1 file, -1/+1)
  Fixes MSVC analyzer extension->double warning. llvm-svn: 365027
* [X86][SSE] LowerINSERT_VECTOR_ELT - ensure insertion index correctness. NFCI. (Simon Pilgrim, 2019-07-03; 1 file, -1/+2)
  Assert that the insertion index is in range, and use uint64_t for the index to fix an MSVC/cppcheck truncation warning. llvm-svn: 365025
* [X86][SSE] LowerScalarImmediateShift - ensure shift amount correctness. NFCI. (Simon Pilgrim, 2019-07-03; 1 file, -2/+4)
  Assert that the shift amount is in range, and create vXi8 shift masks in a way that doesn't trigger MSVC/cppcheck "shift result truncated then extended" warnings. llvm-svn: 365024
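  The warning pattern in miniature (an assumed illustration of the diagnostic class, not the actual change):
  ```
  #include <cstdint>

  // 0xFF << Amt promotes to int, overflows 8 bits, and is then truncated on
  // the way back to uint8_t - this is what MSVC/cppcheck complain about.
  uint8_t shiftMaskNoisy(unsigned Amt) {
    return uint8_t(0xFF << Amt);
  }

  // Masking before the explicit narrowing keeps every intermediate value in
  // range, which silences the truncate-then-extend diagnostics.
  uint8_t shiftMaskQuiet(unsigned Amt) {
    return static_cast<uint8_t>((0xFFu << Amt) & 0xFFu);
  }
  ```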
* Fix uninitialized variable warnings. NFCI. (Simon Pilgrim, 2019-07-03; 1 file, -1/+1)
  Both MSVC and cppcheck don't like the fact that the variables are initialized via references. llvm-svn: 365018
* [X86] LowerFunnelShift - use modulo constant shift amount. (Simon Pilgrim, 2019-07-03; 1 file, -1/+1)
  This avoids the use of getZExtValue and uses the modulo shift amount, which is what's expected for funnel shifts anyhow. llvm-svn: 365016
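  A scalar model of the funnel-shift semantics being relied on (a sketch only; ISD::FSHL/FSHR define the shift amount modulo the bit width):
  ```
  #include <cassert>
  #include <cstdint>

  // fshl(Hi, Lo, Amt): shift the 64-bit concatenation Hi:Lo left by
  // Amt mod 32 and keep the high 32 bits.
  uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
    Amt %= 32; // funnel shifts take the amount modulo the bit width
    if (Amt == 0)
      return Hi; // avoids the out-of-range 'Lo >> 32' below
    return (Hi << Amt) | (Lo >> (32 - Amt));
  }

  int main() {
    assert(fshl32(0x12345678, 0x9ABCDEF0, 8) == 0x3456789A);
    assert(fshl32(0x12345678, 0x9ABCDEF0, 40) == 0x3456789A); // 40 % 32 == 8
  }
  ```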
* [X86] Add a DAG combine for turning *_extend_vector_inreg+load into an appropriate extload if the load isn't volatile. (Craig Topper, 2019-07-02; 3 files, -31/+27)
  Remove the corresponding isel patterns that did the same thing without checking for volatile. This fixes another variation of PR42079. llvm-svn: 364977
* [X86] getTargetConstantBitsFromNode - remove unnecessary getZExtValue() (PR42486). (Simon Pilgrim, 2019-07-02; 1 file, -2/+1)
  Don't use APInt::getZExtValue() if you can avoid it - eventually someone will call it with i128 or something else that doesn't fit into 64 bits. In this case it was completely superfluous, as we'd moved the rest of the code to always use APInt. Fixes the <1 x i128> addition bug in PR42486. llvm-svn: 364953
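  The hazard in one line (a sketch against the documented APInt API; isAllOnes is the current spelling, older trees use isAllOnesValue):
  ```
  #include "llvm/ADT/APInt.h"

  // getZExtValue() asserts when the value needs more than 64 bits, so a
  // <1 x i128> element would trip it at runtime.
  bool isAllOnesElt(const llvm::APInt &Elt) {
    // return Elt.getZExtValue() == ~0ULL;  // asserts for 128-bit values
    return Elt.isAllOnes();                 // width-agnostic APInt query
  }
  ```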
* [X86] Add patterns to select (scalar_to_vector (loadf32)) as (V)MOVSSrm instead of COPY_TO_REGCLASS + (V)MOVSSrm_alt. Similar for (V)MOVSD. (Craig Topper, 2019-07-02; 2 files, -9/+24)
  Ultimately, I'd like to see about folding scalar_to_vector+load to vzload, which would select as (V)MOVSSrm, so this is closer to that. llvm-svn: 364948
* [X86][AVX] combineX86ShuffleChain - pull out CombineShuffleWithExtract lambda. NFCI. (Simon Pilgrim, 2019-07-02; 1 file, -105/+116)
  Pull out the CombineShuffleWithExtract lambda into a new combineX86ShuffleChainWithExtract wrapper and refactor it to handle more than 2 shuffle inputs - this will allow combineX86ShufflesRecursively to call it in a future patch. llvm-svn: 364924
* [X86] resolveTargetShuffleInputsAndMask - add repeated input handling. (Simon Pilgrim, 2019-07-02; 1 file, -7/+22)
  We were relying on combineX86ShufflesRecursively to handle this - this patch gets it done earlier, which should make it easier for other code to use resolveTargetShuffleInputsAndMask. llvm-svn: 364906
* [X86] Add PreprocessISelDAG support for turning ISD::FP_TO_SINT/UINT into X86ISD::CVTTP2SI/CVTTP2UI and to reduce the number of isel patterns. (Craig Topper, 2019-07-02; 3 files, -131/+30)
  llvm-svn: 364887
* [X86] Use v4i32 vzloads instead of v2i64 for vpmovzx/vpmovsx patterns where only 32 bits are loaded. (Craig Topper, 2019-07-01; 3 files, -9/+7)
  A v2i64 vzload defines a 64-bit memory access. It doesn't look like we have any coverage for this either way. Also remove some vzload usages where the instruction loads only 16 bits. llvm-svn: 364851
* [X86] Remove several bad load-folding isel patterns for VPMOVZX/VPMOVSX. (Craig Topper, 2019-07-01; 2 files, -12/+0)
  These patterns all matched a v2i64 vzload, which only loads 64 bits, to instructions that load a full 128 bits. llvm-svn: 364847
* [X86] Correct v4f32->v2i64 cvt(t)ps2(u)qq memory isel patterns. (Craig Topper, 2019-07-01; 2 files, -2/+93)
  These instructions only read 64 bits of memory, so we shouldn't allow a full-vector-width load to be pattern-matched in case it is marked volatile. Instead allow vzload or scalar_to_vector+load. Also add a DAG combine to turn full vector loads into vzload when used by one of these instructions, if the load isn't volatile. This fixes another case of PR42079. llvm-svn: 364838
* [X86] Avoid SFB - Fix inconsistent codegen with/without debug info (2). (Robert Lougher, 2019-07-01; 1 file, -0/+4)
  The function findPotentialBlockers may consider debug info instructions as potential blockers and may stop searching for a store-load pair prematurely. This patch corrects this and tests the cases where the store is separated from the load by more than InspectionLimit debug instructions. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D62408 llvm-svn: 364829
* [X86] Add widenSubVector-to-size-in-bits helper. NFCI. (Simon Pilgrim, 2019-07-01; 1 file, -4/+16)
  We can already widenSubVector to a specific type (of the same scalar type) - this variant just specifies the target vector size. This will be useful when CombineShuffleWithExtract relaxes the need to have the same scalar type for all shuffle operand subvector sources. llvm-svn: 364803
* [X86] CombineShuffleWithExtract - updated description comments. NFCI. (Simon Pilgrim, 2019-07-01; 1 file, -4/+4)
  CombineShuffleWithExtract no longer requires that both shuffle ops are extract_subvectors, of the same type, or of the same size. llvm-svn: 364745
* [X86] Improve the type checking in fast-isel handling of vector bitcasts. (Craig Topper, 2019-07-01; 1 file, -13/+8)
  We had a bunch of vector size legality checks for the source type based on feature flags, but we didn't check the destination type at all beyond ensuring that it was a "simple" type. This allowed the destination to be i128, which isn't legal. This commit changes the code to use TLI's isTypeLegal logic in place of all the subtarget checks, and additionally checks that the source and dest are vectors. Fixes PR42452. llvm-svn: 364729
* [X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload, but only when the load isn't volatile. (Craig Topper, 2019-07-01; 2 files, -0/+44)
  This improves load folding during isel, where we only have vzload and scalar_to_vector+load patterns. We can't have full vector load isel patterns because of the same volatile-load issue. Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns. llvm-svn: 364728
* [X86] Add MOVHPDrm/MOVLPDrm patterns that use VZEXT_LOAD. (Craig Topper, 2019-07-01; 2 files, -0/+18)
  We already had patterns that used scalar_to_vector+load, but we can also have a vzload. Found while investigating combining scalar_to_vector+load to vzload. llvm-svn: 364726
* Cleanup: llvm::bsearch -> llvm::partition_point after r364719. (Fangrui Song, 2019-06-30; 1 file, -2/+2)
  llvm-svn: 364720
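  Usage, for reference (llvm::partition_point in STLExtras.h is the range-based wrapper over std::partition_point; the data here is made up):
  ```
  #include <cassert>
  #include <vector>

  #include "llvm/ADT/STLExtras.h"

  int main() {
    std::vector<int> Sorted = {1, 3, 5, 8, 9};
    // Returns the first element for which the predicate is false.
    auto It = llvm::partition_point(Sorted, [](int V) { return V < 8; });
    assert(It != Sorted.end() && *It == 8);
  }
  ```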
* [X86] Custom lower AVX masked loads to masked load and vselect instead of selecting a maskmov+vblend during isel. (Craig Topper, 2019-06-30; 2 files, -16/+29)
  AVX masked loads only support 0 as the value for masked-off elements, so we need an extra blend to support other values. Previously we expanded the masked load to two instructions with isel patterns. With this patch we now insert the vselect during lowering, and it will be separately selected as a blend. llvm-svn: 364718
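  A scalar model of the lowering's semantics (illustrative only; the AVX maskmov produces 0.0 in masked-off lanes, so a non-zero passthru needs the extra select):
  ```
  #include <array>
  #include <cstddef>

  using V8F32 = std::array<float, 8>;
  using Mask8 = std::array<bool, 8>;

  V8F32 maskedLoadWithPassthru(const float *Ptr, const Mask8 &M,
                               const V8F32 &PassThru) {
    V8F32 Loaded{}; // maskmov: disabled lanes produce 0.0, touch no memory
    for (std::size_t I = 0; I < 8; ++I)
      if (M[I])
        Loaded[I] = Ptr[I];
    V8F32 Res; // vselect(M, Loaded, PassThru), later selected as a blend
    for (std::size_t I = 0; I < 8; ++I)
      Res[I] = M[I] ? Loaded[I] : PassThru[I];
    return Res;
  }
  ```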
* [x86] remove stale comment about cmov; NFC (Sanjay Patel, 2019-06-28; 1 file, -2/+1)
  The cmov node used to sometimes return a glue result (and that's what 'flag' meant in this context), but that was removed with D38664. llvm-svn: 364687
* [X86] CombineShuffleWithExtract - recurse through EXTRACT_SUBVECTOR chain. (Simon Pilgrim, 2019-06-28; 1 file, -9/+9)
  llvm-svn: 364667
* [X86] CombineShuffleWithExtract - only require 1 source to be EXTRACT_SUBVECTOR. (Simon Pilgrim, 2019-06-28; 1 file, -8/+16)
  We were requiring that both shuffle operands were EXTRACT_SUBVECTORs, but we can relax this to only require one of them to be. Also, we shouldn't bother attempting this if both operands are from the lowest subvector (or not EXTRACT_SUBVECTOR at all). llvm-svn: 364644