path: root/llvm/lib/Target/X86
Commit message | Author | Date | Files | Lines
* [X86] In PostprocessISelDAG, start from allnodes_end, not the root. | Craig Topper | 2018-10-19 | 1 | -2/+1
  There is no guarantee the root is at the end if isel created any nodes without morphing them. This includes the nodes created by manual isel from C++ code in X86ISelDAGToDAG. This is similar to r333415 from PowerPC which is where I originally stole the peephole loop from. I don't have a test case, but without this a future patch doesn't work which is how I found it.
  llvm-svn: 344808
* [X86] Support for the mno-tls-direct-seg-refs flag | Kristina Brooks | 2018-10-18 | 1 | -0/+6
  Allows disabling direct TLS segment access (%fs or %gs). GCC supports a similar flag; it can be useful in some circumstances, e.g. when a thread context block needs to be updated directly from user space. More info and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145

  There is another revision for clang as well. Related: D53102

  All X86 CodeGen tests appear to pass:
  ```
  [46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen
  Testing Time: 23.17s
    Expected Passes    : 3801
    Expected Failures  : 15
    Unsupported Tests  : 8021
  ```

  Reviewed by: Craig Topper. Patch by nruslan (Ruslan Nikolaev).
  Differential Revision: https://reviews.llvm.org/D53103
  llvm-svn: 344723
* [X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST. | Craig Topper | 2018-10-16 | 1 | -15/+32
  Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag. This recovers a case lost by r344270.
  Differential Revision: https://reviews.llvm.org/D53310
  llvm-svn: 344649
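As an aside (an illustration added here, not part of the commit message), the shape of the matched pattern can be sketched in scalar C++; `bextr_model` is a made-up helper for the example, not an LLVM API or intrinsic:

```
#include <cassert>
#include <cstdint>

// Scalar model of x86 BEXTR: extract 'len' bits of 'src' starting at bit 'start'.
static uint64_t bextr_model(uint64_t src, unsigned start, unsigned len) {
  return (src >> start) & (len < 64 ? (1ULL << len) - 1 : ~0ULL);
}

int main() {
  // (cmp (and (shr X, C), mask), 0) corresponds to source like ((x >> C) & mask) == 0.
  // When 'mask' is a run of low ones, the whole expression is a BEXTR whose
  // result only needs a zero test, hence BEXTR+TEST.
  const unsigned C = 5, len = 7;
  const uint64_t mask = (1ULL << len) - 1;
  const uint64_t tests[] = {0, 0x1F, 0xABCD, ~0ULL};
  for (uint64_t x : tests) {
    bool viaShiftAndMask = ((x >> C) & mask) == 0;
    bool viaBextr = bextr_model(x, C, len) == 0;
    assert(viaShiftAndMask == viaBextr);
  }
  return 0;
}
```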
* [X86] Fix Skylake ReadAfterLd for PADDrm etc. | Simon Pilgrim | 2018-10-16 | 2 | -4/+8
  Missed in rL343868 due to their custom InstrRW.
  llvm-svn: 344600
* [X86] Remove some isel patterns that shouldn't be possible. | Craig Topper | 2018-10-15 | 2 | -6/+0
  These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast.
  llvm-svn: 344573
* [X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions. | Craig Topper | 2018-10-15 | 1 | -9/+10
  llvm-svn: 344563
* [TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. | Chandler Carruth | 2018-10-15 | 1 | -1/+1
  This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked).
  llvm-svn: 344502
* [X86] Move promotion of vector and/or/xor from legalization to DAG combine | Craig Topper | 2018-10-15 | 1 | -17/+34
  Summary: I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them, making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal.

  I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues, one around matching AND, ANDN, OR into VSELECT: I had to create the ANDN as vXi64 while the other nodes hadn't legalized yet, and I didn't look too hard at fixing that.

  This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better.

  In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover.

  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D53107
  llvm-svn: 344487
* [X86] Add 128 MOVDDUP to the constant pool printing in X86AsmPrinter::EmitInstruction. | Craig Topper | 2018-10-15 | 1 | -0/+6
  We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector.
  llvm-svn: 344486
* [X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 shuffle lowering | Simon Pilgrim | 2018-10-14 | 1 | -0/+10
  Extends D53148 from v4f64 now that we have test coverage for v16i16/v32i8 shuffles.
  llvm-svn: 344481
* recommit 344472 after fixing build failure on ARM and PPC. | Dorit Nuzman | 2018-10-14 | 2 | -9/+23
  llvm-svn: 344475
* revert 344472 due to failures. | Dorit Nuzman | 2018-10-14 | 2 | -23/+9
  llvm-svn: 344473
* [IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group | Dorit Nuzman | 2018-10-14 | 2 | -9/+23
  The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization.

  This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles.

  Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
  Reviewed By: Ayal
  Differential Revision: https://reviews.llvm.org/D53011
  llvm-svn: 344472
* [X86] Fix bad indentation. NFC | Craig Topper | 2018-10-14 | 1 | -1/+1
  llvm-svn: 344471
* [X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64, extracting f64 and storing. | Craig Topper | 2018-10-14 | 1 | -13/+34
  Summary: This is similar to what D52528 did for loads. It should match what generic type legalization does in 64-bit mode where it uses a v2i64 cast and an i64 store.

  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D53173
  llvm-svn: 344470
* Move some helpers from the global namespace into anonymous ones. | Benjamin Kramer | 2018-10-13 | 1 | -0/+2
  llvm-svn: 344468
* [X86][SSE] Remove most of vector CTTZ custom lowering and use LegalizeDAG instead. | Simon Pilgrim | 2018-10-13 | 1 | -28/+7
  There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM.

  I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen.
  llvm-svn: 344457
* [X86][SSE] Begin removing vector CTTZ custom lowering and use LegalizeDAG instead. | Simon Pilgrim | 2018-10-13 | 1 | -9/+8
  Adds CTTZ vector legalization support and begins the removal of the X86/SSE custom lowering.
  llvm-svn: 344453
* [X86][SSE] combineIncDecVector - use isConstantSplat | Simon Pilgrim | 2018-10-13 | 1 | -3/+1
  Use isConstantSplat instead of ISD::isConstantSplatVector to let us peek through to illegal types (in this case for i686 targets to recognise i64 constants).
  llvm-svn: 344452
* [X86] Pull out target constant splat helper function. NFCI. | Simon Pilgrim | 2018-10-13 | 1 | -17/+27
  The code in LowerScalarImmediateShift is just a more powerful version of ISD::isConstantSplatVector.
  llvm-svn: 344451
* Pull out repeated getOperand(). NFCI. | Simon Pilgrim | 2018-10-13 | 1 | -3/+2
  llvm-svn: 344450
* Remove unused variable. NFCI. | Simon Pilgrim | 2018-10-13 | 1 | -1/+0
  llvm-svn: 344449
* [X86][SSE] Improve CTTZ lowering when CTLZ is legal | Simon Pilgrim | 2018-10-13 | 1 | -11/+13
  If we have better CTLZ support than CTPOP, then use cttz(x) = width - ctlz(~x & (x - 1)), and remove the CTTZ_ZERO_UNDEF handling as it no longer gives better codegen.

  Similar to rL344447, this is also closer to LegalizeDAG's approach.
  llvm-svn: 344448
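A scalar sanity check of that identity (an illustration added here, not part of the commit; it uses the GCC/Clang `__builtin_*` bit-count builtins and made-up helper names):

```
#include <cassert>
#include <cstdint>

// Define ctlz(0) = 32, matching the non-undef CTLZ semantics.
static unsigned ctlz32(uint32_t v) {
  return v ? static_cast<unsigned>(__builtin_clz(v)) : 32u;
}

// cttz(x) = width - ctlz(~x & (x - 1)); ~x & (x - 1) turns the trailing zeros
// of x into a block of low ones.
static unsigned cttz_via_ctlz(uint32_t x) {
  return 32u - ctlz32(~x & (x - 1u));
}

int main() {
  const uint32_t tests[] = {1u, 2u, 0x80u, 0x12345678u, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t x : tests)
    assert(cttz_via_ctlz(x) == static_cast<unsigned>(__builtin_ctz(x)));
  return 0;  // For x == 0 the formula yields 32, the defined-at-zero CTTZ result.
}
```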
* [X86][SSE] Change CTTZ vector lowering to cttz(x) = ctpop(~x & (x - 1)) | Simon Pilgrim | 2018-10-13 | 1 | -8/+12
  This patch changes the vector CTTZ lowering from:
  cttz(x) = ctpop((x & -x) - 1)
  to:
  cttz(x) = ctpop(~x & (x - 1))

  Not only does this make better use of the PANDN instruction, but it also matches the LegalizeDAG method which should allow us to remove the x86 specific code at some point in the future (we need to fix some issues with the bitcasted logic ops and CTPOP lowering first).

  Differential Revision: https://reviews.llvm.org/D53214
  llvm-svn: 344447
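Both forms compute the same value; a quick scalar check of the old and new formulas (an illustration added here, not part of the commit, using the GCC/Clang builtins):

```
#include <cassert>
#include <cstdint>

// Old lowering: cttz(x) = ctpop((x & -x) - 1); x & -x isolates the lowest set bit.
static unsigned cttz_old(uint32_t x) {
  return static_cast<unsigned>(__builtin_popcount((x & (0u - x)) - 1u));
}

// New lowering: cttz(x) = ctpop(~x & (x - 1)); the AND-NOT maps onto PANDN.
static unsigned cttz_new(uint32_t x) {
  return static_cast<unsigned>(__builtin_popcount(~x & (x - 1u)));
}

int main() {
  const uint32_t tests[] = {1u, 2u, 0x40u, 0x12345678u, 0x80000000u, 0xFFFFFFFEu};
  for (uint32_t x : tests) {
    unsigned expected = static_cast<unsigned>(__builtin_ctz(x));
    assert(cttz_old(x) == expected);
    assert(cttz_new(x) == expected);
  }
  return 0;
}
```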
* [X86][AVX] Add lowerVectorShuffleAsLanePermuteAndPermute for v4f64 shuffles (PR39161) | Simon Pilgrim | 2018-10-13 | 1 | -0/+60
  Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute.

  This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161 - there is no test coverage for other shuffles that might benefit yet.

  We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there are a lot of assertions/assumptions in the way that make this tricky - I ended up going for adding yet another relatively simple method instead.

  Differential Revision: https://reviews.llvm.org/D53148
  llvm-svn: 344446
* [X86] Improve type legalization of (v2i32/v4i16/v8i8 (bitcast (v2f32))) to avoid a stack temporary. | Craig Topper | 2018-10-12 | 1 | -7/+13
  llvm-svn: 344425
* [X86] Simplify the end of custom type legalization for (v2i32/v4i16/v8i8 (bitcast (f64))) by just emitting an EXTRACT_SUBVECTOR instead of a BUILD_VECTOR. | Craig Topper | 2018-10-12 | 1 | -7/+3
  Generic legalization should be able to finish legalizing the EXTRACT_SUBVECTOR, probably by turning it into a BUILD_VECTOR. But we should emit the simplest sequence.
  llvm-svn: 344424
* [X86] Skip (v2i32/v4i16/v8i8 (bitcast (f64))) handling in ReplaceNodeResults if the dest type can be widened by generic legalization. NFCI | Craig Topper | 2018-10-12 | 1 | -8/+2
  The algorithm we would do previously was identical to generic legalization. If we ever switch to legalizing integer vectors via widening we'll be able to kill off the code since it now only runs for promotion.
  llvm-svn: 344423
* [x86] add and use fast horizontal vector math subtarget feature | Sanjay Patel | 2018-10-12 | 3 | -7/+31
  This is the planned follow-up to D52997. Here we are reducing horizontal vector math codegen by default. AMD Jaguar (btver2) should have no difference with this patch because it has fast-hops. (If we want to set that bit for other CPUs, let me know.)

  The code changes are small, but there are many test diffs. For files that are specifically testing for hops, I added RUNs to distinguish fast/slow, so we can see the consequences side-by-side. For files that are primarily concerned with codegen other than hops, I just updated the CHECK lines to reflect the new default codegen.

  To recap the recent horizontal op story:
  1. Before rL343727, we were producing hops for all subtargets for a variety of patterns. Hops were likely not optimal for all targets though.
  2. The IR improvement in r343727 exposed a hole in the backend hop pattern matching, so we reduced hop codegen for all subtargets. That was bad for Jaguar (PR39195).
  3. We restored the hop codegen for all targets with rL344141. Good for Jaguar, but probably bad for other CPUs.
  4. This patch allows us to distinguish when we want to produce hops, so everyone can be happy. I'm not sure if we have the best predicate here, but the intent is to undo the extra hop-iness that was enabled by r344141.

  Differential Revision: https://reviews.llvm.org/D53095
  llvm-svn: 344361
* Fix unused variable warning after r344348 | Eric Liu | 2018-10-12 | 1 | -0/+1
  llvm-svn: 344350
* [X86][SSE] LowerVectorCTPOP - pull out repeated byte sum stage. | Simon Pilgrim | 2018-10-12 | 1 | -51/+30
  Pull out repeated byte sum stage for popcount of vector elements > 8 bits. This allows us to simplify the LUT/BITMATH popcnt code to always assume vXi8 vectors, and also improves avx512bitalg codegen which only has access to vpopcntb/vpopcntw.
  llvm-svn: 344348
* [X86][SSE] Add extract_subvector(PSHUFB) -> PSHUFB(extract_subvector()) combine | Simon Pilgrim | 2018-10-12 | 1 | -0/+12
  Fixes PR32160 by reducing the size of PSHUFB if we only use one of the lanes.

  This approach can probably be generalized to handle any target shuffle (and any subvector index) but we have no test coverage at the moment.
  llvm-svn: 344336
* [tblgen][llvm-mca] Add the ability to describe move elimination candidates via tablegen. | Andrea Di Biagio | 2018-10-12 | 1 | -2/+32
  This patch adds the ability to identify instructions that are "move elimination candidates". It also allows scheduling models to describe processor register files that allow move elimination.

  A move elimination candidate is an instruction that can be eliminated at register renaming stage. Each subtarget can specify which instructions are move elimination candidates with the help of tablegen class "IsOptimizableRegisterMove" (see llvm/Target/TargetInstrPredicate.td). For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated. The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this:
  ```
  def : IsOptimizableRegisterMove<[
    InstructionEquivalenceClass<[
      // GPR variants.
      MOV32rr, MOV64rr,
      // MMX variants.
      MMX_MOVQ64rr,
      // SSE variants.
      MOVAPSrr, MOVUPSrr, MOVAPDrr, MOVUPDrr, MOVDQArr, MOVDQUrr,
      // AVX variants.
      VMOVAPSrr, VMOVUPSrr, VMOVAPDrr, VMOVUPDrr, VMOVDQArr, VMOVDQUrr
    ], CheckNot<CheckSameRegOperand<0, 1>> >
  ]>;
  ```
  Definitions of IsOptimizableRegisterMove from processor models of the same Target are processed by the SubtargetEmitter to auto-generate a target-specific override for each of the following predicate methods:
  ```
  bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI) const;
  bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned CPUID) const;
  ```
  By default, those methods return false (i.e. conservatively assume that there are no move elimination candidates).

  Tablegen class RegisterFile has been extended with the following information:
  - The set of register classes that allow move elimination.
  - Maximum number of moves that can be eliminated every cycle.
  - Whether move elimination is restricted to moves from registers that are known to be zero.

  This patch is structured in three parts: A first part (which is mostly boilerplate) adds the new 'isOptimizableRegisterMove' target hooks, and extends existing register file descriptors in MC by introducing new fields to describe properties related to move elimination. A second part uses the new tablegen constructs to describe move elimination in the BtVer2 scheduling model. A third part teaches llvm-mca how to query the new 'isOptimizableRegisterMove' hook to mark instructions that are candidates for move elimination. It also teaches class RegisterFile how to describe constraints on move elimination at PRF granularity.

  llvm-mca tests for btver2 show differences before/after this patch.

  Differential Revision: https://reviews.llvm.org/D53134
  llvm-svn: 344334
* [X86] Ignore float/double non-temporal loads (PR39256) | Simon Pilgrim | 2018-10-12 | 1 | -0/+3
  Scalar non-temporal loads were asserting instead of just being ignored. Reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10895
  llvm-svn: 344331
* X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free | Matthias Braun | 2018-10-11 | 1 | -1/+5
  DIV/REM by constants should always be expanded into mul/shift/etc. patterns. Unfortunately the ConstantHoisting pass runs too early, at a point where the pattern isn't expanded yet. However, once ConstantHoisting has hoisted some immediate, the result may not expand anymore. The hoisting also typically doesn't make sense because it operates on immediates that will change completely during the expansion.

  Report DIV/REM as TCC_Free so ConstantHoisting will not touch them.

  Differential Revision: https://reviews.llvm.org/D53174
  llvm-svn: 344315
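To make the expansion concrete, here is a minimal scalar sketch (an illustration added here, not from the commit) of the multiply+shift form that division by a constant is expected to become; the magic constant shown is the standard one for unsigned 32-bit division by 10:

```
#include <cassert>
#include <cstdint>

int main() {
  // Unsigned 32-bit division by the constant 10 can be expanded to a multiply
  // by the fixed-point reciprocal 0xCCCCCCCD followed by a right shift of 35.
  const uint32_t tests[] = {0u, 9u, 10u, 123456789u, 0xFFFFFFFFu};
  for (uint32_t x : tests) {
    uint32_t viaDiv = x / 10u;
    uint32_t viaMulShift =
        static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xCCCCCCCDull) >> 35);
    assert(viaDiv == viaMulShift);
  }
  // Hoisting the "10" out as a generic constant would block this expansion,
  // which is why the immediate is reported as TCC_Free.
  return 0;
}
```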
* Inline variable into assert to avoid unused variable warning. | Richard Trieu | 2018-10-11 | 1 | -2/+1
  llvm-svn: 344308
* [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector. | Craig Topper | 2018-10-11 | 1 | -0/+24
  On 64-bit targets the generic legalize will use an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal and the generic legalizer will end up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together with pretty good success. This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency and to keep the load in the fp domain.

  There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen. And that the 64-bit mode code could be better. I think those issues should be looked at separately.

  Differential Revision: https://reviews.llvm.org/D52528
  llvm-svn: 344291
* [X86] Restore X86ISelDAGToDAG::matchBEXTRFromAnd. Teach address matching to create a BEXTR pattern from a (shl (and X, mask >> C1) if C1 can be folded into addressing mode. | Craig Topper | 2018-10-11 | 3 | -80/+152
  This is an alternative to D53080 since I think using a BEXTR for a shifted mask is definitely an improvement when the shl can be absorbed into addressing mode. The other cases I'm less sure about.

  We already have several tricks for handling an and of a shift in address matching. This adds a new case for BEXTR.

  I've moved the BEXTR matching code back to X86ISelDAGToDAG to allow it to match. I suppose alternatively we could directly emit a X86ISD::BEXTR node that isel could pattern match. But I'm trying to view BEXTR matching as an isel concern so DAG combine can see 'and' and 'shift' operations that are well understood. We did lose a couple cases from tbm_patterns.ll, but I think there are ways to recover that.

  I've also put back the manual load folding code in matchBEXTRFromAnd that I removed a few months ago in r324939. This gives us some more freedom to make decisions based on the ability to fold a load. I haven't done anything with that yet.

  Differential Revision: https://reviews.llvm.org/D53126
  llvm-svn: 344270
* [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern | Roman Lebedev | 2018-10-11 | 1 | -0/+83
  Summary: As discussed in D48491, we can't really do this in the TableGen, since we need to produce *two* instructions. This only implements one single pattern. The other 3 patterns will be in follow-ups.

  I'm not sure yet if we want to also fuse shift into here (i.e `(x >> start) & ...`)

  Reviewers: RKSimon, craig.topper, spatel
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D52304
  llvm-svn: 344224
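For reference, the pattern in question just builds a low-bit mask on the fly; a scalar sketch follows (an illustration added here, not part of the commit; `keep_low_bits` is a made-up helper, not an intrinsic):

```
#include <cassert>
#include <cstdint>

// What BEXTR computes for start = 0 and length = nbits:
// keep the low 'nbits' bits of x and clear the rest.
static uint64_t keep_low_bits(uint64_t x, unsigned nbits) {
  return nbits < 64 ? x & ((1ULL << nbits) - 1) : x;
}

int main() {
  // x & ~(-1 << nbits) builds the same low-bit mask without a separate mask
  // constant, which is what gets selected to BEXTR here. An unsigned all-ones
  // value stands in for the -1 to keep the shift well defined in C++.
  const unsigned widths[] = {0u, 1u, 13u, 31u, 63u};
  const uint64_t tests[] = {0, 0xDEADBEEF, ~0ULL};
  for (unsigned nbits : widths) {
    for (uint64_t x : tests) {
      uint64_t viaMaskPattern = x & ~(~0ULL << nbits);
      assert(viaMaskPattern == keep_low_bits(x, nbits));
    }
  }
  return 0;
}
```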
* [X86] Prevent non-temporal loads from folding into instructions by blocking them in X86DAGToDAGISel::IsProfitableToFold rather than with a predicate. | Craig Topper | 2018-10-10 | 2 | -54/+31
  Remove tryFoldVecLoad since tryFoldLoad would call IsProfitableToFold and pick up the new check.

  This saves about 5K out of ~600K on the generated isel table.
  llvm-svn: 344189
* [X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering | Roman Lebedev | 2018-10-10 | 2 | -66/+66
  Summary: As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=38938 | PR38938 ]], we fail to emit `BEXTR` if the mask is shifted. We can't deal with that in `X86DAGToDAGISel` `before the address mode for the inc is selected`, and we can't really do it in the normal DAGCombine, because we don't have generic `ISD::BitFieldExtract` node, and if we simply turn the shifted mask into a normal mask + shift-left, it will be folded back.

  So it would seem X86ISelLowering is the place to handle this. This patch only moves the matchBEXTRFromAnd() from X86DAGToDAGISel to X86ISelLowering. It does not add support for the 'shifted mask' pattern.

  Reviewers: RKSimon, craig.topper, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52426
  llvm-svn: 344179
* [x86] allow single source horizontal op matching (PR39195) | Sanjay Patel | 2018-10-10 | 1 | -2/+6
  This is intended to restore horizontal codegen to what it looked like before IR demanded elements improved in rL343727.

  As noted in PR39195 (https://bugs.llvm.org/show_bug.cgi?id=39195), horizontal ops can be worse for performance than a shuffle+regular binop, so I've added a TODO. Ideally, we'd solve that in a machine instruction pass, but a quicker solution will be adding a 'HasFastHorizontalOp' feature bit to deal with it here in the DAG.

  Differential Revision: https://reviews.llvm.org/D52997
  llvm-svn: 344141
* [TargetLowering] Add root node back to work list after successful SimplifyDemandedBits/SimplifyDemandedVectorElts | Simon Pilgrim | 2018-10-10 | 1 | -14/+2
  Similar to what already happens in the DAGCombiner wrappers, this patch adds the root nodes back onto the worklist if the DCI wrappers' SimplifyDemandedBits/SimplifyDemandedVectorElts were successful.

  Differential Revision: https://reviews.llvm.org/D53026
  llvm-svn: 344132
* [X86] Remove FeatureRTM from Skylake processor list | Craig Topper | 2018-10-10 | 1 | -1/+0
  Summary: There are a LOT of Skylakes and later without TSX-NI. Examples:
  - SKL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3-20-GHz-
  - KBL: https://ark.intel.com/products/97540/Intel-Core-i7-7560U-Processor-4M-Cache-up-to-3-80-GHz-
  - KBL-R: https://ark.intel.com/products/149091/Intel-Core-i7-8565U-Processor-8M-Cache-up-to-4-60-GHz-
  - CNL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3_20-GHz

  This feature seems to be present only on high-end desktop and server chips (I can't find any SKX without). This commit leaves it disabled for all processors, but can be re-enabled for specific builds with -mrtm.

  Patch by Thiago Macieira

  Reviewers: erichkeane, craig.topper
  Reviewed By: craig.topper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D53041
  llvm-svn: 344116
* [X86] Fix sanitizer bot failure from 344085 | Rong Xu | 2018-10-09 | 1 | -2/+3
  Fix the memory issue exposed by sanitizer.
  llvm-svn: 344092
* Recommit r343993: [X86] condition branches folding for three-way conditional codes | Rong Xu | 2018-10-09 | 6 | -0/+601
  Fix the memory issue exposed by sanitizer.
  llvm-svn: 344085
* [X86] When lowering unsigned v2i64 setcc without SSE42, flip the sign bits in the v2i64 type then bitcast to v4i32. | Craig Topper | 2018-10-09 | 1 | -10/+8
  This may give slightly better opportunities for DAG combine to simplify with the operations before the setcc. It also matches the type the xors will eventually be promoted to anyway, so it saves a legalization step.

  Almost all of the test changes are because our constant pool entry is now v2i64 instead of v4i32 on 64-bit targets. On 32-bit targets getConstant should be emitting a v4i32 build_vector and a v4i32->v2i64 bitcast.

  There are a couple test cases where it appears we now combine a bitwise not with one of these xors which caused a new constant vector to be generated. This prevented a constant pool entry from being shared. But if that's an issue we're concerned about, it seems we need to address it another way than just relying on a bitcast to hide it.

  This came about from experiments I've been trying with pushing the promotion of and/or/xor to vXi64 later than LegalizeVectorOps where it is today. We run LegalizeVectorOps in a bottom up order. So the and/or/xor are promoted before their users are legalized. The bitcasts added for the promotion act as a barrier to computeKnownBits if we try to use it during vector legalization of a later operation. So by moving the promotion out we can hopefully get better results from computeKnownBits/computeNumSignBits like in LowerTruncate on AVX512. I've also looked at running LegalizeVectorOps in a top down order like LegalizeDAG, but that's showing some other issues.
  llvm-svn: 344071
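The sign-bit flip turns an unsigned compare into a signed one; a scalar sketch of that trick (an illustration added here, not part of the commit):

```
#include <cassert>
#include <cstdint>

int main() {
  // Unsigned a < b is the same as signed (a ^ SIGN) < (b ^ SIGN), where SIGN
  // flips the top bit. This is the identity the lowering relies on when only a
  // signed compare is available. The uint64_t -> int64_t casts assume the
  // usual two's complement representation.
  const uint64_t signBit = 1ULL << 63;
  const uint64_t values[] = {0ULL, 1ULL, 0x7FFFFFFFFFFFFFFFULL, signBit, ~0ULL};
  for (uint64_t a : values) {
    for (uint64_t b : values) {
      bool unsignedLt = a < b;
      bool viaSignFlip =
          static_cast<int64_t>(a ^ signBit) < static_cast<int64_t>(b ^ signBit);
      assert(unsignedLt == viaSignFlip);
    }
  }
  return 0;
}
```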
* [x86] use demanded bits to simplify masked store codegen | Sanjay Patel | 2018-10-09 | 1 | -17/+16
  As noted in D52747, if we prefer IR to use trunc for bool vectors rather than and+icmp, we can expose codegen shortcomings as seen here with masked store. Replace a hard-coded PCMPGT simplification with the more general demanded bits call to improve things.

  Differential Revision: https://reviews.llvm.org/D52964
  llvm-svn: 344048
* [X86][AVX1] Enable *_EXTEND_VECTOR_INREG lowering of 256-bit vectors | Simon Pilgrim | 2018-10-09 | 1 | -15/+34
  As discussed on D52964, this adds 256-bit *_EXTEND_VECTOR_INREG lowering support for AVX1 targets to help improve SimplifyDemandedBits handling.

  Differential Revision: https://reviews.llvm.org/D52980
  llvm-svn: 344019
* [X86] Revert r343993 condition branches folding for three-way conditional codes | Rong Xu | 2018-10-08 | 6 | -601/+0
  Some buildbots failed.
  llvm-svn: 343998