path: root/llvm/lib/Target/X86
* [X86] Specify all TSFlags bit-offsets symbolically (Adam Nemet, 2014-07-14; 1 file, -3/+6)
    No functional change. The offsets for the other bitfields are specified symbolically. I need to increase the size for one of the earlier fields which is easier after this cleanup. Why these bits are relative to VEXShift is a bit strange but that is for another cleanup. I made sure that the values for the enums are unchanged after this change.
    llvm-svn: 213011
* CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF (David Majnemer, 2014-07-14; 4 files, -0/+91)
    COFF lacks a feature that other object file formats support: mergeable sections. To work around this, MSVC sticks constant pool entries in special COMDAT sections so that each constant is in its own section. This permits unused constants to be dropped, and it also allows duplicate constants in different translation units to get merged together.
    This fixes PR20262.
    Differential Revision: http://reviews.llvm.org/D4482
    llvm-svn: 213006
* X86: correct 64-bit atomics on 32-bit (Saleem Abdulrasool, 2014-07-14; 1 file, -12/+8)
    We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was due to the misuse of hasCmpxchg16 to indicate whether cmpxchg8b was supported on a 32-bit target. They were added at different times and would result in the border condition being mishandled.
    This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic operations on x86, at the cost of restoring a long-standing bug in the codegen: we emit a cmpxchg8b on all x86 targets, even where the CPU does not support this instruction (pre-Pentium CPUs). Although that bug should be fixed, it was present prior to SVN r212119 and this change, so this is not really introducing a regression.
    llvm-svn: 212956
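    For illustration, a hedged sketch of the boundary case affected (hypothetical IR; %p is a placeholder, typed-pointer syntax of the era): a 64-bit read-modify-write on a 32-bit x86 target, which after this fix lowers to an inline lock cmpxchg8b loop rather than a libcall.

        ; on i686 this is now selected as a lock cmpxchg8b loop instead of
        ; a call to the __sync_fetch_and_add_8 runtime helper
        %old = atomicrmw add i64* %p, i64 1 seq_cst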
* X86: remove temporary atomicrmw used during lowering. (Tim Northover, 2014-07-14; 1 file, -2/+5)
    We construct a temporary "atomicrmw xchg" instruction when lowering atomic stores for widths that aren't supported natively. This isn't on the top-level worklist though, so it won't be removed automatically and we have to do it ourselves once that itself has been lowered.
    Thanks Saleem for pointing this out!
    llvm-svn: 212948
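    As a hedged sketch (hypothetical IR, not taken from the commit; %p and %v are placeholders), the temporary instruction stands in for an atomic store like this:

        ; an atomic store whose width has no native support ...
        store atomic i64 %v, i64* %p seq_cst, align 8
        ; ... is lowered through a temporary exchange whose result is unused:
        %dead = atomicrmw xchg i64* %p, i64 %v seq_cst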
* MC: make DWARF and Windows unwinding handling more similar (Saleem Abdulrasool, 2014-07-13; 1 file, -2/+2)
    Rename member variables and functions for the MCStreamer for DWARF-like unwinding management. Rename the Windows ones as well and make the naming and handling similar across the two. No functional change intended.
    llvm-svn: 212912
* Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook." (Juergen Ributzka, 2014-07-11; 1 file, -38/+42)
    This reverts commit r212851, because it broke the memset lowering.
    llvm-svn: 212855
* [FastISel][X86] Implement the FastLowerIntrinsicCall hook. (Juergen Ributzka, 2014-07-11; 1 file, -42/+38)
    Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively implements the target hook.
    llvm-svn: 212851
* [X86] Fix the inversion of low and high bits for the lowering of MUL_LOHI. (Quentin Colombet, 2014-07-11; 1 file, -9/+41)
    Also add a few comments.
    <rdar://problem/17581756>
    llvm-svn: 212808
* [X86] AVX512: Improve readability of isCDisp8 (Adam Nemet, 2014-07-11; 1 file, -3/+12)
    No functional change. As I was trying to understand this function, I found that variables were reused with confusing names and the broadcast case was a bit too implicit. Hopefully, this is an improvement.
    llvm-svn: 212795
* [X86] AVX512: Simplify logic in isCDisp8 (Adam Nemet, 2014-07-11; 1 file, -6/+6)
    It was computing the VL/n case as:
        MemObjSize = VectorByteSize / ElemByteSize / Divider * ElemByteSize
    Not only does ElemByteSize fall out, but VectorByteSize / Divider now actually matches the definition of VL/n.
    Also some formatting fixes.
    llvm-svn: 212794
* [X86] Mark pseudo instruction TEST8ri_NOREX as hasSideEffects=0. (Akira Hatanaka, 2014-07-10; 2 files, -2/+5)
    Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable macro-fusion.
    <rdar://problem/15680770>
    llvm-svn: 212747
* [x32] Add AsmBackend for X32 which uses ELF32 with x86_64 (the author is Pavel Chupin). (Zinovy Nis, 2014-07-10; 1 file, -0/+14)
    This is the minimal backend change required to have "hello world" compile and work on the x32 target (x86_64-linux-gnux32). More patches for x32 will follow.
    Differential Revision: http://reviews.llvm.org/D4181
    llvm-svn: 212716
* [x86] Add another combine that is particularly useful for the new vector shuffle lowering: match shuffle patterns equivalent to an unpcklwd or unpckhwd instruction. (Chandler Carruth, 2014-07-10; 1 file, -0/+41)
    This allows us to use generic lowering code for v8i16 shuffles and match the unpack pattern late.
    llvm-svn: 212705
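    For reference, a hedged sketch of the pattern being matched (hypothetical IR; %a and %b are placeholders): a v8i16 shuffle whose mask interleaves the low four lanes of its two inputs is exactly punpcklwd.

        ; interleave the low halves of %a and %b -> punpcklwd
        %r = shufflevector <8 x i16> %a, <8 x i16> %b,
             <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>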
* [x86] Expand the target DAG combining for PSHUFD nodes to be able to combine into half-shuffles through unpack instructions that expand the half to a whole vector without messing with the dword lanes. (Chandler Carruth, 2014-07-10; 1 file, -1/+34)
    This fixes some redundant instructions in splat-like lowerings for v16i8, which are now getting to be *really* nice.
    llvm-svn: 212695
* [x86] Tweak the v16i8 single input special case lowering for shuffles that splat i8s into i16s. (Chandler Carruth, 2014-07-10; 1 file, -34/+44)
    Previously, we would try much too hard to arrange a sequence of i8s in one half of the input such that we could unpack them into i16s and shuffle those into place. This isn't always going to be a cheaper i8 shuffle than our other strategies. The case where it is always going to be cheaper is when we can arrange all the necessary inputs into one half using just i16 shuffles. It happens that viewing the problem this way also makes it much easier to produce an efficient set of shuffles to move the inputs into one half and then unpack them.
    With this, our splat code gets one step closer to being not terrible with the new experimental lowering strategy. It also exposes two missing combines, which I will add next.
    llvm-svn: 212692
* [x86] Initial improvements to the new shuffle lowering for v16i8 shuffles, specifically for cases where a small subset of the elements in the input vector are actually used. (Chandler Carruth, 2014-07-10; 1 file, -10/+36)
    This is specifically targeted at improving the shuffles generated for trunc operations, but also helps out splat-like operations. There is still some really low-hanging fruit here that I want to address, but this is a huge step in the right direction.
    llvm-svn: 212680
* [x86] Refactor some of the new code for lowering v16i8 shuffles to remove duplication and make it easier to select different strategies. (Chandler Carruth, 2014-07-10; 1 file, -17/+17)
    No functionality changed.
    llvm-svn: 212674
* [SDAG] Make the new zext-vector-inreg node default to expand so targets don't need to set it manually. (Chandler Carruth, 2014-07-09; 1 file, -1/+0)
    This is based on feedback from Tom, who pointed out that if every target needs to handle this we need to reach out to those maintainers. In fact, it doesn't make sense to duplicate everything when anything other than expand seems unlikely at this stage.
    llvm-svn: 212661
* TargetRegisterInfo: Remove function that fell out of use years ago. (Benjamin Kramer, 2014-07-09; 2 files, -19/+0)
    llvm-svn: 212636
* [X86] AVX512: Enable it in the Loop Vectorizer (Adam Nemet, 2014-07-09; 1 file, -1/+5)
    This lets us experiment with 512-bit vectorization without passing force-vector-width manually. The code generated for a simple integer memset loop is properly vectorized. Disassembly is still broken for it though :(.
    llvm-svn: 212634
* X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2. (Benjamin Kramer, 2014-07-09; 1 file, -5/+13)
    Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out, as we have to blend two vectors. While there, remove unnecessary cross-lane moves from the shuffles so the backend can lower them to palignr instead of vperm.
    Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2.
    llvm-svn: 212611
* [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening vector types to be legal and a ZERO_EXTEND node is encountered. (Chandler Carruth, 2014-07-09; 1 file, -0/+1)
    When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely.
    This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector.
    It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic.
    There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one.
    However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors.
    Differential Revision: http://reviews.llvm.org/D4405
    llvm-svn: 212610
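    A hedged sketch of the generic expansion described above (hypothetical IR; %in is a placeholder, and little-endian lane numbering is assumed): zero-extending the low four i16 lanes of a v8i16 into a v4i32 by blending with zero and bitcasting.

        ; blend zero into the high half of each future i32 lane ...
        %z = shufflevector <8 x i16> %in, <8 x i16> zeroinitializer,
             <8 x i32> <i32 0, i32 8, i32 1, i32 8, i32 2, i32 8, i32 3, i32 8>
        ; ... then reinterpret the bits: each i32 lane is zext of an i16 lane
        %out = bitcast <8 x i16> %z to <4 x i32>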
* [x86] Initialize a pointer to null to fix a bug in r212602. (Chandler Carruth, 2014-07-09; 1 file, -1/+1)
    This should restore GCC hosts (which happen to put the bad stuff into the pointer) and MSan, etc.
    llvm-svn: 212606
* [x86] Re-apply a variant of the x86 side of r212324 now that the rest has settled without incident, removing the x86-specific and overly strict 'isVectorSplat' routine in favor of generic and more powerful splat detection. (Chandler Carruth, 2014-07-09; 1 file, -74/+70)
    The primary motivation and result of this is that the x86 backend can now see through splats which contain undef elements. This is essential if we are using a widening form of legalization, and I've updated a test case to also run in that mode; before this change the generated code for the test case was completely scalarized.
    This version of the patch much more carefully handles the undef lanes:
    - We aren't overly conservative about them in the shift lowering (where we will never use the splat itself).
    - One place where the splat would have been re-used by the existing code now explicitly constructs a new constant splat that will be safe.
    - The broadcast lowering is much more reasonable with undefs, by doing a correct check of whether the splat is the only user of a loaded value, checking that the splat actually crosses multiple lanes before using a broadcast, and handling broadcasts of non-constant splats.
    As a consequence of the last bullet, the weird usage of vpshufd instead of vbroadcast is gone, and we actually can lower an AVX splat with vbroadcastss where before we emitted a really strange pattern of a vector load and a manual splat across the vector.
    llvm-svn: 212602
* [x86,SDAG] Sink the logic for folding shuffles of splats more aggressively from the x86 shuffle lowering to the generic SDAG vector shuffle formation code. (Chandler Carruth, 2014-07-08; 1 file, -41/+0)
    This code already tried to fold away shuffles of splats! It just had lots of bugs and couldn't handle the case my new x86 shuffle lowering needed.
    First, it failed to correctly compute whether N2 was undef because it pre-computed this, then did transformations which could *make* N2 undef, then failed to ever re-consider the precomputed state.
    Second, it didn't look through bitcasts at all, even in the safe cases where they are just element-type bitcasts with no change to the number of elements.
    Third, it didn't handle all-zero bit casts nicely the way my code in the x86 side of things did, which is essential to getting good zext-shuffle lowerings.
    But all of these are generic. I just ported the code down to this layer and fixed the surrounding bugs. Tests exercising this in the x86 backend still pass and some silly code in widen_cast-6.ll gets better. I updated that test to be a bit more precise, but it's still pretty unclear what the value of the test is in this day and age.
    llvm-svn: 212517
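    A hedged illustration of the fold in question (hypothetical IR; %v is a placeholder): any single-input shuffle of a splat is just the splat again, so the second shuffle below can be folded away entirely.

        ; build a splat of lane 0 of %v
        %splat = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> zeroinitializer
        ; reshuffling a splat changes nothing; this folds to %splat
        %shuf = shufflevector <4 x i32> %splat, <4 x i32> undef,
                <4 x i32> <i32 3, i32 2, i32 1, i32 0>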
* [X86] AVX512: Only allow k1-k7 as predicates to vpcmp* (Adam Nemet, 2014-07-08; 1 file, -7/+7)
    k0 is allowed as a destination, but not as a predicate/writemask. I also modified the test to allow checking of error messages by the assembler. I applied a similar approach to the test ret.s in the same directory.
    llvm-svn: 212504
* [x86] Fix assertion failure caused by a wrong combine of PSHUFD nodes with different types. (Andrea Di Biagio, 2014-07-07; 1 file, -1/+1)
    When combining a sequence of two PSHUFD dag nodes into a single PSHUFD, make sure that we assign the correct type to the resulting PSHUFD. X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.
    Before this change, an assertion failure was triggered in method 'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test below into a single PSHUFD.
        define <4 x float> @test1(<4 x float> %V) {
          %1 = shufflevector <4 x float> %V, <4 x float> undef,
               <4 x i32> <i32 3, i32 0, i32 2, i32 1>
          %2 = shufflevector <4 x float> %1, <4 x float> undef,
               <4 x i32> <i32 3, i32 0, i32 2, i32 1>
          ret <4 x float> %2
        }
    llvm-svn: 212498
* [FastISel][X86] Fix smul.with.overflow.i8 lowering. (Juergen Ributzka, 2014-07-07; 1 file, -3/+19)
    Add custom lowering code for signed multiply instruction selection, because the default FastISel instruction selection for ISD::MUL will use unsigned multiply for the i8 type and signed multiply for all other types. This would set the incorrect flags for the overflow check.
    This fixes <rdar://problem/17549300>
    llvm-svn: 212493
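    For context, a hedged sketch of the intrinsic in question (hypothetical IR; %a and %b are placeholders): the overflow bit must come from a signed multiply, which is why the default unsigned i8 multiply produced the wrong flags.

        declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8)

        %res = call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %a, i8 %b)
        %val = extractvalue { i8, i1 } %res, 0   ; the truncated product
        %ovf = extractvalue { i8, i1 } %res, 1   ; signed-overflow flag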
* [x86] Revert r212324 which was too aggressive w.r.t. allowing undef lanes in vector splats. (Chandler Carruth, 2014-07-07; 1 file, -76/+75)
    The core problem here is that undef lanes can't *unilaterally* be considered to contribute to splats. Their handling needs to be more cautious. There is also a reported failure of the nightly testers (thanks Tobias!) that may well stem from the same core issue. I'm going to fix this theoretical issue, factor the APIs a bit better, and then verify that I don't see anything bad with Tobias's reduction from the test suite before recommitting.
    Original commit message for r212324:
        [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it.
        This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome.
        With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is *dramatically* improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want.
    llvm-svn: 212475
* X86: revert unintentional change to X86FastISel. (Tim Northover, 2014-07-07; 1 file, -1/+1)
    This crept in with r212443.
    llvm-svn: 212459
* [asan] Generate asm instrumentation in MC. (Evgeniy Stepanov, 2014-07-07; 1 file, -63/+308)
    Generate entire ASan asm instrumentation in MC without relying on runtime helper functions.
    Patch by Yuri Gorshenin.
    llvm-svn: 212455
* [x86] Teach the new vector shuffle lowering code to handle what is essentially a DAG combine that never gets a chance to run. (Chandler Carruth, 2014-07-07; 1 file, -0/+41)
    We might typically expect DAG combining to remove shuffles-of-splats and other similar patterns, but we don't get a chance to run the DAG combiner when we recursively form sub-shuffles during the lowering of a shuffle. So instead hand-roll a really important combine directly into the lowering code to detect shuffles-of-splats, especially shuffles of an all-zero splat which needn't even have the same element width, etc.
    This lets the new vector shuffle lowering handle shuffles which implement things like zero-extension really nicely. This will become even more important when I wire the legalization of zero-extension to vector shuffles with the new widening legalization strategy.
    llvm-svn: 212444
* CodeGen: it turns out that NAND is not the same thing as BIC. At all. (Tim Northover, 2014-07-07; 1 file, -1/+1)
    We've been performing the wrong operation on ARM for "atomicrmw nand" for years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b". This bled over into the generic expansion pass.
    So I assume no-one has ever actually tried to do an atomic nand in the real world. Oh well.
    llvm-svn: 212443
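    A hedged sketch of the distinction (hypothetical IR; %p and %v are placeholders): the update value in a nand RMW must complement the AND result, not one operand.

        %old = atomicrmw nand i32* %p, i32 %v seq_cst
        ; the correct stored value is ~(old & v) ...
        %and = and i32 %old, %v
        %new = xor i32 %and, -1
        ; ... not BIC semantics, which would be old & ~v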
* Add support for parsing the not operator in Microsoft inline assembly (Ehsan Akhgari, 2014-07-04; 1 file, -5/+36)
    This fixes http://llvm.org/PR20202
    llvm-svn: 212352
* [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. (Chandler Carruth, 2014-07-04; 1 file, -75/+76)
    This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome.
    With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is *dramatically* improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want.
    llvm-svn: 212324
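    As a hedged illustration (hypothetical IR; %x is a placeholder), splat detection now tolerates undef lanes, so a shift amount like this still counts as a splat of 5 and the vector shift special cases can fire:

        ; still recognized as a splat of 5 despite the undef lane
        %r = ashr <4 x i32> %x, <i32 5, i32 undef, i32 5, i32 5>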
* [X86] Limit maximum nop length on Silvermont (Alexey Volkov, 2014-07-04; 1 file, -3/+4)
    Silvermont can only decode one instruction per cycle if the instruction exceeds 8 bytes. Also, on Silvermont, instructions with more than 3 prefixes incur a 3-cycle penalty. The maximum nop length used for padding is therefore limited to 7 bytes on Silvermont; for other x86 processors it remains unchanged at 15 bytes.
    Differential Revision: http://reviews.llvm.org/D4374
    llvm-svn: 212321
* [x86] Clarify that this lowering only applies to vectors and is only used when we have SSE2. (Chandler Carruth, 2014-07-03; 1 file, -3/+2)
    llvm-svn: 212300
* [CostModel][x86] Improved cost model for alternate shuffles. (Andrea Di Biagio, 2014-07-03; 1 file, -17/+87)
    This patch:
    1) Improves the cost model for x86 alternate shuffles (originally added at revision 211339);
    2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles.
    Alternate shuffles are a special kind of blend; on x86, we can often easily lower an alternate shuffle into a single blend instruction (depending on the subtarget features), as sketched below.
    The existing cost model didn't take into account subtarget features. Also, it had a couple of "dead" entries for vector types that are never legal (example: on x86, types v2i32 and v2f32 are not legal; those are always either promoted or widened to 128-bit vector types). The new x86 cost model takes into account what target features we have before returning the shuffle cost (i.e. the number of instructions after the blend is lowered/expanded).
    This patch also teaches the Cost Model Analysis how to identify and analyze alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions):
    - added function 'isAlternateVectorMask';
    - added some logic to check if an instruction is an alternate shuffle and, in that case, call the target specific TTI to get the corresponding shuffle cost;
    - added a test to verify the cost model analysis on alternate shuffles.
    llvm-svn: 212296
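    A hedged sketch of an alternate shuffle (hypothetical IR; %a and %b are placeholders): lanes alternate between the two sources, which a single blendps can implement on SSE4.1 and later.

        ; even lanes from %a, odd lanes from %b -> one blendps on SSE4.1+
        %r = shufflevector <4 x float> %a, <4 x float> %b,
             <4 x i32> <i32 0, i32 5, i32 2, i32 7>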
* [X86] Add ISel patterns to select 'f32_to_f16' and 'f16_to_f32' dag nodes. (Andrea Di Biagio, 2014-07-03; 2 files, -0/+23)
    This patch adds tablegen patterns to select F16C float-to-half-float conversion instructions from 'f32_to_f16' and 'f16_to_f32' dag nodes. If the target doesn't have F16C, then 'f32_to_f16' and 'f16_to_f32' are expanded into library calls.
    llvm-svn: 212293
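    For context, a hedged sketch of IR that produces these nodes (using the fp16 conversion intrinsics roughly as they existed around this time; the exact signatures vary by release, so treat these as assumptions):

        declare i16 @llvm.convert.to.fp16(float)
        declare float @llvm.convert.from.fp16(i16)

        %h = call i16 @llvm.convert.to.fp16(float %f)    ; f32_to_f16 -> vcvtps2ph with F16C
        %g = call float @llvm.convert.from.fp16(i16 %h)  ; f16_to_f32 -> vcvtph2ps with F16C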
* [x86] Fix crashes in lowering bitcast instructions with the widening mode. (Chandler Carruth, 2014-07-03; 1 file, -0/+7)
    This also runs the test in that mode, which would reproduce the crash. What I love is that *every single FIXME* in the test is addressed by switching to widening.
    llvm-svn: 212254
* [x86] Based on a long conversation between myself, Jim Grosbach, Hal Finkel, Eric Christopher, and a bunch of other people I'm probably forgetting (sorry), add an option to the x86 backend to widen vectors during type legalization rather than promote them. (Chandler Carruth, 2014-07-03; 2 files, -0/+19)
    This still would promote vNi1 vectors to get the masks right, but would widen other vectors. A lot of experiments are piling up right now showing that widening should probably be the default legalization strategy outside of vNi1 cases, but it is very hard to test the ramifications of that and fix bugs in widening-based legalization without an option that enables it. I'll be checking in tests shortly that use this option to exercise cases where widening doesn't work well, and hopefully we'll be able to switch fully to this soon.
    llvm-svn: 212249
* [X86] AVX512: Allow writemask argument in vpermt* intrinsics (Adam Nemet, 2014-07-02; 1 file, -5/+15)
    llvm-svn: 212223
* [X86] AVX512: Generate Pat<>'s for the vpermt2* intrinsics via multiclass (Adam Nemet, 2014-07-02; 1 file, -19/+14)
    This new multiclass, avx512_perm_table_3src, derives from the current one and provides the Pat<>. The next patch will add another Pat<> that uses the writemask.
    Note that I dropped the type annotation from the intrinsic call, i.e.: (v16f32 VR512:$src1) -> VR512:$src1. I think that this should be fine (at least many intrinsic calls don't provide them) and it greatly reduces the number of template arguments.
    llvm-svn: 212222
* [X86] AVX512: Add writemask variants for vperm*2* (Adam Nemet, 2014-07-02; 1 file, -14/+68)
    This includes assembler and codegen support (see the new tests in avx512-encodings.s and avx512-shuffle.ll).
    <rdar://problem/17492620>
    llvm-svn: 212221
* X86: When combining shuffles just remove shuffles that are completely redundant. (Benjamin Kramer, 2014-07-02; 1 file, -0/+7)
    CombineTo doesn't allow replacing a node with itself, so this would crash if the combined shuffle is the same as the input shuffle.
    llvm-svn: 212181
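    A hedged illustration of a completely redundant combine (hypothetical IR; %v is a placeholder): composing the two masks below yields the identity permutation, so the combined shuffle adds nothing and is now simply removed rather than handed to CombineTo.

        ; two lane reversals compose to the identity permutation
        %a = shufflevector <4 x i32> %v, <4 x i32> undef,
             <4 x i32> <i32 3, i32 2, i32 1, i32 0>
        %b = shufflevector <4 x i32> %a, <4 x i32> undef,
             <4 x i32> <i32 3, i32 2, i32 1, i32 0>
        ; %b is just %v, so the combined shuffle is completely redundant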
* AVX-512: dec/inc instructions are slow on KNL (Elena Demikhovsky, 2014-07-02; 1 file, -1/+2)
    Following Alexey Volkov's change, I'm adding the same property for KNL, which prefers ADD/SUB instead of INC/DEC. Added a test.
    llvm-svn: 212178
* X86: remove atomic instructions *after* we've iterated through them. (Tim Northover, 2014-07-01; 1 file, -3/+6)
    Otherwise they get freed, and the implicit "isa<XYZ>" tests that follow turn out badly (at least under sanitizers). Also corrects the ordering of unordered atomic stores.
    llvm-svn: 212136
* [DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI. (Juergen Ributzka, 2014-07-01; 2 files, -3/+4)
    The argument list vector is never used after it has been passed to the CallLoweringInfo, and moving it to the CallLoweringInfo is cleaner and pretty much as cheap as keeping a pointer to it.
    llvm-svn: 212135
* X86: delegate expanding atomic libcalls to generic code. (Tim Northover, 2014-07-01; 1 file, -0/+14)
    On targets without cmpxchg16b or cmpxchg8b, the borderline atomic operations were slipping through the gaps. X86AtomicExpand.cpp was delegating to ISelLowering. Generic ISelLowering was delegating to X86ISelLowering, and X86ISelLowering was asserting. The correct behaviour is to expand to a libcall, preferably in generic ISelLowering. This can be achieved by X86ISelLowering deciding it doesn't want the faff after all.
    llvm-svn: 212134
* X86: expand atomics in IR instead of as MachineInstrs. (Tim Northover, 2014-07-01; 9 files, -976/+294)
    The logic for expanding atomics that aren't natively supported in terms of cmpxchg loops is much simpler to express at the IR level. It also allows the normal optimisations and CodeGen improvements to help out with atomics, instead of using a limited set of possible instructions.
    rdar://problem/13496295
    llvm-svn: 212119
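    A hedged sketch of the IR-level expansion (hypothetical IR with placeholder names; the pass's actual output may differ in details such as ordering and block names): an unsupported atomicrmw becomes a cmpxchg retry loop.

        ; expand "atomicrmw add i32* %p, i32 %v seq_cst" into a loop:
        entry:
          %init = load i32, i32* %p
          br label %loop
        loop:
          %old = phi i32 [ %init, %entry ], [ %seen, %loop ]
          %new = add i32 %old, %v
          %pair = cmpxchg i32* %p, i32 %old, i32 %new seq_cst seq_cst
          %seen = extractvalue { i32, i1 } %pair, 0
          %ok = extractvalue { i32, i1 } %pair, 1
          br i1 %ok, label %done, label %loop
        done: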