path: root/llvm/lib/Target/X86
Commit message [Author, Date, Files changed, Lines -/+]
* [x86] Factor out the logic to generically decompose a vector shuffle [Chandler Carruth, 2014-09-24, 1 file, -72/+42]
  into unblended shuffles and a blend. This is the consistent fallback for the lowering paths that have fast blend operations available, and it's getting quite repetitive.
  No functionality changed.
  llvm-svn: 218399
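  A minimal sketch of this decomposition, assuming LLVM's SelectionDAG API of the period; the helper name and exact signature below are illustrative rather than the committed code:

    static SDValue lowerAsDecomposedShuffleBlend(MVT VT, SDLoc DL, SDValue V1,
                                                 SDValue V2, ArrayRef<int> Mask,
                                                 SelectionDAG &DAG) {
      int Size = Mask.size();
      // Build one single-input shuffle per source, plus a blend mask that
      // selects, per element, whichever shuffled source supplies it.
      SmallVector<int, 32> V1Mask(Size, -1), V2Mask(Size, -1), BlendMask(Size, -1);
      for (int i = 0; i < Size; ++i)
        if (Mask[i] >= 0) {
          if (Mask[i] < Size) {
            V1Mask[i] = Mask[i];         // element comes from the first input
            BlendMask[i] = i;            // blend takes lane i of shuffled V1
          } else {
            V2Mask[i] = Mask[i] - Size;  // element comes from the second input
            BlendMask[i] = i + Size;     // blend takes lane i of shuffled V2
          }
        }
      V1 = DAG.getVectorShuffle(VT, DL, V1, DAG.getUNDEF(VT), V1Mask);
      V2 = DAG.getVectorShuffle(VT, DL, V2, DAG.getUNDEF(VT), V2Mask);
      return DAG.getVectorShuffle(VT, DL, V1, V2, BlendMask);
    }
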
* [x86] Teach the instruction lowering to add comments describing constant [Chandler Carruth, 2014-09-24, 1 file, -12/+78]
  pool data being loaded into a vector register. The comments take the form of:

    # ymm0 = [a,b,c,d,...]
    # xmm1 = <x,y,z...>

  The []s are used for generic sequential data and the <>s are used specifically for ConstantVector loads. Undef elements are printed as the letter 'u', integers in decimal, and floating point values as floating point values. Suggestions on improving the formatting or other aspects of the display are very welcome.

  My primary use case for this is to be able to FileCheck test masks passed to vector shuffle instructions in-register. It isn't fantastic for that (no decoding special zeroing semantics or other tricks), but it at least puts the mask onto an instruction line that could reasonably be checked. I've updated many of the new vector shuffle lowering tests to leverage this in their test cases so that we're actually checking the shuffle masks remain as expected.

  Before implementing this, I tried a *bunch* of different approaches. I looked into teaching the MCInstLower code to scan up the basic block and find a definition of a register used in a shuffle instruction and then decode that, but this seems incredibly brittle and complex. I talked to Hal a lot about the "right" way to do this: attach the raw shuffle mask to the instruction itself in some form of unencoded operands, and then use that to emit the comments. I still think that's the optimal solution here, but it proved to be beyond what I'm up for here. In particular, it seems likely best done by completing the plumbing of metadata through these layers and attaching the shuffle mask in metadata which could have fully automatic dropping when encoding an actual instruction.

  llvm-svn: 218377
* [x86] More refactoring of the shuffle comment emission. The previous [Chandler Carruth, 2014-09-24, 1 file, -38/+38]
  attempt didn't work out so well. It looks like it will be much better for introducing extra logic to find a shuffle mask if the finding logic is totally separate. This also makes it easy to sink the opcode logic completely out of the routine so we don't re-dispatch across it.
  Still no functionality changed.
  llvm-svn: 218363
* [x86] Bypass the shuffle mask comment generation when not using verbose [Chandler Carruth, 2014-09-24, 1 file, -0/+2]
  asm. This can be somewhat expensive and there is no reason to do it outside of tests or debugging sessions. I'm also likely to make it significantly more expensive to support more styles of shuffles.
  llvm-svn: 218362
* [x86] Hoist the logic for extracting the relevant bits of information [Chandler Carruth, 2014-09-24, 1 file, -16/+20]
  from the MachineInstr into the caller which is already doing a switch over the instruction. This will make it clearer how to compute different operands to feed the comment selection, for example. Also, in a drive-by fix, don't append an empty comment string (which is a no-op ultimately).
  No functionality changed.
  llvm-svn: 218361
* [x86] Start refactoring the comment printing logic in the MC lowering of [Chandler Carruth, 2014-09-24, 1 file, -87/+102]
  vector shuffles. This is just the beginning: it hoists the logic into its own function and makes use of early exit to dramatically simplify the flow of the function. I'm going to be incrementally refactoring this until it is a bit less magical how this applies to other instructions, and I can teach it how to dig a shuffle mask out of a register. Then I plan to hook it up to VPERMD so we get our mask comments for it.
  No functionality changed yet.
  llvm-svn: 218357
* [x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with [Chandler Carruth, 2014-09-24, 1 file, -5/+50]
  the native AVX2 instructions. Note that the test case is really frustrating here because VPERMD requires the mask to be in the register input and we don't produce a comment looking through that to the constant pool. I'm going to attempt to improve this in a subsequent commit, but not sure if I will succeed.
  llvm-svn: 218347
* [x86] Fix a really terrible bug in the repeated 128-bit-lane shuffle [Chandler Carruth, 2014-09-24, 1 file, -13/+36]
  detection. It was incorrectly handling undef lanes by actually treating an undef lane in the first 128-bit lane as a *numeric* shuffle value. Fortunately, this almost always DTRT and disabled detecting repeated patterns. But not always. =/ This patch introduces a much more principled approach and fixes the miscompiles I spotted by inspection previously.
  llvm-svn: 218346
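  A sketch of the more principled repeated-lane check, with -1 denoting an undef mask element; the names are illustrative, not the exact routine from this commit. The key fix is that undef stays a wildcard instead of being recorded as a numeric value:

    static bool is128BitLaneRepeated(MVT VT, ArrayRef<int> Mask,
                                     SmallVectorImpl<int> &RepeatedMask) {
      int LaneSize = 128 / VT.getScalarSizeInBits();
      int Size = Mask.size();
      RepeatedMask.assign(LaneSize, -1);
      for (int i = 0; i < Size; ++i) {
        if (Mask[i] < 0)
          continue;                     // undef matches any repeated pattern
        if ((Mask[i] % Size) / LaneSize != i / LaneSize)
          return false;                 // element crosses a 128-bit lane
        // Map the element into the first lane, preserving which input it names.
        int LocalM = Mask[i] % LaneSize + (Mask[i] >= Size ? Size : 0);
        if (RepeatedMask[i % LaneSize] < 0)
          RepeatedMask[i % LaneSize] = LocalM;   // first defined element wins
        else if (RepeatedMask[i % LaneSize] != LocalM)
          return false;                 // lanes disagree: not a repeated mask
      }
      return true;
    }
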
* [x86] Teach the new vector shuffle lowering to lower v4i64 vector [Chandler Carruth, 2014-09-23, 1 file, -6/+57]
  shuffles using the AVX2 instructions. This is the first step of cutting in real AVX2 support. Note that I have spotted at least one bug in the test cases already, but I suspect it was already present and is just getting surfaced. Will investigate next.
  llvm-svn: 218338
* [x86] Teach the rest of the 'target shuffle' machinery about blends and [Chandler Carruth, 2014-09-23, 2 files, -1/+30]
  add VPBLENDD to the InstPrinter's comment generation so we get nice comments everywhere. Now that we have the nice comments, I can see the bug introduced by a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay tests that are easy to inspect.
  llvm-svn: 218335
* [X86] Make wide loads be managed by AtomicExpand [Robin Morisset, 2014-09-23, 1 file, -28/+6]
  Summary: AtomicExpand already had logic for expanding wide loads and stores on LL/SC architectures, and for expanding wide stores on CmpXchg architectures, but not for wide loads on CmpXchg architectures. This patch fills this hole, and makes use of this new feature in the X86 backend.
  Only one functional change: we now lose the SynchScope attribute. It is regrettable, but I have another patch that I will submit soon that will solve this for all of AtomicExpand (it seemed better to split it apart as it is a different concern).
  Test Plan: make check-all (lots of tests for this functionality already exist)
  Reviewers: jfb
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5404
  llvm-svn: 218332
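  The core trick, sketched with LLVM's IRBuilder (an assumed shape; the actual AtomicExpand code and the builder signature vary across LLVM versions): a cmpxchg of 0 -> 0 never modifies memory but always returns the current value, so it acts as an atomic load wider than any native load instruction.

    static bool expandAtomicLoadToCmpXchg(LoadInst *LI) {
      IRBuilder<> Builder(LI);
      Type *Ty = LI->getType();
      Value *Zero = Constant::getNullValue(Ty);
      // cmpxchg ptr, 0, 0: if *ptr == 0, store 0 (a no-op); either way the
      // returned pair carries the value currently in memory.
      Value *Pair = Builder.CreateAtomicCmpXchg(
          LI->getPointerOperand(), Zero, Zero, LI->getOrdering(),
          AtomicCmpXchgInst::getStrongestFailureOrdering(LI->getOrdering()));
      Value *Loaded = Builder.CreateExtractValue(Pair, 0, "loaded");
      LI->replaceAllUsesWith(Loaded);
      LI->eraseFromParent();
      return true;
    }
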
* [x86] Teach the new shuffle lowering's blend functionality to use AVX2's [Chandler Carruth, 2014-09-23, 1 file, -16/+35]
  VPBLENDD where appropriate even on 128-bit vectors. According to Agner's tables, this instruction is significantly higher throughput (can execute on any port) on Haswell chips so we should aggressively try to form it when available. Sadly, this loses our delightful shuffle comments. I'll add those back for VPBLENDD next.
  llvm-svn: 218322
* [MCJIT] Nuke MachineRelocation and MachineCodeEmitter. Now that the old JIT is [Lang Hames, 2014-09-23, 1 file, -52/+0]
  gone they're no longer needed.
  llvm-svn: 218320
* [x86] Teach the vector comment parsing and printing to correctly handle [Chandler Carruth, 2014-09-23, 5 files, -56/+113]
  undef in the shuffle mask. This shows up when we're printing comments during lowering and we still have an IR-level constant hanging around that models undef.
  A nice consequence of this is *much* prettier test cases where the undef lanes actually show up as undef rather than as a particular set of values. This also allows us to print shuffle comments in cases that use undef such as the recently added variable VPERMILPS lowering. Now those test cases have nice shuffle comments attached with their details.
  The shuffle lowering for PSHUFB has been augmented to use undef, and the shuffle combining has been augmented to comprehend it.
  llvm-svn: 218301
* [x86] Teach the AVX1 path of the new vector shuffle lowering one more [Chandler Carruth, 2014-09-23, 7 files, -25/+78]
  trick that I missed. VPERMILPS has a non-immediate memory operand mode that allows it to do asymmetric shuffles in the two 128-bit lanes. Use this rather than two shuffles and a blend.
  However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that one offers no functional difference from the immediate operand other than variability) wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a variable-masked VPERMILP instruction. Also plumb basic comment parsing and printing through so that the tests are reasonable.
  There are still a few tests which don't show the shuffle pattern. These are tests with undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in a follow-up. I've looked at the masks and they seem reasonable.
  llvm-svn: 218300
* [x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the [Chandler Carruth, 2014-09-22, 5 files, -30/+30]
  td pattern). Currently we only model the immediate operand variation of VPERMILPS and VPERMILPD; we should make that clear in the pseudos used. Will be adding support for the variable mask variant in my next commit.
  llvm-svn: 218282
* Fix a "typo" from my previous commit. [Kaelyn Takata, 2014-09-22, 1 file, -1/+1]
  llvm-svn: 218281
* Silence unused variable warnings in the new stub functions that occur [Kaelyn Takata, 2014-09-22, 1 file, -1/+3]
  when assertions are disabled.
  llvm-svn: 218280
* [x86] Stub out the integer lowering of 256-bit vectors with AVX2 [Chandler Carruth, 2014-09-22, 1 file, -4/+87]
  support. No interesting functionality yet, but this will let me implement one vector type at a time.
  llvm-svn: 218277
* ms-inline-asm: Fix parsing label names inside bracket expressions [Ehsan Akhgari, 2014-09-22, 1 file, -9/+8]
  Summary: This fixes a couple of issues. One is ensuring that AOK_Label rewrite rules have a lower priority than AOK_Skip rules, as AOK_Skip needs to be able to skip the brackets properly. The other part of the fix ensures that we don't overwrite Identifier when looking up the identifier, and that we use the locally available information to generate the AOK_Label rewrite in ParseIntelIdentifier. Doing that in CreateMemForInlineAsm would be problematic since the Start location there may point to the beginning of a bracket expression, and not necessarily the beginning of an identifier. This also means that we don't need to carry around the InternalName field, which helps simplify the code.
  Test Plan: This will be tested on the clang side.
  Reviewers: rnk
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5445
  llvm-svn: 218270
* Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2) [Sanjay Patel, 2014-09-22, 2 files, -7/+33]
  We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors. This patch should preserve all existing behavior with regular optimization levels, but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2. The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save at least 8 bytes (up to 31 bytes) of constant pool data.
  Differential Revision: http://reviews.llvm.org/D5347
  llvm-svn: 218263
* [x32] Fix segmented stacks support [Pavel Chupin, 2014-09-22, 7 files, -41/+65]
  Summary: Update segmented-stacks*.ll tests with x32 target case and make corresponding changes to make them pass.
  Test Plan: tests updated with x32 target
  Reviewers: nadav, rafael, dschuff
  Subscribers: llvm-commits, zinovy.nis
  Differential Revision: http://reviews.llvm.org/D5245
  llvm-svn: 218247
* Fix assert when decoding PSHUFB mask [Robert Lougher, 2014-09-22, 1 file, -6/+4]
  The PSHUFB mask decode routine used to assert if the mask index was out of range (<0 or greater than the size of the vector). The problem is, we can legitimately have a PSHUFB with a large index using intrinsics. The instruction only uses the least significant 4 bits. This change removes the assert and masks the index to match the instruction behaviour.
  llvm-svn: 218242
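  A sketch of the fixed decode loop; RawMask and ShuffleMask are illustrative names. Per the ISA, bit 7 of each control byte zeroes the result byte, and only the low 4 bits index within the 16-byte lane:

    for (unsigned i = 0, e = RawMask.size(); i != e; ++i) {
      uint64_t M = RawMask[i];
      if (M & 0x80) {
        ShuffleMask.push_back(SM_SentinelZero);  // bit 7 set: zero this byte
        continue;
      }
      unsigned LaneBase = (i / 16) * 16;         // start of this 16-byte lane
      ShuffleMask.push_back(LaneBase + (M & 0xF)); // mask, don't assert
    }
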
* ms-inline-asm: Add a sema callback for looking up label names [Ehsan Akhgari, 2014-09-22, 1 file, -2/+18]
  The implementation of the callback in clang's Sema will return an internal name for labels.
  Test Plan: Will be tested in clang.
  Reviewers: rnk
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D4587
  llvm-svn: 218229
* [x86] Back out a bad choice about lowering v4i64 and pave the way for [Chandler Carruth, 2014-09-22, 1 file, -44/+27]
  a more sane approach to AVX2 support.

  Fundamentally, there is no useful way to lower integer vectors in AVX. None. We always end up with a VINSERTF128 in the end, so we might as well eagerly switch to the floating point domain and do everything there. This cleans up lots of weird and unlikely to be correct differences between integer and floating point shuffles when we only have AVX1.

  The other nice consequence is that by doing things this way we will make it much easier to write the integer lowering routines as we won't need to duplicate the logic to check for AVX vs. AVX2 in each one -- if we actually try to lower a 256-bit vector as an integer vector, we have AVX2 and can rely on it. I think this will make the code much simpler and more comprehensible.

  Currently, I've disabled *all* support for AVX2 so that we always fall back to AVX. This keeps everything working rather than asserting. That will go away with the subsequent series of patches that provide a baseline AVX2 implementation.

  Please note, I'm going to implement AVX2 *without access to hardware*. That means I cannot correctness test this path. I will be relying on those with access to AVX2 hardware to do correctness testing and fix bugs here, but as a courtesy I'm trying to sketch out the framework for the new-style vector shuffle lowering in the context of the AVX2 ISA.

  llvm-svn: 218228
* [x86] Teach the new vector shuffle lowering how to cleverly lower single [Chandler Carruth, 2014-09-21, 1 file, -3/+22]
  input v8f32 shuffles which are not 128-bit lane crossing but have different shuffle patterns in the low and high lanes. This removes most of the extract/insert traffic that was unnecessary and is particularly good at lowering cases where only one of the two lanes is shuffled at all.
  I've also added a collection of test cases with undef lanes because this lowering is somewhat more sensitive to undef lanes than others.
  llvm-svn: 218226
* [x86] With the stronger canonicalization of shuffles added in r218216, [Chandler Carruth, 2014-09-21, 1 file, -4/+0]
  the new vector shuffle lowering no longer needs to check both symmetric forms of UNPCK patterns for v4f64.
  llvm-svn: 218217
* [x86] Teach the new vector shuffle lowering to re-use the SHUFPS [Chandler Carruth, 2014-09-21, 1 file, -4/+23]
  lowering when it can use a symmetric SHUFPS across both 128-bit lanes. This required making the SHUFPS lowering tolerant of other vector types, and adjusting our canonicalization to canonicalize harder.
  This is the last of the clever uses of symmetry I've thought of for v8f32. The rest of the tricks I'm aware of here are to work around asymmetry in the mask.
  llvm-svn: 218216
* [x86] Refactor the logic to form SHUFPS instruction patterns to lower [Chandler Carruth, 2014-09-21, 1 file, -89/+108]
  a generic vector shuffle mask into a helper that isn't specific to the other things that influence which choice is made or the specific types used with the instruction.
  No functionality changed.
  llvm-svn: 218215
* [x86] Teach the new vector shuffle lowering the basics about insertion [Chandler Carruth, 2014-09-21, 1 file, -0/+18]
  of a single element into a zero vector for v4f64 and v4i64 in AVX. Ironically, there is less to see here because xor+blend is so crazy fast that we can't really beat that to zero the high 128-bit lane.
  llvm-svn: 218214
* [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and [Chandler Carruth, 2014-09-21, 1 file, -2/+6]
  UNPCKHPS with AVX vectors by recognizing those patterns when they are repeated for both 128-bit lanes.
  With this, we now generate the exact same (really nice) code for Quentin's avx_test_case.ll which was the most significant regression reported for the new shuffle lowering. In fact, I'm out of specific test cases for AVX lowering, the rest were AVX2 I think. However, there are a bunch of pretty obvious remaining things to improve with AVX...
  llvm-svn: 218213
* [x86] Begin teaching the new vector shuffle lowering among the most [Chandler Carruth, 2014-09-21, 1 file, -2/+28]
  important bits of cleverness: to detect and lower repeated shuffle patterns between the two 128-bit lanes with a single instruction.
  This patch just teaches it how to lower single-input shuffles that fit this model using VPERMILPS. =] There is more that needs to happen here.
  llvm-svn: 218211
* [x86] Explicitly lower to a blend early if it is trivial to do so for [Chandler Carruth, 2014-09-21, 1 file, -0/+5]
  v8f32 shuffles in the new vector shuffle lowering code. This is very cheap to do and makes it much more clear that anything more expensive but overlapping with this lowering should be selected afterward (for example using AVX2's VPERMPS).
  However, no functionality changed here as without this code we would fall through to create no-op shuffles of each input and a blend. =]
  llvm-svn: 218209
* [x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct [Chandler Carruth, 2014-09-21, 1 file, -0/+5]
  VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the reciprocal throughput on Ivy Bridge CPUs much like it does everywhere for 128-bits. There isn't a downside, so just eagerly use this instruction when it suffices.
  llvm-svn: 218208
* [x86] Switch the blend implementation to use a MVT switch rather than [Chandler Carruth, 2014-09-21, 1 file, -18/+25]
  awkward conditions. The readability improvement of this will be even more important as I generalize it to handle more types.
  No functionality changed.
  llvm-svn: 218205
* [x86] Remove some essentially lying comments from the v4f64 path of the [Chandler Carruth, 2014-09-21, 1 file, -6/+0]
  new vector shuffle lowering.
  llvm-svn: 218204
* [x86] Fix a helper to reflect that what we actually care about is [Chandler Carruth, 2014-09-21, 1 file, -9/+12]
  128-bit lane crossings, not 'half' crossings. This came up in code review ages ago, but I hadn't really addressed it. Also added some documentation for the helper.
  No functionality changed.
  llvm-svn: 218203
* [x86] Teach the new vector shuffle lowering the first step toward more [Chandler Carruth, 2014-09-21, 1 file, -1/+40]
  actual support for complex AVX shuffling tricks. We can do independent blends of the low and high 128-bit lanes of an AVX vector, so shuffle the inputs into place and then do the blend at 256 bits. This will in many cases remove one blend instruction.
  The next step is to permute the low and high halves in-place rather than extracting them and re-inserting them.
  llvm-svn: 218202
* [x86] Teach the new vector shuffle lowering to use VPERMILPD for [Chandler Carruth, 2014-09-20, 1 file, -0/+8]
  single-input shuffles with doubles. This allows them to fold memory operands into the shuffle, etc. This is just the analog to the v4f32 case in my prior commit.
  llvm-svn: 218193
* [x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS [Chandler Carruth, 2014-09-20, 1 file, -3/+11]
  instruction for single-vector floating point shuffles. This in turn allows the shuffles to fold a load into the instruction which is one of the common regressions hit with the new shuffle lowering.
  llvm-svn: 218190
* [x86] Teach the v4f32 path of the new shuffle lowering to handle the [Chandler Carruth, 2014-09-20, 1 file, -0/+10]
  tricky case of single-element insertion into the zero lane of a zero vector.

  We can't just use the same pattern here as we do in every other vector type because the general insertion logic can handle insertion into the non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we have INSERTPS that is a much better choice than the generic one for such lowerings. But INSERTPS can do lots of other lowerings as well so factoring its logic into the general insertion logic doesn't work very well.

  We also can't just extract the core common part of the general insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than a blend while INSERTPS is slower!

  So instead we do a restrictive condition on attempting to use the generic insertion logic to narrow it to those cases where VZEXT_MOVL won't need a shuffle afterward and thus will do better than INSERTPS. Then we try blending. Then we go back to INSERTPS.

  This still doesn't generate perfect code for some silly reasons that can be fixed by tweaking the td files for lowering VZEXT_MOVL to use XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends up in a register rather than a load from memory -- BLENDPSrr has twice the reciprocal throughput of MOVSSrr. Don't you love this ISA?

  llvm-svn: 218177
* [x86] Refactor the code for emitting INSERTPS to reuse the zeroable mask [Chandler Carruth, 2014-09-20, 1 file, -25/+15]
  analysis used elsewhere. This removes the last duplicate of this logic. Also simplify the code here quite a bit.
  No functionality changed.
  llvm-svn: 218176
* [x86] Generalize the single-element insertion lowering to work with [Chandler Carruth, 2014-09-20, 1 file, -13/+45]
  floating point types and use it for both v2f64 and v2i64 single-element insertion lowering.
  This fixes the last non-AVX performance regression test case I've gotten hold of for the new vector shuffle lowering. There is obvious analogous lowering for v4f32 that I'll add in a follow-up patch (because with INSERTPS, v4f32 requires special treatment). After that, it's AVX stuff.
  llvm-svn: 218175
* [x86] Replace some duplicated logic reasoning about whether particular [Chandler Carruth, 2014-09-20, 1 file, -13/+6]
  vector lanes can be modeled as zero with a call to the new function that computes a bit-vector representing that information.
  No functionality changed here, but will allow doing more clever things with the zero-test.
  llvm-svn: 218174
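  A sketch of what such a zeroable-lane computation looks like; this is an assumed shape, not necessarily the committed function. A lane is "zeroable" if its mask entry is undef or if the source element is a known zero constant:

    static SmallBitVector computeZeroableLanes(ArrayRef<int> Mask,
                                               SDValue V1, SDValue V2) {
      int Size = Mask.size();
      SmallBitVector Zeroable(Size, false);
      for (int i = 0; i < Size; ++i) {
        int M = Mask[i];
        if (M < 0) {                       // undef may be treated as zero
          Zeroable[i] = true;
          continue;
        }
        SDValue V = M < Size ? V1 : V2;
        if (V.getOpcode() != ISD::BUILD_VECTOR)
          continue;                        // can't see through other nodes here
        SDValue Op = V.getOperand(M % Size);
        if (Op.getOpcode() == ISD::UNDEF || X86::isZeroNode(Op))
          Zeroable[i] = true;              // source element is provably zero
      }
      return Zeroable;
    }
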
* [X86] Erase some obsolete comments from README.txt [Robin Morisset, 2014-09-19, 1 file, -177/+0]
  I just tried reproducing some of the optimization failures in README.txt in the X86 backend, and many of them could not be reproduced. In general the entire file appears quite bit-rotted; whatever interesting parts remain should be moved to bugzilla, and the rest deleted. I did not spend the time to do that, so I just deleted the few I tried reproducing which are obsolete, to save some time for whoever will find the courage to do it.
  llvm-svn: 218170
* [x86] Hoist a function up to the rest of the non-type-specific lowering [Chandler Carruth, 2014-09-19, 1 file, -75/+74]
  helpers, and re-flow the logic to use early exit and be a bit more readable.
  No functionality changed.
  llvm-svn: 218155
* [x86] Hoist the actual lowering logic into a helper function to separate [Chandler Carruth, 2014-09-19, 1 file, -74/+89]
  it from the shuffle pattern matching logic. Also cleaned up variable names, comments, etc.
  No functionality changed.
  llvm-svn: 218152
* [x86] Fully generalize the zext lowering in the new vector shuffle [Chandler Carruth, 2014-09-19, 1 file, -33/+91]
  lowering to support both anyext and zext and to custom lower for many different microarchitectures.
  Using this allows us to get *exactly* the right code for zext and anyext shuffles in all the vector sizes. For v16i8, the improvement is *huge*. The new SSE2 test case added is one I refused to add before this because it was sooooo many instructions.
  llvm-svn: 218143
* [x86] Recognize that we can use duplication to widen v16i8 shuffles due [Chandler Carruth, 2014-09-19, 1 file, -3/+3]
  to undef lanes as well as defined widenable lanes. This dramatically improves the lowering we use for undef-shuffles in a zext-ish pattern for SSE2.
  llvm-svn: 218115
* [x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32 [Chandler Carruth, 2014-09-19, 1 file, -1/+7]
  shuffles that are zext-ing. Not a lot to see here; the undef lane variant is better handled with pshufd, but this improves the actual zext pattern.
  llvm-svn: 218112