path: root/llvm/lib/Target/X86
Commit message (Author, Date; files changed, lines -deleted/+added)
...
* [x86] Fix a large collection of bugs that crept in as I fleshed out the AVX support. (Chandler Carruth, 2014-09-26; 1 file, -13/+27)
  New test cases included. Note that none of the existing test cases covered these buggy code paths. =/ Also, it is clear from this that SHUFPS and SHUFPD are the most bug-prone shuffle instructions in x86. =[ These were all detected by fuzz-testing. (I <3 fuzz testing.)
  llvm-svn: 218522
* [X86][SchedModel] SSE reciprocal square root instruction latencies. (Andrea Di Biagio, 2014-09-26; 7 files, -15/+39)
  The SSE rsqrt instruction (a fast reciprocal square root estimate) was grouped in the same IIC_SSE_SQRT* scheduling class as the accurate (but very slow) SSE sqrt instruction. For code which uses rsqrt (possibly with Newton-Raphson iterations), this poor scheduling was hurting performance.
  This patch splits off the rsqrt instruction from the sqrt instruction scheduling classes and creates new IIC_SSE_RSQER* classes with latency values based on Agner's table.
  Differential Revision: http://reviews.llvm.org/D5370
  Patch by Simon Pilgrim.
  llvm-svn: 218517
* [AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables. (Robert Khasanov, 2014-09-26; 2 files, -6/+61)
  Added lowering tests for these instructions.
  llvm-svn: 218508
* [AVX512] Simplify use of !con() (Adam Nemet, 2014-09-26; 1 file, -4/+2)
  No change in X86.td.expanded.
  llvm-svn: 218485
* [AVX512] Pull pattern for subvector extract into the instruction definition (Adam Nemet, 2014-09-25; 1 file, -9/+6)
  No functional change.
  I initially thought that pulling the Pat<> into the instruction pattern was not possible because it was doing a transform on the index in order to convert it from a per-element (extract_subvector) index into a per-chunk (vextract*x4) index.
  Turns out this also works inside the pattern because the vextract_extract PatFrag has an OperandTransform EXTRACT_get_vextract{128,256}_imm, so the index in $idx goes through the same conversion.
  The existing test CodeGen/X86/avx512-insert-extract.ll, extended in the previous commit, provides coverage for this change.
  llvm-svn: 218480
* [AVX512] Refactor subvector extracts (Adam Nemet, 2014-09-25; 1 file, -98/+69)
  No functional change. These are now implemented as two levels of multiclasses relying heavily on the new X86VectorVTInfo class. The first-level multiclass is invoked with float or int and provides the 128- or 256-bit subvector extracts. The second level provides the register and memory variants and some more Pat<>s.
  I've compared the td.expanded files before and after. One change is that ExeDomain for 64x4 is SSEPackedDouble now. I think this is correct, i.e. a bugfix.
  (BTW, this is the change that was blocked on the recent tablegen fix. The class-instance values X86VectorVTInfo inside vextract_for_type weren't properly evaluated.)
  Part of <rdar://problem/17688758>
  llvm-svn: 218478
* [AVX512] Fix typo (Adam Nemet, 2014-09-25; 1 file, -1/+1)
  F->I in VEXTRACTF32x4rr.
  llvm-svn: 218477
* Lower idempotent RMWs to fence+load (Robin Morisset, 2014-09-25; 2 files, -4/+72)
  Summary: I originally tried doing this specifically for X86 in the backend in D5091, but it was rather brittle and generally running too late to be general. Furthermore, other targets may want to implement similar optimizations. So I reimplemented it at the IR-level, fitting it into AtomicExpandPass as it interacts with that pass (which could not be cleanly done before at the backend level).
  This optimization relies on a new target hook, which is only used by X86 for now, as the correctness of the optimization on other targets remains an open question. If it is found correct on other targets, it should be trivial to enable for them.
  Details of the optimization are discussed in D5091.
  Test Plan: make check-all + a new test
  Reviewers: jfb
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5422
  llvm-svn: 218455
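  A minimal IR sketch of the idea (the concrete fence and load orderings are chosen by the new target hook; the ones below are only illustrative):

      ; Before: an idempotent read-modify-write ('or' with 0 never changes
      ; *%p, but it still compiles to a LOCK'ed instruction on x86).
      define i32 @read(i32* %p) {
        %v = atomicrmw or i32* %p, i32 0 seq_cst
        ret i32 %v
      }

      ; After: a fence strong enough to preserve the atomic ordering,
      ; followed by an ordinary atomic load. On x86: MFENCE + MOV.
      define i32 @read.expanded(i32* %p) {
        fence seq_cst
        %v = load atomic i32* %p monotonic, align 4
        ret i32 %v
      }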
* Add llvm_unreachables() for [ASZ]ExtUpper to X86FastISel.cpp to appease the buildbots. (Daniel Sanders, 2014-09-25; 1 file, -0/+3)
  llvm-svn: 218452
* [x86] Teach the new vector shuffle lowering to use AVX2 instructions for v4f64 and v8f32 shuffles when they are lane-crossing. (Chandler Carruth, 2014-09-25; 1 file, -16/+31)
  We have fully general lane-crossing permutation functions in AVX2 that make this easy. Part of this also changes exactly when and how these vectors are split up when we don't have AVX2. This isn't always a win, but it usually is, so on balance I think it's better. The primary regressions are all things that just need to be fixed anyways, such as modeling when a blend can be completely accomplished via VINSERTF128, etc. Also, this highlights one of the few remaining big features: we do a really poor job of inserting elements into AVX registers efficiently.
  This completes almost all of the big tricks I have in mind for AVX2. The only things left that I plan to add:
  1) element insertion smarts
  2) palignr and other fairly specialized lowerings when they happen to apply
  llvm-svn: 218449
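  As an illustration (a hypothetical function, not from the commit's tests), a v8f32 shuffle whose elements cross the 128-bit lane boundary; with AVX2 this can lower to a single VPERMPS, while plain AVX needs extracts, in-lane shuffles, and a blend:

      define <8 x float> @crossing(<8 x float> %a) {
        %s = shufflevector <8 x float> %a, <8 x float> undef,
             <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
        ret <8 x float> %s
      }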
* [x86] Teach the new vector shuffle lowering a fancier way to lower 256-bit vectors with lane-crossing. (Chandler Carruth, 2014-09-25; 1 file, -33/+65)
  Rather than immediately decomposing to 128-bit vectors, try flipping the 256-bit vector lanes, shuffling them, and blending them together. This reduces our worst-case shuffle by a pretty significant margin across the board.
  llvm-svn: 218446
* [x86] Fix an oversight in the v8i32 path of the new vector shuffle lowering where it only used the mask of the low 128-bit lane rather than the entire mask. (Chandler Carruth, 2014-09-25; 1 file, -2/+2)
  This allows the new lowering to correctly match the unpack patterns for v8i32 vectors.
  For reference, the reason that we check the entire mask rather than the repeated mask is that the repeated masks don't abide by all of the invariants of normal masks. As a consequence, it is safer to use the full mask with functions like the generic equivalence test.
  llvm-svn: 218442
* [x86] Rearrange the code for v16i16 lowering a bit for clarity and to reduce the amount of checking we do here. (Chandler Carruth, 2014-09-25; 1 file, -29/+18)
  The first realization is that almost the entire function handles only the non-crossing cases between 128-bit lanes. It makes more sense to handle the crossing cases first.
  The second is that, until we actually generate fancy shared lowering strategies that use the repeated semantics of the v8i16 lowering, we shouldn't waste time checking for repeated masks. It is simplest to directly test for the entire unpck masks anyways, so we gained nothing from this.
  This also matches the structure of v32i8 more closely.
  No functionality changed here.
  llvm-svn: 218441
* [x86] Implement AVX2 support for v32i8 in the new vector shuffle lowering. (Chandler Carruth, 2014-09-25; 1 file, -5/+57)
  This completes the basic AVX2 feature support, but there are still some improvements I'd like to make to really get the last mile of performance here.
  llvm-svn: 218440
* [x86] Remove the defunct X86ISD::BLENDV entry -- we use vector selects for this now. (Chandler Carruth, 2014-09-25; 2 files, -4/+0)
  Should prevent folks from running afoul of this and not knowing why their code won't instruction select the way I just did...
  llvm-svn: 218436
* [x86] Fix the v16i16 blend logic I added in the prior commit and add the missing test cases for it. (Chandler Carruth, 2014-09-25; 1 file, -3/+5)
  Unsurprisingly, without test cases, there were bugs here. Surprisingly, this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV. It isn't wired to anything. Oops. I'll fix that next.
  llvm-svn: 218434
* [X86,AVX] Add an isel pattern for X86VBroadcast. (Akira Hatanaka, 2014-09-25; 1 file, -0/+3)
  This fixes PR21050 and rdar://problem/18434607.
  llvm-svn: 218431
* [x86] Implement v16i16 support with AVX2 in the new vector shuffle lowering. (Chandler Carruth, 2014-09-25; 4 files, -58/+174)
  This also implements the fancy blend lowering for v16i16 using AVX2 and teaches the X86 backend to print shuffle masks for 256-bit PSHUFB and PBLENDW instructions. It also makes the mask decoding correct for PBLENDW instructions. The yaks, they are legion.
  Tests are updated accordingly. There are some missing tests for the VBLENDVB lowering, but I'll add those in a follow-up as this commit has accumulated enough cruft already.
  llvm-svn: 218430
* [x86] Factor out the logic to generically decompose a vector shuffle into unblended shuffles and a blend. (Chandler Carruth, 2014-09-24; 1 file, -72/+42)
  This is the consistent fallback for the lowering paths that have fast blend operations available, and it's getting quite repetitive.
  No functionality changed.
  llvm-svn: 218399
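  A sketch of the decomposition on a made-up v4f32 example (the select stands in for the blend node the lowering actually builds): each input is first shuffled so its elements land in their final positions, then a single blend picks each lane from the right source:

      define <4 x float> @decomposed(<4 x float> %a, <4 x float> %b) {
        ; Move %a's contributions into lanes 0 and 2.
        %sa = shufflevector <4 x float> %a, <4 x float> undef,
              <4 x i32> <i32 1, i32 undef, i32 3, i32 undef>
        ; Move %b's contributions into lanes 1 and 3.
        %sb = shufflevector <4 x float> %b, <4 x float> undef,
              <4 x i32> <i32 undef, i32 0, i32 undef, i32 2>
        ; Blend: lanes 0,2 from %sa, lanes 1,3 from %sb (e.g. BLENDPS).
        %r = select <4 x i1> <i1 true, i1 false, i1 true, i1 false>,
             <4 x float> %sa, <4 x float> %sb
        ret <4 x float> %r
      }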
* [x86] Teach the instruction lowering to add comments describing constant pool data being loaded into a vector register. (Chandler Carruth, 2014-09-24; 1 file, -12/+78)
  The comments take the form of:
      # ymm0 = [a,b,c,d,...]
      # xmm1 = <x,y,z...>
  The []s are used for generic sequential data and the <>s are used specifically for ConstantVector loads. Undef elements are printed as the letter 'u', integers in decimal, and floating point values as floating point values. Suggestions on improving the formatting or other aspects of the display are very welcome.
  My primary use case for this is to be able to FileCheck test masks passed to vector shuffle instructions in-register. It isn't fantastic for that (no decoding of special zeroing semantics or other tricks), but it at least puts the mask onto an instruction line that could reasonably be checked. I've updated many of the new vector shuffle lowering tests to leverage this in their test cases so that we're actually checking the shuffle masks remain as expected.
  Before implementing this, I tried a *bunch* of different approaches. I looked into teaching the MCInstLower code to scan up the basic block and find a definition of a register used in a shuffle instruction and then decode that, but this seems incredibly brittle and complex. I talked to Hal a lot about the "right" way to do this: attach the raw shuffle mask to the instruction itself in some form of unencoded operands, and then use that to emit the comments. I still think that's the optimal solution here, but it proved to be beyond what I'm up for here. In particular, it seems likely best done by completing the plumbing of metadata through these layers and attaching the shuffle mask in metadata, which could be dropped automatically when encoding an actual instruction.
  llvm-svn: 218377
* [x86] More refactoring of the shuffle comment emission. (Chandler Carruth, 2014-09-24; 1 file, -38/+38)
  The previous attempt didn't work out so well. It looks like it will be much easier to introduce extra logic to find a shuffle mask if the finding logic is totally separate. This also makes it easy to sink the opcode logic completely out of the routine so we don't re-dispatch across it.
  Still no functionality changed.
  llvm-svn: 218363
* [x86] Bypass the shuffle mask comment generation when not using verbose asm. (Chandler Carruth, 2014-09-24; 1 file, -0/+2)
  This can be somewhat expensive and there is no reason to do it outside of tests or debugging sessions. I'm also likely to make it significantly more expensive to support more styles of shuffles.
  llvm-svn: 218362
* [x86] Hoist the logic for extracting the relevant bits of information from the MachineInstr into the caller, which is already doing a switch over the instruction. (Chandler Carruth, 2014-09-24; 1 file, -16/+20)
  This will make it clearer how to compute different operands to feed the comment selection, for example.
  Also, in a drive-by fix, don't append an empty comment string (which is ultimately a no-op).
  No functionality changed.
  llvm-svn: 218361
* [x86] Start refactoring the comment printing logic in the MC lowering of vector shuffles. (Chandler Carruth, 2014-09-24; 1 file, -87/+102)
  This is just the beginning: hoist the logic into its own function and make use of early exits to dramatically simplify the flow of the function. I'm going to keep incrementally refactoring this until it is a bit less magical how it applies to other instructions, and I can teach it how to dig a shuffle mask out of a register. Then I plan to hook it up to VPERMD so we get our mask comments for it.
  No functionality changed yet.
  llvm-svn: 218357
* [x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with the native AVX2 instructions. (Chandler Carruth, 2014-09-24; 1 file, -5/+50)
  Note that the test case is really frustrating here because VPERMD requires the mask to be in the register input and we don't produce a comment looking through that to the constant pool. I'm going to attempt to improve this in a subsequent commit, but not sure if I will succeed.
  llvm-svn: 218347
* [x86] Fix a really terrible bug in the repeated 128-bit-lane shuffle detection. (Chandler Carruth, 2014-09-24; 1 file, -13/+36)
  It was incorrectly handling undef lanes by treating an undef lane in the first 128-bit lane as an actual *numeric* shuffle value. Fortunately, this almost always DTRT and merely disabled detecting repeated patterns. But not always. =/ This patch introduces a much more principled approach and fixes the miscompiles I spotted by inspection previously.
  llvm-svn: 218346
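  The shape of the hazard, as a hypothetical example: the high lane repeats the low lane's pattern, but the low lane's first element is undef. If that undef decodes to some arbitrary numeric value, the repeated-pattern check can draw the wrong conclusion:

      define <8 x float> @repeated_with_undef(<8 x float> %a) {
        %s = shufflevector <8 x float> %a, <8 x float> undef,
             <8 x i32> <i32 undef, i32 0, i32 1, i32 1, i32 4, i32 4, i32 5, i32 5>
        ret <8 x float> %s
      }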
* [x86] Teach the new vector shuffle lowering to lower v4i64 vector shuffles using the AVX2 instructions. (Chandler Carruth, 2014-09-23; 1 file, -6/+57)
  This is the first step of cutting in real AVX2 support. Note that I have spotted at least one bug in the test cases already, but I suspect it was already present and is just getting surfaced. Will investigate next.
  llvm-svn: 218338
* [x86] Teach the rest of the 'target shuffle' machinery about blends and add VPBLENDD to the InstPrinter's comment generation so we get nice comments everywhere. (Chandler Carruth, 2014-09-23; 2 files, -1/+30)
  Now that we have the nice comments, I can see the bug introduced by a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay tests that are easy to inspect.
  llvm-svn: 218335
* [X86] Make wide loads be managed by AtomicExpand (Robin Morisset, 2014-09-23; 1 file, -28/+6)
  Summary: AtomicExpand already had logic for expanding wide loads and stores on LL/SC architectures, and for expanding wide stores on CmpXchg architectures, but not for wide loads on CmpXchg architectures. This patch fills this hole, and makes use of this new feature in the X86 backend.
  Only one functional change: we now lose the SynchScope attribute. It is regrettable, but I have another patch that I will submit soon that will solve this for all of AtomicExpand (it seemed better to split it apart as it is a different concern).
  Test Plan: make check-all (lots of tests for this functionality already exist)
  Reviewers: jfb
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5404
  llvm-svn: 218332
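  Conceptually, the expansion rewrites a wide atomic load as a compare-exchange that never modifies memory (the orderings below are illustrative); on x86-64 the i128 cmpxchg maps to CMPXCHG16B:

      ; Before: an atomic load wider than any plain atomic load.
      define i128 @wide_load(i128* %p) {
        %v = load atomic i128* %p seq_cst, align 16
        ret i128 %v
      }

      ; After: cmpxchg with expected == new == 0 leaves memory unchanged
      ; but always returns the current value.
      define i128 @wide_load.expanded(i128* %p) {
        %pair = cmpxchg i128* %p, i128 0, i128 0 seq_cst seq_cst
        %v = extractvalue { i128, i1 } %pair, 0
        ret i128 %v
      }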
* [x86] Teach the new shuffle lowering's blend functionality to use AVX2's VPBLENDD where appropriate, even on 128-bit vectors. (Chandler Carruth, 2014-09-23; 1 file, -16/+35)
  According to Agner's tables, this instruction has significantly higher throughput (it can execute on any port) on Haswell chips, so we should aggressively try to form it when available.
  Sadly, this loses our delightful shuffle comments. I'll add those back for VPBLENDD next.
  llvm-svn: 218322
* [MCJIT] Nuke MachineRelocation and MachineCodeEmitter. Now that the old JIT is gone, they're no longer needed. (Lang Hames, 2014-09-23; 1 file, -52/+0)
  llvm-svn: 218320
* [x86] Teach the vector comment parsing and printing to correctly handle undef in the shuffle mask. (Chandler Carruth, 2014-09-23; 5 files, -56/+113)
  This shows up when we're printing comments during lowering and we still have an IR-level constant hanging around that models undef.
  A nice consequence of this is *much* prettier test cases where the undef lanes actually show up as undef rather than as a particular set of values. This also allows us to print shuffle comments in cases that use undef, such as the recently added variable VPERMILPS lowering. Now those test cases have nice shuffle comments attached with their details.
  The shuffle lowering for PSHUFB has been augmented to use undef, and the shuffle combining has been augmented to comprehend it.
  llvm-svn: 218301
* [x86] Teach the AVX1 path of the new vector shuffle lowering one more trick that I missed. (Chandler Carruth, 2014-09-23; 7 files, -25/+78)
  VPERMILPS has a non-immediate memory operand mode that allows it to do asymmetric shuffles in the two 128-bit lanes. Use this rather than two shuffles and a blend.
  However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that one offers no functional difference from the immediate operand other than variability) wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a variable-masked VPERMILP instruction. Also plumb basic comment parsing and printing through so that the tests are reasonable.
  There are still a few tests which don't show the shuffle pattern. These are tests with undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in a follow-up. I've looked at the masks and they seem reasonable.
  llvm-svn: 218300
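  For example (a hypothetical case, not from the commit's tests), a non-crossing shuffle whose two lanes use different in-lane patterns; with the mask loaded from memory this is a single variable-mask VPERMILPS:

      define <8 x float> @per_lane(<8 x float> %a) {
        ; Lane 0 reversed; lane 1 broadcasts its first element.
        %s = shufflevector <8 x float> %a, <8 x float> undef,
             <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 4, i32 4, i32 4, i32 4>
        ret <8 x float> %s
      }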
* [x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the td pattern). (Chandler Carruth, 2014-09-22; 5 files, -30/+30)
  Currently we only model the immediate operand variation of VPERMILPS and VPERMILPD; we should make that clear in the pseudos used. Will be adding support for the variable mask variant in my next commit.
  llvm-svn: 218282
* Fix a "typo" from my previous commit. (Kaelyn Takata, 2014-09-22; 1 file, -1/+1)
  llvm-svn: 218281
* Silence unused variable warnings in the new stub functions that occur when assertions are disabled. (Kaelyn Takata, 2014-09-22; 1 file, -1/+3)
  llvm-svn: 218280
* [x86] Stub out the integer lowering of 256-bit vectors with AVX2 support. (Chandler Carruth, 2014-09-22; 1 file, -4/+87)
  No interesting functionality yet, but this will let me implement one vector type at a time.
  llvm-svn: 218277
* ms-inline-asm: Fix parsing label names inside bracket expressions (Ehsan Akhgari, 2014-09-22; 1 file, -9/+8)
  Summary: This fixes a couple of issues. One is ensuring that AOK_Label rewrite rules have a lower priority than AOK_Skip rules, as AOK_Skip needs to be able to skip the brackets properly. The other part of the fix ensures that we don't overwrite Identifier when looking up the identifier, and that we use the locally available information to generate the AOK_Label rewrite in ParseIntelIdentifier. Doing that in CreateMemForInlineAsm would be problematic since the Start location there may point to the beginning of a bracket expression, and not necessarily the beginning of an identifier.
  This also means that we don't need to carry around the InternalName field, which helps simplify the code.
  Test Plan: This will be tested on the clang side.
  Reviewers: rnk
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5445
  llvm-svn: 218270
* Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2). (Sanjay Patel, 2014-09-22; 2 files, -7/+33)
  We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors. This patch should preserve all existing behavior with regular optimization levels, but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.
  The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save at least 8 bytes (up to 31 bytes) of constant pool data.
  Differential Revision: http://reviews.llvm.org/D5347
  llvm-svn: 218263
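  A sketch of the kind of code this targets (the function is made up): when minimizing size on an AVX machine, the splat below can be materialized with a 4-byte constant plus VBROADCASTSS instead of a full 32-byte constant-pool entry:

      define <8 x float> @splat_ones() {
        ret <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0,
                         float 1.0, float 1.0, float 1.0, float 1.0>
      }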
* [x32] Fix segmented stacks support (Pavel Chupin, 2014-09-22; 7 files, -41/+65)
  Summary: Update segmented-stacks*.ll tests with the x32 target case and make corresponding changes to make them pass.
  Test Plan: tests updated with x32 target
  Reviewers: nadav, rafael, dschuff
  Subscribers: llvm-commits, zinovy.nis
  Differential Revision: http://reviews.llvm.org/D5245
  llvm-svn: 218247
* Fix assert when decoding PSHUFB mask (Robert Lougher, 2014-09-22; 1 file, -6/+4)
  The PSHUFB mask decode routine used to assert if the mask index was out of range (< 0 or greater than the size of the vector). The problem is, we can legitimately have a PSHUFB with a large index when using intrinsics. The instruction only uses the least significant 4 bits. This change removes the assert and masks the index to match the instruction behaviour.
  llvm-svn: 218242
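  An illustration via the intrinsic (mask values chosen for the example): byte index 16 is out of range for the old assert, but the hardware simply uses 16 & 0xF = 0:

      declare <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)

      define <16 x i8> @big_index(<16 x i8> %a) {
        %r = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %a,
                 <16 x i8> <i8 16, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7,
                            i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
        ret <16 x i8> %r
      }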
* ms-inline-asm: Add a sema callback for looking up label names (Ehsan Akhgari, 2014-09-22; 1 file, -2/+18)
  The implementation of the callback in clang's Sema will return an internal name for labels.
  Test Plan: Will be tested in clang.
  Reviewers: rnk
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D4587
  llvm-svn: 218229
* [x86] Back out a bad choice about lowering v4i64 and pave the way for a more sane approach to AVX2 support. (Chandler Carruth, 2014-09-22; 1 file, -44/+27)
  Fundamentally, there is no useful way to lower integer vectors in AVX. None. We always end up with a VINSERTF128 in the end, so we might as well eagerly switch to the floating point domain and do everything there. This cleans up lots of weird and unlikely-to-be-correct differences between integer and floating point shuffles when we only have AVX1.
  The other nice consequence is that by doing things this way we will make it much easier to write the integer lowering routines, as we won't need to duplicate the logic to check for AVX vs. AVX2 in each one -- if we actually try to lower a 256-bit vector as an integer vector, we have AVX2 and can rely on it. I think this will make the code much simpler and more comprehensible.
  Currently, I've disabled *all* support for AVX2 so that we always fall back to AVX. This keeps everything working rather than asserting. That will go away with the subsequent series of patches that provide a baseline AVX2 implementation.
  Please note, I'm going to implement AVX2 *without access to hardware*. That means I cannot correctness-test this path. I will be relying on those with access to AVX2 hardware to do correctness testing and fix bugs here, but as a courtesy I'm trying to sketch out the framework for the new-style vector shuffle lowering in the context of the AVX2 ISA.
  llvm-svn: 218228
* [x86] Teach the new vector shuffle lowering how to cleverly lower single-input v8f32 shuffles which are not 128-bit lane-crossing but have different shuffle patterns in the low and high lanes. (Chandler Carruth, 2014-09-21; 1 file, -3/+22)
  This removes most of the unnecessary extract/insert traffic and is particularly good at lowering cases where only one of the two lanes is shuffled at all.
  I've also added a collection of test cases with undef lanes because this lowering is somewhat more sensitive to undef lanes than others.
  llvm-svn: 218226
* [x86] With the stronger canonicalization of shuffles added in r218216, the new vector shuffle lowering no longer needs to check both symmetric forms of UNPCK patterns for v4f64. (Chandler Carruth, 2014-09-21; 1 file, -4/+0)
  llvm-svn: 218217
* [x86] Teach the new vector shuffle lowering to re-use the SHUFPS lowering when it can use a symmetric SHUFPS across both 128-bit lanes. (Chandler Carruth, 2014-09-21; 1 file, -4/+23)
  This required making the SHUFPS lowering tolerant of other vector types, and adjusting our canonicalization to canonicalize harder.
  This is the last of the clever uses of symmetry I've thought of for v8f32. The rest of the tricks I'm aware of here are to work around asymmetry in the mask.
  llvm-svn: 218216
* [x86] Refactor the logic to form SHUFPS instruction patterns to lower a generic vector shuffle mask into a helper that isn't specific to the other things that influence which choice is made or the specific types used with the instruction. (Chandler Carruth, 2014-09-21; 1 file, -89/+108)
  No functionality changed.
  llvm-svn: 218215
* [x86] Teach the new vector shuffle lowering the basics about insertion of a single element into a zero vector for v4f64 and v4i64 in AVX. (Chandler Carruth, 2014-09-21; 1 file, -0/+18)
  Ironically, there is less to see here because xor+blend is so crazy fast that we can't really beat that to zero the high 128-bit lane.
  llvm-svn: 218214
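  The pattern in question, as a hypothetical example: keep one element of the input and zero everything else, which lowers to a register zeroed with vxorps plus a cheap blend/move:

      define <4 x double> @insert_into_zero(<4 x double> %a) {
        ; Lane 0 from %a; lanes 1-3 from the zero vector.
        %s = shufflevector <4 x double> %a, <4 x double> zeroinitializer,
             <4 x i32> <i32 0, i32 5, i32 6, i32 7>
        ret <4 x double> %s
      }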
* [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and UNPCKHPS with AVX vectors by recognizing those patterns when they are repeated for both 128-bit lanes. (Chandler Carruth, 2014-09-21; 1 file, -2/+6)
  With this, we now generate the exact same (really nice) code for Quentin's avx_test_case.ll, which was the most significant regression reported for the new shuffle lowering. In fact, I'm out of specific test cases for AVX lowering; the rest were AVX2, I think. However, there are a bunch of pretty obvious remaining things to improve with AVX...
  llvm-svn: 218213
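  For instance (an illustrative function, not from the commit), the unpcklps pattern repeated in both 128-bit lanes becomes a single full-width VUNPCKLPS:

      define <8 x float> @unpcklps(<8 x float> %a, <8 x float> %b) {
        %s = shufflevector <8 x float> %a, <8 x float> %b,
             <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 4, i32 12, i32 5, i32 13>
        ret <8 x float> %s
      }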
* [x86] Begin teaching the new vector shuffle lowering one of the most important bits of cleverness: to detect and lower repeated shuffle patterns between the two 128-bit lanes with a single instruction. (Chandler Carruth, 2014-09-21; 1 file, -2/+28)
  This patch just teaches it how to lower single-input shuffles that fit this model using VPERMILPS. =] There is more that needs to happen here.
  llvm-svn: 218211