path: root/llvm/test/CodeGen
Commit log, most recent first. Each entry shows: commit message (Author, Date; files changed, lines -removed/+added).
...
* [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}. (Robert Khasanov, 2014-09-30; 4 files, -0/+191)
  Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>
  llvm-svn: 218670
* [AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ}. (Robert Khasanov, 2014-09-30; 2 files, -0/+139)
  Fixed lowering of these intrinsics for the cases where the mask is v2i1 or v4i1. Cmp intrinsics now lower in the following way:
    (i8 (int_x86_avx512_mask_pcmpeq_q_128 (v2i64 %a), (v2i64 %b), (i8 %mask)))
      -> (i8 (bitcast
               (v8i1 (insert_subvector undef,
                        (v2i1 (and (PCMPEQM %a, %b),
                                   (extract_subvector (v8i1 (bitcast %mask)), 0))),
                        0))))
  llvm-svn: 218669
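  A test-style sketch of one of these intrinsics in 2014-era LLVM IR (the declaration is inferred from the intrinsic name above, not copied from the commit):
    declare i8 @llvm.x86.avx512.mask.pcmpeq.q.128(<2 x i64>, <2 x i64>, i8)

    define i8 @test_pcmpeq_q_128(<2 x i64> %a, <2 x i64> %b, i8 %mask) {
      ; masked equality compare of two v2i64 values; the 2 result bits
      ; come back in the low bits of an i8
      %res = call i8 @llvm.x86.avx512.mask.pcmpeq.q.128(<2 x i64> %a, <2 x i64> %b, i8 %mask)
      ret i8 %res
    }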
* [AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW. (Robert Khasanov, 2014-09-30; 1 file, -0/+33)
  Added new operand type for intrinsics (IIT_V64).
  llvm-svn: 218668
* [AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ. (Robert Khasanov, 2014-09-30; 1 file, -1/+33)
  Added CMP_MASK intrinsic type.
  llvm-svn: 218667
* [x86] Revert r218588, r218589, and r218600. These patches were pursuing a flawed direction and causing miscompiles. Read on for details. (Chandler Carruth, 2014-09-30; 5 files, -20/+36)

  Fundamentally, the premise of this patch series was to map VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because we are going to *have* to lower to VSELECT nodes for some blends to trigger the instruction selection patterns of variable blend instructions. This doesn't actually work out so well.

  In order to match performance with the existing VECTOR_SHUFFLE lowering code, we would need to re-slice the blend in order to fit it into either the integer or floating point blends available on the ISA. When coming from VECTOR_SHUFFLE (or other vNi1-style VSELECT sources) this works well because the X86 backend ensures that these types of operands to VSELECT get sign extended into '-1' and '0' for true and false, allowing us to re-slice the bits in whatever granularity without changing semantics.

  However, if the VSELECT condition comes from some other source, for example code lowering vector comparisons, it will likely only have the required bit set -- the high bit. We can't blindly slice up this style of VSELECT. Reid found some code using Halide that triggers this and I'm hopeful to eventually get a test case, but I don't need it to understand why this is A Bad Idea.

  There is another aspect that makes this approach flawed. When in VECTOR_SHUFFLE form, we have very distilled information that represents the *constant* blend mask. Converting back to a VSELECT form actually can lose this information, and so I think now that it is better to treat this as VECTOR_SHUFFLE until the very last moment and only use VSELECT nodes for instruction selection purposes.

  My plan is to:
  1) Clean up and formalize the target pre-legalization DAG combine that converts a VSELECT with a constant condition operand into a VECTOR_SHUFFLE.
  2) Remove any fancy lowering from VSELECT during *legalization*, relying entirely on the DAG combine to catch cases where we can match to an immediate-controlled blend instruction.

  One additional step that I'm not planning on but would be interested in others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV which encodes a fully legalized VSELECT node. Then it would be easy to write isel patterns only in terms of this to ensure VECTOR_SHUFFLE legalization only ever forms the fully legalized construct and we can't cycle between it and VSELECT combining.

  llvm-svn: 218658
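  To illustrate the re-slicing hazard with hypothetical element values (not from the commit):
    ; condition element produced by shuffle lowering (sign-extended):
    ;   v4i32 0xFFFFFFFF -> re-sliced as v8i16: 0xFFFF, 0xFFFF  (both halves still true)
    ; condition element produced by a vector compare (high bit only):
    ;   v4i32 0x80000000 -> re-sliced as v8i16: 0x0000, 0x8000  (low half now false)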
* [x86] Add some vector-register broadcast operations to the 256-bit v4 tests which were missing them. (Chandler Carruth, 2014-09-30; 1 file, -0/+30)
  llvm-svn: 218657
* R600: Fix broken check lines, missing scalar case. (Matt Arsenault, 2014-09-30; 1 file, -21/+31)
  llvm-svn: 218655
* [FastISel][AArch64] Fold sign-/zero-extends into the load instruction. (Juergen Ributzka, 2014-09-30; 2 files, -11/+193)
  The sign-/zero-extension of the loaded value can be performed by the memory instruction for free. If the result of the load has only one use and the use is a sign-/zero-extend, then we emit the proper load instruction. The extend is only a register copy and will be optimized away later on.
  Other instructions that consume the sign-/zero-extended value are also made aware of this fact, so they don't fold the extend too.
  This fixes rdar://problem/18495928.
  llvm-svn: 218653
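  A minimal sketch of the pattern this targets, in 2014-era LLVM IR (not taken from the commit's tests): the extend is the load's only use, so FastISel can select a single ldrsw instead of ldr + sxtw:
    define i64 @load_sext_fold(i32* %p) {
      %v = load i32* %p        ; the extend below is the only use of %v
      %e = sext i32 %v to i64  ; folded into the load: ldrsw x0, [x0]
      ret i64 %e
    }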
* Add soft-float to the key for the subtarget lookup in the TargetMachine map; this makes sure that we can compile the same code for two different ABIs (hard and soft float) in the same module. (Eric Christopher, 2014-09-29; 2 files, -6/+51)
  Update one testcase accordingly (and fix some confusing naming), and add a new testcase as well with the ordering swapped, which would highlight the problem.
  llvm-svn: 218632
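  For illustration (a sketch, not the commit's testcase; the attribute spelling is my assumption), hard- and soft-float functions can now live in one module, each keyed to its own subtarget:
    define float @hard_fn(float %a, float %b) #0 {
      %r = fadd float %a, %b   ; selected against the hard-float subtarget
      ret float %r
    }
    define float @soft_fn(float %a, float %b) #1 {
      %r = fadd float %a, %b   ; selected against the soft-float subtarget
      ret float %r
    }
    attributes #0 = { "use-soft-float"="false" }
    attributes #1 = { "use-soft-float"="true" }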
* R600/SI: Also fix fsub + fadd a, a to mad combines. (Matt Arsenault, 2014-09-29; 2 files, -0/+64)
  llvm-svn: 218609
* R600/SI: Fix using mad with multiplies by 2. (Matt Arsenault, 2014-09-29; 1 file, -6/+152)
  These turn into fadds, so combine them into the target mad node:
    fadd (fadd (a, a), b) -> mad 2.0, a, b
  llvm-svn: 218608
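  In IR terms the combined pattern looks roughly like this (a hypothetical input, not from the commit's tests):
    define float @mul2_mad(float %a, float %b) {
      %t = fadd float %a, %a   ; canonical form of 2.0 * %a
      %r = fadd float %t, %b   ; combines to: v_mad_f32 dst, 2.0, %a, %b
      ret float %r
    }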
* [x86] Make the new vector shuffle lowering lower blends as VSELECT nodes, and rely exclusively on its logic. This removes a ton of duplication from the blend lowering and centralizes it in one place. (Chandler Carruth, 2014-09-29; 2 files, -13/+13)
  One downside is that it requires a bunch of hacks to make this work with the current legalization framework. We have to manually speculate one aspect of legalizing VSELECT nodes to get everything to work nicely because the existing legalization framework isn't *actually* bottom-up.
  The other grossness is that we somewhat duplicate the analysis of constant blends. I'm on the fence here. If reviewers think this would look better with VSELECT when it has constant operands dumping over to VECTOR_SHUFFLE, we could go that way. But it would be a substantial change because currently all of the actual blend instructions are matched via patterns in the TD files based around VSELECT nodes (despite them not being perfect fits for that). Suggestions welcome, but at least this removes the rampant duplication in the backend.
  llvm-svn: 218600
* [x86] Delete a bunch of really bad and totally unnecessary code in the X86 target-specific DAG combining that tried to convert VSELECT nodes into VECTOR_SHUFFLE nodes that it "knew" would lower into immediate-controlled blend nodes. (Chandler Carruth, 2014-09-29; 3 files, -23/+7)
  Turns out, we have perfectly good lowering of all these VSELECT nodes, and indeed that lowering already knows how to handle lowering through BLENDI to immediate-controlled blend nodes. The code just wasn't getting used much because this thing forced the world to go through the vector shuffle lowering. Yuck.
  This also exposes that I was too aggressive in avoiding domain crossing in r218588 with that lowering -- when the other option is to expand into two 128-bit vectors, it is worth domain crossing. Restore that behavior now that we have nice tests covering it.
  The test updates here fall into two camps. One is where previously we ended up with an unsigned encoding of the blend operand and now we get a signed encoding. In most of those places there were elaborate comments explaining exactly what these operands really mean. Rather than that, just switch these tests to use the nicely decoded comments that make it obvious that the final shuffle matches. The other updates are just removing pointless domain crossing by blending integers with PBLENDW rather than BLENDPS.
  llvm-svn: 218589
* [x86] Add the dispatch skeleton to the new vector shuffle lowering for AVX-512. (Chandler Carruth, 2014-09-29; 1 file, -1247/+469)
  There is no interesting logic yet. Everything ends up eventually delegating to the generic code to split the vector and shuffle the halves. Interestingly, that logic does a significantly better job of lowering all of these types than the generic vector expansion code does. Mostly, it lets most of the cases fall back to nice AVX2 code rather than all the way back to SSE code paths.
  Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next up will be to incrementally add direct support for the basic instruction set to each type (adding tests first).
  llvm-svn: 218585
* [x86] Teach the new vector shuffle lowering to fall back on AVX-512 vectors. (Chandler Carruth, 2014-09-28; 1 file, -0/+2217)
  Someone will need to build the AVX512 lowering, which should follow AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added a dummy test which is a port of the v8f32 and v8i32 tests from AVX and AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this is enough information for someone to implement proper lowering here. If not, I'll be happy to help, but right now the AVX-512 support isn't a priority for me.
  llvm-svn: 218583
* [x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2 lowerings. (Chandler Carruth, 2014-09-28; 2 files, -35/+35)
  This was hopelessly broken. First, the x86 backend wants '-1' to be the element value representing true in a boolean vector, and second the operand order for VSELECT is backwards from the actual x86 instructions.
  To make matters worse, the backend is just using '-1' as the true value to get the high bit to be set. It doesn't actually symbolically map the '-1' to anything. But on x86 this isn't quite how it works: there *only* the high bit is relevant. As a consequence, weird non-'-1' values like 0x80 actually "work" once you flip the operands to be backwards.
  Anyways, thanks to Hal for helping me sort out what these *should* be.
  llvm-svn: 218582
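  For reference, the two mismatches paraphrased (my summary, not text from the commit):
    ; ISD::VSELECT:  result[i] = cond[i] ? op1[i] : op2[i]
    ;                (true, i.e. '-1', selects the FIRST operand)
    ; x86 BLENDV*:   a set mask high bit selects the SECOND source operand,
    ;                and only the high bit of each mask element is tested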
* [x86] Fix a really silly bug that I introduced fixing another bug in the new vector shuffle target DAG combines -- it helps to actually test for the value you want rather than just using an integer in a boolean context. (Chandler Carruth, 2014-09-28; 1 file, -0/+26)
  Have I mentioned that I loathe implicit conversions recently? :: sigh ::
  llvm-svn: 218576
* [x86] Fix yet another bug in the new vector shuffle lowering's handling of widening masks. (Chandler Carruth, 2014-09-28; 1 file, -0/+37)
  We can't widen a zeroing mask unless both elements that would be merged are either zeroed or undef. This is the only way to widen a mask if it has a zeroed element.
  Also clean up the code here by ordering the checks in a more logical way and by using the symbolic values for undef and zero. I'm actually torn on using the symbolic values because the existing code is littered with the assumption that -1 is undef, and moreover that entries '< 0' are the special entries. While that works with the values given to these constants, using the symbolic constants actually makes it a bit more opaque why this is the case.
  llvm-svn: 218575
* [AArch64] Redundant store instructions should be removed as dead code. (James Molloy, 2014-09-27; 1 file, -0/+25)
  If a store is followed by a store of the same value to the same location, the second store is dead/noop and can be removed. This problem was found in spec2006-197.parser. For example:
    stur w10, [x11, #-4]
    stur w10, [x11, #-4]
  One of the two stur instructions can be removed.
  Patch by David Xu!
  llvm-svn: 218569
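  In IR terms (an illustrative sketch, not the commit's testcase):
    define void @dead_store(i32* %p, i32 %v) {
      store i32 %v, i32* %p
      store i32 %v, i32* %p   ; same value, same location: removable
      ret void
    }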
* [x86] Fix terrible bugs everywhere in the new vector shuffle lowering and in the target shuffle combining when trying to widen vector elements. (Chandler Carruth, 2014-09-27; 1 file, -14/+53)
  Previously only one of these was correct, and we didn't correctly propagate zeroing target shuffle masks (which have a different sentinel value from undef in non-target shuffle masks now). This isn't just a missed optimization; this caused us to drop zeroing shuffles on the floor and miscompile code. The added test case is one example of that. There are other fixes to the test suite as a consequence of this, as well as restoring the undef elements in some of the masks that were lost when I brought sanity to the actual *value* of the undef and zero sentinels.
  I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW combining code, but that code really needs to go. It was a nice initial attempt, but it isn't very principled and the recursive shuffle combiner is much more powerful.
  llvm-svn: 218562
* [x86] Flip the sentinel values used in the target shuffle mask decoding to significantly more sane sentinels. (Chandler Carruth, 2014-09-27; 1 file, -6/+6)
  Notably, everywhere else in the backend's representation of shuffles uses '-1' to represent undef. The target shuffle masks really shouldn't diverge from that, especially as in a few places they are manipulated by shared code.
  This causes us to lose some undef lanes in various test masks. I want to get these back, but technically it isn't invalid and there are a *lot* of bugs here, so I want to try to establish a saner baseline for fixing some of the bugs by aligning the specific sentinel values used.
  llvm-svn: 218561
* Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2). (Sanjay Patel, 2014-09-26; 1 file, -31/+31)
  This is purely refactoring. No functional changes intended. PowerPC is the only target that is currently using this interface.
  The ultimate goal is to allow targets other than PowerPC (certainly X86 and AArch64) to turn this:
    z = y / sqrt(x)
  into:
    z = y * rsqrte(x)
  and:
    z = y / x
  into:
    z = y * rcpe(x)
  using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
  There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction along with the number of refinement steps needed to make the estimate usable.
  Differential Revision: http://reviews.llvm.org/D5484
  llvm-svn: 218553
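  For context, the refinement steps mentioned here are typically Newton-Raphson iterations (standard numerics, not spelled out in the commit); for the reciprocal square root:
    ; e0 = rsqrte(x)                       ; hardware estimate
    ; e1 = e0 * (1.5 - 0.5 * x * e0 * e0)  ; one refinement step
    ; z  = y * e1                          ; refined y / sqrt(x)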
* [x86] Fix a moderately terrifying bug in the new 128-bit shuffle logic that managed to elude all of my fuzz testing historically. =/ (Chandler Carruth, 2014-09-26; 2 files, -42/+90)
  Something changed to allow this code path to actually be exercised and it was doing bad things. It is especially heavily exercised by the patterns that emerge when doing AVX shuffles that end up lowered through the 128-bit code path.
  llvm-svn: 218540
* R600/SI: Add strict check lines to div_scale tests. (Matt Arsenault, 2014-09-26; 1 file, -16/+255)
  This has weird operand requirements, so it's worthwhile to have very strict checks for its operands. Add different combinations of SGPR operands.
  llvm-svn: 218535
* R600/SI: Allow same SGPR to be used for multiple operands. (Matt Arsenault, 2014-09-26; 1 file, -0/+96)
  Instead of moving the first SGPR that is different from the first operand, legalize the operand that requires the fewest moves if one SGPR is used for multiple operands.
  This saves extra moves and is also required for some instructions which require that the same operand be used for multiple operands.
  llvm-svn: 218532
* R600/SI: Partially move operand legalization to post-isel hook. (Matt Arsenault, 2014-09-26; 5 files, -11/+11)
  Disable the SGPR usage restriction parts of the DAG legalizeOperands. It should now only be doing immediate folding until it can be replaced later. The real legalization work is now done by the other SIInstrInfo::legalizeOperands.
  llvm-svn: 218531
* R600/SI: Don't move operands that are required to be SGPRs. (Matt Arsenault, 2014-09-26; 1 file, -9/+33)
  e.g. v_cndmask_b32 requires the condition operand to be an SGPR. If one of the source operands were an SGPR, that would be considered the one SGPR use and the condition operand would be illegally moved.
  llvm-svn: 218529
* R600/SI: Fix using wrong operand indices when commuting. (Matt Arsenault, 2014-09-26; 1 file, -2/+2)
  No test, since the current SIISelLowering::legalizeOperands effectively hides this, and the general uses seem to only fire on SALU instructions which don't have modifiers between the operands.
  When trying to use legalizeOperands immediately after instruction selection, it now sees a lot more patterns it did not see before, which break on this.
  llvm-svn: 218527
* [x86] In the new vector shuffle lowering, when trying to do another layer of tie-breaking sorting, it really helps to check that you're in a tie first. =] (Chandler Carruth, 2014-09-26; 1 file, -0/+24)
  Otherwise the whole thing cycles infinitely. Test case added; another one found through fuzz testing.
  llvm-svn: 218523
* [x86] Fix a large collection of bugs that crept in as I fleshed out the AVX support. (Chandler Carruth, 2014-09-26; 2 files, -0/+86)
  New test cases included. Note that none of the existing test cases covered these buggy code paths. =/ Also, it is clear from this that SHUFPS and SHUFPD are the most bug-prone shuffle instructions in x86. =[
  These were all detected by fuzz-testing. (I <3 fuzz testing.)
  llvm-svn: 218522
* [AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables. (Robert Khasanov, 2014-09-26; 3 files, -0/+885)
  Added lowering tests for these instructions.
  llvm-svn: 218508
* Revert patch of r218493, delete the test case. (David Xu, 2014-09-26; 1 file, -23/+0)
  llvm-svn: 218495
* Redundant store instructions should be removed as dead code. (David Xu, 2014-09-26; 1 file, -0/+23)
  llvm-svn: 218493
* Add the first backend support for on-demand subtarget creation based on the Function. (Eric Christopher, 2014-09-26; 4 files, -6/+6)
  This is currently used to implement mips16 support in the mips backend via the existing module pass resetting the subtarget.
  Things to note:
  a) This involved running resetTargetOptions before creating a new subtarget so that code generation options like soft-float could be recognized when creating the new subtarget. This is to deal with initialization code in isel lowering that only paid attention to the initial value.
  b) Many of the existing testcases weren't using the soft-float feature correctly. I've corrected these based on the check values, assuming that was the desired behavior.
  c) The mips port now pays attention to the target-cpu and target-features strings when generating code for a particular function. I've removed these from one function where the requested cpu and features didn't match the check lines in the testcase.
  llvm-svn: 218492
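  As an illustration of point c) (a sketch, not one of the commit's testcases; the exact feature string is my assumption), per-function attribute strings like these now influence which subtarget is used for each function:
    define void @f() #0 {
      ret void
    }
    ; assumed spelling for a function to be compiled as mips16:
    attributes #0 = { "target-cpu"="mips32" "target-features"="+mips16" }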
* R600: Avoid repeated check lines. (Matt Arsenault, 2014-09-26; 1 file, -20/+18)
  llvm-svn: 218487
* R600/SI: Fix emitting trailing whitespace after s_waitcnt. (Matt Arsenault, 2014-09-26; 1 file, -4/+4)
  llvm-svn: 218486
* [AVX512] Make vextract*x4/vinsert*x4 tests check for the index as well. (Adam Nemet, 2014-09-25; 1 file, -7/+7)
  Extend the test so that it provides coverage for the next commit.
  llvm-svn: 218479
* R600: Fix some missing conversion testcases. (Matt Arsenault, 2014-09-25; 3 files, -4/+44)
  llvm-svn: 218474
* Remove duplicated RUN lines in middle of test. (Matt Arsenault, 2014-09-25; 1 file, -2/+0)
  llvm-svn: 218473
* [MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo. (Bruno Cardoso Lopes, 2014-09-25; 1 file, -0/+45)
  Machine Sink uses loop depth information to select between successor BBs to sink machine instructions into, where BBs within smaller loop depths are preferable. This patch adds support for choosing between successors by using profile information from BlockFrequencyInfo instead, whenever the information is available.
  Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5% execution speedup on average on x86-64 darwin.
  <rdar://problem/18021659>
  llvm-svn: 218472
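  A sketch of the idea with a hypothetical example (not from the patch): a computation whose only use lives in a successor measured as cold by BlockFrequencyInfo is sunk there, off the hot path:
    ; before:                          ; after sinking:
    ; entry:                           ; entry:
    ;   %x = mul i32 %a, %b            ;   br i1 %c, label %hot, label %cold
    ;   br i1 %c, %hot, %cold          ; cold:
    ; cold:                            ;   %x = mul i32 %a, %b
    ;   ... use %x ...                 ;   ... use %x ...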
* R600/SI: Add support for global atomic add. (Tom Stellard, 2014-09-25; 1 file, -0/+39)
  llvm-svn: 218457
* Lower idempotent RMWs to fence+load. (Robin Morisset, 2014-09-25; 1 file, -0/+56)
  Summary: I originally tried doing this specifically for X86 in the backend in D5091, but it was rather brittle and generally running too late to be general. Furthermore, other targets may want to implement similar optimizations. So I reimplemented it at the IR-level, fitting it into AtomicExpandPass as it interacts with that pass (which could not be cleanly done before at the backend level).
  This optimization relies on a new target hook, which is only used by X86 for now, as the correctness of the optimization on other targets remains an open question. If it is found correct on other targets, it should be trivial to enable for them.
  Details of the optimization are discussed in D5091.
  Test Plan: make check-all + a new test
  Reviewers: jfb
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5422
  llvm-svn: 218455
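  An RMW is idempotent when the operation leaves the stored value unchanged (x | 0, x + 0, x & -1, and so on). A rough 2014-era IR sketch of the rewrite; the exact fence and load orderings here are my assumption, see D5091/D5422 for the real choices:
    %old = atomicrmw or i32* %p, i32 0 seq_cst
    ; becomes, approximately:
    fence seq_cst
    %old = load atomic i32* %p monotonic, align 4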
* Add missing attributes to !cmp.[eq,gt,gtu] instructions. (Sid Manning, 2014-09-25; 1 file, -0/+50)
  These instructions do not indicate they are extendable or the number of bits in the extendable operand. Rename them to match architected names. Add a testcase for the intrinsics.
  llvm-svn: 218453
* [mips] Add CCValAssign::[ASZ]ExtUpper and CCPromoteToUpperBitsInType and handle structs correctly on big-endian N32/N64 return values. (Daniel Sanders, 2014-09-25; 1 file, -0/+84)
  Summary: The N32/N64 ABIs require that structs passed in registers are laid out such that spilling the register with 'sd' places the struct at the lowest address. For little-endian this is trivial, but for big-endian it requires that structs are shifted into the upper bits of the register.
  We also require that structs passed in registers have the 'inreg' attribute for big-endian N32/N64 to work correctly. This is because the tablegen-erated calling convention implementation only has access to the lowered form of struct arguments (one or more integers of up to 64 bits each) and is unable to determine the original type.
  Reviewers: vmedic
  Reviewed By: vmedic
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5286
  llvm-svn: 218451
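  A hypothetical illustration (register values chosen by me, not from the patch) for returning struct S { char c; } with c = 0x41 on N64:
    ; little-endian: $v0 = 0x0000000000000041  (no shift needed)
    ; big-endian:    $v0 = 0x4100000000000000  (c promoted into the upper bits,
    ;                 so 'sd' spills it to the lowest address of the slot)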
* [x86] Teach the new vector shuffle lowering to use AVX2 instructions for v4f64 and v8f32 shuffles when they are lane-crossing. (Chandler Carruth, 2014-09-25; 2 files, -203/+350)
  We have fully general lane-crossing permutation functions in AVX2 that make this easy. Part of this also changes exactly when and how these vectors are split up when we don't have AVX2. This isn't always a win, but it usually is, so on the balance I think it's better. The primary regressions are all things that just need to be fixed anyway, such as modeling when a blend can be completely accomplished via VINSERTF128, etc.
  Also, this highlights one of the few remaining big features: we do a really poor job of inserting elements into AVX registers efficiently.
  This completes almost all of the big tricks I have in mind for AVX2. The only things left that I plan to add:
  1) element insertion smarts
  2) palignr and other fairly specialized lowerings when they happen to apply
  llvm-svn: 218449
* [x86] Teach the new vector shuffle lowering a fancier way to lower 256-bit vectors with lane-crossing. (Chandler Carruth, 2014-09-25; 4 files, -254/+199)
  Rather than immediately decomposing to 128-bit vectors, try flipping the 256-bit vector lanes, shuffling them, and blending them together. This reduces our worst-case shuffle by a pretty significant margin across the board.
  llvm-svn: 218446
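  Roughly, the trick reads like this in assembly (an illustrative sketch for one v8f32 case, with placeholder immediates; not output from the tests):
    vperm2f128 $1, %ymm0, %ymm0, %ymm1    ; swap the two 128-bit lanes
    vshufps    $imm1, %ymm0, %ymm0, %ymm0 ; in-lane shuffle of the original
    vshufps    $imm2, %ymm1, %ymm1, %ymm1 ; in-lane shuffle of the flipped copy
    vblendps   $imm3, %ymm1, %ymm0, %ymm0 ; blend the two into the final result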
* [x86] Fix an oversight in the v8i32 path of the new vector shuffle lowering where it only used the mask of the low 128-bit lane rather than the entire mask. (Chandler Carruth, 2014-09-25; 1 file, -6/+2)
  This allows the new lowering to correctly match the unpack patterns for v8i32 vectors.
  For reference, the reason that we check for the entire mask rather than checking the repeated mask is that the repeated masks don't abide by all of the invariants of normal masks. As a consequence, it is safer to use the full mask with functions like the generic equivalence test.
  llvm-svn: 218442
* [x86] Implement AVX2 support for v32i8 in the new vector shuffle lowering. (Chandler Carruth, 2014-09-25; 1 file, -211/+55)
  This completes the basic AVX2 feature support, but there are still some improvements I'd like to do to really get the last mile of performance here.
  llvm-svn: 218440
* [x86] More tweaks to the v32i8 test cases. (Chandler Carruth, 2014-09-25; 1 file, -18/+52)
  I made a mistake in the previous commit and produced the wrong pattern. Fix that. Also make one more shuffle pattern byte-based rather than word-based, and add two more blend patterns.
  llvm-svn: 218439
* [x86] Re-work a bunch of the v32i8 test cases to actually involve byte shuffles rather than word shuffles. (Chandler Carruth, 2014-09-25; 1 file, -188/+142)
  As you might guess, these were built starting from the word shuffle test cases, and I failed to properly port a bunch of them, leaving them as widened word shuffle test cases. We still have a couple of tests that check our ability to widen shuffles, but now we will test the actual byte shuffles quite a bit better.
  llvm-svn: 218438