path: root/llvm/test/CodeGen
Commit message · Author · Age · Files · Lines
* [X86][SSE] Add zero element and general 64-bit VZEXT_LOAD support to EltsFromConsecutiveLoads (Simon Pilgrim, 2016-01-26, 1 file, -45/+7)

  This patch adds support for trailing zero elements to VZEXT_LOAD loads
  (and checks that no zero elements occur within the consecutive load).
  It also generalizes the 64-bit VZEXT_LOAD load matching to work for
  loads other than 2x32-bit loads.

  After this patch it will also be easier to add support for other basic
  load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load
  insertion.

  Differential Revision: http://reviews.llvm.org/D16217
  llvm-svn: 258798
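  A minimal IR sketch (hypothetical, not from the commit's tests) of the
  kind of pattern EltsFromConsecutiveLoads can now match as a single
  64-bit zero-extending vector load:

    ; two consecutive i32 loads followed by trailing zero elements
    define <4 x i32> @load_2i32_zero_hi(i32* %p) {
      %p1 = getelementptr inbounds i32, i32* %p, i64 1
      %a = load i32, i32* %p
      %b = load i32, i32* %p1
      %v0 = insertelement <4 x i32> undef, i32 %a, i32 0
      %v1 = insertelement <4 x i32> %v0, i32 %b, i32 1
      %v2 = insertelement <4 x i32> %v1, i32 0, i32 2
      %v3 = insertelement <4 x i32> %v2, i32 0, i32 3
      ret <4 x i32> %v3
    }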
* AMDGPU: Make v32i8/v64i8 illegal types (Matt Arsenault, 2016-01-26, 3 files, -76/+187)

  Old intrinsics were forcing these, but they have now all been removed.
  This fixes large i8 vector operations generally being broken.

  llvm-svn: 258788
* AMDGPU: Remove old sample intrinsics (Matt Arsenault, 2016-01-26, 11 files, -662/+138)

  I did my best to update all the tests that merely happened to use the
  old intrinsics to use the newer ones instead. I'm not sure I got all of
  the immediate operand conversions correct, since the value seems to
  have been ignored by the old pattern, but I don't think it really
  matters.

  llvm-svn: 258787
* AMDGPU: Add new amdgcn intrinsics for cube instructions (Matt Arsenault, 2016-01-26, 6 files, -2/+107)

  More cleanup toward having every intrinsic that maps closely to an
  instruction use the correct amdgcn prefix.

  llvm-svn: 258786
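  For illustration (assuming the cube intrinsics follow the usual
  three-operand V_CUBE*_F32 form; the function name here is invented),
  a call would look like:

    declare float @llvm.amdgcn.cubeid(float, float, float)

    define float @face_id(float %x, float %y, float %z) {
      %id = call float @llvm.amdgcn.cubeid(float %x, float %y, float %z)
      ret float %id
    }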
* AMDGPU: Implement read_register and write_register intrinsics (Matt Arsenault, 2016-01-26, 6 files, -0/+224)

  Some of the special intrinsics that now correspond to an instruction
  also set particular registers, e.g. llvm.SI.sendmsg sets m0 as well as
  using s_sendmsg. Using these explicit register intrinsics may be a
  better option.

  Reading the exec mask and others may be useful for debugging. For this
  I'm not sure it is entirely correct, because we would want this to be
  convergent, although it's possible it is already treated sufficiently
  conservatively.

  llvm-svn: 258785
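  A short sketch (hypothetical test, not from the commit) of reading a
  named physical register through the generic intrinsic:

    declare i64 @llvm.read_register.i64(metadata)

    define i64 @read_exec() {
      %exec = call i64 @llvm.read_register.i64(metadata !0)
      ret i64 %exec
    }

    !0 = !{!"exec"}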
* AMDGPU: Restore AMDGPU prefixed rsq intrinsic for now (Matt Arsenault, 2016-01-26, 2 files, -0/+56)

  Also move it into the backend intrinsics to discourage use of the old
  name.

  llvm-svn: 258783
* [WebAssembly] Optimize memcpy/memmove/memset calls. (Dan Gohman, 2016-01-26, 2 files, -2/+62)

  These calls return their first argument, but because LLVM uses an
  intrinsic with a void return type, they can't use the 'returned'
  attribute. Generalize the store-results pass to optimize these calls
  too.

  llvm-svn: 258781
* [WebAssembly] Implement unaligned loads and stores. (Dan Gohman, 2016-01-26, 4 files, -10/+556)

  Differential Revision: http://reviews.llvm.org/D16534
  llvm-svn: 258779
* [MC] Use .p2align instead of .align (Dan Gohman, 2016-01-26, 72 files, -233/+233)

  For historic reasons, the behavior of .align differs between targets:
  some interpret the operand as a byte count, others as a power of two.
  Fortunately, there are alternatives, .p2align and .balign, which make
  the interpretation of the parameter explicit and behave consistently
  across targets; for example, .p2align 4 and .balign 16 both request
  16-byte alignment.

  This patch teaches MC to use .p2align instead of .align, so that people
  reading code for multiple architectures don't have to remember which
  way each platform does its .align directive.

  Differential Revision: http://reviews.llvm.org/D16549
  llvm-svn: 258750
* X86ISelLowering: Fix cmov(cmov) special lowering bug (Matthias Braun, 2016-01-25, 1 file, -0/+49)

  There's a special case in EmitLoweredSelect() that produces an improved
  lowering for cmov(cmov) patterns. However, this special lowering is
  currently broken if the inner cmov has multiple users, so this patch
  stops using it in that case.

  If you wonder why this wasn't fixed by continuing to use the special
  lowering and inserting a 2nd PHI for the inner cmov: I believe this
  would incur additional copies/register pressure, so the special
  lowering does not improve upon the normal one anymore in this case.

  This fixes http://llvm.org/PR26256 (= rdar://24329747)

  llvm-svn: 258729
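  An illustrative IR shape (not taken from the commit's test) where the
  inner select has a second user, so the special lowering must now be
  skipped:

    define i32 @nested_select(i1 %c1, i1 %c2, i32 %a, i32 %b, i32 %d) {
      %inner = select i1 %c1, i32 %a, i32 %b
      %outer = select i1 %c2, i32 %inner, i32 %d
      %sum = add i32 %inner, %outer    ; second use of %inner
      ret i32 %sum
    }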
* [X86][AVX] Add commutation support for VPERM2X128 instructions (Simon Pilgrim, 2016-01-25, 1 file, -0/+171)

  Its main use is to allow memory folding of the first operand.

  Differential Revision: http://reviews.llvm.org/D16521
  llvm-svn: 258726
* [WebAssembly] Fix unbalanced register stack code in the case of late DCE. (Dan Gohman, 2016-01-25, 1 file, -2/+2)

  Instructions can be DCE'd after the RegStackify pass. If the
  instruction that would be the pop for what would be a push is removed,
  don't use a push.

  llvm-svn: 258694
* [WebAssembly] Add tests for negative offsets with global variable addresses. (Dan Gohman, 2016-01-25, 1 file, -0/+18)

  llvm-svn: 258693
* [SelectionDAG] Use the correct return type for memcpy, memmove, and memset. (Dan Gohman, 2016-01-25, 1 file, -1/+1)

  When generating calls to memcpy, memmove, and memset, use void* as the
  return type rather than void, to match the standard signatures for
  these functions.

  This has no practical effect for most targets, since the return values
  of these calls aren't being used anyway, and most calling conventions
  tolerate this kind of mismatch. However, this change will help support
  future optimizations to utilize the return value to avoid holding the
  argument value live across a call.

  llvm-svn: 258691
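  A sketch of the mismatch being addressed (declarations as of this era
  of LLVM; illustrative only): the IR-level intrinsic has no result, but
  the libcall it lowers to returns the destination pointer.

    ; the intrinsic returns void...
    declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)

    ; ...but the C library function it becomes returns i8* (== dst),
    ; which the generated call now declares correctly:
    declare i8* @memcpy(i8*, i8*, i64)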
* [AVX512] Adding PTESTNMB/D/W/Q instruction (Michael Zuckerman, 2016-01-25, 4 files, -0/+218)

  Differential Revision: http://reviews.llvm.org/D16520
  llvm-svn: 258688
* [AVX512] Adding PTESTMB/W/D/Q instruction (Michael Zuckerman, 2016-01-25, 3 files, -0/+180)

  Differential Revision: http://reviews.llvm.org/D16519
  llvm-svn: 258686
* [ARM] Add DSP build attribute and extension targeting (Bradley Smith, 2016-01-25, 2 files, -4/+19)

  This patch was originally committed as r257885, but was reverted due to
  Windows failures. The cause of these failures has been fixed under
  r258677, hence re-committing the original patch.

  llvm-svn: 258683
* [ARM] Add new system registers to ARMv8-M Baseline/Mainline (Bradley Smith, 2016-01-25, 2 files, -0/+356)

  This patch was originally committed as r257884, but was reverted due to
  Windows failures. The cause of these failures has been fixed under
  r258677, hence re-committing the original patch.

  llvm-svn: 258682
* [X86][IFMA] Adding intrinsics and encoding for multiply and add of unsigned 52-bit integers (Asaf Badouh, 2016-01-25, 2 files, -0/+314)

  VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the
  Low 52-bit Products to Qword Accumulators
  VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Integers and Add the
  High 52-bit Products to 64-bit Accumulators

  Differential Revision: http://reviews.llvm.org/D16407
  llvm-svn: 258680
* AVX1: Enable vector masked_load/store on AVX1. (Igor Breger, 2016-01-25, 1 file, -607/+803)

  Use the AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2
  integer instructions (vpmaskmovd/q).

  Differential Revision: http://reviews.llvm.org/D16528
  llvm-svn: 258675
* Added Skylake client to X86 targets and features (Elena Demikhovsky, 2016-01-24, 1 file, -56/+196)

  Changes in X86.td:
  - Set the features of Intel processors in incremental form:
    IVB = SNB + X, HSW = IVB + X, ...
  - Added the Skylake client processor and defined its features.
  - Added FeatureADX, which was missing on KNL.
  - Added some new features to the appropriate processors: SMAP, IFMA,
    PREFETCHWT1, VMFUNC and others.

  Differential Revision: http://reviews.llvm.org/D16357
  llvm-svn: 258659
* AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation. (Igor Breger, 2016-01-24, 4 files, -1/+241)

  Differential Revision: http://reviews.llvm.org/D16137
  llvm-svn: 258657
* [WinEH] Don't miscompile cleanups which conditionally unwind to caller (David Majnemer, 2016-01-23, 1 file, -0/+24)

  A cleanup can have paths which unwind or end up in unreachable. If
  there is an unreachable path *and* a path which unwinds to caller, we
  would mistakenly inject an unwind path to a catchswitch on the
  unreachable path. This results in a verifier assertion firing because
  the cleanup unwinds to two different places: to the caller and to the
  catchswitch.

  This occurred because we used getCleanupRetUnwindDest to determine if
  the cleanuppad had no cleanuprets. This is incorrect:
  getCleanupRetUnwindDest returns null for cleanuprets which unwind to
  caller.

  llvm-svn: 258651
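  A minimal IR sketch (hypothetical, names invented) of the problematic
  shape: the cleanuppad does have a cleanupret, but because it unwinds to
  caller, getCleanupRetUnwindDest returned null and the pad was treated
  as having no cleanuprets at all.

    define void @f(i1 %cond) personality i32 (...)* @__CxxFrameHandler3 {
    entry:
      invoke void @may_throw()
              to label %done unwind label %cleanup
    cleanup:
      %cp = cleanuppad within none []
      br i1 %cond, label %unwind.out, label %dead
    unwind.out:
      cleanupret from %cp unwind to caller
    dead:
      unreachable
    done:
      ret void
    }

    declare void @may_throw()
    declare i32 @__CxxFrameHandler3(...)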
* [SelectionDAG] Generalised the CONCAT_VECTORS creation to support BUILD_VECTOR and UNDEF folding. (Simon Pilgrim, 2016-01-23, 1 file, -2/+2)

  llvm-svn: 258646
* [CUDA] Die gracefully when trying to output an LLVM alias. (Justin Lebar, 2016-01-23, 1 file, -0/+7)

  Previously, we would just output "foo = bar" in the assembly, and then
  ptxas would choke. Now we die before emitting any invalid code.

  Reviewers: echristo
  Subscribers: jholewinski, llvm-commits, jhen, tra
  Differential Revision: http://reviews.llvm.org/D16490
  llvm-svn: 258638
* regenerate checks and note some near-term improvements (Sanjay Patel, 2016-01-23, 1 file, -239/+1100)

  For the moment, this file takes way too long to run (see inline
  comments), but that should be a temporary problem. The fact that the
  compile time is so slow for a target that doesn't support maskmov may
  be a bug worth investigating too.

  llvm-svn: 258629
* [X86][SSE] Remove INSERTPS dependencies from unreferenced operands. (Simon Pilgrim, 2016-01-23, 1 file, -0/+32)

  If the INSERTPS zeroes out all the referenced elements from either of
  the 2 input vectors (and the input is not already UNDEF), then set that
  input to UNDEF to reduce dependencies.

  llvm-svn: 258622
* Put space after pointer type in test. NFC. (Manuel Jacob, 2016-01-23, 1 file, -2/+2)

  llvm-svn: 258615
* AMDGPU: Replace some deprecated intrinsic uses in tests (Matt Arsenault, 2016-01-23, 8 files, -61/+58)

  llvm-svn: 258614
* AMDGPU: Run instnamer on a few tests (Matt Arsenault, 2016-01-23, 4 files, -1727/+1713)

  This will make future test updates easier.

  llvm-svn: 258613
* AMDGPU: Remove more unused intrinsics (Matt Arsenault, 2016-01-23, 6 files, -1831/+1943)

  Replace tests using lrp with its basic IR expansion.

  llvm-svn: 258612
* AArch64ISel: Fix ccmp code selection matching deep expressions. (Matthias Braun, 2016-01-23, 1 file, -0/+19)

  Some of the conditions necessary to produce ccmp sequences were only
  checked in recursive calls to emitConjunctionDisjunctionTree() after
  some of the earlier expressions were already built. Move all checks
  over to isConjunctionDisjunctionTree() so they are all checked before
  we start emitting instructions.

  Also rename some variables to better reflect their usage.

  llvm-svn: 258605
* [WinEH] Let cleanups post-dominated by unreachable get executed (David Majnemer, 2016-01-22, 4 files, -4/+76)

  Cleanups in C++ are a little weird. They are only guaranteed to be
  reliably executed if, and only if, there is a viable catch handler
  which can handle the exception.

  This means that reachability of a cleanup is lexically determined by it
  being nested within a try-block which unwinds to a catch. It *cannot*
  be reasoned about by examining the control flow edges leaving a
  cleanup. Usually this is not a problem. It becomes a problem when
  there are *no* edges out of a cleanup because we believed that code
  post-dominated by the cleanup is dead.

  In LLVM's case, this code is what informs the personality routine about
  the presence of a suitable catch handler. However, the lack of edges
  to that catch handler makes the handler become unreachable, which
  causes us to remove it. By removing the handler, the cleanup becomes
  unreachable.

  Instead, inject a catch-all handler with every cleanup that has no
  unwind edges. This will allow us to properly unwind the stack.

  This fixes PR25997.

  llvm-svn: 258580
* fixed to test features, not CPU models (Sanjay Patel, 2016-01-22, 1 file, -73/+73)

  llvm-svn: 258568
* AMDGPU: Add new name for barrier intrinsic (Matt Arsenault, 2016-01-22, 1 file, -0/+28)

  llvm-svn: 258558
* AMDGPU: Rename intrinsics to use amdgcn prefix (Matt Arsenault, 2016-01-22, 19 files, -212/+308)

  The intrinsic target prefix should match the target name as it appears
  in the triple.

  This is not yet complete, but gets most of the important ones.
  llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for
  compatibility for now.

  llvm-svn: 258557
* [AArch64] Cleanup ccmp test check labels. NFC. (Ahmed Bougacha, 2016-01-22, 1 file, -10/+10)

  llvm-svn: 258541
* AMDGPU: Fix crash with invariant markers (Matt Arsenault, 2016-01-22, 1 file, -0/+25)

  The promote alloca pass didn't handle these intrinsics and crashed.
  These intrinsics should accept any address space, but for now just
  erase them to avoid breaking.

  llvm-svn: 258537
* [NVPTX] expand mul_lohi to mul_lo and mul_hi (Jingyue Wu, 2016-01-22, 1 file, -0/+24)

  Fixes PR26186.

  Reviewers: grosser, jholewinski
  Subscribers: jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D16479
  llvm-svn: 258536
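  For context, a sketch (illustrative, not the commit's test) of IR whose
  legalization produces a MUL_LOHI node of the kind NVPTX previously
  failed to select:

    define i128 @wide_smul(i64 %a, i64 %b) {
      %a.ext = sext i64 %a to i128
      %b.ext = sext i64 %b to i128
      %p = mul i128 %a.ext, %b.ext    ; becomes SMUL_LOHI i64 in the DAG
      ret i128 %p
    }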
* [AArch64] Lower 2-CC FCCMPs (one/ueq) using AND'ed CCs. (Ahmed Bougacha, 2016-01-22, 1 file, -18/+160)

  The current behavior is incorrect, as the two CCs returned by
  changeFPCCToAArch64CC, intended to be OR'ed, are instead used in an
  AND ccmp chain.

  Consider:
    define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
      %cc1 = fcmp one float %a, %b
      %cc2 = fcmp olt float %c, %d
      %and = and i1 %cc1, %cc2
      %r = select i1 %and, i32 %e, i32 %f
      ret i32 %r
    }

  Assuming (%a < %b) and (%c < %d); we used to do:
    fcmp s0, s1            # nzcv <- 1000
    orr w8, wzr, #0x1      # w8 <- 1
    csel w9, w8, wzr, mi   # w9 <- 1
    csel w8, w8, w9, gt    # w8 <- 1
    fcmp s2, s3            # nzcv <- 1000
    cset w9, mi            # w9 <- 1
    tst w8, w9             # (w8 & w9) == 1, so: nzcv <- 0000
    csel w0, w0, w1, ne    # w0 <- w0

  We now do:
    fcmp s2, s3            # nzcv <- 1000
    fccmp s0, s1, #0, mi   # mi, so: nzcv <- 1000
    fccmp s0, s1, #8, le   # !le, so: nzcv <- 1000
    csel w0, w0, w1, pl    # !pl, so: w0 <- w1

  In other words, we transformed:
    (c < d) && ((a < b) || (a > b))
  into:
    (c < d) && (a u>= b) && (a u<= b)
  whereas, per De Morgan's, we wanted:
    (c < d) && !((a u>= b) && (a u<= b))

  Note that this problem doesn't occur in the test-suite.

  changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt. We
  can't represent that in the fccmp chain; it can't express arbitrary OR
  sequences, as one comment explains:
    In general we can create code for arbitrary "... (and (and A B) C)"
    sequences. We can also implement some "or" expressions, because
    "(or A B)" is equivalent to "not (and (not A) (not B))" and we can
    implement some negation operations. [...] However there is no way
    to negate the result of a partial sequence.

  Instead, introduce changeFPCCToANDAArch64CC, which produces the
  conjunct cond codes:
    - (a one b) == ((a olt b) || (a ogt b)) == ((a ord b) && (a une b))
    - (a ueq b) == ((a uno b) || (a oeq b)) == ((a ule b) && (a uge b))

  Note that, at first, one might think that, when PushNegate is true, we
  should use the disjunct CCs, in effect doing:
    (a || b) = !(!a && !(b))
             = !(!a && !(b1 || b2))   <- changeFPCCToAArch64CC(b, b1, b2)
             = !(!a && !b1 && !b2)

  However, we can take advantage of the fact that the CC is already
  negated, which lets us avoid special-casing PushNegate and do the
  simpler to reason about:
    (a || b) = !(!a && (!b))
             = !(!a && (b1 && b2))    <- changeFPCCToANDAArch64CC(!b, b1, b2)
             = !(!a && b1 && b2)

  This makes both emitConditionalCompare cases behave identically, and
  produces correct ccmp sequences for the 2-CC fcmps.

  llvm-svn: 258533
* [Hexagon] Use general purpose registers to spill pred/mod registers into (Krzysztof Parzyszek, 2016-01-22, 1 file, -0/+42)

  Patch by Tobias Edler Von Koch.

  llvm-svn: 258527
* AMDGPU: Rename some r600 intrinsics to use correct TargetPrefix (Matt Arsenault, 2016-01-22, 2 files, -9/+9)

  These aren't directly emitted by mesa; they are inserted by a pass.

  llvm-svn: 258523
* AMDGPU: Remove AMDGPU.fract intrinsic (Matt Arsenault, 2016-01-22, 4 files, -71/+87)

  Mesa doesn't use this, and this pattern is already matched from
  fsub x, (ffloor x).

  llvm-svn: 258513
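  A minimal sketch (hypothetical test) of the plain IR that already
  matches the fract pattern, making the intrinsic redundant:

    declare float @llvm.floor.f32(float)

    define float @fract(float %x) {
      %floor = call float @llvm.floor.f32(float %x)
      %fract = fsub float %x, %floor
      ret float %fract
    }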
* [SelectionDAG] Fold more offsets into GlobalAddresses (Dan Gohman, 2016-01-22, 5 files, -11/+725)

  This reapplies r258296 and r258366, and also fixes an existing bug in
  SelectionDAG.cpp's isMemSrcFromString, which neglected to account for
  the offset in a GlobalAddressSDNode and is uncovered by those patches.

  llvm-svn: 258482
* Do not lower VSETCC if operand is an f16 vector (Pirama Arumuga Nainar, 2016-01-22, 2 files, -0/+358)

  SETCC with f16 vectors has OperationAction set to Expand but still gets
  lowered to FCM* intrinsics based on its result type. This patch skips
  lowering of VSETCC if the operand is an f16 vector.

  v4 and v8 tests included.

  Reviewers: ab, jmolloy
  Subscribers: srhines, llvm-commits
  Differential Revision: http://reviews.llvm.org/D15361
  llvm-svn: 258471
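  An illustrative example (not the commit's test) of the kind of compare
  that must now be expanded rather than lowered directly:

    define <4 x i1> @cmp_v4f16(<4 x half> %a, <4 x half> %b) {
      %cmp = fcmp olt <4 x half> %a, %b
      ret <4 x i1> %cmp
    }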
* Revert "[SelectionDAG] Fold more offsets into GlobalAddresses"Reid Kleckner2016-01-224-701/+11
| | | | | | | | | | | | | | This reverts r258296 and the follow up r258366. With this change, we miscompiled the following program on Windows: #include <string> #include <iostream> static const char kData[] = "asdf jkl;"; int main() { std::string s(kData + 3, sizeof(kData) - 3); std::cout << s << '\n'; } llvm-svn: 258465
* Avoid unnecessary stack realignment in musttail thunks with SSE2 enabled (Reid Kleckner, 2016-01-21, 1 file, -0/+3)

  The X86 musttail implementation finds register parameters to forward by
  running the calling convention algorithm until a non-register location
  is returned. However, assigning a vector memory location has the side
  effect of increasing the function's stack alignment. We shouldn't
  increase the stack alignment when we are only looking for register
  parameters, so this change conditionalizes it.

  llvm-svn: 258442
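  For reference, a sketch (hypothetical, names invented) of the kind of
  forwarding thunk affected, which must preserve the incoming stack
  layout exactly:

    declare void @target(i32, ...)

    define void @thunk(i32 %a, ...) {
      ; forward all incoming arguments, including varargs
      musttail call void (i32, ...) @target(i32 %a, ...)
      ret void
    }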
* [X86][SSE] Improve i16 splatting shuffles (Simon Pilgrim, 2016-01-21, 9 files, -212/+169)

  Better handling of the annoying pshuflw/pshufhw ops which only shuffle
  the lower/upper halves of a vector.

  Added vXi16 unary shuffle support for cases where i16 elements (from
  the same half of the source) are being splatted to the whole of one of
  the halves. This avoids the general lowering case, which must shuffle
  the 32-bit elements first - meaning that we used to end up with
  unnecessary duplicate pshuflw/pshufhw shuffles.

  Note this has the side effect of a lot of SSSE3 test cases no longer
  needing to use PSHUFB, as it falls below the 3-op combine threshold for
  when PSHUFB is typically worth it. I've raised PR26183 to discuss
  whether the threshold should be changed and whether we need to make it
  more specific to the target CPU.

  Differential Revision: http://reviews.llvm.org/D14901
  llvm-svn: 258440
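  A sketch (hypothetical test) of the kind of splat this improves: every
  output element comes from a single i16 in the low half of the source.

    define <8 x i16> @splat_elt2(<8 x i16> %v) {
      %s = shufflevector <8 x i16> %v, <8 x i16> undef,
             <8 x i32> <i32 2, i32 2, i32 2, i32 2,
                        i32 2, i32 2, i32 2, i32 2>
      ret <8 x i16> %s
    }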
* AVX512: Masked move intrinsic implementation. (Igor Breger, 2016-01-21, 10 files, -23/+324)

  Implemented intrinsics for the following instructions (reg move):
  VMOVDQU8/16, VMOVDQA32/64, VMOVAPS/PD.

  Differential Revision: http://reviews.llvm.org/D16316
  llvm-svn: 258398
* [AVX512] Adding VPERMT2B and VPERMI2B Intrinsics (Michael Zuckerman, 2016-01-21, 2 files, -0/+170)

  Differential Revision: http://reviews.llvm.org/D16398
  llvm-svn: 258397