path: root/llvm/lib/Target
Columns: commit message, author, age, files changed, lines removed/added
* Added Skylake client to X86 targets and featuresElena Demikhovsky2016-01-244-140/+129
    Changes in X86.td:
    I set features of Intel processors in incremental form:
      IVB = SNB + X
      HSW = IVB + X
      ..
    I added the Skylake client processor and defined its features.
    FeatureADX was missing on KNL.
    Added some new features to the appropriate processors: SMAP, IFMA, PREFETCHWT1, VMFUNC and others.

    Differential Revision: http://reviews.llvm.org/D16357
    llvm-svn: 258659
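The incremental form described in this commit (IVB = SNB + X, HSW = IVB + X) can be pictured as each processor's feature set being the previous set plus a delta. The following is only a minimal sketch: the names `SNB_FEATURES`, `IVB_FEATURES`, `HSW_FEATURES` and `added_features` are hypothetical, and the feature lists are a small illustrative subset, not the contents of X86.td.

```python
# Hypothetical sketch of incremental feature lists; NOT the contents of X86.td.
# Each set is a small illustrative subset of the real processor features.
SNB_FEATURES = frozenset({"AVX", "AES", "PCLMUL", "POPCNT"})

# IVB = SNB + X
IVB_FEATURES = SNB_FEATURES | {"RDRND", "F16C", "FSGSBASE"}

# HSW = IVB + X
HSW_FEATURES = IVB_FEATURES | {"AVX2", "BMI", "BMI2", "FMA", "LZCNT", "MOVBE"}


def added_features(base, derived):
    """Return the sorted delta 'X' that 'derived' adds on top of 'base'."""
    return sorted(derived - base)


if __name__ == "__main__":
    print("IVB adds:", added_features(SNB_FEATURES, IVB_FEATURES))
    print("HSW adds:", added_features(IVB_FEATURES, HSW_FEATURES))
```

Written this way, a feature added to an older processor is inherited by every later one, which is presumably the point of the incremental form.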
* AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation.Igor Breger2016-01-244-8/+42
    Differential Revision: http://reviews.llvm.org/D16137
    llvm-svn: 258657
* [X86][SSE] Generalised TRUNC -> PACKSS/PACKUS code. NFC.Simon Pilgrim2016-01-231-16/+11
    Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types.
    llvm-svn: 258645
* [CUDA] Die gracefully when trying to output an LLVM alias.Justin Lebar2016-01-231-0/+5
    Summary:
    Previously, we would just output "foo = bar" in the assembly, and then ptxas would choke. Now we die before emitting any invalid code.

    Reviewers: echristo
    Subscribers: jholewinski, llvm-commits, jhen, tra

    Differential Revision: http://reviews.llvm.org/D16490
    llvm-svn: 258638
* [CUDA] Make empty parameter lists in nvptx function decls easier to read.Justin Lebar2016-01-231-0/+5
    Summary:
    Before:
      .func (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv(
      )
      {
    After:
      .func (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv()
      {

    Reviewers: bkramer
    Subscribers: llvm-commits, tra, jhen, echristo, jholewinski

    Differential Revision: http://reviews.llvm.org/D16512
    llvm-svn: 258637
* Silence a -Wparentheses warning; NFC.Aaron Ballman2016-01-231-1/+1
    llvm-svn: 258626
* Added missing comment. NFC.Simon Pilgrim2016-01-231-2/+3
    llvm-svn: 258624
* [X86][SSE] Remove INSERTPS dependencies from unreferenced operands.Simon Pilgrim2016-01-231-3/+13
    If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies.
    llvm-svn: 258622
* Inline variable into assertMatthias Braun2016-01-231-3/+1
    Seems like some compilers still give unused variable warnings for
      bool var = ...; (void)var;
    so I have to inline the variable.
    llvm-svn: 258619
* AArch64ISelLowering.cpp: Fix a warning. [-Wunused-variable]NAKAMURA Takumi2016-01-231-0/+1
    llvm-svn: 258618
* Put space after pointer type in test. NFC.Manuel Jacob2016-01-231-1/+1
    llvm-svn: 258615
* AMDGPU: Remove more unused intrinsicsMatt Arsenault2016-01-236-73/+4
    Replace tests with lrp with basic IR expansion
    llvm-svn: 258612
* AMDGPU: Move amdgcn intrinsic handling into SITargetLoweringMatt Arsenault2016-01-232-73/+68
    llvm-svn: 258608
* AMDGPU: Remove IntrNoMem from llvm.SI.sendmsgMatt Arsenault2016-01-231-1/+1
    This has side effects.
    llvm-svn: 258607
* AMDGPU: Remove Feature64BitPtrMatt Arsenault2016-01-233-14/+4
    This is a leftover from AMDIL that doesn't do anything and doesn't belong here.
    llvm-svn: 258606
* AArch64ISel: Fix ccmp code selection matching deep expressions.Matthias Braun2016-01-231-48/+79
    Some of the conditions necessary to produce ccmp sequences were only checked in recursive calls to emitConjunctionDisjunctionTree() after some of the earlier expressions were already built. Move all checks over to isConjunctionDisjunctionTree() so they are all checked before we start emitting instructions.

    Also rename some variables to better reflect their usage.
    llvm-svn: 258605
* AArch64ISelLowering: Reduce maximum recursion depth of isConjunctionDisjunctionTree()Matthias Braun2016-01-231-2/+2
    This function will exhibit exponential runtime (2**n) so we should rather use a lower limit.
    llvm-svn: 258604
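To see why a depth limit matters for a check with 2**n behaviour, consider a toy model in which a recursive check descends into both operands of every node until a depth limit is hit. This is a sketch under stated assumptions, not the LLVM function: `count_visits` and `visits_for_limit` are hypothetical names, and the tree is assumed to be a worst-case full binary tree.

```python
def count_visits(depth, max_depth):
    """Count nodes visited when recursing into both children of every node
    of a (hypothetical) full binary tree, stopping once depth > max_depth."""
    if depth > max_depth:
        return 0
    # One visit for this node, plus the visits made in each of the two subtrees.
    return 1 + 2 * count_visits(depth + 1, max_depth)


def visits_for_limit(max_depth):
    """Total visits for a given depth limit, starting at the root (depth 0)."""
    return count_visits(0, max_depth)


if __name__ == "__main__":
    for limit in (2, 4, 8, 16):
        print(limit, visits_for_limit(limit))
```

In this model the visit count is 2**(n+1) - 1 for a limit of n, so each extra level of allowed depth roughly doubles the worst-case work.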
* Fix wrong indentationMatthias Braun2016-01-231-4/+4
    llvm-svn: 258603
* [WebAssembly] Fix RegNumbering for the stack pointerDerek Schuff2016-01-231-5/+13
    Previously it failed to add NumArgRegs to the offset and so clobbered an already-used register. Now just start the numbering after the arg regs and don't duplicate the add. Test coverage for this coming shortly with the implementation of byval.
    llvm-svn: 258597
* fix typos; NFCSanjay Patel2016-01-221-2/+2
    llvm-svn: 258567
* AMDGPU: Add new name for barrier intrinsicMatt Arsenault2016-01-221-1/+7
    llvm-svn: 258558
* AMDGPU: Rename intrinsics to use amdgcn prefixMatt Arsenault2016-01-224-13/+29
    The intrinsic target prefix should match the target name as it appears in the triple.

    This is not yet complete, but gets most of the important ones.

    llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for compatibility for now.
    llvm-svn: 258557
* AMDGPU: Fix crash with invariant markersMatt Arsenault2016-01-221-0/+8
    The promote alloca pass didn't handle these intrinsics and crashed. These intrinsics should accept any address space, but for now just erase them to avoid breaking.
    llvm-svn: 258537
* [NVPTX] expand mul_lohi to mul_lo and mul_hiJingyue Wu2016-01-221-0/+4
    Summary: Fixes PR26186.

    Reviewers: grosser, jholewinski
    Subscribers: jholewinski, llvm-commits

    Differential Revision: http://reviews.llvm.org/D16479
    llvm-svn: 258536
* [AArch64] Simplify emitConditionalCompare calls. NFC.Ahmed Bougacha2016-01-221-13/+9
    Now that both callsites are identical, we can simplify the prototype and make it easier to reason about the 2-CC case.
    llvm-svn: 258534
* [AArch64] Lower 2-CC FCCMPs (one/ueq) using AND'ed CCs.Ahmed Bougacha2016-01-221-8/+36
    The current behavior is incorrect, as the two CCs returned by changeFPCCToAArch64CC, intended to be OR'ed, are instead used in an AND ccmp chain.

    Consider:
      define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
        %cc1 = fcmp one float %a, %b
        %cc2 = fcmp olt float %c, %d
        %and = and i1 %cc1, %cc2
        %r = select i1 %and, i32 %e, i32 %f
        ret i32 %r
      }

    Assuming (%a < %b) and (%c < %d); we used to do:
      fcmp s0, s1           # nzcv <- 1000
      orr w8, wzr, #0x1     # w8 <- 1
      csel w9, w8, wzr, mi  # w9 <- 1
      csel w8, w8, w9, gt   # w8 <- 1
      fcmp s2, s3           # nzcv <- 1000
      cset w9, mi           # w9 <- 1
      tst w8, w9            # (w8 & w9) == 1, so: nzcv <- 0000
      csel w0, w0, w1, ne   # w0 <- w0

    We now do:
      fcmp s2, s3           # nzcv <- 1000
      fccmp s0, s1, #0, mi  # mi, so: nzcv <- 1000
      fccmp s0, s1, #8, le  # !le, so: nzcv <- 1000
      csel w0, w0, w1, pl   # !pl, so: w0 <- w1

    In other words, we transformed:
      (c < d) && ((a < b) || (a > b))
    into:
      (c < d) && (a u>= b) && (a u<= b)
    whereas, per De Morgan's, we wanted:
      (c < d) && !((a u>= b) && (a u<= b))

    Note that this problem doesn't occur in the test-suite.

    changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt. We can't represent that in the fccmp chain; it can't express arbitrary OR sequences, as one comment explains:
      In general we can create code for arbitrary "... (and (and A B) C)" sequences. We can also implement some "or" expressions, because "(or A B)" is equivalent to "not (and (not A) (not B))" and we can implement some negation operations. [...] However there is no way to negate the result of a partial sequence.

    Instead, introduce changeFPCCToANDAArch64CC, which produces the conjunct cond codes:
    - (a one b) == ((a olt b) || (a ogt b)) == ((a ord b) && (a une b))
    - (a ueq b) == ((a uno b) || (a oeq b)) == ((a ule b) && (a uge b))

    Note that, at first, one might think that, when PushNegate is true, we should use the disjunct CCs, in effect doing:
      (a || b)
      = !(!a && !(b))
      = !(!a && !(b1 || b2))  <- changeFPCCToAArch64CC(b, b1, b2)
      = !(!a && !b1 && !b2)

    However, we can take advantage of the fact that the CC is already negated, which lets us avoid special-casing PushNegate and doing the simpler to reason about:
      (a || b)
      = !(!a && (!b))
      = !(!a && (b1 && b2))   <- changeFPCCToANDAArch64CC(!b, b1, b2)
      = !(!a && b1 && b2)

    This makes both emitConditionalCompare cases behave identically, and produces correct ccmp sequences for the 2-CC fcmps.

    llvm-svn: 258533
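The two identities quoted in this commit message (for `one` and `ueq`) can be checked by brute force over a few sample values, including NaN. This is a minimal sketch that models the fcmp predicates with ordinary Python float comparisons; the helper names (`uno`, `ord_`, `check_identities`, and so on) are hypothetical and nothing here is taken from the AArch64 backend.

```python
import itertools
import math


def uno(a, b):
    """Unordered: at least one operand is NaN."""
    return math.isnan(a) or math.isnan(b)


def ord_(a, b):
    """Ordered: neither operand is NaN."""
    return not uno(a, b)


# Ordered predicates: Python's <, >, == are already False when a NaN is involved.
def olt(a, b): return a < b
def ogt(a, b): return a > b
def oeq(a, b): return a == b

# Unordered predicates: true if unordered OR the comparison holds.
def une(a, b): return uno(a, b) or a != b
def ule(a, b): return uno(a, b) or a <= b
def uge(a, b): return uno(a, b) or a >= b

# The two predicates that need two condition codes.
def one(a, b): return ord_(a, b) and a != b
def ueq(a, b): return uno(a, b) or a == b


SAMPLES = [float("nan"), float("-inf"), -1.0, 0.0, 1.0, float("inf")]


def check_identities():
    """Verify the disjunct and conjunct forms agree on every sample pair."""
    for a, b in itertools.product(SAMPLES, repeat=2):
        # (a one b) == (olt || ogt) == (ord && une)
        assert one(a, b) == (olt(a, b) or ogt(a, b)) == (ord_(a, b) and une(a, b))
        # (a ueq b) == (uno || oeq) == (ule && uge)
        assert ueq(a, b) == (uno(a, b) or oeq(a, b)) == (ule(a, b) and uge(a, b))
    return True


if __name__ == "__main__":
    print(check_identities())
```

The conjunct forms are the ones that fit an AND-only ccmp chain, which is why the commit introduces a separate helper for them.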
* [AArch64] Assert that CCMP isel didn't fail inconsistently.Ahmed Bougacha2016-01-221-0/+2
    We verify that the op tree is eligible for CCMP emission in isConjunctionDisjunctionTree, but it's also possible that emitConjunctionDisjunctionTree fails later.

    The initial check is useful, as it avoids building nodes that will get discarded. Still, make sure that inconsistencies don't happen with an assert.
    llvm-svn: 258532
* [Hexagon] Use general purpose registers to spill pred/mod registers intoKrzysztof Parzyszek2016-01-224-78/+310
    Patch by Tobias Edler Von Koch.
    llvm-svn: 258527
* AMDGPU: Rename some r600 intrinsics to use correct TargetPrefixMatt Arsenault2016-01-223-39/+44
    These ones aren't directly emitted by mesa and inserted by a pass.
    llvm-svn: 258523
* AMDGPU: Remove unused R600 intrinsicsMatt Arsenault2016-01-222-48/+0
    llvm-svn: 258522
* AMDGPU: Change control flow intrinsics to use amdgcn prefixMatt Arsenault2016-01-223-21/+23
    These aren't supposed to be used outside of the backend, so there aren't any users to worry about.
    llvm-svn: 258516
* AMDGPU: Don't use separate mulhu/mulhs PatsMatt Arsenault2016-01-221-12/+2
    llvm-svn: 258515
* AMDGPU: Remove random TGSI intrinsicMatt Arsenault2016-01-223-14/+0
    I don't think this was ever used.
    llvm-svn: 258514
* AMDGPU: Remove AMDGPU.fract intrinsicMatt Arsenault2016-01-224-7/+1
    Mesa doesn't use this, and this is pattern matched already from fsub x, (ffloor x)
    llvm-svn: 258513
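The pattern mentioned here, fsub x, (ffloor x), is just the arithmetic definition of the fractional part. A minimal sketch of that arithmetic follows; the function name `fract` is hypothetical and the code models only the math, not the AMDGPU pattern matching.

```python
import math


def fract(x):
    """Fractional part computed as x - floor(x), mirroring fsub x, (ffloor x)."""
    return x - math.floor(x)


if __name__ == "__main__":
    # With floor-based semantics the result is in [0, 1) for finite inputs,
    # including negative ones.
    for value in (1.75, -1.25, 3.0):
        print(value, fract(value))
```

Because floor rounds toward negative infinity, a negative input such as -1.25 yields 0.75 rather than -0.25.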
* NFC WebAssembly: update linksJF Bastien2016-01-221-2/+2
    I got a vanity URL, and moved the github waterfall repo.
    llvm-svn: 258484
* Do not lower VSETCC if operand is an f16 vectorPirama Arumuga Nainar2016-01-221-0/+3
    Summary:
    SETCC with f16 vectors has OperationAction set to Expand but still gets lowered to FCM* intrinsics based on its result type. This patch skips lowering of VSETCC if the operand is an f16 vector.

    v4 and v8 tests included.

    Reviewers: ab, jmolloy
    Subscribers: srhines, llvm-commits

    Differential Revision: http://reviews.llvm.org/D15361
    llvm-svn: 258471
* [X86][SSE] Improve i16 splatting shufflesSimon Pilgrim2016-01-211-0/+20
    Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector.

    Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles.

    Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU.

    Differential Revision: http://reviews.llvm.org/D14901
    llvm-svn: 258440
* [TTI] Add getCacheLineSizeAdam Nemet2016-01-213-5/+16
    Summary:
    And use it in PPCLoopDataPrefetch.cpp.

    @hfinkel, please let me know if your preference would be to preserve the ppc-loop-prefetch-cache-line option in order to be able to override the value of TTI::getCacheLineSize for PPC.

    Reviewers: hfinkel
    Subscribers: hulx2000, mcrosier, mssimpso, hfinkel, llvm-commits

    Differential Revision: http://reviews.llvm.org/D16306
    llvm-svn: 258419
* [mips] Allowed dla instructions on 32-bit architectures.Scott Egerton2016-01-211-5/+15
    Summary:
    This is now the same as the behaviour of the GNU assembler. This was done as it is required in order to build the Linux kernel with the integrated assembler enabled.

    Reviewers: dsanders, vkalintiris
    Subscribers: dsanders, llvm-commits

    Differential Revision: http://reviews.llvm.org/D13594
    llvm-svn: 258400
* AVX512: Masked move intrinsic implementation.Igor Breger2016-01-212-11/+24
    Implemented intrinsics for the following instructions (reg move): VMOVDQU8/16, VMOVDQA32/64, VMOVAPS/PD.

    Differential Revision: http://reviews.llvm.org/D16316
    llvm-svn: 258398
* [AVX512] Adding VPERMT2B and VPERMI2B Intrinsics Michael Zuckerman2016-01-211-0/+18
    Differential Revision: http://reviews.llvm.org/D16398
    llvm-svn: 258397
* PR26172: unnecessary indirection in HexagonCopyToCombine.cppKrzysztof Parzyszek2016-01-211-1/+1
    llvm-svn: 258395
* [X86] - Removing warning on legal cases caused by commit r258132Marina Yatsina2016-01-211-4/+13
    There's an overloading of the "movsd" and "cmpsd" instructions, e.g. movsd can be either "Move Data from String to String" or "Move or Merge Scalar Double-Precision Floating-Point Value". The former should produce warnings when parsing a memory operand that is not ESI/EDI, but the latter should not.

    Fixed the code to produce warnings only after making sure we're dealing with the first case.

    Expanded the tests of the produced warnings + fixed RUN line of the test so that it would check both stdout and stderr

    Differential Revision: http://reviews.llvm.org/D16359
    llvm-svn: 258393
* AMDGPU/SI: Pass whether to use the SI scheduler via Target AttributeTom Stellard2016-01-214-1/+13
    Summary:
    Currently the SI scheduler can be selected via command line option, but it turned out it would be better if it was selectable via a Target Attribute.

    This patch adds "si-scheduler" attribute to the backend.

    Reviewers: tstellarAMD, echristo
    Subscribers: echristo, arsenm

    Differential Revision: http://reviews.llvm.org/D16192
    llvm-svn: 258386
* AMDGPU/SI: Promote i1 SETCC operationsTom Stellard2016-01-201-0/+1
    Summary:
    While working on uniform branching, I've hit a few cases where we emit i1 SETCC operations.

    Reviewers: arsenm
    Subscribers: arsenm, llvm-commits

    Differential Revision: http://reviews.llvm.org/D16233
    llvm-svn: 258352
* AMDGPU: Fix old comments that mention AMDILMatt Arsenault2016-01-203-4/+4
    llvm-svn: 258350
* AMDGPU: Remove AMDGPU.trunc intrinsicMatt Arsenault2016-01-202-3/+0
    llvm-svn: 258348
* AMDGPU: Remove AMDIL.fraction intrinsicMatt Arsenault2016-01-203-4/+1
    llvm-svn: 258347
* AMDGPU: Remove AMDIL.round.nearest intrinsicMatt Arsenault2016-01-202-3/+0
    llvm-svn: 258346
* AMDGPU: Remove abs intrinsicMatt Arsenault2016-01-203-16/+0
    llvm-svn: 258343