path: root/llvm/test/CodeGen
Commit message, Author, Age, Files, Lines
* merge consecutive stores of extracted vector elements (PR21711)Sanjay Patel2015-01-221-0/+59
| | | | | | | | | | | | | | | | | | This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. That patch was checked in at r224611, but reverted at r225031 because it caused a failure outside of the regression tests. The cause of the crash was not recognizing consecutive stores that have mixed source values (loads and vector element extracts), so this patch adds a check to bail out if any store value is not coming from a vector element extract. This patch also refactors the shared logic of the constant source and vector extracted elements source cases into a helper function. Differential Revision: http://reviews.llvm.org/D6850 llvm-svn: 226845
* [DAGCombine] Produce better code for constant splatsMichael Kuperstein2015-01-223-4/+44
| | | | | | | | | | | This solves PR22276. Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead. Differential Revision: http://reviews.llvm.org/D7093 Fixed recommit of r226811. llvm-svn: 226816
* Revert r226811, MSVC accepts code sane compilers don't.Michael Kuperstein2015-01-223-44/+4
| | | | llvm-svn: 226814
* [DAGCombine] Produce better code for constant splatsMichael Kuperstein2015-01-223-4/+44
| | | | | | | | | This solves PR22276. Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead. Differential Revision: http://reviews.llvm.org/D7093 llvm-svn: 226811
* Fixed a bug in type legalizer for masked load/store intrinsics.Elena Demikhovsky2015-01-221-3/+4
| | | | | | | | | | | | The problem occurs when, after vectorization, we have type <2 x i32>. This type is promoted to <2 x i64> and then requires additional effort for expanding loads and truncating stores. I added EXPAND / TRUNCATE attributes to the masked load/store SDNodes. The code now contains additional shuffles. I've prepared changes in the cost estimation for masked memory operations; they will be submitted separately. llvm-svn: 226808
* Fixed a bug in narrowing store operation.Elena Demikhovsky2015-01-221-0/+10
| | | | | | | | | Type MVT::i1 became legal in KNL, but a store operation can't be narrowed to this type, since the size of VT (1 bit) is not equal to its actual store size (8 bits). Added a test provided by David (dag@cray.com). llvm-svn: 226805
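The size mismatch described above can be sketched in a few lines. This is an illustration only, not LLVM code: the function names are hypothetical, and the rule that a store size is the bit width rounded up to whole bytes is assumed from the "1 bit vs. 8 bits" wording of the message.

```python
def store_size_in_bits(type_bits):
    # Round the type width up to a whole number of bytes (assumed rule).
    return ((type_bits + 7) // 8) * 8


def can_narrow_store_to(type_bits):
    # Narrowing is only safe when the type fills its store size exactly.
    return store_size_in_bits(type_bits) == type_bits
```

Under these assumptions a 1-bit type occupies 8 bits when stored, so it is rejected as a narrowing target, while 8-bit and 32-bit types pass.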
* SEH: Finish writing the catch-all test caseReid Kleckner2015-01-221-1/+5
| | | | llvm-svn: 226768
* Win64 SEH: Emit the constant 1 for catch-all into xdataReid Kleckner2015-01-221-0/+29
| | | | llvm-svn: 226767
* [X86][SSE] Missing SSE/AVX1 memory folding integer instructionsSimon Pilgrim2015-01-214-219/+647
| | | | | | | | | | Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1. The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests. Differential Revision: http://reviews.llvm.org/D7094 llvm-svn: 226745
* DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))Tim Northover2015-01-213-16/+57
| | | | | | | It can help with argument juggling on some targets, and is generally a good idea. llvm-svn: 226740
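The fold relies on AND distributing over OR: (X & M) | (X & N) equals X & (M | N). The sketch below checks that identity exhaustively over small bit widths. The function names are hypothetical and it does not model the SelectionDAG implementation.

```python
from itertools import product


def unfolded(x, m, n):
    # Original form: two ANDs feeding an OR.
    return (x & m) | (x & n)


def folded(x, m, n):
    # Folded form: a single AND against the merged mask.
    return x & (m | n)


def identity_holds(bits=4):
    # Exhaustively compare both forms for every bits-wide X, M and N.
    values = range(1 << bits)
    return all(unfolded(x, m, n) == folded(x, m, n)
               for x, m, n in product(values, repeat=3))
```

The folded form needs one AND instead of two, and when M and N are constants the merged mask can be computed at compile time.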
* R600: Add checks for urem/srem by a constantMatt Arsenault2015-01-212-1/+29
| | | | | | | Make sure this uses the faster expansion using magic constants to avoid the full division path. llvm-svn: 226734
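As a rough illustration of the "magic constant" expansion, here is unsigned 32-bit remainder by 10 computed with a multiply and a shift instead of a division. The divisor 10 and the helper names are assumptions chosen for the example; the commit itself only adds checks that the backend takes this faster path.

```python
MASK32 = 0xFFFFFFFF
MAGIC_DIV10 = 0xCCCCCCCD  # ceil(2**35 / 10)
SHIFT_DIV10 = 35


def udiv10(x):
    # Quotient via multiply-high and shift; no division instruction needed.
    return ((x & MASK32) * MAGIC_DIV10) >> SHIFT_DIV10


def urem10(x):
    # Remainder is recovered as x - (x / 10) * 10.
    x &= MASK32
    return x - udiv10(x) * 10
```

The remainder costs one extra multiply and subtract on top of the quotient, which is still cheaper than a full division sequence on targets without a hardware divider.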
* [X86][SSE] Added support for SSE3 lane duplication shuffle instructionsSimon Pilgrim2015-01-2112-503/+493
| | | | | | | | | | | | This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. Their main benefit is that they keep many single-source shuffles from needing (pre-AVX) dual-source instructions such as SHUFPD/SHUFPS, which cause extra moves and prevent load folds. Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify these instructions as unary shuffles. Also adds a missing tablegen pattern for MOVDDUP. Differential Revision: http://reviews.llvm.org/D7042 llvm-svn: 226716
* R600: Add missing tests for i64 sremMatt Arsenault2015-01-211-0/+48
| | | | llvm-svn: 226713
* Fix load-store optimizer on thumbv4tJonathan Roelofs2015-01-211-0/+55
| | | | | | | | | | Thumbv4t does not have lo->lo copies other than MOVS, and that can't be predicated. So emit MOVS when needed and bail if there's a predicate. http://reviews.llvm.org/D6592 llvm-svn: 226711
* R600/SI: Custom lower froundMatt Arsenault2015-01-212-27/+124
| | | | | | | | | This fixes it for SI. It also removes the pattern used previously for Evergreen for f32. I'm not sure if the new R600 output is better or not, but it uses 1 fewer instructions if BFI is available. llvm-svn: 226682
* [Hexagon] Converting multiply and accumulate with immediate intrinsics to patterns.Colin LeMahieu2015-01-211-0/+120
| | | | llvm-svn: 226681
* [X86] Declare SSE4.1/AVX2 vector extloads covered by PMOV[SZ]X legal.Ahmed Bougacha2015-01-213-6/+4
| | | | | | | | | | | | | | | | | | Now that we can fully specify extload legality, we can declare them legal for the PMOVSX/PMOVZX instructions. This for instance enables a DAGCombine to fire on code such as (and (<zextload-equivalent> ...), <redundant mask>) to turn it into: (zextload ...) as seen in the testcase changes. There is one regression, in widen_load-2.ll: we're no longer able to do store-to-load forwarding with illegal extload memory types. This will be addressed separately. Differential Revision: http://reviews.llvm.org/D6533 llvm-svn: 226676
* Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))"Tim Northover2015-01-213-57/+16
| | | | | | | | It hadn't gone through review yet, but was still on my local copy. This reverts commit r226663 llvm-svn: 226665
* AArch64: add backend option to reserve x18 (platform register)Tim Northover2015-01-211-7/+8
| | | | | | | | | AAPCS64 says that it's up to the platform to specify whether x18 is reserved, and a first step in that direction is to add a flag controlling it. From: Andrew Turner <andrew@fubar.geek.nz> llvm-svn: 226664
* DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))Tim Northover2015-01-213-16/+57
| | | | llvm-svn: 226663
* [x32] Fast ISel should use LEA64_32r instead of LEA32r to adjust addresses in x32 mode.Michael Kuperstein2015-01-211-0/+10
| | | | llvm-svn: 226661
* [X86][AVX] Simplified diff between AVX1 and SSE42 fp stack folding tests. NFC.Simon Pilgrim2015-01-211-191/+191
| | | | | | Changed the AVX1 tests' register spill tail call to return an xmm like the SSE42 version - makes doing diffs between them a lot easier without affecting the spills themselves. llvm-svn: 226623
* [X86][SSE] Added SSE/AVX1 integer stack folding tests.Simon Pilgrim2015-01-202-0/+1816
| | | | | | Some folding patterns + tests are missing (marked as TODO) - these will be added in a future patch for review. llvm-svn: 226622
* [X86][SSE] Added SSE fp stack folding tests.Simon Pilgrim2015-01-201-0/+1082
| | | | | | Some folding patterns + tests are missing (marked as TODO) - these will be added in a future patch for review. llvm-svn: 226621
* [X86][AVX] Renamed AVX1 fp stack folding tests. NFC.Simon Pilgrim2015-01-201-457/+457
| | | | | | The SSE42 version of the AVX1 float stack folding tests will be added shortly, this renames the AVX1 file so that the files will be near each other in a directory listing to help ensure they are kept in sync. llvm-svn: 226620
* [Hexagon] Adding intrinsics for doubleword ALU operations.Colin LeMahieu2015-01-201-0/+34
| | | | llvm-svn: 226606
* [Hexagon] Removing unnecessary clutter in intrinsic tests.Colin LeMahieu2015-01-201-18/+9
| | | | llvm-svn: 226602
* Prevent binary-tree deterioration in sparse switch statements.Daniel Jasper2015-01-201-0/+58
| | | | | | | | | | | | | This addresses part of llvm.org/PR22262. Specifically, it prevents considering the densities of sub-ranges that have fewer than TLI.getMinimumJumpTableEntries() elements. Those densities won't help jump tables. This is not a complete solution but works around the most pressing issue. Review: http://reviews.llvm.org/D7070 llvm-svn: 226600
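The rule described above can be sketched as follows. This is a simplified, hypothetical model rather than the SelectionDAG code: `min_jump_table_entries` stands in for the value returned by TLI.getMinimumJumpTableEntries(), and density is taken to be the fraction of the value span covered by distinct case values.

```python
def density(case_values):
    # Fraction of the [min, max] span covered by distinct case values.
    span = max(case_values) - min(case_values) + 1
    return len(case_values) / span


def considered_density(case_values, min_jump_table_entries):
    # Sub-ranges too small to become a jump table contribute no density.
    if len(case_values) < min_jump_table_entries:
        return None
    return density(case_values)
```

With an assumed minimum of 4 entries, a three-case sub-range is ignored even though it is perfectly dense, so it can no longer skew where a sparse switch is split.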
* [GC] Verify-pass void vararg functions in gc.statepointRamkumar Ramachandra2015-01-201-0/+14
| | | | | | | | With the appropriate Verifier changes, extracting the result out of a statepoint wrapping a vararg function crashes. However, a void vararg function works fine: commit this first step. Differential Revision: http://reviews.llvm.org/D7071 llvm-svn: 226599
* R600/SI: Fix simple-loop.ll testTom Stellard2015-01-201-1/+0
| | | | llvm-svn: 226596
* R600/SI: Add kill flag when copying scratch offset to a registerTom Stellard2015-01-201-2/+7
| | | | | | | This allows us to re-use the same register for the scratch offset when accessing large private arrays. llvm-svn: 226585
* R600/SI: Don't store scratch buffer frame index in MUBUF offset fieldTom Stellard2015-01-201-0/+81
| | | | | | | | We don't have a good way of legalizing this if the frame index offset is more than 12 bits, which is the size of MUBUF's offset field, so now we store the frame index in the vaddr field. llvm-svn: 226584
* [mips] Add registers and ALL check prefix to octeon test case.Kai Nacke2015-01-201-46/+36
| | | | | | | | No functional change. Reviewed by D. Sanders llvm-svn: 226574
* [mips] Add octeon branch instructions bbit0/bbit032/bbit1/bbit132Kai Nacke2015-01-201-0/+72
| | | | | | | | | This commit adds the octeon branch instructions bbit0/bbit032/bbit1/bbit132. It also includes patterns for instruction selection and test cases. Reviewed by D. Sanders llvm-svn: 226573
* [X86][AVX] Missing AVX1 memory folding float instructionsSimon Pilgrim2015-01-192-194/+457
| | | | | | | | | | Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for, including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing). Now that scalar folding is working it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll) - this test appears to make no sense as it's trying to ensure that a scalar subtraction isn't folded as it 'would zero the top elts of the loaded vector' - this test just appears to be wrong to me. Differential Revision: http://reviews.llvm.org/D7055 llvm-svn: 226513
* [Hexagon] Updating muxir/ri/ii intrinsics. Setting predicate registers as compatible with i32 rather than doing custom type conversion.Colin LeMahieu2015-01-191-0/+36
| | | | llvm-svn: 226500
* [Hexagon] Converting intrinsics combine imm/imm, simple shifts and extends.Colin LeMahieu2015-01-192-0/+93
| | | | llvm-svn: 226483
* [Hexagon] Converting remaining ALU32/ALU intrinsics.Colin LeMahieu2015-01-191-0/+22
| | | | llvm-svn: 226480
* [Hexagon] Converting ALU32/ALU intrinsics to new patterns.Colin LeMahieu2015-01-191-0/+119
| | | | llvm-svn: 226478
* [AArch64] Implement GHC calling conventionGreg Fitzgerald2015-01-191-0/+89
| | | | | | | | | | Original patch by Luke Iannini. Minor improvements and test added by Erik de Castro Lopo. Differential Revision: http://reviews.llvm.org/D6877 From: Erik de Castro Lopo <erikd@mega-nerd.com> llvm-svn: 226473
* [Hexagon] Converting halfword to double accumulating multiply intrinsics.Colin LeMahieu2015-01-191-0/+390
| | | | llvm-svn: 226472
* Bring r226038 back.Rafael Espindola2015-01-197-46/+2
| | | | | | | | | | | | | | | | No change in this commit, but clang was changed to also produce trivial comdats when needed. Original message: Don't create new comdats in CodeGen. This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848. llvm-svn: 226467
* [MIScheduler] Slightly better handling of constrainLocalCopy when both source and dest are localMichael Kuperstein2015-01-193-17/+54
| | | | | | | | | | This fixes PR21792. Differential Revision: http://reviews.llvm.org/D6823 llvm-svn: 226433
* [PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2Hal Finkel2015-01-191-2/+2
| | | | | | | | | | | Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call instructions in order to represent the dependency on the TOC base pointer value. Restricting this to ELF V2, however, does not seem to make sense: calls under ELF V1 have the same dependence, and indirect calls have an r2 dependence just as direct ones. Make sure the dependence is noted for all calls under both ELF V1 and ELF V2. llvm-svn: 226432
* R600: Remove redundant testMatt Arsenault2015-01-183-15/+2
| | | | | | This is already covered in ftrunc.ll llvm-svn: 226412
* [X86][SSE] Added scalar min/max folding tests. NFC.Simon Pilgrim2015-01-181-4/+40
| | | | llvm-svn: 226406
* [X86][SSE] Added float extract and xmm extract/insert stack folding tests. NFC.Simon Pilgrim2015-01-181-3/+26
| | | | llvm-svn: 226405
* [X86][SSE] Added scalar conversion stack folding tests. NFC.Simon Pilgrim2015-01-181-12/+156
| | | | llvm-svn: 226404
* AVX1 stack folding tests. NFC.Simon Pilgrim2015-01-181-31/+1246
| | | | | | | | Began adding more exhaustive tests - all floating point instructions should now be either tested or have placeholders. We do seem to have a number of missing instructions; I will add a patch for review once the remaining working instructions are added. I'll then move on to SSE tests and then the integer instructions. llvm-svn: 226400
* [PowerPC] Initial PPC64 calling-convention changes for fastccHal Finkel2015-01-182-0/+596
| | | | | | | | | | | | | | | | | The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is designed to work with both prototyped and non-prototyped/varargs functions. As a result, GPRs and stack space are allocated for every argument, even those that are passed in floating-point or vector registers. GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that do not have their address taken) to use the 'fast' calling convention. When functions are using the 'fast' calling convention, don't allocate GPRs for arguments passed in other types of registers, and don't allocate stack space for arguments passed in registers. Other changes for the fast calling convention may be added in the future. llvm-svn: 226399