summaryrefslogtreecommitdiffstats
path: root/llvm/test
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Assembler support for vintrp instructionsMatt Arsenault2016-12-152-0/+150
| | | | llvm-svn: 289866
* [LV] Enable vectorization of loops with conditional stores by defaultMatthew Simpson2016-12-155-6/+6
| | | | | | | | | This patch sets the default value of the "-enable-cond-stores-vec" command line option to "true". Differential Revision: https://reviews.llvm.org/D27814 llvm-svn: 289863
* LibDriver: Allow resource files to be archive members.Peter Collingbourne2016-12-153-1/+4
| | | | | | | | It seems pointless to add a resource to an archive because it won't have any symbols to link against (and link.exe doesn't have an equivalent of --whole-archive), but lib.exe allows it for some reason. llvm-svn: 289859
* [InstCombine] add folds for icmp (smin X, Y), XSanjay Patel2016-12-151-36/+12
| | | | | | | | | | | | | | Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns. This patch won't solve the example that was attached to that thread, so something else still needs fixing. The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that we want to fold to already exists, but sometimes it's the swapped form of what we want. Corresponding changes for smax/umin/umax to follow. Differential Revision: https://reviews.llvm.org/D27531 llvm-svn: 289855
* [x86] use a single shufps for 256-bit vectors when it can save instructionsSanjay Patel2016-12-152-41/+18
| | | | | | | | | | | This is the 256-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289846
* [AArch64] Guard Misaligned 128-bit store penalty by subtarget featureMatthew Simpson2016-12-151-6/+12
| | | | | | | | | This patch checks that the SlowMisaligned128Store subtarget feature is set when penalizing such stores in getMemoryOpCost. Differential Revision: https://reviews.llvm.org/D27677 llvm-svn: 289845
* [ThinLTO] Ensure callees get hot threshold when first seen on cold pathTeresa Johnson2016-12-152-0/+95
| | | | | | | | | | | | | | | | | This is split out from D27696, since it turned out to be a bug fix and not part of the NFC efficiency change. Keep the same adjusted (possibly decayed) threshold in both the worklist and the ImportList. Otherwise if we encountered it first along a cold path, the callee would be added to the worklist with a lower decayed threshold than when it is later encountered along a hot path. But the logic uses the threshold recorded in the ImportList entry to check if we should re-add it, and without this patch the threshold recorded there is the same along both paths so we don't re-add it. Using the same possibly decayed threshold in the ImportList ensures we re-add it later with the higher non-decayed hot path threshold. llvm-svn: 289843
* [x86] use a single shufps when it can save instructionsSanjay Patel2016-12-1522-1187/+770
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 My motivating case looks like this: - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2] - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3] - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7] + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win. So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining. Differential Revision: https://reviews.llvm.org/D27692 llvm-svn: 289837
* [X86][SSE] Fix domains for scalar store instructionsSimon Pilgrim2016-12-153-7/+7
| | | | | | As discussed on D27692 llvm-svn: 289834
* Revert "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of ↵Robert Lougher2016-12-151-70/+0
| | | | | | | | common inst" Reverting as it is causing buildbot failures (address sanitizer). llvm-svn: 289833
* [lanai] Simplify small section check in LowerGlobalAddress and treat ldata ↵Jacques Pienaar2016-12-151-0/+14
| | | | | | | | sections specially. Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation). llvm-svn: 289832
* [SimplifyCFG] In sinkLastInstruction correctly set debugloc of "common" inst Robert Lougher2016-12-151-0/+70
| | | | | | | | | | | Simplify CFG will try to sink the last instruction in a series of basic blocks, creating a "common" instruction in the successor block (sinkLastInstruction). When it does this, the debug location of the single instruction should be the merged debug locations of the commoned instructions. Differential Revision: https://reviews.llvm.org/D27590 llvm-svn: 289828
* [X86][SSE] Fix domains for VZEXT_LOAD type instructionsSimon Pilgrim2016-12-1547-202/+190
| | | | | | | | Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825
* Fix for regression after Global Load Scalarization patchAlexander Timofeev2016-12-151-0/+11
| | | | llvm-svn: 289822
* [CostModel][X86] Updated reverse shuffle costsSimon Pilgrim2016-12-151-32/+56
| | | | llvm-svn: 289819
* [TEST] Initial commit of tests for minmax horizontal reductions.Alexey Bataev2016-12-151-0/+1725
| | | | llvm-svn: 289817
* Revert "[TESTS] Initial commit of tests, by Andrew Tischenko"Alexey Bataev2016-12-152-350/+0
| | | | | | This reverts commit ee709f8988653a0334fbf100cdbbdd83a3933347. llvm-svn: 289814
* [InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmpEhsan Amiri2016-12-151-0/+204
| | | | | | | | | | | | | | | | | A number of new patterns for simplifying and/xor of icmp: (icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true: 1- (%x = and %a, %mask) and (%y = and %b, %mask) 2- %mask is a power of 2. (icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true: 1- (%x = and %a, %mask1) and (%y = and %b, %mask2) 2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t. For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8 violates condition (2) above. So this optimization cannot be applied. llvm-svn: 289813
* [CostModel] Fix long standing bug with reverse shuffle mask detectionSimon Pilgrim2016-12-151-0/+31
| | | | | | Incorrect 'undef' mask index matching meant that broadcast shuffles could be detected as reverse shuffles llvm-svn: 289811
* [TESTS] Initial commit of tests, by Andrew TischenkoAlexey Bataev2016-12-152-0/+350
| | | | llvm-svn: 289807
* [Power9] Allow AnyExt immediates for XXSPLTIBNemanja Ivanovic2016-12-151-0/+9
| | | | | | | | | | In some situations, the BUILD_VECTOR node that builds a v18i8 vector by a splat of an i8 constant will end up with signed 8-bit values and other situations, it'll end up with unsigned ones. Handle both situations. Fixes PR31340. llvm-svn: 289804
* [AVR] Support floats in the instrumention passDylan McKay2016-12-151-4/+21
| | | | | | This also refactors some common code into the 'GetTypeName' method. llvm-svn: 289803
* [CostModel][X86] Add tests for reverse shuffle costsSimon Pilgrim2016-12-151-0/+143
| | | | llvm-svn: 289800
* Add missing triple target for numeric section flag testPrakhar Bahuguna2016-12-151-1/+1
| | | | llvm-svn: 289798
* [Thumb] Teach ISel how to lower compares of AND bitmasks efficientlySjoerd Meijer2016-12-157-19/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is essentially a recommit of r285893, but with a correctness fix. The problem of the original commit was that this: bic r5, r7, #31 cbz r5, .LBB2_10 got rewritten into: lsrs r5, r7, #5 beq .LBB2_10 The result in destination register r5 is not the same and this is incorrect when r5 is not dead. So this fix includes checking the uses of the AND destination register. And also, compared to the original commit, some regression tests didn't need changing anymore because of this extra check. For completeness, this was the original commit message: For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. Differential Revision: https://reviews.llvm.org/D27761 llvm-svn: 289794
* Allow ELF section flags to be specified numericallyPrakhar Bahuguna2016-12-151-0/+37
| | | | | | | | | | | | | | | | | Summary: GAS already allows flags for sections to be specified directly as a numeric value. This functionality is particularly useful for setting processor or application-specific values that may not be directly supported or understood by LLVM. This patch allows LLVM to use numeric section flag values verbatim if specified by the assembly file. Reviewers: grosbach, rafael, t.p.northover, rengolin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27451 llvm-svn: 289785
* [ARM] Implement execute-only support in CodeGenPrakhar Bahuguna2016-12-155-0/+305
| | | | | | | | | | | | | | | | | | | | This implements execute-only support for ARM code generation, which prevents the compiler from generating data accesses to code sections. The following changes are involved: * Add the CodeGen option "-arm-execute-only" to the ARM code generator. * Add the clang flag "-mexecute-only" as well as the GCC-compatible alias "-mpure-code" to enable this option. * When enabled, literal pools are replaced with MOVW/MOVT instructions, with VMOV used in addition for floating-point literals. As the MOVT instruction is required, execute-only support is only available in Thumb mode for targets supporting ARMv8-M baseline or Thumb2. * Jump tables are placed in data sections when in execute-only mode. * The execute-only text section is assigned section ID 0, and is marked as unreadable with the SHF_ARM_PURECODE flag with symbol 'y'. This also overrides selection of ELF sections for globals. llvm-svn: 289784
* Add missing -mtriple to MIR test caseSanjoy Das2016-12-151-1/+1
| | | | llvm-svn: 289779
* [MachineBlockPlacement] Don't make blocks "uneditable"Sanjoy Das2016-12-151-0/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This fixes an issue with MachineBlockPlacement due to a badly timed call to `analyzeBranch` with `AllowModify` set to true. The timeline is as follows: 1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls `TailDup.shouldTailDuplicate` on its argument, which in turn calls `analyzeBranch` with `AllowModify` set to true. 2. This `analyzeBranch` call edits the terminator sequence of the block based on the physical layout of the machine function, turning an unanalyzable non-fallthrough block to a unanalyzable fallthrough block. Normally MBP bails out of rearranging such blocks, but this block was unanalyzable non-fallthrough (and thus rearrangeable) the first time MBP looked at it, and so it goes ahead and decides where it should be placed in the function. 3. When placing this block MBP fails to analyze and thus update the block in keeping with the new physical layout. Concretely, before (1) we have something like: ``` LBL0: < unknown terminator op that may branch to LBL1 > jmp LBL1 LBL1: ... A LBL2: ... B ``` In (2), analyze branch simplifies this to ``` LBL0: < unknown terminator op that may branch to LBL2 > ;; jmp LBL1 <- redundant jump removed LBL1: ... A LBL2: ... B ``` In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2 after the first block since that is profitable. ``` LBL0: < unknown terminator op that may branch to LBL2 > ;; jmp LBL1 <- redundant jump LBL2: ... B LBL1: ... A ``` and the program now has incorrect behavior (we no longer fall-through from `LBL0` to `LBL1`) because MBP can no longer edit LBL0. There are several possible solutions, but I went with removing the teeth off of the `analyzeBranch` calls in TailDuplicator. That makes thinking about the result of these calls easier, and breaks nothing in the lit test suite. I've also added some bookkeeping to the MachineBlockPlacement pass and used that to write an assert that would have caught this. Reviewers: chandlerc, gberry, MatzeB, iteratee Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27783 llvm-svn: 289764
* [AVX-512][InstCombine] Add masked scalar FMA intrinsics to ↵Craig Topper2016-12-151-0/+390
| | | | | | SimplifyDemandedVectorElts. llvm-svn: 289759
* Remove the AssumptionCacheHal Finkel2016-12-152-23/+1
| | | | | | | | | After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
* Make processing @llvm.assume more efficient by using operand bundlesHal Finkel2016-12-1511-41/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | There was an efficiency problem with how we processed @llvm.assume in ValueTracking (and other places). The AssumptionCache tracked all of the assumptions in a given function. In order to find assumptions relevant to computing known bits, etc. we searched every assumption in the function. For ValueTracking, that means that we did O(#assumes * #values) work in InstCombine and other passes (with a constant factor that can be quite large because we'd repeat this search at every level of recursion of the analysis). Several of us discussed this situation at the last developers' meeting, and this implements the discussed solution: Make the values that an assume might affect operands of the assume itself. To avoid exposing this detail to frontends and passes that need not worry about it, I've used the new operand-bundle feature to add these extra call "operands" in a way that does not affect the intrinsic's signature. I think this solution is relatively clean. InstCombine adds these extra operands based on what ValueTracking, LVI, etc. will need and then those passes need only search the users of the values under consideration. This should fix the computational-complexity problem. At this point, no passes depend on the AssumptionCache, and so I'll remove that as a follow-up change. Differential Revision: https://reviews.llvm.org/D27259 llvm-svn: 289755
* Add testcases for some shuffle bugs.Eli Friedman2016-12-152-0/+205
| | | | | | | See https://llvm.org/bugs/show_bug.cgi?id=31301 and https://llvm.org/bugs/show_bug.cgi?id=31364 . llvm-svn: 289751
* Fix test/tools/lto/hide-linkonce-odr.ll after r289719Nico Weber2016-12-151-0/+1
| | | | llvm-svn: 289750
* Use PIC relocation model as default for PowerPC64 ELF.Joerg Sonnenberger2016-12-1519-35/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the PowerPC64 code generation for the ELF ABI is already PIC. There are four main exceptions: (1) Constant pointer arrays etc. should in writeable sections. (2) The TOC restoration NOP after a call is needed for all global symbols. While GNU ld has a workaround for questionable GCC self-calls, we trigger the checks for calls from COMDAT sections as they cross input sections and are therefore not considered self-calls. The current decision is questionable and suboptimal, but outside the scope of the change. (3) TLS access can not use the initial-exec model. (4) Jump tables should use relative addresses. Note that the current encoding doesn't work for the large code model, but it is more compact than the default for any non-trivial jump table. Improving this is again beyond the scope of this change. At least (1) and (3) are assumptions made in target-independent code and introducing additional hooks is a bit messy. Testing with clang shows that a -fPIC binary is 600KB smaller than the corresponding -fno-pic build. Separate testing from improved jump table encodings would explain only about 100KB or so. The rest is expected to be a result of more aggressive immediate forming for -fno-pic, where the -fPIC binary just uses TOC entries. This change brings the LLVM output in line with the GCC output, other PPC64 compilers like XLC on AIX are known to produce PIC by default as well. The relocation model can still be provided explicitly, i.e. when using MCJIT. One test case for case (1) is included, other test cases with relocation mode sensitive behavior are wired to static for now. They will be reviewed and adjusted separately. Differential Revision: https://reviews.llvm.org/D26566 llvm-svn: 289743
* [AMDGPU] Fix runtime-metadata.ll test so it doesn't leave an object file in ↵Justin Lebar2016-12-141-1/+1
| | | | | | the source tree. llvm-svn: 289742
* [DAG] allow more select folding for targets that have 'and not' (PR31175)Sanjay Patel2016-12-142-14/+9
| | | | | | | | | | | | | | The original motivation for this patch comes from wanting to canonicalize more IR to selects and also canonicalizing min/max. If we're going to do that, we need more backend fixups to undo select codegen when simpler ops will do. I chose AArch64 for the tests because that shows the difference in the simplest way. This should fix: https://llvm.org/bugs/show_bug.cgi?id=31175 Differential Revision: https://reviews.llvm.org/D27489 llvm-svn: 289738
* [gold] Add datalayout to two tests where it was missing.Davide Italiano2016-12-142-0/+6
| | | | | | Reported by: thakis via chromium bots. llvm-svn: 289737
* Whitespace cleanup in test/CodeGen/NVPTX/annotations.ll.Justin Lebar2016-12-141-4/+0
| | | | llvm-svn: 289730
* [NVPTX] Support .maxnreg annotation.Justin Lebar2016-12-141-3/+13
| | | | | | | | | | Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D27638 llvm-svn: 289729
* LibDriver: Reject inputs that are not COFF objects or bitcode files.Peter Collingbourne2016-12-143-1/+3
| | | | | | | | Fixes PR31372. Differential Revision: https://reviews.llvm.org/D27776 llvm-svn: 289726
* Only sets profile summary when it was not preset.Dehao Chen2016-12-141-0/+1
| | | | | | | | | | | | Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass. Reviewers: eraman, davidxl Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D27733 llvm-svn: 289725
* [LTO] Add the missing datalayout in a test.Davide Italiano2016-12-141-0/+1
| | | | llvm-svn: 289720
* [LTO] Reject modules without datalayout.Davide Italiano2016-12-1487-8/+157
| | | | | | | | | | | Also, udpate the ~60 failing tests in the tree which did not contain a valid datalayout. This fixes PR31123. lld will be updated in a following patch, immediately after this is committed. Differential Revision: https://reviews.llvm.org/D27082 llvm-svn: 289719
* [asan] Don't skip instrumentation of masked load/store unless we've seen a ↵Filipe Cabecinhas2016-12-141-0/+62
| | | | | | | | | | | | full load/store on that pointer. Reviewers: kcc, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27625 llvm-svn: 289718
* [asan] Hook ClInstrumentWrites and ClInstrumentReads to masked operation ↵Filipe Cabecinhas2016-12-141-11/+24
| | | | | | | | | | | | instrumentation. Reviewers: kcc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27548 llvm-svn: 289717
* [ARM] Split 128-bit vectors in BUILD_VECTOR loweringEli Friedman2016-12-144-26/+63
| | | | | | | | | | | | | Given that INSERT_VECTOR_ELT operates on D registers anyway, combining 64-bit vectors into a 128-bit vector is basically free. Therefore, try to split BUILD_VECTOR nodes before giving up and lowering them to a series of INSERT_VECTOR_ELT instructions. Sometimes this allows dramatically better lowerings; see testcases for examples. Inspired by similar code in the x86 backend for AVX. Differential Revision: https://reviews.llvm.org/D27624 llvm-svn: 289706
* [InstCombine] Folding of a compare with RHS const should merge debug locationsRobert Lougher2016-12-141-0/+51
| | | | | | | | | | | | | If all the operands to a phi node are compares that have a RHS constant, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new op should be the merged debug locations of the phi node arguments. Patch 8 of 8 for D26256. Folding of a compare that has a RHS constant. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289704
* [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.Eli Friedman2016-12-142-6/+163
| | | | | | | | | | | | | | | | Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up. Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable). Differential Revision: https://reviews.llvm.org/D27694 llvm-svn: 289703
* [InstCombine] Folding of a binop with RHS const should merge the debug locationsRobert Lougher2016-12-141-0/+49
| | | | | | | | | | | | | If all the operands to a phi node are a binop with a RHS constant, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new op should be the merged debug locations of the phi node arguments. Patch 7 of 8 for D26256. Folding of a binop with RHS constant. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289699
OpenPOWER on IntegriCloud