summaryrefslogtreecommitdiffstats
path: root/llvm/test
Commit message (Collapse)AuthorAgeFilesLines
* Revert patch series introducing the DAG combine to match a load-by-bytesChandler Carruth2016-12-163-1193/+0
| | | | | | | | | | | | | | | | | | | | | | | | idiom. r289538: Match load by bytes idiom and fold it into a single load r289540: Fix a buildbot failure introduced by r289538 r289545: Use more detailed assertion messages in the code ... r289646: Add a couple of assertions to the load combine code ... This DAG combine has a bad crash in it that is quite hard to trigger sadly -- it relies on sneaking code with UB through the SDAG build and into this particular combine. I've responded to the original commit with a test case that reproduces it. However, the code also has other problems that will require substantial changes to address and so I'm going ahead and reverting it for now. This should unblock us and perhaps others that are hitting the crash in the wild and will let a fresh patch with updated approach come in cleanly afterward. Sorry for any trouble or disruption! llvm-svn: 289916
* Revert "[IR] Remove the DIExpression field from DIGlobalVariable."Adrian Prantl2016-12-16163-561/+414
| | | | | | This reverts commit 289902 while investigating bot berakage. llvm-svn: 289906
* [IR] Remove the DIExpression field from DIGlobalVariable.Adrian Prantl2016-12-16163-414/+561
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariables holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGLobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289902
* [PPC] corrections in two testcasesEhsan Amiri2016-12-161-14/+14
| | | | | | | | | Removing sensitivity to scheduling (by using CHECK-DAG instead of CHECK) and some other minor corrections. In preparation to commit Power9 processor model. llvm-svn: 289900
* IPO: Introduce ThinLTOBitcodeWriter pass.Peter Collingbourne2016-12-166-0/+159
| | | | | | | | | | | | | | This pass prepares a module containing type metadata for ThinLTO by splitting it into regular and thin LTO parts if possible, and writing both parts to a multi-module bitcode file. Modules that do not contain type metadata are written unmodified as a single module. All globals with type metadata are added to the regular LTO module, and the rest are added to the thin LTO module. Differential Revision: https://reviews.llvm.org/D27324 llvm-svn: 289899
* [ThinLTO] Thin link efficiency improvement: don't re-export globals (NFC)Teresa Johnson2016-12-152-0/+36
| | | | | | | | | | | | | | | | | Summary: We were reinvoking exportGlobalInModule numerous times redundantly. No need to re-export globals referenced by a global that was already imported from its module. This resulted in a large speedup in the thin link for a big application, particularly when importing aggressiveness was cranked up. Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27687 llvm-svn: 289896
* [SimplifyLibCalls] Add a test to make sure we lower fls(0) correctly.Davide Italiano2016-12-151-0/+9
| | | | llvm-svn: 289895
* [SimplifyLibCalls] Lower fls() to llvm.ctlz().Davide Italiano2016-12-151-0/+48
| | | | | | Differential Revision: https://reviews.llvm.org/D14590 llvm-svn: 289894
* DebugInfo: Make a Generic test case actually generic (remove datalayout/triple)David Blaikie2016-12-151-7/+0
| | | | llvm-svn: 289893
* [IRTranslator] Merge the entry and ABI lowering blocks.Quentin Colombet2016-12-153-39/+7
| | | | | | | | | | | | | | | The IRTranslator uses an additional block before the LLVM-IR entry block to perform all the ABI lowering and the constant hoisting. Thus, this block is the actual entry block and it falls through the LLVM-IR entry block. However, with such representation, we end up with two basic blocks that are not maximal. Therefore, this patch adds a bit of canonicalization by merging both the LLVM-IR entry block and the ABI lowering/constants hoisting into one block, making the resulting block more likely to be maximal (indeed the LLVM-IR entry block might not have been maximal). llvm-svn: 289891
* DebugInfo: Emit ranges for functions with DISubprograms but lacking ↵David Blaikie2016-12-152-1/+43
| | | | | | | | | locations on any instructions This seems more consistent, and helps tidy up/simplify some other code in this change. llvm-svn: 289889
* Don't combine splats with other shuffles.Eli Friedman2016-12-152-29/+24
| | | | | | | | | | | We sometimes end up creating shuffles which are worse than the obvious translation of the IR. Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 . Differential Revision: https://reviews.llvm.org/D27793 llvm-svn: 289882
* Fix R_AARCH64_MOVW_UABS_G3 relocationYichao Yu2016-12-152-0/+67
| | | | | | | | | | | | Summary: The relocation is missing mask so an address that has non-zero bits in 47:43 may overwrite the register number. (Frequently shows up as target register changed to `xzr`....) Reviewers: t.p.northover, lhames Subscribers: davide, aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D27609 llvm-svn: 289880
* AMDGPU: Select branch on undef to uniform scc branchMatt Arsenault2016-12-156-13/+14
| | | | llvm-svn: 289877
* [gold] Add datalayout to test where it was missingTeresa Johnson2016-12-152-0/+6
| | | | | | | | | Needed due to change to require datalayout (r289719). Found this in my own testing, maybe there aren't any bots using a v1.12 gold yet. llvm-svn: 289876
* Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.Eli Friedman2016-12-154-173/+118
| | | | | | | | | | | | | Targets can't handle this case well in general; we often transform a shuffle of two cheap BUILD_VECTORs to element-by-element insertion, which is very inefficient. Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially fixes https://llvm.org/bugs/show_bug.cgi?id=31301. Differential Revision: https://reviews.llvm.org/D27787 llvm-svn: 289874
* [Verifier] Allow TBAA metadata on atomicrmw and atomiccmpxchgSanjoy Das2016-12-151-0/+22
| | | | | | | | | This used to be allowed before r289402 by default (before r289402 you could have TBAA metadata on any instruction), and while I'm not sure that it helps, it does sound reasonable enough to not fail the verifier and we have out-of-tree users who use this. llvm-svn: 289872
* [PPC] Use CHECK-DAG instead of CHECK in the testcaseEhsan Amiri2016-12-151-15/+15
| | | | | | | | | This test is currently sensitive to scheduling. Using CHECK-DAG allows us to preserve the main purpose of the test and remove this sensivity. In preparation to commit Power9 processor model. llvm-svn: 289869
* AMDGPU: Fix asserting on returned tail callsMatt Arsenault2016-12-151-0/+14
| | | | llvm-svn: 289868
* AMDGPU: Assembler support for vintrp instructionsMatt Arsenault2016-12-152-0/+150
| | | | llvm-svn: 289866
* [LV] Enable vectorization of loops with conditional stores by defaultMatthew Simpson2016-12-155-6/+6
| | | | | | | | | This patch sets the default value of the "-enable-cond-stores-vec" command line option to "true". Differential Revision: https://reviews.llvm.org/D27814 llvm-svn: 289863
* LibDriver: Allow resource files to be archive members.Peter Collingbourne2016-12-153-1/+4
| | | | | | | | It seems pointless to add a resource to an archive because it won't have any symbols to link against (and link.exe doesn't have an equivalent of --whole-archive), but lib.exe allows it for some reason. llvm-svn: 289859
* [InstCombine] add folds for icmp (smin X, Y), XSanjay Patel2016-12-151-36/+12
| | | | | | | | | | | | | | Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns. This patch won't solve the example that was attached to that thread, so something else still needs fixing. The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that we want to fold to already exists, but sometimes it's the swapped form of what we want. Corresponding changes for smax/umin/umax to follow. Differential Revision: https://reviews.llvm.org/D27531 llvm-svn: 289855
* [x86] use a single shufps for 256-bit vectors when it can save instructionsSanjay Patel2016-12-152-41/+18
| | | | | | | | | | | This is the 256-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289846
* [AArch64] Guard Misaligned 128-bit store penalty by subtarget featureMatthew Simpson2016-12-151-6/+12
| | | | | | | | | This patch checks that the SlowMisaligned128Store subtarget feature is set when penalizing such stores in getMemoryOpCost. Differential Revision: https://reviews.llvm.org/D27677 llvm-svn: 289845
* [ThinLTO] Ensure callees get hot threshold when first seen on cold pathTeresa Johnson2016-12-152-0/+95
| | | | | | | | | | | | | | | | | This is split out from D27696, since it turned out to be a bug fix and not part of the NFC efficiency change. Keep the same adjusted (possibly decayed) threshold in both the worklist and the ImportList. Otherwise if we encountered it first along a cold path, the callee would be added to the worklist with a lower decayed threshold than when it is later encountered along a hot path. But the logic uses the threshold recorded in the ImportList entry to check if we should re-add it, and without this patch the threshold recorded there is the same along both paths so we don't re-add it. Using the same possibly decayed threshold in the ImportList ensures we re-add it later with the higher non-decayed hot path threshold. llvm-svn: 289843
* [x86] use a single shufps when it can save instructionsSanjay Patel2016-12-1522-1187/+770
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 My motivating case looks like this: - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2] - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3] - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7] + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win. So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining. Differential Revision: https://reviews.llvm.org/D27692 llvm-svn: 289837
* [X86][SSE] Fix domains for scalar store instructionsSimon Pilgrim2016-12-153-7/+7
| | | | | | As discussed on D27692 llvm-svn: 289834
* Revert "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of ↵Robert Lougher2016-12-151-70/+0
| | | | | | | | common inst" Reverting as it is causing buildbot failures (address sanitizer). llvm-svn: 289833
* [lanai] Simplify small section check in LowerGlobalAddress and treat ldata ↵Jacques Pienaar2016-12-151-0/+14
| | | | | | | | sections specially. Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation). llvm-svn: 289832
* [SimplifyCFG] In sinkLastInstruction correctly set debugloc of "common" inst Robert Lougher2016-12-151-0/+70
| | | | | | | | | | | Simplify CFG will try to sink the last instruction in a series of basic blocks, creating a "common" instruction in the successor block (sinkLastInstruction). When it does this, the debug location of the single instruction should be the merged debug locations of the commoned instructions. Differential Revision: https://reviews.llvm.org/D27590 llvm-svn: 289828
* [X86][SSE] Fix domains for VZEXT_LOAD type instructionsSimon Pilgrim2016-12-1547-202/+190
| | | | | | | | Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825
* Fix for regression after Global Load Scalarization patchAlexander Timofeev2016-12-151-0/+11
| | | | llvm-svn: 289822
* [CostModel][X86] Updated reverse shuffle costsSimon Pilgrim2016-12-151-32/+56
| | | | llvm-svn: 289819
* [TEST] Initial commit of tests for minmax horizontal reductions.Alexey Bataev2016-12-151-0/+1725
| | | | llvm-svn: 289817
* Revert "[TESTS] Initial commit of tests, by Andrew Tischenko"Alexey Bataev2016-12-152-350/+0
| | | | | | This reverts commit ee709f8988653a0334fbf100cdbbdd83a3933347. llvm-svn: 289814
* [InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmpEhsan Amiri2016-12-151-0/+204
| | | | | | | | | | | | | | | | | A number of new patterns for simplifying and/xor of icmp: (icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true: 1- (%x = and %a, %mask) and (%y = and %b, %mask) 2- %mask is a power of 2. (icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true: 1- (%x = and %a, %mask1) and (%y = and %b, %mask2) 2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t. For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8 violates condition (2) above. So this optimization cannot be applied. llvm-svn: 289813
* [CostModel] Fix long standing bug with reverse shuffle mask detectionSimon Pilgrim2016-12-151-0/+31
| | | | | | Incorrect 'undef' mask index matching meant that broadcast shuffles could be detected as reverse shuffles llvm-svn: 289811
* [TESTS] Initial commit of tests, by Andrew TischenkoAlexey Bataev2016-12-152-0/+350
| | | | llvm-svn: 289807
* [Power9] Allow AnyExt immediates for XXSPLTIBNemanja Ivanovic2016-12-151-0/+9
| | | | | | | | | | In some situations, the BUILD_VECTOR node that builds a v18i8 vector by a splat of an i8 constant will end up with signed 8-bit values and other situations, it'll end up with unsigned ones. Handle both situations. Fixes PR31340. llvm-svn: 289804
* [AVR] Support floats in the instrumention passDylan McKay2016-12-151-4/+21
| | | | | | This also refactors some common code into the 'GetTypeName' method. llvm-svn: 289803
* [CostModel][X86] Add tests for reverse shuffle costsSimon Pilgrim2016-12-151-0/+143
| | | | llvm-svn: 289800
* Add missing triple target for numeric section flag testPrakhar Bahuguna2016-12-151-1/+1
| | | | llvm-svn: 289798
* [Thumb] Teach ISel how to lower compares of AND bitmasks efficientlySjoerd Meijer2016-12-157-19/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is essentially a recommit of r285893, but with a correctness fix. The problem of the original commit was that this: bic r5, r7, #31 cbz r5, .LBB2_10 got rewritten into: lsrs r5, r7, #5 beq .LBB2_10 The result in destination register r5 is not the same and this is incorrect when r5 is not dead. So this fix includes checking the uses of the AND destination register. And also, compared to the original commit, some regression tests didn't need changing anymore because of this extra check. For completeness, this was the original commit message: For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. Differential Revision: https://reviews.llvm.org/D27761 llvm-svn: 289794
* Allow ELF section flags to be specified numericallyPrakhar Bahuguna2016-12-151-0/+37
| | | | | | | | | | | | | | | | | Summary: GAS already allows flags for sections to be specified directly as a numeric value. This functionality is particularly useful for setting processor or application-specific values that may not be directly supported or understood by LLVM. This patch allows LLVM to use numeric section flag values verbatim if specified by the assembly file. Reviewers: grosbach, rafael, t.p.northover, rengolin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27451 llvm-svn: 289785
* [ARM] Implement execute-only support in CodeGenPrakhar Bahuguna2016-12-155-0/+305
| | | | | | | | | | | | | | | | | | | | This implements execute-only support for ARM code generation, which prevents the compiler from generating data accesses to code sections. The following changes are involved: * Add the CodeGen option "-arm-execute-only" to the ARM code generator. * Add the clang flag "-mexecute-only" as well as the GCC-compatible alias "-mpure-code" to enable this option. * When enabled, literal pools are replaced with MOVW/MOVT instructions, with VMOV used in addition for floating-point literals. As the MOVT instruction is required, execute-only support is only available in Thumb mode for targets supporting ARMv8-M baseline or Thumb2. * Jump tables are placed in data sections when in execute-only mode. * The execute-only text section is assigned section ID 0, and is marked as unreadable with the SHF_ARM_PURECODE flag with symbol 'y'. This also overrides selection of ELF sections for globals. llvm-svn: 289784
* Add missing -mtriple to MIR test caseSanjoy Das2016-12-151-1/+1
| | | | llvm-svn: 289779
* [MachineBlockPlacement] Don't make blocks "uneditable"Sanjoy Das2016-12-151-0/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This fixes an issue with MachineBlockPlacement due to a badly timed call to `analyzeBranch` with `AllowModify` set to true. The timeline is as follows: 1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls `TailDup.shouldTailDuplicate` on its argument, which in turn calls `analyzeBranch` with `AllowModify` set to true. 2. This `analyzeBranch` call edits the terminator sequence of the block based on the physical layout of the machine function, turning an unanalyzable non-fallthrough block to a unanalyzable fallthrough block. Normally MBP bails out of rearranging such blocks, but this block was unanalyzable non-fallthrough (and thus rearrangeable) the first time MBP looked at it, and so it goes ahead and decides where it should be placed in the function. 3. When placing this block MBP fails to analyze and thus update the block in keeping with the new physical layout. Concretely, before (1) we have something like: ``` LBL0: < unknown terminator op that may branch to LBL1 > jmp LBL1 LBL1: ... A LBL2: ... B ``` In (2), analyze branch simplifies this to ``` LBL0: < unknown terminator op that may branch to LBL2 > ;; jmp LBL1 <- redundant jump removed LBL1: ... A LBL2: ... B ``` In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2 after the first block since that is profitable. ``` LBL0: < unknown terminator op that may branch to LBL2 > ;; jmp LBL1 <- redundant jump LBL2: ... B LBL1: ... A ``` and the program now has incorrect behavior (we no longer fall-through from `LBL0` to `LBL1`) because MBP can no longer edit LBL0. There are several possible solutions, but I went with removing the teeth off of the `analyzeBranch` calls in TailDuplicator. That makes thinking about the result of these calls easier, and breaks nothing in the lit test suite. I've also added some bookkeeping to the MachineBlockPlacement pass and used that to write an assert that would have caught this. Reviewers: chandlerc, gberry, MatzeB, iteratee Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27783 llvm-svn: 289764
* [AVX-512][InstCombine] Add masked scalar FMA intrinsics to ↵Craig Topper2016-12-151-0/+390
| | | | | | SimplifyDemandedVectorElts. llvm-svn: 289759
* Remove the AssumptionCacheHal Finkel2016-12-152-23/+1
| | | | | | | | | After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
OpenPOWER on IntegriCloud