bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[CostModel][X86] Updated reverse shuffle costs	Simon Pilgrim	2016-12-15	1	-32/+56
\| \| \| \|	llvm-svn: 289819
*	[TEST] Initial commit of tests for minmax horizontal reductions.	Alexey Bataev	2016-12-15	1	-0/+1725
\| \| \| \|	llvm-svn: 289817
*	Revert "[TESTS] Initial commit of tests, by Andrew Tischenko"	Alexey Bataev	2016-12-15	2	-350/+0
\| \| \| \| \| \|	This reverts commit ee709f8988653a0334fbf100cdbbdd83a3933347. llvm-svn: 289814
*	[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp	Ehsan Amiri	2016-12-15	1	-0/+204
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A number of new patterns for simplifying and/xor of icmp: (icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true: 1- (%x = and %a, %mask) and (%y = and %b, %mask) 2- %mask is a power of 2. (icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true: 1- (%x = and %a, %mask1) and (%y = and %b, %mask2) 2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t. For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8 violates condition (2) above. So this optimization cannot be applied. llvm-svn: 289813
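The first fold in the message above can be checked exhaustively for small bit widths. The following Python sketch is illustrative only: the function names are hypothetical and not part of LLVM, and `condition2_holds` simply encodes condition (2) as it is worded in the message, without claiming anything about how the patch implements it.

```python
def is_power_of_two(v: int) -> bool:
    """True if v has exactly one set bit."""
    return v > 0 and (v & (v - 1)) == 0


def xor_fold_holds(mask: int, bits: int = 6) -> bool:
    """Brute-force check of the first fold for all `bits`-wide a and b:
    (x != 0) ^ (y != 0)  ==  (x != y), with x = a & mask and y = b & mask."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            x, y = a & mask, b & mask
            if ((x != 0) ^ (y != 0)) != (x != y):
                return False
    return True


def condition2_holds(mask1: int, mask2: int) -> bool:
    """Condition (2) as worded in the message: with t the smallest set bit
    of mask1, every set bit s of mask2 must satisfy s <= t."""
    if mask1 <= 0 or mask2 <= 0:
        raise ValueError("masks must be positive")
    t = mask1 & -mask1                          # lowest set bit of mask1
    highest_s = 1 << (mask2.bit_length() - 1)   # highest set bit of mask2
    return highest_s <= t
```

With `mask = 8` the fold holds for every input pair. With `mask = 6`, which is not a power of 2, it fails, for example at `%x = 2`, `%y = 4`. `condition2_holds(24, 16)` is false, which matches the example given in the message.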
*	[CostModel] Fix long standing bug with reverse shuffle mask detection	Simon Pilgrim	2016-12-15	1	-0/+31
\| \| \| \| \| \|	Incorrect 'undef' mask index matching meant that broadcast shuffles could be detected as reverse shuffles. llvm-svn: 289811
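A minimal sketch of reverse-mask detection with undef lanes, written in Python for illustration. It assumes undef mask elements are encoded as -1. The "buggy" variant shows one plausible way such a mismatch could arise (treating index 0 like undef); that is an assumption made for the example, not a description of the actual patch, and both function names are hypothetical.

```python
UNDEF = -1  # assumed encoding of an undef shuffle mask element


def is_reverse_mask(mask):
    """A mask of length n is a reverse if every defined lane i selects
    element n - 1 - i; undef lanes match anything."""
    n = len(mask)
    return all(m == UNDEF or m == n - 1 - i for i, m in enumerate(mask))


def is_reverse_mask_buggy(mask):
    """Hypothetical faulty variant: index 0 is skipped as if it were undef,
    so a broadcast of element 0 is accepted as a reverse."""
    n = len(mask)
    return all(m <= 0 or m == n - 1 - i for i, m in enumerate(mask))
```

Under these assumptions the broadcast mask `[0, 0, 0, 0]` is rejected by `is_reverse_mask` but accepted by `is_reverse_mask_buggy`, while `[3, -1, 1, 0]` is accepted by both.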
*	[TESTS] Initial commit of tests, by Andrew Tischenko	Alexey Bataev	2016-12-15	2	-0/+350
\| \| \| \|	llvm-svn: 289807
*	[Power9] Allow AnyExt immediates for XXSPLTIB	Nemanja Ivanovic	2016-12-15	1	-0/+9
\| \| \| \| \| \| \| \| \| \|	In some situations, the BUILD_VECTOR node that builds a v16i8 vector by a splat of an i8 constant will end up with signed 8-bit values, and in other situations it will end up with unsigned ones. Handle both situations. Fixes PR31340. llvm-svn: 289804
*	[AVR] Support floats in the instrumentation pass	Dylan McKay	2016-12-15	1	-4/+21
\| \| \| \| \| \|	This also refactors some common code into the 'GetTypeName' method. llvm-svn: 289803
*	[CostModel][X86] Add tests for reverse shuffle costs	Simon Pilgrim	2016-12-15	1	-0/+143
\| \| \| \|	llvm-svn: 289800
*	Add missing triple target for numeric section flag test	Prakhar Bahuguna	2016-12-15	1	-1/+1
\| \| \| \|	llvm-svn: 289798
*	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently	Sjoerd Meijer	2016-12-15	7	-19/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is essentially a recommit of r285893, but with a correctness fix. The problem of the original commit was that this: bic r5, r7, #31 cbz r5, .LBB2_10 got rewritten into: lsrs r5, r7, #5 beq .LBB2_10 The result in destination register r5 is not the same and this is incorrect when r5 is not dead. So this fix includes checking the uses of the AND destination register. And also, compared to the original commit, some regression tests didn't need changing anymore because of this extra check. For completeness, this was the original commit message: For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. Differential Revision: https://reviews.llvm.org/D27761 llvm-svn: 289794
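The contiguity test and the four cases from the message above can be sketched in Python as follows. This is an illustration, not the ISel code: the helper names are hypothetical, and checking the cases in the order the message lists them is an assumption (a mask such as `0x1` satisfies both case 1 and case 3).

```python
def clz32(v: int) -> int:
    """Count leading zeros of a nonzero 32-bit value."""
    return 32 - v.bit_length()


def ctz32(v: int) -> int:
    """Count trailing zeros of a nonzero 32-bit value."""
    return (v & -v).bit_length() - 1


def popcount32(v: int) -> int:
    return bin(v).count("1")


def is_contiguous(bm: int) -> bool:
    """One consecutive run of set bits: 32 - clz(bm) - ctz(bm) == popcount(bm)."""
    if not 0 < bm <= 0xFFFFFFFF:
        return False
    return 32 - clz32(bm) - ctz32(bm) == popcount32(bm)


def classify(bm: int):
    """Return the case number (1-4) from the message, or None if the
    bitmask is not a single run of set bits."""
    if not is_contiguous(bm):
        return None
    if bm & 1:
        return 1  # touches the LSB: one LSLS
    if bm & 0x80000000:
        return 2  # touches the MSB: one LSRS
    if popcount32(bm) == 1:
        return 3  # single bit: LSLS into the sign bit, test MI/PL
    return 4      # otherwise: LSLS + LSRS
```

For the `bic r5, r7, #31` example, the equivalent AND mask is `0xFFFFFFE0`, which falls under case 2 with `ctz32(0xFFFFFFE0) == 5`, consistent with the `lsrs r5, r7, #5` shown in the message.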
*	Allow ELF section flags to be specified numerically	Prakhar Bahuguna	2016-12-15	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: GAS already allows flags for sections to be specified directly as a numeric value. This functionality is particularly useful for setting processor or application-specific values that may not be directly supported or understood by LLVM. This patch allows LLVM to use numeric section flag values verbatim if specified by the assembly file. Reviewers: grosbach, rafael, t.p.northover, rengolin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27451 llvm-svn: 289785
*	[ARM] Implement execute-only support in CodeGen	Prakhar Bahuguna	2016-12-15	5	-0/+305
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements execute-only support for ARM code generation, which prevents the compiler from generating data accesses to code sections. The following changes are involved: * Add the CodeGen option "-arm-execute-only" to the ARM code generator. * Add the clang flag "-mexecute-only" as well as the GCC-compatible alias "-mpure-code" to enable this option. * When enabled, literal pools are replaced with MOVW/MOVT instructions, with VMOV used in addition for floating-point literals. As the MOVT instruction is required, execute-only support is only available in Thumb mode for targets supporting ARMv8-M baseline or Thumb2. * Jump tables are placed in data sections when in execute-only mode. * The execute-only text section is assigned section ID 0, and is marked as unreadable with the SHF_ARM_PURECODE flag with symbol 'y'. This also overrides selection of ELF sections for globals. llvm-svn: 289784
*	Add missing -mtriple to MIR test case	Sanjoy Das	2016-12-15	1	-1/+1
\| \| \| \|	llvm-svn: 289779
*	[MachineBlockPlacement] Don't make blocks "uneditable"	Sanjoy Das	2016-12-15	1	-0/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This fixes an issue with MachineBlockPlacement due to a badly timed call to `analyzeBranch` with `AllowModify` set to true. The timeline is as follows: 1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls `TailDup.shouldTailDuplicate` on its argument, which in turn calls `analyzeBranch` with `AllowModify` set to true. 2. This `analyzeBranch` call edits the terminator sequence of the block based on the physical layout of the machine function, turning an unanalyzable non-fallthrough block to a unanalyzable fallthrough block. Normally MBP bails out of rearranging such blocks, but this block was unanalyzable non-fallthrough (and thus rearrangeable) the first time MBP looked at it, and so it goes ahead and decides where it should be placed in the function. 3. When placing this block MBP fails to analyze and thus update the block in keeping with the new physical layout. Concretely, before (1) we have something like: ``` LBL0: < unknown terminator op that may branch to LBL1 > jmp LBL1 LBL1: ... A LBL2: ... B ``` In (2), analyze branch simplifies this to ``` LBL0: < unknown terminator op that may branch to LBL2 > ;; jmp LBL1 <- redundant jump removed LBL1: ... A LBL2: ... B ``` In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2 after the first block since that is profitable. ``` LBL0: < unknown terminator op that may branch to LBL2 > ;; jmp LBL1 <- redundant jump LBL2: ... B LBL1: ... A ``` and the program now has incorrect behavior (we no longer fall-through from `LBL0` to `LBL1`) because MBP can no longer edit LBL0. There are several possible solutions, but I went with removing the teeth off of the `analyzeBranch` calls in TailDuplicator. 
That makes thinking about the result of these calls easier, and breaks nothing in the lit test suite. I've also added some bookkeeping to the MachineBlockPlacement pass and used that to write an assert that would have caught this. Reviewers: chandlerc, gberry, MatzeB, iteratee Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27783 llvm-svn: 289764
*	[AVX-512][InstCombine] Add masked scalar FMA intrinsics to ↵	Craig Topper	2016-12-15	1	-0/+390
\| \| \| \| \| \|	SimplifyDemandedVectorElts. llvm-svn: 289759
*	Remove the AssumptionCache	Hal Finkel	2016-12-15	2	-23/+1
\| \| \| \| \| \| \| \| \|	After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
*	Make processing @llvm.assume more efficient by using operand bundles	Hal Finkel	2016-12-15	11	-41/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was an efficiency problem with how we processed @llvm.assume in ValueTracking (and other places). The AssumptionCache tracked all of the assumptions in a given function. In order to find assumptions relevant to computing known bits, etc. we searched every assumption in the function. For ValueTracking, that means that we did O(#assumes * #values) work in InstCombine and other passes (with a constant factor that can be quite large because we'd repeat this search at every level of recursion of the analysis). Several of us discussed this situation at the last developers' meeting, and this implements the discussed solution: Make the values that an assume might affect operands of the assume itself. To avoid exposing this detail to frontends and passes that need not worry about it, I've used the new operand-bundle feature to add these extra call "operands" in a way that does not affect the intrinsic's signature. I think this solution is relatively clean. InstCombine adds these extra operands based on what ValueTracking, LVI, etc. will need and then those passes need only search the users of the values under consideration. This should fix the computational-complexity problem. At this point, no passes depend on the AssumptionCache, and so I'll remove that as a follow-up change. Differential Revision: https://reviews.llvm.org/D27259 llvm-svn: 289755
*	Add testcases for some shuffle bugs.	Eli Friedman	2016-12-15	2	-0/+205
\| \| \| \| \| \| \|	See https://llvm.org/bugs/show_bug.cgi?id=31301 and https://llvm.org/bugs/show_bug.cgi?id=31364 . llvm-svn: 289751
*	Fix test/tools/lto/hide-linkonce-odr.ll after r289719	Nico Weber	2016-12-15	1	-0/+1
\| \| \| \|	llvm-svn: 289750
*	Use PIC relocation model as default for PowerPC64 ELF.	Joerg Sonnenberger	2016-12-15	19	-35/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of the PowerPC64 code generation for the ELF ABI is already PIC. There are four main exceptions: (1) Constant pointer arrays etc. should be in writeable sections. (2) The TOC restoration NOP after a call is needed for all global symbols. While GNU ld has a workaround for questionable GCC self-calls, we trigger the checks for calls from COMDAT sections as they cross input sections and are therefore not considered self-calls. The current decision is questionable and suboptimal, but outside the scope of the change. (3) TLS access cannot use the initial-exec model. (4) Jump tables should use relative addresses. Note that the current encoding doesn't work for the large code model, but it is more compact than the default for any non-trivial jump table. Improving this is again beyond the scope of this change. At least (1) and (3) are assumptions made in target-independent code and introducing additional hooks is a bit messy. Testing with clang shows that a -fPIC binary is 600KB smaller than the corresponding -fno-pic build. Separate testing shows that improved jump table encodings would explain only about 100KB or so. The rest is expected to be a result of more aggressive immediate forming for -fno-pic, where the -fPIC binary just uses TOC entries. This change brings the LLVM output in line with the GCC output; other PPC64 compilers like XLC on AIX are known to produce PIC by default as well. The relocation model can still be provided explicitly, e.g. when using MCJIT. One test case for case (1) is included; other test cases with relocation-model-sensitive behavior are wired to static for now. They will be reviewed and adjusted separately. Differential Revision: https://reviews.llvm.org/D26566 llvm-svn: 289743
*	[AMDGPU] Fix runtime-metadata.ll test so it doesn't leave an object file in ↵	Justin Lebar	2016-12-14	1	-1/+1
\| \| \| \| \| \|	the source tree. llvm-svn: 289742
*	[DAG] allow more select folding for targets that have 'and not' (PR31175)	Sanjay Patel	2016-12-14	2	-14/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original motivation for this patch comes from wanting to canonicalize more IR to selects and also canonicalizing min/max. If we're going to do that, we need more backend fixups to undo select codegen when simpler ops will do. I chose AArch64 for the tests because that shows the difference in the simplest way. This should fix: https://llvm.org/bugs/show_bug.cgi?id=31175 Differential Revision: https://reviews.llvm.org/D27489 llvm-svn: 289738
*	[gold] Add datalayout to two tests where it was missing.	Davide Italiano	2016-12-14	2	-0/+6
\| \| \| \| \| \|	Reported by: thakis via chromium bots. llvm-svn: 289737
*	Whitespace cleanup in test/CodeGen/NVPTX/annotations.ll.	Justin Lebar	2016-12-14	1	-4/+0
\| \| \| \|	llvm-svn: 289730
*	[NVPTX] Support .maxnreg annotation.	Justin Lebar	2016-12-14	1	-3/+13
\| \| \| \| \| \| \| \| \| \|	Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D27638 llvm-svn: 289729
*	LibDriver: Reject inputs that are not COFF objects or bitcode files.	Peter Collingbourne	2016-12-14	3	-1/+3
\| \| \| \| \| \| \| \|	Fixes PR31372. Differential Revision: https://reviews.llvm.org/D27776 llvm-svn: 289726
*	Only sets profile summary when it was not preset.	Dehao Chen	2016-12-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass. Reviewers: eraman, davidxl Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D27733 llvm-svn: 289725
*	[LTO] Add the missing datalayout in a test.	Davide Italiano	2016-12-14	1	-0/+1
\| \| \| \|	llvm-svn: 289720
*	[LTO] Reject modules without datalayout.	Davide Italiano	2016-12-14	87	-8/+157
\| \| \| \| \| \| \| \| \| \| \|	Also, update the ~60 failing tests in the tree which did not contain a valid datalayout. This fixes PR31123. lld will be updated in a following patch, immediately after this is committed. Differential Revision: https://reviews.llvm.org/D27082 llvm-svn: 289719
*	[asan] Don't skip instrumentation of masked load/store unless we've seen a ↵	Filipe Cabecinhas	2016-12-14	1	-0/+62
\| \| \| \| \| \| \| \| \| \| \| \|	full load/store on that pointer. Reviewers: kcc, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27625 llvm-svn: 289718
*	[asan] Hook ClInstrumentWrites and ClInstrumentReads to masked operation ↵	Filipe Cabecinhas	2016-12-14	1	-11/+24
\| \| \| \| \| \| \| \| \| \| \| \|	instrumentation. Reviewers: kcc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27548 llvm-svn: 289717
*	[ARM] Split 128-bit vectors in BUILD_VECTOR lowering	Eli Friedman	2016-12-14	4	-26/+63
\| \| \| \| \| \| \| \| \| \| \| \| \|	Given that INSERT_VECTOR_ELT operates on D registers anyway, combining 64-bit vectors into a 128-bit vector is basically free. Therefore, try to split BUILD_VECTOR nodes before giving up and lowering them to a series of INSERT_VECTOR_ELT instructions. Sometimes this allows dramatically better lowerings; see testcases for examples. Inspired by similar code in the x86 backend for AVX. Differential Revision: https://reviews.llvm.org/D27624 llvm-svn: 289706
*	[InstCombine] Folding of a compare with RHS const should merge debug locations	Robert Lougher	2016-12-14	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \|	If all the operands to a phi node are compares that have a RHS constant, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new op should be the merged debug locations of the phi node arguments. Patch 8 of 8 for D26256. Folding of a compare that has a RHS constant. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289704
*	[ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.	Eli Friedman	2016-12-14	2	-6/+163
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up. Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable). Differential Revision: https://reviews.llvm.org/D27694 llvm-svn: 289703
*	[InstCombine] Folding of a binop with RHS const should merge the debug locations	Robert Lougher	2016-12-14	1	-0/+49
\| \| \| \| \| \| \| \| \| \| \| \| \|	If all the operands to a phi node are a binop with a RHS constant, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new op should be the merged debug locations of the phi node arguments. Patch 7 of 8 for D26256. Folding of a binop with RHS constant. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289699
*	[GVNHoist] Move GVNHoist to function simplification part of pipeline.	Geoff Berry	2016-12-14	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Move GVNHoist to later in the optimization pipeline, specifically, to the function simplification part of the pipeline. The new pipeline location allows GVNHoist to run on a function after its callees have been inlined but before the function has been considered for inlining into its callers, exposing more opportunities for hoisting. Performance results on AArch64 kryo: Improvements: Benchmarks/CoyoteBench/fftbench -24.952% spec2006/bzip2 -4.071% internal bmark -3.177% Benchmarks/PAQ8p/paq8p -1.754% spec2000/perlbmk -1.328% spec2006/h264ref -1.140% Regressions: internal bmark +1.818% Benchmarks/mafft/pairlocalalign +1.084% Reviewers: sebpop, dberlin, hiraditya Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27722 llvm-svn: 289696
*	[InstCombine] When folding casts through a phi node merge the debug locations	Robert Lougher	2016-12-14	1	-0/+47
\| \| \| \| \| \| \| \| \| \| \| \| \|	If all the operands to a phi node are a cast, instcombine will try to pull them through the phi node, combining them into a single cast. When it does this, the debug location of the new cast should be the merged debug locations of the phi node arguments. Patch 6 of 8 for D26256. Folding of a cast operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289693
*	[InstCombine] Folding loads through a phi node should merge the debug locations	Robert Lougher	2016-12-14	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \|	If all the operands to a phi node are a load, instcombine will try to pull them through the phi node, combining them into a single load. When it does this, the debug location of the new load should be the merged debug locations of the phi node arguments. Patch 5 of 8 for D26256. Folding of a load operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289688
*	[InstCombine] When folding GEP through a phi node merge the debug locations	Robert Lougher	2016-12-14	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \|	If all the operands to a phi node are getelementptr, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new getelementptr should be the merged debug locations of the phi node arguments. Patch 4 of 8 for D26256. Folding of a getelementptr operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289684
*	[InstCombine] Merge debug locations when folding through a phi node	Robert Lougher	2016-12-14	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \|	If all the operands to a phi node are of the same operation, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the operation should be the merged debug locations of the phi node arguments. Patch 3 of 8 for D26256. Folding of a compare operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289681
*	[InstCombine] Merge debug locations when folding through a phi node	Robert Lougher	2016-12-14	1	-0/+70
\| \| \| \| \| \| \| \| \| \| \| \| \|	If all the operands to a phi node are of the same operation, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the operation should be the merged debug locations of the phi node arguments. Patch 2 of 8 for D26256. Folding of a binary operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289679
*	AMDGPU: Emit runtime metadata version 2 as YAML	Yaxun Liu	2016-12-14	4	-2303/+214
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D25046 llvm-svn: 289674
*	lit.cfg: Check value of build config rather than converting to boolean	Derek Schuff	2016-12-14	1	-1/+1
\| \| \| \| \| \|	This is a CMake var which never evaluates to false. llvm-svn: 289673
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵	Nirav Dave	2016-12-14	64	-1552/+1715
\| \| \| \| \| \| \| \| \| \|	UseAA is enabled." Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667
*	AMDGPU: Change vintrp printing	Matt Arsenault	2016-12-14	2	-58/+58
\| \| \| \|	llvm-svn: 289664
*	Revert gold part of change, just liblto	Derek Schuff	2016-12-14	2	-2/+1
\| \| \| \|	llvm-svn: 289663
*	Disable libLTO tests when libLTO is not built	Derek Schuff	2016-12-14	2	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The current test only checks whether ld64 is available, causing tests to fail when ld64 is available but libLTO is not built. Reviewers: beanz, mehdi_amini Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D27739 llvm-svn: 289662
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2016-12-14	64	-1715/+1552
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled. Retrying after removing load-store factoring through token factors in favor of improved token factor operand pruning. Simplify Consecutive Merge Store Candidate Search. Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code. 2. Unifies the chain aggregation in the merged stores across code paths. 3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits. 4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. 
CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289659
*	[DAGCombiner] Try to use SelectionDAG::isKnownToBeAPowerOfTwo instead of ↵	Simon Pilgrim	2016-12-14	5	-238/+100
\| \| \| \| \| \| \| \| \| \| \| \|	just APInt::isPowerOf2 Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases. Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value. Differential Revision: https://reviews.llvm.org/D27714 llvm-svn: 289654
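The reason a log2 helper is useful for these combines is that unsigned division and remainder by a power of two reduce to a shift and a mask. A small Python sketch of that arithmetic follows; the function names are hypothetical and only mirror the idea behind `BuildLogBase2`, not its DAG-level implementation.

```python
def log2_exact(p: int) -> int:
    """Return k such that p == 2**k; reject values that are not powers of two."""
    if p <= 0 or (p & (p - 1)) != 0:
        raise ValueError("not a power of two")
    return p.bit_length() - 1


def udiv_pow2(x: int, p: int) -> int:
    """Unsigned x / p for a power-of-two p, as a logical right shift."""
    return x >> log2_exact(p)


def urem_pow2(x: int, p: int) -> int:
    """Unsigned x % p for a power-of-two p, as a bit mask."""
    log2_exact(p)  # validates that p is a power of two
    return x & (p - 1)
```

For non-negative `x`, `udiv_pow2(x, p)` agrees with `x // p` and `urem_pow2(x, p)` agrees with `x % p`. The signed variants (sdiv/srem) need extra rounding adjustments and are not covered by this sketch.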