bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[InlineFunction] add nonnull assumptions based on argument attributes	Sanjay Patel	2017-02-27	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \|	This was suggested in D27855: have the inliner add assumptions, so we don't lose nonnull info provided by argument attributes. This still doesn't solve PR28430 (dyn_cast), but this gets us closer. https://reviews.llvm.org/D29999 llvm-svn: 296366
*	Fix a bug when unswitching on partial LIV for SwitchInst	Xin Tong	2017-02-27	1	-0/+211
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fix a bug when unswitching on partial LIV for SwitchInst. Reviewers: hfinkel, efriedma, sanjoy Reviewed By: sanjoy Subscribers: david2050, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D29107 llvm-svn: 296363
*	Remove an empty line in icmp-illegal.ll . NFC	Amaury Sechet	2017-02-27	1	-1/+0
\| \| \| \|	llvm-svn: 296350
*	[SLP] A test for a fix of PR32038.	Alexey Bataev	2017-02-27	1	-0/+124
\| \| \| \|	llvm-svn: 296349
*	Loop predication expand both sides of the widened condition	Artur Pilipenko	2017-02-27	1	-1/+76
\| \| \| \| \| \| \| \| \| \| \| \|	This is a fix for a loop predication bug which resulted in malformed IR generation. Loop invariant side of the widened condition is not guaranteed to be available in the preheader as is, so we need to expand it as well. See added unsigned_loop_0_to_n_hoist_length test for example. Reviewed By: sanjoy, mkazantsev Differential Revision: https://reviews.llvm.org/D30099 llvm-svn: 296345
*	[ARM] LSL #0 is an alias of MOV	John Brawn	2017-02-27	2	-0/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we handle this correctly in arm, but in thumb we don't which leads to an unpredictable instruction being emitted for LSL #0 in an IT block and SP not being permitted in some cases when it should be. For the thumb2 LSL we can handle this by making LSL #0 an alias of MOV in the .td file, but for thumb1 we need to handle it in checkTargetMatchPredicate to get the IT handling right. We also need to adjust the handling of MOV rd, rn, LSL #0 to avoid generating the 16-bit encoding in an IT block. We should also adjust it to allow SP in the same way that it is allowed in MOV rd, rn, but I haven't done that here because it looks like it would take quite a lot of work to get right. Additionally correct the selection of the 16-bit shift instructions in processInstruction, where it was checking if the two registers were equal when it should have been checking if they were low. It appears that previously this code was never executed and the 16-bit encoding was selected by default, but the other changes I've done here have somehow made it start being used. Differential Revision: https://reviews.llvm.org/D30294 llvm-svn: 296342
*	[DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets	Artur Pilipenko	2017-02-27	2	-0/+52
	This pattern is essentially an i16 load from the p+1 address:
	  %p1.i16 = bitcast i8* %p to i16*
	  %p2.i8 = getelementptr i8, i8* %p, i64 2
	  %v1 = load i16, i16* %p1.i16
	  %v2.i8 = load i8, i8* %p2.i8
	  %v2 = zext i8 %v2.i8 to i16
	  %v1.shl = shl i16 %v1, 8
	  %res = or i16 %v1.shl, %v2
	The current implementation would identify the %v1 load as the first byte load and would mistakenly emit an i16 load from the %p1.i16 address. This patch adds a check that the first byte is loaded from offset zero of the first load address, so that this address can be used as the base address for the combined value. Otherwise just give up combining. llvm-svn: 296336
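	The byte arithmetic behind this pattern can be checked with a small model. This is only a sketch, not the DAGCombiner code: pattern_result and load_i16_be are hypothetical helper names, and memory is modelled as a Python bytes object read in big-endian order.

```python
import struct


def load_i16_be(mem: bytes, addr: int) -> int:
    """Big-endian i16 load from mem at addr."""
    return struct.unpack_from(">H", mem, addr)[0]


def pattern_result(mem: bytes, p: int) -> int:
    """Evaluate the IR pattern from the commit message on a big-endian target."""
    v1 = load_i16_be(mem, p)          # %v1 = load i16 from %p
    v2 = mem[p + 2]                   # %v2 = zext of the i8 load from %p + 2
    v1_shl = (v1 << 8) & 0xFFFF       # %v1.shl = shl i16 %v1, 8 (wraps to 16 bits)
    return v1_shl | v2                # %res = or i16 %v1.shl, %v2


mem = bytes([0x11, 0x22, 0x33, 0x44])
combined_from_p_plus_1 = load_i16_be(mem, 1)   # what the pattern really computes
combined_from_p = load_i16_be(mem, 0)          # what the buggy combine would load
```

	With these four bytes, the pattern yields the same value as an i16 load from p+1, not from p, which is why the first load address cannot serve as the base here.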
*	[AMDGPU] Runtime metadata fixes:	Konstantin Zhuravlyov	2017-02-27	5	-42/+191
	- Verify that runtime metadata is actually valid runtime metadata when assembling, otherwise we could accept the following when assembling, but ocl runtime will reject it:
	    .amdgpu_runtime_metadata { amd.MDVersion: [ 2, 1 ], amd.RandomUnknownKey, amd.IsaInfo: ...
	- Make IsaInfo optional, and always emit it.
	Differential Revision: https://reviews.llvm.org/D30349 llvm-svn: 296324
*	Do full codegen for various tests. NFC	Amaury Sechet	2017-02-27	3	-57/+147
\| \| \| \|	llvm-svn: 296305
*	Revert "[CGP] Split some critical edges coming out of indirect branches"	Daniel Jasper	2017-02-26	4	-280/+13
\| \| \| \| \| \| \|	This reverts commit r296149 as it leads to crashes when compiling for PPC. llvm-svn: 296295
*	[X86] Fix execution domain for cmpss/sd instructions.	Craig Topper	2017-02-26	4	-160/+160
\| \| \| \|	llvm-svn: 296293
*	[AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps.	Craig Topper	2017-02-26	1	-1/+1
\| \| \| \|	llvm-svn: 296291
*	[AVX-512] Fix the execution domain for AVX-512 integer broadcasts.	Craig Topper	2017-02-26	2	-5/+5
\| \| \| \|	llvm-svn: 296290
*	[AVX-512] Fix execution domain for VPMADD52 instructions.	Craig Topper	2017-02-26	2	-24/+24
\| \| \| \|	llvm-svn: 296288
*	[AVX-512] Use update_llc_test_checks.py to regenerate a test.	Craig Topper	2017-02-26	1	-44/+53
\| \| \| \|	llvm-svn: 296287
*	[X86] Fix the execution domain for scalar SQRT intrinsic instruction.	Craig Topper	2017-02-26	2	-5/+5
\| \| \| \|	llvm-svn: 296284
*	[X86] Add an additional CHECK prefix to a test. Some of the cases used it, but it wasn't on the FileCheck command lines.	Craig Topper	2017-02-26	1	-12/+7
\| \| \| \| \| \|	llvm-svn: 296283
*	[X86] Clean up test/CodeGen/X86/2006-03-02-InstrSchedBug.ll	David L. Jones	2017-02-26	1	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Migrated from grep to FileCheck. Re-indented code, removed boilerplate comments. Added 'entry' label at beginning of basic block. Patch by Jorge Gorbe! Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30320 llvm-svn: 296280
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."	Nirav Dave	2017-02-26	69	-2116/+2175
\| \| \| \| \| \| \| \|	This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279
*	[ValueTracking] Don't do an unchecked shift in ComputeNumSignBits	Sanjoy Das	2017-02-25	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Previously we used to return a bogus result, 0, for IR like `ashr %val, -1`. I've also added an assert checking that `ComputeNumSignBits` at least returns 1. That assert found an already checked in test case where we were returning a bad result for `ashr %val, -1`. Fixes PR32045. Reviewers: spatel, majnemer Reviewed By: spatel, majnemer Subscribers: efriedma, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D30311 llvm-svn: 296273
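	How an unchecked shift amount can lead to a result of 0 can be sketched with a simplified model. This is an assumption-laden illustration, not the ValueTracking code: both function names are hypothetical, the counter is modelled as a 32-bit unsigned integer, and returning 1 for an out-of-range shift is just the most conservative answer, since every value has at least one sign bit.

```python
def sign_bits_after_ashr_unchecked(operand_sign_bits: int, shift_amount: int,
                                   ty_bits: int, word_bits: int = 32) -> int:
    """Add the shift amount to the known sign bits with unsigned wraparound."""
    mask = (1 << word_bits) - 1
    total = (operand_sign_bits + (shift_amount & mask)) & mask  # may wrap to 0
    return min(total, ty_bits)


def sign_bits_after_ashr_checked(operand_sign_bits: int, shift_amount: int,
                                 ty_bits: int) -> int:
    """Reject shift amounts that are not smaller than the type width."""
    amount = shift_amount & ((1 << ty_bits) - 1)  # view the amount as unsigned
    if amount >= ty_bits:
        return 1  # conservative: at least one sign bit always exists
    return min(operand_sign_bits + amount, ty_bits)
```

	For `ashr %val, -1` on a 32-bit type with one known sign bit, the unchecked version wraps around to 0, while the checked version never drops below 1.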
*	[AVX-512] Fix the execution domain for scalar FMA instructions.	Craig Topper	2017-02-25	3	-18/+18
\| \| \| \|	llvm-svn: 296271
*	[AVX-512] Fix the execution domain on some instructions.	Craig Topper	2017-02-25	3	-7/+7
\| \| \| \|	llvm-svn: 296270
*	[AVX-512] Add an additional test case to show the execution domain for vrqsrtsd is wrong.	Craig Topper	2017-02-25	1	-0/+12
\| \| \| \| \| \|	llvm-svn: 296269
*	[AVX-512] Use update_llc_test_checks.py to regenerate the avx512er intrinsic test.	Craig Topper	2017-02-25	1	-23/+90
\| \| \| \| \| \|	llvm-svn: 296268
*	Re-enable accidentally disabled test. NFC.	Nirav Dave	2017-02-25	1	-14/+14
\| \| \| \|	llvm-svn: 296266
*	[ExecutionDepsFix] Don't make copies of LiveReg objects when collecting operands for soft instructions	Craig Topper	2017-02-25	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: While collecting operands we make copies of the LiveReg objects which are stored in the LiveRegs array. If the instruction uses the same register multiple times we end up with multiple copies. Later we iterate through the collected list of LiveReg objects and merge DomainValues. In the process of doing this the merge function can change the contents of the original LiveReg object in the LiveRegs array, but not the copies that have been made. So when we get to the second usage of the register we end up seeing a stale copy of the LiveReg object. To fix this I've stopped copying and now just store a pointer to the original LiveReg object. Another option might be to avoid adding the same register to the Regs array twice, but this approach seemed simpler. The included test case exposes this bug due to an AVX-512 masked OR instruction using the same register for the passthru operand and one of the inputs to the OR operation. Fixes PR30284. Reviewers: RKSimon, stoklund, MatzeB, spatel, myatsina Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30242 llvm-svn: 296260
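	The stale-copy problem described above is a general aliasing issue and can be reproduced in a few lines. This sketch does not mirror the pass itself: LiveReg here is a hypothetical one-field record, merge_domain is a made-up stand-in for the merge step, and the domain names are arbitrary strings.

```python
import copy
from dataclasses import dataclass


@dataclass
class LiveReg:
    """Hypothetical stand-in for the per-register record."""
    domain: str


def merge_domain(reg: LiveReg, new_domain: str) -> None:
    """Hypothetical merge step: rewrites the record it is handed."""
    reg.domain = new_domain


def collect_by_copy(live_regs, operand_indices):
    """Old behaviour: each operand gets its own copy of the record."""
    return [copy.copy(live_regs[i]) for i in operand_indices]


def collect_by_reference(live_regs, operand_indices):
    """Fixed behaviour: each operand refers to the original record."""
    return [live_regs[i] for i in operand_indices]


def second_use_domain(collect) -> str:
    """Domain seen at the second use of a register that is used twice."""
    live_regs = [LiveReg("int")]
    operands = collect(live_regs, [0, 0])   # same register appears twice
    merge_domain(live_regs[0], "float")     # merging rewrites the original entry
    return operands[1].domain
```

	With copies, the second use still reports the old domain; with references, it sees the merged one.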
*	Update codegen for various tests. NFC	Amaury Sechet	2017-02-25	2	-10/+482
\| \| \| \|	llvm-svn: 296257
*	Add test for known bits in uaddo and saddo.	Amaury Sechet	2017-02-25	1	-0/+54
\| \| \| \|	llvm-svn: 296255
*	The automatic CHECK: to CHECK-LABEL: conversion, back in 2013,	Artyom Skrobov	2017-02-25	1	-7/+7
\| \| \| \| \| \| \|	had missed most labels in this test because they didn't end with a colon. llvm-svn: 296254
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.	Nirav Dave	2017-02-25	69	-2175/+2116
	Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
	* Simplify Consecutive Merge Store Candidate Search
	Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as it separates non-interfering loads/stores from the store-merging logic.
	When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited.
	This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear to require more expensive constant generation).
	Some minor peephole optimizations to deal with improved SubDAG shapes (listed below).
	Additional Minor Changes:
	1. Finishes removing unused AliasLoad code
	2. Unifies the chain aggregation in the merged stores across code paths
	3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
	4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
	5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
	6. Forward load-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
	7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
	8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
	This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
	Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
	CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
	Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252
*	[XRAY] A Color Choosing helper for XRay Graph	Dean Michael Berris	2017-02-25	1	-16/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In preparation for graph comparison, this patch breaks out the color choice code from xray-graph into a library and adds polynomials for the Sequential and Difference sets from ColorBrewer. Depends on D29005 Reviewers: dblaikie, chandlerc, dberris Reviewed By: dberris Subscribers: chandlerc, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D29363 llvm-svn: 296210
*	[PGO] Directory name stripping in global identifier for static functions	Rong Xu	2017-02-25	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current internal option -static-func-full-module-prefix keeps the full directory path in the profile counter names for static functions. The default of this option is false. This strips the directory names from the source filename, which is problematic: (1) it creates linker errors for profile-generation compilation, exposed in our internal benchmarks. We are seeing messages like "warning: relocation refers to discarded section". This is due to the name conflicts after the stripping. (2) the stripping only applies to getPGOFuncName. Current Thin-LTO module importing for the indirect-calls assumes the source directory name is not stripped. The current default value for this option can potentially prevent some inter-module indirect-call-promotions. This patch turns the default value for -static-func-full-module-prefix to true. The second part of the patch is to have an alternative implementation under the internal option -static-func-strip-dirname-prefix=<value>. This option specifies the number of directory levels to be stripped from the source filename. Using a large value as the parameter has the same effect as -static-func-full-module-prefix. Differential Revision: http://reviews.llvm.org/D29512 llvm-svn: 296206
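	The name conflict mentioned in (1) can be illustrated with a toy model of level-based stripping. Everything here is an assumption made for illustration: strip_dir_prefix and static_func_name are hypothetical helpers, paths are relative and use "/" separators, and the "file:function" naming only approximates how static functions are identified.

```python
def strip_dir_prefix(path: str, levels: int) -> str:
    """Drop up to `levels` leading directory components, keeping the file name."""
    parts = path.split("/")
    dirs, file_name = parts[:-1], parts[-1]
    return "/".join(dirs[levels:] + [file_name])


def static_func_name(source_file: str, func: str, levels: int) -> str:
    """Toy identifier for a static function: stripped file name plus function."""
    return f"{strip_dir_prefix(source_file, levels)}:{func}"


# Two static functions with the same name in same-named files.
a = ("src/a/util.c", "init")
b = ("src/b/util.c", "init")
```

	Stripping one level keeps the two identifiers distinct ("a/util.c:init" versus "b/util.c:init"), while stripping every directory collapses both to "util.c:init".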
*	[WebAssembly] Add support for using a wasm global for the stack pointer.	Dan Gohman	2017-02-24	1	-45/+57
\| \| \| \| \| \| \|	This replaces the __stack_pointer variable which was allocated in linear memory. llvm-svn: 296201
*	[Hexagon] Undo shift folding where it could simplify addressing mode	Krzysztof Parzyszek	2017-02-24	1	-0/+59
	For example, avoid (single shift):
	  r0 = and(##536870908,lsr(r0,#3))
	  r0 = memw(r1+r0<<#0)
	in favor of (two shifts):
	  r0 = lsr(r0,#5)
	  r0 = memw(r1+r0<<#2)
	llvm-svn: 296196
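	That the two sequences compute the same address offset can be checked numerically. This is a sketch with hypothetical function names, assuming r0 holds an unsigned 32-bit value; note that 536870908 is 0x1FFFFFFC.

```python
def offset_single_shift(r0: int) -> int:
    """and(##536870908, lsr(r0, #3)), then used with a scale of <<#0."""
    return ((r0 & 0xFFFFFFFF) >> 3) & 536870908


def offset_two_shifts(r0: int) -> int:
    """lsr(r0, #5), then scaled by <<#2 inside the addressing mode."""
    return (((r0 & 0xFFFFFFFF) >> 5) << 2) & 0xFFFFFFFF


samples = [0, 1, 31, 32, 0x12345678, 0x80000000, 0xFFFFFFFF]
offsets_agree = all(offset_single_shift(v) == offset_two_shifts(v) for v in samples)
```

	The second form moves the final scaling into the load's addressing mode, so the mask is no longer needed.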
*	[WebAssembly] Basic support for Wasm object file encoding.	Dan Gohman	2017-02-24	30	-230/+433
\| \| \| \| \| \| \| \| \|	With the "wasm32-unknown-unknown-wasm" triple, this allows writing out simple wasm object files, and is another step in a larger series toward migrating from ELF to general wasm object support. Note that this code and the binary format itself are still experimental. llvm-svn: 296190
*	AMDGPU : Replace FMAD with FMA when denormals are enabled.	Wei Ding	2017-02-24	1	-1/+19
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186
*	Revert "Correct register pressure calculation in presence of subregs"	Stanislav Mekhanoshin	2017-02-24	2	-83/+16
\| \| \| \| \| \| \| \|	This reverts commit r296009. It broke one out of tree target and also does not account for all partial lines added or removed when calculating PressureDiff. llvm-svn: 296182
*	Disallow redefinition of section symbols.	Evgeniy Stepanov	2017-02-24	5	-142/+18
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D30235 llvm-svn: 296180
*	[ARM] add tests for alternate forms of select-of-constants; NFC	Sanjay Patel	2017-02-24	1	-0/+33
\| \| \| \|	llvm-svn: 296178
*	GlobalISel: check for CImm rather than Imm on G_CONSTANTs.	Tim Northover	2017-02-24	1	-4/+3
\| \| \| \| \| \| \|	All G_CONSTANTS created by the MachineIRBuilder have an operand of type CImm (i.e. a ConstantInt), so that's what the selector needs to look for. llvm-svn: 296176
*	[ARM] auto-generate complete checks; NFC	Sanjay Patel	2017-02-24	1	-8/+34
\| \| \| \| \| \| \|	The affected test may change with a patch I'm looking at for DAGCombiner, so I want to make sure it's not a regression. llvm-svn: 296175
*	[WebAssembly] Handle f16 in fast-isel.	Dan Gohman	2017-02-24	1	-0/+1
\| \| \| \|	llvm-svn: 296172
*	[CodeGenPrepare] Make -addr-sink-using-gep work with address spaces.	Eli Friedman	2017-02-24	1	-3/+8
\| \| \| \| \| \| \| \| \| \|	When we construct addressing modes, we use isNoopAddrSpaceCast to ignore addrspacecast instructions. Make sure we insert the correct addrspacecast when we reconstruct the addressing mode. Differential Revision: https://reviews.llvm.org/D30114 llvm-svn: 296167
*	[InstCombine] Fix bug in pointer replacement	Yaxun Liu	2017-02-24	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \|	This optimisation was crashing when there was a chain of more than one bitcast instruction to replace, as a result of the changes in D27283. Patch by James Price. Differential Revision: https://reviews.llvm.org/D30347 llvm-svn: 296163
*	[CGP] Split some critical edges coming out of indirect branches	Michael Kuperstein	2017-02-24	4	-13/+280
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about 100 defs of registers containing constants in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to stack. The end result is that a clang-compiled python interpreter can be about 2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296149
*	[LV] Merge floating-point and integer induction widening code	Matthew Simpson	2017-02-24	1	-63/+56
\| \| \| \| \| \| \| \| \| \| \|	This patch merges the existing floating-point induction variable widening code into the integer induction variable widening code, creating a single set of functions for both kinds of inductions. The primary motivation for doing this is to enable vector phi node creation for floating-point induction variables. Differential Revision: https://reviews.llvm.org/D30211 llvm-svn: 296145
*	[PowerPC] Use subfic instruction for subtract from immediate	Nemanja Ivanovic	2017-02-24	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \|	Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate. The corresponding pattern already exists for 32-bit integers. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29387 llvm-svn: 296144
*	[PowerPC] Use rldicr instruction for AND with an immediate if possible	Nemanja Ivanovic	2017-02-24	3	-16/+22
\| \| \| \| \| \| \| \| \| \| \|	Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that clear bits from the right-hand side. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29388 llvm-svn: 296143
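	The kind of mask this applies to can be recognized with a short bit trick. The sketch below is not the PowerPC backend's matching code: cleared_low_bits is a hypothetical helper, and it assumes 64-bit masks whose set bits form one contiguous run at the top.

```python
MASK64 = (1 << 64) - 1


def cleared_low_bits(mask: int):
    """Return n if mask is all ones except for n cleared low bits, else None."""
    inverted = ~mask & MASK64          # must look like 2**n - 1
    if inverted & (inverted + 1):      # not a contiguous run of low ones
        return None
    n = inverted.bit_length()
    return n if 0 < n < 64 else None   # skip the trivial all-ones and zero masks


def and_with_mask(value: int, n: int) -> int:
    """What clearing the low n bits of a 64-bit value computes."""
    return value & ~((1 << n) - 1) & MASK64
```

	For example, the mask 0xFFFFFFFFFFFFFF00 clears the low 8 bits, whereas 0xF0F0 is not of this form and is rejected.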
*	[DAGCombiner] add missing folds for scalar select of {-1,0,1}	Sanjay Patel	2017-02-24	7	-109/+40
	The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to: https://reviews.llvm.org/rL75531 https://reviews.llvm.org/rL159230 The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in this patch). I.e., the proposed folds in this patch already exist in InstCombine today: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151 As this patch shows, most targets generate better machine code for simple ext/add/not ops rather than a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization:
	1. Have DAGCombiner convert any select of constants into ext/add/not ops.
	2. Have InstCombine canonicalize in the other direction (create more selects).
	Differential Revision: https://reviews.llvm.org/D30180 llvm-svn: 296137
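	The equivalence between a select of {-1,0,1} constants and ext/not ops can be verified exhaustively for a boolean condition. This is a sketch on 32-bit values with hypothetical helper names; the four identities listed are plain arithmetic facts and are not claimed to be the exact set of folds the patch adds.

```python
def to_i32(x: int) -> int:
    """Wrap a Python int to an unsigned 32-bit pattern."""
    return x & 0xFFFFFFFF


def select(cond: bool, if_true: int, if_false: int) -> int:
    return to_i32(if_true if cond else if_false)


def zext(bit: bool) -> int:
    """Zero-extend an i1: true becomes 1."""
    return to_i32(int(bit))


def sext(bit: bool) -> int:
    """Sign-extend an i1: true becomes -1 (all ones)."""
    return to_i32(-int(bit))


FOLDS = [
    ("select c, 1, 0  == zext c",     lambda c: select(c, 1, 0),  lambda c: zext(c)),
    ("select c, -1, 0 == sext c",     lambda c: select(c, -1, 0), lambda c: sext(c)),
    ("select c, 0, 1  == zext not c", lambda c: select(c, 0, 1),  lambda c: zext(not c)),
    ("select c, 0, -1 == sext not c", lambda c: select(c, 0, -1), lambda c: sext(not c)),
]

folds_hold = all(lhs(c) == rhs(c) for _, lhs, rhs in FOLDS for c in (False, True))
```

	Since the condition has only two values, checking both covers every case.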
*	Recommit "[mips] Fix atomic compare and swap at O0."	Simon Dardis	2017-02-24	2	-3/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This time with the missing files. Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs in between an ll and sc, the store can cause the atomic sequence to fail. This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos. This resolves PR/32020. Thanks to James Cowgill for reporting the issue! Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30257 llvm-svn: 296134