path: root/llvm/test
Each entry below lists: commit message (author, date; files changed, -deletions/+insertions), followed by the commit body and its llvm-svn revision.
* [X86] Regenerate x64 mmx/f64 return value tests (Simon Pilgrim, 2016-09-04; 1 file, -17/+26)
  llvm-svn: 280634
* [AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div intrinsics and upgrade to native IR (Craig Topper, 2016-09-04; 2 files, -252/+252)
  llvm-svn: 280633
* [ORC] Clone module flags metadata into the globals module in the CompileOnDemandLayer (Lang Hames, 2016-09-04; 1 file, -0/+13)
  Also contains a tweak to the orc-lazy jit in LLI to enable the test case.
  llvm-svn: 280632
* [X86] Regenerate trunc-store legalization test (Simon Pilgrim, 2016-09-04; 1 file, -4/+12)
  llvm-svn: 280631
* [X86][SSE] Regenerate fcmp/uitofp combine tests (Simon Pilgrim, 2016-09-04; 1 file, -12/+25)
  llvm-svn: 280629
* Revert r279960 (Igor Breger, 2016-09-04; 5 files, -273/+723)
  https://llvm.org/bugs/show_bug.cgi?id=30249
  llvm-svn: 280625
* EOL fixes (Simon Pilgrim, 2016-09-04; 2 files, -54/+54)
  llvm-svn: 280624
* [InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing memcpy with ld/st (Dorit Nuzman, 2016-09-04; 1 file, -0/+61)
  When InstCombine replaces a memcpy with loads+stores it does not copy over the llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes that.
  Differential Revision: https://reviews.llvm.org/D23499
  llvm-svn: 280617
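  A minimal hypothetical sketch of the IR shape involved in r280617 (illustrative names, not the committed test): a constant-length memcpy tagged with the metadata, which InstCombine may rewrite as a load/store pair; after the fix the tag survives the rewrite.

    define void @copy8(i8* %dst, i8* %src) {
    entry:
      ; InstCombine may turn this into an i64 load + store; the fix keeps
      ; the !llvm.mem.parallel_loop_access attachment on the new instructions.
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 8, i32 8, i1 false), !llvm.mem.parallel_loop_access !0
      ret void
    }

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)

    !0 = distinct !{!0}   ; loop identifier the accesses belong to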
* [PowerPC] Zero-extend constants in FastISel (Hal Finkel, 2016-09-04; 1 file, -3/+3)
  As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which are illegal types promoted to i32 on PowerPC, is a choice constrained by assumptions within the infrastructure. Specifically, the logic in FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI operands will be zero extended, and so, at least when materializing constants that are PHI operands, we must do the same.
  The rest of our fast-isel implementation does not appear to depend on the fact that we were sign-extending i8/i16 constants, and all other targets also appear to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we had been doing this only for i1 constants, and sign-extending the others).
  Fixes PR27721.
  llvm-svn: 280614
* [AVX-512] Remove masked integer add/sub/mull intrinsics and upgrade to native IR (Craig Topper, 2016-09-04; 8 files, -1836/+1838)
  llvm-svn: 280611
* Fix inliner funclet unwind memoization (Joseph Tremoulet, 2016-09-04; 1 file, -4/+224)
  Summary: The inliner may need to determine where a given funclet unwinds to, and this determination may depend on other funclets throughout the funclet tree. The code that performs this walk in getUnwindDestToken memoizes results to avoid redundant computations. In the case that a funclet's unwind destination is derived from its ancestor, there's code to walk back down the tree from the ancestor updating the memo map of its descendants to record the unwind destination. This change fixes that code to account for the case that some descendant has a different unwind destination, which can happen if that unwind dest is a descendant of the EHPad being queried and thus didn't determine its unwind destination.
  Also update test inline-funclets.ll, which is supposed to cover such scenarios, to include a case that fails an assertion without this fix but passes with it.
  Fixes PR29151.
  Reviewers: majnemer
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D24117
  llvm-svn: 280610
* [Profile] Preserve branch metadata when lowering select in CGP (Xinliang David Li, 2016-09-03; 2 files, -5/+21)
  CGP currently drops the select's MD_prof profile data when generating a conditional branch, which can lead to bad code layout. The patch fixes the issue.
  Differential Revision: http://reviews.llvm.org/D24169
  llvm-svn: 280600
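  A hypothetical illustration of the r280600 fix (not the committed test): a select carrying !prof branch weights that CGP may expand into control flow; the weights below should end up on the generated conditional branch instead of being dropped.

    define i32 @pick(i1 %c, i32 %a, i32 %b) {
    entry:
      ; If CGP lowers this select to a branch, the 90/10 weights should
      ; be transferred to the new br instruction.
      %v = select i1 %c, i32 %a, i32 %b, !prof !0
      ret i32 %v
    }

    !0 = !{!"branch_weights", i32 90, i32 10}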
* Fix ThinLTO crash with debug info (Mehdi Amini, 2016-09-03; 2 files, -0/+79)
  Because of the recent change about ODR type uniquing in the context, we can reach types defined in another module during IR linking. This triggered some assertions in case we IR link without starting from an empty module. To alleviate that, we can self-map metadata defined in the destination module so that they won't be visited.
  Differential Revision: https://reviews.llvm.org/D23841
  llvm-svn: 280599
* [AVX-512] Add integer ADD/SUB instructions to load folding tables. Add an AVX512 stack folding test (Craig Topper, 2016-09-03; 2 files, -0/+454)
  llvm-svn: 280593
* AMDGPU: Reduce the duration of whole-quad-mode (Nicolai Haehnle, 2016-09-03; 1 file, -28/+37)
  Summary: This contains two changes that reduce the time spent in WQM, with the intention of reducing bandwidth required by VMEM loads:
  1. Sampling instructions by themselves don't need to run in WQM, only their coordinate inputs need it (unless of course there is a dependent sampling instruction). The initial scanInstructions step is modified accordingly.
  2. When switching back from WQM to Exact, switch back as soon as possible. This affects the logic in processBlock.
  This should always be a win or at best neutral. There are also some cleanups (e.g. remove unused ExecExports) and some new debugging output.
  Reviewers: arsenm, tstellarAMD, mareko
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: http://reviews.llvm.org/D22092
  llvm-svn: 280590
* AMDGPU: Fix an interaction between WQM and polygon stippling (Nicolai Haehnle, 2016-09-03; 1 file, -4/+45)
  Summary: This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.
  The underlying problem is as follows: the prolog part contains the polygon stippling sequence, i.e. a kill. The main part then enables WQM based on the _reduced_ exec mask, effectively undoing most of the polygon stippling.
  Since we cannot know whether polygon stippling will be used, the main part of a non-monolithic shader must always return to exact mode to fix this problem.
  Reviewers: arsenm, tstellarAMD, mareko
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: https://reviews.llvm.org/D23131
  llvm-svn: 280589
* AMDGPU: Do basic folding of class intrinsic (Matt Arsenault, 2016-09-03; 1 file, -0/+237)
  This allows more of the OCML builtin library to be constant folded.
  llvm-svn: 280586
* AMDGPU: Fix spilling of m0 (Matt Arsenault, 2016-09-03; 2 files, -35/+78)
  readlane/writelane do not support using m0 as the output/input. Constrain the register class of spill vregs to try to avoid this, but also handle spilling of the physreg when necessary by inserting an additional copy to a normal SGPR.
  llvm-svn: 280584
* [AVX-512] Add EVEX encoded VPCMPEQ and VPCMPGT to the load folding tables (Craig Topper, 2016-09-03; 1 file, -0/+48)
  llvm-svn: 280581
* Revert r280549 (Nico Weber, 2016-09-03; 1 file, -482/+482)
  The test it added doesn't pass:
  http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/15318/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Apdbdump-yaml-types.test
  Command Output (stdout):
  --
  $ "D:/buildslave/clang-x64-ninja-win7/stage1/./bin\llvm-pdbdump.EXE" "pdb2yaml" "-tpi-stream" "D:\buildslave\clang-x64-ninja-win7\llvm\test\DebugInfo\PDB/Inputs/empty.pdb"
  $ "D:/buildslave/clang-x64-ninja-win7/stage1/./bin\FileCheck.EXE" "-check-prefix=YAML" "D:\buildslave\clang-x64-ninja-win7\llvm\test\DebugInfo\PDB\pdbdump-yaml-types.test"
  # command stderr:
  D:\buildslave\clang-x64-ninja-win7\llvm\test\DebugInfo\PDB\pdbdump-yaml-types.test:36:7: error: expected string not found in input
  YAML: Name: apartment
  ^
  <stdin>:153:10: note: scanning from here
  Value: 161
  ^
  llvm-svn: 280577
* [PowerPC] Support asm parsing for bc[l][a][+-] mnemonics (Hal Finkel, 2016-09-03; 1 file, -0/+41)
  PowerPC assembly code in the wild, so it seems, has things like this:
    bc+ 12, 28, .L9
  This is a bit odd because the '+' here becomes part of the BO field, and the BO field is otherwise the first operand. Nevertheless, the ISA specification does clearly say that the +- hint syntax applies to all conditional-branch mnemonics (that test either CTR or a condition register, although not the forms which check both), both basic and extended, so this is supposed to be valid.
  This introduces some asm-parser-only definitions which take only the upper three bits from the specified BO value, and the lower two bits are implied by the +- suffix (via some associated aliases).
  Fixes PR23646.
  llvm-svn: 280571
* Fix buildbot error (Wei Mi, 2016-09-03; 1 file, -4/+1)
  Add -mtriple=x86_64-unknown-linux-gnu for the test and move it to CodeGen/X86.
  llvm-svn: 280568
* [PowerPC] Add asm parser/disassembler support for hrfid, nap, slbmfev (Hal Finkel, 2016-09-02; 2 files, -0/+21)
  These few book-III instructions are used by the Linux kernel.
  Partially fixes PR24796.
  llvm-svn: 280560
* [PowerPC] Add support for the extended dcbf form and mnemonics (Hal Finkel, 2016-09-02; 2 files, -2/+18)
  dcbf has an optional hint-like field; add support for the extended form and the associated mnemonics (dcbfl and dcbflp).
  Partially fixes PR24796.
  llvm-svn: 280559
* [codeview] Make FieldList records print as a yaml sequence (Zachary Turner, 2016-09-02; 1 file, -482/+482)
  Before, we were kind of imitating the behavior of a yaml sequence by outputting each record one after the other. This makes it a little cumbersome when we want to go the other direction -- from yaml to PDB. So this treats FieldList records as no different from any other list of records, by printing them as a yaml sequence with the exact same format.
  llvm-svn: 280549
* [Profile] Handle select instruction in 'expect' lowering (Xinliang David Li, 2016-09-02; 1 file, -0/+10)
  Builtin expect lowering currently ignores select. This patch fixes the issue.
  Differential Revision: http://reviews.llvm.org/D24166
  llvm-svn: 280547
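  A hypothetical reduced shape for r280547 (illustrative names, not the committed test): the expect intrinsic feeding a select through a comparison, so lowering should turn the hint into profile metadata on the select instead of silently discarding it.

    declare i64 @llvm.expect.i64(i64, i64)

    define i32 @sel(i64 %v, i32 %a, i32 %b) {
    entry:
      ; The hint says %v is expected to be 1, i.e. the select usually picks %a.
      %e = call i64 @llvm.expect.i64(i64 %v, i64 1)
      %c = icmp eq i64 %e, 1
      %r = select i1 %c, i32 %a, i32 %b
      ret i32 %r
    }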
* [PowerPC] For larger offsets, when possible, fold offset into addis toc@ha (Hal Finkel, 2016-09-02; 1 file, -4/+2)
  When we have an offset into a global, etc. that is accessed relative to the TOC base pointer, and the offset is larger than the minimum alignment of the global itself and the TOC base pointer (which is 8-byte aligned), we can still fold the @toc@ha into the memory access, but we must update the addis instruction's symbol reference with the offset as the symbol addend.
  When there is only one use of the addi to be folded and only one use of the addis that would need its symbol's offset adjusted, then we can make the adjustment and fold the @toc@l into the memory access.
  llvm-svn: 280545
* AMDGPU/R600: EXTRACT_VECT_ELT should only bypass BUILD_VECTOR if the vectors have the same number of elements (Jan Vesely, 2016-09-02; 3 files, -0/+153)
  Fixes R600 piglit regressions since r280298.
  Differential Revision: https://reviews.llvm.org/D24174
  llvm-svn: 280535
* Do not consider subreg defs as reads when computing subrange liveness (Krzysztof Parzyszek, 2016-09-02; 4 files, -0/+307)
  Subregister definitions are considered uses for the purpose of tracking liveness of the whole register. At the same time, when calculating live interval subranges, subregister defs should not be treated as uses.
  Differential Revision: https://reviews.llvm.org/D24190
  llvm-svn: 280532
* [InstCombine] auto-generate assertions for tighter checking (Sanjay Patel, 2016-09-02; 1 file, -60/+95)
  llvm-svn: 280531
* AMDGPU/R600: Expand unaligned writes to local and global AS (Jan Vesely, 2016-09-02; 2 files, -6/+178)
  LOCAL and GLOBAL AS only; PRIVATE needs special treatment.
  Differential Revision: https://reviews.llvm.org/D23971
  llvm-svn: 280526
* AMDGPU: Reorganize store tests (Jan Vesely, 2016-09-02; 4 files, -188/+177)
  Split by AS. Merge with some previously failing tests.
  Differential Revision: https://reviews.llvm.org/D23969
  llvm-svn: 280523
* [codeview] Use the correct max CV record length of 0xFF00 (Reid Kleckner, 2016-09-02; 1 file, -8/+8)
  Previously we were splitting our records at 0xFFFF bytes, which the Microsoft tools don't like. Should fix failure on the new Windows self-host buildbot.
  This length appears in microsoft-pdb/PDB/dbi/dbiimpl.h
  llvm-svn: 280522
* IfConversion: Fix bug introduced by rescanning diamonds (Kyle Butt, 2016-09-02; 1 file, -0/+66)
  Passing the wrong values for predicate-clobbering. Simple to miss. Added an assert to make this easier to catch in the future.
  llvm-svn: 280517
* Split the store of a wide value merged from an int-fp pair into multiple stores (Wei Mi, 2016-09-02; 1 file, -0/+62)
  For the store of a wide value merged from a pair of values, especially an int-fp pair, sometimes it is more efficient to split it into separate narrow stores, which can remove the bitwise instructions or sink them to colder places.
  For now the feature is only enabled on the x86 target, and only the store of an int-fp pair is split. It is possible that the application scope gets extended with perf evidence support in the future.
  Differential Revision: https://reviews.llvm.org/D22840
  llvm-svn: 280505
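  A hypothetical example of the pattern targeted by r280505 (names and layout are illustrative, not the committed test): a float and an i32 packed into a single i64 store; splitting it into two 32-bit stores removes the zext/shl/or bit-twiddling.

    define void @pack(i64* %p, float %f, i32 %i) {
    entry:
      %fbits  = bitcast float %f to i32
      %lo     = zext i32 %i to i64
      %hi.ext = zext i32 %fbits to i64
      %hi     = shl i64 %hi.ext, 32
      %merged = or i64 %hi, %lo
      ; Candidate for splitting into two 32-bit stores at %p and %p + 4.
      store i64 %merged, i64* %p, align 8
      ret void
    }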
* [InstCombine] fold insertelement of constant into shuffle with constant operand (PR29126) (Sanjay Patel, 2016-09-02; 1 file, -7/+4)
  The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector.
  Note that the transform is intentionally limited to shuffles that are equivalent to vector selects to avoid creating arbitrary shuffle masks that may not lower well.
  This should solve PR29126: https://llvm.org/bugs/show_bug.cgi?id=29126
  Differential Revision: https://reviews.llvm.org/D23886
  llvm-svn: 280504
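  A hypothetical before/after sketch of the r280504 fold (not the committed test): the shuffle is already a vector select between %x and a constant vector, so an insertelement of a constant can be absorbed into that constant operand.

    define <4 x float> @ins(<4 x float> %x) {
      ; Before: a select-like shuffle (lane 0 comes from the constant vector),
      ; followed by inserting the constant 5.0 into lane 1.
      %s = shufflevector <4 x float> %x,
                         <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>,
                         <4 x i32> <i32 4, i32 1, i32 2, i32 3>
      %r = insertelement <4 x float> %s, float 5.0, i32 1
      ; After the fold, this becomes a single select-like shuffle with mask
      ; <4, 5, 2, 3> whose constant operand already holds 5.0 in lane 1.
      ret <4 x float> %r
    }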
* [LV] Ensure reverse interleaved group GEPs remain uniform (Matthew Simpson, 2016-09-02; 1 file, -2/+8)
  For uniform instructions, we're only required to generate a scalar value for the first vector lane of each unroll iteration. Thus, if we have a reverse interleaved group, computing the member index off the scalar GEP corresponding to the last vector lane of its pointer operand technically makes the GEP non-uniform. We should compute the member index off the first scalar GEP instead.
  I've added the updated member index computation to the existing reverse interleaved group test.
  llvm-svn: 280497
* [instsimplify] Fix incorrect folding of an ordered fcmp with a vector of all NaN (Andrea Di Biagio, 2016-09-02; 1 file, -0/+8)
  This patch fixes a crash caused by an incorrect folding of an ordered comparison between a packed floating point vector and a splat vector of NaN.
  An ordered comparison between a vector and a constant vector of NaN should always be folded into a constant vector where each element is i1 false.
  Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar 'false'. Later on, this would cause an assertion failure, since the value type of the folded value doesn't match the expected value type of the uses of the original instruction: "Assertion failed: New->getType() == getType() && "replaceAllUses of value with new value of different type!".
  This patch fixes the issue and adds a test case to the already existing test InstSimplify/floating-point-compares.ll.
  Differential Revision: https://reviews.llvm.org/D24143
  llvm-svn: 280488
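  A hypothetical reduced case of the r280488 bug: an ordered compare against a NaN splat. The correct fold is an all-false <2 x i1> vector, not a scalar i1 false of mismatched type.

    define <2 x i1> @ord_with_nan(<2 x double> %x) {
      ; 0x7FF8000000000000 is a quiet NaN; 'ord' is false whenever either
      ; operand is NaN, so every lane folds to false.
      %c = fcmp ord <2 x double> %x, <double 0x7FF8000000000000, double 0x7FF8000000000000>
      ret <2 x i1> %c
    }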
* [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift (Andrea Di Biagio, 2016-09-02; 1 file, -0/+139)
  This fixes a regression introduced by revision 268094.
  Revision 268094 added the following dag combine rule:
    // trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2
  That rule converts a truncate of a shift-by-constant into a shift of a truncated value. We do this only if the shift count is less than half the size in bits of the truncated value (K < vt.size / 2).
  The problem is that the constraint on the shift count is incorrect, so the rule doesn't work well in some cases involving vector types. The combine rule should have been written instead like this:
    // trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()
  Basically, if K is smaller than the "scalar size in bits" of the truncated value then we know that by "sinking" the truncate into the operand of the shift we would never accidentally make the shift undefined.
  This patch fixes the check on the shift count, and adds test cases to make sure that we don't regress the behavior.
  Differential Revision: https://reviews.llvm.org/D24154
  llvm-svn: 280482
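  A hypothetical vector case showing why the old bound was wrong (the combine runs on the SelectionDAG; the IR below only illustrates the shape): K = 40 is below half the 128-bit width of the <4 x i32> result, so the old rule would sink the truncate and create a 40-bit shift of 32-bit elements; the corrected check compares K against the 32-bit scalar width and rejects the transform.

    define <4 x i32> @trunc_shl(<4 x i64> %x) {
      ; A shift by 40 is fine on i64 lanes, but would be out of range
      ; on i32 lanes if the truncate were sunk below the shift.
      %shl = shl <4 x i64> %x, <i64 40, i64 40, i64 40, i64 40>
      %t   = trunc <4 x i64> %shl to <4 x i32>
      ret <4 x i32> %t
    }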
* [InstCombine] Add test for insertelement instructions with constants (Alexey Bataev, 2016-09-02; 1 file, -0/+77)
  Added a test that shows that several insertelement instructions with constant indexes/data are not folded into a single shuffle instruction.
  llvm-svn: 280474
* [llvm-readobj] Teach readobj to print the DT_AUXILIARY dynamic tag in human readable form (George Rimar, 2016-09-02; 3 files, -0/+38)
  Previously DT_AUXILIARY was unknown; this patch fixes that.
  Differential revision: https://reviews.llvm.org/D24138
  llvm-svn: 280471
* [SimplifyCFG] Add a workaround to fix PR30188 (James Molloy, 2016-09-02; 1 file, -0/+23)
  We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.
  The real fix is in SROA, which I'll be looking into.
  llvm-svn: 280470
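  A hypothetical shape of the regression behind the r280470 workaround (illustrative, not the PR30188 reproducer): both arms store the same value to different allocas; sinking the two stores produces a single store through a select of the addresses, which Mem2Reg/SROA cannot promote.

    define void @two_stores(i1 %c, i32 %v) {
    entry:
      %a = alloca i32
      %b = alloca i32
      br i1 %c, label %then, label %else
    then:
      ; Sinking would replace these two stores with one store whose
      ; address is 'select i1 %c, i32* %a, i32* %b'.
      store i32 %v, i32* %a
      br label %end
    else:
      store i32 %v, i32* %b
      br label %end
    end:
      ret void
    }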
* [AVX-512] Move tests for masked floating point logical operations to avx512dqvl-intrinsics-upgrade.ll since they have now been autoupgraded (Craig Topper, 2016-09-02; 2 files, -1248/+1251)
  llvm-svn: 280467
* [AVX-512] Add more patterns for masked and broadcasted logical operations where the select or broadcast has a floating point type (Craig Topper, 2016-09-02; 2 files, -26/+13)
  These are needed in order to remove the masked floating point logical operation intrinsics and use native IR.
  llvm-svn: 280465
* [AVX-512] Add execution domain fixing for logical operations with broadcast loads (Craig Topper, 2016-09-02; 2 files, -16/+66)
  This builds on the handling of masked ops since we need to keep the element size the same.
  llvm-svn: 280464
* [PowerPC] hasAndNotCompare should return true (Hal Finkel, 2016-09-02; 1 file, -20/+25)
  As Sanjay suggested when he added the hook, PPC should return true from hasAndNotCompare. We have an efficient negated 'and' on PPC (which can feed a compare).
  Fixes PR27203.
  llvm-svn: 280457
* [PowerPC] Add a pattern for a runtime bit check (Hal Finkel, 2016-09-02; 1 file, -0/+54)
  Following a suggestion by Sanjay, we should lower:
    %shl = shl i32 1, %y
    %and = and i32 %x, %shl
    %cmp = icmp eq i32 %and, %shl
    ret i1 %cmp
  into:
    subfic r4, r4, 32
    rlwnm r3, r3, r4, 31, 31
  Add this pattern and some associated patterns for the 64-bit case and the not-equal case.
  Fixes PR27356.
  llvm-svn: 280454
* llvm/test/Transforms/GCOVProfiling/three-element-mdnode.ll: Use %/T instead of %T, not to emit backslashes (NAKAMURA Takumi, 2016-09-02; 1 file, -1/+1)
  llvm-svn: 280451
* [PowerPC] Don't apply the PPC64 address-formation peephole for offsets greater than 7 (Hal Finkel, 2016-09-02; 1 file, -3/+5)
  When applying our address-formation PPC64 peephole, we are reusing the @ha TOC addis value with the low parts associated with different offsets (i.e. different effective symbol addends). We were assuming this was okay so long as the offsets were less than the alignment of the global variable being accessed. This ignored the fact, however, that the TOC base pointer itself need only be 8-byte aligned. As a result, what we were doing is legal only for offsets less than 8, regardless of the alignment of the object being accessed.
  Fixes PR28727.
  llvm-svn: 280441
* [PowerPC] Don't consider fusion in PPC64 address-formation peephole (Hal Finkel, 2016-09-02; 1 file, -162/+80)
  The logic in this function assumes that the P8 supports fusion of addis/addi, but it does not. As a result, there is no advantage to restricting our peephole application, merging addi instructions into dependent memory accesses, even when the addi has multiple users, regardless of whether or not we're optimizing for size.
  We might need something like this again for the P9; I suspect we'll revisit this code when we work on P9 tuning.
  llvm-svn: 280440