bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	AMDGPU: Add SIWholeQuadMode pass	Nicolai Haehnle	2016-03-21	9	-15/+515
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 llvm-svn: 263982
*	[Hexagon] Add handling fixups and instruction relaxation	Krzysztof Parzyszek	2016-03-21	1	-112/+451
\| \| \| \|	llvm-svn: 263981
*	[Hexagon] Properly encode registers in duplex instructions	Krzysztof Parzyszek	2016-03-21	3	-6/+126
\| \| \| \|	llvm-svn: 263980
*	[Hexagon] Fix reserving emergency spill slots for register scavenger	Krzysztof Parzyszek	2016-03-21	3	-35/+11
\| \| \| \| \| \| \|	- R10 and R11 are not reserved registers. - Check for reserved registers when finding unused caller-saved registers. llvm-svn: 263977
*	[WebAssembly] Implement the eqz instructions.	Dan Gohman	2016-03-21	1	-0/+7
\| \| \| \|	llvm-svn: 263976
*	AMDGPU/SI: Fix threshold calculation for branching when exec is zero	Tom Stellard	2016-03-21	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When control flow is implemented using the exec mask, the compiler will insert branch instructions to skip over the masked section when exec is zero if the section contains more than a certain number of instructions. The previous code would only count instructions in successor blocks, and this patch modifies the code to start counting instructions in all blocks between the start and end of the branch. Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18282 llvm-svn: 263969
*	[AArch64] Add a helpful assert. NFC.	Chad Rosier	2016-03-21	1	-0/+1
\| \| \| \|	llvm-svn: 263965
*	AMDGPU: Remove SignBitIsZero for mubuf scratch offsets	Matt Arsenault	2016-03-21	1	-1/+1
\| \| \| \| \| \| \|	These instructions do not have the same negative base address problem that DS instructions do on SI. llvm-svn: 263964
*	ARM: Better codegen for 64-bit compares.	Peter Collingbourne	2016-03-21	2	-0/+86
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This introduces a custom lowering for ISD::SETCCE (introduced in r253572) that allows us to emit a short code sequence for 64-bit compares. Before: push {r7, lr} cmp r0, r2 mov.w r0, #0 mov.w r12, #0 it hs movhs r0, #1 cmp r1, r3 it ge movge.w r12, #1 it eq moveq r12, r0 cmp.w r12, #0 bne .LBB1_2 @ BB#1: @ %bb1 bl f pop {r7, pc} .LBB1_2: @ %bb2 bl g pop {r7, pc} After: push {r7, lr} subs r0, r0, r2 sbcs.w r0, r1, r3 bge .LBB1_2 @ BB#1: @ %bb1 bl f pop {r7, pc} .LBB1_2: @ %bb2 bl g pop {r7, pc} Saves around 80KB in Chromium's libchrome.so. Some notes on this patch: - I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I introduced (nothing else needs them). However, they are necessary in order to avoid poor codegen, and they seem similar to existing combines in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to (brcond Compare)). - No support for Thumb-1. This is in principle possible, but we'd need to implement ARMISD::SUBE for Thumb-1. Differential Revision: http://reviews.llvm.org/D15256 llvm-svn: 263962
*	[ARM] Add Cortex-A32 support	Renato Golin	2016-03-21	2	-2/+10
\| \| \| \| \| \| \| \|	Adding Cortex-A32 as an available target in the ARM backend. Patch by Sam Parker. llvm-svn: 263956
*	AMDGPU: Add frexp_mant intrinsic	Matt Arsenault	2016-03-21	1	-2/+2
\| \| \| \|	llvm-svn: 263948
*	[AArch64] Fix a -Wdocumentation warning. NFC.	Chad Rosier	2016-03-21	1	-2/+2
\| \| \| \|	llvm-svn: 263942
*	[NVPTX] Adds a new address space inference pass.	Jingyue Wu	2016-03-20	5	-9/+609
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable to convert the address space of a pointer induction variable. This patch adds a new pass called NVPTXInferAddressSpaces that overcomes that limitation using a fixed-point data-flow analysis (see the file header comments for details). The new pass is experimental and not enabled by default. Users can turn it on by setting the -nvptx-use-infer-addrspace flag of llc. Reviewers: jholewinski, tra, jlebar Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D17965 llvm-svn: 263916
*	[X86][SSE] Tidyup setTargetShuffleZeroElements to match computeZeroableShuffleElements	Simon Pilgrim	2016-03-20	1	-4/+4
\| \| \| \| \| \| \| \|	Based on feedback for D14261 llvm-svn: 263911
*	[X86][SSE] Detect zeroable shuffle elements from different value types	Simon Pilgrim	2016-03-20	1	-8/+42
\| \| \| \| \| \| \| \|	Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes of different element sizes to the shuffle mask. Differential Revision: http://reviews.llvm.org/D14261 llvm-svn: 263906
*	AVX512BW: Enable v32i1/v64i1 BUILD_VECTOR	Igor Breger	2016-03-20	1	-0/+2
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D18211 llvm-svn: 263898
*	Use a range-based for loop. NFC.	Michael Kuperstein	2016-03-20	1	-4/+4
\| \| \| \|	llvm-svn: 263889
*	[CXX_FAST_TLS] Fix issues in ARM.	Manman Ren	2016-03-18	1	-2/+3
\| \| \| \| \| \| \| \| \|	We need to be careful on which registers can be explicitly handled via copies. Prologue, Epilogue use physical registers and if one belongs to the set of CSRsViaCopy, it will no longer be CSRed, since PEI overwrites it after the explicit copies. llvm-svn: 263857
*	[CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.	Manman Ren	2016-03-18	3	-0/+21
\| \| \| \| \| \| \|	Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller and callee have mismatched calling conventions. llvm-svn: 263856
*	[CXX_FAST_TLS] fix issues with O0 on ARM, AArch64 and X86.	Manman Ren	2016-03-18	2	-0/+2
\| \| \| \| \| \| \|	Since at O0, explicit copies via SplitCSR may not be removed even if they are unnecessary, we choose not to use SplitCSR at O0. llvm-svn: 263855
*	AArch64: Don't modify other modules in AArch64PromoteConstant	Duncan P. N. Exon Smith	2016-03-18	1	-148/+177
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid modifying other modules in `AArch64PromoteConstant` when the constant is `ConstantData` (a horrible accident, I'm sure, caught by an experimental follow-up to r261464). Previously, this walked through all the users of a constant, but that reaches into other modules when the constant doesn't depend transitively on a `GlobalValue`! Since we're walking instructions anyway, just modify the instructions we actually see. As a drive-by, instead of storing `Use` and getting the instructions again via `Use::getUser()` (which is not a constant time lookup), store `std::pair<Instruction, unsigned>`. Besides being cheaper, this makes it easier to drop use-lists from `ConstantData` in the future. (I threw this in because I was touching all the code anyway.) Because the patch completely changes the traversal logic, it looks like a rewrite of the pass, but the core logic is all the same (or should be, minus the out-of-module changes). In other words, there should be NFC as long as the LLVMContext only has a single Module. I didn't think of a good way to test this, but I hope to submit a patch eventually that makes walking these use-lists illegal/impossible. llvm-svn: 263853
*	BPF: emit an error message for unsupported signed division operation	Alexei Starovoitov	2016-03-18	1	-0/+12
\| \| \| \| \| \|	Signed-off-by: Yonghong Song <yhs@plumgrid.com> Signed-off-by: Alexei Starovoitov <ast@fb.com> llvm-svn: 263842
*	AMDGPU: add missing braces around multi-line if block	Nicolai Haehnle	2016-03-18	1	-1/+2
\| \| \| \| \| \|	This fixes an issue with rL263658 pointed out by Tom Stellard. llvm-svn: 263823
*	[AArch64] Enable more load clustering in the MI Scheduler.	Chad Rosier	2016-03-18	3	-36/+116
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds unscaled loads and sign-extend loads to the TII getMemOpBaseRegImmOfs API, which is used to control clustering in the MI scheduler. This is done to create more opportunities for load pairing. I've also added the scaled LDRSWui instruction, which was missing from the scaled instructions. Finally, I've added support in shouldClusterLoads for clustering adjacent sext and zext loads that too can be paired by the load/store optimizer. Differential Revision: http://reviews.llvm.org/D18048 llvm-svn: 263819
*	AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format	Nicolai Haehnle	2016-03-18	1	-36/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before the frontend patches land in Mesa. Eventually, we may want to automatically reduce the size of loads at the LLVM IR level, which requires such overloads, and in some cases Mesa can generate them directly. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18255 llvm-svn: 263792
*	AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics	Nicolai Haehnle	2016-03-18	3	-2/+187
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 llvm-svn: 263791
*	AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format	Nicolai Haehnle	2016-03-18	3	-13/+110
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend cannot easily make use of an explicit soffset parameter either. Furthermore, it is likely that in the future, LLVM will be in a better position than the frontend to choose an SGPR offset if possible. Since there aren't any frontend uses of these intrinsics in upstream repositories yet, I would like to take this opportunity to change the intrinsic signatures to a single offset parameter, which is then selected to immediate offsets or voffsets using a ComplexPattern. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18218 llvm-svn: 263790
*	[AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3	Sam Kolton	2016-03-18	2	-50/+95
\| \| \| \| \|	Review: http://reviews.llvm.org/D18267 llvm-svn: 263789
*	adding another optimization opportunity to readme file	Ehsan Amiri	2016-03-18	1	-0/+11
\| \| \| \|	llvm-svn: 263775
*	[LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead	Adam Nemet	2016-03-18	2	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It can hurt performance to prefetch ahead too much. Be conservative for now and don't prefetch ahead more than 3 iterations on Cyclone. Reviewers: hfinkel Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17949 llvm-svn: 263772
*	[LoopDataPrefetch/Aarch64] Allow selective prefetching of large-strided accesses	Adam Nemet	2016-03-18	2	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: And use this TTI for Cyclone. As it was explained in the original RFC (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW prefetcher works up to 2KB strides. I am also adding tests for this and the previous change (D17943): * Cyclone prefetching accesses with a large stride * Cyclone not prefetching accesses with a small stride * Generic Aarch64 subtarget not prefetching either Reviewers: hfinkel Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17945 llvm-svn: 263771
*	[Aarch64] Add pass LoopDataPrefetch for Cyclone	Adam Nemet	2016-03-18	3	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This wires up the pass for Cyclone but keeps it off for now because we need a few more TTIs. The getPrefetchMinStride value is not very well tuned right now but it works well with CFP2006/433.milc which motivated this. Tests will be added as part of the upcoming large-stride prefetching patch. Reviewers: t.p.northover Subscribers: llvm-commits, aemerson, hfinkel, rengolin Differential Revision: http://reviews.llvm.org/D17943 llvm-svn: 263770
*	[PPC, FastISel] Fix ordered/unordered fcmp	Tim Shen	2016-03-17	1	-7/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For fcmp, the major concern about the following 6 cases is the NaN result. The comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered), only one of which will be set. The result is generated by the fcmpu instruction. However, the bc instruction only inspects one of the first 3 bits, so when un is set, the bc instruction may jump to an undesired place. More specifically, if we expect an unordered comparison and un is set, we expect to always go to the true branch; in such a case UEQ, UGT and ULT still give false, which is undesired; but UNE, UGE, ULE happen to give true, since they are tested by inspecting !eq, !lt, !gt, respectively. Similarly, for ordered comparison, when un is set, we always expect the result to be false. In such a case OGT, OLT and OEQ are good, since they are actually testing GT, LT, and EQ respectively, which are false. OGE, OLE and ONE are tested through !lt, !gt and !eq, and these are true. llvm-svn: 263753
*	ARM: stop asserting on weird <3 x Ty> vectors in ISelLowering.	Tim Northover	2016-03-17	1	-2/+3
\| \| \| \|	llvm-svn: 263741
*	[PowerPC] Disable CTR loops optimization for soft float operations	Petar Jovanovic	2016-03-17	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \|	This patch prevents CTR loops optimization when using soft float operations inside loop body. Soft float operations use function calls, but function calls are not allowed inside CTR optimized loops. Patch by Aleksandar Beserminji. Differential Revision: http://reviews.llvm.org/D17600 llvm-svn: 263727
*	[WebAssembly] Stackify code emitted by eliminateFrameIndex and SP writeback	Derek Schuff	2016-03-17	2	-19/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: MRI::eliminateFrameIndex can emit several instructions to do address calculations; these can usually be stackified. Because instructions with FI operands can have subsequent operands which may be expression trees, find the top of the leftmost tree and insert the code before it, to keep the LIFO property. Also use stackified registers when writing back the SP value to memory in the epilog; it's unnecessary because SP will not be used after the epilog, and it results in better code. Differential Revision: http://reviews.llvm.org/D18234 llvm-svn: 263725
*	AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute	Changpeng Fang	2016-03-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed. Reviewers: arsenm, tstellarAMD Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18197 llvm-svn: 263720
*	AMDGPU: mark atomic instructions as sources of divergence	Nicolai Haehnle	2016-03-17	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As explained by the comment, threads will typically see different values returned by atomic instructions even if the arguments are equal. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18156 llvm-svn: 263719
*	[X86][SSE] Simplified blend-with-zero combining	Simon Pilgrim	2016-03-17	1	-14/+13
\| \| \| \| \| \| \| \|	We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in an endless loop of contrasting combines. This patch stops the combine if we already have a blend in place (which means we miss some domain corrections). llvm-svn: 263717
*	ARM: Revert SVN r253865, 254158, fix windows division	Saleem Abdulrasool	2016-03-17	1	-7/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The two changes together weakened the test and caused a regression with division handling in MSVC mode. They were applied to avoid an assertion being triggered in the block frequency analysis. However, the underlying problem was simply being masked rather than solved properly. Address the actual underlying problem and revert the changes. Rather than analyze the cause of the assertion, the division failure was assumed to be an overflow. The underlying issue was a subtle bug in the BB construction in the emission of the div-by-zero check (WIN__DBZCHK). We did not construct the proper successor information in the basic blocks, nor did we update the PHIs associated with the basic block when we split them. This would result in assertions being triggered in the block frequency analysis pass. Although the original tests are being removed, the tests themselves performed very little in terms of validation but merely tested that we did not assert when generating code. Update this with new tests that actually ensure that we do not regress on the code generation. llvm-svn: 263714
*	[mips] Use `formatImm` call to print immediate value in the `MipsInstPrinter`	Simon Atanasyan	2016-03-17	1	-2/+2
\| \| \| \| \| \| \| \| \|	That allows, for example, to print hex-formatted immediates using llvm-objdump --print-imm-hex command line option. Differential Revision: http://reviews.llvm.org/D18195 llvm-svn: 263704
*	[mips] Eliminate instances of "potentially uninitialised local variable" warnings, NFC	Scott Egerton	2016-03-17	1	-16/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This should eliminate all occurrences of this within LLVMMipsAsmParser. This patch is in response to http://reviews.llvm.org/D17983. I was unable to reproduce the warnings on my machine so please advise if this fixes the warnings. Reviewers: ariccio, vkalintiris, dsanders Subscribers: dblaikie, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18087 llvm-svn: 263703
*	Tweak some atomics functions in preparation for larger changes; NFC.	James Y Knight	2016-03-16	11	-15/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both '__sync' libcalls and '__atomic' libcalls, and this function is for the '__sync' ones. - getInsertFencesForAtomic() has been replaced with shouldInsertFencesForAtomic(Instruction), so that the decision can be made per-instruction. This functionality will be used soon. - emitLeadingFence/emitTrailingFence are no longer called if shouldInsertFencesForAtomic returns false, and thus don't need to check the condition themselves. llvm-svn: 263665
*	AMDGPU: Prevent uniform loops from becoming infinite	Nicolai Haehnle	2016-03-16	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Uniform loops where the branch leaving the loop is predicated on VCCNZ must be skipped if EXEC = 0, otherwise they will be infinite. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18137 llvm-svn: 263658
*	[Hexagon] Adding missing break in switch statement. Extra operands would have been appended to the end.	Colin LeMahieu	2016-03-16	1	-0/+1
\| \| \| \| \| \|	llvm-svn: 263657
*	fix function names; NFC	Sanjay Patel	2016-03-16	1	-58/+60
\| \| \| \|	llvm-svn: 263646
*	AMDGPU: Verify instructions in non-debug builds as well	Michel Danzer	2016-03-16	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	And emit an error if it fails. This prevents illegal instructions from getting sent to the GPU, which would potentially result in a hang. This is a candidate for the stable branch(es). Reviewed-by: Marek Olšák <marek.olsak@amd.com> llvm-svn: 263627
*	AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormat	Michel Danzer	2016-03-16	1	-3/+3
\| \| \| \| \|	Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 263626
*	AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for 512bit vector because PCMPGT supported only for 128/256bit.	Igor Breger	2016-03-16	1	-0/+5
\| \| \| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D18204 llvm-svn: 263624
*	[MC] Rename TLSDESC as it's not ARM specific.	Davide Italiano	2016-03-15	1	-1/+1
\| \| \| \| \| \|	Similarly to what was done for TLSCALL in r263515. llvm-svn: 263564