bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AMDGPU][llvm-mc] Predefined symbols to access register counts ↵	Artem Tamazov	2016-12-27	1	-7/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(.kernel.{v\|s}gpr_count) The feature allows for conditional assembly, filling the entries of .amd_kernel_code_t etc. Symbols are defined with value 0 at the beginning of each kernel scope. After each register usage, the respective symbol is set to: value = max(value, register index + 1). Thus, at the end of a scope, the value represents the count of used registers. A kernel scope begins at an .amdgpu_hsa_kernel directive and ends at the next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also a dummy scope that spans from the beginning of the source file until the first .amdgpu_hsa_kernel. Test added. Differential Revision: https://reviews.llvm.org/D27859 llvm-svn: 290608
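The update rule in the message above can be modelled in a few lines. This is a minimal Python sketch, not the assembler's implementation: the class `KernelScopeCounts` and its methods are hypothetical names, and only the symbol names `.kernel.vgpr_count` / `.kernel.sgpr_count` and the `max(value, index + 1)` rule come from the commit message.

```python
class KernelScopeCounts:
    """Toy model of the per-scope register-count symbols (hypothetical, not LLVM code)."""

    def __init__(self):
        # The dummy scope before the first .amdgpu_hsa_kernel also starts at 0.
        self.symbols = {".kernel.vgpr_count": 0, ".kernel.sgpr_count": 0}

    def begin_scope(self):
        # A new .amdgpu_hsa_kernel directive opens a scope: both symbols reset to 0.
        for name in self.symbols:
            self.symbols[name] = 0

    def use_register(self, kind, index):
        # kind is "v" or "s"; apply value = max(value, register index + 1).
        name = f".kernel.{kind}gpr_count"
        self.symbols[name] = max(self.symbols[name], index + 1)
```

Under this model, using v3, v1 and s7 inside one scope leaves the VGPR symbol at 4 and the SGPR symbol at 8, since the highest index plus one is what survives.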
*	[AMDGPU] Assembler: support SDWA and DPP for VOP2b instructions	Sam Kolton	2016-12-27	3	-6/+37
\| \| \| \| \| \| \| \| \| \|	Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28051 llvm-svn: 290599
*	AMDGPU: split ret/noret patterns for global atomics	Jan Vesely	2016-12-23	3	-22/+52
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D27989 llvm-svn: 290435
*	Enable '-Wstring-conversion' and fix some bad asserts that it helped	Chandler Carruth	2016-12-23	1	-1/+1
\| \| \| \| \| \| \| \|	find. Notable is the assert in NewGVN which had no effect because of the bug. llvm-svn: 290400
*	AMDGPU: Invert cmp + select with constant	Matt Arsenault	2016-12-22	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \|	Canonicalize a select with a constant to the false side. This enables more instruction shrinking opportunities since an inline immediate can be used for the false side of v_cndmask_b32_e32. This seems to usually be better but causes some code size regressions in some tests. llvm-svn: 290372
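The canonicalization described above can be illustrated with a toy rewrite. This is a sketch under stated assumptions, not the DAG combine itself: it handles only integer-style predicates (inverting floating-point comparisons would also need to account for NaN), constants are modelled as Python ints and non-constants as variable names, and `canonicalize_select` and `evaluate_select` are hypothetical helpers.

```python
import operator

# Each predicate paired with its logical inverse (integer comparisons only).
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}
EVAL = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
        "ge": operator.ge, "gt": operator.gt, "le": operator.le}


def is_constant(operand):
    # In this toy model, ints are constants and strings are variable names.
    return isinstance(operand, int)


def canonicalize_select(pred, true_val, false_val):
    # If only the true side is constant, invert the compare and swap the
    # operands so that the constant ends up on the false side.
    if is_constant(true_val) and not is_constant(false_val):
        return INVERSE[pred], false_val, true_val
    return pred, true_val, false_val


def evaluate_select(pred, a, b, true_val, false_val, env):
    # Evaluate "select (a pred b), true_val, false_val" with variables from env.
    pick = true_val if EVAL[pred](a, b) else false_val
    return env.get(pick, pick)
```

For example, `canonicalize_select("lt", 7, "x")` yields `("ge", "x", 7)`, and both forms select the same value for any pair of compared inputs.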
*	AMDGPU: Use i16 for i16 shift amount	Matt Arsenault	2016-12-22	2	-8/+10
\| \| \| \|	llvm-svn: 290351
*	AMDGPU: Fix missing 16-bit cmpx instructions	Matt Arsenault	2016-12-22	1	-0/+39
\| \| \| \|	llvm-svn: 290349
*	AMDGPU: Use i16 comparison instructions	Matt Arsenault	2016-12-22	2	-5/+43
\| \| \| \|	llvm-svn: 290348
*	AMDGPU: Fixed '!NodePtr->isKnownSentinel()' assert	Matt Arsenault	2016-12-22	1	-17/+4
\| \| \| \| \| \| \| \|	Caused by dereferencing end iterator when trying to const cast the iterator. Patch by Martin Sherburn llvm-svn: 290347
*	[AMDGPU] Add pseudo SDWA instructions	Sam Kolton	2016-12-22	5	-85/+159
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is needed for later SDWA support in CodeGen. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27412 llvm-svn: 290338
*	[AMDGPU] Disassembler: fix for disassembling v_mac_f32/16_dpp/sdwa	Sam Kolton	2016-12-22	4	-5/+26
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Real instruction should copy constraints from real instruction. This allows auto-generated disassembler to correctly process tied operands. Reviewers: nhaustov, vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27847 llvm-svn: 290336
*	AMDGPU: Fix missing commute table entries for cmpx	Matt Arsenault	2016-12-22	1	-4/+4
\| \| \| \| \| \|	No tests because these aren't currently used anywhere. llvm-svn: 290316
*	AMDGPU: Swap order of operands in fadd/fsub combine	Matt Arsenault	2016-12-22	1	-4/+4
\| \| \| \| \| \| \|	FMA is canonicalized to constant in the middle operand. Do the same so fmad matches, avoiding an extra combine step. llvm-svn: 290313
*	AMDGPU: Check fast math flags in fadd/fsub combines	Matt Arsenault	2016-12-22	2	-7/+15
\| \| \| \|	llvm-svn: 290312
*	AMDGPU: Form more FMAs if fusion is allowed	Matt Arsenault	2016-12-22	2	-30/+46
\| \| \| \| \| \| \|	Extend the existing fadd/fsub->fmad combines to produce FMA if allowed. llvm-svn: 290311
*	AMDGPU: Move combines into separate functions	Matt Arsenault	2016-12-22	2	-152/+174
\| \| \| \|	llvm-svn: 290309
*	AMDGPU: Enable some f32 fadd/fsub combines for f16	Matt Arsenault	2016-12-22	1	-7/+12
\| \| \| \|	llvm-svn: 290308
*	AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16	Matt Arsenault	2016-12-22	1	-0/+2
\| \| \| \|	llvm-svn: 290307
*	AMDGPU: Allow rcp and rsq usage with f16	Matt Arsenault	2016-12-22	2	-4/+8
\| \| \| \|	llvm-svn: 290302
*	AMDGPU: Custom lower f16 fdiv	Matt Arsenault	2016-12-22	2	-1/+22
\| \| \| \|	llvm-svn: 290301
*	AMDGPU: Implement f16 fcanonicalize	Matt Arsenault	2016-12-22	3	-0/+9
\| \| \| \|	llvm-svn: 290300
*	AMDGPU: Update isFPImmLegal for f16	Matt Arsenault	2016-12-22	1	-1/+2
\| \| \| \| \| \|	I don't think this matters because ConstantFP is legal. llvm-svn: 290299
*	AMDGPU/SI: Fix file header	Tom Stellard	2016-12-21	1	-1/+1
\| \| \| \|	llvm-svn: 290265
*	[AMDGPU] Garbage collect dead code. NFCI.	Davide Italiano	2016-12-21	1	-15/+0
\| \| \| \|	llvm-svn: 290249
*	AMDGPU: Allow 16-bit types in inline asm constraints	Matt Arsenault	2016-12-20	1	-0/+2
\| \| \| \|	llvm-svn: 290193
*	AMDGPU: Don't add same instruction multiple times to worklist	Matt Arsenault	2016-12-20	1	-1/+7
\| \| \| \| \| \| \| \| \|	When the instruction is processed the first time, it may be deleted resulting in crashes. While the new test adds the same user to the worklist twice, this particular case doesn't crash but I'm not sure why. llvm-svn: 290191
*	AMDGPU/SI: Make a function const	Tom Stellard	2016-12-20	2	-4/+3
\| \| \| \|	llvm-svn: 290185
*	AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.*	Tom Stellard	2016-12-20	6	-6/+77
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle, mareko Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27834 llvm-svn: 290184
*	AMDGPU/SI: Add a MachineMemOperand to MIMG instructions	Tom Stellard	2016-12-20	4	-6/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without a MachineMemOperand, the scheduler was assuming MIMG instructions were ordered memory references, so no loads or stores could be reordered across them. Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27536 llvm-svn: 290179
*	[AMDGPU] When unifying metadata, add operands to named metadata individually	Konstantin Zhuravlyov	2016-12-19	1	-3/+5
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D27725 llvm-svn: 290114
*	AMDGPU: [AMDGPU] Assembler: add .hsa_code_object_metadata directive for ↵	Sam Kolton	2016-12-19	4	-72/+143
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	runtime metadata V2.0 Summary: Added a pair of directives, .hsa_code_object_metadata/.end_hsa_code_object_metadata. Between them, the user can put a YAML string that is placed directly into the generated note. E.g.: ''' .hsa_code_object_metadata { amd.MDVersion: [ 2, 0 ] } .end_hsa_code_object_metadata ''' Based on D25046. Reviewers: vpykhtin, nhaustov, yaxunl, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, mgorny, tony-tye Differential Revision: https://reviews.llvm.org/D27619 llvm-svn: 290097
*	AMDGPU: Fix name for v_ashrrev_i16	Matt Arsenault	2016-12-16	1	-3/+3
\| \| \| \|	llvm-svn: 289967
*	AMDGPU: Select branch on undef to uniform scc branch	Matt Arsenault	2016-12-15	3	-0/+21
\| \| \| \|	llvm-svn: 289877
*	AMDGPU: Fix asserting on returned tail calls	Matt Arsenault	2016-12-15	1	-2/+4
\| \| \| \|	llvm-svn: 289868
*	AMDGPU: Assembler support for vintrp instructions	Matt Arsenault	2016-12-15	3	-6/+108
\| \| \| \|	llvm-svn: 289866
*	Fix for regression after Global Load Scalarization patch	Alexander Timofeev	2016-12-15	1	-1/+2
\| \| \| \|	llvm-svn: 289822
*	Extract LaneBitmask into a separate type	Krzysztof Parzyszek	2016-12-15	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Specifically avoid implicit conversions from/to integral types to avoid potential errors when changing the underlying type. For example, a typical initialization of a "full" mask was "LaneMask = ~0u", which would result in a value of 0x00000000FFFFFFFF if the type was extended to uint64_t. Differential Revision: https://reviews.llvm.org/D27454 llvm-svn: 289820
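The pitfall described above is easy to reproduce with fixed-width arithmetic. This Python sketch emulates the C++ behaviour with explicit masks; the helper names are hypothetical and it stands in for, rather than reproduces, the LaneBitmask code.

```python
UINT32_MASK = 0xFFFFFFFF
UINT64_MASK = 0xFFFFFFFFFFFFFFFF


def all_ones_u32():
    # Emulates the C++ expression ~0u: an unsigned 32-bit value with every bit set.
    return ~0 & UINT32_MASK


def widen_to_u64(value_u32):
    # Converting an unsigned 32-bit value to a 64-bit type zero-extends it,
    # so the upper 32 bits stay clear.
    return value_u32 & UINT64_MASK


def full_mask_u64():
    # What a "full" 64-bit lane mask would actually need to be.
    return UINT64_MASK
```

Widening the 32-bit all-ones value gives 0x00000000FFFFFFFF rather than the full 64-bit mask, which is the silent error a dedicated type without implicit integral conversions is meant to rule out.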
*	fix gcc warning about a superfluous ;	Nico Weber	2016-12-14	1	-1/+1
\| \| \| \|	llvm-svn: 289705
*	Fix build failure due to r289674 on certain systems	Yaxun Liu	2016-12-14	1	-1/+0
\| \| \| \| \| \|	Removed a useless include which caused conflict. llvm-svn: 289700
*	AMDGPU: Emit runtime metadata version 2 as YAML	Yaxun Liu	2016-12-14	7	-403/+550
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D25046 llvm-svn: 289674
*	AMDGPU: Make AllocationPriority of SGPRs higher than VGPRs	Matt Arsenault	2016-12-14	1	-11/+13
\| \| \| \| \| \| \| \|	Since SGPRs should spill to VGPRs, they should be allocated first. I don't think this is sufficient for SGPRs to always spill to VGPRs though. llvm-svn: 289671
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵	Nirav Dave	2016-12-14	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	UseAA is enabled." Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667
*	AMDGPU: Change vintrp printing	Matt Arsenault	2016-12-14	4	-6/+37
\| \| \| \|	llvm-svn: 289664
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2016-12-14	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled. Retrying after removing load-store factoring through token factors in favor of improved token factor operand pruning. Simplify Consecutive Merge Store Candidate Search: Now that address aliasing is much less conservative, push through a simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code. 2. Unifies the chain aggregation in the merged stores across code paths. 3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits. 4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. 
CodeGen/AArch64/arm64-memset-inline.ll - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores. CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls. CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores. CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But it looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289659
*	Replace APFloatBase static fltSemantics data members with getter functions	Stephan Bergmann	2016-12-14	2	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647
*	[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What ↵	Eugene Zelenko	2016-12-12	8	-95/+127
\| \| \| \| \| \|	You Use warnings; other minor fixes (NFC). llvm-svn: 289475
*	AMDGPU: llvm.amdgcn.interp.mov is a source of divergence	Nicolai Haehnle	2016-12-12	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: While the result is constant across a single primitive, each pixel shader wave can have pixels from multiple primitives. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27572 llvm-svn: 289447
*	AMDGPU: Fix asan errors when folding operands	Matt Arsenault	2016-12-10	1	-2/+2
\| \| \| \| \| \| \|	This was failing when trying to fold immediates into operand 1 of a phi, which only has one statically known operand. llvm-svn: 289337
*	AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts	Matt Arsenault	2016-12-10	1	-1/+8
\| \| \| \| \| \| \|	The users of the addrspacecast were having their types incorrectly changed, producing invalid bitcasts between address spaces. llvm-svn: 289307
*	AMDGPU: Fix handling of 16-bit immediates	Matt Arsenault	2016-12-10	19	-222/+741
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306