bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[AArch64] Improve code generation for logical instructions taking	Akira Hatanaka	2017-04-21	1	-1/+1
	immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300932 and r300930, which were causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress. This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 301019
*	Revert r300932 and r300930.	Akira Hatanaka	2017-04-21	1	-1/+1
	It seems that r300930 was creating an infinite loop in dag-combine when compiling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c llvm-svn: 300940
*	[AArch64] Improve code generation for logical instructions taking	Akira Hatanaka	2017-04-21	1	-1/+1
	immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300930
*	AMDGPU: Custom lower illegal small select types	Matt Arsenault	2017-04-19	1	-0/+29
	Promote them to i32 vectors to avoid unpacking and re-packing the vectors. llvm-svn: 300754
*	AMDGPU: Fix invalid copies when copying i1 to phys reg	Matt Arsenault	2017-04-12	1	-2/+28
	Insert a VReg_1 virtual register so the i1 workaround pass can handle it. llvm-svn: 300113
*	AMDGPU: Refactor SIMachineFunctionInfo slightly	Matt Arsenault	2017-04-11	1	-1/+1
	Prepare for handling non-entry functions. llvm-svn: 299999
*	AMDGPU: Refactor argument lowering	Matt Arsenault	2017-04-11	1	-256/+318
	Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998
*	AMDGPU/GFX9: Fix shared and private aperture queries	Konstantin Zhuravlyov	2017-04-06	1	-11/+23
	Differential Revision: https://reviews.llvm.org/D31786 llvm-svn: 299727
*	AMDGPU: Replace fp16SrcZerosHighBits with a whitelist	Matt Arsenault	2017-04-06	1	-4/+50
	FCOPYSIGN is lowered to bit operations which don't clear the high bits. llvm-svn: 299708
*	AMDGPU: Stop using CCAssignToRegWithShadow	Matt Arsenault	2017-04-06	1	-11/+0
	This does not do what it is attempting to use it for and requires working around in LowerFormalArguments. llvm-svn: 299667
*	[AMDGPU] Eliminate barrier if workgroup size is not greater than wavefront size	Stanislav Mekhanoshin	2017-04-06	1	-0/+11
	If the workgroup size is known to be no greater than the wavefront size, the s_barrier instruction is not needed, since all threads are guaranteed to reach the same point at the same time. Differential Revision: https://reviews.llvm.org/D31731 llvm-svn: 299659
*	AMDGPU: Remove legacy export intrinsic	Matt Arsenault	2017-04-04	1	-23/+0
	llvm-svn: 299444
*	AMDGPU: Remove llvm.SI.vs.load.input	Matt Arsenault	2017-04-03	1	-5/+0
	llvm-svn: 299391
*	AMDGPU: Remove legacy bfe intrinsics	Matt Arsenault	2017-04-03	1	-3/+7
	llvm-svn: 299372
*	AMDGPU: Remove unnecessary ands when f16 is legal	Matt Arsenault	2017-03-31	1	-0/+39
	Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16 bits of the result. Alternatively, we could try making v2f16 legal and canonicalizing on build_vectors. llvm-svn: 299246
*	AMDGPU: Add all atomicrmw fields to atomic.inc/dec	Matt Arsenault	2017-03-30	1	-2/+5
	Add scope, order, isVolatile. llvm-svn: 299122
*	[AMDGPU] Get address space mapping by target triple environment	Yaxun Liu	2017-03-27	1	-75/+72
	As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which do not depend on the target triple, use static const members, so that they don't occupy extra memory space and are equivalent to compile-time constants. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as a member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
*	AMDGPU: Implement f16 fround	Matt Arsenault	2017-03-24	1	-0/+1
	llvm-svn: 298730
*	AMDGPU: Rename SI_RETURN	Matt Arsenault	2017-03-21	1	-1/+1
	This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452
*	AMDGPU: Always use VGPR indexing on GFX9	Marek Olsak	2017-03-21	1	-2/+2
	Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31157 llvm-svn: 298396
*	AMDGPU: Fix asserting on 0 dmask for image intrinsics	Matt Arsenault	2017-03-21	1	-0/+58
	Fold these to undef during lowering so users get eliminated. llvm-svn: 298387
*	AMDGPU: Cleanup control flow intrinsics	Matt Arsenault	2017-03-17	1	-25/+18
	Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list. It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those. llvm-svn: 298119
*	AMDGPU: Allow sinking of addressing modes for atomic_inc/dec	Matt Arsenault	2017-03-15	1	-5/+22
	llvm-svn: 297913
*	AMDGPU: Re-use TM.getNullPointerValue	Matt Arsenault	2017-03-13	1	-10/+8
	llvm-svn: 297662
*	AMDGPU: Treat 0 as private null pointer in addrspacecast lowering	Matt Arsenault	2017-03-13	1	-7/+14
	llvm-svn: 297658
*	AMDGPU: Remove packf16 intrinsic	Matt Arsenault	2017-03-11	1	-5/+0
	llvm-svn: 297557
*	AMDGPU: Use v_med3_{f16|i16|u16}	Matt Arsenault	2017-02-27	1	-17/+16
	llvm-svn: 296401
*	AMDGPU: Support v2i16/v2f16 packed operations	Matt Arsenault	2017-02-27	1	-6/+69
	llvm-svn: 296396
*	AMDGPU: Support inlineasm for packed instructions	Matt Arsenault	2017-02-27	1	-1/+42
	Add packed types as legal so they may be used with inlineasm. Keep all operations expanded for now. llvm-svn: 296379
*	AMDGPU: Use clamp with f64	Matt Arsenault	2017-02-22	1	-5/+8
	llvm-svn: 295908
*	AMDGPU : Update TrapCode based on Trap Handler ABI.	Wei Ding	2017-02-22	1	-2/+2
	Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904
*	AMDGPU: Add replacement bfe intrinsics	Matt Arsenault	2017-02-22	1	-0/+6
	llvm-svn: 295899
*	AMDGPU: Don't look at chain users when adjusting writemask	Matt Arsenault	2017-02-22	1	-0/+4
	Fixes the writemask not being adjusted when using new intrinsics with chains. llvm-svn: 295878
*	Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."	Wei Ding	2017-02-22	1	-1/+1
	This reverts commit r295867. llvm-svn: 295871
*	AMDGPU : Update TrapCode based on Trap Handler ABI.	Wei Ding	2017-02-22	1	-1/+1
	Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867
*	AMDGPU: Add cvt.pkrtz intrinsic	Matt Arsenault	2017-02-22	1	-4/+41
	Convert llvm.SI.packf16 test uses. llvm-svn: 295797
*	AMDGPU: Redefine clamp node as clamp 0.0-1.0	Matt Arsenault	2017-02-21	1	-3/+77
	Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
*	AMDGPU: Formatting fixes	Matt Arsenault	2017-02-21	1	-4/+5
	llvm-svn: 295783
*	AMDGPU: Remove llvm.AMDGPU.flbit intrinsic	Matt Arsenault	2017-02-21	1	-1/+0
	llvm-svn: 295754
*	AMDGPU: Don't use stack space for SGPR->VGPR spills	Matt Arsenault	2017-02-21	1	-0/+2
	Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them afterwards. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks, since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes are still wasted. llvm-svn: 295753
*	AMDGPU: Merge initial gfx9 support	Matt Arsenault	2017-02-18	1	-1/+8
	llvm-svn: 295554
*	AMDGPU: Fix crashes on invalid icmp/fcmp intrinsics	Matt Arsenault	2017-02-17	1	-5/+9
	llvm-svn: 295489
*	AMDGPU: Remove llvm.AMDGPU.rsq intrinsic	Matt Arsenault	2017-02-16	1	-1/+0
	llvm-svn: 295358
*	AMDGPU: Remove llvm.SI.sendmsg	Matt Arsenault	2017-02-16	1	-4/+3
	llvm-svn: 295270
*	AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsics	Matt Arsenault	2017-02-16	1	-25/+0
	Update test uses with expansion in terms of new intrinsics. llvm-svn: 295269
*	AMDGPU: Consolidate sendmsg/sendmsghalt handling and tests	Matt Arsenault	2017-02-15	1	-7/+4
	llvm-svn: 295244
*	AMDGPU: Fix trailing whitespace	Matt Arsenault	2017-02-10	1	-6/+5
	llvm-svn: 294694
*	AMDGPU : Add trap handler support.	Wei Ding	2017-02-10	1	-18/+41
	Differential Revision: http://reviews.llvm.org/D26010 llvm-svn: 294692
*	AMDGPU: Generalize matching of v_med3_f32	Matt Arsenault	2017-01-31	1	-0/+3
	I think this is safe as long as no inputs are known to ever be NaNs. Also add an intrinsic for fmed3 to be able to handle all safe math cases. llvm-svn: 293598
*	AMDGPU: Make i32 uaddo/usubo legal	Matt Arsenault	2017-01-30	1	-0/+3
	llvm-svn: 293514