bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Extend memcpy expansion in Transform/Utils to handle wider operand types.	Sean Fertile	2017-07-07	1	-2/+11
	Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and match the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 llvm-svn: 307346
*	AMDGPU: Add macro fusion schedule DAG mutation	Matt Arsenault	2017-07-06	4	-0/+86
	Try to increase opportunities to shrink vcc uses. llvm-svn: 307313
*	AMDGPU: Minor cleanup of shrinking logic	Matt Arsenault	2017-07-06	1	-8/+4
	llvm-svn: 307312
*	[AMDGPU] Always use rcp + mul with fast math	Stanislav Mekhanoshin	2017-07-06	2	-8/+8
	Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via the amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast. An fdiv instruction can also have the fast math flag, either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both the metadata and the fast flag: %div = fdiv fast float %v, %0, !fpmath !12, where !12 = !{float 2.500000e+00}. The current implementation ignores the fast flag and favors the metadata. An instruction with just the fast flag would be lowered to the fastest rcp + mul, but that never happens in practice because of the clang and backend behavior described above. This change allows an "fdiv fast" to always be lowered as rcp + mul. Differential Revision: https://reviews.llvm.org/D34844 llvm-svn: 307308
*	[Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI	Craig Topper	2017-07-06	1	-1/+1
	Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne. llvm-svn: 307292
*	[AMDGPU] Move GISel accessor initialization from TargetMachine to Subtarget.	Quentin Colombet	2017-07-05	2	-48/+50
	NFC llvm-svn: 307186
*	[AMDGPU] Switch scalarize global loads ON by default	Alexander Timofeev	2017-07-04	1	-1/+1
	Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097
*	[AMDGPU] Fix latency of MIMG instructions	Marek Olsak	2017-07-04	1	-0/+1
	Patch by cwabbott (Connor Abbott). llvm-svn: 307081
*	Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"	NAKAMURA Takumi	2017-07-04	1	-1/+1
	It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054
*	[AMDGPU] Switch scalarize global loads ON by default	Alexander Timofeev	2017-07-03	1	-1/+1
	Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026
*	AMDGPU: Add operand target flags serialization	Matt Arsenault	2017-07-02	2	-0/+26
	llvm-svn: 306995
*	fix trivial typos; NFC	Hiroshi Inoue	2017-07-02	1	-1/+1
	suport -> support llvm-svn: 306968
*	AMDGPU: Remove SITypeRewriter	Matt Arsenault	2017-06-28	4	-159/+0
	This was an old workaround for using v16i8 in some old intrinsics for resource descriptors. llvm-svn: 306603
*	[LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.	Geoff Berry	2017-06-28	2	-2/+3
	Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554
*	[AMDGPU] Add pattern for v_alignbit_b32 with immediate	Stanislav Mekhanoshin	2017-06-28	2	-3/+6
	If the immediate in the shift is less than 32, we can use alignbit too. Differential Revision: https://reviews.llvm.org/D34729 llvm-svn: 306500
*	[AMDGPU] Add 2 new alignbit patterns	Stanislav Mekhanoshin	2017-06-27	1	-0/+9
	Differential Revision: https://reviews.llvm.org/D34655 llvm-svn: 306449
*	[AMDGPU] Simplify setcc (sext from i1 b), -1|0, cc	Stanislav Mekhanoshin	2017-06-27	1	-1/+29
	Depending on the compare code, the result can be either the argument of the sext or its negation. This helps to avoid a v_cndmask_b64 instruction for the sext. A reversed value can be further simplified and folded into its parent comparison if possible. Differential Revision: https://reviews.llvm.org/D34545 llvm-svn: 306446
*	[AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0	Stanislav Mekhanoshin	2017-06-27	1	-2/+28
	Also factored out a function to check if a boolean is an already deserialized value which does not require v_cndmask_b32 to be loaded. Added binary logical operators to its check. Differential Revision: https://reviews.llvm.org/D34500 llvm-svn: 306439
*	[AMDGPU] SDWA: several fixes for V_CVT and VOPC instructions	Sam Kolton	2017-06-27	6	-33/+43
	Summary: 1. Instruction V_CVT_U32_F32 allows an omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if the SDWA pseudo instruction has an OMod operand and then copies it. 2. There were several problems with support of VOPC instructions in the SDWA peephole pass. Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye Differential Revision: https://reviews.llvm.org/D34626 llvm-svn: 306413
*	fix trivial typos, NFC	Hiroshi Inoue	2017-06-27	1	-2/+2
	succesor -> successor llvm-svn: 306393
*	AMDGPU: M0 operands to spill/restore opcodes are dead	Nicolai Haehnle	2017-06-27	1	-2/+2
	Summary: With scalar stores, M0 is clobbered and therefore marked as implicitly defined. However, it is also dead. This fixes an assertion when the Greedy Register Allocator decides to optimize a spill/restore pair away again (via tryHintsRecoloring). Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33319 llvm-svn: 306375
*	AMDGPU: Setup SP/FP in callee function prolog/epilog	Matt Arsenault	2017-06-26	3	-2/+78
	llvm-svn: 306312
*	AMDGPU/GlobalISel: Mark 32-bit G_SHL as legal	Tom Stellard	2017-06-26	1	-0/+2
	Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34589 llvm-svn: 306298
*	AMDGPU: Whitespace fixes	Matt Arsenault	2017-06-26	4	-6/+6
	llvm-svn: 306265
*	AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling	Matt Arsenault	2017-06-26	6	-30/+34
	This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and require the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also uncovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264
*	Remove a processFixupValue hack.	Rafael Espindola	2017-06-24	2	-35/+32
	The intention of processFixupValue is not to redefine the semantics of MCExpr. It is odd enough that an expression lowers to a PCRel MCExpr or not depending on what it looks like. At least it is a local hack now. I left a fix for anyone trying to figure out what producers should be producing a different expression. llvm-svn: 306200
*	Remove redundant argument.	Rafael Espindola	2017-06-24	1	-2/+2
	llvm-svn: 306189
*	Move Value adjustment to applyFixup. NFC.	Rafael Espindola	2017-06-23	1	-2/+1
	llvm-svn: 306178
*	ARM: move some logic from processFixupValue to applyFixup.	Rafael Espindola	2017-06-23	1	-2/+4
	processFixupValue is called on every relaxation iteration. applyFixup is only called once at the very end. applyFixup is then the correct place to do last minute changes and value checks. While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped it because of thumb2. We now do it again, but use the thumb2 range. llvm-svn: 306177
*	AMDGPU/GlobalISel: Mark 32-bit G_AND as legal	Tom Stellard	2017-06-23	1	-0/+1
	Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34349 llvm-svn: 306112
*	[AMDGPU] Add intrinsics for tbuffer load and store - build error fix	David Stuttard	2017-06-22	1	-2/+1
	Variable was unused in non-debug build (used in assert) causing compile time warning and eventual build failure llvm-svn: 306034
*	[AMDGPU] Add intrinsics for tbuffer load and store	David Stuttard	2017-06-22	8	-121/+535
	Intrinsic already existed for llvm.SI.tbuffer.store. Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.*. Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031
*	[AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit encoding	Sam Kolton	2017-06-22	1	-11/+15
	Summary: Although these instructions are listed in VOP2, they are treated as VOP3 in the specs. They should not support SDWA. There are no real instructions for them, but there are pseudo instructions. Reviewers: arsenm, vpykhtin, cfang Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34403 llvm-svn: 305999
*	[AMDGPU] SDWA: add support for GFX9 in peephole pass	Sam Kolton	2017-06-22	6	-39/+127
	Summary: Added support based on merged SDWA pseudo instructions. The peephole now allows one scalar operand, omod and clamp modifiers. Added several subtarget features for GFX9 SDWA. This diff also contains changes from D34026. Depends D34026 Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34241 llvm-svn: 305986
*	[AMDGPU] Add FP_CLASS to the add/setcc combine	Stanislav Mekhanoshin	2017-06-21	1	-1/+3
	This is one of the nodes which also compiles as v_cmp_*. Differential Revision: https://reviews.llvm.org/D34485 llvm-svn: 305970
*	Use a MutableArrayRef. NFC.	Rafael Espindola	2017-06-21	1	-4/+4
	llvm-svn: 305968
*	[AMDGPU] Combine add and adde, sub and sube	Stanislav Mekhanoshin	2017-06-21	2	-9/+81
	If one of the arguments of adde/sube is zero we can fold another add/sub into it. Differential Revision: https://reviews.llvm.org/D34374 llvm-svn: 305964
*	[AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setcc	Stanislav Mekhanoshin	2017-06-21	4	-0/+59
	This simplification avoids generating v_cndmask_b32 to serialize the condition code between compare and use. Differential Revision: https://reviews.llvm.org/D34300 llvm-svn: 305962
*	[AMDGPU][MC][GFX9] Corrected VOP3P relevant code to fix disassembler failures	Dmitry Preobrazhensky	2017-06-21	4	-11/+6
	See Bug 33509: https://bugs.llvm.org//show_bug.cgi?id=33509 Reviewers: Sam Kolton, Artem Tamazov, Valery Pykhtin Differential Revision: https://reviews.llvm.org/D34360 llvm-svn: 305923
*	[AMDGPU][MC] Corrected V_QSAD instructions to check that dest register is different than any of the src	Dmitry Preobrazhensky	2017-06-21	3	-5/+84
	See Bug 33279: https://bugs.llvm.org//show_bug.cgi?id=33279 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D34003 llvm-svn: 305915
*	[AMDGPU] SDWA: merge VI and GFX9 pseudo instructions	Sam Kolton	2017-06-21	15	-281/+323
	Summary: Previously there were two separate pseudo instructions for SDWA on VI and on GFX9. Created one pseudo instruction that is the union of both of them. Added a verifier to check that operands conform to either VI or GFX9. Reviewers: dp, arsenm, vpykhtin Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov Differential Revision: https://reviews.llvm.org/D34026 llvm-svn: 305886
*	AMDGPU: Allow vectorization of packed types	Matt Arsenault	2017-06-20	2	-8/+20
	llvm-svn: 305844
*	[AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32	Stanislav Mekhanoshin	2017-06-20	1	-0/+2
	If there is an immediate operand we shall not shrink V_SUBB_U32 and V_ADDC_U32, because it does not fit the e32 encoding. Differential Revision: https://reviews.llvm.org/D34291 llvm-svn: 305840
*	AMDGPU: Start adding global_* instructions	Matt Arsenault	2017-06-20	6	-6/+106
	llvm-svn: 305838
*	AMDGPU: Do operand folding in program order	Matt Arsenault	2017-06-20	1	-5/+3
	Before it was possible to partially fold use instructions before the defs. After the xor is folded into a copy, the same mov can end up in the fold list twice, so on the second attempt it will fail expecting to see a register to fold. llvm-svn: 305821
*	AMDGPU: Preserve undef when folding register operands	Matt Arsenault	2017-06-20	1	-0/+2
	If the source was a copy of an undef register, this would produce a read of an undefined register which is a verifier error. llvm-svn: 305816
*	[AMDGPU] Eliminate SGPR to VGPR copy when possible	Stanislav Mekhanoshin	2017-06-20	1	-0/+30
	SGPRs are generally cheaper, so try to use them over VGPRs. Differential Revision: https://reviews.llvm.org/D34130 llvm-svn: 305815
*	AMDGPU: Fix crash with undef vreg input operand	Matt Arsenault	2017-06-20	1	-1/+1
	llvm-svn: 305814
*	AMDGPU: Fix scratch wave offset relative FI expansion	Matt Arsenault	2017-06-19	1	-9/+20
	The offset may not be an inline immediate, so this needs to be materialized into a register. The post-RA run of SIShrinkInstructions is able to fold it later if it can. llvm-svn: 305761
*	[AMDGPU] Add infer address spaces pass before SROA	Stanislav Mekhanoshin	2017-06-19	1	-0/+8
	It adds it for the target after inlining but before SROA where we can get the most out of it. Differential Revision: https://reviews.llvm.org/D34366 llvm-svn: 305759
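
Several of the AMDGPU combines in this log (r306446, r306439, r305962) rest on simple identities between a sign- or zero-extended i1 condition and a select or carry operation. The sketch below checks those identities on 32-bit values in plain Python. It models only the arithmetic, not the SelectionDAG code; the helper names (`sext_i1`, `zext_i1`, `addc`, `subb`, `check_identities`) are hypothetical and chosen for illustration.

```python
MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers


def sext_i1(b: bool) -> int:
    """Sign-extend an i1 to i32: true -> all ones, false -> 0."""
    return MASK32 if b else 0


def zext_i1(b: bool) -> int:
    """Zero-extend an i1 to i32: true -> 1, false -> 0."""
    return 1 if b else 0


def addc(x: int, y: int, carry_in: bool) -> int:
    """Add with carry-in, truncated to 32 bits (assumed model of addc)."""
    return (x + y + int(carry_in)) & MASK32


def subb(x: int, y: int, borrow_in: bool) -> int:
    """Subtract with borrow-in, truncated to 32 bits (assumed model of subb)."""
    return (x - y - int(borrow_in)) & MASK32


def check_identities(samples) -> bool:
    """Assert the identities behind the three combines for each sample."""
    for x in samples:
        for cc in (False, True):
            # r306439: and x, (sext cc) => select cc, x, 0
            assert (x & sext_i1(cc)) == (x if cc else 0)
            # r305962: add x, zext(cc) => addc x, 0, cc
            assert ((x + zext_i1(cc)) & MASK32) == addc(x, 0, cc)
            # r305962: add x, sext(cc) => subb x, 0, cc
            assert ((x + sext_i1(cc)) & MASK32) == subb(x, 0, cc)
            # r306446: setcc (sext b), -1|0, cc is either b or its negation
            assert (sext_i1(cc) == MASK32) == cc
            assert (sext_i1(cc) == 0) == (not cc)
            assert (sext_i1(cc) != MASK32) == (not cc)
            assert (sext_i1(cc) != 0) == cc
    return True


if __name__ == "__main__":
    check_identities([0, 1, 0x7FFFFFFF, 0x80000000, MASK32])
```

In each case the right-hand side consumes the condition directly, which is why the commit messages describe the combines as avoiding a v_cndmask instruction to materialize the extended boolean.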