bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AMDGPU][MC] Corrected parsing of NAME:VALUE modifiers	Dmitry Preobrazhensky	2019-05-17	1	-33/+17
	See bug 41298: https://bugs.llvm.org/show_bug.cgi?id=41298
	Reviewers: artem.tamazov, arsenm
	Differential Revision: https://reviews.llvm.org/D61009
	llvm-svn: 361045
*	[AMDGPU][MC] Enabled labels with s_call_b64 and s_cbranch_i_fork	Dmitry Preobrazhensky	2019-05-17	1	-2/+2
	See https://bugs.llvm.org/show_bug.cgi?id=41888
	Reviewers: artem.tamazov, arsenm
	Differential Revision: https://reviews.llvm.org/D62016
	llvm-svn: 361040
*	[X86][AVX] Remove LowerCTTZ's AVX1 custom vector handling.	Simon Pilgrim	2019-05-17	1	-7/+0
	We can now rely on generic expansion to handle this.
	llvm-svn: 361038
*	[X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold.	Simon Pilgrim	2019-05-17	1	-0/+9
	Prep work for the removal of the remaining x86 CTTZ vector lowering.
	llvm-svn: 361035
*	[AMDGPU][MC] Enabled expressions for most operands which accept integer values	Dmitry Preobrazhensky	2019-05-17	2	-65/+110
	See bug 40873: https://bugs.llvm.org/show_bug.cgi?id=40873
	Reviewers: artem.tamazov, arsenm
	Differential Revision: https://reviews.llvm.org/D60768
	llvm-svn: 361031
*	AMDGPU: Fix unused variable warnings in release builds	Matt Arsenault	2019-05-17	1	-12/+9
	llvm-svn: 361030
*	AMDGPU/GlobalISel: Legalize G_FCEIL	Matt Arsenault	2019-05-17	2	-2/+37
	llvm-svn: 361028
*	AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC	Matt Arsenault	2019-05-17	2	-4/+70
	llvm-svn: 361027
*	AMDGPU/GlobalISel: Legalize G_FRINT	Matt Arsenault	2019-05-17	2	-0/+44
	llvm-svn: 361026
*	AMDGPU/GlobalISel: Legalize G_FCOPYSIGN	Matt Arsenault	2019-05-17	1	-0/+4
	llvm-svn: 361025
*	AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load	Matt Arsenault	2019-05-17	1	-0/+44
	llvm-svn: 361023
*	AMDGPU/GlobalISel: Use subreg index instead of extra unmerge	Matt Arsenault	2019-05-17	1	-8/+2
	This saves instructions and extra steps, but I'm not sure about introducing subregister indexes at this point.
	llvm-svn: 361022
*	AMDGPU/GlobalISel: Use waterfall loop for buffer_load	Matt Arsenault	2019-05-17	2	-36/+302
	This adds support for more complex waterfall loops that need to handle operands wider than 32 bits, and multiple operands.
	llvm-svn: 361021
*	[X86] Pull out IsNOT helper. NFCI.	Simon Pilgrim	2019-05-17	1	-8/+16
	Return the input value for the NOT pattern: (xor X, -1) -> X
	llvm-svn: 361012
*	[AMDGPU] detect WaW hazards when moving/merging load/store instructions	Rhys Perry	2019-05-17	1	-0/+1
	Summary:
	In order to combine memory operations efficiently, the load/store optimizer might move some instructions around. It's usually safe to move instructions down past the merged instruction because the pass checks if memory operations can be re-ordered. However, the current logic doesn't handle Write-after-Write hazards.
	This fixes a reflection issue with Monster Hunter World and DXVK.
	v2:
	- rebased on top of master
	- clean up the test case
	- handle WaW hazards correctly
	Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130
	Original patch by Samuel Pitoiset.
	Reviewers: tpr, arsenm, nhaehnle
	Reviewed By: nhaehnle
	Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
	Differential Revision: https://reviews.llvm.org/D61313
	llvm-svn: 361008
*	[AArch64][SVE2] Asm: add saturating multiply-add long instructions	Cullen Rhodes	2019-05-17	1	-0/+12
	Summary:
	Patch adds support for indexed and unpredicated vectors forms of the following instructions:
	* SQDMLALB, SQDMLALT, SQDMLSLB, SQDMLSLT
	The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
	Reviewed By: SjoerdMeijer
	Differential Revision: https://reviews.llvm.org/D61997
	llvm-svn: 361005
*	[AArch64][SVE2] Asm: add integer multiply-add long instructions	Cullen Rhodes	2019-05-17	2	-0/+49
	Summary:
	Patch adds support for indexed and unpredicated vectors forms of the following instructions:
	* SMLALB, SMLALT, UMLALB, UMLALT, SMLSLB, SMLSLT, UMLSLB, UMLSLT
	The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
	Reviewed By: rovka
	Differential Revision: https://reviews.llvm.org/D61951
	llvm-svn: 361003
*	[AArch64][SVE2] Asm: add integer multiply long instructions	Cullen Rhodes	2019-05-17	2	-0/+64
	Summary:
	Patch adds support for indexed and unpredicated vectors forms of the following instructions:
	* SMULLB, SMULLT, UMULLB, UMULLT, SQDMULLB, SQDMULLT
	The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
	Reviewed By: rovka
	Differential Revision: https://reviews.llvm.org/D61936
	llvm-svn: 361002
*	[X86] Add FeatureFastScalarShiftMasks and FeatureFastVectorShiftMasks to the ignore list for inlining compatibility.	Craig Topper	2019-05-17	1	-0/+2
	These are tuning flags and won't cause any codegen issue if we inline a function with a different value.
	llvm-svn: 360992
*	[PowerPC] Support .reloc , R_PPC{,64}_NONE,	Fangrui Song	2019-05-17	2	-28/+49
	This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained.
	llvm-svn: 360990
*	[MC][PowerPC] Clean up PPCAsmBackend	Fangrui Song	2019-05-17	1	-25/+17
	Replace the member variable Target with Triple.
	Use Triple instead of TheTarget.getName() to dispatch on 32-bit/64-bit.
	Delete redundant parameters.
	llvm-svn: 360986
*	[X86] Support .reloc , R_{386,X86_64}_NONE,	Fangrui Song	2019-05-17	2	-9/+51
	This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained.
	See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes.
	Reviewed By: rnk
	Differential Revision: https://reviews.llvm.org/D62014
	llvm-svn: 360983
*	[AArch64] Support .reloc , R_AARCH64_NONE,	Fangrui Song	2019-05-17	2	-2/+18
	Summary:
	This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained.
	Reviewed By: peter.smith
	Differential Revision: https://reviews.llvm.org/D61973
	llvm-svn: 360981
*	[ARM] Support .reloc , R_ARM_NONE,	Fangrui Song	2019-05-17	6	-9/+24
	R_ARM_NONE can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained.
	Add a generic MCFixupKind FK_NONE as this kind of no-op relocation is ubiquitous on ELF and COFF, and probably available on many other binary formats. See D62014.
	Reviewed By: peter.smith
	Differential Revision: https://reviews.llvm.org/D61992
	llvm-svn: 360980
*	[SystemZ] Bugfix in SystemZTargetLowering::combineIntDIVREM()	Jonas Paulsson	2019-05-17	1	-1/+1
	Make sure to not unroll a vector division/remainder (with a constant splat divisor) after type legalization, since the scalar type may then be illegal.
	Review: Ulrich Weigand
	https://reviews.llvm.org/D62036
	llvm-svn: 360965
*	[X86][AsmParser] Add mnemonics missed in r360954.	David L. Jones	2019-05-17	1	-1/+2
	These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64 and IA-32 Architectures Software Developer's Manual Vol. 1, Appendix B).
	These are covered in clang/test, but not llvm/test.
	llvm-svn: 360960
*	[X86][AsmParser] Ignore "short" even harder in Intel syntax ASM.	David L. Jones	2019-05-16	1	-5/+34
	In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional jumps, which indicates the offset should be a "short jump" (8-bit immediate offset from EIP, -128 to +127). This patch expands to all recognized Jcc condition codes, and removes the inline restriction.
	Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a couple of Jcc are actually checked, and only inline (i.e., not when using the integrated assembler for asm sources). A quick search through asm-containing libraries at hand shows a pretty broad range of Jcc conditions spelled with "short."
	GAS ignores the "short" modifier, and instead uses an encoding based on the given immediate. MS inline seems to do the same, and I suspect MASM does, too. NASM will yield an error if presented with an out-of-range immediate value.
	Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY
	Differential Revision: https://reviews.llvm.org/D61990
	llvm-svn: 360954
*	[X86][AsmParser] Rename "ConditionCode" variable to "ConditionPredicate".	David L. Jones	2019-05-16	1	-9/+9
	This better matches the verbiage in Intel documentation, and should help avoid confusion between these two different kinds of values, both of which are parsed from mnemonics.
	llvm-svn: 360953
*	[X86] Deduplicate symbol lowering logic, NFC	Reid Kleckner	2019-05-16	2	-88/+62
	Summary:
	This refactors four pieces of code that create SDNodes for references to symbols:
	- normal global address lowering (LEA, MOV, etc)
	- callee global address lowering (CALL)
	- external symbol address lowering (LEA, MOV, etc)
	- external symbol address lowering (CALL)
	Each of these pieces of code needs to:
	- classify the reference
	- lower the symbol
	- emit a RIP wrapper if needed
	- emit a load if needed
	- add offsets if needed
	I think handling them all in one place will make the code easier to maintain in the future.
	Reviewers: craig.topper, RKSimon
	Subscribers: hiraditya, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D61690
	llvm-svn: 360952
*	[X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. Similarly change 0x2 to 0xA for ceil.	Craig Topper	2019-05-16	1	-4/+4
	This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate in patterns without masking.
	llvm-svn: 360915
*	AMDGPU: Introduce TokenFactor for ABI register copies in call sequence	Matt Arsenault	2019-05-16	1	-0/+7
	The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues however.
	llvm-svn: 360906
*	AMDGPU: Assume xnack is enabled by default	Matt Arsenault	2019-05-16	3	-2/+28
	This is the conservatively correct default. It is always safe to assume xnack is enabled, but not the converse. Introduce a feature to blacklist targets where xnack can never be meaningfully enabled. I'm not sure the targets this is applied to are 100% correct.
	llvm-svn: 360903
*	[AArch64] Handle ISD::LROUND and ISD::LLROUND	Adhemerval Zanella	2019-05-16	2	-0/+11
	This patch optimizes ISD::LROUND and ISD::LLROUND to the fcvtas instruction. It currently only handles the scalar version.
	llvm-svn: 360894
*	[CodeGen] Add lround/llround builtins	Adhemerval Zanella	2019-05-16	1	-0/+2
	This patch adds ISD::LROUND and ISD::LLROUND along with new intrinsics. The changes are as straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an integer.
	The idea is to optimize lround/llround generation for AArch64 in a subsequent patch. The current semantics just route the call to the libm symbol.
	llvm-svn: 360889
*	AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor	Matt Arsenault	2019-05-16	1	-1/+1
	Bool values should use the scc/vcc regbank since r350611.
	llvm-svn: 360877
*	[AArch64][SVE2] Asm: implement CMLA/SQRDCMLAH instructions	Cullen Rhodes	2019-05-16	2	-0/+39
	Summary:
	This patch adds support for the indexed and unpredicated vectors forms of the CMLA and SQRDCMLAH instructions.
	The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
	Reviewed By: SjoerdMeijer
	Differential Revision: https://reviews.llvm.org/D61906
	llvm-svn: 360871
*	[AArch64][SVE2] Asm: implement CDOT instruction	Cullen Rhodes	2019-05-16	2	-0/+79
	Summary:
	The complex DOT instructions perform a dot-product on quadtuplets from two source vectors and the resulting wide real or wide imaginary is accumulated into the destination register. The instructions come in two forms:
	Vector form, e.g.
	cdot z0.s, z1.b, z2.b, #90 - complex dot product on four 8-bit quad-tuplets, accumulating results in 32-bit elements. The complex numbers in the second source vector are rotated by 90 degrees.
	cdot z0.d, z1.h, z2.h, #180 - complex dot product on four 16-bit quad-tuplets, accumulating results in 64-bit elements. The complex numbers in the second source vector are rotated by 180 degrees.
	Indexed form, e.g.
	cdot z0.s, z1.b, z2.b[3], #0 - complex dot product on four 8-bit quad-tuplets, with specified quadtuplet from second source vector, accumulating results in 32-bit elements.
	cdot z0.d, z1.h, z2.h[1], #0 - complex dot product on four 16-bit quad-tuplets, with specified quadtuplet from second source vector, accumulating results in 64-bit elements.
	The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
	Reviewed By: SjoerdMeijer, rovka
	Differential Revision: https://reviews.llvm.org/D61903
	llvm-svn: 360870
*	[AArch64][SVE2] Asm: add unpredicated integer multiply instructions	Cullen Rhodes	2019-05-16	2	-0/+86
	Summary:
	Add support for the following instructions:
	* MUL (indexed and unpredicated vectors forms)
	* SQDMULH (indexed and unpredicated vectors forms)
	* SQRDMULH (indexed and unpredicated vectors forms)
	* SMULH (unpredicated, predicated form added in SVE)
	* UMULH (unpredicated, predicated form added in SVE)
	* PMUL (unpredicated)
	The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
	Reviewed By: SjoerdMeijer, rovka
	Differential Revision: https://reviews.llvm.org/D61902
	llvm-svn: 360867
*	Add Triple::isPPC64()	Fangrui Song	2019-05-16	2	-3/+2
	llvm-svn: 360864
*	[X86] Delay creating index register negations during address matching until after we know for sure the match will succeed	Craig Topper	2019-05-15	1	-7/+15
	If we're trying to match an LEA, it's possible the LEA match will be deemed unprofitable, in which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG, though I don't have an example of that. It just seems like bad form to have dangling nodes in isel.
	Differential Revision: https://reviews.llvm.org/D61047
	llvm-svn: 360823
*	[mips] Use range-based `for` loops. NFC	Simon Atanasyan	2019-05-15	1	-20/+17
	llvm-svn: 360817
*	[AArch64] only indicate CFI on Windows if we emitted CFI	Mandeep Singh Grang	2019-05-15	3	-37/+80
	Summary:
	Otherwise, we emit directives for CFI without any actual CFI opcodes to go with them, which causes tools to malfunction. The technique is similar to what the x86 backend already does.
	Fixes https://bugs.llvm.org/show_bug.cgi?id=40876
	Patch by: froydnj (Nathan Froyd)
	Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric
	Reviewed By: rnk
	Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D61960
	llvm-svn: 360816
*	[X86] Strengthen type constraints on some specialized X86 ISD opcodes that don't have any flexibility. NFC	Craig Topper	2019-05-15	1	-5/+17
	These particular instructions only operate on 128-bit vectors and have no wider equivalents. And the element size is always known.
	One could argue that MOVSS/MOVSD could be merged, but that's probably disruptive to code in X86ISelLowering and probably low value.
	llvm-svn: 360815
*	Uncomment LLVM_FALLTHROUGH.	Pete Couperus	2019-05-15	1	-1/+1
	llvm-svn: 360798
*	[AMDGPU] Increases available SGPR for Calling Convention	Ryan Taylor	2019-05-15	2	-4/+22
	Summary:
	SGPR in CC can be either hw initialized or set by other chained shaders and so this increases the SGPR count available to CC to 105.
	Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3
	Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D61261
	llvm-svn: 360778
*	[ARM] Don't use the Machine Scheduler for cortex-m at minsize	David Green	2019-05-15	2	-1/+8
	The new cortex-m schedule in rL360768 helps performance, but can increase the amount of high-registers used. This, on average, ends up increasing the codesize by a fair amount (because fewer instructions are converted from T2 to T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use the existing DAG scheduler with the RegPressure scheduling preference (at least until the issues around T2 vs T1 instructions can be improved).
	I have also made sure that the Sched::RegPressure dag scheduler is always chosen for MinSize.
	The test shows one case where we increase the number of registers used.
	Differential Revision: https://reviews.llvm.org/D61882
	llvm-svn: 360769
*	[ARM] Cortex-M4 schedule	David Green	2019-05-15	6	-65/+176
	This patch adds a simple Cortex-M4 schedule, renaming the existing M3 schedule to M4 and filling in the latencies as per the Cortex-M4 TRM: https://developer.arm.com/docs/ddi0439/latest
	Most of these are 1, with the important exception being loads taking 2 cycles. A few others are also higher, but I don't believe they make a large difference. I've repurposed the M3 schedule as the latencies are mostly the same between the two cores, with the M4 having more FP and DSP instructions. We also turn on MISched and UseAA for the cores that now use this.
	It also adds some schedule Writes to various instructions to make things simpler.
	Differential Revision: https://reviews.llvm.org/D54142
	llvm-svn: 360768
*	[mips] LLVM and GAS now use same instructions for CFA Definition. NFCI	Simon Atanasyan	2019-05-15	1	-1/+1
	LLVM previously used the `DW_CFA_def_cfa` instruction in .eh_frame to set the register and offset for the current CFA rule. We change it to `DW_CFA_def_cfa_register`, the same instruction GAS uses, which changes only the register and keeps the old offset.
	Patch by Mirko Brkusanin.
	Differential Revision: https://reviews.llvm.org/D61899
	llvm-svn: 360765
*	[X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp.	Craig Topper	2019-05-15	2	-5/+3
	They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects, which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think this makes sense since we are using it as a fence.
	This also seems to hide the operation from the speculative load hardening pass so I've reverted r360511.
	llvm-svn: 360747
*	[NFC] Reuse a helper function to eliminate duplicate code	Philip Reames	2019-05-15	1	-79/+67
	llvm-svn: 360740