path: root/llvm/test/CodeGen
Commit message [Author, Date, Files, Lines -/+]
* If a function needs a frame pointer, but r11 (aka fp) has not been used, remove it from the list of unspilled registers. [Joerg Sonnenberger, 2014-05-06, 5 files, -36/+38]
    Otherwise the following attempt to keep the stack aligned by picking
    an extra GPR register to spill will not work, as it picks up r11.
    llvm-svn: 208129
* [X86] Improve the lowering of BITCAST dag nodes from type f64 to type v2i32 (and vice versa). [Andrea Di Biagio, 2014-05-06, 2 files, -2/+83]
    Before this patch, the backend always emitted a store+load sequence to
    bitconvert from f64 to i64 the input operand of an ISD::BITCAST dag
    node that performed a bitconvert from type MVT::f64 to type
    MVT::v2i32. The resulting i64 node was then used to build a v2i32
    vector.

    With this patch, the backend now produces a cheaper SCALAR_TO_VECTOR
    from MVT::f64 to MVT::v2f64. That SCALAR_TO_VECTOR is then followed by
    a "free" bitcast to type MVT::v4i32. The elements of the resulting
    v4i32 are then extracted to build a v2i32 vector (which is illegal and
    therefore promoted to MVT::v2i64). This is in general cheaper than
    emitting a stack store+load sequence to bitconvert the operand from
    type f64 to type i64.
    llvm-svn: 208107
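    A minimal sketch of the IR pattern this affects (illustrative, not
    taken from the commit's tests):

        ; The f64 operand can now be moved into a vector register via
        ; SCALAR_TO_VECTOR instead of a stack store+load round trip.
        define <2 x i32> @bitcast_f64_to_v2i32(double %x) {
          %bc = bitcast double %x to <2 x i32>
          ret <2 x i32> %bc
        }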
* Implementing named register intrinsics [Renato Golin, 2014-05-06, 12 files, -0/+204]
    This patch implements the infrastructure to use named register
    constructs in programs that need access to specific registers (bare
    metal, kernels, etc).

    So far, only the stack pointer is supported as a technology preview,
    but as it is, the intrinsic can already support all non-allocatable
    registers from any architecture.
    llvm-svn: 208104
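    A minimal sketch of how the named register intrinsic is used
    (hypothetical test; metadata spelled in the 2014-era syntax):

        declare i64 @llvm.read_register.i64(metadata)

        ; Reads the stack pointer by name; per the commit, only
        ; non-allocatable registers such as sp are supported so far.
        define i64 @get_stack_pointer() {
          %sp = call i64 @llvm.read_register.i64(metadata !0)
          ret i64 %sp
        }

        !0 = metadata !{metadata !"sp"}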
* Special case aliases in GlobalValue::getAlignment. [Rafael Espindola, 2014-05-06, 1 file, -0/+14]
    An alias has the address of what it points to, so it also has the
    same alignment. This allows a few optimizations to see past aliases
    for free.
    llvm-svn: 208103
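    A sketch of what this enables (hypothetical module; alias and load
    syntax as of mid-2014):

        ; @var_alias has the same address as @var, so
        ; GlobalValue::getAlignment can now report align 8 for it too.
        @var = global i64 0, align 8
        @var_alias = alias i64* @var

        define i64 @load_via_alias() {
          %v = load i64* @var_alias, align 8
          ret i64 %v
        }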
* [ARM64] Enable alignment control option in front-end for ARM64. [Kevin Qin, 2014-05-06, 1 file, -0/+1]
    This is the LLVM part of the change.
    llvm-svn: 208074
* Fix i128 div/mod on mingw64 [Reid Kleckner, 2014-05-06, 1 file, -0/+26]
    The Win64 docs are very clear that anything larger than 8 bytes is
    passed by reference, and GCC MinGW64 honors that for __modti3 and
    friends.

    Patch by Jameson Nash!
    llvm-svn: 208029
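    A sketch of the kind of code this fixes (illustrative, not the
    commit's test):

        ; On Win64, the i128 arguments to the __modti3 libcall must be
        ; passed by reference, matching GCC MinGW64.
        define i128 @mod128(i128 %a, i128 %b) {
          %r = srem i128 %a, %b
          ret i128 %r
        }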
* R600: Expand i64 ISD::SUB [Tom Stellard, 2014-05-05, 1 file, -18/+37]
    llvm-svn: 208005
* Revert "Optimize shufflevector that copies an i64/f64 and zeros the rest."Filipe Cabecinhas2014-05-052-16/+0
| | | | | | This reverts commit 207992. I misread the phab number on the LGTM. llvm-svn: 207993
* Optimize shufflevector that copies an i64/f64 and zeros the rest. [Filipe Cabecinhas, 2014-05-05, 2 files, -0/+16]
    Summary: Also ran clang-format on the function. The code added is the
    last else if block.

    Reviewers: nadav, craig.topper
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D3518
    llvm-svn: 207992
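    A sketch of the shuffle this optimizes (illustrative; note the
    change was reverted in the entry above):

        ; Copies lane 0 of %a and zeroes the other lane; the patch lets
        ; this select to a single movq-style move instead of a longer
        ; shuffle sequence.
        define <2 x i64> @copy_and_zero(<2 x i64> %a) {
          %s = shufflevector <2 x i64> %a, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 2>
          ret <2 x i64> %s
        }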
* Move test from r207969 to another folder and rename it. [Michael Zolotukhin, 2014-05-05, 1 file, -0/+25]
    llvm-svn: 207984
* Remove the -disable-cfi option. [Rafael Espindola, 2014-05-05, 1 file, -34/+0]
    This also adds a release note about it. If this stays, I will clean
    up MC next week.
    llvm-svn: 207977
* Modify test to not use -disable-cfi. [Rafael Espindola, 2014-05-05, 2 files, -34/+27]
    llvm-svn: 207974
* Convert a CodeGen test into an MC test. [Rafael Espindola, 2014-05-05, 1 file, -161/+0]
    llvm-svn: 207971
* CodeGen: correct memset emission for WoA [Saleem Abdulrasool, 2014-05-04, 1 file, -0/+18]
    Windows on ARM does not conform to AEABI. However, memset would be
    emitted using the AEABI signature, resulting in inverted parameters.
    Handle this special case appropriately.
    llvm-svn: 207943
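    A sketch of the pattern involved (hypothetical test; intrinsic shown
    with its 2014-era signature): AEABI's __aeabi_memset takes
    (ptr, size, value) while C's memset takes (ptr, value, size), hence
    the inverted-parameter bug.

        declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)

        ; Windows on ARM must lower this to memset(ptr, value, size)
        ; rather than to the AEABI signature.
        define void @clear(i8* %p) {
          call void @llvm.memset.p0i8.i32(i8* %p, i8 0, i32 256, i32 4, i1 false)
          ret void
        }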
* CodeGen: strengthen WoA AEABI avoidance tests [Saleem Abdulrasool, 2014-05-04, 1 file, -0/+22]
    Add additional test cases for WoA AEABI avoidance checking.
    llvm-svn: 207942
* AVX-512: minor change in rndscale intrinsic [Elena Demikhovsky, 2014-05-04, 1 file, -2/+2]
    llvm-svn: 207937
* X86: repair export compatibility with MinGW/cygwin [Saleem Abdulrasool, 2014-05-04, 1 file, -36/+41]
    Both MinGW and cygwin (i686) construct export directives without the
    global leader prefix. This is mostly due to the fact that they use
    GNU ld, which does not correctly handle the export directive. This
    has apparently been broken for a while.

    However, this was recently reported as being broken by mingwandroid
    and diorcety of the msys2 project.

    Remove the global leader prefix if targeting MinGW or cygwin;
    otherwise, retain the global leader prefix. Add an explicit test for
    cygwin's behaviour of export directives.
    llvm-svn: 207926
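    A sketch of the behaviour under test (hypothetical test, not the
    commit's own):

        ; On cygwin/MinGW the emitted export directive should name "f"
        ; without the global leader prefix (i.e. not "_f"), since GNU
        ; ld mishandles the prefixed form.
        define dllexport i32 @f() {
          ret i32 0
        }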
* [ARM64] Correctly select ANDWri in FastISel. [Joey Gouly, 2014-05-03, 1 file, -1/+1]
    http://reviews.llvm.org/D3598
    llvm-svn: 207917
* DAGCombine: prevent formation of illegal ConstantFP nodes. [Tim Northover, 2014-05-02, 1 file, -0/+14]
    llvm-svn: 207850
* R600: Expand vector sin and cos. [Tom Stellard, 2014-05-02, 2 files, -22/+65]
    v2: move code to AMDGPUISelLowering.cpp
        squash with tests (both EG and SI)

    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 207845
* R600: Expand TruncStore i64 -> {i16,i8} [Tom Stellard, 2014-05-02, 1 file, -0/+40]
    llvm-svn: 207844
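    A minimal sketch of the pattern being expanded (illustrative):

        ; A truncating store of an i64 value to i8 in global memory
        ; (addrspace(1)) now expands instead of failing to select.
        define void @truncstore_i64_to_i8(i64 %x, i8 addrspace(1)* %out) {
          %t = trunc i64 %x to i8
          store i8 %t, i8 addrspace(1)* %out
          ret void
        }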
* AArch64/ARM64: add patterns for post-indexed ST1 ops. [Tim Northover, 2014-05-02, 1 file, -0/+211]
    llvm-svn: 207840
* AArch64/ARM64: support indexed loads/stores on vector types. [Tim Northover, 2014-05-02, 1 file, -0/+402]
    While post-indexed LD1/ST1 instructions do exist for vector loads,
    this patch makes use of the more flexible addressing modes in LDR/STR
    instructions.
    llvm-svn: 207838
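    A sketch of the addressing pattern this enables (illustrative,
    2014-era IR syntax):

        ; The pointer increment can fold into the store as a single
        ; post-indexed STR, e.g. "str q0, [x0], #16".
        define <4 x i32>* @store_post_inc(<4 x i32>* %p, <4 x i32> %v) {
          store <4 x i32> %v, <4 x i32>* %p
          %next = getelementptr <4 x i32>* %p, i64 1
          ret <4 x i32>* %next
        }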
* Allow SelectionDAG::FoldConstantArithmetic to work when it's called with a vector VT but scalar values. [Benjamin Kramer, 2014-05-02, 1 file, -0/+10]
    llvm-svn: 207835
* [IR] Make {extract,insert}element accept an index of any integer type. [Michael J. Spencer, 2014-05-01, 1 file, -0/+16]
    Given the following C code, LLVM currently generates suboptimal code
    for x86-64:

        __m128 bss4( const __m128 *ptr, size_t i, size_t j )
        {
            float f = ptr[i][j];
            return (__m128) { f, f, f, f };
        }

    =================================================

        define <4 x float> @_Z4bss4PKDv4_fmm(<4 x float>* nocapture readonly %ptr, i64 %i, i64 %j) #0 {
          %a1 = getelementptr inbounds <4 x float>* %ptr, i64 %i
          %a2 = load <4 x float>* %a1, align 16, !tbaa !1
          %a3 = trunc i64 %j to i32
          %a4 = extractelement <4 x float> %a2, i32 %a3
          %a5 = insertelement <4 x float> undef, float %a4, i32 0
          %a6 = insertelement <4 x float> %a5, float %a4, i32 1
          %a7 = insertelement <4 x float> %a6, float %a4, i32 2
          %a8 = insertelement <4 x float> %a7, float %a4, i32 3
          ret <4 x float> %a8
        }

    =================================================

        shlq    $4, %rsi
        addq    %rdi, %rsi
        movslq  %edx, %rax
        vbroadcastss    (%rsi,%rax,4), %xmm0
        retq

    =================================================

    The movslq is unneeded, but is present because of the trunc to i32
    and then sext back to i64 that the backend adds for vbroadcastss. We
    can't remove it because it changes the meaning. The IR that clang
    generates is already suboptimal. What clang really should emit is:

        %a4 = extractelement <4 x float> %a2, i64 %j

    This patch makes that legal. A separate patch will teach clang to do
    it.

    Differential Revision: http://reviews.llvm.org/D3519
    llvm-svn: 207801
* Add basic functionality for assignment of ints. [Reed Kotler, 2014-05-01, 1 file, -0/+15]
    This creates a lot of core infrastructure in which to add, with
    little effort, quite a bit more to mips fast-isel.

    Test Plan: simplestore.ll
    Reviewers: dsanders
    Reviewed By: dsanders
    Differential Revision: http://reviews.llvm.org/D3527
    llvm-svn: 207790
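    A guess at the shape of the test (hypothetical; simplestore.ll's
    actual contents are not shown here): a trivial integer assignment
    that mips fast-isel must now select instead of falling back.

        define void @si(i32* %p) {
        entry:
          store i32 12345, i32* %p
          ret void
        }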
* R600/SI: Fix verifier error with pseudo store instructions. [Matt Arsenault, 2014-05-01, 3 files, -4/+4]
    Use i32 instead of specifying SReg_32. When this is the pseudo
    INDIRECT_BASE_ADDR, this would give a bogus verifier error.
    llvm-svn: 207770
* [ARM64] Prefer generation of bzero on Darwin only [Bradley Smith, 2014-05-01, 1 file, -5/+12]
    llvm-svn: 207760
* AArch64/ARM64: print BFM instructions as BFI or BFXIL [Tim Northover, 2014-05-01, 3 files, -43/+32]
    The canonical form of the BFM instruction is always one of the more
    explicit extract or insert operations, which makes reading output
    much easier.
    llvm-svn: 207752
* [ARM64] Prevent bit extraction from being adjusted by a following shift [Weiming Zhao, 2014-04-30, 1 file, -0/+13]
    For a pattern like ((x >> C1) & Mask) << C2, the DAG combiner may
    convert it into (x >> (C1-C2)) & (Mask << C2), which makes pattern
    matching of ubfx more difficult.

    For example, given

        %shr = lshr i64 %x, 4
        %and = and i64 %shr, 15
        %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, i64 2, i64 %and
        %0 = load i64* %arrayidx

    with the current shift folding, it takes 3 instrs to compute the
    base address:

        lsr x8, x0, #1
        and x8, x8, #0x78
        add x8, x9, x8

    If using ubfx, it only needs 2 instrs:

        ubfx x8, x0, #4, #4
        add x8, x9, x8, lsl #3

    This fixes bug 19589.
    llvm-svn: 207702
* [X86] Never hoist the shift value of a shift instruction. [Michael Zolotukhin, 2014-04-30, 1 file, -4/+6]
    There is no need to check if we want to hoist the immediate value of
    a shift instruction. Simply return TCC_Free right away.

    This change is like r206101, but for X86.

    rdar://problem/16190769
    llvm-svn: 207692
* ARM64: print fp immediates without using scientific notation. [Tim Northover, 2014-04-30, 3 files, -6/+6]
    llvm-svn: 207669
* R600/SI: Use VALU instructions for copying i1 values [Tom Stellard, 2014-04-30, 1 file, -0/+39]
    We can't use SALU instructions for this since they ignore the EXEC
    mask and are always executed.

    This fixes several OpenCV tests.
    llvm-svn: 207661
* R600/SI: Teach moveToVALU how to handle some SMRD instructions [Tom Stellard, 2014-04-30, 1 file, -0/+28]
    llvm-svn: 207660
* [ARM64][fast-isel] Fast-isel doesn't know how to handle f128. [Chad Rosier, 2014-04-30, 1 file, -1/+33]
    llvm-svn: 207659
* [mips] Fix MipsLongBranch pass to work when the offset from the branch to the target cannot be determined accurately. [Sasa Stankovic, 2014-04-30, 2 files, -16468/+127]
    This is the case for NaCl, where the sandboxing instructions are
    added in the MC layer, after the MipsLongBranch pass. It is also the
    case when the code has inline assembly. Instead of calculating the
    offset in the MipsLongBranch pass, use %hi(sym1 - sym2) and
    %lo(sym1 - sym2) expressions that are resolved during the fixup.

    This patch also deletes the microMIPS test file
    test/CodeGen/Mips/micromips-long-branch.ll and implements microMIPS
    CHECKs in a much simpler way in the file
    test/CodeGen/Mips/longbranch.ll, together with MIPS32 and MIPS64.
    llvm-svn: 207656
* ARM64: print lsr instead of lsrv for variable shifts (etc) [Tim Northover, 2014-04-30, 2 files, -18/+18]
    The canonical syntax for shifts by a variable amount does not end
    with 'v', but that syntax should be supported as an alias (presumably
    for legacy reasons).
    llvm-svn: 207649
* AArch64/ARM64: use HS instead of CS & LO instead of CC. [Tim Northover, 2014-04-30, 3 files, -17/+17]
    On instructions using the NZCV register, a couple of conditions have
    dual representations: HS/CS and LO/CC (meaning
    unsigned-higher-or-same/carry-set and unsigned-lower/carry-clear).
    The first of these is more descriptive in most circumstances, so we
    should print it.
    llvm-svn: 207644
* [mips][msa] Fix vector insertions where the index is variable [Daniel Sanders, 2014-04-30, 2 files, -0/+175]
    Summary:
    This isn't supported directly, so we rotate the vector by the desired
    number of elements, insert to element zero, then rotate back.

    The i64 case generates rather poor code on MIPS32. There is an
    obvious optimisation to be made in future (do both insert.w's inside
    a shared rotate/unrotate sequence) but for now it's sufficient to
    select valid code instead of aborting.

    Depends on D3536

    Reviewers: matheusalmeida
    Reviewed By: matheusalmeida
    Differential Revision: http://reviews.llvm.org/D3537
    llvm-svn: 207640
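    The case being fixed, sketched as IR (illustrative):

        ; %idx is not a compile-time constant, so MSA lowers this by
        ; rotating %v so lane %idx reaches lane zero, inserting there,
        ; then rotating back.
        define <4 x i32> @insert_variable(<4 x i32> %v, i32 %x, i32 %idx) {
          %r = insertelement <4 x i32> %v, i32 %x, i32 %idx
          ret <4 x i32> %r
        }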
* ARM64: use hex immediates for movz/movk instructions [Tim Northover, 2014-04-30, 16 files, -121/+121]
    Since these are mostly used in "lsl #16", "lsl #32", "lsl #48"
    combinations to piece together an immediate in 16-bit chunks, hex is
    probably the most appropriate format.
    llvm-svn: 207635
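    A sketch of the printing change (illustrative; exact instruction
    selection and formatting may differ):

        ; Materializing 0xdeadbeef is now printed with hex chunks,
        ; roughly: movz x0, #0xbeef / movk x0, #0xdead, lsl #16
        ; instead of the decimal #48879 / #57005 forms.
        define i64 @build_imm() {
          ret i64 3735928559
        }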
* ARM64: hexify printing various immediate operands [Tim Northover, 2014-04-30, 7 files, -17/+17]
    This is mostly aimed at the NEON logical operations and MOVI/MVNI
    (since they accept weird shifts which are more naturally
    understandable in hex notation).

    Also changes BRK/HINT etc, which is probably a neutral change, but
    easier than the alternative.
    llvm-svn: 207634
* ARM64: print canonical syntax for add/sub (imm) instructions. [Tim Northover, 2014-04-30, 7 files, -30/+30]
    Since these instructions only accept a 12-bit immediate, possibly
    shifted left by 12, the canonical syntax used by the architecture
    reference manual is "#N {, lsl #12 }". We should accept an immediate
    that has already been shifted (e.g. ...). Also, print a comment
    giving the full addend, since it can be helpful.
    llvm-svn: 207633
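    A sketch of the printed form (illustrative):

        ; An add of 4096 selects the shifted-immediate encoding and is
        ; now printed canonically with the full-addend comment, roughly:
        ;   add w0, w0, #1, lsl #12    ; =4096
        define i32 @add_page(i32 %x) {
          %r = add i32 %x, 4096
          ret i32 %r
        }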
* [ARM64] Ensure arm64_be is dealt with when emitting debug info. [James Molloy, 2014-04-30, 1 file, -0/+1]
    This is a partial port of r204816 (cpirker, "Elf support for MC-JIT
    runtime dynamic linker") from AArch64 to ARM64.
    llvm-svn: 207625
* ARM64: make sure FastISel uses a GPR64 source in 64-bit extensions. [Tim Northover, 2014-04-30, 1 file, -8/+34]
    llvm-svn: 207620
* ARM: support stack probe emission for Windows on ARM [Saleem Abdulrasool, 2014-04-30, 1 file, -0/+24]
    This introduces the stack lowering emission of the stack probe
    function for Windows on ARM.

    The stack on Windows on ARM is dynamically paged: any allocation
    that crosses a page boundary into the following guard page will
    cause a page fault, which must be handled by the kernel to ensure
    that the page is faulted in. If this does not occur and a write
    accesses any memory beyond that, the page fault will go unserviced,
    resulting in an abnormal program termination.

    The watermark for the stack probe appears to be at 4080 bytes (to
    accommodate the stack guard canaries and stack alignment) when SSP
    is enabled. Otherwise, the stack probe is emitted on the page size
    boundary of 4096 bytes.
    llvm-svn: 207615
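    A sketch of a frame that should trigger a probe (hypothetical test;
    the probe symbol on Windows is assumed to be __chkstk):

        ; An 8 KiB local exceeds the 4096-byte watermark, so the
        ; prologue must call the stack probe before touching the frame.
        define void @big_frame() {
        entry:
          %buf = alloca [8192 x i8], align 4
          ret void
        }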
* ARM: partially handle 32-bit relocations for WoA [Saleem Abdulrasool, 2014-04-30, 1 file, -0/+27]
    IMAGE_REL_ARM_MOV32T relocations require that the movw/movt pair-wise
    relocation is not split up and reordered. When expanding the mov32imm
    pseudo-instruction, create a bundle if the machine operand is
    referencing an address. This helps ensure that the relocatable
    address load is not reordered by subsequent passes.

    Unfortunately, this only partially handles the case, as the Constant
    Island Pass occurs after the instructions are unbundled and does not
    properly handle bundles. That is a more fundamental issue with the
    pass itself and is beyond the scope of this change.
    llvm-svn: 207608
* Implement X86 code generation for musttail [Reid Kleckner, 2014-04-29, 3 files, -4/+226]
    Currently, musttail codegen relies on sibcall optimization and
    reports a fatal error if it fails. Sibcall optimization fails when
    stack arguments need to be modified, which is insufficient for
    musttail.

    The logic for moving arguments in memory safely is already
    implemented for GuaranteedTailCallOpt. This change merely arranges
    for musttail calls to use it. No functional change for
    GuaranteedTailCallOpt.

    Reviewers: espindola
    Differential Revision: http://reviews.llvm.org/D3493
    llvm-svn: 207598
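    A minimal sketch of the construct (illustrative):

        declare i32 @callee(i32*, i32)

        ; musttail requires the tail call to be honored even when stack
        ; arguments must be moved, which sibcall optimization rejects.
        define i32 @caller(i32* %p, i32 %x) {
          %r = musttail call i32 @callee(i32* %p, i32 %x)
          ret i32 %r
        }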
* R600/SI: Custom lower SI_IF and SI_ELSE to avoid machine verifier errors [Tom Stellard, 2014-04-29, 25 files, -25/+25]
    SI_IF and SI_ELSE are terminators which also produce a value. For
    these instructions, ISel always inserts a COPY to move their value
    to another basic block. This COPY ends up between SI_(IF|ELSE) and
    the S_BRANCH* instruction at the end of the block.

    This breaks MachineBasicBlock::getFirstTerminator() and also the
    machine verifier, which assumes that terminators are grouped
    together at the end of blocks.

    To solve this, we coalesce the COPY away right after ISel to make
    sure there are no instructions in between terminators at the end of
    blocks.
    llvm-svn: 207591
* R600/SI: Only select SALU instructions in the entry or exit block [Tom Stellard, 2014-04-29, 1 file, -0/+27]
    SALU instructions ignore control flow, so it is not always safe to
    use them within branches. This is a partial solution to this problem
    until we can come up with something better.
    llvm-svn: 207590
* R600: optimize the UDIVREM 64 algorithm [Tom Stellard, 2014-04-29, 1 file, -0/+84]
    This is a squash of several optimization commits:
     - calculate DIV_Lo and DIV_Hi separately
     - use BFE_U32 if we are operating on 32bit values
     - use precomputed constants instead of shifting in UDIVREM
     - skip the first 32 iterations of udivrem

    v2: Check whether BFE is supported before using it

    Patch by: Jan Vesely
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 207589
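    A sketch of code exercising the expanded path (illustrative,
    2014-era IR syntax):

        ; Computing both quotient and remainder goes through the 64-bit
        ; UDIVREM expansion on R600.
        define void @udivrem64(i64 addrspace(1)* %out, i64 %a, i64 %b) {
          %q = udiv i64 %a, %b
          %r = urem i64 %a, %b
          store i64 %q, i64 addrspace(1)* %out
          %out1 = getelementptr i64 addrspace(1)* %out, i64 1
          store i64 %r, i64 addrspace(1)* %out1
          ret void
        }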