bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA	Tom Stellard	2016-09-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: arsenm, wdng, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D24405 llvm-svn: 281080
*	[Sparc][LEON] Removed the parts of the errata fixes implemented using inline ↵	Chris Dewhurst	2016-09-09	3	-68/+9
\| \| \| \| \| \|	assembly as this is not the desired behaviour for end-users. Small change to a unit test to implement this without requiring the inline assembly. llvm-svn: 281047
*	[ARM] ADD with a negative offset can become SUB for free	James Molloy	2016-09-09	1	-0/+17
\| \| \| \| \| \|	So model that directly in TTI::getIntImmCost(). llvm-svn: 281044
*	[ARM] icmp %x, -C can be lowered to a simple ADDS or CMN	James Molloy	2016-09-09	1	-0/+34
\| \| \| \| \| \|	Tell TargetTransformInfo about this so ConstantHoisting is informed. llvm-svn: 281043
*	[SelectionDAG] Ensure DAG::getZeroExtendInReg is called with a scalar type	Simon Pilgrim	2016-09-09	1	-0/+107
\| \| \| \| \| \|	Fixes issue with rL280927 identified by Mikael Holmén llvm-svn: 281042
*	[Thumb] Select (CMPZ X, -C) -> (CMPZ (ADDS X, C), 0)	James Molloy	2016-09-09	5	-6/+39
\| \| \| \| \| \|	The CMPZ #0 disappears during peepholing, leaving just a tADDi3, tADDi8 or t2ADDri. This avoids having to materialize the expensive negative constant in Thumb-1, and allows a shrinking from a 32-bit CMN to a 16-bit ADDS in Thumb-2. llvm-svn: 281040
*	GlobalISel: remove G_TYPE and G_PHI	Tim Northover	2016-09-09	10	-17/+17
\| \| \| \| \| \| \| \|	These instructions were only necessary when type information was stored in the MachineInstr (because only generic MachineInstrs possessed a type). Now that it's in MachineRegisterInfo, COPY and PHI work fine. llvm-svn: 281037
*	GlobalISel: move type information to MachineRegisterInfo.	Tim Northover	2016-09-09	26	-747/+755
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We want each register to have a canonical type, which means the best place to store this is in MachineRegisterInfo rather than on every MachineInstr that happens to use or define that register. Most changes following from this are pretty simple (you need an MRI anyway if you're going to be doing any transformations, so just check the type there). But legalization doesn't really want to check redundant operands (when, for example, a G_ADD only ever has one type) so I've made use of MCInstrDesc's operand type field to encode these constraints and limit legalization's work. As an added bonus, more validation is possible, both in MachineVerifier and MachineIRBuilder (coming soon). llvm-svn: 281035
*	Revert "[mips] Fix c.<cc>.<fmt> instruction definition."	Simon Dardis	2016-09-09	2	-57/+85
\| \| \| \| \| \| \|	This reverts commit r281022. Mips buildbot broke, due to unhandled register class FCC. llvm-svn: 281033
*	[AMDGPU] Assembler: rename amd_kernel_code_t asm names according to spec	Sam Kolton	2016-09-09	4	-45/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Also removed duplicate code from AMDGPUTargetAsmStreamer. This change only changes how amd_kernel_code_t is parsed and printed. No variable names are changed. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D24296 llvm-svn: 281028
*	[Thumb1] Teach optimizeCompareInstr about thumb1 compares	James Molloy	2016-09-09	2	-5/+57
\| \| \| \| \| \| \| \|	This avoids us doing a completely unneeded "cmp r0, #0" after a flag-setting instruction if we only care about the Z or C flags. Add LSL/LSR to the whitelist while we're here and add testing. This code could really do with a spring clean. llvm-svn: 281027
*	[mips] Fix c.<cc>.<fmt> instruction definition.	Simon Dardis	2016-09-09	2	-85/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of this effort, remove MipsFCmp nodes and use tablegen patterns rather than custom lowering through C++. Unexpectedly, this improves codesize for microMIPS as previous floating point setcc expansions would materialize 0 and 1 into GPRs before using the relevant mov[tf].[sd] instruction. Now $zero is used directly. Reviewers: dsanders, vkalintiris, zoran.jovanovic Differential Review: https://reviews.llvm.org/D23118 llvm-svn: 281022
*	[Sparc][LEON] Unit test for CASA instruction supported by some LEON ↵	Chris Dewhurst	2016-09-09	1	-0/+14
\| \| \| \| \| \|	processors added. llvm-svn: 281021
*	[AVX-512] Add VPCMP instructions to the load folding tables and make them ↵	Craig Topper	2016-09-09	1	-12/+6
\| \| \| \| \| \|	commutable. llvm-svn: 281013
*	[AVX-512] Add more integer vector comparison tests with loads. Some of these ↵	Craig Topper	2016-09-09	1	-0/+198
\| \| \| \| \| \| \| \|	show opportunities where we can commute to fold loads. Commutes will be added in a followup commit. llvm-svn: 281012
*	[X86] Add more baseline tests for "irregular" shuffles. NFC.	Michael Kuperstein	2016-09-09	1	-13/+1136
\| \| \| \| \| \| \|	This adds more tests for shuffles where the output width does not match the input width and/or the output is generated from more than two inputs. llvm-svn: 281005
*	Win64: Don't use REX prefix for direct tail calls	Hans Wennborg	2016-09-08	3	-3/+3
\| \| \| \| \| \| \| \| \| \|	The REX prefix should be used on indirect jmps, but not direct ones. For direct jumps, the unwinder looks at the offset to determine if it's inside the current function. Differential Revision: https://reviews.llvm.org/D24359 llvm-svn: 281003
*	[RDF] Further improve handling of multiple phis reached from shadows	Krzysztof Parzyszek	2016-09-08	1	-0/+40
\| \| \| \|	llvm-svn: 280987
*	[Hexagon] Expand sext- and zextloads of vector types, not just extloads	Krzysztof Parzyszek	2016-09-08	1	-0/+10
\| \| \| \| \| \|	Recent change exposed this issue, breaking the Hexagon buildbots. llvm-svn: 280973
*	AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32	Matt Arsenault	2016-09-08	2	-0/+32
\| \| \| \|	llvm-svn: 280972
*	AMDGPU: Support commuting with immediate in src0	Matt Arsenault	2016-09-08	30	-79/+80
\| \| \| \|	llvm-svn: 280970
*	Revert "[XRay] ARM 32-bit no-Thumb support in LLVM"	Renato Golin	2016-09-08	2	-48/+0
\| \| \| \| \| \| \| \| \| \|	And associated commits, as they broke the Thumb bots. This reverts commit r280935. This reverts commit r280891. This reverts commit r280888. llvm-svn: 280967
*	[SDAGBuilder] Don't create a binary tree for switches in minsize mode	James Molloy	2016-09-08	1	-0/+34
\| \| \| \| \| \|	This bloats codesize - all of the non-leaf nodes are extra code. llvm-svn: 280932
*	[Thumb1] AND with a constant operand can be converted into BIC	James Molloy	2016-09-08	1	-0/+18
\| \| \| \| \| \| \|	So model the cost of materializing the constant operand C as the minimum of C and ~C. llvm-svn: 280929
*	[Thumb1] Fix cost calculation for complemented immediates	James Molloy	2016-09-08	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \|	Materializing something like "-3" can be done as 2 instructions: MOV r0, #3; MVN r0, r0. This has a cost of 2, not 3. It looks like we were already trying to detect this pattern in TII::getIntImmCost(), but were taking the complement of the zero-extended value instead of the sign-extended value which is unlikely to ever produce a number < 256. There were no tests failing after changing this... :/ llvm-svn: 280928
*	[SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and ↵	Simon Pilgrim	2016-09-08	3	-14/+10
\| \| \| \| \| \| \| \| \| \| \| \|	SimplifyDemandedBits Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element. This is an initial step towards determining the sign bits of a vector (PR29079). Differential Revision: https://reviews.llvm.org/D24253 llvm-svn: 280927
*	[DAGCombiner] Enable AND combines of splatted constant vectors	Simon Pilgrim	2016-09-08	1	-4/+2
\| \| \| \| \| \| \| \|	Allow AND combines to use a vector splatted constant as well as a constant scalar. Preliminary part of D24253. llvm-svn: 280926
*	Revert "[ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)"	Pablo Barrio	2016-09-08	1	-43/+0
\| \| \| \| \| \| \| \| \| \|	This reverts commit r280808. It is possible that this change results in an infinite loop. This is causing timeouts in some tests on ARM, and a Chromebook bot is failing. llvm-svn: 280918
*	[CGP] Be less conservative about tail-duplicating a ret to allow tail calls	Michael Kuperstein	2016-09-08	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CGP tail-duplicates rets into blocks that end with a call that feeds the ret. This puts the call in tail position, potentially allowing the DAG builder to lower it as a tail call. To avoid tail duplication in cases where we won't form the tail call, CGP tries to predict whether this is going to be possible, and avoids doing it when lowering as a tail call will definitely fail. However, it was being too conservative by always throwing away calls to functions with a signext/zeroext attribute on the return type. Instead, we can use the same logic the builder uses to determine whether the attributes work out. Differential Revision: https://reviews.llvm.org/D24315 llvm-svn: 280894
*	[XRay] ARM 32-bit no-Thumb support in LLVM	Dean Michael Berris	2016-09-08	2	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \|	This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter. This is one of 3 commits to different repositories of XRay ARM port. The other 2 are: 1. https://reviews.llvm.org/D23932 (Clang test) 2. https://reviews.llvm.org/D23933 (compiler-rt) Differential Revision: https://reviews.llvm.org/D23931 llvm-svn: 280888
*	Shift-left (ISD::SHL) operation crashes on "DAG Legalization" phase.	Elena Demikhovsky	2016-09-07	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \|	https://llvm.org/bugs/show_bug.cgi?id=29058. During legalization of a node we tried to legalize its operands. If an operand node is replaced during legalization, the user node may be destroyed. Differential Revision: https://reviews.llvm.org/D24244 llvm-svn: 280862
*	[RDF] Fix liveness analysis for phi nodes with shadow uses	Krzysztof Parzyszek	2016-09-07	1	-0/+64
\| \| \| \| \| \| \| \|	Shadow uses need to be analyzed together, since each individual shadow will only have a partial reaching def. All shadows together may cover a given register ref, while each individual shadow may not. llvm-svn: 280855
*	Rename test pr30298.ll to shrink_vmul_sse.ll, to make the name more ↵	Wei Mi	2016-09-07	1	-0/+4
\| \| \| \| \| \| \| \|	meaningful, NFC. Add PR number and comment in pr30298.ll to explain what it is testing. llvm-svn: 280843
*	Don't reduce the width of vector mul if the target doesn't support SSE2.	Wei Mi	2016-09-07	1	-0/+43
\| \| \| \| \| \| \| \| \|	The patch is to fix PR30298, which is caused by rL272694. The solution is to bail out if the target has no SSE2. Differential Revision: https://reviews.llvm.org/D24288 llvm-svn: 280837
*	Add more triple to conditional-tailcall.ll test	Hans Wennborg	2016-09-07	1	-1/+1
\| \| \| \|	llvm-svn: 280835
*	CodeGen: ensure that libcalls are always AAPCS CC	Saleem Abdulrasool	2016-09-07	2	-2/+2
\| \| \| \| \| \| \|	The original commit was too aggressive about marking LibCalls as AAPCS. The libcalls contain libc/libm/libunwind calls which are not AAPCS, but C. llvm-svn: 280833
*	X86: Fold tail calls into conditional branches where possible (PR26302)	Hans Wennborg	2016-09-07	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When branching to a block that immediately tail calls, it is possible to fold the call directly into the branch if the call is direct and there is no stack adjustment, saving one byte.
	Example:
	  define void @f(i32 %x, i32 %y) {
	  entry:
	    %p = icmp eq i32 %x, %y
	    br i1 %p, label %bb1, label %bb2
	  bb1:
	    tail call void @foo()
	    ret void
	  bb2:
	    tail call void @bar()
	    ret void
	  }
	before:
	  f:
	    movl 4(%esp), %eax
	    cmpl 8(%esp), %eax
	    jne .LBB0_2
	    jmp foo
	  .LBB0_2:
	    jmp bar
	after:
	  f:
	    movl 4(%esp), %eax
	    cmpl 8(%esp), %eax
	    jne bar
	  .LBB0_1:
	    jmp foo
	I don't expect any significant size savings from this (on a Clang bootstrap I saw 288 bytes), but it does make the code a little tighter. This patch only does 32-bit, but 64-bit would work similarly. Differential Revision: https://reviews.llvm.org/D24108 llvm-svn: 280832
*	AMDGPU: Add hidden kernel arguments to runtime metadata	Yaxun Liu	2016-09-07	1	-31/+1326
\| \| \| \| \| \| \| \|	OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden arguments should be included in the runtime metadata. Also updated kernel argument kind metadata. Differential Revision: https://reviews.llvm.org/D23424 llvm-svn: 280829
*	Fix typo in test - it should be masking bits0-15 not bit16	Simon Pilgrim	2016-09-07	1	-1/+1
\| \| \| \|	llvm-svn: 280816
*	[X86][SSE] Added or combine tests for known bits of vectors	Simon Pilgrim	2016-09-07	1	-0/+51
\| \| \| \| \| \|	Part of the yak shaving for D24253 llvm-svn: 280813
*	[X86][SSE] Added and+or+zext combine tests for known bits of vectors	Simon Pilgrim	2016-09-07	1	-0/+32
\| \| \| \| \| \|	Part of the yak shaving for D24253 llvm-svn: 280810
*	[X86][SSE] Added and+or combine tests currently failing with vectors	Simon Pilgrim	2016-09-07	1	-1/+28
\| \| \| \| \| \| \| \|	(and (or x, C), D) -> D if (C & D) == D Part of the yak shaving for D24253 llvm-svn: 280809
*	[ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)	Pablo Barrio	2016-09-07	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This saves a library call to __aeabi_uidivmod. However, the processor must feature hardware division in order to benefit from the transformation. Reviewers: scott-0, jmolloy, compnerd, rengolin Subscribers: t.p.northover, compnerd, aemerson, rengolin, samparker, llvm-commits Differential Revision: https://reviews.llvm.org/D24133 llvm-svn: 280808
*	[mips] Disable the TImode shift libcalls for 32-bit targets.	Vasileios Kalintiris	2016-09-07	3	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The o32 ABI does not support the TImode helpers. For the time being, disable just the shift libcalls as they break recursive builds on MIPS. Reviewers: sdardis Subscribers: llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D24259 llvm-svn: 280798
*	[PowerPC] Fix address-offset folding for plain addi	Hal Finkel	2016-09-07	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When folding an addi into a memory access that can take an immediate offset, we were implicitly assuming that the existing offset was zero. This was incorrect. If we're dealing with an addi with a plain constant, we can add it to the existing offset (assuming that doesn't overflow the immediate, etc.), but if we have anything else (i.e. something that will become a relocation expression), we'll go back to requiring the existing immediate offset to be zero (because we don't know what the requirements on that relocation expression might be - e.g. maybe it is paired with some addis in some relevant way). On the other hand, when dealing with a plain addi with a regular constant immediate, the alignment restrictions (from the TOC base pointer, etc.) are irrelevant. I've added the test case from PR30280, which demonstrated the bug, but also demonstrates a missed optimization opportunity (i.e. we don't need the memory accesses at all). Fixes PR30280. llvm-svn: 280789
*	AVX512F: FMA intrinsic + FNEG - sequence optimization	Elena Demikhovsky	2016-09-07	1	-20/+15
\| \| \| \| \| \| \| \| \| \| \|	The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover the AVX-512F (KNL) set. The FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000)))). This happens because FP XOR is not supported for 512-bit data types on KNL, so we use integer XOR instead. I added a pattern match for integer XOR. Differential Revision: https://reviews.llvm.org/D24221 llvm-svn: 280785
*	[AVX-512] Add support for commuting masked instructions in ↵	Craig Topper	2016-09-07	1	-0/+52
\| \| \| \| \| \|	findCommutedOpIndices. The default implementation doesn't skip the mask input or the preserved input. llvm-svn: 280781
*	Revert "CodeGen: ensure that libcalls are always AAPCS CC"	Saleem Abdulrasool	2016-09-07	5	-176/+171
\| \| \| \| \| \| \|	This reverts SVN r280683. Revert until I figure out why this is breaking lli tests. llvm-svn: 280778
*	[DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)	Hal Finkel	2016-09-06	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I might have called this "r246507, the sequel". It fixes the same issue, as the issue has cropped up in a few more places. The underlying problem is that isSetCCEquivalent can pick up select_cc nodes with a result type that is not legal for a setcc node to have, and if we use that type to create new setcc nodes, nothing fixes that (and so we've violated the contract that the infrastructure has with the backend regarding setcc node types).
	Fixes PR30276.
	For convenience, here's the commit message from r246507, which explains the problem in greater detail:
	[DAGCombine] Fixup SETCC legality checking
	SETCC is one of those special node types for which operation actions (legality, etc.) is keyed off of an operand type, not the node's value type. This makes sense because the value type of a legal SETCC node is determined by its operands' value type (via the TLI function getSetCCResultType). When the SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value type, or directly with the value type provided by TLI.getSetCCResultType.
	The first problem being fixed here is that DAGCombine had several places querying TLI.isOperationLegal on SETCC, but providing the return of getSetCCResultType, instead of the operand type directly. This does not mean what the author thought, and "luckily", most in-tree targets have SETCC with Custom lowering, instead of marking them Legal, so these checks return false anyway.
	The second problem being fixed here is that two of the DAGCombines could create SETCC nodes with arbitrary (integer) value types; specifically, those that would simplify:
	  (setcc a, b, op1) and\|or (setcc a, b, op2) -> setcc a, b, op3
	(which is possible for some combinations of (op1, op2))
	If the operands of the and\|or node are actual setcc nodes, then this is not an issue (because the and\|or must share the same type), but, the relevant code in DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls DAGCombiner::isSetCCEquivalent on each operand, and that function will recognise setcc-like select_cc nodes with other return types. And, thus, when creating new SETCC nodes, we need to be careful to respect the value-type constraint. This is even true before type legalization, because it is quite possible for the SELECT_CC node to have a legal type that does not happen to match the corresponding TLI.getSetCCResultType type.
	To be explicit, there is nothing that later fixes the value types of SETCC nodes (if the type is legal, but does not happen to match TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to work only because, either MVT::i1 is not legal, or it is what TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change, however. For the time being, restrict the relevant transformations to produce only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1 prior to type legalization).
	Fixes PR24636.
	llvm-svn: 280767
*	[AMDGPU] Wave and register controls	Konstantin Zhuravlyov	2016-09-06	1	-0/+190
\| \| \| \| \| \|	- Add missing test llvm-svn: 280749