bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[AArch64][FastISel] Don't even try to select vector icmps.	Ahmed Bougacha	2015-11-06	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	We used to try to constant-fold them to i32 immediates. Given that fast-isel doesn't otherwise support vNi1, when selecting the result users, we'd fallback to SDAG anyway. However, if the users were in another block, we'd insert broken cross-class copies (GPR32 to FPR64). Give up, let SDAG agree with itself on a vNi1 legalization strategy. llvm-svn: 252364
*	[X86] Fold (trunc (i32 (zextload i16))) into vbroadcast.	Ahmed Bougacha	2015-11-06	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \|	When matching non-LSB-extracting truncating broadcasts, we now insert the necessary SRL. If the scalar resulted from a load, the SRL will be folded into it, creating a narrower, offset, load. However, i16 loads aren't Desirable, so we get i16->i32 zextloads. We already catch i16 aextloads; catch these as well. llvm-svn: 252363
*	[X86] SRL non-LSB extracts when folding to truncating broadcasts.	Ahmed Bougacha	2015-11-06	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \|	Now that we recognize this, we can support it instead of bailing out. That is, we can fold: (v8i16 (shufflevector (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))), <1,1,...,1>)) into: (v8i16 (vbroadcast (i16 (trunc (srl Y, 16))))) llvm-svn: 252362
*	[X86] Don't fold non-LSB extracts into truncating broadcasts.	Ahmed Bougacha	2015-11-06	1	-12/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We used to incorrectly assume that the offset we're extracting from was a multiple of the element size. So, we'd fold: (v8i16 (shufflevector (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))), <1,1,...,1>)) into: (v8i16 (vbroadcast (i16 (trunc Y)))) whereas we should have extracted the higher bits from X. Instead, bail out if the assumption doesn't hold. llvm-svn: 252361
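The lane arithmetic behind this fix can be checked with a small sketch. It assumes a little-endian layout (as on x86). The helper name `lane16` and the sample values are hypothetical and are not part of LLVM.

```python
import struct

def lane16(words32, index):
    """Return 16-bit lane `index` of a little-endian vector of 32-bit words."""
    raw = struct.pack("<%dI" % len(words32), *words32)
    return struct.unpack_from("<H", raw, 2 * index)[0]

# Hypothetical sample values for the build_vector operands X and Y.
X, Y = 0xAAAA1111, 0xBBBB2222
vec = [X, Y, 0, 0]  # v4i32 build_vector, reinterpreted below as v8i16

hi_x = (X >> 16) & 0xFFFF  # what lane 1 actually holds
lo_y = Y & 0xFFFF          # what the old fold broadcast instead
lane1 = lane16(vec, 1)
```

Under these assumptions lane 1 equals the high half of X, while `trunc Y` corresponds to lane 2, which is why the fold was only valid for offsets that are a multiple of the source element size.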
*	AMDGPU/SI: Refactor VOP[12C] tablegen definitions	Tom Stellard	2015-11-06	2	-97/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Pass the VOPProfile object all the way through to *_m multiclasses. This will allow us to do more simplifications in the future. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13437 llvm-svn: 252339
*	Improved the operands commute transformation for X86-FMA3 instructions.	Andrew Kaylor	2015-11-06	3	-80/+411
\| \| \| \| \| \| \| \| \| \| \| \|	All 3 operands of FMA3 instructions are commutable now. Patch by Slava Klochkov Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab). Differential Revision: http://reviews.llvm.org/D13269 llvm-svn: 252335
*	[WebAssembly] Make expression-stack pushing explicit	Dan Gohman	2015-11-06	1	-7/+19
\| \| \| \| \| \| \| \| \|	Modelling of the expression stack is evolving. This patch takes another step by making pushes explicit. Differential Revision: http://reviews.llvm.org/D14338 llvm-svn: 252334
*	AMDGPU: Cleanup includes	Matt Arsenault	2015-11-06	2	-6/+4
\| \| \| \|	llvm-svn: 252328
*	AMDGPU: Create emergency stack slots during frame lowering	Matt Arsenault	2015-11-06	7	-14/+89
\| \| \| \| \| \|	Test has a bogus verifier error which will be fixed by later commits. llvm-svn: 252327
*	AMDGPU: Remove unused scratch resource operands	Matt Arsenault	2015-11-06	2	-75/+131
\| \| \| \| \| \|	The SGPR spill pseudos don't actually use them. llvm-svn: 252324
*	AMDGPU: Add pass to detect used kernel features	Matt Arsenault	2015-11-06	4	-0/+138
\| \| \| \| \| \| \| \| \| \| \|	Mark kernels that use certain features that require user SGPRs to support with kernel attributes. We need to know before instruction selection begins because it impacts the kernel calling convention lowering. For now this only detects the workitem intrinsics. llvm-svn: 252323
*	AMDGPU: Fix hardcoded alignment of spill.	Matt Arsenault	2015-11-06	2	-13/+12
\| \| \| \| \| \| \|	Instead of forcing 4 alignment when spilled, set register class alignments. llvm-svn: 252322
*	AMDGPU: Hack for VS_32 register pressure	Matt Arsenault	2015-11-06	2	-4/+17
\| \| \| \| \| \| \| \| \| \| \| \| \|	For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class. When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling. Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table. llvm-svn: 252321
*	[WinEH] Mark funclet entries and exits as clobbering all registers	Reid Kleckner	2015-11-06	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In this implementation, LiveIntervalAnalysis invents a few register masks on basic block boundaries that preserve no registers. The nice thing about this is that it prevents the prologue inserter from thinking it needs to spill all XMM CSRs, because it doesn't see any explicit physreg defs in the MI. Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer Subscribers: MatzeB, llvm-commits Differential Revision: http://reviews.llvm.org/D14407 llvm-svn: 252318
*	[AArch64]Enable the narrow ld promotion only on profitable microarchitectures	Jun Bum Lim	2015-11-06	1	-8/+22
\| \| \| \| \| \| \| \| \|	The benefit from converting narrow loads into a wider load (r251438) could be micro-architecturally dependent, as it assumes that a single load with two bitfield extracts is cheaper than two narrow loads. Currently, this conversion is enabled only in cortex-a57 on which performance benefits were verified. llvm-svn: 252316
*	[mips][ias] Range check uimm4 operands and fixed a bug this revealed.	Daniel Sanders	2015-11-06	3	-14/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The bug was that the sldi instructions have immediate widths dependent on their element size. So sldi.d has a 1-bit immediate and sldi.b has a 4-bit immediate. All of these were using 4-bit immediates previously. Reviewers: vkalintiris Subscribers: llvm-commits, atanasyan, dsanders Differential Revision: http://reviews.llvm.org/D14018 llvm-svn: 252297
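The two widths quoted above follow from the lane count. The sketch below assumes 128-bit MSA vectors and that the immediate only needs to index a lane. The function name `sldi_imm_bits` is hypothetical and not taken from the patch.

```python
def sldi_imm_bits(element_bits, vector_bits=128):
    """Bits needed to index one lane of a vector with the given element size."""
    lanes = vector_bits // element_bits
    # The largest lane index is lanes - 1; its bit length is the field width.
    return (lanes - 1).bit_length()

def check_sldi_imm(imm, element_bits):
    """Range-check an immediate against the width implied by the element size."""
    limit = 1 << sldi_imm_bits(element_bits)
    if not 0 <= imm < limit:
        raise ValueError("immediate %d out of range 0..%d" % (imm, limit - 1))
    return imm
```

With these assumptions, 8-bit elements give a 4-bit immediate and 64-bit elements give a 1-bit immediate, matching the sldi.b and sldi.d cases in the message.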
*	[mips][ias] Range check uimm3 operands.	Daniel Sanders	2015-11-06	2	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Reviewers: vkalintiris Subscribers: atanasyan, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D14016 llvm-svn: 252296
*	[mips][ias] Range check uimm2 operands and fix a bug this revealed.	Daniel Sanders	2015-11-06	9	-46/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The bug was that the MIPS32R6/MIPS64R6/microMIPS32R6 versions of LSA and DLSA (unlike the MSA version) failed to account for the off-by-one encoding of the immediate. The range is actually 1..4 rather than 0..3. Reviewers: vkalintiris Subscribers: atanasyan, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D14015 llvm-svn: 252295
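A minimal sketch of the off-by-one encoding described above, assuming a 2-bit field that stores the shift amount minus one. The function names are hypothetical and this is not the assembler's actual code.

```python
def encode_lsa_shift(sa):
    """Map a shift amount in 1..4 onto the 2-bit field value sa - 1."""
    if not 1 <= sa <= 4:
        raise ValueError("shift amount %d out of range 1..4" % sa)
    return sa - 1

def decode_lsa_shift(field):
    """Map a 2-bit field value in 0..3 back onto the shift amount field + 1."""
    if not 0 <= field <= 3:
        raise ValueError("field value %d out of range 0..3" % field)
    return field + 1
```

A range check of 0..3 on the written operand would therefore accept 0, which has no encoding, and reject 4, which does.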
*	[mips][ias] Range check uimmz operands.	Daniel Sanders	2015-11-06	2	-2/+34
\| \| \| \| \| \| \| \| \| \|	Reviewers: vkalintiris Subscribers: dsanders, atanasyan, llvm-commits Differential Revision: http://reviews.llvm.org/D14013 llvm-svn: 252294
*	[mips] Define patterns for the atomic_{load,store}_{8,16,32,64} nodes.	Vasileios Kalintiris	2015-11-06	3	-4/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without these patterns we would generate a complete LL/SC sequence. This would be problematic for memory regions marked as WRITE-only or READ-only, as the instructions LL/SC would read/write to the protected memory regions correspondingly. Reviewers: dsanders Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D14397 llvm-svn: 252293
*	AMDGPU/SI: Emit HSA kernels with symbol type STT_AMDGPU_HSA_KERNEL	Tom Stellard	2015-11-06	6	-0/+60
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13804 llvm-svn: 252291
*	[WinEH] Split EH_RESTORE out of CATCHRET for 32-bit EH	Reid Kleckner	2015-11-06	6	-52/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the EH_RESTORE x86 pseudo instr, which is responsible for restoring the stack pointers: EBP and ESP, and ESI if stack realignment is involved. We only need this on 32-bit x86, because on x64 the runtime restores CSRs for us. Previously we had to keep the CATCHRET instruction around during SEH so that we could convince X86FrameLowering to restore our frame pointers. Now we can split these instructions earlier. This was confusing, because we had a return instruction which wasn't really a return and was ultimately going to be removed by X86FrameLowering. This change also simplifies X86FrameLowering, which really shouldn't be building new MBBs. No observable functional change currently, but with the new register mask stuff in D14407, CATCHRET will become a register allocator barrier, and our existing tests rely on us having reasonable register allocation around SEH. llvm-svn: 252266
*	Remove windows line endings introduced by r252177. NFC.	Tim Northover	2015-11-05	9	-128/+128
\| \| \| \|	llvm-svn: 252217
*	[WinEH] Fix funclet prologues with stack realignment	Reid Kleckner	2015-11-05	3	-34/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already had a test for this for 32-bit SEH catchpads, but those don't actually create funclets. We had a bug that only appeared in funclet prologues, where we would establish EBP and ESI as our FP and BP, and then downstream prologue code would overwrite them. While I was at it, I fixed Win64+funclets+stackrealign. This issue doesn't come up as often there due to the ABI requiring 16 byte stack alignment, but now we can rest easy that AVX and WinEH will work well together =P. llvm-svn: 252210
*	[WebAssembly] Fix copypasta.	Dan Gohman	2015-11-05	2	-3/+3
\| \| \| \| \| \|	Noticed by dschff in http://reviews.llvm.org/rL252203 llvm-svn: 252208
*	[WebAssembly] Rename Immediate instructions to Const.	Dan Gohman	2015-11-05	2	-16/+16
\| \| \| \| \| \|	This more closely reflects the naming convention in the spec. llvm-svn: 252204
*	[WebAssembly] Add AsmString strings for most instructions.	Dan Gohman	2015-11-05	7	-135/+212
\| \| \| \| \| \| \| \| \|	Mangling type information into MachineInstr opcode names was a temporary measure, and it's starting to get hairy. At the same time, the MC instruction printer wants to use AsmString strings for printing. This patch takes the first step, starting the process of adding AsmStrings for instructions. llvm-svn: 252203
*	[WebAssembly] Update wasm builtin functions to match spec changes.	Dan Gohman	2015-11-05	1	-15/+7
\| \| \| \| \| \| \|	The page_size operator has been removed from the spec, and the resize_memory operator has been changed to grow_memory. llvm-svn: 252202
*	replace MachineCombinerPattern namespace and enum with enum class; NFCI	Sanjay Patel	2015-11-05	4	-36/+36
\| \| \| \| \| \| \| \|	Also, remove an enum hack where enum values were used as indexes into an array. We may want to make this a real class to allow pattern-based queries/customization (D13417). llvm-svn: 252196
*	[WebAssembly] Add WebAssemblyMCInstLower.cpp.	Dan Gohman	2015-11-05	5	-4/+168
\| \| \| \| \| \| \|	This isn't used yet; it's just a start towards eventually using MC to do instruction printing, and eventually binary encoding. llvm-svn: 252194
*	[DebugInfo] Fix ARM/AArch64 prologue_end position. Related to D11268.	Oleg Ranevskyy	2015-11-05	9	-116/+131
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This review is related to another review request http://reviews.llvm.org/D11268, does the same and merely fixes a couple of issues with it. D11268 is quite old and has merge conflicts against the current trunk. This request - rebases D11268 onto the new trunk; - resolves the merge conflicts; - fixes the prologue_end tests, which do not pass due to the subprogram definitions not marked as distinct. Reviewers: echristo, rengolin, kubabrecka Subscribers: aemerson, rengolin, jyknight, dsanders, llvm-commits, asl Differential Revision: http://reviews.llvm.org/D14338 llvm-svn: 252177
*	Add cfi instr for CFA calculation when movpc is expanded to call and pop	Petar Jovanovic	2015-11-05	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes the issue of wrong CFA calculation in the following case: 0x08048400 <+0>: push %ebx 0x08048401 <+1>: sub $0x8,%esp 0x08048404 <+4>: call 0x8048409 <test+9> 0x08048409 <+9>: pop %eax 0x0804840a <+10>: add $0x1bf7,%eax 0x08048410 <+16>: mov %eax,%ebx 0x08048412 <+18>: call 0x80483f0 <bar> 0x08048417 <+23>: add $0x8,%esp 0x0804841a <+26>: pop %ebx 0x0804841b <+27>: ret The call at <+4> and the pop at <+9> are a product of the movpc instruction. The call instruction changes the stack pointer, and the pop instruction restores its value. However, the rule for computing CFA is not updated and is wrong on the pop instruction. So, e.g. backtrace in gdb does not work when on the pop instruction. This adds cfi instructions for both call and pop instructions. The cfi_adjust_cfa_offset instruction is used with the appropriate offset for setting the rules to calculate CFA correctly. Patch by Violeta Vukobrat. Differential Revision: http://reviews.llvm.org/D14021 llvm-svn: 252176
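The bookkeeping behind this fix can be sketched as a running offset. The sketch assumes 32-bit x86 with 4-byte stack slots and a CFA of esp+4 at function entry. The function name and the event list are hypothetical illustrations and not the patch itself.

```python
def cfa_offsets(events, entry_offset=4):
    """Track the CFA as an offset from esp after each stack-adjusting instruction."""
    offset = entry_offset  # at entry only the return address is on the stack
    trace = []
    for name, delta in events:
        offset += delta  # bytes pushed (positive) or popped (negative)
        trace.append((name, offset))
    return trace

# Stack effects of the first four instructions in the listing above.
events = [
    ("push %ebx", 4),
    ("sub $0x8,%esp", 8),
    ("call (movpc)", 4),   # pushes the return address
    ("pop %eax", -4),      # pops it again
]
trace = cfa_offsets(events)
```

Under these assumptions the offset is 16 before the call, 20 while execution sits on the pop, and 16 again afterwards. Without an adjustment on both instructions, the unwinder would use 16 at the pop and compute the wrong CFA.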
*	[WebAssembly] Rename ior operator to or to match the spec	Derek Schuff	2015-11-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: The spec uses "or" for inclusive-or and "xor" for exclusive-or Reviewers: sunfish Subscribers: jfb, llvm-commits, dschuff Differential Revision: http://reviews.llvm.org/D14362 llvm-svn: 252174
*	[ARM] Compute known bits for ARMISD::CMOV	James Molloy	2015-11-05	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	We can conservatively know that CMOV's known bits are the intersection of known bits for each of its operands. This helps PerformCMOVToBFICombine find more opportunities. I tried hard to create a testcase for this and failed - we have to sufficiently confuse DAG.computeKnownBits which can see through all the cheap tricks I tried to narrow my larger testcase down :( This code is actually exercised in CodeGen/ARM/bfi.ll, there's just no functional difference because DAG.computeKnownBits gets the right answer in that case. llvm-svn: 252168
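The intersection rule can be illustrated with a pair of bit masks. This is a sketch of the general idea only. The `KnownBits` tuple and both helpers are hypothetical stand-ins and not LLVM's API.

```python
from typing import NamedTuple

class KnownBits(NamedTuple):
    zero: int  # bits known to be 0
    one: int   # bits known to be 1

def known_bits_of_constant(value, width=32):
    """Every bit of a constant is known."""
    mask = (1 << width) - 1
    return KnownBits(zero=~value & mask, one=value & mask)

def known_bits_of_select(a, b):
    """A bit is known in the result only if both operands agree on it."""
    return KnownBits(zero=a.zero & b.zero, one=a.one & b.one)

# Selecting between 0x10 and 0x30: only bit 5 differs, so only it is unknown.
result = known_bits_of_select(known_bits_of_constant(0x10),
                              known_bits_of_constant(0x30))
```

The result is conservative because it never claims a bit that either operand could contradict, whichever operand the conditional move ends up picking.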
*	revert rev. 252153 due to build failure on ubuntu	Asaf Badouh	2015-11-05	4	-101/+1
\| \| \| \| \| \|	[X86][AVX512] add comi with Sae llvm-svn: 252154
*	[X86][AVX512] add comi with Sae	Asaf Badouh	2015-11-05	4	-1/+101
\| \| \| \| \| \| \| \|	add builtin_ia32_vcomisd and builtin_ia32_vcomisd Differential Revision: http://reviews.llvm.org/D14331 llvm-svn: 252153
*	[X86][AVX512] small bugfix in VPBROADCASTM	Asaf Badouh	2015-11-05	1	-2/+2
\| \| \| \| \| \| \| \|	VPBROADCASTMW2D and VPBROADCASTMB2Q Differential Revision: http://reviews.llvm.org/D14335 llvm-svn: 252151
*	AMDGPU: Also track whether SGPRs were spilled	Matt Arsenault	2015-11-05	3	-2/+20
\| \| \| \|	llvm-svn: 252145
*	AMDGPU: Print number user SGPRs	Matt Arsenault	2015-11-05	1	-0/+6
\| \| \| \| \| \| \|	This doesn't quite match how SC prints it, which doesn't put it in a comment. llvm-svn: 252144
*	AMDGPU: Disallow s[102:103] on VI in assembler	Matt Arsenault	2015-11-05	1	-2/+28
\| \| \| \|	llvm-svn: 252142
*	AMDGPU: Fix assert when legalizing atomic operands	Matt Arsenault	2015-11-05	3	-15/+59
\| \| \| \| \| \| \| \| \| \|	The operand layout is slightly different for the atomic opcodes from the usual MUBUF loads and stores. This should only fix it on SI/CI. VI is still broken because it still emits the addr64 replacement. llvm-svn: 252140
*	AMDGPU: Make addr64 atomic operand order consistent	Matt Arsenault	2015-11-05	1	-2/+2
\| \| \| \| \| \| \|	vaddr comes before srsrc in every other MUBUF instruction, and is the order it is printed. llvm-svn: 252139
*	[WinEH] Fix establisher param reg in CLR funclets	Joseph Tremoulet	2015-11-05	1	-9/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The CLR's personality routine passes the pointer to the establisher frame in RCX, not RDX. Reviewers: pgavlin, majnemer, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14343 llvm-svn: 252135
*	Go back to producing relocations for out of range symbols.	Rafael Espindola	2015-11-05	1	-6/+4
\| \| \| \| \| \| \| \|	This brings back the behavior from before r252090 for out of range symbols. Should bring some arm bots back. llvm-svn: 252119
*	AMDGPU: Fix typo	Matt Arsenault	2015-11-05	1	-2/+2
\| \| \| \|	llvm-svn: 252116
*	Slightly saner handling of thumb branches.	Rafael Espindola	2015-11-04	1	-9/+15
\| \| \| \| \| \| \| \|	The generic infrastructure already did a lot of work to decide if the fixup value is known or not. It doesn't make sense to reimplement a very basic case: same fragment. llvm-svn: 252090
*	[x86] Teach the shrink-wrapping hooks to do the proper thing with Win64.	Quentin Colombet	2015-11-04	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	Win64 has some strict requirements for the epilogue. As a result, we disable shrink-wrapping for Win64 unless the block that gets the epilogue is already an exit block. Fixes PR24193. llvm-svn: 252088
*	Warning fix.	Simon Pilgrim	2015-11-04	1	-2/+2
\| \| \| \|	llvm-svn: 252078
*	[X86][SSE] Add general memory folding for (V)INSERTPS instruction	Simon Pilgrim	2015-11-04	3	-58/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction. The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening. This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done. It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted. Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll Differential Revision: http://reviews.llvm.org/D13988 llvm-svn: 252074
*	[IR] Add bounds checking to paramHasAttr	Sanjoy Das	2015-11-04	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is intended to make a later change simpler. Note: adding this bounds checking required fixing `X86FastISel`. As far as I can tell I've preserved original behavior but a careful review will be appreciated. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14304 llvm-svn: 252073