bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[Hexagon] Enabling ASM parsing on Hexagon backend and adding instruction ↵	Colin LeMahieu	2015-11-09	6	-0/+6
\| \| \| \| \| \|	parsing tests. General updating of the code emission. llvm-svn: 252443
*	[PowerPC] Fix LoopPreIncPrep not to depend on SCEV constant simplifications	Hal Finkel	2015-11-08	1	-0/+94
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under most circumstances, if SCEV can simplify X-Y to a constant, then it can also simplify Y-X to a constant. However, there is no guarantee that this is always true, and consensus is not to consider that a correctness bug in SCEV (although it is undesirable). PPCLoopPreIncPrep gathers pointers used to access memory (via loads, stores and prefetches) into buckets, where in each bucket the relative pointer offsets are constant. We used to keep each bucket as a multimap, where SCEV's subtraction operation was used to define the ordering predicate. Instead, use a fixed SCEV base expression for each bucket, record the constant offsets from that base expression, and adjust it later, if desirable, once all pointers have been collected. Doing it this way should be more compile-time efficient than the previous scheme (in addition to making the implementation less sensitive to SCEV simplification quirks). Fixes PR25170. llvm-svn: 252417
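A minimal sketch of the bucketing scheme this message describes, written in Python rather than LLVM's C++. Pointers are modelled as hypothetical (base, offset) pairs, `const_diff` stands in for asking SCEV whether a difference folds to a constant, and rebasing to the lowest offset is only one assumed way to "adjust it later"; none of these names come from PPCLoopPreIncPrep itself.

```python
from typing import List, Optional, Tuple

# Hypothetical model: a pointer is (symbolic base name, constant byte offset).
Pointer = Tuple[str, int]


def const_diff(a: Pointer, b: Pointer) -> Optional[int]:
    """Stand-in for SCEV: return a - b if it is a known constant, else None."""
    return a[1] - b[1] if a[0] == b[0] else None


class Bucket:
    def __init__(self, base: Pointer) -> None:
        self.base = base      # fixed base expression for this bucket
        self.offsets = [0]    # constant offsets from that base


def collect(pointers: List[Pointer]) -> List[Bucket]:
    """Place each pointer in the first bucket whose base it differs from by a constant."""
    buckets: List[Bucket] = []
    for p in pointers:
        for b in buckets:
            d = const_diff(p, b.base)   # always subtract the fixed base, never the reverse
            if d is not None:
                b.offsets.append(d)
                break
        else:
            buckets.append(Bucket(p))   # no compatible bucket: start a new one
    return buckets


def rebase_to_lowest(bucket: Bucket) -> None:
    """Adjust the base after collection so that every recorded offset is non-negative."""
    lowest = min(bucket.offsets)
    bucket.base = (bucket.base[0], bucket.base[1] + lowest)
    bucket.offsets = sorted(o - lowest for o in bucket.offsets)
```

Because every comparison is made against one fixed base, the result does not depend on whether the reverse subtraction would also simplify.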
*	[WinEH] Update PHIs of CATCHRET successors	David Majnemer	2015-11-08	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \|	The TailDuplication machine pass ran across a malformed CFG: a PHI node referred to its predecessor's predecessor instead of its predecessor. This occurred because we split the edge in X86ISelLowering when we processed the CATCHRET but forgot to do something about the PHI nodes. This fixes PR25444. llvm-svn: 252413
*	[WinEH] Update exception pointer registers	Joseph Tremoulet	2015-11-07	2	-5/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The CLR's personality routine passes these in rdx/edx, not rax/eax. Make getExceptionPointerRegister a virtual method parameterized by personality function to allow making this distinction. Similarly make getExceptionSelectorRegister a virtual method parameterized by personality function, for symmetry. Reviewers: pgavlin, majnemer, rnk Subscribers: jyknight, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D14344 llvm-svn: 252383
*	[AArch64][FastISel] Don't even try to select vector icmps.	Ahmed Bougacha	2015-11-06	1	-0/+100
\| \| \| \| \| \| \| \| \| \| \| \|	We used to try to constant-fold them to i32 immediates. Given that fast-isel doesn't otherwise support vNi1, when selecting the result users, we'd fall back to SDAG anyway. However, if the users were in another block, we'd insert broken cross-class copies (GPR32 to FPR64). Give up, let SDAG agree with itself on a vNi1 legalization strategy. llvm-svn: 252364
*	[X86] Fold (trunc (i32 (zextload i16))) into vbroadcast.	Ahmed Bougacha	2015-11-06	2	-12/+4
\| \| \| \| \| \| \| \| \| \| \|	When matching non-LSB-extracting truncating broadcasts, we now insert the necessary SRL. If the scalar resulted from a load, the SRL will be folded into it, creating a narrower, offset, load. However, i16 loads aren't Desirable, so we get i16->i32 zextloads. We already catch i16 aextloads; catch these as well. llvm-svn: 252363
*	[X86] SRL non-LSB extracts when folding to truncating broadcasts.	Ahmed Bougacha	2015-11-06	4	-58/+110
\| \| \| \| \| \| \| \| \| \| \| \|	Now that we recognize this, we can support it instead of bailing out. That is, we can fold: (v8i16 (shufflevector (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))), <1,1,...,1>)) into: (v8i16 (vbroadcast (i16 (trunc (srl X, 16))))) llvm-svn: 252362
*	[X86] Don't fold non-LSB extracts into truncating broadcasts.	Ahmed Bougacha	2015-11-06	4	-0/+396
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We used to incorrectly assume that the offset we're extracting from was a multiple of the element size. So, we'd fold: (v8i16 (shufflevector (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))), <1,1,...,1>)) into: (v8i16 (vbroadcast (i16 (trunc Y)))) whereas we should have extracted the higher bits from X. Instead, bail out if the assumption doesn't hold. llvm-svn: 252361
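A small worked example of why the old fold was wrong, assuming x86's little-endian lane layout. The values of `X` and `Y` and the helper name are made up for illustration; the point is only that 16-bit lane 1 of the bitcast vector holds the upper half of `X`, not the low half of `Y`.

```python
import struct


def v4i32_as_v8i16(elems):
    """Bitcast four 32-bit values to eight 16-bit lanes (little-endian, as on x86)."""
    raw = struct.pack("<4I", *elems)
    return list(struct.unpack("<8H", raw))


X, Y = 0xAAAA1111, 0xBBBB2222
lanes = v4i32_as_v8i16([X, Y, 0, 0])

lane1 = lanes[1]              # the element a <1,1,...,1> shuffle mask broadcasts
wrong = Y & 0xFFFF            # (trunc Y): what the old fold produced
right = (X >> 16) & 0xFFFF    # the higher 16 bits of X
```

Here `lane1` equals `right` and differs from `wrong`, which is the mismatch the bail-out guards against.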
*	DAGCombiner: Check shouldReduceLoadWidth before combining (and (load), x) -> ↵	Tom Stellard	2015-11-06	3	-10/+32
\| \| \| \| \| \| \| \| \| \| \| \|	extload Reviewers: resistor, arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13805 llvm-svn: 252349
*	[WebAssembly] Use more explicit types in testcases.	Dan Gohman	2015-11-06	10	-114/+114
\| \| \| \|	llvm-svn: 252345
*	[WebAssembly] Add more explicit pushes to the tests.	Dan Gohman	2015-11-06	19	-169/+169
\| \| \| \|	llvm-svn: 252344
*	[ShrinkWrapping] Teach shrink-wrapping how to analyze RegMask.	Quentin Colombet	2015-11-06	1	-0/+59
\| \| \| \| \| \| \|	Previously we were conservatively assuming that RegMask operands clobber callee saved registers. llvm-svn: 252341
*	Improved the operands commute transformation for X86-FMA3 instructions.	Andrew Kaylor	2015-11-06	2	-12/+515
\| \| \| \| \| \| \| \| \| \| \| \|	All 3 operands of FMA3 instructions are commutable now. Patch by Slava Klochkov Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab). Differential Revision: http://reviews.llvm.org/D13269 llvm-svn: 252335
*	[WebAssembly] Make expression-stack pushing explicit	Dan Gohman	2015-11-06	15	-191/+191
\| \| \| \| \| \| \| \| \|	Modelling of the expression stack is evolving. This patch takes another step by making pushes explicit. Differential Revision: http://reviews.llvm.org/D14338 llvm-svn: 252334
*	AMDGPU: Create emergency stack slots during frame lowering	Matt Arsenault	2015-11-06	2	-1/+487
\| \| \| \| \| \|	Test has a bogus verifier error which will be fixed by later commits. llvm-svn: 252327
*	AMDGPU: Add pass to detect used kernel features	Matt Arsenault	2015-11-06	1	-0/+193
\| \| \| \| \| \| \| \| \| \| \|	Mark kernels that use certain features that require user SGPRs to support with kernel attributes. We need to know before instruction selection begins because it impacts the kernel calling convention lowering. For now this only detects the workitem intrinsics. llvm-svn: 252323
*	AMDGPU: Hack for VS_32 register pressure	Matt Arsenault	2015-11-06	2	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class. When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling. Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table. llvm-svn: 252321
*	[WinEH] Mark funclet entries and exits as clobbering all registers	Reid Kleckner	2015-11-06	2	-0/+177
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In this implementation, LiveIntervalAnalysis invents a few register masks on basic block boundaries that preserve no registers. The nice thing about this is that it prevents the prologue inserter from thinking it needs to spill all XMM CSRs, because it doesn't see any explicit physreg defs in the MI. Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer Subscribers: MatzeB, llvm-commits Differential Revision: http://reviews.llvm.org/D14407 llvm-svn: 252318
*	[AArch64] Enable the narrow ld promotion only on profitable microarchitectures	Jun Bum Lim	2015-11-06	2	-48/+47
\| \| \| \| \| \| \| \| \|	The benefit from converting narrow loads into a wider load (r251438) could be micro-architecturally dependent, as it assumes that a single load with two bitfield extracts is cheaper than two narrow loads. Currently, this conversion is enabled only in cortex-a57 on which performance benefits were verified. llvm-svn: 252316
*	Bring r252305 back with a test fix.	Rafael Espindola	2015-11-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	We now create the .eh_frame section early, just like every other special section. This means that the special flags are visible in code that explicitly asks for ".eh_frame". llvm-svn: 252313
*	[mips] Define patterns for the atomic_{load,store}_{8,16,32,64} nodes.	Vasileios Kalintiris	2015-11-06	4	-32/+110
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without these patterns we would generate a complete LL/SC sequence. This would be problematic for memory regions marked as WRITE-only or READ-only, as the LL and SC instructions would respectively read from and write to the protected memory regions. Reviewers: dsanders Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D14397 llvm-svn: 252293
*	AMDGPU/SI: Emit HSA kernels with symbol type STT_AMDGPU_HSA_KERNEL	Tom Stellard	2015-11-06	1	-2/+8
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13804 llvm-svn: 252291
*	Revert r252249 (and r252255, r252258), "[WinEH] Clone funclets with multiple ↵	NAKAMURA Takumi	2015-11-06	4	-1690/+10
\| \| \| \| \| \| \| \|	parents" It was flaky because it iterated over std::set and std::map containers keyed on pointer values. llvm-svn: 252279
*	Temporarily disable flaky checks in wineh-multi-parent-cloning.	Andrew Kaylor	2015-11-06	1	-4/+8
\| \| \| \|	llvm-svn: 252258
*	[WinEH] Clone funclets with multiple parents	Andrew Kaylor	2015-11-06	4	-10/+1686
\| \| \| \| \| \| \| \| \| \|	Windows EH funclets need to always return to a single parent funclet. However, it is possible for earlier optimizations to combine funclets (probably based on one funclet having an unreachable terminator) in such a way that this condition is violated. These changes add code to the WinEHPrepare pass to detect situations where a funclet has multiple parents and clone such funclets, fixing up the unwind and catch return edges so that each copy of the funclet returns to the correct parent funclet. Differential Revision: http://reviews.llvm.org/D13274?id=39098 llvm-svn: 252249
*	DI: Reverse direction of subprogram -> function edge.	Peter Collingbourne	2015-11-05	57	-155/+155
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO. This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field. Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR. Differential Revision: http://reviews.llvm.org/D14265 llvm-svn: 252219
*	[WinEH] Fix funclet prologues with stack realignment	Reid Kleckner	2015-11-05	1	-0/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already had a test for this for 32-bit SEH catchpads, but those don't actually create funclets. We had a bug that only appeared in funclet prologues, where we would establish EBP and ESI as our FP and BP, and then downstream prologue code would overwrite them. While I was at it, I fixed Win64+funclets+stackrealign. This issue doesn't come up as often there due to the ABI requiring 16-byte stack alignment, but now we can rest easy that AVX and WinEH will work well together =P. llvm-svn: 252210
*	[WebAssembly] Update wasm builtin functions to match spec changes.	Dan Gohman	2015-11-05	2	-34/+10
\| \| \| \| \| \| \|	The page_size operator has been removed from the spec, and the resize_memory operator has been changed to grow_memory. llvm-svn: 252202
*	Add cfi instr for CFA calculation when movpc is expanded to call and pop	Petar Jovanovic	2015-11-05	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes the issue of wrong CFA calculation in the following case: 0x08048400 <+0>: push %ebx 0x08048401 <+1>: sub $0x8,%esp 0x08048404 <+4>: call 0x8048409 <test+9> 0x08048409 <+9>: pop %eax 0x0804840a <+10>: add $0x1bf7,%eax 0x08048410 <+16>: mov %eax,%ebx 0x08048412 <+18>: call 0x80483f0 <bar> 0x08048417 <+23>: add $0x8,%esp 0x0804841a <+26>: pop %ebx 0x0804841b <+27>: ret The call at <+4> and the pop at <+9> are a product of the movpc instruction. The call instruction changes the stack pointer, and the pop instruction restores its value. However, the rule for computing the CFA is not updated and is wrong at the pop instruction. So, e.g., backtrace in gdb does not work when stopped at the pop instruction. This adds cfi instructions for both the call and pop instructions. The cfi_adjust_cfa_offset instruction is used with the appropriate offset to set the rules so that the CFA is calculated correctly. Patch by Violeta Vukobrat. Differential Revision: http://reviews.llvm.org/D14021 llvm-svn: 252176
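The effect can be sketched with a toy model in Python that tracks two numbers for the first few instructions of the listing: the true distance from %esp to the CFA, and the distance the unwind rules describe. The instruction table, the `call next` spelling and the function name are illustrative assumptions, not LLVM code; the model assumes a 32-bit x86 target where the CFA is %esp + 4 on function entry.

```python
# Each entry: (instruction text, change in %esp when it executes).
INSNS = [
    ("push %ebx", -4),
    ("sub $0x8,%esp", -8),
    ("call next", -4),        # movpc expansion: pushes the return address
    ("pop %eax", +4),         # movpc expansion: pops it again
    ("add $0x1bf7,%eax", 0),
]

MOVPC_PARTS = ("call next", "pop %eax")


def cfa_offsets(insns, adjust_for_movpc):
    """Return (text, true offset, described offset) as seen just before each instruction."""
    true_off = described = 4              # at entry, CFA = %esp + 4
    rows = []
    for text, delta in insns:
        rows.append((text, true_off, described))
        true_off -= delta                 # %esp moved, so the CFA is further or nearer
        if adjust_for_movpc or text not in MOVPC_PARTS:
            described -= delta            # models a CFI directive updating the rule
    return rows


before_fix = cfa_offsets(INSNS, adjust_for_movpc=False)
after_fix = cfa_offsets(INSNS, adjust_for_movpc=True)
```

In `before_fix` the only row where the two offsets disagree is the one for `pop %eax` (true 20, described 16), which matches the message's statement that the rule is wrong at the pop instruction; in `after_fix` every row agrees.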
*	[WebAssembly] Rename ior operator to or to match the spec	Derek Schuff	2015-11-05	4	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: The spec uses "or" for inclusive-or and "xor" for exclusive-or Reviewers: sunfish Subscribers: jfb, llvm-commits, dschuff Differential Revision: http://reviews.llvm.org/D14362 llvm-svn: 252174
*	revert rev. 252153 due to build failure on ubuntu	Asaf Badouh	2015-11-05	1	-75/+0
\| \| \| \| \| \|	[X86][AVX512] add comi with Sae llvm-svn: 252154
*	[X86][AVX512] add comi with Sae	Asaf Badouh	2015-11-05	1	-0/+75
\| \| \| \| \| \| \| \|	add builtin_ia32_vcomisd and builtin_ia32_vcomiss Differential Revision: http://reviews.llvm.org/D14331 llvm-svn: 252153
*	AMDGPU: Fix assert when legalizing atomic operands	Matt Arsenault	2015-11-05	1	-0/+52
\| \| \| \| \| \| \| \| \| \|	The operand layout is slightly different for the atomic opcodes from the usual MUBUF loads and stores. This should only fix it on SI/CI. VI is still broken because it still emits the addr64 replacement. llvm-svn: 252140
*	[WinEH] Fix establisher param reg in CLR funclets	Joseph Tremoulet	2015-11-05	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The CLR's personality routine passes the pointer to the establisher frame in RCX, not RDX. Reviewers: pgavlin, majnemer, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14343 llvm-svn: 252135
*	AMDGPU: Add missing v2f64 fadd tests	Matt Arsenault	2015-11-05	1	-10/+42
\| \| \| \|	llvm-svn: 252117
*	[x86] Teach the shrink-wrapping hooks to do the proper thing with Win64.	Quentin Colombet	2015-11-04	1	-0/+122
\| \| \| \| \| \| \| \| \| \|	Win64 has some strict requirements for the epilogue. As a result, we disable shrink-wrapping for Win64 unless the block that gets the epilogue is already an exit block. Fixes PR24193. llvm-svn: 252088
*	[X86][SSE] Add general memory folding for (V)INSERTPS instruction	Simon Pilgrim	2015-11-04	5	-13/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction. The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening. This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done. It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted. Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll Differential Revision: http://reviews.llvm.org/D13988 llvm-svn: 252074
*	Created new X86 FMA3 opcodes (FMA*_Int) that are used now for lowering of ↵	Andrew Kaylor	2015-11-04	2	-248/+939
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	scalar FMA intrinsics. Patch by Slava Klochkov. The key difference between FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result. Reviewers: Quentin Colombet, Elena Demikhovsky Differential Revision: http://reviews.llvm.org/D13710 llvm-svn: 252060
*	[ARM] Combine CMOV into BFI where possible	James Molloy	2015-11-04	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have a CMOV, OR and AND combination such as: if (x & CN) y |= CM; And: * CN is a single bit; * All bits covered by CM are known zero in y; Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction). llvm-svn: 252057
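A hedged Python sketch of the equivalence behind this combine. `bfi` models a bit-field insert, and `bfi_form` assumes one single-bit insert per set bit of CM, which is a reading of "a sequence of BFI instructions" rather than a transcription of the ARM backend; all function names are made up for illustration.

```python
def bfi(dst, src, lsb, width):
    """Bit-field insert: copy the low `width` bits of src into dst starting at bit `lsb`."""
    mask = ((1 << width) - 1) << lsb
    return (dst & ~mask) | ((src << lsb) & mask)


def cmov_form(x, y, cn, cm):
    """The original pattern: if (x & CN) y |= CM;"""
    return y | cm if x & cn else y


def bfi_form(x, y, cn, cm):
    """Insert the tested bit of x at each bit position set in CM."""
    bit = (x >> (cn.bit_length() - 1)) & 1   # CN is a single bit, so this isolates it
    pos = 0
    while cm >> pos:
        if (cm >> pos) & 1:
            y = bfi(y, bit, pos, 1)          # relies on y being zero at this position
        pos += 1
    return y
```

The two forms agree only under the stated preconditions: if y could already have a bit set under CM, inserting a zero there would clear it, which the conditional OR never does.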
*	[X86] DAGCombine should not introduce FILD in soft-float mode	Michael Kuperstein	2015-11-04	1	-0/+15
\| \| \| \| \| \| \|	The x86 "sitofp i64 to double" dag combine, in 32-bit mode, lowers sitofp directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode. llvm-svn: 252042
*	[StatepointLowering] Remove distinction between call and invoke safepoints	Igor Laevsky	2015-11-04	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \|	There is no point in having invoke safepoints handled differently than the call safepoints. All relevant decisions could be made by looking at whether or not gc.result and gc.relocate lie in the same basic block. This change will allow lowering call safepoints with relocates and results in different basic blocks. See test case for example. Differential Revision: http://reviews.llvm.org/D14158 llvm-svn: 252028
*	Address nit	Derek Schuff	2015-11-03	1	-32/+32
\| \| \| \|	llvm-svn: 252004
*	[WebAssembly] Support wasm select operator	Derek Schuff	2015-11-03	1	-0/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add support for wasm's select operator, and lower LLVM's select DAG node to it. Reviewers: sunfish Subscribers: dschuff, llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D14295 llvm-svn: 252002
*	[X86][AVX] Tweaked shuffle stack folding tests	Simon Pilgrim	2015-11-03	2	-5/+5
\| \| \| \| \| \|	To avoid alternative lowerings. llvm-svn: 251986
*	[X86][AVX512] Fixed shuffle test name to match shuffle	Simon Pilgrim	2015-11-03	1	-2/+2
\| \| \| \|	llvm-svn: 251984
*	[X86][XOP] Add support for the matching of the VPCMOV bit select instruction	Simon Pilgrim	2015-11-03	2	-3/+164
\| \| \| \| \| \| \| \| \| \|	XOP has the VPCMOV instruction that performs the common vector bit select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) ) This patch adds tablegen pattern matching for this instruction. Differential Revision: http://reviews.llvm.org/D8841 llvm-svn: 251975
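The bit select expression above can be sketched on plain integers in Python. The function name and the `width` parameter are illustrative assumptions; this models only the bitwise formula, not the tablegen patterns the patch adds.

```python
def bit_select(src1, src2, src3, width=128):
    """OR(AND(SRC1, SRC3), AND(SRC2, ~SRC3)): src1 where src3 has a 1 bit, else src2."""
    mask = (1 << width) - 1
    return ((src1 & src3) | (src2 & ~src3)) & mask


# 16-bit example: the selector picks the outer nibbles from src1 and src2.
result = bit_select(0xFF00, 0x00FF, 0xF0F0, width=16)
```

With these inputs `result` is 0xF00F: the top nibble comes from src1 and the bottom nibble from src2, as chosen by the selector bits.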
*	Remove unnecessary dependency on section and string positions.	Rafael Espindola	2015-11-03	1	-2/+1
\| \| \| \|	llvm-svn: 251964
*	[X86] Generate .cfi_adjust_cfa_offset correctly when pushing arguments	Michael Kuperstein	2015-11-03	6	-87/+282
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When push instructions are being used to pass function arguments on the stack, and either EH or debugging are enabled, we need to generate .cfi_adjust_cfa_offset directives appropriately. For (synch) EH, it is enough for the CFA offset to be correct at every call site, while for debugging we want to be correct after every push. Darwin does not support this well, so don't use pushes whenever it would be required. Differential Revision: http://reviews.llvm.org/D13767 llvm-svn: 251904
*	AMDGPU: Stop assuming vreg for build_vector	Matt Arsenault	2015-11-02	1	-8/+34
\| \| \| \| \| \| \| \| \| \| \| \| \|	This was causing a variety of test failures when v2i64 is added as a legal type. SIFixSGPRCopies should correctly handle the case of vector inputs to a scalar reg_sequence, so this isn't necessary anymore. This was hiding some deficiencies in how reg_sequence is handled later, but this shouldn't be a problem anymore since the register class copy of a reg_sequence is now done before the reg_sequence. llvm-svn: 251860
*	AMDGPU: Error on graphics shaders with HSA	Matt Arsenault	2015-11-02	1	-0/+18
\| \| \| \| \| \| \| \|	I've found myself pointlessly debugging problems from running graphics tests with an HSA triple a few times, so stop this from happening again. llvm-svn: 251858