bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] Add support for folding (insert_subvector vec1, (extract_subvector vec2, idx1), idx1) -> (blendi vec2, vec1).	Craig Topper	2017-02-04	3	-6/+3
	llvm-svn: 294112
*	[DAGCombiner] Canonicalize the order of a chain of INSERT_SUBVECTORs.	Craig Topper	2017-02-04	1	-18/+16
	Based on similar code for INSERT_VECTOR_ELT. llvm-svn: 294110
*	Add test cases for (trunc adde) DAGCombiner patterns. NFC	Amaury Sechet	2017-02-04	1	-0/+63
	llvm-svn: 294105
*	[X86][SSE] Add target shuffle combine buildvec style tests	Simon Pilgrim	2017-02-04	2	-0/+69
	Extra tests for D29399 llvm-svn: 294101
*	MachineCopyPropagation: Respect implicit operands of COPY	Matthias Braun	2017-02-04	1	-0/+22
	The code failed to check implicit operands of COPY instructions for defs/uses. Differential Revision: https://reviews.llvm.org/D29522 llvm-svn: 294088
*	MachineCopyPropagation: Do not consider undef operands as clobbers	Matthias Braun	2017-02-04	1	-23/+10
	This was originally introduced in r278321 to work around correctness problems in the ExecutionDepsFix pass, and probably also to keep the performance benefits of breaking the false dependencies, which of course also affect undef operands. ExecutionDepsFix has been improved here recently (see for example r278321), so we should not need this exception any longer. Differential Revision: https://reviews.llvm.org/D29525 llvm-svn: 294087
*	[NVPTX] Add tests that invariant vector loads get lowered to ld.global.nc.	Justin Lebar	2017-02-04	1	-0/+24
	llvm-svn: 294082
*	Add test cases for bug 31719. NFC	Amaury Sechet	2017-02-04	1	-14/+103
	llvm-svn: 294080
*	[RegisterCoalescer] Do not call getInstructionIndex with DBG_VALUE	Brendon Cahoon	2017-02-04	1	-0/+76
	An assert occurs when calling SlotIndexes::getInstructionIndex with a DBG_VALUE instruction because the function expects an instruction with a slot index. However, there is no slot index for a DBG_VALUE instruction. Differential Revision: https://reviews.llvm.org/D29048 llvm-svn: 294070
*	AMDGPU: Cleanup scalar_to_vector test	Matt Arsenault	2017-02-03	1	-13/+13
	llvm-svn: 294038
*	AMDGPU: Set MCAsmInfo::PointerSize	Matt Arsenault	2017-02-03	1	-0/+26
	llvm-svn: 294031
*	[TLI] Robustize SDAG LibFunc proto checking by merging it into TLI.	Ahmed Bougacha	2017-02-03	8	-328/+79
	This re-applies commit r292189, reverted in r292191. SelectionDAGBuilder recognizes libfuncs using some homegrown parameter type-checking. Use TLI instead, removing another heap of redundant code. This isn't strictly NFC, as the SDAG code was too lax. Concretely, this means changes are required to a few tests: - calling a non-variadic function via a variadic prototype isn't OK; it just happens to work on x86_64 (but not on, e.g., aarch64). - mempcpy has a size_t parameter; the SDAG code accepts any integer type, which meant using i32 on x86_64 worked. - a handful of SystemZ tests check the SDAG support for lax prototype checking: Ulrich agrees on removing them. I don't think it's worth supporting any of these (IMO) invalid testcases. Instead, fix them to be more meaningful. llvm-svn: 294028
*	GlobalISel: translate dynamic alloca instructions.	Tim Northover	2017-02-03	1	-0/+57
	llvm-svn: 294022
*	[X86][SSE] Add support for combining scalar_to_vector(extract_vector_elt) into a target shuffle.	Simon Pilgrim	2017-02-03	3	-11/+6
	Correctly flagging upper elements as undef. llvm-svn: 294020
*	[X86][SSE] Renamed all_of/any_of reduction patterns tests	Simon Pilgrim	2017-02-03	2	-150/+150
	Make it clear these tests sign-extend the comparison result. Some patterns zero-extend to a bool result that we still need to handle. llvm-svn: 294018
*	[NVPTX] Enable combineRepeatedFPDivisors for NVPTX.	Justin Lebar	2017-02-03	1	-0/+44
	Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D29477 llvm-svn: 294011
*	[SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() != ISD::DELETED_NODE && "NodeToMatch was removed partway through selection"' failed.	Alexey Bataev	2017-02-03	1	-0/+241
	NodeToMatch can be modified during matching, but the code does not handle this situation. Differential Revision: https://reviews.llvm.org/D29292 llvm-svn: 294003
*	[ARM] Change TCReturn to tBL if tailcall optimization fails.	Sanne Wouda	2017-02-03	1	-0/+23
	Summary: The tail call optimisation is performed before register allocation, so at that point we don't know if LR is being spilt or not. If LR was spilt to the stack, then we cannot do a tail call optimisation. That would involve popping back into LR which is not possible in Thumb1 code. Reviewers: rengolin, jmolloy, rovka, olista01 Reviewed By: olista01 Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D29020 llvm-svn: 294000
*	[LLC] Add an inline assembly diagnostics handler.	Sanne Wouda	2017-02-03	8	-8/+8
	Summary: llc would hit a fatal error for errors in inline assembly. The diagnostics message is now printed. Reviewers: rengolin, MatzeB, javed.absar, anemet Reviewed By: anemet Subscribers: jyknight, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D29408 llvm-svn: 293999
*	AMDGPU: Fold fneg into fmin/fmax_legacy	Matt Arsenault	2017-02-03	2	-0/+79
	llvm-svn: 293972
*	AMDGPU: Fold fneg into fminnum/fmaxnum	Matt Arsenault	2017-02-03	1	-0/+264
	llvm-svn: 293968
*	llvm-readobj: fix next note entry calculation and print unknown note types	Konstantin Zhuravlyov	2017-02-02	1	-1/+7
	Differential Revision: https://reviews.llvm.org/D29131 llvm-svn: 293964
*	AMDGPU: Check if users of fneg can fold mods	Matt Arsenault	2017-02-02	5	-76/+502
	In multi-use cases this can save a few instructions. llvm-svn: 293962
*	[X86] Move turning 256-bit INSERT_SUBVECTORS into BLENDI from legalize to DAG combine.	Craig Topper	2017-02-02	1	-10/+6
	On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI. llvm-svn: 293944
*	[ARM] Classification Improvements to ARM Sched-Model. NFCI.	Javed Absar	2017-02-02	1	-0/+128
	This is the second in the series of patches to make adding machine sched-models for ARM processors easier and more compact. This patch focuses on integer instructions and adds missing sched definitions. Reviewers: rovka, rengolin Differential Revision: https://reviews.llvm.org/D29127 llvm-svn: 293935
*	[Hexagon] Fix insertBranch for loops with multiple ENDLOOP instructions	Krzysztof Parzyszek	2017-02-02	1	-0/+79
	llvm-svn: 293925
*	[X86][XOP] Added FIXME comments to missed shuffle combine opportunities	Simon Pilgrim	2017-02-02	1	-0/+2
	Requested by @silvas llvm-svn: 293916
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."	Nirav Dave	2017-02-02	66	-1664/+1846
	This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915
*	[X86][SSE] Add test case for PR18344	Simon Pilgrim	2017-02-02	1	-0/+89
	llvm-svn: 293907
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.	Nirav Dave	2017-02-02	66	-1846/+1664
	Recommitting after fixing X86 inc/dec chain bug.
	* Simplify Consecutive Merge Store Candidate Search
	Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic.
	When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited.
	This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation).
	Some minor peephole optimizations to deal with improved SubDAG shapes (listed below).
	Additional Minor Changes:
	1. Finishes removing unused AliasLoad code
	2. Unifies the chain aggregation in the merged stores across code paths
	3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
	4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
	5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
	6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
	7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
	8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
	This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
	Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
	CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
	Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
	llvm-svn: 293893
*	[ARM] GlobalISel: Lower pointer args and returns	Diana Picus	2017-02-02	2	-0/+67
	It is important to change the ArgInfo's type from pointer to integer, otherwise the CC assign function won't know what to do. Instead of hacking it up, we use ComputeValueVTs and introduce some of the helpers that we will need later on for lowering more complex types. llvm-svn: 293889
*	[ARM] GlobalISel: Legalize loading pointers	Diana Picus	2017-02-02	2	-0/+37
	Make it legal to load pointer values. Also check that pointers are assigned to the GPR reg bank by default. llvm-svn: 293886
*	[ARM] GlobalISel: Test default banks for load results. NFC.	Diana Picus	2017-02-02	1	-0/+32
	Check that all scalars are loaded into the GPR by default. llvm-svn: 293883
*	[X86][SSE] Use MOVMSK for all_of/any_of reduction patterns	Simon Pilgrim	2017-02-02	2	-542/+282
	This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain). So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc. Differential Revision: https://reviews.llvm.org/D28810 llvm-svn: 293880
*	[AVX-512] Fix the implicit defs for VZEROALL/VZEROUPPER to include YMM16-YMM31.	Craig Topper	2017-02-02	1	-28/+14
	llvm-svn: 293862
*	[AVX-512] Add test case demonstrating that we have an incomplete implicit def list for VZEROALL/VZEROUPPER.	Craig Topper	2017-02-02	1	-2/+42
	YMM16-YMM31 should also be defs. llvm-svn: 293861
*	[X86] Use update_llc_test_checks.py to regenerate a test.	Craig Topper	2017-02-02	1	-6/+25
	llvm-svn: 293860
*	AMDGPU: Use source modifiers with f16->f32 conversions	Matt Arsenault	2017-02-02	7	-106/+375
	The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. llvm-svn: 293857
*	NVPTX: Fix not preserving volatile when expanding memset	Matt Arsenault	2017-02-02	1	-0/+13
	llvm-svn: 293851
*	X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128).	Peter Collingbourne	2017-02-02	1	-2/+2
	Differential Revision: https://reviews.llvm.org/D28689 llvm-svn: 293844
*	[AMDGPU] Account workgroup size in LDS occupancy limits	Stanislav Mekhanoshin	2017-02-01	2	-26/+34
	Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers have to be adjusted for bigger workgroups. For example, a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems are still limited by the same number, but 4 subgroups of 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition, the change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837
*	[ImplicitNullChecks] NFC Fix the implicit-null-checks.mir test	Sanjoy Das	2017-02-01	1	-14/+14
	Summary: Currently the test implicit-null-checks.mir crashes if we run llc with the -enable-implicit-null-checks -start-before implicit-null-checks options. This change fixes the RET instruction causing the crash. Patch by Serguei Katkov! Reviewers: sanjoy, reames Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29390 llvm-svn: 293789
*	AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops	Matt Arsenault	2017-02-01	1	-44/+44
	These were simply preserving the flags of the original operation, which was too conservative in most cases and incorrect for mul. nsw/nuw may be needed for some combines to clean up messes when intermediate sext_inregs are introduced later. Tested valid combinations with alive. llvm-svn: 293776
*	[ImplicitNullCheck] Extend canReorder scope	Sanjoy Das	2017-02-01	2	-4/+138
	Summary: This change allows a re-order of two instructions if their uses overlap. Patch by Serguei Katkov! Reviewers: reames, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29120 llvm-svn: 293775
*	[PowerPC] Fix sjlj pseudo instructions to use G8RC_NOX0 register class	Kit Barton	2017-02-01	1	-0/+29
	The following instructions: - LD/LWZ (expanded from sjLj pseudo-instructions) - LXVL/LXVLL vector loads - STXVL/STXVLL vector stores all require G8RC_NOX0 class registers for RA. Differential Revision: https://reviews.llvm.org/D29289 Committed for Lei Huang llvm-svn: 293769
*	[ARM] Enable Cortex-M23 and Cortex-M33 support.	Javed Absar	2017-02-01	2	-0/+67
	Add both cores to the target parser and TableGen. Test that eabi attributes are set correctly for both cores. Additionally, test the absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively. Committed on behalf of Sanne Wouda. Reviewers: rengolin, olista01. Differential Revision: https://reviews.llvm.org/D29073 llvm-svn: 293761
*	[CodeGen] Move MacroFusion to the target	Evandro Menezes	2017-02-01	1	-10/+2
	This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target. In AArch64, it also expands the fusion to all instruction pairs in a scheduling block, beyond just among the predecessors of the branch at the end. Differential revision: https://reviews.llvm.org/D28489 llvm-svn: 293737
*	CodeGen: Allow small copyable blocks to "break" the CFG.	Kyle Butt	2017-01-31	25	-159/+265
	When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability in the other direction. For small blocks that can be duplicated we now skip that requirement as well, subject to some simple frequency calculations. Differential Revision: https://reviews.llvm.org/D28583 llvm-svn: 293716
*	[NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).	Justin Lebar	2017-01-31	2	-5/+7
	x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as desired. Verified that the particular nvptx approximate instructions here do in fact return 0 for x = 0. llvm-svn: 293713
*	GlobalISel: the translation of an invoke must branch to the good block.	Tim Northover	2017-01-31	1	-0/+1
	Otherwise bad things happen if the basic block order isn't trivial after an invoke. llvm-svn: 293679