bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[WebAssembly] Update llvm-readobj tests for switch to version 0x1	Derek Schuff	2017-02-22	2	-1/+1
\| \| \| \|	llvm-svn: 295875
*	AMDGPU: Change exp with compr bit printing	Matt Arsenault	2017-02-22	1	-26/+44
\| \| \| \|	llvm-svn: 295873
*	Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."	Wei Ding	2017-02-22	1	-2/+2
\| \| \| \| \| \|	This reverts commit r295867. llvm-svn: 295871
*	[WebAssembly] Update llvm-objdump tests for the new wasm version number.	Dan Gohman	2017-02-22	2	-1/+1
\| \| \| \|	llvm-svn: 295869
*	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result	Alexey Bataev	2017-02-22	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If the same value is used several times as an extra value, the SLP vectorizer takes it into account only once instead of the actual number of uses. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra arguments: `1`, the initial value of the horizontal reduction, and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added `3` only once, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295868
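The arithmetic in the PR32036 example can be checked with a scalar model. This is only a sketch in Python, not LLVM IR or the vectorizer's code; `reduce_with_extras` and the sample `data` are hypothetical names introduced for illustration.

```python
def reduce_with_extras(data, init=1, extra=3):
    """Scalar model of the 8x8 loop nest quoted in the commit message."""
    val = init
    for y in range(8):
        for x in range(8):
            val = val + data[y * 8 + x] + extra
    return val


data = list(range(64))  # hypothetical input with 8 * 8 elements
result = reduce_with_extras(data)

# The initial value 1 contributes once; the extra value 3 contributes
# once per iteration, i.e. 64 times.
expected = sum(data) + 1 + 3 * 64

# Per the commit message, the pre-patch reduction added the extra 3 only once.
pre_patch = sum(data) + 1 + 3
```

Comparing `result` with `pre_patch` shows the size of the miscompile for this input: the two differ by `3 * 63`.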
*	AMDGPU : Update TrapCode based on Trap Handler ABI.	Wei Ding	2017-02-22	1	-2/+2
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867
*	Bring back 2>&1 redirection for this test	Matthias Braun	2017-02-22	1	-1/+1
\| \| \| \|	llvm-svn: 295864
*	[AArch64] Extend AArch64RedundantCopyElimination to do simple copy propagation.	Geoff Berry	2017-02-22	1	-0/+295
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Extend AArch64RedundantCopyElimination to catch cases where the register that is known to be zero is COPY'd in the predecessor block. Before this change, this pass would catch cases like: CBZW %W0, <BB#1> BB#1: %W0 = COPY %WZR // removed After this change, cases like the one below are also caught: %W0 = COPY %W1 CBZW %W1, <BB#1> BB#1: %W0 = COPY %WZR // removed This change results in a 4% increase in static copies removed by this pass when compiling the llvm test-suite. It also fixes regressions caused by doing post-RA copy propagation (a separate change to be put up for review shortly). Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D30113 llvm-svn: 295863
*	[LV] Add scalar floating-point induction test (NFC)	Matthew Simpson	2017-02-22	1	-0/+58
\| \| \| \|	llvm-svn: 295862
*	[ModuleSummaryAnalysis] Don't crash when referencing unnamed globals.	Davide Italiano	2017-02-22	1	-0/+10
\| \| \| \| \| \| \|	Instead, just be conservative as these are infrequent enough. Thanks to Peter Collingbourne for the discussion about this on IRC. llvm-svn: 295861
*	[WebAssembly] Implement the wasm binary container header.	Dan Gohman	2017-02-22	2	-0/+11
\| \| \| \| \| \| \|	Also, update the version number to 0x1, which is what engines are now expecting. llvm-svn: 295860
*	MIRTests: Remove unnecessary 2>&1 redirection	Matthias Braun	2017-02-22	69	-69/+69
\| \| \| \| \| \| \|	llc mir output goes to stdout nowadays, so the 2>&1 is not necessary anymore for most tests. llvm-svn: 295859
*	[LoopVectorize] Added address space check when analysing interleaved accesses	Karl-Johan Karlsson	2017-02-22	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prevent memory objects of different address spaces from being part of the same load/store groups when analysing interleaved accesses. This fixes PR31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 This reverts r295042 (re-applies r295038) with an additional fix for the buildbot problem. llvm-svn: 295858
*	[SLP] Test with extra argument used several times.	Alexey Bataev	2017-02-22	1	-0/+108
\| \| \| \|	llvm-svn: 295853
*	Fix an obvious bug in SampleProfileReaderGCC.	Dehao Chen	2017-02-22	2	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The CallTargetProfile should be added to FProfile to be consistent with other profile readers. Reviewers: dnovillo, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30233 llvm-svn: 295852
*	[WebAssembly] Configure codegen to legalize f16 values.	Dan Gohman	2017-02-22	1	-0/+28
\| \| \| \|	llvm-svn: 295850
*	[DAGCombiner] revert r295336	Bill Seurer	2017-02-22	8	-55/+162
\| \| \| \| \| \| \| \| \| \| \|	r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. llvm-svn: 295849
*	* [AMDGPU][mc][tests] Updated coverage/smoke tests for gfx7 and gfx8; minor test corrections.	Dmitry Preobrazhensky	2017-02-22	4	-100835/+114665
\| \| \| \| \| \| \|	NB: several old tests have been corrected because they violated constant bus limitations. llvm-svn: 295834
*	[X86][GlobalISel] Initial implementation, select G_ADD gpr, gpr	Igor Breger	2017-02-22	3	-0/+206
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Initial implementation for X86InstructionSelector. Handle selection of COPY and G_ADD/G_SUB gpr, gpr. Reviewers: qcolombet, rovka, zvi, ab Reviewed By: rovka Subscribers: mgorny, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29816 llvm-svn: 295824
*	[X86] Regenerate CSE test with codegen instead of just the instruction count	Simon Pilgrim	2017-02-22	1	-2/+37
\| \| \| \|	llvm-svn: 295819
*	[ARM] Fix constant islands pass.	Roger Ferrer Ibanez	2017-02-22	1	-0/+1052
\| \| \| \| \| \| \| \| \| \| \| \|	The pass tries to fix a spill of LR that turns out to be unnecessary. So it removes the tPOP but forgets to remove the tPUSH. This causes the stack to be misaligned upon returning from the function. Thus, remove the tPUSH as well in this case. Differential Revision: https://reviews.llvm.org/D30207 llvm-svn: 295816
*	Write to a temporary file in test instead of random file in the test directory.	Benjamin Kramer	2017-02-22	1	-6/+6
\| \| \| \|	llvm-svn: 295815
*	[ARM] Classification Improvements to ARM Sched-Models. NFCI.	Javed Absar	2017-02-22	1	-0/+175
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds missing sched classes for Thumb2 instructions. This has been missing so far, and as a consequence, machine scheduler models for individual sub-targets have tended to be larger than they needed to be. These patches should help write schedulers better and faster in the future for ARM sub-targets. Reviewer: Diana Picus Differential Revision: https://reviews.llvm.org/D29953 llvm-svn: 295811
*	[AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available	Craig Topper	2017-02-22	2	-16/+36
\| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION. I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others. Differential revision: https://reviews.llvm.org/D30186 llvm-svn: 295810
*	[ValueTracking] Make poison propagation more aggressive	Sanjoy Das	2017-02-22	3	-17/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Motivation: fix PR31181 without regression (the actual fix is still in progress). However, the actual content of PR31181 is not relevant here. This change makes poison propagation more aggressive in the following cases: 1. poison * Val == poison, for any Val. In particular, this changes existing intentional and documented behavior in these two cases: a. Val is 0 b. Val is 2^k * N 2. poison << Val == poison, for any Val 3. getelementptr is poison if any input is poison I think all of these are justified (and are axiomatically true in the new poison / undef model): 1a: we need poison * 0 to be poison to allow transforms like these: A * (B + C) ==> A * B + A * C If poison * 0 were 0 then the above transform could not be allowed since e.g. we could have A = poison, B = 1, C = -1, making the LHS poison * (1 + -1) = poison * 0 = 0 and the RHS poison * 1 + poison * -1 = poison + poison = poison 1b: we need e.g. poison * 4 to be poison since we want to allow A * 4 ==> A + A + A + A If poison * 4 were a value with all of its bits poison except the last four, then we'd not be able to do this transform, since if A were poison the LHS would only be "partially" poison while the RHS would be "full" poison. 2: Same reasoning as (1b), we'd like to have the following kinds of transforms be legal: A << 1 ==> A + A Reviewers: majnemer, efriedma Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D30185 llvm-svn: 295809
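The argument in case 1a can be replayed with a toy model. This is a sketch in Python under loud assumptions: `POISON`, `add`, `mul_old`, `mul_new` and `distribute` are hypothetical names, poison is modelled as a single sentinel, and the "old" rule is reduced to just the `poison * 0 == 0` case from the message. It is not how ValueTracking is implemented.

```python
POISON = object()  # sentinel standing in for a poison value (assumption)


def add(a, b):
    """Addition where poison in either operand poisons the result."""
    return POISON if a is POISON or b is POISON else a + b


def mul_old(a, b):
    """Toy version of the old rule: poison * 0 folds to 0."""
    if a is POISON or b is POISON:
        other = b if a is POISON else a
        return 0 if other == 0 else POISON
    return a * b


def mul_new(a, b):
    """New rule from the commit: poison * Val is poison for any Val."""
    return POISON if a is POISON or b is POISON else a * b


def distribute(mul, a, b, c):
    """Return (A * (B + C), A * B + A * C) under the given multiply rule."""
    lhs = mul(a, add(b, c))
    rhs = add(mul(a, b), mul(a, c))
    return lhs, rhs


old_lhs, old_rhs = distribute(mul_old, POISON, 1, -1)
new_lhs, new_rhs = distribute(mul_new, POISON, 1, -1)
```

Under the old rule the left-hand side is the defined value 0 while the right-hand side is poison, so rewriting one into the other would introduce poison; under the new rule both sides are poison and the rewrite is safe.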
*	AMDGPU: Add cvt.pkrtz intrinsic	Matt Arsenault	2017-02-22	9	-46/+266
\| \| \| \| \| \|	Convert llvm.SI.packf16 test uses llvm-svn: 295797
*	[LoopUnroll] Enable PGO-based loop peeling by default.	Michael Kuperstein	2017-02-22	1	-1/+1
\| \| \| \| \| \| \| \| \|	This enables peeling of loops with low dynamic iteration count by default, when profile information is available. Differential Revision: https://reviews.llvm.org/D27734 llvm-svn: 295796
*	AMDGPU: Remove some uses of llvm.SI.export in tests	Matt Arsenault	2017-02-22	33	-1065/+952
\| \| \| \| \| \|	Merge some of the old, smaller tests into more complete versions. llvm-svn: 295792
*	AMDGPU: Remove llvm.AMDGPU.clamp intrinsic	Matt Arsenault	2017-02-21	9	-812/+776
\| \| \| \|	llvm-svn: 295789
*	AMDGPU: Redefine clamp node as clamp 0.0-1.0	Matt Arsenault	2017-02-21	3	-4/+550
\| \| \| \| \| \| \| \| \| \| \|	Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
*	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	Artem Belevich	2017-02-21	8	-36/+964
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g. v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within. This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling the thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch. General algorithm: * ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets. * VectorizePTXValueVTs() uses that data to create a vectorization plan which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors. * Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan. Differential Revision: https://reviews.llvm.org/D30011 llvm-svn: 295784
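The idea of a vectorization plan over a flattened scalar list can be illustrated with a greedy chunking into 4-, 2- and 1-element pieces. This is a deliberately simplified sketch in Python: `plan_vector_chunks` is a hypothetical helper, the width set is an assumption, and alignment and offsets are ignored entirely, so it should not be read as the logic of `VectorizePTXValueVTs()`.

```python
def plan_vector_chunks(num_scalars, widths=(4, 2, 1)):
    """Greedily split a run of same-typed scalars into vector-sized chunks.

    `widths` must be in descending order and end with 1 so that every
    scalar is covered; a chunk of width 1 models a scalar access.
    """
    chunks = []
    remaining = num_scalars
    for width in widths:
        while remaining >= width:
            chunks.append(width)
            remaining -= width
    return chunks


# Five f32 values: one 4-wide access plus one scalar access,
# touching exactly five elements rather than eight.
v5_plan = plan_vector_chunks(5)
v7_plan = plan_vector_chunks(7)
```

For five elements the plan covers exactly five values, in contrast to the eight values the commit message says the original code would touch for v5f32.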
*	[AArch64] Add test case for fusion of literal generation	Evandro Menezes	2017-02-21	1	-0/+46
\| \| \| \| \| \| \|	Add test case from https://reviews.llvm.org/D28698 that was somehow lost in transit. llvm-svn: 295775
*	[AArch64] Add test case for fusion of AES crypto operations	Evandro Menezes	2017-02-21	1	-0/+207
\| \| \| \| \| \| \|	Add test case from https://reviews.llvm.org/D28491 that was somehow lost in transit. llvm-svn: 295774
*	Don't modify archive members unless really needed.	Rafael Espindola	2017-02-21	3	-7/+37
\| \| \| \| \| \| \| \| \| \| \|	For whatever reason ld64 requires that member headers (not the members themselves) should be aligned. The only way to do that is to edit the previous member so that it ends at an aligned boundary. Since modifying data put in an archive is an undesirable property, llvm-ar should only do it when it is absolutely necessary. llvm-svn: 295765
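The padding needed to make the previous member end on an aligned boundary is a small modular computation. The sketch below is in Python; `padding_for_next_header` is a hypothetical helper, and both the 8-byte default alignment and the `require_alignment` switch are assumptions made for illustration, not values taken from llvm-ar.

```python
def padding_for_next_header(end_offset, alignment=8, require_alignment=True):
    """Bytes to append to the previous member so the next header is aligned.

    `end_offset` is the archive offset at which the previous member's data
    ends. When alignment is not required, the member is left untouched.
    """
    if not require_alignment:
        return 0
    return -end_offset % alignment


pad_unaligned = padding_for_next_header(13)      # 13 + 3 == 16
pad_aligned = padding_for_next_header(16)        # already on a boundary
pad_skipped = padding_for_next_header(13, require_alignment=False)
```

Returning 0 when alignment is not required mirrors the intent stated in the commit: the member data is modified only when a consumer actually needs aligned headers.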
*	Fix PR31896.	Evgeniy Stepanov	2017-02-21	1	-0/+16
\| \| \| \| \| \|	Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset). llvm-svn: 295762
*	[InstCombine] canonicalize non-obvious forms of integer min/max	Sanjay Patel	2017-02-21	2	-22/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is part of trying to clean up our handling of min/max patterns in IR. By converting these to canonical form, we're more likely to recognize them because there are various places in InstCombine that don't use matchSelectPattern or m_SMax and friends. The backend fixups referenced in the now deleted TODO comment were added with: https://reviews.llvm.org/rL291392 https://reviews.llvm.org/rL289738 If there's any codegen fallout from this change, we should be able to address it in DAGCombiner or target-specific lowering. llvm-svn: 295758
*	AMDGPU: Remove dead declarations in tests	Matt Arsenault	2017-02-21	2	-8/+0
\| \| \| \|	llvm-svn: 295757
*	AMDGPU: Remove dead declarations from MIR tests	Matt Arsenault	2017-02-21	3	-48/+5
\| \| \| \|	llvm-svn: 295755
*	AMDGPU: Remove llvm.AMDGPU.flbit intrinsic	Matt Arsenault	2017-02-21	1	-25/+0
\| \| \| \|	llvm-svn: 295754
*	AMDGPU: Don't use stack space for SGPR->VGPR spills	Matt Arsenault	2017-02-21	3	-4/+651
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753
*	Teach the IR verifier to reject conflicting debug info for function arguments.	Adrian Prantl	2017-02-21	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \|	Conflicting debug info for function arguments causes hard-to-debug assertions in the DWARF backend, so the Verifier should reject it. For performance reasons this only checks function arguments from non-inlined debug intrinsics for now. rdar://problem/30520286 llvm-svn: 295749
*	[CodeGenPrepare] Sink and duplicate more 'and' instructions.	Geoff Berry	2017-02-21	3	-1/+288
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Rework the code that was sinking/duplicating (icmp and, 0) sequences into blocks where they were being used by conditional branches to form more tbz instructions on AArch64. The new code is more general in that it just looks for 'and's that have all icmp 0's as users, with a target hook used to select which subset of 'and' instructions to consider. This change also enables 'and' sinking for X86, where it is more widely beneficial than on AArch64. The 'and' sinking/duplicating code is moved into the optimizeInst phase of CodeGenPrepare, where it can take advantage of the fact that OptimizeCmpExpression has already sunk/duplicated any icmps into the blocks where they are used. One minor complication from this change is that optimizeLoadExt needed to be updated to always mark 'and's it has determined should be in the same block as their feeding load in the InsertedInsts set, to avoid an infinite loop of hoisting and sinking the same 'and'. This change fixes a regression on X86 in the tsan runtime caused by moving GVNHoist to a later place in the optimization pipeline (see PR31382). Reviewers: t.p.northover, qcolombet, MatzeB Subscribers: aemerson, mcrosier, sebpop, llvm-commits Differential Revision: https://reviews.llvm.org/D28813 llvm-svn: 295746
*	Test commit	Dmitry Preobrazhensky	2017-02-21	2	-0/+4
\| \| \| \|	llvm-svn: 295740
*	[X86][AVX512] Update VPBROADCASTQ test to combine from VPERMQ instead of VPERMI2Q.	Simon Pilgrim	2017-02-21	1	-7/+6
\| \| \| \| \| \| \| \|	VPERMI2Q doesn't have shuffle decoding from re-materializable constants. llvm-svn: 295736
*	[X86][AVX] Rename shuffle combine tests to show combined shuffle type. NFCI.	Simon Pilgrim	2017-02-21	2	-9/+9
\| \| \| \|	llvm-svn: 295735
*	[ARM] Correct SP/PC handling in t2MOVr	John Brawn	2017-02-21	1	-0/+100
\| \| \| \| \| \|	Add a missing test that I forgot to svn add in my previous commit llvm-svn: 295734
*	[X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets.	Simon Pilgrim	2017-02-21	1	-4/+2
\| \| \| \| \| \|	As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ. llvm-svn: 295733
*	[X86][AVX2] Add AVX512 test targets to AVX2 shuffle combines.	Simon Pilgrim	2017-02-21	1	-24/+50
\| \| \| \|	llvm-svn: 295731
*	[X86][AVX] Add tests showing missed VPBROADCASTQ folding on 32-bit targets.	Simon Pilgrim	2017-02-21	2	-0/+54
\| \| \| \| \| \| \| \|	As i64 isn't a value type on 32-bit targets, we fail to fold the VZEXT_LOAD into VPBROADCASTQ. Also shows that we're not decoding VPERMIV3 shuffles very well.... llvm-svn: 295729
*	[X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL.	Simon Pilgrim	2017-02-21	1	-20/+5
\| \| \| \| \| \|	This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns. llvm-svn: 295723