llvm-svn: 276002
llvm-svn: 275995
llvm-svn: 275994
llvm-svn: 275992
llvm-svn: 275990
Summary:
Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and is best understood from the title of the former:
> Transform (zext (or (icmp), (icmp))) to (or (zext (icmp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp.
Apparently, it introduced a transformation that is the reverse of the one done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions, which is mirrored in the responsible check:
`if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...`
This check seems to rule out more cases than necessary, because:
- the reverse transformation is done for `or` instructions only
- and not every `zext icmp` pair is necessarily the result of this reverse transformation
Therefore we now remove this check and replace it with a more fine-grained one in `shouldOptimizeCast()` that rejects only those `logic(zext(icmp), zext(icmp))` expressions that `transformZExtICmp()` would be able to optimize, which still avoids the endless loop mentioned above. This means we are now also able to simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`).
As an example, consider the following IR snippet
```
%1 = icmp sgt i64 %a, %b
%2 = zext i1 %1 to i8
%3 = icmp slt i64 %a, %c
%4 = zext i1 %3 to i8
%5 = and i8 %2, %4
```
which would now be transformed to
```
%1 = icmp sgt i64 %a, %b
%2 = icmp slt i64 %a, %c
%3 = and i1 %1, %2
%4 = zext i1 %3 to i8
```
This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. As shown above, this may introduce `zext` operations casting between `i1` and `i8`, which could, for example, hinder ScalarEvolution and Polly on certain code.
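For illustration, here is a minimal source-level analogue of the IR above (a hypothetical C++ sketch, not taken from the patch or from Julia):
```cpp
#include <cstdint>

// Hypothetical analogue of the Julia snippet (a > b) & (a < c) when Bool is
// lowered to an 8-bit integer: each comparison is zero-extended from i1 to
// i8 before the bitwise and. With this change, InstCombine performs the and
// on the i1 values and emits a single zext of the result instead.
uint8_t bothHold(int64_t a, int64_t b, int64_t c) {
  uint8_t x = (a > b); // icmp + zext i1 -> i8
  uint8_t y = (a < c); // icmp + zext i1 -> i8
  return x & y;        // previously: and i8; now: and i1 + one zext
}
```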
Reviewers: grosser, vtjnash, majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22511
Contributed-by: Matthias Reisinger
llvm-svn: 275989
generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics differs enough from generic IR to cause problems: INF/NaN/out-of-range values are guaranteed to result in a 0x80000000 value, which plays havoc with constant folding (it converts them to either zero or UNDEF). This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
A companion clang patch is at D22105
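For reference, the truncating-conversion behaviour described above can be observed from C++ via the corresponding intrinsic (a minimal host-side illustration, not part of the patch):
```cpp
#include <immintrin.h>
#include <climits>
#include <cmath>
#include <cstdio>

int main() {
  // CVTTPS2DQ: truncating float -> i32 conversion of all four lanes.
  __m128 v = _mm_set_ps(NAN, INFINITY, 3.7f, -2.7f);
  __m128i r = _mm_cvttps_epi32(v);
  alignas(16) int out[4];
  _mm_store_si128(reinterpret_cast<__m128i *>(out), r);
  // In-range lanes truncate toward zero (-2, 3); the NaN/INF lanes produce
  // 0x80000000 (INT_MIN), unlike generic IR's fptosi constant folding.
  std::printf("%d %d %d %d (INT_MIN = %d)\n", out[0], out[1], out[2], out[3],
              INT_MIN);
}
```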
Differential Revision: https://reviews.llvm.org/D22106
llvm-svn: 275981
Recommitting after r274347 was reverted. This patch introduces some
classes to refactor the 3 and 4 register Thumb2 multiplication
instruction descriptions, plus improved tests for some of those
instructions.
Differential Revision: https://reviews.llvm.org/D21929
llvm-svn: 275979
The standard local dynamic model for TLS on ARM systems needs two
relocations:
- R_ARM_TLS_LDM32 (module idx)
- R_ARM_TLS_LDO32 (offset of object from origin of module TLS block)
In GNU style assembler we use symbol(tlsldm) and symbol(tlsldo) to
produce these relocations.
llvm-mc for ARM supports symbol(tlsldo) but does not support symbol(tlsldm).
This patch wires up the existing symbol(tlsldm) to R_ARM_TLS_LDM32.
TLS for ARM is defined in "Addenda to, and Errata in, the ABI for the ARM Architecture".
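For context, the local dynamic model typically arises for file-local thread-local variables; a hypothetical C++ example that a compiler may lower through this model is:
```cpp
// Hypothetical example: with -fPIC and the local-dynamic TLS model, both
// variables can share a single module-index lookup (R_ARM_TLS_LDM32), with
// each access using its own offset into the module's TLS block
// (R_ARM_TLS_LDO32).
static thread_local int counter = 0;
static thread_local int limit = 100;

bool bumpAndCheck() { return ++counter < limit; }
```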
Differential Revision: https://reviews.llvm.org/D22461
llvm-svn: 275977
As promised in D22191
llvm-svn: 275976
As discussed on PR27654, this patch fixes the triples of a lot of AArch64 tests and enables lit tests on Windows.
This will hopefully help stop cases where Windows developers break the AArch64 target.
Differential Revision: https://reviews.llvm.org/D22191
llvm-svn: 275973
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D22458
llvm-svn: 275968
Summary:
N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own
($).
This fixes the majority of object differences between -fintegrated-as and
-fno-integrated-as.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D22412
llvm-svn: 275967
The condition expression `(a >> n) & 1` is converted to a "bt a, n" instruction. This works on all Intel targets, but on AVX-512 it was broken because the expression is modified to `(truncate (a >> n) to i1)`.
I added the new sequence `(truncate (a >> n) to i1)` to the BT pattern.
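A hypothetical C++ function of this shape, which should select the BT instruction on x86 (illustrative only, not the test case from the patch):
```cpp
#include <cstdint>

// Tests bit n of a. On x86 this pattern is expected to lower to "bt a, n"
// followed by a setcc, rather than an explicit shift-and-mask; the fix keeps
// that lowering intact when AVX-512 truncates the shift result to i1.
bool bitIsSet(uint64_t a, unsigned n) {
  return (a >> n) & 1;
}
```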
Differential Revision: https://reviews.llvm.org/D22354
llvm-svn: 275950
llvm-svn: 275942
This patch updates MemorySSA's use-optimizing walker to be more
accurate and, in some cases, faster.
Essentially, this changed our core walking algorithm from a
cache-as-you-go DFS to an iteratively expanded DFS, with all of the
caching happening at the end. Said expansion happens when we hit a Phi,
P; we'll try to do the smallest amount of work possible to see if
optimizing above that Phi is legal in the first place. If so, we'll
expand the search to see if we can optimize to the next phi, etc.
An iteratively expanded DFS lets us potentially quit earlier (because we
don't assume that we can optimize above all phis) than our old walker.
Additionally, because we don't cache as we go, we can now optimize above
loops.
As an added bonus, this patch adds a ton of verification (if EXPENSIVE_CHECKS is enabled), so finding bugs is easier.
Differential Revision: https://reviews.llvm.org/D21777
llvm-svn: 275940
Add a "-j" option to llvm-profdata to control the number of threads used.
Auto-detect NumThreads when it isn't specified, and avoid spawning threads when
they wouldn't be beneficial.
I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:
No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total
Changes since the initial commit:
- When handling odd-length inputs, call ThreadPool::wait() before merging the
last profile. Should fix a race/off-by-one (see r275937).
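A minimal sketch of a pairwise parallel merge of this kind, using plain std::thread and toy profiles (assumed from the description above; the actual tool uses LLVM's ThreadPool and real profile data):
```cpp
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Toy stand-in for a profile: a fixed-size vector of counters that merge by
// element-wise addition.
using Profile = std::vector<long>;

static void mergeInto(Profile &Dst, const Profile &Src) {
  for (std::size_t I = 0; I < Dst.size(); ++I)
    Dst[I] += Src[I];
}

int main() {
  std::vector<Profile> Inputs(5, Profile(4, 1)); // odd number of inputs
  // Round-based pairwise merge: each round halves the number of live
  // profiles. With an odd count, the leftover profile is folded in only
  // after the round's worker threads have been joined -- the analogue of
  // calling ThreadPool::wait() before merging the last profile.
  while (Inputs.size() > 1) {
    std::size_t Pairs = Inputs.size() / 2;
    std::vector<std::thread> Workers;
    for (std::size_t I = 0; I < Pairs; ++I)
      Workers.emplace_back([&Inputs, Pairs, I] {
        mergeInto(Inputs[I], Inputs[Pairs + I]); // disjoint slots: no race
      });
    for (std::thread &W : Workers)
      W.join(); // wait before touching the odd tail
    if (Inputs.size() % 2 != 0)
      mergeInto(Inputs[0], Inputs.back());
    Inputs.resize(Pairs);
  }
  std::printf("merged counter: %ld (expected 5)\n", Inputs[0][0]);
}
```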
Differential Revision: https://reviews.llvm.org/D22438
llvm-svn: 275938
This reverts commit r275921. It broke the ppc64be bot:
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/3537
I'm not sure why it broke, but based on the output, it looks like an
off-by-one (one profile left un-merged).
llvm-svn: 275937
Instructions in the uniform set will not have vector versions, so add them to VecValuesToIgnore.
Induction variables that are only used in uniform instructions or consecutive-pointer instructions have already been added to VecValuesToIgnore above. For induction variables that are only used in uniform instructions or non-consecutive/non-gather-scatter pointer instructions, the related phi and update are also added to the VecValuesToIgnore set.
This change makes the vector register-usage estimation less conservative.
Differential Revision: https://reviews.llvm.org/D20474
The recommit fixed the testcase global_alias.ll.
llvm-svn: 275936
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.
v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.
llvm-svn: 275934
Reviewers: hfinkel, jfb, reames
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D22385
llvm-svn: 275932
llvm-svn: 275928
Add a "-j" option to llvm-profdata to control the number of threads
used. Auto-detect NumThreads when it isn't specified, and avoid spawning
threads when they wouldn't be beneficial.
I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:
No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total
Differential Revision: https://reviews.llvm.org/D22438
llvm-svn: 275921
... including calls from kernel functions that were previously ignored by mistake.
llvm-svn: 275920
Summary: Convert LoopStrengthReduce pass to new PM
Reviewers: davidxl, silvas
Subscribers: junbuml, sanjoy, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D22468
llvm-svn: 275919
Summary: Port FunctionImport Pass to new PM.
Reviewers: mehdi_amini, davide
Subscribers: davidxl, llvm-commits
Differential Revision: https://reviews.llvm.org/D22475
llvm-svn: 275916
llvm-svn: 275915
Instructions in the uniform set will not have vector versions, so add them to VecValuesToIgnore.
Induction variables that are only used in uniform instructions or consecutive-pointer instructions have already been added to VecValuesToIgnore above. For induction variables that are only used in uniform instructions or non-consecutive/non-gather-scatter pointer instructions, the related phi and update are also added to the VecValuesToIgnore set.
This change makes the vector register-usage estimation less conservative.
Differential Revision: https://reviews.llvm.org/D20474
llvm-svn: 275912
llvm-svn: 275908
llvm-svn: 275899
functions.
Taking the address of a byval variable in PTX is legal, but currently runs into a miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing a minimum alignment on byval arguments of device functions.
The change is a no-op at the SASS level for sm_3x, where ptxas already aligns the local copy by at least 4.
Differential Revision: https://reviews.llvm.org/D22428
llvm-svn: 275893
Summary:
Usually LCSSA survives this transformation, but in some cases (see the attached test) it doesn't: after separating, values from the original loop might be used in the outer loop. Before the transformation both were the same loop, so LCSSA phis were not required.
This fixes PR28272.
Reviewers: sanjoy, hfinkel, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21665
llvm-svn: 275891
Breaks asan, see https://reviews.llvm.org/D22103
This reverts commit r275776.
llvm-svn: 275890
Breaks asan, see https://reviews.llvm.org/D22103
This reverts commit r275777.
llvm-svn: 275889
form.
Summary:
SSAUpdate might insert PHI-nodes inside loops, which can break LCSSA
form unless we fix it up.
This fixes PR28424.
Reviewers: sanjoy, chandlerc, hfinkel
Subscribers: uabelho, llvm-commits
Differential Revision: http://reviews.llvm.org/D21997
llvm-svn: 275883
Added tests for SSE2 as well as SSE41
llvm-svn: 275878
Added tests for SSE2 as well as SSE41+AVX
llvm-svn: 275876
Added tests for SSE2 as well as SSE41
llvm-svn: 275875
llvm-svn: 275872
llvm-svn: 275871
llvm-svn: 275870
This is currently only called with GEP users. A direct alloca would only happen with current typed pointers for arrays, which is a perverse case.
Also fix crashes on `0 x` and `1 x` arrays.
llvm-svn: 275869
Elsewhere (particularly computeKnownBits) we assume that a global will be
aligned to the value returned by Value::getPointerAlignment. This is used to
boost the alignment on memcpy/memset, so any target-specific request can only
increase that value.
llvm-svn: 275866
Based on a suggestion by Harlan Haskins!
llvm-svn: 275840
Currently we only decode broadcasts from a vector of the same size.
llvm-svn: 275823
This is compliant with the official ABI, but allows experimentation with
calling conventions.
llvm-svn: 275822
DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow
SELECT op code for x86_64 fp128 type for MME targets,
so SoftenFloatOperand does not abort on SELECT op code.
Differential Revision: http://reviews.llvm.org/D21758
llvm-svn: 275818
Only its counterpart, diagnostics-with-hotness-lazy-BFI.ll, which
invokes opt with -debug-only=.
llvm-svn: 275812
Summary:
The direct motivation for the port is to ensure that the OptRemarkEmitter
tests work with the new PM.
This remains a function pass because we not only create multiple loops
but could also version the original loop.
In the test I need to invoke opt with -passes='require<aa>,loop-distribute'. LoopDistribute does not directly depend on AA; however, LAA does. LAA uses getCachedResult, so I *think* we need to manually pull in 'aa'.
Reviewers: davidxl, silvas
Subscribers: sanjoy, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D22437
llvm-svn: 275811
We don't yet decode broadcasts as a target shuffle
llvm-svn: 275808