bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	AMDGPU: Make i32 uaddo/usubo legal	Matt Arsenault	2017-01-30	2	-58/+175
	llvm-svn: 293514
*	DAG: Fold fneg into compare with constant into the constant	Matt Arsenault	2017-01-30	3	-6/+263
	fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred). InstCombine already does this. llvm-svn: 293512
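As a rough illustration of the fold described above (not the DAG combiner's actual code), the following Python sketch checks the identity on a handful of sample values. It assumes Python floats behave as IEEE-754 doubles; the table `SWAPPED` and the helper `fold_holds` are hypothetical names, and only the ordered predicates are modelled.

```python
import math
import operator

# Each ordered fcmp predicate maps to (comparison, comparison with operands swapped).
SWAPPED = {
    "olt": (operator.lt, operator.gt),
    "ole": (operator.le, operator.ge),
    "ogt": (operator.gt, operator.lt),
    "oge": (operator.ge, operator.le),
    "oeq": (operator.eq, operator.eq),
}


def fold_holds(x, c):
    """True if cmp(-x, c) equals swapped_cmp(x, -c) for every modelled predicate."""
    return all(cmp(-x, c) == swapped(x, -c) for cmp, swapped in SWAPPED.values())


# Sample values include signed zeros, infinities, and NaN.
SAMPLES = [0.0, -0.0, 1.5, -1.5, math.inf, -math.inf, math.nan]
ALL_HOLD = all(fold_holds(x, c) for x in SAMPLES for c in SAMPLES)
```

Negation is exact in IEEE-754 arithmetic, which is why moving it from the variable to the constant only requires swapping the predicate.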
*	Revert "AMDGPU/GlobalISel: Add support for simple shaders"	Tom Stellard	2017-01-30	6	-356/+0
	This reverts commit r293503. Revert while I investigate some of the buildbot failures. llvm-svn: 293509
*	[InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1-C2) for vectors with splat constants	Sanjay Patel	2017-01-30	1	-3/+3
	llvm-svn: 293507
*	AMDGPU: Fix atomic_inc/atomic_dec + ds_swizzle not being divergent	Matt Arsenault	2017-01-30	2	-0/+43
	llvm-svn: 293504
*	AMDGPU/GlobalISel: Add support for simple shaders	Tom Stellard	2017-01-30	6	-0/+356
	Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293503
*	Update pr31758.ll for unreachable revert	Daniel Berlin	2017-01-30	1	-1/+1
	llvm-svn: 293502
*	Revert "NewGVN: Make unreachable blocks be marked with unreachable"	Daniel Berlin	2017-01-30	2	-20/+20
	This reverts commit r293196. Besides making things look nicer, ATM, we'd like to preserve analysis more than we'd like to destroy the CFG. We'll probably revisit in the future. llvm-svn: 293501
*	[X86][SSE] Add support for combining PINSRW+ASSERTZEXT+PEXTRW patterns with target shuffles	Simon Pilgrim	2017-01-30	1	-15/+1
	llvm-svn: 293500
*	DAG: Constant fold fp16_to_fp/fp16_to_fp	Matt Arsenault	2017-01-30	13	-163/+123
	This fixes emitting conversions of constants on targets without legal f16 that need to use these for legalization. llvm-svn: 293499
*	[InstCombine] fixed to propagate 'exact' on lshr	Sanjay Patel	2017-01-30	1	-1/+1
	The original shift is bigger, so this may qualify as 'obvious', but here's an attempt at an Alive-based proof:
	Name: exact
	Pre: (C1 u< C2)
	%a = shl i8 %x, C1
	%b = lshr exact i8 %a, C2
	=>
	%c = lshr exact i8 %x, C2 - C1
	%b = and i8 %c, ((1 << width(C1)) - 1) u>> C2
	Optimization is correct!
	llvm-svn: 293498
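The Alive proof above can be cross-checked by brute force for the i8 case. The sketch below is an independent Python model, not InstCombine code: it assumes 8-bit unsigned arithmetic, treats inputs that violate `exact` as poison (and skips them), and reads the mask `((1 << width(C1)) - 1) u>> C2` as the all-ones value shifted right by C2. The function name `check_exact_lshr_fold` is hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # all-ones i8 value


def check_exact_lshr_fold():
    """Return the number of (x, C1, C2) cases checked; raise on any mismatch."""
    checked = 0
    for c1 in range(WIDTH):
        for c2 in range(c1 + 1, WIDTH):      # precondition: C1 u< C2
            for x in range(1 << WIDTH):
                a = (x << c1) & MASK         # %a = shl i8 %x, C1
                if a & ((1 << c2) - 1):
                    continue                 # 'exact' violated: source is poison
                before = a >> c2             # %b = lshr exact i8 %a, C2
                diff = c2 - c1
                # The new lshr must itself be exact (no set bits shifted out).
                assert x & ((1 << diff) - 1) == 0
                c = x >> diff                # %c = lshr exact i8 %x, C2 - C1
                after = c & (MASK >> c2)     # %b = and i8 %c, (all-ones u>> C2)
                assert before == after
                checked += 1
    return checked
```

Calling `check_exact_lshr_fold()` walks every i8 input for every valid shift pair, so a silent return means the `exact` flag is safe to propagate in this model.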
*	[InstCombine] add 'exact' to lshr to show that it got dropped; NFC	Sanjay Patel	2017-01-30	1	-1/+2
	llvm-svn: 293496
*	[Inliner] Fold analysis remarks into missed remarks	Adam Nemet	2017-01-30	2	-4/+2
	This significantly reduces the noise level of these messages. llvm-svn: 293492
*	[InstCombine] enable lshr(shl X, C1), C2 folds for vectors with splat constants	Sanjay Patel	2017-01-30	1	-4/+3
	llvm-svn: 293489
*	[InstCombine] add tests for shift-shift patterns; NFC	Sanjay Patel	2017-01-30	1	-0/+57
	llvm-svn: 293487
*	TableGen: Fix infinite recursion in RegisterBankEmitter	Tom Stellard	2017-01-30	1	-0/+15
	Summary: AMDGPU has two register classes with the same set of registers, and this was causing this tablegen backend to get stuck in infinite recursion. Reviewers: dsanders Reviewed By: dsanders Subscribers: tpr, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D29049 llvm-svn: 293483
*	[X86][MCU] Minor bug fix for r293469 + test case	Asaf Badouh	2017-01-30	1	-0/+14
	llvm-svn: 293478
*	AMDGPU: Fix assembler encoding for EXP instructions on VI	Marek Olsak	2017-01-30	1	-30/+58
	Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28992 llvm-svn: 293476
*	[GlobalISel] Add support for indirectbr	Kristof Beyls	2017-01-30	2	-0/+59
	Differential Revision: https://reviews.llvm.org/D28079 llvm-svn: 293470
*	[X86][MCU] replace select with bit manipulation instead of branches	Asaf Badouh	2017-01-30	1	-0/+43
	Differential Revision: https://reviews.llvm.org/D28354 llvm-svn: 293469
*	[AVX-512] Remove duplicate CodeGenOnly patterns for scalar register broadcast.	Craig Topper	2017-01-30	3	-38/+26
	We can use COPY_TO_REGCLASS like AVX does. This causes stack spill slots to be oversized sometimes, but the same should already be happening with AVX. llvm-svn: 293464
*	Test RuntimeDyld doesn't crash with R_X86_64_NONE (r293388).	Will Dietz	2017-01-30	1	-0/+30
	Largely based on LLD test for dtrace. llvm-svn: 293451
*	[X86][Disassembler] Added SALC instruction	Chris Ray	2017-01-29	2	-790/+797
	Reviewers: joe.abbey, craig.topper Reviewed By: craig.topper Subscribers: majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D29201 llvm-svn: 293447
*	[AVX-512] Fix lowering for mask register concatenation with undef in the lower half.	Craig Topper	2017-01-29	1	-0/+12
	Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width. llvm-svn: 293446
*	[X86][SSE] Lower scalar_to_vector(0) to zero vector	Simon Pilgrim	2017-01-29	3	-89/+31
	Replaces an xor+movd/movq with an xorps, which is shorter in code size, avoids an int-fpu transfer, allows modern cores to fast-path the result during decode, and helps other combines recognise an all-zero vector. The only reason I can think of that we'd want to keep scalar_to_vector in this case is to help recognise that the upper elts are undef, but this doesn't seem to be a problem. Differential Revision: https://reviews.llvm.org/D29097 llvm-svn: 293438
*	[X86] Reproducer for pr31719. NFC	Zvi Rackover	2017-01-29	1	-3/+32
	llvm-svn: 293437
*	[InstCombine] enable (X >>?,exact C1) << C2 --> X << (C2 - C1) for vectors with splats	Sanjay Patel	2017-01-29	1	-6/+2
	llvm-svn: 293435
*	[InstCombine] add tests for shl(shr X, C1), C2 transforms; NFC	Sanjay Patel	2017-01-29	1	-4/+58
	llvm-svn: 293434
*	ARM: support `-mlong-calls` with AEABI TLS on ELF	Saleem Abdulrasool	2017-01-29	1	-0/+26
	Support lowering AEABI TLS access (__aeabi_read_tp) with long calls. This requires adjusting the call sequence to use an indirect call to get full addressability. Resolves PR31769! llvm-svn: 293433
*	[X86 Codegen] Fixed a bug in unsigned saturation	Elena Demikhovsky	2017-01-29	3	-28/+26
	PACKUSWB converts signed words to unsigned bytes (the same applies to the DW variant), so it can't be used for the umin+truncate pattern. AVX-512 VPMOVUS* instructions fit the pattern since they convert unsigned to unsigned. See https://llvm.org/bugs/show_bug.cgi?id=31773 Differential Revision: https://reviews.llvm.org/D29196 llvm-svn: 293431
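To make the mismatch concrete, here is a small Python model of a single 16-bit lane. It is a sketch under stated assumptions, not the X86 lowering code: `packuswb_lane`, `umin_trunc_lane`, and `first_mismatch` are hypothetical helpers, and the PACKUSWB model only captures its documented behaviour of reading the input as signed and saturating to the unsigned byte range.

```python
def packuswb_lane(word):
    """Model one PACKUSWB lane: read the 16-bit input as signed, saturate to 0..255."""
    signed = word - 0x10000 if word & 0x8000 else word
    return max(0, min(255, signed))


def umin_trunc_lane(word):
    """Model umin(x, 255) followed by truncation to a byte on an unsigned 16-bit lane."""
    return min(word, 255) & 0xFF


def first_mismatch():
    """Smallest 16-bit value on which the two lane models disagree, or None."""
    for word in range(0x10000):
        if packuswb_lane(word) != umin_trunc_lane(word):
            return word
    return None
```

The two models agree while the sign bit is clear and diverge as soon as it is set: for an input of 0xFFFF the signed reading saturates to 0, whereas unsigned saturation yields 255.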
*	Add -mtriple=aarch64-unknown to llvm/test/CodeGen/AArch64/GlobalISel/gisel-abort.ll.	NAKAMURA Takumi	2017-01-29	1	-1/+1
	Unsupported target might be induced if default target is neither macho nor elf. (e.g. *-win32) llvm-svn: 293430
*	[X86][GlobalISel] Add limited argument lowering support to the IRTranslator.	Igor Breger	2017-01-29	2	-0/+163
	Summary: Add limited (i8/i16/i32/i64) argument lowering support to the IRTranslator. Inspired by commit 289940. Reviewers: t.p.northover, qcolombet, ab, zvi, rovka Reviewed By: rovka Subscribers: dberris, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D28987 llvm-svn: 293427
*	Add some Book-E instructions to the asm parser and printer.	Justin Hibbits	2017-01-29	2	-0/+22
	Summary: Adds the following instructions: mfpmr, mtpmr, icblc, icblq, icbtls. Fix the scheduling for mtspr on e5500, which uses CFX0, instead of SFX0/SFX1 as on e500mc. Addresses PR 31538. Differential Revision: https://reviews.llvm.org/D29002 llvm-svn: 293417
*	[X86] Fix vector ANDN matching to work correctly when both inputs to the AND are XORs.	Craig Topper	2017-01-28	2	-12/+8
	llvm-svn: 293403
*	[X86] Add test case that shows failure to use a vector ANDN when both inputs to the AND are XORs.	Craig Topper	2017-01-28	1	-2/+25
	The matching code tries to canonicalize XOR to the left, but if there are two XORs and only one is a vnot, this canonicalization can prevent matching. llvm-svn: 293402
*	[SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way.	Mohammad Shahid	2017-01-28	3	-36/+22
	The jumbled scalar loads will be sorted while building the tree and these accesses will be marked to generate shufflevector after the vectorized load with proper mask. Reviewers: hfinkel, mssimpso, mkuper Differential Revision: https://reviews.llvm.org/D26905 Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad llvm-svn: 293386
*	[NVPTX] Add intrinsics to support named barriers.	Arpith Chacko Jacob	2017-01-28	1	-0/+40
	Support for barrier synchronization between a subset of threads in a CTA through one of sixteen explicitly specified barriers. These intrinsics are not directly exposed in CUDA but are critical for forthcoming support of OpenMP on NVPTX GPUs. The intrinsics allow the synchronization of an arbitrary (multiple of 32) number of threads in a CTA at one of 16 distinct barriers. The two intrinsics added are as follows:
	call void @llvm.nvvm.barrier.n(i32 10) waits for all threads in a CTA to arrive at named barrier #10.
	call void @llvm.nvvm.barrier(i32 15, i32 992) waits for 992 threads in a CTA to arrive at barrier #15.
	Detailed descriptions of these intrinsics are available in the PTX manual. http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions Reviewers: hfinkel, jlebar Differential Revision: https://reviews.llvm.org/D17657 llvm-svn: 293384
*	stripDebugInfo() should remove DILocation's found in !llvm.loop metadata	Daniel Sanders	2017-01-28	1	-0/+71
	Summary: Patch by Michele Scandale (with a small tweak to 'CHECK-NOT' the last DILocation in the test) Subscribers: bogner, llvm-commits Differential Revision: https://reviews.llvm.org/D27980 llvm-svn: 293377
*	[InstCombine] Merge DebugLoc when speculatively hoisting store instruction	Taewook Oh	2017-01-28	1	-0/+68
	Summary: Along with https://reviews.llvm.org/D27804, debug locations need to be merged when hoisting store instructions as well. Not sure if just dropping debug locations would make more sense for this case, but as the branch instruction will have at least a different discriminator from the hoisted store instruction, I think there will be no difference in practice. Reviewers: aprantl, andreadb, danielcdh Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29062 llvm-svn: 293372
*	[RegisterBankInfo] Emit proper type for remapped registers.	Quentin Colombet	2017-01-28	1	-7/+7
	When the OperandsMapper creates virtual registers, it used to just create plain scalar registers with the right size. This may confuse the instruction selector because we lose the information about what the instruction using those registers is supposed to do. The MachineVerifier complains about that already. With this patch, the OperandsMapper still creates plain scalar registers, but the expectation is for the mapping function to remap the type properly. The default mapping function has been updated to do that. rdar://problem/30231850 llvm-svn: 293362
*	[RegisterCoalescing] Recommit the patch "Remove partial redundent copy".	Quentin Colombet	2017-01-28	3	-0/+454
	In r292621, the recommit fixes a bug related to the live interval update after the partially redundant copy is moved. This recommit solves an additional bug related to the lack of update of subranges. The original patch is to solve the performance problem described in PR27827. Register coalescing sometimes cannot remove a copy because of interference. But if we can find a reverse copy in one of the predecessor blocks of the copy, the copy is partially redundant and we may remove the copy partially by moving it to the predecessor block without the reverse copy. Differential Revision: https://reviews.llvm.org/D28585 Re-apply r292621 Revert "Revert rL292621. Caused some internal build bot failures in apple." This reverts commit r292984. Original patch: Wei Mi <wmi@google.com> Subrange fix: Mostly Matthias Braun <matze@braunis.de> llvm-svn: 293353
*	[InstCombine] move icmp transforms that might be recognized as min/max and inf-loop (PR31751)	Sanjay Patel	2017-01-27	1	-0/+82
	This is a minimal patch to avoid the infinite loop in: https://llvm.org/bugs/show_bug.cgi?id=31751 But the general problem is bigger: we're not canonicalizing all of the min/max forms reported by value tracking's matchSelectPattern(), and we don't define min/max consistently. Some code uses matchSelectPattern(), other code uses matchers like m_Umax, and others have their own inline definitions which may be subtly different from any of the above. The reason that the test cases in this patch need a cast op to trigger is because we don't (yet) canonicalize all min/max forms based on matchSelectPattern() in canonicalizeMinMaxWithConstant(), but we do make min/max+cast transforms based on matchSelectPattern() in visitSelectInst(). The location of the icmp transforms that trigger the inf-loop seems arbitrary at best, so I'm moving those behind the min/max fence in visitICmpInst() as the quick fix. llvm-svn: 293345
*	GlobalISel: set correct regclass for LOAD_STACK_GUARD.	Tim Northover	2017-01-27	1	-1/+1
	Since it's not actually a generic MI, its register operands need a RegClass, which is conveniently the target's pointer RegClass. llvm-svn: 293335
*	GlobalISel: mark incoming landing-pad registers as live.	Tim Northover	2017-01-27	1	-1/+1
	Should fix machine verifier failures. llvm-svn: 293334
*	[X86] Adding FFREEP instruction.	Chris Ray	2017-01-27	2	-11808/+11836
	Summary: Small change to get the FFREEP instruction to decode properly. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29193 llvm-svn: 293314
*	AMDGPU: Enable FeatureFlatForGlobal on Volcanic Islands	Matt Arsenault	2017-01-27	2	-26/+54
	Accomplishes what r292982 was supposed to, which ended up only really making the necessary test changes. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 293310
*	[ARM/AArch64] Relocate and update InterleavedAccessPass tests (NFC)	Matthew Simpson	2017-01-27	10	-1027/+1407
	The interleaved access pass is an IR-to-IR transformation that runs before code generation. It matches interleaved memory operations to target-specific intrinsics (that are later lowered to load and store multiple instructions on ARM/AArch64). We place tests for similar passes (e.g., GlobalMergePass) under test/Transforms. This patch moves the InterleavedAccessPass tests out of test/CodeGen and into target-specific directories under test/Transforms/InterleavedAccess. Although the pass is an IR pass, many of the existing tests were llc tests rather than opt tests. For example, the tests would check for ldN/stN instructions generated by llc rather than the intrinsic calls the pass actually inserts. Thus, this patch updates all tests to be opt tests that check for the inserted intrinsics. We already have separate CodeGen tests that ensure we lower the interleaved access intrinsics to their corresponding ldN/stN instructions. In addition to migrating the tests to opt, this patch also performs some minor clean-up (to ensure consistent naming, etc.). Differential Revision: https://reviews.llvm.org/D29184 llvm-svn: 293309
*	[CodeGenPrep]No negative cost in the ExtLd promotion	Jun Bum Lim	2017-01-27	1	-0/+21
	Summary: This change prevents the signed cost value from being negative, as the value is passed as an unsigned argument. Reviewers: mcrosier, jmolloy, qcolombet, javed.absar Reviewed By: mcrosier, qcolombet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28871 llvm-svn: 293307
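The hazard behind this change can be illustrated in a few lines. The Python sketch below is only a model: it assumes a 32-bit unsigned parameter, and both `as_unsigned32` and `clamped_cost` are hypothetical helpers. Clamping at zero is one plausible way to keep the value non-negative and is not taken from the patch itself.

```python
def as_unsigned32(value):
    """Reinterpret a signed integer as a 32-bit unsigned value (modular wrap-around)."""
    return value & 0xFFFFFFFF


def clamped_cost(value):
    """Clamp a signed cost at zero before it is handed over as an unsigned value."""
    return as_unsigned32(max(0, value))
```

Under these assumptions `as_unsigned32(-1)` evaluates to 4294967295, so a slightly negative cost would be read as an enormous one, while `clamped_cost(-1)` stays at 0.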
*	[AMDGPU] Turn AMDGPUUnifyMetadata back into module pass	Stanislav Mekhanoshin	2017-01-27	1	-4/+0
	With the adjustPassManager interface it is now possible to use custom early module passes. Differential Revision: https://reviews.llvm.org/D29189 llvm-svn: 293300
*	Fix BasicAA incorrect assumption on GEP	Mehdi Amini	2017-01-27	1	-0/+19
	This fixes pr31761: BasicAA is deducing NoAlias on the result of the GEP if the base pointer is itself NoAlias. This is possible only if the NoAlias on the base pointer is deduced with a non-sized query: this should guarantee that the pointers belong to different memory allocations and that the GEP can't legally jump from one to another. Differential Revision: https://reviews.llvm.org/D29216 llvm-svn: 293293