bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[AVX-512] Remove 128/256 masked pshufb intrinsics. Autoupgrade them to ↵	Craig Topper	2016-11-07	2	-32/+33
\| \| \| \| \| \|	legacy intrinsics and a select. llvm-svn: 286089
*	ARM: lower fpowi appropriately for Windows ARM	Saleem Abdulrasool	2016-11-06	1	-0/+57
\| \| \| \| \| \| \| \| \| \| \|	This handles the last of the builtin function calls for which we generated code that differed from Microsoft's ABI. Rather than generating a call to `__powi{s,d}f2` we now promote the parameter to a float or double and invoke `powf` or `pow` instead. Addresses PR30825! llvm-svn: 286082
*	[SelectionDAG] Add support for vector demandedelts in XOR opcodes	Simon Pilgrim	2016-11-06	1	-10/+2
\| \| \| \|	llvm-svn: 286075
*	[X86] Add knownbits vector xor test	Simon Pilgrim	2016-11-06	1	-0/+31
\| \| \| \| \| \|	In preparation for demandedelts support llvm-svn: 286074
*	[AVX-512] Remove intrinsics for 128/256-bit masked variable shift. Instead ↵	Craig Topper	2016-11-06	2	-215/+215
\| \| \| \| \| \|	upgrade them to a select and the older AVX2 intrinsic. llvm-svn: 286073
*	[AVX-512] Remove intrinsics for 128/256-bit masked shift by immediate. ↵	Craig Topper	2016-11-06	4	-325/+244
\| \| \| \| \| \|	Instead upgrade them to a select and the older SSE/AVX2 intrinsic. llvm-svn: 286072
*	[SelectionDAG] Add support for vector demandedelts in OR opcodes	Simon Pilgrim	2016-11-06	1	-10/+2
\| \| \| \|	llvm-svn: 286071
*	[AVX-512] Remove intrinsics for 128/256-bit masked shift by single element ↵	Craig Topper	2016-11-06	4	-303/+304
\| \| \| \| \| \|	in xmm. Instead upgrade them to a select and the older SSE/AVX2 intrinsic. llvm-svn: 286070
*	[AVX-512] Remove a 512-bit test case from the avx512vl test file. It ↵	Craig Topper	2016-11-06	1	-20/+0
\| \| \| \| \| \|	already exists in the avx512f test file. llvm-svn: 286069
*	[X86] Add knownbits vector or test	Simon Pilgrim	2016-11-06	1	-0/+31
\| \| \| \| \| \|	In preparation for demandedelts support llvm-svn: 286068
*	[X86] Add a few more fptoui test cases to the vec_fp_to_int.ll. The codegen ↵	Craig Topper	2016-11-06	1	-0/+135
\| \| \| \| \| \|	for these test cases will be improved for AVX512 in a future commit. llvm-svn: 286063
*	[AVX-512] Add missing EVEX version of pattern for (v2f64 (extloadv2f32 ↵	Craig Topper	2016-11-06	1	-2/+2
\| \| \| \| \| \|	addr:)) -> VCVTPS2PDZ128rm llvm-svn: 286059
*	[AVX-512] Add avx512vl command line to the fpext test and add ↵	Craig Topper	2016-11-06	1	-80/+146
\| \| \| \| \| \|	-show-mc-encoding to show where we aren't using EVEX instructions. llvm-svn: 286058
*	[AVX-512] Lower AVX cvtpd2ps intrinsic to ISD::FP_ROUND so it can use EVEX ↵	Craig Topper	2016-11-06	1	-1/+1
\| \| \| \| \| \|	instruction when available. llvm-svn: 286057
*	[AVX-512] Lower SSE/AVX cvtdq2ps intrinsics directly to ISD::SINT_TO_FP so ↵	Craig Topper	2016-11-06	2	-12/+27
\| \| \| \| \| \|	they can use EVEX instructions when available. llvm-svn: 286056
*	[AVX-512] Add -show-mc-encoding to legacy vector intrinsic tests so we can ↵	Craig Topper	2016-11-06	8	-4060/+4018
\| \| \| \| \| \|	see when VEX or EVEX encoded instructions are being emitted. Make sure the tests all have an avx2 command line and an skx command line. llvm-svn: 286055
*	[LoopStrengthReduce] Don't use a DenseSet<int64_t> when we might add any ↵	Justin Lebar	2016-11-05	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	valid int64_t to the set. Summary: SmallSetVector uses DenseSet, but that means we need to reserve some values for the empty and tombstone keys. It seems to me we should have a general way to let us store full-range ints inside of DenseSets, and furthermore that we probably shouldn't silently let you add ints into DenseSets without explicitly promising that they're in range. But that's a battle for another day; for now, just fix this code, since we currently do something Very Bad when compiling ffmpeg. Fixes PR30914. Reviewers: jeremyhu Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D26323 llvm-svn: 286038
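The sentinel-key problem described in this commit can be sketched with a toy open-addressing set. This is a minimal illustration, not LLVM's DenseSet: the class name `SentinelSet` and the two reserved values are hypothetical, chosen only to show why a set that reserves an "empty" and a "tombstone" key cannot hold the full integer range.

```python
# Hypothetical reserved keys for this sketch (not taken from LLVM sources).
EMPTY = 2**63 - 1      # marks a slot that was never used
TOMBSTONE = -2**63     # marks a slot whose key was erased


class SentinelSet:
    """Toy open-addressing integer set with linear probing and two reserved keys."""

    def __init__(self, capacity=8):
        self.slots = [EMPTY] * capacity
        self.used = 0  # number of slots that are not EMPTY (live keys + tombstones)

    def _find(self, key):
        """Return (found, index): the key's slot, or the slot where it would go."""
        n = len(self.slots)
        first_tombstone = None
        i = key % n
        while True:
            cur = self.slots[i]
            if cur == key:
                return True, i
            if cur == TOMBSTONE and first_tombstone is None:
                first_tombstone = i
            if cur == EMPTY:
                return False, i if first_tombstone is None else first_tombstone
            i = (i + 1) % n

    def _grow(self):
        live = [k for k in self.slots if k not in (EMPTY, TOMBSTONE)]
        self.slots = [EMPTY] * (2 * len(self.slots))
        self.used = 0
        for k in live:
            _, i = self._find(k)
            self.slots[i] = k
            self.used += 1

    def insert(self, key):
        # Without this check, inserting EMPTY would "match" any unused slot
        # and silently corrupt the table.
        if key in (EMPTY, TOMBSTONE):
            raise ValueError("key collides with a reserved sentinel")
        if 2 * (self.used + 1) > len(self.slots):
            self._grow()  # keeps at least one EMPTY slot so probing terminates
        found, i = self._find(key)
        if found:
            return False
        if self.slots[i] == EMPTY:
            self.used += 1
        self.slots[i] = key
        return True

    def erase(self, key):
        if key in (EMPTY, TOMBSTONE):
            return False
        found, i = self._find(key)
        if found:
            self.slots[i] = TOMBSTONE
        return found

    def __contains__(self, key):
        return key not in (EMPTY, TOMBSTONE) and self._find(key)[0]
```

A caller that may produce any 64-bit value, as the commit says LoopStrengthReduce could, would hit the `ValueError` path here; a container without reserved keys avoids the problem entirely.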
*	[Hexagon] Account for <def,read-undef> when validating moves for predication	Krzysztof Parzyszek	2016-11-04	1	-0/+41
\| \| \| \|	llvm-svn: 286009
*	[X86] Broadcast from memory instructions aren't unfoldable	Zvi Rackover	2016-11-04	1	-1/+0
\| \| \| \| \| \| \| \|	Broadcast from memory instructions should be treated as moves. They can't be unfolded. Fixes pr30693. llvm-svn: 285998
*	Add bugpoint-reduced reproducer for pr30693	Zvi Rackover	2016-11-04	1	-0/+148
\| \| \| \|	llvm-svn: 285997
*	Revert "AMDGPU: Add VI i16 support"	Tom Stellard	2016-11-04	27	-1081/+263
\| \| \| \| \| \|	This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995
*	[Cortex-M0] Atomic lowering	Weiming Zhao	2016-11-03	1	-1/+38
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: ARMv6-M supports fence instructions such as dmb, but not ldrex/strex. So for some atomic loads/stores, LLVM should emit inline instructions instead of lowering to __sync_ calls. Reviewers: rengolin, efriedma, t.p.northover, jmolloy Subscribers: efriedma, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D26120 llvm-svn: 285969
*	AMDGPU: Add VI i16 support	Tom Stellard	2016-11-03	27	-263/+1081
\| \| \| \| \| \| \| \|	Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939
*	[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.	Alexander Timofeev	2016-11-03	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The change exploits the fact that LDS reads may be reordered even if they access the same location. Prior to the change, the algorithm stopped as soon as any memory access was encountered between loads that are expected to be merged together, although a Read-After-Read conflict cannot affect execution correctness. Improves the performance of hcBLAS CGEMM manually loop-unrolled kernels by 44%. An improvement is also expected on any long sequence of reads from LDS. Differential Revision: https://reviews.llvm.org/D25944 llvm-svn: 285919
*	Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"	James Molloy	2016-11-03	7	-93/+21
\| \| \| \| \| \|	This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 . llvm-svn: 285912
*	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently	James Molloy	2016-11-03	7	-21/+93
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk. For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. llvm-svn: 285893
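The four cases in this commit message can be sketched as a small classifier. This is an illustrative model rather than the ISel code: the function names are hypothetical, the order of the checks simply follows the numbering in the message, and the shift amounts are derived from the description (shift out the bits above and/or below the mask).

```python
M32 = 0xFFFFFFFF


def classify_mask(bm):
    """Return (case, lsl_amount, lsr_amount) for a 32-bit mask, or None.

    None means the mask is zero or its set bits are not one contiguous run,
    so none of the four cases from the commit message applies.
    """
    bm &= M32
    if bm == 0:
        return None
    clz = 32 - bm.bit_length()            # leading zero bits
    ctz = (bm & -bm).bit_length() - 1     # trailing zero bits
    popcount = bin(bm).count("1")
    if 32 - clz - ctz != popcount:        # contiguity test from the message
        return None
    if ctz == 0:
        return (1, clz, 0)                # touches LSB: one LSLS
    if clz == 0:
        return (2, 0, ctz)                # touches MSB: one LSRS
    if popcount == 1:
        return (3, clz, 0)                # single bit: LSLS into the sign bit
    return (4, clz, clz + ctz)            # otherwise: LSLS then LSRS


def and_is_zero(x, bm):
    """Emulate the shift sequence and report whether (x & bm) == 0."""
    plan = classify_mask(bm)
    if plan is None:
        raise ValueError("mask is not a single contiguous run of set bits")
    case, lsl, lsr = plan
    shifted = ((x & M32) << lsl) & M32
    if case == 3:
        return (shifted & 0x80000000) == 0   # PL condition: sign bit clear
    return (shifted >> lsr) == 0             # EQ condition: result is zero
```

For example, `classify_mask(0x00000FF0)` returns `(4, 20, 24)`: shifting left by 20 discards the bits above the mask and shifting right by 24 discards the bits below it.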
*	[AVX-512] Use 'vnot' instead of 'not' in patterns involving vXi1 vectors.	Craig Topper	2016-11-03	2	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \|	This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR. Reviewers: delena, igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26134 llvm-svn: 285878
*	Expandload and Compressstore intrinsics	Elena Demikhovsky	2016-11-03	1	-0/+247
\| \| \| \| \| \| \| \|	2 new intrinsics covering AVX-512 compress/expand functionality. This implementation includes syntax, DAG builder, operation lowering and tests. Does not include: handling of illegal data types, codegen prepare pass and the cost model. llvm-svn: 285876
*	[Hexagon] Remove registers coalesced in expand-condsets from live intervals	Krzysztof Parzyszek	2016-11-02	1	-0/+49
\| \| \| \|	llvm-svn: 285846
*	AMDGPU: Cleanup some xfailed tests	Matt Arsenault	2016-11-02	3	-52/+10
\| \| \| \| \| \|	Some of these are already fixed or tested somewhere else. llvm-svn: 285840
*	AMDGPU: Allow additional implicit operands on MOVRELS instructions	Nicolai Haehnle	2016-11-02	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The post-RA scheduler occasionally uses additional implicit operands when the vector implicit operand as a whole is killed, but some subregisters are still live because they are directly referenced later. Unfortunately, this seems incredibly subtle to reproduce. Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test and others. Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25656 llvm-svn: 285835
*	BranchRelaxation: Fix computing indirect branch block size	Matt Arsenault	2016-11-02	1	-0/+54
\| \| \| \|	llvm-svn: 285828
*	AMDGPU: Use brev for materializing SGPR constants	Matt Arsenault	2016-11-01	6	-11/+73
\| \| \| \| \| \|	This is already done with VGPR immediates and saves 4 bytes. llvm-svn: 285765
*	AMDGPU: Default to using scalar mov to materialize immediate	Matt Arsenault	2016-11-01	3	-12/+47
\| \| \| \| \| \| \| \| \| \| \| \|	This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is inline asm with an SGPR constraint. Also start verifying the register classes of inlineasm operands. llvm-svn: 285762
*	[AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32	Konstantin Zhuravlyov	2016-11-01	2	-87/+81
\| \| \| \| \| \| \| \| \| \| \|	This will prevent the following regressions when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll, test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716
*	AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64	Tom Stellard	2016-11-01	2	-19/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I wanted to implement this as a target independent expansion, however when targets say they want to expand FP_TO_FP16 what they actually want is the unsafe math expansion when possible and expansion to a libcall in all other cases. The only way to make this work as a target independent expansion would be to add logic to each target's TargetLowering construction to mark these nodes as Expand when LegalizeDAG can use the unsafe expansion and mark them as LibCall when it cannot. I think this would be possible, but I think it would be too fragile and complex as it would require targets to keep their expansion logic up to date with the code in LegalizeDAG. Reviewers: bogner, ab, t.p.northover, arsenm Subscribers: wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25999 llvm-svn: 285704
*	This is a 1 character fix for an ARM build attribute test (r284571): the	Sjoerd Meijer	2016-11-01	1	-1/+1
\| \| \| \| \| \| \|	purpose of the test was to have 2 different function attribute sets, but due to a typo both were numbered #0, so there was only one. llvm-svn: 285701
*	[Sparc][LEON] Test for FixFDIVSQRT erratum fix.	Chris Dewhurst	2016-11-01	1	-0/+59
\| \| \| \| \| \| \| \|	Note: Test is per differential review, but the other changed code in the review was for an optimisation that didn't quite work. Nevertheless, the test is valid for the unoptimised version of the fix. Differential Review: https://reviews.llvm.org/D24658 llvm-svn: 285692
*	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables	James Molloy	2016-11-01	6	-23/+129
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment] The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a loss for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2; adr r1, .LJTI0_0; ldr r0, [r0, r1]; mov pc, r0. After: add r0, pc; ldrb r0, [r0, #6]; lsls r0, r0, #1; add pc, r0. => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2; adr r1, .LJTI0_0; ldr r0, [r0, r1]; mov pc, r0. After: lsls r4, r4, #1; add r4, pc; ldrh r4, [r4, #6]; lsls r4, r4, #1; add pc, r4. => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) llvm-svn: 285690
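The "factor of 4" and "factor of 2" figures follow from the entry width: 4-byte addresses become 1-byte (TBB) or 2-byte (TBH) entries. A rough sketch of that arithmetic is below. It assumes each compressed entry stores a forward byte offset divided by two, which matches the `lsls ..., #1` after the load in the sequences above; the function names are hypothetical and this is not the pass's actual selection logic.

```python
def entry_width(offsets):
    """Smallest jump-table entry size in bytes (1, 2 or 4) for these targets.

    `offsets` are byte distances from the table's base to each target block.
    Assumption for this sketch: a compressed entry holds offset // 2 as an
    unsigned value, so offsets must be non-negative and even to qualify.
    """
    if any(off < 0 or off % 2 for off in offsets):
        return 4                      # fall back to full 4-byte addresses
    largest = max(offsets, default=0) // 2
    if largest <= 0xFF:
        return 1                      # TBB-style byte entries
    if largest <= 0xFFFF:
        return 2                      # TBH-style halfword entries
    return 4


def table_bytes(offsets):
    """Total size of the jump table in bytes for the chosen entry width."""
    return len(offsets) * entry_width(offsets)
```

With three targets at offsets 0, 10 and 510, the table takes 3 bytes instead of 12; once a target lies 512 or more bytes away, halfword entries are needed and the saving drops to a factor of 2.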
*	[AMDGPU] Expand vector mulhu/mulhs	Valery Pykhtin	2016-11-01	2	-0/+26
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D26077 llvm-svn: 285684
*	[PowerPC] Implement vector shift builtins - llvm portion	Nemanja Ivanovic	2016-11-01	1	-0/+23
\| \| \| \| \| \| \|	This patch corresponds to review https://reviews.llvm.org/D26095. Committing on behalf of Tony Jiang. llvm-svn: 285681
*	[DAG] disable nsw/nuw for add/sub/mul when simplifying based on demanded ↵	Sanjay Patel	2016-10-31	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bits (PR30841) This bug was exposed by using nsw/nuw for more aggressive folds in: https://reviews.llvm.org/rL284844 The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(), but we can't just flip flag bits in the DAG; we have to create a new node that has the bits cleared. This should fix: https://llvm.org/bugs/show_bug.cgi?id=30841 llvm-svn: 285656
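A small 8-bit illustration of why the flags have to be cleared, assuming a hypothetical fold in which only the low four bits of an add are demanded and a masking operation on one operand is dropped. The values and helper name are made up for this sketch; it models the arithmetic only, not the DAG code.

```python
BITS = 8
MASK = (1 << BITS) - 1


def signed_add_overflows(a, b):
    """True if a + b overflows when both are read as signed 8-bit integers."""
    def to_signed(v):
        v &= MASK
        return v - (1 << BITS) if v >> (BITS - 1) else v

    total = to_signed(a) + to_signed(b)
    return not (-(1 << (BITS - 1)) <= total < (1 << (BITS - 1)))


v, b = 0x7F, 0x70
demanded = 0x0F                      # only the low four bits of the sum are used

# Original expression: (v & 0x0F) + b, which never overflows for these values.
original = ((v & 0x0F) + b) & MASK   # 0x0F + 0x70 = 0x7F

# Simplified expression: the mask on v is dropped because the high bits of
# the sum are not demanded.
simplified = (v + b) & MASK          # 0x7F + 0x70 = 0xEF

same_low_bits = (original & demanded) == (simplified & demanded)
overflow_before = signed_add_overflows(v & 0x0F, b)
overflow_after = signed_add_overflows(v, b)
```

Here `same_low_bits` is true, so the fold is valid for the demanded bits, but `overflow_before` is false while `overflow_after` is true. A no-signed-wrap flag that was correct on the original add would be wrong on the simplified one, which is why the new node must be created without it.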
*	CodeGen: further loosen -O0 CG for WoA division	Saleem Abdulrasool	2016-10-31	3	-26/+29
\| \| \| \| \| \| \| \| \| \| \| \|	Generate the slowest possible codepath for noopt CodeGen. Even trying to be clever with the negated jump can cause out-of-range jumps. Use a wide branch instead. Although the code is modelled simplistically, the later optimizations would recombine the branching into `cbz` if possible. This re-enables the previous optimization as well as hopefully gives us working code in all cases. Addresses PR30356! llvm-svn: 285649
*	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.	Justin Lebar	2016-10-31	4	-21/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This has been replaced by the NVPTXInferAddressSpaces pass. We've had the new one as the default with the old one accessible via a flag for some months now, and we've had no problems. Reviewers: tra Subscribers: llvm-commits, jholewinski, jingyue, mgorny Differential Revision: https://reviews.llvm.org/D26165 llvm-svn: 285642
*	[PPC] add absolute difference altivec instructions and matching intrinsics	Nemanja Ivanovic	2016-10-31	1	-0/+40
\| \| \| \| \| \| \|	This patch corresponds to review https://reviews.llvm.org/D26072. Committing on behalf of Sean Fertile. llvm-svn: 285627
*	GlobalISel: allow truncating pointer casts on AArch64.	Tim Northover	2016-10-31	1	-2/+19
\| \| \| \|	llvm-svn: 285615
*	GlobalISel: translate stack protector intrinsics	Tim Northover	2016-10-31	1	-0/+20
\| \| \| \|	llvm-svn: 285614
*	[Hexagon] Don't expand mux instructions with both sources identical	Krzysztof Parzyszek	2016-10-31	1	-0/+32
\| \| \| \|	llvm-svn: 285588
*	Add triple to test so it does not fail on windows.	Manuel Klimek	2016-10-31	1	-1/+1
\| \| \| \|	llvm-svn: 285560
*	Delete .s file that did not test anything, and check in test that works.	Manuel Klimek	2016-10-31	2	-20/+27
\| \| \| \| \| \| \|	In D26098, Davide Italiano submitted a .s file instead of the .ll file that was the last stage of the review. llvm-svn: 285559