llvm-svn: 288819
Test the sequential effect of each op
llvm-svn: 288815
broadcasting.
Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern.
For example:
"build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>"
Differential Revision: https://reviews.llvm.org/D26802
llvm-svn: 288804
SMAX/SMIN/UMAX/UMIN
llvm-svn: 288801
This is the final patch in the series of patches that improves
BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations
to remove redundant instructions. It also adds a large test case which
encompasses a large set of code patterns that build vectors - this test
case was the motivator for this series of patches.
Differential Revision: https://reviews.llvm.org/D26066
llvm-svn: 288800
Summary: This patch makes sure FirstCSPop and MBBI never point to DBG_VALUE instructions, which affected the code generated.
Reviewers: mkuper, aprantl, MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27343
llvm-svn: 288794
This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended, since that's what the operation asked for. In particular, when the upper bits are 0, we need to calculate the sqrt/rcp/rsqrt of those zeroes and keep the result in the upper bits. This implies we should still be using the packed instruction.
The only test case for this pattern is one I just added so there was no coverage of this.
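For illustration only (hypothetical user code, not from the commit), the kind of operation the pattern mishandled looks like this:
  #include <immintrin.h>

  // A packed sqrt of a loaded vector must compute the square root of all
  // four lanes; selecting the scalar sqrtss form here would leave lanes
  // 1-3 of the result wrong rather than the square roots of p[1..3].
  __m128 sqrt_all_lanes(const float *p) {
    return _mm_sqrt_ps(_mm_load_ps(p));  // p must be 16-byte aligned
  }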
llvm-svn: 288784
(scalar_to_vector loadf64) uses a scalar sqrt instruction.
This occurs due to a pattern that uses sse_load_f32/f64 with vector sqrt/rcp/rsqrt operations and turns them into scalar instructions. Perhaps for the case where the upper bits come from undef this is ok. I believe a (vzmovl load64) would do the same thing, but those seem to become vzload instead and selectScalarSSELoad doesn't handle that today. In that case we should be performing the vector operation on the zeros in the upper bits, which is not equivalent to using a scalar instruction.
I will remove this pattern in a follow-up patch. There appears to be no other test content for it.
llvm-svn: 288783
llvm-svn: 288782
sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic.
The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to give to the instruction, since the memory form of the instruction only reads 32 or 64 bits.
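A minimal sketch of the pass-through requirement, using the C-level _mm_sqrt_ss intrinsic as a stand-in (hypothetical user code, not from the commit):
  #include <immintrin.h>

  // Bits 127:32 of the intrinsic argument must appear unchanged in the
  // result, so the full 128-bit load has to stay in the generated code;
  // folding it into a 32-bit memory-operand sqrtss would drop them.
  __m128 sqrt_low_lane(const float *p) {
    __m128 v = _mm_loadu_ps(p);
    return _mm_sqrt_ss(v);  // result: [sqrt(v[0]), v[1], v[2], v[3]]
  }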
llvm-svn: 288781
load using the load form of the sqrtsd instruction, which violates the intrinsic semantics.
The sqrtsd instruction only loads 64 bits and writes bits 63:0 with the sqrt result. Bits 127:64 are preserved in the destination register. The semantics of the intrinsic indicate that bits 127:64 should come from the intrinsic argument, which in this case is a 128-bit load. So the generated code should have a 128-bit load and use a register form of sqrtsd.
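A comparable sketch for the double-precision case (hypothetical user code; _mm_sqrt_sd used as a stand-in for the one-argument LLVM intrinsic):
  #include <immintrin.h>

  // _mm_sqrt_sd(a, b) returns [sqrt(b[0]), a[1]]; passing the loaded vector
  // as both arguments models the intrinsic described above, so bits 127:64
  // of the result must come from the 128-bit load, not from whatever was
  // previously in the destination register.
  __m128d sqrt_low_lane_d(const double *p) {
    __m128d v = _mm_loadu_pd(p);
    return _mm_sqrt_sd(v, v);  // [sqrt(v[0]), v[1]]
  }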
llvm-svn: 288780
VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined.
The intrinsic takes one argument; the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands: the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic, we should pass the single intrinsic input to both operands.
I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded.
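For reference, the intrinsic-level behavior being matched (hypothetical user code, not part of the commit):
  #include <immintrin.h>

  // _mm_rsqrt_ss takes a single argument: the low lane becomes an
  // approximate reciprocal square root and the upper lanes pass through.
  // The two-operand instruction takes its pass-through bits from the first
  // source, so the same input must be tied to both instruction operands.
  __m128 rsqrt_low_lane(__m128 v) {
    return _mm_rsqrt_ss(v);  // [~rsqrt(v[0]), v[1], v[2], v[3]]
  }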
llvm-svn: 288779
COPY_FROM/TO_REGCLASS and the normal packed instructions instead
Summary:
This patch removes the scalar logical operation alias instructions. We can just use register class copies and the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables, as was done recently.
I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel.
I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.
Reviewers: spatel, delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27401
llvm-svn: 288771
The structured CFG is just an aid to inserting exec mask modification
instructions; once that is done we don't really need it anymore. We also
do not analyze blocks with terminators that modify exec, so this should
only impact true branches.
llvm-svn: 288744
like this
clang -target arm deprecated-asm.s -c
deprecated-asm.s:30:9: warning: use of SP or PC in the list is deprecated
stmia r4!, {r12-r14}
We need an option that can disable it.
Patched by Yin Ma!
Reviewers: joey, echristo, weimingz
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D27219
llvm-svn: 288734
The function used to finish off PHIs by adding the relevant basic blocks can
fail if we're aborting and still don't actually have the needed
MachineBasicBlocks. So avoid trying in that case.
llvm-svn: 288727
When the entry block was empty after arg lowering, we were always placing
constants at the end. This is probably harmless while translating the same
block, but horribly wrong once its terminator has been translated. So switch to
inserting at the beginning.
llvm-svn: 288720
llvm-svn: 288717
llvm-svn: 288713
This makes it more similar to the floating-point constant, and also allows for
larger constants to be translated later. There's no real functional change in
this patch though, just syntax updates.
llvm-svn: 288712
Returning 0 (NoReg) from getOrCreateVReg leads to unexpected situations later
in the translation. It's better to return a valid (if undefined) register and
let the rest of the instruction carry on as planned.
llvm-svn: 288709
llvm-svn: 288706
This changes the scalar non-intrinsic non-AVX roundss/sd instruction
definitions so that they do not read their destination register, allowing
partial dependency breaking.
This fixes PR31143.
Differential Revision: https://reviews.llvm.org/D27323
llvm-svn: 288703
This is an improvement over a long list of unreadable numbers.
A follow up patch will try to match how sc formats these.
llvm-svn: 288697
Structure the definitions a bit more like the other classes.
The main change here is to split EXP with the done bit set
to a separate opcode, so we can set mayLoad = 1 so that it won't
be reordered before the other exp stores, since this has the special
constraint that if the done bit is set then this should be the last
exp in the shader.
Previously all exp instructions were inferred to have unmodeled
side effects.
llvm-svn: 288695
so we can stop using DW_OP_bit_piece with the wrong semantics.
The entire back story can be found here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html
The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's
offset field to mean the offset into the source variable rather than
the offset into the location at the top of the DWARF expression stack. In
order to be able to fix this in a subsequent patch, this patch
introduces a dedicated DW_OP_LLVM_fragment operation with the
semantics that we used to apply to DW_OP_bit_piece, which is what we
actually need while inside of LLVM. This patch is complete with a
bitcode upgrade for expressions using the old format. It does not yet
fix the DWARF backend to use DW_OP_bit_piece correctly.
Implementation note: We discussed several options for implementing
this, including reserving a dedicated field in DIExpression for the
fragment size and offset, but using a custom operator at the end of
the expression works just fine and is more efficient because we then
only pay for it when we need it.
Differential Revision: https://reviews.llvm.org/D27361
rdar://problem/29335809
llvm-svn: 288683
We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask.
Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get
folded into another logic op if the target has those. However, if we can remove a logic
instruction by changing the xor's constant mask value, that should always be a win.
Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special case
currently (although that's marked with a FIXME). So if you run this IR through -instcombine,
you should get the same end result. I'm hoping to add a different backend transform that
will expose this problem though, so I need to solve this first.
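A tiny C++ illustration of the win being described (hypothetical example, not one of the patch's tests):
  #include <cstdint>

  // Only the low 8 bits of the result are demanded, so the all-ones mask of
  // the 'not' can be shrunk: (~x) & 0xff is equivalent here to x ^ 0xff,
  // which removes the 'and' entirely by changing the xor's constant.
  uint8_t low_byte_of_not(uint32_t x) {
    return static_cast<uint8_t>(~x & 0xffu);
  }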
Differential Revision: https://reviews.llvm.org/D27356
llvm-svn: 288676
I noticed this gap in the scalar FP-logic matching with D26712 and rL287171.
Differential Revision: https://reviews.llvm.org/D27385
llvm-svn: 288675
llvm-svn: 288663
llvm-svn: 288641
scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel.
llvm-svn: 288636
Currently the fast isel code emits an AVX1 instruction sequence even with AVX-512. This is different from normal isel. A follow-up commit will fix this.
llvm-svn: 288635
llvm-svn: 288628
llvm-svn: 288627
This seems to be fixed as of r288052.
llvm-svn: 288618
getNode already prevents formation of out of bounds constant
extract_vector_elts. Do the same for insert_vector_elt.
llvm-svn: 288603
VPERMPDZri to the correct table.
llvm-svn: 288591
llvm-svn: 288587
This was accidentally broken in r285515 when we started lowering the intrinsic to an ISD node. Should fix PR31241.
llvm-svn: 288578
VPMADDUSBW due to a bug introduced in r285515.
I believe this is the cause of PR31241.
llvm-svn: 288577
llvm-svn: 288567
Previously this pass was using up to 5% of compile time in some cases,
which is a bit much for what it is doing. The pass featured a full-blown
data-flow analysis which in the default configuration was restricted to a
single block.
This rewrites the pass under the assumption that we only ever work on a
single block. This is done in a single pass maintaining a state machine
per general purpose register to catch LOH patterns.
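A schematic sketch of the single-pass, per-register state machine idea (the types and the pattern tracked here are hypothetical simplifications, not the actual pass):
  #include <vector>

  enum class LOHState { Idle, SawAdrp, SawAdrpAdd };

  struct FakeInst {                       // stand-in for a MachineInstr
    enum Kind { Adrp, AddLow12, LoadStore, Other } kind;
    unsigned reg;                         // register defined or used as base
  };

  // Walk the block once, advancing one small state machine per register
  // when an instruction extends a recognized pattern (e.g. adrp+add+ldr)
  // and resetting it otherwise.
  void collectLOH(const std::vector<FakeInst> &block, unsigned numRegs) {
    std::vector<LOHState> state(numRegs, LOHState::Idle);
    for (const FakeInst &inst : block) {
      LOHState &s = state[inst.reg];
      switch (inst.kind) {
      case FakeInst::Adrp:
        s = LOHState::SawAdrp;
        break;
      case FakeInst::AddLow12:
        s = (s == LOHState::SawAdrp) ? LOHState::SawAdrpAdd : LOHState::Idle;
        break;
      case FakeInst::LoadStore:
        // if s == SawAdrpAdd, an AdrpAddLdr-style LOH could be recorded here
        s = LOHState::Idle;
        break;
      default:
        s = LOHState::Idle;
        break;
      }
    }
  }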
Differential Revision: https://reviews.llvm.org/D27329
llvm-svn: 288561
Summary: Implement custom lowering of SHL_PARTS to enable lowering of left shifts wider than 32 bits.
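The general idea behind an SHL_PARTS expansion, sketched with 32-bit parts (illustrative C++ only; the target-specific lowering in the patch may differ):
  #include <cstdint>

  // Shift the 64-bit value (hi:lo) left by n, 0 <= n < 64, using only
  // 32-bit operations: the high part receives the bits shifted out of the
  // low part, and shifts of 32 or more move the low part into the high part.
  void shl_parts(uint32_t &hi, uint32_t &lo, unsigned n) {
    if (n == 0)
      return;
    if (n < 32) {
      hi = (hi << n) | (lo >> (32 - n));
      lo <<= n;
    } else {
      hi = lo << (n - 32);
      lo = 0;
    }
  }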
Reviewers: eliben, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27232
llvm-svn: 288541
Add assembler support for instructions manipulating the FPC.
Also add codegen support via the GCC compatibility builtins:
__builtin_s390_sfpc
__builtin_s390_efpc
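A minimal usage sketch, assuming the GCC-compatible signatures unsigned int __builtin_s390_efpc(void) and void __builtin_s390_sfpc(unsigned int); the mask value is hypothetical:
  // Read the floating-point control register, clear some bits, write it back.
  void adjust_fpc(void) {
    unsigned int fpc = __builtin_s390_efpc();  // extract current FPC
    fpc &= ~0x00ff0000u;                       // hypothetical mask, illustration only
    __builtin_s390_sfpc(fpc);                  // set new FPC
  }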
llvm-svn: 288525
llvm-svn: 288522
llvm-svn: 288517
llvm-svn: 288515
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.
Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating-point numbers with two bits of mantissa:
x = 1.01
y = 111
x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)
x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)
The example relies on rounding towards zero at least in the second step.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578
Reviewers: RKSimon, tstellarAMD, spatel, arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D26602
llvm-svn: 288506
We're trying to combine to vpunpckhbw not vpunpckhwd
llvm-svn: 288501
constants
llvm-svn: 288499