bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[x86] add RUN for target before roundss; NFC	Sanjay Patel	2018-03-27	1	-30/+232
	llvm-svn: 328601
*	[x86] add tests for ftrunc; NFC	Sanjay Patel	2018-03-26	2	-12/+404
	llvm-svn: 328592
*	Fix newlines. NFCI.	Simon Pilgrim	2018-03-26	4	-18305/+18305
	llvm-svn: 328583
*	[X86] Add WriteCRC32 scheduler class	Simon Pilgrim	2018-03-26	1	-6/+6
	Currently CRC32 instructions use the WriteFAdd class; this patch splits them off into their own class. At the moment it is still mostly just a duplicate of WriteFAdd, but it can now be tweaked on a target-by-target basis. Differential Revision: https://reviews.llvm.org/D44647 llvm-svn: 328582
*	Use local symbols for creating .stack-size.	Rafael Espindola	2018-03-26	3	-7/+14
	llvm-svn: 328581
*	[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32	Reid Kleckner	2018-03-26	8	-2/+143
	Summary: Re-lands r328386 and r328443, reverting r328482. Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small parameters in i8 and i16 do not end up in the SysV register parameters (EDI, ESI, etc.). I added tests for how we receive small parameters, since that is the important part. It's always safe to store more bytes than will be read, but the assumptions you make when loading them are what really matter. I also tested this by self-hosting clang and it passed tests on win64. Reviewers: mstorsjo, hans Subscribers: hiraditya, mstorsjo, llvm-commits Differential Revision: https://reviews.llvm.org/D44900 llvm-svn: 328570
*	[X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes ↵	Simon Pilgrim	2018-03-26	4	-18305/+18305
	(PR36881) Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar). Differential Revision: https://reviews.llvm.org/D44879 llvm-svn: 328566
*	[Hexagon] Add more lit tests	Krzysztof Parzyszek	2018-03-26	12	-0/+1367
	llvm-svn: 328561
*	[Power9]Legalize and emit code for quad-precision convert from double-precision	Lei Huang	2018-03-26	1	-1/+29
	Legalize and emit code for quad-precision floating point operation xscvdpqp and add option to guard the quad precision operation support. Differential Revision: https://reviews.llvm.org/D44746 llvm-svn: 328558
*	[PowerPC] Infrastructure work. Implement getting the opcode for a spill in ↵	Stefan Pintilie	2018-03-26	1	-8/+8
	one place. A new function getOpcodeForSpill should now be the only place to get the opcode for a given spilled register. Differential Revision: https://reviews.llvm.org/D43086 llvm-svn: 328556
*	[X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs	Simon Pilgrim	2018-03-26	2	-16/+16
	We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR. llvm-svn: 328551
*	[Pipeliner] Add missing loop carried dependences	Krzysztof Parzyszek	2018-03-26	1	-0/+54
	The pipeliner is not adding a dependence edge for a loop carried dependence, and ends up scheduling a load from iteration n prior to an aliased store in iteration n-1. The code that adds the loop carried dependences in the pipeliner doesn't check if the memory objects for loads and stores are "identified" (i.e., distinct) objects. If they are not, then the code that adds the dependences needs to be conservative. The objects can be used to check dependences only when they are distinct objects. The code that checks for loop carried dependences has been updated to classify loads and stores that are not identified as "unknown" values. A store with an "unknown" value can potentially create a loop carried dependence with any pending load. Patch by Brendon Cahoon. llvm-svn: 328547
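As a minimal C++17 sketch of the classification described above (the `Access` type and `mayHaveLoopCarriedDep` function are hypothetical, not the pipeliner's code), an access whose underlying object is not identified is modelled as `std::nullopt` and is assumed to conflict with anything:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical summary of a memory access in the loop body: the underlying
// object if it is "identified" (provably distinct from other objects), or
// std::nullopt when the object is unknown.
struct Access {
  std::optional<std::string> Object;
};

// A store may feed a load in a later iteration unless both accesses refer to
// identified objects that are different. Unknown objects are conservative.
bool mayHaveLoopCarriedDep(const Access &Store, const Access &Load) {
  if (!Store.Object || !Load.Object)
    return true; // "unknown" value: may alias any pending load
  return *Store.Object == *Load.Object;
}
```

Two identified, different objects never get a loop-carried edge in this model; an unknown object on either side always does.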
*	[Pipeliner] Use latency to compute RecMII	Krzysztof Parzyszek	2018-03-26	2	-2/+5
	The patch contains several changes needed to pipeline an example that was transformed so that a Phi with a subreg is converted to copies. The pipeliner wasn't working for a couple of reasons. - The RecMII was 3 instead of 2 due to the extra copies. - Copy instructions contained a latency of 1. - The node order algorithm was not choosing the best "bottom" node, which caused an instruction to be scheduled that had a predecessor and successor already scheduled. - Updated the Hexagon Machine Scheduler to check if the node is latency bound when adding the cost for a 0-latency dependence. The RecMII was 3 because the computation looks at the number of nodes in the recurrence. The extra copy is an extra node but it shouldn't increase the latency. The new RecMII computation looks at the latency of the instructions in the recurrence. We changed the latency of the dependence of a copy to 0. The latency computation for the copy also checks the use of the copy (similar to a reg_sequence). The node order algorithm was not choosing the last instruction in the recurrence for a bottom-up traversal. This happened when the last instruction was a copy. When choosing the instruction, a check on NodeNum was added to break the tie if the maxASAP is the same. This means that the scheduler will not end up with another node in the recurrence that has both a predecessor and successor already scheduled. The cost computation in Hexagon Machine Scheduler adds cost when an instruction can be packetized with a zero-latency instruction. We should only do this if the schedule is latency bound. Patch by Brendon Cahoon. llvm-svn: 328542
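The difference between the two RecMII computations can be sketched in plain C++. This is a simplified model under stated assumptions (the `RecEdge` type and both functions are hypothetical, not MachinePipeliner's data structures): a recurrence is a cycle of edges, each carrying its source instruction's latency and an iteration distance, and RecMII is the ceiling of the cycle's cost divided by its total distance.

```cpp
#include <cassert>
#include <vector>

// One edge of a recurrence (a cycle in the dependence graph). Latency is the
// latency of the edge's source instruction; Distance is the number of
// iterations the edge spans (0 inside an iteration, 1 for a loop-carried edge).
struct RecEdge {
  unsigned Latency;
  unsigned Distance;
};

// Sum of distances around the cycle; a real recurrence always contains at
// least one loop-carried edge, so this is nonzero.
static unsigned totalDistance(const std::vector<RecEdge> &Cycle) {
  unsigned Dist = 0;
  for (const RecEdge &E : Cycle)
    Dist += E.Distance;
  assert(Dist > 0 && "a recurrence needs a loop-carried edge");
  return Dist;
}

// Node-count estimate: every node costs one cycle, so copies inflate it.
unsigned recMIIByNodeCount(const std::vector<RecEdge> &Cycle) {
  unsigned N = static_cast<unsigned>(Cycle.size());
  unsigned Dist = totalDistance(Cycle);
  return (N + Dist - 1) / Dist; // ceil(N / Dist)
}

// Latency-based estimate: a zero-latency copy adds nothing to the sum.
unsigned recMIIByLatency(const std::vector<RecEdge> &Cycle) {
  unsigned Lat = 0;
  for (const RecEdge &E : Cycle)
    Lat += E.Latency;
  unsigned Dist = totalDistance(Cycle);
  return (Lat + Dist - 1) / Dist; // ceil(Lat / Dist)
}
```

For a cycle of two one-cycle instructions plus a zero-latency copy on the back edge, `{{1, 0}, {1, 0}, {0, 1}}`, the node-count estimate gives 3 while the latency-based one gives 2, matching the "3 instead of 2" symptom in the message.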
*	[X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs	Simon Pilgrim	2018-03-26	1	-8/+8
	llvm-svn: 328541
*	[Pipeliner] Fix assert caused by pipeliner serialization	Krzysztof Parzyszek	2018-03-26	1	-0/+29
	The pipeliner is asserting because the serialization step that occurs at the end is deleting an instruction. The assert occurs later on because there is a use without a definition. The problem occurs when an instruction defines a value used by a REG_SEQUENCE and that value is used by a COPY instruction. The latencies between these instructions are zero, so they are put into the same packet. The serialization code is unable to handle this correctly, and ends up putting the REG_SEQUENCE before its definition. There is special code in the serialization step that attempts to handle zero-cost instructions (phis, copy, reg_sequence) differently than regular instructions. Unfortunately, this means the order does not come out correctly. This patch simplifies the code by merging the separate steps for handling zero-cost and regular instructions. Only phis are handled separately now, since they should occur first. Then, this patch adds checks to make sure the MoveUse is set to the smallest value if there are multiple uses in a cycle. Patch by Brendon Cahoon. llvm-svn: 328540
*	[Pipeliner] Fix check for order dependences when finalizing instructions	Krzysztof Parzyszek	2018-03-26	1	-0/+51
	The code in orderDepdences that looks at the order dependences between instructions was processing all the successor and predecessor order dependences. However, we really only want to check for an order dependence for instructions scheduled in the same cycle. Also, fixed how the pipeliner handles output dependences. An output dependence is also a potential loop carried dependence. The pipeliner didn't handle this case properly so an invalid schedule could be created that allowed an output dependence to be scheduled in the next iteration at the same cycle. Patch by Brendon Cahoon. llvm-svn: 328516
*	[Pipeliner] Fix in the pipeliner phi reuse code	Krzysztof Parzyszek	2018-03-26	1	-0/+44
	When the definition of a phi is used by a phi in the next iteration, the pipeliner was assuming that the definition is processed first. Because of the assumption, an incorrect phi name was used. This patch has a check to see if the phi definition has been processed already. Patch by Brendon Cahoon. llvm-svn: 328510
*	[Pipeliner] Correctly update memoperands in the epilog	Krzysztof Parzyszek	2018-03-26	1	-0/+102
	The pipeliner needs to be conservative when updating the memoperands of instructions in the epilog. Previously, the pipeliner was changing the offset of the memoperand based upon the scheduling stage. However, that is incorrect when control flow branches around the kernel code. The bug enabled a load and store to the same stack offset to be swapped. This patch fixes the bug by updating the size of the memoperands to be UINT_MAX. This conservative value means that dependences will be created between other loads and stores. Patch by Brendon Cahoon. llvm-svn: 328508
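Why an "unknown" size is conservative can be seen with a simple interval-overlap check. In this sketch `MemOp`, `mayAlias`, and the sentinel `kUnknownSize` are hypothetical stand-ins (the commit itself speaks of UINT_MAX); offsets are relative to one common base such as a stack slot.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sentinel meaning "extent unknown", standing in for the
// conservative memoperand size described in the commit.
constexpr uint64_t kUnknownSize = std::numeric_limits<uint64_t>::max();

// Simplified memory operand: byte offset from a common base and access size.
struct MemOp {
  int64_t Offset;
  uint64_t Size;
};

// Returns true if the accesses may overlap, i.e. a dependence is required.
bool mayAlias(const MemOp &A, const MemOp &B) {
  if (A.Size == kUnknownSize || B.Size == kUnknownSize)
    return true; // unknown extent: assume overlap with anything
  // Half-open byte ranges [Offset, Offset + Size).
  int64_t AEnd = A.Offset + static_cast<int64_t>(A.Size);
  int64_t BEnd = B.Offset + static_cast<int64_t>(B.Size);
  return A.Offset < BEnd && B.Offset < AEnd;
}
```

If a stage-adjusted offset wrongly moves an epilog load from offset 0 to offset 8, a 4-byte store at offset 0 looks disjoint and the two may be reordered; with the size widened to the sentinel, the check always reports a possible overlap.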
*	[Hexagon] Give priority to post-incrementing memory accesses in LSR	Krzysztof Parzyszek	2018-03-26	4	-43/+47
	llvm-svn: 328506
*	[X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs	Simon Pilgrim	2018-03-26	2	-32/+32
	Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write). This also adds missing vcvttss2si tests. llvm-svn: 328505
*	[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKPS instructions costs	Simon Pilgrim	2018-03-26	1	-12/+12
	These should match the YMM MOVDUP/PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501
*	[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs	Simon Pilgrim	2018-03-26	1	-8/+8
	The xmm sd/pd versions were using the WriteFSQRT default, which is modelled on sqrtss/sqrtps. llvm-svn: 328497
*	[X86][Btver2] Double the AGU and schedule pipe resources for YMM	Simon Pilgrim	2018-03-26	1	-15/+15
	Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491
*	Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead ↵	Hans Wennborg	2018-03-26	7	-76/+2
	of i32" This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up patch at D44876 fixes this, but let's revert back to green for now until that's ready to land. (Also reverts r328443.) > Both GCC and MSVC only look at the low byte of a boolean when it is > passed. llvm-svn: 328482
*	[X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT.	Craig Topper	2018-03-26	2	-36/+36
	llvm-svn: 328474
*	[X86] Merge the SSE and AVX versions of fp divs and sqrts in the ↵	Craig Topper	2018-03-26	3	-62/+62
	SandyBridge/Haswell/Broadwell/Skylake scheduler models. I've used Agner's data as best I could to get the values to converge on. llvm-svn: 328473
*	[X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss ↵	Craig Topper	2018-03-26	2	-8/+8
	instructions. llvm-svn: 328472
*	[X86] Swap the itineraries on the memory and register forms of CVTDQ2PD.	Craig Topper	2018-03-26	1	-3/+4
	They were backwards. llvm-svn: 328469
*	[X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 ↵	Craig Topper	2018-03-25	1	-2/+2
	for Skylake. This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX. llvm-svn: 328465
*	[RISCV] Use init_array instead of ctors for RISCV target, by default	Mandeep Singh Grang	2018-03-24	1	-0/+30
	Summary: LLVM defaults to the newer .init_array/.fini_array scheme for static constructors rather than the less desirable .ctors/.dtors (the UseCtors flag defaults to false). This wasn't being respected in the RISC-V backend because it fails to call TargetLoweringObjectFileELF::InitializeELF with the appropriate flag for UseInitArray. This patch fixes this by implementing RISCVELFTargetObjectFile and overriding its Initialize method to call InitializeELF(TM.Options.UseInitArray). Reviewers: asb, apazos Reviewed By: asb Subscribers: mgorny, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, llvm-commits Differential Revision: https://reviews.llvm.org/D44750 llvm-svn: 328433
*	[X86][AES] Ensure we're testing both non-VEX/VEX variants of AES ↵	Simon Pilgrim	2018-03-24	1	-7/+321
	instructions on AVX targets. Add skylake server tests as well. llvm-svn: 328424
*	[X86][SSE] Ensure we're testing both non-VEX/VEX variants of SSE ↵	Simon Pilgrim	2018-03-24	6	-112/+12529
	instructions on AVX targets. Also ensure we don't use later instruction sets in SSE schedule tests. llvm-svn: 328423
*	[X86][AVX1] Ensure we don't use later instruction sets in AVX1 schedule tests	Simon Pilgrim	2018-03-24	1	-9/+9
	llvm-svn: 328421
*	[X86][AVX2] Ensure we don't use later instruction sets in AVX2 schedule tests	Simon Pilgrim	2018-03-24	1	-9/+13
	llvm-svn: 328420
*	[X86] Add a DAG combine to simplify PMULDQ/PMULUDQ nodes	Craig Topper	2018-03-24	1	-18/+16
	These nodes only use the lower 32 bits of their inputs so we can use SimplifyDemandedBits to simplify them. Differential Revision: https://reviews.llvm.org/D44375 llvm-svn: 328405
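A scalar model of one 64-bit lane shows why only the low 32 bits of each input are demanded: PMULUDQ zero-extends the low halves and PMULDQ sign-extends them, so the upper halves never affect the product. The function names below are illustrative, not LLVM or intrinsic APIs.

```cpp
#include <cassert>
#include <cstdint>

// One lane of PMULUDQ: low 32 bits of each input, zero-extended, 64-bit product.
uint64_t pmuludqLane(uint64_t A, uint64_t B) {
  return static_cast<uint64_t>(static_cast<uint32_t>(A)) *
         static_cast<uint64_t>(static_cast<uint32_t>(B));
}

// One lane of PMULDQ: low 32 bits of each input, sign-extended, 64-bit product.
int64_t pmuldqLane(uint64_t A, uint64_t B) {
  return static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(A))) *
         static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(B)));
}
```

Because the upper halves are ignored, an operation that only changes those bits (for example an AND with `0xFFFFFFFF` feeding the multiply) is redundant, which is the kind of simplification SimplifyDemandedBits can perform.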
*	[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32	Reid Kleckner	2018-03-23	7	-2/+76
	Both GCC and MSVC only look at the low byte of a boolean when it is passed. llvm-svn: 328386
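The practical consequence of "only look at the low byte" is that a callee must not assume the upper bits of a boolean argument register are zero. This sketch models a 32-bit argument register as a `uint32_t` whose upper 24 bits may hold leftovers when the caller only wrote the low byte; both functions are hypothetical illustrations, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Safe reading: only the low 8 bits carry the boolean (i8 convention).
bool readBoolLowByte(uint32_t Reg) { return static_cast<uint8_t>(Reg) != 0; }

// Unsafe reading: assumes the caller zero-extended the bool to 32 bits
// (i32 zeroext), so leftover upper bits turn "false" into "true".
bool readBoolAssumingI32ZeroExt(uint32_t Reg) { return Reg != 0; }
```

With `Reg = 0xDEADBE00` (a false value in the low byte over stale upper bits), the low-byte reader returns false while the i32-zeroext reader returns true. Storing extra bytes on the caller side is harmless; only the callee's loading assumption matters.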
*	[Hexagon] Boost profit for word-mask immediates, reduce for others	Krzysztof Parzyszek	2018-03-23	2	-0/+145
	This avoids unnecessary splitting due to uninteresting immediates. llvm-svn: 328364
*	[Hexagon] Fold offset in base+immediate loads/stores	Krzysztof Parzyszek	2018-03-23	1	-0/+55
	Optimize Ry = add(Rx,#n); memw(Ry+#0) = Rz => memw(Rx,#n) = Rz. Patch by Jyotsna Verma. llvm-svn: 328355
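The fold can be sketched as a tiny peephole over a toy instruction list. Everything here is a hypothetical model (the `Inst` struct, `readLater`, and `foldAddIntoStore` are not Hexagon backend code), and it omits the check a real pass would need that `#n` fits the store's immediate-offset field.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal instruction model covering only the two shapes in the message:
//   AddImm: Dst = add(Base, #Imm)
//   StoreW: memw(Base + #Imm) = Val
struct Inst {
  enum Kind { AddImm, StoreW } K;
  std::string Dst;  // register defined by AddImm (unused for StoreW)
  std::string Base; // source of AddImm, or address base of StoreW
  int Imm;
  std::string Val;  // register stored by StoreW (unused for AddImm)
};

// True if register R is read by any instruction at index From or later.
static bool readLater(const std::vector<Inst> &Code, std::size_t From,
                      const std::string &R) {
  for (std::size_t I = From; I < Code.size(); ++I)
    if (Code[I].Base == R || Code[I].Val == R)
      return true;
  return false;
}

// Rewrites "Ry = add(Rx,#n); memw(Ry+#0) = Rz" as "memw(Rx,#n) = Rz" when
// Ry has no other readers, then drops the now-dead add.
std::vector<Inst> foldAddIntoStore(std::vector<Inst> Code) {
  std::size_t I = 0;
  while (I + 1 < Code.size()) {
    const Inst &Add = Code[I];
    Inst &St = Code[I + 1];
    bool Fold = Add.K == Inst::AddImm && St.K == Inst::StoreW &&
                St.Base == Add.Dst && St.Imm == 0 && St.Val != Add.Dst &&
                !readLater(Code, I + 2, Add.Dst);
    if (Fold) {
      St.Base = Add.Base;
      St.Imm = Add.Imm;
      Code.erase(Code.begin() + static_cast<std::ptrdiff_t>(I));
      continue; // re-examine the instruction that shifted into index I
    }
    ++I;
  }
  return Code;
}
```

If `Ry` is still read after the store, the add is kept, since removing it would change the value seen by that later reader.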
*	[AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU	Tony Tye	2018-03-23	2	-7/+7
	Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue. Differential Revision: https://reviews.llvm.org/D44697 llvm-svn: 328351
*	[AMDGPU] Remove use of OpenCL triple environment and replace with function ↵	Tony Tye	2018-03-23	2	-20/+137
	attribute for AMDGPU - Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target. - Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS. Differential Revision: https://reviews.llvm.org/D43736 llvm-svn: 328349
*	[Hexagon] Always generate mux out of predicated transfers if possible	Krzysztof Parzyszek	2018-03-23	6	-3/+64
	HexagonGenMux would collapse pairs of predicated transfers if it assumed that the predicated .new forms cannot be created. It turns out that generating mux is preferable in almost all cases. Introduce an option -hexagon-gen-mux-threshold that controls the minimum distance between the instruction defining the predicate and the later of the two transfers. If the distance is smaller than the threshold, mux will not be generated. Set the threshold to 0 by default. llvm-svn: 328346
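The threshold rule reduces to a single comparison. In this sketch, positions are instruction indices within a block and `shouldGenerateMux` is a hypothetical helper, not HexagonGenMux's code; it assumes the predicate definition precedes both transfers.

```cpp
#include <algorithm>
#include <cassert>

// A mux is generated only when the later of the two predicated transfers is
// at least Threshold instructions after the predicate-defining instruction.
bool shouldGenerateMux(unsigned PredDef, unsigned Transfer1, unsigned Transfer2,
                       unsigned Threshold) {
  unsigned Later = std::max(Transfer1, Transfer2);
  unsigned Distance = Later - PredDef; // assumes PredDef < both transfers
  return Distance >= Threshold;
}
```

With the default threshold of 0 the comparison always succeeds, so a mux is always generated when the transfers can be combined.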
*	[Hexagon] Avoid early if-conversion for one sided branches	Krzysztof Parzyszek	2018-03-23	1	-0/+80
	Patch by Anand Kodnani. llvm-svn: 328344
*	[ARM] Fix "Constant pool entry out of range!" in Thumb1 mode	Ana Pazos	2018-03-23	1	-0/+359
	This patch fixes PR36658, "Constant pool entry out of range!" in Thumb1 mode. In ARMConstantIslands::optimizeThumb2JumpTables() in Thumb1 mode, adjustBBOffsetsAfter() is not calculating postOffset correctly because it does not account for the padding that is required for the constant pool that immediately follows the jump table branch instruction. Reviewers: t.p.northover, eli.friedman Reviewed By: t.p.northover Subscribers: chrib, tstellar, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D44709 llvm-svn: 328341
*	[Hexagon] Two fixes in early if-conversion	Krzysztof Parzyszek	2018-03-23	1	-0/+75
	- Fix checking for vector predicate registers. - Avoid speculating llvm.lifetime.end intrinsic. Patch by Harsha Jagasia and Brendon Cahoon. llvm-svn: 328339
*	[X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit	Simon Pilgrim	2018-03-23	1	-1/+1
	Add missing non-VEX and (V)PMOVMSKB instructions to the pattern. llvm-svn: 328338
*	Re-commit: [MachineLICM] Add functions to MachineLICM to hoist invariant stores	Zaara Syeda	2018-03-23	3	-4/+129
	This patch adds functions to allow MachineLICM to hoist invariant stores. Currently, MachineLICM does not hoist any store instructions; however, when storing the same value to a constant spot on the stack, the store instruction should be considered invariant and be hoisted. The function isInvariantStore iterates over each operand of the store instruction and checks that each register operand satisfies isCallerPreservedPhysReg. The store may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore. This patch also adds the PowerPC changes needed to consider the stack register as caller preserved. Differential Revision: https://reviews.llvm.org/D40196 llvm-svn: 328326
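The operand check performed by isInvariantStore can be sketched outside LLVM. Here `StoreOperand`, `isInvariantStoreSketch`, and the `CallerPreserved` set (standing in for the target's isCallerPreservedPhysReg query) are hypothetical, and the PowerPC register names are only illustrative: X1 as the stack pointer and X2 as a value register assumed caller-preserved.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified store operand: either a register or a non-register operand
// such as an immediate offset.
struct StoreOperand {
  bool IsReg;
  std::string Reg; // meaningful only when IsReg is true
};

// The store is treated as loop-invariant only if every register operand
// (stored value and address base) is a caller-preserved physical register.
bool isInvariantStoreSketch(const std::vector<StoreOperand> &Ops,
                            const std::set<std::string> &CallerPreserved) {
  for (const StoreOperand &Op : Ops)
    if (Op.IsReg && CallerPreserved.count(Op.Reg) == 0)
      return false;
  return true;
}
```

A store of X2 to a fixed offset from X1 passes when both registers are in the set; any operand in an ordinary, loop-varying register makes the store non-invariant.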
*	[AArch64] Don't reduce the width of loads if it prevents combining a shift	John Brawn	2018-03-23	2	-2/+307
	Loads and stores can only shift the offset register by the size of the value being loaded, but currently the DAGCombiner will reduce the width of the load if it's followed by a trunc, making it impossible to later combine the shift. Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and making it prevent the width reduction if this is what would happen, though still allowing it if reducing the load width will let us eliminate a later sign or zero extend. Differential Revision: https://reviews.llvm.org/D44794 llvm-svn: 328321
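The decision described above can be modelled with two small predicates. This sketch is not the AArch64 shouldReduceLoadWidth implementation; `shiftFoldsIntoLoad` and `allowNarrowing` are hypothetical helpers, and `ShiftAmt` is assumed to be a small nonzero shift. A register-offset load such as `ldr x0, [x1, x2, lsl #3]` can absorb the shift only when `1 << ShiftAmt` equals the access size in bytes.

```cpp
#include <cassert>

// A nonzero index shift folds into the load only if it scales by the access size.
bool shiftFoldsIntoLoad(unsigned LoadBytes, unsigned ShiftAmt) {
  return (1u << ShiftAmt) == LoadBytes;
}

// Refuse to narrow a load if narrowing would break an otherwise foldable
// shift, unless narrowing lets a later sign/zero extension be removed.
bool allowNarrowing(unsigned OrigBytes, unsigned NewBytes, unsigned ShiftAmt,
                    bool EliminatesExtend) {
  if (EliminatesExtend)
    return true;
  bool FoldedBefore = shiftFoldsIntoLoad(OrigBytes, ShiftAmt);
  bool FoldedAfter = shiftFoldsIntoLoad(NewBytes, ShiftAmt);
  return !(FoldedBefore && !FoldedAfter);
}
```

Narrowing an 8-byte load indexed with `lsl #3` to 4 bytes is rejected, because `lsl #3` no longer matches the access size.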
*	[X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to ↵	Simon Pilgrim	2018-03-23	1	-4/+4
	correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units. Fixes throughput to match Agner/Fam16h-SoG as well. llvm-svn: 328318
*	[ARM] Support float literals under XO	Christof Douma	2018-03-23	1	-0/+118
	When targeting execute-only and fp-armv8, float constants in a compare resulted in instruction selection failures. This is now fixed by using vmov.f32 where possible; otherwise the floating point constant is lowered into an integer constant that is moved into a floating point register. This patch also restores using fpcmp with immediate 0 under fp-armv8. Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443 llvm-svn: 328313
*	[GlobalISel] Fix legalizer combine to not use illegal input G_EXTRACT.	Amara Emerson	2018-03-23	1	-3/+3
	This was being masked because GISel is enabled by default for -O0 and the abort was disabled. Modified test to explicitly enable abort. llvm-svn: 328311