bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AMDGPU] Increase kernel padding	Stanislav Mekhanoshin	2019-07-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	To support prefetch mode 3 we need to pad current cacheline and fill 3 cachelines after. Current padding is only sufficient for mode 2. Differential Revision: https://reviews.llvm.org/D65236 llvm-svn: 366938
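The padding rule described above (finish the current cache line, then fill three more) can be sketched as simple arithmetic. This is a minimal illustration, not the code from the patch: the 64-byte cache line size and the names `padded_end` and `padding_bytes` are assumptions made here.

```python
CACHELINE_BYTES = 64  # assumption: the commit message does not state the line size


def padded_end(code_size, lines_after=3, cacheline=CACHELINE_BYTES):
    """Offset where padding stops: round up to the next cache line
    boundary, then add `lines_after` whole cache lines."""
    aligned = -(-code_size // cacheline) * cacheline  # ceiling to a multiple
    return aligned + lines_after * cacheline


def padding_bytes(code_size, lines_after=3, cacheline=CACHELINE_BYTES):
    """Number of padding bytes appended after `code_size` bytes of code."""
    return padded_end(code_size, lines_after, cacheline) - code_size
```

Under these assumptions, a kernel that ends exactly on a cache line boundary gets three full lines of padding, while one that ends mid-line also gets the remainder of its current line.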
*	[ARM] Rewrite how VCMP are lowered, using a single node	David Green	2019-07-24	5	-248/+283
\| \| \| \| \| \| \| \| \| \| \| \|	This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, using just two nodes called VCMP and VCMPZ with an extra operand as the condition code. I believe this will make some combines simpler, allowing us to just look at these codes and not the operands. It also helps fill in a missing VCGTUZ MVE selection without adding extra nodes for it. Differential Revision: https://reviews.llvm.org/D65072 llvm-svn: 366934
*	[DAGCombine] matchBinOpReduction - add partial reduction matching	Simon Pilgrim	2019-07-24	1	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector: e.g. <8 x i32> reduction pattern in a <16 x i32> vector: <4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u> <2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> <1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u> matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction. I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think it's worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction. Fixes the x86 partial reduction sum cases in PR33758 and PR42023. Differential Revision: https://reviews.llvm.org/D65047 llvm-svn: 366933
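The shuffle-mask pattern quoted above can be checked with a few lines of code. The following is a toy sketch only, not the logic of matchBinOpReduction itself: `reduced_width` is a hypothetical helper, `None` stands in for an undef lane ("u"), and the masks are assumed to be listed from the widest stage to the narrowest.

```python
def reduced_width(masks, vector_len):
    """Return how many low lanes the shuffle stages reduce, or None if
    the masks do not form a halving reduction pattern.

    Stage k must read lanes half..2*half-1 into lanes 0..half-1, where
    `half` starts at width/2 and is halved at every stage. Lanes above
    `half` are ignored, since they may be undef.
    """
    stages = len(masks)
    if stages == 0:
        return None
    width = 1 << stages  # log2(width) stages reduce `width` lanes
    if width > vector_len:
        return None
    half = width // 2
    for mask in masks:
        if len(mask) != vector_len:
            return None
        if any(mask[i] != half + i for i in range(half)):
            return None
        half //= 2
    return width


# The <8 x i32> reduction inside a <16 x i32> vector from the message above.
partial = [
    [4, 5, 6, 7] + [None] * 12,
    [2, 3] + [None] * 14,
    [1] + [None] * 15,
]
```

Here `reduced_width(partial, 16)` reports a width of 8, which is smaller than the 16-lane vector, so only the lower subvector needs to be extracted.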
*	[ARM] Disable MVE fptosi and friends	David Green	2019-07-24	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	This prevents us from trying to convert an i1 predicate vector to a float, or vice-versa. Better patterns are possible, which will follow in a subsequent commit. For now we just expand them. Differential Revision: https://reviews.llvm.org/D65066 llvm-svn: 366931
*	[AArch64][GlobalISel] Make vector dup optimization look at last elt of ZeroVec	Jessica Paquette	2019-07-24	1	-1/+1
\| \| \| \| \| \| \| \| \|	Fix an off-by-one error which made us not look at the last element of the zero vector. This caused a miscompile in 188.ammp. Differential Revision: https://reviews.llvm.org/D65168 llvm-svn: 366930
*	[ARM] More MVE compare vector splat combines for ANDs	David Green	2019-07-24	1	-0/+12
\| \| \| \| \| \| \| \|	Adds some extra r register compare combines, this time for ANDs. Differential Revision: https://reviews.llvm.org/D65062 llvm-svn: 366928
*	[ARM] MVE compare vector splat combine	David Green	2019-07-24	1	-0/+12
\| \| \| \| \| \| \| \| \|	MVE VCMP instructions can use a general purpose register as the second operand. This adds the combines for it, selecting from a compare of a vdup. Differential Revision: https://reviews.llvm.org/D65061 llvm-svn: 366924
*	[AMDGPU][MC][GFX10] Enabled GFX10 assembly with arbitrary wavesize assumed ↵	Dmitry Preobrazhensky	2019-07-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	by the code Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D65216 llvm-svn: 366921
*	[ARM] Better OR's for MVE compares	David Green	2019-07-24	4	-8/+73
\| \| \| \| \| \| \| \| \| \| \|	This adds a DeMorgan combine for OR's of compares to turn them into AND's, helping prevent them from going into and out of gpr registers. It also fills in the VCLE and VCLT nodes that MVE can select, allowing it to invert more compares. Differential Revision: https://reviews.llvm.org/D65059 llvm-svn: 366920
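The DeMorgan identity behind this combine, or(a, b) == not(and(not a, not b)), can be illustrated on lane-wise integer compares, where "not" of a compare is simply the compare with the inverted condition. This is a minimal model using Python lists as vectors; the helper names are hypothetical, and it covers integer conditions only.

```python
import operator

# Each integer condition paired with its logical inverse.
INVERSE = {
    operator.eq: operator.ne, operator.ne: operator.eq,
    operator.lt: operator.ge, operator.ge: operator.lt,
    operator.gt: operator.le, operator.le: operator.gt,
}


def vcmp(op, lhs, rhs):
    """Lane-wise compare producing a list of booleans (a predicate)."""
    return [op(a, b) for a, b in zip(lhs, rhs)]


def or_of_compares(op1, a, b, op2, c, d):
    """Direct form: OR the two predicates lane by lane."""
    return [x or y for x, y in zip(vcmp(op1, a, b), vcmp(op2, c, d))]


def or_via_demorgan(op1, a, b, op2, c, d):
    """Rewritten form: AND the inverted compares, then invert the result."""
    both_false = [x and y for x, y in
                  zip(vcmp(INVERSE[op1], a, b), vcmp(INVERSE[op2], c, d))]
    return [not x for x in both_false]
```

Both functions yield the same predicate for any integer inputs; the rewritten form is the one that maps onto an AND of compares.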
*	[AMDGPU] Add all vgpr classes to asm parser	Stanislav Mekhanoshin	2019-07-24	1	-1/+5
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D65158 llvm-svn: 366917
*	AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts	Matt Arsenault	2019-07-24	1	-6/+8
\| \| \| \| \| \| \|	The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place. llvm-svn: 366915
*	[ARM] Better AND's for MVE compares	David Green	2019-07-24	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \|	Add a number of folds to convert and(vcmp, vcmp) into a single VPT block, where the second vcmp becomes predicated on the first. The VCMP; VPST; VCMP will eventually be converted to VPT; VCMP in the VPTBlockPass. Differential Revision: https://reviews.llvm.org/D65058 llvm-svn: 366910
*	[ARM] MVE floating point compares and selects	David Green	2019-07-24	2	-1/+53
\| \| \| \| \| \| \| \| \| \| \| \| \|	Much like integers, this adds MVE floating point compares and select. It requires a lot more buildvector/shuffle code because we may need to expand the compares without mve.fp, and requires support for and/or because of the way we lower llvm condition codes. Some original code by David Sherwood Differential Revision: https://reviews.llvm.org/D65054 llvm-svn: 366909
*	[ARM] Basic And/Or/Xor handling for MVE predicates	David Green	2019-07-24	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \|	This adds some basic, "worst case" handling for MVE predicate Or/And/Xor. It does this by going into and out of GPRs, doing the operation on scalars. Code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65053 llvm-svn: 366907
*	[ARM] Make sure that the constant pool does not keep in the middle of an IT ↵	Simi Pallipurath	2019-07-24	1	-3/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	block. This change makes sure that llvm does not emit an invalid IT block by putting the constant pool in the middle of an IT block. We have code to try to avoid putting a constant island in the middle of an IT block, but it only works if we see an IT between the one currently referencing CPE and possible insertion point. If the first instruction we look at is the VLDRD after the IT, we never see the IT and do not realize that the instruction doing the load could be in an IT block itself. Differential Revision: https://reviews.llvm.org/D64621 Change-Id: I24cecb37cded75e8992870bd997f6226853bd920 llvm-svn: 366905
*	Test commit. NFC.	Sjoerd Meijer	2019-07-24	1	-1/+1
\| \| \| \| \| \| \|	Removed 2 trailing whitespaces in 2 files that used to be in different repos to test my new github monorepo workflow. llvm-svn: 366904
*	[ARM] MVE predicate register support	David Green	2019-07-24	3	-13/+360
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support code for building and shuffling i1 predicate registers. It generally uses two basic principles, either converting the predicate into a scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or by converting the register to a full vector register and back. Some of the code here is not super efficient but will hopefully cover most cases of moving i1 vectors around and can be improved in subsequent patches. Some code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65052 llvm-svn: 366890
*	[ARM] MVE integer compares and selects	David Green	2019-07-24	5	-40/+132
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the very basics for MVE vector predication, adding integer VCMP and VSEL instruction support. This is done through predicate registers (MVT::v16i1, MVT::v8i1, MVT::v4i1), but otherwise using the same mechanics as NEON to custom lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc). An extra VCNE was added, as this can be handled sensibly by MVE's expanded number of VCMP condition codes. (There are also VCLE and VCLT which are added later). VPSEL is also added here, simply selecting on the vselect. Original code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65051 llvm-svn: 366885
*	[ARM][ParallelDSP] Fix pointer operand reordering	Sam Parker	2019-07-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	While combining two loads into a single load, we often need to reorder the pointer operands for the new load. This reordering was broken in the cases where there was a chain of values that built up the pointer. Differential Revision: https://reviews.llvm.org/D65193 llvm-svn: 366881
*	[PowerPC][NFC] use opcode instead of MachineInstr for instrHasImmForm().	Chen Zheng	2019-07-24	2	-9/+14
\| \| \| \|	llvm-svn: 366867
*	[AArch64] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after ↵	Fangrui Song	2019-07-24	1	-0/+1
\| \| \| \| \| \|	r366857 llvm-svn: 366866
*	[AArch64][GlobalISel] Add support for s128 loads, stores, extracts, truncs.	Amara Emerson	2019-07-23	3	-13/+92
\| \| \| \| \| \| \| \| \| \| \| \|	We need to be able to load and store s128 for memcpy inlining, where we want to generate Q register mem ops. Making these legal also requires that we add some support in other instructions. Regbankselect should also know about these since they have no GPR register class that can hold them, so need special handling to live on the FPR bank. Differential Revision: https://reviews.llvm.org/D65166 llvm-svn: 366857
*	[GlobalISel][AArch64] Save a copy on G_SELECT by fixing condition to GPR	Jessica Paquette	2019-07-23	1	-5/+3
\| \| \| \| \| \| \| \|	The condition can never be fed by FPRs, so it should always be on a GPR. Differential Revision: https://reviews.llvm.org/D65157 llvm-svn: 366854
*	[ARM] Add opt-bisect support to ARMParallelDSP.	Eli Friedman	2019-07-23	1	-0/+3
\| \| \| \|	llvm-svn: 366851
*	[PowerPC] Remove redundant load immediate instructions	Yi-Hong Lyu	2019-07-23	1	-0/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently PowerPC backend emits code like this: `r3 = li 0`, `std r3, 264(r1)`, `r3 = li 0`, `std r3, 272(r1)`. This patch fixes that and other cases where a register already contains a value that is loaded so we will get: `r3 = li 0`, `std r3, 264(r1)`, `std r3, 272(r1)`. Differential Revision: https://reviews.llvm.org/D64220 llvm-svn: 366840
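The idea of dropping a load immediate whose value is already in the register can be sketched on a toy instruction list. This is a simplified model within a single basic block, not the pass from the patch: instructions are hypothetical `(opcode, register, operand)` tuples, and every opcode other than the listed stores is conservatively treated as overwriting its register.

```python
STORE_OPS = {"std", "stw"}  # assumption: these only read their register


def remove_redundant_li(block):
    """Drop `li reg, imm` when `reg` is already known to hold `imm`."""
    known = {}  # register -> immediate it currently holds
    kept = []
    for op, reg, arg in block:
        if op == "li":
            if known.get(reg) == arg:
                continue  # register already contains this immediate
            known[reg] = arg
        elif op not in STORE_OPS:
            known.pop(reg, None)  # register redefined, forget its value
        kept.append((op, reg, arg))
    return kept


# The sequence quoted in the commit message above.
example = [
    ("li", "r3", 0),
    ("std", "r3", "264(r1)"),
    ("li", "r3", 0),
    ("std", "r3", "272(r1)"),
]
```

Running `remove_redundant_li(example)` keeps the first `li` and both stores, matching the shape of the "after" sequence in the message.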
*	[X86] In lowerVectorShuffle, instead of creating a new node to canonicalize ↵	Craig Topper	2019-07-23	1	-11/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the shuffle mask by commuting, just commute the mask and swap V1/V2. LegalizeDAG tries to legalize the DAG by legalizing nodes before their operands. If we create a new node, we end up legalizing it after its operands. This prevents some of the optimizations that can be done when the operand is a build_vector since the build_vector will have been legalized to something else. Differential Revision: https://reviews.llvm.org/D65132 llvm-svn: 366835
*	[GlobalISel][AArch64] Teach GISel to handle shifts in load addressing modes	Jessica Paquette	2019-07-23	1	-7/+124
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we select the XRO variants of loads, we can pull in very specific shifts (of the size of an element). E.g. `ldr x1, [x2, x3, lsl #3]`. This teaches GISel to handle these when they're coming from shifts specifically. This adds a new addressing mode function, `selectAddrModeShiftedExtendXReg` which recognizes this pattern. This also packs this up with `selectAddrModeRegisterOffset` into `selectAddrModeXRO`. This is intended to be equivalent to `selectAddrModeXRO` in AArch64ISelDAGtoDAG. Also update load-addressing-modes to show that all of the cases here work. Differential Revision: https://reviews.llvm.org/D65119 llvm-svn: 366819
*	[ARM][LowOverheadLoops] Fix branch target codegen	Sam Parker	2019-07-23	4	-37/+183
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While lowering test.set.loop.iterations, it wasn't checked how the brcond was using the result and so the wls could branch to the loop preheader instead of not entering it. The same was true for loop.decrement.reg. So brcond and br_cc are now lowered manually when using the hwloop intrinsics. During this we now check whether the result has been negated and whether we're using SETEQ or SETNE and 0 or 1. We can then figure out which basic block the WLS and LE should be targeting. Differential Revision: https://reviews.llvm.org/D64616 llvm-svn: 366809
*	Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI.	Simon Pilgrim	2019-07-23	1	-2/+2
\| \| \| \|	llvm-svn: 366808
*	[ARM] Rename NEONModImm to VMOVModImm. NFC	David Green	2019-07-23	8	-46/+46
\| \| \| \| \| \|	Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE. llvm-svn: 366790
*	[PowerPC] Replace float load/store pair with integer load/store pair when ↵	Zi Xuan Wu	2019-07-23	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	it's only used in load/store Replace float load/store pair with integer load/store pair when it's only used in load/store, because float load/store instructions cost more cycles than integer load/store. A typical scenario is a call passing more than 13 float arguments, where we need to pass them on the stack. So we need a load/store pair to do such a memory operation if the variable is a global variable. Differential Revision: https://reviews.llvm.org/D64195 llvm-svn: 366775
*	AMDGPU: Don't use SDNodeXForm for DS offset output	Matt Arsenault	2019-07-22	1	-12/+12
\| \| \| \| \| \| \| \| \| \| \|	The xform has no real value when it's using out of a complex pattern output. The complex pattern was already creating TargetConstants with i16, so this was just unnecessary machinery. This allows global isel to import the simple cases once the complex pattern is implemented. llvm-svn: 366743
*	[X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector ↵	Craig Topper	2019-07-22	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it. The build_vector will become a constant pool load. By using the desired type initially, it ensures we don't generate a bitcast of the constant pool load which will need to be folded with the load. While experimenting with another patch, I noticed that when the load type and the constant pool type don't match, then SimplifyDemandedBits can't handle it. While we should probably fix that, this was a simple way to fix the issue I saw. llvm-svn: 366732
*	[NFC][PowerPC]Change ADDIStocHA to ADDIStocHA8 to follow 64-bit naming ↵	Jason Liu	2019-07-22	7	-19/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	convention Summary: Since we are planning to add ADDIStocHA for 32bit in later patch, we decided to change 64bit one first to follow naming convention with 8 behind opcode. Patch by: Xiangling_L Differential Revision: https://reviews.llvm.org/D64814 llvm-svn: 366731
*	Stubs out TLOF for AIX and add support for common vars in assembly output.	Sean Fertile	2019-07-22	2	-2/+36
\| \| \| \| \| \| \| \| \|	Stubs out a TargetLoweringObjectFileXCOFF class, implementing only SelectSectionForGlobal for common symbols. Also adds an override of EmitGlobalVariable in PPCAIXAsmPrinter which adds a number of defensive errors and adds support for emitting common globals. llvm-svn: 366727
*	[PowerPC] Fix comment on MO_PLT Target Operand Flag. [NFC]	Sean Fertile	2019-07-22	1	-2/+2
\| \| \| \| \| \|	Patch by Xiangling Liao. llvm-svn: 366724
*	[ARM][LowOverheadLoops] Revert remaining pseudos	Sam Parker	2019-07-22	1	-12/+56
\| \| \| \| \| \| \| \| \| \| \|	ARMLowOverheadLoops would assert a failure if it did not find all the pseudo instructions that comprise the hardware loop. Instead of doing this, iterate through all the instructions of the function and revert any remaining pseudo instructions that haven't been converted. Differential Revision: https://reviews.llvm.org/D65080 llvm-svn: 366691
*	AMDGPU/GlobalISel: Remove unnecessary code	Matt Arsenault	2019-07-22	1	-4/+0
\| \| \| \| \| \| \|	The minnum/maxnum cases are dead, and the cvt is handled by the default. llvm-svn: 366685
*	[ARM] Fix for MVE VPT block pass	David Green	2019-07-22	1	-3/+18
\| \| \| \| \| \| \| \| \|	We need to ensure that the number of T's is correct when adding multiple instructions into the same VPT block. Differential revision: https://reviews.llvm.org/D65049 llvm-svn: 366684
*	[X86] EltsFromConsecutiveLoads - support common source loads (REAPPLIED)	Simon Pilgrim	2019-07-22	1	-5/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load. A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offset loads then attempt to match against a previous load that has already been confirmed to be a consecutive match. Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle. Fixed out of bounds load assert identified in rL366501 Differential Revision: https://reviews.llvm.org/D64551 llvm-svn: 366681
*	Added address-space mangling for stack related intrinsics	Christudasan Devadasan	2019-07-22	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Modified the following 3 intrinsics: int_addressofreturnaddress, int_frameaddress & int_sponentry. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D64561 llvm-svn: 366679
*	[IPRA][ARM] Make use of the "returned" parameter attribute	Oliver Stannard	2019-07-22	3	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \|	ARM has code to recognise uses of the "returned" function parameter attribute which guarantee that the value passed to the function in r0 will be returned in r0 unmodified. IPRA replaces the regmask on call instructions, so needs to be told about this to avoid reverting the optimisation. Differential revision: https://reviews.llvm.org/D64986 llvm-svn: 366669
*	[AMDGPU] Save some work when an atomic op has no uses	Jay Foad	2019-07-22	1	-67/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an atomic op (i.e. the value that was in memory before the atomic op was performed) is not used. NFC. Reviewers: arsenm, dstuttard, tpr Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64981 llvm-svn: 366667
*	[X86] SimplifyDemandedVectorEltsForTargetNode - Move SUBV_BROADCAST ↵	Simon Pilgrim	2019-07-21	1	-19/+13
\| \| \| \| \| \| \| \|	narrowing handling. NFCI. Move the narrowing of SUBV_BROADCAST to where we handle all the other opcodes. llvm-svn: 366660
*	[X86][SSE] Use PSADBW to improve vXi8 sum reduction (PR42674)	Simon Pilgrim	2019-07-20	1	-7/+38
\| \| \| \| \| \|	As detailed on PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero, to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result. llvm-svn: 366636
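The reduction described above can be modelled with plain integers. In this sketch, `psadbw_zero` imitates PSADBW against an all-zero operand for one group of eight unsigned bytes (sum of absolute differences, giving a 16-bit result), and `sum_reduce_vxi8` is a hypothetical helper that halves the vector until eight lanes remain. It assumes the lane count is a power of two and at least 8.

```python
def psadbw_zero(lanes):
    """Sum of absolute differences of 8 unsigned bytes against zero,
    held in a 16-bit result, as PSADBW computes per 64-bit group."""
    assert len(lanes) == 8
    return sum(abs(x - 0) for x in lanes) & 0xFFFF


def sum_reduce_vxi8(values):
    """Sum a vector of i8 lanes modulo 256 via halving adds and PSADBW."""
    lanes = [x & 0xFF for x in values]
    while len(lanes) > 8:
        half = len(lanes) // 2
        # Add the upper half onto the lower half with 8-bit wraparound.
        lanes = [(lanes[i] + lanes[i + half]) & 0xFF for i in range(half)]
    # Keep only the bottom i8 of the i16 result, discarding overflow.
    return psadbw_zero(lanes) & 0xFF
```

Because every step works modulo 256 or wider, the result equals the ordinary sum of the lanes truncated to 8 bits.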
*	[GlobalISel][AArch64] Contract trivial same-size cross-bank copies into G_STOREs	Jessica Paquette	2019-07-20	1	-0/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sometimes, you can end up with cross-bank copies between same-sized GPRs and FPRs, which feed into G_STOREs. When these copies feed only into stores, they aren't necessary; we can just store using the original register bank. This provides some minor code size savings for some floating point SPEC benchmarks. (Around 0.2% for 453.povray and 450.soplex) This issue doesn't seem to show up due to regbankselect or anything similar. So, this patch introduces an early select function, `contractCrossBankCopyIntoStore` which performs the contraction when possible. The selector then continues normally and selects the correct store opcode, eliminating needless copies along the way. Differential Revision: https://reviews.llvm.org/D65024 llvm-svn: 366625
*	[WebAssembly] Compute and export TLS block alignment	Guanzhong Chen	2019-07-19	2	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add immutable WASM global `__tls_align` which stores the alignment requirements of the TLS segment. Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang. The expected usage has now changed to: __wasm_init_tls(memalign(__builtin_wasm_tls_align(), __builtin_wasm_tls_size())); Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton Reviewed By: tlively Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65028 llvm-svn: 366624
*	AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces	Matt Arsenault	2019-07-19	1	-1/+3
\| \| \| \|	llvm-svn: 366621
*	[AMDGPU] Autogenerate register sequences in tuples	Stanislav Mekhanoshin	2019-07-19	1	-272/+47
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D65007 llvm-svn: 366619
*	[AMDGPU] Fixed occupancy calculation for gfx10	Stanislav Mekhanoshin	2019-07-19	4	-28/+19
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D65010 llvm-svn: 366616