bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[arm] Add big-endian version of pcrel fixups for adr instructions	Dimitry Andric	2020-05-19	1	-12/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In 2e24219d3cbf, a number of ARM pcrel fixups were resolved at assembly time, to solve PR44929. This only covered little-endian ARM however, so add similar fixups for big-endian ARM. Also extend the test case to cover big-endian ARM. Reviewers: hans, psmith, MaskRay Reviewed By: psmith, MaskRay Subscribers: kristof.beyls, hiraditya, danielkiss, emaste, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79774 (cherry picked from commit fc373522b044e0b150561204958f0d603fb4caba)
*	[ARM] Only produce qadd8b under hasV6Ops	David Green	2020-05-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When compiling for a arm5te cpu from clang, the +dsp attribute is set. This meant we could try and generate qadd8 instructions where we would end up having no pattern. I've changed the condition here to be hasV6Ops && hasDSP, which is what other parts of ARMISelLowering seem to use for similar instructions. Fixed PR45677. Differential Revision: https://reviews.llvm.org/D78877 (cherry picked from commit 8807139026b64ac40163bb255dad38a1d8054f08)
*	[MC][ARM] Resolve some pcrel fixups at assembly time (PR44929)	Hans Wennborg	2020-02-27	1	-12/+10
\| \| \| \| \| \| \| \| \| \| \| \|	MC currently does not emit these relocation types, and lld does not handle them. Add FKF_Constant as a work-around of some ARM code after D72197. Eventually we probably should implement these relocation types. By Fangrui Song! Differential revision: https://reviews.llvm.org/D72892 (cherry picked from commit 2e24219d3cbfcb8c824c58872f97de0a2e94a7c8)
*	Don't generate libcalls for wide shift on Windows ARM (PR42711)	Hans Wennborg	2020-02-25	1	-1/+1
\| \| \| \| \| \| \|	The previous patch (cff90f07cb5cc3c3bc58277926103af31caef308) didn't cover ARM. (cherry picked from commit decd021facba804b57e8d80b6159c987d3261ab8)
*	[FPEnv][ARM] Don't call mutateStrictFPToFP when lowering	John Brawn	2020-02-18	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \|	mutateStrictFPToFP can delete the node and replace it with another with the same value which can later cause problems, and returning the result of mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the returned value has the same number of results as the original. Instead handle things by doing the mutation manually. Differential Revision: https://reviews.llvm.org/D74726 (cherry picked from commit 594a89f7270da74c89f2321432bc6a7135773fa5)
*	[ARM] Fix infinite loop when lowering STRICT_FP_EXTEND	John Brawn	2020-02-18	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the current implementation will cause in infinite loop for STRICT_FP_EXTEND due to emitting a merge_values of the original node which after replacement becomes a merge_values of itself. Fix this by not doing anything for f32 to f64 extend when we have FP64, though for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that doesn't happen automatically for opcodes with custom lowering. Differential Revision: https://reviews.llvm.org/D74559 (cherry picked from commit 0ec57972967dfb43fc022c2e3788be041d1db730)
*	[FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS	John Brawn	2020-02-18	3	-10/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These can be lowered to code sequences using CMPFP and CMPFPE which then get selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain operand isn't handled correctly, but resolving that looks like it would involve changes around FPSCR-handling instructions and how the FPSCR is modelled. The fp-intrinsics test was already testing some of this but as the entire test was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the cases where we aren't generating the right instruction sequences as FIXME. Differential Revision: https://reviews.llvm.org/D73194 (cherry picked from commit b37d59353f699e99f139a9227a6a69964ef4b132)
*	Fix an unused variable warning	Hans Wennborg	2020-02-12	1	-1/+1
\| \| \| \|	(cherry picked from commit ea9850b6c71d975935de15bd4128508b260165c5)
*	Revert "[ARM] Improve codegen of volatile load/store of i64"	Victor Campos	2020-02-08	6	-162/+6
\| \| \| \|	This reverts commit 60e0120c913dd1d4bfe33769e1f000a076249a42.
*	[ARM][VecReduce] Force expand vector_reduce_fmin	David Green	2020-02-05	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under MVE, we do not have any lowering for fminimum, which a vector_reduce_fmin without NoNan will be expanded into. As with the other recent patches, force this to expand in the pre-isel pass. Note that Neon lowering would be OK because the scalar fminimum uses the vector VMIN instruction, but is probably better to just rely on the scalar operations, which is what is done here. Also fixes what appears to be the reversal of INF vs -INF in the vector_reduce_fmin widening code. (cherry picked from commit 362d00e0510ee75750499e2993a782428e377215)
*	[ARM] Expand vector reduction intrinsics on soft float	Nikita Popov	2020-02-05	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Followup to D73135. If the target doesn't have hard float (default for ARM), then we assert when trying to soften the result of vector reduction intrinsics. This patch marks these for expansion as well. (A bit odd to use vectors on a target without hard float ... but that's where you end up if you expose target-independent vector types.) Differential Revision: https://reviews.llvm.org/D73854 (cherry picked from commit 1cc4f8d17247cd9be88addd75d060f9321b6f8b0)
*	[AArch64][ARM] Always expand ordered vector reductions (PR44600)	Nikita Popov	2020-02-05	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fadd/fmul reductions without reassoc are lowered to VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization support. Until that is in place, expand these intrinsics on ARM and AArch64. Other targets always expand the vector reduction intrinsics. Additionally expand fmax/fmin reductions without nonan flag on AArch64, as the backend asserts that the flag is present when lowering VECREDUCE_FMIN/FMAX. This fixes https://bugs.llvm.org/show_bug.cgi?id=44600. Differential Revision: https://reviews.llvm.org/D73135 (cherry picked from commit 70d345e687caba4ac1f95655c6924dfa91e0083f)
*	Add function attribute "patchable-function-prefix" to support ↵	Fangrui Song	2020-01-24	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	-fpatchable-function-entry=N,M where M>0 Similar to the function attribute `prefix` (prefix data), "patchable-function-prefix" inserts data (M NOPs) before the function entry label. -fpatchable-function-entry=2,1 (1 NOP before entry, 1 NOP after entry) will look like: ``` .type foo,@function .Ltmp0: # @foo nop foo: .Lfunc_begin0: # optional `bti c` (AArch64 Branch Target Identification) or # `endbr64` (Intel Indirect Branch Tracking) nop .section __patchable_function_entries,"awo",@progbits,get,unique,0 .p2align 3 .quad .Ltmp0 ``` -fpatchable-function-entry=N,0 + -mbranch-protection=bti/-fcf-protection=branch has two reasonable placements (https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01185.html): ``` (a) (b) func: func: .Ltmp0: bti c bti c .Ltmp0: nop nop ``` (a) needs no additional code. If the consensus is to go for (b), we will need more code in AArch64BranchTargets.cpp / X86IndirectBranchTracking.cpp . Differential Revision: https://reviews.llvm.org/D73070 (cherry picked from commit 22467e259507f5ead2a87d989251b4c951a587e4)
*	CMake: Make most target symbols hidden by default	Tom Stellard	2020-01-14	6	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439
*	[ARM][MVE] VTP Block Pass fix	Sjoerd Meijer	2020-01-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Fix a missing and broken test: 2 VPT blocks predicated on the same VCMP instruction that can be folded. The problem was that for each VPT block, we record the predicate statements with a list, but the same instruction was added twice. Thus, we were running in an assert trying to remove the same instruction twice. To avoid this the instructions are now recorded with a set. Differential Revision: https://reviews.llvm.org/D72699
*	[ARM,MVE] Use the new Tablegen `defvar` and `if` statements.	Simon Tatham	2020-01-14	1	-253/+232
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This cleans up a lot of ugly `foreach` bodges that I've been using to work around the lack of those two language features. Now they both exist, I can make then all into something more legible! In particular, in the common pattern in `ARMInstrMVE.td` where a multiclass defines an `Instruction` instance plus one or more `Pat` that select it, I've used a `defvar` to wrap `!cast<Instruction>(NAME)` so that the patterns themselves become a little more legible. Replacing a `foreach` with a `defvar` removes a level of block structure, so several pieces of code have their indentation changed by this patch. Best viewed with whitespace ignored. NFC: the output of `llvm-tblgen -print-records` on the two affected Tablegen sources is exactly identical before and after this change, so there should be no effect at all on any of the other generated files. Reviewers: MarkMurrayARM, miyuki Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72690
*	[ARM][LowOverheadLoops] Allow all MVE instrs.	Sam Parker	2020-01-14	1	-21/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have a whitelist of instructions that we allow when tail predicating, since these are trivial ones that we've deemed need no special handling. Now change ARMLowOverheadLoops to allow the non-trivial instructions if they're contained within a valid VPT block. Since a valid block is one that is predicated upon the VCTP so we know that these non-trivial instructions will still behave as expected once the implicit predication is used instead. This also fixes a previous test failure. Differential Revision: https://reviews.llvm.org/D72509
*	[ARM][LowOverheadLoops] Change predicate inspection	Sam Parker	2020-01-14	1	-26/+27
\| \| \| \| \| \| \| \| \| \|	Use the already provided helper function to get the operand type so that we can detect whether the vpr is being used as a predicate or not. Also use existing helpers to get the predicate indices when we converting the vpt blocks. This enables us to support both types of vpr predicate operand. Differential Revision: https://reviews.llvm.org/D72504
*	[ARM][Thumb2] Fix ADD/SUB invalid writes to SP	Diogo Sampaio	2020-01-14	7	-86/+351
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80". The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable. To enforce that the instruction t2(ADD\|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP). Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations. When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant ) It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt). Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma, andreadb Reviewed By: efriedma Subscribers: gbedwell, john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70680
*	[ARM][MVE] Disallow VPSEL for tail predication	Sam Parker	2020-01-14	2	-4/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to the current way that we collect predicated instructions, we can't easily handle vpsel in tail predicated loops. There are a couple of issues: 1) It will use the VPR as a predicate operand, but doesn't have to be instead a VPT block, which means we can assert while building up the VPT block because we don't find another VPST to being a new one. 2) VPSEL still requires a VPR operand even after tail predicating, which means we can't remove it unless there is another instruction, such as vcmp, that can provide the VPR def. The first issue should be a relatively simple fix in the logic of the LowOverheadLoops pass, whereas the second will require us to represent the 'implicit' tail predication with an explicit value. Differential Revision: https://reviews.llvm.org/D72629
*	[ARM][MVE] Masked gathers from base + vector of offsets	Anna Welker	2020-01-14	1	-38/+162
\| \| \| \| \| \| \| \|	Enables the masked gather pass to create a masked gather loading from a base and vector of offsets. This also enables v8i16 and v16i8 gather loads. Differential Revision: https://reviews.llvm.org/D72330
*	[Scheduler] Remove superfluous casts. NFC	David Green	2020-01-13	1	-1/+1
\|
*	ARMLowOverheadLoops: return earlier to avoid printing irrelevant dbg msg. NFC	Sjoerd Meijer	2020-01-13	1	-0/+1
\|
*	[Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction()	Fangrui Song	2020-01-11	1	-13/+7
\| \| \| \| \| \| \| \| \| \|	The argument is llvm::null() everywhere except llvm::errs() in llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds. If we ever have the needs to add verbose log to disassemblers, we can record log with a member function, instead of passing it around as an argument.
*	[TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from ↵	Craig Topper	2020-01-10	2	-8/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RunttimeLibcalls.def and all associated usages Summary: This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with a ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare. This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code. Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn Reviewed By: efriedma Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72536
*	[ARM][MVE] Tail predicate VMAX,VMAXA,VMIN,VMINA	Sam Parker	2020-01-10	1	-0/+2
\| \| \| \| \| \| \|	Add the MVE min and max instructions to our tail predication whitelist. Differential Revision: https://reviews.llvm.org/D72502
*	ARMLowOverheadLoops: a few more dbg msgs to better trace rejected TP loops. NFC.	Sjoerd Meijer	2020-01-10	1	-7/+16
\|
*	Reverting, broke some bots. Need further investigation.	Diogo Sampaio	2020-01-10	7	-336/+85
\| \| \| \| \| \| \| \|	Summary: This reverts commit 8c12769f3046029e2a9b4e48e1645b1a77d28650. Reviewers: Subscribers:
*	[ARM][Thumb2] Fix ADD/SUB invalid writes to SP	Diogo Sampaio	2020-01-10	7	-85/+336
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80". The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable. To enforce that the instruction t2(ADD\|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP). Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations. When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant ) It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt). Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma Reviewed By: efriedma Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70680
*	TableGen/GlobalISel: Add way for SDNodeXForm to work on timm	Matt Arsenault	2020-01-09	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.
*	CodeGen: Use LLT instead of EVT in getRegisterByName	Matt Arsenault	2020-01-09	2	-2/+2
\| \| \| \| \| \|	Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.
*	[NFC][ARM] LowOverheadLoop comments	Sam Parker	2020-01-09	1	-0/+16
\| \| \| \|	Add a comment describing the dependencies of the pass.
*	[ARM][MVE] Don't unroll intrinsic loops.	Sam Parker	2020-01-09	1	-4/+5
\| \| \| \| \| \| \| \|	We don't unroll vector loops for MVE targets, but we miss the case when loops only contain intrinsic calls. So just move the logic a bit to catch this case. Differential Revision: https://reviews.llvm.org/D72440
*	Revert "[ARM][LowOverheadLoops] Update liveness info"	Sam Parker	2020-01-09	1	-64/+0
\| \| \| \| \| \| \|	This reverts commit e93e0d413f3afa1df5c5f88df546bebcd1183155. There's some ordering problems on some on the buildbots which needs investigating.
*	[ARM][LowOverheadLoops] Update liveness info	Sam Parker	2020-01-09	1	-0/+64
\| \| \| \| \| \| \| \|	After expanding the pseudo instructions, update the liveness info. We do this in a post-order traversal of the loop, including its exit blocks and preheader(s). Differential Revision: https://reviews.llvm.org/D72131
*	[ARM,MVE] Intrinsics for variable shift instructions.	Simon Tatham	2020-01-08	1	-12/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This batch of intrinsics fills in all the shift instructions that take a variable shift distance in a register, instead of an immediate. Some of these instructions take a single shift distance in a scalar register and apply it to all lanes; others take a vector of per-lane distances. These instructions are all basically one family, varying in whether they saturate out-of-range values, and whether they round when bits are shifted off the bottom. I've implemented them at the IR level by a much smaller family of IR intrinsics, which take flag parameters to indicate saturating and/or rounding (along with the usual one to specify signed/unsigned integers). An oddity is that all of them are //left// shift instructions – but if you pass a negative shift count, they'll shift right. So the vector shift distances are always vectors of //signed// integers, regardless of whether you're considering the other input vector to be of signed or unsigned. Also, even the simplest `vshlq` instruction in this family (neither saturating nor rounding) has to be implemented as an IR intrinsic, because the ordinary LLVM IR `shl` operation would consider an out-of-range shift count to be undefined behavior. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72329
*	[ARM,MVE] Intrinsics for partial-overwrite imm shifts.	Simon Tatham	2020-01-08	1	-49/+123
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This batch of intrinsics covers two sets of immediate shift instructions, which have in common that they only overwrite part of their output register and so they need an extra input giving its previous value. The VSLI and VSRI instructions shift each lane of the input vector left or right just as if they were normal immediate VSHL/VSHR, but then they only overwrite the output bits that correspond to actual shifted bits of the input. So VSLI will leave the low n bits of each output lane unchanged, and VSRI the same with the top n bits. The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input vector of 2n-bit integers, shift each lane right by a constant, and then narrowing the shifted result to only n bits. So they only overwrite half of the n-bit lanes in the output register, and the B/T suffix indicates whether it's the bottom or top half of each 2n-bit lane. I've implemented the whole of the latter family using a single IR intrinsic `vshrn`, which takes a lot of i32 parameters indicating which instruction it expands to (by specifying signedness of the input and output types, whether it saturates and/or rounds, etc). Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72328
*	[ARM][MVE] Enable masked gathers from vector of pointers	Anna Welker	2020-01-08	6	-1/+208
\| \| \| \| \| \| \| \|	Adds a pass to the ARM backend that takes a v4i32 gather and transforms it into a call to MVE's masked gather intrinsics. Differential Revision: https://reviews.llvm.org/D71743
*	[ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS	Sjoerd Meijer	2020-01-07	1	-31/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a recommit of D71330, but with a few things fixed and changed: 1) ReachingDefAnalysis: this was not running with optnone as it was checking skipFunction(), which other analysis passes don't do. I guess this is a copy-paste from a codegen pass. 2) VPTBlockPass: here I've added skipFunction(), because like most/all optimisations, we don't want to run this with optnone. This fixes the issues with the initial/previous commit: the VPTBlockPass was running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was crashing querying ReachingDefAnalysis. I've added test case mve-vpt-block-optnone.mir to check that we don't run VPTBlock with optnone. Differential Revision: https://reviews.llvm.org/D71470
*	[ARM] Improve codegen of volatile load/store of i64	Victor Campos	2020-01-07	6	-6/+162
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers Reviewed By: efriedma, nickdesaulniers Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072
*	[MC] Add parameter `Address` to MCInstrPrinter::printInstruction	Fangrui Song	2020-01-06	2	-5/+5
\| \| \| \| \| \| \| \|	Follow-up of D72172. Reviewed By: jhenderson, rnk Differential Revision: https://reviews.llvm.org/D72180
*	[MC] Add parameter `Address` to MCInstPrinter::printInst	Fangrui Song	2020-01-06	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172
*	[ARM] Use the correct opcodes for Thumb2 segmented stack frame lowering	David Green	2020-01-06	1	-2/+4
\| \| \| \| \| \| \| \| \|	The segmented stack lowering code appears to be using ARM opcodes under Thumb2. The MRC opcode will be the same for Thumb and ARM, but t2LDR seems wrong. Either way, using the correct thumb vs arm opcodes is more correct. Differential Revision: https://reviews.llvm.org/D72074
*	[ARM] Use correct TRAP opcode for thumb in FastISel	David Green	2020-01-06	1	-2/+6
\| \| \| \| \| \| \| \| \|	We were previously unconditionally using the ARM::TRAP opcode, even under Thumb. My understanding is that these are essentially the same thing (they both result in a trap under Thumb), but the ARM::TRAP opcode is marked as requiring IsARM, so it is more correct to use ARM::tTRAP. Differential Revision: https://reviews.llvm.org/D72075
*	[ARM,MVE] Fix many signedness errors in MVE intrinsics.	Simon Tatham	2020-01-06	1	-28/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Running an end-to-end test last week I noticed that a lot of the ACLE intrinsics that operate differently on vectors of signed and unsigned integers were ending up generating the signed version of the instruction unconditionally. This is because the IR intrinsics had no way to distinguish signed from unsigned: the LLVM type system just calls them both `v8i16` (or whatever), so you need either separate intrinsics for signed and unsigned, or a flag parameter that tells ISel which one to choose. This patch fixes all the problems of that kind that I've noticed, by adding an i32 flag parameter to many of the IR intrinsics which is set to 1 for unsigned (matching the existing practice in cases where we got it right), and conditioning all the isel patterns on that flag. So the fundamental change is in `IntrinsicsARM.td`, changing the low-level IR intrinsics API; there are knock-on changes in `arm_mve.td` (adjusting code gen for the ACLE intrinsics to use the modified API) and in `ARMInstrMVE.td` (adjusting isel to expect the new unsigned flags). The rest of this patch is boringly updating tests. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72270
*	[ARM,MVE] Generate the right instruction for vmaxnmq_m_f16.	Simon Tatham	2020-01-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Due to a copy-paste error in the isel patterns, the predicated version of this intrinsic was expanding to the `VMAXNMT.F32` instruction instead of `VMAXNMT.F16`. Similarly for vminnm. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72269
*	[NFC] Fix trivial typos in comments	James Henderson	2020-01-06	5	-5/+5
\| \| \| \| \| \| \| \|	Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72143 Patch by Kazuaki Ishizaki.
*	[ARM][MVE] More MVETailPredication debug messages. NFC.	Sjoerd Meijer	2020-01-06	2	-65/+96
\| \| \| \| \| \| \| \| \| \|	I've added a few more debug messages to MVETailPredication because I wanted to trace better which instructions are added/removed. And while I was at it, I factored out one function which I thought was clearer, and have added some comments to describe better the flow between MVETailPredication and ARMLowOverheadLoops. Differential Revision: https://reviews.llvm.org/D71549
*	[MC][ARM] Delete MCSection::HasData and move SHF_ARM_PURECODE logic to ↵	Fangrui Song	2020-01-05	1	-2/+5
\| \| \| \| \| \| \| \|	ARMELFObjectWriter::addTargetSectionFlags This simplifies the generic interface and also makes SHF_ARM_PURECODE more robust (fixes a TODO). Inspecting MCDataFragment contents covers more cases than MCObjectStreamer::EmitBytes.
*	[ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors	David Green	2020-01-05	5	-18/+45
\| \| \| \| \| \| \| \| \| \| \|	This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing the target independent code to handle more folds in more situations (for example if the fast math flags are present, but the global AllowFPOpFusion option isnt). It also splits apart the HasSlowFPVMLx into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately if needed. Differential Revision: https://reviews.llvm.org/D72139