bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from ↵	Craig Topper	2020-01-10	6	-27/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RuntimeLibcalls.def and all associated usages Summary: This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with an ogt/olt/oge/ole check. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare. This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code. Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn Reviewed By: efriedma Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72536
*	[AArch64] Don't generate libcalls for wide shifts on Darwin	Jessica Paquette	2020-01-10	1	-1/+1
\| \| \| \| \| \| \|	Similar to cff90f07cb5cc3. Darwin doesn't always use compiler-rt, and so we can't assume that these functions are available (at least on arm64).
*	[NFC][InlineCost] Factor cost modeling out of CallAnalyzer traversal.	Mircea Trofin	2020-01-10	1	-328/+431
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The goal is to simplify experimentation on the cost model. Today, CallAnalyzer decides 2 things: legality, and benefit. The refactoring keeps legality assessment in CallAnalyzer, and factors benefit evaluation out, as an extension. Reviewers: davidxl, eraman Reviewed By: davidxl Subscribers: kamleshbhalui, fedor.sergeev, hiraditya, baloghadamsoftware, haicheng, a.sidorin, Szelethus, donat.nagy, dkrupp, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71733
*	[LockFileManager] Make default waitForUnlock timeout a parameter, NFC	Vedant Kumar	2020-01-10	1	-4/+2
\| \| \| \|	Patch by Xi Ge!
*	Let targets adjust operand latency of bundles	Stanislav Mekhanoshin	2020-01-10	4	-42/+34
\| \| \| \| \| \| \| \| \| \| \|	This reverts the AMDGPU DAG mutation implemented in D72487 and gives a more general way of adjusting BUNDLE operand latency. It also replaces FixBundleLatencyMutation with adjustSchedDependency callback in the AMDGPU, fixing not only successor latencies but predecessors' as well. Differential Revision: https://reviews.llvm.org/D72535
*	[AArch64] Add isAuthenticated predicate to MCInstDesc	Vedant Kumar	2020-01-10	2	-6/+14
\| \| \| \| \| \| \| \| \| \|	Add a predicate to MCInstDesc that allows tools to determine whether an instruction authenticates a pointer. This can be used by diagnostic tools to hint at pointer authentication failures. Differential Revision: https://reviews.llvm.org/D70329 rdar://55089604
*	[TargetLowering] Use SelectionDAG::getSetCC and remove a repeated call to ↵	Craig Topper	2020-01-10	1	-8/+4
\| \| \| \|	getSetCCResultType in softenSetCCOperands. NFCI
*	[CMake] Fix modules build after DWARFLinker reorganization	Jonas Devlieghere	2020-01-10	1	-0/+2
\| \| \| \| \|	Create a dedicated module for the DWARFLinker and make it depend on intrinsics gen.
*	[TargetLowering][ARM][X86] Change softenSetCCOperands handling of ONE to ↵	Craig Topper	2020-01-10	1	-10/+9
\| \| \| \| \| \| \| \| \| \| \| \|	avoid spurious exceptions for QNANs with strict FP quiet compares ONE is currently softened to OGT \| OLT. But the libcalls for OGT and OLT will trigger an exception for QNAN, at least for X86 with libgcc. UEQ on the other hand uses UO \| OEQ. The UO and OEQ libcalls will not trigger an exception for QNAN. This patch changes ONE to use the inverse of the UEQ lowering. So we now produce O & UNE. Technically the existing behavior was correct for a signalling ONE, but since I don't know how to generate one of those from clang that seemed like something we can deal with later as we would need to fix other predicates as well. Also removing spurious exceptions seemed better than missing an exception. There are also problems with quiet OGT/OLT/OLE/OGE, but those are harder to fix. Differential Revision: https://reviews.llvm.org/D72477
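The predicate identity this change relies on can be checked with a small truth-table sketch. The `Cmp` enum and the helper names below are hypothetical: they model only the four possible outcomes of an FP compare and are not the LLVM or libgcc API. Exception behavior is not modeled, only the boolean result.

```cpp
// Hypothetical model: the four possible outcomes of comparing two FP values.
enum class Cmp { Less, Equal, Greater, Unordered };

// Basic predicates, named after the condition codes in the commit message.
constexpr bool isUO(Cmp C) { return C == Cmp::Unordered; }
constexpr bool isO(Cmp C) { return C != Cmp::Unordered; }
constexpr bool isOEQ(Cmp C) { return C == Cmp::Equal; }
constexpr bool isUNE(Cmp C) { return C != Cmp::Equal; }
constexpr bool isOGT(Cmp C) { return C == Cmp::Greater; }
constexpr bool isOLT(Cmp C) { return C == Cmp::Less; }

// UEQ is softened as UO | OEQ.
constexpr bool ueq(Cmp C) { return isUO(C) || isOEQ(C); }
// Old ONE softening: OGT | OLT.
constexpr bool oneOld(Cmp C) { return isOGT(C) || isOLT(C); }
// New ONE softening: the inverse of UEQ, i.e. O & UNE.
constexpr bool oneNew(Cmp C) { return isO(C) && isUNE(C); }

// Both ONE forms give the same result, and both equal !UEQ.
constexpr bool agrees(Cmp C) {
  return oneOld(C) == oneNew(C) && oneNew(C) == !ueq(C);
}
constexpr bool loweringsAgree() {
  return agrees(Cmp::Less) && agrees(Cmp::Equal) && agrees(Cmp::Greater) &&
         agrees(Cmp::Unordered);
}
```

The two forms differ only in which libcalls they would use, which is the point of the patch: the result is the same for every outcome, including the unordered (NaN) case.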
*	[LegalizeVectorOps] Improve handling of multi-result operations.	Craig Topper	2020-01-10	1	-173/+271
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This system wasn't very well designed for multi-result nodes. As a consequence they weren't consistently registered in the LegalizedNodes map leading to nodes being revisited for different results. I've removed the "Result" variable from the main LegalizeOp method and used a SDNode* instead. The result number from the incoming Op SDValue is only used for deciding which result to return to the caller. When LegalizeOp is called it should always register a legalized result for all of its results. Future calls for any other result should be pulled from the LegalizedNodes map. Legal nodes will now register all of their results in the map instead of just the one we were called for. The Expand and Promote handling now uses a vector of results similar to LegalizeDAG. Each of the new results is then re-legalized and logged in the LegalizedNodes map for all of the Results for the node being legalized. None of the handlers register their own results now. And none call ReplaceAllUsesOfValueWith now. Custom handling now always passes result number 0 to LowerOperation. This matches what LegalizeDAG does. Since the introduction of STRICT nodes, I've encountered several issues with X86's custom handling being called with an SDValue pointing at the chain and our custom handlers using that to get a VT instead of result 0. This should prevent us from having any more of those issues. On return we will update the LegalizedNodes map for all results so we shouldn't call the custom handler again for each result number. I want to push SDNode* further into the Expand and Promote handlers, but I've left that for a follow-up to keep this patch size down. I've created a dummy SDValue(Node, 0) to keep the handlers working. Differential Revision: https://reviews.llvm.org/D72224
*	[X86] Support function attribute "patchable-function-entry"	Fangrui Song	2020-01-10	1	-3/+15
\| \| \| \| \| \| \|	For x86-64, we diverge from GCC -fpatchable-function-entry in that we emit multi-byte NOPs. Differential Revision: https://reviews.llvm.org/D72220
*	[AArch64] Add function attribute "patchable-function-entry" to add NOPs at ↵	Fangrui Song	2020-01-10	4	-2/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	function entry The Linux kernel uses -fpatchable-function-entry to implement DYNAMIC_FTRACE_WITH_REGS for arm64 and parisc. GCC 8 implemented -fpatchable-function-entry, which can be seen as a generalized form of -mnop-mcount. The N,M form (function entry points before the Mth NOP) is currently only used by parisc. This patch adds N,0 support to AArch64 codegen. N is represented as the function attribute "patchable-function-entry". We will use a different function attribute for M, if we decide to implement it. The patch reuses the existing patchable-function pass, and TargetOpcode::PATCHABLE_FUNCTION_ENTER which is currently used by XRay. When the integrated assembler is used, __patchable_function_entries will be created for each text section with the SHF_LINK_ORDER flag to prevent --gc-sections (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93197) and COMDAT (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195) issues. Retrospectively, __patchable_function_entries should use a PC-relative relocation type to avoid the SHF_WRITE flag and dynamic relocations. "patchable-function-entry"'s interaction with Branch Target Identification is still unclear (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 for GCC discussions). Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D72215
*	[AIX] Allow vararg calls when all arguments reside in registers	jasonliu	2020-01-10	1	-22/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch pushes the AIX vararg unimplemented error diagnostic later and allows vararg calls so long as all the arguments can be passed in registers. This patch extends the AIX calling convention implementation to initialize GPR(s) for vararg float arguments. On AIX, both GPR(s) and FPR are allocated for floating point arguments. The GPR(s) are only initialized for vararg calls, otherwise the callee is expected to retrieve the float argument in the FPR. f64 in AIX PPC32 requires special handling in order to allocate and initialize 2 GPRs. This is performed with bitcast, SRL, truncation to initialize one GPR for the MSW and bitcast, truncations to initialize the other GPR for the LSW. A future patch will follow to add support for arguments passed on the stack. Patch provided by: cebowleratibm Reviewers: sfertile, ZarkoCA, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D71013
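The f64 handling described above (bitcast, then shift-right and truncate for the MSW, and truncate alone for the LSW) can be sketched outside of SelectionDAG as plain integer arithmetic. `GprPair`, `bitsOfDouble` and `splitF64Bits` are hypothetical names used for illustration only; this is not the code from the patch.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical pair of 32-bit GPR values that together hold one f64 argument.
struct GprPair {
  std::uint32_t Msw; // most significant word
  std::uint32_t Lsw; // least significant word
};

// "bitcast": reinterpret the bytes of a double as a 64-bit integer.
inline std::uint64_t bitsOfDouble(double D) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "this sketch assumes a 64-bit double");
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// MSW: logical shift right by 32, then truncate to 32 bits.
// LSW: truncate to 32 bits directly.
constexpr GprPair splitF64Bits(std::uint64_t Bits) {
  return GprPair{static_cast<std::uint32_t>(Bits >> 32),
                 static_cast<std::uint32_t>(Bits)};
}
```

For example, `splitF64Bits(bitsOfDouble(1.0))` would place the IEEE-754 pattern `0x3FF0000000000000` into the two words as `0x3FF00000` and `0x00000000`.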
*	[X86][AVX] lowerShuffleAsLanePermuteAndShuffle - consistently normalize ↵	Simon Pilgrim	2020-01-10	1	-2/+2
\| \| \| \| \| \| \| \|	multi-input shuffle elements We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the moment, but we should consistently handle lane index calculations for multiple inputs in both the AVX1 and AVX2 paths. Minor (almost NFC) tidyup as I'm hoping to use lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.
*	[BPF] extend BTF_KIND_FUNC to cover global, static and extern funcs	Yonghong Song	2020-01-10	3	-29/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, an extern function was added as BTF_KIND_VAR. This does not work well with the existing BTF infrastructure, as functions are expected to use BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO. This patch adds extern functions to BTF_KIND_FUNC. The two bits 0:1 of btf_type.info are used to indicate what kind of function it is: 0: static 1: global 2: extern Differential Revision: https://reviews.llvm.org/D71638
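The encoding described above can be sketched as a pair of bit-mask helpers. The enum, mask and function names are hypothetical; only the bit positions (0:1) and the three values (0 static, 1 global, 2 extern) come from the commit message, and the rest of the `info` word is treated as opaque.

```cpp
#include <cstdint>

// Linkage values as listed in the commit message.
enum class FuncLinkage : std::uint32_t { Static = 0, Global = 1, Extern = 2 };

// Hypothetical mask covering bits 0:1 of the info word.
constexpr std::uint32_t LinkageMask = 0x3;

// Read the linkage out of an info word.
constexpr FuncLinkage funcLinkage(std::uint32_t Info) {
  return static_cast<FuncLinkage>(Info & LinkageMask);
}

// Return a copy of Info with the linkage bits replaced; other bits are kept.
constexpr std::uint32_t withFuncLinkage(std::uint32_t Info, FuncLinkage L) {
  return (Info & ~LinkageMask) | static_cast<std::uint32_t>(L);
}
```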
*	Add support for __declspec(guard(nocf))	Andrew Paverd	2020-01-10	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Avoid using the `nocf_check` attribute with Control Flow Guard. Instead, use a new `"guard_nocf"` function attribute to indicate that checks should not be added on indirect calls within that function. Add support for `__declspec(guard(nocf))` following the same syntax as MSVC. Reviewers: rnk, dmajor, pcc, hans, aaron.ballman Reviewed By: aaron.ballman Subscribers: aaron.ballman, tomrittervg, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72167
*	[PowerPC] Handle constant zero bits in BitPermutationSelector	Nemanja Ivanovic	2020-01-10	1	-4/+7
\| \| \| \| \| \| \| \| \| \|	We currently crash when analyzing an AssertZExt node that has some bits that are constant zeros (i.e. as a result of an and with a constant). This issue was reported in https://bugs.llvm.org/show_bug.cgi?id=41088 and this patch fixes that. Differential revision: https://reviews.llvm.org/D72038
*	[DebugInfo][NFC] Remove unused variable/fix variable naming	James Henderson	2020-01-10	1	-1181/+1179
\| \| \| \| \| \|	Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D72159
*	[DebugInfo] Improve error message text	James Henderson	2020-01-10	1	-3/+5
\| \| \| \| \| \| \| \| \| \|	Unlike most of our errors in the debug line parser, the "no end of sequence" message was missing any reference to which line table it referred to. This change adds the offset to this message. Reviewed by: dblaikie Differential Revision: https://reviews.llvm.org/D72443
*	AMDGPU/GlobalISel: Clamp G_ZEXT source sizes	Matt Arsenault	2020-01-10	1	-2/+3
\| \| \| \| \|	Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so fewer cases actually work.
*	[FPEnv] Invert sense of MIFlag::FPExcept flag	Ulrich Weigand	2020-01-10	7	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In D71841 we inverted the sense of the SDNode-level flag to ensure all nodes default to potentially raising FP exceptions unless otherwise specified -- i.e. if we forget to propagate the flag somewhere, the effect is now only lost performance, not incorrect code. However, the related flag at the MI level still defaults to nodes not raising FP exceptions unless otherwise specified. To be fully on the (conservatively) safe side, we should invert that flag as well. This patch does so by replacing MIFlag::FPExcept with MIFlag::NoFPExcept. (Note that this does also introduce an incompatible change in the MIR format.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D72466
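The reasoning above (a dropped flag should cost performance, not correctness) can be illustrated with a minimal sketch. The bit value and the helper names are hypothetical and do not reproduce LLVM's actual MIFlag layout; `copyWithoutFlags` stands in for any transformation that forgets to propagate flags.

```cpp
#include <cstdint>

// Hypothetical flag bit: set means "known not to raise FP exceptions".
constexpr std::uint32_t NoFPExceptBit = 1u << 0;

// With the inverted sense, an instruction with no flags set is treated
// conservatively as possibly raising an FP exception.
constexpr bool mayRaiseFPException(std::uint32_t Flags) {
  return (Flags & NoFPExceptBit) == 0;
}

// Models a pass that rebuilds an instruction and forgets to copy its flags.
constexpr std::uint32_t copyWithoutFlags(std::uint32_t /*Flags*/) { return 0; }
```

Under this scheme, losing the flag only makes the instruction look more constrained than it is, so later passes stay correct and merely optimize less.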
*	[ARM][MVE] Tail predicate VMAX,VMAXA,VMIN,VMINA	Sam Parker	2020-01-10	1	-0/+2
\| \| \| \| \| \| \|	Add the MVE min and max instructions to our tail predication whitelist. Differential Revision: https://reviews.llvm.org/D72502
*	ARMLowOverheadLoops: a few more dbg msgs to better trace rejected TP loops. NFC.	Sjoerd Meijer	2020-01-10	1	-7/+16
\|
*	Reverting, broke some bots. Need further investigation.	Diogo Sampaio	2020-01-10	7	-336/+85
\| \| \| \| \| \| \| \|	Summary: This reverts commit 8c12769f3046029e2a9b4e48e1645b1a77d28650. Reviewers: Subscribers:
*	[Support] ThreadPoolExecutor fixes for Windows/MinGW	Andrew Ng	2020-01-10	1	-18/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Changed ThreadPoolExecutor to no longer use detached threads and instead to join threads on destruction. This is to prevent intermittent crashing on Windows when doing a normal full exit, e.g. via exit(). Changed ThreadPoolExecutor to be a ManagedStatic so that it can be stopped on llvm_shutdown(). Without this, it would only be stopped in the destructor when doing a full exit. This is required to avoid intermittent crashing on Windows due to a race condition between the ThreadPoolExecutor starting up threads and the process doing a fast exit, e.g. via _exit(). The Windows crashes appear to only occur with the MSVC static runtimes and are more frequent with the debug static runtime. These changes also prevent intermittent deadlocks on exit with the MinGW runtime. Differential Revision: https://reviews.llvm.org/D70447
*	[ARM][Thumb2] Fix ADD/SUB invalid writes to SP	Diogo Sampaio	2020-01-10	7	-85/+336
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80". The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instruction was unpredictable. To enforce that the instruction t2(ADD\|SUB)ri does not write to SP we now enforce the destination register to be rGPR (which excludes PC and SP). Unlike the ARM specification, which defines one instruction that can read from SP and one that can't, here we inserted one that can't write to SP and another that can only write to SP, so as to reuse most of the hard-coded size optimizations. Performing this change uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before, so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant) It also uncovered a disassembly issue of adr.w instructions, which were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt). Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma Reviewed By: efriedma Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70680
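The constraint quoted in this commit message can be written down as a one-line check. This is only a sketch of the rule as stated (an ADD/SUB with register and immediate operands may write SP only when it also reads SP); the function name is hypothetical and this is not the register-class mechanism the patch uses.

```cpp
// ARM core register numbers used in this sketch: r12, and r13 (the SP).
constexpr unsigned R12 = 12;
constexpr unsigned SP = 13;

// Rule from the commit message: the destination may be SP only if the
// source register is also SP. Writes to other registers are not restricted
// by this particular rule.
constexpr bool isPredictableAddSubImm(unsigned Rd, unsigned Rn) {
  return Rd != SP || Rn == SP;
}
```

With this check, the offending `sub sp, r12, #80` corresponds to `isPredictableAddSubImm(SP, R12)` and is rejected, while `sub sp, sp, #80` is accepted.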
*	Fix "pointer is null" static analyzer warnings. NFCI.	Simon Pilgrim	2020-01-10	1	-0/+1
\| \| \| \|	Assert that the pointers are non-null before dereferencing them.
*	Fix Wdocumentation warning. NFCI.	Simon Pilgrim	2020-01-10	1	-2/+0
\|
*	Don't use dyn_cast_or_null if we know the pointer is nonnull.	Simon Pilgrim	2020-01-10	1	-4/+2
\| \| \| \|	Fix clang static analyzer null dereference warning by using dyn_cast instead.
*	[LV] Silence unused variable warning in Release builds. NFC.	Benjamin Kramer	2020-01-10	1	-0/+1
\|
*	[MIR] Fix cyclic dependency of MIR formatter	Peng Guo	2020-01-10	6	-24/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Move MIR formatter pointer from TargetMachine to TargetInstrInfo to avoid cyclic dependency between target & codegen. Reviewers: dsanders, bkramer, arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72485
*	[SCEV] Recognise hardware-loop intrinsic loop.decrement.reg	Sjoerd Meijer	2020-01-10	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Teach SCEV about the @loop.decrement.reg intrinsic, which has exactly the same semantics as a sub expression. This allows us to query hardware-loops, which contain this @loop.decrement.reg intrinsic, so that we can calculate iteration counts, exit values, etc. of hardware-loops. This "int_loop_decrement_reg" intrinsic is defined as "IntrNoDuplicate". Thus, while hardware-loops and tripcounts now become analysable by SCEV, this prevents the usual loop transformations from applying transformations on hardware-loops, which is what we want at this point, for which I have added test cases for loop unrolling and IndVarSimplify and LFTR. Differential Revision: https://reviews.llvm.org/D71563
*	[NFC] [PowerPC] Add isPredicable for basic instrs	Qiu Chaofan	2020-01-10	4	-34/+21
\| \| \| \| \| \| \| \| \|	PowerPC uses a dedicated method to check if the machine instr is predicable by opcode. However, there's a bit `isPredicable` in the instr definition. This patch removes the method and sets the bit only on the opcodes referenced in it. Differential Revision: https://reviews.llvm.org/D71921
*	[LV] VPValues for memory operation pointers (NFCI)	Gil Rapaport	2020-01-10	5	-104/+141
\| \| \| \| \| \| \| \| \| \| \| \|	Memory instruction widening recipes use the pointer operand of their load/store ingredient for generating the needed GEPs, making it difficult to feed these recipes with pointers based on other ingredients or none at all. This patch modifies these recipes to use a VPValue for the pointer instead, in order to reduce ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. The recipes are constructed with VPValues bound to these ingredients, maintaining current behavior. Differential revision: https://reviews.llvm.org/D70865
*	[ThinLTO] Pass CodeGenOpts like UnrollLoops/VectorizeLoop/VectorizeSLP	Wei Mi	2020-01-09	2	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	down to pass builder in ltobackend. Currently CodeGenOpts like UnrollLoops/VectorizeLoop/VectorizeSLP in clang are not passed down to pass builder in ltobackend when new pass manager is used. This is inconsistent with the behavior when new pass manager is used and thinlto is not used. This inconsistency causes the slp vectorization pass not to be enabled in ltobackend for O3 + thinlto right now. This patch fixes that. Differential Revision: https://reviews.llvm.org/D72386
*	[NFC] Style cleanup	Shengchen Kan	2020-01-10	1	-9/+10
\|
*	AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELT	Matt Arsenault	2020-01-09	5	-10/+89
\| \| \| \| \|	Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.
*	AMDGPU/GlobalISel: Fix G_EXTRACT_VECTOR_ELT mapping for s-v case	Matt Arsenault	2020-01-09	2	-16/+87
\| \| \| \| \| \|	If an SGPR vector is indexed with a VGPR, the actual indexing will be done on the SGPR and produce an SGPR. A copy needs to be inserted inside the waterfall loop to the VGPR result.
*	[AMDGPU] Fix bundle scheduling	Stanislav Mekhanoshin	2020-01-09	3	-0/+61
\| \| \| \| \| \| \|	Bundles coming to the scheduler were considered free, i.e. zero latency. This is now fixed. Differential Revision: https://reviews.llvm.org/D72487
*	AVR: Update for getRegisterByName change	Matt Arsenault	2020-01-09	2	-3/+3
\|
*	TableGen/GlobalISel: Add way for SDNodeXForm to work on timm	Matt Arsenault	2020-01-09	7	-31/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.
*	GlobalISel: Handle llvm.read_register	Matt Arsenault	2020-01-09	3	-0/+30
\| \| \| \| \|	Compared to the attempt in bdcc6d3d2638b3a2c99ab3b9bfaa9c02e584993a, this uses intermediate generic instructions.
*	DAG: Don't use unchecked dyn_cast	Matt Arsenault	2020-01-09	1	-4/+4
\|
*	GlobalISel: Fix else after return	Matt Arsenault	2020-01-09	1	-3/+9
\|
*	CodeGen: Use LLT instead of EVT in getRegisterByName	Matt Arsenault	2020-01-09	21	-26/+31
\| \| \| \| \| \|	Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.
*	[AArch64][GlobalISel] Implement selection of <2 x float> vector splat.	Amara Emerson	2020-01-09	2	-7/+36
\| \| \| \| \| \|	Also requires making G_IMPLICIT_DEF of v2s32 legal. Differential Revision: https://reviews.llvm.org/D72422
*	GlobalISel: Move getLLTForMVT/getMVTForLLT	Matt Arsenault	2020-01-09	2	-17/+17
\| \| \| \| \| \|	As an intermediate step, some TLI functions can be converted to using LLT instead of MVT. Move this somewhere out of GlobalISel so DAG functions can use these.
*	GlobalISel: Don't assert on MoreElements creating vectors	Matt Arsenault	2020-01-09	1	-5/+7
\| \| \| \| \| \| \|	If the original type was a scalar, it should be valid to add elements to turn it into a vector. Tests included with following legalization change.
*	AMDGPU/GlobalISel: Fix argument lowering for vectors of pointers	Matt Arsenault	2020-01-09	1	-2/+18
\| \| \| \| \| \| \|	When these arguments are broken down by the EVT based callbacks, the pointer information is lost. Hack around this by coercing the register types to be the expected pointer element type when building the remerge operations.
*	AMDGPU/GlobalISel: Widen 16-bit shift amount sources	Matt Arsenault	2020-01-09	1	-1/+2
\| \| \| \| \| \|	This should be legal, but will require future selection work. 16-bit shift amounts were already removed from being legal, but this didn't adjust the transformation rules.