bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[ARM] Teach the Arm cost model that a Shift can be folded into other instructions	David Green	2019-12-09	19	-54/+104
	This attempts to teach the cost model in Arm that code such as:
	    %s = shl i32 %a, 3
	    %a = and i32 %s, %b
	Can under Arm or Thumb2 become:
	    and r0, r1, r2, lsl #3
	So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instructions that the shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp.
	Differential Revision: https://reviews.llvm.org/D70966
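The folding rule in the commit above can be sketched as a toy model. This is not LLVM's API: `FOLDABLE_USERS` and `shift_cost` are hypothetical names, and the base cost of 1 is an assumption made only for illustration.

```python
# Toy model of the "free shift" rule; names and costs are illustrative only.
# The opcode set is the IR-level list given in the commit message.
FOLDABLE_USERS = {"add", "sub", "and", "or", "xor", "icmp"}


def shift_cost(users, base_cost=1):
    """Return the modelled cost of a shl, given its users as opcode names.

    The shift is treated as free only when it has exactly one user and that
    user is an instruction the shift can be folded into.
    """
    if len(users) == 1 and users[0] in FOLDABLE_USERS:
        return 0
    return base_cost


print(shift_cost(["and"]))         # single foldable user
print(shift_cost(["and", "add"]))  # more than one user: not folded
print(shift_cost(["mul"]))         # user cannot absorb the shift
```

The single-user restriction mirrors the commit's stated limit; a shift with several users would still have to be materialised for at least one of them.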
*	[ARM] Additional tests and minor formatting. NFC	David Green	2019-12-09	1	-43/+43
	This adds some extra cost model tests for shifts, and does some minor adjustments to some Neon code to make it clear as to what it applies to. Both NFC.
*	[DebugInfo] Make describeLoadedValue() reg aware	David Stenberg	2019-12-09	6	-22/+164
	Summary: Currently the describeLoadedValue() hook is assumed to describe the value of the instruction's first explicit define. The hook will not be called for instructions with more than one explicit define. This commit adds a register parameter to the describeLoadedValue() hook, and invokes the hook for all registers in the worklist. This will allow us to for example describe instructions which produce more than two parameters' values; e.g. Hexagon's various combine instructions. This also fixes situations in our downstream target where we may pass smaller parameters in the high part of a register. If such a parameter's value is produced by a larger copy instruction, we can't describe the call site value using the super-register, and we instead need to know which sub-register that should be used. This also allows us to handle cases like this:
	    $ebx = [...]
	    $rdi = MOVSX64rr32 $ebx
	    $esi = MOV32rr $edi
	    CALL64pcrel32 @call
	The hook will first be invoked for the MOV32rr instruction, which will say that @call's second parameter (passed in $esi) is described by $edi. As $edi is not preserved it will be added to the worklist. When we get to the MOVSX64rr32 instruction, we need to describe two values; the sign-extended value of $ebx -> $rdi for the first parameter, and $ebx -> $edi for the second parameter, which is now possible. This commit modifies the dbgcall-site-lea-interpretation.mir test case. In the test case, the values of some 32-bit parameters were produced with LEA64r. Perhaps we can in general cases handle such by emitting expressions that AND out the lower 32-bits, but I have not been able to land in a case where a LEA64r is used for a 32-bit parameter instead of LEA64_32 from C code.
	I have not found a case where it would be useful to describe parameters using implicit defines, so in this patch the hook is still only invoked for explicit defines of forwarding registers.
	Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: djtodoro, vsk Subscribers: ormris, hiraditya, llvm-commits Tags: #debug-info, #llvm
	Differential Revision: https://reviews.llvm.org/D70431
*	Revert "[DebugInfo] Make describeLoadedValue() reg aware"	David Stenberg	2019-12-09	6	-164/+22
	This reverts commit 3cd93a4efcdeabeb20cb7bec9fbddcb540d337a1. I'll recommit with a well-formatted arcanist commit message.
*	[DebugInfo] Make describeLoadedValue() reg aware	David Stenberg	2019-12-09	6	-22/+164
	Currently the describeLoadedValue() hook is assumed to describe the value of the instruction's first explicit define. The hook will not be called for instructions with more than one explicit define. This commit adds a register parameter to the describeLoadedValue() hook, and invokes the hook for all registers in the worklist. This will allow us to for example describe instructions which produce more than two parameters' values; e.g. Hexagon's various combine instructions. This also fixes a case in our downstream target where we may pass smaller parameters in the high part of a register. If such a parameter's value is produced by a larger copy instruction, we can't describe the call site value using the super-register, and we instead need to know which sub-register that should be used. This also allows us to handle cases like this:
	    $ebx = [...]
	    $rdi = MOVSX64rr32 $ebx
	    $esi = MOV32rr $edi
	    CALL64pcrel32 @call
	The hook will first be invoked for the MOV32rr instruction, which will say that @call's second parameter (passed in $esi) is described by $edi. As $edi is not preserved it will be added to the worklist. When we get to the MOVSX64rr32 instruction, we need to describe two values; the sign-extended value of $ebx -> $rdi for the first parameter, and $ebx -> $edi for the second parameter, which is now possible. This commit modifies the dbgcall-site-lea-interpretation.mir test case. In the test case, the values of some 32-bit parameters were produced with LEA64r. Perhaps we can in general cases handle such by emitting expressions that AND out the lower 32-bits, but I have not been able to land in a case where a LEA64r is used for a 32-bit parameter instead of LEA64_32 from C code.
	I have not found a case where it would be useful to describe parameters using implicit defines, so in this patch the hook is still only invoked for explicit defines of forwarding registers.
*	[ARM] Attempt to use whole register vmovs for MVE shuffles.	David Green	2019-12-08	1	-0/+90
	MVE doesn't have the range of shuffle instructions available in Neon. We also cannot use the trick of cutting a difficult vector shuffle in half to simplify things. Instead we need to be more careful about how we lower shuffles. This patch adds an extra combine that attempts to find "whole lane" vmovs when lowering shuffles of smaller types. This helps us make some shuffles a lot simpler, generating single lane movs for the parts that can make use of it, falling back to the original shuffle for the rest. Differential Revision: https://reviews.llvm.org/D69509
*	[ARM] Disable VLD4 under MVE	David Green	2019-12-08	1	-1/+6
	Alas, using half the available vector registers in a single instruction is just too much for the register allocator to handle. The mve-vldst4.ll test here fails when these instructions are enabled at present. This patch disables the generation of VLD4 and VST4 by adding a mve-max-interleave-factor option, which we currently default to 2. Differential Revision: https://reviews.llvm.org/D71109
*	[SystemZ] Fix build bot failures	Ulrich Weigand	2019-12-07	1	-4/+4
	My patch 9db13b5a7d43096a9ab5f7cef6e1b7e2dc9c9c63 seems to have caused some build bots to fail due to warnings that appear only when using -Wcovered-switch-default. This patch is an attempt to fix this by trying to avoid both the warning "default label in switch which covers all enumeration values" for the inner switch statements and at the same time the warning "this statement may fall through" for the outer switch statement in getVectorComparison (SystemZISelLowering.cpp).
*	[BPF] Support weak global variables for BTF	Yonghong Song	2019-12-07	1	-5/+6
	Generate types for global variables with "weak" attribute. Keep allocation scope the same for both weak and non-weak globals as ELF symbol table can determine whether a global symbol is weak or not. Differential Revision: https://reviews.llvm.org/D71162
*	[FPEnv] Constrained FCmp intrinsics	Ulrich Weigand	2019-12-07	8	-77/+283
	This adds support for constrained floating-point comparison intrinsics. Specifically, we add:
	    declare <ty2> @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>)
	    declare <ty2> @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>)
	The first variant implements an IEEE "quiet" comparison (i.e. we only get an invalid FP exception if either argument is a SNaN), while the second variant implements an IEEE "signaling" comparison (i.e. we get an invalid FP exception if either argument is any NaN). The condition code is implemented as a metadata string. The same set of predicates as for the fcmp instruction is supported (except for the "true" and "false" predicates). These new intrinsics are mapped by SelectionDAG codegen onto two new ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again representing quiet vs. signaling comparison operations. Otherwise those nodes look like SETCC nodes, with an additional chain argument and result as usual for strict FP nodes. The patch includes support for the common legalization operations for those nodes. The patch also includes full SystemZ back-end support for the new ISD nodes, mapping them to all available SystemZ instructions to fully implement strict semantics (scalar and vector).
	Differential Revision: https://reviews.llvm.org/D69281
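The quiet versus signaling distinction described above can be illustrated with a small model over raw double bit patterns. This is only a sketch: `raises_invalid` and the other helper names are hypothetical, and it assumes the common binary64 encoding in which a set top mantissa bit marks a quiet NaN.

```python
import struct

EXP_MASK = 0x7FF << 52        # all-ones exponent field of a binary64 value
MANT_MASK = (1 << 52) - 1     # 52-bit mantissa field
QUIET_BIT = 1 << 51           # assumed: top mantissa bit set means quiet NaN


def bits(x):
    """Return the 64-bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def is_nan(b):
    return (b & EXP_MASK) == EXP_MASK and (b & MANT_MASK) != 0


def is_snan(b):
    return is_nan(b) and not (b & QUIET_BIT)


def raises_invalid(a_bits, b_bits, signaling):
    """Model whether a comparison raises the invalid FP exception.

    signaling=False models the quiet variant (fcmp): only an SNaN traps.
    signaling=True models the signaling variant (fcmps): any NaN traps.
    """
    if signaling:
        return is_nan(a_bits) or is_nan(b_bits)
    return is_snan(a_bits) or is_snan(b_bits)


QNAN = 0x7FF8000000000000
SNAN = 0x7FF0000000000001
ONE = bits(1.0)

print(raises_invalid(ONE, QNAN, signaling=False))
print(raises_invalid(ONE, QNAN, signaling=True))
print(raises_invalid(ONE, SNAN, signaling=False))
```

Bit patterns are used instead of Python floats because the language gives no portable way to create or preserve a signaling NaN as a float value.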
*	[PowerPC] Fix MI peephole optimization for splats	Kai Luo	2019-12-07	1	-11/+20
	Summary: This patch fixes an issue where the PPC MI peephole optimization pass incorrectly removes a vector swap. Specifically, the pass can combine a splat/swap to a splat/copy. It uses `TargetRegisterInfo::lookThruCopyLike` to determine that the operands to the splat are the same. However, the current logic only compares the operands based on register numbers. In the case where the splat operands are ultimately fed from the same physical register, the pass can incorrectly remove a swap if the feed register for one of the operands has been clobbered. This patch adds a check to ensure that the registers feeding are both virtual registers or the operands to the splat or swap are both the same register. Here is an example in pseudo-MIR of what happens in the test case added in this patch:
	Before PPC MI peephole optimization:
	```
	%arg = XVADDDP %0, %1
	$f1 = COPY %arg.sub_64
	call double rint(double)
	%res.first = COPY $f1
	%vec.res.first = SUBREG_TO_REG 1, %res.first, %subreg.sub_64
	%arg.swapped = XXPERMDI %arg, %arg, 2
	$f1 = COPY %arg.swapped.sub_64
	call double rint(double)
	%res.second = COPY $f1
	%vec.res.second = SUBREG_TO_REG 1, %res.second, %subreg.sub_64
	%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
	%vec.res = XXPERMDI %vec.res.splat, %vec.res.splat, 2
	; %vec.res == [ %vec.res.second[0], %vec.res.first[0] ]
	```
	After optimization:
	```
	; ...
	%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
	; lookThruCopyLike(%vec.res.first) == lookThruCopyLike(%vec.res.second) == $f1
	; so the pass replaces the swap with a copy:
	%vec.res = COPY %vec.res.splat
	; %vec.res == [ %vec.res.first[0], %vec.res.second[0] ]
	```
	As best as I can tell, this has occurred since r288152, which added support for lowering certain vector operations to direct moves in the form of a splat. Committed for vddvss (Colin Samples).
	Thanks Colin for the patch! Differential Revision: https://reviews.llvm.org/D69497
*	[AArch64][GlobalISel] Add missing default statement to a switch in the selector.	Amara Emerson	2019-12-06	1	-0/+3
*	Move variable only used in an assert into the assert itself.	Sterling Augustine	2019-12-06	1	-2/+1
	This prevents unused variable warnings from breaking the build.
*	[AArch64][GlobalISel] Add support for selection of vector G_SHL with immediates.	Amara Emerson	2019-12-06	1	-5/+71
	Only implemented for the type combinations already supported for G_SHL. Differential Revision: https://reviews.llvm.org/D71153
*	[WebAssebmly][MC] Support .import_name/.import_field asm directives	Sam Clegg	2019-12-06	1	-0/+24
	Convert the MC test to use asm rather than bitcode. This is a precursor to https://reviews.llvm.org/D70520. Differential Revision: https://reviews.llvm.org/D70877
*	[X86] Fix prolog/epilog mismatch for stack protectors on win32-macho.	Amara Emerson	2019-12-06	1	-1/+1
	The xor'ing behaviour is only used for msvc/crt environments; when we're targeting macho, the guard load code doesn't know about the xor in the epilog. Disable xor'ing when targeting win32-macho to be consistent. Differential Revision: https://reviews.llvm.org/D71095
*	[TargetLowering] Fix another potential FPE in expandFP_TO_UINT	Craig Topper	2019-12-06	1	-13/+11
	D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:
	    // Sel = Src < 0x8000000000000000
	    // Val = select Sel, Src, Src - 0x8000000000000000
	    // Ofs = select Sel, 0, 0x8000000000000000
	    // Result = fp_to_sint(Val) ^ Ofs
	The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.) Instead, I'd suggest to use the following sequence:
	    // Sel = Src < 0x8000000000000000
	    // FltOfs = select Sel, 0, 0x8000000000000000
	    // IntOfs = select Sel, 0, 0x8000000000000000
	    // Result = fp_to_sint(Val - FltOfs) ^ IntOfs
	In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway). In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit. There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)
	Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper.
	Differential Revision: https://reviews.llvm.org/D67105
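The replacement sequence from the commit above can be written out as a minimal Python sketch for a 64-bit result. The function names are hypothetical, `src` plays the role of the value the commit calls Src/Val, and Python cannot observe FP exception flags, so this only checks the computed values, not the absence of Inexact.

```python
SIGN = 1 << 63          # 0x8000000000000000
MASK = (1 << 64) - 1    # keep results in the unsigned 64-bit range


def fp_to_sint64(x):
    """Truncate toward zero; the input must fit in the signed 64-bit range."""
    v = int(x)
    assert -SIGN <= v < SIGN
    return v


def fp_to_uint64(src):
    """Model of the new expansion: subtract a selected offset, then xor it back."""
    sel = src < float(SIGN)
    flt_ofs = 0.0 if sel else float(SIGN)   # FltOfs = select Sel, 0, 2^63
    int_ofs = 0 if sel else SIGN            # IntOfs = select Sel, 0, 2^63
    return (fp_to_sint64(src - flt_ofs) ^ int_ofs) & MASK


print(fp_to_uint64(3.0))
print(fp_to_uint64(float(2**63 + 2048)))
print(fp_to_uint64(1e-300))
```

For a small positive input the subtraction is `src - 0.0`, which is exact; in the old sequence the unconditional `Src - 2^63` was the step that could round.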
*	[X86] Don't setup and teardown memory for a musttail call	Reid Kleckner	2019-12-06	1	-2/+2
	Summary: musttail calls should not require allocating extra stack for arguments. Updates to arguments passed in memory should happen in place before the epilogue. This bug was mostly a missed optimization, unless inalloca was used and store to push conversion fired. If a reserved call frame was used for an inalloca musttail call, the call setup and teardown instructions would be deleted, and SP adjustments would be inserted in the prologue and epilogue. You can see these are removed from several test cases in this change. In the case where the stack frame was not reserved, i.e. call frame optimization fires and turns argument stores into pushes, then the imbalanced call frame setup instructions created for inalloca calls become a problem. They remain in the instruction stream, resulting in a call setup that allocates zero bytes (expected for inalloca), and a call teardown that deallocates the inalloca pack. This deallocation was unbalanced, leading to subsequent crashes. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71097
*	Revert "[PGO][PGSO] Instrument the code gen / target passes."	Hiroshi Yamauchi	2019-12-06	3	-52/+2
	This reverts commit 9a0b5e14075a1f42a72eedb66fd4fde7985d37ac. This seems to break buildbots.
*	Revert "ARM-Darwin: keep the frame register reserved even if not updated."	Alina Sbirlea	2019-12-06	1	-1/+1
	This reverts commit a7d90af1be48234ce583e00fb16e33633d44ae38. This revision came back as the root-cause for crashes in internal ARM-IOS apps. Reproducer in https://bugs.llvm.org/show_bug.cgi?id=44231.
*	[x86] add cost model special-case for insert/extract from element 0	Sanjay Patel	2019-12-06	1	-3/+9
	This is a follow-up to D70607 where we made any extract element on SLM more costly than default. But that is pessimistic for extract from element 0 because that corresponds to x86 movd/movq instructions. These generally have >1 cycle latency, but they are probably implemented as single uop instructions. Note that no vectorization tests are affected by this change. Also, no targets besides SLM are affected because those are falling through to the default cost of 1 anyway. But this will become visible/important if we add more specializations via cost tables. Differential Revision: https://reviews.llvm.org/D71023
*	[PGO][PGSO] Instrument the code gen / target passes.	Hiroshi Yamauchi	2019-12-06	3	-2/+52
	Summary: Split off of D67120. Add the profile guided size optimization instrumentation / queries in the code gen or target passes. This doesn't enable the size optimizations in those passes yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass queries). Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71072
*	[ARM][MVE] Fix copy-paste error in VQSHL instruction ids.	Simon Tatham	2019-12-06	1	-6/+6
	Summary: The immediate forms of the MVE VQSHL instruction have MC names like `MVE_VSLIimms8` and `MVE_VSLIimmu32`. Those names are confusing, because VSLI is a completely different shift instruction with no semantic relation to VQSHL. But it just happens to be defined immediately before VQSHL in `ARMInstrMVE.td`, so this looks like a copy-paste error. Renamed the ids to match the instruction name. Reviewers: ostannard, dmgreen, MarkMurrayARM, miyuki Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71114
*	[AArch64] Fix a bug with jump table generation	Cullen Rhodes	2019-12-06	2	-4/+27
	Summary: When trying to calculate the offsets for the jump table entries we fail to take into account the block alignment, which could be greater than 4 bytes. This led to cases where the jump table offset was too big to fit in a byte. Reviewers: t.p.northover, sdesmalen, ostannard Reviewed By: ostannard Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits Committed on behalf of David Sherwood (david-arm) Tags: #llvm Differential Revision: https://reviews.llvm.org/D70533
*	[AArch64][SVE2] Implement while comparison intrinsics	Cullen Rhodes	2019-12-06	1	-10/+9
	Summary: Adds the following intrinsics:
	  * whilege, whilegt, whilehi, whilehs
	Reviewers: sdesmalen, rovka, dancgr, efriedma, rengolin, huntergr Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70909
*	[AArch64][SVE] Implement integer compare intrinsics	Cullen Rhodes	2019-12-06	3	-34/+168
	Summary: Adds intrinsics for the following:
	  * cmphs, cmphi
	  * cmpge, cmpgt
	  * cmpeq, cmpne
	  * cmplt, cmple
	  * cmplo, cmpls
	Includes a minor change to `TLI.getMemValueType` that fixes a crash due to the scalable flag being dropped. Reviewers: sdesmalen, efriedma, rengolin, rovka, dancgr, huntergr Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70889
*	[FPEnv][SelectionDAG] Relax chain requirements	Ulrich Weigand	2019-12-06	1	-8/+8
	This patch implements the following changes:
	1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats each constrained intrinsic like a global barrier (e.g. a function call) and fully serializes all pending chains. This is actually not required; it is allowed for constrained intrinsics to be reordered w.r.t one another or (nonvolatile) memory accesses. The MI-level scheduler already allows for that flexibility, so it makes sense to allow it at the DAG level as well. This patch therefore changes the way chains for constrained intrinsics are created, and handles them basically like load operations are handled. This has the effect that constrained intrinsics are no longer serialized against one another or (nonvolatile) loads. They are still serialized against stores, but that seems hard to change with the current DAG chain setup, and it also doesn't seem to be a big problem preventing DAG
	2) The OPC_CheckFoldableChainNode check requires that each of the intermediate nodes in a multi-node pattern match only has a single use. This check tends to fail if those intermediate nodes are strict operations as those have a chain output that typically indeed has another use. However, we don't really need to consider chains here at all, since they will all be rewritten anyway by UpdateChains later. Other parts of the matcher therefore already ignore chains, but this hasOneUse check doesn't. This patch replaces hasOneUse by a custom test that verifies there is no more than one use of any non-chain output value. In theory, this change could affect code unrelated to strict FP nodes, but at least on SystemZ I could not find any single instance of that happening.
	3) The SystemZ back-end currently does not allow matching multiply-and-extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for strict FP operations.
	This was not possible in the past due to the problems described under 1) and 2) above. With those issues fixed, it is now possible to fully support those instructions in strict mode as well, and this patch does so. Differential Revision: https://reviews.llvm.org/D70913
*	[X86] Make X86TargetLowering::BuildFILD return a std::pair of SDValues so we explicitly return the chain instead of calling getValue on the single SDValue.	Craig Topper	2019-12-05	2	-11/+12
	We shouldn't assume that the returned result can be used to get the other result. This is prep-work for strict FP where we will also need to pass the chain result along in more cases.
*	Add strict fp support for instructions fadd/fsub/fmul/fdiv	Liu, Chen3	2019-12-06	4	-32/+42
	Differential Revision: https://reviews.llvm.org/D68757
*	[AIX] Make sure to use QualNames for external global objects	David Tenty	2019-12-05	1	-12/+14
	Summary: Previously we only handled the case where the csect hadn't been set up yet, so we'd hit an assert later on. Reviewers: jasonliu, DiggerLin, stevewan Reviewed By: jasonliu Subscribers: hubert.reinterpretcast, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71032
*	[X86] Remove ProcIntelGLM/ProcIntelGLP/ProcIntelTRM and replace them with a single feature flag covers the two places they were used.	Craig Topper	2019-12-05	4	-23/+15
	Differential Revision: https://reviews.llvm.org/D71048
*	[AArch64] Fix MUL/SUB fusing	Sanne Wouda	2019-12-05	1	-20/+90
	Summary: When MUL is the first operand to SUB, we can't use MLS because the accumulator should be negated. Emit a NEG of the accumulator and an MLA instead, similar to what we do for FMUL / FSUB fusing. Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf, mstorsjo, asbirlea Reviewed By: asbirlea Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71067
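The arithmetic behind this fix can be checked with a short sketch using 64-bit wraparound integers. The helpers below are illustrative models of the instruction semantics (MLA computes acc + a*b, MLS computes acc - a*b), not code from the patch.

```python
MASK = (1 << 64) - 1  # model 64-bit wraparound arithmetic


def neg(x):
    return (-x) & MASK


def mla(acc, a, b):
    """Multiply-accumulate: acc + a * b."""
    return (acc + a * b) & MASK


def mls(acc, a, b):
    """Multiply-subtract: acc - a * b."""
    return (acc - a * b) & MASK


def mul_then_sub(a, b, c):
    """The pattern being fused: SUB(MUL(a, b), c) = a * b - c."""
    return (a * b - c) & MASK


a, b, c = 3, 5, 7
print(mul_then_sub(a, b, c) == mla(neg(c), a, b))  # NEG + MLA matches
print(mul_then_sub(a, b, c) == mls(c, a, b))       # MLS has the wrong sign
```

MLS is only the right choice when the multiply is the second operand of the subtraction, that is, for c - a*b.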
*	[AArch64][SVE] Integer reduction instructions pattern/intrinsics.	Danilo Carvalho Grael	2019-12-05	5	-14/+106
	Added pattern matching/intrinsics for the following SVE instructions: -- saddv, uaddv -- smaxv, sminv, umaxv, uminv -- orv, eorv, andv
*	[AArch64][SVE] Implement element count intrinsics	Cullen Rhodes	2019-12-05	2	-7/+16
	Summary: Adds intrinsics for the following: * cntb * cnth * cntw * cntd * cntp Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma, rovka Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70967
*	[MCRegInfo] Add forward sub and super register iterators. (NFC)	Florian Hahn	2019-12-05	2	-26/+18
	This patch adds forward iterators mc_difflist_iterator, mc_subreg_iterator and mc_superreg_iterator, based on the existing DiffListIterator. Those are used to provide iterator ranges over sub- and super-register from TRI, which are slightly more convenient than the existing MCSubRegIterator/MCSuperRegIterator. Unfortunately, it duplicates a bit of functionality, but the new iterators are a bit more convenient (and can be used with various existing iterator utilities) and should probably replace the old iterators in the future. This patch updates some existing users. Reviewers: evandro, qcolombet, paquette, MatzeB, arsenm Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D70565
*	Fix the macro fusion table for X86 according to Intel optimization	Shengchen Kan	2019-12-05	2	-171/+254
	manual and add function isMacroFused Differential Revision: https://reviews.llvm.org/D70999
*	[AArch64][SVE] Add intrinsics and patterns for logical predicate instructions	Danilo Carvalho Grael	2019-12-04	2	-17/+27
	Add intrinsics and patterns for the following logical predicate instructions: -- and, ands, bic, bics, eor, eors -- sel -- orr, orrs, orn, orns, nor, nors, nand, nands
*	[X86] Remove override of shouldUseStrictFP_TO_INT for fp80. NFC	Craig Topper	2019-12-04	2	-9/+0
	I suspect this became unnecessary after r354161. Prior to that we may have been going through the default expansion of FP_TO_UINT on 64-bit targets and then ending up back in Custom X86 handling to handle the FP_TO_SINT for it. Now we just Custom handle the FP_TO_UINT directly. We already need to handle it for 32-bit mode during type legalization so we wouldn't save any code by using the default expansion on 64-bit.
*	Reland [AArch64][MachineOutliner] Return address signing for outlined functions	David Tellenbach	2019-12-05	1	-8/+288
	Summary: Reland after fixing an ASan failure by stopping outlining early if the constraints for return address signing removed too many outlining candidates. During AArch64 frame lowering instructions to enable return address signing are inserted into functions if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions. This patch introduces the following changes:
	1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions.
	2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions.
	3. If an outlining candidate would outline instructions that modify sp in a way that invalidates return address signing, outlining is disabled for that particular candidate.
	4. If all candidate functions agree on the signing scope, signing key and their support for v8.3 features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions.
	Reviewers: ostannard, paquette Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70635
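Rules 1 to 3 above amount to a filter over the outlining candidates, which can be sketched as follows. The `Candidate` fields and the `outlinable` function are hypothetical names chosen for illustration; the real outliner works on machine functions and instructions, not on records like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical summary of one outlining candidate's enclosing function.
    sign_scope: str                     # return address signing scope
    sign_key: str                       # return address signing key
    has_v8_3a: bool                     # function supports v8.3A features
    modifies_sp_unsafely: bool = False  # would invalidate return address signing


def outlinable(candidates):
    """Return the candidates that may still be outlined together."""
    # Rule 1: every function must agree on signing scope and key.
    if len({(c.sign_scope, c.sign_key) for c in candidates}) > 1:
        return []
    # Rule 2: every function must agree on v8.3A support.
    if len({c.has_v8_3a for c in candidates}) > 1:
        return []
    # Rule 3: drop only the candidates that modify sp unsafely.
    return [c for c in candidates if not c.modifies_sp_unsafely]


a = Candidate("non-leaf", "a_key", True)
b = Candidate("non-leaf", "b_key", True)
c = Candidate("non-leaf", "a_key", True, modifies_sp_unsafely=True)
print(len(outlinable([a, b])))  # keys disagree
print(len(outlinable([a, c])))  # only the sp-modifying candidate is dropped
```

Rule 4 is the consequence of the first two checks passing: the outlined function can then take the shared scope, key and v8.3A setting of its callers.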
*	Revert "Reland [AArch64][MachineOutliner] Return address signing for outlined functions"	Sterling Augustine	2019-12-04	1	-284/+8
	This reverts commit 02760b750b2ffcc0e2f5d78ecb137c80930c42c3. The original commit is not asan clean. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/37147/steps/check-llvm%20asan/logs/stdio
*	[X86] Add missing break to the end of the last case in a switch. NFC	Craig Topper	2019-12-04	1	-0/+1
*	Add support for lowering 32-bit/64-bit pointers	Amy Huang	2019-12-04	3	-4/+70
	Summary: This follows a previous patch that changes the X86 datalayout to represent mixed size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces (https://reviews.llvm.org/D64931). This patch implements the address space cast lowering to the corresponding sign extension, zero extension, or truncate instructions. Related to https://bugs.llvm.org/show_bug.cgi?id=42359 Reviewers: rnk, craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69639
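The three lowerings named above correspond to simple bit operations on pointer values, sketched here with plain integers. The function names are hypothetical and the sketch says nothing about which address space number maps to which pointer kind.

```python
LOW32 = 0xFFFFFFFF


def sext32_to_64(p):
    """Cast a 32-bit sign-extended pointer to 64 bits."""
    p &= LOW32
    return p | 0xFFFFFFFF00000000 if p & 0x80000000 else p


def zext32_to_64(p):
    """Cast a 32-bit zero-extended pointer to 64 bits."""
    return p & LOW32


def trunc64_to_32(p):
    """Cast a 64-bit pointer down to 32 bits."""
    return p & LOW32


print(hex(sext32_to_64(0x80000000)))
print(hex(zext32_to_64(0x80000000)))
print(hex(trunc64_to_32(0x123456789)))
```

The two extensions only differ for addresses with the top bit of the 32-bit value set, which is why the datalayout needs separate address spaces to tell them apart.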
*	Reland [AArch64][MachineOutliner] Return address signing for outlined functions	David Tellenbach	2019-12-04	1	-8/+284
	Summary: Reland after fixing a bug that allowed outlining of SP modifying instructions that invalidated return address signing. During AArch64 frame lowering instructions to enable return address signing are inserted into functions if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions. This patch introduces the following changes:
	1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions.
	2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions.
	3. If an outlining candidate would outline instructions that modify sp in a way that invalidates return address signing, outlining is disabled for that particular candidate.
	4. If all candidate functions agree on the signing scope, signing key and their support for v8.3 features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions.
	Reviewers: ostannard, paquette Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70635
*	[SVE][AArch64] Adding patterns for while intrinsics.	Mikhail Gudim	2019-12-04	2	-28/+40
*	[XCOFF][AIX] Emit TOC entries for object file generation	jasonliu	2019-12-04	2	-1/+6
	Summary: Implement emitTCEntry for PPCTargetXCOFFStreamer. Add TC csects to TOCCsects for object file writing. Note:
	1. I did not include any raw data testing for this object file generation because TC entries raw data will all be 0 without relocation implemented. I will add raw data testing as part of relocation testing later.
	2. I removed "Symbol->setFragment(F);" for common symbols because we don't need it, and if we have it then we would hit assertions below: Assertion `(SymbolContents == SymContentsUnset || SymbolContents == SymContentsOffset) && "Cannot get offset for a common/variable symbol"' failed.
	3. Fixed incorrect TOC-base alignment.
	Differential Revision: https://reviews.llvm.org/D70798
*	[ARM][MVE][Intrinsics] Add VMULH/VRMULH intrinsics.	Mark Murray	2019-12-04	1	-14/+40
	Summary: Add MVE VMULH/VRMULH intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70948
*	[AArch64][SVE] Implement reversal intrinsics	Cullen Rhodes	2019-12-04	2	-8/+37
	Summary: Adds intrinsics for the following: * rbit * revb * revh * revw Patterns are also defined to map the 'llvm.bswap.*' intrinsic to the SVE revb instruction. Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma, rovka Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70960
*	[AMDGPU][MC] Remove duplicate code introduced in r359316.	Jay Foad	2019-12-04	1	-9/+0
*	Allow negative offsets in MipsMCInstLower::LowerOperand	Alex Richardson	2019-12-04	2	-7/+5
	Summary: We rely on this in our CHERI backend to address the GOT by generating $pc-relative addresses. For this we emit the following code sequence:
	    lui $1, %pcrel_hi(_CHERI_CAPABILITY_TABLE_-8)
	    daddiu $1, $1, %pcrel_lo(_CHERI_CAPABILITY_TABLE_-4)
	    cgetpccincoffset $c1, $1
	However, without this change the addend is implicitly converted to UINT32_MAX and an invalid pointer value is generated.
	Reviewers: atanasyan Reviewed By: atanasyan Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70953
*	Handle BUNDLE instructions in MipsAsmPrinter	Alex Richardson	2019-12-04	1	-0/+4
	Summary: In our CHERI fork we use BUNDLE instructions to ensure that a three-instruction sequence to generate a program-counter-relative value is emitted without reordering or insertions (since that would break the 32-bit offset computation). Currently MipsAsmPrinter asserts when it encounters a pseudo instruction. To handle BUNDLE we can simply skip the instruction, which will then make EmitInstruction() process the contents of the bundle in order. Reviewers: atanasyan Reviewed By: atanasyan Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70945