bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[Alignment][NFC] Use Align for TargetFrameLowering/Subtarget	Guillaume Chatelet	2019-10-17	33	-83/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68993 llvm-svn: 375084
*	[gicombiner] Add the run-time rule disable option	Daniel Sanders	2019-10-17	2	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Each generated helper can be configured to generate an option that disables rules in that helper. This can be used to bisect rulesets. The disable bits are stored in a SparseVector as this is very cheap for the common case where nothing is disabled. It gets more expensive the more rules are disabled but you're generally doing that for debug purposes where performance is less of a concern. Depends on D68426 Reviewers: volkan, bogner Reviewed By: volkan Subscribers: hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68438 llvm-svn: 375067
*	[GISel][CombinerHelper] Add concat_vectors(build_vector, build_vector) => ↵	Quentin Colombet	2019-10-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	build_vector Teach the combiner helper how to flatten concat_vectors of build_vectors into a build_vector. Add this combine as part of AArch64 pre-legalizer combiner. Differential Revision: https://reviews.llvm.org/D69071 llvm-svn: 375066
*	[gicombiner] Hoist pure C++ combine into the tablegen definition	Daniel Sanders	2019-10-16	2	-8/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is just moving the existing C++ code around and will be NFC w.r.t AArch64. Renamed 'CombineBr' to something more descriptive ('ElideByByInvertingCond') at the same time. The remaining combines in AArch64PreLegalizeCombiner require features that aren't implemented at this point and will be hoisted as they are added. Depends on D68424 Reviewers: bogner, volkan Subscribers: kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68426 llvm-svn: 375057
*	[AArch64] Fix offset calculation	Shoaib Meenai	2019-10-16	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r374772 changed Offset to be an int64_t but left NewOffset as an int. Scale is unsigned, so in the calculation `Offset - NewOffset * Scale`, `NewOffset * Scale` was promoted to unsigned and was then zero-extended to 64 bits, leading to an incorrect computation which manifested as an out-of-memory when building the Swift standard library for Android aarch64. Promote NewOffset to int64_t to fix this, and promote EmittableOffset as well, since its one user passes it to a function which takes an int64_t anyway. Test case based on a suggestion by Sander de Smalen! Differential Revision: https://reviews.llvm.org/D69018 llvm-svn: 375043
*	GlobalISel: Implement lower for G_SADDO/G_SSUBO	Matt Arsenault	2019-10-16	2	-3/+4
\| \| \| \| \| \| \|	Port directly from SelectionDAG, minus the path using ISD::SADDSAT/ISD::SSUBSAT. llvm-svn: 375042
*	[AMDGPU] Do not combine dpp mov reading physregs	Stanislav Mekhanoshin	2019-10-16	1	-0/+6
\| \| \| \| \| \| \| \|	We cannot be sure physregs will stay unchanged. Differential Revision: https://reviews.llvm.org/D69065 llvm-svn: 375033
*	[AMDGPU] Do not combine dpp with physreg def	Stanislav Mekhanoshin	2019-10-16	1	-0/+4
\| \| \| \| \| \| \| \|	We will remove dpp mov along with the physreg def otherwise. Differential Revision: https://reviews.llvm.org/D69063 llvm-svn: 375030
*	[AMDGPU] Supress unused sdwa insts generation	Stanislav Mekhanoshin	2019-10-16	3	-25/+55
\| \| \| \| \| \| \| \| \|	Do not generate non-existing sdwa instructions. It reduces the number of generated instructions by 185. Differential Revision: https://reviews.llvm.org/D69010 llvm-svn: 375016
*	[AArch64,Assembler] Compiler support for ID_MMFR5_EL1	Mark Murray	2019-10-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add read-only system register ID_MMFR5_EL1 and unit tests. Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69039 llvm-svn: 375010
*	bpf: fix wrong truncation elimination when there is back-edge/loop	Jiong Wang	2019-10-16	4	-167/+205
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, BPF backend is doing truncation elimination. If one truncation is performed on a value defined by narrow loads, then it could be redundant given BPF loads zero extend the destination register implicitly. When the definition of the truncated value is a merging value (PHI node) that could come from different code paths, then checks need to be done on all possible code paths. Above described optimization was introduced as r306685, however it doesn't work when there is back-edge, for example when loop is used inside BPF code. For example for the following code, a zero-extended value should be stored into b[i], but the "and reg, 0xffff" is wrongly eliminated which then generates corrupted data. void cal1(unsigned short a, unsigned long b, unsigned int k) { unsigned short e; e = *a; for (unsigned int i = 0; i < k; i++) { b[i] = e; e = ~e; } } The reason is r306685 was trying to do the PHI node checks inside isel DAG2DAG phase, and the checks are done on MachineInstr. This is actually wrong, because MachineInstr is being built during isel phase and the associated information is not completed yet. A quick search shows none target other than BPF is access MachineInstr info during isel phase. For an PHI node, when you reached it during isel phase, it may have all predecessors linked, but not successors. It seems successors are linked to PHI node only when doing SelectionDAGISel::FinishBasicBlock and this happens later than PreprocessISelDAG hook. Previously, BPF program doesn't allow loop, there is probably the reason why this bug was not exposed. This patch therefore fixes the bug by the following approach: - The existing truncation elimination code and the associated "load_to_vreg_" records are removed. - Instead, implement truncation elimination using MachineSSA pass, this is where all information are built, and keep the pass together with other similar peephole optimizations inside BPFMIPeephole.cpp. Redundant move elimination logic is updated accordingly. - Unit testcase included + no compilation errors for kernel BPF selftest. Patch Review === Patch was sent to and reviewed by BPF community at: https://lore.kernel.org/bpf Reported-by: David Beckett <david.beckett@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 375007
*	[RISCV] Add MachineInstr immediate verification	Luis Marques	2019-10-16	6	-4/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch implements the `TargetInstrInfo::verifyInstruction` hook for RISC-V. Currently the hook verifies the machine instruction's immediate operands, to check if the immediates are within the expected bounds. Without the hook invalid immediates are not detected except when doing assembly parsing, so they are silently emitted (including being truncated when emitting object code). The bounds information is specified in tablegen by using the `OperandType` definition, which sets the `MCOperandInfo`'s `OperandType` field. Several RISC-V-specific immediate operand types were created, which extend the `MCInstrDesc`'s `OperandType` `enum`. To have the hook called with `llc` pass it the `-verify-machineinstrs` option. For Clang add the cmake build config `-DLLVM_ENABLE_EXPENSIVE_CHECKS=True`, or temporarily patch `TargetPassConfig::addVerifyPass`. Review concerns: - The patch adds immediate operand type checks that cover at least the base ISA. There are several other operand types for the C extension and one type for the F/D extensions that were left out of this initial patch because they introduced further design concerns that I felt were best evaluated separately. - Invalid register classes (e.g. passing a GPR register where a GPRC is expected) are already caught, so were not included. - This design makes the more abstract `MachineInstr` verification depend on MC layer definitions, which arguably is not the cleanest design, but is in line with how things are done in other parts of the target and LLVM in general. - There is some duplication of logic already present in the `MCOperandPredicate`s. Since the `MachineInstr` and `MCInstr` notions of immediates are fundamentally different, this is currently necessary. Reviewers: asb, lenary Reviewed By: lenary Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67397 llvm-svn: 375006
*	[AMDGPU] Fix-up cases where writelane has 2 SGPR operands	David Stuttard	2019-10-16	2	-0/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Even though writelane doesn't have the same constraints as other valu instructions it still can't violate the >1 SGPR operand constraint Due to later register propagation (e.g. fixing up vgpr operands via readfirstlane) changing writelane to only have a single SGPR is tricky. This implementation puts a new check after SIFixSGPRCopies that prevents multiple SGPRs being used in any writelane instructions. The algorithm used is to check for trivial copy prop of suitable constants into one of the SGPR operands and perform that if possible. If this isn't possible put an explicit copy of Src1 SGPR into M0 and use that instead (this is allowable for writelane as the constraint is for SGPR read-port and not constant-bus access). Reviewers: rampitec, tpr, arsenm, nhaehnle Reviewed By: rampitec, arsenm, nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, mgorny, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D51932 Change-Id: Ic7553fa57440f208d4dbc4794fc24345d7e0e9ea llvm-svn: 375004
*	[ARM] Add a register class for GPR pairs without SP and use it. NFCI	Mikhail Maltsev	2019-10-16	2	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently Thumb2InstrInfo.cpp uses a register class which is auto-generated by tablegen. Such approach is fragile because auto-generated classes might change when other register classes are added. For example, before https://reviews.llvm.org/D62667 we were using GPRPair_with_gsub_1_in_rGPRRegClass, but had to change it to GPRPair_with_gsub_1_in_GPRwithAPSRnospRegClass because the former class stopped being generated (this did not change the functionality though). This patch adds a register class consisting of even-odd GPR register pairs from (R0, R1) to (R10, R11), which excludes (R12, SP) and uses it in Thumb2InstrInfo.cpp instead of GPRPair_with_gsub_1_in_GPRwithAPSRnospRegClass. Reviewers: ostannard, simon_tatham, dmgreen, efriedma Reviewed By: simon_tatham Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69026 llvm-svn: 374990
*	[AMDGPU] Extend the SI Load/Store optimizer	Piotr Sobczak	2019-10-16	1	-13/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle different flavours of image_load and image_sample instructions. When the instructions of the same subclass differ only in dmask, merge them and update dmask accordingly. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64911 llvm-svn: 374984
*	[ARM][ParallelDSP] Change smlad insertion order	Sam Parker	2019-10-16	1	-16/+51
\| \| \| \| \| \| \| \| \| \|	Instead of inserting everything after the 'root' of the reduction, insert all instructions as close to their operands as possible. This can help reduce register pressure. Differential Revision: https://reviews.llvm.org/D67392 llvm-svn: 374981
*	AMDGPU: Fix infinite searches in SIFixSGPRCopies	Austin Kerbow	2019-10-15	2	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Two conditions could lead to infinite loops when processing PHI nodes in SIFixSGPRCopies. The first condition involves a REG_SEQUENCE that uses registers defined by both a PHI and a COPY. The second condition arises when a physical register is copied to a virtual register which is then used in a PHI node. If the same virtual register is copied to the same physical register, the result is an endless loop. %0:sgpr_64 = COPY $sgpr0_sgpr1 %2 = PHI %0, %bb.0, %1, %bb.1 $sgpr0_sgpr1 = COPY %0 Reviewers: alex-t, rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68970 llvm-svn: 374944
*	Added support for "#pragma clang section relro=<name>"	Dmitry Mikulin	2019-10-15	1	-0/+1
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D68806 llvm-svn: 374934
*	[WebAssembly] Allow multivalue types in block signature operands	Thomas Lively	2019-10-15	9	-93/+135
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Renames `ExprType` to the more apt `BlockType` and adds a variant for multivalue blocks. Currently non-void blocks are only generated at the end of functions where the block return type needs to agree with the function return type, and that remains true for multivalue blocks. That invariant means that the actual signature does not need to be stored in the block signature `MachineOperand` because it can be inferred by `WebAssemblyMCInstLower` from the return type of the parent function. `WebAssemblyMCInstLower` continues to lower block signature operands to immediates when possible but lowers multivalue signatures to function type symbols. The AsmParser and Disassembler are updated to handle multivalue block types as well. Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68889 llvm-svn: 374933
*	[X86] combineX86ShufflesRecursively - split the getTargetShuffleInputs call ↵	Simon Pilgrim	2019-10-15	1	-3/+9
\| \| \| \| \| \| \| \| \| \|	from the resolveTargetShuffleAndZeroables call. Exposes an issue in getFauxShuffleMask where the OR(SHUFFLE,SHUFFLE) decode should always resolve zero/undef elements. Part of the fix for PR43024 where ideally we shouldn't call resolveTargetShuffleAndZeroables for Depth == 0 llvm-svn: 374928
*	[X86] Make memcmp() use PTEST if possible and also enable AVX1	David Zarzycki	2019-10-15	1	-17/+46
\| \| \| \|	llvm-svn: 374922
*	[AMDGPU] Support mov dpp with 64 bit operands	Stanislav Mekhanoshin	2019-10-15	4	-0/+103
\| \| \| \| \| \| \| \| \| \|	We define mov/update dpp intrinsics as overloaded but do not support i64, which is a practically useful type. Fix the selection and lowering. Differential Revision: https://reviews.llvm.org/D68673 llvm-svn: 374910
*	[AMDGPU] Allow DPP combiner to work with REG_SEQUENCE	Stanislav Mekhanoshin	2019-10-15	1	-5/+54
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D68828 llvm-svn: 374908
*	[Alignment][NFC] Value::getPointerAlignment returns MaybeAlign	Guillaume Chatelet	2019-10-15	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68398 llvm-svn: 374889
*	[ARM][MVE] validForTailPredication insts	Sam Parker	2019-10-15	3	-17/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reverse the logic for valid tail predication instructions and create a whitelist instead. Added other instruction groups that aren't obviously safe: - instructions that 'narrow' their result. - lane moves. - byte swapping instructions. - interleaving loads and stores. - cross-beat carries. - top/bottom instructions. - complex operations. Hopefully we should be able to add more of these instructions to the whitelist, once we have a more concrete idea of the transform. Differential Revision: https://reviews.llvm.org/D67904 llvm-svn: 374887
*	[Alignment] Migrate Attribute::getWith(Stack)Alignment	Guillaume Chatelet	2019-10-15	9	-39/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Reviewed By: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68792 llvm-svn: 374884
*	[Alignment][NFC] Remove dependency on GlobalObject::setAlignment(unsigned)	Guillaume Chatelet	2019-10-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, mehdi_amini, jvesely, nhaehnle, hiraditya, steven_wu, dexonsmith, dang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68944 llvm-svn: 374880
*	[X86] Resolve KnownUndef/KnownZero bits into target shuffle masks in helper. ↵	Simon Pilgrim	2019-10-15	1	-9/+18
\| \| \| \| \| \|	NFCI. llvm-svn: 374878
*	[MIPS GlobalISel] Add MSA registers to fprb. Select vector load, store	Petar Avramovic	2019-10-15	4	-26/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add vector MSA register classes to fprb, they are 128 bit wide. MSA instructions use the same registers for both integer and floating point operations. Therefore we only need to check for vector element size during legalization or instruction selection. Add helper function in MipsLegalizerInfo and switch to legalIf LegalizeRuleSet to keep legalization rules compact since they depend on MipsSubtarget and presence of MSA. fprb is assigned to all vector operands. Move selectLoadStoreOpCode to MipsInstructionSelector in order to reduce number of arguments. Differential Revision: https://reviews.llvm.org/D68867 llvm-svn: 374872
*	[MIPS GlobalISel] Refactor MipsRegisterBankInfo [NFC]	Petar Avramovic	2019-10-15	2	-153/+127
\| \| \| \| \| \| \| \| \| \|	Check if size of operand LLT matches sizes of available register banks before inspecting the opcode in order to reduce number of checks. Factor commonly used pieces of code into functions. Differential Revision: https://reviews.llvm.org/D68866 llvm-svn: 374870
*	[X86] Don't check for VBROADCAST_LOAD being a user of the source of a ↵	Craig Topper	2019-10-15	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VBROADCAST when trying to share broadcasts. The only things VBROADCAST_LOAD uses is an address and a chain node. It has no vector inputs. So if its a user of the source of another broadcast that could only mean one of two things. The other broadcast is broadcasting the address of the broadcast_load. Or the source is a load and the use we're seeing is the chain result from that load. Neither of these cases make sense to combine here. This issue was reported post-commit r373871. Test case has not been reduced yet. llvm-svn: 374862
*	[RISCV] Support fast calling convention	Shiva Chen	2019-10-15	1	-2/+67
\| \| \| \| \| \| \| \| \| \| \| \|	LLVM may annotate the function with fastcc if there has only one caller and there're no other caller out of the module and the function is not naked or contain variable arguments. The fastcc functions could pass the arguments by the caller saved registers. Differential Revision: https://reviews.llvm.org/D68559 llvm-svn: 374857
*	[WebAssembly] Trapping fptoint builtins and intrinsics	Thomas Lively	2019-10-15	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The WebAssembly backend lowers fptoint instructions to a code sequence that checks for overflow to avoid traps because fptoint is supposed to be speculatable. These new builtins and intrinsics give users a way to depend on the trapping semantics of the underlying instructions and avoid the extra code generated normally. Patch by coffee and tlively. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68902 llvm-svn: 374856
*	[X86] Teach X86MCodeEmitter to properly encode zmm16-zmm31 as index register ↵	Craig Topper	2019-10-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	to vgatherpf/vscatterpf. We need to encode bit 4 into the EVEX.V' bit. We do this right for regular gather/scatter which use either MRMSrcMem or MRMDestMem formats. The prefetches use MRM*m formats. Fixes an issue recently added to PR36202. llvm-svn: 374849
*	[ARM][AsmParser] handles offset expression in parentheses	Jian Cai	2019-10-14	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Integrated assembler does not accept offset expressions surrounded by parenthesis. Handle this case for GAS compability. https://bugs.llvm.org/show_bug.cgi?id=43631 Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68764 llvm-svn: 374832
*	AMDGPU: Fix redundant setting of m0 for atomic load/store	Matt Arsenault	2019-10-14	1	-10/+7
\| \| \| \| \| \| \|	Atomic load/store would have their setting of m0 handled twice, which happened to be optimized out later. llvm-svn: 374801
*	[NVPTX] Restructure shfl instrinsics and add variants that return a predicate.	Artem Belevich	2019-10-14	2	-111/+63
\| \| \| \| \| \| \| \| \|	Also, amend constraints for non-sync variants that are no longer available on sm_70+ with PTX6.4+. Differential Revision: https://reviews.llvm.org/D68892 llvm-svn: 374790
*	[CostModel][X86] Add CTLZ scalar costs	Simon Pilgrim	2019-10-14	1	-1/+22
\| \| \| \| \| \| \| \|	Add specific scalar costs for CTLZ instructions, we can't discriminate between CTLZ and CTLZ_ZERO_UNDEF so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets we might still be underestimating it. For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs. llvm-svn: 374786
*	[ARM] Selection for MVE VMOVN	David Green	2019-10-14	3	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The adds both VMOVNt and VMOVNb instruction selection from the appropriate shuffles. We detect shuffle masks of the form: 0, N, 2, N+2, 4, N+4, ... or 0, N+1, 2, N+3, 4, N+5, ... ISel will also try the opposite patterns, with inputs reversed. These are selected to VMOVNt and VMOVNb respectively. Differential Revision: https://reviews.llvm.org/D68283 llvm-svn: 374781
*	[CostModel][X86] Add CTPOP scalar costs (PR43656)	Simon Pilgrim	2019-10-14	1	-0/+23
\| \| \| \| \| \| \| \|	Add specific scalar costs for ctpop instructions, these are based on the llvm-mca's SLM throughput numbers (the oldest model we have). For targets supporting POPCNT, we provide overrides that assume 1cy costs. llvm-svn: 374775
*	[AArch64] Stackframe accesses to SVE objects.	Sander de Smalen	2019-10-14	4	-41/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Materialize accesses to SVE frame objects from SP or FP, whichever is available and beneficial. This patch still assumes the objects are pre-allocated. The automatic layout of SVE objects within the stackframe will be added in a separate patch. Reviewers: greened, cameron.mcinally, efriedma, rengolin, thegameg, rovka Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D67749 llvm-svn: 374772
*	[AMDGPU] Come back patch for the 'Assign register class for cross block ↵	Alexander Timofeev	2019-10-14	4	-125/+208
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	values according to the divergence.' Detailed description: After https://reviews.llvm.org/D59990 submit several issues were discovered. Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly. Discovered issues were addressed in the following commits: https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731 This change brings back AMDGPU specific changes. Reviewed by: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D68635 llvm-svn: 374767
*	[X86][BtVer2] Improved latency and throughput of float/vector loads and stores.	Andrea Di Biagio	2019-10-14	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces the following changes to the btver2 scheduling model: - The number of micro opcodes for YMM loads and stores is now 2 (it was incorrectly set to 1 for both aligned and misaligned loads/stores). - Increased the number of AGU resource cycles for YMM loads and stores to 2cy (instead of 1cy). - Removed JFPU01 and JFPX from the list of resources consumed by pure float/vector loads (no MMX). I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe. Those are dispatched to the FPU but not really issues on JFPU01. Differential Revision: https://reviews.llvm.org/D68871 llvm-svn: 374765
*	[NFC][TTI] Add Alignment for isLegalMasked[Load/Store]	Sam Parker	2019-10-14	4	-10/+13
\| \| \| \| \| \| \| \| \|	Add an extra parameter so the backend can take the alignment into consideration. Differential Revision: https://reviews.llvm.org/D68400 llvm-svn: 374763
*	[X86] Teach EmitTest to handle ISD::SSUBO/USUBO in order to use the Z flag ↵	Craig Topper	2019-10-14	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	from the subtract directly during isel. This prevents isel from emitting a TEST instruction that optimizeCompareInstr will need to remove later. In some of the modified tests, the SUB gets duplicated due to the flags being needed in two places and being clobbered in between. optimizeCompareInstr was able to optimize away the TEST that was using the result of one of them, but optimizeCompareInstr doesn't know to turn SUB into CMP after removing the TEST. It only knows how to turn SUB into CMP if the result was already dead. With this change the TEST never exists, so optimizeCompareInstr doesn't have to remove it. Then it can just turn the SUB into CMP immediately. Fixes PR43649. llvm-svn: 374755
*	[X86] getTargetShuffleInputs - Control KnownUndef mask element resolution as ↵	Simon Pilgrim	2019-10-13	1	-11/+11
\| \| \| \| \| \| \| \|	well as KnownZero. We were already controlling whether the KnownZero elements were being written to the target mask, this extends it to the KnownUndef elements as well so we can prevent the target shuffle mask being manipulated at all. llvm-svn: 374732
*	[X86] Enable use of avx512 saturating truncate instructions in more cases.	Craig Topper	2019-10-13	1	-35/+42
\| \| \| \| \| \| \| \| \| \|	This enables use of the saturating truncate instructions when the result type is less than 128 bits. It also enables the use of saturating truncate instructions on KNL when the input is less than 512 bits. We can do this by widening the input and then extracting the result. llvm-svn: 374731
*	[X86] SimplifyMultipleUseDemandedBitsForTargetNode - use ↵	Simon Pilgrim	2019-10-13	1	-12/+11
\| \| \| \| \| \|	getTargetShuffleInputs with KnownUndef/Zero results. llvm-svn: 374725
*	[X86] getTargetShuffleInputs - add KnownUndef/Zero output support	Simon Pilgrim	2019-10-13	1	-25/+25
\| \| \| \| \| \|	Adjust SimplifyDemandedVectorEltsForTargetNode to use the known elts masks instead of recomputing it locally. llvm-svn: 374724
*	[X86] Add a one use check on the setcc to the min/max canonicalization code ↵	Craig Topper	2019-10-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in combineSelect. This seems to improve std::midpoint code where we have a min and a max with the same condition. If we split the setcc we can end up with two compares if the one of the operands is a constant. Since we aggressively canonicalize compares with constants. For non-constants it can interfere with our ability to share control flow if we need to expand cmovs into control flow. I'm also not sure I understand this min/max canonicalization code. The motivating case talks about comparing with 0. But we don't check for 0 explicitly. Removes one instruction from the codegen for PR43658. llvm-svn: 374706