path: root/llvm/lib
...
* [Loop Peeling] Fix the handling of branch weights of peeled off branches. (Serguei Katkov, 2019-07-22, 1 file, -62/+41)

  The current algorithm for updating the branch weights of the latch block and its copies is based on the assumption that the number of peeled iterations is approximately equal to the trip count. However, that is not correct: according to the profitability check, in one case we can decide to peel because it helps to reduce the number of phi nodes, and then the number of peeled iterations can be less than the estimated trip count.

  This patch introduces another way to set the branch weights of peeled off branches. Let F be the weight of the edge from the latch to the header and E be the weight of the edge from the latch to the exit. Then F/(F+E) is the probability of continuing the loop and E/(F+E) is the probability of exiting, so the estimated trip count is TC = F / E. For the I-th peeled iteration (counting from 0) we set the weights of the peeled latch to (TC - I, 1). This gives a reasonable distribution: the probability of exiting, 1/(TC - I), increases, while the estimated trip count of the remaining loop decreases by I. As a result, after peeling off N iterations the weights are (F - N * E, E) and the trip count of the loop becomes F / E - N, i.e. TC - N.

  The idea is taken from Philip's review of D63918.

  Reviewers: reames, mkuper, iajbar, fhahn
  Reviewed By: reames
  Subscribers: hiraditya, zzheng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D64235
  llvm-svn: 366665

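  A minimal sketch of the weight computation described above (illustrative helper names, not the actual LLVM code; assumes E != 0):

  ```cpp
  #include <cstdint>
  #include <utility>

  // Weights {to header, to exit} for the I-th (0-based) peeled latch,
  // given the original latch weights F (latch->header) and E (latch->exit).
  // The estimated trip count is TC = F / E, so the I-th peeled iteration
  // exits with probability 1 / (TC - I).
  std::pair<uint64_t, uint64_t> peeledLatchWeights(uint64_t F, uint64_t E,
                                                   uint64_t I) {
    uint64_t TC = F / E; // estimated trip count
    return {TC > I ? TC - I : 1, 1};
  }

  // After peeling N iterations the remaining latch keeps weights
  // (F - N * E, E), so its estimated trip count drops to TC - N.
  ```
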
* [X86] SimplifyDemandedVectorEltsForTargetNode - Move SUBV_BROADCAST narrowing handling. NFCI. (Simon Pilgrim, 2019-07-21, 1 file, -19/+13)

  Move the narrowing of SUBV_BROADCAST to where we handle all the other opcodes.

  llvm-svn: 366660

* [InstCombine] Update comment I missed in r366649. NFC (Craig Topper, 2019-07-21, 1 file, -1/+1)

  llvm-svn: 366658

* [GISel]: Attach missing range metadata while translating G_LOADs (Aditya Nandakumar, 2019-07-21, 1 file, -2/+3)

  Attach range information to G_LOAD when only defining one register.

  Reviewed by: arsenm
  Differential Revision: https://reviews.llvm.org/D65048
  llvm-svn: 366656

* [InstCombine] Remove insertRangeTest code that handles the equality case. (Craig Topper, 2019-07-21, 1 file, -4/+2)

  For equality, the function called getTrue/getFalse with the VT of the comparison input, but getTrue/getFalse need the boolean VT, so if this code were ever executed it would assert. I believe these cases are removed by InstSimplify, so we never get here. This patch fixes up an assert to exclude the equality possibility and removes the broken code.

  llvm-svn: 366649

* [InstCombine] Don't use AddOne/SubOne to see if two APInts are 1 apart. Use APInt operations instead. NFCI (Craig Topper, 2019-07-21, 1 file, -5/+9)

  AddOne/SubOne create new Constant objects. That seems heavy for comparing ConstantInts, which wrap APInts. Just do the math on the APInts and compare them.

  llvm-svn: 366648

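  A hedged sketch of the comparison this describes (`areOneApart` is an illustrative helper, not the patch's code):

  ```cpp
  #include "llvm/ADT/APInt.h"
  using llvm::APInt;

  // Check whether B == A + 1 directly on the APInt values (both must
  // have the same bit width); unlike AddOne(A), this allocates no new
  // Constant objects.
  bool areOneApart(const APInt &A, const APInt &B) {
    return A + 1 == B;
  }
  ```
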
* [Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements (Roman Lebedev, 2019-07-20, 1 file, -35/+132)

  Summary: Four things here:

  1. Generalize the fold to handle non-splat divisors. Reasonably trivial.

  2. Unban power-of-two divisors. I don't see any reason why they should be illegal:
     * There is no ban in Hacker's Delight.
     * I think the ban came from the same bug that caused the miscompile in the base patch: in `floor((2^W - 1) / D)` we were dividing by `D0` instead of `D`, and we **were** ensuring that `D0` is not `1`, which made sense.

  3. Unban `1` divisors. I no longer believe Hacker's Delight actually says that the fold is invalid for `D = 1`. Further considerations:
     * We know that `(X u% 1) == 0` can be constant-folded to `1` and `(X u% 1) != 0` can be constant-folded to `0`.
     * Also, we know that `X u<= -1` can be constant-folded to `1` and `X u> -1` can be constant-folded to `0`.
     * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p
     * We know we will end up with `(setule/setugt (rotr (mul N, P), K), Q)`. Therefore, for the given new DAG nodes and comparison predicates (`ule`/`ugt`), we will still produce the correct answer if `Q` is an all-ones constant and both `P` and `K` are *anything* other than `undef`. The fold will indeed produce `Q = all-ones`.

  4. Try to re-splat the `P` and `K` vectors; we don't care about their values for the lanes where the divisor was `1`.

  Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00
  Reviewed By: RKSimon
  Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63963
  llvm-svn: 366637

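  As a concrete illustration of the underlying Hacker's Delight fold, here is a worked example for D = 6; the constants are derived below, not taken from the patch:

  ```cpp
  #include <cassert>
  #include <cstdint>

  // X u% 6 == 0 without a division: 6 = 3 * 2^1, so take
  // P = multiplicative inverse of 3 mod 2^32, K = 1 (trailing zeros of 6),
  // Q = floor((2^32 - 1) / 6), and test rotr(X * P, K) u<= Q.
  bool divisibleBy6(uint32_t X) {
    const uint32_t P = 0xAAAAAAABu; // 3 * P == 1 (mod 2^32)
    const uint32_t Q = 0x2AAAAAAAu; // floor((2^32 - 1) / 6)
    uint32_t M = X * P;
    uint32_t R = (M >> 1) | (M << 31); // rotate right by K = 1
    return R <= Q;
  }

  int main() {
    for (uint32_t X = 0; X < 100000; ++X)
      assert(divisibleBy6(X) == (X % 6 == 0));
  }
  ```
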
* [X86][SSE] Use PSADBW to improve vXi8 sum reduction (PR42674) (Simon Pilgrim, 2019-07-20, 1 file, -7/+38)

  As detailed in PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result.

  llvm-svn: 366636

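  A rough intrinsics-level sketch of the idea (a hypothetical helper summing a full 16-byte vector; the patch itself operates on <8 x i8> during reduction):

  ```cpp
  #include <immintrin.h>
  #include <cstdint>

  // Sum 16 bytes using PSADBW against zero: the instruction produces one
  // partial sum per 8-byte half; add the halves and truncate to i8, which
  // matches the wrap-around semantics of an i8 sum.
  uint8_t sumBytes(__m128i V) {
    __m128i Sad = _mm_sad_epu8(V, _mm_setzero_si128());
    __m128i Hi = _mm_unpackhi_epi64(Sad, Sad);
    __m128i Sum = _mm_add_epi64(Sad, Hi);
    return (uint8_t)_mm_cvtsi128_si32(Sum);
  }
  ```
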
* [Local] Zap blockaddress without users in ConstantFoldTerminator. (Florian Hahn, 2019-07-20, 1 file, -0/+6)

  If the blockaddress is not destroyed, the destination block will still be marked as having its address taken, limiting further transformations. I think there are other places where dead blockaddress constants are kept around; I'll look into that as a follow-up.

  Reviewers: craig.topper, brzycki, davide
  Reviewed By: brzycki, davide
  Differential Revision: https://reviews.llvm.org/D64936
  llvm-svn: 366633

* [GlobalISel][AArch64] Contract trivial same-size cross-bank copies into G_STOREs (Jessica Paquette, 2019-07-20, 1 file, -0/+49)

  Sometimes you can end up with cross-bank copies between same-sized GPRs and FPRs which feed into G_STOREs. When these copies feed only into stores, they aren't necessary; we can just store using the original register bank. This provides some minor code size savings for some floating-point SPEC benchmarks (around 0.2% for 453.povray and 450.soplex).

  This issue doesn't seem to show up due to regbankselect or anything similar, so this patch introduces an early selection function, `contractCrossBankCopyIntoStore`, which performs the contraction when possible. The selector then continues normally and selects the correct store opcode, eliminating needless copies along the way.

  Differential Revision: https://reviews.llvm.org/D65024
  llvm-svn: 366625

* [WebAssembly] Compute and export TLS block alignment (Guanzhong Chen, 2019-07-19, 2 files, -1/+11)

  Summary: Add the immutable WASM global `__tls_align`, which stores the alignment requirements of the TLS segment. Add a `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang.

  The expected usage has now changed to:

    __wasm_init_tls(memalign(__builtin_wasm_tls_align(), __builtin_wasm_tls_size()));

  Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton
  Reviewed By: tlively
  Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D65028
  llvm-svn: 366624

* AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces (Matt Arsenault, 2019-07-19, 1 file, -1/+3)

  llvm-svn: 366621

* [AMDGPU] Autogenerate register sequences in tuples (Stanislav Mekhanoshin, 2019-07-19, 1 file, -272/+47)

  Differential Revision: https://reviews.llvm.org/D65007
  llvm-svn: 366619

* [AMDGPU] Fixed occupancy calculation for gfx10 (Stanislav Mekhanoshin, 2019-07-19, 4 files, -28/+19)

  Differential Revision: https://reviews.llvm.org/D65010
  llvm-svn: 366616

* AMDGPU: Avoid custom predicates for stores with glue (Matt Arsenault, 2019-07-19, 1 file, -18/+24)

  llvm-svn: 366613

* AMDGPU: Redefine setcc condition PatLeafs (Matt Arsenault, 2019-07-19, 3 files, -67/+36)

  Avoid using custom code predicates.

  llvm-svn: 366609

* AMDGPU: Don't rely on m0 being -1 for GWS offsets (Matt Arsenault, 2019-07-19, 1 file, -4/+6)

  This only works if the high bits of m0 are also 0, so m0 would have to be set to 0xffff.

  llvm-svn: 366608

* AMDGPU: Force s_waitcnt after GWS instructions (Matt Arsenault, 2019-07-19, 4 files, -5/+26)

  This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt.

  llvm-svn: 366607

* LiveIntervals: Fix handleMove asserting on BUNDLE (Matt Arsenault, 2019-07-19, 1 file, -1/+4)

  The top-level BUNDLE instruction should behave as an ordinary instruction: it is supposed to have all relevant registers as implicit operands, so moving it should work like moving any other instruction. I believe the assert was intended to avoid moving instructions inside bundles.

  llvm-svn: 366605

* Revert "Use the MachineBasicBlock symbol for a callbr target"Nick Desaulniers2019-07-191-7/+2
| | | | | | | | | | | This reverts commit r366523/ccbffefccaff42b0d094c9ef0f49fc3e8c8456ea. Two regressions were immediately reported: - https://github.com/ClangBuiltLinux/linux/issues/614 - https://github.com/ClangBuiltLinux/linux/issues/615 Reported-by: nathanchance llvm-svn: 366600
* [AMDGPU] Allow register tuples to set asm names (Stanislav Mekhanoshin, 2019-07-19, 4 files, -139/+99)

  This change reverts most of the previous register name generation. The real problem is that RegisterTuple does not generate asm names. This adds an optional operand to RegisterTuple; that way we can simplify register name access and dramatically reduce the size of static tables for the backend.

  Differential Revision: https://reviews.llvm.org/D64967
  llvm-svn: 366598

* AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads (Matt Arsenault, 2019-07-19, 1 file, -1/+1)

  The DAG lowering sets dereferenceable and invariant, not nontemporal.

  llvm-svn: 366597

* AMDGPU/GlobalISel: Selection for fminnum/fmaxnum (Matt Arsenault, 2019-07-19, 1 file, -2/+4)

  The v2f16 case doesn't work yet because the VOP3P complex patterns haven't been ported.

  llvm-svn: 366585

* AMDGPU/GlobalISel: Support arguments with multiple registers (Matt Arsenault, 2019-07-19, 2 files, -30/+47)

  Handles structs used directly in argument lists.

  llvm-svn: 366584

* AMDGPU/GlobalISel: Rewrite lowerFormalArguments (Matt Arsenault, 2019-07-19, 4 files, -200/+374)

  This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this.

  This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway.

  llvm-svn: 366582

* AMDGPU: Decompose all values to 32-bit pieces for calling conventions (Matt Arsenault, 2019-07-19, 3 files, -92/+18)

  This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in GlobalISel.

  llvm-svn: 366578

* DAG: Handle dbg_value for arguments split into multiple subregs (Matt Arsenault, 2019-07-19, 1 file, -23/+52)

  This was handled previously for arguments split due to not fitting in an MVT. This was dropping the register for argument registers split due to TLI::getRegisterTypeForCallingConv.

  llvm-svn: 366574

* [AMDGPU][MC] Corrected parsing of branch offsets (Dmitry Preobrazhensky, 2019-07-19, 1 file, -20/+43)

  See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D64629
  llvm-svn: 366571

* [MachineCSE][MachinePRE] Avoid hoisting code from code regions into hot BBs. (Kai Luo, 2019-07-19, 1 file, -0/+25)

  Summary: Current PRE hoists common computations into CMBB = DT->findNearestCommonDominator(MBB, MBB1). However, if CMBB is in a hot loop body, we might get performance degradation.

  Differential Revision: https://reviews.llvm.org/D64394
  llvm-svn: 366570

* [X86] For split stack, don't save/restore the nested arg if unused (Than McIntosh, 2019-07-19, 1 file, -1/+1)

  Summary: For split-stack, if the nested argument (i.e. R10) is not used, there is no need to save/restore it in the prologue.

  Reviewers: thanm
  Reviewed By: thanm
  Subscribers: mstorsjo, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64673
  llvm-svn: 366569

* Don't update NoTrappingFPMath and FPDenormalMode in resetTargetOptions (Oliver Stannard, 2019-07-19, 1 file, -12/+0)

  We'd like to remove this whole function, because these are properties of functions, not the target as a whole. These two are easy to remove because they are only used for emitting ARM build attributes, which expects them to represent the defaults for the whole module, not just the last function generated.

  This is needed to get correct build attributes when using IPRA on ARM, because IPRA causes resetTargetOptions to get called before ARMAsmPrinter::emitAttributes.

  Differential revision: https://reviews.llvm.org/D64929
  llvm-svn: 366562

* [IPRA] Don't rely on non-exact function definitions (Oliver Stannard, 2019-07-19, 1 file, -1/+5)

  If a function definition is not exact, then the linker could select a differently-compiled version of it, which could use different registers.

  https://reviews.llvm.org/D64909
  llvm-svn: 366557

* [ARM] Add <saturate> operand to SQRSHRL and UQRSHLL (Mikhail Maltsev, 2019-07-19, 6 files, -11/+69)

  Summary: According to the new Armv8-M specification (https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf), the instructions SQRSHRL and UQRSHLL now have an additional immediate operand <saturate>. The new assembly syntax is:

    SQRSHRL<c> RdaLo, RdaHi, #<saturate>, Rm
    UQRSHLL<c> RdaLo, RdaHi, #<saturate>, Rm

  where <saturate> can be either 64 (the existing behavior) or 48, in which case the result is saturated to 48 bits. The new operand is encoded as follows:

    #64  encoded as sat = 0
    #48  encoded as sat = 1

  sat is bit 7 of the instruction bit pattern.

  This patch adds a new assembler operand class MveSaturateOperand which implements parsing and encoding. Decoding is implemented in DecodeMVEOverlappingLongShift.

  Reviewers: ostannard, simon_tatham, t.p.northover, samparker, dmgreen, SjoerdMeijer
  Reviewed By: simon_tatham
  Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64810
  llvm-svn: 366555

* [sanitizers] Use covering ObjectFormatType switches (Hubert Tong, 2019-07-19, 1 file, -1/+6)

  Summary: This patch removes the `default` case from some switches on `llvm::Triple::ObjectFormatType` and adds cases for the missing enumerators (`UnknownObjectFormat`, `Wasm`, and `XCOFF`).

  For `UnknownObjectFormat`, the effect of the action for the `default` case is maintained; otherwise, where `llvm_unreachable` was called, `report_fatal_error` is used instead. Where the `default` case returned a default value, `report_fatal_error` is used for XCOFF as a placeholder. For `Wasm`, the effect of the action for the `default` case is maintained. The code is structured to avoid strongly implying that the `Wasm` case is present for any reason other than to make the switch cover all `ObjectFormatType` enumerator values.

  Reviewers: sfertile, jasonliu, daltenty
  Reviewed By: sfertile
  Subscribers: hiraditya, aheejin, sunfish, llvm-commits, cfe-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D64222
  llvm-svn: 366544

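  A minimal sketch of the covering-switch style this describes (the enumerators are real `ObjectFormatType` values of that era; the function and its return strings are hypothetical):

  ```cpp
  #include "llvm/ADT/Triple.h"
  #include "llvm/Support/ErrorHandling.h"

  // No `default:` case, so -Wswitch flags this switch whenever a new
  // ObjectFormatType enumerator is added.
  const char *describeFormat(llvm::Triple::ObjectFormatType F) {
    switch (F) {
    case llvm::Triple::UnknownObjectFormat:
      return "unknown"; // preserve the old default behavior
    case llvm::Triple::COFF:
      return "COFF";
    case llvm::Triple::ELF:
      return "ELF";
    case llvm::Triple::MachO:
      return "Mach-O";
    case llvm::Triple::Wasm:
      return "Wasm";
    case llvm::Triple::XCOFF:
      llvm::report_fatal_error("XCOFF not yet supported"); // placeholder
    }
    llvm_unreachable("fully covered switch over ObjectFormatType");
  }
  ```
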
* [AMDGPU] Simplify the exclusive scan used for optimized atomics (Jay Foad, 2019-07-19, 1 file, -10/+8)

  Summary: Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3 and then doing a 3-way ADD, because:

  1. It simplifies the compiler a little.
  2. It minimizes vgpr pressure because each instruction is now of the form vn = vn + vn << c.
  3. It is more friendly to the DPP combiner, which currently can't combine into an ADD3 instruction.

  Because of #2 and #3 the end result is improved from this:

    v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
    v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
    v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf
    v_add3_u32 v1, v4, v5, v1
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf

  To this:

    v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf

  I.e. two fewer computational instructions, one extra nop where we could schedule something else.

  Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64411
  llvm-svn: 366543

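  For intuition, a scalar analogue of the power-of-two scan structure used above (a sketch of the Hillis-Steele idea, not the GPU implementation):

  ```cpp
  #include <cstdint>
  #include <vector>

  // In-place inclusive scan using only power-of-two "shifts": on each
  // step every lane adds the value from `Shift` lanes back, mirroring
  // vn = vn + (vn << c) across the wavefront.
  void inclusiveScan(std::vector<uint32_t> &Lanes) {
    for (size_t Shift = 1; Shift < Lanes.size(); Shift *= 2)
      for (size_t I = Lanes.size(); I-- > Shift;)
        Lanes[I] += Lanes[I - Shift]; // descending order reads only
                                      // values not yet updated this step
  }
  ```
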
* [Loop Peeling] Enable peeling of multiple exits by default. (Serguei Katkov, 2019-07-19, 1 file, -1/+1)

  Enable, by default, loop peeling with multiple exits where all non-latch exits end up with deopt.

  Reviewers: reames, fhahn
  Reviewed By: reames
  Subscribers: xbolva00, hiraditya, zzheng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D64619
  llvm-svn: 366542

* [InstCombine] Dropping redundant masking before left-shift [5/5] (PR42563) (Roman Lebedev, 2019-07-19, 1 file, -4/+5)

  Summary: If we have some pattern that leaves only some low bits set and then left-shifts those bits, and none of the bits that remain after the final shift are modified by the mask, we can omit the mask. There are many variants of this pattern; this patch handles:

  f. `((x << MaskShAmt) a>> MaskShAmt) << ShiftShAmt`

  which can be simplified to just `x << ShiftShAmt` iff `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`). Normally the inner pattern is a sign-extend, but for our purposes it's no different from the other patterns. Alive proof: https://rise4fun.com/Alive/7U3

  For now let's start with patterns where both shift amounts are variable, with a trivial constant "offset" between them, since I believe this is both the simplest to handle and the most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases.

  https://bugs.llvm.org/show_bug.cgi?id=42563
  Differential Revision: https://reviews.llvm.org/D64524
  llvm-svn: 366540

* [InstCombine] Dropping redundant masking before left-shift [4/5] (PR42563) (Roman Lebedev, 2019-07-19, 1 file, -3/+6)

  Summary: If we have some pattern that leaves only some low bits set and then left-shifts those bits, and none of the bits that remain after the final shift are modified by the mask, we can omit the mask. There are many variants of this pattern; this patch handles:

  e. `((x << MaskShAmt) l>> MaskShAmt) << ShiftShAmt`

  which can be simplified to just `x << ShiftShAmt` iff `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`). Alive proof: https://rise4fun.com/Alive/0FT

  For now let's start with patterns where both shift amounts are variable, with a trivial constant "offset" between them, since I believe this is both the simplest to handle and the most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases.

  https://bugs.llvm.org/show_bug.cgi?id=42563
  Differential Revision: https://reviews.llvm.org/D64521
  llvm-svn: 366539

* [InstCombine] Dropping redundant masking before left-shift [3/5] (PR42563) (Roman Lebedev, 2019-07-19, 1 file, -2/+6)

  Summary: If we have some pattern that leaves only some low bits set and then left-shifts those bits, and none of the bits that remain after the final shift are modified by the mask, we can omit the mask. There are many variants of this pattern; this patch handles:

  d. `(x & ((-1 << MaskShAmt) >> MaskShAmt)) << ShiftShAmt`

  which can be simplified to just `x << ShiftShAmt` iff `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`). Alive proof: https://rise4fun.com/Alive/I5Y

  For now let's start with patterns where both shift amounts are variable, with a trivial constant "offset" between them, since I believe this is both the simplest to handle and the most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases.

  https://bugs.llvm.org/show_bug.cgi?id=42563
  Differential Revision: https://reviews.llvm.org/D64519
  llvm-svn: 366538

* [InstCombine] Dropping redundant masking before left-shift [2/5] (PR42563) (Roman Lebedev, 2019-07-19, 1 file, -16/+32)

  Summary: If we have some pattern that leaves only some low bits set and then left-shifts those bits, and none of the bits that remain after the final shift are modified by the mask, we can omit the mask. There are many variants of this pattern; this patch handles:

  c. `(x & (-1 >> MaskShAmt)) << ShiftShAmt`

  which can be simplified to just `x << ShiftShAmt` iff `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`). Alive proof: https://rise4fun.com/Alive/RgJh

  For now let's start with patterns where both shift amounts are variable, with a trivial constant "offset" between them, since I believe this is both the simplest to handle and the most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases.

  https://bugs.llvm.org/show_bug.cgi?id=42563
  Differential Revision: https://reviews.llvm.org/D64517
  llvm-svn: 366537

* [InstCombine] Dropping redundant masking before left-shift [1/5] (PR42563) (Roman Lebedev, 2019-07-19, 1 file, -2/+5)

  Summary: If we have some pattern that leaves only some low bits set and then left-shifts those bits, and none of the bits that remain after the final shift are modified by the mask, we can omit the mask. There are many variants of this pattern; this patch handles:

  b. `(x & (~(-1 << maskNbits))) << shiftNbits`

  which can be simplified to just `x << ShiftShAmt` iff `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`. Alive proof: https://rise4fun.com/Alive/y8M

  For now let's start with patterns where both shift amounts are variable, with a trivial constant "offset" between them, since I believe this is both the simplest to handle and the most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases.

  https://bugs.llvm.org/show_bug.cgi?id=42563
  Differential Revision: https://reviews.llvm.org/D64514
  llvm-svn: 366536

* [InstCombine] Dropping redundant masking before left-shift [0/5] (PR42563) (Roman Lebedev, 2019-07-19, 1 file, -0/+50)

  Summary: If we have some pattern that leaves only some low bits set and then left-shifts those bits, and none of the bits that remain after the final shift are modified by the mask, we can omit the mask. There are many variants of this pattern; this patch handles:

  a. `(x & ((1 << MaskShAmt) - 1)) << ShiftShAmt`

  which can be simplified to just `x << ShiftShAmt` iff `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`. Alive proof: https://rise4fun.com/Alive/wi9 (a worked instance follows below)

  Indeed, not all of these patterns are canonical. But since this fold will only produce a single instruction I'm really interested in handling even uncanonical patterns, since I have this general kind of pattern in hotpaths, and it is not totally outlandish for bit-twiddling code.

  For now let's start with patterns where both shift amounts are variable, with a trivial constant "offset" between them, since I believe this is both the simplest to handle and the most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases.

  https://bugs.llvm.org/show_bug.cgi?id=42563

  Reviewers: spatel, nikic, huihuiz, xbolva00
  Reviewed By: xbolva00
  Subscribers: efriedma, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64512
  llvm-svn: 366535

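  To see why variant (a) holds, a small self-contained demonstration (`masked` and the shift amounts are illustrative, not from the patch; when MaskShAmt + ShiftShAmt >= bitwidth, every bit cleared by the mask is shifted out anyway):

  ```cpp
  #include <cassert>
  #include <cstdint>

  // Variant (a): (x & ((1 << MaskShAmt) - 1)) << ShiftShAmt.
  uint32_t masked(uint32_t X, unsigned MaskShAmt, unsigned ShiftShAmt) {
    return (X & ((1u << MaskShAmt) - 1)) << ShiftShAmt;
  }

  int main() {
    // MaskShAmt + ShiftShAmt == 32 == bitwidth(x), so the mask is redundant
    // and the expression equals the bare shift x << ShiftShAmt.
    unsigned MaskShAmt = 12, ShiftShAmt = 20;
    for (uint32_t X : {0u, 1u, 0xDEADBEEFu, 0xFFFFFFFFu})
      assert(masked(X, MaskShAmt, ShiftShAmt) == X << ShiftShAmt);
  }
  ```
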
* [DebugInfo] Some fields do not need relocations even when relaxation is enabled. (Hsiangkai Wang, 2019-07-19, 1 file, -2/+19)

  In debug frame information, some fields, e.g., Length in CIE/FDE and Offset in FDE, are attributes that describe the structure of the CIE/FDE; they are not related to the relaxed code. However, these attributes are symbol differences, so in the current design they are filled as zero and LLVM generates relocations for them.

  We only need to generate relocations for symbols in executable sections, so if the symbols are not located in executable sections, we can still evaluate their values under relaxation.

  Differential Revision: https://reviews.llvm.org/D61584
  llvm-svn: 366531

* [DebugInfo] Generate fixups when emitting DWARF .debug_frame/.eh_frame. (Hsiangkai Wang, 2019-07-19, 8 files, -38/+99)

  It is necessary to generate fixups in .debug_frame or .eh_frame when relaxation is enabled, because the address delta may change after relaxation. There is an opcode with 6-bit data in the debug frame encoding, so we also need 6-bit fixup types.

  Differential Revision: https://reviews.llvm.org/D58335
  llvm-svn: 366524

* Use the MachineBasicBlock symbol for a callbr target (Bill Wendling, 2019-07-19, 1 file, -2/+7)

  Summary: Inline asm doesn't use labels when compiled as an object file. Therefore, we shouldn't create one for the (potential) callbr destination. Instead, use the symbol for the MachineBasicBlock.

  Reviewers: nickdesaulniers, craig.topper
  Reviewed By: nickdesaulniers
  Subscribers: xbolva00, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64888
  llvm-svn: 366523

* [GlobalISel] Translate calls to memcpy et al to G_INTRINSIC_W_SIDE_EFFECTs and legalize later. (Amara Emerson, 2019-07-19, 8 files, -42/+148)

  I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't do that unless we delay lowering to actual function calls. This patch changes the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and then has each target use the new custom legalizer hook for intrinsics to specify that it wants them expanded into libcalls.

  Differential Revision: https://reviews.llvm.org/D64895
  llvm-svn: 366516

* [AMDGPU] Drop Reg32 and use regular AsmName (Stanislav Mekhanoshin, 2019-07-18, 3 files, -25/+21)

  This allows us to reduce the generated AMDGPUGenAsmWriter.inc by ~100Kb.

  Differential Revision: https://reviews.llvm.org/D64952
  llvm-svn: 366505

* [GlobalISel][AArch64] Add support for base register + offset register loads (Jessica Paquette, 2019-07-18, 1 file, -0/+93)

  Add support for folding G_GEPs into loads of the form

    ldr reg, [base, off]

  when possible. This can save an add before the load. Currently this is only supported for loads of 64 bits into 64-bit registers.

  Add a new addressing mode function, `selectAddrModeRegisterOffset`, which performs this folding when it is profitable. Also add a test for addressing modes for G_LOAD.

  Differential Revision: https://reviews.llvm.org/D64944
  llvm-svn: 366503

* CodeGen: Allow !associated metadata to point to aliases. (Peter Collingbourne, 2019-07-18, 1 file, -2/+2)

  This is a small extension of !associated, mostly useful for the implementation convenience of instrumentation passes that RAUW globals with aliases, such as LowerTypeTests.

  Differential Revision: https://reviews.llvm.org/D64951
  llvm-svn: 366502

* Revert [X86] EltsFromConsecutiveLoads - support common source loads (Reid Kleckner, 2019-07-18, 1 file, -63/+5)

  This reverts r366441 (git commit 48104ef7c9c653bbb732b66d7254957389fea337). This causes clang to fail to compile some file in Skia. Reduction soon.

  llvm-svn: 366501