path: root/llvm/lib
Each entry below shows the commit message, author, date, number of files changed, and lines removed/added.
* [X86] Add support for unfold broadcast loads from FMA instructions. (Craig Topper, 2019-09-07; 1 file, -0/+121)
  llvm-svn: 371323
* [aarch64] Add combine patterns for fp16 fmla (Sebastian Pop, 2019-09-07; 1 file, -62/+280)
  This patch enables generation of fused multiply add/sub for instructions operating on fp16. Tested on aarch64-linux.
  Differential Revision: https://reviews.llvm.org/D67297
  llvm-svn: 371321
* [X86] Add prefer-128-bit subtarget feature. (Craig Topper, 2019-09-07; 4 files, -0/+10)
  Summary: Similar to the previous prefer-256-bit flag. We might want to enable this by default for some CPUs. This just starts the initial work to implement it and prove that it affects TTI's vector width.
  Reviewers: RKSimon, echristo, spatel, atdt
  Reviewed By: RKSimon
  Subscribers: lebedev.ri, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67311
  llvm-svn: 371319
* Fix typo. NFCI (Simon Pilgrim, 2019-09-07; 1 file, -1/+1)
  llvm-svn: 371317
* [X86] Avoid uses of getZextValue(). NFCI. (Simon Pilgrim, 2019-09-07; 1 file, -22/+19)
  Use getAPIntValue() directly - this is mainly a best-practice style issue to help prevent fuzz tests blowing up when an i12345 (or whatever) is generated. Use getConstantOperandVal/getConstantOperandAPInt wrappers where possible.
  llvm-svn: 371315
* [ELF][MC] Set types of aliases of IFunc to STT_GNU_IFUNC (Fangrui Song, 2019-09-07; 1 file, -0/+15)
  ```
  .type foo,@gnu_indirect_function
  .set foo,foo_resolver
  .set foo2,foo
  .set foo3,foo2
  ```
  The types of foo2 and foo3 should be STT_GNU_IFUNC, but we currently resolve them to the type of foo_resolver. This patch fixes it.
  Differential Revision: https://reviews.llvm.org/D67206
  Patch by Senran Zhang
  llvm-svn: 371312
* [CodeGen] Handle SMULFIXSAT with scale zero in TargetLowering::expandFixedPointMul (Bjorn Pettersson, 2019-09-07; 1 file, -10/+21)
  Summary: Normally TargetLowering::expandFixedPointMul would handle SMULFIXSAT with scale zero by using an SMULO to compute the product and determine if saturation is needed (i.e. if overflow happened). But if SMULO isn't custom/legal it falls through and uses the same technique, using MULHS/SMUL_LOHI, as used for non-zero scales.
  The problem was that when checking for overflow (handling saturation) without MULO, we did not expect to find a zero scale, so we ended up hitting an assertion in APInt::getLowBitsSet(VTSize, Scale - 1).
  This patch fixes the problem by adding a new special case for how saturation is computed when the scale is zero.
  Reviewers: RKSimon, bevinh, leonardchan, spatel
  Reviewed By: RKSimon
  Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67071
  llvm-svn: 371309
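  For context, a minimal IR-level reproducer of the zero-scale case might look like the sketch below (function and value names are illustrative, not taken from the patch's tests):
  ```llvm
  define i32 @smul_sat_scale0(i32 %a, i32 %b) {
    ; Scale of 0 means a plain saturating multiply; expansion previously
    ; asserted here when ISD::SMULO was not legal or custom for the target.
    %r = call i32 @llvm.smul.fix.sat.i32(i32 %a, i32 %b, i32 0)
    ret i32 %r
  }
  declare i32 @llvm.smul.fix.sat.i32(i32, i32, i32)
  ```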
* [Intrinsic] Add the llvm.umul.fix.sat intrinsic (Bjorn Pettersson, 2019-09-07; 11 files, -47/+153)
  Summary: Add an intrinsic that takes two unsigned integers, with their scale provided as the third argument, and performs fixed-point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first two operands.
  This is a part of implementing fixed-point arithmetic in clang, where some of the more complex operations will be implemented as intrinsics.
  Patch by: leonardchan, bjope
  Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel
  Reviewed By: leonardchan
  Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D57836
  llvm-svn: 371308
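  A rough sketch of what a call to the new intrinsic looks like in IR (a Q16.16 multiply; this example is illustrative and follows the shape of the existing llvm.smul.fix.sat intrinsic rather than being taken from the patch):
  ```llvm
  define i32 @umul_q16_sat(i32 %a, i32 %b) {
    ; Multiply two unsigned Q16.16 fixed-point values; on overflow the result
    ; saturates to the maximum unsigned value of the type instead of wrapping.
    %r = call i32 @llvm.umul.fix.sat.i32(i32 %a, i32 %b, i32 16)
    ret i32 %r
  }
  declare i32 @llvm.umul.fix.sat.i32(i32, i32, i32)
  ```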
* [X86] Fix pshuflw formation from repeated shuffle mask (PR43230) (Nikita Popov, 2019-09-07; 1 file, -2/+2)
  Fix for https://bugs.llvm.org/show_bug.cgi?id=43230.
  When creating PSHUFLW from a repeated shuffle mask, we have to apply the checks to the repeated mask, not the original one. For the test case from PR43230 the inspected part of the original mask is all undef.
  Differential Revision: https://reviews.llvm.org/D67314
  llvm-svn: 371307
* [LVI] Look through extractvalue of insertvalue (Nikita Popov, 2019-09-07; 1 file, -0/+15)
  This addresses the issue mentioned on D19867. When we simplify with.overflow instructions in CVP, we leave behind extractvalue-of-insertvalue sequences that LVI no longer understands. This means that we can no longer simplify any instructions based on the with.overflow (until some other pass, like InstCombine, cleans them up).
  This patch extends LVI extractvalue handling by calling SimplifyExtractValueInst (which doesn't do anything more than constant folding plus looking through insertvalue) and using the block value of the simplification.
  A possible alternative would be to do something similar to SimplifyIndVars, where we instead directly try to replace extractvalue users of the with.overflow. This would need some additional structural changes to CVP, as it's currently not legal to remove anything but the current instruction -- we'd have to introduce a worklist with instructions scheduled for deletion, or similar.
  Differential Revision: https://reviews.llvm.org/D67035
  llvm-svn: 371306
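  As a hand-written illustration (not from the patch's tests), this is roughly the shape CVP leaves behind after simplifying an @llvm.sadd.with.overflow call that is known not to overflow; with this change LVI can look through the insertvalue chain when computing block values:
  ```llvm
  define i32 @after_cvp(i32 %sum) {
    ; CVP replaced the with.overflow intrinsic with the raw result (%sum) plus
    ; an insertvalue chain rebuilding the {result, overflow} aggregate.
    %t0 = insertvalue { i32, i1 } undef, i32 %sum, 0
    %t1 = insertvalue { i32, i1 } %t0, i1 false, 1
    %val = extractvalue { i32, i1 } %t1, 0   ; simplifies to %sum
    %ovf = extractvalue { i32, i1 } %t1, 1   ; simplifies to false
    %ext = zext i1 %ovf to i32
    %res = add i32 %val, %ext
    ret i32 %res
  }
  ```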
* [DwarfExpression] Disallow some rewrites to avoid undefined behavior (Bjorn Pettersson, 2019-09-07; 2 files, -8/+15)
  Summary: The value operand in DW_OP_plus_uconst/DW_OP_constu can be large (it is represented internally as uint64_t in LLVM). This means that the uint64_t to int conversions previously done by DwarfExpression::addMachineRegExpression could lose information. Also, the negation done in "-Offset" was undefined behavior in case Offset was exactly INT_MIN.
  To avoid the above problems, we now avoid transformations like
    [Reg, DW_OP_plus_uconst, Offset] --> [DW_OP_breg, Offset]
  and
    [Reg, DW_OP_constu, Offset, DW_OP_plus] --> [DW_OP_breg, Offset]
  when Offset > INT_MAX, and we avoid transforming
    [Reg, DW_OP_constu, Offset, DW_OP_minus] --> [DW_OP_breg, -Offset]
  when Offset > INT_MAX+1.
  The patch also adjusts DwarfCompileUnit::constructVariableDIEImpl to make sure that "DW_OP_constu, Offset, DW_OP_minus" is used instead of "DW_OP_plus_uconst, Offset" when creating DIExpressions with negative frame index offsets.
  Notice that this might just be the tip of the iceberg; there is a lot of fishy handling related to these constants. I think both DIExpression::appendOffset and DIExpression::extractIfOffset may trigger undefined behavior for certain values.
  Reviewers: sdesmalen, rnk, JDevlieghere
  Reviewed By: JDevlieghere
  Subscribers: jholewinski, aprantl, hiraditya, ychen, uabelho, llvm-commits
  Tags: #debug-info, #llvm
  Differential Revision: https://reviews.llvm.org/D67263
  llvm-svn: 371304
* Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI. (Simon Pilgrim, 2019-09-07; 2 files, -2/+2)
  llvm-svn: 371302
* [SimplifyCFG] SpeculativelyExecuteBB(): It's SpeculatedInstructions, not SpeculationCost (Roman Lebedev, 2019-09-07; 1 file, -7/+7)
  It counts the number of instructions we are OK with speculating (at most 1 there), not their cost, so rename accordingly.
  llvm-svn: 371294
* Replicate the change "[Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment" on AVR to avoid a breakage (Sylvestre Ledru, 2019-09-07; 1 file, -1/+1)
  See r371200 / https://reviews.llvm.org/D67229
  llvm-svn: 371293
* [Attributor] ValueSimplify Abstract Attribute (Hideto Ueno, 2019-09-07; 1 file, -4/+269)
  Summary: This patch introduces an initial `AAValueSimplify`, which simplifies a value in a given context. Examples:
  - (for returned values) If all the return values of a function are the same constant, then we can replace the value returned at the call site with that constant.
  - If an internal function is passed the same value (a constant) as an argument at every call site, then we can replace the argument with that constant, as sketched below.
  Reviewers: jdoerfert, sstefan1
  Reviewed By: jdoerfert
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66967
  llvm-svn: 371291
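  A hand-written sketch of the returned-value case (not taken from the patch's tests):
  ```llvm
  define internal i32 @callee(i1 %c) {
  entry:
    br i1 %c, label %a, label %b
  a:                              ; every return is the same constant ...
    ret i32 7
  b:
    ret i32 7
  }

  define i32 @caller() {
    %r = call i32 @callee(i1 true)
    ret i32 %r                    ; ... so this can be simplified to: ret i32 7
  }
  ```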
* Change TargetLibraryInfo analysis passes to always require Function (Teresa Johnson, 2019-09-07; 66 files, -214/+330)
  Summary: This is the first change to enable the TLI to be built per-function, so that -fno-builtin* handling can be migrated to use function attributes. See the discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example.
  This change should not affect behavior, as the provided function is not yet used to build a specifically per-function TLI, but rather enables that migration.
  Most of the changes were very mechanical, e.g. passing a Function to the legacy analysis pass's getTLI interface, or, in Module-level cases, adding a callback. This is similar to the way the per-function TTI analysis works.
  There was one place where we were looking for builtins but not in the context of a specific function. See FindCXAAtExit in lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround could provide the wrong behavior in some corner cases. Suggestions welcome.
  Reviewers: chandlerc, hfinkel
  Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66428
  llvm-svn: 371284
* Synchronize LLVM's copy of libc++abi's demangler with the libc++abi version after r371273 (Richard Smith, 2019-09-07; 1 file, -0/+10)
  Also fix a minor issue in r371273 that only surfaced after template instantiation from LLVM's use of the demangler.
  llvm-svn: 371274
* [AArch64][GlobalISel] Enable the localizer for optimized builds. (Amara Emerson, 2019-09-06; 1 file, -3/+1)
  Despite the fact that the localizer's original motivation was to fix horrendous constant spilling at -O0, shortening live ranges still has net benefits even with optimizations enabled.
  On an -Os build of CTMark, doing this improves code size by 0.5% geomean. There are a few regressions, with bullet increasing in size by 0.5%. One example from bullet where code size increased slightly was due to GlobalISel actually now generating the same code as SelectionDAG. So we actually have an opportunity in future to implement better heuristics for localization and therefore be *better* than SDAG in some cases. In relation to other optimizations, though, that one is relatively minor.
  Differential Revision: https://reviews.llvm.org/D67303
  llvm-svn: 371266
* [InstCombine] Refactor substitution of instruction in the parent BB (NFC) (Evandro Menezes, 2019-09-06; 1 file, -14/+9)
  Add the new method `LibCallSimplifier::substituteInParent()` that calls `LibCallSimplifier::replaceAllUsesWith()` and `LibCallSimplifier::eraseFromParent()` back to back, simplifying the resulting code.
  llvm-svn: 371264
* [IR] CallBrInst: scan+update arg list when indirect dest list changes (Nick Desaulniers, 2019-09-06; 1 file, -0/+11)
  Summary: There's an unspoken invariant of callbr that the list of BlockAddress Constants in the "function args" list match the BasicBlocks in the "other labels" list. (This invariant is being added to the LangRef in https://reviews.llvm.org/D67196.)
  When modifying any of the indirect destinations of a callbr instruction (possible jump targets), we need to update the function arguments if the argument is a BlockAddress whose BasicBlock refers to the indirect destination BasicBlock being replaced. Otherwise, many transforms that modify successors will end up violating that invariant. A recent change to the arm64 Linux kernel exposed this bug, which prevents the kernel from booting.
  I considered maintaining a mapping from indirect destination BasicBlock to argument operand BlockAddress, but this ends up being a one-to-potentially-many (though usually one) mapping. Also, the list of arguments to a function (or more typically inline assembly) ends up being fewer than 10. It is significantly simpler to just rescan the full list of arguments. Because of the one-to-potentially-many relationship, the full argument list must be scanned (we can't stop at the first instance).
  Thanks to the following folks that reported the issue and helped debug it:
  * Nathan Chancellor
  * Will Deacon
  * Andrew Murray
  * Craig Topper
  Link: https://bugs.llvm.org/show_bug.cgi?id=43222
  Link: https://github.com/ClangBuiltLinux/linux/issues/649
  Link: https://lists.infradead.org/pipermail/linux-arm-kernel/2019-September/678330.html
  Reviewers: craig.topper, chandlerc
  Reviewed By: craig.topper
  Subscribers: void, javed.absar, kristof.beyls, hiraditya, llvm-commits, nathanchance, srhines
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67252
  llvm-svn: 371262
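  For reference, a minimal sketch (not taken from the patch) of the invariant being maintained: the blockaddress argument and the entry in the indirect destination list refer to the same block, so a transform that replaces %indirect in the destination list must rewrite the argument as well:
  ```llvm
  define void @foo(i32 %x) {
  entry:
    ; The blockaddress "function arg" and the bracketed indirect destination
    ; both name %indirect; the two lists must stay in sync.
    callbr void asm sideeffect "", "r,X"(i32 %x, i8* blockaddress(@foo, %indirect))
            to label %fallthrough [label %indirect]
  fallthrough:
    ret void
  indirect:
    ret void
  }
  ```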
* GlobalISel: Add G_FMAD instruction (Matt Arsenault, 2019-09-06; 1 file, -0/+2)
  llvm-svn: 371254
* GlobalISel: Support physical register inputs in patterns (Matt Arsenault, 2019-09-06; 1 file, -5/+7)
  llvm-svn: 371253
* Remove dead .seh_stackalloc parsing method in X86AsmParser (Reid Kleckner, 2019-09-06; 1 file, -14/+0)
  The shared COFF asm parser code handles this directive, since it is shared with AArch64. Spotted by Alexandre Ganea in review.
  llvm-svn: 371251
* AMDGPU: Fix typo (Matt Arsenault, 2019-09-06; 1 file, -4/+4)
  llvm-svn: 371249
* [X86] Use MOVSX by default instead of CBW to extend i8 to AX for i8 sdivrem. (Craig Topper, 2019-09-06; 1 file, -5/+8)
  We can use a MOVSX16 here and then rely on FixupBWInsts to change it to MOVSX32 if the upper bits are dead, with a special case to not promote if it could be turned into CBW. Then we can rely on X86MCInstLower to turn the MOVSX into CBW very late if register allocation worked out.
  Using MOVSX gives an opportunity to use the MOVSX as both a copy and a sign extend, since the input and output registers aren't tied together.
  Differential Revision: https://reviews.llvm.org/D67192
  llvm-svn: 371243
* [X86] Use MOVZX16rr8/MOVZXrm8 when extending input for i8 udivrem. (Craig Topper, 2019-09-06; 1 file, -3/+3)
  We can rely on X86FixupBWInsts to turn these into MOVZX32. This simplifies a follow-up commit to use MOVSX for i8 sdivrem with a late optimization to use CBW when register allocation works out.
  llvm-svn: 371242
* [X86] Teach FixupBWInsts to turn MOVSX16rr8/MOVZX16rr8/MOVSX16rm8/MOVZX16rm8 into their 32-bit dest equivalents when the upper part of the register is dead (Craig Topper, 2019-09-06; 1 file, -6/+48)
  llvm-svn: 371240
* [ConstantFolding] Refactor functions not available before C99 (NFC) (Evandro Menezes, 2019-09-06; 1 file, -1/+6)
  Note the cases where calling a function at compile time may fail if the host does not support the C99 runtime library.
  llvm-svn: 371236
* [Remarks] Add support for internalizing a remark in a string table (Francis Visoiu Mistrih, 2019-09-06; 1 file, -0/+17)
  In order to keep remarks around, we need to tie them to a string table. Users can then delete the parser and rely on the string table to keep the memory of the strings alive and deduplicated.
  llvm-svn: 371233
* [ARM] Add patterns for VSUB with q and r registers (Oliver Cruickshank, 2019-09-06; 1 file, -0/+9)
  Added patterns for VSUB to support q and r registers, which reduces pressure on q registers.
  llvm-svn: 371231
* [ARM] Add patterns for VADD with q and r registers (Oliver Cruickshank, 2019-09-06; 1 file, -0/+9)
  Added support for VADD to use q and r registers, which reduces pressure on q registers.
  llvm-svn: 371230
* [ARM] Add patterns for VMUL with q and r registers (Oliver Cruickshank, 2019-09-06; 1 file, -0/+9)
  Added support for VMUL to use an r register; this reduces pressure on the q registers.
  llvm-svn: 371229
* [ConstantFolding] Refactor function match for better speed (NFC) (Evandro Menezes, 2019-09-06; 1 file, -102/+134)
  Use an `enum` instead of string comparison to match the candidate function.
  llvm-svn: 371228
* [AArch64][GlobalISel] Always fall back on tail calls with -tailcallopt (Jessica Paquette, 2019-09-06; 1 file, -0/+6)
  -tailcallopt requires that we perform different stack adjustments than with sibling calls. For example, the `@caller_to0_from8` function in test/CodeGen/AArch64/tail-call.ll requires that we adjust SP. Without -tailcallopt, this adjustment does not happen. With it, however, it is expected.
  So, to ensure that adding sibling call support doesn't break -tailcallopt, make CallLowering always fall back on possible tail calls when -tailcallopt is passed in.
  Update test/CodeGen/AArch64/tail-call.ll with a GlobalISel line to make sure that we don't differ from the SDAG implementation at any point.
  Differential Revision: https://reviews.llvm.org/D67245
  llvm-svn: 371227
* [SimplifyLibCalls] handle pow(x,-0.0) before it can assert (PR43233) (Sanjay Patel, 2019-09-06; 1 file, -2/+2)
  https://bugs.llvm.org/show_bug.cgi?id=43233
  llvm-svn: 371221
* [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection (Sam Tebbs, 2019-09-06; 1 file, -10/+48)
  This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together. This is useful for various MVE instructions, such as vmla and others that take R registers.
  Loop tests have been added to the vmla test file to make sure vmlas are generated in loops.
  Differential revision: https://reviews.llvm.org/D66295
  llvm-svn: 371218
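  A simplified example of the pattern being sunk (illustrative only): the insertelement/shufflevector splat of the scalar %s is moved next to its user so that selection can pick an instruction that takes the scalar directly in an R register:
  ```llvm
  define <4 x i32> @mul_by_scalar(<4 x i32> %vec, i32 %s) {
    ; Canonical IR splat of %s, followed by the vector multiply that uses it.
    %ins   = insertelement <4 x i32> undef, i32 %s, i32 0
    %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
    %mul   = mul <4 x i32> %vec, %splat
    ret <4 x i32> %mul
  }
  ```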
* [AMDGPU] Enable constant offset promotion to immediate operand for VMEM stores (Valery Pykhtin, 2019-09-06; 1 file, -4/+5)
  Differential revision: https://reviews.llvm.org/D66958
  llvm-svn: 371214
* [Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment (Guillaume Chatelet, 2019-09-06; 10 files, -14/+15)
  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet
  Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67267
  llvm-svn: 371212
* [Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment (Guillaume Chatelet, 2019-09-06; 5 files, -8/+10)
  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet
  Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67278
  llvm-svn: 371210
* [Alignment] fix dubious min function alignment (Guillaume Chatelet, 2019-09-06; 1 file, -1/+1)
  Summary: This was discovered while introducing the llvm::Align type. The original setMinFunctionAlignment used to take the alignment as log2; looking at the comment, it seems instructions are meant to be 2-byte aligned, not 4-byte aligned.
  Reviewers: uweigand
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67271
  llvm-svn: 371204
* [Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment (Guillaume Chatelet, 2019-09-06; 12 files, -14/+16)
  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet
  Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67229
  llvm-svn: 371200
* [DFAPacketizer] Track resources for packetized instructions (James Molloy, 2019-09-06; 2 files, -11/+65)
  This patch allows the DFAPacketizer to be queried after a packet is formed to work out which resources were allocated to the packetized instructions.
  This is particularly important for targets that do their own bundle packing - it's not sufficient to know simply that instructions can share a packet; which slots are used is also required for encoding.
  This extends the emitter to emit a side-table containing resource usage diffs for each state transition. The packetizer maintains a set of all possible resource states in its current state. After packetization is complete, all remaining resource states are possible packetization strategies.
  The side-table is only ~500K for Hexagon, but the extra tracking is disabled by default (most uses of the packetizer, like MachinePipeliner, don't care and don't need the extra maintained state).
  Differential Revision: https://reviews.llvm.org/D66936
  llvm-svn: 371198
* [DebugInfo] LiveDebugValues: explicitly terminate overwritten stack locations (Jeremy Morse, 2019-09-06; 1 file, -12/+54)
  If a stack spill location is overwritten by another spill instruction, any variable locations pointing at that slot should be terminated. We cannot rely on spills always being restored to registers or variable locations being moved by a DBG_VALUE: the register allocator is entitled to spill a value and then forget about it when it goes out of liveness.
  To address this, scan for memory writes to spill locations, even those we don't consider to be normal "spills". isSpillInstruction and isLocationSpill distinguish the two now. After identifying spill overwrites, terminate the open range, and insert a $noreg DBG_VALUE for that variable.
  Differential Revision: https://reviews.llvm.org/D66941
  llvm-svn: 371193
* [AMDGPU] Mark s_barrier as having side effects but not accessing memory. (Jay Foad, 2019-09-06; 1 file, -2/+0)
  Summary: This fixes poor scheduling in a function containing a barrier and a few load instructions.
  Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial edge in the dependency graph from the barrier instruction to the exit node representing live-out latency, with a latency of about 500 cycles. Because of this it thinks the critical path through the graph also has a latency of about 500 cycles. And because of that it does not think that any of the load instructions are on the critical path, so it schedules them with no regard for their (80 cycle) latency, which gives poor results.
  Reviewers: arsenm, dstuttard, tpr, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67218
  llvm-svn: 371192
* [yaml2obj] Rename SHOffset (e_shoff) field to SHOff. NFC (Fangrui Song, 2019-09-06; 2 files, -6/+5)
  `struct Elf*_Shdr` has a field `sh_offset`, named `ShOffset` in llvm::ELFYAML::Section. Rename SHOffset (e_shoff) to SHOff to prevent confusion.
  Reviewed By: grimar
  Differential Revision: https://reviews.llvm.org/D67254
  llvm-svn: 371185
* [ARM] MVE Tail Predication (Sam Parker, 2019-09-06; 4 files, -1/+476)
  The MVE and LOB extensions of Armv8.1-M can be combined to enable 'tail predication', which removes the need for a scalar remainder loop after vectorization. Lane predication is performed implicitly via a system register. The effects of predication are described in Section B5.6.3 of the Armv8.1-M Architecture Reference Manual, the key points being:
  - For vector operations that perform reduction across the vector and produce a scalar result, whether the value is accumulated or not.
  - For non-load instructions, the predicate flags determine if the destination register byte is updated with the new value or if the previous value is preserved.
  - For vector store instructions, whether the store occurs or not.
  - For vector load instructions, whether the loaded value or zero is written to that element of the destination register.
  This patch implements a pass that takes a hardware loop containing masked vector instructions and converts it into something that resembles an MVE tail-predicated loop. Currently, if we had code generation, we'd generate a loop in which the VCTP would generate the predicate and VPST would then set up the value of VPR.P0. The loads and stores would be placed in VPT blocks, so this is not tail predication but normal VPT predication, with the predicate based upon an element-counting induction variable. Further work needs to be done to finally produce a true tail-predicated loop.
  Because only the loads and stores are predicated, at both the LLVM IR and MIR level we will restrict support to only lane-wise operations (no horizontal reductions). We will perform a final check on MIR during loop finalisation too.
  Another restriction, specific to MVE, is that all the vector instructions need to operate on the same number of elements. This is because predication is performed at the byte level and is set on entry to the loop, or by the VCTP instead.
  Differential Revision: https://reviews.llvm.org/D65884
  llvm-svn: 371179
* [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks (Kang Zhang, 2019-09-06; 1 file, -0/+46)
  Summary: Fix a bug where the jump table was not updated, and recommit.
  The `block-placement` pass creates some patterns of unconditional branches for which we can do a simple early return. But the `early-ret` pass runs before `block-placement`, and we don't want to run it again. This patch does the simple early return to optimize the blocks at the end of `block-placement`.
  Reviewed By: efriedma
  Differential Revision: https://reviews.llvm.org/D63972
  llvm-svn: 371177
* [X86] Fix bad indentation. NFC (Craig Topper, 2019-09-06; 1 file, -1/+1)
  llvm-svn: 371167
* [yaml2obj] Make e_phoff and e_phentsize 0 if there are no program headers (Alex Brachet, 2019-09-06; 1 file, -2/+2)
  Summary: The ELF gABI (http://www.sco.com/developers/gabi/latest/ch4.eheader.html) says that if there are no program headers then e_phoff should be 0, but currently it is always set to just after the header. GNU's `readelf` (but not `llvm-readelf`) complains about this: `readelf: Warning: possibly corrupt ELF header - it has a non-zero program header offset, but no program headers`.
  Reviewers: jhenderson, grimar, MaskRay, rupprecht
  Reviewed By: jhenderson, grimar, MaskRay
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67054
  llvm-svn: 371162
* [MC] Fix undefined behavior in MCInstPrinter::formatHex (Jonas Devlieghere, 2019-09-06; 1 file, -12/+13)
  Passing INT64_MIN to MCInstPrinter::formatHex triggers undefined behavior because the negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long long'). This patch puts a workaround in place to just print the hex value directly.
  A possible alternative involves using a small helper function that uses (implementation-)defined conversions to achieve the desirable value:

    static int64_t helper(int64_t V) {
      auto U = static_cast<uint64_t>(V);
      return V < 0 ? -U : U;
    }

  The underlying problem is that MCInstPrinter::formatHex(int64_t) returns a format_object<int64_t> and should really return a format_object<uint64_t>. However, that's not possible because formatImm needs to be able to print both as decimal (where a signed value is required) and hex (where we'd prefer to always have an unsigned value):

    format_object<int64_t> formatImm(int64_t Value) const {
      return PrintImmHex ? formatHex(Value) : formatDec(Value);
    }

  Differential revision: https://reviews.llvm.org/D67236
  llvm-svn: 371159