bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[MIR] Skip hoisting to basic block which may throw exception or return	Anton Afanasyev	2019-06-12	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fix hoisting to basic block which are not legal for hoisting cause it can be terminated by exception or it is return block. Reviewers: john.brawn, RKSimon, MatzeB Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63148 llvm-svn: 363164
*	[NFC] Correct comments in RegisterCoalescer.	Hsiangkai Wang	2019-06-12	1	-6/+6
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D63124 llvm-svn: 363119
*	[GlobalISel] Add a G_JUMP_TABLE opcode.	Amara Emerson	2019-06-11	2	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \|	This opcode generates a pointer to the address of the jump table specified by the source operand, which is a jump table index. It will be used in conjunction with an upcoming G_BRJT opcode to support jump table codegen with GlobalISel. Differential Revision: https://reviews.llvm.org/D63111 llvm-svn: 363096
*	[PowerPC] Enable MachinePipeliner for P9 with -ppc-enable-pipeliner	Jinsong Ji	2019-06-11	1	-4/+12
\| \| \| \| \| \| \| \| \|	Implement necessary target hooks to enable MachinePipeliner for P9 only. The pass is off by default, can be enabled with -ppc-enable-pipeliner for P9. Differential Revision: https://reviews.llvm.org/D62164 llvm-svn: 363085
*	[TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI.	Simon Pilgrim	2019-06-11	3	-45/+47
\| \| \| \| \| \|	As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048
*	[DAGCombine] GetNegatedExpression - constant float vector support (PR42105)	Simon Pilgrim	2019-06-11	1	-9/+40
\| \| \| \| \| \| \| \|	Add support for negation of constant build vectors. Differential Revision: https://reviews.llvm.org/D62963 llvm-svn: 363040
*	Change semantics of fadd/fmul vector reductions.	Sander de Smalen	2019-06-11	2	-36/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch changes how LLVM handles the accumulator/start value in the reduction, by never ignoring it regardless of the presence of fast-math flags on callsites. This change introduces the following new intrinsics to replace the existing ones: llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul and adds functionality to auto-upgrade existing LLVM IR and bitcode. Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D60261 llvm-svn: 363035
*	AtomicExpand: Don't crash on non-0 alloca	Matt Arsenault	2019-06-11	1	-2/+6
\| \| \| \| \| \| \|	This now produces garbage on AMDGPU with a call to an nonexistent, anonymous libcall but won't assert. llvm-svn: 363022
*	AMDGPU: Expand < 32-bit atomics	Matt Arsenault	2019-06-11	1	-0/+4
\| \| \| \| \| \|	Also fix AtomicExpand asserting on atomicrmw fadd/fsub. llvm-svn: 363021
*	[MIR-Canon] Fixing non-determinism that was breaking bots (NFC).	Puyan Lotfi	2019-06-11	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \|	An earlier fix of a subtle iterator invalidation bug had uncovered a nondeterminism that was present in the MultiUsers bag. Problem was that MultiUsers was being looked up using pointers. This patch is an NFC change that numbers each multiuser and processes each in numbered order. This fixes the test failure on netbsd and will likely fix the green-dragon bot too. llvm-svn: 363012
*	[GlobalISel] Translate memset/memmove/memcpy from undef ptrs into nops	Jessica Paquette	2019-06-10	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	If the source is undef, then just don't do anything. This matches SelectionDAG's behaviour in SelectionDAG.cpp. Also add a test showing that we do the right thing here. (irtranslator-memfunc-undef.ll) Differential Revision: https://reviews.llvm.org/D63095 llvm-svn: 362989
*	[FastISel] Skip creating unnecessary vregs for arguments	Francis Visoiu Mistrih	2019-06-10	2	-33/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This behavior was added in r130928 for both FastISel and SD, and then disabled in r131156 for FastISel. This re-enables it for FastISel with the corresponding fix. This is triggered only when FastISel can't lower the arguments and falls back to SelectionDAG for it. FastISel contains a map of "register fixups" where at the end of the selection phase it replaces all uses of a register with another register that FastISel sometimes pre-assigned. Code at the end of SelectionDAGISel::runOnMachineFunction is doing the replacement at the very end of the function, while other pieces that come in before that look through the MachineFunction and assume everything is done. In this case, the real issue is that the code emitting COPY instructions for the liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg assigned to the physreg is used, and if it's not, it will skip the COPY. If a register wasn't replaced with its assigned fixup yet, the copy will be skipped and we'll end up with uses of undefined registers. This fix moves the replacement of registers before the emission of copies for the live-ins. The initial motivation for this fix is to enable tail calls for swiftself functions, which were blocked because we couldn't prove that the swiftself argument (which is callee-save) comes from a function argument (live-in), because there was an extra copy (vreg to vreg). A few tests are affected by this: * llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21 (callee-save) but never reload it because it's attached to the return. We now don't even spill it anymore. * llvm/test/CodeGen//swiftself.ll: we tail-call now. llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this test was not really testing the right thing, but it worked because the same registers were re-used. * llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes * llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy * llvm/test/CodeGen/Mips/: get rid of spills and copies llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack * llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack * llvm/test/CodeGen/X86/swifterror.ll: same as AArch64 * llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed Differential Revision: https://reviews.llvm.org/D62361 llvm-svn: 362963
*	[DebugInfo] Terminate all location-lists at end of block	Jeremy Morse	2019-06-10	2	-117/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit reapplies r359426 (which was reverted in r360301 due to performance problems) and rolls in D61940 to address the performance problem. I've combined the two to avoid creating a span of slow-performance, and to ease reverting if more problems crop up. The summary of D61940: This patch removes the "ChangingRegs" facility in DbgEntityHistoryCalculator, as its overapproximate nature can produce incorrect variable locations. An unchanging register doesn't mean a variable doesn't change its location. The patch kills off everything that calculates the ChangingRegs vector. Previously ChangingRegs spotted epilogues and marked registers as unchanging if they weren't modified outside the epilogue, increasing the chance that we can emit a single-location variable record. Without this feature, debug-loc-offset.mir and pr19307.mir become temporarily XFAIL. They'll be re-enabled by D62314, using the FrameDestroy flag to identify epilogues, I've split this into two steps as FrameDestroy isn't necessarily supported by all backends. The logic for terminating variable locations at the end of a basic block now becomes much more enjoyably simple: we just terminate them all. Other test changes: inlined-argument.ll becomes XFAIL, but for a longer term. The current algorithm for detecting that a variable has a single-location doesn't work in this scenario (inlined function in multiple blocks), only other bugs were making this test work. fission-ranges.ll gets slightly refreshed too, as the location of "p" is now correctly determined to be a single location. Differential Revision: https://reviews.llvm.org/D61940 llvm-svn: 362951
*	[DebugInfo] More strict debug range for stack variables	Nikola Prica	2019-06-10	6	-128/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Variable's stack location can stretch longer than it should. If a variable is placed at the stack in a some nested basic block its range can be calculated to be up to the next occurrence of the variable's DBG_VALUE, or up to the end of the function, thus covering a basic blocks that should not be included in the variable’s location range. This happens because the DbgEntityHistoryCalculator ends register locations at the end of a basic block only if the variable’s location register has been changed throughout the function, which is not the case for the register used to reference stack objects. This patch also tries to produce a single value location if the location list builder managed to merge all the locations into one. Reviewers: aprantl, dstenb, jmorse Reviewed By: aprantl, dstenb, jmorse Subscribers: djtodoro, ivanbaev, asowda Tags: #debug-info Differential Revision: https://reviews.llvm.org/D61600 llvm-svn: 362923
*	[DAGCombine] Match a pattern where a wide type scalar value is stored by ↵	QingShan Zhang	2019-06-10	1	-0/+180
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D62897 llvm-svn: 362921
*	[TargetLowering] Simplify (ctpop x) == 1	David Bolvansky	2019-06-09	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: craig.topper, spatel, RKSimon, bkramer Reviewed By: spatel Subscribers: javed.absar, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63004 llvm-svn: 362912
*	[MIR] Add simple PRE pass to MachineCSE	Anton Afanasyev	2019-06-09	1	-9/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the second part of the commit fixing PR38917 (hoisting partitially redundant machine instruction). Most of PRE (partitial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it passes before CSE, looking for partitial redundancy and transforming it to fully redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, than created instruction will remain dead and eliminated later by Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need no rely on further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 This is fixed recommit of r361356 after PowerPC64 multistage build failure. llvm-svn: 362901
*	Use for-range loop. NFCI.	Simon Pilgrim	2019-06-09	1	-3/+1
\| \| \| \|	llvm-svn: 362897
*	[DAGCombine] visitAND - merge (zext_inreg ((s)extload x)) -> (zextload x) ↵	Simon Pilgrim	2019-06-08	1	-21/+4
\| \| \| \| \| \| \| \|	combines. NFCI. Same codegen, only differ by the oneuse limit for the sextload case. llvm-svn: 362880
*	[SystemZ, RegAlloc] Favor 3-address instructions during instruction selection.	Jonas Paulsson	2019-06-08	3	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possibe to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868
*	Factor out SelectionDAG's switch analysis and lowering into a separate ↵	Amara Emerson	2019-06-08	5	-767/+573
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	component. In order for GlobalISel to re-use the significant amount of analysis and optimization code in SDAG's switch lowering, we first have to extract it and create an interface to be used by both frameworks. No test changes as it's NFC. Differential Revision: https://reviews.llvm.org/D62745 llvm-svn: 362857
*	[GlobalISel] IRTranslator: Translate the intrinsics ignored by CodeGen	Volkan Keles	2019-06-07	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Translate `llvm.assume`, `llvm.var.annotation` and `llvm.sideeffect` to nothing as they have no effect on CodeGen. Reviewers: qcolombet, aditya_nandakumar, dsanders, paquette, aemerson, arsenm Reviewed By: arsenm Subscribers: hiraditya, wdng, rovka, kristof.beyls, javed.absar, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63022 llvm-svn: 362834
*	[DAGCombine] visitAND - fix local shadow variable warnings. NFCI.	Simon Pilgrim	2019-06-07	1	-24/+24
\| \| \| \|	llvm-svn: 362825
*	[DAGCombine] Use APInt::extractBits in "sub-splat" constant mask detection. ↵	Simon Pilgrim	2019-06-07	1	-3/+3
\| \| \| \| \| \|	NFCI. llvm-svn: 362820
*	[MachineScheduler] checkResourceLimit boundary condition update	Jinsong Ji	2019-06-07	1	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we call checkResourceLimit in bumpCycle or bumpNode, and we know the resource count has just reached the limit (the equations are equal). We should return true to mark that we are resource limited for next schedule, or else we might continue to schedule in favor of latency for 1 more schedule and create a schedule that actually overbook the resource. When we call checkResourceLimit to estimate the resource limite before scheduling, we don't need to return true even if the equations are equal, as it shouldn't limit the schedule for it . Differential Revision: https://reviews.llvm.org/D62345 llvm-svn: 362805
*	TailDuplicator: Remove no-op analyzeBranch call	Matt Arsenault	2019-06-07	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \|	This could fail, which looked concerning. However nothing was actually using the results of this. I assume this was intended to use the anti-feature of analyzeBranch of removing instructions, but wasn't actually calling it with AllowModify = true. Fixes bug 42162. llvm-svn: 362800
*	Fix for lld buildbot	Sam Parker	2019-06-07	1	-2/+1
\| \| \| \| \| \|	Removed unused (in non-debug builds) variable. llvm-svn: 362775
*	[CodeGen] Generic Hardware Loop Support	Sam Parker	2019-06-07	3	-0/+443
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch which introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops and PowerPC has been ported over to use this generic pass. The target dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend. Three generic intrinsics have been introduced: - void @llvm.set_loop_iterations Takes as a single operand, the number of iterations to be executed. - i1 @llvm.loop_decrement(anyint) Takes the maximum number of elements processed in an iteration of the loop body and subtracts this from the total count. Returns false when the loop should exit. - anyint @llvm.loop_decrement_reg(anyint, anyint) Takes the number of elements remaining to be processed as well as the maximum numbe of elements processed in an iteration of the loop body. Returns the updated number of elements remaining. llvm-svn: 362774
*	[DebugInfo] Incorrect debug info record generated for loop counter.	Alexey Lapshin	2019-06-06	1	-19/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Incorrect Debug Variable Range was calculated while "COMPUTING LIVE DEBUG VARIABLES" stage. Range for Debug Variable("i") computed according to current state of instructions inside of basic block. But Register Allocator creates new instructions which were not taken into account when Live Debug Variables computed. In the result DBG_VALUE instruction for the "i" variable was put after these newly inserted instructions. This is incorrect. Debug Value for the loop counter should be inserted before any loop instruction. Differential Revision: https://reviews.llvm.org/D62650 llvm-svn: 362750
*	[AIX] Implement function descriptor on SDAG	Jason Liu	2019-06-06	2	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: (1) Function descriptor on AIX On AIX, a called routine may have 2 distinct symbols associated with it: * A function descriptor (Name) * A function entry point (.Name) The descriptor structure on AIX is the same as those in the ELF V1 ABI: * The address of the entry point of the function. * The TOC base address for the function. * The environment pointer. The descriptor symbol uses the same name as the source level function in C. The function entry point is analogous to the symbol we would generate for a function in a non-descriptor-based ABI, except that it is renamed by prepending a ".". Which symbol gets referenced depends on the context: * Taking the address of the function references the descriptor symbol. * Calling the function references the entry point symbol. (2) Speaking of implementation on AIX, for direct function call target, we create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to replace original TargetGlobalAddress SDNode. Then down the path, we can take advantage of this MCSymbol. Patch by: Xiangling_L Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara Differential Revision: https://reviews.llvm.org/D62532 llvm-svn: 362735
*	[DAGCombine] MergeConsecutiveStores - improve non-temporal load\store ↵	Simon Pilgrim	2019-06-06	1	-7/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	handling (PR42123) This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores: 1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another. 2 - The merged load\store node must retain the non-temporal flag. Differential Revision: https://reviews.llvm.org/D62910 llvm-svn: 362723
*	[DAGCombine] Cleanup isNegatibleForFree/GetNegatedExpression. NFCI.	Simon Pilgrim	2019-06-06	1	-20/+21
\| \| \| \| \| \|	Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s llvm-svn: 362695
*	[MIPS GlobalISel] Select floor and ceil	Petar Avramovic	2019-06-06	1	-1/+9
\| \| \| \| \| \| \| \|	Select G_FFLOOR and G_FCEIL for MIPS32. Differential Revision: https://reviews.llvm.org/D62901 llvm-svn: 362688
*	Allow target to handle STRICT floating-point nodes	Ulrich Weigand	2019-06-05	16	-19/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ISD::STRICT_ nodes used to implement the constrained floating-point intrinsics are currently never passed to the target back-end, which makes it impossible to handle them correctly (e.g. mark instructions are depending on a floating-point status and control register, or mark instructions as possibly trapping). This patch allows the target to use setOperationAction to switch the action on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code will stop converting the STRICT nodes to regular floating-point nodes, but instead pass the STRICT nodes to the target using normal SelectionDAG matching rules. To avoid having the back-end duplicate all the floating-point instruction patterns to handle both strict and non-strict variants, we make the MI codegen explicitly aware of the floating-point exceptions by introducing two new concepts: - A new MCID flag "mayRaiseFPException" that the target should set on any instruction that possibly can raise FP exception according to the architecture definition. - A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI instruction resulting from expansion of any constrained FP intrinsic. Any MI instruction that is both marked as mayRaiseFPException and FPExcept then needs to be considered as raising exceptions by MI-level codegen (e.g. scheduling). Setting those two new flags is straightforward. The mayRaiseFPException flag is simply set via TableGen by marking all relevant instruction patterns in the .td files. The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes in the SelectionDAG, and gets inherited in the MachineSDNode nodes created from it during instruction selection. The flag is then transfered to an MIFlag when creating the MI from the MachineSDNode. This is handled just like fast-math flags like no-nans are handled today. This patch includes both common code changes required to implement the new features, and the SystemZ implementation. Reviewed By: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D55506 llvm-svn: 362663
*	IR: make getParamByValType Just Work. NFC.	Tim Northover	2019-06-05	2	-3/+4
\| \| \| \| \| \| \| \| \| \| \|	Most parts of LLVM don't care whether the byval type is derived from an explicit Attribute or from the parameter's pointee type, so it makes sense for the main access function to just return the right value. The very few users who do care (only BitcodeReader so far) can find out how it's specified by accessing the Attribute directly. llvm-svn: 362642
*	Fix shadow local variable warning. NFCI.	Simon Pilgrim	2019-06-05	1	-6/+6
\| \| \| \|	llvm-svn: 362622
*	[LoopUtils][SLPVectorizer] clean up management of fast-math-flags	Sanjay Patel	2019-06-05	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of passing around fast-math-flags as a parameter, we can set those using an IRBuilder guard object. This is no-functional-change-intended. The motivation is to eventually fix the vectorizers to use and set the correct fast-math-flags for reductions. Examples of that not behaving as expected are: https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast') https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0) D61802 (should be able to reduce with IR-level FMF) Differential Revision: https://reviews.llvm.org/D62272 llvm-svn: 362612
*	[TargetLowering] SimplifyDemandedBits - pull out shift value type. NFCI.	Simon Pilgrim	2019-06-05	1	-1/+2
\| \| \| \| \| \|	Will be used more in an upcoming patch. llvm-svn: 362595
*	[SelectionDAG][FIX] Allow "returned" arguments to be bit-casted	Johannes Doerfert	2019-06-04	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: An argument that is return by a function but bit-casted before can still be annotated as "returned". Make sure we do not crash for this case. Reviewers: sunfish, stephenwlin, niravd, arsenm Subscribers: wdng, hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59917 llvm-svn: 362546
*	Revert r362472 as it is breaking PPC build bots	Nemanja Ivanovic	2019-06-04	1	-179/+0
\| \| \| \| \| \| \|	The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots. Reverting it to bring the bots back to green. llvm-svn: 362539
*	[DAGCombiner][X86] Fold (not (neg X)) -> (add X, -1)	Craig Topper	2019-06-04	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial. Fixes PR42118 Differential Revision: https://reviews.llvm.org/D62828 llvm-svn: 362533
*	[SelectionDAG][x86] limit post-legalization store merging by type	Sanjay Patel	2019-06-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507
*	[DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold ↵	Roman Lebedev	2019-06-04	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(PR41952) Summary: This might be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet. As far as i can tell, there are no regressions here (ignoring x86-32), all changes are either good or neutral. This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`) `@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/vMd3 Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma Reviewed By: RKSimon Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62774 llvm-svn: 362488
*	[DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold	Roman Lebedev	2019-06-04	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: All changes except ARM look great. https://rise4fun.com/Alive/R2M The regression `test/CodeGen/ARM/addsubcarry-promotion.ll` is recovered fully by D62392 + D62450. Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma Reviewed By: efriedma Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62266 llvm-svn: 362487
*	[SelectionDAG] ComputeNumSignBits - support constant pool values from target	Simon Pilgrim	2019-06-04	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \|	As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits. The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select. It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup. Differential Revision: https://reviews.llvm.org/D62777 llvm-svn: 362486
*	[SelectionDAG] ComputeNumSignBits - clang-format + improve *EXTLOAD ↵	Simon Pilgrim	2019-06-04	1	-7/+7
\| \| \| \| \| \| \| \|	comments. NFCI. Pre-commit requested for D62777. llvm-svn: 362485
*	[SelectionDAG] Add fpto[us]i(undef) --> undef constant fold	Simon Pilgrim	2019-06-04	2	-0/+13
\| \| \| \| \| \| \| \|	Follow up to D62807. Differential Revision: https://reviews.llvm.org/D62811 llvm-svn: 362483
*	[DAGCombine] Match a pattern where a wide type scalar value is stored by ↵	QingShan Zhang	2019-06-04	1	-0/+179
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D61843 llvm-svn: 362472
*	Propagate fmf for setcc in SDAG for select folds	Michael Berg	2019-06-03	2	-4/+8
\| \| \| \|	llvm-svn: 362448
*	Propagate fmf for setcc/select folds	Michael Berg	2019-06-03	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG. Reviewers: qcolombet, spatel Reviewed By: qcolombet Subscribers: nemanjai, jsji Differential Revision: https://reviews.llvm.org/D62552 llvm-svn: 362439