path: root/llvm/lib/Analysis
...
* [DA][NewPM] Add a printer pass and port the test suite (Philip Pfaffe, 2019-01-08; 1 file, -0/+7)
  The new-pm version of DA is untested. Testing requires a printer, so add that
  and use it in the existing DA tests.
  Differential Revision: https://reviews.llvm.org/D56386
  llvm-svn: 350624
* [MemorySSA] Add SkipSelfWalker. (Alina Sbirlea, 2019-01-07; 1 file, -1/+49)
  Summary: Add implementation of SkipSelfWalker.
  Reviewers: george.burgess.iv
  Subscribers: sanjoy, jlebar, Prazek, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56285
  llvm-svn: 350561
* [MemorySSA] Refactor CachingWalker. (Alina Sbirlea, 2019-01-07; 1 file, -49/+95)
  Summary: Refactor the caching walker to make it straightforward to create a
  walker that skips the starting access.
  Reviewers: george.burgess.iv
  Subscribers: sanjoy, jlebar, Prazek, llvm-commits, jfb
  Differential Revision: https://reviews.llvm.org/D55957
  llvm-svn: 350558
* [MemorySSA] Extend the clobber walker with the option to skip the starting access. (Alina Sbirlea, 2019-01-07; 1 file, -8/+19)
  Summary: The option enables loop transformations to hoist accesses that do
  not have clobbers in the loop. If the clobber query skips the starting
  access, the result may be outside the loop instead of the header Phi. The
  walker that uses this option is added in a separate patch.
  Reviewers: george.burgess.iv
  Subscribers: sanjoy, jlebar, Prazek, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55944
  llvm-svn: 350551
* Revert "[DemandedBits] Use SetVector for Worklist"Nikita Popov2019-01-071-6/+7
| | | | | | | | This reverts commit r350547. Seeing assertion failures on clang tests. llvm-svn: 350549
* [DemandedBits] Use SetVector for Worklist (Nikita Popov, 2019-01-07; 1 file, -7/+6)
  DemandedBits currently uses a simple vector for the worklist, which means
  that instructions may be inserted into it multiple times. Especially in
  combination with the deep lattice, this may cause instructions to be
  recomputed very often. To avoid this, switch to a SetVector. A sketch of the
  idiom follows below.
  Differential Revision: https://reviews.llvm.org/D56362
  llvm-svn: 350547
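  A minimal sketch of the worklist idiom described above (illustrative code,
  not the actual DemandedBits implementation):

      #include "llvm/ADT/SetVector.h"
      #include "llvm/IR/Instruction.h"
      using namespace llvm;

      // Re-inserting an already-queued instruction is a no-op, so every
      // instruction appears in the worklist at most once.
      void drain(SmallSetVector<Instruction *, 16> &Worklist) {
        while (!Worklist.empty()) {
          Instruction *I = Worklist.pop_back_val();
          // ... recompute demanded bits for I, then re-queue affected users:
          for (User *U : I->users())
            if (auto *UI = dyn_cast<Instruction>(U))
              Worklist.insert(UI); // duplicate inserts are silently ignored
        }
      }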
* [CallSite removal] Port `IndirectCallSiteVisitor` to use `CallBase` and update client code (Chandler Carruth, 2019-01-07; 1 file, -1/+1)
  Also rename it to use the more generic term `call` instead of something that
  could be confused with a particular type.
  Differential Revision: https://reviews.llvm.org/D56183
  llvm-svn: 350508
* [CallSite removal] Migrate all Alias Analysis APIs to use the newly minted `CallBase` class instead of the `CallSite` wrapper (Chandler Carruth, 2019-01-07; 15 files, -343/+337)
  This moves the largest interwoven collection of APIs that traffic in
  `CallSite`s. While a handful of these could have been migrated more
  shallowly by converting from a `CallSite` to a `CallBase`, it hardly seemed
  worth it. Most of the APIs needed to migrate together because of the complex
  interplay of AA APIs and the fact that converting from a `CallBase` to a
  `CallSite` isn't free in its current implementation.

  Out-of-tree users of these APIs can fairly reliably migrate with some
  combination of `.getInstruction()` on the `CallSite` instance and casting
  the resulting pointer. The most generic form will look like `CS` ->
  `cast_or_null<CallBase>(CS.getInstruction())`, but in most cases there is a
  more elegant migration. Hopefully, this migrates enough APIs for users to
  fully move from `CallSite` to the base class. All of the in-tree users were
  easily migrated in that fashion.

  Thanks for the review from Saleem!
  Differential Revision: https://reviews.llvm.org/D55641
  llvm-svn: 350503
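  A hedged sketch of the generic out-of-tree migration pattern quoted above
  (the helper name is made up for illustration):

      #include "llvm/IR/CallSite.h"   // legacy wrapper, removed in later LLVM
      #include "llvm/IR/InstrTypes.h" // CallBase
      using namespace llvm;

      // Recover the underlying instruction from a CallSite and cast it to
      // the CallBase class that the migrated AA APIs now take.
      CallBase *asCallBase(CallSite CS) {
        return cast_or_null<CallBase>(CS.getInstruction());
      }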
* [BDCE] Remove dead uses of arguments (Nikita Popov, 2019-01-04; 1 file, -41/+47)
  In addition to finding dead uses of instructions, also find dead uses of
  function arguments, and replace them with zero as well.

  I'm changing the way the known bits are computed here to remove the coupling
  between the transfer function and the algorithm. It previously relied on the
  first op being visited first and computing known bits -- unless the first op
  is not an instruction, in which case they're computed on the second op. I
  could have adjusted this to check for "instruction or argument", but I think
  it's better to avoid the repeated calculation with an explicit flag.
  Differential Revision: https://reviews.llvm.org/D56247
  llvm-svn: 350435
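  A minimal sketch of the replacement step (an illustrative helper, not the
  patch's actual code): a dead use is overwritten with a zero of the right
  type, regardless of whether the used value is an instruction or an argument.

      #include "llvm/IR/Constants.h"
      #include "llvm/IR/Use.h"
      using namespace llvm;

      void replaceDeadUseWithZero(Use &U) {
        U.set(Constant::getNullValue(U->getType()));
      }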
* [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset (Florian Hahn, 2019-01-04; 1 file, -1/+8)
  GetPointerBaseWithConstantOffset includes this code, where ByteOffset and
  GEPOffset are both of type llvm::APInt:

      ByteOffset += GEPOffset.getSExtValue();

  The problem with this line is that getSExtValue() returns an int64_t, but
  the += matches an overload for uint64_t. As a result, the resulting APInt is
  no longer considered to be signed. That in turn causes assertion failures
  later on if the relevant pointer type is > 64 bits in width and the
  GEPOffset was negative. Changing it to

      ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());

  resolves the issue and explicitly performs the sign extension or truncation.
  Additionally, instead of asserting later if the result is > 64 bits, it
  breaks out of the loop in that case.

  See also https://reviews.llvm.org/D24729 and https://reviews.llvm.org/D24772.
  This commit must be merged after D38662 in order for the test to pass.

  Patch by Michael Ferguson <mpfergu@gmail.com>.
  Reviewers: reames, sanjoy, hfinkel
  Reviewed By: hfinkel
  Differential Revision: https://reviews.llvm.org/D38501
  llvm-svn: 350395
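  A small sketch of the overload pitfall (the values are hypothetical):

      #include "llvm/ADT/APInt.h"
      using namespace llvm;

      void sketch() {
        APInt ByteOffset(128, 0);                   // pointer wider than 64 bits
        APInt GEPOffset(64, -8, /*isSigned=*/true); // a negative GEP offset

        // Buggy: the int64_t from getSExtValue() binds to operator+=(uint64_t),
        // so the -8 is treated as a huge unsigned value in the 128-bit sum.
        // ByteOffset += GEPOffset.getSExtValue();

        // Fixed: stay within APInt and sign-extend/truncate explicitly.
        ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());
      }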
* [SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123) (Simon Pilgrim, 2019-01-03; 1 file, -0/+4)
  Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style
  intrinsics.
  llvm-svn: 350300
* [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug) (Hal Finkel, 2019-01-02; 1 file, -49/+96)
  Motivated by the discussion in D38499, this patch updates BasicAA to support
  arbitrary pointer sizes by switching most remaining non-APInt calculations
  to use APInt. The size of these APInts is set to the maximum pointer size
  (maximum over all address spaces described by the data layout string).

  Most of this translation is straightforward, but this patch contains a fix
  for a bug that revealed itself during this translation process. In order for
  test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
  pointers, the intermediate calculations must be performed using 64-bit
  integers. This is because, as noted in the patch, when GetLinearExpression
  decomposes an expression into C1*V+C2, and we then multiply this by Scale,
  and distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that,
  even though C1*V+C2 does not overflow for relevant values of V, (C2*Scale)
  can overflow. If this happens, later logic will draw invalid conclusions
  from the (base) offset value. Thus, when initially applying the APInt
  conversion, because the maximum pointer size in this test is 32 bits, it
  started failing.

  Suspicious, I created a 64-bit version of this test (included here), and
  that failed (miscompiled) on trunk for a similar reason (the multiplication
  can overflow).

  After fixing this overflow bug, the first test case (at least) in
  Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and
  was relying on having 64-bit intermediate values to have BasicAA return an
  accurate result. In order to fix this problem, and because I believe that it
  is not uncommon to use i64 indexing expressions in 32-bit code (especially
  portable code using int64_t), it seems reasonable to always use at least
  64-bit integers. In this way, we won't regress our analysis capabilities
  (and there's a command-line option added, so experimenting with this should
  be easy).

  As pointed out by Eli during the review, there are other potential overflow
  conditions that this patch does not address. Fixing those is left to
  follow-up work.

  Patch by me with contributions from Michael Ferguson (mferguson@cray.com).
  Differential Revision: https://reviews.llvm.org/D38662
  llvm-svn: 350220
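  A tiny numeric sketch of the overflow (the constants are made up):

      #include "llvm/ADT/APInt.h"
      using namespace llvm;

      void sketch() {
        // Decompose an index as C1*V + C2, then scale: (C1*Scale)*V + C2*Scale.
        APInt C2(32, 0x40000000), Scale(32, 4);
        APInt Wrapped = C2 * Scale;                 // wraps to 0 in 32 bits
        APInt Exact = C2.zext(64) * Scale.zext(64); // 0x100000000 in 64 bits
        // Drawing conclusions from Wrapped instead of Exact is the bug the
        // patch fixes by widening the intermediate calculations.
        (void)Wrapped; (void)Exact;
      }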
* Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"Nikita Popov2019-01-011-2/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771. BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc. The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses those user is also dead. This case is checked separately instead. The previous attempt to land this lead to miscompiles, because cases where uses were initially dead but were later found to be live during further analysis were not always correctly removed from the DeadUses set. This is fixed now and the added test case demanstrates such an instance. Differential Revision: https://reviews.llvm.org/D55563 llvm-svn: 350188
* [MemoryLocation] Use LocationSize instead of ints; NFC (George Burgess IV, 2018-12-23; 1 file, -20/+36)
  Trying to keep these patches super small so they're easily post-commit
  verifiable, as requested in D44748.

  This one sadly isn't *super* small, but all of the changes here are either
  to:
  - libfuncs that are passed a constant size (memcpy, memset, ...)
  - instructions that store/load a constant size
  So they have to be precise.
  llvm-svn: 350017
* [Loads] Use LocationSize instead of ints; NFC (George Burgess IV, 2018-12-23; 1 file, -1/+1)
  Keeping these patches super small so they're easily post-commit verifiable,
  as requested in D44748.
  This tries to find literal loads/stores of the given type, so this has to be
  precise.
  llvm-svn: 350016
* [Lint] Use LocationSize instead of ints; NFC (George Burgess IV, 2018-12-23; 1 file, -2/+2)
  Keeping these patches super small so they're easily post-commit verifiable,
  as requested in D44748.
  llvm-svn: 350015
* [AAEval] Use LocationSize instead of ints; NFC (George Burgess IV, 2018-12-23; 1 file, -7/+10)
  Keeping these patches super small so they're easily post-commit verifiable,
  as requested in D44748.
  llvm-svn: 350014
* [Analysis] More LocationSize cleanup; NFC (George Burgess IV, 2018-12-22; 2 files, -4/+4)
  Keeping these patches super small so they're easily post-commit verifiable,
  as requested in D44748.
  llvm-svn: 350008
* [IR] Add Instruction::isLifetimeStartOrEnd, NFC (Vedant Kumar, 2018-12-21; 2 files, -7/+3)
  Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
  llvm.lifetime.start or an llvm.lifetime.end intrinsic. This was suggested as
  a cleanup in D55967.
  Differential Revision: https://reviews.llvm.org/D56019
  llvm-svn: 349964
* [BasicAA] Fix AA bug on dynamic allocas and stackrestore (Reid Kleckner, 2018-12-21; 1 file, -0/+6)
  Summary: BasicAA has special logic for unescaped allocas, which normally
  applies equally well to dynamic and static allocas. However,
  llvm.stackrestore has the power to end the lifetime of dynamic allocas,
  without referring to them directly. stackrestore is already marked with the
  most conservative memory modification attributes, but because the alloca is
  not escaped, the normal logic produces incorrect results. I think BasicAA
  needs a special case here to teach it about the relationship between dynamic
  allocas and stackrestore.

  Fixes PR40118
  Reviewers: gbiv, efriedma, george.burgess.iv
  Subscribers: hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55969
  llvm-svn: 349945
* [LAA] Avoid generating RT checks for known deps preventing vectorization. (Florian Hahn, 2018-12-20; 1 file, -5/+6)
  If we found unsafe dependences other than 'unknown', we already know at
  compile time that they are unsafe and the runtime checks should always fail.
  So we can avoid generating them in those cases.

  This should have no negative impact on performance, as the runtime checks
  that would be created previously should always fail. As a sanity check, I
  measured the test-suite, spec2k and spec2k6 and there were no regressions.

  Reviewers: Ayal, anemet, hsaito
  Reviewed By: Ayal
  Differential Revision: https://reviews.llvm.org/D55798
  llvm-svn: 349794
* [NFC] Fix trailing semicolon after function. (Clement Courbet, 2018-12-20; 1 file, -1/+1)
  lib/Analysis/VectorUtils.cpp:482:2: warning: extra ';' [-Wpedantic]
  llvm-svn: 349732
* Introduce llvm.loop.parallel_accesses and llvm.access.group metadata. (Michael Kruse, 2018-12-20; 2 files, -6/+161)
  The current llvm.mem.parallel_loop_access metadata has a problem in that it
  uses LoopIDs. A LoopID unfortunately is not a loop identifier. It is neither
  unique (there's even a regression test assigning the same LoopID to multiple
  loops; it can otherwise happen if passes such as LoopVersioning make copies
  of entire loops) nor persistent (every time a property is removed/added from
  a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when
  calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation
  passes change the loop attributes (even if it is just to mark that a loop
  should not be processed again, as llvm.loop.isvectorized does for the
  versioned and unversioned loop), the parallel access information is lost for
  any subsequent pass.

  This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access
  metadata on an instruction is replaced by llvm.access.group metadata.
  llvm.access.group points to a distinct MDNode with no operands (avoiding the
  problem of ever needing to add/remove operands), called an "access group".
  Alternatively, it can point to a list of access groups. The LoopID then has
  an attribute llvm.loop.parallel_accesses with all the access groups that are
  parallel (no dependencies carried by this loop).

  This intentionally avoids any kind of "ID". Loops that are cloned or have
  their attributes modified retain the llvm.loop.parallel_accesses attribute.
  Access instructions that are cloned point to the same access group. It is
  not necessary for each access to have its own "ID" MDNode; memory access
  instructions with the same behavior can be grouped together.

  The behavior of llvm.mem.parallel_loop_access is not changed by this patch,
  but should be considered deprecated.

  Differential Revision: https://reviews.llvm.org/D52116
  llvm-svn: 349725
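  A short sketch of creating and attaching such an access group (assumed
  usage; the helper is illustrative):

      #include "llvm/IR/Instruction.h"
      #include "llvm/IR/LLVMContext.h"
      #include "llvm/IR/Metadata.h"
      using namespace llvm;

      // An access group is just a distinct MDNode with no operands; its
      // identity, not its contents, is what matters.
      void tagWithAccessGroup(Instruction &MemInst, LLVMContext &Ctx) {
        MDNode *AccessGroup = MDNode::getDistinct(Ctx, {});
        MemInst.setMetadata(LLVMContext::MD_access_group, AccessGroup);
      }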
* Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions"Nikita Popov2018-12-191-41/+6
| | | | | | | This reverts commit r349674. It causes a failure in test-suite enc-3des.execution_time. llvm-svn: 349684
* [BDCE][DemandedBits] Detect dead uses of undead instructions (Nikita Popov, 2018-12-19; 1 file, -6/+41)
  This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

  BDCE currently detects instructions that don't have any demanded bits and
  replaces their uses with zero. However, if an instruction has multiple uses,
  then some of the uses may be dead (have no demanded bits) even though the
  instruction itself is still live. This patch extends DemandedBits/BDCE to
  detect such uses and replace them with zero. While this will not immediately
  render any instructions dead, it may lead to simplifications (in the
  motivating case, by converting a rotate into a simple shift), break
  dependencies, etc.

  The implementation tries to strike a balance between analysis power and
  complexity/memory usage. Originally I wanted to track demanded bits on a
  per-use level, but ultimately we're only really interested in whether a use
  is entirely dead or not. I'm using an extra set to track which uses are
  dead. However, as initially all uses are dead, I'm not storing uses whose
  user is also dead. This case is checked separately instead.

  The test case has a couple of cases that are not simplified yet. In
  particular, we're only looking at uses of instructions right now. I think it
  would make sense to also extend this to arguments. Furthermore DemandedBits
  doesn't yet know some of the tricks that InstCombine does for the demanded
  bits of bitwise or/and/xor in combination with known bits information.

  Differential Revision: https://reviews.llvm.org/D55563
  llvm-svn: 349674
* [ValueTracking] remove unused parameters from helper functions; NFC (Sanjay Patel, 2018-12-19; 1 file, -23/+16)
  llvm-svn: 349641
* [LAA] Introduce enum for vectorization safety status (NFC). (Florian Hahn, 2018-12-18; 1 file, -6/+12)
  This patch adds a VectorizationSafetyStatus enum, which will be extended in
  a follow-up patch to distinguish between 'safe with runtime checks' and
  'known unsafe' dependences.
  Reviewers: anemet, anna, Ayal, hsaito
  Reviewed By: Ayal
  Differential Revision: https://reviews.llvm.org/D54892
  llvm-svn: 349556
* Change the objc ARC optimizer to use the new objc.* intrinsics (Pete Cooper, 2018-12-18; 1 file, -88/+64)
  We're moving ARC optimisation and ARC emission in clang away from runtime
  methods and towards intrinsics. This is the part which actually uses the
  intrinsics in the ARC optimizer when both analyzing the existing calls and
  emitting new ones.
  Differential Revision: https://reviews.llvm.org/D55348
  Reviewers: ahatanak
  llvm-svn: 349534
* [CaptureTracking] Pass MaxUsesToExplore from wrappers to the actual implementation (Artur Pilipenko, 2018-12-18; 1 file, -3/+4)
  This is a follow-up for rL347910. In the original patch I somehow forgot to
  pass the limit from the wrappers to the function which actually does the
  job.
  llvm-svn: 349438
* [InstSimplify] Simplify saturating add/sub + icmp (Nikita Popov, 2018-12-17; 1 file, -0/+66)
  If a saturating add/sub has one constant operand, then we can determine the
  possible range of outputs it can produce, and simplify an icmp comparison
  based on that. The implementation is based on a similar existing mechanism
  for simplifying binary operator + icmps.
  Differential Revision: https://reviews.llvm.org/D55735
  llvm-svn: 349369
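  A sketch of the underlying range argument (the helper is illustrative, not
  the patch's code):

      #include "llvm/ADT/APInt.h"
      using namespace llvm;

      // uadd.sat(%x, C) always yields a value in [C, MAX], since saturating
      // addition can never go below either operand. So an
      // "icmp ult (uadd.sat %x, C), D" must be false whenever D <= C.
      bool ultIsAlwaysFalse(const APInt &C, const APInt &D) {
        return D.ule(C);
      }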
* [SampleFDO] handle ProfileSampleAccurate when initializing function entry count (Wei Mi, 2018-12-13; 1 file, -18/+2)
  ProfileSampleAccurate is used to indicate that the profile is an exact match
  for the code being optimized. Previously ProfileSampleAccurate was handled
  in ProfileSummaryInfo::isColdCallSite and ProfileSummaryInfo::isColdBlock. A
  better solution is to initialize the function entry count to 0 when
  ProfileSampleAccurate is true, so we don't have to handle
  ProfileSampleAccurate in multiple places.
  Differential Revision: https://reviews.llvm.org/D55660
  llvm-svn: 349088
* [ThinLTO] Compute synthetic function entry count (Easwaran Raman, 2018-12-13; 2 files, -7/+9)
  Summary: This patch computes the synthetic function entry count on the whole
  program callgraph (based on the module summary) and writes the entry counts
  to the summary. After function importing, this count gets attached to the IR
  as metadata. Since it adds a new field to the summary, this bumps up the
  version.
  Reviewers: tejohnson
  Subscribers: mehdi_amini, inglorion, llvm-commits
  Differential Revision: https://reviews.llvm.org/D43521
  llvm-svn: 349076
* Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail" (David L. Jones, 2018-12-13; 1 file, -6/+0)
  This revision caused truncated memsets for structs with padding. See:
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html
  llvm-svn: 349002
* [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. (Michael Kruse, 2018-12-12; 1 file, -11/+7)
  When multiple loop transformations are defined in a loop's metadata, their
  order of execution is defined by the order of their respective passes in the
  pass pipeline. For instance,

      #pragma clang loop unroll_and_jam(enable)
      #pragma clang loop distribute(enable)

  is the same as

      #pragma clang loop distribute(enable)
      #pragma clang loop unroll_and_jam(enable)

  and will try to loop-distribute before Unroll-And-Jam because the
  LoopDistribute pass is scheduled after the UnrollAndJam pass. UnrollAndJamPass
  only supports one inner loop, i.e. it will necessarily fail after loop
  distribution. It is not possible to specify another execution order. Also,
  the order of passes in the pipeline is subject to change between versions of
  LLVM, optimization options and which pass manager is used.

  This patch adds 'followup' attributes to various loop transformation passes.
  These attributes define which attributes the resulting loop of a
  transformation should have. For instance,

      !0 = !{!0, !1, !2}
      !1 = !{!"llvm.loop.unroll_and_jam.enable"}
      !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
      !3 = !{!"llvm.loop.distribute.enable"}

  defines a loop ID (!0) to be unrolled-and-jammed (!1), and then the
  attribute !3 to be added to the jammed inner loop, which instructs the inner
  loop to be distributed.

  Currently, in both pass managers, pass execution is in a fixed order and
  UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix
  this in the future by allowing pass managers to run passes until a fixpoint
  is reached, using Polly to perform these transformations, or adding a loop
  transformation pass which takes the order issue into account.

  For mandatory/forced transformations (e.g. those declared by #pragma omp
  simd), the user must be notified when a transformation could not be
  performed. It is not possible for the responsible pass to emit such a
  warning, because the transformation might be 'hidden' in a followup
  attribute when it is executed, or it might not be present in the pipeline at
  all. For this reason, this patch introduces a WarnMissedTransformations
  pass to warn about orphaned transformations.

  Since this changes the user-visible diagnostic message when a transformation
  is applied, two test cases in the clang repository need to be updated.

  To ensure that no other transformation is executed before the intended one,
  the attribute `llvm.loop.disable_nonforced` can be added, which should
  disable transformation heuristics before the intended transformation is
  applied. E.g. it would be surprising if a loop were distributed before a
  #pragma unroll_and_jam is applied.

  With more supported code transformations (loop fusion, interchange,
  stripmining, offloading, etc.), transformations can be used as building
  blocks for more complex transformations (e.g.
  stripmining+stripmining+interchange -> tiling).

  Reviewed By: hfinkel, dmgreen
  Differential Revision: https://reviews.llvm.org/D49281
  Differential Revision: https://reviews.llvm.org/D55288
  llvm-svn: 348944
* [SampleFDO] Extend profile-sample-accurate option to cover isFunctionColdInCallGraph (Wei Mi, 2018-12-12; 1 file, -1/+8)
  For SampleFDO, when a callsite doesn't appear in the profile, it will not be
  marked as a cold callsite unless the option -profile-sample-accurate is
  specified. But profile-sample-accurate doesn't cover
  isFunctionColdInCallGraph, which is used to decide whether a function should
  be put into the text.unlikely section. So even if the user knows the profile
  is accurate and specifies profile-sample-accurate, functions not appearing
  in the sample profile are still not put into the text.unlikely section right
  now. The patch fixes that.
  Differential Revision: https://reviews.llvm.org/D55567
  llvm-svn: 348940
* [ConstantFolding] Handle leading zero-size elements in load folding (Nikita Popov, 2018-12-11; 1 file, -2/+13)
  Struct types may have leading zero-size elements like [0 x i32], in which
  case the "real" element at offset 0 will not necessarily coincide with the
  0th element of the aggregate. ConstantFoldLoadThroughBitcast() wants to
  drill down to the element at offset 0, but currently always picks the 0th
  aggregate element to do so. This patch changes the code to find the first
  non-zero-size element instead, for the struct case.

  The motivation behind this change is
  https://github.com/rust-lang/rust/issues/48627. Rust is fond of emitting
  [0 x iN] separators between struct elements to enforce alignment, which
  prevents constant folding in this particular case.

  The additional tests with [4294967295 x [0 x i32]] check that we don't end
  up unnecessarily looping over a large number of zero-size elements of a
  zero-size array.

  Differential Revision: https://reviews.llvm.org/D55169
  llvm-svn: 348895
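  A sketch of the element-selection idea (an illustrative helper, not the
  committed code):

      #include "llvm/IR/DataLayout.h"
      #include "llvm/IR/DerivedTypes.h"
      using namespace llvm;

      // The field that actually lives at offset 0 is the first one whose
      // allocation size is nonzero; [0 x i32]-style separators are skipped.
      unsigned firstNonZeroSizeField(StructType *STy, const DataLayout &DL) {
        for (unsigned I = 0, E = STy->getNumElements(); I != E; ++I)
          if (DL.getTypeAllocSize(STy->getElementType(I)) != 0)
            return I;
        return 0; // every field is zero-size; field 0 is as good as any
      }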
* [NewPM] fixing asserts on deleted loop in -print-after-all (Fedor Sergeev, 2018-12-11; 1 file, -1/+4)
  IR-printing AfterPass instrumentation might be called on a loop that has
  just been invalidated. We should skip printing it to avoid spurious asserts.
  Reviewed By: chandlerc, philip.pfaffe
  Differential Revision: https://reviews.llvm.org/D54740
  llvm-svn: 348887
* [MemCpyOpt] memset->memcpy forwarding with undef tailNikita Popov2018-12-071-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. For the implementation, I'm reusing a bit of code for a similar existing optimization (direct memcpy of undef). I've also added memset support to MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be used, but it seems to make more sense to add this to GetLocation and thus make the computation cachable. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 348645
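  A sketch of the extended M > N case in C-like form (the buffers are made
  up):

      #include <cstring>

      void sketch(char *b) {
        char a[16];       // lifetime of a starts here; a[8..16) never written
        memset(a, 0, 8);  // memset(a, byte, N) with N = 8
        memcpy(b, a, 16); // memcpy(b, a, M) with M = 16 > N
        // Since a[8..16) is undef, the forwarding may rewrite the memcpy as
        //   memset(b, 0, 16);
      }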
* Reapply "[DemandedBits][BDCE] Support vectors of integers"Nikita Popov2018-12-071-21/+44
| | | | | | | | | | | | | | | | | | | | | | DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. Unlike the previous iteration of this patch, getDemandedBits() can now again be called on arbirary (sized) instructions, even if they don't have integer or vector of integer type. (For vector types the size of the returned mask will now be the scalar size in bits though.) The added LoopVectorize test case shows a case which triggered an assertion failure with the previous attempt, because getDemandedBits() was called on a pointer-typed instruction. Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348602
* Revert "[DemandedBits][BDCE] Support vectors of integers"Nikita Popov2018-12-071-44/+22
| | | | | | | This reverts commit r348549. Causing assertion failures during clang build. llvm-svn: 348558
* [DemandedBits][BDCE] Support vectors of integers (Nikita Popov, 2018-12-06; 1 file, -22/+44)
  DemandedBits and BDCE currently only support scalar integers. This patch
  extends them to also handle vector integer operations. In this case bits are
  not tracked for individual vector elements; instead a bit is demanded if it
  is demanded for any of the elements. This matches the behavior of
  computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine.

  The getDemandedBits() method can now only be called on instructions that
  have integer or vector of integer type. Previously it could be called on any
  sized instruction (even if it was not particularly useful). The size of the
  returned value is now always the scalar size in bits (while previously it
  was the type size in bits).

  Differential Revision: https://reviews.llvm.org/D55297
  llvm-svn: 348549
* Revert r347934 "[SCEV] Guard movement of insertion point for loop-invariants" (David L. Jones, 2018-12-05; 1 file, -42/+41)
  This change caused SEGVs in instcombine. (The r347934 change seems to me to
  be a precipitating cause, not a root cause. Details are on the llvm-commits
  thread for r347934.)
  llvm-svn: 348426
* [CmpInstAnalysis] fix function signature for ICmp code to predicate; NFC (Sanjay Patel, 2018-12-04; 1 file, -10/+10)
  The old function underspecified the return type, took an unused parameter,
  and had a misleading name.
  llvm-svn: 348292
* [CmpInstAnalysis] fix formatting; NFC (Sanjay Patel, 2018-12-03; 1 file, -4/+4)
  There are potential improvements to the structure of this API raised by
  D54994, but remove some cosmetic blemishes before making any functional
  changes.
  llvm-svn: 348149
* Fixing -print-module-scope for legacy SCC passes (Fedor Sergeev, 2018-12-03; 1 file, -4/+21)
  It appears that print-module-scope was not implemented for legacy SCC
  passes. Fixed to print the whole module instead of just the current SCC.
  Reviewed By: mkazantsev
  Differential Revision: https://reviews.llvm.org/D54793
  llvm-svn: 348144
* [ValueTracking] Support funnel shifts in computeKnownBits() (Nikita Popov, 2018-12-02; 1 file, -0/+21)
  If the shift amount is known, we can determine the known bits of the output
  based on the known bits of the two inputs. This is essentially the same
  functionality as implemented in D54869, but for ValueTracking rather than
  InstCombine SimplifyDemandedBits.
  Differential Revision: https://reviews.llvm.org/D55140
  llvm-svn: 348091
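  A sketch of the known-bits computation for a known shift amount (assumed
  shapes; not the committed code):

      #include "llvm/Support/KnownBits.h"
      using namespace llvm;

      // With a known shift amount S, fshl(X, Y, S) == (X << S) | (Y >> (BW - S)),
      // so the output's known bits are the inputs' known bits shifted and merged.
      KnownBits knownBitsOfFshl(const KnownBits &X, const KnownBits &Y, unsigned S) {
        unsigned BW = X.getBitWidth();
        S %= BW;
        if (S == 0)
          return X; // a zero shift just passes X through
        KnownBits R(BW);
        R.Zero = X.Zero.shl(S) | Y.Zero.lshr(BW - S);
        R.One = X.One.shl(S) | Y.One.lshr(BW - S);
        return R;
      }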
* [ValueTracking] add helper function for testing implied condition; NFCI (Sanjay Patel, 2018-12-02; 2 files, -38/+35)
  We were duplicating code around the existing isImpliedCondition() that
  checks for a predecessor block/dominating condition, so make that a wrapper
  call.
  llvm-svn: 348088
* [ThinLTO] Allow importing of functions with var args (Teresa Johnson, 2018-12-01; 1 file, -9/+2)
  Summary: Follow-up to D54270, which allowed importing of var args functions
  unless they called va_start. As pointed out in the post-commit comments on
  that patch, the inliner can handle functions that call va_start in certain
  situations as well. Go ahead and enable importing of all var args functions.
  Measurements on a large binary show that this increases imports and binary
  size by an insignificant amount.
  Reviewers: davidxl
  Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54607
  llvm-svn: 348068
* LegacyDivergenceAnalysis: fix uninitialized value (Nicolai Haehnle, 2018-11-30; 1 file, -1/+3)
  Change-Id: I014502e431a68f7beddf169f6a3d19dac5dd2c26
  llvm-svn: 348051
* [DA] GPUDivergenceAnalysis for unstructured GPU kernels (Nicolai Haehnle, 2018-11-30; 2 files, -26/+108)
  Summary: This is patch #3 of the new DivergenceAnalysis
  <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html>.

  The GPUDivergenceAnalysis is intended to eventually supersede the existing
  LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces
  incorrect results on unstructured Control-Flow Graphs:
  <https://bugs.llvm.org/show_bug.cgi?id=37185>

  This patch adds the option -use-gpu-divergence-analysis to the
  LegacyDivergenceAnalysis to turn it into a transparent wrapper for the
  GPUDivergenceAnalysis.

  Reviewers: nhaehnle
  Reviewed By: nhaehnle
  Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle
  Differential Revision: https://reviews.llvm.org/D53493
  llvm-svn: 348048