bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[FunctionAttrs] Enable nonnull arg propagation	David Bolvansky	2019-09-23	1	-4/+1
\| \| \| \| \| \|	Enable flag introduced in rL294998. Security concerns are no longer valid, since function signatures for mentioned libc functions has no nonnull attribute (Clang does not generate them? I see no nonnull attr in LLVM IR for these functions) and since rL372091 we carefully annotate the callsites where we know that size is static, non zero. So let's enable this flag again.. llvm-svn: 372573
*	[LSR] Silence static analyzer null dereference warnings with assertions. NFCI.	Simon Pilgrim	2019-09-22	1	-0/+2
\| \| \| \| \| \|	Add assertions to make it clear that GenerateIVChain / NarrowSearchSpaceByPickingWinnerRegs should succeed in finding non-null values llvm-svn: 372518
*	ConstantHoisting - Silence static analyzer dyn_cast<PointerType> null ↵	Simon Pilgrim	2019-09-22	1	-1/+1
\| \| \| \| \| \|	dereference warning. NFCI. llvm-svn: 372517
*	[InstCombine] allow icmp+binop folds before min/max bailout (PR43310)	Sanjay Patel	2019-09-22	2	-3/+4
\| \| \| \| \| \| \| \| \|	This has the potential to uncover missed analysis/folds as shown in the min/max code comment/test, but fewer restrictions on icmp folds should be better in general to solve cases like: https://bugs.llvm.org/show_bug.cgi?id=43310 llvm-svn: 372510
*	[VPlan] Silence static analyzer dyn_cast null dereference warning. NFCI.	Simon Pilgrim	2019-09-22	1	-1/+1
\| \| \| \|	llvm-svn: 372502
*	SROA: Check Total Bits of vector type	Suyog Sarda	2019-09-21	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	While Promoting alloca instruction of Vector Type, Check total size in bits of its slices too. If they don't match, don't promote the alloca instruction. Bug : https://bugs.llvm.org/show_bug.cgi?id=42585 llvm-svn: 372480
*	Test mail. NFC.	Suyog Sarda	2019-09-21	1	-1/+1
\| \| \| \| \| \|	Testing commit acces. NFC. llvm-svn: 372479
*	[Attributor] Implement "norecurse" function attribute deduction	Hideto Ueno	2019-09-21	1	-6/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch introduces `norecurse` function attribute deduction. `norecurse` will be deduced if the following conditions hold: * The size of SCC in which the function belongs equals to 1. * The function doesn't have self-recursion. * We have `norecurse` for all call site. To avoid a large change, SCC is calculated using scc_iterator in InfoCache initialization for now. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67751 llvm-svn: 372475
*	[AddressSanitizer] Don't dereference dyn_cast<ConstantInt> results. NFCI.	Simon Pilgrim	2019-09-20	1	-2/+2
\| \| \| \| \| \|	The static analyzer is warning about potential null dereference, but we can use cast<ConstantInt> directly and if not assert will fire for us. llvm-svn: 372429
*	[ObjC][ARC] Skip debug instructions when computing the insert point of	Akira Hatanaka	2019-09-19	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	objc_release calls This fixes a bug where the presence of debug instructions would cause ARC optimizer to change the order of retain and release calls. rdar://problem/55319419 llvm-svn: 372352
*	Don't use invalidated iterators in FlattenCFGPass	Jakub Kuderski	2019-09-19	1	-7/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: FlattenCFG may erase unnecessary blocks, which also invalidates iterators to those erased blocks. Before this patch, `iterativelyFlattenCFG` could try to increment a BB iterator after that BB has been removed and crash. This patch makes FlattenCFGPass use `WeakVH` to skip over erased blocks. Reviewers: dblaikie, tstellar, davide, sanjoy, asbirlea, grosser Reviewed By: asbirlea Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67672 llvm-svn: 372347
*	[InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251)	Roman Lebedev	2019-09-19	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. In this particular case, given ``` char* test(char& base, unsigned long offset) { return &base - offset; } ``` it will end up producing something like https://godbolt.org/z/luGEju which after optimizations reduces down to roughly ``` declare void @use64(i64) define i1 @test(i8* dereferenceable(1) %base, i64 %offset) { %base_int = ptrtoint i8* %base to i64 %adjusted = sub i64 %base_int, %offset call void @use64(i64 %adjusted) %not_null = icmp ne i64 %adjusted, 0 %no_underflow = icmp ule i64 %adjusted, %base_int %no_underflow_and_not_null = and i1 %not_null, %no_underflow ret i1 %no_underflow_and_not_null } ``` Without D67122 there was no `%not_null`, and in this particular case we can "get rid of it", by merging two checks: Here we are checking: `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset` Alive proofs: https://rise4fun.com/Alive/QOs The `@llvm.usub.with.overflow` pattern itself is not handled here because this is the main pattern, that we currently consider canonical. https://bugs.llvm.org/show_bug.cgi?id=43251 Reviewers: spatel, nikic, xbolva00, majnemer Reviewed By: xbolva00, majnemer Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67356 llvm-svn: 372341
*	[Float2Int] avoid crashing on unreachable code (PR38502)	Sanjay Patel	2019-09-19	1	-18/+29
\| \| \| \| \| \| \| \| \| \| \| \| \|	In the example from: https://bugs.llvm.org/show_bug.cgi?id=38502 ...we hit infinite looping/crashing because we have non-standard IR - an instruction operand is used before defined. This and other unusual constructs are allowed in unreachable blocks, so avoid the problem by using DominatorTree to step around landmines. Differential Revision: https://reviews.llvm.org/D67766 llvm-svn: 372339
*	[Unroll] Add an option to control complete unrolling	Serguei Katkov	2019-09-19	2	-9/+19
\| \| \| \| \| \| \| \| \| \| \| \|	Add an ability to specify the max full unroll count for LoopUnrollPass pass in pass options. Reviewers: fhahn, fedor.sergeev Reviewed By: fedor.sergeev Subscribers: hiraditya, zzheng, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D67701 llvm-svn: 372305
*	[SimplifyCFG] mergeConditionalStoreToAddress(): try to pacify MSAN	Roman Lebedev	2019-09-18	1	-1/+1
\| \| \| \| \| \| \| \|	MSAN bot complains that there is use-of-uninitialized-value of this FreeStores later in IsWorthwhile(). Perhaps FreeStores needs to be stored in a vector? llvm-svn: 372262
*	[InstCombine] foldUnsignedUnderflowCheck(): handle last few cases (PR43251)	Roman Lebedev	2019-09-18	1	-0/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: I don't have a direct motivational case for this, but it would be good to have this for completeness/symmetry. This pattern is basically the motivational pattern from https://bugs.llvm.org/show_bug.cgi?id=43251 but with different predicate that requires that the offset is non-zero. The completeness bit comes from the fact that a similar pattern (offset != zero) will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259, so it'd seem to be good to not overlook very similar patterns.. Proofs: https://rise4fun.com/Alive/21b Also, there is something odd with `isKnownNonZero()`, if the non-zero knowledge was specified as an assumption, it didn't pick it up (PR43267) With this, i see no other missing folds for https://bugs.llvm.org/show_bug.cgi?id=43251 Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67412 llvm-svn: 372257
*	[SimplifyCFG] mergeConditionalStoreToAddress(): consider cost, not ↵	Roman Lebedev	2019-09-18	1	-42/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instruction count Summary: As it can be see in the changed test, while `div` is really costly, we were speculating it. This does not seem correct. Also, the old code would run for every single insturuction in BB, instead of eagerly bailing out as soon as there are too many instructions. This function still has a problem that `PHINodeFoldingThreshold` is per-basic-block, while it should be for all the basic blocks. Reviewers: efriedma, craig.topper, dmgreen, jmolloy Reviewed By: jmolloy Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67315 llvm-svn: 372255
*	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): some cleanup before ↵	Roman Lebedev	2019-09-18	1	-5/+8
\| \| \| \| \| \|	upcoming patch llvm-svn: 372245
*	[SampleFDO] Minimize performance impact when profile-sample-accurate	Wei Mi	2019-09-18	1	-20/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	is enabled. We can save memory and reduce binary size significantly by enabling ProfileSampleAccurate. However when ProfileSampleAccurate is true, function without sample will be regarded as cold and this could potentially cause performance regression. To minimize the potential negative performance impact, we want to be a little conservative here saying if a function shows up in the profile, no matter as outline instance, inline instance or call targets, treat the function as not being cold. This will handle the cases such as most callsites of a function are inlined in sampled binary (thus outline copy don't get any sample) but not inlined in current build (because of source code drift, imprecise debug information, or the callsites are all cold individually but not cold accumulatively...), so that the outline function showing up as cold in sampled binary will actually not be cold after current build. After the change, such function will be treated as not cold even profile-sample-accurate is enabled. At the same time we lower the hot criteria of callsiteIsHot check when profile-sample-accurate is enabled. callsiteIsHot is used to determined whether a callsite is hot and qualified for early inlining. When profile-sample-accurate is enabled, functions without profile will be regarded as cold and much less inlining will happen in CGSCC inlining pass, so we can worry less about size increase and be aggressive to allow more early inlining to happen for warm callsites and it is helpful for performance overall. Differential Revision: https://reviews.llvm.org/D67561 llvm-svn: 372232
*	[SimplifyLibCalls] fix crash with empty function name (PR43347)	Sanjay Patel	2019-09-18	1	-15/+12
\| \| \| \| \| \| \| \|	...and improve some variable names while here. https://bugs.llvm.org/show_bug.cgi?id=43347 llvm-svn: 372227
*	[PGO] Change hardcoded thresholds for cold/inlinehint to use summary	Teresa Johnson	2019-09-17	1	-18/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The PGO counter reading will add cold and inlinehint (hot) attributes to functions that are very cold or hot. This was using hardcoded thresholds, instead of the profile summary cutoffs which are used in other hot/cold detection and are more dynamic and adaptable. Switch to using the summary-based cold/hot detection. The hardcoded limits were causing some code that had a medium level of hotness (per the summary) to be incorrectly marked with a cold attribute, blocking inlining. Reviewers: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67673 llvm-svn: 372189
*	[PGO] Don't use comdat groups for counters & data on COFF	Reid Kleckner	2019-09-17	1	-12/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For COFF, a comdat group is really a symbol marked IMAGE_COMDAT_SELECT_ANY and zero or more other symbols marked IMAGE_COMDAT_SELECT_ASSOCIATIVE. Typically the associative symbols in the group are not external and are not referenced by other TUs, they are things like debug info, C++ dynamic initializers, or other section registration schemes. The Visual C++ linker reports a duplicate symbol error for symbols marked IMAGE_COMDAT_SELECT_ASSOCIATIVE even if they would be discarded after handling the leader symbol. Fixes coverage-inline.cpp in check-profile after r372020. llvm-svn: 372182
*	[NFC][InstCombine] dropRedundantMaskingOfLeftShiftInput(): some NFC diff shaving	Roman Lebedev	2019-09-17	1	-8/+11
\| \| \| \|	llvm-svn: 372171
*	Reland "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, ↵	David Bolvansky	2019-09-17	1	-1/+4
\| \| \| \| \| \|	'\0', y)" llvm-svn: 372142
*	Revert "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, ↵	Krasimir Georgiev	2019-09-17	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'\0', y)" Summary: This reverts commit r372101. Causes ASAN build bot failures: http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176 From http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176/steps/64-bit%20check-asan/logs/stdio: ``` [ RUN ] AddressSanitizer.StrNCatOOBTest /home/buildbots/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/lib/asan/tests/asan_str_test.cpp:462: Failure Death test: strncat(to - 1, from, 0) Result: failed to die. ``` Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67658 llvm-svn: 372125
*	[LoopVectorize] Don't dereference a dyn_cast result. NFCI.	Simon Pilgrim	2019-09-17	1	-1/+1
\| \| \| \| \| \|	The static analyzer is warning about potential null dereferences of dyn_cast<> results, we can use cast<> directly as we know that these cases should all be CastInst, which is why its working atm and anyway cast<> will assert if they aren't. llvm-svn: 372116
*	Hide implementation details in namespaces.	Benjamin Kramer	2019-09-17	1	-1/+3
\| \| \| \|	llvm-svn: 372113
*	[Attributor][Fix] Initialize the cache prior to using it	Johannes Doerfert	2019-09-17	1	-59/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: There were segfaults as we modified and iterated the instruction maps in the cache at the same time. This was happening because we created new instructions while we populated the cache. This fix changes the order in which we perform these actions. First, the caches for the whole module are created, then we start to create abstract attributes. I don't have a unit test but the LLVM test suite exposes this problem. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67232 llvm-svn: 372105
*	[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)	David Bolvansky	2019-09-17	1	-1/+4
\| \| \| \|	llvm-svn: 372101
*	[InstCombine] Annotate strdup with deref_or_null	David Bolvansky	2019-09-17	1	-0/+6
\| \| \| \|	llvm-svn: 372098
*	[NFCI] Fixed buildbots	David Bolvansky	2019-09-17	1	-6/+1
\| \| \| \|	llvm-svn: 372097
*	[SimplifyLibCalls] Fix -Wunused-result after D53342/r372091	Fangrui Song	2019-09-17	1	-1/+2
\| \| \| \|	llvm-svn: 372096
*	[SimplifyLibCalls] Mark known arguments with nonnull	David Bolvansky	2019-09-17	1	-84/+204
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: ychen, rsmith, joerg, aaron.ballman, lebedev.ri, uenoku, jdoerfert, hfinkel, javed.absar, spatel, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D53342 llvm-svn: 372091
*	[LoopUnroll] Use LoopSize+1 as threshold, to allow unrolling loops matching ↵	Florian Hahn	2019-09-17	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	LoopSize. We use `< UP.Threshold` later on, so we should use LoopSize + 1, to allow unrolling if the result won't exceed to loop size. Fixes PR43305. Reviewers: efriedma, dmgreen, paquette Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D67594 llvm-svn: 372084
*	[Attributor] Use Alias Analysis in noalias callsite argument deduction	Hideto Ueno	2019-09-17	1	-5/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds a check of alias analysis in `noalias` callsite argument deduction. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67604 llvm-svn: 372075
*	[Attributor] Create helper struct for handling analysis getters	Hideto Ueno	2019-09-17	1	-18/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch introduces a helper struct `AnalysisGetter` to put together analysis getters. In this patch, a getter for `AAResult` is also added for `noalias`. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67603 llvm-svn: 372072
*	[PGO] Use linkonce_odr linkage for __profd_ variables in comdat groups	Reid Kleckner	2019-09-16	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes relocations against __profd_ symbols in discarded sections, which is PR41380. In general, instrumentation happens very early, and optimization and inlining happens afterwards. The counters for a function are calculated early, and after inlining, counters for an inlined function may be widely referenced by other functions. For C++ inline functions of all kinds (linkonce_odr & available_externally mainly), instr profiling wants to deduplicate these __profc_ and __profd_ globals. Otherwise the binary would be quite large. I made __profd_ and __profc_ comdat in r355044, but I chose to make __profd_ internal. At the time, I was only dealing with coverage, and in that case, none of the instrumentation needs to reference __profd_. However, if you use PGO, then instrumentation passes add calls to __llvm_profile_instrument_range which reference __profd_ globals. The solution is to make these globals externally visible by using linkonce_odr linkage for data as was done for counters. This is safe because PGO adds a CFG hash to the names of the data and counter globals, so if different TUs have different globals, they will get different data and counter arrays. Reviewers: xur, hans Differential Revision: https://reviews.llvm.org/D67579 llvm-svn: 372020
*	[SimplifyCFG] FoldTwoEntryPHINode(): consider total speculation cost, not ↵	Roman Lebedev	2019-09-16	1	-13/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	per-BB cost Summary: Previously, if the threshold was 2, we were willing to speculatively execute 2 cheap instructions in both basic blocks (thus we were willing to speculatively execute cost = 4), but weren't willing to speculate when one BB had 3 instructions and other one had no instructions, even thought that would have total cost of 3. This looks inconsistent to me. I don't think `cmov`-like instructions will start executing until both of it's inputs are available: https://godbolt.org/z/zgHePf So i don't see why the existing behavior is the correct one. Also, let's add it's own `cl::opt` for this threshold, with default=4, so it is not stricter than the previous threshold: will allow to fold when there are 2 BB's each with cost=2. And since the logic has changed, it will also allow to fold when one BB has cost=3 and other cost=1, or there is only one BB with cost=4. This is an alternative solution to D65148: This fix is mainly motivated by `signbit-like-value-extension.ll` test. That pattern comes up in JPEG decoding, see e.g. `Figure F.12 – Extending the sign bit of a decoded value in V` of `ITU T.81` (JPEG specification). That branch is not predictable, and it is within the innermost loop, so the fact that that pattern ends up being stuck with a branch instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial. This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240) \| metric \| old \| new \| delta \| % \| \| x86-mi-counting.NumMachineFunctions \| 37720 \| 37721 \| 1 \| 0.00% \| \| x86-mi-counting.NumMachineBasicBlocks \| 773545 \| 771181 \| -2364 \| -0.31% \| \| x86-mi-counting.NumMachineInstructions \| 7488843 \| 7486442 \| -2401 \| -0.03% \| \| x86-mi-counting.NumUncondBR \| 135770 \| 135543 \| -227 \| -0.17% \| \| x86-mi-counting.NumCondBR \| 423753 \| 422187 \| -1566 \| -0.37% \| \| x86-mi-counting.NumCMOV \| 24815 \| 25731 \| 916 \| 3.69% \| \| x86-mi-counting.NumVecBlend \| 17 \| 17 \| 0 \| 0.00% \| We significantly decrease basic block count, notably decrease instruction count, significantly decrease branch count and very significantly increase `cmov` count. Performance-wise, unsurprisingly, this has great effect on target RawSpeed benchmark. I'm seeing 5 major improvements: ``` Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean -0.3064 -0.3064 226.9913 157.4452 226.9800 157.4384 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median -0.3057 -0.3057 226.8407 157.4926 226.8282 157.4828 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev -0.4985 -0.4954 0.3051 0.1530 0.3040 0.1534 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean -0.1747 -0.1747 80.4787 66.4227 80.4771 66.4146 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median -0.1742 -0.1743 80.4686 66.4542 80.4690 66.4436 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev +0.6089 +0.5797 0.0670 0.1078 0.0673 0.1062 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean -0.1598 -0.1598 171.6996 144.2575 171.6915 144.2538 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median -0.1598 -0.1597 171.7109 144.2755 171.7018 144.2766 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev +0.4024 +0.3850 0.0847 0.1187 0.0848 0.1175 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean -0.0550 -0.0551 280.3046 264.8800 280.3017 264.8559 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median -0.0554 -0.0554 280.2628 264.7360 280.2574 264.7297 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev +0.7005 +0.7041 0.2779 0.4725 0.2775 0.4729 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean -0.0354 -0.0355 316.7396 305.5208 316.7342 305.4890 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median -0.0354 -0.0356 316.6969 305.4798 316.6917 305.4324 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev +0.0493 +0.0330 0.3562 0.3737 0.3563 0.3681 ``` That being said, it's always best-effort, so there will likely be cases where this worsens things. Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc Reviewed By: jmolloy Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67318 llvm-svn: 372009
*	[InstCombine] remove unneeded one-use checks for icmp fold	Sanjay Patel	2019-09-16	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Related folds were added in: rL125734 ...the code comment about register pressure is discussed in more detail in: https://bugs.llvm.org/show_bug.cgi?id=2698 But 10 years later, perf testing bzip2 with this change now shows a slight (0.2% average) improvement on Haswell although that's probably within test noise. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. rL371940 and rL371981 are related patches in this series. llvm-svn: 372007
*	[InstCombine] remove unneeded one-use checks for icmp fold	Sanjay Patel	2019-09-16	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fold and several others were added in: rL125734 <https://reviews.llvm.org/rL125734> ...with no explanation for the one-use checks other than the code comments about register pressure. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. rL371940 is a related patch in this series. llvm-svn: 371981
*	[InstCombine] fix comments to match code; NFC	Sanjay Patel	2019-09-16	1	-25/+27
\| \| \| \| \| \| \| \| \| \|	This blob was written before match() existed, so it could probably be reduced significantly. But I suspect it isn't well tested, so tests would have to be added to reduce risk from logic changes. llvm-svn: 371978
*	[VPlanSLP] Don't dereference a cast_or_null<VPInstruction> result. NFCI.	Simon Pilgrim	2019-09-16	1	-5/+8
\| \| \| \| \| \|	The static analyzer is warning about a potential null dereference of the cast_or_null result, I've split the cast_or_null check from the ->getUnderlyingInstr() call to avoid this, but it appears that we weren't seeing any null pointers in the dumped bundles in the first place. llvm-svn: 371975
*	[SLPVectorizer] Assert that we find a LastInst to silence analyzer null ↵	Simon Pilgrim	2019-09-16	1	-0/+1
\| \| \| \| \| \|	dereference warning. NFCI. llvm-svn: 371974
*	[SLPVectorizer] Don't dereference a dyn_cast result. NFCI.	Simon Pilgrim	2019-09-16	1	-4/+4
\| \| \| \| \| \|	The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't. llvm-svn: 371973
*	[Attributor] Heap-To-Stack Conversion	Stefan Stipanovic	2019-09-15	1	-5/+259
\| \| \| \| \| \| \| \| \| \| \| \|	D53362 gives a prototype heap-to-stack conversion pass. With addition of new attributes in the attributor, this can now be revisted and improved. This will place it in the Attributor to make it easier to use new attributes (eg. nofree, nosync, willreturn, etc.) and other attributor features. Reviewers: jdoerfert, uenoku, hfinkel, efriedma Subscribers: lebedev.ri, xbolva00, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D65408 llvm-svn: 371942
*	[InstCombine] remove unneeded one-use checks for icmp fold	Sanjay Patel	2019-09-15	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fold and several others were added in: rL125734 ...with no explanation for the one-use checks other than the code comments about register pressure. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. There are similar checks as noted with the TODO comments. I'm hoping to remove those restrictions too, but if any of these does cause a regression, it should be easier to correct by making small, individual commits. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. llvm-svn: 371940
*	[LoadStoreVectorizer] vectorizeLoadChain - ensure we find a valid Type down ↵	Simon Pilgrim	2019-09-15	1	-1/+2
\| \| \| \| \| \| \| \|	the load chain. NFCI. Silence static analyzer uninitialized variable warning by setting the LoadTy to null and then asserting we find a real value. llvm-svn: 371936
*	[SLP] limit vectorization of Constant subclasses (PR33958)	Sanjay Patel	2019-09-15	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a fix for: https://bugs.llvm.org/show_bug.cgi?id=33958 It seems universally true that we would not want to transform this kind of sequence on any target, but if that's not correct, then we could view this as a target-specific cost model problem. We could also white-list ConstantInt, ConstantFP, etc. rather than blacklist Global and ConstantExpr. Differential Revision: https://reviews.llvm.org/D67362 llvm-svn: 371931
*	[Attributor][Fix] Use right type to replace expressions	Johannes Doerfert	2019-09-14	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This should be obsolete once the functionality in D66967 is integrated. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67231 llvm-svn: 371915
*	[BasicBlockUtils] Add optional BBName argument, in line with BB:splitBasicBlock	Florian Hahn	2019-09-13	2	-4/+6
\| \| \| \| \| \| \| \| \| \|	Reviewers: spatel, asbirlea, craig.topper Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D67521 llvm-svn: 371819