bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merging r339260:	Tom Stellard	2018-11-30	12	-222/+1446
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	------------------------------------------------------------------------ r339260 \| syzaara \| 2018-08-08 08:20:43 -0700 (Wed, 08 Aug 2018) \| 13 lines [PowerPC] Improve codegen for vector loads using scalar_to_vector This patch aims to improve the codegen for vector loads involving the scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X) to utilize: LXSD and LXSDX for i64 and f64 LXSIWAX for i32 (sign extension to i64) LXSIWZX for i32 and f64 Committing on behalf of Amy Kwan. Differential Revision: https://reviews.llvm.org/D48950 ------------------------------------------------------------------------ llvm-svn: 347957
*	Merging r340641:	Hans Wennborg	2018-08-27	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	------------------------------------------------------------------------ r340641 \| stefanp \| 2018-08-24 21:38:29 +0200 (Fri, 24 Aug 2018) \| 9 lines [Exception Handling] Unwind tables are required for all functions that have an EH personality. This patch is for defect: https://bugs.llvm.org/show_bug.cgi?id=32611 Functions may require unwind tables even if they are marked with the attribute nounwind. Any function with an EH personality may require an unwind table. Differential Revision: https://reviews.llvm.org/D50987 ------------------------------------------------------------------------ llvm-svn: 340731
*	Merging r339769:	Hans Wennborg	2018-08-16	1	-0/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	------------------------------------------------------------------------ r339769 \| nemanjai \| 2018-08-15 14:58:13 +0200 (Wed, 15 Aug 2018) \| 12 lines [PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types When trying to combine a DAG that builds a vector out of sign-extensions of vector extracts, the code assumes legal input types. Due to that, we have to disable this combine prior to legalization. In some cases, the DAG will look slightly different after legalization so account for that in the matching code. This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087 Differential Revision: https://reviews.llvm.org/D49080 ------------------------------------------------------------------------ llvm-svn: 339859
*	Merging r338658:	Hans Wennborg	2018-08-02	1	-204/+153
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	------------------------------------------------------------------------ r338658 \| nemanjai \| 2018-08-02 02:03:22 +0200 (Thu, 02 Aug 2018) \| 13 lines [PowerPC] Do not round values prior to converting to integer Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector loses precision. This patch removes the code that adds these nodes to true f64 operands. It also adds patterns required to ensure the code is still vectorized rather than converting individual elements and inserting into a vector. Fixes https://bugs.llvm.org/show_bug.cgi?id=38342 Differential Revision: https://reviews.llvm.org/D50121 ------------------------------------------------------------------------ llvm-svn: 338678
*	[DAGCombiner] transform sub-of-shifted-signbit to add	Sanjay Patel	2018-07-30	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is exchanging a sub-of-1 with add-of-minus-1: https://rise4fun.com/Alive/plKAH This is another step towards improving select-of-constants codegen (see D48970). x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral. I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but I think canonicalizing to 'add' is more likely to produce further transforms because we have more folds for 'add'. Differential Revision: https://reviews.llvm.org/D49924 llvm-svn: 338317
*	[AArch64, PowerPC, x86] add more signbit math tests; NFC	Sanjay Patel	2018-07-27	1	-7/+32
\| \| \| \| \| \| \| \|	The tests with a constant sub operand were added with rL338143, but the potential transform doesn't have that requirement, so adding more tests with variable operands. llvm-svn: 338150
*	[AArch64, PowerPC, x86] add more signbit math tests; NFC	Sanjay Patel	2018-07-27	1	-0/+28
\| \| \| \|	llvm-svn: 338143
*	[DAGCombiner] fold 'not' with signbit math	Sanjay Patel	2018-07-27	1	-13/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a follow-up suggested in D48970. Alive proofs: https://rise4fun.com/Alive/sII We can eliminate an instruction in the usual select-of-constants to bit hack transform by adjusting the add/sub with constant. This is always a win. There are more transforms that are likely wins, but they may need target hooks in case some targets do not benefit. This is another step towards making up for canonicalizing to select-of-constants in rL331486. llvm-svn: 338132
*	[PowerPC] add more tests for signbit math; NFC	Sanjay Patel	2018-07-27	1	-0/+96
\| \| \| \|	llvm-svn: 338130
*	[SelectionDAG] try to convert funnel shift directly to rotate if legal	Sanjay Patel	2018-07-25	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \|	If the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. This sidesteps the issue of custom lowering for rotates raised in PR38243: https://bugs.llvm.org/show_bug.cgi?id=38243 ...by only dealing with legal operations. llvm-svn: 337966
*	[AArch, PowerPC] add more tests for legal rotate ops; NFC	Sanjay Patel	2018-07-25	1	-4/+25
\| \| \| \|	llvm-svn: 337964
*	[Power9] Code Cleanup - Remove needsAggressiveScheduling()	Stefan Pintilie	2018-07-19	12	-199/+164
\| \| \| \| \| \| \| \| \|	As we already return true from needsAggressiveScheduling() for the most recent hardware it would be cleaner to just return true for all PowerPC hardware. Differential Revision: https://reviews.llvm.org/D48663 llvm-svn: 337488
*	Introduce codegen for the Signal Processing Engine	Justin Hibbits	2018-07-18	5	-10/+599
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1, e500v2, and several e200 cores. This adds support targeting the e500v2, as this is more common than the e500v1, and is in SoCs still on the market. This patch is very intrusive because the SPE is binary incompatible with the traditional FPU. After discussing with others, the cleanest solution was to make both SPE and FPU features on top of a base PowerPC subset, so all FPU instructions are now wrapped with HasFPU predicates. Supported by this are: * Code generation following the SPE ABI at the LLVM IR level (calling conventions) * Single- and Double-precision math at the level supported by the APU. Still to do: * Vector operations * SPE intrinsics As this changes the Callee-saved register list order, one test, which tests the precise generated code, was updated to account for the new register order. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D44830 llvm-svn: 337347
*	[Intrinsics] define funnel shift IR intrinsics + DAG builder support	Sanjay Patel	2018-07-16	2	-0/+485
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" operation which may also be a single hardware instruction. Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working on that (much larger patch), but then I concluded that we don't need it. At least as a first step, we have all of the backend support necessary to match these ops...because it was required. And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are likely all that we'll ever need. There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes (along with improving the oversized shift documentation). Again, I don't think that's strictly necessary (as the test results here prove). That can be an efficiency improvement as a small follow-up patch. So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. Differential Revision: https://reviews.llvm.org/D49242 llvm-svn: 337221
*	[DAGCombiner] extend(ifpositive(X)) -> shift-right (not X)	Sanjay Patel	2018-07-15	3	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is almost the same as an existing IR canonicalization in instcombine, so I'm assuming this is a good early generic DAG combine too. The motivation comes from reduced bit-hacking for select-of-constants in IR after rL331486. We want to restore that functionality in the DAG as noted in the commit comments for that change and the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html The PPC and AArch tests show that those targets are already doing something similar. x86 will be neutral in the minimal case and generally better when this pattern is extended with other ops as shown in the signbit-shift.ll tests. Note the asymmetry: we don't include the (extend (ifneg X)) transform because it already exists in SimplifySelectCC(), and that is verified in the later unchanged tests in the signbit-shift.ll files. Without the 'not' op, the general transform to use a shift is always a win because that's a single instruction. Alive proofs: https://rise4fun.com/Alive/ysli Name: if pos, get -1 %c = icmp sgt i16 %x, -1 %r = sext i1 %c to i16 => %n = xor i16 %x, -1 %r = ashr i16 %n, 15 Name: if pos, get 1 %c = icmp sgt i16 %x, -1 %r = zext i1 %c to i16 => %n = xor i16 %x, -1 %r = lshr i16 %n, 15 Differential Revision: https://reviews.llvm.org/D48970 llvm-svn: 337130
*	[PowerPC] Materialize more constants with CR-field set in late peephole	Nemanja Ivanovic	2018-07-13	2	-3/+422
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Revision r322373 fixed a bug in how we materialize constants when the CR-field needs to be set. However the fix is overly conservative. It will only do the transform if AND-ing the input with the new constant produces the same new constant. This is of course correct, but not necessarily required. If there are no futher uses of the constant, the constant can be changed. If there are no uses of the GPR result, the final result of the materialization isn't important other than it needs to compare to zero correctly (lt, gt, eq). Differential revision: https://reviews.llvm.org/D42109 llvm-svn: 337008
*	[PowerPC] [NFC] Update __float128 tests	Stefan Pintilie	2018-07-12	9	-1067/+1074
\| \| \| \| \| \| \|	Add the two options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to the __float128 tests. Then modify the tests as required. llvm-svn: 336940
*	[FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests	Joel E. Denny	2018-07-11	4	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47171 This commit drops that patch's changes to: llvm/test/CodeGen/NVPTX/f16x2-instructions.ll llvm/test/CodeGen/NVPTX/param-load-store.ll For some reason, the dos line endings there prevent me from commiting via the monorepo. A follow-up commit (not via the monorepo) will finish the patch. llvm-svn: 336843
*	[Power9] Add remaining __flaot128 builtin support for FMA round to odd	Stefan Pintilie	2018-07-11	1	-43/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement this as it is done on GCC: __float128 a, b, c, d; a = __builtin_fmaf128_round_to_odd (b, c, d); // generates xsmaddqpo a = __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsmsubqpo a = - __builtin_fmaf128_round_to_odd (b, c, d); // generates xsnmaddqpo a = - __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsnmsubpqp Differential Revision: https://reviews.llvm.org/D48218 llvm-svn: 336754
*	[Power9] Add __float128 builtins for Rounding Operations	Stefan Pintilie	2018-07-09	1	-0/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added __float128 support for a number of rounding operations: trunc rint nearbyint round floor ceil Differential Revision: https://reviews.llvm.org/D48415 llvm-svn: 336601
*	[Power9] [LLVM] Add __float128 support for trunc to double round to odd	Stefan Pintilie	2018-07-09	1	-0/+10
\| \| \| \| \| \| \| \| \|	Add support for this builtin: double builtin_truncf128_round_to_odd(float128) Differential Revision: https://reviews.llvm.org/D48483 llvm-svn: 336595
*	[Power9] Add __float128 builtins for Round To Odd	Stefan Pintilie	2018-07-09	1	-0/+82
\| \| \| \| \| \| \| \| \| \| \| \|	GCC has builtins for these round to odd instructions: __float128 __builtin_sqrtf128_round_to_odd (__float128) __float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128) __float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128) Differential Revision: https://reviews.llvm.org/D47550 llvm-svn: 336578
*	[Power9] Add __float128 support for compare operations	Stefan Pintilie	2018-07-09	1	-0/+225
\| \| \| \| \| \| \| \|	Added handling for the select f128. Differential Revision: https://reviews.llvm.org/D48294 llvm-svn: 336548
*	[Power9] Add __float128 library call for frem	Stefan Pintilie	2018-07-06	1	-0/+14
\| \| \| \| \| \| \| \|	Power 9 does not have a hardware instruction for frem but we can call fmodf128. Differential Revision: https://reviews.llvm.org/D48552 llvm-svn: 336406
*	[Power9] Add lib calls for float128 operations with no equivalent PPC ↵	Lei Huang	2018-07-05	1	-1/+146
\| \| \| \| \| \| \| \| \| \| \|	instructions Map the following instructions to the proper float128 lib calls: pow[i], exp[2], log[2\|10], sin, cos, fmin, fmax Differential Revision: https://reviews.llvm.org/D48544 llvm-svn: 336361
*	[AArch64, PowerPC, x86] add tests for signbit bit hacks; NFC	Sanjay Patel	2018-07-05	1	-0/+151
\| \| \| \|	llvm-svn: 336348
*	[Power9] Optimize codgen for conversions of int to float128	Lei Huang	2018-07-05	2	-10/+136
\| \| \| \| \| \| \| \| \| \| \| \|	Optimize code sequences for integer conversion to fp128 when the integer is a result of: * float->int * float->long * double->int * double->long Differential Revision: https://reviews.llvm.org/D48429 llvm-svn: 336316
*	[Power9][NFC] add back-end tests for passing homogeneous fp128 aggregates by ↵	Lei Huang	2018-07-05	1	-3/+187
\| \| \| \| \| \| \| \| \| \| \|	value Tests to verify that we are passing fp128 via VSX registers as per ABI. These are related to clang commit rL336308. Differential Revision: https://reviews.llvm.org/D48310 llvm-svn: 336314
*	[Power9] Add tests for passing float128 in VSX reg for non-homogenous aggregates	Lei Huang	2018-07-05	1	-0/+206
\| \| \| \| \| \|	Add missing testcase for rL336310 llvm-svn: 336313
*	[Power9]Legalize and emit code for quad-precision convert from single-precision	Lei Huang	2018-07-05	2	-27/+162
\| \| \| \| \| \| \| \| \|	Legalize and emit code for quad-precision floating point operation conversion of single-precision value to quad-precision. Differential Revision: https://reviews.llvm.org/D47569 llvm-svn: 336307
*	[Power9] Implement float128 parameter passing and return values	Lei Huang	2018-07-05	1	-0/+268
\| \| \| \| \| \| \| \| \| \|	This patch enable parameter passing and return by value for float128 types. Passing aggregate/union which contain float128 members will be submitted in subsequent patches. Differential Revision: https://reviews.llvm.org/D47552 llvm-svn: 336306
*	[Power9]Legalize and emit code for round & convert quad-precision values	Lei Huang	2018-07-04	1	-0/+154
\| \| \| \| \| \| \| \| \|	Legalize and emit code for round & convert float128 to double precision and single precision. Differential Revision: https://reviews.llvm.org/D46997 llvm-svn: 336299
*	[PowerPC] Replace the Post RA List Scheduler with the Machine Scheduler	Stefan Pintilie	2018-07-04	76	-482/+457
\| \| \| \| \| \| \| \| \| \| \|	We want to run the Machine Scheduler instead of the List Scheduler after RA. Checked with a performance run on a Power 9 machine with SPEC 2006 and while some benchmarks improved and others degraded the geomean was slightly improved with the Machine Scheduler. Differential Revision: https://reviews.llvm.org/D45265 llvm-svn: 336295
*	NFC - Various typo fixes in tests	Gabor Buella	2018-07-04	1	-1/+1
\| \| \| \|	llvm-svn: 336268
*	[PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's ↵	QingShan Zhang	2018-07-02	1	-0/+124
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	multiple for i64 pre-inc load/store For the below case, pre-inc prep think it's a good candidate to use pre-inc for the bucket, but 64bit integer load/store update (pre-inc) instruction on Power requires the displacement field should be DS-form (4's multiple). Since it can't satisfy the constraint, we have to do some fix ups later. As below, the original load/stores could be well-form, it makes things worse. unsigned long long result = 0; unsigned long long foo(char p, unsigned long long n) { for (unsigned long long i = 0; i < n; i++) { unsigned long long x1 = (unsigned long long )(p - 50000 + i); unsigned long long x2 = (unsigned long long )(p - 61024 + i); unsigned long long x3 = (unsigned long long )(p - 62048 + i); unsigned long long x4 = (unsigned long long )(p - 64096 + i); result = x1 * x2 * x3 * x4; } return result; } Patch by jedilyn(Kewen Lin). Differential Revision: https://reviews.llvm.org/D48813 --This line, and those below, will be ignored-- M lib/Target/PowerPC/PPCLoopPreIncPrep.cpp A test/CodeGen/PowerPC/preincprep-i64-check.ll llvm-svn: 336074
*	[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros	Sanjay Patel	2018-06-27	4	-6/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not: #include <stdio.h> int main(int argc) { float x; x = -0.8 * argc; printf("%f\n", (float)((int)x)); return 0; } $ clang -O0 -mavx fp.c ; ./a.out 0.000000 $ clang -O1 -mavx fp.c ; ./a.out -0.000000 Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option. Differential Revision: https://reviews.llvm.org/D48085 llvm-svn: 335761
*	[DAGCombiner] eliminate setcc bool math when input is low-bit of some value	Sanjay Patel	2018-06-24	1	-32/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch has the same motivating example as D48466: define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) { %c.0282 = and i32 %c.0282.in, 268435455 %a16 = lshr i64 32508, %x %a17 = and i64 %a16, 1 %tobool = icmp eq i64 %a17, 0 %. = select i1 %tobool, i32 1, i32 2 %.286 = select i1 %tobool, i32 27, i32 26 %shr97 = lshr i32 %c.0282, %. %shl98 = shl i32 %c.0282.in, %.286 %or99 = or i32 %shr97, %shl98 %shr100 = lshr i32 %d.0280, %. %shl101 = shl i32 %d.0280, %.286 %or102 = or i32 %shr100, %shl101 store i32 %or99, i32* %ptr0 store i32 %or102, i32* %ptr1 ret void } ...but I'm trying to kill the setcc bool math sooner rather than later. By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, we can create a universally good fold because we always eliminate the condition code intermediate value. Here are Alive proofs for these (currently instcombine folds the 'add' variants, but misses the 'sub' patterns): https://rise4fun.com/Alive/Gsyp Name: sub of zext cmp mask %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %z = zext i1 %c to i32 %r = sub i32 C1, %z => %optional_cast = zext i8 %a to i32 %r = add i32 %optional_cast, C1-1 Name: add of zext cmp mask %a = and i32 %x, 1 %c = icmp eq i32 %a, 0 %z = zext i1 %c to i8 %r = add i8 %z, C1 => %optional_cast = trunc i32 %a to i8 %r = sub i8 C1+1, %optional_cast All of the tests look like improvements or neutral to me. But it is possible that x86 test+set+bitop is better than what we now show here. I suspect we could do better by adding another fold for the 'sub' variants. We start with select-of-constant in IR in the larger motivating test, so that's why I included tests with selects. Proofs for those variants: https://rise4fun.com/Alive/Bx1 Name: true const is bigger Pre: C2 == (C1 + 1) %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %r = select i1 %c, i64 C2, i64 C1 => %z = zext i8 %a to i64 %r = sub i64 C2, %z Name: false const is bigger Pre: C2 == (C1 + 1) %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %r = select i1 %c, i64 C1, i64 C2 => %z = zext i8 %a to i64 %r = add i64 C1, %z Differential Revision: https://reviews.llvm.org/D48466 llvm-svn: 335433
*	[PowerPC] add more tests for bit hacking opportunities with setcc; NFC	Sanjay Patel	2018-06-22	1	-0/+57
\| \| \| \| \| \|	Missed cases where the input and output are the same size in rL335390. llvm-svn: 335395
*	[PowerPC] add tests for bit hacking opportunities with setcc; NFC	Sanjay Patel	2018-06-22	1	-0/+114
\| \| \| \| \| \| \| \| \| \|	We likely gave up on folding some select-of-constants patterns in IR with rL331486, and we need to recover those in the DAG. The tests without select are based on our current DAGCombiner optimizations for select-of-constants. llvm-svn: 335390
*	[DebugInfo] Make sure all DBG_VALUEs' reguse operands have IsDebug property	Mikael Holmen	2018-06-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In some cases, these operands lacked the IsDebug property, which is meant to signal that they should not affect codegen. This patch adds a check for this property in the MachineVerifier and adds it where it was missing. This includes refactorings to use MachineInstrBuilder construction functions instead of manually setting up the intrinsic everywhere. Patch by: JesperAntonsson Reviewers: aprantl, rnk, echristo, javed.absar Reviewed By: aprantl Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D48319 llvm-svn: 335214
*	Generalize MergeBlockIntoPredecessor. Replace uses of ↵	Alina Sbirlea	2018-06-20	4	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MergeBasicBlockIntoOnlyPred. Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?) 2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken Uses of MergeBasicBlockIntoOnlyPred that need to be replaced: 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges. 2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll 3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement. Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D48202 llvm-svn: 335183
*	Allow binop C1, (select cc, CF, CT) -> select folding	Stanislav Mekhanoshin	2018-06-20	1	-126/+100
\| \| \| \| \| \| \| \| \| \|	Previously this folding was done only if select is a first operand. However, for non-commutative operations constant may go before select. Differential Revision: https://reviews.llvm.org/D48223 llvm-svn: 335167
*	[PowerPC] Fix label address calculation for ppc32	Strahinja Petrovic	2018-06-19	1	-0/+44
\| \| \| \| \| \| \| \|	This patch fixes calculating address of label on ppc32 (for -fPIC). Differential Revision: https://reviews.llvm.org/D46582 llvm-svn: 335043
*	If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction ↵	QingShan Zhang	2018-06-19	2	-2/+72
\| \| \| \| \| \| \| \| \| \| \|	when we are loading a floating, and expand it post RA basing on the register pressure. However, we miss to do the add-imm peephole for these pseudo instruction. Differential Revision: https://reviews.llvm.org/D47568 Reviewed By: Nemanjai llvm-svn: 335024
*	Tests for dag combine select (binop) -> select. NFC.	Stanislav Mekhanoshin	2018-06-18	1	-58/+332
\| \| \| \| \| \|	Tests will be updated with https://reviews.llvm.org/D48223 llvm-svn: 334987
*	Utilize new SDNode flag functionality to expand current support for fma	Michael Berg	2018-06-16	2	-40/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai Reviewed By: rampitec, nhaehnle Subscribers: tpr, nemanjai, wdng Differential Revision: https://reviews.llvm.org/D47918 llvm-svn: 334876
*	propagate fast math flags via IR on fma and sub expressions	Michael Berg	2018-06-07	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode. Reviewers: spatel, arsenm, hfinkel, javed.absar Reviewed By: spatel Subscribers: nemanjai, wdng Differential Revision: https://reviews.llvm.org/D47388 llvm-svn: 334242
*	[PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector	Hiroshi Inoue	2018-06-07	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	BitPermutationSelector sets Repl32 flag for bit groups which can be (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies lower 32 bits into upper 32 bits on 64-bit PowerPC before rotation. However, enforcing 32-bit instruction sometimes results in redundant generated code. For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into only rldimi instruction if Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF). uint64_t func(uint64_t a, uint64_t b) { return (a & 0xFFFFFFFF) \| (b << 32) ; } To avoid such problem, this patch checks the potential benefit of Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from Repl32 flag on this group. Differential Revision: https://reviews.llvm.org/D47867 llvm-svn: 334195
*	guard fsqrt with fmf sub flags	Michael Berg	2018-06-06	1	-10/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483. It contains only context for fsqrt. Reviewers: spatel, hfinkel, arsenm Reviewed By: spatel Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai Differential Revision: https://reviews.llvm.org/D47749 llvm-svn: 334113
*	guard fneg with fmf sub flags	Michael Berg	2018-06-05	1	-9/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483. Reviewers: spatel, hfinkel Reviewed By: spatel Subscribers: nemanjai Differential Revision: https://reviews.llvm.org/D47389 llvm-svn: 334037