bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[PHIElimination] Update the regression test for PR16508	Bjorn Pettersson	2018-09-30	2	-28/+92
	Summary: When PR16508 was solved (in rL185363), a regression test was added as test/CodeGen/PowerPC/2013-07-01-PHIElimBug.ll. I discovered that the test case no longer reproduced the scenario from PR16508. This could have been fixed by adding an extra RUN line with "-O1" (or possibly "-O0"), but instead I added a MIR reproducer, test/CodeGen/PowerPC/2013-07-01-PHIElimBug.mir, that is less sensitive to changes in earlier passes (including the O-level). While at it, I also corrected a code comment in PHIElimination::EliminatePHINodes that has been incorrect since the related bugfix in rL185363. Reviewers: MatzeB, hfinkel Reviewed By: MatzeB Subscribers: nemanjai, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52553 llvm-svn: 343416
*	[PowerPC] optimize conditional branch on CRSET/CRUNSET	Hiroshi Inoue	2018-09-26	2	-0/+264
	This patch adds a check to optimize conditional branches (BC and BCn) based on a constant set by CRSET or CRUNSET. Other optimizers, such as block placement, may generate such code, so this is done at the very end of optimization, in the pre-emit peephole pass. A conditional branch based on a constant is either eliminated or converted into an unconditional branch. The CRSET/CRUNSET is also eliminated if the condition code register is not used by any instruction other than the branch being optimized. Differential Revision: https://reviews.llvm.org/D52345 llvm-svn: 343100
*	[Power9] [LLVM] Add __float128 exponent GET and SET builtins	Stefan Pintilie	2018-09-24	1	-0/+36
	Added the builtins __builtin_vsx_scalar_extract_expq and __builtin_vsx_scalar_insert_exp_qp. The builtins should behave the same way as in GCC. Differential Revision: https://reviews.llvm.org/D48185 llvm-svn: 342910
*	[PowerPC] Support operand modifier 'x' in inline asm	Zaara Syeda	2018-09-24	1	-0/+22
	GCC uses operand modifier 'x' in inline asm for VSX registers. Without this modifier, instructions which use VSX numbering for their operands are printed as VMX registers. This patch adds support for the operand modifier 'x'. Differential Revision: https://reviews.llvm.org/D52244 llvm-svn: 342882
*	[PowerPC] Fix the assert of combineBVOfConsecutiveLoads when element num is 1	QingShan Zhang	2018-09-20	1	-0/+17
	Building a vector out of multiple loads can be converted to a single load of the vector type if the loads are consecutive. The special case is a vector with only one element, such as <1 x i128>; exit early in that case to fix the assert. Patch By: wuzish (Zixuan Wu) Differential Revision: https://reviews.llvm.org/D52072 llvm-svn: 342611
*	[PowerPC] Do not emit record-form rotates when record-form andi/andis suffices	Nemanja Ivanovic	2018-09-18	3	-22/+83
	This is a follow-up to the previous patch that eliminated some of the rotates. With this addition, we will also emit the record-form andis. This patch increases the number of record-form rotates we eliminate by more than 70%. Differential revision: https://reviews.llvm.org/D44897 llvm-svn: 342478
*	[PowerPC] Optimize compares fed by ANDISo	Nemanja Ivanovic	2018-09-18	1	-0/+44
	Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions. When optimizing compares, we handle the former in order to eliminate the compare instruction but not the latter. This patch just adds the latter to the set of instructions we optimize. The reason these instructions need to be handled separately is that they are not part of the RecFormRel map (since they don't have a non-record-form). The missing "and-immediate-shifted" is just an oversight in the initial implementation. Differential revision: https://reviews.llvm.org/D51353 llvm-svn: 342472
*	Remove trailing whitespace introduced in r342440.	Alexander Kornienko	2018-09-18	1	-3/+3
	llvm-svn: 342463
*	[PowerPC] Add Itineraries of IIC_IntMulHD for P7/P8	QingShan Zhang	2018-09-18	1	-1/+1
	When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to the machine scheduler, those missing itineraries might not affect actual scheduling, because we still get the same latency from the default values. With the machine scheduler, however, itineraries do affect scheduling. For example, NumMicroOps defaults to 0 if there are no itineraries for a specific instruction class, while most instruction classes with itineraries have NumMicroOps default to 1. This affects the count of RetiredMOps and the Pending/Available queues, which in turn causes different or suboptimal scheduling. Patch By: jsji (Jinsong Ji) Differential Revision: https://reviews.llvm.org/D52040 llvm-svn: 342441
*	[PowerPC][NFC] Add a mulld testcase for scheduling check.	QingShan Zhang	2018-09-18	1	-0/+53
	This patch adds a mulld testcase for the scheduling check. Patch By: jsji (Jinsong Ji) Differential Revision: https://reviews.llvm.org/D52039 llvm-svn: 342440
*	[PowerPC] Fix label address calculation for ppc64	Strahinja Petrovic	2018-09-17	1	-0/+21
	This patch fixes the calculation of label addresses for non-PIC ppc64. Differential Revision: https://reviews.llvm.org/D50965 llvm-svn: 342368
*	[PowerPC] Fix the calling convention for i1 arguments on PPC32	Lion Yang	2018-09-14	1	-0/+24
	Summary: Integer types smaller than i32 must be extended to i32 by default. The feature "crbits" introduced at r202451 handles i1 as a special case, but it did not extend properly. The caller was, therefore, passing i1 stack arguments by writing 0/1 to the first byte of the 4-byte stack object, and the callee was reading the first byte for the value. "crbits" is enabled if the optimization level is greater than 1, which is very common in "release builds". Such discrepancies with the ABI specification also introduce potential incompatibility with programs or libraries built with other compilers, e.g. GCC. Fixes PR38661 Reviewers: hfinkel, cuviper Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D51108 llvm-svn: 342288
*	[PowerPC] Combine ADD to ADDZE	QingShan Zhang	2018-09-07	1	-0/+172
	On the ppc64le platform, if the IR has the following form,

	    define i64 @addze1(i64 %x, i64 %z) local_unnamed_addr #0 {
	    entry:
	      %cmp = icmp ne i64 %z, CONSTANT      (-32767 <= CONSTANT <= 32768)
	      %conv1 = zext i1 %cmp to i64
	      %add = add nsw i64 %conv1, %x
	      ret i64 %add
	    }

	we can optimize it to the form below.

	                                    when C == 0
	                                --> addze X, (addic Z, -1)
	                               /
	    add X, (zext(setne Z, C))--
	                               \    when -32768 <= -C <= 32767 && C != 0
	                                --> addze X, (addic (addi Z, -C), -1)

	Patch By: HLJ2009 (Li Jia He) Differential Revision: https://reviews.llvm.org/D51403 Reviewed By: Nemanjai llvm-svn: 341634
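The carry trick behind this combine can be checked with a small model. The sketch below is illustrative only: it treats addi, addic and addze as plain 64-bit arithmetic, and the helper names (addic, addze, add_setne, add_setne_via_addze) are hypothetical Python functions, not LLVM APIs. The key assumption is that addic sets the carry bit to the unsigned carry-out of the addition, so adding -1 (all ones) carries exactly when the other operand is nonzero.

```python
MASK64 = (1 << 64) - 1


def addic(a, imm):
    """Model of 'add immediate carrying': return (result, carry_out)."""
    total = (a & MASK64) + (imm & MASK64)
    return total & MASK64, total >> 64


def addze(x, carry):
    """Model of 'add to zero extended': x plus the carry bit."""
    return (x + carry) & MASK64


def add_setne(x, z, c):
    """Reference form: add X, (zext (setne Z, C))."""
    return (x + (1 if (z & MASK64) != (c & MASK64) else 0)) & MASK64


def add_setne_via_addze(x, z, c):
    """Combined form: addze X, (addic (addi Z, -C), -1)."""
    t = (z - c) & MASK64      # addi Z, -C (a no-op when C == 0)
    _, carry = addic(t, -1)   # carry is 1 exactly when t != 0
    return addze(x, carry)
```

Both functions agree for every input in this model, because `t` is zero exactly when `Z == C`.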
*	[PowerPC] Add Itineraries of IIC_IntRotateDI for P7/P8	QingShan Zhang	2018-09-03	4	-67/+67
	When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to the machine scheduler, those missing itineraries might not affect actual scheduling, because we still get the same latency from the default values. With the machine scheduler, however, itineraries do affect scheduling. For example, NumMicroOps defaults to 0 if there are no itineraries for a specific instruction class, while most instruction classes with itineraries have NumMicroOps default to 1. This affects the count of RetiredMOps and the Pending/Available queues, which in turn causes different or suboptimal scheduling. Patch by jsji (Jinsong Ji) Differential Revision: https://reviews.llvm.org/D51506 llvm-svn: 341293
*	[PPC] Remove Darwin support from POWER backend.	Kit Barton	2018-08-28	119	-652/+557
	This patch issues an error message if the Darwin ABI is attempted with the PPC backend. It also cleans up existing test cases, either converting the test to use an alternative triple or removing the test if the coverage is no longer needed.

	Updated Tests
	-------------
	The majority of test cases were updated to use a different triple that does not include the Darwin ABI. Many tests were also updated to use FileCheck, in place of grep.

	Deleted Tests
	-------------
	llvm/test/tools/dsymutil/PowerPC/sibling.test was originally added to test specific functionality of dsymutil using an object file created with an old version of llvm-gcc for a Powerbook G4. After a discussion with @JDevlieghere, he suggested removing the test.
	llvm/test/CodeGen/PowerPC/combine_loads_from_build_pair.ll was converted from a PPC test to a SystemZ test, as the behavior is also reproducible there.
	All other tests that were deleted were specific to the darwin/ppc ABI and no longer necessary.

	Phabricator Review: https://reviews.llvm.org/D50988 llvm-svn: 340795
*	[PowerPC] Revert commit r339779	Nemanja Ivanovic	2018-08-27	2	-100/+7
	This commit has caused failures in some internal benchmarks. Temporarily reverting this patch until the issue can be diagnosed and fixed. llvm-svn: 340740
*	[PowerPC] Recommit r340016 after fixing the reported issue	Nemanja Ivanovic	2018-08-27	1	-0/+17
	The internal benchmark failure reported by Google was due to a missing check for the result type for the sign-extend and shift DAG. This commit adds the check and re-commits the patch. llvm-svn: 340734
*	[PowerPC] Emit xscpsgndp instead of xxlor when copying floating point scalar registers for P9	Stefan Pintilie	2018-08-24	5	-14/+62
	This patch uses the xscpsgndp instruction to copy floating point scalar registers instead of the xxlor (specifically XXLORf) instruction that is currently used. The use of xscpsgndp applies to P9 only; pre-P9 targets still use xxlor. Patch by amyk Differential Revision: https://reviews.llvm.org/D50004 llvm-svn: 340643
*	[Exception Handling] Unwind tables are required for all functions that have an EH personality.	Stefan Pintilie	2018-08-24	1	-0/+51
	This patch is for defect: https://bugs.llvm.org/show_bug.cgi?id=32611 Functions may require unwind tables even if they are marked with the attribute nounwind. Any function with an EH personality may require an unwind table. Differential Revision: https://reviews.llvm.org/D50987 llvm-svn: 340641
*	[PowerPC] Change Test Options [NFC]	Stefan Pintilie	2018-08-24	3	-750/+782
	Patch by amyk llvm-svn: 340639
*	Revert "[Exception Handling] Unwind tables are required for all functions that have an EH personality."	Stefan Pintilie	2018-08-24	1	-51/+0
	This reverts commit rL340614. Previous commit broke some llvm-cfi-verify tests. llvm-svn: 340625
*	[Exception Handling] Unwind tables are required for all functions that have an EH personality.	Stefan Pintilie	2018-08-24	1	-0/+51
	This patch is for defect: https://bugs.llvm.org/show_bug.cgi?id=32611 Functions may require unwind tables even if they are marked with the attribute nounwind. Any function with an EH personality may require an unwind table. Differential Revision: https://reviews.llvm.org/D50987 llvm-svn: 340614
*	Temporarily Revert "[PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction" due to it causing a compiler crash on valid.	Eric Christopher	2018-08-21	1	-17/+0
	This reverts commit r340016, testcase forthcoming. llvm-svn: 340315
*	[PowerPC] Add a peephole post RA to transform the inst that fed by add	QingShan Zhang	2018-08-20	18	-107/+109
	If the arch is P8, we select XFLOAD to load the floating point value and then expand it to a VSX or non-VSX X-form instruction post RA. This patch converts the X-form to the D-form when one operand of the X-form instruction is the special Zero register and the other operand is fed by an add instruction, i.e.

	    y = add imm, reg
	    LFDX 0, y
	    -->
	    LFD imm(reg)

	Reviewers: Nemanjai Differential Revision: https://reviews.llvm.org/D49007 llvm-svn: 340149
*	[PowerPC] Generate lxsd instead of the ld->mtvsrd sequence for vector loads	Stefan Pintilie	2018-08-17	1	-0/+274
	This patch addresses:
	- Implementation within PPCISelLowering.cpp to check if we should use direct load into vector instructions (such as lxsd/lfd) when the scalar_to_vector function is used, which will allow us to catch as many cases of the scalar_to_vector uses as possible to translate the ld->mtvsrd sequence into lxsd.
	- Test cases to exhibit the behaviour of emitting lxsd/lfd.
	Patch by amyk Differential revision: https://reviews.llvm.org/D49698 llvm-svn: 340037
*	[PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction	Nemanja Ivanovic	2018-08-17	1	-0/+17
	Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli extend sign and shift immediate instruction. Patch by RolandF. Differential revision: https://reviews.llvm.org/D49879 llvm-svn: 340016
*	[SelectionDAG] Improve the legalisation lowering of UMULO.	Eli Friedman	2018-08-16	1	-0/+177
	There is no way in the universe, that doing a full-width division in software will be faster than doing overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway.

	This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations.

	Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16 bit integers against an obviously correct widening multiplication. Barring any oversights introduced by porting the algorithm to DAG, confidence in correctness of this algorithm is extremely high.

	The following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of the original, so anything under 1 is “better” and anything above 1 is worse.

	+-------+-----------+-----------+-------------+-------------+
	| Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
	+-------+-----------+-----------+-------------+-------------+
	| X64   | -         | -         | ~0.5        | ~0.64       |
	| i686  | ~0.5      | ~0.6666   | ~0.05       | ~0.9        |
	| armv7 | -         | ~0.75     | -           | ~1.4        |
	+-------+-----------+-----------+-------------+-------------+

	Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64 (one Intel Haswell, other AMD Ryzen) based machines. Size numbers have been collected by looking at the size of the function containing an overflowing multiply in a loop.

	All in all, it can be seen that both performance and size have improved, except in the case of armv7 where code size has regressed for 128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms seems to benefit from this change a lot, taking only 5% of the time compared to the original algorithm to calculate the same thing.

	The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible.

	Patch by Simonas Kazlauskas! Differential Revision: https://reviews.llvm.org/D50310 llvm-svn: 339922
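As a rough illustration of the idea of an overflowing multiply built from half-width operations, here is a minimal Python sketch. It is not the DAG lowering from the patch; the function name `umulo` and its `(result, overflowed)` return shape are assumptions made for this example, and `bits` is assumed to be even with both operands already in range.

```python
def umulo(a, b, bits):
    """Unsigned multiply with overflow flag, using only half-width pieces.

    Returns (product mod 2**bits, overflowed).
    """
    half = bits // 2
    half_mask = (1 << half) - 1
    full_mask = (1 << bits) - 1

    a_hi, a_lo = a >> half, a & half_mask
    b_hi, b_lo = b >> half, b & half_mask

    # If both high halves are nonzero, the product needs more than `bits` bits.
    overflow = a_hi != 0 and b_hi != 0

    # Cross terms are weighted by 2**half, so each must itself fit in a half.
    cross1 = a_hi * b_lo
    cross2 = b_hi * a_lo
    overflow = overflow or cross1 > half_mask or cross2 > half_mask

    # The low halves multiply without overflow: half x half fits in `bits` bits.
    low = a_lo * b_lo

    # Combine, watching for a carry out of the final full-width addition.
    cross = ((cross1 + cross2) << half) & full_mask
    total = cross + low
    overflow = overflow or total > full_mask
    return total & full_mask, overflow
```

In the spirit of the verification described in the commit message, the sketch can be compared exhaustively against a widening multiply for a small width, for example `umulo(a, b, 8) == ((a * b) & 0xFF, a * b > 0xFF)` for all 8-bit `a` and `b`.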
*	[PowerPC] Enhance the selection(ISD::VSELECT) of vector type	Nemanja Ivanovic	2018-08-15	1	-5/+98
	Make ISD::VSELECT legal whenever Altivec instructions are available; otherwise its default behavior is to expand. Use xxsel to match vselect if VSX is enabled, and vsel otherwise. To avoid writing many patterns in the td file, promote (for vectors, this is a bitcast) all other types to v4i32 and only pattern-match vselect of v4i32 into vsel or xxsel. Patch by wuzish Differential revision: https://reviews.llvm.org/D49531 llvm-svn: 339779
*	[PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types	Nemanja Ivanovic	2018-08-15	1	-0/+55
	When trying to combine a DAG that builds a vector out of sign-extensions of vector extracts, the code assumes legal input types. Due to that, we have to disable this combine prior to legalization. In some cases, the DAG will look slightly different after legalization, so account for that in the matching code. This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087 Differential Revision: https://reviews.llvm.org/D49080 llvm-svn: 339769
*	[SelectionDAG] try harder to convert funnel shift to rotate	Sanjay Patel	2018-08-09	1	-3/+1
	Similar to rL337966 - if the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. AArch only goes right, and PPC only goes left. x86 has both, so no diffs there. Differential Revision: https://reviews.llvm.org/D50091 llvm-svn: 339359
*	[PowerPC] Improve codegen for vector loads using scalar_to_vector	Zaara Syeda	2018-08-08	11	-219/+1444
	This patch aims to improve the codegen for vector loads involving the scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X) to utilize:
	- LXSD and LXSDX for i64 and f64
	- LXSIWAX for i32 (sign extension to i64)
	- LXSIWZX for i32 and f64
	Committing on behalf of Amy Kwan. Differential Revision: https://reviews.llvm.org/D48950 llvm-svn: 339260
*	[PowerPC] Do not round values prior to converting to integer	Nemanja Ivanovic	2018-08-02	1	-204/+153
	Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector loses precision. This patch removes the code that adds these nodes to true f64 operands. It also adds patterns required to ensure the code is still vectorized rather than converting individual elements and inserting into a vector. Fixes https://bugs.llvm.org/show_bug.cgi?id=38342 Differential Revision: https://reviews.llvm.org/D50121 llvm-svn: 338658
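The precision loss described above can be reproduced outside the compiler. The following Python sketch is only an analogy, not the DAG combine itself: it rounds a double to single precision with the standard `struct` module before converting to an integer, and the helper names are hypothetical.

```python
import struct


def round_to_f32(x):
    """Round a Python float (an IEEE double) to single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f64_to_int(x):
    """Convert the f64 value directly."""
    return int(x)


def f64_to_int_via_f32(x):
    """Insert an extra rounding step before the conversion."""
    return int(round_to_f32(x))


# 2**24 + 1 is exact in f64 but not representable in f32.
value = 16777217.0
```

For `value`, `f64_to_int` returns 16777217, while `f64_to_int_via_f32` returns a different integer because the intermediate single-precision value cannot hold the low bit.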
*	[SelectionDAG] fix bug in translating funnel shift with non-power-of-2 type	Sanjay Patel	2018-08-01	2	-40/+24
	The bug is visible in the constant-folded x86 tests. We can't use the negated shift amount when the type is not power-of-2: https://rise4fun.com/Alive/US1r ...so in that case, use the regular lowering that includes a select to guard against a shift-by-bitwidth. This path is improved by only calculating the modulo shift amount once now. Also, improve the rotate (with power-of-2 size) lowering to use a negate rather than subtract from bitwidth. This improves the codegen whether we have a rotate instruction or not (although we can still see that we're not matching to a legal rotate in all cases). llvm-svn: 338592
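Why the negated shift amount only works for power-of-2 widths can be seen with a small model. The sketch below uses a rotate (a funnel shift with both inputs equal) for simplicity; the function names are hypothetical, and `x` is assumed to already fit in `bw` bits.

```python
def rotl_select(x, amt, bw):
    """Rotate left with an explicit guard against shifting by the bit width."""
    mask = (1 << bw) - 1
    s = amt % bw
    if s == 0:                      # plays the role of the 'select'
        return x & mask
    return ((x << s) | (x >> (bw - s))) & mask


def rotl_negate(x, amt, bw):
    """Rotate left using (-amt) & (bw - 1) as the right-shift amount."""
    mask = (1 << bw) - 1
    left = amt & (bw - 1)
    right = -amt & (bw - 1)
    return ((x << left) | (x >> right)) & mask
```

With `bw = 8` the two functions agree for every input, because masking with `bw - 1` is the same as reducing modulo `bw`. With `bw = 7` they do not: `rotl_select(1, 1, 7)` gives 2 but `rotl_negate(1, 1, 7)` gives 1, since `& 6` is not a modulo-7 reduction.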
*	[DAGCombiner] transform sub-of-shifted-signbit to add	Sanjay Patel	2018-07-30	1	-8/+8
	This is exchanging a sub-of-1 with add-of-minus-1: https://rise4fun.com/Alive/plKAH This is another step towards improving select-of-constants codegen (see D48970). x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral. I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but I think canonicalizing to 'add' is more likely to produce further transforms because we have more folds for 'add'. Differential Revision: https://reviews.llvm.org/D49924 llvm-svn: 338317
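A small model makes the "sub-of-1 versus add-of-minus-1" equivalence concrete. This sketch assumes the transform has the shape sub Y, (lshr X, bw-1) --> add Y, (ashr X, bw-1), which is a reading of the commit title rather than a quote from the patch; the helper names are hypothetical.

```python
def to_signed(x, bw):
    """Interpret the low bw bits of x as a two's-complement value."""
    x &= (1 << bw) - 1
    return x - (1 << bw) if x >> (bw - 1) else x


def sub_of_lshr(y, x, bw):
    """Y - (X >>u (bw-1)): subtracts 1 when the sign bit of X is set."""
    mask = (1 << bw) - 1
    return (y - ((x & mask) >> (bw - 1))) & mask


def add_of_ashr(y, x, bw):
    """Y + (X >>s (bw-1)): adds -1 when the sign bit of X is set."""
    mask = (1 << bw) - 1
    return (y + (to_signed(x, bw) >> (bw - 1))) & mask
```

The logical shift yields 0 or 1 and the arithmetic shift yields 0 or -1, so subtracting the former equals adding the latter for all inputs.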
*	[AArch64, PowerPC, x86] add more signbit math tests; NFC	Sanjay Patel	2018-07-27	1	-7/+32
	The tests with a constant sub operand were added with rL338143, but the potential transform doesn't have that requirement, so adding more tests with variable operands. llvm-svn: 338150
*	[AArch64, PowerPC, x86] add more signbit math tests; NFC	Sanjay Patel	2018-07-27	1	-0/+28
	llvm-svn: 338143
*	[DAGCombiner] fold 'not' with signbit math	Sanjay Patel	2018-07-27	1	-13/+8
	This is a follow-up suggested in D48970. Alive proofs: https://rise4fun.com/Alive/sII We can eliminate an instruction in the usual select-of-constants to bit hack transform by adjusting the add/sub with constant. This is always a win. There are more transforms that are likely wins, but they may need target hooks in case some targets do not benefit. This is another step towards making up for canonicalizing to select-of-constants in rL331486. llvm-svn: 338132
*	[PowerPC] add more tests for signbit math; NFC	Sanjay Patel	2018-07-27	1	-0/+96
	llvm-svn: 338130
*	[SelectionDAG] try to convert funnel shift directly to rotate if legal	Sanjay Patel	2018-07-25	1	-3/+1
	If the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. This sidesteps the issue of custom lowering for rotates raised in PR38243: https://bugs.llvm.org/show_bug.cgi?id=38243 ...by only dealing with legal operations. llvm-svn: 337966
*	[AArch, PowerPC] add more tests for legal rotate ops; NFC	Sanjay Patel	2018-07-25	1	-4/+25
	llvm-svn: 337964
*	[Power9] Code Cleanup - Remove needsAggressiveScheduling()	Stefan Pintilie	2018-07-19	12	-199/+164
	As we already return true from needsAggressiveScheduling() for the most recent hardware, it would be cleaner to just return true for all PowerPC hardware. Differential Revision: https://reviews.llvm.org/D48663 llvm-svn: 337488
*	Introduce codegen for the Signal Processing Engine	Justin Hibbits	2018-07-18	5	-10/+599
	Summary: The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1, e500v2, and several e200 cores. This adds support targeting the e500v2, as this is more common than the e500v1, and is in SoCs still on the market. This patch is very intrusive because the SPE is binary incompatible with the traditional FPU. After discussing with others, the cleanest solution was to make both SPE and FPU features on top of a base PowerPC subset, so all FPU instructions are now wrapped with HasFPU predicates.

	Supported by this are:
	* Code generation following the SPE ABI at the LLVM IR level (calling conventions)
	* Single- and Double-precision math at the level supported by the APU.

	Still to do:
	* Vector operations
	* SPE intrinsics

	As this changes the Callee-saved register list order, one test, which tests the precise generated code, was updated to account for the new register order. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D44830 llvm-svn: 337347
*	[Intrinsics] define funnel shift IR intrinsics + DAG builder support	Sanjay Patel	2018-07-16	2	-0/+485
	As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" operation which may also be a single hardware instruction. Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working on that (much larger patch), but then I concluded that we don't need it. At least as a first step, we have all of the backend support necessary to match these ops...because it was required. And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are likely all that we'll ever need. There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes (along with improving the oversized shift documentation). Again, I don't think that's strictly necessary (as the test results here prove). That can be an efficiency improvement as a small follow-up patch. So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. Differential Revision: https://reviews.llvm.org/D49242 llvm-svn: 337221
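For readers unfamiliar with the operation, a funnel shift can be modeled as follows: concatenate the two inputs, shift by the amount modulo the bit width, and extract one half. The Python sketch below is an illustrative model of that description, not code from the patch; the function names mirror the intrinsic names, while the explicit `bw` parameter is an assumption of this example.

```python
def fshl(a, b, c, bw):
    """Funnel shift left: high bw bits of (a:b) shifted left by c mod bw."""
    mask = (1 << bw) - 1
    s = c % bw
    concat = ((a & mask) << bw) | (b & mask)
    return ((concat << s) >> bw) & mask


def fshr(a, b, c, bw):
    """Funnel shift right: low bw bits of (a:b) shifted right by c mod bw."""
    mask = (1 << bw) - 1
    s = c % bw
    concat = ((a & mask) << bw) | (b & mask)
    return (concat >> s) & mask
```

Passing the same value for both inputs gives a rotate: `fshl(x, x, c, bw)` rotates `x` left by `c`, and `fshr(x, x, c, bw)` rotates it right.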
*	[DAGCombiner] extend(ifpositive(X)) -> shift-right (not X)	Sanjay Patel	2018-07-15	3	-12/+12
	This is almost the same as an existing IR canonicalization in instcombine, so I'm assuming this is a good early generic DAG combine too. The motivation comes from reduced bit-hacking for select-of-constants in IR after rL331486. We want to restore that functionality in the DAG as noted in the commit comments for that change and the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html The PPC and AArch tests show that those targets are already doing something similar. x86 will be neutral in the minimal case and generally better when this pattern is extended with other ops as shown in the signbit-shift.ll tests. Note the asymmetry: we don't include the (extend (ifneg X)) transform because it already exists in SimplifySelectCC(), and that is verified in the later unchanged tests in the signbit-shift.ll files. Without the 'not' op, the general transform to use a shift is always a win because that's a single instruction.

	Alive proofs: https://rise4fun.com/Alive/ysli

	    Name: if pos, get -1
	    %c = icmp sgt i16 %x, -1
	    %r = sext i1 %c to i16
	    =>
	    %n = xor i16 %x, -1
	    %r = ashr i16 %n, 15

	    Name: if pos, get 1
	    %c = icmp sgt i16 %x, -1
	    %r = zext i1 %c to i16
	    =>
	    %n = xor i16 %x, -1
	    %r = lshr i16 %n, 15

	Differential Revision: https://reviews.llvm.org/D48970 llvm-svn: 337130
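The two i16 equivalences from the Alive proofs are small enough to check by brute force. The sketch below models them in Python on 16-bit values; the function names are hypothetical and exist only for this illustration.

```python
BW = 16
MASK = (1 << BW) - 1


def to_signed(x):
    """Interpret a 16-bit pattern as a two's-complement value."""
    x &= MASK
    return x - (1 << BW) if x >> (BW - 1) else x


def sext_ifpos(x):
    """sext (icmp sgt x, -1): all ones if x is non-negative, else 0."""
    return MASK if to_signed(x) > -1 else 0


def zext_ifpos(x):
    """zext (icmp sgt x, -1): 1 if x is non-negative, else 0."""
    return 1 if to_signed(x) > -1 else 0


def ashr_not(x):
    """ashr (xor x, -1), 15: arithmetic shift of the inverted value."""
    return (to_signed(~x & MASK) >> (BW - 1)) & MASK


def lshr_not(x):
    """lshr (xor x, -1), 15: logical shift of the inverted value."""
    return (~x & MASK) >> (BW - 1)
```

Looping `x` over `range(1 << 16)` and comparing `sext_ifpos(x)` with `ashr_not(x)`, and `zext_ifpos(x)` with `lshr_not(x)`, confirms both rewrites in this model.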
*	[PowerPC] Materialize more constants with CR-field set in late peephole	Nemanja Ivanovic	2018-07-13	2	-3/+422
	Revision r322373 fixed a bug in how we materialize constants when the CR-field needs to be set. However, the fix is overly conservative. It will only do the transform if AND-ing the input with the new constant produces the same new constant. This is of course correct, but not necessarily required. If there are no further uses of the constant, the constant can be changed. If there are no uses of the GPR result, the final result of the materialization isn't important other than it needs to compare to zero correctly (lt, gt, eq). Differential revision: https://reviews.llvm.org/D42109 llvm-svn: 337008
*	[PowerPC] [NFC] Update __float128 tests	Stefan Pintilie	2018-07-12	9	-1067/+1074
	Add the two options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to the __float128 tests. Then modify the tests as required. llvm-svn: 336940
*	[FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests	Joel E. Denny	2018-07-11	4	-13/+13
	See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47171 This commit drops that patch's changes to: llvm/test/CodeGen/NVPTX/f16x2-instructions.ll llvm/test/CodeGen/NVPTX/param-load-store.ll For some reason, the DOS line endings there prevent me from committing via the monorepo. A follow-up commit (not via the monorepo) will finish the patch. llvm-svn: 336843
*	[Power9] Add remaining __flaot128 builtin support for FMA round to odd	Stefan Pintilie	2018-07-11	1	-43/+54
	Implement this as it is done on GCC:

	    __float128 a, b, c, d;
	    a = __builtin_fmaf128_round_to_odd (b, c, d);     // generates xsmaddqpo
	    a = __builtin_fmaf128_round_to_odd (b, c, -d);    // generates xsmsubqpo
	    a = - __builtin_fmaf128_round_to_odd (b, c, d);   // generates xsnmaddqpo
	    a = - __builtin_fmaf128_round_to_odd (b, c, -d);  // generates xsnmsubqpo

	Differential Revision: https://reviews.llvm.org/D48218 llvm-svn: 336754
*	[Power9] Add __float128 builtins for Rounding Operations	Stefan Pintilie	2018-07-09	1	-0/+76
	Added __float128 support for a number of rounding operations: trunc, rint, nearbyint, round, floor, ceil. Differential Revision: https://reviews.llvm.org/D48415 llvm-svn: 336601
*	[Power9] [LLVM] Add __float128 support for trunc to double round to odd	Stefan Pintilie	2018-07-09	1	-0/+10
	Add support for this builtin: double __builtin_truncf128_round_to_odd(__float128) Differential Revision: https://reviews.llvm.org/D48483 llvm-svn: 336595