path: root/llvm/lib/Target/PowerPC
* [PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg (Bill Schmidt, 2014-10-21; 1 file, -0/+6)
  With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in the FMA
  mutation pass. If we have a situation where a killed product register is the
  same register as the FMA target, such as:

    %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5, %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11

  then the substitution makes no sense. We end up getting a crash when we try
  to extend the interval associated with the killed product register, as there
  is already a live range for %vreg5 there. This patch just disables the
  mutation under those circumstances.

  Since recipest.ll generates different code with VSX enabled, I've modified
  that test to use -mattr=-vsx. I've borrowed the code from that test that
  exposed the bug and placed it in fma-mutate.ll, where it tests several
  mutation opportunities including the "bad" one.

  llvm-svn: 220290
* [PowerPC] Change assert to better form (Bill Schmidt, 2014-10-17; 1 file, -3/+3)
  llvm-svn: 220092
* [PowerPC] Change liveness testing in VSX FMA mutation pass (Bill Schmidt, 2014-10-17; 1 file, -8/+20)
  With VSX enabled, LLVM crashes when compiling test/CodeGen/PowerPC/fma.ll.
  I traced this to the liveness test that's revised in this patch. The
  interval test is designed to only work for virtual registers, but in this
  case the AddendSrcReg is physical. Since there is already a walk of the MIs
  between the AddendMI and the FMA, I added a check for def/kill of the
  AddendSrcReg in that loop. At Hal Finkel's request, I converted the
  liveness test to an assert restricted to virtual registers.

  I've changed the fma.ll test to have VSX and non-VSX variants so we can
  test both kinds of multiply-adds.

  llvm-svn: 220090
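
  As an editorial illustration only, here is a minimal C++ sketch of the kind
  of def/kill walk described above; the helper name and parameters are
  assumptions, not the pass's actual identifiers:

  ```
  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/Target/TargetRegisterInfo.h"

  // Sketch only: true if anything strictly between the addend copy and the
  // FMA defines or kills the addend's source register, in which case the
  // mutation must be skipped (this also covers physical registers, which the
  // live-interval test cannot handle).
  static bool addendSrcTouched(llvm::MachineInstr *AddendMI,
                               llvm::MachineInstr *FMAMI, unsigned SrcReg,
                               const llvm::TargetRegisterInfo *TRI) {
    for (auto I = std::next(llvm::MachineBasicBlock::iterator(AddendMI)),
              E = llvm::MachineBasicBlock::iterator(FMAMI);
         I != E; ++I)
      if (I->definesRegister(SrcReg, TRI) || I->killsRegister(SrcReg, TRI))
        return true;
    return false;
  }
  ```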
* [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation (Bill Schmidt, 2014-10-17; 2 files, -3/+15)
  Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64 types,
  but does not yet use lxvw4x and stxvw4x for 4x32 types. This patch adds
  that support.

  As with lxvd2x/stxvd2x, this involves straightforward overriding of the
  patterns normally recognized for lvx/stvx, with preference given to the VSX
  patterns when VSX is enabled.

  In addition, the logic for permitting misaligned memory accesses is
  modified so that v4f32 and v4i32 are treated the same as v2f64 and v2i64
  when VSX is enabled. Finally, the DAG generation for unaligned loads is
  changed to just use a normal LOAD (which will become lxvw4x) on P8 and
  later hardware, where unaligned loads are preferred over lvsl/lvx/lvx/vperm.

  A number of tests now generate the VSX loads/stores instead of lvx/stvx, so
  this patch adds VSX variants to those tests. I've also added <4 x float>
  tests to the vsx.ll test case, and created a vsx-p8.ll test case to be used
  for testing code generation for the P8Vector feature. For now, that simply
  tests the unaligned load/store behavior.

  This has been tested along with a temporary patch to enable the VSX and
  P8Vector features, with no new regressions encountered with or without the
  temporary patch applied.

  llvm-svn: 220047
* Simplify handling of --noexecstack by using getNonexecutableStackSection. (Rafael Espindola, 2014-10-15; 1 file, -7/+3)
  llvm-svn: 219799
* Use the triple to figure out if this is a darwin target, not the subtarget. (Eric Christopher, 2014-10-14; 1 file, -1/+1)
  llvm-svn: 219673
* MC: Bit pack MCSymbolData. (Benjamin Kramer, 2014-10-11; 1 file, -1/+1)
  On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member
  variables private and clean up uses to go through the existing accessors.
  NFC.
  llvm-svn: 219573
* [PowerPC] Reduce names from Power8Vector to P8Vector (Bill Schmidt, 2014-10-10; 3 files, -8/+7)
  Per Hal Finkel's review, improving typability of some variable names.
  llvm-svn: 219514
* [PowerPC] Add feature for Power8 vector extensions (Bill Schmidt, 2014-10-10; 3 files, -2/+10)
  The current VSX feature for PowerPC specifies availability of the VSX
  instructions added with the 2.06 architecture version. With 2.07, the
  architecture adds new instructions to both the Category:Vector and
  Category:VSX instruction sets. Additionally, unaligned vector storage
  operations have improved performance.

  This patch adds a feature to provide access to the new instructions and
  performance capabilities of Power8. For compatibility with GCC, the feature
  is controlled via a new -mpower8-vector switch, and the feature causes the
  __POWER8_VECTOR__ builtin define to be generated by the preprocessor.

  There is a companion patch for cfe being committed at the same time.

  llvm-svn: 219501
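
  As a usage sketch (the -mpower8-vector switch and the __POWER8_VECTOR__
  macro come from the commit message above; everything else here is
  illustrative):

  ```
  // Built with the new switch, e.g.:  clang -mcpu=pwr8 -mpower8-vector ...
  // -mpower8-vector causes __POWER8_VECTOR__ to be predefined, so code can
  // select a path that relies on the ISA 2.07 vector additions.
  int vector_width_hint() {
  #if defined(__POWER8_VECTOR__)
    return 128;  // Power8 vector extensions available
  #else
    return 0;    // pre-2.07 fallback
  #endif
  }
  ```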
* Fix bug in GPR to FPR moves in PPC64LE. (Samuel Antao, 2014-10-09; 1 file, -4/+4)
  The current implementation of GPR->FPR register moves uses a stack slot.
  This mechanism writes a double word and reads a word. In big-endian the
  load address must be displaced by 4-bytes in order to get the right value.
  In little endian this is no longer required. This patch fixes the issue and
  adds LE regression tests to fast-isel-conversion which currently expose
  this problem.
  llvm-svn: 219441
* [PPC64] VSX indexed-form loads use wrong instruction format (Bill Schmidt, 2014-10-09; 1 file, -4/+4)
  The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x
  incorrectly use the XForm_1 instruction format, rather than the XX1Form
  instruction format. This is likely a pasto when creating these
  instructions, which were based on lvx and so forth. This patch uses the
  correct format.

  The existing reformatting test (test/MC/PowerPC/vsx.s) missed this because
  the two formats differ only in that XX1Form has an extension to the target
  register field in bit 31. The tests for these instructions used a target
  register of 7, so the default of 0 in bit 31 for XForm_1 didn't expose a
  problem. For register numbers 32-63 this would be noticeable. I've changed
  the test to use higher register numbers to verify my change is effective.

  llvm-svn: 219416
* Add subtarget caches to aarch64, arm, ppc, and x86. (Eric Christopher, 2014-10-06; 2 files, -1/+30)
  These will make it easier to test further changes to the code generation
  and optimization pipelines as those are moved to subtargets initialized
  with target feature and target cpu.
  llvm-svn: 219106
* [Power] Use lwsync for non-seq_cst fences (Robin Morisset, 2014-10-03; 1 file, -1/+8)
  Summary: hwsync is only required for seq_cst fences; acquire and release
  ones can use the cheaper lwsync.

  Test Plan: Added some cases to atomics.ll + make check-all

  Reviewers: jfb, wschmidt

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D5317

  llvm-svn: 218995
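
  A short C++ illustration of the mapping this change gives (per the summary
  above; the exact barriers emitted still depend on the target):

  ```
  #include <atomic>

  void fences() {
    // seq_cst fences still require the full barrier (hwsync / sync 0).
    std::atomic_thread_fence(std::memory_order_seq_cst);

    // acquire and release fences can now be lowered to the cheaper lwsync.
    std::atomic_thread_fence(std::memory_order_acquire);
    std::atomic_thread_fence(std::memory_order_release);
  }
  ```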
* [PowerPC] Modern Book-E cores support sync (Hal Finkel, 2014-10-02; 4 files, -17/+24)
  Older Book-E cores, such as the PPC 440, support only msync (which has the
  same encoding as sync 0), but not any of the other sync forms. Newer Book-E
  cores, however, do support sync, and for performance reasons we should
  allow the use of the more-general form.

  This refactors msync use into its own feature group so that it applies by
  default only to older Book-E cores (of the relevant cores, we only have
  definitions for the PPC440/450 currently).

  llvm-svn: 218923
* [Power] Improve the expansion of atomic loads/stores (Robin Morisset, 2014-10-02; 3 files, -4/+26)
  Summary: Atomic loads and stores of up to the native size (32 bits, or 64
  for PPC64) can be lowered to a simple load or store instruction (as the
  synchronization is already handled by AtomicExpand, and the atomicity is
  guaranteed thanks to the alignment requirements of atomic accesses). This
  is exactly what this patch does. Previously, these were implemented by
  complex load-linked/store-conditional loops, an obvious performance
  problem.

  For example, this patch turns

  ```
  define void @store_i8_unordered(i8* %mem) {
    store atomic i8 42, i8* %mem unordered, align 1
    ret void
  }
  ```

  from

  ```
  _store_i8_unordered:                    ; @store_i8_unordered
  ; BB#0:
    rlwinm r2, r3, 3, 27, 28
    li r4, 42
    xori r5, r2, 24
    rlwinm r2, r3, 0, 0, 29
    li r3, 255
    slw r4, r4, r5
    slw r3, r3, r5
    and r4, r4, r3
  LBB4_1:                                 ; =>This Inner Loop Header: Depth=1
    lwarx r5, 0, r2
    andc r5, r5, r3
    or r5, r4, r5
    stwcx. r5, 0, r2
    bne cr0, LBB4_1
  ; BB#2:
    blr
  ```

  into

  ```
  _store_i8_unordered:                    ; @store_i8_unordered
  ; BB#0:
    li r2, 42
    stb r2, 0(r3)
    blr
  ```

  which looks like a pretty clear win to me.

  Test Plan: fixed the tests + new test for indexed accesses + make check-all

  Reviewers: jfb, wschmidt, hfinkel

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D5587

  llvm-svn: 218922
* constify the TargetMachine argument used in the subtarget and lowering constructors. (Eric Christopher, 2014-10-01; 4 files, -4/+4)
  llvm-svn: 218832
* Now that the optimization level is adjusting the feature string before we hit the subtarget, remove the constructor parameter. (Eric Christopher, 2014-10-01; 3 files, -9/+4)
  llvm-svn: 218817
* Rework the PPC TargetMachine so that the non-function specific overrides happen at TargetMachine creation and not on every subtarget creation. (Eric Christopher, 2014-10-01; 2 files, -27/+32)
  llvm-svn: 218805
* Split the estimate() interface into separate functions for each type. NFC. (Sanjay Patel, 2014-09-30; 2 files, -22/+33)
  It was hacky to use an opcode as a switch because it won't always match
  (rsqrte != sqrte), and it looks like we'll need to add more special casing
  per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
  implementation. ARM will want to use its 'step' instructions. There also
  don't appear to be any new estimate instructions in any arch in a long,
  long time. Altivec vloge and vexpte may have been the first and last in
  that field...
  llvm-svn: 218698
* Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2). (Sanjay Patel, 2014-09-26; 2 files, -180/+29)
  This is purely refactoring. No functional changes intended. PowerPC is the
  only target that is currently using this interface.

  The ultimate goal is to allow targets other than PowerPC (certainly X86 and
  Aarch64) to turn this:

    z = y / sqrt(x)

  into:

    z = y * rsqrte(x)

  And:

    z = y / x

  into:

    z = y * rcpe(x)

  using whatever HW magic they can use. See
  http://llvm.org/bugs/show_bug.cgi?id=20900 .

  There is one hook in TargetLowering to get the target-specific opcode for
  an estimate instruction along with the number of refinement steps needed to
  make the estimate usable.

  Differential Revision: http://reviews.llvm.org/D5484

  llvm-svn: 218553
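
  To make the intended transformation concrete, here is an editorial sketch of
  an estimate plus one refinement step; the Newton-Raphson formulas are the
  standard ones and the hw_* helpers are placeholders for the hardware
  estimate instructions (e.g. PPC frsqrte/fre), not real API names:

  ```
  // Placeholders standing in for the HW estimate ops; not real functions.
  float hw_rsqrt_estimate(float x);
  float hw_recip_estimate(float x);

  // z = y / sqrt(x)  ->  z = y * rsqrte(x), refined once:
  float div_by_sqrt(float y, float x) {
    float e = hw_rsqrt_estimate(x);
    e = e * (1.5f - 0.5f * x * e * e);  // Newton-Raphson step for 1/sqrt(x)
    return y * e;
  }

  // z = y / x  ->  z = y * rcpe(x), refined once:
  float div(float y, float x) {
    float e = hw_recip_estimate(x);
    e = e * (2.0f - x * e);             // Newton-Raphson step for 1/x
    return y * e;
  }
  ```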
* [Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate (Robin Morisset, 2014-09-23; 4 files, -2/+50)
  Summary: This patch makes use of AtomicExpandPass in Power for inserting
  fences around atomics, as part of an effort to remove fence insertion from
  SelectionDAGBuilder. As a big bonus, it lets us use sync 1 (lightweight
  sync, often used by the mnemonic lwsync) instead of sync 0 (heavyweight
  sync) in many cases.

  I also added a test, as there was no test for the barriers emitted by the
  Power backend for atomic loads and stores.

  Test Plan: new test + make check-all

  Reviewers: jfb

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D5180

  llvm-svn: 218331
* [MCJIT] Remove PPCRelocations.h - it's no longer used. (Lang Hames, 2014-09-23; 1 file, -56/+0)
  This was overlooked in r218320, which removed the relocation headers for
  other targets. Thanks to Ulrich Weigand for catching it.
  llvm-svn: 218327
* Refactor reciprocal square root estimate into target-independent function; NFC. (Sanjay Patel, 2014-09-21; 2 files, -41/+3)
  This is purely a plumbing patch. No functional changes intended.

  The ultimate goal is to allow targets other than PowerPC (certainly X86 and
  Aarch64) to turn this:

    z = y / sqrt(x)

  into:

    z = y * rsqrte(x)

  using whatever HW magic they can use. See
  http://llvm.org/bugs/show_bug.cgi?id=20900 .

  The first step is to add a target hook for RSQRTE, take the already
  target-independent code selfishly hoarded by PPC, and put it into
  DAGCombiner.

  Next steps: The code in DAGCombiner::BuildRSQRTE() should be refactored
  further; tests that exercise that logic need to be added. Logic in
  PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner. X86
  and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.

  Differential Revision: http://reviews.llvm.org/D5425

  llvm-svn: 218219
* Optionally enable more-aggressive FMA formation in DAGCombine (Hal Finkel, 2014-09-19; 2 files, -0/+10)
  The heuristic used by DAGCombine to form FMAs checks that the FMUL has only
  one use, but this is overly-conservative on some systems. Specifically, if
  the FMA and the FADD have the same latency (and the FMA does not compete
  for resources with the FMUL any more than the FADD does), there is no need
  for the restriction, and furthermore, forming the FMA leaving the FMUL can
  still allow for higher overall throughput and decreased critical-path
  length.

  Here we add a new TLI callback, enableAggressiveFMAFusion, false by
  default, to elide the hasOneUse check. This is enabled for PowerPC by
  default, as most PowerPC systems will benefit.

  Patch by Olivier Sallenave, thanks!

  llvm-svn: 218120
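
  A source-level sketch of the case the hasOneUse check used to block (whether
  FMAs actually form also depends on fast-math/contract settings, which this
  patch does not change):

  ```
  // One multiply feeding two adds: the product has two uses, so the default
  // heuristic refuses to form an FMA. With enableAggressiveFMAFusion()
  // returning true, both adds may become FMAs even though the FMUL remains.
  void two_sums(float a, float b, float c, float d, float *r0, float *r1) {
    float p = a * b;  // single FMUL with two uses
    *r0 = p + c;      // FMA candidate #1
    *r1 = p + d;      // FMA candidate #2
  }
  ```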
* Reverting NFC changes from r218050. Instead, the warning was disabled for GCC in r218059, so these changes are no longer required. (Aaron Ballman, 2014-09-18; 1 file, -1/+0)
  llvm-svn: 218062
* Fixing a bunch of -Woverloaded-virtual warnings due to hiding getSubtargetImpl from the base class. NFC. (Aaron Ballman, 2014-09-18; 1 file, -0/+1)
  llvm-svn: 218050
* Add a new pass FunctionTargetTransformInfo (Eric Christopher, 2014-09-18; 1 file, -5/+7)
  This pass serves as a shim between the TargetTransformInfo immutable pass
  and the Subtarget via the TargetMachine and Function. Migrate a single call
  from BasicTargetTransformInfo as an example and provide shims where
  TargetMachine begins taking a Function to determine the subtarget.

  No functional change.

  llvm-svn: 218004
* Fix FastISel bug in boolean returns for PowerPC. (Samuel Antao, 2014-09-17; 1 file, -7/+18)
  For PPC targets, FastISel does not take the sign extension information into
  account when selecting return instructions whose operands are constants. A
  consequence of this is that the return of boolean values is not correct.
  This patch fixes the problem by evaluating the sign extension information
  also for constants, forwarding this information to PPCMaterializeInt which
  takes this information to drive the sign extension during the
  materialization.
  llvm-svn: 217993
* Remove unnecessary blank space (test commit) (Samuel Antao, 2014-09-17; 1 file, -1/+1)
  llvm-svn: 217991
* Address comments on r217622 (Bill Schmidt, 2014-09-12; 1 file, -4/+6)
  llvm-svn: 217680
* [PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm (Bill Schmidt, 2014-09-11; 1 file, -0/+10)
  Inline asm may specify 'U' and 'X' constraints to print a 'u' for an
  update-form memory reference, or an 'x' for an indexed-form memory
  reference. However, these are really only useful in GCC internal code
  generation. In inline asm the operand of the memory constraint is typically
  just a register containing the address, so 'U' and 'X' make no sense.

  This patch quietly accepts 'U' and 'X' in inline asm patterns, but
  otherwise does nothing. If we ever unexpectedly see a non-register, we'll
  assert and sort it out afterwards.

  I've added a new test for these constraints; the test case should be used
  for other asm-constraints changes down the road.

  llvm-svn: 217622
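
  For reference, a hedged example of the GCC-style pattern that passes 'U'/'X'
  through; the asm template below is illustrative and not taken from the new
  test:

  ```
  // 'U' asks for a 'u' (update-form) suffix and 'X' for an 'x' (indexed-form)
  // suffix on the memory operand. Under this patch they are accepted and
  // simply print nothing, since the operand is just a register holding the
  // address.
  static inline unsigned load_word(const volatile unsigned *p) {
    unsigned v;
    asm volatile("lwz%U1%X1 %0, %1" : "=r"(v) : "m"(*p));
    return v;
  }
  ```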
* Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. (Sanjay Patel, 2014-09-10; 1 file, -2/+2)
  "Unroll" is not the appropriate name for this variable. Clang already uses
  the term "interleave" in pragmas and metadata for this.

  Differential Revision: http://reviews.llvm.org/D5066

  llvm-svn: 217528
* Use cast to MVT instead of EVT on a couple calls to getSizeInBits. (Craig Topper, 2014-09-10; 1 file, -2/+2)
  llvm-svn: 217473
* [FastISel][tblgen] Rename tblgen generated FastISel functions. NFC. (Juergen Ributzka, 2014-09-03; 1 file, -11/+11)
  This is the final round of renaming. This changes tblgen to emit lower-case
  function names for FastEmitInst_* and FastEmit_*, and updates all its uses
  in the source code.

  Reviewed by Eric

  llvm-svn: 217075
* [FastISel] Rename public visible FastISel functions. NFC. (Juergen Ributzka, 2014-09-03; 1 file, -21/+21)
  This commit renames the following public FastISel functions:
    LowerArguments -> lowerArguments
    SelectInstruction -> selectInstruction
    TargetSelectInstruction -> fastSelectInstruction
    FastLowerArguments -> fastLowerArguments
    FastLowerCall -> fastLowerCall
    FastLowerIntrinsicCall -> fastLowerIntrinsicCall
    FastEmitZExtFromI1 -> fastEmitZExtFromI1
    FastEmitBranch -> fastEmitBranch
    UpdateValueMap -> updateValueMap
    TargetMaterializeConstant -> fastMaterializeConstant
    TargetMaterializeAlloca -> fastMaterializeAlloca
    TargetMaterializeFloatZero -> fastMaterializeFloatZero
    LowerCallTo -> lowerCallTo

  Reviewed by Eric

  llvm-svn: 217074
* Remove resetSubtargetFeatures as it is unused. (Eric Christopher, 2014-09-03; 2 files, -21/+3)
  llvm-svn: 217071
* Add override to overridden virtual methods, remove virtual keywords. (Benjamin Kramer, 2014-09-03; 3 files, -24/+8)
  No functionality change. Changes made by clang-tidy + some manual cleanup.
  llvm-svn: 217028
* Reinstate "Nuke the old JIT."Eric Christopher2014-09-0212-943/+24
| | | | | | | | Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 216982
* Fix signed integer overflow in PPCInstPrinter. (Alexey Samsonov, 2014-09-02; 1 file, -1/+1)
  This bug was reported by UBSan.
  llvm-svn: 216917
* [PowerPC] Guard against illegal selection of add for TargetConstant operands (Hal Finkel, 2014-09-02; 1 file, -0/+7)
  r208640 was reverted because it caused a self-hosting failure on ppc64. The
  underlying cause was the formation of ISD::ADD nodes with
  ISD::TargetConstant operands. Because we have no patterns for 'add' taking
  'timm' nodes, these are selected as r+r add instructions (which is a
  miscompile).

  Guard against this kind of behavior in the future by making the backend
  crash should this occur (instead of silently generating invalid output).

  llvm-svn: 216897
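
  A hedged sketch of the guard, as it might look inside the target's Select()
  routine (names and the error message are assumptions, not the committed
  code):

  ```
  // Refuse loudly instead of silently selecting an r+r add: there is no
  // 'add ..., timm' pattern, so an ISD::ADD with a TargetConstant operand
  // would otherwise be miscompiled.
  if (N->getOpcode() == ISD::ADD &&
      (N->getOperand(0).getOpcode() == ISD::TargetConstant ||
       N->getOperand(1).getOpcode() == ISD::TargetConstant))
    llvm::report_fatal_error("ADD with TargetConstant operand reached selection");
  ```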
* Remove 'virtual' keyword from methods marked with 'override' keyword. (Craig Topper, 2014-08-30; 3 files, -35/+32)
  llvm-svn: 216823
* Test commit. Fix whitespace from a previous patch of mine. (Justin Hibbits, 2014-08-28; 1 file, -1/+1)
  llvm-svn: 216650
* Allow vectorization of division by uniform power of 2. (Karthik Bhat, 2014-08-25; 1 file, -8/+10)
  This patch adds support to recognize division by uniform power of 2 and
  modifies the cost table to vectorize division by uniform power of 2
  whenever possible. Updates cost model for Loop and SLP Vectorizer. The cost
  table is currently only updated for the X86 backend. Thanks to Hal, Andrea,
  Sanjay for the review. (http://reviews.llvm.org/D4971)
  llvm-svn: 216371
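
  For example (editorial illustration), the recognized shape is a
  loop-invariant power-of-2 divisor applied uniformly across lanes:

  ```
  // Division by a uniform power of 2: the vectorizers can now cost this as a
  // cheap shift-based sequence instead of a per-element division (the cost
  // table update is currently X86-only, per the note above).
  void scale_down(int *a, const int *b, int n) {
    for (int i = 0; i < n; ++i)
      a[i] = b[i] / 4;  // uniform power-of-2 divisor
  }
  ```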
* [PowerPC] Add support for dcbtst and icbt (prefetch) (Hal Finkel, 2014-08-23; 2 files, -1/+23)
  Adds code generation support for dcbtst (data cache prefetch for write) and
  icbt (instruction cache prefetch for read - Book E cores only).

  We still end up with a 'cannot select' error for the non-supported prefetch
  intrinsic forms. This will be fixed in a later commit.

  Fixes PR20692.

  llvm-svn: 216339
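
  A hedged usage sketch with the GCC/Clang prefetch builtin; the
  prefetch-for-write case is the one that can now be lowered to dcbtst (icbt
  is an instruction-cache prefetch and has no counterpart in this builtin):

  ```
  // __builtin_prefetch(addr, rw, locality): rw == 1 requests a prefetch for
  // write, which the PowerPC backend can now emit as dcbtst.
  void prefetch_for_update(int *p) {
    __builtin_prefetch(p, /*rw=*/1);
    // ... subsequent stores to *p find the line already in the cache.
  }
  ```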
* name change: isPow2DivCheap -> isPow2SDivCheap (Sanjay Patel, 2014-08-21; 1 file, -1/+1)
  isPow2DivCheap

  That name doesn't specify signed or unsigned. Lazy as I am, I eventually
  read the function and variable comments. It turns out that this is strictly
  about signed div. But I discovered that the comments are wrong: srl/add/sra
  is not the general sequence for signed integer division by power-of-2. We
  need one more 'sra':

    sra/srl/add/sra

  That's the sequence produced in DAGCombiner. The first 'sra' may be removed
  when dividing by exactly '2', but that's a special case. This patch
  corrects the comments, changes the name of the flag bit, and changes the
  name of the accessor methods.

  No functional change intended.

  Differential Revision: http://reviews.llvm.org/D5010

  llvm-svn: 216237
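
  A worked sketch of the corrected sequence (this mirrors what DAGCombiner
  emits for signed power-of-2 division; it is not code from this patch, and it
  assumes arithmetic right shift of signed ints, as on typical targets):

  ```
  // Signed x / 8 without a divide: sra / srl / add / sra.
  int sdiv_by_8(int x) {
    int sign = x >> 31;                    // sra: 0 for x >= 0, -1 for x < 0
    unsigned bias = (unsigned)sign >> 29;  // srl: 0 or 7 (2^3 - 1)
    return (x + (int)bias) >> 3;           // add, then the final sra by 3
  }
  // E.g. x = -7: sign = -1, bias = 7, (-7 + 7) >> 3 = 0, matching -7 / 8 = 0.
  ```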
* TableGen: allow use of uint64_t for available features mask. (Tim Northover, 2014-08-18; 1 file, -3/+3)
  ARM in particular is getting dangerously close to exceeding 32 bits worth
  of possible subtarget features. When this happens, various parts of MC
  start to fail inexplicably as masks get truncated to "unsigned".

  Mostly just refactoring at present, and there's probably no way to test.

  llvm-svn: 215887
* [PowerPC] Mark fixed-offset byvals as pointed-to by IR values (Hal Finkel, 2014-08-16; 1 file, -2/+2)
  A byval object, even if allocated at a fixed offset (prescribed by the ABI)
  is pointed to by IR values. Most fixed-offset stack objects are not
  pointed-to by IR values, so the default is to assume this is not possible.
  However, we need to override the default in this case (instruction
  scheduling can cause miscompiles otherwise).

  Fixes PR20280.

  llvm-svn: 215795
* [PowerPC] Darwin byval arguments are not immutable (Hal Finkel, 2014-08-16; 1 file, -1/+1)
  On PPC/Darwin, byval arguments occur at fixed stack offsets in the callee's
  frame, but are not immutable -- the pointer value is directly available to
  the higher-level code as the address of the argument, and the value of the
  byval argument can be modified at the IR level.

  This is necessary, but not sufficient, to fix PR20280. When PR20280 is
  fixed in a follow-up commit, its test case will cover this change.

  llvm-svn: 215793
* Remove HasLEB128. (Rafael Espindola, 2014-08-15; 1 file, -1/+0)
  We already require CFI, so it should be safe to require .leb128 and
  .uleb128.
  llvm-svn: 215712
* PPC: Clean up pointer casting, no functionality change. (Benjamin Kramer, 2014-08-15; 1 file, -2/+2)
  Silences GCC's -Wcast-qual.
  llvm-svn: 215703