path: root/llvm/lib/CodeGen/SelectionDAG
* [FastISel] Remove a performance debugging assert. (Juergen Ributzka, 2014-08-15, 1 file, -1/+0)
  As Jim pointed out, this assert isn't really needed to test for correctness, because the code right afterwards does the same check and falls back to SelectionDAG, as intended.
  llvm-svn: 215735
* Revert several FastISel commits to track down a buildbot error. (Juergen Ributzka, 2014-08-14, 1 file, -21/+15)
  This reverts:
  r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
  r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
  r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
  r215591 "[FastISel][AArch64] Make use of the zero register when possible."
  r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
  r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
  llvm-svn: 215673
* Optimize vector fneg of bitcasted integer value. (Sanjay Patel, 2014-08-14, 1 file, -9/+14)
  This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.
  This patch is very similar to a fabs patch committed at r214892.
  Differential Revision: http://reviews.llvm.org/D4852
  llvm-svn: 215646
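  As an illustrative sketch (hypothetical values; 2014-era IR, where fneg is spelled as a subtraction from -0.0), this is the kind of pattern the combine now handles:

    %bc  = bitcast <2 x i32> %int to <2 x float>
    %neg = fsub <2 x float> <float -0.0, float -0.0>, %bc    ; vector fneg of a bitcast

  The sign flip can instead be done as an integer xor with <i32 -2147483648, i32 -2147483648> before the bitcast, and when %int is a constant the result folds away entirely.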
* [SDAG] Fix a bug in the DAG combiner where we would fail to return the input node after manually adding it to the worklist and using CombineTo. (Chandler Carruth, 2014-08-14, 1 file, -5/+1)
  Once we use CombineTo the input node may have been deleted. Despite this being *completely confusing* and somewhat broken, the only way to "correctly" return from a DAG combine after potentially deleting the input node is to return *that exact node*.
  But really, this code should just never have used CombineTo. It won't do what it wants (returning the node as mentioned above just causes the combine to infloop). The correct way to combine away a casted load to a load of the correct type is to RAUW the chain directly and then return the loaded value to replace the actual value node.
  I managed to find this with the vector shuffle fuzzer even though it clearly has nothing at all to do with vector shuffles, and rather those happen to trigger a load of a constant pool that hits this combine *just right*. I've included the test as it is small and a nice stress test that the infrastructure isn't asserting.
  llvm-svn: 215622
* [SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. (Chandler Carruth, 2014-08-14, 1 file, -0/+2)
  In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly, we also ensure that if it *stops* being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for.
  It took many millions of runs of the shuffle fuzz tester to find this.
  llvm-svn: 215611
* [FastISel] Let the target decide first if it wants to materialize a constant. (Juergen Ributzka, 2014-08-13, 1 file, -15/+21)
  This changes the order in which FastISel tries to materialize a constant. Originally it would try to use a simple target-independent approach, which can lead to the generation of inefficient code.
  On X86 this would result in the use of movabsq to materialize any 64-bit integer constant, even for simple and small values such as 0 and 1. Some very funny floating-point materialization could be observed too.
  On AArch64 it would materialize the constant 0 in a register even though the architecture has an actual "zero" register.
  On ARM it would generate unnecessary mov instructions or not use mvn.
  This change simply changes the order and always asks the target first if it would like to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations.
  Related to <rdar://problem/17420988>.
  llvm-svn: 215588
* Canonicalize header guards into a common format. (Benjamin Kramer, 2014-08-13, 5 files, -10/+10)
  Add header guards to files that were missing guards. Remove #endif comments, as they don't seem common in LLVM (we can easily add them back if we decide they're useful).
  Changes made by clang-tidy with minor tweaks.
  llvm-svn: 215558
* [DAGCombiner] Improved target-independent vector shuffle combine rule. (Andrea Di Biagio, 2014-08-13, 1 file, -10/+24)
  This patch improves the existing algorithm in DAGCombiner that attempts to fold shuffles according to the rule:
    shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)
  Before this change, there were cases where the DAGCombiner conservatively avoided folding shuffles even if the resulting mask would have been legal. That is because the algorithm wrongly assumed that commuting an illegal shuffle mask would always produce an illegal mask. With this change, we now correctly compute the commuted shuffle mask before calling method 'isShuffleMaskLegal' on it.
  On X86, this improves for example the codegen for the following function:
    define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
      %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
      %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
      ret <4 x i32> %2
    }
  Before this change the X86 backend (-mcpu=corei7) generated the following assembly code for function @test:
    shufps $-23, %xmm0, %xmm1  # xmm1 = xmm1[1,2],xmm0[2,3]
    movhlps %xmm1, %xmm1       # xmm1 = xmm1[1,1]
    movaps %xmm1, %xmm0
  Now we produce:
    movhlps %xmm0, %xmm0       # xmm0 = xmm0[1,1]
  Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly fold according to the above-mentioned rule.
  llvm-svn: 215555
* [PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic. (Hal Finkel, 2014-08-13, 2 files, -3/+5)
  This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store intrinsics. As with the construction of the MachineMemOperands for the intrinsic calls used for unaligned load/store lowering, the only slight complication is that we need to represent a larger memory range than the loaded/stored value-type size (because the address is rounded down to an aligned address, and we need to conservatively represent the entire possible range of the actual access). This required adding an extra size field to TargetLowering::IntrinsicInfo, and this was done in a way that required no modifications to other targets (the size defaults to the store size of the provided memory data type).
  This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).
  llvm-svn: 215512
* [x86] Fold extract_vector_elt of a load into the load's address computation. (Michael J. Spencer, 2014-08-11, 1 file, -90/+125)
  llvm-svn: 215409
* Make this SmallVector size a power of two as suggested by Chandler. (Hans Wennborg, 2014-08-11, 1 file, -1/+1)
  llvm-svn: 215355
* Increase the size of this SmallVector in CloneNodeWithValues. (Hans Wennborg, 2014-08-11, 1 file, -1/+1)
  In a Clang bootstrap, the size of this vector was always 6.
  llvm-svn: 215335
* Add support for scalarizing cttz_zero_undef. (Petar Jovanovic, 2014-08-10, 1 file, -0/+1)
  Follow-up to r214266. Add missing case in ScalarizeVectorResult() for cttz_zero_undef.
  Differential Revision: http://reviews.llvm.org/D4813
  llvm-svn: 215330
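  A minimal (hypothetical) instance of the newly handled case: a single-element-vector cttz with the zero-undef flag set, which ScalarizeVectorResult() now lowers to the scalar form.

    declare <1 x i32> @llvm.cttz.v1i32(<1 x i32>, i1)
    ; the i1 true operand marks the result as undef for a zero input
    %r = call <1 x i32> @llvm.cttz.v1i32(<1 x i32> %x, i1 true)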
* Added a TLI hook to signal that the target does not have or does not care about floating-point exceptions. (Pedro Artigas, 2014-08-08, 1 file, -5/+10)
  Added use of the flag to fold potentially exception-raising floating-point math in the selection DAG. No functionality change, as targets have to explicitly ask for this behavior and none does today.
  llvm-svn: 215222
* [pr19635] Revert most of r170537, and add new testcase. (Patrik Hagglund, 2014-08-08, 1 file, -1/+1)
  Patch provided by Andrey Kuharev. Sorry, r170537 was obviously wrong.
  llvm-svn: 215190
* [stack protector] Look through bitcasts to get global variable __stack_chk_guard. (Akira Hatanaka, 2014-08-07, 1 file, -9/+19)
  Handle the case where the pointer operand of the load instruction that loads the stack guard is not a global variable but instead a bitcast:
    %StackGuard = load i8** bitcast (i64** @__stack_chk_guard to i8**)
    call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)
  Original test case provided by Ana Pazos.
  This fixes PR20558.
  llvm-svn: 215167
* Temporarily Revert "Nuke the old JIT." as it's not quite ready to be deleted. (Eric Christopher, 2014-08-07, 1 file, -2/+3)
  This will be reapplied as soon as possible, and before the 3.6 branch date at any rate.
  Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
  This reverts commits r215111, r215115, r215116, r215117, r215136.
  llvm-svn: 215154
* Nuke the old JIT. (Rafael Espindola, 2014-08-07, 1 file, -3/+2)
  I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start.
  Thanks to Lang Hames for making MCJIT a good replacement!
  llvm-svn: 215111
* Optimize vector fabs of bitcasted constant integer values. (Sanjay Patel, 2014-08-05, 1 file, -9/+15)
  Allow vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs.
  So for code like this:
    %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
    %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
    %ret = bitcast <2 x float> %fabs to i64
  Instead of generating something like this:
    movabsq (constant pool load of mask for sign bits)
    vmovq   (move from integer register to vector/fp register)
    vandps  (mask off sign bits)
    vmovq   (move vector/fp register back to integer return register)
  We should generate:
    mov     (put constant value in return register)
  I have also removed a redundant clause in the first 'if' statement:
    N0.getOperand(0).getValueType().isInteger()
  is the same thing as:
    IntVT.isInteger()
  Testcases for x86 and ARM added to existing files that deal with vector fabs. One existing testcase for x86 removed because it is no longer ideal.
  For more background, please see http://reviews.llvm.org/D4770 and http://llvm.org/bugs/show_bug.cgi?id=20354
  Differential Revision: http://reviews.llvm.org/D4785
  llvm-svn: 214892
* Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier, and have the DAG use that to do the same lookup. (Eric Christopher, 2014-08-05, 5 files, -14/+8)
  This can be used in the future for TargetMachine-based caching lookups from the MachineFunction easily.
  Update the MIPS subtarget switching machinery to update this pointer at the same time it runs.
  llvm-svn: 214838
* [SDAG] Fix a really, really terrible bug in the DAG combiner. (Chandler Carruth, 2014-08-04, 1 file, -2/+2)
  This code is completely wrong. It is also dead; if it were to *ever* run, it would crash. Unfortunately, after my work to the combiner, it is at least *possible* to reach the code, and llvm-stress has found a test case. Thanks to Patrick for reporting.
  It would be really good if anyone who remembers how this code works and what it was intended to do could add some more obvious test coverage instead of my completely contrived and reduced test case. My test case was so brittle I left a bread crumb comment in it to help the next person to stumble on it and not know what it was actually testing for.
  llvm-svn: 214785
* Remove the TargetMachine forwards for TargetSubtargetInfo-based information and update all callers. (Eric Christopher, 2014-08-04, 14 files, -164/+228)
  No functional change.
  llvm-svn: 214781
* [x86] Don't add nodes to the combined set (and prune subsequent combines) until they are legal. (Chandler Carruth, 2014-08-03, 1 file, -8/+8)
  Doing it the old way could, when the stars align *just* right, cause a node to get into the combine set prior to being legalized. Then, when the same node showed up as an operand to another node later on (but not so much later on that it had been deleted as dead) we would fail to add it back to the worklist, thinking it had already been combined. This would in turn cause it to not be legalized. Fortunately, we can also walk the operands looking for uncombined (and thus potentially un-legalized) nodes late. It will still ensure that we walk all operands of all nodes and send all of them through both the legalizer without changes and the combiner at least once. (Which was the original goal of this.)
  I have a test case for this bug, but it is terribly brittle. For example, it will stop finding the bug the moment I enable the new shuffle lowering. I don't yet have any test case that reliably exercises this bug, and it isn't clear that it will be possible to craft one. It is entirely possible that with the new shuffle lowering the two forms of doing this are precisely equivalent. That doesn't mean we shouldn't take the more conservative approach of insisting on things in the combined set having survived the legalizer.
  llvm-svn: 214673
* Fix for PR20354 - miscompile of fabs due to vectorization. (Sanjay Patel, 2014-08-03, 1 file, -1/+5)
  This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation.
  This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon.
  There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too.
  llvm-svn: 214670
* [AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available. (James Molloy, 2014-08-02, 1 file, -0/+7)
  The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers!
  llvm-svn: 214634
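  A sketch of the input shape involved (hypothetical example): two consecutive scalar loads that the combiner would previously merge into a single <2 x double> (Q-register) load, even though a paired load of two D registers is the better choice here.

    %a = load double* %p, align 8
    %q = getelementptr double* %p, i64 1     ; the adjacent element
    %b = load double* %q, align 8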
* [SDAG] Refactor the code which deletes nodes in the DAG combiner to do so using a single helper which adds operands back onto the worklist. (Chandler Carruth, 2014-08-02, 1 file, -54/+36)
  Several places didn't rigorously do this, but a couple already did. Factoring them together and doing it rigorously is important to delete things recursively early on in the combiner and get a chance to see accurate hasOneUse values. While no existing test cases change, an upcoming patch to add DAG combining logic for PSHUFB requires this to work correctly.
  llvm-svn: 214623
* Fix issues with ISD::FNEG and ISD::FMA SDNodes where they would not be constant-folded during DAGCombine in certain circumstances. (Owen Anderson, 2014-08-02, 1 file, -0/+12)
  Unfortunately, the circumstances required to trigger the issue seem to require a pretty specific interaction of DAGCombines, and I haven't been able to find a testcase that reproduces on X86, ARM, or AArch64. The functionality added here is replicated in essentially every other DAG combine, so it seems pretty obviously correct.
  llvm-svn: 214622
* CodeGen: Remove commented-out code. (Justin Bogner, 2014-08-02, 1 file, -2/+0)
  These two lines have been commented out for over 4 years. They aren't helping anyone.
  llvm-svn: 214615
* [SDAG] MorphNodeTo recursively deletes dead operands of the old formulation of the node, which isn't really the desired behavior from within the combiner or legalizer, but is necessary within ISel. (Chandler Carruth, 2014-08-01, 1 file, -0/+4)
  I've added a hopefully helpful comment and fixed the only two places where this took place.
  Yet another step toward the combiner and legalizer not needing to use update listeners with virtual calls to manage the worklists behind legalization and combining.
  llvm-svn: 214574
* [SDAG] Begin simplifying the way in which the legalizer deletes nodes. (Chandler Carruth, 2014-08-01, 1 file, -41/+25)
  This lifts the (very few) places the legalizer would delete dead nodes into the outer loop around the legalizer. This is significantly simpler because it doesn't require the legalizer itself to manage the iterator validity, and it doesn't require the legalizer to be a DAG update listener in order to remove things from the legalized set. It also makes the interface much less contrived for the case of the legalizer running inside the last phase of DAG combining.
  I'm working on centralizing the deletion of nodes during both legalizing and combining as much as possible. My hope is to remove the need for DAG update listeners from the combiner next, which would remove a costly virtual dispatch chain on every deletion. This in turn should allow us to more aggressively delete DAG nodes during combining, which will in turn allow us to combine more aggressively by exposing the actual nodes which have single users to the combine phases.
  llvm-svn: 214546
* [PowerPC] Generate unaligned vector loads using intrinsics instead of regular loads. (Hal Finkel, 2014-08-01, 1 file, -2/+5)
  Altivec vector loads on PowerPC have an interesting property: They always load from an aligned address (by rounding down the address actually provided if necessary). In order to generate an actual unaligned load, you can generate two load instructions, one with the original address, one offset by one vector length, and use a special permutation to extract the bytes desired.
  When this was originally implemented, I generated these two loads using regular ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with this: The alignment of a load does not contribute to its identity, and SDNodes are uniqued. So, imagine that we have some unaligned load, L1, that is not aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned). Further imagine that there had already existed a load (L1+16)(unaligned) with the same chain operand as the load L1. When (L1+16)(aligned) is created as part of the lowering of L1, this load *is* also the (L1+16)(unaligned) node, just now marked as aligned (because the new alignment overwrites the old). But the original users of (L1+16)(unaligned) now get the data intended for the permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists to get its own permutation-based expansion. This was PR19991.
  A second potential problem has to do with the MMOs on these loads, which can be used by AA during instruction scheduling to break chain-based dependencies. If the new "aligned" loads get the MMO from the original unaligned load, this does not represent the fact that it will load data from below the original address. Normally this would not matter, but this load might be combined with another load pair for a previous vector, and then the dependency on the otherwise-ignored lower bytes can matter.
  To fix both problems, instead of generating the necessary loads using regular ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are provided with MMOs with a conservative address range.
  Unfortunately, I no longer have a failing test case (since PR19991 was reported, other changes in CodeGen have forced this bug back into hiding). Nevertheless, this should fix the underlying problem.
  llvm-svn: 214481
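  For reference, the classic Altivec unaligned-load expansion that this lowering builds, expressed here as a rough IR sketch using the intrinsics involved (the actual code constructs the equivalent SDNodes directly; %addr is a hypothetical unaligned pointer):

    %lo   = call <4 x i32> @llvm.ppc.altivec.lvx(i8* %addr)    ; loads from %addr rounded down
    %next = getelementptr i8* %addr, i32 16
    %hi   = call <4 x i32> @llvm.ppc.altivec.lvx(i8* %next)    ; the next aligned vector
    %ctrl = call <16 x i8> @llvm.ppc.altivec.lvsl(i8* %addr)   ; permute control from the low address bits
    %val  = call <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32> %lo, <4 x i32> %hi, <16 x i8> %ctrl)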
* White space fix. (Louis Gerbarg, 2014-07-31, 1 file, -1/+1)
  llvm-svn: 214455
* Make sure no loads resulting from load->switch DAGCombine are marked invariant. (Louis Gerbarg, 2014-07-31, 6 files, -40/+53)
  Currently, when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load, the new load inherits the isInvariant flag of the left side. This is incorrect, since invariant loads can be reordered in cases where it is illegal to reorder normal loads.
  This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load.
  llvm-svn: 214449
* [FastISel] Fix the patchpoint intrinsic lowering in FastISel for large target addresses. (Juergen Ributzka, 2014-07-31, 1 file, -1/+1)
  This fixes a mistake where I accidentally dropped the upper 32 bits of a 64-bit pointer during FastISel lowering of the patchpoint intrinsic.
  llvm-svn: 214367
* Retain alignment requirements for load->selects modified by DAGCombine. (Louis Gerbarg, 2014-07-30, 1 file, -2/+6)
  DAGCombine may choose to rewrite graphs where two loads feed a select into graphs where a select of two addresses feeds a load. While it sanity-checks the loads to make sure they are broadly equivalent, it currently just uses the alignment restriction of the left node. In cases where the right node has a stronger alignment requirement, this may lead to bad codegen, such as generating an aligned load where an unaligned load is required. This patch makes the combine generate a load with an alignment that is the same as whichever is more restrictive of the two alignments.
  Tests included.
  rdar://17762530
  llvm-svn: 214322
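  An illustrative before/after (hypothetical alignments) showing why the combined load must take the weaker of the two guarantees:

    ; before: the two loads carry different alignment guarantees
    %a = load <4 x float>* %p, align 16
    %b = load <4 x float>* %q, align 4
    %v = select i1 %c, <4 x float> %a, <4 x float> %b
    ; after: one load through the selected address must use the weaker alignment
    %addr = select i1 %c, <4 x float>* %p, <4 x float>* %q
    %v2   = load <4 x float>* %addr, align 4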
* Add support for scalarizing ctlz_zero_undef. (Petar Jovanovic, 2014-07-30, 1 file, -0/+1)
  Fix the missing case in ScalarizeVectorResult() that was exposed with libclcore.bc in Android.
  Differential Revision: http://reviews.llvm.org/D4645
  llvm-svn: 214266
* ARM: fix @llvm.convert.from.fp16 on softfloat targets. (Tim Northover, 2014-07-29, 1 file, -1/+6)
  We need to make sure we use the softened version of all appropriate operands in the libcall, or things go horribly wrong. This may entail actually executing a 1-stage softening.
  llvm-svn: 214175
* [SDAG] Add DEBUG logging to the legalizer, fixing a "bug" found by inspection in the process, and shuffle the logging in the DAG combiner around a bit. (Chandler Carruth, 2014-07-28, 2 files, -6/+21)
  With this it is much easier to follow what the legalizer is doing. It should even accurately present most of the strange legalization operations where a single node is replaced by multiple nodes, etc. There is still some information lost (we log SDNodes, not SDValues, so we don't log which result is used for which thing), but I think this is much closer to a usable system. Notably, this will make it *much* more apparent when legalization is actually happening inside the combiner, or when there is a cycle caused by interactions of the legalizer and the combiner.
  The "bug" I fixed here I'm not sure is remotely possible to trigger. We were only adding one of the nodes in a replacement to the updated set rather than all of the nodes in the replacement. Realistically, the worst result of this is nodes not getting back onto the worklist in the DAG combiner. I doubt it is possible to trigger this today, and I certainly don't have any ideas about how, but this at least brings the code into alignment with the principled operation of the routine.
  llvm-svn: 214105
* Add alignment value to allowsUnalignedMemoryAccess. (Matt Arsenault, 2014-07-27, 3 files, -12/+17)
  Rename to allowsMisalignedMemoryAccess. On R600, 8- and 16-byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code.
  llvm-svn: 214055
* [SDAG] Add an assert that we don't mess up the number of values when replacing nodes in the legalizer. (Chandler Carruth, 2014-07-26, 1 file, -0/+3)
  This caught a number of bugs for me during development.
  llvm-svn: 214022
* [SDAG] Simplify the code for handling single-value nodes and add a missing transfer of debug information (without which tests fail). (Chandler Carruth, 2014-07-26, 1 file, -8/+12)
  llvm-svn: 214021
* [SDAG] When performing post-legalize DAG combining, run the legalizer over each node in the worklist prior to combining. (Chandler Carruth, 2014-07-26, 2 files, -61/+107)
  This allows the combiner to produce new nodes which need to go back through legalization. This is particularly useful when generating operands to target-specific nodes in a post-legalize DAG combine where the operands are significantly easier to express as pre-legalized operations. My immediate use case will be PSHUFB formation, where we need to build a constant shuffle mask with a build_vector node.
  This also refactors the relevant functionality in the legalizer to support this, and updates relevant tests. I've spoken to the R600 folks and these changes look like improvements to them. The avx512 change needs to be investigated; I suspect there is a disagreement between the legalizer and the DAG combiner there, but it seems a minor issue, so leaving it to be re-evaluated after this patch.
  Differential Revision: http://reviews.llvm.org/D4564
  llvm-svn: 214020
* Add @llvm.assume, lowering, and some basic properties. (Hal Finkel, 2014-07-25, 1 file, -1/+2)
  This is the first commit in a series that adds an @llvm.assume intrinsic which can be used to provide the optimizer with a condition it may assume to be true (when the control flow would hit the intrinsic call).
  Some basic properties are added here:
  - llvm.assume(true) is dead.
  - llvm.assume(false) is unreachable (this directly corresponds to the documented behavior of MSVC's __assume(0)), as is llvm.assume(undef).
  The intrinsic is tagged as writing arbitrarily, in order to maintain control dependencies. BasicAA has been updated, however, to return NoModRef for any particular location-based query so that we don't unnecessarily block code motion.
  llvm-svn: 213973
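  The basic usage shape of the new intrinsic, as an illustrative IR fragment:

    declare void @llvm.assume(i1)
    ...
    %cmp = icmp sgt i32 %x, 0
    call void @llvm.assume(i1 %cmp)   ; from here on, the optimizer may assume %x > 0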
* [stack protector] Fix a potential security bug in stack protector where the address of the stack guard was being spilled to the stack. (Akira Hatanaka, 2014-07-25, 2 files, -6/+50)
  Previously the address of the stack guard would get spilled to the stack if it was impossible to keep it in a register. This patch introduces a new target-independent node and pseudo instruction which gets expanded post-RA to a sequence of instructions that load the stack guard value. The register allocator can now just remat the value when it can't keep it in a register.
  <rdar://problem/12475629>
  llvm-svn: 213967
* [SDAG] Don't insert the VRBase into a mapping from SDValues when the def doesn't actually correspond to an SDValue at all. (Chandler Carruth, 2014-07-25, 1 file, -6/+10)
  Fixes most of the remaining asserts on out-of-range SDValue result numbers.
  llvm-svn: 213930
* Store nodes only have 1 result. (Matt Arsenault, 2014-07-25, 1 file, -1/+1)
  llvm-svn: 213928
* [SDAG] Start plumbing an assert into SDValues that we don't form one with a result number outside the range of results for the node. (Chandler Carruth, 2014-07-25, 1 file, -1/+1)
  I don't know how we managed to not really check this very basic invariant for so long, but the code is *very* broken at this point. I have over 270 test failures with the assert enabled. I'm committing it disabled so that others can join in the cleanup effort and reproduce the issues. I've also included one of the obvious fixes that I already found.
  More fixes to come.
  llvm-svn: 213926
* [SDAG] Introduce a combined set to the DAG combiner which tracks nodes which have successfully round-tripped through the combine phase, and use this to ensure all operands to DAG nodes are visited by the combiner, even if they are only added during the combine phase. (Chandler Carruth, 2014-07-24, 1 file, -5/+21)
  This is critical to have the combiner reach nodes that are *introduced* during combining. Previously these would sometimes be visited and sometimes not be visited based on whether they happened to end up on the worklist or not. Now we always run them through the combiner.
  This fixes quite a few bad codegen test cases lurking in the suite while also being more principled. Among these, the TLS code generation is particularly exciting for programs that have this in the critical path, like TSan-instrumented binaries (although I think they engineer to use a different TLS that is faster anyway).
  I've tried to check for compile-time regressions here by running llc over a merged (but not LTO-ed) clang bitcode file and observed at most a 3% slowdown in llc. Given that this is essentially a worst case (none of opt or clang are running at this phase), I think this is tolerable. The actual LTO case should be even less costly, and the cost in normal compilation should be negligible.
  With this combining logic, it is possible to re-legalize as we combine, which is necessary to implement PSHUFB formation on x86 as a post-legalize DAG combine (my ultimate goal).
  Differential Revision: http://reviews.llvm.org/D4638
  llvm-svn: 213898
* [x86] Make vector legalization of extloads work more like the "normal" vector operation legalization, with support for custom target lowering and fallback to expand when it fails, and use this to implement sext and anyext load lowering for x86 in a more principled way. (Chandler Carruth, 2014-07-24, 1 file, -5/+22)
  Previously, the x86 backend relied on a target DAG combine to "combine away" sextload and extload nodes prior to legalization, or would expand them during legalization with terrible code. This is particularly problematic because the DAG combine relies on running over non-canonical DAG nodes at just the right time to match several common and important patterns. It used a combine rather than lowering because we didn't have good lowering support, and to expose some tricks being employed to more combine phases.
  With this change it becomes a proper lowering operation: the backend marks that it can lower these nodes, and I've added support for handling the canonical forms that don't have direct legal representations, such as sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases for this behavior continue to pass even after the DAG combiner begins running more systematically over every node.
  There is some noise caused by this in the test suite where we actually use vector extends instead of subregister extraction. This doesn't really seem like the right thing to do, but is unlikely to be a critical regression. We do regress in one case where by lowering to the target-specific patterns early we were able to combine away extraneous legal math nodes. However, this regression is completely addressed by switching to a widening-based legalization, which is what I'm working toward anyway, so I've just switched the test to that mode.
  Differential Revision: http://reviews.llvm.org/D4654
  llvm-svn: 213897
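  The canonical form in question looks like this at the IR level (hypothetical fragment); on AVX1 the resulting sextload has no single legal instruction and now gets a principled custom lowering instead of an ad-hoc combine:

    %v = load <4 x i8>* %p, align 4
    %e = sext <4 x i8> %v to <4 x i64>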
* AA metadata refactoring (introduce AAMDNodes). (Hal Finkel, 2014-07-24, 10 files, -110/+116)
  In order to enable the preservation of noalias function parameter information after inlining, and the representation of block-level __restrict__ pointer information (etc.), additional kinds of aliasing metadata will be introduced. This metadata needs to be carried around in AliasAnalysis::Location objects (and MMOs at the SDAG level), and so we need to generalize the current scheme (which is hard-coded to just one TBAA MDNode*).
  This commit introduces only the necessary refactoring to allow for the introduction of other aliasing metadata types, but does not actually introduce any (that will come in a follow-up commit). What it does introduce is a new AAMDNodes structure to hold all of the aliasing metadata nodes associated with a particular memory-accessing instruction, and uses that structure instead of the raw MDNode* in AliasAnalysis::Location, etc.
  No functionality change intended.
  llvm-svn: 213859
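  For orientation, the aliasing metadata being generalized is the kind attached to memory-accessing instructions, e.g. (illustrative fragment; !0 is a hypothetical TBAA node):

    %x = load i32* %p, align 4, !tbaa !0

  AAMDNodes bundles all such nodes for one instruction so that AliasAnalysis::Location (and MMOs) can carry more than the single TBAA node.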