bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[CodeGenPrepare] Make -addr-sink-using-gep work with address spaces.	Eli Friedman	2017-02-24	1	-3/+8
\| \| \| \| \| \| \| \| \| \|	When we construct addressing modes, we use isNoopAddrSpaceCast to ignore addrspacecast instructions. Make sure we insert the correct addrspacecast when we reconstruct the addressing mode. Differential Revision: https://reviews.llvm.org/D30114 llvm-svn: 296167
*	[CGP] Split some critical edges coming out of indirect branches	Michael Kuperstein	2017-02-24	1	-0/+254
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants, which we place in the predecessor block, even though only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to the stack. The end result is that a clang-compiled python interpreter can be ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296149
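The idea in this message can be illustrated on a toy control-flow graph. The sketch below is not the CodeGenPrepare implementation; the block names, the `.body` suffix, and both helper functions are hypothetical. It treats an edge as critical when its source has several successors and its destination has several predecessors, then "splits" the edge by splitting the destination block and retargeting the direct branches.

```python
from collections import defaultdict


def critical_edges(cfg):
    """Return sorted (src, dst) edges where src has >1 successor and dst has >1 predecessor."""
    preds = defaultdict(set)
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)
    return sorted(
        (src, dst)
        for src, succs in cfg.items()
        for dst in set(succs)
        if len(set(succs)) > 1 and len(preds[dst]) > 1
    )


def split_indirect_target(cfg, target, indirect_pred):
    """Split `target` so that only `indirect_pred` still reaches its head block."""
    body = target + ".body"  # hypothetical name for the newly created block
    new_cfg = {block: list(succs) for block, succs in cfg.items()}
    new_cfg[body] = new_cfg[target]   # the body inherits the old successors
    new_cfg[target] = [body]          # the head just falls through to the body
    for block in list(new_cfg):
        if block not in (indirect_pred, target):
            # Direct branches are retargeted to the body.
            new_cfg[block] = [body if s == target else s for s in new_cfg[block]]
    return new_cfg


# "dispatch" stands in for a block ending in an indirectbr.
cfg = {
    "dispatch": ["op_a", "op_b"],
    "other": ["op_a"],
    "op_a": ["exit"],
    "op_b": ["exit"],
    "exit": [],
}
after = split_indirect_target(cfg, "op_a", "dispatch")
print(critical_edges(cfg), critical_edges(after))
```

After the split, the head of `op_a` has a single predecessor, so code that must run only on the `dispatch -> op_a` path has a place to live without touching the indirectbr itself.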
*	Revert r296060 to pacify bots.	Michael Kuperstein	2017-02-24	1	-254/+0
\| \| \| \|	llvm-svn: 296064
*	[CGP] Split some critical edges coming out of indirect branches	Michael Kuperstein	2017-02-24	1	-0/+254
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants, which we place in the predecessor block, even though only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to the stack. The end result is that a clang-compiled python interpreter can be ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296060
*	[Analysis] Centralize objectsize lowering logic.	George Burgess IV	2016-12-20	1	-1/+34
\| \| \| \| \| \| \| \| \|	We're currently doing nearly the same thing for @llvm.objectsize in three different places: two of them are missing checks for overflow, and one of them could subtly break if InstCombine gets much smarter about removing alloc sites. Seems like a good idea to not do that. llvm-svn: 290214
*	[CodeGenPrep] Skip merging empty case blocks	Jun Bum Lim	2016-12-16	3	-6/+206
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a recommit of r287553 after fixing the invalid loop info left behind when eliminating an empty block, and the unit test failures in AVR and WebAssembly. Summary: Merging an empty case block into the header block of a switch could cause ISel to add COPY instructions in the header of the switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289988
*	Fix CodeGenPrepare::stripInvariantGroupMetadata	Sanjoy Das	2016-12-16	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \|	`dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs. The only reason it worked at all is that `getMetadataID` returns something unrelated -- it returns the subclass ID of the receiver (which is used in `dyn_cast` etc.). That does not numerically match `LLVMContext::MD_invariant_group` and ends up dropping `invariant_group` along with every other metadata that does not numerically match `LLVMContext::MD_invariant_group`. llvm-svn: 289973
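The failure mode described here follows from the keep-list semantics: anything not named in the list is dropped. A minimal model, assuming made-up numeric IDs (the constants and the helper below are hypothetical stand-ins, not LLVM's real values or API, and debug metadata is not modeled):

```python
# Hypothetical metadata kind IDs, chosen only for illustration.
MD_TBAA = 1
MD_RANGE = 4
MD_INVARIANT_GROUP = 16
UNRELATED_SUBCLASS_ID = 3  # stands in for whatever getMetadataID() happened to return


def drop_unknown_metadata(attached, known_ids):
    """Keep only the metadata kinds listed in known_ids; drop everything else."""
    return {kind: node for kind, node in attached.items() if kind in known_ids}


attached = {MD_TBAA: "tbaa", MD_RANGE: "range", MD_INVARIANT_GROUP: "group"}

# Buggy call: the "known" list holds an unrelated number, so nothing survives.
buggy = drop_unknown_metadata(attached, [UNRELATED_SUBCLASS_ID])

# Intended effect: every kind except invariant_group is treated as known.
keep = [kind for kind in attached if kind != MD_INVARIANT_GROUP]
intended = drop_unknown_metadata(attached, keep)
print(buggy, intended)
```

The buggy call does strip `invariant_group`, which is why the problem went unnoticed, but it strips every other kind as well.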
*	Revert "[CodeGenPrep] Skip merging empty case blocks"	Jun Bum Lim	2016-12-16	3	-206/+6
\| \| \| \| \| \|	This reverts commit r289951. llvm-svn: 289960
*	[CodeGenPrep] Skip merging empty case blocks	Jun Bum Lim	2016-12-16	3	-6/+206
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a recommit of r287553 after fixing the invalid loop info left behind when eliminating an empty block. Summary: Merging an empty case block into the header block of a switch could cause ISel to add COPY instructions in the header of the switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289951
*	AMDGPU: Implement isCheapAddrSpaceCast	Matt Arsenault	2016-12-02	1	-0/+121
\| \| \| \|	llvm-svn: 288523
*	Revert r287553: [CodeGenPrep] Skip merging empty case blocks	Joerg Sonnenberger	2016-11-28	3	-150/+6
\| \| \| \| \| \| \|	It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line 670 ("Expected irreducible CFG"). llvm-svn: 288052
*	[CodeGenPrepare] Don't sink non-cheap addrspacecasts.	Justin Lebar	2016-11-21	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Previously, CGP would unconditionally sink addrspacecast instructions, even going so far as to sink them into a loop. Now we check that the cast is "cheap", as defined by TLI. We introduce a new "is-cheap" function to TLI rather than using isNopAddrSpaceCast because some GPU platforms want the ability to ask for non-nop casts to be sunk. Reviewers: arsenm, tra Subscribers: jholewinski, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26923 llvm-svn: 287591
*	[CodeGenPrep] Skip merging empty case blocks	Jun Bum Lim	2016-11-21	3	-6/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Merging an empty case block into the header block of a switch could cause ISel to add COPY instructions in the header of the switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 287553
*	[BypassSlowDivision] Handle division by constant numerators better.	Justin Lebar	2016-11-16	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We don't do BypassSlowDivision when the denominator is a constant, but we do do it when the numerator is a constant. This patch makes two related changes to BypassSlowDivision when the numerator is a constant: * If the numerator is too large to fit into the bypass width, don't bypass slow division (because we'll never run the smaller-width code). * If we bypass slow division where the numerator is a constant, don't OR together the numerator and denominator when determining whether both operands fit within the bypass width. We need to check only the denominator. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26699 llvm-svn: 287062
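The two rules in this message can be written down as a small decision procedure. This is only a sketch over unsigned Python integers; the function names and the returned labels are hypothetical and do not mirror the pass's actual code.

```python
def fits_in_bypass_width(value, bypass_bits):
    """True when an unsigned value has no bits set above the bypass width."""
    return value >> bypass_bits == 0


def runtime_check_general(numerator, denominator, bypass_bits):
    """General case: OR the operands so one test covers both of them."""
    return fits_in_bypass_width(numerator | denominator, bypass_bits)


def plan_for_constant_numerator(numerator, bypass_bits):
    """Decide at compile time what to emit when the numerator is a constant."""
    if not fits_in_bypass_width(numerator, bypass_bits):
        # The narrow division could never run, so do not bypass at all.
        return "no-bypass"
    # The numerator is known to fit, so only the denominator needs a runtime test.
    return "check-denominator-only"


print(plan_for_constant_numerator(1 << 40, 32))  # constant too wide for a 32-bit bypass
print(plan_for_constant_numerator(1000, 32))     # constant fits
```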
*	Add missing lit.local.cfg to llvm/test/Transforms/CodeGenPrepare/NVPTX.	Justin Lebar	2016-10-28	1	-0/+2
\| \| \| \|	llvm-svn: 285464
*	Don't leave unused divs/rems sitting around in BypassSlowDivision.	Justin Lebar	2016-10-28	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This "pass" eagerly creates div and rem instructions even when only one is needed -- it relies on a later pass (machine DCE?) to clean them up. This is problematic not just from a cleanliness perspective (this pass is running during CodeGenPrepare, so should leave the IR in a better state), but it also creates a problem for instruction selection. If we always have a div+rem, isel will always select a divrem instruction (if possible), even when a single div or rem would do. Specifically, in NVPTX, we want to compute rem from the output of div, if available. But if a div is not available, we want to leave the rem alone. This transformation is overeager if div is always available. Because this code runs as part of CodeGenPrepare, it's nontrivial to write a test for this change. But this will effectively be tested by a later patch which adds the aforementioned change to NVPTX isel. Reviewers: tra Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26088 llvm-svn: 285460
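The phrase "compute rem from the output of div" refers to the usual identity for unsigned operands, rem = a - (a / b) * b. A short sketch, assuming non-negative Python integers as a stand-in for unsigned machine values (the function name is hypothetical):

```python
def rem_from_div(a, b, quotient):
    """Recover the unsigned remainder from an already-computed quotient."""
    return a - quotient * b


a, b = 47, 5
q = a // b                     # the division that is available anyway
print(rem_from_div(a, b, q))   # one multiply and one subtract instead of a second division
```

The trade-off in the message is visible here: the identity only pays off when the quotient already exists. If it does not, computing it just to derive the remainder is no cheaper than a plain rem.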
*	Don't claim the udiv created in BypassSlowDivision is exact.	Justin Lebar	2016-10-28	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In BypassSlowDivision's short-dividend path, we would create, e.g., `udiv exact i32 %a, %b`. Here "exact" means that we are asserting that %a is a multiple of %b. But we have no reason to believe this must be true -- this is just a bug, as far as I can tell. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D26097 llvm-svn: 285459
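To see why an unjustified `exact` flag is a bug and not just a missed optimization, it helps to model the flag's meaning: the result is a poison value whenever the dividend is not a multiple of the divisor. The sketch below is a toy model; `POISON` is a hypothetical sentinel, and division by zero (undefined behavior in the IR) is deliberately left unmodeled.

```python
POISON = None  # hypothetical sentinel standing in for an LLVM poison value


def udiv(a, b, exact=False):
    """Toy model of unsigned division with an optional 'exact' flag."""
    quotient, remainder = divmod(a, b)
    if exact and remainder != 0:
        # The 'exact' promise was broken, so the result is poison.
        return POISON
    return quotient


print(udiv(12, 4, exact=True))   # the promise holds
print(udiv(13, 4, exact=True))   # the promise is broken
print(udiv(13, 4))               # plain udiv is always well defined for b != 0
```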
*	Update the section.ll to fix non-x86 failure.	Dehao Chen	2016-10-19	1	-10/+5
\| \| \| \|	llvm-svn: 284566
*	Revert r284545 again as the regression in ppc still exists. There is a bug in ↵	Dehao Chen	2016-10-19	1	-1/+1
\| \| \| \| \| \| \| \|	MBPI exposed by the patch. Also update the section.ll to fix non-x86 failure. llvm-svn: 284563
*	Add target for test to fix regression introduced by r284533.	Dehao Chen	2016-10-18	1	-0/+2
\| \| \| \|	llvm-svn: 284538
*	Use profile info to set function section prefix to group hot/cold functions.	Dehao Chen	2016-10-18	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The original implementation is in r261607, which was reverted in r269726 to accommodate the ProfileSummaryInfo analysis pass. The new implementation: 1. add a new metadata for function section prefix 2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function 3. output the section prefix set by CGP Reviewers: davidxl, eraman Subscribers: vsk, llvm-commits Differential Revision: https://reviews.llvm.org/D24989 llvm-svn: 284533
*	[CodeGenPrepare] Don't sink a cast past its user	David Majnemer	2016-04-27	1	-0/+32
\| \| \| \| \| \| \| \| \| \|	The sink cast machinery is supposed to sink casts as close to their user as possible. However, an EH pad is the first instruction in its basic block. Don't sink if the user is an EH pad. This fixes PR27536. llvm-svn: 267767
*	[CodeGenPrepare] don't convert an unpredictable select into control flow	Sanjay Patel	2016-04-26	1	-5/+24
\| \| \| \| \| \| \|	Suggested in the review of D19488: http://reviews.llvm.org/D19488 llvm-svn: 267504
*	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram.	Adrian Prantl	2016-04-15	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation: Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference to the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446
*	Calculate __builtin_object_size when pointer depends on a condition	Petar Jovanovic	2016-04-13	1	-0/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the calculation of __builtin_object_size when the pointer depends on a condition. Before this patch, the compiler did not know how to calculate the object size when it found a condition that could not be eliminated. This patch enables calculating __builtin_object_size even when the condition cannot be eliminated, by choosing the minimum or maximum of the candidate sizes. Whether the minimum or the maximum is chosen depends on the second argument of the __builtin_object_size function. Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D18438 llvm-svn: 266193
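The min-or-max choice can be sketched as follows. The helper is hypothetical and only models the selection rule; it relies on the documented meaning of the second argument, where types 0 and 1 request an upper bound on the size and types 2 and 3 request a lower bound.

```python
def object_size_of_select(size_if_true, size_if_false, size_type):
    """Pick a size for a pointer such as `cond ? a : b` when cond is unknown."""
    wants_minimum = bool(size_type & 2)  # types 2 and 3 ask for the minimum
    if wants_minimum:
        return min(size_if_true, size_if_false)
    return max(size_if_true, size_if_false)


# Modeled situation: char a[8], b[16]; char *p = cond ? a : b;
print(object_size_of_select(8, 16, 0))  # upper bound on the bytes reachable through p
print(object_size_of_select(8, 16, 2))  # lower bound on the bytes reachable through p
```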
*	[CodeGenPrepare] Avoid sinking soft-FP comparisons	Peter Zotov	2016-04-03	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sinking comparisons in CGP can undo the job of hoisting them done earlier by LICM, and soft-FP makes this an expensive mistake. A common pattern that produces floating point comparisons uniform over a loop is an explicit check for division by zero. If the divisor is hoisted out of the loop, the comparison can also be, but hoisting the function that unwinds is never legal, since it may cause side effects in the loop body prior to the unwinding to not be executed. Differential Revision: http://reviews.llvm.org/D18744 llvm-svn: 265264
*	testcase gardening: update the emissionKind enum to the new syntax. (NFC)	Adrian Prantl	2016-04-01	1	-1/+1
\| \| \| \|	llvm-svn: 265081
*	Keep CodeGenPrepare from preserving the domtree.	George Burgess IV	2016-03-22	1	-0/+41
\| \| \| \| \| \| \| \| \| \|	CGP modifies the domtree in some cases, so saying that it preserves the domtree is a lie. We'll be able to selectively preserve it with the new pass manager. Differential Revision: http://reviews.llvm.org/D16893 llvm-svn: 264099
*	[CGP] Duplicate addressing computation in cold paths if required to sink ↵	Philip Reames	2016-03-09	1	-0/+196
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	addressing mode This patch teaches CGP to duplicate addressing mode computations into cold paths (detected via explicit cold attribute on calls) if required to let addressing mode be safely sunk into the basic block containing each load and store. In general, duplicating code into cold blocks may result in code growth, but should not affect performance. In this case, it's better to duplicate some code than to put extra pressure on the register allocator by making it keep the address through the entirety of the fast path. This patch only handles addressing computations, but in principle, we could implement a more general cold scheduling heuristic which tries to reduce register pressure in the fast path by duplicating code into the cold path. Getting the profitability of the general case right seemed likely to be challenging, so I stuck to the existing case (addressing computation) we already had. Differential Revision: http://reviews.llvm.org/D17652 llvm-svn: 263074
*	[CodeGenPrepare] Remove load-based heuristic	Junmo Park	2016-02-25	1	-7/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Both the hardware and LLVM have changed since 2012. Now, the load-based heuristic doesn't show big differences any more on OoO cores. There are no notable regressions or improvements on spec2000/2006 (Cortex-A57, Core i5). Reviewers: spatel, zansari Differential Revision: http://reviews.llvm.org/D16836 llvm-svn: 261809
*	AMDGPU: Remove some old intrinsic uses from tests	Matt Arsenault	2016-02-11	1	-2/+2
\| \| \| \|	llvm-svn: 260493
*	[InstCombine] Rewrite bswap/bitreverse handling completely.	James Molloy	2016-01-15	3	-0/+93
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several requirements that ended up with this design; 1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early. 2. Bitreversals and byteswaps are very related in their matching logic. 3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses. 4. Bswaps are best matched early in InstCombine. The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals. We can then extend the matching logic in one place only. llvm-svn: 257875
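One way to see why byteswaps and bitreversals share matching logic is that both are fixed permutations of bits: each result bit comes from exactly one source bit. The sketch below expresses both as "result bit to source bit" maps and applies them with one shared routine. It illustrates the relationship only and is not the utility added to Transforms/Utils/Local.h; all names are hypothetical.

```python
def bswap_source_bit(i, width):
    """Source bit feeding result bit i of a byte swap."""
    byte, bit = divmod(i, 8)
    return (width // 8 - 1 - byte) * 8 + bit


def bitreverse_source_bit(i, width):
    """Source bit feeding result bit i of a bit reversal."""
    return width - 1 - i


def apply_permutation(x, width, source_bit):
    """Build the result by pulling each bit from its source position."""
    return sum(((x >> source_bit(i, width)) & 1) << i for i in range(width))


print(hex(apply_permutation(0x11223344, 32, bswap_source_bit)))
print(hex(apply_permutation(0x00000001, 32, bitreverse_source_bit)))
```

A matcher built on this view only has to recover the provenance of each result bit and then compare it against one map or the other, which is consistent with the message's point that the logic can live in one place.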
*	Reapply r257105 "[Verifier] Check that debug values have proper size"	Keno Fischer	2016-01-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I originally reapplied this in r257550, but had to revert again due to bot breakage. The only change in this version is to allow either the TypeSize or the TypeAllocSize of the variable to be the one represented in debug info (hopefully in the future we can figure out how to encode the difference). Additionally, several bot failures following r257550 were due to optimizer bugs now fixed in r257787 and r257795. The r257550 commit message was: ``` The following extra changes were made to test cases: Manually making the variable be the actual type instead of a pointer to avoid pointer-size differences in generic code: LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll LLVM :: DebugInfo/Generic/varargs.ll Delete sizing information from debug info for the same reason (but the presence of the pointer was important to the test case): LLVM :: DebugInfo/Generic/restrict.ll LLVM :: DebugInfo/Generic/tu-composite.ll LLVM :: Linker/type-unique-type-array-a.ll LLVM :: Linker/type-unique-simple2.ll Fixing an incorrect DW_OP_deref LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll Fixing a missing DW_OP_deref LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll Additionally, clang should no longer complain during bootstrap after r257534. The original commit message was: `` Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. 
One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized) one is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declare size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref `` ``` llvm-svn: 257850
*	Re-Revert r257105 (Verifier debug info changes)	Keno Fischer	2016-01-13	1	-1/+1
\| \| \| \| \| \| \|	While I investigate some new buildbot failures. This was originally reapplied as r257550 and r257558. llvm-svn: 257563
*	Reapply r257105 "[Verifier] Check that debug values have proper size"	Keno Fischer	2016-01-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following extra changes were made to test cases: Manually making the variable be the actual type instead of a pointer to avoid pointer-size differences in generic code: LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll LLVM :: DebugInfo/Generic/varargs.ll Delete sizing information from debug info for the same reason (but the presence of the pointer was important to the test case): LLVM :: DebugInfo/Generic/restrict.ll LLVM :: DebugInfo/Generic/tu-composite.ll LLVM :: Linker/type-unique-type-array-a.ll LLVM :: Linker/type-unique-simple2.ll Fixing an incorrect DW_OP_deref LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll Fixing a missing DW_OP_deref LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll Additionally, clang should no longer complain during bootstrap after r257534. The original commit message was: ``` Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized) one is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. 
Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declare size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref ``` llvm-svn: 257550
*	Temporarily revert r257105 "[Verifier] Check that debug values have proper size"	Keno Fischer	2016-01-07	1	-1/+1
\| \| \| \| \| \| \|	Looks like there's a case where clang generates debug info that triggers the new verifier check. Reverting while investigating. llvm-svn: 257107
*	[Verifier] Check that debug values have proper size	Keno Fischer	2016-01-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized) one is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declare size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref Reviewers: aprantl Differential Revision: http://reviews.llvm.org/D14276 llvm-svn: 257105
*	[gc.statepoint] Change gc.statepoint intrinsic's return type to token type ↵	Chen Li	2015-12-26	1	-33/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	instead of i32 type Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types prevents LLVM from merging different gc.statepoint nodes into PHI nodes, which would cause further problems with gc relocations. The patch also changes how gc.relocate and gc.result look for their corresponding gc.statepoint on the unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob Subscribers: reames, mjacob, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15662 llvm-svn: 256443
*	Remove double blanks. NFC.	Manuel Jacob	2015-12-19	1	-7/+7
\| \| \| \|	llvm-svn: 256100
*	Move catchpad-phi-cast.ll to the X86 specific subdirectory	David Majnemer	2015-12-12	1	-0/+0
\| \| \| \| \| \| \|	It is X86 specific and will not be properly exercised unless LLVM is built with the X86 target. llvm-svn: 255426
*	[IR] Reformulate LLVM's EH funclet IR	David Majnemer	2015-12-12	1	-25/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While we have successfully implemented a funclet-oriented EH scheme on top of LLVM IR, our scheme has some notable deficiencies: - catchendpad and cleanupendpad are necessary in the current design but they are difficult to explain to others, even to seasoned LLVM experts. - catchendpad and cleanupendpad are optimization barriers. They cannot be split and force all potentially throwing call-sites to be invokes. This has a noticeable effect on the quality of our code generation. - catchpad, while similar in some aspects to invoke, is fairly awkward. It is unsplittable, starts a funclet, and has control flow to other funclets. - The nesting relationship between funclets is currently a property of control flow edges. Because of this, we are forced to carefully analyze the flow graph to see if there might potentially exist illegal nesting among funclets. While we have logic to clone funclets when they are illegally nested, it would be nicer if we had a representation which forbade them upfront. Let's clean this up a bit by doing the following: - Instead, make catchpad more like cleanuppad and landingpad: no control flow, just a bunch of simple operands; catchpad would be splittable. - Introduce catchswitch, a control flow instruction designed to model the constraints of funclet oriented EH. - Make funclet scoping explicit by having funclet instructions consume the token produced by the funclet which contains them. - Remove catchendpad and cleanupendpad. Their presence can be inferred implicitly using coloring information. N.B. The state numbering code for the CLR has been updated but the veracity of its output cannot be spoken for. An expert should take a look to make sure the results are reasonable. Reviewers: rnk, JosephTremoulet, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D15139 llvm-svn: 255422
*	[CGP] Reimplement r255055 a different way	Reid Kleckner	2015-12-08	1	-0/+57
\| \| \| \|	llvm-svn: 255070
*	Revert "[CGP] Check that we have an insert point before moving ↵	Reid Kleckner	2015-12-08	1	-57/+0
\| \| \| \| \| \| \| \| \| \|	llvm.dbg.value around" This reverts commit r255055. Breakage has been reported. llvm-svn: 255063
*	[CGP] Check that we have an insert point before moving llvm.dbg.value around	Reid Kleckner	2015-12-08	1	-0/+57
\| \| \| \|	llvm-svn: 255055
*	[WinEH] Fix problem where CodeGenPrepare incorrectly sinks a bitcast into an ↵	Andrew Kaylor	2015-11-23	1	-0/+59
\| \| \| \| \| \| \| \|	EH pad. Differential Revision: http://reviews.llvm.org/D14842 llvm-svn: 253902
*	Move free-zext.ll to llvm/test/Transforms/CodeGenPrepare/AArch64/	NAKAMURA Takumi	2015-11-20	1	-0/+0
\| \| \| \|	llvm-svn: 253730
*	[CodeGenPrepare] Create more extloads and fewer ands	Geoff Berry	2015-11-20	1	-0/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add `and` instructions immediately after loads that only have their low bits used, assuming that the (and (load x) c) will be matched as an extload and the ands/truncs fed by the extload will be removed by isel. Reviewers: mcrosier, qcolombet, ab Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14584 llvm-svn: 253722
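The reason an (and (load x) c) pattern can become a single extending load is that masking the low bits of a wide load yields the same value as a zero-extended narrow load from the same address. A small check of that equivalence, assuming a little-endian memory layout (the `load` helper and the byte values are illustrative only):

```python
def load(memory, offset, nbytes):
    """Little-endian unsigned load of nbytes starting at offset."""
    return int.from_bytes(memory[offset:offset + nbytes], "little")


memory = bytes([0x78, 0x56, 0x34, 0x12])

wide_then_mask = load(memory, 0, 4) & 0xFF   # (and (load i32) 255)
zext_byte_load = load(memory, 0, 1)          # zero-extending 8-bit load

print(hex(wide_then_mask), hex(zext_byte_load))
```

On a big-endian target the narrow load would have to read from a different offset, which is one reason this kind of matching is left to target-aware instruction selection.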
*	this new test file was accidentally left out of r253573	Sanjay Patel	2015-11-19	1	-0/+56
\| \| \| \|	llvm-svn: 253574
*	Revert "Change memcpy/memset/memmove to have dest and source alignments."	Pete Cooper	2015-11-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543
*	Change memcpy/memset/memmove to have dest and source alignments.	Pete Cooper	2015-11-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. 
Note, changes in the future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511
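Two points from this message lend themselves to a short sketch: the old single alignment argument had to be the minimum of the two real alignments, and the old argument can be stripped from textual IR with a regular expression. The pattern below is written with explicit `*` quantifiers, which is an assumption about the intended sed expression. The helper names and the sample line are hypothetical.

```python
import re


def legacy_alignment(dest_align, src_align):
    """The single old-style argument could only promise the weaker alignment."""
    return min(dest_align, src_align)


# Assumed form of the pattern: capture everything up to the trailing
# "i32 <align>, i1 false)" and re-emit it without the alignment argument.
OLD_ALIGN_ARG = re.compile(r"(call.*llvm\.memset.*)i32 [0-9]*, i1 false\)")


def strip_alignment_argument(line):
    """Remove the explicit alignment operand from an old-style memset call."""
    return OLD_ALIGN_ARG.sub(r"\g<1>i1 false)", line)


old = "call void @llvm.memset.p0i8.i32(i8* %dest, i8 0, i32 500, i32 8, i1 false)"
print(legacy_alignment(16, 8))
print(strip_alignment_argument(old))
```

As the message notes, stripping the argument loses the alignment information, so test cases that depend on it need an `align` attribute added back by hand.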