bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[ARM] Set correct successors in CMPXCHG pseudo expansion.	Ahmed Bougacha	2016-04-27	1	-3/+3
	transferSuccessors() would make LoadCmpBB a successor of DoneBB, whereas it should be a successor of the original MBB. The testcase changes are caused by Thumb2SizeReduction, which was previously confused by the broken CFG. Follow-up to r266679. Unfortunately, it's tricky to catch this in the verifier. llvm-svn: 267778
*	[ARM] Expand vector ctlz_zero_undef so it becomes ctlz.	Craig Topper	2016-04-26	1	-0/+62
	The default is Legal, which results in 'Cannot select' errors. llvm-svn: 267521
*	[ARM] Expand v1i64 and v2i64 ctlz.	Craig Topper	2016-04-26	1	-0/+16
	The default is legal, which results in 'Cannot select' errors. llvm-svn: 267520
*	Pass the test file in through stdin instead of by filename.	Richard Trieu	2016-04-26	1	-1/+1
	When passed in via filename, this test will fail if the path to the test contains the strings "f1" and "f2" anywhere. Pass the file through stdin to prevent test failures due to coincidences in path names. llvm-svn: 267517
*	ARM: put extern __thread stubs in a special section.	Tim Northover	2016-04-25	1	-0/+17
	The linker needs to know that the symbols are thread-local to do its job properly. llvm-svn: 267473
*	[ARM] Add support for the X asm constraint	Silviu Baranga	2016-04-25	2	-0/+178
	Summary: This patch adds support for the X asm constraint. To do this, we lower the constraint to either a "w" or "r" constraint depending on the operand type (both constraints are supported on ARM). Fixes PR26493. Reviewers: t.p.northover, echristo, rengolin Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D19061 llvm-svn: 267411
*	ARM: fix __chkstk Frame Setup on WoA	Saleem Abdulrasool	2016-04-24	4	-9/+9
	This corrects the MI annotations for the stack adjustment following the __chkstk invocation. We were marking the original SP usage as a Def rather than Kill. The (new) assigned value is the definition, the original reference is killed. Adjust the ISelLowering to mark Kills and FrameSetup as well. This partially resolves PR27480. llvm-svn: 267361
*	Fix llvm/test/CodeGen/ARM/Windows/dbzchk.ll not to check mixed output, take #2.	NAKAMURA Takumi	2016-04-22	1	-2/+2
	llvm-svn: 267242
*	CodeGen: Use PLT relocations for relative references to unnamed_addr functions.	Peter Collingbourne	2016-04-22	1	-0/+16
	The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual functions defined in other DSOs. The unnamed_addr attribute means that the function's address is not significant, so we're allowed to substitute it with the address of a PLT entry. Also includes a bonus feature: addends for COFF image-relative references. Differential Revision: http://reviews.llvm.org/D17938 llvm-svn: 267211
*	test: split test into two runs	Saleem Abdulrasool	2016-04-22	1	-8/+9
	Rather than checking both stdout and stderr simultaneously, split it into two tests. This apparently breaks on Windows where MSVCRT does not buffer output correctly. NFC. Thanks to chapuni for bringing the issue to my attention! llvm-svn: 267179
*	ARM: fix test for Windows division	Saleem Abdulrasool	2016-04-22	1	-4/+4
	This was meant to be part of SVN r267080. cbz cannot use a high register, which would be silently truncated. This has now been fixed. llvm-svn: 267092
*	ARM: restrict register class for WIN__DBZCHK	Saleem Abdulrasool	2016-04-21	1	-0/+47
	WIN__DBZCHK will insert a CBZ instruction into the stream. This instruction reserves 3 bits for the condition register (rn). As such, we must ensure that we restrict the register to a low register. Use the tGPR class instead of GPR to ensure that this is properly constrained. In debug builds, we would attempt to use lr as a condition register which would silently get truncated with no hint that the register selection was incorrect. llvm-svn: 267080
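The silent truncation described above can be illustrated with a small sketch. It assumes only what the message states (a 3-bit register field) plus the fact that lr is r14; the helper names are hypothetical and are not part of LLVM.

```python
# Hypothetical helpers, not LLVM code: model a 3-bit register field.
LR = 14  # lr is r14 in the ARM register file


def encode_3bit_reg(reg: int) -> int:
    """Keep only the low three bits, as a 3-bit register field would."""
    return reg & 0b111


def is_low_register(reg: int) -> bool:
    """Low registers r0-r7 are the only ones a 3-bit field can name."""
    return 0 <= reg <= 7


if __name__ == "__main__":
    for reg in (0, 7, 12, LR):
        status = "ok" if is_low_register(reg) else "truncated"
        print(f"r{reg} -> field names r{encode_3bit_reg(reg)} ({status})")
```

Under this model, a request for lr would end up naming r6, which is why constraining the operand to a low-register class is the safer choice.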
*	[LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC.	Mandeep Singh Grang	2016-04-19	3	-4/+4
	Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests. Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier Subscribers: mcrosier, dsanders Differential Revision: http://reviews.llvm.org/D19279 llvm-svn: 266834
*	ARM: fix assertion failure on -O0 cmpxchg.	Tim Northover	2016-04-19	1	-0/+21
	Because lowering of CMP_SWAP_64 occurs during type legalization, there can be i64 types produced by more than just a BUILD_PAIR or similar. My initial tests used just incoming function args. llvm-svn: 266828
*	[AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic.	Marcin Koscielnicki	2016-04-19	1	-2/+2
	Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that just return the thread pointer. I have a pending patch that does the same for SystemZ (D19054), and there are many more targets that could benefit from one. This patch merges the ARM and AArch64 intrinsics into a single target independent one that will also be used by subsequent targets. Differential Revision: http://reviews.llvm.org/D19098 llvm-svn: 266818
*	[SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARD	Tim Shen	2016-04-19	1	-1/+1
	With this change, ideally an IR pass can always generate an llvm.stackguard call to get the stack guard; but for now there are still IR-form stack guard customizations around (see getIRStackGuard()). Future SSP customization should go through LOAD_STACK_GUARD. There is a behavior change: stack guard values are not CSEed anymore, since we should never reuse the value in case it has been spilled (and corrupted). See ssp-guard-spill.ll. This also causes changes in stack size and codegen in X86 and AArch64 test cases. Ideally we'd like to know if the guard created in llvm.stackprotector() gets spilled or not. If the value is spilled, discard the value and reload the stack guard; otherwise reuse the value. This can be done by teaching the register allocator how to rematerialize LOAD_STACK_GUARD and forcing a rematerialization (which seems hard), or by checking for spilling in expandPostRAPseudo. It only makes sense when the stack guard is a global variable, which requires more instructions to load. Anyway, this seems to be outside the scope of the current patch. llvm-svn: 266806
*	ARM: use a pseudo-instruction for cmpxchg at -O0.	Tim Northover	2016-04-18	1	-0/+81
	The fast register-allocator cannot cope with inter-block dependencies without spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions where any value produced within a block is dead by the end, but not for cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded after regalloc. Fortunately this is at -O0 so we don't have to care about performance. This simplifies the various axes of expansion considerably: we assume a strong seq_cst operation and ensure ordering via the always-present DMB instructions rather than v8 acquire/release instructions. Should fix the 32-bit part of PR25526. llvm-svn: 266679
*	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram.	Adrian Prantl	2016-04-15	20	-72/+56
	Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation: Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference to the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446
*	[CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NAN	David Majnemer	2016-04-14	1	-0/+17
	The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only one of the inputs is NaN: -NUM will return the non-NaN argument while -NAN would return NaN. It is desirable to lower @llvm.{min,max}num to -NAN if the target doesn't have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the -NAN semantics. N.B. Of course, it is only safe to do this if the intrinsic call is marked nnan. llvm-svn: 266279
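The difference between the two semantics can be sketched as follows. This is only an illustrative model of the behavior described in the message, not LLVM code; the function names are hypothetical and signed-zero ordering is deliberately ignored.

```python
import math


def min_num(a: float, b: float) -> float:
    """-NUM semantics: if exactly one input is NaN, return the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def min_nan(a: float, b: float) -> float:
    """-NAN semantics: any NaN input makes the result NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return min(a, b)


if __name__ == "__main__":
    print(min_num(math.nan, 1.0))  # the non-NaN argument
    print(min_nan(math.nan, 1.0))  # NaN
    print(min_num(2.0, 1.0), min_nan(2.0, 1.0))  # identical without NaNs
```

The two functions only disagree when a NaN is present, which is why the substitution is restricted to calls marked nnan.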
*	ARM: override cost function to re-enable ConstantHoisting (& fix it).	Tim Northover	2016-04-13	2	-2/+24
	At some point, ARM stopped getting any benefit from ConstantHoisting because the pass called a different variant of getIntImmCost. Reimplementing the correct variant revealed some problems, however:
	+ ConstantHoisting was modifying switch statements. This is simply invalid; the cases must remain integer constants no matter the notional cost.
	+ ConstantHoisting was mangling alloca instructions in the entry block. These should be handled by FrameLowering, so constants actually have a cost of 0. Worse, the resulting bitcasts meant they became dynamic allocas.
	rdar://25707382 llvm-svn: 266260
*	ARM: Use a callee save register for the swiftself parameter.	Matthias Braun	2016-04-13	1	-23/+56
	It is very likely that the swiftself parameter is alive throughout most functions, so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers; I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callee's parameter. Differential Revision: http://reviews.llvm.org/D18901 llvm-svn: 266253
*	Recommit r265547, and r265610,r265639,r265657 on top of it, plus	Wei Mi	2016-04-13	1	-0/+162
	two fixes with one about error verify-regalloc reported, and another about live range update of phi after rematerialization. r265547: Replace analyzeSiblingValues with new algorithm to fix its compile time issue. The patch is to solve PR17409 and its duplicates. analyzeSiblingValues is an N x N complexity algorithm where N is the number of siblings generated by reg splitting. Although it causes significant compile time issues when N is large, it is also important for performance since it removes redundant spills and enables rematerialization. To solve the compile time issue, the patch removes analyzeSiblingValues and replaces it with lower cost alternatives containing two parts. The first part creates a new spill hoisting method in postOptimization of register allocation. It does spill hoisting at once after all the spills are generated instead of inside every instance of selectOrSplit. The second part queries the define expr of the original register for rematerialization and keeps it always available during register allocation even if it is already dead. It deletes those dead instructions only in postOptimization. With the two parts in the patch, it can remove analyzeSiblingValues without sacrificing performance. Patches on top of r265547: r265610 "Fix the compare-clang diff error introduced by r265547." r265639 "Fix the sanitizer bootstrap error in r265547." r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]" Differential Revision: http://reviews.llvm.org/D15302 Differential Revision: http://reviews.llvm.org/D18934 Differential Revision: http://reviews.llvm.org/D18935 Differential Revision: http://reviews.llvm.org/D18936 llvm-svn: 266162
*	CodeGen: Clear the MFI's save and restore point after PrologEpilogInserter	Justin Bogner	2016-04-12	1	-0/+27
	This state is no longer useful and not guaranteed to be valid in later codegen passes. For example, see the added test, which would print a savepoint of %bb.-1 without this change, and crashes with a use-after-free error under ASan if you apply the recycling allocator patch from llvm.org/PR26808. llvm-svn: 266150
*	ARM: use r7 as the frame-pointer on all MachO targets.	Tim Northover	2016-04-11	2	-8/+10
	This is better for a few reasons:
	+ It matches the other tooling for iOS.
	+ It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't use ARM mode).
	+ It leads to infinitesimally smaller code (0.2%, yay!).
	rdar://25369506 llvm-svn: 266003
*	Swift Calling Convention: swifterror target support.	Manman Ren	2016-04-11	1	-0/+381
	Differential Revision: http://reviews.llvm.org/D18716 llvm-svn: 265997
*	More upgrading of old- and very-old-style debug info in testcases.	Adrian Prantl	2016-04-11	2	-3/+3
	llvm-svn: 265953
*	Add trailing colons to labels in a test.	James Y Knight	2016-04-08	1	-12/+12
	This will avoid matching on the FILENAME if it happened to contain, say, "f4" anywhere in the file path. llvm-svn: 265837
*	Revert r265817	Colin LeMahieu	2016-04-08	4	-8/+8
	lld tests need to be addressed. llvm-svn: 265822
*	[llvm-objdump] Printing hex instead of dec by default	Colin LeMahieu	2016-04-08	4	-8/+8
	Differential Revision: http://reviews.llvm.org/D18770 llvm-svn: 265817
*	[ARM] Enable SMLAW[B|T] and SMULW[B|T] instruction selection	Sam Parker	2016-04-08	2	-26/+105
	Added ISelDAGToDAG functions to enable selection of the smlawb, smlawt, smulwb and smulwt instructions for the ARM backend. Also updated the smul CodeGen test and removed the smulw one. Differential Revision: http://reviews.llvm.org/D18892 llvm-svn: 265793
*	Swift Calling Convention: swiftcc for ARM.	Manman Ren	2016-04-05	2	-0/+201
	Differential Revision: http://reviews.llvm.org/D18769 llvm-svn: 265482
*	Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge	Chuang-Yu Cheng	2016-04-05	2	-4/+4
	Presently, CodeGenPrepare deletes all nearly empty (only phi and branch) basic blocks. This pass can delete loop preheaders which frequently creates critical edges. A preheader can be a convenient place to spill registers to the stack. If the entrance to a loop body is a critical edge, then spills may occur in the loop body rather than immediately before it. This patch protects loop preheaders from deletion in CodeGenPrepare even if they are nearly empty. Since the patch alters the CFG, it affects a large number of test cases. In most cases, the changes are merely cosmetic (basic blocks have different names or instruction orders change slightly). I am somewhat concerned about the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not deleted, then the MIPS backend does not take advantage of a branch delay slot. Consequently, I would like some close review by a MIPS expert. The patch also partially subsumes D16893 from George Burgess IV. George correctly notes that CodeGenPrepare does not actually preserve the dominator tree. I think the dominator tree was usually not valid when CodeGenPrepare ran, but I am using LoopInfo to mark preheaders, so the dominator tree is now always valid before CodeGenPrepare. Author: Tom Jablin (tjablin) Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng http://reviews.llvm.org/D16984 llvm-svn: 265397
*	ARM, AArch64, X86: Check preserved registers for tail calls.	Matthias Braun	2016-04-04	1	-0/+22
	We can only perform a tail call to a callee that preserves all the registers that the caller needs to preserve. This situation happens with calling conventions like preserve_mostcc or cxx_fast_tls. It was explicitly handled for fast_tls and failing for preserve_most. This patch generalizes the check to any calling convention. Related to rdar://24207743 Differential Revision: http://reviews.llvm.org/D18680 llvm-svn: 265329
*	Add missing emissionKind flags to the DICompileUnits of several old testcases.	Adrian Prantl	2016-04-01	1	-1/+1
	llvm-svn: 265192
*	testcase gardening: update the emissionKind enum to the new syntax. (NFC)	Adrian Prantl	2016-04-01	18	-18/+18
	llvm-svn: 265081
*	Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.	Adrian Prantl	2016-03-31	7	-7/+7
	This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder into DICompileUnit. DIBuilder is not the right place for this enum to live in; a metadata consumer should not have to include DIBuilder.h. I also added a Verifier check that the emission kind of a DICompileUnit is actually legal. http://reviews.llvm.org/D18612 <rdar://problem/25427165> llvm-svn: 265077
*	[ARM] Expand v1i64 and v2i64 ctpop.	Benjamin Kramer	2016-03-31	1	-0/+16
	The default is legal, which results in 'Cannot select' errors. This is triggered during selfhost due to a recent cost model change. llvm-svn: 265040
*	Swift Calling Convention: add swiftself attribute.	Manman Ren	2016-03-29	1	-0/+32
	Differential Revision: http://reviews.llvm.org/D17866 llvm-svn: 264754
*	[Codegen] Decrease minimum jump table density.	Kyle Butt	2016-03-29	1	-1/+1
	Minimum density for both optsize and non-optsize functions is now set by options:
	-sparse-jump-table-density (default 10) for non-optsize functions
	-dense-jump-table-density (default 40) for optsize functions, which matches the current default.
	This improves several benchmarks at Google at the cost of a small code size increase. For code compiled with -Os, the old behavior continues. llvm-svn: 264689
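A rough sketch of how such a density threshold might be applied is shown below. The percentage formula (cases times 100 compared against range times minimum density) is an assumption made for illustration, and the function and constant names are hypothetical; only the two default values come from the message above.

```python
# Defaults quoted in the commit message; everything else is illustrative.
SPARSE_DEFAULT = 10  # non-optsize functions
DENSE_DEFAULT = 40   # optsize functions


def min_density(opt_for_size: bool) -> int:
    """Pick the minimum density (in percent) for the current function."""
    return DENSE_DEFAULT if opt_for_size else SPARSE_DEFAULT


def is_dense_enough(num_cases: int, value_range: int, opt_for_size: bool) -> bool:
    """Assumed check: the cases must fill at least min_density percent of the range."""
    return num_cases * 100 >= value_range * min_density(opt_for_size)


if __name__ == "__main__":
    # 3 cases spread over a range of 10 values: 30% dense.
    print(is_dense_enough(3, 10, opt_for_size=False))
    print(is_dense_enough(3, 10, opt_for_size=True))
```

With a lower threshold for non-optsize functions, sparser switches qualify for a jump table, which matches the stated trade-off of better benchmark results for slightly larger code.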
*	fix checks: _DAG -> -DAG	Sanjay Patel	2016-03-28	1	-1/+1
	llvm-svn: 264676
*	ARM: maintain BB ordering when expanding WIN__DBZCHK	Saleem Abdulrasool	2016-03-25	2	-25/+64
	It is possible to have a fallthrough MBB prior to MBB placement. The original addition of the BB would result in reordering the BB as not preceding the successor. Because of the fallthrough nature of the BB, we could end up executing incorrect code or even a constant pool island! Insert the spliced BB into the same location to avoid that. Thanks to Tim Northover for invaluable hints and Fiora for the discussion on what may have been occurring! llvm-svn: 264454
*	ARM: fix optimised division on WoA	Saleem Abdulrasool	2016-03-25	1	-3/+32
	We did not have an explicit branch to the continuation BB. When the check was hoisted, this could permit control flow to fall through into the division trap. Add the explicit branch to the continuation basic block to ensure that code execution is correct. llvm-svn: 264370
*	Remove unsafe AssertZext after promoting result of FP_TO_FP16	Pirama Arumuga Nainar	2016-03-24	1	-0/+12
	Summary: Some target lowerings of FP_TO_FP16, for instance ARM's vcvtb.f16.f32 instruction, do not guarantee that the top 16 bits are zeroed out. Remove the unsafe AssertZext and add tests to exercise this. Reviewers: jmolloy, sbaranga, kristof.beyls, aadg Subscribers: llvm-commits, srhines, aemerson Differential Revision: http://reviews.llvm.org/D18426 llvm-svn: 264285
*	CodeGen: check return types match when emitting tail call to builtin.	Tim Northover	2016-03-22	1	-0/+37
	We were just completely ignoring the types when determining whether we could safely emit a libcall as a tail call. This is clearly wrong. Theoretically, we could dig deeper looking for incidental matches (much like the generic code in Analysis.cpp does), but it's probably not worth it for the few libcalls that exist. llvm-svn: 264084
*	ARM: Better codegen for 64-bit compares.	Peter Collingbourne	2016-03-21	3	-144/+152
	This introduces a custom lowering for ISD::SETCCE (introduced in r253572) that allows us to emit a short code sequence for 64-bit compares.
	Before:
		push {r7, lr}
		cmp r0, r2
		mov.w r0, #0
		mov.w r12, #0
		it hs
		movhs r0, #1
		cmp r1, r3
		it ge
		movge.w r12, #1
		it eq
		moveq r12, r0
		cmp.w r12, #0
		bne .LBB1_2
	@ BB#1: @ %bb1
		bl f
		pop {r7, pc}
	.LBB1_2: @ %bb2
		bl g
		pop {r7, pc}
	After:
		push {r7, lr}
		subs r0, r0, r2
		sbcs.w r0, r1, r3
		bge .LBB1_2
	@ BB#1: @ %bb1
		bl f
		pop {r7, pc}
	.LBB1_2: @ %bb2
		bl g
		pop {r7, pc}
	Saves around 80KB in Chromium's libchrome.so. Some notes on this patch:
	- I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I introduced (nothing else needs them). However, they are necessary in order to avoid poor codegen, and they seem similar to existing combines in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to (brcond Compare)).
	- No support for Thumb-1. This is in principle possible, but we'd need to implement ARMISD::SUBE for Thumb-1.
	Differential Revision: http://reviews.llvm.org/D15256 llvm-svn: 263962
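Why a subs/sbcs pair followed by a "ge" branch is enough for a signed 64-bit compare can be checked with a small model. The sketch below is an illustration only, not LLVM code: it simulates the carry out of the low-word subtraction and the N and V flags of the high-word subtract-with-carry, then applies the "ge" condition (N equals V). All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a signed value."""
    return x - (1 << 32) if x & 0x80000000 else x


def sge64_via_subs_sbcs(a: int, b: int) -> bool:
    """Model of: subs lo, a_lo, b_lo ; sbcs hi, a_hi, b_hi ; branch if ge."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # subs: the carry flag is set when the low subtraction needs no borrow.
    carry = 1 if a_lo >= b_lo else 0

    # sbcs: high words minus the borrow left by the low words.
    exact = to_signed32(a_hi) - to_signed32(b_hi) - (1 - carry)
    result = exact & MASK32
    n_flag = result >> 31                                    # sign of the 32-bit result
    v_flag = 0 if -(1 << 31) <= exact < (1 << 31) else 1     # signed overflow

    # The ge condition holds when N == V.
    return n_flag == v_flag


if __name__ == "__main__":
    samples = [0, 1, -1, 5, 3, 2**31, -(2**31), 2**32, 2**63 - 1, -(2**63)]
    ok = all(sge64_via_subs_sbcs(a, b) == (a >= b) for a in samples for b in samples)
    print("model agrees with a >= b on all samples:", ok)
```

The low-word result itself is never inspected; only its borrow feeds into the high-word subtraction, which is what lets the two-instruction sequence replace the longer compare-and-select chain.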
*	[ARM] Add Cortex-A32 support	Renato Golin	2016-03-21	1	-0/+35
	Adding Cortex-A32 as an available target in the ARM backend. Patch by Sam Parker. llvm-svn: 263956
*	[DAGCombine] Catch the case where extract_vector_elt can cause an any_ext while processing AND SDNodes	Silviu Baranga	2016-03-21	1	-1/+37
	Summary: extract_vector_elt can cause an implicit any_ext if the types don't match. When processing the following pattern: (and (extract_vector_elt (load ([non_ext|any_ext|zero_ext] V))), c) DAGCombine was ignoring the possible extend, and sometimes removing the AND even though it was required to keep some of the bits in the result at 0, resulting in a miscompile. This change fixes the issue by limiting the transformation only to cases where the extract_vector_elt doesn't perform the implicit extend. Reviewers: t.p.northover, jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18247 llvm-svn: 263935
*	[CXX_FAST_TLS] Fix issues in ARM.	Manman Ren	2016-03-18	1	-4/+16
	We need to be careful about which registers can be explicitly handled via copies. Prologue and Epilogue use physical registers, and if one belongs to the set of CSRsViaCopy, it will no longer be CSRed, since PEI overwrites it after the explicit copies. llvm-svn: 263857
*	[CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.	Manman Ren	2016-03-18	1	-0/+13
	Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller and callee have mismatched calling conventions. llvm-svn: 263856
*	[CXX_FAST_TLS] fix issues with O0 on ARM, AArch64 and X86.	Manman Ren	2016-03-18	1	-1/+49
	Since at O0, explicit copies via SplitCSR may not be removed even if they are unnecessary, we choose not to use SplitCSR at O0. llvm-svn: 263855