bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.	Chen Li	2016-01-10	2	-0/+82
	Summary: This is a fix of D13718. D13718 was committed but then reverted because of the following bug: https://llvm.org/bugs/show_bug.cgi?id=25299 This patch fixes the issue shown in the bug. Reviewers: majnemer, reames Subscribers: jevinskie, llvm-commits Differential Revision: http://reviews.llvm.org/D14308 llvm-svn: 257277
*	[WinEH] Fix catchpad pred verification	Joseph Tremoulet	2016-01-10	1	-0/+31
	Summary: The code was simply ensuring that the catchpad's pred is its catchswitch, which was letting cases slip through where the flow edge was the unwind edge of the catchswitch rather than one of its catch clauses. Reviewers: andrew.w.kaylor, rnk, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16011 llvm-svn: 257275
*	[WinEH] Disallow cyclic unwinds	Joseph Tremoulet	2016-01-10	2	-42/+57
	Summary: Funclet-based EH personalities/tables likely can't handle these, and they can't be generated at source, so make them officially illegal in IR as well. Reviewers: andrew.w.kaylor, rnk, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15963 llvm-svn: 257274
*	[WinEH] Verify consistent funclet unwind exits	Joseph Tremoulet	2016-01-10	3	-3/+102
	Summary: A funclet EH pad may be exited by an unwind edge, which may be a cleanupret exiting its cleanuppad, an invoke exiting a funclet, or an unwind out of a nested funclet transitively exiting its parent. Funclet EH personalities require all such exceptional exits from a given funclet to have the same unwind destination, and EH preparation / state numbering / table generation implicitly depends on this. Formalize it as a rule of the IR in the LangRef and verifier. Reviewers: rnk, majnemer, andrew.w.kaylor Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15962 llvm-svn: 257273
*	[WinEH] Verify unwind edges against EH pad tree	Joseph Tremoulet	2016-01-10	3	-3/+91
	Summary: Funclet EH personalities require a tree-like nesting among funclets (enforced by the ParentPad linkage in the IR), and also require that unwind edges conform to certain rules with respect to the tree:
	- An unwind edge may exit 0 or more ancestor pads
	- An unwind edge must enter exactly one EH pad, which must be distinct from any exited pads
	- A cleanupret's edge must exit its cleanuppad
	Describe these rules in the LangRef, and enforce them in the verifier. Reviewers: rnk, majnemer, andrew.w.kaylor Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15961 llvm-svn: 257272
*	Revert "[BranchFolding] Set correct mem refs"	Michael Zolotukhin	2016-01-09	1	-53/+0
	This reverts commit 1ff11017d2669b933b29fcbb6451cfcda34ad693. llvm-svn: 257270
*	[X86][AVX] Match broadcast loads through a bitcast	Simon Pilgrim	2016-01-09	2	-19/+5
	AVX1 v8i32/v4i64 shuffles are bitcast to v8f32/v4f64; this patch peeks through any bitcast to check for a load node to allow broadcasts to occur. This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars. llvm-svn: 257266
*	[X86][AVX] Add support for i64 broadcast loads on 32-bit targets	Simon Pilgrim	2016-01-09	2	-353/+853
	Added 32-bit AVX1/AVX2 broadcast tests. llvm-svn: 257264
*	[BranchFolding] Set correct mem refs	Junmo Park	2016-01-09	1	-0/+53
	Merge MBBICommon and MBBI's MMOs. Differential Revision: http://reviews.llvm.org/D15990 llvm-svn: 257253
*	[RS4GC] Update and simplify handling of Constants in findBaseDefiningValueOfVector().	Manuel Jacob	2016-01-09	1	-1/+10
	Summary: This is analogous to r256079, which removed an overly strong assertion, and r256812, which simplified the code by replacing three conditionals by one. Reviewers: reames Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D16019 llvm-svn: 257250
*	[rs4gc] Optionally directly relocate vector of pointers	Philip Reames	2016-01-09	3	-2/+114
	This patch teaches rewrite-statepoints-for-gc to relocate vector-of-pointers directly rather than trying to split them. This builds on the recent lowering/IR changes to allow vector typed gc.relocates. The motivation for this is that we recently found a bug in the vector splitting code where depending on visit order, a vector might not be relocated at some safepoint. Specifically, the bug is that the splitting code wasn't updating the side tables (live vector) of other safepoints. As a result, a vector which was live at two safepoints might not be updated at one of them. However, if you happened to visit safepoints in post order over the dominator tree, everything worked correctly. Weirdly, it turns out that post order is actually an incredibly common order to visit instructions in practice. Frustratingly, I have not managed to write a test case which actually hits this. I can only reproduce it in large IR files produced by actual applications. Rather than continue to make this code more complicated, we can remove all of the complexity by just representing the relocation of the entire vector natively in the IR. At the moment, the new functionality is hidden behind a flag. To use this code, you need to pass "-rs4gc-split-vector-values=0". Once I have a chance to stress test with this option and get feedback from other users, my plan is to flip the default and remove the original splitting code. I would just remove it now, but given the rareness of the bug, I figured it was better to leave it in place until the new approach has been stress tested. Differential Revision: http://reviews.llvm.org/D15982 llvm-svn: 257244
*	[llvm-symbolizer] -print-source-context-lines option to print source code around the line.	Mike Aizatsky	2016-01-09	2	-0/+24
	Differential Revision: http://reviews.llvm.org/D15909 llvm-svn: 257236
*	[DAGCombiner] don't dereference an operand that doesn't exist (PR26070)	Sanjay Patel	2016-01-08	1	-0/+16
	The bug was introduced with changes for x86-64 fp128: http://reviews.llvm.org/rL254653 I don't know why an x86 change is here, so I'll follow up in: http://reviews.llvm.org/D15134 Should fix: https://llvm.org/bugs/show_bug.cgi?id=26070 llvm-svn: 257200
*	[JumpThreading] Split select that has constant conditions coming from the PHI node	Haicheng Wu	2016-01-08	1	-0/+37
	Look for PHI/Select in the same BB of the form
	bb:
	  %p = phi [false, %bb1], [true, %bb2], [false, %bb3], [true, %bb4], ...
	  %s = select p, trueval, falseval
	And expand the select into a branch structure. This later enables jump-threading over bb in this pass. Using an approach similar to SimplifyCFG::FoldCondBranchOnPHI(), unfold the select if the associated PHI has at least one constant. If the unfolded select is not jump-threaded, it will be folded again in the later optimizations. llvm-svn: 257198
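As a rough source-level analogy for the transformation described in the commit message above (not the pass itself), the two functions below compute the same result. The first models a block that merges four predecessors through a PHI of constants and then selects; the second models the control flow after the select is unfolded and each predecessor is threaded straight to its outcome. The function names and the integer `Pred` encoding of the incoming edge are assumptions made only for this illustration.

```cpp
// Before: the condition is a "PHI" of constants keyed by the predecessor,
// followed by a select on that condition.
int beforeThreading(int Pred, int TrueVal, int FalseVal) {
  // phi [false, %bb1], [true, %bb2], [false, %bb3], [true, %bb4]
  bool P = (Pred == 2 || Pred == 4);
  // select p, trueval, falseval
  return P ? TrueVal : FalseVal;
}

// After: the select has become a branch, and because each predecessor
// supplies a known constant, every edge can jump directly to its result.
int afterThreading(int Pred, int TrueVal, int FalseVal) {
  switch (Pred) {
  case 2:
  case 4:
    return TrueVal;
  default:
    return FalseVal;
  }
}
```

In the second form the condition is never materialized, which is the benefit jump threading is after.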
*	LoopInfo: Simplify ownership of Loop objects	Justin Bogner	2016-01-08	1	-1/+1
	It's strange that LoopInfo mostly owns the Loop objects, but that it defers deleting them to the loop pass manager. Instead, change the oddly named "updateUnloop" to "markAsRemoved" and have it queue the Loop object for deletion. We can't delete the Loop immediately when we remove it, since we need its pointer identity still, so we'll mark the object as "invalid" so that clients can see what's going on. llvm-svn: 257191
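The ownership scheme described in the commit message above can be sketched in a few lines. This is a minimal illustration under assumptions, not LoopInfo's actual code: `LoopRegistry`, `create`, `releaseRemoved`, `liveCount`, and `pendingCount` are hypothetical names, and only `markAsRemoved` and the "invalid" flag come from the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

struct Loop {
  bool Invalid = false; // set once the loop has been removed
};

// Hypothetical owner that keeps removed loops alive until explicitly released.
class LoopRegistry {
  std::vector<std::unique_ptr<Loop>> Live;
  std::vector<std::unique_ptr<Loop>> Removed;

public:
  Loop *create() {
    Live.push_back(std::unique_ptr<Loop>(new Loop()));
    return Live.back().get();
  }

  // Flag the loop as invalid and queue it for deletion. The object stays
  // allocated, so pointers held by clients keep their identity.
  void markAsRemoved(Loop *L) {
    auto It = std::find_if(
        Live.begin(), Live.end(),
        [L](const std::unique_ptr<Loop> &P) { return P.get() == L; });
    if (It == Live.end())
      return;
    (*It)->Invalid = true;
    Removed.push_back(std::move(*It));
    Live.erase(It);
  }

  // Destroy everything that was queued for deletion.
  void releaseRemoved() { Removed.clear(); }

  std::size_t liveCount() const { return Live.size(); }
  std::size_t pendingCount() const { return Removed.size(); }
};
```

A client holding a `Loop *` can check `Invalid` after `markAsRemoved` and before `releaseRemoved`; after release the pointer must not be used.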
*	RBIT Instruction only available for ARMv6t2 and above.	Weiming Zhao	2016-01-08	1	-0/+34
	Summary: r255334 matches bit-reverse pattern in InstCombine and generates calls to Intrinsic::bitreverse. RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5. Patch by Z. Zheng <zhaoshiz@codeaurora.org> Reviewers: apazos, jmolloy, weimingz Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D15932 llvm-svn: 257188
*	Do not ASSERTZEXT for i16 result of bitcast from f16 operand	Pirama Arumuga Nainar	2016-01-08	2	-3/+32
	Summary: During legalization of i16, do not ASSERTZEXT the result of FP_TO_FP16. Directly return an FP_TO_FP16 node with return type as the promote-to-type of i16. This patch also removes extraneous length check. This legalization should be valid even if integer and float types are of different lengths. This patch breaks a hard-float test for fp16 args. The test is changed to allow a vmov to zero-out the top bits, and also ensure that the return value is in an FP register. Reviewers: ab, jmolloy Subscribers: srhines, llvm-commits Differential Revision: http://reviews.llvm.org/D15438 llvm-svn: 257184
*	[WinEH] CatchHandler which don't have catch objects in StackColoring	David Majnemer	2016-01-08	1	-0/+38
	StackColoring rewrites the frame indices of operations involving allocas if it can find that the lifetimes of two objects do not overlap. MSVC EH needs to be kept aware of this in the event that a catch object has moved around. However, we represent the non-existence of a catch object with a sentinel frame index (INT_MAX). This sentinel also happens to be the EmptyKey of the SlotRemap DenseMap. Testing for whether or not we need to translate the frame index fails in this case because we call the count method on the DenseMap with the EmptyKey, leading to assertions. Instead, check if it is our sentinel value before trying to look into the DenseMap. This fixes PR26073. llvm-svn: 257182
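The fix described in the commit message above comes down to the order of two checks: test for the sentinel first, and only then consult the remapping table. A minimal sketch follows, with `std::unordered_map` standing in for `llvm::DenseMap` (an assumption made to keep the example self-contained; unlike DenseMap it has no reserved keys, so the sketch only shows the ordering, not the assertion). `NoCatchObject` and `remapCatchObjectIndex` are hypothetical names.

```cpp
#include <climits>
#include <unordered_map>

// Hypothetical name for the "no catch object" sentinel frame index.
constexpr int NoCatchObject = INT_MAX;

// Returns the frame index a catch object should use after stack slots were
// merged. SlotRemap maps old frame indices to the slots they were merged into.
int remapCatchObjectIndex(int FrameIndex,
                          const std::unordered_map<int, int> &SlotRemap) {
  // Check the sentinel before touching the map at all.
  if (FrameIndex == NoCatchObject)
    return FrameIndex;

  auto It = SlotRemap.find(FrameIndex);
  return It == SlotRemap.end() ? FrameIndex : It->second;
}
```

With a real DenseMap keyed on `int`, the early return is what keeps the reserved empty key from ever reaching a lookup.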
*	AMDGPU/SI: Emit global variable sizes when targeting HSA	Tom Stellard	2016-01-08	1	-0/+16
	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15952 llvm-svn: 257173
*	AMDGPU: Emit function sizes	Tom Stellard	2016-01-08	1	-0/+4
	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15951 llvm-svn: 257172
*	[ThinLTO] Delay metadata materialization in function importer	Teresa Johnson	2016-01-08	1	-2/+12
	The function importer was still materializing metadata when modules were loaded for function importing. We only want to materialize it when we are going to invoke the metadata linking postpass. Materializing it before function importing is not only unnecessary, but also causes metadata referenced by imported functions to be mapped in early, and then not connected to the rest of the module level metadata when it is ultimately linked in. Augmented the test case to specifically check for the metadata being properly connected, which it wasn't before this fix. llvm-svn: 257171
*	Re-commit r257064, this time with a fixed assert	Silviu Baranga	2016-01-08	1	-0/+100
	In setInsertionPoint if the value is not a PHI, Instruction or Argument it should be a Constant, not a ConstantExpr. Original commit message: [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs Summary: When comparing two GEP instructions which have the same base pointer and one of them has a constant index, it is possible to only compare indices, transforming it to a compare with a constant. This removes one use for the GEP instruction with the constant index, can reduce register pressure and can sometimes lead to removing the comparison entirely. InstCombine was already doing this when comparing two GEPs if the base pointers were the same. However, in the case where we have complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to or from integers, etc) the value of the original base pointer will be hidden to the optimizer and this transformation will be disabled. This change detects when the two sides of the comparison can be expressed as GEPs with the same base pointer, even if they don't appear as such in the IR. The transformation will convert all the pointer arithmetic to arithmetic done on indices and all the relevant uses of GEPs to GEPs with a common base pointer. The GEP comparison will be converted to a comparison done on indices. Reviewers: majnemer, jmolloy Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits Differential Revision: http://reviews.llvm.org/D15146 llvm-svn: 257164
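The core idea in the commit message above, comparing indices instead of addresses when both sides share a base pointer, can be shown at source level. This is only an analogy in C++ rather than LLVM IR; the function names are hypothetical, and the constants 2 and 5 in the PHI-like case are arbitrary choices that assume the array behind `Base` has at least five elements.

```cpp
#include <cstddef>

// Direct form: materialize both addresses, then compare them.
bool compareAddresses(const int *Base, std::size_t I, std::size_t C) {
  return (Base + I) == (Base + C);
}

// Rewritten form: the base is common, so only the indices matter.
bool compareIndices(std::size_t I, std::size_t C) { return I == C; }

// The pointer arrives through a PHI-like choice between two offsets of the
// same base, which hides the common base from a purely local comparison.
bool comparePhiOfAddresses(const int *Base, bool Cond, std::size_t C) {
  const int *P = Cond ? Base + 2 : Base + 5;
  return P == Base + C;
}

// Same comparison once the choice is expressed on indices instead.
bool comparePhiOfIndices(bool Cond, std::size_t C) {
  std::size_t Index = Cond ? 2 : 5;
  return Index == C;
}
```

In each pair the second function never forms an address, which is the source of the reduced register pressure the commit message mentions.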
*	[attrs] Split the late-revisit pattern for deducing norecurse in a top-down manner into a true top-down or RPO pass over the call graph.	Chandler Carruth	2016-01-08	1	-1/+1
	There are specific patterns of function attributes, notably the norecurse attribute, which are most effectively propagated top-down because all they use is caller information. The walk in RPO over the call graph SCCs takes the form of a module pass run immediately after the CGSCC pass manager's postorder walk of the SCCs, trying again to deduce norecurse for each singular SCC in the call graph. This removes a very legacy pass manager specific trick of using a lazy revisit list traversed during finalization of the CGSCC pass. There is no analogous finalization step in the new pass manager, and a lazy revisit list is just trying to produce an RPO iteration of the call graph. We can do that more directly if more expensively. It seems unlikely that this will be the expensive part of any compilation though as we never examine the function bodies here. Even in an LTO run over a very large module, this should be a reasonably fast set of operations over a reasonably small working set -- the function call graph itself. In the future, if this really is a compile time performance issue, we can look at building support for both post order and RPO traversals directly into a pass manager that builds and maintains the PO list of SCCs. Differential Revision: http://reviews.llvm.org/D15785 llvm-svn: 257163
*	[WinEH] Update WinEHFuncInfo if StackColoring merges allocas	David Majnemer	2016-01-08	1	-0/+53
	Windows EH keeps track of which frame index corresponds to a catchpad in order to inform the runtime where the catch parameter should be initialized. LLVM's optimizations are able to prove that the memory used by the catch parameter can be reused with another memory optimization, changing its frame index. We need to keep WinEHFuncInfo up to date with respect to this or we will miscompile/assert. This fixes PR26069. llvm-svn: 257158
*	[X86] Don't print the aliased version of CVTSD2SI64rm. This appears to be a mistake I made years ago.	Craig Topper	2016-01-08	1	-1/+1
	llvm-svn: 257149
*	[PGO] Ensure vp data in indexed profile always sorted	Xinliang David Li	2016-01-08	1	-2/+8
	Done in InstrProfWriter to eliminate the need for client code to do the sorting. The operation is done once and reused many times so it is more efficient. Update unit test to remove sorting. Also update expected output of affected tests. llvm-svn: 257145
*	Add call sequence start and end for __tls_get_addr	Kyle Butt	2016-01-08	2	-0/+86
	This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839. For a PIC TLS variable access in a function, prologue (mflr followed by std and stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but no one saves/restores it. Also added a test for saving/restoring clobbered registers when calling __tls_get_addr. Patch by Tim Shen llvm-svn: 257137
*	[Vectorization] Actually return from error case in isStridedPtr	Kyle Butt	2016-01-08	1	-0/+29
	The early return seems to be missed. This causes a radical and wrong loop optimization on powerpc. It isn't reproducible on x86_64, because "UseInterleaved" is false. Patch by Tim Shen. llvm-svn: 257134
*	[InstCombine] insert a new shuffle in a safe place (PR25999)	Sanjay Patel	2016-01-08	1	-0/+50
	Limit this transform to a basic block and guard against PHIs. Hopefully, this fixes the remaining failures in PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999 llvm-svn: 257133
*	Add some testing for thumb1 and thumb2 inline asm immediate constraints and fix a couple of bugs on inspection.	Eric Christopher	2016-01-08	2	-0/+51
	Also fixes PR26061. llvm-svn: 257122
*	[llvm-symbolizer] Print out non-address lines verbatim.	Mike Aizatsky	2016-01-07	2	-0/+6
	Differential Revision: http://reviews.llvm.org/D15876 llvm-svn: 257115
*	Instructions to be redone only if from the same BB	Aditya Nandakumar	2016-01-07	1	-0/+20
	While adding instructions (possible roots) to be redone, make sure they are from the same basic block. llvm-svn: 257112
*	WebAssembly: use .skip instead of .zero directive	JF Bastien	2016-01-07	1	-5/+5
	.zero is confusing when used with two arguments. Documentation: This directive emits SIZE 0-valued bytes. SIZE must be an absolute expression. This directive is actually an alias for the '.skip' directive so it can take an optional second argument of the value to store in the bytes instead of zero. Using '.zero' in this way would be confusing however. Ref: https://sourceware.org/bugzilla/show_bug.cgi?id=18353 Hexagon and Sparc do the same, and it's all the same to WebAssembly so let's pick the less confusing of the two. llvm-svn: 257111
*	Temporarily revert r257105 "[Verifier] Check that debug values have proper size"	Keno Fischer	2016-01-07	25	-182/+165
	Looks like there's a case where clang generates debug info that triggers the new verifier check. Reverting while investigating. llvm-svn: 257107
*	[Verifier] Check that debug values have proper size	Keno Fischer	2016-01-07	25	-165/+182
	Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized) one is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note:
	- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077
	- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic
	- I had to manually declare size/align for reference type. See also the discussion for D14275/r253186.
	- fpstack-debuginstr-kill.ll required changing `double` to `long double`
	- most others were just a question of adding OP_deref
	Reviewers: aprantl Differential Revision: http://reviews.llvm.org/D14276 llvm-svn: 257105
*	Turn off lldb debug tuning by default for FreeBSD	Dimitry Andric	2016-01-07	2	-2/+2
	Summary: In rL242338, debugger tuning was introduced, and the tuning for FreeBSD was set to lldb by default. However, for the foreseeable future we still need to default to gdb tuning, since lldb is not ready for all of FreeBSD's architectures, and some system tools (like objcopy, etc) have not yet been adapted to cope with the lldb tuned format, which has .apple sections. Therefore, let FreeBSD use gdb by default for now. Reviewers: emaste, probinson Subscribers: llvm-commits, emaste Differential Revision: http://reviews.llvm.org/D15966 llvm-svn: 257103
*	[SCCP] Don't violate the lattice invariants	David Majnemer	2016-01-07	1	-0/+26
	We marked values which are 'undef' as constant instead of undefined which violates SCCP's invariants. If we can figure out that a computation results in 'undef', leave it in the undefined state. This fixes PR16052. llvm-svn: 257102
*	Add test for r256912	David Majnemer	2016-01-07	1	-0/+39
	I forgot to add this with the rest of r256912. llvm-svn: 257088
*	[SCCP] Can't go from overdefined to constant	David Majnemer	2016-01-07	1	-0/+31
	The fix for PR23999 made us mark loads of null as producing the constant undef which upsets the lattice. Instead, keep the load as "undefined". This fixes PR26044. llvm-svn: 257087
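The invariant behind this commit and the related SCCP fix above is that a lattice value may only move upward: undefined, then constant, then overdefined. The sketch below is a simplified three-state model written for illustration, not SCCP's actual lattice class; `LatticeValue` and its members are hypothetical names, and treating a conflicting constant as overdefined is a simplifying assumption.

```cpp
// Ordered so that a larger enumerator is "higher" in the lattice.
enum class LatticeState { Undefined = 0, Constant = 1, Overdefined = 2 };

struct LatticeValue {
  LatticeState State = LatticeState::Undefined;
  int Value = 0; // meaningful only while State == Constant

  // Try to record a constant. Returns true if the state changed.
  bool markConstant(int V) {
    if (State == LatticeState::Overdefined)
      return false; // never move back down from overdefined
    if (State == LatticeState::Constant) {
      if (Value == V)
        return false;
      State = LatticeState::Overdefined; // two different constants conflict
      return true;
    }
    State = LatticeState::Constant;
    Value = V;
    return true;
  }

  // Give up on the value. Returns true if the state changed.
  bool markOverdefined() {
    if (State == LatticeState::Overdefined)
      return false;
    State = LatticeState::Overdefined;
    return true;
  }
};
```

Because every transition either keeps the state or raises it, a worklist solver over such values is guaranteed to reach a fixed point; a step from overdefined back to constant would break that guarantee.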
*	[WebAssembly] Support combining GEP and FrameIndex offsets in memory operand offset field	Derek Schuff	2016-01-07	1	-0/+21
	Previously we only supported putting the FI into memory operand offset fields if there was nothing there already. Now combine them. Differential Revision: http://reviews.llvm.org/D15941 llvm-svn: 257084
*	[WebAssembly] Use the default private label prefixes.	Dan Gohman	2016-01-07	4	-384/+384
	The MC assembler doesn't like using the empty string as a private label prefix because then it treats all labels as private. This commit reverts back to the default prefix, which is .L, which is common in ELF targets and consistent with the LLVM name mangler. llvm-svn: 257083
*	AMDGPU/SI: Fold operands with sub-registers	Nicolai Haehnle	2016-01-07	3	-12/+9
	Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor:
	7063 shaders in 3531 tests
	Totals:
	SGPRS: 351872 -> 352560 (0.20 %)
	VGPRS: 199984 -> 200732 (0.37 %)
	Code Size: 9876968 -> 9881112 (0.04 %) bytes
	LDS: 91 -> 91 (0.00 %) blocks
	Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
	Wait states: 295164 -> 295337 (0.06 %)
	Totals from affected shaders:
	SGPRS: 65784 -> 66472 (1.05 %)
	VGPRS: 38064 -> 38812 (1.97 %)
	Code Size: 1993828 -> 1997972 (0.21 %) bytes
	LDS: 42 -> 42 (0.00 %) blocks
	Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
	Wait states: 54026 -> 54199 (0.32 %)
	Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074
*	AMDGPU/SI: xnack_mask is always reserved on VI	Nicolai Haehnle	2016-01-07	1	-5/+3
	Summary: Somehow, I first interpreted the docs as saying space for xnack_mask is only reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and went back to actually test what is happening, and it turns out that xnack_mask is always reserved at least on Tonga and Carrizo, in the sense that flat_scr is always fixed below the SGPRs that are used to implement xnack_mask, whether or not they are actually used. I confirmed this by writing a shader using inline assembly to tease out the aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so xnack_mask is s[76:77] and vcc is s[78:79]). This patch changes both the calculation of the total number of SGPRs and the various register reservations to account for this. It ought to be possible to use the gap left by xnack_mask when the feature isn't used, but this patch doesn't try to do that. (Note that the same applies to vcc.) Note that previously, even before my earlier change in r256794, the SGPRs that alias to xnack_mask could end up being used as well when flat_scr was unused and the total number of SGPRs happened to fall on the right alignment (e.g. highest regular SGPR being used s29 and VCC used would lead to number of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there were some conflict due to such aliasing, we should have noticed that already. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15898 llvm-svn: 257073
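The register arithmetic in the commit message above can be written out as a small helper. It assumes that the pattern in the two examples given (80 SGPRs on Tonga, and the 32-SGPR case) generalizes: the top six SGPRs are, from low to high, flat_scratch, xnack_mask, and vcc, two registers each. The type and function names are hypothetical, and this is not the backend's actual computation.

```cpp
struct SgprPair {
  unsigned Lo;
  unsigned Hi;
};

struct SpecialSgprs {
  SgprPair FlatScratch;
  SgprPair XnackMask;
  SgprPair Vcc;
};

// Computes which SGPRs the special registers alias, assuming the layout
// described in the lead-in. Requires TotalSgprs >= 6.
SpecialSgprs specialSgprLayout(unsigned TotalSgprs) {
  SpecialSgprs Layout;
  Layout.Vcc = SgprPair{TotalSgprs - 2, TotalSgprs - 1};
  Layout.XnackMask = SgprPair{TotalSgprs - 4, TotalSgprs - 3};
  Layout.FlatScratch = SgprPair{TotalSgprs - 6, TotalSgprs - 5};
  return Layout;
}
```

For 80 SGPRs this gives flat_scratch at s[74:75], xnack_mask at s[76:77], and vcc at s[78:79], matching the Tonga example; for 32 SGPRs it places xnack_mask at s[28:29], matching the second example.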
*	[avx512] Fix test avx512bw-intrinsics.ll	Michael Zuckerman	2016-01-07	1	-58/+58
	Change the CHECK label to AVX512BW and fix the declare label of llvm.x86.avx512.mask.psrav32_hi llvm-svn: 257071
*	[AVX512] add PSLLW and PSLLV Intrinsic	Michael Zuckerman	2016-01-07	3	-0/+267
	Differential Revision: http://reviews.llvm.org/D15889 llvm-svn: 257070
*	Revert r257064. It caused failures in some sanitizer tests.	Silviu Baranga	2016-01-07	1	-100/+0
	llvm-svn: 257069
*	Revert r257055, it caused PR26064.	Nico Weber	2016-01-07	2	-3/+11
	llvm-svn: 257066
*	[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs	Silviu Baranga	2016-01-07	1	-0/+100
	Summary: When comparing two GEP instructions which have the same base pointer and one of them has a constant index, it is possible to only compare indices, transforming it to a compare with a constant. This removes one use for the GEP instruction with the constant index, can reduce register pressure and can sometimes lead to removing the comparison entirely. InstCombine was already doing this when comparing two GEPs if the base pointers were the same. However, in the case where we have complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to or from integers, etc) the value of the original base pointer will be hidden to the optimizer and this transformation will be disabled. This change detects when the two sides of the comparison can be expressed as GEPs with the same base pointer, even if they don't appear as such in the IR. The transformation will convert all the pointer arithmetic to arithmetic done on indices and all the relevant uses of GEPs to GEPs with a common base pointer. The GEP comparison will be converted to a comparison done on indices. Reviewers: majnemer, jmolloy Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits Differential Revision: http://reviews.llvm.org/D15146 llvm-svn: 257064
*	[AVX512] add PSRAV Intrinsic	Michael Zuckerman	2016-01-07	3	-0/+146
	Differential Revision: http://reviews.llvm.org/D15856 llvm-svn: 257063
*	Added support for macro emission in dwarf (supporting DWARF version 4).	Amjad Aboud	2016-01-07	1	-0/+67
	Differential Revision: http://reviews.llvm.org/D15495 llvm-svn: 257060