path: root/llvm/lib
* [PPC64] Add missing dependency on X2 to LDinto_toc. (Bill Schmidt, 2014-08-15, 1 file, -1/+1)

  The LDinto_toc pattern has been part of 64-bit PowerPC for a long time, and represents
  loading from a memory location into the TOC register (X2). However, this pattern doesn't
  explicitly record that it modifies that register. This patch adds the missing dependency.

  It was very surprising to me that this has never shown up as a problem in the past, and that
  we only saw this problem recently in a single scenario when building a self-hosted clang. It
  turns out that in most cases we have another dependency present that keeps the LDinto_toc
  instruction tied in place. LDinto_toc is used for TOC restore following a call site, so this
  is a typical sequence:

    BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
    LDinto_toc 24, %X1
    ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>

  Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there is a natural
  anti-dependency between the two that keeps it in place. Therefore we don't usually see a
  problem.

  However, in one particular case, one call is followed immediately by another call, and the
  second call requires a parameter that is a TOC-relative address. This is the code sequence:

    BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
    LDinto_toc 24, %X1
    ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
    ADJCALLSTACKDOWN 96, %R1<imp-def>, %R1<imp-use>
    %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
    %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

  Note that the back-to-back stack adjustments are the same size! The back end is smart enough
  to recognize this and optimize them away:

    BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
    LDinto_toc 24, %X1
    %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
    %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

  Now there is nothing to prevent the ADDIStocHA instruction from moving ahead of the
  LDinto_toc instruction, and because of the longest-path heuristic, this is what happens.

  With the accompanying patch, %X2 is represented as an implicit def:

    BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
    LDinto_toc 24, %X1, %X2<imp-def,dead>
    ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
    ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use>
    %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
    %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

  So now when the two stack adjustments are removed, ADDIStocHA is prevented from being moved
  above LDinto_toc.

  I have not yet created a test case for this, because the original failure occurs on a
  relatively large function that needs reduction. However, this is a fairly serious bug,
  despite its infrequency, and I wanted to get this patch onto the list as soon as possible so
  that it can be considered for a 3.5 backport. I'll work on whittling down a test case.

  Have we missed the boat for 3.5 at this point?

  Thanks, Bill

  llvm-svn: 215685
* [FastISel][ARM] Fall back to constant pool loads when materializing an i32 constant. (Juergen Ributzka, 2014-08-14, 1 file, -1/+2)

  FastEmit_i doesn't always succeed in materializing an i32 constant; when it fails, this would
  trigger a fall-back to SelectionDAG, which is really not necessary. This fix first falls back
  to a constant pool load to materialize the constant before giving up for good.

  This fixes <rdar://problem/18022633>.

  llvm-svn: 215682
* Copy noalias metadata from call sites to inlined instructions (Hal Finkel, 2014-08-14, 1 file, -4/+28)

  When a call site with noalias metadata is inlined, that metadata can be propagated directly
  to the inlined instructions (only those that might access memory, because it is not useful on
  the others).

  Prior to inlining, the noalias metadata could express that a call would not alias with some
  other memory access, which implies that no instruction within that called function would
  alias. By propagating the metadata to the inlined instructions, we preserve that knowledge.

  This should complete the enhancements requested in PR20500.

  llvm-svn: 215676
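  As a rough illustration (hand-written IR, not taken from the patch; the metadata numbering,
  value names, and exact load/store syntax are made up for the example), a call site carrying
  noalias metadata might be inlined like this:

    ; before inlining: the call site itself is known not to alias with scope list !3
    call void @callee(float* %p), !noalias !3

    ; after inlining: the callee's memory-accessing instructions inherit that metadata
    %v = load float, float* %p, !noalias !3
    store float %v, float* %q, !noalias !3

    !1 = !{!1}        ; alias scope domain
    !2 = !{!2, !1}    ; an alias scope in that domain
    !3 = !{!2}        ; the scope list attached above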
* Revert several FastISel commits to track down a buildbot error. (Juergen Ributzka, 2014-08-14, 3 files, -131/+43)

  This reverts:
    r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
    r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
    r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
    r215591 "[FastISel][AArch64] Make use of the zero register when possible."
    r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
    r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."

  llvm-svn: 215673
* Fix whitespace error from r215279, NFC (Duncan P. N. Exon Smith, 2014-08-14, 1 file, -1/+1)

  llvm-svn: 215667
* [AVX512] Switch FMA intrinsics to the masking version (Adam Nemet, 2014-08-14, 1 file, -24/+37)

  This does the renaming and updates the lowering logic.

  Part of <rdar://problem/17688758>

  llvm-svn: 215664
* [X86] Break out logic to map FMA Intrinsic number to Opcode (Adam Nemet, 2014-08-14, 1 file, -57/+51)

  No functional change. Will be used to lower AVX512 masking FMA intrinsics.

  llvm-svn: 215663
* [AVX512] Add enum for the static rounding types (Adam Nemet, 2014-08-14, 2 files, -1/+13)

  No functional change. This will be used by the new FMA intrinsic lowering code.

  We can probably add NO_EXC here as well, I am just not too familiar with this part of AVX512
  yet. We can add that later.

  llvm-svn: 215662
* [AVX512] Break out the logic to lower masking intrinsics (Adam Nemet, 2014-08-14, 1 file, -13/+21)

  No functional change. This will be used by the FMA intrinsic lowering as well and hopefully
  many more.

  llvm-svn: 215661
* [AVX512] Add masking variant for the FMA instructions (Adam Nemet, 2014-08-14, 2 files, -33/+73)

  This change further evolves the base class AVX512_masking in order to make it suitable for
  the masking variants of the FMA instructions.

  Besides AVX512_masking there is now a new base class that instructions including FMAs can
  use: AVX512_masking_3src. With three-source (destructive) instructions one of the sources is
  already tied to the destination. This difference from AVX512_masking is captured by this new
  class. The common bits between _masking and _masking_3src are broken out into a new super
  class called AVX512_masking_common.

  As with valign, there is some corresponding restructuring of the underlying format classes.
  The idea is the same: we essentially want to derive from two classes, one providing the
  format bits and another format-independent multiclass supplying the various masking and
  non-masking instruction variants.

  Existing fma tests in avx512-fma*.ll provide coverage here for the non-masking variants. For
  masking, the next patches in the series will add intrinsics and intrinsic tests.

  For AVX512_masking_3src to work, the (ins ...) dag has to be passed *without* the leading
  source operand that is tied to dst ($src1). This is necessary to properly construct the
  (ins ...) for the different variants. For the record, I did check that if $src1 is mistakenly
  included, you do get a fairly intuitive error message from the tablegen backend.

  Part of <rdar://problem/17688758>

  llvm-svn: 215660
* Revert "[FastISel][AArch64] Add support for more addressing modes."Juergen Ributzka2014-08-141-289/+168
| | | | | | This reverts commits r215597, because it might have broken the build bots. llvm-svn: 215659
* Add noalias metadata for general calls (not just memory intrinsics) during inlining (Hal Finkel, 2014-08-14, 1 file, -7/+18)

  When preserving noalias function parameter attributes by adding noalias metadata in the
  inliner, we should do this for general function calls (not just memory intrinsics). The logic
  is very similar to what already existed (except that we want to add this metadata even for
  functions taking no relevant parameters). This metadata can be used by ModRef queries in the
  caller after inlining.

  This addresses the first part of PR20500. Adding noalias metadata during inlining is still
  turned off by default.

  llvm-svn: 215657
* Testing commit access. (Moritz Roth, 2014-08-14, 1 file, -1/+1)

  Remove a trailing whitespace.

  llvm-svn: 215653
* [Reassociation] Add support for reassociation with unsafe algebra. (Chad Rosier, 2014-08-14, 1 file, -81/+228)

  Vector instructions are (still) not supported for either integer or floating point.
  Hopefully, that work will be landed shortly.

  llvm-svn: 215647
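  As a minimal sketch of what this enables (illustrative IR, not the committed tests), a
  floating-point chain is only reassociated when its operations carry fast-math flags marking
  them as having unsafe algebra:

    define float @fold(float %x) {
      %add1 = fadd fast float %x, 5.0
      %add2 = fadd fast float %add1, 3.0   ; with 'fast', this can reassociate to %x + 8.0
      ret float %add2
    }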
* optimize vector fneg of bitcasted integer value (Sanjay Patel, 2014-08-14, 1 file, -9/+14)

  This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way
  that we already optimize a scalar fneg. If the integer variable is a constant, we can
  precompute the result and not require any logic ops.

  This patch is very similar to a fabs patch committed at r214892.

  Differential Revision: http://reviews.llvm.org/D4852

  llvm-svn: 215646
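  The shape of the pattern, sketched at the IR level (illustrative only; names and types are
  made up, and the actual fold happens during X86 lowering): an fneg written as a subtraction
  from -0.0, applied to a bitcasted integer vector, turns into an integer xor of the sign bits:

    define <2 x double> @vneg(<2 x i64> %x) {
      %f   = bitcast <2 x i64> %x to <2 x double>
      %neg = fsub <2 x double> <double -0.0, double -0.0>, %f
      ret <2 x double> %neg
    }

    ; conceptually becomes:
    ;   %s   = xor <2 x i64> %x, <i64 -9223372036854775808, i64 -9223372036854775808>
    ;   %neg = bitcast <2 x i64> %s to <2 x double>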
* Delete support for AuroraUX. (Rafael Espindola, 2014-08-14, 1 file, -2/+0)

  auroraux.org is not resolving.

  I will add this to the release notes as soon as I figure out where to put the 3.6 release
  notes :-)

  llvm-svn: 215645
* Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)). NFC. (Aaron Ballman, 2014-08-14, 1 file, -1/+1)

  llvm-svn: 215642
* [x86] Begin stubbing out the AVX support in the new vector shuffle lowering scheme. (Chandler Carruth, 2014-08-14, 1 file, -0/+88)

  Currently, this just directly bails to the fallback path of splitting the 256-bit vector into
  two 128-bit vectors, operating there, and then joining the results back together. While the
  results are far from perfect, they are *shockingly* good for what we're doing here. I'll be
  layering the rest of the functionality on top of this piece by piece and updating tests as I
  go.

  Note that 256-bit vectors in this mode are still somewhat WIP. While I think the code paths
  that I'm adding here are clean and good-to-go, there are still a lot of 128-bit assumptions
  that I'll need to stomp out as I march through the functional spread here.

  llvm-svn: 215637
* [mips][microMIPS] MicroMIPS Compact Branch Instructions BEQZC and BNEZC (Zoran Jovanovic, 2014-08-14, 2 files, -0/+28)

  Differential Revision: http://reviews.llvm.org/D3545

  llvm-svn: 215636
* [mips] Add assembler support for the "la $reg,symbol" pseudo-instruction. (Toma Tabacu, 2014-08-14, 1 file, -6/+91)

  Summary:
  This pseudo-instruction allows the programmer to load an address from a symbolic expression
  into a register.

  Patch by David Chisnall.
  His work was sponsored by: DARPA, AFRL

  I've made some minor changes to the original, such as improving the formatting and adding
  some comments, and I've also added a test case.

  Reviewers: dsanders

  Reviewed By: dsanders

  Differential Revision: http://reviews.llvm.org/D4808

  llvm-svn: 215630
* [mips] Rename [gs]etCanHaveModuleDir to more natural names (Daniel Sanders, 2014-08-14, 4 files, -59/+42)

  Summary:
  getCanHaveModuleDir() is renamed to isModuleDirectiveAllowed(), and setCanHaveModuleDir() is
  renamed to forbidModuleDirective() since it is only ever given a false argument.

  Reviewers: vmedic

  Reviewed By: vmedic

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D4885

  llvm-svn: 215628
* [SDAG] Fix a bug in the DAG combiner where we would fail to return the input node after manually adding it to the worklist and using CombineTo. (Chandler Carruth, 2014-08-14, 1 file, -5/+1)

  Once we use CombineTo the input node may have been deleted. Despite this being *completely
  confusing* and somewhat broken, the only way to "correctly" return from a DAG combine after
  potentially deleting the input node is to return *that exact node*....

  But really, this code should just never have used CombineTo. It won't do what it wants
  (returning the node as mentioned above just causes the combine to infloop). The correct way
  to combine away a casted load to a load of the correct type is to RAUW the chain directly and
  then return the loaded value to replace the actual value node.

  I managed to find this with the vector shuffle fuzzer even though it clearly has nothing at
  all to do with vector shuffles and rather those happen to trigger a load of a constant pool
  that hits this combine *just right*. I've included the test as it is small and a nice stress
  test that the infrastructure isn't asserting.

  llvm-svn: 215622
* InstCombine: ((A | ~B) ^ (~A | B)) to A ^ B (David Majnemer, 2014-08-14, 1 file, -0/+10)

  Proof using CVC3 follows:
    $ cat t.cvc
    A, B : BITVECTOR(32);
    QUERY BVXOR((A | ~B),(~A |B)) = BVXOR(A,B);
    $ cvc3 t.cvc
    Valid.

  Patch by Mayur Pandey!

  Differential Revision: http://reviews.llvm.org/D4883

  llvm-svn: 215621
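  In IR terms, the new fold handles roughly this pattern (a hand-written sketch, not the
  committed test case):

    define i32 @fold(i32 %a, i32 %b) {
      %nota = xor i32 %a, -1
      %notb = xor i32 %b, -1
      %lhs  = or i32 %a, %notb
      %rhs  = or i32 %nota, %b
      %r    = xor i32 %lhs, %rhs     ; instcombine now folds this to: xor i32 %a, %b
      ret i32 %r
    }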
* AArch64: Silence warning in AArch64FastISel (David Majnemer, 2014-08-14, 1 file, -1/+1)

  GCC was emitting a signed vs unsigned comparison warning.

  llvm-svn: 215620
* Added InstCombine Transform for ((B | C) & A) | B -> B | (A & C) (David Majnemer, 2014-08-14, 1 file, -0/+4)

  Transform ((B | C) & A) | B --> B | (A & C)

  Z3 Link: http://rise4fun.com/Z3/hP6p

  Patch by Sonam Kumari!

  Differential Revision: http://reviews.llvm.org/D4865

  llvm-svn: 215619
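  A rough IR sketch of the same transform (illustrative, not the committed test case):

    define i32 @fold(i32 %a, i32 %b, i32 %c) {
      %bc  = or i32 %b, %c
      %and = and i32 %bc, %a
      %r   = or i32 %and, %b       ; folds to: %ac = and i32 %a, %c  /  %r = or i32 %b, %ac
      ret i32 %r
    }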
* MC: AsmLexer: handle multi-character CommentStrings correctly (Saleem Abdulrasool, 2014-08-14, 1 file, -5/+13)

  As X86MCAsmInfoDarwin uses '##' as CommentString although a single '#' starts a comment, a
  workaround for this special case is added.

  Fixes divisions in constant expressions for the AArch64 assembler and other targets which use
  '//' as CommentString.

  Patch by Janne Grunau!

  llvm-svn: 215615
* [MCJIT] Support DisableSymbolSearching and InstallLazyFunctionCreator in MCJIT. (Lang Hames, 2014-08-14, 1 file, -5/+13)

  Patch by Anthony Pesch. Thanks Anthony!

  llvm-svn: 215613
* [SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. (Chandler Carruth, 2014-08-14, 1 file, -0/+2)

  In a truly remarkable stroke of bad luck, this would (in the test case attached) end up
  getting some other node combined into it without ever getting re-processed. By adding it back
  on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that
  if it *stops* being dead for any reason it makes it back through the legalizer. Without this,
  the test case will end up failing during instruction selection due to an and node with a type
  we don't have an instruction pattern for.

  It took many million runs of the shuffle fuzz tester to find this.

  llvm-svn: 215611
* [X86] Fix the value of the low mask for the lowering of MUL_LOHI for v4i32. (Quentin Colombet, 2014-08-13, 1 file, -1/+1)

  Found by code inspection.

  llvm-svn: 215604
* [AArch64, fast-isel] Fall back to SelectionDAG to select tail calls. (Akira Hatanaka, 2014-08-13, 1 file, -0/+5)

  Certain functions such as objc_autoreleaseReturnValue have to be called as tail-calls even at
  -O0. Since normal fast-isel doesn't emit calls as tail calls, we have to fall back to
  SelectionDAG to select calls that are marked as tail.

  <rdar://problem/17991614>

  llvm-svn: 215600
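  For reference, the kind of call involved looks roughly like this (illustrative IR; the
  function and value names are made up): the objc_autoreleaseReturnValue marker optimization
  only works if the call is actually emitted as a tail call.

    declare i8* @objc_autoreleaseReturnValue(i8*)

    define i8* @wrapper(i8* %obj) {
      %ret = tail call i8* @objc_autoreleaseReturnValue(i8* %obj)
      ret i8* %ret
    }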
* [FastISel][AArch64] Add support for more addressing modes. (Juergen Ributzka, 2014-08-13, 1 file, -168/+289)

  FastISel didn't take much advantage of the different addressing modes available to it on
  AArch64. This commit allows the ComputeAddress method to recognize more addressing modes that
  allow shifts and sign-/zero-extensions to be folded into the memory operation itself.

  For example:
    lsl x1, x1, #3      --> ldr x0, [x0, x1, lsl #3]
    ldr x0, [x0, x1]

    sxtw x1, w1
    lsl x1, x1, #3      --> ldr x0, [x0, x1, sxtw #3]
    ldr x0, [x0, x1]

  llvm-svn: 215597
* [FastISel][X86] Add large code model support for materializing floating-point constants. (Juergen Ributzka, 2014-08-13, 1 file, -1/+17)

  In the large code model for X86, floating-point constants are placed in the constant pool and
  materialized by loading from it. Since the constant pool could be far away, a PC-relative
  load might not work. Therefore we first materialize the address of the constant pool with a
  movabsq and then load the floating-point value from there.

  Fixes <rdar://problem/17674628>.

  llvm-svn: 215595
* [FastISel][X86] Use XOR to materialize the "0" value. (Juergen Ributzka, 2014-08-13, 1 file, -0/+23)

  llvm-svn: 215594
* [FastISel][X86] Emit more efficient instructions for integer constant materialization. (Juergen Ributzka, 2014-08-13, 1 file, -1/+28)

  This mostly affects the i64 value type, which always resulted in a 15-byte movabsq
  instruction to materialize any constant. The custom code checks the value of the immediate
  and tries to use a different and smaller mov instruction when possible.

  This fixes <rdar://problem/17420988>.

  llvm-svn: 215593
* [FastISel][AArch64] Make use of the zero register when possible. (Juergen Ributzka, 2014-08-13, 1 file, -1/+13)

  This change now materializes the value "0" from the zero register. The zero register can be
  folded by several instructions, so no materialization is needed at all.

  Fixes <rdar://problem/17924413>.

  llvm-svn: 215591
* [FastISel] Let the target decide first if it wants to materialize a constant. (Juergen Ributzka, 2014-08-13, 1 file, -15/+21)

  This changes the order in which FastISel tries to materialize a constant. Originally it would
  try to use a simple target-independent approach, which can lead to the generation of
  inefficient code.

  On X86 this would result in the use of movabsq to materialize any 64-bit integer constant -
  even for simple and small values such as 0 and 1. Some very funny floating-point
  materialization could be observed too.

  On AArch64 it would materialize the constant 0 in a register even though the architecture has
  an actual "zero" register.

  On ARM it would generate unnecessary mov instructions or not use mvn.

  This change simply changes the order and always asks the target first if it wants to
  materialize the constant. This doesn't fix all the issues mentioned above, but it enables the
  targets to implement such optimizations.

  Related to <rdar://problem/17420988>.

  llvm-svn: 215588
* [MachineCombiner] Removal of dangling DBG_VALUES after combining [20598] (Gerolf Hoflehner, 2014-08-13, 2 files, -3/+2)

  This is a cleaner solution to the problem described in r215431. When instructions are
  combined, a dangling DBG_VALUE is removed. This resolves bug 20598.

  llvm-svn: 215587
* [FastISel][X86] Refactor constant materialization. NFCI. (Juergen Ributzka, 2014-08-13, 1 file, -54/+67)

  Split the constant materialization code into three separate helper functions for Integer-,
  Floating-Point-, and GlobalValue-Constants.

  llvm-svn: 215586
* [FastISel][ARM] Use MOVT/MOVW if the subtarget requests it. (Juergen Ributzka, 2014-08-13, 1 file, -0/+3)

  This change is also in preparation for a future change to make sure that the constant
  materialization uses MOVT/MOVW when available and not a load from the constant pool.

  llvm-svn: 215584
* [FastISel][ARM] Fix a bug in the integer materialization code. (Juergen Ributzka, 2014-08-13, 1 file, -1/+3)

  getRegClassFor returns the incorrect register class when in Thumb2 mode. This fix simply
  selects the register class manually, as in the code just a few lines above.

  There is no test case for this code, because the code is currently unreachable. This will be
  changed in a future commit and existing test cases will exercise this code.

  llvm-svn: 215583
* [FastISel][AArch64] Cleanup constant materialization code. NFCI. (Juergen Ributzka, 2014-08-13, 1 file, -26/+30)

  Clean up and prepare the constant materialization code for future commits.

  llvm-svn: 215582
* [Cleanup] Utility function to erase instruction and mark DBG_Values (Gerolf Hoflehner, 2014-08-13, 2 files, -12/+24)

  New function to erase a machine instruction and mark DBG_VALUEs for removal. A DBG_VALUE is
  marked for removal when it references an operand defined in the instruction.

  Use the new function to clean up code in the dead machine instruction removal pass.

  llvm-svn: 215580
* [MachineDominatorTree] Provide a method to inform a MachineDominatorTree that a critical edge has been split. (Quentin Colombet, 2014-08-13, 2 files, -25/+4)

  The MachineDominatorTree will then lazily update the underlying dominance properties when
  they are required.

  ** Context **

  This is a follow-up of r215410.
  Each time a critical edge is split this invalidates the dominator tree information. Thus,
  subsequent queries of that interface will be slow until the underlying information is
  actually recomputed (costly).

  ** Problem **

  Prior to this patch, splitting a critical edge needed to query the dominator tree to update
  the dominator information. Therefore, splitting a bunch of critical edges will likely produce
  poor performance as each query to the dominator tree will use the slow query path. This
  happens a lot in passes like MachineSink and PHIElimination.

  ** Proposed Solution **

  Splitting a critical edge is a local modification of the CFG. Moreover, as soon as a critical
  edge is split, it is not critical anymore and thus cannot be a candidate for critical edge
  splitting anymore. In other words, the predecessor and successor of a basic block inserted on
  a critical edge cannot be inserted by critical edge splitting. Using these observations, we
  can pile up the splitting of critical edges and apply them at once before updating the DT
  information.

  The core of this patch moves the update of the MachineDominatorTree information from
  MachineBasicBlock::SplitCriticalEdge to a lazy MachineDominatorTree.

  ** Performance **

  Thanks to this patch, the motivating example compiles in 4- minutes instead of 6+ minutes.
  No test case added, as the motivating example has nothing special but being huge!

  The binaries are strictly identical for all the llvm test-suite + SPECs with and without this
  patch for both Os and O3. Regarding compile time, I observed only noise, although on average
  I saw a small improvement.

  <rdar://problem/17894619>

  llvm-svn: 215576
* utils: Fix segfault in flattencfg (Jan Vesely, 2014-08-13, 1 file, -4/+5)

  v2: continue iterating through the rest of the bb
      use for loop
  v3: initialize FlattenCFG pass in ScalarOps
      add test
  v4: split off initializing flattencfg to a separate patch
      add comment

  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

  llvm-svn: 215574
* Initialize FlattenCFG pass (Jan Vesely, 2014-08-13, 1 file, -0/+1)

  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

  llvm-svn: 215573
* R600: Correctly set the src value offset for scalarized kernel args (Matt Arsenault, 2014-08-13, 1 file, -11/+29)

  This for some reason fixes v1i64 kernel arguments on pre-SI.

  This currently breaks some other cases in the kernel-args.ll test for R600, but I'm not
  particularly confident in the new output. VTX_READ_* are not used for some of the scalarized
  cases, and the code reading from the constant buffer doesn't make much sense to me.

  llvm-svn: 215564
* Canonicalize header guards into a common format. (Benjamin Kramer, 2014-08-13, 373 files, -819/+841)

  Add header guards to files that were missing guards. Remove #endif comments as they don't
  seem common in LLVM (we can easily add them back if we decide they're useful).

  Changes made by clang-tidy with minor tweaks.

  llvm-svn: 215558
* [DAGCombiner] Improved target independent vector shuffle combine rule. (Andrea Di Biagio, 2014-08-13, 1 file, -10/+24)

  This patch improves the existing algorithm in DAGCombiner that attempts to fold shuffles
  according to the rule:

    shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)

  Before this change, there were cases where the DAGCombiner conservatively avoided folding
  shuffles even if the resulting mask would have been legal. That is because the algorithm
  wrongly assumed that commuting an illegal shuffle mask would always produce an illegal mask.

  With this change, we now correctly compute the commuted shuffle mask before calling method
  'isShuffleMaskLegal' on it.

  On X86, this improves for example the codegen for the following function:

    define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
      %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
      %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
      ret <4 x i32> %2
    }

  Before this change the X86 backend (-mcpu=corei7) generated the following assembly code for
  function @test:

    shufps $-23, %xmm0, %xmm1  # xmm1 = xmm1[1,2],xmm0[2,3]
    movhlps %xmm1, %xmm1       # xmm1 = xmm1[1,1]
    movaps %xmm1, %xmm0

  Now we produce:

    movhlps %xmm0, %xmm0       # xmm0 = xmm0[1,1]

  Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly fold according
  to the above-mentioned rule.

  llvm-svn: 215555
* [mips] Refactor calls to setCanHaveModuleDir. (Toma Tabacu, 2014-08-13, 2 files, -133/+55)

  Summary:
  Moved some calls to setCanHaveModuleDir to the MipsTargetStreamer base class and removed the
  resulting empty functions from the MipsTargetELFStreamer class.

  Also fixed a missing call to setCanHaveModuleDir in
  MipsTargetELFStreamer::emitDirectiveSetMicroMips.

  Reviewers: dsanders

  Reviewed By: dsanders

  Subscribers: tomatabacu

  Differential Revision: http://reviews.llvm.org/D4781

  llvm-svn: 215542
* [optnone] Make the optnone attribute effective at suppressing function attribute and function argument attribute synthesizing and propagating. (Chandler Carruth, 2014-08-13, 1 file, -7/+13)

  As with the other uses of this attribute, the goal remains a best-effort (no guarantees)
  attempt to not optimize the function or assume things about the function when optimizing.

  This is particularly useful for compiler testing, bisecting miscompiles, triaging things,
  etc. I was hitting specific issues using optnone to isolate test code from a test driver for
  my fuzz testing, and this is one step of fixing that.

  llvm-svn: 215538
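  For reference, a function opted out this way carries the attribute in IR roughly as follows
  (a minimal sketch; the function body is made up):

    ; passes should neither optimize this function nor synthesize attributes for it
    define i32 @keep_as_is(i32 %x) #0 {
      %r = add i32 %x, 1
      ret i32 %r
    }

    attributes #0 = { noinline optnone }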