path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
...
* Match new shuffle codegen for MOVHPD patterns | Sanjay Patel | 2014-12-10 | 1 | -7/+7

    Add patterns to match SSE (shufpd) and AVX (vpermilpd) shuffle codegen
    when storing the high element of a v2f64. The existing patterns were
    only checking for an unpckh type of shuffle.

    http://llvm.org/bugs/show_bug.cgi?id=21791
    Differential Revision: http://reviews.llvm.org/D6586

    llvm-svn: 223929
* [X86] Make a code path in EltsFromConsecutiveLoads work only on vectors it expects | Michael Kuperstein | 2014-12-10 | 1 | -0/+19

    EltsFromConsecutiveLoads was apparently only ever called for 128-bit
    vectors, and assumed this implicitly. r223518 started calling it for
    AVX-sized vectors, causing the code path that had this assumption to
    crash. This adds a check to make this path fire only for 128-bit
    vectors.

    Differential Revision: http://reviews.llvm.org/D6579

    llvm-svn: 223922
* [ARM] Combine base-updating/post-incrementing vector load/stores. | Ahmed Bougacha | 2014-12-10 | 4 | -19/+381

    We used to only combine intrinsics, and turn them into
    VLD1_UPD/VST1_UPD when the base pointer is incremented after the
    load/store. We can do the same thing for generic load/stores.

    Note that we can only combine the first load/store+adds pair in a
    sequence (as might be generated for a v16f32 load for instance),
    because other combines turn the base pointer addition chain (each
    computing the address of the next load, from the address of the last
    load) into independent additions (common base pointer + this load's
    offset).

    Differential Revision: http://reviews.llvm.org/D6585

    llvm-svn: 223862
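    A minimal IR sketch of the pattern this combine targets (function and
    value names hypothetical; pre-3.7 typed-pointer syntax): a vector load
    whose base pointer is incremented by the access size afterwards, which
    can now fold into a post-incremented VLD1_UPD without going through an
    intrinsic.

      define <4 x float>* @load_postinc(<4 x float>* %p, <4 x float>* %out) {
        %v = load <4 x float>* %p, align 4
        store <4 x float> %v, <4 x float>* %out
        ; this base update is a candidate for folding into VLD1_UPD
        %next = getelementptr <4 x float>* %p, i32 1
        ret <4 x float>* %next
      }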
* [ARM] Make testcase more explicit. NFC. | Ahmed Bougacha | 2014-12-09 | 1 | -30/+49

    llvm-svn: 223841
* [ARM] Also support v2f64 vld1/vst1. | Ahmed Bougacha | 2014-12-09 | 2 | -0/+19

    It was missing from the VLD1/VST1 handling logic, even though the
    corresponding instructions exist (same form as v2i64).

    In preparation for a future patch.

    llvm-svn: 223832
* [FastISel][AArch64] Fix a missing nullptr check in 'computeAddress'. | Juergen Ributzka | 2014-12-09 | 1 | -0/+15

    The load/store value type is currently not available when lowering the
    memcpy intrinsic. Add the missing nullptr check to support this in
    'computeAddress'.

    Fixes rdar://problem/19178947.

    llvm-svn: 223818
* [AVX512] Added lowering for VBROADCASTSS/SD instructions. | Robert Khasanov | 2014-12-09 | 1 | -0/+120

    Lowering patterns were written through the avx512_broadcast_pat
    multiclass, as the pattern generates VBROADCAST and COPY_TO_REGCLASS
    nodes. Added lowering tests.

    llvm-svn: 223804
* [PowerPC 4/4] Enable little-endian support for VSX. | Bill Schmidt | 2014-12-09 | 4 | -29/+10

    With the foregoing three patches, VSX instructions can be used for
    little endian. This patch removes the restriction that prevented
    this, and re-enables the test cases from the first three patches.

    llvm-svn: 223792
* [PowerPC 3/4] Little-endian adjustments for VSX vector shuffle | Bill Schmidt | 2014-12-09 | 1 | -0/+212

    When performing instruction selection for ISD::VECTOR_SHUFFLE, there
    is special code for handling v2f64 and v2i64 using VSX instructions.
    This code must be adjusted for little-endian. Because the two inputs
    are treated as a double-wide register, we must swap their order for
    little endian. To get the appropriate mask elements to use with the
    big-endian biased XXPERMDI instruction, we must reverse their order
    and invert the bits.

    A new test is added to test the 16 possible values of the shuffle
    mask. It is initially disabled for reasons specified in the test. It
    is re-enabled by patch 4/4.

    llvm-svn: 223791
* Add test cases that were inadvertently omitted from r223783 and r223788 | Bill Schmidt | 2014-12-09 | 2 | -0/+235

    llvm-svn: 223789
* [CodeGenPrepare] Split branch conditions into multiple conditional branches. | Juergen Ributzka | 2014-12-09 | 1 | -0/+42

    This optimization transforms code like:

      bb1:
        %0 = icmp ne i32 %a, 0
        %1 = icmp ne i32 %b, 0
        %or.cond = or i1 %0, %1
        br i1 %or.cond, label %TrueBB, label %FalseBB

    into multiple branch instructions like:

      bb1:
        %0 = icmp ne i32 %a, 0
        br i1 %0, label %TrueBB, label %bb2
      bb2:
        %1 = icmp ne i32 %b, 0
        br i1 %1, label %TrueBB, label %FalseBB

    This optimization is already performed by SelectionDAG, but not by
    FastISel. FastISel cannot perform this optimization, because it
    cannot generate new MachineBasicBlocks.

    Performing this optimization at CodeGenPrepare time makes it available
    to both - SelectionDAG and FastISel - and the implementation in
    SelectionDAG could be removed. There are currently a few differences
    in codegen for X86 and PPC, so this commit only enables it for
    FastISel.

    Reviewed by Jim Grosbach

    This fixes rdar://problem/19034919.

    llvm-svn: 223786
* [PowerPC 1/4] Little-endian adjustments for VSX loads/stores | Bill Schmidt | 2014-12-09 | 5 | -4/+16

    This patch addresses the inherent big-endian bias in the lxvd2x,
    lxvw4x, stxvd2x, and stxvw4x instructions. These instructions load
    vector elements into registers left-to-right (with the first element
    loaded into the high-order bits of the register), regardless of the
    endian setting of the processor. However, these are the only vector
    memory instructions that permit unaligned storage accesses, so we
    want to use them for little-endian.

    To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x
    followed by an xxswapd, which swaps the doublewords. This works for
    lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the
    vector elements are in LE order (right-to-left) within each
    doubleword. (Thus after lxvd2x of a <4 x float> the elements will
    appear as 1, 0, 3, 2. Following the swap, they will appear as
    3, 2, 1, 0, as desired.) For stores, an stxvd2x or stxvw4x is
    replaced with an stxvd2x preceded by an xxswapd.

    Introduction of extra swap instructions provides correctness, but
    obviously is not ideal from a performance perspective. Future patches
    will address this with optimizations to remove most of the introduced
    swaps, which have proven effective in other implementations.

    The introduction of the swaps is performed during lowering of LOAD,
    STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations. The latter
    are used to translate intrinsics that specify the VSX loads and
    stores directly into equivalent sequences for little endian. Thus
    code that uses vec_vsx_ld and vec_vsx_st does not have to be modified
    to be ported from BE to LE.

    We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for
    use during this lowering step. In PPCInstrVSX.td, we add new SDType
    and SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd).
    These are recognized during instruction selection and mapped to the
    correct instructions.

    Several tests that were written to use -mcpu=pwr7 or pwr8 are modified
    to disable VSX on LE variants because code generation changes with
    this and subsequent patches in this set. I chose to include all of
    these in the first patch rather than try to rigorously sort out which
    tests were broken by one or another of the patches. Sorry about that.

    The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll,
    are disabled until LE support is enabled because of breakages that
    occur as noted in those tests. They are re-enabled in patch 4/4.

    llvm-svn: 223783
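    A minimal sketch of a load this lowering affects (hypothetical names,
    pre-3.7 syntax): compiled for a little-endian VSX target, the plain
    vector load below is expected to emit an lxvd2x followed by an
    xxswapd rather than a single big-endian-biased load.

      define <2 x double> @le_vsx_load(<2 x double>* %p) {
        ; assumed LE codegen: lxvd2x <vsr>,0,3 ; xxswapd <vsr>,<vsr>
        %v = load <2 x double>* %p, align 8
        ret <2 x double> %v
      }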
* [x86] Fix the test to actually test things for the CPU names, add the missing barcelona CPU which that test uncovered, and remove the 32-bit x86 CPUs which I really wasn't prepared to audit and test thoroughly | Chandler Carruth | 2014-12-09 | 1 | -32/+28

    If anyone wants to clean up the 32-bit only x86 CPUs, go for it.

    Also, if anyone else wants to try to de-duplicate the AMD CPUs, that'd
    be cool, but from the looks of it wouldn't save as much as it did for
    the Intel CPUs.

    llvm-svn: 223774
* [x86] Add a test for the CPU names that should have been in r223769. | Chandler Carruth | 2014-12-09 | 1 | -0/+32

    llvm-svn: 223770
* [X86] Convert esp-relative movs of function arguments into pushes, step 1 | Michael Kuperstein | 2014-12-09 | 6 | -9/+106

    This handles the simplest case for mov -> push conversion:
    1. x86-32 calling convention, everything is passed through the stack.
    2. There is no reserved call frame.
    3. Only registers or immediates are pushed, no attempt to combine a
       mem-reg-mem sequence into a single PUSHmm.

    Differential Revision: http://reviews.llvm.org/D6503

    llvm-svn: 223757
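    A minimal sketch of code the conversion targets (hypothetical names):
    on a 32-bit x86 target with no reserved call frame, the esp-relative
    stores of the two immediate arguments below may instead be emitted as
    pushl instructions.

      define void @caller() {
        ; argument setup may become: pushl $2 ; pushl $1 ; calll callee
        call void @callee(i32 1, i32 2)
        ret void
      }

      declare void @callee(i32, i32)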
* Handle early-clobber registers in the aggressive anti-dep breaker | Hal Finkel | 2014-12-09 | 1 | -0/+47

    The aggressive anti-dep breaker, used by the PowerPC backend during
    post-RA scheduling (but available to all targets), did not handle
    early-clobber MI operands (at all). When constructing the list of
    available registers for the replacement of some def operand, check
    the using instructions, and remove registers assigned to
    early-clobbered defs from the set.

    Fixes PR21452.

    llvm-svn: 223727
* MISched: Fix moving stores across barriers | Tom Stellard | 2014-12-08 | 1 | -0/+42

    This fixes an issue with ScheduleDAGInstrs::buildSchedGraph where
    stores without an underlying object would not be added as a
    predecessor to the current BarrierChain.

    llvm-svn: 223717
* [PowerPC] Don't use a non-allocatable register to implement the 'cc' alias | Hal Finkel | 2014-12-08 | 2 | -0/+343

    GCC accepts 'cc' as an alias for 'cr0', and we need to do the same
    when processing inline asm constraints. This had previously been
    implemented using a non-allocatable register, named 'cc', that was
    listed as an alias of 'cr0', but the infrastructure does not seem to
    support this properly (neither the register allocator nor the
    scheduler properly accounts for the alias). Instead, we can just
    process this as a naming alias inside of the inline asm
    constraint-processing code, so we'll do that instead.

    There are two regression tests, one where the post-RA scheduler did
    the wrong thing with the non-allocatable alias, and one where the
    register allocator did the wrong thing.

    Fixes PR21742.

    llvm-svn: 223708
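    A minimal sketch of the kind of inline asm this affects (hypothetical
    names and asm body): the 'cc' clobber below must now be processed as
    a naming alias of cr0 during constraint handling, rather than as a
    separate non-allocatable register.

      define void @clobbers_cc(i32 %a, i32 %b) {
        ; '~{cc}' is treated as clobbering cr0
        call void asm sideeffect "cmpw 0, $0, $1", "r,r,~{cc}"(i32 %a, i32 %b)
        ret void
      }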
* [CompactUnwind] Fix register encoding logic | Bruno Cardoso Lopes | 2014-12-08 | 2 | -73/+79

    Fix a compact unwind encoding logic bug which would try to encode
    more callee-saved registers than it should, leading to early bail out
    in the encoding logic and unnecessary use of DWARF frame mode.

    Also remove no-compact-unwind.ll, which was testing the wrong thing
    based on this bug, and move it to valid 'compact unwind' tests. Added
    a few more tests too.

    llvm-svn: 223676
* AArch64: treat HFAs containing "half" types as blocks too. | Tim Northover | 2014-12-08 | 1 | -0/+7

    llvm-svn: 223669
* [X86] Improved tablegen patterns for matching TZCNT/LZCNT. | Andrea Di Biagio | 2014-12-08 | 1 | -0/+131

    Teach ISel how to match a TZCNT/LZCNT from a conditional move if the
    condition code is X86_COND_NE. Existing tablegen patterns only
    allowed matching TZCNT/LZCNT from an X86cond with condition code
    equal to X86_COND_E. To avoid introducing extra rules, I added an
    'ImmLeaf' definition that checks if the condition code is COND_E or
    COND_NE.

    llvm-svn: 223668
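    A minimal sketch of the newly matched form (hypothetical names): the
    zero test uses icmp ne, so the select corresponds to a conditional
    move with condition code X86_COND_NE, which can now still fold into a
    single tzcnt.

      define i32 @count_tz(i32 %x) {
        %cnt = tail call i32 @llvm.cttz.i32(i32 %x, i1 true)
        %nonzero = icmp ne i32 %x, 0
        ; inverted condition: previously only the COND_E form matched
        %res = select i1 %nonzero, i32 %cnt, i32 32
        ret i32 %res
      }

      declare i32 @llvm.cttz.i32(i32, i1)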
* [X86] Improved lowering of packed v8i16 vector shifts by non-constant count. | Andrea Di Biagio | 2014-12-08 | 1 | -9/+6

    Before this patch, the backend sub-optimally expanded the non-constant
    shift count of a v8i16 shift into a sequence of two 'movd' plus
    'movzwl'. With this patch, the backend checks if the target has
    SSE4.1; if so, it lets the shuffle legalizer deal with the expansion
    of the shift amount.

    Example:

      define <8 x i16> @test(<8 x i16> %A, <8 x i16> %B) {
        %shamt = shufflevector <8 x i16> %B, <8 x i16> undef, <8 x i32> zeroinitializer
        %shl = shl <8 x i16> %A, %shamt
        ret <8 x i16> %shl
      }

    Before (with -mattr=+avx):
      vmovd    %xmm1, %eax
      movzwl   %ax, %eax
      vmovd    %eax, %xmm1
      vpsllw   %xmm1, %xmm0, %xmm0
      retq

    Now:
      vpxor    %xmm2, %xmm2, %xmm2
      vpblendw $1, %xmm1, %xmm2, %xmm1
      vpsllw   %xmm1, %xmm0, %xmm0
      retq

    llvm-svn: 223660
* [x86] Clean up the SSE1 test to use a slightly different pattern for matching offsets | Chandler Carruth | 2014-12-07 | 1 | -7/+7

    I don't expect this to really matter, but it's what the latest
    incarnation of my script for maintaining these tests happens to
    produce, and so it's simpler for me if everything matches.

    llvm-svn: 223613
* [x86] Switch a constant selection test to use positive assertions and to store to real pointers, so that it's clear that the right code is in fact being generated | Chandler Carruth | 2014-12-07 | 1 | -5/+17

    llvm-svn: 223612
* [x86] Clean up the combining vector shuffle tests a bit by merging identical checks for different SSE variants into a single block | Chandler Carruth | 2014-12-07 | 1 | -62/+18

    llvm-svn: 223611
* [x86] Clean up the shift lowering vector shuffle tests a bit using my script | Chandler Carruth | 2014-12-07 | 4 | -93/+26

    Notably this folds all the SSE cases together into a single FileCheck
    block. It also adds a vex prefix.

    llvm-svn: 223610
* R600/SI: Restore PrivateGlobalPrefix to the default ELF value of ".L" | Tom Stellard | 2014-12-06 | 1 | -1/+5

    This was changed in r223323.

    llvm-svn: 223579
* Add a proper triple to switch-jump-table.ll | Hans Wennborg | 2014-12-06 | 1 | -1/+1

    llvm-svn: 223571
* llvm/test/CodeGen/X86/switch-jump-table.ll: Add explicit triple. | NAKAMURA Takumi | 2014-12-06 | 1 | -1/+1

    Local labels have a prefix "." for targeting i686-cygming.

    llvm-svn: 223570
* [X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns. | Ahmed Bougacha | 2014-12-06 | 4 | -2/+388

    Most patterns will go away once the extload legalization changes land.

    Differential Revision: http://reviews.llvm.org/D6125

    llvm-svn: 223567
* SelectionDAG switch lowering: Replace unreachable default with most popular case. | Hans Wennborg | 2014-12-06 | 3 | -6/+58

    This can significantly reduce the size of the switch, allowing for
    more efficient lowering.

    I also worked with the idea of exploiting unreachable defaults by
    omitting the range check for jump tables, but always ended up with a
    non-negligible binary size increase. It might be worth looking into
    some more.

    SimplifyCFG currently does this transformation, but I'm working
    towards changing that so we can optimize harder based on unreachable
    defaults.

    Differential Revision: http://reviews.llvm.org/D6510

    llvm-svn: 223566
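    A minimal sketch of the shape this exploits (hypothetical names): the
    default is unreachable, so lowering may substitute the most popular
    case as the new default and shrink the range the jump table must
    cover.

      define i32 @pick(i32 %x) {
      entry:
        switch i32 %x, label %default [
          i32 0, label %a
          i32 1, label %a
          i32 2, label %b
        ]
      default:              ; provably never taken
        unreachable
      a:                    ; most popular case; may become the new default
        ret i32 10
      b:
        ret i32 20
      }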
* Optimize merging of scalar loads for 32-byte vectors [X86, AVX] | Sanjay Patel | 2014-12-05 | 1 | -10/+97

    Fix the poor codegen seen in PR21710
    (http://llvm.org/bugs/show_bug.cgi?id=21710). Before we crack 32-byte
    build vectors into smaller chunks (and then subsequently glue them
    back together), we should look for the easy case where we can just
    load all elements in a single op.

    An example of the codegen change is:

    From:
      vmovss      16(%rdi), %xmm1
      vmovups     (%rdi), %xmm0
      vinsertps   $16, 20(%rdi), %xmm1, %xmm1
      vinsertps   $32, 24(%rdi), %xmm1, %xmm1
      vinsertps   $48, 28(%rdi), %xmm1, %xmm1
      vinsertf128 $1, %xmm1, %ymm0, %ymm0
      retq

    To:
      vmovups (%rdi), %ymm0
      retq

    Differential Revision: http://reviews.llvm.org/D6536

    llvm-svn: 223518
* Use 32-bit ebp for NaCl64 in a limited case: llvm.frameaddress. | Jan Wen Voung | 2014-12-05 | 1 | -0/+11

    Summary:
    Follow up to [x32] "Use ebp/esp as frame and stack pointer"
    (http://reviews.llvm.org/D4617). In that earlier patch, NaCl64 was
    made to always use rbp. That's needed for most cases because rbp
    should hold a full 64-bit address within the NaCl sandbox so that
    loads/stores off of rbp don't require sandbox adjustment (zeroing the
    top 32 bits, then filling those by adding r15).

    However, llvm.frameaddress returns a pointer, and pointers are 32-bit
    for NaCl64. In this case, use ebp instead, which will make the
    register copy type check. A similar mechanism may be needed for
    llvm.eh.return, but it is not added in this change.

    Test Plan: test/CodeGen/X86/frameaddr.ll

    Reviewers: dschuff, nadav

    Subscribers: jfb, llvm-commits

    Differential Revision: http://reviews.llvm.org/D6514

    llvm-svn: 223510
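    A minimal sketch of the affected pattern (hypothetical name):
    llvm.frameaddress yields a 32-bit pointer under NaCl64, so the frame
    address must be copied out of ebp rather than rbp for the register
    copy to type check.

      define i8* @frame() {
        ; under NaCl64 this copy must read 32-bit ebp, not rbp
        %fa = call i8* @llvm.frameaddress(i32 0)
        ret i8* %fa
      }

      declare i8* @llvm.frameaddress(i32)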
* [PowerPC] Update Power VSX test cases to also test fast-isel | Bill Seurer | 2014-12-05 | 7 | -131/+541

    Update some of the VSX test cases for Power to check fast-isel
    codegen as well as the regular codegen.

    http://reviews.llvm.org/D6357

    llvm-svn: 223509
* [X86] Improved lowering of packed vector shifts to vpsllq/vpsrlq. | Andrea Di Biagio | 2014-12-05 | 1 | -8/+0

    SSE2/AVX non-constant packed shift instructions only use the lower
    64 bits of the shift count.

    This patch teaches function 'getTargetVShiftNode' how to deal with
    shifts where the shift count node is of type MVT::i64.

    Before this patch, function 'getTargetVShiftNode' only knew how to
    deal with shift count nodes of type MVT::i32. This forced the backend
    to wrongly truncate the shift count to MVT::i32, and then zero-extend
    it back to MVT::i64.

    llvm-svn: 223505
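    A minimal sketch of a shift this improves (hypothetical names): the
    splatted i64 count below can now feed vpsllq directly instead of
    being truncated to i32 and zero-extended back to i64.

      define <2 x i64> @shl_by_scalar(<2 x i64> %a, i64 %n) {
        %ins = insertelement <2 x i64> undef, i64 %n, i32 0
        %cnt = shufflevector <2 x i64> %ins, <2 x i64> undef, <2 x i32> zeroinitializer
        %shl = shl <2 x i64> %a, %cnt
        ret <2 x i64> %shl
      }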
* [X86] Avoid introducing extra shuffles when lowering packed vector shifts. | Andrea Di Biagio | 2014-12-05 | 2 | -1/+161

    When lowering a vector shift node, the backend checks if the shift
    count is a shuffle with a splat mask. If so, then it introduces an
    extra dag node to extract the splat value from the shuffle. The splat
    value is then used to generate a shift count of a target specific
    shift.

    However, if we know that the shift count is a splat shuffle, we can
    use the splat index 'I' to extract the I-th element from the first
    shuffle operand. The advantage is that the splat shuffle may become
    dead since we no longer use it.

    Example:

      define <4 x i32> @example(<4 x i32> %a, <4 x i32> %b) {
        %c = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer
        %shl = shl <4 x i32> %a, %c
        ret <4 x i32> %shl
      }

    Before this patch, llc generated the following code (-mattr=+avx):
      vpshufd  $0, %xmm1, %xmm1         # xmm1 = xmm1[0,0,0,0]
      vpxor    %xmm2, %xmm2
      vpblendw $3, %xmm1, %xmm2, %xmm1  # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
      vpslld   %xmm1, %xmm0, %xmm0
      retq

    With this patch, the redundant splat operation is removed from the
    code:
      vpxor    %xmm2, %xmm2
      vpblendw $3, %xmm1, %xmm2, %xmm1  # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
      vpslld   %xmm1, %xmm0, %xmm0
      retq

    llvm-svn: 223461
* Add missing FP build attribute tests. | Charlie Turner | 2014-12-05 | 1 | -28/+148

    The test file test/CodeGen/ARM/build-attributes.ll was missing several
    floating-point build attribute tests. The intention of this commit is
    that for each CPU / architecture currently tested, there are now
    tests that make sure the following attributes are sufficiently
    checked:

      * Tag_ABI_FP_rounding
      * Tag_ABI_FP_denormal
      * Tag_ABI_FP_exceptions
      * Tag_ABI_FP_user_exceptions
      * Tag_ABI_FP_number_model

    Also in this commit, the -unsafe-fp-math flag has been augmented with
    the full suite of flags Clang sends to LLVM when you pass -ffast-math
    to Clang. That is, '-unsafe-fp-math' has been changed to
    '-enable-unsafe-fp-math -disable-fp-elim -enable-no-infs-fp-math
    -enable-no-nans-fp-math -fp-contract=fast'.

    Change-Id: I35d766076bcbbf09021021c0a534bf8bf9a32dfc
    llvm-svn: 223454
* Revert "r223440 - Consider subregs when calling MI::registerDefIsDead for ↵Hal Finkel2014-12-051-168/+0
| | | | | | | | | phys deps" Reverting this because, while it fixes the problem in the reduced test case, it does not fix the problem in the full test case from the bug report. llvm-svn: 223442
* Consider subregs when calling MI::registerDefIsDead for phys deps | Hal Finkel | 2014-12-05 | 1 | -0/+168

    The scheduling dependency graph is built bottom-up within each
    scheduling region, and ScheduleDAGInstrs::addPhysRegDeps is called to
    add output/anti dependencies, based on physical registers, to the SUs
    for instructions based on those that come before them.

    In the test case, we start before post-RA scheduling with a block
    that looks like this:

      ...
      INLINEASM <...
        andc $0,$0,$2
        stdcx. $0,0,$3
        bne- 1b
      > [sideeffect] [mayload] [maystore] [attdialect],
        $0:[regdef-ec:G8RC], %X6<earlyclobber,def,dead>,
        $1:[mem], %X3<kill>, $2:[reguse:G8RC], %X5<kill>,
        $3:[reguse:G8RC], %X3, $4:[mem], %X3,
        $5:[clobber], %CC<earlyclobber,imp-def,dead>, <<badref>>
      ...
      %X4<def,dead> = ANDIo8 %X4<kill>, 1, %CR0<imp-def,dead>, %CR0GT<imp-def>
      ...
      %R29<def> = ISEL %R3<undef>, %R4<kill>, %CR0GT<kill>

    where it is relevant that %CC is an alias to %CR0, and that %CR0GT is
    a subregister of %CR0. However, for post-RA scheduling, no dependency
    was added to prevent the INLINEASM from being scheduled in between
    the ANDIo8 and the ISEL (which communicate via the %CR0GT register).

    In ScheduleDAGInstrs::addPhysRegDeps, when called for the %CC operand,
    we'd iterate over all of its aliases (which include %CC itself and
    also %CR0), and look for previously-encountered defs of those
    registers. We'd find the ANDIo8, but decide not to add a dependency
    between the INLINEASM and the ANDIo8 because both the INLINEASM's def
    of %CC is dead, and also the ANDIo8's def of %CR0 is dead. This
    ignores, however, that ANDIo8 has a non-dead def of %CR0GT, a
    subregister of %CR0, and thus a dependency still must exist.

    To fix this problem, when calling registerDefIsDead on the SU with
    the def, we also check all subregisters for possible non-dead defs,
    and add the dependency if any are found.

    Fixes PR21742.

    llvm-svn: 223440
* Re-add support to llvm-objdump for Mach-O universal files and archives with -macho, with fixes | Kevin Enderby | 2014-12-04 | 1 | -1/+1

    Includes the move of the llvm-objdump tests for universal files to an
    X86 directory. Also includes the fix for the failure on linux that
    Rafael tracked down with asan. I had both Jim Grosbach and Adam Hemet
    look over the second fix, since I could not set up asan to reproduce
    with the old version but not with the fix.

    llvm-svn: 223416
* [AArch64] Combining Load and IntToFp should check for neon availability | Weiming Zhao | 2014-12-04 | 1 | -0/+16

    llvm-svn: 223382
* Fix thumbv4t indirect calls | Jonathan Roelofs | 2014-12-04 | 2 | -2/+46

    There are a couple of issues with indirect calls on thumbv4t. First,
    the most 'obvious' instruction, 'blx', isn't available until v5t.
    Second, the next-most-obvious sequence, 'mov lr, pc; bx rN', doesn't
    do the right thing in thumb code, because the saved-off pc has its
    thumb bit cleared, so when the callee returns we end up in ARM
    mode... yuck.

    The solution is to 'bl' to a nearby landing pad with a 'bx rN' in it.

    We could cut down on code size by sharing the landing pads between
    call sites that are close enough, but for the moment let's do
    correctness first and look at performance later.

    Patch by: Iain Sandoe

    http://reviews.llvm.org/D6519

    llvm-svn: 223380
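    A sketch of the emitted sequence (register and label names assumed):
    the call is lowered to a 'bl' to a small landing pad containing the
    actual 'bx', so lr is written by 'bl' with the Thumb bit intact.

      define void @call_fp(void ()* %fp) {
        ; on thumbv4t this lowers to roughly:
        ;       bl   .Lpad      @ lr gets the return address, Thumb bit set
        ;       ...
        ; .Lpad:
        ;       bx   r0         @ the actual indirect branch
        call void %fp()
        ret void
      }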
* [X86] Improve a dag-combine that handles a vector extract -> zext sequence. | Michael Kuperstein | 2014-12-04 | 1 | -24/+59

    The current DAG combine turns a sequence of extracts from <4 x i32>
    followed by zexts into a store followed by scalar loads.

    According to measurements by Martin Krastev (see PR 21269), for
    x86-64 a sequence of an extract, movs and shifts gives better
    performance. However, for 32-bit x86, the previous sequence still
    seems better.

    Differential Revision: http://reviews.llvm.org/D6501

    llvm-svn: 223360
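    A minimal sketch of the pattern the combine matches (hypothetical
    names): each lane is extracted and zero-extended; on x86-64 this is
    now lowered with moves and shifts rather than a round trip through
    memory.

      define i64 @extract_zext(<4 x i32> %v) {
        %e0 = extractelement <4 x i32> %v, i32 0
        %e1 = extractelement <4 x i32> %v, i32 1
        %z0 = zext i32 %e0 to i64
        %z1 = zext i32 %e1 to i64
        %sum = add i64 %z0, %z1
        ret i64 %sum
      }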
* Use DomTree in MachineSink to sink over diamonds. | Patrik Hagglund | 2014-12-04 | 1 | -1/+0

    According to a previous FIXME comment, we now not only look at MBB
    successors, but also handle code sinking past them:

      x = computation
      if () {} else {}
      use x

    The instruction could be sunk over the whole diamond for the
    if/then/else (or loop, etc.), allowing it to be sunk into other
    blocks after that.

    Modified a test added in r204522, due to one less spill being
    present. Minor fixes in comments.

    Patch provided by Jonas Paulsson. Reviewed by Hal Finkel.

    llvm-svn: 223350
* Masked Load / Store Intrinsics - the CodeGen part. | Elena Demikhovsky | 2014-12-04 | 1 | -0/+73

    I'm recommitting the codegen part of the patch. The vectorizer part
    will be sent to review again.

    Masked Vector Load and Store Intrinsics.

    Introduced new target-independent intrinsics in order to support
    masked vector loads and stores. The loop vectorizer optimizes loops
    containing conditional memory accesses by generating these intrinsics
    for existing targets AVX2 and AVX-512. The vectorizer asks the target
    about availability of masked vector loads and stores. Added SDNodes
    for masked operations and lowering patterns for the X86 code
    generator.

    Examples:
      <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru,
                                          i32 4 /* align */, <16 x i1> %mask)
      declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value,
                                            i32 4, <8 x i1> %mask)

    A scalarizer for other targets (not AVX2/AVX-512) will be done in a
    separate patch.

    http://reviews.llvm.org/D6191

    llvm-svn: 223348
* [X86] Restore X86 base pointer after call to llvm.eh.sjlj.setjmp | Michael Liao | 2014-12-04 | 1 | -0/+37

    This patch fixes the bug described in
    http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062343.html

    The fix allocates an extra slot just below the GPRs and stores the
    base pointer there. This is done only for functions containing
    llvm.eh.sjlj.setjmp that also need a base pointer. Because code
    containing llvm.eh.sjlj.setjmp saves all of the callee-save GPRs in
    the prologue, the offset to the extra slot can be computed before
    prologue generation runs.

    Impact at run-time on affected functions is:
      - One extra store in the prologue; the store saves the base pointer.
      - One extra load after a llvm.eh.sjlj.setjmp; the load restores the
        base pointer.

    Because the extra slot is just above a gap between
    frame-pointer-relative and base-pointer-relative chunks of memory,
    there is no impact on other offset calculations other than ensuring
    there is room for the extra slot.

    http://reviews.llvm.org/D6388

    Patch by Arch Robison <arch.robison@intel.com>

    llvm-svn: 223329
* [PowerPC] 'cc' should be an alias only to 'cr0' | Hal Finkel | 2014-12-04 | 1 | -1/+1

    We had mistakenly believed that GCC's 'cc' referred to the entire
    condition-code register (cr0 through cr7) -- and implemented this in
    r205630 to fix PR19326 -- but 'cc' is actually an alias only to
    'cr0'. This was causing LLVM to clobber too much in legacy code that
    uses inline asm with the 'cc' clobber.

    Fixes PR21451.

    llvm-svn: 223328
* [PowerPC] Fix inline asm memory operands not to use r0 | Hal Finkel | 2014-12-03 | 1 | -0/+94

    On PowerPC, inline asm memory operands might be expanded as 0($r),
    where $r is a register containing the address. As a result, this
    register cannot be r0, and we need to enforce this register subclass
    constraint to prevent miscompiling the code (we'd get this constraint
    for free with the usual instruction definitions, but that scheme has
    no knowledge of how we end up printing inline asm memory operands,
    and so here we need to do it 'by hand'). We can accomplish this
    within the current address-mode selection framework by introducing an
    explicit COPY_TO_REGCLASS node.

    Fixes PR21443.

    llvm-svn: 223318
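    A minimal sketch of the affected pattern (hypothetical names and asm
    body): the memory operand below is printed as 0($reg), so the
    register holding %p must be constrained away from r0, which would be
    read as a literal zero in that position.

      define void @touch(i32* %p) {
        ; $0 prints as 0(rN); rN must not be allocated r0
        call void asm sideeffect "lwz 4, $0", "*m,~{r4}"(i32* %p)
        ret void
      }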
* [RegAllocFast] Handle implicit definitions conservatively. | Quentin Colombet | 2014-12-03 | 1 | -1/+21

    Prior to this commit, physical registers defined implicitly were
    considered free right after their definition, i.e., like dead
    definitions. Therefore, their uses had to immediately follow their
    definitions, otherwise the related register may be reused to allocate
    a virtual register.

    This commit fixes this assumption by keeping implicit definitions
    alive until they are actually used. The downside is that if the
    implicit definition was dead (and not marked as such), we block an
    otherwise available register. This is however conservatively correct
    and makes the fast register allocator much more robust, in particular
    regarding the scheduling of the instructions.

    Fixes PR21700.

    llvm-svn: 223317
* This reverts commit r223306 and r223277. | Rafael Espindola | 2014-12-03 | 1 | -1/+1

    The code is using uninitialized memory and failing on linux.

    llvm-svn: 223315