summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Move DataLayout back to the TargetMachine from TargetSubtargetInfoEric Christopher2015-01-2676-381/+384
| | | | | | | | | | | | | | | | | | | derived classes. Since global data alignment, layout, and mangling is often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be layed out and mangled consistently if the subtarget changes on a per function basis. Prior to this all targets(*) have had subtarget dependent code moved out and onto the TargetMachine. *One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget dependent features. llvm-svn: 227113
* Model sqrtsd as a binary operation with one source operand tied to the ↵Sanjay Patel2015-01-261-13/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | destination (PR14221) This patch fixes the following miscompile: define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp { %0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind %a0 = extractelement <2 x double> %0, i32 0 %conv = fptrunc double %a0 to float %a1 = extractelement <2 x double> %0, i32 1 %conv3 = fptrunc double %a1 to float tail call void @callee2(float %conv, float %conv3) nounwind ret void } Current codegen: sqrtsd %xmm0, %xmm1 ## high element of %xmm1 is undef here xorps %xmm0, %xmm0 cvtsd2ss %xmm1, %xmm0 shufpd $1, %xmm1, %xmm1 cvtsd2ss %xmm1, %xmm1 ## operating on undef value jmp _callee This is a continuation of http://llvm.org/viewvc/llvm-project?view=revision&revision=224624 ( http://reviews.llvm.org/D6330 ) which was itself a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ). All of these patches are partial fixes for PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ); this should be the final patch needed to resolve that bug. Differential Revision: http://reviews.llvm.org/D6885 llvm-svn: 227111
* Move the Mips target to storing the ABI in the TargetMachine ratherEric Christopher2015-01-2616-115/+196
| | | | | | | | | | | | | | | | | | | | | | | | | | | than on MipsSubtargetInfo. This required a bit of massaging in the MC level to handle this since MC is a) largely a collection of disparate classes with no hierarchy, and b) there's no overarching equivalent to the TargetMachine, instead only the subtarget via MCSubtargetInfo (which is the base class of TargetSubtargetInfo). We're now storing the ABI in both the TargetMachine level and in the MC level because the AsmParser and the TargetStreamer both need to know what ABI we have to parse assembly and emit objects. The target streamer has a pointer to the one in the asm parser and is updated when the asm parser is created. This is fragile as the FIXME comment notes, but shouldn't be a problem in practice since we always create an asm parser before attempting to emit object code via the assembler. The TargetMachine now contains the ABI so that the DataLayout can be constructed dependent upon ABI. All testcases have been updated to use the -target-abi command line flag so that we can set the ABI without using a subtarget feature. Should be no change visible externally here. llvm-svn: 227102
* [mips] Enable arithmetic and binary operations for the i128 data type.Vasileios Kalintiris2015-01-264-30/+66
| | | | | | | | | | | | | | | | | | Summary: This patch adds support for some operations that were missing from 128-bit integer types (add/sub/mul/sdiv/udiv... etc.). With these changes we can support the __int128_t and __uint128_t data types from C/C++. Depends on D7125 Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7143 llvm-svn: 227089
* When disassembler meets compact jump instructions for r6 it crashes as the ↵Vladimir Medic2015-01-261-1/+0
| | | | | | access to operands array is out of range. This patch removes dedicated decoder method that wrongly handles decoding of these instructions. llvm-svn: 227084
* Revert "[mips] Fix assertion on i128 addition/subtraction on MIPS64"Vasileios Kalintiris2015-01-262-37/+5
| | | | | | | | This reverts commit r227003. Support for addition/subtraction and various other operations for the i128 data type will be added in a future commit based on the review D7143. llvm-svn: 227082
* Correct the header guard for MipsABIInfo.h.Eric Christopher2015-01-261-2/+2
| | | | llvm-svn: 227076
* Fix a problem where the AArch64 ELF assembler was failing withEric Christopher2015-01-261-1/+2
| | | | | | | | | -no-exec-stack. This was due to it not deriving from the correct asm info base class and missing the override for the exec stack section query. Added another line to the noexec test line to make sure this doesn't regress. llvm-svn: 227074
* [PowerPC] Reset the baseline for ppc64le to be equivalent to pwr8Bill Schmidt2015-01-251-14/+30
| | | | | | | | | | | | | | | | | | | | | Test by Nemanja Ivanovic. Since ppc64le implies POWER8 as a minimum, it makes sense that the same features are included. Since the pwr8 processor model will likely be getting new features until the implementation is complete, I created a new list to add these updates to. This will include them in both pwr8 and ppc64le. Furthermore, it seems that it would make sense to compose the feature lists for other processor models (pwr3 and up). Per discussion in the review, I will make this change in a subsequent patch. In order to test the changes, I've added an additional run step to test cases that specify -march=ppc64le -mcpu=pwr8 to omit the -mcpu option. Since the feature lists are the same, the behaviour should be unchanged. llvm-svn: 227053
* AVX-512: Changes in operations on masks registers for KNL and SKXElena Demikhovsky2015-01-253-14/+53
| | | | | | | | | - Added KSHIFTB/D/Q for skx - Added KORTESTB/D/Q for skx - Fixed store operation for v8i1 type for KNL - Store size of v8i1, v4i1 and v2i1 are changed to 8 bits llvm-svn: 227043
* [X86] Give scalar VRNDSCALE instructions priority in AVX512 mode.Craig Topper2015-01-252-20/+26
| | | | llvm-svn: 227039
* Simplify a multiclass. No functional change.Craig Topper2015-01-251-14/+18
| | | | llvm-svn: 227038
* Remove tab characters. NFCCraig Topper2015-01-251-1/+1
| | | | llvm-svn: 227036
* Implemented cost model for masked load/store operations.Elena Demikhovsky2015-01-251-0/+57
| | | | llvm-svn: 227035
* [X86] Replace i32i8imm on SSE/AVX instructions with i32u8imm which will make ↵Craig Topper2015-01-254-30/+38
| | | | | | the assembler bounds check them. It will also make them print as unsigned. llvm-svn: 227032
* [X86] Use u8imm in several places that used i32i8imm that don't require an ↵Craig Topper2015-01-252-20/+20
| | | | | | i32 type. llvm-svn: 227031
* Remove tab characters. NFC.Craig Topper2015-01-251-1/+1
| | | | llvm-svn: 227030
* BPF backendAlexei Starovoitov2015-01-2444-1/+3266
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: V8->V9: - cleanup tests V7->V8: - addressed feedback from David: - switched to range-based 'for' loops - fixed formatting of tests V6->V7: - rebased and adjusted AsmPrinter args - CamelCased .td, fixed formatting, cleaned up names, removed unused patterns - diffstat: 3 files changed, 203 insertions(+), 227 deletions(-) V5->V6: - addressed feedback from Chandler: - reinstated full verbose standard banner in all files - fixed variables that were not in CamelCase - fixed names of #ifdef in header files - removed redundant braces in if/else chains with single statements - fixed comments - removed trailing empty line - dropped debug annotations from tests - diffstat of these changes: 46 files changed, 456 insertions(+), 469 deletions(-) V4->V5: - fix setLoadExtAction() interface - clang-formated all where it made sense V3->V4: - added CODE_OWNERS entry for BPF backend V2->V3: - fix metadata in tests V1->V2: - addressed feedback from Tom and Matt - removed top level change to configure (now everything via 'experimental-backend') - reworked error reporting via DiagnosticInfo (similar to R600) - added few more tests - added cmake build - added Triple::bpf - tested on linux and darwin V1 cover letter: --------------------- recently linux gained "universal in-kernel virtual machine" which is called eBPF or extended BPF. The name comes from "Berkeley Packet Filter", since new instruction set is based on it. This patch adds a new backend that emits extended BPF instruction set. The concept and development are covered by the following articles: http://lwn.net/Articles/599755/ http://lwn.net/Articles/575531/ http://lwn.net/Articles/603983/ http://lwn.net/Articles/606089/ http://lwn.net/Articles/612878/ One of use cases: dtrace/systemtap alternative. bpf syscall manpage: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b4fc1a460f3017e958e6a8ea560ea0afd91bf6fe instruction set description and differences vs classic BPF: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/filter.txt Short summary of instruction set: - 64-bit registers R0 - return value from in-kernel function, and exit value for BPF program R1 - R5 - arguments from BPF program to in-kernel function R6 - R9 - callee saved registers that in-kernel function will preserve R10 - read-only frame pointer to access stack - two-operand instructions like +, -, *, mov, load/store - implicit prologue/epilogue (invisible stack pointer) - no floating point, no simd Short history of extended BPF in kernel: interpreter in 3.15, x64 JIT in 3.16, arm64 JIT, verifier, bpf syscall in 3.18, more to come in the future. It's a very small and simple backend. There is no support for global variables, arbitrary function calls, floating point, varargs, exceptions, indirect jumps, arbitrary pointer arithmetic, alloca, etc. From C front-end point of view it's very restricted. It's done on purpose, since kernel rejects all programs that it cannot prove safe. It rejects programs with loops and with memory accesses via arbitrary pointers. When kernel accepts the program it is guaranteed that program will terminate and will not crash the kernel. This patch implements all 'must have' bits. There are several things on TODO list, so this is not the end of development. Most of the code is a boiler plate code, copy-pasted from other backends. Only odd things are lack or < and <= instructions, specialized load_byte intrinsics and 'compare and goto' as single instruction. Current instruction set is fixed, but more instructions can be added in the future. Signed-off-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Subscribers: majnemer, chandlerc, echristo, joerg, pete, rengolin, kristof.beyls, arsenm, t.p.northover, tstellarAMD, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D6494 llvm-svn: 227008
* [mips] Fix 'jumpy' debug line info around calls.Daniel Sanders2015-01-243-39/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: At the moment, address calculation is taking the debug line info from the address node (e.g. TargetGlobalAddress). When a function is called multiple times, this results in output of the form: .loc $first_call_location .. address calculation .. .. function call .. .. address calculation .. .loc $second_call_location .. function call .. .loc $first_call_location .. address calculation .. .loc $third_call_location .. function call .. This patch makes address calculations for function calls take the debug line info for the call node and results in output of the form: .loc $first_call_location .. address calculation .. .. function call .. .loc $second_call_location .. address calculation .. .. function call .. .loc $third_call_location .. address calculation .. .. function call .. All other address calculations continue to use the address node. Test Plan: Fixes test/DebugInfo/multiline.ll on a mips host. Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D7050 llvm-svn: 227005
* [mips] Fix assertion on i128 addition/subtraction on MIPS64Daniel Sanders2015-01-242-5/+37
| | | | | | | | | | | | | | | | Summary: In addition to the included tests, this fixes test/CodeGen/Generic/i128-addsub.ll on a mips64 host. Reviewers: atanasyan, sagar, vmedic Reviewed By: vmedic Subscribers: sdkie, llvm-commits Differential Revision: http://reviews.llvm.org/D6610 llvm-svn: 227003
* [PM] Rework how the TargetLibraryInfo pass integrates with the new passChandler Carruth2015-01-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | manager to support the actual uses of it. =] When I ported instcombine to the new pass manager I discover that it didn't work because TLI wasn't available in the right places. This is a somewhat surprising and/or subtle aspect of the new pass manager design that came up before but I think is useful to be reminded of: While the new pass manager *allows* a function pass to query a module analysis, it requires that the module analysis is already run and cached prior to the function pass manager starting up, possibly with a 'require<foo>' style utility in the pass pipeline. This is an intentional hurdle because using a module analysis from a function pass *requires* that the module analysis is run prior to entering the function pass manager. Otherwise the other functions in the module could be in who-knows-what state, etc. A somewhat surprising consequence of this design decision (at least to me) is that you have to design a function pass that leverages a module analysis to do so as an optional feature. Even if that means your function pass does no work in the absence of the module analysis, you have to handle that possibility and remain conservatively correct. This is a natural consequence of things being able to invalidate the module analysis and us being unable to re-run it. And it's a generally good thing because it lets us reorder passes arbitrarily without breaking correctness, etc. This ends up causing problems in one case. What if we have a module analysis that is *definitionally* impossible to invalidate. In the places this might come up, the analysis is usually also definitionally trivial to run even while other transformation passes run on the module, regardless of the state of anything. And so, it follows that it is natural to have a hard requirement on such analyses from a function pass. It turns out, that TargetLibraryInfo is just such an analysis, and InstCombine has a hard requirement on it. The approach I've taken here is to produce an analysis that models this flexibility by making it both a module and a function analysis. This exposes the fact that it is in fact safe to compute at any point. We can even make it a valid CGSCC analysis at some point if that is useful. However, we don't want to have a copy of the actual target library info state for each function! This state is specific to the triple. The somewhat direct and blunt approach here is to turn TLI into a pimpl, with the state and mutators in the implementation class and the query routines primarily in the wrapper. Then the analysis can lazily construct and cache the implementations, keyed on the triple, and on-demand produce wrappers of them for each function. One minor annoyance is that we will end up with a wrapper for each function in the module. While this is a bit wasteful (one pointer per function) it seems tolerable. And it has the advantage of ensuring that we pay the absolute minimum synchronization cost to access this information should we end up with a nice parallel function pass manager in the future. We could look into trying to mark when analysis results are especially cheap to recompute and more eagerly GC-ing the cached results, or we could look at supporting a variant of analyses whose results are specifically *not* cached and expected to just be used and discarded by the consumer. Either way, these seem like incremental enhancements that should happen when we start profiling the memory and CPU usage of the new pass manager and not before. The other minor annoyance is that if we end up using the TLI in both a module pass and a function pass, those will be produced by two separate analyses, and thus will point to separate copies of the implementation state. While a minor issue, I dislike this and would like to find a way to cleanly allow a single analysis instance to be used across multiple IR unit managers. But I don't have a good solution to this today, and I don't want to hold up all of the work waiting to come up with one. This too seems like a reasonable thing to incrementally improve later. llvm-svn: 226981
* [AArch64][LoadStoreOptimizer] Form LDPSW when possible.Quentin Colombet2015-01-241-1/+15
| | | | | | | | | This patch adds the missing LD[U]RSW variants to the load store optimizer, so that we generate LDPSW when possible. <rdar://problem/19583480> llvm-svn: 226978
* [x86] Fix a commentBruno Cardoso Lopes2015-01-241-1/+1
| | | | llvm-svn: 226974
* R600/SI: Emit .hsa.version section for amdhsa OSTom Stellard2015-01-231-1/+13
| | | | llvm-svn: 226970
* [x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vectorBruno Cardoso Lopes2015-01-231-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle the poor codegen for i64/x86xmm->v2i64 (%mm -> %xmm) moves. Instead of using stack store/load pair to do the job, use scalar_to_vector directly, which in the MMX case can use movq2dq. This was the current behavior prior to improvements for vector legalization of extloads in r213897. This commit fixes the regression and as a side-effect also remove some unnecessary shuffles. In the new attached testcase, we go from: pshufw $-18, (%rdi), %mm0 movq %mm0, -8(%rsp) movq -8(%rsp), %xmm0 pshufd $-44, %xmm0, %xmm0 movd %xmm0, %eax ... To: pshufw $-18, (%rdi), %mm0 movq2dq %mm0, %xmm0 movd %xmm0, %eax ... Differential Revision: http://reviews.llvm.org/D7126 rdar://problem/19413324 llvm-svn: 226953
* R600/SI: Move i64 -> v2i32 load promotion into AMDGPUDAGToDAGISel::Select()Tom Stellard2015-01-232-3/+22
| | | | | | | | | | | We used to do this promotion during DAG legalization, but this caused an infinite loop in ExpandUnalignedLoad() because it assumed that i64 loads were legal if i64 was a legal type. It also seems better to report i64 loads as legal, since they actually are and we were just promoting them to simplify our tablegen files. llvm-svn: 226945
* [mips] fix spelling of 'disassembler'Alexei Starovoitov2015-01-231-3/+3
| | | | | | trivial first commit llvm-svn: 226935
* Classify functions by EH personality type rather than using the tripleReid Kleckner2015-01-231-4/+2
| | | | | | | | | | | | | | | This mostly reverts commit r222062 and replaces it with a new enum. At some point this enum will grow at least for other MSVC EH personalities. Also beefs up the way we were sniffing the personality function. Previously we would emit the Itanium LSDA despite using __C_specific_handler. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6987 llvm-svn: 226920
* Remove some local variables in place of just querying for themEric Christopher2015-01-231-6/+4
| | | | | | in the couple of asserts. llvm-svn: 226917
* [mips] Add new error message and improve testing for parsing the .module ↵Toma Tabacu2015-01-231-26/+27
| | | | | | | | | | | | | | | | | | | | | | | directive. Summary: We used to silently ignore any empty .module's and we used to give an error saying that we found an "unexpected token at start of statement" when the value of the option wasn't an identifier (e.g. if it was a number). We now give an error saying that we "expected .module option identifier" in both of those cases. I also fixed the other tests in mips-abi-bad.s, which all seemed to be broken. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7095 llvm-svn: 226905
* This patch fixes issue with lowering below mentioned pattern :-Jyoti Allur2015-01-231-7/+10
| | | | | | | | | | | | | | | | | | _foo: smull r0, r1, r1, r0 smull r2, r3, r3, r2 adds r0, r2, r0 adc r1, r3, r1 bx lr to _foo: smull r0, r1, r1, r0 smlal r0, r1, r3, r2 bx lr llvm-svn: 226904
* [x86] Change u8imm operands to always print as unsigned. This makes shuffle ↵Craig Topper2015-01-235-0/+15
| | | | | | masks and the like make way more sense. llvm-svn: 226902
* [X86] Add IntrNoMem to the AVX512 conflict intrinsics.Craig Topper2015-01-231-1/+9
| | | | llvm-svn: 226897
* Reformat.NAKAMURA Takumi2015-01-231-3/+2
| | | | llvm-svn: 226888
* MipsAsmParser.cpp: Suppress a warning introduced in r226657. [-Wunused-variable]NAKAMURA Takumi2015-01-231-3/+2
| | | | llvm-svn: 226887
* R600: Try to use lower types for 64bit division if possibleJan Vesely2015-01-222-13/+39
| | | | | | | | v2: add and enable tests for SI Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 226881
* R600: Simplify LowerUDIVREMJan Vesely2015-01-221-19/+11
| | | | | | | | | | | optimizations can handle removing the Hi part operations. The generated code is identical for R600, ~10% icount reduction for SI v2: rebase Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 226879
* [X86][AVX] Added (V)MOVDDUP / (V)MOVSLDUP / (V)MOVSHDUP memory folding + tests.Simon Pilgrim2015-01-221-2/+5
| | | | | | | | Minor tweak now that D7042 is complete, we can enable stack folding for (V)MOVDDUP and do proper testing. Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP. llvm-svn: 226873
* AArch64: decode all MRS/MSR forms early to avoid saving FeatureBits.Tim Northover2015-01-221-42/+35
| | | | | | | | | | | | Currently, we're adding a uint64_t describing the current subtarget so that matching can check whether the specified register is valid. However, we want to move to a bitset for those bits (x86 has more than 64 of them). This can't live in a union so it's probably better to do the checks early (especially as there are only 3 of them). llvm-svn: 226841
* Mark |TLI| variables used to suppress -Wunused-variable warnings.Alexander Potapenko2015-01-221-0/+2
| | | | | | (These vars are only used in assertions) llvm-svn: 226815
* Fixed a bug in type legalizer for masked load/store intrinsics.Elena Demikhovsky2015-01-221-0/+164
| | | | | | | | | | | | The problem occurs when after vectorization we have type <2 x i32>. This type is promoted to <2 x i64> and then requires additional efforts for expanding loads and truncating stores. I added EXPAND / TRUNCATE attributes to the masked load/store SDNodes. The code now contains additional shuffles. I've prepared changes in the cost estimation for masked memory operations, it will be submitted separately. llvm-svn: 226808
* Revert r226798. Guess I missed the patterns.Craig Topper2015-01-221-2/+2
| | | | llvm-svn: 226802
* Use u8imm instead of i32i8imm on a couple instructions that have no patterns ↵Craig Topper2015-01-221-2/+2
| | | | | | and thus no reason to use a larger operand size. llvm-svn: 226798
* [X86] Remove some unused multiclasses from AVX512 instruction file.Craig Topper2015-01-221-101/+0
| | | | llvm-svn: 226797
* ARM: fail less catastrophically on invalid Windows inputSaleem Abdulrasool2015-01-222-5/+15
| | | | | | | | | | | | Windows supports a restricted set of relocations (compared to ARM ELF). In some cases, we may end up generating an unsupported relocation. This can occur with bad input to the assembler in particular (the frontend should never generate code that cannot be compiled). Generate an error rather than just aborting. The change in the API is driven by the desire to provide a slightly more helpful message for debugging purposes. llvm-svn: 226779
* [X86][SSE] Missing SSE/AVX1 memory folding integer instructionsSimon Pilgrim2015-01-211-1/+57
| | | | | | | | | | Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1. The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests. Differential Revision: http://reviews.llvm.org/D7094 llvm-svn: 226745
* [X86][SSE] Added support for SSE3 lane duplication shuffle instructionsSimon Pilgrim2015-01-212-25/+37
| | | | | | | | | | | | This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. The big use of these being that they avoid many single source shuffles from needing to use (pre-AVX) dual source instructions such as SHUFPD/SHUFPS: causing extra moves and preventing load folds. Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify theses instructions as unary shuffles. Also adds a missing tablegen pattern for MOVDDUP. Differential Revision: http://reviews.llvm.org/D7042 llvm-svn: 226716
* Fix load-store optimizer on thumbv4tJonathan Roelofs2015-01-211-3/+14
| | | | | | | | | | Thumbv4t does not have lo->lo copies other than MOVS, and that can't be predicated. So emit MOVS when needed and bail if there's a predicate. http://reviews.llvm.org/D6592 llvm-svn: 226711
* [X86][SSE] movddup shuffle mask decodesSimon Pilgrim2015-01-213-18/+52
| | | | | | Patch to provide shuffle decodes and asm comments for the SSE3/AVX1 movddup double duplication instructions. llvm-svn: 226705
* R600/SI: Custom lower froundMatt Arsenault2015-01-214-24/+117
| | | | | | | | | This fixes it for SI. It also removes the pattern used previously for Evergreen for f32. I'm not sure if the the new R600 output is better or not, but it uses 1 fewer instructions if BFI is available. llvm-svn: 226682
OpenPOWER on IntegriCloud