path: root/llvm/lib/Target
Commit log. Each entry lists the commit message, author, date, number of files changed, and lines removed/added.
* [WebAssembly] Enable SSA lowering and other pre-regalloc passes
  Dan Gohman, 2015-09-08, 1 file, -1/+21
  llvm-svn: 247008
* Removed an old comment, NFC
  Elena Demikhovsky, 2015-09-08, 1 file, -2/+0
  llvm-svn: 247006
* [mips][microMIPS] Implement SB, SBE, SCE, SH and SHE instructions
  Zoran Jovanovic, 2015-09-08, 6 files, -2/+109
  Differential Revision: http://reviews.llvm.org/D11801
  llvm-svn: 246999
* [mips] Reserve address spaces 1-255 for software use.
  Daniel Sanders, 2015-09-08, 1 file, -0/+8
  Summary: And define them to have noop casts with address spaces 0-255.
  Reviewers: pekka.jaaskelainen
  Subscribers: pekka.jaaskelainen, llvm-commits
  Differential Revision: http://reviews.llvm.org/D12678
  llvm-svn: 246990
* [mips][microMIPS] Add microMIPS32r6 and microMIPS64r6 tests for existing 16-bit LBU16, LHU16, LW16, LWGP and LWSP instructions
  Zoran Jovanovic, 2015-09-08, 2 files, -2/+4
  Differential Revision: http://reviews.llvm.org/D10956
  llvm-svn: 246987
* Fixed a compilation issue, NFC
  Elena Demikhovsky, 2015-09-08, 1 file, -3/+3
  llvm-svn: 246983
* fixed compilation issue, NFC.
  Elena Demikhovsky, 2015-09-08, 1 file, -3/+3
  llvm-svn: 246982
* AVX-512: Lowering for 512-bit vector shuffles.
  Elena Demikhovsky, 2015-09-08, 4 files, -68/+324
  Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer.
  Differential Revision: http://reviews.llvm.org/D10683
  llvm-svn: 246981
* [mips][microMIPS] Implement ABS.fmt, CEIL.L.fmt, CEIL.W.fmt, FLOOR.L.fmt, FLOOR.W.fmt, TRUNC.L.fmt, TRUNC.W.fmt, RSQRT.fmt and SQRT.fmt instructions
  Zoran Jovanovic, 2015-09-07, 4 files, -12/+155
  Differential Revision: http://reviews.llvm.org/D11674
  llvm-svn: 246968
* [mips][microMIPS] Implement BC16, BEQZC16 and BNEZC16 instructions
  Zoran Jovanovic, 2015-09-07, 6 files, -10/+94
  Differential Revision: http://reviews.llvm.org/D11181
  llvm-svn: 246963
* [ARM] Get rid of SelectT2ShifterOperandReg, NFC
  John Brawn, 2015-09-07, 2 files, -26/+2
  SelectT2ShifterOperandReg has identical behaviour to SelectImmShifterOperand, so get rid of it and use SelectImmShifterOperand instead.
  Differential Revision: http://reviews.llvm.org/D12195
  llvm-svn: 246962
* [mips][microMIPS] Implement CVT.D.fmt, CVT.L.fmt, CVT.S.fmt, CVT.W.fmt, MAX.fmt, MIN.fmt, MAXA.fmt, MINA.fmt and CMP.condn.fmt instructions
  Zoran Jovanovic, 2015-09-07, 4 files, -58/+290
  Differential Revision: http://reviews.llvm.org/D12141
  llvm-svn: 246960
* Prune utf8 chars in comments.
  NAKAMURA Takumi, 2015-09-07, 1 file, -2/+2
  llvm-svn: 246953
* [PowerPC] Don't commute trivial rlwimi instructions
  Hal Finkel, 2015-09-06, 1 file, -0/+5
  To commute a trivial rlwimi instruction (meaning one with a full mask and zero shift), we'd need the ability to form an all-zero mask (instead of an all-one mask) using rlwimi. We can't represent this, however, and we'll miscompile code if we try.
  The code quality problem that this highlights (that SDAG simplification can lead to us generating an ISD::OR node with a constant zero LHS) will be fixed as a follow-up.
  Fixes PR24719.
  llvm-svn: 246937
* [mips][microMIPS] Implement ADD.fmt, SUB.fmt, MOV.fmt, MUL.fmt, DIV.fmt, MADDF.fmt, MSUBF.fmt and NEG.fmt instructions
  Zoran Jovanovic, 2015-09-05, 3 files, -4/+159
  Differential Revision: http://reviews.llvm.org/D11978
  llvm-svn: 246919
* [PowerPC] Fix and(or(x, c1), c2) -> rlwimi generation
  Hal Finkel, 2015-09-05, 1 file, -3/+15
  PPCISelDAGToDAG has a transformation that generates an rlwimi instruction from an input pattern that looks like this:
    and(or(x, c1), c2)
  but the associated logic does not work if there are bits that are 1 in c1 but 0 in c2 (these are normally canonicalized away, but that can't happen if the 'or' has other users). Make sure we abort the transformation if such bits are discovered.
  Fixes PR24704.
  llvm-svn: 246900
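  To make the failure mode concrete, here is a minimal standalone C++ illustration of the condition being guarded against; the constants are invented for the example, and this is not the PPCISelDAGToDAG code itself:

    #include <cstdint>
    #include <cstdio>

    int main() {
      // Bits 8-15 are 1 in c1 but 0 in c2; the outer 'and' clears them, which
      // the rlwimi rewrite does not account for, so the transform must bail.
      uint32_t c1 = 0x0000FF00, c2 = 0x000000FF;
      if (c1 & ~c2)
        std::printf("abort: and(or(x, c1), c2) is not safe to turn into rlwimi\n");
      return 0;
    }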
* [PowerPC] Enable interleaved-access vectorization
  Hal Finkel, 2015-09-04, 2 files, -1/+43
  This adds a basic cost model for interleaved-access vectorization (and a better default for shuffles), and enables interleaved-access vectorization by default.
  The relevant difference from the default cost model for interleaved-access vectorization is that on PPC, the shuffles that end up being used are *much* cheaper than modeling the process with insert/extract pairs (which are quite expensive, especially on older cores).
  llvm-svn: 246824
* [PowerPC] Always use aggressive interleaving on the A2
  Hal Finkel, 2015-09-03, 1 file, -0/+7
  On the A2, with an eye toward QPX unaligned-load merging, we should always use aggressive interleaving. It is generally superior to only using concatenation unrolling.
  llvm-svn: 246819
* [PowerPC] Try harder to find a base+offset when looking for consecutive accesses
  Hal Finkel, 2015-09-03, 1 file, -7/+23
  When forming permutation-based unaligned vector loads, we need to know whether it is valid to read ahead of the requested address by a full vector length. Doing so is more efficient (and allows for more CSE with later loads), but could trigger a page fault if invalid. To determine validity, we look for other loads in the same block that access the relevant address range.
  The relevant point here is that we need to do this as part of the process of forming permutation-based vector loads, and this happens quite early in the SDAG pipeline - specifically before many of the address calculations are fully canonicalized. As a result, we need to try harder to recognize base+offset address computations, because they still might appear as a chain of adds (base+offset+offset, for example). To account for this, we'll look through chains of adds, accumulating the constant offsets.
  llvm-svn: 246813
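  A minimal sketch of the offset-accumulating walk described above; it is not the actual PPCISelLowering code, just the generic SelectionDAG pattern the message alludes to:

    #include "llvm/CodeGen/SelectionDAGNodes.h"
    using namespace llvm;

    // Peel nested ISD::ADD nodes with constant right-hand operands off an
    // address, accumulating the total offset, so that base+off1+off2 is
    // recognized as (base, off1+off2).
    static SDValue peelConstantAdds(SDValue Addr, int64_t &Offset) {
      Offset = 0;
      while (Addr.getOpcode() == ISD::ADD) {
        auto *C = dyn_cast<ConstantSDNode>(Addr.getOperand(1));
        if (!C)
          break;
        Offset += C->getSExtValue();
        Addr = Addr.getOperand(0);
      }
      return Addr; // the remaining non-constant base
    }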
* [PowerPC] Include the permutation cost for unaligned vector loads
  Hal Finkel, 2015-09-03, 1 file, -8/+12
  Pre-P8, when we generate code for unaligned vector loads (for Altivec and QPX types), even when accounting for the combining that takes place for multiple consecutive such loads, there is at least one load instruction and one permutation for each load. Make sure the cost reported reflects the cost of the permutes as well.
  llvm-svn: 246807
* [PowerPC] Compute the MMO offset for an unaligned load with signed arithmetic
  Hal Finkel, 2015-09-03, 1 file, -1/+2
  If you compute the MMO offset using unsigned arithmetic, you end up with a large positive offset instead of a small negative one. In theory, this could cause bad instruction-scheduling decisions later.
  I noticed this by inspection from the debug output, and using that for the regression test is the best I can do right now.
  llvm-svn: 246805
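  Purely illustrative arithmetic (the addresses are made up, and this is not the PowerPC code itself): an access 16 bytes below the original address should be recorded as offset -16, which unsigned arithmetic cannot express:

    #include <cstdint>
    #include <cstdio>

    int main() {
      uint64_t OrigAddr = 0x1000, AdjustedAddr = 0x0FF0;   // reads 16 bytes earlier
      uint64_t BadOffset  = AdjustedAddr - OrigAddr;       // wraps to 0xFFFFFFFFFFFFFFF0
      int64_t  GoodOffset = (int64_t)AdjustedAddr - (int64_t)OrigAddr; // -16
      std::printf("unsigned: %llu  signed: %lld\n",
                  (unsigned long long)BadOffset, (long long)GoodOffset);
      return 0;
    }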
* [AArch64] Improve ISel using across lane addition reduction.
  Chad Rosier, 2015-09-03, 1 file, -0/+99
  In vectorized add reduction code, the final "reduce" step is sub-optimal. This change will combine:
    ext  v1.16b, v0.16b, v0.16b, #8
    add  v0.4s, v1.4s, v0.4s
    dup  v1.4s, v0.s[1]
    add  v0.4s, v1.4s, v0.4s
  into:
    addv s0, v0.4s
  PR21371
  http://reviews.llvm.org/D12325
  Patch by Jun Bum Lim <junbuml@codeaurora.org>!
  llvm-svn: 246790
* Sink COFF.h MC include into .cpp files
  Reid Kleckner, 2015-09-03, 1 file, -0/+1
  This prevents MC clients from getting COFF.h, which conflicts with winnt.h macros. Also a minor IWYU cleanup. Now the only public headers including COFF.h are in Object, and they actually need it.
  llvm-svn: 246784
* Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."Chad Rosier2015-09-031-77/+21
| | | | | | | | This reverts commit r246769. This appears to have broken Multisource/Benchmarks/tramp3d-v4. llvm-svn: 246782
* [x86] enable machine combiner reassociations for scalar 'xor' insts
  Sanjay Patel, 2015-09-03, 1 file, -0/+4
  llvm-svn: 246781
* Check for fastness before merging in DAGCombiner::MergeConsecutiveStores()
  Sanjay Patel, 2015-09-03, 1 file, -1/+4
  Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time we have a merged access candidate. Without this patch, we were generating unaligned 16-byte (SSE) memops for x86 targets where those accesses are slow.
  This change was mentioned in http://reviews.llvm.org/D10662 and http://reviews.llvm.org/D10905 and will help solve PR21711.
  Differential Revision: http://reviews.llvm.org/D12573
  llvm-svn: 246771
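  A rough sketch of the gating check described above; the helper name is invented and the allowsMemoryAccess() parameter list is assumed to match the 2015-era hook, so treat this as an outline rather than the actual DAGCombiner code:

    #include "llvm/Target/TargetLowering.h"
    using namespace llvm;

    // Merge consecutive stores into one wider access only if the target says
    // the wider access is both allowed and fast at this alignment.
    static bool mergedStoreIsProfitable(const TargetLowering &TLI,
                                        LLVMContext &Ctx, const DataLayout &DL,
                                        EVT MergedVT, unsigned AddrSpace,
                                        unsigned Alignment) {
      bool IsFast = false;
      // Note: the exact allowsMemoryAccess() signature has changed across LLVM
      // versions; this follows the form used around 2015.
      return TLI.allowsMemoryAccess(Ctx, DL, MergedVT, AddrSpace, Alignment,
                                    &IsFast) &&
             IsFast;
    }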
* [AArch64] Improve load/store optimizer to handle LDUR + LDR.
  Chad Rosier, 2015-09-03, 1 file, -21/+77
  This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs.
  PR24465
  http://reviews.llvm.org/D12116
  Many thanks to Ahmed and Michael for fixes and code review.
  llvm-svn: 246769
* [AArch64] Reuse MayLoad. NFC.
  Chad Rosier, 2015-09-03, 1 file, -1/+1
  llvm-svn: 246767
* [mips] Added support for the div, divu, ddiv and ddivu macros which use traps and breaks in the integrated assembler.
  Daniel Sanders, 2015-09-03, 5 files, -1/+182
  Summary: Patch by Scott Egerton
  Reviewers: vkalintiris, dsanders
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D11675
  llvm-svn: 246763
* AVX512: Implemented encoding and intrinsics for vplzcntq, vplzcntd, vpconflictq, vpconflictd
  Igor Breger, 2015-09-03, 5 files, -117/+54
  Added tests for intrinsics and encoding.
  Differential Revision: http://reviews.llvm.org/D11931
  llvm-svn: 246750
* [X86] Require 32-byte alignment for 32-byte VMOVNTs.
  Ahmed Bougacha, 2015-09-02, 1 file, -2/+4
  We used to accept (and even test, and generate) 16-byte alignment for 32-byte nontemporal stores, but they require 32-byte alignment, per the SDM. Found by inspection.
  Instead of hardcoding 16 in the patfrag, check for natural alignment. Also fix the autoupgrade and the various tests.
  Also, use explicit -mattr instead of -mcpu: I stared at the output for several minutes wondering why I get 2x movntps for the unaligned case (which is the ideal output, but needs some work: see FIXME), until I remembered corei7-avx implies +slow-unaligned-mem-32.
  llvm-svn: 246733
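  The in-tree check lives in a TableGen PatFrag predicate; the snippet below is only a hedged C++ sketch of what "natural alignment" means for such a store node:

    #include "llvm/CodeGen/SelectionDAGNodes.h"
    using namespace llvm;

    // A nontemporal store is "naturally aligned" when its alignment is at
    // least the store size of its memory type (32 bytes for a 32-byte
    // VMOVNT), rather than a hardcoded 16.
    static bool isNaturallyAlignedNontemporalStore(const SDNode *N) {
      const auto *St = cast<StoreSDNode>(N);
      return St->isNonTemporal() &&
             St->getAlignment() >= St->getMemoryVT().getStoreSize();
    }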
* [X86] Cleanup nontemporal fragments. NFCI.
  Ahmed Bougacha, 2015-09-02, 1 file, -15/+6
  We can chain other fragments to avoid repeating conditions. This also fixes a potential bug (that realistically can't happen), where we would match indexed nontemporal stores for i32/i64.
  llvm-svn: 246719
* [PowerPC] Cleanup cost model for unaligned vector loads/stores
  Hal Finkel, 2015-09-02, 1 file, -22/+37
  I'm adding a regression test to better cover code generation for unaligned vector loads and stores, but there's no functional change to the code generation here.
  There is an improvement to the cost model for unaligned vector loads and stores, mostly for QPX (for which we were not previously accounting for the permutation-based loads), and the cost model implementation is cleaner.
  llvm-svn: 246712
* [AArch64] More consistently separate asm opc and operands with '\t'.
  Ahmed Bougacha, 2015-09-02, 1 file, -30/+30
  Somehow missed these in r246686.
  llvm-svn: 246687
* [AArch64] Consistently separate asm opc and operands with '\t'.
  Ahmed Bougacha, 2015-09-02, 1 file, -17/+17
  Some of the instructions use ' ', which drives OCD-me nuts. Let's put an end to this.
  NFC-ish: hopefully nobody cares about whitespace.
  llvm-svn: 246686
* [PowerPC] Don't always consider P8Altivec-only masks in LowerVECTOR_SHUFFLE
  Hal Finkel, 2015-09-02, 1 file, -6/+8
  LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the TableGen-generated matching code, and it does this by testing the same predicates used by the TableGen files. Unfortunately, when we added new P8Altivec-only predicates, we started universally testing them in LowerVECTOR_SHUFFLE, and if they then matched when targeting a system prior to a P8, we'd end up with a selection failure.
  llvm-svn: 246675
* [x86] fix allowsMisalignedMemoryAccesses() for 8-byte and smaller accesses
  Sanjay Patel, 2015-09-02, 1 file, -5/+13
  This is a continuation of the fix from http://reviews.llvm.org/D10662 and the discussion in http://reviews.llvm.org/D12154.
  Here, we distinguish slow unaligned SSE (128-bit) accesses from slow unaligned scalar (64-bit and under) accesses. Other lowering (e.g., getOptimalMemOpType) assumes that unaligned scalar accesses are always ok, so this changes allowsMisalignedMemoryAccesses() to match that behavior.
  Differential Revision: http://reviews.llvm.org/D12543
  llvm-svn: 246658
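  A simplified sketch of the size-based distinction described above; it is not the actual X86ISelLowering override, and the flag name merely mirrors the "slow-unaligned-mem-16" attribute mentioned elsewhere in this log:

    // Decide whether a misaligned access of the given bit width should be
    // reported as fast on an x86-like target.
    static bool isMisalignedAccessFast(unsigned SizeInBits,
                                       bool SlowUnalignedMem16) {
      if (SizeInBits <= 64)
        return true;                 // scalar accesses: treated as always OK
      if (SizeInBits == 128)
        return !SlowUnalignedMem16;  // SSE accesses: gated by the subtarget flag
      return false;                  // wider (AVX) accesses: handled separately
    }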
* [X86][AVX512VLBW] add support for byte shift and SAD
  Asaf Badouh, 2015-09-02, 4 files, -7/+83
  Add byte shift left/right.
  Add SAD - compute sum of absolute differences.
  Differential Revision: http://reviews.llvm.org/D12479
  llvm-svn: 246654
* AVX512: Implemented encoding and intrinsics for VGETMANTPD/S, VGETMANTSD/S instructions
  Igor Breger, 2015-09-02, 5 files, -17/+63
  Added tests for intrinsics and encoding.
  Differential Revision: http://reviews.llvm.org/D11593
  llvm-svn: 246642
* AVX512: Implemented encoding and intrinsics for vshufps/d.
  Igor Breger, 2015-09-02, 3 files, -44/+36
  Added tests for intrinsics and encoding.
  Differential Revision: http://reviews.llvm.org/D11709
  llvm-svn: 246640
* AVX-512: store <4 x i1> and <2 x i1> values in memory
  Elena Demikhovsky, 2015-09-02, 1 file, -0/+5
  Enabled DAG pattern lowering for SKX with DQI predicate.
  Differential Revision: http://reviews.llvm.org/D12550
  llvm-svn: 246625
* [CodeGen] Fix FREM on 32-bit MSVC on x86
  Vedant Kumar, 2015-09-02, 1 file, -1/+11
  Patch by Dylan McKay!
  Differential Revision: http://reviews.llvm.org/D12099
  llvm-svn: 246615
* [ARM] Don't abort on variable-idx extractelt in ReconstructShuffle.
  Ahmed Bougacha, 2015-09-01, 1 file, -0/+4
  The code introduced in r244314 assumed that EXTRACT_VECTOR_ELT only takes constant indices, but it does accept variables. Bail out for those: we can't use them, as the shuffles we want to reconstruct do require constant masks.
  llvm-svn: 246594
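  A hedged sketch of that bail-out condition (not the actual ARMISelLowering code): the reconstruction can only use an extract whose index is constant, because the shuffle mask it builds must itself be constant.

    #include "llvm/CodeGen/SelectionDAGNodes.h"
    using namespace llvm;

    // True only when the EXTRACT_VECTOR_ELT has a constant index and can
    // therefore contribute a lane to a constant shuffle mask.
    static bool extractUsableForShuffle(SDValue Extract) {
      return Extract.getOpcode() == ISD::EXTRACT_VECTOR_ELT &&
             isa<ConstantSDNode>(Extract.getOperand(1));
    }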
* rename "slow-unaligned-mem-under-32" to slow-unaligned-mem-16" (NFCI)Sanjay Patel2015-09-015-53/+59
| | | | | | | | | | | | | | | This is a follow-on suggested by: http://reviews.llvm.org/D12154 ( http://reviews.llvm.org/rL245729 ) http://reviews.llvm.org/D10662 ( http://reviews.llvm.org/rL245075 ) This makes the attribute name match most of the existing lowering logic and regression test expectations. But the current use of this attribute is inconsistent; see the FIXME comment for "allowsMisalignedMemoryAccesses()". That change will result in functional changes and should be coming soon. llvm-svn: 246585
* [AArch64] Lower READCYCLECOUNTER using MRS PMCCNTR_EL0.
  Ahmed Bougacha, 2015-09-01, 5 files, -6/+25
  This matches the ARM behavior. In both cases, the register is part of the optional Performance Monitors extension, so add the feature, and enable it for the A-class processors we support.
  Differential Revision: http://reviews.llvm.org/D12425
  llvm-svn: 246555
* AVX512: Implemented intrinsics for valign.
  Igor Breger, 2015-09-01, 1 file, -0/+8
  Differential Revision: http://reviews.llvm.org/D12526
  llvm-svn: 246551
* [AArch64] Turn on by default interleaved access vectorization
  Silviu Baranga, 2015-09-01, 1 file, -0/+2
  Summary: This change turns on by default interleaved access vectorization for AArch64. We also clean up some tests which were specifically enabling this behaviour.
  Reviewers: rengolin
  Subscribers: aemerson, llvm-commits, rengolin
  Differential Revision: http://reviews.llvm.org/D12149
  llvm-svn: 246542
* [ARM] Turn on by default interleaved access vectorization
  Silviu Baranga, 2015-09-01, 1 file, -0/+2
  Summary: This change turns on by default interleaved access vectorization on ARM, as it has been shown to be beneficial on ARM.
  Reviewers: rengolin
  Subscribers: aemerson, llvm-commits, rengolin
  Differential Revision: http://reviews.llvm.org/D12146
  llvm-svn: 246541
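  For both this commit and the AArch64 one above, the switch amounts to overriding a single TargetTransformInfo hook; the sketch below assumes the 2015-era ARM target layout (the class and header names are not quoted from the patch):

    #include "ARMTargetTransformInfo.h"
    using namespace llvm;

    // Tell the loop vectorizer that interleaved (strided) accesses are worth
    // forming on this target, so it uses the interleaved memory-op cost model
    // instead of scalarizing the group members.
    bool ARMTTIImpl::enableInterleavedAccessVectorization() { return true; }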
* AMDGPU: Fix adding redundant implicit operands
  Matt Arsenault, 2015-09-01, 1 file, -11/+7
  These are already added during the MachineInstr construction, so this was adding the implicit registers twice.
  llvm-svn: 246525
* WebAssembly: generate load/store
  JF Bastien, 2015-08-31, 3 files, -49/+113
  Summary: This handles all load/store operations that WebAssembly defines, and handles those necessary for C++ such as i1. I left a FIXME for outstanding features which aren't required for now.
  Reviewers: sunfish
  Subscribers: jfb, llvm-commits, dschuff
  llvm-svn: 246500