summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [X86][SSE] Pulled out repeated target shuffle decodes into helper functions. ↵Simon Pilgrim2016-02-071-136/+89
| | | | | | | | | | NFCI. Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions. The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively. llvm-svn: 260032
* AVX512: VPBROADCASTB/W/D/Q from GPR intrinsics implementation.Igor Breger2016-02-073-70/+89
| | | | | | Differential Revision: http://reviews.llvm.org/D16813 llvm-svn: 260024
* [X86][AVX512] Added support for VPMOVZX shuffle decoding.Simon Pilgrim2016-02-061-75/+35
| | | | llvm-svn: 260007
* [X86][SSE] Moved shuffle decode CASE macros earlier. NFC.Simon Pilgrim2016-02-061-48/+48
| | | | | | To allow the helper functions to make use of them. llvm-svn: 259997
* [X86][SSE] Refactored PMOVZX shuffle decoding to use scalar input typesSimon Pilgrim2016-02-063-75/+47
| | | | | | | | First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement. This should also make it easier to decode X86ISD::VZEXT target shuffles in the future. llvm-svn: 259995
* [X86][SSE] Don't replace an existing 32-bit load with its duplicateSimon Pilgrim2016-02-061-1/+2
| | | | | | | | If we are already loading a single 32-bit float/integer then just reuse it. Fix for regression in D16729 llvm-svn: 259991
* Comment fixSimon Pilgrim2016-02-061-1/+1
| | | | llvm-svn: 259990
* [AArch64] Add the scheduling model for Exynos-M1Evandro Menezes2016-02-062-2/+361
| | | | | | | | | | | | | | Summary: Add the core scheduling model for the Samsung Exynos-M1 (ARMv8-A). Reviewers: jmolloy, rengolin, christof, MinSeongKIM, t.p.northover Subscribers: aemerson, rengolin, MatzeB Differential Revision: http://reviews.llvm.org/D16644 llvm-svn: 259958
* [AArch64] Refactoring aarch64-ldst-opt. NCF.Jun Bum Lim2016-02-051-25/+38
| | | | | | | Remove narrow load / store instructions from getMatchingPairOpcode(), and add getMatchingWideOpcode(). llvm-svn: 259914
* AMDGPU: Account for LDS alignmentMatt Arsenault2016-02-051-4/+9
| | | | | | | | | | | | | The current situation isn't great, because the amount of padding requires is determined by the inverse order of the first encountered use. We should eventually somehow sort these to minimize wasted space. Another problem is the alignment of kernel arguments isn't respected. The group_segment_alignment is always emitted as the default 16, and typed arguments with higher alignments or an explicitly set alignment are also ignored. llvm-svn: 259912
* AMDGPU: Preserve alignments on new created globalsMatt Arsenault2016-02-051-2/+10
| | | | | | | Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911
* AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfoTom Stellard2016-02-055-96/+28
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16862 llvm-svn: 259900
* AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructorsTom Stellard2016-02-052-10/+10
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16863 llvm-svn: 259897
* AMDGPU/SI: Correctly initialize SIInsertWaits passTom Stellard2016-02-053-7/+22
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16724 llvm-svn: 259894
* [WebAssembly] Update the select instructions' operand orders to match the spec.Dan Gohman2016-02-052-16/+16
| | | | llvm-svn: 259893
* Fix for PR 26193Nemanja Ivanovic2016-02-051-1/+1
| | | | | | | This is a simple fix for a PowerPC intrinsic that was incorrectly defined (the return type was incorrect). llvm-svn: 259886
* Move classes defined in a cpp file into an anonymous namespace.Benjamin Kramer2016-02-051-0/+2
| | | | | | No functionality change intended. llvm-svn: 259883
* Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)."Renato Golin2016-02-051-76/+21
| | | | | | This reverts commit r259812 as it broke AArch64 self-hosting. llvm-svn: 259881
* Fix for PR 26356Nemanja Ivanovic2016-02-041-5/+4
| | | | | | | | | Using the load immediate only when the immediate (whether signed or unsigned) can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and 0 to 65535 for unsigned. This patch also ensures that we sign-extend under the right conditions. llvm-svn: 259840
* [AArch64] Bound the number of instructions we scan when searching for updates.Chad Rosier2016-02-041-14/+26
| | | | | | | This only impacts the creation of pre-/post-index instructions. The bound was set high enough such that it did not change code generation for SPEC200X. llvm-svn: 259828
* [X86][SSE] Select domain for 32/64-bit partial loads for ↵Simon Pilgrim2016-02-043-33/+41
| | | | | | | | | | EltsFromConsecutiveLoads Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type. This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain..... llvm-svn: 259816
* [AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).Chad Rosier2016-02-041-21/+76
| | | | | | | | | | | | | | | This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769 and r259790. The tramp3d failure was caused by an incorrect refactoring in the patch. Specifically, we weren't always properly clearing the SExtIdx flag. llvm-svn: 259812
* [AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'Silviu Baranga2016-02-041-0/+40
| | | | | | | | | | | | | | | | | | | | | | | During instruction selection, the AArch64 backend can recognise the following pattern and generate an [U|S]MADDL instruction, i.e. a multiply of two 32-bit operands with a 64-bit result: (mul (sext i32), (sext i32)) However, when one of the operands is constant, the sign extension gets folded into the constant in SelectionDAG::getNode(). This means that the instruction selection sees this: (mul (sext i32), i64) ...which doesn't match the pattern. Sign-extension and 64-bit multiply instructions are generated, which are slower than one 32-bit multiply. Add a pattern to match this and generate the correct instruction, for both signed and unsigned multiplies. Patch by Chris Diamand! llvm-svn: 259800
* [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to ↵Simon Pilgrim2016-02-042-17/+44
| | | | | | | | | | EltsFromConsecutiveLoads This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load. Differential Revision: http://reviews.llvm.org/D16729 llvm-svn: 259796
* Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."Chad Rosier2016-02-041-77/+22
| | | | | | This reverts commit r259790. tramp3d-v4 is still having problems. llvm-svn: 259795
* AVX-512: Fixed a bug in FMA instruction selection on KNLElena Demikhovsky2016-02-041-1/+1
| | | | | | | | The FMA instruction was selected from AVX2 set instead of AVX-512 Differential Revision: http://reviews.llvm.org/D16884 llvm-svn: 259792
* [AArch64] Improve load/store optimizer to handle LDUR + LDR.Chad Rosier2016-02-041-22/+77
| | | | | | | | | | | | | | This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769, which was reverted in r246782 due to a test-suite failure. I'm unable to reproduce the issue at this time. llvm-svn: 259790
* [AVX512] add vfmadd132ss and vfmadd132sd IntrinsicMichael Zuckerman2016-02-043-11/+42
| | | | | | Differential Revision: http://reviews.llvm.org/D16589 llvm-svn: 259789
* [X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC.Simon Pilgrim2016-02-041-60/+84
| | | | llvm-svn: 259771
* [X86] Use hash table in LEA optimization pass.Andrey Turetskiy2016-02-041-150/+247
| | | | | | | | Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time. Differential Revision: http://reviews.llvm.org/D16404 llvm-svn: 259770
* [NVPTX] Disable performance optimizations when OptLevel==NoneJingyue Wu2016-02-041-21/+36
| | | | | | | | | | Reviewers: jholewinski, tra, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16874 llvm-svn: 259749
* clean up; NFCSanjay Patel2016-02-031-15/+13
| | | | llvm-svn: 259720
* ARM: support TLS for WoASaleem Abdulrasool2016-02-035-0/+62
| | | | | | | | | | | Add support for TLS access for Windows on ARM. This generates a similar access to MSVC for ARM. The changes to the tablegen data is needed to support loading an external symbol global that is not for a call. The adjustments to the DAG to DAG transforms are needed to preserve the 32-bit move. llvm-svn: 259676
* [ARM] Move GNUEABI divmod to __aeabi_divmod*Renato Golin2016-02-031-2/+4
| | | | | | | | | | The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores which happens to be a lot faster than __divsi3/__modsi3 when the core has hardware divide instructions. Do the same here. Fixes PR26450. llvm-svn: 259657
* [mips] Remove redundant inclusions of MipsAnalyzeImmediate.hDaniel Sanders2016-02-039-8/+1
| | | | llvm-svn: 259655
* Fix for PR 26381Nemanja Ivanovic2016-02-031-1/+1
| | | | | | Simple fix - Constant values were not being sign extended in FastIsel. llvm-svn: 259645
* [mips] Add SHF_MIPS_GPREL flag to the MIPS .sbss and .sdata sectionsSimon Atanasyan2016-02-031-2/+4
| | | | | | | | | | MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL flag. See Figure 4–7 on page 69 in the following document: ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf. Differential Revision: http://reviews.llvm.org/D15740 llvm-svn: 259641
* [X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to ↵Simon Pilgrim2016-02-034-124/+121
| | | | | | | | | | | | EltsFromConsecutiveLoads Follow up to D16217 and D16729 This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD Differential Revision: http://reviews.llvm.org/D16768 llvm-svn: 259635
* Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates.Kyle Butt2016-02-031-19/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms on PPC. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7 ;v6 = v6 + v5 * v7 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96 ;v5 = v5 * v7 + v96 This was broken in the case where the target register was also used as a multiplicand. Fix this case by checking for it and replacing both uses with the copied register. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6 ;v6 = v6 + v5 * v6 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96 ;v5 = v5 * v96 + v96 llvm-svn: 259617
* Revert r259576: Disable the vzeroupper insertion pass on PS4.Yunzhong Gao2016-02-031-3/+0
| | | | | | Will re-implement based on review feedback. llvm-svn: 259615
* Disable the vzeroupper insertion pass on PS4.Yunzhong Gao2016-02-021-0/+3
| | | | | | | | See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation. Original patch by: Sean Silva llvm-svn: 259576
* AMDGPU: Do not promote allocas with non-inbounds GEPsMatt Arsenault2016-02-021-0/+7
| | | | | | | | If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573
* AMDGPU: Handle promoting memmoveMatt Arsenault2016-02-021-0/+24
| | | | | | Also add missing tests for the others. llvm-svn: 259558
* [X86] Fix the merging of SP updates in prologue/epilogue insertions.Quentin Colombet2016-02-021-2/+7
| | | | | | | | | When the merging was involving LEAs, we were taking the wrong immediate from the list of operands. rdar://problem/24446069 llvm-svn: 259553
* AMDGPU: Skip promote alloca with no optimizationsMatt Arsenault2016-02-022-2/+2
| | | | llvm-svn: 259551
* AMDGPU: Minor cleanups for AMDGPUPromoteAllocaMatt Arsenault2016-02-021-27/+21
| | | | | | Mostly convert to use range loops. llvm-svn: 259550
* AMDGPU: Report AMDGPUPromoteAlloca changed the functionMatt Arsenault2016-02-021-22/+21
| | | | llvm-svn: 259547
* AMDGPU: Whitelist handled intrinsicsMatt Arsenault2016-02-021-8/+36
| | | | | | | We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop. llvm-svn: 259546
* AMDGPU: Use inbounds when calculating workitem offsetMatt Arsenault2016-02-021-6/+7
| | | | | | | | | | | | | When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow. Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure. llvm-svn: 259545
* Fix Clang-tidy readability-redundant-control-flow warnings; other minor fixes.Eugene Zelenko2016-02-022-5/+1
| | | | | | Differential revision: http://reviews.llvm.org/D16793 llvm-svn: 259539
OpenPOWER on IntegriCloud