summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [WebAssembly] Update the br_if instructions' operand orders to match the spec.Dan Gohman2016-02-083-18/+18
| | | | llvm-svn: 260152
* [x86] convert masked store of one element to scalar storeSanjay Patel2016-02-081-2/+75
| | | | | | | | | | | Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set' transform in InstCombine, but this should be a win for any AVX machine. Code comments note that this transform could be extended for other targets / cases. Differential Revision: http://reviews.llvm.org/D16828 llvm-svn: 260145
* AMDGPU/SI: Implement a work-around for smrd corrupting vccz bitTom Stellard2016-02-081-1/+55
| | | | | | | | | | | | | | Summary: We will hit this once we have enabled uniform branches. The smrd-vccz-bug.ll test will be added with the uniform branch commit. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16725 llvm-svn: 260137
* [X86] Don't zero/sign-extend i1, i8, or i16 return values to 32 bits (PR22532)Hans Wennborg2016-02-082-10/+9
| | | | | | | | | | | | | | | | | | | | This matches GCC and MSVC's behaviour, and saves on code size. We were already not extending i1 return values on x86_64 after r127766. This takes that patch further by applying it to x86 target as well, and also for i8 and i16. The ABI docs have been unclear about the required behaviour here. The new i386 psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return vales do not need to be extended beyond 8 bits. The x86_64 ABI doc is being updated to say the same [2]. Differential Revision: http://reviews.llvm.org/D16907 [1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf [2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ llvm-svn: 260133
* AArch64: match correct order in subtraction pattern.Tim Northover2016-02-081-4/+4
| | | | | | | The accumulator in multiply-and-subtract instructions is actually subtracted *from* so these patterns were computing the wrong value. llvm-svn: 260131
* AMDGPU: Remove bfi and bfm intrinsicsMatt Arsenault2016-02-082-13/+0
| | | | | | Nothing is using them. llvm-svn: 260123
* [AVX512][PROLQ][PROLD] Change imm8 to intMichael Zuckerman2016-02-081-6/+6
| | | | | | Differential Revision: http://reviews.llvm.org/D16983 llvm-svn: 260101
* [WebAssembly] Add another optimization idea to README.txt.Dan Gohman2016-02-081-0/+5
| | | | llvm-svn: 260070
* [X86] Change FeatureIFMA string to 'avx512ifma'. Matches gcc and fixes PR26461.Craig Topper2016-02-081-1/+1
| | | | llvm-svn: 260069
* [X86][SSE] Resolve target shuffle inputs to sentinels to permit more combinesSimon Pilgrim2016-02-071-39/+107
| | | | | | | | | | | | The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input. This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle. Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if its not a target shuffle. Differential Revision: http://reviews.llvm.org/D16683 llvm-svn: 260063
* [X86][SSE] Added support for MOVHPD/MOVLPD + MOVHPS/MOVLPS shuffle decoding.Simon Pilgrim2016-02-073-0/+48
| | | | llvm-svn: 260034
* [X86][AVX512] add intrinsics of Scalar FP to integer conversion with ↵Asaf Badouh2016-02-075-32/+70
| | | | | | | | rounding mode Differential Revision: http://reviews.llvm.org/D16629 llvm-svn: 260033
* [X86][SSE] Pulled out repeated target shuffle decodes into helper functions. ↵Simon Pilgrim2016-02-071-136/+89
| | | | | | | | | | NFCI. Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions. The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively. llvm-svn: 260032
* AVX512: VPBROADCASTB/W/D/Q from GPR intrinsics implementation.Igor Breger2016-02-073-70/+89
| | | | | | Differential Revision: http://reviews.llvm.org/D16813 llvm-svn: 260024
* [X86][AVX512] Added support for VPMOVZX shuffle decoding.Simon Pilgrim2016-02-061-75/+35
| | | | llvm-svn: 260007
* [X86][SSE] Moved shuffle decode CASE macros earlier. NFC.Simon Pilgrim2016-02-061-48/+48
| | | | | | To allow the helper functions to make use of them. llvm-svn: 259997
* [X86][SSE] Refactored PMOVZX shuffle decoding to use scalar input typesSimon Pilgrim2016-02-063-75/+47
| | | | | | | | First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement. This should also make it easier to decode X86ISD::VZEXT target shuffles in the future. llvm-svn: 259995
* [X86][SSE] Don't replace an existing 32-bit load with its duplicateSimon Pilgrim2016-02-061-1/+2
| | | | | | | | If we are already loading a single 32-bit float/integer then just reuse it. Fix for regression in D16729 llvm-svn: 259991
* Comment fixSimon Pilgrim2016-02-061-1/+1
| | | | llvm-svn: 259990
* [AArch64] Add the scheduling model for Exynos-M1Evandro Menezes2016-02-062-2/+361
| | | | | | | | | | | | | | Summary: Add the core scheduling model for the Samsung Exynos-M1 (ARMv8-A). Reviewers: jmolloy, rengolin, christof, MinSeongKIM, t.p.northover Subscribers: aemerson, rengolin, MatzeB Differential Revision: http://reviews.llvm.org/D16644 llvm-svn: 259958
* [AArch64] Refactoring aarch64-ldst-opt. NCF.Jun Bum Lim2016-02-051-25/+38
| | | | | | | Remove narrow load / store instructions from getMatchingPairOpcode(), and add getMatchingWideOpcode(). llvm-svn: 259914
* AMDGPU: Account for LDS alignmentMatt Arsenault2016-02-051-4/+9
| | | | | | | | | | | | | The current situation isn't great, because the amount of padding requires is determined by the inverse order of the first encountered use. We should eventually somehow sort these to minimize wasted space. Another problem is the alignment of kernel arguments isn't respected. The group_segment_alignment is always emitted as the default 16, and typed arguments with higher alignments or an explicitly set alignment are also ignored. llvm-svn: 259912
* AMDGPU: Preserve alignments on new created globalsMatt Arsenault2016-02-051-2/+10
| | | | | | | Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911
* AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfoTom Stellard2016-02-055-96/+28
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16862 llvm-svn: 259900
* AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructorsTom Stellard2016-02-052-10/+10
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16863 llvm-svn: 259897
* AMDGPU/SI: Correctly initialize SIInsertWaits passTom Stellard2016-02-053-7/+22
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16724 llvm-svn: 259894
* [WebAssembly] Update the select instructions' operand orders to match the spec.Dan Gohman2016-02-052-16/+16
| | | | llvm-svn: 259893
* Fix for PR 26193Nemanja Ivanovic2016-02-051-1/+1
| | | | | | | This is a simple fix for a PowerPC intrinsic that was incorrectly defined (the return type was incorrect). llvm-svn: 259886
* Move classes defined in a cpp file into an anonymous namespace.Benjamin Kramer2016-02-051-0/+2
| | | | | | No functionality change intended. llvm-svn: 259883
* Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)."Renato Golin2016-02-051-76/+21
| | | | | | This reverts commit r259812 as it broke AArch64 self-hosting. llvm-svn: 259881
* Fix for PR 26356Nemanja Ivanovic2016-02-041-5/+4
| | | | | | | | | Using the load immediate only when the immediate (whether signed or unsigned) can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and 0 to 65535 for unsigned. This patch also ensures that we sign-extend under the right conditions. llvm-svn: 259840
* [AArch64] Bound the number of instructions we scan when searching for updates.Chad Rosier2016-02-041-14/+26
| | | | | | | This only impacts the creation of pre-/post-index instructions. The bound was set high enough such that it did not change code generation for SPEC200X. llvm-svn: 259828
* [X86][SSE] Select domain for 32/64-bit partial loads for ↵Simon Pilgrim2016-02-043-33/+41
| | | | | | | | | | EltsFromConsecutiveLoads Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type. This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain..... llvm-svn: 259816
* [AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).Chad Rosier2016-02-041-21/+76
| | | | | | | | | | | | | | | This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769 and r259790. The tramp3d failure was caused by an incorrect refactoring in the patch. Specifically, we weren't always properly clearing the SExtIdx flag. llvm-svn: 259812
* [AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'Silviu Baranga2016-02-041-0/+40
| | | | | | | | | | | | | | | | | | | | | | | During instruction selection, the AArch64 backend can recognise the following pattern and generate an [U|S]MADDL instruction, i.e. a multiply of two 32-bit operands with a 64-bit result: (mul (sext i32), (sext i32)) However, when one of the operands is constant, the sign extension gets folded into the constant in SelectionDAG::getNode(). This means that the instruction selection sees this: (mul (sext i32), i64) ...which doesn't match the pattern. Sign-extension and 64-bit multiply instructions are generated, which are slower than one 32-bit multiply. Add a pattern to match this and generate the correct instruction, for both signed and unsigned multiplies. Patch by Chris Diamand! llvm-svn: 259800
* [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to ↵Simon Pilgrim2016-02-042-17/+44
| | | | | | | | | | EltsFromConsecutiveLoads This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load. Differential Revision: http://reviews.llvm.org/D16729 llvm-svn: 259796
* Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."Chad Rosier2016-02-041-77/+22
| | | | | | This reverts commit r259790. tramp3d-v4 is still having problems. llvm-svn: 259795
* AVX-512: Fixed a bug in FMA instruction selection on KNLElena Demikhovsky2016-02-041-1/+1
| | | | | | | | The FMA instruction was selected from AVX2 set instead of AVX-512 Differential Revision: http://reviews.llvm.org/D16884 llvm-svn: 259792
* [AArch64] Improve load/store optimizer to handle LDUR + LDR.Chad Rosier2016-02-041-22/+77
| | | | | | | | | | | | | | This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769, which was reverted in r246782 due to a test-suite failure. I'm unable to reproduce the issue at this time. llvm-svn: 259790
* [AVX512] add vfmadd132ss and vfmadd132sd IntrinsicMichael Zuckerman2016-02-043-11/+42
| | | | | | Differential Revision: http://reviews.llvm.org/D16589 llvm-svn: 259789
* [X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC.Simon Pilgrim2016-02-041-60/+84
| | | | llvm-svn: 259771
* [X86] Use hash table in LEA optimization pass.Andrey Turetskiy2016-02-041-150/+247
| | | | | | | | Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time. Differential Revision: http://reviews.llvm.org/D16404 llvm-svn: 259770
* [NVPTX] Disable performance optimizations when OptLevel==NoneJingyue Wu2016-02-041-21/+36
| | | | | | | | | | Reviewers: jholewinski, tra, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16874 llvm-svn: 259749
* clean up; NFCSanjay Patel2016-02-031-15/+13
| | | | llvm-svn: 259720
* ARM: support TLS for WoASaleem Abdulrasool2016-02-035-0/+62
| | | | | | | | | | | Add support for TLS access for Windows on ARM. This generates a similar access to MSVC for ARM. The changes to the tablegen data is needed to support loading an external symbol global that is not for a call. The adjustments to the DAG to DAG transforms are needed to preserve the 32-bit move. llvm-svn: 259676
* [ARM] Move GNUEABI divmod to __aeabi_divmod*Renato Golin2016-02-031-2/+4
| | | | | | | | | | The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores which happens to be a lot faster than __divsi3/__modsi3 when the core has hardware divide instructions. Do the same here. Fixes PR26450. llvm-svn: 259657
* [mips] Remove redundant inclusions of MipsAnalyzeImmediate.hDaniel Sanders2016-02-039-8/+1
| | | | llvm-svn: 259655
* Fix for PR 26381Nemanja Ivanovic2016-02-031-1/+1
| | | | | | Simple fix - Constant values were not being sign extended in FastIsel. llvm-svn: 259645
* [mips] Add SHF_MIPS_GPREL flag to the MIPS .sbss and .sdata sectionsSimon Atanasyan2016-02-031-2/+4
| | | | | | | | | | MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL flag. See Figure 4–7 on page 69 in the following document: ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf. Differential Revision: http://reviews.llvm.org/D15740 llvm-svn: 259641
* [X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to ↵Simon Pilgrim2016-02-034-124/+121
| | | | | | | | | | | | EltsFromConsecutiveLoads Follow up to D16217 and D16729 This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD Differential Revision: http://reviews.llvm.org/D16768 llvm-svn: 259635
OpenPOWER on IntegriCloud