path: root/llvm/lib/Target
Commit message, Author, Age, Files, Lines
* Cleanup function with clang-format. NFCI.Simon Pilgrim2016-11-181-3/+1
| | | | llvm-svn: 287340
* AMDGPU: Fix legalization of MUBUF instructions in shadersNicolai Haehnle2016-11-181-5/+13
| | | | | | | | | | | | | | | | | | | | | | Summary: The addr64-based legalization is incorrect for MUBUF instructions with idxen set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects e.g. shaders that access buffer textures. Since we never actually need the addr64-legalization in shaders, this patch takes the easy route and keys off the calling convention. If this ever affects (non-OpenGL) compute, the type of legalization needs to be chosen based on some TSFlag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664 Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26747 llvm-svn: 287339
* Fix spelling mistakes in MIPS target comments. NFC.Simon Pilgrim2016-11-183-4/+4
| | | | | | Identified by Pedro Giffuni in PR27636. llvm-svn: 287338
* [Power9] Add patterns for vnegd, vnegwEhsan Amiri2016-11-181-2/+7
| | | | | | | Exploit new instructions by adding patterns to .td file. https://reviews.llvm.org/D26551 llvm-svn: 287334
* Fix spelling mistakes in AMDGPU target comments. NFC.Simon Pilgrim2016-11-185-11/+11
| | | | | | Identified by Pedro Giffuni in PR27636. llvm-svn: 287333
* Fix typo in comment. NFC.Simon Pilgrim2016-11-181-1/+1
| | | | | | Identified by Pedro Giffuni in PR27636. llvm-svn: 287331
* [PPC][DAGCombine] Convert SETCC to subtract when the result is zero extendedEhsan Amiri2016-11-182-1/+88
| | | | | | | | | | | | | | | | | When we see a SETCC whose only users are zero extend operations, we can replace it with a subtraction. This results in doing all calculations in GPRs and avoids CR use. Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are ways that this can be extended. For example for signed condition codes. In that case we will be introducing additional sign extend instructions, so more careful profitability analysis may be required. Another direction to extend this is for equal, not equal conditions. Also when users of SETCC are any_ext or sign_ext, we might be able to do something similar. llvm-svn: 287329
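The identity behind the SETCC-to-subtract combine can be sketched in plain C++. This is only an illustration of the arithmetic, assuming 32-bit operands whose comparison result is zero-extended to 64 bits; the function names are hypothetical and the actual PowerPC instruction sequence chosen by the combine may differ.

```cpp
#include <cstdint>

// Reference form: compare, then zero-extend the boolean result.
inline uint64_t zextUltViaCompare(uint32_t a, uint32_t b) {
  return static_cast<uint64_t>(a < b);
}

// Subtract form: widen both operands, subtract, and take the sign bit.
// If a < b the 64-bit difference wraps and bit 63 is set; otherwise the
// difference fits in 32 bits and bit 63 is clear.
inline uint64_t zextUltViaSubtract(uint32_t a, uint32_t b) {
  uint64_t diff = static_cast<uint64_t>(a) - static_cast<uint64_t>(b);
  return diff >> 63;
}

// a <= b is the negation of b < a, so swap the operands and flip the bit.
inline uint64_t zextUleViaSubtract(uint32_t a, uint32_t b) {
  return zextUltViaSubtract(b, a) ^ 1u;
}
```

Everything stays in general-purpose integer arithmetic, which is the point the commit message makes about avoiding CR use.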
* [AVX-512] Replace masked 16-bit element variable shift intrinsics with new unmasked versions and selects.Craig Topper2016-11-181-9/+9
| | | | | | | | | | The same thing was done to 32-bit and 64-bit element sizes previously. This will allow us to support these shifts in InstCombineCalls along with the other variable shift intrinsics. llvm-svn: 287312
* AMDGPU: Move redundant setting of inst propertiesMatt Arsenault2016-11-181-3/+1
| | | | llvm-svn: 287311
* AMDGPU: Fix crash on illegal type for inlineasmMatt Arsenault2016-11-181-0/+2
| | | | | | | There are still crashes on non-MVT types in other places. llvm-svn: 287310
* convert bpf assembler to look like kernel verifier outputAlexei Starovoitov2016-11-183-57/+69
| | | | | | | | | | since bpf instruction set was introduced people learned to read and understand kernel verifier output whereas llvm asm output stayed obscure and unknown. Convert llvm to emit assembler text similar to kernel to avoid this discrepancy Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 287300
* [AVX-512] Support FCOPYSIGN for v16f32 and v8f64Craig Topper2016-11-181-1/+2
| | | | | | | | | | | | | | | Summary: This extends FCOPYSIGN support to 512-bit vectors. I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads. Reviewers: delena, zvi, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26791 llvm-svn: 287298
* Fix spelling mistakes in Hexagon target comments. NFC.Simon Pilgrim2016-11-179-12/+12
| | | | | | Identified by Pedro Giffuni in PR27636. llvm-svn: 287248
* Fix spelling mistakes in X86 target comments. NFC.Simon Pilgrim2016-11-173-5/+5
| | | | | | Identified by Pedro Giffuni in PR27636. llvm-svn: 287247
* Revert "AMDGPU: Enable ConstrainCopy DAG mutation"Konstantin Zhuravlyov2016-11-171-3/+0
| | | | | | | | This reverts commit r287146. This breaks a few conformance tests. llvm-svn: 287233
* Wdocumentation fixSimon Pilgrim2016-11-171-5/+5
| | | | llvm-svn: 287224
* [X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halvesSimon Pilgrim2016-11-171-19/+43
| | | | | | | | | | | | | | vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves. If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer). This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases). Partial fix for PR30845 Differential Revision: https://reviews.llvm.org/D26590 llvm-svn: 287223
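The three-multiply decomposition mentioned in r287223 can be modelled on a single 64-bit lane. The sketch below is a scalar illustration only: `mulu32` stands in for one lane of vpmuludq, the names are hypothetical, and the runtime zero checks merely mimic what the real lowering decides at compile time from known-zero bits.

```cpp
#include <cstdint>

// Scalar stand-in for one lane of vpmuludq: 32x32 -> 64-bit unsigned multiply.
inline uint64_t mulu32(uint32_t x, uint32_t y) {
  return static_cast<uint64_t>(x) * y;
}

// 64-bit multiply (modulo 2^64) built from up to three 32x32 multiplies.
// The hi*hi term is shifted out entirely, so it is never needed.
inline uint64_t mul64ViaHalves(uint64_t a, uint64_t b) {
  uint32_t aLo = static_cast<uint32_t>(a);
  uint32_t aHi = static_cast<uint32_t>(a >> 32);
  uint32_t bLo = static_cast<uint32_t>(b);
  uint32_t bHi = static_cast<uint32_t>(b >> 32);

  uint64_t result = mulu32(aLo, bLo);
  // A cross term whose upper half is zero contributes nothing and can be
  // dropped, which is the saving the commit describes.
  if (bHi != 0)
    result += mulu32(aLo, bHi) << 32;
  if (aHi != 0)
    result += mulu32(aHi, bLo) << 32;
  return result;
}
```

When both inputs are zero-extended 32-bit values, only the first multiply remains.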
* [ARM] Relax restriction on variadic functions for tailcall optimizationPablo Barrio2016-11-171-5/+0
| | | | | | | | | | | | | | Summary: Variadic functions can be treated in the same way as normal functions with respect to the number and types of parameters. Reviewers: grosbach, olista01, t.p.northover, rengolin Subscribers: javed.absar, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D26748 llvm-svn: 287219
* [X86] RegCall - Handling v64i1 in 32/64 bit targetOren Ben Simhon2016-11-175-91/+357
| | | | | | | | | | Register Calling Convention defines a new behavior for v64i1 types. This type should be saved in GPR. However for 32 bit machine we need to split the value into 2 GPRs (because each is 32 bit). Differential Revision: https://reviews.llvm.org/D26181 llvm-svn: 287217
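As a rough illustration of the 32-bit case in r287217, a 64-bit mask value can be modelled as two 32-bit pieces. The struct and function names below are hypothetical, and the choice of which half lands in which register is an assumption made for the sketch, not something stated in the commit.

```cpp
#include <cstdint>

// Two 32-bit general-purpose registers holding one v64i1 value.
struct GprPair {
  uint32_t first;   // assumed to hold mask bits 0..31
  uint32_t second;  // assumed to hold mask bits 32..63
};

// Split a 64-bit mask into two 32-bit halves.
inline GprPair splitMask64(uint64_t mask) {
  return {static_cast<uint32_t>(mask), static_cast<uint32_t>(mask >> 32)};
}

// Rebuild the 64-bit mask from the two halves.
inline uint64_t joinMask64(GprPair p) {
  return (static_cast<uint64_t>(p.second) << 32) | p.first;
}
```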
* [X86] Fix formatting. NFCCraig Topper2016-11-171-2/+2
| | | | llvm-svn: 287211
* [XRay] Support AArch64 in LLVMDean Michael Berris2016-11-172-1/+125
| | | | | | | | | | | | | | | | | | This patch adds XRay support in LLVM for AArch64 targets. This patch is one of a series: Clang: https://reviews.llvm.org/D26415 compiler-rt: https://reviews.llvm.org/D26413 Author: rSerge Reviewers: rengolin, dberris Subscribers: amehsan, aemerson, llvm-commits, iid_iunknown Differential Revision: https://reviews.llvm.org/D26412 llvm-svn: 287209
* [CMake] NFC. Updating CMake dependency specificationsChris Bieneman2016-11-173-6/+9
| | | | | | This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system. llvm-svn: 287206
* [AMDGPU] Custom lower f16 = fp_round f64Konstantin Zhuravlyov2016-11-172-0/+23
| | | | llvm-svn: 287203
* [AMDGPU] Promote f16/i16 conversions to f32/i32Konstantin Zhuravlyov2016-11-172-58/+8
| | | | llvm-svn: 287201
* [AMDGPU] Expand `br_cc` for f16Konstantin Zhuravlyov2016-11-171-0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D26732 llvm-svn: 287199
* [AVR] Wrap all methods in the pseudo expansion pass in an anon namespaceDylan McKay2016-11-161-2/+2
| | | | | | | The '-fpermissive' compiler flag complains if the template specializations used in the class are used in a different namespace. llvm-svn: 287176
* [AVR] Remove unused method from AVRTargetMachineDylan McKay2016-11-161-3/+0
| | | | llvm-svn: 287173
* [x86] allow FP-logic ops when one operand is FP and result is FPSanjay Patel2016-11-161-14/+26
| | | | | | | | | | | | | | We save an inter-register file move this way. If there's any CPU where the FP logic is slower, we could transform this back to int-logic in MachineCombiner. This helps, but doesn't solve, PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137 The 'andn' test shows that we're missing a pattern match to recognize the xor with -1 constant as a 'not' op. llvm-svn: 287171
* [AVR] Add the pseudo instruction expansion passDylan McKay2016-11-163-1/+1433
| | | | | | | | | | | | | | | | | | Summary: A lot of the pseudo instructions are required because LLVM assumes that all integers of the same size as the pointer size are legal. This means that it will not currently expand 16-bit instructions to their 8-bit variants because it thinks 16-bit types are legal for the operations. This also adds all of the CodeGen tests that required the pass to run. Reviewers: arsenm, kparzysz Subscribers: wdng, mgorny, modocache, llvm-commits Differential Revision: https://reviews.llvm.org/D26577 llvm-svn: 287162
* X86: Simplify X86ISD::Wrapper operand checks. NFCI.Peter Collingbourne2016-11-162-18/+8
| | | | | | | | | | | | | We only ever create TargetConstantPool, TargetJumpTable, TargetExternalSymbol, TargetGlobalAddress, TargetGlobalTLSAddress, MCSymbol and TargetBlockAddress nodes as operands of X86ISD::Wrapper nodes, so we can remove one check and invert the other. Also update the documentation comment for X86ISD::Wrapper. Differential Revision: https://reviews.llvm.org/D26731 llvm-svn: 287160
* ARM: fix CodeGen for 64-bit shifts.Tim Northover2016-11-161-17/+31
| | | | | | | | | One half of the shifts obviously needed conditional selection based on whether the shift amount is more than 32-bits, but leaving the other half as the natural shift isn't acceptable either: it's undefined behaviour to shift a 32-bit value by more than 31. llvm-svn: 287149
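The problem described in r287149 can be seen in a C++ model of a 64-bit left shift assembled from 32-bit halves. This is a sketch under the assumption of a shift amount below 64, with hypothetical names; it does not reproduce the ARM code LLVM emits. Both result halves depend on whether the amount reaches 32, and every 32-bit shift is kept within 0..31.

```cpp
#include <cassert>
#include <cstdint>

struct Halves {
  uint32_t lo;
  uint32_t hi;
};

inline Halves split64(uint64_t v) {
  return {static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

inline uint64_t join64(Halves h) {
  return (static_cast<uint64_t>(h.hi) << 32) | h.lo;
}

// 64-bit left shift using only 32-bit shifts by amounts in 0..31.
inline Halves shl64(Halves v, unsigned amt) {
  assert(amt < 64 && "sketch only models shift amounts below 64");
  if (amt == 0)
    return v;  // avoids the out-of-range shift by 32 - 0 below
  if (amt < 32)
    return {v.lo << amt, (v.hi << amt) | (v.lo >> (32 - amt))};
  // amt >= 32: the low half is selected to zero instead of being shifted
  // by an out-of-range amount, and the high half comes from the low input.
  return {0u, v.lo << (amt - 32)};
}
```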
* AMDGPU: Enable ConstrainCopy DAG mutationMatt Arsenault2016-11-161-0/+3
| | | | | | | This fixes a probably unintended divergence from the default scheduler behavior. llvm-svn: 287146
* [AArch64] Handle vector types in replaceZeroVectorStore.Geoff Berry2016-11-161-20/+22
| | | | | | | | | | | | | | | | | | | Summary: Extend replaceZeroVectorStore to handle more vector type stores, floating point zero vectors and set alignment more accurately on split stores. This is a follow-up change to r286875. This change fixes PR31038. Reviewers: MatzeB Subscribers: mcrosier, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D26682 llvm-svn: 287142
* AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies passTom Stellard2016-11-164-26/+78
| | | | | | | | | | | | | | | | | | | | | | Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies of registers with immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 llvm-svn: 287131
* [x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytesSanjay Patel2016-11-161-0/+8
| | | | | | | | | | | | | | | | | | | We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of compilers, but logically equivalent int, float, and double variants of bitwise-logic instructions are reality in x86, and the float variant may be a shorter instruction depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all the time. This is a preliminary step towards solving PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137 Differential Revision: https://reviews.llvm.org/D26712 llvm-svn: 287122
* [X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IRSimon Pilgrim2016-11-162-18/+15
| | | | | | | | | | | | Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen. LLVM counterpart to D26686 Differential Revision: https://reviews.llvm.org/D26736 llvm-svn: 287108
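The "lossless" claim in r287108 follows from f64 having a 53-bit significand, so every 32-bit integer is exactly representable. A small C++ sketch of that property is shown below, using ordinary casts as a stand-in for SINT_TO_FP and UINT_TO_FP; the helper names are hypothetical.

```cpp
#include <cstdint>

// Signed 32-bit -> double -> signed 32-bit round trip.
// The conversion back is well defined because the double holds the exact
// original value, which is in range for int32_t.
inline bool roundTripsSigned(int32_t x) {
  double d = static_cast<double>(x);
  return static_cast<int32_t>(d) == x;
}

// Unsigned 32-bit -> double -> unsigned 32-bit round trip.
inline bool roundTripsUnsigned(uint32_t x) {
  double d = static_cast<double>(x);
  return static_cast<uint32_t>(d) == x;
}
```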
* [mips] Fix unsigned/signed type errorSimon Dardis2016-11-161-3/+3
| | | | | | | | | | | | | MipsFastISel uses a class to represent addresses with a signed member to represent the offset. MipsFastISel::emitStore, emitLoad and computeAddress all treated the offset as being positive. In cases where the offset was actually negative and a frame pointer was used, this would cause the constant synthesis routine to crash as it would generate an unexpected instruction sequence when frame indexes are replaced. Reviewers: vkalintiris Differential Revision: https://reviews.llvm.org/D26192 llvm-svn: 287099
* [mips] not instruction aliasSimon Dardis2016-11-162-0/+5
| | | | | | | | | | | This patch adds the single operand form of the not alias to microMIPS and MIPS along with additional tests. This partially resolves PR/30381. Thanks to Sean Bruno for reporting the issue! llvm-svn: 287097
* [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.Ayman Musa2016-11-161-4/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D26128 llvm-svn: 287087
* [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmulCraig Topper2016-11-161-25/+31
| | | | | | | | | | Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clang's intrinsic file. Reviewers: zvi, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26660 llvm-svn: 287083
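The extract, scalar-op, insert sequence named in r287083 can be modelled with a plain four-element array. This is only a sketch of the data flow for the add case, assuming a 4 x float vector; `V4F32` and `addScalarLowLane` are hypothetical names, not LLVM or clang identifiers.

```cpp
#include <array>

using V4F32 = std::array<float, 4>;

// Model of the auto-upgraded scalar add: only lane 0 is computed, and the
// remaining lanes are passed through from the first operand.
inline V4F32 addScalarLowLane(V4F32 a, V4F32 b) {
  float x = a[0];     // extractelement a, 0
  float y = b[0];     // extractelement b, 0
  float sum = x + y;  // scalar fadd
  a[0] = sum;         // insertelement a, sum, 0
  return a;
}
```

The same shape applies to the sub, mul and div variants by swapping the scalar operation.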
* [AMDGPU] Refactor v_mac_{f16, f32} patterns into a class NFCKonstantin Zhuravlyov2016-11-161-23/+18
| | | | | | Differential Revision: https://reviews.llvm.org/D26711 llvm-svn: 287077
* AArch64: Use DeadRegisterDefinitionsPass before regalloc.Matthias Braun2016-11-162-33/+26
| | | | | | | | | Doing this before register allocation reduces register pressure as we do not even have to allocate a register for those dead definitions. Differential Revision: https://reviews.llvm.org/D26111 llvm-svn: 287076
* [AMDGPU] Handle f16 select{_cc}Konstantin Zhuravlyov2016-11-163-15/+13
| | | | | | | | | | - Select `select` to `v_cndmask_b32` - Expand `select_cc` - Refactor patterns Differential Revision: https://reviews.llvm.org/D26714 llvm-svn: 287074
* Always use relative jump table encodings on PowerPC64.Joerg Sonnenberger2016-11-162-0/+59
| | | | | | | | | | | | | | | For the default, small and medium code model, use the existing difference from the jump table towards the label. For all other code models, setup the picbase and use the difference between the picbase and the block address. Overall, this results in smaller data tables at the expense of one or two more arithmetic operations at the jump site. Given that we only create jump tables with a lot more than two entries, it is a net win in size. For larger code models the assumption remains that individual functions are no larger than 2GB. Differential Revision: https://reviews.llvm.org/D26336 llvm-svn: 287059
* AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argumentJan Vesely2016-11-151-0/+4
| | | | | | | | wbinvl.* are vector instructions that do not use vector registers. v2: check only M?BUF instructions Differential Revision: https://reviews.llvm.org/D26633 llvm-svn: 287056
* [AArch64] Add support for Qualcomm's Falkor CPU.Chad Rosier2016-11-153-0/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D26673 llvm-svn: 287036
* AMDGPU/SI: Fix pattern for i16 = sign_extend i1Tom Stellard2016-11-151-1/+5
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26670 llvm-svn: 287035
* GlobalISel: remove unused variable to silence warning.Tim Northover2016-11-152-2/+1
| | | | llvm-svn: 287027
* AMDGPU: Enable store clusteringMatt Arsenault2016-11-153-1/+13
| | | | | | | Also respect the TII hook for these like the generic code does in case we want a flag later to disable this. llvm-svn: 287021
* [AArch64] Lower multiplication by a constant int to shl+add+shlHaicheng Wu2016-11-151-9/+39
| | | | | | | | | | | Lower a = b * C where C = (2^n + 1) * 2^m to add w0, w0, w0, lsl n lsl w0, w0, m Differential Revision: https://reviews.llvm.org/D229245 llvm-svn: 287019
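The rewrite in r287019 relies on b * (2^n + 1) * 2^m being equal to (b + (b << n)) << m. A minimal C++ sketch of that identity on 32-bit values follows, together with a helper that checks whether a constant has the required form. Both function names are hypothetical, and the helper only accepts constants whose odd part is at least 3; it is not the profitability logic used by the AArch64 backend.

```cpp
#include <cassert>
#include <cstdint>

// Computes b * ((2^n + 1) * 2^m) modulo 2^32 with one add and two shifts.
inline uint32_t mulByShlAddShl(uint32_t b, unsigned n, unsigned m) {
  assert(n < 32 && m < 32 && "shift amounts must be in range");
  uint32_t t = b + (b << n);  // add w0, w0, w0, lsl n
  return t << m;              // lsl w0, w0, m
}

// Returns true and fills n and m if c == (2^n + 1) * 2^m with n >= 1.
inline bool decomposeShlAddShl(uint32_t c, unsigned &n, unsigned &m) {
  if (c == 0)
    return false;
  unsigned trailingZeros = 0;
  while ((c & 1u) == 0) {  // strip the 2^m factor
    c >>= 1;
    ++trailingZeros;
  }
  uint32_t p = c - 1;  // must be a power of two 2^n with n >= 1
  if (p < 2 || (p & (p - 1)) != 0)
    return false;
  unsigned log2p = 0;
  while ((p >> log2p) != 1u)
    ++log2p;
  n = log2p;
  m = trailingZeros;
  return true;
}
```

For example, C = 12 decomposes as (2^1 + 1) * 2^2, so the multiply becomes an add with a shifted operand followed by a left shift by 2.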